Lix Development | 424 Members | |
| (Technical) development of Lix, the package manager, a Nix implementation. Please be mindful of ongoing technical conversations in this channel. | 140 Servers |
| Sender | Message | Time |
|---|---|---|
| 20 Mar 2026 | ||
| even other libraries typically have an option "RFC compliant vs safe&sane", because for many applications, this is a security issue | 10:01:44 | |
In reply to @kfears:matrix.org We would like to point again to this gem of a blog post that dissects extremely well how many RFCs and versions JSON has and how much is left interpretation-dependent, so you could very easily be RFC-compliant and still diverge from other parsers in major, painful, and noticeable ways | 10:03:23 | |
| Are there any other specifications of JSON other than that RFC? | 10:04:02 | |
| there are multiple RFCs | 10:04:47 | |
| and the original spec on json.org or whatever which is 100% useless | 10:04:55 | |
| I-JSON defines a restricted format for interoperability | 10:05:01 | |
| It could make sense to refer to the behaviors and quirks of the currently used JSON parser directly and state outright "this is what we're working with", and roll behaviors back to that baseline if changing the parser impl | 10:05:04 | |
| but a JSON parser definitely shouldn't insist on only parsing I-JSON, it's for reference for JSON producers and use in other RFCs | 10:05:19 | |
| anyway | 10:05:29 | |
| we're not talking about dodgy Nix programs here, we're talking about dodgy things in another spec and program input | 10:05:43 | |
| being unnecessarily conservative there is far more painful/gratuitous than when it's about actual Nix code | 10:06:00 | |
In reply to @piegames:flausch.social Quoting the linked blog post:
| 10:06:08 | |
| (fwiw, new versions of things RFCs specify are always new RFCs, so the fact that there's a bunch of RFCs doesn't mean that there's tons of divergence necessarily, it just means the spec was refined to clarify things and add warnings. the main JSON RFC is meant to be descriptive not prescriptive; that's what I-JSON is for) | 10:07:19 | |
| that blog post is probably not a good reference in 2026, since it makes no reference to I-JSON etc. | 10:08:02 | |
| speaking of changing the parser impl, I want to get away from nlohmann as soon as we are rewriting the primops in Rust | 10:10:17 | |
| Decent starting point, though (and the test suite can be re-run with modern data) | 10:10:18 | |
| since the RFC is descriptive, it is never going to say "you must not have duplicate keys" | 10:10:42 | |
| that's what subsets like I-JSON etc. are for | 10:10:53 | |
| it does point out several interoperability issues though, hence the SHOULDs | 10:11:00 | |
| back to the main question though, are there any reasonable use cases for duplicate keys? | 10:11:16 | |
| there are documents in the wild that have duplicate keys and that people have to parse; documents with numeric values outside the safe float range (indeed Nix parses many of them as integers); etc. | 10:11:27 | |
| I mean the use case is what do you do if you need to parse some valid JSON with duplicate keys in a Nix program? | 10:12:00 | |
| the fact that JSON is a bad format doesn't mean Nix shouldn't be able to parse JSON | 10:12:15 | |
having a parseJSONWith that lets you be more specific about how to handle weird issues might be good, but is a separate matter | 10:12:32 | |
| We prefer the behavior of taking the last key's value but parsing successfully otherwise, in a general-case JSON implementation. Because we consider JSON with duplicate keys to be malformed, but not to a degree where you'd reject it outright, without options to parse it more liberally | 10:13:34 | |
| even going by that blog post it's very very rare for implementations to reject duplicate keys | 10:14:02 | |
| though sadly it doesn't look like they checked how it's resolved for differing values of the same key | 10:14:09 | |
| but I expect taking the last value is by far the most common | 10:14:19 | |
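
For reference, the "last value wins" behavior discussed above, next to a stricter opt-in mode, can be sketched with Python's standard `json` module. This is only an illustration of the two policies, not of how Lix or nlohmann behave. The helper name `reject_duplicates` and the sample document are made up for the example.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting.

    Hypothetical helper for illustration; it receives the key/value pairs of
    each JSON object in document order.
    """
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


doc = '{"a": 1, "b": 2, "a": 3}'

# Default policy: parse successfully, the last value for a repeated key wins.
lenient = json.loads(doc)

# Strict policy, opted into by the caller: duplicate keys are an error.
strict_error = None
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    strict_error = str(exc)

# Integers beyond the float-safe range (2**53) are kept exact as ints here.
big = json.loads('{"n": 9007199254740993}')["n"]

print(lenient, strict_error, big)
```

Keeping the lenient behavior as the default and making strictness something the caller asks for is roughly the split that a separate `parseJSONWith`-style primop, as floated above, would allow.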