!lymvtcwDJ7ZA9Npq:lix.systems

Lix Development

424 Members
(Technical) development of Lix, the package manager, a Nix implementation. Please be mindful of ongoing technical conversations in this channel.
140 Servers


Sender Message Time
20 Mar 2026
@piegames:flausch.social piegames even other libraries typically have an option "RFC compliant vs safe&sane", because for many applications, this is a security issue 10:01:44
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them)
In reply to @kfears:matrix.org
https://seriot.ch/software/parsing_json.html
We would like to point again to this gem of a blog post, which dissects extremely well how many RFCs and versions JSON has and how much is left interpretation-dependent, so you could very easily be RFC-compliant and still diverge from other parsers in major, painful, and noticeable ways
10:03:23
@piegames:flausch.social piegames Are there any other specifications of JSON other than that RFC? 10:04:02
@emilazy:matrix.org emily there are multiple RFCs 10:04:47
@emilazy:matrix.org emily and the original spec on json.org or whatever which is 100% useless 10:04:55
@emilazy:matrix.org emily I-JSON defines a restricted format for interoperability 10:05:01
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them) It could make sense to refer to the behaviors and quirks of the currently used JSON parser directly and state outright "this is what we're working with", and roll behaviors back to that baseline if changing the parser impl 10:05:04
@emilazy:matrix.org emily but a JSON parser definitely shouldn't insist on only parsing I-JSON, it's for reference for JSON producers and use in other RFCs 10:05:19
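
To make the I-JSON point concrete, here is a minimal Python sketch of the kind of check a JSON producer could run on its own output. It covers only two of the rules that, as far as I understand RFC 7493, I-JSON adds (no duplicate member names, integers kept within the range a double represents exactly). The names `check_i_json_subset`, `NotIJSON`, and `MAX_SAFE_INT` are hypothetical, and Python's `json` module is only a stand-in, not the parser Lix uses.

```python
import json

# Largest integer magnitude that an IEEE 754 double represents exactly.
MAX_SAFE_INT = 2**53 - 1


class NotIJSON(ValueError):
    """Raised by this sketch when a document breaks one of the checked rules."""


def _check_pairs(pairs):
    # object_pairs_hook receives each object's members as (key, value) pairs,
    # in document order, before they are collapsed into a dict.
    seen = set()
    for key, _ in pairs:
        if key in seen:
            raise NotIJSON(f"duplicate member name: {key!r}")
        seen.add(key)
    return dict(pairs)


def _check_int(literal):
    # parse_int receives the integer literal as a string.
    value = int(literal)
    if abs(value) > MAX_SAFE_INT:
        raise NotIJSON(f"integer outside the exactly representable range: {literal}")
    return value


def check_i_json_subset(text):
    """Hypothetical helper: parse text while enforcing two I-JSON-style rules."""
    return json.loads(text, object_pairs_hook=_check_pairs, parse_int=_check_int)
```

A general-purpose parser would not apply these checks by default, which is the point made above: the subset is something producers can aim for, not something consumers should insist on.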
@emilazy:matrix.org emily anyway 10:05:29
@emilazy:matrix.org emily we're not talking about dodgy Nix programs here, we're talking about dodgy things in another spec and program input 10:05:43
@emilazy:matrix.org emily being unnecessarily conservative there is far more painful/gratuitous than when it's about actual Nix code 10:06:00
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them)
In reply to @piegames:flausch.social
Are there any other specifications of JSON other than that RFC?

Quoting the linked blog post:

Yet JSON is defined in at least seven different documents:

  • 2002 - json.org, and the business card
  • 2006 - IETF RFC 4627, which set the application/json MIME media type
  • 2011 - ECMAScript 262, section 15.12
  • 2013 - ECMA 404 according to Tim Bray (RFC 7159 editor), ECMA rushed out to release it because: "Someone told the ECMA working group that the IETF had gone crazy and was going to rewrite JSON with no regard for compatibility and break the whole Internet and something had to be done urgently about this terrible situation. (...) It doesn’t address any of the gripes that were motivating the IETF revision."
  • 2014 - IETF RFC 7158 makes the specification "Standard Tracks" instead of "Informational", allows scalars (anything other than arrays and objects) such as 123 and true at the root level as ECMA does, warns about bad practices such as duplicated keys and broken Unicode strings, without explicitly forbidding them, though.
  • 2014 - IETF RFC 7159 was released to fix a typo in RFC 7158, which was dated from "March 2013" instead of "March 2014".
  • 2017 - IETF RFC 8259 was released in December 2017. It basically adds two things: 1) outside of closed eco-systems, JSON MUST be encoded in UTF-8 and 2) JSON text that is not networked transmitted MAY now add the byte order mark U+FEFF, although this is not stated explicitly.

Despite the clarifications they bring, RFC 7159 and 8259 contain several approximations and leaves many details loosely specified.

10:06:08
@emilazy:matrix.org emily (fwiw, new versions of things RFCs specify are always new RFCs, so the fact that there's a bunch of RFCs doesn't mean that there's tons of divergence necessarily, it just means the spec was refined to clarify things and add warnings. the main JSON RFC is meant to be descriptive not prescriptive; that's what I-JSON is for) 10:07:19
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them) *

Quoting the linked blog post:

Yet JSON is defined in at least seven different documents:

  • 2002 - json.org, and the business card
  • 2006 - IETF RFC 4627, which set the application/json MIME media type
  • 2011 - ECMAScript 262, section 15.12
  • 2013 - ECMA 404 according to Tim Bray (RFC 7159 editor), ECMA rushed out to release it because:

"Someone told the ECMA working group that the IETF had gone crazy and was going to rewrite JSON with no regard for compatibility and break the whole Internet and something had to be done urgently about this terrible situation. (...) It doesn’t address any of the gripes that were motivating the IETF revision."

  • 2014 - IETF RFC 7158 makes the specification "Standard Tracks" instead of "Informational", allows scalars (anything other than arrays and objects) such as 123 and true at the root level as ECMA does, warns about bad practices such as duplicated keys and broken Unicode strings, without explicitly forbidding them, though.
  • 2014 - IETF RFC 7159 was released to fix a typo in RFC 7158, which was dated from "March 2013" instead of "March 2014".
  • 2017 - IETF RFC 8259 was released in December 2017. It basically adds two things: 1) outside of closed eco-systems, JSON MUST be encoded in UTF-8 and 2) JSON text that is not networked transmitted MAY now add the byte order mark U+FEFF, although this is not stated explicitly.

Despite the clarifications they bring, RFC 7159 and 8259 contain several approximations and leaves many details loosely specified.

10:07:20
@emilazy:matrix.org emily that blog post is probably not a good reference in 2026, since it makes no reference to I-JSON etc. 10:08:02
@piegames:flausch.social piegames speaking of changing the parser impl, I want to get away from nlohmann as soon as we are rewriting the primops in Rust 10:10:17
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them) Decent starting point, though (and the test suite can be re-run with modern data) 10:10:18
@emilazy:matrix.org emily since the RFC is prescriptive, it is never going to say "you must not have duplicate keys" 10:10:42
@emilazy:matrix.org emily * since the RFC is descriptive, it is never going to say "you must not have duplicate keys" 10:10:48
@emilazy:matrix.org emily that's what subsets like I-JSON etc. are for 10:10:53
@emilazy:matrix.org emily it does point out several interoperability issues though, hence the SHOULDs 10:11:00
@piegames:flausch.social piegames back to the main question though, are there any reasonable use cases for duplicate keys? 10:11:16
@emilazy:matrix.org emily there are documents in the wild that have duplicate keys and that people have to parse; documents with numeric values outside the safe float range (indeed Nix parses many of them as integers); etc. 10:11:27
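
The "safe float range" remark can be illustrated with a small Python sketch. It assumes nothing about how Nix or Lix parse numbers internally; Python's `json` module simply makes it easy to compare a parser that keeps integer literals as integers with one that pushes everything through a double. The variable names are illustrative only.

```python
import json

# 2**53 + 1, the smallest positive integer a double cannot represent exactly.
text = '{"n": 9007199254740993}'

# Python's json module keeps integer literals as arbitrary-precision ints.
as_int = json.loads(text)["n"]

# Forcing every integer literal through float mimics a double-only parser.
as_float = json.loads(text, parse_int=float)["n"]

print(as_int)         # 9007199254740993
print(int(as_float))  # 9007199254740992, the last digit is lost
```

A parser that rejected such documents outright would fail on input that an integer-aware parser reads without loss.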
@emilazy:matrix.org emily I mean the use case is: what do you do if you need to parse some valid JSON with duplicate keys in a Nix program? 10:12:00
@emilazy:matrix.org emily the fact that JSON is a bad format doesn't mean Nix shouldn't be able to parse JSON 10:12:15
@emilazy:matrix.org emily having a parseJSONWith that lets you be more specific about how to handle weird issues might be good, but is a separate matter 10:12:32
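
As a rough illustration of what a `parseJSONWith`-style interface could look like, here is a Python sketch with a configurable duplicate-key policy. Nothing here is an existing Nix or Lix builtin: the function name `parse_json_with`, the `on_duplicate` parameter, and its three policy names are all assumptions made for the example.

```python
import json


def parse_json_with(text, on_duplicate="last"):
    """Hypothetical analogue of the parseJSONWith idea, not a real builtin."""
    if on_duplicate not in ("last", "first", "error"):
        raise ValueError(f"unknown duplicate-key policy: {on_duplicate!r}")

    def build(pairs):
        # pairs holds one object's members in document order.
        result = {}
        for key, value in pairs:
            if key in result:
                if on_duplicate == "error":
                    raise ValueError(f"duplicate key: {key!r}")
                if on_duplicate == "first":
                    continue  # keep the earlier value
            result[key] = value  # "last" overwrites the earlier value
        return result

    return json.loads(text, object_pairs_hook=build)


doc = '{"a": 1, "a": 2}'
print(parse_json_with(doc))                        # {'a': 2}
print(parse_json_with(doc, on_duplicate="first"))  # {'a': 1}
```

Defaulting to "last" keeps the liberal behavior for general input, while callers who treat duplicate keys as a security concern can opt into "error".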
@kfears:matrix.org KFears& 🏳️‍⚧️ (they/them) We prefer the behavior of taking the last key's value but parsing successfully otherwise, in a general-case JSON implementation, because we consider JSON with duplicate keys to be malformed, but not to a degree where you'd reject it outright, without options to parse it more liberally 10:13:34
@emilazy:matrix.org emily even going by that blog post it's very very rare for implementations to reject duplicate keys 10:14:02
@emilazy:matrix.org emily though sadly it doesn't look like they checked how it's resolved for differing values of the same key 10:14:09
@emilazy:matrix.org emily but I expect taking the last value is by far the most common 10:14:19


Room Version: 10