| 20 Mar 2026 |
Qyriad | We should almost definitely at least warn on this though | 10:41:10 |
piegames | hm, as in, runtime warning? | 10:41:30 |
Qyriad | Yes | 10:41:37 |
piegames | fucking hell | 10:44:41 |
piegames | there is a rust crate called ijson | 10:44:45 |
piegames | which has nothing to do with I_JSON, and is just "an opinionated fork of serde_json by a person whose name starts with i" | 10:45:12 |
Qyriad | Of course | 10:45:32 |
Coca | All the rust json crates I already know of:
Input: {"name": "One", "name": "Two"}
serde_json (serde API): Err(Error("duplicate field `name`", line: 1, column: 22))
serde_json (Value API): Ok(Some(String("Two")))
nanoserde: Ok(Test { name: "Two" })
facet-json: Ok(Test { name: "Two" })
simd-json (serde API): Err(Error { index: 0, character: None, err_type: Serde("duplicate field `name`") })
simd-json (Value API): Ok(Some(Value([String("One")])))
| 10:46:07 |
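The three outcomes in the list above (an error, `"Two"`, `"One"`) correspond to three ways of handling a repeated key while building an object. Below is a minimal sketch of those policies using only the Rust standard library. `DuplicatePolicy` and `build_object` are hypothetical names made up for this illustration and are not the API of any crate mentioned here; the members are assumed to arrive as already-tokenized key/value pairs in document order.

```rust
use std::collections::HashMap;

/// Hypothetical policy names for this sketch; not taken from any crate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DuplicatePolicy {
    /// Fail on the first repeated key.
    Reject,
    /// Keep the earliest value, ignore later ones.
    FirstWins,
    /// Overwrite with the latest value.
    LastWins,
}

/// Build a map from object members given in document order.
pub fn build_object(
    members: &[(&str, &str)],
    policy: DuplicatePolicy,
) -> Result<HashMap<String, String>, String> {
    let mut map: HashMap<String, String> = HashMap::new();
    for (key, value) in members {
        if map.contains_key(*key) {
            match policy {
                DuplicatePolicy::Reject => {
                    return Err(format!("duplicate field `{}`", key));
                }
                DuplicatePolicy::FirstWins => continue,
                // Fall through to the insert below, which overwrites.
                DuplicatePolicy::LastWins => {}
            }
        }
        map.insert(key.to_string(), value.to_string());
    }
    Ok(map)
}
```

Called with `[("name", "One"), ("name", "Two")]`, `LastWins` yields `"Two"`, `FirstWins` yields `"One"`, and `Reject` returns an error.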
piegames | "internet json" surely wins the prize for the least searchable name of the week | 10:46:08 |
piegames | "cause fewer problems" as in "fail less loudly and prominently" | 10:47:01 |
piegames | also what the fuck on them having different behavior based on which API you use | 10:47:40 |
piegames | can we collectively please all go back to XML already? | 10:48:11 |
Coca | yeah it's certainly something | 10:48:37 |
Coca | actually argh I could be more clear here with serde_json since Value is still parsed using serde traits in serde_json, but in simd-json the Value API is wholly separate from serde | 10:51:33 |
emily | btw I would very strongly caution against doing this without an extremely comprehensive test suite and probably carefully directed fuzzing. it is likely to cause far more "no abort, different results" interoperability issues than duplicate keys ever would. 2.4 switching to nlohmann was part of why it was probably the most hash-breaking release in memory | 11:22:54 |
emily | see also toml11 bumps sleepwalking into breaking Nixpkgs lib tests etc. | 11:23:20 |
emily | any complete replacement of a builtin parser implementation like that should most likely come with a period of running both of them and aborting on any divergence | 11:24:03 |
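The "run both and abort on divergence" idea could look roughly like the sketch below. It assumes both parsers can produce a comparable value type `T`. `parse_checked`, `old_parser` and `new_parser` are hypothetical names for illustration and not anything that exists in Nix or Lix. Two errors are treated as agreement even if their messages differ, which is a simplifying assumption.

```rust
use std::fmt::Debug;

/// Run both parsers on the same input and abort on any divergence.
/// Returns the old parser's result so existing behavior is preserved.
pub fn parse_checked<T, E1, E2>(
    input: &str,
    old_parser: impl Fn(&str) -> Result<T, E1>,
    new_parser: impl Fn(&str) -> Result<T, E2>,
) -> Result<T, E1>
where
    T: PartialEq + Debug,
    E1: Debug,
    E2: Debug,
{
    let old = old_parser(input);
    let new = new_parser(input);
    match (&old, &new) {
        // Both succeeded with the same value: fine.
        (Ok(a), Ok(b)) if a == b => {}
        // Both failed: treated as agreement (error text is not compared).
        (Err(_), Err(_)) => {}
        // Different values, or one succeeded and the other failed.
        _ => panic!(
            "parser divergence on {:?}: old={:?}, new={:?}",
            input, old, new
        ),
    }
    old
}
```

Returning the old result keeps the transition period observably identical to the previous release, while the panic surfaces any "no abort, different results" case immediately.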
emily | (how many hours do you have to learn about XML interoperability issues?) | 11:25:26 |
emily | (it's a miracle Nix at least only has it as an output format) | 11:25:40 |
emily | (and I like XML!) | 11:25:46 |
emily | I really don't think so. the JSON iceberg is not the Nix iceberg | 11:27:26 |
emily | a warning about input you don't control is not actionable or going to go anywhere, unless it's a truly unfixable interoperability problem | 11:28:29 |
emily | it's unlikely JSON documents people produce solely for consumption in Nix will have duplicate keys because most language APIs won't do that | 11:29:07 |
emily | Lix accepting escaped NUL codepoints in JSON that correctly-written Nixpkgs library functions violate their contract on is far more likely to cause interoperability issues than duplicate keys tbh | 11:30:11 |
emily | (and that's a case where there actually is current and historical divergence, so a warning would be much more justifiable there...) | 11:30:58 |
emily | interesting to see simd-json diverge, though it also has several correctness issues open on GitHub, some from years ago 😅 | 11:34:00 |
emily | wonder if they somehow get better perf from that result or something | 11:34:38 |
emily | are you sure it's not storing both values in a parse tree type form and the lookup is just terminating on the first one actually? | 11:35:18 |
Coca | My hunch was that simd-json can get the earliest acceptable value and then stop parsing, at least with the tape Value, but I would have to see | 11:35:49 |
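The "stores both, lookup stops at the first one" hypothesis can be illustrated with a small sketch. This is not how simd-json is actually implemented (that is exactly the open question above); `RawObject`, `get_first` and `get_last` are hypothetical names, and the layout is just a flat list of members in document order.

```rust
/// A parsed object that keeps every member in document order,
/// duplicates included (a hypothetical layout for this sketch).
pub struct RawObject {
    members: Vec<(String, String)>,
}

impl RawObject {
    pub fn new(members: &[(&str, &str)]) -> Self {
        RawObject {
            members: members
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }

    /// Forward scan that stops at the first matching key.
    pub fn get_first(&self, key: &str) -> Option<&str> {
        self.members
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.as_str())
    }

    /// Reverse scan, which gives last-wins semantics.
    pub fn get_last(&self, key: &str) -> Option<&str> {
        self.members
            .iter()
            .rev()
            .find(|(k, _)| k == key)
            .map(|(_, v)| v.as_str())
    }
}
```

With members `("name", "One")` and `("name", "Two")`, nothing is lost at parse time: `get_first("name")` returns `"One"` and `get_last("name")` returns `"Two"`, so the visible result depends only on the direction of the lookup.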