30 Jul 2025 |
John Ericson | I guess the calendar still hasn't been updated | 14:57:44 |
John Ericson | https://github.com/nlohmann/json/discussions/2612 | 14:57:48 |
John Ericson | so maybe we should use that for the new derivation format | 14:57:55 |
John Ericson | canonical and easy to read | 14:58:01 |
John Ericson | frankly, I don't think non-canonicity would cause issues, because it is fine to have fewer cache hits | 14:58:24 |
John Ericson | but it does feel better with it | 14:58:28 |
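A minimal sketch of the "fewer cache hits" point, assuming a content hash over the serialized bytes. The field names and the choice of SHA-256 are illustrative assumptions, not the actual derivation format or Nix's hashing scheme: two encodings with the same meaning simply hash to different keys.

```python
import hashlib
import json

# Two hypothetical serializations of the same data, differing only in
# key order and whitespace.
a = b'{"name":"hello","system":"x86_64-linux"}'
b = b'{"system": "x86_64-linux", "name": "hello"}'


def digest(raw: bytes) -> str:
    """Hash the raw bytes, standing in for a content-addressed cache key."""
    return hashlib.sha256(raw).hexdigest()


# Same meaning once parsed, but different bytes, so different cache keys.
same_meaning = json.loads(a) == json.loads(b)
same_hash = digest(a) == digest(b)
```

Here `same_meaning` is true while `same_hash` is false: nothing breaks, but the two forms would not share a cache entry.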
John Ericson | oh the RFC exists, but nlohmann doesn't yet implement it | 14:59:16 |
emily | (from the last time this came up, including my strong warning against canonicalized JSON formats and suggestions for alternatives) | 15:00:26 |
emily | the good thing about JSON canonicalization schemes is that there are so many to pick from! | 15:00:54 |
John Ericson | emily: how about CBOR? | 15:09:51 |
John Ericson | Robert Hensing (roberth) was once concerned that making it binary would be a drawback for humans debugging it | 15:10:24 |
emily | mentioned here (there's also three versions of that spec unfortunately) | 15:10:39 |
John Ericson | to be clear, we need a parser that validates the canonical format | 15:11:19 |
John Ericson | it is not good enough to make just the serializer do the thing | 15:11:31 |
emily | IMO textual formats are good but you need to keep them separate from the non-malleable canonical forms since the two are pretty much in tension. Preserves does this. (though again I doubt Preserves is appealing for direct adoption given lack of ecosystem, it's just the measuring stick for these things in my view) | 15:11:38 |
John Ericson | it has to actually enforce it | 15:11:39 |
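One way to sketch a parser that enforces canonical form is to parse, re-serialize, and reject any input whose bytes differ. The canonical form below (sorted keys, no insignificant whitespace, UTF-8, no NaN/Infinity) is a simplifying assumption and is not RFC 8785; `canonical_dumps` and `strict_loads` are hypothetical names.

```python
import json


def canonical_dumps(value) -> bytes:
    """Serialize to a simplified canonical form (an assumption, not RFC 8785):
    keys sorted by code point, no extra whitespace, UTF-8, no NaN/Infinity."""
    text = json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


def strict_loads(raw: bytes):
    """Parse JSON, accepting only input that is already canonical."""
    value = json.loads(raw.decode("utf-8"))
    # Round-trip check: any deviation (key order, whitespace, duplicate
    # keys, needless escapes) changes the re-serialized bytes.
    if canonical_dumps(value) != raw:
        raise ValueError("input is not in canonical form")
    return value
```

With this, `strict_loads(b'{"a":1,"b":2}')` succeeds, while the same object with reordered keys or added spaces raises `ValueError`, so a consumer cannot silently accept a non-canonical variant.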
emily | right. the problem is that there are inevitably multiple consumers | 15:11:52 |
emily | of any format trying to do any kind of interoperability (or in some cases not trying) | 15:12:02 |
John Ericson | I like BCS, but I do want some extensibility/versionability | 15:12:22 |
John Ericson | which, unless we want a registry of enum tags, means it is good to have some self-describing strings | 15:12:43 |
John Ericson | so I figure that would already be crossing the rubicon out of "most dense information, minimal self description" | 15:13:30 |
John Ericson | like, as soon as you have string tags, there is probably some slack, like whether lexicographic ordering is required | 15:14:07 |
fzakaria | i'm not usually a big protobuf fan but i do like having a high level IDL to define the things being stored. | 15:15:11 |
John Ericson | if/when we stop putting drvs in store objects and go full Harvard architecture, we can stop worrying about the on-disk version being the hashed canonical form | 15:15:13 |
magic_rb | Sorry to interject, what's BCS? | 15:16:00 |
John Ericson | but so long as we're deciding "what goes in the drv file", the drvPath is the hash of that | 15:16:02 |
John Ericson | magic_rb: Binary Canonical Serialization, from facebook's blockchain. it basically concatenates the fields to encode products, and uses ULEB128 (the variable-length integer encoding) for the enum tags of sums | 15:16:40 |
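A rough sketch of that description, in the style of BCS but not a complete implementation of it: products are the concatenation of their encoded fields, and sums are a ULEB128 variant index followed by the payload. The helper names and the example variant are hypothetical.

```python
def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (7 bits per byte,
    high bit set on every byte except the last)."""
    if n < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_u8(n: int) -> bytes:
    """Fixed-width one-byte integer."""
    return bytes([n])


def encode_bytes(data: bytes) -> bytes:
    """Variable-length byte string: ULEB128 length prefix, then the bytes."""
    return uleb128(len(data)) + data


def encode_product(*fields: bytes) -> bytes:
    """A product (struct/tuple) is just its encoded fields, concatenated."""
    return b"".join(fields)


def encode_sum(variant_index: int, payload: bytes) -> bytes:
    """A sum (enum) is a ULEB128 variant index followed by the payload."""
    return uleb128(variant_index) + payload


# Hypothetical variant 1 carrying a (bytes, u8) pair.
example = encode_sum(1, encode_product(encode_bytes(b"out"), encode_u8(7)))
```

Note that nothing in the output names the fields or variants: the variant index `1` only means something to a reader holding the same schema, which is the registry-of-enum-tags trade-off raised earlier in the conversation.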