| 30 Jul 2025 |
John Ericson | but it do feel better with it | 14:58:28 |
John Ericson | * frankly, I don't think non-canonicity would cause issues, because it is fine to have fewer cache hits | 14:58:49 |
John Ericson | oh the RFC exists, but nlohmann doesn't yet implement it | 14:59:16 |
emily | (from the last time this came up, including my strong warning against canonicalized JSON formats and suggestions for alternatives) | 15:00:26 |
emily | the good thing about JSON canonicalization schemes is that there are so many to pick from! | 15:00:54 |
John Ericson | emily: how about CBOR? | 15:09:51 |
John Ericson | Robert Hensing (roberth) was once concerned that making it binary would be a drawback for humans debugging | 15:10:24 |
emily | mentioned here (there's also three versions of that spec unfortunately) | 15:10:39 |
John Ericson | to be clear, we need a parser that validates the canonical format | 15:11:19 |
John Ericson | it is not good enough to make just the serializer do the thing | 15:11:31 |
emily | IMO textual formats are good but you need to keep them separate from the non-malleable canonical forms since the two are pretty much in tension. Preserves does this. (though again I doubt Preserves is appealing for direct adoption given lack of ecosystem, it's just the measuring stick for these things in my view) | 15:11:38 |
John Ericson | it has to actually enforce it | 15:11:39 |
emily | right. the problem is that there are inevitably multiple consumers | 15:11:52 |
emily | of any format trying to do any kind of interoperability (or in some cases not trying) | 15:12:02 |
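A minimal sketch of the "parser must enforce it" point above, assuming a stand-in canonical form (sorted keys, no insignificant whitespace) built on Python's `json` module. This is not RFC 8785 and not anything Nix implements; `canonical_dumps` and `loads_canonical` are hypothetical names. The idea is only that the reader rejects any input that differs from its own canonical re-serialization, so every accepted value has exactly one accepted encoding.

```python
import json


def _reject_constant(name):
    # json.loads accepts NaN/Infinity by default; a canonical form should not.
    raise ValueError(f"non-finite number not allowed: {name}")


def canonical_dumps(value) -> str:
    # Stand-in canonical form: sorted keys, no whitespace, raw (unescaped) Unicode.
    return json.dumps(
        value,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )


def loads_canonical(data: bytes):
    # Decode strictly, parse, then require that re-serializing gives back
    # exactly the input text. Anything else is rejected, not normalized.
    text = data.decode("utf-8")
    value = json.loads(text, parse_constant=_reject_constant)
    if canonical_dumps(value) != text:
        raise ValueError("input is not in canonical form")
    return value
```

With this shape, `loads_canonical(b'{"a":1,"b":2}')` succeeds, while reordered keys, extra whitespace, or duplicate keys raise `ValueError` instead of being silently accepted.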
John Ericson | I like BCS, but I do want some extensibility/versionability | 15:12:22 |
John Ericson | which, unless we want a registry of enum tags, means it is good to have some self-describing strings | 15:12:43 |
John Ericson | so I figure that would already be crossing the Rubicon out of "most dense information, minimal self description" | 15:13:30 |
John Ericson | like, as soon as you have string tags, there is probably some slack, like whether lexicographic ordering is required | 15:14:07 |
fzakaria | i'm not usually a big protobuf fan but i do like having a high level IDL to define the things being stored. | 15:15:11 |
John Ericson | if/when we stop putting drvs in store objects and go full Harvard architecture, we no longer need to worry about the on-disk version being the hashed canonical form | 15:15:13 |
magic_rb | Sorry to interject, what's BCS? | 15:16:00 |
John Ericson | but so long as we're deciding "what goes in the drv file", the drvPath is the hash of that | 15:16:02 |
John Ericson | magic_rb: Binary Canonical Serialization, from Facebook's blockchain. it basically concatenates the fields to encode products, and uses ULEB128 (the variable-length integer encoding) for the enum tags of sums | 15:16:40 |
John Ericson | it is a formalization of sort of the "obvious" way to do binary representations of algebraic data types | 15:17:12 |
John Ericson | no string tags, no self description, everything is ordered (products have a left side and right side, the variants of the sum types are totally ordered and assigned 0 1 2 3 4....) | 15:17:52 |
John Ericson | it is basically what you would get on the heap (if no pointers) too, except there is no padding/alignment, so you can't just cast it Cap'n Proto style in general | 15:18:19 |
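A rough sketch of the BCS-style encoding described above, assuming fixed-width little-endian integers, plain concatenation for products, and a ULEB128 variant index for sums. The `Shape` type with `Circle(u32)` and `Rect(u32, u32)` is a hypothetical example, not something from the discussion, and the helper names are made up for illustration.

```python
def uleb128(n: int) -> bytes:
    # Variable-length unsigned integer: 7 payload bits per byte,
    # high bit set on every byte except the last.
    if n < 0:
        raise ValueError("ULEB128 encodes unsigned integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_u32(n: int) -> bytes:
    # Fixed-width, little-endian, no padding or alignment.
    return n.to_bytes(4, "little")


def encode_product(*fields: bytes) -> bytes:
    # A product is just its fields concatenated in declaration order.
    return b"".join(fields)


def encode_variant(index: int, payload: bytes = b"") -> bytes:
    # A sum is the variant's position (0, 1, 2, ...) followed by its payload.
    return uleb128(index) + payload


# Hypothetical type: Shape = Circle(u32) | Rect(u32, u32)
def encode_circle(radius: int) -> bytes:
    return encode_variant(0, encode_u32(radius))


def encode_rect(width: int, height: int) -> bytes:
    return encode_variant(1, encode_product(encode_u32(width), encode_u32(height)))
```

Nothing in the output names the type, the variant, or the fields: `encode_rect(3, 4)` is nine bytes, one for the tag and four for each integer, which is why a reader needs the schema to decode it.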
magic_rb | I'll read up more on it, it's good to know about this stuff, for when you need a binary format | 15:20:03 |