30 Jul 2025 |
John Ericson | it is not good enough to make just the serializer do the thing | 15:11:31 |
emily | IMO textual formats are good but you need to keep them separate from the non-malleable canonical forms since the two are pretty in tension. Preserves does this. (though again I doubt Preserves is appealing for direct adoption given lack of ecosystem, it's just the measuring stick for these things in my view) | 15:11:38 |
John Ericson | it has to actually enforce it | 15:11:39 |
emily | right. the problem is that there are inevitably multiple consumers | 15:11:52 |
emily | of any format trying to do any kind of interoperability (or in some cases not trying) | 15:12:02 |
John Ericson | I like BCS, but I do want some extensibility/versionability | 15:12:22 |
John Ericson | which, unless we want a registry of enum tags, means it is good to have some self-describing strings | 15:12:43 |
John Ericson | so I figure that would already be crossing the Rubicon out of "most dense information, minimal self-description" | 15:13:30 |
John Ericson | like, as soon as you have string tags, there is probably some slack, like whether lexicographic ordering is required | 15:14:07 |
fzakaria | i'm not usually a big protobuf fan but i do like having a high-level IDL to define the things being stored. | 15:15:11 |
John Ericson | if/when we stop putting drvs in store objects and go full Harvard architecture, we can stop worrying about the on-disk version being the hashed canonical form | 15:15:13 |
magic_rb | Sorry to interject, whats BCS | 15:16:00 |
John Ericson | but so long as we're deciding "what goes in the drv file", the drvPath is the hash of that | 15:16:02 |
John Ericson | magic_rb: BCS (Binary Canonical Serialization) is from Facebook's blockchain; it is basically: concatenate the fields to encode products, and use ULEB128 (the variable-length integer encoding) for the enum tags of sums | 15:16:40 |
John Ericson | it is a formalization of sort of the "obvious" way to do binary representations of algebraic data types | 15:17:12 |
John Ericson | no string tags, no self description, everything is ordered (products have a left side and right side, the variants of the sum types are totally ordered and assigned 0 1 2 3 4....) | 15:17:52 |
John Ericson | it is basically what you would get on the heap too (if there are no pointers), except there is no padding/alignment, so in general you can't just cast it Cap'n Proto style | 15:18:19 |
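A minimal sketch of the BCS-style rules described above, written from this thread's description rather than from the BCS specification or any BCS library; the helper names are hypothetical and only a few shapes are covered: fixed-width little-endian integers, ULEB128 for lengths and variant indices, products as plain concatenation, and sums as a variant index followed by the payload.

```python
# Sketch of BCS-style encoding helpers (hypothetical names, subset of shapes only).

def uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte; always the minimal encoding
            return bytes(out)

def u32(n: int) -> bytes:
    # Fixed-width integer, little-endian, no alignment padding.
    return n.to_bytes(4, "little")

def string(s: str) -> bytes:
    # Length (in bytes) as ULEB128, then the raw UTF-8 bytes.
    data = s.encode("utf-8")
    return uleb128(len(data)) + data

def product(*fields: bytes) -> bytes:
    # Products: fields concatenated in declaration order; no names, no padding.
    return b"".join(fields)

def variant(index: int, payload: bytes = b"") -> bytes:
    # Sums: variants numbered 0, 1, 2, ... in declaration order.
    return uleb128(index) + payload
```

For a hypothetical `enum Shape { Point, Circle(u32), Named(String) }`, `variant(1, u32(5))` encodes `Circle(5)` as `b'\x01\x05\x00\x00\x00'`: nothing in the bytes says "Shape" or "Circle", only positions, which is the "no self description" trade-off discussed above.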
magic_rb | I'll read up more on it, it's good to know about this stuff for when you need a binary format | 15:20:03 |
emily | yeah. there are more and less rigorous standards about it though. (e.g. https://preserves.dev/canonical-binary.html again as an exemplar.) but anyway I'd just be repeating what I said overviewing a whole bunch of formats last time I think :) | 15:21:33 |
emily | FWIW protobufs are worse at canonicalization than most formats IIRC | 15:22:05 |
emily | https://protobuf.dev/programming-guides/serialization-not-canonical/ | 15:22:14 |
emily | "Inherent Barriers to Stable Serialization" :) | 15:22:27 |
fzakaria | makes sense; i would probably calculate my own stable key off the requisite fields | 15:23:42 |
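One way to read "calculate my own stable key off the requisite fields", sketched under assumptions: the `stable_key` helper, the example field names, and the choice of SHA-256 over length-prefixed parts are illustrative, not anything stated in the thread. The key depends only on an explicitly listed set of fields in a fixed order, so it does not inherit whatever byte layout a non-canonical serializer such as protobuf happens to produce.

```python
import hashlib

def stable_key(record: dict[str, str], fields: tuple[str, ...]) -> str:
    """Hash only the listed fields, in the listed order (hypothetical helper)."""
    h = hashlib.sha256()
    for name in fields:
        value = record[name]
        if not isinstance(value, str):
            # Restrict to str so e.g. 1 and "1" cannot silently hash the same.
            raise TypeError(f"field {name!r} must be a str")
        data = value.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    return h.hexdigest()
```

Fields not listed in `fields` can change or be reordered freely without changing the key.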
fzakaria | i have rarely ever used the serialization format itself as inputs to things | 15:24:37 |
John Ericson | is this preserves thing widely implemented? | 15:24:44 |
fzakaria | but maybe it's because i never have canonicalization :P | 15:24:50 |