!LemuOOvbWqRXodtSsw:nixos.org

NixOS Reproducible Builds

521 Members
Report: https://reproducible.nixos.org Project progress: https://github.com/orgs/NixOS/projects/30120 Servers



Sender Message Time
3 Apr 2025
@raboof:matrix.org raboof: 'the most secure code is code that doesn't run'? 🦆 07:41:13
@raboof:matrix.org raboof: (fixing) 07:41:15
5 Apr 2025
@guider-le-recit:matrix.org guider-le-recit joined the room. 12:03:56
@tinybronca:sibnsk.net removed their display name underpantsgnome. 15:52:40
@tinybronca:sibnsk.net left the room. 15:53:18
6 Apr 2025
@guider-le-recit:matrix.org guider-le-recit: Hello everyone, my name is Keanu, an Outreachy intern hoping to work on Nix if it's not too late. I've gone through the nix.dev documentation and the Nixpkgs reference manual, and started looking into the reproducibility issue for libpinyin, #393693. After tracing the code through the DB files, I thought the problem was in memorychunk, but it looks like it properly handles its own memory initialization. My question: is the nondeterminism coming from potentially uninitialized padding within the C++ structs, or is there something else to account for? I'm thankful for any direction. 01:12:36
@sigmasquadron:matrix.org Fernando Rodrigues: Hi Keanu, welcome to Nixpkgs! libpinyin's issues don't look like they come from C++ structs, but from the generation of the pinyin indexes. You're probably going to have to dig into how those indexes are generated by libpinyin. 01:18:31
@sigmasquadron:matrix.org Fernando Rodrigues: In this case, it seems like there is already an issue upstream with more information. 01:19:24
@sigmasquadron:matrix.org Fernando Rodrigues: * Hi Keanu, welcome to Nixpkgs! libpinyin's issues don't come from runtime errors like memory allocation or uninitialised structs, but from the generation of the pinyin indexes. You're probably going to have to dig into how those indexes are generated by libpinyin. 01:20:10
@guider-le-recit:matrix.org guider-le-recit: Thanks for pointing me in the right direction, Fernando. My mistake: I initially thought the issue was struct padding because of the binary differences shown in diffoscope. After looking at the upstream issue and examining the diffoscope output more carefully, I noticed the differences are in database files. I found the gen_binary_files utility in the build logs, which seems responsible for generating these files. How does this utility process and sort the input data before writing to the database? 03:40:43
@sigmasquadron:matrix.org Fernando Rodrigues: Good question. That's what we (and apparently also upstream) need you to explore. Dig into the utility's source code, and try to learn its inner workings. If you find something promising, collect your findings and share them in the issue. 03:50:44
@guider-le-recit:matrix.org guider-le-recit: That sounds like fun, thank you, Fernando. I will start exploring and hopefully get back to you soon. 03:58:37
7 Apr 2025
@guider-le-recit:matrix.org guider-le-recit: Hi Fernando, I was wrong, this wasn't fun at all. I've analyzed the load_text methods for the B-Tree tables (pinyin_index, phrase_index, etc.) and they seem to process input sequentially, without obvious sources of non-deterministic insertion order. I also traced bigram.db generation to import_interpolation using the Bigram class, confirming it uses DB_HASH. While the Bigram::store method looks deterministic given its SingleGram input, could the non-reproducibility of bigram.db stem from Berkeley DB's default DB_HASH implementation itself being sensitive to the build environment? Is this a known pattern, and are there ways to make BDB hash generation reproducible? 14:19:32
@raboof:matrix.org raboof: I don't know, but I do remember diffoscope has support for Berkeley DB files, so that is at least a promising sign that people who care about reproducibility have been looking at those :) 15:20:48
@raboof:matrix.org raboof: though "Format-specific differences are supported for Berkeley DB database files but no file-specific differences were detected" in this case so not very helpful :) 15:37:32
@guider-le-recit:matrix.org guider-le-recit: The smiley faces help 15:57:06
@guider-le-recit:matrix.org guider-le-recit: So instead we are dealing with BDB's internal metadata, not application-level logic, right? 15:57:30
@raboof:matrix.org raboof: kinda looks like, yeah 15:58:12
@raboof:matrix.org raboof: seems like there's consistently a difference at 0x34 of the file - might be interesting to see if we can figure out what's written there - I bet some sort of timestamp? - and then where it comes from 15:59:13
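One way to follow up on the 0x34 observation is to list every byte offset where two independent builds of the same database file differ, then dump the bytes around each run. The sketch below is only an illustration using the Python standard library; the function names are hypothetical, and the two file paths are whatever pair of build outputs you pass on the command line.

```python
import sys
from pathlib import Path


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every offset at which the two inputs differ.

    If one input is longer, the offsets past the shorter one are reported too.
    """
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    diffs.extend(range(common, max(len(a), len(b))))
    return diffs


def group_runs(offsets: list[int]) -> list[tuple[int, int]]:
    """Collapse sorted offsets into inclusive (start, end) runs."""
    runs: list[tuple[int, int]] = []
    for off in offsets:
        if runs and off == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], off)
        else:
            runs.append((off, off))
    return runs


def hexdump(data: bytes, start: int, end: int) -> str:
    """Hex string for the inclusive byte range [start, end]."""
    return data[start:end + 1].hex(" ")


# Usage (paths are placeholders): python diff_offsets.py build-a/bigram.db build-b/bigram.db
if __name__ == "__main__" and len(sys.argv) == 3:
    first = Path(sys.argv[1]).read_bytes()
    second = Path(sys.argv[2]).read_bytes()
    for start, end in group_runs(differing_offsets(first, second)):
        print(f"0x{start:x}-0x{end:x}")
        print("  a:", hexdump(first, start, end))
        print("  b:", hexdump(second, start, end))
```

Knowing how long the differing run at 0x34 is (4 bytes, 8 bytes, more) narrows down which header field it could belong to before reading the BDB sources.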
@guider-le-recit:matrix.org guider-le-recit: Okay, thank you. I'll start looking at it and get back to you. 16:03:29
8 Apr 2025
@guider-le-recit:matrix.org guider-le-recit: So I've read into the BDB v9 header format docs. While I couldn't pinpoint the exact field name at 0x34 from the available docs, this area contains generic metadata. I thought about what you mentioned about a timestamp, but decoding the specific bytes didn't yield obvious Unix timestamps. Wouldn't another strong candidate for volatile data in that region, based on BDB's design, be LSNs? I looked for BDB flags to control this. The most promising one seems to be DB_TXN_NOT_DURABLE. The docs explicitly state this removes the LSNs from page headers. Since LSNs/transaction state fit the profile of volatile metadata in that header area, potentially sensitive to the environment (ASLR), this flag seems like a direct way to target the suspected cause of the variation. 12:28:44
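The timestamp check described above can be sketched as follows: read the bytes at the offset as 32-bit and 64-bit unsigned integers in both byte orders and keep any interpretation that lands in a plausible date range. This is a rough illustration, not what was actually run in the chat; the function name and the 2000 to 2100 window are assumptions chosen for the example.

```python
import struct
from datetime import datetime, timezone

# struct format strings for the interpretations we try at the offset.
FORMATS = {
    "u32 little-endian": "<I",
    "u32 big-endian": ">I",
    "u64 little-endian": "<Q",
    "u64 big-endian": ">Q",
}

# Assumed plausibility window: 2000-01-01 to 2100-01-01 (UTC), in Unix seconds.
EARLIEST = 946_684_800
LATEST = 4_102_444_800


def plausible_timestamps(data: bytes, offset: int) -> dict[str, datetime]:
    """Return the interpretations at `offset` that look like Unix timestamps."""
    hits: dict[str, datetime] = {}
    for label, fmt in FORMATS.items():
        if offset + struct.calcsize(fmt) > len(data):
            continue  # not enough bytes left for this width
        (value,) = struct.unpack_from(fmt, data, offset)
        if EARLIEST <= value <= LATEST:
            hits[label] = datetime.fromtimestamp(value, tz=timezone.utc)
    return hits
```

An empty result only rules out a raw seconds-since-epoch value at that offset. A field derived from the time (for example mixed with other inputs or hashed) would not show up this way.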
@guider-le-recit:matrix.org guider-le-recit: But I'm not entirely sure about this, though. 12:30:16
@raboof:matrix.org raboof: nice research work! this is deeper into BDB internals than I've gotten, but it sounds worth a try? 12:41:48
9 Apr 2025
@gigamonster256:matrix.org gigamonster256 joined the room. 17:04:16
10 Apr 2025
@raboof:matrix.org raboof: generated another report for the minimal iso, no surprises (gettext and jemalloc are still in the staging pipeline) https://reproducibility.nixos.social/reports/nixos-minimal-25.05pre780821.c8cd81426f45-x86_64-linux.iso 06:44:57
11 Apr 2025
@poeta_007:matrix.org Alexander (Axler1) joined the room. 12:32:19



Room Version: 6