!LemuOOvbWqRXodtSsw:nixos.org

NixOS Reproducible Builds

486 Members
Report: https://reproducible.nixos.org Project progress: https://github.com/orgs/NixOS/projects/30
108 Servers



Sender Message Time
12 Apr 2025
@wiryfuture:matrix.orgPhilip changed their profile picture.11:36:07
@guider-le-recit:matrix.orgguider-le-recitApologies for the delayed update on this. In essence, I modified the libpinyin source (ngram_bdb.cpp, chewing_large_table2_bdb.cpp, phrase_large_table3_bdb.cpp, punct_table_bdb.cpp) to add a DB->set_flags(handle, DB_TXN_NOT_DURABLE) call immediately after db_create() and before DB->open() for all database handles used during index generation.
However, the build then failed during make when running gen_binary_files. The build log showed: "BDB1566 DB_NOT_DURABLE interface requires an environment configured for the transaction subsystem"

Given that DB_TXN_NOT_DURABLE did not work, I reverted those changes and went back to testing the difference between access methods. I modified only ngram_bdb.cpp to change the bigram.db file type from DB_HASH to DB_BTREE in all its DB->open() calls.
The build completed this time, but the reproducibility check (--check) still failed. Running diffoscope showed that while bigram.db is now reported as a B-Tree file, it exhibits the exact same header difference pattern around offset 0x34 as all the other B-Tree files.

I did not want to try refactoring the code to use a DB_ENV, so I looked for any flags I had missed that could handle this. There aren't any, but I found that the problematic region starting at 0x34 corresponds to a 20-byte uid field within the common DBMETA structure
(https://github.com/zvelo/BerkeleyDB/blob/master/src/dbinc/db_page.h)
This uid is apparently an inherently non-deterministic unique file identifier generated by BDB during database creation, influenced by runtime factors potentially including ASLR. The BDB docs confirm there are no API flags controllable via DB->set_flags() on standalone handles to suppress or stabilize this uid generation.

So it seems that the non-reproducibility affecting all generated BDB files stems directly from this volatile uid field. At this point I am tired and not sure what to do next.
20:05:15
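One way to double-check the finding above is to confirm that two builds of the same .db file differ only inside the uid field. The following is a minimal Python sketch, assuming the offset (0x34) and length (20 bytes) reported in the message apply to the first metadata page of the file. The function names are hypothetical, and masking the field is meant for comparison only; it does not make the build itself reproducible.

```python
UID_OFFSET = 0x34  # start of the DBMETA uid field, as reported in the message above
UID_LENGTH = 20    # length of the uid field in bytes, as reported above


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Return every byte offset at which the two inputs differ."""
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # If the lengths differ, every trailing offset counts as a difference.
    offsets.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return offsets


def differs_only_in_uid(a: bytes, b: bytes) -> bool:
    """True if all differences fall inside the uid field.

    Identical inputs also return True, since there are no differences at all.
    """
    uid_range = range(UID_OFFSET, UID_OFFSET + UID_LENGTH)
    return all(off in uid_range for off in differing_offsets(a, b))


def mask_uid(data: bytes) -> bytes:
    """Return a copy with the uid field zeroed (for comparison only)."""
    end = UID_OFFSET + UID_LENGTH
    if len(data) < end:
        return data  # too short to contain the field; leave untouched
    return data[:UID_OFFSET] + bytes(UID_LENGTH) + data[end:]
```

To use it, read the two build outputs with open(path, "rb").read() and pass both byte strings to differs_only_in_uid. A False result would mean something besides the uid field differs between the builds.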
@sigmasquadron:matrix.orgFernando Rodrigues
In reply to @guider-le-recit:matrix.org
This is great! The next step would be to iteratively implement a deterministic way to generate the unique file identifier, which would mean patching BerkeleyDB.
But we don't need to worry about that right now. Since the deadline for Outreachy applications is coming up, please merge all of your research and your messages here into a GitHub issue, so you have a link to post on Outreachy.
20:10:08
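As an illustration of the "deterministic identifier" idea in the reply above, here is a minimal Python sketch. It assumes that a 20-byte identifier derived purely from a stable input (here, a logical file name) would be acceptable in place of the runtime-dependent one. This is not how BerkeleyDB generates the field, a real patch would have to live in BerkeleyDB's C code, and the function name is hypothetical. SHA-1 is used only because its digest happens to be exactly 20 bytes, not for any security property.

```python
import hashlib

DB_FILE_ID_LEN = 20  # length of the uid field, as reported in the chat


def deterministic_file_id(logical_name: str) -> bytes:
    """Derive a stable 20-byte identifier from a logical file name.

    The same name always yields the same bytes, regardless of process ID,
    timestamps, or memory layout. Two different files given the same
    logical name would collide, so the input must be chosen with care.
    """
    digest = hashlib.sha1(logical_name.encode("utf-8")).digest()
    assert len(digest) == DB_FILE_ID_LEN  # SHA-1 digests are 20 bytes
    return digest
```

Calling deterministic_file_id("bigram.db") twice returns identical bytes, which is the property a reproducible build would need. Whether an identifier that is no longer unique per file creation is safe for BerkeleyDB's own internal use of the field is an open question this sketch does not answer.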
@guider-le-recit:matrix.orgguider-le-recitThank you Fernando20:11:28
@guider-le-recit:matrix.orgguider-le-recitDo I make a new issue or post the message onto the original?20:11:50
@sigmasquadron:matrix.orgFernando RodriguesI think it's best to make a new issue, since this affects more than just libpinyin.20:12:49
@guider-le-recit:matrix.orgguider-le-recitOkay, thank you once more20:13:31
@emilazy:matrix.orgemilyfantastic great work!21:35:55
@emilazy:matrix.orgemilyFWIW, BerkeleyDB was abandoned by Oracle. I know there are various forks and API-compatible replacements21:36:53
@emilazy:matrix.orgemilymaybe one of them avoids this issue?21:36:54
@emilazy:matrix.orgemilyit also might be an option to move packages off BerkeleyDB to alternative backends like GNU dbm where supported: https://fedoraproject.org/wiki/User:Pkubat/Draft_-_Removing_BerkeleyDB_from_Fedora 21:37:48
@emilazy:matrix.orgemily for libpinyin,
libpinyin X GPLv3+ depends on KyotoCabinet since f24
21:41:20
@emilazy:matrix.orgemilythough I'm not sure if Kyoto Cabinet is maintained either 😆21:42:12
@emilazy:matrix.orgemilyah, https://dbmx.net/kyotocabinet/ points to https://dbmx.net/tkrzw/.21:42:35
@emilazy:matrix.orgemilybut https://github.com/libpinyin/libpinyin/blob/a6f4d3c239883b5e1dd0770ab2b433042845e9c9/configure.ac hardcodes only support for Berkeley DB and Kyoto Cabinet.21:43:04
@emilazy:matrix.orgemily the latest Kyoto Cabinet is still like three years newer than the latest Berkeley DB, and there's a good chance it doesn't have this specific reproducibility bug, so… it may be a good option for libpinyin :) 21:46:43
13 Apr 2025
@bot-wxt1221:matrix.orgBot_wxt1221 joined the room.13:32:05
@guider-le-recit:matrix.orgguider-le-recitHi miss Emily, you are absolutely correct. I edited the package.nix file, removed Berkeley DB from buildInputs and replaced it with kyotocabinet, and added configureFlags = [ "--with-dbm=KyotoCabinet" ];. The build now completes with no more derivation errors.13:32:30
@guider-le-recit:matrix.orgguider-le-recitThank you13:33:13
@emilazy:matrix.orgemilynice!13:33:45
@guider-le-recit:matrix.orgguider-le-recitI'm gonna make the GitHub issue now and send the link so you can push your solution13:33:55
@emilazy:matrix.orgemilysolving Berkeley DB reproducibility issues would still be valuable in general though, since it's likely there's software that doesn't support anything else :)13:34:18
@emilazy:matrix.orgemilyI know Debian and Fedora have been working on getting rid of it for years, but I don't know if they've fully achieved that13:35:08
@emilazy:matrix.orgemilydocumenting your great deep dive into the internals will definitely be valuable13:36:17
@guider-le-recit:matrix.orgguider-le-recitAre you aware of any links that I can read up on that?13:37:21
@guider-le-recit:matrix.orgguider-le-recitokay13:37:28
@emilazy:matrix.orgemilyhttps://fedoraproject.org/wiki/User:Pkubat/Draft_-_Removing_BerkeleyDB_from_Fedora is an old table from Fedora and https://lists.debian.org/debian-devel/2014/06/msg00338.html is an email from a decade-old mailing list thread in Debian talking about alternatives like LMDB13:39:34


