NixOS Reproducible Builds - Public Room Timeline

	NixOS Reproducible Builds (485 members, 108 servers)
	Report: https://reproducible.nixos.org
	Project progress: https://github.com/orgs/NixOS/projects/30


Sender	Message	Time
1 Apr 2025
	Adam Neverwas set a profile picture.	23:15:40
2 Apr 2025
raboof	any additional eyes on figuring out what's going on with jemalloc in https://github.com/NixOS/nixpkgs/pull/393724 would be welcome - this is the last remaining known issue for the minimal iso and needs to go through staging...	08:18:42
Pol	The diff LGTM and makes sense. Building stuff straight from the repository, not using artifacts (reminds me of something... xz!)	08:24:11
raboof	notably someone with C experience who can judge whether it'd be safer to over-estimate or under-estimate the LG_VADDR value in https://github.com/jemalloc/jemalloc/blob/dev/include/jemalloc/internal/rtree.h would be a great help :)	09:51:45
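A simplified model may help frame the over/under-estimate question. It assumes, rather than reproduces, that the radix-tree key is built only from the address bits below LG_VADDR. The function names and the 4 KiB page size are hypothetical, and this is not jemalloc's code. In this model, under-estimating LG_VADDR lets two distinct addresses map to the same key, while over-estimating only keeps extra high bits that are always zero:

```python
LG_PAGE = 12  # assumption: 4 KiB pages


def rtree_key(addr: int, lg_vaddr: int, lg_page: int = LG_PAGE) -> int:
    """Hypothetical toy key: the page number, truncated to lg_vaddr address bits."""
    key_bits = lg_vaddr - lg_page
    return (addr >> lg_page) & ((1 << key_bits) - 1)


def keys_collide(a: int, b: int, lg_vaddr: int) -> bool:
    """True if two different addresses end up with the same toy key."""
    return a != b and rtree_key(a, lg_vaddr) == rtree_key(b, lg_vaddr)


# 'high' only fits in a larger (e.g. 57-bit) address space because bit 50 is set.
low = 0x7F00_0000_1000
high = low | (1 << 50)

print(keys_collide(low, high, lg_vaddr=48))  # True: bit 50 is masked away
print(keys_collide(low, high, lg_vaddr=57))  # False: bit 50 is kept in the key
```

In a real radix tree, the cost of over-estimating would presumably be extra tree levels or memory rather than wrong lookups, but that trade-off is an assumption here and has not been checked against rtree.h.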
Pol	Packaging this library is a nightmare... I thought the Apache Foundation would be more sensitive to reproducibility... but nope. https://github.com/apache/orc The build process downloads stuff from the Internet, and it seems very hard to get around that.	14:37:01
Pol	Got it building in the end.	16:35:37
	rntpts joined the room.	19:13:16
3 Apr 2025
	diamond (it/its) changed their display name from Diamond (it/she) to diamond (it/its).	01:07:09
emily	raboof: I think you broke `jemalloc`: `#define JEMALLOC_VERSION "0.0.0-0-g000000missing_version_try_git_fetch_tags" #define JEMALLOC_VERSION_MAJOR 0 #define JEMALLOC_VERSION_MINOR 0 #define JEMALLOC_VERSION_BUGFIX 0 #define JEMALLOC_VERSION_NREV 0 #define JEMALLOC_VERSION_GID "000000missin" #define JEMALLOC_VERSION_GID_IDENT 000000missin`	02:13:16
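The defines in that header all appear to be derived from one `git describe`-style string of the form `MAJOR.MINOR.BUGFIX-NREV-gGID`. When no real version is available, the all-zero placeholder is used instead. A hypothetical sanity check a packager could run on the version string follows. The helper names are made up for illustration, and this check is not part of jemalloc's build:

```python
import re

# Marker that appears in the placeholder version string quoted above.
PLACEHOLDER_MARKER = "missing_version_try_git_fetch_tags"

# MAJOR.MINOR.BUGFIX-NREV-gGID, e.g. "0.0.0-0-g000000missing_version_try_git_fetch_tags".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-g(.+)$")


def parse_version(version: str) -> dict:
    """Split a describe-style version string into numeric parts and the git id."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unexpected version format: {version!r}")
    major, minor, bugfix, nrev = (int(x) for x in m.group(1, 2, 3, 4))
    return {"major": major, "minor": minor, "bugfix": bugfix,
            "nrev": nrev, "gid": m.group(5)}


def check_version(version: str) -> dict:
    """Fail loudly if the build fell back to the placeholder version."""
    if PLACEHOLDER_MARKER in version:
        raise ValueError("placeholder version: the build saw no usable version info")
    return parse_version(version)
```

A check like this would have flagged the broken header at build time rather than after the fact.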
Fernando Rodrigues	reproducibly broken is still reproducible :)	02:13:47
Pol	That's the spirit :D	07:24:20
raboof	'the most secure code is code that doesn't run'? 🦆	07:41:13
raboof	(fixing)	07:41:15
5 Apr 2025
	guider-le-recit joined the room.	12:03:56
	@tinybronca:sibnsk.net removed their display name underpantsgnome.	15:52:40
	@tinybronca:sibnsk.net left the room.	15:53:18
6 Apr 2025
guider-le-recit	Hello everyone, my name is Keanu, an Outreachy intern hoping to work on Nix if it's not too late. I've gone through the nix.dev documentation and the Nixpkgs reference manual, and started looking into the reproducibility issue for libpinyin (#393693). After tracing the code through the DB files, I thought the problem was in MemoryChunk, but it seems to handle its own memory initialization properly. My question is: is the nondeterminism coming from potentially uninitialized padding within the C++ structs, or is there something else to account for? I'm thankful for any direction.	01:12:36
Fernando Rodrigues	Hi Keanu, welcome to Nixpkgs! `libpinyin`'s issues don't look like they come from c++ structs, but from the generation of the pinyin indexes. You're probably going to have to dig into how those indexes are generated by libpinyin.	01:18:31
Fernando Rodrigues	In this case, it seems like there is already an issue upstream with more information.	01:19:24
Fernando Rodrigues	* Hi Keanu, welcome to Nixpkgs! `libpinyin`'s issues don't come from runtime errors like memory allocation or uninitialised structs, but from the generation of the pinyin indexes. You're probably going to have to dig into how those indexes are generated by libpinyin.	01:20:10
guider-le-recit	Thanks for pointing me in the right direction, Fernando. My mistake: I initially thought the issue was struct padding because of the binary differences shown in diffoscope. After looking at the upstream issue and examining the diffoscope output more carefully, I noticed the differences are in database files. I found the gen_binary_files utility in the build logs, which seems responsible for generating these files. How does this utility process and sort the input data before writing it to the database?	03:40:43
Fernando Rodrigues	Good question. That's what we (and apparently also upstream) need you to explore. Dig into the utility's source code, and try to learn its inner workings. If you find something promising, collect your findings and share them in the issue.	03:50:44
guider-le-recit	That sounds like fun, thank you Fernando! I will start exploring and hopefully get back to you soon.	03:58:37
7 Apr 2025
guider-le-recit	Hi Fernando, I was wrong, this wasn't fun at all. I've analyzed the load_text methods for the B-Tree tables (pinyin_index, phrase_index, etc.), and they seem to process input sequentially without obvious sources of nondeterministic insertion order. I also traced bigram.db generation to import_interpolation using the Bigram class, confirming it uses DB_HASH. While the Bigram::store method looks deterministic given its SingleGram input, could the non-reproducibility of bigram.db stem from Berkeley DB's default DB_HASH implementation itself being sensitive to the build environment? Is this a known pattern, and are there ways to make BDB hash generation reproducible?	14:19:32
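A toy open-addressing table illustrates one way a hash file's bytes can differ while its logical contents match: when keys collide, the slot each key lands in depends on insertion order. This is deliberately not Berkeley DB's DB_HASH, whose on-disk format is not modeled here. The hash function and serialization are made up for illustration:

```python
def bucket_of(key: bytes, nbuckets: int) -> int:
    # Deliberately weak toy hash: keys with the same byte sum collide.
    return sum(key) % nbuckets


def build_table(items, nbuckets: int = 8):
    """Insert (key, value) pairs with linear probing; assumes unique keys, len(items) < nbuckets."""
    slots = [None] * nbuckets
    for key, value in items:
        i = bucket_of(key, nbuckets)
        while slots[i] is not None:  # probe forward to the next free slot
            i = (i + 1) % nbuckets
        slots[i] = (key, value)
    return slots


def serialize(slots) -> bytes:
    """Toy on-disk form: one record per slot, 0x00 for an empty slot."""
    out = bytearray()
    for slot in slots:
        if slot is None:
            out += b"\x00"
        else:
            key, value = slot
            out += bytes([len(key)]) + key + bytes([len(value)]) + value
    return bytes(out)


def contents(slots) -> dict:
    """Logical view: the key/value pairs, independent of slot positions."""
    return dict(slot for slot in slots if slot is not None)


a = build_table([(b"ab", b"1"), (b"ba", b"2")])  # b"ab" and b"ba" collide
b = build_table([(b"ba", b"2"), (b"ab", b"1")])  # same pairs, reversed order
print(serialize(a) == serialize(b))  # False: slot order differs
print(contents(a) == contents(b))    # True: same records
```

Since the load_text paths were reported to be sequential, insertion order may well not be the culprit here. The sketch only shows that a byte-level difference need not mean different records.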
raboof	I don't know, but I do remember diffoscope has support for Berkeley DB files, so that is at least a promising sign that people who care about reproducibility have been looking at those :)	15:20:48
raboof	though "Format-specific differences are supported for Berkeley DB database files but no file-specific differences were detected" in this case so not very helpful :)	15:37:32
guider-le-recit	The smiley faces help	15:57:06
guider-le-recit	So instead we are dealing with BDB's internal metadata, not application-level logic, right?	15:57:30
raboof	kinda looks like, yeah	15:58:12
raboof	seems like there's consistently a difference at offset 0x34 in the file - might be interesting to see if we can figure out what's written there - I bet some sort of timestamp? - and then where it comes from	15:59:13
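To chase the 0x34 difference, a small script can list every differing byte range between two builds of the same file and dump the bytes at that offset. The 20-byte window at 0x34 is an assumption. If the Berkeley DB version in use follows the generic metadata-page layout in its `db_page.h` (an 8-byte LSN followed by a run of fixed-size header fields), 0x34 is where the 20-byte `uid` file-ID field would begin. That layout should be checked against the actual BDB version before drawing conclusions:

```python
import sys

# Assumption: generic BDB metadata page with a 20-byte uid field at 0x34.
UID_OFFSET = 0x34
UID_LEN = 20


def differing_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) byte ranges where a and b differ."""
    ranges = []
    start = None
    n = max(len(a), len(b))
    for i in range(n):
        # Bytes past the end of the shorter input count as different.
        differs = i >= len(a) or i >= len(b) or a[i] != b[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges


def uid_field(data: bytes) -> bytes:
    """Bytes at the assumed uid location of the metadata page."""
    return data[UID_OFFSET:UID_OFFSET + UID_LEN]


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f:
        first = f.read()
    with open(sys.argv[2], "rb") as f:
        second = f.read()
    for start, end in differing_ranges(first, second):
        print(f"differ: 0x{start:x}-0x{end:x}")
    print("uid A:", uid_field(first).hex())
    print("uid B:", uid_field(second).hex())
```

Saved under a hypothetical name such as `bdbdiff.py`, it could be run as `python3 bdbdiff.py build1/bigram.db build2/bigram.db`. If the only differing range sits inside that 20-byte window, that would point at a per-file identifier rather than the stored records.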


Room Version: 6