29 Nov 2024 |
raboof | interesting, it reproduces for me. could you diffoscope it and file an issue (https://github.com/NixOS/nixpkgs/issues/new?assignees=&labels=0.kind%3A+enhancement%2C6.topic%3A+reproducible+builds&projects=&template=unreproducible_package.md&title=)? | 15:45:33 |
p14 | How do I diffoscope it; how do I get my hands on the installed paths? I just did --keep-failed but from what I see this keeps the build directory but not the install directory? | 15:46:40 |
p14 | If I diff the rsync binary in the build directory against the installed one, they seem quite different, and the build directory's one hasn't been stripped | 15:47:19 |
raboof | the --keep-failed should keep something like /nix/store/x850848v3xl4wxjqzc3q9jp7j6fbkh27-rsync-3.3.0.check or so and tell you about it | 15:49:28 |
raboof | file $(nix-build '<nixpkgs>' -A rsync)/bin/rsync is also not stripped for me | 15:49:55 |
p14 | OK, nix build --rebuild is different from nix-build --check ; the latter reports that as you say. | 15:50:54 |
p14 | It's just the rsync binary which is differing, and it's differing in various virtual addresses leading to quite a large binary diff. | 15:53:45 |
raboof | ok, so nothing obvious in the 'readable' parts of the diffoscope output? | 15:55:16 |
raboof | sometimes 'strings' produces some hint? | 15:55:56 |
p14 | Filed https://github.com/NixOS/nixpkgs/issues/360152 -- apologies I didn't see the link was to an issue template | 15:59:25 |
raboof | thanks! nothing jumps out at me at first glance either | 16:02:47 |
raboof | * back to the original topic: I'm surprised specifying a -frandom-seed does seem to cause content-addressed rebuilds, but at the same time leaving it unspecified does not cause reproducibility issues. worth an experiment, though, of course. | 16:43:32 |
atemu12 | It depends on how you define the random seed, I guess. If you used $out to derive it, that'd obviously cause CA rebuilds | 16:51:52 |
raboof | what is it set to when you leave it unspecified? | 16:52:17 |
atemu12 | It's random IIRC | 16:54:02 |
raboof | then wouldn't that just-as-obviously cause reproducibility issues? | 16:54:23 |
atemu12 | Sure would | 16:55:08 |
p14 | It depends how or whether it is used, right? Clang for example doesn’t use it | 16:55:13 |
atemu12 | I fixed that in the kernel once | 16:55:19 |
p14 | I am unclear how it is used in gcc, is there information about that somewhere? At least for some standard builds of some software, removing it improves reproducibility by removing the outpath from affecting the build. | 16:56:57 |
raboof | https://reproducible-builds.org/docs/randomness/ mentions "Link-Time Optimizations" may have to do with it. if we figure out what exactly is going on it'd be good to add that to that page. | 17:00:04 |
p14 | Yeah, I think it is used in LTO somehow; is LTO used in nixpkgs? If not it could be a noop. And even if it was, it would be good to use a different value for it which does not depend on outpath. | 17:01:22 |
raboof | * if it'd be a noop then setting it to the output path shouldn't hurt, though | 17:02:01 |
raboof | but I agree it doesn't seem like a great choice | 17:02:26 |
Mindavi | It is at least used to generate the build-id, when separateDebugInfo is set you see that it differs | 17:09:36 |
Mindavi | When the input path is different and the output is expected to be the same, e.g. when setting an unused environment variable | 17:10:36 |
p14 | In reply to @raboof:matrix.org: "if it'd be a noop then setting it to the output path shouldn't hurt, though"
Except that it breaks CA derivations: if something varies in the inputs then the outPath varies, which means the compiler args now vary, which changes the bits in the output (e.g. because compiler args go into the binaries or the build ID) where otherwise you would have bitwise-identical outputs.
So downstream packages then need to be rebuilt where otherwise they could have used the previous CA output. | 17:14:45 |
raboof | aah, so the theory would be that the args are hashed into the output somewhere before being 'interpreted' - yeah I suppose that could be. | 17:19:18 |
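The CA-derivation argument in this thread can be sketched as a toy model. This is only an illustration under stated assumptions, not how Nix or GCC actually compute anything: `digest`, `input_addressed_out_path`, `toy_compile` and `content_address` are hypothetical names, the "compiler" is a stand-in that simply lets the seed leak into its output (as a build ID might), and the "store path" is a plain SHA-256 over the inputs.

```python
import hashlib


def digest(*parts: str) -> str:
    """Short hex digest standing in for a store hash (toy, not Nix's real scheme)."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:16]


def input_addressed_out_path(src: str, env: dict) -> str:
    # Assumption: the input-addressed path hashes every input, including
    # environment variables that the build never reads.
    env_items = [f"{k}={v}" for k, v in sorted(env.items())]
    return "/nix/store/" + digest(src, *env_items) + "-pkg"


def toy_compile(src: str, random_seed: str) -> str:
    # Assumption: the seed leaks into the artifact (e.g. via a build ID).
    return src.upper() + "|build-id=" + digest(random_seed)


def content_address(src: str, env: dict, seed_from_out_path: bool) -> str:
    out_path = input_addressed_out_path(src, env)
    # Either derive the seed from the out path, or from something that
    # does not depend on it (here, hypothetically, the source alone).
    seed = out_path if seed_from_out_path else digest(src)
    return digest(toy_compile(src, seed))


if __name__ == "__main__":
    src = "int main(void) { return 0; }"
    env_a = {"UNUSED_VAR": "1"}
    env_b = {"UNUSED_VAR": "2"}
    # Seed taken from the out path: do the two builds share a content address?
    print(content_address(src, env_a, True) == content_address(src, env_b, True))
    # Seed independent of the out path: do they share one now?
    print(content_address(src, env_a, False) == content_address(src, env_b, False))
```

In this model, changing an unused environment variable changes the out path, which changes the seed, which changes the artifact and therefore its content address. With a seed that does not depend on the out path, the two builds collapse to the same content address, so downstream packages could reuse the earlier output.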