| 21 Oct 2025 |
Julian | Hmm, considering all the past CI issues, I think I spotted one point in the CI logs where the issue starts happening, as it shows the additional unpacking [...] line that later is not printed anymore. Though I cannot confirm that this is the only place where the issue starts to appear. This code snippet in CI executes nix flake metadata . --refresh. It runs in a checkout without submodules, and I suppose without the empty git submodule directories.
Do you think this could populate the cache with an invalid entry?
| 14:24:42 |
Julian | Thank you for these thoughts and hints. I guess I at least figured out the root cause now with this. | 14:25:52 |
Sergei Zimmerman (xokdvium) | In reply to @juliankuners:matrix.org
[…] Do you think this could populate the cache with an invalid entry?
Hm, a local checkout might do that, yeah. The git fetcher might consider it cacheable and not take the empty submodule dirs into account | 14:27:23 |
Julian | As I am thinking about ways to work around this, I feel like all workarounds are very hacky. Is the wrong caching behaviour that I observed covered by the above-mentioned PR? Or would it be an ongoing issue without a clear solution, because submodules are involved? Just wondering whether there's a feasible fix in the meantime or whether it's a bigger-scale ongoing issue. | 14:37:45 |
Sergei Zimmerman (xokdvium) | In reply to @juliankuners:matrix.org As I am thinking about ways to work around this […] Reasonably, we could just avoid caching the fetchToStore entry if submodules are present in the local checkout. A more clever solution would be to filter out the empty subdirs for submodules when fetching to the store from a local checkout. The first solution seems more immediate and backportable. Submodules + caching is definitely an ongoing issue | 14:45:20 |
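A minimal sketch of the first workaround described above: skip the fetchToStore cache entry entirely when the local checkout has submodules, so their empty directories cannot poison the cached store path. All names here (Checkout, copyToStore, fetchCache) are hypothetical stand-ins, not Nix's actual internals:

```cpp
#include <map>
#include <string>

// Hypothetical model of a local checkout; not Nix's actual types.
struct Checkout {
    std::string cacheKey; // e.g. derived from the git revision
    bool hasSubmodules;   // whether .gitmodules lists any submodules
};

static std::map<std::string, std::string> fetchCache; // cache key -> store path

std::string copyToStore(const Checkout & c); // stand-in for the real copy

std::string fetchToStore(const Checkout & c)
{
    // Checkouts with submodules are treated as non-cacheable: whether
    // the submodule dirs are empty or populated depends on local state,
    // so the resulting store path is not a pure function of the rev.
    if (!c.hasSubmodules) {
        auto it = fetchCache.find(c.cacheKey);
        if (it != fetchCache.end())
            return it->second; // cache hit
    }

    auto storePath = copyToStore(c);

    if (!c.hasSubmodules)
        fetchCache.emplace(c.cacheKey, storePath);

    return storePath;
}
```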
Julian | Re-looking at the CI logs, it's probably not immediately the checkout problem, as the mis-cached repository is another repository and not the checkout (which is also not a submodule or anything) :/. Sorry for the confusion. | 14:48:26 |
Julian | For clarification: the repositories have submodules, but none are used for nix flakes and derivations. | 14:49:13 |
Sergei Zimmerman (xokdvium) | I think I understand what’s going on. Will poke around soonish and see | 15:13:25 |
Julian | Yeah, me somewhat too. It actually is the checkout issue, but it's caused by a checkout of the mis-cached repository in another CI job on the same CI machine, see here. The timestamp of the CI job and the cache entry in the sqlite database match exactly, as does the wrong nix store path 72alw9m62226brv4v4m98fqrk31mlp34. | 15:21:07 |
Julian | I created a respective issue, highlighting what I suppose is the actual root cause, all things considered: https://github.com/NixOS/nix/issues/14317 | 17:03:51 |
dramforever | In reply to @juliankuners:matrix.org I created a respective issue […] oh yeah that's a duplicate of https://github.com/NixOS/nix/issues/13698 | 17:13:40 |
dramforever | guess i was a few hours late to the discussion | 17:15:47 |
Julian | Thank you, I missed that one during my issue search. I closed mine and linked to it from the respective other issue. | 17:19:57 |
Eelco | learned today that using std::string for large buffers is very inefficient (huge kernel overhead): https://github.com/DeterminateSystems/nix-src/pull/238/commits/edf45d6e1158a059e7ded5460d2a3947dc35fdf8 | 20:44:48 |
Sergei Zimmerman (xokdvium) | This is sort of more about doing unnecessary construction/destruction of an object. Not special to std::string in any way | 20:55:40 |
Eelco | it's a result of having a large contiguous allocation (so it would also affect std::vector<char>) | 20:56:21 |
Sergei Zimmerman (xokdvium) | Yeah, it's much better to allocate once and reuse that allocation as you did there | 20:56:45 |
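The pattern under discussion, as a minimal sketch (illustrative only, not the code from the linked commit; consume is a stand-in for the real per-chunk work). Hoisting the buffer out of the loop means the allocation, and the page faults from first touching that much memory, happen once instead of on every iteration:

```cpp
#include <string>

void consume(const std::string & buf); // stand-in for the real work

void perIteration(int n)
{
    for (int i = 0; i < n; ++i) {
        // 64 MiB constructed and destroyed every pass; allocations this
        // large typically go straight to mmap/munmap, so each round trip
        // returns the pages to the kernel and faults them back in.
        std::string buf(64 * 1024 * 1024, '\0');
        consume(buf);
    }
}

void allocateOnce(int n)
{
    std::string buf(64 * 1024 * 1024, '\0'); // one allocation, pages stay mapped
    for (int i = 0; i < n; ++i)
        consume(buf); // reuse the same backing storage every iteration
}
```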
Eelco | a data type that consists of a vector of buffers would avoid this problem | 20:57:06 |
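A sketch of that kind of data type, assuming nothing about Nix's own code: a chunked buffer that appends into fixed-size blocks, so no single huge contiguous allocation is ever made and existing bytes are never copied on growth:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// A chunked byte buffer: appending allocates at most one fixed-size
// block at a time and never relocates data already written.
class ChunkedBuffer {
    static constexpr std::size_t chunkSize = 64 * 1024;
    std::vector<std::unique_ptr<char[]>> chunks;
    std::size_t used = 0; // bytes used in the last chunk

public:
    void append(const char * data, std::size_t len)
    {
        while (len > 0) {
            if (chunks.empty() || used == chunkSize) {
                chunks.push_back(std::make_unique<char[]>(chunkSize));
                used = 0;
            }
            std::size_t n = std::min(len, chunkSize - used);
            std::copy(data, data + n, chunks.back().get() + used);
            used += n;
            data += n;
            len -= n;
        }
    }

    std::size_t size() const
    {
        return chunks.empty() ? 0 : (chunks.size() - 1) * chunkSize + used;
    }
};
```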
Sergei Zimmerman (xokdvium) | I did a similar cleanup in filetransfer.cc at some point | 20:57:07 |
Sergei Zimmerman (xokdvium) | You'd still benefit from allocating it only once though | 20:57:23 |
Sergei Zimmerman (xokdvium) | Going through malloc/free is still a function call through the PLT, and doing that in a loop is kind of expensive anyway | 20:58:05 |
Sergei Zimmerman (xokdvium) | https://llvm.org/docs/ProgrammersManual.html#vector | 20:58:58 |
Eelco | it's funny though that the optimization that skipped parsing 15 GB of NARs (https://github.com/DeterminateSystems/nix-src/pull/238/commits/1f8d587a0df8f9de366640831dade43d17021c30) had basically no observable effect | 21:01:55 |
Eelco | it's completely dwarfed by the memory allocation / page fault overhead | 21:02:07 |
Sergei Zimmerman (xokdvium) | Yeah, that will certainly do it. Also, having too large a buffer on the stack is bad:
https://github.com/NixOS/nix/pull/13877 | 21:03:27 |
Sergei Zimmerman (xokdvium) | The stack pointer does get decremented one page at a time. Is that the default behavior or some hardening flag? | 21:04:10 |
Eelco | surprising since stack pages should stay around once paged in | 21:07:22 |
Eelco | though there is some overhead to handle guard pages | 21:07:42 |
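The page-at-a-time decrement likely comes from stack probing, e.g. GCC/Clang's -fstack-clash-protection hardening flag (enabled by default on several distros): for frames larger than a page, the compiler touches the stack page by page so a single large decrement cannot jump over the guard page. A sketch of the kind of frame that would trigger it (illustrative, not the code from the PR):

```cpp
#include <cstddef>
#include <cstring>

// With -fstack-clash-protection, a frame this large makes the compiler
// probe (touch) each page as the stack pointer moves down, so the
// allocation cannot skip past the guard page in one jump.
std::size_t checksum(const char * data, std::size_t len)
{
    char buf[256 * 1024]; // 256 KiB on the stack: well past one page
    std::size_t n = len < sizeof(buf) ? len : sizeof(buf);
    std::memcpy(buf, data, n);

    std::size_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<unsigned char>(buf[i]);
    return sum;
}
```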