19 Sep 2024 |
K900 | The actual revert to 2.18 fallback-paths is in the next one | 14:08:26 |
Lily Foster | In reply to @lily:lily.flowers the one i linked above was github:lilyinstarlight/foosteros#checks.x86_64-linux.host-bina with nix 2.22.1. i thought it was a newer version though, that specific one may have been fixed by now. i'm still dubious about it even being the same issue regardless (even if they both look like gc unsoundness bugs) | 13:52:19 |
K900 | Filed a proper bug upstream so we can investigate async: https://github.com/NixOS/nix/issues/11547 | 14:18:43 |
K900 | Started new eval on unstable-small with the revert | 14:20:29 |
K900 | Will monitor it and then do unstable-large | 14:20:35 |
emily | (well, a config miseval is a config miseval right?) | 13:52:53 |
K900 | And I'm going to need someone with SSH access to mask the channel update timer on pluto so we don't get the current unstable-large eval out | 14:21:04 |
Lily Foster | In reply to @emilazy:matrix.org (well, a config miseval is a config miseval right?) (fair. but i misremembered version number and it being older nix version does mean that specific one may have been fixed, is my point. so by all means test it, but it may not appear with 2.24) | 13:54:12 |
K900 | unstable-small eval'd, bumped | 14:23:56 |
emily | oh sorry | 14:08:33 |
K900 | unstable-large eval starting | 14:24:00 |
emily | I missed that d6e34a6ef1e3a57e8302ad2a1b741615a2558e3f has it | 14:08:35 |
puck | been bisecting this, using GC_FREE_SPACE_DIVISOR=69 GC_ENABLE_INCREMENTAL=1 to make it reproducible to debug | 15:04:06 |
puck | * been bisecting this, using GC_FREE_SPACE_DIVISOR=69 GC_ENABLE_INCREMENTAL=1 to make it mostly reproducible-ish to debug | 15:04:14 |
K900 | unstable-small advanced | 15:17:47 |
puck | In reply to @puck:puck.moe been bisecting this, using GC_FREE_SPACE_DIVISOR=69 GC_ENABLE_INCREMENTAL=1 to make it mostly reproducible-ish to debug down to somewhere in march, in 2.22, it seems? | 15:24:45 |
aleksana (force me to bed after 18:00 UTC) | In reply to @puck:puck.moe down to somewhere in march, in 2.22, it seems? You are already reproducing it? | 15:25:36 |
puck | (i bisected 2.18 to now; doing some careful work to not accidentally bisect into unrelated issues) | 15:25:39 |
puck | In reply to @aleksana:mozilla.org You are already reproducing it? of course, this isn't the first GC bug i'm debugging ^_^ | 15:26:13 |
puck | * of course, this isn't the first GC bug i've debugged ^_^ | 15:26:18 |
aleksana (force me to bed after 18:00 UTC) | Then there weren't a lot of modifications to libexpr around that time | 15:29:14 |
puck | bisecting this is taking a lot longer than i expected, there's a lot of build issues around where git bisect brought us | 15:36:22 |
puck | okay so i definitely made a bisect error because it brought me to the wrong point, but around the right timestamp -- manually checking the most suspicious commit got me to 10/10 successful evals before the commit adding the ListBuilder helper and only 8/10 successful evals after that commit; and that commit introduces what i think is a value that the GC could, in very specific cases, end up reclaiming while still in play | 16:35:07 |
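Editor's sketch of the "N out of 10 successful evals" check described above. It is a minimal illustration, not puck's actual tooling: the helper name `count_successes` is hypothetical, the command to run is left to the caller, and it assumes a failed eval shows up as a non-zero exit status (a silent miseval would need an output comparison instead). The two environment variables are the ones quoted in the chat.

```python
import os
import subprocess

# GC settings quoted in the chat; they make the collector run far more often.
GC_STRESS_ENV = {"GC_FREE_SPACE_DIVISOR": "69", "GC_ENABLE_INCREMENTAL": "1"}


def count_successes(cmd, runs=10, extra_env=None):
    """Run `cmd` `runs` times and return how many runs exited with status 0.

    Hypothetical helper: `cmd` is an argument list such as a nix-instantiate
    invocation; `extra_env` is merged over the current environment.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    ok = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            ok += 1
    return ok
```

Run once per candidate commit with the same command and `extra_env=GC_STRESS_ENV`, then compare the counts. Because the failure is only "mostly reproducible-ish", a drop such as 10/10 to 8/10 is a hint rather than proof, and more runs give a clearer signal.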
fricklerhandwerk | Thanks a lot puck for doing the archeology | 17:05:10 |
puck | I tried running an even more aggressive GC patch, and hit some other issues (nix-instantiate doesn't GC root the entire EvalState) | 17:05:26 |
Robert Hensing (roberth) | std::map<Symbol, Item> attrsSeen; looks very suspect. A normal map is not in the GC heap, so the pointers to the allocated elems>2 regions won't be marked | 17:15:11 |
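Editor's sketch of the failure mode described above, as a toy mark-and-sweep model. It is not Boehm GC and not Nix code: `ToyHeap`, `demo`, and `untraced_map` are hypothetical names. The only assumption carried over is that a tracing collector marks just what it can reach from memory it scans, so a pointer held solely in an ordinary, non-GC container keeps nothing alive.

```python
class ToyHeap:
    """Toy mark-and-sweep heap: object id -> ids of the objects it points to."""

    def __init__(self):
        self.objects = {}
        self.next_id = 0

    def alloc(self, refs=()):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = list(refs)
        return oid

    def collect(self, scanned_roots):
        """Mark everything reachable from scanned_roots, free the rest."""
        marked = set()
        stack = list(scanned_roots)
        while stack:
            oid = stack.pop()
            if oid in marked or oid not in self.objects:
                continue
            marked.add(oid)
            stack.extend(self.objects[oid])
        freed = [oid for oid in self.objects if oid not in marked]
        for oid in freed:
            del self.objects[oid]
        return freed


def demo():
    heap = ToyHeap()
    elems = heap.alloc()              # element storage
    lst = heap.alloc(refs=[elems])    # list value pointing at that storage
    hidden = heap.alloc()             # storage referenced only from the map
    scanned_roots = [lst]             # memory the collector actually scans
    untraced_map = {"attr": hidden}   # stands in for a plain, unscanned map
    freed = heap.collect(scanned_roots)
    return heap, untraced_map, freed
```

After `demo()`, the id stored in `untraced_map` refers to storage the collector has already freed, which is the shape of a use-after-free. In the real program the outcome also depends on whether a stray copy of the pointer happens to sit on the stack or in a register when the collection runs, which fits the intermittent reproduction reported earlier.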
emily | In reply to @puck:puck.moe I tried running an even more aggressive GC patch, and hit some other issues (nix-instantiate doesn't GC root the entire EvalState) maybe it would be a good idea to test with aggressive GC settings in CI? given this and ^ | 17:22:46 |
emily | seems like NixOS configs are good at surfacing GC bugs (probably just because they do a lot of complex eval) | 17:23:04 |
puck | In reply to @puck:puck.moe I tried running an even more aggressive GC patch, and hit some other issues (nix-instantiate doesn't GC root the entire EvalState) there's probably more GC issues around but i don't really know how to test these properly; it's a pretty complex ratio between running a GC as often as possible and just the amount of time the GC ends up churning (especially exponentially) | 18:10:29 |
puck | a bunch of these are hard to test because of the way uses-after-free present themselves; and the boehm allocator's quirks even, i suspect | 18:13:17 |