| 19 Sep 2024 |
Find me at aleksana:qaq.li | If it is, then we can try to get it to print a stack trace with debug info | 03:54:58 |
Find me at aleksana:qaq.li | No luck | 03:57:52 |
Find me at aleksana:qaq.li | In reply to @k900:0upti.me https://hydra.nixos.org/build/273031249/nixlog/6 If there's neither a stable reproduction of the segfault nor a stack trace, we're not even completely sure it's caused by a Nix update and not something else like gcc or glibc from a staging-next cycle? | 04:04:15 |
Find me at aleksana:qaq.li | x2 no repro | 04:07:38 |
Find me at aleksana:qaq.li | x3 | 04:10:55 |
Find me at aleksana:qaq.li | x4, well I don't think this is gonna work, maybe some stress tests on builtins.concatLists? | 04:16:35 |
puck | got it reproduced locally by pure chance; smells like a GC bug, likely to do with concatLists being passed a list of three lists? | 04:33:31 |
Find me at aleksana:qaq.li | Note that the function void EvalState::concatLists(Value & v, size_t nrLists, Value * * lists, const PosIdx pos, std::string_view errorCtx) was changed a bit after Nix 2.22, in commit https://github.com/NixOS/nix/commit/fecff520d7ce6598319862efc50c2dc6e1f6e9d9#diff-f118e4c6f6e02148b887fdf627352311fca5a3a4eadf0b4a9d9f348e0be464ffR1949 | 04:40:25 |
Find me at aleksana:qaq.li | And a mkList helper function was added | 04:41:08 |
Find me at aleksana:qaq.li | In reply to @puck:puck.moe got it reproduced locally by pure chance; smells like a GC bug, likely to do with concatLists being passed a list of three lists? Did you have a more minimal reproducer tho? | 04:43:19 |
puck | nope! just my local system config | 04:44:07 |
Lily Foster | For what it's worth, i'd been experiencing list corruption errors in CI as well since 2.23~2.24 (i'm not sure when exactly) almost daily in https://github.com/lilyinstarlight/foosteros before i finally gave up CI'ing against cppnix at all. I've no clue if it's related, but here if anyone wants to see an example of a CI run that failed with nonsensical list mis-evaluation weeks ago that then succeeded on a subsequent rerun (and this was happening constantly so i can provide multiple examples. i forget how varied the examples were): https://github.com/lilyinstarlight/foosteros/actions/runs/10822449565/job/30026445898#step:6:4654 | 04:48:12 |
Lily Foster | Actually it looks like that CI run specifically was 2.22.1 from the published release binary tarball (https://releases.nixos.org/nix/nix-2.22.1/nix-2.22.1-x86_64-linux.tar.xz): https://github.com/lilyinstarlight/foosteros/actions/runs/10822449565/job/30026445898#step:3:127 | 04:50:31 |
puck | oof, if that's the same issue i'm a bit worried about possible silent misevaluations | 04:51:56 |
Find me at aleksana:qaq.li | It looks like the memory has crossed the boundary but has not crossed the boundary to the outside of thread, just happens to read another string? | 04:53:25 |
Find me at aleksana:qaq.li | (I am not particularly familiar with this area) | 04:53:58 |
Lily Foster | (i'd originally tested my config against dev builds via https://github.com/nix-community/nix-unstable-installer to catch these bugs early, but eval was so constantly regressed on HEAD for long enough that i also disabled it too several months ago) | 04:53:59 |
Lily Foster | * (i'd originally tested my config against dev builds via https://github.com/nix-community/nix-unstable-installer to catch these bugs early before release, but eval was so constantly regressed on HEAD for long enough that i also disabled it too several months ago) | 04:54:13 |
Lily Foster | (but i might be able to dig up logs from those runs before that too if possible regression points against unreleased commit hash might be helpful) | 04:55:39 |
puck | In reply to @puck:puck.moe oof, if that's the same issue i'm a bit worried about possible silent misevaluations looks like in your case the value pointed to in the list passed to concatLists got replaced with another value; but in my case it seems the entire list's elems was replaced with a tApp Value | 04:58:40 |
Find me at aleksana:qaq.li | * It looks like the pointer has crossed the boundary but has not crossed the boundary to the outside of thread, just happens to read another string? | 05:11:08 |
K900 | So what I'm getting here is that no one fully understands the bug yet | 05:52:33 |
K900 | And it involves GC | 05:52:38 |
K900 | Which brings me back to my original question | 05:52:47 |
K900 | Do we revert again | 05:52:51 |
Find me at aleksana:qaq.li | How do we make sure that the bug definitely doesn't happen with nix 2.18 and newer libraries in tree | 06:02:37 |
K900 | Which libraries? | 06:07:03 |
Find me at aleksana:qaq.li | In reply to @k900:0upti.me Which libraries? compiler, libc, boehmgc, other stuff | 06:12:22 |
K900 | Compiler and libc have not been touched in a long time | 06:14:29 |
K900 | And nothing broke there | 06:14:33 |