| 21 Mar 2025 |
Vladimír Čunát | That was aarch64. But the last x86_64 stack-trace looks exactly the same. | 14:22:36 |
Vladimír Čunát | So I copied one of the x86_64 cores to a local machine with services.nixseparatedebuginfod.enable = true; and got
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7f73f68d96c0 (LWP 1994564))]
(gdb)
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f73f855f8f3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f73f850d576 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f73f84f5935 in __GI_abort () at abort.c:79
#4 0x00007f73f84f67f3 in __libc_message_impl (fmt=fmt@entry=0x7f73f8679ffc "%s") at ../sysdeps/posix/libc_fatal.c:134
#5 0x00007f73f85528e9 in __GI___libc_fatal (message=<optimized out>) at ../sysdeps/posix/libc_fatal.c:143
#6 0x00007f73f8566474 in unwind_cleanup (reason=<optimized out>, exc=<optimized out>) at unwind.c:114
#7 0x00007f73f8b2e78c in nix::unix::triggerInterrupt () at src/libutil/unix/signals.cc:94
#8 0x00007f73f8d5b8d5 in nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}::operator()() const (__closure=0x561807cd7598) at src/libutil/unix/monitor-fd.hh:55
#9 std::__invoke_impl<void, nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}>(std::__invoke_other, nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}&&) (__f=...)
at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
#10 std::__invoke<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}>(nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}&&) (__fn=...)
at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:96
#11 std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x561807cd7598)
at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:292
#12 std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> >::operator()() (this=0x561807cd7598)
at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:299
#13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> > >::_M_run() (this=0x561807cd7590)
at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:244
#14 0x00007f73f88bb6d3 in execute_native_thread_routine () from /nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libstdc++.so.6
#15 0x00007f73f855daf3 in start_thread (arg=<optimized out>) at pthread_create.c:447
#16 0x00007f73f85dcf4c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
| 14:47:20 |
Vladimír Čunát | So we continue by creating a nix ticket with nix devs taking the lead on this? | 14:48:05 |
hexa | and tag it with infra, as proposed | 14:48:27 |
hexa | do you want to file the issue? | 14:48:43 |
Vladimír Čunát | I can. (But I don't care who does it.) | 14:54:18 |
hexa | Please do 🙂 | 14:54:37 |
hexa | similar to https://github.com/NixOS/nix/issues/8946? | 15:00:31 |
Vladimír Čunát | Yes, I've been looking at that for some time now. | 15:08:42 |
Vladimír Čunát | The stack traces are almost the same, so let's not open a new one. | 15:08:56 |
Mic92 | vcunat: let me know when you create one. | 15:10:04 |
Mic92 | I will bring this up in the next meeting. | 15:10:18 |
Vladimír Čunát | I put the extra info into the existing issue and labeled it. Feel free to ask for more (I do know C debugging at least). | 15:12:48 |
Mic92 | This looks like exceptions. nix-daemon doesn't list anything, right? | 15:13:34 |
Mic92 | Ah. I think it's the "unreachable()" instance in this function | 15:14:22 |
Vladimír Čunát | You mean journalctl logs? | 15:14:23 |
hexa | hard to say, given that every build causes an "accepted connection" line 😄 | 15:14:33 |
hexa | Mar 21 15:13:47 elated-minsky nix-daemon[1647693]: FATAL: exception not rethrown
| 15:14:46 |
hexa | there are these intermittently | 15:14:50 |
Vladimír Čunát | Yes, timestamp fits exactly. | 15:17:24 |
Vladimír Čunát | (in another instance: coredumpctl crash stamp matches the log stamp with this line) | 15:18:20 |
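For context on the "FATAL: exception not rethrown" line: glibc prints this message from unwind_cleanup (frame #6 in the backtrace above) when the forced-unwind exception it uses for thread cancellation or pthread_exit is destroyed instead of being propagated, typically because a catch (...) block swallowed it. The sketch below is a generic illustration of the usual guard on GCC/libstdc++ with glibc. It is not Nix's code and not necessarily the fix discussed in this thread; worker and cancelWorkerCleanly are hypothetical names introduced only for this example.

```cpp
#include <cxxabi.h>   // abi::__forced_unwind (GCC/libstdc++ specific)
#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <cassert>

// Set when the worker observes the forced unwind caused by cancellation.
static std::atomic<bool> sawForcedUnwind{false};

// Hypothetical worker: blocks in a cancellation point inside a try block.
static void *worker(void *) {
    try {
        for (;;) {
            sleep(1);  // sleep() is a POSIX cancellation point
        }
    } catch (abi::__forced_unwind &) {
        // Cancellation unwinds the stack like an exception. It MUST be
        // rethrown; swallowing it makes glibc abort with
        // "FATAL: exception not rethrown".
        sawForcedUnwind = true;
        throw;
    } catch (...) {
        // Ordinary C++ exceptions may be handled (or swallowed) here.
    }
    return nullptr;
}

// Hypothetical helper: start the worker, cancel it, and report whether it
// terminated through cancellation rather than through abort().
bool cancelWorkerCleanly() {
    pthread_t thread;
    int rc = pthread_create(&thread, nullptr, worker, nullptr);
    assert(rc == 0);
    (void)rc;

    usleep(100 * 1000);  // give the worker time to block in sleep()
    if (pthread_cancel(thread) != 0) return false;

    void *result = nullptr;
    if (pthread_join(thread, &result) != 0) return false;
    return result == PTHREAD_CANCELED && sawForcedUnwind.load();
}
```

Calling cancelWorkerCleanly() should return true on such a toolchain. Removing the first catch clause, so that only catch (...) remains, is the kind of change that would be expected to reproduce the abort seen in the journal.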
Mic92 | I am inclined to backport https://github.com/NixOS/nix/pull/12636/files by cherry-picking it into the Nix version that we use in NixOS infra, to get more insight | 15:18:44 |
Mic92 | Probably not needed for this bug, but useful in general | 15:22:48 |
Vladimír Čunát | Switching versions shouldn't be a problem either, as long as it isn't too unstable, if that makes things easier. | 15:22:51 |
Vladimír Čunát | We generally just use the defaults from stable NixOS. | 15:23:45 |
hexa | ok, so bump to 2.26 and try to repro? | 15:27:29 |
Mic92 | Probably yes. The current stack trace doesn't contain the actual exception, which comes from somewhere else. | 15:36:52 |
Mic92 | Ah, and I also forgot about: https://github.com/picnoir/nix/commit/0edc530f18a9cea2377aa3380d34bc37076ebb99 | 15:43:57 |
Mic92 | hexa (signing key rotation when): ^ I think this might be the actual fix. | 15:48:15 |
Mic92 | Will get this merged into Nix ASAP. | 15:52:10 |