!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

391 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
121 Servers


Sender Message Time
21 Mar 2025
@hexa:lossy.networkhexa* so what is the best way to trawl through these many coredumps and group them?13:58:50
@vcunat:matrix.orgVladimír Čunát

The backtraces look all the same to me at a quick glance.

for pid in $(cat /tmp/core-pids); do echo bt | coredumpctl gdb "$pid" | sed '1,/Program terminated .*/d'; done
14:16:20
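
To go from "look all the same at a quick glance" to an actual grouping, one option is to reduce each backtrace to its list of function names and count identical lists. The following is a minimal sketch under assumptions: `bt_signature` and `group_backtraces` are hypothetical helper names, each input file is assumed to hold the gdb output for one core, and `cksum` is used only as a cheap fingerprint.

```shell
# Reduce a gdb backtrace read from stdin to one function name per frame.
# Addresses, argument lists and library paths are dropped, so two crashes
# with the same call chain but different load addresses compare equal.
bt_signature() {
  # Keep only frame lines ("#N ..."), stripping an optional "(gdb) " prompt,
  # the frame number and the optional "0x... in " address prefix.
  sed -n -E 's/^(\(gdb\) )?#[0-9]+ +(0x[0-9a-f]+ in )?//p' \
    | sed -E 's/ \(.*$//; s/ from .*$//'
}

# Print "COUNT FINGERPRINT" for each distinct signature, most frequent first.
# Arguments: files that each contain one saved backtrace (hypothetical layout).
group_backtraces() {
  for f in "$@"; do
    bt_signature < "$f" | cksum | cut -d' ' -f1
  done | sort | uniq -c | sort -rn
}
```

Assuming each backtrace was first saved to its own file, for example with `echo bt | coredumpctl gdb "$pid" > "bt.$pid"` inside the loop above, `group_backtraces bt.*` would list how many cores share each call chain. Traces taken with and without debug info contain different frames, so they would land in separate groups.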
@vcunat:matrix.orgVladimír Čunát

Random one:

warning: core file may not match specified executable file.
#0  0x0000ffffa27ca5b4 in __pthread_kill_implementation () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
[Current thread is 1 (Thread 0xffffa059e440 (LWP 2803641))]
(gdb) #0  0x0000ffffa27ca5b4 in __pthread_kill_implementation () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#1  0x0000ffffa277b15c in raise () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#2  0x0000ffffa2765a00 in abort () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#3  0x0000ffffa27bca78 in __libc_message_impl () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#4  0x0000ffffa27bcae8 in __libc_fatal () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#5  0x0000ffffa27d1f64 in unwind_cleanup () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#6  0x0000ffffa2dd8970 in nix::unix::triggerInterrupt() () from /nix/store/qdcg0g77rbj8bipd8xy3k6kw32yh17vr-nix-2.24.12/lib/libnixutil.so
#7  0x0000ffffa2f62490 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> > >::_M_run() ()
   from /nix/store/qdcg0g77rbj8bipd8xy3k6kw32yh17vr-nix-2.24.12/lib/libnixstore.so
#8  0x0000ffffa2add33c in execute_native_thread_routine () from /nix/store/9df8irigdgxl3cnyfwir2xw4fs2q9my7-gcc-13.3.0-lib/lib/libstdc++.so.6
#9  0x0000ffffa27c87cc in start_thread () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
#10 0x0000ffffa2834e4c in thread_start () from /nix/store/g4agqy9fxhnzmq31idkhp2kqk4sgp3i0-glibc-2.40-66/lib/libc.so.6
14:16:47
@vcunat:matrix.orgVladimír ČunátThat was aarch64. But the last x86_64 stack-trace looks exactly the same.14:22:36
@vcunat:matrix.orgVladimír Čunát

So I copied one of the x86_64 cores to a local machine with services.nixseparatedebuginfod.enable = true; and got

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7f73f68d96c0 (LWP 1994564))]
(gdb) 
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f73f855f8f3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f73f850d576 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f73f84f5935 in __GI_abort () at abort.c:79
#4  0x00007f73f84f67f3 in __libc_message_impl (fmt=fmt@entry=0x7f73f8679ffc "%s") at ../sysdeps/posix/libc_fatal.c:134
#5  0x00007f73f85528e9 in __GI___libc_fatal (message=<optimized out>) at ../sysdeps/posix/libc_fatal.c:143
#6  0x00007f73f8566474 in unwind_cleanup (reason=<optimized out>, exc=<optimized out>) at unwind.c:114
#7  0x00007f73f8b2e78c in nix::unix::triggerInterrupt () at src/libutil/unix/signals.cc:94
#8  0x00007f73f8d5b8d5 in nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}::operator()() const (__closure=0x561807cd7598) at src/libutil/unix/monitor-fd.hh:55
#9  std::__invoke_impl<void, nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}>(std::__invoke_other, nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}&&) (__f=...)
    at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
#10 std::__invoke<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}>(nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}&&) (__fn=...)
    at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:96
#11 std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x561807cd7598)
    at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:292
#12 std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> >::operator()() (this=0x561807cd7598)
    at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:299
#13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<nix::MonitorFdHup::MonitorFdHup(int)::{lambda()#1}> > >::_M_run() (this=0x561807cd7590)
    at /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_thread.h:244
#14 0x00007f73f88bb6d3 in execute_native_thread_routine () from /nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libstdc++.so.6
#15 0x00007f73f855daf3 in start_thread (arg=<optimized out>) at pthread_create.c:447
#16 0x00007f73f85dcf4c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
14:47:20
@vcunat:matrix.orgVladimír ČunátSo we continue by creating a nix ticket with nix devs taking the lead on this?14:48:05
@hexa:lossy.networkhexaand tag it with infra, as proposed14:48:27
@hexa:lossy.networkhexado you want to file the issue?14:48:43
@vcunat:matrix.orgVladimír ČunátI can. (But I don't care who does it.)14:54:18
@hexa:lossy.networkhexaPlease do 🙂 14:54:37
@hexa:lossy.networkhexasimilar to https://github.com/NixOS/nix/issues/8946?15:00:31
@vcunat:matrix.orgVladimír ČunátYes, I've been looking at that for some time now.15:08:42
@vcunat:matrix.orgVladimír ČunátThe stack traces are almost the same, so let's not open a new one.15:08:56
@joerg:thalheim.ioMic92 vcunat: let me know when you create one. 15:10:04
@joerg:thalheim.ioMic92I will bring this up in the next meeting.15:10:18
@vcunat:matrix.orgVladimír ČunátI put the extra info into the existing issue and labeled it. Feel free to ask for more (I do know C debugging at least).15:12:48
@joerg:thalheim.ioMic92This looks like exceptions. nix-daemon doesn't list anything, right?15:13:34
@joerg:thalheim.ioMic92Ah. I think it's the "unreachable()" instance in this function15:14:22
@vcunat:matrix.orgVladimír ČunátYou mean journalctl logs?15:14:23
@hexa:lossy.networkhexahard to say, given that every build causes an "accepted connection" line 😄 15:14:33
@hexa:lossy.networkhexa
Mar 21 15:13:47 elated-minsky nix-daemon[1647693]: FATAL: exception not rethrown
15:14:46
@hexa:lossy.networkhexathere are these intermittently15:14:50
@vcunat:matrix.orgVladimír ČunátYes, timestamp fits exactly.15:17:24
@vcunat:matrix.orgVladimír Čunát(in another instance: coredumpctl crash stamp matches the log stamp with this line)15:18:20
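
The timestamp comparison described here can be made less manual by pulling out just the timestamps of the fatal log lines. This is a rough sketch assuming the default short journal format shown above (month, day, time as the first three fields); `fatal_timestamps` is a hypothetical helper name.

```shell
# Read journal lines from stdin and print the timestamp (month, day, time)
# of every "FATAL: exception not rethrown" line.
fatal_timestamps() {
  awk '/FATAL: exception not rethrown/ { print $1, $2, $3 }'
}
```

Fed with something like `journalctl -u nix-daemon | fatal_timestamps` (the unit name is an assumption), the output can be compared by eye with the times reported by `coredumpctl list`.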
@joerg:thalheim.ioMic92I am inclined to backport https://github.com/NixOS/nix/pull/12636/files and cherry-pick it into the Nix version that we use in NixOS infra to get more insight15:18:44
@joerg:thalheim.ioMic92Probably not needed for this bug but in general15:22:48
@vcunat:matrix.orgVladimír ČunátIt shouldn't be a problem to switch versions, too. As long as it isn't too unstable. If that makes something easier.15:22:51
@vcunat:matrix.orgVladimír ČunátWe generally simply use defaults on the stable NixOS.15:23:45
@hexa:lossy.networkhexaok, so bump to 2.26 and try to repro?15:27:29
@joerg:thalheim.ioMic92Probably yes. The current stack trace doesn't show the actual exception, which comes from somewhere else.15:36:52
