| 20 Jan 2025 |
ElvishJerricco | also, if you feel like trying something nifty, you can try using magic-wormhole to send me the log. Just a bit nicer than matrix uploads. | 19:42:44 |
iridium | Relevant timestamps (approximate!):
"restarting systemd..." at roughly 20:41:06.
"Error: Failed to reset failed units" at roughly 20:41:20 (give or take). | 19:45:06 |
ElvishJerricco | Ok, I think the many "Closing set fd 657"-style messages indicate I was right about my hunch about the file descriptor pool thingamajig | 19:47:49 |
ElvishJerricco | trying to remember what systemd calls that correctly so I can actually look it up :P | 19:48:14 |
ElvishJerricco | iridium: https://systemd.io/FILE_DESCRIPTOR_STORE/ | 19:49:39 |
ElvishJerricco | this | 19:49:40 |
ElvishJerricco | the "protocol error" makes me think there's something up with this store | 19:50:07 |
ElvishJerricco | it could be that the expected state of this store changes between releases | 19:50:20 |
ElvishJerricco | it could be that there's a bug in restoring the state after reexec | 19:50:30 |
ElvishJerricco | not sure | 19:50:33 |
iridium | hm... Not sure if relevant in any way, but both systems I experienced this on have a bcachefs filesystem mounted | 19:50:35 |
ElvishJerricco | absolutely that's relevant :P | 19:50:46 |
ElvishJerricco | this is exactly the kind of obscure bug I expect from bcachefs | 19:50:56 |
iridium | Sorry! 🙂 | 19:51:13 |
iridium | (for one of the machines, it's not the root FS) | 19:51:28 |
ElvishJerricco | it's reasonably likely not to be bcachefs's fault | 19:51:44 |
ElvishJerricco | but it's something to keep in mind for sure | 19:51:49 |
ElvishJerricco | can you tell me the exact nixpkgs revisions you were switching between, and share as much of your configuration, including the NFS mount, as you can? | 19:52:04 |
iridium | I'll ping you some details via dm, give me a sec | 19:52:29 |
iridium | Might be worth retrying this without the NFS share mounted, now that I can reproduce reliably | 19:52:58 |
ElvishJerricco | actually I was going to suggest we make a nixpkgs issue to track all this info | 19:53:01 |
iridium | sample size 1: did not reproduce the issue without the NFS mount | 19:55:26 |
iridium | (reproduced ~8 times in a row with NFS mounted, no other differences I'm aware of) | 19:55:43 |
ElvishJerricco | NFS is another thing that causes all kinds of obscure bugs | 19:57:07 |
iridium | second attempt, upgrade succeeded again. I'd be reasonably confident it's somehow NFS-related | 19:58:00 |
ElvishJerricco | yea | 19:58:11 |
ElvishJerricco | it's probably more about how systemd tracks the NFS mount | 19:58:21 |
iridium | https://github.com/NixOS/nixpkgs/issues/375376
This also has the flake lockfiles from before/after the upgrade. Everything else you should know already. | 20:14:41 |
iridium | Ah, and details about the NFS mounts 🙂 | 20:14:53 |
ElvishJerricco | thanks! | 20:19:42 |