| 20 Jan 2025 |
@elvishjerricco:matrix.org | ok so for some reason, letting systemd do all the handling of special file systems leads to.... networking not working? | 02:57:50 |
@elvishjerricco:matrix.org | inconsistently | 02:57:58 |
@elvishjerricco:matrix.org | god dammit | 04:11:03 |
@elvishjerricco:matrix.org | my problem is with systemd thinking its own path is under /run | 04:11:23 |
@elvishjerricco:matrix.org | which it checks before it does its switch-root stuff | 04:12:08 |
@elvishjerricco:matrix.org | but the switch-root stuff is the stuff that bind mounts /run into the new root | 04:12:18 |
@elvishjerricco:matrix.org | so it doesn't find itself | 04:12:26 |
@elvishjerricco:matrix.org | and quits before trying to switch-root | 04:12:39 |
@elvishjerricco:matrix.org | basically I need to do systemd's switch-root bind mount transfer stuff in the activation unit | 04:13:45 |
@elvishjerricco:matrix.org | which is exactly what I wanted to avoid :P | 04:13:50 |
@elvishjerricco:matrix.org | Ok. I have things that work now. I'm going to summarize the problems now, if only to make sure I've got them all in line in my head :P
- systemd's switch-root is frustrating
  - It will only serialize state and hand it over if the new PID1 is the builtin path (/run/current-system/systemd/lib/systemd/systemd, and the empty string counts as equivalent).
  - It will check for the existence of the new PID1 binary before it does its switch_root function.
  - This switch_root function is the one that bind-mounts /run, meaning if the new init is in /run then you just won't be allowed to switch-root, because the previous step will have failed before getting here.
  - Also, when switch_root does the bind mounts, it only does them if there's not already a mount there.
  - We (inadvertently, I think) resolved this by bind mounting /run ourselves for initrd-nixos-activation.service.
- Now, the reason credentials weren't being "imported" in stage 2 is that systemd expects them to have already been imported in stage 1. Stage 1 was importing, but we weren't bind mounting /run recursively.
  - Because switch_root skips already-mounted mounts, it also skipped this.
  - Result: imported credentials are killed.
- All this means that we have to set up /sysroot/run before we switch-root, even though we want switch-root to be the thing setting up /sysroot/run for us.
UGH
| 04:50:54 |
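The ordering problem in the summary above can be sketched as a toy model. This is not systemd code: the function names, the dict-based "mount table", and the `credentials/example` entry are all made up for illustration, and the model only captures the two behaviours described in the message (the init check happens first, and existing mounts are skipped).

```python
# Toy model only: none of these names come from systemd itself.
INIT = "/run/current-system/systemd/lib/systemd/systemd"


def bind_mount_api_dirs(initrd_mounts, new_root_mounts, dirs=("/run",)):
    """Model of the bind-mount step: skip targets that are already mounts."""
    for d in dirs:
        if d in new_root_mounts:
            # Already mounted in the new root, so it is left untouched.
            continue
        new_root_mounts[d] = initrd_mounts[d]


def switch_root(initrd_mounts, new_root_mounts, new_root_files, init_path):
    """Return True if the switch would go ahead in this model."""
    # Step 1: the new init must be visible before any bind mounts happen.
    visible = set(new_root_files)
    for mount_point, contents in new_root_mounts.items():
        visible |= {f"{mount_point}/{name}" for name in contents}
    if init_path not in visible:
        return False
    # Step 2: only now would /run be carried over into the new root.
    bind_mount_api_dirs(initrd_mounts, new_root_mounts)
    return True


if __name__ == "__main__":
    initrd = {"/run": {"current-system/systemd/lib/systemd/systemd",
                       "credentials/example"}}  # hypothetical credential
    # Nothing mounted at /run in the new root: the init check fails.
    print(switch_root(initrd, {}, set(), INIT))
    # /run pre-mounted non-recursively: the switch proceeds, but the
    # credentials entry is never carried over because the mount is skipped.
    pre = {"/run": {"current-system/systemd/lib/systemd/systemd"}}
    print(switch_root(initrd, pre, set(), INIT), pre["/run"])
```

In this model the only way to get past step 1 is to populate `/run` in the new root beforehand, which is exactly what then causes step 2 to skip it.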
@elvishjerricco:matrix.org | Additionally, there's a related problem with trying to eliminate specialFileSystems. Activation expects that some things in /sys and /proc are mounted too, not just /run, so now we also have to set those up! Now, I think those can be temporary, but it's still something I wish was just handled by systemd. | 04:55:43 |
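A quick way to see which of those directories are actually mounted under a given root might look like the sketch below. The helper name `missing_mounts` and the exact list of directories are assumptions for illustration; the message only says that activation needs some things in /sys and /proc in addition to /run.

```python
import os

# Assumed set of directories, based on the message above.
EXPECTED = ("proc", "sys", "run")


def missing_mounts(root, expected=EXPECTED):
    """Return the expected directories that are not mount points under root."""
    return [name for name in expected
            if not os.path.ismount(os.path.join(root, name))]
```

For example, `missing_mounts("/sysroot")` run from the initrd would list whatever still has to be mounted before activation runs.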
@elvishjerricco:matrix.org | Oh I have a bad idea. A really bad idea. We could solve all of this by switch-rooting into a system that's almost completely unconfigured, except for one unit that runs activation in the real, current root, and then does a soft-reboot into the real system | 04:58:31 |
phaer | Wait, why would you need a soft-reboot here? Shouldn't you end up with a working system after activation - similar to if you just run activation in an already booted system, i.e. during nixos-rebuild? What am I missing? | 16:57:46 |
phaer | Might actually give this a try later today/tomorrow to find out :D | 16:58:00 |
@elvishjerricco:matrix.org | phaer: You'd do the soft-reboot just to avoid the complex process of switch-to-configuration | 18:43:21 |
@elvishjerricco:matrix.org | it's a beast that really shouldn't be part of bootup | 18:43:28 |
@elvishjerricco:matrix.org | funny thought I just had about that idea. The intermediate phase would be like a stage 1.5, except it's more like stage 2 because it actually exists in the stage 2 rootfs, so maybe more like stage 2.-5? :P | 18:45:08 |
@elvishjerricco:matrix.org | Imagine trying to explain to upstream systemd "yea, this error happens on nixos during stage two and a negative half" | 18:45:54 |
iridium | @elvishjerricco:matrix.org: My notebook just crashed during upgrade - again, systemd restart failed. I now have a system in defunct state, but do have a shell. What useful things should I look at to collect more data for debugging? 🙂 | 18:59:35 |
@elvishjerricco:matrix.org | oh gosh | 18:59:58 |
@elvishjerricco:matrix.org | I need to remind myself of your exact issue again | 19:00:06 |
iridium | https://discourse.nixos.org/t/system-inoperable-after-automatic-upgrades/50197/2 | 19:02:16 |
@elvishjerricco:matrix.org | iridium: Are you able to open journalctl -e? | 19:03:05 |
iridium | Yes: https://pastebin.com/raw/7Yz2drXP | 19:05:09 |
@elvishjerricco:matrix.org | ok good. And you would expect that downgrading and redoing the upgrade would trigger it again, right? | 19:05:25 |
iridium | Not sure if relevant: https://pastebin.com/raw/GSkBauCk | 19:05:58 |
iridium | I have to admit I never tried, but would guess so, yes | 19:06:14 |
@elvishjerricco:matrix.org | oh if you know how to use gdb productively, that could be useful :P I am at level zero with that stuff | 19:06:51 |