20 Mar 2025 |
Arian | Should the Systemd team be on nixos.org/community? | 18:34:34 |
21 Mar 2025 |
| mrdev023 joined the room. | 13:51:10 |
22 Mar 2025 |
@elvishjerricco:matrix.org | I am finding some extremely broken behavior with systemd-repart running during boot when the device is already partitioned but needs modification (e.g. growing a partition). It seems that when it runs and repartitions, it causes the device units to stop and start. This causes fsck and mount units to be stopped, all the way up to initrd-fs.target, causing initrd-find-nixos-closure.service to be stopped. Then initrd-parse-etc.service starts, and that starts initrd-fs.target again, but initrd-find-nixos-closure.service is still stopped, so it never runs and the system fails to boot | 09:10:18 |
Arian | Wut | 09:12:19 |
@elvishjerricco:matrix.org | now, all those units being stopped is a "job canceled" scenario, so they don't need to have already been started for this chain reaction to occur | 09:12:43 |
@elvishjerricco:matrix.org | (some of these observations come from me being in the middle of messing with things and adding various orderings to debug things, so I might be getting the details wrong, but I think the core idea is a problem) | 09:13:41 |
@elvishjerricco:matrix.org | the specific use case I was trying to debug was when / is a tmpfs and /nix is on a partition that needs to grow | 09:14:37 |
@elvishjerricco:matrix.org | but I think it would be a problem in a lot more generic scenarios than that | 09:14:56 |
Arian | Can you make a non-nix-specific reproducer? | 09:15:11 |
@elvishjerricco:matrix.org | uh | 09:15:28 |
@elvishjerricco:matrix.org | I don't know enough about other distros to make another distro do this :P | 09:15:43 |
Arian | I don't understand how you have mounts before repart runs | 09:16:23 |
Arian | You can't resize a mounted partition, no? | 09:16:31 |
@elvishjerricco:matrix.org | well, a) yes you can, and b) I don't have mounts anyway. The mount jobs are cancelled before they're started | 09:16:51 |
Arian | Repart should be running before /sysroot is mounted | 09:16:57 |
@elvishjerricco:matrix.org | the device appears, satisfying some dependencies, and then disappears, which causes job cancellations because of BindsTo=dev-foo.device | 09:17:42 |
@elvishjerricco:matrix.org | and then reappears, but then it's too late and damage is done | 09:18:07 |
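The chain reaction described above can be sketched with a toy model. This is not systemd's real job engine; it only assumes that when a unit goes away, queued jobs of everything that transitively depends on it are cancelled, and that re-starting `initrd-fs.target` afterwards does not re-queue the units that were cancelled. `ToyManager`, `depends_on`, `unit_gone`, and the unit name `sysroot-nix.mount` are hypothetical names made up for illustration; `dev-foo.device` is the placeholder from the discussion.

```python
from collections import defaultdict


class ToyManager:
    """Toy model of job cancellation; NOT systemd's real job engine."""

    def __init__(self):
        # dependency -> units whose jobs are cancelled when it disappears
        self.dependents = defaultdict(set)
        # units that currently have a pending start job
        self.queued = set()

    def depends_on(self, unit, dependency):
        # Assumption: one generic "stop propagates" edge stands in for
        # BindsTo=/Requires=-style relationships.
        self.dependents[dependency].add(unit)

    def enqueue(self, *units):
        self.queued.update(units)

    def unit_gone(self, unit):
        """Cancel queued jobs of everything transitively depending on unit."""
        cancelled = []
        seen = set()
        stack = [unit]
        while stack:
            current = stack.pop()
            for dependent in sorted(self.dependents.get(current, ())):
                if dependent in seen:
                    continue
                seen.add(dependent)
                if dependent in self.queued:
                    self.queued.discard(dependent)
                    cancelled.append(dependent)
                stack.append(dependent)
        return cancelled


def simulate():
    m = ToyManager()
    # Hypothetical dependency chain, loosely following the description.
    m.depends_on("sysroot-nix.mount", "dev-foo.device")
    m.depends_on("initrd-fs.target", "sysroot-nix.mount")
    m.depends_on("initrd-find-nixos-closure.service", "initrd-fs.target")
    m.enqueue("sysroot-nix.mount", "initrd-fs.target",
              "initrd-find-nixos-closure.service")

    # The device disappears during repartitioning: jobs are cancelled.
    cancelled = m.unit_gone("dev-foo.device")

    # Later something starts initrd-fs.target again, but nothing re-queues
    # the closure service in this model.
    m.enqueue("initrd-fs.target")
    return cancelled, m.queued


if __name__ == "__main__":
    cancelled, queued = simulate()
    print("cancelled:", cancelled)
    print("still queued:", sorted(queued))
```

In this model the jobs never had to run for the cancellation to propagate, which matches the "job canceled" observation; whether real systemd behaves this way for every dependency type is exactly the open question in the discussion.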
Arian | Could it be a kernel bug? | 09:18:20 |
Arian | Why is the kernel sending uevents on resize | 09:18:30 |
@elvishjerricco:matrix.org | is it not normal for a device's partitions to be removed and added from udev's perspective when the device is partscanned? | 09:19:01 |
Arian | Well, you just said it's possible to resize a partition that is mounted. In that case it doesn't sound like sane behaviour that the underlying device would disappear and reappear, no? | 09:19:50 |
Arian | That makes 0 sense to me | 09:20:04 |
Arian | Ah yeh we use online resize for cloud images etc. I remember now | 09:21:28 |
@elvishjerricco:matrix.org | after repart finishes, I see Changed plugged -> dead for each of the partitions in the systemd debug logging, and then immediately after I see Changed dead -> plugged for them | 09:23:12 |
@elvishjerricco:matrix.org | vda3: Processing udev action (SEQNUM=1400, ACTION=remove)
then I get a
vda: Processing udev action (SEQNUM=1401, ACTION=change)
And then I get
vda3: Processing udev action (SEQNUM=1404, ACTION=add)
| 09:23:57 |
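A quick way to spot this remove/add pattern in a longer debug log is to scan for devices whose `remove` event is later followed by an `add`. The sketch below assumes the log lines look exactly like the three quoted above; `find_flaps` and `UDEV_LINE` are hypothetical names, not part of any systemd or udev tooling.

```python
import re

# Matches lines such as:
#   vda3: Processing udev action (SEQNUM=1400, ACTION=remove)
UDEV_LINE = re.compile(
    r"(?P<dev>\S+): Processing udev action "
    r"\(SEQNUM=(?P<seq>\d+), ACTION=(?P<action>\w+)\)"
)


def find_flaps(lines):
    """Return (device, remove_seqnum, add_seqnum) for each remove -> add pair."""
    pending_remove = {}  # device -> SEQNUM of its most recent remove event
    flaps = []
    for line in lines:
        match = UDEV_LINE.search(line)
        if match is None:
            continue  # not a udev action line
        dev = match["dev"]
        seq = int(match["seq"])
        action = match["action"]
        if action == "remove":
            pending_remove[dev] = seq
        elif action == "add" and dev in pending_remove:
            flaps.append((dev, pending_remove.pop(dev), seq))
    return flaps


# The three log lines quoted in the message above.
SAMPLE = [
    "vda3: Processing udev action (SEQNUM=1400, ACTION=remove)",
    "vda: Processing udev action (SEQNUM=1401, ACTION=change)",
    "vda3: Processing udev action (SEQNUM=1404, ACTION=add)",
]

if __name__ == "__main__":
    for dev, removed, added in find_flaps(SAMPLE):
        print(f"{dev}: removed at SEQNUM={removed}, re-added at SEQNUM={added}")
```

On the sample lines this reports `vda3` as removed at SEQNUM 1400 and re-added at SEQNUM 1404, with the `change` event on `vda` in between.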
@elvishjerricco:matrix.org | I feel like this can't possibly be right, because then imagine what happens with a normal stage 2 repart service. It would repartition, and immediately cancel all mount jobs depending on those partitions, stopping local-fs.target | 09:26:43 |
Arian | Is /dev/vda3 mounted at this point? | 09:27:35 |