22 Jun 2023 |
@nikstur:matrix.org | In reply to @lily:lily.flowers
Seems weird it would only happen with tmpfs and only with systemd-initrd too, but idk I'd have to poke at it. Does adding that one module actually fix your issue nikstur?
I also can't make the connection there. This is where it works and doesn't:
- legacy-initrd without tmpfs / -> works
- legacy-initrd with tmpfs / -> works
- sd-initrd without tmpfs / -> works
- sd-initrd with tmpfs / -> doesn't work
| 13:23:22 |
@lily:lily.flowers | In reply to @nikstur:matrix.org
I also can't make the connection there. This is where it works and doesn't:
- legacy-initrd without tmpfs / -> works
- legacy-initrd with tmpfs / -> works
- sd-initrd without tmpfs / -> works
- sd-initrd with tmpfs / -> doesn't work
Can you share full boot logs? Or provide a system derivation to reproduce so we can poke at it? | 13:28:07 |
@nikstur:matrix.org | https://github.com/NixOS/nixpkgs/pull/238848 | 13:28:44 |
@nikstur:matrix.org | Didn't get around to parameterizing it for sd-initrd yet, but if you just manually set boot.initrd.systemd.enable = true it fails | 13:29:09 |
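For reference, the failing combination (sd-initrd with tmpfs /) might look roughly like the fragment below. This is only a sketch: the tmpfs options are an assumption, and the test in the linked PR may set up the root filesystem differently.

```nix
{ ... }:
{
  # Switch stage 1 from the scripted initrd to systemd-initrd.
  boot.initrd.systemd.enable = true;

  # Root on tmpfs. The exact options are assumed for illustration;
  # the test in the PR may configure this differently.
  fileSystems."/" = {
    device = "tmpfs";
    fsType = "tmpfs";
    options = [ "mode=0755" ];
  };
}
```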
@lily:lily.flowers | If no one else does, I'll take a look and poke a little in about an hour or two when I'm done with a thing at $dayjob | 13:31:09 |
@lily:lily.flowers | Alright so you're gonna hate this and I think it may be a kernel bug | 17:24:35 |
@lily:lily.flowers | It works with this snippet:
boot.initrd.kernelModules = [ "9p" "9pnet_virtio" ];
boot.initrd.systemd.services.systemd-modules-load.before = [ "sysroot.mount" ];
boot.initrd.systemd.services.systemd-modules-load.serviceConfig.ExecStartPost = "${pkgs.coreutils}/bin/sleep 5";
| 17:24:52 |
@lily:lily.flowers | Something about loading 9p after /sysroot tmpfs is mounted breaks it | 17:25:02 |
@lily:lily.flowers | (and yes it did also need the sleep) | 17:25:10 |
@lily:lily.flowers | nikstur | 17:25:17 |
@lily:lily.flowers | Scripted stage-1 only works because it handles that specially and serially rather than generally and parallel like systemd-initrd | 17:25:40 |
@nikstur:matrix.org | I figured it would be something like this... :(( | 17:37:11 |
@lily:lily.flowers | Looks like 6.3 still has the same problem and I'm testing 5.15 now. I didn't find any immediately obvious related bug reports on lkml or bugzilla, but I also didn't look too hard and not sure which exact part of that interaction does it | 17:38:45 |
@lily:lily.flowers | (I'll admit I really don't feel like bisecting the kernel right now, though, if it does turn out to be a kernel bug) | 17:40:16 |
@gdamjan:spodeli.org | tmpfs /sysroot is too fast, and not the whole PCI is enumerated? | 18:05:23 |
@lily:lily.flowers | Yeah but scripted stage-1 should be loading 9pnet_virtio on-demand too. Let me try introducing a wait between sysroot.mount and the 9pnet mounts | 18:07:26 |
@gdamjan:spodeli.org | I don't think async PCI is on the mind of many people :D | 18:07:59 |
@lily:lily.flowers | I suppose? Let me actually just introduce a sleep before sysroot.mount then. PCI should be settling before then anyway, but that would at least show that it's not just that the bootup is too fast | 18:09:27 |
@lily:lily.flowers | Damn. That also let it pass | 18:10:39 |
@lily:lily.flowers | I guess it is something like that then | 18:12:11 |
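The exact change used for the "sleep before sysroot.mount" experiment isn't shown in the thread. One way to approximate it is a oneshot initrd service ordered before sysroot.mount, as sketched below. The unit name delay-sysroot is hypothetical, and this assumes the coreutils store path is available inside the initrd, as in the earlier snippet.

```nix
{ pkgs, ... }:
{
  # Hypothetical debugging unit, not taken from the thread.
  boot.initrd.systemd.services.delay-sysroot = {
    description = "Delay mounting /sysroot (debugging aid)";
    # Pull the unit in together with sysroot.mount and make the mount wait for it.
    requiredBy = [ "sysroot.mount" ];
    before = [ "sysroot.mount" ];
    # Avoid the default ordering after sysinit.target and basic.target.
    unitConfig.DefaultDependencies = false;
    serviceConfig = {
      # A oneshot unit only counts as started once the command has exited,
      # so sysroot.mount is held back for the full sleep.
      Type = "oneshot";
      ExecStart = "${pkgs.coreutils}/bin/sleep 5";
    };
  };
}
```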
@gdamjan:spodeli.org | does the 9p_virtio appear somewhere in /sys/bus/virtio/devices/* | 18:16:15 |
@gdamjan:spodeli.org | or maybe in /sys/bus/pci/devices/0000:0* | 18:17:22 |
@lily:lily.flowers | Probably. I'll poke more in a minute. I suppose we need to add a dependency for the mounts for after pci stuff settles | 18:17:34 |
@lily:lily.flowers | (also, good guess btw that tmpfs mounts too fast gdamjan -- I suppose device-backed mounts already get a dependency on all of that settling through the .device units, so after sysroot.mount the 9pnet_virtio stuff never lost the race for the channels) | 18:32:31 |
@gdamjan:spodeli.org | device mounts probably get an fsck too :) | 18:38:08 |
@lily:lily.flowers | Not necessarily (e.g. btrfs), but just waiting for the device to be available is probably enough I'm guessing. I'm doing more tests now to confirm what exactly needs to be waited on, because then we can configure an explicit dependency for the mounts | 18:39:21 |
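An explicit dependency on a mount could in principle be expressed through the x-systemd.after= mount option from systemd.mount(5). The fragment below is an untested sketch: the mount point, the 9p tag and the choice of systemd-modules-load.service as the unit to wait on are all assumptions, since what actually needs to be waited on hadn't been pinned down yet.

```nix
{ ... }:
{
  # Hypothetical 9p share; the mount point and the "shared" tag are made up.
  fileSystems."/mnt/shared" = {
    device = "shared";
    fsType = "9p";
    neededForBoot = true; # so that the initrd mounts it
    options = [
      "trans=virtio"
      "version=9p2000.L"
      # Assumed ordering target. The unit that really needs to be waited on
      # was still being investigated at this point in the thread.
      "x-systemd.after=systemd-modules-load.service"
    ];
  };
}
```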
@lily:lily.flowers | Interesting, just even doing an ls /sys/bus/pci/devices is enough to make it pass | 18:44:18 |
@gdamjan:spodeli.org | hah, it's milliseconds I bet | 18:44:40 |