22 Jun 2023 |
@lily:lily.flowers | Something about loading 9p after /sysroot tmpfs is mounted breaks it | 17:25:02 |
@lily:lily.flowers | (and yes it did also need the sleep) | 17:25:10 |
@lily:lily.flowers | nikstur | 17:25:17 |
@lily:lily.flowers | Scripted stage-1 only works because it handles that specially and serially rather than generally and parallel like systemd-initrd | 17:25:40 |
@nikstur:matrix.org | I figured it would be something like this... :(( | 17:37:11 |
@lily:lily.flowers | Looks like 6.3 still has the same problem and I'm testing 5.15 now. I didn't find any immediately obvious related bug reports on LKML or bugzilla, but I also didn't look too hard and I'm not sure which exact part of that interaction does it | 17:38:45 |
@lily:lily.flowers | (I'll admit I really don't feel like bisecting the kernel right now, though, if it does turn out to be a kernel bug) | 17:40:16 |
@gdamjan:spodeli.org | tmpfs /sysroot mounts too fast, and PCI isn't fully enumerated yet? | 18:05:23 |
@lily:lily.flowers | Yeah but scripted stage-1 should be loading 9pnet_virtio on-demand too. Let me try introducing a wait between sysroot.mount and the 9pnet mounts | 18:07:26 |
@gdamjan:spodeli.org | I don't think async PCI is on the mind of many people :D | 18:07:59 |
@lily:lily.flowers | I suppose? Let me actually just introduce a sleep before sysroot.mount then. PCI should be settling before then anyway, but that would at least show that it's not just that the bootup is too fast | 18:09:27 |
@lily:lily.flowers | Damn. That also let it pass | 18:10:39 |
@lily:lily.flowers | I guess it is something like that then | 18:12:11 |
@gdamjan:spodeli.org | does the 9pnet_virtio device appear somewhere in /sys/bus/virtio/devices/* ? | 18:16:15 |
@gdamjan:spodeli.org | or maybe in /sys/bus/pci/devices/0000:0* | 18:17:22 |
@lily:lily.flowers | Probably. I'll poke more in a minute. I suppose we need to add a dependency so the mounts are ordered after the PCI stuff settles | 18:17:34 |
@lily:lily.flowers | (also, good guess btw that tmpfs mounts too fast gdamjan -- I suppose device-backed mounts already get a dependency on all of that settling through the .device units, so after sysroot.mount the 9pnet_virtio stuff never lost the race for the channels) | 18:32:31 |
@gdamjan:spodeli.org | device mounts probably get an fsck too :) | 18:38:08 |
@lily:lily.flowers | Not necessarily (e.g. btrfs), but just waiting for the device to be available is probably enough I'm guessing. I'm doing more tests now to confirm what exactly needs to be waited on, because then we can configure an explicit dependency for the mounts | 18:39:21 |
@lily:lily.flowers | Interesting, even just doing an ls /sys/bus/pci/devices is enough to make it pass | 18:44:18 |
@gdamjan:spodeli.org | hah, it's milliseconds I bet | 18:44:40 |
Arian | Cursed | 18:45:02 |
Arian | Absolutely cursed | 18:45:05 |
@lily:lily.flowers | Ugh still trying to root out that one issue in between $dayjob stuff. I'm taking a break now, but I'll come back to it later. It's an interesting and cursed issue for sure | 20:43:47 |
@lily:lily.flowers | ElvishJerricco: Should we go ahead and merge https://github.com/NixOS/nixpkgs/pull/237820 and https://github.com/NixOS/nixpkgs/pull/237823 or were you wanting more reviews on them? (or if anyone else here wants to review that is of course welcome) | 20:44:24 |
@elvishjerricco:matrix.org | that's a lot of stuff I wasn't paying attention to in here today :P | 20:44:34 |
@elvishjerricco:matrix.org | I think it's fine to merge both of those | 20:45:25 |
@elvishjerricco:matrix.org | In reply to @lily:lily.flowers ("Ugh still trying to root out that one issue in between $dayjob stuff. I'm taking a break now, but I'll come back to it later. It's an interesting and cursed issue for sure"): any chance of a quick summary so I don't have to read all that scrollback for context? | 20:46:18 |
@lily:lily.flowers | In reply to @elvishjerricco:matrix.org ("any chance of a quick summary so I don't have to read all that scrollback for context?"): 9pfs over virtio can apparently be mounted too early, before the virtio channels have populated. It's not a problem with the slow, serial scripted stage-1, but it is a problem with systemd-initrd when the sysroot has no backing device (i.e. tmpfs). A device-backed sysroot would otherwise wait for enough dev stuff to settle before attempting to mount | 20:48:30 |
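
Editor's sketch (not from the conversation): the race in the summary above could in principle be avoided by waiting until the 9p channel is visible in sysfs before mounting. The Python below is a minimal illustration of that idea, not the fix the participants settled on. It assumes the 9pnet_virtio driver exposes a mount_tag attribute on each bound device under /sys/bus/virtio/devices; the function names, the timeout values, and the tag "store" in the usage note are hypothetical. In systemd-initrd itself the equivalent would more likely be a unit ordering dependency than a polling loop.

```python
import time
from pathlib import Path

# Assumption: bound virtio-9p devices expose a "mount_tag" attribute here.
VIRTIO_DEVICES = Path("/sys/bus/virtio/devices")


def list_9p_mount_tags(devices_dir=VIRTIO_DEVICES):
    """Return the set of 9p mount tags currently visible under devices_dir."""
    tags = set()
    devices_dir = Path(devices_dir)
    if not devices_dir.is_dir():
        return tags
    for dev in devices_dir.iterdir():
        try:
            raw = (dev / "mount_tag").read_bytes()
        except OSError:
            # Not a 9p device, or the attribute is not populated yet.
            continue
        # Strip trailing NUL or newline padding, if any.
        tag = raw.rstrip(b"\x00\n").decode("utf-8", errors="replace")
        if tag:
            tags.add(tag)
    return tags


def wait_for_9p_tag(tag, timeout=5.0, interval=0.05, devices_dir=VIRTIO_DEVICES):
    """Poll until `tag` shows up; return True if seen, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if tag in list_9p_mount_tags(devices_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run something like wait_for_9p_tag("store") before attempting the 9p mount and treat a False result as "channel never appeared". Whether this closes the race seen here is untested; the log only establishes that a short delay, or even listing /sys/bus/pci/devices, was enough to make the test pass.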