| 30 May 2024 |
waltmck | I am running it through nixos-anywhere | 11:19:40 |
waltmck | I have a rescue system booted from a network drive so I have access to the disks | 11:20:06 |
lassulus | it should be, but removing existing partitions is sometimes a bit lagging :) | 11:20:14 |
lassulus | since they can be used in all sorts of different ways | 11:20:29 |
lassulus | but nixos-anywhere usually takes care of that by running a kexec | 11:20:45 |
waltmck | when you initialize a new partition, you shouldn't be assuming that the disk is zeroed out | 11:20:47 |
waltmck | Right, disko runs after a kexec | 11:21:07 |
lassulus | ah, we run wipefs before running disko | 11:22:06 |
lassulus | https://github.com/nix-community/disko/blob/master/disk-deactivate/disk-deactivate.jq#L33 | 11:22:06 |
waltmck | I've tried this many times across reboots and it is totally reproducible. I haven't tried manually zeroing out the disks, though, just because I assumed that semantically the current contents of a disk shouldn't matter when reformatting | 11:22:13 |
lassulus | well they shouldn't, but sometimes, if there is already an mdadm array, it can get activated at random times and that can interfere with other stuff, for example | 11:23:28 |
waltmck | Interesting, I'm not really sure how that works. How does the mdadm get activated? Is the problem that writes to the virtual device might be concurrent with writes to the underlying devices? | 11:24:34 |
lassulus | not sure when that exactly happens, otherwise I would have reproduced and fixed it :) but I have seen raid devices being activated later, some minutes after booting | 11:26:34 |
lassulus | if this happens after we run the disk-deactivate script, things get wonky | 11:27:05 |
lassulus | but not sure that's even the issue here, I guess you can check if there are any lingering raid devices active | 11:28:02 |
waltmck | I could give you access to the server if it's helpful for you to debug this issue with disko | 11:28:08 |
waltmck | (independently of my issue, if you are having trouble reproducing) | 11:28:23 |
waltmck | there's nothing on it, I could just wipe it after you're done | 11:29:51 |
lassulus | hmm, not sure I have the time to debug that further :) also not sure if the issue would happen if disko is run again? | 11:31:07 |
waltmck | I've restarted the server a few times and the issue persists | 11:31:45 |
waltmck | I'll restart a few more times and hopefully that will fix the issue. If not, I'll let you know | 11:33:01 |
lassulus | hmm, it also fails after the partprobe | 11:37:56 |
lassulus | and the zap should come after the partprobe | 11:38:02 |
lassulus | this could be another issue | 11:38:08 |
waltmck | yep, issue just persisted under another full reboot. Reboots wipe all of the rescue system state so the problem is either in my config file or in the disk state | 11:38:14 |
lassulus | https://github.com/nix-community/disko/pull/654 | 11:40:41 |
lassulus | ah forgot to rebase | 11:41:23 |
waltmck | ahh, I will try running sgdisk --zap-all | 11:41:25 |
waltmck | I think it worked | 11:43:10 |
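For reference, a manual cleanup along the lines discussed above could be scripted roughly as follows. This is a sketch under assumptions, not what disko's disk-deactivate script or nixos-anywhere actually run: the helper `zap_commands`, the ordering, and the device paths (`/dev/sda`, `/dev/sda1`, `/dev/md127`) are all hypothetical placeholders. It stops assembled arrays with `mdadm --stop`, clears signatures with `wipefs --all`, destroys the GPT/MBR with `sgdisk --zap-all`, and then runs `partprobe`. It only prints the commands unless `dry_run=False`, because every one of them is destructive.

```python
import shlex
import subprocess


def zap_commands(disks: list[str], partitions: list[str],
                 md_arrays: list[str]) -> list[list[str]]:
    """Build the cleanup commands in order (hypothetical helper)."""
    cmds: list[list[str]] = []
    # Stop assembled RAID arrays first so they no longer hold the member devices.
    for md in md_arrays:
        cmds.append(["mdadm", "--stop", md])
    # Wipe signatures inside old partitions (e.g. mdadm superblocks); zapping only
    # the partition table would leave them on disk to be found again later.
    for part in partitions:
        cmds.append(["wipefs", "--all", part])
    for disk in disks:
        # Remove signatures on the whole-disk device.
        cmds.append(["wipefs", "--all", disk])
        # Destroy the GPT and MBR data structures.
        cmds.append(["sgdisk", "--zap-all", disk])
        # Ask the kernel to re-read the now-empty partition table.
        cmds.append(["partprobe", disk])
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        print(("DRY RUN: " if dry_run else "") + shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder devices; check `lsblk` before ever setting dry_run=False.
    run(zap_commands(["/dev/sda"], ["/dev/sda1"], ["/dev/md127"]))
```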