| 30 May 2024 |
lassulus | but nixos-anywhere usually takes care of that by running a kexec | 11:20:45 |
waltmck | when you initialize a new partition, you shouldn't be assuming that the disk is zeroed out | 11:20:47 |
waltmck | Right, disko runs after a kexec | 11:21:07 |
lassulus | ah, we run wipefs before running disko | 11:22:06 |
lassulus | https://github.com/nix-community/disko/blob/master/disk-deactivate/disk-deactivate.jq#L33 | 11:22:06 |
waltmck | I've tried this many times across reboots and it is totally reproducible. I haven't tried manually zeroing out the disks, though, because I assumed that semantically the current contents of a disk shouldn't matter when reformatting | 11:22:13 |
lassulus | well, they shouldn't, but sometimes, if there is already an mdadm array, it can get activated at random times and that can interfere with other stuff, for example | 11:23:28 |
waltmck | Interesting, I'm not really sure how that works. How does the mdadm array get activated? Is the problem that writes to the virtual device might be concurrent with writes to the underlying devices? | 11:24:34 |
lassulus | not sure exactly when that happens, otherwise I would have reproduced and fixed it :) but I have seen raid devices being activated later, some minutes after booting | 11:26:34 |
lassulus | if this happens after we run the disk-deactivate script, things get wonky | 11:27:05 |
lassulus | but not sure that's even the issue here, I guess you can check if there are any lingering raid devices active | 11:28:02 |
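One way to check for lingering raid devices is to read /proc/mdstat, where the kernel lists each md array and its state. The following is a minimal sketch, not something taken from disko: `list_md_arrays` is a hypothetical helper name, and it assumes the usual `mdNNN : active ...` / `mdNNN : inactive ...` line layout of /proc/mdstat.

```shell
# Hypothetical helper (not part of disko or mdadm).
# Reads /proc/mdstat-style text on stdin and prints "<array> <state>"
# for every md array line, e.g. "md127 active" or "md0 inactive".
list_md_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" { print $1, $3 }'
}

# Usage on a live system (read-only, changes nothing):
#   list_md_arrays < /proc/mdstat
```

If this prints anything in the rescue system before disko runs, an old array has been assembled, and per the discussion above it may also show up some minutes after boot.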
waltmck | I could give you access to the server if it's helpful for you to debug this issue with disko | 11:28:08 |
waltmck | (independently of my issue, if you are having trouble reproducing) | 11:28:23 |
waltmck | there's nothing on it, I could just wipe it after you're done | 11:29:51 |
lassulus | hmm, not sure I have the time to debug that further :) also not sure if the issue would happen if disko is run again? | 11:31:07 |
waltmck | I've restarted the server a few times and the issue persists | 11:31:45 |
waltmck | I'll restart a few more times and hopefully that will fix the issue. If not, I'll let you know | 11:33:01 |
lassulus | hmm, it also fails after the partprobe | 11:37:56 |
lassulus | and the zap should happen after the partprobe | 11:38:02 |
lassulus | this could be another issue | 11:38:08 |
waltmck | yep, issue just persisted under another full reboot. Reboots wipe all of the rescue system state so the problem is either in my config file or in the disk state | 11:38:14 |
lassulus | https://github.com/nix-community/disko/pull/654 | 11:40:41 |
lassulus | ah forgot to rebase | 11:41:23 |
waltmck | ahh, I will try running sgdisk --zap-all | 11:41:25 |
waltmck | I think it worked | 11:43:10 |
waltmck | the output for sgdisk --zap-all /dev/md/raid1 was
Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: OK
Backup partition table: ERROR
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
| 11:43:32 |
waltmck | great, that fixed everything | 11:46:36 |
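The manual workaround used here was a single `sgdisk --zap-all` on the md device. Because that command destroys the GPT and MBR structures, a small wrapper that only prints the command unless explicitly told to run it can be a safer way to repeat the step. This is a sketch under assumptions: `zap_gpt` and the `DRY_RUN` variable are hypothetical names, not part of disko or sgdisk.

```shell
# Hypothetical helper (not part of disko or sgdisk).
# Prints the zap command by default; set DRY_RUN=0 to actually run it.
# WARNING: with DRY_RUN=0 this destroys the partition tables on the device.
zap_gpt() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: zap_gpt <block-device>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sgdisk --zap-all "$dev"
    else
        echo "sgdisk --zap-all $dev"
    fi
}

# Usage:
#   zap_gpt /dev/md/raid1              # only prints the command
#   DRY_RUN=0 zap_gpt /dev/md/raid1    # runs it (destructive)
```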
lillecarl | lassulus: Out of curiosity, would changing the size of partitions resolve this? Since the /dev/md/raid1 header would be at a different location... right? | 12:37:49 |
lassulus | not sure, probably | 12:38:46 |