| 7 Oct 2025 |
connor (burnt/out) (UTC-8) | https://forge.someonex.net/else/sidx | 15:40:26 |
SomeoneSerge (back on matrix) | connor (he/him) (UTC-7): ping about azure stuff again | 19:30:18 |
SomeoneSerge (back on matrix) | Hey, we've been getting more experience with infra and hydra now, and I think like ephemeral builders are becoming more and more relevant. May I ask you to elaborate? | 19:34:01 |
SomeoneSerge (back on matrix) | * connor (he/him) (UTC-7): ping about azure images again | 19:34:34 |
SomeoneSerge (back on matrix) | MI25s to test ROCm? CC Lun https://docs.azure.cn/en-us/virtual-machines/sizes/gpu-accelerated/nv-family | 19:39:24 |
Lun | Too old | 19:39:46 |
SomeoneSerge (back on matrix) | * Hey, we've been getting more experience with infra and hydra now, and I feel like ephemeral builders are becoming more and more relevant. May I ask you to elaborate? | 19:39:48 |
SomeoneSerge (back on matrix) | mx250 only? | 19:40:33 |
Lun | gfx906 is barely cared for by upstream so not sure they'd be great for automated testing
If we have credits that would otherwise go to waste and that's the only option then maybe worth it? | 19:41:00 |
Lun | gfx90a (MI210/250) is the oldest Instinct option upstream seems to actually be paying attention to | 19:41:26 |
SomeoneSerge (back on matrix) | Relatable: azure offers the right hw for us, but I'm not confident we can utilize it efficiently enough yet | 19:44:47 |
Lun | I see an NGads V620-series option on a different page which is supposedly gfx1030 (probably rebranded W6800 cards) | 19:47:01 |
| 8 Oct 2025 |
connor (burnt/out) (UTC-8) | I’ll try to get it cleaned up and pushed
Broadly, I used nixos-anywhere to install machines provisioned with Ubuntu because I didn’t want to deal with blob storage accounts and VHDs (though it should be very doable to produce images)
IIRC the tricky part was finding the kernel modules missing for the HB series (I never got around to packaging the mellanox drivers but whatever they still have very fast IP connections) | 15:23:50 |
connor (burnt/out) (UTC-8) | Thankfully Azure offers serial console through their web console so I was able to debug that (shout out to @jmbaur for being an absolute saint and walking me through the kernel side of stuff) | 15:25:29 |
SomeoneSerge (back on matrix) |
(though it should be very doable to produce images)
I only tried once and, well, producing images is trivial of course, but making azure consume them... I got completely lost somewhere between "Azure Compute Galleries" and "x64 vs arm64 disks"
| 15:40:15 |
connor (burnt/out) (UTC-8) | I swear at some point in https://github.com/ConnorBaker/nix-cuda-test I had written scripts to create and upload VHDs, provision Azure instances, and do builds on them; the goal being to then have scripts which provision Lambda Labs instances which pull in and run the builds to do GPU testing (since it’s cheaper than Azure GPU instances) | 16:05:41 |
connor (burnt/out) (UTC-8) | Oh yeah lmao https://github.com/ConnorBaker/nix-cuda-test/blob/238062c23d1ec87cd1146652e5dde9c1cd02ff9c/.github/workflows/azure-vm-create.yaml#L7 | 16:06:42 |
connor (burnt/out) (UTC-8) | I got tired of writing terraform configs and decided to just use the azure CLI. That’s probably fine for provisioning or whatever and now I do have NixOS-anywhere working lol | 16:10:56 |
connor (burnt/out) (UTC-8) | I remember when I tried doing that the azure support in NixOS wasn’t great and I never got past the kernel panics I figured out with Jared at nix camp | 16:11:30 |
SomeoneSerge (back on matrix) | YES THAT WAS MY CONCLUSION AFTER ME AND MY FRIEND SPENT TWO (2!!!!!!!) DAYS FIGHTING TERRAFORM | 16:41:27 |
SomeoneSerge (back on matrix) | Cloud is insane | 16:41:34 |
SomeoneSerge (back on matrix) | Like sure I'm holding it wrong, but also it is insane | 16:41:46 |
connor (burnt/out) (UTC-8) | Okay I put it here: https://github.com/ConnorBaker/nixos-configs/tree/feat/azure-remote-builders | 18:26:59 |
connor (burnt/out) (UTC-8) | I've only tested with the HBv3 series. I would spin up an instance with Ubuntu through the web interface, then use nixos-anywhere to deploy. Since I'm using sops for key management, I need to pass --extra-files and give it a path containing a persist directory (since I'm using impermanence); so, for example, /Volumes/nixos-azure01 should have only /Volumes/nixos-azure01/persist/etc/ssh/ssh_host_ed25519_key in it | 18:29:25 |
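Note: the --extra-files layout described in the message above can be sketched roughly as follows. This is only an illustration, not connor's actual tooling: the helper names `stage_extra_files` and `nixos_anywhere_argv`, the flake attribute `.#nixos-azure01`, and the target address are assumptions made for the example, and the script only prints the command instead of running it.

```python
import shlex
import tempfile
from pathlib import Path

# Location of the host key relative to the --extra-files root, as described
# in the message above (the persist/ prefix comes from the impermanence setup).
HOST_KEY_RELPATH = Path("persist/etc/ssh/ssh_host_ed25519_key")


def stage_extra_files(root: Path, host_key: bytes) -> Path:
    """Write the host key under root so that root holds only that one file."""
    key_path = root / HOST_KEY_RELPATH
    key_path.parent.mkdir(parents=True, exist_ok=True)
    key_path.write_bytes(host_key)
    key_path.chmod(0o600)  # keep the private key readable by its owner only
    return key_path


def nixos_anywhere_argv(flake_attr: str, extra_files: Path, target: str) -> list[str]:
    """Build a nixos-anywhere command line; nothing is executed here."""
    return [
        "nixos-anywhere",
        "--extra-files", str(extra_files),
        "--flake", flake_attr,
        target,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder bytes; a real run would use the decrypted host key.
        stage_extra_files(Path(tmp), b"PLACEHOLDER, not a real key\n")
        # Hypothetical flake attribute and target address.
        argv = nixos_anywhere_argv(".#nixos-azure01", Path(tmp), "root@203.0.113.10")
        print(shlex.join(argv))
```

Staging the key in a fresh temporary directory keeps the extra-files tree limited to the single host key, which matches the "should have only" requirement in the message.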
| 9 Oct 2025 |
connor (burnt/out) (UTC-8) | Ugh my head | 07:08:21 |
connor (burnt/out) (UTC-8) | SomeoneSerge (back on matrix): I hope to have the CUDA 13 PR ready for review in the next 24h | 07:08:49 |
SomeoneSerge (back on matrix) | Looking forward to review! | 11:35:34 |
SomeoneSerge (back on matrix) | * Looking forward to review (and rebase my shit)! | 11:35:57 |
connor (burnt/out) (UTC-8) | Here's an example of using the output of the diff part of nix-nixpkgs-review to generate release-cuda.nix: https://github.com/NixOS/nixpkgs/pull/450477/commits/0b971ca46608e58381a8613dc52306da2f242311 | 22:42:59 |