| 7 Oct 2025 |
SomeoneSerge (back on matrix) | * Hey, we've been getting more experience with infra and hydra now, and I feel like ephemeral builders are becoming more and more relevant. May I ask you to elaborate? | 19:39:48 |
SomeoneSerge (back on matrix) | mx250 only? | 19:40:33 |
Lun | gfx906 is barely cared for by upstream so not sure they'd be great for automated testing
If we have credits that would otherwise go to waste and that's the only option then maybe worth it? | 19:41:00 |
Lun | gfx90a (MI210/250) is the oldest Instinct option upstream seems to actually be paying attention to | 19:41:26 |
SomeoneSerge (back on matrix) | Relatable: azure offers the right hw for us, but I'm not confident we can utilize it efficiently enough yet | 19:44:47 |
Lun | I see an NGads V620-series option on a different page which is supposedly gfx1030 (probably rebranded W6800 cards) | 19:47:01 |
| 8 Oct 2025 |
connor (he/him) | I’ll try to get it cleaned up and pushed
Broadly I used nixos-anywhere to install machines provisioned with Ubuntu because I didn’t want to deal with blob storage accounts and VHDs (though it should be very doable to produce images)
IIRC the tricky part was finding the kernel modules missing for the HB series (I never got around to packaging the Mellanox drivers, but whatever, they still have very fast IP connections) | 15:23:50 |
connor (he/him) | Thankfully Azure offers serial console through their web console so I was able to debug that (shout out to @jmbaur for being an absolute saint and walking me through the kernel side of stuff) | 15:25:29 |
SomeoneSerge (back on matrix) |
(though it should be very doable to produce images)
I only tried once and, well, producing images is trivial of course, but making Azure consume them... I got completely lost somewhere between "Azure Compute Galleries" and "x64 vs arm64 disks"
| 15:40:15 |
connor (he/him) | I swear at some point in https://github.com/ConnorBaker/nix-cuda-test I had written scripts to create and upload VHDs, provision Azure instances, and do builds on them; the goal being to then have scripts which provision Lambda Labs instances which pull in and run the builds to do GPU testing (since it’s cheaper than Azure GPU instances) | 16:05:41 |
connor (he/him) | Oh yeah lmao https://github.com/ConnorBaker/nix-cuda-test/blob/238062c23d1ec87cd1146652e5dde9c1cd02ff9c/.github/workflows/azure-vm-create.yaml#L7 | 16:06:42 |
connor (he/him) | I got tired of writing terraform configs and decided to just use the azure CLI. That’s probably fine for provisioning or whatever and now I do have NixOS-anywhere working lol | 16:10:56 |
connor (he/him) | I remember when I tried doing that the azure support in NixOS wasn’t great and I never got past the kernel panics I figured out with Jared at nix camp | 16:11:30 |
SomeoneSerge (back on matrix) | YES THAT WAS MY CONCLUSION AFTER ME AND MY FRIEND SPENT TWO (2!!!!!!!) DAYS FIGHTING TERRAFORM | 16:41:27 |
SomeoneSerge (back on matrix) | Cloud is insane | 16:41:34 |
SomeoneSerge (back on matrix) | Like sure I'm holding it wrong, but also it is insane | 16:41:46 |
connor (he/him) | Okay I put it here: https://github.com/ConnorBaker/nixos-configs/tree/feat/azure-remote-builders | 18:26:59 |
connor (he/him) | I've only tested with the HBv3 series. Would spin up an instance with Ubuntu through the web interface, then use nixos-anywhere to deploy. Since I'm using sops for key management, I need to pass --extra-files and give it a path containing a persist directory (since I'm using impermanence), so for example /Volumes/nixos-azure01 should have only /Volumes/nixos-azure01/persist/etc/ssh/ssh_host_ed25519_key in it | 18:29:25 |
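A minimal sketch of the staging step described above, in Python. It only lays out the directory that would be handed to `--extra-files` and assembles a command line; it does not run anything. The flake attribute `.#nixos-azure01`, the target `root@203.0.113.10`, and the helper names are hypothetical, and the `--flake` plus positional SSH target form is an assumption about how nixos-anywhere is invoked here (newer releases may prefer an explicit target-host flag).

```python
import tempfile
from pathlib import Path

# Layout from the message above: with impermanence, the host key lives under /persist.
HOST_KEY_RELPATH = Path("persist/etc/ssh/ssh_host_ed25519_key")


def stage_extra_files(root: Path, host_key: bytes) -> Path:
    """Write the SSH host key into root so that it is the only file there."""
    key_path = root / HOST_KEY_RELPATH
    key_path.parent.mkdir(parents=True, exist_ok=True)
    key_path.write_bytes(host_key)
    key_path.chmod(0o600)  # private keys should not be group/world readable
    return key_path


def list_files(root: Path) -> list[str]:
    """Return every regular file under root, relative to root."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def nixos_anywhere_argv(flake_ref: str, target: str, extra_files: Path) -> list[str]:
    """Assemble (but do not run) a nixos-anywhere command line.

    Assumption: `--flake <ref> <ssh-target>` is the invocation form in use.
    """
    return [
        "nixos-anywhere",
        "--extra-files", str(extra_files),
        "--flake", flake_ref,
        target,
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        stage_extra_files(root, b"placeholder, not a real key")
        print(list_files(root))
        # Hypothetical flake attribute and address, for illustration only.
        print(nixos_anywhere_argv(".#nixos-azure01", "root@203.0.113.10", root))
```

Checking `list_files` before deploying is a cheap way to confirm the directory contains only the host key, since everything under it gets copied onto the target.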
| 9 Oct 2025 |
connor (he/him) | Ugh my head | 07:08:21 |
connor (he/him) | SomeoneSerge (back on matrix): I hope to have the CUDA 13 PR ready for review in the next 24h | 07:08:49 |
SomeoneSerge (back on matrix) | Looking forward to review! | 11:35:34 |
SomeoneSerge (back on matrix) | * Looking forward to review (and rebase my shit)! | 11:35:57 |
connor (he/him) | Here's an example of using the output of the diff part of nix-nixpkgs-review to generate release-cuda.nix: https://github.com/NixOS/nixpkgs/pull/450477/commits/0b971ca46608e58381a8613dc52306da2f242311 | 22:42:59 |
connor (he/him) | Okay I think the CUDA 13 PR is ready: https://github.com/NixOS/nixpkgs/pull/437723
And by that I mean I'm exhausted and don't really want to think about it any more | 22:47:32 |
connor (he/him) | TL;DR: expect basically nothing in-tree to work with CUDA 13. If it does, rejoice! | 22:48:06 |
connor (he/him) | I'm currently running nixpkgs-review on x86_64-linux | 22:55:29 |
connor (he/him) | kill me cudaPackages_13.saxpy doesn't build
cuda13.0-saxpy> CMake Error in CMakeLists.txt:
cuda13.0-saxpy> Imported target "CUDA::cublas" includes non-existent path
cuda13.0-saxpy>
cuda13.0-saxpy> "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl"
cuda13.0-saxpy>
cuda13.0-saxpy> in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
cuda13.0-saxpy>
cuda13.0-saxpy> * The path was deleted, renamed, or moved to another location.
cuda13.0-saxpy>
cuda13.0-saxpy> * An install or uninstall procedure did not complete successfully.
cuda13.0-saxpy>
cuda13.0-saxpy> * The installation package was faulty and references files it does not
cuda13.0-saxpy> provide.
| 23:04:06 |
connor (he/him) | nvcc.profile gets patched from SYSTEM_INCLUDES += "-isystem" "$(TOP)/$(_TARGET_DIR_)/include/cccl" $(_SPACE_) to SYSTEM_INCLUDES += "-isystem" "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl" $(_SPACE_) 🥴 | 23:32:04 |
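A rough Python sketch of why that rewrite produces the CMake error quoted earlier. It assumes the patching amounts to a blanket textual substitution of `$(TOP)/$(_TARGET_DIR_)` with the cuda_nvcc store path; the actual mechanism in nixpkgs is not shown in this log, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# The make-style prefix that appears in the unpatched nvcc.profile line.
TOP_TARGET = "$(TOP)/$(_TARGET_DIR_)"


def patch_profile_line(line: str, store_path: str) -> str:
    """Blanket-substitute the prefix with one store path (assumed behaviour)."""
    return line.replace(TOP_TARGET, store_path)


def isystem_paths(line: str) -> list[str]:
    """Extract the quoted directories that follow a quoted "-isystem" flag."""
    return re.findall(r'"-isystem"\s+"([^"]+)"', line)


def missing_paths(paths: list[str]) -> list[str]:
    """Return the include directories that do not exist on disk."""
    return [p for p in paths if not Path(p).is_dir()]


if __name__ == "__main__":
    original = (
        'SYSTEM_INCLUDES += "-isystem" '
        '"$(TOP)/$(_TARGET_DIR_)/include/cccl" $(_SPACE_)'
    )
    nvcc = "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88"
    patched = patch_profile_line(original, nvcc)
    print(patched)
    # Any path listed here would trigger the "includes non-existent path" error.
    print(missing_paths(isystem_paths(patched)))
```

The substitution itself is harmless for paths that really live in the cuda_nvcc output; it only goes wrong for `include/cccl`, which the CMake error reports as absent from that store path.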
connor (he/him) | https://github.com/NixOS/nixpkgs/pull/437723/commits/ffead29ec174980fbcc2ac610195f64328856705 | 23:42:31 |