| 7 Oct 2025 |
Lun | I see an NGads V620-series option on a different page, which is supposedly gfx1030 (probably rebranded W6800 cards) | 19:47:01 |
| 8 Oct 2025 |
connor (he/him) | I’ll try to get it cleaned up and pushed.
Broadly, I used nixos-anywhere to install NixOS onto machines provisioned with Ubuntu, because I didn’t want to deal with blob storage accounts and VHDs (though it should be very doable to produce images).
IIRC the tricky part was figuring out which kernel modules were missing for the HB series (I never got around to packaging the Mellanox drivers, but whatever, they still have very fast IP connections) | 15:23:50 |
connor (he/him) | Thankfully Azure offers a serial console through their web console, so I was able to debug that (shout out to @jmbaur for being an absolute saint and walking me through the kernel side of stuff) | 15:25:29 |
SomeoneSerge (back on matrix) |
(though it should be very doable to produce images)
I only tried once and, well, producing images is trivial of course, but making Azure consume them... I got completely lost somewhere between "Azure Compute Galleries" and "x64 vs arm64 disks"
| 15:40:15 |
connor (he/him) | I swear at some point in https://github.com/ConnorBaker/nix-cuda-test I had written scripts to create and upload VHDs, provision Azure instances, and do builds on them; the goal was to then have scripts that provision Lambda Labs instances, which pull in and run the builds for GPU testing (since Lambda Labs is cheaper than Azure GPU instances) | 16:05:41 |
connor (he/him) | Oh yeah lmao https://github.com/ConnorBaker/nix-cuda-test/blob/238062c23d1ec87cd1146652e5dde9c1cd02ff9c/.github/workflows/azure-vm-create.yaml#L7 | 16:06:42 |
connor (he/him) | I got tired of writing Terraform configs and decided to just use the Azure CLI. That’s probably fine for provisioning or whatever, and now I do have nixos-anywhere working lol | 16:10:56 |
connor (he/him) | I remember that when I tried doing that, the Azure support in NixOS wasn’t great, and I never got past the kernel panics I later figured out with Jared at nix camp | 16:11:30 |
SomeoneSerge (back on matrix) | YES THAT WAS MY CONCLUSION AFTER MY FRIEND AND I SPENT TWO (2!!!!!!!) DAYS FIGHTING TERRAFORM | 16:41:27 |
SomeoneSerge (back on matrix) | Cloud is insane | 16:41:34 |
SomeoneSerge (back on matrix) | Like sure I'm holding it wrong, but also it is insane | 16:41:46 |
connor (he/him) | Okay I put it here: https://github.com/ConnorBaker/nixos-configs/tree/feat/azure-remote-builders | 18:26:59 |
connor (he/him) | I've only tested with the HBv3 series. I would spin up an instance with Ubuntu through the web interface, then use nixos-anywhere to deploy. Since I'm using sops for key management, I need to pass --extra-files and give it a path containing a persist directory (since I'm using impermanence). For example, /Volumes/nixos-azure01 should contain only /Volumes/nixos-azure01/persist/etc/ssh/ssh_host_ed25519_key | 18:29:25 |
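A minimal sketch of staging the `--extra-files` tree described above, in Python. Assumptions: the host key already exists locally, and `stage_extra_files`/`staged_files` are hypothetical helper names, not part of nixos-anywhere or sops. As I understand the nixos-anywhere docs, the contents of the `--extra-files` directory are copied into `/` on the target, so with impermanence the key has to sit under `persist/`.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def stage_extra_files(root: Path, host_key: Path) -> Path:
    """Build the tree to pass as `nixos-anywhere --extra-files <root>`.

    Hypothetical helper: with impermanence the SSH host key lives under
    /persist, so the staged tree holds only persist/etc/ssh/<key>.
    """
    ssh_dir = root / "persist" / "etc" / "ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)
    dest = ssh_dir / host_key.name
    shutil.copyfile(host_key, dest)
    # Private host keys are normally mode 0600; OpenSSH may refuse
    # keys with looser permissions, so restrict to owner read/write.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest


def staged_files(root: Path) -> list[str]:
    """List every regular file under root, relative to root."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Stand-in for a real ed25519 private key (hypothetical content).
        key = tmp_path / "ssh_host_ed25519_key"
        key.write_text("dummy-key\n")
        root = tmp_path / "nixos-azure01"
        stage_extra_files(root, key)
        print(staged_files(root))
```

The staged root (the equivalent of /Volumes/nixos-azure01 here) is then the path handed to `nixos-anywhere --extra-files`, and it should list nothing but `persist/etc/ssh/ssh_host_ed25519_key`.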
| 9 Oct 2025 |
connor (he/him) | Ugh my head | 07:08:21 |
connor (he/him) | SomeoneSerge (back on matrix): I hope to have the CUDA 13 PR ready for review in the next 24h | 07:08:49 |
SomeoneSerge (back on matrix) | Looking forward to review! | 11:35:34 |
SomeoneSerge (back on matrix) | * Looking forward to review (and rebase my shit)! | 11:35:57 |
connor (he/him) | Here's an example of using the output of the diff part of nix-nixpkgs-review to generate release-cuda.nix: https://github.com/NixOS/nixpkgs/pull/450477/commits/0b971ca46608e58381a8613dc52306da2f242311 | 22:42:59 |
connor (he/him) | Okay I think the CUDA 13 PR is ready: https://github.com/NixOS/nixpkgs/pull/437723
And by that I mean I'm exhausted and don't really want to think about it any more | 22:47:32 |
connor (he/him) | TL;DR: expect basically nothing in-tree to work with CUDA 13. If it does, rejoice! | 22:48:06 |
connor (he/him) | I'm currently running nixpkgs-review on x86_64-linux | 22:55:29 |
connor (he/him) | kill me cudaPackages_13.saxpy doesn't build
cuda13.0-saxpy> CMake Error in CMakeLists.txt:
cuda13.0-saxpy> Imported target "CUDA::cublas" includes non-existent path
cuda13.0-saxpy>
cuda13.0-saxpy> "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl"
cuda13.0-saxpy>
cuda13.0-saxpy> in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
cuda13.0-saxpy>
cuda13.0-saxpy> * The path was deleted, renamed, or moved to another location.
cuda13.0-saxpy>
cuda13.0-saxpy> * An install or uninstall procedure did not complete successfully.
cuda13.0-saxpy>
cuda13.0-saxpy> * The installation package was faulty and references files it does not
cuda13.0-saxpy> provide.
| 23:04:06 |
connor (he/him) | nvcc.profile gets patched from SYSTEM_INCLUDES += "-isystem" "$(TOP)/$(_TARGET_DIR_)/include/cccl" $(_SPACE_) to SYSTEM_INCLUDES += "-isystem" "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl" $(_SPACE_) 🥴 | 23:32:04 |
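A minimal sketch of a sanity check for the failure above, assuming the `"-isystem" "<dir>"` quoting shown in the patched line; `isystem_paths` and `missing_isystem_paths` are hypothetical names, and this is not necessarily how it was fixed upstream. It flags any concrete `-isystem` directory on a `SYSTEM_INCLUDES` line that doesn't exist on disk, which is the same condition CMake tripped over for `include/cccl` under the cuda_nvcc store path.

```python
import re
from pathlib import Path

# Matches "-isystem" "<path>" pairs as they appear on nvcc.profile's
# SYSTEM_INCLUDES line (assumes double-quoted arguments, as in the log).
ISYSTEM_RE = re.compile(r'"-isystem"\s+"([^"]+)"')


def isystem_paths(profile_text: str) -> list[str]:
    """Return every -isystem directory named on SYSTEM_INCLUDES lines."""
    paths: list[str] = []
    for line in profile_text.splitlines():
        if line.lstrip().startswith("SYSTEM_INCLUDES"):
            paths.extend(ISYSTEM_RE.findall(line))
    return paths


def missing_isystem_paths(profile_text: str) -> list[str]:
    """Return -isystem directories that don't exist on disk.

    Paths that still contain unexpanded $(...) variables are skipped,
    since nvcc expands those itself when it reads the profile.
    """
    return [
        p
        for p in isystem_paths(profile_text)
        if "$(" not in p and not Path(p).is_dir()
    ]
```

Running `missing_isystem_paths` over the text of the installed nvcc.profile in a post-install check would have surfaced the bad store path before any CMake consumer such as saxpy did.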
connor (he/him) | https://github.com/NixOS/nixpkgs/pull/437723/commits/ffead29ec174980fbcc2ac610195f64328856705 | 23:42:31 |
| 10 Oct 2025 |
connor (he/him) | cuda-legacy is going to be such a pain in the ass if the roughly nine hours I just spent trying to build PyTorch against CUDA 11.4 is any indication | 23:25:40 |
connor (he/him) | (I was not successful; will resume trying with PyTorch 2.6 instead of 2.7 later) | 23:26:30 |
| 11 Oct 2025 |
Tristan Ross | Hey, connor (he/him) (UTC-7) & SomeoneSerge (back on matrix). Either of you wanna collab on getting Tenstorrent support into nixpkgs? I'm the only one working on it, but since this is in the realm of AI, ML, and GPU-like computing, I think it would make sense to involve people already touching that stuff. | 02:29:45 |
connor (he/him) | I’d love to but I don’t have time :( | 15:37:38 |
Gaétan Lepage | FYI: I'm working on bumping onnx[runtime] in https://github.com/NixOS/nixpkgs/pull/450587
However, the build fails... More investigation needed. | 16:20:35 |