NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

7 Oct 2025
@lt1379:matrix.orgLun I see an NGads V620-series option on a different page which is supposedly gfx1030 (probably rebranded W6800 cards) 19:47:01
8 Oct 2025
@connorbaker:matrix.orgconnor (he/him)I’ll try to get it cleaned up and pushed
Broadly I used NixOS-anywhere to install machines provisioned with Ubuntu because I didn’t want to deal with blob storage accounts and VHDs (though it should be very doable to produce images)
IIRC the tricky part was finding the kernel modules missing for the HB series (I never got around to packaging the Mellanox drivers but whatever, they still have very fast IP connections)
15:23:50
@connorbaker:matrix.orgconnor (he/him)Thankfully Azure offers serial console through their web console so I was able to debug that (shout out to @jmbaur for being an absolute saint and walking me through the kernel side of stuff)15:25:29
@ss:someonex.netSomeoneSerge (back on matrix)

(though it should be very doable to produce images)

I only tried once and, well, producing images is trivial of course, but making Azure consume them... I got completely lost somewhere between "Azure Compute Galleries" and "x64 vs arm64 disks"

15:40:15
@connorbaker:matrix.orgconnor (he/him)I swear at some point in https://github.com/ConnorBaker/nix-cuda-test I had written scripts to create and upload VHDs, provision Azure instances, and do builds on them; the goal being to then have scripts which provision Lambda Labs instances which pull in and run the builds to do GPU testing (since it’s cheaper than Azure GPU instances)16:05:41
@connorbaker:matrix.orgconnor (he/him)Oh yeah lmao https://github.com/ConnorBaker/nix-cuda-test/blob/238062c23d1ec87cd1146652e5dde9c1cd02ff9c/.github/workflows/azure-vm-create.yaml#L7 16:06:42
@connorbaker:matrix.orgconnor (he/him)I got tired of writing terraform configs and decided to just use the azure CLI. That’s probably fine for provisioning or whatever and now I do have NixOS-anywhere working lol16:10:56
@connorbaker:matrix.orgconnor (he/him)I remember when I tried doing that the azure support in NixOS wasn’t great and I never got past the kernel panics I figured out with Jared at nix camp16:11:30
@ss:someonex.netSomeoneSerge (back on matrix)YES THAT WAS MY CONCLUSION AFTER ME AND MY FRIEND SPENT TWO (2!!!!!!!) DAYS FIGHTING TERRAFORM 16:41:27
@ss:someonex.netSomeoneSerge (back on matrix)Cloud is insane16:41:34
@ss:someonex.netSomeoneSerge (back on matrix)Like sure I'm holding it wrong, but also it is insane16:41:46
@connorbaker:matrix.orgconnor (he/him)Okay I put it here: https://github.com/ConnorBaker/nixos-configs/tree/feat/azure-remote-builders 18:26:59
@connorbaker:matrix.orgconnor (he/him) I've only tested with the HBv3 series
Would spin up an instance with Ubuntu through the web interface, then use nixos-anywhere to deploy
Since I'm using sops for key management, I need to pass --extra-files and give it a path containing a persist directory (since I'm using impermanence), so for example /Volumes/nixos-azure01 should have only /Volumes/nixos-azure01/persist/etc/ssh/ssh_host_ed25519_key in it
18:29:25
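
The staging layout described above can be sketched as follows. This is only an illustration under stated assumptions: a temporary directory stands in for /Volumes/nixos-azure01, a placeholder file stands in for the real host key, and the flake attribute (`.#nixos-azure01`) and target address (`root@203.0.113.10`) are hypothetical. The final command is printed rather than executed.

```shell
set -eu

# Assumption: a scratch directory stands in for /Volumes/nixos-azure01.
stage="$(mktemp -d)"

# nixos-anywhere copies the contents of the --extra-files directory onto the
# root of the new system, so the key is placed at the path it should have on
# the target (under /persist because of impermanence).
key_dir="$stage/persist/etc/ssh"
mkdir -p "$key_dir"

# Placeholder content only; in practice this would be the real host key that
# sops is configured to decrypt with.
printf 'PLACEHOLDER-HOST-KEY\n' > "$key_dir/ssh_host_ed25519_key"
chmod 600 "$key_dir/ssh_host_ed25519_key"

# Hypothetical flake attribute and target host; shown, not run.
cmd="nixos-anywhere --extra-files $stage --flake .#nixos-azure01 root@203.0.113.10"
echo "$cmd"
```

Keeping the staging directory limited to that single file means nothing else is copied onto the new root by accident.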
9 Oct 2025
@connorbaker:matrix.orgconnor (he/him)Ugh my head07:08:21
@connorbaker:matrix.orgconnor (he/him) SomeoneSerge (back on matrix)I hope to have the CUDA 13 PR ready for review in the next 24h 07:08:49
@ss:someonex.netSomeoneSerge (back on matrix)Looking forward to review!11:35:34
@ss:someonex.netSomeoneSerge (back on matrix)* Looking forward to review (and rebase my shit)!11:35:57
@connorbaker:matrix.orgconnor (he/him) Here's an example of using the output of the diff part of nix-nixpkgs-review to generate release-cuda.nix: https://github.com/NixOS/nixpkgs/pull/450477/commits/0b971ca46608e58381a8613dc52306da2f242311 22:42:59
@connorbaker:matrix.orgconnor (he/him)Okay I think the CUDA 13 PR is ready: https://github.com/NixOS/nixpkgs/pull/437723 And by that I mean I'm exhausted and don't really want to think about it any more22:47:32
@connorbaker:matrix.orgconnor (he/him)TL;DR: expect basically nothing in-tree to work with CUDA 13. If it does, rejoice!22:48:06
@connorbaker:matrix.orgconnor (he/him)I'm currently running nixpkgs-review on x86_64-linux22:55:29
@connorbaker:matrix.orgconnor (he/him)

kill me cudaPackages_13.saxpy doesn't build

cuda13.0-saxpy> CMake Error in CMakeLists.txt:
cuda13.0-saxpy>   Imported target "CUDA::cublas" includes non-existent path
cuda13.0-saxpy> 
cuda13.0-saxpy>     "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl"
cuda13.0-saxpy> 
cuda13.0-saxpy>   in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:
cuda13.0-saxpy> 
cuda13.0-saxpy>   * The path was deleted, renamed, or moved to another location.
cuda13.0-saxpy> 
cuda13.0-saxpy>   * An install or uninstall procedure did not complete successfully.
cuda13.0-saxpy> 
cuda13.0-saxpy>   * The installation package was faulty and references files it does not
cuda13.0-saxpy>   provide.
23:04:06
@connorbaker:matrix.orgconnor (he/him) nvcc.profile gets patched from
SYSTEM_INCLUDES += "-isystem" "$(TOP)/$(_TARGET_DIR_)/include/cccl" $(_SPACE_)
to
SYSTEM_INCLUDES += "-isystem" "/nix/store/96n5czdjq66csa28ml9s1kwa13xnsbdp-cuda13.0-cuda_nvcc-13.0.88/include/cccl" $(_SPACE_)
🥴
23:32:04
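
The effect of that rewrite can be reproduced in isolation. The sketch below is not the nixpkgs patching logic: the `sed` expression is a hypothetical stand-in, and a scratch directory plays the role of the cuda_nvcc store path. It only shows that pinning `$(TOP)/$(_TARGET_DIR_)` to one fixed root yields an `include/cccl` path that must exist under that same root, which matches the CMake error quoted earlier.

```shell
set -eu

# Assumption: a scratch directory stands in for the cuda_nvcc store path.
nvcc_root="$(mktemp -d)"
mkdir -p "$nvcc_root/include"   # deliberately no include/cccl in this output

# The unpatched nvcc.profile line as quoted in the chat (single quotes keep
# the make-style $(...) references literal).
original='SYSTEM_INCLUDES += "-isystem" "$(TOP)/$(_TARGET_DIR_)/include/cccl" $(_SPACE_)'

# Hypothetical substitution: replace the relocatable prefix with a fixed root.
patched="$(printf '%s\n' "$original" | sed 's|\$(TOP)/\$(_TARGET_DIR_)|'"$nvcc_root"'|')"

# The patched line now points at a directory under the fixed root.
include_dir="$nvcc_root/include/cccl"
if [ -d "$include_dir" ]; then status="present"; else status="missing"; fi

printf '%s\n' "$patched"
printf '%s: %s\n' "$include_dir" "$status"
```

If the headers live in a different output, the hard-coded path dangles, and anything that propagates it (such as an imported CMake target) fails its existence check.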
@connorbaker:matrix.orgconnor (he/him)https://github.com/NixOS/nixpkgs/pull/437723/commits/ffead29ec174980fbcc2ac610195f64328856705 23:42:31
10 Oct 2025
@connorbaker:matrix.orgconnor (he/him) cuda-legacy is going to be such a pain in the ass if the roughly nine hours I just spent trying to build PyTorch against CUDA 11.4 is any indication 23:25:40
@connorbaker:matrix.orgconnor (he/him)(I was not successful; will resume trying with PyTorch 2.6 instead of 2.7 later)23:26:30
11 Oct 2025
@rosscomputerguy:matrix.orgTristan Ross Hey, connor (he/him) (UTC-7) & SomeoneSerge (back on matrix). Either of you wanna collab on getting Tenstorrent support into nixpkgs? I'm the only one working on it but I think since this is in a realm of AI, ML, and GPU-like computing, it would make sense to involve people already touching that stuff. 02:29:45
@connorbaker:matrix.orgconnor (he/him)I’d love to but I don’t have time :(15:37:38
@glepage:matrix.orgGaétan Lepage FYI: I'm working on bumping onnx[runtime] in https://github.com/NixOS/nixpkgs/pull/450587
However, the build fails... More investigation needed.
16:20:35
