| 3 Dec 2025 |
Gaétan Lepage | Ok makes sense! Done. | 19:23:21 |
Gaétan Lepage | connor (burnt/out) (UTC-8) the onnx bump should be good to go. | 20:50:50 |
teto | I wonder what priority to set the cache.nixos-cuda.org cache to? It's the default on the webpage, but shouldn't a typical user fetch from cache.nixos.org first? In terms of cost etc., is there any preference? | 21:17:23 |
hexa (UTC+1) | the priority is set on the server-side no? | 21:18:07 |
hexa (UTC+1) | and yes, you'll always want to prefer c.n.o | 21:18:28 |
Gaétan Lepage | You can also append ?priority=3 to the substituters in /etc/nix/nix.conf | 21:20:09 |
hexa (UTC+1) | ah ok | 21:20:19 |
teto | ha yeah the server shows 50 as default https://cache.nixos-cuda.org/ so I have nothing to do nice :) | 22:22:24 |
teto | (I was indeed thinking of ?priority ) | 22:22:39 |
hexa (UTC+1) | it does now 😛 | 22:24:16 |
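For reference, a minimal sketch of the `?priority` approach discussed above, written as a NixOS module fragment rather than a raw /etc/nix/nix.conf edit. It assumes the usual Nix convention that a lower priority number is tried first, and that cache.nixos.org advertises priority 40. The value 50 simply mirrors what the server now reports, so the query parameter is only needed to override it. The public key is deliberately not filled in.

```nix
{
  nix.settings = {
    # NixOS already appends https://cache.nixos.org to this list by default,
    # so only the extra cache is listed here.
    substituters = [
      # Lower number = preferred. 50 ranks this cache behind cache.nixos.org
      # (assumed to advertise 40), so it is only consulted on a miss there.
      "https://cache.nixos-cuda.org?priority=50"
    ];

    # The cache's signing key must also be trusted. Copy the real key from
    # https://cache.nixos-cuda.org/ ; it is intentionally omitted here.
    # trusted-public-keys = [ "..." ];
  };
}
```

Since the server-side default is already 50, leaving off `?priority=50` should behave the same; the parameter matters only if a different ordering is wanted.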
| corroding556 joined the room. | 23:55:01 |
| 4 Dec 2025 |
corroding556 | Hi all! Very much appreciate the work that's been put into CUDA support in nixpkgs/NixOS. Recently updated my system configuration to a more recent version of nixpkgs and had to pin cudaCapabilities to 6.1 now that CUDA 13.0 has dropped support for Pascal, started getting some confusing build failures as a result. Spent several hours looking into how the CUDA packaging ecosystem works only to realize using --trace-verbose gave the answer straight up >.<.
It seems nixpkgs updating to use cuDNN 9.13 means that other packages pulling in cudaPackages_12_{6,8,9} no longer support compute capabilities < 7.5 even though CUDA supports compute capabilities >= 5.0 up until the jump to 13.0.
I noticed 9.13 is not the only version in nixpkgs though, what is the strategy around how many legacy versions of CUDA packages to maintain in nixpkgs? Does it make sense to add cuDNN 9.11 as a pinned version to bridge the gap since 9.12 has dropped support for compute capabilities < 7.5? If that's not appropriate 8.9.7 is the most recent version available in nixpkgs which still supports my hardware, how would I/how reasonable is it to force my config to use that?
Sorry for all the questions, appreciate any advice 😅
| 01:57:08 |
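A minimal sketch of the capability pinning described in the message above, assuming a plain `import <nixpkgs>` entry point (a flake or NixOS `nixpkgs.config` would carry the same `config` attributes). It only shows how `cudaCapabilities` is set; it does not work around the cuDNN limitation, because libraries that dropped a capability still will not build for it.

```nix
# Evaluates to a package set built with CUDA support for Pascal (6.1) only.
import <nixpkgs> {
  config = {
    allowUnfree = true;           # CUDA packages are unfree
    cudaSupport = true;           # enable CUDA in packages that support it
    cudaCapabilities = [ "6.1" ]; # the capability pinned in the message above
  };
}
```

Evaluating with `--trace-verbose`, as mentioned in the message, is what surfaced the reason a package refused the requested capability.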
Alexandros Liarokapis | any idea what is the difference between torch-bin and torchWithCuda ? | 12:28:06 |
Robbie Buxton | In reply to @aliarokapis:matrix.org any idea what is the difference between torch-bin and torchWithCuda ? Iirc torch-bin is torch not built from source and torchWithCuda is torch built from source with cuda enabled forced regardless of global configuration? | 13:34:25 |
Gaétan Lepage | Yes, this is it. | 13:46:25 |
Alexandros Liarokapis | and it is apparently in the nixos cache by default? | 14:05:18 |
Gaétan Lepage | I'm not sure torchWithCuda will be.
For `cudaSupport`-enabled packages, consider using the Flox binary cache, or the NixOS-CUDA one. | 14:28:52 |
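To make the distinction above concrete, here is a hedged sketch of a development shell that picks one of the two attributes. It assumes both live under `python3Packages` in the nixpkgs revision in use; whether either is actually available from a binary cache depends on the cache, as noted above.

```nix
let
  # torch-bin wraps upstream's prebuilt wheels, which carry an unfree license.
  pkgs = import <nixpkgs> { config.allowUnfree = true; };
in
pkgs.mkShell {
  packages = [
    (pkgs.python3.withPackages (ps: [
      # Prebuilt upstream binary, not compiled from source.
      ps.torch-bin

      # Alternative: torch built from source with CUDA forced on, regardless
      # of the global cudaSupport setting. Expect a long local build if no
      # binary cache provides it.
      # ps.torchWithCuda
    ]))
  ];
}
```

Only one of the two should be enabled at a time, since both provide the same `torch` Python module.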
connor (burnt/out) (UTC-8) | I’ll try to answer this later today. Depending on how comfortable you are with Nix, pull in the overlay for CUDA-legacy (https://github.com/nixos-cuda/cuda-legacy) to add a bunch of manifests and then customize the package set to your liking by using override on the CUDA package set and providing the manifest version you want. The docs are lacking an example for this.
As you discovered, NVCC may support capabilities but that doesn’t mean the big libraries most people use (cuDNN, libcublas, TensorRT, etc.) do. We have the unenviable job of either adopting the latest release for each version or fixing them in time and never updating. The decision is made more difficult by the fact NVIDIA seems to fix bugs by doing major/minor releases much more often than patch releases.
The trace-verbose thing is handy but undocumented and only exists because implementations of the Problems RFC keep getting bikeshedded to death.
We should probably have a section in the CUDA docs which list supported capabilities for each package set. Could be automatically generated given I added the available capabilities for each release to backendStdenv.
| 16:28:20 |
connor (burnt/out) (UTC-8) | god i hate computers | 16:29:35 |
connor (burnt/out) (UTC-8) | Reminder to self: post about changes I’ve been working on / need (fix adding attributes to backendStdenv, nvcc multiple outputs again, ccache) | 16:33:13 |
SomeoneSerge (back on matrix) |
problems rfc
The release was cut off, IMO we should push a half-assed partial impl as per my closed PR in, bc 99% of the features we don't care about
| 19:52:52 |
SomeoneSerge (back on matrix) |
stdenv attributes
...also nuke the other 90% attributes that don't actually belong and aren't even used
| 19:53:40 |
| 4 Aug 2022 |
| Winter (she/her) joined the room. | 03:26:42 |
Winter (she/her) | (hi, just came here to read + respond to this.) | 03:28:52 |
tpw_rules | hey. i had previously sympathized with samuela and like i said before had some of the same frustrations. i just edited my github comment to add "[CUDA] packages are universally complicated, fragile to package, and critical to daily operations. Nix being able to manage them is unbelievably helpful to those of us who work with them regularly, even if support is downgraded to only having an expectation of function on stable branches." | 03:29:14 |
Winter (she/her) | In reply to @tpw_rules:matrix.org i'm mildly peeved about a recent merging of something i maintain where i'm pretty sure the merger does not own the expensive hardware required to properly test the package. i don't think it broke anything but i was given precisely 45 minutes to see the notification before somebody merged it ugh, 45 minutes? that's... not great. not to air dirty laundry but did you do what samuela did in the wandb PR and at least say that that wasn't a great thing to do? (not sure how else to word that, you get what i mean) | 03:30:23 |
tpw_rules | no, i haven't yet, but i probably will | 03:31:03 |
Winter (she/her) | i admittedly did that with a PR once, i forget how long the maintainer was requested for but i merged it because multiple people reported it fixed the issue. the maintainer said "hey, don't do that" after and now i do think twice before merging. so it could help, is what i'm saying. | 03:31:50 |
tpw_rules | i'm not sure what went wrong with the wandb PR anyway, i think it was just a boneheaded move on the maintainer's part | 03:32:10 |