| 31 Oct 2025 |
Robbie Buxton | I think also a fair amount of stuff upstream doesn’t even build with cuda 13 yet either | 14:37:54 |
connor (he/him) | Yeah NVIDIA does not care outside of projects they dedicate engineering hours to supporting, and changing the default version of OpenCV or other large projects to a commit from master adding support would be dead on arrival, and trying to special case it just for when CUDA is configured would be difficult. | 14:39:51 |
Daniel Fahey | This is quite a convincing argument to revert the 99 commits https://github.com/NixOS/nixpkgs/pull/437723#issuecomment-3472997390
Maybe there could be a cuda-refactor branch that is continually built and tested by e.g. https://hydra.nixos-cuda.org/jobset/nixpkgs/cuda-refactor while it gets the attention it deserves?
| 15:50:50 |
Daniel Fahey | (all a bit over my current pay grade with my limited Nixpkgs experience though, lol) just really want to express my gratitude to the CUDA Team | 15:52:21 |
Robbie Buxton | My understanding (which may be incorrect) is that CUDA 13 is opt-in, so things will only break if you try to use it instead of the default? | 16:03:44 |
connor (he/him) | Gaétan Lepage, SomeoneSerge (back on matrix): are you okay with merging:
- https://github.com/NixOS/nixpkgs/pull/457338
- https://github.com/NixOS/nixpkgs/pull/457220
I’d like there to be consensus as a team for those reverts to go through. Serge, I know you’re in favor of the config.cudaSupport one, but I’d like to issue the statement/decision as a team.
| 19:40:25 |
connor (he/him) | Correct | 19:46:10 |
connor (he/him) | We don’t have anywhere near the capacity (hardware or labor) to do that on a regular cadence, but that would be nice | 19:47:00 |
apyh | what kind of hardware is needed for reasonably-fast-ish compile cycles? | 19:59:36 |
connor (he/him) | That depends entirely on what you’re building. My suggestion is to compile for exactly the CUDA capabilities you need; the CUDA compiler and linker are incredibly slow, so it helps a lot. | 20:01:29 |
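A minimal sketch of what restricting the build to one capability could look like, assuming a channel-style `<nixpkgs>` import. The capability value `"8.9"` and the choice of `python3Packages.torch` as the target are illustrative assumptions, not a recommended configuration.

```nix
# Sketch only: build torch with CUDA enabled for a single compute capability.
# "8.9" is an example value; list only the capabilities your GPUs actually need.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;            # CUDA packages are unfree
      cudaSupport = true;            # enable CUDA across nixpkgs
      cudaCapabilities = [ "8.9" ];  # restrict device code to this capability
    };
  };
in
pkgs.python3Packages.torch
```

Saved to a file, this can be built with `nix-build` on that file. Adding more entries to `cudaCapabilities` increases build time, since device code is compiled per capability.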
apyh | yeah makes sense - was seeing if i could volunteer a personal machine to help make the dev cycle possible 😓 | 20:02:07 |
Robbie Buxton | From experience, adding compute 12 capability doubled my PyTorch build time, so def keep an eye on it | 20:02:37 |
Gaétan Lepage | We have very recently acquired new hardware. That is still far from the perfect infra, but it's definitely good progress. | 20:02:54 |
Gaétan Lepage | I broke the record yesterday building python3Packages.torch with cudaSupport enabled.
-> 41 min on 96 cores. | 20:03:48 |
Gaétan Lepage | Do not try to replicate on your laptop 🫠 | 20:04:03 |
Gaétan Lepage | connor (burnt/out) (UTC-7) ACK for both. | 20:04:17 |
Robbie Buxton | I’ve oomed a machine with over 1tb of ram building nix cuda packages 😎 | 20:04:38 |
apyh | In reply to @glepage:matrix.org: "I broke the record yesterday building python3Packages.torch with cudaSupport enabled. -> 41 min on 96 cores."
omg. i wanna try. | 20:04:40 |
Gaétan Lepage | I have only 128GB of RAM on my builder. So I set my swap to a (sometimes necessary) 500GB. | 20:05:45 |
Gaétan Lepage | ptxas can be very expensive memory-wise... | 20:06:15 |
Gaétan Lepage | * connor (burnt/out) (UTC-7) I found out the issue with firefox.
Both before and after the CUDA 13 PR, cudaPackages.backendStdenv.cc (gcc-wrapper) was leaking into the firefox output.
However, before the CUDA 13 PR, stdenv and cudaPackages.backendStdenv were not the same.
After the CUDA 13 PR, stdenv and cudaPackages.backendStdenv are the same.
Hence, in the firefox (wrapper) derivation, disallowedRequisites = [ stdenv.cc ]; catches the nvcc-leaked gcc-wrapper (cudaPackages.backendStdenv.cc).
So, whose fault is it?
A) It is wrong that stdenv == cudaPackages.backendStdenv. Then the issue is not cudaPackages.cuda_nvcc leaking gcc-wrapper.
B) It is normal that stdenv == cudaPackages.backendStdenv, but cudaPackages.cuda_nvcc should never have leaked gcc-wrapper in the first place. | 21:21:43 |
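To illustrate the mechanism described above, here is a small sketch of how `disallowedRequisites` rejects an output whose runtime closure contains `stdenv.cc`. The derivation name `leak-demo` and the file `leaked-path` are hypothetical; this is not the firefox wrapper, only the same kind of check in isolation.

```nix
# Sketch only: a derivation that deliberately leaks stdenv.cc into its output.
# Building it is expected to fail because of disallowedRequisites.
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "leak-demo"
  {
    # Reject the output if stdenv.cc appears anywhere in its runtime closure.
    disallowedRequisites = [ pkgs.stdenv.cc ];
  }
  ''
    mkdir -p $out
    # Writing the store path into the output creates a runtime reference,
    # which is the kind of leak the check is meant to catch.
    echo ${pkgs.stdenv.cc} > $out/leaked-path
  ''
```

If the compiler wrapper being leaked is a different store path from `stdenv.cc`, the same check passes silently. That is consistent with the leak only surfacing once stdenv and cudaPackages.backendStdenv became the same.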
apyh | In reply to @apyh:matrix.org: "omg. i wanna try."
ripped it on your branch in 23m, including the magma compile - only compute 8.9 tho | 21:38:03 |
Gaétan Lepage | Oh, I was implying "all caps enabled" | 21:39:18 |
apyh | lemme try it :3 | 21:43:58 |
connor (he/him) | stdenv can be cudaPackages.backendStdenv if the version of GCC is supported by NVCC. It’s only different when we need to use an older version of GCC.
NVCC shouldn’t leak the GCC wrapper since it should be largely build-time only. Any ideas why it’s propagating like that? | 21:46:06 |
Gaétan Lepage | Thanks for the follow-up.
The leakage chain is:
cudaPackages.cuda_nvcc -> cudaPackages.nccl -> firefox-unwrapped -> firefox.
In cuda_nvcc.nix, backendStdenv.cc is added into cuda_nvcc's propagatedBuildInputs. | 21:50:01 |
Gaétan Lepage | https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/cuda-modules/packages/cuda_nvcc.nix#L24-L25 | 21:50:06 |
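A hedged sketch of why an entry in `propagatedBuildInputs` ends up in the runtime closure: the stdenv fixup phase records propagated inputs in a file under `nix-support` in the output, and that file contains their store paths. The derivation name `propagation-demo` is hypothetical, and this does not reproduce the actual contents of cuda_nvcc.nix.

```nix
# Sketch only: an otherwise empty package that propagates the compiler wrapper.
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenv.mkDerivation {
  name = "propagation-demo";
  dontUnpack = true;

  # Propagated inputs are written to $out/nix-support/propagated-build-inputs
  # during fixup, so the output keeps a reference to stdenv.cc.
  propagatedBuildInputs = [ pkgs.stdenv.cc ];

  installPhase = ''
    mkdir -p $out
  '';
}
```

After building, `nix-store --query --references ./result` should list the gcc-wrapper path. Any package that retains a reference to this output then inherits the wrapper in its closure.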
apyh | do you have a command / nixpkgs config for me to work with or nah? | 21:50:14 |