| NixOS CUDA |
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda |
| Sender | Message | Time |
|---|---|---|
| 22 Feb 2023 | | |
| Is there a recommended way to get in touch with NVIDIA about their docs? For example, https://docs.nvidia.com/cuda/archive/11.0.3/ gives me an access denied, and some of their tables in their older docs are missing supported compute capabilities (https://docs.nvidia.com/cuda/archive/11.2.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list vs https://docs.nvidia.com/cuda/archive/11.3.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, sm_37 reappears, but sm_52 is missing in both) | 15:05:30 | |
| Ah, the link for their 11.0.x docs on https://developer.nvidia.com/cuda-toolkit-archive is wrong -- it follows the 10.2 format so it should be something like https://docs.nvidia.com/cuda/archive/11.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list | 15:09:01 | |
| 23 Feb 2023 | | |
| If anyone has any knowledge to contribute, I'd appreciate it: https://github.com/NixOS/nixpkgs/issues/217780 | 01:14:30 | |
| RE: Getting in touch, I'd recommend starting a new thread in https://forums.developer.nvidia.com/c/8 | 03:09:29 | |
| NVCC has a certain range of compilers it supports. I know that currently we export CC/CXX/CUDAHOSTCXX as appropriate to handle that... but that only changes things in the current derivation. Since the default language standard (like c++11 -> c++14) can change between compiler releases, it's possible that we build a derivation with an NVCC-supported version of GCC or clang, but the libraries that derivation links against were built with a different compiler version with a different language standard. That can manifest as missing or broken symbols during linking, right? | 21:51:31 | |
| 24 Feb 2023 | | |
| Example of me trying to run something I just packaged (https://github.com/connorbaker/bsrt) and maybe getting bitten by (what I think is) exactly this:
OpenCV specifies the CUDA host compiler, but does not set the C or C++ compilers. I'm trying a build with a patched derivation for opencv and hoping that resolves the problem. (Also, OpenCV apparently doesn't build for specific GPU architectures or take advantage of CUDNN!) | 01:02:04 | |
It did! Now I'm seeing a different error of RuntimeError: CUDA driver error: PTX JIT compiler library not found, but that's because I'm not using nixGL yet on a non-NixOS machine | 01:41:42 |
| Is there an issue with
but on master I see
(might be missing something because I'm not immediately finding workarounds that were necessary for other CUDA packages since gcc was bumped) | 19:06:53 | |
Yes, there is. Currently derivations need to set the C/C++ compilers to the same version used by NVCC; otherwise you get errors like that (in the case where the CUDA host compiler isn't specified and NVCC just uses whatever compiler stdenv has) or weird symbol errors when linking (if libraries linked against each other were built with different compiler/language-standard versions) | 20:18:55 |
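A minimal sketch of the workaround being described, assuming a hypothetical package `mypkg` and that `gcc11Stdenv` is an NVCC-supported toolchain for CUDA 11 (this is an illustration, not nixpkgs' actual mechanism):

```nix
# Hedged sketch: build the whole derivation with an NVCC-supported GCC so
# host code, CUDA kernels, and the libraries we link against all agree on
# the compiler and its default language standard.
{ gcc11Stdenv, cudaPackages, fetchFromGitHub }:

gcc11Stdenv.mkDerivation {
  pname = "mypkg"; # hypothetical
  version = "0.1.0";

  src = fetchFromGitHub {
    owner = "example";
    repo = "mypkg";
    rev = "v0.1.0";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };

  nativeBuildInputs = [ cudaPackages.cudatoolkit ];

  # Point NVCC at the same compiler the stdenv uses for host code.
  preConfigure = ''
    export CUDAHOSTCXX="$CXX"
  '';
}
```

If only the CUDA host compiler is set but the derivation's stdenv uses a newer GCC, the symbol mismatches described above can still appear at link time.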
| Ah, thanks! I saw your message just above, but didn't make the connection that it's the same issue. Will play around with it and see if I can get the minimal example to work | 20:23:06 | |
In reply to @connorbaker:matrix.org: To add to the pile: tensorflow is currently broken, failing with the same error https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fjh4mgzwa0g877sv4i3yn7kszfp5wa2dx-python3.10-jax-0.4.1.drv/log?via-job=0cd9e121-eb28-429d-a769-5ba401322f95 | 20:25:15 |
| connor (he/him): is there currently an open issue tracking this stdenv/compiler compatibility problem specifically? | 20:34:46 | |
| https://github.com/NixOS/nixpkgs/issues/217913 https://github.com/NixOS/nixpkgs/issues/217878 These two seem like instances of the same problem | 20:35:47 | |
In reply to @connorbaker:matrix.org: * To add to the pile: tensorflow is currently broken, failing with the same error https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fjh4mgzwa0g877sv4i3yn7kszfp5wa2dx-python3.10-jax-0.4.1.drv/log?via-job=0cd9e121-eb28-429d-a769-5ba401322f95 Same with jax, faiss, &c | 20:36:03 |
In reply to @mcwitt:matrix.org: Just overriding this with gcc11Stdenv succeeds | 20:37:35 |
| Btw, great test, it's a shame we don't run it anywhere automatically 🤣 | 20:38:07 | |
In reply to @ss:someonex.net: Maybe rather than fixing cudaPackages.cudatoolkit.cc in non-redist cudatoolkit's versions.toml we should set a cudaPackages-wide default stdenv (e.g. cudaPackages.stdenv = gcc11Stdenv in case of pre-cuda-12). It seems like downstream packages do have to use that stdenv if they build any cuda kernels. | 20:43:23 |
| RE: opencv in BSRT as well as tensorflow and jax Is there a chance we misinterpret the "wrong glibc" errors, them using auto-patchelf-ed non-redist | 20:46:07 |
In reply to @ss:someonex.net: * Maybe rather than fixing cudaPackages.cudatoolkit.cc in non-redist cudatoolkit's versions.toml we should set a cudaPackages-wide default stdenv (e.g. cudaPackages.stdenv = gcc11Stdenv in case of pre-cuda-12)? It seems like downstream packages do have to use that stdenv if they build any cuda kernels. | 20:46:42 |
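The cudaPackages-wide stdenv proposal could look roughly like the following overlay. This is a sketch of the idea only: the `cudaPackages.stdenv` attribute is hypothetical, and the assumption that `faiss` accepts a `stdenv` argument is for illustration.

```nix
# Sketch: give each cudaPackages set a default stdenv matching what its
# NVCC supports, and have CUDA consumers build with it.
final: prev: {
  cudaPackages = prev.cudaPackages // {
    # Pre-CUDA-12 NVCC supports GCC up to 11, hence gcc11Stdenv.
    stdenv = prev.gcc11Stdenv;
  };

  # A downstream package that compiles CUDA kernels would then opt in,
  # assuming it takes stdenv as an override argument:
  faiss = prev.faiss.override {
    stdenv = final.cudaPackages.stdenv;
  };
}
```

The appeal is that downstream packages no longer each have to know which GCC their CUDA version tolerates; they inherit it from the package set.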
In reply to @mcwitt:matrix.org: * Just overriding this with gcc11Stdenv succeeds (same applies e.g. to faiss attribute in nixpkgs) | 20:47:22 |
| A standard environment for CUDA would be really nice, given that NVCC always has version constraints on the host compiler. Alternatively, if we didn't want to change anything else, we could add the NVCC flag --allow-unsupported-compiler (or something similar, I don't remember) and just build with whatever | 20:49:12 |
Just to close the loop, the fix in my case was to set cmakeFlags = [ "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc" ] (and eventually found many examples of this in nixpkgs). Thanks for pointing me in the right direction! | 21:34:32 |
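In context, that fix might sit in a derivation like the following sketch (the package name is hypothetical; the `cmakeFlags` line is the real fix quoted above):

```nix
# Sketch: tell CMake which host compiler NVCC should invoke, so CUDA
# host code is built with the toolkit's supported GCC rather than the
# stdenv default.
{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # hypothetical
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ cmake cudaPackages.cudatoolkit ];

  cmakeFlags = [
    "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc"
  ];
}
```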
I made a helper for nixpkgs-review workflows! After redirecting all the output to a file, the script takes all of the failing derivations, makes a gist for each build log, and makes a little markdown table in a comment on the PR you were reviewing. Script here: https://gist.github.com/ConnorBaker/b32a7f69d318e3f338b6b4fedeef37ef Example comment here: https://github.com/NixOS/nixpkgs/pull/218035#issuecomment-1444682137 | 23:26:47 |
| Although, all these tools print colored output even when it's redirected to a file, so there are ANSI escape characters in the logs :( | 23:28:54 |
| 25 Feb 2023 | | |
Damn... I forget again, how do I make a command run after autoPatchelfHook?.. | 11:32:03 | |
Appending to postFixup doesn't seem to do the trick | 11:32:19 | |
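One pattern that should work here, assuming the `dontAutoPatchelf` flag and `autoPatchelf` shell function that the hook provides (per the nixpkgs manual): disable the automatic invocation and run it yourself, so later commands in `postFixup` execute after patching.

```nix
# Sketch with a hypothetical package: autoPatchelfHook normally runs
# during fixup, after user phases; calling it explicitly restores
# control over ordering.
{ stdenv, autoPatchelfHook }:

stdenv.mkDerivation {
  pname = "example"; # hypothetical
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ autoPatchelfHook ];

  # Prevent the hook from running on its own.
  dontAutoPatchelf = true;

  postFixup = ''
    autoPatchelf -- "$out"
    # Anything placed here now runs after patching is complete.
  '';
}
```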
Is there a reason we use glob in auto-patchelf.py? It skips hidden files, including files renamed by wrapProgram | 11:39:11 | |