!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

326 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers

22 Feb 2023
@connorbaker:matrix.orgconnor (he/him) Is there a recommended way to get in touch with NVIDIA about their docs?
For example, https://docs.nvidia.com/cuda/archive/11.0.3/ gives me an access denied, and some of their tables in their older docs are missing supported compute capabilities (https://docs.nvidia.com/cuda/archive/11.2.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list vs https://docs.nvidia.com/cuda/archive/11.3.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, sm_37 reappears, but sm_52 is missing in both)
15:05:30
@connorbaker:matrix.orgconnor (he/him) Ah, the link for their 11.0.x docs on https://developer.nvidia.com/cuda-toolkit-archive is wrong -- it follows the 10.2 format, so it should be something like https://docs.nvidia.com/cuda/archive/11.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list 15:09:01
23 Feb 2023
@connorbaker:matrix.orgconnor (he/him) If anyone has any knowledge to contribute, I'd appreciate it: https://github.com/NixOS/nixpkgs/issues/217780 01:14:30
@justbrowsing:matrix.orgKevin Mittman (UTC-7) RE: Getting in touch, I'd recommend starting a new thread in https://forums.developer.nvidia.com/c/8  03:09:29
@connorbaker:matrix.orgconnor (he/him) NVCC has a certain range of compilers it supports. I know that currently we export CC/CXX/CUDAHOSTCXX as appropriate to handle that... but that only changes things in the current derivation. Since the default language standard (like c++11 -> c++14) can change between compiler releases, it's possible that we build a derivation with an NVCC-supported version of GCC or clang, but the libraries that derivation links against were built with a different compiler version with a different language standard. That can manifest as missing or broken symbols during linking, right? 21:51:31
24 Feb 2023
@connorbaker:matrix.orgconnor (he/him)

Example of me trying to run something I just packaged (https://github.com/connorbaker/bsrt) and maybe getting bitten by (what I think is) exactly this:

[connorbaker@fedora bsrt_temp]$ python3 -m bsrt
Traceback (most recent call last):
  File "/nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/nix/store/0pyymzxf7n0fzpaqnvwv92ab72v3jq8d-python3-3.10.9/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/connorbaker/Documents/bsrt_temp/bsrt/__main__.py", line 6, in <module>
    from mfsr_utils.pipelines.synthetic_burst_generator import (
  File "/nix/store/ay2msah0yd16xjwyldqd0n6incf9gd7l-python3.10-mfsr_utils-1.7/lib/python3.10/site-packages/mfsr_utils/pipelines/synthetic_burst_generator.py", line 6, in <module>
    import cv2  # type: ignore[import]
ImportError: /nix/store/ps7an26cirhh0xy1wrlc2icvfhrd39cj-gcc-11.3.0-lib/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /nix/store/s9jsa3p9csvnpvfhix19b3rfyg08m275-opencv-4.7.0/lib/libopencv_gapi.so.407)

OpenCV specifies the CUDA host compiler, but does not set the C or C++ compilers. I'm trying a build with a patched derivation for opencv and hoping that resolves the problem. (Also, OpenCV apparently doesn't build for specific GPU architectures or take advantage of CUDNN!)

01:02:04
@connorbaker:matrix.orgconnor (he/him) It did! Now I'm seeing a different error of RuntimeError: CUDA driver error: PTX JIT compiler library not found, but that's progress :) 01:32:45
@connorbaker:matrix.orgconnor (he/him) * It did! Now I'm seeing a different error of RuntimeError: CUDA driver error: PTX JIT compiler library not found, but that's because I'm not using nixGL yet on a non-NixOS machine 01:41:42
@mcwitt:matrix.orgmcwitt

Is there an issue with cudaPackages since the gcc version bump to 12? I'd expect the following to work:

let pkgs = import ./. { };
in pkgs.runCommandCC "test" { buildInputs = with pkgs.cudaPackages; [ cuda_nvcc cuda_cudart ]; } ''
  nvcc ${pkgs.writeText "test.cu" "int main() { return 0; }"} -o $out
''

but on master I see

error -- unsupported GNU version! gcc versions later than 11 are not supported!

(might be missing something because I'm not immediately finding workarounds that were necessary for other CUDA packages since gcc was bumped)

19:06:53
@connorbaker:matrix.orgconnor (he/him) Yes, there is. Currently derivations need to set the C/C++ compilers to the same version used by NVCC, otherwise you get errors like that (in the case the CUDA host compiler isn't specified, where NVCC just uses whatever compiler stdenv has) or weird symbol errors when linking (if libraries being linked against each other were built with different compilers/language standard versions) 20:18:55
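For illustration, the per-derivation pinning described here might look roughly like the fragment below. This is a sketch under assumptions, not code taken from nixpkgs: it assumes cudaPackages.cudatoolkit.cc (the attribute mentioned later in this conversation) is the NVCC-compatible compiler wrapper and that it provides bin/cc and bin/c++. CUDAHOSTCXX is the environment variable CMake reads to pick the CUDA host compiler. As noted earlier in the conversation, this only affects the derivation it is set in, not the libraries that derivation links against.

```nix
# Hypothetical helper: returns derivation arguments that pin the C, C++ and
# CUDA host compilers to the one NVCC is assumed to support.
{ cudaPackages }:

{
  preConfigure = ''
    # Assumption: cudatoolkit.cc is a wrapped compiler exposing bin/cc and bin/c++.
    export CC=${cudaPackages.cudatoolkit.cc}/bin/cc
    export CXX=${cudaPackages.cudatoolkit.cc}/bin/c++
    export CUDAHOSTCXX=${cudaPackages.cudatoolkit.cc}/bin/c++
  '';
}
```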
@mcwitt:matrix.orgmcwitt Ah, thanks! I saw your message just above, but didn't make the connection that it's the same issue. Will play around with it and see if I can get the minimal example to work. 20:23:06
@ss:someonex.netSomeoneSerge (matrix works sometimes) connor (he/him): is there currently an open issue tracking this stdenv/compiler compatibility problem specifically? 20:34:46
@ss:someonex.netSomeoneSerge (matrix works sometimes) https://github.com/NixOS/nixpkgs/issues/217913 https://github.com/NixOS/nixpkgs/issues/217878 These two seem like instances of the same problem. 20:35:47
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org

* To add to the pile: tensorflow is currently broken, failing with the same error https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fjh4mgzwa0g877sv4i3yn7kszfp5wa2dx-python3.10-jax-0.4.1.drv/log?via-job=0cd9e121-eb28-429d-a769-5ba401322f95

Same with jax, faiss, &c

20:36:03
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @mcwitt:matrix.org

Just overriding this with gcc11Stdenv succeeds
20:37:35
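One way the gcc11Stdenv override might be written is sketched below. The exact form used in the conversation is not shown, so this is an assumption: it rebuilds mcwitt's test with gcc11Stdenv.mkDerivation and a buildCommand instead of runCommandCC, and the derivation name is made up for the example. Like the original test, it is meant to be evaluated from a nixpkgs checkout with unfree packages allowed.

```nix
# Sketch only: the minimal nvcc test, built with a GCC 11 stdenv so that the
# compiler nvcc finds on PATH is one it accepts.
let
  pkgs = import ./. { };
in
pkgs.gcc11Stdenv.mkDerivation {
  name = "nvcc-gcc11-test"; # hypothetical name
  buildInputs = with pkgs.cudaPackages; [ cuda_nvcc cuda_cudart ];
  # buildCommand replaces the usual phases, similar to what runCommandCC does.
  buildCommand = ''
    nvcc ${pkgs.writeText "test.cu" "int main() { return 0; }"} -o $out
  '';
}
```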
@ss:someonex.netSomeoneSerge (matrix works sometimes) Btw, great test, it's a shame we don't run it anywhere automatically 🤣 20:38:07
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @ss:someonex.net
connor (he/him): is there currently an open issue tracking this stdenv/compiler compatibility problem specifically?
Maybe, rather than fixing cudaPackages.cudatoolkit.cc in the non-redist cudatoolkit's versions.toml, we should set a cudaPackages-wide default stdenv (e.g. cudaPackages.stdenv = gcc11Stdenv in the pre-CUDA-12 case)? It seems like downstream packages have to use that stdenv anyway if they build any CUDA kernels.
20:43:23
@ss:someonex.netSomeoneSerge (matrix works sometimes)

RE: opencv in BSRT as well as tensorflow and jax

Is there a chance we're misinterpreting the "wrong glibc" errors, given that these packages use the auto-patchelf-ed non-redist cudatoolkit?

20:46:07
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @mcwitt:matrix.org

* Just overriding this with gcc11Stdenv succeeds (same applies e.g. to faiss attribute in nixpkgs)
20:47:22
@connorbaker:matrix.orgconnor (he/him) A standard environment for CUDA would be really nice given that NVCC always has version constraints on the compiler
ALTERNATIVELY, if we didn't want to change anything else, we could add the NVCC flag --allow-unsupported-compiler (or something similar, I don't remember) and just build with whatever
20:49:12
@mcwitt:matrix.orgmcwitt * Just to close the loop, the fix in my case was to set cmakeFlags = [ "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc" ] (and eventually found many examples of this in nixpkgs). Thanks for pointing me in the right direction! 21:34:32
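In context, that flag might sit in a CMake-based derivation along these lines. Only the cmakeFlags entry comes from the message above; the package name, version, source, and the placement of cudatoolkit in nativeBuildInputs are assumptions made for the sketch.

```nix
# Sketch of a callPackage-style expression for a hypothetical CMake + CUDA project.
{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-project"; # hypothetical
  version = "0.0.0";              # hypothetical
  src = ./.;                      # hypothetical source location

  # Assumption: cudatoolkit is listed here so that nvcc is on PATH for CMake.
  nativeBuildInputs = [ cmake cudaPackages.cudatoolkit ];

  cmakeFlags = [
    # Tell CMake to have nvcc use the compiler the toolkit was packaged with,
    # rather than whatever compiler the surrounding stdenv provides.
    "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc"
  ];
}
```

This only changes the host compiler nvcc uses; the C and C++ parts of the project are still compiled by the stdenv's own compiler, which is the mismatch discussed earlier in the conversation.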
@connorbaker:matrix.orgconnor (he/him) I made a helper for nixpkgs-review workflows! After redirecting all the output to a file, the script takes all of the failing derivations, makes a gist for each build log, and posts a little markdown table as a comment on the PR you were reviewing.
Script here: https://gist.github.com/ConnorBaker/b32a7f69d318e3f338b6b4fedeef37ef
Example comment here: https://github.com/NixOS/nixpkgs/pull/218035#issuecomment-1444682137
23:26:47
@connorbaker:matrix.orgconnor (he/him) Although, all these tools print with color to output even if it's a file, so there are escape characters in them :( 23:28:54
25 Feb 2023
@ss:someonex.netSomeoneSerge (matrix works sometimes) Damn... I forget again, how do I make a command run after autoPatchelfHook?.. 11:32:03
@ss:someonex.netSomeoneSerge (matrix works sometimes) Appending to postFixup doesn't seem to do the trick 11:32:19
@ss:someonex.netSomeoneSerge (matrix works sometimes) Is there a reason we use glob in auto-patchelf.py? It skips hidden files, including files renamed by wrapProgram 11:39:11
