| NixOS CUDA |
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda |
| Sender | Message | Time |
|---|---|---|
| 22 Feb 2023 | | |
| Is there a recommended way to get in touch with NVIDIA about their docs? For example, https://docs.nvidia.com/cuda/archive/11.0.3/ gives me an access denied, and some of their tables in their older docs are missing supported compute capabilities (https://docs.nvidia.com/cuda/archive/11.2.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list vs https://docs.nvidia.com/cuda/archive/11.3.1/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, sm_37 reappears, but sm_52 is missing in both) | 15:05:30 | |
| Ah, the link for their 11.0.x docs on https://developer.nvidia.com/cuda-toolkit-archive is wrong -- it follows the 10.2 format so it should be something like https://docs.nvidia.com/cuda/archive/11.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list | 15:09:01 | |
| 23 Feb 2023 | | |
| If anyone has any knowledge to contribute, I'd appreciate it: https://github.com/NixOS/nixpkgs/issues/217780 | 01:14:30 | |
| RE: Getting in touch, I'd recommend starting a new thread in https://forums.developer.nvidia.com/c/8 | 03:09:29 | |
| NVCC has a certain range of compilers it supports. I know that currently we export CC/CXX/CUDAHOSTCXX as appropriate to handle that... but that only changes things in the current derivation. Since the default language standard (like c++11 -> c++14) can change between compiler releases, it's possible that we build a derivation with an NVCC-supported version of GCC or clang, but the libraries that derivation links against were built with a different compiler version with a different language standard. That can manifest as missing or broken symbols during linking, right? | 21:51:31 | |
| 24 Feb 2023 | | |
| Example of me trying to run something I just packaged (https://github.com/connorbaker/bsrt) and maybe getting bitten by (what I think is) exactly this:
OpenCV specifies the CUDA host compiler, but does not set the C or C++ compilers. I'm trying a build with a patched derivation for opencv and hoping that resolves the problem. (Also, OpenCV apparently doesn't build for specific GPU architectures or take advantage of CUDNN!) | 01:02:04 | |
It did! Now I'm seeing a different error of RuntimeError: CUDA driver error: PTX JIT compiler library not found, but that's because I'm not using nixGL yet on a non-NixOS machine | 01:41:42 |
| Is there an issue with
but on master I see
(might be missing something because I'm not immediately finding workarounds that were necessary for other CUDA packages since gcc was bumped) | 19:06:53 | |
Yes, there is. Currently derivations need to set the C/C++ compilers to the same version used by NVCC; otherwise you get errors like that (in the case where the CUDA host compiler isn't specified and NVCC just uses whatever compiler stdenv has) or weird symbol errors when linking (if libraries linked against each other were built with different compiler/language-standard versions) | 20:18:55 |
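A minimal sketch of the workaround being described, assuming a hypothetical package `mypkg` and that `gcc11Stdenv` is an NVCC-supported toolchain for CUDA 11 (this is an illustration, not nixpkgs' actual mechanism):

```nix
# Hedged sketch: build the whole derivation with an NVCC-supported GCC so
# host code, CUDA kernels, and the libraries we link against all agree on
# the compiler and its default language standard.
{ gcc11Stdenv, cudaPackages, fetchFromGitHub }:

gcc11Stdenv.mkDerivation {
  pname = "mypkg"; # hypothetical
  version = "0.1.0";

  src = fetchFromGitHub {
    owner = "example";
    repo = "mypkg";
    rev = "v0.1.0";
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };

  nativeBuildInputs = [ cudaPackages.cudatoolkit ];

  # Point NVCC at the same compiler the stdenv uses for host code.
  preConfigure = ''
    export CUDAHOSTCXX="$CXX"
  '';
}
```

If only the CUDA host compiler is set but the derivation's stdenv uses a newer GCC, the symbol mismatches described above can still appear at link time.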
| Ah, thanks! I saw your message just above, but didn't make the connection that it's the same issue. Will play around with it and see if I can get the minimal example to work | 20:23:06 | |
In reply to @connorbaker:matrix.org: To add to the pile: tensorflow is currently broken, failing with the same error https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fjh4mgzwa0g877sv4i3yn7kszfp5wa2dx-python3.10-jax-0.4.1.drv/log?via-job=0cd9e121-eb28-429d-a769-5ba401322f95 | 20:25:15 |
| connor (he/him): is there currently an open issue tracking this stdenv/compiler compatibility problem specifically? | 20:34:46 | |
| https://github.com/NixOS/nixpkgs/issues/217913 https://github.com/NixOS/nixpkgs/issues/217878 These two seem like instances of the same problem | 20:35:47 | |
In reply to @connorbaker:matrix.org: * To add to the pile: tensorflow is currently broken, failing with the same error https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fjh4mgzwa0g877sv4i3yn7kszfp5wa2dx-python3.10-jax-0.4.1.drv/log?via-job=0cd9e121-eb28-429d-a769-5ba401322f95 Same with jax, faiss, &c | 20:36:03 |
In reply to @mcwitt:matrix.org: Just overriding this with gcc11Stdenv succeeds | 20:37:35 |
| Btw, great test, it's a shame we don't run it anywhere automatically 🤣 | 20:38:07 | |
In reply to @ss:someonex.net: Maybe rather than fixing cudaPackages.cudatoolkit.cc in non-redist cudatoolkit's versions.toml we should set a cudaPackages-wide default stdenv (e.g. cudaPackages.stdenv = gcc11Stdenv in case of pre-cuda-12). It seems like downstream packages do have to use that stdenv if they build any cuda kernels. | 20:43:23 |
| RE: opencv in BSRT as well as tensorflow and jax Is there a chance we misinterpret the "wrong glibc" errors, them using auto-patchelf-ed non-redist | 20:46:07 |
In reply to @ss:someonex.net: * Maybe rather than fixing cudaPackages.cudatoolkit.cc in non-redist cudatoolkit's versions.toml we should set a cudaPackages-wide default stdenv (e.g. cudaPackages.stdenv = gcc11Stdenv in case of pre-cuda-12)? It seems like downstream packages do have to use that stdenv if they build any cuda kernels. | 20:46:42 |
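The cudaPackages-wide stdenv proposal could look roughly like the following overlay. This is a sketch of the idea only: the `cudaPackages.stdenv` attribute is hypothetical, and the assumption that `faiss` accepts a `stdenv` argument is for illustration.

```nix
# Sketch: give each cudaPackages set a default stdenv matching what its
# NVCC supports, and have CUDA consumers build with it.
final: prev: {
  cudaPackages = prev.cudaPackages // {
    # Pre-CUDA-12 NVCC supports GCC up to 11, hence gcc11Stdenv.
    stdenv = prev.gcc11Stdenv;
  };

  # A downstream package that compiles CUDA kernels would then opt in,
  # assuming it takes stdenv as an override argument:
  faiss = prev.faiss.override {
    stdenv = final.cudaPackages.stdenv;
  };
}
```

The appeal is that downstream packages no longer each have to know which GCC their CUDA version tolerates; they inherit it from the package set.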
In reply to @mcwitt:matrix.org: * Just overriding this with gcc11Stdenv succeeds (same applies e.g. to faiss attribute in nixpkgs) | 20:47:22 |
| A standard environment for CUDA would be really nice, given that NVCC always has version constraints on the host compiler. Alternatively, if we didn't want to change anything else, we could add the NVCC flag --allow-unsupported-compiler (or something similar, I don't remember) and just build with whatever | 20:49:12 |
Just to close the loop, the fix in my case was to set cmakeFlags = [ "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc" ] (and eventually found many examples of this in nixpkgs). Thanks for pointing me in the right direction! | 21:34:32 |
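In context, that fix might sit in a derivation like the following sketch (the package name is hypothetical; the `cmakeFlags` line is the real fix quoted above):

```nix
# Sketch: tell CMake which host compiler NVCC should invoke, so CUDA
# host code is built with the toolkit's supported GCC rather than the
# stdenv default.
{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "my-cuda-app"; # hypothetical
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ cmake cudaPackages.cudatoolkit ];

  cmakeFlags = [
    "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc"
  ];
}
```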
I made a helper for nixpkgs-review workflows! After redirecting all the output to a file, the script takes all of the failing derivations, makes a gist for each build log, and makes a little markdown table in a comment on the PR you were reviewing. Script here: https://gist.github.com/ConnorBaker/b32a7f69d318e3f338b6b4fedeef37ef Example comment here: https://github.com/NixOS/nixpkgs/pull/218035#issuecomment-1444682137 | 23:26:47 |
| Although, all these tools print colored output even when it's redirected to a file, so there are ANSI escape characters in the logs :( | 23:28:54 |
| 25 Feb 2023 | | |
Damn... I forget again, how do I make a command run after autoPatchelfHook?.. | 11:32:03 | |
Appending to postFixup doesn't seem to do the trick | 11:32:19 | |
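One pattern that should work here, assuming the `dontAutoPatchelf` flag and `autoPatchelf` shell function that the hook provides (per the nixpkgs manual): disable the automatic invocation and run it yourself, so later commands in `postFixup` execute after patching.

```nix
# Sketch with a hypothetical package: autoPatchelfHook normally runs
# during fixup, after user phases; calling it explicitly restores
# control over ordering.
{ stdenv, autoPatchelfHook }:

stdenv.mkDerivation {
  pname = "example"; # hypothetical
  version = "0.1.0";
  src = ./.;

  nativeBuildInputs = [ autoPatchelfHook ];

  # Prevent the hook from running on its own.
  dontAutoPatchelf = true;

  postFixup = ''
    autoPatchelf -- "$out"
    # Anything placed here now runs after patching is complete.
  '';
}
```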
Is there a reason we use glob in auto-patchelf.py? It skips hidden files, including files renamed by wrapProgram | 11:39:11 | |