| 16 Sep 2022 |
Suwon Park | * I'm running nix develop and genericBuild with flake.nix! | 17:49:47 |
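For context, a minimal sketch of the kind of flake.nix this nix develop + genericBuild workflow assumes; the package attribute (python3Packages.pytorch) and the CUDA-related config shown here are assumptions, not taken from the chat.
{
  # Hypothetical flake for hacking on a CUDA-enabled package:
  # `nix develop .#pytorch` enters its build environment, and running
  # `genericBuild` there executes the stdenv phases by hand.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.05";
  outputs = { self, nixpkgs }:
    let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        config.allowUnfree = true;   # CUDA packages are unfree
        config.cudaSupport = true;   # assumption: building with CUDA enabled
      };
    in {
      packages.x86_64-linux.pytorch = pkgs.python3Packages.pytorch;
    };
}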
Suwon Park | In reply to @ss:someonex.net
python3Packages.pytorch currently still uses the older cudaPackages.cudatoolkit expression, which ships a lot of stuff, including cuda_cudart
But if you check cudaPackages.cudatoolkit in the 22.05 version of nixpkgs, the following code already exists:
# Move some libraries to the lib output so that programs that
# depend on them don't pull in this entire monstrosity.
mkdir -p $lib/lib
mv -v $out/lib64/libcudart* $lib/lib/
which means that the cudaPackages.cudatoolkit expression's out output does not ship cudart!
| 17:55:09 |
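To make the split concrete, a small Nix sketch (illustration only; it assumes a nixpkgs 22.05 checkout with unfree packages allowed):
let
  pkgs = import <nixpkgs> { config.allowUnfree = true; };  # assumes a 22.05 channel
  toolkit = pkgs.cudaPackages.cudatoolkit;
in {
  # the derivation is split into several outputs, including "out" and "lib"
  outputs = toolkit.outputs;
  # after the mv above, libcudart.so* lives under the lib output, not under out
  cudartDir = "${toolkit.lib}/lib";
}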
Suwon Park | If you unpack python39Packages.pytorch (current version: 1.11.0) and go to source/cmake/Modules_CUDA_fix/upstream/FindCUDA.cmake line 1128, there is the following code block, which produces the -- Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "11.6") error:
find_package_handle_standard_args(CUDA
  REQUIRED_VARS
    CUDA_TOOLKIT_ROOT_DIR
    CUDA_NVCC_EXECUTABLE
    CUDA_INCLUDE_DIRS
    ${CUDA_CUDART_LIBRARY_VAR}
  VERSION_VAR
    CUDA_VERSION
)
If I understood the code correctly, this block errors out because ${CUDA_CUDART_LIBRARY_VAR} looks for libcudart.so inside cudaPackages.cudatoolkit, whose out output no longer has libcudart.* because of the code I just mentioned. Am I right..?🤔 But in the GitHub history it looks like there was no problem building the package without cuda_cudart, which means I'm probably doing something wrong or unnecessary.
| 18:07:46 |
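One conceivable workaround, sketched here only under assumptions: hand FindCUDA.cmake the relocated runtime library explicitly. CUDA_CUDART_LIBRARY is FindCUDA.cmake's cache variable, but whether pytorch's setup.py-driven build actually forwards cmakeFlags this way is not established in this thread.
{ python3Packages, cudaPackages }:
python3Packages.pytorch.overrideAttrs (old: {
  # Hypothetical: point CMake at libcudart.so in the toolkit's lib output.
  cmakeFlags = (old.cmakeFlags or []) ++ [
    "-DCUDA_CUDART_LIBRARY=${cudaPackages.cudatoolkit.lib}/lib/libcudart.so"
  ];
})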
SomeoneSerge (back on matrix) | The pytorch derivation uses symlinkJoin, which includes the contents of cudatoolkit.out and cudatoolkit.lib | 20:54:46 |
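A minimal sketch of the pattern SomeoneSerge describes, assuming the standard nixpkgs symlinkJoin and cudaPackages attributes; this is not a copy of the actual pytorch expression.
# Rejoin the split toolkit so build tooling that expects a monolithic CUDA
# install (bin/, include/ and lib/ under one root) can find nvcc, the headers
# and libcudart.so in a single prefix.
{ symlinkJoin, cudaPackages }:
symlinkJoin {
  name = "cudatoolkit-joined";
  paths = [ cudaPackages.cudatoolkit.out cudaPackages.cudatoolkit.lib ];
}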
Suwon Park | Someone S: Aha! Let me try some modifications! Thank you! | 21:10:06 |
SomeoneSerge (back on matrix) | cudatoolkit.{out,lib} brings in a lot (4-5GiB) of luggage; if you'd like to get rid of it later, maybe you could start with https://github.com/NixOS/nixpkgs/blob/befe56a1ee1d383fafaf9db41e3f4fc506578da1/pkgs/development/python-modules/pytorch/default.nix#L57 | 21:14:47 |
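Purely as a hypothetical sketch of the trimming direction suggested above; which fine-grained cudaPackages components would actually suffice for pytorch is an assumption, not something settled in this chat.
# Instead of rejoining the whole cudatoolkit, join only selected
# redistributable pieces; the component choice here is a guess.
{ symlinkJoin, cudaPackages }:
symlinkJoin {
  name = "cuda-redist-joined";
  paths = with cudaPackages; [ cuda_nvcc cuda_cudart libcublas ];
}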
| 17 Sep 2022 |
aidalgol | In a flake shell with config.allowUnfree = true; and config.cudaSupport = true;, the python torch module is throwing an unknown CUDA error. Is there something more I need to do to get the package's CUDA support enabled?
File "/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
| 05:54:58 |
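For reference, a minimal sketch of the kind of shell being described, where nixpkgs is the flake input and config must be passed explicitly; the ps.torch attribute name is an assumption (older nixpkgs exposes it as pytorch).
let
  # Hypothetical reconstruction of the shell in question.
  pkgs = import nixpkgs {
    system = "x86_64-linux";
    config.allowUnfree = true;
    config.cudaSupport = true;
  };
in pkgs.mkShell {
  # `torch` vs `pytorch` depends on the nixpkgs revision in use
  packages = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];
}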
tpw_rules | are you running on nixos? | 05:55:22 |
aidalgol | Yes, sorry, this is on NixOS. | 05:55:35 |
tpw_rules | and you have the nvidia drivers set up and nvidia-smi works and stuff? | 05:57:16 |
aidalgol | Yep, nvidia-smi output still looks good. | 05:58:14 |
tpw_rules | is this the torch-bin module or did you compile it yourself? | 05:58:52 |
aidalgol | I was referencing torch, not torch-bin. Should I try that one? | 06:00:26 |
aidalgol | I'm also using the cuda-maintainers cachix cache, if that makes a difference. | 06:01:19 |
tpw_rules | you can try torch-bin; it's precompiled by upstream with CUDA support. | 06:01:41 |
tpw_rules | do you have an excessively recent nvidia card? | 06:02:01 |
aidalgol | RTX3080 | 06:02:07 |