!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

283 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
59 Servers

16 Sep 2022
@sepiabrown:matrix.orgSuwon Park *

If you unpack python39Packages.pytorch (current version: 1.11.0) and go to source/cmake/Modules_CUDA_fix/upstream/FindCUDA.cmake, line 1128, there is the following code block, which produces the error -- Could NOT find CUDA (missing: CUDA_CUDART_LIBRARY) (found version "11.6").

find_package_handle_standard_args(CUDA
  REQUIRED_VARS
    CUDA_TOOLKIT_ROOT_DIR
    CUDA_NVCC_EXECUTABLE
    CUDA_INCLUDE_DIRS
    ${CUDA_CUDART_LIBRARY_VAR}
  VERSION_VAR
    CUDA_VERSION
  )

The above code block causes the error because, if I understood the code correctly, ${CUDA_CUDART_LIBRARY_VAR} ends up looking for libcudart.so inside cudaPackages.cudatoolkit, which no longer has libcudart.* because of the following code:

    # Move some libraries to the lib output so that programs that
    # depend on them don't pull in this entire monstrosity.
    mkdir -p $lib/lib
    mv -v $out/lib64/libcudart* $lib/lib/

Am I right..? 🤔
But judging by the GitHub history, there was no problem building the package without cuda_cudart, which means I'm probably doing something wrong or unnecessary.

18:21:50
@ss:someonex.netSomeoneSerge (back on matrix) the pytorch derivation uses symlinkJoin, which includes the contents of both cudatoolkit.out and cudatoolkit.lib 20:54:46
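A rough illustration of the point above, not the actual nixpkgs code: symlinkJoin is a Nix function, and this Python sketch only imitates its effect on a toy directory layout. The helper names join_outputs and find_cudart, and the out/lib directory names, are hypothetical. The idea is that libcudart is missing when only the out tree is searched, but becomes visible once out and lib are merged into a single prefix, which is presumably the prefix FindCUDA gets pointed at.

```python
import os
import tempfile
from pathlib import Path


def join_outputs(outputs, dest):
    """Symlink every file from each output tree into dest, keeping relative paths.

    Only an analogy for what a symlinkJoin-style merge does; the first
    output that provides a given relative path wins.
    """
    dest = Path(dest)
    for out in outputs:
        out = Path(out)
        for path in sorted(out.rglob("*")):
            if path.is_dir():
                continue  # directories are recreated as needed below
            target = dest / path.relative_to(out)
            target.parent.mkdir(parents=True, exist_ok=True)
            if not target.exists():
                target.symlink_to(path)
    return dest


def find_cudart(prefix):
    """Return the sorted relative paths of libcudart* files under prefix."""
    prefix = Path(prefix)
    return sorted(str(p.relative_to(prefix)) for p in prefix.rglob("libcudart*"))
```

Calling find_cudart on a toy "out" directory that only holds bin/nvcc returns an empty list, while calling it on the result of join_outputs([out, lib], joined) also lists the library that lives under lib/lib.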
@sepiabrown:matrix.orgSuwon Park * Someone S: Aha! Let me try some modifications! Thank you! 21:10:18
@ss:someonex.netSomeoneSerge (back on matrix) * cudatoolkit.{out,lib} brings in a lot (4-5GiB) of luggage; if you'd like to get rid of it later, maybe you could start with https://github.com/NixOS/nixpkgs/blob/befe56a1ee1d383fafaf9db41e3f4fc506578da1/pkgs/development/python-modules/pytorch/default.nix#L57 21:14:53
17 Sep 2022
@aidalgol:matrix.orgaidalgol

In a flake shell with config.allowUnfree = true; and config.cudaSupport = true;, the python torch module is throwing an unknown CUDA error. Is there something more I need to do to get the package's CUDA support enabled?

  File "/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
05:54:58
@tpw_rules:matrix.orgtpw_rules are you running on nixos? 05:55:22
@aidalgol:matrix.orgaidalgol Yes, sorry, this is on NixOS. 05:55:35
@tpw_rules:matrix.orgtpw_rules and you have the nvidia drivers set up and nvidia-smi works and stuff? 05:57:16
@aidalgol:matrix.orgaidalgol Yep, nvidia-smi output still looks good. 05:58:14
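A minimal sketch of how the basic facts asked about in this exchange could be collected in one place. It is not an official diagnostic: the function names cuda_environment_report and torch_cuda_summary are made up for illustration, and the code relies only on shutil.which, importlib.util.find_spec, torch.__version__, torch.version.cuda and torch.cuda.is_available().

```python
import importlib.util
import shutil


def cuda_environment_report(which=shutil.which, find_spec=importlib.util.find_spec):
    """Report whether nvidia-smi is on PATH and whether torch can be imported.

    The lookups are passed in as parameters so they can be replaced in tests;
    nothing here touches the GPU.
    """
    return {
        "nvidia_smi_on_path": which("nvidia-smi") is not None,
        "torch_importable": find_spec("torch") is not None,
    }


def torch_cuda_summary():
    """Return torch's own view of CUDA, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the module loads without torch

    return {
        "torch_version": torch.__version__,
        # CUDA version torch was built against; None for CPU-only builds
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
```

Running both functions inside the same shell that raises the error, and pasting the two results, would answer most of the questions in this thread at once.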
@tpw_rules:matrix.orgtpw_rules is this the torch-bin module or did you compile it yourself? 05:58:52
@aidalgol:matrix.orgaidalgol I was referencing torch, not torch-bin. Should I try that one? 06:00:26
@aidalgol:matrix.orgaidalgol I'm also using the cuda-maintainers cachix cache, if that makes a difference. 06:01:19
@tpw_rules:matrix.orgtpw_rules you can try torch-bin, it's precompiled by upstream with cuda support. 06:01:41
@tpw_rules:matrix.orgtpw_rules do you have an excessively recent nvidia card? 06:02:01
@aidalgol:matrix.orgaidalgol RTX3080 06:02:07
@aidalgol:matrix.orgaidalgol
Driver Version: 515.48.07    CUDA Version: 11.7
06:02:23
@tpw_rules:matrix.orgtpw_rules give torch-bin a try, it doesn't sound like you're doing anything wrong with regular torch though, something might be broken with cuda 11.7 or so 06:02:54
@tpw_rules:matrix.orgtpw_rules actually i think nixpkgs only has cuda 11.6 so that shouldn't even be it. i reviewed the pr and tested it i thought.. 06:03:26
@aidalgol:matrix.orgaidalgol Some days it feels like GPU programming has invented a new kind of dependency hell. 06:03:49
@tpw_rules:matrix.orgtpw_rules what nixpkgs commit are you on 06:04:41
@tpw_rules:matrix.orgtpw_rules and are you trying to run any particular code 06:05:15
@tpw_rules:matrix.orgtpw_rules i might be able to debug next week. i have a 3060Ti at work 06:06:18
@aidalgol:matrix.orgaidalgol

I'm trying to run this script for some video upscaling I'm doing with VapourSynth and some arcane plugins. https://github.com/styler00dollar/VSGAN-tensorrt-docker/blob/main/convert_esrgan_to_onnx.py

06:06:50
