!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



27 Jun 2024
@ss:someonex.netSomeoneSerge (back on matrix) Is it added manually? Hmm, now come to think of it, if a is in buildInputs, and has b in propagatedBuildInputs, which has c in propagatedBuildInputs - c should end up at the same offsets, i.e. in buildInputs 21:32:31
@ss:someonex.netSomeoneSerge (back on matrix) * Is it added manually? Hmm, now come to think of it, if a is in buildInputs, and has b in propagatedBuildInputs, which has c in propagatedBuildInputs - we want c to end up at the same offsets, i.e. in buildInputs 21:32:46
28 Jun 2024
@ss:someonex.netSomeoneSerge (back on matrix) Should work this time: https://github.com/NixOS/nixpkgs/pull/323056
Can I bump a nixpkgs-review? xD
01:43:56
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @ss:someonex.net
Should work this time: https://github.com/NixOS/nixpkgs/pull/323056
Can I bump a nixpkgs-review? xD
Omg, nevermind... I checked that magma still builds after the first commit, then did something in the second and now it doesn't
01:49:25
@howird:matrix.orgHoward Nguyen-Huu joined the room.02:44:51
@search-sense:matrix.orgsearch-sense
In reply to @matthewcroughan:defenestrate.it
they removed the .6 from the release
I know that it's broken... actually it would be good if someone upgraded it to the current version, TensorRT-10.1.0.27
11:00:16
@ss:someonex.netSomeoneSerge (back on matrix) Shoot, I think propagatedBuildOutputs are broken with __structuredAttrs 11:08:29
@ss:someonex.netSomeoneSerge (back on matrix) The hook loops over $propagatedBuildOutputs but __structuredAttrs makes it into an array, so the first expression resolves into the value of the first element 🤡 11:09:03
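[The pitfall described here can be reproduced in plain bash. A minimal sketch, with an illustrative variable value rather than the hook's real one:]

```shell
#!/usr/bin/env bash
# Under __structuredAttrs, list attributes arrive as bash arrays rather
# than space-separated strings. An unquoted scalar expansion of an array
# yields only its first element.
propagatedBuildOutputs=(dev lib doc)   # illustrative value

echo "$propagatedBuildOutputs"         # prints: dev  (first element only)

# A hook written for the string form iterates just once:
for o in $propagatedBuildOutputs; do echo "got: $o"; done

# The structuredAttrs-safe form expands every element:
for o in "${propagatedBuildOutputs[@]}"; do echo "ok: $o"; done
```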
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @search-sense:matrix.org
I know that it's broken... actually it would be good if someone upgraded it to the current version, TensorRT-10.1.0.27
Would you like to just take over tensorrt in Nixpkgs?
11:28:58
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @ss:someonex.net
Should work this time: https://github.com/NixOS/nixpkgs/pull/323056
Can I bump a nixpkgs-review? xD
Yay
12:07:56
@ss:someonex.netSomeoneSerge (back on matrix) [image: clipboard.png]
12:08:03
@titus-von-koeller:matrix.orgTitus joined the room.12:52:05
@matthewcroughan:defenestrate.itmatthewcroughan
In reply to @ss:someonex.net
Would you like to just take over tensorrt in Nixpkgs?
I wouldn't wish that on my worst enemy
13:00:27
@titus-von-koeller:matrix.orgTitus

Hey! I just started using NixOS and I love it, but I have a MAJOR blocker: I'm maintaining a FOSS deep learning package and can't get CUDA to work :( I would really love to continue on this journey and eventually contribute to this community here, but right now it feels like I just shot myself in the foot badly, as I've spent the last days exclusively configuring NixOS only to reach a point which seems insurmountable for me... The issue seems to be that PyTorch doesn't find the CUDA driver, and what's also weird is that nvidia-smi seems to work fine, but shows CUDA Version: ERR!

The thing is that in order to work with my collaborators, I need to work in a non-NixOS way; in my case I would like to use pixi, which is very much like conda/micromamba, just better... Therefore, I'm trying to get things working in an FHS shell. Does one of you have an idea? Am I doing anything obviously wrong?

from my configuration.nix

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
  };


  # Allow unfree packages
  nixpkgs.config.allowUnfree = true;
  services.xserver.videoDrivers = ["nvidia"];

  hardware.nvidia = {
    modesetting.enable = true;
    powerManagement.enable = false;
    powerManagement.finegrained = false;
    open = false;
    package = config.boot.kernelPackages.nvidiaPackages.beta;
  };

pixi-fhs.nix

{ pkgs, unstable }:

let
  cudatoolkit = pkgs.cudaPackages.cudatoolkit_12_1;
  nvidia_x11 = pkgs.nvidia_x11;
in
pkgs.buildFHSUserEnv {
  name = "pixi-env";
  targetPkgs = pkgs: with pkgs; [
    unstable.pixi
    cudatoolkit
    nvidia_x11
    # bashInteractive
    # bash-completion
    # complete-alias
  ];
  runScript = "bash";
  profile = ''
    export NVIDIA_DRIVER_CAPABILITIES=compute,utility
    export XDG_CONFIG_DIRS=${nvidia_x11}/share/X11/xorg.conf.d''${XDG_CONFIG_DIRS:+:}$XDG_CONFIG_DIRS
    export XDG_DATA_DIRS=${nvidia_x11}/share''${XDG_DATA_DIRS:+:}$XDG_DATA_DIRS

    export LD_LIBRARY_PATH=${cudatoolkit}/lib:${cudatoolkit}/lib64:${cudatoolkit}/lib64/stubs''${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH
    export CUDA_PATH=${cudatoolkit}
    export PATH=${cudatoolkit}/bin:$PATH
    export LIBRARY_PATH=${cudatoolkit}/lib:${cudatoolkit}/lib64:$LIBRARY_PATH

    export CPLUS_INCLUDE_PATH="${cudatoolkit}/include''${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"
    export C_INCLUDE_PATH="${cudatoolkit}/include''${C_INCLUDE_PATH:+:$C_INCLUDE_PATH}"

    # Pixi completion -- not working yet, due to missing `complete` command
    eval "$(pixi completion --shell bash 2>/dev/null)"

    echo "*** Pixi environment activated, using $(which pixi). ***"
  '';
}


Thanks in advance <3
13:00:34
@titus-von-koeller:matrix.orgTitus
  File "/home/titus/src/bnb/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/home/titus/src/bnb/bitsandbytes/diagnostics/main.py", line 33, in sanity_check
    p = torch.nn.Parameter(torch.rand(10, 10).cuda())
  File "/home/titus/src/bnb/.pixi/envs/default/lib/python3.8/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
13:06:53
@ss:someonex.netSomeoneSerge (back on matrix)

pixi

is one of the package managers that ship incomplete dependencies because they expect an FHS environment. If you want to use it on NixOS I recommend you use nix-ld. You'll also need to ensure (using a shell, e.g.) that the ld.so is aware of ${addDriverRunpath.driverLink}/lib, which you can also do as part of the nix-ld configuration. E.g. you can deploy a NixOS system with programs.nix-ld.enable and then, in your project tree, use a nix shell that looks something like the following: https://github.com/NixOS/nixpkgs/blob/48dbb2ae90be0ba21b44e77b8278fd7cefb4b75f/nixos/doc/manual/configuration/fhs.chapter.md?plain=1#L105-L113

13:09:09
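[A minimal sketch of this setup. The option names are those of the NixOS nix-ld module; the shell file follows the spirit of the linked fhs.chapter.md snippet, and the pixi attribute name is an assumption:]

```nix
# configuration.nix — enable nix-ld so FHS-expecting binaries can run
{ config, pkgs, ... }:
{
  programs.nix-ld.enable = true;
}
```

```nix
# project shell.nix (sketch) — point nix-ld's loader at the
# NixOS-deployed driver libraries (driverLink is /run/opengl-driver)
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  packages = [ pkgs.pixi ];  # assumption: pixi is packaged in your nixpkgs
  NIX_LD_LIBRARY_PATH = "${pkgs.addDriverRunpath.driverLink}/lib";
}
```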
@ss:someonex.netSomeoneSerge (back on matrix)
pkgs.buildFHSUserEnv {
  name = "pixi-env";
  targetPkgs = pkgs: with pkgs; [
    ...
    nvidia_x11
    ...
  ];

...this, if it had any effect on the dynamic loader (in this form it doesn't; instead it provides hints for the compiler), would conflict with the libcuda driver deployed by NixOS. NVIDIA makes it so that the driver has to be deployed impurely, because each libcuda only works with the corresponding kernel. TL;DR: delete nvidia_x11 from that list

13:12:14
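[Putting the advice together, a trimmed version of the FHS shell might look like the following sketch: nvidia_x11 dropped, and the toolkit's stubs directory kept off LD_LIBRARY_PATH so it cannot shadow the real driver. Whether the driverLink path is visible inside the sandboxed env can depend on the FHS env's setup, so treat this as a starting point:]

```nix
{ pkgs, unstable }:

let
  cudatoolkit = pkgs.cudaPackages.cudatoolkit;
in
pkgs.buildFHSUserEnv {
  name = "pixi-env";
  targetPkgs = pkgs: with pkgs; [
    unstable.pixi
    cudatoolkit   # only needed if pixi actually compiles against CUDA
  ];
  runScript = "bash";
  profile = ''
    export CUDA_PATH=${cudatoolkit}
    # The real libcuda is deployed by NixOS under /run/opengl-driver/lib;
    # do NOT put the toolkit's lib64/stubs on LD_LIBRARY_PATH.
    export LD_LIBRARY_PATH=/run/opengl-driver/lib''${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH
  '';
}
```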
@ss:someonex.netSomeoneSerge (back on matrix) Also note that cudaPackages.cudatoolkit is a package for development; e.g. if pixi runs any builds (idk if it does) and you want it to use nixpkgs' cudatoolkit libraries instead of pixi libraries, that's when you include it in the shell 13:13:27
@titus-von-koeller:matrix.orgTitus

Thank you so much for your help, SO appreciated!

yeah, I had added nvidia_x11 only in the latest iteration.. I'll remove it right away.

The weird thing is that even outside the fhs, nvidia-smi gives me this N/A: Have you ever seen that before? Does it tell us sth useful?

❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
13:40:43
@titus-von-koeller:matrix.orgTitus *

Thank you so much for your help, SO appreciated!

yeah, I had added nvidia_x11 only in the latest iteration.. I'll remove it right away.

The weird thing is that even outside the fhs, nvidia-smi gives me this N/A: Have you ever seen that before? Does it tell us sth useful?

❯ nvidia-smi
Fri Jun 28 14:55:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: ERR!     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   43C    P8             40W /  480W |    3302MiB /  24564MiB |     26%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  |   00000000:03:00.0 Off |                  Off |
|  0%   36C    P8             29W /  480W |      12MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
13:41:09
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @titus-von-koeller:matrix.org

Thank you so much for your help, SO appreciated!

yeah, I had added nvidia_x11 only in the latest iteration.. I'll remove it right away.

The weird thing is that even outside the fhs, nvidia-smi gives me this N/A: Have you ever seen that before? Does it tell us sth useful?

❯ nvidia-smi
Fri Jun 28 14:55:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: ERR!     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0  On |                  Off |
|  0%   43C    P8             40W /  480W |    3302MiB /  24564MiB |     26%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        On  |   00000000:03:00.0 Off |                  Off |
|  0%   36C    P8             29W /  480W |      12MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
That is odd. Just to make sure, you did reboot after you had set config.boot.kernelPackages.nvidiaPackages.beta and enabled hardware.opengl? If not, please reboot because these drivers are an edge case where mutable state matters...
13:51:30
@ss:someonex.netSomeoneSerge (back on matrix) If you did, could you post the output of nix run -f '<nixpkgs>' cudaPackages.saxpy? 13:52:18
@titus-von-koeller:matrix.orgTitus
❯ nix run -f '<nixpkgs>' cudaPackages.saxpy
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
Start
Runtime version: 12020
Driver version: 0
Host memory initialized, copying to the device
CUDA error at cudaMalloc(&xDevice, N * sizeof(float)): CUDA driver is a stub library
14:47:01
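[The "CUDA driver is a stub library" error above usually means ld.so resolved libcuda from the toolkit's stubs directory instead of the real driver. The search-order pitfall can be shown in plain shell; the paths below are hypothetical placeholders, not taken from this machine:]

```shell
#!/usr/bin/env sh
# If the stubs directory precedes the driver directory on
# LD_LIBRARY_PATH, the stub libcuda wins the lookup.
LD_LIBRARY_PATH="/opt/cuda/lib64/stubs:/run/opengl-driver/lib"

first=$(printf '%s\n' "$LD_LIBRARY_PATH" | tr ':' '\n' | head -n1)
echo "first search dir: $first"
case $first in
  */stubs) echo "stub libcuda would shadow the real driver" ;;
esac
```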
@titus-von-koeller:matrix.orgTitus ok, so I updated my pixi-fhs.nix according to my best understanding of what you proposed; unfortunately, the error is still exactly the same. Would you mind taking another look? I also added some other stuff to give a better overview; please let me know if any other info is needed... https://gist.github.com/Titus-von-Koeller/e541c1175b0a191bac75b72d9964e9d0
15:10:22
@ss:someonex.netSomeoneSerge (back on matrix) ls /run/opengl-driver/lib/libcuda* 15:10:39
@ss:someonex.netSomeoneSerge (back on matrix) I can have a look tmr, now I'm just typing from the phone... 15:11:13
@titus-von-koeller:matrix.orgTitus
❯ ls /run/opengl-driver/lib/libcuda*
 /run/opengl-driver/lib/libcuda.so -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcuda.so*
 /run/opengl-driver/lib/libcuda.so.1 -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcuda.so.1*
 /run/opengl-driver/lib/libcuda.so.550.78 -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcuda.so.550.78*
 /run/opengl-driver/lib/libcudadebugger.so -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcudadebugger.so*
 /run/opengl-driver/lib/libcudadebugger.so.1 -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcudadebugger.so.1*
 /run/opengl-driver/lib/libcudadebugger.so.550.78 -> /nix/store/2paf3i2g4arx5j4m9l87zdrzsikwmizh-nvidia-x11-550.78-6.6.35/lib/libcudadebugger.so.550.78*
15:11:19


