!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

287 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

58 Servers

17 Sep 2022
@aidalgol:matrix.orgaidalgol(Modifying that script to use a different input file)06:07:21
@tpw_rules:matrix.orgtpw_rulesmaybe tensorrt is the problem. don't think that's in nixpkgs06:08:03
@tpw_rules:matrix.orgtpw_rulesanyway it is excessively my bedtime. good luck06:08:16
@aidalgol:matrix.orgaidalgolWelp, that made no difference.06:37:31
@aidalgol:matrix.orgaidalgol

With this shell.nix,

{ pkgs ? import <nixpkgs> {
  config.allowUnfree = true;
  config.cudaSupport = true;
} }:

pkgs.mkShell {
  packages = with pkgs; [
    (python3.withPackages (ps: [
      ps.torch
    ]))
  ];
}

Just a basic "is CUDA available" check fails.

$ nix-shell --run 'python'
Python 3.10.6 (main, Aug  1 2022, 20:38:21) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> assert torch.cuda.is_available()
/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /build/source/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
06:41:20
@aidalgol:matrix.orgaidalgolUh, false alarm I guess, because after a NixOS update and reboot, the assert passes.09:09:37
@tpw_rules:matrix.orgtpw_rulesthe ways of cuda are strange. glad you got it working. maybe you updated your kernel recently and needed to reboot13:44:11
@ss:someonex.netSomeoneSerge (back on matrix)

(too late, but I'll still chime in with a comment on how I so far understand the landscape)

This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia

14:19:31
@aidalgol:matrix.orgaidalgol
In reply to@ss:someonex.net

(too late, but I'll still chime in with a comment on how I so far understand the landscape)

This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia

I did not have hardware.opengl.enable = true; in my system config, so I'm not sure how OpenGL ever worked on my system. It's there now, though. 👍️
18:39:18
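A minimal NixOS configuration sketch of what the two messages above describe (the services.xserver.videoDrivers line is the usual way to select the proprietary driver and is an assumption here, not quoted from the thread):

{ config, pkgs, ... }:
{
  # Deploys /run/opengl-driver/lib, which is where libcuda.so ends up.
  hardware.opengl.enable = true;

  # Assumption: selects the proprietary NVIDIA driver and loads its kernel module.
  services.xserver.videoDrivers = [ "nvidia" ];

  # The NVIDIA driver and the CUDA packages are unfree.
  nixpkgs.config.allowUnfree = true;
}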
18 Sep 2022
@greaka:greaka.degreaka left the room.11:35:02
@hexa:lossy.networkhexa (UTC+1) Samuel Ainsworth: jax needs an update on python-updates to handle the new protobuf version 13:18:40
@FRidh:matrix.orgFRidh set a profile picture.17:20:55
19 Sep 2022
@hexa:lossy.networkhexa (UTC+1)does samuel even read messages on matrix?02:13:28
@hexa:lossy.networkhexa (UTC+1)pinging on github02:14:00
22 Sep 2022
@hexa:lossy.networkhexa (UTC+1)https://github.com/NixOS/nixpkgs/pull/19239107:13:30
@hexa:lossy.networkhexa (UTC+1)packaged up due to a request, quickly tested (CPU only) and looks to be working07:13:50
23 Sep 2022
@aidalgol:matrix.orgaidalgolLately my GPU isn't showing up as a CUDA device after a system suspend and wake.19:22:33
@aidalgol:matrix.orgaidalgol But nvidia-smi output looks fine. 19:23:54
@aidalgol:matrix.orgaidalgolAnd games still work and run with a playable framerate.19:24:22
@aidalgol:matrix.orgaidalgolSo Vulkan and OpenGL are unaffected; just CUDA.19:25:18
24 Sep 2022
@peddie:matrix.orgpeddie joined the room.10:45:23
@ss:someonex.netSomeoneSerge (back on matrix)Isn't showing up where?13:28:08
@aidalgol:matrix.orgaidalgolBlender doesn't see it, and python CUDA libraries fail to enumerate CUDA devices.20:02:47
@aidalgol:matrix.orgaidalgol I've tried setting hardware.nvidia.nvidiaPersistenced = true; in my NixOS config, and that seems to have resolved it so far. 20:04:06
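A sketch of the option mentioned above in NixOS module form; the powerManagement line is an assumption sometimes suggested for suspend/resume problems, not something reported as tested in this thread:

{
  # Keep the driver initialised so CUDA devices stay visible even when
  # no process currently holds the GPU open (helps across suspend/resume).
  hardware.nvidia.nvidiaPersistenced = true;

  # Assumption, not confirmed here: driver-level suspend/resume support.
  hardware.nvidia.powerManagement.enable = true;
}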
25 Sep 2022
@hexa:lossy.networkhexa (UTC+1)fyi https://github.com/NixOS/nixpkgs/pull/19287911:26:48
29 Sep 2022
@ss:someonex.netSomeoneSerge (back on matrix)

FRidh: Hey, I just don't want to flood the nvidia-ml-py issue, so I post here: the thing is I don't know what you mean by "referring from the stub to the driver library"

I know that if you just run LD_PRELOAD=$(nix-build '<nixpkgs>' -A cudaPackages.cudatoolkit --no-out-link)/lib/stubs/libnvidia-ml.so nvidia-smi (or prepend the stub path in RUNPATH) without doing anything else, you're going to get a failure. If you meant something else (e.g. if you've heard of other ways to use the stubs than to bypass linker errors), then it might be that I just lack knowledge

10:59:49
@FRidh:matrix.orgFRidhYes, and what if you do that after patchelf'ing ${cudatoolkit}/lib/stubs/libnvidia-ml.so to point to /run/current-system/... ?11:06:12
@FRidh:matrix.orgFRidhI have never tried this myself, just thought this should work.11:06:31
@ss:someonex.netSomeoneSerge (back on matrix) Ok, I don't know of anything like it. The way I see it: the downstream app (e.g. nvidia-ml-py) searches for libnvidia-ml.so, resolves that to lib/stubs/libnvidia-ml.so, dlopen's it, and calls a "function" defined there. Would this now have to look for a new libnvidia-ml.so? 11:17:09
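A hedged sketch of the pattern under discussion (package name and build details are illustrative, not from the thread): in nixpkgs the stub is normally used only to satisfy the linker at build time, and at runtime the loader is pointed at /run/opengl-driver/lib, e.g. with the addOpenGLRunpath hook, so the system's real libnvidia-ml.so is found instead of the stub.

{ stdenv, cudaPackages, addOpenGLRunpath }:

stdenv.mkDerivation {
  pname = "nvml-example";   # hypothetical package, for illustration only
  version = "0.1";
  src = ./.;

  nativeBuildInputs = [ addOpenGLRunpath ];

  # Build time only: link against the stub so no real driver is needed
  # inside the build sandbox.
  NIX_LDFLAGS = "-L${cudaPackages.cudatoolkit}/lib/stubs -lnvidia-ml";

  # Run time: add /run/opengl-driver/lib to the RUNPATH so the driver's
  # real libnvidia-ml.so takes precedence over the stub.
  postFixup = ''
    addOpenGLRunpath $out/bin/*
  '';
}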
@ss:someonex.netSomeoneSerge (back on matrix)image.png
11:28:14
