!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

291 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers

23 Jun 2025
@longregen:matrix.orglon joined the room.08:55:01
@longregen:matrix.orglonHi! I have a question, would anybody be interested in a services.vllm module? I was working on running it as systemd service and hardening it and I'm happy with the result...08:57:13
@longregen:matrix.orglonDownload vllm.nix08:58:43
@longregen:matrix.orglon(I've never contributed to nixpkgs, so I'm not sure how high quality this is)08:59:15
@longregen:matrix.orglon

The interesting part is

      MemoryDenyWriteExecute = false; # Needed for CUDA/PyTorch JIT
      PrivateDevices = false; # Needed for GPU access
      RestrictAddressFamilies = ["AF_UNIX" "AF_INET" "AF_INET6" "AF_NETLINK"];
      DevicePolicy = "closed"; # Only allow the following devices, based on strace usage:
      DeviceAllow = lib.flatten [
        # Basic devices
        "/dev/null rw"
        "/dev/urandom r"
        "/dev/tty rw"

        # NVIDIA control devices
        "/dev/nvidiactl rw"
        "/dev/nvidia-modeset rw"
        "/dev/nvidia-uvm rw"
        "/dev/nvidia-uvm-tools rw"

        (builtins.map (i: "/dev/nvidia${builtins.toString i} rw") (lib.splitString " " cfg.cudaDevices))

        # NVIDIA capability devices
        "/dev/nvidia-caps/nvidia-cap1 r"
        "/dev/nvidia-caps/nvidia-cap2 r"
      ];
      ProtectKernelTunables = true;
      ProtectKernelModules = true;
      ProtectControlGroups = true;
      RestrictNamespaces = true;
      LockPersonality = true;
      RestrictRealtime = true;
      RestrictSUIDSGID = true;
      RemoveIPC = true;
      PrivateMounts = true;
      PrivateUsers = true;
      ProtectHostname = true;
      ProtectKernelLogs = true;
      ProtectClock = true;
      ProtectProc = "invisible";
      UMask = "0077";
      CapabilityBoundingSet = ["CAP_SYS_NICE"];
      AmbientCapabilities = ["CAP_SYS_NICE"];
09:06:10
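To see what the `DeviceAllow` list above expands to, here is a minimal stand-alone sketch. It assumes `cfg.cudaDevices` is a space-separated string of GPU indices such as "0 1"; the actual option type in vllm.nix is not shown in this log, so the `cudaDevices` binding below is hypothetical. The list is shortened to one static entry plus the generated per-GPU entries.

```nix
# Sketch only: evaluate with `nix-instantiate --eval --strict devices.nix`.
# `cudaDevices` stands in for cfg.cudaDevices; its format is an assumption.
let
  lib = import <nixpkgs/lib>;
  cudaDevices = "0 1";
in
lib.flatten [
  # Static control device, copied from the snippet above.
  "/dev/nvidiactl rw"

  # One entry per GPU index; the nested list is merged by lib.flatten.
  (map (i: "/dev/nvidia${i} rw") (lib.splitString " " cudaDevices))
]
# → [ "/dev/nvidiactl rw" "/dev/nvidia0 rw" "/dev/nvidia1 rw" ]
```

Since `lib.splitString` already returns strings, the `builtins.toString` call in the original is harmless but not required.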
@connorbaker:matrix.orgconnor (he/him)

Two things I've promised to look at today:

  1. Bumping the version of protobuf used by OpenCV, which hasn't been updated in a while (need to backport to 25.05 as well).
  2. Figuring out how to revert https://github.com/NixOS/nixpkgs/pull/414647 in a way that doesn't break consumers of OpenCV -- really don't want cudatoolkit propagated to all consumers of OpenCV.
17:30:42
24 Jun 2025
@connorbaker:matrix.orgconnor (he/him):L23:45:47
@connorbaker:matrix.orgconnor (he/him) https://github.com/NixOS/nixpkgs/blob/5d0aa4675f7a35ec9661325d1dc22dfcbba5d040/pkgs/development/python-modules/warp-lang/default.nix#L100 is wrong; there's no bsd license 23:45:58
@connorbaker:matrix.orgconnor (he/him)https://github.com/NixOS/nixpkgs/pull/41972223:56:43
@hexa:lossy.networkhexaproper meta-checks when 🙂 23:59:26
25 Jun 2025
@connorbaker:matrix.orgconnor (he/him)WIP other PR to fix the CUDA builds: https://github.com/NixOS/nixpkgs/pull/41975001:30:11
@glepage:matrix.orgGaétan LepageThanks for catching this, guys06:37:59
@connorbaker:matrix.orgconnor (he/him) Okay that was a gigantic pain in the ass but I think that PR is all good to go now; added passthru.tests as well. 22:37:42
@indoor_squirrel:matrix.orgindoor_squirrel* You go, Connor! We're all rooting for you! Thanks @ss:someonex.net and Gaétan Lepage!22:42:01
27 Jun 2025
@ss:someonex.netSomeoneSerge (back on matrix) * connor (he/him) (UTC-7): can you test if https://gist.github.com/SomeoneSerge/75a8ec66917bc2dd8242e638a2c809f3 is sufficient to make nix run ...saxpy work without extra fuss on an ubuntu/jetpack? 13:18:16
@ss:someonex.netSomeoneSerge (back on matrix)

I'm smh getting an error despite the cuda_compat driver:

Runtime version: 12080
Driver version: 12080
Host memory initialized, copying to the device
CUDA error at cudaMalloc(&xDevice, N * sizeof(float)): system has unsupported display driver / cuda driver combination
13:20:34
@connorbaker:matrix.orgconnor (he/him)I'll take a look in a few16:11:38
@connorbaker:matrix.orgconnor (he/him) Off the top of my head -- are you running JetPack 6? On JetPack 5 cuda_compat only works up through 12.2.
The other thing I can think of: make sure the cuda_compat driver comes before the host driver so it's loaded first
16:13:50
@connorbaker:matrix.orgconnor (he/him) IIRC if the host driver is loaded first, it ignores the one provided by cuda_compat. I ran into a bunch of issues in my fork of cuda-packages because autoAddDriverRunpath and autoAddCudaCompatHook both append to RUNPATH, so the order they execute in is significant. That is what started the whole process of me writing the array-utilities setup hooks: if I was going to have to re-arrange arrays (hook order), I wanted to make sure I only had to write the code once and could test it. 16:16:21
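The ordering point can be illustrated with a small override that rewrites RUNPATH after the setup hooks have run. This is a sketch under assumptions, not how the hooks themselves work: it assumes `cudaPackages.saxpy` and `cudaPackages.cuda_compat` are available on the target platform, that the binary lives at `$out/bin/saxpy`, and that the compat libraries sit in a `compat/` subdirectory (as the store path in the loader trace further down suggests).

```nix
# Sketch only: puts the cuda_compat directory in front of whatever RUNPATH
# the existing hooks produced, so it is searched before /run/opengl-driver/lib.
{
  pkgs ? import <nixpkgs> {
    config.allowUnfree = true;
    config.cudaSupport = true;
  },
}:
let
  inherit (pkgs.cudaPackages) saxpy cuda_compat;
in
saxpy.overrideAttrs (prev: {
  nativeBuildInputs = (prev.nativeBuildInputs or [ ]) ++ [ pkgs.patchelf ];

  postFixup = (prev.postFixup or "") + ''
    # Read the RUNPATH left by the earlier hooks, then prepend the compat dir.
    old="$(patchelf --print-rpath "$out/bin/saxpy")"
    patchelf --set-rpath "${cuda_compat}/compat:$old" "$out/bin/saxpy"
  '';
})
```

Running `patchelf --print-rpath result/bin/saxpy` on the build output is a quick way to check which directory ended up first.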
@connorbaker:matrix.orgconnor (he/him) SomeoneSerge (Ever OOMed by Element): here's what I got: https://gist.github.com/ConnorBaker/d6791db3dd5a385abfc562af161856e9 20:56:29
@connorbaker:matrix.orgconnor (he/him) It successfully finds and loads the first vendor libraries it needs (libnvrm_gpu.so and libnvrm_mem.so), but then fails to find dependencies of those (like libnvos.so) because they have empty runpaths! 20:58:03
@connorbaker:matrix.orgconnor (he/him)

As an example, doing

sudo /home/connor/.local/state/nix/profile/bin/patchelf --set-rpath '$ORIGIN' /run/opengl-driver/lib/libnvrm_gpu.so

allows it to find more libraries! Not enough to succeed, but it then says

    264389:	find library=libcuda.so.1 [0]; searching
    264389:	 search path=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat:/run/opengl-driver/lib:/nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib		(RUNPATH from file /nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib/libcudart.so.12)
    264389:	  trying file=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat/libcuda.so.1
    264389:	
    264389:	find library=libnvrm_gpu.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_gpu.so
    264389:	
    264389:	find library=libnvrm_mem.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_mem.so
    264389:	
    264389:	find library=libnvos.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvos.so
    264389:	
    264389:	find library=libnvsocsys.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvsocsys.so
    264389:	
    264389:	find library=libnvrm_sync.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_sync.so
    264389:	
    264389:	find library=libnvsciipc.so [0]; searching
    264389:	 search cache=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/etc/ld.so.cache
    264389:	 search path=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib:/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib		(system search path)
    264389:	  trying file=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib/libnvsciipc.so
    264389:	  trying file=/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib/libnvsciipc.so
21:05:38
@connorbaker:matrix.orgconnor (he/him)Perhaps instead of symlinking the host libs, we copy them and patchelf them so they can search in the local directory?21:15:48
@connorbaker:matrix.orgconnor (he/him)Updated the gist so it does what I proposed in the previous message; seems to work!21:47:45
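The copy-and-patch idea can be sketched as a small helper script built with Nix. This is not the content of the gist: the script name and its two arguments are hypothetical, and the location of the host driver libraries has to be supplied by the caller because it is not stated in this log.

```nix
# Sketch only: copy host driver libraries into a writable directory and set
# each copy's RUNPATH to $ORIGIN so siblings such as libnvos.so are found.
{
  pkgs ? import <nixpkgs> { },
}:
pkgs.writeShellScriptBin "copy-host-driver-libs" ''
  set -euo pipefail

  src="''${1:?usage: copy-host-driver-libs <host-lib-dir> <dest-dir>}"
  dest="''${2:?usage: copy-host-driver-libs <host-lib-dir> <dest-dir>}"

  mkdir -p "$dest"

  # Dereference symlinks so real files are copied and can be modified.
  cp -L "$src"/*.so* "$dest"/
  chmod u+w "$dest"/*.so*

  for lib in "$dest"/*.so*; do
    # The single quotes keep $ORIGIN literal for the dynamic loader.
    ${pkgs.patchelf}/bin/patchelf --set-rpath '$ORIGIN' "$lib"
  done
''
```

Copying instead of symlinking matters here because `$ORIGIN` resolves to the directory of the real file, and patching a symlink target would modify the host's own libraries.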
@rosscomputerguy:matrix.orgTristan RossI tried OBS and I'm trying out ollama using CUDA, works great on Ampere Altra Max.22:01:05
28 Jun 2025
@yzx9:matrix.orgZexin Yuan joined the room.05:56:13
@rdg:matrix.org@rdg:matrix.org left the room.23:24:12
30 Jun 2025
@ereslibre:ereslibre.socialereslibre hi everyone! I have reintroduced --gpus for docker and refactored the code a bit to make it easier to maintain. Please have a look at https://github.com/NixOS/nixpkgs/pull/421088 when you have some time; I can confirm it works in all cases (except for --gpus with rootless mode, which never worked afaik) 07:19:09
