!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

251 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
46 Servers



25 Jun 2025
@indoor_squirrel:matrix.org indoor_squirrel * You go, Connor! We're all rooting for you! Thanks @ss:someonex.net and Gaétan Lepage! 22:42:01
27 Jun 2025
@ss:someonex.net SomeoneSerge (Ever OOMed by Element) * connor (he/him) (UTC-7): can you test whether https://gist.github.com/SomeoneSerge/75a8ec66917bc2dd8242e638a2c809f3 is sufficient to make nix run ...saxpy work without extra fuss on Ubuntu/JetPack? 13:18:16
@ss:someonex.net SomeoneSerge (Ever OOMed by Element)

I'm smh getting an error despite the cuda_compat driver:

Runtime version: 12080
Driver version: 12080
Host memory initialized, copying to the device
CUDA error at cudaMalloc(&xDevice, N * sizeof(float)): system has unsupported display driver / cuda driver combination
13:20:34
@connorbaker:matrix.org connor (he/him) (UTC-7) I'll take a look in a few 16:11:38
@connorbaker:matrix.org connor (he/him) (UTC-7) Off the top of my head -- are you running JetPack 6? On JetPack 5 cuda_compat only works up through 12.2.
The other thing I can think of: make sure the cuda_compat driver comes before the host driver so it's loaded first
16:13:50
@connorbaker:matrix.org connor (he/him) (UTC-7) IIRC if the host driver is loaded first it ignores the one provided by cuda_compat. I ran into a bunch of issues in my fork of cuda-packages because autoAddDriverRunpath and autoAddCudaCompatHook both append to RUNPATH, so the order they execute in is significant. That's what started the whole process of me writing the array-utilities setup hooks: if I was going to have to rearrange arrays (hook order), I wanted to make sure I only had to write the code once and could test it. 16:16:21
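The ordering point above can be illustrated with a small shell sketch. It is a simplified model, not the real loader: it only walks a colon-separated search path in order and reports the first directory that contains the requested file. The function name first_match is hypothetical, and the real dynamic loader also consults other sources (for example LD_LIBRARY_PATH and the ld.so cache).

```shell
# Hypothetical helper (not part of nixpkgs or glibc): print the first
# directory in a colon-separated search path that contains the given file.
# Usage: first_match <library-name> <dir1:dir2:...>
# The body runs in a subshell so the IFS and globbing changes stay local.
first_match() (
  lib=$1
  set -f        # do not glob-expand path entries
  IFS=:         # split the search path on colons
  for dir in $2; do
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
      exit 0    # first hit wins; later entries are never consulted
    fi
  done
  exit 1        # not found in any entry
)
```

In this model, listing the cuda_compat directory before /run/opengl-driver/lib makes a lookup for libcuda.so.1 resolve to the cuda_compat copy, and reversing the order makes the host copy win, which is why the order in which the two hooks append to RUNPATH matters.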
@connorbaker:matrix.org connor (he/him) (UTC-7) SomeoneSerge (Ever OOMed by Element): here's what I got: https://gist.github.com/ConnorBaker/d6791db3dd5a385abfc562af161856e9 20:56:29
@connorbaker:matrix.org connor (he/him) (UTC-7) It successfully finds and loads the first vendor libraries it needs (libnvrm_gpu.so and libnvrm_mem.so), but then fails to find dependencies of those (like libnvos.so) because they have empty runpaths! 20:58:03
@connorbaker:matrix.org connor (he/him) (UTC-7)

As an example, doing

sudo /home/connor/.local/state/nix/profile/bin/patchelf --set-rpath '$ORIGIN' /run/opengl-driver/lib/libnvrm_gpu.so

allows it to find more libraries! Not enough to succeed, but it then says

    264389:	find library=libcuda.so.1 [0]; searching
    264389:	 search path=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat:/run/opengl-driver/lib:/nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib		(RUNPATH from file /nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib/libcudart.so.12)
    264389:	  trying file=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat/libcuda.so.1
    264389:	
    264389:	find library=libnvrm_gpu.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_gpu.so
    264389:	
    264389:	find library=libnvrm_mem.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_mem.so
    264389:	
    264389:	find library=libnvos.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvos.so
    264389:	
    264389:	find library=libnvsocsys.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvsocsys.so
    264389:	
    264389:	find library=libnvrm_sync.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_sync.so
    264389:	
    264389:	find library=libnvsciipc.so [0]; searching
    264389:	 search cache=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/etc/ld.so.cache
    264389:	 search path=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib:/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib		(system search path)
    264389:	  trying file=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib/libnvsciipc.so
    264389:	  trying file=/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib/libnvsciipc.so
21:05:38
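The trace above has the shape of glibc's LD_DEBUG=libs output (an assumption; the message does not say how it was produced). A minimal sketch for pulling out just the names of the libraries the loader searched for, reading such a trace on stdin. The function name searched_libs is hypothetical:

```shell
# Hypothetical helper: read a loader trace on stdin and print one library
# name per "find library=NAME [N]; searching" line, in order of appearance.
searched_libs() {
  sed -n 's/.*find library=\([^ ]*\) \[[0-9]*\]; searching.*/\1/p'
}
```

Fed the trace above, this would list libcuda.so.1, libnvrm_gpu.so, libnvrm_mem.so, and so on, which makes it easier to see which lookup is the last one before the failure.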
@connorbaker:matrix.org connor (he/him) (UTC-7) Perhaps instead of symlinking the host libs, we copy them and patchelf them so they can search in the local directory? 21:15:48
@connorbaker:matrix.org connor (he/him) (UTC-7) Updated the gist so it does what I proposed in the previous message; seems to work! 21:47:45
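The idea in the two messages above (copy the host libraries instead of symlinking them, then patchelf the copies so they search their own directory) might look roughly like the following. This is a hypothetical sketch, not the contents of the gist: the function name copy_and_patch_libs, the PATCHELF variable, and the *.so* glob are assumptions; only patchelf --set-rpath '$ORIGIN' comes from the earlier message.

```shell
# Assumption: PATCHELF names the patchelf binary; override it if patchelf
# is not on PATH.
PATCHELF=${PATCHELF:-patchelf}

# Hypothetical helper: copy every shared library from src_dir into dest_dir
# (following symlinks, so the copies are real files) and point each copy's
# run path at its own directory via $ORIGIN.
# Usage: copy_and_patch_libs <src_dir> <dest_dir>
copy_and_patch_libs() {
  src_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  for lib in "$src_dir"/*.so*; do
    [ -e "$lib" ] || continue           # skip an unmatched glob
    name=$(basename "$lib")
    cp -L "$lib" "$dest_dir/$name" || return 1
    chmod u+w "$dest_dir/$name" || return 1   # copies may be read-only
    # Single quotes keep $ORIGIN literal so the loader expands it at runtime.
    "$PATCHELF" --set-rpath '$ORIGIN' "$dest_dir/$name" || return 1
  done
}
```

Because every copy then looks in its own directory first, a library such as libnvrm_gpu.so could find libnvos.so next to it even though the original had an empty runpath.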
@rosscomputerguy:matrix.org Tristan Ross I tried OBS and I'm trying out ollama using CUDA, works great on Ampere Altra Max. 22:01:05
28 Jun 2025
@yzx9:matrix.org Zexin Yuan joined the room. 05:56:13
@rdg:matrix.org left the room. 23:24:12
30 Jun 2025
@ereslibre:ereslibre.social ereslibre hi everyone! I have reintroduced --gpus for docker and refactored the code a bit to make it easier to maintain. Please have a look at https://github.com/NixOS/nixpkgs/pull/421088 when you have some time; I can confirm it works in all cases (except for --gpus with rootless mode, which never worked afaik) 07:19:09
@ereslibre:ereslibre.social ereslibre * although the container device interface (CDI) should be the way to go, there are still many tutorials/manuals/scripts out there that expect --gpus to work. This will make it easier for our nixos users 07:20:07
@ereslibre:ereslibre.social ereslibre

Actually, I correct myself. Rootless works with --gpus too, if the nvidia-container-runtime is properly configured. Given CDI works for rootless just fine, I don't think it's worth putting much effort into that automation.

I might open a PR to document that

11:13:11
1 Jul 2025
@djacu:matrix.org djacu Hey CUDA Team! In case you haven't seen the recent post on discourse, the Marketing Team is preparing this year's community survey. I am reaching out to teams to see if there are any questions they would like to add to the survey to better serve the work you all do. More details in the post linked below. https://discourse.nixos.org/t/community-feedback-requested-2025-nix-community-survey-planning/66155 03:24:48
2 Jul 2025
@apyh:matrix.org apyh joined the room. 17:28:38
3 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7) SomeoneSerge (Ever OOMed by Element): it occurred to me, this PR I'm going to be reviewing (https://github.com/NixOS/nixpkgs/pull/420575) could potentially serve as a way to inject information about requiredSystemFeatures into the contents of a derivation. It would be possible to have the GPU hook look for that content within the derivation and allow remote builders with GPUs to work as expected, right? (My recollection was that because the requiredSystemFeatures stuff wasn't in the derivation, even if a build was routed to a GPU-equipped builder, it wouldn't know to add the GPU to the sandbox because the derivation content gave no indication that it should.) 15:28:22
5 Jul 2025
@aidalgol:matrix.org aidalgol Would ZLUDA be under the purview of the nixpkgs CUDA maintainers? 05:54:53
7 Jul 2025
@tobtobxx:matrix.org left the room. 14:27:56
8 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7)
In reply to @aidalgol:matrix.org
Would ZLUDA be under the purview of the nixpkgs CUDA maintainers?
I don’t think it would be since it’s expressly about getting CUDA stuff working on non-NVIDIA devices (although that would definitely benefit from functioning packages!)
AFAICT it’s also in violation of NVIDIA’s EULA for that exact reason :/
08:37:41
@connorbaker:matrix.org connor (he/him) (UTC-7)

SomeoneSerge (Ever OOMed by Element): two things:

  1. Can we push back our weekly by 30m? (I’m still not asleep so today is going to be brutal)
  2. I was getting ready to fix up one of the doc PRs and remembered this comment: https://github.com/NixOS/nixpkgs/pull/414612#discussion_r2137625627. Do you still feel that tooling needs to exist before the documentation changes along the lines of what’s proposed in the PR are merged?
08:41:14
@connorbaker:matrix.org connor (he/him) (UTC-7) Okay, bonus thing: I’ve been taking a swing at getting the required-nix-mount stuff you wrote working with Jetson devices 08:42:50
@connorbaker:matrix.org connor (he/him) (UTC-7) Also in case people didn't see this, some exciting changes coming in the next release! https://github.com/NixOS/nix/pull/13407 14:59:43
10 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7) If anyone has the bandwidth to look at https://github.com/NixOS/nixpkgs/pull/422208 or https://github.com/NixOS/nixpkgs/pull/419335, I'd appreciate it 22:52:13



Room Version: 9