!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

251 Members · 46 Servers
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



27 Jun 2025
@connorbaker:matrix.org connor (he/him) (UTC-7) I'll take a look in a few 16:11:38
@connorbaker:matrix.org connor (he/him) (UTC-7) Off the top of my head -- are you running JetPack 6? On JetPack 5 cuda_compat only works up through 12.2.
The other thing I can think of: make sure the cuda_compat driver comes before the host driver so it's loaded first
16:13:50
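
(Note: a quick way to check which JetPack generation a board is running, at least on a stock L4T/JetPack userspace -- the file may not exist on a pure NixOS image -- is to read the L4T release file; R35.x corresponds to JetPack 5 and R36.x to JetPack 6.)

    cat /etc/nv_tegra_release   # "# R35 ..." => JetPack 5, "# R36 ..." => JetPack 6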
@connorbaker:matrix.org connor (he/him) (UTC-7) IIRC if the host driver is loaded first, it ignores the one provided by cuda_compat. (I ran into a bunch of issues in my fork of cuda-packages because autoAddDriverRunpath and autoAddCudaCompatHook both append to RUNPATH, so the order they execute in is significant. That's what started the whole process of me writing the array-utilities setup hooks: if I was going to have to re-arrange arrays (hook order), I wanted to make sure I only had to write the code once and could test it.) 16:16:21
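
(Note: since the dynamic loader walks RUNPATH entries left to right and takes the first match, one way to confirm which libcuda.so.1 actually wins is to print the binary's RUNPATH and trace the loader; the saxpy binary below is the same example that appears in the trace further down.)

    patchelf --print-rpath ./saxpy/bin/saxpy
    LD_DEBUG=libs ./saxpy/bin/saxpy 2>&1 | grep -A2 'find library=libcuda.so.1'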
@connorbaker:matrix.org connor (he/him) (UTC-7) SomeoneSerge (Ever OOMed by Element): here's what I got: https://gist.github.com/ConnorBaker/d6791db3dd5a385abfc562af161856e9 20:56:29
@connorbaker:matrix.org connor (he/him) (UTC-7) It successfully finds and loads the first vendor libraries it needs (libnvrm_gpu.so and libnvrm_mem.so), but then fails to find dependencies of those (like libnvos.so) because they have empty runpaths! 20:58:03
@connorbaker:matrix.org connor (he/him) (UTC-7)

As an example, doing

sudo /home/connor/.local/state/nix/profile/bin/patchelf --set-rpath '$ORIGIN' /run/opengl-driver/lib/libnvrm_gpu.so

allows it to find more libraries! Not enough to succeed, but it then says

    264389:	find library=libcuda.so.1 [0]; searching
    264389:	 search path=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat:/run/opengl-driver/lib:/nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib		(RUNPATH from file /nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib/libcudart.so.12)
    264389:	  trying file=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat/libcuda.so.1
    264389:	
    264389:	find library=libnvrm_gpu.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_gpu.so
    264389:	
    264389:	find library=libnvrm_mem.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_mem.so
    264389:	
    264389:	find library=libnvos.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvos.so
    264389:	
    264389:	find library=libnvsocsys.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvsocsys.so
    264389:	
    264389:	find library=libnvrm_sync.so [0]; searching
    264389:	 search path=/run/opengl-driver/lib		(RUNPATH from file ./saxpy/bin/saxpy)
    264389:	  trying file=/run/opengl-driver/lib/libnvrm_sync.so
    264389:	
    264389:	find library=libnvsciipc.so [0]; searching
    264389:	 search cache=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/etc/ld.so.cache
    264389:	 search path=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib:/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib		(system search path)
    264389:	  trying file=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib/libnvsciipc.so
    264389:	  trying file=/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib/libnvsciipc.so
21:05:38
@connorbaker:matrix.org connor (he/him) (UTC-7) Perhaps instead of symlinking the host libs, we copy them and patchelf them so they can search in the local directory? 21:15:48
@connorbaker:matrix.org connor (he/him) (UTC-7) Updated the gist so it does what I proposed in the previous message; seems to work! 21:47:45
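
(Note: roughly, the copy-and-patch approach described above looks like the sketch below; the target directory and the libnv* glob are illustrative, the real library list is whatever /run/opengl-driver/lib ships on the Jetson image -- see the gist for the actual implementation.)

    # Copy the Tegra vendor libraries instead of symlinking them, then give each
    # an $ORIGIN runpath so their empty-runpath dependencies (libnvos.so, etc.)
    # resolve from the same directory.
    mkdir -p ./vendor-lib
    cp /run/opengl-driver/lib/libnv*.so ./vendor-lib/
    chmod u+w ./vendor-lib/libnv*.so
    for lib in ./vendor-lib/libnv*.so; do
      patchelf --set-rpath '$ORIGIN' "$lib"
    done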
@rosscomputerguy:matrix.org Tristan Ross I tried OBS and I'm trying out ollama using CUDA, works great on Ampere Altra Max. 22:01:05
28 Jun 2025
@yzx9:matrix.org Zexin Yuan joined the room. 05:56:13
@rdg:matrix.org left the room. 23:24:12
30 Jun 2025
@ereslibre:ereslibre.social ereslibre hi everyone! I have reintroduced --gpus for docker and refactored the code a bit to make it easier to maintain. Please have a look at https://github.com/NixOS/nixpkgs/pull/421088 when you have some time; I can confirm it works in all cases (except for --gpus with rootless mode, which never worked AFAIK). 07:19:09
@ereslibre:ereslibre.social ereslibre Although the container device interface (CDI) should be the way to go, there are still many tutorials/manuals/scripts out there that expect --gpus to work. This will make it easier for our NixOS users. 07:20:01
@ereslibre:ereslibre.social ereslibre

Actually, I correct myself. Rootless works with --gpus too, if the nvidia-container-runtime is properly configured. Given CDI works for rootless just fine, I don't think it's worth putting much effort into that automation.

I might open a PR to document that

11:13:11
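
(Note: for reference, the two invocation styles being compared here look roughly like the sketch below; the image name is just an example, and the CDI form assumes the Docker daemon has CDI enabled and an NVIDIA CDI spec has been generated, e.g. via hardware.nvidia-container-toolkit.enable on NixOS.)

    # Legacy flag handled via nvidia-container-runtime (what the PR reintroduces):
    docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

    # CDI-style equivalent, which also works rootless once a CDI spec exists:
    docker run --rm --device nvidia.com/gpu=all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi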
1 Jul 2025
@djacu:matrix.org djacu Hey CUDA Team! In case you haven't seen the recent post on Discourse, the Marketing Team is preparing this year's community survey. I am reaching out to teams to see if there are any questions they would like to add to the survey to better serve the work you all do. More details in the post linked below. https://discourse.nixos.org/t/community-feedback-requested-2025-nix-community-survey-planning/66155 03:24:48
2 Jul 2025
@apyh:matrix.org apyh joined the room. 17:28:38
3 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7) SomeoneSerge (Ever OOMed by Element): it occurred to me that this PR I'm going to be reviewing (https://github.com/NixOS/nixpkgs/pull/420575) could potentially serve as a way to inject information about requiredSystemFeatures into the contents of a derivation. It would be possible to have the GPU hook look for that content within the derivation and allow remote builders with GPUs to work as expected, right? (My recollection was that because the requiredSystemFeatures stuff wasn't in the derivation, even if a build was routed to a GPU-equipped builder it wouldn't know to add the GPU to the sandbox, because the derivation content gave no indication that it should.) 15:28:22
5 Jul 2025
@aidalgol:matrix.org aidalgol Would ZLUDA be under the purview of the nixpkgs CUDA maintainers? 05:54:53
7 Jul 2025
@tobtobxx:matrix.org left the room. 14:27:56
8 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7)
In reply to @aidalgol:matrix.org
Would ZLUDA be under the purview of the nixpkgs CUDA maintainers?
I don’t think it would be since it’s expressly about getting CUDA stuff working on non-NVIDIA devices (although that would definitely benefit from functioning packages!)
AFAICT it’s also in violation of NVIDIA’s EULA for that exact reason :/
08:37:41
@connorbaker:matrix.org connor (he/him) (UTC-7)

SomeoneSerge (Ever OOMed by Element): two things:

  1. Can we push back our weekly by 30m? (I’m still not asleep so today is going to be brutal)
  2. I was getting ready to fix up one of the doc PRs and remembered this comment: https://github.com/NixOS/nixpkgs/pull/414612#discussion_r2137625627. Do you still feel that tooling needs to exist before documentation changes along the lines of what’s proposed in that PR are merged?
08:41:14
@connorbaker:matrix.org connor (he/him) (UTC-7) Okay, bonus thing: I’ve been taking a swing at getting the required-nix-mount stuff you wrote working with Jetson devices 08:42:50
@connorbaker:matrix.org connor (he/him) (UTC-7) Also in case people didn't see this, some exciting changes coming in the next release! https://github.com/NixOS/nix/pull/13407 14:59:43
10 Jul 2025
@connorbaker:matrix.org connor (he/him) (UTC-7) If anyone has the bandwidth to look at https://github.com/NixOS/nixpkgs/pull/422208 or https://github.com/NixOS/nixpkgs/pull/419335, I'd appreciate it 22:52:13
13 Jul 2025
@me:caem.dev left the room. 00:13:30
15 Jul 2025
@farmerd:matrix.org farmerd joined the room. 03:17:28
@farmerd:matrix.org farmerd I don't know if anyone has a minute to help double-check me on something quickly, but I've tried about half a dozen different ways to get PyTorch working on NixOS with CUDA and I keep getting build errors. This flake (https://github.com/mschoder/nix-cuda-template) seemed like something someone else could quickly check, to see whether the compilation issues I'm seeing are just me or more widespread. For me it actually generates a segfault in GCC, so it's quite bizarre. 03:23:11
@mcwitt:matrix.org mcwitt

Hi farmerd, could you say a bit more about what you're trying to do and what specific errors you see?

For basic pytorch usage with the CUDA backend, the following minimal flake seems to work fine for me (just tested on nixpkgs-unstable): https://gist.github.com/mcwitt/b6c8da58a2e1fcbc1c2728f8f60ad136

18:04:39
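
(Note: a minimal one-liner sanity check along the same lines as the flake above, assuming the nix command is enabled, unfree packages are allowed, and cudaSupport is set; without a binary cache such as cuda-maintainers.cachix.org this may trigger a very long source build.)

    nix shell --impure \
      --expr 'with import <nixpkgs> { config = { allowUnfree = true; cudaSupport = true; }; }; python3.withPackages (ps: [ ps.torch ])' \
      --command python3 -c 'import torch; print(torch.cuda.is_available())'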


