NixOS CUDA - Public Room Timeline

	NixOS CUDA	251 Members
	CUDA packages maintenance and support in nixpkgs \| https://github.com/orgs/NixOS/projects/27/ \| https://nixos.org/manual/nixpkgs/unstable/#cuda	46 Servers

Load older messages

Sender	Message	Time
27 Jun 2025
connor (he/him) (UTC-7)	I'll take a look in a few	16:11:38
connor (he/him) (UTC-7)	Off the top of my head -- are you running JetPack 6? On JetPack 5 `cuda_compat` only works up through 12.2. The other thing I can think of: make sure the cuda_compat driver comes before the host driver so it's loaded first	16:13:50
connor (he/him) (UTC-7)	IIRC if the host driver is loaded first it ignore the one provided by cuda_compat (I ran into a bunch of issues in my fork of `cuda-packages` because `autoAddDriverRunpath` and `autoAddCudaCompatHook` both append to RUNPATH, so the order they execute in is significant, which is what started the whole process of me writing the array-utilities setup hooks, because if I was going to have to re-arrange arrays (hook order) I wanted to make sure I only had to write the code once and could test it).	16:16:21
connor (he/him) (UTC-7)	SomeoneSerge (Ever OOMed by Element): here's what I got: https://gist.github.com/ConnorBaker/d6791db3dd5a385abfc562af161856e9	20:56:29
connor (he/him) (UTC-7)	It successfully finds and loads the first vendor libraries it needs (`libnvrm_gpu.so` and `libnvrm_mem.so`), but then fails to find dependencies of those (like `libnvos.so`) because they have empty runpaths!	20:58:03
connor (he/him) (UTC-7)	As an example, doing `sudo /home/connor/.local/state/nix/profile/bin/patchelf --set-rpath '$ORIGIN' /run/opengl-driver/lib/libnvrm_gpu.so` allows it to find more libraries! Not enough to succeed, but it then says 264389: find library=libcuda.so.1 [0]; searching 264389: search path=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat:/run/opengl-driver/lib:/nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib (RUNPATH from file /nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib/libcudart.so.12) 264389: trying file=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat/libcuda.so.1 264389: 264389: find library=libnvrm_gpu.so [0]; searching 264389: search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat (RUNPATH from file ./saxpy/bin/saxpy) 264389: trying file=/run/opengl-driver/lib/libnvrm_gpu.so 264389: 264389: find library=libnvrm_mem.so [0]; searching 264389: search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat (RUNPATH from file ./saxpy/bin/saxpy) 264389: trying file=/run/opengl-driver/lib/libnvrm_mem.so 264389: 264389: find library=libnvos.so [0]; searching 264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy) 264389: trying file=/run/opengl-driver/lib/libnvos.so 264389: 264389: find library=libnvsocsys.so [0]; searching 264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy) 264389: trying file=/run/opengl-driver/lib/libnvsocsys.so 264389: 264389: find library=libnvrm_sync.so [0]; searching 264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy) 264389: trying file=/run/opengl-driver/lib/libnvrm_sync.so 264389: 264389: find library=libnvsciipc.so [0]; searching 264389: search cache=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/etc/ld.so.cache 264389: search path=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib:/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib (system search path) 264389: trying file=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib/libnvsciipc.so 264389: trying file=/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib/libnvsciipc.so	21:05:38
connor (he/him) (UTC-7)	Perhaps instead of symlinking the host libs, we copy them and patchelf them so they can search in the local directory?	21:15:48
connor (he/him) (UTC-7)	Updated the gist so it does what I proposed in the previous message; seems to work!	21:47:45
Tristan Ross	I tried OBS and I'm trying out ollama using CUDA, works great on Ampere Altra Max.	22:01:05
28 Jun 2025
	Zexin Yuan joined the room.	05:56:13
	@rdg:matrix.org left the room.	23:24:12
30 Jun 2025
ereslibre	hi everyone! I have reintroduced `--gpus` for docker and refactored the code a bit to make it easier to maintain. Please, have a look at https://github.com/NixOS/nixpkgs/pull/421088 when you have some time; I can confirm it works on all cases (except for --gpus with rootless mode, what never worked afaik)	07:19:09
ereslibre	despite the container device interface (CDI) should be the way to go, there's still many tutorials/manuals/scripts out there that expect `--gpus` to work. This will make it easier for your nixos users	07:20:01
ereslibre	* despite the container device interface (CDI) should be the way to go, there's still many tutorials/manuals/scripts out there that expect `--gpus` to work. This will make it easier for our nixos users	07:20:07
ereslibre	Actually, I correct myself. Rootless works with `--gpus` too, if the `nvidia-container-runtime` is properly configured. Given CDI works for rootless just fine, I don't think it's worth putting much effort into that automation. I might open a PR to document that	11:13:11
ereslibre	* despite the container device interface (CDI) should be the way to go, there's still many tutorials/manuals/scripts out there that expect --gpus to work. This will make it easier for our nixos users	19:08:34
1 Jul 2025
djacu	Hey Cuda Team In case you haven't seen the recent post on discourse, the Marketing Team is preparing this year's community survey. I am reaching out to teams to see if there are any questions they would like to add to the survey to better serve the work you all do. More details in the post linked below. https://discourse.nixos.org/t/community-feedback-requested-2025-nix-community-survey-planning/66155	03:24:48
2 Jul 2025
	apyh joined the room.	17:28:38
3 Jul 2025
connor (he/him) (UTC-7)	SomeoneSerge (Ever OOMed by Element): it occurred to me, this PR I'm going to be reviewing (https://github.com/NixOS/nixpkgs/pull/420575) could potentially serve as a way to inject information about `requiredSystemFeatures` into the contents of a derivation. It would be possible to have the GPU hook look for that content within the derivation and allow remote builders with GPUs to work as expected, right? (My recollection was that because the requiredSystemFeatures stuff wasn't in the derivation that even if a build was routed to a GPU-equipped builder it wouldn't know to add the GPU to the sandbox because the derivation content gave no indication that it should.)	15:28:22
5 Jul 2025
aidalgol	Would ZLUDA be under the purview of the nixpkgs CUDA maintainers?	05:54:53
7 Jul 2025
	@tobtobxx:matrix.org left the room.	14:27:56
8 Jul 2025
connor (he/him) (UTC-7)	In reply to @aidalgol:matrix.org Would ZLUDA be under the purview of the nixpkgs CUDA maintainers? I don’t think it would be since it’s expressly about getting CUDA stuff working on non-NVIDIA devices (although that would definitely benefit from functioning packages!) AFAICT it’s also in violation of NVIDIA’s EULA for that exact reason :/	08:37:41
connor (he/him) (UTC-7)	SomeoneSerge (Ever OOMed by Element)two things: Can we push back our weekly by 30m? (I’m still not asleep so today is going to be brutal) I was getting ready to fix up one of the doc PR and remembered this comment: https://github.com/NixOS/nixpkgs/pull/414612#discussion_r2137625627. Do you still feel that tooling needs to exist prior to the changes to documentation along the lines of what’s proposed in the PR is merged?	08:41:14
connor (he/him) (UTC-7)	Okay, bonus thing: I’ve been taking a swing at getting the required-nix-mount stuff you wrote working with Jetson devices	08:42:50
connor (he/him) (UTC-7)	Also in case people didn't see this, some exciting changes coming in the next release! https://github.com/NixOS/nix/pull/13407	14:59:43
10 Jul 2025
connor (he/him) (UTC-7)	If anyone has the bandwidth to look at https://github.com/NixOS/nixpkgs/pull/422208 or https://github.com/NixOS/nixpkgs/pull/419335, I'd appreciate it	22:52:13
13 Jul 2025
	@me:caem.dev left the room.	00:13:30
15 Jul 2025
	farmerd joined the room.	03:17:28
farmerd	I don't know if anyone has a minute to help double check me on something quickly but I've tried about half a dozen different ways to get pytorch working on nixos with cuda and I am continually getting build errors. This flake (https://github.com/mschoder/nix-cuda-template ) seemed like something that perhaps someone else could quickly check to see if the compilation issues I'm seeing are just me or more widespread? For me it actually generates a segfault in GCC so it's quite bizarre.	03:23:11
mcwitt	Hi farmerd , could you say a bit more about what you're trying to do and what specific errors you see? For basic pytorch usage with the CUDA backend, the following minimal flake seems to work fine for me (just tested on nixpkgs-unstable): https://gist.github.com/mcwitt/b6c8da58a2e1fcbc1c2728f8f60ad136	18:04:39

There are no newer messages yet.

Back to Room ListRoom Version: 9