NixOS CUDA - Public Room Timeline

	NixOS CUDA	306 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	60 Servers

Sender	Message	Time
18 Nov 2025
SomeoneSerge (matrix works sometimes)	Gaétan Lepage: not quite a morning slot, but wdyt about 21:15 Paris for the weekly?	14:13:14
connor (burnt/out) (UTC-8)	I should be able to attend too	16:00:11
Gaétan Lepage	Way better for me.	16:14:49
19 Nov 2025
	Eymeric joined the room.	12:59:28
	Jeremy Fleischman (jfly) joined the room.	18:13:28
Jeremy Fleischman (jfly)	i'm confused about the compatibility story between whatever libcuda.so file i have in `/run/opengl-driver` and my nvidia kernel module. i've read through <nixos/modules/hardware/video/nvidia.nix> and i see that `hardware.graphics.extraPackages` basically gets set to `pkgs.linuxKernel.packages.linux_6_12.nvidiaPackages.stable.out` (or whatever kernel i have selected). how much drift (if any) is allowed here?	18:18:44
Jeremy Fleischman (jfly)	to avoid an XY problem: what i'm actually doing is experimenting with defining systemd nixos containers that run cuda software internally, and i'm not sure how to get the right libcuda.so's in those containers so they play nicely with the host's kernel	18:21:46
Jeremy Fleischman (jfly)	if the answer is "just keep them perfectly in sync with the host kernel's version", that's OK. just trying to flesh out my mental model	18:22:27
connor (burnt/out) (UTC-8)	`libcuda.so` is provided by the NVIDIA CUDA driver, which for our purposes is generally part of the NVIDIA driver for your GPU. Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have `libcuda.so` available. The CDI stuff providing GPU access in containers provides /run/opengl-driver/lib (among other things): https://github.com/NixOS/nixpkgs/blob/6c634f7efae329841baeed19cdb6a8c2fc801ba1/nixos/modules/services/hardware/nvidia-container-toolkit/default.nix#L234-L237 General information about forward-backward compat is in NVIDIA's docs here: https://docs.nvidia.com/deploy/cuda-compatibility/#	18:31:45
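
A minimal sketch of the host-side switch behind the linked module, assuming a NixOS host where the NVIDIA driver is already configured elsewhere. `hardware.nvidia-container-toolkit.enable` is the NixOS option that turns on CDI spec generation; everything else about the host is left out here:

```nix
# Host configuration fragment; a sketch, not a complete configuration.
{
  # Assumption: the NVIDIA driver itself is already set up on this host.
  # Generates a CDI specification describing the host's GPUs, including
  # the mount of the host's driver libraries into containers.
  hardware.nvidia-container-toolkit.enable = true;
}
```

CDI-aware runtimes can then request the GPU, for example with `podman run --device nvidia.com/gpu=all ...`. This covers runtimes such as Podman or Docker; whether systemd-nspawn based NixOS containers can consume the same spec is not settled in this discussion.
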
Robbie Buxton	In reply to @jfly:matrix.org: "to avoid an XY problem: what i'm actually doing is experimenting with defining systemd nixos containers that run cuda software internally, and i'm not sure how to get the right libcuda.so's in those containers so they play nicely with the host's kernel" If you run the host system's CUDA kernel drivers ahead of the user-mode drivers, it's normally fine provided it's not a major version change (i.e. 13 vs 12)	18:35:26
Jeremy Fleischman (jfly)	"Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have libcuda.so available." afaik, they do not automatically do anything (please correct me if i'm wrong). i'm making them get their own libcuda.so by explicitly configuring them with `hardware.graphics.enable = true;` and `hardware.graphics.extraPackages`. mounting the cuda runtime from the host makes sense, though! thanks for the link to this nvidia-container-toolkit	18:39:03
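
The explicit configuration described above might look roughly like the following. This is a sketch under assumptions: `cuda-box` is a hypothetical container name, and reusing the host's `config.hardware.nvidia.package` is one possible way to keep the user-space driver aligned with the host, not something confirmed in the discussion:

```nix
# Host configuration fragment; "cuda-box" is a hypothetical name.
{ config, ... }:
{
  containers.cuda-box = {
    # The inner module does not bind `config`, so `config` below still
    # refers to the host's configuration (lexical scoping).
    config = { ... }: {
      hardware.graphics.enable = true;
      # Assumption: reuse the driver package the host is configured with,
      # so the container's libcuda.so comes from the same driver release.
      hardware.graphics.extraPackages = [ config.hardware.nvidia.package.out ];
    };
  };
}
```

This only places the driver libraries under `/run/opengl-driver` inside the container; the GPU device nodes still have to be exposed to the container separately. It also follows the host's current configuration rather than the kernel module that is actually loaded, so the two can drift apart until the next reboot.
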
Lun	What's the current best practice / future plans for impure GPU tests? Is the discussion in https://github.com/NixOS/nixpkgs/issues/225912 up to date? cc SomeoneSerge (back on matrix)	18:43:23
SomeoneSerge (matrix works sometimes)	"Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have libcuda.so available." They don't (unless forced). Libcuda and its closure are mounted from the host.	20:10:33
SomeoneSerge (matrix works sometimes)	* The issue is maybe growing stale, but I'd say there haven't been any fundamental updates. One bit it doesn't mention is that we rewrote most of the tests in terms of a single primitive, `cudaPackages.writeGpuTestPython` (can be overridden for e.g. `rocm`; could be moved outside cuda-modules). It's now also clear that the VM tests can also be done, we'd just have to use a separate marker to signal that a builder exposes an nvidia device with a vfio driver. If we replace the sandboxing mechanism (e.g. with microvms) it'll get trickier... but again, a low-bandwidth baseline with vfio is definitely achievable. And there's still the issue of describing constraints, like listing the architectures or like memory quotas: we need a pluggable mechanism for assessing which builders are compatible with the derivation? Maybe a proxy instead...	20:37:53
SomeoneSerge (matrix works sometimes)	Also note that we still mount `libcuda` from `/run/current-system` instead of `/run/booted-system`...	20:39:08
Jeremy Fleischman (jfly)	Ah that sort of sounds like a bug since we'd want to be compatible with the host kernel?	21:28:58
apyh	yeah, current system means that updating nvidia drivers with a rebuild switch breaks all CUDA until a reboot	21:34:12
apyh	(experience this semi-frequently)	21:34:20
20 Nov 2025
	John joined the room.	05:54:29
ser(ial)	i have a Debian host with an nvidia gpu running incus, and in incus i have nixos containers. how can i use cuda programs in such a container?	10:24:20
	plan9better joined the room.	12:41:04
SomeoneSerge (matrix works sometimes)	Hi. How do you use cuda in a non-NixOS container with Incus? Does it use CDI?	13:22:58
ser(ial)	with a debian container i use the built-in incus "nvidia.runtime", which passes the host NVIDIA and CUDA runtime libraries into the instance	13:30:32
ser(ial)	but nixos naturally does not look for these libraries in that place	13:31:15
ser(ial)	does that mean i need the full libraries in the nixos container, with a version identical to the one on the debian host?	13:32:26
connor (burnt/out) (UTC-8)	Gaétan Lepage: I've got to package ONNX/ONNX Runtime/ONNX TensorRT for C++; if I upstream the PR do you think you'd have the bandwidth to look at it? I'd likely follow what I did here: https://github.com/ConnorBaker/cuda-packages/tree/8a317116a07717b13e0608f47b78bd6d75f8bb99/pkgs/development/libraries That is, the sort of cursed double-build in a single derivation which produces both the C++ binaries and a python wheel, so the `python3Packages` entry essentially turns into installing a wheel.	14:04:07
teto	are there differences between https://nix-community.cachix.org and https://cache.nixos-cuda.org? My goal is to gain access to cuda-enabled packages for unstable	14:24:20
connor (burnt/out) (UTC-8)	community cache is no longer being populated, use the latter	14:27:28
connor (burnt/out) (UTC-8)	* community cache is no longer being populated with CUDA packages, use the latter	14:27:35
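
A minimal sketch of pointing a NixOS machine at that cache, assuming a standard NixOS configuration. The signing key is not given in this discussion, so it is left as a placeholder to be copied from the cache's own documentation:

```nix
# NixOS configuration fragment; a sketch, not a complete configuration.
{
  nix.settings = {
    substituters = [ "https://cache.nixos-cuda.org" ];
    # Placeholder: replace with the cache's published signing key.
    trusted-public-keys = [ "<public key of cache.nixos-cuda.org>" ];
  };

  # Assumption: packages are built with CUDA enabled, which also
  # requires allowing unfree packages.
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
  };
}
```

Cache hits depend on evaluating the same nixpkgs revision and configuration that the cache was built from, so a local overlay or a different revision may still trigger local builds.
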