!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

17 Nov 2025
@ss:someonex.net SomeoneSerge (back on matrix)
Think I caught a touch of a cold, sorry
19:48:43
@sporeray:matrix.org Robbie Buxton
In reply to @bjth:matrix.org

How would you go about conditionally setting cudaCapabilities when instantiating nixpkgs? I.e.

Imagine I have this.

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-25.05";
  };

  outputs = { self, nixpkgs }: {
    packages.x86_64-linux.default = let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        overlays = [ ];
        config = {
          allowUnfree = true;
          cudaSupport = true;
          cudaCapabilities = [ "..." "..." ];
        };
      };
    in
    pkgs.hello;

    packages.aarch64-linux.default = let
      pkgs = import nixpkgs {
        system = "aarch64-linux";
        overlays = [ ];
        config = {
          allowUnfree = true;
          cudaSupport = true;
          # isJetson is a placeholder; nothing in this flake defines it yet.
          cudaCapabilities = if isJetson then [ "..." "..." ] else [ "..." "..." ];
        };
      };
    in
    pkgs.hello;
  };
}

It's the aarch64-linux part specifically that I'm a bit stuck on. I have some cloud servers with NVIDIA GPUs in them that run aarch64-linux, but I also have some Jetson devices that are also considered aarch64-linux.

And if I understand the whole thing correctly, I can't just set the cudaCapabilities list to include both the non-Jetson and Jetson capabilities, right? Or at least, then isJetsonBuild would just always eval to true even if the build was meant for the cloud server.

Probably something stupid I'm just overlooking, sorry for bothering. 😅

Aarch based nvidia data center gpus 👀, yeah if you get the correct map of the cuda capabilities it should work fine
19:58:02
@sporeray:matrix.org Robbie Buxton *

Edit: misread, isJetsonBuild sounds funky so not sure

20:01:07
18 Nov 2025
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
isJetsonBuild and the like are set by cudaCapabilities. Jetson capabilities aren’t included by default because they’re niche architectures and prior to Thor needed separate binaries. If you just need to support Thor you can specify that capability with other ones. If you need to support Orin or Xavier there’s no clean way to do it. Like Serge said, they’re effectively different platforms but Nixpkgs doesn’t have a notion of accelerators and so has no way to differentiate. The only way we can tell in Nixpkgs is whether the Jetson capabilities are explicitly provided.
06:26:45
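A minimal sketch of one way to act on this answer: since the capability list is the only signal Nixpkgs has, instantiate nixpkgs once per target and expose the Jetson build as a separate output. The helper name mkCudaPkgs, the output name jetson, and the capability values ("9.0" for an aarch64 data center GPU, "8.7" for Orin) are illustrative assumptions, not something stated in the thread; substitute the values for your hardware.

```nix
{
  inputs.nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-25.05";

  outputs = { self, nixpkgs }:
    let
      # Hypothetical helper: one nixpkgs instance per capability set.
      mkCudaPkgs = system: cudaCapabilities: import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaSupport = true;
          inherit cudaCapabilities;
        };
      };

      # Assumed capability values, for illustration only.
      serverPkgs = mkCudaPkgs "aarch64-linux" [ "9.0" ];
      jetsonPkgs = mkCudaPkgs "aarch64-linux" [ "8.7" ];
    in
    {
      packages.aarch64-linux = {
        # Cloud servers: no Jetson capability in the list.
        default = serverPkgs.hello;
        # Jetson devices: only Jetson capabilities in the list.
        jetson = jetsonPkgs.hello;
      };
    };
}
```

Each output is then selected explicitly, e.g. `nix build .#packages.aarch64-linux.jetson`, so only the instance that was given Jetson capabilities is treated as a Jetson build.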
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Would appreciate if someone could review https://github.com/NixOS/nixpkgs/pull/462761
06:36:03
@ss:someonex.net SomeoneSerge (back on matrix)
Gaétan Lepage: not quite a morning slot, but wdyt about 21:15 Paris for the weekly?
14:13:14
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
I should be able to attend too
16:00:11
@glepage:matrix.org Gaétan Lepage
Way better for me.
16:14:49
19 Nov 2025
@eymeric:onyx.ovh Eymeric joined the room. 12:59:28
@jfly:matrix.org Jeremy Fleischman (jfly) joined the room. 18:13:28
@jfly:matrix.org Jeremy Fleischman (jfly)

i'm confused about the compatibility story between whatever libcuda.so file i have in /run/opengl-driver and my nvidia kernel module. i've read through <nixos/modules/hardware/video/nvidia.nix> and i see that hardware.graphics.extraPackages basically gets set to pkgs.linuxKernel.packages.linux_6_12.nvidiaPackages.stable.out (or whatever kernel i have selected)

how much drift (if any) is allowed here?

18:18:44
@jfly:matrix.org Jeremy Fleischman (jfly)
to avoid an XY problem: what i'm actually doing is experimenting with defining systemd nixos containers that run cuda software internally, and i'm not sure how to get the right libcuda.so's in those containers so they play nicely with the host's kernel
18:21:46
@jfly:matrix.org Jeremy Fleischman (jfly)
if the answer is "just keep them perfectly in sync with the host kernel's version", that's OK. just trying to flesh out my mental model
18:22:27
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
libcuda.so is provided by the NVIDIA CUDA driver, which for our purposes is generally part of the NVIDIA driver for your GPU.
Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have libcuda.so available.
The CDI stuff providing GPU access in containers provides /run/opengl-driver/lib (among other things): https://github.com/NixOS/nixpkgs/blob/6c634f7efae329841baeed19cdb6a8c2fc801ba1/nixos/modules/services/hardware/nvidia-container-toolkit/default.nix#L234-L237
General information about forward-backward compat is in NVIDIA's docs here: https://docs.nvidia.com/deploy/cuda-compatibility/#
18:31:45
@sporeray:matrix.org Robbie Buxton
In reply to @jfly:matrix.org
to avoid an XY problem: what i'm actually doing is experimenting with defining systemd nixos containers that run cuda software internally, and i'm not sure how to get the right libcuda.so's in those containers so they play nicely with the host's kernel
If you run the host system's cuda kernel drivers ahead of the user mode drivers it’s normally fine provided it’s not a major version change (i.e. 13 vs 12)
18:35:26
@jfly:matrix.org Jeremy Fleischman (jfly)

Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have libcuda.so available.

afaik, they do not automatically do anything (please correct me if i'm wrong). i'm making them get their own libcuda.so by explicitly configuring them with hardware.graphics.enable = true; and hardware.graphics.extraPackages.

mounting the cuda runtime from the host makes sense, though! thanks for the link to this nvidia-container-toolkit

18:39:03
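A rough sketch of the "keep them in sync" variant described in this message, assuming a declarative NixOS container defined in the host's configuration. The container name cuda-box and the list of device nodes are assumptions for illustration; the idea is only that the container reuses the exact driver package the host's kernel module comes from, so libcuda.so cannot drift from the host within one system generation.

```nix
# Host configuration module (sketch, not a tested setup).
{ config, ... }:
let
  # The driver package selected for the host; its kernel module and
  # libcuda.so come from the same release.
  hostDriver = config.hardware.nvidia.package;
in
{
  containers.cuda-box = {  # hypothetical container name
    autoStart = true;

    # Device access still has to be granted separately.
    # This node list is an assumption; adjust it to the host's devices.
    allowedDevices = [
      { node = "/dev/nvidiactl"; modifier = "rw"; }
      { node = "/dev/nvidia0"; modifier = "rw"; }
      { node = "/dev/nvidia-uvm"; modifier = "rw"; }
    ];
    bindMounts = {
      "/dev/nvidiactl" = { hostPath = "/dev/nvidiactl"; isReadOnly = false; };
      "/dev/nvidia0" = { hostPath = "/dev/nvidia0"; isReadOnly = false; };
      "/dev/nvidia-uvm" = { hostPath = "/dev/nvidia-uvm"; isReadOnly = false; };
    };

    config = { ... }: {
      # Populates /run/opengl-driver inside the container.
      hardware.graphics.enable = true;
      hardware.graphics.extraPackages = [ hostDriver.out ];
    };
  };
}
```

Because hostDriver is taken from the host's own configuration, a driver bump on the host changes the container's libcuda.so in the same rebuild. Whether the running kernel module matches after a switch without a reboot is a separate question.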
@lt1379:matrix.org Lun
What's the current best practice / future plans for impure GPU tests? Is the discussion in https://github.com/NixOS/nixpkgs/issues/225912 up to date? cc SomeoneSerge (back on matrix)
18:43:23
@ss:someonex.net SomeoneSerge (back on matrix)

Do the systemd NixOS containers provide their own copy of NVIDIA's driver? If not, they wouldn't have libcuda.so available.

They don't (unless forced). Libcuda and its closure are mounted from the host.

20:10:33
@ss:someonex.net SomeoneSerge (back on matrix) *

The issue is maybe growing stale, but I'd say there haven't been any fundamental updates.

  • One bit it doesn't mention is that we rewrote most of the tests in terms of a single primitive, cudaPackages.writeGpuTestPython (can be overridden for e.g. rocm; could be moved outside cuda-modules).
  • It's now also clear that the VM tests can also be done, we'd just have to use a separate marker to signal that a builder exposes an nvidia device with a vfio driver.
  • If we replace the sandboxing mechanism (e.g. with microvms) it'll get trickier... but again, a low-bandwidth baseline with vfio is definitely achievable.
  • And there's still the issue of describing constraints, like listing the architectures or like memory quotas: we need a pluggable mechanism for assessing which builders are compatible with the derivation? Maybe a proxy instead...
20:37:53
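The "marker" idea above can be pictured with Nix's requiredSystemFeatures attribute: a derivation lists features, and only builders that advertise them (via system-features in nix.conf) will accept the build. A minimal sketch, assuming "cuda" is the feature name used for today's impure GPU tests. The derivation name is made up, and a VM-based variant would need its own marker (something like a hypothetical "nvidia-vfio"), which is not an existing convention.

```nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.runCommand "gpu-marker-demo"  # hypothetical name
  {
    # Only builders advertising this feature will run the build.
    # "cuda" is assumed here; a vfio-backed builder would advertise
    # a different, not yet defined, feature.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
    # Placeholder body: a real test would run the GPU workload here.
    touch $out
  ''
```

This mechanism only matches feature names, which is why constraints such as supported architectures or memory quotas do not fit into it directly.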
@ss:someonex.net SomeoneSerge (back on matrix)
Also note that we still mount libcuda from /run/current-system instead of /run/booted-system...
20:39:08
@jfly:matrix.org Jeremy Fleischman (jfly)
Ah that sort of sounds like a bug since we'd want to be compatible with the host kernel?
21:28:58
@apyh:matrix.org apyh
yeah, current system means that updating nvidia drivers with a rebuild switch breaks all CUDA until a reboot
21:34:12
@apyh:matrix.org apyh
(experience this semi-frequently)
21:34:20
20 Nov 2025
@user12592851:matrix.org John joined the room. 05:54:29
@ser:sergevictor.eu ser(ial)
i have a Debian host with nvidia gpu which runs incus and in incus i have nixos containers. how can i utilise cuda programs in such a container?
10:24:20
@plan9better:matrix.org plan9better joined the room. 12:41:04
@ss:someonex.net SomeoneSerge (back on matrix)
Hi. How do you use cuda in a non-NixOS container with Incus? Does it use CDI?
13:22:58
@ser:sergevictor.eu ser(ial)
with debian container i use built-in incus "nvidia.runtime" which passes the host NVIDIA and CUDA runtime libraries into the instance
13:30:32
@ser:sergevictor.eu ser(ial)
but nixos naturally does not look for these libraries in that place
13:31:15
