!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

286 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



10 Apr 2025
@connorbaker:matrix.org connor (burnt/out) (UTC-8) x86 has a cuda_compat library too, from what I remember; it’s just not available as a redist
So maybe we shouldn’t package the one for Jetsons
And instead, nixglhost should use the one on the host system if it is available
16:36:17
@connorbaker:matrix.org connor (burnt/out) (UTC-8) * Although that won’t help us on NixOS systems: cuda_compat is usually provided as a Debian package with newer releases of CUDA, so it would just fail to run on NixOS systems if the driver isn’t new enough 16:37:48
@ss:someonex.net SomeoneSerge (back on matrix)

So maybe we shouldn’t package the one for Jetsons

No, I think whenever it's available we'd rather do the pure linking, because that's what we do to other libraries. This is in general a tradeoff, and it would have been great if we had tools for quickly relinking stuff/tools for building stuff against reproducible content-addressed stubs with a separate linking phase, but that's not where we are

16:39:50
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Ugh
So on all platforms, we should only use cuda_compat if the host driver is old and we need forward compat
I guess the question is where cuda_compat should come from, if the decision to use it or not requires knowing what version the host driver is
16:40:10
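
The rule stated above (use cuda_compat only when the host driver is too old for the CUDA release the application was built against) can be sketched roughly as follows. This is a minimal illustration, not how Nixpkgs or nixglhost decide anything today. The function names are hypothetical, and the sketch ignores minor-version compatibility within a major release as well as any restrictions on which GPUs and drivers the compat package supports.

```python
from typing import Tuple

Version = Tuple[int, int]


def decode_cuda_version(encoded: int) -> Version:
    """Decode the integer form used by the CUDA APIs (1000 * major + 10 * minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def needs_forward_compat(driver_max: Version, built_against: Version) -> bool:
    """Hypothetical helper: True when the application targets a newer CUDA
    release than the host driver supports, i.e. the only case in which
    cuda_compat would be wanted under the rule discussed above."""
    return built_against > driver_max


if __name__ == "__main__":
    driver = decode_cuda_version(12020)   # a driver that supports up to CUDA 12.2
    toolkit = decode_cuda_version(12040)  # an application built against CUDA 12.4
    print(needs_forward_compat(driver, toolkit))
```

The catch raised in the conversation still applies: the driver version is only known at runtime, so this check cannot be evaluated at build time, when the runpath is fixed.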
@ss:someonex.net SomeoneSerge (back on matrix) This is not different from the GL/Vulkan situation 16:41:29
@connorbaker:matrix.org connor (burnt/out) (UTC-8) ("Where it should come from" meaning either Nixpkgs and the runpath, or the host OS. The latter is a non-starter on NixOS systems since we don’t package it. We could, but then people would need to rebuild to add it to their environment, ugh) 16:41:35
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Oh? What’s that situation? 16:42:43
@ss:someonex.net SomeoneSerge (back on matrix) The situation is we'd like to develop and link a dynamic shim (libglvnd-like) that can select the right thing at runtime (per the logic you wrote down) 16:44:07
@ss:someonex.net SomeoneSerge (back on matrix) * Nixpkgs breaks GL/Vulkan on NixOS when mixing revisions because we don't have this shim 16:45:23
@ss:someonex.net SomeoneSerge (back on matrix) Or, maybe, instead of the selection logic we need better isolation, i.e. libcapsule 16:49:41
@ss:someonex.net SomeoneSerge (back on matrix) In either case we need to learn to mimic another library's interface. This exists for GL, and afaict we (NixOS) still don't know whether it actually works. I've no idea if we can do this to libcuda or libcudart, or whether that would even be legal 16:51:01
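
To make the "select the right thing at runtime" idea concrete, here is a rough sketch of only the selection step such a shim would need. It is an assumption-laden illustration, not an existing tool: `select_libcuda` and `COMPAT_DIR` are hypothetical, `/run/opengl-driver/lib` is the usual NixOS driver location, and a real shim would also have to forward the library's interface, which is the hard part raised above.

```python
import os
from typing import Callable, Optional

HOST_DRIVER_DIR = "/run/opengl-driver/lib"  # where NixOS exposes the host driver libraries
COMPAT_DIR = "/example/cuda_compat/lib"     # hypothetical location of a packaged cuda_compat


def select_libcuda(
    prefer_compat: bool,
    host_dir: str = HOST_DRIVER_DIR,
    compat_dir: str = COMPAT_DIR,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the path of the libcuda.so.1 to load, or None if none is found.

    When prefer_compat is set, the compat copy is tried first and the host
    driver's copy is the fallback; otherwise only the host copy is considered.
    """
    search_order = [compat_dir, host_dir] if prefer_compat else [host_dir]
    for directory in search_order:
        candidate = os.path.join(directory, "libcuda.so.1")
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(select_libcuda(prefer_compat=False))
```

The `exists` parameter is injectable only so the ordering can be exercised without a GPU machine.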
11 Apr 2025
@saeedc:matrix.org saeedc joined the room. 20:55:59
12 Apr 2025
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their display name from oak to oak - mikatammi.fi ÄÄNESTÄ. 12:11:39
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their profile picture. 12:13:37
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their display name from oak - mikatammi.fi ÄÄNESTÄ to oak - mikatammi.fi. 12:56:11
13 Apr 2025
@ereslibre:ereslibre.social ereslibre joined the room. 11:43:29
@ereslibre:ereslibre.social ereslibre

Hi everyone! I am looking at a bug we have with CDI (Container Device Interface, for forwarding GPUs to containers): https://github.com/NixOS/nixpkgs/issues/397065

I think the user has a correct configuration (unless there are settings that were not mentioned in the issue). My main question is why, when using the datacenter driver, nvidia-container-toolkit reports:

ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: ERROR_LIBRARY_NOT_FOUND

Do you have any idea why NVML would not be present in this environment?

11:45:34
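
One quick way to narrow down an `ERROR_LIBRARY_NOT_FOUND` like the one quoted above is to check whether the NVML shared library can be loaded at all from the environment the service runs in. The following is a small diagnostic sketch, not part of nvidia-container-toolkit. It assumes the library is looked up under its usual soname `libnvidia-ml.so.1`; `can_load` is a hypothetical helper.

```python
import ctypes

NVML_SONAME = "libnvidia-ml.so.1"  # usual soname of the NVML library


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and load the given library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing from the loader's search path
        # or cannot be loaded for another reason.
        return False
    return True


if __name__ == "__main__":
    print(f"{NVML_SONAME} loadable: {can_load(NVML_SONAME)}")
```

If this reports that the library is not loadable inside the failing unit's environment but is loadable from an interactive shell, that would point to a search-path difference rather than a missing driver.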
@ss:someonex.net SomeoneSerge (back on matrix)

Hi! I've a small announcement to make.

I've been failing badly to keep up with the backlog as a maintainer, even though I've recently been able to spend some more time on Nixpkgs&c. Working in occasional 1:1 meetings, otoh, has always felt comparatively productive. We've just had another call with Gaétan Lepage and I think it went well, so I now want to try the following: https://md.someonex.net/s/9S4E00sIb#

This is not exactly "official", I'm not posting this e.g. on Discourse until I'm more confident, but as such it's an open invitation.

14:52:27
@glepage:matrix.org Gaétan Lepage * Indeed, it was great! We were able to finally finish fixing mistral-rs's CUDA support! 15:24:22
15 Apr 2025
@ereslibre:ereslibre.social ereslibre BTW folks, if you have a moment, I'd love to get this one merged: https://github.com/NixOS/nixpkgs/pull/367769 06:26:28



Room Version: 9