!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

289 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers


Sender Message Time
12 Apr 2025
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their display name from oak to oak - mikatammi.fi ÄÄNESTÄ. 12:11:39
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their profile picture. 12:13:37
@oak:universumi.fi oak 🏳️‍🌈♥️ changed their display name from oak - mikatammi.fi ÄÄNESTÄ to oak - mikatammi.fi. 12:56:11
13 Apr 2025
@ereslibre:ereslibre.social ereslibre joined the room. 11:43:29
@ereslibre:ereslibre.social ereslibre

Hi everyone! I am looking at a bug we have with CDI (Container Device Interface, for forwarding GPUs to containers): https://github.com/NixOS/nixpkgs/issues/397065

I think the user has a correct configuration (unless there are settings that were not mentioned in the issue). My main question is why, when using the datacenter driver, nvidia-container-toolkit reports:

ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: ERROR_LIBRARY_NOT_FOUND

Do you have any idea why NVML would not be present in this environment?

11:45:34
@ss:someonex.net SomeoneSerge (back on matrix)

Hi! I have a small announcement to make.

I've been failing badly to keep up with the backlog as a maintainer, even though I'm recently able to spend some more time on Nixpkgs&c. Working in occasional 1:1 meetings, otoh, has always felt comparatively productive. We've just had another call with Gaétan Lepage and I find it was nice, so I now want to try the following: https://md.someonex.net/s/9S4E00sIb#

This is not exactly "official", I'm not posting this e.g. on Discourse until I'm more confident, but as such it's an open invitation.

14:52:27
@glepage:matrix.org Gaétan Lepage Indeed, it was great! We were able to finally finish fixing mistral-rs's CUDA support! 15:24:02
15 Apr 2025
@ereslibre:ereslibre.social ereslibre BTW folks, if you have a moment, I'd love to get this one merged: https://github.com/NixOS/nixpkgs/pull/367769 06:26:28
@ss:someonex.net SomeoneSerge (back on matrix) connor (he/him) (UTC-7): did you use something like josh for cuda-legacy? I suspect this produced at least a few pings 😅 13:35:27
@connorbaker:matrix.org connor (he/him) I used https://github.com/newren/git-filter-repo. What would have pinged people? 13:37:06
@ss:someonex.net SomeoneSerge (back on matrix) User handles in commit messages xD 13:37:28
@ereslibre:ereslibre.social ereslibre Hi! Given https://github.com/NixOS/nixpkgs/pull/362197 had conflicts recently due to the treewide formatting, I closed it and reopened it at https://github.com/NixOS/nixpkgs/pull/398993. I think we can merge this one too 21:24:34
@ereslibre:ereslibre.social ereslibre We have been going back and forth with the author for a while, and I thought it would be good to go ahead on our side 21:25:10
@ereslibre:ereslibre.social ereslibre Thanks! 21:29:32
17 Apr 2025
@luke-skywalker:matrix.org luke-skywalker joined the room. 09:38:30
@luke-skywalker:matrix.org luke-skywalker

Is this the right place to ask questions / get pointers on how to properly set up the CUDA container toolkit?

For Docker it seems to work when enabling the deprecated enableNvidia = true; flag. However, with nvidia-container-toolkit in systemPackages, with or without hardware.nvidia-container-toolkit.enable = true;, I cannot seem to get it to run...

11:01:34
@luke-skywalker:matrix.org luke-skywalker was not lucky at all with containerd for k3s 11:02:09
@luke-skywalker:matrix.org luke-skywalker for anybody stumbling over this: I'm pretty sure I'm on the right track using CDI, having it work with Docker (& Compose). Should have read the docs properly. The relevant section from the NixOS CUDA docs that got me here was all the way at the bottom: https://nixos.wiki/wiki/Nvidia#NVIDIA%20Docker%20not%20Working 14:38:50
@luke-skywalker:matrix.org luke-skywalker from all I understand, this gives a lot more flexibility to pass accelerators of different vendors to containerized workloads 🥳 14:39:36
@ss:someonex.net SomeoneSerge (back on matrix) Yes, CDI is the supported way (and has received a lot of care from @ereslibre); enableNvidia relies on end-of-life runtime wrappers 16:18:38
@ss:someonex.net SomeoneSerge (back on matrix)

Should have read the docs properly. The relevant section from

Did you manage to get containerd to work?

16:20:27
@ereslibre:ereslibre.social ereslibre +1, let us know if you run into any issues when enabling CDI :) 19:31:30
18 Apr 2025
@connorbaker:matrix.org connor (he/him) SomeoneSerge (UTC+U[-12,12]) I removed all the module system stuff from https://github.com/connorbaker/cuda-packages 11:24:48
@luke-skywalker:matrix.org luke-skywalker ereslibre: I got it to run with Docker but am still struggling to get it to run with containerd and k8s-device-plugin. 20:46:45
@ereslibre:ereslibre.social ereslibre
In reply to @luke-skywalker:matrix.org
ereslibre: I got it to run with Docker but am still struggling to get it to run with containerd and k8s-device-plugin.
Interesting. If you feel like it, please open an issue and we can follow up. I did not try to run CDI with either of those
20:48:38
20 Apr 2025
@ss:someonex.net SomeoneSerge (back on matrix) Updated https://github.com/NVIDIA/build-system-archive-import-examples/issues/5 to reflect preference for the .note.dlopen section over eager-loading 09:34:53
@techyporcupine:matrix.org @techyporcupine:matrix.org left the room. 18:15:53
21 Apr 2025
@luke-skywalker:matrix.org luke-skywalker Redacted or Malformed Event 13:54:54
@ss:someonex.net SomeoneSerge (back on matrix) @luke-skywalker:matrix.org: the moderation bot is configured to drop all media in NixOS spaces because there was a spam campaign disseminating CSAM matrix-wide. It's an unfortunate situation, but the mods don't really have any other tools at their disposal 19:48:15

Room Version: 9