!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

288 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



12 Feb 2025
@connorbaker:matrix.org connor (he/him): All, I’m excited for our meeting tomorrow! Do we have a document with an agenda or minimum set of items to cover? 15:33:17
@ss:someonex.net SomeoneSerge (back on matrix): Me too! Just started one: https://pad.lassul.us/YGyymxE9Qqy9iFVt7A2VnA#, everyone feel free to edit 20:09:57
13 Feb 2025
@connorbaker:matrix.org connor (he/him) changed their display name from connor (he/him) (UTC-7) to connor (he/him) (UTC-8). 06:59:16
@ss:someonex.net SomeoneSerge (back on matrix): Matrix calls did work last time, didn't they? 07:51:04
@ss:someonex.net SomeoneSerge (back on matrix): * Matrix call did work last time, didn't it? 07:51:09
@ss:someonex.net SomeoneSerge (back on matrix): Still haven't figured out what broke in Jitsi 07:51:29
@ss:someonex.net SomeoneSerge (back on matrix): This starts in 50 minutes 13:11:36
@ss:someonex.net SomeoneSerge (back on matrix): Let's try this, I suppose 13:55:54
Jitsi widget added by @ss:someonex.net SomeoneSerge (back on matrix) 13:56:07
@ss:someonex.net SomeoneSerge (back on matrix): (I wonder if this sends room-wide notifications 🤔) 13:56:28
@connorbaker:matrix.org connor (he/him): I didn't see a notification, but I did see it pop up in the chat 13:56:49
@palasso:matrix.org @palasso:matrix.org: It does. I got a notification. 14:02:01
@srhb:matrix.org srhb: Yup :D 14:02:10
@connorbaker:matrix.org connor (he/him):

Gaétan Lepage: I've got the manifest for cusparseLT here: https://github.com/ConnorBaker/cuda-packages/blob/main/modules/redists/cusparselt/manifests/0.6.3.json

I think with that you should be able to construct a Nix expression that manually calls redist-builder (or whatever I called it upstream) with the proper arguments.

15:24:04
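For context on the manifest mentioned above: NVIDIA's redistributable manifests are JSON files mapping package names to per-platform entries, each with a relative tarball path and checksums. Assuming the linked cusparseLT manifest follows that layout (not verified here), a minimal Python sketch of pulling out what a builder would need per platform could look like this. The sample data is made up and the key names (`relative_path`, `sha256`, `linux-x86_64`) are assumptions about the format.

```python
import json

# Hypothetical manifest excerpt: key names mimic NVIDIA's redistrib_*.json layout,
# but every value here is a placeholder, not real release data.
SAMPLE_MANIFEST = """
{
  "release_date": "YYYY-MM-DD",
  "libcusparse_lt": {
    "name": "NVIDIA cuSPARSELt",
    "version": "0.6.3",
    "linux-x86_64": {
      "relative_path": "libcusparse_lt/linux-x86_64/example.tar.xz",
      "sha256": "0000"
    }
  }
}
"""


def platform_sources(manifest: dict) -> dict:
    """Return {(package, platform): (relative_path, sha256)} for every tarball entry."""
    sources = {}
    for pkg, entry in manifest.items():
        if not isinstance(entry, dict):
            continue  # skip top-level scalars such as release_date
        for platform, info in entry.items():
            # Per-platform entries are the dict-valued keys that carry a tarball path.
            if isinstance(info, dict) and "relative_path" in info:
                sources[(pkg, platform)] = (info["relative_path"], info.get("sha256"))
    return sources


if __name__ == "__main__":
    for (pkg, platform), (path, sha) in platform_sources(json.loads(SAMPLE_MANIFEST)).items():
        print(pkg, platform, path, sha)
```

The resulting (path, hash) pairs are the kind of arguments a fetcher-plus-builder expression would consume; how redist-builder actually takes them is not shown in this conversation.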
@ss:someonex.net SomeoneSerge (back on matrix): Meeting notes: https://pad.lassul.us/YGyymxE9Qqy9iFVt7A2VnA?both#Conclusion. Some intermediate conversations are missing right now, but they were recorded by Connor; hopefully he can fill in the blanks when he's free. 15:30:22
@connorbaker:matrix.org connor (he/him): Just pasting the last of them now 15:30:46
@ss:someonex.net SomeoneSerge (back on matrix):

Regarding scheduling the future meetings,

  • we should probably aim to meet in 2-4 weeks to follow up on the patchelf exception and for a report on the ephemeral builders situation;
  • we can probably first bring up the alignment questions with nix-community just in their chat, without video because async is faster;
  • additionally, I think I should have some hours this week and next to sort the backlog as mentioned in the notes; I think it'd still be useful, for onboarding new people, to do that with audio and screenshare, but it's not worth synchronizing people's schedules for this; maybe it'll be just a pop-in format?
15:38:04
@ss:someonex.net SomeoneSerge (back on matrix): (haha, maybe we do this on Gaetan's Twitch?) 15:38:26
@glepage:matrix.org Gaétan Lepage: Sure haha 15:46:34
14 Feb 2025
@connorbaker:matrix.org connor (he/him): As of a few days ago, Onnxruntime requires CUDA separable compilation… so I guess I gotta fix that now 🙃 01:50:24
@ss:someonex.net SomeoneSerge (back on matrix):

RE: CI infra/yesterday's meeting
CC connor (he/him) (UTC-8):

By the way, while on my side I'm advertising both options for provisioning hardware (spot instances and owned hardware), I think we might want to incentivize companies to commit to supporting the latter path. While it's obviously more work, both organisational and engineering, it is a much better long-term promise for the community. With rented hardware, if two or three companies simultaneously decide to withdraw, we basically have to scale down the CI immediately. If we buy hardware for a non-profit and a few years later some companies decide they're no longer interested, we maybe lose a retainer covering the maintenance work. With our own hardware we can also be more flexible and maybe dedicate some machines to be used as community builders/devboxes for ad hoc experimentation.

11:15:16
@zopieux:matrix.zopi.eu zopieux:

It's me again :-) This time I'm seeing genuinely surprising behavior from the community cache (the substituters are correctly configured): nccl was successfully built (derivation mv02…) and the narinfo is available, but when I run nix-shell -p cudaPackages_12.nccl I get

this derivation will be built:
  /nix/store/mv02rgvrhw9n1682dw7vs8w3pssc24lr-nccl-2.21.5-1.drv
(lots of compiling)

Others, like cudaPackages.cudnn, are successfully retrieved from the cache.

17:58:45
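Background that may help when checking this kind of cache miss: a Nix binary cache answers substitution queries at `<cache>/<hash>.narinfo`, where `<hash>` is the 32-character hash prefix of the store path being substituted. The hash in a `.drv` path is not the hash of that derivation's outputs, so it may be worth confirming that the narinfo found is for the nccl output path (which `nix-store --query --outputs` prints for a given `.drv`) rather than for the `.drv` itself. This is only one thing to rule out, not a diagnosis. A minimal Python sketch of the URL construction; the cache URL is a placeholder.

```python
from pathlib import PurePosixPath


def narinfo_url(cache_url: str, store_path: str) -> str:
    """Build the .narinfo URL a Nix binary cache serves for a given store path."""
    name = PurePosixPath(store_path).name  # "<hash>-<name>", e.g. "<hash>-nccl-2.21.5-1.drv"
    hash_part = name.split("-", 1)[0]      # Nix store hashes are 32 base-32 characters
    if len(hash_part) != 32:
        raise ValueError(f"not a Nix store path: {store_path}")
    return f"{cache_url.rstrip('/')}/{hash_part}.narinfo"


# Placeholder cache URL; the .drv path is the one quoted in the message above.
drv = "/nix/store/mv02rgvrhw9n1682dw7vs8w3pssc24lr-nccl-2.21.5-1.drv"
print(narinfo_url("https://cache.example.invalid/", drv))
```

Feeding this the output path instead of the `.drv` path gives the URL that substitution actually depends on.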
@ruroruro:matrix.org ruro:

So, uh... I just noticed that CUDA versions prior to 11.4 don't have the individual redistributables (for example, there is no cudaPackages_11_3.cuda_cudart).

Unfortunately, I only noticed this after refactoring cuda-samples to use the individual packages instead of cudatoolkit. sigh

21:12:48
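ruro's observation amounts to a version cutoff: the individual redistributables (e.g. cuda_cudart) exist from CUDA 11.4 onward, so anything that must also support older releases needs a fallback to cudatoolkit. A hypothetical Python helper, assuming versions of the form `MAJOR.MINOR[.PATCH]`, makes the cutoff explicit; inside nixpkgs the equivalent check would more naturally be written in Nix with `lib.versionAtLeast`.

```python
def has_split_redists(cuda_version: str) -> bool:
    """Hypothetical helper: True for CUDA 11.4 and newer, which ship individual redistributables."""
    # Compare only MAJOR.MINOR numerically; expects at least "MAJOR.MINOR".
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= (11, 4)


for version in ("11.3", "11.4", "12.8"):
    print(version, "split packages" if has_split_redists(version) else "cudatoolkit")
```

Comparing integer tuples rather than strings avoids the classic mistake where "11.10" would sort before "11.4".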
15 Feb 2025
@zowoq:matrix.org zowoq joined the room. 00:48:50
@zowoq:matrix.org zowoq:

> we can probably first bring up the alignment questions with nix-community just in their chat

We could do it here if you like. I think that between Jonas Chevalier and me we can represent nix-community, and the discussion is probably of more interest to the people in this room; we can post a summary in the nix-community Matrix room.

00:49:34
@zowoq:matrix.org zowoq: https://github.com/NixOS/rfcs/pull/185 I discovered this RFC a day ago; I don't think it has been mentioned here yet? 00:49:48



Room Version: 9