!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers


Sender | Message | Time
19 Nov 2024
@hexa:lossy.network hexa: sowwy 00:31:40
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
Should just work, what is the error?
Curl threw connection refused or something similar; I’ll try to get the log tomorrow
06:34:11
20 Nov 2024
@conroy:corncheese.org Conroy joined the room. 04:47:44
@connorbaker:matrix.org connor (he/him): I did not get a chance; rip 07:22:37
@damesberger:matrix.org Daniel joined the room. 18:53:01
22 Nov 2024
@deng23fdsafgea:matrix.org deng23fdsafgea joined the room. 06:27:37
@numinit:matrix.org Morgan (@numinit) joined the room. 17:52:10
24 Nov 2024
@sielicki:matrix.org sielicki: https://negativo17.org/nvidia-driver/ pretty good read 21:49:05
@sielicki:matrix.org sielicki: most of this is stuff that nixos gets right, but it's a nice collection of gotchas and solutions 22:01:49
@sielicki:matrix.org sielicki: anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency. 22:16:05
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency.
iirc we put it in there because if you set tensorflow = ...callPackage ... { cudaPackages = cudaPackages_XX_y; } you'll need to also pass a compatible nccl
22:17:33
@ss:someonex.net SomeoneSerge (back on matrix): so it's just easier to instantiate each cudaPackages variant with its own nccl and pass it along 22:17:55
@sielicki:matrix.org sielicki: I guess that's fair, and there is a pretty strong coupling of cuda versions and nccl versions... eg: https://github.com/pytorch/pytorch/pull/133593 has been stalled for some time due to nvidia dropping the pypi cu11 package for nccl, so there's reason to keep them consistent even if they technically release separately. 22:20:12
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
https://negativo17.org/nvidia-driver/ pretty good read
Any highlights, what we might be missing?
22:22:09
@sielicki:matrix.org sielicki: honestly I am not sure there's anything, I just like the thought that went into it 22:27:21
@sielicki:matrix.org sielicki: the special softdep for nvidia-uvm etc 22:27:48
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
the special softdep for nvidia-uvm etc
yeah we have that, and iirc a special-case for the datacenter driver where it's not a softdep anymore
22:28:24
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
the special softdep for nvidia-uvm etc
* yeah we have that, and iirc a special-case for the datacenter driver where it's not a softdep anymore (not sure what the exact situation is)
22:29:12
25 Nov 2024
@sielicki:matrix.org sielicki: is this useful? https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa 04:31:08
@sielicki:matrix.org sielicki: example of its output: https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa/24c08bb29f1397c7d006b01f7afddd5cb06e90a5 04:31:38
@connorbaker:matrix.org connor (he/him): You can see what I eventually hope to move in-tree here: https://github.com/ConnorBaker/cuda-packages
Here’s the update script I’ve made for the different redists: https://github.com/ConnorBaker/cuda-packages/tree/main/scripts/cuda-redist
07:01:12


Room Version: 9