!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

293 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



Sender Message Time
18 Nov 2024
@ss:someonex.net SomeoneSerge (back on matrix)
You should chat with picnoir too
12:20:44
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
Unrelated -- if anyone has experience with NixOS VM tests and getting multiple nodes to talk to each other, I'd appreciate pointers. ping can resolve hostnames but curl can't for some reason (https://github.com/ConnorBaker/nix-eval-graph/commit/c5a1e2268ead6ff6ffaab672762c1eedee53f403).
Should just work, what is the error?
12:22:30
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
True. I'm still yet to read up on how SLURM and friends do this. Shameless plug: https://github.com/sinanmohd/evanix (slides)
Woah! Thanks for the links, I wasn't aware of these
20:17:47
19 Nov 2024
@hexa:lossy.network hexa
python-updates with numpy 2.1 has landed in staging
00:31:36
@hexa:lossy.network hexa
sowwy
00:31:40
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net
Should just work, what is the error?
Curl threw connection refused or something similar; I’ll try to get the log tomorrow
06:34:11
20 Nov 2024
@conroy:corncheese.org Conroy joined the room.
04:47:44
@connorbaker:matrix.org connor (he/him)
I did not get a chance; rip
07:22:37
@damesberger:matrix.org Daniel joined the room.
18:53:01
22 Nov 2024
@deng23fdsafgea:matrix.org deng23fdsafgea joined the room.
06:27:37
@numinit:matrix.org Morgan (@numinit) joined the room.
17:52:10
24 Nov 2024
@sielicki:matrix.org sielicki
https://negativo17.org/nvidia-driver/ pretty good read
21:49:05
@sielicki:matrix.org sielicki
most of this is stuff that nixos gets right, but it's a nice collection of gotchas and solutions
22:01:49
@sielicki:matrix.org sielicki
anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency.
22:16:05
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency.
iirc we put it in there because if you set tensorflow = ...callPackage ... { cudaPackages = cudaPackages_XX_y; } you'll need to also pass a compatible nccl
22:17:33
@ss:someonex.net SomeoneSerge (back on matrix)
so it's just easier to instantiate each cudaPackages variant with its own nccl and pass it along
22:17:55
@sielicki:matrix.org sielicki
I guess that's fair, and there is a pretty strong coupling of cuda versions and nccl versions... eg: https://github.com/pytorch/pytorch/pull/133593 has been stalled for some time due to nvidia dropping the pypi cu11 package for nccl, so there's reason to keep them consistent even if they technically release separately.
22:20:12
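
[Editor's note] The argument in the exchange above, that keeping nccl inside each cudaPackages set means a consumer only has to override one argument, can be illustrated with a toy Nix expression. This is a minimal sketch and not the nixpkgs implementation: mkCudaPackages, tensorflowLike, the attribute names, and the version strings are all hypothetical, and plain strings stand in for real derivations.

```nix
# Toy model only: every name and version below is made up for illustration.
let
  # Build one CUDA package set. nccl is defined inside the set, so it is
  # always "built" against the cudatoolkit of that same set.
  mkCudaPackages = cudaVersion: rec {
    inherit cudaVersion;
    cudatoolkit = "cudatoolkit-${cudaVersion}";
    nccl = "nccl-built-against-${cudatoolkit}";
  };

  # Two illustrative variants, standing in for cudaPackages_XX_y.
  cudaPackages_A = mkCudaPackages "11.8";
  cudaPackages_B = mkCudaPackages "12.4";

  # A consumer that takes the whole set as a single argument. Swapping
  # cudaPackages swaps the toolkit and nccl together, so the two cannot
  # end up mismatched.
  tensorflowLike = { cudaPackages }: {
    inputs = [ cudaPackages.cudatoolkit cudaPackages.nccl ];
  };
in
{
  default = tensorflowLike { cudaPackages = cudaPackages_B; };
  pinned = tensorflowLike { cudaPackages = cudaPackages_A; };
}
```

If nccl lived outside the set, the consumer would need a second nccl argument, and every override of cudaPackages would have to remember to pass a matching nccl as well, which is the situation SomeoneSerge describes wanting to avoid.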



Room Version: 9