NixOS CUDA - Public Room Timeline

	NixOS CUDA	282 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	58 Servers

Sender	Message	Time
18 Nov 2024
connor (he/him)	In reply to @ss:someonex.net: "True. I'm still yet to read up on how SLURM and friends do this. Shameless plug: https://github.com/sinanmohd/evanix (slides)" Woah! Thanks for the links, I wasn't aware of these	20:17:47
19 Nov 2024
hexa	python-updates with numpy 2.1 has landed in staging	00:31:36
hexa	sowwy	00:31:40
connor (he/him)	In reply to @ss:someonex.net: "Should just work, what is the error?" Curl threw connection refused or something similar; I’ll try to get the log tomorrow	06:34:11
20 Nov 2024
	Conroy joined the room.	04:47:44
connor (he/him)	I did not get a chance; rip	07:22:37
	Daniel joined the room.	18:53:01
22 Nov 2024
	deng23fdsafgea joined the room.	06:27:37
	Morgan (@numinit) joined the room.	17:52:10
24 Nov 2024
sielicki	https://negativo17.org/nvidia-driver/ pretty good read	21:49:05
sielicki	most of this is stuff that nixos gets right, but it's a nice collection of gotchas and solutions	22:01:49
sielicki	anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency.	22:16:05
SomeoneSerge (back on matrix)	In reply to @sielicki:matrix.org: "anyone have strong opinions on moving nccl and nccl-tests out of cudaModules? Rationale on moving them out: neither one is distributed as a part of the cuda toolkit and they release on an entirely separate cadence, so there's no real reason for it to be in there. It's no different than eg: torch in terms of the cuda dependency." iirc we put it in there because if you set `tensorflow = ...callPackage ... { cudaPackages = cudaPackages_XX_y; }` you'll need to also pass a compatible nccl	22:17:33
SomeoneSerge (back on matrix)	so it's just easier to instantiate each `cudaPackages` variant with its own nccl and pass it along	22:17:55
sielicki	I guess that's fair, and there is a pretty strong coupling of cuda versions and nccl versions... eg: https://github.com/pytorch/pytorch/pull/133593 has been stalled for some time due to nvidia dropping the pypi cu11 package for nccl, so there's reason to keep them consistent even if they technically release separately.	22:20:12
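The arrangement SomeoneSerge describes above, where each `cudaPackages` variant is instantiated with its own nccl and passed along as a unit, can be illustrated with a toy Nix expression. This is only a minimal sketch and not nixpkgs code: `mkCudaPackages`, `cudaPackages_A`, `cudaPackages_B`, the version strings, and the stand-in `tensorflow` function are all hypothetical, and packages are modelled as plain attribute sets.

```nix
let
  # Hypothetical constructor: builds one toy "CUDA package set" whose nccl
  # is tied to that set's CUDA version.
  mkCudaPackages = cudaVersion: {
    inherit cudaVersion;
    nccl = {
      pname = "nccl";
      builtAgainstCuda = cudaVersion;
    };
  };

  # Two hypothetical variants; the version strings are arbitrary examples.
  cudaPackages_A = mkCudaPackages "11.8";
  cudaPackages_B = mkCudaPackages "12.4";

  # Stand-in for a CUDA consumer. It takes the whole package set, so the
  # toolkit version and nccl cannot be mixed up by the caller.
  tensorflow = { cudaPackages }: {
    pname = "tensorflow";
    cuda = cudaPackages.cudaVersion;
    nccl = cudaPackages.nccl;
  };

  tfA = tensorflow { cudaPackages = cudaPackages_A; };
  tfB = tensorflow { cudaPackages = cudaPackages_B; };
in
{
  inherit tfA tfB;
  # Each consumer's nccl matches the CUDA version it was given.
  consistent =
    tfA.nccl.builtAgainstCuda == tfA.cuda
    && tfB.nccl.builtAgainstCuda == tfB.cuda;
}
```

Saving this to a file and evaluating it with `nix-instantiate --eval --strict` should report `consistent = true`. If nccl lived outside the set, the caller would have to pass a matching nccl by hand for every variant, which is the extra step mentioned in the message above.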
SomeoneSerge (back on matrix)	In reply to @sielicki:matrix.org: "https://negativo17.org/nvidia-driver/ pretty good read" Any highlights, what we might be missing?	22:22:09
sielicki	honestly I am not sure there's anything, I just like the thought that went into it	22:27:21
sielicki	the special softdep for nvidia-uvm etc	22:27:48
SomeoneSerge (back on matrix)	In reply to @sielicki:matrix.org: "the special softdep for nvidia-uvm etc" yeah we have that, and iirc a special-case for the datacenter driver where it's not a softdep anymore (not sure what the exact situation is)	22:28:24 (edited 22:29:12)
25 Nov 2024
sielicki	is this useful? https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa	04:31:08
sielicki	example of its output: https://gist.github.com/sielicki/2601de3ad8d8c732af80b12e36d326aa/24c08bb29f1397c7d006b01f7afddd5cb06e90a5	04:31:38
connor (he/him)	You can see what I eventually hope to move in-tree here: https://github.com/ConnorBaker/cuda-packages Here’s the update script I’ve made for the different redists: https://github.com/ConnorBaker/cuda-packages/tree/main/scripts/cuda-redist	07:01:12
connor (he/him)	Ugh we should write an update for the post Tom made on discourse (https://discourse.nixos.org/t/community-team-updates/56458) @someoneserge anything we should mention in particular? I think I started a draft for an update earlier this year so I’ll see if I can find it :/	07:03:39
SomeoneSerge (back on matrix)	In reply to @connorbaker:matrix.org: "Ugh we should write an update for the post Tom made on discourse (https://discourse.nixos.org/t/community-team-updates/56458) @someoneserge anything we should mention in particular? I think I started a draft for an update earlier this year so I’ll see if I can find it :/" Let's make a shared pad for the draft?	14:18:46
SomeoneSerge (back on matrix)	Also maybe we've already reached the point where a room-wide voice call could be a better way to list the "challenges"	14:41:51
hexa	is anyone here aware why tensorflow 2.13.0 on 24.11 now requires AVX CPU instructions and the same version on 24.05 did not? https://github.com/NixOS/nixpkgs/issues/358973	17:49:22
hexa	the likely answer is dependencies, should be protobuf from the call trace	17:49:39 (edited 17:49:58)
SomeoneSerge (back on matrix)	I was about to post "you sure it's not the pypi garbage" and then thought "I'm surely just being biased"	18:07:06
