
NixOS CUDA

316 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



7 May 2024
@yklcs:matrix.orgyklcs Hello, I was wondering whether cudaPackages.cudatoolkit with Nix would allow me to use multiple versions of CUDA on my machine, either with NixOS or by just using the Nix package manager. 02:31:16
@brandon:matrix.radiation.io@brandon:matrix.radiation.ioyklcs: Cuda can be tricky but I've had good luck using nix shell and specific versions of cudatoolkit.04:58:33
@yklcs:matrix.orgyklcs
In reply to @brandon:matrix.radiation.io
yklcs: Cuda can be tricky but I've had good luck using nix shell and specific versions of cudatoolkit.
Thanks. Do you have any .nix files to share?
05:38:34
@ironbound:hackerspace.pl@ironbound:hackerspace.pl https://nixos.wiki/wiki/CUDA yklcs 10:06:58
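A minimal shell.nix along the lines brandon describes might look like the following. This is a sketch, not a tested config: the cudaPackages_12_1 attribute follows the nixpkgs naming convention, but which version sets exist depends on your nixpkgs revision:

```nix
# Per-project dev shell pinning one CUDA version; another project can
# pin a different cudaPackages set, so versions coexist per-shell.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  packages = [
    pkgs.cudaPackages_12_1.cudatoolkit  # swap for another version set as needed
  ];
  shellHook = ''
    export CUDA_PATH=${pkgs.cudaPackages_12_1.cudatoolkit}
  '';
}
```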
@msanft:matrix.orgMoritz Sanft Hey, I tried switching from virtualisation.docker.enableNvidia = true; to the more recent virtualisation.containers.cdi.dynamic.nvidia.enable = true;, hardware.nvidia-container-toolkit.enable = true; and features.cdi = true;. I'm using Docker daemon and client at v25, and since switching to the new configuration options, I see the following when trying to start containers with GPUs:

May 07 14:50:04 nixos docker[2350]: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Searching online for a little, most of the people running into that issue didn't install the CTK properly. However, that shouldn't be the case with the options mentioned above, or am I wrong? Does anyone of you have another idea?
14:57:12
@trexd:matrix.orgtrexd
In reply to @msanft:matrix.org
Hey, I tried switching from virtualisation.docker.enableNvidia = true; to the more recent virtualisation.containers.cdi.dynamic.nvidia.enable = true;, hardware.nvidia-container-toolkit.enable = true; and features.cdi = true;. I'm using Docker daemon and client at v25, and since switching to the new configuration options, I see the following when trying to start containers with GPUs:

May 07 14:50:04 nixos docker[2350]: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Searching online for a little, most of the people running into that issue didn't install the CTK properly. However, that shouldn't be the case with the options mentioned above, or am I wrong? Does anyone of you have another idea?
Can you try my suggestion above?
15:24:31
@trexd:matrix.orgtrexd
In reply to @trexd:matrix.org

I found that doing docker run --gpus=all results in

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Whereas docker run --device nvidia.com/gpu=all will detect my GPU.

My minimal settings are documented in this issue. https://github.com/NixOS/nixpkgs/issues/305312

This one Moritz Sanft
15:24:43
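For reference, the working setup trexd describes can be sketched as a NixOS configuration fragment. The option names below are the ones mentioned in the thread; their exact availability depends on your NixOS release, so treat this as an assumption-laden sketch rather than a canonical config:

```nix
# Sketch of the CDI-based container setup discussed above.
{
  hardware.nvidia-container-toolkit.enable = true;
  virtualisation.docker.enable = true;
  # With CDI, the GPU is requested via its device name, not --gpus:
  #   docker run --device nvidia.com/gpu=all <image> nvidia-smi
  # whereas `docker run --gpus=all` fails with
  #   could not select device driver "" with capabilities: [[gpu]]
}
```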
@msanft:matrix.orgMoritz SanftOhh, that seems helpful! Will try!15:25:25
@msanft:matrix.orgMoritz SanftThat works. Thank you!15:34:30
8 May 2024
@nrs-status:matrix.orgthirdofmay18081814goya changed their display name from nrs-status to thirdofmay18081814goya.00:55:57
@nrs-status:matrix.orgthirdofmay18081814goya set a profile picture.00:56:09
@connorbaker:matrix.orgconnor (he/him)I am re-emerging from the exhaustion surrounding travel and interviews; will be hammering the PR I have open into shape tomorrow; hopefully ready for review and merge soon so we can get new releases of CUDA, cuDNN, etc.03:02:17
@vid:matrix.org@vid:matrix.org left the room.12:47:16
@connorbaker:matrix.orgconnor (he/him)

Urhgh
Spent a bit trying to figure out why PyTorch was marked as broken on my PR. It was because Magma was building against the latest version of CUDA, but PyTorch was building against 12.1 (the latest officially supported release). Because it's a nightmare to try to ensure everything across multiple package sets is built with the same version of CUDA packages, I've relaxed that condition.

SomeoneSerge (Way down Hadestown) what are your thoughts on having something akin to python312Packages or haskell.packages.<GHC version> where we have a copy of pkgs available, but everything within that named package set is called with a single version of cudaPackages? That would avoid the need to thread different versions through dependencies via passthru, and should remove the possibility of mixing and matching CUDA versions between dependencies.

21:20:31
@ss:someonex.netSomeoneSerge (matrix works sometimes) The top-level pkgs is supposed to be that, but I guess we fail. Well, we definitely do because tensorflow 21:23:13
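The versioned-package-set idea connor floats could be sketched as an overlay that instantiates a copy of pkgs with one fixed cudaPackages, analogous to python312Packages. The attribute name pkgsCuda12_1 below is hypothetical, not an existing nixpkgs attribute:

```nix
# Illustrative overlay: every package inside pkgsCuda12_1 resolves
# cudaPackages to the same pinned version set, avoiding mixed CUDA
# versions across dependencies.
self: super: {
  pkgsCuda12_1 = import super.path {
    inherit (super) system;
    config = super.config // { cudaSupport = true; };
    overlays = [
      (final: prev: { cudaPackages = prev.cudaPackages_12_1; })
    ];
  };
}
```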
9 May 2024
@ss:someonex.netSomeoneSerge (matrix works sometimes) changed their display name from SomeoneSerge (Way down Hadestown) to SomeoneSerge (UTC+3).17:11:24
10 May 2024
@justbrowsing:matrix.orgKevin Mittman (UTC-7)is it too complicated maintaining the closure with packages for each component? i.e. would a single input simplify?14:31:54
@connorbaker:matrix.orgconnor (he/him)
In reply to @justbrowsing:matrix.org
is it too complicated maintaining the closure with packages for each component? i.e. would a single input simplify?
Is this in reference to the above or about the redistributable packaging in general?
14:47:33
@justbrowsing:matrix.orgKevin Mittman (UTC-7)why not both?15:13:10
@justbrowsing:matrix.orgKevin Mittman (UTC-7)(more the latter)15:13:57
@brandon:matrix.radiation.io@brandon:matrix.radiation.io left the room.15:23:52
@connorbaker:matrix.orgconnor (he/him) Ah for the latter the trouble is mostly around Nixpkgs expecting certain outputs to behave in certain ways (like dev including a dependency on out) and us using the outputs as components rather than full outputs 16:00:12
@connorbaker:matrix.orgconnor (he/him)For the former the issue is mostly around different packages in the global scope requiring different versions of CUDA (like PyTorch and Tensorflow use different versions of CUDA)16:00:48
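In the absence of such a scoped set, a single package can be pinned to a CUDA version via an override, which is how different versions are threaded through today. A sketch, assuming the package's expression takes cudaPackages as an argument (most CUDA-enabled nixpkgs packages do, but check the individual expression):

```nix
# Pin one package to CUDA 12.1 while the rest of pkgs uses the default.
pkgs.python3Packages.torch.override {
  cudaPackages = pkgs.cudaPackages_12_1;
}
```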
12 May 2024
@glepage:matrix.orgGaétan Lepage staging-next has been merged to master a few minutes ago.
Looks like most of the CUDA stuff is broken...
16:26:19
@justbrowsing:matrix.orgKevin Mittman (UTC-7) changed their display name from Kevin Mittman (jet-lagged) to Kevin Mittman.16:30:23
@connorbaker:matrix.orgconnor (he/him)Uh oh19:03:43
@connorbaker:matrix.orgconnor (he/him) Gaétan Lepage: can you send me a few reproducers? I’m going to rebase the PR I have outstanding on Monday and will pick those up so I’d like to know ahead of time what to look for 19:04:18
@glepage:matrix.orgGaétan Lepage I am currently in the middle of many rebuilds. My JAX update PR was basically ready and now I have some kWh to spare rebuilding everything ^^
I am not sure yet about the failures. I have re-tried building the packages that were supposedly failing and it seems to work fine now.
19:06:32
@glepage:matrix.orgGaétan LepageI'll let you know if I spot anything fishy.19:06:43


