
NixOS CUDA

316 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



7 May 2024
@yklcs:matrix.orgyklcs Hello, I was wondering whether cudaPackages.cudatoolkit with Nix would allow me to use multiple versions of CUDA on my machine, either with NixOS or by just using the Nix package manager. 02:31:16
@brandon:matrix.radiation.io@brandon:matrix.radiation.ioyklcs: Cuda can be tricky but I've had good luck using nix shell and specific versions of cudatoolkit.04:58:33
@yklcs:matrix.orgyklcs
In reply to @brandon:matrix.radiation.io
yklcs: Cuda can be tricky but I've had good luck using nix shell and specific versions of cudatoolkit.
Thanks. Do you have any .nix files to share?
05:38:34
@ironbound:hackerspace.pl@ironbound:hackerspace.pl https://nixos.wiki/wiki/CUDA yklcs 10:06:58
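A minimal shell.nix along the lines brandon describes might look like the following. This is a sketch, not a tested config: the cudaPackages_12_1 attribute follows the nixpkgs naming convention, but which version sets exist depends on your nixpkgs revision:

```nix
# Per-project dev shell pinning one CUDA version; another project can
# pin a different cudaPackages set, so versions coexist per-shell.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  packages = [
    pkgs.cudaPackages_12_1.cudatoolkit  # swap for another version set as needed
  ];
  shellHook = ''
    export CUDA_PATH=${pkgs.cudaPackages_12_1.cudatoolkit}
  '';
}
```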
@msanft:matrix.orgMoritz Sanft Hey, I tried switching from virtualisation.docker.enableNvidia = true; to the more recent virtualisation.containers.cdi.dynamic.nvidia.enable = true;, hardware.nvidia-container-toolkit.enable = true; and features.cdi = true;. I'm using Docker daemon and client at v25, and since switching to the new configuration options, I see the following when trying to start containers with GPUs:

May 07 14:50:04 nixos docker[2350]: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Searching online for a little, most of the people running into that issue didn't install the CTK properly. However, that shouldn't be the case with the options mentioned above, or am I wrong? Does anyone of you have another idea?
14:57:12
@trexd:matrix.orgtrexd
In reply to @msanft:matrix.org
Hey, I tried switching from virtualisation.docker.enableNvidia = true; to the more recent virtualisation.containers.cdi.dynamic.nvidia.enable = true;, hardware.nvidia-container-toolkit.enable = true; and features.cdi = true;. I'm using Docker daemon and client at v25, and since switching to the new configuration options, I see the following when trying to start containers with GPUs:

May 07 14:50:04 nixos docker[2350]: docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Searching online for a little, most of the people running into that issue didn't install the CTK properly. However, that shouldn't be the case with the options mentioned above, or am I wrong? Does anyone of you have another idea?
Can you try my suggestion above?
15:24:31
@trexd:matrix.orgtrexd
In reply to @trexd:matrix.org

I found that doing docker run --gpus=all results in

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Whereas docker run --device nvidia.com/gpu=all will detect my GPU.

My minimal settings are documented in this issue. https://github.com/NixOS/nixpkgs/issues/305312

This one Moritz Sanft
15:24:43
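For reference, the working setup trexd describes can be sketched as a NixOS configuration fragment. The option names below are the ones mentioned in the thread; their exact availability depends on your NixOS release, so treat this as an assumption-laden sketch rather than a canonical config:

```nix
# Sketch of the CDI-based container setup discussed above.
{
  hardware.nvidia-container-toolkit.enable = true;
  virtualisation.docker.enable = true;
  # With CDI, the GPU is requested via its device name, not --gpus:
  #   docker run --device nvidia.com/gpu=all <image> nvidia-smi
  # whereas `docker run --gpus=all` fails with
  #   could not select device driver "" with capabilities: [[gpu]]
}
```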
@msanft:matrix.orgMoritz SanftOhh, that seems helpful! Will try!15:25:25
@msanft:matrix.orgMoritz SanftThat works. Thank you!15:34:30
8 May 2024
@nrs-status:matrix.orgthirdofmay18081814goya changed their display name from nrs-status to thirdofmay18081814goya.00:55:57
@nrs-status:matrix.orgthirdofmay18081814goya set a profile picture.00:56:09
@connorbaker:matrix.orgconnor (he/him)I am re-emerging from the exhaustion surrounding travel and interviews; will be hammering the PR I have open into shape tomorrow; hopefully ready for review and merge soon so we can get new releases of CUDA, cuDNN, etc.03:02:17
@vid:matrix.org@vid:matrix.org left the room.12:47:16
@connorbaker:matrix.orgconnor (he/him)

Urhgh
Spent a bit trying to figure out why PyTorch was marked as broken on my PR. It was because Magma was building against the latest version of CUDA, but PyTorch was building against 12.1 (the latest officially supported release). Because it's a nightmare to try to ensure everything across multiple package sets is built with the same version of CUDA packages, I've relaxed that condition.

SomeoneSerge (Way down Hadestown) what are your thoughts on having something akin to python312Packages or haskell.packages.<GHC version> where we have a copy of pkgs available, but everything within that named package set is called with a single version of cudaPackages? That would avoid the need to thread different versions through dependencies via passthru, and should remove the possibility of mixing and matching CUDA versions between dependencies.

21:20:31
@ss:someonex.netSomeoneSerge (matrix works sometimes) The top-level pkgs is supposed to be that, but I guess we fail. Well, we definitely do because tensorflow 21:23:13
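The versioned-package-set idea connor floats could be sketched as an overlay that instantiates a copy of pkgs with one fixed cudaPackages, analogous to python312Packages. The attribute name pkgsCuda12_1 below is hypothetical, not an existing nixpkgs attribute:

```nix
# Illustrative overlay: every package inside pkgsCuda12_1 resolves
# cudaPackages to the same pinned version set, avoiding mixed CUDA
# versions across dependencies.
self: super: {
  pkgsCuda12_1 = import super.path {
    inherit (super) system;
    config = super.config // { cudaSupport = true; };
    overlays = [
      (final: prev: { cudaPackages = prev.cudaPackages_12_1; })
    ];
  };
}
```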
9 May 2024
@ss:someonex.netSomeoneSerge (matrix works sometimes) changed their display name from SomeoneSerge (Way down Hadestown) to SomeoneSerge (UTC+3).17:11:24
10 May 2024
@justbrowsing:matrix.orgKevin Mittman (UTC-7)is it too complicated maintaining the closure with packages for each component? i.e. would a single input simplify?14:31:54
@connorbaker:matrix.orgconnor (he/him)
In reply to @justbrowsing:matrix.org
is it too complicated maintaining the closure with packages for each component? i.e. would a single input simplify?
Is this in reference to the above or about the redistributable packaging in general?
14:47:33
@justbrowsing:matrix.orgKevin Mittman (UTC-7)why not both?15:13:10
@justbrowsing:matrix.orgKevin Mittman (UTC-7)(more the latter)15:13:57
@brandon:matrix.radiation.io@brandon:matrix.radiation.io left the room.15:23:52
@connorbaker:matrix.orgconnor (he/him) Ah for the latter the trouble is mostly around Nixpkgs expecting certain outputs to behave in certain ways (like dev including a dependency on out) and us using the outputs as components rather than full outputs 16:00:12
@connorbaker:matrix.orgconnor (he/him)For the former the issue is mostly around different packages in the global scope requiring different versions of CUDA (like PyTorch and Tensorflow use different versions of CUDA)16:00:48
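In the absence of such a scoped set, a single package can be pinned to a CUDA version via an override, which is how different versions are threaded through today. A sketch, assuming the package's expression takes cudaPackages as an argument (most CUDA-enabled nixpkgs packages do, but check the individual expression):

```nix
# Pin one package to CUDA 12.1 while the rest of pkgs uses the default.
pkgs.python3Packages.torch.override {
  cudaPackages = pkgs.cudaPackages_12_1;
}
```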
12 May 2024
@glepage:matrix.orgGaétan Lepage staging-next has been merged to master a few minutes ago.
Looks like most of the CUDA stuff is broken...
16:26:19
@justbrowsing:matrix.orgKevin Mittman (UTC-7) changed their display name from Kevin Mittman (jet-lagged) to Kevin Mittman.16:30:23
@connorbaker:matrix.orgconnor (he/him)Uh oh19:03:43
@connorbaker:matrix.orgconnor (he/him) Gaétan Lepage: can you send me a few reproducers? I’m going to rebase the PR I have outstanding on Monday and will pick those up so I’d like to know ahead of time what to look for 19:04:18
@glepage:matrix.orgGaétan Lepage I am currently in the middle of many rebuilds. My JAX update PR was basically ready and now I have some kWh to spare rebuilding everything ^^
I am not sure yet about the failures. I have re-tried building the packages that were supposedly failing and it seems to work fine now.
19:06:32
@glepage:matrix.orgGaétan LepageI'll let you know if I spot anything fishy.19:06:43


