!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

287 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

58 Servers

17 Sep 2022
@aidalgol:matrix.orgaidalgol(Modifying that script to use a different input file)06:07:21
@tpw_rules:matrix.orgtpw_rulesmaybe tensorrt is the problem. don't think that's in nixpkgs06:08:03
@tpw_rules:matrix.orgtpw_rulesanyway it is excessively my bedtime. good luck06:08:16
@aidalgol:matrix.orgaidalgolWelp, that made no difference.06:37:31
@aidalgol:matrix.orgaidalgol

With this shell.nix,

{ pkgs ? import <nixpkgs> {
  config.allowUnfree = true;
  config.cudaSupport = true;
} }:

pkgs.mkShell {
  packages = with pkgs; [
    (python3.withPackages (ps: [
      ps.torch
    ]))
  ];
}

Just a basic "is CUDA available" check fails.

$ nix-shell --run 'python'
Python 3.10.6 (main, Aug  1 2022, 20:38:21) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> assert torch.cuda.is_available()
/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /build/source/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
06:41:20
@aidalgol:matrix.orgaidalgolUh, false alarm I guess, because after a NixOS update and reboot, the assert passes.09:09:37
@tpw_rules:matrix.orgtpw_rulesthe ways of cuda are strange. glad you got it working. maybe you updated your kernel recently and needed to reboot13:44:11
@ss:someonex.netSomeoneSerge (back on matrix)

(too late, but I'll still chime in with a comment on how I so far understand the landscape)

This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia

14:19:31
@aidalgol:matrix.orgaidalgol
In reply to@ss:someonex.net

(too late, but I'll still chime in with a comment on how I so far understand the landscape)

This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia

I did not have hardware.opengl.enable = true; in my system config, so I'm not sure how OpenGL ever worked on my system. It's there now, though. 👍️
18:39:18
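A minimal NixOS configuration sketch of what the two messages above describe (the services.xserver.videoDrivers line is the usual way to select the proprietary driver and is an assumption here, not quoted from the thread):

{ config, pkgs, ... }:
{
  # Deploys /run/opengl-driver/lib, which is where libcuda.so ends up.
  hardware.opengl.enable = true;

  # Assumption: selects the proprietary NVIDIA driver and loads its kernel module.
  services.xserver.videoDrivers = [ "nvidia" ];

  # The NVIDIA driver and the CUDA packages are unfree.
  nixpkgs.config.allowUnfree = true;
}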
18 Sep 2022
@greaka:greaka.degreaka left the room.11:35:02
@hexa:lossy.networkhexa (UTC+1) Samuel Ainsworth: jax needs an update on python-updates to handle the new protobuf version 13:18:40
@FRidh:matrix.orgFRidh set a profile picture.17:20:55
19 Sep 2022
@hexa:lossy.networkhexa (UTC+1)does samuel even read messages on matrix?02:13:28
@hexa:lossy.networkhexa (UTC+1)pinging on github02:14:00
22 Sep 2022
@hexa:lossy.networkhexa (UTC+1)https://github.com/NixOS/nixpkgs/pull/19239107:13:30
@hexa:lossy.networkhexa (UTC+1)packaged up due to a request, quickly tested (CPU only) and looks to be working07:13:50
23 Sep 2022
@aidalgol:matrix.orgaidalgolLately my GPU isn't showing up as a CUDA device after a system suspend and wake.19:22:33
@aidalgol:matrix.orgaidalgol But nvidia-smi output looks fine. 19:23:54
@aidalgol:matrix.orgaidalgolAnd games still work and run with a playable framerate.19:24:22
@aidalgol:matrix.orgaidalgolSo Vulkan and OpenGL are unaffected; just CUDA.19:25:18
24 Sep 2022
@peddie:matrix.orgpeddie joined the room.10:45:23
@ss:someonex.netSomeoneSerge (back on matrix)Isn't showing up where?13:28:08
@aidalgol:matrix.orgaidalgolBlender doesn't see it, and python CUDA libraries fail to enumerate CUDA devices.20:02:47
@aidalgol:matrix.orgaidalgol I've tried setting hardware.nvidia.nvidiaPersistenced = true; in my NixOS config, and that seems to have resolved it so far. 20:04:06
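A sketch of the option mentioned above in NixOS module form; the powerManagement line is an assumption sometimes suggested for suspend/resume problems, not something reported as tested in this thread:

{
  # Keep the driver initialised so CUDA devices stay visible even when
  # no process currently holds the GPU open (helps across suspend/resume).
  hardware.nvidia.nvidiaPersistenced = true;

  # Assumption, not confirmed here: driver-level suspend/resume support.
  hardware.nvidia.powerManagement.enable = true;
}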
25 Sep 2022
@hexa:lossy.networkhexa (UTC+1)fyi https://github.com/NixOS/nixpkgs/pull/19287911:26:48
29 Sep 2022
@ss:someonex.netSomeoneSerge (back on matrix)

FRidh: Hey, I just don't want to flood the nvidia-ml-py issue, so I post here: the thing is I don't know what you mean by "referring from the stub to the driver library"

I know that if you just run LD_PRELOAD=$(nix-build '<nixpkgs>' -A cudaPackages.cudatoolkit --no-out-link)/lib/stubs/libnvidia-ml.so nvidia-smi (or prepend the stub path in RUNPATH) without doing anything else, you're going to get a failure. If you meant something else (e.g. if you've heard of other ways to use the stubs than to bypass linker errors), then it might be that I just lack knowledge

10:59:49
@FRidh:matrix.orgFRidhYes, and what if you do that after patchelf'ing ${cudatoolkit}/lib/stubs/libnvidia-ml.so to point to /run/current-system/... ?11:06:12
@FRidh:matrix.orgFRidhI have never tried this myself, just thought this should work.11:06:31
@ss:someonex.netSomeoneSerge (back on matrix) Ok, I don't know of anything like it. The way I see it: the downstream app (e.g. nvidia-ml-py) searches for libnvidia-ml.so, resolves that to lib/stubs/libnvidia-ml.so, dlopen's it, and calls a "function" defined there. Would this now have to look for a new libnvidia-ml.so? 11:17:09
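A hedged sketch of the pattern under discussion (package name and build details are illustrative, not from the thread): in nixpkgs the stub is normally used only to satisfy the linker at build time, and at runtime the loader is pointed at /run/opengl-driver/lib, e.g. with the addOpenGLRunpath hook, so the system's real libnvidia-ml.so is found instead of the stub.

{ stdenv, cudaPackages, addOpenGLRunpath }:

stdenv.mkDerivation {
  pname = "nvml-example";   # hypothetical package, for illustration only
  version = "0.1";
  src = ./.;

  nativeBuildInputs = [ addOpenGLRunpath ];

  # Build time only: link against the stub so no real driver is needed
  # inside the build sandbox.
  NIX_LDFLAGS = "-L${cudaPackages.cudatoolkit}/lib/stubs -lnvidia-ml";

  # Run time: add /run/opengl-driver/lib to the RUNPATH so the driver's
  # real libnvidia-ml.so takes precedence over the stub.
  postFixup = ''
    addOpenGLRunpath $out/bin/*
  '';
}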
@ss:someonex.netSomeoneSerge (back on matrix)image.png
11:28:14
