| 16 Sep 2022 |
SomeoneSerge (back on matrix) | cudatoolkit.{out,lib} brings in a lot (4-5 GiB) of baggage; if you'd like to get rid of it later, maybe you could start with https://github.com/NixOS/nixpkgs/blob/befe56a1ee1d383fafaf9db41e3f4fc506578da1/pkgs/development/python-modules/pytorch/default.nix#L57 | 21:14:47 |
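(The linked line appears to be where the torch build pulls in both cudatoolkit outputs; here is a minimal sketch of a symlinkJoin over those outputs, assuming that is roughly what the expression does. The derivation name below is illustrative, not the one nixpkgs uses.)
{ symlinkJoin, cudatoolkit }:

# Sketch only: join the two cudatoolkit outputs so the build sees a single prefix.
# Trimming the closure would mean narrowing what ends up in `paths` here.
symlinkJoin {
  name = "${cudatoolkit.name}-merged";  # illustrative name
  paths = [ cudatoolkit.out cudatoolkit.lib ];
}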
| 17 Sep 2022 |
aidalgol | In a flake shell with config.allowUnfree = true; and config.cudaSupport = true;, the python torch module is throwing an unknown CUDA error. Is there something more I need to do to get the package's CUDA support enabled?
File "/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
torch._C._cuda_init()
RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
| 05:54:58 |
tpw_rules | are you running on nixos? | 05:55:22 |
aidalgol | Yes, sorry, this is on NixOS. | 05:55:35 |
tpw_rules | and you have the nvidia drivers set up and nvidia-smi works and stuff? | 05:57:16 |
aidalgol | Yep, nvidia-smi output still looks good. | 05:58:14 |
tpw_rules | is this the torch-bin module or did you compile it yourself? | 05:58:52 |
aidalgol | I was referencing torch, not torch-bin. Should I try that one? | 06:00:26 |
aidalgol | I'm also using the cuda-maintainers cachix cache, if that makes a difference. | 06:01:19 |
tpw_rules | you can try torch-bin, it's precompiled by upstream with cuda support. | 06:01:41 |
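(A minimal shell.nix sketch for trying the prebuilt wheel, assuming the Python attribute is torch-bin as named above; allowUnfree is still needed because the upstream binaries are unfree.)
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

pkgs.mkShell {
  packages = with pkgs; [
    # torch-bin is the upstream-built wheel, which already ships with CUDA,
    # so it should not need config.cudaSupport to be set.
    (python3.withPackages (ps: [ ps.torch-bin ]))
  ];
}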
tpw_rules | do you have an excessively recent nvidia card? | 06:02:01 |
aidalgol | RTX3080 | 06:02:07 |
aidalgol | Driver Version: 515.48.07 CUDA Version: 11.7
| 06:02:23 |
tpw_rules | give torch-bin a try, it doesn't sound like you're doing anything wrong with regular torch though, something might be broken with cuda 11.7 or so | 06:02:54 |
tpw_rules | actually i think nixpkgs only has cuda 11.6, so that shouldn't even be it. i reviewed the pr and tested it, i thought... | 06:03:26 |
aidalgol | Some days it feels like GPU programming has invented a new kind of dependency hell. | 06:03:49 |
tpw_rules | what nixpkgs commit are you on | 06:04:41 |
tpw_rules | and are you trying to run any particular code | 06:05:15 |
tpw_rules | i might be able to debug next week. i have a 3060Ti at work | 06:06:18 |
aidalgol | I'm trying to run this script for some video upscaling I'm doing with VapourSynth and arcane plugins.
https://github.com/styler00dollar/VSGAN-tensorrt-docker/blob/main/convert_esrgan_to_onnx.py | 06:06:50 |
aidalgol | (Modifying that script to use a different input file) | 06:07:21 |
tpw_rules | maybe tensorrt is the problem. don't think that's in nixpkgs | 06:08:03 |
tpw_rules | anyway it is excessively my bedtime. good luck | 06:08:16 |
aidalgol | Welp, that made no difference. | 06:37:31 |
aidalgol | With this shell.nix,
{ pkgs ? import <nixpkgs> {
    config.allowUnfree = true;
    config.cudaSupport = true;
  } }:
pkgs.mkShell {
  packages = with pkgs; [
    (python3.withPackages (ps: [
      ps.torch
    ]))
  ];
}
Just a basic "is CUDA available" check fails.
$ nix-shell --run 'python'
Python 3.10.6 (main, Aug 1 2022, 20:38:21) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> assert torch.cuda.is_available()
/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /build/source/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
| 06:41:20 |
aidalgol | Uh, false alarm I guess, because after a NixOS update and reboot, the assert passes. | 09:09:37 |
tpw_rules | the ways of cuda are strange. glad you got it working. maybe you updated your kernel recently and needed to reboot | 13:44:11 |
SomeoneSerge (back on matrix) | (too late, but I'll still chime in with a comment on how I understand the landscape so far)
That shell.nix is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having /run/opengl-driver/lib/libcuda.so present and the right kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia.
| 14:19:31 |
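(A minimal configuration.nix sketch of the two pieces described above, with option names as they were around NixOS 22.05.)
{ config, pkgs, ... }:

{
  # Populates /run/opengl-driver/lib, where libcuda.so is looked up at runtime.
  hardware.opengl.enable = true;

  # Selects the proprietary NVIDIA driver and loads its kernel module.
  services.xserver.videoDrivers = [ "nvidia" ];
}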
aidalgol | In reply to @ss:someonex.net: I did not have hardware.opengl.enable = true; in my system config, so I'm not sure how OpenGL ever worked on my system. It's there now, though. 👍️ | 18:39:18 |