| 17 Sep 2022 |
aidalgol | (Modifying that script to use a different input file) | 06:07:21 |
tpw_rules | maybe tensorrt is the problem. don't think that's in nixpkgs | 06:08:03 |
tpw_rules | anyway it is excessively my bedtime. good luck | 06:08:16 |
aidalgol | Welp, that made no difference. | 06:37:31 |
aidalgol | With this shell.nix,
{ pkgs ? import <nixpkgs> {
    config.allowUnfree = true;
    config.cudaSupport = true;
  } }:
pkgs.mkShell {
  packages = with pkgs; [
    (python3.withPackages (ps: [
      ps.torch
    ]))
  ];
}
Just a basic "is CUDA available" check fails.
$ nix-shell --run 'python'
Python 3.10.6 (main, Aug 1 2022, 20:38:21) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> assert torch.cuda.is_available()
/nix/store/bf48f3zny7q08lg4hc4279fn3jw1lkpl-python3-3.10.6-env/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /build/source/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
| 06:41:20 |
aidalgol | Uh, false alarm I guess, because after a NixOS update and reboot, the assert passes. | 09:09:37 |
tpw_rules | the ways of cuda are strange. glad you got it working. maybe you updated your kernel recently and needed to reboot | 13:44:11 |
SomeoneSerge (back on matrix) | (too late, but I'll still chime in with a comment on how I so far understand the landscape)
This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia
| 14:19:31 |
aidalgol | In reply to @ss:someonex.net
(too late, but I'll still chime in with a comment on how I so far understand the landscape)
This is all that's needed from nixpkgs. The other requirements are imposed on the running system, and probably amount to having a /run/opengl-driver/lib/libcuda.so and some kernel module loaded. Both are deployed on NixOS when hardware.opengl.enable = true and the driver is nvidia
I did not have hardware.opengl.enable = true; in my system config, so I'm not sure how OpenGL ever worked on my system. It's there now, though. 👍️ | 18:39:18 |
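[Editor's note: for readers following along, the options discussed above would look roughly like this in a NixOS system configuration. This is a sketch using option names from the NixOS 22.05/22.11 era; `services.xserver.videoDrivers` is the usual way to select the nvidia driver, though it is not spelled out in the conversation.]

```nix
# configuration.nix fragment (sketch): what the discussion above amounts to.
# hardware.opengl.enable deploys /run/opengl-driver/lib, and with the nvidia
# driver selected that directory gains libcuda.so; the kernel module is
# loaded alongside it.
{
  hardware.opengl.enable = true;
  services.xserver.videoDrivers = [ "nvidia" ];
}
```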
| 18 Sep 2022 |
| greaka left the room. | 11:35:02 |
hexa | Samuel Ainsworth: jax needs an update on python-updates to handle the new protobuf version | 13:18:40 |
| FRidh set a profile picture. | 17:20:55 |
| 19 Sep 2022 |
hexa | does samuel even read messages on matrix? | 02:13:28 |
hexa | pinging on github | 02:14:00 |
| 22 Sep 2022 |
hexa | https://github.com/NixOS/nixpkgs/pull/192391 | 07:13:30 |
hexa | packaged up due to a request, quickly tested (CPU only) and looks to be working | 07:13:50 |
| 23 Sep 2022 |
aidalgol | Lately my GPU isn't showing up as a CUDA device after a system suspend and wake. | 19:22:33 |
aidalgol | But nvidia-smi output looks fine. | 19:23:54 |
aidalgol | And games still work and run with a playable framerate. | 19:24:22 |
aidalgol | So Vulkan and OpenGL are unaffected; just CUDA. | 19:25:18 |
| 24 Sep 2022 |
| peddie joined the room. | 10:45:23 |
SomeoneSerge (back on matrix) | Isn't showing up where? | 13:28:08 |
aidalgol | Blender doesn't see it, and python CUDA libraries fail to evaluate CUDA devices. | 20:02:47 |
aidalgol | I've tried setting hardware.nvidia.nvidiaPersistenced = true; in my NixOS config, and that seems to have resolved it so far. | 20:04:06 |
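[Editor's note: in context, the option aidalgol mentions is a one-line change. A sketch; `hardware.nvidia.nvidiaPersistenced` runs NVIDIA's persistence daemon, which keeps the driver's device state initialized so it survives suspend/resume.]

```nix
# configuration.nix fragment (sketch): keep the GPU device state alive
# across suspend/resume so CUDA enumeration keeps working.
{
  hardware.nvidia.nvidiaPersistenced = true;
}
```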
| 25 Sep 2022 |
hexa | fyi https://github.com/NixOS/nixpkgs/pull/192879 | 11:26:48 |
| 29 Sep 2022 |
SomeoneSerge (back on matrix) | FRidh: Hey, I just don't want to flood the nvidia-ml-py issue, so I post here: the thing is I don't know what you mean by "referring from the stub to the driver library"
I know that if you just run LD_PRELOAD=$(nix-build '<nixpkgs>' -A cudaPackages.cudatoolkit --no-out-link)/lib/stubs/libnvidia-ml.so nvidia-smi (or prepend the stub path in RUNPATH) without doing anything else, you're going to get a failure. If you meant something else (e.g. if you've heard of other ways to use the stubs than bypassing linker errors), then it might be that I just lack knowledge
| 10:59:49 |
FRidh | Yes, and what if you do that after patchelf'ing ${cudatoolkit}/lib/stubs/libnvidia-ml.so to point to /run/current-system/... ? | 11:06:12 |
FRidh | I have never tried this myself, just thought this should work. | 11:06:31 |
SomeoneSerge (back on matrix) | Ok, I don't know about anything like it. The way I see it: the downstream app (e.g. nvidia-ml-py) searches for libnvidia-ml.so, resolves that into lib/stubs/libnvidia-ml.so, dlopen's it, and calls a "function" defined there. This now would have to look for a new libnvidia-ml.so? | 11:17:09 |
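[Editor's note: the runtime-resolution step SomeoneSerge describes can be illustrated with a small sketch. `driver_lib_present` is a hypothetical helper, not part of nvidia-ml-py; `libnvidia-ml.so.1` is the driver-provided library that a patched stub would somehow have to redirect to.]

```python
import ctypes

def driver_lib_present(name="libnvidia-ml.so.1"):
    """Try to dlopen the given library name, the same lookup a consumer
    like nvidia-ml-py performs for libnvidia-ml.so at runtime."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # On a system without the NVIDIA driver (or with only the CUDA
        # toolkit stubs on the search path), the dlopen fails.
        return False

print(driver_lib_present())
```

If the stub is what gets resolved, the symbols exist but the calls fail, which is why merely preloading the stub does not make nvidia-smi work.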
SomeoneSerge (back on matrix) | (posted an image: image.png) | 11:28:14 |