!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

316 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers

Sender Message Time
19 May 2024
@ss:someonex.net SomeoneSerge (matrix works sometimes) Damn. I guess it threw it away then :) 16:10:24
@ss:someonex.net SomeoneSerge (matrix works sometimes) A perfectly sensible behaviour after spending 19 minutes of compute 16:10:54
@connorbaker:matrix.org connor (burnt/out) (UTC-8) I mean I have three desktops I can run three builds of it in parallel lol 16:11:15
@connorbaker:matrix.org connor (burnt/out) (UTC-8) And then I could copy the closures to one machine and diff them there I guess 16:11:42
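The "copy the closures to one machine and diff them" idea can be sketched roughly as below, as a coarse first pass before a byte-level tool. This is only an illustration: `tree_digests` and `differing_files` are hypothetical helper names, and the sketch assumes both build outputs are already available as plain directories on one machine. It reports which files differ, not how they differ.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file under `root` (relative path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        # Symlinks are skipped so that only real file contents are compared.
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(a: Path, b: Path) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    da, db = tree_digests(a), tree_digests(b)
    return sorted(k for k in da.keys() | db.keys() if da.get(k) != db.get(k))
```

Calling `differing_files(Path(out_a), Path(out_b))` on two copies of the same output narrows down which files are worth inspecting in detail.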
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Oh I guess I should enable keep-failed 16:13:08
@ss:someonex.netSomeoneSerge (matrix works sometimes) Wait, I ran nix build nixpkgs#apptainer.tests.image-hello-cowsay --rebuild --print-out-paths and I still don't see the alternative output 16:14:16
@ss:someonex.netSomeoneSerge (matrix works sometimes)
❯ nix build nixpkgs#apptainer.tests.image-hello-cowsay --rebuild --print-out-paths --keep-failed
note: keeping build directory '/tmp/nix-build-apptainer-image-hello-cowsay.img.drv-0'
error: derivation '/nix/store/fhf0m6lj0z4wixk3w69i28wv817mm2z9-apptainer-image-hello-cowsay.img.drv' may not be deterministic: output '/nix/store/dxpa23p7avaghl6r2rlx6i0wdmzf7kdq-apptainer-image-hello-cowsay.img' differs from '/nix/store/dxpa23p7avaghl6r2rlx6i0wdmzf7kdq-apptainer-image-hello-cowsay.img.check'
16:15:18
@connorbaker:matrix.org connor (burnt/out) (UTC-8) oh god what do I do with this output, it's massive 16:33:07
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Okay, file-wise it's 175KB compressed and ~3MB uncompressed as HTML 16:38:08
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Download magma-cuda-static-html-diffoscope.tar.zst 16:38:11
@ss:someonex.net SomeoneSerge (matrix works sometimes) Soo stuff just shifted by a few bytes, presumably due to parallelism? 16:42:32
@ss:someonex.net SomeoneSerge (matrix works sometimes) CUDA not ready for CA derivations xD 16:42:46
@evax:matrix.org evax SomeoneSerge (UTC+3): some more details: I could make jaxlib-bin work by overriding the src to point to the cuda11 version instead of cuda12. For some reason the jaxlibWithCuda package seems to be missing the cuda folder (flake setup, nixos-23.11 nixpkgs, cuda-maintainers cache) 17:40:26
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @evax:matrix.org
SomeoneSerge (UTC+3): some more details: I could make jaxlib-bin work by overriding the src to point to the cuda11 version instead of cuda12. For some reason the jaxlibWithCuda package seems to be missing the cuda folder (flake setup, nixos-23.11 nixpkgs, cuda-maintainers cache)

Jaxlib-bin is prebuilt against a concrete version of cuda. Nixpkgs manually pins that version. If an override helped you it's a bug in nixpkgs (pinning the wrong cuda).

Jaxlib (without bin) otoh should work with any cuda. We'd still need LD_DEBUG=libs to derive any more conclusions

17:55:19
@evax:matrix.org evax I can't really move any data to/from that system, could you tell me more about what to look for in the output? 18:12:15
@aidalgol:matrix.orgaidalgol
In reply to @connorbaker:matrix.org

aidalgol: running nix-cuda-test I see it on my nvidia-smi

$ nvidia-smi
Sun May 19 15:11:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 45%   56C    P2            347W /  500W |    8187MiB /  24564MiB |     96%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3656630      C   ...y88kh-python3-3.11.9/bin/python3.11       8180MiB |
+-----------------------------------------------------------------------------------------+
Sorry, what's nix-cuda-test?
18:30:56
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Ah my bad, https://github.com/ConnorBaker/nix-cuda-test/tree/main 18:31:16
@ss:someonex.net SomeoneSerge (matrix works sometimes) Depends on the exact error message, but firstly: where is libcuda.so loaded from 18:56:19
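One way to answer "where is libcuda.so loaded from" without moving data off the machine is to look at the process's own memory mappings after the library has been loaded. The sketch below is Linux-only and reads /proc/self/maps; `loaded_libraries` is a hypothetical helper name, not part of any library. Setting LD_DEBUG=libs in the environment gives the dynamic loader's own view of the search, and this is just a smaller complement to it.

```python
from __future__ import annotations

from pathlib import Path


def loaded_libraries(maps_text: str, needle: str) -> list[str]:
    """Return unique mapped file paths whose file name contains `needle`.

    `maps_text` is the content of /proc/<pid>/maps, whose lines look like:
    address perms offset dev inode pathname
    """
    found = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5]
        if needle in Path(path).name and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    # Import or call whatever is expected to load the library first,
    # then inspect the mappings of this same process.
    maps = Path("/proc/self/maps")
    if maps.exists():
        print(loaded_libraries(maps.read_text(), "libcuda"))
```

An empty list means nothing matching "libcuda" is mapped in the process at that point, which by itself says the library was never loaded rather than loaded from the wrong place.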
@aidalgol:matrix.orgaidalgol
In reply to @connorbaker:matrix.org
Ah my bad, https://github.com/ConnorBaker/nix-cuda-test/tree/main
And what on earth is https://cantcache.me/ ? 🧐
19:00:20
@connorbaker:matrix.org connor (burnt/out) (UTC-8) It's a binary cache I made for myself using Attic (https://github.com/zhaofengli/attic) 19:00:53
@connorbaker:matrix.org connor (burnt/out) (UTC-8) I thought the domain name was funny and bought it so I've been using it on and off for various things 19:01:04
@evax:matrix.org evax the error message is that "a CUDA enabled jaxlib is not installed" - and it's actually not trying to load libcuda 20:07:27
@evax:matrix.org evax now if I look at the path jaxlib is loaded from in python, there's no cuda folder in there (but there's one when I use jaxlib-bin) 20:10:11
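The check described above (find the directory jaxlib is loaded from and see whether it contains a cuda folder) can be scripted roughly as follows. This is a minimal sketch: `package_dir` and `has_subdir` are hypothetical helper names, and the only assumption about jaxlib is the one stated in the message, namely that a CUDA-enabled install has a `cuda` subdirectory.

```python
from __future__ import annotations

import importlib.util
from pathlib import Path


def package_dir(name: str) -> Path | None:
    """Return the directory a top-level package would be imported from, if any."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    return Path(next(iter(spec.submodule_search_locations)))


def has_subdir(pkg_dir: Path, sub: str = "cuda") -> bool:
    """Report whether `pkg_dir` contains a subdirectory named `sub`."""
    return (pkg_dir / sub).is_dir()


if __name__ == "__main__":
    d = package_dir("jaxlib")
    if d is None:
        print("jaxlib is not importable from this interpreter")
    else:
        print(f"{d}: cuda directory present = {has_subdir(d)}")
```

It needs to be run with the same interpreter and environment that produce the error, since a different environment may resolve a different jaxlib.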
@aidalgol:matrix.orgaidalgol
In reply to @connorbaker:matrix.org
Ah my bad, https://github.com/ConnorBaker/nix-cuda-test/tree/main

I get an error trying to run that.

❯ nix run github:ConnorBaker/nix-cuda-test#nix-cuda-test
do you want to allow configuration setting 'extra-substituters' to be set to 'https://cantcache.me/cuda https://cuda-maintainers.cachix.org' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? 
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'cuda:NtbpAU7XGYlttrhCduqvpYKottCPdWVITWT+3nFVTBY= cuda-maintainers.cachix.org-1:0dq3bujKpuEPMCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E=' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? 
do you want to allow configuration setting 'extra-trusted-substituters' to be set to 'https://cantcache.me/cuda https://cuda-maintainers.cachix.org' (y/N)? y
do you want to permanently mark this value as trusted (y/N)? 
error: builder for '/nix/store/vhh1jmqaf9pn9sfkygi8kn1l8lp8m322-python3.11-nix-cuda-test-0.1.0.drv' failed with exit code 1;
       last 25 log lines:
       > Using pypaInstallPhase
       > Sourcing python-imports-check-hook.sh
       > Using pythonImportsCheckPhase
       > Sourcing python-namespaces-hook
       > Sourcing python-catch-conflicts-hook.sh
       > Running phase: unpackPhase
       > unpacking source archive /nix/store/hb7ifp4m6n79cfgpc7ipnwp7cam9x71w-source
       > source root is source
       > setting SOURCE_DATE_EPOCH to timestamp 315619200 of file source/pyproject.toml
       > Running phase: patchPhase
       > Running phase: updateAutotoolsGnuConfigScriptsPhase
       > Running phase: configurePhase
       > no configure script, doing nothing
       > Running phase: buildPhase
       > Executing pypaBuildPhase
       > Creating a wheel...
       > * Getting build dependencies for wheel...
       > * Building wheel...
       > Successfully built nix_cuda_test-0.1.0-py3-none-any.whl
       > Finished creating a wheel...
       > Finished executing pypaBuildPhase
       > Running phase: pythonRuntimeDepsCheckHook
       > Executing pythonRuntimeDepsCheck
       > Checking runtime dependencies for nix_cuda_test-0.1.0-py3-none-any.whl
       >   - torchvision>=0.15.0 not satisfied by version 0.18.0a0

20:19:05
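A likely explanation for the last log line, though the log alone does not confirm it: 0.18.0a0 is a pre-release, and PEP 440 version specifiers exclude pre-releases by default, so `>=0.15.0` is reported as unsatisfied even though 0.18 is numerically newer. The sketch below is a deliberately simplified, stdlib-only illustration of that rule. `parse` and `satisfies_minimum` are hypothetical names, the regex covers only plain release numbers with an optional a/b/rc suffix, and the minimum is assumed to be a final release. It is not the check the hook actually runs.

```python
from __future__ import annotations

import re

# Simplified version shape: N(.N)* with an optional pre-release suffix.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)((?:a|b|rc)\d+)?$")


def parse(version: str) -> tuple[tuple[int, ...], bool]:
    """Split a version into its release tuple and an is-pre-release flag."""
    m = _VERSION.match(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in m.group(1).split("."))
    return release, m.group(2) is not None


def satisfies_minimum(candidate: str, minimum: str,
                      allow_prereleases: bool = False) -> bool:
    """Approximate `candidate >= minimum` with default pre-release exclusion."""
    release, is_pre = parse(candidate)
    min_release, _ = parse(minimum)
    if is_pre and not allow_prereleases:
        return False  # pre-releases are rejected unless explicitly allowed
    width = max(len(release), len(min_release))
    release += (0,) * (width - len(release))
    min_release += (0,) * (width - len(min_release))
    if is_pre:
        # A pre-release sorts before the final release with the same number.
        return release > min_release
    return release >= min_release


if __name__ == "__main__":
    print(satisfies_minimum("0.18.0a0", "0.15.0"))
    print(satisfies_minimum("0.18.0a0", "0.15.0", allow_prereleases=True))
```

Under this reading the build fails on version metadata alone, independent of whether the torchvision in the closure would work at runtime.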
@evax:matrix.org evax I think I'm exactly in the situation described here: https://github.com/NixOS/nixpkgs/issues/282184 20:24:52
@evax:matrix.org evax and I went through pretty much the same steps 20:25:11
@evax:matrix.org evax I can get jaxlib-bin to work for me, but jaxlibWithCuda doesn't seem to ship cuda support 20:25:53
@evax:matrix.org evax wait, the fix probably was never backported in 23.11 20:32:27

Room Version: 9