!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

311 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
62 Servers



Sender Message Time
17 May 2024
@evax:matrix.org evax libcuda.so is under /usr/lib/wsl/lib 08:56:55
@evax:matrix.org evax some finding, under NixOS-WSL, with the option to use windows drivers, a wsl-lib package is created in the nix store linking the contents of /usr/lib/wsl/lib 09:11:06
@evax:matrix.org evax it's exposed under /run/opengl-driver/lib 09:11:57
@evax:matrix.org evax it might just be that jax is expecting cuda12 but the actual version in the system is cuda11 09:12:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @evax:matrix.org
some finding, under NixOS-WSL, with the option to use windows drivers, a wsl-lib package is created in the nix store linking the contents of /usr/lib/wsl/lib
Good, this sounds much safer than putting /usr/lib/wsl in LD_LIBRARY_PATH
09:12:40
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @evax:matrix.org
it might just be that jax is expecting cuda12 but the actual version in the system is cuda11
It links its cuda libraries directly, and the driver is likely compatible with both releases
09:13:13
@evax:matrix.org evax another finding: using jaxlibWithCuda (the Nix-compiled version), jax complains there's no CUDA-enabled jaxlib, while using jaxlib-bin there's an error message related to loading CUDA 09:14:53
@evax:matrix.org evax (I can't cut/paste/gist from that system, sorry) 09:16:40
@evax:matrix.org evax the jaxlib-bin error (with TF_CPP_MIN_LOG_LEVEL=0) is: external/tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used. 09:22:09
@evax:matrix.org evax I tried to LD_PRELOAD libcuda.so and it doesn't help 09:22:29
@evax:matrix.org evax with jaxlibWithCuda, the error is: An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu. 09:24:35
@evax:matrix.org evax torch finds the GPU with LD_LIBRARY_PATH pointing either to /usr/lib/wsl/lib or /run/opengl-driver/lib, but not without. For jaxWithCuda, none of these options work 09:46:50
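A quick way to check the driver-library side of this, independent of jax or torch, is to look for libcuda in the two directories mentioned above and try to load it directly. This is only a minimal sketch: the function names are hypothetical, and the `libcuda.so.1` file name is an assumption (only `libcuda.so` is named in the messages above). It says nothing about why jaxlib itself fails to pick up the driver.

```python
import ctypes
import os

# Directories taken from the messages above.
CANDIDATE_DIRS = ["/run/opengl-driver/lib", "/usr/lib/wsl/lib"]
# "libcuda.so" is from the chat; "libcuda.so.1" is an assumed alternative name.
LIB_NAMES = ["libcuda.so.1", "libcuda.so"]


def find_libcuda(dirs=CANDIDATE_DIRS, names=LIB_NAMES):
    """Return the first existing libcuda path under dirs, or None."""
    for directory in dirs:
        for name in names:
            path = os.path.join(directory, name)
            if os.path.exists(path):
                return path
    return None


def try_load(path):
    """Attempt to load the shared library; True on success, False on OSError."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    found = find_libcuda()
    if found is None:
        print("no libcuda found in", CANDIDATE_DIRS)
    else:
        print(found, "loads:", try_load(found))
```

If the library is found and loads here but jax still falls back to CPU, the problem is more likely in how jaxlib was built or how it searches for the driver than in the driver library's location.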
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Okay, tired of machines restarting
I just bought three different kits of RAM to replace the existing kits in my builders. And two 10GbE NICs to try to increase builder performance, since they're all networked together and the 2.5GbE on two of the machines was a bottleneck.
17:09:15
@connorbaker:matrix.org connor (burnt/out) (UTC-8) God I hate hardware 🫠 17:09:20
@glepage:matrix.org Gaétan Lepage How many systems do you have as builders? 20:27:05
@connorbaker:matrix.org connor (burnt/out) (UTC-8) I have three desktops I use as builders; I also pay for an aarch64-linux Hetzner server which I use for aarch64-linux builds for CI 23:13:40
18 May 2024
@glepage:matrix.org Gaétan Lepage Ok cool!
I am starting to think about building a workstation for nix builds.
Would you mind sharing the specs of your machines?
11:53:53
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Sure! Although keep in mind I've had a very difficult time managing consumer-grade hardware (especially given I use ASUS motherboards, and the stupid default voltage levels which trigger instability in games also trigger very hard-to-reproduce segfaults during Nix builds) 12:16:19
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
My main machine: https://pcpartpicker.com/user/connorbaker/saved/pxtbkL
A builder: https://pcpartpicker.com/user/connorbaker/saved/h6mvZL
A builder/storage: https://pcpartpicker.com/user/connorbaker/saved/Pyy7CJ
12:51:29
@connorbaker:matrix.org connor (burnt/out) (UTC-8) FWIW, it takes magma-cuda-static with the default set of capabilities ~19m30s to build on nixos-desktop and ~21m12s to build on nixos-build01 or nixos-ext. 12:52:26
@connorbaker:matrix.org connor (burnt/out) (UTC-8) However, I would strongly recommend writing a few scripts to provision an Azure instance instead. For example, Standard_HB120rs_v3 (https://learn.microsoft.com/en-us/azure/virtual-machines/hbv3-series) is available as a spot instance in US-East for just $0.36 an hour. Keep in mind that it has a 10Gb NIC in addition to two 1TB NVMe drives.
It's also server-grade hardware so no need to chase down segfaults caused by the motherboard melting your nice chips :)
12:54:53
@connorbaker:matrix.org connor (burnt/out) (UTC-8) I mean seriously, just in troubleshooting stability issues yesterday I got frustrated and got new RAM for all my machines. That was about $1000 -- that would have bought me ~2,777h of the HBv3 as a spot instance. 12:58:17
@ss:someonex.net SomeoneSerge (matrix works sometimes)
>>> magma_compute_hours = (19.5 / 60) * 24 * 2 # 24 hyper-threading cores
>>> 2777 / magma_compute_hours
178.0128205128205

AFAIU after about 180 magma builds Azure will have cost more than your RAM, and I think we build several magmas a day 🤔

18:47:59
19 May 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)

Correction: the i9-13900k has 32 logical cores in total, since some of its cores are hyper-threaded and others are not

>>> magma_compute_hours = (19.5 / 60) * 32 # 32 "cores"
>>> 2777 / magma_compute_hours
267.01923076923
01:36:25
@connorbaker:matrix.org connor (burnt/out) (UTC-8) However, that assumes it takes magma the same amount of time to build on an i9-13900k as it does on the HBv3 (it does not) 01:36:50
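For anyone wanting to rerun the numbers from the messages above, here is a small sketch that reproduces the arithmetic as it was written in the chat: about $1000 of RAM at $0.36 per spot hour, then dividing the resulting hours by the "compute hours" of one magma build for a given core count. The function names are hypothetical, and the sketch keeps the assumption already flagged above, namely that a build takes the same 19.5 minutes on both machines.

```python
def spot_hours(budget_usd, price_per_hour_usd):
    """Hours of spot instance time that a budget buys at a flat hourly price."""
    return budget_usd / price_per_hour_usd


def break_even_builds(hours, build_minutes, cores):
    """Number of builds after which the hours are used up, following the
    chat's formula: compute hours per build = build time in hours * cores."""
    compute_hours_per_build = (build_minutes / 60) * cores
    return hours / compute_hours_per_build


# $1000 of RAM versus $0.36/h for the HBv3 spot instance: roughly 2,777 hours.
print(spot_hours(1000, 0.36))

# First estimate in the chat: 24 * 2 = 48 "cores", roughly 178 builds.
print(break_even_builds(2777, 19.5, 24 * 2))

# Corrected estimate with 32 "cores", roughly 267 builds.
print(break_even_builds(2777, 19.5, 32))
```

Plugging in a measured HBv3 build time instead of 19.5 minutes would address the caveat above, but no such measurement appears in the log.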
@aidalgol:matrix.org aidalgol nvidia-smi is reporting 0% GPU usage even when I am running a game and I can hear my card's fans speed up. Is it reporting correctly for anyone else? 09:47:55
@aidalgol:matrix.org aidalgol It sounds exactly like this: https://forums.developer.nvidia.com/t/nvidia-smi-reporting-0-gpu-utilization/261878 09:48:51



Room Version: 9