
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

9 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8) *

Also do something to split up how I handle fetching packages to avoid a single massive derivation with every tarball NVIDIA has:

$ nix path-info -Sh --impure .#cuda-redist-index
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
/nix/store/412899ispzymkv5fgvav37j7v6sk5i7m-mk-index-of-package-info	 610.2 GiB
21:28:31
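
For illustration, a rough sketch of what "splitting up the fetching" could look like, with hypothetical names and manifest layout (this is not the actual cuda-redist-index code): one fixed-output fetchurl per tarball, so each download is fetched and cached as its own small derivation instead of inside a single 600+ GiB one.

{ lib, fetchurl }:
let
  # Assumed manifest layout: { "<tarball name>" = { url = "..."; hash = "sha256-..."; }; ... }
  manifest = builtins.fromJSON (builtins.readFile ./manifest.json);
in
# One derivation per tarball; failures and re-downloads stay per-tarball.
lib.mapAttrs (name: entry: fetchurl { inherit (entry) url hash; }) manifest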
10 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619.
OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.)
23:05:06
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
In reply to @ss:someonex.net
Hypothesis: it should be probably ok to build with one version of cudart and execute with a newer, otherwise all other distributions would have been permanently broken. So we should try to do the same thing that we should start doing wrt libc: build against a "compatible" version, but exclude it from the closure in favour of linking the newest in the package set
* wouldn't things like API changes between versions cause breakage?
EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!).
23:31:05
@adam:robins.wtf
In reply to @adam:robins.wtf

hmm, ollama is failing for me on unstable

Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"

* I just spent some time looking into this again, and it appears the issue is cudaPackages. When trying the larger config.cudaSupport change I had to downgrade cudaPackages to 12.3 to successfully build. Leaving this downgrade in place allows ollama to work even without using config.cudaSupport.
23:41:28
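
For reference, a minimal sketch of the workaround described above, assuming a NixOS configuration module and the nixpkgs attribute names of the time (cudaPackages_12_3); it pins the default CUDA package set back to 12.3 while keeping cudaSupport enabled.

{
  nixpkgs.config.cudaSupport = true;
  # Overlay the default cudaPackages with 12.3 so anything built against
  # cudaPackages (including ollama's CUDA runner) links the older cudart.
  nixpkgs.overlays = [
    (final: prev: {
      cudaPackages = final.cudaPackages_12_3;
    })
  ];
}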
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Any idea if it's just CUDA 12.4, or if it also had to do with the version bump https://github.com/NixOS/nixpkgs/pull/331585?
23:45:24
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Although it looks like they didn't add CUDA 12 support until 0.3.7 (https://github.com/ollama/ollama/releases/tag/v0.3.7)
23:45:58
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
What driver version are you using?
23:47:24
@adam:robins.wtf
i can try and downgrade ollama and see
23:47:27
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Can you try upgrading it as well? Looks like 0.3.10 is out now
23:47:59
@adam:robins.wtf
560.35.03
23:48:19
@adam:robins.wtf
In reply to @connorbaker:matrix.org
Can you try upgrading it as well? Looks like 0.3.10 is out now
yeah i'll try that first
23:48:32
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Is this a NixOS system, and what GPU?
23:50:19
@adam:robins.wtf
yes, NixOS. 6700XT
23:51:52
@adam:robins.wtf
* yes, NixOS. 3060Ti
23:52:05
@adam:robins.wtf
* yes, NixOS. 3060
23:52:13
@adam:robins.wtf
06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
23:52:25
11 Sep 2024
@adam:robins.wtf
results of my ollama testing are:
0.3.5 - works with cudaPackages 12_3 and 12_4
0.3.9 - works on 12_3, broken on 12_4
0.3.10 - works on 12_3, broken on 12_4
01:13:46
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
It is surprising to me that 0.3.5 works with CUDA 12 at all; I guess there were no breaking API changes on stuff they relied on?
18:05:26
12 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
In reply to @connorbaker:matrix.org
So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619.
OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.)
I started writing a pkgs.testers implementation for what Serge suggested here: https://matrix.to/#/!eWOErHSaiddIbsUNsJ:nixos.org/$phSCjT-mxTap-ccF98Z7hZakHk3_-jjkPw2fIvzBhjA?via=nixos.org&via=matrix.org&via=nixos.dev
00:32:04
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix?
23:30:01
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
I'm in the process of implementing a tester (https://github.com/NixOS/nixpkgs/pull/341471) but it's taking a bit and I'd like OpenCV fixed (or at least buildable) with CUDA, without breaking a bunch of downstream consumers of OpenCV (like FFMPEG)
23:35:35
13 Sep 2024
@ss:someonex.net SomeoneSerge (back on matrix)
* Sorry my availability has been limited this week
10:19:55
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
wouldn't things like API changes between versions cause breakage?
EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!).
Yeah, it occurred to me right after posting that, for the issue you're actually describing, we need very different tests. What I proposed was basically ensuring that the expected versions of dependencies are loaded when running in isolation. What you actually wanted to ensure is that, when a different version has already been loaded (which is guaranteed to happen with Python), the runtime still works.
10:22:00
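
A loose sketch of the "running in isolation" check (hypothetical helper and program names, not the pkgs.testers work in progress): use ldd to confirm that a built binary resolves libcudart from the cudaPackages set it was built against.

{ runCommand, glibc, cudaPackages, someCudaProgram }:
runCommand "check-resolved-cudart" { } ''
  # Show how the dynamic linker resolves the program's dependencies.
  ${glibc.bin}/bin/ldd ${someCudaProgram}/bin/some-binary | tee ldd.log
  # Fail unless libcudart resolves into the expected cuda_cudart lib output
  # (grep exits non-zero on no match, aborting the build).
  grep "libcudart.so.12 => ${cudaPackages.cuda_cudart.lib}/lib/" ldd.log
  touch $out
''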
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix?

Sure, let's try. I'd still check something trivial like

# test1
import torch
torch.randn(10, 10, device="cuda").sum().item()
import cv2
# do something with cv2 and cuda

# test2

import cv2
# do something with cv2 and cuda
import torch
torch.randn(10, 10, device="cuda").sum().item()
10:24:51
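
A rough, untested sketch of those two checks as one derivation. Assumptions (not from the thread): python3Packages' torch and opencv4 are built with CUDA support, the builder exposes an NVIDIA GPU via the "cuda" system feature, and cv2.cuda.getCudaEnabledDeviceCount() stands in for "do something with cv2 and cuda". This is not the pkgs.testers implementation from the PR.

{ runCommand, python3 }:
let
  py = python3.withPackages (ps: [ ps.torch ps.opencv4 ]);
in
runCommand "opencv-torch-import-order-test" {
  nativeBuildInputs = [ py ];
  requiredSystemFeatures = [ "cuda" ];
} ''
  # test1: torch initializes CUDA first, then cv2 is imported and used.
  python3 -c '
  import torch
  torch.randn(10, 10, device="cuda").sum().item()
  import cv2
  print(cv2.cuda.getCudaEnabledDeviceCount())
  '
  # test2: cv2 first, then torch.
  python3 -c '
  import cv2
  print(cv2.cuda.getCudaEnabledDeviceCount())
  import torch
  torch.randn(10, 10, device="cuda").sum().item()
  '
  touch $out
''

Both import orders exercise CUDA, so a cudart mismatch between torch and cv2 should surface as a failure in at least one of the two.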
