
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

9 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8) *

Also do something to split up how I handle fetching packages to avoid a single massive derivation with every tarball NVIDIA has:

$ nix path-info -Sh --impure .#cuda-redist-index
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
/nix/store/412899ispzymkv5fgvav37j7v6sk5i7m-mk-index-of-package-info	 610.2 GiB
21:28:31
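
For illustration, a rough sketch of what "splitting up the fetching" could look like, with hypothetical names and manifest layout (this is not the actual cuda-redist-index code): one fixed-output fetchurl per tarball, so each download is fetched and cached as its own small derivation instead of inside a single 600+ GiB one.

{ lib, fetchurl }:
let
  # Assumed manifest layout: { "<tarball name>" = { url = "..."; hash = "sha256-..."; }; ... }
  manifest = builtins.fromJSON (builtins.readFile ./manifest.json);
in
# One derivation per tarball; failures and re-downloads stay per-tarball.
lib.mapAttrs (name: entry: fetchurl { inherit (entry) url hash; }) manifest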
10 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619.
OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.)
23:05:06
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
In reply to @ss:someonex.net
Hypothesis: it should be probably ok to build with one version of cudart and execute with a newer, otherwise all other distributions would have been permanently broken. So we should try to do the same thing that we should start doing wrt libc: build against a "compatible" version, but exclude it from the closure in favour of linking the newest in the package set
* wouldn't things like API changes between versions cause breakage?
EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!).
23:31:05
@adam:robins.wtf
In reply to @adam:robins.wtf

hmm, ollama is failing for me on unstable

Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"

* I just spent some time looking into this again, and it appears the issue is cudaPackages. When trying the larger config.cudaSupport change I had to downgrade cudaPackages to 12.3 to successfully build. Leaving this downgrade in place allows ollama to work even without using config.cudaSupport.
23:41:28
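
For reference, a minimal sketch of the workaround described above, assuming a NixOS configuration module and the nixpkgs attribute names of the time (cudaPackages_12_3); it pins the default CUDA package set back to 12.3 while keeping cudaSupport enabled.

{
  nixpkgs.config.cudaSupport = true;
  # Overlay the default cudaPackages with 12.3 so anything built against
  # cudaPackages (including ollama's CUDA runner) links the older cudart.
  nixpkgs.overlays = [
    (final: prev: {
      cudaPackages = final.cudaPackages_12_3;
    })
  ];
}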
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Any idea if it's just CUDA 12.4, or if it also had to do with the version bump https://github.com/NixOS/nixpkgs/pull/331585?
23:45:24
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Although it looks like they didn't add CUDA 12 support until 0.3.7 (https://github.com/ollama/ollama/releases/tag/v0.3.7)
23:45:58
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
What driver version are you using?
23:47:24
@adam:robins.wtf
i can try and downgrade ollama and see
23:47:27
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Can you try upgrading it as well? Looks like 0.3.10 is out now
23:47:59
@adam:robins.wtf
560.35.03
23:48:19
@adam:robins.wtf
In reply to @connorbaker:matrix.org
Can you try upgrading it as well? Looks like 0.3.10 is out now
yeah i'll try that first
23:48:32
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
Is this a NixOS system, and what GPU?
23:50:19
@adam:robins.wtf
yes, NixOS. 6700XT
23:51:52
@adam:robins.wtf
* yes, NixOS. 3060Ti
23:52:05
@adam:robins.wtf
* yes, NixOS. 3060
23:52:13
@adam:robins.wtf
06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1)
23:52:25
11 Sep 2024
@adam:robins.wtf
results of my ollama testing are:
0.3.5 - works with cudaPackages 12_3 and 12_4
0.3.9 - works on 12_3, broken on 12_4
0.3.10 - works on 12_3, broken on 12_4
01:13:46
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
It is surprising to me that 0.3.5 works with CUDA 12 at all; I guess there were no breaking API changes on stuff they relied on?
18:05:26
12 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
In reply to @connorbaker:matrix.org
So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619.
OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.)
I started writing a pkgs.testers implementation for what Serge suggested here: https://matrix.to/#/!eWOErHSaiddIbsUNsJ:nixos.org/$phSCjT-mxTap-ccF98Z7hZakHk3_-jjkPw2fIvzBhjA?via=nixos.org&via=matrix.org&via=nixos.dev
00:32:04
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix?
23:30:01
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
I'm in the process of implementing a tester (https://github.com/NixOS/nixpkgs/pull/341471) but it's taking a bit and I'd like OpenCV fixed (or at least buildable) with CUDA, without breaking a bunch of downstream consumers of OpenCV (like FFMPEG)
23:35:35
13 Sep 2024
@ss:someonex.net SomeoneSerge (back on matrix)
* Sorry my availability has been limited this week
10:19:55
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
wouldn't things like API changes between versions cause breakage?
EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!).
Yeah, it occurred to me right after posting that, for the issue you're actually describing, we need very different tests. What I proposed was basically ensuring that the expected versions of dependencies are loaded when running in isolation. What you actually wanted to ensure is that, when a different version has already been loaded (which is guaranteed to happen with Python), the runtime still works.
10:22:00
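
A loose sketch of the "running in isolation" check (hypothetical helper and program names, not the pkgs.testers work in progress): use ldd to confirm that a built binary resolves libcudart from the cudaPackages set it was built against.

{ runCommand, glibc, cudaPackages, someCudaProgram }:
runCommand "check-resolved-cudart" { } ''
  # Show how the dynamic linker resolves the program's dependencies.
  ${glibc.bin}/bin/ldd ${someCudaProgram}/bin/some-binary | tee ldd.log
  # Fail unless libcudart resolves into the expected cuda_cudart lib output
  # (grep exits non-zero on no match, aborting the build).
  grep "libcudart.so.12 => ${cudaPackages.cuda_cudart.lib}/lib/" ldd.log
  touch $out
''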
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @connorbaker:matrix.org
SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix?

Sure, let's try. I'd still check something trivial like

# test1
import torch
torch.randn(10, 10, device="cuda").sum().item()
import cv2
# do something with cv2 and cuda

# test2

import cv2
# do something with cv2 and cuda
import torch
torch.randn(10, 10, device="cuda").sum().item()
10:24:51
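
A rough, untested sketch of those two checks as one derivation. Assumptions (not from the thread): python3Packages' torch and opencv4 are built with CUDA support, the builder exposes an NVIDIA GPU via the "cuda" system feature, and cv2.cuda.getCudaEnabledDeviceCount() stands in for "do something with cv2 and cuda". This is not the pkgs.testers implementation from the PR.

{ runCommand, python3 }:
let
  py = python3.withPackages (ps: [ ps.torch ps.opencv4 ]);
in
runCommand "opencv-torch-import-order-test" {
  nativeBuildInputs = [ py ];
  requiredSystemFeatures = [ "cuda" ];
} ''
  # test1: torch initializes CUDA first, then cv2 is imported and used.
  python3 -c '
  import torch
  torch.randn(10, 10, device="cuda").sum().item()
  import cv2
  print(cv2.cuda.getCudaEnabledDeviceCount())
  '
  # test2: cv2 first, then torch.
  python3 -c '
  import cv2
  print(cv2.cuda.getCudaEnabledDeviceCount())
  import torch
  torch.randn(10, 10, device="cuda").sum().item()
  '
  touch $out
''

Both import orders exercise CUDA, so a cudart mismatch between torch and cv2 should surface as a failure in at least one of the two.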
