| 9 Sep 2024 |
connor (burnt/out) (UTC-8) | Also, I need to split up how I handle fetching packages, to avoid a single massive derivation containing every tarball NVIDIA publishes:
$ nix path-info -Sh --impure .#cuda-redist-index
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
/nix/store/412899ispzymkv5fgvav37j7v6sk5i7m-mk-index-of-package-info 610.2 GiB
| 21:28:31 |
| 10 Sep 2024 |
connor (burnt/out) (UTC-8) | So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619. OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.) | 23:05:06 |
connor (burnt/out) (UTC-8) | In reply to @ss:someonex.net Hypothesis: it should be probably ok to build with one version of cudart and execute with a newer, otherwise all other distributions would have been permanently broken. So we should try to do the same thing that we should start doing wrt libc: build against a "compatible" version, but exclude it from the closure in favour of linking the newest in the package set Wouldn't things like API changes between versions cause breakage? EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!). | 23:31:05 |
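A minimal sketch of the build-vs-runtime version question being discussed, assuming libcudart.so.12 is resolvable on the loader path (e.g. from cudaPackages' cuda_cudart); this is illustrative only, not an existing nixpkgs test:

# Report which CUDA runtime this process actually loads and what the installed
# driver supports; a runtime newer than the driver's ceiling fails at run time.
import ctypes

cudart = ctypes.CDLL("libcudart.so.12")
runtime_ver = ctypes.c_int(0)
driver_ver = ctypes.c_int(0)
# cudaRuntimeGetVersion/cudaDriverGetVersion return 0 (cudaSuccess) on success
assert cudart.cudaRuntimeGetVersion(ctypes.byref(runtime_ver)) == 0
assert cudart.cudaDriverGetVersion(ctypes.byref(driver_ver)) == 0
print("runtime:", runtime_ver.value, "driver supports up to:", driver_ver.value)  # e.g. 12040 = 12.4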
@adam:robins.wtf | In reply to @adam:robins.wtf
hmm, ollama is failing for me on unstable
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"
I just spent some time looking into this again, and it appears the issue is cudaPackages. When trying the larger config.cudaSupport change, I had to downgrade cudaPackages to 12.3 to build successfully. Leaving this downgrade in place allows ollama to work even without using config.cudaSupport | 23:41:28 |
connor (burnt/out) (UTC-8) | Any idea if it's just CUDA 12.4, or if it also had to do with the version bump https://github.com/NixOS/nixpkgs/pull/331585? | 23:45:24 |
connor (burnt/out) (UTC-8) | Although it looks like they didn't add CUDA 12 support until 0.3.7 (https://github.com/ollama/ollama/releases/tag/v0.3.7) | 23:45:58 |
connor (burnt/out) (UTC-8) | What driver version are you using? | 23:47:24 |
@adam:robins.wtf | i can try and downgrade ollama and see | 23:47:27 |
connor (burnt/out) (UTC-8) | Can you try upgrading it as well? Looks like 0.3.10 is out now | 23:47:59 |
@adam:robins.wtf | 560.35.03 | 23:48:19 |
@adam:robins.wtf | In reply to @connorbaker:matrix.org Can you try upgrading it as well? Looks like 0.3.10 is out now Yeah, I'll try that first | 23:48:32 |
connor (burnt/out) (UTC-8) | Is this a NixOS system, and what GPU? | 23:50:19 |
@adam:robins.wtf | Yes, NixOS. 3060 | 23:52:13 |
@adam:robins.wtf | 06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1) | 23:52:25 |
| 11 Sep 2024 |
@adam:robins.wtf | results of my ollama testing are:
0.3.5 - works with cudaPackages 12_3 and 12_4
0.3.9 - works on 12_3, broken on 12_4
0.3.10 - works on 12_3, broken on 12_4 | 01:13:46 |
connor (burnt/out) (UTC-8) | It is surprising to me that 0.3.5 works with CUDA 12 at all; I guess there were no breaking API changes on stuff they relied on? | 18:05:26 |
| 12 Sep 2024 |
connor (burnt/out) (UTC-8) | In reply to @connorbaker:matrix.org So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619. OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.) I started writing a pkgs.testers implementation for what Serge suggested here: https://matrix.to/#/!eWOErHSaiddIbsUNsJ:nixos.org/$phSCjT-mxTap-ccF98Z7hZakHk3_-jjkPw2fIvzBhjA?via=nixos.org&via=matrix.org&via=nixos.dev | 00:32:04 |
connor (burnt/out) (UTC-8) | SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix? | 23:30:01 |
connor (burnt/out) (UTC-8) | I'm in the process of implementing a tester (https://github.com/NixOS/nixpkgs/pull/341471) but it's taking a bit and I'd like OpenCV fixed (or at least buildable) with CUDA, without breaking a bunch of downstream consumers of OpenCV (like FFMPEG) | 23:35:35 |
| 13 Sep 2024 |
SomeoneSerge (back on matrix) | Sorry, my availability has been limited this week | 10:19:55 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org wouldn't things like API changes between versions cause breakage? EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!). Yeah, it occurred to me right after posting that, for the issue you're actually describing, we need very different tests. What I proposed was basically ensuring that the expected versions of dependencies are loaded when running in isolation. What you actually wanted to ensure is that, when a different version has already been loaded (which is guaranteed to happen with Python), the runtime still works | 10:22:00 |
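A rough sketch of that second scenario (a different copy already loaded in-process), assuming Linux and reading /proc/self/maps; the library names and import order are illustrative:

import re

def loaded_cuda_libs():
    # Unique file-backed mappings in this process, filtered to CUDA libraries.
    with open("/proc/self/maps") as maps:
        paths = {line.split()[-1] for line in maps if "/" in line}
    return sorted(p for p in paths if re.search(r"libcu(dart|blas|dnn)", p))

# Imports are deliberately placed here: the point is to see what each one loads.
import torch  # loads the CUDA runtime torch was built against
import cv2    # may pull in a second copy if the two closures diverge

print("\n".join(loaded_cuda_libs()))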
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix? Sure, let's try. I'd still check something trivial like
# test1
import torch
torch.randn(10, 10, device="cuda").sum().item()
import cv2
# do something with cv2 and cuda
# test2
import cv2
# do something with cv2 and cuda
import torch
torch.randn(10, 10, device="cuda").sum().item()
| 10:24:51 |
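One possible way to fill in the "do something with cv2 and cuda" placeholder above, assuming OpenCV was built with its CUDA modules (cv2.cuda_GpuMat, cv2.cuda.add); the exact check that ends up in the tester may differ:

import numpy as np
import torch
import cv2

# test1, made concrete: torch touches the GPU first, then OpenCV does a
# trivial CUDA round-trip; test2 would simply reverse the order of use.
torch.randn(10, 10, device="cuda").sum().item()
gpu = cv2.cuda_GpuMat()
gpu.upload(np.ones((10, 10), dtype=np.float32))
result = cv2.cuda.add(gpu, gpu).download()  # upload, add on the GPU, download
assert (result == 2).all()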