| 29 Aug 2024 |
SomeoneSerge (back on matrix) | In reply to @gmacon:matrix.org Since this is a Rust project, I'm not building any shared libraries, but this is good to know. Thanks! (also relevant if you're loading other shared libraries, e.g. as plugins) | 13:10:42 |
@gmacon:matrix.org | In reply to @ss:someonex.net (also relevant if you're loading other shared libraries, e.g. as plugins) I went ahead and changed my derivations anyway, so I'm all set for everything :-) | 13:12:20 |
hexa (UTC+1) | In reply to @zimbatm:numtide.com hexa (UTC+1): can you give us the new stats after you add that cache :) my infra runs on nixos-24.05 🙂 | 14:09:02 |
Jonas Chevalier | right, we should probably also build 24.05. It shouldn't cost that much. | 14:12:24 |
hexa (UTC+1) | that would be super cool | 14:12:37 |
| 3 Sep 2024 |
hexa (UTC+1) | https://github.com/nix-community/infra/pull/1435 | 20:55:18 |
hexa (UTC+1) | not sure how useful release-cuda.nix is on 24.05, maybe SomeoneSerge (UTC+3) can speak to that? | 20:55:38 |
hexa (UTC+1) | https://hydra.nix-community.org/jobset/nixpkgs/cuda-stable | 21:35:16 |
| 4 Sep 2024 |
connor (burnt/out) (UTC-8) | I’ll take a look at it later today as well | 17:40:12 |
connor (burnt/out) (UTC-8) | (Assuming I remember and my plumbing is fixed by then otherwise all bets are off) | 17:40:28 |
| SomeoneSerge (back on matrix) changed their display name from SomeoneSerge (UTC+3) to SomeoneSerge (nix.camp). | 21:48:39 |
hexa (UTC+1) | can you take care of the release-cuda backports? | 22:46:43 |
SomeoneSerge (back on matrix) | I'll add them to tomorrow's agenda | 22:47:16 |
connor (burnt/out) (UTC-8) | I've got a PR to fix OpenCV's build for CUDA (and general cleanup) if that's of interest to anyone: https://github.com/NixOS/nixpkgs/pull/339619 | 22:51:10 |
connor (burnt/out) (UTC-8) | Is it worth back-porting? I can't remember if CUDA 12.4 is in 24.05 | 22:51:30 |
hexa (UTC+1) | only up to 12.3 | 22:56:15 |
| 7 Sep 2024 |
@adam:robins.wtf | hmm, ollama is failing for me on unstable
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"
| 20:12:04 |
SomeoneSerge (back on matrix) | Haven't checked it in a while, but I remember the derivation had to use some very weird wrappers because they build some CUDA programs at runtime/on the fly | 21:33:06 |
| 8 Sep 2024 |
SomeoneSerge (back on matrix) | Kevin Mittman Hi! Do you know how dcgm uses cuda and why it has to link several versions? | 11:45:14 |
| 9 Sep 2024 |
@adam:robins.wtf | In reply to @adam:robins.wtf
hmm, ollama is failing for me on unstable
I was able to fix this by setting nixpkgs.config.cudaSupport, but it also took many hours of compiling. | 01:44:25 |
@adam:robins.wtf | 4h 18m on a 5900x to be exact | 01:49:53 |
@ironbound:hackerspace.pl | Damn | 13:38:42 |
@adam:robins.wtf | there may be a simpler/smaller way to accomplish it. ollama used to work without that config option, which impacts many packages | 14:06:05 |
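For reference, the fix adam describes, and a narrower alternative, look roughly like this in a NixOS configuration. This is a sketch only: `nixpkgs.config.cudaSupport` is the global knob he set, while `services.ollama.acceleration` is the per-service option recent nixpkgs releases expose; check your channel before relying on it.

```nix
{ ... }:
{
  # The global knob: every package that honours config.cudaSupport is rebuilt
  # with CUDA enabled, hence the hours of compiling mentioned above.
  # nixpkgs.config.cudaSupport = true;

  # A narrower alternative: build only ollama with CUDA acceleration.
  services.ollama = {
    enable = true;
    acceleration = "cuda";
  };
}
```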
connor (burnt/out) (UTC-8) | SomeoneSerge (nix.camp): at the top of https://github.com/NixOS/nixpkgs/pull/339619 I have a list of packages I found which have environments with mixed versions of CUDA packages. Any ideas on how best to test for cases where code loads arbitrary / incorrect versions of CUDA libraries? As an example, I’d hope OpenCV would load the CUDA libraries it was built with, and the other packages would load the CUDA libraries from their expressions (not the OpenCV one). | 15:11:28 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org SomeoneSerge (nix.camp): at the top of https://github.com/NixOS/nixpkgs/pull/339619 I have a list of packages I found which have environments with mixed versions of CUDA packages. Any ideas on how best to test for cases where code loads arbitrary / incorrect versions of CUDA libraries? As an example, I’d hope OpenCV would load the CUDA libraries it was built with, and the other packages would load the CUDA libraries from their expressions (not the OpenCV one). Off the top of my head, I'd say we prepare a function under pkgs.testers for running an arbitrary command with LD_DEBUG=libs, parsing its outputs, and running asserts of the form:
- soname was/was not searched for
- soname was/was not loaded
- soname was/was not loaded from a path matching a pattern
| 17:27:04 |
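A rough sketch of the tester proposed above (hypothetical: no such helper exists under pkgs.testers, the attribute names are made up, and the exact LD_DEBUG trace lines vary between glibc versions):

```nix
{ lib, runCommand }:

# Usage: checkLoadedLibs { name, command, mustLoad, mustNotLoad }
{ name, command, mustLoad ? [ ], mustNotLoad ? [ ] }:

runCommand "ld-debug-check-${name}" { } ''
  # Capture the dynamic loader's trace. The command itself may fail (no GPU in
  # the sandbox); we only care about which sonames the loader resolved.
  LD_DEBUG=libs ${command} 2> trace || true

  ${lib.concatMapStringsSep "\n" (so: ''
    grep -E 'calling init: .*${so}' trace \
      || { echo "expected ${so} to be loaded"; exit 1; }
  '') mustLoad}

  ${lib.concatMapStringsSep "\n" (so: ''
    if grep -E 'calling init: .*${so}' trace; then
      echo "did not expect ${so} to be loaded"; exit 1
    fi
  '') mustNotLoad}

  touch $out
''
```

The third kind of assert ("loaded from a path matching a pattern") would just tighten the regex to include the expected store path prefix.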
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org As an example, I'd hope OpenCV would load the CUDA libraries it was built with, and the other packages would load the CUDA libraries from their expressions (not the OpenCV one). In most of our practical cases (native extensions in Python), it's "who gets there first". | 17:27:52 |
connor (burnt/out) (UTC-8) | So... PyTorch does something similar, in that packages with extensions are supposed to use the same version of CUDA libraries the PyTorch package was built with (and so we have cudaPackages in torch.passthru for downstream consumers).
Assuming OpenCV, PyTorch, and the other packages using CUDA libraries don't play nicely with each other, are there any solutions you can think of? The best I can imagine is some sort of logic to compute the maximum supported version of cudaPackages, or else making package consumers responsible for handling that themselves. Outside of that, unless there's a way to ensure each package has a unique namespace for the libraries it looks for, so they can co-exist at runtime, I don't know how to resolve such an issue.
| 17:54:42 |
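A sketch of the passthru pattern connor describes: a downstream extension takes its CUDA dependencies from the package set torch itself was built with, so the versions cannot drift apart. The extension name, hash, and dependency list below are purely illustrative:

```nix
{ lib, buildPythonPackage, fetchPypi, torch }:

buildPythonPackage rec {
  pname = "some-torch-extension";  # hypothetical downstream package
  version = "0.1.0";
  src = fetchPypi {
    inherit pname version;
    hash = lib.fakeHash;  # placeholder
  };

  # Reuse the CUDA package set torch was built with instead of the top-level
  # cudaPackages, so the extension and torch agree on library versions.
  buildInputs = with torch.cudaPackages; [ cuda_cudart libcublas cudnn ];
  propagatedBuildInputs = [ torch ];
}
```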
SomeoneSerge (back on matrix) | Hypothesis: it should probably be OK to build with one version of cudart and execute with a newer one; otherwise all other distributions would have been permanently broken. So we should try to do the same thing that we should start doing wrt libc: build against a "compatible" version, but exclude it from the closure in favour of linking the newest in the package set | 18:39:26 |
SomeoneSerge (back on matrix) | Implicit evidence: at the end of the day users do put packages like tensorflow and pytorch and opencv into the same fixpoints (venvs), so it doesn't matter that our fixpoint is larger because the corner cases are the same | 18:42:47 |
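The hypothesis leans on the fact that cudart's soname is only major-versioned (libcudart.so.12), so the dynamic linker will accept any 12.x runtime newer than the one a package was built against. A hypothetical check along those lines (the helper and its arguments are illustrative, not part of nixpkgs):

```nix
# checkCudartSoname: pass a derivation and the path of a dynamically linked
# CUDA binary inside it, e.g. "bin/my-cuda-tool".
{ runCommand, patchelf }:

drv: exePath:

runCommand "check-cudart-soname" { nativeBuildInputs = [ patchelf ]; } ''
  # The binary should record only the major-versioned soname, which is what
  # lets a newer 12.x cudart satisfy it at run time.
  patchelf --print-needed "${drv}/${exePath}" | grep -qx 'libcudart\.so\.12'
  touch $out
''
```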
connor (burnt/out) (UTC-8) | Working on rebasing https://github.com/NixOS/nixpkgs/pull/306172 and... wow, NVIDIA just keeps pushing out updates, don't they? CUDA 12.6.1? TensorRT 10.4?? I really gotta clean up the update scripts | 21:26:08 |