
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



27 Aug 2024
@zimbatm:numtide.com Jonas Chevalier hexa (UTC+1): can you give us the new stats after you add that cache :) 16:23:18
@hexa:lossy.network hexa (UTC+1) sure 16:23:24
28 Aug 2024
@gmacon:matrix.org gmacon

I'm trying to build the Rust candle-kernels crate within Nix, and nvcc is complaining that gcc is too new. I have gcc 13.2.0 and nvcc version 12.2.140 from nixpkgs-24.05-darwin bb8bdb47b718645b2f198a6cf9dff98d967d0fd4.

  /nix/store/r45hzi56bzljzfvh6rgdnjbisy9pxqnj-cuda-merged-12.2/include/crt/host_config.h:143:2: error: #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
        |  ^~~~~
  thread 'main' panicked at /home/gmacon3/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:391:13:
  nvcc error while compiling "src/affine.cu":

  # CLI "nvcc" "--gpu-architecture=sm_90" "--ptx" "--default-stream" "per-thread" "--output-directory" "$PWD/target/debug/build/candle-kernels-809f3e0b9ee8b48d/out" "-Isrc" "-I/nix/store/r45hzi56bzljzfvh6rgdnjbisy9pxqnj-cuda-merged-12.2/include" "src/affine.cu" 

Have other folks seen this? What's the best approach to resolve this?

17:13:49
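
A minimal sketch of the stop-gap the error message itself suggests: passing -allow-unsupported-compiler to nvcc. This assumes nvcc picks up the NVCC_APPEND_FLAGS environment variable (available in recent CUDA releases) and, as the error text says, is use-at-your-own-risk; the cleaner fixes discussed below swap in a supported GCC instead. The shell below is illustrative, not taken from the thread.

  # Hypothetical dev shell: bypass nvcc's host-compiler version check.
  # NVCC_APPEND_FLAGS is read by nvcc itself; the flag is the one named
  # in the error above. An unsupported host compiler may still miscompile.
  { pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:
  pkgs.mkShell {
    packages = [
      pkgs.cudaPackages.cuda_nvcc
      pkgs.cargo
      pkgs.rustc
    ];
    NVCC_APPEND_FLAGS = "-allow-unsupported-compiler";
  }
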
@gmacon:matrix.org gmacon
In reply to @gmacon:matrix.org

I'm trying to build the Rust candle-kernels crate within Nix, and nvcc is complaining that gcc is too new. […]

It turns out that Crane (which is the library I'm using to handle the Rust build) supports a stdenv argument to override the compilers used for the Rust build, so setting it to an older GCC worked.
18:55:04
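
For reference, a sketch of that Crane override. Only the stdenv argument is confirmed by the thread; the other attribute names are illustrative of a typical crane setup.

  # Crane accepts a stdenv argument for the derivations it builds, so the
  # Rust build (and the host compiler nvcc sees) can use GCC 12, which
  # nvcc 12.2 still supports.
  craneLib.buildPackage {
    src = craneLib.cleanCargoSource ./.;
    stdenv = pkgs.gcc12Stdenv;

    nativeBuildInputs = [ pkgs.cudaPackages.cuda_nvcc ];
    buildInputs = [ pkgs.cudaPackages.cuda_cudart ];
  }
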
@ss:someonex.net SomeoneSerge (utc+3)

older gcc

Note that if you're building a shared library you're going to run into libc issues if you just use gcc12Stdenv. That's why we have cudaPackages.backendStdenv

21:48:12
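
A sketch of the backendStdenv approach for comparison; the package itself is made up for illustration.

  # cudaPackages.backendStdenv builds with a GCC that nvcc supports while
  # keeping the default glibc/libstdc++, avoiding the runtime libc mismatch
  # you can hit by switching the whole build to gcc12Stdenv.
  { cudaPackages }:

  cudaPackages.backendStdenv.mkDerivation {
    pname = "example-cuda-plugin";  # hypothetical package
    version = "0.0.1";
    src = ./.;

    nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
    buildInputs = [ cudaPackages.cuda_cudart ];
  }
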
29 Aug 2024
@gmacon:matrix.org gmacon
In reply to @ss:someonex.net

Note that if you're building a shared library you're going to run into libc issues if you just use gcc12Stdenv […]

Since this is a Rust project, I'm not building any shared libraries, but this is good to know. Thanks!
13:06:40
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @gmacon:matrix.org
Since this is a Rust project, I'm not building any shared libraries, but this is good to know. Thanks!
(also relevant if you're loading other shared libraries, e.g. as plugins)
13:10:42
@gmacon:matrix.org gmacon
In reply to @ss:someonex.net
(also relevant if you're loading other shared libraries, e.g. as plugins)
I went ahead and changed my derivations anyway, so I'm all set for everything :-)
13:12:20
@hexa:lossy.network hexa (UTC+1)
In reply to @zimbatm:numtide.com
hexa (UTC+1): can you give us the new stats after you add that cache :)
my infra runs on nixos-24.05 🙂
14:09:02
@zimbatm:numtide.com Jonas Chevalier right, we should probably also build 24.05. It shouldn't cost that much. 14:12:24
@hexa:lossy.network hexa (UTC+1) that would be super cool 14:12:37
3 Sep 2024
@hexa:lossy.network hexa (UTC+1) https://github.com/nix-community/infra/pull/1435 20:55:18
@hexa:lossy.network hexa (UTC+1) not sure how useful release-cuda.nix is on 24.05, maybe SomeoneSerge (UTC+3) can speak to that? 20:55:38
@hexa:lossy.network hexa (UTC+1) https://hydra.nix-community.org/jobset/nixpkgs/cuda-stable 21:35:16
4 Sep 2024
@connorbaker:matrix.org connor (he/him) (UTC-7) I’ll take a look at it later today as well 17:40:12
@connorbaker:matrix.org connor (he/him) (UTC-7) (Assuming I remember and my plumbing is fixed by then otherwise all bets are off) 17:40:28
@ss:someonex.net SomeoneSerge (utc+3) changed their display name from SomeoneSerge (UTC+3) to SomeoneSerge (nix.camp). 21:48:39
@hexa:lossy.network hexa (UTC+1) can you take care of the release-cuda backports? 22:46:43
@ss:someonex.net SomeoneSerge (utc+3) I'll add them to my tomorrow's agenda 22:47:16
@connorbaker:matrix.org connor (he/him) (UTC-7) I've got a PR to fix OpenCV's build for CUDA (and general cleanup) if that's of interest to anyone: https://github.com/NixOS/nixpkgs/pull/339619 22:51:10
@connorbaker:matrix.org connor (he/him) (UTC-7) Is it worth back-porting? I can't remember if CUDA 12.4 is in 24.05 22:51:30
@hexa:lossy.network hexa (UTC+1) only up to 12.3 22:56:15
7 Sep 2024
@adam:robins.wtf adamcstephens

hmm, ollama is failing for me on unstable

Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"

20:12:04
@ss:someonex.net SomeoneSerge (utc+3) Haven't checked it in a while but I remember the derivation had to use some very weird wrappers because they build some cuda programs at runtime/on the fly 21:33:06
8 Sep 2024
@ss:someonex.net SomeoneSerge (utc+3) Kevin Mittman Hi! Do you know how dcgm uses cuda and why it has to link several versions? 11:45:14
9 Sep 2024
@adam:robins.wtf adamcstephens
In reply to @adam:robins.wtf

hmm, ollama is failing for me on unstable […]

I was able to fix this by setting nixpkgs.config.cudaSupport, but it also took many hours of compiling.
01:44:25
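
In NixOS-module terms, the fix described above is roughly the following. The allowUnfree line is my assumption (the CUDA packages are unfree), and flipping cudaSupport rebuilds every package that honours the flag, which is why it took hours rather than coming from the binary cache.

  # Sketch of a NixOS configuration enabling CUDA-accelerated ollama.
  {
    nixpkgs.config = {
      allowUnfree = true;   # assumption: needed for the unfree CUDA packages
      cudaSupport = true;   # the option mentioned above; triggers large rebuilds
    };

    services.ollama.enable = true;
  }
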
@adam:robins.wtf adamcstephens 4h 18m on a 5900x to be exact 01:49:53
@ironbound:hackerspace.pl Dam 13:38:42
@adam:robins.wtf adamcstephens there may be a simpler/smaller way to accomplish it. ollama used to work without that config option, which impacts many packages 14:06:05
@connorbaker:matrix.org connor (he/him) (UTC-7) SomeoneSerge (nix.camp): at the top of https://github.com/NixOS/nixpkgs/pull/339619 I have a list of packages I found which have environments with mixed versions of CUDA packages. Any ideas on how best to test for cases where code loads arbitrary / incorrect versions of CUDA libraries? As an example, I’d hope OpenCV would load the CUDA libraries it was built with, and the other packages would load the CUDA libraries from their expressions (not the OpenCV one). 15:11:28


