
NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



27 Aug 2024
Jonas Chevalier (@zimbatm:numtide.com)
In reply to @hexa:lossy.network
is nix-community's jobset going to stay a ci essentially?
what else are you looking for, testing on PRs?
16:20:38
hexa (@hexa:lossy.network)
a binary cache
16:20:58
hexa (@hexa:lossy.network)
[image: image.png]
16:21:41
hexa (@hexa:lossy.network)
the rebuild of my homeserver on my homeserver (5600X) can't take all night and half the morning 🙂
16:21:54
Jonas Chevalier (@zimbatm:numtide.com)
I should have said that Hydra.nix-community.org publishes to nix-community.cachix.org 🙃
16:22:00
hexa (@hexa:lossy.network)
while all other machines take less than 10 minutes
16:22:01
SomeoneSerge (back on matrix) (@ss:someonex.net)
OH I thought that was implied
16:22:40
Jonas Chevalier (@zimbatm:numtide.com)
hexa (UTC+1): can you give us the new stats after you add that cache :)
16:23:18
hexa (@hexa:lossy.network)
sure
16:23:24
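
For context, pointing a NixOS machine at that cache is a small settings change. A minimal sketch; the public key shown is the one cachix publishes for nix-community, so verify it at https://app.cachix.org/cache/nix-community before trusting it:

  {
    nix.settings = {
      substituters = [ "https://nix-community.cachix.org" ];
      trusted-public-keys = [
        "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="
      ];
    };
  }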
28 Aug 2024
@gmacon:matrix.org

I'm trying to build the Rust candle-kernels crate within Nix, and nvcc is complaining that gcc is too new. I have gcc 13.2.0 and nvcc version 12.2.140 from nixpkgs-24.05-darwin bb8bdb47b718645b2f198a6cf9dff98d967d0fd4.

  /nix/store/r45hzi56bzljzfvh6rgdnjbisy9pxqnj-cuda-merged-12.2/include/crt/host_config.h:143:2: error: #error -- unsupported GNU version! gcc versions later than 12 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
        |  ^~~~~
  thread 'main' panicked at /home/gmacon3/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:391:13:
  nvcc error while compiling "src/affine.cu":

  # CLI "nvcc" "--gpu-architecture=sm_90" "--ptx" "--default-stream" "per-thread" "--output-directory" "$PWD/target/debug/build/candle-kernels-809f3e0b9ee8b48d/out" "-Isrc" "-I/nix/store/r45hzi56bzljzfvh6rgdnjbisy9pxqnj-cuda-merged-12.2/include" "src/affine.cu" 

Have other folks seen this? What's the best approach to resolve this?

17:13:49
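
As the error text itself notes, nvcc has an escape hatch for the host-compiler version check. A hedged sketch of passing it through a dev shell via NVCC_APPEND_FLAGS, an environment variable nvcc itself reads and appends to every invocation (use at your own risk, per the error message):

  { pkgs ? import <nixpkgs> { config = { allowUnfree = true; cudaSupport = true; }; } }:

  pkgs.mkShell {
    packages = [ pkgs.cudaPackages.cuda_nvcc ];
    # Skips nvcc's host-compiler version check; an unsupported host compiler
    # may miscompile, as the error message warns.
    NVCC_APPEND_FLAGS = "-allow-unsupported-compiler";
  }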
@gmacon:matrix.org
In reply to @gmacon:matrix.org

I'm trying to build the Rust candle-kernels crate within Nix, and nvcc is complaining that gcc is too new. […]

It turns out that Crane (the library I'm using to handle the Rust build) supports a stdenv argument that overrides the compilers used for the build, so setting it to an older GCC worked.
18:55:04
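
A minimal sketch of that Crane override, assuming craneLib.buildPackage and nixpkgs' gcc12Stdenv (everything beyond the stdenv argument is illustrative):

  { pkgs, craneLib }:

  craneLib.buildPackage {
    src = craneLib.cleanCargoSource ./.;
    # Build with GCC 12 so nvcc's host-compiler check passes when the
    # candle-kernels build script shells out to nvcc.
    stdenv = pkgs.gcc12Stdenv;
    nativeBuildInputs = [ pkgs.cudaPackages.cuda_nvcc ];
  }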
SomeoneSerge (back on matrix) (@ss:someonex.net)

older gcc

Note that if you're building a shared library, you're going to run into libc issues if you just use gcc12Stdenv. That's why we have cudaPackages.backendStdenv

21:48:12
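
A hedged sketch of that variant; cudaPackages.backendStdenv is the stdenv nixpkgs uses for CUDA packages, and the derivation body here is illustrative:

  { pkgs }:

  pkgs.cudaPackages.backendStdenv.mkDerivation {
    pname = "my-cuda-lib";  # hypothetical name
    version = "0.1.0";
    src = ./.;
    nativeBuildInputs = [ pkgs.cudaPackages.cuda_nvcc ];
    # backendStdenv pairs an nvcc-supported GCC with the default glibc, so
    # produced shared libraries stay ABI-compatible with the rest of nixpkgs.
  }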
29 Aug 2024
@gmacon:matrix.org
In reply to @ss:someonex.net

older gcc

Note that if you're building a shared library, you're going to run into libc issues if you just use gcc12Stdenv. That's why we have cudaPackages.backendStdenv

Since this is a Rust project, I'm not building any shared libraries, but this is good to know. Thanks!
13:06:40
SomeoneSerge (back on matrix) (@ss:someonex.net)
In reply to @gmacon:matrix.org
Since this is a Rust project, I'm not building any shared libraries, but this is good to know. Thanks!
(also relevant if you're loading other shared libraries, e.g. as plugins)
13:10:42
@gmacon:matrix.org
In reply to @ss:someonex.net
(also relevant if you're loading other shared libraries, e.g. as plugins)
I went ahead and changed my derivations anyway, so I'm all set for everything :-)
13:12:20
hexa (@hexa:lossy.network)
In reply to @zimbatm:numtide.com
hexa (UTC+1): can you give us the new stats after you add that cache :)
my infra runs on nixos-24.05 🙂
14:09:02
Jonas Chevalier (@zimbatm:numtide.com)
right, we should probably also build 24.05. It shouldn't cost that much.
14:12:24
hexa (@hexa:lossy.network)
that would be super cool
14:12:37
3 Sep 2024
hexa (@hexa:lossy.network)
https://github.com/nix-community/infra/pull/1435
20:55:18
hexa (@hexa:lossy.network)
not sure how useful release-cuda.nix is on 24.05, maybe SomeoneSerge (UTC+3) can speak to that?
20:55:38
hexa (@hexa:lossy.network)
https://hydra.nix-community.org/jobset/nixpkgs/cuda-stable
21:35:16
4 Sep 2024
connor (burnt/out) (UTC-8) (@connorbaker:matrix.org)
I'll take a look at it later today as well
17:40:12
connor (burnt/out) (UTC-8) (@connorbaker:matrix.org)
(Assuming I remember and my plumbing is fixed by then, otherwise all bets are off)
17:40:28
SomeoneSerge (back on matrix) (@ss:someonex.net) changed their display name from SomeoneSerge (UTC+3) to SomeoneSerge (nix.camp).
21:48:39
hexa (@hexa:lossy.network)
can you take care of the release-cuda backports?
22:46:43
SomeoneSerge (back on matrix) (@ss:someonex.net)
I'll add them to tomorrow's agenda
22:47:16
connor (burnt/out) (UTC-8) (@connorbaker:matrix.org)
I've got a PR to fix OpenCV's build for CUDA (and general cleanup) if that's of interest to anyone: https://github.com/NixOS/nixpkgs/pull/339619
22:51:10
connor (burnt/out) (UTC-8) (@connorbaker:matrix.org)
Is it worth back-porting? I can't remember if CUDA 12.4 is in 24.05
22:51:30
hexa (@hexa:lossy.network)
only up to 12.3
22:56:15
7 Sep 2024
@adam:robins.wtf

hmm, ollama is failing for me on unstable

Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.680-04:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 gpu=GPU-c2c9209f-9632-bb03-ca95-d903c8664a1a parallel=4 available=12396331008 required="11.1 GiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.681-04:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=28 layers.offload=28 layers.split="" memory.available="[11.5 GiB]" memory.required.full="11.1 GiB" memory.required.partial="11.1 GiB" memory.required.kv="2.1 GiB" memory.required.allocations="[11.1 GiB]" memory.weights.total="10.1 GiB" memory.weights.repeating="10.0 GiB" memory.weights.nonrepeating="164.1 MiB" memory.graph.full="296.0 MiB" memory.graph.partial="391.4 MiB"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.695-04:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server --model /srv/fast/ollama/models/blobs/sha256-5ff0abeeac1d2dbdd5455c0b49ba3b29a9ce3c1fb181b2eef2e948689d55d046 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 28 --parallel 4 --port 35991"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.696-04:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
Sep 07 15:59:47 sink1 ollama[1314]: /tmp/ollama1289771407/runners/cuda_v12/ollama_llama_server: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
Sep 07 15:59:47 sink1 ollama[1314]: time=2024-09-07T15:59:47.947-04:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 127"

20:12:04
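
The runner extracted to /tmp fails to load libcudart.so.12, which usually means the ollama package in use wasn't built with CUDA support (or the CUDA libraries aren't on the runner's library path). A first thing to check, sketched with the NixOS module options from nixpkgs:

  { ... }:
  {
    services.ollama = {
      enable = true;
      # Selects the CUDA-enabled ollama build so the extracted runner can
      # resolve libcudart at load time.
      acceleration = "cuda";
    };
  }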


