!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

211 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
42 Servers

11 Oct 2024
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @msanft:matrix.org
Hey folks! I tried to update libnvidia-container, as it was quite a few versions behind (including security releases). We use it in a work scenario for GPU containers in legacy mode, where we tested it to "work" generally. The only thing that doesn't work is binary resolution (e.g. nvidia-smi, nvidia-persistenced, ...). I just adapted the patches from the old version so that they apply to the new one. I tried dropping the patch that replaces PATH-based binary lookup with a fixed /run/nvidia-docker directory, as this seems to be an artifact of older times, I believe? At least, the path exists neither in a legacy-mode container nor on the host. I think the binaries should really be looked up through the PATH, which should be set accordingly when calling nvidia-container-cli? What do the experts think?

CDI containers work, as the binary paths are resolved correctly through the CDI config generated at boot.

Find my draft PR here: https://github.com/NixOS/nixpkgs/pull/347867
What'd be a reasonable way to test this, now that our docker/podman flows all migrated to CDI and our singularity IIRC uses a plain text file with the library paths?
11:44:30
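
Editor's sketch of the two lookup strategies contrasted in the quoted message: resolving a binary from one fixed directory versus resolving it through PATH. This is not libnvidia-container code (that project is written in C); `find_binary` and its `fixed_dir` parameter are hypothetical names chosen only for illustration.

```python
import os
import shutil


def find_binary(name, fixed_dir=None):
    """Return the full path of an executable, or None if it is not found.

    With fixed_dir set, only that one directory is consulted (the
    hard-coded behaviour discussed above). Otherwise the lookup goes
    through the directories listed in the PATH environment variable.
    """
    if fixed_dir is not None:
        candidate = os.path.join(fixed_dir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
        return None
    # shutil.which searches PATH, like a shell would.
    return shutil.which(name)


if __name__ == "__main__":
    for binary in ("nvidia-smi", "nvidia-persistenced"):
        print(binary, "via fixed dir:", find_binary(binary, fixed_dir="/run/nvidia-docker"))
        print(binary, "via PATH:", find_binary(binary))
```

If the fixed directory does not exist on the host or in the container, the first strategy can never succeed, whereas the second depends only on how PATH is set for the calling process.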
@msanft:matrix.org Moritz Sanft
I tested it with an "OCI Hook", like so: https://github.com/confidential-containers/cloud-api-adaptor/blob/191ec51f6245a1a475c15312d354efaf07ff64de/src/cloud-api-adaptor/podvm/addons/nvidia_gpu/setup.sh#L11C1-L17C4 Getting that to work was also the reason I updated this package in the first place.
12:21:24
@msanft:matrix.org Moritz Sanft
The update is necessary to fix legacy library lookup for containers with GPU access, as newer drivers won't have libnvidia-pkcs11.so (which corresponds to OpenSSL 1.1), but only the *.openssl3.so alternatives for OpenSSL 3. Just to give this some context. Legacy binary lookup works with neither 1.9.0 nor 1.16.2 as of now. I think we might even want to get the update itself merged without fixing that, as it's security-relevant and the missing binary availability is not a regression, but I'm also happy to hear your stance on that.
12:24:09
@zopieux:matrix.zopi.eu zopieux
Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 per this commit, I successfully manage to retrieve cudaPackages.(things) from cuda-maintainers cachix, however onnxruntime doesn't seem to be in there, is it broken?
17:04:05
12 Oct 2024
@mabau:matrix.org MB joined the room.
07:39:38
14 Oct 2024
@glepage:matrix.org Gaétan Lepage
It looks like python312Packages.onnx does not build when cudaSupport = true.
08:11:25
@ss:someonex.net SomeoneSerge (utc+3)
Gaétan Lepage: could you give https://github.com/NixOS/nixpkgs/pull/328247 another look? I just picked up where the author left off, I didn't try questioning whether e.g. adding a separate triton-llvm is the right way or whatever, and my brain is not in the place to think high-level rn
18:43:40
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @zopieux:matrix.zopi.eu
Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 per this commit, I successfully manage to retrieve cudaPackages.(things) from cuda-maintainers cachix, however onnxruntime doesn't seem to be in there, is it broken?

Seems like dependencies failed to build: https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fn3lww4jsfan66wyryh3ip3ryarn874q5-onnxruntime-1.18.1.drv?via-job=e51bf1d4-6191-4763-8780-dd317be0b70b

Rather than debugging this, I'd advise you look into https://hydra.nix-community.org/job/nixpkgs/cuda/onnxruntime.x86_64-linux. There haven't been any official announcements from nix-community's infra team to the best of my knowledge -> no "promises", but the hope is that this will become the supported and long-term maintained solution

18:51:50
@ss:someonex.net SomeoneSerge (utc+3)
https://nix-community.org/cache/
18:52:36
@glepage:matrix.org Gaétan Lepage
In reply to @ss:someonex.net

Indeed, it seems to fail currently
19:02:58
@glepage:matrix.org Gaétan Lepage
In reply to @ss:someonex.net

Is this building the CUDA version of onnx?
19:03:19
@ss:someonex.net SomeoneSerge (utc+3)
Yes but also the hydra history is all green 🤷
19:08:54
@glepage:matrix.org Gaétan Lepage
Yes, weird...
19:13:19
@ss:someonex.net SomeoneSerge (utc+3)
Noticed https://github.com/SomeoneSerge/nixpkgs-cuda-ci/issues/31#issuecomment-2412043822 only now, published a response
19:22:08
@glepage:matrix.org Gaétan Lepage
I can't get onnx to build...
Here are the logs in case someone knows what is happening: https://paste.glepage.com/upload/eel-falcon-sloth
20:08:13
@ss:someonex.net SomeoneSerge (utc+3)
      error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20230125.3.tar.gz' failed

lol

20:19:08
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @ss:someonex.net
Yes but also the hydra history is all green 🤷
Maybe that just came in from staging
20:19:30
15 Oct 2024
@connorbaker:matrix.org connor (he/him) (UTC-7)
In reply to @glepage:matrix.org
I can't get onnx to build...
Here are the logs in case someone know what is happening: https://paste.glepage.com/upload/eel-falcon-sloth
Onnx's CMake isn't detecting at least one dependency, so it tries to download them all in order, starting with abseil. Since there's no networking in the sandbox, it fails.
00:06:48
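
Editor's sketch of the failure mode described in the message above, as a toy model only. It is not ONNX's actual CMake logic; `configure`, `SandboxNetworkError`, and the dependency list (apart from abseil, which appears in the error) are hypothetical names used for illustration.

```python
class SandboxNetworkError(RuntimeError):
    """Raised when a download is attempted without network access."""


def configure(deps_in_order, detected, network_allowed):
    """Toy model: use system dependencies only if all of them are detected."""
    missing = [dep for dep in deps_in_order if dep not in detected]
    if not missing:
        return {dep: "system" for dep in deps_in_order}
    # At least one dependency was not detected, so fall back to downloading
    # all of them in order. Without network, the very first download fails,
    # even if that particular dependency was available locally.
    sources = {}
    for dep in deps_in_order:
        if not network_allowed:
            raise SandboxNetworkError(
                f"downloading '{dep}' failed (not detected: {missing})"
            )
        sources[dep] = "downloaded"
    return sources


if __name__ == "__main__":
    deps = ["abseil", "protobuf"]  # illustrative order, abseil first
    try:
        configure(deps, detected={"abseil"}, network_allowed=False)
    except SandboxNetworkError as err:
        print(err)
```

In this model the error names abseil although the dependency that was not detected is a different one, which is why the download error alone does not identify the real culprit.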
@connorbaker:matrix.org connor (he/him) (UTC-7)
I'm currently working on Onnx packaging for a thing, and you can see what I've got going on here: https://github.com/ConnorBaker/cuda-packages/blob/main/cudaPackages-common/onnx.nix (It's a combination C++/Python install so it's gnarly. But better than having two separate derivations with libraries built with different flags, I guess.)
00:09:04
@glepage:matrix.org Gaétan Lepage
Ok interesting, thanks for sharing
05:46:57
@glepage:matrix.org Gaétan Lepage
Is your plan to upstream this to nixpkgs?
05:47:13
@glepage:matrix.org Gaétan Lepage
[triton update]
triton-llvm fails during the test phase.
Logs: https://paste.glepage.com/upload/fish-jaguar-pig
08:48:05
@atagen:imagisphe.re atagen joined the room.
11:38:21
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @glepage:matrix.org
[triton update]
triton-llvm fails during the test phase.
Logs: https://paste.glepage.com/upload/fish-jaguar-pig
Can't reproduce, builds for me. Maybe we tried different HEADs?
12:36:26
@atagen:imagisphe.re atagen
hi, what am I missing to get a cache hit? going by this hydra output torch should be in the cache (for nixpkgs 5633bcf). I have nix-community cachix set up, allowUnfree, cudaSupport, and the package in question is providing its overlay properly with final.callPackage so it ought to be using my system packages
12:46:24
@atagen:imagisphe.re atagen
https://gist.github.com/atagen/615e187e323f3ca3f5f9d40e55ce2b7c
12:55:50
@atagen:imagisphe.re atagen
oof, could it be because I'm specifying python311Packages instead of python3Packages?
12:57:30
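
Editor's sketch for the cache-hit question above. A Nix binary cache is keyed by the hash part of the store path, so a build is only substituted when the locally evaluated path matches one the cache holds exactly. Any difference in inputs yields a different hash; whether the python311Packages versus python3Packages choice is the cause here is not settled in the conversation. The helper name `narinfo_url`, the default cache URL, and the placeholder store path are assumptions for illustration.

```python
import re

# A store path looks like /nix/store/<32-character hash>-<name>.
STORE_PATH_RE = re.compile(r"^/nix/store/([0-9a-z]{32})-(.+)$")


def narinfo_url(store_path, cache_url="https://nix-community.cachix.org"):
    """Build the URL a binary cache would serve metadata from for a store path."""
    match = STORE_PATH_RE.match(store_path)
    if match is None:
        raise ValueError(f"not a store path: {store_path!r}")
    store_hash = match.group(1)
    # Binary caches expose one <hash>.narinfo file per cached store path.
    return f"{cache_url.rstrip('/')}/{store_hash}.narinfo"


if __name__ == "__main__":
    # Placeholder path; substitute the output path your own evaluation produces.
    example = "/nix/store/" + "a" * 32 + "-example-package"
    print(narinfo_url(example))
```

Requesting the printed URL for the output path that the local evaluation produces shows whether that exact path is in the cache; a missing file suggests the local derivation differs from the one the CI built.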
