
NixOS CUDA

290 Members | 58 Servers
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



8 Oct 2024
Gaétan Lepage: Yes, compiling on the fly is the core spirit of tinygrad. 15:47:06
SomeoneSerge (back on matrix): Trying to compose backendStdenv with ccacheStdenv 🙃 17:07:51
SomeoneSerge (back on matrix):
In reply to @ss:someonex.net: Trying to compose backendStdenv with ccacheStdenv 🙃
callPackage is a blessing and a curse
17:50:29
SomeoneSerge (back on matrix): It works with a bit of copy-paste though 17:50:43
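(For context, a minimal sketch of the composition being attempted, assuming ccacheStdenv's override accepts a custom base stdenv; the names and the example derivation are illustrative, not code from nixpkgs:)

    # Sketch: build on cudaPackages.backendStdenv (the NVCC-compatible
    # stdenv) while routing the host compiler through ccache.
    { pkgs }:
    let
      # Assumption: ccacheStdenv.override { stdenv = ...; } composes the
      # ccache wrapper over an arbitrary base stdenv.
      ccacheBackendStdenv = pkgs.ccacheStdenv.override {
        stdenv = pkgs.cudaPackages.backendStdenv;
      };
    in
    ccacheBackendStdenv.mkDerivation {
      pname = "ccache-backend-example";  # illustrative placeholder
      version = "0";
      dontUnpack = true;
      installPhase = "touch $out";
    }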
SomeoneSerge (back on matrix): But has anyone run into weird PermissionDenied errors with ccache? The directory is visible in the sandbox and owned by the nixbld group, and the id seems to match... 17:57:47
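(A hedged sketch of the usual setup this relies on: the NixOS programs.ccache module creates the cache directory, and extra-sandbox-paths bind-mounts it into builds; a missing bind mount or wrong ownership on the cache directory is the typical source of PermissionDenied here.)

    # Hypothetical configuration.nix snippet; assumes the NixOS
    # programs.ccache module and a sandboxed nix-daemon.
    { config, ... }:
    {
      programs.ccache.enable = true;
      # Packages to rebuild with ccacheStdenv; the name is illustrative.
      programs.ccache.packageNames = [ "triton" ];
      # Bind-mount the cache dir into the build sandbox; without this,
      # builders hit PermissionDenied / path-not-available errors.
      nix.settings.extra-sandbox-paths = [ config.programs.ccache.cacheDir ];
    }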
10 Oct 2024
SomeoneSerge (back on matrix): Iterating on triton with ccache is so much faster lmao 16:12:34
11 Oct 2024
Moritz Sanft: Hey folks! I tried to update libnvidia-container, as it was lagging quite a few versions (including security releases) behind. We use it in a work scenario for GPU containers in legacy mode, where we tested it to "work" in general. The only thing that doesn't work is binary resolution (e.g. nvidia-smi, nvidia-persistenced, ...). I just adapted the patches from the old version so that they apply to the new one. I dropped the patch that replaces PATH-based binary lookup with a hard-coded /run/nvidia-docker directory, as this seems to be an artifact of older times, I believe? At least, that path exists neither in a legacy-mode container nor on the host. I think the binaries should really be looked up through PATH, which should be set accordingly when calling nvidia-container-cli. What do the experts think?

CDI containers work, as the binary paths are resolved correctly through the CDI config generated at boot.

Find my draft PR here: https://github.com/NixOS/nixpkgs/pull/347867
07:49:12
Moritz Sanft: * Hey folks! I tried to update libnvidia-container, as it was lagging quite a few versions (including security releases) behind. We use it in a work scenario for GPU containers in legacy mode, where we tested it to "work" in general. The only thing that doesn't work is binary resolution (e.g. nvidia-smi, nvidia-persistenced, ...). I just adapted the patches from the old version so that they apply to the new one. I tried dropping the patch that replaces PATH-based binary lookup with a hard-coded /run/nvidia-docker directory, as this seems to be an artifact of older times, I believe? At least, that path exists neither in a legacy-mode container nor on the host. I think the binaries should really be looked up through PATH, which should be set accordingly when calling nvidia-container-cli. What do the experts think?

CDI containers work, as the binary paths are resolved correctly through the CDI config generated at boot.

Find my draft PR here: https://github.com/NixOS/nixpkgs/pull/347867
07:53:31
SomeoneSerge (back on matrix): * Iterating on triton with ccache is so much faster lmao EDIT: triton+torch in half an hour on a single node; this is not perfect but it is an improvement 11:41:55
SomeoneSerge (back on matrix):
In reply to @msanft:matrix.org: Hey folks! I tried to update libnvidia-container, [...]
What'd be a reasonable way to test this, now that our docker/podman flows have all migrated to CDI and our singularity IIRC uses a plain-text file with the library paths?
11:44:30
Moritz Sanft: I tested it with an "OCI hook", like so: https://github.com/confidential-containers/cloud-api-adaptor/blob/191ec51f6245a1a475c15312d354efaf07ff64de/src/cloud-api-adaptor/podvm/addons/nvidia_gpu/setup.sh#L11C1-L17C4. Getting that to work was also the reason I came to update this package in the first place. 12:21:24
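(The linked setup.sh writes such a hook JSON by hand; below is a rough NixOS-flavoured sketch of the same idea. The hooks.d location and the use of pkgs.nvidia-container-toolkit are assumptions to adapt, not taken from that script.)

    # Hypothetical snippet: register an OCI prestart hook for podman that
    # invokes nvidia-container-toolkit before the container starts.
    { pkgs, ... }:
    {
      environment.etc."containers/oci/hooks.d/oci-nvidia-hook.json".text =
        builtins.toJSON {
          version = "1.0.0";
          hook = {
            path = "${pkgs.nvidia-container-toolkit}/bin/nvidia-container-toolkit";
            args = [ "nvidia-container-toolkit" "prestart" ];
          };
          when.always = true;
          stages = [ "prestart" ];
        };
    }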
Moritz Sanft: Just to give this some context: the update is necessary to fix legacy library lookup for containers with GPU access, as newer drivers no longer ship libnvidia-pkcs11.so (which corresponds to OpenSSL 1.1), but only the *.openssl3.so alternatives for OpenSSL 3. Legacy binary lookup works with neither 1.9.0 nor 1.16.2 as of now. I think we might even want to get the update itself merged without fixing that, as it's security-relevant and the binary availability is not a regression, but I'm also happy to hear your stance on that. 12:24:09
zopieux: Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 per this commit, I successfully manage to retrieve cudaPackages.stuff from the cuda-maintainers cachix; however, onnxruntime doesn't seem to be in there. Is it broken? 17:03:50
zopieux: * Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 per this commit, I successfully manage to retrieve cudaPackages.(things) from the cuda-maintainers cachix; however, onnxruntime doesn't seem to be in there. Is it broken? 17:04:05
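(For readers wanting to reproduce the pin, a sketch of a flake doing both the pin and the cache wiring; the public key should be double-checked against the cache's Cachix page rather than trusted from here:)

    # Hypothetical flake.nix: pin nixpkgs to the commit above and
    # advertise the cuda-maintainers cache.
    {
      nixConfig = {
        extra-substituters = [ "https://cuda-maintainers.cachix.org" ];
        # Verify against https://app.cachix.org/cache/cuda-maintainers
        extra-trusted-public-keys = [
          "cuda-maintainers.cachix.org-1:0dq3bujKpuEPMCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E="
        ];
      };
      inputs.nixpkgs.url =
        "github:NixOS/nixpkgs/9357f4f23713673f310988025d9dc261c20e70c6";
      outputs = { self, nixpkgs }: {
        # Expose a CUDA-enabled package set for x86_64-linux.
        legacyPackages.x86_64-linux = import nixpkgs {
          system = "x86_64-linux";
          config = { allowUnfree = true; cudaSupport = true; };
        };
      };
    }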
14 Oct 2024
Gaétan Lepage: It looks like python312Packages.onnx does not build when cudaSupport = true. 08:11:25
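(A minimal repro sketch for that claim, assuming only that cudaSupport is the global nixpkgs toggle in play:)

    # repro.nix; build with `nix-build repro.nix`.
    with import <nixpkgs> {
      config = {
        allowUnfree = true;   # CUDA packages are unfree
        cudaSupport = true;   # global CUDA toggle
      };
    };
    python312Packages.onnx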
SomeoneSerge (back on matrix): Gaétan Lepage: could you give https://github.com/NixOS/nixpkgs/pull/328247 another look? I just picked up where the author left off; I didn't try questioning whether e.g. adding a separate triton-llvm is the right way or whatever, and my brain is not in the place to think high-level rn 18:43:40
SomeoneSerge (back on matrix):
In reply to @zopieux:matrix.zopi.eu: Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 [...]
Seems like dependencies failed to build: https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fn3lww4jsfan66wyryh3ip3ryarn874q5-onnxruntime-1.18.1.drv?via-job=e51bf1d4-6191-4763-8780-dd317be0b70b

Rather than debugging this, I'd advise you to look into https://hydra.nix-community.org/job/nixpkgs/cuda/onnxruntime.x86_64-linux
18:50:31
SomeoneSerge (back on matrix):
In reply to @zopieux:matrix.zopi.eu: Pinning nixpkgs to 9357f4f23713673f310988025d9dc261c20e70c6 [...]
* Seems like dependencies failed to build: https://hercules-ci.com/accounts/github/SomeoneSerge/derivations/%2Fnix%2Fstore%2Fn3lww4jsfan66wyryh3ip3ryarn874q5-onnxruntime-1.18.1.drv?via-job=e51bf1d4-6191-4763-8780-dd317be0b70b

Rather than debugging this, I'd advise you to look into https://hydra.nix-community.org/job/nixpkgs/cuda/onnxruntime.x86_64-linux. There haven't been any official announcements from nix-community's infra team to the best of my knowledge -> no "promises", but the hope is that this will become the supported and long-term maintained solution
18:51:50
SomeoneSerge (back on matrix): https://nix-community.org/cache/ 18:52:36
Gaétan Lepage:
In reply to @ss:someonex.net: Seems like dependencies failed to build: [...] I'd advise you to look into https://hydra.nix-community.org/job/nixpkgs/cuda/onnxruntime.x86_64-linux. [...]
Indeed, it seems to fail currently
19:02:58
Gaétan Lepage:
In reply to @ss:someonex.net: Seems like dependencies failed to build: [...]
Is this building the CUDA version of onnx?
19:03:19
SomeoneSerge (back on matrix): Yes, but also the hydra history is all green 🤷 19:08:54
Gaétan Lepage: Yes, weird... 19:13:19
SomeoneSerge (back on matrix): Noticed https://github.com/SomeoneSerge/nixpkgs-cuda-ci/issues/31#issuecomment-2412043822 only now, published a response 19:22:08
Gaétan Lepage: I can't get onnx to build...
Here are the logs in case someone knows what is happening: https://paste.glepage.com/upload/eel-falcon-sloth
20:08:13
SomeoneSerge (back on matrix):
      error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20230125.3.tar.gz' failed

lol

20:19:08
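(That error means CMake's FetchContent is trying to download abseil at build time, which the Nix sandbox blocks. Below is a hedged sketch of the usual workaround, pre-fetching the source and handing it to FetchContent; the variable name FETCHCONTENT_SOURCE_DIR_ABSEIL and the hash are placeholders, not taken from the real onnx derivation:)

    # Sketch of an override; whether onnx consumes cmakeFlags directly
    # depends on its build, so treat this as the shape of the fix only.
    let
      pkgs = import <nixpkgs> {
        config = { allowUnfree = true; cudaSupport = true; };
      };
      abseilSrc = pkgs.fetchFromGitHub {
        owner = "abseil";
        repo = "abseil-cpp";
        rev = "20230125.3";
        hash = pkgs.lib.fakeHash;  # replace with the hash Nix reports
      };
    in
    pkgs.python312Packages.onnx.overrideAttrs (old: {
      cmakeFlags = (old.cmakeFlags or [ ]) ++ [
        # Point FetchContent at the pre-fetched source instead of letting
        # it reach for the network inside the sandbox.
        "-DFETCHCONTENT_SOURCE_DIR_ABSEIL=${abseilSrc}"
      ];
    })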
SomeoneSerge (back on matrix):
In reply to @ss:someonex.net: Yes, but also the hydra history is all green 🤷
Maybe that just came in from staging
20:19:30


