!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

317 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
63 Servers

Sender | Message | Time
15 May 2024
@evax:matrix.org evax joined the room.
21:57:42
16 May 2024
@evax:matrix.org evax
Hi I'm trying to get jax with cuda to work in WSL using a flake, but the GPU is never recognized. Torch in the same flake recognizes the device. I've tried setting nixpkgs to both nixos-23.11 and nixos-unstable.
19:11:22
@trexd:matrix.org trexd
In reply to @evax:matrix.org
Hi I'm trying to get jax with cuda to work in WSL using a flake, but the GPU is never recognized. Torch in the same flake recognizes the device. I've tried setting nixpkgs to both nixos-23.11 and nixos-unstable.
Can you post your Nix code?
19:18:24
@evax:matrix.org evax
{
  description = "Jax+cuda shell";

  nixConfig = {
    extra-substituters = [
      "https://cuda-maintainers.cachix.org"
    ];
    extra-trusted-public-keys = [
      "cuda-maintainers.cachix.org-1:0dq3bujKpuEPMCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E="
    ];
  };

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-23.11";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        config = {
          allowUnfree = true;
          cudaSupport = true;
        };
        pkgs = (import nixpkgs { inherit system config; }).pkgs;
        python3 = pkgs.python311;
        deps = ps: with ps; [
          jax
          jaxlib
        ];
        devPython = python3.withPackages(ps: with ps; (deps(ps) ++ [
          ipython
        ]));
      in rec {
        inherit pkgs;
        devShell = pkgs.stdenv.mkDerivation {
          name = "jax-shell";
          buildInputs = [
            devPython
          ];
          shellHook = ''
            export CUDA_PATH=${pkgs.cudatoolkit}
            export LD_LIBRARY_PATH=/usr/lib/wsl/lib:${pkgs.linuxPackages.nvidia_x11}/lib:${pkgs.ncurses5}/lib
            export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
            export EXTRA_CCFLAGS="-I/usr/include"
         '';
        };
        defaultPackage = devShell;
      }
    );
}
21:02:04
17 May 2024
@evax:matrix.org evax
This was originally with nix on top of alma linux in WSL, I switched to NixOS-WSL and have the same issue
07:07:30
@ss:someonex.net SomeoneSerge (matrix works sometimes)

export LD_LIBRARY_PATH=/usr/lib/wsl/lib:${pkgs.linuxPackages.nvidia_x11}/lib:${pkgs.ncurses5}/lib

You don't want to reference nvidia_x11 in WSL environments. Even on linux we don't reference it directly, cf. various posts about "impure drivers" on github and discourse

08:38:07
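
The advice above can be checked mechanically. A minimal sketch, assuming the pinned driver shows up as a store path containing `nvidia-x11` (the usual derivation name, but an assumption here); `flag_driver_entries` is a hypothetical helper, not part of nixpkgs:

```python
import os


def flag_driver_entries(ld_library_path, marker="nvidia-x11"):
    """Return the LD_LIBRARY_PATH entries whose path contains `marker`.

    `marker` defaults to "nvidia-x11", the assumed store-path name of
    pkgs.linuxPackages.nvidia_x11.
    """
    entries = [entry for entry in ld_library_path.split(":") if entry]
    return [entry for entry in entries if marker in entry]


if __name__ == "__main__":
    # Inspect the current shell's LD_LIBRARY_PATH and report pinned drivers.
    for entry in flag_driver_entries(os.environ.get("LD_LIBRARY_PATH", "")):
        print("driver library pinned from the Nix store:", entry)
```

Run inside the dev shell, it would print the `nvidia_x11` entry that the shellHook adds; an empty result means no driver library is pinned from the store.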
@ss:someonex.net SomeoneSerge (matrix works sometimes)

/usr/lib/wsl/lib:

I forget now, is libcuda.so placed directly under this path, or in a subdirectory?

08:38:38
@ss:someonex.net SomeoneSerge (matrix works sometimes)
Could you also please gist the errors, and logs for LD_DEBUG=libs python -c "import torch; torch.cuda.is_available()" and LD_DEBUG=libs python -c "..." (some minimal code to make jax attempt loading libcuda)
08:39:58
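
For the "..." placeholder in the request above, one plausible probe is listing JAX's devices, which makes JAX initialise its backends. This is an assumption; the thread never says which snippet was used. The sketch below runs a probe under `LD_DEBUG=libs` and saves the loader trace to a file that can be put in a gist. `JAX_PROBE` and `run_with_ld_debug` are hypothetical names.

```python
import os
import subprocess
import sys

# Assumed minimal probe: jax.devices() forces backend initialisation.
JAX_PROBE = "import jax; print(jax.devices())"


def run_with_ld_debug(code, log_path):
    """Run `python -c code` with LD_DEBUG=libs.

    The dynamic loader writes its trace to stderr, which is redirected
    into `log_path`. Returns (returncode, stdout).
    """
    env = dict(os.environ, LD_DEBUG="libs")
    with open(log_path, "w") as log:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            env=env,
            stdout=subprocess.PIPE,
            stderr=log,
            text=True,
        )
    return proc.returncode, proc.stdout

# Example (not executed here):
#   run_with_ld_debug(JAX_PROBE, "jax-ld-debug.log")
```

Searching the resulting log for `libcuda` should show which directories the loader tried, if it tried any.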
@evax:matrix.org evax
thanks, let me try these things
08:45:37
@evax:matrix.org evax
libcuda.so is under /usr/lib/wsl/lib
08:56:55
@evax:matrix.org evax
some finding, under NixOS-WSL, with the option to use windows drivers, a wsl-lib package is created in the nix store linking the contents of /usr/lib/wsl/lib
09:11:06
@evax:matrix.org evax
it's exposed under /run/opengl-driver/lib
09:11:57
@evax:matrix.org evax
it might just be that jax is expecting cuda12 but the actual version in the system is cuda11
09:12:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @evax:matrix.org
some finding, under NixOS-WSL, with the option to use windows drivers, a wsl-lib package is created in the nix store linking the contents of /usr/lib/wsl/lib
Good, this sounds much safer than putting /usr/lib/wsl in LD_LIBRARY_PATH
09:12:40
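
The two locations mentioned so far (/usr/lib/wsl/lib and /run/opengl-driver/lib) can be compared with a short search that also settles whether libcuda.so sits directly in a directory or in a subdirectory. A minimal sketch using only the standard library; `find_libcuda` is a hypothetical helper name:

```python
import os


def find_libcuda(roots):
    """Return every file named libcuda.so* found under the given roots.

    Missing roots are skipped silently. Symlinked subdirectories are not
    followed (os.walk default), but symlinked files are reported.
    """
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.startswith("libcuda.so"):
                    hits.append(os.path.join(dirpath, name))
    return hits


if __name__ == "__main__":
    for path in find_libcuda(["/usr/lib/wsl/lib", "/run/opengl-driver/lib"]):
        # realpath shows where a wsl-lib style symlink actually points.
        print(path, "->", os.path.realpath(path))
```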
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @evax:matrix.org
it might just be that jax is expecting cuda12 but the actual version in the system is cuda11
It links its cuda libraries directly, and the driver is likely compatible with both releases
09:13:07
@evax:matrix.org evax
another finding, using jaxlibWithCuda (the nix compiled version) jax complains there's no CUDA enabled jaxlib, while using jaxlib-bin there's an error message related to loading CUDA
09:14:53
@evax:matrix.org evax
(I can't cut/paste/gist from that system, sorry)
09:16:40
@evax:matrix.org evax
the jaxlib-bin error (with TF_CPP_MIN_LOG_LEVEL=0) is external/tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
09:22:09
@evax:matrix.org evax
I tried to LD_PRELOAD libcuda.so and it doesn't help
09:22:29
@evax:matrix.org evax
with jaxlibWithCuda, the error is An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.
09:24:35
@evax:matrix.org evax
torch finds the GPU with LD_LIBRARY_PATH pointing either to /usr/lib/wsl/lib or /run/opengl-driver/lib, but not without, for jaxWithCuda none of these options work
09:46:50
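
The comparison described above (no LD_LIBRARY_PATH, /usr/lib/wsl/lib, /run/opengl-driver/lib) can be repeated in a controlled way by running each check in a fresh subprocess. A minimal sketch; `build_env`, `probe`, `CANDIDATES` and the two check strings are hypothetical names, and the JAX check is an assumed stand-in for whatever was actually run:

```python
import os
import subprocess
import sys

# None means "LD_LIBRARY_PATH unset"; the two paths come from the thread.
CANDIDATES = [None, "/usr/lib/wsl/lib", "/run/opengl-driver/lib"]

TORCH_CHECK = "import torch; print(torch.cuda.is_available())"
JAX_CHECK = "import jax; print(jax.devices())"  # assumed probe


def build_env(base_env, library_path):
    """Copy base_env, replacing LD_LIBRARY_PATH (or dropping it if None)."""
    env = dict(base_env)
    env.pop("LD_LIBRARY_PATH", None)
    if library_path is not None:
        env["LD_LIBRARY_PATH"] = library_path
    return env


def probe(code, library_path):
    """Run `python -c code` under the given LD_LIBRARY_PATH setting."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=build_env(os.environ, library_path),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()

# Example (not executed here):
#   for candidate in CANDIDATES:
#       print(candidate, probe(TORCH_CHECK, candidate), probe(JAX_CHECK, candidate))
```

Each probe starts from a clean interpreter, so a library loaded in one attempt cannot mask a failure in the next.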
@connorbaker:matrix.org connor (he/him)
Okay, tired of machines restarting
I just bought three different kits of RAM to replace the existing kits in my builders. And two 10GbE NICs to try to increase builder performance since they're all networked together and the 2.5GbE on two of the machines was a bottleneck.
17:09:15
@connorbaker:matrix.org connor (he/him)
God I hate hardware 🫠
17:09:20
@glepage:matrix.org Gaétan Lepage
How many systems do you have as builders?
20:27:05
@connorbaker:matrix.org connor (he/him)
I have three desktops I use as builders; I also pay for an aarch64-linux Hetzner server which I use for aarch64-linux builds for CI
23:13:40
18 May 2024
@glepage:matrix.org Gaétan Lepage
Ok cool!
I am starting to think about building a workstation for nix builds.
Would you mind sharing the specs of your machines?
11:53:53
@connorbaker:matrix.org connor (he/him)
Sure! Although keep in mind I've had a very difficult time managing consumer-grade hardware (especially given I use ASUS motherboards and the stupid default levels for voltage which trigger instability in games also trigger very hard to reproduce segfaults during Nix builds)
12:16:19
@connorbaker:matrix.org connor (he/him)
My main machine: https://pcpartpicker.com/user/connorbaker/saved/pxtbkL
A builder: https://pcpartpicker.com/user/connorbaker/saved/h6mvZL
A builder/storage: https://pcpartpicker.com/user/connorbaker/saved/Pyy7CJ
12:51:29
@connorbaker:matrix.org connor (he/him)
FWIW, it takes magma-cuda-static with the default set of capabilities ~19m30s to build on nixos-desktop and ~21m12s to build on nixos-build01 or nixos-ext.
12:52:26

Room Version: 9