!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

212 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
43 Servers

Sender | Message | Time
9 Jul 2024
@connorbaker:matrix.org connor (he/him) (UTC-7) The SBSA package only increased from 2398575314 to 2423326645 bytes (so still about 2GB) 18:57:16
@justbrowsing:matrix.org Kevin Mittman There's two CUDA variants, so it's more like 8GB total. The static .a is 3GB! Asked the same and "many new features" 19:45:41
10 Jul 2024
@zimbatm:numtide.com Jonas Chevalier
In reply to @ss:someonex.net
Jonas Chevalier hexa (UTC+1) a question about release-lib.nix: my impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs; for aarch64-linux, I'd like to define a matrix over individual capabilities because aarch64-linux mostly means embedded/jetson SBCs; currently this means importing nixpkgs with different config.cudaCapabilities values... any thoughts on how to express this in a not-too-ad-hoc way?

Patch release-lib.nix to add this logic:

nixpkgsArgs' = if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);

And then replace all the nixpkgsArgs usages with (nixpkgsArgs' system)

10:03:33
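
A minimal sketch of the normalisation suggested above, not the actual release-lib.nix patch. The names `normalise` and `perSystem`, and the `cudaCapabilities = [ "8.7" ]` value (Jetson Orin), are illustrative assumptions. The point is only that callers can pass either a plain attrset or a function of `system`:

```nix
let
  # Accept either an attrset or a function from system to attrset,
  # and always hand back a function of system.
  normalise = nixpkgsArgs:
    if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);

  # Hypothetical per-system arguments: restrict capabilities on aarch64-linux only.
  perSystem = system: {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    } // (if system == "aarch64-linux" then { cudaCapabilities = [ "8.7" ]; } else { });
  };
in
{
  # Function input: evaluated for the given system.
  fromFunction = (normalise perSystem) "aarch64-linux";
  # Attrset input: returned unchanged, whatever the system.
  fromAttrs = (normalise { config.allowUnfree = true; }) "x86_64-linux";
}
```

Saved to a file, this should evaluate on its own with `nix-instantiate --eval --strict`, without importing nixpkgs.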
@oak:universumi.fi oak changed their profile picture. 20:21:23
11 Jul 2024
@ss:someonex.net SomeoneSerge (utc+3) openai-triton broken with cuda+python3.12 😩 00:52:55
@myrkskog:matrix.org myrkskog joined the room. 03:15:20
@connorbaker:matrix.org connor (he/him) (UTC-7)
[2/2137/2141 built (157 failed), 11826 copied (948537.2 MiB), 27313.5 MiB DL] building xyce-7.8.0 (checkPhase): MEASURE/PrecisionTest............................................passed[sh]      (Time:   1s =   0.

💀

07:06:43
@ss:someonex.net SomeoneSerge (utc+3) ROCm stuff cache-missing again 12:39:44
@ss:someonex.net SomeoneSerge (utc+3) https://hydra.nixos.org/job/nixpkgs/trunk/rocmPackages.rocsolver.x86_64-linux yeah getting a timeout locally as well 12:40:02
@yorik.sar:matrix.org yorik.sar joined the room. 15:05:02
@ss:someonex.net SomeoneSerge (utc+3) Madoura I'm trying to build rocsolver again but I already suspect it's going to time out another time 16:30:42
@ss:someonex.net SomeoneSerge (utc+3) Madoura https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/2 this is falling apart 😹 23:16:45
12 Jul 2024
@valconius:matrix.org @valconius:matrix.org left the room. 01:16:15
13 Jul 2024
@mcwitt:matrix.org mcwitt

This might just be me being dumb, but I'm surprised that I'm unable to build jax with doCheck = false (use case: I want to override jaxlib = jaxlibWithCuda and don't want to run the tests). Repro:

nix build --impure --expr 'let nixpkgs = builtins.getFlake("github:nixos/nixpkgs/nixpkgs-unstable"); pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; }; in pkgs.python3Packages.jax.overridePythonAttrs { doCheck = false; }'

fails with ModuleNotFoundError: jax requires jaxlib to be installed

01:58:54
@mcwitt:matrix.org mcwitt

Something else that's puzzling me: overriding with jaxlib = jaxlibWithCuda doesn't seem to work for numpyro (which has a pretty simple derivation):

Sanity check: a Python env with just jax and jaxlibWithCuda is GPU-enabled, as expected:

nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ps.jax ps.jaxlibWithCuda ]))' --command python -c "import jax; print(jax.devices())"

yields [cuda(id=0), cuda(id=1)].

But when numpyro is overridden to use jaxlibWithCuda, for some reason the propagated jaxlib is still the CPU version:

nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ((ps.numpyro.overridePythonAttrs (_: { doCheck = false; })).override { jaxlib = ps.jaxlibWithCuda; }) ]))' --command python -c "import jax; print(jax.devices())"

yields [CpuDevice(id=0)]. (Furthermore, if we try to add jaxlibWithCuda to the withPackages call, we get a collision error, so clearly something is propagating the CPU jaxlib 🤔)

Has anyone seen this, or have a better way to use the GPU-enabled jaxlib as a dependency?

04:30:05
@mcwitt:matrix.org mcwitt Ack, the second one was just me being dumb. I'd mixed up the proper ordering of `override` and `overridePythonAttrs` 🤦 sorry for the noise 05:03:40
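
For reference, a sketch of what the corrected ordering presumably looks like: apply `.override` to the original package first, then `.overridePythonAttrs` to the result. This mirrors the expression from the earlier message with the two calls swapped. It assumes a nixpkgs revision in which numpyro still takes `jaxlib` as an argument, and it has not been checked against current nixpkgs-unstable:

```nix
# Assumption: numpyro accepts `jaxlib` as a package-function argument,
# as in the revision discussed above. Requires --impure, as in the original command.
let
  nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable";
  pkgs = import nixpkgs { config.allowUnfree = true; };
in
pkgs.python3.withPackages (ps: [
  # 1. `.override` replaces the jaxlib argument of the package function.
  # 2. `.overridePythonAttrs` then adjusts the buildPythonPackage attributes.
  ((ps.numpyro.override { jaxlib = ps.jaxlibWithCuda; }).overridePythonAttrs (_: {
    doCheck = false;
  }))
])
```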
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @mcwitt:matrix.org


Wait, why is numpyro propagating jaxlib? I thought we had a convention not to propagate jaxlib.
08:10:45
@ocharles:matrix.org ocharles set a profile picture. 10:21:04
15 Jul 2024
@oak:universumi.fi oak changed their profile picture. 03:16:33
@connorbaker:matrix.org connor (he/him) (UTC-7) Were there any changes to Python packages over the last month or so that may have caused breakages? Like default version or anything? I rebased a PR and I’m seeing more build failures than I was previously 21:25:25
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @connorbaker:matrix.org
Were there any changes to Python packages over the last month or so that may have caused breakages? Like default version or anything? I rebased a PR and I’m seeing more build failures than I was previously
Like 3.12?
21:25:53
@connorbaker:matrix.org connor (he/him) (UTC-7) Oooooh that would do it! 21:26:10
16 Jul 2024
@connorbaker:matrix.org connor (he/him) (UTC-7)

Trying to post the results of a nixpkgs-review run and:

There was a problem saving your comment. Your comment is too long (maximum is 65536 characters). Please try again.
00:29:53
@will4:matrix.org will joined the room. 04:51:50
@hexa:lossy.network hexa (UTC+1) that happened to me … never? 11:32:43
@connorbaker:matrix.org connor (he/him) (UTC-7) What’s the mechanism which allows us to build the tests in passthru.tests? 13:27:48
@connorbaker:matrix.org connor (he/him) (UTC-7)

SomeoneSerge (UTC+3): I'm having a bit of trouble with GPU-access in the sandbox. In particular, I've enabled it in my NixOS config with

{
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
    # TODO: Fix merging of presets
    # error: The option `programs.nix-required-mounts.allowedPatterns.nvidia-gpu.unsafeFollowSymlinks' has conflicting definition values:
    # - In `/nix/store/dk2rpyb6ndvfbf19bkb2plcz5y3k8i5v-source/nixos/modules/programs/nix-required-mounts.nix': false
    # - In `/nix/store/2ja2h1nd0z2bw56cl4bn37cb9d18hnzr-source/devices/nixos-desktop/hardware.nix': true
    # Use `lib.mkForce value` or `lib.mkDefault value` to change the priority on any of these definitions.
    # TODO: After enabling running into this error when trying to build a derivation which has requiredSystemFeatures = [ "cuda" ];
    # error:
    #   … while setting up the build environment
    #
    #   error: getting attributes of path '/nix/store/0p1qsszik7hwjddzmyhikq9ywr2ki69l-systemd-minimal-255.6/sbin/bin': No such file or directory
    # Perhaps this is due to the systemd directory in /run/opengl-driver/lib?
    allowedPatterns.nvidia-gpu.unsafeFollowSymlinks = lib.mkForce true;
  };
}

I'm trying to enable the GPU portion of CMake's CUDA test suite (https://github.com/ConnorBaker/nixpkgs/commit/543cf7d2ec330286ba566e6e1187e531d155c5d0), but failing. I thought following the symlinks when mounting would help because before that I was unable to access the ${addDriverRunpath.driverLink}/lib directory (multiple symlinks), but it now fails with

$ nix build --impure -L .#cudaPackages.cmake-cuda-tests.tests.withGpu
warning: Nix search path entry '/nix/var/nix/profiles/per-user/root/channels' does not exist, ignoring
warning: killing stray builder process 5264 ()...
error:
       … while setting up the build environment

       error: getting attributes of path '/nix/store/0p1qsszik7hwjddzmyhikq9ywr2ki69l-systemd-minimal-255.6/sbin/bin': No such file or directory

Any ideas?

17:48:11
@connorbaker:matrix.org connor (he/him) (UTC-7)

Ah, okay.
The thing addDriverRunpath.driverLink links to is /run/opengl-driver. That is in turn a symlink, created by this: https://github.com/NixOS/nixpkgs/blob/c82d9d313d5107c6ad3a92fc7d20343f45fa5ace/nixos/modules/hardware/graphics.nix#L5-L8
That derivation isn't exposed except as a path, used here:
https://github.com/NixOS/nixpkgs/blob/c82d9d313d5107c6ad3a92fc7d20343f45fa5ace/nixos/modules/hardware/graphics.nix#L112-L121
I updated my nixos config as follows, and it seems to work.

{
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
    allowedPatterns.nvidia-gpu = {
      onFeatures = [
        "gpu"
        "nvidia-gpu"
        "opengl"
        "cuda"
      ];
      # It exposes these paths in the sandbox:
      paths =
        let
          inherit (pkgs.addOpenGLRunpath) driverLink;
          thingDriverLinkLinksTo =
            config.systemd.tmpfiles.settings.graphics-driver."/run/opengl-driver"."L+".argument;
        in
        [
          driverLink
          thingDriverLinkLinksTo
          "/dev/dri"
          "/dev/nvidia*"
        ];
    };
  };
}

Of course, that same process would need to be repeated for anything in there which is in turn a symlink (which is the purpose of unsafeFollowSymlinks, I suppose), but I'm not getting that odd systemd bin error any more.

18:11:02
@mkiefel:matrix.org mkiefel Hi! I'm trying to get an application to work with libGL on a Jetson Orin AGX. For context, I am trying to get a camera image from a device with libargus (which requires GL). I'm not on the latest from unstable; maybe that is the issue. I've already tried pre-loading various GL libs from the base image of Jetpack but to no avail. Does anybody have some pointers for me, please? 19:43:27
