
NixOS CUDA

288 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

9 Jul 2024
@hacker1024:matrix.orghacker1024It is times like this that make me question my values to the core06:21:39
@ss:someonex.netSomeoneSerge (back on matrix) *

/usr/src/jetson_multimedia_api/samples/common/classes/NvBuffer.cpp

That's a promising opening

06:24:01
@ss:someonex.netSomeoneSerge (back on matrix)These are only distributed with the jetpack, right?06:29:34
@hacker1024:matrix.orghacker1024* Yep, luckily Jetpack-NixOS has all the samples packaged06:49:06
@hacker1024:matrix.orghacker1024Just needs some overlay weirdness to use CUDA from Nixpkgs now06:49:37
@hacker1024:matrix.orghacker1024* Speaking of which, is tensorrt supposed to work on aarch64? Because it's evaluating as both broken and unsupported when running the following `nix-instantiate -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt`06:50:57
@ss:someonex.netSomeoneSerge (back on matrix)Not sure, tensorrt isn't receiving enough love :)07:11:44
@ss:someonex.netSomeoneSerge (back on matrix) https://github.com/NixOS/nixpkgs/issues/323124 07:12:14
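To see which gate fires first, a minimal eval sketch along the lines of the command above may help (the attribute path matches the message; the allowBroken/allowUnsupportedSystem flags are assumptions added so the evaluation itself is permitted to proceed):

```nix
# Hypothetical sketch: evaluate tensorrt's meta flags on aarch64-linux
# to see whether the refusal comes from meta.broken or meta.platforms.
# allowBroken/allowUnsupportedSystem are assumptions, added only so the
# evaluation succeeds far enough to inspect the flags.
let
  pkgs = import <nixpkgs> {
    localSystem = "aarch64-linux";
    config = {
      cudaSupport = true;
      allowUnfree = true;
      allowBroken = true;
      allowUnsupportedSystem = true;
    };
  };
  inherit (pkgs.cudaPackages.tensorrt) meta;
in {
  inherit (meta) broken platforms;
}
```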
@ss:someonex.netSomeoneSerge (back on matrix) Jonas Chevalier hexa (UTC+1) a question about release-lib.nix: my impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs; for aarch64-linux, I'd like to define a matrix over individual capabilities because aarch64-linux mostly means embedded/jetson SBCs; currently this means importing nixpkgs with different config.cudaCapabilities values... any thoughts on how to express this in a not-too-ad-hoc way? 18:07:33
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)

Kevin Mittman: is there any reason the TensorRT tarball exploded in size for the 10.2 release? It's clocking in at over 4GB, nearly twice the size it was for 10.1 (~2GB).

[connorbaker@nixos-desktop:~/cuda-redist-find-features]$ ./tensorrt/helper.sh 12.5 10.2.0.19 linux-x86_64
[582.9/4140.3 MiB DL] downloading 'https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz'
18:56:10
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)The SBSA package only increased from 2398575314 to 2423326645 bytes (so still about 2GB)18:57:16
@justbrowsing:matrix.orgKevin Mittman (UTC-8) There are two CUDA variants, so it's more like 8GB total. The static .a is 3GB! I asked the same question and got "many new features" 19:45:41
10 Jul 2024
@zimbatm:numtide.comJonas Chevalier
In reply to @ss:someonex.net
Jonas Chevalier hexa (UTC+1) a question about release-lib.nix: my impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs; for aarch64-linux, I'd like to define a matrix over individual capabilities because aarch64-linux mostly means embedded/jetson SBCs; currently this means importing nixpkgs with different config.cudaCapabilities values... any thoughts on how to express this in a not-too-ad-hoc way?

Patch release-lib.nix to add this logic:

nixpkgsArgs' = if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);

And then replace all the nixpkgsArgs usages with (nixpkgsArgs' system)

10:03:33
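With that patch in place, the capability matrix could then be expressed by making nixpkgsArgs a function of system. A hypothetical sketch (the aarch64 branch and the capability values are illustrative assumptions, not what release-lib.nix ships):

```nix
# Hypothetical sketch building on the patch above: nixpkgsArgs becomes a
# function of system, so aarch64-linux can pin Jetson capabilities while
# other systems keep the defaults. Capability values are illustrative.
nixpkgsArgs = system: {
  config = {
    allowUnfree = true;
    cudaSupport = true;
  } // lib.optionalAttrs (system == "aarch64-linux") {
    cudaCapabilities = [ "7.2" "8.7" ]; # e.g. Xavier, Orin
  };
};
```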
11 Jul 2024
@ss:someonex.netSomeoneSerge (back on matrix)openai-triton broken with cuda+python3.12 😩00:52:55
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
[2/2137/2141 built (157 failed), 11826 copied (948537.2 MiB), 27313.5 MiB DL] building xyce-7.8.0 (checkPhase): MEASURE/PrecisionTest............................................passed[sh]      (Time:   1s =   0.

💀

07:06:43
@ss:someonex.netSomeoneSerge (back on matrix)ROCm stuff cache-missing again12:39:44
@ss:someonex.netSomeoneSerge (back on matrix) https://hydra.nixos.org/job/nixpkgs/trunk/rocmPackages.rocsolver.x86_64-linux yeah getting a timeout locally as well 12:40:02
@ss:someonex.netSomeoneSerge (back on matrix) Madoura I'm trying to build rocsolver again but I already suspect it's going to time out another time 16:30:42
@ss:someonex.netSomeoneSerge (back on matrix) Madoura https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/2 this is falling apart 😹 23:16:45
13 Jul 2024
@mcwitt:matrix.orgmcwitt

This might just be me being dumb, but I'm surprised that I'm unable to build jax with doCheck = false (the use case: I want to override jaxlib = jaxlibWithCuda and don't want to run the tests). Repro:

nix build --impure --expr 'let nixpkgs = builtins.getFlake("github:nixos/nixpkgs/nixpkgs-unstable"); pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; }; in pkgs.python3Packages.jax.overridePythonAttrs { doCheck = false; }'

fails with ModuleNotFoundError: jax requires jaxlib to be installed

01:58:54
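One plausible explanation (an assumption, not verified here): jax only pulls jaxlib in through its test inputs, so dropping doCheck also drops the module that the build-time import check needs. A sketch of a workaround along those lines:

```nix
# Hypothetical workaround sketch: clear pythonImportsCheck along with
# doCheck, on the assumption that the import check is what needs jaxlib
# at build time. Whether this matches jax's actual derivation is unverified.
pkgs.python3Packages.jax.overridePythonAttrs (old: {
  doCheck = false;
  pythonImportsCheck = [ ];
})
```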
@mcwitt:matrix.orgmcwitt *

Something else that's puzzling me: overriding with jaxlib = jaxlibWithCuda doesn't seem to work for numpyro (which has a pretty simple derivation):

Sanity check: a Python env with just jax and jaxlibWithCuda is GPU-enabled, as expected:

nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ps.jax ps.jaxlibWithCuda ]))' --command python -c "import jax; print(jax.devices())"

yields [cuda(id=0), cuda(id=1)].

But when numpyro is overridden to use jaxlibWithCuda, for some reason the propagated jaxlib is still the CPU version:

nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ((ps.numpyro.overridePythonAttrs (_: { doCheck = false; })).override { jaxlib = ps.jaxlibWithCuda; }) ]))' --command python -c "import jax; print(jax.devices())"

yields [CpuDevice(id=0)]. (Furthermore, if we try to add jaxlibWithCuda to the withPackages call, we get a collision error, so clearly something is propagating the CPU jaxlib 🤔)

Has anyone seen this, or have a better way to use the GPU-enabled jaxlib as a dependency?

04:30:27
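One pattern that may sidestep the propagation problem (a sketch, under the assumption that numpyro propagates whatever jaxlib its package set resolves): override jaxlib at the package-set level via packageOverrides, so every dependent sees the CUDA build rather than swapping it in one derivation at a time:

```nix
# Hypothetical sketch: swap jaxlib for the whole Python package set, so
# numpyro's propagated jaxlib is the CUDA build. This avoids the
# per-derivation .override, which leaves other dependents on CPU jaxlib.
let
  pkgs = import <nixpkgs> { config.allowUnfree = true; };
  python = pkgs.python3.override {
    packageOverrides = self: super: {
      jaxlib = super.jaxlibWithCuda;
    };
  };
in
python.withPackages (ps: [
  (ps.numpyro.overridePythonAttrs (_: { doCheck = false; }))
])
```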
