9 Jul 2024 |
hacker1024 | * Speaking of which, is tensorrt supposed to work on aarch64? Because it's evaluating as both broken and unsupported when running the following
`nix-instantiate -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt` | 06:50:57 |
SomeoneSerge (utc+3) | Not sure, tensorrt isn't receiving enough love:) | 07:11:44 |
SomeoneSerge (utc+3) | https://github.com/NixOS/nixpkgs/issues/323124 | 07:12:14 |
SomeoneSerge (utc+3) | Jonas Chevalier hexa (UTC+1) a question about release-lib.nix: my impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs. For aarch64-linux I'd like to define a matrix over individual capabilities, because aarch64-linux mostly means embedded/Jetson SBCs. Currently this means importing nixpkgs with different config.cudaCapabilities values... any thoughts on how to express this in a not-too-ad-hoc way? | 18:07:33 |
connor (he/him) (UTC-7) | Kevin Mittman: is there any reason the TensorRT tarball exploded in size for the 10.2 release? It's clocking in at over 4GB, nearly twice the size it was for 10.1 (~2GB).
[connorbaker@nixos-desktop:~/cuda-redist-find-features]$ ./tensorrt/helper.sh 12.5 10.2.0.19 linux-x86_64
[582.9/4140.3 MiB DL] downloading 'https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz'
| 18:56:10 |
connor (he/him) (UTC-7) | The SBSA package only increased from 2398575314 to 2423326645 bytes (so still about 2GB) | 18:57:16 |
Kevin Mittman | There are two CUDA variants, so it's more like 8GB total. The static .a is 3GB! I asked the same question and the answer was "many new features" | 19:45:41 |
10 Jul 2024 |
Jonas Chevalier | In reply to @ss:someonex.net Patch release-lib.nix to add this logic:
nixpkgsArgs' = if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);
And then replace all the nixpkgsArgs usages with (nixpkgsArgs' system)
| 10:03:33 |
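A minimal sketch of the pattern suggested above, for readers who want to see it evaluate. It is a stand-in and not the actual release-lib.nix: the function body, the result attribute names and the "8.7" capability value are illustrative assumptions, and only the `nixpkgsArgs'` line is taken from the message.

```nix
let
  # Hypothetical caller-supplied value: either a plain attrset or,
  # as here, a function from `system` to the arguments for importing nixpkgs.
  nixpkgsArgs = system: {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    } // (if system == "aarch64-linux"
          then { cudaCapabilities = [ "8.7" ]; } # illustrative value only
          else { });
  };

  # The normalisation from the message: always end up with a function of `system`.
  nixpkgsArgs' = if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);
in
{
  # Every former use of nixpkgsArgs becomes (nixpkgsArgs' system).
  aarch64 = nixpkgsArgs' "aarch64-linux";
  x86_64 = nixpkgsArgs' "x86_64-linux";
}
```

Saved to a file, this can be inspected with `nix-instantiate --eval --strict`. Passing a plain attrset as `nixpkgsArgs` goes through the same code path, so existing callers would presumably keep working.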
| oak changed their profile picture. | 20:21:23 |
11 Jul 2024 |
SomeoneSerge (utc+3) | openai-triton broken with cuda+python3.12 | 00:52:55 |
| myrkskog joined the room. | 03:15:20 |
connor (he/him) (UTC-7) | [2/2137/2141 built (157 failed), 11826 copied (948537.2 MiB), 27313.5 MiB DL] building xyce-7.8.0 (checkPhase): MEASURE/PrecisionTest............................................passed[sh] (Time: 1s = 0.
| 07:06:43 |
SomeoneSerge (utc+3) | ROCm stuff cache-missing again | 12:39:44 |
SomeoneSerge (utc+3) | https://hydra.nixos.org/job/nixpkgs/trunk/rocmPackages.rocsolver.x86_64-linux yeah getting a timeout locally as well | 12:40:02 |
| yorik.sar joined the room. | 15:05:02 |
SomeoneSerge (utc+3) | Madoura I'm trying to build rocsolver again but I already suspect it's going to time out another time | 16:30:42 |
SomeoneSerge (utc+3) | Madoura https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/2 this is falling apart | 23:16:45 |
12 Jul 2024 |
| @valconius:matrix.org left the room. | 01:16:15 |
13 Jul 2024 |
mcwitt | This might just be me being dumb, but am surprised that I'm unable to build jax with doCheck = false (use case is I want to override jaxlib = jaxlibWithCuda and don't want to run the tests). Repro:
nix build --impure --expr 'let nixpkgs = builtins.getFlake("github:nixos/nixpkgs/nixpkgs-unstable"); pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; }; in pkgs.python3Packages.jax.overridePythonAttrs { doCheck = false; }'
fails with ModuleNotFoundError: jax requires jaxlib to be installed
| 01:58:54 |
mcwitt | Something else that's puzzling me: overriding with jaxlib = jaxlibWithCuda doesn't seem to work for numpyro (which has a pretty simple derivation):
Sanity check: a Python env with just jax and jaxlibWithCuda is GPU-enabled, as expected:
nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ps.jax ps.jaxlibWithCuda ]))' --command python -c "import jax; print(jax.devices())"
yields [cuda(id=0), cuda(id=1)].
But when numpyro is overridden to use jaxlibWithCuda, for some reason the propagated jaxlib is still the CPU version:
nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ((ps.numpyro.overridePythonAttrs (_: { doCheck = false; })).override { jaxlib = ps.jaxlibWithCuda; }) ]))' --command python -c "import jax; print(jax.devices())"
yields [CpuDevice(id=0)]. (Furthermore, if we try to add jaxlibWithCuda to the withPackages call, we get a collision error, so clearly something is propagating the CPU jaxlib.)
Has anyone seen this, or have a better way to use the GPU-enabled jaxlib as a dependency?
| 04:30:05 |
mcwitt | Ack, the second one was just me being dumb. I'd mixed up the proper ordering of `override` and `overridePythonAttrs`, sorry for the noise | 05:03:40 |
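For reference, a sketch of the swapped ordering, assuming "proper ordering" means calling `override` first and `overridePythonAttrs` second. The thread does not spell out the corrected expression or explain why the order matters, so treat this as a guess and not as a verified fix.

```nix
let
  nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable";
  pkgs = import nixpkgs { config.allowUnfree = true; };
in
pkgs.python3.withPackages (ps: [
  # Swap the jaxlib argument first, then adjust the build attributes
  # of the already-overridden package.
  ((ps.numpyro.override { jaxlib = ps.jaxlibWithCuda; }).overridePythonAttrs (_: {
    doCheck = false;
  }))
])
```

This would be passed to `nix shell --impure --expr '...'` in the same way as the commands in the earlier message.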
SomeoneSerge (utc+3) | In reply to @mcwitt:matrix.org
Wait why is numpyro propagating jaxlib, I thought we had a convention not to propagate jaxlib | 08:10:45 |
| ocharles set a profile picture. | 10:21:04 |
15 Jul 2024 |
| oak changed their profile picture. | 03:16:33 |
connor (he/him) (UTC-7) | Were there any changes to Python packages over the last month or so that may have caused breakages? Like default version or anything? I rebased a PR and I'm seeing more build failures than I was previously | 21:25:25 |
SomeoneSerge (utc+3) | In reply to @connorbaker:matrix.org Like 3.12? | 21:25:53 |
connor (he/him) (UTC-7) | Oooooh that would do it! | 21:26:10 |
16 Jul 2024 |
connor (he/him) (UTC-7) | Trying to post the results of a nixpkgs-review run and:
There was a problem saving your comment. Your comment is too long (maximum is 65536 characters). Please try again.
| 00:29:53 |
| will joined the room. | 04:51:50 |