| 9 Jul 2024 |
hacker1024 | It's times like these that make me question my values to the core | 06:21:39 |
SomeoneSerge (back on matrix) |
/usr/src/jetson_multimedia_api/samples/common/classes/NvBuffer.cpp
That's a promising opening | 06:24:01 |
SomeoneSerge (back on matrix) | These are only distributed with the jetpack, right? | 06:29:34 |
hacker1024 | Yep, luckily Jetpack-NixOS has all the samples packaged | 06:49:06 |
hacker1024 | Just needs some overlay weirdness to use CUDA from Nixpkgs now | 06:49:37 |
hacker1024 | Speaking of which, is tensorrt supposed to work on aarch64? Because it's evaluating as both broken and unsupported when running the following:
`nix-instantiate -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt` | 06:50:57 |
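(An aside on narrowing this down, sketched under the assumption that cudaPackages.tensorrt carries the standard nixpkgs meta attributes: the "broken" and "unsupported" gates are separate, and meta can be evaluated without instantiating the derivation, so each gate can be inspected on its own:)
`nix-instantiate --eval -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt.meta.broken`
`nix-instantiate --eval --strict -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt.meta.platforms`
Adding `allowBroken = true; allowUnsupportedSystem = true;` to the config argument forces the instantiation through both gates if the package is worth trying anyway.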
SomeoneSerge (back on matrix) | Not sure, tensorrt isn't receiving enough love:) | 07:11:44 |
SomeoneSerge (back on matrix) | https://github.com/NixOS/nixpkgs/issues/323124 | 07:12:14 |
SomeoneSerge (back on matrix) | Jonas Chevalier hexa (UTC+1) a question about release-lib.nix: my impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs; for aarch64-linux, I'd like to define a matrix over individual capabilities because aarch64-linux mostly means embedded/jetson SBCs; currently this means importing nixpkgs with different config.cudaCapabilities values... any thoughts on how to express this in a not-too-ad-hoc way? | 18:07:33 |
connor (burnt/out) (UTC-8) | Kevin Mittman: is there any reason the TensorRT tarball exploded in size for the 10.2 release? It's clocking in at over 4GB, nearly twice the size it was for 10.1 (~2GB).
[connorbaker@nixos-desktop:~/cuda-redist-find-features]$ ./tensorrt/helper.sh 12.5 10.2.0.19 linux-x86_64
[582.9/4140.3 MiB DL] downloading 'https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz'
| 18:56:10 |
connor (burnt/out) (UTC-8) | The SBSA package only increased from 2398575314 to 2423326645 bytes (so still about 2GB) | 18:57:16 |
Kevin Mittman (UTC-8) | There are two CUDA variants, so it's more like 8GB total. The static .a is 3GB! I asked the same question and got "many new features" | 19:45:41 |
| 10 Jul 2024 |
Jonas Chevalier | In reply to @ss:someonex.net (the release-lib.nix question above) Patch release-lib.nix to add this logic:
nixpkgsArgs' = if builtins.isFunction nixpkgsArgs then nixpkgsArgs else (system: nixpkgsArgs);
Then replace all the nixpkgsArgs usages with (nixpkgsArgs' system).
| 10:03:33 |
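(To make the suggestion concrete, a sketch of what a call site might look like once release-lib.nix accepts a function; the file path, the supportedSystems list, and the capability strings ("7.2" for Jetson Xavier, "8.7" for Orin) are illustrative assumptions, not existing code:)
import ./pkgs/top-level/release-lib.nix {
  supportedSystems = [ "x86_64-linux" "aarch64-linux" ];
  # With the patch, nixpkgsArgs may be a function from system to import args.
  nixpkgsArgs = system: {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    } // (if system == "aarch64-linux" then {
      # Jetson-class SBCs: pin concrete capabilities instead of the generic set.
      cudaCapabilities = [ "7.2" "8.7" ];
    } else { });
  };
}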
| 11 Jul 2024 |
SomeoneSerge (back on matrix) | openai-triton broken with cuda+python3.12 😩 | 00:52:55 |
connor (burnt/out) (UTC-8) | [2/2137/2141 built (157 failed), 11826 copied (948537.2 MiB), 27313.5 MiB DL] building xyce-7.8.0 (checkPhase): MEASURE/PrecisionTest............................................passed[sh] (Time: 1s = 0.
💀
| 07:06:43 |
SomeoneSerge (back on matrix) | ROCm stuff cache-missing again | 12:39:44 |
SomeoneSerge (back on matrix) | https://hydra.nixos.org/job/nixpkgs/trunk/rocmPackages.rocsolver.x86_64-linux yeah getting a timeout locally as well | 12:40:02 |
SomeoneSerge (back on matrix) | Madoura I'm trying to build rocsolver again, but I already suspect it's going to time out again | 16:30:42 |
SomeoneSerge (back on matrix) | Madoura https://discourse.nixos.org/t/testing-gpu-compute-on-amd-apu-nixos/47060/2 this is falling apart 😹 | 23:16:45 |
| 13 Jul 2024 |
mcwitt | This might just be me being dumb, but I'm surprised that I'm unable to build jax with doCheck = false (my use case: I want to override jaxlib = jaxlibWithCuda without running the tests). Repro:
nix build --impure --expr 'let nixpkgs = builtins.getFlake("github:nixos/nixpkgs/nixpkgs-unstable"); pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; }; in pkgs.python3Packages.jax.overridePythonAttrs { doCheck = false; }'
fails with ModuleNotFoundError: jax requires jaxlib to be installed
| 01:58:54 |
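(A guess at the mechanism rather than a confirmed diagnosis: in buildPythonPackage, pythonImportsCheck runs independently of doCheck, and jax only sees jaxlib through its check-time dependencies, so the import check fails once the tests are disabled. A sketch of a workaround under that assumption:)
nix build --impure --expr 'let
  nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable";
  pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; };
in pkgs.python3Packages.jax.overridePythonAttrs (old: {
  doCheck = false;
  # The import check needs jaxlib, which is only a check input; drop it too.
  pythonImportsCheck = [ ];
})'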
mcwitt | Something else that's puzzling me: overriding with jaxlib = jaxlibWithCuda doesn't seem to work for numpyro (which has a pretty simple derivation).
Sanity check: a Python env with just jax and jaxlibWithCuda is GPU-enabled, as expected:
nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ps.jax ps.jaxlibWithCuda ]))' --command python -c "import jax; print(jax.devices())"
yields [cuda(id=0), cuda(id=1)].
But when numpyro is overridden to use jaxlibWithCuda, for some reason the propagated jaxlib is still the CPU version:
nix shell --refresh --impure --expr 'let nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable"; pkgs = import nixpkgs { config.allowUnfree = true; }; in (pkgs.python3.withPackages (ps: [ ((ps.numpyro.overridePythonAttrs (_: { doCheck = false; })).override { jaxlib = ps.jaxlibWithCuda; }) ]))' --command python -c "import jax; print(jax.devices())"
yields [CpuDevice(id=0)]. (Furthermore, if we try to add jaxlibWithCuda to the withPackages call, we get a collision error, so clearly something is propagating the CPU jaxlib 🤔)
Has anyone seen this, or is there a better way to use the GPU-enabled jaxlib as a dependency?
| 04:30:27 |
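(One explanation that fits the symptoms, offered as an assumption: numpyro's jax dependency still propagates the default CPU jaxlib, so the env ends up with two jaxlib candidates and the CPU one wins. The usual escape hatch is to override at the package-set level, so every consumer resolves to the same jaxlib; a sketch:)
nix shell --impure --expr 'let
  nixpkgs = builtins.getFlake "github:nixos/nixpkgs/nixpkgs-unstable";
  pkgs = import nixpkgs { config.allowUnfree = true; };
  # Swap jaxlib for the CUDA build across the whole Python package set,
  # so jax, numpyro, and friends all propagate the same store path.
  python = pkgs.python3.override {
    packageOverrides = self: super: { jaxlib = super.jaxlibWithCuda; };
  };
in python.withPackages (ps: [
  (ps.numpyro.overridePythonAttrs (_: { doCheck = false; }))
])' --command python -c "import jax; print(jax.devices())"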