!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

280 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers


Sender | Message | Time
11 Nov 2025
@glepage:matrix.orgGaétan Lepage yes. 21:47:41
@glepage:matrix.orgGaétan Lepage We are following nixos-unstable-small on https://cache.nixos.org 21:48:26
@daniel-fahey:matrix.orgDaniel Fahey Thanks, is that defined by https://github.com/NixOS/infra/blob/809088c05d04849e3660b22fa9e5bc895570c5fe/channels.nix#L32 ? 22:16:35
12 Nov 2025
@arilotter:matrix.orgAri Lotter Holy moly. 2 TB of RAM was not enough for JAX. Guess I ran too many jobs at once. 03:16:02
@arilotter:matrix.orgAri Lotter seems to have perma-stalled at building python3.12-jax-0.8.0 (pytestCheckPhase): replacing crashed worker gw3 03:16:19
@arilotter:matrix.orgAri Lotter will try again tomorrow :< 03:29:01
@arilotter:matrix.orgAri Lotter thanks JAX 03:29:02
@daniel-fahey:matrix.orgDaniel Fahey That entropy ain't gonna increase by itself 😜 2048 GiB / 128 cores = 2^11 GiB / 2^7 cores = 2^4 GiB per core = 16 GiB per core. So JAX must need more than that. So even a rig the price of a house can benefit from swap! 08:45:31
@glepage:matrix.orgGaétan Lepage If I remember correctly, I can build jax on 64 cores with 128 GB of RAM. 09:14:48
@daniel-fahey:matrix.orgDaniel Fahey

128 GiB / 64 cores = 2 GiB per core 🤔

Aha! "replacing crashed worker" appears in my logs shared in https://github.com/NixOS/nixpkgs/issues/445824

Must be a quirk with (some, server-grade) Intel CPUs then?

12:05:26
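The per-core figures in this exchange are plain integer division, assuming every core runs one job at the same time. A minimal shell sketch of that arithmetic follows; the function name gib_per_core is made up for illustration, and the 128 GB machine is treated as 128 GiB, as in the message above.

```shell
#!/bin/sh
# Memory available per core if every core is busy at once.
# Integer division only; sizes are in GiB.
gib_per_core() {
  local total_gib cores
  total_gib=$1
  cores=$2
  echo $(( total_gib / cores ))
}

gib_per_core 2048 128   # the 2 TiB, 128-core machine: prints 16
gib_per_core 128 64     # the 64-core, 128 GiB machine: prints 2
```

Since the machine with less memory per core reportedly builds fine, the raw ratio alone does not explain the crash; how many jobs and test workers run concurrently matters as well.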
@glepage:matrix.orgGaétan Lepage "replacing crashed worker" is the message pytest prints after one of its worker processes crashes. 12:12:09
@daniel-fahey:matrix.orgDaniel Fahey Yeah, could be for any reason, still seems a bit fishy 12:16:40
@glepage:matrix.orgGaétan Lepage Try with -j 8 for instance. I have never experienced flakiness when building jax, even though I have done it dozens of times. 12:17:35
@glepage:matrix.orgGaétan Lepage Never tried on Intel though... 12:17:44
@daniel-fahey:matrix.orgDaniel Fahey

I also only saw build failures with python3.12

Looks like this particular version is in the team cache, so it built fine

$ nix-build https://github.com/daniel-fahey/nixpkgs/archive/fix/python3Packages.vllm.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0

Latest build with the matching hash https://hydra.nixos-cuda.org/build/8123#tabs-buildsteps

atlas.nixos-cuda.org and https://github.com/nixos-cuda/infra/blob/f150adfab4863b131ef67fb97919fff949793995/hosts/atlas/hardware.nix#L28 suggest it's Intel?

🤔

12:32:38
@glepage:matrix.orgGaétan Lepage Then maybe some flakiness with your specific CPU... Sometimes things are weird. 12:37:39
@glepage:matrix.orgGaétan Lepage (atlas is configured with cores = 9, maybe this plays a role in building jax successfully) 12:38:10
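If the crash comes from too many parallel test workers competing for memory, one thing to try is capping parallelism for a local build. The sketch below is an illustration under assumptions, not the configuration atlas uses: jax_build and DRY_RUN are hypothetical names, --max-jobs (-j) limits how many derivations Nix builds at once, and --cores sets NIX_BUILD_CORES inside each build, which the nixpkgs Python test hooks are assumed to pass on to pytest-xdist as the worker count.

```shell
#!/bin/sh
# Build jax from nixos-unstable with capped parallelism.
# $1: number of derivations built in parallel (--max-jobs)
# $2: cores offered to each build (--cores, i.e. NIX_BUILD_CORES)
# With DRY_RUN set, the command is printed instead of executed.
jax_build() {
  local jobs cores
  jobs=$1
  cores=$2
  set -- nix-build https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz \
    --arg config '{ allowUnfree = true; cudaSupport = true; }' \
    --attr python312Packages.jax \
    --max-jobs "$jobs" --cores "$cores"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the command for one job at a time with 8 cores per build.
DRY_RUN=1 jax_build 1 8
```

This only has an effect when the path is actually built locally; if a binary cache already serves it, nothing is compiled or tested.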
@daniel-fahey:matrix.orgDaniel Fahey

The exact same store paths (same hashes) come out of the nixos-unstable and master branches too:

[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz' into the Git cache...
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0

[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/master.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/master.tar.gz' into the Git cache...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
12:38:15
@glepage:matrix.orgGaétan Lepage So you get a cache hit, all good then? 12:39:15
@daniel-fahey:matrix.orgDaniel Fahey

Yeah, Ari Lotter, might be a good idea to add this cache (as well as Flox's):

extra-trusted-substituters = https://cache.nixos-cuda.org
extra-trusted-public-keys = cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=
12:44:00
@daniel-fahey:matrix.orgDaniel Fahey

Yep, no cache hit with Flox:

[daniel@laptop:~]$ nix path-info --store https://cache.flox.dev /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
these 21 paths will be fetched (2586.20 MiB download, 4491.51 MiB unpacked):
  /nix/store/mx2c21i61q6mm21cr27h3kpz09z9j3ds-cuda12.8-cuda_cccl-12.8.90
  /nix/store/60bccal8rk5zm3nsxszvfvv6754imwcl-cuda12.8-cuda_cudart-12.8.90
  /nix/store/js94l573zp6a325irbymcpajr95r8011-cuda12.8-cuda_cupti-12.8.90-lib
  /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93
  /nix/store/pdjnbw4sa9f4mag54hxq8wrk5qidk6pn-cuda12.8-cudnn-9.13.0.50-lib
  /nix/store/1c0jcdqaf7pjf28jsizkysy6h1pj2048-cuda12.8-libcublas-12.8.4.1-lib
  /nix/store/x1pf46gpsy4s0b18598p4byagl15im89-cuda12.8-libcufft-11.3.3.83-lib
  /nix/store/2myxa089vbhxrls82nhhpi93gr68crwc-cuda12.8-libcusolver-11.7.3.90-lib
  /nix/store/lh80i7q850hgk6m55yfxzllhx0mcim88-cuda12.8-libcusparse-12.5.8.93-lib
  /nix/store/ajvrjfzbmmi2sarsf6xmjhd1ib1g6a8w-cuda12.8-libnvjitlink-12.8.93-lib
  /nix/store/kx6rjjpgybnxci4wfm0yq54zdm4qidnp-cuda12.8-nccl-2.28.7-1
  /nix/store/98qfxl63r5s3fa6q9dlaladsrb4pn8n1-python3.12-absl-py-2.3.1
  /nix/store/kmi3l0wdnkma0sjfbm6661jsy9957r5g-python3.12-flatbuffers-25.2.10
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
  /nix/store/cfav66cmsr83k0hf45pps5azhys6kfl8-python3.12-jaxlib-0.8.0
  /nix/store/ji79rzmqg5r66bkdhdglzfg9ji2lb32q-python3.12-ml-dtypes-0.5.3
  /nix/store/jir196c5rj03a561hzp4scmvv0xcivwn-python3.12-numpy-2.3.3
  /nix/store/p95s2s0id7n6lc7czsk5rg7j0qdy8853-python3.12-opt-einsum-3.4.0
  /nix/store/2ga29gq011y1wa7gkcqfc1az7cp1mkah-python3.12-scipy-1.16.2
error: path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' is not valid

[daniel@laptop:~]$ nix path-info --store https://cache.nixos-cuda.org /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
12:52:31
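The two path-info checks above can be wrapped in a small loop that asks several binary caches about the same store path. This is a rough sketch: check_caches is a hypothetical helper, and it relies only on the exit status of nix path-info --store, as used in the commands above.

```shell
#!/bin/sh
# Report, for one store path, which binary caches can serve it.
# $1: store path; remaining arguments: cache URLs.
check_caches() {
  local path cache
  path=$1
  shift
  for cache in "$@"; do
    # A zero exit status means the cache knows the path.
    if nix path-info --store "$cache" "$path" >/dev/null 2>&1; then
      echo "hit $cache"
    else
      echo "miss $cache"
    fi
  done
}

# Only query the network when nix is actually installed.
if command -v nix >/dev/null 2>&1; then
  check_caches /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0 \
    https://cache.flox.dev https://cache.nixos-cuda.org
fi
```

A "miss" here can also mean the cache was unreachable, so a miss is worth rechecking by hand before drawing conclusions.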
@daniel-fahey:matrix.orgDaniel Fahey(maybe they're having the same build issue, lol) I don't know enough about them tbh 12:58:09
@daniel-fahey:matrix.orgDaniel Fahey Flox are building from their own fork of Nixpkgs (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/). Their https://github.com/flox/nixpkgs/tree/unstable branch is ~10 days old, lol 13:06:13
@daniel-fahey:matrix.orgDaniel Fahey So much for the private sector 13:06:18
@sporeray:matrix.orgRobbie Buxton
In reply to @daniel-fahey:matrix.org
Flox are building from their own fork of Nixpkgs. (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/)

Their https://github.com/flox/nixpkgs/tree/unstable is ~10 days old, lol
Are you sure they aren’t just wrapping nix unstable with some hacks for their project?
15:23:23

Room Version: 9