| 11 Nov 2025 |
Gaétan Lepage | yes. | 21:47:41 |
Gaétan Lepage | We are following nixos-unstable-small on https://cache.nixos.org | 21:48:26 |
Daniel Fahey | thanks, is that defined by https://github.com/NixOS/infra/blob/809088c05d04849e3660b22fa9e5bc895570c5fe/channels.nix#L32 ? | 22:16:35 |
| 12 Nov 2025 |
Ari Lotter | holy moly. 2tb of ram was not enough for JAX. guess i ran too many jobs at once. | 03:16:02 |
Ari Lotter | seems to have perma-stalled at building python3.12-jax-0.8.0 (pytestCheckPhase): replacing crashed worker gw3 | 03:16:19 |
Ari Lotter | will try again tomorrow :< | 03:29:01 |
Ari Lotter | thanks JAX | 03:29:02 |
Daniel Fahey | That entropy ain't gonna increase by itself 😜
2048 GiB / 128 cores = 2^11 GiB / 2^7 cores = 2^4 GiB per core = 16 GiB per core
So JAX must need more than that. So even a rig the price of a house can benefit from swap! | 08:45:31 |
Gaétan Lepage | If I remember correctly, I can build jax on 64 cores with 128 GB of RAM | 09:14:48 |
Daniel Fahey | 128 GiB / 64 cores = 2 GiB per core 🤔
Aha! "replacing crashed worker" appears in my logs shared in https://github.com/NixOS/nixpkgs/issues/445824
Must be a quirk with (some, server grade) Intel CPUs then?
| 12:04:02 |
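The per-core figures quoted in this exchange are plain integer division. A minimal shell sketch, assuming one job per core and treating Gaétan's "128 GB" as 128 GiB as the chat does (the function name `gib_per_core` is made up for illustration):

```shell
# RAM available per core if every core runs one job at the same time.
# Arguments: total RAM in GiB, number of cores. Integer division only.
gib_per_core() {
  echo $(( $1 / $2 ))
}

gib_per_core 2048 128   # Ari's machine
gib_per_core 128 64     # Gaétan's reported jax build
```

The two calls print 16 and 2, so memory per core alone does not explain why the larger machine stalled.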
Gaétan Lepage | "replacing crashed worker" is pytest's message following one of its processes crashing. | 12:12:09 |
Daniel Fahey | Yeah, could be for any reason, still seems a bit fishy | 12:16:40 |
Gaétan Lepage | Try with -j 8 for instance. I have never experienced flakiness when building jax, even though I have done it dozens of times. | 12:17:35 |
Gaétan Lepage | Never tried on Intel though... | 12:17:44 |
Daniel Fahey | I also only saw build failures with python3.12
Looks like this particular version is in the team cache, so it built fine
$ nix-build https://github.com/daniel-fahey/nixpkgs/archive/fix/python3Packages.vllm.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
Latest build with the matching hash https://hydra.nixos-cuda.org/build/8123#tabs-buildsteps
atlas.nixos-cuda.org and https://github.com/nixos-cuda/infra/blob/f150adfab4863b131ef67fb97919fff949793995/hosts/atlas/hardware.nix#L28 suggests it's Intel?
🤔
| 12:32:38 |
Gaétan Lepage | Then maybe some flakiness with your specific CPU... Sometimes things are weird. | 12:37:39 |
Gaétan Lepage | (atlas is configured with cores = 9, maybe this plays a role in building jax successfully) | 12:38:10 |
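A rough sketch of what capping parallelism could look like, reusing the nix-build invocation from this chat. It assumes "-j 8" means Nix's `--max-jobs` and that atlas's "cores = 9" corresponds to `--cores` (which Nix exports to each build as NIX_BUILD_CORES). Whether the jax test suite sizes its pytest workers from that value is also an assumption here. `NIXPKGS_URL`, `build_jax` and `DRY_RUN` are illustrative names, not something from the discussion.

```shell
# Illustrative: nixpkgs tarball to build from (same URL as used in the chat).
NIXPKGS_URL=https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz

build_jax() {
  max_jobs=$1   # --max-jobs / -j: how many derivations build concurrently
  cores=$2      # --cores: passed to each build as NIX_BUILD_CORES
  set -- nix-build "$NIXPKGS_URL" --pure \
    --arg config '{ allowUnfree = true; cudaSupport = true; }' \
    --attr python312Packages.jax \
    --max-jobs "$max_jobs" --cores "$cores"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"   # print the command (shell quoting is lost) instead of running it
  else
    "$@"
  fi
}

# Show the command for 8 concurrent jobs with 9 cores each, without running it.
DRY_RUN=1 build_jax 8 9
```

Dropping `DRY_RUN=1` runs the build for real. Lowering either number trades build time for lower peak memory.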
Daniel Fahey | Same exact hashed versions declared from nixos-unstable and master branches too:
[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz' into the Git cache...
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/master.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/master.tar.gz' into the Git cache...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
| 12:38:15 |
Gaétan Lepage | So you get a cache hit, all good then? | 12:39:15 |
Daniel Fahey | Yeah, Ari Lotter, might be a good idea to add this cache (as well as Flox's):
extra-trusted-substituters = https://cache.nixos-cuda.org
extra-trusted-public-keys = cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=
| 12:44:00 |
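One caveat, hedged: as far as the Nix manual describes it, `trusted-substituters` only lists caches that users are allowed to opt into, while `substituters` (or `extra-substituters`) is what makes Nix query a cache by default. A sketch that appends the second form to a nix.conf-style file, using the URL and key quoted above (the helper name `add_cuda_cache` and the scratch-file approach are illustrative, and on NixOS the equivalent would normally go through the system configuration rather than a hand-edited file):

```shell
# Illustrative helper: append the CUDA cache settings to the given config file.
add_cuda_cache() {
  conf=$1
  cat >> "$conf" <<'EOF'
extra-substituters = https://cache.nixos-cuda.org
extra-trusted-public-keys = cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=
EOF
}

# Try it on a scratch file first rather than the real nix.conf.
tmp=$(mktemp)
add_cuda_cache "$tmp"
cat "$tmp"
rm -f "$tmp"
```

A user who is not trusted by the Nix daemon may still need the `extra-trusted-substituters` line from the message above set by an administrator.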
Daniel Fahey | Yep, no cache hit with Flox:
[daniel@laptop:~]$ nix path-info --store https://cache.flox.dev /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
these 21 paths will be fetched (2586.20 MiB download, 4491.51 MiB unpacked):
/nix/store/mx2c21i61q6mm21cr27h3kpz09z9j3ds-cuda12.8-cuda_cccl-12.8.90
/nix/store/60bccal8rk5zm3nsxszvfvv6754imwcl-cuda12.8-cuda_cudart-12.8.90
/nix/store/js94l573zp6a325irbymcpajr95r8011-cuda12.8-cuda_cupti-12.8.90-lib
/nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93
/nix/store/pdjnbw4sa9f4mag54hxq8wrk5qidk6pn-cuda12.8-cudnn-9.13.0.50-lib
/nix/store/1c0jcdqaf7pjf28jsizkysy6h1pj2048-cuda12.8-libcublas-12.8.4.1-lib
/nix/store/x1pf46gpsy4s0b18598p4byagl15im89-cuda12.8-libcufft-11.3.3.83-lib
/nix/store/2myxa089vbhxrls82nhhpi93gr68crwc-cuda12.8-libcusolver-11.7.3.90-lib
/nix/store/lh80i7q850hgk6m55yfxzllhx0mcim88-cuda12.8-libcusparse-12.5.8.93-lib
/nix/store/ajvrjfzbmmi2sarsf6xmjhd1ib1g6a8w-cuda12.8-libnvjitlink-12.8.93-lib
/nix/store/kx6rjjpgybnxci4wfm0yq54zdm4qidnp-cuda12.8-nccl-2.28.7-1
/nix/store/98qfxl63r5s3fa6q9dlaladsrb4pn8n1-python3.12-absl-py-2.3.1
/nix/store/kmi3l0wdnkma0sjfbm6661jsy9957r5g-python3.12-flatbuffers-25.2.10
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
/nix/store/cfav66cmsr83k0hf45pps5azhys6kfl8-python3.12-jaxlib-0.8.0
/nix/store/ji79rzmqg5r66bkdhdglzfg9ji2lb32q-python3.12-ml-dtypes-0.5.3
/nix/store/jir196c5rj03a561hzp4scmvv0xcivwn-python3.12-numpy-2.3.3
/nix/store/p95s2s0id7n6lc7czsk5rg7j0qdy8853-python3.12-opt-einsum-3.4.0
/nix/store/2ga29gq011y1wa7gkcqfc1az7cp1mkah-python3.12-scipy-1.16.2
error: path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' is not valid
[daniel@laptop:~]$ nix path-info --store https://cache.nixos-cuda.org /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
| 12:52:31 |
Daniel Fahey | (maybe they're having the same build issue, lol) I don't know enough about them tbh
| 12:58:09 |
Daniel Fahey | Flox are building from their own fork of Nixpkgs. (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/)
Their https://github.com/flox/nixpkgs/tree/unstable is ~10 days old, lol | 13:06:13 |
Daniel Fahey | So much for the private sector | 13:06:18 |
Robbie Buxton | In reply to @daniel-fahey:matrix.org: "Flox are building from their own fork of Nixpkgs. (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/)
Their https://github.com/flox/nixpkgs/tree/unstable is ~10 days old, lol"
Are you sure they aren’t just wrapping nix unstable with some hacks for their project? | 15:23:23 |