!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

276 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers

12 Nov 2025
@glepage:matrix.org Gaétan Lepage: "replacing crashed worker" is pytest's message following one of its processes crashing. 12:12:09
@daniel-fahey:matrix.org Daniel Fahey: Yeah, could be for any reason, still seems a bit fishy 12:16:40
@glepage:matrix.org Gaétan Lepage: Try with -j 8 for instance. I have never experienced flakiness when building jax, even though I have done it dozens of times. 12:17:35
@glepage:matrix.org Gaétan Lepage: Never tried on Intel though... 12:17:44
@daniel-fahey:matrix.org Daniel Fahey

I also only saw build failures with python3.12

Looks like this particular version is in the team cache, so it built fine

$ nix-build https://github.com/daniel-fahey/nixpkgs/archive/fix/python3Packages.vllm.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0

Latest build with the matching hash https://hydra.nixos-cuda.org/build/8123#tabs-buildsteps

atlas.nixos-cuda.org, whose hardware config at https://github.com/nixos-cuda/infra/blob/f150adfab4863b131ef67fb97919fff949793995/hosts/atlas/hardware.nix#L28 suggests it's Intel?

🤔

12:32:38
@glepage:matrix.org Gaétan Lepage: Then maybe some flakiness with your specific CPU... Sometimes things are weird. 12:37:39
@glepage:matrix.org Gaétan Lepage: (atlas is configured with cores = 9, maybe this plays a role in building jax successfully) 12:38:10
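On the "-j 8" / "cores = 9" point: in Nix, -j (--max-jobs) caps how many derivations build at the same time, while --cores sets NIX_BUILD_CORES inside each build. A hedged sketch, assuming (not confirmed in this thread) that the jax test phase sizes its pytest-xdist worker pool from NIX_BUILD_CORES, so on a machine with many cores --cores would be the setting that shrinks the number of workers:

```shell
# Same invocation as in the thread, but with at most 8 cores per build and one build at a time.
# --cores only matters if jax is actually built locally; a cache hit skips pytestCheckPhase entirely.
nix-build https://github.com/NixOS/nixpkgs/archive/master.tar.gz \
  --arg config '{ allowUnfree = true; cudaSupport = true; }' \
  --attr python312Packages.jax \
  --cores 8 --max-jobs 1
```

The persistent equivalent is the cores setting in nix.conf, which may be what "cores = 9" on atlas refers to.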
@daniel-fahey:matrix.org Daniel Fahey

Same exact hashed versions declared from nixos-unstable and master branches too:

[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/nixos-unstable.tar.gz' into the Git cache...
these 3 paths will be fetched (443.62 MiB download, 443.62 MiB unpacked):
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
copying path '/nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0' from 'https://cache.nixos-cuda.org'...
copying path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' from 'https://cache.nixos-cuda.org'...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0

[daniel@laptop:~]$ nix-build https://github.com/NixOS/nixpkgs/archive/master.tar.gz --pure --arg config '{ allowUnfree = true; cudaSupport = true; }' --attr python312Packages.jax --builders ''
unpacking 'https://github.com/NixOS/nixpkgs/archive/master.tar.gz' into the Git cache...
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
12:38:15
@glepage:matrix.org Gaétan Lepage: So you get a cache hit, all good then? 12:39:15
@daniel-fahey:matrix.org Daniel Fahey

Yeah, Ari Lotter, might be a good idea to add this cache (as well as Flox's):

extra-trusted-substituters = https://cache.nixos-cuda.org
extra-trusted-public-keys = cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=
12:44:00
@daniel-fahey:matrix.org Daniel Fahey

Yep, no cache hit with Flox:

[daniel@laptop:~]$ nix path-info --store https://cache.flox.dev /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
these 21 paths will be fetched (2586.20 MiB download, 4491.51 MiB unpacked):
  /nix/store/mx2c21i61q6mm21cr27h3kpz09z9j3ds-cuda12.8-cuda_cccl-12.8.90
  /nix/store/60bccal8rk5zm3nsxszvfvv6754imwcl-cuda12.8-cuda_cudart-12.8.90
  /nix/store/js94l573zp6a325irbymcpajr95r8011-cuda12.8-cuda_cupti-12.8.90-lib
  /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93
  /nix/store/pdjnbw4sa9f4mag54hxq8wrk5qidk6pn-cuda12.8-cudnn-9.13.0.50-lib
  /nix/store/1c0jcdqaf7pjf28jsizkysy6h1pj2048-cuda12.8-libcublas-12.8.4.1-lib
  /nix/store/x1pf46gpsy4s0b18598p4byagl15im89-cuda12.8-libcufft-11.3.3.83-lib
  /nix/store/2myxa089vbhxrls82nhhpi93gr68crwc-cuda12.8-libcusolver-11.7.3.90-lib
  /nix/store/lh80i7q850hgk6m55yfxzllhx0mcim88-cuda12.8-libcusparse-12.5.8.93-lib
  /nix/store/ajvrjfzbmmi2sarsf6xmjhd1ib1g6a8w-cuda12.8-libnvjitlink-12.8.93-lib
  /nix/store/kx6rjjpgybnxci4wfm0yq54zdm4qidnp-cuda12.8-nccl-2.28.7-1
  /nix/store/98qfxl63r5s3fa6q9dlaladsrb4pn8n1-python3.12-absl-py-2.3.1
  /nix/store/kmi3l0wdnkma0sjfbm6661jsy9957r5g-python3.12-flatbuffers-25.2.10
  /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
  /nix/store/78kfvx2q26r5053pkp4g9f9y41hc99xm-python3.12-jax-cuda12-pjrt-0.8.0
  /nix/store/3a5iw8yqhsc5x16wllsanyyyzqm3xmvd-python3.12-jax-cuda12-plugin-0.8.0
  /nix/store/cfav66cmsr83k0hf45pps5azhys6kfl8-python3.12-jaxlib-0.8.0
  /nix/store/ji79rzmqg5r66bkdhdglzfg9ji2lb32q-python3.12-ml-dtypes-0.5.3
  /nix/store/jir196c5rj03a561hzp4scmvv0xcivwn-python3.12-numpy-2.3.3
  /nix/store/p95s2s0id7n6lc7czsk5rg7j0qdy8853-python3.12-opt-einsum-3.4.0
  /nix/store/2ga29gq011y1wa7gkcqfc1az7cp1mkah-python3.12-scipy-1.16.2
error: path '/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0' is not valid

[daniel@laptop:~]$ nix path-info --store https://cache.nixos-cuda.org /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
/nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0
12:52:31
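The two nix path-info --store checks above can also be done with plain HTTP. A minimal sketch, assuming both caches follow the standard Nix HTTP binary cache layout, where a path's metadata lives at <cache-url>/<hash>.narinfo (the hash being the 32-character prefix of the store path name), and that curl is installed. The function names store_hash, narinfo_url and in_cache are hypothetical helpers, not part of any Nix tooling.

```shell
#!/usr/bin/env bash
# Sketch: ask a binary cache whether it has a store path, without involving the Nix daemon.
# Assumption: the cache uses the standard Nix HTTP layout, <cache-url>/<hash>.narinfo.

# Print the hash part of a store path (everything before the first "-" in its basename).
store_hash() {
  local base
  base=$(basename "$1")
  printf '%s\n' "${base%%-*}"
}

# Print the .narinfo URL for a store path on a given cache (a trailing slash on the URL is ignored).
narinfo_url() {
  printf '%s/%s.narinfo\n' "${1%/}" "$(store_hash "$2")"
}

# Exit 0 if the cache serves the .narinfo (a hit); non-zero on an HTTP error such as 404, or a network failure.
in_cache() {
  curl --silent --fail --output /dev/null "$(narinfo_url "$1" "$2")"
}
```

For example, in_cache https://cache.nixos-cuda.org /nix/store/s2w4h19yylw9ls7q84j8bd1md62kcrzh-python3.12-jax-0.8.0 && echo hit. Fetching the small .narinfo directly avoids any store-path validity checks on the local side.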
@daniel-fahey:matrix.org Daniel Fahey: (maybe they're having the same build issue, lol) I don't know enough about them tbh 12:58:09
@daniel-fahey:matrix.org Daniel Fahey: Flox are building from their own fork of Nixpkgs (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/). Their https://github.com/flox/nixpkgs/tree/unstable is ~10 days old, lol 13:06:13
@daniel-fahey:matrix.org Daniel Fahey: So much for the private sector 13:06:18
@sporeray:matrix.org Robbie Buxton
In reply to @daniel-fahey:matrix.org
Flox are building from their own fork of Nixpkgs. (according to https://flox.dev/blog/the-flox-catalog-now-contains-nvidia-cuda/)

Their https://github.com/flox/nixpkgs/tree/unstable is ~10 days old, lol
Are you sure they aren’t just wrapping nix unstable with some hacks for their project?
15:23:23
@arilotter:matrix.org Ari Lotter: allllllright let's try nixpkgs-review again with the new binary cache :p 15:30:38
@arilotter:matrix.org Ari Lotter: still building jax, but, uhhhh, we ball 15:40:02
@glepage:matrix.org Gaétan Lepage: Which PR? 15:56:37
@arilotter:matrix.org Ari Lotter: this one https://github.com/NixOS/nixpkgs/pull/460701 15:57:23
@arilotter:matrix.org Ari Lotter: yep, workers keep crashing :/
] building python3.12-jax-0.8.0 (pytestCheckPhase): replacing crashed worker gw1
15:57:38
@arilotter:matrix.org Ari Lotter: hm,
warning: ignoring the client-specified setting 'sandbox', because it is a restricted setting and you are not a trusted user
warning: ignoring the client-specified setting 'system', because it is a restricted setting and you are not a trusted user
would setting myself as a trusted user fix this, i wonder
16:02:06
@daniel-fahey:matrix.org Daniel Fahey: Yeah you might want to use extra-substituters (I never grok'd the difference, if I'm being honest) 16:06:51
@daniel-fahey:matrix.org Daniel Fahey: not sure if it's different with Nix on Ubuntu, but on NixOS, I have to rebuild the system before the binary cache is available. Is there a rebuild step with plain Nix? 16:08:28
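For the two open questions above (trusted users, and the "rebuild step" on plain Nix), a hedged sketch for a multi-user Nix install on a systemd distro such as Ubuntu. Assumptions: the daemon reads /etc/nix/nix.conf and runs as the nix-daemon unit. In Nix's configuration model, substituters are the caches actually queried, whereas trusted-substituters only lists caches that untrusted users are allowed to enable, so extra-trusted-substituters on its own does not turn a cache on. Making a user trusted also lets the daemon honor client-specified settings like the ones in the warnings above, but the Nix manual treats it as roughly equivalent to root access.

```shell
# Append to the daemon's config; the heredoc is expanded by your shell before sudo runs, so $USER names you.
sudo tee -a /etc/nix/nix.conf <<EOF
extra-trusted-users = $USER
extra-substituters = https://cache.nixos-cuda.org
extra-trusted-public-keys = cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=
EOF

# Outside NixOS there is no system rebuild; restarting the daemon makes it re-read nix.conf.
sudo systemctl restart nix-daemon
```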
@daniel-fahey:matrix.org Daniel Fahey

I reckon they might be in a private repo hinted at in https://github.com/flox/nixpkgs/pull/3#issuecomment-1276439899

But the https://github.com/flox/nixpkgs/tree/unstable is a simple fork that I'd like to see sync'd/rebased from upstream Nixpkgs more frequently.

Gaétan Lepage, is this the kind of thing that could be discussed in your CUDA Team meetings, and maybe brought up to the Steering Committee for discussion with Flox?

16:12:16
@glepage:matrix.org Gaétan Lepage: Flox is managing their cache internally. As you pointed out, they use an internal fork of nixpkgs that lags slightly behind nixos-unstable (or nixos-unstable-small). It's normal that their cache is less fresh than cache.nixos-cuda.org. 16:13:41
@glepage:matrix.org Gaétan Lepage: The difference is that they have permission from Nvidia to redistribute Nvidia's binaries. 16:14:01
@daniel-fahey:matrix.org Daniel Fahey: 😅 16:14:45
@arilotter:matrix.org Ari Lotter: i have it in both because i don't understand it <3 16:15:52
@arilotter:matrix.org Ari Lotter: also - 100% sure i'm not running out of ram anymore, at only ~400gb/2tb used on the machine, but jax still has crashed workers - and not sure if it's progressing 16:17:00
@arilotter:matrix.org Ari Lotter: can i pull up interactive logs for its derivation somehow? 16:17:11

Room Version: 9