!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

336 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers


Sender Message Time
24 May 2026
@hexa:lossy.network hexa https://isaiprofitable.com/ lmao 16:40:49
@hexa:lossy.network hexa well played, nvidia 16:40:59
@glepage:matrix.org Gaétan Lepage Yup, will do. 19:30:48
@glepage:matrix.org Gaétan Lepage https://github.com/nixos-cuda/hydra-jobsets/pull/31 20:13:09
27 May 2026
@glepage:matrix.org Gaétan Lepage

If anyone has a decent modern GPU to run the flash-attention tests, please ping me.
The CUDA team's infra is not sufficient:

python3.13-flash-attention> FAILED tests/losses/test_cross_entropy.py::test_cross_entropy_loss[128256-0.9-0.7-True-0.01-True-False-dtype2] - torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1002.00 MiB. GPU 0 has a total capacity of 19.55 GiB of which 360.38 MiB is free. Including non-PyTorch memory, this process has 19.19 GiB memory in use. Of the allocated memory 18.10 GiB is allocated by PyTorch, and 925.39 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)

Thanks in advance for your generosity

15:24:45
@berrij:fairydust.space BerriJ Would an RTX 6000 Pro with 96GB VRAM be okay? If yes, I could run these tests, but I would need relatively detailed instructions. I'm running a flake-based system based on nixos-unstable and I'm running the "latest" Nvidia drivers. 15:49:37
@glepage:matrix.org Gaétan Lepage

I'm pretty sure that would fit. Thanks a lot!

You'd need to add the following to your config:

      programs.nix-required-mounts = {
        enable = true;
        presets.nvidia-gpu.enable = true;
      };

Then

nix build github:GaetanLepage/nixpkgs/flash-attn#python3Packages.flash-attn.gpuCheck --cores 10
16:00:23
@glepage:matrix.org Gaétan Lepage Watch out for RAM consumption though. It's terribly hungry. I need to set `--cores` to 15 max on a 128GB system. 16:01:16
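
The limit of 15 mentioned above presumably refers to `--cores`. As a rough sketch, the same cap could be pinned in the NixOS configuration instead of being passed on every invocation. The value 15 is taken from the message for a 128GB machine; `max-jobs = 1` is an extra assumption not mentioned in the chat, added so that only one such build runs at a time:

```nix
{
  nix.settings = {
    # Upper bound on the cores each individual build may use (NIX_BUILD_CORES).
    cores = 15;
    # Assumption: run a single build at a time so memory use is not multiplied.
    max-jobs = 1;
  };
}
```

A `--cores` flag given on the command line still takes precedence over this setting for that invocation.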
@glepage:matrix.org Gaétan Lepage Hmm. Wait, you need to set cudaSupport. 16:02:39
@berrij:fairydust.space BerriJ I could also jump into a dev shell if you provide me a flake, if that's easier. Anyway, I can try when I'm back home in about an hour. And the machine in question has 760GB of RAM, so we should be fine I guess 😇 16:04:05
@hexa:lossy.network hexa in this economy?! 16:04:38
@glepage:matrix.org Gaétan Lepage
nix build --impure --cores 2 --expr '
    (import (builtins.getFlake "github:GaetanLepage/nixpkgs/flash-attn") {
      system = builtins.currentSystem;
      config = { allowUnfree = true; cudaSupport = true; };
    }).python3Packages.flash-attn.gpuCheck
  '

This should do it.

16:05:28
@glepage:matrix.org Gaétan Lepage *
nix build --impure --expr '
    (import (builtins.getFlake "github:GaetanLepage/nixpkgs/flash-attn") {
      system = builtins.currentSystem;
      config = { allowUnfree = true; cudaSupport = true; };
    }).python3Packages.flash-attn.gpuCheck
  '

This should do it.

16:05:59
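
For the flake-based workflow BerriJ asked about, the same expression could be wrapped in a small flake. This is only a sketch: the `x86_64-linux` system and the choice to expose the check as the default package are assumptions, not something posted in the room.

```nix
{
  # Same branch as in the command above.
  inputs.nixpkgs.url = "github:GaetanLepage/nixpkgs/flash-attn";

  outputs = { self, nixpkgs }:
    let
      # Assumption: the GPU machine is x86_64-linux.
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true;
          cudaSupport = true;
        };
      };
    in
    {
      # `nix build` on this flake would then build the GPU test derivation.
      packages.${system}.default = pkgs.python3Packages.flash-attn.gpuCheck;
    };
}
```

With this saved as `flake.nix`, a plain `nix build` should build the same derivation as the `--impure --expr` invocation, without needing `--impure`. The host still needs the `programs.nix-required-mounts` settings for the GPU to be visible inside the build sandbox.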
@berrij:fairydust.space BerriJ
In reply to @hexa:lossy.network
in this economy?!
It's not my private one unfortunately 😅
But I'm the admin and currently there is no workload on that thing.
16:07:57
@glepage:matrix.org Gaétan Lepage

I mean... If only I had nix installed...

root@p4-r01-ct18:~# nvidia-smi
Wed May 27 16:10:23 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.21             Driver Version: 580.126.21     CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GB200                   On  |   00000008:01:00.0 Off |                    0 |
| N/A   45C    P0            170W / 1200W |       0MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GB200                   On  |   00000009:01:00.0 Off |                    0 |
| N/A   45C    P0            153W / 1200W |       0MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GB200                   On  |   00000018:01:00.0 Off |                    0 |
| N/A   45C    P0            153W / 1200W |       0MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GB200                   On  |   00000019:01:00.0 Off |                    0 |
| N/A   45C    P0            176W / 1200W |       0MiB / 189471MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
16:10:33
@hexa:lossy.network hexa makes you wonder who we are building cuda support for 16:12:00
@glepage:matrix.org Gaétan Lepage Not for the owners of those GPUs unfortunately 🥲 16:20:08
@berrij:fairydust.space BerriJ In my case I'm working at a German university and the server is used by a team of around 9 researchers :) 16:33:56
@hexa:lossy.network hexa pretty sure Gaetan works at some French university 😆 16:37:26
@glepage:matrix.org Gaétan Lepage Not anymore. (French universities don't have such fancy GPUs) 🫠 16:38:38
@berrij:fairydust.space BerriJ The build is running now :) 17:16:42
@ss:someonex.net SomeoneSerge (matrix works sometimes) Can't you nix in container? 18:26:52
@ss:someonex.net SomeoneSerge (matrix works sometimes) Not TUM? 18:27:57
@ss:someonex.net SomeoneSerge (matrix works sometimes) Not the OS group? I'd be hyped to learn that somebody in the academia/HPC/RSE community actually uses nixpkgs CUDA, because so far I've been getting the vibes that only the enterprise cares, and all these EuroHPC/CSC/yada yada are completely unapproachable and dead set on their EasyBuild/Lmod workflows... 18:33:29
@berrij:fairydust.space BerriJ University of Duisburg-Essen, not TUM. But it's really not that big of a deal. The economics faculty has its own little IT department. They bought some servers for machine learning, of which our chair was able to get one, and we asked them to install NixOS on it for us because we have been using NixOS on all of our machines for 2 years. That's essentially the full story. There is not much support for NixOS besides me pushing it and my boss seeing the advantages and sometimes proudly talking about our infra 😅 18:55:53
@ss:someonex.net SomeoneSerge (matrix works sometimes) Shooting in the dark, but is there anything that could be done or reprioritized on our side to potentially help the lab's story? 19:48:31
@berrij:fairydust.space BerriJ Well, the biggest point is the cache. Currently we obtain PyTorch and other ML packages from PyPI because it has the CUDA binaries packaged directly. We can't really risk getting cache misses and triggering a 5 hour recompilation on my colleagues' machines. And setting up our own binary cache is also not trivial: we are working from home a lot and the machines are only connected to the university VPN on demand. I've read that there is this Flox cache now, but I also read that it does not strictly follow nixos-unstable. 20:35:31
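
If a binary cache carrying CUDA-enabled builds were available, pointing a machine at it would be a small configuration change. A sketch with placeholder values: the URL and public key below are hypothetical and would have to be replaced by those of a real cache.

```nix
{
  nix.settings = {
    # Hypothetical cache URL; the NixOS module normally keeps cache.nixos.org in the list.
    substituters = [ "https://cuda-cache.example.org" ];
    # Hypothetical public key; use the key published by the cache operator.
    trusted-public-keys = [ "cuda-cache.example.org-1:REPLACE_WITH_REAL_PUBLIC_KEY" ];
  };
}
```

This does not by itself remove the risk of cache misses: the cache has to have built the exact nixpkgs revision and configuration in use, otherwise Nix falls back to building locally.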
@berrij:fairydust.space BerriJ By the way, the build is still running. It's at the `pytestCheckPhase` of flash-attention and is using a good 60GB of VRAM at the moment. I'll call it a day and report on the status tomorrow morning 🙂 20:37:16
@busti:leitstelle511.net left the room. 21:16:57
28 May 2026
@glepage:matrix.org Gaétan Lepage CUDA 13.3 is out: https://developer.nvidia.com/blog/nvidia-cuda-13-3-enhances-gpu-development-with-tile-programming-in-c-compiler-autotuning-and-python-updates/ 07:16:08
