| 5 Oct 2025 |
Gaétan Lepage | Now, maybe, bumping it nonetheless could fix the issue. | 21:53:40 |
Daniel Fahey | v0.11.0 uses CUTLASS v4.0.0, their next version will bump it | 21:56:35 |
Daniel Fahey | see https://github.com/vllm-project/vllm/commit/5234dc74514a6b3d0740b39f56a4a4208ec86ecc (part of https://github.com/vllm-project/vllm/pull/24673) | 21:58:06 |
Daniel Fahey | we could sort of "backport" the version though, yeah | 21:58:23 |
Daniel Fahey | as a fix | 21:58:27 |
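A "backport" along those lines might look roughly like the overlay below. This is a hypothetical sketch: the attribute name (`cutlass`) and the assumption that the nixpkgs vllm expression exposes its vendored CUTLASS source in an overridable way are guesses — check the actual vllm derivation in nixpkgs before relying on this.

```nix
# Hypothetical overlay: swap the CUTLASS source that vllm builds against.
# ASSUMPTION: the vllm derivation takes its CUTLASS checkout from a
# `cutlass` attribute that overrideAttrs can replace; verify against the
# real expression in nixpkgs first.
final: prev: {
  vllm = prev.vllm.overrideAttrs (old: {
    cutlass = prev.fetchFromGitHub {
      owner = "NVIDIA";
      repo = "cutlass";
      rev = "v4.2.1";
      # Placeholder: replace with the real hash after the first
      # (intentionally failing) fetch reports it.
      hash = prev.lib.fakeHash;
    };
  });
}
```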
Gaétan Lepage | staging-next was merged a few hours ago.
Apparently, new CUDA tests are failing: https://hydra.nixos-cuda.org/eval/267#tabs-now-fail
cc connor (he/him) (UTC-7) SomeoneSerge (back on matrix) | 22:28:21 |
Daniel Fahey | draft PR, my machine's building it, will take a few hours, will check in the morning
https://github.com/NixOS/nixpkgs/pull/448965 | 22:35:48 |
lon | Interesting, when I was building vllm 0.11 yesterday I mistakenly took 4.2.1 from CMakeLists.txt in master, and I've been running inference with it since. I have cudaCapabilities 8.9, an "old" 4090 w/ 24 GB; compiling takes ~45 min on my i9-13900. IMO, instead of disabling the build for sm100 I'd rather bump CUTLASS | 23:26:51 |
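For reference, restricting a nixpkgs evaluation to a single capability as described above looks roughly like this — a minimal sketch, assuming nixpkgs is imported directly (under NixOS the same options go under `nixpkgs.config`):

```nix
# Sketch: build CUDA packages for sm_89 only (Ada, e.g. RTX 4090).
# With a single capability, kernels for sm100 etc. are never compiled.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "8.9" ];
  };
}
```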
lon | Indeed. My current config, which tries to steer it away from downloading models while reasonably "caging it" in systemd, has so many variables. I'm curious how you run vllm; here's a "Claude Code-extracted" version of what I use on my machines: https://gist.github.com/longregen/e8146a3e34fb7f114b2da43ffa0d8023#file-configuration-nix-L25
| 23:36:40 |
| 6 Oct 2025 |
Daniel Fahey | Wow, this is great to see, personal AI for the people! Thanks for sharing, I'll definitely be referring to it | 06:20:02 |
SomeoneSerge (back on matrix) | RE: Diffing for release-cuda.nix
Just chatted with Gaétan Lepage about "checked-in lists vs IFD vs pure eval diffing". Previously expressed my feelings in the context of ROCm here: https://github.com/NixOS/nixpkgs/pull/446976#issuecomment-3353986656. TL;DR: no diffing > pure eval > IFD > checked-in codegen lists (although vcunat suggests no-diffing may be infeasible)
connor (he/him) (UTC-7)
| 09:56:44 |
Lun | Is there an example of acceptably fast diffing around somewhere?
I landed on checked-in diff because I couldn't work out how to make it fast, and hexa had already tried a no-diff jobset. | 10:18:21 |
SomeoneSerge (back on matrix) |
acceptably fast
Not that I'm aware of.
Gaétan pointed to ci/eval/compare (would need adjustment) for a derivation-level solution. I was thinking of building on top of release-lib.nix.
This eval-level flat-list routine I'd describe as "painfully slow": https://github.com/SomeoneSerge/nixpkgs-cuda-ci/blob/abee609531807217495cd15e6ced14ad0dee5d18/nix/utils.nix#L73-L85. Probably could be made less sequential.
| 10:24:41 |
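The eval-level flat-list approach mentioned above can be sketched as follows — a hypothetical illustration, not the linked routine itself: flatten two evaluated package sets into name-to-drvPath maps, then keep the names whose drvPath changed or is new. `pkgsBefore`/`pkgsAfter` are assumed names for two nixpkgs instantiations.

```nix
# Sketch of eval-level diffing between two nixpkgs evaluations.
# Forcing drvPath per attribute is what makes this approach slow and
# mostly sequential; tryEval skips packages that fail to evaluate
# (via throw/assert -- it cannot catch every kind of failure).
{ lib, pkgsBefore, pkgsAfter }:
let
  toDrvPaths = pkgs:
    lib.concatMapAttrs
      (name: value:
        let res = builtins.tryEval
          (if lib.isDerivation value then value.drvPath else null);
        in if res.success && res.value != null
           then { ${name} = res.value; }
           else { })
      pkgs;
  before = toDrvPaths pkgsBefore;
  after = toDrvPaths pkgsAfter;
in
  # Attributes whose derivation changed, plus newly added ones.
  lib.filterAttrs (name: drv: (before.${name} or null) != drv) after
```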
Daniel Fahey | Build fails with a simple CUTLASS bump https://github.com/NixOS/nixpkgs/pull/448965#issuecomment-3370979611
I suspect yours succeeded because you're using cudaCapabilities with only 8.9?
| 10:49:58 |
lon | Yes, probably that! | 11:20:56 |
connor (he/him) | I made a faster diffing thing (but it requires a fair amount of memory): https://github.com/ConnorBaker/nix-nixpkgs-review | 14:33:23 |
connor (he/him) | As an example:
nix build -L .#diffs.x86_64-linux.pkgs-pre-pkgs-cuda-pre --build-dir /run/temp-ramdisk --builders '' --override-input nixpkgs-pre github:NixOS/nixpkgs
will evaluate a copy of nixpkgs using the nixpkgs-pre input without CUDA enabled and with CUDA enabled, and then diff the results (each step happens in a separate derivation so there's caching)
It's IO- and memory-hungry though: IO because it's instantiating ~1.5 GB worth of derivations, memory because it's evaluating all of Nixpkgs in a single pass
I've written it so it uses DetSys' parallel eval as well
| 14:39:10 |
connor (he/him) | Here's the result of that command: https://gist.github.com/ConnorBaker/b1bbb3547d6c15921843ba0e048f94fd | 14:41:08 |
connor (he/him) | When the evaluations of Nixpkgs instantiations are done in the derivations, the --eval-store argument is set to the evalStore output so we can keep the derivations around. The entries in the packages output of the flake are small wrapper scripts which run a nix build using the added and changed derivations; the evalStore outputs are used as extra substituters, so derivations are copied as needed into the store and we avoid doing evaluation again | 14:44:52 |
connor (he/him) | Anyway, I built that because I didn't have a way to run nixpkgs-review with content-addressed derivations and got irritated that it kept evaluating the base commit of PRs that hadn't changed (all it needed to do was re-evaluate the head of the PR). | 14:48:28 |
connor (he/him) | (Using the scripts in packages does require the read-only-local-store feature to be enabled, since the evalStore outputs from the reports are just small instances of Nix stores inside the Nix store, so they need to be mounted as read-only.) | 14:51:17 |