NixOS CUDA - Public Room Timeline

	NixOS CUDA	274 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	55 Servers

Sender	Message	Time
4 Nov 2025
Robbie Buxton	V2 or v3	22:24:27
Robbie Buxton	And from what git tag?	22:24:51
Ari Lotter	v2, from tag `v2.8.2`	22:29:50
Robbie Buxton	I think there is currently a pr open in nixpkgs to add this, is that the one you’re building?	22:30:41
Ari Lotter	oh neat, no	22:31:37
Ari Lotter	let me compare my derivation with that one	22:31:40
Ari Lotter	ok yeah, decently similar. difference is i'm building against cutlass 4.0 instead of 4.1, and.. somehow my deps list is wayy simpler, yet the build works (on previous versions of my derivation, pre updating CUDA)? very strange..	22:35:13
Ari Lotter	but yeah i just smash into ``build/lib.linux-x86_64-cpython-312/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: PC-relative offset overflow in PLT entry for `_ZNK3c1010TensorImpl4sizeEl'`` 🤷	22:35:28
Ari Lotter	i'm so tired of CUDA nightmares 😭 im so close to giving up and building dockerized devenvs, i just really don't want to give in..... :(	22:37:57
Gaétan Lepage	(It's a secret, but you might want to add `https://cache.nixos-cuda.org` as a substituter, it is slowly getting more and more artifacts) Public key: `cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M=`	22:44:02
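
For readers on NixOS, the substituter and key quoted above could be wired in roughly as follows. This is a minimal sketch, assuming a standard `configuration.nix` module; the URL and public key are copied from the message, everything else is illustrative.

```nix
# configuration.nix fragment (sketch, not an official recommendation)
{
  nix.settings = {
    # Extra binary cache mentioned in the chat.
    substituters = [ "https://cache.nixos-cuda.org" ];
    # Public key used to verify artifacts served by that cache.
    trusted-public-keys = [
      "cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M="
    ];
  };
}
```

Outside NixOS, the same two settings can go into `nix.conf` as `extra-substituters` and `extra-trusted-public-keys`.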
Gaétan Lepage	connor (burnt/out) (UTC-8), Serge and I got #457803 ready. We are waiting for nixpkgs's CI to get fixed (https://github.com/NixOS/nixpkgs/pull/458647). Let's merge ASAP	23:38:07
Robbie Buxton	For flash attention you should use the version of cutlass in the repo	23:54:57
Robbie Buxton	They have a rev	23:55:06
Robbie Buxton	In csrc/cutlass	23:56:01
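
One way to follow this advice is to fetch the flash-attention source together with its submodules, so the build uses the cutlass checkout pinned in `csrc/cutlass` rather than a separately packaged cutlass. A minimal sketch, assuming `csrc/cutlass` is a git submodule and that the upstream repository is `Dao-AILab/flash-attention` (both are assumptions not stated in the chat); the hash is a placeholder.

```nix
# Hypothetical source fetch for a flash-attention derivation.
{ fetchFromGitHub, lib }:

fetchFromGitHub {
  # Assumed upstream location; adjust if your source differs.
  owner = "Dao-AILab";
  repo = "flash-attention";
  # Tag mentioned in the chat.
  rev = "v2.8.2";
  # Also check out git submodules, so csrc/cutlass is at the rev upstream pins.
  fetchSubmodules = true;
  # Placeholder: the first build fails and prints the real hash to paste here.
  hash = lib.fakeHash;
}
```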
5 Nov 2025
apyh	ah fair enough	00:10:30
SomeoneSerge (back on matrix)	step 1: `torchWithCuda = pkgsCuda.....torch` (we were supposed to be here now, but it got out of hand) step 2: `torchWithCuda = warn "..." pkgsCuda...` step 3: `torchWithCuda = throw`	00:12:18
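
The three steps above can be sketched as three successive definitions of the same alias. This is only an illustration: `newTorch` is a hypothetical stand-in for the elided `pkgsCuda` attribute path, and the messages are made up.

```nix
# Sketch of a staged alias deprecation; `newTorch` is a hypothetical
# placeholder for the elided pkgsCuda attribute path.
{ lib, newTorch }:
{
  # Step 1: plain alias to the new attribute.
  step1 = {
    torchWithCuda = newTorch;
  };

  # Step 2: the alias still evaluates, but prints a warning when used.
  step2 = {
    torchWithCuda = lib.warn "torchWithCuda is deprecated (hypothetical message)" newTorch;
  };

  # Step 3: using the alias aborts evaluation with an error.
  step3 = {
    torchWithCuda = throw "torchWithCuda has been removed (hypothetical message)";
  };
}
```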
SomeoneSerge (back on matrix)	and what we really want is late binding and incremental builds	00:13:41
connor (burnt/out) (UTC-8)	Why are you building for so many CUDA capabilities? I can’t really think of a reason you’d need that range in particular.	01:59:14
connor (burnt/out) (UTC-8)	Added to merge queue	02:07:23
apyh	In reply to @connorbaker:matrix.org ("Why are you building for so many CUDA capabilities? I can’t really think of a reason you’d need that range in particular."): it's a distributed ml training application that needs to run on everything from gtx 10xx gpus to modern data center GH/GB200s :/	03:27:37
apyh	most common hardware is gonna be 30xx 40xx 50xx, h100, a100, b200	03:27:56
apyh	though.. i could just see what pytorch precompiled wheels run on and limit to that	03:28:54
apyh	should be fine	03:28:56
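
Limiting the build to the commonly used hardware listed above could look roughly like this. A sketch, assuming the nixpkgs `cudaSupport` / `cudaCapabilities` config options and a CUDA release recent enough to know the Blackwell capabilities; the exact list is an illustration, not what apyh settled on.

```nix
# Sketch: instantiate nixpkgs with an explicit CUDA capability list.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true; # CUDA packages are unfree
      cudaSupport = true;
      cudaCapabilities = [
        "8.0"  # A100
        "8.6"  # RTX 30xx
        "8.9"  # RTX 40xx
        "9.0"  # H100
        "10.0" # B200
        "12.0" # RTX 50xx
      ];
    };
  };
in
# Any CUDA-enabled package from this set is built for the capabilities above.
pkgs.python3Packages.torch
```

Covering GTX 10xx cards as well would mean adding `"6.1"`, at the cost of larger binaries and longer builds.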
connor (burnt/out) (UTC-8)	Flash attention doesn’t support anything older than Ampere I thought	03:29:07
Robbie Buxton	V2 does	03:29:19
Robbie Buxton	V3 is hopper only	03:29:24
apyh	ya its only v3 iirc	03:29:26
Robbie Buxton	V4 (cute) is Blackwell	03:29:33