NixOS CUDA - Public Room Timeline

	NixOS CUDA	290 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	57 Servers

Sender	Message	Time
29 Dec 2024
SomeoneSerge (back on matrix)	Yeah	21:50:35
connor (he/him)	I remember I had tried to work on using system-provided dependencies (I guess more than a year ago now) and gave up because it would have required a bunch of CMake rewriting. And every time upstream changed something, BOOM! Another merge conflict or more rewriting required. But I suppose it’s that way with lots of projects.	21:55:11
connor (he/him)	Serge, how do you stay upbeat about packaging stuff?	21:55:58
SomeoneSerge (back on matrix)	Yes, which is why this really is about working with upstream and getting the changes through on their side, not on the nixpkgs side	21:56:38
SomeoneSerge (back on matrix)	I clearly don't...	21:57:31
Gaétan Lepage	In reply to @connorbaker:matrix.org: At least we still build pytorch from source... looking at you protobuf-python, tensorflow and, since today, jax	21:57:59
Gaétan Lepage	🥲	21:58:05
Gaétan Lepage	At least they take less resources to "build" 🤡	21:59:04
connor (he/him)	In reply to @ss:someonex.net: Thoughts on what to do when upstream makes it clear they don’t care or don’t want to implement changes that make it easier (or feasible) to build with Nix?	21:59:14
SomeoneSerge (back on matrix)	In case of pytorch, I think they are willing to accept stuff	21:59:41
SomeoneSerge (back on matrix)	They just won't do it themselves	21:59:53
Gaétan Lepage	Also, I'm afraid we are severely understaffed :/	22:00:12
connor (he/him)	I’ve had good experiences with them too; I meant more like NVIDIA and the ONNX ecosystem	22:00:14
Gaétan Lepage	Since the latest staging-next merge, everything is kind of broken...	22:00:38
Gaétan Lepage	Fortunately, we merged the triton-llvm fix and the jax/jaxlib switch to bin.	22:00:54
SomeoneSerge (back on matrix)	Yeah right... like, pray that they lose the market?	22:00:55
Gaétan Lepage	But still	22:00:57
Gaétan Lepage	Also, we have python3.13 now which is very brittle	22:01:12
connor (he/him)	At least the gods gave Sisyphus the rock he has to push; I had to buy mine from ZOTAC	22:03:40
30 Dec 2024
	matthewcroughan changed their display name from matthewcroughan (DECT: 56490) to matthewcroughan.	17:27:46
connor (he/him)	Well, I'm messing around with a Triton compiler failure on pytorch. You know, the good ol' error `torch._inductor.exc.InductorError: FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ldconfig'`. It seems similar to what Serge pointed out here: https://github.com/NixOS/nixpkgs/pull/278969/files#diff-289748b7fbff3ff07ecd17030035a7e7aa78b21e882a549900885e6bc5030973	18:41:06
connor (he/him)	Oh I got `torch.compile` working!	21:02:45
connor (he/him)	I'll submit a PR for Nixpkgs	21:12:21
connor (he/him)	Oh wait, what's the proper way to expose a runtime dependency on `libcuda.so`? Is it enough to point it to the stub? Because (as I understand) that's only for linking, not for runtime use (because it's a stub). Since `libcuda.so` is provided by the driver, and library location depends on the host OS...	21:15:03
connor (he/him)	I guess if we knew ahead of time where `libcuda.so` and the like were, we wouldn't need `nixGL` or `nix-gl-host` because we could package everything in a platform-agnostic way, huh...	21:23:41
connor (he/him)	At any rate, here's https://github.com/NixOS/nixpkgs/pull/369495	21:24:07
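A minimal sketch of the runtime lookup problem discussed above, assuming the driver's `libcuda.so.1` has to be found on the host rather than in the Nix store (the stub is only good for linking). The function name `find_libcuda`, the `LIBCUDA_DIR` override variable, and the candidate directory list are all hypothetical and chosen for illustration; this is not what the linked PR or Triton does. It avoids shelling out to `/sbin/ldconfig` and instead checks a few well-known directories, including `/run/opengl-driver/lib`, where NixOS exposes driver libraries.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Assumption: a few common driver library directories; adjust for your host.
CANDIDATE_DIRS = (
    "/run/opengl-driver/lib",     # NixOS
    "/usr/lib/x86_64-linux-gnu",  # Debian/Ubuntu
    "/usr/lib64",                 # Fedora/RHEL
    "/usr/lib/wsl/lib",           # WSL2
)


def find_libcuda(
    env: Optional[Mapping[str, str]] = None,
    candidates: Sequence[str] = CANDIDATE_DIRS,
    soname: str = "libcuda.so.1",
) -> Optional[Path]:
    """Return the first existing path to the driver library, or None."""
    env = os.environ if env is None else env
    search = []

    # Hypothetical override variable, checked first so users can pin a location.
    override = env.get("LIBCUDA_DIR")
    if override:
        search.append(override)

    # Then whatever the caller already put on the dynamic loader path.
    search.extend(d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d)

    # Finally, the host-specific guesses.
    search.extend(candidates)

    for directory in search:
        path = Path(directory) / soname
        if path.exists():
            return path
    return None
```

Calling `find_libcuda()` with no arguments uses the current process environment and returns `None` on a machine where none of the searched directories contains the driver library. The directory list is exactly the part that cannot be fixed at build time, which is the platform-dependence connor describes.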
31 Dec 2024
connor (he/him)	Well, I was able to package https://github.com/NVIDIA/TransformerEngine for PyTorch. I updated (locally) https://github.com/ConnorBaker/nix-cuda-test to verify I could train an FP8 model on my 4090 using the work I've done in https://github.com/connorbaker/cuda-packages, and it seems to work.	09:49:28
	🐰 xiaoxiangmoe joined the room.	10:44:26
connor (he/him)	Also packaging flash attention now, because hopefully it supports FP8 training where PyTorch's implementation does not. Why does it require so much memory to build? What is NVCC doing?	16:09:28
