!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers


Sender Message Time
29 Dec 2024
@ss:someonex.netSomeoneSerge (back on matrix) Yeah 21:50:35
@connorbaker:matrix.orgconnor (he/him) I remember I had tried to work on using system-provided dependencies (I guess more than a year ago now) and gave up because it would have required a bunch of CMake rewriting.
And every time upstream changed something, BOOM! Another merge conflict or more rewriting required.
But I suppose it’s that way with lots of projects.
21:55:11
@connorbaker:matrix.orgconnor (he/him) Serge, how do you stay upbeat about packaging stuff? 21:55:58
@ss:someonex.netSomeoneSerge (back on matrix) Yes, which is why this really is about working with upstream and getting the changes through on their side, not on the nixpkgs side 21:56:38
@ss:someonex.netSomeoneSerge (back on matrix) I clearly don't... 21:57:31
@glepage:matrix.orgGaétan Lepage
In reply to @connorbaker:matrix.org
I remember I had tried to work on using system-provided dependencies (I guess more than a year ago now) and gave up because it would have required a bunch of CMake rewriting.
And every time upstream changed something, BOOM! Another merge conflict or more rewriting required.
But I suppose it’s that way with lots of projects.
At least we still build pytorch from source... looking at you protobuf-python, tensorflow and, since today, jax
21:57:59
@glepage:matrix.orgGaétan Lepage 🥲 21:58:05
@glepage:matrix.orgGaétan Lepage At least they take less resources to build 🤡 21:58:59
@glepage:matrix.orgGaétan Lepage * At least they take less resources to "build" 🤡 21:59:04
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net
Yes, which is why this really is about working with upstream and getting the changes through on their side, not on the nixpkgs side
Thoughts on what to do when upstream makes it clear they don’t care or don’t want to implement changes that make it easier (or feasible) to build with Nix?
21:59:14
@ss:someonex.netSomeoneSerge (back on matrix) In the case of pytorch, I think they are willing to accept stuff 21:59:41
@ss:someonex.netSomeoneSerge (back on matrix) They just won't do it themselves 21:59:53
@glepage:matrix.orgGaétan Lepage Also, I'm afraid we are severely under-staffed :/ 22:00:12
@connorbaker:matrix.orgconnor (he/him) I’ve had good experiences with them too; I meant more like NVIDIA and the ONNX ecosystem 22:00:14
@glepage:matrix.orgGaétan Lepage Since the latest staging-next merge, everything is kind of broken... 22:00:38
@glepage:matrix.orgGaétan Lepage Hopefully, we merged the triton-llvm fix and the jax/jaxlib switch to bin. 22:00:54
@ss:someonex.netSomeoneSerge (back on matrix) Yeah right... like, pray that they lose the market? 22:00:55
@glepage:matrix.orgGaétan Lepage But still 22:00:57
@glepage:matrix.orgGaétan Lepage Also, we have python3.13 now which is very brittle 22:01:12
@connorbaker:matrix.orgconnor (he/him) At least the gods gave Sisyphus the rock he has to push; I had to buy mine from ZOTAC 22:03:40
30 Dec 2024
@matthewcroughan:defenestrate.itmatthewcroughan changed their display name from matthewcroughan (DECT: 56490) to matthewcroughan. 17:27:46
@connorbaker:matrix.orgconnor (he/him)

Well, messing around with Triton compiler failure on pytorch, you know the good ol' error

torch._inductor.exc.InductorError: FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ldconfig'

seems similar to what Serge pointed out here https://github.com/NixOS/nixpkgs/pull/278969/files#diff-289748b7fbff3ff07ecd17030035a7e7aa78b21e882a549900885e6bc5030973

18:41:06
@connorbaker:matrix.orgconnor (he/him) Oh I got torch.compile working! 21:02:45
@connorbaker:matrix.orgconnor (he/him) I'll submit a PR for Nixpkgs 21:12:21
@connorbaker:matrix.orgconnor (he/him) Oh wait, what's the proper way to expose a runtime dependency on libcuda.so? Is it enough to point it to the stub? Because (as I understand) that's only for linking, not for runtime use (because it's a stub).
Since libcuda.so is provided by the driver, and library location depends on the host OS...
21:15:03
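
A minimal sketch of the runtime lookup problem raised in the message above, not anything taken from the PR under discussion. It assumes the driver's `libcuda.so` lives in one of a few host-dependent directories and probes them without calling `/sbin/ldconfig`. The function name `find_libcuda_dirs`, the `LIBCUDA_DIR` override variable, and the non-NixOS candidate paths are hypothetical and chosen for illustration only.

```python
import os
from pathlib import Path

# Candidate directories, checked in order. Only the first entry reflects the
# NixOS convention for host driver libraries; the others are assumptions
# about common layouts on other distributions.
DEFAULT_CANDIDATES = (
    "/run/opengl-driver/lib",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib64",
)


def find_libcuda_dirs(candidates=DEFAULT_CANDIDATES, env=None):
    """Return the directories that contain a libcuda.so* file.

    If the hypothetical LIBCUDA_DIR variable is set, only that directory
    is checked, so a user can pin the driver location explicitly.
    """
    env = os.environ if env is None else env
    override = env.get("LIBCUDA_DIR")
    dirs = [override] if override else list(candidates)

    found = []
    for d in dirs:
        path = Path(d)
        # Match both the unversioned name and versioned ones (libcuda.so.1).
        if path.is_dir() and any(path.glob("libcuda.so*")):
            found.append(str(path))
    return found


if __name__ == "__main__":
    print(find_libcuda_dirs())
```

An empty result means the driver library was not found in any probed directory, which is the situation the stub under `lib/stubs` cannot fix: the stub satisfies the linker at build time but provides no working driver at runtime.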
@connorbaker:matrix.orgconnor (he/him) I guess if we knew ahead of time where libcuda.so and the like were, we wouldn't need nixGL or nix-gl-host because we could package everything in a platform-agnostic way, huh... 21:23:41
@connorbaker:matrix.orgconnor (he/him) At any rate, here's https://github.com/NixOS/nixpkgs/pull/369495 21:24:07
31 Dec 2024
@connorbaker:matrix.orgconnor (he/him) Well, I was able to package https://github.com/NVIDIA/TransformerEngine for PyTorch and updated (locally) https://github.com/ConnorBaker/nix-cuda-test to verify I could train an FP8 model on my 4090 using the work I've done in https://github.com/connorbaker/cuda-packages, and it seems to work 09:49:28
@xiaoxiangmoe:matrix.org🐰 xiaoxiangmoe joined the room. 10:44:26
@connorbaker:matrix.orgconnor (he/him) also packaging flash attention now because hopefully it supports fp8 training where PyTorch's implementation does not
why does it require so much memory to build? What is NVCC doing?
16:09:28



Room Version: 9