| 29 Dec 2024 |
SomeoneSerge (back on matrix) | Yeah | 21:50:35 |
connor (he/him) | I remember I had tried to work on using system-provided dependencies (I guess more than a year ago now) and gave up because it would have required a bunch of CMake rewriting.
And every time upstream changed something, BOOM! Another merge conflict or more rewriting required.
But I suppose it’s that way with lots of projects. | 21:55:11 |
connor (he/him) | Serge, how do you stay upbeat about packaging stuff? | 21:55:58 |
SomeoneSerge (back on matrix) | Yes, which is why this really is about working with upstream and getting the changes through on their side, not on the nixpkgs side | 21:56:38 |
SomeoneSerge (back on matrix) | I clearly don't... | 21:57:31 |
Gaétan Lepage | In reply to @connorbaker:matrix.org I remember I had tried to work on using system-provided dependencies (I guess more than a year ago now) and gave up because it would have required a bunch of CMake rewriting.
And every time upstream changed something, BOOM! Another merge conflict or more rewriting required.
But I suppose it’s that way with lots of projects. At least we still build pytorch from source... looking at you protobuf-python, tensorflow and, since today, jax | 21:57:59 |
Gaétan Lepage | 🥲 | 21:58:05 |
Gaétan Lepage | At least they take less resources to build 🤡 | 21:58:59 |
Gaétan Lepage | * At least they take less resources to "build" 🤡 | 21:59:04 |
connor (he/him) | In reply to @ss:someonex.net Yes, which is why this really is about working with upstream and getting the changes through on their side, not on the nixpkgs side Thoughts on what to do when upstream makes it clear they don’t care or don’t want to implement changes that make it easier (or feasible) to build with Nix? | 21:59:14 |
SomeoneSerge (back on matrix) | In case of pytorch, I think they are willing to accept stuff | 21:59:41 |
SomeoneSerge (back on matrix) | They just won't do it themselves | 21:59:53 |
Gaétan Lepage | Also, I'm afraid we are severely under-staffed :/ | 22:00:12 |
connor (he/him) | I’ve had good experiences with them too; I meant more like NVIDIA and the ONNX ecosystem | 22:00:14 |
Gaétan Lepage | Since the latest staging-next merge, everything is kind of broken... | 22:00:38 |
Gaétan Lepage | Fortunately, we merged the triton-llvm fix and the jax/jaxlib switch to bin. | 22:00:54 |
SomeoneSerge (back on matrix) | Yeah right... like, pray that they lose the market? | 22:00:55 |
Gaétan Lepage | But still | 22:00:57 |
Gaétan Lepage | Also, we have Python 3.13 now, which is very brittle | 22:01:12 |
connor (he/him) | At least the gods gave Sisyphus the rock he has to push; I had to buy mine from ZOTAC | 22:03:40 |
| 30 Dec 2024 |
| matthewcroughan changed their display name from matthewcroughan (DECT: 56490) to matthewcroughan. | 17:27:46 |
connor (he/him) | Well, messing around with a Triton compiler failure on pytorch, you know the good ol' error
torch._inductor.exc.InductorError: FileNotFoundError: [Errno 2] No such file or directory: '/sbin/ldconfig'
seems similar to what Serge pointed out here https://github.com/NixOS/nixpkgs/pull/278969/files#diff-289748b7fbff3ff07ecd17030035a7e7aa78b21e882a549900885e6bc5030973
| 18:41:06 |
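The hard-coded `/sbin/ldconfig` path is the crux of that error: the library shells out to ldconfig to discover `libcuda.so`, and on NixOS nothing lives at FHS paths. A minimal sketch of a more forgiving lookup; `find_ldconfig` and `libcuda_dirs` are hypothetical helpers for illustration, not Triton's actual code:

```python
import os
import shutil
import subprocess

def find_ldconfig():
    """Return a usable ldconfig path, or None.

    Hard-coding '/sbin/ldconfig' breaks on NixOS, where binaries
    are not at FHS locations; fall back to searching PATH.
    """
    for candidate in ("/sbin/ldconfig", "/usr/sbin/ldconfig"):
        if os.path.exists(candidate):
            return candidate
    return shutil.which("ldconfig")

def libcuda_dirs():
    """Best-effort parse of `ldconfig -p` for directories holding libcuda.so."""
    ldconfig = find_ldconfig()
    if ldconfig is None:
        return []  # no ldconfig at all; caller must try another strategy
    out = subprocess.run([ldconfig, "-p"], capture_output=True, text=True).stdout
    dirs = set()
    for line in out.splitlines():
        # cache lines look like: "libcuda.so.1 (libc6,x86-64) => /usr/lib/libcuda.so.1"
        if "libcuda.so" in line and "=>" in line:
            dirs.add(os.path.dirname(line.split("=>")[-1].strip()))
    return sorted(dirs)
```

On NixOS the usual fix is the opposite direction: patch the hard-coded path at build time to point at the store path of glibc's ldconfig, which is what the linked PR diff does.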
connor (he/him) | Oh I got torch.compile working! | 21:02:45 |
connor (he/him) | I'll submit a PR for Nixpkgs | 21:12:21 |
connor (he/him) | Oh wait, what's the proper way to expose a runtime dependency on libcuda.so? Is it enough to point it to the stub? Because (as I understand it) that's only for linking, not for runtime use (because it's a stub). Since libcuda.so is provided by the driver, and the library location depends on the host OS... | 21:15:03 |
connor (he/him) | I guess if we knew ahead of time where libcuda.so and the like were, we wouldn't need nixGL or nix-gl-host because we could package everything in a platform-agnostic way, huh... | 21:23:41 |
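Since libcuda.so ships with the driver rather than any package, a runtime probe is the usual workaround (this is exactly the gap nixGL / nix-gl-host fill). A hedged sketch of such a probe; the candidate directories are common-distro assumptions, not an exhaustive or authoritative list:

```python
import ctypes.util
import os

# Driver library locations vary by host OS; these are guesses, not a
# complete list. On NixOS the driver is exposed under /run/opengl-driver.
CANDIDATE_DIRS = [
    "/run/opengl-driver/lib",      # NixOS convention
    "/usr/lib/x86_64-linux-gnu",   # Debian/Ubuntu
    "/usr/lib64",                  # Fedora/RHEL
]

def find_libcuda():
    """Best-effort search for the driver-provided libcuda.so.1."""
    # Prefer the loader's own view (LD_LIBRARY_PATH, ldconfig cache, ...).
    found = ctypes.util.find_library("cuda")
    if found:
        return found
    for d in CANDIDATE_DIRS:
        path = os.path.join(d, "libcuda.so.1")
        if os.path.exists(path):
            return path
    # Nothing found: only the SDK stub would be available, which is
    # fine for linking but not for actually running CUDA code.
    return None
```

Linking against the stub and resolving the real library at runtime (via RUNPATH entries like `/run/opengl-driver/lib`, or a probe like the above) is the pattern nixpkgs generally relies on.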
connor (he/him) | At any rate, here's https://github.com/NixOS/nixpkgs/pull/369495 | 21:24:07 |
| 31 Dec 2024 |
connor (he/him) | Well, I was able to package https://github.com/NVIDIA/TransformerEngine for PyTorch.
Updated (locally) https://github.com/ConnorBaker/nix-cuda-test to verify I could train an FP8 model on my 4090 using the work I've done in https://github.com/connorbaker/cuda-packages, and it seems to work | 09:49:28 |
| 🐰 xiaoxiangmoe joined the room. | 10:44:26 |
connor (he/him) | Also packaging flash attention now, because hopefully it supports fp8 training where PyTorch's implementation does not. Why does it require so much memory to build? What is NVCC doing? | 16:09:28 |