NixOS CUDA
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda |
| Sender | Message | Time |
|---|---|---|
| 14 Dec 2024 | | |
| | as far as it relates to CUDA -- as long as you have a CUDA-enabled torch in the Python environment and the shebangs are modified to refer to that Python environment, I don't think there's any real concern around what you're doing in the installPhase | 20:47:26 |
| | just keep in mind the general restrictions around running CUDA apps on non-NixOS | 20:47:42 |
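
A sketch of the installPhase pattern being described, assuming a script-based package (the myapp name, file layout, and script path below are hypothetical):

```nix
# Sketch only: install scripts whose shebangs point at a Python environment
# containing torch; with config.cudaSupport = true, ps.torch is the CUDA build.
{ stdenv, python3 }:

let
  pythonEnv = python3.withPackages (ps: [ ps.torch ]);
in
stdenv.mkDerivation {
  pname = "myapp";   # hypothetical name
  version = "0.1";
  src = ./.;

  installPhase = ''
    runHook preInstall
    mkdir -p $out/bin
    install -m755 scripts/myapp.py $out/bin/myapp   # hypothetical path
    # Rewrite the shebang to the environment that carries the CUDA torch.
    sed -i "1s|^#!.*|#!${pythonEnv}/bin/python|" $out/bin/myapp
    runHook postInstall
  '';
}
```
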
| | yeah, because the complexities of advertising more than nvidia/cuda outside of nixpkgs are high | 20:50:21 |
| | even if it reduces closure size, the overhead of writing Nix code that is able to provide nvidia/rocm separately is really a lot | 20:50:54 |
| | I don't want to have attributes like nvidia-myapp, rocm-myapp, and myapp | 20:51:36 |
| | it would be much better to have a single derivation that is capable of all of them at runtime | 20:52:00 |
| | you shouldn't have to -- the torch input to your derivation will check whether nixpkgs was imported with config.cudaSupport and transparently do the right thing | 20:52:11 |
| | yes, which means in my flake that isn't nixpkgs, I will have to import nixpkgs three times to handle those three cases | 20:52:30 |
| | and also write Nix code three times to handle the three cases | 20:52:51 |
| | I could delay it by using an overlay and just not create the three attributes upfront in the packages attribute of a flake | 20:53:22 |
| | then it will depend on the importer... but then I can't just have people nix run my attributes; they will have to write their own flake to consume it and specify the support there (which IMO is not ideal) | 20:53:50 |
| | nix run --impure should check the user's ~/.config/nixpkgs config and do the right thing around allowUnfree and/or cudaSupport/rocmSupport. If you want to add separate imports of nixpkgs for your checks, that's one thing, but in principle I think you should just expose a single package | 20:55:40 |
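
For reference, the per-user file being referred to is ~/.config/nixpkgs/config.nix. The option names below do exist in nixpkgs, though whether a given flake honours them under nix run --impure depends on how it imports nixpkgs:

```nix
# ~/.config/nixpkgs/config.nix -- picked up by impure evaluation of <nixpkgs>
{
  allowUnfree = true;   # required for the CUDA libraries
  cudaSupport = true;   # build packages such as torch against CUDA
  # rocmSupport = true; # the ROCm equivalent
}
```
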
| | I'm not sure why I'd want a flake on GitHub that reads from the user's home directory at evaluation time. | 20:58:31 |
| | I'd just want packages.x86_64-linux.{nvidia-myapp,rocm-myapp} | 20:59:11 |
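
A minimal sketch of a flake exposing those variant attributes, importing nixpkgs once per configuration as discussed above (myapp and ./myapp.nix are hypothetical placeholders):

```nix
# flake.nix -- sketch: one nixpkgs import per GPU configuration.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgsCpu = import nixpkgs { inherit system; };
      pkgsCuda = import nixpkgs {
        inherit system;
        config = { allowUnfree = true; cudaSupport = true; };
      };
      pkgsRocm = import nixpkgs {
        inherit system;
        config = { rocmSupport = true; };
      };
    in {
      # The cudaSupport/rocmSupport flags propagate to inputs such as torch.
      packages.${system} = {
        myapp        = pkgsCpu.callPackage ./myapp.nix { };
        nvidia-myapp = pkgsCuda.callPackage ./myapp.nix { };
        rocm-myapp   = pkgsRocm.callPackage ./myapp.nix { };
      };
    };
}
```

With this layout, nix run .#nvidia-myapp works without the consumer writing any Nix, at the cost of the repeated nixpkgs imports discussed above.
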
| | In theory, is there not a stub we could use to make it gpuSupport = true? | 21:51:35 |
| | and infer the required libs for ROCm or NVIDIA from some intermediate representation? | 21:51:52 |
| | buildInputs = with gpuPackages; [ somethingCommon ], which would decide what somethingCommon is based on cudaSupport or rocmSupport? | 21:52:26 |
| | Do the ecosystems really vary enough to need these two cases? Or can we commonise them? | 21:52:44 |
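
No gpuPackages stub exists in nixpkgs today; a hypothetical sketch of the idea, mapping common functionality names to whichever backend nixpkgs was configured with (the gpuPackages shape and its blas/fft keys are invented for illustration):

```nix
# Hypothetical sketch -- gpuPackages is not a real nixpkgs attribute.
# Selects a backend from how nixpkgs was configured; config, cudaPackages,
# and rocmPackages are real attributes supplied by callPackage.
{ config, cudaPackages, rocmPackages }:

if config.cudaSupport or false then {
  blas = cudaPackages.libcublas; # CUDA BLAS implementation
  fft  = cudaPackages.libcufft;  # CUDA FFT implementation
} else if config.rocmSupport or false then {
  blas = rocmPackages.rocblas;   # ROCm BLAS implementation
  fft  = rocmPackages.rocfft;    # ROCm FFT implementation
} else
  { } # CPU-only: no GPU libraries to offer
```

A derivation could then write buildInputs = with gpuPackages; [ blas fft ] without naming a vendor; the difficulty raised in the next message is that the two ecosystems don't expose matching libraries for every functionality.
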
| 15 Dec 2024 | | |
| | I don't believe there's a way we can commonise them, at least currently, but my focus has been more on tools (ONNX, PyTorch, etc.) than things that use them (actual models, inference capabilities, etc.), so that's definitely shaped my opinion. As I understand it, from an implementation perspective, you'd need some sort of mapping from functionality (BLAS, FFT, DNN) to the actual library (or libraries?) which implements that functionality. But the different ecosystems offer different functionalities and might not have corresponding libraries. | 05:54:52 |
| | In reply to @matthewcroughan:defenestrate.it That depends entirely on the project and the Nix expression for it. From what I understand of what you desire, the closest analogue I can think of would be Apple's "universal binary", which supports both x86-64 and aarch64... but I suspect you'd also want the equivalent of function multiversioning, where each function is compiled into multiple variants using different architecture features (think SSE, NEON, AVX2, AVX-512, etc.) so that at runtime the variant matching the host's most advanced capability is used. This would correspond to CUDA compute capabilities. NVCC can produce binaries with device code for multiple capabilities, but it does increase the compile time and binary size significantly -- enough so that linking can fail due to size limitations! That's part of the reason Magma in Nixpkgs produces a static library when building with CUDA. | 06:04:37 |
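
In nixpkgs terms, the set of capabilities that device code is generated for is controlled by config.cudaCapabilities, which is a real option; the values below are arbitrary examples:

```nix
# Sketch: build CUDA packages with device code for several capabilities.
# Each additional capability increases compile time and binary size.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaCapabilities = [ "8.6" "8.9" "9.0" ]; # example values only
  };
}
```
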
| | Is it fair to say that ROCm is not supported very well right now in nixpkgs? | 21:01:43 |
| | yes, rather | 21:02:38 |
| | We seem unable to compile torch, so okay, I override python to use torch-bin, but then I still have to allowBroken | 21:02:48 |
| | (attachment not captured in this export) | 21:02:52 |
| | and then if I do, this happens | 21:02:56 |
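
A sketch of the override being described, assuming the pythonPackagesExtensions mechanism is the route taken to swap in the prebuilt torch-bin (the exact overlay shape here is an assumption, not the poster's code):

```nix
# Sketch: prefer prebuilt torch-bin over compiling torch with ROCm.
import <nixpkgs> {
  config = {
    rocmSupport = true;
    allowUnfree = true;
    allowBroken = true; # still needed while ROCm dependencies are marked broken
  };
  overlays = [
    (final: prev: {
      pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
        (pyFinal: pyPrev: {
          torch = pyFinal.torch-bin; # every consumer of torch gets torch-bin
        })
      ];
    })
  ];
}
```
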
| | SomeoneSerge (utc+3): connor (he/him) (UTC-7): I spent some time revamping my personal flake today/yesterday and now have a better understanding of the new hostPlatform/buildPlatform stuff, alongside lib.platforms. I do think it's the right interface long-term for cudaSupport, gencodes, and also configuring fabrics. The entire extent of my involvement with nixpkgs has been small contributions to specific packages. I want to write up a small doc proposing this and discussing the migration path, but also wanted to collaborate with you guys. I don't even know where to post this; do I just open an issue on GitHub, or does it need to go on the Discourse? | 21:18:06 |
| 16 Dec 2024 | | |
| | I'd recommend testing the waters and getting a sense of the prior art in terms of extending those; perhaps the folks in the Nixpkgs Stdenv room would be good to reach out to? https://matrix.to/#/#stdenv:nixos.org After that (and once you've got a list of people interested in or knowledgeable about such work), I think drafting an RFC would be the next step, to fully lay out the plan for design and implementation. If it's a relatively small change, maybe an RFC is unnecessary and a PR would be fine! | 06:55:30 |
| | That's the reason I've lately been ignoring flakes in personal projects: just stick to impure eval and autocalling. For me it's usually going in the direction of the sketch below. | 14:45:18 |
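
The snippet that followed was not captured in this export; as a rough illustration of "impure eval and autocalling" (file names and layout assumed, not the author's original), it might look like:

```nix
# default.nix -- sketch: impure eval honours <nixpkgs> and the user's own
# ~/.config/nixpkgs/config.nix (cudaSupport, allowUnfree, ...) with no
# per-variant attributes.
{ pkgs ? import <nixpkgs> { } }:
{
  # "Autocalling": callPackage supplies dependencies from pkgs.
  myapp = pkgs.callPackage ./myapp.nix { };
}
```

Built with nix-build -A myapp, the same expression yields a CPU, CUDA, or ROCm variant depending on the user's nixpkgs config rather than on which attribute was chosen.
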