!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

282 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



14 Dec 2024
@matthewcroughan:defenestrate.itmatthewcroughanAre the ecosystems really varying enough to need these two cases? Or can we commonise them?21:52:44
15 Dec 2024
@connorbaker:matrix.orgconnor (he/him) I don’t believe there’s a way we can commonise them, at least currently, but my focus has been more on tools (ONNX, PyTorch, etc.) than things that use them (actual models, inference capabilities, etc.), so that’s definitely shaped my opinion.
As I understand it, from an implementation perspective, you’d need some sort of mapping from functionality (BLAS, FFT, DNN) to the actual library (or libraries?) which implement that functionality. But the different ecosystems offer different functionalities and might not have corresponding libraries.
05:54:52
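The mapping connor describes could be sketched as a Nix attrset that selects a concrete library per functionality depending on the enabled backend. This is purely illustrative — `gpuBackend` and the overall shape are hypothetical, not actual nixpkgs attributes, though `cudaPackages.libcublas`, `cudaPackages.libcufft`, `rocmPackages.rocblas`, and `rocmPackages.rocfft` do exist in nixpkgs:

```nix
# Hypothetical sketch: map abstract functionality (BLAS, FFT) to the
# concrete library implementing it for the enabled GPU backend.
{ config, cudaPackages, rocmPackages }:

let
  gpuBackend =
    if config.cudaSupport then "cuda"
    else if config.rocmSupport then "rocm"
    else "cpu";
in
{
  blas = {
    cuda = cudaPackages.libcublas;
    rocm = rocmPackages.rocblas;
    cpu = null; # e.g. fall back to a CPU BLAS implementation
  }.${gpuBackend};

  fft = {
    cuda = cudaPackages.libcufft;
    rocm = rocmPackages.rocfft;
    cpu = null;
  }.${gpuBackend};

  # Some functionality (e.g. a DNN library) may simply have no
  # counterpart in a given ecosystem — which is exactly the problem
  # described above.
}
```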
@connorbaker:matrix.orgconnor (he/him)
In reply to @matthewcroughan:defenestrate.it
if you enable cudaSupport and rocmSupport, what happens? Do you actually get an output that is usable for both?

That depends entirely on the project and the nix expression for it.
For the projects I’m aware of (Magma and PyTorch) CUDA and ROCm backends are exclusive of each other and the build systems would require major rewrites to produce a single build with support for both.

From what I understand from what you desire, the closest analogue I can think of would be Apple’s “universal binary” which supports both x86-64 and aarch64… but I suspect you’d also want the equivalent of function multiversioning, where each function is compiled into multiple variants using different architecture features (think SSE, NEON, AVX256, AVX512, etc.) so that at runtime the function matching the host’s most advanced capability is used. This would correspond to CUDA compute capabilities.

NVCC can produce binaries with device code for multiple capabilities, but it does increase the compile time and binary size significantly — enough so that linking can fail due to size limitations! That’s part of the reason Magma in Nixpkgs produces a static library when building with CUDA.

06:04:37
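nixpkgs exposes the set of compute capabilities NVCC generates device code for via `config.cudaCapabilities`, so the compile-time/binary-size trade-off above can be narrowed explicitly. A minimal sketch — the capability values here are arbitrary examples:

```nix
# Sketch: restrict the CUDA compute capabilities nixpkgs builds
# device code for. Fewer gencodes means faster builds and smaller
# binaries, at the cost of runtime coverage.
import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    # Only generate device code for these architectures instead of
    # the full default set.
    cudaCapabilities = [ "8.6" "9.0" ];
  };
}
```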
@matthewcroughan:defenestrate.itmatthewcroughanIs it fair to say that rocm is not supported very well right now in nixpkgs?21:01:43
@ss:someonex.netSomeoneSerge (back on matrix) yes, rather 21:02:38
@matthewcroughan:defenestrate.itmatthewcroughan We seem unable to compile torch, so okay I override python to use torch-bin, but then I still have to allowBroken 21:02:48
@matthewcroughan:defenestrate.itmatthewcroughan
       error: 'miopengemm' has been deprecated.
       It is still available for some time as part of rocmPackages_5.

21:02:52
@matthewcroughan:defenestrate.itmatthewcroughanand then if I do, this happens21:02:56
@sielicki:matrix.orgsielicki

SomeoneSerge (utc+3): connor (he/him) (UTC-7): I spent some time revamping my personal flake today/yesterday and now have a better understanding of the new hostPlatform/buildPlatform stuff, alongside lib.platforms

I do think it's the right interface long-term for cudaSupport and gencodes, and also for configuring fabrics.

The entire extent of my involvement with nixpkgs has been small contributions to specific packages. I want to write up a small doc proposing this and discussing the migration path, but also wanted to collaborate w/ you guys. I don't even know where to post this, do I just open an issue on gh or does it need to go on the discourse?

21:18:06
16 Dec 2024
@connorbaker:matrix.orgconnor (he/him)I'd recommend testing the waters and getting a sense of prior art done in terms of extending those; perhaps the folks in the Nixpkgs Stdenv room would be good to reach out to? https://matrix.to/#/#stdenv:nixos.org After that (and you've gotten a list of people interested in or knowledgeable about such work), I think drafting an RFC would be the next step, to fully lay out the plan for design and implementation. If it's a relatively small change, maybe an RFC is unnecessary and a PR would be fine!06:55:30
@ss:someonex.netSomeoneSerge (back on matrix)

That's the reason I've lately been ignoring flakes in personal projects: just stick to impure eval and autocalling. For me it's usually going in the direction of

{ nixpkgs ? <nixpkgs>
, pkgs ? import nixpkgs { inherit config; }
, lib ? pkgs.lib
, cudaSupport ? false
, config ? { inherit cudaSupport; }
}:

lib.makeScope pkgs.newScope (self: {
 # ...
})
14:45:04
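A file in that shape can be built impurely with the arguments overridden on the command line — e.g., assuming it is saved as `default.nix` with a package attribute in the scope (attribute name illustrative):

```shell
# Impure eval: <nixpkgs> comes from NIX_PATH, cudaSupport is
# toggled per invocation rather than baked into a flake output.
nix-build default.nix --arg cudaSupport true -A somePackage
```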
@matthewcroughan:defenestrate.itmatthewcroughanI actually ended up with a good pattern using flake.parts14:45:28
@matthewcroughan:defenestrate.itmatthewcroughan
      perSystem = { system, ... }: {
        _module.args.rocmPkgs = import inputs.nixpkgs {
          overlays = [
            inputs.self.overlays.default
            (self: super: {
              python3 = super.python3.override {
                packageOverrides = self: super: { torch = super.torch-bin; };
              };
            })
          ];
          config.allowUnfree = true;
          config.rocmSupport = true;
          inherit system;
        };
        _module.args.nvidiaPkgs = import inputs.nixpkgs {
          overlays = [
            inputs.self.overlays.default
          ];
          config.allowUnfree = true;
          config.cudaSupport = true;
          inherit system;
        };
        _module.args.pkgs = import inputs.nixpkgs {
          overlays = [
            inputs.self.overlays.default
          ];
          config.allowUnfree = true;
          inherit system;
        };
      };

14:46:22
@matthewcroughan:defenestrate.itmatthewcroughanCould be deduplicated using a function, but this is what it looks like unfolded14:46:38
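That deduplication could look something like the following — `pkgsFor` is a hypothetical helper name, not something from the flake above:

```nix
# Hypothetical helper: build a nixpkgs instance for this system with
# optional extra config and overlays, so the three instances above
# collapse into one function.
perSystem = { system, ... }:
  let
    pkgsFor = extraConfig: extraOverlays:
      import inputs.nixpkgs {
        inherit system;
        overlays = [ inputs.self.overlays.default ] ++ extraOverlays;
        config = { allowUnfree = true; } // extraConfig;
      };
  in
  {
    _module.args.pkgs = pkgsFor { } [ ];
    _module.args.nvidiaPkgs = pkgsFor { cudaSupport = true; } [ ];
    _module.args.rocmPkgs = pkgsFor { rocmSupport = true; } [
      (self: super: {
        python3 = super.python3.override {
          packageOverrides = pySelf: pySuper: { torch = pySuper.torch-bin; };
        };
      })
    ];
  };
```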
@matthewcroughan:defenestrate.itmatthewcroughan Then other flake-modules have these rocmPkgs and nvidiaPkgs arguments passed to them 14:47:01
@matthewcroughan:defenestrate.itmatthewcroughan
  perSystem = { config, pkgs, nvidiaPkgs, rocmPkgs, system, ... }: {
    packages = {
      comfyui-nvidia = nvidiaPkgs.comfyuiPackages.comfyui;
      comfyui-amd = rocmPkgs.comfyuiPackages.comfyui;
    };
  };

14:47:04
@ss:someonex.netSomeoneSerge (back on matrix) I think this should be doable without major backwards incompatible changes? 14:47:07
@matthewcroughan:defenestrate.itmatthewcroughanthe issue is that rocmSupport is fully broken in Nixpkgs, so this doesn't work, but it should in future14:47:29
@ss:someonex.netSomeoneSerge (back on matrix)

That's more or less what the llama-cpp flake did, but didn't you say

I don't want to have attributes like nvidia-myapp rocm-myapp and myapp

14:48:09
@matthewcroughan:defenestrate.itmatthewcroughanI'm happy as long as I don't have to do weird things to achieve it14:48:58
@matthewcroughan:defenestrate.itmatthewcroughanand for me, this is not weird14:49:02
@matthewcroughan:defenestrate.itmatthewcroughanpreviously what my flake was doing was far weirder14:49:07
@matthewcroughan:defenestrate.itmatthewcroughanhttps://github.com/nixified-ai/flake/blob/master/projects/invokeai/default.nix#L66-L9614:49:32
@matthewcroughan:defenestrate.itmatthewcroughan previously it was defining functions that were able to create variants of packages without setting rocmSupport or cudaSupport 14:49:51
@matthewcroughan:defenestrate.itmatthewcroughanJust terrible14:50:00
@matthewcroughan:defenestrate.itmatthewcroughan Besides, the modules the flake exports won't interact with the comfyui-nvidia or comfyui-amd attrs; this is just for people who want to try it with nix run 14:50:42


