
NixOS CUDA

325 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



26 Feb 2023
@ss:someonex.netSomeoneSerge (matrix works sometimes) You mean the assert !cudaSupport || magma.cudatoolkit == cudatoolkit line? 20:20:12
@connorbaker:matrix.orgconnor (he/him)yes20:48:16
@ss:someonex.netSomeoneSerge (matrix works sometimes)I wonder why these asserts are there in the first place 🤔 I'm easily convinced that the only reason we'd ever pass different cudatoolkits to magma and torch is by mistake, but I don't know why these specific asserts22:02:47
@connorbaker:matrix.orgconnor (he/him)

Or someone mistakenly overrides one but not the other
I ended up doing this in python-modules:

  transformer-engine = callPackage ../development/python-modules/transformer-engine (
    let 
      cudaPackages = pkgs.cudaPackages_11_8;
      magma = pkgs.magma.override { inherit cudaPackages; };
      torch = self.torch.override { inherit cudaPackages magma; };
    in
    { 
      inherit cudaPackages torch;
    }
  );
22:12:44
@ss:someonex.netSomeoneSerge (matrix works sometimes)Yes, but it's not like this isn't going to build? It's just that it's not what we probably wanted22:14:01
27 Feb 2023
@connorbaker:matrix.orgconnor (he/him)Yeah, it builds fine00:22:34
@connorbaker:matrix.orgconnor (he/him)

Unrelated, but is there any reason to prefer this pattern

cuda-redist = symlinkJoin {
  name = "cuda-redist";
  paths = with cudaPackages; [
    ...
  ];
};

over just declaring them inline with the buildInputs?

00:23:02
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org

Unrelated, but is there any reason to prefer this pattern

cuda-redist = symlinkJoin {
  name = "cuda-redist";
  paths = with cudaPackages; [
    ...
  ];
};

over just declaring them inline with the buildInputs?

No, symlinkJoin is a workaround for projects that expect all the CUDA pieces to be in one place and discover it through a single variable, like CUDA_HOME or CUDAToolkit_ROOT
07:12:29
@ss:someonex.netSomeoneSerge (matrix works sometimes) It's actually a bit problematic, because it increases the runtime closure size, and it mixes buildInputs with nativeBuildInputs 07:13:25
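A hedged sketch of that workaround, for reference. The redist attribute names below do exist in nixpkgs' cudaPackages, but which ones a given project needs, and that it reads CUDAToolkit_ROOT, are illustrative assumptions:

```nix
# Sketch: merge a few redist outputs into one prefix so a build system that
# expects a monolithic toolkit can find everything through one variable.
{ stdenv, symlinkJoin, cudaPackages }:
let
  cuda-redist = symlinkJoin {
    name = "cuda-redist";
    paths = with cudaPackages; [
      cuda_cudart
      cuda_nvcc
      libcublas
    ];
  };
in
stdenv.mkDerivation {
  pname = "needs-monolithic-cuda";
  version = "0.1";
  src = ./.;
  # Point the project's discovery variable at the merged tree; this is also
  # exactly where the buildInputs/nativeBuildInputs distinction gets lost.
  CUDAToolkit_ROOT = "${cuda-redist}";
}
```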
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org

Three more things that popped into my head (sorry, I am actively consuming coffee):

  1. When we do override the C/C++ compilers by setting the CC/CXX environment variables, that doesn't change binutils, so (in my case) I still see ar/ranlib/ld and friends from gcc12 being used. Is that a problem? I don't know if version bumps to those tools can cause as much damage as libraries compiled with different language standards.
  2. If a package needs to link against libcuda.so specifically, what's the best way to make the linker aware of those stubs? I set LIBRARY_PATH and that seemed to do the trick: https://github.com/NixOS/nixpkgs/pull/218166/files#diff-ab3fb67b115c350953951c7c5aa868e8dd9694460710d2a99b845e7704ce0cf5R76
  3. Is it better to set environment variables as env.BLAG = "blarg" (I saw a tree-wide change about using env because of "structuredAttrs") in the derivation or to export them in the shell, in something like preConfigure?

EDIT: I should put these in my issue about docs for CUDA packaging...

RE: 3.

Btw, did you find any documentation for this env attribute?

08:02:22
@ss:someonex.netSomeoneSerge (matrix works sometimes) I'm trying to understand why they use it over just passing an attribute directly to mkDerivation 08:03:11
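A minimal sketch of point 2 (linking against the libcuda.so stubs) as a derivation. That the stubs sit under cuda_cudart's lib/stubs is my assumption; check the actual output layout:

```nix
{ stdenv, cudaPackages }:
stdenv.mkDerivation {
  pname = "links-against-libcuda";
  version = "0.1";
  src = ./.;
  buildInputs = [ cudaPackages.cuda_cudart ];
  # Let the link editor find the libcuda.so stub at build time; the real
  # driver library is provided by the host at run time (e.g. NixOS'
  # /run/opengl-driver), so the stub directory must not leak into the RPATH.
  env.LIBRARY_PATH = "${cudaPackages.cuda_cudart}/lib/stubs";
}
```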
@ss:someonex.netSomeoneSerge (matrix works sometimes)

Individual builds:

  • More builds, but they're lighter
  • Can re-use cache when working on master
  • Hard to choose default capabilities that would fit most users and not cost too much

All-platforms build:

  • Less compute in total, but jobs are fat and sometimes drain the build machine
  • Simpler UX for end-users
09:23:14
@ss:someonex.netSomeoneSerge (matrix works sometimes)

RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures"

Couldn't find an issue tracking this, so I'll drop a message here.
The more precise argument in favour of building for individual capabilities is easier maintenance and nixpkgs development.
When working on master it's desirable to only build for your own arch, but currently it means a cache-miss for transitive dependencies.
For example, you work on torchvision and you import nixpkgs with config.cudaCapabilities = [ "8.6" ]. Snap! You're rebuilding pytorch, you cancel, you write a custom shell that overrides torchvision specifically, you remove asserts, etc.

Alternative world: cuda-maintainers.cachix.org has a day-old pytorch build for 8.6, a build for 7.5, a build for 6.0, etc
Extra: faster nixpkgs-review, assuming fewer default capabilities

10:09:13
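For concreteness, the kind of override being discussed looks roughly like this (a sketch; whether the transitive pytorch is a cache hit is exactly the open question):

```nix
# Import nixpkgs built only for a single local GPU architecture. With
# per-capability caching, torchvision work would reuse a cached pytorch
# instead of triggering a local rebuild of the whole chain.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
      cudaCapabilities = [ "8.6" ];
    };
  };
in
pkgs.python3Packages.torchvision
```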
@ss:someonex.netSomeoneSerge (matrix works sometimes)https://github.com/Mic92/nixpkgs-review/issues/31410:27:30
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net
It's actually a bit problematic, because it increases the runtime closure size, and it mixes buildInputs with nativeBuildInputs
So if a package works without using that pattern, we should try to avoid it?
11:25:21
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net

RE: 3.

Btw, did you find any documentation for this env attribute?

Seems like it’s going to be an upcoming change — there’s a new __structuredAttrs Boolean attribute stdenv can take (I think). When true, there’s some additional machinery which goes on in the background to ensure when you export or do string interpolation with certain variables that they come out in a sensible way into bash.
This was kind of handy for an example: https://nixos.mayflower.consulting/blog/2020/01/20/structured-attrs/
It is disabled by default currently though!
11:28:08
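A sketch of the two styles under discussion (variable names are placeholders, and the contrast drawn in the comments is my reading of the tree-wide change):

```nix
stdenv.mkDerivation {
  pname = "env-example";
  version = "0.1";
  # Classic style: any unrecognized top-level attribute is coerced to a
  # string and exported into the builder's environment.
  SOME_FLAG = "1";
  # env style: values are required to already be strings, and they remain
  # plain environment variables even when __structuredAttrs = true starts
  # passing other attributes to bash as (associative) arrays.
  env.OTHER_FLAG = "1";
}
```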
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net

RE: Caching "single build that supports all capabilities" vs "multiple builds that support individual cuda architectures"

Couldn't find an issue tracking this, so I'll drop a message here.
The more precise argument in favour of building for individual capabilities is easier maintenance and nixpkgs development.
When working on master it's desirable to only build for your own arch, but currently it means a cache-miss for transitive dependencies.
For example, you work on torchvision and you import nixpkgs with config.cudaCapabilities = [ "8.6" ]. Snap! You're rebuilding pytorch, you cancel, you write a custom shell that overrides torchvision specifically, you remove asserts, etc.

Alternative world: cuda-maintainers.cachix.org has a day-old pytorch build for 8.6, a build for 7.5, a build for 6.0, etc
Extra: faster nixpkgs-review, assuming fewer default capabilities

Can I add this question to the CUDA docs issue I have open on Nixpkgs? And if so, can I give you credit for it?
I’ve been spinning up 120 core VMs on azure as spot instances to do reviews and not having stuff cached is killing me. I’m currently working on my own binary cache with Cloudflare’s R2 (no ingress / egress fees and competing pricing per GB) to take care of that. Cachix is nice, but i keep hitting the limit and don’t want to pay for it / would feel bad about asking for a discount or something
11:31:19
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org
Can I add this question to the CUDA docs issue I have open on Nixpkgs? And if so, can I give you credit for it?
I’ve been spinning up 120 core VMs on azure as spot instances to do reviews and not having stuff cached is killing me. I’m currently working on my own binary cache with Cloudflare’s R2 (no ingress / egress fees and competing pricing per GB) to take care of that. Cachix is nice, but i keep hitting the limit and don’t want to pay for it / would feel bad about asking for a discount or something
Yes, please. I just wasn't sure where the appropriate place to track this was, and that sounds like a fit
11:34:25
@ss:someonex.netSomeoneSerge (matrix works sometimes)

I think NCCL is (still) ignoring cudaCapabilities. We should probably pass NVCC_GENCODE in makeFlagsArray

The format is:

NVCC_GENCODE is -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80

Seems like we can use cudaFlags.cudaGencode for that

13:58:39
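A sketch of that wiring, assuming cudaFlags.cudaGencode is a list of -gencode=... strings as the paste above suggests, and that nccl, lib, and cudaFlags are in scope:

```nix
# Override NCCL to respect config.cudaCapabilities by passing the gencode
# flags explicitly instead of relying on NCCL's built-in default list.
nccl.overrideAttrs (oldAttrs: {
  makeFlags = (oldAttrs.makeFlags or [ ]) ++ [
    "NVCC_GENCODE=${lib.concatStringsSep " " cudaFlags.cudaGencode}"
  ];
})
```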
@connorbaker:matrix.orgconnor (he/him)Ah yeah I didn’t switch it over yet, that’s in https://github.com/NixOS/nixpkgs/pull/21761914:36:18
@domenkozar:matrix.orgDomen Kožar
In reply to @connorbaker:matrix.org
Can I add this question to the CUDA docs issue I have open on Nixpkgs? And if so, can I give you credit for it?
I’ve been spinning up 120 core VMs on azure as spot instances to do reviews and not having stuff cached is killing me. I’m currently working on my own binary cache with Cloudflare’s R2 (no ingress / egress fees and competing pricing per GB) to take care of that. Cachix is nice, but i keep hitting the limit and don’t want to pay for it / would feel bad about asking for a discount or something
I'm happy to sponsor such stuff :)
16:22:38
@justbrowsing:matrix.orgKevin Mittman (UTC-7)Redacted or Malformed Event23:19:49
28 Feb 2023
@connorbaker:matrix.orgconnor (he/him)I've got the stomach flu so sorry if I haven't responded / reviewed things recently; I should be able to resume tomorrow.15:14:33
@connorbaker:matrix.orgconnor (he/him) Unrelated but something to know: in the cudaPackages 11.7 to 11.8 transition, be aware that cuda_profiler_api.h is no longer in cuda_nvprof; it's in a new cuda_profiler_api package in cudaPackages. 15:19:39
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org
I've got the stomach flu so sorry if I haven't responded / reviewed things recently; I should be able to resume tomorrow.

Ooh, that sounds devastating. Take care!

P.S. Also remember that the whole affair is voluntary, there isn't any rush, and it's more important to keep things sustainable than to sprint

15:32:17
@ss:someonex.netSomeoneSerge (matrix works sometimes)

RE: Building for individual arches

We'd need to choose a smaller list for nixpkgs' default cudaCapabilities, and we don't have a criterion for making that choice.
We could run a poll on NixOS Discourse, but I don't expect it to be representative.

One option is to include everything from Tim Dettmers' guide (available in JSON), which probably means just [ "8.6" "8.9" ]

Another is to choose whatever covers most of the CUDA support table on Wikipedia, i.e. [ "6.1" "7.5" "8.6" ]. I feel like this would still be pretty fat build-wise

And then, I still wonder what happens if we use something like [ "7.5" "8.6" "5.0" ] (i.e. with 5.0+PTX). I haven't seen anyone do that; I expect it would work, and it would cover all computes in between 5.0 and 8.6, just that everything except 8.6 and 7.5 might use suboptimal implementations?

20:29:39
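For reference, a [ "7.5" "8.6" "5.0" ]-with-PTX selection corresponds to nvcc flags along these lines (standard gencode syntax; the mapping is my reading of the discussion, not nixpkgs output):

```
-gencode=arch=compute_75,code=sm_75       # real SASS for 7.5
-gencode=arch=compute_86,code=sm_86       # real SASS for 8.6
-gencode=arch=compute_50,code=sm_50       # real SASS for 5.0
-gencode=arch=compute_50,code=compute_50  # PTX, JIT-compiled by the driver
                                          # for any capability >= 5.0
```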


