
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



18 Mar 2023
SomeoneSerge (back on matrix) [@ss:someonex.net]: There's absolutely zero reason not to have a separate build step before install here =\  22:39:32
hexa [@hexa:lossy.network]: can imagine this to get quite messy  22:42:33
19 Mar 2023
connor (he/him) [@connorbaker:matrix.org]: Is there guidance on when to use preConfigure to set environment variables vs. setting them in the derivation via all-caps attributes? For example, when working on the PyTorch derivation, should I add attributes for environment variables, like https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/torch/default.nix#L208-L222, or set them in preConfigure, like https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/torch/default.nix#L196?  21:24:32
20 Mar 2023
SomeoneSerge (back on matrix) [@ss:someonex.net]:
crt/host_defines.h and such are shipped in cuda_nvcc -> currently we can't simply drop cuda_nvcc in nativeBuildInputs, but sometimes have to add it to buildInputs as well.
We should split the outputs.
20:15:15
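
[Note: a minimal sketch of the workaround being described; only cuda_nvcc is from the discussion, everything else below is a placeholder.]

```nix
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-consumer";
  version = "0.1.0";
  src = ./.;

  # nvcc must be able to run at build time...
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  # ...but cuda_nvcc also ships headers such as crt/host_defines.h, so
  # for now it has to appear in buildInputs too, putting those headers
  # on the include path for the target.
  buildInputs = [ cudaPackages.cuda_nvcc ];
}
```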
connor (he/him) [@connorbaker:matrix.org]: We should also definitely split the outputs for CUDNN, given that the static libraries nearly double the size: 1.2GB -> 2.4GB?  20:34:13
SomeoneSerge (back on matrix) [@ss:someonex.net]:
In reply to @connorbaker:matrix.org:
  We should also definitely split the outputs for CUDNN, given that the static libraries nearly double the size: 1.2GB -> 2.4GB?
Yes, we should split out all static libs, not only cudnn, but the cuda redist packages as well.
20:59:37
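
[Note: an untested sketch of what such a split might look like for cudnn; the "static" output name and the glob are assumptions.]

```nix
cudaPackages.cudnn.overrideAttrs (prevAttrs: {
  outputs = (prevAttrs.outputs or [ "out" ]) ++ [ "static" ];
  postInstall = (prevAttrs.postInstall or "") + ''
    # moveToOutput is provided by stdenv's multiple-outputs setup hook
    moveToOutput "lib/*.a" "$static"
  '';
})
```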
21 Mar 2023
freesig [@freesig:matrix.org] joined the room.  00:59:54
SomeoneSerge (back on matrix) [@ss:someonex.net]: Let's see https://github.com/NixOS/nixpkgs/pull/222273  01:36:36
SomeoneSerge (back on matrix) [@ss:someonex.net]: Not that I know of, but we probably want to minimize the size of imperative scripts. One valid reason to use preConfigure could be to explicitly extend an existing bash variable, or something like that. In the linked examples we could probably just as well use env = optionalAttrs ...?  01:39:49
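
[Note: a sketch of the two styles being compared; USE_CUDA here stands in for any flag and these are not torch's actual settings.]

```nix
{ lib, stdenv, cudaSupport ? false }:

stdenv.mkDerivation {
  pname = "example";
  version = "0.1.0";
  src = ./.;

  # Declarative: plain key = value environment variables as attributes.
  env = lib.optionalAttrs cudaSupport {
    USE_CUDA = "1";
  };

  # Imperative: preConfigure pays off mainly when an existing shell
  # variable needs extending rather than setting from scratch.
  preConfigure = lib.optionalString cudaSupport ''
    export NIX_CFLAGS_COMPILE="$NIX_CFLAGS_COMPILE -DUSE_CUDA"
  '';
}
```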
mjlbach [@atrius:matrix.org]: connor (he/him): correct me if I'm wrong, but I believe if you only build for compute capability 9.0 (sm_90), that will only support the NVIDIA H100, because it's not backwards compatible (PTX ensures forward compatibility with newer GPUs).  02:30:07
mjlbach [@atrius:matrix.org]: Also, Tyler (tbenst) doesn't really use Nix anymore; I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL).  02:30:40
connor (he/him) [@connorbaker:matrix.org]: That's correct, it's only forward compatible, not backward. Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want?  03:21:08
connor (he/him) [@connorbaker:matrix.org]:
In reply to @atrius:matrix.org:
  Also, Tyler (tbenst) doesn't really use Nix anymore; I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL).
Would he be okay if I removed him from stuff like Magma? What's the recommended way to handle that?
03:24:57
mjlbach [@atrius:matrix.org]: I just texted him, will let you know :)  03:25:33
mjlbach [@atrius:matrix.org]:
In reply to @connorbaker:matrix.org:
  That's correct, it's only forward compatible, not backward. Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want?
Is there a reason for targeting 90 instead of 86 + PTX?
03:26:27
mjlbach [@atrius:matrix.org]: The only issue with having users specify the default is that it's not exactly clear what to pick.  03:27:10
mjlbach [@atrius:matrix.org]: I guess you can just put a link to https://developer.nvidia.com/cuda-gpus for people to figure out the needed versions.  03:28:20
connor (he/him) [@connorbaker:matrix.org]:
In reply to @atrius:matrix.org:
  Is there a reason for targeting 90 instead of 86 + PTX?
Not really; if this is in reference to https://github.com/NixOS/nixpkgs/issues/221564#issuecomment-1477191551, that was just because it's the latest. 86+PTX would give the best performance with the largest compatibility with Nixpkgs as it currently is, I think (it's the latest version Torch 1.13 supports, for example). 50+PTX would give the broadest HW support with (possibly) the worst performance.
03:30:54
mjlbach [@atrius:matrix.org]: Ah, sorry for the phrasing, was suggesting 86 :)  03:31:37
mjlbach [@atrius:matrix.org]: * Ah, sorry for the phrasing, was suggesting 86, 89, 90 + PTX :)  03:31:56
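
[Note: for reference, a sketch of how a choice like "86 + PTX" can be expressed through nixpkgs config; option names follow the nixpkgs CUDA documentation, but exact defaults vary between revisions.]

```nix
import <nixpkgs> {
  config = {
    allowUnfree = true;            # the CUDA toolkit is unfree
    cudaSupport = true;
    cudaCapabilities = [ "8.6" ];  # real (SASS) code for sm_86
    cudaForwardCompat = true;      # additionally emit PTX for newer GPUs
  };
}
```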
mjlbach [@atrius:matrix.org]: I also finally got pytorch building with poetry2nix...  03:32:25
connor (he/him) [@connorbaker:matrix.org]:
In reply to @atrius:matrix.org:
  The only issue with having users specify the default is that it's not exactly clear what to pick.
That's definitely an issue! Personally I'd be a fan of an assert like cudaSupport -> cudaCapabilities != [ ] in cudaPackages, which points the user to docs about how to build for a specific target / how to find out the cuda compute capability of their device.
03:33:32
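
[Note: the suggested guard might look something like this lib.assertMsg sketch; the placement and exact wording are hypothetical.]

```nix
{ lib, config }:

# Hypothetical guard: fail evaluation early, with a pointer to the
# docs, when CUDA is on but no target capability was chosen.
assert lib.assertMsg (config.cudaSupport -> config.cudaCapabilities != [ ]) ''
  cudaSupport is enabled but cudaCapabilities is empty.
  Set config.cudaCapabilities, e.g. [ "8.6" ]; look up your GPU's
  compute capability at https://developer.nvidia.com/cuda-gpus.
'';
{
  # ...the rest of the package set would follow here
}
```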
mjlbach [@atrius:matrix.org]: Maybe there should be a default cuda flake with instructions that shows how to override it  03:34:31
mjlbach [@atrius:matrix.org]: * Maybe there should be a default cuda flake with instructions that shows how to override it / with the doc link embedded  03:34:41
mjlbach [@atrius:matrix.org]: https://github.com/cpcloud/torch-p2nix/blob/81f318026f19e1a2a41cf6126f8e1cd5a7fab8be/flake.nix#L24-L61  03:35:03
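
[Note: in that spirit, a minimal, untested sketch of such a flake; the capability list is the part a user would override.]

```nix
{
  description = "Sketch: CUDA-enabled nixpkgs with an explicit capability list";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      pkgs = import nixpkgs {
        system = "x86_64-linux";
        config = {
          allowUnfree = true;
          cudaSupport = true;
          # Override this for your GPU; see
          # https://developer.nvidia.com/cuda-gpus
          cudaCapabilities = [ "8.6" ];
        };
      };
    in {
      packages.x86_64-linux.torch = pkgs.python3Packages.torch;
    };
}
```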
mjlbach [@atrius:matrix.org]: The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation.  03:35:41
SomeoneSerge (back on matrix) [@ss:someonex.net]:
In reply to @atrius:matrix.org:
  The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation.
"all versions of torch that are currently in circulation": you mean semvers?
11:10:56
SomeoneSerge (back on matrix) [@ss:someonex.net]: LoL, I think supplying a global cudaSupport overrides USE_CUDA in the ROCm version of torch and spoils nixpkgs-review 🙃  11:26:53
SomeoneSerge (back on matrix) [@ss:someonex.net]: Also, we ought to do something about these pytestCheckPhases, they really don't go well in parallel  11:27:55
connor (he/him) [@connorbaker:matrix.org]: Does anyone else set max-jobs = auto in their nix configuration? I've found that I need to limit it to 2 or similar to prevent nixpkgs-review from building several copies of torch and Jax in parallel lol  14:27:15
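
[Note: as a sketch, the cap described above could be expressed in a NixOS configuration like this; the numbers are the ones mentioned, not a general recommendation.]

```nix
{
  # Cap concurrent derivation builds so tools like nixpkgs-review
  # don't build several large packages at once...
  nix.settings.max-jobs = 2;
  # ...while each individual build may still use every core.
  nix.settings.cores = 0;
}
```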


