!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

340 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers

21 Mar 2023
@atrius:matrix.org mjlbach Also Tyler (tbenst) doesn't really use nix anymore, I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL) 02:30:40
@connorbaker:matrix.org connor (he/him) That's correct, it's only forward not backward compatible. Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want? 03:21:08
@connorbaker:matrix.org connor (he/him)
In reply to @atrius:matrix.org
Also Tyler (tbenst) doesn't really use nix anymore, I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL)
Would he be okay if I removed him from stuff like Magma? What's the recommended way to handle that?
03:24:57
@atrius:matrix.org mjlbach I just texted him, will let you know :) 03:25:33
@atrius:matrix.org mjlbach
In reply to @connorbaker:matrix.org
That's correct, it's only forward not backward compatible
Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want?
Is there a reason for targeting 90 instead of 86 + PTX?
03:26:27
@atrius:matrix.org mjlbach The only issue with having users specify the default is that it's not exactly clear 03:27:10
@atrius:matrix.org mjlbach I guess you can just put a link to https://developer.nvidia.com/cuda-gpus for people to figure out the needed versions 03:28:20
@connorbaker:matrix.org connor (he/him)
In reply to @atrius:matrix.org
Is there a reason for targeting 90 instead of 86 + PTX?
Not really; if this is in reference to https://github.com/NixOS/nixpkgs/issues/221564#issuecomment-1477191551 that was just because it's the latest. 86+PTX would give the best performance with the largest compatibility with Nixpkgs as it is currently, I think (latest version Torch 1.13 supports for example). 50+PTX would give the broadest HW support with (possibly) the worst performance.
03:30:54
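
A minimal sketch of what the 86+PTX choice could look like from the user's side, assuming the nixpkgs config attributes `cudaSupport`, `cudaCapabilities`, and `cudaForwardCompat` behave as described in the CUDA section of the Nixpkgs manual. The values are illustrative only and are not a decision reached in this conversation:

```nix
# Sketch only: import nixpkgs with CUDA enabled for a single architecture plus PTX.
import <nixpkgs> {
  config = {
    allowUnfree = true;           # the CUDA toolkit is unfree
    cudaSupport = true;           # build CUDA-enabled variants where packages offer them
    cudaCapabilities = [ "8.6" ]; # target compute capability 8.6
    cudaForwardCompat = true;     # assumption: also embed PTX so newer GPUs can JIT-compile it
  };
}
```

Swapping `[ "8.6" ]` for `[ "5.0" ]` would correspond to the 50+PTX variant mentioned above.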
@atrius:matrix.org mjlbach Ah, sorry for the phrasing, was suggesting 86 :) 03:31:37
@atrius:matrix.org mjlbach * Ah, sorry for the phrasing, was suggesting 86, 89, 90 + PTX :) 03:31:56
@atrius:matrix.org mjlbach I also finally got pytorch building with poetry2nix... 03:32:25
@connorbaker:matrix.org connor (he/him)
In reply to @atrius:matrix.org
The only issue with having users specify the default is that it's not exactly clear
that's definitely an issue! Personally I'd be a fan of an assert like cudaSupport -> cudaCapabilities != [ ] in cudaPackages which points the user to docs about how to build for a specific target/how to find out the cuda compute capability of their device
03:33:32
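
The assert idea above could be sketched roughly as follows. This is a standalone, hypothetical file, not the actual `cudaPackages` code: the argument names mirror the ones used in the message, and the error text and its links (taken from the room topic and the earlier message) are assumptions.

```nix
# check-cuda.nix: hypothetical standalone sketch of the proposed assertion.
{ cudaSupport ? false, cudaCapabilities ? [ ] }:

# `a -> b` is logical implication in Nix: it is false only when
# cudaSupport is true and the right-hand side is false.
# The `|| throw` replaces the generic "assertion failed" with a pointer to docs.
assert cudaSupport -> (cudaCapabilities != [ ] || throw
  "cudaSupport is enabled but cudaCapabilities is empty. Look up your GPU's compute capability at https://developer.nvidia.com/cuda-gpus and see https://nixos.org/manual/nixpkgs/unstable/#cuda for how to set it.");

{ inherit cudaSupport cudaCapabilities; }
```

Evaluating it with `nix-instantiate --eval --strict check-cuda.nix --arg cudaSupport true` should raise the message, while additionally passing `--arg cudaCapabilities '[ "8.6" ]'` should evaluate cleanly.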
@atrius:matrix.org mjlbach Maybe there should be a default cuda flake with instructions that shows how to override it 03:34:31
@atrius:matrix.org mjlbach * Maybe there should be a default cuda flake with instructions that shows how to override it/ with the doc link embedded 03:34:41
@atrius:matrix.org mjlbach https://github.com/cpcloud/torch-p2nix/blob/81f318026f19e1a2a41cf6126f8e1cd5a7fab8be/flake.nix#L24-L61 03:35:03
@atrius:matrix.org mjlbach The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation 03:35:41
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @atrius:matrix.org
The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation

all versions of torch that are currently in circulation

You mean semvers?

11:10:56
@ss:someonex.net SomeoneSerge (matrix works sometimes) LoL, I think supplying a global cudaSupport overrides USE_CUDA in the ROCM version of torch and spoils nixpkgs-review 🙃 11:26:53
@ss:someonex.net SomeoneSerge (matrix works sometimes) Also, we ought to do something about these pytestCheckPhases, they really don't go well in parallel 11:27:55
@connorbaker:matrix.org connor (he/him) Does anyone else set max-jobs = auto in their nix configuration? I've found that I need to limit it to 2 or similar to prevent nixpkgs-review from building several copies of torch and Jax in parallel lol 14:27:15
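
For reference, a sketch of the limit described above, assuming a NixOS machine configured through `nix.settings` (on other systems the same keys would go in `nix.conf`). The `cores` value is a hypothetical example and was not mentioned in the conversation:

```nix
# Hypothetical NixOS module fragment: cap the number of concurrent builds.
{
  nix.settings = {
    max-jobs = 2; # at most two derivations build at once (instead of "auto")
    cores = 8;    # assumption: cores offered to each build; tune for your machine
  };
}
```

With this, a nixpkgs-review run would build at most two heavy derivations such as torch or JAX at the same time.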
@connorbaker:matrix.org connor (he/him)
In reply to @ss:someonex.net

crt/host_defines.h and such are shipped in cuda_nvcc -> currently we can't simply drop cuda_nvcc in nativeBuildInputs, but have to sometimes add it to buildInputs as well

We should split the outputs

in your experience with the redist packages, is splitting outputs going to be simple or require doing so on a case-by-case basis? I'm going to make some more issues tonight and wanted to know if I should make a single issue for splitting the outputs or if I should find the worst offenders by closure size and make tickets for them specifically.
14:30:09
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @connorbaker:matrix.org
in your experience with the redist packages, is splitting outputs going to be simple or require doing so on a case-by-case basis? I'm going to make some more issues tonight and wanted to know if I should make a single issue for splitting the outputs or if I should find the worst offenders by closure size and make tickets for them specifically.
I have no idea. We just need to try editing build-cuda-redist-package.nix and see if that breaks any cmake/pkg-config discovery downstream
14:32:20
@ss:someonex.net SomeoneSerge (matrix works sometimes)

Meanwhile, I just noticed we don't actually patch cuda's .pc files:

    File: cudaPackages/pkg-config/nvrtc-11.7.pc

    cudaroot=/usr/local/cuda-11.7
    libdir=${cudaroot}/targets/x86_64-linux/lib
    includedir=${cudaroot}/targets/x86_64-linux/include

    Name: nvrtc
    Description: A runtime compilation library for CUDA C++
    Version: 11.7
    Libs: -L${libdir} -lnvrtc
    Cflags: -I${includedir}

I guess all of the automatic discovery we had worked through FindCUDAToolkit.cmake and not pkg-config

14:34:03
@ss:someonex.net SomeoneSerge (matrix works sometimes) Either way, we definitely should replace this /usr/local stuff 14:34:22
@ss:someonex.net SomeoneSerge (matrix works sometimes) Which, CC Kevin Mittman 😆, isn't technically permitted by the license 14:35:20
@justbrowsing:matrix.org Kevin Mittman (UTC-7)

New release of nvJPEG2000

https://developer.download.nvidia.com/compute/nvjpeg2k/redist/

14:36:37
@ss:someonex.net SomeoneSerge (matrix works sometimes) !!!! 14:36:57
@justbrowsing:matrix.org Kevin Mittman (UTC-7) Formulating a response to inquiry but wording is hard 14:49:39
@ss:someonex.net SomeoneSerge (matrix works sometimes) Tfw upstream has - name: Patch setup.py in their github workflows (looking at openai/triton which we now need for pytorchWithRocm...) 14:57:50
@ss:someonex.net SomeoneSerge (matrix works sometimes) * Tfw upstream has - name: Patch setup.py in their github workflows (looking at openai/triton which we now need for pytorchWithRocm...): "why would you hide this from me?.." 15:17:03
