| 21 Mar 2023 |
mjlbach | Also Tyler (tbenst) doesn't really use nix anymore, I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL) | 02:30:40 |
connor (he/him) | That's correct, it's only forward not backward compatible
Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want? | 03:21:08 |
connor (he/him) | In reply to @atrius:matrix.org Also Tyler (tbenst) doesn't really use nix anymore, I saw some people pinging him for approval on magma and such. I just texted him to confirm (we are friends IRL) Would he be okay if I removed him from stuff like Magma? What's the recommended way to handle that? | 03:24:57 |
mjlbach | I just texted him, will let you know :) | 03:25:33 |
mjlbach | In reply to @connorbaker:matrix.org That's correct, it's only forward not backward compatible Come to think of it, should we even have a default set of cuda architectures we build for? If we're looking at making the default just the single test architecture, why bother having a default? Why not require the user to specify what they want? Is there a reason for targeting 90 instead of 86 + PTX | 03:26:27 |
mjlbach | The only issue with having users specify the default is it's not exactly clear | 03:27:10 |
mjlbach | I guess you can just put a link to https://developer.nvidia.com/cuda-gpus for people to figure out the needed versions | 03:28:20 |
connor (he/him) | In reply to @atrius:matrix.org Is there a reason for targeting 90 instead of 86 + PTX Not really; if this is in reference to https://github.com/NixOS/nixpkgs/issues/221564#issuecomment-1477191551, that was just because it's the latest. 86+PTX would give the best performance with the largest compatibility with Nixpkgs as it is currently, I think (it's the latest version Torch 1.13 supports, for example). 50+PTX would give the broadest HW support with (possibly) the worst performance. | 03:30:54 |
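The 86+PTX versus 50+PTX trade-off follows from NVIDIA's compatibility rules: machine code (cubin/SASS) built for capability X.Y runs on devices with the same major version X and a minor version of at least Y, while embedded PTX can be JIT-compiled by the driver for any device of capability X.Y or newer. Below is a minimal Python sketch of those rules as summarized in the CUDA C++ Programming Guide; the function names are hypothetical, and arch-specific targets such as sm_90a are deliberately ignored.

```python
from __future__ import annotations

# Sketch of CUDA binary/PTX compatibility rules (ignores arch-specific
# targets such as sm_90a). Function names are hypothetical.

def parse_cap(cap: str) -> tuple[int, int]:
    """Turn a capability string like "8.6" into (major, minor)."""
    major, minor = cap.split(".")
    return int(major), int(minor)

def runs_on(device: str, sass_caps: list[str], ptx_cap: str | None) -> bool:
    """Return True if a build can run on a GPU of capability `device`.

    sass_caps: capabilities with precompiled machine code (cubin/SASS).
    ptx_cap:   capability of the embedded PTX, or None if no PTX was kept.
    """
    dev = parse_cap(device)
    for cap in sass_caps:
        major, minor = parse_cap(cap)
        # cubin runs on the same major version with an equal or newer minor
        if major == dev[0] and minor <= dev[1]:
            return True
    # PTX can be JIT-compiled by the driver for any equal or newer device
    return ptx_cap is not None and parse_cap(ptx_cap) <= dev

print(runs_on("8.9", ["8.6"], "8.6"))  # True: sm_86 cubin runs on 8.9
print(runs_on("9.0", ["8.6"], "8.6"))  # True: only via PTX JIT
print(runs_on("7.5", ["8.6"], "8.6"))  # False: not backward compatible
print(runs_on("7.5", ["5.0"], "5.0"))  # True: 50+PTX covers older GPUs too
```

The PTX path works but pays for JIT compilation at load time and may not exploit newer hardware features, which is roughly why 50+PTX trades performance for breadth.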
mjlbach | Ah, sorry for the phrasing, was suggesting 86 :) | 03:31:37 |
mjlbach | * Ah, sorry for the phrasing, was suggesting 86, 89, 90 + PTX :) | 03:31:56 |
mjlbach | I also finally got pytorch building with poetry2nix... | 03:32:25 |
connor (he/him) | In reply to @atrius:matrix.org The only issue with having users specify the default is it's not exactly clear That's definitely an issue! Personally I'd be a fan of an assert like cudaSupport -> cudaCapabilities != [ ] in cudaPackages which points the user to docs about how to build for a specific target and how to find out the CUDA compute capability of their device | 03:33:32 |
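In Nix, `a -> b` is logical implication, so the proposed assertion only fails when `cudaSupport` is true and `cudaCapabilities` is empty. A minimal Python sketch of that logic follows; the function name and error wording are hypothetical, and the link is NVIDIA's compute-capability table mentioned in this thread.

```python
from __future__ import annotations

CUDA_GPUS_URL = "https://developer.nvidia.com/cuda-gpus"

def check_cuda_config(cuda_support: bool, cuda_capabilities: list[str]) -> None:
    """Mirror the proposed `cudaSupport -> cudaCapabilities != [ ]` assertion.

    Implication `a -> b` is equivalent to `(not a) or b`. The function name
    and message are illustrative only.
    """
    if not ((not cuda_support) or cuda_capabilities != []):
        raise ValueError(
            "cudaSupport is enabled but cudaCapabilities is empty; "
            f"look up your GPU's compute capability at {CUDA_GPUS_URL} "
            'and set, for example, cudaCapabilities = [ "8.6" ].'
        )

check_cuda_config(False, [])       # CUDA disabled: nothing to check
check_cuda_config(True, ["8.6"])   # CUDA enabled with a target: passes
```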
mjlbach | Maybe there should be a default cuda flake with instructions that shows how to override it | 03:34:31 |
mjlbach | * Maybe there should be a default cuda flake with instructions that shows how to override it, with the doc link embedded | 03:34:41 |
mjlbach | https://github.com/cpcloud/torch-p2nix/blob/81f318026f19e1a2a41cf6126f8e1cd5a7fab8be/flake.nix#L24-L61 | 03:35:03 |
mjlbach | The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation | 03:35:41 |
SomeoneSerge (matrix works sometimes) | In reply to @atrius:matrix.org The issue with torch is that I can't see how one derivation is going to provide support for building all versions of torch that are currently in circulation
all versions of torch that are currently in circulation
You mean semvers?
| 11:10:56 |
SomeoneSerge (matrix works sometimes) | LoL, I think supplying a global cudaSupport overrides USE_CUDA in the ROCm version of torch and spoils nixpkgs-review | 11:26:53 |
SomeoneSerge (matrix works sometimes) | Also, we ought to do something about these pytestCheckPhases; they really don't go well in parallel | 11:27:55 |
connor (he/him) | Does anyone else set max-jobs = auto in their nix configuration? I've found that I need to limit it to 2 or similar to prevent nixpkgs-review from building several copies of torch and Jax in parallel lol | 14:27:15 |
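Why `max-jobs = auto` hurts here is multiplicative: `max-jobs` bounds how many derivations build at once (`auto` means one per CPU), and `cores` sets NIX_BUILD_CORES for each build (0 means all CPUs). A rough Python sketch of the worst-case thread count, assuming every build honours NIX_BUILD_CORES for its own parallelism and ignoring memory use entirely; the function name is hypothetical.

```python
from __future__ import annotations

import os

def worst_case_threads(max_jobs: int | str, cores: int, ncpus: int | None = None) -> int:
    """Rough upper bound on concurrent build threads under Nix.

    max_jobs: nix.conf `max-jobs`; "auto" means one job per CPU.
    cores:    nix.conf `cores` (NIX_BUILD_CORES); 0 means "all CPUs".
    Assumes each build uses exactly NIX_BUILD_CORES threads.
    """
    ncpus = ncpus or os.cpu_count() or 1
    jobs = ncpus if max_jobs == "auto" else int(max_jobs)
    per_job = ncpus if cores == 0 else cores
    return jobs * per_job

print(worst_case_threads("auto", 0, ncpus=16))  # 256
print(worst_case_threads(2, 0, ncpus=16))       # 32
```

Capping `max-jobs` at 2 keeps each build fully parallel while stopping several torch/JAX builds from piling up at once.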
connor (he/him) | In reply to @ss:someonex.net
crt/host_defines.h and such are shipped in cuda_nvcc -> currently we can't simply drop cuda_nvcc in nativeBuildInputs, but have to sometimes add it to buildInputs as well
We should split the outputs
in your experience with the redist packages, is splitting outputs going to be simple or require doing so on a case-by-case basis? I'm going to make some more issues tonight and wanted to know if I should make a single issue for splitting the outputs or if I should find the worst offenders by closure size and make tickets for them specifically. | 14:30:09 |
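One way to picture splitting outputs: sort each file of a redist package into a Nix output by its top-level directory, so headers such as crt/host_defines.h land in a `dev` output that could go in buildInputs while the compiler itself stays in nativeBuildInputs. The sketch below is hypothetical: the directory-to-output mapping is an assumption, not taken from build-cuda-redist-package.nix, and each package would still need checking for broken CMake/pkg-config discovery.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical mapping from top-level directories to Nix output names.
OUTPUT_FOR_DIR = {
    "bin": "bin",
    "include": "dev",
    "lib": "lib",
}

def assign_outputs(paths: list[str]) -> dict[str, list[str]]:
    """Group package-relative file paths by the output they would move to."""
    outputs: dict[str, list[str]] = {}
    for p in paths:
        top = PurePosixPath(p).parts[0]
        out = OUTPUT_FOR_DIR.get(top, "out")  # anything unrecognised stays in out
        outputs.setdefault(out, []).append(p)
    return outputs

print(assign_outputs(["bin/nvcc", "include/crt/host_defines.h", "LICENSE"]))
```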
SomeoneSerge (matrix works sometimes) | In reply to @connorbaker:matrix.org in your experience with the redist packages, is splitting outputs going to be simple or require doing so on a case-by-case basis? I'm going to make some more issues tonight and wanted to know if I should make a single issue for splitting the outputs or if I should find the worst offenders by closure size and make tickets for them specifically. I have no idea. We just need to try editing build-cuda-redist-package.nix and see if that breaks any cmake/pkg-config discovery downstream | 14:32:20 |
SomeoneSerge (matrix works sometimes) | Meanwhile, I just noticed we don't actually patch CUDA's .pc files:
File: cudaPackages/pkg-config/nvrtc-11.7.pc
    cudaroot=/usr/local/cuda-11.7
    libdir=${cudaroot}/targets/x86_64-linux/lib
    includedir=${cudaroot}/targets/x86_64-linux/include

    Name: nvrtc
    Description: A runtime compilation library for CUDA C++
    Version: 11.7
    Libs: -L${libdir} -lnvrtc
    Cflags: -I${includedir}
I guess all of the automatic discovery we had worked through FindCUDAToolkit.cmake and not pkg-config
| 14:34:03 |
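Patching these files amounts to rewriting a few `name=value` variable definitions. A minimal Python sketch of that substitution on the nvrtc .pc file above; the store path is a made-up placeholder, the function name is hypothetical, and it assumes the package keeps its libraries and headers directly under lib/ and include/ rather than targets/x86_64-linux/. In an actual derivation this would more likely be a `substituteInPlace` or `sed` call in a fixup phase.

```python
import re

PC_TEXT = """\
cudaroot=/usr/local/cuda-11.7
libdir=${cudaroot}/targets/x86_64-linux/lib
includedir=${cudaroot}/targets/x86_64-linux/include

Name: nvrtc
Description: A runtime compilation library for CUDA C++
Version: 11.7
Libs: -L${libdir} -lnvrtc
Cflags: -I${includedir}
"""

# pkg-config variable definitions look like `name=value` at the start of a line;
# keyword lines like `Libs:` use a colon and are left untouched.
VAR_LINE = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)=(.*)$")

def set_pc_vars(text: str, overrides: dict) -> str:
    """Replace the values of selected variable definitions in a .pc file."""
    out = []
    for line in text.splitlines():
        m = VAR_LINE.match(line)
        if m and m.group(1) in overrides:
            line = f"{m.group(1)}={overrides[m.group(1)]}"
        out.append(line)
    return "\n".join(out) + "\n"

patched = set_pc_vars(PC_TEXT, {
    "cudaroot": "/nix/store/example-cuda_nvrtc-11.7",  # hypothetical store path
    "libdir": "${cudaroot}/lib",
    "includedir": "${cudaroot}/include",
})
print(patched)
```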
SomeoneSerge (matrix works sometimes) | Either way, we definitely should replace this /usr/local stuff | 14:34:22 |
SomeoneSerge (matrix works sometimes) | Which, CC Kevin Mittman, isn't technically permitted by the license | 14:35:20 |
Kevin Mittman (UTC-7) | New release of nvJPEG2000: https://developer.download.nvidia.com/compute/nvjpeg2k/redist/ | 14:36:37 |
SomeoneSerge (matrix works sometimes) | !!!! | 14:36:57 |
Kevin Mittman (UTC-7) | Formulating a response to the inquiry, but wording is hard | 14:49:39 |
SomeoneSerge (matrix works sometimes) | Tfw upstream has - name: Patch setup.py in their GitHub workflows (looking at openai/triton, which we now need for pytorchWithRocm...) | 14:57:50 |
SomeoneSerge (matrix works sometimes) | * Tfw upstream has - name: Patch setup.py in their GitHub workflows (looking at openai/triton, which we now need for pytorchWithRocm...): "why would you hide this from me?.." | 15:17:03 |