| 28 Feb 2023 |
connor (he/him) | I've got the stomach flu so sorry if I haven't responded / reviewed things recently; I should be able to resume tomorrow. | 15:14:33 |
connor (he/him) | Unrelated but something to know: in the cudaPackages 11.7 to 11.8 transition, be aware that cuda_profiler_api.h is no longer in cuda_nvprof; it's in a new cuda_profiler_api package in cudaPackages. | 15:19:39 |
SomeoneSerge (matrix works sometimes) | In reply to @connorbaker:matrix.org
I've got the stomach flu so sorry if I haven't responded / reviewed things recently; I should be able to resume tomorrow.
Ooh, that sounds devastating. Take care!
P.S. Also remember that the whole affair is voluntary, there isn't any rush, and it's more important to keep things sustainable than to sprint
| 15:32:17 |
SomeoneSerge (matrix works sometimes) | RE: Building for individual arches
We'd need to choose a smaller list for nixpkgs' default cudaCapabilities, and we don't have a criterion for making that choice. We could run a poll on the NixOS Discourse, but I don't expect it to be representative.
One option is to include everything from Tim Dettmers' guide (available in JSON), which probably means just [ "8.6" "8.9" ]
Another is to choose whatever covers most of the CUDA support table from Wikipedia, i.e. [ "6.1" "7.5" "8.6" ]. I feel like this would still be pretty fat build-wise
And then, I still wonder what happens if we do something like [ "7.5" "8.6" "5.0" ] (i.e. with 5.0+PTX). I haven't seen anyone do that, but I expect it would work and cover all compute capabilities between 5.0 and 8.6, just that everything except 8.6 and 7.5 might use suboptimal implementations?
| 20:29:39 |
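To make the trade-off between these lists easier to compare, here is a minimal Python sketch of which devices a given list would serve. It assumes the usual CUDA rules: machine code built for capability X.y runs on devices X.z with z >= y, and embedded PTX can be JIT-compiled on any device of equal or newer capability. It ignores special cases such as Tegra parts. The `coverage` helper and its `sass`/`ptx` parameters are hypothetical names for illustration, not anything in nixpkgs.

```python
from typing import Optional, Sequence, Tuple

Capability = Tuple[int, int]


def parse(cap: str) -> Capability:
    """Turn a capability string such as "8.6" into (8, 6)."""
    major, minor = cap.split(".")
    return int(major), int(minor)


def coverage(device: str, sass: Sequence[str], ptx: Optional[str] = None) -> str:
    """Classify how a device would run a build (simplified model, see assumptions).

    "native":      some machine-code target has the same major version and a
                   minor version no newer than the device's.
    "ptx-jit":     no machine-code target fits, but the embedded PTX targets a
                   capability no newer than the device's.
    "unsupported": neither applies.
    """
    dev = parse(device)
    for target in map(parse, sass):
        if target[0] == dev[0] and target[1] <= dev[1]:
            return "native"
    if ptx is not None and parse(ptx) <= dev:
        return "ptx-jit"
    return "unsupported"


# The [ "7.5" "8.6" "5.0" ] idea, assuming PTX is embedded only for 5.0.
for device in ["5.2", "6.1", "7.5", "8.0", "8.6", "8.9"]:
    print(device, coverage(device, sass=["5.0", "7.5", "8.6"], ptx="5.0"))
```

Under this model a 6.1 or 8.0 card falls back to JIT-compiled 5.0 PTX, which is the "works but possibly suboptimal" case, while a build with only 8.6 and no PTX leaves an 8.0 card unsupported.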
SomeoneSerge (matrix works sometimes) | Personally, I'd prefer that there was only one arch in there by default.
Alt: default to cudaCapabilities = [ "5.0" ] (with PTX); probably cuda works by default for everyone; it's maybe mysteriously slow and people don't know to override the config
Alt: default to cudaCapabilities = [ "8.6" ]; works for DL users, throws an error for lower grade cards, maybe people find out they need to override the config, but maybe they don't and end up feeling overwhelmed with nixpkgs
| 20:43:13 |
SomeoneSerge (matrix works sometimes) | Smaller closures 🙏 | 20:43:44 |
connor (he/him) | 8.6 wouldn’t work for people using an A100 though, right? Since that’s only 8.0 | 22:41:22 |
SomeoneSerge (matrix works sometimes) | Uh, right | 22:52:46 |
| 1 Mar 2023 |
Kevin Mittman (UTC-7) | FYI, CUDA 12.1.0 is now available https://developer.download.nvidia.com/compute/cuda/redist/redistrib_12.1.0.json | 00:22:22 |
Kevin Mittman (UTC-7) | which presents some questions
- how are these software releases typically noticed - organically? when something depends on it?
- what sort of translation would be hypothetically needed to convert this or a similar manifest into something automation could pick up
- normally how are changes discovered such as added, removed, renamed, or split components? if this was in the json would that be helpful?
| 01:26:05 |
hexa | well, here we go again | 03:12:02 |
hexa | numba | 03:12:03 |
hexa | it just can't keep up | 03:12:13 |
hexa | no release in 5 months to address numpy lag | 03:12:34 |
hexa | we probably need https://github.com/numba/numba/pull/8691 | 03:15:19 |
hexa | but it is 20 commits big | 03:15:23 |
hexa | and has failing tests | 03:15:37 |
SomeoneSerge (matrix works sometimes) | Error in fail: Repository command failed
No library found under: /nix/store/iq5b0g0md105dsw3zkw07lasaghsy0wq-cudatoolkit-12.0.1-merged/lib/libcupti.so.12.0
ERROR: /build/source/WORKSPACE:15:14: fetching cuda_configure rule //external:local_config_cuda: Traceback (most recent call last)
❯ ldd /nix/store/iq5b0g0md105dsw3zkw07lasaghsy0wq-cudatoolkit-12.0.1-merged/lib/libcupti.so.12.0
ldd: /nix/store/iq5b0g0md105dsw3zkw07lasaghsy0wq-cudatoolkit-12.0.1-merged/lib/libcupti.so.12.0: No such file or directory
| 03:16:29 |
SomeoneSerge (matrix works sometimes) | I'ma sleep | 03:16:37 |
hexa | FRidh, Someone S: please tell me if https://github.com/NixOS/nixpkgs/pull/218929 is acceptable | 03:48:17 |
SomeoneSerge (matrix works sometimes) | In reply to @hexa:lossy.network we probably need https://github.com/numba/numba/pull/8691
Presently CI will fail due to the lack of NumPy 1.24 packages in Anaconda, but this should be resolved in time. ... I think all review comments are now addressed and this is just waiting on package availability so as to complete testing.
What the actual fuck, they won't just relax pinned versions?
| 12:05:56 |
connor (he/him) | In reply to @justbrowsing:matrix.org
which presents some questions
- how are these software releases typically noticed - organically? when something depends on it?
- what sort of translation would be hypothetically needed to convert this or a similar manifest into something automation could pick up
- normally how are changes discovered such as added, removed, renamed, or split components? if this was in the json would that be helpful?
Word of warning, I haven't been working on this too long but here's what I've noticed. All, please feel free to correct if any of this is wrong.
how are these software releases typically noticed - organically? when something depends on it?
I believe it's organic -- outside of https://github.com/ryantm/nixpkgs-update I'm not sure what exists in the way of automation for updating nixpkgs.
what sort of translation would be hypothetically needed to convert this or a similar manifest into something automation could pick up
Given the switch to redistributables and Nix parsing JSON we grab from NVIDIA's website, maybe it'd be easier to automate it now! I could imagine a script which curls their index to see what the latest is, and adds a new copy of the JSON files we need if there's a newer version.
normally how are changes discovered such as added, removed, renamed, or split components? if this was in the json would that be helpful?
I think we find out about it either from breakages or reading the release notes/JSON.
Kevin Mittman I like these questions -- can I add them to the docs tracking issue I have here? https://github.com/NixOS/nixpkgs/issues/217780
| 14:29:48 |
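The "script which curls their index" idea could look roughly like the Python sketch below. It assumes the directory listing mentions manifests by file name in the `redistrib_X.Y.Z.json` form seen in the 12.1.0 link; the sample listing, the helper names, and the `known` list are made up for illustration. `fetch_listing` is defined but not called, so the example runs offline.

```python
import re
import urllib.request
from typing import Iterable, List, Tuple

REDIST_URL = "https://developer.download.nvidia.com/compute/cuda/redist/"
MANIFEST_RE = re.compile(r"redistrib_(\d+)\.(\d+)\.(\d+)\.json")


def fetch_listing(url: str = REDIST_URL) -> str:
    """Download the directory listing as text (not called in this sketch)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def manifest_versions(listing: str) -> List[Tuple[int, ...]]:
    """Return the sorted, de-duplicated manifest versions named in a listing."""
    found = {tuple(int(part) for part in m.groups()) for m in MANIFEST_RE.finditer(listing)}
    return sorted(found)


def missing_versions(listing: str, known: Iterable[str]) -> List[str]:
    """Versions present upstream that are not in the locally known set."""
    have = {tuple(int(part) for part in version.split(".")) for version in known}
    return [".".join(map(str, v)) for v in manifest_versions(listing) if v not in have]


# Made-up listing excerpt; the real page layout is an assumption.
sample = """
<a href="redistrib_11.8.0.json">redistrib_11.8.0.json</a>
<a href="redistrib_12.0.1.json">redistrib_12.0.1.json</a>
<a href="redistrib_12.1.0.json">redistrib_12.1.0.json</a>
"""
print(missing_versions(sample, known=["11.8.0", "12.0.1"]))
```

Comparing versions as integer tuples rather than strings keeps something like 12.10.0 sorted after 12.9.0.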
connor (he/him) | In reply to @ss:someonex.net
Personally, I'd prefer that there was only one arch in there by default.
Alt: default to cudaCapabilities = [ "5.0" ] (with PTX); probably cuda works by default for everyone; it's maybe mysteriously slow and people don't know to override the config
Alt: default to cudaCapabilities = [ "8.6" ]; works for DL users, throws an error for lower grade cards, maybe people find out they need to override the config, but maybe they don't and end up feeling overwhelmed with nixpkgs
With respect to specifying capabilities -- some packages (glares at magma) don't support every capability: https://github.com/NixOS/nixpkgs/pull/217410/files#diff-1e7812b78446dca0e64c4bb933e9255fca6f6539ec1ecd610edf1285a3fcbc56R55
Like, the hell? Skipping 8.6, 8.7, and 8.9? Packages like that make me think we need some way for the CUDA configuration and the derivation to interact and agree on a list of architectures to build for. Imagine setting cudaCapabilities = [ "8.6" ]; and getting failures because, while some packages support Ampere, they don't support that capability by name. That'd be annoying, right?
Or maybe that's desirable? Maybe it would be more annoying if the package used the greatest common factor (say, 8.0 when 8.6 was requested) and no errors were thrown? Is that misleading the user?
| 14:44:12 |
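The two behaviours being weighed here (fail loudly, or quietly fall back to an older capability) can be written down as a small Python sketch. Everything in it is hypothetical: `resolve`, the `strict` flag, and the example `supported` list are illustrations of the idea, not how nixpkgs or magma handle it.

```python
from typing import List, Sequence, Tuple


def parse(cap: str) -> Tuple[int, int]:
    """Turn a capability string such as "8.6" into (8, 6)."""
    major, minor = cap.split(".")
    return int(major), int(minor)


def resolve(requested: Sequence[str], supported: Sequence[str], strict: bool = True) -> List[str]:
    """Pick the capabilities a package would actually build for.

    strict=True:  raise if a requested capability is not supported by name.
    strict=False: substitute the greatest supported capability that has the
                  same major version and is no newer than the requested one.
    """
    chosen: List[str] = []
    for cap in requested:
        if cap in supported:
            pick = cap
        else:
            want = parse(cap)
            older = [s for s in supported if parse(s)[0] == want[0] and parse(s) <= want]
            if strict or not older:
                raise ValueError(f"capability {cap} is not supported by this package")
            pick = max(older, key=parse)
        if pick not in chosen:
            chosen.append(pick)
    return chosen


# Hypothetical package that knows 8.0 but not 8.6.
supported = ["7.0", "7.5", "8.0"]
print(resolve(["8.6"], supported, strict=False))
```

With `strict=True` the same call raises `ValueError`, which is the "annoying but honest" option; with `strict=False` the user asked for 8.6 and silently gets 8.0.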
SomeoneSerge (matrix works sometimes) | https://github.com/NixOS/nixpkgs/pull/218265
tf builds ✅ | 15:58:31 |
Kevin Mittman (UTC-7) | In reply to @connorbaker:matrix.org
Kevin Mittman I like these questions -- can I add them to the docs tracking issue I have here? https://github.com/NixOS/nixpkgs/issues/217780
I was thinking more along the lines of an issue filed here https://github.com/NVIDIA/build-system-archive-import-examples | 16:10:04 |
SomeoneSerge (matrix works sometimes) | In reply to @justbrowsing:matrix.org
FYI, CUDA 12.1.0 is now available https://developer.download.nvidia.com/compute/cuda/redist/redistrib_12.1.0.json
In principle, as long as the directory listing at https://developer.download.nvidia.com/compute/cuda/redist/ works, we could work out our own automation. That being said, a single machine-readable entrypoint (a stable location with a JSON that lists URIs to all releases, or an RSS feed) would be more convenient | 16:14:00 |
SomeoneSerge (matrix works sometimes) |
Given the switch to redistributables and Nix parsing JSON we grab from NVIDIA's website, maybe it'd be easier to automate it now! I could imagine a script which curls their index to see what the latest is, and adds a new copy of the JSON files we need if there's a newer version.
Been thinking that too. Not even automated PRs, but we could improve visibility by just making a GitHub workflow that runs on a cron schedule, checks the published JSONs, and publishes a status report on GitHub Pages
| 16:16:31 |
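The report step of such a workflow might boil down to diffing two manifests. The Python sketch below assumes each component is a top-level key whose value is an object with a "version" field, and that scalar top-level keys are release metadata; that layout is an assumption, and the two tiny manifests and their version numbers are invented placeholders.

```python
from typing import Any, Dict, List


def components(manifest: Dict[str, Any]) -> Dict[str, str]:
    """Map component name -> version, skipping scalar metadata keys."""
    return {
        name: str(entry.get("version", "?"))
        for name, entry in manifest.items()
        if isinstance(entry, dict)
    }


def report(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List added, removed, and changed components between two manifests."""
    before, after = components(old), components(new)
    lines: List[str] = []
    for name in sorted(after.keys() - before.keys()):
        lines.append(f"added: {name} {after[name]}")
    for name in sorted(before.keys() - after.keys()):
        lines.append(f"removed: {name}")
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            lines.append(f"changed: {name} {before[name]} -> {after[name]}")
    return lines


# Invented placeholder manifests; real ones carry many more fields.
old = {"release_date": "2000-01-01", "cuda_nvprof": {"version": "1.0"}}
new = {
    "release_date": "2000-02-01",
    "cuda_nvprof": {"version": "2.0"},
    "cuda_profiler_api": {"version": "2.0"},
}
for line in report(old, new):
    print(line)
```

A rename or split only shows up here as a removal plus an addition, so a header moving between packages would still need a human, or extra data in the JSON, to be recognised as such.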
SomeoneSerge (matrix works sometimes) | I thought it would be a good idea to run nixpkgs-review with cudaSupport = true, but that just opened a hellgate: https://gist.github.com/SomeoneSerge/6cc00b41964e43f725fc12046778532d#file-218265-log-L23 | 22:38:52 |