| 9 Nov 2022 |
SomeoneSerge (back on matrix) | Samuel Ainsworth had started building a GPU-enabled testsuite, but I'm not aware of anything like that running on a regular basis. We do have builds running that populate cachix: https://app.cachix.org/cache/cuda-maintainers These builds don't require (and can't use) a GPU. We do, as a matter of fact, need more CPU compute just for the builds, or better yet a more sustainable option like a dedicated buildbox maintained by some committed organisation. We haven't got around to making any calls yet
And there's https://github.com/samuela/nixpkgs-upkeep which, I think, uses github workers
| 20:48:41 |
SomeoneSerge (back on matrix) | GPU-enabled sanity checks (even for things like "torch being able to allocate a tensor on a device") would be very nice, but somebody has to implement them | 20:53:49 |
breakds | Thanks for the introduction to how the builds and the packages are being tested. I agree that it might make sense to add some GPU-enabled tests, such as training a very small model and verifying that it succeeds. | 21:03:07 |
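Such a check could live as a small Nix derivation gated to GPU-capable builders. Below is a minimal sketch, assuming a builder that exposes an NVIDIA GPU to the build sandbox and advertises a "cuda" system feature; the derivation name and feature name are illustrative, not an existing nixpkgs API.

```nix
# Hypothetical GPU smoke test, roughly the "allocate a tensor on the device"
# check mentioned above. Not an existing nixpkgs derivation.
{ pkgs ? import <nixpkgs> { config = { allowUnfree = true; cudaSupport = true; }; } }:

pkgs.runCommand "torch-cuda-smoke-test"
  {
    nativeBuildInputs = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];
    # Only schedule this on builders that advertise GPU access; the "cuda"
    # feature name is a convention the builder admin would have to configure.
    requiredSystemFeatures = [ "cuda" ];
  }
  ''
    python3 -c '
    import torch
    assert torch.cuda.is_available(), "no CUDA device visible to the build"
    x = torch.ones(8, device="cuda")
    assert float(x.sum()) == 8.0
    '
    touch $out
  ''
```

Whether the GPU is actually visible inside the build sandbox depends on the builder's configuration (e.g. relaxed sandboxing or exposed /dev/nvidia* devices), which is exactly the infrastructure work being discussed here.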
breakds | This also looks like something I am capable of implementing, and I could set up a dedicated machine to run it. I'd probably need some guidance on which repo to put the tests into and how to get started. | 21:04:24 |
SomeoneSerge (back on matrix) | ...so far all of the "CI" infrastructure for CUDA has been implemented outside nixpkgs | 21:05:00 |
breakds | Do you mean this is not ideal? | 21:05:39 |
SomeoneSerge (back on matrix) | I rather meant you can start with a personal repo. Maybe just something regularly running tests of your choice against current master and reporting the results in some form.
Later there'll be a question of where this testsuite could be integrated (e.g. ideally we'd make a separate "stable cuda channel", which commits would only reach after passing cuda-related tests, but that too is work to be done).
I think samuela's nixpkgs-upkeep is worth a look: it automatically opens issues about newly broken packages. | 21:14:00 |
SomeoneSerge (back on matrix) | Now that I think of it, though, I'm not quite sure GPU-enabled tests are the most needed investment (compute-wise or work-wise) | 21:16:25 |
breakds | Sounds good. I am currently maintaining a separate repo of a few machine learning packages: https://github.com/nixvital/ml-pkgs Will take a look at nixpkgs-upkeep and see how to add a similar CI | 21:16:30 |
SomeoneSerge (back on matrix) | GPU checks are something we don't have, yes. But even the checkPhases we have now are sufficient most of the time to tell when a change breaks stuff. The issue is that we can't prevent that change from reaching the channels we consume as users. That could be solved by a separate branch (even if outside nixpkgs), into which changes would be merged automatically as long as our checks pass | 21:24:27 |
breakds | I see. But isn't it the CI's job to prevent such offending commits from being merged, by running the tests? Is it because running all the tests takes a lot of time due to the size of nixpkgs? | 21:34:30 |
SomeoneSerge (back on matrix) | hexa: you seem to be online 🙃 could you merge this? | 21:35:12 |
SomeoneSerge (back on matrix) | It's because nixpkgs' CI doesn't take CUDA packages into account (they're unfree and the trust model for nixpkgs' CI workers and public cache is such that they're... not expected to be running potentially malicious blackbox binaries). So we run a parallel CI. This ensures our packages are available prebuilt in cachix, and we can spot failures in the dashboard and handle them after the fact | 21:38:23 |
SomeoneSerge (back on matrix) | Again, I think a separate branch with an auto-merge would be entirely reasonable | 21:39:26 |
SomeoneSerge (back on matrix) | as a compromise | 21:39:31 |
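One way such an auto-merge gate could be expressed is as an aggregate Nix expression over CUDA-sensitive packages, evaluated against current master on a schedule. This is only a sketch; the file name and the package selection are made up for illustration.

```nix
# ci/cuda-gate.nix (hypothetical): packages that must build, and pass their
# existing checkPhases, on nixpkgs master before the "stable CUDA" branch is
# fast-forwarded to that commit.
{ nixpkgs ? <nixpkgs> }:

let
  pkgs = import nixpkgs {
    config = {
      allowUnfree = true;  # CUDA packages are unfree, hence not built by Hydra
      cudaSupport = true;
    };
  };
in
{
  inherit (pkgs.cudaPackages) cudatoolkit cudnn;
  inherit (pkgs.python3Packages) torch torchvision;
  inherit (pkgs) magma;
}
```

A cron job (or a scheduled GitHub Actions workflow, as nixpkgs-upkeep uses) would run nix-build against this file and advance the branch only when the build exits successfully.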
breakds | I understand now. Thanks for taking the time to explain the situation, SomeoneSerge! | 21:40:07 |
SomeoneSerge (back on matrix) | No problem, welcome to the club :) | 21:40:59 |
hexa (UTC+1) | done | 21:42:54 |
| 10 Nov 2022 |
Domen Kožar | In reply to SomeoneSerge (re: the parallel CI for unfree CUDA packages) ah, that reminds me, I need to finish https://github.com/cachix/nixpkgs-unfree-redistributable | 04:50:55 |
eahlberg | is it possible to see what versions are in the cuda-maintainers cachix cache? I'm trying to get CUDA up and running on an AWS EC2 instance with a Tesla K80, but some things are compiling, which as far as I understand will take a really long time | 09:24:51 |
SomeoneSerge (back on matrix) | The only interface I know for checking the actual contents is nix path-info -r /nix/store/... --store ... But a heuristic to avoid rebuilds is to pick a recently finished build that doesn't have too many failures: its outputs will have been cached, and likely not have been garbage-collected by cachix | 09:31:12 |
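For completeness, consuming that cache on a machine like the EC2 instance above boils down to a NixOS snippet along these lines. This is a sketch; the public-key placeholder should be replaced with the real key shown on the cache's cachix page.

```nix
# Hypothetical NixOS configuration fragment: pull prebuilt CUDA packages from
# the cuda-maintainers cachix cache instead of compiling them locally.
{
  nix.settings = {
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    trusted-public-keys = [
      # Placeholder: copy the real key from
      # https://app.cachix.org/cache/cuda-maintainers
      "cuda-maintainers.cachix.org-1:<public-key>"
    ];
  };
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;  # must match how the cached packages were built
  };
}
```

On a non-NixOS machine, `cachix use cuda-maintainers` achieves roughly the same effect by editing the nix configuration for you.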
SomeoneSerge (back on matrix) | What's it going to be? I guess I should look up cachix-deploy-lib | 09:38:30 |
Domen Kožar | It will build all unfree packages for macOS/Linux | 10:40:27 |
SomeoneSerge (back on matrix) | ...dedicating hardware for that long-term?
A cache, separate from cuda? | 10:43:56 |
eahlberg | In reply to SomeoneSerge (re: checking the cache contents with nix path-info) Cool, thanks! Managed to get it up and running using the cache | 14:47:22 |
Domen Kožar | In reply to SomeoneSerge (re: dedicating hardware long-term / a cache separate from cuda) I think so. Any concerns? | 14:51:40 |