!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

323 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers



Sender Message Time
9 Nov 2022
@breakds:matrix.org breakds: And this also looks like something I am capable of implementing, and I can set up a dedicated machine to run it. I'd probably need some guidance on which repo to put it in and how to get started. 21:04:24
@ss:someonex.net SomeoneSerge (matrix works sometimes): ...so far all of the "CI" infrastructure for CUDA has been implemented outside nixpkgs 21:05:00
@breakds:matrix.org breakds: Do you mean this is not ideal? 21:05:39
@ss:someonex.net SomeoneSerge (matrix works sometimes): I rather meant you can start with a personal repo. Maybe just something regularly running tests of your choice against current master and reporting the results in some form. Later there'll be a question of where this testsuite could be integrated (e.g. ideally we'd make a separate "stable cuda channel", which commits would only reach after passing cuda-related tests, but that too is work to be done). I think samuela's nixpkgs-upkeep is worth a look: it automatically opens issues about newly broken packages. 21:14:00
@ss:someonex.net SomeoneSerge (matrix works sometimes): Now that I think of it, though, I'm not quite sure GPU-enabled tests are the most needed investment (compute-wise or work-wise) 21:16:25
@breakds:matrix.org breakds: Sounds good. I am currently maintaining a separate repo with a few machine learning packages (https://github.com/nixvital/ml-pkgs). I will take a look at nixpkgs-upkeep and see how to add a similar CI. 21:16:30
@ss:someonex.net SomeoneSerge (matrix works sometimes): GPU checks are something we don't have, yes. But even the checkPhases we have now are sufficient most of the time to tell when a change breaks stuff. The issue is that we can't prevent that change from reaching the channels we consume as users. That could be solved by a separate branch (even if outside nixpkgs), which would be merged into automatically as long as our checks pass 21:24:27
@breakds:matrix.org breakds: I see. But isn't it the CI's job to prevent such offending commits from being merged, by running the tests? Is it because running all the tests takes a lot of time due to the size of nixpkgs? 21:34:30
@ss:someonex.net SomeoneSerge (matrix works sometimes): hexa: you seem to be online 🙃 could you merge this? 21:35:12
@ss:someonex.net SomeoneSerge (matrix works sometimes): It's because nixpkgs' CI doesn't take CUDA packages into account (they're unfree and the trust model for nixpkgs' CI workers and public cache is such that they're... not expected to be running potentially malicious blackbox binaries). So we run a parallel CI. This ensures our packages are available prebuilt in cachix, and we can spot failures in the dashboard and handle them after the fact 21:38:23
@ss:someonex.net SomeoneSerge (matrix works sometimes): Again, I think a separate branch with an auto-merge would be entirely reasonable 21:39:26
@ss:someonex.net SomeoneSerge (matrix works sometimes): as a compromise 21:39:31
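A minimal sketch of the auto-merge idea discussed above, assuming the parallel CI can report per-commit check results. The `Commit` type, the check names, and `latest_passing` are hypothetical and do not correspond to any existing nixpkgs tooling. The function only picks the newest commit of master whose checks all passed, which is the revision an automatically updated branch could be fast-forwarded to.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class Commit:
    """A commit on master plus the results of the (hypothetical) CUDA checks."""
    rev: str
    checks: Mapping[str, bool]  # check name -> True if it passed


def latest_passing(commits: Sequence[Commit]) -> Optional[str]:
    """Return the newest revision whose checks all passed.

    `commits` is ordered oldest to newest. A commit with no recorded
    checks is treated as not yet verified and is skipped.
    """
    for commit in reversed(commits):
        if commit.checks and all(commit.checks.values()):
            return commit.rev
    return None
```

A scheduled job could call `latest_passing` on recent history and move the separate branch to the returned revision, leaving it where it is when the result is `None`.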
@breakds:matrix.org breakds: I understand now. Thanks for taking the time to explain the situation, Someone S! 21:40:07
@ss:someonex.net SomeoneSerge (matrix works sometimes): No problem, welcome to the club :) 21:40:59
@hexa:lossy.network hexa: done 21:42:54
@breakds:matrix.org breakds set their display name to breakds. 21:51:01
10 Nov 2022
@domenkozar:matrix.org Domen Kožar:
In reply to @ss:someonex.net
It's because nixpkgs' CI doesn't take CUDA packages into account (they're unfree and the trust model for nixpkgs' CI workers and public cache is such that they're... not expected to be running potentially malicious blackbox binaries). So we run a parallel CI. This ensures our packages are available prebuilt in cachix, and we can spot failures in the dashboard and handle them after the fact
Ah, that reminds me, I need to finish https://github.com/cachix/nixpkgs-unfree-redistributable
04:50:55
@eahlberg:matrix.org eahlberg: Is it possible to see which versions are in the cuda-maintainers cachix cache? I'm trying to get CUDA up and running on an AWS EC2 instance with a Tesla K80, but some things are compiling, which as far as I understand will take a really long time 09:24:51
@ss:someonex.net SomeoneSerge (matrix works sometimes): The only interface I know for checking the actual contents is nix path-info -r /nix/store/... --store ...
But a heuristic to avoid rebuilds is to pick a recently finished build that doesn't have too many failures: its outputs will have been cached, and likely will not have been garbage-collected by cachix yet
09:31:12
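As a rough illustration of checking whether a single store path is present in a binary cache, here is a sketch that relies on the usual Nix binary cache layout, where the metadata for a path is served at <cache URL>/<hash>.narinfo. The function names are hypothetical, and it is an assumption that the cuda-maintainers cache is reachable at https://cuda-maintainers.cachix.org and answers plain GET requests this way. It does not follow dependencies the way nix path-info -r does.

```python
import urllib.error
import urllib.request

STORE_PREFIX = "/nix/store/"


def narinfo_url(store_path: str, cache_url: str) -> str:
    """Build the .narinfo URL for a store path in a given binary cache."""
    if not store_path.startswith(STORE_PREFIX):
        raise ValueError(f"not a store path: {store_path!r}")
    # "/nix/store/<hash>-<name>/sub/dir" -> "<hash>-<name>"
    base = store_path[len(STORE_PREFIX):].split("/", 1)[0]
    digest, sep, _name = base.partition("-")
    if len(digest) != 32 or not sep:
        raise ValueError(f"unexpected store path layout: {store_path!r}")
    return f"{cache_url.rstrip('/')}/{digest}.narinfo"


def is_cached(store_path: str, cache_url: str, timeout: float = 10.0) -> bool:
    """Return True if the cache serves a narinfo for this store path."""
    url = narinfo_url(store_path, cache_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the cache does not know this path
            return False
        raise
```

Calling `is_cached(path, "https://cuda-maintainers.cachix.org")` on the output path of a package you are about to build would indicate whether that one path can be substituted. Its dependencies would each need the same check.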
@ss:someonex.net SomeoneSerge (matrix works sometimes): What's it going to be?
I guess I should look up cachix-deploy-lib
09:38:30
@domenkozar:matrix.org Domen Kožar: It will build all unfree packages for macOS/Linux 10:40:27
@ss:someonex.net SomeoneSerge (matrix works sometimes): ...dedicating hardware for that long-term? A cache, separate from cuda? 10:43:56
@eahlberg:matrix.org eahlberg:
In reply to @ss:someonex.net
The only interface I know for checking the actual contents is nix path-info -r /nix/store/... --store ...
But a heuristic to avoid rebuilds is to pick a recently finished build that doesn't have too many failures: its outputs will have been cached, and likely will not have been garbage-collected by cachix yet
Cool, thanks! Managed to get it up and running using the cache
14:47:22
@domenkozar:matrix.org Domen Kožar:
In reply to @ss:someonex.net
...dedicating hardware for that long-term?
A cache, separate from cuda?
I think so. Any concerns?
14:51:40
@ss:someonex.net SomeoneSerge (matrix works sometimes): Nooo no no, this sounds positively great! I'm only trying to understand what effective total resources the community is going to have access to after you get this running, and how that could be used to change the nixpkgs ML/scicomp user experience 😆 15:09:13
11 Nov 2022
@tpw_rules:matrix.org tpw_rules: Someone S: can you explain a little more why you want to add a platform attribute to numba? it seems most other python packages don't have it and there's nothing in there that really limits it to a particular platform that isn't already limited by some dependency 01:20:58
@tpw_rules:matrix.org tpw_rules: just wondering if this is some new standard i'm not aware of really 01:23:54



Room Version: 9