!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

306 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
60 Servers



Sender | Message | Time
9 Nov 2022
@ss:someonex.netSomeoneSerge (matrix works sometimes)Now thinking of it though, I'm not quite sure if gpu-enabled tests are the most needed investment (compute-wise, or work-wise)21:16:25
@breakds:matrix.orgbreakdsSounds good. I am currently maintaining a separate repo of a few machine learning packages: https://github.com/nixvital/ml-pkgs Will take a look at nixpkgs-upkeep and see how to add a CI similarly21:16:30
@ss:someonex.netSomeoneSerge (matrix works sometimes) GPU checks are something we don't have, yes. But even the checkPhases we have now are sufficient most of the time to tell when a change breaks stuff. The issue is that we can't prevent that change from reaching the channels we consume as users. That could be solved by a separate branch (even if outside nixpkgs), which would be merged into automatically as long as our checks pass 21:24:27
@breakds:matrix.orgbreakds I see. But isn't it the CI's job to prevent such offending commits from being merged, by running the tests? Is it because running all the tests takes a lot of time due to the size of nixpkgs? 21:34:30
@ss:someonex.netSomeoneSerge (matrix works sometimes) hexa: you seem to be online 🙃 could you merge this? 21:35:12
@ss:someonex.netSomeoneSerge (matrix works sometimes)It's because nixpkgs' CI doesn't take CUDA packages into account (they're unfree and the trust model for nixpkgs' CI workers and public cache is such that they're... not expected to be running potentially malicious blackbox binaries). So we run a parallel CI. This ensures our packages are available prebuilt in cachix, and we can spot failures in the dashboard and handle them after the fact21:38:23
@ss:someonex.netSomeoneSerge (matrix works sometimes)Again, I think a separate branch with an auto-merge would be entirely reasonable21:39:26
@ss:someonex.netSomeoneSerge (matrix works sometimes)as a compromise21:39:31
@breakds:matrix.orgbreakds I understand now. Thanks for taking time to explain the situation Someone S ! 21:40:07
@ss:someonex.netSomeoneSerge (matrix works sometimes)No problem, welcome into the club:)21:40:59
@hexa:lossy.networkhexa done 21:42:54
@breakds:matrix.orgbreakds set their display name to breakds.21:51:01
10 Nov 2022
@domenkozar:matrix.orgDomen Kožar
In reply to @ss:someonex.net
It's because nixpkgs' CI doesn't take CUDA packages into account (they're unfree and the trust model for nixpkgs' CI workers and public cache is such that they're... not expected to be running potentially malicious blackbox binaries). So we run a parallel CI. This ensures our packages are available prebuilt in cachix, and we can spot failures in the dashboard and handle them after the fact
ah that reminds me, I need to finish https://github.com/cachix/nixpkgs-unfree-redistributable
04:50:55
@eahlberg:matrix.orgeahlberg * is it possible to see what versions are in the cuda-maintainers cachix cache? I'm trying to get CUDA up and running on an AWS ec2 instance with Tesla K80 but some things are compiling which as far as I understand will take a really long time09:27:54
@ss:someonex.netSomeoneSerge (matrix works sometimes) The only interface I know for checking the actual contents is nix path-info -r /nix/store/... --store ...
But a heuristic to avoid rebuilds is to pick a recently finished build that doesn't have too many failures: it will have been cached, and likely not have been garbage-collected by cachix
09:31:12
@ss:someonex.netSomeoneSerge (matrix works sometimes) What's it going to be?
I guess I should look up cachix-deploy-lib
09:38:30
@domenkozar:matrix.orgDomen Kožar* It will build all unfree packages for macos/linux10:40:35
@ss:someonex.netSomeoneSerge (matrix works sometimes)...dedicating hardware for that long-term? A cache, separate from cuda?10:43:56
@eahlberg:matrix.orgeahlberg
In reply to @ss:someonex.net
The only interface I know for checking the actual contents is nix path-info -r /nix/store/... --store ...
But a heuristic to avoid rebuilds is to pick a recently finished build that doesn't have too many failures: it will have been cached, and likely not have been garbage-collected by cachix
Cool, thanks! Managed to get it up and running using the cache
14:47:22
@domenkozar:matrix.orgDomen Kožar
In reply to @ss:someonex.net
...dedicating hardware for that long-term?
A cache, separate from cuda?
I think so. Any concerns?
14:51:40
@ss:someonex.netSomeoneSerge (matrix works sometimes)Nooo no no, this sounds positively great! I'm only trying to understand what effective total resources the community is going to have access to after you get this running, and how that could be used to change the nixpkgs ML/scicomp user experience 😆15:09:13
11 Nov 2022
@tpw_rules:matrix.orgtpw_rules Someone S: can you explain a little more why you want to add a platform attribute to numba? it seems most other python packages don't have it and there's nothing in there that really limits it to a particular platform that isn't already limited by some dependency 01:20:58
@tpw_rules:matrix.orgtpw_rulesjust wondering if this is some new standard i'm not aware of really01:23:54
16 Nov 2022
@omlet:matrix.orgomlet joined the room.20:34:15
23 Nov 2022
@skainswo:matrix.orgSamuel Ainsworth
In reply to @breakds:matrix.org
A separate question, as I read from https://discourse.nixos.org/t/announcing-the-nixos-cuda-maintainers-team-and-a-call-for-maintainers/18074 , is x86_64-linux computing cycle still needed for github actions? I have a spare RTX 3080 not attached to any machine at this moment, not sure what is the best way to make it useful to the project. Shall I build a machine to run github actions?
Hi breakds! Thanks so much for your generosity! Yes, we'd love to find a way to make it useful somehow. Oddly enough our primary bottleneck atm is CPU cycles. It turns out that running CIs (nixpkgs-upkeep, Someone S 's build CI) takes a bit of compute. I'm finding that build jobs frequently hit the free-tier 6 hour limit on GH Actions, so finding an x86_64-linux machine that could be home to a GH Actions runner would be great!
09:11:05
@skainswo:matrix.orgSamuel AinsworthI created https://github.com/samuela/nixpkgs-upkeep with the goal of ultimately running GPU-enabled tests in CI, but honestly we've had so many issues just keeping the packages building at all (only CPU required to build), that GPU-enabled tests haven't been the issue so far09:12:34
@skainswo:matrix.orgSamuel Ainsworth and of course Someone S may have other needs as well 09:13:19
@skainswo:matrix.orgSamuel Ainsworth on the caching side, Domen Kožar has been incredibly helpful with the cuda-maintainers cache on Cachix (https://app.cachix.org/cache/cuda-maintainers#pull)! A lot of folks have him to thank for their setups working smoothly! 09:15:04



Room Version: 9