!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

288 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



15 Feb 2025
@ss:someonex.net SomeoneSerge (back on matrix): But with the intention of building a scalable, persistent-ish cache later 11:32:56
@zowoq:matrix.org zowoq: Running another dedicated machine (e.g. a cheapish Hetzner box) with just hydra and harmonia for the cache isn't a problem, and spot instances for builders wouldn't be much maintenance overhead. Scope, funding, etc. are questions that I'll leave for @zimbatm. 12:27:05
@ruroruro:matrix.org ruro: Not 100% sure, what do you mean? The problematic CUDA versions are 11.0-11.3. 11.4 and later have individual redistributables. 10.x are already deprecated/removed from nixpkgs, so no need to worry about those. 18:22:06
@connorbaker:matrix.org connor (he/him)
In reply to @ruroruro:matrix.org
Not 100% sure, what do you mean? The problematic cuda versions are 11.0-11.3. 11.4 and later have individual redistributables. 10.x are already deprecated/removed from nixpkgs, so no need to worry about those.
For what it’s worth, 11.x will be removed prior to 25.05, from what I remember
19:34:06
@indoor_squirrel:matrix.org indoor_squirrel joined the room. 19:48:53
@indoor_squirrel:matrix.org indoor_squirrel
In reply to @ss:someonex.net

Great, thanks! So, the question essentially is: we (I think I say this with the CUDA team hat on) can and want to scale up the CI for testing CUDA-enabled packages, both by increasing the number of builders and by adding GPU instances. We want to build many more variants of nixpkgs for different architectures, and, ideally, run tests across a matrix of co-processor devices. For obvious reasons, we want the infra to be owned by a transparent, community-aligned entity with diversified funding, like nix-community. If this were to be done in nix-community, we'd have to do some work upfront, like ensuring sufficiently smart scheduling to not jam other jobsets hosted by the organization. This would also probably increase the maintenance workload. This also raises questions about the scope of nix-community: how niche and how large of a project is acceptable? E.g. if nix-community does some GPU hardware stuff, why not also mobile, IoT, FPGA? Etc. If we decide that buying physical hardware is in-scope, we need to figure out how to manage the inventory and how to manage trust.

Despite all that, I do like the notion of doing this through nix-community, because it's already up and running, it has a compatible structure, and it's already a recognized name.

To this end, would public financial support for this project allow for anonymous contributions?
19:52:24
@ss:someonex.net SomeoneSerge (back on matrix): I think OpenCollective allows anonymous donations (in the sense of hiding the source from the public, but not from the project owners) 19:53:21
@indoor_squirrel:matrix.org indoor_squirrel
In reply to @zowoq:matrix.org
I imagine that the amount and size of builds would make cachix or other cloud storage unfeasible. If it was only a dev cache could probably get away with just serving it off the CI master, if it was a proper public cache with non-trivial amount of users probably want a dedicated machine (or more than one if you want to keep the cache around for a while).
All substituter implementations today are centralized, right? It'd be neat to build one on top of IPFS, or similar, for example. In my head, no one host would control any one store path and something like Shamir secret sharing could support k of n hosts being able to sign a store path.
19:54:44
@indoor_squirrel:matrix.org indoor_squirrel
In reply to @ss:someonex.net
I think OpenCollective allows anonymous donations (in the sense of hiding the source from the public, but not from the project owners)
This is potentially dangerous. Is there any effort that you know of for nix-community or you guys (CUDA dudes) to work toward a funding solution which obscures this information from even OpenCollective, much less the project owners?
19:56:34
@ss:someonex.net SomeoneSerge (back on matrix): In what sense would this be dangerous for nix-community specifically? 19:57:47
@indoor_squirrel:matrix.org indoor_squirrel: For the contributor, not nix-community. 19:58:01
@indoor_squirrel:matrix.org indoor_squirrel: Although, if they spend money that can be proven to have been derived from criminal action then it could endanger nix-community, too. Not thinking about that, though. 19:58:29
@ss:someonex.net SomeoneSerge (back on matrix): Regarding the IPFS message. That's not so much a substituter problem (an nginx instance with a local directory is more or less a substituter) as it is a build queue/hydra problem: afaiu we have no better solution for managing keys than to copy outputs from all builders to a dedicated signer machine, nor do we have ready tools yet for doing the kind of cross-validation you describe. IPFS is more relevant for egress/distribution, which I'd say is a secondary concern right now. In either case, this is more of a research question, whereas with the CI we want to address an immediate need with available tools 20:10:41
@indoor_squirrel:matrix.org indoor_squirrel
In reply to @ss:someonex.net
Regarding the IPFS message. That's not so much a substituter problem (an nginx instance with a local directory is more or less a substituter) as it is a build queue/hydra problem: afaiu we have no better solution for managing keys than to copy outputs from all builders to a dedicated signer machine, nor do we have ready tools yet for doing the kind of cross-validation you describe. IPFS is more relevant for egress/distribution, which I'd say is a secondary concern right now. In either case, this is more of a research question, whereas with the CI we want to address an immediate need with available tools
Sorry, didn't mean to imply that this idea solves any problem that you had, here. So definitely off-topic.
20:12:43
@indoor_squirrel:matrix.org indoor_squirrel: Just a drive-by thought. 20:12:52
@indoor_squirrel:matrix.org indoor_squirrel: Understood, you're looking for a compute solution, not a substituter solution. 20:13:13
@ss:someonex.net SomeoneSerge (back on matrix): I just realized I'm not sure about the exact legal form of nix-community and might have bullshitted at the last meeting about it being registered as a non-profit (if it were, where?). But the safe assumption is that all of the infra in the project is hosted in one of the jurisdictions with... strict AML/KYC regulations (in the case of the Foundation, the Netherlands), so a platform like OpenCollective would be forced to know, and idk if receiving donations through some kind of mixer service would make an organization liable. Interesting questions, but out of scope for the specific effort 20:29:55
@ss:someonex.net SomeoneSerge (back on matrix): * Regarding the IPFS message. That's not so much a substituter problem (an nginx instance with a local directory is more or less a substituter) as it is a build queue/hydra problem: afaiu we have no better solution for managing keys than to copy outputs from all builders to a dedicated signer machine, nor do we have ready tools yet for doing the kind of cross-validation you describe. IPFS is more relevant for egress/distribution, which I'd say is a secondary concern right now. In either case, this is more of a research question, whereas with the CI we want to address an immediate need with available tools. EDIT: There are several Discourse posts about using torrents and IPFS for substitution with extended discussion, including some posts disentangling the separate problems targeted by such suggestions. They're probably worth finding 20:31:56
@ss:someonex.net SomeoneSerge (back on matrix)

Thank you, this addresses the concern that we might be "imposing unwanted work and expectations on another team" at least for now.

Jonas Chevalier, in addition to the questions about the scope, I wonder what "Nix Community Projects" is, legally? In particular, thinking of physical hardware, can it "own things"?

20:42:11
16 Feb 2025
@connorbaker:matrix.org connor (he/him): SomeoneSerge (UTC+U[-12,12]): any tips for getting CMake PRs reviewed? https://gitlab.kitware.com/cmake/cmake/-/merge_requests/10354 01:12:36
@ruroruro:matrix.org ruro: Question: how soon is "prior to 25.05"? Do you mean "after the 25.05 branch-off" or "some time between now and 25.05 release" or what? In other words, when will CUDA 11.x be removed from master/unstable? 03:51:57
@connorbaker:matrix.org connor (he/him): My understanding from the GCC maintainers for Nixpkgs is that they'll remove it as part of their GCC 11 removal, which will happen prior to the 25.05 branch-off, so that it is not available in the 25.05 release 05:16:56
@ss:someonex.net SomeoneSerge (back on matrix): Well, they already pinged https://gitlab.kitware.com/robertmaynard 17:19:24
@aidalgol:matrix.org aidalgol: Any idea what's going on here? https://github.com/NixOS/nixpkgs/issues/382169 18:56:23
@aidalgol:matrix.org aidalgol: I know I added TensorRT to nixpkgs, but I've fallen behind with some of the derivation refactoring. 18:56:59
@connorbaker:matrix.org connor (he/him)
Yeah, it just needs a cleanup
Honestly need to move to the 10.x series, which doesn’t have a login preventing download; I’ve got those packaged out of tree
Speaking of, in the out-of-tree stuff I’ve been working on, I’m postponing the CUDA 12.8 update and am instead going to be making a PR to merge it back into Nixpkgs
19:43:16



Room Version: 9