
NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



SenderMessageTime
31 Jan 2025
@connorbaker:matrix.orgconnor (he/him) I am so tired
But now have setup hooks which can catch common issues like the order of different CUDA directories in a run path
Or fail a build if NVCC’s host compiler leaks out (which can/will cause glibc/glibcxx symbol issues)
Even beyond that
I implemented utility functions for arrays and associative arrays in bash because I got tired of repeating myself in different hooks
And then when I got tired of repeating myself in tests for those functions and hooks, I made a utility derivation to make testing for expected arrays and associative arrays easier
06:55:57
@connorbaker:matrix.orgconnor (he/him) It’s still a mess but it’s on this branch if anyone is curious https://github.com/ConnorBaker/cuda-packages/compare/main...fix/runpath-order-matters-and-cuda-compat-gets-clobbered 06:56:57
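As a rough illustration of the kind of run path ordering check described above, here is a minimal Python sketch. It is not the code from the linked branch (those hooks are written in bash). The function names are hypothetical, and the rule it encodes (an entry containing `cuda_compat` should appear before the driver directory `/run/opengl-driver`) is only an assumption about what "order matters" means here.

```python
def first_index(entries, needle):
    """Return the index of the first run path entry containing needle, or None."""
    for i, entry in enumerate(entries):
        if needle in entry:
            return i
    return None


def runpath_order_ok(runpath, must_come_first, must_come_later):
    """Check that an entry matching must_come_first precedes one matching must_come_later.

    runpath is a colon-separated string, as printed by tools such as
    `patchelf --print-rpath`. If either kind of entry is absent, there is
    nothing to order, so the check passes.
    """
    entries = [e for e in runpath.split(":") if e]
    first = first_index(entries, must_come_first)
    later = first_index(entries, must_come_later)
    if first is None or later is None:
        return True
    return first < later


# Hypothetical store path, for illustration only.
good = "/nix/store/aaa-cuda_compat/compat:/run/opengl-driver/lib"
bad = "/run/opengl-driver/lib:/nix/store/aaa-cuda_compat/compat"
print(runpath_order_ok(good, "cuda_compat", "/run/opengl-driver"))
print(runpath_order_ok(bad, "cuda_compat", "/run/opengl-driver"))
```

A real hook would read the run path from each ELF file in the output and fail the build on a violation; the sketch only covers the comparison itself.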
@ss:someonex.netSomeoneSerge (back on matrix)

Let's schedule a call to discuss how to go forward with stdenv support, setup-hooks, wrappers, config.cudaSupport, localSystem/pkgsCross, clang support, and out-of-tree override-ability of manifests and toolkit components https://crab.fit/cudapackages-ng-781527

CC connor (he/him) (UTC-7), sielicki, Samuel Ainsworth, and anyone interested

10:49:09
@zimbatm:numtide.comJonas Chevalier changed their display name from Jonas Chevalier to Jonas Chevalier (FOSDEM).19:11:42
@ss:someonex.netSomeoneSerge (back on matrix) changed their display name from SomeoneSerge to SomeoneSerge (Bruxelles).19:35:00
1 Feb 2025
@matthewcroughan:defenestrate.itmatthewcroughan changed their display name from matthewcroughan (already in Brussels) to matthewcroughan (FOSDEM).09:41:01
2 Feb 2025
@pederbs:pvv.ntnu.nopbsds changed their display name from pbsds to pbsds (FOSDEM).16:04:38
@osmanfbayram:matrix.orgosbm joined the room.18:23:02
3 Feb 2025
@zimbatm:numtide.comJonas Chevalier changed their display name from Jonas Chevalier (FOSDEM) to Jonas Chevalier.08:23:12
@matthewcroughan:defenestrate.itmatthewcroughan changed their display name from matthewcroughan (FOSDEM) to matthewcroughan.09:11:41
@ss:someonex.netSomeoneSerge (back on matrix) changed their display name from SomeoneSerge (Bruxelles) to SomeoneSerge (Gand St. Pieters).13:40:41
@ruroruro:matrix.orgruro connor (he/him) (UTC-7): SomeoneSerge (Gand St. Pieters) sorry to keep annoying you guys, but could you respond to the above question? Alternatively, "we are too busy right now, you'll have to figure it out on your own" is also an acceptable answer))) 14:37:45
@ss:someonex.netSomeoneSerge (back on matrix) Sorry, I forgot to reply. I'll write before tomorrow 14:41:33
@ruroruro:matrix.orgruro❤️14:42:10
@pederbs:pvv.ntnu.nopbsds changed their display name from pbsds (FOSDEM) to pbsds.16:25:49
@ss:someonex.netSomeoneSerge (back on matrix)

Starting with the last question: great to hear! As one tool to help with discovery, we have a task board at https://github.com/orgs/NixOS/projects/27/views/1. We haven't been properly maintaining it for the last year, I see many invalidated/outdated items there, but some of the roadmap is still relevant, and the "New" column is automatically populated with all issues and PRs tagged "cuda".

If you're willing to do chores, fixing issues like "nvidia's bash wrapper for nsys-ui assumes things are installed into weird locations and is completely broken" and "a package has changed the way they hardcode /usr/lib or dlopen stuff and now fails to find libcuda.so again" would be very useful. These are relatively straightforward, but they involve an amount of debugging and suffering, and usually get ignored for a long time because it's just demotivating.

If you're interested in architectural issues, then note the message about the upcoming meeting and the proposed subjects, check out the "Roadmap" column, and Connor's out-of-tree cuda-packages

22:27:33
@ss:someonex.netSomeoneSerge (back on matrix) OK one more item for the agenda: I think it would be good for us together to walk through the backlog, discuss issues' context, status, and present relevance, and sort/close outdated issues, maybe merge well-reviewed but forgotten PRs. I'd guess this is easily half an hour or more, should we schedule this separately? 22:30:38
@ss:someonex.netSomeoneSerge (back on matrix) * OK one more item for the agenda: I think it would be good for us together to walk through the backlog, discuss issues' contexts, statuses, and present relevance, and sort/close outdated issues, maybe merge well-reviewed but forgotten PRs. I'd guess this is easily half an hour or more, should we schedule this separately? 22:30:50
@ss:someonex.netSomeoneSerge (back on matrix)

> I was thinking that we might be able to improve the situation by making general nixpkgs contributors more aware of this situation. For example, it would be pretty cool if we could track the nix-community hydra builds on status.nixos.org and on zh.fail (and try to include CUDA packages in future ZHF events).

You're certainly right, and the idea of promoting cuda fixes during ZHF has in fact been around. By the same token, an ofborg-like integration (an external service that would test a PR on push and report failures in non-default instantiations or in out-of-tree tests) is maybe even necessary to ensure the stability of hw-accelerated packages. Even when a contributor doesn't care about cuda, it's important that they are informed about unintended consequences of their changes, so they can ping the interested parties as needed.

22:41:27
@ss:someonex.netSomeoneSerge (back on matrix)

> For example, https://hydra.nix-community.org/jobset/nixpkgs/cuda/evals has a bunch of Eval Errors and build errors and I don't remember the last time that it was green (although some of those eval errors might not be indicative of actually broken packages).

My javascript might be broken, but I only see build failures. Some errors under cudaPackages. seem actually familiar, e.g. the cutensor error was fixed at least once already and is recurring... that's to be fixed somewhere around manifest.nix in the current implementation

22:44:52
@ss:someonex.netSomeoneSerge (back on matrix) Ah I see, thanks for the link. I guess "this is unfree" errors are kind of expected, you'll see them in the official hydra too? This does sound ridiculous though, I agree 22:49:09
@ss:someonex.netSomeoneSerge (back on matrix)

> Also, I understand why hydra.nixos.org doesn't build CUDA packages, but do you think that we could enable evaluation-only checks for CUDA packages on nixpkgs github PRs and then build those PRs using the nix-community builders and report the results on the PR?

Ah great, you already said as much. Yes, we definitely can. You may have seen issues about unfree stuff open and closed in the Ofborg repo, so the notion isn't entirely new. I know for sure there are several interested parties, and this would be incredibly useful; maybe we can discuss it in more detail on the call. This issue needs to be approached with some care from the community perspective though, because it's desirable for nixpkgs and nix-community to still stay independent/disentangled: legally, socially, architecturally...

22:54:24
@ss:someonex.netSomeoneSerge (back on matrix) Is it still broken? I might have interest in fixing it, I'll check tmr 22:56:15
@ss:someonex.netSomeoneSerge (back on matrix) * Is it still broken? The attribute page shows latest eval grey. I might have interest in fixing it, I'll check tmr 22:57:50
@hexa:lossy.networkhexa
       > ERROR: noBrokenSymlinks: the symlink /nix/store/fqx2dv9vp1k0f00imgqshy6d92ykcw5d-python3.12-kaleido-0.2.1/lib/python3.12/site-packages/kaleido/executable/etc/fonts/fonts.conf points to a missing target /nix/store/2ynwbywyaxk4wgl8d3xrb9dzkdzv241x-fontconfig-2.15.0-bin/etc/fonts/fonts.conf
       > ERROR: noBrokenSymlinks: found 1 dangling symlinks and 0 reflexive symlinks
       For full logs, run 'nix log /nix/store/f7whd4p85k8b7bd8sx2bnp5jpmzycbkx-python3.12-kaleido-0.2.1.drv'.
error: 1 dependencies of derivation '/nix/store/gdg8kgy8ry0gjhpv2dws072wajkjk69l-python3.12-plotly-5.24.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/pfxw0g0npwr091cr7ks7012jl8qsg
23:00:36
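For readers unfamiliar with the failure in the log above, the idea behind a dangling-symlink check can be sketched in a few lines of Python. This is an illustrative approximation only, not the actual noBrokenSymlinks hook (which is a bash setup hook), and the function name is hypothetical. It walks a directory tree and reports symlinks whose targets do not exist; it does not separately count reflexive symlinks as the hook's message does.

```python
import os


def find_dangling_symlinks(root):
    """Return a sorted list of symlinks under root whose targets do not exist."""
    dangling = []
    # os.walk does not follow symlinks by default, so each link is visited once.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.path.exists follows the link, so it is False for a missing target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

Calling `find_dangling_symlinks` on a build output directory would list links such as the `fonts.conf` one reported in the log, assuming the target store path is absent.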
@hexa:lossy.networkhexa yes, but now also due to that new hook connor (he/him) (UTC-7) introduced 23:00:45
@hexa:lossy.networkhexa * SomeoneSerge (Gand St. Pieters): yes, but now also due to that new hook connor (he/him) (UTC-7) introduced 23:00:50
@glepage:matrix.orgGaétan LepageRedacted or Malformed Event23:02:31
@ss:someonex.netSomeoneSerge (back on matrix) Huh? 23:10:48
@ss:someonex.netSomeoneSerge (back on matrix) * Huh? I thought it was out of tree 23:11:17



Room Version: 9