!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



Sender Message Time
3 Feb 2025
@ss:someonex.netSomeoneSerge (back on matrix) OK one more item for the agenda: I think it would be good for us together to walk through the backlog, discuss issues' context, status, and present relevance, and sort/close outdated issues, maybe merge well-reviewed but forgotten PRs. I'd guess this is easily half an hour or more, should we schedule this separately? 22:30:38
@ss:someonex.netSomeoneSerge (back on matrix) * OK one more item for the agenda: I think it would be good for us together to walk through the backlog, discuss issues' contexts, statuses, and present relevance, and sort/close outdated issues, maybe merge well-reviewed but forgotten PRs. I'd guess this is easily half an hour or more, should we schedule this separately? 22:30:50
@ss:someonex.netSomeoneSerge (back on matrix)

I was thinking that we might be able to improve the situation by making general nixpkgs contributors more aware of this situation. For example, it would be pretty cool if we could track the nix-community hydra builds on status.nixos.org and on zh.fail (and try to include CUDA packages in future ZHF events).

You're certainly right, and the idea of promoting cuda fixes during ZHF has in fact been around. By the same token, an ofborg-like integration, an external service that would test a PR on-push and post a report on failures on non-default instantiations or involving out-of-tree tests is maybe even necessary to ensure stability of hw-accelerated packages. Even when a contributor doesn't care about cuda, it's important they are informed about unintended consequences of their changes, and maybe can ping the interested parties as needed

22:41:27
@ss:someonex.netSomeoneSerge (back on matrix)

For example, https://hydra.nix-community.org/jobset/nixpkgs/cuda/evals has a bunch of Eval Errors and build errors and I don't remember the last time that it was green (although some of those eval errors might not be indicative of actually broken packages).

My javascript might be broken, but I only see build failures. Some errors under cudaPackages. seem actually familiar, e.g. the cutensor error was fixed at least once already and is recurring... that's to be fixed somewhere around manifest.nix in the current implementation

22:44:52
@ss:someonex.netSomeoneSerge (back on matrix) Ah I see, thanks for the link. I guess "this is unfree" errors are kind of expected, you'll see them in the official hydra too? This does sound ridiculous though, I agree 22:49:09
@ss:someonex.netSomeoneSerge (back on matrix)

Also, I understand why hydra.nixos.org doesn't build CUDA packages, but do you think that we could enable evaluation-only checks for CUDA packages on nixpkgs github PRs and then build those PRs using the nix-community builders and report the results on the PR?

Ah great, you already said as much. Yes, we definitely can. You may have seen issues about unfree stuff open and closed in the Ofborg repo, so the notion isn't entirely new. I know for sure there are several interested parties, and this would be incredibly useful, maybe we can discuss in more detail on the call. This issue needs to be approached with some care from the community perspective though, because it's desirable for nixpkgs and nix-community to still stay independent/disentangled: legally, socially, architecturally...

22:54:24
@ss:someonex.netSomeoneSerge (back on matrix) Is it still broken? I might have interest in fixing it, I'll check tmr 22:56:15
@ss:someonex.netSomeoneSerge (back on matrix) * Is it still broken? The attribute page shows latest eval grey. I might have interest in fixing it, I'll check tmr 22:57:50
@hexa:lossy.networkhexa (UTC+1)
       > ERROR: noBrokenSymlinks: the symlink /nix/store/fqx2dv9vp1k0f00imgqshy6d92ykcw5d-python3.12-kaleido-0.2.1/lib/python3.12/site-packages/kaleido/executable/etc/fonts/fonts.conf points to a missing target /nix/store/2ynwbywyaxk4wgl8d3xrb9dzkdzv241x-fontconfig-2.15.0-bin/etc/fonts/fonts.conf
       > ERROR: noBrokenSymlinks: found 1 dangling symlinks and 0 reflexive symlinks
       For full logs, run 'nix log /nix/store/f7whd4p85k8b7bd8sx2bnp5jpmzycbkx-python3.12-kaleido-0.2.1.drv'.
error: 1 dependencies of derivation '/nix/store/gdg8kgy8ry0gjhpv2dws072wajkjk69l-python3.12-plotly-5.24.1.drv' failed to build
error: 1 dependencies of derivation '/nix/store/pfxw0g0npwr091cr7ks7012jl8qsg
23:00:36
@hexa:lossy.networkhexa (UTC+1) yes, but now also due to that new hook connor (he/him) (UTC-7) introduced 23:00:45
@hexa:lossy.networkhexa (UTC+1) * SomeoneSerge (Gand St. Pieters): yes, but now also due to that new hook connor (he/him) (UTC-7) introduced 23:00:50
@glepage:matrix.orgGaétan Lepage Redacted or Malformed Event 23:02:31
@ss:someonex.netSomeoneSerge (back on matrix) Huh? 23:10:48
@ss:someonex.netSomeoneSerge (back on matrix) * Huh? I thought it was out of tree 23:11:17
@ss:someonex.netSomeoneSerge (back on matrix) * Huh? I thought it was still out of tree 23:11:21
@hexa:lossy.networkhexa (UTC+1) it is in the current staging cycle 23:58:18
@hexa:lossy.networkhexa (UTC+1) and it is causing all sorts of havoc 23:58:27
@hexa:lossy.networkhexa (UTC+1) because no fixes were prepared in advance 23:58:38
4 Feb 2025
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) I apologize, I should have made sure at least stdenvs for {x86_64,aarch64}-{darwin,linux} were working
In the case of rebuild-the-world PRs, I'm not sure what is considered sufficient in terms of testing -- is there particular language (or Nix expressions) you'd want to see in the contributing guidelines?
05:07:55
@ss:someonex.netSomeoneSerge (back on matrix) Oh nice! 09:46:24
@ss:someonex.netSomeoneSerge (back on matrix) Found it 09:46:27
@afdee1c:matrix.orgafdee1c joined the room. 20:04:25
@zopieux:matrix.zopi.euzopieux I'm a bit confused about the nix-community cache and I wonder if my system/config is to blame, or the cache.
This cuda build succeeded and depends on nixpkgs d0bb46, which I pinned in my flake. Upon building though, nix decides to build fmpz8s6hy3yr8z6kb84h6498437d0xj1-ollama-0.5.7.drv even though per the above and per https://nix-community.cachix.org/8njyvpf8sxh8k61zvnv13cymn7szv63c.narinfo, the output should be available in the cache. nix.conf confirms the substituter/pubkey is present. Am I missing something?
21:03:07
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) I haven't followed too closely how the community Hydra builds cudaPackages, but the first thing that comes to my mind is that perhaps your config.cudaCapabilities doesn't match the default set selected when it's unset? (By default, we build for a number of capabilities.) 22:28:12
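
For reference, a minimal sketch of how the capability set can be pinned on the consumer side. It assumes nixpkgs is imported through `<nixpkgs>` rather than through the flake mentioned above, and the value "8.6" is purely illustrative. If a consumer sets `cudaCapabilities` while the builder leaves it unset, the derivation hashes would be expected to differ, so nothing could be substituted from the cache.

```nix
# Sketch only: import nixpkgs with CUDA enabled and an explicit capability list.
# "8.6" is an illustrative value, not one taken from the discussion.
import <nixpkgs> {
  config = {
    allowUnfree = true;            # CUDA packages are unfree
    cudaSupport = true;            # build packages with CUDA enabled
    cudaCapabilities = [ "8.6" ];  # leaving this unset selects the default set
  };
}
```

Removing the `cudaCapabilities` line should give the default set that the message above refers to.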
5 Feb 2025
@ss:someonex.netSomeoneSerge (back on matrix) No, fmpz8s6hy3yr8z6kb84h6498437d0xj1 is the deriver of 8njyvpf8sxh8k61zvnv13cymn7szv63c.narinfo, and cachix knows about it, and if you try wget https://nix-community.cachix.org/nar/77cdeba29947cabc7c880df05f41200e0f9ee711651931ac046e64fcfd52f48b.nar.zst it actually begins to download the blob so it's not GCed 00:38:31
@ss:someonex.netSomeoneSerge (back on matrix) So Nix does behave weird here? zopieux what's it say if --builders "" -j0? 00:39:30
@ruroruro:matrix.orgruro Regarding "broken javascript", you can see eval failures on the "Evaluation Errors" tab of the nixpkgs:cuda jobset (but not on individual job pages, not sure why). 21:37:53
6 Feb 2025
@ruroruro:matrix.orgruro Here's my attempt at a PR that fixes a bunch of Eval Errors that I've outlined earlier: https://github.com/NixOS/nixpkgs/pull/379768 06:38:45
@ruroruro:matrix.orgruro

Regarding the remaining 21 eval errors:

  • 13x cuda-samples depends on freeimage which is technically insecure. I am not 100% sure if we should just filter all of the cuda-samples packages using the new filterPackagePredicates mechanism or if it might be better to do

    freeimage.overrideAttrs { meta.knownVulnerabilities = [ ]; }
    

    since cuda-samples isn't really "production-facing" anyway, so the users should only really care that the samples compile. The security risk of distributing sample code that technically has some vulnerabilities should be minimal.

  • colmap also depends on freeimage, this issue should be probably raised upstream

  • boxx and bpycv haven't been updated upstream in the last 11 months and they seem to not support any of the python versions that are currently supported in nixpkgs. So we should probably check in with the nixpkgs maintainer and remove these packages if they aren't required by something important.

  • pixinsight is (and always was) unfree, but it is explicitly listed in release-cuda.nix for some reason. Should it be removed?

  • tts because it depends on a -bin version of pytorch for some reason, which is "unfree" (bsd3 issl unfreeRedistributable). Is it possible to make it depend on a non-binary version of pytorch or should it be removed from release-cuda.nix?

  • mxnet is "actually" broken since #173463

  • truecrack-cuda is "actually" broken since #167250

  • pymc depends on pytensor, which is "actually" broken since #373239

06:56:17
@ruroruro:matrix.orgruro *

Regarding the remaining 21 eval errors:

  • 13x cuda-samples depends on freeimage which is technically insecure. I am not 100% sure if we should just filter all of the cuda-samples packages using the new filterPackagePredicates mechanism or if it might be better to do

    freeimage.overrideAttrs { meta.knownVulnerabilities = [ ]; }

    specifically for cuda-samples. Since cuda-samples isn't really "production-facing" anyway, the users should only really care that the samples compile. The security risk of distributing sample code that technically has some vulnerabilities should be minimal.

  • colmap also depends on freeimage, this issue should be probably raised upstream
  • boxx and bpycv haven't been updated upstream in the last 11 months and they seem to not support any of the python versions that are currently supported in nixpkgs. So we should probably check in with the nixpkgs maintainer and remove these packages if they aren't required by something important.
  • pixinsight is (and always was) unfree, but it is explicitly listed in release-cuda.nix for some reason. Should it be removed?
  • tts because it depends on a -bin version of pytorch for some reason, which is "unfree" (bsd3 issl unfreeRedistributable). Is it possible to make it depend on a non-binary version of pytorch or should it be removed from release-cuda.nix?
  • mxnet is "actually" broken since #173463
  • truecrack-cuda is "actually" broken since #167250
  • pymc depends on pytensor, which is "actually" broken since #373239
06:57:34
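
A possible variation on the freeimage override suggested above, written as an overlay sketch. The `//` merge that `overrideAttrs` performs is shallow, so passing `meta.knownVulnerabilities` in a plain attribute set would likely replace the whole `meta` (license, platforms, and so on) rather than only the vulnerability list. The attribute name `freeimageForSamples` is hypothetical, and how cuda-samples would be pointed at it is left open here.

```nix
# Sketch of an overlay; `freeimageForSamples` is a hypothetical attribute name.
final: prev: {
  freeimageForSamples = prev.freeimage.overrideAttrs (old: {
    # Merge into the existing meta so that license, platforms, etc. are kept;
    # only the vulnerability list is cleared.
    meta = (old.meta or { }) // {
      knownVulnerabilities = [ ];
    };
  });
}
```

Keeping the relaxed copy under a separate name would leave the top-level `freeimage` marked insecure for every other consumer.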



Room Version: 9