| 7 Jan 2025 |
SomeoneSerge (back on matrix) | Hi, sorry if I missed anything, my homeserver was offline for a week | 21:58:59 |
| Jitsi widget removed by SomeoneSerge (back on matrix) | 21:59:20 |
| tomberek joined the room. | 22:40:13 |
| 9 Jan 2025 |
hexa | connor (he/him) (UTC-7): can we get your result of https://github.com/microsoft/onnxruntime/issues/22855 ported into nixpkgs as well? | 02:08:59 |
hexa | https://github.com/NixOS/nixpkgs/pull/364362 is still stuck on that issue | 02:09:12 |
hexa | oh, I see you're in the LA area | 02:12:06 |
hexa | take care! | 02:12:08 |
connor (he/him) | Check out how I packaged https://github.com/ConnorBaker/cuda-packages/tree/main/cuda-packages/common/cudnn-frontend and https://github.com/ConnorBaker/cuda-packages/blob/main/cuda-packages/common/onnxruntime/package.nix | 05:56:56 |
connor (he/him) | In reply to @hexa:lossy.network "oh, I see you're in the LA area": I’m about an hour south so I’m fine apart from poor air quality | 15:01:45 |
hexa | torch and tensorboard on python3.13 https://github.com/NixOS/nixpkgs/pull/372406 | 15:33:12 |
Gaétan Lepage | Thanks for handling that! | 15:37:46 |
| 11 Jan 2025 |
hexa | can I get some guidance on cudaPackages? | 02:24:33 |
hexa | https://github.com/NixOS/nixpkgs/pull/364362 | 02:24:37 |
hexa |
error: evaluation aborted with the following error message: 'lib.customisation.callPackageWith: Function called without required argument "cuda_cccl" at /nix/store/sj06sl54sc0rxlj0g52pd3pq3glyvpak-source/pkgs/development/cuda-modules/cudnn-frontend/default.nix:5'
| 02:24:46 |
hexa | ok, guess I need to pick them out of cudaPackages | 02:33:01 |
SomeoneSerge (back on matrix) | Odd, I thought we nuked old cuda releases that didn't have cuda_cccl? | 11:47:40 |
hexa | oh yeah, that would explain why I could build it just fine, but eval would fail 🙂 | 16:09:02 |
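A minimal sketch of the workaround hexa describes, using a hypothetical consumer package rather than the real cudnn-frontend expression: take the whole cudaPackages set as a callPackage argument and pick components out of it, so evaluation does not abort on CUDA releases that never shipped cuda_cccl.

{ lib, stdenv, cudaPackages }:

# Hypothetical package: instead of declaring cuda_cccl as a top-level
# callPackage argument (which aborts evaluation for CUDA package sets that
# lack it), reference it through cudaPackages and guard the lookup.
stdenv.mkDerivation {
  pname = "example-cuda-consumer";
  version = "0-unstable-2025-01-11";
  dontUnpack = true; # sketch only, no sources

  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];
  buildInputs =
    [ cudaPackages.cuda_cudart ]
    ++ lib.optionals (cudaPackages ? cuda_cccl) [ cudaPackages.cuda_cccl ];
}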
| 13 Jan 2025 |
ruro | Hi, everyone. In my experience, CUDA packages and CUDA-enabled packages (with cudaSupport = true;) are quite often broken in nixpkgs (more often than other packages).
For example, https://hydra.nix-community.org/jobset/nixpkgs/cuda/evals has a bunch of Eval Errors and build errors and I don't remember the last time that it was green (although some of those eval errors might not be indicative of actually broken packages).
I was thinking that we might be able to improve the situation by making general nixpkgs contributors more aware of this situation. For example, it would be pretty cool if we could track the nix-community hydra builds on status.nixos.org and on zh.fail (and try to include CUDA packages in future ZHF events).
Also, I understand why hydra.nixos.org doesn't build CUDA packages, but do you think that we could enable evaluation-only checks for CUDA packages on nixpkgs github PRs and then build those PRs using the nix-community builders and report the results on the PR?
Finally, I was wondering if there is some canonical place to track/discuss CUDA-specific build failures in nixpkgs?
| 14:28:08 |
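For readers unfamiliar with the knob ruro refers to, a minimal sketch of what a CUDA-enabled nixpkgs instantiation looks like (torch is only an example consumer here):

let
  # cudaSupport = true flips CUDA variants on across the package set;
  # allowUnfree is required because the CUDA toolkit itself is unfree.
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;
      cudaSupport = true;
    };
  };
in
# Example consumer: this now evaluates to the CUDA-enabled torch build.
pkgs.python3Packages.torch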
ruro | It feels like fixing CUDA packages currently is "treadmill work" where some package gets fixed only for something else to get broken by unrelated changes in nixpkgs (because the current automation on github PRs doesn't check CUDA-enabled versions of packages). | 14:35:04 |
ruro | On a related note, does anybody know what's up with the Eval Errors on the nix-community cuda job?
Looking at this tab https://hydra.nix-community.org/jobset/nixpkgs/cuda#tabs-errors, it seems that most eval errors are caused by the fact that release-cuda.nix just tries to indiscriminately build everything in cudaPackages.
- 119 of those errors are caused by the fact that TensorRT has an "unfree" license (and it can't be built in CI anyway, because you need to manually download the tarballs)
- 101 errors are due to some of the cudnn_* packages being marked as broken (I think that for most of them, it's because "CUDA version is too new" or "CUDA version is too old")
- 15 errors are due to the cuda-samples, colmap and deepin.image-editor packages depending on freeimage-unstable-2021-11-01, which is marked as insecure
- 13 errors are due to some of the nvidia_driver packages being marked as broken (because "Package is not supported; use drivers from linuxPackages")
- 4 errors are due to some of the nsight_systems packages being marked as broken (because "CUDA too old (<11.8)")
- 4 errors are due to CUDA 10 being removed from nixpkgs, but still being accessible via cudaPackages_10{,_0,_1,_2}
- 2 errors are due to boxx and bpycv depending on fn, which doesn't work with python>=3.11
And the following individual packages are also failing to eval:
- pixinsight because it is "unfree"
- mxnet because it was marked as broken in #173463
- truecrack-cuda because it was marked as broken in #167250
- pymc because it depends on pytensor, which was marked as broken in #373239
- Theano because it was removed from nixpkgs, but is still accessible (and listed in release-cuda.nix)
- tts because it depends on a -bin version of pytorch for some reason, which is "unfree" (bsd3 issl unfreeRedistributable)
Interestingly, the "Evaluation Errors" tabs of individual job runs are empty for some reason.
 | 15:26:04 |
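A small illustration of why "marked as broken" and "unfree" surface as evaluation errors on Hydra rather than as build failures; tensorrt stands in here for any package in the list above:

let
  # Defaults: neither broken nor unfree packages may even be evaluated.
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = false;
      allowBroken = false;
      cudaSupport = true;
    };
  };
in
# Forcing the derivation throws during evaluation ("has an unfree license,
# refusing to evaluate"), so the Hydra job fails before any build is queued.
pkgs.cudaPackages.tensorrt.outPath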
connor (he/him) | Excellent questions and ideas!
You’re correct that CUDA packages are broken more often than other packages — we don’t get the benefit of any of the tooling Nixpkgs CI provides.
I’m all for Hydra integrations to make that information more visible, but I fear it won’t prevent breakages, since those are usually caught when maintainers run nixpkgs-review, and they don’t typically enable CUDA support from what I can tell.
I think evaluation-only checks are very reasonable for upstream.
I’m not sure what would be involved in getting the community builders to build CUDA packages (especially given some of their licenses and the fact that CUDA packages tend to be resource intensive to build).
We do have a CUDA project board on GitHub, but nothing solely for build failures IIRC.
I haven’t had the chance to follow what’s happening with the Nix community Hydra :( | 15:50:21 |