NixOS CUDA | 317 Members | |
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda | 63 Servers |
| Message | Time | |
|---|---|---|
| 19 Apr 2024 | ||
| But aside from that I see no reason to discard multiple outputs | 16:09:20 | |
| The hook isn't the cause of the issue | 16:09:34 | |
| We've created an issue in two steps: we disabled propagatedBuildInputs, and as a consequence we had to hard-code outputSpecified; we did it instead of solving the underlying issue - the circular dependency | 16:12:02 | |
| 20 Apr 2024 | ||
| https://github.com/systemd/systemd/pull/32234 | 10:35:34 | |
| Want that | 10:35:37 | |
| (for pytorch and crap) | 10:36:14 | |
| Does anyone have cuda working with pytorch on nixos? | 14:23:04 | |
| If so, can I see your config? | 14:23:15 | |
| I've got pytorch installed with cuda 11.8 and cuDNN 8.9.1 and a 4090. Config is here: https://github.com/DieracDelta/flakes/blob/flakes/hosts/hw/desktop.nix + https://github.com/DieracDelta/flakes/blob/flakes/hosts/hw/shared.nix | 14:40:24 | |
| When I try to use nixos-23.11 to bring in pytorch with either python10 or python11, I get (different) errors about missing files. I can provide those in a bit. My flake is here: https://github.com/DieracDelta/detypstify/blob/master/flake.nix#L139 . The issue is that when I run pytorch code, python segfaults with a bus error on some AVX instruction, even though my GPU is recognized and torch seems to build with cuda (after I brought in the nixified-ai flake). I can also grab that error in a bit. I'm wondering if folks have successfully been using pytorch, and which corresponding versions of cuda, cuDNN, etc. are being used. | 14:44:56 | |
In reply to @justinrestivo:matrix.org The nixos config is just
The nixpkgs config for building pytorch is | 14:57:26 | |
| It's pretty flexible, you can build against pretty much whichever version | 14:59:26 | |
| I do the above and it works for me, justinrestivo | 15:20:26 | |
| In reply to @trexd:matrix.org: Any chance you could run the following and see if it core dumps for you? Would help me identify if it's my system setup or something else. It'll just bring in the python tooling and run a self-contained script | 17:27:41 | |
| In reply to @trexd:matrix.org: This also works for me until a pytorch call coredumps. | 17:31:26 | |
| The backtrace is kinda strange, but coming from libcudnn | 17:31:42 | |
| * In reply to @trexd:matrix.org: This also works for me until a pytorch call coredumps. This could also be an issue with my code. | 17:34:11 | |
| * Any chance you could run the following and see if it core dumps for you? Would help me identify if it's my system setup or something else. It'll just bring in the python tooling and run a self-contained script. Cleanup can be done by removing the directory and garbage-collecting | 17:38:18 | |
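The script referred to in the messages above is not preserved in this log. As a stand-in, a minimal sketch of a self-contained check of this kind might look like the following. It assumes PyTorch is importable as `torch`; the function name `cuda_smoke_test` and the tensor sizes are hypothetical and not taken from the discussion. The convolution is included because the backtrace mentioned above pointed at libcudnn.

```python
"""Minimal CUDA smoke test for PyTorch (illustrative sketch, not the original script)."""


def cuda_smoke_test():
    """Return a dict describing whether small GPU operations complete."""
    report = {
        "torch_imported": False,
        "cuda_available": False,
        "matmul_ok": False,
        "conv_ok": False,
    }
    try:
        import torch  # assumption: PyTorch is on the Python path
    except ImportError:
        return report
    report["torch_imported"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cudnn_version"] = torch.backends.cudnn.version()

    if not torch.cuda.is_available():
        return report
    report["cuda_available"] = True
    report["device_name"] = torch.cuda.get_device_name(0)

    # Plain matrix multiply on the GPU.
    a = torch.randn(256, 256, device="cuda")
    b = torch.randn(256, 256, device="cuda")
    c = a @ b
    torch.cuda.synchronize()
    report["matmul_ok"] = bool(torch.isfinite(c).all().item())

    # Small convolution, which normally goes through cuDNN when it is enabled.
    x = torch.randn(1, 3, 32, 32, device="cuda")
    w = torch.randn(8, 3, 3, 3, device="cuda")
    y = torch.nn.functional.conv2d(x, w)
    torch.cuda.synchronize()
    report["conv_ok"] = bool(torch.isfinite(y).all().item())
    return report


if __name__ == "__main__":
    for key, value in cuda_smoke_test().items():
        print(f"{key}: {value}")
```

A hard crash such as a bus error would terminate the interpreter before the report is printed, so the last operation attempted gives a rough hint about where the failure occurs.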
| Hi folks, from the readme it seems like the only recommended flake usage is to use the inputs nixpkgs-unstable & SomeoneSerge/nixpkgs-unfree; however, I'm not sure I understand how this plays with the flake lock being updated every once in a while by the bot. I'd like to make sure I'm using a nixpkgs & unfree rev that has the packages I care about in the cachix cache. Is there some magic I'm missing? | 18:18:50 | |
| In reply to @justinrestivo:matrix.org: Turns out connecting a display to the card and setting hardware.nvidia.nvidiaPersistenced=true and hardware.nvidia.modesetting.enable=true changes the error from a bus error to a python mismatched size error 🙏. No idea why that helped, though 🫠. Thank you everyone who helped. | 18:59:32 | |
| In reply to @zopieux:matrix.zopi.eu: Hi. Frankly, it's been broken again for a while, one might as well just give up. This needs an overhaul. We should reconsider the alternatives again: a community buildbot, garnix, hydra... Nonetheless, you can pick a job in https://hercules-ci.com/github/SomeoneSerge/nixpkgs-cuda-ci/jobs/10700 and see which nixpkgs the corresponding flake.lock refers to... | 23:09:25 | |
| Thanks! I ended up binary-searching for the most recent nixpkgs-unstable rev among the flake.lock commits that results in a cache-only build :) I agree this sucks though. Is there anything the community can do to help with this overhaul? Is Hercules the problem or something else? | 23:12:48 | |
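The approach described in the last two messages (reading the pinned nixpkgs revision out of a flake.lock, then binary-searching for the newest revision that is fully cached) can be sketched roughly as follows. This assumes the lock file uses the usual `nodes.<name>.locked.rev` layout, and that cache availability is monotonic across the candidate list (older revisions cached, newer ones not), which may not hold in practice. `locked_rev`, `newest_cached`, and the `is_cached` callback are hypothetical names; how to actually test for a cache-only build is left to the caller.

```python
import json
from typing import Callable, Optional, Sequence


def locked_rev(flake_lock_text: str, input_name: str = "nixpkgs") -> Optional[str]:
    """Return the pinned git revision of a root input from flake.lock contents."""
    lock = json.loads(flake_lock_text)
    nodes = lock["nodes"]
    root = nodes[lock["root"]]
    target = root.get("inputs", {}).get(input_name)
    if not isinstance(target, str):
        return None  # missing input, or a "follows" path; not handled in this sketch
    return nodes[target].get("locked", {}).get("rev")


def newest_cached(
    revs: Sequence[str], is_cached: Callable[[str], bool]
) -> Optional[str]:
    """Binary-search revs (oldest first) for the newest rev where is_cached is True.

    Assumes is_cached is True for a prefix of revs and False for the rest.
    """
    lo, hi = 0, len(revs)  # invariant: revs[:lo] are cached, revs[hi:] are not
    while lo < hi:
        mid = (lo + hi) // 2
        if is_cached(revs[mid]):
            lo = mid + 1
        else:
            hi = mid
    return revs[lo - 1] if lo > 0 else None


if __name__ == "__main__":
    # Made-up lock file contents and revision names, for illustration only.
    sample_lock = json.dumps(
        {
            "nodes": {
                "nixpkgs": {"locked": {"rev": "rev-c"}},
                "root": {"inputs": {"nixpkgs": "nixpkgs"}},
            },
            "root": "root",
            "version": 7,
        }
    )
    print(locked_rev(sample_lock))
    history = ["rev-a", "rev-b", "rev-c", "rev-d"]
    print(newest_cached(history, lambda rev: rev in {"rev-a", "rev-b"}))
```

Each probe of `is_cached` corresponds to one manual check, so the search needs about log2(n) checks for n candidate revisions.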
| No, Hercules isn't the problem, just lack of maintenance is, lack of targeted work | 23:15:46 | |
| Yes, I'm sure the community can. Somebody has got to push this (so far I have been struggling with some unrelated stuff, so I couldn't): write up an opencollective proposal, write a new proposal to reconsider the hydra situation, &c. One could also finance those who already work on this for a living: https://nixos.org/community/teams/cuda/ | 23:23:08 | |
| 21 Apr 2024 | ||
| Speaking of, I need to write a few proposals, docs, tutorials, and make an update for the NixOS discourse. After I finish the integration work with fixed output derivations. | 01:03:19 | |