!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

317 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
63 Servers

19 Apr 2024
@ss:someonex.net SomeoneSerge (matrix works sometimes): But aside from that, I see no reason to discard multiple outputs. 16:09:20
@ss:someonex.net SomeoneSerge (matrix works sometimes): The hook isn't the cause of the issue. 16:09:34
@ss:someonex.net SomeoneSerge (matrix works sometimes): We created the issue in two steps: we disabled propagatedBuildInputs, and as a consequence we had to hard-code outputSpecified. We did that instead of solving the underlying problem: the circular dependency. 16:12:02
20 Apr 2024
@ss:someonex.net SomeoneSerge (matrix works sometimes): https://github.com/systemd/systemd/pull/32234 10:35:34
@ss:someonex.net SomeoneSerge (matrix works sometimes): Want that. 10:35:37
@ss:someonex.net SomeoneSerge (matrix works sometimes): (for pytorch and crap) 10:36:14
@nscnt:matrix.org @nscnt:matrix.org left the room. 13:41:03
@justinrestivo:matrix.org justinrestivo: Does anyone have CUDA working with PyTorch on NixOS? 14:23:04
@justinrestivo:matrix.org justinrestivo: If so, can I see your config? 14:23:15
@justinrestivo:matrix.org justinrestivo: I've got PyTorch installed with CUDA 11.8 and cuDNN 8.9.1 on a 4090. Config is here: https://github.com/DieracDelta/flakes/blob/flakes/hosts/hw/desktop.nix + https://github.com/DieracDelta/flakes/blob/flakes/hosts/hw/shared.nix 14:40:24
@justinrestivo:matrix.org justinrestivo: When I try to use nixos-23.11 to bring in PyTorch with either python310 or python311, I get (different) errors about missing files. I can provide those in a bit. My flake is here: https://github.com/DieracDelta/detypstify/blob/master/flake.nix#L139 . The problem shows up when I run PyTorch code: my GPU is recognized and torch seems to build with CUDA (after I brought in the nixified-ai flake), but Python dies with a bus error on some AVX instruction. I can also grab that error in a bit. I'm wondering whether folks have been using PyTorch successfully, and with which versions of CUDA/cuDNN etc. 14:44:56
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @justinrestivo:matrix.org
If so, can I see your config?

The NixOS config is just:

hardware.opengl.enable = true;
services.xserver.videoDrivers = [ "nvidia" ];

The nixpkgs config for building PyTorch is { cudaSupport = true; }, plus the optional cudaCapabilities and cudaForwardCompat.
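A minimal sketch of how the two snippets above might sit together in a single NixOS module. The cudaCapabilities value "8.9" is an assumption chosen to match the RTX 4090 (Ada Lovelace) mentioned earlier; allowUnfree is included because the NVIDIA driver and the CUDA libraries are unfree.

```nix
# Sketch of a NixOS configuration module (assumptions noted in comments)
{ config, pkgs, ... }:
{
  # System side: the driver setup quoted above
  hardware.opengl.enable = true;
  services.xserver.videoDrivers = [ "nvidia" ];

  # Package side: the nixpkgs config used when building PyTorch and friends
  nixpkgs.config = {
    allowUnfree = true;            # CUDA and the NVIDIA driver are unfree
    cudaSupport = true;
    cudaCapabilities = [ "8.9" ];  # assumption: an Ada GPU such as the RTX 4090
    cudaForwardCompat = false;     # optional: skip the forward-compat PTX to cut build time
  };
}
```

Narrowing cudaCapabilities mainly shortens builds and shrinks outputs, but anything built with a non-default config will probably be compiled locally rather than fetched from a binary cache.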

14:57:26
@ss:someonex.net SomeoneSerge (matrix works sometimes)

and what corresponding versions of cuda/cudnn etc are being used.

It's pretty flexible: you can build against pretty much whichever version you like.
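One way to pick a specific CUDA release is an overlay that swaps the default cudaPackages set, so torch and its CUDA dependencies all see the same version. This is a sketch under assumptions: it presumes the imported nixpkgs revision provides cudaPackages_11_8 and that packages take their CUDA libraries from the cudaPackages attribute.

```nix
# shell.nix sketch: build a CUDA-enabled torch against CUDA 11.8
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;  # the CUDA redistributables are unfree
      cudaSupport = true;
    };
    overlays = [
      # Assumption: cudaPackages_11_8 exists in the nixpkgs revision being imported
      (final: prev: { cudaPackages = final.cudaPackages_11_8; })
    ];
  };
in
pkgs.mkShell {
  packages = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];
}
```

Replacing cudaPackages globally, rather than overriding a single package, keeps the CUDA version consistent across the dependency graph. The trade-off is that a non-default CUDA set is likely to trigger large local rebuilds.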

14:59:26
@trexd:matrix.org trexd: I do the above and it works for me, justinrestivo. 15:20:26
@justinrestivo:matrix.org justinrestivo
In reply to @trexd:matrix.org
I do the above and it works for me justinrestivo

Any chance you could run the following and see if it segfaults for you? It would help me figure out whether it's my system setup or something else. It'll just bring in the Python tooling and run a self-contained script.

git clone git@github.com:DieracDelta/detypstify.git && cd detypstify && git checkout reproducible && cd train && nix develop .#mvp -c python3 main.py

17:27:41
@justinrestivo:matrix.org justinrestivo
In reply to @trexd:matrix.org
I do the above and it works for me justinrestivo
This also works for me until a pytorch call coredumps.
17:31:26
@justinrestivo:matrix.org justinrestivo

The backtrace is kinda strange, but coming from libcudnn

0x00007ffff7772e5d in __strlen_avx2 () from /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7772e5d in __strlen_avx2 () from /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/libc.so.6
#1  0x00007ff9189340dc in ?? ()
   from /nix/store/pq6sfashimm70zdzlh5gdzv3cl05f6xv-cudatoolkit-11-cudnn-8.9.1-lib/lib/libcudnn_cnn_train.so.8
17:31:42
@justinrestivo:matrix.org justinrestivo
In reply to @trexd:matrix.org
I do the above and it works for me justinrestivo
* This also works for me until a PyTorch call core dumps. This could also be an issue with my code.
17:34:11
@justinrestivo:matrix.org justinrestivo *

Any chance you could run the following and see if it core dumps for you? It would help me figure out whether it's my system setup or something else. It'll just bring in the Python tooling and run a self-contained script. Cleanup is just a matter of removing the directory and garbage-collecting the Nix store.

git clone git@github.com:DieracDelta/detypstify.git && cd detypstify && git checkout reproducible && cd train && nix develop .#mvp -c python3 main.py

17:38:18
@zopieux:matrix.zopi.eu zopieux: Hi folks, from the README it seems like the only recommended flake usage is to use the inputs nixpkgs-unstable and SomeoneSerge/nixpkgs-unfree. However, I'm not sure I understand how this plays with the flake lock being updated every once in a while by the bot. I'd like to make sure I'm using nixpkgs and nixpkgs-unfree revisions for which the packages I care about are in the Cachix cache. Is there some magic I'm missing? 18:18:50
@justinrestivo:matrix.org justinrestivo
In reply to @justinrestivo:matrix.org

The backtrace is kinda strange, but coming from libcudnn

0x00007ffff7772e5d in __strlen_avx2 () from /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7772e5d in __strlen_avx2 () from /nix/store/anlf335xlh41yjhm114swi87406mq5pw-glibc-2.38-44/lib/libc.so.6
#1  0x00007ff9189340dc in ?? ()
   from /nix/store/pq6sfashimm70zdzlh5gdzv3cl05f6xv-cudatoolkit-11-cudnn-8.9.1-lib/lib/libcudnn_cnn_train.so.8

Turns out that connecting a display to the card and setting hardware.nvidia.nvidiaPersistenced = true and hardware.nvidia.modesetting.enable = true changes the error from a bus error to a Python "mismatched size" error 🙏. No idea why that helped, though 🫠. Thank you to everyone who helped.
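For reference, the two settings mentioned above, written as a NixOS module fragment. This is only a sketch of the options named in the message; as the message itself says, why they changed the failure mode is unclear, so treat this as what was tried rather than as a known fix.

```nix
# NixOS module fragment with the two options from the message above
{ config, pkgs, ... }:
{
  # Kernel modesetting for the NVIDIA driver
  hardware.nvidia.modesetting.enable = true;

  # Run nvidia-persistenced so the GPU stays initialized even when no client holds it open
  hardware.nvidia.nvidiaPersistenced = true;
}
```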
18:59:32
@ss:someonex.net SomeoneSerge (matrix works sometimes)
In reply to @zopieux:matrix.zopi.eu
Hi folks, from the readme it seems like the only recommended flake usage is to use inputs nixpkgs-unstable & SomeoneSerge/nixpkgs-unfree, however I'm not sure I understand how this plays with the flake lock being updated every once in a while by the bot. I'd like to make sure I'm using a nixpkgs & unfree rev that has the packages I care about in the cachix. Is there some magic I'm missing?
Hi. Frankly, it's been broken again for a while; one might as well just give up. This needs an overhaul. We should reconsider the alternatives again: a community buildbot, garnix, Hydra...
Nonetheless, you can pick a job in https://hercules-ci.com/github/SomeoneSerge/nixpkgs-cuda-ci/jobs/10700 and check which nixpkgs revision the corresponding flake.lock refers to.
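After you have read the nixpkgs revision out of a CI job's flake.lock, one way to use it is to pin that revision directly in your own flake. In this sketch REV_FROM_CI_FLAKE_LOCK is a hypothetical placeholder you must replace. It also assumes that nixpkgs-unfree exposes an input named nixpkgs that can be redirected with follows.

```nix
# flake.nix sketch: pin nixpkgs to the revision a CI job was built against
{
  inputs = {
    # Hypothetical placeholder: substitute the nixpkgs rev from the CI job's flake.lock
    nixpkgs.url = "github:NixOS/nixpkgs/REV_FROM_CI_FLAKE_LOCK";
    nixpkgs-unfree.url = "github:SomeoneSerge/nixpkgs-unfree";
    # Assumption: nixpkgs-unfree has an overridable input called nixpkgs
    nixpkgs-unfree.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { self, nixpkgs, nixpkgs-unfree }: {
    # ... your packages and devShells go here
  };
}
```

Pinning by hand avoids the drift introduced by the bot's lock updates, at the cost of bumping the revision yourself once a newer job has been built and cached.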
23:09:25
@zopieux:matrix.zopi.eu zopieux: Thanks! I ended up binary-searching the flake.lock commits for the most recent nixpkgs-unstable rev that results in a cache-only build :) I agree this sucks, though. Is there anything the community can do to help with this overhaul? Is Hercules the problem, or something else? 23:12:48
@ss:someonex.net SomeoneSerge (matrix works sometimes): No, Hercules isn't the problem. Lack of maintenance is: lack of targeted work. 23:15:46
@ss:someonex.net SomeoneSerge (matrix works sometimes)

Is there anything the community can do to help with this overhaul?

Yes, I'm sure the community can. Somebody has got to push this (so far I've been struggling with some unrelated stuff, so I couldn't): write up an Open Collective proposal, write a new proposal to reconsider the Hydra situation, etc. One could also finance those who already work on this for a living: https://nixos.org/community/teams/cuda/

23:23:08
21 Apr 2024
@connorbaker:matrix.org connor (he/him): Speaking of which, I need to write a few proposals, docs, and tutorials, and post an update on the NixOS Discourse, after I finish the integration work with fixed-output derivations.
01:03:19
23 Apr 2024
@search-sense:matrix.org search-sense: [redacted or malformed event] 07:24:23

Room Version: 9