!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

291 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



Sender | Message | Time
12 Dec 2024
@connorbaker:matrix.org connor (he/him) | How is there so little time in a day 😭 | 01:50:57
@connorbaker:matrix.org connor (he/him) | ugh, my main desktop is gone from my tailscale network and unreachable from the other hosts. Damn Intel chip probably did a voltage whoopsy and needs to be hard reset. | 04:05:37
@connorbaker:matrix.org connor (he/him) | Cool, that shit is gonna be broken until I can fly home at the end of the month lmao | 04:42:54
@msanft:matrix.org Moritz Sanft | If I'm not mistaken, this makes nvidia-x11 depend on every package listed here, right?
https://github.com/NixOS/nixpkgs/blob/eac1633a086e8e109e00ce58c0b47721da1dbdfd/pkgs/os-specific/linux/nvidia-x11/generic.nix#L112

I wondered why perl was in my closure, which led me to the driver's mesa dependency, and then I got confused. If this really is the case, it's not something you'd want on a space-constrained system, right?
15:20:27
13 Dec 2024
@sielicki:matrix.org sielicki | hey connor -- I can't request a github review on https://github.com/aws/aws-ofi-nccl/pull/745 for some reason, but I would like a review from you (or anyone else idling here) if you can find time. Specifically asking you because it's using your external cuda flake. | 04:07:57 (edited 04:09:30)
@sielicki:matrix.org sielicki | I have a very thorough libfabric derivation on my personal laptop that I intend to finish over the holidays and propose to nixpkgs proper | 04:08:36
@sielicki:matrix.org sielicki | want to call it out here (mostly as a reminder to myself): the openmpi drv in nixpkgs should be capable of completely detaching from ucx/ucc, but it's not simple to make that happen today. On AWS you shouldn't need ucx/ucc at all. | 04:13:37
@connorbaker:matrix.org connor (he/him) | I'll try to take a look sometime this weekend; one of my remote machines is down, so my ability to build stuff is hosed | 04:58:25
@sielicki:matrix.org sielicki
In reply to @connorbaker:matrix.org
I’ll try to take a look sometime this weekend; one of my remote machines is down so my ability to build stuff is hosed
sure -- no need to build anything, I just want an overall look at how it's using cudaPackagesExtensions etc.
07:42:06
@sielicki:matrix.org sielicki | generally just a review of the nix code and anything stupid I did | 07:42:31
@msanft:matrix.org Moritz Sanft | I just noticed another weird thing while trying to hunt down that Perl dependency:

As I'm building the driver for a server scenario, I removed the graphics and X11 stuff from the libPath. I still had the Perl dependency in my image, though. When analyzing its dependency chain, I saw the following:
/nix/store/1gx9dgmj33jd1753fww5cmq0q087q48n-nixos-system-nixos-25.05pre-git
└───/nix/store/czlpjck5z3vsgw1w9szinwnv15l4a2n3-system-path
    └───/nix/store/7m1var7g0swf2ikn3d3swsxk5w6lbcpv-nvidia-persistenced-550.90.07
        └───/nix/store/wh45iphj9kr43mxq0wks9qam2swabf6f-nvidia-x11-550.90.07-6.11
            └───/nix/store/lcq3ibmsb6c2jgqp3yfi1yp773x5wz19-mesa-24.2.6
                └───/nix/store/0i5icd6l3pkjckipa5f94jv7dsj5md70-lm-sensors-3.6.0
                    └───/nix/store/3vq9qasxlqpyq1k95nq3s13g2m6w59ay-perl-5.40.0

Now, when I remove persistenced, the dependency is gone. So persistenced somehow depends on a different NVIDIA driver than the one the system actually uses. The driver the system uses is at /nix/store/zsdr4vrybbik9hb8nss6fbmi71wsqhv3-nvidia-x11-550.90.07-6.11. When I run nix derivation show /path/to/persistenced-package, I see the following:

"postFixup": "# Save a copy of persistenced for mounting in containers\nmkdir $out/origBin\ncp $out/{bin,origBin}/nvidia-persistenced\npatchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 $out/origBin/nvidia-persistenced\n\npatchelf --set-rpath \"$(patchelf --print-rpath $out/bin/nvidia-persistenced):/nix/store/wh45iphj9kr43mxq0wks9qam2swabf6f-nvidia-x11-550.90.07-6.11/lib\" \\\n  $out/bin/nvidia-persistenced\n",

So is a different driver being used to build persistenced? Looking at the packaging infrastructure, nvidia_x11 seems to be passed as an argument, which would mean it should use the same one. However, I fear there's some kind of evaluation differential here, as the persistenced package might be built before hardware.nvidia.package is even evaluated? Has any of you run into something similar before?
08:41:49
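A quick way to confirm which driver a persistenced build was patched against is to pull the nvidia-x11 store paths out of its postFixup script and compare them with the driver the system actually uses. The Python sketch below does that. It assumes `nix derivation show` emits a JSON object keyed by .drv path with the build attributes (postFixup etc.) under `env`, which fits the output quoted above but may differ between Nix versions; `referenced_drivers` and `mismatched_drivers` are hypothetical helper names.

```python
import json
import re

# Store paths look like /nix/store/<32-char hash>-<name>. The hash uses Nix's
# base-32 alphabet: digits plus lowercase letters except e, o, t, u.
NVIDIA_X11_PATH = re.compile(r'/nix/store/[0-9a-df-np-sv-z]{32}-nvidia-x11-[^/\s"]+')


def referenced_drivers(derivation_json: str, attr: str = "postFixup") -> set[str]:
    """Collect every nvidia-x11 store path mentioned in `attr` of each derivation.

    Assumption: the JSON is an object keyed by .drv path, with the derivation's
    attributes stored under "env" (as `nix derivation show` printed them here).
    """
    found: set[str] = set()
    for drv in json.loads(derivation_json).values():
        script = drv.get("env", {}).get(attr, "")
        found.update(NVIDIA_X11_PATH.findall(script))
    return found


def mismatched_drivers(derivation_json: str, system_driver: str) -> list[str]:
    """Return the referenced nvidia-x11 paths that differ from the system's driver."""
    return sorted(p for p in referenced_drivers(derivation_json) if p != system_driver)
```

For example, save the output of `nix derivation show /path/to/persistenced-package` and pass its contents together with the system's driver path; a non-empty result means the persistenced rpath points at a different nvidia-x11 than the one in the system closure.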
@msanft:matrix.org Moritz Sanft | fwiw, I solved it with a very dirty hack that explicitly overrides the nvidia_x11 used in nvidia-persistenced:
https://github.com/edgelesssys/contrast/commit/5bf5cb81ce05f6f25b2cdf960ca3ab57a7f3459f
15:05:40
14 Dec 2024
@matthewcroughan:defenestrate.it matthewcroughan | Is there a way to wrap programs in Nix so that they believe they have a specific directory structure, like an FHS env, whilst not screwing around too much with things that impact CUDA/GPU access? | 16:47:31
@matthewcroughan:defenestrate.it matthewcroughan

I'm trying to package an application that wants access to source code dir paths, and I think this would be a good use of a layer/wrapper that performs symlinking at runtime to change the view of the world from the perspective of the application:

  • https://github.com/BatteredBunny/nix-ai-stuff/blob/main/pkgs/comfyui/default.nix#L54-L70
  • https://github.com/lboklin/nixified-ai/blob/master/projects/comfyui/package.nix#L116-L147
16:52:25
@matthewcroughan:defenestrate.it matthewcroughan | (instead of doing it in the installPhase, for example) | 16:52:32
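One way to get the "symlink at runtime" behaviour without an FHS env: before launching the application, build a writable state directory that mirrors the read-only package through symlinks, with real directories only where the application must write. A minimal Python sketch, assuming the application resolves its data directories relative to its working directory; `build_runtime_view` and the `mutable` names used in the example (`custom_nodes`, `output`) are hypothetical, not taken from the linked packages. Because it only rearranges the application's own files and leaves the rest of the filesystem alone, it should not interfere with how the GPU stack finds its libraries.

```python
from pathlib import Path


def build_runtime_view(source: Path, state: Path, mutable: tuple[str, ...]) -> Path:
    """Mirror the top level of a read-only `source` tree into a writable `state`
    directory with symlinks, giving each name in `mutable` a real directory."""
    state.mkdir(parents=True, exist_ok=True)
    for entry in source.iterdir():
        if entry.name in mutable:
            continue  # handled below: must be writable, so never a store symlink
        link = state / entry.name
        if link.is_symlink():
            link.unlink()  # re-point the link in case `source` is a new store path
        if not link.exists():
            link.symlink_to(entry)  # real files/dirs already in `state` are kept
    for name in mutable:
        target = state / name
        if target.is_symlink():
            target.unlink()  # a leftover symlink would point into read-only storage
        target.mkdir(exist_ok=True)
    return state
```

A wrapper script would call this, for instance with `mutable=("custom_nodes", "output")`, and then `os.chdir(state)` before exec'ing the real entry point. Existing non-symlink entries in `state` are left untouched, so user data survives re-runs.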
@matthewcroughan:defenestrate.it matthewcroughan | if you enable both cudaSupport and rocmSupport, what happens? Do you actually get an output that is usable for both? | 20:24:53
@sielicki:matrix.org sielicki | matthewcroughan: IMO it might be faster and better for you to write the missing pyproject.toml it needs | 20:34:25
@matthewcroughan:defenestrate.it matthewcroughan | Would I not have to rewrite that every single time the owner updates the package? | 20:34:49
@matthewcroughan:defenestrate.it matthewcroughan | I would rather write down the missing source/destination than fill in for what the developer isn't doing | 20:34:59
@matthewcroughan:defenestrate.it matthewcroughan | Does the pyproject.toml actually account for mutable vs. immutable? | 20:35:44
@matthewcroughan:defenestrate.it matthewcroughan | If it's a small enough difference, maybe I can submit it upstream and convince them to maintain it | 20:36:25
@sielicki:matrix.org sielicki | although Python is interpreted, you still end up with an automatic installation phase that generates bytecode -- it really shouldn't need any access to .py files at runtime | 20:36:59
@matthewcroughan:defenestrate.it matthewcroughan | But it does, and this pattern is present in plenty of projects; it also happens in PHP all the time | 20:37:26
@matthewcroughan:defenestrate.it matthewcroughan | So we have to get over it somehow | 20:37:37
@matthewcroughan:defenestrate.it matthewcroughan | I think the least evil method is a wrapper that fools the application into seeing the FS you want it to see | 20:38:18
@matthewcroughan:defenestrate.it matthewcroughan | Though maintaining that with a series of cp/ln in some package's installPhase is pretty annoying | 20:38:38
@sielicki:matrix.org sielicki | I guess what I'm saying is that I think just copying the source into $out is insufficient and abnormal -- each of these should be a separate Python module, and the buildPhase should take care of both copying it into the expected Python path directory structure and byte-compiling it. Yes, it works to have just a .py file instead of a .pyc file and vice versa, but you really do want the .pyc files | 20:44:58
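The byte-compiling step sielicki describes can be illustrated with the standard library's `compileall`, the same kind of step Python install tooling performs. A minimal sketch, assuming the sources already sit in their final directory layout; `byte_compile` is a hypothetical helper name, not part of nixpkgs.

```python
import compileall
import importlib.util
from pathlib import Path


def byte_compile(tree: Path) -> list[Path]:
    """Byte-compile every .py file under `tree` and return the .pyc locations."""
    # compile_dir returns a false value if any file failed to compile.
    if not compileall.compile_dir(str(tree), quiet=1):
        raise RuntimeError(f"byte-compilation failed under {tree}")
    # cache_from_source maps foo.py to __pycache__/foo.<tag>.pyc for this interpreter.
    return [
        Path(importlib.util.cache_from_source(str(py)))
        for py in sorted(tree.rglob("*.py"))
    ]
```

Each .pyc lands in a `__pycache__` directory next to its source, where the import system looks for it; the .py files remain, so code that reads its own sources at runtime still works.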
@matthewcroughan:defenestrate.it matthewcroughan

> I guess what I'm saying is that I think just copying the source into $out is insufficient and abnormal

Yes, but abnormal applications exist and make up the majority of nixpkgs

20:45:26



Room Version: 9