!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



Sender  Message  Time
13 Dec 2024
@sielicki:matrix.org sielicki: hey connor -- I can't request a github review on https://github.com/aws/aws-ofi-nccl/pull/745 for some reason, but I would like a review from you (or anyone else idling here) if you can find time. Specifically asking you because it's using your external cuda flake. 04:09:30
@sielicki:matrix.org sielicki: want to call it out here (mostly as a reminder to myself): the openmpi drv in nixpkgs should be capable of completely detaching from ucx/ucc, but it's not simple to make that happen today. On aws you shouldn't need ucx/ucc at all. 04:13:37
@connorbaker:matrix.org connor (he/him): I’ll try to take a look sometime this weekend; one of my remote machines is down so my ability to build stuff is hosed 04:58:25
@sielicki:matrix.org sielicki
In reply to @connorbaker:matrix.org
I’ll try to take a look sometime this weekend; one of my remote machines is down so my ability to build stuff is hosed
sure -- no need to build anything, just looking for an overall look at how it's using cudaPackagesExtensions etc
07:42:06
@sielicki:matrix.org sielicki: generally just a review of the nix code and anything I did stupid 07:42:31
@msanft:matrix.org Moritz Sanft: I just noticed another weird thing while trying to hunt down that Perl dependency:

As I'm building the driver for a server scenario, I removed the graphics and X11 stuff from the libPath. I still had the Perl dependency in my image though. When analyzing its chain, I saw the following:
/nix/store/1gx9dgmj33jd1753fww5cmq0q087q48n-nixos-system-nixos-25.05pre-git
└───/nix/store/czlpjck5z3vsgw1w9szinwnv15l4a2n3-system-path
    └───/nix/store/7m1var7g0swf2ikn3d3swsxk5w6lbcpv-nvidia-persistenced-550.90.07
        └───/nix/store/wh45iphj9kr43mxq0wks9qam2swabf6f-nvidia-x11-550.90.07-6.11
            └───/nix/store/lcq3ibmsb6c2jgqp3yfi1yp773x5wz19-mesa-24.2.6
                └───/nix/store/0i5icd6l3pkjckipa5f94jv7dsj5md70-lm-sensors-3.6.0
                    └───/nix/store/3vq9qasxlqpyq1k95nq3s13g2m6w59ay-perl-5.40.0

Now, when I remove persistenced, the dependency is gone. This means that persistenced somehow depends on a different NVIDIA driver than the one the system actually uses. The driver that's used in the system is at /nix/store/zsdr4vrybbik9hb8nss6fbmi71wsqhv3-nvidia-x11-550.90.07-6.11. When I now run nix derivation show /path/to/persistenced-package, I see the following:

"postFixup": "# Save a copy of persistenced for mounting in containers\nmkdir $out/origBin\ncp $out/{bin,origBin}/nvidia-persistenced\npatchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 $out/origBin/nvidia-persistenced\n\npatchelf --set-rpath \"$(patchelf --print-rpath $out/bin/nvidia-persistenced):/nix/store/wh45iphj9kr43mxq0wks9qam2swabf6f-nvidia-x11-550.90.07-6.11/lib\" \\\n  $out/bin/nvidia-persistenced\n",

Does this mean that a different driver is somehow used for building persistenced? Looking at the packaging infrastructure, it seems that nvidia_x11 is passed as an argument, which would mean that it should use the same one. However, I fear that there's some kind of evaluation difference here, as the persistenced package might be built before hardware.nvidia.package is even evaluated? Have any of you ever run into something similar before?
08:41:49
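
The mismatch described above (two nvidia-x11 store paths with the same name but different hashes) can be spotted mechanically. The following is a minimal Python sketch, assuming the dependency tree or closure listing has been pasted into a string. The function name conflicting_store_paths and the regular expression are illustrative assumptions, not part of any Nix tooling.

```python
import re
from collections import defaultdict

# Assumed store path shape: /nix/store/<32-char hash>-<name>
STORE_PATH = re.compile(r'/nix/store/([0-9a-z]{32})-([^\s/:"\\]+)')


def conflicting_store_paths(text):
    """Return {name: [hashes]} for every package name seen with more than one hash."""
    hashes_by_name = defaultdict(set)
    for store_hash, name in STORE_PATH.findall(text):
        # Drop sentence punctuation that may trail a path quoted in prose.
        hashes_by_name[name.rstrip(".,")].add(store_hash)
    return {
        name: sorted(hashes)
        for name, hashes in hashes_by_name.items()
        if len(hashes) > 1
    }


# Paths copied from the message above.
sample = """
/nix/store/7m1var7g0swf2ikn3d3swsxk5w6lbcpv-nvidia-persistenced-550.90.07
/nix/store/wh45iphj9kr43mxq0wks9qam2swabf6f-nvidia-x11-550.90.07-6.11
/nix/store/zsdr4vrybbik9hb8nss6fbmi71wsqhv3-nvidia-x11-550.90.07-6.11
"""

conflicts = conflicting_store_paths(sample)
for name, hashes in conflicts.items():
    print(name, hashes)
```

On the sample, only nvidia-x11-550.90.07-6.11 is reported, since it is the one name that appears under two different hashes. A report like this only shows that two builds exist; it does not explain why they differ.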
@msanft:matrix.org Moritz Sanft: fwiw, I solved it with a very dirty hack that explicitly overrides the nvidia_x11 used in nvidia-persistenced:
https://github.com/edgelesssys/contrast/commit/5bf5cb81ce05f6f25b2cdf960ca3ab57a7f3459f
15:05:40



Room Version: 9