
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



1 Aug 2024
@ss:someonex.net SomeoneSerge (matrix works sometimes) *

like

$ LD_DEBUG=libs python my-repro.py
14:53:51
@yorickvp:matrix.org yorickvp: alright, I'll try that 14:54:17
@yorickvp:matrix.org yorickvp: the failing line is LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${cudaPackages.cuda_cudart.stubs}/lib python -m pybind11_stubgen -o . bindings 14:55:02
@yorickvp:matrix.org yorickvp (in reply to @ss:someonex.net, "$ LD_DEBUG=libs python my-repro.py"):
okay, so my libraries have rpath $ORIGIN:/home/yorick/outputs/out/lib:/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib:/nix/store/8rzflwd9bxri4s0bpicm8bkmi2ikmv7n-nccl-2.21.5-1/lib:/nix/store/61q201jxc1g6pkbvhyyriwlm7zasa81k-openmpi-4.1.6/lib:/nix/store/g798k855fny946jnycp61vkzy27kwlyl-libcublas-12.1.3.1-lib/lib:/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib:/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/2v1jx43nsp9njldxh4bfljvh5wmnbzk3-python3.10-tensorrt-cu12-libs-10.2.0/lib/python3.10/site-packages/tensorrt_libs:/nix/store/ybqfab6p2p6ir9dcr6gn6rxn825wb86g-cudnn-8.9.7.29-lib/lib
15:12:20
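
A long RUNPATH like the one above is easier to audit once it is split into entries. The following is a minimal sketch, not a tool anyone in the room used: the function name `conflicting_packages` and both regular expressions are illustrative assumptions, and the sample string keeps only a few of the entries from the message. It groups Nix store entries by package name and reports any package that appears with more than one version.

```python
import re
from collections import defaultdict

# A subset of the RUNPATH entries quoted in the message above.
RUNPATH = ":".join([
    "$ORIGIN",
    "/home/yorick/outputs/out/lib",
    "/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib",
    "/nix/store/8rzflwd9bxri4s0bpicm8bkmi2ikmv7n-nccl-2.21.5-1/lib",
    "/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib",
    "/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib",
])

# Assumption: a store path looks like /nix/store/<hash>-<name>/...
STORE_RE = re.compile(r"^/nix/store/[^-/]+-(?P<name>[^/]+)")
# Assumption: the version starts at the first "-" followed by a digit.
NAME_RE = re.compile(r"^(?P<pname>.+?)-(?P<version>\d.*)$")


def conflicting_packages(runpath):
    """Return {pname: [versions]} for packages present in several versions."""
    versions = defaultdict(list)
    for entry in runpath.split(":"):
        store = STORE_RE.match(entry)
        if store is None:
            continue  # $ORIGIN and non-store directories are skipped
        split = NAME_RE.match(store.group("name"))
        if split is None:
            continue  # store name without a recognisable version
        pname, version = split.group("pname"), split.group("version")
        if version not in versions[pname]:
            versions[pname].append(version)
    return {p: v for p, v in versions.items() if len(v) > 1}


print(conflicting_packages(RUNPATH))
# prints {'gcc': ['12.3.0-lib', '13.3.0-lib']}
```

The output only says that two gcc library versions share the RUNPATH; it does not say which one is unwanted or how it got there.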
@yorickvp:matrix.org yorickvp: looks like cmake is writing it as a LINK_PATH 15:22:52
@ss:someonex.net SomeoneSerge (matrix works sometimes): So there's something else propagating an unwrapped (differently wrapped) gcc12 maybe 15:23:40
@yorickvp:matrix.org yorickvp: how can I list all propagated inputs? 15:24:08
@ss:someonex.net SomeoneSerge (matrix works sometimes): all propagated inputs of 15:24:30
@yorickvp:matrix.org yorickvp: I'm in a nix develop for the drv that produces the libraries with the wrong rpath 15:25:24
@ss:someonex.net SomeoneSerge (matrix works sometimes): H'mm, maybe you can echo "${pkgsBuildHost[@]}" for compilers/build tools 15:26:51
@ss:someonex.net SomeoneSerge (matrix works sometimes): But that won't tell you where it's coming from 15:27:09
@ss:someonex.net SomeoneSerge (matrix works sometimes): Just do a nix-tree --derivation or nix path-info / nix why-depends 15:27:31
@yorickvp:matrix.org yorickvp: seems like there's no unwrapped gcc 15:48:59
@yorickvp:matrix.org yorickvp: libtorch_cuda.so also manages to link it 15:49:53
@yorickvp:matrix.org yorickvp: https://gist.github.com/yorickvP/b263b9d6d058280a3f7d4c70eff2a758 /nix/store/mbg29pcjydgss24z0v6jczjda7q4z9x6-gcc-12.3.0.drv (the offending gcc lib) only occurs as a dependency of the gcc-wrapper that has the correct lib first 15:54:09
@yorickvp:matrix.org yorickvp: I'll try to repro with torch on nixos-unstable 15:57:23
@yorickvp:matrix.org yorickvp: yeah, ${python3.pkgs.torchWithCuda.lib}/lib/libtorch_cuda.so links to gcc-12.4.0-lib 16:16:14
@ss:someonex.net SomeoneSerge (matrix works sometimes): Wow 16:40:20
@ss:someonex.net SomeoneSerge (matrix works sometimes): This looks like a regression 16:40:27
@ss:someonex.net SomeoneSerge (matrix works sometimes): Well the first obvious leak (the one we see in the wrapper) is https://github.com/NixOS/nixpkgs/blob/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L596-L606 16:53:54
@ss:someonex.net SomeoneSerge (matrix works sometimes): Notably, cudaPackages.saxpy works fine at that commit 16:54:12
@ss:someonex.net SomeoneSerge (matrix works sometimes): I'm running github:NixOS/nixpkgs/c66e984bda09e7230ea7b364e677c5ba4f0d36d0#opencv4.tests.no-libstdcxx-errors now (only defined for cudaSupport = true) 16:54:41
@ss:someonex.net SomeoneSerge (matrix works sometimes): Going to take a while 16:54:45
@ss:someonex.net SomeoneSerge (matrix works sometimes): But it might be the regression is somehow magically torch specific 16:54:59
@ss:someonex.net SomeoneSerge (matrix works sometimes): No idea why https://github.com/NixOS/nixpkgs/blame/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L605-L606 uses cc_solib honestly 16:55:53
@yorickvp:matrix.org yorickvp: you know, I blame cmake :) 17:00:59
@yorickvp:matrix.org yorickvp: looking at 36 megabytes of cmake logs, it obviously parses it out of some gcc output (together with the correct one, which it puts first in the path). I'm not sure what it does with it after 17:02:50
@ss:someonex.net SomeoneSerge (matrix works sometimes): Waiting for opencv, but so far I'm leaning towards "maybe pytorch devs replaced some of the cmake logic with an unnecessary gcc -print-search-dirs" 17:06:46
@yorickvp:matrix.org yorickvp: I'm looking at https://github.com/Kitware/CMake/blob/master/Modules/CMakeParseImplicitLinkInfo.cmake 17:08:01
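
To make the "parses it out of some gcc output" idea concrete, here is a rough Python sketch of how implicit link directories can be recovered from a verbose link line by collecting its `-L` arguments. This is not CMake's code and is much simpler than the linked module. The function name `implicit_link_dirs`, the heuristic for spotting the linker line, and every path in `SAMPLE` are hypothetical and chosen only for illustration.

```python
import shlex

# Hypothetical verbose compiler output; the paths are placeholders,
# not taken from any real build log.
SAMPLE = """\
COLLECT_GCC=gcc
 /example/gcc/libexec/collect2 -o a.out -L/example/gcc-new-lib/lib -L/example/gcc-old-lib/lib -lgcc -lc
"""


def implicit_link_dirs(verbose_output):
    """Collect -L directories from lines that look like a linker invocation."""
    dirs = []
    for line in verbose_output.splitlines():
        # Assumption: the linker line mentions collect2 or an ld binary.
        if "collect2" not in line and "/ld " not in line:
            continue
        tokens = shlex.split(line)
        for i, token in enumerate(tokens):
            if token == "-L" and i + 1 < len(tokens):
                candidate = tokens[i + 1]  # "-L dir" form
            elif token.startswith("-L") and len(token) > 2:
                candidate = token[2:]  # "-Ldir" form
            else:
                continue
            if candidate not in dirs:
                dirs.append(candidate)  # keep first-seen order
    return dirs


print(implicit_link_dirs(SAMPLE))
# prints ['/example/gcc-new-lib/lib', '/example/gcc-old-lib/lib']
```

Under this reading, every directory the compiler driver hands to the linker ends up in the list, in order, which would be consistent with the unwanted gcc lib appearing after the correct one.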


