!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers


Sender Message Time
1 Aug 2024
@ss:someonex.netSomeoneSerge (back on matrix) I'd rather guess that gcc12's libstdc++ comes from elsewhere, like another dependency. Run with LD_DEBUG=libs 14:52:22
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @ss:someonex.net
Yeah the way we worked around cc-wrapper was kind of ugly and relied on gcc respecting priorities/a particular order of flags
(yes this is bad and we should fix that)
14:52:47
@yorickvp:matrix.orgyorickvp run what with LD_DEBUG=libs? 14:53:06
@ss:someonex.netSomeoneSerge (back on matrix) *

like

$ LD_DEBUG=libs python
import torch
# do whatever you do
14:53:34
@ss:someonex.netSomeoneSerge (back on matrix) *

like

$ LD_DEBUG=libs python my-repro.py
14:53:51
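
The LD_DEBUG=libs output gets long quickly, so it can help to filter it for the library in question. A minimal sketch, assuming the usual glibc format in which each candidate is reported on a "trying file=..." line; the helper name tried_paths and the sample log are made up for illustration and are not taken from the actual run:

```python
import re

# Matches glibc's LD_DEBUG=libs lines such as
#   "     4242:\t  trying file=/some/dir/libstdc++.so.6"
_TRYING = re.compile(r"^\s*\d+:\s+trying file=(?P<path>\S+)\s*$")


def tried_paths(ld_debug_output: str, needle: str) -> list:
    """Return every path the loader tried whose file name contains `needle`."""
    hits = []
    for line in ld_debug_output.splitlines():
        m = _TRYING.match(line)
        if m and needle in m.group("path").rsplit("/", 1)[-1]:
            hits.append(m.group("path"))
    return hits


# Hypothetical excerpt; the real text is written to stderr, e.g.
#   LD_DEBUG=libs python my-repro.py 2> ld-debug.log
sample = (
    "     4242:\tfind library=libstdc++.so.6 [0]; searching\n"
    "     4242:\t  trying file=/example/gcc-12-lib/lib/libstdc++.so.6\n"
    "     4242:\t  trying file=/example/glibc/lib/libm.so.6\n"
)
print(tried_paths(sample, "libstdc++"))
```

The order of the returned paths follows the order in which the loader tried them, which is usually what matters when two libstdc++ copies compete.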
@yorickvp:matrix.orgyorickvp alright, I'll try that 14:54:17
@yorickvp:matrix.orgyorickvp the failing line is LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${cudaPackages.cuda_cudart.stubs}/lib python -m pybind11_stubgen -o . bindings 14:55:02
@yorickvp:matrix.orgyorickvp
In reply to @ss:someonex.net

like

$ LD_DEBUG=libs python my-repro.py
okay, so my libraries have rpath $ORIGIN:/home/yorick/outputs/out/lib:/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib:/nix/store/8rzflwd9bxri4s0bpicm8bkmi2ikmv7n-nccl-2.21.5-1/lib:/nix/store/61q201jxc1g6pkbvhyyriwlm7zasa81k-openmpi-4.1.6/lib:/nix/store/g798k855fny946jnycp61vkzy27kwlyl-libcublas-12.1.3.1-lib/lib:/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib:/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/2v1jx43nsp9njldxh4bfljvh5wmnbzk3-python3.10-tensorrt-cu12-libs-10.2.0/lib/python3.10/site-packages/tensorrt_libs:/nix/store/ybqfab6p2p6ir9dcr6gn6rxn825wb86g-cudnn-8.9.7.29-lib/lib
15:12:20
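
The loader walks the rpath directories left to right, so the position of each gcc lib directory decides which libstdc++ wins. A small sketch that splits the rpath quoted above and lists the gcc lib entries in search order; the function names are hypothetical, and the regex assumes the "-gcc-<version>-lib" store path naming seen in that message:

```python
import re

# Directories copied from the rpath in the message above, in order.
RPATH = ":".join([
    "$ORIGIN",
    "/home/yorick/outputs/out/lib",
    "/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib",
    "/nix/store/8rzflwd9bxri4s0bpicm8bkmi2ikmv7n-nccl-2.21.5-1/lib",
    "/nix/store/61q201jxc1g6pkbvhyyriwlm7zasa81k-openmpi-4.1.6/lib",
    "/nix/store/g798k855fny946jnycp61vkzy27kwlyl-libcublas-12.1.3.1-lib/lib",
    "/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib",
    "/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib",
    "/nix/store/2v1jx43nsp9njldxh4bfljvh5wmnbzk3-python3.10-tensorrt-cu12-libs-10.2.0"
    "/lib/python3.10/site-packages/tensorrt_libs",
    "/nix/store/ybqfab6p2p6ir9dcr6gn6rxn825wb86g-cudnn-8.9.7.29-lib/lib",
])

# Assumed naming scheme: ...-gcc-<version>-lib/...
_GCC_LIB = re.compile(r"-gcc-(\d+(?:\.\d+)*)-lib(?:/|$)")


def rpath_entries(rpath: str) -> list:
    """Split an rpath string into its colon-separated directories."""
    return [p for p in rpath.split(":") if p]


def gcc_lib_order(entries: list) -> list:
    """Return (position, gcc version) for each gcc lib dir, in search order."""
    found = []
    for i, entry in enumerate(entries):
        m = _GCC_LIB.search(entry)
        if m:
            found.append((i, m.group(1)))
    return found


entries = rpath_entries(RPATH)
print(gcc_lib_order(entries))
```

With this rpath the gcc 12.3.0 directory sits ahead of the gcc 13.3.0 one, which matches the symptom of the older libstdc++ being picked up.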
@yorickvp:matrix.orgyorickvp looks like cmake is writing it as a LINK_PATH 15:22:52
@ss:someonex.netSomeoneSerge (back on matrix) So there's something else propagating an unwrapped (differently wrapped) gcc12 maybe 15:23:40
@yorickvp:matrix.orgyorickvp how can I list all propagated inputs? 15:24:08
@ss:someonex.netSomeoneSerge (back on matrix) all propagated inputs of 15:24:30
@yorickvp:matrix.orgyorickvp I'm in a nix develop for the drv that produces the libraries with the wrong rpath 15:25:24
@ss:someonex.netSomeoneSerge (back on matrix) H'mm, maybe you can echo "${pkgsBuildHost[@]}" for compilers/build tools 15:26:51
@ss:someonex.netSomeoneSerge (back on matrix) But that won't tell you where it's coming from 15:27:09
@ss:someonex.netSomeoneSerge (back on matrix) Just do a nix-tree --derivation or path-info why-depends 15:27:31
@yorickvp:matrix.orgyorickvp seems like there's no unwrapped gcc 15:48:59
@yorickvp:matrix.orgyorickvp libtorch_cuda.so also manages to link it 15:49:53
@yorickvp:matrix.orgyorickvp https://gist.github.com/yorickvP/b263b9d6d058280a3f7d4c70eff2a758 /nix/store/mbg29pcjydgss24z0v6jczjda7q4z9x6-gcc-12.3.0.drv (the offending gcc lib) only occurs as a dependency of the gcc-wrapper that has the correct lib first 15:54:09
@yorickvp:matrix.orgyorickvp I'll try to repro with torch on nixos-unstable 15:57:23
@yorickvp:matrix.orgyorickvp yeah, ${python3.pkgs.torchWithCuda.lib}/lib/libtorch_cuda.so links to gcc-12.4.0-lib 16:16:14
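
One way to check which libstdc++ a shared object like libtorch_cuda.so resolves to is to parse the output of ldd. A rough sketch, assuming the common "soname => path (0xaddress)" line format; the helper name resolved_libs is made up, and the sample text (including the literal <hash> placeholder) is illustrative rather than real output:

```python
import re

# Trailing load address that ldd prints after each resolved path.
_ADDR = re.compile(r"\s*\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libs(ldd_output: str) -> dict:
    """Map each soname in `ldd` output to its resolved path (None if not found)."""
    result = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso or the dynamic loader itself
        soname, _, target = line.partition("=>")
        target = _ADDR.sub("", target).strip()
        result[soname.strip()] = None if target == "not found" else target
    return result


# Hypothetical ldd output; "<hash>" stands in for a real store hash.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffc0a5f2000)\n"
    "\tlibstdc++.so.6 => /nix/store/<hash>-gcc-12.4.0-lib/lib/libstdc++.so.6"
    " (0x00007f3b1c000000)\n"
    "\tlibmissing.so.1 => not found\n"
)
libs = resolved_libs(sample)
print(libs["libstdc++.so.6"])
```

In practice the input would come from running ldd on the library, for example via subprocess.run(["ldd", path], capture_output=True, text=True).stdout.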
@ss:someonex.netSomeoneSerge (back on matrix) Wow 16:40:20
@ss:someonex.netSomeoneSerge (back on matrix) This looks like a regression 16:40:27
@ss:someonex.netSomeoneSerge (back on matrix) Well the first obvious leak (the one we see in the wrapper) is https://github.com/NixOS/nixpkgs/blob/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L596-L606 16:53:54
@ss:someonex.netSomeoneSerge (back on matrix) Notably, cudaPackages.saxpy works fine at that commit 16:54:12
@ss:someonex.netSomeoneSerge (back on matrix) I'm running github:NixOS/nixpkgs/c66e984bda09e7230ea7b364e677c5ba4f0d36d0#opencv4.tests.no-libstdcxx-errors now (only defined for cudaSupport = true) 16:54:41
@ss:someonex.netSomeoneSerge (back on matrix) Going to take a while 16:54:45
@ss:someonex.netSomeoneSerge (back on matrix) But it might be the regression is somehow magically torch specific 16:54:59
@ss:someonex.netSomeoneSerge (back on matrix) No idea why https://github.com/NixOS/nixpkgs/blame/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L605-L606 uses cc_solib honestly 16:55:53


Room Version: 9