| 1 Aug 2024 |
SomeoneSerge (back on matrix) | I'd rather guess that gcc12's libstdc++ comes from elsewhere, like another dependency. Run with LD_DEBUG=libs | 14:52:22 |
SomeoneSerge (back on matrix) | In reply to @ss:someonex.net Yeah the way we worked around cc-wrapper was kind of ugly and relied on gcc respecting priorities/a particular order of flags (yes this is bad and we should fix that) | 14:52:47 |
yorickvp | run what with LD_DEBUG=libs? | 14:53:06 |
SomeoneSerge (back on matrix) | like
$ LD_DEBUG=libs python
import torch
| 14:53:26 |
SomeoneSerge (back on matrix) | * like
$ LD_DEBUG=libs python
import torch
# do whatever you do
| 14:53:34 |
SomeoneSerge (back on matrix) | * like
$ LD_DEBUG=libs python my-repro.py
| 14:53:51 |
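A minimal sketch of the LD_DEBUG=libs suggestion above, in case filtering the loader trace by hand gets tedious. It assumes glibc's loader, which writes the trace to stderr and prints a "trying file=<path>" line for each candidate it probes. The helper names `libstdcxx_candidates` and `trace` are made up for illustration.

```python
import os
import re
import subprocess
import sys

# Matches the loader's "trying file=<path>" lines for libstdc++ candidates.
TRYING = re.compile(r"trying file=(\S*libstdc\+\+\.so\S*)")


def libstdcxx_candidates(ld_debug_text):
    """Return every libstdc++ path the loader tried, in the order it tried them."""
    return TRYING.findall(ld_debug_text)


def trace(script):
    """Run `python script` with LD_DEBUG=libs and return the captured stderr."""
    env = dict(os.environ, LD_DEBUG="libs")
    proc = subprocess.run([sys.executable, script], env=env,
                          stderr=subprocess.PIPE, text=True)
    return proc.stderr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_libstdcxx.py my-repro.py
    for path in libstdcxx_candidates(trace(sys.argv[1])):
        print(path)
```

The loader stops probing once a candidate opens, so the last path printed for a given lookup is most likely the one that was loaded. The store path it points at shows which gcc the library came from.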
yorickvp | alright, I'll try that | 14:54:17 |
yorickvp | the failing line is LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${cudaPackages.cuda_cudart.stubs}/lib python -m pybind11_stubgen -o . bindings | 14:55:02 |
yorickvp | In reply to @ss:someonex.net
like
$ LD_DEBUG=libs python my-repro.py
okay, so my libraries have rpath $ORIGIN:/home/yorick/outputs/out/lib:/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib:/nix/store/8rzflwd9bxri4s0bpicm8bkmi2ikmv7n-nccl-2.21.5-1/lib:/nix/store/61q201jxc1g6pkbvhyyriwlm7zasa81k-openmpi-4.1.6/lib:/nix/store/g798k855fny946jnycp61vkzy27kwlyl-libcublas-12.1.3.1-lib/lib:/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib:/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/2v1jx43nsp9njldxh4bfljvh5wmnbzk3-python3.10-tensorrt-cu12-libs-10.2.0/lib/python3.10/site-packages/tensorrt_libs:/nix/store/ybqfab6p2p6ir9dcr6gn6rxn825wb86g-cudnn-8.9.7.29-lib/lib | 15:12:20 |
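To make the ordering problem in that rpath easier to see, here is a small sketch that splits an RPATH on ":" and lists the gcc runtime entries in search order. The regex assumes the usual nixpkgs naming `/nix/store/<hash>-gcc-<version>-lib/lib`. `gcc_lib_versions` is a hypothetical helper, and the rpath is an abbreviated copy of the one in the message above.

```python
import re

# Assumed naming convention for gcc runtime outputs in the Nix store:
# /nix/store/<hash>-gcc-<version>-lib/lib
GCC_LIB = re.compile(r"/nix/store/[^/]+-gcc-([0-9.]+)-lib/lib$")


def gcc_lib_versions(rpath):
    """Return the gcc versions of gcc-lib entries in an RPATH, in search order."""
    versions = []
    for entry in rpath.split(":"):
        match = GCC_LIB.search(entry)
        if match:
            versions.append(match.group(1))
    return versions


# Abbreviated from the rpath quoted above (only some entries are kept).
rpath = (
    "$ORIGIN:/home/yorick/outputs/out/lib:"
    "/nix/store/kzx58d5pbb78gnv9s4d62f4r46x9waw9-gcc-12.3.0-lib/lib:"
    "/nix/store/dbwp0scbb0rk78m636sb7cvycz8xzgyh-glibc-2.39-52/lib:"
    "/nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib"
)
print(gcc_lib_versions(rpath))  # ['12.3.0', '13.3.0']
```

The dynamic linker walks the entries left to right. With gcc-12.3.0-lib listed before gcc-13.3.0-lib, the older libstdc++ is found first.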
yorickvp | looks like cmake is writing it as a LINK_PATH | 15:22:52 |
SomeoneSerge (back on matrix) | So there's something else propagating an unwrapped (differently wrapped) gcc12 maybe | 15:23:40 |
yorickvp | how can I list all propagated inputs? | 15:24:08 |
SomeoneSerge (back on matrix) | all propagated inputs of | 15:24:30 |
yorickvp | I'm in a nix develop for the drv that produces the libraries with the wrong rpath | 15:25:24 |
SomeoneSerge (back on matrix) | H'mm, maybe you can echo "${pkgsBuildHost[@]}" for compilers/build tools | 15:26:51 |
SomeoneSerge (back on matrix) | But that won't tell you where it's coming from | 15:27:09 |
SomeoneSerge (back on matrix) | Just do a nix-tree --derivation, or nix path-info / nix why-depends | 15:27:31 |
yorickvp | seems like there's no unwrapped gcc | 15:48:59 |
yorickvp | libtorch_cuda.so also manages to link it | 15:49:53 |
yorickvp | https://gist.github.com/yorickvP/b263b9d6d058280a3f7d4c70eff2a758
/nix/store/mbg29pcjydgss24z0v6jczjda7q4z9x6-gcc-12.3.0.drv (the offending gcc lib) only occurs as a dependency of the gcc-wrapper that has the correct lib first | 15:54:09 |
yorickvp | I'll try to repro with torch on nixos-unstable | 15:57:23 |
yorickvp | yeah, ${python3.pkgs.torchWithCuda.lib}/lib/libtorch_cuda.so links to gcc-12.4.0-lib | 16:16:14 |
SomeoneSerge (back on matrix) | Wow | 16:40:20 |
SomeoneSerge (back on matrix) | This looks like a regression | 16:40:27 |
SomeoneSerge (back on matrix) | Well the first obvious leak (the one we see in the wrapper) is https://github.com/NixOS/nixpkgs/blob/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L596-L606 | 16:53:54 |
SomeoneSerge (back on matrix) | Notably, cudaPackages.saxpy works fine at that commit | 16:54:12 |
SomeoneSerge (back on matrix) | I'm running github:NixOS/nixpkgs/c66e984bda09e7230ea7b364e677c5ba4f0d36d0#opencv4.tests.no-libstdcxx-errors now (only defined for cudaSupport = true) | 16:54:41 |
SomeoneSerge (back on matrix) | Going to take a while | 16:54:45 |
SomeoneSerge (back on matrix) | But it might be that the regression is somehow magically torch-specific | 16:54:59 |
SomeoneSerge (back on matrix) | No idea why https://github.com/NixOS/nixpkgs/blame/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L605-L606 uses cc_solib honestly | 16:55:53 |