| 1 Aug 2024 |
yorickvp | seems like there's no unwrapped gcc | 15:48:59 |
yorickvp | libtorch_cuda.so also manages to link it | 15:49:53 |
yorickvp | https://gist.github.com/yorickvP/b263b9d6d058280a3f7d4c70eff2a758
/nix/store/mbg29pcjydgss24z0v6jczjda7q4z9x6-gcc-12.3.0.drv (the offending gcc lib) only occurs as a dependency of the gcc-wrapper that has the correct lib first | 15:54:09 |
yorickvp | I'll try to repro with torch on nixos-unstable | 15:57:23 |
yorickvp | yeah, ${python3.pkgs.torchWithCuda.lib}/lib/libtorch_cuda.so links to gcc-12.4.0-lib | 16:16:14 |
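A rough way to see which gcc lib outputs a library like libtorch_cuda.so references is to scan its bytes for embedded Nix store paths, in the spirit of `strings libtorch_cuda.so | grep gcc`. This is a minimal sketch, assuming the reference shows up as a full `/nix/store/<hash>-gcc-<version>-lib` path (e.g. in the RUNPATH); the helper name `gcc_lib_refs` is hypothetical and not part of any Nix tooling.

```python
import re
import sys
from pathlib import Path

# Nix store hashes are 32 characters from Nix's base-32 alphabet,
# which omits the letters e, o, t and u.
GCC_LIB_RE = re.compile(rb"/nix/store/[0-9a-df-np-sv-z]{32}-gcc-[0-9][0-9.]*-lib")


def gcc_lib_refs(data: bytes) -> list[str]:
    """Return distinct gcc-*-lib store paths found in `data`, in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order, acting as an ordered set
    for match in GCC_LIB_RE.finditer(data):
        seen.setdefault(match.group().decode("ascii"), None)
    return list(seen)


if __name__ == "__main__":
    # Usage: python gcc_lib_refs.py /path/to/libtorch_cuda.so ...
    for path in sys.argv[1:]:
        print(path, *gcc_lib_refs(Path(path).read_bytes()), sep="\n  ")
```

Seeing two different gcc-*-lib paths in the output for one library would be consistent with the mixed 12.4.0 / 13.3.0 situation described here, though only the dynamic loader's actual search order determines which libstdc++ gets loaded.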
SomeoneSerge (back on matrix) | Wow | 16:40:20 |
SomeoneSerge (back on matrix) | This looks like a regression | 16:40:27 |
SomeoneSerge (back on matrix) | Well the first obvious leak (the one we see in the wrapper) is https://github.com/NixOS/nixpkgs/blob/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L596-L606 | 16:53:54 |
SomeoneSerge (back on matrix) | Notably, cudaPackages.saxpy works fine at that commit | 16:54:12 |
SomeoneSerge (back on matrix) | I'm running github:NixOS/nixpkgs/c66e984bda09e7230ea7b364e677c5ba4f0d36d0#opencv4.tests.no-libstdcxx-errors now (only defined for cudaSupport = true) | 16:54:41 |
SomeoneSerge (back on matrix) | Going to take a while | 16:54:45 |
SomeoneSerge (back on matrix) | But it might be that the regression is somehow magically torch-specific | 16:54:59 |
SomeoneSerge (back on matrix) | No idea why https://github.com/NixOS/nixpkgs/blame/fc27807b85986bb26a8f28e590e01fae742e6b53/pkgs/build-support/cc-wrapper/default.nix#L605-L606 uses cc_solib honestly | 16:55:53 |
yorickvp | you know, I blame cmake :) | 17:01:03 |
yorickvp | looking at 36 megabytes of cmake logs, it obviously parses it out of some gcc output (together with the correct one, which it puts first in the path). I'm not sure what it does with it after | 17:02:50 |
SomeoneSerge (back on matrix) | Waiting for opencv, but so far I'm leaning towards "maybe pytorch devs replaced some of the cmake logic with an unnecessary gcc -print-search-dirs" | 17:06:46 |
yorickvp | I'm looking at https://github.com/Kitware/CMake/blob/master/Modules/CMakeParseImplicitLinkInfo.cmake | 17:08:01 |
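CMakeParseImplicitLinkInfo.cmake works on the compiler's verbose link output, and one thing it picks up there is the `-L` search directories. The following is a heavily simplified, hypothetical sketch of that single step (not CMake's implementation, which handles many more flags and toolchains): collect `-L` directories in first-seen order, which is how a "correct" gcc lib dir listed first can coexist with a stale one further down.

```python
import shlex


def implicit_link_dirs(link_line: str) -> list[str]:
    """Collect -L directories from a verbose link line, deduplicated, keeping first-seen order."""
    dirs: list[str] = []
    args = iter(shlex.split(link_line))
    for arg in args:
        if arg == "-L":
            # Separate form: "-L <dir>"
            d = next(args, None)
        elif arg.startswith("-L") and len(arg) > 2:
            # Joined form: "-L<dir>"
            d = arg[2:]
        else:
            continue
        if d is not None and d not in dirs:
            dirs.append(d)
    return dirs
```

Both directories survive deduplication, so anything downstream that turns this list into rpath entries or `-L` flags would carry the stale gcc lib along, just later in the order.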
SomeoneSerge (back on matrix) | saxpy and opencv are built using cmake too | 17:08:34 |
SomeoneSerge (back on matrix) | At least one of them has been shown to still work (whatever the cost) | 17:08:56 |
SomeoneSerge (back on matrix) | gy skimage.transform skimage.util skimage.segmentation
python3-3.11.9-env> building '/nix/store/4rqjcjk4h2mnfwsbvcgf3igjnmpxhxwf-python3-3.11.9-env.drv'
python3-3.11.9-env> created 521 symlinks in user environment
opencv-4.9.0-libstdcxx-test> building '/nix/store/2gh11xabzlxbfgvydhcln0qbfiharw32-opencv-4.9.0-libstdcxx-test.drv'
┏━ Dependency Graph:
┃ ┌─ ✔ opencv-4.9.0 ⏱ 17m40s
┃ ┌─ ✔ python3.11-pillow-heif-0.16.0 ⏱ 2m0s
┃ ┌─ ✔ python3.11-imageio-2.34.2 ⏱ 11s
┃ ┌─ ✔ python3.11-scikit-image-0.22.0 ⏱ 1m37s
┃ ┌─ ✔ python3-3.11.9-env ⏱ 1s
┃ ✔ opencv-4.9.0-libstdcxx-test
┣━━━ Builds
┗━ ∑ ⏵ 0 │ ✔ 6 │ ⏸ 0 │ Finished at 17:11:37 after 21m35s
| 17:12:13 |
SomeoneSerge (back on matrix) | So, ugh, at least opencv4's Python extension must be linking the right libstdc++ | 17:13:11 |
SomeoneSerge (back on matrix) | Hmm the last merged torch update was almost two months ago https://github.com/NixOS/nixpkgs/pull/317576 | 17:14:45 |
SomeoneSerge (back on matrix) | yorickvp, would you volunteer to run the bisection? 🫠 | 17:15:40 |
yorickvp | sure, do you have a known working commit? | 17:15:47 |
SomeoneSerge (back on matrix) | Well, I've got a workstation sitting at
Revision: b2852eb9365c6de48ffb0dc2c9562591f652242a
Last modified: 2024-06-27 16:44:53
Let me check if torch actually works there | 17:16:31 |
SomeoneSerge (back on matrix) | ❯ nix-shell -p 'python3.withPackages (ps: [ ps.torch ])'
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
trace: warning: cudaPackages.autoFixElfFiles is deprecated, use pkgs.autoFixElfFiles instead
trace: warning: cudaPackages.autoAddOpenGLRunpathHook is deprecated, use pkgs.autoAddDriverRunpathHook instead
this derivation will be built:
/nix/store/qmmz2hxinp65zsprb3g92my7wqvbwncm-python3-3.11.9-env.drv
building '/nix/store/qmmz2hxinp65zsprb3g92my7wqvbwncm-python3-3.11.9-env.drv'...
created 516 symlinks in user environment
ss in 🌐 cs-338 in triton on openai-triton [$] via ❄️ impure (shell)
❯ python
Python 3.11.9 (main, Apr 2 2024, 08:25:04) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
| 17:17:20 |
yorickvp | it probably works but still secretly links gcc-12.4.0, which isn't always fatal | 17:17:19 |
SomeoneSerge (back on matrix) | No, it shouldn't | 17:17:33 |