!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

289 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



Sender Message Time
1 Aug 2024
@yorickvp:matrix.org yorickvp: you know, I blame cmake :) 17:00:59
@yorickvp:matrix.org yorickvp: looking at 36 megabytes of cmake logs, it obviously parses it out of some gcc output (together with the correct one, which it puts first in the path). I'm not sure what it does with it after 17:02:50
@ss:someonex.net SomeoneSerge (back on matrix): Waiting for opencv, but so far I'm leaning towards "maybe pytorch devs replaced some of the cmake logic with an unnecessary gcc -print-search-dirs" 17:06:46
@yorickvp:matrix.org yorickvp: I'm looking at https://github.com/Kitware/CMake/blob/master/Modules/CMakeParseImplicitLinkInfo.cmake 17:08:01
@ss:someonex.net SomeoneSerge (back on matrix): saxpy and opencv are built using cmake too 17:08:34
@ss:someonex.net SomeoneSerge (back on matrix): At least one of them has been shown to still work (whatever the cost) 17:08:56
@ss:someonex.net SomeoneSerge (back on matrix)
gy skimage.transform skimage.util skimage.segmentation
python3-3.11.9-env> building '/nix/store/4rqjcjk4h2mnfwsbvcgf3igjnmpxhxwf-python3-3.11.9-env.drv'
python3-3.11.9-env> created 521 symlinks in user environment
opencv-4.9.0-libstdcxx-test> building '/nix/store/2gh11xabzlxbfgvydhcln0qbfiharw32-opencv-4.9.0-libstdcxx-test.drv'
┏━ Dependency Graph:
┃             ┌─ ✔ opencv-4.9.0 ⏱ 17m40s
┃          ┌─ ✔ python3.11-pillow-heif-0.16.0 ⏱ 2m0s
┃       ┌─ ✔ python3.11-imageio-2.34.2 ⏱ 11s
┃    ┌─ ✔ python3.11-scikit-image-0.22.0 ⏱ 1m37s
┃ ┌─ ✔ python3-3.11.9-env ⏱ 1s
┃ ✔ opencv-4.9.0-libstdcxx-test 
┣━━━ Builds         
┗━ ∑ ⏵ 0 │ ✔ 6 │ ⏸ 0 │ Finished at 17:11:37 after 21m35s
17:12:13
@ss:someonex.net SomeoneSerge (back on matrix): So ugh at least opencv4's python extension must be linking the right libstdc++ 17:13:11
@ss:someonex.net SomeoneSerge (back on matrix): Hmm the last merged torch update was almost two months ago https://github.com/NixOS/nixpkgs/pull/317576 17:14:41
@ss:someonex.net SomeoneSerge (back on matrix): yorickvp would you volunteer to run the bisection? 🫠 17:15:40
@yorickvp:matrix.org yorickvp: sure, do you have a known working commit? 17:15:47
@ss:someonex.net SomeoneSerge (back on matrix)

Well, I've got a workstation sitting at:

Revision:      b2852eb9365c6de48ffb0dc2c9562591f652242a
Last modified: 2024-06-27 16:44:53

Let me check if torch actually works there

17:16:31
@ss:someonex.net SomeoneSerge (back on matrix)
❯ nix-shell -p 'python3.withPackages (ps: [ ps.torch ])'
trace: warning: cudaPackages.autoAddDriverRunpath is deprecated, use pkgs.autoAddDriverRunpath instead
trace: warning: cudaPackages.autoFixElfFiles is deprecated, use pkgs.autoFixElfFiles instead
trace: warning: cudaPackages.autoAddOpenGLRunpathHook is deprecated, use pkgs.autoAddDriverRunpathHook instead
this derivation will be built:
  /nix/store/qmmz2hxinp65zsprb3g92my7wqvbwncm-python3-3.11.9-env.drv
building '/nix/store/qmmz2hxinp65zsprb3g92my7wqvbwncm-python3-3.11.9-env.drv'...
created 516 symlinks in user environment
[WARN] - (starship::utils): Executing command "/home/ss/.nix-profile/bin/git" timed out.
[WARN] - (starship::utils): You can set command_timeout in your config to a higher value to allow longer-running commands to keep executing.
ss in 🌐 cs-338 in triton on  openai-triton [$] via ❄️  impure (shell) 
❯ python
Python 3.11.9 (main, Apr  2 2024, 08:25:04) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
17:17:20
@yorickvp:matrix.org yorickvp: it probably works but still secretly links gcc-12.4.0, which isn't always fatal 17:17:19
@ss:someonex.net SomeoneSerge (back on matrix): No, it shouldn't 17:17:33
@ss:someonex.net SomeoneSerge (back on matrix): It definitely was the case that gcc-12 reference was not retained in the output path 17:17:56
@yorickvp:matrix.org yorickvp: patchelf --print-rpath $(nix-build -A python3.pkgs.torch.lib)/lib/libtorch_cuda.so ? 17:18:22
@ss:someonex.net SomeoneSerge (back on matrix): Yup, it's 12.3 in this commit and it happens to work, but yet again, this is a regression 17:19:46
@ss:someonex.net SomeoneSerge (back on matrix): The reference used not to be retained in the outputs 17:20:02
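The RPATH check suggested above could be wrapped in a small helper, sketched here under assumptions: the function name `rpath_gcc_entries` is hypothetical, and the `-gcc-<version>` pattern is only a guess at what a leaked toolchain store path looks like.

```shell
# Print every RPATH entry that looks like a gcc store path.
# $1: a colon-separated RPATH string, e.g. the output of
#     `patchelf --print-rpath some-library.so`.
# The "-gcc-<digit>" pattern is an assumption about store path naming.
rpath_gcc_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -E -- '-gcc-[0-9]' || true
}
```

It would be called as `rpath_gcc_entries "$(patchelf --print-rpath path/to/libtorch_cuda.so)"`; empty output means no gcc entry was found in the RPATH.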
@ss:someonex.net SomeoneSerge (back on matrix): Trying 24.05 17:21:01
@ss:someonex.net SomeoneSerge (back on matrix): I think it's time to add an exportReferencesGraph test to e.g. torch, or better yet to a few core packages 17:22:47
@ss:someonex.net SomeoneSerge (back on matrix): As a very unambiguous way to ensure that this stuff isn't referenced 17:23:10
@ss:someonex.net SomeoneSerge (back on matrix): Oh wait. Actually, now it is going to be in the closure if we include triton 17:23:31
@ss:someonex.net SomeoneSerge (back on matrix): I think we keep a reference to the toolchain in triton 17:24:03
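Short of a real exportReferencesGraph test, the same question can be approximated from the command line by listing a build's runtime closure and searching it. This is a swapped-in technique, not exportReferencesGraph itself; the helper name `closure_forbids` and the example pattern are hypothetical, and as noted above the check only makes sense for a build without triton.

```shell
# Read store paths on stdin (one per line) and fail if any of them
# matches the extended regex given as $1. Matching paths are printed.
# Returns 1 when a forbidden path is found, 0 otherwise.
closure_forbids() {
  if grep -E -- "$1"; then
    return 1
  fi
  return 0
}
```

A possible invocation, assuming `./result` is the symlink left by `nix-build`: `nix-store --query --requisites ./result | closure_forbids '-gcc-12\.'`.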
@ss:someonex.net SomeoneSerge (back on matrix): As a rough estimate, I think 23.11 is a good commit xD 17:29:22
@ss:someonex.net SomeoneSerge (back on matrix): Sorry got to leave now for a while 17:29:35
@yorickvp:matrix.org yorickvp

bisecting the following:

let
  pkgs = import ./. {
    config = {
      allowUnfree = true;
      cudaCapabilities = [ "8.6" ];
      cudaSupport = true;
    };
  };
in
{
  torchtest = (pkgs.python3.pkgs.torch.override { openai-triton = null; }).overridePythonAttrs (o: {
    disallowedReferences = [ pkgs.python3.pkgs.torch.cudaPackages.cuda_nvcc.stdenv.cc.cc.lib ];
    USE_CUDNN = 0;
    USE_KINETO = 0;
    USE_QNNPACK = 0;
    USE_PYTORCH_QNNPACK = 0;
    USE_XNNPACK = 0;
    INTERN_DISABLE_ONNX = 1;
    ONNX_ML = 0;
    USE_ITT = 0;
    USE_FLASH_ATTENTION = 0;
    USE_MEM_EFF_ATTENTION = 0;
    USE_FBGEMM = 0;
    USE_MKLDNN = 0;
  });
}
17:37:11
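One way to automate a bisection like this is `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The sketch below only shows the verdict logic; `bisect_verdict`, the file name `test.nix`, and the choice to inspect the RPATH rather than rely on the build failing are all assumptions (it presumes the `disallowedReferences` line is left out, so that a failed build means "untestable" rather than "bad").

```shell
# Map one bisection step to the exit codes `git bisect run` expects.
# $1: exit status of the build
# $2: number of gcc entries found in the library's RPATH
bisect_verdict() {
  if [ "$1" -ne 0 ]; then
    return 125   # build failed for some reason: skip this commit
  fi
  if [ "$2" -gt 0 ]; then
    return 1     # bad: a gcc path leaked into the RPATH
  fi
  return 0       # good: no gcc reference found
}

# Hypothetical use inside the script handed to `git bisect run`,
# with the expression above saved as test.nix in the nixpkgs checkout:
#   out="$(nix-build test.nix -A torchtest.lib --no-out-link)"; status=$?
#   n="$(patchelf --print-rpath "$out/lib/libtorch_cuda.so" | tr ':' '\n' | grep -c -- '-gcc-[0-9]')"
#   bisect_verdict "$status" "$n"; exit $?
```

Treating build failures as skips keeps unrelated breakage in old commits from being blamed for the regression, at the cost of a longer bisection.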
@yorickvp:matrix.org yorickvp
disallowedReferences seems not to work, though
17:52:16



Room Version: 9