!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

324 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
64 Servers

17 Jun 2024
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @ss:someonex.net
Is this nvidia-smi from your hard coded linuxPackages?
Or is the one mounted from the host and expecting that there would be a /lib/ld-linux*.so?
12:45:51
@grw00:matrix.orggrw00
yes it is
12:46:43
@grw00:matrix.orggrw00
bash-5.2# df
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         10485760    64056  10421704   1% /
tmpfs              65536        0     65536   0% /dev
tmpfs          132014436        0 132014436   0% /sys/fs/cgroup
shm             15728640        0  15728640   0% /dev/shm
/dev/nvme0n1p2  65478188 24385240  37721124  40% /sbin/docker-init
/dev/nvme0n1p4  52428800        0  52428800   0% /cache
udev           131923756        0 131923756   0% /dev/null
udev           131923756        0 131923756   0% /dev/tty
tmpfs          132014436       12 132014424   1% /proc/driver/nvidia
tmpfs          132014436        4 132014432   1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs           26402888    18148  26384740   1% /run/nvidia-persistenced/socket
tmpfs          132014436        0 132014436   0% /proc/asound
tmpfs          132014436        0 132014436   0% /proc/acpi
tmpfs          132014436        0 132014436   0% /proc/scsi
tmpfs          132014436        0 132014436   0% /sys/firmware
12:46:52
@grw00:matrix.orggrw00

ok, checked their ubuntu-based image and mounts look like this:

root@cc04a766e493:~# df
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         20971520    64224  20907296   1% /
tmpfs              65536        0     65536   0% /dev
tmpfs          132014448        0 132014448   0% /sys/fs/cgroup
shm             15728640        0  15728640   0% /dev/shm
/dev/nvme0n1p2  65478188 18995924  43110440  31% /usr/bin/nvidia-smi
/dev/nvme0n1p4  20971520        0  20971520   0% /workspace
tmpfs          132014448       12 132014436   1% /proc/driver/nvidia
tmpfs          132014448        4 132014444   1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs           26402892     8832  26394060   1% /run/nvidia-persistenced/socket
tmpfs          132014448        0 132014448   0% /proc/asound
tmpfs          132014448        0 132014448   0% /proc/acpi
tmpfs          132014448        0 132014448   0% /proc/scsi
tmpfs          132014448        0 132014448   0% /sys/firmware
12:50:05
@grw00:matrix.orggrw00
interesting, they have an nvidia-smi mount when my nix container does not 🤔
12:50:32
@9hp71n:matrix.orgghpzin joined the room.
13:05:27
@gsaurel:laas.frnim65s joined the room.
13:36:09
@ss:someonex.netSomeoneSerge (matrix works sometimes)
H'm. Maybe they really don't mount the userspace driver o_0. I suppose images derived from NVC do contain a compat driver, but it's kind of weird of them to expect that
14:14:23
@ss:someonex.netSomeoneSerge (matrix works sometimes)
You could still use NixGL then
14:14:39
@ss:someonex.netSomeoneSerge (matrix works sometimes)
NixGL will look at /proc (I think) and choose the correct linuxPackages
14:15:22
@ss:someonex.netSomeoneSerge (matrix works sometimes)
I'd suggest getting an MWE based on that and also reaching out to runpod's support to ask why they won't mount a driver compatible with the host's kernel
14:16:36
@ss:someonex.netSomeoneSerge (matrix works sometimes)
(this conversation has happened here before: neither putting drivers into an image nor mounting the host's drivers is "correct": the driver in the image might not be compatible with the kernel running on the host, and the driver from the host might not be compatible with, e.g., the libc in the image, et cetera)
14:18:14
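The /proc-based driver detection mentioned above could look roughly like the following. This is only a sketch and not NixGL's actual code: `parseNvidiaVersion` is a hypothetical helper, and the regular expression assumes the file contains a line shaped like `... Kernel Module  <version>  <build date>`, which may not hold for every driver flavour.

```nix
# Sketch only, under the assumptions stated above.
let
  # Hypothetical helper: returns the driver version string found in the
  # file contents, or null if the assumed pattern does not match.
  parseNvidiaVersion = contents:
    let
      m = builtins.match ".*Module  ([0-9.]+)  .*" contents;
    in
    if m == null then null else builtins.head m;
in
# Reading /proc during evaluation is impure, so this needs something like
# `nix eval --impure` and a host where the NVIDIA kernel module is loaded.
parseNvidiaVersion (builtins.readFile /proc/driver/nvidia/version)
```

The resulting version string could then be used to pick a matching userspace driver, which is the part that has to agree with the host's kernel module.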
@grw00:matrix.orggrw00

great thanks, will try this and get back to you. it's

            Cmd = [ "${inputs.nix-gl-host.defaultPackage.x86_64-linux}/bin/nixglhost" "${my-bin}/bin/executor" ];

?

14:37:50
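For context, a `Cmd` entry like the one above might sit in an image definition along these lines. This is a minimal sketch that assumes the image is built with nixpkgs' `dockerTools.buildLayeredImage`; the arguments `nixglhost` and `my-bin` stand in for `inputs.nix-gl-host.defaultPackage.x86_64-linux` and the user's own package, and the image name is hypothetical.

```nix
# Sketch only: argument names and the image name are assumptions.
{ pkgs, nixglhost, my-bin }:

pkgs.dockerTools.buildLayeredImage {
  name = "executor-with-nixglhost"; # hypothetical image name
  tag = "latest";
  config = {
    # The wrapper runs first and then launches the actual program.
    Cmd = [
      "${nixglhost}/bin/nixglhost"
      "${my-bin}/bin/executor"
    ];
  };
}
```

Because both store paths are interpolated into `Cmd`, they end up in the image's closure without being listed separately.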
@grw00:matrix.orggrw00
i guess i need to build some matrix of images with compat versions and choose which one based on cuda/kernel version in instance metadata
14:39:17
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @grw00:matrix.org
i guess i need to build some matrix of images with compat versions and choose which one based on cuda/kernel version in instance metadata
You can build a single image with NixGL
14:39:55
@ss:someonex.netSomeoneSerge (matrix works sometimes)
Note: NixGL and nixglhost are different tools 🙃
14:40:03
@ss:someonex.netSomeoneSerge (matrix works sometimes) *
You can build a single image with NixGL (and multiple drivers)
14:40:37
@grw00:matrix.orggrw00
ah 😓
14:41:32
19 Jun 2024
@hexa:lossy.networkhexa
python312 default migration has started
14:04:34
@ss:someonex.netSomeoneSerge (matrix works sometimes)
stupid new faiss not building with cuda =\
14:19:56
21 Jun 2024
@search-sense:matrix.orgsearch-sense

Hello, NixOS community, I want to install python311Packages.tensorrt

TensorRT> command, and try building this derivation again.
TensorRT> $ nix-store --add-fixed sha256 TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
TensorRT> ***
error: builder for '/nix/store/140c5c8lpa30r3jrxxbw74631831prrw-TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz.drv' failed with exit code 1;

but the cuda is 12.2 on my system, is it compatible?

> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
04:53:46
@search-sense:matrix.orgsearch-sense

Is anyone interested in adding the latest tensorrt-10.1.0 to NixOS?

searching for dependencies of /nix/store/gknr686xg6ggafkdfy5323bc7f1m5yf7-tensorrt-10.1.0.27-lib/lib/stubs/libnvinfer_vc_plugin.so
    libstdc++.so.6 -> found: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib
    libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
auto-patchelf: 1 dependencies could not be satisfied
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
auto-patchelf failed to find all the required dependencies.
Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
error: builder for '/nix/store/7rqkwg91vnk5d3p4vaym0z0pskkmj4r8-tensorrt-10.1.0.27.drv' failed with exit code 1;
       last 10 log lines:
       >     libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
       > setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
       > searching for dependencies of /nix/store/gknr686xg6ggafkdfy5323bc7f1m5yf7-tensorrt-10.1.0.27-lib/lib/stubs/libnvinfer_vc_plugin.so
       >     libstdc++.so.6 -> found: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib
       >     libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
       > setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
       > auto-patchelf: 1 dependencies could not be satisfied
       > error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
       > auto-patchelf failed to find all the required dependencies.
       > Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
       For full logs, run 'nix log /nix/store/7rqkwg91vnk5d3p4vaym0z0pskkmj4r8-tensorrt-10.1.0.27.drv'.
07:59:22
@search-sense:matrix.orgsearch-sense

export NIXPKGS_ALLOW_UNFREE=1 && nix-build -A cudaPackages.tensorrt

       > setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
       > auto-patchelf: 1 dependencies could not be satisfied
       > error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
       > auto-patchelf failed to find all the required dependencies.
       > Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.

11:03:14
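The auto-patchelf message above says that nothing among the build's dependencies provides `libcudart.so.12`. One possible way to supply it is sketched below, on the untested assumption that adding the CUDA runtime's library output to `buildInputs` is enough for `autoPatchelfHook` to find it; the real fix in Nixpkgs may look different.

```nix
# Sketch only: untested against the failing tensorrt-10.1.0.27 build.
# `pkgs` must be a nixpkgs instance with unfree packages allowed.
{ pkgs }:

pkgs.cudaPackages.tensorrt.overrideAttrs (old: {
  # autoPatchelfHook searches the build inputs for the shared libraries
  # that the patched binaries request.
  buildInputs = (old.buildInputs or [ ]) ++ [
    (pkgs.lib.getLib pkgs.cudaPackages.cuda_cudart)
  ];
})
```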
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @search-sense:matrix.org

Hello, NixOS community, I want to install python311Packages.tensorrt

You can use cudaPackages.overrideScope to plug in the trt release compatible with your cuda, but I also think trt was originally introduced in Nixpkgs with logic to select the compatible release in each cuda package set automatically. Evidently, that must have broken
15:51:43
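The `cudaPackages.overrideScope` suggestion could be written as an overlay roughly like this. It is a sketch, not a tested fix: the file name `trt-overlay.nix`, the parameter `trtAttr`, and any concrete attribute name passed in (for example something of the form `tensorrt_8_6`) are assumptions that need to be checked against the attributes actually present in the package set.

```nix
# trt-overlay.nix (hypothetical file name)
# `trtAttr` is the name of a versioned TensorRT attribute in cudaPackages.
trtAttr: final: prev: {
  cudaPackages = prev.cudaPackages.overrideScope (cudaFinal: cudaPrev: {
    # Make the chosen release the default `tensorrt` inside this scope.
    tensorrt = cudaPrev.${trtAttr};
  });
}
```

It would be applied with something like `import nixpkgs { overlays = [ (import ./trt-overlay.nix "tensorrt_8_6") ]; config.allowUnfree = true; }`, where the attribute name is again only an example.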
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @ss:someonex.net
Nvidia prevents unattended downloads, of course it broke
...primarily because of ^^^ and because no one seems to be actively using Nixpkgs' in-tree trt expression?
15:52:47
@lcw:matrix.orgLucas joined the room.
17:13:01
@lcw:matrix.orgLucas *

Does anyone have nsight_systems working?

I am using CUDA to develop programs on NixOS 24.05 and it is working great. Now I want to profile my code.
Using the following flake

{
  description = "nsight_systems";

  inputs = {
    # nixpkgs.url = "github:NixOS/nixpkgs/release-24.05";
    # nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    nixpkgs.url = "github:ConnorBaker/nixpkgs/feat/cudaPackages-fixed-output-derivations";
  };
  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs { system = system; config.allowUnfree = true; };
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        nativeBuildInputs = [
          pkgs.cudaPackages.nsight_systems
          pkgs.cudaPackages.nsight_compute
        ];
      };
    };
}

I was able to get ncu working.

19:47:39
@lcw:matrix.orgLucas

But when I try to run nsys-ui I get a dialogue box with the error message

Failed to load plugin: QuadDPlugin

Cannot load library /nix/store/hzp2wmqbqihx4slp353ixs405ry6li4f-cuda12.5-nsight_systems-2024.2.3.38-bin/nsight-systems/2024.2.3/host-linux-x64/Plugins/QuadDPlugin/libQuadDPlugin.so: /nix/store/hzp2wmqbqihx4slp353ixs405ry6li4f-cuda12.5-nsight_systems-2024.2.3.38-bin/nsight-systems/2024.2.3/host-linux-x64/Plugins/QuadDPlugin/libQuadDPlugin.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev, version Qt_6

Some functionality will be disabled
19:50:21

Room Version: 9