NixOS CUDA - Public Room Timeline

	NixOS CUDA	324 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	64 Servers


Sender	Message	Time
17 Jun 2024
SomeoneSerge (matrix works sometimes)	In reply to @ss:someonex.net Is this nvidia-smi from your hard-coded linuxPackages? Or is it the one mounted from the host, expecting that there would be a /lib/ld-linux*.so?	12:45:51
grw00	yes it is	12:46:43
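To test the `/lib/ld-linux*.so` hypothesis above, one can read which dynamic loader a binary requests. That path is stored in the ELF `PT_INTERP` program header, which is the same information `readelf -l` or `patchelf --print-interpreter` report. Binaries built for a conventional distro usually ask for something like `/lib64/ld-linux-x86-64.so.2`, and Nix-built images typically do not provide that path.

Below is a minimal Python sketch under these assumptions:
- The binary is a 64-bit little-endian ELF, such as an x86_64 `nvidia-smi`.
- The function name is hypothetical.
- The default `/usr/bin/nvidia-smi` path is only illustrative.

```python
from __future__ import annotations

import os
import struct

PT_INTERP = 3  # program header type holding the requested dynamic loader path


def elf_interpreter(path: str) -> str | None:
    """Return the loader path a 64-bit little-endian ELF asks for, or None if it has none."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr begins with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize
        )
        if p_type == PT_INTERP:
            raw = data[p_offset : p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None  # no PT_INTERP: statically linked


if __name__ == "__main__":
    target = "/usr/bin/nvidia-smi"  # illustrative path
    if not os.path.exists(target):
        print(f"{target} does not exist in this container")
    else:
        interp = elf_interpreter(target)
        if interp is None:
            print(f"{target}: no PT_INTERP (statically linked?)")
        else:
            state = "present" if os.path.exists(interp) else "MISSING here"
            print(f"{target} wants {interp} ({state})")
```

If the loader is reported as missing, the host-mounted binary cannot start inside the image, regardless of which driver libraries are present.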
grw00	```
bash-5.2# df
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         10485760    64056  10421704   1% /
tmpfs              65536        0     65536   0% /dev
tmpfs          132014436        0 132014436   0% /sys/fs/cgroup
shm             15728640        0  15728640   0% /dev/shm
/dev/nvme0n1p2  65478188 24385240  37721124  40% /sbin/docker-init
/dev/nvme0n1p4  52428800        0  52428800   0% /cache
udev           131923756        0 131923756   0% /dev/null
udev           131923756        0 131923756   0% /dev/tty
tmpfs          132014436       12 132014424   1% /proc/driver/nvidia
tmpfs          132014436        4 132014432   1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs           26402888    18148  26384740   1% /run/nvidia-persistenced/socket
tmpfs          132014436        0 132014436   0% /proc/asound
tmpfs          132014436        0 132014436   0% /proc/acpi
tmpfs          132014436        0 132014436   0% /proc/scsi
tmpfs          132014436        0 132014436   0% /sys/firmware
```	12:46:52
grw00	ok, checked their Ubuntu-based image, and the mounts look like this:
```
root@cc04a766e493:~# df
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         20971520    64224  20907296   1% /
tmpfs              65536        0     65536   0% /dev
tmpfs          132014448        0 132014448   0% /sys/fs/cgroup
shm             15728640        0  15728640   0% /dev/shm
/dev/nvme0n1p2  65478188 18995924  43110440  31% /usr/bin/nvidia-smi
/dev/nvme0n1p4  20971520        0  20971520   0% /workspace
tmpfs          132014448       12 132014436   1% /proc/driver/nvidia
tmpfs          132014448        4 132014444   1% /etc/nvidia/nvidia-application-profiles-rc.d
tmpfs           26402892     8832  26394060   1% /run/nvidia-persistenced/socket
tmpfs          132014448        0 132014448   0% /proc/asound
tmpfs          132014448        0 132014448   0% /proc/acpi
tmpfs          132014448        0 132014448   0% /proc/scsi
tmpfs          132014448        0 132014448   0% /sys/firmware
```	12:50:05
grw00	interesting, they have an nvidia-smi mount when my nix container does not 🤔	12:50:32
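A small Python sketch for comparing the two `df` listings above mechanically. The `df` text is an excerpt copied from the log, and the helper names and the NVIDIA/CUDA path heuristic are hypothetical.

One hedged caveat: GNU `df` may hide further bind mounts that come from the same device (for example, per-file driver mounts) unless it is run with `-a`. Reading `/proc/self/mounts` inside the container is therefore probably a more complete way to see what was actually mounted.

```python
from __future__ import annotations

import os


def parse_df(text: str) -> dict[str, str]:
    """Map each mount point in `df` output to its Filesystem column (header skipped)."""
    mounts: dict[str, str] = {}
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        # Columns: Filesystem, 1K-blocks, Used, Available, Use%, Mounted on
        if len(fields) >= 6:
            mounts[" ".join(fields[5:])] = fields[0]
    return mounts


def parse_proc_mounts(text: str) -> list[str]:
    """Return the mount points (second field) from /proc/self/mounts-style text."""
    return [line.split()[1] for line in text.splitlines() if line.strip()]


def looks_nvidia_related(mount_point: str) -> bool:
    """Hypothetical heuristic: flag paths that mention NVIDIA or CUDA."""
    lowered = mount_point.lower()
    return "nvidia" in lowered or "cuda" in lowered


# Excerpts of the two `df` listings from the log above.
NIX_DF = """\
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         10485760    64056  10421704   1% /
/dev/nvme0n1p2  65478188 24385240  37721124  40% /sbin/docker-init
/dev/nvme0n1p4  52428800        0  52428800   0% /cache
tmpfs          132014436       12 132014424   1% /proc/driver/nvidia
"""

UBUNTU_DF = """\
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         20971520    64224  20907296   1% /
/dev/nvme0n1p2  65478188 18995924  43110440  31% /usr/bin/nvidia-smi
/dev/nvme0n1p4  20971520        0  20971520   0% /workspace
tmpfs          132014448       12 132014436   1% /proc/driver/nvidia
"""

nix_mounts = parse_df(NIX_DF)
ubuntu_mounts = parse_df(UBUNTU_DF)
only_in_ubuntu = sorted(set(ubuntu_mounts) - set(nix_mounts))
only_in_nix = sorted(set(nix_mounts) - set(ubuntu_mounts))

if __name__ == "__main__":
    print("only in the Ubuntu image:", only_in_ubuntu)
    print("only in the Nix image:   ", only_in_nix)
    # Inside a running container, list every NVIDIA-looking mount, not only what df shows.
    if os.path.exists("/proc/self/mounts"):
        with open("/proc/self/mounts") as f:
            print([m for m in parse_proc_mounts(f.read()) if looks_nvidia_related(m)])
```

Both listings show exactly one `/dev/nvme0n1p2` entry: `/sbin/docker-init` in one image and `/usr/bin/nvidia-smi` in the other. That pattern is consistent with the deduplication caveat above, though it does not prove it.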
	ghpzin joined the room.	13:05:27
	nim65s joined the room.	13:36:09
SomeoneSerge (matrix works sometimes)	H'm. Maybe they really don't mount the userspace driver o_0. I suppose images derived from NVC do contain a compat driver, but it's kind of weird of them to expect that	14:14:23
SomeoneSerge (matrix works sometimes)	You still could use NixGL then	14:14:39
SomeoneSerge (matrix works sometimes)	NixGL will look at /proc (I think) and choose the correct linuxPackages	14:15:22
SomeoneSerge (matrix works sometimes)	I'd suggest getting an MWE based on that, and also reaching out to runpod's support to ask why they won't mount a driver compatible with the host's kernel	14:16:36
SomeoneSerge (matrix works sometimes)	(this conversation has happened here before: neither putting drivers into an image nor mounting the host's drivers is "correct": the driver in the image might not be compatible with the kernel running on the host, and the driver from the host might not be compatible e.g. with the libc in the image, et cetera)	14:18:14
grw00	great, thanks, will try this and get back to you. so it's `Cmd = [ "${inputs.nix-gl-host.defaultPackage.x86_64-linux}/bin/nixglhost" "${my-bin}/bin/executor" ];`?	14:37:50
grw00	i guess i need to build some matrix of images with compat versions and choose which one based on cuda/kernel version in instance metadata	14:39:17
SomeoneSerge (matrix works sometimes)	In reply to @grw00:matrix.org i guess i need to build some matrix of images with compat versions and choose which one based on cuda/kernel version in instance metadata You can build a single image with `NixGL` (and multiple drivers)	14:39:55
SomeoneSerge (matrix works sometimes)	Note: NixGL and nixglhost are different tools 🙃	14:40:03
grw00	ah 😓	14:41:32
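The suggestion above, one image carrying several drivers and choosing among them at runtime, can be sketched as a small launcher. It reads the host's NVIDIA kernel-module version and picks the matching userspace driver bundled in the image.

This is a hypothetical Python sketch, not NixGL's actual implementation, and it rests on these assumptions:
- The first line of `/proc/driver/nvidia/version` contains the version right after "Kernel Module", as it typically does.
- The `/opt/drivers/...` directories and the `BUNDLED` table are made up.

As far as I know, the userspace libraries (`libcuda.so`, `libnvidia-ml.so`) must match the loaded kernel module version exactly, which is why the lookup is an exact match.

```python
from __future__ import annotations

import os
import re

# Version number that follows "Kernel Module" in /proc/driver/nvidia/version (assumed format).
_KERNEL_MODULE_VERSION = re.compile(r"Kernel Module.*?\s(\d+\.\d+(?:\.\d+)?)\s")


def host_driver_version(proc_text: str) -> str:
    """Extract the NVIDIA kernel module version from /proc/driver/nvidia/version text."""
    match = _KERNEL_MODULE_VERSION.search(proc_text)
    if match is None:
        raise ValueError("no kernel module version found")
    return match.group(1)


def select_driver(version: str, bundled: dict[str, str]) -> str:
    """Return the bundled userspace driver lib dir built for exactly this version."""
    if version not in bundled:
        raise LookupError(f"no bundled driver for {version}; have {sorted(bundled)}")
    return bundled[version]


def with_driver(lib_dir: str, env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    old = env.get("LD_LIBRARY_PATH")
    new_env["LD_LIBRARY_PATH"] = f"{lib_dir}:{old}" if old else lib_dir
    return new_env


# Hypothetical drivers baked into the image, keyed by driver version.
BUNDLED = {
    "535.171.04": "/opt/drivers/535.171.04/lib",
    "550.54.15": "/opt/drivers/550.54.15/lib",
}

if __name__ == "__main__":
    proc_path = "/proc/driver/nvidia/version"
    if not os.path.exists(proc_path):
        print("no NVIDIA kernel module loaded on this host")
    else:
        with open(proc_path) as f:
            version = host_driver_version(f.read())
        try:
            lib_dir = select_driver(version, BUNDLED)
        except LookupError as err:
            print(err)
        else:
            env = with_driver(lib_dir, dict(os.environ))
            # A real launcher would now os.execvpe() the wrapped program with `env`.
            print(env["LD_LIBRARY_PATH"])
```

The trade-off is image size: every bundled driver version adds to the image. In exchange, one image can run on hosts with different kernel modules, instead of maintaining a matrix of images.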
19 Jun 2024
hexa	python312 default migration has started	14:04:34
SomeoneSerge (matrix works sometimes)	stupid new faiss not building with cuda =\	14:19:56
21 Jun 2024
search-sense	Hello, NixOS community! I want to install `python311Packages.tensorrt`, but the build fails:
```
TensorRT> command, and try building this derivation again.
TensorRT> $ nix-store --add-fixed sha256 TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
TensorRT> ***
error: builder for '/nix/store/140c5c8lpa30r3jrxxbw74631831prrw-TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz.drv' failed with exit code 1;
```
The CUDA on my system is 12.2, though. Is it compatible?
```
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```	04:53:46
search-sense	Is anyone interested in adding the latest `tensorrt-10.1.0` to NixOS? The build fails:
```
searching for dependencies of /nix/store/gknr686xg6ggafkdfy5323bc7f1m5yf7-tensorrt-10.1.0.27-lib/lib/stubs/libnvinfer_vc_plugin.so
libstdc++.so.6 -> found: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib
libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
auto-patchelf: 1 dependencies could not be satisfied
error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
auto-patchelf failed to find all the required dependencies.
Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
error: builder for '/nix/store/7rqkwg91vnk5d3p4vaym0z0pskkmj4r8-tensorrt-10.1.0.27.drv' failed with exit code 1;
       last 10 log lines:
       > libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
       > setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
       > searching for dependencies of /nix/store/gknr686xg6ggafkdfy5323bc7f1m5yf7-tensorrt-10.1.0.27-lib/lib/stubs/libnvinfer_vc_plugin.so
       > libstdc++.so.6 -> found: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib
       > libgcc_s.so.1 -> found: /nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib
       > setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
       > auto-patchelf: 1 dependencies could not be satisfied
       > error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
       > auto-patchelf failed to find all the required dependencies.
       > Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
       For full logs, run 'nix log /nix/store/7rqkwg91vnk5d3p4vaym0z0pskkmj4r8-tensorrt-10.1.0.27.drv'.
```	07:59:22
search-sense	`export NIXPKGS_ALLOW_UNFREE=1 && nix-build -A cudaPackages.tensorrt`
```
> setting RPATH to: /nix/store/bn7pnigb0f8874m6riiw6dngsmdyic1g-gcc-13.3.0-lib/lib:/nix/store/pd8xxiyn2xi21fgg9qm7r0qghsk8715k-gcc-13.3.0-libgcc/lib:$ORIGIN
> auto-patchelf: 1 dependencies could not be satisfied
> error: auto-patchelf could not satisfy dependency libcudart.so.12 wanted by /nix/store/799sv915xqi5b8n14hdkbbp6h06rrjz7-tensorrt-10.1.0.27-bin/bin/trtexec
> auto-patchelf failed to find all the required dependencies.
> Add the missing dependencies to --libs or use `--ignore-missing="foo.so.1 bar.so etc.so"`.
```	11:03:14
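Roughly, the failure above means that none of the library directories auto-patchelf searched contains a file named `libcudart.so.12`, the CUDA runtime. I believe Nixpkgs ships it in `cudaPackages.cuda_cudart`. Without it, `trtexec` cannot be patched.

Below is a simplified Python model of that lookup. The real hook also checks things like ELF architecture, so treat this as an illustration only. The function names and the `LIB_DIRS` variable are hypothetical, and the sonames are taken from the log.

```python
from __future__ import annotations

import os


def available_sonames(lib_dirs: list[str]) -> set[str]:
    """Collect the file names present in the given library directories."""
    names: set[str] = set()
    for directory in lib_dirs:
        if os.path.isdir(directory):
            names.update(os.listdir(directory))
    return names


def unresolved(needed: list[str], available: set[str]) -> list[str]:
    """Return the NEEDED entries that no library directory provides, in order."""
    return [soname for soname in needed if soname not in available]


if __name__ == "__main__":
    # Sonames from the log; real NEEDED lists come from `patchelf --print-needed`.
    needed = ["libstdc++.so.6", "libgcc_s.so.1", "libcudart.so.12"]
    dirs = os.environ.get("LIB_DIRS", "").split(":")  # hypothetical variable
    print(unresolved(needed, available_sonames(dirs)))
```

As the error message itself says, there are two ways out. One is to make a directory containing the missing library visible to the hook. The other is to deliberately skip it with `--ignore-missing`, which only makes sense if the library is guaranteed to be found at runtime some other way.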
SomeoneSerge (matrix works sometimes)	In reply to @search-sense:matrix.org Hello, `NixOS community`, I want to install `python311Packages.tensorrt` ... but the cuda is 12.2 on my system, is it compatible? You can use `cudaPackages.overrideScope` to plug in the trt release compatible with your CUDA, but I also think trt was originally introduced in Nixpkgs with logic to select the compatible release in each CUDA package set automatically. Evidently, that must have broken.	15:51:43
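The idea mentioned above, selecting a compatible trt release automatically for each CUDA package set, can be sketched in Python as a filter over a release table. The table entries and their CUDA ranges are illustrative placeholders, not NVIDIA's real support matrix and not the actual Nixpkgs data. Only the tarball name is copied from the log.

The name parser also shows why the question arises: that tarball targets CUDA 11.8, while the system reports 12.2.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "12.2" into (12, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_tarball_name(name: str) -> tuple[str, str]:
    """Extract (TensorRT version, CUDA version) from a TensorRT tarball file name."""
    match = re.match(r"TensorRT-([\d.]+?)\.Linux\..*?\.cuda-(\d+\.\d+)\.tar\.gz$", name)
    if match is None:
        raise ValueError(f"unrecognised tarball name: {name}")
    return match.group(1), match.group(2)


@dataclass(frozen=True)
class TrtRelease:
    version: str
    min_cuda: str  # inclusive, major.minor
    max_cuda: str  # inclusive, major.minor


def newest_compatible(releases: list[TrtRelease], cuda: str) -> TrtRelease | None:
    """Pick the newest release whose CUDA range contains `cuda`, or None."""
    target = parse_version(cuda)
    candidates = [
        r for r in releases
        if parse_version(r.min_cuda) <= target <= parse_version(r.max_cuda)
    ]
    return max(candidates, key=lambda r: parse_version(r.version), default=None)


# Illustrative placeholder table (NOT a real compatibility matrix).
RELEASES = [
    TrtRelease("8.6.1.6", "11.0", "11.8"),
    TrtRelease("10.1.0.27", "12.0", "12.5"),
]
```

In Nixpkgs the equivalent choice is made in Nix rather than Python, and `cudaPackages.overrideScope` is the escape hatch when the automatic choice is wrong or missing.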
SomeoneSerge (matrix works sometimes)	In reply to @ss:someonex.net Nvidia prevents unattended downloads, of course it broke ...primarily because of ^^^, and because no one seems to be actively using Nixpkgs' in-tree trt expression?	15:52:47
	Lucas joined the room.	17:13:01
Lucas	Does anyone have `nsight_systems` working? I am using CUDA to develop programs on NixOS 24.05, and it is working great. Now I want to profile my code. Using the following flake
```nix
{
  description = "nsight_systems";

  inputs = {
    # nixpkgs.url = "github:NixOS/nixpkgs/release-24.05";
    # nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    nixpkgs.url = "github:ConnorBaker/nixpkgs/feat/cudaPackages-fixed-output-derivations";
  };

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        system = system;
        config.allowUnfree = true;
      };
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        nativeBuildInputs = [
          pkgs.cudaPackages.nsight_systems
          pkgs.cudaPackages.nsight_compute
        ];
      };
    };
}
```
I was able to get `ncu` working.	19:46:16
Lucas	But when I try to run `nsys-ui`, I get a dialog box with the error message:
```
Failed to load plugin: QuadDPlugin
Cannot load library /nix/store/hzp2wmqbqihx4slp353ixs405ry6li4f-cuda12.5-nsight_systems-2024.2.3.38-bin/nsight-systems/2024.2.3/host-linux-x64/Plugins/QuadDPlugin/libQuadDPlugin.so: /nix/store/hzp2wmqbqihx4slp353ixs405ry6li4f-cuda12.5-nsight_systems-2024.2.3.38-bin/nsight-systems/2024.2.3/host-linux-x64/Plugins/QuadDPlugin/libQuadDPlugin.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev, version Qt_6
Some functionality will be disabled
```	19:50:21


Room Version: 9