9 Jul 2024 |
ornx | i'm having a lot of trouble getting things working still - i'm using devShell.override with cudaPackages.backendStdenv and cuda programs compile and run with nvcc, but i'm getting a cudaErrorInsufficientDriver error even though i'm building my flake against the same revision of nixpkgs that the system was built against | 00:35:00 |
ornx | actually nvidia-smi in the system env closure has also stopped working so probably there is something wrong with my system flake lol | 00:38:13 |
ornx | okay yeah even after rebooting and fixing that i get an error when i try to actually run programs i compiled with nvcc | 00:42:55 |
ornx | is there some kind of mismatch between what driver version cudaPackages.cudatoolkit is expecting and what the system flake is using, even though they're on the same revision and config.boot.kernelPackages.nvidiaPackages is unset? | 00:43:56 |
SomeoneSerge (utc+3) | In reply to @ornx:littledevil.club: Ohhh you're probably still running into the stub driver issue (It should be gone once the getOutput PR reaches unstable) | 00:45:20 |
SomeoneSerge (utc+3) |
nvidia-smi in the system env closure has also stopped working so probably there is something wrong with my system
You did reboot after the switch did you? | 00:45:39 |
ornx | yeah, i just rebooted | 00:45:57 |
SomeoneSerge (utc+3) | * EDIT: I will read before I write, I will read before I write... | 00:46:13 |
ornx | i can just merge that PR into a local nixpkgs if that's the fix | 00:46:50 |
SomeoneSerge (utc+3) | You can try running your program with the LD_DEBUG=libs environment variable | 00:47:07 |
SomeoneSerge (utc+3) | * If it mentions libcuda.so from this cudatoolkit link farm, it's the stub driver issue, and the solution is to just not use the link farm (take individual components from https://github.com/NixOS/nixpkgs/blob/7a95a8948b9ae171337bbf2794459dbe167032ed/pkgs/development/cuda-modules/cudatoolkit/redist-wrapper.nix#L44-L58) | 00:52:52 |
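The LD_DEBUG=libs check suggested above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `classify_libcuda` is a hypothetical name, it relies on the glibc loader printing `calling init: <path>` lines for each library it initialises, and it treats any libcuda.so under /nix/store as the toolkit's copy (for example a stub) rather than the system driver.

```shell
#!/bin/sh
# Hypothetical helper: reads LD_DEBUG=libs output on stdin and reports
# where libcuda.so was loaded from.
# Assumption: the glibc loader prints "calling init: <path>" for each
# library it initialises.
classify_libcuda() {
    # Extract the path of every initialised libcuda.so* (empty if none).
    paths=$(sed -n 's/.*calling init: \(.*libcuda\.so[^ ]*\).*/\1/p')
    case "$paths" in
        "")           echo "not-loaded" ;;  # libcuda.so never initialised
        /nix/store/*) echo "store" ;;       # likely the toolkit copy or stub
        *)            echo "system" ;;      # e.g. a driver-provided path
    esac
}
```

A possible invocation, where `./my_cuda_program` stands in for your own binary: `LD_DEBUG=libs ./my_cuda_program 2>&1 | classify_libcuda`. The loader writes its debug output to stderr by default, hence the `2>&1`. A result of `store` would be consistent with the stub driver issue described above, though it is worth reading the raw output before drawing conclusions.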
| ghishadow joined the room. | 04:21:50 |
hacker1024 | How's everyone's day going? Mine was great until my colleague asked me to package [this](https://github.com/jocover/jetson-ffmpeg/blob/master/CMakeLists.txt) | 06:21:10 |
hacker1024 | It is times like this that make me question my values to the core | 06:21:39 |
SomeoneSerge (utc+3) | *
/usr/src/jetson_multimedia_api/samples/common/classes/NvBuffer.cpp
That's a promising opening | 06:24:01 |
SomeoneSerge (utc+3) | These are only distributed with the jetpack, right? | 06:29:34 |
hacker1024 | Yep, luckily Jetpack-NixOS has all the samples packaged | 06:49:06 |
hacker1024 | Just needs some overlay weirdness to use CUDA from Nixpkgs now | 06:49:37 |
hacker1024 | Speaking of which, is tensorrt supposed to work on aarch64? Because it's evaluating as both broken and unsupported when running the following
`nix-instantiate -I nixpkgs=channel:nixos-unstable '<nixpkgs>' --argstr localSystem aarch64-linux --arg config '{ cudaSupport = true; allowUnfree = true; }' -A cudaPackages.tensorrt` | 06:50:57 |
SomeoneSerge (utc+3) | Not sure, tensorrt isn't receiving enough love:) | 07:11:44 |
SomeoneSerge (utc+3) | https://github.com/NixOS/nixpkgs/issues/323124 | 07:12:14 |
SomeoneSerge (utc+3) | Jonas Chevalier, hexa (UTC+1): a question about release-lib.nix. My impression is that supportedPlatforms is the conventional way to describe a "matrix" of jobs. For aarch64-linux, I'd like to define a matrix over individual capabilities, because aarch64-linux mostly means embedded/Jetson SBCs; currently this means importing nixpkgs with different config.cudaCapabilities values... Any thoughts on how to express this in a not-too-ad-hoc way? | 18:07:33 |
connor (he/him) (UTC-7) | Kevin Mittman: is there any reason the TensorRT tarball exploded in size for the 10.2 release? It's clocking in at over 4GB, nearly twice the size it was for 10.1 (~2GB).
[connorbaker@nixos-desktop:~/cuda-redist-find-features]$ ./tensorrt/helper.sh 12.5 10.2.0.19 linux-x86_64
[582.9/4140.3 MiB DL] downloading 'https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz'
| 18:56:10 |