| 23 Apr 2024 |
| Tanja (Old; I'm now @tanja:catgirl.cloud) changed their display name from Tanja (Old) to Tanja (Old; I'm now @tanja:catgirl.cloud). | 12:30:52 |
connor (he/him) | Still very early, but this PR would add all the redistributables we’ve been missing as well as newer versions of CUDA/cuDNN: https://github.com/NixOS/nixpkgs/pull/306172 | 14:28:01 |
connor (he/him) | All the individual manifests go bye bye which is partly why it’s such a large diff | 14:37:18 |
connor (he/him) | I’ve got to look into the best way to try to package TensorRT since that one requires accepting a license to download. | 14:38:00 |
connor (he/him) | Maybe not! Looks like they have direct links now as part of open sourcing some components: https://github.com/NVIDIA/TensorRT?tab=readme-ov-file#optional---if-not-using-tensorrt-container-specify-the-tensorrt-ga-release-build-path | 14:40:08 |
connor (he/him) | Oh… it’s the same link as the one behind their login which you have to tick the box to agree to their license to see: https://developer.nvidia.com/tensorrt/download/10x | 14:45:33 |
connor (he/him) | 🤷‍♂️ | 14:46:01 |
connor (he/him) | Of course the URLs don’t use the same convention the other redistributables use 🫠 | 18:57:49 |
| 24 Apr 2024 |
| @stablejoy:matrix.org changed their profile picture. | 08:59:09 |
| 25 Apr 2024 |
| NixOS Moderation Bot banned @jonringer:matrix.org (Banned until 2024/06/10 after deliberation of the Moderation team). | 21:12:06 |
| SomeoneSerge (matrix works sometimes) changed their display name from SomeoneSerge (void) to SomeoneSerge (UTC+1). | 23:01:24 |
| 26 Apr 2024 |
ripi | Hi there, I have a new workstation with 8 GPUs, nvidia-smi shows them all perfectly but I can't seem to run anything with them ( nvtop, ffmpeg, my own cgo-cuda software, etc.. ). The errors mean nothing to me but you might know: ffmpeg:
dl_fn->cuda_dl->cuInit(0) failed -> CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
sudo nvidia-modprobe -u
Unknown CUresult: 803
| 07:39:55 |
ripi | A different workstation, different hardware, same software ( nixos with flakes ), works perfectly | 07:41:01 |
ripi | I have leads from the internet that it might be a mismatch between "the versions of the display driver and the CUDA driver", as this new workstation is headless unlike the previous one. I will look that way | 07:42:42 |
ripi | ok so I fixed it by installing a desktop environment, weird | 07:53:15 |
SomeoneSerge (matrix works sometimes) | In reply to @ripi:matrix.org
The "driver combination" error means you need to reboot before you can use the version of libcuda.so you were trying to load | 07:59:25 |
| @stablejoy:matrix.org changed their profile picture. | 14:03:49 |
| 27 Apr 2024 |
kenshin79 | Hey all. I just started dipping my toe in CUDA, and after much fiddling I'm in this situation. I need xformers and openai-triton 2.1. The latter is only in unstable. But in unstable, if I set cudaSupport=true, xformers fails to compile because it says gcc is too new.
Now I'm trying pkgs.python3Packages.override { stdenv = pkgs.gcc11Stdenv; } in unstable, which is taking its sweet time but fingers crossed. But I'm curious why the rest of the packages do compile if the standard gcc is not good. What am I doing wrong?
| 14:58:30 |
kenshin79 | I thought of moving the expression for triton in unstable to stable, but it seems very different, and besides, being on the latest packages would be useful (I'm trying llama3 and phi-3...) | 15:00:11 |
connor (he/him) | When cudaPackages.cuda_nvcc is included in nativeBuildInputs, a setup hook is included as well which sets certain environment flags so NVCC knows which GCC to use. I don't have the xformers derivation in front of me but I'd guess it's either packaged incorrectly (so not pulling in the setup hooks), something is broken (doesn't build at all with CUDA enabled), or the build system for the package is overriding the environment variables we use | 16:58:19 |
connor (he/him) | Overriding the stdenv like that might allow it to build, but you're liable to get symbol errors when the program runs and tries to load glibc because of the version mismatch (one of the things we do with NVCC is wrap it so it uses an older version of GCC but still links against the same version of glibc the rest of Nixpkgs is built with) | 16:59:35 |
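A minimal sketch of the pattern connor describes: list cudaPackages.cuda_nvcc in nativeBuildInputs so its setup hook runs, rather than swapping the stdenv. The package name, version, and src are hypothetical placeholders; only the cudaPackages attributes come from Nixpkgs.

```nix
# Hypothetical package: pname, version, and src are placeholders.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-app";
  version = "0.0.1";
  src = ./.;

  # Build-time tools. Per the discussion above, including cuda_nvcc here also
  # brings in the setup hook that tells NVCC which GCC to use.
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];

  # Libraries linked into the result (the CUDA runtime, in this sketch).
  buildInputs = [ cudaPackages.cuda_cudart ];
}
```

With this layout the default stdenv stays in place, so the rest of the build keeps linking against the same glibc as the rest of Nixpkgs.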
kenshin79 | I see. Let me see if I find that flag | 18:00:42 |
kenshin79 | there's an export CUDA_HOME=${cudaPackages.cuda_nvcc} in preBuild, but I don't see nvcc in nativeBuildInputs. It is in there, instead, in bitsandbytes, which does compile. | 18:14:55 |
kenshin79 | but there's also a lot of substituteInPlace in postPatch | 18:16:13 |
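One way to test the missing-nvcc theory without touching the stdenv is an overlay that appends cuda_nvcc to xformers' nativeBuildInputs. This is an untested sketch, not a confirmed fix: it assumes the setup hook is all that is missing, and the postPatch substitutions or the package's own build system may still override what the hook sets. It also assumes the top-level cudaPackages is the same set the Python packages are built against.

```nix
# Overlay sketch (assumption: xformers only lacks the cuda_nvcc setup hook).
final: prev: {
  pythonPackagesExtensions = prev.pythonPackagesExtensions ++ [
    (pyFinal: pyPrev: {
      xformers = pyPrev.xformers.overridePythonAttrs (old: {
        # Keep whatever was already listed and append cuda_nvcc.
        nativeBuildInputs = (old.nativeBuildInputs or [ ]) ++ [
          final.cudaPackages.cuda_nvcc
        ];
      });
    })
  ];
}
```

If the build still complains about the GCC version with this applied, that would point at the other causes connor lists: the package is broken with CUDA enabled, or its build system overrides the environment variables.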
| @ygt:matrix.org joined the room. | 18:23:05 |
connor (he/him) | Sorry, I’m not at my desktop and kind of burnt out at the moment so I’m mostly just pasting links to files from memory | 19:50:09 |
connor (he/him) | https://github.com/NixOS/nixpkgs/blob/c8d7c8a78fb516c0842cc65346506a565c88014d/pkgs/development/cuda-modules/cuda/overrides.nix#L168 | 19:50:13 |
connor (he/him) | https://github.com/NixOS/nixpkgs/blob/c8d7c8a78fb516c0842cc65346506a565c88014d/pkgs/development/cuda-modules/setup-hooks/mark-for-cudatoolkit-root-hook.sh | 19:50:46 |