NixOS CUDA | 291 Members | |
| CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda | 58 Servers |
| Sender | Message | Time |
|---|---|---|
| 24 Apr 2025 | ||
| I haven't used it much myself as I don't own a big enough GPU. As far as I know, it is not affiliated with Mistral (the company). I guess it's the same situation as with "ollama" and Llama (Meta). | 15:43:10 | |
| It's still getting better, though. I have now switched from a DaemonSet deployment of the device plugin to a Helm deployment with custom values. This also made it possible to enable time slicing of the available GPU 🥳 | 16:10:02 | |
| Thanks for the info; yes, "the same as ollama" was my assumption. I guess I'll stick to a vLLM deployment with Helm on k8s. | 16:37:54 | |
| 25 Apr 2025 | ||
| Not getting lighter by the release... | 06:52:16 | |
| luke-skywalker: yes, glibc is not forward compatible, only backward compatible. You can check https://github.com/NixOS/nixpkgs/issues/338511#issuecomment-2341496949 and the previous comments, since this is basically the issue you are hitting | 12:24:21 | |
| thx 🙏 So just to get it right: is NixOS (unstable) running a glibc version that is too new for the 17.x images, or has the image been built with a glibc version that is too new for NixOS (unstable)? 🤔 I ask because nvidia device plugin image versions 14.x/15.x/16.x all work. Do you see any critical issue with running clusters on 16.2? It works like a beauty; I am currently testing GPU workload autoscaling and I would hate to let that go 😅 | 12:29:21 | |
| Hi! Good news: I was able to reproduce it and have a fix. This is closely related to an issue reported against nvidia-container-toolkit. Let me explain | 12:52:45 | |
| I'm all ears 🤩 | 12:53:11 | |
| https://gist.github.com/ereslibre/483fec3217ffca38b3244df42a477db2 | 13:00:36 | |
| this is related to upstream issue https://github.com/NVIDIA/nvidia-container-toolkit/issues/944 somehow. We need to figure out the best way to handle this, but at least you have two workarounds for the time being, although neither of them is ideal... | 13:04:11 | |
| I see. If I have to, I would probably opt for editing | 13:07:54 | |
| Yeah, updating /var/run/cdi/nvidia-container-toolkit.json is flaky in the way I described it: it expects ldconfig to be present within the container at the specified path | 13:10:27 | |
| Good to know. Well, until I see a good reason not to, and as long as everything works as needed, I will stick with 16.2 for the time being. | 13:13:16 | |
| Currently I am just testing out different cluster setups in my homelab (4 machines, 2 of them with an NVIDIA GPU), so it will be a while until any real deployment... | 13:14:35 | |
| Kevin Mittman: I noticed the TensorRT binary archive for x86_64-linux (and only x86_64-linux) includes libnvinfer_builder_resource.so.10.9.0 and libnvinfer_builder_resource_win.so.10.9.0. Both are ~1.9 GB, and I'm wondering if libnvinfer_builder_resource_win.so.10.9.0 is relevant for x86_64-linux systems, and if so, what it does compared to libnvinfer_builder_resource.so.10.9.0. | 23:33:24 | |
| In reply to @connorbaker:matrix.org: Checking. Also, that tarball doesn't conform to the "binary archive" format ... and it is 6.4 GB | 23:48:23 | |
| As the name implies, seems to be for cross compilation, Linux -> Windows | 23:55:01 | |
| 26 Apr 2025 | ||
| Shouldn't it be in a different targets directory if it's for cross-compiling to another system? | 00:01:28 | |
| heads up | 19:52:11 | |
| current onnxruntime on unstable requires W+X (writable and executable) memory, while the version on release-24.11 does not | 19:52:34 | |
| This implies that systemd units that depend on onnxruntime and set MemoryDenyWriteExecute need to be updated to allow it | 19:53:33 | |
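
The W+X requirement discussed in the last messages can be probed directly. The following is a minimal sketch, assuming a Linux host and Python 3; the function name `wx_mapping_allowed` is hypothetical and is not part of onnxruntime or systemd. It tries to create an anonymous memory mapping that is both writable and executable, which is the kind of mapping that systemd's `MemoryDenyWriteExecute=yes` is documented to prohibit. It does not reproduce how onnxruntime itself allocates memory.

```python
import mmap


def wx_mapping_allowed(size: int = mmap.PAGESIZE) -> bool:
    """Return True if an anonymous writable+executable mapping can be created.

    Hypothetical helper for illustration only: it checks the sandbox,
    not onnxruntime.
    """
    prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    try:
        # fileno=-1 requests an anonymous mapping (not backed by a file).
        mapping = mmap.mmap(-1, size, prot=prot)
    except OSError:
        # Under a W+X restriction the mmap call is expected to be refused.
        return False
    mapping.close()
    return True


if __name__ == "__main__":
    print("W+X mapping allowed:", wx_mapping_allowed())
```

If this is run under the same sandboxing as the affected unit, a `False` result would suggest that writable and executable mappings are blocked there. In that case, setting `MemoryDenyWriteExecute=no` for that unit is the setting that lifts the restriction.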