| 25 Apr 2025 |
luke-skywalker | I see.
If I'll have to, I would probably opt for editing /var/run/cdi/nvidia-container-toolkit.json, but at this point I don't see a reason not to stick with 16.2 and update once the upstream issue is resolved.
| 13:08:17 |
ereslibre | yeah, updating /var/run/cdi/nvidia-container-toolkit.json is flaky, as I explained: it expects ldconfig to be present within the container at the specified path | 13:10:27 |
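For reference, a sketch of the relevant fragment of a generated CDI spec. The field names follow the CDI specification; the hook path, folder, and device kind shown here are illustrative values of the sort `nvidia-ctk cdi generate` emits, not the exact contents of anyone's /var/run/cdi/nvidia-container-toolkit.json:

```json
{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "containerEdits": {
    "hooks": [
      {
        "hookName": "createContainer",
        "path": "/usr/bin/nvidia-cdi-hook",
        "args": ["nvidia-cdi-hook", "update-ldcache", "--folder", "/usr/lib64"]
      }
    ]
  }
}
```

The update-ldcache hook is the part that runs ldconfig inside the container, which is where the fragility comes from when the container image doesn't provide ldconfig at the expected path.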
luke-skywalker | good to know. Well, until I see a good reason not to, and as long as everything works as needed, I will stick with 16.2 for the time being. | 13:13:16 |
luke-skywalker | currently just testing out different cluster setups in my homelab (4x machines, 2x with nvidia GPU) so will be a bit until any real deployment... | 13:14:35 |
connor (he/him) | Kevin Mittman: I noticed the TensorRT binary archive for x86_64-linux (and only x86_64-linux) includes libnvinfer_builder_resource.so.10.9.0 and libnvinfer_builder_resource_win.so.10.9.0. Both are ~1.9 GB, and I'm wondering if libnvinfer_builder_resource_win.so.10.9.0 is relevant for x86_64-linux systems, and if so, what it does compared to libnvinfer_builder_resource.so.10.9.0. | 23:33:24 |
Kevin Mittman (UTC-8) | In reply to @connorbaker:matrix.org Checking. Also, that tarball doesn't conform to the "binary archive" format... and it's 6.4 GB | 23:48:23 |
Kevin Mittman (UTC-8) | As the name implies, seems to be for cross compilation, Linux -> Windows | 23:55:01 |
| 26 Apr 2025 |
connor (he/him) | Shouldn't it be in a different targets directory if it's for cross to another system? | 00:01:28 |
hexa | heads up | 19:52:11 |
hexa | current onnxruntime on unstable requires w+x, while the version on release-24.11 does not | 19:52:34 |
hexa | ❯ objdump -x result/lib/libonnxruntime.so | grep -A1 "STACK off"
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rwx
| 19:52:55 |
hexa | ❯ objdump -x result/lib/libonnxruntime.so | grep -A1 "STACK off"
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-
| 19:53:01 |
hexa | implies systemd units that depend on onnxruntime and have MemoryDenyWriteExecute need to be updated to allow it | 19:53:33 |
connor (he/him) | I don’t know if anyone else uses torchmetrics, but if you’re wondering why using DISTS is so freaking slow, it’s because they create a new instance of the model every time you call it: https://github.com/Lightning-AI/torchmetrics/blob/60e7686c97c14a4286825ec23187b8629f825d15/src/torchmetrics/functional/image/dists.py#L176
I tried just creating the model once and using it directly, and it is much faster, but something about doing that causes a memory leak which makes training OOM eventually :(
At any rate, it’s not the packaging’s fault, woohoo | 19:58:30 |
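The caching idea can be sketched without torch at all. All names below are hypothetical stand-ins, not the torchmetrics API, and note the caveat above: caching the real model this way reportedly leaked memory during training.

```python
import functools

class ExpensiveModel:
    """Stand-in for a model whose construction (weight loading) is slow."""
    constructions = 0

    def __init__(self):
        ExpensiveModel.constructions += 1

    def __call__(self, x):
        return x * 2

# What the linked torchmetrics code effectively does: build a fresh model
# on every metric call.
def dists_slow(x):
    return ExpensiveModel()(x)

# The sketched fix: build the model once and reuse it on later calls.
@functools.lru_cache(maxsize=1)
def _get_model():
    return ExpensiveModel()

def dists_fast(x):
    return _get_model()(x)
```

Calling `dists_fast` repeatedly constructs the model only once, whereas `dists_slow` pays the construction cost on every call.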
| 29 Apr 2025 |
connor (he/him) | finally started writing more docs (https://github.com/ConnorBaker/cuda-packages/blob/main/doc/language-frameworks/cuda.section.md) and moving some new package expressions (cuda-python, cutlass, flash-attn, modelopt, pyglove, schedulefree, transformer-engine) to my public repo (https://github.com/ConnorBaker/cuda-packages/tree/main/pkgs/development/python-modules) | 04:50:56 |
| @ygt:matrix.org left the room. | 23:42:49 |
| 1 May 2025 |
connor (he/him) | God I need to finish arrayUtilities so I can start landing CUDA setup hooks | 19:07:13 |
| oak 🏳️🌈♥️ changed their display name from oak - mikatammi.fi to oak 🫱⭕🫲. | 23:18:34 |
connor (he/him) | Kevin Mittman: is it intentional that the CUDA 12.9 docs (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id7) say they require a driver version >=575.51.03 for 12.9, but the latest release is 575.51.02 (https://download.nvidia.com/XFree86/Linux-x86_64/575.51.02/)? | 23:27:12 |
Kevin Mittman (UTC-8) | In reply to @connorbaker:matrix.org CUDA 12.9.0 ships with driver 575.51.03
what you are seeing is a separate release from the GeForce BU | 23:29:15 |
connor (he/him) | It's also newer than the open kernel modules then? | 23:29:42 |
Kevin Mittman (UTC-8) | Experiencing technical difficulties | 23:30:00 |
| 2 May 2025 |
luke-skywalker | is there a way to pick up on the nvidia-container-toolkit-tools directory containing the runtimes at build time? for example /nix/store/72bp8mb7zzpjifcwasj5wh45ixasmck7-nvidia-container-toolkit-1.17.6-tools | 10:36:29 |
SomeoneSerge (back on matrix) | getOutput at eval time | 13:25:02 |
| 4 May 2025 |
luke-skywalker | Interesting. Damn, working through the NixOS manual is still on my bucket list. | 10:27:03 |
luke-skywalker | took me a few looks but
"${getOutput "tools" pkgs.nvidia-container-toolkit}"
is actually pretty straightforward.
| 20:05:52 |
luke-skywalker | 🙏thx a lot | 20:06:01 |
| 5 May 2025 |
Gaétan Lepage | Hi there,
I'm working on bumping pytorch to 2.7.0. They now require libcufile.so. Are you aware of this library? Is it already available in nixpkgs? | 12:26:45 |
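For what it's worth, a sketch of wiring in such a dependency, assuming nixpkgs exposes it under the cudaPackages set the way it does other CUDA redistributable libraries; the attribute name `libcufile` is an assumption here, not a confirmed answer to the question:

```nix
# Hypothetical override adding the cuFile (GPUDirect Storage) library to
# the torch build; cudaPackages.libcufile as an attribute path is assumed.
python3Packages.torch.overrideAttrs (old: {
  buildInputs = (old.buildInputs or [ ]) ++ [ cudaPackages.libcufile ];
})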