| 23 Jun 2025 |
| lon joined the room. | 08:55:01 |
lon | Hi! I have a question: would anybody be interested in a services.vllm module? I was working on running it as a systemd service and hardening it, and I'm happy with the result... | 08:57:13 |
lon | Download vllm.nix | 08:58:43 |
lon | (I've never contributed to nixpkgs, so I'm not sure how high-quality this is) | 08:59:15 |
lon | The interesting part is
MemoryDenyWriteExecute = false; # Needed for CUDA/PyTorch JIT
PrivateDevices = false; # Needed for GPU access
RestrictAddressFamilies = ["AF_UNIX" "AF_INET" "AF_INET6" "AF_NETLINK"];
DevicePolicy = "closed"; # Only allow the following devices, based on strace usage:
DeviceAllow = lib.flatten [
# Basic devices
"/dev/null rw"
"/dev/urandom r"
"/dev/tty rw"
# NVIDIA control devices
"/dev/nvidiactl rw"
"/dev/nvidia-modeset rw"
"/dev/nvidia-uvm rw"
"/dev/nvidia-uvm-tools rw"
(map (i: "/dev/nvidia${i} rw") (lib.splitString " " cfg.cudaDevices)) # splitString already yields strings
# NVIDIA capability devices
"/dev/nvidia-caps/nvidia-cap1 r"
"/dev/nvidia-caps/nvidia-cap2 r"
];
ProtectKernelTunables = true;
ProtectKernelModules = true;
ProtectControlGroups = true;
RestrictNamespaces = true;
LockPersonality = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
RemoveIPC = true;
PrivateMounts = true;
PrivateUsers = true;
ProtectHostname = true;
ProtectKernelLogs = true;
ProtectClock = true;
ProtectProc = "invisible";
UMask = "0077";
CapabilityBoundingSet = ["CAP_SYS_NICE"];
AmbientCapabilities = ["CAP_SYS_NICE"];
| 09:06:10 |
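A note for readers: the DeviceAllow list above consumes cfg.cudaDevices as a space-separated string of GPU indices. A minimal sketch of how that option might be declared (the type, default, and description here are my assumptions, not taken from the posted vllm.nix):

```nix
# Hypothetical option declaration backing `cfg.cudaDevices` above.
options.services.vllm.cudaDevices = lib.mkOption {
  type = lib.types.str;
  default = "0";
  example = "0 1";
  description = ''
    Space-separated GPU indices; each index i adds "/dev/nvidia''${i} rw"
    to the service's DeviceAllow list.
  '';
};
```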
connor (he/him) | Two things I've promised to look at today:
- Bumping the version of protobuf used by OpenCV, which hasn't been updated in a while (need to backport to 25.05 as well).
- Figuring out how to revert https://github.com/NixOS/nixpkgs/pull/414647 in a way that doesn't break consumers of OpenCV -- really don't want
cudatoolkit propagated to all consumers of OpenCV.
| 17:30:42 |
| 24 Jun 2025 |
connor (he/him) | :L | 23:45:47 |
connor (he/him) | https://github.com/NixOS/nixpkgs/blob/5d0aa4675f7a35ec9661325d1dc22dfcbba5d040/pkgs/development/python-modules/warp-lang/default.nix#L100 is wrong; there's no `bsd` license in lib.licenses | 23:45:58 |
connor (he/him) | https://github.com/NixOS/nixpkgs/pull/419722 | 23:56:43 |
hexa | proper meta-checks when 🙂 | 23:59:26 |
| 25 Jun 2025 |
connor (he/him) | WIP other PR to fix the CUDA builds: https://github.com/NixOS/nixpkgs/pull/419750 | 01:30:11 |
Gaétan Lepage | Thanks for catching this, guys | 06:37:59 |
connor (he/him) | Okay that was a gigantic pain in the ass but I think that PR is all good to go now; added passthru.tests as well. | 22:37:42 |
indoor_squirrel | You go, Connor! We're all rooting for you! Thanks @ss:someonex.net: ! | 22:41:36 |
indoor_squirrel | * You go, Connor! We're all rooting for you! Thanks
@ss:someonex.net and Gaétan Lepage! | 22:42:01 |
| 27 Jun 2025 |
SomeoneSerge (back on matrix) | connor (he/him) (UTC-7): can you test if https://gist.github.com/SomeoneSerge/75a8ec66917bc2dd8242e638a2c809f3 is sufficient to make nix run ...saxpy work without extra fuss? | 13:17:19 |
SomeoneSerge (back on matrix) | * connor (he/him) (UTC-7): can you test if https://gist.github.com/SomeoneSerge/75a8ec66917bc2dd8242e638a2c809f3 is sufficient to make nix run ...saxpy work without extra fuss on an ubuntu/jetpack? | 13:18:16 |
SomeoneSerge (back on matrix) | I'm somehow getting an error despite the cuda_compat driver:
Runtime version: 12080
Driver version: 12080
Host memory initialized, copying to the device
CUDA error at cudaMalloc(&xDevice, N * sizeof(float)): system has unsupported display driver / cuda driver combination
| 13:20:34 |
connor (he/him) | I'll take a look in a few | 16:11:38 |
connor (he/him) | Off the top of my head -- are you running JetPack 6? On JetPack 5, cuda_compat only works up through CUDA 12.2. The other thing I can think of: make sure the cuda_compat driver comes before the host driver in the library search path so it's loaded first | 16:13:50 |
connor (he/him) | IIRC if the host driver is loaded first, the one provided by cuda_compat is ignored. (I ran into a bunch of issues in my fork of cuda-packages because autoAddDriverRunpath and autoAddCudaCompatHook both append to RUNPATH, so the order they execute in is significant. That's what started the whole process of me writing the array-utilities setup hooks: if I was going to have to re-arrange arrays (hook order) anyway, I wanted to make sure I only had to write the code once and could test it.) | 16:16:21 |
connor (he/him) | SomeoneSerge (Ever OOMed by Element): here's what I got: https://gist.github.com/ConnorBaker/d6791db3dd5a385abfc562af161856e9 | 20:56:29 |
connor (he/him) | It successfully finds and loads the first vendor libraries it needs (libnvrm_gpu.so and libnvrm_mem.so), but then fails to find dependencies of those (like libnvos.so) because they have empty runpaths! | 20:58:03 |
connor (he/him) | As an example, doing
sudo /home/connor/.local/state/nix/profile/bin/patchelf --set-rpath '$ORIGIN' /run/opengl-driver/lib/libnvrm_gpu.so
allows it to find more libraries! Not enough to succeed, but it then says
264389: find library=libcuda.so.1 [0]; searching
264389: search path=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat:/run/opengl-driver/lib:/nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib (RUNPATH from file /nix/store/i7n0xv8v87xybicsqhm4fpq55r0n3qim-cuda_cudart-12.2.140-lib/lib/libcudart.so.12)
264389: trying file=/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat/libcuda.so.1
264389:
264389: find library=libnvrm_gpu.so [0]; searching
264389: search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat (RUNPATH from file ./saxpy/bin/saxpy)
264389: trying file=/run/opengl-driver/lib/libnvrm_gpu.so
264389:
264389: find library=libnvrm_mem.so [0]; searching
264389: search path=/run/opengl-driver/lib:/nix/store/2jd6vf145m0ldi05rzqwwk5n43405npk-cuda_compat-12.2.34086590/compat (RUNPATH from file ./saxpy/bin/saxpy)
264389: trying file=/run/opengl-driver/lib/libnvrm_mem.so
264389:
264389: find library=libnvos.so [0]; searching
264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy)
264389: trying file=/run/opengl-driver/lib/libnvos.so
264389:
264389: find library=libnvsocsys.so [0]; searching
264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy)
264389: trying file=/run/opengl-driver/lib/libnvsocsys.so
264389:
264389: find library=libnvrm_sync.so [0]; searching
264389: search path=/run/opengl-driver/lib (RUNPATH from file ./saxpy/bin/saxpy)
264389: trying file=/run/opengl-driver/lib/libnvrm_sync.so
264389:
264389: find library=libnvsciipc.so [0]; searching
264389: search cache=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/etc/ld.so.cache
264389: search path=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib:/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib (system search path)
264389: trying file=/nix/store/gydncjm02ww60x9gamkhfwj3f34g3g8m-glibc-2.40-66/lib/libnvsciipc.so
264389: trying file=/nix/store/0ga1cm2ild3sv9vg64ldizrdpfr72pvv-xgcc-14.3.0-libgcc/lib/libnvsciipc.so
| 21:05:38 |
connor (he/him) | Perhaps instead of symlinking the host libs, we copy them and patchelf them so they can search in the local directory? | 21:15:48 |
connor (he/him) | Updated the gist so it does what I proposed in the previous message; seems to work! | 21:47:45 |
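For readers who can't see the gist, the copy-and-patchelf idea could be sketched as a Nix derivation roughly like this (a sketch under assumptions: the glob pattern, derivation name, and use of runCommand are mine, and copying from /run/opengl-driver requires a non-sandboxed context, so this illustrates the shape of the fix rather than something nix-build can do as-is):

```nix
# Hypothetical sketch: copy the host's Tegra vendor libraries instead of
# symlinking them, then set RUNPATH to $ORIGIN so each copy resolves its
# siblings (libnvos.so, libnvsocsys.so, ...) from its own directory.
{ runCommand, patchelf }:
runCommand "jetson-vendor-libs" { nativeBuildInputs = [ patchelf ]; } ''
  mkdir -p $out/lib
  cp -L /run/opengl-driver/lib/libnv*.so* $out/lib/
  for f in $out/lib/*.so*; do
    # Some vendor blobs may not be patchable; ignore failures.
    patchelf --set-rpath '$ORIGIN' "$f" || true
  done
''
```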
Tristan Ross | I tried OBS, and I'm trying out ollama using CUDA; it works great on Ampere Altra Max. | 22:01:05 |
| 28 Jun 2025 |
| Zexin Yuan joined the room. | 05:56:13 |
| @rdg:matrix.org left the room. | 23:24:12 |
| 30 Jun 2025 |
ereslibre | hi everyone! I have reintroduced --gpus for docker and refactored the code a bit to make it easier to maintain. Please have a look at https://github.com/NixOS/nixpkgs/pull/421088 when you have some time; I can confirm it works in all cases (except for --gpus with rootless mode, which never worked AFAIK) | 07:19:09 |
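For anyone wanting to try --gpus on a NixOS host, the host-side configuration is roughly the following (option names as I recall them from recent nixpkgs; check the PR and the module docs for the authoritative set):

```nix
{
  # Expose NVIDIA GPUs to containers via CDI, which a recent Docker
  # consumes when invoked as `docker run --gpus all ...`.
  hardware.nvidia-container-toolkit.enable = true;
  virtualisation.docker.enable = true;
}
```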