NixOS CUDA

211 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
42 Servers

21 Jul 2024
@mindstorms6:matrix.org mindstorms6 joined the room. 18:43:32
@adam:robins.wtf adamcstephens joined the room. 19:05:53
@hexa:lossy.network hexa (UTC+1) SomeoneSerge (UTC+3): you know, I wonder if we can just keep these device node lists for cuda and rocm in a central location 20:58:59
@adam:robins.wtf adamcstephens
In reply to @hexa:lossy.network
can anyone here test a hardening change for ollama with rocm?
the serviceConfig changes in your PR work for me
23:55:12
@hexa:lossy.network hexa (UTC+1) adamcstephens 🐝: with what hardware/acceleration? 23:55:46
@adam:robins.wtf adamcstephens rocm on a 6700 XT 23:56:07
@hexa:lossy.network hexa (UTC+1) because for me (rx5700) and atemu (rx6700) it fails to find a device 23:56:18
@hexa:lossy.network hexa (UTC+1)
level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm]"
level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
level=WARN source=amd_linux.go:186 msg="amdgpu too old gfx000" gpu=0
level=INFO source=amd_linux.go:345 msg="no compatible amdgpu devices detected"
23:56:41
@hexa:lossy.network hexa (UTC+1) it is related to DeviceAllow=/DevicePolicy= 23:56:58
@ss:someonex.net SomeoneSerge (utc+3)
In reply to @hexa:lossy.network
SomeoneSerge (UTC+3): you know, I wonder if we can just keep these device node lists for cuda and rocm in a central location
Sounds like "why aren't we doing this yet?"
23:57:05
@hexa:lossy.network hexa (UTC+1) yes, why aren't we? 😄 23:57:25
@hexa:lossy.network hexa (UTC+1) you see that a lot of work goes into discovery of these 23:57:33
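One way the "central location" idea could look, as a rough sketch only. `gpuDeviceGroups` and `exampleService` are made-up names, not existing nixpkgs attributes, and the group names are the ones hexa ends up using in the ollama diff further down this log:

```nix
# Hypothetical sketch: a single shared list of GPU device groups that any
# hardened service could reuse instead of rediscovering the device nodes.
{ pkgs, ... }:
let
  # systemd DeviceAllow= entries, grouped by compute stack (assumed layout).
  gpuDeviceGroups = {
    cuda = [ "char-nvidiactl" "char-nvidia-caps" "char-nvidia-uvm" ];
    rocm = [ "char-drm" "char-kfd" ];
  };
in
{
  # Placeholder service; pkgs.hello only stands in for a real GPU workload.
  systemd.services.exampleService = {
    serviceConfig = {
      ExecStart = "${pkgs.hello}/bin/hello";
      DevicePolicy = "closed";
      DeviceAllow = gpuDeviceGroups.cuda ++ gpuDeviceGroups.rocm;
    };
  };
}
```

Where such a list would actually live in nixpkgs (a lib attribute, a module option, something else) is left open here, as it is in the discussion.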
@adam:robins.wtf adamcstephens
Jul 21 19:43:20 sink1 ollama[3567]: time=2024-07-21T19:43:20.566-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm]"
Jul 21 19:43:20 sink1 ollama[3567]: time=2024-07-21T19:43:20.566-04:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Jul 21 19:43:20 sink1 ollama[3567]: time=2024-07-21T19:43:20.567-04:00 level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Jul 21 19:43:20 sink1 ollama[3567]: time=2024-07-21T19:43:20.568-04:00 level=INFO source=amd_linux.go:333 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
Jul 21 19:43:20 sink1 ollama[3567]: time=2024-07-21T19:43:20.568-04:00 level=INFO source=types.go:103 msg="inference compute" id=0 library=rocm compute=gfx1031 driver=0.0 name=1002:73df total="12.0 GiB" available="9.8 GiB"
23:59:37
22 Jul 2024
@hexa:lossy.network hexa (UTC+1) sus 00:01:56
@hexa:lossy.network hexa (UTC+1) can you post deviceallow/policy? 00:02:04
@adam:robins.wtf adamcstephens
❯ sudo systemctl cat ollama | rg Device
DeviceAllow=/dev/nvidia?
DeviceAllow=/dev/nvidia-caps/nvidia-cap?
DeviceAllow=/dev/nvidiactl
DeviceAllow=/dev/nvidia-modeset
DeviceAllow=/dev/nvidia-uvm
DeviceAllow=/dev/nvidia-uvm-tools
DeviceAllow=/dev/dri/card*
DeviceAllow=/dev/dri/renderD*
DeviceAllow=/dev/kfd
DevicePolicy=closed
00:02:54
@hexa:lossy.network hexa (UTC+1) ok, that is wild 00:03:56
@hexa:lossy.network hexa (UTC+1) did you set up anything specific for rocm? 00:05:03
@hexa:lossy.network hexa (UTC+1)
level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm]"
level=WARN source=amd_linux.go:58 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
level=INFO source=amd_linux.go:333 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
level=INFO source=types.go:98 msg="inference compute" id=0 library=rocm compute=gfx1010 driver=0.0 name=1002:731f total="8.0 GiB" available="6.5 GiB"
00:07:46
@hexa:lossy.network hexa (UTC+1) hah! 00:07:47
@hexa:lossy.network hexa (UTC+1)
diff --git a/nixos/modules/services/misc/ollama.nix b/nixos/modules/services/misc/ollama.nix
index 63ee6798a6dd..d7cabb9af497 100644
--- a/nixos/modules/services/misc/ollama.nix
+++ b/nixos/modules/services/misc/ollama.nix
@@ -183,16 +183,12 @@ in
         DeviceAllow = [
           # CUDA
           # https://docs.nvidia.com/dgx/pdf/dgx-os-5-user-guide.pdf
-          "/dev/nvidia?"
-          "/dev/nvidia-caps/nvidia-cap?"
-          "/dev/nvidiactl"
-          "/dev/nvidia-modeset"
-          "/dev/nvidia-uvm"
-          "/dev/nvidia-uvm-tools"
+          "char-nvidiactl"
+          "char-nvidia-caps"
+          "char-nvidia-uvm"
           # ROCm
-          "/dev/dri/card*"
-          "/dev/dri/renderD*"
-          "/dev/kfd"
+          "char-drm"
+          "char-kfd"
         ];
         DevicePolicy = "closed";
         LockPersonality = true;
00:08:01
@hexa:lossy.network hexa (UTC+1) device node type matching works better for me 00:08:27
@hexa:lossy.network hexa (UTC+1) also more concise, less pattern matching 00:08:34
@hexa:lossy.network hexa (UTC+1) updated the PR, please retest, if it works for both of us now 00:09:33
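For anyone wanting to retest without rebuilding against the PR branch, a local override along these lines might do. This is a sketch, assuming the unit is named `ollama` (as in the `systemctl cat ollama` output above) and that the installed module already sets `DevicePolicy = "closed"` with the older path-based list:

```nix
{ lib, ... }:
{
  # mkForce replaces the module's own DeviceAllow list instead of appending to it.
  systemd.services.ollama.serviceConfig.DeviceAllow = lib.mkForce [
    # CUDA device groups
    "char-nvidiactl"
    "char-nvidia-caps"
    "char-nvidia-uvm"
    # ROCm device groups
    "char-drm"
    "char-kfd"
  ];
}
```

The `char-` names are device groups as listed in /proc/devices, so it is worth checking there that the expected groups exist on the machine under test. After a rebuild, `systemctl cat ollama` should show the new `DeviceAllow=` lines.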
@adam:robins.wtf adamcstephens yeah i have the rocmOverrideGfx = "10.3.0"; since the 6700 xt is outside the supported cards 00:10:09
@hexa:lossy.network hexa (UTC+1) yeah, few if any consumer cards have official rocm support 00:10:32
@hexa:lossy.network hexa (UTC+1) and the rx 5000 series is just broken with rocm 00:10:56
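For reference, the override adamcstephens mentions would sit in a configuration roughly like the one below. Only the `rocmOverrideGfx` line is quoted from the chat; `enable` and `acceleration = "rocm"` are assumptions about the rest of the setup:

```nix
{
  services.ollama = {
    enable = true;
    # Assumed: ROCm acceleration selected explicitly.
    acceleration = "rocm";
    # From the chat: shows up in the logs above as HSA_OVERRIDE_GFX_VERSION=10.3.0,
    # which makes ollama skip its gfx compatibility check for the 6700 XT (gfx1031).
    rocmOverrideGfx = "10.3.0";
  };
}
```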
@adam:robins.wtf adamcstephens updated DeviceAllow still works 00:12:31