!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

282 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



Sender | Message | Time
23 Sep 2025
@winter:catgirl.cloudWinter joined the room.19:57:49
@georgyo:nycr.chatgeorgyo joined the room.23:08:45
24 Sep 2025
@apache8080:matrix.orgapache8080

I'm running into a weird issue with TensorRT in a sandboxed environment. I'm trying to run TensorRT models inside the nix sandbox using nix's extra-sandbox-paths, which gives the sandbox access to hardware and drivers (e.g. the NVIDIA drivers). With this I can successfully run trtexec to generate TensorRT engines from an ONNX file, but when I try to run inference on those engines in the sandbox, it hangs forever. I verified that all of the correct libraries are loaded in the sandbox. What's weird is that the model loads onto the GPU just fine and only the inference calls hang. This only happens in the sandbox, so I think I may be missing some paths/settings that our app requires, or that trtexec brings in on its own. Outside of the sandbox I can run our app just fine.

Pretty stuck on this one at the moment

01:55:44
@apache8080:matrix.orgapache8080looks like the issue is on the application side and not a driver/nvidia library issue. extra-sandbox-paths seems to be working fine03:06:29
@connorbaker:matrix.orgconnor (he/him)What HW/host OS/driver/CUDA & TensorRT version? Generating inference engines with TensorRT in the sandbox is something I want to look into so I’d love to hear more about pain points06:01:46
@winter:catgirl.cloudWinter
ImportError: /nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN3tsl8profiler8internal21g_trace_filter_bitmapE
anyone know what could be going wrong here? i'm just using bog-standard pythonPackages.tensorflow with cudaSupport = true
15:58:45
@winter:catgirl.cloudWinter(though maybe unrelated)15:58:49
@winter:catgirl.cloudWinter i find it weird that this is even happening given this is all built by us/backendStdenv 16:00:57
@winter:catgirl.cloudWinter (occurs during the _pywrap_cpu_feature_guard) 16:03:00
@winter:catgirl.cloudWinter[maybe the wrong channel, lmk if i should move :)]16:04:33
@winter:catgirl.cloudWinterdisregard16:06:29
@winter:catgirl.cloudWintercomputers are downright evil16:06:34
@winter:catgirl.cloudWinter(the library it's pointing to isn't actually the one it's loading!)16:06:50
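One way to check which copy of a shared object a process has really loaded, as opposed to the path named in an error message, is to read the process's memory map. The following is a minimal sketch, assuming a Linux host where /proc/self/maps is readable; the helper names `mapped_libraries` and `loaded_in_this_process` are hypothetical and not part of any library mentioned in this room.

```python
import os


def mapped_libraries(maps_text, needle):
    """Return sorted, de-duplicated paths of mapped files whose basename contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Each line is: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith("/") and needle in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def loaded_in_this_process(needle):
    """Linux only: list files mapped into the current process that match `needle`."""
    with open("/proc/self/maps") as f:
        return mapped_libraries(f.read(), needle)
```

Calling `loaded_in_this_process("_pywrap_tensorflow_internal")` after the import attempt would list the store path or paths actually mapped, which can then be compared with the path in the traceback.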
@apyh:matrix.orgapyhis there a server for pytorch stuff specifically, or is this as close as it gets? really struggling to get torch.compile working :/16:38:53
@sporeray:matrix.orgRobbie BuxtonWhat error are you running into apyh?16:42:44
@gammieduncan:matrix.orgDuncan Gammie apyh: you'll probably get the fastest answer to that here if you provide specific error messages here: https://discuss.pytorch.org/c/compile/41 18:30:20
@apyh:matrix.orgapyh
In reply to @sporeray:matrix.org
What error are you running into apyh?
well, torch's .compile functionality requires a bunch of stuff that isn't provided by its nix derivation (it needs gcc at runtime, it reads /etc/passwd to pick a cache directory, etc.), so it doesn't work out of the box with the nixpkgs package
18:50:26
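A possible workaround for the two problems described above is to provide a compiler and explicit cache directories through the environment before torch is imported. This is only a sketch under assumptions: the variable names `TORCHINDUCTOR_CACHE_DIR`, `TRITON_CACHE_DIR`, `CC` and `CXX` do not appear in this conversation and should be checked against the installed torch and triton versions, and `compile_env` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def compile_env(base_env, cache_root=None):
    """Return a copy of base_env with cache directories and a compiler filled in."""
    env = dict(base_env)
    cache_root = cache_root or os.path.join(tempfile.gettempdir(), "torch-compile-cache")
    # Assumed variable names: explicit cache locations, so that no directory
    # has to be derived from the user database (/etc/passwd).
    env.setdefault("TORCHINDUCTOR_CACHE_DIR", os.path.join(cache_root, "inductor"))
    env.setdefault("TRITON_CACHE_DIR", os.path.join(cache_root, "triton"))
    # Assumed: CC/CXX are consulted when generated code is compiled at runtime.
    for var, tool in (("CC", "gcc"), ("CXX", "g++")):
        found = shutil.which(tool, path=env.get("PATH"))
        if found and var not in env:
            env[var] = found
    return env
```

It would be applied with `os.environ.update(compile_env(os.environ))` ahead of `import torch`. Values that are already set are left untouched, so a wrapper or dev shell that exports its own compiler still takes precedence.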
@apyh:matrix.orgapyhwas just wondering if there was like a torch-nix chat outside here 18:51:40
@sporeray:matrix.orgRobbie BuxtonAh I’ve recently fixed the gcc issue locally, I was planning to put a PR in upstream this week.18:59:05
@apyh:matrix.orgapyhyou will, for CUDA, also need to set TRITON_LIBCUDA_PATH - it normally tries to find it with ldconfig20:09:52
@sporeray:matrix.orgRobbie Buxton How are you providing your cuda kernel libraries, are you on NixOS or a different distribution? 20:15:01
@sporeray:matrix.orgRobbie Buxton I.e., where are you getting libcuda.so from? 20:15:42
@apyh:matrix.orgapyhI'm in a docker container 😅21:17:32
@apyh:matrix.orgapyhso i just point to /lib64/libcuda.so21:17:44
@sporeray:matrix.orgRobbie Buxton Nix expects it in /run/opengl-driver/lib 21:18:49
@apyh:matrix.orgapyhah yeah I use nix-gl-host21:20:30
@apyh:matrix.orgapyhfor all that21:20:31
@sporeray:matrix.orgRobbie Buxton I’m not sure what the recommended way of doing that is these days but I symlink in all the required libraries to that path 21:20:32
@sporeray:matrix.orgRobbie Buxton I’m confused why triton is struggling to find cuda tho 21:21:01
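The libcuda lookup discussed above can be sketched as a small search over candidate directories that exports the first match as TRITON_LIBCUDA_PATH. The two default directories come from this conversation (/run/opengl-driver/lib and /lib64). The helper names are hypothetical, and whether Triton expects a directory or a full file path in that variable is an assumption to verify against the installed version.

```python
import os


def find_libcuda_dir(candidates, names=("libcuda.so.1", "libcuda.so")):
    """Return the first candidate directory that contains a libcuda file, or None."""
    for directory in candidates:
        if any(os.path.exists(os.path.join(directory, name)) for name in names):
            return directory
    return None


def export_libcuda_path(env, candidates=("/run/opengl-driver/lib", "/lib64")):
    """Set TRITON_LIBCUDA_PATH in `env` if it is unset and a libcuda directory is found."""
    found = find_libcuda_dir(candidates)
    # Assumption: the variable takes the containing directory, not the file itself.
    if found and "TRITON_LIBCUDA_PATH" not in env:
        env["TRITON_LIBCUDA_PATH"] = found
    return found
```

Calling `export_libcuda_path(os.environ)` before importing torch keeps any value the user has already set and skips the ldconfig-based search only when a candidate directory really contains the library.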


