| 23 Sep 2025 |
| Winter joined the room. | 19:57:49 |
| georgyo joined the room. | 23:08:45 |
| 24 Sep 2025 |
apache8080 | I'm running into a weird issue with TensorRT in a sandboxed environment. I'm trying to run TensorRT models inside the Nix sandbox using Nix's extra-sandbox-paths, which gives the sandbox access to hardware and drivers (e.g. the NVIDIA drivers). With this I can successfully run trtexec to generate TensorRT engines from an ONNX file, but when I try to run inference on those engines in the sandbox it hangs forever. I verified that all of the correct libraries are loaded in the sandbox. What's weird is that the model loads onto the GPU just fine; only the inference calls hang. This only happens in the sandbox, so I think I may be missing some paths/settings that our app requires or that trtexec brings in on its own. Outside of the sandbox I can run our app just fine.
Pretty stuck on this one at the moment
| 01:55:44 |
apache8080 | looks like the issue is on the application side and not a driver/nvidia library issue. extra-sandbox-paths seems to be working fine | 03:06:29 |
connor (he/him) | What HW/host OS/driver/CUDA & TensorRT version?
Generating inference engines with TensorRT in the sandbox is something I want to look into so I’d love to hear more about pain points | 06:01:46 |
Winter | ImportError: /nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN3tsl8profiler8internal21g_trace_filter_bitmapE
anyone know what could be going wrong here? i'm just using bog-standard pythonPackages.tensorflow with cudaSupport = true | 15:58:45 |
Winter | (though maybe unrelated) | 15:58:49 |
Winter | i find it weird that this is even happening given this is all built by us/backendStdenv | 16:00:57 |
Winter | (occurs during the _pywrap_cpu_feature_guard) | 16:03:00 |
Winter | [maybe the wrong channel, lmk if i should move :)] | 16:04:33 |
Winter | disregard | 16:06:29 |
Winter | computers are downright evil | 16:06:34 |
Winter | (the library it's pointing to isn't actually the one it's loading!) | 16:06:50 |
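One way to check which shared objects a process has actually mapped, as opposed to the path named in an error message, is to read `/proc/<pid>/maps`. A minimal sketch, assuming Linux; `mapped_libraries` is a hypothetical helper name and not part of TensorFlow or nixpkgs:

```python
import re
from pathlib import Path


def mapped_libraries(maps_text: str) -> list[str]:
    """Return the sorted, de-duplicated shared-object paths found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Each line: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        # Match "libfoo.so" and versioned names such as "libfoo.so.1.2"
        if re.search(r"\.so(\.\d+)*$", path):
            libs.add(path)
    return sorted(libs)


if __name__ == "__main__":
    maps = Path("/proc/self/maps")
    if maps.exists():  # Linux only
        # For a real diagnosis, call this in the interpreter that attempted the import.
        for lib in mapped_libraries(maps.read_text()):
            print(lib)
```

Comparing this list against the store path in the traceback shows whether a different copy of the library was picked up.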
apyh | is there a server for pytorch stuff specifically, or is this as close as it gets? really struggling to get torch.compile working :/ | 16:38:53 |
Robbie Buxton | What error are you running into apyh? | 16:42:44 |
Duncan Gammie | apyh: you'll probably get the fastest answer if you post specific error messages here: https://discuss.pytorch.org/c/compile/41 | 18:30:20 |
apyh | well, torch's .compile functionality requires a bunch of stuff that isn't provided by its nix derivation (it needs gcc at runtime, it reads /etc/passwd to pick a cache directory, etc.), so it doesn't work out of the box with the nixpkgs package | 18:50:26 |
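The two gaps mentioned here (a compiler at runtime, and a cache directory derived from the user name) can usually be worked around with environment variables. A sketch under assumptions: `TORCHINDUCTOR_CACHE_DIR`, `TRITON_CACHE_DIR`, `CC` and `CXX` are the variables Inductor and Triton are understood to read, which is worth verifying against the versions in use; `compile_env` is a hypothetical helper, not a PyTorch API:

```python
import os
import shutil
import tempfile


def compile_env(cache_root: str) -> dict[str, str]:
    """Build environment overrides for torch.compile in a minimal environment."""
    env = {
        # Explicit cache directories, so no user-name lookup is needed to derive a default.
        "TORCHINDUCTOR_CACHE_DIR": os.path.join(cache_root, "inductor"),
        "TRITON_CACHE_DIR": os.path.join(cache_root, "triton"),
    }
    # Point the compiler variables at whatever is on PATH, if anything.
    cc = shutil.which("gcc")
    cxx = shutil.which("g++")
    if cc:
        env["CC"] = cc
    if cxx:
        env["CXX"] = cxx
    return env


if __name__ == "__main__":
    overrides = compile_env(os.path.join(tempfile.gettempdir(), "torch-compile-cache"))
    os.environ.update(overrides)  # to be safe, do this before importing torch
    for key, value in sorted(overrides.items()):
        print(f"{key}={value}")
```

This only helps if gcc is actually on `PATH` in the environment; the proper fix is for the derivation to provide it.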
apyh | was just wondering if there was like a torch-nix chat outside here | 18:51:40 |
Robbie Buxton | Ah I’ve recently fixed the gcc issue locally, I was planning to put a pr in upstream this week. | 18:58:56 |
apyh | you will, for CUDA, also need to set TRITON_LIBCUDA_PATH - it normally tries to find it with ldconfig | 20:09:52 |
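A minimal sketch of setting `TRITON_LIBCUDA_PATH` without relying on ldconfig. It assumes Triton treats the variable as a directory to search for libcuda, which should be checked against the Triton version in use; the candidate directories are taken from this conversation and `find_libcuda_dir` is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_libcuda_dir(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory that contains a libcuda.so* file, else None."""
    for directory in candidates:
        if any(Path(directory).glob("libcuda.so*")):
            return directory
    return None


if __name__ == "__main__":
    # Candidate directories are assumptions; adjust them for the host or container.
    found = find_libcuda_dir(["/run/opengl-driver/lib", "/lib64"])
    if found is not None:
        os.environ["TRITON_LIBCUDA_PATH"] = found  # set before Triton is imported
    print(found)
```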
Robbie Buxton | How are you providing your cuda kernel libraries, are you on NixOS or a different distribution? | 20:15:01 |
Robbie Buxton | I.e where are you getting libcuda.so from? | 20:15:42 |
apyh | I'm in a docker container 😅 | 21:17:32 |
apyh | so i just point to /lib64/libcuda.so | 21:17:44 |
Robbie Buxton | Nix expects it in /run/opengl-driver/lib | 21:18:49 |
apyh | ah yeah I use nix-gl-host | 21:20:30 |
apyh | for all that | 21:20:31 |
Robbie Buxton | I’m not sure what the recommended way of doing that is these days but I symlink in all the required libraries to that path | 21:20:32 |
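The symlinking approach described here could look roughly like the following. This is a sketch, not a recommended setup: the glob patterns are assumptions about which driver libraries are needed, and `link_driver_libs` is a hypothetical helper:

```python
import os
from pathlib import Path


def link_driver_libs(source_dir, target_dir, patterns=("libcuda.so*", "libnvidia-*.so*")):
    """Symlink driver libraries matching `patterns` from source_dir into target_dir.

    Returns the names of the links that were created.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    linked = []
    for pattern in patterns:
        for lib in sorted(Path(source_dir).glob(pattern)):
            link = target / lib.name
            if link.is_symlink() or link.exists():
                continue  # leave anything already present untouched
            os.symlink(lib.resolve(), link)
            linked.append(lib.name)
    return linked
```

In the container case from this conversation the call would be something like `link_driver_libs("/lib64", "/run/opengl-driver/lib")`, which typically needs root because it writes under `/run`.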
Robbie Buxton | I’m confused why triton is struggling to find cuda tho | 21:21:01 |