NixOS CUDA - Public Room Timeline

	NixOS CUDA	282 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	58 Servers


Sender	Message	Time
23 Sep 2025
	Winter joined the room.	19:57:49
	georgyo joined the room.	23:08:45
24 Sep 2025
apache8080	I'm running into a weird issue with TensorRT in a sandboxed environment. I'm trying to run TensorRT models inside the Nix sandbox by using Nix's extra-sandbox-paths setting, which gives the sandbox access to hardware and drivers (e.g. the NVIDIA drivers). I can successfully run trtexec this way to generate TensorRT engines from an ONNX file, but when I try to run inference on those engines in the sandbox, it hangs forever. I verified that all of the correct libraries are loaded in the sandbox. What's weird is that the model loads onto the GPU just fine; only the inference calls hang. This only happens in the sandbox, so I think I may be missing some paths/settings that our app requires or that trtexec brings in on its own. Outside of the sandbox I can run our app just fine. Pretty stuck on this one at the moment.	01:55:44
apache8080	looks like the issue is on the application side and not a driver/nvidia library issue. extra-sandbox-paths seems to be working fine	03:06:29
connor (he/him)	What HW/host OS/driver/CUDA & TensorRT version? Generating inference engines with TensorRT in the sandbox is something I want to look into so I’d love to hear more about pain points	06:01:46
Winter	`ImportError: /nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN3tsl8profiler8internal21g_trace_filter_bitmapE` anyone know what could be going wrong here? i'm just using bog-standard `pythonPackages.tensorflow` with `cudaSupport = true`	15:58:45
Winter	(though maybe unrelated)	15:58:49
Winter	i find it weird that this is even happening given this is all built by us/`backendStdenv`	16:00:57
Winter	(occurs during the `_pywrap_cpu_feature_guard`)	16:03:00
Winter	[maybe the wrong channel, lmk if i should move :)]	16:04:33
Winter	disregard	16:06:29
Winter	computers are downright evil	16:06:34
Winter	(the library it's pointing to isn't actually the one it's loading!)	16:06:50
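
One way to check which copy of a shared library a process has actually loaded, as opposed to the path an error message names, is to read the process's memory map. A minimal sketch, assuming Linux and its `/proc/self/maps` file; the helper names `mapped_files` and `loaded_libraries` are hypothetical and not from the conversation:

```python
def mapped_files(maps_text, needle):
    """Return the sorted, de-duplicated file paths in a /proc/<pid>/maps
    dump whose path contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname].
        # Anonymous mappings have no pathname, so they yield only 5 fields.
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def loaded_libraries(needle):
    """Paths mapped into the *current* process that contain `needle` (Linux only)."""
    with open("/proc/self/maps") as f:
        return mapped_files(f.read(), needle)
```

Calling `loaded_libraries("tensorflow")` after the import attempt would list the store paths that were really mapped, which can then be compared against the path in the `ImportError`.
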
apyh	is there a server for pytorch stuff specifically, or is this as close as it gets? really struggling to get torch.compile working :/	16:38:53
Robbie Buxton	What error are you running into apyh?	16:42:44
Duncan Gammie	apyh: you'll probably get the fastest answer if you post the specific error messages here: https://discuss.pytorch.org/c/compile/41	18:30:20
apyh	In reply to @sporeray:matrix.org: well, torch's .compile functionality requires a bunch of stuff that isn't provided in its nix derivation (it needs gcc at runtime, it reads an /etc/passwd file to pick a cache directory, etc.), so it doesn't work out of the box through its nixpkgs packaging	18:50:26
apyh	was just wondering if there was like a torch-nix chat outside here	18:51:40
Robbie Buxton	Ah I’ve recently fixed the gcc issue locally, I was planning to put a PR in upstream this week.	18:58:56
apyh	you will, for CUDA, also need to set TRITON_LIBCUDA_PATH - it normally tries to find it with ldconfig	20:09:52
Robbie Buxton	How are you providing your cuda kernel libraries? Are you on NixOS or a different distribution?	20:15:01
Robbie Buxton	I.e., where are you getting `libcuda.so` from?	20:15:42
apyh	I'm in a docker container 😅	21:17:32
apyh	so i just point to /lib64/libcuda.so	21:17:44
Robbie Buxton	Nix expects it in `/run/opengl-driver/lib`	21:18:49
apyh	ah yeah I use nix-gl-host	21:20:30
apyh	for all that	21:20:31
Robbie Buxton	I’m not sure what the recommended way of doing that is these days but I symlink in all the required libraries to that path	21:20:32
Robbie Buxton	I’m confused why triton is struggling to find cuda tho	21:21:01
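
The `torch.compile` workarounds mentioned in this thread can be collected into one set of environment variables that is applied before `torch` is imported. A minimal sketch: only `TRITON_LIBCUDA_PATH` is named in the conversation. Using `TORCHINDUCTOR_CACHE_DIR` to avoid the user lookup behind the default cache directory, and `CC` to point Triton at a compiler, are assumptions on my part. The helper name `torch_compile_env` is hypothetical.

```python
import shutil


def torch_compile_env(libcuda_dir, cache_dir, compiler="gcc"):
    """Build the environment variables for torch.compile outside a standard FHS setup.

    libcuda_dir: directory assumed to contain libcuda.so, e.g.
                 /run/opengl-driver/lib on NixOS or /lib64 in a container.
    cache_dir:   writable directory for compiled artifacts (assumption: an
                 explicit cache dir avoids the user-name lookup).
    compiler:    name of the C compiler to look up on PATH.
    """
    env = {
        # Named in the thread: otherwise Triton searches via ldconfig.
        "TRITON_LIBCUDA_PATH": libcuda_dir,
        # Assumption: explicit inductor cache location.
        "TORCHINDUCTOR_CACHE_DIR": cache_dir,
    }
    cc = shutil.which(compiler)
    if cc is not None:
        # Assumption: Triton honours CC when it builds its helper modules.
        env["CC"] = cc
    return env
```

To use it, call `os.environ.update(torch_compile_env("/run/opengl-driver/lib", "/tmp/inductor-cache"))` before `import torch`. If no compiler is found on `PATH`, `CC` is left unset rather than guessed.
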
