
NixOS CUDA

282 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



Sender | Message | Time
23 Sep 2025
@hugo:okeso.euHugoEspecially since a working build (previous release) finished in 14 minutes.12:08:09
@gregorburger:matrix.orgGregor BurgerHi, quick question: is there an equivalent of cudaPackages.backendStdenv for clang? 12:11:37
@connorbaker:matrix.orgconnor (he/him)It looks like you’re out of memory and swapping hard; try lowering the number of cores given to the job or the number of parallel instances NVCC runs with (per what Albert said above). I’ve had to enable ZRAM (which has been highly effective) for some builds even on my desktops with 96GB of RAM.12:53:57
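The advice above (fewer cores per build, fewer parallel builds, ZRAM) can be sketched as NixOS configuration; the values below are illustrative, not recommendations:

```nix
# Sketch of a NixOS module capping build parallelism and enabling ZRAM.
# Tune the numbers to your hardware; these are placeholders.
{
  # Cores handed to each build (NIX_BUILD_CORES) and how many
  # derivations build in parallel.
  nix.settings.cores = 8;
  nix.settings.max-jobs = 2;

  # Compressed swap in RAM, helpful for memory-hungry CUDA builds.
  zramSwap = {
    enable = true;
    memoryPercent = 50;
  };
}
```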
@hugo:okeso.euHugo

I have this policy on my server, and a similar one on my desktop. Should that not prevent the OS from swapping?

  systemd.services.nix-daemon.serviceConfig = {
    CPUAccounting = true;
    AllowedCPUs = "2-15";
    MemoryAccounting = true;
    MemoryHigh = "48G";
    MemoryMax = "56G";
  };
12:55:49
@connorbaker:matrix.orgconnor (he/him)Currently no; I’m working on making the setup hooks and everything generic for Clang in https://github.com/NixOS/nixpkgs/pull/437723 but I ran into issues doing that and it’s not a high priority for me in the scope of that PR. Any particular use case?12:56:18
@connorbaker:matrix.orgconnor (he/him)I don’t know enough about systemd to answer that, but I know some of the flash attention kernel builds consume at least a hundred GB of RAM and if you’re seeing the build stall that reminds me of the swapping to disk behavior I’d seen previously. (I may also have misinterpreted the BTOP screen shot.)12:58:25
@hugo:okeso.euHugo Apparently there is an extra setting MemorySwapMax = "0"; that can disallow swap for a systemd unit. 13:37:42
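Combined with the serviceConfig shown earlier, that setting would look roughly like this (a sketch; MemorySwapMax is a standard systemd resource-control option on cgroup v2):

```nix
# Sketch: extend the nix-daemon unit above so its processes
# cannot use swap at all.
systemd.services.nix-daemon.serviceConfig = {
  MemoryAccounting = true;
  MemoryHigh = "48G";
  MemoryMax = "56G";
  MemorySwapMax = "0";  # forbid swap for this unit
};
```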
@gregorburger:matrix.orgGregor BurgerWe would like to compile our codebase both in gcc and clang to get a broader coverage of warnings and errors.14:33:43
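For a plain (non-CUDA) project, building the same derivation against both compilers for warning coverage might be sketched like this; `myProject` is a placeholder, and note this does not wire up the CUDA setup hooks:

```nix
# Hypothetical sketch: one package built with gcc and with clang.
{ pkgs ? import <nixpkgs> { } }:
let
  myProject = stdenv: stdenv.mkDerivation {
    pname = "my-project";
    version = "0.1";
    src = ./.;
    # nativeBuildInputs, buildPhase, etc. elided
  };
in {
  withGcc = myProject pkgs.stdenv;         # default gcc-based stdenv
  withClang = myProject pkgs.clangStdenv;  # clang-based stdenv
}
```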
@winter:catgirl.cloudWinter joined the room.19:57:49
@georgyo:nycr.chatgeorgyo joined the room.23:08:45
24 Sep 2025
@apache8080:matrix.orgapache8080

I'm running into a weird issue with TensorRT and a sandboxed environment. I'm trying to run TensorRT models within the Nix sandbox by leveraging the Nix extra-sandbox-paths setting, which gives the sandbox access to hardware and drivers (e.g. the NVIDIA drivers). I'm able to successfully run trtexec this way to generate TensorRT engines from an ONNX file, but when I try to run inference on those engines in the sandbox, it hangs forever. I verified that all of the correct libraries are loaded in the sandbox. What is weird is that the model loads onto the GPU just fine; it only hangs on inference calls. This happens only in the sandbox, so I think I may be missing some paths/settings that our app requires, or that trtexec brings in on its own. Outside of the sandbox I can run our app just fine.

Pretty stuck on this one at the moment

01:55:44
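The setup described above might look roughly like this as NixOS configuration; the device nodes and driver path are assumptions and vary by host and driver version:

```nix
# Hypothetical sketch: expose GPU device nodes and the host driver
# to the Nix build sandbox. Adjust paths for your machine.
{
  nix.settings.extra-sandbox-paths = [
    "/dev/nvidia0"
    "/dev/nvidiactl"
    "/dev/nvidia-uvm"
    "/run/opengl-driver"  # userspace driver libraries on NixOS
  ];
}
```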
@apache8080:matrix.orgapache8080Looks like the issue is on the application side and not a driver/NVIDIA library issue; extra-sandbox-paths seems to be working fine03:06:29
@connorbaker:matrix.orgconnor (he/him)What HW/host OS/driver/CUDA & TensorRT version? Generating inference engines with TensorRT in the sandbox is something I want to look into so I’d love to hear more about pain points06:01:46
@winter:catgirl.cloudWinter
ImportError: /nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so: undefined symbol: _ZN3tsl8profiler8internal21g_trace_filter_bitmapE
anyone know what could be going wrong here? i'm just using bog-standard pythonPackages.tensorflow with cudaSupport = true
15:58:45
@winter:catgirl.cloudWinter(though maybe unrelated)15:58:49
@winter:catgirl.cloudWinter i find it weird that this is even happening given this is all built by us/backendStdenv 16:00:57
@winter:catgirl.cloudWinter (occurs during the _pywrap_cpu_feature_guard) 16:03:00
@winter:catgirl.cloudWinter[maybe the wrong channel, lmk if i should move :)]16:04:33
@winter:catgirl.cloudWinterdisregard16:06:29
@winter:catgirl.cloudWintercomputers are downright evil16:06:34
@winter:catgirl.cloudWinter(the library it's pointing to isn't actually the one it's loading!)16:06:50
@apyh:matrix.orgapyhis there a server for pytorch stuff specifically, or is this as close as it gets? really struggling to get torch.compile working :/16:38:53
@sporeray:matrix.orgRobbie BuxtonWhat error are you running into apyh?16:42:44
@gammieduncan:matrix.orgDuncan Gammie apyh: you'll probably get the fastest answer to that here if you provide specific error messages here: https://discuss.pytorch.org/c/compile/41 18:30:20
@apyh:matrix.orgapyh
In reply to @sporeray:matrix.org
What error are you running into apyh?
well, torch's .compile functionality requires a bunch of stuff that isn't provided in its Nix derivation: it needs gcc at runtime, it reads /etc/passwd to pick a cache directory, etc., so it doesn't work out of the box through its nixpkgs packaging
18:50:26
@apyh:matrix.orgapyhwas just wondering if there was like a torch-nix chat outside here 18:51:40
@sporeray:matrix.orgRobbie BuxtonAh, I've recently fixed the gcc issue locally; I was planning to put a PR in upstream this week.18:59:05
@apyh:matrix.orgapyhyou will, for CUDA, also need to set TRITON_LIBCUDA_PATH - it normally tries to find it with ldconfig20:09:52
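Putting the two workarounds mentioned above together, a dev shell for torch.compile might be sketched like this; the libcuda path is an assumption for a NixOS host, and depending on the Triton version the variable may need to point at the directory or the .so itself:

```nix
# Hypothetical dev-shell sketch: give torch.compile a runtime C
# compiler and point Triton at the host's libcuda.
{ pkgs ? import <nixpkgs> { config.cudaSupport = true; } }:
pkgs.mkShell {
  packages = [
    (pkgs.python3.withPackages (ps: [ ps.torch ]))
    pkgs.gcc  # torch.compile invokes a C compiler at runtime
  ];
  # Assumed driver library location on NixOS; verify on your system.
  TRITON_LIBCUDA_PATH = "/run/opengl-driver/lib";
}
```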


