!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

282 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



9 Sep 2025
@hugo:okeso.eu Hugo

Hello!

I am trying to launch tests that rely on CUDA, following help from Gaétan Lepage yesterday, but I am not succeeding.
My host is running nixpkgs 25.05, with a clone of nixpkgs/master in the current directory.

Any idea how to get this to run?

09:36:10
@hugo:okeso.eu Hugo
nix-build -I nixpkgs=. --arg config '{ allowUnfree = true; cudaSupport = true;}' -A python313Packages.triton.tests.axpy-cuda.gpuCheck
this derivation will be built:
  /nix/store/2m1zkm221qr6ziw2qkbds3r37r57f7xj-test-cuda.drv
building '/nix/store/2m1zkm221qr6ziw2qkbds3r37r57f7xj-test-cuda.drv'...
Traceback (most recent call last):
  File "/nix/store/biwmrywsnh5nvfxg13d319cx65956rvc-tester-cuda/bin/tester-cuda", line 38, in <module>
    x = torch.rand(size, device='cuda')
  File "/nix/store/419qp86g5l617y4pv5m0fgj04rhfnxrp-python3-3.13.6-env/lib/python3.13/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
    torch._C._cuda_init()
    ~~~~~~~~~~~~~~~~~~~^^
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
error: builder for '/nix/store/2m1zkm221qr6ziw2qkbds3r37r57f7xj-test-cuda.drv' failed with exit code 1;
       last 7 log lines:
       > Traceback (most recent call last):
       >   File "/nix/store/biwmrywsnh5nvfxg13d319cx65956rvc-tester-cuda/bin/tester-cuda", line 38, in <module>
       >     x = torch.rand(size, device='cuda')
       >   File "/nix/store/419qp86g5l617y4pv5m0fgj04rhfnxrp-python3-3.13.6-env/lib/python3.13/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
       >     torch._C._cuda_init()
       >     ~~~~~~~~~~~~~~~~~~~^^
       > RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
       For full logs, run:
         nix log /nix/store/2m1zkm221qr6ziw2qkbds3r37r57f7xj-test-cuda.drv
 python
Python 3.12.11 (main, Jun  3 2025, 15:41:47) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
nix-shell -I nixpkgs=. --arg config '{ allowUnfree = true; cudaSupport = true;}' -p python312Packages.torch

[nix-shell:~/Repos/hoh/nixpkgs]$ python
Python 3.12.11 (main, Jun  3 2025, 15:41:47) [GCC 14.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
09:36:11
@albertlarsan68:albertlarsan.fr Albert Larsan joined the room. 13:31:04
@albertlarsan68:albertlarsan.fr Albert Larsan Hello!
I am packaging the CUDA plugin for xmrig.
It supports versions all the way down to CUDA 8, so it is made to use the CUDA toolkit.
I am able to make it build with the split packages, but it involves ugly hacks with env vars, as the file /nix/store/vg32acb8vlqyhkhabbgvmralfw0kwhi3-cuda_cudart-12.8.90-dev/include/cuda_runtime.h can't find the crt/host_config.h file.
13:56:49
@aciceri:nixos.dev aciceri changed their display name from zrsk to aciceri. 15:01:30
@albertlarsan68:albertlarsan.fr Albert Larsan Fixed it by using the "root" packages instead of using specific outputs. 15:33:48
@philipdb:matrix.org PhiliPdB joined the room. 18:44:33
@connorbaker:matrix.org connor (burnt/out) (UTC-8) You're trying to run tests which require a GPU in the sandbox. Make sure you've enabled the nix-required-mounts NixOS module option (https://search.nixos.org/options?channel=25.05&show=programs.nix-required-mounts.presets.nvidia-gpu.enable&query=nix-required-mounts) and try again. 23:29:21
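For reference, a minimal sketch of what enabling that module might look like in a NixOS configuration. The preset option path is taken from the linked options search; the top-level `enable` flag alongside it is an assumption, so check the module's options before copying:

```nix
# Hedged sketch: expose the host's NVIDIA driver inside the Nix build
# sandbox so GPU-requiring tests can run. The preset option name comes
# from the search.nixos.org link above; `enable` is assumed to exist.
{
  programs.nix-required-mounts = {
    enable = true;
    presets.nvidia-gpu.enable = true;
  };
}
```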
@connorbaker:matrix.org connor (burnt/out) (UTC-8) Do you have an example of what you were trying to do previously which wasn't working? 23:31:08
10 Sep 2025
@ss:someonex.net SomeoneSerge (back on matrix) changed their display name from SomeoneSerge (@nixcon & back on matrix) to SomeoneSerge (back on matrix). 00:35:23
@ss:someonex.net SomeoneSerge (back on matrix) softdep issues again? 00:43:26
@zowoq:matrix.org zowoq It only ever takes a couple of hours for our hydra to catch up. After the latest staging-next merge and subsequent nixos-unstable-small channel bump it took less than five hours for the full rebuild of the cuda jobset. 04:06:52
@albertlarsan68:albertlarsan.fr Albert Larsan Here is my first version, before I went ahead and created a PR to add it to upstream nixpkgs: https://git.sr.ht/~albertlarsan68/nur/tree/8fcbc4612bcd097065c5691ca18cbc8f0e0825a0/item/pkgs/xmrig-cuda-mo/default.nix And here is the PR: https://github.com/NixOS/nixpkgs/pull/441494 05:59:05
@hugo:okeso.eu Hugo

Thanks connor (he/him) (UTC+2).

I can now launch the triton test.

However, when attempting to launch tests on the unsloth library, nix builds Torch for Python 3.13 instead of Python 3.12.
Torch is not supported on Python 3.13 yet; it still attempts to build, however, which confuses me.

diff --git a/pkgs/development/python-modules/unsloth/default.nix b/pkgs/development/python-modules/unsloth/default.nix
index 73f94721b5e0..e6473c3bfa1d 100644
--- a/pkgs/development/python-modules/unsloth/default.nix
+++ b/pkgs/development/python-modules/unsloth/default.nix
@@ -27,6 +27,9 @@
   hf-transfer,
   diffusers,
   torchvision,
+
+  # tests
+  cudaPackages,
 }:
 
 buildPythonPackage rec {
@@ -85,6 +88,19 @@ buildPythonPackage rec {
   # NotImplementedError: Unsloth: No NVIDIA GPU found? Unsloth currently only supports GPUs!
   dontUsePythonImportsCheck = true;
 
+  passthru.tests = {
+    import-cuda = cudaPackages.writeGpuTestPython
+      {
+        libraries = ps: [
+          ps.torch
+        ];
+      }
+      ''
+        import unsloth
+        unsloth.test()
+      '';
+  };
+
   meta = {
     description = "Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory";
     homepage = "https://github.com/unslothai/unsloth";
nix-build -I nixpkgs=. --arg config '{ allowUnfree = true; cudaSupport = true;}' -A python312Packages.unsloth.tests.import-cuda
07:12:50
@glepage:matrix.org Gaétan Lepage Why shouldn't torch support 3.13? 07:15:31
@hugo:okeso.eu Hugo Sorry, tensorflow does not support Python 3.13 07:16:41
@hugo:okeso.eu Hugo How can I get the passthru tests to use the specified Python interpreter? 07:20:59
@ss:someonex.net SomeoneSerge (back on matrix) connor (he/him) (UTC+2): are 7AM (2PM UTC) meetings still OK for you? Would you like to reschedule to a different time? We could have Kevin Mittman join us the next time or other 07:48:33
@ss:someonex.net SomeoneSerge (back on matrix) Oh cool, I wonder if this is the 1st project consuming cuda or torch via meson in nixpkgs 07:52:33
@ss:someonex.net SomeoneSerge (back on matrix) AFAIK NVIDIA has never had any objections to ZLUDA, only AMD did 07:55:35
@hugo:okeso.eu Hugo

I managed to launch a test on my package with CUDA enabled, but I get an error from triton not finding a C compiler. Does that ring a bell for anyone?

RuntimeError: Failed to find C compiler. Please specify via CC environment variable or set triton.knobs.build.impl.

I share my work in progress in a draft PR here: https://github.com/NixOS/nixpkgs/pull/441728

09:59:29
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @hugo:okeso.eu

I managed to launch a test on my package with CUDA enabled, but I get an error from triton not finding a C compiler. Does that ring a bell for anyone?

RuntimeError: Failed to find C compiler. Please specify via CC environment variable or set triton.knobs.build.impl.

I share my work in progress in a draft PR here: https://github.com/NixOS/nixpkgs/pull/441728

At the very least we recently stopped early-binding rocm libraries, maybe hard-coded compiler paths went with them. Try giving it a compiler at test time as suggested in the error?
10:05:31
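One way to follow that suggestion, sketched against the `writeGpuTestPython` test from the unsloth diff above. This is untested: it assumes `lib` and `stdenv` are in the package's argument set, and that setting `CC` from inside the test script before triton's JIT runs is enough:

```nix
# Hypothetical sketch: export CC before triton compiles anything, so its
# JIT can find a C compiler inside the sandbox. Assumes `lib` and `stdenv`
# are available in scope; lib.getExe' resolves the compiler's store path.
passthru.tests.import-cuda = cudaPackages.writeGpuTestPython
  {
    libraries = ps: [ ps.torch ];
  }
  ''
    import os
    os.environ.setdefault("CC", "${lib.getExe' stdenv.cc "cc"}")
    import unsloth
    unsloth.test()
  '';
```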
@hugo:okeso.eu Hugo I (vibe) tried to give it a compiler in this commit https://github.com/NixOS/nixpkgs/pull/441728/commits/81f7997ca1ca37193f2f26fdbc85c586a92ba6dd but was unsuccessful. Any suggestion on how to do that? 10:06:58
@matthewcroughan:defenestrate.it matthewcroughan changed their display name from matthewcroughan @ nixcon to matthewcroughan. 15:02:50
@lt1379:matrix.org Lun That should only impact ROCm unless I messed it up! diff was https://github.com/NixOS/nixpkgs/commit/c74e5ffb6526ac1b4870504921b9ba9362189a17 15:52:00
@layus:matrix.org layus joined the room. 18:26:24
@layus:matrix.org layus Is this team involved in the flox/NVIDIA partnership? (See https://flox.dev/cuda/) I guess so, since the NixOS Foundation also is, but there is no mention of this team or its amazing work. 18:30:16
@matthewcroughan:defenestrate.it matthewcroughan

adrian-gierakowski:

!!! Exception during processing !!! HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/execution.py", line 277, in process_inputs
    result = f(**inputs)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/nodes.py", line 74, in encode
    return (clip.encode_from_tokens_scheduled(tokens), )
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd.py", line 170, in encode_from_tokens_scheduled
    pooled_dict = self.encode_from_tokens(tokens, return_pooled=return_pooled, return_dict=True)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd.py", line 232, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd1_clip.py", line 689, in encode_token_weights
    out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd1_clip.py", line 45, in encode_token_weights
    o = self.encode(to_encode)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd1_clip.py", line 291, in encode
    return self(tokens)
  File "/nix/store/jzm64j9dp50xs770h3w7n8h9pj6mpkjp-python3.13-torch-2.8.0/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/nix/store/jzm64j9dp50xs770h3w7n8h9pj6mpkjp-python3.13-torch-2.8.0/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd1_clip.py", line 253, in forward
    embeds, attention_mask, num_tokens, embeds_info = self.process_tokens(tokens, device)
                                                      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/sd1_clip.py", line 204, in process_tokens
    tokens_embed = self.transformer.get_input_embeddings()(tokens_embed, out_dtype=torch.float32)
  File "/nix/store/jzm64j9dp50xs770h3w7n8h9pj6mpkjp-python3.13-torch-2.8.0/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/nix/store/jzm64j9dp50xs770h3w7n8h9pj6mpkjp-python3.13-torch-2.8.0/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/ops.py", line 270, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/nix/store/dg5g3ypdsjvy0274156l74klx4wr0nbx-comfyui-unstable-2025-09-06/lib/python3.13/site-packages/comfy/ops.py", line 266, in forward_comfy_cast_weights
    return torch.nn.functional.embedding(input, weight, self.padding_idx, self.max_norm, self.norm_type, self.scale_grad_by_freq, self.sparse).to(dtype=output_dtype)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/jzm64j9dp50xs770h3w7n8h9pj6mpkjp-python3.13-torch-2.8.0/lib/python3.13/site-packages/torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
19:53:26
@matthewcroughan:defenestrate.it matthewcroughan After a whole day recompiling torch :) 19:53:32
@matthewcroughan:defenestrate.it matthewcroughan

Actually, with the HSA override to 11.0.0 it worked, but I get a different kind of error

loaded completely 30779.053608703613 1639.406135559082 True
  0%|                                    | 0/1 [00:00<?, ?it/s:0:rocdevice.cpp            :3020: 78074348282d us:  Callback: Queue 0x7f831c600000 aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f
19:58:53
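For context, a minimal sketch of the override being discussed. `HSA_OVERRIDE_GFX_VERSION` makes ROCm's runtime report a specific GPU ISA (11.0.0 corresponds to gfx1100/RDNA3); the chosen value must match an ISA the torch build actually ships kernels for, otherwise the queue aborts with `HSA_STATUS_ERROR_INVALID_ISA` as in the log above:

```shell
# Hedged sketch: force ROCm to target the gfx1100 (RDNA3) ISA.
# Pick a value matching an architecture your torch build was compiled for.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
echo "HSA override set to $HSA_OVERRIDE_GFX_VERSION"
```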


