| 30 Sep 2025 |
SomeoneSerge (back on matrix) | * Gaétan Lepage: wdyt throwing hydra at it? I can edit the input, if that's ok | 08:36:55 |
Gaétan Lepage | Sure, let's go for it. | 11:24:17 |
SomeoneSerge (back on matrix) | connor (he/him) (UTC-7): https://hydra.nixos-cuda.org/eval/44 | 12:24:07 |
connor (burnt/out) (UTC-8) | I'm a dummy, the option I wanted was one more down and doesn't have a short flag:
-P, --skip-package SKIP_PACKAGE
Packages to not build (can be passed multiple times)
--skip-package-regex SKIP_PACKAGE_REGEX
Regular expression that package attributes have not to match (can be passed multiple times)
| 12:43:21 |
Winter | 2025-09-30 15:11:37.818161: W external/local_xla/xla/service/gpu/llvm_gpu_backend/default/nvptx_libdevice_path.cc:40] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
...
/usr/local/cuda
/opt/cuda
/nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/../nvidia/cuda_nvcc
/nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/../../nvidia/cuda_nvcc
/nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/cuda
/nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/../../../..
/nix/store/d2b95k4ysi7822hnxq72np5vvfq7wbbp-python3.12-tensorflow-gpu-2.19.0/lib/python3.12/site-packages/tensorflow/../../../../..
anyone ever see smth like this before? | 19:14:29 |
Winter | grepped around existing issues/prs a bit but no dice | 19:14:57 |
| 1 Oct 2025 |
connor (burnt/out) (UTC-8) | nvvm is a subdirectory of cuda_nvcc pre-CUDA 13.0; I don’t remember which output it’s in though. Seems like the error is mostly about being unable to find that. | 01:03:41 |
SomeoneSerge (back on matrix) | I think we put some config next to bin/nvcc that points at the correct libdevice location? Used to be in the overrides | 15:46:38 |
Kevin Mittman (EOY sleep) | In CUDA 13, there was a split from cuda_nvcc to cuda_crt, libnvvm, and libnvptxcompiler components | 15:47:11 |
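For anyone hitting the same warning: the log above says XLA looks for `${CUDA_DIR}/nvvm/libdevice` in each candidate directory. A minimal sketch of that check, handy for confirming which store path actually carries libdevice before pointing XLA at it. The helper names `has_libdevice` and `first_cuda_dir_with_libdevice` are made up for illustration; they are not part of TensorFlow or nixpkgs.

```python
from pathlib import Path


def has_libdevice(cuda_dir):
    """Return True if cuda_dir contains an nvvm/libdevice directory.

    Mirrors the layout named in the XLA warning:
    ${CUDA_DIR}/nvvm/libdevice. Hypothetical helper, not an XLA API.
    """
    return (Path(cuda_dir) / "nvvm" / "libdevice").is_dir()


def first_cuda_dir_with_libdevice(candidates):
    """Return the first candidate that has nvvm/libdevice, or None."""
    for candidate in candidates:
        if has_libdevice(candidate):
            return candidate
    return None
```

Usage would be something like `first_cuda_dir_with_libdevice(["/usr/local/cuda", "/opt/cuda", some_store_path])`, where `some_store_path` is whichever cuda_nvcc output you suspect.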
Winter | this is just using pythonPackages.tensorflow w/ config.cudaSupport on 25.05 -- so that's CUDA 12, right? | 19:12:26 |
Winter | dunno why this would be sad then | 19:12:38 |
SomeoneSerge (back on matrix) | At what stage do you get that error? | 19:20:29 |
Winter | runtime, after i've successfully imported tf -- i then get some JIT compilation error | 20:15:31 |
Winter | seems like XLA_FLAGS may be my friend here | 20:16:43 |
Winter | ok yeah setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/nix/store/y16bl5h9nxdbyfs922x4bz9lkk51kx1d-cuda_nvcc-12.8.93 fixed it! | 20:20:09 |
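The same workaround can be applied from inside a Python script rather than the shell. This is a rough sketch, assuming the variable is set before TensorFlow is imported so XLA sees it when it first parses its flags. The store path is the one from the message above and will differ on other systems; `with_cuda_data_dir` is a hypothetical helper, not a TensorFlow function.

```python
import os


def with_cuda_data_dir(cuda_dir, existing=""):
    """Return an XLA_FLAGS string with --xla_gpu_cuda_data_dir set.

    Any other flags in `existing` are kept; a previous
    --xla_gpu_cuda_data_dir entry is replaced.
    """
    prefix = "--xla_gpu_cuda_data_dir="
    kept = [flag for flag in existing.split() if not flag.startswith(prefix)]
    return " ".join(kept + [prefix + cuda_dir])


# Store path copied from the chat message; an assumption for any other machine.
CUDA_NVCC = "/nix/store/y16bl5h9nxdbyfs922x4bz9lkk51kx1d-cuda_nvcc-12.8.93"

os.environ["XLA_FLAGS"] = with_cuda_data_dir(
    CUDA_NVCC, os.environ.get("XLA_FLAGS", "")
)

# Import TensorFlow only after the environment variable is in place:
# import tensorflow as tf
```

Merging into the existing value rather than overwriting it keeps any other XLA flags the user already had set.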
Winter | i think this is just a simple fix in tensorflow-bin | 20:20:19 |
Winter | i don’t particularly want to build tf-bin again but i’m going to get a PR up to fix this :-) | 21:57:28 |
Winter | i assume we have the concept of a check for whatever cuda version is in use — connor (he/him) (UTC-7) do you want a check for <13 before 13 is actually in, as a reminder of sorts to test this out with 13? (still need an MRE tho) | 21:58:24 |
Winter | for clarity this is some JIT’d XLA stuff | 21:59:17 |
connor (burnt/out) (UTC-8) | IIRC on 13 I symlink it so it should still be available in NVCC’s bin output | 22:38:35 |