| 26 Jun 2024 |
hexa | I think every rebuild in the last few months for my homeserver took ~3-5 hours | 01:12:33 |
SomeoneSerge (matrix works sometimes) | In reply to @hexa:lossy.network: "SomeoneSerge (UTC+3): how would you describe the current state of the cuda maintainers cache?" I'd describe it as "sad". I ditched all but the "default" job because Hercules effects (used to update the lock file) were misbehaving. I patched the hole with a GitHub Action and moved on to other business | 01:17:58 |
SomeoneSerge (matrix works sometimes) | I'm slowly getting my sh-t together, let's chat again about hydra later this week | 01:44:12 |
Gaétan Lepage | Who provides cuda_runtime.h? | 13:22:13 |
hexa | stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7981/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.ou.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7982/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7983/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) client_loop: send disconnect: Broken pipe
stderr) error: Nix daemon disconnected unexpectedly (maybe it crashed?)
stderr) error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
stderr) last 1 log lines:
stderr) > client_loop: send disconnect: Broken pipe
stderr) For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
| 13:25:27 |
Gaétan Lepage | In reply to @hexa:lossy.network
8253? That's torchWithCuda. (Why the hell do I know that by heart?) | 13:26:36 |
hexa | because you are as emotionally damaged by cudaSupport as me | 13:27:07 |
hexa | the only thing worse is being gaslit by bazel build jobs | 13:27:28 |
Gaétan Lepage | In reply to @hexa:lossy.network: "the only thing worse is being gaslit by bazel build jobs" I suffer from this too | 13:27:56 |
SomeoneSerge (matrix works sometimes) | ⯠ag cuda_runtime.h
...
pkgs/development/python-modules/torch/default.nix
453: cuda_cudart.dev # cuda_runtime.h and libraries
| 13:32:00 |
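As the search above shows, cuda_runtime.h ships in the `dev` output of `cudaPackages.cuda_cudart`. A rough sketch of how a hypothetical derivation might pull it in, mirroring the torch expression; the package name, version, and omitted attributes are placeholders, and listing `cuda_nvcc` in `nativeBuildInputs` is an assumption for packages that also compile `.cu` files:

```nix
# Hypothetical package; only the CUDA-related inputs are shown.
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-app"; # hypothetical name
  version = "0.1";
  # src, buildPhase, etc. omitted

  # nvcc runs on the build platform, so it goes in nativeBuildInputs.
  nativeBuildInputs = [ cudaPackages.cuda_nvcc ];

  # The dev output provides headers such as cuda_runtime.h.
  buildInputs = [ cudaPackages.cuda_cudart.dev ];
}
```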
hexa | the first 7800 build steps of torchWithCuda are essentially free | 14:17:54 |
hexa | I'm not sure what happens between 7800 and 8253 | 14:18:01 |
hexa | it feels like I'm doing proof of work in that range | 14:18:12 |
Gaétan Lepage | cuda kernels :/ | 14:18:55 |
Gaétan Lepage | They are excruciatingly slow | 14:19:17 |
SomeoneSerge (matrix works sometimes) | And it's good to limit parallelism there... | 14:19:31 |
hexa | [image: image.png] | 14:20:28 |
hexa | do we? | 14:20:29 |
hexa | I mean, it's all userspace load, so it's not too bad | 14:20:44 |
hexa | some 5% system load | 14:21:06 |
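On the question of limiting parallelism: on a NixOS build host, both the number of concurrent builds and the core count handed to each build can be capped in the system configuration. A minimal sketch; the values 1 and 8 are arbitrary placeholders, and it assumes the torchWithCuda build takes its job count from `NIX_BUILD_CORES`, which the `cores` setting controls:

```nix
{
  nix.settings = {
    # Maximum number of derivations built concurrently on this machine.
    max-jobs = 1;
    # Exported to each build as NIX_BUILD_CORES (0 means use all cores).
    cores = 8;
  };
}
```

Lowering `cores` trades wall-clock time for peak memory, which is the trade-off raised below about the flash attention kernels.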
hexa | error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
last 1 log lines:
> client_loop: send disconnect: Broken pipe
For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
| 15:40:24 |
hexa | remote building faiss fails repeatedly | 15:40:42 |
hexa | saw this behavior weeks ago as well | 15:43:12 |
hexa | since I have to build pyannote-audio for whisper-ctranslate2 | 15:43:41 |
connor (he/him) | Doesn't torch still have that problem where, when compiling the CUDA kernels for flash attention, if you don't limit parallelism to 1 then you can easily consume 100GB+ of RAM? | 16:23:11 |
connor (he/him) | I remember enabling ZRAM partly because of that. And I'm pretty sure they were all zero pages too, because they compressed to absolutely nothing lmao | 16:23:46 |
hexa | 128GB RAM here | 16:50:58 |
hexa | With 50% zramswap | 16:51:15 |
hexa | So yeah, pretty scuffed | 16:51:26 |
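The "128GB RAM with 50% zramswap" setup roughly corresponds to the NixOS configuration sketched below. This assumes the stock `zramSwap` module and that "50%" refers to `memoryPercent` (the zram device size as a percentage of physical RAM; 50 is also the module default), which gives 128 GB × 0.5 = 64 GB of zram swap, measured in uncompressed data:

```nix
{
  zramSwap = {
    enable = true;
    # Device size as a percentage of RAM: 50% of 128 GB = 64 GB.
    memoryPercent = 50;
  };
}
```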