!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

310 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
60 Servers

26 Jun 2024
@hexa:lossy.networkhexaI think every rebuild in the last few months for my homeserver took ~3-5 hours 😄 01:12:33
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @hexa:lossy.network
SomeoneSerge (UTC+3): how would you describe the current state of the cuda maintainers cache?
I'd describe it as "sad". I ditched all but the "default" job because hercules effects (used to update the lock file) were misbehaving. I had patched the hole with a github action and switched to other business
01:17:58
@ss:someonex.netSomeoneSerge (matrix works sometimes)I'm slowly getting my sh-t together, let's chat again about hydra later this week01:44:12
@glepage:matrix.orgGaétan Lepage Who provides cuda_runtime.h ? 13:22:13
@hexa:lossy.networkhexa
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7981/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.ou.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7982/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7983/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) client_loop: send disconnect: Broken pipe
stderr) error: Nix daemon disconnected unexpectedly (maybe it crashed?)
stderr) error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
stderr)        last 1 log lines:
stderr)        > client_loop: send disconnect: Broken pipe
stderr)        For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
13:25:27
@hexa:lossy.networkhexa😭13:25:31
@glepage:matrix.orgGaétan Lepage
In reply to @hexa:lossy.network
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7981/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.ou.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7982/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7983/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) client_loop: send disconnect: Broken pipe
stderr) error: Nix daemon disconnected unexpectedly (maybe it crashed?)
stderr) error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
stderr)        last 1 log lines:
stderr)        > client_loop: send disconnect: Broken pipe
stderr)        For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
8253 ? That's torchWithCuda. (Why the hell do I know that by heart 🫠 ?)
13:26:36
@hexa:lossy.networkhexabecause you are as emotionally damaged by cudaSupport as me13:27:07
@hexa:lossy.networkhexathe only thing worse is being gaslit by bazel build jobs13:27:28
@glepage:matrix.orgGaétan Lepage
In reply to @hexa:lossy.network
the only thing worse is being gaslit by bazel build jobs
I suffer from this too
13:27:56
@ss:someonex.netSomeoneSerge (matrix works sometimes)
❯ ag cuda_runtime.h
...

pkgs/development/python-modules/torch/default.nix
453:        cuda_cudart.dev # cuda_runtime.h and libraries
13:32:00
@hexa:lossy.networkhexathe first 7800 build steps of torchWithCuda are essentially free14:17:54
@hexa:lossy.networkhexaI'm not sure what happens between 7800 and 8253 14:18:01
@hexa:lossy.networkhexait feels like I'm doing proof of work in that range14:18:12
@glepage:matrix.orgGaétan Lepagecuda kernels :/14:18:55
@glepage:matrix.orgGaétan LepageThey are excruciatingly slow14:19:17
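(Editor's sketch, not from the chat.) The step counters in the log above, such as `[7981/8253]`, can be pulled out of build output to see how small a share of the steps the slow tail is. This is a minimal sketch assuming ninja-style `[done/total]` prefixes as they appear in the pasted log; the function names `parse_progress` and `tail_share` are hypothetical, and the 7800 cutoff is only hexa's rough estimate.

```python
import re
from typing import Optional, Tuple

# Matches a ninja-style progress counter such as "[7981/8253]".
PROGRESS = re.compile(r"\[(\d+)/(\d+)\]")


def parse_progress(line: str) -> Optional[Tuple[int, int]]:
    """Return (done, total) from a log line, or None if no counter is present."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tail_share(cutoff: int, total: int) -> float:
    """Fraction of all build steps that come after `cutoff`."""
    return (total - cutoff) / total


if __name__ == "__main__":
    # Shortened sample line; the real log lines carry a full object path.
    sample = "stderr) [7981/8253] Building CUDA object example.cu.o"
    done, total = parse_progress(sample)
    print(f"{total - done} steps left of {total}")
    # 7800 is the rough point after which the build reportedly slows down.
    print(f"share of steps after 7800: {tail_share(7800, total):.1%}")
```

The step count says nothing about wall-clock time per step, which is the point being made in the chat: a small share of the steps can account for most of the build time.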
@ss:someonex.netSomeoneSerge (matrix works sometimes)And it's good to limit parallelism there...14:19:31
@hexa:lossy.networkhexaimage.png
14:20:28
@hexa:lossy.networkhexado we? 😄 14:20:29
@hexa:lossy.networkhexaI mean, it's all userspace load, so it's not too bad14:20:44
@hexa:lossy.networkhexasome 5% system load14:21:06
@hexa:lossy.networkhexa
error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
       last 1 log lines:
       > client_loop: send disconnect: Broken pipe
       For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
15:40:24
@hexa:lossy.networkhexaremote building faiss fails repeatedly 😕15:40:42
@hexa:lossy.networkhexasaw this behavior weeks ago as well15:43:12
@hexa:lossy.networkhexasince I have to build pyannote-audio for whisper-ctranslate2 🫠15:43:41
@connorbaker:matrix.orgconnor (he/him)Doesn't torch still have that problem where, when compiling the CUDA kernels for flash attention, if you don't limit parallelism to 1 then you can easily consume 100GB+ of RAM?16:23:11
@connorbaker:matrix.orgconnor (he/him)I remember enabling ZRAM partly because of that. And I'm pretty sure they were all zero pages too, because they compressed to absolutely nothing lmao16:23:46
@hexa:lossy.networkhexa128GB RAM here16:50:58
@hexa:lossy.networkhexaWith 50% zramswap16:51:15
@hexa:lossy.networkhexaSo yeah, pretty scuffed16:51:26
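
(Editor's sketch, not from the chat.) The memory concern above comes down to simple arithmetic: the number of concurrent compile jobs times the peak memory per job has to fit in RAM. Below is a minimal sketch of that cap. The function name `max_parallel_jobs` is hypothetical, and the 16 GiB per-job figure and 32-core count are assumptions for illustration only, not measurements from this conversation. PyTorch's own build reads a `MAX_JOBS` environment variable; whether and how the nixpkgs derivation forwards such a limit is not confirmed here.

```python
import os
from typing import Optional


def max_parallel_jobs(mem_gib: float, per_job_gib: float,
                      cores: Optional[int] = None) -> int:
    """Cap parallel jobs so jobs * per_job_gib fits in mem_gib (never below 1)."""
    if mem_gib <= 0 or per_job_gib <= 0:
        raise ValueError("memory figures must be positive")
    if cores is None:
        cores = os.cpu_count() or 1
    by_memory = int(mem_gib // per_job_gib)  # jobs that fit in memory
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # 128 GiB is the RAM size mentioned in the chat; 16 GiB per job and
    # 32 cores are ASSUMED values, chosen only to make the arithmetic concrete.
    jobs = max_parallel_jobs(mem_gib=128, per_job_gib=16, cores=32)
    print(f"MAX_JOBS={jobs}")
```

With a per-job peak close to the 100GB+ reported above, the same formula falls back to a single job, which matches the advice to limit parallelism to 1.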