| 26 Jun 2024 |
connor (burnt/out) (UTC-8) | Doesn’t torch still have that problem where, when compiling the CUDA kernels for flash attention, if you don’t limit parallelism to 1 then you can easily consume 100GB+ of RAM? | 16:23:11 |
connor (burnt/out) (UTC-8) | I remember enabling ZRAM partly because of that. And I’m pretty sure they were all zero pages too, because they compressed to absolutely nothing lmao | 16:23:46 |
hexa (UTC+1) | 128GB RAM here | 16:50:58 |
hexa (UTC+1) | With 50% zramswap | 16:51:15 |
hexa (UTC+1) | So yeah, pretty scuffed | 16:51:26 |
SomeoneSerge (matrix works sometimes) | Gosh ofborg darn evals are so slow | 18:32:10 |
SomeoneSerge (matrix works sometimes) | Here goes https://github.com/NixOS/nixpkgs/pull/256230 | 19:17:07 |