| 15 Jul 2025 |
farmerd | That's where the hardware thing comes in. I was seeing hash mismatch errors, and then I tried to verify and repair my nix store and found a bunch of corrupted files (it couldn't repair them). | 21:23:11 |
farmerd | Yeah, I think I've got a DIMM going bad on me. I've been having random crashes throughout the system, and I hadn't put it together until I spent a bunch of time on this yesterday and realized how many random things were corrupted. | 21:24:02 |
farmerd | I've got a new pair of DIMMs coming tomorrow, so I'll swap them in (and probably reinstall Nix, since my nix store is apparently corrupted beyond repair :-/ ) and try again. | 21:24:59 |
farmerd | Oh, may I also ask how to specify the compute capability? I noticed it was passing a bunch of them to NVCC, but I didn't see where to set it. | 21:25:45 |
mcwitt | Regardless of hardware issues, if you're just starting out I don't think you should need to build anything from source.
The reason you're seeing this is that the flake template you linked is pinned to an old revision of nixpkgs-unstable, and the build artifacts have likely expired from cache.nixos.org. For this reason, I'll often update the nixpkgs pin as a first step when starting with a new template. | 23:57:02 |
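A minimal sketch of how one might check whether a given store path is still served by the binary cache before committing to a long local build. It relies on the fact that Nix binary caches expose a `<hash>.narinfo` file for each path they hold, so a HEAD request answering 200 suggests a cache hit and 404 a miss. The helper names (`store_hash`, `narinfo_url`, `is_cached`) are hypothetical, and the store path you feed in is assumed to come from evaluation without building, e.g. `nix eval --raw nixpkgs#hello.outPath`.

```python
import urllib.error
import urllib.request

# Default public binary cache; other caches use the same <hash>.narinfo layout.
CACHE_URL = "https://cache.nixos.org"
STORE_PREFIX = "/nix/store/"


def store_hash(store_path: str) -> str:
    """Return the 32-character hash part of a /nix/store/<hash>-<name> path."""
    if not store_path.startswith(STORE_PREFIX):
        raise ValueError(f"not a store path: {store_path}")
    hash_part = store_path[len(STORE_PREFIX):].split("-", 1)[0]
    if len(hash_part) != 32:
        raise ValueError(f"unexpected hash length in: {store_path}")
    return hash_part


def narinfo_url(store_path: str, cache: str = CACHE_URL) -> str:
    """Build the narinfo URL the cache would serve for this store path."""
    return f"{cache}/{store_hash(store_path)}.narinfo"


def is_cached(store_path: str, cache: str = CACHE_URL, timeout: float = 10.0) -> bool:
    """HEAD the narinfo: True on 200, False on 404, re-raise anything else."""
    req = urllib.request.Request(narinfo_url(store_path, cache), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

If `is_cached` returns False for the package you want, bumping the nixpkgs pin (as suggested above) is one way to land on a revision whose outputs are more likely to be in the cache.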
| 16 Jul 2025 |
farmerd | Ok, that makes sense. | 01:09:44 |
connor (he/him) | See the end of the first section https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/cuda.section.md#cuda-cuda | 07:02:27 |
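The linked section documents the nixpkgs config options `cudaSupport` and `cudaCapabilities` (a list of strings such as `"8.6"`). As a hedged sketch, assuming a driver recent enough for `nvidia-smi` to support the `compute_cap` query field, this Python helper reads the local GPUs' capabilities and formats them as a Nix list you could paste into that config. The function names are hypothetical.

```python
import subprocess


def parse_compute_caps(nvidia_smi_output: str) -> list[str]:
    """Turn one-capability-per-line output into a de-duplicated, ordered list."""
    caps: list[str] = []
    for line in nvidia_smi_output.splitlines():
        cap = line.strip()
        if cap and cap not in caps:
            caps.append(cap)
    return caps


def query_compute_caps() -> list[str]:
    """Ask nvidia-smi for each GPU's compute capability (e.g. "8.6")."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_compute_caps(out)


def to_nix_list(caps: list[str]) -> str:
    """Format capabilities as a Nix list literal, e.g. [ "8.6" ]."""
    return "[ " + " ".join(f'"{c}"' for c in caps) + " ]"
```

Calling `print(f"cudaCapabilities = {to_nix_list(query_compute_caps())};")` on a machine with GPUs would print a line suitable for the nixpkgs `config` attribute set; restricting the list to your own GPUs should also cut down the number of architectures passed to NVCC.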
| 18 Jul 2025 |
connor (he/him) | Could I get a review on https://github.com/NixOS/nixpkgs/pull/426280? | 19:20:16 |
| 21 Jul 2025 |
connor (he/him) | Went ahead and merged it | 17:20:26 |
| 23 Jul 2025 |
apyh | oof the nccl version in nixpkgs is quite old now | 16:30:38 |
apyh | (quite old in the ml world, lol. only a month old) | 16:31:25 |
apyh | torchtitan needs torch 2.8, torch 2.8 requires nccl 2.27, gotta update nccl myself | 16:31:49 |
apyh | guess I'll pr to nixpkgs lol | 16:31:56 |
apyh | pr opened 😁 | 16:59:39 |
Gaétan Lepage | Can you share the link apyh? | 22:56:02 |
apyh | ah sure! https://github.com/NixOS/nixpkgs/pull/427804 | 23:00:23 |
apyh | they added a bunch of new stuff, so i have to patch the shebang in a second python script. surprisingly, leaving it unpatched didn't cause a build failure; it just didn't export some of the new symbols | 23:01:02 |
Gaétan Lepage | Thanks! | 23:03:49 |
| 24 Jul 2025 |
apyh | huh. thanks for the nixpkgs-review. very strange to me that it fails to build pytorch as a result, but that the python 3.13 failure is just a bunch of .. warnings inside torch? i'll compile again locally to see.. | 14:56:24 |
apyh | can't repro the build failure locally for python312Packages.torchWithCuda Gaétan Lepage 🤔 left a comment here to that effect https://github.com/NixOS/nixpkgs/pull/427804#issuecomment-3114819745 | 20:26:13 |
apyh | can't repro any of the build failures in fact, only took 3.5 hours per torch to test 😭 | 23:51:03 |
| 25 Jul 2025 |
Gaétan Lepage | It probably failed because of flakiness | 10:57:16 |
apyh | rebased it btw :) | 17:29:53 |
apyh | both builds worked fine on my machine.. does nixpkgs-review have a timeout? lol | 17:30:06 |
apyh | i have a 7800x3d and it still took 3.5 hours per torch build | 17:30:26 |
| 26 Jul 2025 |
Tristan Ross | Is that a PR that my 128 cores could be useful with? | 00:34:02 |
apyh | haha i mean, if you have the ram to match ;) | 01:07:29 |
apyh | it builds fine on my end - just a verification from someone else would be nice :) | 01:07:40 |
Gaétan Lepage | Any objection to merging the nccl bump?
https://github.com/NixOS/nixpkgs/pull/427804 | 09:26:39 |