13 Jul 2025 |
| @me:caem.dev left the room. | 00:13:30 |
15 Jul 2025 |
| farmerd joined the room. | 03:17:28 |
farmerd | I don't know if anyone has a minute to help double check me on something quickly but I've tried about half a dozen different ways to get pytorch working on nixos with cuda and I am continually getting build errors. This flake (https://github.com/mschoder/nix-cuda-template) seemed like something that perhaps someone else could quickly check to see if the compilation issues I'm seeing are just me or more widespread? For me it actually generates a segfault in GCC so it's quite bizarre. | 03:23:11 |
mcwitt | Hi farmerd, could you say a bit more about what you're trying to do and what specific errors you see?
For basic pytorch usage with the CUDA backend, the following minimal flake seems to work fine for me (just tested on nixpkgs-unstable): https://gist.github.com/mcwitt/b6c8da58a2e1fcbc1c2728f8f60ad136
| 18:04:39 |
farmerd | I'm just trying to get pytorch working with my gpu. But whatever I try to do it ends up trying to build the cuda toolkit and GCC has an internal segfault when trying to build NCCL. | 21:16:02 |
farmerd | I think my current suspicion is that I've got a hardware issue though so I'm going to try addressing that tomorrow and see if I still have issues. | 21:17:17 |
mcwitt | have you tried updating the nixpkgs pin? (nix flake update nixpkgs). That at least should let you use a cached toolkit and skip the build (unless you're also overriding for some reason) | 21:19:16 |
mcwitt | if your goal is just to get a python env running with pytorch and CUDA, I'd recommend starting with a more minimal flake (like the one I posted above) | 21:20:42 |
mcwitt | * if your goal is just to get a python env running with CUDA-enabled pytorch (versus wanting to compile CUDA code), I'd recommend starting with a more minimal flake (like the one I posted above) | 21:22:05 |
connor (he/him) (UTC-7) | Not sure about segfaults (I had them regularly if my RAM was clocked too high or voltage was unstable etc), but make sure you’re enabling cudaSupport and specifying your GPU’s compute capability for faster builds. | 21:22:40 |
farmerd | That's where the hardware thing comes in. I was seeing issues about hash mismatches and then I tried to verify and repair my nix-store and it's got a bunch of corrupted files (it couldn't repair it). | 21:23:11 |
farmerd | Yeah I think I've got a dimm going bad on me. I've been having random crashes throughout the system and I hadn't put it together until I spent a bunch of time on this yesterday and realized how many random things were corrupted. | 21:24:02 |
farmerd | I've got a new pair of dimms coming tomorrow so I'll swap them in (and probably reinstall nix since my nix-store is apparently corrupted beyond repair :-/ ) and try again. | 21:24:59 |
farmerd | Oh, although may I ask how to specify the compute capability? I did notice it was passing a bunch of them to NVCC but I didn't see how to specify it. | 21:25:45 |
mcwitt | regardless of hardware issues, if you're just starting out I don't think you should need to build anything from source.
The reason you're seeing this is the flake template you linked is pinned to an old revision of nixpkgs-unstable, and the build artifacts have likely expired from cache.nixos.org. I'll often update the nixpkgs pin as a first step when starting with a new template for this reason | 23:57:02 |
16 Jul 2025 |
farmerd | Ok, that makes sense. | 01:09:44 |
connor (he/him) (UTC-7) | See the end of the first section https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/cuda.section.md#cuda-cuda | 07:02:27 |
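For readers following along, here is a minimal sketch of what the linked section describes, not the flake from either link above. It assumes an x86_64-linux machine and a GPU with compute capability 8.6; both are placeholders to substitute with your own values. `allowUnfree`, `cudaSupport`, and `cudaCapabilities` are nixpkgs config attributes; the surrounding flake layout is only illustrative.

```nix
{
  # Assumption: tracking nixpkgs-unstable, as in the discussion above.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

  outputs = { self, nixpkgs }:
    let
      # Placeholder: change to match your platform.
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config = {
          # The CUDA toolkit is unfree, so it must be allowed explicitly.
          allowUnfree = true;
          # Build packages that support it (e.g. torch) with CUDA enabled.
          cudaSupport = true;
          # Placeholder: restrict builds to your GPU's compute capability
          # so NVCC does not compile for every default architecture.
          cudaCapabilities = [ "8.6" ];
        };
      };
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        packages = [ (pkgs.python3.withPackages (ps: [ ps.torch ])) ];
      };
    };
}
```

Narrowing `cudaCapabilities` changes the derivations, so expect a local build of the CUDA-dependent packages rather than a cache hit; the trade-off is that the build targets one architecture instead of many.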
18 Jul 2025 |
connor (he/him) (UTC-7) | Could I get a review on https://github.com/NixOS/nixpkgs/pull/426280? | 19:20:16 |
21 Jul 2025 |
connor (he/him) (UTC-7) | Went ahead and merged it | 17:20:26 |
23 Jul 2025 |
apyh | oof the nccl version in nixpkgs is quite old now | 16:30:38 |
apyh | (quite old in the ml world, lol. only a month old) | 16:31:25 |
apyh | torchtitan needs torch 2.8, torch 2.8 requires nccl 2.27, gotta update nccl myself | 16:31:49 |
apyh | guess I'll pr to nixpkgs lol | 16:31:56 |
apyh | pr opened 😁 | 16:59:39 |
Gaétan Lepage | Can you share the link apyh? | 22:56:02 |
apyh | ah sure! https://github.com/NixOS/nixpkgs/pull/427804 | 23:00:23 |
apyh | they added a bunch of new stuff so i have to patch the shebang in a second python script. surprisingly didn't cause a build failure without it, just didn't export some of the new symbols | 23:01:02 |
Gaétan Lepage | Thanks! | 23:03:49 |