NixOS CUDA

251 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
46 Servers


Sender Message Time
13 Jul 2025
@me:caem.dev@me:caem.dev left the room.00:13:30
15 Jul 2025
@farmerd:matrix.orgfarmerd joined the room.03:17:28
@farmerd:matrix.orgfarmerdI don't know if anyone has a minute to help double check me on something quickly but I've tried about half a dozen different ways to get pytorch working on nixos with cuda and I am continually getting build errors. This flake (https://github.com/mschoder/nix-cuda-template ) seemed like something that perhaps someone else could quickly check to see if the compilation issues I'm seeing are just me or more widespread? For me it actually generates a segfault in GCC so it's quite bizarre.03:23:11
@mcwitt:matrix.orgmcwitt

Hi farmerd , could you say a bit more about what you're trying to do and what specific errors you see?

For basic pytorch usage with the CUDA backend, the following minimal flake seems to work fine for me (just tested on nixpkgs-unstable): https://gist.github.com/mcwitt/b6c8da58a2e1fcbc1c2728f8f60ad136

18:04:39
@farmerd:matrix.orgfarmerdI'm just trying to get pytorch working with my gpu. But whatever I try to do it ends up trying to build the cuda toolkit and GCC has an internal segfault when trying to build NCCL.21:16:02
@farmerd:matrix.orgfarmerdI think my current suspicion is that I've got a hardware issue though so I'm going to try addressing that tomorrow and see if I still have issues.21:17:17
@mcwitt:matrix.orgmcwitt have you tried updating the nixpkgs pin? (nix flake update nixpkgs). That at least should let you use a cached toolkit and skip the build (unless you're also overriding for some reason) 21:19:16
@mcwitt:matrix.orgmcwittif your goal is just to get a python env running with pytorch and CUDA, I'd recommend starting with a more minimal flake (like the one I posted above)21:20:42
@mcwitt:matrix.orgmcwitt* if your goal is just to get a python env running with CUDA-enabled pytorch (versus wanting to compile CUDA code), I'd recommend starting with a more minimal flake (like the one I posted above)21:22:05
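For reference, a minimal flake of the kind described here might look like the sketch below. It is not the contents of the linked gist. The `x86_64-linux` system string, the `nixpkgs-unstable` branch, and the choice of `ps.torch` combined with a global `cudaSupport = true` are assumptions for illustration.

```nix
{
  description = "Sketch: Python dev shell with CUDA-enabled PyTorch";

  # Tracking an unstable branch; `nix flake update nixpkgs` refreshes the pin.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux"; # assumption: adjust for your machine
      pkgs = import nixpkgs {
        inherit system;
        config = {
          allowUnfree = true; # CUDA packages are unfree
          cudaSupport = true; # enable CUDA in packages that support it, e.g. torch
        };
      };
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = [
          (pkgs.python3.withPackages (ps: [ ps.torch ]))
        ];
      };
    };
}
```

Running `nix develop` in the flake's directory should then give a shell where `python -c 'import torch; print(torch.cuda.is_available())'` can be used to check whether the GPU is visible.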
@connorbaker:matrix.orgconnor (he/him) (UTC-7)Not sure about segfaults (I had them regularly if my RAM was clocked too high or voltage was unstable etc), but make sure you’re enabling cudaSupport and specifying your GPU’s compute capability for faster builds.21:22:40
@farmerd:matrix.orgfarmerdThat's where the hardware thing comes in. I was seeing issues about hash mismatches and then I tried to verify and repair my nix-store and it's got a bunch of corrupted files (it couldn't repair it).21:23:11
@farmerd:matrix.orgfarmerdYeah I think I've got a dimm going bad on me. I've been having random crashes throughout the system and I hadn't put it together until I spent a bunch of time on this yesterday and realized how many random things were corrupted.21:24:02
@farmerd:matrix.orgfarmerdI've got a new pair of dimms coming tomorrow so I'll swap them in (and probably reinstall nix since my nix-store is apparently corrupted beyond repair :-/ ) and try again.21:24:59
@farmerd:matrix.orgfarmerdOh, although may I ask how to specify the compute capability? I did notice it was passing a bunch of them to NVCC but I didn't see how to specify it.21:25:45
@mcwitt:matrix.orgmcwittregardless of hardware issues, if you're just starting out I don't think you should need to build anything from source. The reason you're seeing this is the flake template you linked is pinned to an old revision of nixpkgs-unstable, and the build artifacts have likely expired from cache.nixos.org. I'll often update the nixpkgs pin as a first step when starting with a new template for this reason23:57:02
16 Jul 2025
@farmerd:matrix.orgfarmerdOk, that makes sense. 01:09:44
@connorbaker:matrix.orgconnor (he/him) (UTC-7)See the end of the first section https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/cuda.section.md#cuda-cuda 07:02:27
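As a rough illustration of the compute capability question, the nixpkgs CUDA documentation describes setting it through the nixpkgs `config` attribute set. The sketch below assumes `nixpkgs` is already bound to a flake input or a path to a nixpkgs checkout, and uses `"8.6"` purely as an example value; substitute the capability of your own GPU.

```nix
# Sketch only: `nixpkgs` is assumed to be a flake input or a nixpkgs path.
import nixpkgs {
  system = "x86_64-linux"; # assumption: adjust for your machine
  config = {
    allowUnfree = true;           # CUDA packages are unfree
    cudaSupport = true;           # enable CUDA in packages that support it
    cudaCapabilities = [ "8.6" ]; # example value; limits which GPU architectures are targeted
  };
}
```

Restricting the list means NVCC targets fewer architectures, which is where the faster builds come from. Because the setting changes the affected derivations, those packages are likely to be built locally rather than fetched from a binary cache.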
18 Jul 2025
@connorbaker:matrix.orgconnor (he/him) (UTC-7)Could I get a review on https://github.com/NixOS/nixpkgs/pull/426280? 19:20:16
21 Jul 2025
@connorbaker:matrix.orgconnor (he/him) (UTC-7)Went ahead and merged it17:20:26
23 Jul 2025
@apyh:matrix.orgapyhoof the nccl version in nixpkgs is quite old now16:30:38
@apyh:matrix.orgapyh(quite old in the ml world, lol. only a month old)16:31:25
@apyh:matrix.orgapyhtorchtitan needs torch 2.8, torch 2.8 requires nccl 2.27, gotta update nccl myself 16:31:49
@apyh:matrix.orgapyhguess I'll pr to nixpkgs lol16:31:56
@apyh:matrix.orgapyhpr opened 😁16:59:39
@glepage:matrix.orgGaétan Lepage Can you share the link apyh? 22:56:02
@apyh:matrix.orgapyhah sure! https://github.com/NixOS/nixpkgs/pull/427804 23:00:23
@apyh:matrix.orgapyhthey added a bunch of new stuff so i have to patch the shebang in a second python script. surprisingly didn't cause a build failure without it, just didn't export some of the new symbols 23:01:02
@glepage:matrix.orgGaétan LepageThanks!23:03:49
