
NixOS CUDA

CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



Sender: Message [Time]
15 Jul 2025
mcwitt (@mcwitt:matrix.org): * if your goal is just to get a python env running with CUDA-enabled pytorch (versus wanting to compile CUDA code), I'd recommend starting with a more minimal flake (like the one I posted above) [21:22:05]
connor (he/him) (@connorbaker:matrix.org): Not sure about segfaults (I had them regularly if my RAM was clocked too high or voltage was unstable etc), but make sure you’re enabling cudaSupport and specifying your GPU’s compute capability for faster builds. [21:22:40]
farmerd (@farmerd:matrix.org): That's where the hardware thing comes in. I was seeing issues about hash mismatches and then I tried to verify and repair my nix-store and it's got a bunch of corrupted files (it couldn't repair it). [21:23:11]
farmerd (@farmerd:matrix.org): Yeah I think I've got a dimm going bad on me. I've been having random crashes throughout the system and I hadn't put it together until I spent a bunch of time on this yesterday and realized how many random things were corrupted. [21:24:02]
farmerd (@farmerd:matrix.org): I've got a new pair of dimms coming tomorrow so I'll swap them in (and probably reinstall nix since my nix-store is apparently corrupted beyond repair :-/ ) and try again. [21:24:59]
farmerd (@farmerd:matrix.org): Oh, although may I ask how to specify the compute capability? I did notice it was passing a bunch of them to NVCC but I didn't see how to specify it. [21:25:45]
mcwitt (@mcwitt:matrix.org): regardless of hardware issues, if you're just starting out I don't think you should need to build anything from source. The reason you're seeing this is that the flake template you linked is pinned to an old revision of nixpkgs-unstable, and the build artifacts have likely expired from cache.nixos.org. I'll often update the nixpkgs pin as a first step when starting with a new template for this reason [23:57:02]
16 Jul 2025
farmerd (@farmerd:matrix.org): Ok, that makes sense. [01:09:44]
connor (he/him) (@connorbaker:matrix.org): See the end of the first section https://github.com/NixOS/nixpkgs/blob/master/doc/languages-frameworks/cuda.section.md#cuda-cuda [07:02:27]
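[Editor's note] For readers following along, the settings connor mentions (cudaSupport and the compute capability) are nixpkgs config options described in the linked CUDA section. Below is a minimal sketch of how they might be set when importing nixpkgs. The capability value "8.6" is only a placeholder assumption, not farmerd's actual GPU; look up the compute capability of your own card and substitute it.

```nix
# Sketch only: import nixpkgs with CUDA enabled for a single GPU architecture.
# "8.6" is an assumed placeholder; replace it with your GPU's compute capability.
import <nixpkgs> {
  config = {
    allowUnfree = true;            # CUDA packages are unfree
    cudaSupport = true;            # build CUDA-enabled variants of packages
    cudaCapabilities = [ "8.6" ];  # target one architecture instead of the default list
  };
}
```

In a flake, the same `config` attribute set would typically be passed where the template imports nixpkgs. Restricting the capability list is what should cut down the set of architectures passed to NVCC; expect packages configured this way to be built locally rather than fetched from a binary cache.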
18 Jul 2025
connor (he/him) (@connorbaker:matrix.org): Could I get a review on https://github.com/NixOS/nixpkgs/pull/426280? [19:20:16]
21 Jul 2025
connor (he/him) (@connorbaker:matrix.org): Went ahead and merged it [17:20:26]
23 Jul 2025
apyh (@apyh:matrix.org): oof the nccl version in nixpkgs is quite old now [16:30:38]
apyh (@apyh:matrix.org): (quite old in the ml world, lol. only a month old) [16:31:25]
apyh (@apyh:matrix.org): torchtitan needs torch 2.8, torch 2.8 requires nccl 2.27, gotta update nccl myself [16:31:49]
apyh (@apyh:matrix.org): guess I'll pr to nixpkgs lol [16:31:56]
apyh (@apyh:matrix.org): pr opened 😁 [16:59:39]
Gaétan Lepage (@glepage:matrix.org): Can you share the link apyh? [22:56:02]
apyh (@apyh:matrix.org): ah sure! https://github.com/NixOS/nixpkgs/pull/427804 [23:00:23]
apyh (@apyh:matrix.org): they added a bunch of new stuff so i have to patch the shebang in a second python script. surprisingly didn't cause a build failure without it, just didn't export some of the new symbols [23:01:02]
Gaétan Lepage (@glepage:matrix.org): Thanks! [23:03:49]
24 Jul 2025
apyh (@apyh:matrix.org): huh. thanks for the nixpkgs-review. very strange to me that it fails to build pytorch as a result, but that the python 3.13 failure is just a bunch of .. warnings inside torch? i'll compile again locally to see.. [14:56:24]
apyh (@apyh:matrix.org): can't repro the build failure locally for python312Packages.torchWithCuda, Gaétan Lepage 🤔. left a comment here to that effect: https://github.com/NixOS/nixpkgs/pull/427804#issuecomment-3114819745 [20:26:13]
apyh (@apyh:matrix.org): can't repro any of the build failures in fact, only took 3.5 hours per torch to test 😭 [23:51:03]
25 Jul 2025
Gaétan Lepage (@glepage:matrix.org): It probably failed because of flakiness [10:57:16]
apyh (@apyh:matrix.org): rebased it btw :) [17:29:53]
apyh (@apyh:matrix.org): both builds worked fine on my machine.. does nixpkgs-review have a timeout? lol [17:30:06]
apyh (@apyh:matrix.org): i have a 7800x3d and it still took 3.5 hours per torch build [17:30:26]
26 Jul 2025
Tristan Ross (@rosscomputerguy:matrix.org): Is that a PR that my 128 cores could be useful with? [00:34:02]
apyh (@apyh:matrix.org): haha i mean, if you have the ram to match ;) [01:07:29]
apyh (@apyh:matrix.org): it builds fine on my end - just a verification from someone else would be nice :) [01:07:40]


