!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

291 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda

58 Servers



6 Oct 2025
@connorbaker:matrix.org connor (he/him): (Using the scripts in packages does require the read-only-local-store feature be enabled, since the evalStore outputs from the reports are just small instances of Nix stores which are inside the Nix store, so they do need to be mounted as read-only.) 14:51:17
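
A rough sketch of what opening one of those evalStore outputs as a read-only store could look like (the store path, output layout, and exact URI here are illustrative, not taken from the actual scripts):

# Hypothetical evalStore output path; "root=" points at the small store,
# and "read-only=true" needs the read-only-local-store experimental feature.
nix path-info --all \
  --extra-experimental-features read-only-local-store \
  --store 'local?root=/nix/store/<hash>-report-evalStore&read-only=true'
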
@connorbaker:matrix.org connor (he/him): But as an example, just with that diff output, if we change python312Packages and versioned package sets to their default, sort and deduplicate, that gives us massively better coverage of the packages which require CUDA that we can build (since allowBroken and allowInsecure are both still false) 14:54:32
@ss:someonex.net SomeoneSerge (back on matrix)

~1.5 GB worth of derivations)

LoL can we have evals run on a separate machine from the rest of hydra, and just do tmpfs store there xD

16:49:17
@ss:someonex.net SomeoneSerge (back on matrix)

When the evaluations of Nixpkgs instantiations are done in the derivations, the --eval-store argument is set to the evalStore output so we can keep the derivations around

Cursed

16:51:37
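
For reference, --eval-store is the stock Nix flag being described here; a minimal sketch of the idea (the attribute name is borrowed from the report command later in the log, and the scratch-store path is made up):

# During evaluation, the instantiated .drv files land in the scratch store
# under ./eval-store rather than directly in the host /nix/store.
nix build -L ".#reports.x86_64-linux.pkgs-cuda-post" \
  --eval-store "local?root=$PWD/eval-store"
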
@connorbaker:matrix.org connor (he/him): When I had done something like this a year ago I had to use recursive Nix, so this is an “improvement” in that it only requires the read-only store experimental feature, which is much more limited 18:43:17
@connorbaker:matrix.org connor (he/him): I guess if you don’t mind re-evaluating everything there’s no need for evalStore, since we could use an in-memory dummy store (probably wouldn’t need a tmpfs build directory then either) 18:44:47
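
Assuming the "in-memory dummy store" refers to Nix's stock dummy:// store type, the idea looks roughly like this:

# dummy:// holds no store paths and cannot be written to, so evaluation
# that never needs to add anything to a store runs without realizing drvs.
nix eval --store dummy:// --expr '1 + 2'
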
@ss:someonex.net SomeoneSerge (back on matrix): Well the real issue here is: lol, why physical realizations of ATerm drvs in the first place? 19:44:03
7 Oct 2025
@connorbaker:matrix.org connor (he/him): Using toJSON calls addToStore on paths before serializing so they’re valid; I guess I could try using unsafeDiscardWhatever before serializing to JSON to see if that’s enough to prevent realization 01:32:05
@connorbaker:matrix.org connor (he/him): There was a --read-only flag I had forgotten about 03:47:47
@connorbaker:matrix.org connor (he/him)

Okay, it's way faster now and doesn't need a ramdisk.
Whereas previously it would take about 1m40s on my i9-13900K, now it takes about 20s to run

time nix build -L ".#reports.x86_64-linux.pkgs-cuda-post^*" --builders '' --rebuild
03:53:40
@connorbaker:matrix.org connor (he/him): For generating a single report, htop shows it was taking about 18% of my RAM (less than 20 GB) 03:57:11
@connorbaker:matrix.org connor (he/him): To clarify, it wasn’t enough to use unsafeDiscardStringContext because the derivation was instantiated as soon as drvPath was evaluated, even before toJSON. Using the read-only argument (which is different from the store URI query parameter of the same name lmao) avoids the instantiation. 06:54:57
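
Assuming the flag in question is nix eval's --read-only, a minimal illustration:

# With --read-only, evaluating drvPath computes the path without
# instantiating (writing) the .drv into the store.
nix eval --read-only --raw nixpkgs#hello.drvPath
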
@connorbaker:matrix.org connor (he/him): https://cuda-index.someonex.net 15:40:22
@connorbaker:matrix.org connor (he/him): https://forge.someonex.net/else/sidx 15:40:26
@ss:someonex.net SomeoneSerge (back on matrix): connor (he/him) (UTC-7): ping about azure stuff again 19:30:18
@ss:someonex.net SomeoneSerge (back on matrix): Hey, we've been getting more experience with infra and hydra now, and I think like ephemeral builders are becoming more and more relevant. May I ask you to elaborate? 19:34:01
@ss:someonex.net SomeoneSerge (back on matrix): * connor (he/him) (UTC-7): ping about azure images again 19:34:34
@ss:someonex.net SomeoneSerge (back on matrix): MI25s to test ROCm? CC Lun
https://docs.azure.cn/en-us/virtual-machines/sizes/gpu-accelerated/nv-family
19:39:24
@lt1379:matrix.org Lun: Too old 19:39:46
@ss:someonex.net SomeoneSerge (back on matrix): * Hey, we've been getting more experience with infra and hydra now, and I feel like ephemeral builders are becoming more and more relevant. May I ask you to elaborate? 19:39:48
@ss:someonex.net SomeoneSerge (back on matrix): mx250 only? 19:40:33
@lt1379:matrix.org Lun: gfx906 is barely cared for by upstream, so not sure they'd be great for automated testing. If we have credits that would otherwise go to waste and that's the only option then maybe worth it? 19:41:00
@lt1379:matrix.org Lun: gfx90a (MI210/250) is the oldest Instinct option upstream seems to actually be paying attention to 19:41:26
@ss:someonex.net SomeoneSerge (back on matrix): Relatable: Azure offers the right hw for us, but I'm not confident we can utilize it efficiently enough yet 19:44:47
@lt1379:matrix.org Lun: I see an NGads V620-series option on a different page which is supposedly gfx1030 (probably rebranded W6800 cards) 19:47:01
8 Oct 2025
@connorbaker:matrix.org connor (he/him): I’ll try to get it cleaned up and pushed. Broadly, I used NixOS-anywhere to install machines provisioned with Ubuntu because I didn’t want to deal with blob storage accounts and VHDs (though it should be very doable to produce images). IIRC the tricky part was finding the kernel modules missing for the HB series (I never got around to packaging the Mellanox drivers, but whatever, they still have very fast IP connections) 15:23:50
@connorbaker:matrix.org connor (he/him): Thankfully Azure offers serial console through their web console so I was able to debug that (shout out to @jmbaur for being an absolute saint and walking me through the kernel side of stuff) 15:25:29
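
For anyone unfamiliar with the tool, the nixos-anywhere flow being described is roughly the following (the flake attribute and address are placeholders, not connor's actual setup):

# Take over an Azure VM that was provisioned with stock Ubuntu and
# reinstall it in place with the given NixOS configuration.
nix run github:nix-community/nixos-anywhere -- \
  --flake '.#azure-hb-builder' root@<vm-ip>
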
@ss:someonex.net SomeoneSerge (back on matrix)

(though it should be very doable to produce images)

I only tried once and, well, producing images is trivial of course, but making Azure consume them... I got completely lost somewhere between "Azure Compute Galleries" and "x64 vs arm64 disks"

15:40:15
@connorbaker:matrix.org connor (he/him): I swear at some point in https://github.com/ConnorBaker/nix-cuda-test I had written scripts to create and upload VHDs, provision Azure instances, and do builds on them; the goal being to then have scripts which provision Lambda Labs instances which pull in and run the builds to do GPU testing (since it’s cheaper than Azure GPU instances) 16:05:41
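
A sketch of the VHD route those scripts would have automated (resource names are placeholders, and this assumes the plain Azure CLI rather than whatever the scripts actually used):

# Upload a NixOS VHD to blob storage, register it as an image, boot a VM.
az storage blob upload --account-name <account> --container-name vhds \
  --name nixos.vhd --file ./nixos.vhd --type page
az image create --resource-group <rg> --name nixos-image --os-type Linux \
  --source https://<account>.blob.core.windows.net/vhds/nixos.vhd
az vm create --resource-group <rg> --name builder --image nixos-image
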


