NixOS Infrastructure - Public Room Timeline

	NixOS Infrastructure	382 Members
	Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.	117 Servers


Sender	Message	Time
1 Mar 2025
emily	(I am not on the infra team, don't trust anything I say)	16:19:34
Vladimír Čunát	I believe it's fine to put an unnecessarily big value in there. It's just a single package. I see the main purpose to limit some accidents where resources would be spent continually without bringing value.	16:23:17
Vladimír Čunát	But honestly, doing so much work in a single derivation is rather risky. Generally it's better to split it up.	16:24:19
Vladimír Čunát	For example, sometimes we need to restart the queue runner, i.e. all builds in progress get scrapped.	16:24:57
Vladimír Čunát	We even had periods where it crashed often (same effect).	16:25:16
Lun	Haven't figured out a way to split it up yet that'll be easy to maintain	16:26:25
Vladimír Čunát	And I have some concerns about increasing latency of channels, but maybe that will turn out negligible.	16:27:38
Vladimír Čunát	* And I have some concerns about increasing latency of channel updates, but maybe that will turn out negligible.	16:27:48
Vladimír Čunát	As it is now, everyone will be waiting for this huge thing to finish.	16:28:23
Vladimír Čunát	* As it is designed now, everyone will be waiting for this huge thing to finish.	16:29:11
emily	so… has `hydraPlatforms = [ ];` been considered?	16:30:05
emily	how important is the `composable_kernel` thing?	16:30:07
emily	I know ROCm has a bunch of stuff and some of it nobody uses, right?	16:30:19
Lun	It's needed for MIOpen which is needed for pytorch for kernels for some ops, it will also be used directly by pytorch in 2.6	16:32:47
Lun	https://github.com/NixOS/nixpkgs/blob/bda6bcbbacbd8f48e69b228b91b5541c03f7ab35/pkgs/development/rocm-modules/6/composable_kernel/default.nix If anyone happens to know cmake wizardry and can work out how to split this into multiple derivations and get a working end result without needing a massive patch to the CMakeFiles that might be the best option?	16:36:38
emily	okay that sounds pretty important then	16:37:02
emily	is the issue that there's only one install target and it depends on everything?	16:37:28
emily	I assume you can build individual libraries with `ninja <target>` at least	16:37:43
Lun	The kernels under tensor_operation_instance https://github.com/ROCm/composable_kernel/blob/1bf29478cdada3c7f56fbedc5542b275b0c107b3/library/src/tensor_operation_instance/gpu/CMakeLists.txt are approximately all the build time.	16:40:02
emily	what if you have derivations that do the same CMake setup dance, use `ninjaTargets` (or whatever that variable is called) to build one (set of) kernel(s), and the `installPhase` just directly copies it over? then the main derivation can aggregate all those sub-derivations and copy them into place in the build directory before running the rest of the build	16:44:18
emily	that way you can avoid patching the CMake build and still split it up	16:44:28
Lun	I'll give that a try	16:46:28
emily	the advantage is that it can be split across multiple builders and maybe not even need `big-parallel`	16:47:11
Vladimír Čunát	Especially if there are cases that someone needs to depend only on some of the kernels?	16:48:40
emily	I assume the install phase still expects all of it, I wasn't thinking the overall packaging would change since that sounds more involved	16:49:16
emily	I imagine PyTorch wants to be able to use whatever	16:49:25
Vladimír Čunát	So PyTorch will be a monster of several gigabytes, needing a big computer to build it? Anyway, I guess I'm too verbose for this channel.	16:52:44
emily	aren't all the ML libraries already that?	16:59:12
emily	(but I guess this is what `rocmSupport` or whatever is for?)	16:59:26
2 Mar 2025
Vladimír Čunát	Huh, I wonder what this is: Aborted: error: deleting cgroup '/sys/fs/cgroup/system.slice/nix-daemon.service/nix-build-uid-30014': Device or resource busy	13:34:04

