| 1 Mar 2025 |
hexa | lun: I'm a bit afraid to ask, but there is supposed to be a migraphx python package, and one of the packages I maintain would want that to support rocm 🙈 | 07:09:15 |
hexa | supposedly this https://github.com/ROCm/AMDMIGraphX/blob/develop/src/py/migraphx_py.cpp | 07:13:00 |
Lun | Mention it on the big ROCm tracking issue | 16:04:24 |
Lun | definitely worse: https://gist.githubusercontent.com/LunNova/b1cf007f1af52b4dc353fd9925857b97/raw/63ea9ec1a500d5ef6ad4f2f0eac7a59b6db6e310/huge%2520composable_kernel%2520template%2520instantiation.txt | 16:07:29 |
Lun | The ~4h builds are with the nix `cores` config set to 128 on a 64c/128t EPYC Milan engineering sample that's clocking down to <3GHz due to power limits; not sure what the relative speedup per core will be, but probably not enough to overcome dropping to 24 build threads.
Does bumping meta.timeout to 20h to start with sound reasonable? | 16:13:51
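[A minimal sketch of what the proposed timeout bump could look like as an overlay-style override; `meta.timeout` is measured in seconds in nixpkgs, and the attribute path `composable_kernel` here is assumed from context:]

```nix
# Hedged sketch: bump Hydra's per-build timeout for this one package.
# meta.timeout is in seconds; 20 hours = 20 * 60 * 60.
composable_kernel.overrideAttrs (old: {
  meta = old.meta // {
    timeout = 20 * 60 * 60;
  };
})
```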
emily | might make sense to do 48 and then scale down based on the actual time | 16:19:08 |
emily | to avoid spending 20 hours on a build that times out and gets thrown away | 16:19:16 |
emily | (I am not on the infra team, don't trust anything I say) | 16:19:34 |
Vladimír Čunát | I believe it's fine to put an unnecessarily big value in there. It's just a single package. I see the main purpose to limit some accidents where resources would be spent continually without bringing value. | 16:23:17 |
Vladimír Čunát | But honestly, doing so much work in a single derivation is rather risky. Generally it's better to split it up. | 16:24:19 |
Vladimír Čunát | For example, sometimes we need to restart the queue runner, i.e. all builds in progress get scrapped. | 16:24:57 |
Vladimír Čunát | We even had periods where it crashed often (same effect). | 16:25:16 |
Lun | Haven't figured out a way to split it up yet that'll be easy to maintain | 16:26:25 |
Vladimír Čunát | And I have some concerns about increasing latency of channel updates, but maybe that will turn out negligible. | 16:27:48
Vladimír Čunát | As it is designed now, everyone will be waiting for this huge thing to finish. | 16:29:11
emily | so… has hydraPlatforms = [ ]; been considered? | 16:30:05 |
emily | how important is the composable_kernel thing? | 16:30:07 |
emily | I know ROCm has a bunch of stuff and some of it nobody uses, right? | 16:30:19 |
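[For reference, the `hydraPlatforms = [ ];` idea mentioned above is a standard nixpkgs mechanism: an empty list tells Hydra to build the package on no platforms, while users can still build it locally. A hedged sketch, again assuming the `composable_kernel` attribute:]

```nix
# Hedged sketch: opt the package out of Hydra builds entirely.
# Local `nix-build` still works; Hydra just skips it.
composable_kernel.overrideAttrs (old: {
  meta = old.meta // {
    hydraPlatforms = [ ];
  };
})
```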
Lun | It's needed for MIOpen, which pytorch needs for the kernels behind some ops; it will also be used directly by pytorch in 2.6 | 16:32:47
Lun | https://github.com/NixOS/nixpkgs/blob/bda6bcbbacbd8f48e69b228b91b5541c03f7ab35/pkgs/development/rocm-modules/6/composable_kernel/default.nix If anyone happens to know CMake wizardry and can work out how to split this into multiple derivations and get a working end result without needing a massive patch to the CMake files, that might be the best option? | 16:36:38
emily | okay that sounds pretty important then | 16:37:02 |
emily | is the issue that there's only one install target and it depends on everything? | 16:37:28 |
emily | I assume you can build individual libraries with ninja <target> at least | 16:37:43 |
Lun | The kernels under tensor_operation_instance https://github.com/ROCm/composable_kernel/blob/1bf29478cdada3c7f56fbedc5542b275b0c107b3/library/src/tensor_operation_instance/gpu/CMakeLists.txt are approximately all the build time. | 16:40:02 |
emily | what if you have derivations that do the same CMake setup dance, use ninjaTargets (or whatever that variable is called) to build one (set of) kernel(s), and the installPhase just directly copies it over? then the main derivation can aggregate all those sub-derivations and copy them into place in the build directory before running the rest of the build | 16:44:18 |
emily | that way you can avoid patching the CMake build and still split it up | 16:44:28 |
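[A sketch of the split proposed above. Everything here is hypothetical scaffolding, not tested against composable_kernel: the `kernelPart` helper, the target names, and the exact variable the ninja setup hook reads for targets (`ninjaFlags` is assumed) are all illustrative:]

```nix
# Hedged sketch, assuming a composable_kernel-like cmake+ninja build.
{ stdenv, cmake, ninja, src, cmakeFlags ? [ ] }:
let
  # One derivation per kernel group: same CMake configure dance,
  # but build only one ninja target and copy its output out verbatim.
  kernelPart = target: stdenv.mkDerivation {
    pname = "composable_kernel-part-${target}";
    version = "unstable";
    inherit src cmakeFlags;
    nativeBuildInputs = [ cmake ninja ];
    # Assumption: the ninja setup hook passes these as build targets.
    ninjaFlags = [ target ];
    # Skip `ninja install`; just keep the built artifacts.
    installPhase = ''
      mkdir -p $out
      cp -r lib/tensor_operation_instance $out/ 2>/dev/null || cp -r . $out/
    '';
  };
  # Hypothetical target names; the real ones live under
  # library/src/tensor_operation_instance/gpu/CMakeLists.txt.
  parts = map kernelPart [ "device_gemm_instance" "device_conv_instance" ];
in
stdenv.mkDerivation {
  pname = "composable_kernel";
  version = "unstable";
  inherit src cmakeFlags;
  nativeBuildInputs = [ cmake ninja ];
  # Seed the build tree with the prebuilt parts before building the
  # rest, so ninja sees them as up to date and no CMake patch is needed.
  preBuild = ''
    for p in ${toString parts}; do
      cp -r --no-preserve=mode $p/* .
    done
  '';
}
```

The design win, as noted above, is that each part can land on a different builder and none of them individually needs the big-parallel feature.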
Lun | I'll give that a try | 16:46:28 |
emily | the advantage is that it can be split across multiple builders and maybe not even need big-parallel | 16:47:11 |