NixOS Infrastructure - Public Room Timeline

	NixOS Infrastructure	426 Members
	Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real-time alerts from Prometheus.	131 Servers


Sender	Message	Time
20 May 2026
K900	I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck	12:07:33
Cobalt	The main optimizations were about reducing the number/cost of queries for evaluating the subset size required when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems like an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research iirc.	12:13:23
Cobalt	fast-nix-gc does not really have anything related to this, it just mentions that they load the paths into a graph for the GC search first instead of querying the store for all lookups.	12:14:15
K900	Well right now the scheduling is very stupid	12:14:20
K900	There's no locality awareness	12:14:30
K900	Or, hell, job size awareness	12:14:32
K900	Improving it will definitely help a little, but the big bottleneck is still the coordinator itself	12:15:32
hexa	the scheduler	12:15:50
K900	The coordinator as in the machine	12:16:01
K900	But yeah	12:16:02
hexa	the coordinator is the process that runs the remote build	12:16:14
Cobalt	So just to understand this a bit more, a significant problem is the performance of the software running the scheduler/coordinator (so the queue runner)?	12:17:07
K900	It's not even the software necessarily	12:18:34
K900	It's the design of the whole thing that requires a lot of copying data around	12:18:44
Cobalt	One motivation for the optimizations was to ensure that scheduling stays cheap-ish, so I would try not to compromise this too much.	12:18:53
K900	And also the fact that everything is xz compressed in transport, which adds a huge amount of overhead	12:18:59
Cobalt	Is that regarding the data exchange of the build outputs, RPC or artifacts (logs)?	12:19:47
K900	Build outputs	12:20:05
K900	Logs are negligible by comparison	12:20:12
K900	And RPC is just normal Nix daemon protocol over SSH	12:20:18
Cobalt	Maybe I misunderstood something there but aren't they uploaded directly from the builder to the S3 bucket?	12:20:37
K900	They are not	12:20:44
K900	They are currently copied to the coordinator for signing	12:20:52
Cobalt	Oh, that does sound really expensive for bandwidth (and compute with compression).	12:21:27
K900	It is	12:21:47
Vladimír Čunát	And S3 is over the Atlantic.	12:21:53
Vladimír Čunát	There are lots of "design problems".	12:22:09
Cobalt	Interesting, I will have to take a closer look at it then (especially if my knowledge of the architecture is apparently inaccurate). I will still probably stick with scheduling for now due to other constraints on the topic but thank you for the extra information.	12:24:03
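
As an illustration of the "locality awareness" and "job size awareness" mentioned above, here is a minimal Python sketch of a greedy scheduler that prefers builders which already hold a job's inputs and which have less work queued. It is not how the current queue runner works. The names `Builder`, `Job`, `missing_bytes`, `pick_builder` and `schedule` are hypothetical, the cost model (bytes to copy plus bytes already queued) is an assumption made for the example, and a real scheduler would have to obtain the per-builder path sets through store queries or a cache of them.

```python
from dataclasses import dataclass, field


@dataclass
class Builder:
    name: str
    # Store paths assumed to be present on this builder already.
    store_paths: set[str] = field(default_factory=set)
    # Estimated amount of work (in bytes, as a rough proxy) already assigned.
    queued_bytes: int = 0


@dataclass
class Job:
    name: str
    # Input store path -> size in bytes.
    inputs: dict[str, int]
    # Rough proxy for how expensive the job is.
    estimated_bytes: int = 0


def missing_bytes(job: Job, builder: Builder) -> int:
    """Bytes that would have to be copied to the builder before the job can start."""
    return sum(
        size for path, size in job.inputs.items() if path not in builder.store_paths
    )


def pick_builder(job: Job, builders: list[Builder]) -> Builder:
    """Pick the builder with the lowest transfer cost plus existing load."""
    # The builder name is only a tie-breaker to keep the choice deterministic.
    return min(
        builders, key=lambda b: (missing_bytes(job, b) + b.queued_bytes, b.name)
    )


def schedule(jobs: list[Job], builders: list[Builder]) -> dict[str, str]:
    """Greedy assignment: largest jobs first, each to the currently cheapest builder."""
    assignment: dict[str, str] = {}
    for job in sorted(jobs, key=lambda j: j.estimated_bytes, reverse=True):
        chosen = pick_builder(job, builders)
        assignment[job.name] = chosen.name
        chosen.queued_bytes += job.estimated_bytes
        # After the copy, the job's inputs are present on the chosen builder.
        chosen.store_paths.update(job.inputs)
    return assignment
```

The set lookup in `missing_bytes` is the part that could be backed by a cache or an approximate membership structure, at the price of occasionally misjudging how much data needs to be copied.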
