!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

426 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real-time alerts from Prometheus.
131 Servers

Sender | Message | Time
20 May 2026
@k900:0upti.me (K900) | I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck. | 12:07:33
@c0ba1t:matrix.org (Cobalt) | The main optimizations were about reducing the number/cost of queries needed to evaluate the subset size required when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems like an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research, iirc. | 12:13:23
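One way to read the "stochastic data structure" idea above: keep a Bloom filter per builder summarizing which store paths it already has, and estimate how much of a build's closure would still have to be copied without issuing per-path store queries. The sketch below assumes that framing; the class, the function name `estimated_missing_bytes`, and the store paths and sizes are all hypothetical, not part of Hydra or the queue runner.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd (nonzero) step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def estimated_missing_bytes(closure: dict[str, int], builder_paths: BloomFilter) -> int:
    """Sum the sizes of closure paths the builder probably does not have yet.

    A false positive hides a path that is really missing, so this can
    underestimate the transfer but never overestimate it.
    """
    return sum(size for path, size in closure.items() if path not in builder_paths)


# Hypothetical usage: the builder already has glibc, but not the package being built.
builder = BloomFilter()
builder.add("/nix/store/aaaa-glibc")
closure = {"/nix/store/aaaa-glibc": 30_000_000, "/nix/store/bbbb-hello": 100_000}
print(estimated_missing_bytes(closure, builder))
```

The trade-off is the usual one for Bloom filters: memory and lookup cost stay fixed per builder, at the price of occasionally treating a missing path as present.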
@c0ba1t:matrix.org (Cobalt) | fast-nix-gc does not really have anything related to this; it just mentions that they load the paths into a graph for the GC search first instead of querying the store for every lookup. | 12:14:15
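The approach described for fast-nix-gc (load the reference graph once, then search it in memory) amounts to an ordinary reachability traversal. A minimal sketch, assuming the graph has already been loaded into a dict mapping each store path to the paths it references; how that dict is populated is left out, and this is not fast-nix-gc's actual code.

```python
from collections import deque


def live_paths(references: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Mark phase: everything reachable from a GC root through references is live."""
    live: set[str] = set()
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        if path in live:
            continue
        live.add(path)
        # Pure in-memory lookup instead of one store query per path.
        queue.extend(references.get(path, ()))
    return live


def dead_paths(references: dict[str, list[str]], roots: list[str]) -> set[str]:
    """Paths in the loaded graph that no root keeps alive (GC candidates)."""
    return set(references) - live_paths(references, roots)


# Hypothetical graph: path -> the paths it references.
refs = {
    "/nix/store/a-app": ["/nix/store/b-lib"],
    "/nix/store/b-lib": [],
    "/nix/store/c-old-app": ["/nix/store/b-lib"],
    "/nix/store/d-tmp": [],
}
print(sorted(dead_paths(refs, roots=["/nix/store/a-app"])))
# → ['/nix/store/c-old-app', '/nix/store/d-tmp']
```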
@k900:0upti.me (K900) | Well, right now the scheduling is very stupid. | 12:14:20
@k900:0upti.me (K900) | There's no locality awareness. | 12:14:30
@k900:0upti.me (K900) | Or, hell, job size awareness. | 12:14:32
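To make "locality awareness" concrete: a scheduler could score each free builder by how many bytes of the job's closure it is missing, plus a penalty for jobs it is already running. This is a hypothetical sketch; the `Builder` type, `pick_builder`, and the `load_penalty_bytes` weight are illustrative assumptions, not how the current queue runner works. Job size is only reflected indirectly through the closure; an estimated build-time term would be a natural extra cost.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Builder:
    """Hypothetical view of a builder as a scheduler might track it."""
    name: str
    store: set[str] = field(default_factory=set)  # store paths already present
    running_jobs: int = 0
    max_jobs: int = 4


def missing_bytes(closure: dict[str, int], builder: Builder) -> int:
    """Bytes that would have to be copied to the builder before the build can start."""
    return sum(size for path, size in closure.items() if path not in builder.store)


def pick_builder(
    closure: dict[str, int],
    builders: list[Builder],
    load_penalty_bytes: int = 50_000_000,
) -> Optional[Builder]:
    """Choose the free builder with the lowest cost: data to copy plus a load penalty."""
    free = [b for b in builders if b.running_jobs < b.max_jobs]
    if not free:
        return None
    return min(
        free,
        key=lambda b: missing_bytes(closure, b) + b.running_jobs * load_penalty_bytes,
    )


closure = {"/nix/store/aaaa-glibc": 30_000_000, "/nix/store/bbbb-src": 1_000_000}
warm = Builder("warm", store={"/nix/store/aaaa-glibc"})
cold = Builder("cold")
print(pick_builder(closure, [cold, warm]).name)  # → warm
```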
@k900:0upti.me (K900) | Improving it will definitely help a little, but the big bottleneck is still the coordinator itself. | 12:15:32
@hexa:lossy.network (hexa) | the scheduler | 12:15:50
@k900:0upti.me (K900) | The coordinator as in the machine. | 12:16:01
@k900:0upti.me (K900) | But yeah. | 12:16:02
@hexa:lossy.network (hexa) | the coordinator is the process that runs the remote build | 12:16:14
@c0ba1t:matrix.org (Cobalt) | So just to understand this a bit more: a significant problem is the performance of the software running the scheduler/coordinator (i.e. the queue runner)? | 12:17:07
@k900:0upti.me (K900) | It's not even the software necessarily. | 12:18:34
@k900:0upti.me (K900) | It's the design of the whole thing, which requires a lot of copying data around. | 12:18:44
@c0ba1t:matrix.org (Cobalt) | A motivation for the optimizations was to keep scheduling cheap-ish, so I would try not to compromise that too much. | 12:18:53
@k900:0upti.me (K900) | And also the fact that everything is xz-compressed in transport, which adds a lot of overhead. | 12:18:59
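A rough way to get a feel for the CPU cost of xz compression is to time it at a few presets on a sample blob. This sketch uses Python's standard `lzma` module; the sample data is synthetic and the presets are arbitrary, since which xz settings the infrastructure actually uses is not stated here. Real timings depend heavily on the machine and on how compressible the outputs are.

```python
import lzma
import os
import time


def xz_cost(data: bytes, preset: int) -> tuple[float, int]:
    """Return (seconds spent compressing, compressed size in bytes) for one xz preset."""
    start = time.perf_counter()
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    elapsed = time.perf_counter() - start
    # Sanity check that the round trip is lossless.
    if lzma.decompress(compressed) != data:
        raise RuntimeError("xz round trip failed")
    return elapsed, len(compressed)


if __name__ == "__main__":
    # Synthetic stand-in for a build output: 1 MiB of random bytes (incompressible)
    # followed by about 1 MiB of highly repetitive bytes.
    sample = os.urandom(1 << 20) + b"libfoo.so.1\0" * 87_000
    for preset in (0, 3, 6):
        seconds, size = xz_cost(sample, preset)
        print(f"preset {preset}: {seconds:.3f}s, {len(sample)} -> {size} bytes")
```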
@c0ba1t:matrix.org (Cobalt) | Is that regarding the transfer of the build outputs, RPC, or artifacts (logs)? | 12:19:47
@k900:0upti.me (K900) | Build outputs. | 12:20:05
@k900:0upti.me (K900) | Logs are negligible by comparison. | 12:20:12
@k900:0upti.me (K900) | And RPC is just the normal Nix daemon protocol over SSH. | 12:20:18
@c0ba1t:matrix.org (Cobalt) | Maybe I misunderstood something there, but aren't they uploaded directly from the builder to the S3 bucket? | 12:20:37
@k900:0upti.me (K900) | They are not. | 12:20:44
@k900:0upti.me (K900) | They are currently copied to the coordinator for signing. | 12:20:52
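For context on what "signing" covers: to my understanding, a Nix binary-cache signature (the `Sig:` field of a `.narinfo`) is an Ed25519 signature over a short fingerprint string built from the path's metadata rather than over the raw bytes. The sketch below only constructs that string, assuming the `1;<store path>;<NAR hash>;<NAR size>;<sorted references>` layout used by Nix's libstore; the function name and example values are hypothetical, and the actual Ed25519 step (which needs a crypto library) is omitted.

```python
def narinfo_fingerprint(
    store_path: str, nar_hash: str, nar_size: int, references: list[str]
) -> str:
    """Build the string a binary-cache signature covers (assumed Nix layout).

    Layout: "1;<store path>;<NAR hash>;<NAR size>;<sorted, comma-separated references>".
    """
    if nar_size <= 0:
        raise ValueError("NAR size must be known before computing the fingerprint")
    return ";".join(["1", store_path, nar_hash, str(nar_size), ",".join(sorted(references))])


print(narinfo_fingerprint(
    "/nix/store/0000-hello-2.12",  # hypothetical store path
    "sha256:<nar-hash>",           # placeholder, not a real hash
    123456,
    ["/nix/store/1111-glibc", "/nix/store/0000-hello-2.12"],
))
# → 1;/nix/store/0000-hello-2.12;sha256:<nar-hash>;123456;/nix/store/0000-hello-2.12,/nix/store/1111-glibc
```

Note that the NAR hash and size are themselves derived from the full output contents, so whoever computes the fingerprint needs those values.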
@c0ba1t:matrix.org (Cobalt) | Oh, that does sound really expensive for bandwidth (and compute with compression). | 12:21:27
@k900:0upti.me (K900) | It is. | 12:21:47
@vcunat:matrix.org (Vladimír Čunát) | And S3 is over the Atlantic. | 12:21:53
@vcunat:matrix.org (Vladimír Čunát) | There are lots of "design problems". | 12:22:09
@c0ba1t:matrix.org (Cobalt) | Interesting, I will have to take a closer look at it then (especially since my knowledge of the architecture is apparently inaccurate). I will probably still stick with scheduling for now due to other constraints on the topic, but thank you for the extra information. | 12:24:03

Room Version: 6