!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

418 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
128 Servers

Sender Message Time
20 May 2026
@hexa:lossy.networkhexa (signing key rotation when)those are our linux builders, the first two x86_64-linux, the last two aarch64-linux11:57:39
@c0ba1t:matrix.orgCobaltThank you a lot!11:58:08
@hexa:lossy.networkhexa (signing key rotation when) and our gc logic 11:58:42
@c0ba1t:matrix.orgCobaltThank you for pointing me to the configurations, I was aware of that to some degree (searching for the metrics initially led me to the system configurations). The 500GB limit for a GC run is presumably to ensure that the GC is not locking the store for too long?12:01:36
@hexa:lossy.networkhexa (signing key rotation when)yeah, gc is not free and if we keep some outputs around for reuse that's good too12:03:27
@hexa:lossy.networkhexa (signing key rotation when)we might eventually migrate to fast-nix-gc12:03:59
@hexa:lossy.networkhexa (signing key rotation when) that'll fix us 12:04:12
@hexa:lossy.networkhexa (signing key rotation when)https://github.com/Mic92/fast-nix-gc12:04:35
@c0ba1t:matrix.orgCobaltThat also looks very interesting, especially the notes on SQLite queries in fast-nix-gc. 12:06:42
@k900:0upti.meK900 I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck 12:07:33
@c0ba1t:matrix.orgCobaltThe main optimizations were about how to reduce the number/cost of queries for evaluating the subset size required when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems like an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research iirc. 12:13:23
@c0ba1t:matrix.orgCobaltfast-nix-gc does not really have anything related to this, it just mentions that they load the paths into a graph for the GC search first instead of querying the store for all lookups.12:14:15
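(Editor's sketch, not part of the conversation.) The idea Cobalt describes, loading the reference graph into memory once and then walking it instead of querying the store for every lookup, might look roughly like the following. This is a minimal illustration under stated assumptions, not fast-nix-gc's actual code: the names `reachable` and `garbage` and the dict-of-sets graph representation are hypothetical.

```python
from collections import deque


def reachable(references: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Breadth-first walk over an in-memory reference graph.

    `references` maps a store path to the paths it refers to. It is
    assumed to have been loaded in one pass up front, so the walk
    itself needs no further database lookups.
    """
    live = set(roots)
    queue = deque(roots)
    while queue:
        path = queue.popleft()
        for ref in references.get(path, ()):
            if ref not in live:
                live.add(ref)
                queue.append(ref)
    return live


def garbage(references: dict[str, set[str]], roots: set[str]) -> set[str]:
    """Every known path that no GC root keeps alive."""
    known = set(references)
    for refs in references.values():
        known |= refs
    return known - reachable(references, roots)
```

For a graph where `app` refers to `openssl` and `libc`, and `old-app` refers only to `libc`, calling `garbage(graph, {"app"})` leaves `old-app` as the only collectable path.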
@k900:0upti.meK900 Well right now the scheduling is very stupid 12:14:20
@k900:0upti.meK900 There's no locality awareness 12:14:30
@k900:0upti.meK900Or hell job size awareness12:14:32
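(Editor's sketch, not part of the conversation.) To make "locality awareness" concrete: a scheduler could prefer the builder that already holds most of a job's inputs, so less data has to be copied. The code below is only an illustration of that idea, assuming the scheduler knows each builder's store contents and each input's size. `Builder`, `Job`, `bytes_to_copy` and `pick_builder` are hypothetical names and do not describe how the queue runner works today.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Builder:
    name: str
    free_slots: int
    # Store paths assumed to be already present on this builder.
    store: set[str] = field(default_factory=set)


@dataclass
class Job:
    name: str
    # Input store path -> size in bytes (assumed to be known up front).
    inputs: dict[str, int]


def bytes_to_copy(job: Job, builder: Builder) -> int:
    """Total size of the inputs the builder does not have yet."""
    return sum(size for path, size in job.inputs.items()
               if path not in builder.store)


def pick_builder(job: Job, builders: list[Builder]) -> Optional[Builder]:
    """Choose the free builder that needs the least data copied.

    Ties are broken in favour of the builder with more free slots.
    Returns None when no builder has a free slot.
    """
    candidates = [b for b in builders if b.free_slots > 0]
    if not candidates:
        return None
    return min(candidates,
               key=lambda b: (bytes_to_copy(job, b), -b.free_slots))
```

The same cost function could also be summed per job to get a rough notion of job size, for example to dispatch the most expensive jobs first.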
@k900:0upti.meK900Improving it will definitely help a little, but the big bottleneck is still the coordinator itself12:15:32
@hexa:lossy.networkhexa (signing key rotation when)the scheduler12:15:50
@k900:0upti.meK900The coordinator as in the machine12:16:01
@k900:0upti.meK900But yeah12:16:02
@hexa:lossy.networkhexa (signing key rotation when)the coordinator is the process that runs the remote build12:16:14
@c0ba1t:matrix.orgCobaltSo just to understand this a bit more, a significant problem is the performance of the software running the scheduler/coordinator (so the queue runner)?12:17:07
@k900:0upti.meK900 It's not even the software necessarily 12:18:34
@k900:0upti.meK900 It's the design of the whole thing that requires a lot of copying data around 12:18:44
@c0ba1t:matrix.orgCobaltOne motivation for the optimizations was to keep scheduling cheap-ish, so I would try not to compromise that too much.12:18:53
@k900:0upti.meK900 And also the fact that everything is xz compressed in transport, which adds a lot of overhead 12:18:59
@c0ba1t:matrix.orgCobaltIs that regarding the data exchange of the build outputs, RPC or artifacts (logs)?12:19:47
@k900:0upti.meK900Build outputs12:20:05
@k900:0upti.meK900Logs are negligible by comparison12:20:12
@k900:0upti.meK900And RPC is just normal Nix daemon protocol over SSH12:20:18
