NixOS Infrastructure | 427 Members | |
| Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus. | 131 Servers |
| Message | Time | |
|---|---|---|
| 20 May 2026 | ||
| I'm aware of the issues with just tacking it on to the existing daemon; I plan to do some research on this for my bachelor thesis. IIRC you or hexa (signing key rotation when) brought it up when I asked for relevant issues before in the offtopic channel. The main thing for me here was to not just count store paths but instead use the total size of the store paths (as, e.g., firefox-bin is heavier than harfbuzz). My main plan was to make a prototype with an extra agent/nix-scheduler-hook and use the results from testing there to propose changes to the Hydra queue runner later. The number/total size of paths was meant to give some estimates of the actual computational cost of, e.g., asking the daemon for the subset of paths in the store. | 11:38:49 | |
| 11:56:04 | |
| those are our linux builders, the first two x86_64-linux, the last two aarch64-linux | 11:57:39 | |
| Thank you a lot! | 11:58:08 | |
| and our gc logic | 11:58:42 | |
| Thank you for pointing me to the configurations, I was aware of that to some degree (searching for the metrics initially led me to the system configurations). The 500GB limit for a GC run is presumably to ensure that the GC is not locking the store for too long? | 12:01:36 | |
| yeah, gc is not free and if we keep some outputs around for reuse that's good too | 12:03:27 | |
| we might eventually migrate to fast-nix-gc in a bit | 12:03:59 | |
| https://github.com/Mic92/fast-nix-gc | 12:04:35 | |
| That also looks very interesting, especially the notes on SQLite queries in fast-nix-gc. | 12:06:42 | |
| I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck | 12:07:33 | |
| The main optimizations were about how to reduce the number/cost of queries for evaluating the subset size required when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems like an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research, IIRC. | 12:13:23 | |
| fast-nix-gc does not really have anything related to this, it just mentions that they load the paths into a graph for the GC search first instead of querying the store for all lookups. | 12:14:15 | |
| Well right now the scheduling is very stupid | 12:14:20 | |
| There's no locality awareness | 12:14:30 | |
| Or, hell, job size awareness | 12:14:32 | |
| Improving it will definitely help a little, but the big bottleneck is still the coordinator itself | 12:15:32 | |
| the scheduler | 12:15:50 | |
| The coordinator as in the machine | 12:16:01 | |
| But yeah | 12:16:02 | |
| the coordinator is the process that runs the remote build | 12:16:14 | |
| So just to understand this a bit more, a significant problem is the performance of the software running the scheduler/coordinator (so the queue runner)? | 12:17:07 | |
| It's not even the software necessarily | 12:18:34 | |
| It's the design of the whole thing that requires a lot of copying data around | 12:18:44 | |
| A motivation for the optimizations was to ensure that scheduling stays cheap-ish, so I would try not to compromise this too much. | 12:18:53 | |
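
To make the "total size, not path count" idea from the discussion concrete, here is a minimal sketch of a locality- and size-aware builder choice. Everything in it is an assumption for illustration: the `Builder` class, the `missing_bytes` and `pick_builder` helpers, the builder names, the store paths and their sizes are all made up, and none of it reflects how the Hydra queue runner or nix-scheduler-hook actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Builder:
    """Hypothetical view of a remote builder: a name plus the store paths it already has."""
    name: str
    present: set[str] = field(default_factory=set)


def missing_bytes(builder: Builder, closure_sizes: dict[str, int]) -> int:
    """Total size in bytes of the closure paths the builder would still need to receive."""
    return sum(size for path, size in closure_sizes.items() if path not in builder.present)


def pick_builder(builders: list[Builder], closure_sizes: dict[str, int]) -> Builder:
    """Pick the builder that needs the fewest bytes copied (first one wins on ties)."""
    return min(builders, key=lambda b: missing_bytes(b, closure_sizes))


# Invented input closure: store path -> size in bytes (hashes and sizes are placeholders).
closure = {
    "/nix/store/aaaa-firefox-bin": 250_000_000,
    "/nix/store/bbbb-harfbuzz": 3_000_000,
    "/nix/store/cccc-glibc": 30_000_000,
}

builders = [
    # Has two of the three paths, but is missing the heavy one.
    Builder("builder-a", {"/nix/store/bbbb-harfbuzz", "/nix/store/cccc-glibc"}),
    # Has only one path, but it is the heavy one.
    Builder("builder-b", {"/nix/store/aaaa-firefox-bin"}),
]

chosen = pick_builder(builders, closure)
print(chosen.name, missing_bytes(chosen, closure))
```

In this toy input, counting missing paths would favor `builder-a` (one path missing versus two), while weighting by size favors `builder-b`. The sketch assumes the scheduler already knows each builder's store contents; obtaining or approximating that cheaply is exactly the query-cost question raised above.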