NixOS Infrastructure | 426 Members | |
| Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus. | 131 Servers |
| Sender | Message | Time |
|---|---|---|
| 19 May 2026 | ||
| Though it's one lock-file shared for all kinds of machines, so I guess it's better not to let the real state drift too much. | 10:41:14 | |
| BTW
| 10:43:20 | |
| can you file an issue for that? | 10:52:31 | |
| 16:47:30 | ||
| 20:56:29 | ||
| 20 May 2026 | ||
| Hey, I was looking into optimizations for build scheduling to account for substitution cost and was wondering how large (in terms of number/total size of derivations) the nix store was on builder nodes. This is mainly intended to make some guesstimates about how expensive the query of "Which subset of a package's build graph is present on a builder node and how large (in total file size) is that subset?" is. My personal builders have varied from 250K-1M but they are almost certainly smaller/less used than the public builders. I have looked through the data in the public Grafana instance but could not find an applicable metric. Could you please, if it is available somewhere, include this metric in the Prometheus instance for the NixOS Infra? Alternatively, could someone with access to the nodes please look it up on a few builder nodes for me? It should be queryable via the sqlite DB ( | 11:18:19 | |
| Something that was brought up before is that you can just do a bloom filter over store hashes | 11:32:45 | |
| It ends up on the order of a few kilobytes for a very good hit rate | 11:33:06 | |
| The real problem is that requires an additional agent on the workers | 11:33:22 | |
| * I'm aware of the issues with just tacking it onto the existing daemon; I plan to do some research on this for my bachelor's thesis. iirc you or hexa (signing key rotation when) brought it up when I asked for relevant issues before in the offtopic channel. The main thing for me here was not just to count store paths but to measure their total size (as, e.g., firefox-bin is heavier than harfbuzz). My main plan was to build a prototype with an extra agent/nix-scheduler-hook and use the results from testing it to propose changes to the Hydra queue runner later. The number/total size of paths was meant for estimating the actual computational cost of, e.g., asking the daemon for the subset of paths in the store. | 11:41:37 | |
| 11:56:04 | |
| those are our linux builders, the first two x86_64-linux, the last two aarch64-linux | 11:57:39 | |
| Thank you a lot! | 11:58:08 | |
| and our gc logic | 11:58:42 | |
| Thank you for pointing me to the configurations, I was aware of that to some degree (searching for the metrics initially led me to the system configurations). The 500GB limit for a GC run is presumably to ensure that the GC is not locking the store for too long? | 12:01:36 | |
| yeah, gc is not free and if we keep some outputs around for reuse that's good too | 12:03:27 | |
| we might eventually migrate to fast-nix-gc in a bit | 12:03:59 | |
| | 12:04:12 | |
| https://github.com/Mic92/fast-nix-gc | 12:04:35 | |
| That also looks very interesting, especially the notes on SQLite queries in fast-nix-gc. | 12:06:42 | |
| I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck | 12:07:33 | |
| The main optimizations were about reducing the number/cost of queries needed to evaluate the subset size when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research iirc. | 12:13:23 | |
| fast-nix-gc does not really have anything related to this, it just mentions that they load the paths into a graph for the GC search first instead of querying the store for all lookups. | 12:14:15 | |
| Well right now the scheduling is very stupid | 12:14:20 | |
| There's no locality awareness | 12:14:30 | |
| Or hell job size awareness | 12:14:32 | |
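
For readers following the thread, here is a rough sketch of the idea discussed above: a Bloom filter over store hashes that a scheduler could consult to estimate how many bytes of a build closure are already on a builder. Nothing here is Hydra or Nix code. The class, the helper names (`store_hash`, `estimate_present_bytes`), the sizing defaults and the example store paths are all assumptions made up for illustration, and the sizes would have to come from somewhere else (for example NAR sizes known to the scheduler).

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all bit positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def store_hash(store_path: str) -> str:
    """Return the hash part of a path shaped like /nix/store/<hash>-<name>."""
    return store_path.rsplit("/", 1)[-1].split("-", 1)[0]


def estimate_present_bytes(closure_sizes: dict, builder_filter: BloomFilter) -> int:
    """Sum the sizes of closure paths the filter reports as present.

    Because of false positives this can overestimate, but it never
    misses a path that was actually added to the filter.
    """
    return sum(size for path, size in closure_sizes.items()
               if store_hash(path) in builder_filter)


# Hypothetical data: the hashes and sizes below are invented for the example.
on_builder = [
    "/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-harfbuzz",
    "/nix/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-glibc",
]
closure = {
    "/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-harfbuzz": 100,
    "/nix/store/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb-glibc": 200,
    "/nix/store/cccccccccccccccccccccccccccccccc-firefox-bin": 5000,
}

builder_filter = BloomFilter(expected_items=1000)
for path in on_builder:
    builder_filter.add(store_hash(path))

present = estimate_present_bytes(closure, builder_filter)
to_substitute = sum(closure.values()) - present
```

The filter only answers membership, so weighting by size (the firefox-bin vs. harfbuzz point) happens on the scheduler side. The filter would still have to be produced by something running on each worker, which is the extra-agent problem mentioned earlier in the conversation.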