!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

427 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real-time alerts from Prometheus.
131 Servers

20 May 2026
@c0ba1t:matrix.org Cobalt

I'm aware of the issues with just tacking it onto the existing daemon; I plan to do some research on this for my bachelor's thesis. IIRC you or hexa (signing key rotation when) brought it up when I asked for relevant issues in the offtopic channel. The main thing for me here was not just to count store paths but to get their total size (as, e.g., firefox-bin is heavier than harfbuzz).

My main plan was to build a prototype with an extra agent/nix-scheduler-hook and use the test results from it to propose changes to the hydra queue runner later. The number/total size of paths was meant to help estimate the actual computational cost of, e.g., asking the daemon for the subset of paths in the store.

11:41:37
@hexa:lossy.network hexa
[root@elated-minsky:~]# nix shell nixpkgs#sqlite -c sqlite3 /nix/var/nix/db/db.sqlite 'select count(*), sum(narSize) from ValidPaths'
391244|5632213988352
[root@sleepy-brown:~]# nix shell nixpkgs#sqlite -c sqlite3 /nix/var/nix/db/db.sqlite 'select count(*), sum(narSize) from ValidPaths'
335188|5232269242216
[root@goofy-hopcroft:~]# nix shell nixpkgs#sqlite -c sqlite3 /nix/var/nix/db/db.sqlite 'select count(*), sum(narSize) from ValidPaths'
828294|12762811007880
[root@hopeful-rivest:~]# nix shell nixpkgs#sqlite -c sqlite3 /nix/var/nix/db/db.sqlite 'select count(*), sum(narSize) from ValidPaths'
157509|2004835377096
11:56:04
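Editor's note: the query above returns a path count and a byte total per builder. The sketch below runs the same query from Python and puts the result in more readable units. It is only an illustration: the helper names `store_stats` and `summarize` and the in-memory stand-in database (which has only a few of the real table's columns) are assumptions, not part of any Nix tooling.

```python
import sqlite3


def store_stats(conn):
    """Return (path_count, total_nar_bytes) using the query from the chat."""
    count, total = conn.execute(
        "select count(*), sum(narSize) from ValidPaths"
    ).fetchone()
    # sum() yields NULL (None) on an empty table, so fall back to 0.
    return count, total or 0


def summarize(count, total_bytes):
    """Format the two numbers as a count, a TiB total and a mean size per path."""
    tib = total_bytes / 2**40
    mean_mib = total_bytes / count / 2**20 if count else 0.0
    return f"{count} paths, {tib:.2f} TiB total, {mean_mib:.1f} MiB per path on average"


# Stand-in database so the sketch runs anywhere. On a real machine one could
# instead open the store database read-only, for example:
#   sqlite3.connect("file:/nix/var/nix/db/db.sqlite?mode=ro", uri=True)
demo = sqlite3.connect(":memory:")
demo.execute(
    "create table ValidPaths (id integer primary key, path text, narSize integer)"
)
demo.executemany(
    "insert into ValidPaths (path, narSize) values (?, ?)",
    [
        ("/nix/store/hypothetical-firefox-bin", 250_000_000),  # made-up size
        ("/nix/store/hypothetical-harfbuzz", 3_000_000),       # made-up size
    ],
)
print(summarize(*store_stats(demo)))

# The figures posted above for elated-minsky, passed in directly.
print(summarize(391244, 5632213988352))
```

Summing `narSize` rather than only counting rows captures the point made earlier in the thread: a few heavy paths can dominate the total.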
@hexa:lossy.network hexa
those are our linux builders, the first two x86_64-linux, the last two aarch64-linux
11:57:39
@c0ba1t:matrix.org Cobalt
Thank you a lot!
11:58:08
@hexa:lossy.network hexa
and our gc logic
11:58:42
@c0ba1t:matrix.org Cobalt
Thank you for pointing me to the configurations, I was aware of that to some degree (searching for the metrics initially led me to the system configurations). The 500GB limit for a GC run is presumably to ensure that the GC is not locking the store for too long?
12:01:36
@hexa:lossy.network hexa
yeah, gc is not free and if we keep some outputs around for reuse that's good too
12:03:27
@hexa:lossy.network hexa
we might eventually migrate to fast-nix-gc in a bit
12:03:59
@hexa:lossy.network hexa
that'll fix us
12:04:12
@hexa:lossy.network hexa
https://github.com/Mic92/fast-nix-gc
12:04:35
@c0ba1t:matrix.org Cobalt
That also looks very interesting, especially the notes on SQLite queries in fast-nix-gc.
12:06:42
@k900:0upti.me K900
I'm not sure what kind of optimization you're looking at here, but generally GC or any kind of store queries aren't the bottleneck
12:07:33
@c0ba1t:matrix.org Cobalt
The main optimizations were about how to reduce the number/cost of queries for evaluating the subset size required when scheduling a set of builds. They might not be a bottleneck by themselves, but caching and/or the applicability of a stochastic data structure seems an interesting extension. My supervisor was interested in this specific sub-problem as it relates a bit to his own research iirc.
12:13:23
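Editor's note: the message above does not say which stochastic data structure is meant. A Bloom filter is one common candidate for "which of these paths might already be present" checks, so the sketch below uses it purely as an assumed example. The class, the helper `definitely_missing`, the sizing parameters and the store paths are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive all bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def definitely_missing(bloom, wanted):
    """Paths the filter rules out. Everything else still needs an exact store query."""
    return [path for path in wanted if path not in bloom]


# Hypothetical summary of what one builder already has in its store.
builder_summary = BloomFilter()
for path in ["/nix/store/hypothetical-glibc", "/nix/store/hypothetical-gcc"]:
    builder_summary.add(path)

wanted = ["/nix/store/hypothetical-glibc", "/nix/store/hypothetical-firefox-bin"]
print(definitely_missing(builder_summary, wanted))
```

The trade-off is that a "maybe present" answer can be wrong, so such a filter could only reduce the number of exact queries, not replace them.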
@c0ba1t:matrix.org Cobalt
fast-nix-gc does not really have anything related to this, it just mentions that they load the paths into a graph for the GC search first instead of querying the store for all lookups.
12:14:15
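Editor's note: the idea described above, loading the reference graph once and searching it in memory rather than issuing one store query per lookup, can be sketched as follows. This is not fast-nix-gc's implementation. The function names, the stand-in database and the paths are hypothetical; the `ValidPaths` and `Refs` tables mirror the Nix store database only in the columns used here.

```python
import sqlite3
from collections import defaultdict, deque


def load_reference_graph(conn):
    """Read all paths and reference edges with two queries, then work in memory."""
    paths = dict(conn.execute("select id, path from ValidPaths"))
    edges = defaultdict(list)
    for referrer, reference in conn.execute("select referrer, reference from Refs"):
        edges[referrer].append(reference)
    return paths, edges


def live_paths(paths, edges, root_ids):
    """Breadth-first search from the GC roots over the in-memory graph."""
    seen = set(root_ids)
    queue = deque(root_ids)
    while queue:
        node = queue.popleft()
        for ref in edges.get(node, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return {paths[i] for i in seen}


# Stand-in database with a tiny hypothetical closure: app -> lib -> libc, plus an orphan.
demo = sqlite3.connect(":memory:")
demo.execute("create table ValidPaths (id integer primary key, path text)")
demo.execute("create table Refs (referrer integer, reference integer)")
demo.executemany(
    "insert into ValidPaths (id, path) values (?, ?)",
    [
        (1, "/nix/store/hypothetical-app"),
        (2, "/nix/store/hypothetical-lib"),
        (3, "/nix/store/hypothetical-libc"),
        (4, "/nix/store/hypothetical-orphan"),
    ],
)
demo.executemany(
    "insert into Refs (referrer, reference) values (?, ?)",
    [(1, 2), (2, 3), (3, 3)],  # (3, 3): a path may reference itself
)

paths, edges = load_reference_graph(demo)
live = live_paths(paths, edges, root_ids=[1])
dead = set(paths.values()) - live
print(sorted(dead))
```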
@k900:0upti.me K900
Well right now the scheduling is very stupid
12:14:20
@k900:0upti.me K900
There's no locality awareness
12:14:30
@k900:0upti.me K900
Or hell job size awareness
12:14:32
@k900:0upti.me K900
Improving it will definitely help a little, but the big bottleneck is still the coordinator itself
12:15:32
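Editor's note: "locality awareness" could, for example, mean preferring the builder that needs the fewest bytes copied to it before a build can start. The sketch below shows that one idea in isolation. It is not how the hydra queue runner works, it ignores load, system type and everything else a real scheduler must weigh, and all names and sizes are made up.

```python
def missing_bytes(job_inputs, builder_paths):
    """Total size of the job's input paths that the builder does not have yet."""
    return sum(
        size for path, size in job_inputs.items() if path not in builder_paths
    )


def pick_builder(job_inputs, builders):
    """Choose the builder with the least data to copy (first one wins on ties)."""
    return min(builders, key=lambda name: missing_bytes(job_inputs, builders[name]))


# Hypothetical state: which paths each builder already holds.
builders = {
    "builder-a": {"/nix/store/hypothetical-gcc", "/nix/store/hypothetical-glibc"},
    "builder-b": {"/nix/store/hypothetical-glibc"},
}

# Hypothetical job: input path -> size in bytes.
job_inputs = {
    "/nix/store/hypothetical-gcc": 200_000_000,
    "/nix/store/hypothetical-glibc": 30_000_000,
    "/nix/store/hypothetical-firefox-src": 500_000_000,
}

for name, present in builders.items():
    print(name, missing_bytes(job_inputs, present))
print("chosen:", pick_builder(job_inputs, builders))
```

Weighting by size instead of by path count is what makes the earlier `sum(narSize)` figures relevant to such a decision.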
@hexa:lossy.network hexa
the scheduler
12:15:50
@k900:0upti.me K900
The coordinator as in the machine
12:16:01
@k900:0upti.me K900
But yeah
12:16:02
@hexa:lossy.network hexa
the coordinator is the process that runs the remote build
12:16:14
@c0ba1t:matrix.org Cobalt
So just to understand this a bit more, a significant problem is the performance of the software running the scheduler/coordinator (so the queue runner)?
12:17:07
@k900:0upti.me K900
It's not even the software necessarily
12:18:34
@k900:0upti.me K900
It's the design of the whole thing that requires a lot of copying data around
12:18:44
@c0ba1t:matrix.org Cobalt
One motivation for the optimizations was to ensure that scheduling stays cheap-ish, so I would try not to compromise this too much.
12:18:53
