!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

468 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real-time alerts from Prometheus.
147 Servers



Sender Message Time
26 Jun 2026
@hexa:lossy.network hexa (signing key rotation when): it's where most other scheduling systems fail 21:40:11
27 Jun 2026
@shadowrz:nixos.dev Yorusaka Miyabi: https://hydra.nixos.org/build/332738696/download/1/manual/ yields a 500 error 03:35:18
@shadowrz:nixos.dev Yorusaka Miyabi: (Build refers to the development version of the Nix manual) 03:35:43
@grimmauld:m.grimmauld.de Grimmauld (any/all): i just had some fun plotting occupancy against 1d standard deviation (blue). It seems with new queue runner, we have much more fluctuations in occupancy. Why is this? Intuitively i would expect more deviation with worse scheduling. Don't get me wrong, total utilization seems to be higher, but this looks like we may have a delay between a job being finished and a new one being queued, causing occupation to temporarily drop 05:24:05
@grimmauld:m.grimmauld.de Grimmauld (any/all): [image: image.png] 05:24:05
@joerg:thalheim.io Mic92: Until we have https://github.com/NixOS/infra/pull/1099, I don't think those graphs are very meaningful. There are some known bottlenecks I introduced to make it stable. 05:42:58
@joerg:thalheim.io Mic92: hexa (signing key rotation when): I have now also added the patch to the macOS builders: https://github.com/NixOS/infra/pull/1104 But my internet upload is still not great until Sunday, if you wanted to take a stab at it earlier. 05:52:11 (edited 05:52:30)
@whispers:catgirl.cloud whispers [& it/fae]: out of curiosity, is it expected/known that no *-linux jobs are being scheduled at all? the only one is https://hydra.nixos.org/build/332926387, which has been running for five hours. the rest of the machines seem to be sitting fully idle: https://hydra.nixos.org/machines 06:05:06
@vcunat:matrix.org vcunat: Looks like very slow ingestion right now. 06:51:26
@vcunat:matrix.org vcunat: (no idea why) 06:51:32
@vcunat:matrix.org vcunat: And it prefers nixpkgs/unstable which has no linux builds left. 06:52:15
@vcunat:matrix.org vcunat: Though we also have builds in nixos/*-small 🤔 That seems weird. 06:52:55
@vcunat:matrix.org vcunat: Ah, wrong assumptions. In the 7h old eval there are 1.2k linux builds left. 07:00:21
@vcunat:matrix.org vcunat: Let me try restarting the queue runner 🤷 Seems low-risk. 07:01:18
@vcunat:matrix.org vcunat:

This morning the runner is logging lots of

sqlx::query: slow statement: execution time exceeded alert threshold

07:10:20
@k900:0upti.me K900: Does it log which query? 07:13:07
@vcunat:matrix.org vcunat:

Typical example was

Jun 27 06:29:42 mimas hydra-queue-runner[884148]: 2026-06-27T06:29:42.410544Z WARN request:complete_build:succeed_step_by_uuid:succeed_step:mark_succeeded_build:update_build: sqlx::query: slow statement: execution time exceeded alert threshold summary="UPDATE builds SET finished …" db.statement="\n\n\n UPDATE builds SET\n finished = 1,\n buildStatus = $2,\n startTime = $3,\n stopTime = $4,\n size = $5,\n closureSize = $6,\n releaseName = $7,\n isCachedBuild = $8,\n notificationPendingSince = $4\n WHERE\n id = $1\n" rows_affected=1 rows_returned=0 elapsed=136.6054209s elapsed_secs=136.6054209 slow_threshold=1s method=POST uri=/runner.v1.RunnerService/CompleteBuild version=HTTP/2.0 machine_id="2d917310-f886-420b-b961-6709654457ba" build_id="ea683ccd-60b5-49cc-b2db-0fc9dcec66f4" machine_id=2d917310-f886-420b-b961-6709654457ba build_id=ea683ccd-60b5-49cc-b2db-0fc9dcec66f4 machine_id=2d917310-f886-420b-b961-6709654457ba drv_path=hb4wbdx8jknqb2jgp23jxn1nbd03ag7c-steel-language-server-0.8.2.drv build_id=333158552

07:15:18
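
Editorial aside: a log line like the one quoted above can be split into its key=value fields to see which statements are slow and by how much. The following is only a minimal Python sketch and not part of the queue runner. The function name `parse_slow_statement` and the selection of returned fields are assumptions made for illustration; it relies solely on the field layout visible in the quoted line.

```python
import re
from typing import Optional

# key=value pairs; the value is either a double-quoted string
# (with backslash escapes) or a bare token without whitespace.
FIELD_RE = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_slow_statement(line: str) -> Optional[dict]:
    """Return selected fields of a 'slow statement' log line, else None."""
    if "slow statement" not in line:
        return None
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        value = raw[1:-1] if raw.startswith('"') else raw
        # Some keys repeat in the line; keep the first occurrence.
        fields.setdefault(key, value)
    elapsed = fields.get("elapsed_secs")
    return {
        "summary": fields.get("summary"),
        "elapsed_secs": float(elapsed) if elapsed is not None else None,
        "slow_threshold": fields.get("slow_threshold"),
    }
```

Fed the journal output line by line, the results could be grouped by `summary` to see whether one statement dominates.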
@k900:0upti.me K900: Hmmm 07:15:56
@k900:0upti.me K900: This is sus 07:15:59
@vcunat:matrix.org vcunat: After the queue-runner restart it's mainly get_queued_builds:get_not_finished_builds 07:16:02
@k900:0upti.me K900: Can you EXPLAIN ANALYZE those on the real db? 07:16:33
@vcunat:matrix.org vcunat: It's really extreme now. Over 15 minutes it did 20 build steps. 07:17:34
@vcunat:matrix.org vcunat: Though I get that it can have a slower start. 07:17:50
@vcunat:matrix.org vcunat: It doesn't show usable SQL, does it? 07:30:18
@vcunat:matrix.org vcunat: All those $1, $2, etc. 07:30:35
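
Editorial aside: one possible way around the `$1`, `$2` placeholders is PostgreSQL's `PREPARE` together with `EXPLAIN ... EXECUTE`, which takes the statement as logged plus explicit argument values. The sketch below only builds such a script as text. The helper name `explain_script` and the statement name `slow_stmt` are hypothetical, the argument values have to be supplied by the operator as valid SQL literals, and nothing here is taken from the queue runner itself. Because `EXPLAIN ANALYZE` really executes the statement, the script wraps it in `BEGIN` / `ROLLBACK`.

```python
import re
from typing import Sequence


def explain_script(statement: str, params: Sequence[str],
                   name: str = "slow_stmt") -> str:
    """Wrap a $n-parameterized statement in PREPARE / EXPLAIN EXECUTE."""
    # The log shows newlines as a literal backslash followed by 'n'.
    body = " ".join(statement.replace("\\n", " ").split())
    # Highest placeholder index used, e.g. 8 for $1..$8.
    expected = max((int(n) for n in re.findall(r"\$(\d+)", body)), default=0)
    if len(params) != expected:
        raise ValueError(
            f"statement uses ${expected} but {len(params)} values were given"
        )
    args = f"({', '.join(params)})" if params else ""
    return "\n".join([
        "BEGIN;",
        f"PREPARE {name} AS {body};",
        # ANALYZE executes the statement; ROLLBACK below undoes an UPDATE.
        f"EXPLAIN (ANALYZE, BUFFERS) EXECUTE {name}{args};",
        "ROLLBACK;",
    ])
```

The returned text could then be pasted into a `psql` session. If PostgreSQL cannot infer a parameter type from context, the `PREPARE` line would need explicit types added by hand.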



Room Version: 6