28 Jun 2025 |
ElvishJerricco | I've been running a personal hydra for a couple years now, and a couple days ago it basically just stopped doing builds. I can get it to do a couple more every time I restart it, but it doesn't continue through the queue. Meanwhile in the journal I see this every 10 seconds: hydra-queue-runner[345]: checking the queue for builds... | 02:41:59 |
aftix | I had something similar, which happened because the disk was full | 03:01:48 |
hexa | oh yeah, watermark level 🙂 | 03:04:21 |
hexa | minimumDiskFree[Evaluator] | 03:04:47 |
ElvishJerricco | that has never been a problem before, and my disk is not as full as it has been in the past | 03:05:34 |
ElvishJerricco | how would I check if that's it? | 03:05:43 |
hexa | did you configure a limit? | 03:09:11 |
ElvishJerricco | nope | 03:09:20 |
hexa | then the feature is probably not enabled | 03:09:40 |
hexa | you can inspect the internal state of the queue-runner by appending /queue-runner-status to the base url | 03:11:59 |
hexa | relevant is whether there are runnables, because only those jobs are ready to be scheduled | 03:12:59 |
hexa | ideally consecutiveFailures is 0 for all builders | 03:13:17 |
ElvishJerricco | "machineTypes" : {
"x86_64-linux:kvm,nixos-test" : {
"runnable" : 5,
"running" : 0
},
"x86_64-linux:local" : {
"runnable" : 0,
"running" : 0
}
},
"consecutiveFailures" : 1, for the builder, presumably because I canceled a build
| 03:17:54 |
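The two checks hexa describes (are there runnables, and is consecutiveFailures 0 for every builder) can be scripted. The following is a minimal Python sketch, not Hydra's own tooling. It assumes that /queue-runner-status returns JSON and that per-builder entries sit under a top-level "machines" object keyed by builder URI; that key name is an assumption, since only "machineTypes", "runnable", "running" and "consecutiveFailures" appear in the log. The helper names `fetch_status`, `stuck_machine_types` and `failing_machines` are hypothetical, and the second builder in the sample data is made up for illustration.

```python
import json
import urllib.request


def fetch_status(base_url):
    """Fetch <base_url>/queue-runner-status and parse it (assumed to be JSON)."""
    url = base_url.rstrip("/") + "/queue-runner-status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def stuck_machine_types(status):
    """Machine types that have runnable steps but nothing currently running."""
    return [
        name
        for name, counts in status.get("machineTypes", {}).items()
        if counts.get("runnable", 0) > 0 and counts.get("running", 0) == 0
    ]


def failing_machines(status):
    """Builders whose consecutiveFailures is non-zero.

    Assumption: builders are listed under a top-level "machines" object.
    """
    return {
        name: machine.get("consecutiveFailures", 0)
        for name, machine in status.get("machines", {}).items()
        if machine.get("consecutiveFailures", 0) > 0
    }


# Sample shaped like the snippet pasted above; "machines" layout is assumed
# and "ssh://builder@other" is an invented second builder.
sample = {
    "machineTypes": {
        "x86_64-linux:kvm,nixos-test": {"runnable": 5, "running": 0},
        "x86_64-linux:local": {"runnable": 0, "running": 0},
    },
    "machines": {
        "ssh://builder@pyromancer": {"consecutiveFailures": 1},
        "ssh://builder@other": {"consecutiveFailures": 0},
    },
}

print(stuck_machine_types(sample))
print(failing_machines(sample))
```

Against a live instance, `fetch_status("https://hydra.example.org")` (a placeholder URL) would replace `sample`. Runnable steps with nothing running, as in the pasted status, suggest the queue runner has work available but is not dispatching it to any builder.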
hexa | does the builder have the relevant system features? | 03:30:49 |
ElvishJerricco | Yep | 03:30:55 |
ElvishJerricco | I also see "disabledUntil" : 1751079864, which corresponds with the log message hydra-queue-runner[345]: will disable machine ‘ssh://builder@pyromancer’ for 71s that came after I canceled that build. | 03:31:24 |
ElvishJerricco | But that timestamp has come and gone | 03:31:28 |
ElvishJerricco | long gone | 03:31:30 |
ElvishJerricco | should it still have that disabledUntil field if the time has passed? | 03:31:48 |
hexa | I don't think so | 03:32:06 |
hexa | or yes, it can | 03:32:22 |
hexa | h.n.o has that as well | 03:32:27 |
ElvishJerricco | My other builder has "disabledUntil" : 0, | 03:32:40 |
hexa | yeah, if it never failed | 03:32:46 |
hexa | I think consecutiveFailures gets reset when it continues working | 03:33:05 |
hexa | but disabledUntil and lastFailure are sticky until queue-runner restart | 03:33:24 |
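Since disabledUntil can stay set after the penalty has passed, what matters is how it compares with the current time, not whether the field is present. The sketch below assumes the value is a Unix timestamp in seconds (consistent with the number in the log) and that 0 means the builder never failed, as hexa says. `disabled_state` and `as_utc` are hypothetical helper names.

```python
import time
from datetime import datetime, timezone


def as_utc(epoch_seconds):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def disabled_state(disabled_until, now=None):
    """Classify a disabledUntil value relative to `now` (defaults to the current time)."""
    if now is None:
        now = time.time()
    if disabled_until == 0:
        return "never disabled"
    if disabled_until > now:
        return "disabled"
    # A past value left over from an earlier failure; the builder should be usable again.
    return "expired"


# Value from the status output quoted above.
print(as_utc(1751079864))
print(disabled_state(1751079864))
```

If this reports "expired" while runnable steps still are not being picked up, the disable timer is not what is holding the queue back, which fits hexa's suggestions to look at the SSH session and the local nix-daemon instead.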
hexa | possibly stale ssh session? | 03:33:51 |
hexa | kill the local nix-daemon? | 03:34:04 |
hexa | test the ssh connection? | 03:34:08 |