28 Jun 2025 |
hexa | ideally consecutiveFailures is 0 for all builders | 03:13:17 |
ElvishJerricco | "machineTypes" : {
"x86_64-linux:kvm,nixos-test" : {
"runnable" : 5,
"running" : 0
},
"x86_64-linux:local" : {
"runnable" : 0,
"running" : 0
}
},
and "consecutiveFailures" : 1 for the builder, presumably because I canceled a build
| 03:17:54 |
hexa | does the builder have the relevant system features? | 03:30:49 |
ElvishJerricco | Yep | 03:30:55 |
ElvishJerricco | I also see "disabledUntil" : 1751079864, which corresponds to the log message "hydra-queue-runner[345]: will disable machine ‘ssh://builder@pyromancer’ for 71s" that appeared after I canceled that build. | 03:31:24 |
ElvishJerricco | But that timestamp has come and gone | 03:31:28 |
ElvishJerricco | long gone | 03:31:30 |
ElvishJerricco | should it still have that disabledUntil field if the time has passed? | 03:31:48 |
hexa | I don't think so | 03:32:06 |
hexa | or yes, it can | 03:32:22 |
hexa | h.n.o has that as well | 03:32:27 |
ElvishJerricco | My other builder has "disabledUntil" : 0, | 03:32:40 |
hexa | yeah, if it never failed | 03:32:46 |
hexa | I think consecutiveFailures gets reset when it continues working | 03:33:05 |
hexa | but disabledUntil and lastFailure are sticky until queue-runner restart | 03:33:24 |
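A minimal Python sketch for telling a builder that is actually disabled apart from one that only carries a stale, already-elapsed disabledUntil. It assumes "disabledUntil" is a Unix timestamp in seconds (0 meaning never disabled), as the values quoted above suggest, and it uses a hypothetical flattened dict rather than the exact layout of hydra-queue-runner's status JSON. The helper name is_currently_disabled is made up for illustration.

```python
import time
from typing import Optional

# Hypothetical per-machine entries, modeled on the fields quoted in this log.
# The real queue-runner status JSON may nest these differently.
machine_status = {
    "ssh://builder@pyromancer": {"consecutiveFailures": 1, "disabledUntil": 1751079864},
    "other-builder": {"consecutiveFailures": 0, "disabledUntil": 0},
}


def is_currently_disabled(entry: dict, now: Optional[float] = None) -> bool:
    """Return True only if disabledUntil is still in the future.

    A nonzero disabledUntil in the past is treated as a leftover record of an
    earlier failure, not as an active disable (assumption based on this log).
    """
    if now is None:
        now = time.time()
    return entry.get("disabledUntil", 0) > now


for uri, entry in machine_status.items():
    until = entry.get("disabledUntil", 0)
    # Render the timestamp in UTC so it can be compared with journal entries.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(until)) if until else "never"
    state = "disabled" if is_currently_disabled(entry) else "enabled"
    print(f"{uri}: {state}, disabledUntil={stamp}, "
          f"consecutiveFailures={entry['consecutiveFailures']}")
```

For the value above, 1751079864 renders as 2025-06-28 03:04:24 UTC, so by the time of this discussion the sketch would report the builder as enabled even though the field is still nonzero.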
hexa | possibly stale ssh session? | 03:33:51 |
hexa | kill the local nix-daemon? | 03:34:04 |
hexa | test the ssh connection? | 03:34:08 |
ElvishJerricco | well ssh is working, it seems | 03:36:08 |
ElvishJerricco | the way this hydra is set up is a little stupid | 03:36:17 |
ElvishJerricco | it's running in a NixOS container on the host that is the builder :P | 03:36:34 |
ElvishJerricco | I don't remember why I set it up this way; I assume the idea was that the daemon socket, bind-mounted from the host into the container, would let hydra use the local machine as a builder | 03:38:23 |
ElvishJerricco | huh, after restarting the nix daemon on the host, there are three processes left in the systemd unit from the previous service instance | 03:40:06 |
ElvishJerricco | I think whatever's going on has something to do with failed builds. It seems like it's chewing through successful builds, but once one fails it stops scheduling builds and never starts again | 03:47:50 |
ElvishJerricco | I recently updated this system, which included this hydra update: https://github.com/NixOS/nixpkgs/commit/cd9bf3369b9fc4ea0a6a8d91902a41d520580cb9
Which begins with this commit: https://github.com/NixOS/hydra/commit/720db63d52ebcbda617603e7aa5b5c750cc6afec | 05:05:32 |
ElvishJerricco | hmmm | 05:05:40 |
ElvishJerricco | mayhaps this "smarter scheduling criteria" is causing my problem? | 05:05:58 |
ElvishJerricco | well, still seems like I have the problem after patching in a revert for that | 05:18:25 |
ElvishJerricco | ok no, reverting hydra's src to the version I was using before does not fix the problem | 05:30:01 |
ElvishJerricco | I'm at a loss here. Anybody have any ideas I can try? | 05:32:52 |