7 Nov 2024 |
K900 ⚡️ | webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs" | 16:09:28 |
K900 ⚡️ | Again | 16:09:29 |
8 Nov 2024 |
m1cr0man | https://github.com/NixOS/nixpkgs/pull/336412 sometimes a fresh set of eyes is all that's needed. ThinkChaos' change here should significantly reduce flakiness. | 15:45:47 |
K900 ⚡️ | Appreciated | 15:47:40 |
K900 ⚡️ | Is it good to merge? | 15:47:44 |
K900 ⚡️ | OK I assume yes | 15:50:52 |
m1cr0man | Yes - apologies I closed my client | 15:59:30 |
K900 ⚡️ | Nope :( | 20:41:11 |
K900 ⚡️ | webserver # the following new units were started: acme-http.example.test.timer, multi-user.target, network-online.target, run-credentials-getty\x40tty1.service.mount, run-credentials-systemd\x2dtmpfiles\x2dresetup.service.mount, sysinit-reactivation.target, systemd-tmpfiles-resetup.service
webserver # [ 14.902862] nixos[844]: finished switching to system configuration /nix/store/m1jmxwnpaibvj9szm7q3li1nia20q7d2-nixos-system-webserver-test
(finished: must succeed: /run/current-system/specialisation/http01lego/bin/switch-to-configuration test, in 2.37 seconds)
webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"
cleanup
| 20:41:14 |
K900 ⚡️ | Hmm wait | 20:44:22 |
K900 ⚡️ | This feels wrong | 20:44:23 |
K900 ⚡️ | The service isn't even started by the switch | 20:44:30 |
K900 ⚡️ | Yeah OK this is definitely a race | 20:46:54 |
K900 ⚡️ | https://github.com/NixOS/nixpkgs/pull/354629 | 22:58:33 |
K900 ⚡️ | OK last thing I'm doing for the night | 22:58:38 |
K900 ⚡️ | I tried a bunch of ways to make it fail and it didn't | 23:02:46 |
K900 ⚡️ | Which is a good sign | 23:02:48 |
K900 ⚡️ | The funny thing is | 23:03:20 |
K900 ⚡️ | It actually fails if the test runs too fast | 23:03:27 |
K900 ⚡️ | Unlike most of our other flakes | 23:03:40 |
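The "fails if the test runs too fast" behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions: `check_unit`, `outcome`, and the two timelines are hypothetical names and made-up data, not the real NixOS test driver code, and the unit states are simplified. It assumes a wait check that gives up as soon as the unit is inactive and nothing is queued, which matches the error message quoted in the log.

```python
# Simplified, hypothetical model of a "wait for unit" check.
# This is NOT the real NixOS test driver; names and timelines are illustrative.

UNIT = "acme-finished-http.example.test.target"


class UnitWaitError(Exception):
    """Raised when waiting any longer looks pointless."""


def check_unit(active_state: str, pending_jobs: int, unit: str) -> bool:
    """Return True if the unit is active, False if waiting is still worthwhile.

    Raises UnitWaitError if the unit failed, or if it is inactive and no job
    is queued that could ever activate it.
    """
    if active_state == "failed":
        raise UnitWaitError(f'unit "{unit}" reached state "failed"')
    if active_state == "inactive" and pending_jobs == 0:
        raise UnitWaitError(
            f'unit "{unit}" is inactive and there are no pending jobs'
        )
    return active_state == "active"


def outcome(timeline, unit: str = UNIT) -> str:
    """Poll through (active_state, pending_jobs) samples and report the result."""
    try:
        for active_state, pending_jobs in timeline:
            if check_unit(active_state, pending_jobs, unit):
                return "active"
    except UnitWaitError as err:
        return f"failed: {err}"
    return "timeout"


# Assumed timeline when the test polls only after a start job has been queued.
slow_run = [("inactive", 1), ("activating", 1), ("active", 0)]

# Assumed timeline when the test polls before anything has queued a job:
# the very first sample looks hopeless, so the check gives up at once.
fast_run = [("inactive", 0), ("inactive", 1), ("active", 0)]

if __name__ == "__main__":
    print("slow:", outcome(slow_run))
    print("fast:", outcome(fast_run))
```

In this model the fast run fails on its first sample even though the unit would have become active two samples later, which is why giving the machines more or fewer resources changes whether the failure shows up.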
m1cr0man | (In reply to @k900:0upti.me: "I tried a bunch of ways to make it fail and it didn't") I once left my server executing the test suite in a loop for over 24 hours and had no failures. I've never been able to reproduce the issue when I want to 😅 That change does look good though. | 23:06:01 |
K900 ⚡️ | I was hoping I could get it to trigger by giving the server machine a lot of resources and the CA machine no resources | 23:06:40 |
K900 ⚡️ | And that did not help either | 23:06:47 |
K900 ⚡️ | Great | 23:07:45 |
K900 ⚡️ | Just found a new way the test fails | 23:07:49 |
m1cr0man | wait really it failed already? | 23:07:59 |
K900 ⚡️ | Locally | 23:08:14 |
K900 ⚡️ | > webserver # [ 450.572939] acme-httpd-http.example.test-start[7384]: [httpd-http-alias.example.test] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Get "http://httpd-http-alias.example.test:80/.well-known/acme-challenge/gA8qyav0HCSbazw_qhgCZ5EsI8QX8_pnpHNCGsQ3WDg": dial tcp 192.168.1.4:80: connect: connection refused
| 23:08:31 |
m1cr0man | Lol was about to write this before your msg:
The only other race or failure that I can think of is that sometimes, for some reason, one part of the stack required for renewal (either Pebble, DNS, or a web server) was not responding to requests.
< there it is meme >
| 23:09:00 |
m1cr0man | what do the logs above that show nginx/apache doing on the webserver? | 23:09:56 |