!MthpOIxqJhTgrMNxDS:nixos.org

NixOS ACME / LetsEncrypt

104 Members
Another day, another cert renewal
43 Servers

Sender Message Time
7 Nov 2024
@k900:0upti.me K900 ⚡️ webserver: waiting for unit acme-finished-http.example.test.target Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs" 16:09:28
@k900:0upti.me K900 ⚡️ Again 16:09:29
8 Nov 2024
@m1cr0man:m1cr0man.com m1cr0man https://github.com/NixOS/nixpkgs/pull/336412 sometimes a fresh set of eyes is all that's needed. ThinkChaos' change here should significantly reduce flakiness. 15:45:47
@k900:0upti.me K900 ⚡️ Appreciated 15:47:40
@k900:0upti.me K900 ⚡️ Is it good to merge? 15:47:44
@k900:0upti.me K900 ⚡️ OK I assume yes 15:50:52
@m1cr0man:m1cr0man.com m1cr0man Yes - apologies I closed my client 15:59:30
@k900:0upti.me K900 ⚡️ Nope :( 20:41:11
@k900:0upti.me K900 ⚡️
webserver # the following new units were started: acme-http.example.test.timer, multi-user.target, network-online.target, run-credentials-getty\x40tty1.service.mount, run-credentials-systemd\x2dtmpfiles\x2dresetup.service.mount, sysinit-reactivation.target, systemd-tmpfiles-resetup.service
webserver # [   14.902862] nixos[844]: finished switching to system configuration /nix/store/m1jmxwnpaibvj9szm7q3li1nia20q7d2-nixos-system-webserver-test
(finished: must succeed: /run/current-system/specialisation/http01lego/bin/switch-to-configuration test, in 2.37 seconds)
webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"
cleanup
20:41:14
@k900:0upti.me K900 ⚡️ Hmm wait 20:44:22
@k900:0upti.me K900 ⚡️ This feels wrong 20:44:23
@k900:0upti.me K900 ⚡️ The service isn't even started by the switch 20:44:30
@k900:0upti.me K900 ⚡️ Yeah OK this is definitely a race 20:46:54
@k900:0upti.me K900 ⚡️ https://github.com/NixOS/nixpkgs/pull/354629 22:58:33
@k900:0upti.me K900 ⚡️ OK last thing I'm doing for the night 22:58:38
@k900:0upti.me K900 ⚡️ I tried a bunch of ways to make it fail and it didn't 23:02:46
@k900:0upti.me K900 ⚡️ Which is a good sign 23:02:48
@k900:0upti.me K900 ⚡️ The funny thing is 23:03:20
@k900:0upti.me K900 ⚡️ It actually fails if the test runs too fast 23:03:27
@k900:0upti.me K900 ⚡️ Unlike most of our other flakes 23:03:40
@m1cr0man:m1cr0man.com m1cr0man
In reply to @k900:0upti.me
I tried a bunch of ways to make it fail and it didn't
I once left my server executing the test suite in a loop over 24 hours and had no failures. I've never been able to reproduce the issue when I want to 😅 that change does look good though.
23:06:01
@k900:0upti.me K900 ⚡️ I was hoping I could get it to trigger by giving the server machine a lot of resources and the CA machine no resources 23:06:40
@k900:0upti.me K900 ⚡️ And that did not help either 23:06:47
@k900:0upti.me K900 ⚡️ Great 23:07:45
@k900:0upti.me K900 ⚡️ Just found a new way the test fails 23:07:49
@m1cr0man:m1cr0man.com m1cr0man wait really it failed already? 23:07:59
@k900:0upti.me K900 ⚡️ Locally 23:08:14
@k900:0upti.me K900 ⚡️
       > webserver # [  450.572939] acme-httpd-http.example.test-start[7384]: [httpd-http-alias.example.test] acme: error: 400 :: urn:ietf:params:acme:error:connection :: Get "http://httpd-http-alias.example.test:80/.well-known/acme-challenge/gA8qyav0HCSbazw_qhgCZ5EsI8QX8_pnpHNCGsQ3WDg": dial tcp 192.168.1.4:80: connect: connection refused
23:08:31
@m1cr0man:m1cr0man.com m1cr0man

Lol was about to write this before your msg:

The only other race or failure that I can think of is that sometimes, for some reason, one part of the stack required for renewal (either Pebble, DNS, or a web server) was not responding to requests.

< there it is meme >

23:09:00
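
[Editor's sketch] The "connection refused" failure above and the remark that one part of the stack (Pebble, DNS, or a web server) was sometimes not responding both point to a readiness problem. One common way to narrow such a race is to poll until the dependency accepts TCP connections before triggering the step that needs it. The helper below is a minimal, hypothetical illustration in plain Python; the name `wait_for_port` and its parameters are assumptions made for this example, and it is not the code used by the nixpkgs ACME tests or by either pull request linked in this log.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connection.
    (Hypothetical helper for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the scenario from the log, a check such as `wait_for_port("192.168.1.4", 80)` before requesting the certificate would separate "web server not listening yet" from a genuine ACME failure. It only confirms that the port accepts connections, not that the challenge path is being served.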
@m1cr0man:m1cr0man.com m1cr0man what do the logs above that show nginx/apache doing on the webserver? 23:09:56

Room Version: 6