!MthpOIxqJhTgrMNxDS:nixos.org

NixOS ACME / LetsEncrypt

106 Members
Another day, another cert renewal
45 Servers

Sender  Message  Time
5 Nov 2024
@k900:0upti.me  K900 ⚡️  Can someone please look into the flaky tests  06:44:50
@k900:0upti.me  K900 ⚡️  It's been happening more and more lately  06:44:56
7 Nov 2024
@k900:0upti.me  K900 ⚡️  Folks I know I am starting to sound like a broken record  07:00:33
@k900:0upti.me  K900 ⚡️  But the tests are flaking  07:00:39
@k900:0upti.me  K900 ⚡️  And I really don't want to retire them from the blocking jobs  07:00:47
@k900:0upti.me  K900 ⚡️  And I have no idea what is going on there  07:01:00
@k900:0upti.me  K900 ⚡️  Can someone with either knowledge or more free time please take a look  07:01:12
@emilazy:matrix.org  emily  cc m1cr0man  07:02:07
@emilazy:matrix.org  emily  the ACME tests are pretty important since they're the one line of defence we have against everyone's services going completely unavailable. unfortunately they have also long since exceeded the complexity at which I feel like I have a handle on them and I know m1cr0man only has so much time these days :(  07:03:32
@m1cr0man:m1cr0man.com  m1cr0man  Are they still flaking? I did put out some fixes a few weeks ago to help reduce flakiness by wrapping some of the assertions in retries. I hadn't heard anything more so I assumed it was fixed. I am a bit better for time now (house move over) so I can look into it again. Feel free to spam me with any failures you see. I'll take a look on hydra too. Wrt actual test complexity: I'm not sure how to simplify it. There are a lot of moving parts to testing acme. I did put a nice summary into an issue comment last week. https://github.com/NixOS/nixpkgs/pull/340136#issuecomment-2448648944  09:20:12
@k900:0upti.me  K900 ⚡️  Yes, they are  09:27:47
@k900:0upti.me  K900 ⚡️  webserver: waiting for unit acme-finished-http.example.test.target Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"  16:09:28
@k900:0upti.me  K900 ⚡️  Again  16:09:29
8 Nov 2024
@m1cr0man:m1cr0man.com  m1cr0man  https://github.com/NixOS/nixpkgs/pull/336412 sometimes a fresh set of eyes is all that's needed. ThinkChaos' change here should significantly reduce flakiness.  15:45:47
@k900:0upti.me  K900 ⚡️  Appreciated  15:47:40
@k900:0upti.me  K900 ⚡️  Is it good to merge?  15:47:44
@k900:0upti.me  K900 ⚡️  OK I assume yes  15:50:52
@m1cr0man:m1cr0man.com  m1cr0man  Yes - apologies I closed my client  15:59:30
@k900:0upti.me  K900 ⚡️  Nope :(  20:41:11
@k900:0upti.me  K900 ⚡️
webserver # the following new units were started: acme-http.example.test.timer, multi-user.target, network-online.target, run-credentials-getty\x40tty1.service.mount, run-credentials-systemd\x2dtmpfiles\x2dresetup.service.mount, sysinit-reactivation.target, systemd-tmpfiles-resetup.service
webserver # [   14.902862] nixos[844]: finished switching to system configuration /nix/store/m1jmxwnpaibvj9szm7q3li1nia20q7d2-nixos-system-webserver-test
(finished: must succeed: /run/current-system/specialisation/http01lego/bin/switch-to-configuration test, in 2.37 seconds)
webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"
cleanup
20:41:14
@k900:0upti.me  K900 ⚡️  Hmm wait  20:44:22
@k900:0upti.me  K900 ⚡️  This feels wrong  20:44:23
@k900:0upti.me  K900 ⚡️  The service isn't even started by the switch  20:44:30
@k900:0upti.me  K900 ⚡️  Yeah OK this is definitely a race  20:46:54
@k900:0upti.me  K900 ⚡️  https://github.com/NixOS/nixpkgs/pull/354629  22:58:33
@k900:0upti.me  K900 ⚡️  OK last thing I'm doing for the night  22:58:38
@k900:0upti.me  K900 ⚡️  I tried a bunch of ways to make it fail and it didn't  23:02:46
@k900:0upti.me  K900 ⚡️  Which is a good sign  23:02:48
@k900:0upti.me  K900 ⚡️  The funny thing is  23:03:20
@k900:0upti.me  K900 ⚡️  It actually fails if the test runs too fast  23:03:27

Room Version: 6