!MthpOIxqJhTgrMNxDS:nixos.org

NixOS ACME / LetsEncrypt

103 Members
Another day, another cert renewal
42 Servers



5 Nov 2024
@k900:0upti.me (K900): Can someone please look into the flaky tests [06:44:50]
@k900:0upti.me (K900): It's been happening more and more lately [06:44:56]
7 Nov 2024
@k900:0upti.me (K900): Folks I know I am starting to sound like a broken record [07:00:33]
@k900:0upti.me (K900): But the tests are flaking [07:00:39]
@k900:0upti.me (K900): And I really don't want to retire them from the blocking jobs [07:00:47]
@k900:0upti.me (K900): And I have no idea what is going on there [07:01:00]
@k900:0upti.me (K900): Can someone with either knowledge or more free time please take a look [07:01:12]
@emilazy:matrix.org (emily): cc m1cr0man [07:02:07]
@emilazy:matrix.org (emily): the ACME tests are pretty important since they're the one line of defence we have against everyone's services going completely unavailable. unfortunately they have also long since exceeded the complexity at which I feel like I have a handle on them and I know m1cr0man only has so much time these days :( [07:03:32]
@m1cr0man:m1cr0man.com (m1cr0man): Are they still flaking? I did put out some fixes a few weeks ago to help reduce flakiness by wrapping some of the assertions in retries. I hadn't heard anything more, so I assumed it was fixed. I am a bit better for time now (house move over), so I can look into it again. Feel free to spam me with any failures you see. I'll take a look on Hydra too. Wrt actual test complexity, I'm not sure how to simplify it. There are a lot of moving parts to testing ACME. I did put a nice summary into an issue comment last week: https://github.com/NixOS/nixpkgs/pull/340136#issuecomment-2448648944 [09:20:12]
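The "wrapping assertions in retries" approach can be sketched in Python, the language NixOS test scripts are written in. This is a hypothetical, self-contained helper (`retry_check` is an illustrative name, not the NixOS test driver's API and not the code from m1cr0man's fixes): it re-runs a check a fixed number of times with a delay between attempts and only fails once every attempt has failed.

```python
import time
from typing import Callable


def retry_check(check: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> int:
    """Hypothetical helper: re-run `check` until it returns True.

    Returns the number of attempts that were needed, or raises
    AssertionError if `check` never succeeded.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        # Only sleep between attempts, not after the final failure.
        if attempt < attempts:
            time.sleep(delay)
    raise AssertionError(f"check still failing after {attempts} attempts")


# Usage: a flaky check that only passes on its third call.
calls = {"n": 0}


def flaky_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(retry_check(flaky_check, attempts=5, delay=0.0))  # prints 3
```

The attempt count and delay are arbitrary values chosen for the sketch.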
@k900:0upti.me (K900): Yes, they are [09:27:47]
@k900:0upti.me (K900):
webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"
[16:09:28]
@k900:0upti.me (K900): Again [16:09:29]
8 Nov 2024
@m1cr0man:m1cr0man.com (m1cr0man): https://github.com/NixOS/nixpkgs/pull/336412 sometimes a fresh set of eyes is all that's needed. ThinkChaos' change here should significantly reduce flakiness. [15:45:47]
@k900:0upti.me (K900): Appreciated [15:47:40]
@k900:0upti.me (K900): Is it good to merge? [15:47:44]
@k900:0upti.me (K900): OK I assume yes [15:50:52]
@m1cr0man:m1cr0man.com (m1cr0man): Yes - apologies I closed my client [15:59:30]
@k900:0upti.me (K900): Nope :( [20:41:11]
@k900:0upti.me (K900):
webserver # the following new units were started: acme-http.example.test.timer, multi-user.target, network-online.target, run-credentials-getty\x40tty1.service.mount, run-credentials-systemd\x2dtmpfiles\x2dresetup.service.mount, sysinit-reactivation.target, systemd-tmpfiles-resetup.service
webserver # [   14.902862] nixos[844]: finished switching to system configuration /nix/store/m1jmxwnpaibvj9szm7q3li1nia20q7d2-nixos-system-webserver-test
(finished: must succeed: /run/current-system/specialisation/http01lego/bin/switch-to-configuration test, in 2.37 seconds)
webserver: waiting for unit acme-finished-http.example.test.target
Test "Can request certificate with Lego's built in web server" failed with error: "unit "acme-finished-http.example.test.target" is inactive and there are no pending jobs"
cleanup
[20:41:14]
@k900:0upti.me (K900): Hmm wait [20:44:22]
@k900:0upti.me (K900): This feels wrong [20:44:23]
@k900:0upti.me (K900): The service isn't even started by the switch [20:44:30]
@k900:0upti.me (K900): Yeah OK this is definitely a race [20:46:54]
@k900:0upti.me (K900): https://github.com/NixOS/nixpkgs/pull/354629 [22:58:33]
@k900:0upti.me (K900): OK last thing I'm doing for the night [22:58:38]
@k900:0upti.me (K900): I tried a bunch of ways to make it fail and it didn't [23:02:46]
@k900:0upti.me (K900): Which is a good sign [23:02:48]
@k900:0upti.me (K900): The funny thing is [23:03:20]
@k900:0upti.me (K900): It actually fails if the test runs too fast [23:03:27]
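The observation that the test "fails if the test runs too fast" describes a check-before-start race. The sketch below models it in Python with a simulated clock. `FakeUnit`, `strict_wait` and `tolerant_wait` are hypothetical names, and the timings are made up. `strict_wait` gives up as soon as the unit is inactive with no pending jobs, which is the condition named in the error from the test log. If it checks before the start job has been queued, it fails even though the unit would have come up moments later. This illustrates the race only; it is not the change made in PR 354629.

```python
from dataclasses import dataclass


@dataclass
class FakeUnit:
    """Hypothetical model of a target whose start job is queued some time
    after switch-to-configuration has already returned."""
    queued_at: float  # simulated time at which a start job appears
    active_at: float  # simulated time at which the unit becomes active

    def state(self, now: float) -> tuple[str, bool]:
        """Return (active_state, has_pending_job) at simulated time `now`."""
        if now >= self.active_at:
            return "active", False
        if now >= self.queued_at:
            return "activating", True
        return "inactive", False


def strict_wait(unit: FakeUnit, start: float, step: float = 0.1, timeout: float = 10.0) -> bool:
    """Poll until active, but fail immediately on 'inactive, no pending jobs'."""
    now = start
    while now <= start + timeout:
        state, pending = unit.state(now)
        if state == "active":
            return True
        if state == "inactive" and not pending:
            raise RuntimeError("unit is inactive and there are no pending jobs")
        now += step
    raise TimeoutError("unit did not become active in time")


def tolerant_wait(unit: FakeUnit, start: float, step: float = 0.1, timeout: float = 10.0) -> bool:
    """Poll until active, treating 'inactive' as 'not started yet'."""
    now = start
    while now <= start + timeout:
        state, _ = unit.state(now)
        if state == "active":
            return True
        now += step
    raise TimeoutError("unit did not become active in time")


unit = FakeUnit(queued_at=0.5, active_at=2.0)

# A slow test reaches the check after the job is queued, so it passes.
print(strict_wait(unit, start=1.0))  # prints True

# A fast test checks at t=0, before the job exists, and fails.
try:
    strict_wait(unit, start=0.0)
except RuntimeError as err:
    print(f"fast test failed: {err}")

# Tolerating the not-yet-started state removes the race in this model.
print(tolerant_wait(unit, start=0.0))  # prints True
```

The trade-off in this model is that `tolerant_wait` cannot tell "not started yet" from "never going to start" until its timeout expires, whereas the strict check fails fast.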



Room Version: 6