!MthpOIxqJhTgrMNxDS:nixos.org

NixOS ACME / LetsEncrypt

93 Members
Another day, another cert renewal
43 Servers


Sender | Message | Time
13 Dec 2022
@m1cr0man:m1cr0man.com m1cr0man | Right. I'm gonna write a script to run it 1000 times and capture the failures :P I have no clue why it's failing. I already did a pass on it a while ago when it failed far more frequently (like maybe a year ago now), so there must be some other race condition going on | 10:20:20
@k900:0upti.me K900 | I think it's trying to hit the webserver before the webserver is actually up | 10:20:51
@m1cr0man:m1cr0man.com m1cr0man | yeah which it shouldn't be doing, I have appropriate port checks and retry logic but that seems to be insufficient | 10:21:57
@m1cr0man:m1cr0man.com m1cr0man | alright, I'll start with 100 loops. They are taking about 4 minutes each. Will review after work | 11:04:55
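
The "run it N times and capture the failures" idea above can be sketched roughly as follows. This is not the script used in the chat, which is not shown; the function name `run_repeatedly` and the example command are hypothetical placeholders.

```python
import subprocess
import sys


def run_repeatedly(cmd, attempts):
    """Run `cmd` `attempts` times and collect the output of every failing run.

    Returns a list of (attempt_number, combined_output) tuples, one per run
    that exited with a non-zero status.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            # Keep stdout and stderr together so the failure can be reviewed later.
            failures.append((attempt, proc.stdout + proc.stderr))
    return failures


# Hypothetical usage; "./run-acme-test.sh" stands in for whatever command
# actually re-runs the test:
#   failures = run_repeatedly(["./run-acme-test.sh"], 100)
#   print(f"{len(failures)} of 100 runs failed", file=sys.stderr)
```

At about 4 minutes per loop, 100 runs take several hours, so storing each failure's output is what makes the "review after work" step possible.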
@winterqt:nixos.dev Winter (she/her)
In reply to @m1cr0man:m1cr0man.com
yeah which it shouldn't be doing, I have appropriate port checks and retry logic but that seems to be insufficient
wonder what hellish issue you'll uncover next 🫡
14:48:42
@m1cr0man:m1cr0man.com m1cr0man | 103 attempts later and not a single one has failed 😕 | 22:38:37
@m1cr0man:m1cr0man.com m1cr0man

Okay, so being a bit smarter with this debugging: I am looking through the build logs on Hydra for successful builds and checking where any retry logic was triggered and how many times. If you search this build https://hydra.nixos.org/build/201652934/nixlog/1 for "s_client -brief", you will see an instance in the first few matches in which it has to be retried 3 consecutive times (and works on the third). The method performing this is here: https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/acme.nix#L407-L418 and is configured for 3 retries. You can also see the webserver giving the error "client closed connection while waiting for request" on 2 of the 3 attempts.

I think I need to increase the delays + number of retries for this method and any others that are waiting on web responses. Even from that log, I can't see any reason the server wasn't able to respond or why the client had sent a partial request. Hopefully this will be sufficient to stop the failures.

22:59:25
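
For readers following along, the "more retries + longer delays" idea from the message above might look roughly like this. It is a generic sketch, not the helper in acme.nix (which is only linked, not quoted, here); the names `retry`, `check`, `attempts`, `delay` and `sleep` are assumptions made for illustration.

```python
import time


def retry(check, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `check()` until it succeeds, waiting `delay` seconds between tries.

    `check` is any callable that raises on failure. The result of the first
    successful call is returned; if all `attempts` fail, a RuntimeError is
    raised with the last failure chained as its cause.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except Exception as err:
            last_error = err
            if attempt < attempts:
                # Give the server time to come up before the next try.
                sleep(delay)
    raise RuntimeError(f"failed after {attempts} attempts") from last_error
```

With `attempts=3`, a check that only succeeds on its third call passes with no margin, which matches the log described above; raising `attempts` and `delay` gives a slow-starting webserver more room.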
@m1cr0man:m1cr0man.com m1cr0man | Welp, here's a PR https://github.com/NixOS/nixpkgs/pull/205983 hopefully this does the trick. I have good evidence to support increased retries + delays solving the issue. | 23:44:37
14 Dec 2022
@m1cr0man:m1cr0man.com m1cr0man | Awh dammit I just realized that vscode auto formatted it 🤦🤦 will fix tomorrow | 00:22:06
@alesya-h:nixos.dev | Alesya Huzik joined the room. | 11:13:02
21 Dec 2022
@thatsnomoon_343:matrix.org | @thatsnomoon_343:matrix.org joined the room. | 05:01:47
@thatsnomoon_343:matrix.org | @thatsnomoon_343:matrix.org left the room. | 20:16:41
22 Dec 2022
@m1cr0man:m1cr0man.com m1cr0man | Thanks for the merge Raito! 🙂 | 03:25:12
@raitobezarius:matrix.org raitobezarius | with pleasure | 04:15:18

Room Version: 6