13 Dec 2022 |
m1cr0man | Right. I'm gonna write a script to run it 1000 times and capture the failures :P I have no clue why it's failing. I already did a pass on it a while ago when it failed far more frequently (like maybe a year ago now), so there must be some other race condition going on | 10:20:20 |
K900 | I think it's trying to hit the webserver before the webserver is actually up | 10:20:51 |
m1cr0man | yeah which it shouldn't be doing, I have appropriate port checks and retry logic but that seems to be insufficient | 10:21:57 |
m1cr0man | alright, I'll start with 100 loops. They are taking about 4 minutes each. Will review after work | 11:04:55 |
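A loop like the one described above could be sketched as follows. This is a minimal sketch, not the script m1cr0man actually ran: the helper name `run_repeatedly`, the log file naming, and the log directory are all hypothetical, and the command to run is left to the caller.

```python
import subprocess
from pathlib import Path


def run_repeatedly(cmd, attempts, log_dir):
    """Run `cmd` `attempts` times and keep the output of every failing run.

    Returns the list of attempt numbers (1-based) that exited non-zero.
    All names here are illustrative assumptions, not taken from the chat.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    failures = []
    for attempt in range(1, attempts + 1):
        # Merge stderr into stdout so one log file holds the whole run.
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if result.returncode != 0:
            failures.append(attempt)
            log_file = log_dir / f"attempt-{attempt:04d}.log"
            log_file.write_text(result.stdout)
    return failures
```

Called with the test's build command as a list of arguments and `attempts=100`, it returns the attempt numbers that failed and leaves one log file per failure to review later. Only failing runs are saved, which keeps the output small when each run takes minutes.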
Winter (she/her) | In reply to @m1cr0man:m1cr0man.com: "yeah which it shouldn't be doing, I have appropriate port checks and retry logic but that seems to be insufficient" wonder what hellish issue you'll uncover next 🫡 | 14:48:42 |
m1cr0man | 103 attempts later and not a single one has failed 😕 | 22:38:37 |
m1cr0man | Okay, so being a bit smarter with this debugging: I am looking through the build logs on Hydra for successful builds and checking where any retry logic was triggered and how many times. If you search this build https://hydra.nixos.org/build/201652934/nixlog/1 for "s_client -brief", you will see an instance in the first few matches in which it has to be retried 3 consecutive times (and works on the third). The method performing this is here: https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/acme.nix#L407-L418 and is configured for 3 retries. You can also see the webserver giving the error "client closed connection while waiting for request" on 2 of the 3 attempts.
I think I need to increase the delays + number of retries for this method and any others that are waiting on web responses. Even from that log, I can't see any reason the server wasn't able to respond or why the client had sent a partial request. Hopefully this will be sufficient to stop the failures.
| 22:59:25 |
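The idea of raising both the retry count and the delay between attempts can be illustrated roughly like this. It is a generic sketch, not the code in acme.nix or in the PR: the function name `retry`, its parameters, and the default values are assumptions chosen for illustration.

```python
import time


def retry(check, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call `check()` until it returns a truthy value or attempts run out.

    Waits `delay` seconds after a failed attempt and multiplies the delay
    by `backoff` each time. Returns the number of the attempt that
    succeeded. Names and defaults are illustrative, not from acme.nix.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            if check():
                return attempt
        except Exception as exc:
            # Treat an exception like a failed check and keep it for context.
            last_error = exc
        if attempt < attempts:
            sleep(delay)
            delay *= backoff
    raise RuntimeError(f"check failed after {attempts} attempts") from last_error
```

With a growing delay, a server that needs a few extra seconds to start answering gets more slack on each attempt, while a check that succeeds right away costs no waiting at all. Returning the attempt number also makes it easy to log how often the retry path was actually hit.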
m1cr0man | Welp, here's a PR https://github.com/NixOS/nixpkgs/pull/205983 hopefully this does the trick. I have good evidence to support increased retries + delays solving the issue. | 23:44:37 |
14 Dec 2022 |
m1cr0man | Awh dammit I just realized that vscode auto formatted it 🤦🤦 will fix tomorrow | 00:22:06 |
| Alesya Huzik joined the room. | 11:13:02 |
21 Dec 2022 |
| @thatsnomoon_343:matrix.org joined the room. | 05:01:47 |
| @thatsnomoon_343:matrix.org left the room. | 20:16:41 |
22 Dec 2022 |
m1cr0man | Thanks for the merge Raito! 🙂 | 03:25:12 |
raitobezarius | with pleasure | 04:15:18 |