NixOS ACME / LetsEncrypt | 106 Members | |
| Another day, another cert renewal | 47 Servers |
| Sender | Message | Time |
|---|---|---|
| 29 Nov 2024 | ||
doesn't explain why systemctl reload nginx gets stuck 😄 | 18:58:45 | |
| 18:59:54 | |
You could run the service's ExecReload manually to see whether it's hanging there or in systemd | 19:11:11 | |
| It only does 2 things: check the config, and send a SIGHUP | 19:11:58 | |
| systemctl reload nginx blocks, I think I established that earlier | 19:16:32 | |
| uhh, sorry | 19:16:37 | |
| I mean I established that they both work individually | 19:16:56 | |
| it is systemctl reload that is stuck for some reason | 19:17:03 | |
| https://gist.github.com/mweinelt/f099ec270ace7cb197954e23871471be | 19:21:08 | |
| 19:22:24 | ||
| 19:22:37 | ||
| Respectfully, I don't want to spend more time investigating this issue since it's in your personal config and not the NixOS modules. Your strace ends with ask-password related stuff so it's likely waiting to authenticate somehow. If you switch to reloadServices it uses --no-block. And better yet, if you switch to enableReload you'll use the battle-tested solution. | 19:54:56 | |
| 1 Dec 2024 | ||
| I have another "fun" set of upstreaming work completed. I estimate this one has half the chance of being merged that the previous change had, simply because of the structure of lego's cmd code + error handling. https://github.com/go-acme/lego/compare/master...m1cr0man:lego:renew-rc-2 https://github.com/m1cr0man/nixpkgs/commit/53846b07f5037e854993366beab3e0a618d1fd68 I have not opened PRs yet, will do that in a second | 01:52:09 | |
| With this work, I think the ACME module is in one of the best states it has ever been in. The remaining bash scripting in the module does only 2 things primarily: 1. Perform simple file operations like cp, chmod, chown. 2. Handle concurrency limits. The latter is being looked into by ThinkChaos too, see earlier discussions :) | 02:00:32 | |
| Lol, that ended quickly https://github.com/go-acme/lego/pull/2366 | 02:18:10 | |
| https://github.com/go-acme/lego/issues/2367 🤷 let's hope it doesn't take years | 02:37:50 | |
| 5 Dec 2024 | ||
| 01:53:01 | ||
| 16 Dec 2024 | ||
| So uh | 23:41:24 | |
| Do we have anything that can at least paper over the ordering issues | 23:41:42 | |
| Without making things even more complicated | 23:41:51 | |
| Because the tests are flaking a lot and it's getting on my nerves | 23:42:04 | |
| 17 Dec 2024 | ||
| Could someone please review PR #362271, the fix for the cert ownership error message causing an unrelated exception? It's a tiny diff :) Users are getting misleading errors due to this throwing ATM | 23:57:19 | |
| 19 Dec 2024 | ||
K900 I looked at the log for this failure, httpd only started after the ACME validation happened: "Starting Apache HTTPD" vs "Attempting to validate w/ HTTP". I think this is a switch-to-configuration-ng regression 😕 The perl script starts all services in a single systemctl call, so a single Systemd transaction. That means httpd's Before relationship with the certs is enforced. Whereas -ng uses the Systemd D-Bus API to start services one by one, meaning multiple transactions. So Before is not enforced. I guess we can try and disable -ng for the ACME tests, see how it goes for a week or so and then potentially raise an issue with -ng. | 01:31:18 | |
| BTW thanks for the review + merge on the PR from above! | 01:39:02 | |
In reply to @thinkchaos:matrix.org: Uhh | 06:55:59 | |
| Can you please report this in #NixOS systemd | 06:56:24 | |
| There is no api for starting multiple services in a single transaction. This has always been a lie | 10:46:30 | |
| I think systemctl start also is a for loop around starting single units through dbus afaicr | 10:46:51 | |
| Yeah I need to dig a bit more before I make too much noise, I'll look at systemctl's code, thanks for the hint | 13:38:17 | |
Either way I think we'll need to make the link between the certs and web server stronger to fix this: I'm thinking certs using HTTP validation can Require the relevant web server | 13:45:07 | |
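
The last suggestion above could be tried from a user's configuration roughly as follows. This is only a sketch, not how the ACME module does or will do it. It assumes a certificate named `example.com` (hypothetical), HTTP validation served by httpd, and a renewal unit called `acme-example.com.service`; unit naming may differ between NixOS releases, so check `systemctl list-units 'acme-*'` first.

```nix
{
  # Assumption: the cert is named "example.com" and its renewal unit is
  # "acme-example.com.service"; adjust to the unit your system actually has.
  systemd.services."acme-example.com" = {
    # Requires= makes systemd start httpd whenever this unit is started,
    # so httpd joins the same transaction even if only the cert unit was requested.
    requires = [ "httpd.service" ];
    # After= orders this unit's start after httpd's.
    after = [ "httpd.service" ];
  };
}
```

With both set, the ordering should no longer depend on whether the caller starts units in one batch or one by one, since starting the cert unit alone pulls in the web server.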