NixOS ACME / LetsEncrypt | 103 Members | |
| Another day, another cert renewal | 42 Servers |
| Sender | Message | Time |
|---|---|---|
| 29 Nov 2024 | ||
| Respectfully, I don't want to spend more time investigating this issue since it's in your personal config and not the NixOS modules. Your strace ends with ask-password related stuff, so it's likely waiting to authenticate somehow. If you switch to reloadServices it uses --no-block. And better yet, if you switch to enableReload you'll use the battle-tested solution. | 19:54:56 | |
| 1 Dec 2024 | ||
| I have another "fun" set of upstreaming work completed. I estimate this one has half the chance of being merged compared to the previous change, simply because of the structure of lego's cmd code + error handling. https://github.com/go-acme/lego/compare/master...m1cr0man:lego:renew-rc-2 https://github.com/m1cr0man/nixpkgs/commit/53846b07f5037e854993366beab3e0a618d1fd68 I have not opened PRs yet, will do that in a second | 01:52:09 | |
| With this work, I think the ACME module is in one of the best states it has ever been in. The remaining bash scripting in the module does only 2 things primarily: 1. Perform simple file operations like cp, chmod, chown. 2. Handle concurrency limits. The latter is being looked into by ThinkChaos too, see earlier discussions :) | 02:00:32 | |
| Lol, that ended quickly https://github.com/go-acme/lego/pull/2366 | 02:18:10 | |
| https://github.com/go-acme/lego/issues/2367 🤷 let's hope it doesn't take years | 02:37:50 | |
| 5 Dec 2024 | ||
| 16 Dec 2024 | ||
| So uh | 23:41:24 | |
| Do we have anything that can at least paper over the ordering issues | 23:41:42 | |
| Without making things even more complicated | 23:41:51 | |
| Because the tests are flaking a lot and it's getting on my nerves | 23:42:04 | |
| 17 Dec 2024 | ||
| Could someone please review PR #362271? It fixes the cert ownership error message causing an unrelated exception. It's a tiny diff :) Users are getting misleading errors due to this throwing ATM | 23:57:19 | |
| 19 Dec 2024 | ||
K900 I looked at the log for this failure; httpd only started after the ACME validation happened: Starting Apache HTTPD vs Attempting to validate w/ HTTP. I think this is a switch-to-configuration-ng regression 😕 The Perl script starts all services in a single systemctl call, so a single systemd transaction. That means httpd's Before relationship with the certs is enforced. Whereas -ng uses the systemd D-Bus API to start services one by one, meaning multiple transactions, so Before is not enforced. I guess we can try disabling -ng for the ACME tests, see how it goes for a week or so, and then potentially raise an issue with -ng. | 01:31:18 | |
| BTW thanks for the review + merge on the PR from above! | 01:39:02 | |
In reply to @thinkchaos:matrix.org Uhh | 06:55:59 | |
| Can you please report this in #NixOS systemd | 06:56:24 | |
| There is no API for starting multiple services in a single transaction. This has always been a lie | 10:46:30 | |
| I think systemctl start also is a for loop around starting single units through dbus afaicr | 10:46:51 | |
| Yeah I need to dig a bit more before I make too much noise, I'll look at systemctl's code, thanks for the hint | 13:38:17 | |
Either way I think we'll need to make the link between the certs and web server stronger to fix this: I'm thinking certs using HTTP validation can Require the relevant web server | 13:45:07 | |
| 21 Dec 2024 | ||
In reply to @arianvp:matrix.org Really? This completely blows my understanding of service relation chains | 22:43:00 | |
| Yeh pretty sure | 22:43:42 | |
| There is a mutable list of jobs and "dependencies" are some rules that cause some jobs to cancel others out | 22:44:36 | |
| The whole dependency model is kind of a lie | 22:44:45 | |
| https://blog.darknedgy.net/technology/2020/05/02/0/ is a nice read | 22:44:57 | |
| 22 Dec 2024 | ||
| How are we feeling about the acme-setup.service refactor now? https://github.com/NixOS/nixpkgs/pull/355087 I still want to get this merged, it really simplifies the systemd side of things a bit. | 12:31:30 | |
In reply to @thinkchaos:matrix.org I totally forgot that we had a discussion about this a while ago 😅 tl;dr we could add a target for http01 renewal specifically. The web servers can be configured to want + before on it, and the renewals can require + after. This gives us a generic mechanism of linking whatever web server is running on port 80 to the certs using HTTP01. | 12:36:53 | |
| We do have to be careful about circular dependencies, but that's expected. HTTP01 server startup is complicated regardless. | 12:37:36 | |
In reply to @thinkchaos:matrix.org * I totally forgot that we had a discussion about this a while ago 😅 tl;dr we could add a target for http01 renewal specifically. The web servers can be configured to requiredBy + before on it, and the renewals can require + after. This gives us a generic mechanism of linking whatever web server is running on port 80 to the certs using HTTP01. | 12:41:42 | |
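
A rough sketch of the http01 target idea from the last message, written as a NixOS configuration fragment. The target name acme-http01, the choice of nginx as the port 80 web server, and the cert name example.org (with its unit name) are all assumptions for illustration. This is not what the ACME module does today, only one way the requiredBy + before / requires + after wiring might look.

```nix
{ ... }:
{
  # Hypothetical target; this name is NOT defined by the NixOS ACME module.
  systemd.targets."acme-http01" = {
    description = "Web server ready for ACME HTTP-01 validation";
  };

  # Whatever web server answers on port 80 (nginx is only an example).
  # requiredBy makes the target pull the server in; before orders the
  # server ahead of the target.
  systemd.services.nginx = {
    requiredBy = [ "acme-http01.target" ];
    before = [ "acme-http01.target" ];
  };

  # Renewal unit for an assumed cert called example.org; the unit name
  # here is an assumption. It requires the target and starts after it.
  systemd.services."acme-example.org" = {
    requires = [ "acme-http01.target" ];
    after = [ "acme-http01.target" ];
  };
}
```

As mentioned in the discussion, a web server unit that also orders itself after the cert units would form a cycle with this wiring, so a real version would have to account for that.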