NixOS ACME / LetsEncrypt | 104 Members | |
| Another day, another cert renewal | 45 Servers |
| Sender | Message | Time |
|---|---|---|
| 10 Nov 2024 | ||
| which IMO went okay until the drift between what they're capable of and the model they're designed for, and what we actually need, became clear and we had to work around that | 02:32:06 | |
| all I can say is that I understand why Caddy gave up on LEGO, and they didn't even have the penalty of trying to express all the lifecycle logic and rate limiting in terms of a Unix service manager 😅 | 02:33:01 | |
| I don't think we should have a target that represents all certificate renewals and gate every use of certificates on all certificates, if that's what you mean | 02:33:31 | |
| that'll scale pretty badly when you have a ton of certs | 02:33:35 | |
| I understand what you're saying, yeah. Wrt the target thing - it's not so much that I want to put up a gate, but I want to provide a simpler method for resolving the dependency chain. In the deployments I have observed where many certs are in use, almost all of them are a dependency of the service(s) they are attached to. In practice, I don't think there would be a significant difference between dependencies per cert vs generalized targets. At the very least, a self-signed target would go a long way. | 02:38:56 | |
| https://github.com/NixOS/nixpkgs/pull/355087 first big refactoring PR | 22:32:19 | |
| If there's any maintainers about, I think this PR is good to merge also https://github.com/NixOS/nixpkgs/pull/348344 | 22:34:17 | |
In reply to @m1cr0man:m1cr0man.com Merged that | 22:44:00 | |
In reply to @m1cr0man:m1cr0man.com This scares me but in a good way | 22:44:06 | |
In reply to @k900:0upti.me What I will do next will surely terrify and amaze 🧛‍♂️ | 22:45:04 | |
In reply to @k900:0upti.me Thanks a mil | 22:45:19 | |
| 11 Nov 2024 | ||
| This open, 2020 ticket is peak ACME module: https://github.com/NixOS/nixpkgs/issues/106862 This is actually a variant of what K900 saw yesterday wrt webserver startup ordering. It duplicates/affects this recent ticket too. I also found the reason why we check cert expiry ourselves (I recall there being an issue with container startup also? But I can't find any reference to it. Please link if you know of it.). I see two ways to frame this issue more generally, each with very different solutions:
In this case, we need a reliable mechanism for ensuring that services which may affect cert renewability are started before renewal is attempted. One solution is to add an […]. Sadly this only half-solves the problem. Running != listening, and systemd only accounts for the former in most cases (non-notify services). Socket units were suggested before (I do understand them now 😉) but that is a monumental task for all the dependencies.
This is actually untrue - we do have systemd Restart directives configured on the units. The problem is that this causes start jobs to fail, and dependent services to not start or work. What we really want(ed) is a way to gracefully retry, where we don't fail the job. We could add some sort of retry logic to the script and do away with systemd's retry logic, but that again feels like reinventing the systemd retry logic in a crude way. I would love to reduce flakiness and remove scripts (the logic for checking renewal date) at the same time, but we're stuck with a limited toolset to solve this. My feeling right now is that we should at least implement solution 1 and get dependency ordering right for non-failure scenarios. Despite the caveats, I still think this would be a significant improvement. I wish I could say with confidence that this would solve test flakiness, but it probably won't. | 01:04:00 | |
| It's peak because we opened it ourselves and then ignored it for 5 years | 07:51:12 | |
| 😂😭 | 07:51:19 | |
| Yep! 😁 | 09:40:41 | |
To simplify the max concurrency, sem from GNU parallel seems like the right tool: https://man.archlinux.org/man/sem.1 The cert ExecStart would look like `sem --id nixos-acme --fg --max-procs ${cfg.maxConcurrentRenewals} 'lego ...'` | 18:47:12 | |
| Account creation is still messy and I think the best thing would be to write a small CLI that creates the account and writes the info where lego will look for it. So one acme-account-${escaped-email}.service per account, and each cert using that account requires that service. And we use a negative ConditionPathExists to ensure it only actually runs when needed (but not RemainAfterExit, otherwise clearing the state and starting a cert service won't rerun the service). | 18:58:06 | |
| Or look at completely replacing lego but that seems much harder | 18:59:47 | |
| * Or look at completely replacing lego but that seems much harder to do backwards-compatibly with existing state | 19:00:33 | |
| Maybe we should run a Kubernetes apiserver and use cert-manager | 19:01:26 | |
| Only half joking | 19:01:30 | |
| For the potential custom account creation tool https://github.com/mholt/acmez/blob/v2.0.3/examples/plumbing/main.go#L56-L88 | 19:02:18 | |
I'm old enough to remember when we replaced simp_le with Lego and destroyed everyone's data | 19:02:18 | |
| ah, so this is how I get ACMEZ in through the back door 😂 | 19:02:33 | |
| And I'll do it again!! 😈 | 19:02:38 | |
| one day I'll write the thing that needs to exist and then you can inflict it on everyone | 19:03:14 | |
| I took a quick look at the other ACME clients listed in https://letsencrypt.org/docs/client-options/ and I'm pretty sure I saw one that could migrate Lego data, but I can't find it again | 19:04:29 | |
| we were so innocent then | 19:04:41 | |
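
The `sem --max-procs` idea from the log boils down to a counting semaphore: each renewal waits for a free slot, runs, then releases it. The following is only a rough in-process sketch of that behaviour in Python, not how `sem` or the module works; `run_limited` and `fake_renewal` are hypothetical names, and the sleep stands in for a real `lego ...` call. Note that `sem` coordinates separate processes, while this sketch only limits threads within one process.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(jobs, max_procs):
    """Run callables concurrently, never more than max_procs at once.

    Returns (results, peak), where peak is the highest number of jobs
    observed running at the same time.
    """
    slots = threading.BoundedSemaphore(max_procs)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def guarded(job):
        with slots:  # blocks until a slot is free
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                return job()
            finally:
                with lock:
                    state["running"] -= 1

    # One worker per job, so the semaphore (not the pool size) is the limit.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        results = list(pool.map(guarded, jobs))
    return results, state["peak"]


def fake_renewal(name):
    # Hypothetical stand-in for a `lego ...` invocation.
    def job():
        time.sleep(0.01)
        return f"renewed {name}"
    return job


if __name__ == "__main__":
    jobs = [fake_renewal(f"cert{i}") for i in range(8)]
    results, peak = run_limited(jobs, max_procs=3)
    print(results, "peak concurrency:", peak)
```

With eight jobs and `max_procs=3`, all eight results come back in submission order while the observed peak never exceeds three.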