!MthpOIxqJhTgrMNxDS:nixos.org

NixOS ACME / LetsEncrypt

107 Members
Another day, another cert renewal
45 Servers

Sender / Message / Time
10 Nov 2024
@k900:0upti.me K900
In reply to @m1cr0man:m1cr0man.com
If there's any maintainers about, I think this PR is good to merge also https://github.com/NixOS/nixpkgs/pull/348344
Merged that
22:44:00
@k900:0upti.me K900
In reply to @m1cr0man:m1cr0man.com
https://github.com/NixOS/nixpkgs/pull/355087 first big refactoring PR
This scares me but in a good way
22:44:06
@m1cr0man:m1cr0man.com m1cr0man
In reply to @k900:0upti.me
This scares me but in a good way
What I will do next will surely terrify and amaze 🧛‍♂️
22:45:04
@m1cr0man:m1cr0man.com m1cr0man
In reply to @k900:0upti.me
Merged that
Thanks a mil
22:45:19
11 Nov 2024
@m1cr0man:m1cr0man.com m1cr0man

This open, 2020 ticket is peak ACME module: https://github.com/NixOS/nixpkgs/issues/106862

This is actually a variant of what K900 saw yesterday wrt webserver startup ordering. It duplicates/affects this recent ticket too. I also found the reason why we check cert expiry ourselves (I recall there being an issue with container startup also? But I can't find any reference to it. Please link if you know of it.).

I see two ways to frame this issue more generally, each with very different solutions:

  1. "ACME renewal does not reliably wait on external dependencies"

In this case, we need a reliable mechanism for ensuring that services which may affect cert renewability are started before renewal is attempted. One solution is to add an acme-renewal-dependencies.target and add service modules to it as required on a best-effort basis. I'm sure issues will be opened if we miss something, as they have been historically.

Sadly this only half-solves the problem. Running != listening, and systemd only accounts for the former in most cases (non-notify services). Socket units were suggested before (I do understand them now 😉) but that is a monumental task for all the dependencies.
We could do some naive tests and delays, like curling the configured ACME server and checking for a listener on port 80, but that just feels wrong.

  2. "ACME renewal does not gracefully handle failures"

This is actually untrue - we do have systemd Restart directives configured on the units. The problem is that this causes start jobs to fail, and dependent services to not start or work. What we really want(ed) is a way to gracefully retry, where we don't fail the job. We could add some sort of retry logic to the script and do away with systemd's retry logic, but again that feels like reinventing the systemd retry logic in a crude way.

I would love to reduce flakiness and remove scripts (the logic for checking renewal date) at the same time, but we're stuck with a limited toolset to solve this.

My feeling right now is that we should at least implement solution 1 and get dependency ordering right for non-failure scenarios. Despite the caveats, I still think this would be a significant improvement. I wish I could say with confidence that this would solve test flakiness, but it probably won't.

01:04:00
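
Solution 1 from the message above could be sketched roughly like this as a NixOS module fragment. This is only an illustration under assumptions: the acme-renewal-dependencies.target does not exist in nixpkgs, unbound stands in for any service a renewal might depend on, and acme-example.com is a made-up cert unit name.

```nix
{ ... }:
{
  # Hypothetical target proposed in the message; not part of nixpkgs.
  systemd.targets.acme-renewal-dependencies = {
    description = "Services that should be started before ACME renewal";
  };

  # A service module opts in on a best-effort basis
  # (unbound is only an example dependency).
  systemd.services.unbound = {
    wantedBy = [ "acme-renewal-dependencies.target" ];
    before = [ "acme-renewal-dependencies.target" ];
  };

  # The cert's renewal unit (name assumed) pulls in the target and
  # orders itself after it.
  systemd.services."acme-example.com" = {
    wants = [ "acme-renewal-dependencies.target" ];
    after = [ "acme-renewal-dependencies.target" ];
  };
}
```

As the message notes, this only orders unit start-up. For non-notify services it does not guarantee that the dependency is actually listening by the time renewal begins.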
@arianvp:matrix.org Arian
It's peak because we opened it ourselves and then ignored it for 5 years
07:51:12
@arianvp:matrix.org Arian
😂😭
07:51:19
@m1cr0man:m1cr0man.com m1cr0man
Yep! 😁
09:40:41
@thinkchaos:matrix.org ThinkChaos joined the room.
17:17:34
@thinkchaos:matrix.org ThinkChaos
For simplifying the max concurrency, sem from GNU parallel seems like the right tool: https://man.archlinux.org/man/sem.1
The cert ExecStart would look like sem --id nixos-acme --fg --max-procs ${cfg.maxConcurrentRenewals} 'lego ...'
18:47:12
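
A minimal sketch of how that ExecStart might be wired into a unit, assuming pkgs.parallel provides bin/sem and that cfg refers to config.security.acme. The unit name acme-sem-example and the renew script are hypothetical placeholders, and the lego arguments stay elided as in the message.

```nix
{ config, pkgs, ... }:
let
  cfg = config.security.acme;

  # Placeholder for the real renewal command. The lego arguments are
  # elided in the message above and are deliberately not filled in here.
  renew = pkgs.writeShellScript "acme-renew-placeholder" ''
    echo "lego ... would run here"
  '';
in
{
  # Hypothetical unit name, for illustration only.
  systemd.services."acme-sem-example" = {
    serviceConfig = {
      Type = "oneshot";
      # --id names the semaphore shared by all cert units, --fg keeps the
      # job in the foreground so systemd sees its exit status, and
      # --max-procs caps how many holders may run at once.
      ExecStart = "${pkgs.parallel}/bin/sem --id nixos-acme --fg --max-procs ${toString cfg.maxConcurrentRenewals} ${renew}";
    };
  };
}
```

The toString call is needed because maxConcurrentRenewals is an integer option, and Nix does not interpolate integers into strings directly.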
@thinkchaos:matrix.org ThinkChaos
Account creation is still messy and I think the best thing would be to write a small CLI that creates the account and writes the info where lego will look for it.
So one acme-account-${escaped-email}.service per account, and each cert using that account requires that service. And we use a negative ConditionPathExists to ensure it only actually runs when needed (but without RemainAfterExit; otherwise, clearing the state and starting a cert service won't rerun the service).
18:58:06
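
The account-service idea could be sketched as follows. Everything named here is an assumption for illustration: the account file path, the unit names, and the createAccount script standing in for the small CLI, which does not exist yet.

```nix
{ pkgs, ... }:
let
  # Hypothetical location of the account data that lego would read.
  accountFile = "/var/lib/acme/accounts/example/account.json";

  # Stand-in for the proposed account-creation CLI.
  createAccount = pkgs.writeShellScript "create-acme-account-placeholder" ''
    echo "would register an ACME account and write $1"
  '';
in
{
  # One unit per account (name assumed).
  systemd.services."acme-account-example" = {
    # Negative condition: the unit is skipped when the file already exists.
    unitConfig.ConditionPathExists = "!${accountFile}";
    serviceConfig = {
      Type = "oneshot";
      # Deliberately no RemainAfterExit, so the unit returns to inactive
      # and its condition is re-evaluated on the next start request.
      ExecStart = "${createAccount} ${accountFile}";
    };
  };

  # Each cert using this account requires the account unit.
  systemd.services."acme-example.com" = {
    requires = [ "acme-account-example.service" ];
    after = [ "acme-account-example.service" ];
  };
}
```

A skipped condition does not count as a failure in systemd, so the cert unit still starts when the account already exists.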
@thinkchaos:matrix.org ThinkChaos
Or look at completely replacing lego but that seems much harder
18:59:47
@thinkchaos:matrix.org ThinkChaos
* Or look at completely replacing lego but that seems much harder to do backwards-compatibly with existing state
19:00:33
@arianvp:matrix.org Arian
Maybe we should run a Kubernetes apiserver and use certmanager
19:01:26
@arianvp:matrix.org Arian
Only half joking
19:01:30
@thinkchaos:matrix.org ThinkChaos
For the potential custom account creation tool https://github.com/mholt/acmez/blob/v2.0.3/examples/plumbing/main.go#L56-L88
19:02:18
@emilazy:matrix.org emily
I'm old enough to remember when we replaced simp_le with Lego and destroyed everyone's data
19:02:18
@emilazy:matrix.org emily
ah, so this is how I get ACMEZ in through the back door 😂
19:02:33
@arianvp:matrix.org Arian
And I'll do it again!! 😈
19:02:38
@emilazy:matrix.org emily
one day I'll write the thing that needs to exist and then you can inflict it on everyone
19:03:14
@thinkchaos:matrix.org ThinkChaos
I took a quick look at other ACME clients listed in https://letsencrypt.org/docs/client-options/ and I'm pretty sure I saw one that could migrate Lego data, but I can't find it again
19:04:29
@emilazy:matrix.org emily
image.png
19:04:40
@emilazy:matrix.org emily
we were so innocent then
19:04:41
@thinkchaos:matrix.org ThinkChaos
Anyways, the cert dir structure was different I think, so it would still break users
19:04:52
@emilazy:matrix.org emily
In reply to @thinkchaos:matrix.org
I took a quick look at other ACME clients listed in https://letsencrypt.org/docs/client-options/ and I'm pretty sure I saw one that could migrate Lego data, but I can't find it again
nothing really exists that meets requirements and is superior to lego IMO
19:04:55
@emilazy:matrix.org emily
Caddy builds on CertMagic/ACMEZ and is a better implementation with a much better model (a proper daemon), but it doesn't quite have the shape of the thing we need
19:05:18
@thinkchaos:matrix.org ThinkChaos
Yeah that was my conclusion from a quick look, hence the custom tool proposal :)
19:05:20
@emilazy:matrix.org emily
https://github.com/https-dev/docs/blob/master/acme-ops.md essential reading
19:05:47
@emilazy:matrix.org emily
(primarily from the Caddy/CertMagic/ACMEZ author)
19:06:00
@arianvp:matrix.org Arian
My website still runs 21.05 lol
19:06:06

Room Version: 6