NixOS ACME / LetsEncrypt | 103 Members | |
| Another day, another cert renewal | 44 Servers |
| Sender | Message | Time |
|---|---|---|
| 16 Nov 2024 | ||
| What is your overall goal with this implementation? | 11:04:03 | |
In reply to @m1cr0man:m1cr0man.com Primarily to reduce activation time when there are a lot of certs. | 14:05:11 | |
| For some reason I find the long activation a bit nerve-wracking. 😬 | 14:06:18 | |
| The other pro mentioned, the clustering, is more PoC than anything else. You could do DNS RR that way, but not something I'd want to deploy. It might be interesting to build load balancers with failover, but I don't yet have an easy solution for that. (We currently rely on AWS ALB for that.) | 14:09:29 | |
| I definitely use per cert targets, and think it's indeed vital that if one cert fails it doesn't prevent the whole system from functioning | 16:26:41 | |
| Is that so you could do http-01 with multiple external addresses + servers? | 16:50:46 | |
| I use DNS validation but have multiple independent services using ACME, mostly an HTTP server and a (secure) DNS server | 16:52:04 | |
| BTW I'm trying a different approach to simplifying the locks (had a "why didn't I think of this earlier" moment). Basically, remove all the round robin stuff from Nix and just use a loop in the shell script to try each lock with a timeout until one works. That makes the locking pretty straightforward, and activation should be quicker since we'll parallelize more by using whatever lock is available at a given time. | 17:30:39 | |
| Here's that code: https://github.com/NixOS/nixpkgs/commit/ec145d8ccdd64ea6faef4881163e3811a5bf07f3 | 18:00:48 | |
| m1cr0man do you prefer I wait for your PR to be merged before opening one for this to avoid conflicts? | 18:02:56 | |
| I would yeah, if that's alright? I'd also like to give that a review when I get a moment | 18:06:34 | |
| Ok no worries | 18:07:40 | |
In reply to @thinkchaos:matrix.org Definitely. I used the same approach: generating self-signed certs first, then notifying systemd to continue dependent units. Do you use the targets to sequence startup of other services, or something else? | 18:26:49 | |
I don't use the self signed certs and my services use requires = [ "acme-finished-${cert}.target" ]; | 18:29:39 | |
In reply to @m1cr0man:m1cr0man.com Exactly, certmagic coordinates using some shared storage. So you'd get some rudimentary balancing via DNS RR, but no redundancy. For that you still need some IP failover setup. | 18:32:53 | |
In reply to @thinkchaos:matrix.org I wonder if I can hack around that with certmagic, haha. I kinda don't want to run the daemon as root, but maybe a separate service can run as root, act on PathModified to check for valid certs, then fire the targets. 🙃 | 18:45:05 | |
The daemon should be ok running as acme user and group combined with something like SupplementaryGroups = lib.unique (map (c: c.group) cfg.certs) | 19:13:33 | |
Actually plain acme user should do, certs.*.user doesn't do anything anymore so the acme user can read/write all certs | 19:21:53 | |
In reply to @stephank:stephank.nl maybe you can hook something up with user systemd | 19:39:29 | |
In reply to @stephank:stephank.nl Cool ok. This is one of those things that I don't think we will ever support in the in-tree ACME module, at least not whilst we use Lego. The main thing going around in my head is that I don't want either project to give the impression that the other is broken in some way, but if there are additional use cases and features in nixos-certmagic that security.acme does not currently offer, then users who stumble upon both options can understand when to choose one implementation over the other. I find it hard to justify ever changing from Lego right now. Most people treat ACME as set-and-forget, literally. If we break people's configs, we need to have a really good reason to do so, and right now I don't think that removing a few lines of bash or improving activation time for more-than-average cert configs is worth it. That's why I'm apprehensive about all this talk of using daemons. Out of tree, sure: I can see the appeal both as an upgrade for those who can handle the reconfiguration of their system and as a way to cover extra use cases. | 20:32:21 | |
| ThinkChaos: thanks for the second round of reviews on my PR. I am contesting both of the latest comments though 😅 I was wondering if you wanted to have further discussion? | 20:34:26 | |
| The path thing is just a nit, fine by me to leave as is (I'll say that in the review too) | 20:35:56 | |
That's not how I view it, for instance secrets go in /run too. To me it's for anything ephemeral. AFAIU tmpfiles is generally used for stuff that's required as long as the system is active. Which fits well here. | 20:39:23 | |
* And small side effect is we don't use the lockdir var in the service so it makes the dependency more hidden | 20:40:23 | |
| The acme-setup.service is a requirement of all the renewal services (and is oneshot+RemainAfterExit), but systemd-tmpfiles is not. We actually had a test failure on Hydra a couple of days ago because tmpfiles had not run when lockdir was accessed. Let me see if I can find you the logs. | 20:41:10 | |
I agree with this, however /run/acme is directly related to service activation + logic implemented in systemd services. Having its lifecycle managed as a RuntimeDirectory definitely makes things easier. I will definitely add a comment to say where it's created, that's a good call that the relation is not obvious | 20:42:16 | |
In reply to @k900:0upti.me This was the lockfiles error we saw last week. | 20:47:35 | |
Ok then RuntimeDir is ok with me. I thought tmpfiles was something the activation scripts ensured ran earlier based on how it's generally used, but never confirmed that assumption. That also means lots of modules are broken 😕 | 20:47:47 | |
| I'll reply and approve 🙂 | 20:47:58 | |
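
For readers following the locking discussion at 17:30:39, the "loop over the locks with a timeout until one works" idea might look roughly like the sketch below. It is not the code from the linked commit: the function name `acquire_any_lock`, the `N.lock` file naming, the use of file descriptor 9 and the `LOCK_PATH` variable are all assumptions made for illustration. It relies on `flock -w` from util-linux.

```shell
# Illustrative sketch only: the function name, lock file naming and fd number
# are assumptions, not taken from the nixpkgs commit.
# Usage: acquire_any_lock <lockdir> <number-of-locks> <timeout-seconds>
# On success the lock is held on fd 9 and its path is stored in LOCK_PATH.
acquire_any_lock() {
  local lockdir=$1 count=$2 timeout=$3 i
  mkdir -p "$lockdir"
  # Keep cycling through all locks until one of them is acquired.
  while true; do
    i=1
    while [ "$i" -le "$count" ]; do
      # Open (or create) the candidate lock file on file descriptor 9.
      exec 9> "$lockdir/$i.lock"
      # Wait at most $timeout seconds for an exclusive lock on it.
      if flock -w "$timeout" 9; then
        LOCK_PATH="$lockdir/$i.lock"
        return 0
      fi
      # Timed out: close the descriptor and move on to the next lock.
      exec 9>&-
      i=$((i + 1))
    done
  done
}
```

A caller would run its renewal work after `acquire_any_lock` returns and release the lock by closing the descriptor with `exec 9>&-`. Because each attempt only waits for the timeout before trying the next file, a job ends up on whichever lock frees up first rather than queuing behind a fixed one.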