NixOS ACME / LetsEncrypt - Public Room Timeline

	NixOS ACME / LetsEncrypt	103 Members
	Another day, another cert renewal	44 Servers


Sender	Message	Time
16 Nov 2024
m1cr0man	What is your overall goal with this implementation?	11:04:03
Stéphan	In reply to @m1cr0man:m1cr0man.com "What is your overall goal with this implementation?" Primarily to reduce activation time with a lot of certs.	14:05:11
Stéphan	For some reason I find the long activation a bit nerve-racking. 😬	14:06:18
Stéphan	The other pro mentioned, the clustering, is more PoC than anything else. You could do DNS RR that way, but it's not something I'd want to deploy. It might be interesting to build load balancers with failover, but I don't yet have an easy solution for that. (We currently rely on AWS ALB for that.)	14:09:29
ThinkChaos	I definitely use per cert targets, and think it's indeed vital that if one cert fails it doesn't prevent the whole system from functioning	16:26:41
m1cr0man	Is that so you could do http-01 with multiple external addresses + servers?	16:50:46
ThinkChaos	I use DNS validation but have multiple independent services using ACME, mostly an HTTP server and a (secure) DNS server	16:52:04
ThinkChaos	BTW I'm trying a different approach to simplifying the locks (had a "why didn't I think of this earlier" moment). Basically, remove all the round-robin stuff from Nix and just use a loop in the shell script to try each lock with a timeout until one works. That makes the locking pretty straightforward, and activation should be quicker since we'll parallelize more by using whatever lock is available at a given time.	17:30:39
ThinkChaos	Here's that code: https://github.com/NixOS/nixpkgs/commit/ec145d8ccdd64ea6faef4881163e3811a5bf07f3	18:00:48
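The linked commit is not reproduced here. As a rough illustration of the idea only ("try each lock with a timeout until one works"), a NixOS service could cycle through lock files with util-linux `flock --wait`. The service name `example-renew`, `lockdir`, and `numLocks` are hypothetical, and the sketch assumes the lock directory already exists and is writable by the `acme` user.

```nix
{ pkgs, ... }:
let
  # Hypothetical values for illustration; not taken from the linked commit.
  lockdir = "/run/acme";
  numLocks = 2;
in
{
  systemd.services.example-renew = {
    path = [ pkgs.util-linux ]; # provides flock
    serviceConfig = {
      Type = "oneshot";
      User = "acme";
      Group = "acme";
    };
    script = ''
      # Cycle through the lock files, waiting up to 1s on each, until one is acquired.
      acquired=
      while [ -z "$acquired" ]; do
        for ((i = 0; i < ${toString numLocks}; i++)); do
          exec {lockfd}>"${lockdir}/$i.lock"
          if flock --wait 1 "$lockfd"; then
            acquired=$i
            break
          fi
          exec {lockfd}>&- # close this fd before trying the next file
        done
      done
      echo "acquired lock $acquired"
      # The certificate work would run here. The lock is released
      # automatically when the script exits and the fd is closed.
    '';
  };
}
```

Compared with assigning each cert a fixed lock in Nix, whichever lock frees up first gets used, which is where the hoped-for parallelism comes from.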
ThinkChaos	m1cr0man do you prefer I wait for your PR to be merged before opening one for this to avoid conflicts?	18:02:56
m1cr0man	I would yeah, if that's alright? I'd also like to give that a review when I get a moment	18:06:34
ThinkChaos	Ok no worries	18:07:40
Stéphan	In reply to @thinkchaos:matrix.org "I definitely use per cert targets, and think it's indeed vital that if one cert fails it doesn't prevent the whole system from functioning" Definitely. I used the same approach, generating self-signed certs first, then notifying systemd to continue dependent units. Do you use the targets to sequence startup of other services, or something else?	18:26:49
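A simple way to get "placeholder cert first, dependents start immediately" with plain systemd is a oneshot unit with `RemainAfterExit`. This swaps in a oneshot for the `sd_notify`-style signalling Stéphan describes. All names and paths (`example-selfsigned`, `my-tls-daemon`, `/var/lib/example-certs`, `example.org`) are hypothetical.

```nix
{ pkgs, ... }:
let
  # Hypothetical directory for the placeholder certificate.
  dir = "/var/lib/example-certs";
in
{
  systemd.services.example-selfsigned = {
    path = [ pkgs.openssl ];
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true; # unit stays "active" after the script exits
    };
    script = ''
      umask 077
      mkdir -p ${dir}
      # Only create a placeholder if no certificate exists yet.
      if [ ! -e ${dir}/cert.pem ]; then
        openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
          -subj "/CN=example.org" \
          -keyout ${dir}/key.pem -out ${dir}/cert.pem
      fi
    '';
  };

  # Dependent units may start as soon as the placeholder exists.
  systemd.services.my-tls-daemon = {
    requires = [ "example-selfsigned.service" ];
    after = [ "example-selfsigned.service" ];
  };
}
```

The real certificate would later overwrite these files, and the dependent daemon would need a reload to pick it up.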
ThinkChaos	I don't use the self signed certs and my services use `requires = [ "acme-finished-${cert}.target" ];`	18:29:39
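A minimal sketch of a service consuming a per-cert target. The service name and `cert` value are hypothetical. Note that `requires` alone does not order units in systemd, so it is normally paired with `after` on the same target.

```nix
{ ... }:
let
  # Hypothetical; must match a key in security.acme.certs.
  cert = "example.org";
in
{
  systemd.services.my-tls-daemon = {
    # Fail/stop together with the cert target...
    requires = [ "acme-finished-${cert}.target" ];
    # ...and only start once that target has been reached.
    after = [ "acme-finished-${cert}.target" ];
    wantedBy = [ "multi-user.target" ];
    # ExecStart etc. omitted for brevity.
  };
}
```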
Stéphan	In reply to @m1cr0man:m1cr0man.com "Is that so you could do http-01 with multiple external addresses + servers?" Exactly, certmagic coordinates using some shared storage. So you'd get some rudimentary balancing via DNS RR, but no redundancy. For that you still need some IP failover setup.	18:32:53
Stéphan	In reply to @thinkchaos:matrix.org "I don't use the self signed certs and my services use `requires = [ "acme-finished-${cert}.target" ];`" I wonder if I can hack around that with certmagic, haha. I kinda don't want to run the daemon as root, but maybe a separate service can run as root, act on PathModified to check for valid certs, then fire the targets. 🙃	18:45:05
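One possible shape for that idea, as a sketch only: a path unit watches a certificate file and triggers a root oneshot that starts a per-cert target once `openssl x509 -checkend` accepts the cert. The file path, unit names, and the one-hour threshold are all hypothetical, since where certmagic writes its certificates depends on how the daemon is configured.

```nix
{ pkgs, ... }:
let
  # Hypothetical: wherever the certmagic daemon writes this certificate.
  certFile = "/var/lib/certmagic/example.org.crt";
in
{
  # Target that dependent services can require, like acme-finished-*.target.
  systemd.targets."certmagic-finished-example.org" = { };

  # Path unit: activates the same-named .service whenever certFile is modified.
  systemd.paths.certmagic-check-example = {
    wantedBy = [ "multi-user.target" ];
    pathConfig.PathModified = certFile;
  };

  # No User= set, so this runs as root; the daemon itself can stay unprivileged.
  systemd.services.certmagic-check-example = {
    path = [ pkgs.openssl ];
    serviceConfig.Type = "oneshot";
    script = ''
      # Fire the target only if the cert is valid for at least another hour.
      if openssl x509 -checkend 3600 -noout -in ${certFile}; then
        systemctl start certmagic-finished-example.org.target
      fi
    '';
  };
}
```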
ThinkChaos	The daemon should be ok running as acme user and group combined with something like `SupplementaryGroups = lib.unique (map (c: c.group) cfg.certs)`	19:13:33
ThinkChaos	Actually, the plain acme user should do: `certs.*.user` doesn't do anything anymore, so the acme user can read/write all certs.	19:21:53
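If a daemon did need the per-cert groups, a sketch of the `SupplementaryGroups` idea follows. It uses `config.security.acme` as a stand-in for whatever module the daemon belongs to, and `example-cert-daemon` is hypothetical. Because `security.acme.certs` is an attribute set, its values have to be extracted with `lib.attrValues` before `map` can be applied.

```nix
{ config, lib, ... }:
let
  # Stand-in: security.acme's cert set; a certmagic module would have its own.
  cfg = config.security.acme;
in
{
  systemd.services.example-cert-daemon.serviceConfig = {
    User = "acme";
    Group = "acme";
    # certs is an attrset of submodules, so take its values before mapping.
    SupplementaryGroups = lib.unique (map (c: c.group) (lib.attrValues cfg.certs));
  };
}
```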
emily	In reply to @stephank:stephank.nl "I wonder if I can hack around that with certmagic, haha. I kinda don't want to run the daemon as root, but maybe a separate service can run as root, act on PathModified to check for valid certs, then fire the targets. 🙃" Maybe you can hook something up with user systemd.	19:39:29
m1cr0man	In reply to @stephank:stephank.nl "Exactly, certmagic coordinates using some shared storage. So you'd get some rudimentary balancing via DNS RR, but no redundancy. For that you still need some IP failover setup." Cool, ok. This is one of those things that I don't think we will ever support in the in-tree ACME module, at least not whilst we use Lego. The main thing going around in my head is that I don't want either project to give the impression that the other is broken in some way; but if there are additional use cases and features in nixos-certmagic that security.acme does not currently offer, then users who stumble upon both options can understand when to choose one implementation over the other. I find it hard to justify ever changing from Lego right now. Most people treat ACME as set-and-forget, literally. If we break people's configs, we need to have a really good reason to do so, and right now I don't think that removing a few lines of bash or improving activation time for larger-than-average cert configs is worth it. That's why I'm apprehensive about all this talk of using daemons. Out of tree, sure: I can see the appeal, both as an upgrade for those who can handle reconfiguring their system and to cover extra use cases.	20:32:21
m1cr0man	ThinkChaos: thanks for the second round of reviews on my PR. I am contesting both of the latest comments though 😅 I was wondering if you wanted to have further discussion?	20:34:26
ThinkChaos	The path thing is just a nit, fine by me to leave as is (I'll say that in the review too)	20:35:56
ThinkChaos	"Generally, it is expected that /run/ directories map to RuntimeDirectory entries in systemd services." That's not how I view it, for instance secrets go in /run too. To me it's for anything ephemeral. AFAIU tmpfiles is generally used for stuff that's required as long as the system is active, which fits well here. Relying on a service starting up to create a dir that other things then use seems more convoluted to me.	20:39:23
ThinkChaos	* And small side effect is we don't use the `lockdir` var in the service so it makes the dependency more hidden	20:40:23
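For comparison, the tmpfiles approach ThinkChaos favours could interpolate the same `lockdir` value the scripts use, which keeps that dependency visible in Nix. This is a sketch: the path and ownership are hypothetical, and, per the discussion, nothing here orders the renewal services after tmpfiles has run.

```nix
{ ... }:
let
  # Hypothetical; the same value the lock-taking scripts would use.
  lockdir = "/run/acme";
in
{
  # tmpfiles.d format: Type Path Mode User Group Age Argument
  systemd.tmpfiles.rules = [
    "d ${lockdir} 0755 acme acme - -"
  ];
}
```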
m1cr0man	The acme-setup.service is a requirement of all the renewal services (and is oneshot+RemainAfterExit), but systemd-tmpfiles is not. We actually had a test failure on Hydra a couple of days ago because tmpfiles had not run when lockdir was accessed. Let me see if I can find you the logs.	20:41:10
m1cr0man	"That's not how I view it, for instance secrets go in /run too. To me it's for anything ephemeral." I agree with this; however, /run/acme is directly tied to service activation and logic implemented in systemd services. Having its lifecycle managed as a RuntimeDirectory definitely makes things easier. I will definitely add a comment saying where it's created; good call, the relationship is not obvious.	20:42:16
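A sketch of the pattern m1cr0man describes: a oneshot setup unit with `RemainAfterExit` owns the directory via `RuntimeDirectory=`, and renewal units require and order after it. The names `example-setup`, `example-locks`, and `example-renew-example.org` are hypothetical stand-ins for `acme-setup` and `/run/acme`, and the sketch assumes an `acme` user and group already exist.

```nix
{ ... }:
{
  systemd.services.example-setup = {
    serviceConfig = {
      Type = "oneshot";
      # The unit stays "active" after exiting, so systemd does not
      # remove the runtime directory until the unit is stopped.
      RemainAfterExit = true;
      User = "acme";
      Group = "acme";
      RuntimeDirectory = "example-locks"; # created as /run/example-locks, owned by acme
      RuntimeDirectoryMode = "0755";
    };
    script = "true"; # nothing to do beyond creating the directory
  };

  # Each renewal unit depends on, and is ordered after, the setup unit.
  systemd.services."example-renew-example.org" = {
    requires = [ "example-setup.service" ];
    after = [ "example-setup.service" ];
  };
}
```

The ordering guarantee comes from `requires` plus `after`, which is the property the tmpfiles variant lacked in the Hydra failure mentioned above.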
m1cr0man	In reply to @k900:0upti.me `webserver # [ 426.884702] (es-start)[2816]: acme-lockfiles.service: Changing to the requested working directory failed: Permission denied webserver # [ 426.934208] (es-start)[2816]: acme-lockfiles.service: Failed at step CHDIR spawning /nix/store/n24xs3nmndyyivq3q5w52f7aqlb06hqh-unit-script-acme-lockfiles-start/bin/acme-lockfiles-start: Permission denied` This was the lockfiles error we saw last week.	20:47:35
ThinkChaos	Ok, then `RuntimeDirectory` is ok with me. I thought tmpfiles was something the activation scripts ensured had run earlier, based on how it's generally used, but I never confirmed that assumption. That also means lots of modules are broken 😕	20:47:47
ThinkChaos	I'll reply and approve 🙂	20:47:58


Room Version: 6