| 29 Nov 2024 |
hexa | ● acme-spam.lossy.network.service - Renew ACME certificate for spam.lossy.network
Loaded: loaded (/etc/systemd/system/acme-spam.lossy.network.service; enabled; preset: ignored)
Active: activating (start-post) since Mon 2024-11-25 12:40:50 UTC; 3 days ago
Invocation: de04e9b6744344cf86391a79578da590
TriggeredBy: ● acme-spam.lossy.network.timer
Process: 180885 ExecStart=/nix/store/h9h8x07dqvzp9652p05jq8zs2205q66v-unit-script-acme-spam.lossy.network-start/bin/acme-spam.lossy.network-start (code=exited, status=0/SUCCESS)
Main PID: 180885 (code=exited, status=0/SUCCESS); Control PID: 180907 (vw98jrrshrz371a)
IP: 23.3K in, 12.4K out
IO: 52.1M read, 8K written
Tasks: 2 (limit: 4553)
Memory: 1.8M (peak: 105M)
CPU: 577ms
CGroup: /system.slice/acme-spam.lossy.network.service
├─180907 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun
└─180909 systemctl reload nginx
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180900]: 'certificates/spam.lossy.network.crt' -> 'out/fullchain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.key out/key.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180901]: 'certificates/spam.lossy.network.key' -> 'out/key.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.issuer.crt out/chain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180902]: 'certificates/spam.lossy.network.issuer.crt' -> 'out/chain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + ln -sf fullchain.pem out/cert.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cat out/key.pem out/fullchain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + chmod 640 out/cert.pem out/chain.pem out/fullchain.pem out/full.pem out/key.pem out/renewed
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + echo 'Releasing lock /run/acme/2.lock'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: Releasing lock /run/acme/2.lock
| 02:17:49 |
hexa | * (yes, the domain is spam.lossy.network, it is where I send all the spam from host my rspam dashboard at) | 02:18:24 |
hexa | I'm leaving it in this state for now, so we can check this out tomorrow or on the weekend | 02:19:16 |
hexa | the cert expires on december 26th, so we have some time 😄 | 02:19:32 |
hexa | In reply to @hexa:lossy.network m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt builds on my machine(TM) | 02:28:01 |
hexa | worked on the third try on hydra | 02:34:51 |
ThinkChaos | It's not stuck on "Releasing lock", that process has exited: Main PID: 180885 (code=exited, status=0/SUCCESS)
The code actually does nothing after printing that, the script just exits which automatically frees the lock (source)
Based on the CGroup content, I think it's stuck on reloading Nginx, though I don't understand how that would block or why it's doing that, as Nginx is supposed to reload itself through nginx-config-reload.service.
What's the content of /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun?
What's the value of services.nginx.enableReload? Did you add nginx to the cert's reloadServices? | 03:23:40 |
hexa | * #!/nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash
cd /var/lib/acme/spam.lossy.network
if [ -e renewed ]; then
rm renewed
systemctl reload nginx # <-- stuck here
fi
| 12:38:32 |
hexa | I have not set enableReload | 12:39:10 |
hexa | security.acme.certs."spam.${config.networking.domain}" = {
postRun = ''
systemctl reload nginx
'';
};
| 12:40:03 |
hexa | converting that to reloadServices is obviously WIP | 12:40:46 |
ThinkChaos | I think you should set enableReload = true and remove all your custom reloading logic. The Nginx module will handle it: https://github.com/NixOS/nixpkgs/blob/0c582677378f2d9ffcb01490af2f2c678dcb29d3/nixos/modules/services/web-servers/nginx/default.nix#L1317-L1342 | 14:39:11 |
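A minimal sketch of what that suggestion could look like in the configuration above. It assumes the certificate is consumed by an nginx virtual host through the module's usual ACME wiring (for example `useACMEHost` or `enableACME`), which is not shown in the log, and that the rest of the cert definition stays as it is:

```nix
{ config, ... }:
{
  # Let the nginx module reload (instead of restart) nginx; the module's
  # nginx-config-reload.service then takes care of picking up renewed certs.
  services.nginx.enableReload = true;

  security.acme.certs."spam.${config.networking.domain}" = {
    # The custom hook is dropped entirely:
    # postRun = ''
    #   systemctl reload nginx
    # '';
    #
    # Assumption: any other options of this cert (not visible in the log)
    # remain unchanged and would go here.
  };
}
```

With this shape no `systemctl reload` runs inside the ACME unit itself, so the renewal service no longer waits on a reload job.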
hexa | ok, cool | 18:58:37 |
hexa | doesn't explain why systemctl reload nginx gets stuck 😄 | 18:58:45 |
hexa | Thread 1 (Thread 0x7f7f4c1c5680 (LWP 180909) "systemctl"):
#0 0x00007f7f4c50963c in ppoll () from target:/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6
No symbol table info available.
#1 0x00007f7f4c82270b in ppoll_usec () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#2 0x00007f7f4c89e33a in bus_poll () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#3 0x00007f7f4c89e6c5 in sd_bus_wait () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#4 0x00007f7f4c6c41b9 in bus_wait_for_jobs () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#5 0x00005641d8690e2c in verb_start ()
No symbol table info available.
#6 0x00005641d8672bea in main ()
| 18:59:54 |
ThinkChaos | You could run the service's ExecReload manually to see whether the hang is there or in systemd | 19:11:11 |
ThinkChaos | It only does 2 things: check the config, and send a SIGHUP | 19:11:58 |
hexa | systemctl reload nginx blocks, I think I established that earlier | 19:16:32 |
hexa | uhh, sorry | 19:16:37 |
hexa | I mean I established that they both work individually | 19:16:56 |
hexa | it is systemctl reload that is stuck for some reason | 19:17:03 |
hexa | https://gist.github.com/mweinelt/f099ec270ace7cb197954e23871471be | 19:21:08 |
ThinkChaos | Respectfully, I don't want to spend more time investigating this issue since it's in your personal config and not the NixOS modules.
Your strace ends with ask-password related stuff so it's likely waiting to authenticate somehow.
If you switch to reloadServices it uses --no-block.
And better yet, if you switch to enableReload you'll use the battle tested solution. | 19:54:56 |
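For comparison, the `reloadServices` variant mentioned above might look roughly like this. It is a sketch only: the unit name `nginx.service` is an assumption about the target, and the `--no-block` behaviour is as described in the message, not verified here:

```nix
{ config, ... }:
{
  security.acme.certs."spam.${config.networking.domain}" = {
    # Replaces the hand-written postRun hook. Per the message above, the ACME
    # module reloads these units with --no-block, so the renewal unit does not
    # wait for the reload job to finish.
    reloadServices = [ "nginx.service" ];
  };
}
```

Only one of the two approaches should be needed; `enableReload` is the one recommended in the conversation.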
| 1 Dec 2024 |
m1cr0man | I have another "fun" set of upstreaming work completed. I estimate this one has half the chance of being merged compared to the previous change, simply because of the structure of lego's cmd code + error handling.
https://github.com/go-acme/lego/compare/master...m1cr0man:lego:renew-rc-2
https://github.com/m1cr0man/nixpkgs/commit/53846b07f5037e854993366beab3e0a618d1fd68
I have not opened PRs yet, will do that in a second | 01:52:09 |
m1cr0man | With this work, I think the ACME module is in one of the best states it has ever been in. The remaining bash scripting in the module does only 2 things primarily: 1. Perform simple file operations like cp, chmod, chown. 2. Handle concurrency limits. The latter is being looked into by ThinkChaos too, see earlier discussions :) | 02:00:32 |
m1cr0man | Lol, that ended quickly https://github.com/go-acme/lego/pull/2366 | 02:18:10 |