| 27 Nov 2024 |
hexa | root 130934 0.0 0.0 6620 3328 ? Ss Nov25 0:00 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/kia1z8g0zv7w2ndbr6bf88ybgacjldi1-acme-postrun
root 130936 0.0 0.1 18628 7424 ? S Nov25 0:00 \_ systemctl reload nginx
root 131100 0.0 0.0 6620 3328 ? Ss Nov25 0:00 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/x8pmg6g602b92rrbapxpcmb695n811lb-acme-postrun
root 131103 0.0 0.1 18628 7552 ? S Nov25 0:00 \_ systemctl reload nginx
root 143742 0.0 0.0 6620 3328 ? Ss Nov26 0:00 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/kr92xf7yisa0236ls1gmacqwapc3zqxz-acme-postrun
root 143745 0.0 0.1 18628 7424 ? S Nov26 0:00 \_ systemctl reload nginx
| 01:53:50 |
hexa | #!/nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash
cd /var/lib/acme/lossy.network
if [ -e renewed ]; then
rm renewed
systemctl reload nginx
fi
| 01:54:14 |
hexa | renewed does not exist anymore | 01:54:21 |
hexa | rebooted the machine and reloading works again | 02:02:59 |
hexa | just wrote this down so maybe if someone else hits it we'll know it is a recurring thing? 😄 | 02:03:15 |
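To check other hosts for the same symptom, the process listing above can be filtered mechanically. The following is only a sketch: `find_stuck_reloads` is a hypothetical helper name, the 300-second threshold is an arbitrary assumption, and it expects `ps -eo pid=,etimes=,args=` style input (PID, elapsed seconds, command line) rather than the `ps aux` format pasted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nixpkgs): list long-running
# `systemctl reload <unit>` processes from ps-style input on stdin.
# Expected input columns: PID, elapsed seconds, command line.
find_stuck_reloads() {
  # $1: minimum age in seconds before a reload counts as stuck (assumed default: 300)
  local min_age="${1:-300}"
  awk -v min="$min_age" '
    # Match either a bare "systemctl" or a full path ending in /systemctl.
    $2 ~ /^[0-9]+$/ && $2 >= min && $3 ~ /(^|\/)systemctl$/ && $4 == "reload" {
      print $1, $2, $5   # PID, age in seconds, unit name
    }'
}

# Usage on a live system:
#   ps -eo pid=,etimes=,args= | find_stuck_reloads 300
```

Any PID it prints is a `systemctl reload` call that has been waiting longer than the threshold, which matches what the listing above shows for the `acme-postrun` children.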
| 28 Nov 2024 |
hexa | ok, just saw it on the next host 🫠 | 01:40:56 |
m1cr0man | that's... weird | 22:01:25 |
m1cr0man | does the nginx service show one of the ExecReload processes as running? | 22:01:49 |
m1cr0man | I got the --replace-cert-domains --overwrite-domains --force-cert-domains PR merged to lego 😄 Once it has been published in a released version, and the setup process refactor has merged, I have a patch set ready to remove the domain hash entirely (test suite remains unchanged + passes, and this won't trigger mass renewal). | 22:12:03 |
Sandro 🐧 | don't forget to drop it from the test | 22:13:51 |
Sandro 🐧 | does that mean we renew all certs again ? 😅 | 22:14:09 |
m1cr0man | No, this does not affect the certDir hash, and thus will not trigger mass renewals.
I'm not sure what you're referring to in the test suite? It does not make reference to the domain hash, and the existing tests are still valid + important. | 22:15:36 |
| 29 Nov 2024 |
hexa | m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt | 02:12:12 |
hexa | on release-24.11 on hydra | 02:12:14 |
hexa | In reply to @m1cr0man:m1cr0man.com: "does the nginx service show one of the ExecReload processes as running?"
no
● nginx.service - Nginx Web Server
Loaded: loaded (/etc/systemd/system/nginx.service; enabled; preset: ignored)
Active: active (running) since Mon 2024-11-18 02:14:41 UTC; 1 week 4 days ago
Invocation: ae7b61c9c68d4f04bfe52186b992b0b2
Process: 1293 ExecStartPre=/nix/store/0myprdwjj6jjkpd72r1k8qv7fxqnivkp-unit-script-nginx-pre-start/bin/nginx-pre-start (code=exited, status=0/SUCCESS)
Process: 166807 ExecReload=/nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf -t (code=exited, status=0/SUCCESS)
Process: 166808 ExecReload=/nix/store/k48bha2fjqzarg52picsdfwlqx75aqbb-coreutils-9.5/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 1297 (nginx)
IP: 18.4M in, 80.2M out
IO: 21.3M read, 12.7M written
Tasks: 2 (limit: 4553)
Memory: 9.4M (peak: 13.6M)
CPU: 32.430s
CGroup: /system.slice/nginx.service
├─ 1297 "nginx: master process /nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf"
└─166814 "nginx: worker process"
| 02:17:04 |
hexa | it is also not a race, i think | 02:17:19 |
hexa | because I see it on a third host with just a single unit stuck right now 😩 | 02:17:33 |
hexa | ● acme-spam.lossy.network.service - Renew ACME certificate for spam.lossy.network
Loaded: loaded (/etc/systemd/system/acme-spam.lossy.network.service; enabled; preset: ignored)
Active: activating (start-post) since Mon 2024-11-25 12:40:50 UTC; 3 days ago
Invocation: de04e9b6744344cf86391a79578da590
TriggeredBy: ● acme-spam.lossy.network.timer
Process: 180885 ExecStart=/nix/store/h9h8x07dqvzp9652p05jq8zs2205q66v-unit-script-acme-spam.lossy.network-start/bin/acme-spam.lossy.network-start (code=exited, status=0/SUCCESS)
Main PID: 180885 (code=exited, status=0/SUCCESS); Control PID: 180907 (vw98jrrshrz371a)
IP: 23.3K in, 12.4K out
IO: 52.1M read, 8K written
Tasks: 2 (limit: 4553)
Memory: 1.8M (peak: 105M)
CPU: 577ms
CGroup: /system.slice/acme-spam.lossy.network.service
├─180907 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun
└─180909 systemctl reload nginx
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180900]: 'certificates/spam.lossy.network.crt' -> 'out/fullchain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.key out/key.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180901]: 'certificates/spam.lossy.network.key' -> 'out/key.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.issuer.crt out/chain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180902]: 'certificates/spam.lossy.network.issuer.crt' -> 'out/chain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + ln -sf fullchain.pem out/cert.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cat out/key.pem out/fullchain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + chmod 640 out/cert.pem out/chain.pem out/fullchain.pem out/full.pem out/key.pem out/renewed
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + echo 'Releasing lock /run/acme/2.lock'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: Releasing lock /run/acme/2.lock
| 02:17:49 |
hexa | (yes, the domain is spam.lossy.network, it is where I send all the spam from host my rspam dashboard at) | 02:18:14 |
hexa | I'm leaving it in this state for now, so we can check this out tomorrow or on the weekend | 02:19:16 |
hexa | the cert expires on december 26th, so we have some time 😄 | 02:19:32 |
hexa | In reply to @hexa:lossy.network: "m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt"
builds on my machine(TM) | 02:28:01 |
hexa | worked on the third try on hydra | 02:34:51 |
ThinkChaos | It's not stuck on "Releasing lock", that process has exited: Main PID: 180885 (code=exited, status=0/SUCCESS)
The code actually does nothing after printing that; the script just exits, which automatically frees the lock (source)
Based on the CGroup content, I think it's stuck on reloading Nginx, though I don't understand how that would block or why it's doing that, as Nginx is supposed to reload itself through nginx-config-reload.service.
What's the content of /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun?
What's the value of services.nginx.enableReload? Did you add nginx to the cert's reloadServices? | 03:23:40 |
hexa | #!/nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash
cd /var/lib/acme/spam.lossy.network
if [ -e renewed ]; then
rm renewed
systemctl reload nginx # <-- stuck here
fi
| 12:38:15 |
hexa | I have not set enableReload | 12:39:10 |
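Since the postrun script quoted above blocks at `systemctl reload nginx`, one possible stopgap is to enqueue the reload without waiting for it. This is an untested sketch, not what the NixOS ACME module generates: `reload_if_renewed` is a hypothetical function name, and `--no-block` only keeps the ACME unit from hanging in start-post. It does not explain or fix whatever is holding up the reload job itself.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the acme-postrun logic (assumption, not the
# generated script): same marker-file check, but the reload is enqueued
# without waiting for the job to finish.
reload_if_renewed() {
  local cert_dir="$1" unit="$2"
  # `renewed` is the marker file the original script checks for.
  if [ -e "$cert_dir/renewed" ]; then
    rm "$cert_dir/renewed"
    # --no-block returns as soon as the job is queued instead of
    # waiting for the reload to complete.
    systemctl --no-block reload "$unit"
  fi
}

# Usage, mirroring the script above:
#   reload_if_renewed /var/lib/acme/spam.lossy.network nginx
```

The trade-off is that a failed or stalled reload no longer shows up as a stuck ACME unit, so it would have to be noticed some other way, for example via `systemctl list-jobs`.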