| 28 Nov 2024 |
Sandro 🐧 | don't forget to drop it from the test | 22:13:51 |
Sandro 🐧 | does that mean we renew all certs again? 😅 | 22:14:09 |
m1cr0man | No, this does not affect the certDir hash, and thus will not trigger mass renewals.
I'm not sure what you're referring to in the test suite? It does not make reference to the domain hash, and the existing tests are still valid + important. | 22:15:36 |
m1cr0man | I got the --replace-cert-domains --overwrite-domains --force-cert-domains PR merged into lego 😄 Once it has been published in a released version and the setup process refactor has been merged, I have a patch set ready to remove the domain hash entirely (the test suite remains unchanged and passes, and this won't trigger a mass renewal). | 23:40:13 |
| 29 Nov 2024 |
hexa | m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt | 02:12:12 |
hexa | on release-24.11 on hydra | 02:12:19 |
hexa | In reply to @m1cr0man:m1cr0man.com "does the nginx service show one of the ExecReload processes as running?" no
● nginx.service - Nginx Web Server
Loaded: loaded (/etc/systemd/system/nginx.service; enabled; preset: ignored)
Active: active (running) since Mon 2024-11-18 02:14:41 UTC; 1 week 4 days ago
Invocation: ae7b61c9c68d4f04bfe52186b992b0b2
Process: 1293 ExecStartPre=/nix/store/0myprdwjj6jjkpd72r1k8qv7fxqnivkp-unit-script-nginx-pre-start/bin/nginx-pre-start (code=exited, status=0/SUCCESS)
Process: 166807 ExecReload=/nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf -t (code=exited, status=0/SUCCESS)
Process: 166808 ExecReload=/nix/store/k48bha2fjqzarg52picsdfwlqx75aqbb-coreutils-9.5/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 1297 (nginx)
IP: 18.4M in, 80.2M out
IO: 21.3M read, 12.7M written
Tasks: 2 (limit: 4553)
Memory: 9.4M (peak: 13.6M)
CPU: 32.430s
CGroup: /system.slice/nginx.service
├─ 1297 "nginx: master process /nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf"
└─166814 "nginx: worker process"
| 02:17:04 |
hexa | it is also not a race, I think | 02:17:19 |
hexa | because I see it on a third host with just a single unit stuck right now 😩 | 02:17:33 |
hexa | ● acme-spam.lossy.network.service - Renew ACME certificate for spam.lossy.network
Loaded: loaded (/etc/systemd/system/acme-spam.lossy.network.service; enabled; preset: ignored)
Active: activating (start-post) since Mon 2024-11-25 12:40:50 UTC; 3 days ago
Invocation: de04e9b6744344cf86391a79578da590
TriggeredBy: ● acme-spam.lossy.network.timer
Process: 180885 ExecStart=/nix/store/h9h8x07dqvzp9652p05jq8zs2205q66v-unit-script-acme-spam.lossy.network-start/bin/acme-spam.lossy.network-start (code=exited, status=0/SUCCESS)
Main PID: 180885 (code=exited, status=0/SUCCESS); Control PID: 180907 (vw98jrrshrz371a)
IP: 23.3K in, 12.4K out
IO: 52.1M read, 8K written
Tasks: 2 (limit: 4553)
Memory: 1.8M (peak: 105M)
CPU: 577ms
CGroup: /system.slice/acme-spam.lossy.network.service
├─180907 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun
└─180909 systemctl reload nginx
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180900]: 'certificates/spam.lossy.network.crt' -> 'out/fullchain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.key out/key.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180901]: 'certificates/spam.lossy.network.key' -> 'out/key.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.issuer.crt out/chain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180902]: 'certificates/spam.lossy.network.issuer.crt' -> 'out/chain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + ln -sf fullchain.pem out/cert.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cat out/key.pem out/fullchain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + chmod 640 out/cert.pem out/chain.pem out/fullchain.pem out/full.pem out/key.pem out/renewed
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + echo 'Releasing lock /run/acme/2.lock'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: Releasing lock /run/acme/2.lock
| 02:17:49 |
hexa | (yes, the domain is spam.lossy.network; it is where I host my rspamd dashboard) | 02:18:24 |
hexa | I'm leaving it in this state for now, so we can check this out tomorrow or on the weekend | 02:19:16 |
hexa | the cert expires on december 26th, so we have some time 😄 | 02:19:32 |
hexa | In reply to @hexa:lossy.network "m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt" builds on my machine(TM) | 02:28:01 |
hexa | worked on the third try on hydra | 02:34:51 |
ThinkChaos | It's not stuck on "Releasing lock", that process has exited: Main PID: 180885 (code=exited, status=0/SUCCESS)
The code actually does nothing after printing that; the script just exits, which automatically frees the lock (source)
Based on the CGroup content, I think it's stuck reloading Nginx, though I don't understand how that would block or why it's doing that, since Nginx is supposed to reload itself through nginx-config-reload.service.
What's the content of /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun?
What's the value of services.nginx.enableReload? Did you add nginx to the cert's reloadServices? | 03:23:40 |
hexa | #!/nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash
cd /var/lib/acme/spam.lossy.network
if [ -e renewed ]; then
rm renewed
systemctl reload nginx # <-- stuck here
fi
| 12:38:32 |
hexa | I have not set enableReload | 12:39:10 |
hexa | security.acme.certs."spam.${config.networking.domain}" = {
postRun = ''
systemctl reload nginx
'';
};
| 12:40:03 |
hexa | converting that to reloadServices is obviously WIP | 12:40:46 |
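For reference, a minimal sketch of what that conversion might look like, assuming every other option on the cert stays as it is: the hand-written postRun goes away and nginx is listed in the cert's reloadServices instead, so the acme module asks systemd to reload it once a renewal has happened. Whether this avoids the hang seen above is not established in this thread.

```nix
{ config, ... }:
{
  security.acme.certs."spam.${config.networking.domain}" = {
    # Replaces the custom postRun that ran `systemctl reload nginx`.
    # The acme module reloads the listed units after a renewal.
    reloadServices = [ "nginx" ];
  };
}
```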
ThinkChaos | I think you should set enableReload = true and remove all your custom reloading logic. The Nginx module will handle it: https://github.com/NixOS/nixpkgs/blob/0c582677378f2d9ffcb01490af2f2c678dcb29d3/nixos/modules/services/web-servers/nginx/default.nix#L1317-L1342 | 14:39:11 |
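A minimal sketch of that suggestion, assuming some nginx vhost already references this certificate (via enableACME or useACMEHost) so the nginx module's nginx-config-reload.service is tied to it: turn on enableReload and drop the custom reload hook from the cert entirely.

```nix
{ config, ... }:
{
  # Let the nginx module handle reloads itself; per the linked module
  # code this goes through nginx-config-reload.service.
  services.nginx.enableReload = true;

  # No postRun and no reloadServices for nginx on the cert any more.
  # Assumes a vhost uses this cert via enableACME or useACMEHost;
  # any other options on the cert stay as they were.
  security.acme.certs."spam.${config.networking.domain}" = { };
}
```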
hexa | ok, cool | 18:58:37 |
hexa | doesn't explain why systemctl reload nginx gets stuck 😄 | 18:58:45 |
hexa | Thread 1 (Thread 0x7f7f4c1c5680 (LWP 180909) "systemctl"):
#0 0x00007f7f4c50963c in ppoll () from target:/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6
No symbol table info available.
#1 0x00007f7f4c82270b in ppoll_usec () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#2 0x00007f7f4c89e33a in bus_poll () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#3 0x00007f7f4c89e6c5 in sd_bus_wait () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#4 0x00007f7f4c6c41b9 in bus_wait_for_jobs () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#5 0x00005641d8690e2c in verb_start ()
No symbol table info available.
#6 0x00005641d8672bea in main ()
| 18:59:54 |
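The backtrace shows systemctl sitting in bus_wait_for_jobs (called from verb_start), which suggests it has already queued the reload job and is waiting for systemd to report that job as finished, rather than doing any work itself. As a hypothetical interim workaround only (not something proposed in this thread), systemctl's --no-block flag makes it enqueue the job and return without waiting; this does not explain why the queued job never completes.

```nix
{ config, ... }:
{
  security.acme.certs."spam.${config.networking.domain}" = {
    # Hypothetical interim workaround: --no-block enqueues the reload
    # job and returns immediately instead of waiting for it, so the
    # postRun step itself no longer blocks on that job.
    postRun = ''
      systemctl --no-block reload nginx
    '';
  };
}
```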
ThinkChaos | You could run the service's ExecReload manually to see whether it's hanging there or in systemd | 19:11:11 |
ThinkChaos | It only does 2 things: check the config, and send a SIGHUP | 19:11:58 |
hexa | systemctl reload nginx blocks, I think I established that earlier | 19:16:32 |