| 29 Nov 2024 |
hexa | * on release-24.11 on hydra | 02:12:19 |
hexa | In reply to @m1cr0man:m1cr0man.com does the nginx service show one of the ExecReload processes as running? no
● nginx.service - Nginx Web Server
Loaded: loaded (/etc/systemd/system/nginx.service; enabled; preset: ignored)
Active: active (running) since Mon 2024-11-18 02:14:41 UTC; 1 week 4 days ago
Invocation: ae7b61c9c68d4f04bfe52186b992b0b2
Process: 1293 ExecStartPre=/nix/store/0myprdwjj6jjkpd72r1k8qv7fxqnivkp-unit-script-nginx-pre-start/bin/nginx-pre-start (code=exited, status=0/SUCCESS)
Process: 166807 ExecReload=/nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf -t (code=exited, status=0/SUCCESS)
Process: 166808 ExecReload=/nix/store/k48bha2fjqzarg52picsdfwlqx75aqbb-coreutils-9.5/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Main PID: 1297 (nginx)
IP: 18.4M in, 80.2M out
IO: 21.3M read, 12.7M written
Tasks: 2 (limit: 4553)
Memory: 9.4M (peak: 13.6M)
CPU: 32.430s
CGroup: /system.slice/nginx.service
├─ 1297 "nginx: master process /nix/store/m489ix7hrxznh7a5fmdsijdlq4x6p5nn-nginx-1.26.2/bin/nginx -c /nix/store/8nw18skbb0ic2wznfa652nh0xpdn4s5a-nginx.conf"
└─166814 "nginx: worker process"
| 02:17:04 |
hexa | it is also not a race, i think | 02:17:19 |
hexa | because I see it on a third host with just a single unit stuck right now 😩 | 02:17:33 |
hexa | ● acme-spam.lossy.network.service - Renew ACME certificate for spam.lossy.network
Loaded: loaded (/etc/systemd/system/acme-spam.lossy.network.service; enabled; preset: ignored)
Active: activating (start-post) since Mon 2024-11-25 12:40:50 UTC; 3 days ago
Invocation: de04e9b6744344cf86391a79578da590
TriggeredBy: ● acme-spam.lossy.network.timer
Process: 180885 ExecStart=/nix/store/h9h8x07dqvzp9652p05jq8zs2205q66v-unit-script-acme-spam.lossy.network-start/bin/acme-spam.lossy.network-start (code=exited, status=0/SUCCESS)
Main PID: 180885 (code=exited, status=0/SUCCESS); Control PID: 180907 (vw98jrrshrz371a)
IP: 23.3K in, 12.4K out
IO: 52.1M read, 8K written
Tasks: 2 (limit: 4553)
Memory: 1.8M (peak: 105M)
CPU: 577ms
CGroup: /system.slice/acme-spam.lossy.network.service
├─180907 /nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun
└─180909 systemctl reload nginx
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180900]: 'certificates/spam.lossy.network.crt' -> 'out/fullchain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.key out/key.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180901]: 'certificates/spam.lossy.network.key' -> 'out/key.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cp -vp certificates/spam.lossy.network.issuer.crt out/chain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180902]: 'certificates/spam.lossy.network.issuer.crt' -> 'out/chain.pem'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + ln -sf fullchain.pem out/cert.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + cat out/key.pem out/fullchain.pem
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + chmod 640 out/cert.pem out/chain.pem out/fullchain.pem out/full.pem out/key.pem out/renewed
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: + echo 'Releasing lock /run/acme/2.lock'
Nov 25 12:41:01 helios acme-spam.lossy.network-start[180885]: Releasing lock /run/acme/2.lock
| 02:17:49 |
hexa | * (yes, the domain is spam.lossy.network, it is where I send all the spam from host my rspam dashboard at) | 02:18:24 |
hexa | I'm leaving it in this state for now, so we can check this out tomorrow or on the weekend | 02:19:16 |
hexa | the cert expires on december 26th, so we have some time 😄 | 02:19:32 |
hexa | In reply to @hexa:lossy.network m1cr0man: https://gist.githubusercontent.com/mweinelt/b27e2353eedc99242a1074a5d2a4e85f/raw/2475ed5c4863f763f8e0e3938dcc2567a633d67d/gistfile1.txt builds on my machine(TM) | 02:28:01 |
hexa | worked on the third try on hydra | 02:34:51 |
ThinkChaos | It's not stuck on "Releasing lock", that process has exited: Main PID: 180885 (code=exited, status=0/SUCCESS)
The code actually does nothing after printing that, the script just exits which automatically frees the lock (source)
Based on the CGroup content, I think it's stuck on reloading Nginx, though I don't understand how that would block or why it's doing that, as Nginx is supposed to be reloaded through nginx-config-reload.service.
What's the content of /nix/store/vw98jrrshrz371az32ssbcwrr3bz2fqs-acme-postrun?
What's the value of services.nginx.enableReload? Did you add nginx to the cert's reloadServices? | 03:23:40 |
hexa | * #!/nix/store/0irlcqx2n3qm6b1pc9rsd2i8qpvcccaj-bash-5.2p37/bin/bash
cd /var/lib/acme/spam.lossy.network
if [ -e renewed ]; then
rm renewed
systemctl reload nginx # <-- stuck here
fi
| 12:38:32 |
hexa | I have not set enableReload | 12:39:10 |
hexa | security.acme.certs."spam.${config.networking.domain}" = {
postRun = ''
systemctl reload nginx
'';
};
| 12:40:03 |
hexa | converting that to reloadServices is obviously WIP | 12:40:46 |
ThinkChaos | I think you should set enableReload = true and remove all your custom reloading logic. The Nginx module will handle it: https://github.com/NixOS/nixpkgs/blob/0c582677378f2d9ffcb01490af2f2c678dcb29d3/nixos/modules/services/web-servers/nginx/default.nix#L1317-L1342 | 14:39:11 |
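A minimal sketch of what that suggestion could look like as configuration, assuming the virtual host for this certificate is declared through the nginx module (for example with enableACME or useACMEHost), so that the module's own reload unit covers it. The certificate name is taken from the snippet above; the commented-out reloadServices line is the alternative mentioned earlier and is only an assumption about what may be needed when the certificate is not tied to an nginx virtual host.

```nix
{ config, ... }:
{
  # Reload nginx on configuration changes instead of restarting it.
  services.nginx.enableReload = true;

  security.acme.certs."spam.${config.networking.domain}" = {
    # The hand-written postRun that called `systemctl reload nginx`
    # is removed here on purpose.

    # Assumption: only needed if the nginx module does not already
    # know about this certificate.
    # reloadServices = [ "nginx.service" ];
  };
}
```

This does not explain the hang itself; it only takes the manual `systemctl reload nginx` call out of the ACME unit's post-run step.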
hexa | ok, cool | 18:58:37 |
hexa | doesn't explain why systemctl reload nginx gets stuck 😄 | 18:58:45 |
hexa | Thread 1 (Thread 0x7f7f4c1c5680 (LWP 180909) "systemctl"):
#0 0x00007f7f4c50963c in ppoll () from target:/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6
No symbol table info available.
#1 0x00007f7f4c82270b in ppoll_usec () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#2 0x00007f7f4c89e33a in bus_poll () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#3 0x00007f7f4c89e6c5 in sd_bus_wait () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#4 0x00007f7f4c6c41b9 in bus_wait_for_jobs () from target:/nix/store/ivqjhj99firnjq7gp14qf35821viwi5m-systemd-256.7/lib/systemd/libsystemd-shared-256.so
No symbol table info available.
#5 0x00005641d8690e2c in verb_start ()
No symbol table info available.
#6 0x00005641d8672bea in main ()
| 18:59:54 |
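The backtrace shows systemctl sitting in bus_wait_for_jobs, which means it is waiting for systemd to report the reload job as finished. A possible stopgap while debugging, not a fix, is to enqueue the job without waiting for it by using systemctl's --no-block flag. The sketch below assumes the existing postRun hook is kept for the moment. Whether the queued reload then actually runs is not established by anything in this log.

```nix
{ config, ... }:
{
  security.acme.certs."spam.${config.networking.domain}" = {
    postRun = ''
      # Enqueue the reload job and return immediately instead of
      # blocking until systemd reports the job as done.
      systemctl --no-block reload nginx
    '';
  };
}
```

With this in place the ACME unit's post-run step should no longer sit in start-post waiting on systemctl, which may make it easier to see what the reload job itself is waiting for (for example via `systemctl list-jobs`).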
ThinkChaos | You could run the service's ExecReload commands manually to see whether the hang is there or in systemd | 19:11:11 |
ThinkChaos | It only does 2 things: check the config, and send a SIGHUP | 19:11:58 |
hexa | systemctl reload nginx blocks, I think I established that earlier | 19:16:32 |
hexa | uhh, sorry | 19:16:37 |
hexa | I mean I established that they both work individually | 19:16:56 |
hexa | it is systemctl reload that is stuck for some reason | 19:17:03 |
hexa | https://gist.github.com/mweinelt/f099ec270ace7cb197954e23871471be | 19:21:08 |
| @admin:nixos.org joined the room. | 19:22:24 |