8 Jan 2024 |
hexa | yeah, pretty much the same cause as last time | 22:37:42 |
hexa | loving the redundant stack trace | 22:39:14 |
hexa | testing if depending on network-online.target does the trick | 22:40:02 |
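A minimal sketch of what that dependency could look like as a drop-in. It assumes the dependency goes on matrix-synapse.service (the log does not say which unit was changed), and the file path in the comment is hypothetical. Ordering after network-online.target only waits for whatever the network manager considers "online", which may not cover routes that appear later.

```ini
# Hypothetical drop-in path (assumption, not from the discussion):
# /etc/systemd/system/matrix-synapse.service.d/network-online.conf
[Unit]
# Wants= pulls the target into the boot transaction,
# After= delays this unit until the target has been reached.
Wants=network-online.target
After=network-online.target
```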
hexa | ok, here is another take on the situation after adding that dependency | 22:47:03 |
hexa | matrix-synapse.service has Restart=on-failure | 22:47:32 |
hexa | the workers as well | 22:47:48 |
hexa | so matrix-synapse.service boots after a few retries (because the idm is on a vpn, and the routing daemon needs to converge first...) | 22:48:19 |
hexa | the issue is that the worker services fail with a dependency error when matrix-synapse.service fails the first time | 22:48:48 |
hexa | even if it works on the second try - too late. | 22:49:25 |
⛧-440729 [sophie] (it/its) | We could set RestartMode=direct on matrix-synapse.service so it doesn't go through a failure state | 22:50:09 |
Dandellion | The option I've seen floating around online to fix that is essentially the wait-for-script we removed | 22:50:48 |
hexa | Yeah, thought of BindsTo= and the wait script | 22:51:07 |
hexa | not sure about the implications of RestartMode=direct | 22:51:18 |
Dandellion | https://github.com/systemd/systemd/issues/1312 | 22:51:36 |
Dandellion | so seems direct can work? | 22:51:45 |
Dandellion | as long as we don't put any limits on the amount of restarts or something to that effect? | 22:52:25 |
hexa | sounds like that was the solution | 22:52:28 |
hexa | anyway, that means matrix-synapse.service can never reach failed state, which is not ideal for monitoring | 22:52:54 |
Dandellion | I've never seen/used it before so don't really know what it does | 22:53:11 |
⛧-440729 [sophie] (it/its) | In reply to hexa: How so? If restart limits are hit, the service should still transition to inactive/failed even with direct set | 22:54:49 |
hexa | oh yeah, that is correct | 22:55:04 |
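A rough sketch of the RestartMode=direct idea with restart limits kept in place, so the unit can still end up failed for monitoring, as sophie describes. The drop-in path and all numeric values are illustrative assumptions, not from the discussion. RestartMode= needs systemd 254 or newer.

```ini
# Hypothetical drop-in path (assumption):
# /etc/systemd/system/matrix-synapse.service.d/restart.conf
[Unit]
# Illustrative limits: give up after 10 start attempts within 300 seconds.
StartLimitIntervalSec=300
StartLimitBurst=10

[Service]
Restart=on-failure
# Illustrative delay between restart attempts.
RestartSec=5
# Go straight back to activating on auto-restart instead of passing
# through failed/inactive, so dependent units are not notified of
# the temporary failure.
RestartMode=direct
```

With these example values the limit would be reached after roughly a minute of failed attempts, at which point auto-restart stops and the unit should land in the failed state.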
⛧-440729 [sophie] (it/its) | Or we use the Upholds= fix mentioned in the issue Dandellion linked | 22:56:19 |
hexa | yeah, that sounds like exactly what we want | 22:57:11 |
⛧-440729 [sophie] (it/its) | We could add Upholds=matrix-synapse-worker-...service to matrix-synapse.service so as soon as matrix-synapse.service is running it should start the workers | 22:57:18 |
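A sketch of the Upholds= variant as a drop-in on matrix-synapse.service. The worker unit name is elided in the log, so `matrix-synapse-worker-NAME.service` is a placeholder rather than a real unit name, and the file path is hypothetical. Upholds= needs systemd 249 or newer.

```ini
# Hypothetical drop-in path (assumption):
# /etc/systemd/system/matrix-synapse.service.d/upholds.conf
[Unit]
# While matrix-synapse.service is up, systemd starts the listed units
# whenever it finds them inactive or failed.
# NAME is a placeholder; list one entry per actual worker unit.
Upholds=matrix-synapse-worker-NAME.service
```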
⛧-440729 [sophie] (it/its) | Not sure how stopping everything with the target would work in that case | 22:57:44 |
hexa | BindsTo= ? | 22:57:56 |
⛧-440729 [sophie] (it/its) | Hmm I feel like RestartMode=direct is the somewhat cleaner solution but idk | 22:59:52 |