!vxTmkuJzhGPsMdkAOc:transformierende-gesellschaft.org

NixOS Matrix Subsystem

139 Members
Coordination and discussion about the matrix subsystem in NixOS - https://nixos.wiki/wiki/Matrix
71 Servers



8 Jan 2024
@dandellion:dodsorf.as Dandellion: I've never seen/used it before so don't really know what it does 22:53:17
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its):
In reply to @hexa:lossy.network
anyway, that means matrix-synapse.service can never reach failed state, which is not ideal for monitoring
How so? If restart limits are hit, the service should still transition to inactive/failed even with RestartMode=direct set.
22:54:49
@hexa:lossy.network hexa: oh yeah, that is correct 22:55:04
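The behaviour sophie and hexa agree on above can be sketched in NixOS module syntax. This is a sketch only, assuming systemd >= 254 (where RestartMode= exists); the numeric values are illustrative, not the module's actual defaults:

```nix
# Sketch: RestartMode=direct skips the deactivating/dead cycle between
# restarts, but once StartLimitBurst starts are exhausted within
# StartLimitIntervalSec, systemd still puts the unit into the "failed"
# state, so monitoring can observe the failure.
systemd.services.matrix-synapse = {
  serviceConfig = {
    Restart = "on-failure";
    RestartMode = "direct";        # requires systemd >= 254
  };
  unitConfig = {
    StartLimitBurst = 5;           # illustrative values
    StartLimitIntervalSec = "30s";
  };
};
```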
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Or we use the Upholds= fix mentioned in the issue Dandellion linked 22:56:19
@hexa:lossy.network hexa: yeah, that sounds like exactly what we want 22:57:11
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): We could add Upholds=matrix-synapse-worker-...service to matrix-synapse.service so as soon as matrix-synapse.service is running it should start the workers 22:57:27
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Not sure how stopping everything with the target would work in that case 22:57:44
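A minimal sketch of the Upholds= idea in NixOS module syntax (the worker unit name below is a hypothetical placeholder; the real names contain the part elided above). PartOf= is one plausible answer to the stop-propagation question, assuming the deployment defines a matrix-synapse.target:

```nix
# Sketch: uphold a worker from the main synapse unit, so systemd restarts
# it whenever it is found inactive/failed while synapse is up; stopping
# still propagates from the target via PartOf=.
systemd.services.matrix-synapse.unitConfig.Upholds = [
  "matrix-synapse-worker-example.service"  # hypothetical worker unit name
];
systemd.services."matrix-synapse-worker-example".unitConfig.PartOf = [
  "matrix-synapse.target"
];
```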
@hexa:lossy.network hexa: BindsTo=? 22:57:56
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Hmm, I feel like RestartMode=direct is the somewhat cleaner solution, but idk 22:59:52
@hexa:lossy.network hexa: as a fix to this particular issue, but not to define the proper relationship between these services imo 23:00:21
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Fair enough 23:00:45
@dandellion:dodsorf.as Dandellion: Upholds= holds water for the workers which currently exist, I think 23:02:47
@dandellion:dodsorf.as Dandellion:

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

23:03:20
@dandellion:dodsorf.as Dandellion: which is what you basically always want 23:03:58
@dandellion:dodsorf.as Dandellion: Anyone up for making a PR? 23:08:32
@ma27:nicht-so.sexy ma27: I may be able to take care of it tomorrow, too tired now 23:10:10
@dandellion:dodsorf.as Dandellion: I'll do it once I finish up what I'm currently doing then, nw 23:11:25
10 Jan 2024
@csyn:matrix.org csyn joined the room. 04:53:49
@philipp:xndr.de philipp changed their display name from philipp to philipp (prolog in linux kernel, when). 13:47:05
@philipp:xndr.de philipp changed their display name from philipp (prolog in linux kernel, when) to philipp (prolog in linux kernel when). 13:47:16
@philipp:xndr.de philipp changed their display name from philipp (prolog in linux kernel when) to test. 13:47:52
@philipp:xndr.de philipp changed their display name from test to philipp. 13:49:29
15 Jan 2024
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:22:15
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:23:54
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:26:25
17 Jan 2024
@sumner:nevarro.space Sumner Evans changed their profile picture. 05:28:20
21 Jan 2024
@ma27:nicht-so.sexy ma27:
In reply to @dandellion:dodsorf.as

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

So I just played around with that and I don't think I like it that much anymore:

  • When you have a worker that fails to start (e.g. because of a configuration error), it isn't kept in the failed state; instead, systemd regularly attempts to restart it without a timeout (as is the case with Restart/StartLimitBurst/StartLimitInterval). This means it's regularly brought back to the activating state, which will probably give you intermittent firing/resolved messages from your monitoring, depending on e.g. the sample rate of Prometheus and the timing of when the service fails / gets restarted.
  • I managed to get one worker to ignore a config change on a deploy (as in, I added an ExecStartPre=exit 1 for testing purposes and the service restarted, but the worker was running fine). I can't really explain what exactly was up there (systemctl cat confirmed my config change and journalctl -t systemd logged a restart of the worker in question), but that's a red flag to me.

I assume BindsTo= + RestartMode=direct may work (though BindsTo= requires the latter AFAIU), but I'm also hesitant to do that because I'm not sure if this will have more weird implications.

Also, the more I think about it: synapse should either be able to wait on its remote dependencies on its own, or systemd should be able to model the dependencies properly here, because converging by letting things fail and restart is kinda ugly IMHO and not even reliable: if the router daemon takes a little longer to converge, synapse will reach the restart timeout and then you have the exact same issue despite Upholds=!

Is it perhaps possible to let synapse require the routing daemon, which would only become active when converged (i.e. Type=notify)?
Also: I think we don't even need (or want?) to restart workers every time synapse itself gets restarted, do we?

12:44:46
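The dependency modelling ma27 asks about at the end could look roughly like this in NixOS module syntax. All unit names are hypothetical, and the sketch assumes the routing daemon actually supports sd_notify and only signals READY=1 once it has converged:

```nix
# Sketch: order synapse strictly after a notify-type routing daemon instead
# of converging via restart loops; systemd delays starting synapse until
# the daemon has reported readiness.
systemd.services.routing-daemon = {          # hypothetical unit name
  serviceConfig.Type = "notify";             # daemon must send READY=1
};
systemd.services.matrix-synapse = {
  requires = [ "routing-daemon.service" ];
  after = [ "routing-daemon.service" ];
};
```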



Room Version: 4