8 Jan 2024
@hexa:lossy.network hexa: as a fix to this particular issue, but not to define the proper relationship between these services imo 23:00:21
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Fair enough 23:00:45
@dandellion:dodsorf.as Dandellion: Upholds= holds water for the workers which currently exist, I think 23:02:47

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

@dandellion:dodsorf.as Dandellion: which is what you basically always want 23:03:58
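
For illustration, the Upholds= idea quoted above could be expressed in a NixOS configuration roughly as follows. This is only a sketch: the worker unit name is hypothetical, and the main unit is assumed to be called matrix-synapse.service.

```nix
{
  # Sketch only: while matrix-synapse.service is up, systemd starts the
  # listed units whenever it finds them inactive or failed.
  systemd.services.matrix-synapse.unitConfig.Upholds = [
    # Hypothetical worker unit name; use whatever the module generates.
    "matrix-synapse-worker-example.service"
  ];
}
```

unitConfig passes keys through to the [Unit] section, so this does not depend on the module having a dedicated option for Upholds=.
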
@dandellion:dodsorf.as Dandellion: Anyone on making a PR? 23:08:32
@ma27:nicht-so.sexy ma27: I may be able to take care of it tomorrow, too tired now 23:10:10
@dandellion:dodsorf.as Dandellion: I'll do it once I finish up this then nw 23:10:58
@dandellion:dodsorf.as Dandellion: * I'll do it once I finish up what I'm currently doing then, nw 23:11:25
21 Jan 2024
In reply to @dandellion:dodsorf.as

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

So I just played around with that and I don't think I like it that much anymore:

  • when you have a worker that fails to start (e.g. because of a configuration error), it isn't kept in the failed state; instead systemd regularly attempts to restart it without any limit (unlike Restart= combined with StartLimitBurst=/StartLimitInterval=). This means it's regularly brought back to the activating state, which will probably give you intermittent firing/resolved messages from your monitoring, depending on e.g. the sample rate of prometheus and the timing of when the service fails / gets restarted.
  • I managed to get one worker to ignore a config change on a deploy (as in I added an ExecStartPre=exit 1 for testing purposes and the service restarted, but the worker was running fine). I can't really explain what exactly was up there (systemctl cat confirmed my config change and journalctl -t systemd logged a restart of the worker in question), but that's a red flag to me.
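
For comparison with the first point, this is roughly what a rate-limited restart looks like with the conventional Restart= settings. It is a sketch with a hypothetical worker unit name and arbitrary example values.

```nix
{
  # Hypothetical worker unit name, for illustration only.
  systemd.services."matrix-synapse-worker-example" = {
    unitConfig = {
      # At most 5 start attempts within 60 seconds; after that the unit
      # stays in the failed state until someone intervenes.
      StartLimitIntervalSec = 60;
      StartLimitBurst = 5;
    };
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = 5;
    };
  };
}
```

With these values a broken worker ends up failed and stays there, which is the behaviour reported as missing with Upholds= above.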

I assume BindsTo= plus RestartMode=direct may work (BindsTo= requires the latter AFAIU), but I'm also hesitant to do that because I'm not sure whether it will have more weird implications.
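
One possible reading of the BindsTo= plus RestartMode=direct idea, as a sketch. The worker unit name is hypothetical, and putting RestartMode=direct on the main service is an assumption about what was meant: it lets an automatic restart of synapse skip the failed/inactive state, so units bound to it are not stopped in between. RestartMode= needs systemd 254 or newer.

```nix
{
  # Hypothetical worker unit name, for illustration only.
  systemd.services."matrix-synapse-worker-example" = {
    # Stop the worker whenever matrix-synapse.service becomes inactive.
    bindsTo = [ "matrix-synapse.service" ];
    after = [ "matrix-synapse.service" ];
  };

  # Assumption: on automatic restarts, go straight back to activating
  # instead of passing through failed/inactive.
  systemd.services.matrix-synapse.serviceConfig.RestartMode = "direct";
}
```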

Also, the more I think about it: synapse should either be able to wait on its remote dependencies on its own, or systemd should be able to model the dependencies properly here, because converging by letting things fail and restart is kinda ugly IMHO and not even reliable: if the router daemon takes a little longer to converge, synapse will reach the restart timeout and then you have the exact same issue despite Upholds=!

Is it perhaps possible to let synapse require the routing daemon which will only become active when converged (i.e. Type=notify)?
Also: I think we don't even need (or want?) to restart workers every time synapse itself gets restarted, do we?
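
The "require a routing daemon that only becomes active once converged" idea could look roughly like this. It is a sketch: routing-daemon is a hypothetical unit name, and it only helps if that daemon actually sends its readiness notification after convergence rather than right at startup.

```nix
{
  # Hypothetical unit name for the routing daemon.
  # With Type=notify the unit only counts as active once the daemon
  # reports READY=1 via sd_notify.
  systemd.services.routing-daemon.serviceConfig.Type = "notify";

  systemd.services.matrix-synapse = {
    # Start synapse only after the routing daemon has become active.
    requires = [ "routing-daemon.service" ];
    after = [ "routing-daemon.service" ];
  };
}
```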

26 Jan 2024
@hexa:lossy.network hexa: hrm, the synapse module also prevents me from using unix domain socket listeners 16:34:01
@hexa:lossy.network hexa: supported since 1.89 16:34:11
@hexa:lossy.network hexa: kinda proves my point that nobody reads the release notes 😛 16:34:21
@dandellion:dodsorf.as Dandellion: h7x4 has been working on that in https://github.com/dali99/nixos-matrix-modules/pull/7, the unix socket part is complete afaik 16:36:53
@hexa:lossy.network hexa: I kinda want it for the nixos.org homeserver 16:37:43
@hexa:lossy.network hexa: would be cool if that could land in the next week or so 16:37:58
@dandellion:dodsorf.as Dandellion: I'll ask if the sockets are ready for merge on their own 16:41:40
@hexa:lossy.network hexa: thank you! 16:43:39
@h7x4:nani.wtf h7x4: I'll have another look at it :) 16:44:41

