!vxTmkuJzhGPsMdkAOc:transformierende-gesellschaft.org

NixOS Matrix Subsystem

139 Members
Coordination and discussion about the matrix subsystem in NixOS - https://nixos.wiki/wiki/Matrix
71 Servers



8 Jan 2024
@dandellion:dodsorf.as Dandellion: I've never seen/used it before so don't really know what it does 22:53:17
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its):
In reply to @hexa:lossy.network
anyway, that means matrix-synapse.service can never reach failed state, which is not ideal for monitoring
How so? If restart limits are hit, the service should still transition to inactive/failed even with RestartMode=direct set.
22:54:49
@hexa:lossy.network hexa: oh yeah, that is correct 22:55:04
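The behaviour sophie and hexa agree on above can be sketched in NixOS module syntax. This is a sketch only, assuming systemd >= 254 (where RestartMode= exists); the numeric values are illustrative, not the module's actual defaults:

```nix
# Sketch: RestartMode=direct skips the deactivating/dead cycle between
# restarts, but once StartLimitBurst starts are exhausted within
# StartLimitIntervalSec, systemd still puts the unit into the "failed"
# state, so monitoring can observe the failure.
systemd.services.matrix-synapse = {
  serviceConfig = {
    Restart = "on-failure";
    RestartMode = "direct";        # requires systemd >= 254
  };
  unitConfig = {
    StartLimitBurst = 5;           # illustrative values
    StartLimitIntervalSec = "30s";
  };
};
```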
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Or we use the Upholds= fix mentioned in the issue Dandellion linked 22:56:19
@hexa:lossy.network hexa: yeah, that sounds like exactly what we want 22:57:11
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): We could add Upholds=matrix-synapse-worker-...service to matrix-synapse.service so as soon as matrix-synapse.service is running it should start the workers 22:57:27
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Not sure how stopping everything with the target would work in that case 22:57:44
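A minimal sketch of the Upholds= idea in NixOS module syntax (the worker unit name below is a hypothetical placeholder; the real names contain the part elided above). PartOf= is one plausible answer to the stop-propagation question, assuming the deployment defines a matrix-synapse.target:

```nix
# Sketch: uphold a worker from the main synapse unit, so systemd restarts
# it whenever it is found inactive/failed while synapse is up; stopping
# still propagates from the target via PartOf=.
systemd.services.matrix-synapse.unitConfig.Upholds = [
  "matrix-synapse-worker-example.service"  # hypothetical worker unit name
];
systemd.services."matrix-synapse-worker-example".unitConfig.PartOf = [
  "matrix-synapse.target"
];
```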
@hexa:lossy.network hexa: BindsTo=? 22:57:56
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Hmm, I feel like RestartMode=direct is the somewhat cleaner solution, but idk 22:59:52
@hexa:lossy.network hexa: as a fix to this particular issue, but not to define the proper relationship between these services imo 23:00:21
@sophie:catgirl.cloud ⛧-440729 [sophie] (it/its): Fair enough 23:00:45
@dandellion:dodsorf.as Dandellion: Upholds= holds water for the workers which currently exist, I think 23:02:47
@dandellion:dodsorf.as Dandellion:

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

23:03:20
@dandellion:dodsorf.as Dandellion: which is what you basically always want 23:03:58
@dandellion:dodsorf.as Dandellion: Anyone up for making a PR? 23:08:32
@ma27:nicht-so.sexy ma27: I may be able to take care of it tomorrow, too tired now 23:10:10
@dandellion:dodsorf.as Dandellion: I'll do it once I finish up what I'm currently doing then, nw 23:11:25
10 Jan 2024
@csyn:matrix.org csyn joined the room. 04:53:49
@philipp:xndr.de philipp changed their display name from philipp to philipp (prolog in linux kernel, when). 13:47:05
@philipp:xndr.de philipp changed their display name from philipp (prolog in linux kernel, when) to philipp (prolog in linux kernel when). 13:47:16
@philipp:xndr.de philipp changed their display name from philipp (prolog in linux kernel when) to test. 13:47:52
@philipp:xndr.de philipp changed their display name from test to philipp. 13:49:29
15 Jan 2024
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:22:15
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:23:54
@fadenb:utzutzutz.net fadenb changed their profile picture. 11:26:25
17 Jan 2024
@sumner:nevarro.space Sumner Evans changed their profile picture. 05:28:20
21 Jan 2024
@ma27:nicht-so.sexy ma27:
In reply to @dandellion:dodsorf.as

Configures dependencies similar to Wants=, but as long as this unit is up, all units listed in Upholds= are started whenever found to be inactive or failed

So I just played around with that and I don't think I like it that much anymore:

  • When you have a worker that fails to start (e.g. because of a configuration error), it isn't kept in the failed state; instead, systemd regularly attempts to restart it without a timeout (as is the case with Restart/StartLimitBurst/StartLimitInterval). This means it's regularly brought back to the activating state, which will probably give you intermittent firing/resolved messages from your monitoring, depending on e.g. the sample rate of Prometheus and the timing of when the service fails / gets restarted.
  • I managed to get one worker to ignore a config change on a deploy (as in, I added an ExecStartPre=exit 1 for testing purposes and the service restarted, but the worker was running fine). I can't really explain what exactly was up there (systemctl cat confirmed my config change and journalctl -t systemd logged a restart of the worker in question), but that's a red flag to me.

I assume BindsTo= + RestartMode=direct may work (though BindsTo= requires the latter AFAIU), but I'm also hesitant to do that because I'm not sure if this will have more weird implications.

Also, the more I think about it: synapse should either be able to wait on its remote dependencies on its own, or systemd should be able to model the dependencies properly here, because converging by letting things fail and restart is kinda ugly IMHO and not even reliable: if the router daemon takes a little longer to converge, synapse will reach the restart timeout and then you have the exact same issue despite Upholds=!

Is it perhaps possible to let synapse require the routing daemon, which would only become active when converged (i.e. Type=notify)?
Also: I think we don't even need (or want?) to restart workers every time synapse itself gets restarted, do we?

12:44:46
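The dependency modelling ma27 asks about at the end could look roughly like this in NixOS module syntax. All unit names are hypothetical, and the sketch assumes the routing daemon actually supports sd_notify and only signals READY=1 once it has converged:

```nix
# Sketch: order synapse strictly after a notify-type routing daemon instead
# of converging via restart loops; systemd delays starting synapse until
# the daemon has reported readiness.
systemd.services.routing-daemon = {          # hypothetical unit name
  serviceConfig.Type = "notify";             # daemon must send READY=1
};
systemd.services.matrix-synapse = {
  requires = [ "routing-daemon.service" ];
  after = [ "routing-daemon.service" ];
};
```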



Room Version: 4