OfBorg - Public Room Timeline

OfBorg (156 Members, 57 Servers)
Number of builds and evals in queue: <TBD>


Sender	Message	Time
29 Sep 2023
Lily Foster	In reply to @7c6f434c:nitro.chat I wonder if this will be an error-out during pure eval, though `lib.inPureEvalMode`	14:33:37
7c6f434c	Fortunately, the existence of stable branches would make tying anything to the details of ofBorg deployments just too painful and impractical…	14:38:34
Lily Foster	Stable branches or not, it's a bad and frail idea	14:41:41
Lily Foster	Please don't seriously consider this in any capacity....	14:42:19
cole-h	My problem with increasing the ofborg timeout on darwin is that the darwin builders are pretty slow as it is. I worry that that would cause the darwin queue to blow up (as has happened in the past even without a longer timeout). It would be interesting to explore an "ofborgWillTimeoutOnTheseSystems" predicate, though.	15:40:09
@infinisil:matrix.org	cole-h: What about a dynamic approach: When the queue is too long, time out the longest-running job until it's short enough again	15:58:15
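A minimal sketch of the dynamic approach suggested above, in Python. It assumes the coordinator can see the current queue length and the start time of every running job (which a plain RabbitMQ setup may not expose), and that each cancelled job frees a worker that immediately takes one queued job. `RunningJob`, `jobs_to_time_out` and the thresholds are hypothetical names, not part of ofborg.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    job_id: str        # hypothetical job identifier
    started_at: float  # start time, seconds since the epoch


def jobs_to_time_out(running, queue_length, max_queue_length, now):
    """Choose running jobs to cancel, longest-running first, until the queue is short enough.

    Assumes every cancellation frees a worker that immediately takes one queued
    job, so each cancelled job shortens the queue by one.
    """
    victims = []
    by_elapsed = sorted(running, key=lambda job: now - job.started_at, reverse=True)
    for job in by_elapsed:
        if queue_length <= max_queue_length:
            break
        victims.append(job.job_id)
        queue_length -= 1
    return victims
```

With jobs `"a"`, `"b"` and `"c"` started at 100, 50 and 300, a queue of 12 and a limit of 10, `jobs_to_time_out(running, 12, 10, now=400.0)` cancels `"b"` and then `"a"`, the two longest-running jobs, and stops once the queue is back at 10.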
@asymmetric:matrix.dapp.org.uk	could we add another command to ofborg, so that one could do `@ofborg set timeout 2h polkadot aarch64-darwin` or something, on a case by case basis, as a github comment on the pr of a specific package? this would at least be a stopgap, and would mean we don't have to "pollute" `meta` with ofborg-specific attributes	15:59:46
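A sketch of how the proposed comment could be parsed, assuming the `@ofborg set timeout <duration> <attr> <system>` shape from the message above. ofborg has no such command; the grammar, the `s`/`m`/`h` units and `parse_set_timeout` are all hypothetical.

```python
import re

# Hypothetical grammar for the proposed command (not an existing ofborg command).
COMMAND = re.compile(
    r"^@ofborg\s+set\s+timeout\s+(?P<amount>\d+)(?P<unit>[smh])"
    r"\s+(?P<attr>\S+)\s+(?P<system>\S+)\s*$"
)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_set_timeout(comment):
    """Return (seconds, attribute, system) for the first matching line, or None."""
    for line in comment.splitlines():
        m = COMMAND.match(line.strip())
        if m:
            seconds = int(m["amount"]) * UNIT_SECONDS[m["unit"]]
            return seconds, m["attr"], m["system"]
    return None
```

For example, `parse_set_timeout("@ofborg set timeout 2h polkadot aarch64-darwin")` returns `(7200, "polkadot", "aarch64-darwin")`, and a comment with no such line returns `None`.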
cole-h	In reply to @infinisil:matrix.org cole-h: What about a dynamic approach: When the queue is too long, time out the longest-running job until it's short enough again As far as I know, RabbitMQ (what ofborg uses) is a "dumb" queue system. I don't know if we get information about the queue aside from the fact that there's a job we can take, and then communicating that we succeeded a job... (I'm not all that familiar with RabbitMQ, however)	16:01:32
cole-h	In reply to @asymmetric:matrix.dapp.org.uk could we add another command to ofborg, so that one could do `@ofborg set timeout 2h polkadot aarch64-darwin` or something, on a case by case basis, as a github comment on the pr of a specific package? this would at least be a stopgap, and would mean we don't have to "pollute" `meta` with ofborg-specific attributes An interesting idea, but I think the problem with that is that 1) the timeout is set for the `nix-build` command (so there's no way to change it once the build has started running); and 2) the machine that processes comment commands is separate from the machines that run those commands (builds, evals, etc)...	16:03:22
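To illustrate point 1): if the timeout is handed to `nix-build` as an argument, it is fixed when the argument list is assembled, before the process starts. This sketch uses Nix's `timeout` setting via `--option timeout`; whether ofborg passes its limit this way (or enforces it some other way) is an assumption, and `BuildJob` / `nix_build_argv` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildJob:
    """Hypothetical job description as a builder might receive it."""
    attrs: tuple           # attribute paths to build, e.g. ("polkadot",)
    timeout_seconds: int   # maximum builder run time, in seconds


def nix_build_argv(job):
    """Assemble the nix-build command line for one job.

    The timeout is copied into the argument list here; once a process is
    started with this argv, later changes to job settings cannot reach it.
    """
    argv = ["nix-build", "./.", "--option", "timeout", str(job.timeout_seconds)]
    for attr in job.attrs:
        argv += ["-A", attr]
    return argv
```

Changing the limit for a build that is already running would therefore mean stopping it and starting it again with a new argument list.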
@asymmetric:matrix.dapp.org.uk	In reply to @cole-h:matrix.org An interesting idea, but I think the problem with that is that 1) the timeout is set for the `nix-build` command (so there's no way to change it once the build has started running); and 2) the machine that processes comment commands is separate from the machines that run those commands (builds, evals, etc)... yeah i thought that we could either: add the comment in the PR body, so that it gets picked up before the job starts; or add the behaviour that when this comment is set, a build is restarted; or add a `restart` command which must also be used when using `set timeout` i'm not sure i understand the implications of your point 2) though -- couldn't the comments-listening machine direct the building-machine?	16:06:12
cole-h	My point with #2 is that (as far as I know), the coordinator doesn't know which machine took the job, so how should it determine which build machine to inform about the change? (I think the simplest depiction of how it all works, from my knowledge, is GitHub webhook -> ofborg-core (coordinator) -> AMQP server <-> ofborg-eval-X (evaluator and builder)) But the "comment in PR body" could work, if ofborg is notified about that information (it must be, in some roundabout way, because we have access to the PR number)......	16:10:11
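A sketch of the "comment in PR body" variant under the architecture described above (GitHub webhook -> ofborg-core -> AMQP server <-> ofborg-eval-X): the coordinator reads the PR body once, before publishing, and writes the timeout into each job message, so whichever worker consumes a message gets its limit from the message itself and the coordinator never needs to know which machine took it. The `@ofborg timeout <N>h <system>` directive, the JSON field names and the one-hour default are hypothetical, not ofborg's actual message format.

```python
import json
import re

# Hypothetical directive a PR author could add to the PR body.
DIRECTIVE = re.compile(r"^@ofborg\s+timeout\s+(\d+)h\s+(\S+)\s*$", re.MULTILINE)
DEFAULT_TIMEOUT_SECONDS = 3600  # hypothetical default


def make_build_messages(pr_number, pr_body, attrs, systems):
    """Coordinator side: one JSON message per system, each carrying its own timeout."""
    overrides = {system: int(hours) * 3600 for hours, system in DIRECTIVE.findall(pr_body)}
    return [
        json.dumps({
            "pr": pr_number,
            "system": system,
            "attrs": list(attrs),
            "timeout_seconds": overrides.get(system, DEFAULT_TIMEOUT_SECONDS),
        })
        for system in systems
    ]


def worker_timeout(message):
    """Worker side: read the limit from the consumed message; no coordinator lookup needed."""
    return json.loads(message)["timeout_seconds"]
```

For a PR body containing the line `@ofborg timeout 2h aarch64-darwin`, the `aarch64-darwin` message carries `7200` and every other system keeps the default `3600`.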
@asymmetric:matrix.dapp.org.uk	In reply to @cole-h:matrix.org My point with #2 is that (as far as I know), the coordinator doesn't know which machine took the job, so how should it determine which build machine to inform about the change? (I think the simplest depiction of how it all works, from my knowledge, is GitHub webhook -> ofborg-core (coordinator) -> AMQP server <-> ofborg-eval-X (evaluator and builder)) But the "comment in PR body" could work, if ofborg is notified about that information (it must be, in some roundabout way, because we have access to the PR number)...... `ofborg-core` being the php code under `./php`? as that seems to be the thing that sits between hooks and rabbitmq	19:05:05
cole-h	Not solely that. The `core` machine also runs most of these binaries (that aren't build or eval related): https://github.com/NixOS/ofborg/tree/released/ofborg/src/bin	19:06:39
1 Oct 2023
	cafkafk joined the room.	14:39:06
cafkafk	disk full https://github.com/NixOS/nixpkgs/pull/258395/checks?check_run_id=17292775692	14:39:14
hexa	https://ofborg.org/prometheus/alerts hmm	14:41:55
hexa	don't think we monitor the darwin machines in prometheus	14:43:41
Lily Foster	In reply to @hexa:lossy.network don't think we monitor the darwin machines in prometheus We do, but I bet they don't push that metric	14:45:15
hexa	https://ofborg.org/prometheus/graph?g0.expr=node_os_info&g0.tab=1&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1h	14:45:36
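The graph link above runs an instant query for `node_os_info`, a metric exported by node_exporter. A sketch of the same check done programmatically: it takes the JSON returned by Prometheus's `/api/v1/query` endpoint (assumed to be served under the same `/prometheus` prefix) and lists the expected machines that have no `node_os_info` series. The instance names are hypothetical placeholders, and fetching the JSON is left out so the sketch runs offline.

```python
import json


def hosts_without_system_metrics(response_text, expected_instances):
    """Return expected instances that have no node_os_info series.

    `response_text` is the JSON body of an instant query such as
    /api/v1/query?query=node_os_info.
    """
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    reporting = {series["metric"].get("instance") for series in body["data"]["result"]}
    return sorted(set(expected_instances) - reporting)


# Example response shape (values illustrative, instance names hypothetical).
example = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_os_info", "instance": "linux-builder-1:9100"},
             "value": [1696171536.0, "1"]},
        ],
    },
})
print(hosts_without_system_metrics(example, ["linux-builder-1:9100", "darwin-builder-1:9100"]))
# prints ['darwin-builder-1:9100']
```

A machine that only pushes ofborg's own metrics would show up in this list, since it reports no node_exporter series at all.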
hexa	I don't think we do 😄	14:46:07
Lily Foster	Yeah looks like they are only sending ofborg metrics	14:46:19
Lily Foster	No system metrics	14:46:23
hexa	oh	14:46:30


Room Version: 6