| 24 May 2025 |
emily | 09:38:56 up 8:56, 3 users, load average: 165.47, 75.01, 30.85 | 09:39:24 |
emily | this is on the x86 Linux box with 24 cores | 09:39:28 |
emily | do we have any way of setting up cgroups to limit the total number of cores a given builder user can use or is it hopeless because it all goes through the daemon? | 09:40:17 |
emily | maybe some way of hard-limiting the jobs/cores the daemon will accept? | 09:40:45 |
emily | 7× overloaded is really a bit much… | 09:41:03 |
emily | every time I check the builders before using them they're either completely/almost unused or completely dying from load | 09:41:54 |
emily | I don't think expecting people to check uptime and choose parallelism settings considerately based on utilization is working out | 09:42:30 |
emily | maybe it would be a good idea to disable the remote builder protocol entirely, so that the builders have to be used by SSHing in and random Nix commands won't inevitably overload them due to bad remote builder configuration? | 09:43:11 |
emily | I really value the builders as a shared resource but for the last several months the load balancing has really been messed up. just trying to figure out how we can make them usable as a shared resource | 09:44:04 |
Gaétan Lepage | In reply to @emilazy:matrix.org 09:38:56 up 8:56, 3 users, load average: 165.47, 75.01, 30.85 @matthias? | 09:48:37 |
emily | not sure if he is on Matrix? | 09:48:54 |
emily | it's now 11× overloaded. I'm going to try wall | 09:49:20 |
emily | not sure if that actually went through since I'm not root, but I sent a wall | 09:51:05 |
Gaétan Lepage | Is there a way to restart the daemon on the darwin builder? Some derivations are locked, but they should have been cancelled. | 09:55:28 |
zowoq | Restarted. | 12:10:27 |
Gaétan Lepage | Thanks! | 12:10:34 |
zowoq | Might be possible, I'll take a look. | 12:28:17 |
zowoq | Could try this as well. | 12:28:29 |
zowoq | slurm has been mentioned a couple of times, might be time that we try something like that as well. | 12:32:00 |
Gaétan Lepage | In reply to @zowoq:matrix.org slurm has been mentioned a couple of times, might be time that we try something like that as well. I don't know how we could combine the nix daemon and slurm in practice, but it would be great if we could. | 12:39:16 |
| sss joined the room. | 13:02:23 |
| 28 May 2025 |
| @deeok:matrix.org joined the room. | 00:01:11 |
| 30 May 2025 |
| tpw_rules joined the room. | 22:25:13 |
tpw_rules | hi all, finally time to migrate nixos-apple-silicon here. apparently Jonas Chevalier has been previously alerted? | 22:25:56 |
| tgerbet joined the room. | 23:03:54 |