| 21 Aug 2025 |
K900 | And we can't fan out early bootstrap | 15:44:32 |
K900 | I'd expect like, 2-3 days overhead per staging cycle from this, which is BAD | 15:44:47 |
K900 | If my estimates of single machine RISC-V perf are anywhere close to reality | 15:45:06 |
Alyssa Ross | Couldn't we just not block staging on RISC-V? | 15:45:28 |
K900 | We could run a separate jobset, but then the same thing could be done outside of official infra | 15:45:57 |
K900 | (and probably should be done outside official infra, at least initially) | 15:46:06 |
Vladimír Čunát | Trust is needed. People who have physical access to the HW can more-or-less put anything into the cache. | 15:53:05 |
Vladimír Čunát | (though not trivially at least) | 15:53:34 |
Tristan Ross | If the infra team doesn't have physical access but the hardware team does, is that acceptable for trust? | 15:53:55 |
K900 | Would not be great | 15:54:38 |
K900 | But also, again, I want to first see proof that there's enough hardware | 15:54:38 |
K900 | And that there's enough demand to justify the Hydra costs | 15:54:43 |
Tristan Ross | What sort of costs does it impose on Hydra? Is it just eval and storage? | 15:55:56 |
K900 | Queue runner is slow | 15:56:35 |
emily | cache for the entire jobset on a new platform is substantial (since we pay the whole cost every staging cycle) | 15:56:40 |
K900 | Queue runner slowness scales with number of jobs | 15:56:44 |
emily | and Hydra scheduling is a bottleneck | 15:56:53 |
K900 | Storage is also still questionable | 15:56:59 |
K900 | Eval is largely not an issue | 15:57:06 |
emily | I think it would make sense to prove this externally first | 15:57:12 |
emily | with a separate Hydra | 15:57:23 |
emily | and see if the hardware can even keep up | 15:57:33 |
Tristan Ross | Ok, I'll take this back to the HW team and discuss things more. | 15:57:42 |
emily | the bottleneck right now is x86_64-darwin which I suspect is substantially faster than the best RISC-V hardware | 15:58:10 |
dramforever | i think risc-v hydra for nixpkgs is "no" | 16:02:43 |
dramforever | but we could have machines for y'all to play with | 16:02:58 |
Mic92 | @hexa:lossy.network: is it queue runner or infra week? | 16:03:18 |
hexa | infra, but I have another duty to attend to | 16:03:36 |
Jeremy Fleischman (jfly) | infra | 16:03:40 |