| 21 Aug 2025 |
K900 | If we're talking Hydra support, the answer is most likely just "no" | 15:41:05 |
Tristan Ross | We're figuring that out; we need to know what the infra requirements are for builders. | 15:43:28 |
K900 | I am going to be real, it'll be a hard sell when there's no practical usage for RISC-V and no cores that are even Ampere fast | 15:44:08 |
K900 | And we can't fan out early bootstrap | 15:44:32 |
K900 | I'd expect like, 2-3 days overhead per staging cycle from this, which is BAD | 15:44:47 |
K900 | If my estimates of single machine RISC-V perf are anywhere close to reality | 15:45:06 |
Alyssa Ross | Couldn't we just not block staging on RISC-V? | 15:45:28 |
K900 | We could run a separate jobset, but then the same thing could be done outside of official infra | 15:45:57 |
K900 | (and probably should be done outside official infra, at least initially) | 15:46:06 |
Vladimír Čunát | Trust is needed. People who have physical access to the HW can more-or-less put anything into the cache. | 15:53:05 |
Vladimír Čunát | (though not trivially at least) | 15:53:34 |
Tristan Ross | If the infra team doesn't have physical access but the hardware team does, is that acceptable for trust? | 15:53:55 |
K900 | Would not be great | 15:54:38 |
K900 | But also, again, I want to first see proof that there's enough hardware | 15:54:38 |
K900 | And that there's enough demand to justify the Hydra costs | 15:54:43 |
Tristan Ross | What sort of costs does it impose on Hydra? Is it just eval and storage? | 15:55:56 |
K900 | Queue runner is slow | 15:56:35 |
emily | cache for the entire jobset on a new platform is substantial (since we pay the whole cost every staging cycle) | 15:56:40 |
K900 | Queue runner slowness scales with number of jobs | 15:56:44 |
emily | and Hydra scheduling is a bottleneck | 15:56:53 |
K900 | Storage is also still questionable | 15:56:59 |
K900 | Eval is largely not an issue | 15:57:06 |
emily | I think it would make sense to prove this externally first | 15:57:12 |
emily | with a separate Hydra | 15:57:23 |
emily | and see if the hardware can even keep up | 15:57:33 |
Tristan Ross | Ok, I'll take this back to the HW team and discuss things more. | 15:57:42 |
emily | the bottleneck right now is x86_64-darwin which I suspect is substantially faster than the best RISC-V hardware | 15:58:10 |
dramforever | i think risc-v hydra for nixpkgs is "no" | 16:02:43 |