11 Oct 2024 |
K900 | And I genuinely don't see why we need to go aarch64 instead of just upgrading to a beefier and/or better cooled x86 | 14:12:13 |
K900 | Hydra needs throughput, not latency, so it won't really care if we have many small cores or few big cores | 14:12:58 |
Tristan Ross | I'm just thinking in cost versus performance, if we're able to get more performance at a lower cost then wouldn't that be better than spending on a beefier expensive but similar performing system? | 14:14:29 |
K900 | Depends on how much the cost difference is | 14:14:55 |
hexa (signing key rotation when) |
- current: AX101, 5950X (16C/32T @ 3.4 GHz Base Clock), 128 GB RAM (~106 EUR/mo)
- alternatives:
  - AX162-R, Epyc 9454P (48C/96T @ 2.75 GHz Base Clock), 256 GB RAM (~241 EUR/mo)
  - RX220, Altra Q80-30 (80C @ 3.0 GHz Base Clock), 256 GB RAM (~260 EUR/mo)
| 14:15:50 |
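(Editor's sketch, not part of the chat: the cost-versus-performance question can be made a bit more concrete by normalising the approximate monthly prices quoted above by cores, hardware threads, and RAM. The `Server` class and its method names are hypothetical, introduced only for illustration. The Altra is assumed to have no SMT, so its thread count equals its core count. The figures say nothing about per-core speed, which is what the rest of the discussion turns on.)

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical container for the specs quoted in the chat."""
    name: str
    cores: int
    threads: int          # hardware threads (cores * 2 with SMT)
    ram_gb: int
    eur_per_month: float  # approximate price as quoted

    def eur_per_core(self) -> float:
        return self.eur_per_month / self.cores

    def eur_per_thread(self) -> float:
        return self.eur_per_month / self.threads

    def eur_per_gb_ram(self) -> float:
        return self.eur_per_month / self.ram_gb


# Numbers copied from the message above; the Altra thread count is an
# assumption (no SMT, so threads == cores).
SERVERS = [
    Server("AX101 (5950X)", cores=16, threads=32, ram_gb=128, eur_per_month=106),
    Server("AX162-R (Epyc 9454P)", cores=48, threads=96, ram_gb=256, eur_per_month=241),
    Server("RX220 (Altra Q80-30)", cores=80, threads=80, ram_gb=256, eur_per_month=260),
]


def summarize(servers):
    """Return one (name, EUR/core, EUR/thread, EUR/GB) tuple per server."""
    return [
        (s.name, s.eur_per_core(), s.eur_per_thread(), s.eur_per_gb_ram())
        for s in servers
    ]


if __name__ == "__main__":
    for name, per_core, per_thread, per_gb in summarize(SERVERS):
        print(f"{name:24s} {per_core:6.2f} EUR/core "
              f"{per_thread:6.2f} EUR/thread {per_gb:5.2f} EUR/GB")
```

On these numbers alone the Altra is cheapest per physical core and the Epyc is cheapest per hardware thread, so the comparison hinges on how much each core or thread actually delivers for Hydra's workload.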
hexa (signing key rotation when) | * bottlenecks:
- parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
- eval memory, which we compensate with zram at 150%
- eval time, which is single-threaded and probably not fixable through hw upgrades
| 14:16:15 |
K900 | Yeah I was going to say that Hetzner doesn't really have cheap ARM | 14:16:25 |
hexa (signing key rotation when) | (copied from an infra team discussion) | 14:16:34 |
hexa (signing key rotation when) | our feeling was that the epyc has stronger single core perf than the altra | 14:16:54 |
K900 | (notably, the Amperes are locked at 3GHz, and the 9454P can boost to ~3.8) | 14:17:13 |
K900 | So we're looking at very similar all core throughput | 14:17:31 |
K900 | Also, the EPYC is DDR5 and the Altra is DDR4, which may end up mattering for eval because eval is A LOT of pointer chasing | 14:17:57 |
Tristan Ross | From running Ampere, it does not have as good single core performance as other systems I've seen | 14:18:03 |
K900 | It's not supposed to | 14:18:10 |
Tristan Ross | But its throughput is pretty good | 14:18:14 |
K900 | It's a many small cores design | 14:18:14 |
K900 | Like, this is going to depend on how well we can utilize SMT | 14:20:06 |
K900 | But I'd expect roughly similar MT perf | 14:20:18 |
K900 | With a pretty strong ST lead for the Epyc | 14:20:23 |
Tristan Ross | Gotcha, and the thermals would probably be similar | 14:20:53 |
K900 | Thermals, frankly, should not be our problem | 14:21:59 |
K900 | If Hetzner can't figure out a way to get us hardware that's not thermal throttling, we'll just have to do the math | 14:22:32 |
Tristan Ross | Yeah | 14:22:57 |
Mic92 | In reply to @hexa:lossy.network
bottlenecks:
- parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
- eval memory, which we compensate with zram at 150%
- eval time, which is single-threaded and probably not fixable through hw upgrades
Eval is parallel in hydra | 14:27:12 |
hexa (signing key rotation when) | it can be, but it is not on h.n.o | 14:27:33 |
Mic92 | Not enabled? | 14:27:52 |