11 Oct 2024 |
Vladimír Čunát | Tristan Ross: I believe the point was about x86 being strict about reordering of memory accesses while ARM is not. At the language level you then need to be careful about https://en.cppreference.com/w/cpp/atomic/memory_order | 14:02:11 |
Tristan Ross | In reply to @vcunat:matrix.org: Wouldn't this affect the nix cli itself and literally everything? | 14:08:29 |
K900 | No | 14:08:42 |
K900 | hydra-queue-runner is like 3k lines of C++ | 14:09:00 |
K900 | On top of the normal Nix things | 14:09:06 |
K900 | It's those bits I'm worried about, not Nix | 14:09:15 |
Tristan Ross | Oh, is there a way to test the queue runner in a way to trigger it breaking because of this on aarch64? | 14:10:29 |
K900 | Not really without just running it | 14:10:49 |
K900 | Realistically this is not a big problem | 14:11:00 |
K900 | It can be tested and fixed | 14:11:03 |
K900 | Probably in a reasonable amount of time | 14:11:09 |
K900 | It's just another thing to be aware of if migrating to aarch64 | 14:11:29 |
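To make the memory-ordering concern above concrete, here is a minimal C++ sketch of the kind of hand-off that needs explicit ordering. It is not taken from hydra-queue-runner; `hand_off`, `payload`, and `ready` are hypothetical names chosen for illustration. The release store paired with the acquire load is what guarantees the consumer sees the payload. If both were weakened to `std::memory_order_relaxed`, the read of `payload` would be a data race: it often still appears to work on x86 because of its stronger hardware ordering, but it is allowed to fail, and is more likely to do so on aarch64.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper, for illustration only: passes `value` from a producer
// thread to a consumer thread through a plain int guarded by an atomic flag.
int hand_off(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = value;
        // release: the write to payload cannot be reordered after this store
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // acquire: once this load observes true, the payload write is visible
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();

    // With the release/acquire pair above, this assertion cannot fire.
    assert(seen == value);
    return seen;
}
```

Bugs of this kind are timing-dependent, which fits the remark above that the practical way to find them is to run the code on the target hardware.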
K900 | And I genuinely don't see why we need to go aarch64 instead of just upgrading to a beefier and/or better cooled x86 | 14:12:13 |
K900 | Hydra needs throughput, not latency, so it won't really care if we have many small cores or few big cores | 14:12:58 |
Tristan Ross | I'm just thinking in terms of cost versus performance: if we're able to get more performance at a lower cost, wouldn't that be better than spending on a beefier, more expensive, but similarly performing system? | 14:14:29 |
K900 | Depends on how much the cost difference is | 14:14:55 |
hexa |
- current: AX101, 5950X (16C/32T @ 3.4 GHz Base Clock), 128 GB RAM (~106 EUR/mo)
- alternatives:
- AX162-R, Epyc 9454P (48C/96T @ 2.75 GHz Base Clock), 256 GB RAM (~241 EUR/mo)
- RX220, Altra Q80-30 (80C @ 3.0 GHz Base Clock), 256 GB RAM (~260 EUR/mo)
| 14:15:50 |
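As a rough way to compare the offers quoted above, the following C++ sketch computes monthly cost per core and per hardware thread. The core counts, thread counts, and prices are copied from the list. The `Offer` struct and function names are hypothetical, and treating the Altra as 80 threads assumes it has no SMT, as the "80C" in the listing suggests. This is a naive metric: it ignores per-core performance, boost clocks, and memory generation.

```cpp
#include <string>
#include <vector>

// Hypothetical record type for one server offer from the list above.
struct Offer {
    std::string name;
    int cores;             // physical cores
    int threads;           // hardware threads (equal to cores without SMT)
    double eur_per_month;  // approximate monthly price in EUR
};

// Monthly cost divided by physical cores.
double eur_per_core(const Offer& o) { return o.eur_per_month / o.cores; }

// Monthly cost divided by hardware threads.
double eur_per_thread(const Offer& o) { return o.eur_per_month / o.threads; }

// The three machines quoted in the discussion, with the approximate prices given there.
std::vector<Offer> quoted_offers() {
    return {
        {"AX101 (5950X)", 16, 32, 106.0},
        {"AX162-R (Epyc 9454P)", 48, 96, 241.0},
        {"RX220 (Altra Q80-30)", 80, 80, 260.0},
    };
}
```

With these numbers, the cost per core comes out at about 6.63, 5.02, and 3.25 EUR/mo respectively, and the cost per hardware thread at about 3.31, 2.51, and 3.25 EUR/mo, so which option looks cheaper depends on how much a second SMT thread is worth for the workload.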
hexa | * bottlenecks:
- parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
- eval memory, which we compensate with zram at 150%
- eval time, which is single-threaded and probably not fixable through hw upgrades
| 14:16:15 |
K900 | Yeah I was going to say that Hetzner doesn't really have cheap ARM | 14:16:25 |
hexa | (copied from an infra team discussion) | 14:16:34 |
hexa | our feeling was that the epyc has stronger single core perf than the altra | 14:16:54 |
K900 | (notably, the Amperes are locked at 3GHz, and the 9454P can boost to ~3.8) | 14:17:13 |
K900 | So we're looking at very similar all core throughput | 14:17:31 |
K900 | Also, the EPYC is DDR5 and the Altra is DDR4, which may end up mattering for eval because eval is A LOT of pointer chasing | 14:17:57 |
Tristan Ross | From running Ampere, it does not have as good single core performance as other systems I've seen | 14:18:03 |
K900 | It's not supposed to | 14:18:10 |
Tristan Ross | But its throughput is pretty good | 14:18:14 |
K900 | It's a many small cores design | 14:18:14 |
K900 | Like, this is going to depend on how well we can utilize SMT | 14:20:06 |