!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

271 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
86 Servers


Sender | Message | Time
11 Oct 2024
@k900:0upti.me K900 | Not really without just running it | 14:10:49
@k900:0upti.me K900 | Realistically this is not a big problem | 14:11:00
@k900:0upti.me K900 | It can be tested and fixed | 14:11:03
@k900:0upti.me K900 | Probably in a reasonable amount of time | 14:11:09
@k900:0upti.me K900 | It's just another thing to be aware of if migrating to aarch64 | 14:11:29
@k900:0upti.me K900 | And I genuinely don't see why we need to go aarch64 instead of just upgrading to a beefier and/or better cooled x86 | 14:12:13
@k900:0upti.me K900 | Hydra needs throughput, not latency, so it won't really care if we have many small cores or few big cores | 14:12:58
@rosscomputerguy:matrix.org Tristan Ross | I'm just thinking in cost versus performance, if we're able to get more performance at a lower cost then wouldn't that be better than spending on a beefier expensive but similar performing system? | 14:14:29
@k900:0upti.me K900 | Depends on how much the cost difference is | 14:14:55
@hexa:lossy.network hexa (signing key rotation when)
  • current: AX101, 5950X (16C/32T @ 3.4 GHz Base Clock), 128 GB RAM (~106 EUR/mo)
  • alternatives:
    • AX162-R, Epyc 9454P (48C/96T @ 2.75 GHz Base Clock), 256 GB RAM (~241 EUR/mo)
    • RX220, Altra Q80-30 (80C @ 3.0 GHz Base Clock), 256 GB RAM (~260 EUR/mo)
14:15:50
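
As a rough aid to the cost-versus-performance question, the prices listed above can be normalised per core, per hardware thread, and per GB of RAM. This is only a sketch: the `Server` class and its field names are hypothetical, the figures are the approximate monthly prices quoted in the message, and the Altra is assumed to run one thread per core because no thread count is given for it. It says nothing about how fast each core is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical record for one of the quoted Hetzner offers."""
    name: str
    cores: int
    threads: int
    ram_gb: int
    eur_per_month: float  # approximate price as quoted in the chat

    @property
    def eur_per_core(self) -> float:
        return self.eur_per_month / self.cores

    @property
    def eur_per_thread(self) -> float:
        return self.eur_per_month / self.threads

    @property
    def eur_per_gb(self) -> float:
        return self.eur_per_month / self.ram_gb


SERVERS = [
    Server("AX101 (5950X)", cores=16, threads=32, ram_gb=128, eur_per_month=106.0),
    Server("AX162-R (Epyc 9454P)", cores=48, threads=96, ram_gb=256, eur_per_month=241.0),
    # Assumption: one thread per core, since the message lists only "80C".
    Server("RX220 (Altra Q80-30)", cores=80, threads=80, ram_gb=256, eur_per_month=260.0),
]

for s in SERVERS:
    print(
        f"{s.name:<22} "
        f"{s.eur_per_core:6.2f} EUR/core  "
        f"{s.eur_per_thread:6.2f} EUR/thread  "
        f"{s.eur_per_gb:5.2f} EUR/GB"
    )
```

On these figures the Altra is cheapest per core (about 3.25 EUR), the Epyc is cheapest per hardware thread (about 2.51 EUR), and the current AX101 is cheapest per GB of RAM (about 0.83 EUR).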
@hexa:lossy.network hexa (signing key rotation when) *

bottlenecks:

  • parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
  • eval memory, which we compensate with zram at 150%
  • eval time, which is single-threaded and probably not fixable through hw upgrades
14:16:15
@k900:0upti.me K900 | Yeah I was going to say that Hetzner doesn't really have cheap ARM | 14:16:25
@hexa:lossy.network hexa (signing key rotation when) | (copied from an infra team discussion) | 14:16:34
@hexa:lossy.network hexa (signing key rotation when) | our feeling was that the epyc has stronger single core perf than the altra | 14:16:54
@k900:0upti.me K900 | (notably, the Amperes are locked at 3GHz, and the 9454P can boost to ~3.8) | 14:17:13
@k900:0upti.me K900 | So we're looking at very similar all core throughput | 14:17:31
@k900:0upti.me K900 | Also, the EPYC is DDR5 and the Altra is DDR4, which may end up mattering for eval because eval is A LOT of pointer chasing | 14:17:57
@rosscomputerguy:matrix.org Tristan Ross | From running Ampere, it does not have as good single core performance as other systems I've seen | 14:18:03
@k900:0upti.me K900 | It's not supposed to | 14:18:10
@rosscomputerguy:matrix.org Tristan Ross | But its throughput is pretty good | 14:18:14
@k900:0upti.me K900 | It's a many small cores design | 14:18:14
@k900:0upti.me K900 | Like, this is going to depend on how well we can utilize SMT | 14:20:06
@k900:0upti.me K900 | But I'd expect roughly similar MT perf | 14:20:18
@k900:0upti.me K900 | With a pretty strong ST lead for the Epyc | 14:20:23
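
The dependence on SMT mentioned above can be made concrete with a very crude "core-GHz" tally. This is a back-of-the-envelope sketch, not a benchmark: `aggregate_ghz` and the SMT uplift values are hypothetical, the clocks are the ones quoted in the chat, and the model ignores per-clock differences between the two architectures, memory bandwidth, and whether the Epyc can sustain its boost clock on all cores.

```python
def aggregate_ghz(cores: int, clock_ghz: float, smt_uplift: float = 0.0) -> float:
    """Crude throughput proxy: cores x clock, scaled by an assumed SMT gain.

    smt_uplift is a hypothetical fraction (0.3 means SMT adds 30% throughput).
    """
    return cores * clock_ghz * (1.0 + smt_uplift)


# Altra Q80-30: 80 cores at 3.0 GHz, assumed to have no SMT.
altra = aggregate_ghz(80, 3.0)
print(f"Altra Q80-30: {altra:.1f} core-GHz")

# Epyc 9454P: 48 cores, 2.75 GHz base, ~3.8 GHz boost as quoted in the chat.
for uplift in (0.0, 0.15, 0.30):  # assumed SMT gains, for illustration only
    at_base = aggregate_ghz(48, 2.75, uplift)
    at_boost = aggregate_ghz(48, 3.8, uplift)
    print(
        f"Epyc 9454P, SMT uplift {uplift:.0%}: "
        f"{at_base:.1f} core-GHz at base, {at_boost:.1f} at boost"
    )
```

Under this proxy the Altra's 240 core-GHz sits above the Epyc's 182.4 at the quoted boost clock with no SMT gain, and an assumed SMT uplift of around 30% would bring the Epyc to roughly 237. How close the two really are depends on how much work each core does per clock, which this tally does not capture.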
@rosscomputerguy:matrix.org Tristan Ross | Gotcha, and the thermals would probably be similar | 14:20:53
@k900:0upti.me K900 | Thermals, frankly, should not be our problem | 14:21:59
@k900:0upti.me K900 | If Hetzner can't figure out a way to get us hardware that's not thermal throttling, we'll just have to do the math | 14:22:32
@rosscomputerguy:matrix.org Tristan Ross | Yeah | 14:22:57
@joerg:thalheim.io Mic92
In reply to @hexa:lossy.network

bottlenecks:

  • parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
  • eval memory, which we compensate with zram at 150%
  • eval time, which is single-threaded and probably not fixable through hw upgrades
Eval is parallel in hydra
14:27:12


Room Version: 6