!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

271 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
86 Servers

Sender | Message | Time
11 Oct 2024
@rosscomputerguy:matrix.org Tristan Ross
In reply to @emilazy:matrix.org
wouldn't that run into the atomics and platform purity problems wrt the evaluator?
@tomberek:matrix.org and I were discussing this a bit last night, and we're not entirely sure atomics are an actual problem. How does it affect Hydra? Wouldn't this be an issue with the C++ compiler? From what I've heard, Hydra appears to run fine on aarch64-linux. As for the purity thing, as long as the system is passed through and handled as expected, it shouldn't be a concern?
13:40:41
@vcunat:matrix.org vcunat
Tristan Ross: I believe the point was around x86 being strict around reordering of instructions while ARM is not. On language level you then need to be careful around https://en.cppreference.com/w/cpp/atomic/memory_order
14:02:11
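To make the memory-ordering concern concrete, here is a minimal sketch of the release/acquire pattern described on the linked cppreference page. The `Mailbox` type and the function names are hypothetical and are not taken from hydra-queue-runner. If both atomic operations used `std::memory_order_relaxed` instead, the code would be incorrect on every platform, but it would often appear to work on x86, where the hardware does not reorder stores with other stores or loads with other loads, and could fail on AArch64, where such reordering is allowed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for illustration only.
struct Mailbox {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// Producer: write the data first, then publish it with a release store.
// The release store keeps the payload write from moving after the flag write.
void produce(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Consumer: wait for the flag with an acquire load, then read the data.
// The acquire load keeps the payload read from moving before the flag read.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return box.payload;
}

// Runs one producer/consumer round trip and returns what the consumer saw.
int round_trip(int value) {
    Mailbox box;
    int seen = 0;
    std::thread consumer([&] { seen = consume(box); });
    std::thread producer([&] { produce(box, value); });
    producer.join();
    consumer.join();
    return seen;
}
```

With the release/acquire pair in place, `round_trip(42)` returns 42 on both x86 and AArch64. A bug of the relaxed kind usually only shows up under load on the weaker architecture, which matches the later remark that there is no good way to test for it other than running the code.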
@rosscomputerguy:matrix.org Tristan Ross
In reply to @vcunat:matrix.org
Wouldn't this affect the nix cli itself and literally everything?
14:08:29
@k900:0upti.me K900
No
14:08:42
@k900:0upti.me K900
hydra-queue-runner is like 3k lines of C++
14:09:00
@k900:0upti.me K900
On top of the normal Nix things
14:09:06
@k900:0upti.me K900
It's those bits I'm worried about, not Nix
14:09:15
@rosscomputerguy:matrix.org Tristan Ross
Oh, is there a way to test the queue runner so that this kind of breakage gets triggered on aarch64?
14:10:29
@k900:0upti.me K900
Not really without just running it
14:10:49
@k900:0upti.me K900
Realistically this is not a big problem
14:11:00
@k900:0upti.me K900
It can be tested and fixed
14:11:03
@k900:0upti.me K900
Probably in a reasonable amount of time
14:11:09
@k900:0upti.me K900
It's just another thing to be aware of if migrating to aarch64
14:11:29
@k900:0upti.me K900
And I genuinely don't see why we need to go aarch64 instead of just upgrading to a beefier and/or better cooled x86
14:12:13
@k900:0upti.me K900
Hydra needs throughput, not latency, so it won't really care if we have many small cores or few big cores
14:12:58
@rosscomputerguy:matrix.org Tristan Ross
I'm just thinking in terms of cost versus performance: if we're able to get more performance at a lower cost, wouldn't that be better than spending on a beefier, expensive system with similar performance?
14:14:29
@k900:0upti.me K900
Depends on how much the cost difference is
14:14:55
@hexa:lossy.network hexa (signing key rotation when)
  • current: AX101, 5950X (16C/32T @ 3.4 GHz Base Clock), 128 GB RAM (~106 EUR/mo)
  • alternatives:
    • AX162-R, Epyc 9454P (48C/96T @ 2.75 GHz Base Clock), 256 GB RAM (~241 EUR/mo)
    • RX220, Altra Q80-30 (80C @ 3.0 GHz Base Clock), 256 GB RAM (~260 EUR/mo)
14:15:50
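The cost question raised above can be put into rough numbers from the three offers just listed. The following is only a back-of-the-envelope sketch: "core-GHz" (cores times base clock) is an assumed, deliberately crude proxy for throughput that ignores SMT (the two x86 parts run two threads per core, the Altra does not), boost clocks, per-clock performance, and memory speed. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// One server offer, with the figures quoted in the message above.
struct Server {
    std::string name;
    int cores;
    double base_ghz;
    int ram_gb;
    double eur_per_month;
};

// Crude throughput proxy: physical cores times base clock.
double aggregate_base_ghz(const Server& s) {
    return s.cores * s.base_ghz;
}

// Monthly price per unit of that proxy (lower is cheaper).
double eur_per_core_ghz(const Server& s) {
    return s.eur_per_month / aggregate_base_ghz(s);
}

// The current machine followed by the two alternatives.
std::vector<Server> candidates() {
    return {
        {"AX101 (5950X)", 16, 3.4, 128, 106.0},
        {"AX162-R (Epyc 9454P)", 48, 2.75, 256, 241.0},
        {"RX220 (Altra Q80-30)", 80, 3.0, 256, 260.0},
    };
}

// Prints one comparison row per server.
void print_comparison() {
    for (const Server& s : candidates()) {
        std::printf("%-22s %6.1f core-GHz  %5.2f EUR per core-GHz\n",
                    s.name.c_str(), aggregate_base_ghz(s), eur_per_core_ghz(s));
    }
}
```

By this proxy the three machines come out at about 1.95, 1.83, and 1.08 EUR per core-GHz per month respectively. That says nothing about single-threaded evaluation, so it should be read alongside the bottlenecks discussed in this thread rather than as a ranking.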
@hexa:lossy.network hexa (signing key rotation when)
bottlenecks:
  • parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
  • eval memory, which we compensate with zram at 150%
  • eval time, which is single-threaded and probably not fixable through hw upgrades
14:16:15
@k900:0upti.me K900
Yeah I was going to say that Hetzner doesn't really have cheap ARM
14:16:25
@hexa:lossy.network hexa (signing key rotation when)
(copied from an infra team discussion)
14:16:34
@hexa:lossy.network hexa (signing key rotation when)
our feeling was that the epyc has stronger single core perf than the altra
14:16:54
@k900:0upti.me K900
(notably, the Amperes are locked at 3GHz, and the 9454P can boost to ~3.8)
14:17:13
@k900:0upti.me K900
So we're looking at very similar all core throughput
14:17:31
@k900:0upti.me K900
Also, the EPYC is DDR5 and the Altra is DDR4, which may end up mattering for eval because eval is A LOT of pointer chasing
14:17:57
@rosscomputerguy:matrix.org Tristan Ross
From running Ampere, it does not have as good single core performance as other systems I've seen
14:18:03
@k900:0upti.me K900
It's not supposed to
14:18:10
@rosscomputerguy:matrix.org Tristan Ross
But its throughput is pretty good
14:18:14
@k900:0upti.me K900
It's a many small cores design
14:18:14
