!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

271 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real-time alerts from Prometheus.



11 Oct 2024
K900 (@k900:0upti.me): Also, the EPYC is DDR5 and the Altra is DDR4, which may end up mattering for eval because eval is A LOT of pointer chasing [14:17:57]
Tristan Ross (@rosscomputerguy:matrix.org): From running Ampere, it does not have as good single-core performance as other systems I've seen [14:18:03]
K900 (@k900:0upti.me): It's not supposed to [14:18:10]
Tristan Ross (@rosscomputerguy:matrix.org): But its throughput is pretty good [14:18:14]
K900 (@k900:0upti.me): It's a many-small-cores design [14:18:14]
K900 (@k900:0upti.me): Like, this is going to depend on how well we can utilize SMT [14:20:06]
K900 (@k900:0upti.me): But I'd expect roughly similar MT perf [14:20:18]
K900 (@k900:0upti.me): With a pretty strong ST lead for the EPYC [14:20:23]
Tristan Ross (@rosscomputerguy:matrix.org): Gotcha, and the thermals would probably be similar [14:20:53]
K900 (@k900:0upti.me): Thermals, frankly, should not be our problem [14:21:59]
K900 (@k900:0upti.me): If Hetzner can't figure out a way to get us hardware that's not thermal throttling, we'll just have to do the math [14:22:32]
Tristan Ross (@rosscomputerguy:matrix.org): Yeah [14:22:57]
Mic92 (@joerg:thalheim.io), in reply to @hexa:lossy.network:
> bottlenecks:
>   • parallel compress slots (currently limited at 30, which seems reasonable in relation to the compute rhea has)
>   • eval memory, which we compensate with zram at 150%
>   • eval time, which is single-threaded and probably not fixable through hw upgrades
Eval is parallel in hydra [14:27:12]
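For context on the "zram at 150%" compensation mentioned above, a minimal sketch using NixOS's standard zramSwap module; the 150% figure comes from the message above, while the compression algorithm and everything else are illustrative assumptions, not rhea's actual configuration:

    # Sketch: zram-backed swap sized at 150% of physical RAM, as described above.
    # Option names are from the NixOS zramSwap module; values other than 150 are
    # placeholder assumptions, not the real infra config.
    { ... }:
    {
      zramSwap = {
        enable = true;
        memoryPercent = 150;   # swap space sized at 150% of RAM
        algorithm = "zstd";    # assumption; the deployed algorithm may differ
      };
    }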
hexa (signing key rotation when) (@hexa:lossy.network): it can be, but it is not on h.n.o [14:27:33]
Mic92 (@joerg:thalheim.io): Not enabled? [14:27:52]
hexa (signing key rotation when) (@hexa:lossy.network): evaling trunk-combined exceeds the available memory [14:27:58]
K900 (@k900:0upti.me): Single-threaded eval nearly OOMs the box [14:28:08]
K900 (@k900:0upti.me): And the way parallel eval works just makes it even worse [14:28:20]
Mic92 (@joerg:thalheim.io): Ok. Got it [14:28:22]
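For readers unfamiliar with what "parallel eval in hydra" refers to: Hydra's evaluator can run multiple eval workers, configured through hydra.conf. A rough NixOS-style sketch is below; the evaluator_workers and evaluator_max_memory_size keys are from Hydra's configuration docs as I recall them, and the numbers are placeholders rather than hydra.nixos.org's settings (which, per the thread, keep this disabled because trunk-combined would exceed available memory):

    # Hypothetical sketch of enabling Hydra's parallel evaluator on NixOS.
    # Verify key names and units against Hydra's configuration manual before
    # relying on this; values are placeholders.
    { ... }:
    {
      services.hydra.extraConfig = ''
        # number of parallel evaluation workers (placeholder value)
        evaluator_workers = 4
        # restart a worker once it exceeds roughly this memory budget
        # (placeholder value; see Hydra's docs for the exact unit)
        evaluator_max_memory_size = 8192
      '';
    }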
Mic92 (@joerg:thalheim.io): Is there a derivation that depends on all other derivations? [14:30:54]
Mic92 (@joerg:thalheim.io): Because this is not normal [14:31:07]
Mic92 (@joerg:thalheim.io): I am able to eval arbitrarily large package sets with nix-eval-jobs [14:33:33]
Mic92 (@joerg:thalheim.io): It will reclaim memory [14:33:45]
K900 (@k900:0upti.me), in reply to @joerg:thalheim.io:
> Is there a derivation that depends on all other derivations?
The tested job depends on A LOT of things [14:34:31]
K900 (@k900:0upti.me): I believe it is the primary bottleneck [14:34:37]
K900 (@k900:0upti.me): Because one of the things it depends on is like 200 VM tests [14:34:50]
Mic92 (@joerg:thalheim.io): Sure, but each of these should be able to eval independently [14:35:25]
Tristan Ross (@rosscomputerguy:matrix.org): Is it possible to predict memory usage of different derivations during eval based on the usage from previous evals, and then use that to queue parallel evals in a way that doesn't OOM? [14:35:36]
K900 (@k900:0upti.me): Theoretically, yes [14:35:49]
K900 (@k900:0upti.me): Practically, by the time you have tooling to do that, you can probably use the same tooling to just reduce eval requirements on the nixpkgs side [14:36:09]
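To illustrate the "each of these should be able to eval independently" point and why an aggregate job like "tested" is the exception: Hydra jobsets and nix-eval-jobs both consume a Nix expression returning an attribute set of derivations, and each attribute can be evaluated on its own, which is what lets an evaluator reclaim memory between jobs. A toy sketch follows; it is not the real nixpkgs release expression, whose "tested" job aggregates hundreds of VM tests.

    # Toy jobset in the shape Hydra / nix-eval-jobs consume: an attrset of
    # derivations, each evaluable independently. Purely illustrative.
    { nixpkgs ? import <nixpkgs> { } }:
    {
      hello = nixpkgs.hello;
      jq = nixpkgs.jq;

      # An aggregate-style job, similar in spirit to nixpkgs' "tested": it
      # forces every constituent during a single evaluation, which is the
      # bottleneck described in the thread above.
      tested = nixpkgs.releaseTools.aggregate {
        name = "toy-tested";
        constituents = [ nixpkgs.hello nixpkgs.jq ];
      };
    }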
