| 14 Oct 2025 |
emily | Hydra builds stdenv on staging, so by the time we do staging-next that part is frequently already done. After stdenv, it fans out pretty quickly | 17:12:29 |
emily | once you have gotten through e.g. Rust, LLVM, CMake, Meson you are off to the races and will never want for jobs. | 17:12:45 |
Vladimír Čunát | Most x86_64-linux jobs are only built with -j2 on Hydra.nixos.org currently. | 17:13:00 |
Vladimír Čunát | We primarily scale by running many jobs concurrently. | 17:13:16 |
Sami Liedes | I think in my case they would, but that probably happens because I have erred on the high side of average load. The point is that the serialized configure processes also get seriously throttled (they don't get a full core), whereas ideally the serialized stages of derivations would get the CPU they need, which I believe could be achieved by balancing the builders/derivations instead of individual processes. | 17:13:48 |
K900 | If you're saturating the CPU, it doesn't matter what you're saturating it with | 17:14:22 |
K900 | You'll have the same total build time anyway | 17:14:33 |
Vladimír Čunát | It could matter. | 17:14:49 |
Sami Liedes | Sure it does. You finish fastest if you allocate max cpu to the critical path. | 17:14:56 |
Vladimír Čunát | * It could matter. I mean, it could affect CPU saturation in future. | 17:15:09 |
Sami Liedes | And I see that happen in practice with my builds: some serialized configure gets throttled while lots of other derivations are building, then the builder runs out of derivations and waits until that configure is finished, and then parallelism explodes again. | 17:15:45 |
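The pattern Sami describes can be illustrated with a toy processor-sharing model (a sketch of my own, not code from the discussion; all parameters are made up): a serial 10-unit "configure" chain plus twenty 1-unit parallel jobs on 4 CPUs consume the same total CPU-work either way, yet the wall-clock time depends on whether the chain is throttled.

```python
# Toy processor-sharing model (illustrative; parameters are made up).
# One serial "configure" chain competes with many short parallel jobs.

def makespan(chain_work, filler_works, ncpu, pin_chain, dt=0.001):
    """Return total time until the chain and all fillers finish.

    pin_chain=True  -> the chain always gets a full CPU (critical path first).
    pin_chain=False -> every runnable process gets an equal CPU share.
    """
    chain = chain_work
    fillers = list(filler_works)
    t = 0.0
    eps = 1e-9
    while chain > eps or fillers:
        if pin_chain and chain > eps:
            chain -= dt                      # chain owns one whole CPU
            spare = ncpu - 1
            share = min(1.0, spare / len(fillers)) if fillers else 0.0
        else:
            runnable = (1 if chain > eps else 0) + len(fillers)
            share = min(1.0, ncpu / runnable)
            if chain > eps:
                chain -= share * dt          # chain throttled like everyone else
        fillers = [w - share * dt for w in fillers]
        fillers = [w for w in fillers if w > eps]
        t += dt
    return t

# Same total CPU-work (30 units) in both runs, different wall-clock time:
fair = makespan(10.0, [1.0] * 20, ncpu=4, pin_chain=False)   # ~14.25
pinned = makespan(10.0, [1.0] * 20, ncpu=4, pin_chain=True)  # ~10.0
print(f"fair sharing: {fair:.2f}, chain pinned: {pinned:.2f}")
```

Under fair sharing the fillers finish around t≈5.25 and the remaining ~9 units of the chain then run nearly alone, giving a makespan near 14.25; pinning the chain to a full core finishes everything by t≈10. This matches K900's point too: the distinction only matters once you run out of parallel work, which Hydra rarely does.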
Vladimír Čunát | Hydra has so much work that critical path matters little in practice. | 17:16:03 |
Vladimír Čunát | (Hundreds of thousands of jobs.) | 17:16:28 |
K900 | At Hydra scale the critical path finishes long before we're out of work in pretty much all cases | 17:17:07 |
Sami Liedes | Right. That I can imagine (and it's what I asked about :). So building more than e.g. a single architecture of a single channel presumably helps a lot. | 17:17:06 |
Vladimír Čunát | No, that's not it. | 17:17:29 |
Vladimír Čunát | Architectures mostly don't share hardware. | 17:17:45 |
K900 | Hydra mostly always has overlapping loads | 17:17:56 |
Vladimír Čunát | (only *-darwin in our case) | 17:17:55 |
Sami Liedes | That makes sense. | 17:18:04 |
Vladimír Čunát | We always have at least unstable and stable. And staging-* corresponding to those two. Lots of work. And sometimes extra stuff like testing glibc upgrade. | 17:19:00 |
| Sami Liedes | What kinds of loads then? I have used building pkgs.freecad from an empty store as my test case. I think this might be less of a problem if I reduced my cores/parallelism so that my load average didn't climb to 100-200 on this 16-core (32-thread) machine. | 17:19:02 |
K900 | You definitely should do that, yes | 17:19:27 |
K900 | You want to overcommit maybe 2x, but not 10x | 17:19:48 |
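K900's 2× rule of thumb can be sketched in nix.conf terms; the numbers below are illustrative for a 16-core/32-thread machine like Sami's, not settings from the discussion:

```
# /etc/nix/nix.conf -- illustrative values, assuming 32 hardware threads
max-jobs = 8   # derivations built concurrently
cores = 8      # per-build parallelism (exported to builds as NIX_BUILD_CORES)
# worst-case process count ~= max-jobs * cores = 64, about 2x the threads
```

The worst case is rare in practice (many build phases are serial, as the binutils-configure complaint below illustrates), which is why some overcommit is usually safe.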
Sami Liedes | Or maybe it's something else weird I'm doing, but I feel I see too many builds of binutils, which seem to configure very slowly and serially :-) | 17:20:06 |
Vladimír Čunát | That's a pain point. But less so on Hydra scale. | 17:20:10 |
Vladimír Čunát | (which is primarily about throughput) | 17:20:30 |
Sami Liedes | I actually also hacked on patches to make Nix use systemd cgroups, but I should figure out how to test it without risking my workstation... | 17:20:55 |
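For reference, recent Nix releases ship native (still experimental) cgroup support, which may overlap with what such patches do; and a throwaway NixOS VM (e.g. via `nixos-rebuild build-vm`) is one way to test daemon changes without risking the workstation. A sketch, assuming Nix ≥ 2.12 on a cgroups-v2 system; verify the option names against your Nix version's manual:

```
# /etc/nix/nix.conf -- experimental feature, check your Nix version first
experimental-features = nix-command cgroups
use-cgroups = true
```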
Sami Liedes | Good to know that Hydra doesn't suffer from this :) | 17:21:25 |
Sami Liedes | Just wanted to know if it's only my small scale! | 17:21:35 |