| 30 Aug 2021 |
Vladimír Čunát | * (i.e. the jobset just gets checked and merged within a couple of days) | 19:17:50 |
Vladimír Čunát | Ah, well, I didn't mean this iteration speed :-) Maybe it now consumes quite a large fraction of Hydra's resources. | 19:18:21 |
hexa | pretty sure the cycle is bi-weekly now | 19:18:27 |
sterni | it's at least bi-weekly, it really depends on how many regressions there are to fix | 19:18:46 |
sterni | i. e. I have merged the branch within three days a couple of times before | 19:19:07 |
Vladimír Čunát | When you have multiple mass rebuilds, it doesn't make sense for any pair to have a similar amount of shares, especially if they target the same branch and their combination creates yet another mass rebuild. You basically want a priority queue instead. | 19:19:39 |
Vladimír Čunát | (at least as long as the rebuild resources are relatively scarce) | 19:20:01 |
Vladimír Čunát | Now of course the contention is who gets more priority :-) | 19:20:30 |
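Vladimír's suggestion above, that competing mass rebuilds are better served by a priority queue than by roughly equal shares, can be sketched as strict priority dispatch. This is a minimal illustration under assumptions: the jobset names, the numeric priorities and the `enqueue`/`dispatch` helpers are hypothetical and are not part of Hydra.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedStep:
    # Only sort_key takes part in comparisons; heapq pops the smallest key first.
    sort_key: tuple
    jobset: str = field(compare=False)
    drv: str = field(compare=False)


# Hypothetical global sequence number, used to keep FIFO order among equal priorities.
_counter = itertools.count()


def enqueue(queue: list, priority: int, jobset: str, drv: str) -> None:
    # Negate the priority so that a higher priority yields a smaller key.
    heapq.heappush(queue, QueuedStep((-priority, next(_counter)), jobset, drv))


def dispatch(queue: list, free_slots: int) -> list:
    """Start up to free_slots steps, strictly highest priority first."""
    started = []
    while queue and len(started) < free_slots:
        started.append(heapq.heappop(queue))
    return started


if __name__ == "__main__":
    q: list = []
    for i in range(3):
        enqueue(q, 5, "rebuild-b", f"b{i}.drv")   # hypothetical lower-priority rebuild
        enqueue(q, 10, "rebuild-a", f"a{i}.drv")  # hypothetical higher-priority rebuild
    for step in dispatch(q, 4):
        print(step.jobset, step.drv)
```

With two rebuilds queued, the higher-priority one fills all free slots before the other gets any. The trade-off is that a low-priority rebuild can starve for as long as higher-priority work keeps arriving, which is exactly the "who gets more priority" contention.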
sterni | in my experience, the number of scheduled jobs and how long they have been scheduled matter more than scheduling shares anyway | 19:20:36 |
sterni | haskell-updates quite often gets into a situation where nothing is built for days even though it probably has the highest scheduling shares on hydra atm | 19:21:09 |
Vladimír Čunát | Ah, yes... I've heard that already. And I also noticed myself that sometimes the scheduler appears to act weirdly. | 19:22:07 |
hexa | maybe x86_64-darwin related? 🤔 | 19:22:08 |
sterni | as far as I understand it, Hydra tries to balance build time, so at some point, if you have built too much, you just don't get anything anymore | 19:22:17 |
sterni | and yeah, we often get stuck on x86_64-darwin, but sometimes also on aarch64-linux; not sure what that is about | 19:22:41 |
Vladimír Čunát | Yes, another possibility is that you've drained the shares and that's why you didn't get much for some time. | 19:23:01 |
Vladimír Čunát | Also note that stable release jobsets have an even much higher amount of shares than haskell. | 19:24:02 |
sterni | I think the high share count for haskell-updates could also stem from the fact that, in the past, a full rebuild wasn't that many jobs (<5000) | 19:40:11 |
sterni | since we now support more platforms and have more working packages in general, we have a more respectable scheduled job count on a full rebuild which is probably also beneficial for getting builds scheduled | 19:41:02 |
| 31 Aug 2021 |
andi- | In reply to @vcunat:matrix.org I lowered them now. nixos-unstable is still waiting for the first bump with new openssl. Anyone able to up the shares again? We've tons of idle x86_64 capacity but due to the 100 shares limit none of the jobs are actually being scheduled there... At this rate it'll take way over a week what usually takes ~8h. There are also 37k jobs in the queue. Not sure why they aren't being picked up. Probably also out of shares on those jobsets? | 23:35:54 |
| 1 Sep 2021 |
hexa | (this is about the systemd-v249 jobset) | 00:54:08 |
Vladimír Čunát | staging-next-21.05 (i.e. the first 21.05 with secure openssl) has over 30k queued x86_64-linux jobs. So I can't see what you mean about the idle capacity. | 04:30:59 |
Vladimír Čunát | I also thought the shares only affect relative priorities, i.e. if there's free capacity, I expect the matching jobs to get scheduled regardless of shares. | 04:32:21 |
Vladimír Čunát | (OK, the "runnable" metric is better for judging the scheduling at a particular moment, but even that one's in thousands.) | 04:36:47 |
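The share behaviour discussed above (a jobset that has "drained" its shares getting little for a while, yet shares not being expected to leave free machines idle) can be illustrated with a simplified model. The sketch assumes a Hydra-like rule in which each jobset's usage is its recent build time divided by its shares, and runnable steps are ordered by that ratio; the classes, field names and numbers are hypothetical, and this is not Hydra's queue-runner code.

```python
from dataclasses import dataclass


@dataclass
class Jobset:
    name: str
    shares: int
    seconds_used: float  # assumed: build time consumed in some recent accounting window

    def share_used(self) -> float:
        # Lower value means the jobset has used less time per share so far.
        return self.seconds_used / self.shares


@dataclass
class Step:
    jobset: Jobset
    drv: str


def pick_steps(runnable: list, free_slots: int) -> list:
    """Order runnable steps by their jobset's share usage and fill the free slots.

    Shares only decide the order: if there are at least as many free slots as
    runnable steps, every runnable step is started regardless of shares.
    sorted() is stable, so steps of the same jobset keep their input order.
    """
    ordered = sorted(runnable, key=lambda s: s.jobset.share_used())
    return ordered[:free_slots]


if __name__ == "__main__":
    release = Jobset("release", shares=1000, seconds_used=50_000.0)  # 50 s per share
    haskell = Jobset("haskell", shares=100, seconds_used=20_000.0)   # 200 s per share, "drained"
    steps = [Step(haskell, "h1.drv"), Step(release, "r1.drv"), Step(release, "r2.drv")]
    print([s.drv for s in pick_steps(steps, 2)])   # scarce capacity
    print([s.drv for s in pick_steps(steps, 10)])  # idle capacity
```

In this model a jobset with a high time-per-share ratio only falls behind in the ordering; it is skipped only when there are fewer free slots than runnable steps. Idle machines alongside a large runnable queue would therefore point to something other than shares, such as stuck machines or slow copying.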
andi- | In reply to @vcunat:matrix.org
staging-next-21.05 (i.e. the first 21.05 with secure openssl) has over 30k queued x86_64-linux jobs. So I can't see what you mean about the idle capacity. I looked at the machines dashboard of hydra and there were ~4 x86_64-linux machines for about 30min that didn't execute a single job | 11:58:17 |
Vladimír Čunát | In reply to @andi:kack.it I looked at the machines dashboard of hydra and there were ~4 x86_64-linux machines for about 30min that didn't execute a single job Maybe they're stuck. We certainly have some macs that haven't made a step for days. | 11:59:41 |
hexa | it really seemed like the x86_64-linux machines were idle yesterday, I often saw no x86_64-linux jobs in the running builds | 12:11:04 |
Vladimír Čunát | Weird. I noticed that the /machines page isn't very precise, e.g. it seemed not to show jobs that take relatively short time... for this the /queue-runner-status seemed better. | 12:22:57 |
hexa | the machine page also shows machines as idle when they're copying stuff | 12:31:55 |
Vladimír Čunát | Well, the scheduling certainly isn't ideal. Now I looked at t4b, and it's been completely idle during the last 15 minutes, not even I/O waits. | 12:44:47 |
Vladimír Čunát | * Well, the scheduling certainly isn't ideal. Now I looked at t4b, and it's been completely idle during the last 15 minutes, not even I/O waits. (I ran atop on it) | 12:45:14 |