| 28 May 2021 |
cdepillabout | In reply to @sternenseemann:systemli.org cdepillabout: sometimes the status of the aggregate jobs is stale and you need to restart (just) the aggregate job Ah, I was actually wondering about that. Restarting the mergeable or maintained job should also restart all of its failed child jobs? | 12:09:48 |
cdepillabout | I had thought that I had tried that once and it didn't work, but maybe I was just mistaken. | 12:10:05 |
sterni (he/him) | In reply to @cdepillabout:matrix.org Ah, I was actually wondering about that. Restarting the mergeable or maintained job should also restart all of its failed child jobs? no, that's the point: it doesn't restart any of the aggregated jobs | 12:10:32 |
sterni (he/him) | but if something was fixed (by a restart, for example), the aggregate job doesn't get updated, so you'd need to restart it | 12:10:52 |
sterni (he/him) | so restarting it only helps with stale failures | 12:11:13 |
maralorn | I mean, that's not even special to aggregate jobs. No job updates automatically if it failed because of a broken dependency and that dependency later got fixed. | 12:12:00 |
cdepillabout | Ah, I see what you're saying. So if the aggregate job has a child job that has failed, you need to restart both the failed child job and the aggregate job? | 12:12:19 |
sterni (he/him) | yep | 12:12:28 |
cdepillabout | Man, that's somewhat annoying | 12:12:50 |
maralorn | In reply to @cdepillabout:matrix.org Man, that's somewhat annoying Yeah, the most annoying part is that you can't selectively bulk-restart jobs. | 12:16:24 |
cdepillabout | Ah, yeah, so you have to go to each failed job and manually restart it. | 12:17:03 |
maralorn | Or you restart all failed jobs, but that will produce a lot of unnecessary work. | 12:17:51 |
cdepillabout | (image attachment: image.png) | 12:20:01 |
cdepillabout | Oh, good point. I see that there is the Restart all failed builds option. Although you're completely right, that would create a lot of unnecessary work. | 12:20:46 |
sterni (he/him) | may be a good shout though when we have a lot of dubious build failures again | 12:21:40 |
cdepillabout | At some point in the not-too-far future, maybe we can correctly mark all darwin/aarch64 builds that don't work as broken, so our "failing jobs" will really be unexpected failures. In that case, I imagine there will be far fewer failed builds, so restarting all failed builds should cause much less work. | 12:22:31 |
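For context on marking builds broken: nixpkgs does this through a package's `meta` attributes, which Hydra consults before attempting a build. A minimal sketch in an overlay-style override (the package names and the `pkgs`/`super` bindings are assumptions for illustration; `haskell.lib.markBroken`, `meta.broken`, and `meta.platforms` are the real nixpkgs mechanisms):

```nix
{
  # Haskell-specific helper: sets meta.broken = true, so Hydra
  # reports the job as broken instead of attempting and failing it.
  somePackage = pkgs.haskell.lib.markBroken super.somePackage;

  # Generic mechanism: restrict meta.platforms so darwin/aarch64
  # jobs are never even created for this (hypothetical) package.
  otherPackage = super.otherPackage.overrideAttrs (old: {
    meta = (old.meta or { }) // { platforms = [ "x86_64-linux" ]; };
  });
}
```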
sterni (he/him) | seems like we are getting almost none of the queue because three jobsets have many more queued builds than us :'( | 12:22:47 |
sterni (he/him) | cdepillabout: yeah I'm planning-ish to work on that | 12:23:17 |
cdepillabout | In reply to @sternenseemann:systemli.org cdepillabout: yeah I'm planning-ish to work on that That'd be great :-) | 12:23:50 |
sterni (he/him) | btw a mid term goal may be to get aarch64-darwin supported-ish, but I'd appreciate someone with appropriate hardware to help us out on that | 12:23:58 |
sterni (he/him) | I'd say we should at least get GHC to build and teach cabal2nix that it is a platform that exists now | 12:24:21 |
cdepillabout | (image attachment: image.png) | 12:24:31 |
cdepillabout | In reply to @sternenseemann:systemli.org seems like we are getting almost none of the queue because three jobsets have much more queued builds than us :'( Is this what you mean? How staging-next and staging-21.05 have a lot more queued than us? | 12:24:46 |
sterni (he/him) | yeah, the scheduling seems to take into account the number of queued builds as well as the Hydra shares | 12:26:06 |
maralorn | I am still curious how exactly the scheduling works. Does the fact that we now have darwin and aarch64 builds mean that we get fewer x86 jobs? Or is the scheduling completely independent between architectures? | 12:28:45 |
maralorn | (btw. I have only recently realized how the fact that we didn't have darwin and aarch64 builds probably slowed down nixpkgs-unstable at least from time to time.) | 12:30:03 |
andi- | It is independent of architectures. Your jobset gets a number of shares out of the total shares. The scheduling looks at the last 24h and prefers jobsets that have more unused build time within those 24h, weighted by their share ratio. | 12:30:24 |
sterni (he/him) | it seems to me that it is a bit all-or-nothing? seemingly some jobsets just win and get all the build capacity while another jobset is just stuck for a day | 12:32:02 |
andi- | In Prometheus speech it is something like sum(build_duration[24h]) / share_ratio group by (jobset). The list of jobsets will then be sorted so that the jobsets with the least usage are preferred. | 12:32:22 |
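The heuristic andi- describes can be sketched as a small Nix expression (all jobset names and numbers here are invented, and Hydra's real scheduler works on its database rather than evaluating anything like this): order jobsets by share-weighted usage, least-used first.

```nix
let
  # Hypothetical snapshot of build seconds consumed over the last 24h.
  jobsets = [
    { name = "staging-next";    shares = 100; buildSeconds = 180000; }
    { name = "haskell-updates"; shares = 100; buildSeconds =  20000; }
    { name = "small-jobset";    shares =  10; buildSeconds =   5000; }
  ];
  # sum(build_duration[24h]) / share_ratio, per jobset
  usage = j: j.buildSeconds / j.shares;
  # least share-weighted usage first: that jobset gets the next free slot
  schedule = builtins.sort (a: b: usage a < usage b) jobsets;
in
map (j: j.name) schedule
# evaluates to [ "haskell-updates" "small-jobset" "staging-next" ]
```

Note how small-jobset ranks behind haskell-updates despite using far fewer raw build seconds, because its share is ten times smaller.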