| 28 May 2021 |
sterni (he/him) | btw a mid-term goal may be to get aarch64-darwin supported-ish, but I'd appreciate someone with appropriate hardware to help us out on that | 12:23:58 |
sterni (he/him) | I'd say we should at least get GHC to build and teach cabal2nix that it is a platform that exists now | 12:24:21 |
cdepillabout | [image: image.png] | 12:24:31 |
cdepillabout | In reply to @sternenseemann:systemli.org: "seems like we are getting almost none of the queue because three jobsets have much more queued builds than us :'(" Is this what you mean? That staging-next and staging-21.05 have a lot more queued than us? | 12:24:46 |
sterni (he/him) | yeah the scheduling seems to take into account the amount of queued builds as well as the hydra shares | 12:26:06 |
maralorn | I am still curious how exactly the scheduling works. Does the fact that we now have darwin and aarch64 builds mean that we get fewer x86 jobs? Or is the scheduling completely independent between architectures? | 12:28:45 |
maralorn | (btw. I have only recently realized how much the fact that we didn't have darwin and aarch64 builds probably slowed down nixpkgs-unstable, at least from time to time.) | 12:30:03 |
andi- | It is independent of architectures. Your jobset gets an amount of shares of the total shares. The scheduling looks at the last 24h and prefers jobs that have more unused build time within the 24h weighted by their share ratio. | 12:30:24 |
sterni (he/him) | it seems to me that it is a bit all or nothing? seemingly some jobsets just win and get all build capacity and another jobset is just stuck for a day | 12:32:02 |
andi- | In Prometheus speak it is something like sum(build_duration[24h]) / share_ratio group by (jobset). The list of jobs will then be sorted so that the jobs with the least usage are preferred. | 12:32:22 |
maralorn | In reply to @andi:kack.it: "It is independent of architectures. Your jobset gets an amount of shares of the total shares. The scheduling looks at the last 24h and prefers jobs that have more unused build time within the 24h weighted by their share ratio." What's unused build time? Like, what's the total? | 12:34:49 |
maralorn | In reply to @andi:kack.it: "In Prometheus speak it is something like sum(build_duration[24h]) / share_ratio group by (jobset). The list of jobs will then be sorted so that the jobs with the least usage are preferred." Okay, the formula makes sense. | 12:35:51 |
andi- | https://github.com/NixOS/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc#L139-L184 | 12:36:03 |
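A minimal Python sketch of the ordering andi- describes, assuming only the Prometheus-style formula above (build seconds in the last 24h divided by the jobset's share ratio, least-used first). The class, field, function and step names are hypothetical, and the real logic in the linked dispatcher.cc may weigh additional factors such as priorities and tie-breakers. Note that build time from every platform goes into the same sum, which is what "independent of architectures" means here.

```python
from dataclasses import dataclass, field

WINDOW = 24 * 60 * 60  # scheduling window in seconds (the last 24h)

@dataclass
class Jobset:
    name: str
    share_ratio: float  # this jobset's shares / total shares
    # (finish_time, duration) of recent builds, in seconds; hypothetical format.
    # Builds on every platform (x86_64-linux, aarch64-linux, darwin) land here.
    builds: list = field(default_factory=list)

    def share_used(self, now: float) -> float:
        """sum(build_duration[24h]) / share_ratio, per the formula above."""
        seconds = sum(d for t, d in self.builds if now - t <= WINDOW)
        return seconds / self.share_ratio

def order_queue(queue, now):
    """Sort queued (jobset, step) pairs so the least-used jobset comes first."""
    return sorted(queue, key=lambda item: item[0].share_used(now))

now = 100_000.0
haskell = Jobset("haskell-updates", 0.1, [(now - 3600, 20_000)])
staging = Jobset("staging-next", 0.4, [(now - 7200, 40_000)])
queue = [(haskell, "ghc"), (staging, "glibc")]
print([step for _, step in order_queue(queue, now)])  # ['glibc', 'ghc']
```

Even though staging-next used twice as many raw build seconds here, its four-times-larger share ratio makes its weighted usage lower, so its steps are dispatched first.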
maralorn | sterni (he/him): If that formula is true, at least theoretically the following could happen: the queues are empty, so haskell-updates gets all the build time. 12h later the queues are suddenly full, and since haskell-updates has already used a lot of build time in the last 24h, we now get starved. | 12:37:17 |
andi- | Exactly. That has happened in the past with normal release jobs as well. | 12:37:41 |
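The starvation scenario above can be reproduced with a toy simulation, assuming a single unit of build capacity per hour that always goes to the queued jobset with the lowest weighted 24h usage. This is a deliberate simplification (real Hydra dispatches many steps to many machines at once), and the function name and parameters are hypothetical.

```python
WINDOW = 24  # scheduling window, in hourly slots (24h)

def simulate(hours, arrivals, shares):
    """Toy model: each hour, one unit of build capacity goes to the queued
    jobset with the lowest (usage in the last WINDOW hours) / share ratio.
    `arrivals` maps jobset name -> hour at which it starts having queued work
    (assumed to stay queued for the rest of the run). Ties go to the jobset
    listed first in `arrivals`, because min() returns the first minimum."""
    total = sum(shares.values())
    schedule = []  # schedule[h] = name of the jobset that got hour h
    for h in range(hours):
        queued = [j for j, start in arrivals.items() if start <= h]
        if not queued:
            schedule.append(None)
            continue

        def share_used(j):
            used = sum(1 for s in range(max(0, h - WINDOW), h) if schedule[s] == j)
            return used / (shares[j] / total)

        schedule.append(min(queued, key=share_used))
    return schedule

sched = simulate(36, {"haskell-updates": 0, "staging-next": 12},
                 {"haskell-updates": 1, "staging-next": 1})
first = next(h for h in range(12, 36) if sched[h] == "haskell-updates")
print(first)  # 24: haskell-updates gets nothing from hour 12 to hour 23
```

With equal shares, staging-next takes every slot until its own 24h usage catches up with the 12 hours haskell-updates banked while the queues were empty, which matches the "all or nothing" behaviour observed above.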
andi- | I guess you could probably work around that issue by bumping the "mergeable" job to the front of the queue, but that would be cheating the "fairness" system :) | 12:38:51 |
maralorn | In reply to @andi:kack.it: "I guess you could probably work around that issue by bumping the "mergeable" job to the front of the queue, but that would be cheating the "fairness" system :)" Well, bumping only the aggregate job wouldn't do anything harmful, or does it transitively affect dependencies? | 12:39:33 |
maralorn | Ah, I should have read that comment.^^ | 12:39:56 |
maralorn | Anyways, I think the three of us don't even have the "bump to front" right, and I actually don't think we need it. | 12:40:23 |
andi- | A situation where it might be useful is where only aarch64-darwin (or whatever arch) jobs are running and most of our builders are idle. At least then we could make use of the other compute power that is idle at that point in time. | 12:52:33 |
maralorn | Oh, damn. I interpreted your "it’s independent of architectures" completely wrong. | 12:53:37 |
maralorn | You are saying that we can have darwin builders idle because we had a large rebuild on x86-linux 12h ago? | 12:54:24 |
| marinelli joined the room. | 19:46:11 |
| marinelli changed their display name from Giorgio to marinelli. | 19:59:52 |
| 29 May 2021 |
cdepillabout | In reply to @maralorn:maralorn.de: "Anyways, I think we three don't even have the bump to front right and I actually don't think we need it." It would be convenient in a situation like the following: we are trying to fix up the mergeable job, and there are only one or two failed child jobs. You believe that the jobs have "incorrectly" failed, so you restart them and bump them to the front. They finish building in a minute or so, and you can then go ahead with merging haskell-updates into master. You could basically do it all in one sitting, without having to wait a few hours for the build to start. | 03:12:07 |
cdepillabout | So basically a situation where you are literally sitting at your computer and waiting to take some action based on the outcome of a job. | 03:13:18 |
joe (he/him) | cdepillabout: I'll release exact-real with a "fix" tonight | 10:23:36 |
joe (he/him) | The fix is changing the Arbitrary instance for CReal to generate less problematic numbers: https://github.com/expipiplus1/exact-real/pull/38 | 10:24:01 |