!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

415 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
130 Servers

9 May 2026
[19:04:03] John Ericson (@Ericson2314:matrix.org): (the new queue runner shares the whole drv graph, which is causing rebuilding problems; I am working on it right now to make it like the older queue runner so that cannot happen)
[19:10:39] emily (@emilazy:matrix.org): these problems predate when hexa (signing key rotation when) mentioned the new runner had been deployed
[19:10:47] emily (@emilazy:matrix.org): (as does my memory of talk of duplicate builds)
[19:10:54] emily (@emilazy:matrix.org): so whatever is going on here is unrelated to any problems the new runner has, I think
[19:21:58] Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org):

Indeed. There are several related issues that can happen (frankenbuild-shaped, not necessarily our cases).

  1. A builder grabs something it has built previously which isn't what's in the cache. We've observed this in the nix repo. It can happen when a build gets scheduled after a successful build that couldn't be uploaded by the queue runner.

  2. A partial upload where the queue runner uploads outputs that have been built by different machines. This is probably the case with fish.

[19:23:24] Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org): Ideally we'd ensure consistency between the narHash of the inputs on the builder and what's in the cache
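A minimal sketch of the consistency check Sergei suggests, assuming the queue runner can read each input's narHash both from the builder's local store and from the cache's `.narinfo` files. The function name, the dict-based inputs and the placeholder store paths are hypothetical, not Hydra or Nix APIs. Any path it returns would point at case (1): the builder holds a locally built input that differs from, or is missing in, the cache.

```python
from typing import Dict, List


def find_input_mismatches(
    builder_nar_hashes: Dict[str, str],
    cache_nar_hashes: Dict[str, str],
) -> List[str]:
    """Return input store paths whose narHash on the builder differs from
    the binary cache, or that the cache does not have at all.

    Both arguments map store path -> narHash string (e.g. "sha256:..."),
    as read from the builder's store and from the cache's .narinfo files.
    Hypothetical helper, for illustration only.
    """
    mismatches = []
    for path, local_hash in sorted(builder_nar_hashes.items()):
        cached_hash = cache_nar_hashes.get(path)
        # Missing from the cache, or present with a different hash: the
        # builder would build against an input nobody else can reproduce.
        if cached_hash != local_hash:
            mismatches.append(path)
    return mismatches
```

With placeholder paths, an input the builder has as `sha256:2` but the cache records as `sha256:9` is flagged, as is an input the cache lacks entirely; a builder could then substitute the cached copy or refuse the build.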
[19:24:30] emily (@emilazy:matrix.org): (2) is confusing though, since it seems like we're observing large time gaps between the builds/uploads
[19:24:45] emily (@emilazy:matrix.org): so why would it get built twice, not even in a race-condition way but after many minutes?
[19:25:12] emily (@emilazy:matrix.org): the BasicDerivation point seems relevant, as it's not like a transitive dependency can easily pull it in
[19:25:40] Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org): Probably can happen when the queue runner tries to upload outputs from the first builder, fails halfway, and does it on another builder afterwards? And it pulls the one known output from the cache?
[19:25:42] emily (@emilazy:matrix.org): I guess the queue runner would observe the output is missing and schedule another build?
[19:26:14] emily (@emilazy:matrix.org): right. so that could be fixed at the queue runner level? "if we have an output waiting to be uploaded, then don't spawn another build; just keep trying to upload that output"?
[19:26:57] Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org):
In reply to @emilazy:matrix.org
right. so that could be fixed at the queue runner level? "if we have an output waiting to be uploaded, then don't spawn another build; just keep trying to upload that output"?
Makes sense yeah
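The rule emily proposes, and Sergei agrees with, can be sketched as a scheduling decision that checks for outputs awaiting upload before dispatching a new build. `SchedulerState`, its fields and the returned action strings are hypothetical simplifications, not the queue runner's real data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SchedulerState:
    """Hypothetical, simplified view of queue-runner state."""

    # Derivation path -> output names built somewhere but not yet uploaded.
    pending_uploads: Dict[str, Set[str]] = field(default_factory=dict)
    # Derivation paths whose outputs are all already in the binary cache.
    cached: Set[str] = field(default_factory=set)

    def next_action(self, drv: str) -> str:
        if drv in self.cached:
            return "substitute"
        if self.pending_uploads.get(drv):
            # Outputs already exist on some builder: keep retrying the upload
            # instead of producing a second, possibly different, build.
            return "retry-upload"
        return "build"
```

The point of the ordering is that "not in the cache yet" no longer implies "needs building"; only a derivation with neither cached nor pending outputs is dispatched to a builder.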
[19:27:29] Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org): Things probably get complicated when the builder dies halfway?
[19:27:57] emily (@emilazy:matrix.org): as in halfway through sending its successfully built outputs to the queue runner?
[19:28:13] K900 (@k900:0upti.me): Then it should probably discard the entire build
[19:28:33] emily (@emilazy:matrix.org): yeah, though I think the issue is potentially that stuff happens per-output?
[19:28:54] emily (@emilazy:matrix.org): "waiting for all outputs to be ready for upload before uploading any of them" would be good
[19:29:59] K900 (@k900:0upti.me): I wonder if there's any reasonable way to do two-phase commit on this
[19:30:07] K900 (@k900:0upti.me): Like upload to a temporary directory and then move atomically
[19:30:15] K900 (@k900:0upti.me): If S3 lets you do that
[19:30:20] emily (@emilazy:matrix.org): yeah, if we could have S3 expose all the narinfos atomically that would be great
[19:30:28] emily (@emilazy:matrix.org): I mean even just uploading all the actual outputs first and then uploading the narinfos would probably help
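The ordering emily suggests (all NARs first, narinfos last) can be sketched as below, assuming a hypothetical `put(key, data)` upload primitive that raises on failure and a simplified key layout that does not follow Nix's actual NAR naming scheme. Because clients discover outputs through `.narinfo` files, a failure during the first phase leaves only unreferenced NARs. This is not a true two-phase commit: a failure partway through the second phase can still expose some outputs and not others.

```python
from typing import Callable, List, Tuple

# Hypothetical upload primitive: key -> bytes, raising on failure.
PutFn = Callable[[str, bytes], None]


def upload_outputs(put: PutFn, outputs: List[Tuple[str, bytes, bytes]]) -> None:
    """Upload every output of one derivation to a binary cache.

    Each element of `outputs` is (store_hash, nar_bytes, narinfo_bytes).
    Phase 1 uploads all NARs; phase 2 uploads the .narinfo files, which are
    what clients look up. If phase 1 fails, no narinfo has been written, so
    no output of this derivation becomes visible.
    """
    # Phase 1: payloads only. A failure here leaves unreferenced NARs behind.
    for store_hash, nar, _ in outputs:
        put(f"nar/{store_hash}.nar", nar)
    # Phase 2: publish. Only reached once every NAR is in place.
    for store_hash, _, narinfo in outputs:
        put(f"{store_hash}.narinfo", narinfo)
```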
[19:32:03] emily (@emilazy:matrix.org): I don't really understand why Hydra would decide to build something it's trying to upload anyway, though
[19:32:10] emily (@emilazy:matrix.org): does it ever give up on retrying the upload?
[19:32:52] emily (@emilazy:matrix.org): like, anything that wants something in the process of uploading should block on that upload anyway, right?
[19:33:24] emily (@emilazy:matrix.org): so perhaps the solution could be as simple as "never stop retrying uploads"? it wouldn't handle the "builder disappears" case Sergei Zimmerman (xokdvium) mentioned, but that's at least an edge case
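A sketch of "never stop retrying uploads" with capped exponential backoff, assuming `upload` is a hypothetical callable that raises on failure. There is intentionally no attempt limit, so the output never looks missing to the scheduler; `sleep` is injectable only so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def retry_until_success(
    upload: Callable[[], None],
    base_delay: float = 1.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `upload` until it stops raising; return the number of attempts.

    Deliberately unbounded: giving up would let the scheduler treat the
    output as missing and build it again on another machine.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            upload()
            return attempt
        except Exception:
            # Capped exponential backoff between attempts. A real system
            # would likely distinguish transient from permanent errors.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

As emily notes, this alone does not cover a builder that disappears before its outputs reach the queue runner.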
[19:46:16] emily (@emilazy:matrix.org): hexa (signing key rotation when): can we get queue runner logs for musikcube
[19:46:27] emily (@emilazy:matrix.org): https://github.com/NixOS/nixpkgs/issues/517508 is recent
[19:46:39] emily (@emilazy:matrix.org): though, let me check it's not a library dependency actually

Room Version: 6