!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

416 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
130 Servers

Sender | Message | Time
9 May 2026
@hexa:lossy.network hexa | https://grafana.nixos.org/d/he6lz9g/macos-disk-mem-swap?orgId=1&from=now-90d&to=now&timezone=utc | 18:03:47
@Ericson2314:matrix.org John Ericson | in this situation, do we know why the (old) queue runner told two builders to build fish? | 18:52:31
@hexa:lossy.network hexa | I don't | 18:55:01
@Ericson2314:matrix.org John Ericson | unless it thought the first one failed, I can't think of why either | 18:56:46
@Ericson2314:matrix.org John Ericson | I am working on the new queue runner right now, making it more like the old one by using BuildDerivation | 18:56:59
@Ericson2314:matrix.org John Ericson | hopefully that will at least help with the problem of the new runner scheduling things and also building their dependencies | 18:57:24
@emilazy:matrix.org emily | I know that another job can pull fish in as a dependency and it'll be built on the Nix level. | 19:00:47
@emilazy:matrix.org emily | how that interacts with the queue runner/uploads I don't know | 19:00:54
@Ericson2314:matrix.org John Ericson | the queue runner sends BasicDerivations, so the builder should not know how to build fish in that case | 19:03:12
@Ericson2314:matrix.org John Ericson | it should either substitute or fail | 19:03:19
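A minimal sketch of the "substitute or fail" behaviour, assuming a simplified model in which the builder receives only the input store paths of a derivation and no input derivations. `BasicDerivationSketch`, `prepare_inputs`, `MissingInputError`, and the store paths are hypothetical placeholders, not Nix's or Hydra's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BasicDerivationSketch:
    """Hypothetical stand-in for a BasicDerivation: it names the store paths
    it needs as inputs but carries no input derivations, so the builder has
    no recipe for producing a missing input itself."""
    name: str
    input_paths: list[str]


class MissingInputError(Exception):
    """Raised when an input is neither present locally nor substitutable."""


def prepare_inputs(drv: BasicDerivationSketch,
                   local_paths: set[str],
                   substitutable_paths: set[str]) -> list[str]:
    """Make every input available before building `drv`.

    Returns the inputs that had to be substituted. Raises MissingInputError
    as soon as an input cannot be obtained: with no input derivations there
    is nothing to build it from, so the only options are substitute or fail.
    """
    substituted = []
    for path in drv.input_paths:
        if path in local_paths:
            continue  # already present on the builder
        if path in substitutable_paths:
            local_paths.add(path)  # stands in for fetching it from the cache
            substituted.append(path)
            continue
        raise MissingInputError(f"{drv.name}: cannot obtain {path}")
    return substituted
```

Under this model, a hypothetical job depending on fish can only fetch fish's output from the cache or fail; it never rebuilds fish itself.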
@emilazy:matrix.org emily | right, okay. I don't know exactly why it happens, but I know vcunat has mentioned it happening multiple times before. | 19:03:34
@emilazy:matrix.org emily | e.g. in the context of duplicate work during stdenv building. | 19:03:40
@Ericson2314:matrix.org John Ericson | (the new queue runner shares the whole drv graph, which is causing rebuilding problems; I am working on it right now to make it like the old queue runner so that cannot happen) | 19:04:03
@emilazy:matrix.org emily | these problems predate when hexa (signing key rotation when) mentioned the new runner had been deployed | 19:10:39
@emilazy:matrix.org emily | (as does my memory of talk of duplicate builds) | 19:10:47
@emilazy:matrix.org emily | so whatever is going on here is unrelated to any problems the new runner has, I think | 19:10:54
@xokdvium:matrix.org Sergei Zimmerman (xokdvium)

Indeed. There are several related issues that can happen (frankenbuild-shaped, not necessarily our cases).

  1. A builder grabs something it has built previously which isn't what's in the cache. We've observed this in the nix repo. It can happen when a build gets scheduled after a successful build whose outputs the queue runner couldn't upload.

  2. A partial upload, where the queue runner uploads outputs that have been built by different machines. This is probably the case with fish.

19:21:58
@xokdvium:matrix.org Sergei Zimmerman (xokdvium) | Ideally we'd ensure the narHash of the inputs on the builder is consistent with what's in the cache | 19:23:24
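A sketch of the suggested narHash consistency check, assuming the builder's local narHash values and the cache's narinfo narHash values have already been collected into dictionaries keyed by store path. The function names, paths, and hash strings are hypothetical, and how a queue runner would gather these values is not covered.

```python
from __future__ import annotations


def find_nar_hash_mismatches(builder_hashes: dict[str, str],
                             cache_hashes: dict[str, str]) -> list[str]:
    """Return the input paths that are unsafe to build against.

    A path is reported if the cache has no narHash for it (get() returns
    None, which never equals a hash string) or if the builder's local copy
    hashes differently from the cached one, e.g. a leftover from an earlier
    build whose upload never completed.
    """
    return [path for path in sorted(builder_hashes)
            if cache_hashes.get(path) != builder_hashes[path]]


def check_inputs_consistent(builder_hashes: dict[str, str],
                            cache_hashes: dict[str, str]) -> None:
    """Refuse to start the build if any input disagrees with the cache."""
    bad = find_nar_hash_mismatches(builder_hashes, cache_hashes)
    if bad:
        raise RuntimeError("inputs differ from binary cache: " + ", ".join(bad))
```

Treating "missing from the cache" as a mismatch is a deliberate choice in this sketch: an input that exists only on one builder is exactly what case 1 describes.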
@emilazy:matrix.org emily | (2) is confusing though, since it seems like we're observing large time gaps between the builds/uploads | 19:24:30
@emilazy:matrix.org emily | so why would it get built twice, not even in a race-condition way but after many minutes? | 19:24:45
@emilazy:matrix.org emily | the BasicDerivation point seems relevant, as it's not like a transitive dependency can easily pull it in | 19:25:12
@xokdvium:matrix.org Sergei Zimmerman (xokdvium) | Probably it can happen when the queue runner tries to upload outputs from the first builder, fails halfway, and then does it on another builder afterwards? And it pulls the one known output from the cache? | 19:25:40
@emilazy:matrix.org emily | I guess the queue runner would observe the output is missing and schedule another build? | 19:25:42
@emilazy:matrix.org emily | right. so that could be fixed at the queue runner level? "if we have an output waiting to be uploaded, then don't spawn another build; just keep trying to upload that output"? | 19:26:14
@xokdvium:matrix.orgSergei Zimmerman (xokdvium)
In reply to @emilazy:matrix.org
right. so that could be fixed at the queue runner level? "if we have an output waiting to be uploaded, then don't spawn another build; just keep trying to upload that output"?
Makes sense yeah
19:26:57
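The proposed rule can be sketched as a small per-derivation state check, assuming a hypothetical three-state model (not built, built but pending upload, uploaded). `OutputState`, `next_action`, `step`, and the `upload` callback are illustrative assumptions, not Hydra's queue runner code. The sketch ignores the harder case where the builder holding the pending outputs disappears.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Callable


class OutputState(Enum):
    NOT_BUILT = auto()
    PENDING_UPLOAD = auto()  # built successfully, upload not finished yet
    UPLOADED = auto()


def next_action(drv_path: str, states: dict[str, OutputState]) -> str:
    """Decide what the queue runner should do with a derivation."""
    state = states.get(drv_path, OutputState.NOT_BUILT)
    if state is OutputState.UPLOADED:
        return "done"
    if state is OutputState.PENDING_UPLOAD:
        # The proposed rule: never spawn a second build while a finished
        # output is still waiting to be uploaded.
        return "retry-upload"
    return "build"


def step(drv_path: str, states: dict[str, OutputState],
         upload: Callable[[str], bool]) -> str:
    """Run one scheduling step; `upload` returns True on success.

    Only the upload path is simulated; for "build" the caller would
    dispatch the derivation to a builder.
    """
    action = next_action(drv_path, states)
    if action == "retry-upload" and upload(drv_path):
        states[drv_path] = OutputState.UPLOADED
    return action
```

A failed upload leaves the derivation in `PENDING_UPLOAD`, so the next step retries the upload instead of scheduling a rebuild on a different machine.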
@xokdvium:matrix.org Sergei Zimmerman (xokdvium) | Things probably get complicated when the builder dies halfway? | 19:27:29
@emilazy:matrix.org emily | as in half-way through sending its successfully-built outputs to the queue runner? | 19:27:57
@k900:0upti.me K900 | Then it should probably discard the entire build | 19:28:13
@emilazy:matrix.org emily | yeah, though I think the issue is potentially that stuff happens per-output? | 19:28:33
@emilazy:matrix.org emily | "waiting for all outputs to be ready for upload before uploading any of them" would be good | 19:28:54
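A sketch of the all-or-nothing policy discussed here: outputs are collected per build, uploaded together only once every expected output has arrived, and the partial build is discarded if the builder dies first. `PendingBuild`, `finish_build`, the output names, and the paths are hypothetical. The sketch does not make the upload call itself atomic; a real implementation would still need to handle an upload that fails partway.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingBuild:
    expected_outputs: set[str]
    # output name -> store path received from the builder so far
    received: dict[str, str] = field(default_factory=dict)

    def receive(self, output_name: str, path: str) -> None:
        if output_name not in self.expected_outputs:
            raise ValueError(f"unexpected output {output_name!r}")
        self.received[output_name] = path

    def complete(self) -> bool:
        return set(self.received) == self.expected_outputs


def finish_build(build: PendingBuild,
                 upload_all: Callable[[dict[str, str]], None],
                 builder_alive: bool) -> str:
    """Upload every output together or none at all.

    Returns "uploaded", "waiting", or "discarded".
    """
    if build.complete():
        upload_all(dict(build.received))  # one batch, never a subset
        return "uploaded"
    if not builder_alive:
        build.received.clear()  # drop the partial build entirely
        return "discarded"
    return "waiting"
```

Because nothing is uploaded until the set is complete, a later rebuild on another machine cannot end up mixed with outputs from the first one in the cache.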

Room Version: 6