!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

417 Members | 130 Servers

Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.



Sender | Message | Time
9 May 2026
emily (@emilazy:matrix.org) | can any Hydra-knowers say if the sequence of events given in https://github.com/NixOS/nix/pull/15638#issuecomment-4413076030 seems at all plausible? | 17:10:20
emily (@emilazy:matrix.org) | I did some digging, and it seems the persistent Darwin ad-hoc code signature SIGKILL issues are indeed quite likely chronically caused by multi-output derivations getting some of their outputs rebuilt and then mangled by path rewrites | 17:10:54
emily (@emilazy:matrix.org) | what's not at all clear to me is why that would happen: any build of a derivation builds all its outputs, so as long as (a) every build pushes all of its outputs to the cache (rather than it being reasonably common for only some outputs of a given build to land there), (b) substitutions from the cache by Hydra builders aren't chronically failing, and (c) there's no other weirdness like leftover outputs ending up registered in the store despite failed builds (recent disk space issues, maybe?), I don't understand how we'd be regularly (and more commonly lately?) seeing this happen | 17:12:20
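One way to check for the mangling described above is to verify the ad-hoc signatures of the Mach-O files inside a suspect output with Apple's codesign tool, which exits non-zero for a broken signature. A minimal sketch, assuming a macOS host with codesign on PATH; the store path at the bottom is a hypothetical placeholder, not one from this discussion:

  # Sketch: verify ad-hoc code signatures of Mach-O files in a store path.
  # Assumes macOS with the standard `codesign` tool; the store path below
  # is a hypothetical placeholder.
  import subprocess
  from pathlib import Path

  MACHO_MAGICS = {b"\xcf\xfa\xed\xfe", b"\xce\xfa\xed\xfe",  # 64/32-bit LE
                  b"\xfe\xed\xfa\xcf", b"\xfe\xed\xfa\xce",  # 64/32-bit BE
                  b"\xca\xfe\xba\xbe"}                       # universal (fat)

  def is_macho(path: Path) -> bool:
      try:
          with path.open("rb") as f:
              return f.read(4) in MACHO_MAGICS
      except OSError:
          return False

  def check_signatures(store_path: str) -> None:
      for p in Path(store_path).rglob("*"):
          if p.is_file() and not p.is_symlink() and is_macho(p):
              # `codesign --verify` exits non-zero if the signature is
              # invalid, e.g. after a path rewrite without re-signing.
              r = subprocess.run(["codesign", "--verify", "--strict", str(p)],
                                 capture_output=True, text=True)
              if r.returncode != 0:
                  print(f"BROKEN SIGNATURE: {p}\n  {r.stderr.strip()}")

  check_signatures("/nix/store/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-fish-3.7.1")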
hexa (@hexa:lossy.network) | wow, that's too long for me for now | 17:13:34
hexa (@hexa:lossy.network) | that issue | 17:13:36
emily (@emilazy:matrix.org) | yeah, just look at my last comment 😅 | 17:15:12
emily (@emilazy:matrix.org) | I can give further context as needed, but the big question is just how we could end up seeing "some outputs present in the store but the derivation gets built anyway" on a regular basis on Hydra | 17:15:46
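To observe that state on a given machine, one can compare a derivation's declared outputs against what the local store considers valid. A minimal sketch, assuming a Nix with the nix-command experimental feature enabled (for nix derivation show and nix path-info); the .drv path is a hypothetical placeholder:

  # Sketch: report which outputs of a derivation are currently valid in
  # the local store, to catch the "some outputs present, derivation
  # rebuilt anyway" state described above.
  import json
  import subprocess

  def output_validity(drv_path: str) -> dict[str, bool]:
      info = json.loads(subprocess.run(
          ["nix", "derivation", "show", drv_path],
          check=True, capture_output=True, text=True).stdout)
      outputs = info[drv_path]["outputs"]  # e.g. {"out": {...}, "doc": {...}}
      result = {}
      for name, meta in outputs.items():
          # `nix path-info` exits non-zero when the path is not valid locally.
          r = subprocess.run(["nix", "path-info", meta["path"]],
                             capture_output=True, text=True)
          result[name] = r.returncode == 0
      return result

  print(output_validity(
      "/nix/store/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-fish-3.7.1.drv"))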
emily (@emilazy:matrix.org) | oh, I mentioned this in the previous comment but forgot to repeat it in the second one: maybe it could also be a race condition where two builders try to build the same package, one of them has already uploaded one output, but the second build beats it to the other outputs/logs? | 17:19:51
emily (@emilazy:matrix.org) | the timing for that to happen seems… tight, though; I don't think fish would take long to upload… | 17:20:03
emily (@emilazy:matrix.org) | if there are decent logs for the cache uploads that could be accessed, that would likely help narrow things down a lot | 17:21:22
hexa (@hexa:lossy.network) | races are absolutely a possibility | 17:21:47
hexa (@hexa:lossy.network) | note that I tried out the new queue-runner at least three times in the last two weeks | 17:22:00
emily (@emilazy:matrix.org) | these issues have been present for years | 17:22:11
hexa (@hexa:lossy.network) | good | 17:22:14
emily (@emilazy:matrix.org) | but getting worse in the past, say, couple months? | 17:22:18
emily (@emilazy:matrix.org) | much worse | 17:22:31
Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org) | Hm, is S3 still very bad on the new queue runner? | 17:23:09
emily (@emilazy:matrix.org) | I think it would require:

  1. builder A starts building fish
  2. builder A finishes building fish
  3. builder A uploads build log and fish^doc
  4. builder B substitutes fish^doc
  5. builder B starts building fish
  6. builder B uploads fish^out
  7. builder A doesn't get to upload fish^out

| 17:23:56
emily (@emilazy:matrix.org) | seems pretty contrived to me, the entire second build has to happen between builder A starting to upload and finishing (and what would be pulling in fish^doc to begin with?) | 17:24:17
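To make the "contrived" timing concrete: the race in the list above only fires if builder B's entire substitute-then-rebuild-then-upload sequence fits inside builder A's upload window. A minimal sketch with hypothetical illustrative durations (not measurements from Hydra):

  # Sketch: the race only fires if B's whole sequence fits inside the
  # window between A starting its uploads and A reaching fish^out.
  # All durations are hypothetical illustrative numbers.
  upload_window_a = 5.0    # seconds from A's first upload until A would upload fish^out
  substitute_doc_b = 1.0   # B substitutes fish^doc
  build_fish_b = 60.0      # B rebuilds fish from source
  upload_out_b = 2.0       # B uploads fish^out

  race_possible = substitute_doc_b + build_fish_b + upload_out_b < upload_window_a
  print(f"race window fits: {race_possible}")  # False for these numbers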
hexa (@hexa:lossy.network) | nobody changed anything about s3, the hydra repo migrated to the new queue-runner in March | 17:24:41
emily (@emilazy:matrix.org) | seems more likely for the upload of fish^out to just hard-fail the first time, and then a second builder later picks it up and mangles it | 17:24:45
hexa (@hexa:lossy.network) | [redacted or malformed event] | 17:24:46
hexa (@hexa:lossy.network) | the old queue-runner is gone from the repo | 17:24:52
hexa (@hexa:lossy.network) | yeah, same reaction | 17:25:07
hexa (@hexa:lossy.network) | I suddenly had to pin hydra without any prior communication | 17:25:18
emily (@emilazy:matrix.org) | do logs exist for an attempt to push out a given store path, and whether it succeeded? | 17:26:11
emily (@emilazy:matrix.org) | or for multiple builds of a derivation that happen? like, can we get data on whether fish's outputs/logs on the cache for that one derivation are actually chimerical between two separate builds? | 17:26:40
Sergei Zimmerman (xokdvium) (@xokdvium:matrix.org) | Hm, maybe Last-Modified could be a rough approximation? | 17:27:33
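Building on that suggestion, the .narinfo files on cache.nixos.org can be fetched per output to compare their Deriver fields and the Last-Modified response headers. A minimal sketch; the hash parts are hypothetical placeholders, and whether Last-Modified faithfully tracks the original upload time is an assumption about the underlying S3 setup:

  # Sketch: compare Deriver and Last-Modified for the outputs of one
  # derivation on cache.nixos.org. Hash parts are hypothetical
  # placeholders; Last-Modified tracking upload time is an assumption.
  import urllib.request

  CACHE = "https://cache.nixos.org"

  def narinfo(hash_part: str) -> tuple[dict, str]:
      with urllib.request.urlopen(f"{CACHE}/{hash_part}.narinfo") as resp:
          fields = dict(line.split(": ", 1)
                        for line in resp.read().decode().splitlines()
                        if ": " in line)
          return fields, resp.headers.get("Last-Modified", "?")

  # One hash part per output of the same derivation (e.g. fish^out, fish^doc).
  for output, hash_part in {"out": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                            "doc": "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"}.items():
      fields, modified = narinfo(hash_part)
      # Outputs of a single build should share a Deriver; a diverging Deriver
      # or far-apart Last-Modified times would hint at a chimerical mix.
      print(f"{output}: Deriver={fields.get('Deriver')} Last-Modified={modified}")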
emily (@emilazy:matrix.org) | it feels like we run into this issue ~every staging cycle now; from what I've seen it's gone from "once or twice a year" to "constantly hitting users" | 17:27:36
emily (@emilazy:matrix.org) | it also holds up cycles because it breaks downstream builds etc. | 17:28:03


