| 22 Sep 2025 |
John Ericson | but I don't know why | 22:44:36 |
| 23 Sep 2025 |
Sandro | It's happening with the localhost special name which shouldn't use ssh | 06:13:29 |
Sandro | sysdig is not saying much that is useful:
sudo sysdig -c stderr
<3>possibly transient failure building ‘/nix/store/qs7p646zcgxxrv4d9bpj3hsbapwcl8s4-source.drv’ on ‘ssh://astrid.dse.in.tum.de’:
<5>will retry ‘/nix/store/qs7p646zcgxxrv4d9bpj3hsbapwcl8s4-source.drv’ after 62s
<5>performing step ‘/nix/store/9zw0q76n5z0737bds1wpswj5q7d7ic6x-initrd-linux-6.12.47.drv’ 1 times on ‘ssh://localhost’ (needed by build 24682 and 0 others)
<5>copying 0 paths...
<5>copying 0 paths...
<3>possibly transient failure building ‘/nix/store/zphg5h6vrqj9zwx3sq0wi2dqf3f8iyfl-source.drv’ on ‘ssh://clara’:
<5>will retry ‘/nix/store/zphg5h6vrqj9zwx3sq0wi2dqf3f8iyfl-source.drv’ after 65s
accepted connection from pid 46976, user hydra-queue-runner (trusted)
<5>copying 0 paths...
<5>copying 0 paths...
created 38395 symlinks in user environment
created 38395 symlinks in user environment
<3>possibly transient failure building ‘/nix/store/sdi2gbywg0w7r0y8lvwwpl73r22j7kba-source.drv’ on ‘ssh://clara’:
<5>will retry ‘/nix/store/sdi2gbywg0w7r0y8lvwwpl73r22j7kba-source.drv’ after 66s
accepted connection from pid 47001, user hydra-queue-runner (trusted)
| 07:23:01 |
Sandro | Sep 23 09:21:56 hydrogen hydra-queue-runner[11982]: will retry ‘/nix/store/zphg5h6vrqj9zwx3sq0wi2dqf3f8iyfl-source.drv’ after 65s
Sep 23 09:21:56 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:21:56 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:21:56 hydrogen hydra-queue-runner[11982]: possibly transient failure building ‘/nix/store/sdi2gbywg0w7r0y8lvwwpl73r22j7kba-source.drv’ on ‘ssh://clara’:
Sep 23 09:21:56 hydrogen hydra-queue-runner[11982]: will retry ‘/nix/store/sdi2gbywg0w7r0y8lvwwpl73r22j7kba-source.drv’ after 66s
Sep 23 09:21:57 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:21:58 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:21:58 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:21:58 hydrogen hydra-queue-runner[11982]: marking build 24677 as failed
Sep 23 09:22:01 hydrogen hydra-queue-runner[11982]: marking build 24682 as failed
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: possibly transient failure building ‘/nix/store/fk94qca8xwmq44h2cz4rs75sd213848k-pyccloud-0.1+20250218154744-py3-none-any.whl.drv’ on ‘ssh://astrid:
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: will retry ‘/nix/store/fk94qca8xwmq44h2cz4rs75sd213848k-pyccloud-0.1+20250218154744-py3-none-any.whl.drv’ after 62s
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/bqkn265gh4vnw6zmdmlrgicv9cvdsfq1-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 24684 and 0 others)
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/mfm7g14axg5vz1f9c7rsy5jrhwg9fsva-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 24685 and 0 others)
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/lkpdwpg6cvznl4vdbnyvnjya6zyvzilx-initrd-linux-6.12.47.drv’ 1 times on ‘ssh://localhost’ (needed by build 24685 and 0 others)
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/spqafxk9w0xn5bp3ksi8lsvx7ni9sb7f-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 24686 and 0 others)
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:05 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:05 hydrogen hydra-queue-runner[11982]: copying 0 paths...
Sep 23 09:22:06 hydrogen hydra-queue-runner[11982]: marking build 24685 as failed
Sep 23 09:22:06 hydrogen hydra-queue-runner[11982]: marking build 24684 as failed
Sep 23 09:22:07 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:22:11 hydrogen hydra-queue-runner[11982]: marking build 24686 as failed
Sep 23 09:22:17 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:22:27 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:22:37 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:22:47 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:22:57 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
Sep 23 09:23:07 hydrogen hydra-queue-runner[11982]: checking the queue for builds...
| 07:23:38 |
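One way to read the journal excerpt above is to pair each "performing step" line with the "marking build N as failed" lines that follow, grouped by the machine the step ran on. The following is a minimal sketch, assuming the message wording stays exactly as quoted in these logs; the function name and the sample list are made up for illustration. A build being marked as failed does not prove that the listed step caused it, so treat the result as a hint only.

```python
import re
from collections import defaultdict

# Patterns follow the hydra-queue-runner messages quoted in the log above.
# The curly quotes are the ones that appear around store paths and store URIs.
STEP_RE = re.compile(
    r"performing step ‘(?P<drv>[^’]+)’ \d+ times on ‘(?P<machine>[^’]+)’ "
    r"\(needed by build (?P<build>\d+) and \d+ others\)"
)
FAILED_RE = re.compile(r"marking build (?P<build>\d+) as failed")


def failed_steps_by_machine(lines):
    """Map machine URI -> [(build id, drv path)] for builds later marked as failed."""
    steps = defaultdict(list)  # build id -> [(machine, drv), ...]
    failed = set()
    for line in lines:
        m = STEP_RE.search(line)
        if m:
            steps[int(m["build"])].append((m["machine"], m["drv"]))
            continue
        m = FAILED_RE.search(line)
        if m:
            failed.add(int(m["build"]))
    result = defaultdict(list)
    for build in sorted(failed):
        # Builds with no recorded step (e.g. 24677 below) are simply skipped.
        for machine, drv in steps.get(build, []):
            result[machine].append((build, drv))
    return dict(result)


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Sep 23 09:21:58 hydrogen hydra-queue-runner[11982]: marking build 24677 as failed",
    "Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/bqkn265gh4vnw6zmdmlrgicv9cvdsfq1-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 24684 and 0 others)",
    "Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/mfm7g14axg5vz1f9c7rsy5jrhwg9fsva-system-path.drv’ 1 times on ‘ssh://localhost’ (needed by build 24685 and 0 others)",
    "Sep 23 09:22:04 hydrogen hydra-queue-runner[11982]: performing step ‘/nix/store/lkpdwpg6cvznl4vdbnyvnjya6zyvzilx-initrd-linux-6.12.47.drv’ 1 times on ‘ssh://localhost’ (needed by build 24685 and 0 others)",
    "Sep 23 09:22:06 hydrogen hydra-queue-runner[11982]: marking build 24685 as failed",
    "Sep 23 09:22:06 hydrogen hydra-queue-runner[11982]: marking build 24684 as failed",
]

for machine, entries in failed_steps_by_machine(SAMPLE).items():
    for build, drv in entries:
        print(machine, build, drv)
```

To run it against the full journal instead of the sample, pass the output lines of something like `journalctl -u hydra-queue-runner` (the unit name is an assumption based on the process name in the log) to `failed_steps_by_machine`.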
Sandro | not sure how to provide better debugging things
Hydra 0-unstable-2025-09-13 (using nix-2.29.2+4 and nix-eval-jobs-2.31.0). You are signed in as sandro.
➜ nix --version
nix (Nix) 2.30.3+4
| 07:25:43 |
shawn8901 | Hi, I have a short question for which I have not found a good solution yet; maybe someone here has had a similar "issue". I am running my own Hydra instance to CI-check my system configs (defined as a flake) and update the flake lock file.
I achieve that so far by defining a releaseTools.aggregate and a runCommand for the aggregate job. That works like a charm. As the flake is multi-arch and I have another, more powerful machine, I have defined some remote builders (which also work fine).
But sometimes the aggregate job is scheduled on one of the remote builders, which then receives the closures of all hosts (as far as I can tell).
Is there a way to pin the aggregate to a specific builder (ideally the Hydra instance)?
I tried preferLocal, which seems to have no effect, and I tried to pass custom systemFeatures (so that only the Hydra instance would be capable of running the job), but that does not work because aggregate does not accept required systemFeatures. Does anyone have an idea how to pin it? | 08:42:51 |
| kenji changed their display name from a-kenji to kenji. | 10:41:21 |
Sandro | Just tried the following combination which I think worked earlier but no longer does...
Hydra 0-unstable-2025-09-13 (using nix-2.29.2+4 and nix-eval-jobs-2.30.0). You are signed in as sandro.
| 13:29:52 |
Sandro | and still the same error | 13:31:26 |
John Ericson | @sandro:supersandro.de: ah ok thanks that's very interesting | 22:26:10 |
John Ericson | I'll also try to upgrade nix and nix-eval-jobs separately | 22:26:22 |
| 24 Sep 2025 |
| cransom left the room. | 15:27:19 |
| 26 Sep 2025 |
vcunat | Hmm, weird. This command stopped working recently
curl -H "Accept: application/json" https://hydra.nixos.org/queue-runner-status
| 07:03:25 |
vcunat | Without the Accept header it still fetches the HTML, but that is harder to work with. | 07:03:56 |
Sandro | Seems to work for me 🤔 | 17:40:00 |
| 27 Sep 2025 |
vcunat | Now it works for me as well. | 06:55:53 |
| 28 Sep 2025 |
John Ericson | @joerg:thalheim.io: so I think I know how I want to debug it: | 12:17:19 |
John Ericson | 1. Separate debug output, we just have that on Nix and keep it working in CI | 12:17:42 |
John Ericson | 2. Make the builder sleep for a long time | 12:17:55 |
John Ericson | 3. Attach debugger to nix while builder is sleeping, then trace to the part where it kills the builder | 12:19:06 |
Mic92 | Is nix a separate process in this case? | 16:08:22 |
Mic92 | There are also tricks where a function spawns gdb on itself. Also, I would use gdbstub here | 16:09:16 |
vcunat | At a quick glance, maybe you'd consider using rr to record an execution and then inspect that recording (which happens in a gdb interface again). | 17:17:07 |
John Ericson | In reply to @joerg:thalheim.io ("Is nix a separate process in this case?"): With ssh://localhost, yes | 23:11:04 |
John Ericson | Could use ?remote-program=... to run GDB hah | 23:11:49 |
John Ericson | In reply to @vcunat:matrix.org ("At a quick glance, maybe you'd consider using rr to record an execution and then inspect that recording (which happens in a gdb interface again)."): Yeah, that might work well with the remote program thing | 23:12:15 |
| 29 Sep 2025 |
Mic92 | John Ericson: we need to make the call soon. I would prefer to get rid of nix 2.29 in nixpkgs soon to not have to maintain it for another release. | 06:27:25 |
| 30 Sep 2025 |
vcunat | Apparently it's flaky. Sometimes I get {"status":"unknown"}, sometimes a proper reply. | 09:13:26 |
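Given the flakiness described here, a script that consumes the JSON could simply ask again when the reply is {"status":"unknown"}. The following is a rough sketch, assuming that retrying is acceptable for the endpoint; the URL and the Accept header come from the curl command earlier in this log, while the function names, the attempt count, and the delay are illustrative choices and not anything Hydra prescribes.

```python
import json
import time
import urllib.request

# URL taken from the curl command quoted earlier in this log.
STATUS_URL = "https://hydra.nixos.org/queue-runner-status"


def fetch_status(url=STATUS_URL, timeout=10):
    """Perform one request with the same Accept header as the curl call."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def fetch_status_with_retry(fetch=fetch_status, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until the reply is something other than {"status": "unknown"}.

    Returns the last reply even if every attempt came back as "unknown",
    so the caller can decide how to handle that case.
    """
    last = None
    for attempt in range(attempts):
        last = fetch()
        if last != {"status": "unknown"}:
            return last
        if attempt + 1 < attempts:
            sleep(delay)  # wait before asking again (assumed to be harmless here)
    return last
```

Calling `fetch_status_with_retry()` with no arguments performs the real requests. The `fetch` and `sleep` parameters exist only so the retry logic can be exercised without network access.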