!zghijEASpYQWYFzriI:nixos.org

Hydra

347 Members
102 Servers

Sender Message Time
22 Feb 2025
@Ericson2314:matrix.org John Ericson hexa: is that....a semi-reproduce after all, or still no reproduce? 17:40:33
@hexa:lossy.network hexa I can kill and restart and it repros 17:40:58
@Ericson2314:matrix.org John Ericson OK, and just to be clear this is builder hanging, no log, no progress 17:41:37
@Ericson2314:matrix.org John Ericson I've made git log master --first-parent very bisectable for this situation 17:42:15
@Ericson2314:matrix.org John Ericson though, I suspect the issue is the most recent merge 17:42:23
@Ericson2314:matrix.org John Ericson git log master --first-parent $(git merge-base c3b6e7b master)...master to show what's between h.n.o and master 17:43:27
@Ericson2314:matrix.org John Ericson have time to deploy one of those other ones in the middle? 17:45:02
@Ericson2314:matrix.org John Ericson * have time to test out one of those other ones in the middle hexa? 17:45:15
@hexa:lossy.network hexa Later tonight 17:45:59
@hexa:lossy.network hexa In 3h+ 17:46:07
@Ericson2314:matrix.org John Ericson ok sounds good, thanks! 17:46:14
@Ericson2314:matrix.org John Ericson hopefully once we find the PR which broke it, I can write a test :) 17:46:42
@9hp71n:matrix.org ghpzin joined the room. 19:15:02
@joshheinrichs-shopify:matrix.org Josh Heinrichs joined the room. 21:28:18
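
Note on the bisection discussed above: a first-parent history is a straight line of merges, so finding the merge that introduced the hang is a binary search over that line. The sketch below only illustrates the idea. The commit names and the hangs() check are hypothetical placeholders, not real Hydra commits or a real test, and it assumes that every commit after the first bad one is also bad. Recent Git versions can run the same search directly with git bisect start --first-parent.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits is non-empty and ordered oldest to newest, that the
    last commit is bad, and that every commit after a bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is at mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical first-parent history (placeholder names, oldest first).
history = ["merge-1", "merge-2", "merge-3", "merge-4", "merge-5"]
tested = []


def hangs(commit: str) -> bool:
    """Stand-in for 'deploy this commit and see whether the builder hangs'."""
    tested.append(commit)
    return history.index(commit) >= 3  # pretend merge-4 introduced the hang


print(first_bad(history, hangs), "found after testing", tested)
```

With five candidate merges, only two deployments are needed in this run, which is why a clean first-parent history makes this kind of hunt cheap.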
23 Feb 2025
@hexa:lossy.network hexa John Ericson: bisected to https://github.com/NixOS/hydra/commit/4a4a0f901c70676ee47f830d2ff6a72789ba1baf 04:42:50
@Ericson2314:matrix.org John Ericson @hexa:lossy.network: thanks Hexa! That's just the one I thought it would be 07:07:02
24 Feb 2025
@hacker1024:matrix.org hacker1024 *

We have an x86_64 machine running Hydra and an aarch64 builder. On recent versions of Nix/Hydra (I've tried the one with Nix 2.25 pre-LegacySSHStore and Nix 2.26 post-LegacySSHStore), it looks like x86_64 jobs that depend on outputs built on the aarch64 machine (e.g. deployment scripts that use aarch64 system closures) are getting stuck indefinitely on "Sending inputs", and cannot even be cancelled.

Nothing significant seems to be getting logged on either machine.

Pure x86_64 and aarch64 jobs are still fine.

Has anyone had this too?

Edit: Made issue

05:21:48
@shawn8901:matrix.org shawn8901 *

Hi, for some time now I have been getting the following error on my Hydra instance.

Every time it runs an evaluation, it aborts with the following error in the log:

Feb 24 17:00:38 tank nix-daemon[1579]: accepted connection from pid 6217, user hydra
Feb 24 17:00:38 tank hydra-evaluator[2106]: (config:.jobsets) Evaluating...
Feb 24 17:00:38 tank hydra-evaluator[2106]: error: stoi
Feb 24 17:00:38 tank hydra-evaluator[2106]: {UNKNOWN}: process ended prematurely at /nix/store/kvnp4qdk6bcg9j0pc8d87dgz6z5qklhl-hydra-0-unstable-2025-02-18/bin/.hydra-eval-jobset-wrapped line 404. at /nix/store/i85ni9bphygj6d31v68x24ncvhbc2vn6-hydra-perl-deps/lib/perl5/site_perl/5.40.0/Catalyst/Model/DBIC/Schema.pm line 526
Feb 24 17:00:38 tank hydra-evaluator[2078]: evaluation of jobset ‘config:.jobsets (jobset#1)’ failed with exit code 1

I am kind of out of ideas; the web server runs fine.
I did play around a bit with the initd system (switched from scripted to systemd); if I remember correctly, that was around the same time frame.
I noticed that there was a wrongly mapped uid for another service (which I fixed by chowning to the new id), but I did not find anything similar for Hydra.
I also tried to reinstall Hydra (nuke /var/lib/hydra and drop the database).
But none of that helped.

I found an old issue relating that error to memory, though it confuses me, as the machine has plenty of unused memory and was able to run my Hydra builds before.
Does anyone have an idea how I can continue analyzing this issue?

edit: I found https://github.com/NixOS/hydra/issues/1437, which could be a similar thing; at least the time range could fit, though I am seeing a different error text.

16:37:27
@shawn8901:matrix.org shawn8901 Okay, I tracked it down to the latest Hydra bump; when I revert it, everything works fine again.
Should I create an issue in nixpkgs or rather on Hydra's GitHub?
20:18:34
25 Feb 2025
@hacker1024:matrix.org hacker1024 There are only two places where nix-eval-jobs uses stoi. Have you set evaluator_workers or evaluator_max_memory_size in your Hydra configuration? 02:14:18
@shawn8901:matrix.org shawn8901
In reply to @hacker1024:matrix.org
There are only two places where nix-eval-jobs uses stoi. Have you set evaluator_workers or evaluator_max_memory_size in your Hydra configuration?
Yeah, I am limiting that. So stoi = out of memory?
06:25:39
@hacker1024:matrix.org hacker1024 No, stoi is a function that parses an integer. What are the exact contents of that part of your config? 06:26:39
@shawn8901:matrix.org shawn8901 evaluator_max_memory_size = ${toString (4 * 1024 * 1024 * 1024)} which was totally fine previously 06:27:35
@shawn8901:matrix.org shawn8901 And I am setting evaluator_workers = 2 06:28:15
@hacker1024:matrix.org hacker1024 Hmm yeah that does seem fine 06:28:30
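
Note on the value quoted above: C++ std::stoi returns an int and throws std::out_of_range when the parsed number does not fit, and with GNU libstdc++ the exception message is just the function name, "stoi". The sketch below checks the arithmetic only. It assumes a 32-bit int (the usual case on x86_64 and aarch64 Linux) and assumes the configured value reaches std::stoi unchanged, which this log does not confirm. The comparison value 4096 is hypothetical and not taken from the conversation.

```python
# Range of a 32-bit signed int (assumption: the target platform's C++ int).
INT_MIN = -(2**31)
INT_MAX = 2**31 - 1


def fits_in_int(text: str) -> bool:
    """Return True if the decimal string fits in a 32-bit signed int."""
    return INT_MIN <= int(text) <= INT_MAX


# The value from the config line in the messages above.
configured = str(4 * 1024 * 1024 * 1024)

print(configured, fits_in_int(configured))  # 4294967296 False
print("4096", fits_in_int("4096"))          # 4096 True (hypothetical value)
```

Under those assumptions the configured value is 2^32, one more than twice the largest 32-bit int, so it would be worth checking whether the newer Hydra passes it through an integer parse that the older version did not.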
