!zghijEASpYQWYFzriI:nixos.org

Hydra

367 Members
110 Servers

Sender  Message  Time
24 Feb 2025
@shawn8901:matrix.orgshawn8901 *

Hi, for some time now I have been getting the following error on my Hydra instance.

Every time it runs an evaluation, it aborts with the following error in the log:

Feb 24 17:00:38 tank nix-daemon[1579]: accepted connection from pid 6217, user hydra
Feb 24 17:00:38 tank hydra-evaluator[2106]: (config:.jobsets) Evaluating...
Feb 24 17:00:38 tank hydra-evaluator[2106]: error: stoi
Feb 24 17:00:38 tank hydra-evaluator[2106]: {UNKNOWN}: process ended prematurely at /nix/store/kvnp4qdk6bcg9j0pc8d87dgz6z5qklhl-hydra-0-unstable-2025-02-18/bin/.hydra-eval-jobset-wrapped line 404. at /nix/store/i85ni9bphygj6d31v68x24ncvhbc2vn6-hydra-perl-deps/lib/perl5/site_perl/5.40.0/Catalyst/Model/DBIC/Schema.pm line 526
Feb 24 17:00:38 tank hydra-evaluator[2078]: evaluation of jobset ‘config:.jobsets (jobset#1)’ failed with exit code 1

I am kinda out of ideas, the webserver runs fine.
I did play around a bit with the initrd system (switched from scripted to systemd); if I remember correctly, that was around the same time frame.
I noticed that there was a wrongly mapped uid for another service (which I fixed by chowning to the new id), but for Hydra I did not find anything similar.
I also tried to reinstall Hydra (nuke /var/lib/hydra and drop the database).
But none of that helped.

I found an old issue relating this error to memory, though that confuses me, as the machine has plenty of unused memory and was able to run my Hydra builds before.
Does anyone have an idea how I could continue analyzing this issue?

edit: I found https://github.com/NixOS/hydra/issues/1437, which could be a similar thing; at least the time range could fit, though I am seeing a different error text.

16:37:27
@shawn8901:matrix.orgshawn8901 Okay, I could trace it to the latest Hydra bump; when I revert it, it works fine again.
Should I create an issue in nixpkgs or rather on Hydra's GitHub?
20:18:34
25 Feb 2025
@hacker1024:matrix.orghacker1024 There are only two places where nix-eval-jobs uses stoi. Have you set evaluator_workers or evaluator_max_memory_size in your Hydra configuration? 02:14:18
@shawn8901:matrix.orgshawn8901
In reply to @hacker1024:matrix.org
There are only two places where nix-eval-jobs uses stoi. Have you set evaluator_workers or evaluator_max_memory_size in your Hydra configuration?
Yeah, I am limiting that. So stoi = out of memory?
06:25:39
@hacker1024:matrix.orghacker1024 No, stoi is a function that parses an integer. What are the exact contents of that part of your config? 06:26:39
@shawn8901:matrix.orgshawn8901 evaluator_max_memory_size = ${toString (4 * 1024 * 1024 * 1024)} which was totally fine previously 06:27:35
@shawn8901:matrix.orgshawn8901 And I am setting evaluator_workers = 2 06:28:15
@hacker1024:matrix.orghacker1024 Hmm yeah that does seem fine 06:28:30
@hacker1024:matrix.orghacker1024 Still maybe try without it for a bit and see if that helps? 06:28:42
@hacker1024:matrix.orghacker1024 Wait that size is in mb though? You're allocating 4EB 06:29:20
@hacker1024:matrix.orghacker1024 * Wait that size is in mb though? You're allocating 4PB 06:29:39
@hacker1024:matrix.orghacker1024 That's also 2x the signed integer limit 06:30:53
@shawn8901:matrix.orgshawn8901 Is it? It should be 4g, at least that was where it limited before 06:31:04
@shawn8901:matrix.orgshawn8901 Yeah, maybe that is where it's choking then 06:31:31
@shawn8901:matrix.orgshawn8901 I'll try it out when I am back at home, I just did not expect it to break in such a way within a bump of less than 2 weeks 😅 06:32:14
@hacker1024:matrix.orghacker1024 The unit is definitely mb now, and I'm pretty sure it always has been 👀 https://github.com/nix-community/nix-eval-jobs/blob/4b392b284877d203ae262e16af269f702df036bc/src/eval-args.cc#L59 06:32:31
@hacker1024:matrix.orghacker1024 So you'd just want 4096 06:32:41
@shawn8901:matrix.orgshawn8901 Hum, the old Hydra did not cry about it 06:32:56
@shawn8901:matrix.orgshawn8901 The value has been unchanged for 8 months in my config 06:33:27
@shawn8901:matrix.orgshawn8901 I'll check that out and report later, thank you very much for your help! 06:33:49
@hacker1024:matrix.orghacker1024 No problem 06:33:56
@shawn8901:matrix.orgshawn8901 Okay, I guess I found where my initial confusion comes from. There are old log outputs where those values were printed in bytes, and a few years ago the evaluator max heap size on Hydra was also set in bytes (as the envvar that was passed understood that), and I just blindly assumed that it's bytes here too 🫣 The old Hydra then possibly just used the default (which is also 4g), and that's likely why it matched my observations 06:54:43
26 Feb 2025
@shawn8901:matrix.orgshawn8901 It was that issue at the end. New version works fine now. 🎉 04:49:57
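
The arithmetic behind this thread can be checked with a short sketch. It assumes, as described in the conversation, that the configured value is parsed stoi-style into a 32-bit signed integer and interpreted as MiB. The names INT_MAX, fits_in_int, configured and intended are illustrative only and are not taken from Hydra or nix-eval-jobs.

```python
# Assumption: the parser rejects anything outside a 32-bit signed int,
# which is how a stoi-style parse behaves where int is 32 bits wide.
INT_MAX = 2**31 - 1

MIB = 1024**2
PIB = 1024**5

configured = 4 * 1024 * 1024 * 1024  # value from the config quoted in the thread
intended = 4 * 1024                  # 4 GiB expressed in MiB


def fits_in_int(value: int) -> bool:
    """Return True if value fits in a 32-bit signed integer."""
    return -INT_MAX - 1 <= value <= INT_MAX


print(configured)                # 4294967296
print(fits_in_int(configured))   # False: roughly 2x the signed limit
print(configured * MIB // PIB)   # 4: read as MiB, the value means 4 PiB
print(fits_in_int(intended))     # True: 4096 parses without trouble
```

Under these assumptions, the byte-based value cannot be parsed at all, while 4096 both fits and expresses the intended 4 GiB limit.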
27 Feb 2025
@adam:robins.wtf joined the room. 14:02:49
3 Mar 2025
@Ericson2314:matrix.orgJohn Ericson hexa: https://github.com/NixOS/hydra/pull/1451 put up a revert of that issue 15:13:42
@Ericson2314:matrix.orgJohn Ericson until I have time to fix it properly 15:13:48
6 Mar 2025
@polygon_:matrix.orgpolygon_ Hello, is there a way to get lists (newly failing jobs, still failing jobs) as JSON or another easily machine readable format? E.g. https://hydra.nixos.org/eval/1810654?full=1#tabs-now-fail 18:09:04
@k900:0upti.meK900 No 18:10:55
@janne.hess:helsinki-systems.dedas_j
In reply to @k900:0upti.me
No
Actually I crawl them for zh.fail and parse the HTML. Maybe I can just serve the cache files with nginx 🤔
19:25:49
@polygon_:matrix.orgpolygon_ Do you happen to also crawl the logs? I noticed that quite a few packages failed (and less popular ones still do) after moving to GCC14, due to some warnings that got turned into errors. I compiled a list of packages that failed in the first eval after the GCC change and in a current eval, and identified 400 packages that failed back then and still fail now. Would the Hydra people be unhappy if I pulled all the logs for those? The failures caused by these warnings should be easily identifiable. 19:56:25
