| 7 Jul 2022 |
Winter (she/her) | ah, okay, thank you | 04:38:00 |
Winter (she/her) | build/test failures i understand, but why can't eval failures be reported as failures? those shouldn't be problematic even on weird machines (i'd think...)
sorry for asking basically the same thing again | 04:39:19 |
7c6f434c | Eval failures are failures, yes | 04:39:49 |
Winter (she/her) | test eval failures, though? | 05:49:30 |
Winter (she/her) | like, when evaling a test (or tests) | 05:49:46 |
7c6f434c | Well, technically speaking pkg.tests not being there is a) eval failure b) definitely not a check failure… | 05:50:47 |
Winter (she/her) | I mean when manually triggering a test and it's eval failing, that's reported as neutral | 05:51:21 |
7c6f434c | For example, a test missing would be an eval failure! | 05:51:44 |
7c6f434c | And possibly just an invocation typo | 05:51:58 |
Winter (she/her) | Right, but that's reported as a neutral failure to GH. | 05:53:01 |
7c6f434c | As it should be | 05:53:13 |
7c6f434c | And also some tests might be assuming a platform and be useful as such and have eval failures on unexpected platforms | 05:53:58 |
Winter (she/her) | I guess I'm confining my thinking to NixOS tests when we have different types of tests in-tree as well. | 05:54:38 |
Winter (she/her) | (Not sure how widely used they actually are, though.) | 05:54:50 |
7c6f434c | Drawing a line is hard, and failure reports should be clear failures | 05:54:52 |
7c6f434c | NixOS tests might have some … interesting nuances between x86_64 and aarch64 | 05:55:28 |
Winter (she/her) | How so, in what way? Obviously it depends on the test, but what are you thinking of? | 05:56:19 |
7c6f434c | Using a package that has hand-optimised assembly in it and finally gets marked broken on aarch64 when someone gets fed up with build failures | 05:57:14 |
7c6f434c | ofBorg is and should be conservative with the red marks | 05:58:16 |
Winter (she/her) | Ah, yeah, in the case of broken packages, that won't stop manual triggers from trying to eval on aarch64 (although we could gate passthru.tests based on broken for the automatic triggers) | 05:58:54 |
Winter (she/her) | Yeah that makes sense | 05:58:59 |
Winter (she/her) | Apologies for going in circles | 05:59:04 |
7c6f434c | Realistically GitHub is just too restricted to have good high-resolution status reporting | 06:00:11 |
7c6f434c | So ofBorg reports as red just the things that can be defined without finely carved logic | 06:00:37 |
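The reporting policy discussed up to this point could be modeled roughly as follows. This is a hypothetical sketch, not ofBorg's actual implementation; the function name and outcome categories are illustrative, chosen to match the behavior described in the chat (clear build/test failures go red, an eval failure on a manually triggered test stays neutral because it may just be a typo or a missing pkg.tests attribute):

```python
def github_conclusion(kind: str, manual_test_trigger: bool = False) -> str:
    """Map a CI outcome to a GitHub status, per the policy described above:
    only unambiguous failures are reported red; ambiguous ones stay neutral.

    Hypothetical sketch -- not ofBorg's real code."""
    if kind in {"build-failure", "test-failure"}:
        return "failure"  # unambiguous: always reported red
    if kind == "eval-failure":
        # An eval failure while manually triggering a test may just be an
        # invocation typo or a missing pkg.tests attribute -> neutral.
        return "neutral" if manual_test_trigger else "failure"
    return "success"
```

Under this model, a plain eval failure of the package set is still red ("Eval failures are failures"), while the manual-trigger case gets the neutral treatment Winter asked about.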
Winter (she/her) | What are those things, if it's so conservative on tests for example? Eval errors on the packages themselves? I guess those could also be affected by the issues discussed so hm | 06:01:33 |
7c6f434c | Well, there is an evaluator for the package set where marked-broken and some other evaluation failures are treated as expected failures | 06:02:52 |
7c6f434c | Like, Hydra auto-builds new packages without them being added to special lists, but of course there are limits to how broken things can be before this enumeration breaks down | 06:03:33 |
7c6f434c | That would be a red failure | 06:03:42 |
7c6f434c | (happens from time to time!) | 06:03:49 |
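The package-set evaluator described here could be sketched like this. Again a hypothetical illustration, not the real evaluator; the dict shape and field names (`broken`, `eval_error`) are assumptions made for the example. The point it demonstrates is the distinction just drawn: a marked-broken package failing to evaluate is an *expected* failure, while any other eval error during enumeration is the kind of red failure that "happens from time to time":

```python
from typing import Iterable


def enumerate_set(packages: Iterable[dict]) -> dict:
    """Partition package evaluation outcomes while enumerating the set.

    Marked-broken packages that fail to evaluate count as expected
    failures; any other eval error is a real failure that goes red.
    Hypothetical sketch -- field names are illustrative."""
    buckets = {"ok": [], "expected-failure": [], "failure": []}
    for pkg in packages:
        if pkg.get("eval_error"):
            bucket = "expected-failure" if pkg.get("broken") else "failure"
        else:
            bucket = "ok"
        buckets[bucket].append(pkg["name"])
    return buckets
```

For example, a set containing one healthy package, one marked-broken package that fails to eval, and one unexpectedly failing package would yield one entry in each bucket, and only the last would be reported red.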
Winter (she/her) | So anything that's on a release list/channel/whatever that list is called that fails would be a failure from ofBorg? | 06:10:36 |