27 Jun 2025 |
K900 | In reply to @mattsturg:matrix.org
I reproduced the issue with set -x, but I don't see anything useful in the output:
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.8.2+699d445a1a (64-bit .NET 9.0.6)
[xUnit.net 00:00:00.08] Discovering: NexusMods.Networking.GitHub.Tests
[xUnit.net 00:00:00.11] Discovered: NexusMods.Networking.GitHub.Tests
[xUnit.net 00:00:00.12] Starting: NexusMods.Networking.GitHub.Tests
[xUnit.net 00:00:00.20] Finished: NexusMods.Networking.GitHub.Tests
No test matches the given testcase filter `Category!=Disabled&FlakeyTest!=True&RequiresNetworking!=True&FullyQualifiedName!=NexusMods.DataModel.SchemaVersions.Tests.LegacyDatabaseSupportTests.TestDatabase&FullyQualifiedName!=NexusMods.DataModel.SchemaVersions.Tests.MigrationSpecificTests.TestsFor_0...` in /build/NexusMods.App/tests/NexusMods.Networking.GitHub.Tests/bin/Release/net9.0/NexusMods.Networking.GitHub.Tests.dll
+ exitHandler
+ exitCode=1
+ set +e
+ '[' -n '' ']'
+ (( 1 != 0 ))
+ runHook failureHook
+ local hookName=failureHook
+ shift
+ local 'hooksSlice=failureHooks[@]'
+ local hook
+ for hook in "_callImplicitHook 0 $hookName" ${!hooksSlice+"${!hooksSlice}"}
+ _logHook failureHook '_callImplicitHook 0 failureHook'
+ [[ -z 2 ]]
+ local hookKind=failureHook
+ local 'hookExpr=_callImplicitHook 0 failureHook'
+ shift 2
+ declare -F '_callImplicitHook 0 failureHook'
+ type -p '_callImplicitHook 0 failureHook'
+ [[ _callImplicitHook 0 failureHook != \_\c\a\l\l\I\m\p\l\i\c\i\t\H\o\o\k* ]]
+ _eval '_callImplicitHook 0 failureHook'
+ declare -F '_callImplicitHook 0 failureHook'
+ eval '_callImplicitHook 0 failureHook'
++ _callImplicitHook 0 failureHook
++ local def=0
++ local hookName=failureHook
++ declare -F failureHook
++ type -p failureHook
++ '[' -n '' ']'
++ return 0
+ return 0
+ '[' -n '' ']'
+ return 1
Well it exits with 1 | 16:00:04 |
K900 | Presumably because it found no tests to run | 16:00:04 |
Matt Sturgeon | Maybe... but we also see that output in successful builds, and 6x in total in this build 🤔
Unlike the exit 1 , that error doesn't seem inconsistent.
| 16:02:20 |
Corngood | So it looks like dotnet test is definitely returning 1. I guess my next idea would be to test -v:diag and see if there's any sort of difference there. If that doesn't show anything then maybe COREHOST_TRACE or strace ? | 16:16:53 |
Matt Sturgeon | Just for comparison, here's a successful build log (also with set -x). We see the same 6 matches for "No test matches the given testcase filter". | 16:21:34 |
Matt Sturgeon | Download set-x-success.log | 16:21:40 |
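One way to double-check that the filter message appears equally often in passing and failing logs is to count fixed-string matches per file. The following is only a sketch: `count_message` is a hypothetical helper, and `set-x-fail.log` is a placeholder name for the failing log (only `set-x-success.log` is named above).

```shell
#!/usr/bin/env bash
# Count how many lines of each log contain a fixed message.
# Usage: count_message MESSAGE LOG [LOG...]
count_message() {
  local message=$1
  shift
  local log
  for log in "$@"; do
    # -F: treat the message as a literal string; -c: count matching lines.
    printf '%s\t%s\n' "$(grep -Fc -- "$message" "$log")" "$log"
  done
}

# Example (set-x-fail.log is a placeholder file name):
# count_message 'No test matches the given testcase filter' set-x-success.log set-x-fail.log
```

If both logs report the same count, the message is unlikely to be what distinguishes a failing run.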
Matt Sturgeon | Will try that out, just adding -v:diag to dotnetTestFlags? | 16:22:05 |
Corngood | Yeah, that should work. It'll probably be a lot of output. | 16:23:12 |
Matt Sturgeon | Ok took a couple tries, but got a build failure with set -x and -v:diag | 16:46:08 |
Matt Sturgeon | Download v-diag-fail-1.log | 16:46:34 |
Matt Sturgeon | For comparison, here's a successful build: | 16:49:11 |
Matt Sturgeon | Download v-diag-success-1.log | 16:49:25 |
Matt Sturgeon | This seems relevant:
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.8.2+699d445a1a (64-bit .NET 9.0.6)
[xUnit.net 00:00:00.11] Discovering: NexusMods.Games.RedEngine.Tests
[xUnit.net 00:00:00.19] Discovered: NexusMods.Games.RedEngine.Tests
[xUnit.net 00:00:00.19] Starting: NexusMods.Games.RedEngine.Tests
Passed NexusMods.Games.RedEngine.Tests.Cyberpunk2077SynchronizerTests.ContentIsIgnoredWhenSettingIsSet [102 ms]
Passed NexusMods.Games.RedEngine.Tests.Cyberpunk2077DiagnosticTests.CyberpunkGameExposesPatternBasedDiagnostics [106 ms]
The active test run was aborted. Reason: Test host process crashed
Test Run Aborted.
Total tests: Unknown
Passed: 2
Total time: 2.3107 Seconds
The "VSTestTask" task returned false but did not log an error. (TaskId:381)
Done executing task "VSTestTask" -- FAILED. (TaskId:381)
Done building target "_VSTestConsole" in project "NexusMods.Games.RedEngine.Tests.csproj" -- FAILED.: (TargetId:99)
Done executing task "CallTarget" -- FAILED. (TaskId:361)
Done building target "VSTest" in project "NexusMods.Games.RedEngine.Tests.csproj" -- FAILED.: (TargetId:98)
| 16:51:35 |
Corngood | I wonder if this can be reproduced in a dev shell by repeatedly running the test phase. That would make it quite a bit easier to investigate. | 16:52:46 |
Matt Sturgeon | Probably... I just don't have that workflow committed to muscle memory yet, so I never bother trying to use devshells for packages... 🙈 | 16:53:55 |
Corngood | It should be possible to run that 'test host process' directly, but I'm not exactly sure how to do it | 16:53:58 |
Corngood | my usual thing is
nix develop -f . nexusmods-app
cd $NIX_BUILD_TOP
genericBuild
the problem with that is that it won't stop on errors, so it'll try to run all the phases, which may or may not be a problem. at the end you can do runPhase checkPhase
| 16:55:42 |
Corngood | I also sometimes do (set -e; genericBuild) but the problem with that is that you lose the subshell environment, and some things might depend on shell vars etc | 16:56:42 |
Matt Sturgeon | I've not managed to reproduce the issue with runPhase checkPhase. Either I'm getting (un?)lucky, the underlying problem is outside the checkPhase, or something is cached. | 18:50:11 |
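Because the failure is intermittent, reproducing it in a dev shell probably means running the check phase many times. A minimal sketch of such a loop follows. `repeat_until_fail` and the log directory are hypothetical names, and the sketch assumes the repeated command returns a non-zero status when the tests fail, which has not been confirmed for `runPhase checkPhase` in this thread.

```shell
#!/usr/bin/env bash
# Re-run a command until it fails or a maximum number of attempts is reached,
# keeping one log file per attempt so a failing run can be inspected afterwards.
# Usage: repeat_until_fail MAX_ATTEMPTS LOG_DIR COMMAND [ARGS...]
repeat_until_fail() {
  local max=$1 log_dir=$2
  shift 2
  mkdir -p "$log_dir"
  local attempt status
  for ((attempt = 1; attempt <= max; attempt++)); do
    status=0
    # Capture stdout and stderr of this attempt; remember a non-zero exit code.
    "$@" >"$log_dir/attempt-$attempt.log" 2>&1 || status=$?
    if ((status != 0)); then
      echo "attempt $attempt failed with exit code $status; see $log_dir/attempt-$attempt.log"
      return "$status"
    fi
  done
  echo "no failure in $max attempts"
}
```

Inside the dev shell this might be invoked as `repeat_until_fail 20 "$NIX_BUILD_TOP/check-logs" runPhase checkPhase`. If it never fails there, that would support the idea that the sandbox environment or caching plays a role.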
28 Jun 2025 |
Matt Sturgeon | Looking around, it seems vstest has a thread discussing a similar "Test host process crashed" error. In that instance, it turned out to be an out-of-memory error.
Is there an easy way to tell the package to run time -v dotnet test instead of simply dotnet test?
If it turns out to be a memory issue, is there a way to increase the memory available to dotnet test, or should we just disable RAM-intensive tests?
| 10:12:44 |
Corngood | The `dotnet test` call is in `dotnet-check-hook.sh`. You could hack it in there.
If it was out of memory it would have to be using a lot of memory to be reproducible on my desktop machine. Enough that I would certainly consider it a bug in itself. | 12:54:38 |
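If the `dotnet test` call is wrapped with GNU time, the peak memory figure can be pulled out of the report afterwards. This is only a sketch: `peak_rss_kb` and `time-report.txt` are hypothetical names, and it assumes GNU time (not the bash `time` keyword) is available in the build environment.

```shell
#!/usr/bin/env bash
# Extract the peak resident set size (in kB) from a GNU time -v report.
# Assumes the report was written with something like:
#   command time -v -o time-report.txt dotnet test ...
# 'command time' bypasses the bash keyword so the external GNU time binary,
# which supports -v and -o, is used. time-report.txt is a placeholder name.
peak_rss_kb() {
  # GNU time prints a line of the form "Maximum resident set size (kbytes): N".
  sed -n 's/^[[:space:]]*Maximum resident set size (kbytes): \([0-9][0-9]*\)$/\1/p' "$1"
}

# Example:
# peak_rss_kb time-report.txt
```

Comparing this number between passing and failing runs would show whether memory use is anywhere near the machine's capacity.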
Corngood | Maybe it's hitting a file handle ulimit or something? It would be nice to know more about the child process invocation (args, output, exit code). | 13:00:19 |
Matt Sturgeon |
If it was out of memory it would have to be using a lot of memory to be reproducible on my desktop machine.
Good point, I have 64GB and was reproducing the issue fairly regularly. I take it there's no artificial memory limit from dotnet or the build sandbox?
Maybe it's hitting a file handle ulimit or something?
Could be something like that. Or could just be random segfaults in buggy software... 🤔
It would be nice to know more about the child process invocation (args, output, exit code).
Anything specific you want me to test, other than just time -v?
| 20:49:59 |
Corngood | If it's building in the sandbox and under the systemd daemon, it might have different limits than dev shell.
The nuclear logging option is probably `strace -f`. That's going to be a lot of output though.
Maybe I'll see if I can find anything useful in `VSTestTask` | 21:00:06 |
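To test the idea that the sandbox and the dev shell run under different limits, the limits could be printed in both environments and diffed. A minimal sketch, assuming bash on Linux; `print_limits` is a hypothetical helper, and where to call it from (for example a hook that runs just before the tests) is left open.

```shell
#!/usr/bin/env bash
# Print the resource limits of the current shell so the output from the build
# sandbox can be compared against the output from a dev shell.
print_limits() {
  echo "open files (soft): $(ulimit -Sn)"
  echo "open files (hard): $(ulimit -Hn)"
  echo "virtual memory (soft, kB): $(ulimit -Sv)"
  echo "max user processes (soft): $(ulimit -Su)"
  # On Linux, /proc/self/limits lists every limit; cat inherits them from
  # this shell, so the values match what child processes would see.
  if [ -r /proc/self/limits ]; then
    cat /proc/self/limits
  fi
}

# Example:
# print_limits > limits-devshell.txt   # placeholder file name
```

A difference in the open-files or process limits between the two outputs would make the ulimit theory worth pursuing before reaching for `strace -f`.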
30 Jun 2025 |
Matt Sturgeon | I'm wondering if it is practical to move the tests into a passthru.tests derivation 🤔
That way 1) end-users wouldn't need to run the tests to build the unfree package, 2) intermittent tests wouldn't cause build failures on hydra.
I'd still like to uncover the actual issue, but I'm curious if extracting the tests to a separate derivation is possible?
I guess the test derivation would need the same src as the main package, to access the test cases and relevant build properties. It'd also need the main package as an input, to get the built binaries.
That sounds reasonable, but what I'm unsure of is how easy it'd be to tell dotnet test to run the local test cases against the pre-built binaries from the main package?
| 12:19:21 |
Matt Sturgeon |
Maybe I'll see if I can find anything useful in VSTestTask
Looks like the impl is here: https://github.com/dotnet/dotnet/blob/main/src/vstest/src/Microsoft.TestPlatform.Build/Tasks/VSTestTask.cs
There's a note about the "Task returned false but did not log an error." error at line 69.
Other than that, nothing is jumping out at me 🤔
Also: I noticed that the "Test host process crashed" and "VSTestTask" task returned false but did not log an error. errors also appear in normal (failing) build logs, even without -v:diag. We just need to scroll up to find them, as they're usually not at the end of the logs.
| 12:39:42 |
Matt Sturgeon | It looks like the same errors occasionally show up in upstream's CI too. E.g. here | 17:35:27 |
Matt Sturgeon | * It looks like the same (or similar) errors occasionally show up in upstream's CI too. E.g. here | 17:39:46 |
1 Jul 2025 |
Corngood | In reply to @mattsturg:matrix.org
I'm wondering if it is practical to move the tests into a passthru.tests derivation 🤔
That way 1) end-users wouldn't need to run the tests to build the unfree package, 2) intermittent tests wouldn't cause build failures on hydra.
I'd still like to uncover the actual issue, but I'm curious if extracting the tests to a separate derivation is possible?
I guess the test derivation would need the same src as the main package, to access the test cases and relevant build properties. It'd also need the main package as an input, to get the built binaries.
That sounds reasonable, but what I'm unsure of is how easy it'd be to tell dotnet test to run the local test cases against the pre-built binaries from the main package?
One way to do this would be to disable tests and just put an override with them enabled in passthru.tests.
Is it consistently crashing in certain tests? Maybe we can get away with disabling only certain tests or projects? | 01:19:19 |
Matt Sturgeon |
One way to do this would be to disable tests and just put an override with them enabled in passthru.tests.
Good idea. I've opened https://github.com/NixOS/nixpkgs/pull/421459
Overriding is a simple solution, but it means building the app twice... I was hoping we'd be able to reuse the binaries from the main package, as an input for the test package.
Is it consistently crashing in certain tests? Maybe we can get away with disabling only certain tests or projects?
I'm not sure; I don't think it prints which test failed in the logs. It might be hidden in the -v:diag logs somewhere though. I also suspect it is a specific test or project, so if we can identify it we can disable it and report the issue upstream.
| 10:37:33 |
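To narrow down which project or test is involved, the logs from several failing builds could be scanned for the crash line together with the project that was running at the time. A rough sketch, assuming the log lines look like the xUnit excerpt earlier in this thread; `crash_context` is a hypothetical helper, and because tests may run in parallel, the last passed test is only a hint, not the culprit.

```shell
#!/usr/bin/env bash
# For every "Test host process crashed" line in a log, report the test project
# that was most recently started and the last test reported as passed before it.
# The crashing test itself most likely never printed a result, so this only
# narrows the search.
crash_context() {
  awk '
    BEGIN { project = "(unknown)"; last_passed = "(none)" }
    # e.g. "[xUnit.net 00:00:00.19] Starting: Some.Project.Tests"
    / Starting: / { project = $NF; last_passed = "(none)" }
    # e.g. "Passed Some.Project.Tests.Class.Method [102 ms]"
    /^[[:space:]]*Passed / { last_passed = $2 }
    /Test host process crashed/ {
      printf "crash in %s after %s\n", project, last_passed
    }
  ' "$1"
}

# Example (v-diag-fail-1.log is the failing log attached earlier):
# crash_context v-diag-fail-1.log
```

If the same project shows up across several failing logs, that project would be a reasonable candidate to disable and report upstream.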