3 Sep 2023 |
hexa | restarted three times so far | 01:11:20 |
hexa | [image: image.png] | 08:38:55 |
hexa | this is going great | 08:38:56 |
vcunat | It's certainly worse than usual. | 08:42:42 |
vcunat | (I normally do a few firefox test restarts per eval) | 08:43:00 |
vcunat | Ah, I see. | 08:44:04 |
vcunat | In the recent past it looked like it failed unless built on the t4b machine (which is set up quite differently). | 08:45:31 |
vcunat | But the machine is unreachable right now. | 08:45:44 |
vcunat | (Which I did know. I planned to look into it physically today anyway.) | 08:46:16 |
vcunat | It would be best if the test wasn't so sensitive, of course. | 08:47:05 |
hexa | can we lift the fd constraint somehow in nixos tests? | 08:56:12 |
hexa | because this is really weird | 08:56:20 |
hexa | why is it sufficient on my builders, but not on hydra's when both essentially start a naked nixos in qemu | 08:56:42 |
vcunat | (I have no idea how this happens.) | 09:01:02 |
hexa | OK, will dig into it a bit once i can sit down | 09:19:00 |
vcunat | For now unblocked. Fixed t4b and got lucky scheduling it there on the first retry. But it would be nice if you addressed the fragility anyway, as restarts are needed often. | 11:54:57 |
hexa | for me locally the resource limit for nofile on the test script is 1048576 | 20:32:46 |
hexa | both soft and hard | 20:33:48 |
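For comparing builders, a minimal sketch of how the nofile limit could be read from inside a Python test script. It uses only the standard `resource` module; the helper name `nofile_limits` is made up for illustration and is not part of the NixOS test driver.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
# A value equal to resource.RLIM_INFINITY means "unlimited".
print(f"nofile soft={soft} hard={hard}")
```

Running this both on a local builder and on a Hydra builder would show whether the two environments really hand the test script different limits.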
4 Sep 2023 |
nbp | Is this an error on the emulated system or on the host? Maybe hydra has too many concurrent jobs. | 09:32:39 |
hexa | machine: must succeed: sleep 2
(finished: must succeed: sleep 2, in 2.02 seconds)
machine # base64: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 24
machine # tail: error while loading shared libraries: libcrypto.so.3: cannot open shared object file: Error 24
machine: must succeed: stat -c '%s' /tmp/last
machine # bash: line 1: /run/current-system/sw/bin/stat: Too many open files
machine: output:
Test "Check whether Firefox can play sound" failed with error: "command `stat -c '%s' /tmp/last` failed (exit code 126)"
cleanup
kill machine (pid 6)
machine # qemu-kvm: terminating on signal 15 from pid 4 (/nix/store/pkj7cgmz66assy7l18zc7j992npb41nx-python3-3.10.12/bin/python3.10)
(finished: cleanup, in 0.05 seconds)
kill vlan (pid 5)
| 13:07:44 |
hexa | could be the host, or the test runner itself | 13:08:24 |
nbp | maybe lsof would help tell them apart. | 13:10:25 |
nbp | change the test case to include the output of lsof command. | 13:11:06 |
hexa | a quick sampling with psutil reveals that qemu_kvm holds too many fds | 13:34:50 |
hexa | vm-test-run-firefox-unwrapped> (finished: waiting for the X11 server, in 17.94 seconds)
vm-test-run-firefox-unwrapped> machine: bash=4
vm-test-run-firefox-unwrapped> machine: .nixos-test-dri=13
vm-test-run-firefox-unwrapped> machine: vde_switch=6
vm-test-run-firefox-unwrapped> machine: qemu-kvm=551
| 13:34:55 |
hexa | vm-test-run-firefox-unwrapped> machine: bash=4
vm-test-run-firefox-unwrapped> machine: .nixos-test-dri=13
vm-test-run-firefox-unwrapped> machine: vde_switch=6
vm-test-run-firefox-unwrapped> machine: qemu-kvm=2006
vm-test-run-firefox-unwrapped> subtest: Check whether Firefox can play sound
| 13:35:07 |
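A rough sketch of how per-process fd counts like the ones above could be sampled. The sampling in this log was done with psutil; this version is an assumption that reads `/proc` directly to stay within the standard library (Linux only), and the function name `fd_counts_by_name` and its `proc_root` parameter are hypothetical.

```python
import os
from collections import Counter


def fd_counts_by_name(proc_root="/proc"):
    """Sum open file descriptors per process name by walking proc_root."""
    counts = Counter()
    if not os.path.isdir(proc_root):
        return counts  # no procfs available (non-Linux host)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            # "comm" holds the process name, truncated by the kernel to 15 chars
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            nfds = len(os.listdir(os.path.join(proc_root, entry, "fd")))
        except OSError:
            continue  # process exited meanwhile, or we lack permission
        counts[name] += nfds
    return counts


# Print the five biggest fd holders in the same name=count form as the log.
for name, n in fd_counts_by_name().most_common(5):
    print(f"{name}={n}")
```

Only processes whose `fd` directory is readable by the caller are counted, so an unprivileged run may under-report. Sampling this at several points in the test would show where the qemu-kvm count starts to climb.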
hexa | to me that makes it hydra's fault for constraining build jobs like that | 13:36:03 |
K900 | But why would it do that on Hydra and not on other systems | 13:37:01 |
hexa | yeah, the open question | 13:37:26 |