4 Sep 2023 |
hexa | could be the host, or the test runner itself | 13:08:24 |
nbp | maybe lsof would help tell them apart. | 13:10:25 |
nbp | change the test case to include the output of the lsof command. | 13:11:06 |
hexa | a quick sampling with psutil reveals that qemu-kvm holds too many fds | 13:34:50 |
hexa | vm-test-run-firefox-unwrapped> (finished: waiting for the X11 server, in 17.94 seconds)
vm-test-run-firefox-unwrapped> machine: bash=4
vm-test-run-firefox-unwrapped> machine: .nixos-test-dri=13
vm-test-run-firefox-unwrapped> machine: vde_switch=6
vm-test-run-firefox-unwrapped> machine: qemu-kvm=551
| 13:34:55 |
hexa | vm-test-run-firefox-unwrapped> machine: bash=4
vm-test-run-firefox-unwrapped> machine: .nixos-test-dri=13
vm-test-run-firefox-unwrapped> machine: vde_switch=6
vm-test-run-firefox-unwrapped> machine: qemu-kvm=2006
vm-test-run-firefox-unwrapped> subtest: Check whether Firefox can play sound
| 13:35:07 |
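[editor's note] The per-process fd tallies above can be reproduced with psutil (`Process.num_fds()`), or, without third-party packages, by walking `/proc` directly. A minimal stdlib sketch of the same sampling; the function name is my own:

```python
import os
from collections import Counter

def fd_counts_by_name(proc_root="/proc"):
    """Count open file descriptors per process name by walking /proc,
    like the per-process sampling shown above (psutil's num_fds())."""
    counts = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                name = f.read().strip()
            nfds = len(os.listdir(os.path.join(proc_root, pid, "fd")))
        except (PermissionError, FileNotFoundError, ProcessLookupError):
            continue  # process exited, or its fd dir is not readable by us
        counts[name] += nfds
    return counts
```

Run as root this sees every process; unprivileged it only counts fds of processes it may inspect.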
hexa | to me that makes it hydra's fault for constraining build jobs like that | 13:36:03 |
K900 | But why would it do that on Hydra and not on other systems | 13:37:01 |
hexa | yeah, the open question | 13:37:26 |
hexa | ajs124: maybe something hydra does? | 13:38:01 |
ajs124 | don't think that's a hydra thing. more like some strange config on the hydra build nodes. | 13:38:56 |
hexa | yeah, trying to find that config as we speak | 13:39:19 |
hexa | I think we're using https://github.com/DeterminateSystems/nix-netboot-serve to serve netboot images | 13:40:15 |
hexa | runs on eris apparently | 13:40:46 |
hexa | wondering if our runner configs are private? | 14:03:42 |
hexa | or state on eris even | 14:03:45 |
hexa | the nix-netboot-serve config is too minimal | 14:05:49 |
hexa | https://github.com/NixOS/equinix-metal-builders/blob/main/modules/nix.nix#L34 | 14:22:06 |
hexa | there is a hard fdlimit on the nix-daemon | 14:22:18 |
vcunat | A million (per process) sounds quite a lot. | 14:42:54 |
vcunat | Unless some bad leak happens. Maybe it's more likely that it's stuck on a low soft limit or that it doesn't propagate as we'd expect. | 14:43:44 |
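[editor's note] Whether a low soft limit is "stuck" or failed to propagate can be checked on a running process (e.g. the nix-daemon) via `/proc/<pid>/limits`. A small sketch, with a helper name of my own:

```python
def nofile_limits(pid="self"):
    """Return (soft, hard) RLIMIT_NOFILE of a running process as strings,
    read from /proc/<pid>/limits (values may be 'unlimited')."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            # line looks like: "Max open files  <soft>  <hard>  files"
            if line.startswith("Max open files"):
                parts = line.split()
                return parts[3], parts[4]
    return None
```

Pointing `pid` at the daemon's PID shows the limits it actually inherited, which is exactly the propagation question raised here.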
nbp | I wish we could have a wireguard-boot, where a boot image would connect over wireguard to download its latest image. That way it could work without having to redo the network's DHCP. | 15:11:38 |
hexa |
Nowadays, the hard limit defaults to 524288, a very high value compared to historical defaults. Typically applications should increase their soft limit to the hard limit on their own, if they are OK with working with file descriptors above 1023, i.e. do not use select(2).
| 15:12:14 |
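[editor's note] The guidance quoted above ("applications should increase their soft limit to the hard limit on their own") is a few lines in practice. A minimal sketch using the stdlib `resource` module:

```python
import resource

# Read the current soft and hard RLIMIT_NOFILE of this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# An application that is fine with fds above 1023 (i.e. does not use
# select(2)) can raise its own soft limit up to the hard limit; raising
# the hard limit itself would require privileges, this does not.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```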
hexa | I think knowing what number of open fds we're having on the builders would be an easy first step | 15:21:40 |
K900 | (replying to vcunat: "A million (per process) sounds quite a lot.") It's not per process though | 15:22:43 |
K900 | It's per cgroup | 15:22:46 |
K900 | And everything is in the cgroup | 15:22:53 |
vcunat | Can you point me to docs about that? | 15:27:52 |
K900 | Uh, it's in systemd docs somewhere | 15:28:11 |
vcunat | I really thought that setrlimit is per-process and I can't quickly find a reference. | 15:28:15 |
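[editor's note] For reference, setrlimit(2) does act on the calling process; children inherit a copy of the limits at fork time, and changes in one process do not affect another. A quick sketch demonstrating that:

```python
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

pid = os.fork()
if pid == 0:
    # Child: lower our own soft limit; this cannot touch the parent,
    # because the child got its own copy of the rlimits at fork.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(256, hard), hard))
    os._exit(0)

os.waitpid(pid, 0)
# Parent's limits are unchanged by whatever the child did.
assert resource.getrlimit(resource.RLIMIT_NOFILE) == (soft, hard)
```

This is the classic per-process semantics; how systemd's LimitNOFILE= and cgroup accounting interact with it is the separate question being debated here.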