| 22 Dec 2025 |
Ihar Hrachyshka | The QEMU VM from nixpkgs sometimes hangs on boot, with a traceback in Console that suggests an FD_SETSIZE overflow in g_poll. Both aarch64 and x86_64 emulated VMs are affected. lsof shows qemu using ~1024+ fds when that happens.
Sometimes the issue happens, sometimes not (so a workaround is rebooting the VM several times until it "settles" on a relatively low number of FDs, ~500-700).
qemu has its own "g_poll" implementation for platforms that support ppoll() that doesn't go through glib loop. AFAIU macos doesn't support it. (?)
is this a known issue? is there anything we could do about it?
| 15:09:23 |
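For context on the suspected overflow: a select()-based poll emulation stores descriptors in an fd_set, which is a fixed-size bitmap of FD_SETSIZE bits (commonly 1024), so calling FD_SET on a descriptor numbered FD_SETSIZE or higher is undefined behaviour. A minimal sketch of a guard follows, assuming a caller that wants to check a pollfd array before handing it to select(). The helper name `fds_fit_in_fd_set` is hypothetical and is not part of glib or QEMU.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: returns true if every descriptor in fds[] is small
 * enough to be stored in an fd_set (that is, below FD_SETSIZE).
 * Negative descriptors are ignored by poll(), so they always pass. */
static bool fds_fit_in_fd_set(const struct pollfd *fds, nfds_t nfds)
{
    /* A NULL array is only acceptable when it is declared empty. */
    assert(fds != NULL || nfds == 0);

    for (nfds_t i = 0; i < nfds; i++) {
        if (fds[i].fd >= FD_SETSIZE) {
            return false; /* FD_SET() on this fd would write out of bounds */
        }
    }
    return true;
}
```

A process holding ~1024+ open descriptors, as lsof reported, is exactly the situation where such a check would start returning false.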
wshtk | solved: https://github.com/nix-darwin/nix-darwin/issues/1148 | 15:26:20 |
Randy Eckenrode | As far as I can tell from the man pages, which admittedly kind of suck, all ppoll does is also let you wait on signals. As far as I can tell, g_poll does not let you do that. What is ppoll also doing in this case that one couldn’t do manually with poll? | 15:31:30 |
Ihar Hrachyshka | qemu g_poll implementation: https://github.com/qemu/qemu/blob/bb7fc1543fa45bebe7eded8115f25441a9fee76e/util/qemu-timer.c#L323-L347 | 15:33:37 |
Ihar Hrachyshka | in glib, there's this in meson
if host_system in ['windows', 'darwin']
# Poll doesn't work on devices on Windows, and macOS's poll() implementation is known to be broken
glib_conf.set('BROKEN_POLL', true)
endif
| 15:34:28 |
Randy Eckenrode | I wonder what they mean by “broken”. | 15:35:04 |
Randy Eckenrode | Sometimes “broken” means “conforms to POSIX but doesn’t do what GNU does”. | 15:35:48 |
Ihar Hrachyshka | /* The poll() emulation on OS/X doesn't handle fds=NULL, nfds=0,
* so we prefer our own poll emulation.
| 15:35:50 |
Ihar Hrachyshka | the MR that mentions tap networking broken? https://gitlab.gnome.org/GNOME/glib/-/merge_requests/2571 | 15:39:46 |
Ihar Hrachyshka | there's a "check" they use to detect a "broken" poll here: https://gitlab.gnome.org/GNOME/glib/-/commit/caecf2dda082e9c46c4157cdc10763deb8dcfc27
but afaiu it is no longer used and BROKEN_POLL is just forced on darwin. wonder if the check would pass now... | 15:40:56 |
Randy Eckenrode | According to POSIX, fds is an array. My understanding is that NULL is not a valid value for an array in C. | 15:41:55 |
Ihar Hrachyshka | the check compiled with xcode clang returns 1 | 15:42:22 |
Randy Eckenrode | https://pubs.opengroup.org/onlinepubs/9799919799/functions/poll.html | 15:42:38 |
Randy Eckenrode | ppoll is apparently part of POSIX now. No idea if or when Apple will add it. | 15:43:07 |
Ihar Hrachyshka | they could at least maybe conditionalize it. like if it's null and the user needs some non-standard behavior, go through select. otherwise...
I think main loop for qemu doesn't pass nulls there. | 15:43:45 |
Ihar Hrachyshka | it's 2024 posix so rather new | 15:43:59 |
Randy Eckenrode | Yeah. Apple is adding newer stuff, but they still only go for UNIX03 when they certify. | 15:44:53 |
Randy Eckenrode | So no guarantee, particularly if they have alternative APIs already. | 15:45:24 |
Randy Eckenrode | Does that mean it works now? | 15:46:11 |
Ihar Hrachyshka | exit(1); /* Does not work for devices -- fail */ | 15:46:34 |
Randy Eckenrode | That code is almost twenty years old. It wouldn’t be the first time Glib makes an assumption that doesn’t apply on modern Darwin. | 15:46:48 |
Randy Eckenrode | What are the expected semantics when the user provides NULL fds and nfds 0? | 15:47:44 |
Randy Eckenrode | Is it equivalent to passing an empty array? | 15:48:53 |
Randy Eckenrode | macOS supports pselect but not ppoll? | 15:51:13 |
Randy Eckenrode | Let’s look at the implementation. The signature takes a pointer, so NULL should be valid. | 15:52:54 |
Ihar Hrachyshka | one could probably make g_poll conditional: default to poll, but fall back to select if a) any device fds are passed or b) fds is null. then for most calls we would use poll. | 15:53:06 |
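The decision step of that idea could look roughly like the sketch below. It only covers choosing a path, not the select() emulation itself. The name `needs_select_fallback` is hypothetical, and treating "device fd" as "character or block device according to fstat" is an assumption, not something glib is known to do.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: decide whether a g_poll-like wrapper should use its
 * select()-based path instead of calling poll() directly. */
static bool needs_select_fallback(const struct pollfd *fds, nfds_t nfds)
{
    if (fds == NULL) {
        /* Case (b): NULL is only meaningful together with nfds == 0. */
        assert(nfds == 0);
        return true;
    }

    for (nfds_t i = 0; i < nfds; i++) {
        struct stat st;

        if (fds[i].fd < 0) {
            continue; /* poll() ignores negative descriptors */
        }
        if (fstat(fds[i].fd, &st) != 0) {
            continue; /* invalid fd: let poll() report POLLNVAL itself */
        }
        if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
            return true; /* Case (a): a device descriptor was passed */
        }
    }
    return false; /* only non-device descriptors: plain poll() is fine */
}
```

This costs one fstat per descriptor on every call, so a real implementation would probably want to cache the result per fd. The select path would also still be subject to the FD_SETSIZE limit whenever it is taken.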
Ihar Hrachyshka | that's on nixos
(ins)[nix-shell:/tmp]$ ./a.out
poll() returned: 0
(ins)[nix-shell:/tmp]$ cat test.c
#include <stdio.h>
#include <poll.h>
#include <errno.h>
#include <string.h>
int main(void) {
    int result = poll(NULL, 0, 1000);
    printf("poll() returned: %d\n", result);
    if (result == -1) {
        printf("Error: %s\n", strerror(errno));
    }
    return 0;
}
(ins)[nix-shell:/tmp]$ clang ./test.c
(ins)[nix-shell:/tmp]$ ./a.out
poll() returned: 0
| 15:53:15 |
Ihar Hrachyshka | actually it returns the same zero on darwin, so I'm not sure whether they are looking for something more than that... | 15:54:09 |