8 Apr 2025 |
Leonardo Santiago | Yes, github.com/o-santi/nix-forall | 12:24:38 |
Leonardo Santiago | I took some inspiration from yours in some places hahaha, especially the read_string/read_hashmap callbacks | 12:25:26 |
Leonardo Santiago | And I think yours is much more careful when handling thread locking and GC. I don't really understand the constraints there so maybe there's something wrong I did that caused this. | 12:26:45 |
Robert Hensing (roberth) | If everything is on the main thread you're fine, and nix_value registers/deregisters itself with the GC just fine (if all is well), but the GC may not be happy if it needs to operate from a thread it doesn't know about. That includes allocation, so for all intents and purposes the whole of nix-expr should only be called from registered threads (or the main thread) | 12:29:59 |
Robert Hensing (roberth) | You'd get an error message along the lines of "trying to GC from unknown thread". I don't think it can cause corruption necessarily | 12:30:37 |
Robert Hensing (roberth) | but maybe my assumption about GC roots is wrong, and my code does rely on stack scanning regardless | 12:31:07 |
Leonardo Santiago | I think the problem may lie there: pyo3 requires all your structs to be freely movable between threads, since Python is not really a single-threaded interpreter, it just relies heavily on the GIL. I'm not doing anything multithreaded from the Python side, quite the contrary, I'm just calling state.eval_file('path').get(attr), but I can't rule out that it gets moved to another thread either | 12:32:48 |
Robert Hensing (roberth) | Thing is, you might get away with coincidentally not triggering GC in your other threads / stacks | 12:32:58 |
Leonardo Santiago | One more weird detail is that the segfault is really consistent: it happens every time I run the program, but not always at the same place. | 12:33:59 |
Leonardo Santiago | Sometimes it happens at nix::ExprSelect::eval, sometimes at nix::ExprAttr::eval; in the original message I think it happened at nix::ExprVar::eval | 12:34:32 |
Leonardo Santiago | So indeed this may be related to something that is shared/passed to all of them, most likely the EvalState itself, as I may be doing something incorrectly with it. | 12:35:21 |
Leonardo Santiago | But I don't understand what it is yet, I'll dig further | 12:35:50 |
Leonardo Santiago | => if this was a race condition I don't think it would be this reproducible, it would sometimes fail and sometimes not. | 12:36:47 |
Robert Hensing (roberth) | yeah | 12:37:53 |
Leonardo Santiago | Guess what? Setting ulimit -s unlimited made it work. | 16:29:35 |
Leonardo Santiago | It was an uncaught stack overflow. | 16:29:45 |
Leonardo Santiago | Didn't even occur to me until now. | 16:29:57 |
10 Apr 2025 |
Leonardo Santiago | @roberth how does nix circumvent this issue in its main binary? I see I can try leveraging ld's -z stack_size=X, but it only seems to work if you set it on the entry-point ELF binary, which I can't do as that's python and I'm merely offering a .so extension library. I didn't want to bleed this problem elsewhere, like forcing people to set ulimit -s unlimited, but I don't see many other ways around it, and surely nix has had to deal with this, though it is the ELF entry point. Any tips or hints? | 13:21:26 |
Leonardo Santiago | There's the possibility of spawning a new thread with an increased stack size, but that adds the context switching overhead to every nix evaluation, which is something I'd like to avoid if possible. | 13:22:44 |
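The "new thread with an increased stack size" idea can be sketched from the Python side with the standard threading module. This is only an illustration under assumptions: the helper name run_with_big_stack and the 64 MiB default are hypothetical, and in a pyo3 extension the equivalent would more likely live on the Rust side. Whether the evaluator's GC accepts being called from such a thread is the separate concern raised earlier in this conversation.

```python
import threading


def run_with_big_stack(fn, *args, stack_bytes: int = 64 * 1024 * 1024):
    """Run fn(*args) on a fresh thread created with a larger stack; return its result."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except BaseException as exc:  # hand errors back to the calling thread
            outcome["error"] = exc

    # stack_size() only affects threads created after the call and
    # returns the previously configured size (0 means platform default).
    previous = threading.stack_size(stack_bytes)
    try:
        thread = threading.Thread(target=worker)
        thread.start()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    thread.join()

    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


if __name__ == "__main__":
    print(run_with_big_stack(lambda a, b: a + b, 2, 3))
```

Every call pays for creating and joining a thread, which is the overhead mentioned above; a long-lived worker thread would amortize it at the cost of more plumbing.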
Leonardo Santiago | There's also the possibility of using setrlimit to increase the process's own stack size limit; that seems to me like the most graceful solution | 14:09:58 |
Leonardo Santiago | And indeed, that seems to be the solution nix's binary uses: src/nix/main.cc calls nix::setStackSize(64MB), which internally calls setrlimit with that value. Awesome to know, most likely I'll try going down this route. | 14:12:10 |
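For illustration, the setrlimit route can be sketched with Python's standard resource module. This is not what nix or nix-forall does internally: the helper name raise_stack_limit is hypothetical, the 64 MiB target simply mirrors the value quoted above, and the sketch assumes a Unix system where the soft limit may be raised up to the hard limit.

```python
import resource

STACK_TARGET = 64 * 1024 * 1024  # 64 MiB, assumed to mirror the value quoted above


def raise_stack_limit(target: int = STACK_TARGET) -> int:
    """Try to raise the soft RLIMIT_STACK to `target`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    unlimited = resource.RLIM_INFINITY

    if soft != unlimited and soft < target:
        # Never ask for more than the hard limit allows.
        wanted = target if hard == unlimited else min(target, hard)
        if wanted > soft:
            try:
                resource.setrlimit(resource.RLIMIT_STACK, (wanted, hard))
            except (ValueError, OSError):
                pass  # leave the limit untouched if the system refuses

    # Re-read so the caller sees what is actually in effect.
    return resource.getrlimit(resource.RLIMIT_STACK)[0]


if __name__ == "__main__":
    print(raise_stack_limit())
```

The call would need to happen before any deep evaluation starts. As far as I understand, on Linux this helps the main thread, whose stack grows on demand up to the soft limit, while threads that already exist keep the stack size they were created with.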
15 Apr 2025 |
Leonardo Santiago | I notice I may be the one bringing all the stupid problems and ideas to the chat, but would it be possible to statically link against the nix C libraries? | 15:55:05 |
Leonardo Santiago | I'm trying to optimize the performance of a custom eval cache using the C API I wrote, and a cache hit takes around 20ms. That seems very good, but knowing what it does I know it's not really that impressive: since it's only hashing some 20 files and querying a sqlite file, 20ms is actually quite poor performance. | 15:57:49 |
Leonardo Santiago | I tried profiling with perf record and most of the time (~15ms) seems to be spent in do_lookup_x, which seems to be a libc function related to finding the dynamic libraries, and there are ~66 linked libraries, most related to nix stuff like libaws-c-sdkutils.so.1.0.0, which shouldn't even be used in this case but are loaded anyway before the program starts. | 15:59:04 |
Leonardo Santiago | If I remove most of the C API usage and just compile a simple binary to query the sqlite file, it goes down to 4 linked libraries and returns in ~6ms in --release mode, which seems to hint that most of the time is indeed spent finding the libraries | 16:03:47 |
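A rough way to check how many shared objects a process has actually mapped, as in the 66 versus 4 comparison above, is to read /proc/<pid>/maps. The sketch below assumes Linux; the helper name loaded_shared_objects is hypothetical, and it only lists what is mapped, it does not measure the time spent in symbol lookup.

```python
from pathlib import Path


def loaded_shared_objects(pid="self"):
    """Return the sorted, de-duplicated paths of shared objects mapped into a process."""
    maps = Path(f"/proc/{pid}/maps")
    if not maps.is_file():  # not Linux, or no such process
        return []

    libs = set()
    for line in maps.read_text().splitlines():
        # Columns: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if path.startswith("/") and (name.endswith(".so") or ".so." in name):
            libs.add(path)
    return sorted(libs)


if __name__ == "__main__":
    libs = loaded_shared_objects()
    print(len(libs), "shared objects mapped")
    for lib in libs:
        print(" ", lib)
```

Passing the PID of the process under test instead of "self" would list its libraries, provided it is still running and readable by the current user.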