| 21 Oct 2025 |
Eelco | learned today that using std::string for large buffers is very inefficient (huge kernel overhead): https://github.com/DeterminateSystems/nix-src/pull/238/commits/edf45d6e1158a059e7ded5460d2a3947dc35fdf8 | 20:44:48 |
Sergei Zimmerman (xokdvium) | This is sort of more about doing unnecessary construction/destruction of an object. Not special to std::string in any way | 20:55:40 |
Eelco | it's a result of having a large contiguous allocation (so it would also affect std::vector<char>) | 20:56:21 |
Sergei Zimmerman (xokdvium) | Yeah, it's much better to allocate once and reuse that allocation as you did there | 20:56:45 |
Eelco | a data type that consists of a vector of buffers would avoid this problem | 20:57:06 |
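Below is a minimal C++ sketch of the "vector of buffers" idea, assuming fixed 64 KiB chunks. `ChunkedBuffer` and its methods are hypothetical names and not an existing Nix type. An append allocates at most one new chunk at a time and never copies earlier data, so the buffer never needs a single huge contiguous allocation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical "vector of buffers": data lives in fixed-size chunks, so
// growing it never reallocates or copies previously appended bytes.
class ChunkedBuffer {
public:
    static constexpr std::size_t kChunkSize = 64 * 1024; // arbitrary choice

    void append(std::string_view data) {
        while (!data.empty()) {
            if (chunks_.empty() || lastUsed_ == kChunkSize) {
                // `new char[]` leaves the chunk uninitialized, unlike
                // std::make_unique<char[]>, which would zero-fill it.
                chunks_.push_back(std::unique_ptr<char[]>(new char[kChunkSize]));
                lastUsed_ = 0;
            }
            std::size_t n = std::min(data.size(), kChunkSize - lastUsed_);
            std::memcpy(chunks_.back().get() + lastUsed_, data.data(), n);
            lastUsed_ += n;
            data.remove_prefix(n);
            assert(lastUsed_ <= kChunkSize);
        }
    }

    std::size_t size() const {
        return chunks_.empty() ? 0 : (chunks_.size() - 1) * kChunkSize + lastUsed_;
    }

    std::size_t chunkCount() const { return chunks_.size(); }

    // Visit the stored bytes chunk by chunk, in order (e.g. to write or hash them).
    template <typename F>
    void forEachChunk(F f) const {
        for (std::size_t i = 0; i < chunks_.size(); ++i) {
            std::size_t len = (i + 1 == chunks_.size()) ? lastUsed_ : kChunkSize;
            f(std::string_view(chunks_[i].get(), len));
        }
    }

private:
    std::vector<std::unique_ptr<char[]>> chunks_;
    std::size_t lastUsed_ = 0; // bytes used in the last chunk
};
```

Consumers that need the bytes contiguously would still have to copy them. This layout mainly suits data that is streamed out or hashed sequentially.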
Sergei Zimmerman (xokdvium) | I did a similar cleanup in filetransfer.cc at some point | 20:57:07 |
Sergei Zimmerman (xokdvium) | You'd still benefit from allocating it only once though | 20:57:23 |
Sergei Zimmerman (xokdvium) | Going through malloc/free is still a function call through PLT and doing that in a loop is kind of expensive anyway | 20:58:05 |
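The sketch below illustrates the allocate-once point under stated assumptions. `FakeSource` is a hypothetical stand-in for a real reader, and the 1 MiB buffer size is arbitrary. The first loop constructs, zero-fills and destroys a `std::string` on every pass, which costs a malloc/free pair each time. With glibc, blocks this large may also be served directly by mmap, so each pass can fault in fresh pages. The second loop allocates once and reuses the storage:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a reader (e.g. of NAR data): copies up to `cap`
// bytes into `dst` and returns how many it wrote; 0 means end of input.
struct FakeSource {
    std::size_t remaining;
    std::size_t read(char * dst, std::size_t cap) {
        assert(dst != nullptr);
        std::size_t n = remaining < cap ? remaining : cap;
        std::memset(dst, 'a', n);
        remaining -= n;
        return n;
    }
};

constexpr std::size_t kBufSize = 1 << 20; // 1 MiB, chosen for illustration

// Anti-pattern: allocates, zero-fills and frees a 1 MiB string every iteration.
std::size_t readAllFreshBuffer(FakeSource src) {
    std::size_t total = 0;
    for (;;) {
        std::string buf(kBufSize, '\0');
        std::size_t n = src.read(buf.data(), buf.size());
        if (n == 0) break;
        total += n;
    }
    return total;
}

// Preferred: one allocation up front, reused for every read.
std::size_t readAllReusedBuffer(FakeSource src) {
    std::vector<char> buf(kBufSize);
    std::size_t total = 0;
    while (std::size_t n = src.read(buf.data(), buf.size()))
        total += n;
    return total;
}
```

Both functions return the same byte count. Only the second keeps the malloc/free calls and the page-touching zero-fill out of the loop.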
Sergei Zimmerman (xokdvium) | https://llvm.org/docs/ProgrammersManual.html#vector | 20:58:58 |
Eelco | it's funny though that the optimization that skipped parsing 15 GB of NARs (https://github.com/DeterminateSystems/nix-src/pull/238/commits/1f8d587a0df8f9de366640831dade43d17021c30) had basically no observable effect | 21:01:55 |
Eelco | it's completely dwarfed by the memory allocation / page fault overhead | 21:02:07 |
Sergei Zimmerman (xokdvium) | Yeah that will do it certainly. Also having a too large buffer on the stack is bad:
https://github.com/NixOS/nix/pull/13877 | 21:03:27 |
Sergei Zimmerman (xokdvium) | The stack pointer does get decremented one page at a time. Is that the default behavior or some hardening flag? | 21:04:10 |
Eelco | surprising since stack pages should stay around once paged in | 21:07:22 |
Eelco | though there is some overhead to handle guard pages | 21:07:42 |
Eelco | so it has to touch at least 1 byte every 4096 bytes | 21:07:52 |
Sergei Zimmerman (xokdvium) | Yeah that was the overhead. A loop over all the pages | 21:08:28 |
Sergei Zimmerman (xokdvium) | 1.23 │ lea -0x10000(%rsp),%r11
0.23 │ 15: sub $0x1000,%rsp
1.01 │ orq $0x0,(%rsp)
59.12 │ cmp %r11,%rsp
0.27 │ ↑ jne 15
| 21:08:42 |
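The loop in the perf output can be modeled in a few lines of C++. The function name `probeIterations` is made up for illustration, and the model only counts iterations rather than touching real stack memory. It shows that reserving the 0x10000-byte (64 KiB) frame takes 16 probes of one 4 KiB page each. This instruction pattern matches what GCC emits with `-fstack-clash-protection`, which some toolchains enable by default. Whether that flag produced the code here is an assumption:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000; // stride of `sub $0x1000,%rsp`

// Models the listing: move the stack pointer down one page per iteration and
// touch that page, so a guard page is always hit before anything skips past it.
// Precondition: frameSize is a non-zero multiple of the page size.
std::uint64_t probeIterations(std::uint64_t rsp, std::uint64_t frameSize) {
    assert(frameSize > 0 && frameSize % kPageSize == 0);
    const std::uint64_t limit = rsp - frameSize; // `lea -0x10000(%rsp),%r11`
    std::uint64_t iterations = 0;
    do {
        rsp -= kPageSize; // `sub $0x1000,%rsp`
        ++iterations;     // stands in for the `orq $0x0,(%rsp)` page touch
    } while (rsp != limit); // `cmp %r11,%rsp` / `jne 15`
    return iterations;
}
```

The probe count grows linearly with the size of the stack buffer. When the pages are already resident, each probe is a single write to a present page.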
Eelco | right, that's to avoid a segfault if you have guard pages enabled (which I think is the default) | 21:09:13 |
Eelco | I would expect the overhead for that loop to be pretty trivial though | 21:09:31 |
Eelco | in the case where the pages are present | 21:09:52 |
| 22 Oct 2025 |
tomberek | @niksnut:matrix.org: builtins.fetchTree cannot take advantage of the "__final" optimization. This means usages of flake-compat will re-fetch inputs unnecessarily. Is there a way to expose `prim_fetchFinalTree`? This can create a large performance regression. | 15:11:06 |
Eelco | I think we should allow fetchTree { final = true; ... } | 15:11:43 |
Robert Hensing (roberth) | I get what it does but I never felt like I had a complete understanding somehow. If we were wrong about final we could always design something better without the pressure and call it fetchSource :) | 15:14:03 |
Eelco | final just means it won't add more attributes | 15:17:39 |
tomberek | @roberthensing:matrix.org: is the concern that it would be abused or ossify some behavior? | 15:26:18 |
Robert Hensing (roberth) | I guess I just expected it to be prettier | 15:26:44 |
Robert Hensing (roberth) | sometimes things just aren't, and that's ok | 15:27:35 |