Nix Hackers | 954 Members | |
| For people hacking on the Nix package manager itself | 203 Servers |
| Sender | Message | Time |
|---|---|---|
| 9 Aug 2021 | ||
| manveru: what do you save this way? | 12:13:14 | |
| * manveru: what do you safe this way? | 12:13:22 | |
| Ah constructing a string. | 12:13:37 | |
| yeah :) | 12:13:43 | |
| would need to bench to see if it's worth it... just stumbled over it when reading the code | 12:14:08 | |
| I was also trying to optimize the very same code yesterday | 12:14:25 | |
| There are different hash table implementations one can use. | 12:14:41 | |
| llvm also has a hash set specific to symbols that allocates strings within the hash table to save memory. | 12:15:08 | |
| I guess a count should be cheaper than a malloc but maybe you can do the benchmark. | 12:16:08 | |
| well, i'd imagine a count won't allocate anything in the GC, so it should help for stuff like haskell.nix which has a ton of symbols afaik... | 12:20:46 | |
Why count over find? Do you fear the allocation an allocator might produce? count could probably be more costly than find that (should?) return on the first positive match. | 12:40:15 | |
| There are also other interesting hash tables. For example for pointer to pointer maps, a dense hash map is better. | 14:29:57 | |
| We probably want to store only the ptr but sort/bucket them by a hash for the string value to make lookups fast(er)? | 15:54:12 | |
| 15:59:41 | ||
| I did a very unscientific benchmark of one version that uses unique_ptr and allocates only for new elements (AFAICT) and one that is basically what we have right now: https://quick-bench.com/q/BI8_-Nfg9dYrGBEwKR-0xm1QKw4 | 17:19:38 | |
Download BI8_-Nfg9dYrGBEwKR-0xm1QKw4.png | 17:19:54 | |
| If you move the constructor / destructor of the table out of the loop it is 200 / 1100 in favor of the ptr approach. | 17:23:23 | |
| The sad story is that this doesn't really seem to have an effect in practice :( It is +- 0.02 sec difference when evaling the Firefox test. | 17:47:32 | |
| andi-: i'd love to try that with something a bit more GC heavy to see memory impact... i didn't expect much CPU difference | 18:54:27 | |
| But those Symbols aren't tracked by the GC? Otherwise we should set an allocator for them. | 18:55:10 | |
| i don't know how C++ works unfortunately :( | 18:55:47 | |
| They are allocated once and stay there until the process terminates | 18:56:16 | |
| my assumption was that this std::string thing does an allocation that needs to be cleaned up again afterwards | 18:56:26 | |
| I has to be cleaned up but that happens then the entire symboltable goes out of scope | 18:57:01 | |
With C++20 (and GCC 11) we could make use of Heterogeneous lookup for unordered containers (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r3.html) for the lookup using the std::string_view... so far that isn't possible without huge effort AFAICT. | 19:53:17 | |
In reply to @manveru:matrix.orgeven c++ programmers don't know how it works ;-) | 22:46:36 | |
| I am trying to compile
Feels like the | 23:52:00 | |
| 10 Aug 2021 | ||
| 05:51:17 | ||
| What is the point of the GitHub fetcher? I tested the speed difference compared to the Git fetcher and it wasn't so different. | 12:34:16 | |
| I think it performs much worse for big repos | 12:35:20 | |