| 5 Mar 2024 |
edef | the metadata for all this is peanuts | 05:10:17 |
edef | i've built some models for the live/die together part | 05:10:46 |
edef | but had some data quality/enrichment stuff to resolve first and i haven't redone that analysis yet | 05:11:07 |
nh2 | For nix store paths the annoying thing is that there's no natural "files that live together" grouping that can be automatically deduced. For my app, all files myfile00 through myfile99 go into myfile.zip.
So you'd have to write some index that says which archive each store path is in.
Assuming we never delete anything, the packing can be arbitrary.
| 05:11:43 |
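A minimal sketch of the index nh2 describes, assuming nothing is ever deleted so the packing can be arbitrary. The function names, the archive naming scheme, and the store paths in the usage note are hypothetical and not taken from any existing tool.

```python
from typing import Dict, Iterable


def pack_paths(store_paths: Iterable[str], per_archive: int) -> Dict[str, str]:
    """Assign store paths to archives and return the lookup index.

    The packing is arbitrary: paths are sorted and cut into fixed-size
    groups. The result maps each store path to its (hypothetical) archive name.
    """
    if per_archive < 1:
        raise ValueError("per_archive must be at least 1")
    index: Dict[str, str] = {}
    for i, path in enumerate(sorted(store_paths)):
        # Integer division groups consecutive paths into the same archive.
        index[path] = f"archive-{i // per_archive:06d}.zip"
    return index


def archive_for(index: Dict[str, str], store_path: str) -> str:
    """Look up which archive holds a given store path."""
    return index[store_path]
```

For example, `pack_paths(["/nix/store/aaa-foo", "/nix/store/bbb-bar", "/nix/store/ccc-baz"], 2)` puts the first two made-up paths in one archive and the third in the next; `archive_for` then answers the "which archive is this path in" question with a single dictionary lookup.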
edef | [image: image.png] | 05:11:54 |
edef | like this chart is a little iffy bc it shouldn't have this long a left tail, i have the data cleanup now to fix it | 05:12:00 |
edef | but other things on my plate before i can get to that one | 05:12:17 |
edef | basically this is meant to model temporal locality of path references | 05:12:52 |
edef | unfortunately it has a time travel issue that i think should be fixed now | 05:13:22 |
edef | i just need to do the dance again | 05:14:47 |
edef | also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths? | 05:15:47 |
edef | (direct, not transitive) | 05:16:40 |
nh2 | In reply to @edef1c:matrix.org: you mean like "glibc is depended on by 200k packages, libpng by 10k, Ceph by 3"? | 05:17:06 |
edef | directionally correct yes | 05:17:25 |
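As a rough illustration of the quantity being asked about, the sketch below counts direct (not transitive) incoming reference edges per path. The input shape (a mapping from each path to the paths it references) and the choice to ignore self-references are assumptions for the example, not a description of how edef computed it.

```python
from collections import Counter
from typing import Dict, Iterable


def incoming_reference_counts(references: Dict[str, Iterable[str]]) -> Counter:
    """Count direct incoming reference edges for every path.

    `references` maps a path to the paths it directly references.
    Self-references are skipped (an assumption made for this sketch).
    """
    counts: Counter = Counter()
    for path, refs in references.items():
        counts[path] += 0  # make sure unreferenced paths appear with 0
        for ref in set(refs):  # de-duplicate repeated edges
            if ref != path:
                counts[ref] += 1
    return counts
```

With a toy graph such as `{"app": ["glibc", "libpng"], "libpng": ["glibc"], "glibc": ["glibc"]}`, `glibc` ends up with two incoming edges, `libpng` with one, and `app` with none.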
edef | i have a scatterplot but it's a fun thought exercise i don't want to rob you of by posting the plot first :p | 05:18:05 |
nh2 | [image: image.png] | 05:19:24 |
edef | yeah p much, it's power law distributed | 05:19:43 |
edef | [image: image.png] | 05:19:54 |
edef | (this is log-log) | 05:19:55 |
nh2 | "the golf putter distribution" | 05:20:50 |
edef | haha | 05:20:58 |
edef | like you just take log on both axes and boom, you can fit a linear regression straight to it | 05:21:36 |
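The "take log on both axes and fit a line" step could look roughly like this. It is a sketch under assumptions: the x axis is taken to be the number of incoming references and the y axis the number of paths with that count (the chat does not say what the plot's axes are), and the fit is plain least squares on the log-transformed points. This is a quick visual fit, not a rigorous power-law estimator.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def fit_power_law(in_degrees: Iterable[int]) -> Tuple[float, float]:
    """Fit log10(frequency) = slope * log10(degree) + intercept.

    `in_degrees` holds one incoming-reference count per path. Zero counts
    are dropped because log(0) is undefined. Needs at least two distinct
    positive degrees.
    """
    freq = Counter(d for d in in_degrees if d > 0)
    if len(freq) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log10(d) for d in freq]
    ys = [math.log10(n) for n in freq.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares on the log-transformed points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

On a power-law distribution the points fall close to a straight line in log-log space, and the slope of that line is the (negative) exponent.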
edef | genuinely didn't expect the empirical distribution to come out that pretty | 05:22:00 |
edef | i'd like to have some nice plots on what our request distribution looks like / how paths age out of the hot set but i just haven't done much crunching of the bucket log dataset yet | 05:26:47 |
edef | and we don't have a huge sample for that either, we only started logging those in november | 05:29:01 |
edef | we have fastly logs for a somewhat longer period | 05:29:10 |
edef | yeah we only serve like ~12M unique NARs over the entire period we have bucket logs for | 05:34:31 |
edef | so only like a quarter of them are in any sense remotely "live" | 05:35:18 |
edef | though that's kinda higher than i intuitively expected | 05:36:31 |
| Domen Kožar joined the room. | 07:29:56 |