!CcTBuBritXGywOEGWJ:matrix.org

NixOS Binary Cache Self-Hosting

173 Members
About how to host a very large-scale binary cache and more
60 Servers



5 Mar 2024
@edef1c:matrix.org (edef): the metadata for all this is peanuts [05:10:17]
@edef1c:matrix.org (edef): i've built some models for the live/die together part [05:10:46]
@edef1c:matrix.org (edef): but had some data quality/enrichment stuff to resolve first and i haven't redone that analysis yet [05:11:07]
@nh2:matrix.org (nh2):

For nix store paths the annoying thing is that there's no natural "files that live together" that can be automatically deduced.
For my app, all files myfile00 through myfile99 go into myfile.zip.

So you'd have to write some index that records which store path is in which archive.

Assuming we never delete anything, the packing can be arbitrary.

[05:11:43]
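A minimal sketch of the index nh2 describes, assuming an append-only cache (nothing is ever deleted, as stated above) so packs can simply be filled in arrival order. `Packer`, `Location`, and `target_pack_size` are hypothetical names for illustration; packs are held in memory, and compression, durable storage, and persistence of the index itself are left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Location:
    pack_id: int  # which pack (archive) holds the NAR
    offset: int  # byte offset of the NAR inside that pack
    length: int  # NAR size in bytes


@dataclass
class Packer:
    """Hypothetical append-only packer: assumes store paths are never deleted."""

    target_pack_size: int
    packs: list[bytearray] = field(default_factory=list)
    index: dict[str, Location] = field(default_factory=dict)  # store path -> location

    def add(self, store_path: str, nar: bytes) -> Location:
        if store_path in self.index:
            return self.index[store_path]  # already packed, nothing to do
        # Open a new pack if there is none yet or the NAR would overflow the
        # current one; a NAR larger than target_pack_size ends up alone in a pack.
        if not self.packs or len(self.packs[-1]) + len(nar) > self.target_pack_size:
            self.packs.append(bytearray())
        pack = self.packs[-1]
        loc = Location(len(self.packs) - 1, len(pack), len(nar))
        pack.extend(nar)
        self.index[store_path] = loc
        return loc

    def read(self, store_path: str) -> bytes:
        # One index lookup, then a byte-range read from the right pack.
        loc = self.index[store_path]
        return bytes(self.packs[loc.pack_id][loc.offset : loc.offset + loc.length])


p = Packer(target_pack_size=8)
p.add("/nix/store/aaa-foo", b"hello")  # Location(pack_id=0, offset=0, length=5)
p.add("/nix/store/bbb-bar", b"world")  # does not fit in pack 0, so it starts pack 1
print(p.read("/nix/store/bbb-bar"))  # b'world'
```

Because the packing is arbitrary under the no-deletion assumption, first-fit into the current pack is enough here; grouping paths that live and die together would only change which pack `add` chooses, not the shape of the index.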
@edef1c:matrix.org (edef): [image: image.png] [05:11:54]
@edef1c:matrix.org (edef): like this chart is a little iffy bc it shouldn't have this long a left tail, i have the data cleanup now to fix it [05:12:00]
@edef1c:matrix.org (edef): but other things on my plate before i can get to that one [05:12:17]
@edef1c:matrix.org (edef): basically this is meant to model temporal locality of path references [05:12:52]
@edef1c:matrix.org (edef): unfortunately it has a time travel issue that i think should be fixed now [05:13:22]
@edef1c:matrix.org (edef): i just need to do the dance again [05:14:47]
@edef1c:matrix.org (edef): also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths? [05:15:47]
@edef1c:matrix.org (edef): (direct, not transitive) [05:16:40]
@nh2:matrix.org (nh2), in reply to @edef1c:matrix.org ("also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths?"):
you mean like "glibc is depended on by 200k packages, libpng by 10k, Ceph by 3"? [05:17:06]
@edef1c:matrix.org (edef): directionally correct yes [05:17:25]
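To make the question concrete, a sketch of counting direct (non-transitive) incoming reference edges per store path. It assumes the reference graph is already available as a mapping from each path to its direct references (for example, as listed in each path's narinfo `References` field); `incoming_degrees` and `degree_distribution` are hypothetical helpers, and the toy graph uses short made-up names rather than real store paths. Self-references are skipped, since a Nix store path can list itself among its references.

```python
from collections import Counter


def incoming_degrees(references: dict[str, list[str]]) -> Counter[str]:
    """Count direct (non-transitive) incoming reference edges per store path."""
    # Start every known path at 0 so unreferenced paths show up in the distribution.
    indeg: Counter[str] = Counter({path: 0 for path in references})
    for path, refs in references.items():
        for ref in set(refs):  # count each referrer -> ref edge once
            if ref != path:  # ignore self-references
                indeg[ref] += 1
    return indeg


def degree_distribution(indeg: Counter[str]) -> Counter[int]:
    """Map in-degree k to the number of paths with exactly k direct referrers."""
    return Counter(indeg.values())


# Toy graph with hypothetical short names instead of real store paths.
refs = {
    "glibc": ["glibc"],
    "zlib": ["glibc"],
    "libpng": ["glibc", "zlib"],
    "app": ["libpng", "zlib", "glibc"],
    "tool": ["glibc"],
}
indeg = incoming_degrees(refs)
print(indeg["glibc"], indeg["zlib"], indeg["libpng"])  # 4 2 1
print(sorted(degree_distribution(indeg).items()))  # [(0, 2), (1, 1), (2, 1), (4, 1)]
```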
@edef1c:matrix.org (edef): i have a scatterplot but it's a fun thought exercise i don't want to rob you of by posting the plot first :p [05:18:05]
@nh2:matrix.org (nh2): [image: image.png] [05:19:24]
@edef1c:matrix.org (edef): yeah p much, it's power law distributed [05:19:43]
@edef1c:matrix.org (edef): [image: image.png] [05:19:54]
@edef1c:matrix.org (edef): (this is log-log) [05:19:55]
@nh2:matrix.org (nh2): "the golf putter distribution" [05:20:50]
@edef1c:matrix.org (edef): haha [05:20:58]
@edef1c:matrix.org (edef): like you just take log on both axes and boom, you can fit a linear regression straight to it [05:21:36]
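A sketch of the fit edef describes: take the log of the in-degree k and of the number of paths with that in-degree, then fit an ordinary least-squares line. For a power law count = C·k^(-α), the slope estimates -α. `loglog_fit` is a hypothetical helper; k = 0 is dropped because log 0 is undefined. Least squares on a log-log histogram is a quick check of the shape; maximum-likelihood estimators are generally preferred when the exponent itself matters.

```python
import math


def loglog_fit(dist: dict[int, int]) -> tuple[float, float]:
    """Ordinary least-squares line through (log k, log count).

    Returns (slope, intercept); for count = C * k**(-alpha) the slope is -alpha.
    """
    # log(0) is undefined, so drop k = 0 (unreferenced paths) and empty bins.
    pts = [(math.log(k), math.log(c)) for k, c in dist.items() if k > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with k > 0")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, exactly power-law data: count = 10000 * k**-2.
slope, intercept = loglog_fit({1: 10000, 2: 2500, 5: 400, 10: 100})
print(round(slope, 6))  # -2.0
```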
@edef1c:matrix.org (edef): genuinely didn't expect the empirical distribution to come out that pretty [05:22:00]
@edef1c:matrix.org (edef): i'd like to have some nice plots on what our request distribution looks like / how paths age out of the hot set but i just haven't done much crunching of the bucket log dataset yet [05:26:47]
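One way to measure how paths age out of the hot set, sketched under assumptions: requests have already been parsed from the bucket logs into `(timestamp, key)` pairs, and a path counts as hot at time t if its most recent request falls within the preceding `window`. The function name and this fixed-window definition of "hot" are hypothetical choices for illustration, not how the cache's own analysis works.

```python
def hot_set_sizes(
    requests: list[tuple[float, str]], window: float, checkpoints: list[float]
) -> list[int]:
    """Number of distinct keys whose latest request lies in (t - window, t].

    One result per checkpoint, in ascending checkpoint order.
    """
    reqs = sorted(requests)  # replay in time order
    last_seen: dict[str, float] = {}
    i = 0
    sizes = []
    for t in sorted(checkpoints):
        # Replay every request up to and including time t.
        while i < len(reqs) and reqs[i][0] <= t:
            ts, key = reqs[i]
            last_seen[key] = ts
            i += 1
        # Keys not requested within the window have aged out of the hot set.
        sizes.append(sum(1 for ts in last_seen.values() if ts > t - window))
    return sizes


log = [(0, "nar/a"), (1, "nar/b"), (5, "nar/a"), (9, "nar/c")]
print(hot_set_sizes(log, window=3, checkpoints=[2, 6, 10]))  # [2, 1, 1]
```

Sweeping several window sizes over the same log gives a rough picture of how quickly paths drop out after their last request.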
@edef1c:matrix.org (edef): and we don't have a huge sample for that either, we only started logging those in november [05:29:01]
@edef1c:matrix.org (edef): we have fastly logs for a somewhat longer period [05:29:10]
@edef1c:matrix.org (edef): yeah we only serve like ~12M unique NARs over the entire period we have bucket logs for [05:34:31]
@edef1c:matrix.org (edef): so only like a quarter of them are in any sense remotely "live" [05:35:18]
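Taken at face value, ~12M unique NARs being about a quarter of the total implies roughly 12M / 0.25 ≈ 48M NARs in the cache; that is back-of-the-envelope arithmetic from the two messages above, not a reported figure. A sketch of the underlying count, assuming object keys have already been extracted from the bucket logs and that NARs live under a `nar/` prefix (as cache.nixos.org NAR URLs do); `live_fraction` and the prefix filter are assumptions for illustration.

```python
from collections.abc import Iterable


def live_fraction(requested_keys: Iterable[str], total_nars: int) -> tuple[int, float]:
    """Return (unique NARs requested, share of all NARs requested at least once)."""
    # Keep only NAR objects; narinfo and other keys are ignored.
    unique = {key for key in requested_keys if key.startswith("nar/")}
    return len(unique), len(unique) / total_nars


# The chat's rough figures: ~12M unique NARs served, about a quarter of the total.
served, share = 12_000_000, 0.25
print(f"implied total: ~{served / share / 1e6:.0f}M NARs")  # implied total: ~48M NARs

keys = ["nar/a.nar.xz", "nar/a.nar.xz", "nar/b.nar.xz", "a.narinfo"]
print(live_fraction(keys, total_nars=8))  # (2, 0.25)
```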
@edef1c:matrix.org (edef): though that's kinda higher than i intuitively expected [05:36:31]
Domen Kožar (@domenkozar:matrix.org) joined the room. [07:29:56]



Room Version: 10