NixOS Binary Cache Self-Hosting

About: how to host a very large-scale binary cache and more


5 Mar 2024
In reply to @nh2:matrix.org
Yes, it's only really concerning availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks.
edef: yeah, we have a similar problem with Glacier
edef (05:04:21): where objects are costly but size is cheap
edef (05:04:53): so i intend to do aggregation into larger objects
edef (05:05:49): basically we can handle a lot of the read side of that by accepting that tail latencies suck and we just have a bunch of read amplification reading from larger objects and caching what's actually hot
edef (05:07:05): i'd really like to have build timing data so we can maybe just pass on requests for things that are quick to build
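The read-amplification-plus-caching idea edef describes can be sketched in a few lines: every miss fetches a whole aggregate object just to serve one small blob, and an LRU cache keeps the hot aggregates in memory. This is a minimal illustration with made-up names (`AggregateCache`, `pack-a`, etc.), not any real cache's API:

```python
from collections import OrderedDict

class AggregateCache:
    """Serve small blobs out of large aggregate objects, caching hot aggregates."""

    def __init__(self, fetch, capacity=4):
        self.fetch = fetch          # fetch(object_id) -> bytes of the whole aggregate
        self.capacity = capacity    # how many whole aggregates to keep in memory
        self.cache = OrderedDict()  # object_id -> bytes, in LRU order

    def read_blob(self, object_id, offset, length):
        data = self.cache.get(object_id)
        if data is None:
            # Miss: read the entire aggregate (this is the read amplification).
            data = self.fetch(object_id)
            self.cache[object_id] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        else:
            self.cache.move_to_end(object_id)   # mark as recently used
        return data[offset:offset + length]

# Demo backing store: two aggregates with known bytes (purely illustrative).
AGGREGATES = {"pack-a": b"abcdefghij", "pack-b": b"0123456789"}
fetched = []

def fetch(object_id):
    fetched.append(object_id)  # track the real (amplified) reads
    return AGGREGATES[object_id]

cache = AggregateCache(fetch, capacity=1)
```

Cold reads pay the full-object cost; repeat reads of a hot aggregate are served from memory, which matches the "cache what's actually hot" part of the plan.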
In reply to @edef1c:matrix.org
and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget?
nh2: Yes, that should be fine, because the expected mean serving req/s are only 20% of the IOPS budget (a bit more for writes)
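The headroom claim is simple arithmetic. In this sketch only the 20% ratio comes from the message; the absolute IOPS budget is an assumed placeholder:

```python
# Assumed numbers: only the 20% serving ratio is from the conversation;
# the 10,000 IOPS budget is a made-up placeholder.
iops_budget = 10_000
serving = 0.20 * iops_budget          # mean serving load (reads)
resilver_headroom = iops_budget - serving

# 80% of the budget is left over for resilvering/recovery traffic.
headroom_fraction = resilver_headroom / iops_budget
```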
edef (05:07:33): but i'm not entirely sure how much of that data exists
In reply to @edef1c:matrix.org
so i intend to do aggregation into larger objects
nh2: This is how we also solved the many-small-files problem on our app's production Ceph. We grouped "files that live and die together" -- literally put them into a 0-compression ZIP, and the web server serves them out of the zip.
That way we reduced the number of files 100x, making Ceph recoveries approximately that much faster.
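The zero-compression ZIP trick is a few lines with Python's standard `zipfile` module (`ZIP_STORED` means members are stored uncompressed, so they can be served byte-for-byte out of the archive); file names here are illustrative:

```python
import io
import zipfile

# Pack 100 "files that live and die together" into one 0-compression ZIP.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as z:
    for i in range(100):
        z.writestr(f"myfile{i:02d}", b"contents of file %d" % i)

# A web server can later serve individual members straight out of the archive:
with zipfile.ZipFile(buf) as z:
    member = z.read("myfile42")
```

With `ZIP_STORED` there is no decompression cost on reads, and the underlying object store sees one file instead of a hundred, which is what speeds up Ceph recovery.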
edef (05:10:17): the metadata for all this is peanuts
edef (05:10:46): i've built some models for the live/die together part
edef (05:11:07): but had some data quality/enrichment stuff to resolve first and i haven't redone that analysis yet

nh2: For Nix store paths the annoying thing is that there's no natural "files that live together" grouping that can be automatically deduced.
For my app, all files myfile00 through myfile99 go into myfile.zip.

So you'd have to write some index that says which archive each store path is in.

Assuming we never delete anything, the packing can be arbitrary.
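Such an index can be a single table mapping store path to archive and member name. This is a hypothetical sketch (the schema and names are made up, not an existing Nix or cache format); since nothing is deleted, each path is written exactly once:

```python
import sqlite3

# Hypothetical index: which aggregate archive holds which store path.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pack_index ("
    "  store_path TEXT PRIMARY KEY,"  # e.g. /nix/store/<hash>-<name>
    "  archive    TEXT NOT NULL,"     # which large aggregate object holds it
    "  member     TEXT NOT NULL"      # member name inside that archive
    ")"
)

def record_packing(store_path, archive, member):
    # Append-only: paths are never repacked or deleted, so INSERT suffices.
    db.execute("INSERT INTO pack_index VALUES (?, ?, ?)",
               (store_path, archive, member))

def locate(store_path):
    row = db.execute(
        "SELECT archive, member FROM pack_index WHERE store_path = ?",
        (store_path,),
    ).fetchone()
    return row  # (archive, member), or None if the path isn't packed

# Since the packing can be arbitrary, any assignment of paths to archives works:
record_packing("/nix/store/abc123-glibc-2.38", "pack-0001.zip",
               "abc123-glibc-2.38.nar")
```

On a cache hit the server looks up `(archive, member)`, then reads that member out of the aggregate object.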

[image attachment: chart]
edef (05:12:00): like this chart is a little iffy bc it shouldn't have this long a left tail, i have the data cleanup now to fix it
edef (05:12:17): but other things on my plate before i can get to that one
edef (05:12:52): basically this is meant to model temporal locality of path references
edef (05:13:22): unfortunately it has a time travel issue that i think should be fixed now
edef (05:14:47): i just need to do the dance again
edef (05:15:47): also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths?
edef (05:16:40): (direct, not transitive)
In reply to @edef1c:matrix.org
also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths?
nh2: you mean like "glibc is depended on by 200k packages, libpng by 10k, Ceph by 3"?
edef (05:17:25): directionally correct yes
edef (05:18:05): i have a scatterplot but it's a fun thought exercise i don't want to rob you of by posting the plot first :p
[image attachment]
edef (05:19:43): yeah p much, it's power law distributed
[image attachment]
edef (05:19:55): (this is log-log)
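A power-law in-degree distribution like this can be reproduced with a toy preferential-attachment model: each new "path" depends on a few existing paths, chosen with probability proportional to their current in-degree. This is only an illustration of the mechanism, not the actual Nix reference graph:

```python
import random
from collections import Counter

random.seed(0)

# Toy model: 3000 paths, each new one picks ~3 dependencies among existing
# paths with probability proportional to (in-degree + 1). "Rich get richer"
# is one classic way a power-law in-degree distribution arises.
paths = [0, 1]
in_deg = Counter()
for new_path in range(2, 3000):
    weights = [in_deg[p] + 1 for p in paths]
    deps = set(random.choices(paths, weights=weights, k=3))  # dedupe repeats
    for dep in deps:
        in_deg[dep] += 1
    paths.append(new_path)

degrees = sorted(in_deg.values(), reverse=True)
# A handful of glibc-like hubs collect most incoming edges,
# while the typical path has only a couple of referrers.
top_degree = degrees[0]
median_degree = degrees[len(degrees) // 2]
```

On a log-log plot of (in-degree, count of paths with that in-degree), this kind of distribution appears roughly as a straight line with a long flat tail of hubs, which is consistent with the scatterplot shape described above.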
nh2 (05:20:50): "the golf putter distribution"

