NixOS Binary Cache Self-Hosting (175 Members, 62 Servers)

About how to host a very large-scale binary cache and more

| Message | Time |
|---|---|
| 5 Mar 2024 | |
| In reply to @edef1c:matrix.org: No problem, it's not urgent, I should also really go to bed. | 05:00:23 |
| In reply to @nh2:matrix.org: okay, that's terrifying | 05:00:30 |
| but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply | 05:01:04 | |
| and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget? | 05:01:59 | |
| O(objects) anything is kind of rough here | 05:02:23 | |
| like we're going to hit a billion objects pretty soon, the growth is slightly superlinear at minimum | 05:03:04 | |
| In reply to @edef1c:matrix.org: Yes, it's only really concerning availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks. | 05:03:37 |
| also note that if we start doing more serious dedup, i'll be spraying your stuff with small random I/O | 05:03:42 | |
In reply to @nh2:matrix.orgyeah, we have a similar problem with Glacier | 05:04:14 | |
| where objects are costly but size is cheap | 05:04:21 | |
| so i intend to do aggregation into larger objects | 05:04:53 | |
| basically we can handle a lot of the read side of that by accepting that tail latencies suck and we just have a bunch of read amplification reading from larger objects and caching what's actually hot | 05:05:49 | |
| i'd really like to have build timing data so we can maybe just pass on requests for things that are quick to build | 05:07:05 | |
| In reply to @edef1c:matrix.org: Yes, that should be fine, because the expected mean serving req/s are only 20% of the IOPS budget (a bit more for writes) | 05:07:30 |
| but i'm not entirely sure how much of that data exists | 05:07:33 | |
| In reply to @edef1c:matrix.org: This is how we also solved the many-small-files problem on our app's production Ceph. We zipped "files that live and die together" -- literally put them into a 0-compression ZIP, and the web server serves them out of the zip. That way we reduced the number of files 100x, making Ceph recoveries approximately that much faster. | 05:09:46 |
| yeah | 05:10:01 | |
| the metadata for all this is peanuts | 05:10:17 | |
| i've built some models for the live/die together part | 05:10:46 | |
| but had some data quality/enrichment stuff to resolve first and i haven't redone that analysis yet | 05:11:07 | |
| For nix store paths the annoying thing is that there's no natural "files that live together" grouping that can be automatically deduced. So you'd have to write some index that says which archive each store path is in. Assuming we never delete anything, the packing can be arbitrary. | 05:11:43 |
| [image attachment: image.png] | 05:11:54 |
| like this chart is a little iffy bc it shouldn't have this long a left tail, i have the data cleanup now to fix it | 05:12:00 | |
| but other things on my plate before i can get to that one | 05:12:17 | |
| basically this is meant to model temporal locality of path references | 05:12:52 | |
| unfortunately it has a time travel issue that i think should be fixed now | 05:13:22 | |
| i just need to do the dance again | 05:14:47 | |
| also, just for the bit: what's your best guess on the distribution of number-of-incoming-reference-edges for paths? | 05:15:47 | |
| (direct, not transitive) | 05:16:40 | |
| In reply to @edef1c:matrix.org: you mean like "glibc is depended on by 200k packages, libpng by 10k, Ceph by 3"? | 05:17:06 |
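
The 05:09:46 message describes packing "files that live and die together" into a 0-compression ZIP and serving individual files straight out of the archive. A minimal Python sketch of that idea, using only the standard library; the paths, archive names, and grouping policy here are placeholders, not the actual production setup:

```python
import zipfile

# Pack a group of related small files into a single uncompressed (STORED)
# ZIP archive, so the object store / filesystem sees one large object
# instead of many small ones.
def pack_group(archive_path, file_paths):
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in file_paths:
            zf.write(path, arcname=path.lstrip("/"))

# Serve a single member back out of the archive without extracting the
# whole thing; with ZIP_STORED the member's bytes are a contiguous range,
# so this is essentially one seek plus one sequential read.
def read_member(archive_path, member_name):
    with zipfile.ZipFile(archive_path, "r") as zf:
        with zf.open(member_name) as f:
            return f.read()

# Example (hypothetical paths):
# pack_group("group-0001.zip", ["/data/objs/a", "/data/objs/b"])
# data = read_member("group-0001.zip", "data/objs/a")
```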
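The 05:11:43 message notes that for Nix store paths there is no natural grouping, so an explicit index has to record which archive each store path was packed into. A small sketch of such an index, assuming append-only packs (nothing is ever deleted); the SQLite schema and names are made up for illustration:

```python
import sqlite3

# Append-only index: which pack archive contains which store path.
# Because packs are never rewritten or deleted, the mapping only grows.
def open_index(db_path):
    con = sqlite3.connect(db_path)
    con.execute("""
        CREATE TABLE IF NOT EXISTS pack_index (
            store_path TEXT PRIMARY KEY,  -- e.g. /nix/store/<hash>-<name>
            pack_name  TEXT NOT NULL      -- archive the path was packed into
        )
    """)
    return con

def record_packing(con, pack_name, store_paths):
    con.executemany(
        "INSERT OR REPLACE INTO pack_index (store_path, pack_name) VALUES (?, ?)",
        [(p, pack_name) for p in store_paths],
    )
    con.commit()

def lookup(con, store_path):
    row = con.execute(
        "SELECT pack_name FROM pack_index WHERE store_path = ?", (store_path,)
    ).fetchone()
    return row[0] if row else None
```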
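The question at 05:15:47 is about the distribution of direct (non-transitive) incoming reference edges per store path. One rough way to estimate this on a local store is to walk a closure with `nix-store --query` and count in-degrees; a sketch under that assumption, with the root path left as a placeholder:

```python
import subprocess
from collections import Counter

# Count direct incoming reference edges (in-degree) for every path in the
# closure of a root store path, using the local Nix store's metadata.
# Referrers outside this closure are not counted.
def in_degree_distribution(root):
    closure = subprocess.run(
        ["nix-store", "--query", "--requisites", root],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    in_degree = Counter()
    for path in closure:
        refs = subprocess.run(
            ["nix-store", "--query", "--references", path],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        for ref in refs:
            if ref != path:          # ignore self-references
                in_degree[ref] += 1

    # Histogram: how many paths have a given number of direct referrers.
    return Counter(in_degree.values())

# Example (placeholder root):
# print(in_degree_distribution("/run/current-system"))
```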