!CcTBuBritXGywOEGWJ:matrix.org

NixOS Binary Cache Self-Hosting

About how to host a very large-scale binary cache and more



5 Mar 2024
@edef1c:matrix.org edef
out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs
04:46:45
@edef1c:matrix.org edef
so not that rare
04:47:06
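As a hedged illustration of how such a split might be tallied, here is a sketch that counts status codes from access-log lines; the actual Fastly log format is not shown in this conversation, so the whitespace layout and the status-code matching below are assumptions.

```python
# Hedged sketch: tally HTTP status codes from access-log lines on stdin.
# The real Fastly log format is not shown in this chat; this assumes the
# status code appears as a standalone three-digit field somewhere in the line.
import re
import sys
from collections import Counter

STATUS = re.compile(r"(?:^|\s)([1-5]\d{2})(?:\s|$)")

counts = Counter()
for line in sys.stdin:
    m = STATUS.search(line)
    if m:
        counts[m.group(1)] += 1

total_2xx = sum(n for code, n in counts.items() if code.startswith("2"))
for code in ("200", "206"):
    share = counts[code] / total_2xx if total_2xx else 0.0
    print(f"{code}: {counts[code]} ({share:.1%} of 2xx)")
```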
@nh2:matrix.org nh2
In reply to @nh2:matrix.org
That is nice and large, should make it easy.
Sorry, I had misread that sentence: I thought you wrote "mean 16MiB, median 14MiB" for file size. But it was throughput.
04:47:48
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs
Interesting, I wonder why there are so many; at least in my Nix usage it is very rare to interrupt downloads
04:48:21
@edef1c:matrix.org edef
we have users like. everywhere
04:48:35
@nh2:matrix.org nh2
edef: Do you know the total number of files?
04:48:39
@edef1c:matrix.org edef
i've seen countries i'd never even heard of in the fastly logs
04:48:48
@edef1c:matrix.org edef
we have like ballpark a quarter billion store paths, and slightly fewer NARs than that (since complete NARs are semi content addressed)
04:49:44
@edef1c:matrix.org edef
~800M S3 objects total basically, ~190M NARs
04:51:07
@edef1c:matrix.org edef
(and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot)
04:59:15
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
~800M S3 objects total basically, ~190M NARs

This part will likely be the hardest / most annoying one operationally.
With 6 servers * 10 disks, each one will have ~13 M objects.

  • When a disk fails, 13 M seeks will need to be done, which will take 37 hours.
  • When a server fails, it'll be 10x as much, so 15 days to recovery.

During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2.

05:00:10
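A back-of-envelope check of the numbers above, as a hedged sketch; the ~100 random-read IOPS per disk is an assumption implied by the 37-hour figure, not something stated in the chat.

```python
# Back-of-envelope check of the recovery estimate above.
# Assumption (implied by the 37-hour figure, not stated outright):
# roughly 100 random-read IOPS per spinning disk, one seek per object.
objects_total = 800e6                  # ~800M S3 objects
servers, disks_per_server = 6, 10
iops_per_disk = 100                    # assumed HDD seek budget

objects_per_disk = objects_total / (servers * disks_per_server)  # ~13.3M
disk_rebuild_h = objects_per_disk / iops_per_disk / 3600          # ~37 h
server_rebuild_d = disk_rebuild_h * disks_per_server / 24         # ~15 days

print(f"{objects_per_disk / 1e6:.1f}M objects per disk")
print(f"disk rebuild:   ~{disk_rebuild_h:.0f} h")
print(f"server rebuild: ~{server_rebuild_d:.0f} days")
# If ~20% of the IOPS budget stays reserved for serving traffic
# (as mentioned further down), the disk rebuild stretches accordingly:
print(f"disk rebuild at 80% of IOPS: ~{disk_rebuild_h / 0.8:.0f} h")
```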
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
(and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot)
No problem, it's not urgent; I should really go to bed too.
05:00:23
@edef1c:matrix.org edef
In reply to @nh2:matrix.org

This part will likely be the hardest / most annoying one operationally.
With 6 servers * 10 disks, each one will have ~13 M objects.

  • When a disk fails, 13 M seeks will need to be done, which will take 37 hours.
  • When a server fails, it'll be 10x as much, so 15 days to recovery.

During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2.

okay, that's terrifying
05:00:30
@edef1c:matrix.org edef
but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply
05:01:04
@edef1c:matrix.org edef
and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget?
05:01:59
@edef1c:matrix.org edef
O(objects) anything is kind of rough here
05:02:23
@edef1c:matrix.org edef
like we're going to hit a billion objects pretty soon, the growth is slightly superlinear at minimum
05:03:04
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply
Yes, it's really only a concern for availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks.
05:03:37
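To put the redundancy trade-off in concrete terms, a small illustrative comparison of erasure-coding profiles; only the 4+2 profile comes from the conversation, the wider ones are hypothetical alternatives.

```python
# Erasure-coding trade-off in a nutshell: a k+m profile stores k data
# shards plus m parity shards, tolerates m simultaneous failures, and
# costs (k+m)/k times the raw data in storage.
profiles = [
    (4, 2),   # the EC 6=4+2 profile discussed above
    (8, 3),   # hypothetical wider profile for write-mostly backups
    (10, 4),  # hypothetical, one more parity shard of headroom
]

for k, m in profiles:
    overhead = (k + m) / k
    print(f"EC {k + m}={k}+{m}: tolerates {m} failures, "
          f"storage overhead {overhead:.2f}x")
```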
@edef1c:matrix.org edef
also note that if we start doing more serious dedup, i'll be spraying your stuff with small random I/O
05:03:42
@edef1c:matrix.org edef
In reply to @nh2:matrix.org
Yes, it's only really concerning availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks.
yeah, we have a similar problem with Glacier
05:04:14
@edef1c:matrix.org edef
where objects are costly but size is cheap
05:04:21
@edef1c:matrix.org edef
so i intend to do aggregation into larger objects
05:04:53
@edef1c:matrix.org edef
basically we can handle a lot of the read side of that by accepting that tail latencies suck and we just have a bunch of read amplification reading from larger objects and caching what's actually hot
05:05:49
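One common way to handle that read side is an offset index plus HTTP Range requests into the aggregate objects; the index layout, URLs, and hash below are purely illustrative, not a description of what is actually planned here.

```python
# Hedged sketch: fetch one NAR out of a larger aggregate object with
# an HTTP Range request. The index mapping a NAR hash to
# (aggregate URL, offset, length) is hypothetical.
import urllib.request

INDEX = {
    # narhash: (aggregate object URL, byte offset, length) -- made-up values
    "0cajx0mdm3qwy": ("https://example.org/aggregates/0001.bin", 1048576, 524288),
}

def fetch_nar(narhash: str) -> bytes:
    url, offset, length = INDEX[narhash]
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{offset + length - 1}"}
    )
    with urllib.request.urlopen(req) as resp:
        # A server that honours Range replies 206 Partial Content.
        return resp.read()
```

A small cache of hot NARs in front of this keeps most of the read amplification away from the cold aggregates, which is the tail-latency trade-off described above.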
@edef1c:matrix.org edef
i'd really like to have build timing data so we can maybe just pass on requests for things that are quick to build
05:07:05
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget?
Yes, that should be fine, because the expected mean serving req/s are only 20% of the IOPS budget (a bit more for writes)
05:07:30
@edef1c:matrix.org edef
but i'm not entirely sure how much of that data exists
05:07:33
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
so i intend to do aggregation into larger objects
This is how we also solved the many-small-files problem on our app's production Ceph. We grouped "files that live and die together": we literally put them into a 0-compression ZIP, and the web server serves them out of the zip.
That way we reduced the number of files 100x, making Ceph recoveries approximately that much faster.
05:09:46
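A minimal sketch of that pattern with Python's zipfile and ZIP_STORED; the bundle and member names are made up, and a real deployment would serve members via the web server rather than reopening the archive per request.

```python
# Bundle small files that "live and die together" into one
# zero-compression ZIP, then read a single member back.
# The bundle and member names are made up for illustration.
import zipfile

BUNDLE = "bundle-0001.zip"

# Write: ZIP_STORED keeps members byte-for-byte, so a web server can
# later read a member straight out of the archive without inflating it.
with zipfile.ZipFile(BUNDLE, "w", compression=zipfile.ZIP_STORED) as zf:
    for name in ("asset-a.json", "asset-b.json", "asset-c.json"):
        zf.writestr(name, '{"example": "' + name + '"}')

# Read one member without touching the others.
with zipfile.ZipFile(BUNDLE) as zf:
    data = zf.read("asset-b.json")
    print(data)
```

The ZIP central directory records each member's offset, so individual reads stay cheap while the object count, and with it the recovery time, drops by roughly the bundling factor.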
@edef1c:matrix.org edef
yeah
05:10:01
@edef1c:matrix.org edef
the metadata for all this is peanuts
05:10:17
@edef1c:matrix.org edef
i've built some models for the live/die together part
05:10:46


