NixOS Binary Cache Self-Hosting (159 Members, 54 Servers)
About how to host a very large-scale binary cache and more
| Sender | Message | Time |
|---|---|---|
| 5 Mar 2024 | | |
| | out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs | 04:46:45 |
| | so not that rare | 04:47:06 |
| | In reply to @nh2:matrix.org: Sorry, I had misread that sentence: I thought you wrote "mean 16MiB, median 14MiB" for file size. But it was throughput. | 04:47:48 |
| | In reply to @edef1c:matrix.org: Interesting, I wonder why it's that many, at least in my nix use it is very rare to interrupt downloads | 04:48:21 |
| | we have users like. everywhere | 04:48:35 |
| | edef: Do you know total number of files? | 04:48:39 |
| | i've seen countries i'd never even heard of in the fastly logs | 04:48:48 |
| | we have like ballpark a quarter billion store paths, and slightly fewer NARs than that (since complete NARs are semi content addressed) | 04:49:44 |
| | ~800M S3 objects total basically, ~190M NARs | 04:51:07 |
| | (and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot) | 04:59:15 |
| | In reply to @edef1c:matrix.org: This part will likely be the hardest / most annoying one operationally. During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2. | 05:00:10 |
| | In reply to @edef1c:matrix.org: No problem, it's not urgent, I should also really go to bed. | 05:00:23 |
| | In reply to @nh2:matrix.org: okay, that's terrifying | 05:00:30 |
| | but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply | 05:01:04 |
| | and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget? | 05:01:59 |
| | O(objects) anything is kind of rough here | 05:02:23 |
| | like we're going to hit a billion objects pretty soon, the growth is slightly superlinear at minimum | 05:03:04 |
| | In reply to @edef1c:matrix.org: Yes, it only really concerns availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks. | 05:03:37 |
| | also note that if we start doing more serious dedup, i'll be spraying your stuff with small random I/O | 05:03:42 |
| | In reply to @nh2:matrix.org: yeah, we have a similar problem with Glacier | 05:04:14 |
| | where objects are costly but size is cheap | 05:04:21 |
| | so i intend to do aggregation into larger objects | 05:04:53 |
| | basically we can handle a lot of the read side of that by accepting that tail latencies suck and we just have a bunch of read amplification reading from larger objects and caching what's actually hot | 05:05:49 |
| | i'd really like to have build timing data so we can maybe just pass on requests for things that are quick to build | 05:07:05 |
| | In reply to @edef1c:matrix.org: Yes, that should be fine, because the expected mean serving req/s are only 20% of the IOPS budget (a bit more for writes) | 05:07:30 |
| | but i'm not entirely sure how much of that data exists | 05:07:33 |
| | In reply to @edef1c:matrix.org: This is how we also solved the many-small-files problem on our app's production Ceph. We zipped "files that live and die together" -- literally put them into a 0-compression ZIP, and the web server serves them out of the zip. That way we reduced the number of files 100x, making Ceph recoveries approximately that much faster. | 05:09:46 |
| | yeah | 05:10:01 |
| | the metadata for all this is peanuts | 05:10:17 |
| | i've built some models for the live/die together part | 05:10:46 |
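
The EC 6=4+2 point in the exchange above comes down to simple arithmetic: with k=4 data shards and m=2 parity shards, any 2 of the 6 shards may be lost, but while a failed disk is still resilvering one of those losses is already spent. A minimal sketch of that arithmetic (generic erasure-coding reasoning, not tied to any particular Ceph pool configuration):

```python
def failures_tolerated(k: int, m: int, currently_failed: int) -> int:
    """Additional simultaneous failures an EC(k+m) pool can survive.

    k only names the profile; the tolerance depends on the parity count m.
    """
    assert currently_failed <= m, "more losses than parity: data unrecoverable"
    return m - currently_failed

print(failures_tolerated(k=4, m=2, currently_failed=0))  # 2: healthy pool
print(failures_tolerated(k=4, m=2, currently_failed=1))  # 1: while one disk rebuilds
```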
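The aggregation plan edef describes (pack NARs into larger objects, accept some read amplification, cache what's actually hot) could look roughly like the sketch below. The index mapping a NAR hash to an offset and length inside an aggregate object is assumed to exist; the URL, hash, and layout are made-up placeholders, not the actual cache.nixos.org setup.

```python
from functools import lru_cache
import urllib.request

# narhash -> (aggregate object URL, byte offset, length); illustrative values only.
INDEX = {
    "abc123": ("https://example-bucket.s3.amazonaws.com/aggregates/000001", 0, 4096),
}

@lru_cache(maxsize=1024)  # keep the genuinely hot members in memory
def fetch_nar(narhash: str) -> bytes:
    url, offset, length = INDEX[narhash]
    req = urllib.request.Request(url)
    # Fetch only the slice we need out of the larger object; whatever the
    # backend rounds this up to internally is the read amplification. Range
    # requests are also where 206 Partial Content responses come from.
    req.add_header("Range", f"bytes={offset}-{offset + length - 1}")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```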
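And a minimal sketch of nh2's zero-compression ZIP trick: files that live and die together are packed into one ZIP_STORED archive, so Ceph sees one object instead of hundreds while individual members remain servable. The directory layout and grouping here are hypothetical.

```python
import zipfile
from pathlib import Path

def pack_group(group_dir: Path, archive_path: Path) -> None:
    """Pack every file under group_dir into a single uncompressed ZIP."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in sorted(group_dir.rglob("*")):
            if f.is_file():
                zf.write(f, arcname=str(f.relative_to(group_dir)))

def read_member(archive_path: Path, member: str) -> bytes:
    """What the web server does per request: pull one member out of the ZIP."""
    with zipfile.ZipFile(archive_path) as zf:
        with zf.open(member) as fh:
            return fh.read()
```

Because ZIP_STORED keeps each member's bytes uncompressed and contiguous, a server that knows a member's offset can also hand it out as a plain byte range, which is what keeps per-request overhead low while cutting the object count roughly 100x.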