NixOS Binary Cache Self-Hosting (158 Members, 54 Servers)
About: how to host a very large-scale binary cache and more
| Sender | Message | Time |
|---|---|---|
| 5 Mar 2024 | | |
| | note that i've given you zero information about I/O sizes so far | 04:36:36 |
| | In reply to @edef1c:matrix.org: Wouldn't Nix users usually download the whole NAR, unless they abort the download? | 04:37:10 |
| | yes and no | 04:37:25 |
| | in that more recent Nix versions will do download resumption, and Fastly will do range requests to the backend ("segmented caching" is your keyword here) | 04:37:52 |
| | so you should be viewing the backend requests as (key, start offset, end offset) triples more so than full fetches | 04:39:53 |
| | In reply to @edef1c:matrix.org: I will likely not need serving numbers; the trouble with Ceph is usually storing the small files, because its maintenance operations (integrity "scrubs", "recovery" balancing on disk failure) are O(objects = files = seeks). (For systems where the stored data is bigger than what's served per month, which is the case here at 2TB/day.) | 04:40:41 |
| | right | 04:40:56 |
| | so on S3 small objects are also costly | 04:41:10 |
| | In reply to @edef1c:matrix.org: Will Fastly always chunk up the requests into small range requests, even if the user's Nix requests the whole NAR, or only if the end user requests a range? | 04:41:58 |
| | e.g. it is a humongous pain in the rear to collect the narinfos, we basically have custom tools to rapid-fire pipelined S3 fetches | 04:42:04 |
| | In reply to @nh2:matrix.org: i don't recall right now, sorry | 04:42:13 |
| | In reply to @edef1c:matrix.org: Because that could indeed inflate the IOPS, though Ceph has readaheads of configurable size, so it could be worked around that way | 04:43:07 |
| | Fastly segmented caching docs (https://docs.fastly.com/en/guides/segmented-caching#how-segmented-caching-works) | 04:43:09 |
| | critical part being the final sentence | 04:43:23 |
| | In reply to @edef1c:matrix.org: And the beginning of the paragraph: this suggests "no range request by Nix" => "no range request by Fastly to upstream" | 04:45:12 |
| | So it should be quite a rare case | 04:45:32 |
| | out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs | 04:46:45 |
| | so not that rare | 04:47:06 |
| | In reply to @nh2:matrix.org: Sorry, I had misread that sentence: I thought you wrote "mean 16MiB, median 14MiB" for file size. But it was throughput. | 04:47:48 |
| | In reply to @edef1c:matrix.org: Interesting, I wonder why it's that many; at least in my Nix use it is very rare to interrupt downloads | 04:48:21 |
| | we have users like. everywhere | 04:48:35 |
| | edef: Do you know the total number of files? | 04:48:39 |
| | i've seen countries i'd never even heard of in the fastly logs | 04:48:48 |
| | we have like ballpark a quarter billion store paths, and slightly fewer NARs than that (since complete NARs are semi-content-addressed) | 04:49:44 |
| | ~800M S3 objects total basically, ~190M NARs | 04:51:07 |
| | (and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot) | 04:59:15 |
| | In reply to @edef1c:matrix.org: This part will likely be the hardest / most annoying one operationally. During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2. | 05:00:10 |
| | In reply to @edef1c:matrix.org: No problem, it's not urgent; I should also really go to bed. | 05:00:23 |
| | In reply to @nh2:matrix.org: okay, that's terrifying | 05:00:30 |
| | but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply | 05:01:04 |
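The point above about viewing backend requests as (key, start offset, end offset) triples can be sketched as a toy model of segmented caching. This is not Fastly's implementation; the 16 MiB segment size is an assumed value for illustration only.

```python
# Toy model of a segmented-caching CDN: one client request for a byte
# range is translated into aligned, fixed-size range requests against
# the backend. The 16 MiB segment size is an assumption, not Fastly's
# documented value.
SEGMENT = 16 * 1024 * 1024

def backend_ranges(start: int, end: int, segment: int = SEGMENT) -> list[tuple[int, int]]:
    """Return the aligned (start, end) byte ranges (inclusive) the CDN
    would fetch from the backend to cover the client range [start, end]."""
    first = start // segment
    last = end // segment
    return [(i * segment, (i + 1) * segment - 1) for i in range(first, last + 1)]

# A full-object download of a 40 MiB NAR becomes three backend fetches:
print(backend_ranges(0, 40 * 1024 * 1024 - 1))
```

This shape explains why even a client that never sends a Range header can still produce many small backend reads on a cache miss.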
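The "rapid-fire pipelined S3 fetches" for narinfos mentioned above could look roughly like the sketch below. The `fetch` function is a hypothetical stub standing in for a real S3 GET; only the `<hash>.narinfo` key layout follows the standard Nix binary-cache convention.

```python
# Sketch of concurrent narinfo collection: many small GETs in flight at
# once instead of one at a time. The fetch() body is a stub (assumption);
# the <hash>.narinfo key derivation is the binary-cache convention.
from concurrent.futures import ThreadPoolExecutor

def narinfo_key(store_path: str) -> str:
    """Derive the narinfo object key from a /nix/store path:
    /nix/store/<hash>-<name>  ->  <hash>.narinfo"""
    base = store_path.rsplit("/", 1)[-1]
    return base.split("-", 1)[0] + ".narinfo"

def fetch(key: str) -> bytes:
    # Hypothetical stand-in for an actual S3 GET (e.g. via boto3);
    # the chat doesn't show the real tooling.
    return b"StorePath: /nix/store/..."

def fetch_all_narinfos(store_paths: list[str], workers: int = 64) -> dict[str, bytes]:
    """Issue the small narinfo GETs concurrently."""
    keys = [narinfo_key(p) for p in store_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(keys, pool.map(fetch, keys)))
```

The concurrency is the whole trick here: narinfos are tiny, so latency per request dominates, and overlapping hundreds of requests is what makes bulk collection feasible.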
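The arithmetic behind the EC 6=4+2 recovery worry, spelled out: with k=4 data shards and m=2 parity shards, the pool survives any m simultaneous disk failures, but while one failed disk is still rebuilding, only m-1 further failures are tolerable.

```python
# Failure-tolerance arithmetic for a k+m erasure-coded pool, here the
# EC 6 = 4+2 layout from the chat (k=4 data shards, m=2 parity shards).
k, m = 4, 2

storage_overhead = (k + m) / k        # raw bytes stored per logical byte
max_failures = m                      # simultaneous disk losses survivable
failures_during_recovery = m - 1      # while one lost disk is rebuilding

print(storage_overhead, max_failures, failures_during_recovery)  # 1.5 2 1
```

That single remaining failure budget during a rebuild is why recovery time matters so much, and why a cold off-site copy (e.g. Glacier Deep Archive, as discussed above) is attractive as cheap insurance.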