!CcTBuBritXGywOEGWJ:matrix.org

NixOS Binary Cache Self-Hosting

About how to host a very large-scale binary cache and more



5 Mar 2024
@edef1c:matrix.org edef
out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs
04:46:45
@edef1c:matrix.org edef
so not that rare
04:47:06
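As a hedged illustration of how such a split might be tallied, here is a sketch that counts status codes from access-log lines; the actual Fastly log format is not shown in this conversation, so the whitespace layout and the status-code matching below are assumptions.

```python
# Hedged sketch: tally HTTP status codes from access-log lines on stdin.
# The real Fastly log format is not shown in this chat; this assumes the
# status code appears as a standalone three-digit field somewhere in the line.
import re
import sys
from collections import Counter

STATUS = re.compile(r"(?:^|\s)([1-5]\d{2})(?:\s|$)")

counts = Counter()
for line in sys.stdin:
    m = STATUS.search(line)
    if m:
        counts[m.group(1)] += 1

total_2xx = sum(n for code, n in counts.items() if code.startswith("2"))
for code in ("200", "206"):
    share = counts[code] / total_2xx if total_2xx else 0.0
    print(f"{code}: {counts[code]} ({share:.1%} of 2xx)")
```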
@nh2:matrix.org nh2
In reply to @nh2:matrix.org
That is nice and large, should make it easy.
Sorry, I had misread that sentence: I thought you wrote "mean 16MiB, median 14MiB" for file size. But it was throughput.
04:47:48
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
out of ~1B 2xx responses, ~25% are 206 Partial Content responses, ~75% are 200 OKs
Interesting, I wonder why there are so many; at least in my Nix usage it is very rare to interrupt downloads
04:48:21
@edef1c:matrix.org edef
we have users like. everywhere
04:48:35
@nh2:matrix.org nh2
edef: Do you know the total number of files?
04:48:39
@edef1c:matrix.org edef
i've seen countries i'd never even heard of in the fastly logs
04:48:48
@edef1c:matrix.org edef
we have like ballpark a quarter billion store paths, and slightly fewer NARs than that (since complete NARs are semi content addressed)
04:49:44
@edef1c:matrix.org edef
~800M S3 objects total basically, ~190M NARs
04:51:07
@edef1c:matrix.org edef
(and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot)
04:59:15
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
~800M S3 objects total basically, ~190M NARs

This part will likely be the hardest / most annoying one operationally.
With 6 servers * 10 disks, each one will have ~13 M objects.

  • When a disk fails, 13 M seeks will need to be done, which will take 37 hours.
  • When a server fails, it'll be 10x as much, so 15 days to recovery.

During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2.

05:00:10
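A back-of-envelope check of the numbers above, as a hedged sketch; the ~100 random-read IOPS per disk is an assumption implied by the 37-hour figure, not something stated in the chat.

```python
# Back-of-envelope check of the recovery estimate above.
# Assumption (implied by the 37-hour figure, not stated outright):
# roughly 100 random-read IOPS per spinning disk, one seek per object.
objects_total = 800e6                  # ~800M S3 objects
servers, disks_per_server = 6, 10
iops_per_disk = 100                    # assumed HDD seek budget

objects_per_disk = objects_total / (servers * disks_per_server)  # ~13.3M
disk_rebuild_h = objects_per_disk / iops_per_disk / 3600          # ~37 h
server_rebuild_d = disk_rebuild_h * disks_per_server / 24         # ~15 days

print(f"{objects_per_disk / 1e6:.1f}M objects per disk")
print(f"disk rebuild:   ~{disk_rebuild_h:.0f} h")
print(f"server rebuild: ~{server_rebuild_d:.0f} days")
# If ~20% of the IOPS budget stays reserved for serving traffic
# (as mentioned further down), the disk rebuild stretches accordingly:
print(f"disk rebuild at 80% of IOPS: ~{disk_rebuild_h / 0.8:.0f} h")
```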
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
(and sorry for keeping you waiting on histograms, i'm just a bit far into my uptime and it involves poking more stuff than i have brain for rn, i'm running half on autopilot)
No problem, it's not urgent; I should really go to bed too.
05:00:23
@edef1c:matrix.org edef
In reply to @nh2:matrix.org

This part will likely be the hardest / most annoying one operationally.
With 6 servers * 10 disks, each one will have ~13 M objects.

  • When a disk fails, 13 M seeks will need to be done, which will take 37 hours.
  • When a server fails, it'll be 10x as much, so 15 days to recovery.

During that recovery time, only 1 more disk is allowed to fail with EC 6=4+2.

okay, that's terrifying
05:00:30
@edef1c:matrix.org edef
but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply
05:01:04
@edef1c:matrix.org edef
and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget?
05:01:59
@edef1c:matrix.org edef
O(objects) anything is kind of rough here
05:02:23
@edef1c:matrix.org edef
like we're going to hit a billion objects pretty soon, the growth is slightly superlinear at minimum
05:03:04
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
but Glacier Deep Archive doesn't exactly break the bank, we can basically insure ourselves against data loss quite cheaply
Yes, it's really only a concern for availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks.
05:03:37
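To put the redundancy trade-off in concrete terms, a small illustrative comparison of erasure-coding profiles; only the 4+2 profile comes from the conversation, the wider ones are hypothetical alternatives.

```python
# Erasure-coding trade-off in a nutshell: a k+m profile stores k data
# shards plus m parity shards, tolerates m simultaneous failures, and
# costs (k+m)/k times the raw data in storage.
profiles = [
    (4, 2),   # the EC 6=4+2 profile discussed above
    (8, 3),   # hypothetical wider profile for write-mostly backups
    (10, 4),  # hypothetical, one more parity shard of headroom
]

for k, m in profiles:
    overhead = (k + m) / k
    print(f"EC {k + m}={k}+{m}: tolerates {m} failures, "
          f"storage overhead {overhead:.2f}x")
```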
@edef1c:matrix.org edef
also note that if we start doing more serious dedup, i'll be spraying your stuff with small random I/O
05:03:42
@edef1c:matrix.org edef
In reply to @nh2:matrix.org
Yes, it's only really concerning availability. For write-mostly backups, one can use higher EC redundancy, or tar/zip the files, which gets rid of the problem of many small files / seeks.
yeah, we have a similar problem with Glacier
05:04:14
@edef1c:matrix.org edef
where objects are costly but size is cheap
05:04:21
@edef1c:matrix.org edef
so i intend to do aggregation into larger objects
05:04:53
@edef1c:matrix.org edef
basically we can handle a lot of the read side of that by accepting that tail latencies suck and we just have a bunch of read amplification reading from larger objects and caching what's actually hot
05:05:49
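One common way to handle that read side is an offset index plus HTTP Range requests into the aggregate objects; the index layout, URLs, and hash below are purely illustrative, not a description of what is actually planned here.

```python
# Hedged sketch: fetch one NAR out of a larger aggregate object with
# an HTTP Range request. The index mapping a NAR hash to
# (aggregate URL, offset, length) is hypothetical.
import urllib.request

INDEX = {
    # narhash: (aggregate object URL, byte offset, length) -- made-up values
    "0cajx0mdm3qwy": ("https://example.org/aggregates/0001.bin", 1048576, 524288),
}

def fetch_nar(narhash: str) -> bytes:
    url, offset, length = INDEX[narhash]
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{offset + length - 1}"}
    )
    with urllib.request.urlopen(req) as resp:
        # A server that honours Range replies 206 Partial Content.
        return resp.read()
```

A small cache of hot NARs in front of this keeps most of the read amplification away from the cold aggregates, which is the tail-latency trade-off described above.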
@edef1c:matrix.org edef
i'd really like to have build timing data so we can maybe just pass on requests for things that are quick to build
05:07:05
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
and, stupid question, but i assume you're keeping space for resilver capacity in your iops budget?
Yes, that should be fine, because the expected mean serving req/s are only 20% of the IOPS budget (a bit more for writes)
05:07:30
@edef1c:matrix.org edef
but i'm not entirely sure how much of that data exists
05:07:33
@nh2:matrix.org nh2
In reply to @edef1c:matrix.org
so i intend to do aggregation into larger objects
This is how we also solved the many-small-files problem on our app's production Ceph. We grouped "files that live and die together": we literally put them into a 0-compression ZIP, and the web server serves them out of the zip.
That way we reduced the number of files 100x, making Ceph recoveries approximately that much faster.
05:09:46
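A minimal sketch of that pattern with Python's zipfile and ZIP_STORED; the bundle and member names are made up, and a real deployment would serve members via the web server rather than reopening the archive per request.

```python
# Bundle small files that "live and die together" into one
# zero-compression ZIP, then read a single member back.
# The bundle and member names are made up for illustration.
import zipfile

BUNDLE = "bundle-0001.zip"

# Write: ZIP_STORED keeps members byte-for-byte, so a web server can
# later read a member straight out of the archive without inflating it.
with zipfile.ZipFile(BUNDLE, "w", compression=zipfile.ZIP_STORED) as zf:
    for name in ("asset-a.json", "asset-b.json", "asset-c.json"):
        zf.writestr(name, '{"example": "' + name + '"}')

# Read one member without touching the others.
with zipfile.ZipFile(BUNDLE) as zf:
    data = zf.read("asset-b.json")
    print(data)
```

The ZIP central directory records each member's offset, so individual reads stay cheap while the object count, and with it the recovery time, drops by roughly the bundling factor.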
@edef1c:matrix.org edef
yeah
05:10:01
@edef1c:matrix.org edef
the metadata for all this is peanuts
05:10:17
@edef1c:matrix.org edef
i've built some models for the live/die together part
05:10:46


