!CcTBuBritXGywOEGWJ:matrix.org

NixOS Binary Cache Self-Hosting

173 Members
About how to host a very large-scale binary cache and more
62 Servers

4 Mar 2024
@edef1c:matrix.org edef: the narinfo dataset is archived in several places now, that part we have covered [22:17:52]
@raitobezarius:matrix.org raitobezarius:
In reply to @edef1c:matrix.org
that locks us in for 6 months, but if there are no other takers, i'll put down the $3k to buy myself 6 months of development time for an exit strategy from AWS for that data
we collected enough money to put the 3K as part of the "binary cache niceties" budget
[22:18:04]
@raitobezarius:matrix.org raitobezarius: fwiw [22:18:09]
@edef1c:matrix.org edef: sure, that works for me, $3k is def still a meaningful cost for me [22:18:32]
@edef1c:matrix.org edef: i just know that history would not judge me kindly if i let this data go to /dev/null [22:19:13]
5 Mar 2024
@nh2:matrix.org nh2:
In reply to @zimbatm:numtide.com
nh2: do you want to join the infra meeting on Thursday 18:00 GMT+1 and hash this out with us?
Unfortunately I'll be on a train at that time, so my ability to join may be reduced
[02:30:40]
@nh2:matrix.org nh2:
In reply to @edef1c:matrix.org
so like, my biggest proposal wrt the cache GC is that we aggregate the "deleted" data into Glacier Deep Archive, as large objects
edef: What will be the cost of getting them out again, just to be sure that it won't be forbiddingly large?
[02:31:47]
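nh2's question is about getting data back out once the "deleted" data has been aggregated into large Glacier Deep Archive objects. A minimal sketch of the aggregation side, assuming a simple layout that is not from the discussion: NARs are concatenated into one archive object and a separate index records each key's byte offset and length, so that after an archive is restored a single NAR can be read back with a byte-range read. The names `pack`, `extract` and the index format are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical index format: key -> (byte offset, length) inside the archive.
Index = Dict[str, Tuple[int, int]]


def pack(objects: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate many small objects into one large archive blob."""
    chunks = []
    index: Index = {}
    offset = 0
    for key, data in objects:
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return b"".join(chunks), index


def extract(archive: bytes, index: Index, key: str) -> bytes:
    """Recover one object; against S3 this slice would be a ranged GET."""
    offset, length = index[key]
    return archive[offset:offset + length]


archive, index = pack([("nar/a.nar.xz", b"first"), ("nar/b.nar.xz", b"second")])
print(index)                                    # {'nar/a.nar.xz': (0, 5), 'nar/b.nar.xz': (5, 6)}
print(extract(archive, index, "nar/b.nar.xz"))  # b'second'
```

In this sketch the index lives outside the archive, so looking up which archive holds a key never requires a restore; only the archive containing the requested NAR has to be brought back.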
@edef1c:matrix.org edef: batch restores are free, they just have 12h latency [03:32:17]
@edef1c:matrix.org edef: restores happen to S3 reduced redundancy but we'd only need to float a small fraction of the dataset at a time [03:33:31]
@nh2:matrix.org nh2: I see, that makes sense [03:34:59]
@edef1c:matrix.org edef: so we can tune that for however much compute we want to throw at it in parallel [03:35:31]
@edef1c:matrix.org edef: i can run some numbers wrt the best bang-per-buck there but not right this second [03:36:12]
@edef1c:matrix.org edef: basically depends on what the supply curve for EC2 spot compute looks like [03:36:38]
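edef's point is that restored data only needs to sit in S3 for a small fraction of the dataset at a time, with the amount of parallel compute as the tuning knob. One way to sketch that, assuming archives of known size and a hypothetical `max_staged_bytes` budget for how much restored data may be staged at once, is a greedy batcher. Issuing the actual restore requests and waiting out the ~12h latency per batch are left out.

```python
from typing import List, Sequence, Tuple


def plan_restore_batches(
    archives: Sequence[Tuple[str, int]], max_staged_bytes: int
) -> List[List[str]]:
    """Greedily group (name, size) archives so no batch exceeds the staging budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in archives:
        if size > max_staged_bytes:
            raise ValueError(f"{name} alone exceeds the staging budget")
        # Close the current batch if adding this archive would overflow it.
        if current and current_bytes + size > max_staged_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


batches = plan_restore_batches([("a", 4), ("b", 4), ("c", 3), ("d", 5)], max_staged_bytes=8)
print(batches)  # [['a', 'b'], ['c', 'd']]
```

Since every batch pays the restore latency, requesting the next batch while the current one is being processed would keep that latency off the critical path, at the cost of staging up to two batches at once.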
@nh2:matrix.org nh2: For Ceph hosting, do we know what the IOPS of cache.nixos.org are, just to see if some basic small cluster on HDDs could handle it? [03:39:18]
@edef1c:matrix.org edef: presumably you want backend I/O, ie to the S3 bucket? [03:40:25]
@nh2:matrix.org nh2: yes, that would be the equivalent of what would hit the disks [03:41:04]
@edef1c:matrix.org edef: easy stats: over the last 24h we've served 2.1TiB from all our S3 buckets, uploaded 491G, in ~30M requests [03:47:32]
@edef1c:matrix.org edef: we're serving like 375Mbit/s to Fastly in the peak minute on a day chosen by Fair Dice Roll™ [04:03:53]
@edef1c:matrix.org edef: not sure how to meaningfully turn these things into iops numbers just because that depends on various factors [04:05:27]
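edef's 24h figures give a rough sense of the average load behind nh2's IOPS question. A back-of-envelope sketch, taking 2.1TiB as 2.1 × 2^40 bytes and assuming, for simplicity, that the ~30M requests are all reads of that data (the same message also mentions 491G of uploads, so this is only approximate):

```python
TIB = 2**40
WINDOW_S = 24 * 60 * 60  # the "last 24h" from the stats


def average_load(bytes_served: float, requests: float, window_s: float = WINDOW_S):
    """Return (average Mbit/s, average requests/s, average bytes per request)."""
    mbit_per_s = bytes_served * 8 / window_s / 1e6
    requests_per_s = requests / window_s
    bytes_per_request = bytes_served / requests
    return mbit_per_s, requests_per_s, bytes_per_request


mbit, rps, per_req = average_load(2.1 * TIB, 30e6)
print(f"{mbit:.0f} Mbit/s, {rps:.0f} req/s, {per_req / 1e3:.0f} kB/request")
# 214 Mbit/s, 347 req/s, 77 kB/request
```

Compared with the ~375Mbit/s peak minute mentioned above, the peak comes out at under twice the daily average, though the two numbers come from different measurements. Turning requests per second into disk IOPS still depends on object sizes and caching, as edef says.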
@edef1c:matrix.org edef: clickhouse is refusing to deal with S3 wildcards for some reason and i haven't quite chased down why yet [04:06:41]
@edef1c:matrix.org edef: i'm just taking a request that completed in that minute to have fully executed in that minute but i think that shakes out to a slightly upwards biased estimator if anything [04:08:29]
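The peak-minute figure comes from bucketing requests by the minute they completed in. A minimal sketch of that estimator, assuming a log of hypothetical (completion time in Unix seconds, bytes sent) pairs rather than any particular log schema. Attributing all of a long transfer's bytes to its completion minute moves traffic out of earlier minutes and into that one, which is why, as edef notes, the resulting maximum tends to be biased slightly upwards.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def peak_minute_mbit_s(requests: Iterable[Tuple[float, int]]) -> Tuple[int, float]:
    """Return (start of busiest minute in Unix seconds, its average Mbit/s).

    Every response is counted entirely in the minute it completed,
    even if the transfer started earlier.
    """
    bytes_per_minute: Dict[int, int] = defaultdict(int)
    for completed_at, nbytes in requests:
        minute_start = int(completed_at // 60) * 60
        bytes_per_minute[minute_start] += nbytes
    if not bytes_per_minute:
        raise ValueError("no requests")
    minute_start, total = max(bytes_per_minute.items(), key=lambda kv: kv[1])
    return minute_start, total * 8 / 60 / 1e6


log = [(0.5, 7_500_000), (30.0, 7_500_000), (61.0, 1_000_000)]
print(peak_minute_mbit_s(log))  # (0, 2.0)
```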
@edef1c:matrix.org edef: okay i just need to upgrade clickhouse on the EC2 data box i think [04:09:57]
@edef1c:matrix.org edef: * we're serving like 375Mbit/s of compressed NARs to Fastly in the peak minute on a day chosen by Fair Dice Roll™ [04:10:45]
@edef1c:matrix.org edef: i'm focusing on the NAR serving because that's the actual meat of it, the narinfos are only like 90G of stuff [04:11:17]
@edef1c:matrix.org edef: we also have a few other file types but they're mostly pretty marginal [04:11:52]
@edef1c:matrix.org edef:
WHERE NOT key REGEXP '^[0123456789abcdfghijklmnpqrsvwxyz]{32}\.narinfo$'
  AND NOT key REGEXP '^[0123456789abcdfghijklmnpqrsvwxyz]{32}\.ls(\.xz)?$'
  AND NOT key REGEXP '^[0123456789abcdfghijklmnpqrsvwxyz]{32}-[a-zA-Z0-9+\-_?=][a-zA-Z0-9+\-_?=.]*\.ls$'
  AND NOT key REGEXP '^nar/[0123456789abcdfghijklmnpqrsvwxyz]{52}\.nar\.(bz2|xz)$'
  AND NOT key REGEXP '^log/[0123456789abcdfghijklmnpqrsvwxyz]{32}-[a-zA-Z0-9+\-_?=][a-zA-Z0-9+\-_?=.]*\.drv$'
  AND NOT key REGEXP '^debuginfo/[0-9a-f]{40}$'
  AND NOT key REGEXP '^debuginfo/[0-9a-f]{16}$'
  AND NOT key IN ('.well-known/pki-validation/gsdv.txt', 'nix-cache-info', 'index.html', 'binary-cache/', 'error-pages/403', 'error-pages/404')
[04:12:00]
@edef1c:matrix.org edef: ^ that yields an empty result set if applied over the S3 inventory [04:12:33]
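The query works as a completeness check: every key in the S3 inventory should match one of the known layouts, so an empty result means there are no unexplained file types. A rough Python equivalent, using the same patterns (Nix's base-32 alphabet, 32-character hashes for narinfo/ls/log keys, 52-character hashes under `nar/`) and counting keys per type rather than returning only the leftovers. The function names and category labels are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Nix base-32 alphabet and store-path name characters, as in the query above.
NIX32 = "[0123456789abcdfghijklmnpqrsvwxyz]"
NAME = r"[a-zA-Z0-9+\-_?=][a-zA-Z0-9+\-_?=.]*"

# Hypothetical category labels; fullmatch() plays the role of the ^...$ anchors.
KEY_PATTERNS = [
    ("narinfo", re.compile(rf"{NIX32}{{32}}\.narinfo")),
    ("ls", re.compile(rf"{NIX32}{{32}}\.ls(\.xz)?")),
    ("ls", re.compile(rf"{NIX32}{{32}}-{NAME}\.ls")),
    ("nar", re.compile(rf"nar/{NIX32}{{52}}\.nar\.(bz2|xz)")),
    ("log", re.compile(rf"log/{NIX32}{{32}}-{NAME}\.drv")),
    ("debuginfo", re.compile(r"debuginfo/[0-9a-f]{40}")),
    ("debuginfo", re.compile(r"debuginfo/[0-9a-f]{16}")),
]

STATIC_KEYS = {
    ".well-known/pki-validation/gsdv.txt", "nix-cache-info", "index.html",
    "binary-cache/", "error-pages/403", "error-pages/404",
}


def classify(key: str) -> str:
    """Return the file type of an S3 key, or "other" if no pattern matches."""
    if key in STATIC_KEYS:
        return "static"
    for kind, pattern in KEY_PATTERNS:
        if pattern.fullmatch(key):
            return kind
    return "other"


def summarize(keys: Iterable[str]) -> Counter:
    """Count keys per type; "other" is what the query would return."""
    return Counter(classify(key) for key in keys)


keys = [
    "0" * 32 + ".narinfo",
    "nar/" + "1" * 52 + ".nar.xz",
    "nix-cache-info",
    "e" * 32 + ".narinfo",  # "e" is not in the Nix base-32 alphabet
]
print(summarize(keys))
```

Filtering for `"other"` instead of counting gives the direct analogue of the query's result set.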
@edef1c:matrix.org edef: debuginfo is for dwarffs which basically nobody uses, i think the 64-bit ones are even more dead, logfiles aren't a huge traffic driver either, .ls files are used by nar-index iirc but we don't have very much of those [04:13:27]
@edef1c:matrix.org edef: * debuginfo is for dwarffs which basically nobody uses, i think the 64-bit ones are even more dead, logfiles aren't a huge traffic driver either, .ls files are used by nix-index iirc but we don't have very much of those [04:13:45]
@edef1c:matrix.org edef: basically at peak we're serving like a gigabit of NARs [04:15:22]

Room Version: 10