NixOS Binary Cache Self-Hosting
About how to host a very large-scale binary cache and more
| Message | Time |
|---|---|
| 5 Mar 2024 | |
| so we can tune that for however much compute we want to throw at it in parallel | 03:35:31 | |
| i can run some numbers wrt the best bang-per-buck there but not right this second | 03:36:12 | |
| basically depends on what the supply curve for EC2 spot compute looks like | 03:36:38 | |
| For Ceph hosting, do we know what the IOPS of cache.nixos.org are, just to see if some basic small cluster on HDDs could handle it? | 03:39:18 | |
| presumably you want backend I/O, ie to the S3 bucket? | 03:40:25 | |
| yes, that would be the equivalent of what would hit the disks | 03:41:04 | |
| easy stats: over the last 24h we've served 2.1TiB from all our S3 buckets, uploaded 491G, in ~30M requests | 03:47:32 | |
| we're serving like 375Mbit/s to Fastly in the peak minute on a day chosen by Fair Dice Roll™ | 04:03:53 | |
| not sure how to meaningfully turn these things into iops numbers just because that depends on various factors | 04:05:27 | |
| clickhouse is refusing to deal with S3 wildcards for some reason and i haven't quite chased down why yet | 04:06:41 | |
| i'm just taking a request that completed in that minute to have fully executed in that minute but i think that shakes out to a slightly upwards biased estimator if anything | 04:08:29 | |
| okay i just need to upgrade clickhouse on the EC2 data box i think | 04:09:57 | |
| * we're serving like 375Mbit/s of compressed NARs to Fastly in the peak minute on a day chosen by Fair Dice Roll™ | 04:10:45 | |
| i'm focusing on the NAR serving because that's the actual meat of it, the narinfos are only like 90G of stuff | 04:11:17 | |
| we also have a few other file types but they're mostly pretty marginal | 04:11:52 | |
| (query over the S3 inventory; message body lost in export) | 04:12:00 |
| ^ that yields an empty result set if applied over the S3 inventory | 04:12:33 | |
| debuginfo is for dwarffs which basically nobody uses, i think the 64-bit ones are even more dead, logfiles aren't a huge traffic driver either, .ls files are used by nar-index iirc but we don't have very much of those | 04:13:27 | |
| * debuginfo is for dwarffs which basically nobody uses, i think the 64-bit ones are even more dead, logfiles aren't a huge traffic driver either, .ls files are used by nix-index iirc but we don't have very much of those | 04:13:45 | |
| basically at peak we're serving like a gigabit of NARs | 04:15:22 | |
| * basically at peak we're serving like a gigabit per second of NARs | 04:15:34 | |
| that is, the following query yields 7400044328 bytes/min ≈ 1 Gbit/s | 04:17:09 |
| In reply to @edef1c:matrix.org Thanks! So the mean is 24 MB/s, 350 req/s. Probably a good amount of that can also be cached away from the HDDs, as many people will likely be requesting the latest nixos-* branches, and fewer people older, pinned branches. | 04:19:30 |
| that number is coming from the S3 dash, and is over all nixos.org buckets | 04:20:38 | |
| but it's a 24h sample, my other numbers are coming from a few months worth of data | 04:21:11 | |
| edef: Can you query the number of files / the distribution/histogram of their sizes? A weakness of Ceph is large amounts of small files. | 04:21:14 | |
| * edef: Can you query the number of files / the distribution/histogram of their sizes? A weakness of Ceph on HDDs is large amounts of small files. | 04:21:29 | |
| sure, i can draw you some histograms | 04:21:59 | |
| i can tell you up front that our biggest source of small files is the narinfos though | 04:22:20 | |
| but there's only like 90G of that so we can serve that from SSD quite easily | 04:23:08 | |
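
The 24h dashboard figures quoted at 03:47 and 04:19 can be sanity-checked with a one-off ClickHouse query; the constants below are just the numbers from the chat (2.1 TiB served, ~30M requests, 86400 s). Note that the "24 MB/s" in the 04:19 reply reads the 2.1 as TB; taken as TiB it comes out slightly higher.

```sql
-- Back-of-envelope means from the 24h S3 dashboard numbers in the chat.
SELECT
    formatReadableSize(2.1 * exp2(40) / 86400) AS mean_throughput,  -- ≈ 25.5 MiB/s (≈ 26.7 MB/s)
    round(30e6 / 86400) AS mean_requests_per_s                      -- ≈ 347 req/s
```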
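The 04:06 message mentions ClickHouse refusing to deal with S3 wildcards. For reference, the `s3()` table function does accept globs in the key; a minimal sketch, where the bucket URL, prefix, and Parquet format are all placeholders rather than the real inventory layout:

```sql
-- Hypothetical: count objects across all files of an S3 inventory dump.
-- URL and format are assumptions, not the real nixos.org bucket layout.
SELECT count()
FROM s3('https://example-inventory.s3.amazonaws.com/data/*.parquet', 'Parquet')
```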
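The peak-minute figures (04:03, 04:17) come from a query that, per the 04:08 message, attributes each request's full byte count to the minute it completed in. A sketch of that shape, assuming a hypothetical `fastly_logs` table with `completed_at`, `path`, and `bytes_sent` columns:

```sql
-- Busiest minute of NAR traffic. Crediting whole requests to their
-- completion minute biases the peak slightly upward, as noted in the chat.
SELECT
    toStartOfMinute(completed_at) AS minute,
    sum(bytes_sent) AS bytes_per_min,        -- ≈ 7400044328 in the chat
    sum(bytes_sent) * 8 / 60 AS bits_per_s   -- ≈ 1 Gbit/s
FROM fastly_logs
WHERE path LIKE '/nar/%'
GROUP BY minute
ORDER BY bytes_per_min DESC
LIMIT 1
```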
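The 04:11–04:13 messages break the bucket down by object type (NARs, narinfos, debuginfo, logfiles, .ls files). Over the same hypothetical inventory source, one way to get that breakdown; the key patterns and the `key`/`size` column names are assumptions based only on the types named in the chat:

```sql
-- Objects and bytes per file type, largest first.
SELECT
    multiIf(
        endsWith(key, '.narinfo'), 'narinfo',
        key LIKE 'nar/%',          'nar',
        endsWith(key, '.ls'),      'ls',
        key LIKE 'debuginfo/%',    'debuginfo',
        key LIKE 'log/%',          'log',
        'other') AS kind,
    count() AS objects,
    formatReadableSize(sum(size)) AS total_bytes
FROM s3('https://example-inventory.s3.amazonaws.com/data/*.parquet', 'Parquet')
GROUP BY kind
ORDER BY sum(size) DESC
```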
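The 04:21 question asks for the number of files and a histogram of their sizes, since large amounts of small files are a weak spot for Ceph on HDDs. A power-of-two size histogram over the same hypothetical inventory, again assuming a `size` column:

```sql
-- Object-size histogram in power-of-two buckets.
SELECT
    toUInt32(floor(log2(size))) AS bucket,
    formatReadableSize(exp2(bucket)) AS bucket_floor,  -- lower bound of the bucket
    count() AS objects,
    formatReadableSize(sum(size)) AS total_bytes
FROM s3('https://example-inventory.s3.amazonaws.com/data/*.parquet', 'Parquet')
WHERE size > 0
GROUP BY bucket
ORDER BY bucket
```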