NixOS Binary Cache Self-Hosting - Public Room Timeline

	NixOS Binary Cache Self-Hosting	172 Members
	About how to host a very large-scale binary cache and more	59 Servers

Sender	Message	Time
2 Mar 2024
raitobezarius	so assuming we do have the hardware, this is a compelling option, and if we offer a proper, complete proposal to the foundation, the operations make sense, and the infrastructure folks agree to it, it could be adopted I guess	02:18:24
raitobezarius	and to give an example of hardware costs, I guess a netapp de6600 60x3.5" could cost something like 800EUR, you can fit 60×20TB disks in there, so 1.2PB raw capacity. 60 disks of 20TB will cost approximately 440€*60 ~ 26.4K EUR at list price, obviously, so something like 27K EUR in total. This can be spread over the 2 locations to avoid having 1.2PB × 2 needlessly (and the rest can be filled as the cache grows organically)	02:24:08
raitobezarius	I'm ignoring server costs because honestly you can find a R730 in a trash bin, put enough SAS cards and plug the JBOD in	02:24:30
raitobezarius	tough question is whether flash is needed at all or not	02:24:41
raitobezarius	if so, this can add 5K-20K to the proposal	02:24:55
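The arithmetic in the messages above can be checked with a short sketch. It uses only the prices quoted in the chat (800 EUR enclosure, 440 EUR per 20 TB disk, 5K to 20K EUR for optional flash); the function and constant names are made up for illustration, and server costs are left out, as in the messages.

```python
# Figures quoted in the chat; the names below are illustrative only.
ENCLOSURE_EUR = 800                  # NetApp DE6600, 60 x 3.5" bays
DISK_COUNT = 60
DISK_TB = 20
DISK_EUR = 440                       # list price per 20 TB disk
FLASH_EUR_RANGE = (5_000, 20_000)    # optional flash tier


def raw_capacity_pb(disk_count: int = DISK_COUNT, disk_tb: int = DISK_TB) -> float:
    """Raw capacity in decimal petabytes (1 PB = 1000 TB)."""
    return disk_count * disk_tb / 1000


def base_cost_eur(disk_count: int = DISK_COUNT) -> int:
    """Enclosure plus disks, ignoring servers as the messages do."""
    return ENCLOSURE_EUR + disk_count * DISK_EUR


def cost_range_with_flash_eur() -> tuple[int, int]:
    """Low and high totals if a flash tier turns out to be needed."""
    low, high = FLASH_EUR_RANGE
    base = base_cost_eur()
    return base + low, base + high


if __name__ == "__main__":
    print(f"raw capacity: {raw_capacity_pb()} PB")
    print(f"enclosure + disks: {base_cost_eur()} EUR")
    print(f"with flash: {cost_range_with_flash_eur()} EUR")
```

Under these assumptions the enclosure plus disks come to 27,200 EUR, which matches the "something like 27K EUR" estimate, and the flash tier would push the total to somewhere between 32,200 and 47,200 EUR.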
raitobezarius	actually hetzner seems to have proper connection options: https://docs.hetzner.com/robot/colocation/pricing	02:32:47
raitobezarius	they were just hidden	02:32:48
	delroth joined the room.	19:19:47
	misuzu joined the room.	19:27:04
	redblueflame joined the room.	20:15:52
	olaf joined the room.	20:59:00
	thubrecht joined the room.	21:12:45
nh2	raitobezarius: Thanks for the hardware info. For completeness and comparison, the way one would usually set up Ceph for HA is multiple servers in multiple Hetzner datacenters (a Hetzner "DC" is a physically separate building, still within walking distance, with its own independent backup power supply; so roughly equivalent to an AWS Availability Zone "AZ"). So for 6=4+2 Erasure Coding, with the DC as failure domain, one would need 6 servers, one per DC. This EC has only 1.5x storage overhead while supporting 2 losses. One would usually put 10 Gbit/s networking in between those; for Hetzner-rented servers that costs 40 EUR/month per 10 Gbit/s link, and also provides 10 Gbit/s to the Internet. With a single 60-disk megaserver with a 1 Gbit/s link, you'd likely bottleneck on bandwidth immediately if many people use it. For archiving old store paths, that matters less.	22:14:38
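A minimal sketch of the erasure-coding numbers in the message above. For k data chunks and m coding chunks the raw-to-usable ratio is (k + m) / k, which gives 1.5 for a 4+2 profile, and placing one chunk per failure domain requires k + m domains. The function names are hypothetical, the one-link-per-server assumption is not stated in the chat, and the 1.2 PB figure is reused purely as an example input.

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw bytes stored per usable byte for a k+m erasure-coding profile."""
    return (k + m) / k


def failure_domains_needed(k: int, m: int) -> int:
    """One chunk per failure domain, so k + m domains (here: DCs/servers)."""
    return k + m


def usable_pb(raw_pb: float, k: int, m: int) -> float:
    """Usable capacity left after erasure-coding overhead."""
    return raw_pb / ec_overhead(k, m)


def monthly_uplink_cost_eur(servers: int, eur_per_link: int = 40) -> int:
    """Assumes one 10 Gbit/s link per server at the quoted 40 EUR/month."""
    return servers * eur_per_link


if __name__ == "__main__":
    k, m = 4, 2
    servers = failure_domains_needed(k, m)
    print(f"{k}+{m}: overhead {ec_overhead(k, m)}x, tolerates {m} losses")
    print(f"servers needed: {servers}")
    print(f"usable from 1.2 PB raw: {usable_pb(1.2, k, m):.2f} PB")
    print(f"10 Gbit/s links: {monthly_uplink_cost_eur(servers)} EUR/month")
```

For comparison, 3-way replication also survives 2 losses but stores 3 raw bytes per usable byte, which is why erasure coding is attractive for a cache of this size.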
raitobezarius	(technically AWS AZs have a minimum distance between each other, contrary to other clouds' definitions of "AZs", e.g. GCP AFAIK)	22:15:29
raitobezarius	(but that's just my pedantic brain)	22:15:39
nh2	raitobezarius: That's not pedantic, it's a perfectly valid topic. When the OVH fire happened, the DCs were so close that the fire could spread from one to the next. At that time, I checked it for Hetzner. My assessment from the photos is that a fire is unlikely to spread between Hetzner DCs, but the fire brigade might still shut down the whole DC park if one catches fire. So you'd have risk of downtime, but not loss.	22:18:48
raitobezarius	Yeah, the more I look at it, the more I like the rented idea because it also enables a smoother ramp-up	22:19:44
delroth	there's also some potential value in the foundation not having to manage assets, as opposed to operational costs	22:22:39
raitobezarius	ah fun fact btw https://lists.debian.org/debian-snapshot/2024/02/msg00003.html	22:23:22
raitobezarius	olasd told me "this is what happens when you have 17 architectures used by 3 persons" when I pinged him about that hexa :D	22:23:54
delroth	also copying what I was saying on the #dev channel to make sure we have everything in one history: we've had discussions about this in the past and came up with roughly the same cost estimates. The main issue is the big mindset change in having the current set of infra volunteers be in charge of the reliability of fairly complex infra directly in the main user query path. As much as I hate S3, nobody here has to be oncall for when it's down :) (it doesn't disqualify a self-hosting solution, but uh, it's hard to have proper cost estimates that don't include a potential "we need to pay someone to be fulltime oncall")	22:24:45
