!CcTBuBritXGywOEGWJ:matrix.org

NixOS Binary Cache Self-Hosting

154 Members · 54 Servers
About: how to host a very large-scale binary cache and more



2 Mar 2024
[20:15:52] redblueflame joined the room.
[20:59:00] olaf joined the room.
[21:12:45] thubrecht joined the room.
[22:14:38] nh2:

raitobezarius: Thanks for the hardware info.
For completeness and comparison, the way one would usually set up Ceph for HA is multiple servers in multiple Hetzner datacenters (a Hetzner "DC" is a physically separate, but still walking-distance, building with its own independent backup power supply; so roughly equivalent to an AWS Availability Zone, "AZ").
So for 6 = 4 + 2 erasure coding, with the DC as the failure domain, one would need 6 servers, one per DC.
This EC has only 1.5x (6/4) storage overhead while supporting 2 losses.
One would usually put 10 Gbit/s networking between those servers; for Hetzner-rented servers that costs 40 EUR/month per 10 Gbit/s link, and also provides 10 Gbit/s to the Internet.

With a single 60-disk megaserver on a 1 Gbit/s link, you'd likely bottleneck on bandwidth immediately if many people use it.
For archiving old store paths, that matters less.
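As a rough illustration of the 4+2 erasure-coding layout described above (a minimal sketch, assuming a cluster whose CRUSH map already has one "datacenter" bucket per Hetzner DC; the profile and pool names are made up for illustration, not anything actually deployed):

```shell
# Hypothetical 4+2 erasure-code profile with the datacenter as the failure domain:
# data survives the loss of any 2 DCs, at (4+2)/4 = 1.5x raw storage overhead.
ceph osd erasure-code-profile set nix-cache-ec k=4 m=2 crush-failure-domain=datacenter

# Pool backed by that profile, e.g. as the data pool for an S3-compatible RGW frontend.
ceph osd pool create nix-cache.data 128 128 erasure nix-cache-ec
```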
[22:15:29] raitobezarius: (technically, AWS AZs have a minimum distance between each other, contrary to other clouds' definitions of "AZ", e.g. GCP AFAIK)
[22:15:39] raitobezarius: (but that's just my pedantic brain)
[22:18:48] nh2: raitobezarius: That's not pedantic, it's a perfectly valid topic. When the OVH fire happened, the DCs were so close that the fire could spread from one to the next.
At that time, I checked the same for Hetzner. My assessment from the photos is that a fire is unlikely to spread between Hetzner DCs, but the fire brigade might still shut down the whole DC park if one catches fire. So you'd have a risk of downtime, but not of data loss.
[22:19:44] raitobezarius: Yeah, the more I look at it, the more I like the rented idea, because it also enables a smoother ramp-up
[22:22:39] delroth: there's also some potential value in the foundation not having to manage assets, as opposed to operational costs
[22:23:22] raitobezarius: ah, fun fact btw: https://lists.debian.org/debian-snapshot/2024/02/msg00003.html
[22:23:54] raitobezarius: olasd told me "this is what happens when you have 17 architectures used by 3 persons" when I pinged him about that, hexa :D
[22:24:45] delroth:

also copying what I was saying on the #dev channel, to make sure we have everything in one history:

we've had discussions about this in the past and came up with roughly the same cost estimates; the main issue is the big mindset change in having the current set of infra volunteers be in charge of the reliability of fairly complex infra directly in the main user query path. As much as I hate S3, nobody here has to be on call for when it's down :)

(it doesn't disqualify a self-hosting solution, but uh, it's hard to have proper cost estimates that don't include a potential "we need to pay someone to be full-time on call")
[22:26:48] delroth: (AFAIK nobody has made a proper call on what kind of availability target we'd like to hit, so it's hard to know what kind of HA requirements, as well as staffing, we'd need)
[22:27:05] hexa: to be fair, I'd expect nobody to know
[22:27:05] raitobezarius: Arguably, I think the hard metric is the durability one
[22:27:21] raitobezarius: Availability matters too, but with a CDN in front, a lot of it can be mitigated
[22:27:30] raitobezarius: And during us-east-1 outages, I don't think there was much to notice
[22:28:56] delroth: I still think that if we run a "hot" / "recent" cache on Hetzner while keeping all the historical stuff on AWS, we can likely decrease the bill by a lot
[22:29:27] raitobezarius (replying to hexa's "to be fair, I'd expect nobody to know"): (it seems a political decision too, tbh)
[22:29:34] raitobezarius: ("how many MB/year are you OK to lose?")
[22:32:02] hexa: really depends on which MBs you are going to lose 😛
[22:32:17] raitobezarius: <insert meme about the dog: "no choose; only lose">
[22:34:31] nh2 (replying to delroth's "hot" / "recent" cache suggestion): I don't understand that; the cost on AWS is the historical stuff, because the cost per TB is high on AWS.
The argument would make sense the other way around.
[22:34:45] delroth: the cost on AWS is in large part bandwidth
[22:34:51] raitobezarius: (80 TB/month)
[22:35:14] delroth: I don't have the exact breakdown, but $thousands/month
[22:35:22] raitobezarius: I think it's $3K-ish
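One way the hot/cold split discussed above could be wired up, as a rough sketch only: an nginx front-end on the Hetzner box serves recent store paths from local disk and falls back to the existing S3 bucket for everything older. The host name, local path, and bucket URL below are assumptions for illustration, not an agreed design.

```nix
{ ... }:
{
  # Sketch of a "hot" binary cache in front of the historical S3 data.
  services.nginx = {
    enable = true;
    virtualHosts."hot-cache.example.org" = {
      locations."/" = {
        # Recent .narinfo/.nar files kept on the Hetzner node's local disks.
        root = "/srv/hot-cache";
        # On a miss, hand the request off to the historical bucket.
        extraConfig = ''
          try_files $uri @s3_fallback;
        '';
      };
      locations."@s3_fallback" = {
        # Assumed to be the S3 bucket currently behind cache.nixos.org.
        proxyPass = "https://nix-cache.s3.amazonaws.com";
      };
    };
  };
}
```

The idea being that most requests are for recent paths and would then be served over Hetzner's flat-rate bandwidth, while only misses on historical paths still incur AWS egress, which is where delroth says most of the current bill comes from.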


