| 6 Mar 2024 |
edef | it's really not a big deal | 16:50:40 |
raitobezarius | In reply to @edef1c:matrix.org ("keeping a disaster recovery copy is cheap"): Yes but this is not what we were discussing in that context :p | 16:52:25 |
raitobezarius | Yes, we can keep additional copies | 16:52:34 |
raitobezarius | (so that healing is magic and what not) | 16:52:44 |
edef | in general, throwing around numbers of nines is cute, but none of it means anything until you're specifying what SLOs these are for | 17:01:06 |
edef | seven nines of durability+availability doesn't mean anything by itself if we're not specifying the details | 17:02:14 |
edef | available within what timescale? we can get a lot of nines for durability and availability within 14 days latency, we can all keep copies on tape in our basement | 17:03:18 |
edef | but the question that should be driving the SLO conversation is "what goals do these SLOs accomplish" | 17:05:11 |
edef | In reply to @whentze:matrix.org ("do any cloud providers even have 7-nines SLAs for this kind of service?"): Glacier, including Glacier Instant Retrieval, claims to offer 11 nines of durability | 17:08:24 |
edef | i don't have empirical data to support that but i would treat that number with some skepticism | 17:08:37 |
raitobezarius | In reply to @edef1c:matrix.org ("i don't have empirical data to support that but i would treat that number with some skepticism"): I know of durability loss stories in AWS | 17:09:43 |
raitobezarius | And it was under the SLO | 17:09:57 |
raitobezarius | (yada yada when you know how the sausage is made you know why this happens and at which scale, etc.) | 17:10:25 |
raitobezarius | But I don't think we need to convince folks of the fact that SLO is a budget and you burn into it | 17:11:12 |
raitobezarius | It's not a mathematical guarantee | 17:11:21 |
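To make the "SLO is a budget" point concrete, here is a minimal Python sketch of the arithmetic. It assumes an availability-style target measured over a fixed window; the function name `error_budget_seconds` and the one-year window are illustrative choices, not anything agreed in this conversation.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year as the SLO window


def error_budget_seconds(nines: int, window_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return how long the service may be out of spec within the window.

    `nines` is the number of nines in the target, e.g. 3 means 99.9%.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # Fraction of the window the target allows to be "bad".
    allowed_failure_fraction = 10.0 ** (-nines)
    return window_seconds * allowed_failure_fraction


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} nines: {error_budget_seconds(n):.2f} s of budget per year")
```

Under these assumptions, seven nines leaves roughly three seconds of budget per year, which is why the target means little until the window and the measured property are specified.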
edef | In reply to @edef1c:matrix.org ("in general, throwing around numbers of nines is cute, but none of it means anything until you're specifying what SLOs these are for"): basically i don't really want to be all "you need to be this tall to ride, please have actual irl ops experience at scale to talk at all" but if people want to participate in conversations like this and make demands about SLOs/SLAs i'd like them to at least read the SRE at Google book and learn to think about this well, it's literally free | 17:11:27 |
Jonas Chevalier | I'll talk to Domen and sort this out. I think he is motivated to raise funds, which could be pretty helpful. But yeah.. | 17:12:32 |
edef | i would like to not lose a single bit of the cache, but i know it's statistically unlikely i can guarantee this; in expectation, we've already lost some | 17:12:33 |
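The "in expectation, we've already lost some" remark can be illustrated with a rough Python sketch. It assumes every object is lost independently with the same annual probability, which is a simplification since loss may well be correlated. The object count in the usage note is a made-up placeholder, not a measured figure for the cache, and the function names are hypothetical.

```python
import math


def annual_loss_probability(nines: int) -> float:
    """Per-object annual loss probability implied by an N-nines durability claim."""
    return 10.0 ** (-nines)


def expected_losses(num_objects: int, nines: int, years: float = 1.0) -> float:
    """Expected number of lost objects (linear approximation, valid for tiny p)."""
    return num_objects * annual_loss_probability(nines) * years


def probability_of_any_loss(num_objects: int, nines: int, years: float = 1.0) -> float:
    """Probability that at least one object is lost, assuming independent losses."""
    p = annual_loss_probability(nines)
    # 1 - (1 - p) ** (num_objects * years), computed via log1p/expm1 so the
    # result stays accurate when p is far below floating-point epsilon.
    return -math.expm1(num_objects * years * math.log1p(-p))


if __name__ == "__main__":
    hypothetical_objects = 10**9  # placeholder, not the real cache size
    print(expected_losses(hypothetical_objects, nines=11))
    print(probability_of_any_loss(hypothetical_objects, nines=11))
```

With the placeholder of 10^9 objects at a claimed 11 nines, the expected loss is about 0.01 objects per year. The result is only as trustworthy as the claimed durability figure and the independence assumption.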
edef | but i do think we can buy a lot of latitude for mistakes on the serving/hot storage stack if we have a cold copy | 17:13:43 |
edef | In reply to @raitobezarius:matrix.org ("But I don't think we need to convince folks of the fact that SLO is a budget and you burn into it"): so like, one of the things i'm curious about is what the actual distributions are | 17:18:41 |
| Julian Stecklina joined the room. | 17:19:30 |
edef | eg is this whole-object loss, object corruption, is this uncorrelated across objects or correlated within prefixes | 17:19:41 |
Jonas Chevalier | One thing I am excited about if we end up self-hosting is that it will be easier to do experiments, like a smart narinfo database that can answer queries. We can also introduce a log of all the new entries, to make it easier for other caches to mirror the main cache. | 17:21:24 |
edef | yeah, exactly | 17:21:36 |
edef | one of the things in the works is incremental updating of the narinfo dataset from S3 | 17:21:50 |
Wanja Hentze | and dedup, of course | 17:22:19 |
Jonas Chevalier | if we can be smart about it, it has the potential to have better response latency on the 404s | 17:22:24 |
Jonas Chevalier | yeah and dedup will be easier to try, without worrying about paying 10k | 17:22:39 |