| 4 Mar 2024 |
edef | the perf requirements on the storage backends are really loose, except for serving narinfo 404s | 14:50:55 |
edef | but narinfo keys are only 5G of stuff, we can serve 404s basically however we want | 14:51:22 |
edef | any actual serving ends up being from fastly and too heavily cached for the backend's perf to matter | 14:52:17 |
Jonas Chevalier | nh2: do you want to join the infra meeting on Thursday 18:00 GMT+1 and hash this out with us? | 14:55:08 |
raitobezarius | Isn't delroth going to be off? | 14:55:46 |
raitobezarius | I think it's good to have delroth on those discussions | 14:55:57 |
Jonas Chevalier | it's fine, we already discussed this | 14:57:12 |
Jonas Chevalier | we have the overall ideas, but what's missing is to map out some of the unknowns, as delroth said: can we make this sustainable, what does a migration path look like, ... | 15:00:45 |
Jonas Chevalier | we might be able to port other things than the cache first | 15:01:56 |
edef | so like, my biggest proposal wrt the cache GC is that we aggregate the "deleted" data into Glacier Deep Archive, as large objects | 22:12:32 |
edef | that locks us in for 6 months, but if there are no other takers, i'll put down the $3k to buy myself 6 months of development time for an exit strategy from AWS for that data | 22:13:19 |
edef | it should dedupe quite well, but my biggest issue is simply time pressure | 22:14:24 |
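[Editor's note: the aggregation edef describes, packing deleted objects into large blobs and pushing them to Glacier Deep Archive, could be sketched with boto3 roughly as below. The bucket name and key layout are illustrative assumptions, not the actual cache.nixos.org setup; `DEEP_ARCHIVE` is the real S3 storage class, which carries a 180-day minimum storage duration (the "locks us in for 6 months" above).]

```python
# Hypothetical sketch: upload a large aggregated pack of "deleted" cache
# data with the DEEP_ARCHIVE storage class. Bucket/key names are made up.
ARCHIVE_BUCKET = "nixos-cache-archive"  # assumed name, not the real bucket

def deep_archive_put_kwargs(key, body):
    """Build the put_object arguments for a Deep Archive upload."""
    return {
        "Bucket": ARCHIVE_BUCKET,
        "Key": key,
        "Body": body,
        # Deep Archive: cheapest S3 tier, 180-day minimum storage duration
        "StorageClass": "DEEP_ARCHIVE",
    }

def upload_pack(key, body):
    import boto3  # credentials/region assumed to be configured externally
    s3 = boto3.client("s3")
    return s3.put_object(**deep_archive_put_kwargs(key, body))
```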
edef | i don't intend to lose a single byte of that historical data, but the stress from trying to do all of this fast is weighing on me | 22:16:33 |
edef | the narinfo dataset is archived in several places now, that part we have covered | 22:17:52 |
raitobezarius | In reply to @edef1c:matrix.org that locks us in for 6 months, but if there are no other takers, i'll put down the $3k to buy myself 6 months of development time for an exit strategy from AWS for that data we collected enough money to put the 3K as part of the "binary cache niceties" budget | 22:18:04 |
raitobezarius | fwiw | 22:18:09 |
edef | sure, that works for me, $3k is def still a meaningful cost for me | 22:18:32 |
edef | i just know that history would not judge me kindly if i let this data go to /dev/null | 22:19:13 |
| 5 Mar 2024 |
nh2 | In reply to @zimbatm:numtide.com nh2: do you want to join the infra meeting on Thursday 18:00 GMT+1 and hash this out with us? Unfortunately I'll be on a train at that time, so my ability to join may be reduced | 02:30:40 |
nh2 | In reply to @edef1c:matrix.org so like, my biggest proposal wrt the cache GC is that we aggregate the "deleted" data into Glacier Deep Archive, as large objects edef: What will be the cost of getting them out again, just to be sure that it won't be forbiddingly large? | 02:31:47 |
edef | batch restores are free, they just have 12h latency | 03:32:17 |
edef | restores happen to S3 reduced redundancy but we'd only need to float a small fraction of the dataset at a time | 03:33:31 |
nh2 | I see, that makes sense | 03:34:59 |
edef | so we can tune that for however much compute we want to throw at it in parallel | 03:35:31 |
edef | i can run some numbers wrt the best bang-per-buck there but not right this second | 03:36:12 |
edef | basically depends on what the supply curve for EC2 spot compute looks like | 03:36:38 |
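[Editor's note: the restore flow edef describes maps onto the S3 `restore_object` API. A minimal sketch, assuming Deep Archive objects and the "Standard" retrieval tier (which for Deep Archive completes in roughly 12 hours, matching the latency mentioned above); bucket and key names are placeholders.]

```python
def restore_request(days, tier="Standard"):
    # For Deep Archive, "Standard" retrievals take ~12 hours;
    # "Bulk" is slower (~48 hours). The restored copy stays
    # available in S3 for `days` days before expiring.
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

def restore_pack(bucket, key, days=7):
    import boto3  # credentials/region assumed to be configured externally
    s3 = boto3.client("s3")
    return s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=restore_request(days),
    )
```

Floating only a small fraction of the dataset at a time, as edef suggests, would mean issuing these restores in batches sized to however much parallel compute is available to process them.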
nh2 | For Ceph hosting, do we know what the IOPS of cache.nixos.org are, just to see if some basic small cluster on HDDs could handle it? | 03:39:18 |
edef | presumably you want backend I/O, ie to the S3 bucket? | 03:40:25 |
nh2 | yes, that would be the equivalent of what would hit the disks | 03:41:04 |