| 29 Feb 2024 |
| @admin:nixos.org changed room power levels. | 12:56:08 |
| Jonas Chevalier changed room power levels. | 12:56:09 |
@admin:nixos.org | all done, added into #teams:nixos.org | 12:56:21 |
Jonas Chevalier | thanks a lot! | 12:56:29 |
@admin:nixos.org | 👋 | 12:56:29 |
Jonas Chevalier | this is getting very professional :) | 12:56:42 |
| @admin:nixos.org left the room. | 12:56:50 |
| linj joined the room. | 12:58:10 |
| mei 🌒& joined the room. | 13:28:31 |
| hexa joined the room. | 13:43:29 |
| Philip Taron (UTC-8) joined the room. | 17:16:37 |
| 1 Mar 2024 |
| [0x4A6F] joined the room. | 19:37:08 |
| 3 Mar 2024 |
| @janik0:matrix.org joined the room. | 13:17:14 |
| Alyssa Ross joined the room. | 13:32:53 |
| Julien joined the room. | 13:34:22 |
| 4 Mar 2024 |
| nh2 joined the room. | 11:08:19 |
nh2 | Regarding the current dedup effort: can it perform the dedup on the source (AWS) side while storing the content-addressed objects on the remote side (like bup/bupstash do), so that outgoing traffic is deduplicated (cheaper egress) but storage operations happen outside of AWS (e.g. for free on self-hosted Hetzner)? | 12:47:32 |
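A minimal sketch of the source-side dedup nh2 describes, assuming the chunks have already been cut and are addressed by their SHA-256 digest (a stand-in; the real chunk store may use a different hash), and assuming a remote that can answer which digests it is missing. `RemoteChunkStore` and `upload_deduplicated` are hypothetical names, not existing NixOS or tvix tooling. Only chunks the remote lacks cross the wire, which is what would reduce AWS egress.

```python
import hashlib


class RemoteChunkStore:
    """Hypothetical remote chunk store (e.g. object storage at Hetzner), kept in memory here."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        # The remote reports which of the offered digests it does not hold yet.
        return {d for d in digests if d not in self._chunks}

    def put(self, digest: str, data: bytes) -> None:
        self._chunks[digest] = data


def upload_deduplicated(chunks: list[bytes], remote: RemoteChunkStore) -> int:
    """Send only the chunks the remote lacks; return the bytes sent (i.e. egress)."""
    # SHA-256 is an assumed content address, used here only for illustration.
    by_digest = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    sent = 0
    for digest in remote.missing(list(by_digest)):
        remote.put(digest, by_digest[digest])
        sent += len(by_digest[digest])
    return sent


remote = RemoteChunkStore()
print(upload_deduplicated([b"a" * 100, b"b" * 100], remote))  # 200: both chunks are new
print(upload_deduplicated([b"a" * 100, b"c" * 50], remote))   # 50: only the "c" chunk is sent
```

The design choice is that hashing and the "what do you have?" query run where the data lives (AWS), while the chunk bytes are only ever written on the remote side.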
flokli | Yes | 12:57:06 |
flokli | We could read through the NARs in the AWS bucket, decompress them and run CDC (content-defined chunking) on an EC2 instance, for example, then insert the chunks into some Hetzner object storage. However, xz decompression takes a lot of CPU, so this might be quite slow compared to snowballing everything out and running that part at Hetzner too. | 12:58:55 |
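The pipeline flokli outlines (decompress the `.nar.xz`, then cut the plain NAR into content-defined chunks) could look roughly like the sketch below. It uses a simplified gear-hash chunker in the spirit of FastCDC, not the chunker actually used in the experiments, and the `MIN_SIZE`/`AVG_SIZE`/`MAX_SIZE` values are hypothetical. Pure Python is far too slow for the real cache; the point is only the shape of the pipeline and where the xz decompression cost sits.

```python
import lzma
import random

# Hypothetical chunking parameters (not the values from the actual experiments).
MIN_SIZE = 2 * 1024
AVG_SIZE = 8 * 1024  # must be a power of two
MAX_SIZE = 64 * 1024

# Gear table: 256 pseudo-random 64-bit values; a fixed seed keeps cut points reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
U64 = (1 << 64) - 1


def cdc_chunks(data: bytes, min_size: int = MIN_SIZE,
               avg_size: int = AVG_SIZE, max_size: int = MAX_SIZE) -> list[bytes]:
    """Split data at content-defined cut points using a gear rolling hash."""
    bits = avg_size.bit_length() - 1  # log2(avg_size)
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add modulo 2**64: bytes older than 64 positions drop out
        # entirely, so the hash depends only on the most recent 64 bytes.
        h = ((h << 1) + GEAR[byte]) & U64
        length = i - start + 1
        # Cut when the top `bits` bits are zero (probability about 1/avg_size),
        # never below min_size, and always at max_size.
        if (length >= min_size and h >> (64 - bits) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def chunk_nar_xz(nar_xz: bytes) -> list[bytes]:
    """Decompress an xz-compressed NAR (the CPU-heavy step), then chunk the plain bytes."""
    return cdc_chunks(lzma.decompress(nar_xz))
```

Because cut points depend only on nearby bytes, inserting data near the start of a NAR typically changes only the first chunk or two; later chunks keep the same bytes and therefore the same digests, which is what lets chunks be shared across NARs.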
nh2 | flokli: How much of the code to do that does already exist? | 13:00:48 |
flokli | We have code doing essentially everything except actually persisting the chunks (it only does the bookkeeping). That was used to test dedup ratios while we varied the chunking parameters. | 13:02:38 |
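The bookkeeping-only measurement can be sketched as hashing every chunk and recording only its digest and size, never its bytes. To keep the sketch short it varies a fixed-size chunk length through the hypothetical `fixed_size_chunker`; the actual experiments varied CDC parameters, and any chunker with the same signature could be plugged in instead. SHA-256 is again an assumed content address.

```python
import hashlib
from typing import Callable, Iterable

Chunker = Callable[[bytes], list[bytes]]


def fixed_size_chunker(size: int) -> Chunker:
    """Hypothetical stand-in for a real (content-defined) chunker."""
    def chunk(data: bytes) -> list[bytes]:
        return [data[i:i + size] for i in range(0, len(data), size)]
    return chunk


def dedup_ratio(blobs: Iterable[bytes], chunker: Chunker) -> float:
    """Return unique chunk bytes / total bytes (lower means better dedup).

    Only digest -> size is kept (bookkeeping); chunk contents are never stored.
    """
    seen: dict[str, int] = {}
    total = 0
    for blob in blobs:
        for c in chunker(blob):
            total += len(c)
            seen.setdefault(hashlib.sha256(c).hexdigest(), len(c))
    return sum(seen.values()) / total if total else 1.0


blobs = [b"A" * 4096 + b"B" * 4096, b"A" * 4096 + b"C" * 4096]
for size in (1024, 4096):
    print(size, dedup_ratio(blobs, fixed_size_chunker(size)))
# 1024 0.1875
# 4096 0.75
```

The toy input shows why the chunking parameters matter: smaller chunks also find the repetition inside each blob, not just the shared prefix between blobs.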
flokli | I'm also writing some code teaching tvix-castore how to use object storage. The internal model is the same, so it'd just be a matter of plugging some of this stuff together | 13:03:40 |
nh2 | flokli: Is there a current write-up of the expected dedup ratios? | 13:04:50 |
flokli | All of that is on Discourse; there's also a pad with our meeting notes linked somewhere there | 13:06:36 |
flokli | (ah, it's in the channel topic here too) | 13:06:44 |
nh2 | flokli: I'm having some trouble getting the latest numbers out of that pad. https://pad.lassul.us/nixos-cache-gc#Day-2023-11-07 suggests dedup to 20% of the original size, and https://pad.lassul.us/nixos-cache-gc#Day-2024-01-16 says we got 30-40% better once we had enough data for the deduper. Does that mean 14%? | 13:17:52 |
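Two possible readings of the pad entries, neither confirmed by the pad itself. If "30-40% better" means the deduplicated size shrank by a further 30-40%, the result is 12-14% of the original, and the 14% asked about is the 30% end. If it instead means the dedup factor (original size divided by deduplicated size, 5× at 20%) grew by 30-40%, the result is roughly 14.3-15.4%.

```latex
\begin{aligned}
\text{size shrinks by 30--40\%:}\quad & 0.20 \times (1 - 0.30) = 0.14, && 0.20 \times (1 - 0.40) = 0.12 \\
\text{factor grows by 30--40\%:}\quad & \tfrac{1}{5 \times 1.3} \approx 0.154, && \tfrac{1}{5 \times 1.4} \approx 0.143
\end{aligned}
```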