| 8 May 2026 |
Vladimír Čunát | Sounds good, though kernels will probably update again anyway before staging-next gets to master. | 08:46:18 |
K900 | I'm trying to build my stuff against staging-next now | 08:55:07 |
K900 | Cause I have no spoons and a lot of compute | 08:55:16 |
Arian | So our storage cost in S3 is going down due to the GC.
However, our egress bandwidth costs are growing faster than the storage cost is shrinking.
I think AI scraping might be what's killing us | 10:09:08 |
Arian | 2 years ago egress bandwidth was 3.5k per month. It's 6.2k per month now | 10:09:48 |
Arian | I think in one month we'll be paying more for egress bandwidth than storage | 10:10:12 |
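For a rough sense of scale, the two egress figures quoted above (3.5k per month two years ago, 6.2k per month now) can be turned into an average compound monthly growth rate. This is only a sketch: smooth compounding is an assumption, and the graph described in the chat looks more like a recent spike, so the real recent rate is likely higher. The function names are hypothetical.

```python
def monthly_growth_rate(old_cost: float, new_cost: float, months: int) -> float:
    """Average compound monthly growth rate between two cost samples."""
    return (new_cost / old_cost) ** (1 / months) - 1


def project(cost: float, rate: float, months: int) -> float:
    """Extrapolate a cost forward, assuming the same rate keeps compounding."""
    return cost * (1 + rate) ** months


# Figures from the chat: 3.5k -> 6.2k over roughly 24 months.
rate = monthly_growth_rate(3.5, 6.2, 24)
print(f"average growth: {rate:.2%} per month")
print(f"12 months out at that rate: {project(6.2, rate, 12):.1f}k per month")
```

Comparing this against the (shrinking) storage cost would need the storage figures, which are not given in the discussion.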
Arian | That sounds really off to me. | 10:10:17 |
Arian | [image: 1000065300.jpg] | 10:11:44 |
Arian | Red line is egress bandwidth | 10:11:51 |
Arian | It's hockey sticking | 10:11:57 |
jappie | does that mean that the number of requests for stuff that isn't cached by Fastly is growing (perhaps scrapers stumbling upon old derivations)? because I'd assume an uptick in users downloading new-ish derivations would mean more hits in Fastly and no noticeable growth in S3 egress | 10:17:43 |
hexa | I think so | 10:18:23 |
jappie | how the hell are scrapers discovering old store paths / derivations... the URLs for those contain a hash right? I'd expect that to be really difficult and time-consuming (and useless) to scrape | 10:19:37 |
Arian | You can dump all of Hydra's evaluations. Or run evaluations for all historical NixOS commits yourself | 10:21:14 |
leona | can we determine that this is egress via fastly or is someone downloading them directly from AWS? | 10:21:46 |
Arian | Going directly through S3 is only possible with requester pays | 10:22:01 |
Arian | We don't have anonymous auth enabled on our bucket. You need to provide your IAM identity and it gets billed to the caller | 10:22:32 |
Arian | It would be preferable if scrapers scraped S3 directly, as then it wouldn't cost us | 10:22:55 |
hexa | one obvious fix would be to GC harder, provide fewer targets | 10:24:42 |
Arian | I'm wondering if I can somehow figure out from S3 the distribution of the age of objects being requested | 10:30:52 |
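One way the age-distribution question above could be approached, as a sketch only: join per-request records (key, bytes sent, request time) against each object's last-modified timestamp and sum egress per age bucket. It assumes such records can be obtained, for example from S3 server access logs and an object listing or inventory; whether those are enabled on this bucket is not stated in the discussion. The input shapes, keys, and bucket boundaries below are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def egress_by_age(requests, last_modified, bucket_days=(30, 180, 365, 730)):
    """Sum bytes sent per object-age bucket, with age measured at request time.

    requests:      iterable of (key, bytes_sent, requested_at)  [hypothetical shape]
    last_modified: dict mapping key -> datetime the object was written
    """
    totals = Counter()
    for key, bytes_sent, requested_at in requests:
        created = last_modified.get(key)
        if created is None:
            # Object not in the listing (e.g. already garbage-collected).
            totals["unknown"] += bytes_sent
            continue
        age_days = (requested_at - created).days
        # First bucket whose upper bound exceeds the age, else the open-ended one.
        label = next(
            (f"<{d}d" for d in bucket_days if age_days < d),
            f">={bucket_days[-1]}d",
        )
        totals[label] += bytes_sent
    return dict(totals)


# Made-up sample data, purely for illustration.
utc = timezone.utc
last_modified = {
    "a-example": datetime(2026, 5, 1, tzinfo=utc),
    "b-example": datetime(2023, 1, 1, tzinfo=utc),
}
requests = [
    ("a-example", 100, datetime(2026, 5, 8, tzinfo=utc)),
    ("b-example", 5000, datetime(2026, 5, 8, tzinfo=utc)),
    ("c-example", 7, datetime(2026, 5, 8, tzinfo=utc)),
]
result = egress_by_age(requests, last_modified)
print(result)
```

If most bytes land in the oldest buckets, that would support the "scrapers walking old store paths" theory; if they land in the newest, the cause is more likely elsewhere.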
emily | is it still true that Fastly doesn't cache paths for long even once built? | 11:10:55 |
emily | I forget what the conclusion of that discussion was (I know the focus was on missing paths because of the access pattern but presumably those are not what's causing these expenses) | 11:11:26 |
emily | I guess the issue is that if it's sufficiently spread out/high cardinality no per-path caching will help.
(though it seems surprising for scrapers to be going out of their way to find old stuff to query, I really doubt N versions of the same binaries are valuable?)
| 11:12:57 |
Arian | Missing paths don't generate bandwidth cost. They generate API call cost, which is small | 11:33:00 |
emily | right | 11:58:09 |
emily | but I mean if a present path is being hammered, how long does Fastly cache that before going back to S3? | 11:58:33 |
Arian | 24h I think | 11:58:46 |
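The high-cardinality point raised earlier can be illustrated with a toy model: a per-path TTL cache absorbs repeated requests for one hot path, but does nothing for a crawl that touches each path once. This is a deliberately simplified sketch, not a model of Fastly itself (no multiple POPs, shielding, or eviction), and the 24h TTL is only the tentative figure given above. All names are hypothetical.

```python
DAY = 24 * 60 * 60  # tentative 24h TTL from the discussion, in seconds


def origin_fetches(requests, ttl_seconds):
    """Count requests that go back to origin under a simple per-path TTL cache.

    requests: iterable of (timestamp_seconds, path), sorted by timestamp.
    """
    expires_at = {}
    misses = 0
    for ts, path in requests:
        if path not in expires_at or expires_at[path] <= ts:
            # Not cached yet, or the cached copy has expired: fetch from origin.
            misses += 1
            expires_at[path] = ts + ttl_seconds
    return misses


# One hot path requested hourly for three days.
hot = [(h * 3600, "hot-path") for h in range(72)]
# A crawl touching 72 distinct old paths, one per hour.
crawl = [(h * 3600, f"old-path-{h}") for h in range(72)]

print("hot path origin fetches:", origin_fetches(hot, DAY))
print("crawl origin fetches:   ", origin_fetches(crawl, DAY))
```

In this model a hammered path costs origin egress roughly once per TTL window, while every request of the spread-out crawl reaches origin regardless of how long the TTL is.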