NixOS Infrastructure - Public Room Timeline

	NixOS Infrastructure	421 Members
	Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.	132 Servers

Sender	Message	Time
8 May 2026
	jopejoe1 changed their display name from jopejoe1 (4094@epvpn) to jopejoe1.	08:43:04
Vladimír Čunát	OK, though kernels will probably update again anyway before `staging-next` gets to master.	08:45:40
Vladimír Čunát	* Sounds good, though kernels will probably update again anyway before `staging-next` gets to master.	08:46:18
K900	I'm trying to build my stuff against staging-next now	08:55:07
K900	Cause I have no spoons and a lot of compute	08:55:16
	@hxr404:tchncs.de left the room.	09:20:34
Arian	So our storage cost in S3 is going down due to the GC. However, our egress bandwidth costs are growing faster than the storage cost is shrinking. I think AI scraping is killing us perhaps	10:09:08
Arian	2 years ago egress bandwidth was 3.5k per month. It's 6.2k per month now	10:09:48
Arian	I think in one month we'll be paying more for egress bandwidth than storage	10:10:12
Arian	That sounds really off to me.	10:10:17
Arian	[Image: 1000065300.jpg]	10:11:44
Arian	Red line is egress bandwidth	10:11:51
Arian	It's hockey sticking	10:11:57
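To put the two figures quoted above (3.5k per month two years ago, 6.2k per month now) in perspective, here is a minimal sketch of the implied growth rate. It assumes smooth compound growth over 24 months, which is a simplification: the graph is described as hockey-sticking, so recent growth is probably steeper than this average. The function name is illustrative only.

```python
def compound_monthly_growth(start: float, end: float, months: int) -> float:
    """Average monthly growth rate that takes `start` to `end` in `months` months."""
    return (end / start) ** (1.0 / months) - 1.0


# Figures from the chat: egress cost per month, in thousands (currency not stated).
monthly = compound_monthly_growth(3.5, 6.2, 24)
yearly = (1.0 + monthly) ** 12 - 1.0

print(f"average growth: {monthly:.2%} per month, {yearly:.1%} per year")
```

This only averages the two endpoints. Estimating when egress overtakes storage would also need the storage cost series, which is not given here.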
jappie	does that mean that the amount of requests for stuff that isn't cached by fastly is growing (perhaps scrapers stumbling upon old derivations)? because I'd assume an uptick in users downloading new-ish derivations would mean more hits in fastly and no noticeable growth in S3 egress	10:17:43
hexa	I think so	10:18:23
jappie	how the hell are scrapers discovering old store paths / derivations... the URLs for those contain a hash right? I'd expect that to be really difficult and time-consuming (and useless) to scrape	10:19:37
Arian	You can dump all of Hydra's evaluations. Or run evaluations for all historical NixOS commits yourself	10:21:14
leona	can we determine that this is egress via fastly or is someone downloading them directly from AWS?	10:21:46
Arian	Directly through S3 is only possible when requester pays	10:22:01
Arian	We don't have anonymous auth enabled on our bucket. You need to provide your IAM identity and it gets billed to the caller	10:22:32
Arian	It would be preferable if scrapers would scrape S3 directly as then it doesn't cost us	10:22:55
hexa	one obvious fix would be to GC harder, provide fewer targets	10:24:42
Arian	I'm wondering if I can somehow figure out from S3 the distribution of the age of objects being requested	10:30:52
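One way the age distribution could be computed, sketched under assumptions: that per-request records (key, request time, bytes sent) are available, for example from S3 server access logs, and that each key's last-modified time is known, for example from an S3 Inventory report. The bucket boundaries, the function names, and the input shapes are all hypothetical, and nothing here reflects how the cache bucket is actually instrumented.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical age buckets: (upper bound, label), checked in order.
BUCKETS = [
    (timedelta(days=30), "<30d"),
    (timedelta(days=365), "30d-1y"),
    (timedelta(days=3 * 365), "1y-3y"),
]
OLDEST = ">=3y"


def age_bucket(age: timedelta) -> str:
    """Map an object's age at request time to a bucket label."""
    for limit, label in BUCKETS:
        if age < limit:
            return label
    return OLDEST


def egress_by_age(requests, last_modified):
    """Sum bytes sent per age bucket.

    requests: iterable of (key, request_time, bytes_sent) tuples.
    last_modified: dict mapping key -> datetime the object was written.
    Requests for keys missing from last_modified are skipped.
    """
    totals = Counter()
    for key, when, sent in requests:
        written = last_modified.get(key)
        if written is None:
            continue
        totals[age_bucket(when - written)] += sent
    return totals
```

Summing bytes rather than counting requests keeps the result aligned with what is billed, since egress is charged by volume.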
emily	is it still true that Fastly doesn't cache paths for long even once built?	11:10:55
emily	I forget what the conclusion of that discussion was (I know the focus was on missing paths because of the access pattern but presumably those are not what's causing these expenses)	11:11:26
emily	I guess the issue is that if it's sufficiently spread out/high cardinality no per-path caching will help. (though it seems surprising for scrapers to be going out of their way to find old stuff to query, I really doubt N versions of the same binaries are valuable?)	11:12:57
Arian	Missing paths don't generate bandwidth cost. They generate API call cost. Which is small	11:33:00
emily	right	11:58:09
emily	but I mean if a present path is being hammered, how long does Fastly cache that before going back to S3?	11:58:33
Arian	24h, I think	11:58:46
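The cardinality point raised earlier can be illustrated with a toy model of a per-path TTL cache. This is a sketch only: it assumes a single cache with unlimited capacity and a fixed TTL (24h, taken from the tentative answer above), which is much simpler than a real CDN. The function name and the request format are made up for the example.

```python
def origin_fetches(requests, ttl_seconds):
    """Count requests that must go back to the origin under a per-key TTL cache.

    requests: iterable of (timestamp_seconds, key), sorted by timestamp.
    A key is served from cache until ttl_seconds after it was last fetched.
    """
    expires_at = {}
    misses = 0
    for now, key in requests:
        if expires_at.get(key, float("-inf")) <= now:
            misses += 1  # not cached or expired: fetch from origin
            expires_at[key] = now + ttl_seconds
    return misses


DAY = 24 * 60 * 60

# One hot path hit 1000 times in a day: a single origin fetch.
hot = [(i, "hot-path") for i in range(1000)]
# 1000 distinct paths hit once each: every request reaches the origin.
spread = [(i, f"path-{i}") for i in range(1000)]

print(origin_fetches(hot, DAY), origin_fetches(spread, DAY))
```

In this model a longer TTL only helps when the same path is requested again. A crawl that touches each old path once produces one origin fetch per path regardless of the TTL.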
