| 1 Jun 2025 |
Arian | (found that out the hard way) | 17:44:57 |
edef | we cache them at most briefly, but this isn't the point | 17:45:30 |
edef | every derivation you build that isn't cached doesn't start building until the cache lookup has been confirmed to 404 | 17:45:47 |
edef | approximately 100% of all nix derivations aren't in the cache | 17:46:02 |
edef | the caching is in the wrong direction | 17:46:29 |
edef | caching absences buys you essentially nothing | 17:46:42 |
edef | there are 1461501637330902918203684832716283019655932542976 (2^160) store paths, most of which are absent from the cache | 17:47:04 |
edef | we only have 2^28 or so present store paths | 17:47:31 |
edef | your 404 cache cannot fit 2^160 items | 17:47:41 |
Arian | Bloom Filter? | 17:48:01 |
Arian | Anyhow I digress. My idea definitely seems too simplistic | 17:48:12 |
edef | yes, a bloom filter works | 17:48:18 |
Vladimír Čunát | It's normal to cache even over larger spaces than 2^160. | 17:48:20 |
Vladimír Čunát | (I do that at work.) | 17:48:37 |
edef | but a bloom filter only kicks the can down the road, it only answers "probably present", etc | 17:49:30 |
edef | you'll improve p95 latency but not p100 | 17:49:49 |
edef | it is an improvement, but the actual hashset fits in memory easily | 17:50:28 |
Arian | Our p100 times are probably pretty bad now too. S3 fetch across the continent is not particularly fast | 17:51:18 |
edef | this is true, yes | 17:51:32 |
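
The Bloom filter versus exact set trade-off above can be put into numbers with a minimal sketch. It assumes 20-byte (160-bit) keys and roughly 2^28 present store paths, as stated in the chat. The `BloomFilter` class, its SHA-256 double-hashing scheme, the 1% false-positive target, and the `path-N` demo keys are illustrative assumptions, not anything the cache or snix actually uses.

```python
import hashlib
import math


def raw_set_bytes(n_items: int, key_bytes: int = 20) -> int:
    """Bytes needed to store n_items keys back to back (no container overhead)."""
    return n_items * key_bytes


def bloom_bits(n_items: int, fp_rate: float) -> int:
    """Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))


class BloomFilter:
    """Hypothetical minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return ((h1 + i * h2) % self.n_bits for i in range(self.n_hashes))

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# 2^28 keys of 20 bytes each is exactly 5 GiB of raw key material.
PRESENT_PATHS = 2**28
exact_bytes = raw_set_bytes(PRESENT_PATHS)
filter_bytes = bloom_bits(PRESENT_PATHS, 0.01) // 8

# Small demo with made-up keys standing in for store path hashes.
present = {hashlib.sha1(f"path-{i}".encode()).digest() for i in range(1000)}
bloom = BloomFilter(bloom_bits(len(present), 0.01), n_hashes=7)
for key in present:
    bloom.add(key)

absent = hashlib.sha1(b"not-in-cache").digest()
exact_answer = absent in present   # always definitive
bloom_answer = absent in bloom     # False is definitive, True only means "maybe"
```

The filter is much smaller than the exact set, but only a negative answer from it is final. A positive answer still needs a real lookup, which is why it helps typical latency more than worst-case latency.
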
edef | sticking the narinfos in postgres would be easy and get us a bloom filter with a CREATE INDEX | 17:52:13 |
edef | i forget what backends we currently have for the snix narinfo service but "just a database" is the straightforward answer | 17:52:59 |
edef | an object store full of text files is just an awkward answer | 17:53:16 |
Arian | What if we only do my idea for NARs and not narinfos? | 17:53:36 |
Arian | Would still cut down bandwidth and keep 404s "fast" | 17:53:53 |
edef | [in reply to @edef1c:matrix.org: "so like, for the most part you can do anything to the NAR datapath sure, that is fine"] ^ | 17:54:00 |
edef | all i aim to point at is that the narinfo datapath is uniquely sacred | 17:54:24 |
edef | it is not where the heavy lifting is in terms of data volume | 17:54:32 |
edef | nar requests are essentially guaranteed to hit (you were given the path in a narinfo, it will exist) and don't need to be particularly low-latency | 17:55:51 |
flokli | I still think it'd be very worthwhile to tap into narinfo uploads, so we can continuously update our data on narinfos in a way that doesn't involve scraping millions of text files. | 17:58:22 |
edef | yes | 17:58:30 |
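
As a rough illustration of the "just a database" idea combined with tapping into uploads, here is a minimal sketch. SQLite from the Python standard library stands in for Postgres only so that the example runs without a server. The table name `narinfo`, its columns, and the functions `open_index`, `record_upload`, and `lookup` are all hypothetical, and the hash parts and narinfo body are made-up placeholders.

```python
import sqlite3
from typing import Optional


def open_index() -> sqlite3.Connection:
    """Create an in-memory index keyed by the store path hash part."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE narinfo ("
        " hash_part TEXT PRIMARY KEY,"  # 32-character store path hash
        " body TEXT NOT NULL)"          # narinfo text as uploaded
    )
    return conn


def record_upload(conn: sqlite3.Connection, hash_part: str, body: str) -> None:
    """Called once per narinfo upload, so the index never needs a bulk scrape."""
    conn.execute(
        "INSERT OR REPLACE INTO narinfo (hash_part, body) VALUES (?, ?)",
        (hash_part, body),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, hash_part: str) -> Optional[str]:
    """Return the narinfo body, or None as a definitive local 'not present'."""
    row = conn.execute(
        "SELECT body FROM narinfo WHERE hash_part = ?", (hash_part,)
    ).fetchone()
    return row[0] if row else None


conn = open_index()
known = "a" * 32    # placeholder hash part, not a real store path
unknown = "b" * 32  # placeholder hash part that was never uploaded
record_upload(conn, known, f"StorePath: /nix/store/{known}-example\n")

hit = lookup(conn, known)
miss = lookup(conn, unknown)
```

The point of the design is that a miss is answered from the primary-key index alone, without a round trip to an object store, and that each upload keeps the index current as it happens.
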