1 Jul 2025 |
magic_rb | In reply to @roberthensing:matrix.org: "Probably a symlink to a synthetic .git/annex directory is preferable." I'd say that each annexed object should be a separate store path | 11:43:15 |
magic_rb | You don't want to be copying 40GB every time a single image changes | 11:43:25 |
Robert Hensing (roberth) | True, but we shouldn't be copying those store paths in the first place | 11:43:50 |
Robert Hensing (roberth) | And if we do, they should match the non-Nix layouts | 11:44:02 |
Robert Hensing (roberth) | Currently there's not much of a connection between the libfetchers fetchers and the store layer, so putting it in the store is more complicated than putting it in .git/annex, fwiw | 11:44:27 |
magic_rb | Wait, so we just tell Nix to include .git/annex during eval? | 11:44:57 |
Robert Hensing (roberth) | (copying to the store only happens at the end of fetchTree and not within the individual source accessors) | 11:45:02 |
magic_rb | Okay, I'm lost lol | 11:45:17 |
magic_rb | We still have to copy for a build though, no? | 11:45:50 |
Robert Hensing (roberth) | So we have the source accessor objects that behave like very simple virtual file systems, and we plan to use them directly instead of copying everything to the store all the time | 11:46:11 |
Robert Hensing (roberth) | They can implement operations like readDirectory or readFile as they please, so the git accessor with annex enabled could add a .git and .git/annex to what it returns, and then do whatever is necessary to return the contents of that | 11:47:31 |
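A rough Python analogy of the accessor idea, assuming a minimal interface. Nix's real source accessors are C++ classes; every class and method name below is hypothetical, and the annex layout is flattened (real git-annex nests objects under hash directories). An annex-aware wrapper splices a synthetic .git/annex into what the underlying git tree returns:

```python
from typing import Dict, List


class SourceAccessor:
    """Hypothetical minimal virtual filesystem with '/'-separated paths."""

    def read_directory(self, path: str) -> List[str]:
        raise NotImplementedError

    def read_file(self, path: str) -> bytes:
        raise NotImplementedError


class DictAccessor(SourceAccessor):
    """Stand-in for a git tree: maps relative file paths to contents."""

    def __init__(self, files: Dict[str, bytes]):
        self.files = files

    def read_directory(self, path: str) -> List[str]:
        prefix = "" if path.strip("/") == "" else path.strip("/") + "/"
        names = set()
        for p in self.files:
            if p.startswith(prefix):
                # Keep only the first path component below `path`.
                names.add(p[len(prefix):].split("/", 1)[0])
        return sorted(names)

    def read_file(self, path: str) -> bytes:
        return self.files[path.strip("/")]


class AnnexAccessor(SourceAccessor):
    """Wraps a git accessor and adds a synthetic .git/annex/objects tree."""

    def __init__(self, inner: SourceAccessor, annex_objects: Dict[str, bytes]):
        self.inner = inner
        self.annex_objects = annex_objects  # annex key -> object contents

    def read_directory(self, path: str) -> List[str]:
        p = path.strip("/")
        if p == "":
            return sorted(set(self.inner.read_directory("")) | {".git"})
        if p == ".git":
            return ["annex"]
        if p == ".git/annex":
            return ["objects"]
        if p == ".git/annex/objects":
            return sorted(self.annex_objects)
        return self.inner.read_directory(path)

    def read_file(self, path: str) -> bytes:
        p = path.strip("/")
        prefix = ".git/annex/objects/"
        if p.startswith(prefix):
            # Contents come from the annex, not from the git tree.
            return self.annex_objects[p[len(prefix):]]
        return self.inner.read_file(path)
```

Nothing is copied anywhere here: the synthetic directory exists only in what the accessor reports.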
Robert Hensing (roberth) | Yes, but only the things you bring into the derivation. You could use a source filter to avoid some unneeded stuff. Currently that's all moot because fetchTree copies everything it could return, but we'll change that, and make source filtering a solution for this problem. | 11:49:19 |
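To illustrate why a source filter matters once copying stops being eager, here is a hypothetical Python sketch, not Nix's API (in Nix the analogous tools are `builtins.filterSource`, `builtins.path` with a `filter` argument, and `lib.fileset`). It keeps only the paths a predicate accepts and reports how many bytes would be copied:

```python
from typing import Callable, Dict, Tuple


def filtered_copy(files: Dict[str, bytes],
                  keep: Callable[[str], bool]) -> Tuple[Dict[str, bytes], int]:
    """Return the files a filter keeps, plus the number of bytes copied.

    `files` maps relative paths to contents; `keep` is a hypothetical
    predicate playing the role of a Nix source filter.
    """
    kept = {path: data for path, data in files.items() if keep(path)}
    copied_bytes = sum(len(data) for data in kept.values())
    return kept, copied_bytes


def code_only(path: str) -> bool:
    # Example predicate: skip the large assets directory, keep everything else.
    return not path.startswith("assets/")
```

With a tree of 10 bytes of code and 1000 bytes of assets, `filtered_copy(files, code_only)` copies only the 10 bytes of code.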
magic_rb | In reply to @roberthensing:matrix.org: Right, which is why I'm saying that I think each annexed file should become its own store path, so that you're not copying about 40GB for each build | 11:50:14 |
Robert Hensing (roberth) | If it's not clear which is better, we could make this behavior configurable. Making the right parts of .git/annex available to derivations would be a pain. | 11:50:38 |
magic_rb | Say I'm working on a game: to nix build it, I need essentially all the annexed files, so for every build I'm copying all the assets, which can be arbitrarily huge | 11:51:13 |
Robert Hensing (roberth) | That's designing for the current Nix, not the Nix we're promising, fwiw | 11:51:33 |
magic_rb | Even with lazy trees, if I'm using 40GB, I have to copy 40GB every time, no? | 11:52:15 |
magic_rb | That's how I understood it | 11:52:25 |
Robert Hensing (roberth) | So I guess we have three possible behaviors:
- dereference it completely and return the file contents
- put it in the store and return store references somehow (more complicated than it seems, but potentially lazy under more circumstances)
- generate a .git/annex which only has the relevant entries (probably annoying to work with, unless your build makes assumptions about .git/annex)
| 11:53:59 |
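The three behaviors above can be sketched as a mode switch. This is a hypothetical Python illustration: the enum, function names, and the store-path format are invented, and `fake_store_path` is NOT Nix's real store path algorithm, only a content-derived name to show that each object gets its own path:

```python
import hashlib
from enum import Enum
from typing import Dict, Union


class AnnexMode(Enum):
    DEREFERENCE = "dereference"          # return the file contents directly
    STORE_REFERENCE = "store-reference"  # return a per-object store path
    SYNTHETIC_ANNEX = "synthetic-annex"  # expose a pruned .git/annex


def fake_store_path(key: str, data: bytes) -> str:
    # Hypothetical: a content-derived name, not Nix's actual hashing scheme.
    digest = hashlib.sha256(data).hexdigest()[:32]
    return f"/nix/store/{digest}-{key}"


def resolve_annexed(key: str, objects: Dict[str, bytes],
                    mode: AnnexMode) -> Union[bytes, str, Dict[str, bytes]]:
    data = objects[key]
    if mode is AnnexMode.DEREFERENCE:
        return data
    if mode is AnnexMode.STORE_REFERENCE:
        return fake_store_path(key, data)
    # SYNTHETIC_ANNEX: only the entry this file needs ends up in .git/annex.
    return {f".git/annex/objects/{key}": data}
```

The store-reference mode is the one where an unchanged object keeps the same path across builds, since the name depends only on the object's contents.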
Robert Hensing (roberth) | Only if you dereference every annexed file | 11:54:31 |
Robert Hensing (roberth) | To make fetching the annexed files lazy involves:
- changing fetchTree to not copy everything to the store, returning a path value instead of a store path (or a lazy-trees store path that produces the correct hash part, which has never been done)
- implementing the annex support in an "event based" manner - not eagerly fetching everything
| 11:58:34 |
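A minimal sketch of the "event based" point, assuming a hypothetical `fetch` callback that stands in for retrieving an object from an annex remote: listing the known objects costs nothing, and an object is fetched only on its first read, then cached.

```python
from typing import Callable, Dict, List


class LazyAnnex:
    """Fetches annexed objects on first read instead of eagerly.

    `fetch` is a hypothetical callback; it is called at most once per key.
    """

    def __init__(self, keys: List[str], fetch: Callable[[str], bytes]):
        self.keys = set(keys)  # known objects; listing them fetches nothing
        self.fetch = fetch
        self.cache: Dict[str, bytes] = {}

    def list_keys(self) -> List[str]:
        return sorted(self.keys)

    def read(self, key: str) -> bytes:
        if key not in self.keys:
            raise KeyError(key)
        if key not in self.cache:
            self.cache[key] = self.fetch(key)  # the only place a fetch happens
        return self.cache[key]
```

If a build only reads one asset, only that asset is fetched, which is what makes the annex support lazy rather than eager.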
emily | if the build of a game requires processing every source asset – which I believe is normal – then this does not stop copying every asset anew for every build, even if lazy trees were fully implemented, right? | 11:59:20 |
emily | (maybe you can cut down on it if you can split it into one asset build per derivation since the store paths would remain the same if the individual asset doesn't change?) | 11:59:55 |
emily | (and I guess integration could avoid re-hashing the file to determine that?) | 12:00:03 |
emily | just checking I understand correctly that if all the assets are needed in the same derivation as part of a src = ./.; type thing it would still amount to copying the entire thing every time | 12:00:24 |
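The difference between one monolithic `src = ./.` input and one input per asset can be shown with plain content hashing. This is an analogy in Python, not Nix's actual NAR hashing, and the function names are hypothetical: editing one asset changes the single tree hash, but changes only one of the per-asset hashes.

```python
import hashlib
from typing import Dict


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def monolithic_hash(files: Dict[str, bytes]) -> str:
    """One hash over the whole tree, like a single `src = ./.` input."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0")
        h.update(content_hash(files[path]).encode())
    return h.hexdigest()


def per_asset_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """One hash per asset, like one derivation per asset."""
    return {path: content_hash(data) for path, data in files.items()}


def changed_inputs(before: Dict[str, bytes], after: Dict[str, bytes]) -> int:
    """Count per-asset inputs whose hash differs between two versions."""
    old, new = per_asset_hashes(before), per_asset_hashes(after)
    return sum(1 for p in new if old.get(p) != new[p])
```

Editing one of three assets invalidates the whole monolithic input, while only one per-asset input changes.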
magic_rb | Yeah, this is what I'm saying. Say you're building a Godot game: you end up with src = ./. and you can't do much with, uh, what was it, filterTree | 12:03:33 |
Robert Hensing (roberth) | The store references choice does make that easier to achieve. Otherwise, we're looking at representations of store objects that don't reside in a real filesystem but in a FUSE store, backed by an underlying CA store like tvix/snix. That would be great to have, but it's a lot of work | 12:03:44 |
emily | I think the practical solution is just to poke holes in the sandbox to access the underlying content-addressed annex store | 12:05:47 |
emily | and rewrite symlinks to point to it before you run your build | 12:05:56 |
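A minimal sketch of the symlink rewrite, assuming the usual git-annex layout where a link target contains `.git/annex/objects/<rest>`, and assuming a hypothetical `annex_store` directory with the same `<rest>` layout has been made visible inside the sandbox (for instance via Nix's `extra-sandbox-paths` setting; how it gets there is outside this sketch):

```python
import os


def rewrite_annex_symlinks(root: str, annex_store: str) -> int:
    """Repoint git-annex symlinks under `root` at `annex_store`.

    Links whose target does not contain '.git/annex/objects/' are left
    alone. Returns the number of links rewritten.
    """
    marker = ".git/annex/objects/"
    rewritten = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Broken links land in filenames, links to dirs in dirnames.
        for name in dirnames + filenames:
            link = os.path.join(dirpath, name)
            if not os.path.islink(link):
                continue
            target = os.readlink(link)
            idx = target.find(marker)
            if idx == -1:
                continue
            new_target = os.path.join(annex_store, target[idx + len(marker):])
            os.remove(link)  # replace the link in place
            os.symlink(new_target, link)
            rewritten += 1
    return rewritten
```

This runs before the build step, so the build sees ordinary symlinks into the shared content-addressed annex store instead of copies of every object.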
Robert Hensing (roberth) | Right, so to clarify: if your main goal is to distribute a built game, you should not use builtin fetchers for anything but Nix expressions, because those will be intermediate results, and only in the case of FODs is it possible to optimize those away, by virtue of the dependent being available and having no reference to the source | 12:05:59 |