5 Jul 2022 |
Carl Thomé | Wonder if anyone has thought about treating ML model training as a nix-build and adding the resulting model binaries to the Nix store, similar to how DVC does it:
https://dvc.org/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory
Crazy? Reproducible experiment tracking as a build process rather than application state? | 21:35:28 |
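Here is a minimal Python sketch of the DVC-style content-addressed cache linked above. It assumes the layout the DVC docs describe: an MD5 digest of the file contents, with the first two hex characters as a subdirectory and the rest as the file name. That layout has differed between DVC versions, and this is not DVC's code. `file_md5`, `cache_path` and `add_to_cache` are hypothetical helpers. The Nix store works differently: ordinary (input-addressed) store paths are named by a hash of the build inputs, not of the output bytes.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's contents, read in chunks so large model files don't fill memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cache_path(cache_dir: Path, digest: str) -> Path:
    """Content-addressed location: <first 2 hex chars>/<remaining 30 hex chars>."""
    return cache_dir / digest[:2] / digest[2:]


def add_to_cache(cache_dir: Path, src: Path) -> Path:
    """Copy src into the cache once; files with identical bytes share one entry."""
    dest = cache_path(cache_dir, file_md5(src))
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        dest.chmod(0o444)  # read-only, loosely mimicking immutable store paths
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.bin"
        model.write_bytes(b"trained weights")  # stand-in for a real model artifact
        print(add_to_cache(Path(tmp) / "cache", model))
```

Adding two files with identical bytes yields the same cache entry. That is what makes outputs deduplicated and safe to treat as immutable.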
jbedo | Yeah I do that a lot | 21:38:54 |
jbedo | Also run bioinformatics pipelines similarly: https://github.com/papenfusslab/bionix | 21:40:03 |
6 Jul 2022 |
Jez (he/him) 🌦️ | Makes sense that the outputs (including intermediate ones) should go in the nix store and be immutable | 06:39:46 |
Jez (he/him) 🌦️ | Although how do you deal with nondeterminism? Would it work to consider the RNG seed part of the build input, to recover reproducibility? | 06:41:46 |
jbedo | i set my seeds so it's deterministic | 06:48:58 |
jbedo | there's that new non-determinism feature of nix that might be useful if you can't do that | 06:50:38 |
Jez (he/him) 🌦️ | ooh, I hadn't heard about that | 06:53:32 |
jbedo | https://github.com/NixOS/nix/pull/6227 | 06:55:00 |
jbedo | i haven't really thought about this in the context of data pipelines, just remember seeing it pass through my notifications | 06:55:36 |
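This is a minimal sketch of the seeded approach jbedo describes. It assumes all randomness flows from one explicitly seeded RNG. `train` is a hypothetical toy, plain SGD on a line fit using only the standard library. Passing `seed` as an argument is the Python analogue of making the seed a build input. Real frameworks, especially on GPUs, may need additional determinism settings beyond a seed.

```python
import random


def train(seed: int, steps: int = 200) -> tuple[float, float]:
    """Toy training run: fit y = w*x + b with SGD. All randomness comes from `seed`."""
    rng = random.Random(seed)  # private RNG; never touches the global random state
    # Synthetic data: y = 3x + 1 plus Gaussian noise, drawn from the seeded RNG.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    data = [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]
    # Random initialisation, also from the seeded RNG.
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    lr = 0.1
    for _ in range(steps):
        x, y = rng.choice(data)  # sample order depends only on the seed
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


if __name__ == "__main__":
    assert train(42) == train(42)  # same seed, bit-identical weights
    print(train(42), train(43))
```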
cdepillabout | While I don't have a huge amount of experience with this, one thing to keep in mind is that if you have huge datasets, they end up in the Nix store (unless you're doing something unusual) along with your output model. Your Nix store can really blow up in size. You may need to garbage collect frequently, but that may reduce the benefit of running training with Nix in the first place.
You'd also have to be careful when running builds on a machine that pushes results to a shared cache, since huge datasets and models could crowd out everything else in that cache.
You may also want to set up your Nix builders so they can see your GPUs, and tune the max-jobs option so that two builds aren't trying to train in parallel.
| 07:45:28 |
cdepillabout | If you're trying to productionize something, it may be easier to start with a system that already has answers for all this type of stuff (some MLOps service or something?), rather than rolling your own with Nix. | 07:55:13 |
12 Jul 2022 |
| pbsds joined the room. | 22:55:46 |
26 Jul 2022 |
| IPK joined the room. | 18:32:31 |
27 Jul 2022 |
| Collin Arnett joined the room. | 20:39:42 |
1 Aug 2022 |
| better_sleeping joined the room. | 09:06:29 |
| better_sleeping left the room. | 09:06:44 |
Collin Arnett | How do people deal with packages trying to download models at runtime to the nix store? This is more of a general question since I've noticed that a lot of data science libraries nowadays are just downloading dependencies at runtime. | 18:18:39 |
2 Aug 2022 |
FRidh | (in reply to @collinarnett:matrix.org) I think all you can do is contact upstream to fix where they install to. If they need models, that could be fine, but then they should probably go to ~/.cache | 09:26:59 |
Collin Arnett | (in reply to @FRidh:matrix.org) Yeah, I think this is probably the best solution. Thank you :) | 17:16:04 |
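Here is a minimal sketch of where a library might put runtime-downloaded models instead of the Nix store. It follows the XDG Base Directory convention that `~/.cache` comes from: use `$XDG_CACHE_HOME` if it is set to an absolute path, otherwise `$HOME/.cache`. `model_cache_dir` and the `mylib` name are hypothetical. The `env` parameter exists only to make the lookup testable.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def model_cache_dir(app_name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Per-user cache directory for downloaded models, outside the read-only Nix store."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME", "")
    # The XDG spec says relative paths in these variables should be ignored.
    if not base or not os.path.isabs(base):
        base = os.path.join(env.get("HOME") or str(Path.home()), ".cache")
    return Path(base) / app_name / "models"


if __name__ == "__main__":
    print(model_cache_dir("mylib"))  # hypothetical library name
```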
16 Aug 2022 |
| Felix joined the room. | 11:48:20 |