!fXpAvneDgyJuYMZSwO:nixos.org

Nix Data Science

287 Members
62 Servers

Sender Message Time
10 Jun 2022
@johnwanyekz:matrix.org Ki left the room. 06:14:29
18 Jun 2022
@atharvaamritkar:matrix.org wiredhikari joined the room. 10:04:30
30 Jun 2022
@necrophcodr:matrix.org necrophcodr joined the room. 07:31:27
@necrophcodr:matrix.org necrophcodr Do we have anything like https://gitlab.inria.fr/guix-hpc/guix-kernel/ in Nix land yet? 07:32:17
@schnecfk:ruhr-uni-bochum.de CRTified (old handle) There is a nix kernel in https://github.com/tweag/jupyterWith 07:41:57
@jb:vk3.wtf jbedo https://github.com/tweag/jupyterWith is similar 07:41:58
@jb:vk3.wtf jbedo xd 07:42:18
@necrophcodr:matrix.org necrophcodr Well, it spins up an instance with those kernels available, but what the guix-kernel does is let you use Guix directly in a notebook, giving you not only a kernel per cell, but a kernel environment per cell. 08:14:43
@necrophcodr:matrix.org necrophcodr I'd prefer to use Nix for that though, if something like it exists. jupyterWith is not close to that behaviour, and wouldn't work on a self-hosted Jupyter platform as far as I can tell, especially for the users of the platform (which is my core focus as a data scientist). 08:15:59
@necrophcodr:matrix.org necrophcodr jupyterWith is great if what you want to do is set up a simple reproducible environment on your own system or for yourself, or a general reproducible platform, but not for individually reproduced (even foreign) notebooks on a hosted platform. 08:17:22
@necrophcodr:matrix.org necrophcodr
In reply to @schnecfk:ruhr-uni-bochum.de:
"There is a nix kernel in https://github.com/tweag/jupyterWith"
The nix kernel might be very useful though, if it's possible to spin up other per-cell kernels in that.
08:19:04
5 Jul 2022
@rgrunbla:matrix.org Rémy Grünblatt changed their display name from Reventlov to Rémy Grünblatt. 12:36:43
@carlthome:matrix.org Carl Thomé I wonder if anyone is thinking about treating ML model training as a nix-build and adding the resulting model binaries to the Nix store, similar to how DVC does it: https://dvc.org/doc/user-guide/project-structure/internal-files#structure-of-the-cache-directory Crazy? Reproducible experiment tracking as a build process rather than application state? 21:35:28
@jb:vk3.wtf jbedo Yeah I do that a lot 21:38:54
@jb:vk3.wtf jbedo Also run bioinformatics pipelines similarly: https://github.com/papenfusslab/bionix 21:40:03
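Editor's sketch of the cache layout Carl Thomé links to: a minimal Python illustration of storing an artifact under a path derived from its content hash. It assumes the layout described in the linked DVC page (first two hex characters of the MD5 as the directory, the rest as the file name); the function names `file_md5` and `add_to_cache` and the file name `model.bin` are hypothetical, and this is not DVC's or Nix's actual implementation. Nix store paths are computed differently, so treat this only as an analogy for "outputs are addressed by a hash and never modified in place".

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model binaries need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(artifact: Path, cache_dir: Path) -> Path:
    """Copy an artifact to cache_dir/<first 2 hex chars>/<remaining chars>.

    Identical content always maps to the same path, so adding the same
    artifact twice is a no-op.
    """
    checksum = file_md5(artifact)
    target = cache_dir / checksum[:2] / checksum[2:]
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(artifact, target)
    return target


def demo() -> None:
    """Store a stand-in "model" file and print where it ended up."""
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.bin"  # hypothetical training output
        model.write_bytes(b"pretend these are trained weights")
        print(add_to_cache(model, Path(tmp) / "cache"))


if __name__ == "__main__":
    demo()
```

Because the path depends only on the bytes, two training runs that produce bit-identical models share one cache entry, which is the property that makes the build-process framing attractive.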
6 Jul 2022
@jez:petrichor.me Jez (he/him) 🌦️ Makes sense that the outputs (including intermediate ones) should go in the nix store and be immutable 06:39:46
@jez:petrichor.me Jez (he/him) 🌦️ Although how do you deal with nondeterminism? Would it work to treat the RNG seed as part of the build input to recover reproducibility? 06:41:46 (edited 06:48:39)
@jb:vk3.wtf jbedo i set my seeds so it's deterministic 06:48:58
@jb:vk3.wtf jbedo there's that new non-determinism feature of nix that might be useful if you can't do that 06:50:38
@jez:petrichor.me Jez (he/him) 🌦️ ooh, I hadn't heard about that 06:53:32
@jb:vk3.wtf jbedo https://github.com/NixOS/nix/pull/6227 06:55:00
@jb:vk3.wtf jbedo i haven't really thought about this in the context of data pipelines, just remember seeing it pass my notifications 06:55:36
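Editor's sketch of the "seed as a build input" idea from the exchange above: a toy Python example, not anyone's actual pipeline. The `train` function is a hypothetical stand-in for real training, and `input_key` is a hypothetical helper that hashes every declared input, seed included, loosely in the spirit of how a build's identity can depend on its inputs. Real frameworks, and GPU kernels in particular, may have further sources of nondeterminism that a seed alone does not remove.

```python
import hashlib
import json
import random
from typing import List, Sequence


def train(data: Sequence[float], seed: int, steps: int = 100) -> List[float]:
    """Toy stochastic "training" loop driven by a private, seeded RNG."""
    rng = random.Random(seed)  # no hidden global RNG state
    weight = 0.0
    for _ in range(steps):
        sample = rng.choice(data)
        weight += 0.1 * (sample - weight)  # move towards the sampled point
    return [weight]


def input_key(data: Sequence[float], seed: int, steps: int = 100) -> str:
    """Hash all declared inputs, including the seed, into one identifier."""
    payload = json.dumps(
        {"data": list(data), "seed": seed, "steps": steps}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    # Same inputs and seed: the result is repeatable.
    print(train(data, seed=0) == train(data, seed=0))
    # A different seed is a different input, so it gets a different key.
    print(input_key(data, seed=0) != input_key(data, seed=1))
```

With the seed folded into the key, changing it yields a new, separately cached result instead of silently replacing the old one.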
@cdepillabout:matrix.org cdepillabout

While I don't have a huge amount of experience with this, one thing to keep in mind is that if you have huge datasets, they end up in the Nix store (unless you're doing something unusual) along with your output model. Your Nix store can really blow up in size. You may need to make sure you're garbage collecting frequently, but that may reduce the helpfulness of running training with Nix in the first place.

You'd also have to be careful with running builds on a machine that pushes results to a shared cache, since huge datasets and models could really crowd everything else out of the cache.

You may also want to set up your Nix builders so they can see your GPUs, and fiddle with the max-jobs option so that two builds aren't trying to train in parallel.

07:45:28
@cdepillabout:matrix.org cdepillabout If you're trying to productionize something, it may be easier to start with a system that already has answers for all of this (some MLOps service or something?), rather than rolling your own with Nix. 07:55:13
9 Jul 2022
@betaboon:0x80.ninja betaboon changed their profile picture. 11:33:12
12 Jul 2022
@pederbs:pvv.ntnu.no pbsds joined the room. 22:55:46
26 Jul 2022
@tinybronca:sibnsk.net underpantsgnome changed their display name from tinybronca to tailrec. 14:39:42
@tinybronca:sibnsk.net underpantsgnome changed their display name from tailrec to tinybronca. 15:40:54
@ipk:h0n3yb4dg3r.ems.host IPK joined the room. 18:32:31

Room Version: 6