Sender | Message | Time |
---|---|---|
3 Aug 2021 | ||
Any recommendations for data pipeline/toolkits that integrate well with nix? | 11:31:51 | |
to do what? i use nix with a thin layer on top to manage bioinformatics pipelines | 12:03:59 | |
jbedo: do you use bionix? or is it another thing? | 13:12:47 | |
Currently I just have a bunch of python scripts I execute in a given order, but I'm looking for something that would help me formalise that order, easily extend/swap out parts of those pipelines, and help with deployment | 15:01:59 | |
lunik1: that's a pattern I used Nix/Hydra for. Basically you have a set of "ingress"/"egress" derivations that may be impure (e.g. fetch/store from S3) or pure. Then a chain of nix derivations that depend on each other. I defined a function to apply various transformations and mapped it over my list of ingress derivations. It was super nice for iteration, scaling up workers, cached results, experimenting with alternate pipelines. Way better and more productive than something like Airflow. I started to apply content-addressed derivations to them to do short-circuiting as well; that was still in progress for Hydra compatibility. | 19:36:25 | |
Damn that sounds awesome, any of this open source? | 19:37:47 | |
No. My plan is to capture the idea, organize it a bit better, and have that be open source. I've heard of a few people re-inventing this a few times, so I want to extract the common portions and perhaps provide a "flow-library" or something to make it easier to put together. | 19:39:30 | |
I'd be happy to collaborate on it. | 19:39:45 | |
Was that all batch processing or could you handle streaming data too? | 19:39:48 | |
It was not streaming (in the sense that you just shoved requests in one end and results popped out the other), but it ended up having lower latency than our streaming solution.... so....... But that was because of Hydra; you can use the same "flow-library" with a different evaluator/build system to get something more streaming | 19:41:46 | |
So you could effectively run in "real time"? With some sort of API? | 20:09:37 | |
potentially, but that comes with more complexity and requirements that i'd like to avoid (or at least be agnostic about) for now | 20:16:20 | |
For my application I would need some functionality like that, and nix doesn't seem obviously suited to it | 20:20:35 | |
In reply to @tomberek:matrix.org: yeah bionix | 22:17:54 | |
In reply to @tomberek:matrix.org: sounds somewhat similar to the way bionix models processing steps as nix functions, allowing you to easily map transformations over sets of inputs etc | 22:19:03 | |
On that matter, has anyone got https://www.fluvio.io/ running? | 22:19:26 | |
of course i had cluster executing in mind as well since i had to make the computations work on slurm | 22:19:39 | |
In reply to @vk3wtf:matrix.org: Do you have some configuration public for setting up slurm? I'm currently getting into HPC administration and I'm trying to get a slurm cluster up and running with nixops, so it'd be great to see what others use to set it up :) | 22:25:17 | |
no i don't run the cluster with nix, i just submit jobs to it with nix | 22:34:25 | |
bionix looks nice but I gather is pretty tightly tied to bioinformatics? | 23:50:34 | |
4 Aug 2021 | ||
well the library of tools is, but the general idea isn't | 00:48:11 | |
at its core it's just a collection of functions taking config -> inputs -> output (drvs), and building pipelines by composing them together | 00:49:21 | |
there's some small abstractions in bionix to allow for switching of the execution context, so that instead of the nix builder running the build it can be submitted to a cluster instead | 00:53:45 | |
7 Aug 2021 | ||
Ah, haven't visited this chat in a while... | 21:10:05 | |
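
The "config -> inputs -> output" pattern described in the thread can be sketched roughly as follows. This is only an illustration in plain Python, not bionix's API and not Nix: ordinary functions stand in for derivations, and every name here (`strip_whitespace`, `change_case`, `compose`, `run_pipeline`) is hypothetical.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Step = Callable[[Any], Any]  # a configured step: input -> output


def strip_whitespace(config: Config) -> Step:
    """Hypothetical step: takes a config, returns a function input -> output."""
    return lambda text: text.strip()


def change_case(config: Config) -> Step:
    """Hypothetical step whose behaviour is selected by its config."""
    upper = config.get("upper", False)
    return lambda text: text.upper() if upper else text.lower()


def compose(*steps: Step) -> Step:
    """Chain configured steps left to right into a single pipeline."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def run_pipeline(pipeline: Step, inputs: List[Any]) -> List[Any]:
    """Map one pipeline over a list of inputs (the 'ingress' side)."""
    return [pipeline(item) for item in inputs]


# Build a pipeline by composing configured steps, then map it over the inputs.
pipeline = compose(strip_whitespace({}), change_case({"upper": True}))
results = run_pipeline(pipeline, ["  acgt ", "ttga\n"])

# Swapping out one part only means composing a differently configured step.
alternate = compose(strip_whitespace({}), change_case({}))
alternate_results = run_pipeline(alternate, ["  AcGt "])
```

Because each step is just a function of its config and its input, formalising the order of steps and swapping parts of a pipeline both reduce to changing the arguments to `compose`. Caching and cluster submission, which the thread attributes to Nix and bionix, are not modelled here.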