!fXpAvneDgyJuYMZSwO:nixos.org

Nix Data Science

315 Members
60 Servers

Sender | Message | Time
3 Aug 2021
@lunik1:lunik.one (lunik1) | Any recommendations for data pipelines/toolkits that integrate well with nix? | 11:31:51
@vk3wtf:matrix.org (jbedo) | to do what? i use nix with a thin layer on top to manage bioinformatics pipelines | 12:03:59
@tomberek:matrix.org (tomberek) | jbedo: do you use bionix? or is it another thing? | 13:12:47
@lunik1:lunik.one (lunik1) | Currently I just have a bunch of python scripts I execute in a given order, but I'm looking for something that would help me formalise that order, easily extend/swap out parts of those pipelines, and help with deployment | 15:01:59
@tomberek:matrix.org (tomberek) | lunik1: that's a pattern I used Nix/Hydra for. Basically you have a set of "ingress"/"egress" derivations that may be impure (e.g. fetch/store from S3) or pure. Then a chain of nix derivations that depend on each other. I defined a function to apply various transformations and mapped it over my list of ingress derivations. It was super nice for iteration, scaling up workers, cached results, experimenting with alternate pipelines. Way better and more productive than something like Airflow. I started to apply content-addressed derivations to them to do short-circuiting as well, but that was still in progress for Hydra compatibility. | 19:36:25
@lunik1:lunik.one (lunik1) | Damn, that sounds awesome. Is any of this open source? | 19:37:47
@tomberek:matrix.org (tomberek) | No. My plan is to capture the idea, organize it a bit better, and have that be open source. I've heard of a few people re-inventing this a few times, so I want to extract the common portions and perhaps provide a "flow-library" or something to make it easier to put together. | 19:39:30
@tomberek:matrix.org (tomberek) | I'd be happy to collaborate on it. | 19:39:45
@lunik1:lunik.one (lunik1) | Was that all batch processing or could you handle streaming data too? | 19:39:48
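
Editor's sketch: the pattern tomberek describes (ingress steps, a chain of dependent steps, a transformation mapped over the ingress list, cached results) can be illustrated in plain Python. This is not tomberek's code, which was not published, and it does not use Nix or Hydra. The names `Step`, `Store`, `ingress`, `clean` and `count` are hypothetical, and the in-memory `Store` is only a toy stand-in for the Nix store. Keys here are derived from a step's name, parameters and dependencies (roughly like input-addressed derivations); the content-addressed short-circuiting mentioned in the chat is not modelled.

```python
import hashlib
import json


class Step:
    """One pipeline step: a function plus the steps it depends on."""

    def __init__(self, name, fn, deps=(), params=None):
        self.name = name
        self.fn = fn
        self.deps = tuple(deps)
        self.params = params  # JSON-serialisable data that affects the result

    def key(self):
        # Assumption: the key covers name, params and dependency keys only.
        # Changing fn without changing the name will NOT invalidate the cache.
        payload = json.dumps([self.name, self.params, [d.key() for d in self.deps]])
        return hashlib.sha256(payload.encode()).hexdigest()


class Store:
    """Toy result cache keyed by step key (stand-in for a build store)."""

    def __init__(self):
        self.results = {}
        self.builds = 0  # number of steps that actually ran

    def realise(self, step):
        key = step.key()
        if key not in self.results:
            inputs = [self.realise(dep) for dep in step.deps]
            self.results[key] = step.fn(*inputs)
            self.builds += 1
        return self.results[key]


def ingress(name, records):
    # Hypothetical ingress step; a real one might fetch from S3 (impure).
    return Step(f"ingress-{name}", lambda: list(records), params=records)


def clean(src):
    return Step("clean", lambda rows: [r.strip().lower() for r in rows], [src])


def count(src):
    return Step("count", lambda rows: len(rows), [src])


# Map the same chain of transformations over every ingress step.
sources = [ingress("a", [" Foo", "BAR "]), ingress("b", ["baz"])]
pipeline = [count(clean(s)) for s in sources]

store = Store()
results = [store.realise(p) for p in pipeline]
builds_first_run = store.builds

# Re-running the same pipeline hits the cache; no step runs again.
store.realise(pipeline[0])
builds_after_rerun = store.builds

# An alternate pipeline reuses the cached ingress results.
alternate = [store.realise(count(s)) for s in sources]
builds_after_alternate = store.builds
```

Because each key includes the keys of its dependencies, swapping out one step only rebuilds that step and whatever sits downstream of it, which is the property that makes experimenting with alternate pipelines cheap.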
