!fXpAvneDgyJuYMZSwO:nixos.org

Nix Data Science

310 Members
64 Servers


Sender | Message | Time
3 Sep 2023
@brodriguesco:matrix.org Bruno Rodrigues friends, what do you use to build pipelines that integrate well with Nix? something that would let you build an environment and run a pipeline as simply as possible? me, coming from R, I'm adding the command to run the pipeline to the shellHook of a shell (typically targets::tar_make) 14:04:29
@brodriguesco:matrix.org Bruno Rodrigues so dropping into the shell starts the pipeline and then lets me explore the results 14:05:00
@brodriguesco:matrix.org Bruno Rodrigues I would imagine there's likely a way to use nix itself to do it 14:09:20
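[Editor's sketch] The shellHook approach described above might look roughly like the following. This is an illustration, not Bruno's actual setup: it assumes a `_targets.R` file in the directory where the shell is entered, and it assumes R and the targets package come from nixpkgs via `rWrapper` and `rPackages.targets`.

```nix
# shell.nix: a minimal sketch under the assumptions stated above.
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  # R wrapped so that the targets package is on its library path.
  packages = [
    (pkgs.rWrapper.override { packages = [ pkgs.rPackages.targets ]; })
  ];

  # Runs every time the shell is entered, so `nix-shell` both builds the
  # environment and kicks off the pipeline. Assumes ./_targets.R exists.
  shellHook = ''
    Rscript -e 'targets::tar_make()'
  '';
}
```

After the hook finishes, the interactive shell stays open, so the results can be explored from an R session started in that same environment.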
@crtified:crtified.me CRTified Maybe check out https://github.com/PapenfussLab/bionix - it appears to at least partially match your description (although I don't have any experience with it) 14:10:17
@brodriguesco:matrix.org Bruno Rodrigues yep I've taken a look at it already 😁 14:11:22
@crtified:crtified.me CRTified Can't really help further than that, I'm still only pondering whether I should migrate our HPC cluster to NixOS, but my contract ends soon and I likely won't have enough time 14:13:27
@jb:vk3.wtf jbedo I use bionix pretty extensively, largest project currently involves processing ~90TiB of primary data 21:43:06
@crtified:crtified.me CRTified
In reply to @jb:vk3.wtf
I use bionix pretty extensively, largest project currently involves processing ~90TiB of primary data
Does that dataset interact with the nix store in any way? Or is it kept separately? (Just asking, because I had some problems with putting just <30GB inside the store)
21:45:02
@crtified:crtified.me CRTified It probably doesn't make sense to put it in the store, I guess 😄 21:45:21
@jb:vk3.wtf jbedo It does make sense and we do :) 21:45:33
@crtified:crtified.me CRTified Wow, that's surprising 😄 21:46:15
@jb:vk3.wtf jbedo It means we can cache some key intermediate products so we rarely need to revisit the primary inputs 21:46:26
@jb:vk3.wtf jbedo Large store paths used to be an issue but mostly work pretty smoothly now as a lot of the memory bottlenecks have been removed 21:49:18
@crtified:crtified.me CRTified That makes sense. I really had the impression that large store entries still pose a lot of problems 🙂 Nice to know that this changed 21:51:16
4 Sep 2023
@brodriguesco:matrix.org Bruno Rodrigues does that mean that you could retrieve intermediary outputs from the store to further analyse in R? 16:25:16
@brodriguesco:matrix.org Bruno Rodrigues say I create a ggplot, does it get into the store, and could I look at it in R later? 16:25:50
@crtified:crtified.me CRTified I mean, the store contains files after all. I don't see a reason why that shouldn't work 16:26:07
@brodriguesco:matrix.org Bruno Rodrigues that's essentially how targets works and I find it very useful 16:26:08
@brodriguesco:matrix.org Bruno Rodrigues true, but I was wondering if bionix provides a mechanism for retrieving these files within an R session 16:26:49
@brodriguesco:matrix.org Bruno Rodrigues and also suppose I fit a model, and would like to save this model 16:31:35
@brodriguesco:matrix.org Bruno Rodrigues with targets this model gets serialized and saved for later retrieval in an r session 16:32:06
@brodriguesco:matrix.org Bruno Rodrigues does nix/bionix serialize intermediary outputs? 16:32:46
@crtified:crtified.me CRTified I think jbedo could help a bit more with the bionix question, but for additional outputs, I'd actually check the outputs argument of mkDerivation 16:33:08
@crtified:crtified.me CRTified The typical use case is distinct outputs for libs, bin and so on, but I think it would fit that application nicely as well 16:33:37
@jb:vk3.wtf jbedo
In reply to @brodriguesco:matrix.org
with targets this model gets serialized and saved for later retrieval in an r session
there's no knowledge of bionix at the application level, for R I use the builtin serialisation like save() and saveRDS(), and as CRTified said multiple outputs are handy for dealing with several outputs produced simultaneously
22:27:20
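[Editor's sketch] The combination suggested above (R's builtin serialisation plus the `outputs` argument of `mkDerivation`) could be sketched as follows. This uses plain nixpkgs, not bionix, and is not taken from anyone's actual pipeline: the derivation name `fit-model`, the extra output name `model`, the file names, and the use of R's built-in `mtcars` dataset are all hypothetical choices for illustration.

```nix
# default.nix: a minimal sketch under the assumptions stated above.
{ pkgs ? import <nixpkgs> { } }:

pkgs.stdenv.mkDerivation {
  name = "fit-model";

  # There is no source tree to unpack; the R code is inlined below.
  dontUnpack = true;
  nativeBuildInputs = [ pkgs.R ];

  # Two store paths are produced by one build: the default "out" and
  # a hypothetical extra output called "model".
  outputs = [ "out" "model" ];

  # Each output name is exported to the builder as an environment variable
  # holding its store path, which the R code reads with Sys.getenv().
  installPhase = ''
    mkdir -p "$out" "$model"
    Rscript -e '
      fit <- lm(mpg ~ wt, data = mtcars)
      saveRDS(fit, file.path(Sys.getenv("model"), "fit.rds"))
      writeLines(capture.output(summary(fit)),
                 file.path(Sys.getenv("out"), "summary.txt"))
    '
  '';
}
```

Once built, the serialised model is an ordinary file in the store, so a later R session could load it with `readRDS()` on the `fit.rds` path inside the `model` output.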
5 Sep 2023
@brodriguesco:matrix.org Bruno Rodrigues very nice, I'll have to play around with it 07:18:04
@ri-char:hashi.sbs ri-char left the room. 15:55:18
11 Sep 2023
@softinio:matrix.org @softinio:matrix.org joined the room. 01:26:02
@maupind:matrix.org maupind Do many people here work with non-nix users, particularly people on Windows? Curious about the workflows you use to share your code/projects. I've played around with the rix package (awesome work Bruno Rodrigues), but I'm trying to keep the barrier to entry as low as possible. I'm hoping to be able to upload code used in research data analysis and would love a simple approach with as little work as possible for end-users who may want to check it. I've thought Docker could work and I've also been told about devcontainer.json as an option. Any other ideas/tips? 15:54:00


Room Version: 6