| 31 Aug 2022 |
hexa | not an expert on where documentation would actually go | 00:40:48 |
tpw_rules | but i know that's a recent thing | 00:40:53 |
hexa | tying it to some package set would be easiest | 00:41:08 |
tpw_rules | i think the user would want to control it though. not sure what you mean by that comment | 00:41:47 |
tpw_rules | but my main question is: i assume actually switching packages to care about it could wait until a later PR? | 00:42:30 |
tpw_rules | isn't the cudaPackages set a new thing? | 00:42:38 |
hexa | I don't know | 00:43:03 |
hexa | so you basically want a module that controls the cuda version used everywhere? | 00:43:20 |
Samuel Ainsworth | Seems fine to me as long as it builds! | 00:43:38 |
tpw_rules | not the cuda version, but the cuda architecture support list: https://github.com/NixOS/nixpkgs/blob/849bf642cf8319b0aca69708462ff8c4874189ca/pkgs/development/python-modules/torch/default.nix#L82 | 00:45:01 |
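For context, a minimal sketch of how a user might restrict such a per-package arch list today via an override, assuming the torch derivation accepts an argument along the lines of `cudaArchList` (the name and the "8.6" entry format here are assumptions based on the linked file, not a guaranteed interface):

```nix
# shell.nix -- sketch only. `cudaArchList` and its value format are assumed,
# not a documented interface; check the linked default.nix for the real one.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true;  # CUDA is unfree
      cudaSupport = true;  # build python3Packages.torch with CUDA
    };
  };

  # Target a single GPU generation (e.g. an RTX 30xx card) instead of
  # every architecture the package would otherwise compile for.
  torchForMyCard = pkgs.python3Packages.torch.override {
    cudaArchList = [ "8.6" ];
  };
in
pkgs.mkShell { packages = [ torchForMyCard ]; }
```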
Samuel Ainsworth | IIRC Someone S also brought the cuda architecture support issue up in the past. we don't have a solution yet, but it sounds like a nice addition! | 00:46:35 |
tpw_rules | the tl;dr is that almost every generation of card requires binaries that target it specifically, and binaries can't be used on cards that don't match. there are pre-binary forms which can be JITed into binaries in most circumstances, but performance can suffer. distributors usually compile all binaries possible, but a user who only needs it to work on their own card and has to compile from source (which is everybody for nixpkgs cuda stuff) can save literal hours of compilation time by not doing that | 00:46:47 |
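To make the binary-vs-"pre-binary" (SASS vs PTX) point concrete, here is a rough sketch of how the nvcc `-gencode` flag lists differ between a "compile everything" build and a "just my card" build. The capability lists are illustrative only, not taken from any particular nixpkgs package:

```nix
# gencode-sketch.nix -- evaluate with `nix-instantiate --eval --strict`.
let
  lib = (import <nixpkgs> { }).lib;

  # Real machine code (SASS) for one compute capability: runs only on
  # cards of that exact generation.
  sassFor = cap: "-gencode=arch=compute_${cap},code=sm_${cap}";

  # PTX for one capability: can be JIT-compiled into SASS on newer cards,
  # at some startup/performance cost.
  ptxFor = cap: "-gencode=arch=compute_${cap},code=compute_${cap}";

  # "Distributor" style: SASS for every supported generation, plus PTX
  # for the newest one so future cards still work via JIT.
  allCaps = [ "60" "61" "70" "75" "80" "86" ];
  fatFlags = map sassFor allCaps ++ [ (ptxFor (lib.last allCaps)) ];

  # "Just my card" style: SASS for a single capability.
  myFlags = [ (sassFor "86") ];
in
{ inherit fatFlags myFlags; }
```

Each extra SASS entry in `fatFlags` is another full device-code compile pass, which is where the hours of build time come from.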
tpw_rules | "all binaries possible" also depends on the cuda library version and specific package capabilities | 00:47:46 |
Samuel Ainsworth | IIUC the tradeoff here is between user compile times and the size of cached builds, ie. not every user needs every arch but we support more than one so that users don't have to rebuild locally | 00:47:57 |
tpw_rules | i wonder what percentage of nixpkgs cuda users use cached builds. i think it's very low | 00:48:29 |
Samuel Ainsworth | there are no established guidelines for this atm, packages set their own cuda arches independently | 00:48:39 |
tpw_rules | maybe i am wrong | 00:48:41 |
Samuel Ainsworth | I would actually assume it's quite high. tensorflowWithCuda is something like 48 CPU-hours to build | 00:49:02 |
Samuel Ainsworth | or maybe 24... I don't remember exactly | 00:49:09 |
Samuel Ainsworth | but it's a big boy | 00:49:12 |
tpw_rules | yea, that's why i have a 48 core server as a remote builder and don't update my nixpkgs set except every 6 months :) | 00:49:41 |
Samuel Ainsworth | hehe lucky you | 00:49:53 |
Samuel Ainsworth | this was exactly why we built out the cachix cache | 00:50:16 |
Samuel Ainsworth | https://app.cachix.org/cache/cuda-maintainers#pull | 00:50:34 |
Samuel Ainsworth | and full wiki for context: https://nixos.wiki/wiki/CUDA | 00:50:51 |
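For reference, pulling from that cache on NixOS boils down to adding it as a substituter; a sketch follows, where the public key is a placeholder to be replaced with the real one from the cache page above:

```nix
# NixOS module fragment -- substitute the real key from the cachix page.
{
  nix.settings = {
    substituters = [ "https://cuda-maintainers.cachix.org" ];
    trusted-public-keys = [
      "cuda-maintainers.cachix.org-1:<public-key-from-cache-page>"
    ];
  };
}
```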
tpw_rules | but in any case, being able to know what you'll get and how to ask for it would be a good improvement to make, even if we leave the default at "all possible" | 00:50:52 |
Samuel Ainsworth | yeah, that's fair | 00:51:08 |
Samuel Ainsworth | would be nice to get all packages aligned on how to do this in a consistent manner | 00:51:30 |
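A hypothetical sketch of what such a consistent, set-wide knob could look like; the option name `cudaCapabilities` and its plumbing are assumptions for illustration, not an interface that exists at the time of this discussion:

```nix
# configuration.nix -- hypothetical interface sketch, not a real option yet.
{
  nixpkgs.config = {
    allowUnfree = true;
    cudaSupport = true;
    # Hypothetical: a single list that every CUDA-aware package in the
    # set would read instead of hard-coding its own architecture list.
    cudaCapabilities = [ "8.6" ];
  };
}
```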
tpw_rules | yeah i've used that before, but the fact that it GCs old stuff spooks me a little. like i said i don't update much, and not just because of build times, so i don't like being forced to upgrade or to suffer the compile anyway later on | 00:51:47 |