!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

288 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers

Sender: Message [Time]
25 Nov 2022
@ss:someonex.net SomeoneSerge (back on matrix): Oh, not at all. Just that I don't think I had even one PR accepted without someone asking me to specify the platforms explicitly 🤣 But it also makes sense: the attribute is treated specially by hydra (and probably other CIs), so why leave it implicit [21:32:51]
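For context, a minimal sketch of what specifying the platforms explicitly looks like in a package expression. The package name, version, and install step are hypothetical placeholders; the only part that relates to the message above is the `meta.platforms` attribute, here set to `lib.platforms.linux` as an assumed choice for a CUDA-dependent package.

```nix
# Hypothetical package, shown only to illustrate an explicit meta.platforms.
{ lib, stdenv }:

stdenv.mkDerivation {
  pname = "example-cuda-tool"; # placeholder name, not a real nixpkgs attribute
  version = "0.1.0";

  # No real source: produce an empty output so the sketch evaluates and builds.
  dontUnpack = true;
  installPhase = ''
    mkdir -p $out
  '';

  meta = {
    description = "Placeholder package illustrating an explicit platforms list";
    # Stated explicitly rather than left implicit, so CI knows where to build it.
    platforms = lib.platforms.linux;
  };
}
```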
@ss:someonex.net SomeoneSerge (back on matrix): "Caching" was my first thought as well, but probably we're talking about different caches. It's always about caching though [21:34:22]
@ss:someonex.net SomeoneSerge (back on matrix): Does anyone feel like following through with the 11.7 -> 11.8 bump? There were some packages with cmake configure failing =\ [21:35:48]
@tpw_rules:matrix.org tpw_rules:
In reply to @skainswo:matrix.org:
> I'm not sure I understand... do you have concrete examples of these caching failures? what could GitHub do to help the situation?
it's not possible to naïvely cache the store using the cache action because it can't restore the read-only files. i'm not sure why they can't just extract the archive with sudo or whatever. it's work-around-able by manually changing permissions and shuffling stuff around but that's slow
[22:37:16]
26 Nov 2022
@skainswo:matrix.org Samuel Ainsworth: ahhhhhh I see. sounds a bit like https://github.com/NixOS/nixos-foundation/issues/25#issuecomment-1327892243 [00:01:04]
@skainswo:matrix.org Samuel Ainsworth: yeah, agreed that would be helpful [00:01:14]
@tpw_rules:matrix.org tpw_rules: yes. i'm not even clamoring for a global cache, i really just want the cache action to work nicely [00:06:10]
@tpw_rules:matrix.org tpw_rules: though a global cache would be great. but fixing the cache action should be a very easy target [00:06:36]
@tpw_rules:matrix.org tpw_rules: i commented on that thread, thanks for pointing me to it [00:09:09]
@hexa:lossy.network hexa: https://github.com/NixOS/nixpkgs/pull/202769 [16:56:54]
@hexa:lossy.network hexa: can someone review this? [16:56:59]
@hexa:lossy.network hexa: working on torchaudio and it would be neat to have this [16:57:15]
@ahsmha:matrix.org ahmed changed their display name from rh to ahmed. [19:19:35]
29 Nov 2022
@tpw_rules:matrix.org tpw_rules: hexa: running a nixpkgs-review cycle including CUDA stuff. planning to try the test suite too. expect something by tomorrow evening but i don't expect any major issues [05:05:55]
@tpw_rules:matrix.org tpw_rules: thank you for your patience [05:06:53]
@skainswo:matrix.org Samuel Ainsworth: draft CUDA 11.8 upgrade: https://github.com/NixOS/nixpkgs/pull/203658 [20:49:35]
@skainswo:matrix.org Samuel Ainsworth: for some reason jaxlib and tensorflow are not building with 11.8... anyone have any ideas? [20:49:56]
@tpw_rules:matrix.org tpw_rules: i thought Someone S had a draft too and he got similar errors [20:58:37]
@skainswo:matrix.org Samuel Ainsworth: oh that's entirely possible... i was not aware [23:50:55]
@skainswo:matrix.org Samuel Ainsworth: yup, you're totally right: https://github.com/NixOS/nixpkgs/pull/200435 [23:51:40]
@skainswo:matrix.org Samuel Ainsworth: i'm really confused why we're seeing these errors... they seem to indicate that the directory structure changed between 11.7 -> 11.8 [23:52:23]
1 Dec 2022
@box1:matrix.org @box1:matrix.org:

I'm trying to package dgl-cu116 (dgl with CUDA support) and it fails to find rpath for libtorch_cuda_cpp.so and libtorch_cuda_cu.so.

After some searching, those files are generated under torch when it is built with BUILD_SPLIT_CUDA=ON or BUILD_SPLIT_CUDA=1 (https://discuss.pytorch.org/t/no-libtorch-cuda-cpp-so-available-when-build-pytorch-from-source/159864). This link says that BUILD_SPLIT_CUDA is not the default because

> there may be other side effects (like increased binary size) that users might not be expecting, and it's only when we are compiling for many architectures where we run into these linker issues.

Currently, [torch](https://github.com/NixOS/nixpkgs/blob/nixos-22.11/pkgs/development/python-modules/torch/default.nix) doesn't have an option for it. Maybe an option like mklDnnSupport, so that it can be turned on for packages like dgl-cuda116 that need those files, would be great. Any thoughts on this?

[11:44:06]
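Until such an option exists, one way this could be tried locally is an override that sets the variable on the torch derivation. This is an untested sketch under stated assumptions: it assumes the `torch` expression accepts a `cudaSupport` argument, that extra derivation attributes reach the PyTorch build as environment variables, and that the build honours `BUILD_SPLIT_CUDA` as described in the linked thread. The name `torchSplitCuda` is hypothetical.

```nix
# Untested sketch: torch with CUDA enabled and BUILD_SPLIT_CUDA set.
# allowUnfree is needed because the CUDA toolkit is unfree.
{ pkgs ? import <nixpkgs> { config.allowUnfree = true; } }:

let
  # Hypothetical name for the overridden package.
  torchSplitCuda =
    (pkgs.python3Packages.torch.override { cudaSupport = true; })
      .overridePythonAttrs (old: {
        # Assumption: exported to the build environment and picked up by
        # PyTorch's build scripts, as the linked forum thread suggests.
        BUILD_SPLIT_CUDA = "ON";
      });
in
torchSplitCuda
```

A package such as dgl could then take `torchSplitCuda` in place of the default `torch`. Whether the resulting libtorch_cuda_cpp.so and libtorch_cuda_cu.so end up where dgl expects them would still need checking.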
@hexa:lossy.network hexa changed their display name from hexa to hexa (22.11 now). [13:09:03]
@hexa:lossy.network hexa changed their display name from hexa (22.11 now) to hexa. [14:38:53]
@danielrf:matrix.org danielrf: Hi, I have some recent work that might be of interest to the Nix CUDA community: jetpack-nixos (https://github.com/anduril/jetpack-nixos). See also this announcement post on the discourse: https://discourse.nixos.org/t/jetpack-nixos-nixos-module-for-nvidia-jetson-devices/23632 [19:50:11]
@danielrf:matrix.org danielrf: The CUDA version included with jetpack is apparently not the same as just the aarch64 CUDA for servers, but I've tried to repackage the debs from NVIDIA in a way similar to cudaPackages in nixpkgs: https://github.com/anduril/jetpack-nixos/blob/master/cuda-packages.nix [19:50:23]

Room Version: 9