!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

311 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
61 Servers

11 Jun 2024
@glepage:matrix.orgGaëtan Lepage

I am packaging this: https://github.com/EricLBuehler/mistral.rs?tab=readme-ov-file#installation-and-build
You can see that it supports several build variants (CUDA, Metal, MKL...)

-> What should the approach be? Adding cudaSupport? metalSupport? mklSupport?

07:01:41
@kaya:catnip.eekaya 𖤐 changed their profile picture. 08:03:48
@hexa:lossy.networkhexa
In reply to @glepage:matrix.org
This looks like it could work!
However, how do you apply a patch to a wheel-type Python derivation?
likely in postInstall 😕
11:58:18
@hexa:lossy.networkhexa curses 11:59:30
@glepage:matrix.orgGaëtan Lepage Ok, but can I use fetchpatch though? 12:02:43
@ss:someonex.netSomeoneSerge (matrix works sometimes) connor (he/him) (UTC-5) IIRC you brought up setting legacy (FindCUDA&c) variables from the setup hooks. I think we should set them, and we should put that logic behind a guard (e.g. findCudaCmakeSupport=true), just as we should guard the current logic (e.g. findCudatoolkitCmakeSupport=true). We should disable the legacy by default. We should only set cmake flags when the cmake hook is actually used or when cmake flags are explicitly requested. 13:19:22
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @keiichi:matrix.org
when using localai 2.15 from unstable and even after a reboot I get ggml_cuda_init: failed to initialize CUDA: CUDA driver is a stub library. It's a bit random but if anyone has a tip, I take it. nvidia-smi output looks fine
LD_DEBUG=libs
13:19:44
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @glepage:matrix.org

I am packaging this: https://github.com/EricLBuehler/mistral.rs?tab=readme-ov-file#installation-and-build
You can see that it supports several build variants (CUDA, Metal, MKL...)

-> What should the approach be? Adding cudaSupport? metalSupport? mklSupport?

Does it allow enabling multiple features at once?
13:20:13
@glepage:matrix.orgGaëtan Lepage
In reply to @ss:someonex.net
Does it allow enabling multiple features at once?
No, but I think that I will copy the implementation from ollama
13:20:38
@glepage:matrix.orgGaëtan Lepage It looks very clean to me 13:20:44
@glepage:matrix.orgGaëtan Lepage https://github.com/NixOS/nixpkgs/blob/master/pkgs/by-name/ol/ollama/package.nix#L65-L82 13:21:07
@ss:someonex.netSomeoneSerge (matrix works sometimes) The shouldEnable logic looks maybe a bit complex but the arguments seem good? 13:23:21
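For readers following along, the single-selector idea discussed here could look roughly like the sketch below. It assumes the upstream Cargo features are named cuda, metal and mkl, as the mistral.rs README suggests. The `acceleration` argument, the placeholder version and hashes, and the list of CUDA inputs are illustrative assumptions, not the actual nixpkgs expression for ollama or mistral.rs.

```nix
# Hypothetical sketch: one mutually exclusive `acceleration` selector instead of
# separate cudaSupport / metalSupport / mklSupport booleans.
{
  lib,
  config,
  rustPlatform,
  fetchFromGitHub,
  cudaPackages,
  # Assumed argument name; defaults to the nixpkgs-wide cudaSupport setting.
  acceleration ? if config.cudaSupport then "cuda" else null,
}:

# Reject unknown values early, since only one backend can be enabled at a time.
assert builtins.elem acceleration [ null "cuda" "metal" "mkl" ];

rustPlatform.buildRustPackage {
  pname = "mistral-rs";
  version = "0.0.0"; # placeholder, not a real release

  src = fetchFromGitHub {
    owner = "EricLBuehler";
    repo = "mistral.rs";
    rev = "v0.0.0"; # placeholder
    hash = lib.fakeHash; # placeholder
  };
  cargoHash = lib.fakeHash; # placeholder

  # Pass at most one Cargo feature (feature names assumed from the README).
  buildFeatures = lib.optional (acceleration != null) acceleration;

  # Assumed CUDA inputs; the real set depends on what the crate links against.
  nativeBuildInputs = lib.optionals (acceleration == "cuda") [ cudaPackages.cuda_nvcc ];
  buildInputs = lib.optionals (acceleration == "cuda") [
    cudaPackages.cuda_cudart
    cudaPackages.libcublas
  ];
}
```

A single selector makes the "only one at a time" constraint explicit, which a set of independent booleans cannot express without extra assertions.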
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @gjvnq:matrix.org
Hey, can I ask for help with compiling alice-vision on NixOS?
Looking more closely, I'd guess the issue is somewhere around __has_include(<Imath/half.h>) in ${openimageio.dev}/include/OpenImageIO/half.h
14:49:05
@keiichi:matrix.orgteto SomeoneSerge (UTC+3): TIL, LD_DEBUG looks quite useful. I suppose the "stub" referred to in the message concerns /nix/store/q3m473lh6gcg4xbhbknrhmcj7w7njjs6-cuda_cudart-12.2.140-lib/lib/stubs/glibc-hwcaps/x86-64-v3. Do you know what a "stub" is and why it would be a problem? I understand a "stub" to be a "generic" library? (I have an RTX 3060) 16:34:54
@connorbaker:matrix.orgconnor (he/him) teto: as I understand it, we use stub libraries when the libraries we would link against aren't available, for example because they exist outside the sandbox (like libcuda.so does, as part of the NVIDIA driver, in /run/opengl-driver/lib/). They allow the build to succeed where it would otherwise fail due to missing symbols.
They shouldn't cause issues at runtime, because the executable should find and load the proper library from wherever it actually lives (in this case, /run/opengl-driver/lib/).
18:05:24
@keiichi:matrix.orgteto I don't seem to have any CUDA library in /run/opengl-driver/. Should I add anything into hardware.opengl.extraPackages? 18:17:25
@connorbaker:matrix.orgconnor (he/him) You mean /run/opengl-driver/lib/ and not /run/opengl-driver/ right? 18:41:09
@keiichi:matrix.orgteto I've searched both in depth so yes 19:53:53
@connorbaker:matrix.orgconnor (he/him) What's the command you're using to try to run this piece of software? If it's a flake I can try to reproduce it on my machine 20:37:10
@gjvnq:matrix.orgMir
In reply to @ss:someonex.net
Looking more closely, I'd guess the issue is somewhere around __has_include(<Imath/half.h>) in ${openimageio.dev}/include/OpenImageIO/half.h

Yeah, I had already figured that out, but the big issue is that I don't know what the "right" way to include the definition of the half type is.

To make matters worse, I've tried to compile AliceVision in a Docker container using the official compilation scripts, and yet the thing keeps failing. This means I can't even look at how the thing is supposed to compile.

I'm at a bit of a loss as to how to proceed, but I suspect that I'll have to either ask the original authors for help or carefully read the CMake build scripts to look for potential sources of the error.

Theoretically AliceVision has a nice CI pipeline but I can't see their build history so I don't even know how useful their CI scripts are.

20:44:35
@keiichi:matrix.orgteto connor (he/him) (UTC-5): it's packaged in nixpkgs, nix run nixpkgs#local-ai (you need the override with config.cudaSupport true though). At one point I had GPU working but I use it on and off and now something changed in nixpkgs probably. 21:11:33
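One way to express the override mentioned here is to import nixpkgs with CUDA enabled in its config. This is a minimal sketch, assuming a channel-based `<nixpkgs>` lookup; the file name is hypothetical, and `allowUnfree` is included because the CUDA packages are unfree.

```nix
# local-ai-cuda.nix (hypothetical file name)
# Sketch: instantiate nixpkgs with CUDA support turned on globally, then pick
# the local-ai package out of that package set.
let
  pkgs = import <nixpkgs> {
    config = {
      allowUnfree = true; # CUDA redistributables are unfree
      cudaSupport = true; # enables CUDA in every package that honours the flag
    };
  };
in
pkgs.local-ai
```

Building it with `nix-build local-ai-cuda.nix` would be the non-flake equivalent of the `nix run` invocation; expect long build times, since packages built with CUDA enabled may not be available from the default binary cache.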
@ss:someonex.netSomeoneSerge (matrix works sometimes)
In reply to @keiichi:matrix.org
I don't seem to have any CUDA library in /run/opengl-driver/. Should I add anything into hardware.opengl.extraPackages?
hardware.opengl.enable and the nvidia driver?..
21:29:16
@ss:someonex.netSomeoneSerge (matrix works sometimes) * the state of hardware.opengl.enable and the nvidia driver?.. 21:29:30
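The two settings being asked about would typically appear in a NixOS configuration along the lines of the sketch below. It assumes the option names current in nixpkgs around the time of this conversation (later releases renamed `hardware.opengl` to `hardware.graphics`) and is not taken from anyone's actual configuration in this thread.

```nix
# Sketch of a NixOS module fragment, assuming mid-2024 option names.
{ ... }:
{
  # The proprietary NVIDIA driver is unfree.
  nixpkgs.config.allowUnfree = true;

  # Populates /run/opengl-driver/lib, where libcuda.so is expected at runtime.
  hardware.opengl.enable = true;

  # Selects the NVIDIA driver (also applies on headless or Wayland setups).
  services.xserver.videoDrivers = [ "nvidia" ];
}
```

With both set, `ls -l /run/opengl-driver/lib/` should list libcuda.so as a symlink into the driver's store path.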
@connorbaker:matrix.orgconnor (he/him) What revision of nixpkgs are you on? master fails to build (go-stable-diffusion errors during CMake configure) 21:50:56
@keiichi:matrix.orgteto ha nevermind, I do have libcuda.so and so on in /run/opengl-driver/lib (my first search must have ignored symlinks). I use the local-ai from nixos-unstable so that would be c7b821ba2e1e635ba5a76d299af62821cbcb09f3 21:58:35
@connorbaker:matrix.orgconnor (he/him)

Huh, not sure how that's working for you since I can't get it to build:

$ cat ~/.config/nixpkgs/config.nix 
{
  allowAliases = false;
  allowBroken = false;
  allowUnfree = true;
  checkMeta = true;
  cudaCapabilities = [ "7.5" ];
  cudaSupport = true;
}
$ nix run --impure -L --show-trace --builders '' --max-jobs 1 github:nixos/nixpkgs/c7b821ba2e1e635ba5a76d299af62821cbcb09f3#local-ai
...
go-stable-diffusion> CMake Error at /nix/store/q1nssraba326p2kp6627hldd2bhg254c-cmake-3.29.2/share/cmake-3.29/Modules/FindCUDA.cmake:883 (message):
go-stable-diffusion>   Specify CUDA_TOOLKIT_ROOT_DIR
go-stable-diffusion> Call Stack (most recent call first):
go-stable-diffusion>   /nix/store/r008mflixfchlfscby4h0mjgqvz059pw-opencv-4.9.0/lib/cmake/opencv4/OpenCVConfig.cmake:86 (find_package)
go-stable-diffusion>   /nix/store/r008mflixfchlfscby4h0mjgqvz059pw-opencv-4.9.0/lib/cmake/opencv4/OpenCVConfig.cmake:108 (find_host_package)
go-stable-diffusion>   examples/CMakeLists.txt:17 (find_package)
22:08:21
@connorbaker:matrix.orgconnor (he/him) Oh! SomeoneSerge (UTC+3) I was rebuilding OpenCV4 with the changes to the setup hooks you mentioned earlier about the CMake flags being opt-in, and I noticed that switching --compiler-bindir to -ccbin was enough to get rid of the "incompatible redefinition" warnings we've been seeing with CMake: https://github.com/NixOS/nixpkgs/pull/306172/commits/7dc8d6d83a853f98a695e2b23aa8d33a50aff6df#diff-3692a7105fd90d95727cd2f794cdb4af2656be94af52d97485c9d4ded9107883L93-R72 22:23:06
@trexd:matrix.orgtrexd Do you guys have any recommendations on setting up a cache for https://github.com/hasktorch/hasktorch-skeleton/pull/9 ? The Haskell part of the build already takes quite a long time, but throw CUDA into the mix and I think build times would quickly get out of hand. Either way, I've also never set up a cache via cachix or hercules-ci before, so I don't know what the limits on build times are either. 23:31:32
12 Jun 2024
@aidalgol:matrix.orgaidalgol SomeoneSerge (UTC+3): Any thoughts? https://github.com/NixOS/nixpkgs/issues/319167 00:36:45
@aidalgol:matrix.orgaidalgol This strikes me as a rather odd thing to even be checking for as a user. 00:39:26
