| 10 Sep 2024 |
@adam:robins.wtf | * yes, NixOS. 3060Ti | 23:52:05 |
@adam:robins.wtf | * yes, NixOS. 3060 | 23:52:13 |
@adam:robins.wtf | 06:00.0 VGA compatible controller: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] (rev a1) | 23:52:25 |
| 11 Sep 2024 |
@adam:robins.wtf | results of my ollama testing are:
0.3.5 - works with cudaPackages 12_3 and 12_4
0.3.9 - works on 12_3, broken on 12_4
0.3.10 - works on 12_3, broken on 12_4 | 01:13:46 |
connor (burnt/out) (UTC-8) | It is surprising to me that 0.3.5 works with CUDA 12 at all; I guess there were no breaking API changes on stuff they relied on? | 18:05:26 |
| 12 Sep 2024 |
connor (burnt/out) (UTC-8) | In reply to @connorbaker:matrix.org So... I really don't want to have to figure out testing and stuff for OpenCV for https://github.com/NixOS/nixpkgs/pull/339619. OpenCV 4.10 (we have 4.9) supports CUDA 12.4+. Maybe just updating it to punt the issue down the road is fine? (Our latest CUDA version right now is 12.4.) I started writing a pkgs.testers implementation for what Serge suggested here: https://matrix.to/#/!eWOErHSaiddIbsUNsJ:nixos.org/$phSCjT-mxTap-ccF98Z7hZakHk3_-jjkPw2fIvzBhjA?via=nixos.org&via=matrix.org&via=nixos.dev | 00:32:04 |
connor (burnt/out) (UTC-8) | SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix? | 23:30:01 |
connor (burnt/out) (UTC-8) | I'm in the process of implementing a tester (https://github.com/NixOS/nixpkgs/pull/341471) but it's taking a bit and I'd like OpenCV fixed (or at least buildable) with CUDA, without breaking a bunch of downstream consumers of OpenCV (like FFMPEG) | 23:35:35 |
| 13 Sep 2024 |
| kaya 𖤐 changed their profile picture. | 07:16:41 |
SomeoneSerge (back on matrix) | Sorry my availability has been limited this way | 10:19:52 |
SomeoneSerge (back on matrix) | * Sorry my availability has been limited this week | 10:19:55 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org
wouldn't things like API changes between versions cause breakage? EDIT: I guess they would cause build failures... my primary concern was that it would cause failures at runtime, but I suppose that's not really a problem for compiled targets. Relative to libc, NVIDIA's libraries change way, way more between releases (even minor versions!).
Yeah, it occurred to me right after posting that the issue you're actually describing needs very different tests. What I proposed was basically ensuring that the expected versions of dependencies are loaded when running in isolation. What you actually want to ensure is that when a different version has already been loaded (which is guaranteed to happen with Python), the runtime still works. | 10:22:00 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org
SomeoneSerge (nix.camp): as a short-term thing, are you okay with me patching out OpenCV's requirement that CUDA version match so we can merge the CUDA fix?
Sure, let's try. I'd still check something trivial like
# test1
import torch
torch.randn(10, 10, device="cuda").sum().item()
import cv2
# do something with cv2 and cuda
# test2
import cv2
# do something with cv2 and cuda
import torch
torch.randn(10, 10, device="cuda").sum().item()
| 10:24:51 |
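The two import orders above can be sketched as a small runner that starts each order in a fresh interpreter, so libraries loaded by one order cannot leak into the other. This is only a sketch under assumptions: the helper names `build_scripts` and `run_import_orders` are hypothetical, and `cv2.cuda.getCudaEnabledDeviceCount()` is used as a stand-in for "do something with cv2 and cuda" (it returns 0 on OpenCV builds without CUDA, so the check fails there).

```python
import subprocess
import sys

# Snippet taken from the chat: touch CUDA through PyTorch.
TORCH_SNIPPET = (
    "import torch\n"
    'torch.randn(10, 10, device="cuda").sum().item()\n'
)

# Assumption: a device-count query stands in for "do something with cv2 and cuda".
CV2_SNIPPET = (
    "import cv2\n"
    "assert cv2.cuda.getCudaEnabledDeviceCount() > 0\n"
)


def build_scripts():
    """Return one script per import order, keyed by a descriptive name."""
    return {
        "torch_then_cv2": TORCH_SNIPPET + CV2_SNIPPET,
        "cv2_then_torch": CV2_SNIPPET + TORCH_SNIPPET,
    }


def run_import_orders(python=sys.executable, timeout=300):
    """Run each script in its own interpreter; map name -> True if it exited 0."""
    results = {}
    for name, script in build_scripts().items():
        proc = subprocess.run(
            [python, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        results[name] = proc.returncode == 0
    return results
```

Calling `run_import_orders()` from an environment that has both packages and a visible GPU returns a dict of booleans, one per import order. A `False` entry means that order exited with a non-zero status.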
connor (burnt/out) (UTC-8) | In reply to @ss:someonex.net Sorry my availability has been limited this week No need for apology; all volunteer time :) | 16:50:50 |
connor (burnt/out) (UTC-8) | In reply to @ss:someonex.net
Sure, let's try. I'd still check something trivial like
# test1
import torch
torch.randn(10, 10, device="cuda").sum().item()
import cv2
# do something with cv2 and cuda
# test2
import cv2
# do something with cv2 and cuda
import torch
torch.randn(10, 10, device="cuda").sum().item()
Ooh that’s a good minimal test (hopefully), mind if I use that? | 16:52:03 |
connor (burnt/out) (UTC-8) | To clarify SomeoneSerge (nix.camp), do you want a test like that in the OpenCV PR, or is it okay if that's tracked (via https://github.com/NixOS/nixpkgs/issues/341650) and added later? | 23:07:24 |
| 14 Sep 2024 |
| SomeoneSerge (back on matrix) changed their display name from SomeoneSerge (nix.camp) to SomeoneSerge (utc+3). | 11:37:51 |
| kaya 𖤐 changed their profile picture. | 20:26:46 |
| 15 Sep 2024 |
@adam:robins.wtf | In reply to @connorbaker:matrix.org
It is surprising to me that 0.3.5 works with CUDA 12 at all; I guess there were no breaking API changes on stuff they relied on?
Ok, so I think all the version stuff was a red herring. I believe I've found the culprit, which is that this derivation isn't ending up in the final NixOS system: https://github.com/NixOS/nixpkgs/blob/345c263f2f53a3710abe117f28a5cb86d0ba4059/pkgs/by-name/ol/ollama/package.nix#L122 | 17:33:36 |
@adam:robins.wtf | I run ollama in an incus(lxc) container with the GPU passed in, but I don't build the system configuration on that host | 17:34:12 |
@adam:robins.wtf | manually copying it over from the build hosts allows ollama to successfully work | 17:34:27 |
@adam:robins.wtf | * manually copying it over from the build host allows ollama to successfully work | 17:34:37 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org To clarify SomeoneSerge (nix.camp), do you want a test like that in the OpenCV PR, or is it okay if that's tracked (via https://github.com/NixOS/nixpkgs/issues/341650) and added later? Ouch, I thought I had replied. Just a manual test is sufficient, but also needed because I suppose we do want to make sure opencv actually works after merging? | 19:00:51 |
SomeoneSerge (back on matrix) | In reply to @adam:robins.wtf manually copying it over from the build host allows ollama to successfully work Could you elaborate for the others, what is it that needs to be manually copied? | 19:01:34 |
@adam:robins.wtf | https://github.com/NixOS/nixpkgs/pull/342127 should fix it | 19:08:33 |
@adam:robins.wtf | it's that cudaToolkit/cuda-merged env that wasn't being included | 19:08:56 |
SomeoneSerge (back on matrix) | In reply to @adam:robins.wtf it's that cudaToolkit/cuda-merged env that wasn't being included Ehh it shouldn't be included | 19:09:27 |
@adam:robins.wtf | well, i'm open to other fixes, but without that env it fails to find cuda_cudart.so.12 | 19:10:08 |
@adam:robins.wtf | that's the same env being used to build ollama against cuda, so i assume it's expecting the files to be there at runtime too | 19:10:32 |
@adam:robins.wtf | * well, i'm open to other fixes, but without that env it fails to find libcudart.so.12 | 19:14:21 |
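A quick way to see whether the dynamic loader can resolve the CUDA runtime inside the container is to try opening it by soname. This is a minimal sketch, not how ollama itself locates its libraries: `can_load` is a hypothetical helper, and it only reflects the search path of the Python process that runs it (for example `LD_LIBRARY_PATH`), which may differ from the ollama service's environment.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is missing or one of its dependencies is.
        return False
    return True


if __name__ == "__main__":
    # libcudart.so.12 is the soname of the CUDA 12 runtime library.
    print("libcudart.so.12 loadable:", can_load("libcudart.so.12"))
```

Running it once on the build host and once in the container gives a rough comparison of what each environment can see.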