24 May 2025 |
little_dude | Hello, this was a long time ago, but I'm finally back to trying to run ollama :D
saxpy doesn't work. I used this flake:
{
  description = "CUDA saxpy test";
  inputs.nixpkgs.url = "nixpkgs";
  outputs =
    { self, nixpkgs }:
    {
      devShell.x86_64-linux =
        let
          pkgs = import nixpkgs {
            system = "x86_64-linux";
            config.allowUnfree = true; # Required for CUDA
          };
        in
        pkgs.mkShell {
          name = "cuda-saxpy-shell";
          buildInputs = [
            pkgs.cudaPackages.saxpy
            pkgs.cudaPackages.cudatoolkit
          ];
          shellHook = ''
            export CUDA_PATH=${pkgs.cudatoolkit}
            export EXTRA_LDFLAGS="-L/lib -L${pkgs.linuxPackages.nvidia_x11}/lib"
            export EXTRA_CCFLAGS="-I/usr/include"
            # Should I set this?
            # export LD_LIBRARY_PATH=${pkgs.cudaPackages.cudatoolkit.lib}/lib:$LD_LIBRARY_PATH
          '';
        };
    };
}
I think I'm running into the same(?) initialization error (see the attached log file for LD_DEBUG=libs saxpy).
The output of nvidia-smi:
[little-dude@system76-laptop:~/cuda-tests]$ nvidia-smi
Sat May 24 11:08:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144                Driver Version: 570.144        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   46C    P0            590W /  115W |      12MiB /   8188MiB |     13%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            3706      G   ...me-shell-48.1/bin/gnome-shell          2MiB |
+-----------------------------------------------------------------------------------------+
| 09:09:31 |
26 May 2025 |
connor (he/him) (UTC-7) | Was basically bedridden with exhaustion this weekend, starting to come back to life
Should be able to review changes to the CUDA lib PR today | 17:52:27 |
SomeoneSerge (Ever OOMed by Element) | Anyone feel like bridging to irc?.. | 22:30:01 |
27 May 2025 |
connor (he/him) (UTC-7) | Aaaaand I’m not gonna get a chance | 02:21:35 |
connor (he/him) (UTC-7) | SomeoneSerge (UTC+U[-12,12]): you good for our usual weekly call in ~12h? | 02:22:06 |
connor (he/him) (UTC-7) | Also, just got a ticket to NixCon so let me know if you’re going and want to catch up | 02:22:30 |
SomeoneSerge (Ever OOMed by Element) | In reply to @connorbaker:matrix.org
SomeoneSerge (UTC+U[-12,12]): you good for our usual weekly call in ~12h?
Yes please! | 09:11:44 |
SomeoneSerge (Ever OOMed by Element) | In reply to @connorbaker:matrix.org
Also, just got a ticket to NixCon so let me know if you’re going and want to catch up
Finally! You chose the most expensive nixcon in the eu I must say xD | 09:12:45 |
SomeoneSerge (Ever OOMed by Element) | But yes I plan to be there | 09:12:58 |
SomeoneSerge (Ever OOMed by Element) | Hmm the driver is loaded correctly from the impure location, but the error is rather unspecific 🤔
I 143 78774: calling init: /run/opengl-driver/lib/libcuda.so.1
...
161 CUDA error at cudaMalloc(&xDevice, N * sizeof(float)): initialization error
| 09:21:21 |
SomeoneSerge (Ever OOMed by Element) |
# Should I set this?
No there's no need
| 09:21:57 |
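Following the reply above, a trimmed-down version of the test shell might look like the sketch below. It assumes that the packaged saxpy binary finds libcuda.so.1 under /run/opengl-driver/lib on its own, as the LD_DEBUG log suggests, so the shellHook and the extra toolkit input are dropped. Whether nothing else is needed is an assumption, not something confirmed in this conversation, and the sketch does not address the initialization error itself.

```nix
{
  description = "CUDA saxpy test (minimal sketch)";
  inputs.nixpkgs.url = "nixpkgs";
  outputs =
    { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = import nixpkgs {
        inherit system;
        config.allowUnfree = true; # Required for CUDA
      };
    in
    {
      devShells.${system}.default = pkgs.mkShell {
        name = "cuda-saxpy-shell";
        # Assumption: running the prebuilt saxpy binary needs no other inputs.
        packages = [ pkgs.cudaPackages.saxpy ];
        # No shellHook and no LD_LIBRARY_PATH: the driver library is already
        # picked up from the impure location /run/opengl-driver/lib.
      };
    };
}
```

With this in place, `nix develop` followed by `LD_DEBUG=libs saxpy` should reproduce the same check as before.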
SomeoneSerge (Ever OOMed by Element) | Ah sorry, I saw at least one of the PRs (the runtime wrapper one) and was meaning to merge but then got confused by how it relates to the conversation in the issue | 09:25:55 |
ereslibre | thanks a lot! 🙏 | 11:46:46 |
little_dude | Yes :( Any suggestion to debug further? | 12:43:52 |
hexa (UTC+1) | https://hydra.nix-community.org/eval/448541 | 20:34:35 |
hexa (UTC+1) | lots of jobs lost, can anyone look into this? | 20:35:56 |
hexa (UTC+1) | (stack trace truncated; use '--show-trace' to show the full, detailed trace)
error: attribute 'cudaLib' missing
at /nix/store/gzqv127zcha1gh0a3ib4k71mlw46nkyh-source/pkgs/top-level/release-cuda.nix:17:54:
    16|   lib = import ../../lib;
    17|   inherit (import ../development/cuda-modules/_cuda) cudaLib;
      |                                                      ^
    18| in
| 20:36:36 |
hexa (UTC+1) | cc connor (he/him) (UTC-7) | 20:37:04 |
connor (he/him) (UTC-7) | Will take a look shortly, looks like a fixup missed that one | 23:05:38 |
28 May 2025 |
connor (he/him) (UTC-7) | I think https://github.com/NixOS/nixpkgs/pull/411574 should fix everything | 00:05:08 |
connor (he/him) (UTC-7) | (I mean, everything broken by the rename, not like, everything related to CUDA) | 00:05:27 |
29 May 2025 |
connor (he/him) (UTC-7) | Okay work was very busy so I didn’t get a chance to review your changes since last week to the db PR Serge, apologies | 03:53:28 |
connor (he/him) (UTC-7) | Made, for some reason, the bizarre choice of staying up late to work on a different approach than what nix-eval-jobs takes: https://github.com/ConnorBaker/nix/tree/feat/eval-drvs Essentially abusing CoW fork to do parallel eval of derivations in an incremental fashion | 09:14:29 |
little_dude | So fwiw setting hardware.nvidia.open = false; fixed the issue. | 14:32:37 |
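For anyone hitting the same initialization error, the change described above would sit in the NixOS system configuration roughly as sketched here. Only `hardware.nvidia.open = false;` comes from the message. The surrounding options are assumptions about a typical NVIDIA setup and may already be set elsewhere, or be named differently on older releases (for example `hardware.opengl.enable` instead of `hardware.graphics.enable`).

```nix
{ ... }:
{
  # Assumption: the proprietary driver is unfree, so unfree packages must be allowed.
  nixpkgs.config.allowUnfree = true;

  # Assumption: typical options for enabling the NVIDIA driver stack.
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.graphics.enable = true;

  # From the conversation: use the proprietary kernel module
  # instead of the open one.
  hardware.nvidia.open = false;
}
```

After a `nixos-rebuild switch` and a reboot, rerunning saxpy is a quick way to check whether the error is gone.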
30 May 2025 |
connor (he/him) (UTC-7) | Got halfway through reviewing your PR today Serge, hopefully I can knock the rest out tomorrow if work's not too busy | 07:58:13 |
| Priyanshu Pansari joined the room. | 12:08:22 |
| Robbie Buxton joined the room. | 21:02:30 |
31 May 2025 |
| @trofi:matrix.org left the room. | 13:47:01 |
| @assert-inequality:matrix.org left the room. | 19:33:16 |
2 Jun 2025 |
| matrixrooms.info mod bot (does NOT read/send messages and/or invites; used for checking reported rooms) joined the room. | 18:39:44 |