!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

291 Members, 57 Servers
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda


Sender | Message | Time
2 Nov 2024
@sielicki:matrix.org sielicki
maybe I did a poor job of summarizing it here in riot but that's what I expected to see
23:42:13
@ss:someonex.net SomeoneSerge (back on matrix)
Nvidia ships stubs without the .1s which is why we have https://github.com/NixOS/nixpkgs/blob/a8ffc2295c358629bc1bda569bf8b3bbb21aa1be/pkgs/development/cuda-modules/cuda/overrides.nix#L124-L129
23:42:54
@sielicki:matrix.org sielicki
The problem I'm wondering about is what actually enforces that ld.so prefers /run/opengl-driver/lib to /usr/local/nvidia/lib64/stubs? or potentially someone's conda env or virtualenv
23:42:59
@sielicki:matrix.org sielicki
with RPATH'ing all the things, it's probably fine
23:43:31
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
The problem I'm wondering about is what actually enforces that ld.so prefers /run/opengl-driver/lib to /usr/local/nvidia/lib64/stubs? or potentially someone's conda env or virtualenv
That executables from Nixpkgs use their own ld.so which ignores /usr stuff
23:43:42
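For readers following the search-order question, here is a rough Python model of the lookup order described in the ld.so(8) man page for glibc. It is a sketch, not the loader's actual logic: it leaves out the ld.so.cache step and secure-execution mode, and the function name and the store path placeholder are made up for illustration.

```python
def search_order(rpath, runpath, ld_library_path, default_dirs):
    """Return directories in the order glibc's ld.so would try them.

    Simplified from ld.so(8): the ld.so.cache lookup and
    secure-execution mode are deliberately left out.
    """
    order = []
    # DT_RPATH is only honoured when the object has no DT_RUNPATH.
    if rpath and not runpath:
        order.extend(rpath)
    # LD_LIBRARY_PATH is consulted before DT_RUNPATH.
    order.extend(d for d in ld_library_path.split(":") if d)
    order.extend(runpath)
    # Built-in defaults. For a Nixpkgs glibc these are assumed to live
    # in the store rather than under /usr (placeholder path below).
    order.extend(default_dirs)
    return order


if __name__ == "__main__":
    for directory in search_order(
        rpath=[],
        runpath=["/run/opengl-driver/lib"],
        ld_library_path="/usr/local/nvidia/lib64/stubs",
        default_dirs=["/nix/store/<hash>-glibc/lib"],
    ):
        print(directory)
```

In this model a RUNPATH entry still loses to anything exported through LD_LIBRARY_PATH, while /usr directories never show up because they are not among the assumed defaults.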
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
as long as i'm stuck on my computer on a saturday, qq: is there a strong reason to hold cudart at 12.4 in master, or just nobody raised a PR for it?
* Yes it's just that it's toil (and review roundtrip times aren't helping)
23:45:28
@sielicki:matrix.org sielicki
let me know if I can pick up any slack or what you guys need
23:46:03
@ss:someonex.net SomeoneSerge (back on matrix)
There's lots and the linked PR is one candidate 😍
23:46:50
@sielicki:matrix.org sielicki
* I just raised an issue earlier today about some of the driver hashes missing for some of the releases, it feels to me like we really need a solid cuda json scraper to prefetch script
23:47:33
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @ss:someonex.net
Nvidia ships stubs without the .1s which is why we have https://github.com/NixOS/nixpkgs/blob/a8ffc2295c358629bc1bda569bf8b3bbb21aa1be/pkgs/development/cuda-modules/cuda/overrides.nix#L124-L129
I'm wondering what can we do to remove this hack
23:53:53
@sielicki:matrix.org sielicki
I have no idea how it works in the first place, let alone how to remove it
23:56:04
@sielicki:matrix.org sielicki
* i guess it's just consistently the case that ld.so believes there's a better match than that one.
23:57:48
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
I just raised an issue earlier today about some of the driver hashes missing for some of the releases, it feels to me like we really need a solid cuda json scraper to prefetch script
Yes, there's https://github.com/ConnorBaker/cuda-redist-find-features/ but afaik Connor's developing this alone, and there's stuff to be improved with the way we parse the results on the nixpkgs side too
23:58:38
3 Nov 2024
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
i guess it's just consistently the case that ld.so believes there's a better match than that one.
It's not about ld.so in this case, it's that iirc CUDA::cuda_driver in CMake somehow wanted to see the .1 during the build, which I suspect is wrong
00:01:09
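As a rough illustration of the kind of workaround under discussion, the sketch below assumes it amounts to placing a `.1` symlink next to each unversioned stub so that build tooling looking for the versioned name finds something. The linked overrides.nix is not reproduced here; the function name and the directory layout are hypothetical.

```python
from pathlib import Path


def add_soname_links(stubs_dir: Path, suffix: str = ".1") -> list:
    """Create `<name>.so.1 -> <name>.so` symlinks for stubs that lack one.

    Returns the list of links that were created.
    """
    created = []
    # Only names ending in ".so" match, so existing ".so.1" files are skipped.
    for stub in sorted(stubs_dir.glob("lib*.so")):
        link = stub.with_name(stub.name + suffix)
        # is_symlink() also catches dangling links that exists() misses.
        if link.exists() or link.is_symlink():
            continue
        # Relative target keeps the link valid if the directory is moved.
        link.symlink_to(stub.name)
        created.append(link)
    return created
```

Calling `add_soname_links(Path("lib/stubs"))` on a directory holding only `libcuda.so` would add `libcuda.so.1` pointing at it; running it a second time creates nothing.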
@sielicki:matrix.org sielicki
do you remember what issue/bug it was?
00:04:40
@sielicki:matrix.org sielicki
In reply to @ss:someonex.net
Yes, there's https://github.com/ConnorBaker/cuda-redist-find-features/ but afaik Connor's developing this alone, and there's stuff to be improved with the way we parse the results on the nixpkgs side too
really nice, I was just gonna propose we run wget --recursive on https://developer.download.nvidia.com/compute/cuda/repos/runfile/x86_64/
00:16:04
@sielicki:matrix.org sielicki
much prefer the feature extraction stuff, that's wicked
00:16:23
@sielicki:matrix.org sielicki
one problem with the runfile json is it excludes certain components, eg: nccl
00:17:14
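The JSON scraper idea above could start from something like the following sketch, which walks an already-downloaded manifest and reports entries with no hash. The manifest shape is an assumption made for this example (top-level package keys, per-platform entries with `relative_path` and `sha256` fields); it is not taken from cuda-redist-find-features or from nixpkgs, and all names are hypothetical.

```python
from typing import Iterator, List, NamedTuple, Optional


class Artifact(NamedTuple):
    package: str
    platform: str
    relative_path: str
    sha256: Optional[str]


def iter_artifacts(manifest: dict) -> Iterator[Artifact]:
    """Yield one Artifact per (package, platform) entry in the manifest."""
    for package, info in manifest.items():
        if not isinstance(info, dict):
            continue  # skip scalar metadata such as a release date
        for platform, entry in info.items():
            # Assumed layout: platform entries are dicts with a relative_path.
            if isinstance(entry, dict) and "relative_path" in entry:
                yield Artifact(
                    package, platform, entry["relative_path"], entry.get("sha256")
                )


def missing_hashes(manifest: dict) -> List[Artifact]:
    """Return the artifacts that carry no (or an empty) sha256 field."""
    return [a for a in iter_artifacts(manifest) if not a.sha256]
```

A prefetch script could feed the result of `missing_hashes` to whatever download-and-hash step is preferred; that step is left out here on purpose.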
4 Nov 2024
@snektron:matrix.org Snektron
Is there a reason why https://github.com/NixOS/nixpkgs/pull/291471 is not merged?
13:46:49
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @snektron:matrix.org
Is there a reason why https://github.com/NixOS/nixpkgs/pull/291471 is not merged?
Stuck on unvendoring qt libraries it seems
15:58:17
@connorbaker:matrix.org connor (he/him)
What are your thoughts on nixGL and nix-gl-host? Specifically as it relates to ensuring people use the same driver version on both NixOS and non-NixOS host machines
22:43:59
@ss:someonex.net SomeoneSerge (back on matrix) *

It's a disagreeable opinion but the whole "inspect the system at eval time" business in nixGL makes me want to scream "why". Even the idea of requiring a running nix-daemon in order to load the drivers at the program's startup... I think it's very backwards.

I think nix-gl-host decomposes tasks better, but

  1. It doesn't handle the "nixos but a different release (libc)" situation (nixGL... well it does, just under some set of assumptions)
  2. I'd much rather have a mode where nix-gl-host is used to export the /run/opengl-driver/lib prefix, rather than construct LD_LIBRARY_PATH. Not only because the latter screws up search path priorities, but because the workflow I'd rather see is: you run a non-nixos system, you drop in a systemd unit, and voilà, your drivers can be consumed exactly the way they are on nixos
23:08:04
@ss:someonex.net SomeoneSerge (back on matrix)
FWIW I see no reason not to implement both approaches (host's libs vs. nixpkgs' libs that match the running kernel) in one tool
23:09:19
@ss:someonex.net SomeoneSerge (back on matrix)
That could be just two flags: --drivers-from={HOST|<nix expr or a flake ref>} and --export=/run/opengl-driver/lib as the alternative to --run <CMD>
23:12:23
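The two-flag interface proposed above can be sketched with argparse as follows. No such tool exists in this form: the program name `driver-shim`, the default of `HOST`, and the choice to pass the command as a single quoted string are all assumptions made for the example.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    # "driver-shim" is a placeholder name, not an existing program.
    parser = argparse.ArgumentParser(prog="driver-shim")
    parser.add_argument(
        "--drivers-from",
        default="HOST",
        metavar="HOST|NIX_EXPR|FLAKE_REF",
        help="use the host's driver libraries, or ones built from Nix",
    )
    # Exactly one of the two modes must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument(
        "--export",
        metavar="PREFIX",
        help="populate PREFIX (e.g. /run/opengl-driver/lib) with the drivers",
    )
    mode.add_argument(
        "--run",
        metavar="CMD",
        type=shlex.split,  # "glxinfo -B" becomes ["glxinfo", "-B"]
        help="run CMD with the drivers made available to it",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Making `--export` and `--run` mutually exclusive mirrors the "as the alternative to" wording; a real tool might reasonably allow both.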
@ss:someonex.net SomeoneSerge (back on matrix)
(and then of course there's this whole libcapsule business that's waiting to be tried)
23:13:40
@ss:someonex.net SomeoneSerge (back on matrix)
Note there was recently somebody in nix-gl-host issues advertising their Rust rewrite
23:29:51


Room Version: 9