!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

292 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers



Sender | Message | Time
2 Nov 2024
@sielicki:matrix.orgsielicki * I just raised an issue earlier today about some of the driver hashes missing for some of the releases, it feels to me like we really need a solid cuda json scraper to prefetch script 23:47:33
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @ss:someonex.net
Nvidia ships stubs without the .1s which is why we have https://github.com/NixOS/nixpkgs/blob/a8ffc2295c358629bc1bda569bf8b3bbb21aa1be/pkgs/development/cuda-modules/cuda/overrides.nix#L124-L129
I'm wondering what can we do to remove this hack
23:53:53
@sielicki:matrix.orgsielicki I have no idea how it works in the first place, let alone how to remove it 23:56:04
@sielicki:matrix.orgsielickii guess it just consistently is the case that there's a better path than that one 23:57:02
@sielicki:matrix.orgsielicki * i guess it's just consistently the case that ld.so believes there's a better match than that one. 23:57:48
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
I just raised an issue earlier today about some of the driver hashes missing for some of the releases, it feels to me like we really need a solid cuda json scraper to prefetch script
Yes, there's https://github.com/ConnorBaker/cuda-redist-find-features/ but afaik Connor's developing this alone, and there's stuff to be improved with the way we parse the results on nixpkgs side too
23:58:38
3 Nov 2024
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @sielicki:matrix.org
i guess it's just consistently the case that ld.so believes there's a better match than that one.
It's not about ld.so in this case, it's that iirc CUDA::cuda_driver in CMake somehow wanted to see the .1 during the build, which I suspect is wrong
00:01:09
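To make the stub discussion above concrete: the stubs ship as plain `.so` files, while something in the build apparently looks for the `.1` name. The sketch below only illustrates the general shape of such a workaround (adding a `.1` symlink next to each stub). It is not the code in the linked overrides.nix; the function name `add_soname_links` and the temporary directory are made up for the example.

```python
import tempfile
from pathlib import Path


def add_soname_links(stubs_dir: Path, suffix: str = ".1") -> list[str]:
    """Create <name>.so<suffix> symlinks next to every <name>.so stub.

    Hypothetical helper: it mimics the idea of the workaround, not the
    actual nixpkgs implementation.
    """
    created = []
    for lib in sorted(stubs_dir.glob("*.so")):
        link = lib.with_name(lib.name + suffix)
        # Skip names that already exist, including dangling symlinks.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(lib.name)  # relative link, stays valid if the dir moves
            created.append(link.name)
    return created


with tempfile.TemporaryDirectory() as tmp:
    stubs = Path(tmp)
    (stubs / "libcuda.so").touch()  # stand-in for a stub shipped without the .1 name
    created = add_soname_links(stubs)
    print(created)
```

If the CMake side turns out to be wrong to ask for the `.1` name at build time, as suspected above, links like these would become unnecessary.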
@sielicki:matrix.orgsielicki do you remember what issue/bug it was? 00:04:40
@sielicki:matrix.orgsielicki
In reply to @ss:someonex.net
Yes, there's https://github.com/ConnorBaker/cuda-redist-find-features/ but afaik Connor's developing this alone, and there's stuff to be improved with the way we parse the results on nixpkgs side too
really nice, I was just gonna propose we run wget --recursive on https://developer.download.nvidia.com/compute/cuda/repos/runfile/x86_64/
00:16:04
@sielicki:matrix.orgsielicki much prefer the feature extraction stuff, that's wicked 00:16:23
@sielicki:matrix.orgsielicki one problem with the runfile json is it excludes certain components, e.g. nccl 00:17:14
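As a rough illustration of the "json scraper to prefetch script" idea from this thread: a script could walk a manifest and report every artifact that has no hash recorded, so those can be prefetched. The sketch below is a minimal version under stated assumptions. The manifest layout (package, then platform, then `relative_path` / `sha256`) and the sample entries are hypothetical stand-ins, and it reads an inline string instead of downloading anything.

```python
import json

# Hypothetical manifest excerpt; the field names and entries are assumptions
# made for this example, not copied from a real release.
SAMPLE_MANIFEST = """
{
  "release_date": "2024-01-01",
  "cuda_cudart": {
    "name": "CUDA Runtime",
    "version": "0.0.1",
    "linux-x86_64": {"relative_path": "cuda_cudart/a.tar.xz", "sha256": "aaaa"}
  },
  "example_driver": {
    "name": "Example driver",
    "version": "0.0.2",
    "linux-x86_64": {"relative_path": "example_driver/b.tar.xz"}
  }
}
"""


def find_missing_hashes(manifest: dict) -> list[tuple[str, str]]:
    """Return (package, platform) pairs whose artifact has no sha256."""
    missing = []
    for package, info in manifest.items():
        if not isinstance(info, dict):
            continue  # scalar metadata such as release_date
        for platform, artifact in info.items():
            # Only nested dicts with a relative_path are treated as artifacts.
            if isinstance(artifact, dict) and "relative_path" in artifact:
                if not artifact.get("sha256"):
                    missing.append((package, platform))
    return sorted(missing)


missing = find_missing_hashes(json.loads(SAMPLE_MANIFEST))
for package, platform in missing:
    print(f"{package} ({platform}): no sha256 recorded, needs prefetching")
```

Components that are absent from the manifest altogether (nccl was the example given above) would not show up in a report like this and would need a separate source.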
4 Nov 2024
@snektron:matrix.orgSnektron Is there a reason why https://github.com/NixOS/nixpkgs/pull/291471 is not merged? 13:46:49
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @snektron:matrix.org
Is there a reason why https://github.com/NixOS/nixpkgs/pull/291471 is not merged?
Stuck on unvendoring qt libraries it seems
15:58:17
@connorbaker:matrix.orgconnor (he/him) What are your thoughts on nixGL and nix-gl-host? Specifically as it relates to ensuring people use the same driver version on both NixOS and non-NixOS host machines 22:43:59
@ss:someonex.netSomeoneSerge (back on matrix)

It's a disagreeable opinion but the whole "inspect the system at eval time" business in nixGL makes me want to scream "why". Even the idea of requiring a running nix-daemon in order to load the drivers at the program's startup... I think it's very backwards.

I think nix-gl-host decomposes tasks better, but

  1. It doesn't handle the "nixos but a different release (libc)" situation (nixGL... well it does, just under some set of assumptions)
  2. I'd much rather have a mode where nix-gl-host is used to export the /run/opengl-driver/lib prefix, rather than construct LD_LIBRARY_PATH. Not only because the latter screws up search path priorities, but because the workflow I'd rather see is: you run a non-nixos, you drop in a systemd unit, and voilà, your drivers work exactly the way they do on nixos
23:07:53
@ss:someonex.netSomeoneSerge (back on matrix) *

It's a disagreeable opinion but the whole "inspect the system at eval time" business in nixGL makes me want to scream "why". Even the idea of requiring a running nix-daemon in order to load the drivers at the program's startup... I think it's very backwards.

I think nix-gl-host decomposes tasks better, but

  1. It doesn't handle the "nixos but a different release (libc)" situation (nixGL... well it does, just under some set of assumptions)
  2. I'd much rather have a mode where nix-gl-host is used to export the /run/opengl-driver/lib prefix, rather than construct LD_LIBRARY_PATH. Not only because the latter screws up search path priorities, but because the workflow I'd rather see is: you run a non-nixos, you drop in a systemd unit, and voilà, your drivers can be consumed exactly the way they are on nixos
23:08:04
@ss:someonex.netSomeoneSerge (back on matrix) FWIW I see no reason not to implement both approaches (host's libs vs. nixpkgs' libs that match the running kernel) in one tool 23:09:19
@ss:someonex.netSomeoneSerge (back on matrix) That could be just two flags: --drivers-from={HOST|<nix expr or a flake ref>} and --export=/run/opengl-driver/lib as the alternative to --run <CMD> 23:12:23
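The two-flag interface proposed above can be sketched with argparse to show how the modes would interact. This is a hypothetical mock-up, not an existing tool: the program name `gl-host-sketch`, the `describe` helper, and the choice to pass the command as a single quoted string are all assumptions for the example. Only the flag names come from the message above.

```python
import argparse
import shlex


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gl-host-sketch")
    # Where the driver libraries come from: the literal HOST, or a Nix
    # expression / flake reference (treated as an opaque string here).
    parser.add_argument("--drivers-from", default="HOST", metavar="SOURCE")
    # Exactly one mode: export a driver prefix, or wrap a single command.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--export", metavar="DIR")
    mode.add_argument("--run", metavar="CMD")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarise what the tool would do, without touching the system."""
    if args.drivers_from == "HOST":
        source = "host libraries"
    else:
        source = f"Nix-built drivers from {args.drivers_from}"
    if args.export is not None:
        return f"export {source} under {args.export}"
    return f"run {shlex.split(args.run)} with {source}"


args = build_parser().parse_args(["--export", "/run/opengl-driver/lib"])
print(describe(args))
```

Making `--export` and `--run` mutually exclusive mirrors the "as the alternative to" wording: the export mode would be what a systemd unit calls once, while `--run` keeps the per-command wrapper workflow.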
@ss:someonex.netSomeoneSerge (back on matrix) (and then of course there's this whole libcapsule business that's waiting to be tried) 23:13:40
@ss:someonex.netSomeoneSerge (back on matrix) Note there was recently somebody in nix-gl-host issues advertising their Rust rewrite 23:29:51
5 Nov 2024
@aidalgol:matrix.orgaidalgol Has anyone successfully built blender on unstable with CUDA support enabled recently? 02:31:45
@julius:mtx.liftm.deˈt͡sɛːzaɐ̯ 🤔 Currently, openusd fails in patch phase. 08:39:43
@zimbatm:numtide.comJonas Chevalier
In reply to @aidalgol:matrix.org
Has anyone successfully built blender on unstable with CUDA support enabled recently?
looks like it's broken since end of October: https://hydra.nix-community.org/build/1720981
15:13:24
@zimbatm:numtide.comJonas Chevalier
In reply to @ss:someonex.net
Note there was recently somebody in nix-gl-host issues advertising their Rust rewrite
Did you take a look at https://github.com/NVIDIA/nvidia-container-toolkit already? I think they are battling with similar issues, plus sandboxing on top. It seems like they are using ld.so.cache as a loading mechanism instead of the LD_* env var and that might be more robust?
15:16:02
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @zimbatm:numtide.com
Did you take a look at https://github.com/NVIDIA/nvidia-container-toolkit already? I think they are battling with similar issues, plus sandboxing on top. It seems like they are using ld.so.cache as a loading mechanism instead of the LD_* env var and that might be more robust?
Yes that's what we use for docker/podman on nixos
15:16:44
@ss:someonex.netSomeoneSerge (back on matrix) I'm not sure if it's worth reusing because nixGL and nix-gl-host are solving a more general problem; the ctk just assumes an FHS environment, we have to patch it and we have to patch its outputs to make them usable on nixos 15:19:32
@ss:someonex.netSomeoneSerge (back on matrix) But yes we should keep in mind the general idea of exporting ld.so.cache. We actually used it at least for some time for the singularity containers 15:21:25
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @zimbatm:numtide.com
looks like it's broken since end of October: https://hydra.nix-community.org/build/1720981
In file included from /build/source/intern/cycles/scene/image_vdb.cpp:5:
/build/source/intern/cycles/scene/../scene/image_vdb.h:12:12: fatal error: nanovdb/util/GridHandle.h: No such file or directory
   12 | #  include <nanovdb/util/GridHandle.h>
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [intern/cycles/scene/CMakeFiles/cycles_scene.dir/build.make:328: intern/cycles/scene/CMakeFiles/cycles_scene.dir/image_vdb.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 56%] Building CXX object source/blender/makesrna/intern/CMakeFiles/bf_rna.dir/rna_access_compare_override.cc.o
In file included from /build/source/intern/cycles/scene/image.cpp:9:
/build/source/intern/cycles/scene/../scene/image_vdb.h:12:12: fatal error: nanovdb/util/GridHandle.h: No such file or directory
   12 | #  include <nanovdb/util/GridHandle.h>
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~

Well here's the first offender

15:22:59
@ss:someonex.netSomeoneSerge (back on matrix) Ah isn't it nice to have a per-attribute build history 15:23:36
@ss:someonex.netSomeoneSerge (back on matrix) So odd that wasn't a thing with hercules 15:24:15



Room Version: 9