| 11 Jun 2024 |
SomeoneSerge (matrix works sometimes) | * the state of hardware.opengl.enable and the nvidia driver?.. | 21:29:30 |
connor (he/him) | What revision of nixpkgs are you on? master fails to build (go-stable-diffusion errors during CMake configure) | 21:50:56 |
teto | ha, nevermind, I do have libcuda.so and so on in /run/opengl-driver/lib (my first search must have ignored symlinks). I use local-ai from nixos-unstable, so that would be c7b821ba2e1e635ba5a76d299af62821cbcb09f3 | 21:58:35 |
connor (he/him) | Huh, not sure how that's working for you since I can't get it to build:
$ cat ~/.config/nixpkgs/config.nix
{
allowAliases = false;
allowBroken = false;
allowUnfree = true;
checkMeta = true;
cudaCapabilities = [ "7.5" ];
cudaSupport = true;
}
$ nix run --impure -L --show-trace --builders '' --max-jobs 1 github:nixos/nixpkgs/c7b821ba2e1e635ba5a76d299af62821cbcb09f3#local-ai
...
go-stable-diffusion> CMake Error at /nix/store/q1nssraba326p2kp6627hldd2bhg254c-cmake-3.29.2/share/cmake-3.29/Modules/FindCUDA.cmake:883 (message):
go-stable-diffusion> Specify CUDA_TOOLKIT_ROOT_DIR
go-stable-diffusion> Call Stack (most recent call first):
go-stable-diffusion> /nix/store/r008mflixfchlfscby4h0mjgqvz059pw-opencv-4.9.0/lib/cmake/opencv4/OpenCVConfig.cmake:86 (find_package)
go-stable-diffusion> /nix/store/r008mflixfchlfscby4h0mjgqvz059pw-opencv-4.9.0/lib/cmake/opencv4/OpenCVConfig.cmake:108 (find_host_package)
go-stable-diffusion> examples/CMakeLists.txt:17 (find_package)
| 22:08:21 |
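For reference, the legacy FindCUDA module aborts configuration like this when it cannot locate the toolkit. A minimal sketch of a workaround, assuming the failing derivation exposes cmakeFlags and that cudaPackages.cudatoolkit is the intended toolkit; this is not the actual local-ai/go-stable-diffusion fix:
# Hypothetical sketch only: point the legacy FindCUDA module at the toolkit.
cmakeFlags = [
  "-DCUDA_TOOLKIT_ROOT_DIR=${cudaPackages.cudatoolkit}"
];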
connor (he/him) | Oh! SomeoneSerge (UTC+3) I was rebuilding OpenCV4 with the changes to the setup hooks you mentioned earlier about the CMake flags being opt-in, and I noticed that switching --compiler-bindir to -ccbin was enough to get rid of the "incompatible redefinition" warnings we've been seeing with CMake: https://github.com/NixOS/nixpkgs/pull/306172/commits/7dc8d6d83a853f98a695e2b23aa8d33a50aff6df#diff-3692a7105fd90d95727cd2f794cdb4af2656be94af52d97485c9d4ded9107883L93-R72 | 22:23:06 |
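For context, -ccbin is nvcc's short form of --compiler-bindir, so the two invocations below are interchangeable (the host compiler path is illustrative):
# Equivalent nvcc invocations; -ccbin is the short form of --compiler-bindir.
$ nvcc --compiler-bindir /path/to/host/gcc -c kernel.cu -o kernel.o
$ nvcc -ccbin /path/to/host/gcc -c kernel.cu -o kernel.o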
trexd | Do you guys have any recommendations on setting up a cache for https://github.com/hasktorch/hasktorch-skeleton/pull/9 ? The Haskell part of the build already takes quite a long time, but throw CUDA in the mix and I think build times would quickly get out of hand. Either way, I've also never set up a cache via cachix or hercules-ci before, so I don't know what the limits on build times are either. | 23:31:32 |
| 12 Jun 2024 |
aidalgol | SomeoneSerge (UTC+3): Any thoughts? https://github.com/NixOS/nixpkgs/issues/319167 | 00:36:45 |
aidalgol | This strikes me as a rather odd thing to even be checking for as a user. | 00:39:26 |
connor (he/him) | In reply to @trexd:matrix.org Do you guys have any recommendations on setting up a cache for https://github.com/hasktorch/hasktorch-skeleton/pull/9 ? The Haskell part of the build already takes quite a long time, but throw CUDA in the mix and I think build times would quickly get out of hand. Either way, I've also never set up a cache via cachix or hercules-ci before, so I don't know what the limits on build times are either. Depends on how much space you need. Cachix gives 5 GB for free from what I remember. For a single project, that may be enough.
For anything larger than that, you’re in for a world of unpleasantness depending on storage requirements.
Even just hosting a binary cache via S3, you’re going to be paying for each API call (of which Nix will generate many, like an HTTP HEAD request for each NARINFO file, not to mention the actual retrieval of the data) as well as egress, which adds up very quickly. | 03:37:41 |
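As a rough sketch of what self-hosting implies, pointing Nix at an S3 binary cache is a small nix.conf change, and every substitution attempt then hits the bucket with the per-.narinfo requests described above (the bucket name and signing key below are placeholders):
# nix.conf sketch; example-cache and its key are placeholders
substituters = https://cache.nixos.org s3://example-cache?region=us-east-1
trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= example-cache:<public-key>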
connor (he/him) | The cheapest I've managed so far is a Hetzner instance with a 7950X3D and a 10 GbE NIC for maybe $150 a month. Download and upload speeds certainly aren't saturating the NIC, but until I rewrite enough of Attic to be able to run it fully serverless via Cloudflare Workers/R2/D1/KV, that's the best I'm going to get, I think. | 03:42:09 |
trexd | In reply to @connorbaker:matrix.org Depends on how much space you need. Cachix gives 5 GB for free from what I remember. For a single project, that may be enough.
For anything larger than that, you’re in for a world of unpleasantness depending on storage requirements.
Even just hosting a binary cache via S3, you’re going to be paying for each API call (of which Nix will generate many, like an HTTP HEAD request for each NARINFO file, not to mention the actual retrieval of the data) as well as egress, which adds up very quickly. I figured time would be the more important factor, rather than storage? 🤔 I don't imagine the package taking more than 5 GB. Is there a nix command that I can use to get the total size of a build? | 12:34:38 |
connor (he/him) | nix path-info -rsSh <flake attribute or store path> iirc | 12:36:40 |
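For reference, a sketch of that invocation against a flake attribute (the attribute is only an example): -r walks the whole runtime closure, -s prints each path's NAR size, -S its closure size, and -h makes both human-readable.
# prints NAR size and closure size for every path in the runtime closure
$ nix path-info -rsSh nixpkgs#hello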
trexd | 7.9G RIP 🫠 | 12:44:12 |
connor (he/him) | to be fair, IIRC that's the uncompressed size, and if there are paths cached in the main NixOS cache, they wouldn't count against you
not sure Domen allows specifying other upstream caches, though (avoiding caching some of the CUDA dependencies would probably be best) | 13:41:29 |
trexd | I guess I'll play around with it and see what's up. Is pinning to a successful build of nixpkgs-cuda-ci still the best way to get CUDA hits? | 13:43:59 |
connor (he/him) | I believe so, yes | 15:00:37 |
SomeoneSerge (matrix works sometimes) | In reply to @aidalgol:matrix.org SomeoneSerge (UTC+3): Any thoughts? https://github.com/NixOS/nixpkgs/issues/319167 I suppose we need some kind of a fixpoint 🤷
Btw, I just had a look at the mangohud derivation, and
- we're still doing inherit (linuxPackages.nvidia_x11) libXNVCtrl, which I think is bad form (referencing a concrete version of linuxPackages from nixpkgs)
- I still think it belongs in the top-level, maybe as an attrset, libXNVCtrlVersions
- probably a very bad idea, but the nvidia nixos module could add an overlay setting the respective default version of libXNVCtrl
- with all packages taking libXNVCtrl from the top-level, we'd ensure only one version is in use in any given closure (probably at the cost of rebuilding reverse dependencies)
- passing a python package (mako) rather than python3Packages
Honestly, not sure if this is worth the effort 🙃 | 16:24:04 |
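A minimal sketch of the overlay idea above, assuming a NixOS module wires libXNVCtrl from the configured driver package into the package set; none of this is an existing nixpkgs/NixOS interface, and the attribute access on the driver package is only assumed here:
# Hypothetical module sketch: pin one libXNVCtrl for the whole closure
# via an overlay derived from the configured NVIDIA driver package.
{ config, ... }: {
  nixpkgs.overlays = [
    (final: prev: {
      libXNVCtrl = config.hardware.nvidia.package.libXNVCtrl;
    })
  ];
}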
| 13 Jun 2024 |
| shekhinah removed their display name yaldabaoth. | 02:43:30 |
SomeoneSerge (matrix works sometimes) | Ehhh tfw /proc/sys/fs/file-max is 20 characters long but nix build fails with "too many open files" | 16:17:07 |
SomeoneSerge (matrix works sometimes) | $ sudo lsof | wc -l
1191414
That's not much, is it | 16:21:12 |
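For what it's worth, fs.file-max is the system-wide cap; a build failing with "too many open files" is usually hitting the per-process limit instead, which can be inspected separately:
$ cat /proc/sys/fs/file-max   # system-wide limit on open file handles
$ ulimit -Sn                  # per-process soft limit
$ ulimit -Hn                  # per-process hard limit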
SomeoneSerge (matrix works sometimes) | In reply to @gjvnq:matrix.org
Yeah, I had already figured it out, but the big issue is that I don't know what the "right" way to include the definition of the half type is.
To make matters worse, I've tried to compile AliceVision in a Docker container using the official compilation scripts, and yet the thing keeps failing. This means I can't even look at how the thing is supposed to compile.
I'm at a bit of a loss as to how to proceed, but I suspect that I'll have to either ask the original authors for help or carefully read the cmake compilation scripts to look for potential sources of the error.
Theoretically AliceVision has a nice CI pipeline, but I can't see their build history, so I don't even know how useful their CI scripts are. Could it be this https://github.com/AcademySoftwareFoundation/Imath/blob/2fc9d89ec52003350fcfd20f337bb3d0b870ff5a/src/Imath/half.h#L180-L182 | 16:25:14 |
Mir | In reply to @ss:someonex.net Could it be this https://github.com/AcademySoftwareFoundation/Imath/blob/2fc9d89ec52003350fcfd20f337bb3d0b870ff5a/src/Imath/half.h#L180-L182 Possibly, but I'm afraid of just patching the source code to force the inclusion of CUDA's half without first exhausting the config flags. I feel like something in CMake is misconfigured or buggy, and that I should patch CMakeLists.txt before patching the source code directly | 16:32:00 |
aidalgol | In reply to @ss:someonex.net
I suppose we need some kind of a fixpoint 🤷
Btw, I just had a look at the mangohud derivation, and
- we're still doing inherit (linuxPackages.nvidia_x11) libXNVCtrl, which I think is bad form (referencing a concrete version of linuxPackages from nixpkgs)
- I still think it belongs in the top-level, maybe as an attrset, libXNVCtrlVersions
- probably a very bad idea, but the nvidia nixos module could add an overlay setting the respective default version of libXNVCtrl
- with all packages taking libXNVCtrl from the top-level, we'd ensure only one version is in use in any given closure (probably at the cost of rebuilding reverse dependencies)
- passing a python package (mako) rather than python3Packages
Honestly, not sure if this is worth the effort 🙃 I did make an attempt at moving it to the top level, but decided to do it in a separate PR; I need to come back to that. I'm not entirely sure how best to approach it so that nvidia_x11 can override it. | 19:59:27 |
aidalgol | I'm not convinced that having a single fixed version of libXNVCtrl is worth the trouble, but I do want to move it to the top level. | 20:00:03 |
| 15 Jun 2024 |
| shekhinah set their display name to shekhinah. | 08:46:32 |
matthewcroughan | Did you guys know that python311Packages.tensorrt is broken because someone updated pkgs/development/cuda-modules/tensorrt/releases.nix without checking whether it broke any derivations? | 15:14:46 |
matthewcroughan | [astraluser@edward:~/Downloads/f/TensorRT-8.6.1.6]$ ls python/tensorrt-8.6.1-cp3*
tensorrt-8.6.1-cp310-none-linux_x86_64.whl  tensorrt-8.6.1-cp36-none-linux_x86_64.whl  tensorrt-8.6.1-cp38-none-linux_x86_64.whl
tensorrt-8.6.1-cp311-none-linux_x86_64.whl  tensorrt-8.6.1-cp37-none-linux_x86_64.whl  tensorrt-8.6.1-cp39-none-linux_x86_64.whl
| 15:15:11 |
matthewcroughan | python3.11-tensorrt> /nix/store/d3dzfy4amjl826fb8j00qp1d9887h7hm-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
error: builder for '/nix/store/8pw2fjq86vbkdd6s1bl6axfkhbnm18lr-python3.11-tensorrt-8.6.1.6.drv' failed with exit code 2;
last 10 log lines:
> Using pythonImportsCheckPhase
> Sourcing python-namespaces-hook
> Sourcing python-catch-conflicts-hook.sh
> Sourcing auto-add-driver-runpath-hook
> Using autoAddDriverRunpath
> Sourcing fix-elf-files.sh
> Running phase: unpackPhase
> tar: TensorRT-8.6.1.6/python/tensorrt-8.6.1.6-cp311-none-linux_x86_64.whl: Not found in archive
> tar: Exiting with failure status due to previous errors
> /nix/store/d3dzfy4amjl826fb8j00qp1d9887h7hm-stdenv-linux/setup: line 131: pop_var_context: head of shell_variables not a function context
For full logs, run 'nix log /nix/store/8pw2fjq86vbkdd6s1bl6axfkhbnm18lr-python3.11-tensorrt-8.6.1.6.drv'.
| 15:15:34 |
matthewcroughan | they removed the .6 from the release | 15:15:47 |
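That matches the logs above: the tarball directory is versioned 8.6.1.6 while the wheels inside it are named 8.6.1, so any expression that builds the wheel path from the full four-component version string will miss. A hypothetical illustration of the mismatch, not the actual cuda-modules/tensorrt/releases.nix schema:
# Hypothetical illustration only, not the real releases.nix layout.
let
  version = "8.6.1.6";                                  # TensorRT-8.6.1.6/
  wheelVersion = lib.versions.majorMinorPatch version;  # "8.6.1"
in "TensorRT-${version}/python/tensorrt-${wheelVersion}-cp311-none-linux_x86_64.whl"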