| 25 Feb 2023 |
SomeoneSerge (matrix works sometimes) | In reply to @mcwitt:matrix.org Just to close the loop, the fix in my case was to set cmakeFlags = [ "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc" ] (and eventually found many examples of this in nixpkgs). Thanks for pointing me in the right direction! Hm, I should try that. For whatever reason I see that CUDA_HOST_COMPILER is set, not CMAKE_CUDA_HOST_COMPILER | 12:06:03 |
SomeoneSerge (matrix works sometimes) | * btw, looking at what cmake reference says, it seems this variable should point to nvcc 🤔 | 12:16:01 |
connor (he/him) | For what it's worth, some CMake projects don't respect those arguments (they will also print, at the end of the configure phase, which arguments were not used). I've had better luck setting CUDAHOSTCXX as an environment variable because it's one CMake looks at specifically, unless the CMakeLists.txt is written in such a way to prohibit it: https://cmake.org/cmake/help/latest/envvar/CUDAHOSTCXX.html?highlight=cudahostcxx | 12:20:32 |
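[Editor's note: a hedged sketch of the environment-variable approach in a derivation, assuming, as in the `cmakeFlags` example above, that `cudaPackages.cudatoolkit.cc` is the nvcc-compatible host compiler:]

```nix
# Sketch only: point CMake's CUDA language support at an nvcc-compatible
# host compiler via the CUDAHOSTCXX environment variable rather than
# via -DCMAKE_CUDA_HOST_COMPILER in cmakeFlags.
stdenv.mkDerivation {
  # ...
  env.CUDAHOSTCXX = "${cudaPackages.cudatoolkit.cc}/bin/c++";
}
```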
SomeoneSerge (matrix works sometimes) | Yea, many projects haven't migrated to FindCUDAToolkit yet | 12:21:20 |
connor (he/him) | Three more things that popped into my head (sorry, I am actively consuming coffee):
- When we do override the C/C++ compilers by setting the CC/CXX environment variables, that doesn't change binutils, so (in my case) I still see ar/ranlib/ld and friends from gcc12 being used. Is that a problem? I don't know if version bumps to those tools can cause as much damage as libraries compiled with different language standards.
- If a package needs to link against
libcuda.so specifically, what's the best way to make the linker aware of those stubs? I set LIBRARY_PATH and that seemed to do the trick: https://github.com/NixOS/nixpkgs/pull/218166/files#diff-ab3fb67b115c350953951c7c5aa868e8dd9694460710d2a99b845e7704ce0cf5R76
- Is it better to set environment variables as
env.BLAG = "blarg" (I saw a tree-wide change about using env because of "structuredAttrs") in the derivation or to export them in the shell, in something like preConfigure?
EDIT: I should put these in my issue about docs for CUDA packaging...
| 12:27:10 |
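[Editor's note: a sketch contrasting the two ways of setting environment variables asked about above; the particular variables and paths are illustrative:]

```nix
# Sketch of both styles; values are illustrative, not prescriptive.
stdenv.mkDerivation {
  # Derivation-level: env.* must be a flat string and is validated,
  # which is why it plays well with structuredAttrs.
  env.CUDAHOSTCXX = "${cudaPackages.cudatoolkit.cc}/bin/c++";

  # Shell-level: only exported once the builder reaches preConfigure,
  # e.g. to make the linker aware of the libcuda.so stubs.
  preConfigure = ''
    export LIBRARY_PATH=${cudaPackages.cudatoolkit}/lib64/stubs''${LIBRARY_PATH:+:$LIBRARY_PATH}
  '';
}
```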
SomeoneSerge (matrix works sometimes) | RE: linker
Idk but I think it's kind of the point of the separate linkage phase that we can have some flexibility in mix-n-matching languages and compilers? I'm rn waiting for a build of faiss where I set CUDAHOSTCXX (as you suggested) but don't override stdenv (which means that gcc12 would still be used for .cpp/.cc files). I expect that it's going to succeed
RE: libcuda.so I don't really know any uses we have for the stubs, we usually want apps to load libcuda.so from /run/opengl-driver/lib. So we add that to the runpaths and if we use autopatchelf we tell it to ignore the missing libcuda.so
| 12:36:36 |
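[Editor's note: a hedged sketch of the runpath-plus-autopatchelf pattern described above; the `$out/lib/*.so` glob is illustrative:]

```nix
# Sketch: skip the stubs at runtime and load libcuda.so from the driver.
stdenv.mkDerivation {
  nativeBuildInputs = [ autoPatchelfHook ];
  # libcuda.so comes from the host driver, not the store, so tell
  # autoPatchelf not to fail on the missing dependency...
  autoPatchelfIgnoreMissingDeps = true;
  # ...and make sure the driver's library path is searched at runtime.
  postFixup = ''
    patchelf --add-rpath /run/opengl-driver/lib $out/lib/*.so
  '';
}
```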
SomeoneSerge (matrix works sometimes) | The build succeeded and it works at least as far as python import | 12:41:21 |
SomeoneSerge (matrix works sometimes) | Runpath looks kind of weird (it has both gcc11 and gcc12 lib/):
❯ patchelf --print-rpath /nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib/python3.10/site-packages/faiss/_swigfaiss.so
/run/opengl-driver/lib:/nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib64:/nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib:/nix/store/713pzpgy2yhmnh3vs8cfdpv4j8pmqsmm-cudatoolkit-11.7.0/lib64/stubs:/nix/store/ps7an26cirhh0xy1wrlc2icvfhrd39cj-gcc-11.3.0-lib/lib:/nix/store/713pzpgy2yhmnh3vs8cfdpv4j8pmqsmm-cudatoolkit-11.7.0/lib:/nix/store/li1fg5xf6rzmpm7zlcnsymy8wfpmx0vj-cudatoolkit-11.7.0-lib/lib:/nix/store/m39wyb50jz4mqj22459nz397ascvmgiv-blas-3/lib:/nix/store/lqz6hmd86viw83f9qll2ip87jhb7p1ah-glibc-2.35-224/lib:/nix/store/k88zxp7cvd5gpharprhg9ah0vhz2asq7-gcc-12.2.0-lib/lib
| 12:52:40 |
SomeoneSerge (matrix works sometimes) | #TODO there's no comment in versions.toml about where the gcc attribute comes from. I guess we take it from the release notes. Also there's no reason to hard-code gcc: looking at cudatoolkit12 release notes, they say they support clang as well | 12:56:24 |
SomeoneSerge (matrix works sometimes) | What would be nice is an easy way to add an "am I using a compatible host compiler?" assert to a downstream package | 12:57:56 |
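[Editor's note: a hypothetical sketch of what such an assert could look like; `minGccVersion`/`maxGccVersion` do not exist in `cudaPackages` and are named here only to show where the metadata could live:]

```nix
{ lib, stdenv, cudaPackages, ... }:

# Hypothetical: fail evaluation early if the effective host compiler
# falls outside the range the toolkit release supports.
assert lib.versionAtLeast stdenv.cc.version cudaPackages.minGccVersion;
assert lib.versionAtLeast cudaPackages.maxGccVersion stdenv.cc.version;

stdenv.mkDerivation {
  # ...
}
```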
SomeoneSerge (matrix works sometimes) | Briefly considered just setting -ccbin in NVCC_PREPEND_FLAGS for non-cmake projects, but oh boi was I naive: https://github.com/NVIDIA/nccl/blob/f3d51667838f7542df8ea32ea4e144d812b3ed7c/makefiles/common.mk#L65 | 13:56:08 |
SomeoneSerge (matrix works sometimes) | In addition, this generates a ton of "incompatible redefinition" warnings for the actual cmake projects. Asking the support if these can be suppressed: https://forums.developer.nvidia.com/t/setting-default-ccbin-or-suppressing-incompatible-redefinition-for-ccbin-warnings/244068 | 13:57:07 |
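[Editor's note: a sketch of the `NVCC_PREPEND_FLAGS` approach being discussed; nvcc (11.5+) reads this variable and prepends the flags to every invocation, which is exactly why projects that pass their own `-ccbin`, as nccl's makefile does, either override it or trigger the redefinition warnings:]

```nix
# Sketch: make nvcc default to the intended host compiler. Projects that
# pass -ccbin themselves will emit "incompatible redefinition for option
# 'compiler-bindir'" warnings (or clobber this entirely).
env.NVCC_PREPEND_FLAGS = "-ccbin ${cudaPackages.cudatoolkit.cc}/bin";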
connor (he/him) | Probably should have asked this earlier — Samuel Ainsworth what do I need to do to be considered for the CUDA maintainers team? | 17:20:08 |
connor (he/him) | In reply to @ss:someonex.net
RE: linker
Idk but I think it's kind of the point of the separate linkage phase that we can have some flexibility in mix-n-matching languages and compilers? I'm rn waiting for a build of faiss where I set CUDAHOSTCXX (as you suggested) but don't override stdenv (which means that gcc12 would still be used for .cpp/.cc files). I expect that it's going to succeed
RE: libcuda.so I don't really know any uses we have for the stubs, we usually want apps to load libcuda.so from /run/opengl-driver/lib. So we add that to the runpaths and if we use autopatchelf we tell it to ignore the missing libcuda.so
If you have some time, could you take a crack at getting https://github.com/NixOS/nixpkgs/pull/218166 to build without the CUDA stub? Not sure if I’m missing something but it fails in the linking portion of the build phase, complaining about missing -lcuda if I don’t add the stub to the library path. Is there a better way to do that?
I don’t remember any of the other libraries failing like that, so I’m curious if there’s something weird going on. | 17:22:58 |
connor (he/him) | In reply to @ss:someonex.net #TODO there's no comment in versions.toml about where the gcc attribute comes from. I guess we take it from the release notes. Also there's no reason to hard-code gcc: looking at cudatoolkit12 release notes, they say they support clang as well My approach to things like that has been to record something like minGcc, maxGcc, minClang, and maxClang and use them to work out which compilers are valid. From there, export the allowed stdenvs.
That has the added benefit of allowing new toolchains as they’re added to Nixpkgs so long as they fall in the supported range.
Although, at least personally speaking, I would also like a flag I could set globally to have NVCC just use whatever the stdenv does. | 17:27:18 |
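[Editor's note: a hypothetical sketch of the min/max range idea; none of these attribute names exist in nixpkgs and the version numbers are made up:]

```nix
# Hypothetical: record the supported host-compiler range per toolkit
# release and filter candidate stdenvs against it.
let
  range = { minGcc = "6.0"; maxGcc = "11.3"; };  # illustrative numbers
  isSupported = stdenv:
    lib.versionAtLeast stdenv.cc.version range.minGcc
    && lib.versionAtLeast range.maxGcc stdenv.cc.version;
in
lib.filter isSupported candidateStdenvs  # candidateStdenvs: hypothetical list
```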
| 26 Feb 2023 |
SomeoneSerge (matrix works sometimes) | Didn't have enough compute yesterday because of nixpkgs-review 😅 | 16:28:31 |
hexa | Will work on getting python-updates on the road tonight | 18:59:50 |
connor (he/him) | If a package requires a newer version of cudaPackages than the default, how should that be handled? For example, I’m packaging NVIDIA’s Transformer Engine and that needs CUDA 11.8+. It’s not enough to just pass the package the newer version, because every dependency the package has ALSO needs to use that newer version.
Any suggestions? | 19:28:08 |
SomeoneSerge (matrix works sometimes) |
It’s not enough to just pass the package the newer version, because every dependency the package has ALSO needs to use that newer version.
Hm. What happens if they use different cuda?
| 20:05:54 |
connor (he/him) | Oh there are checks in the derivation to make sure that doesn't happen -- for example, torch checks to make sure magma uses the same version. So I'd need to override and pass torch overridden with the newer cudaPackages and that torch derivation has to be overridden with a version of magma also using the new version of cudaPackages | 20:10:17 |
SomeoneSerge (matrix works sometimes) | You mean the assert !cudaSupport || magma.cudatoolkit == cudatoolkit line? | 20:20:12 |
connor (he/him) | yes | 20:48:16 |
SomeoneSerge (matrix works sometimes) | I wonder why these asserts are there in the first place 🤔 I'm easily convinced that the only reason we'd ever pass different cudatoolkits to magma and torch is by mistake, but I don't know why these specific asserts | 22:02:47 |
connor (he/him) | Or someone mistakenly overrides one but not the other. I ended up doing this in python-modules:
transformer-engine = callPackage ../development/python-modules/transformer-engine (
  let
    cudaPackages = pkgs.cudaPackages_11_8;
    magma = pkgs.magma.override { inherit cudaPackages; };
    torch = self.torch.override { inherit cudaPackages magma; };
  in
  {
    inherit cudaPackages torch;
  }
);
| 22:12:44 |
SomeoneSerge (matrix works sometimes) | Yes, but it's not like this isn't going to build? It's just that it's not what we probably wanted | 22:14:01 |