
NixOS CUDA

283 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



25 Feb 2023
@ss:someonex.netSomeoneSerge (back on matrix) * Appending to postFixup doesn't seem to do the trick 11:47:47
@ss:someonex.netSomeoneSerge (back on matrix) * ~~Is there a reason we use glob in auto-patchelf.py? It seems it skips hidden files, including files renamed by wrapProgram~~ Nope, it doesn't skip anything, so idk why the renamed file doesn't get patched 11:47:57
@ss:someonex.netSomeoneSerge (back on matrix)

RE: -allow-unsupported=compiler

Ok, I tried building faiss with that flag on, getting some gcc errors: /nix/store/9pgq84sf921xh97gjj2wh7a7clrcrh4m-gcc-12.2.0/include/c++/12.2.0/bits/random.h(104): error: expected a declaration

11:54:36
@ss:someonex.netSomeoneSerge (back on matrix)I wouldn't go any further into that; I think we should just default to building downstream packages with the gcc version dictated by nvcc 11:56:02
@ss:someonex.netSomeoneSerge (back on matrix) This is a bit of a pickle because downstream expressions expect stdenv as an argument and whenever we set cudaSupport = true we should override it 11:58:21
@ss:someonex.netSomeoneSerge (back on matrix)

So, the ugly and straightforward version could look like this:

{ config
, stdenv
, cudaSupport ? config.cudaSupport or false
, cudaPackages
, ...
}:


(if cudaSupport then cudaPackages.stdenv else stdenv).mkDerivation { ... }
12:02:22
@ss:someonex.netSomeoneSerge (back on matrix)And I'm pretty sure nobody in nixpkgs would want to do that just because of cuda 12:03:19
@ss:someonex.netSomeoneSerge (back on matrix)
In reply to @mcwitt:matrix.org
Just to close the loop, the fix in my case was to set cmakeFlags = [ "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc" ] (and eventually found many examples of this in nixpkgs). Thanks for pointing me in the right direction!
Hm, I should try that. For whatever reason I see that CUDA_HOST_COMPILER is set, not CMAKE_CUDA_HOST_COMPILER
12:06:03
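The fix quoted above could be sketched in a derivation roughly like this (the `cudaPackages.cudatoolkit.cc` attribute path is taken from the quoted message; the package itself is hypothetical):

```nix
# Sketch: point CMake's CUDA host compiler at the toolkit's matched gcc
{ stdenv, cmake, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-app"; # hypothetical package
  version = "0.1";
  # src = ...;
  nativeBuildInputs = [ cmake ];
  cmakeFlags = [
    "-DCMAKE_CUDA_HOST_COMPILER=${cudaPackages.cudatoolkit.cc}/bin/cc"
  ];
}
```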
@ss:someonex.netSomeoneSerge (back on matrix) * btw, looking at what cmake reference says, it seems this variable should point to nvcc 🤔 12:16:01
@connorbaker:matrix.orgconnor (he/him) For what it's worth, some CMake projects don't respect those arguments (they will also print, at the end of the configure phase, which arguments were not used).
I've had better luck setting CUDAHOSTCXX as an environment variable because it's one CMake looks at specifically, unless the CMakeLists.txt is written in such a way to prohibit it: https://cmake.org/cmake/help/latest/envvar/CUDAHOSTCXX.html?highlight=cudahostcxx
12:20:32
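In a derivation, that environment variable could be set like this (a sketch; the compiler attribute path is an assumption):

```nix
# Sketch: CUDAHOSTCXX is consulted by CMake's CUDA language support,
# so it tends to work even when -DCMAKE_CUDA_HOST_COMPILER is ignored
env.CUDAHOSTCXX = "${cudaPackages.cudatoolkit.cc}/bin/c++";
```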
@ss:someonex.netSomeoneSerge (back on matrix) Yea, many projects haven't migrated to FindCUDAToolkit yet 12:21:20
@connorbaker:matrix.orgconnor (he/him) *

Three more things that popped into my head (sorry, I am actively consuming coffee):

  1. When we do override the C/C++ compilers by setting the CC/CXX environment variables, that doesn't change binutils, so (in my case) I still see ar/ranlib/ld and friends from gcc12 being used. Is that a problem? I don't know if version bumps to those tools can cause as much damage as libraries compiled with different language standards.
  2. If a package needs to link against libcuda.so specifically, what's the best way to make the linker aware of those stubs? I set LIBRARY_PATH and that seemed to do the trick: https://github.com/NixOS/nixpkgs/pull/218166/files#diff-ab3fb67b115c350953951c7c5aa868e8dd9694460710d2a99b845e7704ce0cf5R76
  3. Is it better to set environment variables as env.BLAG = "blarg" (I saw a tree-wide change about using env because of "structuredAttrs") in the derivation or to export them in the shell, in something like preConfigure?

EDIT: I should put these in my issue about docs for CUDA packaging...

12:27:10
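For question 3, the two styles being compared look roughly like this (a sketch; `BLAG` is the same placeholder name as above):

```nix
# Style 1: via the env attribute (compatible with structuredAttrs)
env.BLAG = "blarg";

# Style 2: exporting in a phase hook
preConfigure = ''
  export BLAG=blarg
'';
```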
@ss:someonex.netSomeoneSerge (back on matrix)

RE: linker

Idk but I think it's kind of the point of the separate linkage phase that we can have some flexibility in mix-n-matching languages and compilers? I'm rn waiting for a build of faiss where I set CUDAHOSTCXX (as you suggested) but don't override stdenv (which means that gcc12 would still be used for .cpp/.cc files). I expect that it's going to succeed

RE: libcuda.so
I don't really know any uses we have for the stubs, we usually want apps to load libcuda.so from /run/opengl-driver/lib. So we add that to the runpaths and if we use autopatchelf we tell it to ignore the missing libcuda.so

12:36:36
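The pattern described (runpath pointing at /run/opengl-driver/lib plus telling autopatchelf to ignore the missing libcuda.so) might be sketched like this; it assumes `autoPatchelfHook` and the `addOpenGLRunpath` hook from nixpkgs:

```nix
# Sketch: defer libcuda.so to the driver at runtime instead of linking stubs
nativeBuildInputs = [ autoPatchelfHook addOpenGLRunpath ];

# libcuda.so only exists on the target machine (/run/opengl-driver/lib)
autoPatchelfIgnoreMissingDeps = true;

postFixup = ''
  addOpenGLRunpath $out/lib/*.so
'';
```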
@ss:someonex.netSomeoneSerge (back on matrix)The build succeeded and it works at least as far as python import 12:41:21
@ss:someonex.netSomeoneSerge (back on matrix) *

Runpath looks kind of weird (it has both gcc11 and gcc12 lib/):

❯ patchelf --print-rpath /nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib/python3.10/site-packages/faiss/_swigfaiss.so
/run/opengl-driver/lib:/nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib64:/nix/store/2gs5jpwka7604pi9wqab4bp2hsxjkzjx-faiss-1.7.2/lib:/nix/store/713pzpgy2yhmnh3vs8cfdpv4j8pmqsmm-cudatoolkit-11.7.0/lib64/stubs:/nix/store/ps7an26cirhh0xy1wrlc2icvfhrd39cj-gcc-11.3.0-lib/lib:/nix/store/713pzpgy2yhmnh3vs8cfdpv4j8pmqsmm-cudatoolkit-11.7.0/lib:/nix/store/li1fg5xf6rzmpm7zlcnsymy8wfpmx0vj-cudatoolkit-11.7.0-lib/lib:/nix/store/m39wyb50jz4mqj22459nz397ascvmgiv-blas-3/lib:/nix/store/lqz6hmd86viw83f9qll2ip87jhb7p1ah-glibc-2.35-224/lib:/nix/store/k88zxp7cvd5gpharprhg9ah0vhz2asq7-gcc-12.2.0-lib/lib
12:52:40
@ss:someonex.netSomeoneSerge (back on matrix) #TODO there's no comment in versions.toml about where the gcc attribute comes from. I guess we take it from the release notes. Also there's no reason to hard-code gcc: looking at cudatoolkit12 release notes, they say they support clang as well 12:56:24
@ss:someonex.netSomeoneSerge (back on matrix)What would be nice is an easy way to add an "am I using a compatible host compiler?" assert to a downstream package12:57:56
@ss:someonex.netSomeoneSerge (back on matrix) Briefly considered just setting -ccbin in NVCC_PREPEND_FLAGS for non-cmake projects, but oh boi was I naive: https://github.com/NVIDIA/nccl/blob/f3d51667838f7542df8ea32ea4e144d812b3ed7c/makefiles/common.mk#L65 13:56:08
@ss:someonex.netSomeoneSerge (back on matrix)In addition, this generates a ton of "incompatible redefinition" warnings for the actual cmake projects. Asking NVIDIA support whether these can be suppressed: https://forums.developer.nvidia.com/t/setting-default-ccbin-or-suppressing-incompatible-redefinition-for-ccbin-warnings/244068 13:57:07
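The attempt being described would look something like this (hypothetical; the exact compiler attribute path is an assumption):

```nix
# Sketch: force nvcc's host compiler globally via prepend flags.
# As noted, Makefile-based projects like nccl pass their own -ccbin,
# and CMake projects then warn about the incompatible redefinition.
env.NVCC_PREPEND_FLAGS = "-ccbin ${cudaPackages.cudatoolkit.cc}/bin/cc";
```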
@connorbaker:matrix.orgconnor (he/him) Probably should have asked this earlier — Samuel Ainsworth what do I need to do to be considered for the CUDA maintainers team? 17:20:08
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net

RE: linker

Idk but I think it's kind of the point of the separate linkage phase that we can have some flexibility in mix-n-matching languages and compilers? I'm rn waiting for a build of faiss where I set CUDAHOSTCXX (as you suggested) but don't override stdenv (which means that gcc12 would still be used for .cpp/.cc files). I expect that it's going to succeed

RE: libcuda.so
I don't really know any uses we have for the stubs, we usually want apps to load libcuda.so from /run/opengl-driver/lib. So we add that to the runpaths and if we use autopatchelf we tell it to ignore the missing libcuda.so

If you have some time, could you take a crack at getting https://github.com/NixOS/nixpkgs/pull/218166 to build without the CUDA stub? Not sure if I’m missing something but it fails in the linking portion of the build phase, complaining about missing -lcuda if I don’t add the stub to the library path. Is there a better way to do that?

I don’t remember any of the other libraries failing like that, so I’m curious if there’s something weird going on.

17:22:58
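The stub workaround mentioned here amounts to something like the following (a sketch; whether `lib64/stubs` is the right subdirectory depends on how the toolkit is packaged):

```nix
# Sketch: make the driver API stubs visible to the linker at build time,
# so `-lcuda` resolves even though the real libcuda.so comes from the
# driver at runtime
env.LIBRARY_PATH = "${cudaPackages.cudatoolkit}/lib64/stubs";
```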
@connorbaker:matrix.orgconnor (he/him)
In reply to @ss:someonex.net
#TODO there's no comment in versions.toml about where the gcc attribute comes from. I guess we take it from the release notes. Also there's no reason to hard-code gcc: looking at cudatoolkit12 release notes, they say they support clang as well

My approach to things like that has been to record something like minGcc, maxGcc, minClang, and maxClang and use them to determine valid compilers. From there, export the allowed stdenvs.

That has the added benefit of allowing new toolchains as they’re added to Nixpkgs, so long as they fall in the supported range.

Although, at least personally speaking, I would also like a flag I could set globally to have NVCC just use whatever the stdenv does.

17:27:18
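A hypothetical sketch of that version-range check (the `minGcc`/`maxGcc` names come from the message; the bounds and the surrounding structure are illustrative):

```nix
# Sketch: derive an "is this host compiler supported?" predicate
# from recorded version bounds, and fail early on a bad toolchain
let
  bounds = { minGcc = "6.0"; maxGcc = "12.0"; };
  gccSupported = cc:
    lib.versionAtLeast cc.version bounds.minGcc
    && lib.versionOlder cc.version bounds.maxGcc;
in
  assert gccSupported stdenv.cc.cc;
  stdenv.mkDerivation { /* ... */ }
```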
26 Feb 2023
@ss:someonex.netSomeoneSerge (back on matrix)Didn't have enough compute yesterday because of nixpkgs-review 😅 16:28:31
@hexa:lossy.networkhexaWill work on getting python-updates on the road tonight 18:59:50


