NixOS CUDA - Public Room Timeline - Matrix Static

	NixOS CUDA	336 Members
	CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda	64 Servers


Sender	Message	Time
20 Oct 2024
sielicki	In reply to @ss:someonex.net I think I'd vote pro propagation if we could say with some certainty, that that is the only way to guarantee correctness for users of `libcudart_static` and of cmake's `CUDA::cuda_driver` (just because supporting that scope sounds doable) * Not specific to NixOS, but just a rant from me: there's been a pretty large push around the CUDA world for everyone to move to static libcudart, largely because with CUDA 12 they introduced the minor version compatibility and "CUDA enhanced compatibility" guarantees, and there are a lot of public statements (on GitHub, etc.) from NVIDIA that suggest this is the safest way to distribute packages. All of this is really complicated and I don't fault projects for moving forward under this guidance, but I'm pretty confident that this does not cover all cases and you do still need to think about this stuff. One example of where you still need to think about it: a lot of code uses the runtime API to resolve the driver API (through `cudaGetDriverEntryPoint`). The API version of the returned function pointers is given by `min(linked_runtime_api_ver, actual_driver_version)`, and by nothing else. There's no automatic detection of another copy of libcudart in the same process that would allow for automatically matching the API version; it's based exclusively on what you linked against compared to the driver version in use. (There's no way to implement API-level alignment between libraries in the same process: they would need a way to invalidate function pointers they've already handed out when they suddenly encounter some new library in the process operating at a new version.) This is a really easy way to run afoul of the CUDA version mixing guidelines, and I feel like it's pretty underdiscussed and underdocumented. Those version mixing guidelines are still important: minor version compatibility does not save you, and it's not the case that if they all start with "12" you don't have to think about it anymore.	03:12:17
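To make the `min(linked_runtime_api_ver, actual_driver_version)` point concrete, here is a minimal Python sketch that only models the rule as sielicki describes it; it does not call CUDA. The `Library` type and the helper names are hypothetical, and the only outside fact assumed is CUDA's usual integer version encoding (1000 * major + 10 * minor, so 12.2 is 12020).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    """Hypothetical stand-in for a shared object with its own static libcudart."""
    name: str
    linked_runtime_version: int  # e.g. 12020 for CUDA 12.2


def resolved_api_version(lib, driver_version):
    # Model of the rule from the message: each library gets driver entry
    # points at min(what it linked against, what the driver offers).
    return min(lib.linked_runtime_version, driver_version)


def api_versions_in_process(libs, driver_version):
    # Each library resolves independently; nothing coordinates them.
    return {lib.name: resolved_api_version(lib, driver_version) for lib in libs}


def has_version_mix(libs, driver_version):
    # True when libraries in one process end up on different API versions.
    return len(set(api_versions_in_process(libs, driver_version).values())) > 1


if __name__ == "__main__":
    libs = [Library("liba", 12020), Library("libb", 12060)]
    # Driver at 12.4: liba stays at 12.2, libb is capped at 12.4.
    print(api_versions_in_process(libs, 12040))
    print(has_version_mix(libs, 12040))
```

In this model, two libraries that "both start with 12" still end up on different API versions whenever the driver is newer than at least one of them, which is the mixing scenario the message warns about.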
sielicki	Don't get me started on pypi wheels, and the nuance between RPATH and RUNPATH, and so on	03:13:08
connor (burnt/out) (UTC-8)	In reply to @ss:someonex.net a footgun people keep firing, True autoAddDriverRunpath Yes and no. Yes because that'd definitely make one-off and our own contributions easier. No because once we start propagating it we lose the knowledge of which packages actually need to be patched. It still seems to me that most packages we don't have to patch because they call cudart and cudart is patchelfed. Maybe yes because I'm unsure what happens with libcudart_static. autoPatchelfHook I'd be rather strongly opposed to this one. Autopatchelf is a huge hammer, coarse and imprecise. It can actually erase correct runpaths from an originally correct binary. Let's reserve it for non Another important thing to consider is (here we go again) whether we want to keep both backendStdenv and the hook and which of these things should be propagating what My favorite functionality autoPatchelfHook has is that it will error on unresolved dependencies — I could live without the actual patching, I suppose, but I really like using it to check that all the libraries I need are in scope. Any ideas if such functionality already exists in Nixpkgs or would be a useful check?	07:30:53
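As a rough illustration of the "error on unresolved dependencies" check connor asks about, the sketch below scans the text output of `ldd` for entries marked `not found`. This is not an existing Nixpkgs hook and not how autoPatchelfHook is implemented; the function name and the sample output (including the placeholder store path) are made up for the example.

```python
def missing_libraries(ldd_output):
    """Return the library names that `ldd` reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Resolved and unresolved entries both look like "name => target".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


# Hypothetical ldd output for a binary whose driver library is not in scope.
SAMPLE_LDD_OUTPUT = """\
\tlinux-vdso.so.1 (0x00007ffc00000000)
\tlibcudart.so.12 => /nix/store/<hash>-example/lib/libcudart.so.12 (0x00007f0000000000)
\tlibcuda.so.1 => not found
"""

if __name__ == "__main__":
    unresolved = missing_libraries(SAMPLE_LDD_OUTPUT)
    if unresolved:
        raise SystemExit("unresolved dependencies: " + ", ".join(unresolved))
```

A check like this only reports; it leaves runpaths untouched, which matches the "I could live without the actual patching" part of the question. Note that for CUDA packages `libcuda.so.1` is typically expected to be missing at build time, so a real check would need an allowlist.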
