| 12 May 2024 |
connor (he/him) | Also because without deduplication on the backend, it'd be an explosion in space usage since most of the NARs we build and rebuild are very similar. | 23:54:59 |
| 13 May 2024 |
SomeoneSerge (matrix works sometimes) | In reply to @justbrowsing:matrix.org
is it too complicated maintaining the closure with packages for each component? i.e. would a single input simplify?
Wdym by "single input" and by the "closure with packages for a component"? | 00:09:36 |
SomeoneSerge (matrix works sometimes) | In reply to @connorbaker:matrix.org
Urhgh Spent a bit trying to figure out why PyTorch was marked as broken on my PR. It was because Magma was building against the latest version of CUDA, but PyTorch was building against 12.1 (the latest officially supported release). Because it's a nightmare to try to ensure everything across multiple package sets is built with the same version of CUDA packages, I've relaxed that condition.
SomeoneSerge (Way down Hadestown) what are your thoughts on having something akin to python312Packages or haskell.packages.<GHC version> where we have a copy of pkgs available, but everything within that named package set is called with a single version of cudaPackages? That would avoid the need to thread different versions through dependencies via passthru, and should remove the possibility of mixing and matching CUDA versions between dependencies.
Hmmhm, pkgs inside python312Packages are the same pkgs as in python311Packages | 00:10:55 |
SomeoneSerge (matrix works sometimes) | We should just hold pytorch and tensorflow at the same cuda version, maybe we can slow down on bumping the default cudaPackages | 00:13:19 |
SomeoneSerge (matrix works sometimes) | This really isn't a big problem is it? | 00:13:37 |
connor (he/him) | Ah that's a shame:
nix-repl> legacyPackages.x86_64-linux.python312Packages.pkgs.python3Packages.python.version
"3.11.9"
I would have expected it to be the same as
nix-repl> legacyPackages.x86_64-linux.python312Packages.python.version
"3.12.3"
| 00:13:31 |
connor (he/him) | Tensorflow and PyTorch already use different CUDA versions, and users are going to want later versions to take advantage of the improvements they bring to Hopper chips | 00:14:11 |
connor (he/him) | (or at least, I know I do) | 00:14:16 |
SomeoneSerge (matrix works sometimes) | Yessss users who want to take advantage can maintain overrides I suppose | 00:14:42 |
SomeoneSerge (matrix works sometimes) | This issue is limited to Python packages and is due to two conditions: one is that Python programs tend to be written in a way that they depend on everything in the good world, and the second is that Python's import system doesn't support any isolation (cf. the reactions to costrouc's threads on discuss-python) | 00:18:16 |
SomeoneSerge (matrix works sometimes) | Outside pythonPackages using different versions of cuda is kind of fine because it gets loaded in different processes | 00:18:59 |
connor (he/him) | I mean, we also wouldn't want to build OpenCV/Magma with CUDA 12.4 and PyTorch with CUDA 12.1, right? | 00:19:19 |
SomeoneSerge (matrix works sometimes) | We build them with the default? | 00:19:35 |
connor (he/him) | Yes we do | 00:19:40 |
SomeoneSerge (matrix works sometimes) | So, we have one [mostly-]consistent package set - the top-level | 00:19:56 |
SomeoneSerge (matrix works sometimes) | Where everything uses the same default, unless it comes from a hostile upstream (google) | 00:20:28 |
connor (he/him) | In reply to @ss:someonex.net
Outside pythonPackages using different versions of cuda is kind of fine because it gets loaded in different processes
I'm not too familiar with how symbol resolution works; are you saying it's okay to have different versions of the CUDA libraries in the dependencies because they're loaded into different processes? | 00:20:50 |
SomeoneSerge (matrix works sometimes) | We have means of spawning new consistent package sets with a different default: nixpkgsFun/overlays | 00:21:15 |
SomeoneSerge (matrix works sometimes) | In reply to @connorbaker:matrix.org
I'm not too familiar with how symbol resolution works; are you saying it's okay to have different versions of the CUDA libraries in the dependencies because they're loaded into different processes?
Yes, sure | 00:22:08 |
SomeoneSerge (matrix works sometimes) | I'm no expert, but it might even be possible to have a single process load one version of, say, cudart, and then have its transitive dependency load another, assuming the loading is performed with some sort of isolation, e.g. dlmopen | 00:25:50 |
connor (he/him) | I was concerned that the mix-and-match versions of CUDA would raise the same issues we saw with glibc -- but that's not the case? | 00:27:44 |
SomeoneSerge (matrix works sometimes) | I didn't test cuda specifically, but generally having a single process depend on two different versions of the same library will present a problem | 00:29:28 |
SomeoneSerge (matrix works sometimes) | Even if mixing pytorch and tensorflow built against different versions of cuda doesn't always fail, it's accidental (maybe the ABI of these cuda libraries happens to be stable enough) | 00:31:33 |
connor (he/him) | I was worried more about their dependencies and the projects, given Tensorflow and PyTorch can't co-exist in the same install because they rely on different versions of shared dependencies (I think they use different versions of gRPC) | 00:32:55 |
SomeoneSerge (matrix works sometimes) | I see. This is still about whether things are being loaded in a single process. I suspect that protobuf in torch and tf is again... special because of python:) | 00:36:02 |
SomeoneSerge (matrix works sometimes) | IIRC tensorflow propagates its protobuf as a python package | 00:36:25 |
SomeoneSerge (matrix works sometimes) | For a pure native project that produces an ELF this wouldn't be a problem: libtorch can link its own protobuf via RUNPATH, and so can the native parts of tensorflow. You can throw them into the same closure and they'd never conflict, unless you actually loaded both from a single executable. But the python package just shows up in sys.path... | 00:38:53 |
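A minimal sketch of the `sys.path` behaviour described in the message above. It assumes two hypothetical directories, `dep_a` and `dep_b`, that each ship a different version of a made-up module called `fakeproto`; the names and version strings are invented for illustration and are not the real protobuf package. Whichever directory comes first on `sys.path` wins for the whole process, and every later import gets the same cached module object:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_fake_dep(root: Path, dirname: str, version: str) -> Path:
    """Create a directory holding a hypothetical module named `fakeproto`."""
    dep_dir = root / dirname
    dep_dir.mkdir()
    (dep_dir / "fakeproto.py").write_text(f"__version__ = {version!r}\n")
    return dep_dir


def import_with_both_on_path() -> tuple[str, bool]:
    """Put two versions on sys.path and report which one the process sees."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        dep_a = make_fake_dep(root, "dep_a", "3.0")
        dep_b = make_fake_dep(root, "dep_b", "4.0")
        saved_path = list(sys.path)
        # Both "packages" are visible, like two propagated Python deps.
        sys.path[:0] = [str(dep_a), str(dep_b)]
        importlib.invalidate_caches()
        try:
            first = importlib.import_module("fakeproto")
            # The second import is served from sys.modules, keyed by name only.
            second = importlib.import_module("fakeproto")
            return first.__version__, first is second
        finally:
            # Restore interpreter state so the sketch has no side effects.
            sys.path[:] = saved_path
            sys.modules.pop("fakeproto", None)
```

Calling `import_with_both_on_path()` returns the version from `dep_a` and `True` for the identity check: the copy in `dep_b` is never loaded, no matter which consumer asked for it.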
connor (he/him) | Ah okay | 00:39:25 |
SomeoneSerge (matrix works sometimes) | * All in all, we just need python to discard its import system in favour of something at least as flexible as ld.so. Which is not to say the latter can't be improved | 00:41:14 |
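For contrast, a rough sketch of what per-consumer loading could look like using only the standard `importlib.util` API. Nobody in this discussion proposed it. The module name `fakeproto`, the directories, and the private aliases are all hypothetical. Each copy is loaded from an explicit file path under its own alias, loosely like a library resolving its own dependency via RUNPATH:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_private_copy(alias: str, source: Path) -> ModuleType:
    """Load `source` as a module object without registering it in sys.modules."""
    spec = importlib.util.spec_from_file_location(alias, source)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {source}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_two_versions() -> tuple[str, str, bool]:
    """Load two versions of the hypothetical `fakeproto` side by side."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sources = []
        for dirname, version in (("dep_a", "3.0"), ("dep_b", "4.0")):
            dep_dir = root / dirname
            dep_dir.mkdir()
            source = dep_dir / "fakeproto.py"
            source.write_text(f"__version__ = {version!r}\n")
            sources.append(source)
        old = load_private_copy("_private_fakeproto_a", sources[0])
        new = load_private_copy("_private_fakeproto_b", sources[1])
        # Neither copy claims the global name `fakeproto`.
        return old.__version__, new.__version__, "fakeproto" in sys.modules
```

This only works for a leaf module that the caller loads by hand. Any plain `import fakeproto` inside a third-party package still goes through the single global namespace, which is the limitation being complained about.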