NixOS CUDA | 328 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda | 64 Servers
| Sender | Message | Time |
|---|---|---|
| 1 Mar 2023 | | |
| and has failing tests | 03:15:28 | |
| I'ma sleep | 03:16:37 | |
| FRidh Someone S please tell me if https://github.com/NixOS/nixpkgs/pull/218929 is acceptable | 03:48:17 | |
In reply to @hexa:lossy.network
What the actual fuck, they won't just relax pinned versions? | 12:05:56 | |
In reply to @justbrowsing:matrix.org Word of warning, I haven't been working on this too long but here's what I've noticed. All, please feel free to correct if any of this is wrong.
I believe it's organic -- outside of https://github.com/ryantm/nixpkgs-update I'm not sure what exists in the way of automation for updating nixpkgs.
Given the switch to redistributables and Nix parsing JSON we grab from NVIDIA's website, maybe it'd be easier to automate it now! I could imagine a script which curls their index to see what the latest is, and adds a new copy of the JSON files we need if there's a newer version.
I think we find out about it either from breakages or reading the release notes/JSON. Kevin Mittman I like these questions -- can I add them to the docs tracking issue I have here? https://github.com/NixOS/nixpkgs/issues/217780 | 14:29:48 | |
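A minimal sketch of that update-check idea. The redist listing URL is real, but the directory-listing parsing, the helper names (`newer_than`, `latest_redist`), and the vendored version in the usage comment are illustrative assumptions:

```shell
# newer_than CANDIDATE KNOWN -> exit 0 iff CANDIDATE is a strictly newer version
newer_than() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# Scrape NVIDIA's redist directory listing for the newest redistrib_*.json
# (the HTML structure of the listing is a guess; adjust the grep as needed).
latest_redist() {
  curl -fsSL https://developer.download.nvidia.com/compute/cuda/redist/ \
    | grep -o 'redistrib_[0-9][0-9.]*\.json' \
    | grep -o '[0-9][0-9.]*[0-9]' \
    | sort -uV | tail -n1
}

# usage (requires network access; 12.0.1 stands in for whatever we vendor):
#   if newer_than "$(latest_redist)" 12.0.1; then echo "update available"; fi
```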
In reply to @ss:someonex.net With respect to specifying capabilities -- some packages (glares at magma) don't support every capability: https://github.com/NixOS/nixpkgs/pull/217410/files#diff-1e7812b78446dca0e64c4bb933e9255fca6f6539ec1ecd610edf1285a3fcbc56R55 Like, the hell? Skipping 8.6, 8.7, and 8.9? Packages like that make me think we need some way for the cuda configuration and derivation to interact to agree on a list of architectures to build for. Or maybe that's desirable? Maybe it would be more annoying if the package used the greatest common factor (say, 8.0 when 8.6 was requested) and no errors were thrown? Is that misleading the user? | 14:44:12 | |
| https://github.com/NixOS/nixpkgs/pull/218265 tf builds ✅ | 15:58:31 | |
In reply to @connorbaker:matrix.org I was thinking more along the lines of an issue filed here https://github.com/NVIDIA/build-system-archive-import-examples | 16:10:04 | |
In reply to @justbrowsing:matrix.org In principle, as long as the directory listing at https://developer.download.nvidia.com/compute/cuda/redist/ works, we could work out our own automation. That being said, a single machine-readable entrypoint (a stable location with a JSON file that lists URIs to all releases, or an RSS feed) would be more convenient | 16:14:00 | |
Been thinking that too. Not even automated PRs, but we could improve visibility by just making a GitHub workflow that runs on a cron schedule, checks the published JSONs, and publishes a status report on GitHub Pages | 16:16:31 | |
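A cron-driven status workflow along those lines might be sketched as follows. This is hypothetical: the check script path, job layout, and output location are made up; only the `schedule` trigger, `actions/checkout`, and the `peaceiris/actions-gh-pages` publishing action are existing GitHub Actions pieces:

```yaml
# Hypothetical workflow sketch; check-cuda-redist.sh does not exist in nixpkgs
name: cuda-redist-status
on:
  schedule:
    - cron: "0 6 * * *"   # once a day
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Compare vendored JSON against NVIDIA's published redistrib JSONs
        run: |
          mkdir -p public
          ./maintainers/scripts/check-cuda-redist.sh > public/index.md
      - name: Publish status report to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./public
```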
I thought it would be a good idea to run nixpkgs-review with cudaSupport = true, but that just opened a hellgate: https://gist.github.com/SomeoneSerge/6cc00b41964e43f725fc12046778532d#file-218265-log-L23 | 22:38:52 | |
adapta-gtk-theme, ..., chromium, ..., wine64, ... | 22:39:38 | |
| Mostly, outPaths react to cudaSupport through gst-plugins-bad, which depends on opencv | 22:40:23 | |
| 😢 | 22:40:55 | |
is that the nix equivalent of portage's emerge world after changing USE flags? | 22:51:03 | |
In reply to @justbrowsing:matrix.org I'm not familiar with Gentoo, but I think I can guess the answer | 23:44:51 | |
| Tbh, I'd be happy enough to build gst bad against cpu-only opencv, and then just fork the output and patchelf it to use opencv-cuda. Ad hoc as it is, it'd probably discard 90% of false-positives. Just need a good tool for "forking"... | 23:51:45 | |
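That ad-hoc "forking" could be sketched roughly like this, assuming patchelf is available. The helper names and the store paths in the usage comment are hypothetical; the rpath rewrite via patchelf is the actual mechanism being proposed:

```shell
# swap_store_path RPATH FROM TO: substitute every occurrence of FROM with TO
swap_store_path() {
  printf '%s\n' "${1//"$2"/$3}"
}

# fork_output SRC DST FROM_PATH TO_PATH: copy a built output tree and point
# its shared libraries at the cuda-enabled dependency instead of the cpu one
fork_output() {
  cp -r "$1" "$2" && chmod -R u+w "$2"
  find "$2" -type f -name '*.so*' | while read -r lib; do
    old=$(patchelf --print-rpath "$lib" 2>/dev/null) || continue
    new=$(swap_store_path "$old" "$3" "$4")
    [ "$old" = "$new" ] || patchelf --set-rpath "$new" "$lib"
  done
}

# usage (paths illustrative, not real store paths):
#   fork_output ./result ./result-cuda /nix/store/...-opencv /nix/store/...-opencv-cuda
```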
| 2 Mar 2023 | | |
| Ugh yeah changes which touch OpenCV are such a pain. I flipped when I saw my PR needed me to rebuild 700+ packages | 00:06:44 | |
In reply to @FRidh:matrix.orgFrom a quick look, couldn't tell if this was implemented after all. Do security updates result in massive rebuilds today, or are they shipped using some of these tricks? | 14:53:02 | |
In reply to @connorbaker:matrix.org One more thing I'm afraid of: if we go with defaulting to very few capabilities (TM), I imagine the UX of a new user is basically: spend close to an hour on download, find out the stuff you downloaded doesn't work with your graphics card | 18:40:23 | |
| Someone on Discourse mentioned the other day that unpacking the cudatoolkit's runfile took them about 40 minutes (we should get rid of it already) | 18:40:56 | |
| Download size is another reason for single-capability default, although it's laughable when we use the run-file | 18:45:13 | |
Still. Tensorflow built for 8.6 only is 554.24MiB versus 918.59MiB for "all supported caps" | 18:46:55 | |
| (laughable because this doesn't account for runtime dependencies, which are 9GiB for 8.6-only and 11.15GiB for [3.5, ..., 8.6]) | 18:47:35 | |
Oh my god, pytorch has migrated to FindCUDAToolkit.cmake!!!!!!!!!!! | 19:09:28 | |