NixOS CUDA | 317 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
| Message | Time |
|---|---|
| 19 May 2024 | |
| 7960x would be HEDT I guess | 14:18:08 | |
| Changing "torchvision>=0.15.0", to "torchvision>=0.15.0a0", in nix-cuda-test's pyproject.toml enables pre-releases for that requirement (https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/specifiers.py#L249-L270). So I guess I should submit a PR to torchvision to fix their version (it doesn't match their tag either). | 14:23:35 |
| oohhh, pre-releases | 14:24:13 | |
| my bad | 14:24:14 | |
| not sure if we should allow pre-releases | 14:25:06 | |
| it would probably remove confusion about the error | 14:25:14 | |
| I think it's safe to say it's upstream's fault -- their previous releases didn't have mismatched version.txt files. I made a PR: https://github.com/pytorch/vision/pull/8431 | 14:36:28 |
| So really, your hook helped me catch something upstream did <3 | 14:36:51 | |
| I don't know how feasible it is to add a warning about pre-releases to the hook, but that would have saved me from reading through packaging's codebase to figure out what was going on haha | 14:37:39 |
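The pre-release behavior discussed above can be demonstrated directly with the `packaging` library (a minimal sketch; the version numbers are illustrative, not the actual torchvision versions involved):

```python
from packaging.specifiers import SpecifierSet

# By default, a specifier like ">=0.15.0" excludes pre-release versions,
# so a package reporting a pre-release version (e.g. "0.19.0a0") fails to
# satisfy it even though it is numerically newer than the bound.
spec = SpecifierSet(">=0.15.0")
print("0.19.0a0" in spec)  # False: pre-releases rejected by default
print("0.16.0" in spec)    # True: final releases match normally

# Putting a pre-release segment in the bound itself opts the specifier
# in to pre-releases -- the behavior linked in packaging's specifiers.py.
spec_pre = SpecifierSet(">=0.15.0a0")
print("0.19.0a0" in spec_pre)  # True

# Pre-releases can also be allowed explicitly per check:
print(spec.contains("0.19.0a0", prereleases=True))  # True
```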
| It's... tricky. Consider running nixpkgs-review for a CUDA PR as an example. A number of the packages are super small and can be built in parallel. But some of them are massive beasts that should never be built in parallel (OpenCV + JAX + PyTorch = cry). Nix doesn't provide a way to allocate cores per build based on system load, or anything similar. All we have to control the builder are max-jobs and cores. It's partly why I thought scaling out was the solution -- have a lot of very fast machines which build one derivation at a time, because there's no way to schedule whether they're going to be told to build some small python wrapper or some massive package. | 14:41:15 |
| I suppose another way around that is to not mark one of the builders with big-parallel, and to set cores = 0 and max-jobs = auto so it can handle as many jobs as it wants in parallel, so long as they're known to be small. Then one of the other builders would have the big-parallel system feature and have cores = 0 and max-jobs = 1, so it takes the big builds, and only has to build one at a time. | 14:42:55 |
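The builder split sketched above could be expressed with the NixOS `nix.buildMachines` option roughly like this (a sketch only; the hostnames `builder-small` and `builder-big` and the job counts are hypothetical):

```nix
{
  nix.distributedBuilds = true;
  nix.buildMachines = [
    {
      # Hypothetical small-jobs builder: no big-parallel feature,
      # so derivations requiring big-parallel are never scheduled here.
      hostName = "builder-small";
      system = "x86_64-linux";
      maxJobs = 8;                # illustrative: many small jobs in parallel
      supportedFeatures = [ ];
    }
    {
      # Hypothetical big-build machine: advertises big-parallel and
      # builds one derivation at a time with all cores to itself.
      hostName = "builder-big";
      system = "x86_64-linux";
      maxJobs = 1;
      supportedFeatures = [ "big-parallel" ];
    }
  ];
  # On each builder itself, cores = 0 means "use all available cores":
  # nix.settings.cores = 0;
}
```

Derivations that set `requiredSystemFeatures = [ "big-parallel" ]` (as the heavy CUDA packages do) would then only ever be dispatched to the second machine.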
| Eh I don't know about hardware :( | 14:43:19 | |
| I will say though -- I thought AMD's x3D chips would provide a performance boost to compilation workloads, but that was not the case. So if you go for HEDT instead of professional-grade stuff, I think the 7950x would perform better than the 7950x3D. | 14:44:12 | |
| That's really interesting! | 14:45:46 |
| cores = 0 means "automatic"? | 14:45:55 |
| Right now, I use one remote machine that I ssh into to code (it has the nixpkgs clone). It is also where I run nixpkgs-review from, so it is in charge of the eval. Then, it uses another builder to perform the actual builds. | 14:48:01 |
| I don't develop directly from my laptop, because evaluations can themselves be quite heavy. | 14:48:23 |
| Yes, cores = 0 is automatic. Weird that they didn't use cores = auto like they did with max-jobs. | 14:50:57 |
| Ok | 14:52:00 | |
| Oh yeah tell me about it -- part of the reason I switched to 96GB of RAM was because nixpkgs-review kept filling up my ZRAM just during evaluation. Although, I did learn that I get a compression ratio of about 5:1 when I set ZRAM to use ZSTD! | 14:52:04 |
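For anyone wanting the same setup, ZRAM with zstd can be enabled declaratively on NixOS (a sketch; the memory percentage is illustrative):

```nix
{
  zramSwap = {
    enable = true;
    algorithm = "zstd";   # the compressor that gave the ~5:1 ratio mentioned above
    memoryPercent = 50;   # illustrative: fraction of RAM to dedicate to ZRAM
  };
}
```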
| Oh wow | 14:52:52 | |
| The price difference between 7950x and 7960x is quite massive... | 14:56:49 | |
| The 7950x is a consumer-grade desktop part, the 7960x is part of AMD's HEDT offerings IIRC, so they charge a premium for it | 15:08:30 | |
| Yes, quite a premium | 15:09:01 | |
| Well it was meant as an epsilon=10 approximation xDD Point being, it's weeks of running the CI, rather than, say, years? | 15:11:37 | |
| aidalgol: running | 15:11:45 |
| Yessss absolutely outrageous | 15:12:33 | |
| The hbv3 absolutely chugs through the first part of the magma-cuda-static build, which involves building all the C++ objects (the first 2745 of 3430 objects). However, it seems there aren't as many CUDA objects (or their dependencies prevent as many from being built in parallel as the C++ objects), and they take a long time to build, so instructions per cycle wins over number of cores. Look at all my cores! So few are being used :( | 15:51:41 |
| [Screenshot 2024-05-19 at 11.46.48 AM.png] | 15:51:50 |
| Oh my god | 15:57:36 | |