
NixOS CUDA

310 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda



19 May 2024
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) It's... tricky. Consider running nixpkgs-review for a CUDA PR as an example. A number of the packages are super small and can be built in parallel. But some of them are massive beasts that should never be built in parallel (OpenCV + JAX + PyTorch = cry). Nix doesn't provide a way to allocate cores per build based on system load, or anything similar. All we have to configure on the builder are max-jobs and cores. It's partly why I thought scaling out was the solution -- have a lot of very fast machines which each build one derivation at a time, because there's no way to schedule whether they're going to be told to build some small Python wrapper or some massive package. 14:41:15
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) I suppose another way around that is to not mark one of the builders with big-parallel, and to set cores = 0 and max-jobs = auto so it can handle as many jobs as it wants in parallel, so long as they're known to be small. Then one of the other builders would have the big-parallel system feature and have cores = 0 and max-jobs = 1, so it takes the big builds, and only has to build one at a time. 14:42:55
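The two-builder split described above could be sketched in a NixOS configuration roughly like this (hostnames and job counts are placeholders, not from the conversation; the `nix.buildMachines` options themselves are standard):

```nix
{
  nix.distributedBuilds = true;
  nix.buildMachines = [
    {
      # Handles many small derivations in parallel;
      # deliberately NOT marked big-parallel.
      hostName = "small-builder.example";
      system = "x86_64-linux";
      maxJobs = 8;
      supportedFeatures = [ "kvm" ];
    }
    {
      # Takes the massive builds (OpenCV, magma, PyTorch, ...),
      # one at a time, with all cores available to that one build.
      hostName = "big-builder.example";
      system = "x86_64-linux";
      maxJobs = 1;
      supportedFeatures = [ "big-parallel" "kvm" ];
    }
  ];
}
```

On each builder itself, `cores = 0` in nix.conf then lets whatever build lands there use every available core.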
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)Eh I don't know about hardware :(14:43:19
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)I will say though -- I thought AMD's x3D chips would provide a performance boost to compilation workloads, but that was not the case. So if you go for HEDT instead of professional-grade stuff, I think the 7950x would perform better than the 7950x3D.14:44:12
@glepage:matrix.orgGaétan LepageThat's really interesting!14:45:46
@glepage:matrix.orgGaétan Lepage cores = 0 means "automatic" ? 14:45:55
@glepage:matrix.orgGaétan Lepage Right now, I use one remote machine on which I ssh to code (has the nixpkgs clone).
It is also where I run nixpkgs-review from, so it is in charge of the eval.
Then, it uses another builder to perform the actual builds.
14:48:01
@glepage:matrix.orgGaétan LepageI don't develop directly from my laptop, because evaluations can themselves be quite heavy.14:48:23
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Yes, cores = 0 is automatic. Weird that they didn't use cores = auto like they did with max-jobs. 14:50:57
@glepage:matrix.orgGaétan LepageOk14:52:00
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Oh yeah tell me about it -- part of the reason I switched to 96GB of RAM was because nixpkgs-review kept filling up my ZRAM just during evaluation. Although, I did learn that I get a compression ratio of about 5:1 when I set ZRAM to use ZSTD! 14:52:04
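The ZRAM-with-ZSTD setup mentioned above maps onto the standard NixOS `zramSwap` options; a minimal sketch (the percentage is illustrative, not from the conversation):

```nix
{
  zramSwap = {
    enable = true;
    algorithm = "zstd";   # zstd is what yielded the ~5:1 compression ratio here
    memoryPercent = 50;   # how much RAM the zram device may claim
  };
}
```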
@glepage:matrix.orgGaétan LepageOh wow14:52:52
@glepage:matrix.orgGaétan LepageThe price difference between 7950x and 7960x is quite massive...14:56:49
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)The 7950x is a consumer-grade desktop part, the 7960x is part of AMD's HEDT offerings IIRC, so they charge a premium for it15:08:30
@glepage:matrix.orgGaétan LepageYes, quite a premium15:09:01
@ss:someonex.netSomeoneSerge (matrix works sometimes)Well it was meant as an epsilon=10 approximation xDD Point being, it's weeks of running the CI, rather than, say, years?15:11:37
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)

aidalgol: running nix-cuda-test I see it on my nvidia-smi

$ nvidia-smi
Sun May 19 15:11:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0 Off |                  Off |
| 45%   56C    P2            347W /  500W |    8187MiB /  24564MiB |     96%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3656630      C   ...y88kh-python3-3.11.9/bin/python3.11       8180MiB |
+-----------------------------------------------------------------------------------------+
15:11:45
@ss:someonex.netSomeoneSerge (matrix works sometimes)Yessss absolutely outrageous15:12:33
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) The hbv3 absolutely chugs through the first part of the magma-cuda-static build, which involves building all the C++ objects (the first 2745 of 3430 objects). However, it seems there aren't as many CUDA objects (or their dependencies prevent as many from being built in parallel as the C++ objects), and they take a long time to build, so instructions per cycle wins over number of cores. Look at all my cores! So few are being used :( 15:51:41
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)Screenshot 2024-05-19 at 11.46.48 AM.png (attached)
15:51:50
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)Oh my god15:57:36
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
real	17m29.002s
user	0m2.368s
sys	0m2.890s
15:57:42
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)Okay so I guess the higher clock speed combined with the limited parallelism when building the CUDA objects results in it being only 2m faster than my i9-13900K15:58:43
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)

Also:

error: derivation '/nix/store/krfxsgln7gispk9lnfpiav36wja2sg9x-magma-2.7.2.drv' may not be deterministic: output '/nix/store/gmwhmzv4ppjmrwzicdww0r1nfzzhnm34-magma-2.7.2' differs
15:59:02
@ss:someonex.netSomeoneSerge (matrix works sometimes)Oh nice. Can you save a diffoscope before it's GCed?15:59:30
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)Sure! How do I do that?16:03:16
@ss:someonex.netSomeoneSerge (matrix works sometimes) I'm not sure if NIx does it without the --rebuild/--check option, but there should be another path beside /nix/store/gmwh...-magma-2.7.2. Something with a suffix (maybe .check) 16:08:42
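The workflow being described here could look roughly like this (attribute name abbreviated to `magma` for illustration): `--check` rebuilds an already-built derivation and compares outputs, and `--keep-failed` keeps the differing result around under a `.check` suffix instead of deleting it.

```shell
# Rebuild the derivation and compare against the existing output.
# On a mismatch, Nix reports "may not be deterministic"; with
# --keep-failed the second build's output survives as <out>.check.
nix-build --check --keep-failed -A magma
```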
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
$ ls -1 /nix/store/*-magma-*
/nix/store/3qk2k6g7wpidmy0rs8gilqkmy14821ns-magma-2.7.2.tar.gz.drv
/nix/store/6482b0xigkghwkx5fl97y85xqclcga96-magma-2.7.2-test.lock
/nix/store/gmwhmzv4ppjmrwzicdww0r1nfzzhnm34-magma-2.7.2.lock
/nix/store/icmm2apcmxxl4zvx5k75ya8aj3n72ifm-magma-2.7.2.tar.gz
/nix/store/krfxsgln7gispk9lnfpiav36wja2sg9x-magma-2.7.2.drv

/nix/store/gmwhmzv4ppjmrwzicdww0r1nfzzhnm34-magma-2.7.2:
include
lib
16:09:42
@ss:someonex.netSomeoneSerge (matrix works sometimes) You nix run nixpkgs#diffoscope -- /nix/store/gm...-magma-2.7.2. /nix/store/...-magma-2.7.2.check. There's a flag to export e.g. an html 16:09:47
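Spelled out, the comparison step would be something like the following (store paths abbreviated as in the message above; the `.check` path only exists if the check build's output was kept). diffoscope's `--html` flag writes the report to a file:

```shell
# Diff the original output against the .check rebuild,
# exporting an HTML report for later inspection.
nix run nixpkgs#diffoscope -- \
  --html magma-diff.html \
  /nix/store/gmwh...-magma-2.7.2 \
  /nix/store/gmwh...-magma-2.7.2.check
```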
@ss:someonex.netSomeoneSerge (matrix works sometimes)Damn. I guess it threw it away then:)16:10:24


