| 1 Mar 2025 |
hexa | we have epyc 9454p (48C/96T) and schedule 4 big-parallel jobs with 24 "cores" | 05:18:04 |
hexa | * we have epyc 9454p (48C/96T) and schedule up to 4 big-parallel jobs with 24 "cores" | 05:18:15 |
hexa | nix.settings = {
  cores = 24;
  max-jobs = 4;
  system-features = [ "big-parallel" ];
};
| 05:18:29 |
hexa |
[40 jobs | 9% 389/4212 @ 0.0/s | 2:46:42 | ETA 27:18:26 ]
| 05:26:06 |
hexa | I mean, these are Skylake Cores, so I'm not sure what that means 😄 | 05:28:22 |
hexa | https://www.intel.com/content/www/us/en/products/sku/123550/intel-xeon-silver-4114-processor-13-75m-cache-2-20-ghz/specifications.html | 05:28:41 |
hexa | I sure hope that package won't surprise me in python-updates one day | 05:29:14 |
Vladimír Čunát | If it's 4h with 128 threads, I don't think it will fit into the 10h default limit in our current infra config. (--cores 24 for big-parallel mentioned here; I didn't verify now) | 06:57:47 |
Vladimír Čunát | But that's easy to override per job. | 06:58:34 |
hexa | where is the default limit set? | 06:59:30 |
hexa | a meta default? | 06:59:49 |
hexa | I think we only configure a max-silent-time on the builders | 07:00:17 |
hexa | and a max-unsupported-time on the queue-runner | 07:00:23 |
Vladimír Čunát | .meta.timeout | 07:01:05 |
hexa | pkgs/applications/networking/browsers/chromium/browser.nix
121: timeout = 172800; # 48 hours (increased from the Hydra default of 10h)
| 07:01:34 |
hexa | 🤡 | 07:01:36 |
hexa | * pkgs/applications/networking/browsers/chromium/browser.nix
121: timeout = 172800; # 48 hours (increased from the Hydra default of 10h)
pkgs/development/tools/electron/common.nix
287: timeout = 172800; # 48 hours (increased from the Hydra default of 10h)
| 07:01:49 |
hexa | meta.timeout is not set by default, so … where is the default? 😄 | 07:02:35 |
Vladimír Čunát | I think it's in Hydra config. | 07:02:50 |
hexa | , timeout => getMeta($buildInfo->{meta}->{timeout}, 36000)
| 07:03:33 |
hexa | indeed, here we are | 07:03:35 |
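The Hydra line quoted above appears to read `meta.timeout` and fall back to 36000 seconds (10 hours) when a package does not set it. A rough Nix-side picture of that lookup, as a sketch only: `effectiveTimeout` is a hypothetical helper for illustration, not something Hydra or Nixpkgs provides.

```nix
# Hypothetical helper mirroring the fallback in the quoted Hydra line:
# use meta.timeout when the package sets it, otherwise 36000 seconds (10 hours).
let
  effectiveTimeout = drv: drv.meta.timeout or 36000;
in
{
  # No meta.timeout set, so the default applies.
  unset = effectiveTimeout { meta = { }; }; # 36000
  # Same value as the chromium and electron entries quoted above.
  raised = effectiveTimeout { meta.timeout = 172800; }; # 172800
}
```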
hexa | lun: I'm a bit afraid to ask, but there is supposed to be a migraphx python package, and one of the packages I maintain would want that to support rocm 🙈 | 07:09:15 |
hexa | supposedly this https://github.com/ROCm/AMDMIGraphX/blob/develop/src/py/migraphx_py.cpp | 07:13:00 |
Lun | Mention it on the big ROCm tracking issue | 16:04:24 |
Lun | definitely worse: https://gist.githubusercontent.com/LunNova/b1cf007f1af52b4dc353fd9925857b97/raw/63ea9ec1a500d5ef6ad4f2f0eac7a59b6db6e310/huge%2520composable_kernel%2520template%2520instantiation.txt | 16:07:29 |
Lun | The ~4h builds are nix cores config set to 128 on a 64c/128t epyc milan eng sample that's clocking down to <3GHz due to power limits, not sure what the relative speedup per core will be but probably not enough to overcome dropping to 24 build threads.
Does bumping meta.timeout to 20h to start with sound reasonable? | 16:13:51 |
emily | might make sense to do 48 and then scale down based on the actual time | 16:19:08 |
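For scale: if build time scaled linearly with thread count (an assumption, and the per-core speed difference between the two machines is unknown), 4h at 128 threads would be about 4 × 128 / 24 ≈ 21h at 24 threads, which is well past the 10h default. A minimal sketch of the 48h suggestion, written the same way as the chromium and electron entries quoted earlier; `somePackage` is a placeholder, not the real attribute name.

```nix
# Sketch only: `somePackage` stands in for whichever package needs the longer limit.
somePackage.overrideAttrs (old: {
  meta = (old.meta or { }) // {
    # 48 hours in seconds (172800), up from the Hydra default of 10h.
    # Scale this down once the actual build time on the builders is known.
    timeout = 48 * 60 * 60;
  };
})
```

Inside the package's own expression the same thing would simply be a `timeout = 172800;` line in its `meta` set.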