!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

391 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
121 Servers

Sender | Message | Time
1 Mar 2025
@vcunat:matrix.org (Vladimír Čunát) | * As it is designed now, everyone will be waiting for this huge thing to finish. | 16:29:11
@emilazy:matrix.org (emily) | so… has hydraPlatforms = [ ]; been considered? | 16:30:05
@emilazy:matrix.org (emily) | how important is the composable_kernel thing? | 16:30:07
@emilazy:matrix.org (emily) | I know ROCm has a bunch of stuff and some of it nobody uses, right? | 16:30:19
@lt1379:matrix.org (Lun) | It's needed for MIOpen which is needed for pytorch for kernels for some ops, it will also be used directly by pytorch in 2.6 | 16:32:47
@lt1379:matrix.org (Lun) | https://github.com/NixOS/nixpkgs/blob/bda6bcbbacbd8f48e69b228b91b5541c03f7ab35/pkgs/development/rocm-modules/6/composable_kernel/default.nix If anyone happens to know cmake wizardry and can work out how to split this into multiple derivations and get a working end result without needing a massive patch to the CMakeFiles that might be the best option? | 16:36:38
@emilazy:matrix.org (emily) | okay that sounds pretty important then | 16:37:02
@emilazy:matrix.org (emily) | is the issue that there's only one install target and it depends on everything? | 16:37:28
@emilazy:matrix.org (emily) | I assume you can build individual libraries with ninja <target> at least | 16:37:43
@lt1379:matrix.org (Lun) | The kernels under tensor_operation_instance https://github.com/ROCm/composable_kernel/blob/1bf29478cdada3c7f56fbedc5542b275b0c107b3/library/src/tensor_operation_instance/gpu/CMakeLists.txt are approximately all the build time. | 16:40:02
@emilazy:matrix.org (emily) | what if you have derivations that do the same CMake setup dance, use ninjaTargets (or whatever that variable is called) to build one (set of) kernel(s), and the installPhase just directly copies it over? then the main derivation can aggregate all those sub-derivations and copy them into place in the build directory before running the rest of the build | 16:44:18
@emilazy:matrix.org (emily) | that way you can avoid patching the CMake build and still split it up | 16:44:28
@lt1379:matrix.org (Lun) | I'll give that a try | 16:46:28
@emilazy:matrix.org (emily) | the advantage is that it can be split across multiple builders and maybe not even need big-parallel | 16:47:11
@vcunat:matrix.org (Vladimír Čunát) | Especially if there are cases that someone needs to depend only on some of the kernels? | 16:48:40
@emilazy:matrix.org (emily) | I assume the install phase still expects all of it, I wasn't thinking the overall packaging would change since that sounds more involved | 16:49:16
@emilazy:matrix.org (emily) | I imagine PyTorch wants to be able to use whatever | 16:49:25
@vcunat:matrix.org (Vladimír Čunát) | So PyTorch will be a monster of several gigabytes, needing a big computer to build it? Anyway, I guess I'm too verbose for this channel. | 16:52:44
@emilazy:matrix.org (emily) | aren't all the ML libraries already that? | 16:59:12
@emilazy:matrix.org (emily) | (but I guess this is what rocmSupport or whatever is for?) | 16:59:26
2 Mar 2025
@vcunat:matrix.org (Vladimír Čunát) | Huh, I wonder what this is: Aborted: error: deleting cgroup '/sys/fs/cgroup/system.slice/nix-daemon.service/nix-build-uid-30014': Device or resource busy | 13:34:04
@hexa:lossy.network (hexa) | wild | 17:42:45
@raitobezarius:matrix.org (raitobezarius) | joining non leaf node cgroups is illegal in Linux | 17:44:12
@raitobezarius:matrix.org (raitobezarius) | this is probably the reason for the EBUSY | 17:44:18
@raitobezarius:matrix.org (raitobezarius) | is it running lix or nix? | 17:45:19
@hexa:lossy.network (hexa) | the builder is running lix | 17:45:32
@hexa:lossy.network (hexa) | still that version from when I set the node up | 17:45:41
@raitobezarius:matrix.org (raitobezarius) | hm | 17:47:13
@raitobezarius:matrix.org (raitobezarius) | are you running with the cgroup xp feature? | 17:47:18
@hexa:lossy.network (hexa) | yep | 17:47:24

Room Version: 6