| 1 Mar 2025 |
emily | okay that sounds pretty important then | 16:37:02 |
emily | is the issue that there's only one install target and it depends on everything? | 16:37:28 |
emily | I assume you can build individual libraries with ninja <target> at least | 16:37:43 |
Lun | The kernels under tensor_operation_instance (https://github.com/ROCm/composable_kernel/blob/1bf29478cdada3c7f56fbedc5542b275b0c107b3/library/src/tensor_operation_instance/gpu/CMakeLists.txt) account for approximately all of the build time. | 16:40:02 |
emily | what if you have derivations that do the same CMake setup dance, use ninjaTargets (or whatever that variable is called) to build one (set of) kernel(s), and the installPhase just directly copies it over? then the main derivation can aggregate all those sub-derivations and copy them into place in the build directory before running the rest of the build | 16:44:18 |
emily | that way you can avoid patching the CMake build and still split it up | 16:44:28 |
Lun | I'll give that a try | 16:46:28 |
emily | the advantage is that it can be split across multiple builders and maybe not even need big-parallel | 16:47:11 |
Vladimír Čunát | Especially if there are cases that someone needs to depend only on some of the kernels? | 16:48:40 |
emily | I assume the install phase still expects all of it, I wasn't thinking the overall packaging would change since that sounds more involved | 16:49:16 |
emily | I imagine PyTorch wants to be able to use whatever | 16:49:25 |
Vladimír Čunát | So PyTorch will be a monster of several gigabytes, needing a big computer to build it? Anyway, I guess I'm too verbose for this channel. | 16:52:44 |
emily | aren't all the ML libraries already that? | 16:59:12 |
emily | (but I guess this is what rocmSupport or whatever is for?) | 16:59:26 |
| 2 Mar 2025 |
Vladimír Čunát | Huh, I wonder what this is:
Aborted: error: deleting cgroup '/sys/fs/cgroup/system.slice/nix-daemon.service/nix-build-uid-30014': Device or resource busy
| 13:34:04 |
hexa | wild | 17:42:45 |
raitobezarius | joining non leaf node cgroups is illegal in Linux | 17:44:12 |
raitobezarius | this is probably the reason for the EBUSY | 17:44:18 |
raitobezarius | is it running lix or nix? | 17:45:19 |
hexa | the builder is running lix | 17:45:32 |
hexa | still that version from when I set the node up | 17:45:41 |
raitobezarius | hm | 17:47:13 |
raitobezarius | are you running with the cgroup xp feature? | 17:47:18 |
hexa | yep | 17:47:24 |
hexa | wait, xp? | 17:47:35 |
raitobezarius | it's experimental, yes | 17:47:41 |
hexa | ok | 17:47:59 |
raitobezarius | #if __linux__
    experimentalFeatureSettings.require(Xp::Cgroups);
    auto cgroupFS = getCgroupFS();
    if (!cgroupFS)
        throw Error("cannot determine the cgroups file system");
    auto ourCgroups = getCgroups("/proc/self/cgroup");
    auto ourCgroup = ourCgroups[""];
| 17:48:04 |
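For readers puzzled by the `ourCgroups[""]` lookup in the fragment above, here is a minimal sketch of what such a lookup could rest on. It assumes the documented `/proc/<pid>/cgroup` line format `hierarchy-ID:controller-list:path`, where the cgroup v2 unified hierarchy has an empty controller list. The names `parseCgroupFile` and `parseCgroupString` are hypothetical helpers written for illustration; this is not the Lix or Nix `getCgroups` implementation.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not the Lix/Nix code): parse text in the
// /proc/<pid>/cgroup format, "hierarchy-ID:controller-list:path",
// into a map from controller list to cgroup path.
std::map<std::string, std::string> parseCgroupFile(std::istream & in)
{
    std::map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {
        auto first = line.find(':');
        if (first == std::string::npos) continue; // skip malformed lines
        auto second = line.find(':', first + 1);
        if (second == std::string::npos) continue; // skip malformed lines
        // The controller list is empty for the cgroup v2 unified hierarchy,
        // so that entry ends up under the key "".
        std::string controllers = line.substr(first + 1, second - first - 1);
        // Everything after the second colon is the path.
        std::string path = line.substr(second + 1);
        result[controllers] = path;
    }
    return result;
}

// Convenience wrapper for parsing an in-memory string.
std::map<std::string, std::string> parseCgroupString(const std::string & text)
{
    std::istringstream in(text);
    return parseCgroupFile(in);
}
```

On a cgroup v2 system a line such as `0::/system.slice/nix-daemon.service` would therefore be stored under the empty key, which is consistent with the snippet reading its own cgroup from `[""]`.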
hexa | ah, experimental … yeah, sure | 17:48:23 |
hexa | nix-command and flakes have destroyed my perception of experimental features | 17:48:42 |