!PbtOpdWBSRFbEZRLIf:numtide.com

Nix Community Projects

596 Members
Meta discussions related to https://nix-community.org. (For project-specific discussions, use GitHub issues or the project's own Matrix channel.) Need help from an admin? Open an issue on https://github.com/nix-community/infra/issues



Sender | Message | Time
25 Jan 2025
@ss:someonex.net SomeoneSerge (back on matrix): Yes, although I'm now reflecting on whether runtime tests (which we need the GPU instance for) should be the first priority, if the goal is to ensure stability of GPU-accelerated software.
Now that there's a community-funded hydra we can be sure to observe build failures retrospectively, and the public substituter helps with iteration times - it's a big difference compared to the previous state of affairs.
Adding the GPU instance, while it sounds very cool, would further increase visibility, but only for discovering breakages that have already happened.
Arguably, we'd get a much bigger impact by focusing on integration with the forge now.
There are two ways we could integrate with the forge: channel-blocking and OfBorg-like.
The former is mostly about working hours: we maintain our own channel, or we enhance the logic that advances the channels so that, instead of being triggered by one jobset in the official hydra, it could also take a report from an external source, namely the community hydra.
The latter, like runtime tests, is about resources: we'd need a CI that can react to on-push events in Nixpkgs to evaluate and build stuff with {cuda,rocm}Support, and we'd need a GitHub Action to fetch the report. We don't even need to block PRs; we just need a linting feature that would inform authors that their change also affects the GPU variants of Nixpkgs and that they could maybe ping the responsible team.
14:53:46
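A minimal sketch of what such an on-push CI step could run, assuming Nixpkgs' usual config argument is used to enable cudaSupport; the attribute names here are only illustrative, not an actual jobset definition:

    # Illustrative only: build a couple of GPU-affected attributes with
    # cudaSupport enabled, roughly what an OfBorg-like on-push job might do.
    nix-build ./nixpkgs -A python3Packages.torch -A opencv \
      --arg config '{ allowUnfree = true; cudaSupport = true; }' \
      --keep-going

A real job could derive the attribute list from the files touched by the push and publish the result as a non-blocking report on the PR, along the lines of the linting feature described above.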
@ss:someonex.net SomeoneSerge (back on matrix): I imagine that on-push jobs would be a lot more pressure, but as long as we can reasonably argue that this is the right platform to build this kind of CI, I think there are a few more sources we can attract to the OpenCollective. 14:59:23
@connorbaker:matrix.org connor (burnt/out) (UTC-8) joined the room. 15:32:46
@luxzi:matrix.org changed their display name from luxzi (they/she) to luxzi (she/they). 20:01:16
26 Jan 2025
@emilazy:matrix.org emily: Gaétan Lepage: load average on the 10-core Darwin builder is 21.58, trying to fix stdenv 21:17:55
@emilazy:matrix.org emily: Ihar Hrachyshka: are you running any builds? 21:21:48
@ihar.hrachyshka:matrix.org Ihar Hrachyshka: emily: I am investigating the llama-cpp-python issue, yes. Any issue with that? 21:23:01
@emilazy:matrix.org emily: the box's load average is ~2× the number of cores, so it's pretty overloaded right now. I was going to fire off builds to test fixes for Darwin stdenv for the next staging-next cycle (due in ~2 days), and to try to reproduce the treewide Rust FOD hash replacement and check for any Darwin-specific issues, but it's struggling even as it is. 21:24:09
@emilazy:matrix.org emily: might be good to lower cores/max-jobs, though I don't know if it's already overloaded from other builds 21:24:55
@ihar.hrachyshka:matrix.org Ihar Hrachyshka: emily: ok sorry, I'm obtuse :) you want me to hold off for now? That's fine, I can do something else. 21:24:59
@emilazy:matrix.org emily: could you just try with lower cores/max-jobs settings, maybe? 21:25:21
@emilazy:matrix.org emily: the default is 10 cores + 10 jobs, which can mean up to ~100 threads on a 10-core processor. 21:25:39
@emilazy:matrix.org emily: the stdenv build will take hours anyway, so it's not urgent, but I don't want to throw more jobs at an already overloaded machine. 21:26:08
@ihar.hrachyshka:matrix.org Ihar Hrachyshka: it's 1 job for me. How do I limit the cores? Is there a universal recipe, or do I patch cmake files? 21:26:25
@emilazy:matrix.org emily: it's --option cores and --option max-jobs in Nix (-j for short on the latter) 21:28:41
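For example, assuming the build is run from a Nixpkgs checkout and python3Packages.llama-cpp-python is the attribute being investigated (the attribute name is only illustrative), the flags emily mentions could be used like this to avoid saturating the shared 10-core builder:

    # Run at most one derivation at a time, each limited to 4 build threads,
    # instead of the default 10 jobs x 10 cores (up to ~100 threads).
    nix-build -A python3Packages.llama-cpp-python --option cores 4 --option max-jobs 1
    # equivalently, using the short form for max-jobs:
    nix-build -A python3Packages.llama-cpp-python --option cores 4 -j 1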


