!zghijEASpYQWYFzriI:nixos.org

Hydra

342 Members
99 Servers

Sender | Message | Time
11 Dec 2024
@dminca:matrix.org | left the room. | 14:18:55
12 Dec 2024
@hexa:lossy.network (hexa) | gcroots prevent garbage collection | 02:23:18
@hexa:lossy.network (hexa) | they come from the number of evaluations kept by the various jobsets | 02:23:33
@hexa:lossy.network (hexa) | gcroot because they are the root of a closure that references one or many store paths | 02:24:13
@rhelmot:matrix.org (rhelmot) | well... yes, but my question is that the garbage collection is happening anyway | 02:24:16
@hexa:lossy.network (hexa) | how many evals do you keep for your jobset? | 02:24:59
@hexa:lossy.network (hexa) | and does a newer eval maybe replace the older gcroots? | 02:25:12
@rhelmot:matrix.org (rhelmot) | there have been no new evals | 02:38:47
@rhelmot:matrix.org (rhelmot) | I have the setting at 3 rn | 02:39:14
@rhelmot:matrix.org (rhelmot) | the question I originally stated maybe should have been "is it valid to have a gcroot which is a normal file and not a symlink" | 02:40:47
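On the regular-file question above: as far as I understand Nix's root scanning (an assumption worth checking against the Nix source for your version), both kinds of entry under the gcroots directory are accepted. A symlink is a root for its target, and a regular file is a root for the store path that has the same base name as the file. A minimal Python sketch of that lookup rule follows; `root_target` is a hypothetical helper, and validity checks and indirect roots are left out.

```python
import os

STORE_DIR = "/nix/store"  # default store location


def root_target(entry, store_dir=STORE_DIR):
    """Return the store path that a gcroots entry would keep alive, or None."""
    if os.path.islink(entry):
        # Symlink root: the link target is the rooted path.
        target = os.readlink(entry)
        return target if target.startswith(store_dir + "/") else None
    if os.path.isfile(entry):
        # Regular-file root (assumed rule): the file name is the base name
        # of the store path being kept alive.
        return os.path.join(store_dir, os.path.basename(entry))
    return None
```

If that reading is right, a plain file in the Hydra gcroots directory is a valid root, so unexpected collection would have to come from something else, such as the file being removed or the named path not being valid.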
@noisersup:matrix.org (noisersup) | changed their profile picture. | 20:04:35
13 Dec 2024
@ctheune:matrix.flyingcircus.io (Christian Theune) | hexa: quick update from our end regarding hydra improvements we've planned/queued up (and which mostly ma27 will implement). there's a pr currently waiting to be merged to support globbing in named constituents that also adds logging for memory usage per job. we're currently switching our hydra to zstd compression and will then switch to introducing a hard memory limit. after that i think we'll take a look at moving compression to the workers. I've had one idea to look into making the compression an option on the jobset to allow more gradual changes if this is touched again in the future, but it seems like that might be complicated bordering fragile. and then we'll also take a look at some corner cases where stuck jobs need to be manually cancelled or even killed on the builder itself. as things are currently preparing for the holidays, it looks like most of that will likely happen in the new year. | 06:20:00
@vcunat:matrix.org (vcunat) | Sounds nice. At hydra.nixos.org we seem to have now avoided the compression bottleneck by brute force (48-core EPYC + hyperthreading). | 07:28:47
@ctheune:matrix.flyingcircus.io (Christian Theune) | Yeah, our immediate measurement was a reduction of (disk / channel) image compression time going down from 5 minutes to 7s, so that seems a big win. | 08:03:44
@ctheune:matrix.flyingcircus.io (Christian Theune) | Nevertheless I'm trying to keep an eye on things that happen in a blocking fashion in the queue runner. | 08:04:28
@ctheune:matrix.flyingcircus.io (Christian Theune) | because those will all be bottlenecks for scaling | 08:04:49
@ctheune:matrix.flyingcircus.io (Christian Theune) | i'm guessing s3 uploads being in a similar spot | 08:05:05
@ctheune:matrix.flyingcircus.io (Christian Theune) | if s3 uploads are also in the queue runner blocking things, then I'm wondering whether the uploads could also happen from the workers as long as hydra provides the signature. | 08:06:18
@ctheune:matrix.flyingcircus.io (Christian Theune) | from a security perspective i understand that we want to keep the signing key on the master | 08:06:29
@ctheune:matrix.flyingcircus.io (Christian Theune) | s3 upload credentials then aren't really /that/ sensitive compared to that we have to trust the content that the builders generate anyway. | 08:06:49
@vcunat:matrix.org (vcunat) | Signing itself is cheap, if you provide the hash to sign. The signer doesn't even need the whole NAR. (in principle) | 08:07:15
@ctheune:matrix.flyingcircus.io (Christian Theune) | ah, interesting. | 08:07:28
@ctheune:matrix.flyingcircus.io (Christian Theune) | that would mean we wouldn't even have to transfer the files to the master for that reason. | 08:07:43
@ctheune:matrix.flyingcircus.io (Christian Theune) | and the builder already has the closure and could upload | 08:07:53
@ctheune:matrix.flyingcircus.io (Christian Theune) | i'll keep that in mind when we take a look at moving the compression around | 08:08:04
@vcunat:matrix.org (vcunat) | Yes, that does sound like good architecture. | 08:08:17
@ctheune:matrix.flyingcircus.io (Christian Theune) | not sure whether it's good. it seems better than what it is now. 😉 | 08:08:37
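
The point about signing only needing a hash can be sketched as follows. To my knowledge a binary-cache signature is an Ed25519 signature over a short fingerprint string built from the store path, the NAR hash, the NAR size and the references, so the signer never sees the NAR bytes. The field layout below is my reading of Nix's format and should be treated as an assumption; `narinfo_fingerprint` and all example values are hypothetical, and the Ed25519 step itself is omitted because it needs a third-party library.

```python
def narinfo_fingerprint(store_path, nar_hash, nar_size, references):
    """Build the small string that would be signed instead of the NAR itself.

    Assumed layout:
    "1;<store path>;<NAR hash>;<NAR size>;<comma-separated references>"
    """
    refs = ",".join(sorted(references))
    return f"1;{store_path};{nar_hash};{nar_size};{refs}"


# Hypothetical metadata a builder could send to the master for signing.
fingerprint = narinfo_fingerprint(
    store_path="/nix/store/aaaa-example-1.0",
    nar_hash="sha256:0000",
    nar_size=4096,
    references=["/nix/store/bbbb-dep-2.0", "/nix/store/aaaa-example-1.0"],
)
print(fingerprint)
```

Under that assumption the master would sign only this string and hand back the signature, while the builder, which already holds the closure, compresses and uploads the NAR itself.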
