
Hydra

12 Dec 2024
@hexa:lossy.networkhexahow many evals do you keep for your jobset?02:24:59
@hexa:lossy.networkhexaand does a newer eval maybe replace the older gcroots?02:25:12
@rhelmot:matrix.orgrhelmotthere have been no new evals02:38:47
@rhelmot:matrix.orgrhelmotI have the setting at 3 rn02:39:14
@rhelmot:matrix.orgrhelmotthe question I originally stated maybe should have been "is it valid to have a gcroot which is a normal file and not a symlink"02:40:47
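A quick way to look for such entries is to list everything under a gcroots directory that is a regular file rather than a symlink. This is only a minimal sketch: the helper name is made up, the real directory would be /nix/var/nix/gcroots, and whether Nix actually honors a regular file as a root is exactly the open question here.

```python
import os
import tempfile


def find_non_symlink_roots(root_dir):
    """List entries under root_dir that are regular files rather than
    symlinks -- the kind of gcroot the question above is about."""
    bad = []
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                bad.append(path)
    return bad


if __name__ == "__main__":
    # Demo against a throwaway directory; on a real system one would point
    # this at /nix/var/nix/gcroots instead.
    with tempfile.TemporaryDirectory() as d:
        os.symlink("/nix/store/example", os.path.join(d, "ok-root"))
        open(os.path.join(d, "odd-root"), "w").close()
        print(find_non_symlink_roots(d))  # only the plain file shows up
```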
13 Dec 2024
@ctheune:matrix.flyingcircus.ioTheuni hexa: quick update from our end regarding hydra improvements we've planned/queued up (and which mostly ma27 will implement). there's a pr currently waiting to be merged to support globbing in named constituents that also adds logging for memory usage per job. we're currently switching our hydra to zstd compression and will then switch to introducing a hard memory limit. after that i think we'll take a look at moving compression to the workers. I've had one idea to look into making the compression an option on the jobset to allow more gradual changes if this is touched again in the future, but it seems like that might be complicated bordering fragile. and then we'll also take a look at some corner cases where stuck jobs need to be manually cancelled or even killed on the builder itself. as things are currently preparing for the holidays, it looks like most of that will likely happen in the new year. 06:20:00
@vcunat:matrix.orgvcunatSounds nice. At hydra.nixos.org we seem to have now avoided the compression bottleneck by brute force (48-core EPYC + hyperthreading).07:28:47
@ctheune:matrix.flyingcircus.ioTheuniYeah, our immediate measurement was (disk / channel) image compression time going down from 5 minutes to 7s, so that seems like a big win.08:03:44
@ctheune:matrix.flyingcircus.ioTheuniNevertheless I'm trying to keep an eye on things that happen in a blocking fashion in the queue runner.08:04:28
@ctheune:matrix.flyingcircus.ioTheunibecause those will all be bottlenecks for scaling08:04:49
@ctheune:matrix.flyingcircus.ioTheunii'm guessing s3 uploads are in a similar spot08:05:05
@ctheune:matrix.flyingcircus.ioTheuniif s3 uploads are also in the queue runner blocking things, then I'm wondering whether the uploads could also happen from the workers as long as hydra provides the signature. 08:06:18
@ctheune:matrix.flyingcircus.ioTheunifrom a security perspective i understand that we want to keep the signing key on the master08:06:29
@ctheune:matrix.flyingcircus.ioTheunis3 upload credentials then aren't really /that/ sensitive, given that we have to trust the content that the builders generate anyway.08:06:49
@vcunat:matrix.orgvcunatSigning itself is cheap, if you provide the hash to sign. The signer doesn't even need the whole NAR. (in principle)08:07:15
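To illustrate why the signer doesn't need the NAR: a Nix binary-cache signature covers a short fingerprint string assembled from store-path metadata (path, NAR hash, NAR size, references), and that metadata is all a builder would have to send to the signing host. A minimal sketch, with made-up example values:

```python
def nar_fingerprint(store_path, nar_hash, nar_size, references):
    """Build the fingerprint string a Nix binary-cache signature covers:
    "1;<storePath>;<narHash>;<narSize>;<ref1,ref2,...>".
    Only this metadata, not the NAR contents, has to reach the signing key."""
    return "1;{};{};{};{}".format(
        store_path, nar_hash, nar_size, ",".join(references))


# Illustrative values only -- not a real store path or hash:
fp = nar_fingerprint(
    "/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-hello-2.12",
    "sha256:1bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
    226560,
    ["/nix/store/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa-hello-2.12"],
)
print(fp)
```

The actual ed25519 signing of this string (with the key kept on the master) is a few microseconds of work, which is why it needn't be a bottleneck.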
@ctheune:matrix.flyingcircus.ioTheuniah, interesting. 08:07:28
@ctheune:matrix.flyingcircus.ioTheunithat would mean we wouldn't even have to transfer the files to the master for that reason.08:07:43
@ctheune:matrix.flyingcircus.ioTheuniand the builder already has the closure and could upload08:07:53
@ctheune:matrix.flyingcircus.ioTheunii'll keep that in mind when we take a look at moving the compression around08:08:04
@vcunat:matrix.orgvcunatYes, that does sound like good architecture.08:08:17
@ctheune:matrix.flyingcircus.ioTheuninot sure whether it's good. it seems better than what it is now. 😉08:08:37
@ctheune:matrix.flyingcircus.ioTheunibut yeah08:08:39
@vcunat:matrix.orgvcunatThough hydra.nixos.org is now blocked by loading jobs from DB. Probably the steps that check what's in S3 already. (it's overseas unfortunately so higher latency)08:08:54
@ctheune:matrix.flyingcircus.ioTheuniyeah i've read that. that part of the code/architecture i haven't looked at before and it's two steps further down the road on our map.08:10:14
@ctheune:matrix.flyingcircus.ioTheuni(our s3 is local and we have a much lower number of jobs anyway)08:10:40
@ctheune:matrix.flyingcircus.ioTheunibut yeah, happy to help in general, but need to be careful with my commitments ... 08:11:01
