| 19 Jun 2026 |
Mic92 | I think I was able to observe this now. Apparently, if a multipart-upload complete request returns an error, one has to check whether the object was created successfully anyway. Our retry code was just retrying with already invalidated part IDs. | 16:29:27 |
Mic92 | New version is deployed | 16:29:39 |
Mic92 | Looks like hydra's disk no longer gets trashed with NARs... good so far. However, NAR streaming is still quite heavy and blocks the queue-runner async code a bit, so I should move it out of the event loop. My first attempt will be switching to ls files, since the vast majority of uncached NARs won't have any hydra build products -> then no decompression is required. | 17:03:03 |
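A sketch of the "ls files" shortcut, assuming these are the JSON NAR listings (`.ls`) a Nix binary cache writes when `write-nar-listing` is enabled, shaped like `{"version": 1, "root": {...}}` with nested `entries`, and assuming build products are declared in `nix-support/hydra-build-products` inside the output. If that file is absent from the listing, the NAR itself never needs to be decompressed.

```python
import json


def has_build_products(listing_json: str) -> bool:
    """Return True if a NAR listing contains nix-support/hydra-build-products.

    Only the small JSON listing is inspected, so no NAR decompression is needed.
    """
    listing = json.loads(listing_json)
    node = listing.get("root", {})
    for name in ("nix-support", "hydra-build-products"):
        # Each step down the path must go through a directory entry.
        if node.get("type") != "directory":
            return False
        node = node.get("entries", {}).get(name)
        if node is None:
            return False
    return node.get("type") == "regular"


example = json.dumps({
    "version": 1,
    "root": {
        "type": "directory",
        "entries": {
            "bin": {"type": "directory", "entries": {}},
            "nix-support": {
                "type": "directory",
                "entries": {
                    "hydra-build-products": {"type": "regular", "size": 42, "narOffset": 400},
                },
            },
        },
    },
})
print(has_build_products(example))
```

Only NARs for which this returns True would need the expensive decompress-and-scan path.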
Mic92 | However, CPU usage looks okay, so I will not deploy for now and will let it get through the backlog instead. Meanwhile I am going to test it a bit on staging hydra. | 17:28:30 |
Mic92 | Okay. Signing out for today. I will have some time tomorrow for smaller fixes but not on Sunday. | 17:33:32 |
hexa (signing key rotation when) | Thanks! | 17:34:12 |
hexa (signing key rotation when) | I think we're not quite there yet with uploads to s3 🤔 | 20:33:59 |
John Ericson | long term I hope to make us not use the queue runner store entirely | 21:22:02 |
John Ericson | everything should be binary cache or database | 21:22:12 |
John Ericson | and evaluations should be distributed to builders just like regular builds | 21:22:21 |
| morwdan joined the room. | 23:16:29 |
| 20 Jun 2026 |
Mic92 | @hexa:lossy.network: when you see a lot of receiving uploads, also check the load on the machine | 04:19:38 |
Mic92 | Because if all CPUs are working, I don't think we can make compression go much faster | 04:20:17 |
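A quick way to do the load check suggested above: compare the 1-minute load average with the CPU count. The 0.9 threshold is an arbitrary assumption, and on Linux the load average also counts tasks blocked on I/O, so treat this as a rough signal only.

```python
import os


def cpu_saturated(load_1min: float, cpus: int, threshold: float = 0.9) -> bool:
    """True if the 1-minute load average is close to the number of CPUs.

    In that state compression is CPU-bound, so adding more concurrent
    uploads is unlikely to make it go faster.
    """
    return load_1min >= threshold * cpus


if __name__ == "__main__":
    load1, _, _ = os.getloadavg()  # Unix only
    print(cpu_saturated(load1, os.cpu_count() or 1))
```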
Mic92 | Yesterday I reduced the number of concurrent uploads a bit again because it was causing issues on some smaller machines. I want to optimise a bit more for stability first before going for maximum throughput | 04:21:53 |
Mic92 | The current upload code for S3 is also not optimal. Usually one should auto-scale connections based on the error codes the S3 service responds with | 04:25:04 |
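One common way to auto-scale connections from error codes is additive-increase/multiplicative-decrease (AIMD), the scheme TCP congestion control uses. Below is a simple variant as a hypothetical sketch, not Hydra's code: S3 signals throttling with `503 SlowDown`; treating 429 the same way is an assumption for other S3-compatible services. The `maximum` plays the role of the fixed concurrency cap mentioned above.

```python
class AimdLimiter:
    """Additive-increase/multiplicative-decrease limit on concurrent uploads.

    Each successful response raises the limit by one (up to `maximum`);
    a throttling response halves it (down to `minimum`).
    """

    def __init__(self, initial: int = 4, minimum: int = 1, maximum: int = 64) -> None:
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_response(self, status: int) -> None:
        if status in (429, 503):  # treated as "slow down" signals (assumption)
            self.limit = max(self.minimum, self.limit // 2)
        elif 200 <= status < 300:
            self.limit = min(self.maximum, self.limit + 1)
        # Other errors leave the limit unchanged.


limiter = AimdLimiter(initial=8)
for status in (200, 200, 503, 200):
    limiter.on_response(status)
print(limiter.limit)  # 8 -> 9 -> 10 -> 5 -> 6
```

The uploader would then only start a new part upload while fewer than `limiter.limit` requests are in flight.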
Mic92 | I would like to get log and OpenTelemetry collection to work. Then we'll get more insight | 04:26:21 |
vcunat | I wondered why big-parallel builds are in so much trouble since the switch, and apparently we build them with -j1 | 05:56:57 |
vcunat | At least the kernels I see building right now are that way. Some builds do not seem to log the make parallelism level (e.g. chromium, but the ones I saw just now fail due to running out of disk, as... the temp build space is tmpfs, I think) | 05:59:47 |
vcunat | Yes, firefox here with -j1 as well:
https://hydra.nixos.org/log/adgkzznigqcx9zyjaqwldaln30rcqjsg-firefox-beta-unwrapped-151.0b9.drv | 06:00:19 |
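One thing worth checking, not a confirmed diagnosis: nixpkgs' stdenv derives the make `-j` level from `NIX_BUILD_CORES`, which Nix sets from its `cores` setting. As far as we recall, stdenv's setup script falls back to 1 when the variable is unset or empty and auto-detects when it is 0 or negative, so a builder that never receives a cores value would quietly build with `-j1`. A Python sketch of that normalisation, under those assumptions:

```python
import os
from typing import Optional


def effective_build_cores(nix_build_cores: Optional[str], nproc: int) -> int:
    """Mirror stdenv's NIX_BUILD_CORES normalisation (as we understand it).

    Unset or empty -> 1 (i.e. make -j1); 0 or negative -> detected CPU count.
    """
    value = int(nix_build_cores) if nix_build_cores else 1
    if value <= 0:
        value = nproc if nproc > 0 else 1
    return value


print(effective_build_cores(os.environ.get("NIX_BUILD_CORES"), os.cpu_count() or 1))
```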
Mic92 | Okay that's hopefully an easy fix | 06:00:42 |
Mic92 | I published the fast-nix-gc fixes I came up with when I had to battle the 1TB nix store / 58GB sqlite nix db: https://github.com/Mic92/fast-nix-gc/pull/45 | 06:04:46 |
Mic92 | Deploying to staging now | 06:30:33 |
vcunat | I'll be away from a computer for most of today. | 06:51:05 |
Mic92 | QA passed for the fixes. I will now proceed with production hydra. | 07:08:28 |