!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

470 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
146 Servers


Sender Message Time
19 Jun 2026
@joerg:thalheim.ioMic92I think I was able to observe this now. Apparently if a multi-part complete request returns an error, one has to check if the object was created successfully. Our retry code was just retrying on already invalidated part ids.16:29:27
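The failure mode described above can be sketched roughly as follows. This is a minimal illustration in Python, not the queue runner's actual code: `UploadError`, the `store` client and its `complete` / `exists` methods, and `FlakyStore` are all hypothetical names made up for the example. The idea is that an error from the completion request is treated as ambiguous, so the code checks whether the object exists before retrying with part ids that may already be invalid.

```python
class UploadError(Exception):
    """Raised by the hypothetical store client when a request fails."""


def complete_multipart(store, key, upload_id, parts, max_attempts=3):
    """Complete a multipart upload, treating an error response as ambiguous.

    `store` is a hypothetical client exposing `complete(key, upload_id, parts)`
    and `exists(key)`; these names are assumptions for this sketch.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            store.complete(key, upload_id, parts)
            return True
        except UploadError as err:
            last_error = err
            # The completion may have succeeded server-side even though the
            # response was an error. If so, the part ids are no longer valid
            # and retrying with them cannot succeed, so check the object first.
            if store.exists(key):
                return True
    raise last_error


class FlakyStore:
    """In-memory stand-in: the first completion creates the object but errors."""

    def __init__(self):
        self.objects = set()
        self.complete_calls = 0

    def complete(self, key, upload_id, parts):
        self.complete_calls += 1
        if self.complete_calls == 1:
            self.objects.add(key)  # the object is created...
        raise UploadError("error response")  # ...but the caller sees an error

    def exists(self, key):
        return key in self.objects
```

With `FlakyStore`, a naive retry loop would keep failing on every attempt, while `complete_multipart` returns after the first call because the existence check finds the object.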
@joerg:thalheim.ioMic92New version is deployed16:29:39
@joerg:thalheim.ioMic92Looks like hydra's disk no longer gets trashed with nars... good so far. However, NAR streaming is still quite heavy and blocks the queue-runner async code a bit, so I should get this out of the event loop. My first attempt will be switching to ls files, since the vast majority of uncached nars won't have any hydra build products, so then no decompression is required.17:03:03
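As a general illustration of "getting heavy work out of the event loop", here is a small Python sketch. It is not the queue runner's code and does not show the ls-file approach mentioned above; `zlib` only stands in for whatever compression the NARs actually use, and the function names are invented for the example. The blocking step runs in a worker thread via `asyncio.to_thread`, so the loop stays free to schedule other coroutines.

```python
import asyncio
import zlib


def unpack_blocking(blob: bytes) -> bytes:
    """CPU-heavy step; zlib is a stand-in for the real NAR compression."""
    return zlib.decompress(blob)


async def unpack_off_loop(blob: bytes) -> bytes:
    # Run the blocking call in a worker thread so the event loop can keep
    # scheduling other coroutines while decompression is in progress.
    return await asyncio.to_thread(unpack_blocking, blob)


async def demo() -> int:
    # Four compressed blobs of 1 MB each, decompressed concurrently.
    blobs = [zlib.compress(bytes([i]) * 1_000_000) for i in range(4)]
    results = await asyncio.gather(*(unpack_off_loop(b) for b in blobs))
    return sum(len(r) for r in results)
```

Calling `asyncio.run(demo())` decompresses all four blobs without running any decompression on the event loop thread itself.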
@joerg:thalheim.ioMic92However, CPU usage looks okay, so I will not deploy for now and will let it get through the backlog instead. I am going to test it a bit on staging hydra.17:28:30
@joerg:thalheim.ioMic92Okay. Signing out for today. I will have some time tomorrow for smaller fixes but not on Sunday.17:33:32
@hexa:lossy.networkhexa (signing key rotation when)Thanks!17:34:12
@hexa:lossy.networkhexa (signing key rotation when)[attachment]20:33:49
@hexa:lossy.networkhexa (signing key rotation when)I think we're not quite there yet with uploads to s3 🤔20:33:59
@Ericson2314:matrix.orgJohn Ericsonlong term I hope to make us stop using the queue runner store entirely21:22:02
@Ericson2314:matrix.orgJohn Ericsoneverything should be binary cache or database21:22:12
@Ericson2314:matrix.orgJohn Ericsonand evaluations should be distributed to builders just like regular builds21:22:21
@morwdan:tchncs.demorwdan joined the room.23:16:29
20 Jun 2026
@joerg:thalheim.ioMic92 @hexa:lossy.network: when you see a lot of receiving uploads, also check the load on the machine 04:19:38
@joerg:thalheim.ioMic92Because if all CPUs are working, I don't think we can make compression go much faster04:20:17
@joerg:thalheim.ioMic92Yesterday I reduced the number of concurrent uploads a bit again because it was causing issues on some smaller machines. I want to optimise a bit more for stability first before going for maximum throughput04:21:53
@joerg:thalheim.ioMic92The current upload code for S3 is also not optimal. Usually one should auto-scale connections based on how the S3 service responds in terms of error codes04:25:04
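One common way to scale connections from response codes is additive increase, multiplicative decrease (AIMD). The Python sketch below is only an illustration of that general technique, not the queue runner's upload code; the class name, the limits, and the choice of which status codes count as back-pressure are assumptions. S3 signals throttling with "503 Slow Down", and 500 is included here as a further assumption.

```python
# Status codes treated as back-pressure in this sketch (an assumption).
THROTTLE_STATUSES = {500, 503}


class AdaptiveConcurrency:
    """AIMD controller for the number of concurrent uploads (hypothetical)."""

    def __init__(self, initial=4, minimum=1, maximum=32):
        self.minimum = minimum
        self.maximum = maximum
        self.limit = max(minimum, min(maximum, initial))

    def record(self, status: int) -> int:
        """Update the limit from one response status and return the new limit."""
        if status in THROTTLE_STATUSES:
            # Multiplicative decrease: back off quickly when the service
            # signals that it is overloaded.
            self.limit = max(self.minimum, self.limit // 2)
        elif 200 <= status < 300:
            # Additive increase: probe for more capacity one slot at a time.
            self.limit = min(self.maximum, self.limit + 1)
        # Other statuses (e.g. 404) say nothing about load; leave limit as is.
        return self.limit
```

An uploader would call `record()` after each response and only start new uploads while fewer than `limit` are in flight. Halving on errors favors stability over peak throughput.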
@joerg:thalheim.ioMic92I would like to get log and OpenTelemetry collection to work. Then we get more insights04:26:21
@vcunat:matrix.orgvcunat I wondered why big-parallel builds are in so much trouble since the switch, and apparently we build them with -j1 05:56:57
@vcunat:matrix.orgvcunatAt least the kernels I see building right now are that way. Some builds do not seem to log the make-parallelism level (e.g. chromium but those I saw now fail due to out-of-disk, as... the temp build space is tmpfs I think)05:59:47
@vcunat:matrix.orgvcunatYes, firefox here with -j1 as well: https://hydra.nixos.org/log/adgkzznigqcx9zyjaqwldaln30rcqjsg-firefox-beta-unwrapped-151.0b9.drv 06:00:19
@joerg:thalheim.ioMic92Okay that's hopefully an easy fix06:00:42
@joerg:thalheim.ioMic92I published my fast-nix-gc fixes I came up with when I had to battle with the 1TB nix store / 58GB sqlite nix db: https://github.com/Mic92/fast-nix-gc/pull/45 06:04:46
@joerg:thalheim.ioMic92Deploying to staging now06:30:33
@vcunat:matrix.orgvcunatI'll be away from a computer for most of today.06:51:05
@joerg:thalheim.ioMic92QA passed for the fixes. I will now proceed to production hydra.07:08:28



Room Version: 6