| 19 Mar 2026 |
Mic92 | So looks like we always had keep-alive | 12:25:12 |
Mic92 | but aws does its own pooling | 12:26:48 |
Sergei Zimmerman (xokdvium) | It might have retries for the error. Also one thing to note is that old code didn't run concurrent s3 requests at all, since it was using the blocking API. Now we fire off a bunch of requests in parallel. | 12:26:53 |
Sergei Zimmerman (xokdvium) | We do curl_multi pooling too, that does reuse the handles | 12:27:25 |
Sergei Zimmerman (xokdvium) | Or rather the connections for the easy handles | 12:27:36 |
Mic92 | aws doesn't seem to use this interface | 12:28:30 |
Sergei Zimmerman (xokdvium) | Well it effectively does the same thing. What aws probably does is retry on that 400 error | 12:29:30 |
Arian | (In reply to @joerg:thalheim.io: "but aws does its own pooling") Only if you use the transfer API iirc, which we didn't | 12:32:54 |
Arian | For S3 you also need to retry on 503 for uploads
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-design-patterns.html | 12:34:08 |
Arian | They use 503 for rate limit :') | 12:34:30 |
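Editor's sketch of the retry-on-503 pattern Arian describes: exponential backoff with jitter around an upload attempt. This is not the code from the Nix PR mentioned below; `upload_with_retry`, the `put` callable, the set of retryable statuses, and all delay values are hypothetical and chosen only for illustration.

```python
import random
import time
from typing import Callable

# Assumption: which statuses count as transient. 503 is the one named in the
# chat (S3 uses it for rate limiting); the others are illustrative.
RETRYABLE_STATUSES = frozenset({500, 502, 503, 504})


def upload_with_retry(
    put: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> int:
    """Call put() until it returns a non-retryable status or attempts run out.

    put() performs one upload attempt and returns the HTTP status code.
    The last status seen is returned to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status = put()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status
        # Exponential backoff capped at max_delay, scaled by a random factor
        # so that parallel uploads do not all retry at the same instant.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(cap * jitter())
    return status
```

The jitter matters here because, as noted earlier in the conversation, the new code fires off many requests in parallel; without it, all throttled uploads would retry in lockstep.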
Mic92 | https://github.com/aws/aws-sdk-cpp/blob/9204e236faaa1ca6a0342dee7caf61c7cf5ad8bb/src/aws-cpp-sdk-core/source/client/CoreErrors.cpp#L90-L91 | 12:35:08 |
Sergei Zimmerman (xokdvium) | Yup bernardo added that in https://github.com/NixOS/nix/pull/15449 | 12:35:11 |
Arian | Sweet | 12:35:38 |
Mic92 | where does this handle 400? | 12:37:33 |
Sergei Zimmerman (xokdvium) | I was talking about 503, 400 is not handled there. We'd need to look at the response xml | 12:38:17 |
Arian | Line 113 | 12:38:56 |
Arian | Decides if an error is retryable | 12:39:11 |
Mic92 | True, but I think Sergei is right, we need to look at the xml | 12:41:00 |
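Editor's sketch of what "look at the xml" could mean: S3 REST error responses carry an `<Error>` document whose `<Code>` element names the failure, so a 400 can be classified by that code rather than by status alone. The chat does not say which 400 error was being hit; `RequestTimeout` is used here purely as an assumed example, and the function names are hypothetical. Namespaced responses are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: the set of 400-level S3 error codes worth retrying.
# The conversation does not name the code; RequestTimeout is illustrative.
RETRYABLE_400_CODES = frozenset({"RequestTimeout"})


def s3_error_code(body: bytes) -> Optional[str]:
    """Return the text of <Error><Code>, or None if the body is not an S3 error."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    if root.tag != "Error":
        return None
    code = root.findtext("Code")
    return code.strip() if code else None


def is_retryable(status: int, body: bytes) -> bool:
    """Decide whether a failed request should be retried."""
    if status == 503:
        # Rate limiting: retry regardless of the body.
        return True
    if status == 400:
        # A plain 400 is usually a client bug; only retry known transient codes.
        return s3_error_code(body) in RETRYABLE_400_CODES
    return False
```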
Mic92 | deploying this now | 12:44:59 |
Mic92 | how did you come up with this number, so I can monitor? | 12:49:53 |
hexa | journalctl -u hydra-queue-runner --since "2 days ago" --grep="unable to upload" | wc -l | 12:50:38 |
Mic92 | okay, we don't have any upload error since: Thu Mar 19 01:01:45 PM UTC 2026 | 13:02:00 |
Mic92 | Let's see | 13:02:05 |
Mic92 | journalctl -u hydra-queue-runner --since "3 hour ago" --grep="unable to upload" is also empty | 13:02:13 |
Mic92 | while true; do journalctl -u hydra-queue-runner --since "2026-03-19 13:01:45" --grep="unable to upload"; sleep 180; done for monitoring | 13:06:29 |
Mic92 | so far only getting a single 5xx error:
Mar 19 13:11:55 mimas hydra-queue-runner[2800047]: warning: unable to upload 'https://nix-cache.s3.us-east-1.amazonaws.com/nar/0a8jn6fw74pdnpa10viam4wb4jmpszkfw19zrrm1h1n40zlpnlcx.nar.xz': HTTP error 503 | 13:16:40 |
Mic92 | Still only one error. | 15:24:49 |
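Editor's sketch for the monitoring above: instead of eyeballing the `journalctl` output or piping it to `wc -l`, the matching lines can be tallied per HTTP status. The regular expression is derived from the single log line quoted in this conversation, so other message shapes may not match; `tally_upload_errors` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern based on the one warning line quoted in the chat.
UPLOAD_ERROR = re.compile(
    r"unable to upload '(?P<url>[^']+)': HTTP error (?P<status>\d{3})"
)


def tally_upload_errors(lines: Iterable[str]) -> Counter:
    """Count 'unable to upload' warnings, keyed by HTTP status code."""
    counts: Counter = Counter()
    for line in lines:
        match = UPLOAD_ERROR.search(line)
        if match:
            counts[int(match.group("status"))] += 1
    return counts
```

To use it with the journal, feed it the output of the `journalctl ... --grep="unable to upload"` command shown above, for example by passing `sys.stdin` as `lines`.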