| 22 Sep 2021 |
Mic92 | wow | 06:09:33 |
Sandro | Disable fsck.f2fs | 06:18:02 |
Sandro | Or use ext4 | 06:18:07 |
Mic92 | or zfs as we do on other builders | 06:21:31 |
Mic92 | I don't think disabling fsck.f2fs is an option | 06:21:47 |
Mic92 | ext4 also has issues with inodes just like f2fs I think | 06:22:24 |
nix-community-bot | [firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-github.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-pypi.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-updatescript.service.
| 07:19:30 |
nix-community-bot | [firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-github.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-pypi.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-updatescript.service.
| 11:24:30 |
Mic92 | @ryantm: why do all those services constantly fail lately? | 11:25:42 |
ryantm | Mic92: By parallelizing the updates, I quadrupled the chances of something crashing. I also fixed some issues where it would sit for a long time without timing out, so it is probably getting to the crashes more quickly. I think it was crashing before just much more slowly. | 14:48:00 |
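The "quadrupled the chances" remark can be checked with a little arithmetic. The sketch below is illustrative only: it assumes four independent workers that each crash with the same probability `p` per run, and neither the worker count nor `p` is stated in the log.

```python
def p_any_crash(p: float, workers: int) -> float:
    """Probability that at least one of `workers` independent workers crashes.

    Assumes each worker crashes independently with probability `p`
    (a simplifying assumption, not a measured value).
    """
    return 1 - (1 - p) ** workers


if __name__ == "__main__":
    p = 0.01  # hypothetical per-worker crash probability
    print(p_any_crash(p, 1))  # a single worker
    print(p_any_crash(p, 4))  # four workers: close to 4 * p when p is small
```

For small `p` the result is close to `workers * p`, which is where "quadrupled" comes from; for larger `p` the increase is less than fourfold.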
Mic92 | @ryantm: is there no way of catching errors at a higher level in your application? | 14:48:44 |
ryantm | Mic92: I'm sure there is! I've been slowly looking at the crashes and trying to fix them. For example: https://github.com/ryantm/nixpkgs-update/commit/279960c32fd2de5d86172251a041508f7945bb98 | 14:52:36 |
ryantm | I'm not sure if that commit got deployed though, because of the issues with nixops. | 14:52:55 |
Mic92 | I deployed yesterday | 14:53:45 |
Mic92 | all sources here should be deployed: https://github.com/nix-community/infra/tree/master/nix | 14:54:12 |
Mic92 | Maybe we should filter update errors and only alert if they hit some threshold. | 14:56:10 |
Mic92 | That could be done by parsing nixpkgs-update logs or nixpkgs-update could expose some metrics that we can collect with telegraf. | 14:56:38 |
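A minimal sketch of the log-parsing variant of this idea, in Python. The log format is an assumption: the `FAIL` and `SUCCESS` markers, the window size, and the threshold are all hypothetical and are not taken from nixpkgs-update's actual output.

```python
from collections import deque
from typing import Iterable


def failure_rate(log_lines: Iterable[str], window: int = 100) -> float:
    """Fraction of failed updates among the last `window` update outcomes.

    Hypothetical log format: a line containing "FAIL" counts as a failed
    update, a line containing "SUCCESS" as a successful one; any other
    line is ignored.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` outcomes
    for line in log_lines:
        if "FAIL" in line:
            recent.append(1)
        elif "SUCCESS" in line:
            recent.append(0)
    return sum(recent) / len(recent) if recent else 0.0


def should_alert(log_lines: Iterable[str], threshold: float = 0.5,
                 window: int = 100) -> bool:
    """Alert only when the recent failure rate exceeds `threshold`."""
    return failure_rate(log_lines, window) > threshold
```

With this shape, individual failed updates stay silent and only a sustained rise in the failure rate triggers an alert.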
Mic92 | this would also be an option: https://hackage.haskell.org/package/ekg-statsd-0.2.5.0/docs/System-Remote-Monitoring-Statsd.html | 14:58:04 |
Mic92 | telegraf has a statsd input | 14:58:19 |
Mic92 | Your application would send stats to telegraf and then it is queryable with prometheus | 14:58:36 |
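To make the statsd route concrete, here is a rough Python sketch of what "send stats to telegraf" could look like on the wire. nixpkgs-update itself is written in Haskell and would more likely use a library such as the ekg-statsd package linked above; the metric name below is hypothetical, and the host and port assume a telegraf statsd input listening on its default UDP port 8125 on the same machine.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter datagram, e.g. b"some.metric:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter increment over UDP (fire and forget).

    host/port are assumptions: a statsd listener on localhost:8125.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name, incremented once per failed update.
    send_counter("nixpkgs_update.update_failed")
```

Because UDP is connectionless, the updater would not block or crash if the collector were down, which suits a metric that is only used for alerting.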
ryantm | Yeah, looks like I still need to deploy that top-level IO error catch. | 15:02:47 |
Mic92 | Right, but this would still give you some insight once the error handler is in place. | 15:03:53 |
Mic92 | Because with updates something will always fail, but you still want to see if it deviates from what you had before. | 15:04:40 |
nix-community-bot | [firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-github.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-pypi.service.
[firing] systemd_service_failed: nix-community-build02 failed to (re)start service nixpkgs-update-updatescript.service.
| 15:29:31 |
Mic92 | I snoozed alerts again for 120h | 15:46:51 |
nix-community-bot | [nix-community/infra] ryantm pushed to master: update nixpkgs-update
* top-level IO exception catching - https://github.com/nix-community/infra/commit/86a999b233d078568fd3690da194cdfee8244d60 | 16:06:19 |
ryantm | The top-level IO exception catch should be in place now. | 16:09:08 |
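The general pattern behind a top-level IO exception catch can be sketched as follows. This is a Python illustration of the idea only, not the code from the linked commit: `run_all` and `update_one` are hypothetical names, and Python's `OSError` stands in for Haskell's `IOException`.

```python
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("updater")


def run_all(packages: Iterable[str],
            update_one: Callable[[str], None]) -> Dict[str, int]:
    """Run `update_one` for every package, surviving IO errors.

    An OSError raised by one update is logged and counted instead of
    propagating, so the remaining packages are still processed and the
    process exits normally.
    """
    results = {"ok": 0, "failed": 0}
    for pkg in packages:
        try:
            update_one(pkg)
            results["ok"] += 1
        except OSError as exc:  # IOError is an alias of OSError in Python 3
            log.warning("update of %s failed: %s", pkg, exc)
            results["failed"] += 1
    return results
```

With a catch at this level a single failing update no longer fails the whole systemd unit, and the counts it collects are the kind of numbers that could feed the threshold-based alerting discussed earlier.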
| 23 Sep 2021 |
nix-community-bot | [resolved] filesystem_inodes_full: build02.nix-community.org:9273 device disk/by-uuid/29a6b37b-fafb-46a1-b856-1e1c20dc053b on /nix/store got less than 10% inodes left on its filesystem.
[resolved] filesystem_inodes_full: build02.nix-community.org:9273 device disk/by-uuid/29a6b37b-fafb-46a1-b856-1e1c20dc053b on / got less than 10% inodes left on its filesystem.
| 03:18:30 |
nix-community-bot | [nix-community/infra] ryantm pushed to master: nixpkgs-update: reset after IO exception caught - https://github.com/nix-community/infra/commit/4ba238f2a6d427e5292cff1f97403d201f8d096e | 04:16:05 |
nix-community-bot | [nix-community/infra] ryantm pushed to master: nixpkgs-update: cleanup nixpkgs-review files - https://github.com/nix-community/infra/commit/56764eff2351124eb17ddef4e3ddf07ff8d51117 | 04:23:31 |