!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

397 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.

120 Servers

Sender Message Time
25 Feb 2026
@jkarlson:kapsi.fiEmil ThorsøeObama?10:31:09
@jkarlson:kapsi.fiEmil Thorsøegotta be Obama10:31:13
@lassulus:lassul.uslassulussam altman12:26:54
@isabel:isabelroses.comisabel changed their profile picture.21:51:38
26 Feb 2026
@lily:lily.flowersLily Foster changed their profile picture.14:01:10
27 Feb 2026
@amadaluzia:tchncs.deamadaluzia[tde] changed their profile picture.03:54:05
@jfly:matrix.orgJeremy Fleischman (jfly) do we have any alerts to tell us if zfs scrub failed or if a zfs pool is degraded? the only zfs specific alert i see is for pool free space 07:06:46
@jfly:matrix.orgJeremy Fleischman (jfly) lol nevermind, i see we just added an unhealthy check: https://github.com/NixOS/infra/commit/1c46bbda28fe056dd4a4b9a0c95e5602fdf5f738 07:21:28
@jfly:matrix.orgJeremy Fleischman (jfly)

but i did play with this a bit, and i see that a pool that fails scrub still shows up as ONLINE:

$ sudo zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 3 errors on Thu Feb 26 23:18:53 2026
config:

	NAME                            STATE     READ WRITE CKSUM
	tank                            ONLINE       0     0     0
	  /home/jeremy/tmp/fsing/disk1  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        /tank/hello.txt
07:23:03
@jfly:matrix.orgJeremy Fleischman (jfly)seems like the prometheus zfs exporter we use doesn't export scrub status: https://github.com/pdf/zfs_exporter/issues/20, which was closed in favor of https://github.com/pdf/zfs_exporter/issues/5, which seems to be blocked on adding support for the new zfs cli json output07:35:27
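
Since the pool state alone stays ONLINE after a failed scrub, the other signals in the paste above (the scrub error count, the per-device CKSUM counters, and the "Permanent errors" line) are what a check would have to look at. Below is a minimal sketch of pulling those out of `zpool status` text. The names `parse_zpool_status` and `needs_attention` are hypothetical, the sample is abbreviated from the paste above, and the regexes assume exactly that output layout; other OpenZFS versions may word these lines differently or abbreviate large counters, so treat this as an illustration and not as what the exporter or the NixOS infra check does.

```python
import re

# Abbreviated from the `zpool status -v` paste above (tabs replaced by spaces).
SAMPLE = """\
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:00:00 with 3 errors on Thu Feb 26 23:18:53 2026
config:

    NAME                            STATE     READ WRITE CKSUM
    tank                            ONLINE       0     0     0
      /home/jeremy/tmp/fsing/disk1  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

        /tank/hello.txt
"""

# Device rows look like: NAME STATE READ WRITE CKSUM, with plain integer counters.
# The header row does not match because READ/WRITE/CKSUM are not digits.
DEVICE_ROW = re.compile(
    r"^[ \t]+(\S+)[ \t]+([A-Z]+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_zpool_status(text: str) -> dict:
    """Extract a few health signals from `zpool status` text (assumed layout)."""
    state = re.search(r"^[ \t]*state:[ \t]*(\S+)", text, re.MULTILINE)
    scrub = re.search(r"scan:[ \t]*scrub repaired \S+ in \S+ with (\d+) errors", text)
    # Sum the CKSUM column over all device rows (pool row plus vdevs).
    cksum_errors = sum(int(m.group(5)) for m in DEVICE_ROW.finditer(text))
    return {
        "state": state.group(1) if state else None,
        # None means no finished scrub line was found in the text.
        "scrub_errors": int(scrub.group(1)) if scrub else None,
        "cksum_errors": cksum_errors,
        "permanent_errors": "Permanent errors have been detected" in text,
    }


def needs_attention(status: dict) -> bool:
    """True if any signal besides the pool state points at a problem."""
    return (
        status["state"] != "ONLINE"
        or bool(status["scrub_errors"])
        or status["cksum_errors"] > 0
        or status["permanent_errors"]
    )


if __name__ == "__main__":
    result = parse_zpool_status(SAMPLE)
    print(result, needs_attention(result))
```

For the sample, the state is still reported as ONLINE while the scrub error count, the checksum counter, and the permanent-errors flag all indicate a problem, which is why a state-only check would miss this case.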
@k900:0upti.meK900I am killing all builds on all non-staging non-darwin branches right now13:02:54
@k900:0upti.meK900Because of /nix/store/j4ra5i3f9x6bk3y6aq6ma17z1hlqr18d-nixos-system-konata-26.05.20260227.bde6ce613:02:57
@k900:0upti.meK900* Because of https://lore.kernel.org/all/bb9ab61c-3bed-4c3d-baf0-0bce4e142292@moonlit-rail.com/13:03:05
@k900:0upti.meK900 @vcunat can you pause the jobsets so we don't get any new evals 13:03:51
@vcunat:matrix.orgVladimír ČunátIt's possible to change config of each jobset individually.13:04:39
@k900:0upti.meK900I don't have permissions13:04:51
@vcunat:matrix.orgVladimír ČunátOK, paused now the 4 NixOS jobsets.13:06:19
@vcunat:matrix.orgVladimír Čunát(i.e. won't eval by timer)13:06:34
@vcunat:matrix.orgVladimír Čunát* (i.e. they won't eval by timer)13:06:42
@k900:0upti.meK900unstable-small already updated but too late now13:06:56
@vcunat:matrix.orgVladimír Čunátnixpkgs/unstable as well?13:07:01
@k900:0upti.meK900That one should be fine13:07:07
@k900:0upti.meK900Hopefully no one is actually running NixOS systems on it13:07:18
@vcunat:matrix.orgVladimír ČunátPush some revert, eval and bump?13:07:30
@vcunat:matrix.orgVladimír Čunát-small is primarily meant for servers which take security seriously.13:08:02
@vcunat:matrix.orgVladimír ČunátSo someone will probably be using it.13:08:13
@vcunat:matrix.orgVladimír Čunát* So someone will probably be using the channel.13:08:23
@k900:0upti.meK900Working on that13:09:50
@k900:0upti.meK900OK, reverted13:11:18
@k900:0upti.meK900Starting new eval13:11:29

Room Version: 6