!sBfrWMVsLoSyFTCkNv:nixos.org

OfBorg

174 Members
Number of builds and evals in queue: <TBD>
62 Servers

Sender | Message | Time
4 Apr 2023
@hexa:lossy.network (hexa): stuck at kexinit [13:05:12]
@hexa:lossy.network (hexa): if I didn't know any better I would assume MTU 😛 [13:05:30]
@cole-h:matrix.org (cole-h):
[  550.001721] mlx5_core 0001:01:00.1: wait_func:1137:(pid 18057): MODIFY_CQ(0x403) canceled on out of queue timeout.
[  550.001723] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18053): MODIFY_CQ(0x403) canceled on out of queue timeout.
[  551.221694] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  551.227603] rcu:     60-...0: (4 GPs behind) idle=d9e4/1/0x4000000000000000 softirq=1016/1016 fqs=54368
[  551.236726]  (detected by 34, t=115624 jiffies, g=13597, q=153057 ncpus=80)
[  551.243675] Task dump for CPU 60:
[  551.246977] task:kworker/u160:5  state:R  running task     stack:0     pid:815   ppid:2      flags:0x0000000a
[  551.256879] Workqueue: efi_rts_wq efi_call_rts
[  551.261313] Call trace:
[  551.263747]  __switch_to+0xf0/0x170
[  551.267226]  0xffff081f5b486ac0
[  556.145647] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18065): ACCESS_REG(0x805) canceled on out of queue timeout.
[  558.193622] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18068): ACCESS_REG(0x805) canceled on out of queue timeout.
[13:05:34]
@cole-h:matrix.org (cole-h): lol [13:05:36]
@hexa:lossy.network (hexa): low entropy? [13:05:38]
@hexa:lossy.network (hexa): that call trace is magnificent [13:06:10]
@hexa:lossy.network (hexa): __switch_to! [13:06:14]
@cole-h:matrix.org (cole-h):
[  605.297123] INFO: task kworker/u160:3:519 blocked for more than 483 seconds.
[  605.324779]       Tainted: P           O       6.1.22 #1-NixOS
[  605.330601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.338417] task:kworker/u160:3  state:D stack:0     pid:519   ppid:2      flags:0x00000008
[  605.346758] Workqueue: events_freezable_power_ sync_hw_clock
[  605.352409] Call trace:
[  605.354844]  __switch_to+0xf0/0x170
[  605.358325]  __schedule+0x30c/0x1254
[  605.361889]  schedule+0x58/0xec
[  605.365017]  schedule_timeout+0x14c/0x180
[  605.369017]  __wait_for_common+0xd4/0x250
[  605.373017]  wait_for_completion+0x28/0x34
[  605.377102]  virt_efi_set_time+0x114/0x190
[  605.381188]  efi_set_time+0x84/0xc0
[  605.384664]  rtc_set_time+0xc0/0x1c4
[  605.388229]  sync_hw_clock+0x1ac/0x230
[  605.391966]  process_one_work+0x1f4/0x460
[  605.395966]  worker_thread+0x188/0x4e0
[  605.399704]  kthread+0xe0/0xe4
[  605.402747]  ret_from_fork+0x10/0x20
[  605.406326] INFO: task kworker/7:1H:808 blocked for more than 362 seconds.
[  605.413189]       Tainted: P           O       6.1.22 #1-NixOS
[  605.419009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  605.426826] task:kworker/7:1H    state:D stack:0     pid:808   ppid:2      flags:0x00000008
[  605.435165] Workqueue: kblockd blk_mq_timeout_work
[  605.439946] Call trace:
[  605.442381]  __switch_to+0xf0/0x170
[  605.445858]  __schedule+0x30c/0x1254
[  605.449423]  schedule+0x58/0xec
[  605.452551]  schedule_timeout+0x14c/0x180
[  605.456550]  __wait_for_common+0xd4/0x250
[  605.460548]  wait_for_completion+0x28/0x34
[  605.464633]  __wait_rcu_gp+0x194/0x1c4
[  605.468371]  synchronize_rcu+0x68/0xa0
[  605.472110]  blk_mq_timeout_work+0x198/0x1dc
[  605.476369]  process_one_work+0x1f4/0x460
[  605.480368]  worker_thread+0x188/0x4e0
[  605.484106]  kthread+0xe0/0xe4
[  605.487149]  ret_from_fork+0x10/0x20
[13:06:14]
@cole-h:matrix.org (cole-h): I'm gonna bonk it again [13:06:39]
@hexa:lossy.network (hexa): want to try a previous regeneration? [13:06:59]
@hexa:lossy.network (hexa): if you even have that 😄 [13:07:08]
@cole-h:matrix.org (cole-h): not yet (because it's not easy, if possible lol) [13:07:16]
@cole-h:matrix.org (cole-h): telling Equinix to reboot the box is much easier hehe [13:07:30]
@hexa:lossy.network (hexa): could very well be a kernel regression [13:07:31]
@cole-h:matrix.org (cole-h): lovely [13:07:39]
@hexa:lossy.network (hexa): because who tests lts kernels, right? [13:08:03]
@hexa:lossy.network (hexa): you just backport stuff into it and move on [13:08:11]
@cole-h:matrix.org (cole-h): lmao [13:08:15]
@raitobezarius:matrix.org (raitobezarius), in reply to @hexa:lossy.network:
> you just backport stuff into it and move on
greg k-h enters the channel
[14:47:01]
@cole-h:matrix.org (cole-h): Found this thread: https://lkml.org/lkml/2023/3/16/765 So while it's not 6.2 as that thread mentions, may be the same problem [14:47:34]
@hexa:lossy.network (hexa): kernel downgrade when [14:52:46]
@hexa:lossy.network (hexa): would also be interesting to know what its previous kernel version was [14:55:45]
@cole-h:matrix.org (cole-h): 🤷 the box is unpinned, but likely 6.1.21 was its previous version [14:56:08]
@cole-h:matrix.org (cole-h):

ok, 6.1.21 is also busted

[  110.726426] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  110.732340] rcu:     44-...0: (0 ticks this GP) idle=9b44/1/0x4000000000000000 softirq=2863/2863 fqs=2462
[  110.741636]  (detected by 70, t=5255 jiffies, g=14289, q=8797 ncpus=80)
[  110.748238] Task dump for CPU 44:
[  110.751540] task:kworker/u160:1  state:R  running task     stack:0     pid:419   ppid:2      flags:0x0000000a
[  110.761443] Workqueue: efi_rts_wq efi_call_rts
[  110.765878] Call trace:
[  110.768312]  __switch_to+0xf0/0x170
[  110.771791]  0xffff07ff85645b80
[15:38:21]
@cole-h:matrix.org (cole-h): nvm it's still 6.1.22 somehow [15:39:29]
@cole-h:matrix.org (cole-h):
[  242.441034] INFO: task kworker/u160:0:9 blocked for more than 120 seconds.
[  242.447910]       Tainted: P           O       6.1.22 #1-NixOS
[  242.453735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.461553] task:kworker/u160:0  state:D stack:0     pid:9     ppid:2      flags:0x00000008
[  242.469895] Workqueue: events_freezable_power_ sync_hw_clock
[  242.475548] Call trace:
[  242.477986]  __switch_to+0xf0/0x170
[  242.481467]  __schedule+0x30c/0x1254
[  242.485035]  schedule+0x58/0xec
[  242.488164]  schedule_timeout+0x14c/0x180
[  242.492165]  __wait_for_common+0xd4/0x250
[  242.496165]  wait_for_completion+0x28/0x34
[  242.500251]  virt_efi_set_time+0x114/0x190
[  242.504339]  efi_set_time+0x84/0xc0
[  242.507818]  rtc_set_time+0xc0/0x1c4
[  242.511385]  sync_hw_clock+0x1ac/0x230
[  242.515123]  process_one_work+0x1f4/0x460
[  242.519124]  worker_thread+0x188/0x4e0
[  242.522863]  kthread+0xe0/0xe4
[  242.525908]  ret_from_fork+0x10/0x20
[15:39:43]
@hexa:lossy.network (hexa), in reply to @cole-h:matrix.org:
> not yet (because it's not easy, if possible lol)
🤡
[15:40:13]
@cole-h:matrix.org (cole-h): oh I missed something lol, let's try again [15:55:36]
@cole-h:matrix.org (cole-h):
[  109.650254] rcu: rcu_sched kthread timer wakeup didn't happen for 3034 jiffies! g13841 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[  109.661456] rcu:     Possible timer handling issue on cpu=44 timer-softirq=210
[  109.668404] rcu: rcu_sched kthread starved for 3040 jiffies! g13841 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=44
[  109.678737] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  109.687682] rcu: RCU grace-period kthread stack dump:
[  109.692720] task:rcu_sched       state:I stack:0     pid:14    ppid:2      flags:0x00000008
[  109.701057] Call trace:
[  109.703490]  __switch_to+0xf0/0x170
[  109.706967]  __schedule+0x30c/0x1254
[  109.710530]  schedule+0x58/0xec
[  109.713659]  schedule_timeout+0xa4/0x180
[  109.717571]  rcu_gp_fqs_loop+0x138/0x4ac
[  109.721483]  rcu_gp_kthread+0x1d4/0x210
[  109.725307]  kthread+0xe0/0xe4
[  109.728350]  ret_from_fork+0x10/0x20

welp

[16:22:23]
