| 4 Apr 2023 |
hexa | stuck at kexinit | 13:05:12 |
hexa | if I didn't know any better I would assume MTU 😛 | 13:05:30 |
cole-h | [ 550.001721] mlx5_core 0001:01:00.1: wait_func:1137:(pid 18057): MODIFY_CQ(0x403) canceled on out of queue timeout.
[ 550.001723] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18053): MODIFY_CQ(0x403) canceled on out of queue timeout.
[ 551.221694] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 551.227603] rcu: 60-...0: (4 GPs behind) idle=d9e4/1/0x4000000000000000 softirq=1016/1016 fqs=54368
[ 551.236726] (detected by 34, t=115624 jiffies, g=13597, q=153057 ncpus=80)
[ 551.243675] Task dump for CPU 60:
[ 551.246977] task:kworker/u160:5 state:R running task stack:0 pid:815 ppid:2 flags:0x0000000a
[ 551.256879] Workqueue: efi_rts_wq efi_call_rts
[ 551.261313] Call trace:
[ 551.263747] __switch_to+0xf0/0x170
[ 551.267226] 0xffff081f5b486ac0
[ 556.145647] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18065): ACCESS_REG(0x805) canceled on out of queue timeout.
[ 558.193622] mlx5_core 0001:01:00.0: wait_func:1137:(pid 18068): ACCESS_REG(0x805) canceled on out of queue timeout.
| 13:05:34 |
cole-h | lol | 13:05:36 |
hexa | low entropy? | 13:05:38 |
hexa | that call trace is magnificent | 13:06:10 |
hexa | __switch_to! | 13:06:14 |
cole-h | [ 605.297123] INFO: task kworker/u160:3:519 blocked for more than 483 seconds.
[ 605.324779] Tainted: P O 6.1.22 #1-NixOS
[ 605.330601] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.338417] task:kworker/u160:3 state:D stack:0 pid:519 ppid:2 flags:0x00000008
[ 605.346758] Workqueue: events_freezable_power_ sync_hw_clock
[ 605.352409] Call trace:
[ 605.354844] __switch_to+0xf0/0x170
[ 605.358325] __schedule+0x30c/0x1254
[ 605.361889] schedule+0x58/0xec
[ 605.365017] schedule_timeout+0x14c/0x180
[ 605.369017] __wait_for_common+0xd4/0x250
[ 605.373017] wait_for_completion+0x28/0x34
[ 605.377102] virt_efi_set_time+0x114/0x190
[ 605.381188] efi_set_time+0x84/0xc0
[ 605.384664] rtc_set_time+0xc0/0x1c4
[ 605.388229] sync_hw_clock+0x1ac/0x230
[ 605.391966] process_one_work+0x1f4/0x460
[ 605.395966] worker_thread+0x188/0x4e0
[ 605.399704] kthread+0xe0/0xe4
[ 605.402747] ret_from_fork+0x10/0x20
[ 605.406326] INFO: task kworker/7:1H:808 blocked for more than 362 seconds.
[ 605.413189] Tainted: P O 6.1.22 #1-NixOS
[ 605.419009] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.426826] task:kworker/7:1H state:D stack:0 pid:808 ppid:2 flags:0x00000008
[ 605.435165] Workqueue: kblockd blk_mq_timeout_work
[ 605.439946] Call trace:
[ 605.442381] __switch_to+0xf0/0x170
[ 605.445858] __schedule+0x30c/0x1254
[ 605.449423] schedule+0x58/0xec
[ 605.452551] schedule_timeout+0x14c/0x180
[ 605.456550] __wait_for_common+0xd4/0x250
[ 605.460548] wait_for_completion+0x28/0x34
[ 605.464633] __wait_rcu_gp+0x194/0x1c4
[ 605.468371] synchronize_rcu+0x68/0xa0
[ 605.472110] blk_mq_timeout_work+0x198/0x1dc
[ 605.476369] process_one_work+0x1f4/0x460
[ 605.480368] worker_thread+0x188/0x4e0
[ 605.484106] kthread+0xe0/0xe4
[ 605.487149] ret_from_fork+0x10/0x20
| 13:06:14 |
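Both dumps point the same way: sync_hw_clock is stuck in virt_efi_set_time on the efi_rts_wq workqueue, i.e. the hang is inside the firmware's EFI runtime services rather than in the kernel proper. A minimal sketch of a one-off test, assuming the box uses systemd-boot (efi=noruntime itself is a documented kernel parameter):

    # At the boot menu, press `e` and append efi=noruntime to the kernel
    # command line, then boot once. This disables EFI runtime services
    # (SetTime, GetVariable, ...), so virt_efi_set_time can no longer wedge.
    cat /proc/cmdline       # confirm the parameter took effect after boot
    dmesg | grep -i efi     # look for the runtime-services-disabled message

If the stalls disappear with runtime services off, the regression hunt moves from the kernel to this firmware's SetTime implementation.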
cole-h | I'm gonna bonk it again | 13:06:39 |
hexa | want to try a previous regeneration? | 13:06:59 |
hexa | if you even have that 😄 | 13:07:08 |
cole-h | not yet (because it's not easy, if possible lol) | 13:07:16 |
cole-h | telling Equinix to reboot the box is much easier hehe | 13:07:30 |
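Worth noting for later: as long as older generations have not been garbage-collected, switching back is a couple of commands. A sketch assuming the stock profile path (the generation number 122 is made up):

    # List the system generations still on disk:
    nix-env --list-generations --profile /nix/var/nix/profiles/system
    # Make an older generation (122 here, hypothetical) the boot default:
    nix-env --switch-generation 122 --profile /nix/var/nix/profiles/system
    /nix/var/nix/profiles/system/bin/switch-to-configuration boot

Picking the older entry straight from the boot menu also works for a single boot, which fits the "tell Equinix to reboot it" workflow.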
hexa | could very well be a kernel regression | 13:07:31 |
cole-h | lovely | 13:07:39 |
hexa | because who tests lts kernels, right? | 13:08:03 |
hexa | you just backport stuff into it and move on | 13:08:11 |
cole-h | lmao | 13:08:15 |
raitobezarius | In reply to @hexa:lossy.network ("you just backport stuff into it and move on"): greg k-h enters the channel | 14:47:01 |
cole-h | Found this thread: https://lkml.org/lkml/2023/3/16/765
So while it's not 6.2 as that thread mentions, it may be the same problem | 14:47:34 |
hexa | kernel downgrade when | 14:52:46 |
hexa | would also be interesting to know what its previous kernel version was | 14:55:45 |
cole-h | 🤷 the box is unpinned, but likely 6.1.21 was its previous version | 14:56:08 |
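The guess is checkable even on an unpinned box: every surviving generation carries a kernel symlink whose store path names the version. A sketch, assuming a stock NixOS layout:

    uname -r                                          # kernel currently running
    ls -l /nix/var/nix/profiles/system-*-link/kernel
    # each link resolves into /nix/store/...-linux-<version>/, so the
    # previous generation's kernel version can be read straight off the path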
cole-h | ok, 6.1.21 is also busted
[ 110.726426] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 110.732340] rcu: 44-...0: (0 ticks this GP) idle=9b44/1/0x4000000000000000 softirq=2863/2863 fqs=2462
[ 110.741636] (detected by 70, t=5255 jiffies, g=14289, q=8797 ncpus=80)
[ 110.748238] Task dump for CPU 44:
[ 110.751540] task:kworker/u160:1 state:R running task stack:0 pid:419 ppid:2 flags:0x0000000a
[ 110.761443] Workqueue: efi_rts_wq efi_call_rts
[ 110.765878] Call trace:
[ 110.768312] __switch_to+0xf0/0x170
[ 110.771791] 0xffff07ff85645b80
| 15:38:21 |
cole-h | nvm it's still 6.1.22 somehow | 15:39:29 |
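The "somehow" has a quick check: NixOS keeps separate links for the generation that was booted and the one last activated, so a mismatch is visible at a glance:

    uname -r                      # what the running kernel reports
    readlink /run/booted-system   # generation the machine actually booted
    readlink /run/current-system  # generation last switched to
    # If these differ, the downgrade was only staged and the box came
    # back up on the old boot entry, still running 6.1.22.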
cole-h | [ 242.441034] INFO: task kworker/u160:0:9 blocked for more than 120 seconds.
[ 242.447910] Tainted: P O 6.1.22 #1-NixOS
[ 242.453735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.461553] task:kworker/u160:0 state:D stack:0 pid:9 ppid:2 flags:0x00000008
[ 242.469895] Workqueue: events_freezable_power_ sync_hw_clock
[ 242.475548] Call trace:
[ 242.477986] __switch_to+0xf0/0x170
[ 242.481467] __schedule+0x30c/0x1254
[ 242.485035] schedule+0x58/0xec
[ 242.488164] schedule_timeout+0x14c/0x180
[ 242.492165] __wait_for_common+0xd4/0x250
[ 242.496165] wait_for_completion+0x28/0x34
[ 242.500251] virt_efi_set_time+0x114/0x190
[ 242.504339] efi_set_time+0x84/0xc0
[ 242.507818] rtc_set_time+0xc0/0x1c4
[ 242.511385] sync_hw_clock+0x1ac/0x230
[ 242.515123] process_one_work+0x1f4/0x460
[ 242.519124] worker_thread+0x188/0x4e0
[ 242.522863] kthread+0xe0/0xe4
[ 242.525908] ret_from_fork+0x10/0x20
| 15:39:43 |
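The warning names its own knob; silencing it is possible but only hides the symptom while the SetTime call stays wedged. A sketch (120 seconds is the stock default):

    sysctl kernel.hung_task_timeout_secs               # threshold, 120 by default
    echo 0 > /proc/sys/kernel/hung_task_timeout_secs   # 0 disables the reports
    # the EFI call is still stuck; this only stops the messages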
hexa | In reply to @cole-h:matrix.org ("not yet (because it's not easy, if possible lol)") 🤡 | 15:40:13 |
cole-h | oh I missed something lol, let's try again | 15:55:36 |
cole-h | [ 109.650254] rcu: rcu_sched kthread timer wakeup didn't happen for 3034 jiffies! g13841 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[ 109.661456] rcu: Possible timer handling issue on cpu=44 timer-softirq=210
[ 109.668404] rcu: rcu_sched kthread starved for 3040 jiffies! g13841 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=44
[ 109.678737] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 109.687682] rcu: RCU grace-period kthread stack dump:
[ 109.692720] task:rcu_sched state:I stack:0 pid:14 ppid:2 flags:0x00000008
[ 109.701057] Call trace:
[ 109.703490] __switch_to+0xf0/0x170
[ 109.706967] __schedule+0x30c/0x1254
[ 109.710530] schedule+0x58/0xec
[ 109.713659] schedule_timeout+0xa4/0x180
[ 109.717571] rcu_gp_fqs_loop+0x138/0x4ac
[ 109.721483] rcu_gp_kthread+0x1d4/0x210
[ 109.725307] kthread+0xe0/0xe4
[ 109.728350] ret_from_fork+0x10/0x20
welp
| 16:22:23 |
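One tying-together observation: the starved CPU here (44) is the same CPU the efi_rts_wq worker was dumped on at 15:38, which fits a firmware call that never returns and starves everything scheduled behind it. A quick way to see what is sitting on a given CPU, sketched for CPU 44 (ps may itself stall if the machine is far enough gone):

    # PSR is the processor each task last ran on:
    ps -eo pid,psr,stat,comm | awk '$2 == 44'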