Understanding and debugging frequent 'hard LOCKUP on cpu'

My Ubuntu box locks up frequently (several times a day), leaving messages like these (sometimes truncated) in syslog and kern.log:

Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843824] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843826] Modules linked in: nls_utf8 btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) snd_hda_codec_hdmi nls_iso8859_1 eeepc_wmi asus_wmi sparse_keymap intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic aesni_intel aes_x86_64 lrw gf128mul glue_helper input_leds ablk_helper cryptd serio_raw snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep sb_edac edac_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq lpc_ich snd_seq_device snd_timer snd mei_me mei soundcore shpchp 8250_fintek mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid nouveau mxm_wmi video i2c_algo_bit ttm drm_kms_helper psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm ahci libahci ptp nvme pps_core fjes wmi
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843881] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G           OE   4.4.0-34-generic #53-Ubuntu
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843883] Hardware name: ASUS All Series/X99-A/USB 3.1, BIOS 3005 04/11/2016
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843884] task: ffff8807fb493700 ti: ffff8807fb4a8000 task.ti: ffff8807fb4a8000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843885] RIP: 0010:[<ffffffff816c3f61>]  [<ffffffff816c3f61>] cpuidle_enter_state+0x111/0x2b0
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843890] RSP: 0018:ffff8807fb4abe70  EFLAGS: 00000246
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843891] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000018
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843892] RDX: 00195eb06e5732b1 RSI: 0000000000500101 RDI: 0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843892] RBP: ffff8807fb4abea8 R08: 000000000032b396 R09: 0000000000000018
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843893] R10: ffff8807fb4abe20 R11: 000000000000bf7e R12: 0000000000000004
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843894] R13: ffffe8ffffd40a00 R14: 0000032c4dc034f3 R15: ffffffff81eb1f38
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843895] FS:  0000000000000000(0000) GS:ffff8807ff540000(0000) knlGS:0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843895] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843896] CR2: 00001496c022a008 CR3: 0000000002e0a000 CR4: 00000000003426e0
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843897] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843898] Stack:
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843899]  00000000ff553b00 0000032c4e87d60d ffffffff81f36140 ffff8807fb4ac000
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843900]  ffffe8ffffd40a00 ffffffff81eb1da0 ffff8807fb4a8000 ffff8807fb4abeb8
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843901]  ffffffff816c4137 ffff8807fb4abed0 ffffffff810c3fe2 ffffffff816c4113
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843903] Call Trace:
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843905]  [<ffffffff816c4137>] cpuidle_enter+0x17/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843908]  [<ffffffff810c3fe2>] call_cpuidle+0x32/0x60
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843910]  [<ffffffff816c4113>] ? cpuidle_select+0x13/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843911]  [<ffffffff810c42a0>] cpu_startup_entry+0x290/0x350
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843914]  [<ffffffff810516e4>] start_secondary+0x154/0x190
Sep  1 11:09:55 majestic-daemon kernel: [ 3506.843915] Code: 48 41 89 c4 e8 01 1a a3 ff 48 89 45 d0 0f 1f 44 00 00 31 ff e8 41 ff 9f ff 8b 45 cc 85 c0 0f 85 31 01 00 00 fb 66 0f 1f 44 00 00 <48> 8b 5d d0 48 ba cf f7 53 e3 a5 9b c4 20 4c 29 f3 48 89 d8 48 
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682295] INFO: rcu_sched detected stalls on CPUs/tasks:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682301]  13-...: (1 GPs behind) idle=54b/1/0 softirq=125133/125133 fqs=13613 
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682302]  (detected by 9, t=15002 jiffies, g=108062, c=108061, q=1917)
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682304] Task dump for CPU 13:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682305] swapper/13      R  running task        0     0      1 0x00000008
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682307]  ffff8807fb4abe70 0000000000000018 00000000ff553b00 0000032c4e87d60d
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682308]  ffffffff81f36140 ffff8807fb4ac000 ffffe8ffffd40a00 ffffffff81eb1da0
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682309]  ffff8807fb4a8000 ffff8807fb4abeb8 ffffffff816c4137 ffff8807fb4abed0
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682311] Call Trace:
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682317]  [<ffffffff816c4137>] ? cpuidle_enter+0x17/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682320]  [<ffffffff810c3fe2>] ? call_cpuidle+0x32/0x60
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682322]  [<ffffffff816c4113>] ? cpuidle_select+0x13/0x20
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682323]  [<ffffffff810c42a0>] ? cpu_startup_entry+0x290/0x350
Sep  1 11:09:55 majestic-daemon kernel: [ 3550.682326]  [<ffffffff810516e4>] ? start_secondary+0x154/0x190

The internet is full of suggestions for fixing these, including installing drivers, uninstalling drivers, changing kernel settings, changing BIOS settings, and plenty of other voodoo. I have yet to see any explanation of how a specific remedy was chosen in any particular case.

How should I begin to debug a "hard LOCKUP" like this one? What does the message output mean, and how should I act on the information in it to work toward a fix?
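
As a first step in reading output like the above, it may help to reduce each report to the CPU involved, the function named on the RIP line, and the call-trace symbols, so that repeated lockups can be compared. The following is only a minimal sketch of that idea: the function name `summarize_lockups` and the regular expressions are assumptions based on the log layout pasted in this question, not part of any kernel tool, and it does not diagnose or fix anything.

```python
import re

# Patterns are assumptions derived from the 4.4-era log layout pasted above.
LOCKUP_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
# Matches "[<address>] symbol+0xoff/0xlen", with an optional "? " marker
# that the kernel prints before frames it considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\? )?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockups(lines):
    """Return one dict per hard-LOCKUP report found in `lines`.

    Each dict holds the CPU number, the symbol on the RIP line (or None),
    and the call-trace symbols in the order they were logged.
    """
    reports = []
    current = None      # report currently being filled in
    in_trace = False    # True while reading frames after "Call Trace:"
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            current = {"cpu": int(match.group(1)), "rip": None, "trace": []}
            reports.append(current)
            in_trace = False
            continue
        if current is None:
            continue  # not inside a lockup report (e.g. an RCU stall dump)
        if in_trace:
            match = FRAME_RE.search(line)
            if match:
                current["trace"].append(match.group(2))
            else:
                # First non-frame line ends the trace and the report.
                current, in_trace = None, False
        elif "Call Trace:" in line:
            in_trace = True
        elif "RIP:" in line:
            match = FRAME_RE.search(line)
            if match:
                current["rip"] = match.group(2)
    return reports


# A few lines copied from the log above, with the syslog prefix shortened.
SAMPLE = [
    "kernel: [ 3506.843824] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13",
    "kernel: [ 3506.843885] RIP: 0010:[<ffffffff816c3f61>]  [<ffffffff816c3f61>] cpuidle_enter_state+0x111/0x2b0",
    "kernel: [ 3506.843903] Call Trace:",
    "kernel: [ 3506.843905]  [<ffffffff816c4137>] cpuidle_enter+0x17/0x20",
    "kernel: [ 3506.843908]  [<ffffffff810c3fe2>] call_cpuidle+0x32/0x60",
    "kernel: [ 3506.843910]  [<ffffffff816c4113>] ? cpuidle_select+0x13/0x20",
    "kernel: [ 3506.843915] Code: 48 41 89 c4 e8 01 1a a3 ff",
]

for report in summarize_lockups(SAMPLE):
    print(report["cpu"], report["rip"], " <- ".join(report["trace"]))
```

To try it on a real log, the lines could be read with something like `open("/var/log/kern.log", errors="replace")` and passed to the function. Truncated reports, which the question mentions, may come back with an empty trace or a missing RIP symbol.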

    
by Mark Amery 22.09.2016 / 12:28

0 answers
