BUG: soft lockup - CPU#0 stuck

I am running some SD-WAN applications, such as SDN switching and routing, on a GUEST node (an Ubuntu Linux 14.04 VM) hosted on a Compute node in an OpenStack environment.

Host node details:

root@host-node:/var/log/nova# uname -r
4.4.0-71-generic

root@host-node:/var/log/nova# dpkg -l | egrep -i 'qemu|kvm|libvirt'
ii ipxe-qemu 1.0.0+git-20150424.a25a16d-1ubuntu1 all PXE boot firmware - ROM images for qemu
ii libvirt-bin 1.3.1-1ubuntu10.15 amd64 programs for the libvirt library
ii libvirt0:amd64 1.3.1-1ubuntu10.15 amd64 library for interfacing with different virtualization systems
ii python-libvirt 1.3.1-1ubuntu1 amd64 libvirt Python bindings
ii qemu 1:2.5+dfsg-5ubuntu10.14 amd64 fast processor emulator
ii qemu-block-extra:amd64 1:2.5+dfsg-5ubuntu10.16 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-slof 20151103+dfsg-1ubuntu1 all Slimline Open Firmware – QEMU PowerPC version
ii qemu-system 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries
ii qemu-system-arm 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-mips 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (mips)
ii qemu-system-misc 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (miscelaneous)
ii qemu-system-ppc 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (ppc)
ii qemu-system-sparc 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (sparc)
ii qemu-system-x86 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU full system emulation binaries (x86)
ii qemu-user 1:2.5+dfsg-5ubuntu10.14 amd64 QEMU user mode emulation binaries
ii qemu-utils 1:2.5+dfsg-5ubuntu10.16 amd64 QEMU utilities
CPU
processor : 47
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
stepping : 2
microcode : 0x38
cpu MHz : 2902.046
cache size : 30720 KB
physical id : 1
siblings : 24
core id : 13
cpu cores : 12
apicid : 59
initial apicid : 59
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
bugs :
bogomips : 5194.87
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

root@host-node:/var/log/nova# free -m
total used free shared buff/cache available
Mem: 773931 186967 490462 4070 96502 576184
Swap: 0 0 0

Suddenly, the GUEST node (where the SDN application is running) becomes unreachable by any means (no ping, no SSH). Even the console in the OpenStack Horizon Dashboard is frozen.

The only remedy is to hard reboot the GUEST node; after that it comes back up and runs without problems.

The console logs are posted below.

[1263400.052002] BUG: soft lockup - CPU#0 stuck for 22s! [XXXX:1722]
[1263436.850392] BUG: soft lockup - CPU#12 stuck for 31s! [python:2059]
[1263436.855480] BUG: soft lockup - CPU#1 stuck for 57s! [sleep:18861]
[1263436.852476] BUG: soft lockup - CPU#8 stuck for 38s! [monit:1864]
[1263436.850131] [sched_delayed] sched: RT throttling activated
[1263436.855565] Modules linked in:
[1263436.850392] Modules linked in:
[1263436.855924] 
[1263436.855924] CPU: 1 PID: 18861 Comm: sleep Tainted: G           OE 3.16.0-77-generic #99~14.04.1-Ubuntu
[1263436.852476] CPU: 8 PID: 1864 Comm: monit Tainted: G           OE 3.16.0-77-generic #99~14.04.1-Ubuntu
[1263436.850392] CPU: 12 PID: 2059 Comm: python Tainted: G           OE 3.16.0-77-generic #99~14.04.1-Ubuntu
[1263436.855924] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[1263436.852476] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[1263436.850392] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[1263436.855924] task: ffff88055f497010 ti: ffff88055bc68000 task.ti: ffff88055bc68000
[1263436.852476] task: ffff880812045180 ti: ffff8800bb840000 task.ti: ffff8800bb840000
[1263436.850392] task: ffff8800bb281460 ti: ffff880811b94000 task.ti: ffff880811b94000
[1263436.852476] RIP: 0010:[<ffffffff81776676>] 
[1263436.850392] RIP: 0010:[<ffffffff811b9275>] 
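
Because the console output above interleaves messages from several CPUs, a small parser can help summarize which CPUs reported a soft lockup and for how long. The following is only a sketch: the function names `parse_soft_lockups` and `longest_stall_per_cpu` are hypothetical, and the regular expression assumes the exact message layout shown in the log above.

```python
import re

# Matches lines such as:
# [1263436.850392] BUG: soft lockup - CPU#12 stuck for 31s! [python:2059]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft-lockup line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "stuck_seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def longest_stall_per_cpu(events):
    """Map each CPU number to the longest stall reported for it."""
    worst = {}
    for event in events:
        cpu = event["cpu"]
        worst[cpu] = max(worst.get(cpu, 0), event["stuck_seconds"])
    return worst


# The four soft-lockup lines copied from the console log above.
SAMPLE = """\
[1263400.052002] BUG: soft lockup - CPU#0 stuck for 22s! [XXXX:1722]
[1263436.850392] BUG: soft lockup - CPU#12 stuck for 31s! [python:2059]
[1263436.855480] BUG: soft lockup - CPU#1 stuck for 57s! [sleep:18861]
[1263436.852476] BUG: soft lockup - CPU#8 stuck for 38s! [monit:1864]
"""

if __name__ == "__main__":
    lockups = parse_soft_lockups(SAMPLE)
    for cpu, secs in sorted(longest_stall_per_cpu(lockups).items()):
        print(f"CPU#{cpu}: stuck for up to {secs}s")
```

In this log the stalls hit unrelated tasks (sleep, monit, python) on different CPUs within the same second, which is what the summary makes easy to see.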

Guest node details:

 uname -a
Linux Guest-node 3.16.0-77-generic #99~14.04.1-Ubuntu SMP Tue Jun 28 19:17:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

 uname -r
3.16.0-77-generic

Note: When the GUEST node is frozen, the HOST Compute node in OpenStack shows close to 100% CPU usage on all 16 cores used by the guest (the 16 running qemu-system-x86 threads in the top output below).

Threads:  19 total,  16 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.8 us,  6.5 sy,  0.0 ni, 89.5 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 79250592+total, 55210470+free, 18854126+used, 51860008 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 59347756+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                       
20666 libvirt+  20   0 37.586g 0.013t  26076 R 99.9  1.8  20126:42 qemu-system-x86                                                                                                               
20654 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20123:30 qemu-system-x86                                                                                                               
20655 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:04 qemu-system-x86                                                                                                               
20656 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:04 qemu-system-x86                                                                                                               
20657 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:00 qemu-system-x86                                                                                                               
20658 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20126:57 qemu-system-x86                                                                                                               
20659 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20126:54 qemu-system-x86                                                                                                               
20660 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20126:58 qemu-system-x86                                                                                                               
20661 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20126:54 qemu-system-x86                                                                                                               
20662 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:37 qemu-system-x86                                                                                                               
20663 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20128:59 qemu-system-x86                                                                                                               
20664 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20128:48 qemu-system-x86                                                                                                               
20665 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:01 qemu-system-x86                                                                                                               
20667 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20128:41 qemu-system-x86                                                                                                               
20668 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20127:04 qemu-system-x86                                                                                                               
20669 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20006:54 qemu-system-x86                                                                                                               
20597 libvirt+  20   0 37.586g 0.013t  26076 S  0.0  1.8   4:38.56 qemu-system-x86                                                                                                               
20647 libvirt+  20   0 37.586g 0.013t  26076 S  0.0  1.8   0:00.03 qemu-system-x86                                                                                                               
20671 libvirt+  20   0 37.586g 0.013t  26076 S  0.0  1.8   0:01.83 qemu-system-x86    
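
To check this condition without reading the thread list by eye, the per-thread top output can be filtered for running qemu threads above a CPU threshold. This is a minimal sketch under stated assumptions: the function name `busy_threads` is hypothetical, the column order is assumed to be the default top layout shown above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), and the numbers are assumed to use a dot as the decimal separator.

```python
def busy_threads(top_text, command="qemu-system-x86", min_cpu=90.0):
    """Return (pid, cpu_percent) for running threads of `command` at or above min_cpu."""
    busy = []
    for line in top_text.splitlines():
        fields = line.split()
        # Skip headers, summary lines, and anything that is not a thread row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        state, cpu, cmd = fields[7], float(fields[8]), fields[11]
        if cmd == command and state == "R" and cpu >= min_cpu:
            busy.append((int(fields[0]), cpu))
    return busy


# Three rows copied from the top output above: two running, one sleeping.
SAMPLE_TOP = """\
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
20666 libvirt+  20   0 37.586g 0.013t  26076 R 99.9  1.8  20126:42 qemu-system-x86
20654 libvirt+  20   0 37.586g 0.013t  26076 R 93.8  1.8  20123:30 qemu-system-x86
20597 libvirt+  20   0 37.586g 0.013t  26076 S  0.0  1.8   4:38.56 qemu-system-x86
"""

if __name__ == "__main__":
    for pid, cpu in busy_threads(SAMPLE_TOP):
        print(f"thread {pid}: {cpu}% CPU")
```

Run against the full listing above, a count of 16 busy threads would match the 16 threads reported as running in the top summary line.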

Note:

Kernel version of the HOST Compute node:

root@host-node:/var/log/nova# uname -r
4.4.0-71-generic

So it is not clear whether the problem above is linked to the Linux kernel version.

Workaround 1: The link below describes a problem close to the one reported here, although applying it in a virtual environment such as an OpenStack cloud infrastructure may be problematic.

link

Note: we implemented the workaround above and the problem persists.

Workaround 2: Based on further analysis, the Ubuntu kernel version below has known soft lockup issues.

root@host-node:/var/log/nova# uname -r
4.4.0-71-generic

I would appreciate it if anyone could help with any information on the problem reported above. Thank you.

by Rajeshkumar 28.12.2017 / 17:16

0 answers