System unresponsive after soft lockup

1

We are seeing frequent soft lockups on Ubuntu 12.04 (kernel: 3.8.0-29-generic), and the system becomes unresponsive afterwards. Here are the kern.log messages from just before the soft lockups occurred. Any help would be much appreciated.

Mar 29 00:12:01 HOST9016 kernel: [387780.959368] BUG: soft lockup - CPU#60 stuck for 23s! [java:113233]
Mar 29 00:12:01 HOST9016 kernel: [387781.007045] BUG: soft lockup - CPU#63 stuck for 23s! [java:113220]
Mar 29 00:12:01 HOST9016 kernel: [387781.007516] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.007520] CPU 63 
Mar 29 00:12:01 HOST9016 kernel: [387781.007521] Pid: 113220, comm: java Tainted: GF            3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.007530] RIP: 0010:[<ffffffff811674a5>]  [<ffffffff811674a5>] change_pte_range+0x205/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.007532] RSP: 0018:ffff883dbc9ffca8  EFLAGS: 00000286
Mar 29 00:12:01 HOST9016 kernel: [387781.007533] RAX: ffffea00f1431600 RBX: ffff883dbc8d4958 RCX: 0600000000080068
Mar 29 00:12:01 HOST9016 kernel: [387781.007960] RDX: 0000000000000000 RSI: 00007f2769b6e000 RDI: 8000003c50c58166
Mar 29 00:12:01 HOST9016 kernel: [387781.007961] RBP: ffff883dbc9ffd48 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007961] R10: 0000000000000004 R11: 0000000000000202 R12: 0000000000000004
Mar 29 00:12:01 HOST9016 kernel: [387781.007962] R13: 0000000000000202 R14: ffffffff81ce6fa0 R15: ffff883dbc9ffc98
Mar 29 00:12:01 HOST9016 kernel: [387781.007964] FS:  00007f22b1059700(0000) GS:ffff881fffa40000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.007966] CR2: 00007f47ab783028 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.007967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.007968] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.007969] Process java (pid: 113220, threadinfo ffff883dbc9fe000, task ffff883dbc9045c0)
Mar 29 00:12:01 HOST9016 kernel: [387781.007970] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.007985]  ffff883dbc9ffd38 ffff881fd06ba940 ffff881fd06ba680 000000007a400000
Mar 29 00:12:01 HOST9016 kernel: [387781.008435]  00007f2689600000 0000000000000001 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.008445]  ffffea017f48e570 8000000000000025 8000003c50c58166 00007f2769c00000
Mar 29 00:12:01 HOST9016 kernel: [387781.008445] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.008452]  [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.008875]  [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.008881]  [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.008889]  [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.008895]  [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.009311]  [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.009318]  [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.009738] Code: 0f 84 73 ff ff ff e9 69 ff ff ff 0f 1f 00 48 8b 7d 90 4c 89 f2 4c 89 ee e8 89 54 ff ff 31 d2 48 85 c0 0f 84 34 ff ff ff 48 8b 08 <48> c1 e9 3a 83 bd 7c ff ff ff ff 74 7e 39 8d 7c ff ff ff 0f b6 
Mar 29 00:12:01 HOST9016 kernel: [387781.098867] BUG: soft lockup - CPU#69 stuck for 23s! [java:113232]
Mar 29 00:12:01 HOST9016 kernel: [387781.148120] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.150284] CPU 69 
Mar 29 00:12:01 HOST9016 kernel: [387781.150288] Pid: 113232, comm: java Tainted: GF            3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.150701] RIP: 0010:[<ffffffff811674a5>]  [<ffffffff811674a5>] change_pte_range+0x205/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.150706] RSP: 0018:ffff887fcba19ca8  EFLAGS: 00000286
Mar 29 00:12:01 HOST9016 kernel: [387781.151137] RAX: ffffea00f71aee00 RBX: ffff883dbc8d4958 RCX: 0600000000080078
Mar 29 00:12:01 HOST9016 kernel: [387781.151139] RDX: 0000000000000000 RSI: 00007f2a3c820000 RDI: 8000003dc6bb8166
Mar 29 00:12:01 HOST9016 kernel: [387781.151141] RBP: ffff887fcba19d48 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151143] R10: 0000000000000004 R11: 0000000000000293 R12: 0000000000000004
Mar 29 00:12:01 HOST9016 kernel: [387781.151145] R13: 0000000000000293 R14: ffffffff81ce6fa0 R15: ffff887fcba19c98
Mar 29 00:12:01 HOST9016 kernel: [387781.151148] FS:  00007f22829a7700(0000) GS:ffff881fffb00000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.151153] CR2: 00007f60e5451720 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.151154] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.151156] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.151599] Process java (pid: 113232, threadinfo ffff887fcba18000, task ffff887cfb0345c0)
Mar 29 00:12:01 HOST9016 kernel: [387781.151600] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.151602]  ffff887fcba19d38 ffff881fd06ba940 0000000000000293 0000000200000004
Mar 29 00:12:01 HOST9016 kernel: [387781.152476]  0000000000000000 ffff883dbc8d4958 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.152895]  ffffea01723ae170 8000000000000025 8000003dc6bb8166 00007f2a3ca00000
Mar 29 00:12:01 HOST9016 kernel: [387781.153738] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.154157]  [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.154575]  [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.154992]  [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.155001]  [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.155009]  [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.155015]  [<ffffffff816f254b>] ? __schedule+0x3bb/0x6b0
Mar 29 00:12:01 HOST9016 kernel: [387781.155021]  [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.155448]  [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.155450] Code: 0f 84 73 ff ff ff e9 69 ff ff ff 0f 1f 00 48 8b 7d 90 4c 89 f2 4c 89 ee e8 89 54 ff ff 31 d2 48 85 c0 0f 84 34 ff ff ff 48 8b 08 <48> c1 e9 3a 83 bd 7c ff ff ff ff 74 7e 39 8d 7c ff ff ff 0f b6 
Mar 29 00:12:01 HOST9016 kernel: [387781.262831] BUG: soft lockup - CPU#79 stuck for 22s! [java:113234]
Mar 29 00:12:01 HOST9016 kernel: [387781.314646] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
Mar 29 00:12:01 HOST9016 kernel: [387781.319281] CPU 79 
Mar 29 00:12:01 HOST9016 kernel: [387781.319285] Pid: 113234, comm: java Tainted: GF            3.8.0-29-generic #42~precise1-Ubuntu HP ProLiant DL580 Gen8
Mar 29 00:12:01 HOST9016 kernel: [387781.319288] RIP: 0010:[<ffffffff8115c93f>]  [<ffffffff8115c93f>] vm_normal_page+0x1f/0x80
Mar 29 00:12:01 HOST9016 kernel: [387781.320152] RSP: 0000:ffff887d8ede7c88  EFLAGS: 00000a06
Mar 29 00:12:01 HOST9016 kernel: [387781.320568] RAX: 0070bea105980000 RBX: ffff881fd06ba940 RCX: 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.320570] RDX: 8000001c2fa84166 RSI: 00007f2b98da6000 RDI: 8000001c2fa84166
Mar 29 00:12:01 HOST9016 kernel: [387781.320572] RBP: ffff887d8ede7c98 R08: ffff883dbc8d4958 R09: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.320573] R10: 0000000000000004 R11: 0000000000000202 R12: 000000000000004f
Mar 29 00:12:01 HOST9016 kernel: [387781.320998] R13: ffffffff8104e810 R14: 000000000000003c R15: 0000004fd2942458
Mar 29 00:12:01 HOST9016 kernel: [387781.321001] FS:  00007f22827a5700(0000) GS:ffff883fffa60000(0000) knlGS:0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.321002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 29 00:12:01 HOST9016 kernel: [387781.321004] CR2: 00007f47b2eb3000 CR3: 0000001d8f9a3000 CR4: 00000000001407e0
Mar 29 00:12:01 HOST9016 kernel: [387781.321419] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 29 00:12:01 HOST9016 kernel: [387781.321421] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 29 00:12:01 HOST9016 kernel: [387781.321819] Process java (pid: 113234, threadinfo ffff887d8ede6000, task ffff887df5799740)
Mar 29 00:12:01 HOST9016 kernel: [387781.321820] Stack:
Mar 29 00:12:01 HOST9016 kernel: [387781.321821]  ffff887d8ede7c98 8000001c2fa84166 ffff887d8ede7d48 ffffffff81167497
Mar 29 00:12:01 HOST9016 kernel: [387781.322653]  ffff887d8ede7d38 ffff881fd06ba940 0000000000000202 0000000200000004
Mar 29 00:12:01 HOST9016 kernel: [387781.323512]  ffff887d8ede7e00 ffff883dbc8d4958 ffff883dbc8d4958 0000000000000001
Mar 29 00:12:01 HOST9016 kernel: [387781.324377] Call Trace:
Mar 29 00:12:01 HOST9016 kernel: [387781.324796]  [<ffffffff81167497>] change_pte_range+0x1f7/0x2d0
Mar 29 00:12:01 HOST9016 kernel: [387781.324802]  [<ffffffff811677ea>] change_protection_range+0x27a/0x410
Mar 29 00:12:01 HOST9016 kernel: [387781.325225]  [<ffffffff811679f5>] change_protection+0x75/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.325672]  [<ffffffff8117baeb>] change_prot_numa+0x1b/0x30
Mar 29 00:12:01 HOST9016 kernel: [387781.326888]  [<ffffffff8109544a>] task_numa_work+0x24a/0x320
Mar 29 00:12:01 HOST9016 kernel: [387781.326900]  [<ffffffff8107bdc8>] task_work_run+0xc8/0xf0
Mar 29 00:12:01 HOST9016 kernel: [387781.326912]  [<ffffffff81014d9a>] do_notify_resume+0xaa/0xc0
Mar 29 00:12:01 HOST9016 kernel: [387781.327756]  [<ffffffff816fcb9a>] int_signal+0x12/0x17
Mar 29 00:12:01 HOST9016 kernel: [387781.327758] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f8 48 89 d7 48 89 e5 53 48 83 ec 08 48 89 f8 0f 1f 40 00 48 c1 e0 12 <48> c1 e8 1e f6 c6 02 75 27 48 39 05 19 cd b8 00 72 3f 48 89 c3 
Mar 29 06:24:22 HOST9016 kernel: [410090.031877] BUG: soft lockup - CPU#103 stuck for 23s! [java:113233]
Mar 29 06:24:22 HOST9016 kernel: [410090.086169] Modules linked in: nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ip6table_filter(F) ip6_tables(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_LOG(F) xt_tcpudp(F) xt_conntrack(F) xt_hashlimit(F) iptable_filter(F) ip_tables(F) x_tables(F) vesafb(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) joydev(F) hid_generic(F) gpio_ich(F) microcode(F) psmouse(F) serio_raw(F) usbhid(F) hid(F) hpwdt(F) hpilo(F) lpc_ich(F) ioatdma(F) dca(F) wmi(F) bnep(F) rfcomm(F) bluetooth(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) acpi_power_meter(F) lockd(F) mac_hid(F) sunrpc(F) nf_conntrack_ftp(F) nf_conntrack(F) lp(F) parport(F) tg3(F) ptp(F) pps_core(F) hpsa(F)
    
by Compete2Cooperate 04.04.2016 / 13:59

2 answers

0

I don't think there is enough information in this question to answer it completely, but looking at the logs, I see that the error is caused by a java application:

Mar 29 06:24:22 HOST9016 kernel: [410090.031877] BUG: soft lockup - CPU#103 stuck for 23s! [java:113233]

So I think the next step would be to take a look at that specific application to see what it is doing.
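To see which processes and kernel functions the lockup reports point at, the kern.log lines can be summarized with a short script. This is only a rough sketch: the regular expressions are based on the message format shown in the question, the function name `summarize_lockups` is made up for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   BUG: soft lockup - CPU#60 stuck for 23s! [java:113233]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches the symbol at the end of a RIP line, e.g.
#   RIP: 0010:[<...>]  [<...>] change_pte_range+0x205/0x2d0
RIP_RE = re.compile(r"RIP: .*\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_lockups(lines):
    """Return (list of lockup records, Counter of RIP symbols)."""
    lockups = []
    rip_symbols = Counter()
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            lockups.append({
                "cpu": int(m.group("cpu")),
                "seconds": int(m.group("secs")),
                "comm": m.group("comm"),
                "pid": int(m.group("pid")),
            })
            continue
        m = RIP_RE.search(line)
        if m:
            rip_symbols[m.group("symbol")] += 1
    return lockups, rip_symbols


# A few lines copied from the log in the question.
sample = [
    "Mar 29 00:12:01 HOST9016 kernel: [387781.007045] BUG: soft lockup - CPU#63 stuck for 23s! [java:113220]",
    "Mar 29 00:12:01 HOST9016 kernel: [387781.007530] RIP: 0010:[<ffffffff811674a5>]  [<ffffffff811674a5>] change_pte_range+0x205/0x2d0",
    "Mar 29 00:12:01 HOST9016 kernel: [387781.262831] BUG: soft lockup - CPU#79 stuck for 22s! [java:113234]",
    "Mar 29 00:12:01 HOST9016 kernel: [387781.319288] RIP: 0010:[<ffffffff8115c93f>]  [<ffffffff8115c93f>] vm_normal_page+0x1f/0x80",
]

lockups, rip_symbols = summarize_lockups(sample)
for record in lockups:
    print(record)
print(rip_symbols.most_common())
```

To run it over the real file, pass the lines of kern.log instead of `sample`, for example `summarize_lockups(open("/var/log/kern.log"))`. Reading that file usually needs elevated permissions.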

    
by 04.04.2016 / 14:45
0

I also don't think there is enough information. The Java application's logs would have been more useful; the stack trace shows memory-management calls. Possibly a memory leak? Java has explicit settings for maximum memory (for example `-Xmx`), so perhaps check what they are set to.
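One way to check which heap limits a running JVM was started with is to look at its command line for the standard `-Xms` and `-Xmx` options. The sketch below assumes a Linux system where `/proc/<pid>/cmdline` is readable; the helper names `heap_flags` and `read_cmdline` are made up for illustration, and only the plain `-Xms`/`-Xmx` forms with an optional k, m or g suffix are handled.

```python
import re

# -Xms / -Xmx followed by a number and an optional k, m or g suffix.
HEAP_FLAG_RE = re.compile(r"^-(Xmx|Xms)(\d+)([kKmMgG]?)$")
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_flags(argv):
    """Return the heap flags found in argv as {'Xms': bytes, 'Xmx': bytes}."""
    found = {}
    for arg in argv:
        m = HEAP_FLAG_RE.match(arg)
        if m:
            name, number, unit = m.groups()
            found[name] = int(number) * _UNITS[unit.lower()]
    return found


def read_cmdline(pid):
    """Read a process's argument list from /proc (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    # Arguments are separated by NUL bytes.
    return [a.decode(errors="replace") for a in raw.split(b"\0") if a]


# Hypothetical command line, used here only to show the parsing.
example_argv = ["java", "-Xms512m", "-Xmx4g", "-jar", "app.jar"]
print(heap_flags(example_argv))
```

For a live process, `heap_flags(read_cmdline(pid))` does the same with that process's real arguments. An empty result means no explicit limit was passed on the command line, in which case the JVM picks its own default.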

    
by 04.04.2016 / 14:59