Server stops responding


Suddenly, for no apparent reason, my server stopped responding. This is what I found in /var/log/messages. What could it be?

    Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
    Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:56 __list_del_entry+0x82/0xd0()
    Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
    Apr 29 13:40:47 stephan kernel: list_del corruption. next->prev should be ffff880c86f92000, but was ffff880c86f92800
    Apr 29 13:40:47 stephan kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables bonding ip6t_REJECT nf_conntrack_ipv6 nf_defr$
    Apr 29 13:40:47 stephan kernel: Pid: 66, comm: kswapd1 Not tainted 3.0.0+ #1
    Apr 29 13:40:47 stephan kernel: Call Trace:
    Apr 29 13:40:47 stephan kernel: <IRQ>  [<ffffffff81062b2f>] warn_slowpath_common+0x7f/0xc0
    Apr 29 13:40:47 stephan kernel: [<ffffffff8101b927>] ? intel_pmu_enable_all+0xa7/0x160
    Apr 29 13:40:47 stephan kernel: [<ffffffff81062c26>] warn_slowpath_fmt+0x46/0x50
    Apr 29 13:40:47 stephan kernel: [<ffffffff81268c72>] __list_del_entry+0x82/0xd0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81268cd1>] list_del+0x11/0x40
    Apr 29 13:40:47 stephan kernel: [<ffffffff8114ba5b>] free_block+0xcb/0x180
    Apr 29 13:40:47 stephan kernel: [<ffffffff8114b8e0>] kmem_cache_free+0x290/0x2b0
    Apr 29 13:40:47 stephan kernel: [<ffffffff811ba941>] proc_i_callback+0x31/0x40
    Apr 29 13:40:47 stephan kernel: [<ffffffff810ce6bc>] rcu_do_batch+0xdc/0x250
    Apr 29 13:40:47 stephan kernel: [<ffffffff810ce8e4>] __rcu_process_callbacks+0xb4/0x1d0
    Apr 29 13:40:47 stephan kernel: [<ffffffff810cea25>] rcu_process_callbacks+0x25/0x50
    Apr 29 13:40:47 stephan kernel: [<ffffffff81069847>] __do_softirq+0xb7/0x210
    Apr 29 13:40:47 stephan kernel: [<ffffffff810898c1>] ? hrtimer_interrupt+0x151/0x240
    Apr 29 13:40:47 stephan kernel: [<ffffffff8150317c>] call_softirq+0x1c/0x30
    Apr 29 13:40:47 stephan kernel: [<ffffffff8100d345>] do_softirq+0x65/0xa0
    Apr 29 13:40:47 stephan kernel: [<ffffffff8106964d>] irq_exit+0xbd/0xe0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81503abe>] smp_apic_timer_interrupt+0x6e/0x99
    Apr 29 13:40:47 stephan kernel: [<ffffffff81502933>] apic_timer_interrupt+0x13/0x20
    Apr 29 13:40:47 stephan kernel: <EOI>  [<ffffffffa03d9b08>] ? xfs_perag_get_tag+0x8/0xd0 [xfs]
    Apr 29 13:40:47 stephan kernel: [<ffffffffa03f3968>] xfs_reclaim_inode_shrink+0x58/0xb0 [xfs]
    Apr 29 13:40:47 stephan kernel: [<ffffffff81113191>] shrink_slab+0x81/0x1a0
    Apr 29 13:40:47 stephan kernel: [<ffffffff811162ee>] balance_pgdat+0x70e/0x8f0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81116696>] kswapd+0x1c6/0x210
    Apr 29 13:40:47 stephan kernel: [<ffffffff811164d0>] ? balance_pgdat+0x8f0/0x8f0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81084d16>] kthread+0x96/0xa0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81503084>] kernel_thread_helper+0x4/0x10
    Apr 29 13:40:47 stephan kernel: [<ffffffff81084c80>] ? kthread_worker_fn+0x1a0/0x1a0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81503080>] ? gs_change+0x13/0x13
    Apr 29 13:40:47 stephan kernel: ---[ end trace 40eb9c6ec15a76bf ]---
    Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
    Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0xa1/0xd0()
    Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
    Apr 29 13:40:47 stephan kernel: list_del corruption. prev->next should be ffff880c798a3000, but was 7f07e74200000000
    Apr 29 13:40:47 stephan kernel: ------------[ cut here ]------------
    Apr 29 13:40:47 stephan kernel: WARNING: at lib/list_debug.c:53 __list_del_entry+0xa1/0xd0()
    Apr 29 13:40:47 stephan kernel: Hardware name: S5520SC
    Apr 29 13:40:47 stephan kernel: list_del corruption. prev->next should be ffff880da7db9000, but was ffff880caa441000
    Apr 29 13:40:47 stephan kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter ip_tables bonding ip6t_REJECT nf_conntrack_ipv6 nf_defr$
    Apr 29 13:40:47 stephan kernel: Pid: 66, comm: kswapd1 Tainted: G        W   3.0.0+ #1
    Apr 29 13:40:47 stephan kernel: Call Trace:
    Apr 29 13:40:47 stephan kernel: <IRQ>  [<ffffffff81062b2f>] warn_slowpath_common+0x7f/0xc0
    Apr 29 13:40:47 stephan kernel: [<ffffffff8101b927>] ? intel_pmu_enable_all+0xa7/0x160
    Apr 29 13:40:47 stephan kernel: [<ffffffff81062c26>] warn_slowpath_fmt+0x46/0x50
    Apr 29 13:40:47 stephan kernel: [<ffffffff81268c91>] __list_del_entry+0xa1/0xd0
    Apr 29 13:40:47 stephan kernel: [<ffffffff81268cd1>] list_del+0x11/0x40
    Apr 29 13:40:47 stephan kernel: [<ffffffff8114ba5b>] free_block+0xcb/0x180
    Apr 29 13:40:47 stephan kernel: [<ffffffff8114b8e0>] kmem_cache_free+0x290/0x2b0
    Apr 29 13:40:47 stephan kernel: [<ffffffff811ba941>] proc_i_callback+0x31/0x40
    Apr 29 13:40:47 stephan kernel: [<ffffffff810ce6bc>] rcu_do_batch+0xdc/0x250
    Apr 29 13:40:47 stephan kernel: [<ffffffff810ce8e4>] __rcu_process_callbacks+0xb4/0x1d0
    Apr 29 13:40:47 stephan kernel: [<ffffffff810cea25>] rcu_process_callbacks+0x25/0x50
    Apr 29 13:40:47 stephan kernel: [<ffffffff81069847>] __do_softirq+0xb7/0x210

I'm running CentOS 6 64-bit on bare metal (not a VM), and the system ran without any problems for a year. Three months ago I upgraded the CPU to an X5680. I hope it's not the CPU, because it was very expensive.

    
by uuisklmp 29.04.2013 / 13:59

1 answer


Although we would need a lot more information (kernel version, how long the machine ran before this happened, hardware), I'd like to draw your attention to the "should be ffff880c86f92000, but was ffff880c86f92800" line: the two values differ only in bit 11, which changed from 0 to 1. If you don't have ECC memory, I suggest testing your RAM.

    Apr 29 13:40:47 stephan kernel: list_del corruption. next->prev should be ffff880c86f92000, but was ffff880c86f92800
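
To check the bit arithmetic yourself, here is a minimal Python sketch. It XORs the expected and actual pointers from a "list_del corruption" message and lists the bit positions that differ. The helper names `flipped_bits` and `analyze` and the regular expression are my own illustration, not part of any kernel tooling.

```python
import re

# Hypothetical pattern for the "should be X, but was Y" part of the message.
PATTERN = re.compile(r"should be ([0-9a-f]+), but was ([0-9a-f]+)")

def flipped_bits(expected, actual):
    """Return the 0-based positions of the bits that differ between two values."""
    diff = expected ^ actual  # XOR leaves a 1 only where the bits differ
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

def analyze(line):
    """Extract both pointers from a log line and list the differing bits.

    Returns None if the line does not contain the expected pattern.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    expected, actual = (int(group, 16) for group in match.groups())
    return flipped_bits(expected, actual)

line = ("list_del corruption. next->prev should be ffff880c86f92000, "
        "but was ffff880c86f92800")
print(analyze(line))  # [11]
```

A single differing bit is what you would expect from a flipped bit in RAM. The same function can be run against the other "list_del corruption" lines in the log to see how far apart the values are.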
    
by 29.04.2013 / 15:41