Why do all drives drop out of the RAID when one fails? LSI 9260-8i / IBM M5014


I have an interesting problem that I hope you can all help me with. I have an IBM M5014 (equivalent to the LSI 9260-8i) running two RAID10 virtual drives. The first is 4 WD RE4 drives, 2TB each, for a 4TB volume in total; let's call this VD1. The other is 4 WD RE4-GP drives, 2TB each, for another 4TB volume; let's call this VD0. In case it matters, the card runs in a Norco enclosure with 3 fans (1 on each bank of 4 drives + 1 on the Gigabyte MB), 16GB RAM, and the IBM card. There is also an IBM 5015 running 4 256GB SSDs in RAID10. I'm virtualized using ESXi 5.5 with a number of VMs. The 5014 card runs in passthrough mode to a WHS2011 host, while the 5015 holds the VMs themselves.

VD0 runs fine and shows no problems. It is my primary document store.

VD1, however, which holds all my videos, periodically drops one drive, putting the array in a degraded state, and almost instantly (usually with the same timestamp, occasionally a second later) drops the remaining drives as well, taking the virtual drive offline.

The controller itself has worked fine for almost 6 months, so while this could be controller-related, I would expect that to cause problems on both virtual drives rather than on only one of them.

The challenge is that the drives do not consistently drop in the same order (at least according to the log), so I can't tell which drive is causing the problem. I've included an excerpt of the log below. As you'll see, it drops the drives and then adds them back.
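One way to check the drop order is to group the "Device removed" events into bursts and list the Device Ids in sequence-number order. The following is only a minimal Python sketch, not an MSM feature: the function names `parse_entries` and `removal_incidents` are hypothetical, the 60-second grouping window is an arbitrary choice, and the month-day-year reading of the timestamps is an assumption. The sample data is a subset of the excerpt below (only the `ID = 248` entries from two bursts).

```python
import re
from datetime import datetime, timedelta

# Assumption: MSM writes month-day-year; swap %m and %d if yours differs.
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"

# One log entry = four "KEY = value" lines, possibly indented.
ENTRY_RE = re.compile(
    r"ID = (?P<id>\d+)\s*\n\s*SEQUENCE NUMBER = (?P<seq>\d+)\s*\n"
    r"\s*TIME = (?P<time>[^\n]+)\n\s*LOCALIZED MESSAGE = (?P<msg>[^\n]*)"
)
REMOVED_RE = re.compile(r"Device removed\s+Device Type:\s+Disk\s+Device Id:\s+(\d+)")

# Subset of the excerpt, newest entry first as in the MSM export.
SAMPLE_LOG = """\
ID = 248
SEQUENCE NUMBER = 382601
TIME = 07-07-2015 07:52:44
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9
ID = 248
SEQUENCE NUMBER = 382599
TIME = 07-07-2015 07:52:42
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13
ID = 248
SEQUENCE NUMBER = 382597
TIME = 07-07-2015 07:52:41
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8
ID = 248
SEQUENCE NUMBER = 382595
TIME = 07-07-2015 07:52:40
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14
ID = 248
SEQUENCE NUMBER = 382571
TIME = 07-07-2015 03:23:36
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9
ID = 248
SEQUENCE NUMBER = 382569
TIME = 07-07-2015 03:23:36
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14
ID = 248
SEQUENCE NUMBER = 382567
TIME = 07-07-2015 03:23:36
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8
ID = 248
SEQUENCE NUMBER = 382565
TIME = 07-07-2015 03:23:36
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13
"""


def parse_entries(text):
    """Return log entries as dicts, oldest first (sorted by sequence number)."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        entries.append({
            "seq": int(m["seq"]),
            "time": datetime.strptime(m["time"].strip(), TIME_FORMAT),
            "msg": m["msg"].strip(),
        })
    return sorted(entries, key=lambda e: e["seq"])


def removal_incidents(entries, window=timedelta(seconds=60)):
    """Group 'Device removed' events into bursts; each burst lists Device Ids in drop order."""
    incidents = []
    last_time = None
    for entry in entries:
        match = REMOVED_RE.search(entry["msg"])
        if not match:
            continue
        # A gap longer than `window` since the previous removal starts a new burst.
        if last_time is None or entry["time"] - last_time > window:
            incidents.append([])
        incidents[-1].append(int(match.group(1)))
        last_time = entry["time"]
    return incidents


if __name__ == "__main__":
    for order in removal_incidents(parse_entries(SAMPLE_LOG)):
        print("drop order:", order)
```

Sorting by sequence number rather than by timestamp matters here, because several removals share the same second and only the sequence number preserves the order in which the controller logged them.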

Any advice on how to troubleshoot which drive it is would be very welcome. I can't believe they all went bad together, nor can I believe how little information the MSM log itself contains.

Thanks in advance, everyone!

Doug

    ID = 248
    SEQUENCE NUMBER = 382617
    TIME = 07-07-2015 08:14:46
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382616
    TIME = 07-07-2015 08:14:46
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382615
    TIME = 07-07-2015 08:14:45
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382614
    TIME = 07-07-2015 08:14:45
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382613
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382612
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382611
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382610
    TIME = 07-07-2015 08:14:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382609
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382608
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382607
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382606
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 247
    SEQUENCE NUMBER = 382605
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382604
    TIME = 07-07-2015 07:53:09
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 247
    SEQUENCE NUMBER = 382603
    TIME = 07-07-2015 07:53:04
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   13

    ID = 91
    SEQUENCE NUMBER = 382602
    TIME = 07-07-2015 07:53:04
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382601
    TIME = 07-07-2015 07:52:44
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382600
    TIME = 07-07-2015 07:52:44
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382599
    TIME = 07-07-2015 07:52:42
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382598
    TIME = 07-07-2015 07:52:42
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 248
    SEQUENCE NUMBER = 382597
    TIME = 07-07-2015 07:52:41
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382596
    TIME = 07-07-2015 07:52:41
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382595
    TIME = 07-07-2015 07:52:40
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382594
    TIME = 07-07-2015 07:52:40
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 145
    SEQUENCE NUMBER = 382593
    TIME = 07-07-2015 07:10:59
    LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is high

    ID = 149
    SEQUENCE NUMBER = 382592
    TIME = 07-07-2015 06:56:54
    LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is normal

    ID = 247
    SEQUENCE NUMBER = 382591
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382590
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382589
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382588
    TIME = 07-07-2015 04:08:56
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 247
    SEQUENCE NUMBER = 382587
    TIME = 07-07-2015 04:08:55
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382586
    TIME = 07-07-2015 04:08:55
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382585
    TIME = 07-07-2015 04:08:49
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382584
    TIME = 07-07-2015 04:08:49
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382583
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382582
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382581
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382580
    TIME = 07-07-2015 04:08:47
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382579
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   14

    ID = 91
    SEQUENCE NUMBER = 382578
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:2

    ID = 247
    SEQUENCE NUMBER = 382577
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   13

    ID = 91
    SEQUENCE NUMBER = 382576
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:3

    ID = 247
    SEQUENCE NUMBER = 382575
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   8

    ID = 91
    SEQUENCE NUMBER = 382574
    TIME = 07-07-2015 03:24:32
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:1

    ID = 247
    SEQUENCE NUMBER = 382573
    TIME = 07-07-2015 03:24:27
    LOCALIZED MESSAGE = Controller ID:  0  Device inserted   Device Type:       Disk  Device Id:   9

    ID = 91
    SEQUENCE NUMBER = 382572
    TIME = 07-07-2015 03:24:27
    LOCALIZED MESSAGE = Controller ID:  0   PD inserted:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382571
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382570
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 248
    SEQUENCE NUMBER = 382569
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382568
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 248
    SEQUENCE NUMBER = 382567
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382566
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 248
    SEQUENCE NUMBER = 382565
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382564
    TIME = 07-07-2015 03:23:36
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3

    ID = 139
    SEQUENCE NUMBER = 382435
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   Deleted VD:       1

    ID = 114
    SEQUENCE NUMBER = 382434
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:0  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382433
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:2  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382432
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:1  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382431
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:3  Previous   =   Failed      Current   =   Unconfigured Bad

    ID = 114
    SEQUENCE NUMBER = 382430
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:0  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382429
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   9

    ID = 112
    SEQUENCE NUMBER = 382428
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:0

    ID = 252
    SEQUENCE NUMBER = 382427
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  VD is now OFFLINE   VD       1

    ID = 81
    SEQUENCE NUMBER = 382426
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change on VD:   1      Previous   =   Degraded  Current   =       Offline

    ID = 114
    SEQUENCE NUMBER = 382425
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:2  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382424
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14

    ID = 112
    SEQUENCE NUMBER = 382423
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:2

    ID = 114
    SEQUENCE NUMBER = 382422
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:1  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382421
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   8

    ID = 112
    SEQUENCE NUMBER = 382420
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:1

    ID = 251
    SEQUENCE NUMBER = 382419
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  VD is now DEGRADED   VD       1

    ID = 81
    SEQUENCE NUMBER = 382418
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change on VD:   1      Previous   =   Optimal  Current   =       Degraded

    ID = 114
    SEQUENCE NUMBER = 382417
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   State change:   PD       =   -:-:3  Previous   =   Online      Current   =   Failed

    ID = 248
    SEQUENCE NUMBER = 382416
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   13

    ID = 112
    SEQUENCE NUMBER = 382415
    TIME = 04-07-2015 08:27:32
    LOCALIZED MESSAGE = Controller ID:  0   PD removed:       -:-:3
    
by DouglasABaker 10.07.2015 / 19:37

1 answer


Sorry, we haven't experienced the same issue, but we run LSI controllers and have had to update the firmware before. Please check that you have the latest firmware for the device.
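Alongside the firmware check, it may help to see what else the controller logged shortly before each drop, since the insert/remove entries themselves say little. The Python sketch below is only an illustration under stated assumptions: `parse_entries` and `events_before_drops` are hypothetical helper names, the 60-minute lookback and 60-second burst gap are arbitrary, and the timestamps are assumed to be month-day-year. The sample data is copied from the log excerpt in the question.

```python
import re
from datetime import datetime, timedelta

# Assumption: MSM writes month-day-year; swap %m and %d if yours differs.
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"

ENTRY_RE = re.compile(
    r"ID = (?P<id>\d+)\s*\n\s*SEQUENCE NUMBER = (?P<seq>\d+)\s*\n"
    r"\s*TIME = (?P<time>[^\n]+)\n\s*LOCALIZED MESSAGE = (?P<msg>[^\n]*)"
)
PREFIX_RE = re.compile(r"^Controller ID:\s+\d+\s+")
HOTPLUG_RE = re.compile(r"(Device|PD) (removed|inserted)")

# Four entries taken from the question's excerpt, newest first.
SAMPLE_LOG = """\
ID = 248
SEQUENCE NUMBER = 382595
TIME = 07-07-2015 07:52:40
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14
ID = 145
SEQUENCE NUMBER = 382593
TIME = 07-07-2015 07:10:59
LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is high
ID = 149
SEQUENCE NUMBER = 382592
TIME = 07-07-2015 06:56:54
LOCALIZED MESSAGE = Controller ID:  0   Battery temperature is normal
ID = 248
SEQUENCE NUMBER = 382581
TIME = 07-07-2015 04:08:47
LOCALIZED MESSAGE = Controller ID:  0  Device removed   Device Type:       Disk  Device Id:   14
"""


def parse_entries(text):
    """Return log entries as dicts, oldest first, with the controller prefix stripped."""
    entries = []
    for m in ENTRY_RE.finditer(text):
        entries.append({
            "seq": int(m["seq"]),
            "time": datetime.strptime(m["time"].strip(), TIME_FORMAT),
            "msg": PREFIX_RE.sub("", m["msg"].strip()),
        })
    return sorted(entries, key=lambda e: e["seq"])


def events_before_drops(entries, lookback=timedelta(minutes=60),
                        burst_gap=timedelta(seconds=60)):
    """Map each removal burst's start time to the non-hotplug messages
    logged within `lookback` before it."""
    context = {}
    last_removal = None
    for entry in entries:
        if "Device removed" not in entry["msg"]:
            continue
        # Only the first removal of a burst opens a new lookback window.
        if last_removal is None or entry["time"] - last_removal > burst_gap:
            start = entry["time"]
            context[start] = [
                other["msg"] for other in entries
                if not HOTPLUG_RE.search(other["msg"])
                and timedelta(0) <= start - other["time"] <= lookback
            ]
        last_removal = entry["time"]
    return context


if __name__ == "__main__":
    for start, messages in events_before_drops(parse_entries(SAMPLE_LOG)).items():
        print(start, messages)
```

In the posted excerpt, the only entries of this kind are the two battery temperature messages, the later of which ("Battery temperature is high") was logged roughly 40 minutes before the 07:52 drop. Whether that is related is not established by the log; the sketch only makes such neighbouring events easier to spot.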

    
by 10.07.2015 / 19:55