Eu tenho um servidor com WDC WD3202ABYS ... Existem 100 hosts virtuais. Servidor está trabalhando cerca de 5 anos e neste período de tempo eu mudei 4 discos. Tudo com o mesmo motivo: erro de sata. O último:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x5
ata2.00: cmd 35/00:60:57:7b:b6/00:01:06:00:00/e0 tag 0 dma 180224 out
res 51/10:60:57:7b:b6/10:01:06:00:00/e0 Emask 0x81 (invalid argument)
ata2.00: status: { DRDY ERR }
ata2.00: error: { IDNF }
ata2.00: configured for UDMA/133
sd 1:0:0:0: SCSI error: return code = 0x08000002
sdb: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
06 b6 7b 57
end_request: I/O error, dev sdb, sector 112622423
Buffer I/O error on device dm-8, logical block 14077747
lost page write due to I/O error on dm-8
Buffer I/O error on device dm-8, logical block 14077748
lost page write due to I/O error on dm-8
Buffer I/O error on device dm-8, logical block 14077749
lost page write due to I/O error on dm-8
Buffer I/O error on device dm-8, logical block 14077750
lost page write due to I/O error on dm-8
Buffer I/O error on device dm-8, logical block 14077751
lost page write due to I/O error on dm-8
Buffer I/O error on device dm-8, logical block 14077756
lost page write due to I/O error on dm-8
ata2: EH complete
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x5
ata2.00: cmd 35/00:90:17:30:b7/00:02:08:00:00/e0 tag 0 dma 335872 out
res 51/10:90:17:30:b7/10:02:08:00:00/e0 Emask 0x81 (invalid argument)
ata2.00: status: { DRDY ERR }
ata2.00: error: { IDNF }
ata2.00: configured for UDMA/133
sd 1:0:0:0: SCSI error: return code = 0x08000002
sdb: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
Descriptor sense data with sense descriptors (in hex):
72 0b 14 00 00 00 00 0c 00 0a 80 00 00 00 00 00
08 b7 30 17
end_request: I/O error, dev sdb, sector 146223127
printk: 34 messages suppressed.
Buffer I/O error on device dm-8, logical block 18277835
Parece algum erro de software ...
mas em pouco tempo depois disso (talvez quando eu comecei o fsck) após erro:
EXT3-fs error (device dm-8): ext3_put_super: Couldn't clean up the journal
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/01:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
É possível que esses erros também sejam "software" ... Quero dizer, este HDD tem apenas 9000 horas de uso ... onde não há carga extra no HDD ... a temperatura é de 29 graus Celsius ... Preciso substituir o hdd? ou cheque disco é suficiente?
EXT3-fs error (device dm-8): ext3_put_super: Couldn't clean up the journal
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/01:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { UNC }
ata2.00: configured for UDMA/133
ata2: EH complete
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x4
ata2.00: cmd c8/00:00:8f:0d:84/00:00:00:00:00/e1 tag 0 dma 131072 in
res 51/40:00:a8:0d:84/10:02:08:00:00/e1 Emask 0x9 (media error)
Como descobrir o motivo?
Aqui estão os erros do smart:
Error 36 occurred at disk power-on lifetime: 9160 hours (381 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 22 09 80 e3 Error: UNC at LBA = 0x03800922 = 58722594
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 1f 09 80 03 0a 47d+13:38:13.534 READ DMA
ec 00 00 00 00 00 00 0a 47d+13:38:13.530 IDENTIFY DEVICE
ef 03 46 00 00 00 00 0a 47d+13:38:13.528 SET FEATURES [Set transfer mode]
Ok. É possível o seguinte cenário:
1. O disco estava no 9000 sem fsck.
2. Existem alguns erros
3. No dmesg, começaram erros como:
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: BMDMA stat 0x5
ata2.00: cmd 35/00:60:57:7b:b6/00:01:06:00:00/e0 tag 0 dma 180224 out
res 51/10:60:57:7b:b6/10:01:06:00:00/e0 Emask 0x81 (invalid argument)
ata2.00: status: { DRDY ERR }
ata2.00: error: { IDNF }
ata2.00: configured for UDMA/133
sd 1:0:0:0: SCSI error: return code = 0x08000002
sdb: Current [descriptor]: sense key: Aborted Command
Add. Sense: Recorded entity not found
- E erros como erro de inode e assim por diante ...
- Eu tentei desmontar essa parição, e erro veio do disco rígido como se não pudesse encontrar tal inode e assim por diante ...?
Se sim, eu não entendo. Preciso trocar de disco todos os anos? Apenas para evitar esse erro? Alguém tem o mesmo problema? Não só com um disco ...