Disk errors on tty and in syslog / dmesg

2

Recently I started getting a lot of these errors:

Jun 18 08:57:42 abacus kernel: [  401.554292] ata5: SError: { HostInt 10B8B }
Jun 18 08:57:42 abacus kernel: [  401.559346] sr 4:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jun 18 08:57:42 abacus kernel: [  401.560191] ata5.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 18 08:57:42 abacus kernel: [  401.560231]          res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
Jun 18 08:57:42 abacus kernel: [  401.575310] ata5.00: status: { DRDY ERR }
Jun 18 08:57:42 abacus kernel: [  401.579801] ata5: hard resetting link
Jun 18 08:57:42 abacus kernel: [  401.929320] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 18 08:57:42 abacus kernel: [  401.941936] ata5.00: configured for UDMA/100
Jun 18 08:57:42 abacus kernel: [  401.969426] ata5: EH complete
Jun 18 08:57:54 abacus kernel: [  413.527699] ata5.00: exception Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6
Jun 18 08:57:54 abacus kernel: [  413.527779] ata5.00: irq_stat 0x40000001
Jun 18 08:57:54 abacus kernel: [  413.527822] ata5: SError: { HostInt 10B8B }
Jun 18 08:57:54 abacus kernel: [  413.527901] sr 4:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jun 18 08:57:54 abacus kernel: [  413.528103] ata5.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 18 08:57:54 abacus kernel: [  413.528142]          res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
Jun 18 08:57:54 abacus kernel: [  413.528184] ata5.00: status: { DRDY ERR }
Jun 18 08:57:54 abacus kernel: [  413.528303] ata5: hard resetting link
Jun 18 08:57:54 abacus kernel: [  413.875894] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 18 08:57:54 abacus kernel: [  413.888267] ata5.00: configured for UDMA/100
Jun 18 08:57:54 abacus kernel: [  413.916365] ata5: EH complete
Jun 18 08:57:56 abacus kernel: [  415.537834] ata5.00: exception Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6
Jun 18 08:57:56 abacus kernel: [  415.545253] ata5.00: irq_stat 0x40000001
Jun 18 08:57:56 abacus kernel: [  415.549788] ata5: SError: { HostInt 10B8B }
Jun 18 08:57:56 abacus kernel: [  415.554840] sr 4:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jun 18 08:57:56 abacus kernel: [  415.555201] ata5.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 18 08:57:56 abacus kernel: [  415.555242]          res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
Jun 18 08:57:56 abacus kernel: [  415.570483] ata5.00: status: { DRDY ERR }
Jun 18 08:57:56 abacus kernel: [  415.574695] ata5: hard resetting link
Jun 18 08:57:56 abacus kernel: [  415.924954] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 18 08:57:56 abacus kernel: [  415.936831] ata5.00: configured for UDMA/100
Jun 18 08:57:56 abacus kernel: [  415.965001] ata5: EH complete
Jun 18 08:58:02 abacus kernel: [  421.529784] ata5.00: exception Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6
Jun 18 08:58:02 abacus kernel: [  421.529904] ata5.00: irq_stat 0x40000001
Jun 18 08:58:02 abacus kernel: [  421.530023] ata5: SError: { HostInt 10B8B }
Jun 18 08:58:02 abacus kernel: [  421.530104] sr 4:0:0:0: CDB: Test Unit Ready: 00 00 00 00 00 00
Jun 18 08:58:02 abacus kernel: [  421.530425] ata5.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 18 08:58:02 abacus kernel: [  421.530466]          res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x40 (internal error)
Jun 18 08:58:02 abacus kernel: [  421.530583] ata5.00: status: { DRDY ERR }
Jun 18 08:58:02 abacus kernel: [  421.530705] ata5: hard resetting link
Jun 18 08:58:02 abacus kernel: [  421.873218] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 18 08:58:02 abacus kernel: [  421.885040] ata5.00: configured for UDMA/100
Jun 18 08:58:02 abacus kernel: [  421.913404] ata5: EH complete

Are these error messages critical? What could be the cause, and what is the remedy?

Here is the smartctl output:

smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.6 series
Device Model:     ST9640320AS
Serial Number:    5WX1W9PW
Firmware Version: 0002HPM1
User Capacity:    640,135,028,736 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Fri Jul  1 08:08:47 2011 PKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (   0) seconds.
Offline data collection
capabilities:            (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:    (   2) minutes.
Extended self-test routine
recommended polling time:    ( 161) minutes.
SCT capabilities:          (0x103f) SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   112   099   006    Pre-fail  Always       -       45873136
  3 Spin_Up_Time            0x0023   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       208
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   076   060   030    Pre-fail  Always       -       4339126852
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3132
 10 Spin_Retry_Count        0x0033   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       208
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       19
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   059   044   045    Old_age   Always   In_the_past 41 (0 3 46 33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       4
192 Power-Off_Retract_Count 0x0022   100   100   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1915
194 Temperature_Celsius     0x0022   041   056   000    Old_age   Always       -       41 (0 13 0 0)
195 Hardware_ECC_Recovered  0x003a   048   047   000    Old_age   Always       -       45873136
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3115         -
# 2  Extended offline    Aborted by host               90%      2865         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
    
by Shoaibi 30.06.2011 / 10:44

2 answers

1

It looks like you may have either buggy drive firmware or a faulty drive. It wouldn't hurt to run the SMART long self-test. You can use smartctl or the Disk Utility GUI to do that. You can also download and run Seagate's drive diagnostic tool. If there are problems with the drive, they should replace it under warranty.
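As a rough sketch of the self-test suggestion: with smartmontools, `smartctl -t long /dev/sdX` starts the extended test and `smartctl -l selftest /dev/sdX` prints the log afterwards (`/dev/sdX` is a placeholder for the actual device). The Python snippet below, whose function names are made up for illustration, parses rows of that log and checks whether an extended test has ever run to completion. The sample rows are copied from the smartctl output in the question, where the only extended test was aborted by the host.

```python
import re

# One row of the "SMART Self-test log" table, e.g.
# "# 2  Extended offline    Aborted by host               90%      2865         -"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in smartctl's log output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def extended_test_completed(rows):
    """True if any extended (long) self-test finished without error."""
    return any(
        row["test"].startswith("Extended")
        and row["status"] == "Completed without error"
        for row in rows
    )


# Sample rows taken from the question's smartctl output.
SAMPLE = """\
# 1  Short offline       Completed without error       00%      3115         -
# 2  Extended offline    Aborted by host               90%      2865         -
"""

rows = parse_selftest_log(SAMPLE)
print(extended_test_completed(rows))
```

For the sample rows the check reports that no extended test has completed, which suggests the long test is still worth running to the end.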

    
by psusi 27.12.2011 / 19:57
0

I'd say the seek and raw read error values are not critical, maybe just annoying. There is nothing you can do about them short of replacing the drive.
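To put the large raw numbers in perspective, here is a minimal sketch. It assumes a commonly reported but vendor-undocumented layout for Seagate raw values of attributes 1 and 7: a 48-bit number whose low 32 bits count operations and whose high 16 bits count errors. The function name is hypothetical, and the layout itself is an assumption rather than something the smartctl output confirms.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (error_count, operation_count).

    Assumption: high 16 bits = errors, low 32 bits = operations.
    This layout is widely reported for Seagate drives but is not
    confirmed by the source or by official documentation.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# Raw values copied from the smartctl table in the question.
raw_read_error_rate = 45873136
seek_error_rate = 4339126852

print(split_seagate_raw(raw_read_error_rate))
print(split_seagate_raw(seek_error_rate))
```

If the assumption holds, the table in the question would correspond to zero read errors and a single seek error in roughly 44 million seeks, which fits the view that these two attributes are not alarming here.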

For more information and explanations of SMART data, see Wikipedia.

If the errors started appearing after a kernel update, it may be worth booting an older kernel and checking the logs there.
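One way to make that comparison less subjective is to count link resets per ATA port in the log captured under each kernel. The sketch below is only an illustration: the helper name is made up, and the two sample lines are copied from the log in the question.

```python
import re
from collections import Counter

# Matches e.g. "ata5: hard resetting link" anywhere in a log line.
RESET = re.compile(r"\b(ata\d+): hard resetting link")


def count_link_resets(log_text):
    """Count 'hard resetting link' messages per ATA port."""
    return Counter(match.group(1) for match in RESET.finditer(log_text))


# Two lines copied from the log in the question.
SAMPLE = """\
Jun 18 08:57:42 abacus kernel: [  401.579801] ata5: hard resetting link
Jun 18 08:57:54 abacus kernel: [  413.528303] ata5: hard resetting link
"""

resets = count_link_resets(SAMPLE)
print(dict(resets))
```

To use it, pass in the text of `dmesg` or of the kernel log file saved under each kernel (the file's location depends on the system), then compare the counts for a similar uptime.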

    
by arrange 01.07.2011 / 11:42