You are checking the wrong server. The second /proc/mdstat output (with four RAID arrays) is not from titan707, which has three RAID arrays.
I have an rsync job active in my cron, and I started receiving e-mails like this after each rsync run:
This is an automatically generated mail message from mdadm
running on titan707

A DegradedArray event had been detected on md device /dev/md/2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[1] sda3[0]
      7995840 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdb2[1](F) sda2[0]
      499712 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sdb4[1](F) sda4[0]
      968130304 blocks super 1.2 [2/1] [U_]

unused devices:
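In the mail above, the `(F)` flags and the `[2/1] [U_]` status fields are what mark md0 and md2 as degraded: one of the two mirrors is missing. The following is a minimal sketch, not part of the original setup, of how that check could be automated. `degraded_arrays` is a hypothetical helper name, and it assumes the usual multi-line /proc/mdstat layout in which the status field follows the array line.

```shell
#!/bin/bash
# Sketch: print the names of md arrays whose member-status field
# (for example [U_]) contains an underscore, i.e. a missing or failed member.
# Reads mdstat-formatted text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }              # remember the array being described
        name != "" && /\[[U_]+\]/ {              # status line, e.g. "[2/1] [U_]"
            if (match($0, /\[[U_]*_[U_]*\]/))    # at least one "_" means degraded
                print name
            name = ""
        }
    '
}
```

On a live system this could be run as `degraded_arrays < /proc/mdstat`; an empty result means no array reports a missing member.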
Afterwards, smartctl and mdadm show no problems at all; see the mdadm and smartctl output below.
$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear] [multipath]
md0 : active raid1 sda1[0] sdb1[1]
      33553336 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      524276 blocks super 1.2 [2/2] [UU]

md3 : active raid1 sdb4[1] sda4[0]
      1822442815 blocks super 1.2 [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      1073740664 blocks super 1.2 [2/2] [UU]

unused devices:

$ smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-24-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda XT
Device Model: ST33000651AS
Serial Number: Z291E1TG
LU WWN Device Id: 5 000c50 03f2f8fbc
Firmware Version: CC45
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Mar 19 09:20:26 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82)
    Offline data collection activity was completed without error.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 600) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f)
    SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 152015022
  3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0
  4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
  7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 40795438
  9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 20281
 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 6
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 053 046 045 Old_age Always - 47 (Min/Max 43/54)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 4
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 6
194 Temperature_Celsius 0x0022 047 054 000 Old_age Always - 47 (0 23 0 0)
195 Hardware_ECC_Recovered 0x001a 021 003 000 Old_age Always - 152015022
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 253145372446521
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 2852285811
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 811308464

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 20193 -
# 2 Short offline Completed without error 00% 20185 -
# 3 Extended offline Completed without error 00% 5723 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1 0 0 Not_testing
   2 0 0 Not_testing
   3 0 0 Not_testing
   4 0 0 Not_testing
   5 0 0 Not_testing
Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

$ smartctl -a /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-24-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda XT
Device Model: ST33000651AS
Serial Number: Z2917JDM
LU WWN Device Id: 5 000c50 03f1b6146
Firmware Version: CC45
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Wed Mar 19 09:20:53 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82)
    Offline data collection activity was completed without error.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 609) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 255) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f)
    SCT Status supported.
    SCT Error Recovery Control supported.
    SCT Feature Control supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 144398334
  3 Spin_Up_Time 0x0003 094 094 000 Pre-fail Always - 0
  4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
  7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 41707682
  9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 20281
 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
 12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 6
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 057 049 045 Old_age Always - 43 (Min/Max 39/51)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 4
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 6
194 Temperature_Celsius 0x0022 043 051 000 Old_age Always - 43 (0 23 0 0)
195 Hardware_ECC_Recovered 0x001a 021 003 000 Old_age Always - 144398334
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 38959648362297
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 162809159
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 1526676264

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 20218 -
# 2 Short offline Completed without error 00% 20185 -
# 3 Extended offline Completed without error 00% 5723 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1 0 0 Not_testing
   2 0 0 Not_testing
   3 0 0 Not_testing
   4 0 0 Not_testing
   5 0 0 Not_testing
Selective self-test flags (0x0):
    After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

$ mdadm -D /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Fri Jul 27 13:40:57 2012
     Raid Level : raid1
     Array Size : 33553336 (32.00 GiB 34.36 GB)
  Used Dev Size : 33553336 (32.00 GiB 34.36 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Mon Mar 17 12:24:57 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : rescue:0
           UUID : 28ad38a2:f3df9bbc:2f1f4d98:2006ce16
         Events : 22

Number Major Minor RaidDevice State
     0     8     1          0 active sync /dev/sda1
     1     8    17          1 active sync /dev/sdb1

$ mdadm -D /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Fri Jul 27 13:40:57 2012
     Raid Level : raid1
     Array Size : 524276 (512.07 MiB 536.86 MB)
  Used Dev Size : 524276 (512.07 MiB 536.86 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Wed Mar 19 06:25:43 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : rescue:1
           UUID : 659022e1:e93cfcb9:c7b533ae:5a81c83b
         Events : 25

Number Major Minor RaidDevice State
     0     8     2          0 active sync /dev/sda2
     1     8    18          1 active sync /dev/sdb2

$ mdadm -D /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Jul 27 13:40:58 2012
     Raid Level : raid1
     Array Size : 1073740664 (1024.00 GiB 1099.51 GB)
  Used Dev Size : 1073740664 (1024.00 GiB 1099.51 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Wed Mar 19 09:21:40 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : rescue:2
           UUID : b79d3e48:62b55d0b:8501355c:2f905ef2
         Events : 34

Number Major Minor RaidDevice State
     0     8     3          0 active sync /dev/sda3
     1     8    19          1 active sync /dev/sdb3

$ mdadm -D /dev/md3
/dev/md3:
        Version : 1.2
  Creation Time : Fri Jul 27 13:40:58 2012
     Raid Level : raid1
     Array Size : 1822442815 (1738.02 GiB 1866.18 GB)
  Used Dev Size : 1822442815 (1738.02 GiB 1866.18 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
    Update Time : Wed Mar 19 09:21:09 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           Name : rescue:3
           UUID : fdb07043:8bd52646:9f267e1b:d0a43f0e
         Events : 22

Number Major Minor RaidDevice State
     0     8     4          0 active sync /dev/sda4
     1     8    20          1 active sync /dev/sdb4
I can't find anything in dmesg either:
$ dmesg | grep "md"
[ 1.957908] md: raid0 personality registered for level 0
[ 1.959091] md: raid1 personality registered for level 1
[ 2.069112] md: bind
[ 2.070684] md: bind
[ 2.072032] md: bind
[ 2.116159] md: bind
[ 2.117310] md/raid1:md3: active with 2 out of 2 mirrors
[ 2.117380] md3: detected capacity change from 0 to 1866181442560
[ 2.124174] md: bind
[ 2.138621] md3: unknown partition table
[ 2.140113] md: bind
[ 2.141326] md/raid1:md2: active with 2 out of 2 mirrors
[ 2.141398] md2: detected capacity change from 0 to 1099510439936
[ 2.162685] md2: unknown partition table
[ 2.230596] md: bind
[ 2.231715] md/raid1:md1: active with 2 out of 2 mirrors
[ 2.231786] md1: detected capacity change from 0 to 536858624
[ 2.233100] md1: unknown partition table
[ 2.436160] md: bind
[ 2.437387] md/raid1:md0: active with 2 out of 2 mirrors
[ 2.437456] md0: detected capacity change from 0 to 34358616064
[ 2.444765] md0: unknown partition table
[ 2.456675] md: raid6 personality registered for level 6
[ 2.456738] md: raid5 personality registered for level 5
[ 2.456797] md: raid4 personality registered for level 4
[ 2.458570] md: raid10 personality registered for level 10
[ 2.462736] md: linear personality registered for level -1
[ 2.463538] md: multipath personality registered for level -4
[ 8.213448] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
[ 11.334852] Adding 33553332k swap on /dev/md0. Priority:-1 extents:1 across:33553332k
[ 11.337379] EXT4-fs (md2): warning: checktime reached, running e2fsck is recommended
[ 11.359536] EXT4-fs (md2): re-mounted. Opts: (null)
[ 11.700105] EXT3-fs (md1): warning: checktime reached, running e2fsck is recommended
[ 11.778306] EXT3-fs (md1): using internal journal
[ 11.778310] EXT3-fs (md1): mounted filesystem with ordered data mode
[ 12.155704] EXT4-fs (md3): warning: checktime reached, running e2fsck is recommended
[ 12.218303] EXT4-fs (md3): mounted filesystem with ordered data mode. Opts: (null)

$ dmesg | grep "sd"
[ 1.870244] sd 0:0:0:0: [sda] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[ 1.870251] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1.870487] sd 0:0:0:0: [sda] Write Protect is off
[ 1.870637] sd 1:0:0:0: [sdb] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB)
[ 1.870638] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 1.870667] sd 1:0:0:0: [sdb] Write Protect is off
[ 1.870668] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1.870697] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.870989] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.870999] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.916610] sda: sda1 sda2 sda3 sda4 sda5
[ 1.917195] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1.928325] sdb: sdb1 sdb2 sdb3 sdb4 sdb5
[ 1.929042] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 2.069112] md: bind
[ 2.070684] md: bind
[ 2.072032] md: bind
[ 2.116159] md: bind
[ 2.124174] md: bind
[ 2.140113] md: bind
[ 2.230596] md: bind
[ 2.436160] md: bind
This is the cron script I run as user mybackup to synchronize content between two servers that I manage:
#!/bin/bash
#follow instructions to setup mybackup account and sh keys from https://blogs.oracle.com/jkini/entry/how_to_scp_scp_and
rsync -a -r -u [email protected]:/tralev/images /home/tralev/backup
echo finished tralev images
sleep 2s
rsync -a -r -u [email protected]:/backup/* /home/tralev/backup/db
echo finished tralev db
sleep 2s
#backup numbeo files to tralev server
rsync -a -r -u /numbeo/* [email protected]:/numbeo/backup
echo finished numbeo files like images
sleep 2s
rsync -a -r -u /root/backup/* [email protected]:/numbeo/db_backup
echo finished numbeo db backup
sleep 2s
I can reproduce the problem only when the script runs from cron; when I run the script by hand on the server, I don't get the same problem.
Any idea what could be going wrong?
EDIT: I ended up finding out that I was checking the wrong server. What's more, both drives in the titan707 server had failed, so I had to replace the backup server! Human error!
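The mix-up could have been caught by comparing the host named in the mdadm mail, and its number of arrays, with the machine being inspected. Below is a rough sketch of such a sanity check. The helper names `mail_host`, `count_md_arrays` and `same_host` are hypothetical, and the parsing assumes the mail wording quoted in the question ("running on <host>").

```shell
#!/bin/bash
# Sketch: helpers to confirm that diagnostics are run on the host the mail came from.

# Print the host name found in an mdadm monitor mail read from stdin.
mail_host() {
    sed -n 's/.*running on \([^ ]*\).*/\1/p' | head -n 1
}

# Count the md arrays described in mdstat-formatted text read from stdin.
count_md_arrays() {
    grep -c '^md[0-9]* :'
}

# Succeed only if the mail on stdin names the host given as $1.
same_host() {
    [ "$(mail_host)" = "$1" ]
}
```

For example, `same_host "$(hostname)" < mail.txt` (with the mail saved to a hypothetical `mail.txt`) fails when run on a different machine, and `count_md_arrays < /proc/mdstat` would have returned 4 on the machine that was checked, against the 3 arrays listed in the mail.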
Tags symbolic-link rsync mdadm linux smartctl