Slow md RAID-1 array


We are seeing very slow disk response on our server. I checked iostat (iostat -d -x 30) and I am having trouble interpreting the output:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               1.04   396.31    6.60   57.44   382.47  3649.21    62.95    10.31  160.87   8.64  55.36
sda               6.26   391.15   16.16   62.75  1810.79  3649.22    69.19     2.97   37.66   1.79  14.13
md0               0.00     0.00    0.55    0.01    16.88     0.08    30.11     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.02    0.07     1.10     0.54    18.31     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.02    0.04     0.13     0.34     8.00     0.00    0.00   0.00   0.00
md3               0.00     0.00   29.48  453.28  2175.15  3643.46    12.05     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    56.15    0.70   81.34    12.00  1110.03    13.68    47.56  600.17   5.23  42.89
sda               0.00    51.02    0.47   81.37     4.53  1059.38    13.00     0.32    3.95   0.69   5.64
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.17   47.45    16.53   379.61     8.15     0.00    0.00   0.00   0.00

The first sample is the initial (historical) statistics that iostat prints at startup; the second was taken 30 seconds later.
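As a sanity check on how the columns in these samples relate to each other, here is a minimal Python sketch. It assumes the usual relationships between the iostat -x columns: %util is roughly (r/s + w/s) × svctm / 10, and avgqu-sz is roughly (r/s + w/s) × await / 1000 (Little's law). The helper names `parse_row` and `derived` are made up for this illustration, and the two input rows are the sdb and sda lines from the first sample.

```python
# Column order as printed in the "Device:" header of the samples above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

def parse_row(line):
    """Split one iostat -x device row into (device, {column: float})."""
    fields = line.split()
    return fields[0], dict(zip(COLUMNS, map(float, fields[1:])))

def derived(stats):
    """Recompute %util and avgqu-sz from the request rate, svctm and await."""
    iops = stats["r/s"] + stats["w/s"]
    util = iops * stats["svctm"] / 10.0     # ms busy per second -> percent
    queue = iops * stats["await"] / 1000.0  # Little's law: rate x time in system
    return util, queue

rows = [
    "sdb  1.04 396.31  6.60 57.44  382.47 3649.21 62.95 10.31 160.87 8.64 55.36",
    "sda  6.26 391.15 16.16 62.75 1810.79 3649.22 69.19  2.97  37.66 1.79 14.13",
]
for row in rows:
    dev, stats = parse_row(row)
    util, queue = derived(stats)
    print(f"{dev}: util={util:.2f}% (reported {stats['%util']}), "
          f"queue={queue:.2f} (reported {stats['avgqu-sz']})")
```

For both disks the recomputed values land within rounding of the reported ones, so the columns are at least internally consistent: sdb really does spend more time per request, and its queue is longer as a direct consequence.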

Why is await for sdb higher than for sda? OK, because svctm is also higher (svctm is part of await, and it also influences the queue size). But why, when both disks are in a mirror? They are exactly the same disks, and smartctl reports no problems or significant differences in the counters:


Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   145   021    Pre-fail  Always       -       9716
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       71
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14623
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       69
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       68
193 Load_Cycle_Count        0x0032   113   113   000    Old_age   Always       -       262965
194 Temperature_Celsius     0x0022   126   114   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   145   145   021    Pre-fail  Always       -       9708
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       67
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14622
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       65
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       63
193 Load_Cycle_Count        0x0032   113   113   000    Old_age   Always       -       261839
194 Temperature_Celsius     0x0022   128   115   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

/etc/fstab:

proc            /proc           proc    defaults        0       0
/dev/md0        /               ext3    relatime,errors=remount-ro 0       1
/dev/md1        /var            ext3    relatime        0       2
/dev/md2        none            swap    sw              0       0
/dev/md3        /vz             ext3    relatime        0       3
/dev/hda        /media/cdrom0   udf,iso9660 user,noauto 0       0

Here are some measurements with iostat -d -x 2 (every two seconds) under heavy load. You can see that both disks can build up a longer queue and wait time, but sda manages to bring them back down, while sdb keeps the longer wait time. That is strange, since the disks are identical and form a RAID-1 (mirror).

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    6.00     0.00  6144.00  1024.00    21.40 4545.00 166.67 100.00
sda               0.00     0.00    2.00    1.00    16.00     8.00     8.00     0.49  390.00  75.33  22.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.50    0.50    12.00     4.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  1405.00    0.50   23.00     4.00 10632.00   452.60    18.96 1889.62  41.62  97.80
sda               0.50  1401.50    1.50   37.50   120.00 11512.00   298.26     4.29  110.00   3.13  12.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    2.50 1439.00   124.00 11512.00     8.07     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  1995.50    0.00   29.00     0.00  5304.00   182.90    13.64  873.31  34.34  99.60
sda               0.50  1986.50    6.50   28.50   512.00  1664.00    62.17     0.57    7.14   1.89   6.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    7.00 2046.00   512.00 16368.00     8.22     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00   930.00    0.00   18.50     0.00  1192.00    64.43    92.52  859.68  54.05 100.00
sda               0.00   928.50    0.00   35.50     0.00 18192.00   512.45    51.52  701.97  28.17 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00  946.50     0.00  7572.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   16.00     0.00  8976.00   561.00    56.14 2710.38  62.50 100.00
sda               0.00     0.00    0.00   13.50     0.00  4084.00   302.52     6.26 2457.63  47.56  64.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   10.00     0.00 10240.00  1024.00    33.75 4877.20 100.00 100.00
sda               0.00     0.00    0.50    0.00     4.00     0.00     8.00     0.01   16.00  16.00   0.80
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.50    0.00     4.00     0.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00  3245.50    1.50   31.50   208.00 12756.00   392.85    64.57 2644.30  30.24  99.80
sda               0.00  3245.00    2.00   60.50   108.00 26444.00   424.83    17.03  272.42   4.61  28.80
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    3.50 3305.50   316.00 26444.00     8.09     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    8.00     0.00  8192.00  1024.00    74.48 2241.50 125.00 100.00
sda               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    1.00     0.00     8.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     3.00    0.00   22.50     0.00  5192.00   230.76    58.21 3204.18  44.44 100.00
sda               0.00     3.00    3.50    6.50    48.00    76.00    12.40     0.09    8.00   5.60   5.60
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    4.00   10.00    52.00    80.00     9.43     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.50  4098.50    1.50   31.50   324.00  4160.00   135.88    78.08 3401.39  30.24  99.80
sda               0.50  4084.00    2.00   32.00   216.00  8200.00   247.53    57.79   27.53  15.35  52.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    4.00 4173.00   536.00 33384.00     8.12     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   20.50     0.00  9228.00   450.15    97.71 1776.78  48.78 100.00
sda               0.00     0.00    0.00   32.00     0.00 13536.00   423.00    72.55 1675.31  31.25 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   13.00     0.00  7220.00   555.38    67.20 3830.46  76.92 100.00
sda               0.00     0.00    0.00   25.50     0.00 11652.00   456.94    38.91 4491.14  39.22 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.50    0.50     4.00     4.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   11.00     0.00  5548.00   504.36    50.62 6367.45  90.91 100.00
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.37    0.00   0.00 100.00
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00    7.50     0.00  6648.00   886.40    28.48 7513.07 133.33 100.00
sda               0.00     0.00    1.50    3.50    12.00    28.00     8.00     0.24  560.80  20.80  10.40
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.50    2.00    12.00    16.00     8.00     0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   10.00     0.00  4036.00   403.60    12.15 9193.00 100.00 100.00
sda               0.00     0.00    1.00    0.50     8.00     4.00     8.00     0.02   14.67  14.67   2.20
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00    1.00    0.50     8.00     4.00     8.00     0.00    0.00   0.00   0.00

There is a -W or --write-mostly option, whose description sounds a lot like what you are seeing: «…

This is valid for RAID1 only and means that the 'md' driver will avoid reading from these devices if at all possible. This can be useful if mirroring over a slow link.

… »- man mdadm

Check it. This could be the problem.
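One way to check is to look for the (W) marker that /proc/mdstat prints after a member device flagged write-mostly. The following is a minimal Python sketch under that assumption; the function name `write_mostly_members` and the sample text are hypothetical and are not taken from the server in the question.

```python
import re

def write_mostly_members(mdstat_text):
    """Map each md array to the member devices flagged (W) in mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not m:
            continue
        array, rest = m.groups()
        # Members look like "sdb4[1]", optionally followed by flags such as (W) or (F).
        members = re.findall(r"(\w+)\[\d+\]((?:\(\w\))*)", rest)
        result[array] = [dev for dev, flags in members if "(W)" in flags]
    return result

# Hypothetical sample for illustration only.
SAMPLE = """\
Personalities : [raid1]
md3 : active raid1 sdb4[1](W) sda4[0]
      480000000 blocks [2/2] [UU]
"""

print(write_mostly_members(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat, for example `write_mostly_members(open("/proc/mdstat").read())`. An empty list for md3 would mean neither half of the mirror is flagged.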

I am not sure there really is a problem. Maybe you are reading more into these iostat results than is there. I did some research, and it seems the iowait output is confusing.

To quote (link):

iowait is one of the most confusing measurements in linux as it had nothing to do with CPU usage! Rather, it just tells you the cpu had nothing else to do AND there was I/O in progress, something you'd expect to see when moving files around. Clearly you should care if your other CPU loads indicators are high but this is not one of them

You should try running speed tests that are closer to real-life usage. Try bonnie: link. Also look at sysstat: link.

And do regular file copies and moves back and forth.

Compare the results with another system that has a similar configuration. If you do not see obvious speed problems, then it is not outside the realm of possibility that there is no problem at all.
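If you want a quick, repeatable number before reaching for a full benchmark, something like the following Python sketch can time a plain sequential write. It is a rough stand-in, not a replacement for bonnie: the function name `timed_write` and its parameters are invented for this example, and it only measures one write pattern.

```python
import os
import tempfile
import time

def timed_write(directory, size_mb=64, block_kb=1024):
    """Write size_mb of zeros to a temp file in directory, fsync, return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # wait until the data has left the page cache
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return size_mb / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    # Small run against the system temp directory; point it at a directory
    # on the array (for example /vz) to test the mirror itself.
    rate = timed_write(tempfile.gettempdir(), size_mb=8)
    print(f"sequential write: {rate:.1f} MB/s")
```

Running it a few times on this server and on the comparison system gives figures that are easier to compare than raw iostat samples.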

