How can I speed up the migration from RAID 5 to RAID 6 with mdadm?


Today I started migrating my RAID 5 to RAID 6 by adding a new disk (going from 7 to 8 disks, all 3 TB). The reshape is now in progress:

Personalities : [raid6] [raid5] [raid4] 
md0 : active raid6 sdi[9] sdh[8] sdf[7] sdc[6] sdd[4] sda[0] sde[5] sdb[1]
      17581590528 blocks super 1.2 level 6, 512k chunk, algorithm 18 [8/7] [UUUUUUU_]
      [>....................]  reshape =  2.3% (69393920/2930265088) finish=6697.7min speed=7118K/sec

unused devices: <none>

but it is painfully slow. It will take almost 5 days to complete. I used to be able to reshape the array in about 1 day, but here it is terrible; the speed is very low. The backup file is on an SSD.
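For reference, the reshape described above would have been started with commands along these lines; the actual backup-file path is not given in the question, so the path below is purely illustrative:

    # add the eighth disk, then grow the 7-disk RAID 5 to an 8-disk RAID 6,
    # keeping the backup file on the SSD (hypothetical path)
    mdadm /dev/md0 --add /dev/sdi
    mdadm --grow /dev/md0 --level=6 --raid-devices=8 --backup-file=/mnt/ssd/md0-grow.backup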

I changed the stripe size and the minimum and maximum speed limits, and it did not change anything.
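For context, the limits and the stripe setting mentioned here are presumably the standard md tunables; a minimal sketch of how they are usually adjusted, with example values only:

    # reshape/resync speed limits in KiB/s, applied system-wide
    echo 50000  > /proc/sys/dev/raid/speed_limit_min
    echo 200000 > /proc/sys/dev/raid/speed_limit_max

    # per-array stripe cache for RAID 4/5/6, in pages per device
    echo 32768 > /sys/block/md0/md/stripe_cache_size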

Is there any way to speed the process up to a reasonable time, or do I have to wait 5 days for it to finish?

Update: iostat -kx 10

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    1.68   22.07    0.00   76.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda            1675.90  1723.00   27.60   23.90 13875.20  6970.00   809.52     0.55   10.79    8.59   13.33   7.97  41.02
sdb            1675.90  1723.10   27.20   23.80 13670.40  6970.00   809.43     0.55   10.80    8.96   12.90   8.12  41.43
sdc            1675.90  1723.60   27.50   23.30 13824.00  6970.00   818.66     0.65   12.85   10.48   15.65   9.83  49.94
sdd            1675.90  1723.10   27.60   23.80 13875.20  6970.00   811.10     0.55   10.80    8.93   12.98   8.16  41.95
sde            1675.90  1723.10   27.20   23.80 13670.40  6970.00   809.43     0.60   11.79    9.17   14.79   9.19  46.87
sdf            1675.90  1723.80   27.70   23.10 13926.40  6970.00   822.69     0.72   14.28   11.65   17.43  10.12  51.40
sdg               0.00     4.10    0.00   93.20     0.00 39391.20   845.30     6.07   65.14    0.00   65.14   2.71  25.29
dm-0              0.00     0.00    0.00    4.30     0.00    18.40     8.56     0.00    0.07    0.00    0.07   0.02   0.01
dm-1              0.00     0.00    0.00   89.60     0.00 39372.80   878.86     6.07   67.78    0.00   67.78   2.82  25.28
md0               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh            1583.50  1631.70  216.50  115.90 13824.00  6970.00   125.11     1.56    4.73    5.36    3.55   0.43  14.41
sdi               0.00  1631.70    0.00  115.90     0.00  6970.00   120.28     0.21    1.77    0.00    1.77   0.28   3.25
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

sdi is the disk I added last. sdg is the SSD. The dmX devices are the LVM partitions.

by Baptiste Wicht 23.07.2014 / 16:40

2 answers


It seems to me that this is related to mdadm's migration from RAID 5 to RAID 6. I have just added a new disk to the array and the grow speed is perfectly reasonable (40000K/s) for my hardware.
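For comparison, the fast case presumably refers to a plain grow that only adds a device without changing the RAID level, roughly like this (device name and disk count are illustrative):

    # add a disk and grow the array without a level change; no backup file is needed
    mdadm /dev/md0 --add /dev/sdX
    mdadm --grow /dev/md0 --raid-devices=9
    cat /proc/mdstat          # shows reshape progress and speed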

by 29.07.2014 / 21:45

According to this blog post by Neil Brown (the creator of mdadm), you can avoid the speed penalty caused by mdadm's block-range backup process by:

  1. Increasing the number of RAID devices (for example: reshaping from a 4-disk RAID5 to a 5-disk RAID6): mdadm --grow /dev/md0 --level=6 --raid-disk=5
  2. Not specifying the --backup-file option (see the sketch after this list)
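Applied to the array in the question (going from 7 to 8 disks), a minimal sketch following those two points would look like this; note that no --backup-file is passed:

    # add the new disk first, then reshape to RAID 6 in one step,
    # letting md use the gap between old and new layouts instead of a backup file
    mdadm /dev/md0 --add /dev/sdi
    mdadm --grow /dev/md0 --level=6 --raid-devices=8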

The reasoning he details in his post is that the backup file is unnecessary when another drive has been added. This is because the process is slightly different in that case: there is usually a gap between the old and the new layouts, which can be used to back up the old-layout data that is being operated on during the reshape.

This excerpt from his article explains it in more detail:

How Level Changing Works

If we think of "RAID5" as a little more generic than the standard definition, and allow it to be any layout which stripes data plus 1 parity block across a number of devices, then we can think of RAID4 as just a special case of RAID5. Then we can imagine a conversion from RAID0 to RAID5 as taking two steps. The first converts to RAID5 using the RAID4 layout with the parity disk as the last disk. This clearly doesn't require any data to be relocated so the change can be instant. It creates a degraded RAID5 in a RAID4 layout so it is not complete, but it is clearly a step in the right direction. I'm sure you can see what comes next. After converting the RAID0 to a degraded RAID5 with an unusual layout we would use the new change-the-layout functionality to convert to a real RAID5.

It is a very similar process that can now be used to convert a RAID5 to a RAID6. We first change the RAID5 to RAID6 with a non-standard layout that has the parity blocks distributed as normal, but the Q blocks all on the last device (a new device). So this is RAID6 using the RAID6 driver, but with a non-RAID6 layout. So we "simply" change the layout and the job is done.

A RAID6 can be converted to a RAID5 by the reverse process. First we change the layout to a layout that is almost RAID5 but with an extra Q disk. Then we convert to real RAID5 by forgetting about the Q disk.

Complexities of re-striping data

In all of this the messiest part is ensuring that the data survives a crash or other system shutdown. With the first reshape which just allowed increasing the number of devices, this was quite easy. For most of the time there is a gap in the devices between where data in the old layout is being read, and where data in the new layout is being written. This gap allows us to have two copies of that data. If we disable writes to a small section while it is being reshaped, then after a crash we know that the old layout still has good data, and simply re-layout the last few stripes from where-ever we recorded that we were up to.

This doesn't work for the first few stripes as they require writing the new layout over the old layout. So after a crash the old layout is probably corrupted and the new layout may be incomplete. So mdadm takes care to make a backup of those first few stripes and when it assembles an array that was still in the early phase of a reshape it first restores from the backup.

For a reshape that does not change the number of devices, such as changing chunksize or layout, every write will be over-writing the old layout of that same data so after a crash there will definitely be a range of blocks that we cannot know whether they are in the old layout or the new layout or a bit of both. So we need to always have a backup of the range of blocks that are currently being reshaped.

This is the most complex part of the new functionality in mdadm 3.1 (which is not released yet but can be found in the devel-3.1 branch for git://neil.brown.name/mdadm). mdadm monitors the reshape, setting an upper bound of how far it can progress at any time and making sure the area that it is allowed to rearrange has writes disabled and has been backed-up.

This means that all the data is copied twice, once to the backup and once to the new layout on the array. This clearly means that such a reshape will go very slowly. But that is the price we have to pay for safety. It is like insurance. You might hate having to pay it, but you would hate it much more if you didn't and found that you needed it.

by 16.09.2015 / 02:59
