Have I lost my RAID again?

A bit of history: 2 years ago I was very excited to discover that mdadm is powerful enough to reshape arrays, so you can start with a smaller array and grow it as needed. I bought 3x1TB drives and built a RAID-5. It worked fine for a year.

Then I bought 2 more drives and tried to move to a 5-drive RAID-6, and because of some mess with superblock versions I lost all the contents. I had to rebuild the array from scratch, and 2 TB of data were gone.

Yesterday I bought 2 more drives, and this time I had everything in place: a properly built array and a UPS. I disabled the write-intent bitmap, added the two new drives as spares, and ran the command to grow the array to 7 disks.

It started working, but the speed was ridiculously slow, about 100 KB/s. After it had processed the first 37 MB at that amazing speed, one of the old HDDs failed. I shut the PC down properly and disconnected the failed drive. After boot, it appeared to be recreating the write-intent bitmap, because it was still listed in the mdadm config, so I removed it from the config and rebooted again.

Now all I see is that every mdadm process is hung (state D) and doing nothing.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1937 root      20   0 12992  608  444 D    0  0.1   0:00.00 mdadm
 2283 root      20   0 12992  852  704 D    0  0.1   0:00.01 mdadm
 2287 root      20   0     0    0    0 D    0  0.0   0:00.01 md0_reshape
 2288 root      18  -2 12992  820  676 D    0  0.1   0:00.01 mdadm

And all I see in mdstat is:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5]
      2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU]
      [>....................]  reshape =  0.0% (37888/976561152) finish=567604147.2min speed=0K/sec

I have already tried mdadm 2.6.7, 3.1.4 and 3.2; nothing helps. Have I lost my data again? Any suggestions on how I can get this working?

The OS is Ubuntu Server 10.04.2.

PS. Needless to say, the data is inaccessible: I cannot mount /dev/md0 to rescue the most valuable data.

You can see my disappointment: the very specific feature I was excited about has failed twice, with 5 TB of my data.

Update: there seems to be some interesting information in kern.log:

21:38:48 ...: [  166.522055] raid5: reshape will continue
21:38:48 ...: [  166.522085] raid5: device sdb1 operational as raid disk 1
21:38:48 ...: [  166.522091] raid5: device sdg1 operational as raid disk 4
21:38:48 ...: [  166.522097] raid5: device sdf1 operational as raid disk 5
21:38:48 ...: [  166.522102] raid5: device sde1 operational as raid disk 6
21:38:48 ...: [  166.522107] raid5: device sdd1 operational as raid disk 0
21:38:48 ...: [  166.522111] raid5: device sdc1 operational as raid disk 3
21:38:48 ...: [  166.523942] raid5: allocated 7438kB for md0
21:38:48 ...: [  166.524041] 1: w=1 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524050] 4: w=2 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524056] 5: w=3 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524062] 6: w=4 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524068] 0: w=5 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524073] 3: w=6 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524079] raid5: raid level 6 set md0 active with 6 out of 7 devices, algorithm 2
21:38:48 ...: [  166.524519] RAID5 conf printout:
21:38:48 ...: [  166.524523]  --- rd:7 wd:6
21:38:48 ...: [  166.524528]  disk 0, o:1, dev:sdd1
21:38:48 ...: [  166.524532]  disk 1, o:1, dev:sdb1
21:38:48 ...: [  166.524537]  disk 3, o:1, dev:sdc1
21:38:48 ...: [  166.524541]  disk 4, o:1, dev:sdg1
21:38:48 ...: [  166.524545]  disk 5, o:1, dev:sdf1
21:38:48 ...: [  166.524550]  disk 6, o:1, dev:sde1
21:38:48 ...: [  166.524553] ...ok start reshape thread
21:38:48 ...: [  166.524727] md0: detected capacity change from 0 to 2999995858944
21:38:48 ...: [  166.524735] md: reshape of RAID array md0
21:38:48 ...: [  166.524740] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
21:38:48 ...: [  166.524745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
21:38:48 ...: [  166.524756] md: using 128k window, over a total of 976561152 blocks.
21:39:05 ...: [  166.525013]  md0:
21:42:04 ...: [  362.520063] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:42:04 ...: [  362.520068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520073] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:42:04 ...: [  362.520083]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520092]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:42:04 ...: [  362.520100]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:42:04 ...: [  362.520107] Call Trace:
21:42:04 ...: [  362.520133]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520148]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520159]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:42:04 ...: [  362.520169]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:42:04 ...: [  362.520179]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520188]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:42:04 ...: [  362.520194]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:42:04 ...: [  362.520205]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:42:04 ...: [  362.520214]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:42:04 ...: [  362.520222]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:42:04 ...: [  362.520230]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:42:04 ...: [  362.520236]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:42:04 ...: [  362.520244]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:42:04 ...: [  362.520251]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:42:04 ...: [  362.520258]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:42:04 ...: [  362.520265]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:42:04 ...: [  362.520272]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:42:04 ...: [  362.520279]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:42:04 ...: [  362.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520290]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520297]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:42:04 ...: [  362.520304]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:42:04 ...: [  362.520310]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:42:04 ...: [  362.520317]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:42:04 ...: [  362.520324]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:42:04 ...: [  362.520331]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:42:04 ...: [  362.520338]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:42:04 ...: [  362.520344]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:42:04 ...: [  362.520350]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520356]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520362]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520369]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520377]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520385]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520391]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520398]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520406]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:42:04 ...: [  362.520414]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520421]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520428]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520437]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520446] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:42:04 ...: [  362.520450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520454] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:42:04 ...: [  362.520462]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520470]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:42:04 ...: [  362.520478]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:42:04 ...: [  362.520485] Call Trace:
21:42:04 ...: [  362.520495]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520502]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520508]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:42:04 ...: [  362.520514]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:42:04 ...: [  362.520520]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:42:04 ...: [  362.520527]  [<ffffffff81145375>] __fput+0xf5/0x210
21:42:04 ...: [  362.520534]  [<ffffffff811454b5>] fput+0x25/0x30
21:42:04 ...: [  362.520540]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:42:04 ...: [  362.520546]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:42:04 ...: [  362.520553]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520559] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:42:04 ...: [  362.520563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520567] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:42:04 ...: [  362.520575]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520582]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:42:04 ...: [  362.520590]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:42:04 ...: [  362.520597] Call Trace:
21:42:04 ...: [  362.520608]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520616]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520626]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:42:04 ...: [  362.520634]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520644]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:42:04 ...: [  362.520651]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:42:04 ...: [  362.520658]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:42:04 ...: [  362.520668]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:42:04 ...: [  362.520675]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:42:04 ...: [  362.520681]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520688]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:42:04 ...: [  362.520694]  [<ffffffff81084416>] kthread+0x96/0xa0
21:42:04 ...: [  362.520701]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:42:04 ...: [  362.520707]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:42:04 ...: [  362.520713]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:42:04 ...: [  362.520718] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:42:04 ...: [  362.520721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520725] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:42:04 ...: [  362.520733]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520741]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:42:04 ...: [  362.520748]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:42:04 ...: [  362.520755] Call Trace:
21:42:04 ...: [  362.520763]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520771]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:42:04 ...: [  362.520777]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520783]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:42:04 ...: [  362.520790]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520795]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520801]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520808]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520815]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520821]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520828]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520834]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520841]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:42:04 ...: [  362.520848]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:42:04 ...: [  362.520855]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:42:04 ...: [  362.520862]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520868]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520874]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520882]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520065] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:44:04 ...: [  482.520071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520077] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:44:04 ...: [  482.520087]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520096]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:44:04 ...: [  482.520104]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:44:04 ...: [  482.520112] Call Trace:
21:44:04 ...: [  482.520139]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520154]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520165]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:44:04 ...: [  482.520175]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:44:04 ...: [  482.520185]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520194]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:44:04 ...: [  482.520201]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:44:04 ...: [  482.520212]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:44:04 ...: [  482.520221]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:44:04 ...: [  482.520229]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:44:04 ...: [  482.520237]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:44:04 ...: [  482.520244]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:44:04 ...: [  482.520252]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:44:04 ...: [  482.520258]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:44:04 ...: [  482.520266]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:44:04 ...: [  482.520273]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:44:04 ...: [  482.520280]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:44:04 ...: [  482.520286]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:44:04 ...: [  482.520293]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520299]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520306]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:44:04 ...: [  482.520313]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:44:04 ...: [  482.520319]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:44:04 ...: [  482.520327]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:44:04 ...: [  482.520334]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:44:04 ...: [  482.520341]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:44:04 ...: [  482.520348]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:44:04 ...: [  482.520355]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:44:04 ...: [  482.520361]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520367]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520373]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520380]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520388]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520396]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520403]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520410]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520417]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:44:04 ...: [  482.520426]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520432]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520438]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520447]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520458] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:44:04 ...: [  482.520462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520467] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:44:04 ...: [  482.520475]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520483]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:44:04 ...: [  482.520490]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:44:04 ...: [  482.520498] Call Trace:
21:44:04 ...: [  482.520508]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520515]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520521]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:44:04 ...: [  482.520527]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:44:04 ...: [  482.520533]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:44:04 ...: [  482.520541]  [<ffffffff81145375>] __fput+0xf5/0x210
21:44:04 ...: [  482.520547]  [<ffffffff811454b5>] fput+0x25/0x30
21:44:04 ...: [  482.520554]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:44:04 ...: [  482.520560]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:44:04 ...: [  482.520568]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520574] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:44:04 ...: [  482.520578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520582] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:44:04 ...: [  482.520590]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520597]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:44:04 ...: [  482.520605]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:44:04 ...: [  482.520612] Call Trace:
21:44:04 ...: [  482.520623]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520633]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520643]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:44:04 ...: [  482.520651]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520661]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:44:04 ...: [  482.520668]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:44:04 ...: [  482.520675]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:44:04 ...: [  482.520685]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:44:04 ...: [  482.520692]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:44:04 ...: [  482.520699]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520705]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:44:04 ...: [  482.520711]  [<ffffffff81084416>] kthread+0x96/0xa0
21:44:04 ...: [  482.520718]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:44:04 ...: [  482.520725]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:44:04 ...: [  482.520730]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:44:04 ...: [  482.520735] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:44:04 ...: [  482.520739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520743] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:44:04 ...: [  482.520751]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520759]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:44:04 ...: [  482.520767]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:44:04 ...: [  482.520774] Call Trace:
21:44:04 ...: [  482.520782]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520790]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:44:04 ...: [  482.520797]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520804]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:44:04 ...: [  482.520810]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520816]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520822]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520829]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520837]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520843]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520850]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520857]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520864]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:44:04 ...: [  482.520871]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:44:04 ...: [  482.520878]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:44:04 ...: [  482.520885]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520891]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520897]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520905]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520053] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:46:04 ...: [  602.520059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520065] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:46:04 ...: [  602.520075]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520084]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:46:04 ...: [  602.520091]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:46:04 ...: [  602.520099] Call Trace:
21:46:04 ...: [  602.520127]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:46:04 ...: [  602.520142]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:46:04 ...: [  602.520153]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:46:04 ...: [  602.520162]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:46:04 ...: [  602.520171]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:46:04 ...: [  602.520180]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:46:04 ...: [  602.520187]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:46:04 ...: [  602.520197]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:46:04 ...: [  602.520206]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:46:04 ...: [  602.520215]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:46:04 ...: [  602.520222]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:46:04 ...: [  602.520229]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:46:04 ...: [  602.520237]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:46:04 ...: [  602.520244]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:46:04 ...: [  602.520252]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:46:04 ...: [  602.520259]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:46:04 ...: [  602.520266]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:46:04 ...: [  602.520273]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:46:04 ...: [  602.520279]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520292]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:46:04 ...: [  602.520300]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:46:04 ...: [  602.520306]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:46:04 ...: [  602.520314]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:46:04 ...: [  602.520321]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:46:04 ...: [  602.520328]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:46:04 ...: [  602.520335]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:46:04 ...: [  602.520342]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:46:04 ...: [  602.520348]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:46:04 ...: [  602.520354]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:46:04 ...: [  602.520359]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:46:04 ...: [  602.520367]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:46:04 ...: [  602.520375]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:46:04 ...: [  602.520383]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:46:04 ...: [  602.520390]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:46:04 ...: [  602.520397]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:46:04 ...: [  602.520404]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:46:04 ...: [  602.520413]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:46:04 ...: [  602.520419]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:46:04 ...: [  602.520425]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:46:04 ...: [  602.520434]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520443] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:46:04 ...: [  602.520447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520451] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:46:04 ...: [  602.520460]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520468]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:46:04 ...: [  602.520475]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:46:04 ...: [  602.520483] Call Trace:
21:46:04 ...: [  602.520492]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:46:04 ...: [  602.520500]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:46:04 ...: [  602.520506]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:46:04 ...: [  602.520512]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:46:04 ...: [  602.520518]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:46:04 ...: [  602.520526]  [<ffffffff81145375>] __fput+0xf5/0x210
21:46:04 ...: [  602.520533]  [<ffffffff811454b5>] fput+0x25/0x30
21:46:04 ...: [  602.520539]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:46:04 ...: [  602.520545]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:46:04 ...: [  602.520552]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
    
by BarsMonster 03.03.2011 / 20:51

2 answers


I managed to get in touch with Neil Brown (THE developer), and he strongly suggested increasing stripe_cache_size to at least 2048. This is related to my previous question, where I could not make that setting permanent.

So, after setting it to 8192, the reshape is running and the problem is solved. God bless Neil Brown :-)
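
For reference, here is a minimal sketch of how that setting can be applied through sysfs. The device name md0 matches the array above; the helper names stripe_cache_bytes and set_stripe_cache and the 4 KiB default page size are assumptions made for illustration. The memory estimate follows the formula given in the kernel's md documentation (page size x number of disks x stripe_cache_size).

```shell
#!/bin/sh
# Sketch only: helper names and the 4096-byte default page size are assumptions.

# Estimate the RAM used by the stripe cache, in bytes:
#   page_size * nr_disks * stripe_cache_size
stripe_cache_bytes() {
    entries=$1
    disks=$2
    page_size=${3:-4096}
    echo $(( entries * disks * page_size ))
}

# Write a new stripe_cache_size value to the given sysfs file.
# Fails with exit status 1 if the file is missing or not writable.
set_stripe_cache() {
    sysfs_file=$1
    entries=$2
    [ -w "$sysfs_file" ] || { echo "cannot write $sysfs_file" >&2; return 1; }
    echo "$entries" > "$sysfs_file"
}

# Typical use, as root:
#   stripe_cache_bytes 8192 7
#   set_stripe_cache /sys/block/md0/md/stripe_cache_size 8192
```

With 7 disks and 4 KiB pages, 8192 entries come to roughly 224 MiB of RAM, so it is worth checking the estimate before raising the value further. A value written this way does not survive a reboot.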

    
by BarsMonster 04.03.2011 / 13:46

Sometimes a reshape sits at speed=0K/sec because the backup file was never created or was lost during processing.

The solution in this case was given by Neil Brown in reply to an email to linux-raid@vger.kernel.org.

You should be able to simply stop the array and re-assemble with a different backup file and the magic flag "--invalid-backup" (required mdadm 3.2 or later).

The backup-file is only really needed in case of a crash. As you will stop the array cleanly there will be no need to recover anything when you re-assemble, so --invalid-backup (Which say "there is nothing in the backup file, but that is OK) is perfectly safe.

NeilBrown

For a RAID5 array on device /dev/md0, with 7 disks, mounted at /mnt/data, the procedure based on his reply is:

All of the following commands must be run as root or equivalent.

Find everything that has files open on the mount point:

lsof /mnt/data

Close them, or stop any services that may be interacting with it.
Commonly:

systemctl stop <SERVICE_NAME>

or

service <SERVICE_NAME> stop

Unmount, stop, and re-assemble the array:

umount /mnt/data
mdadm --stop /dev/md0
mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1

Depending on your previous settings, the device may be remounted automatically after the assemble command. If not, mount it with:

mount /dev/md0 /mnt/data

It is now safe to restart any services or connections that were running.
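
To confirm that the reshape is actually moving after the re-assembly, the speed field in /proc/mdstat can be checked. The following is a small sketch, not part of Neil Brown's advice; the helper names reshape_speed and reshape_is_moving are hypothetical, and it assumes the "reshape = ... speed=NK/sec" line format shown in the question.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the line format is taken
# from the /proc/mdstat output quoted in the question.

# Print the reshape speed (the number before "K/sec") reported in the
# given mdstat file. Prints nothing if no reshape line is present.
reshape_speed() {
    mdstat=${1:-/proc/mdstat}
    sed -n 's/.*reshape.*speed=\([0-9][0-9]*\)K\/sec.*/\1/p' "$mdstat"
}

# Exit status 0 when a reshape is listed and its speed is above zero,
# non-zero otherwise.
reshape_is_moving() {
    speed=$(reshape_speed "$1" | head -n 1)
    [ -n "$speed" ] && [ "$speed" -gt 0 ]
}

# Typical use:
#   reshape_is_moving && echo "reshape is progressing"
```

If the reported speed stays at 0 after the re-assembly, the stalled tasks in kern.log are the next place to look.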

    
by Kevin 12.08.2014 / 21:41