Have I lost my RAID again?

Some history: two years ago I was really excited to discover that mdadm is powerful enough to reshape arrays, so you can start with a smaller array and grow it as needed. I bought three 1 TB drives and built a RAID-5. It worked fine for a year.

Then I bought two more drives and tried to reshape to RAID-6 across 5 disks. Because of some problems with superblock versions, I lost all the contents. I had to rebuild the array from scratch, and 2 TB of data was gone.

Yesterday I bought 2 more drives, and this time I had everything in place: a properly built array and a UPS. I disabled the write-intent bitmap, added the 2 new drives as spares, and ran the command to grow the array to 7 disks.

It started working, but the speed was ridiculously slow, ~100 KB/sec. After it had processed the first 37 MB at that amazing speed, one of the old hard drives failed. I shut the PC down properly and disconnected the failed drive. After booting, it turned out that the write-intent bitmap had been recreated because it was still in the mdadm configuration, so I removed it from the configuration and rebooted again.

Now all I see is that all the mdadm processes are blocked and doing nothing.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1937 root      20   0 12992  608  444 D    0  0.1   0:00.00 mdadm
 2283 root      20   0 12992  852  704 D    0  0.1   0:00.01 mdadm
 2287 root      20   0     0    0    0 D    0  0.0   0:00.01 md0_reshape
 2288 root      18  -2 12992  820  676 D    0  0.1   0:00.01 mdadm

And all I see in mdstat is:

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[1] sdg1[4] sdf1[7] sde1[6] sdd1[0] sdc1[5]
      2929683456 blocks super 1.2 level 6, 1024k chunk, algorithm 2 [7/6] [UU_UUUU]
      [>....................]  reshape =  0.0% (37888/976561152) finish=567604147.2min speed=0K/sec

I have already tried mdadm 2.6.7, 3.1.4 and 3.2; nothing helps. Have I lost my data again? Any suggestions on how I can make this work?

The OS is Ubuntu Server 10.04.2.

PS. Needless to say, the data is inaccessible: I cannot mount /dev/md0 to save the most valuable data.

You can see my disappointment: the very specific thing that excited me has failed twice, taking 5 TB of my data with it.

Update: It looks like there is some useful information in kern.log:

21:38:48 ...: [  166.522055] raid5: reshape will continue
21:38:48 ...: [  166.522085] raid5: device sdb1 operational as raid disk 1
21:38:48 ...: [  166.522091] raid5: device sdg1 operational as raid disk 4
21:38:48 ...: [  166.522097] raid5: device sdf1 operational as raid disk 5
21:38:48 ...: [  166.522102] raid5: device sde1 operational as raid disk 6
21:38:48 ...: [  166.522107] raid5: device sdd1 operational as raid disk 0
21:38:48 ...: [  166.522111] raid5: device sdc1 operational as raid disk 3
21:38:48 ...: [  166.523942] raid5: allocated 7438kB for md0
21:38:48 ...: [  166.524041] 1: w=1 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524050] 4: w=2 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524056] 5: w=3 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524062] 6: w=4 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524068] 0: w=5 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524073] 3: w=6 pa=2 pr=5 m=2 a=2 r=7 op1=0 op2=0
21:38:48 ...: [  166.524079] raid5: raid level 6 set md0 active with 6 out of 7 devices, algorithm 2
21:38:48 ...: [  166.524519] RAID5 conf printout:
21:38:48 ...: [  166.524523]  --- rd:7 wd:6
21:38:48 ...: [  166.524528]  disk 0, o:1, dev:sdd1
21:38:48 ...: [  166.524532]  disk 1, o:1, dev:sdb1
21:38:48 ...: [  166.524537]  disk 3, o:1, dev:sdc1
21:38:48 ...: [  166.524541]  disk 4, o:1, dev:sdg1
21:38:48 ...: [  166.524545]  disk 5, o:1, dev:sdf1
21:38:48 ...: [  166.524550]  disk 6, o:1, dev:sde1
21:38:48 ...: [  166.524553] ...ok start reshape thread
21:38:48 ...: [  166.524727] md0: detected capacity change from 0 to 2999995858944
21:38:48 ...: [  166.524735] md: reshape of RAID array md0
21:38:48 ...: [  166.524740] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
21:38:48 ...: [  166.524745] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
21:38:48 ...: [  166.524756] md: using 128k window, over a total of 976561152 blocks.
21:39:05 ...: [  166.525013]  md0:
21:42:04 ...: [  362.520063] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:42:04 ...: [  362.520068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520073] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:42:04 ...: [  362.520083]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520092]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:42:04 ...: [  362.520100]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:42:04 ...: [  362.520107] Call Trace:
21:42:04 ...: [  362.520133]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520148]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520159]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:42:04 ...: [  362.520169]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:42:04 ...: [  362.520179]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520188]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:42:04 ...: [  362.520194]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:42:04 ...: [  362.520205]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:42:04 ...: [  362.520214]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:42:04 ...: [  362.520222]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:42:04 ...: [  362.520230]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:42:04 ...: [  362.520236]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:42:04 ...: [  362.520244]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:42:04 ...: [  362.520251]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:42:04 ...: [  362.520258]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:42:04 ...: [  362.520265]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:42:04 ...: [  362.520272]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:42:04 ...: [  362.520279]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:42:04 ...: [  362.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520290]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:42:04 ...: [  362.520297]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:42:04 ...: [  362.520304]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:42:04 ...: [  362.520310]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:42:04 ...: [  362.520317]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:42:04 ...: [  362.520324]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:42:04 ...: [  362.520331]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:42:04 ...: [  362.520338]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:42:04 ...: [  362.520344]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:42:04 ...: [  362.520350]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520356]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520362]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520369]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520377]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520385]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520391]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520398]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520406]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:42:04 ...: [  362.520414]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520421]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520428]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520437]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520446] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:42:04 ...: [  362.520450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520454] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:42:04 ...: [  362.520462]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520470]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:42:04 ...: [  362.520478]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:42:04 ...: [  362.520485] Call Trace:
21:42:04 ...: [  362.520495]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520502]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520508]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:42:04 ...: [  362.520514]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:42:04 ...: [  362.520520]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:42:04 ...: [  362.520527]  [<ffffffff81145375>] __fput+0xf5/0x210
21:42:04 ...: [  362.520534]  [<ffffffff811454b5>] fput+0x25/0x30
21:42:04 ...: [  362.520540]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:42:04 ...: [  362.520546]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:42:04 ...: [  362.520553]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:42:04 ...: [  362.520559] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:42:04 ...: [  362.520563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520567] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:42:04 ...: [  362.520575]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520582]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:42:04 ...: [  362.520590]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:42:04 ...: [  362.520597] Call Trace:
21:42:04 ...: [  362.520608]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:42:04 ...: [  362.520616]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:42:04 ...: [  362.520626]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:42:04 ...: [  362.520634]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520644]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:42:04 ...: [  362.520651]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:42:04 ...: [  362.520658]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:42:04 ...: [  362.520668]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:42:04 ...: [  362.520675]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:42:04 ...: [  362.520681]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:42:04 ...: [  362.520688]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:42:04 ...: [  362.520694]  [<ffffffff81084416>] kthread+0x96/0xa0
21:42:04 ...: [  362.520701]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:42:04 ...: [  362.520707]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:42:04 ...: [  362.520713]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:42:04 ...: [  362.520718] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:42:04 ...: [  362.520721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:42:04 ...: [  362.520725] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:42:04 ...: [  362.520733]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:42:04 ...: [  362.520741]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:42:04 ...: [  362.520748]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:42:04 ...: [  362.520755] Call Trace:
21:42:04 ...: [  362.520763]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:42:04 ...: [  362.520771]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:42:04 ...: [  362.520777]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:42:04 ...: [  362.520783]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:42:04 ...: [  362.520790]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:42:04 ...: [  362.520795]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:42:04 ...: [  362.520801]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:42:04 ...: [  362.520808]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:42:04 ...: [  362.520815]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:42:04 ...: [  362.520821]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:42:04 ...: [  362.520828]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:42:04 ...: [  362.520834]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:42:04 ...: [  362.520841]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:42:04 ...: [  362.520848]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:42:04 ...: [  362.520855]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:42:04 ...: [  362.520862]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:42:04 ...: [  362.520868]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:42:04 ...: [  362.520874]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:42:04 ...: [  362.520882]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520065] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:44:04 ...: [  482.520071] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520077] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:44:04 ...: [  482.520087]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520096]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:44:04 ...: [  482.520104]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:44:04 ...: [  482.520112] Call Trace:
21:44:04 ...: [  482.520139]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520154]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520165]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:44:04 ...: [  482.520175]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:44:04 ...: [  482.520185]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520194]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:44:04 ...: [  482.520201]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:44:04 ...: [  482.520212]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:44:04 ...: [  482.520221]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:44:04 ...: [  482.520229]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:44:04 ...: [  482.520237]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:44:04 ...: [  482.520244]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:44:04 ...: [  482.520252]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:44:04 ...: [  482.520258]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:44:04 ...: [  482.520266]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:44:04 ...: [  482.520273]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:44:04 ...: [  482.520280]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:44:04 ...: [  482.520286]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:44:04 ...: [  482.520293]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520299]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:44:04 ...: [  482.520306]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:44:04 ...: [  482.520313]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:44:04 ...: [  482.520319]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:44:04 ...: [  482.520327]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:44:04 ...: [  482.520334]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:44:04 ...: [  482.520341]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:44:04 ...: [  482.520348]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:44:04 ...: [  482.520355]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:44:04 ...: [  482.520361]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520367]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520373]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520380]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520388]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520396]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520403]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520410]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520417]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:44:04 ...: [  482.520426]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520432]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520438]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520447]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520458] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:44:04 ...: [  482.520462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520467] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:44:04 ...: [  482.520475]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520483]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:44:04 ...: [  482.520490]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:44:04 ...: [  482.520498] Call Trace:
21:44:04 ...: [  482.520508]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520515]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520521]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:44:04 ...: [  482.520527]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:44:04 ...: [  482.520533]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:44:04 ...: [  482.520541]  [<ffffffff81145375>] __fput+0xf5/0x210
21:44:04 ...: [  482.520547]  [<ffffffff811454b5>] fput+0x25/0x30
21:44:04 ...: [  482.520554]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:44:04 ...: [  482.520560]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:44:04 ...: [  482.520568]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:44:04 ...: [  482.520574] INFO: task md0_reshape:2287 blocked for more than 120 seconds.
21:44:04 ...: [  482.520578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520582] md0_reshape   D ffff88003aee96f0     0  2287      2 0x00000000
21:44:04 ...: [  482.520590]  ffff88003cf05a70 0000000000000046 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520597]  ffff88003aee9aa8 ffff88003cf05fd8 0000000000015bc0 ffff88003aee96f0
21:44:04 ...: [  482.520605]  0000000000015bc0 ffff88003cf05fd8 0000000000015bc0 ffff88003aee9aa8
21:44:04 ...: [  482.520612] Call Trace:
21:44:04 ...: [  482.520623]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:44:04 ...: [  482.520633]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:44:04 ...: [  482.520643]  [<ffffffffa0226f80>] reshape_request+0x4c0/0x9a0 [raid456]
21:44:04 ...: [  482.520651]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520661]  [<ffffffffa022777a>] sync_request+0x31a/0x3a0 [raid456]
21:44:04 ...: [  482.520668]  [<ffffffff81052713>] ? __wake_up+0x53/0x70
21:44:04 ...: [  482.520675]  [<ffffffff814156b1>] md_do_sync+0x621/0xbb0
21:44:04 ...: [  482.520685]  [<ffffffff810387b9>] ? default_spin_lock_flags+0x9/0x10
21:44:04 ...: [  482.520692]  [<ffffffff8141640c>] md_thread+0x5c/0x130
21:44:04 ...: [  482.520699]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:44:04 ...: [  482.520705]  [<ffffffff814163b0>] ? md_thread+0x0/0x130
21:44:04 ...: [  482.520711]  [<ffffffff81084416>] kthread+0x96/0xa0
21:44:04 ...: [  482.520718]  [<ffffffff810131ea>] child_rip+0xa/0x20
21:44:04 ...: [  482.520725]  [<ffffffff81084380>] ? kthread+0x0/0xa0
21:44:04 ...: [  482.520730]  [<ffffffff810131e0>] ? child_rip+0x0/0x20
21:44:04 ...: [  482.520735] INFO: task mdadm:2288 blocked for more than 120 seconds.
21:44:04 ...: [  482.520739] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:44:04 ...: [  482.520743] mdadm         D 0000000000000000     0  2288      1 0x00000000
21:44:04 ...: [  482.520751]  ffff88002cca9c18 0000000000000086 0000000000015bc0 0000000000015bc0
21:44:04 ...: [  482.520759]  ffff88003aee83b8 ffff88002cca9fd8 0000000000015bc0 ffff88003aee8000
21:44:04 ...: [  482.520767]  0000000000015bc0 ffff88002cca9fd8 0000000000015bc0 ffff88003aee83b8
21:44:04 ...: [  482.520774] Call Trace:
21:44:04 ...: [  482.520782]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:44:04 ...: [  482.520790]  [<ffffffff812a6d50>] ? exact_match+0x0/0x10
21:44:04 ...: [  482.520797]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:44:04 ...: [  482.520804]  [<ffffffff811742c8>] __blkdev_get+0x68/0x3d0
21:44:04 ...: [  482.520810]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:44:04 ...: [  482.520816]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:44:04 ...: [  482.520822]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:44:04 ...: [  482.520829]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:44:04 ...: [  482.520837]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:44:04 ...: [  482.520843]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:44:04 ...: [  482.520850]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:44:04 ...: [  482.520857]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:44:04 ...: [  482.520864]  [<ffffffff810ff0e1>] ? lru_cache_add_lru+0x21/0x40
21:44:04 ...: [  482.520871]  [<ffffffff8111109c>] ? do_anonymous_page+0x11c/0x330
21:44:04 ...: [  482.520878]  [<ffffffff81115d5f>] ? handle_mm_fault+0x31f/0x3c0
21:44:04 ...: [  482.520885]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:44:04 ...: [  482.520891]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:44:04 ...: [  482.520897]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:44:04 ...: [  482.520905]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520053] INFO: task mdadm:1937 blocked for more than 120 seconds.
21:46:04 ...: [  602.520059] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520065] mdadm         D 00000000ffffffff     0  1937      1 0x00000000
21:46:04 ...: [  602.520075]  ffff88002ef4f5d8 0000000000000082 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520084]  ffff88002eb5b198 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5ade0
21:46:04 ...: [  602.520091]  0000000000015bc0 ffff88002ef4ffd8 0000000000015bc0 ffff88002eb5b198
21:46:04 ...: [  602.520099] Call Trace:
21:46:04 ...: [  602.520127]  [<ffffffffa0224892>] get_active_stripe+0x312/0x3f0 [raid456]
21:46:04 ...: [  602.520142]  [<ffffffff81059ae0>] ? default_wake_function+0x0/0x20
21:46:04 ...: [  602.520153]  [<ffffffffa0228413>] make_request+0x243/0x4b0 [raid456]
21:46:04 ...: [  602.520162]  [<ffffffffa0221a90>] ? release_stripe+0x50/0x70 [raid456]
21:46:04 ...: [  602.520171]  [<ffffffff81084790>] ? autoremove_wake_function+0x0/0x40
21:46:04 ...: [  602.520180]  [<ffffffff81414df0>] md_make_request+0xc0/0x130
21:46:04 ...: [  602.520187]  [<ffffffff81414df0>] ? md_make_request+0xc0/0x130
21:46:04 ...: [  602.520197]  [<ffffffff8129f8c1>] generic_make_request+0x1b1/0x4f0
21:46:04 ...: [  602.520206]  [<ffffffff810f6515>] ? mempool_alloc_slab+0x15/0x20
21:46:04 ...: [  602.520215]  [<ffffffff8116c2ec>] ? alloc_buffer_head+0x1c/0x60
21:46:04 ...: [  602.520222]  [<ffffffff8129fc80>] submit_bio+0x80/0x110
21:46:04 ...: [  602.520229]  [<ffffffff8116c849>] submit_bh+0xf9/0x140
21:46:04 ...: [  602.520237]  [<ffffffff8116f124>] block_read_full_page+0x274/0x3b0
21:46:04 ...: [  602.520244]  [<ffffffff81172c90>] ? blkdev_get_block+0x0/0x70
21:46:04 ...: [  602.520252]  [<ffffffff8110d875>] ? __inc_zone_page_state+0x35/0x40
21:46:04 ...: [  602.520259]  [<ffffffff810f46d8>] ? add_to_page_cache_locked+0xe8/0x160
21:46:04 ...: [  602.520266]  [<ffffffff81173d78>] blkdev_readpage+0x18/0x20
21:46:04 ...: [  602.520273]  [<ffffffff810f484b>] __read_cache_page+0x7b/0xe0
21:46:04 ...: [  602.520279]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520285]  [<ffffffff81173d60>] ? blkdev_readpage+0x0/0x20
21:46:04 ...: [  602.520292]  [<ffffffff810f57dc>] do_read_cache_page+0x3c/0x120
21:46:04 ...: [  602.520300]  [<ffffffff810f5909>] read_cache_page_async+0x19/0x20
21:46:04 ...: [  602.520306]  [<ffffffff810f591e>] read_cache_page+0xe/0x20
21:46:04 ...: [  602.520314]  [<ffffffff811a6cb0>] read_dev_sector+0x30/0xa0
21:46:04 ...: [  602.520321]  [<ffffffff811a7fcd>] amiga_partition+0x6d/0x460
21:46:04 ...: [  602.520328]  [<ffffffff811a7938>] check_partition+0x138/0x190
21:46:04 ...: [  602.520335]  [<ffffffff811a7a7a>] rescan_partitions+0xea/0x2f0
21:46:04 ...: [  602.520342]  [<ffffffff811744c7>] __blkdev_get+0x267/0x3d0
21:46:04 ...: [  602.520348]  [<ffffffff81174650>] ? blkdev_open+0x0/0xc0
21:46:04 ...: [  602.520354]  [<ffffffff81174640>] blkdev_get+0x10/0x20
21:46:04 ...: [  602.520359]  [<ffffffff811746c1>] blkdev_open+0x71/0xc0
21:46:04 ...: [  602.520367]  [<ffffffff811419f3>] __dentry_open+0x113/0x370
21:46:04 ...: [  602.520375]  [<ffffffff81253f8f>] ? security_inode_permission+0x1f/0x30
21:46:04 ...: [  602.520383]  [<ffffffff8114de3f>] ? inode_permission+0xaf/0xd0
21:46:04 ...: [  602.520390]  [<ffffffff81141d67>] nameidata_to_filp+0x57/0x70
21:46:04 ...: [  602.520397]  [<ffffffff8115207a>] do_filp_open+0x2da/0xba0
21:46:04 ...: [  602.520404]  [<ffffffff811134a8>] ? unmap_vmas+0x178/0x310
21:46:04 ...: [  602.520413]  [<ffffffff8115dbfa>] ? alloc_fd+0x10a/0x150
21:46:04 ...: [  602.520419]  [<ffffffff81141769>] do_sys_open+0x69/0x170
21:46:04 ...: [  602.520425]  [<ffffffff811418b0>] sys_open+0x20/0x30
21:46:04 ...: [  602.520434]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
21:46:04 ...: [  602.520443] INFO: task mdadm:2283 blocked for more than 120 seconds.
21:46:04 ...: [  602.520447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
21:46:04 ...: [  602.520451] mdadm         D 0000000000000000     0  2283   2212 0x00000000
21:46:04 ...: [  602.520460]  ffff88002cca7d98 0000000000000086 0000000000015bc0 0000000000015bc0
21:46:04 ...: [  602.520468]  ffff88002ededf78 ffff88002cca7fd8 0000000000015bc0 ffff88002ededbc0
21:46:04 ...: [  602.520475]  0000000000015bc0 ffff88002cca7fd8 0000000000015bc0 ffff88002ededf78
21:46:04 ...: [  602.520483] Call Trace:
21:46:04 ...: [  602.520492]  [<ffffffff81543a97>] __mutex_lock_slowpath+0xf7/0x180
21:46:04 ...: [  602.520500]  [<ffffffff8154397b>] mutex_lock+0x2b/0x50
21:46:04 ...: [  602.520506]  [<ffffffff8117404d>] __blkdev_put+0x3d/0x190
21:46:04 ...: [  602.520512]  [<ffffffff811741b0>] blkdev_put+0x10/0x20
21:46:04 ...: [  602.520518]  [<ffffffff811741f3>] blkdev_close+0x33/0x60
21:46:04 ...: [  602.520526]  [<ffffffff81145375>] __fput+0xf5/0x210
21:46:04 ...: [  602.520533]  [<ffffffff811454b5>] fput+0x25/0x30
21:46:04 ...: [  602.520539]  [<ffffffff811415ad>] filp_close+0x5d/0x90
21:46:04 ...: [  602.520545]  [<ffffffff81141697>] sys_close+0xb7/0x120
21:46:04 ...: [  602.520552]  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
3
BarsMonster

I was able to contact Neil Brown (THE developer), and he immediately suggested increasing stripe_cache_size to at least 2048. This resembles my previous question, where I was unable to make this setting permanent.

After I set it to 8192, the reshape continued, so the problem is solved. God bless Neil Brown :-)
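
For readers who want to try the same change, here is a minimal sketch of how the value can be written through sysfs. It assumes the array is md0 and that the kernel exposes the setting at /sys/block/md0/md/stripe_cache_size. The function names and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT only exists so the function can be tried against a scratch directory). The memory estimate uses the formula from the kernel md documentation: page size times number of disks times stripe_cache_size.

```shell
#!/bin/sh
# Sketch only: raise the md stripe cache for a RAID-5/6 array.
# SYSFS_ROOT is a hypothetical override for dry runs; on a real system it stays /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Estimated memory used by the stripe cache, in bytes:
#   page_size * nr_disks * stripe_cache_size
stripe_cache_bytes() {
    # $1 = stripe_cache_size, $2 = number of member disks, $3 = page size (default 4096)
    echo $(( ${3:-4096} * $2 * $1 ))
}

# Write a new stripe_cache_size and print the value read back.
set_stripe_cache() {
    # $1 = md device name (e.g. md0), $2 = new stripe_cache_size
    knob="$SYSFS_ROOT/block/$1/md/stripe_cache_size"
    [ -w "$knob" ] || { echo "cannot write $knob" >&2; return 1; }
    echo "$2" > "$knob"
    cat "$knob"
}
```

Run as root, `set_stripe_cache md0 8192` would apply the value that worked here, and `stripe_cache_bytes 8192 7` gives the approximate cost for a 7-disk array with 4 KiB pages (about 224 MiB). The value is not kept across reboots, so it has to be reapplied after each boot.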

2
BarsMonster

Sometimes a reshape will stall at speed=0K/sec because the backup file could not be created or was lost during processing.

The solution in this case was provided by Neil Brown in a reply to an email sent to [email protected] .

You should be able to simply stop the array and reassemble it with a different backup file and the magic flag "--invalid-backup" (mdadm 3.2 or newer required).

The backup file is only really needed in the event of a crash. Since you will be stopping the array cleanly, there will be nothing to recover when you reassemble, so --invalid-backup (which says "there is nothing in the backup file, but that's OK") is perfectly safe.

NeilBrown


For a RAID5 as device /dev/md0, with 7 disks, mounted at /mnt/data, the procedure that follows from his answer is:

All of the following commands must be run as root or equivalent.

Find any open connections to the drive:

lsof /mnt/data

Close them, or stop the services that may be interacting with it.
Commonly:

systemctl stop <SERVICE_NAME>

or

service <SERVICE_NAME> stop

Unmount, stop, then reassemble:

umount /mnt/data
mdadm --stop /dev/md0
mdadm --assemble --invalid-backup --backup-file=/root/mdadm0.bak /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1

Depending on earlier configuration, the device may remount automatically after the assemble command. If not, mount it with:

mount /dev/md0 /mnt/data

It is then safe to restart any services or connections that run from there.
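
To check whether the reshape is actually moving again after reassembly, the speed field in /proc/mdstat can be inspected. The following is a rough sketch, not part of the procedure above. The function name reshape_status is invented for illustration, and it only looks at the first reshape line it finds. A single reading of 0K/sec does not prove a hang, so it is worth sampling a few times.

```shell
#!/bin/sh
# Sketch only: classify reshape progress from mdstat-style text.
reshape_status() {
    # $1 = file to read (defaults to /proc/mdstat)
    line=$(grep -E 'reshape *=' "${1:-/proc/mdstat}" | head -n 1)
    [ -n "$line" ] || { echo "no reshape in progress"; return 0; }
    # Pull the number out of "speed=<N>K/sec"
    speed=$(printf '%s\n' "$line" | sed -n 's/.*speed=\([0-9]*\)K\/sec.*/\1/p')
    if [ "${speed:-0}" -eq 0 ]; then
        echo "stalled"
    else
        echo "running at ${speed}K/sec"
    fi
}
```

Calling `reshape_status` with no argument reads /proc/mdstat; passing a file name lets it run against a saved copy of the output.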

1
Kevin