Monday, September 20, 2010

Error: task md#_resync:### blocked for more than ### seconds

While building a new machine on CentOS 5.5, I found the following in the log files.  It also gets spammed to the console every 2-3 minutes.

kernel: INFO: task md1_resync:434 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: md1_resync    D ffff8100bebbde30     0   434     35           437   433 (L-TLB)
kernel:  ffff8100bebbdd70 0000000000000046 0000000000000000 ffff8100bed6300c
kernel:  ffff8100bed6340c 0000000000000009 ffff8100beb9d100 ffff8100bff937a0
kernel:  00000011b743e63b 000000000005ad3b ffff8100beb9d2e8 0000000000000000
kernel: Call Trace:
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_do_sync+0x1d8/0x833
kernel:  [] enqueue_task+0x41/0x56
kernel:  [] __activate_task+0x56/0x6d
kernel:  [] dequeue_task+0x18/0x37
kernel:  [] thread_return+0x62/0xfe
kernel:  [] autoremove_wake_function+0x0/0x2e
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_thread+0xf8/0x10e
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] md_thread+0x0/0x10e
kernel:  [] kthread+0xfe/0x132
kernel:  [] child_rip+0xa/0x11
kernel:  [] keventd_create_kthread+0x0/0xc4
kernel:  [] kthread+0x0/0x132
kernel:  [] child_rip+0x0/0x11

There's a Red Hat Bugzilla entry (#18061) on the topic.  Basically, it happens because Linux software RAID is still in the process of synchronizing the freshly created arrays.  Each disk in the machine is divided into three partitions, and each partition belongs to a different mdadm RAID array.  Since the md driver is smart enough not to thrash the disks by resyncing several arrays on the same physical drives at once, any arrays which have not yet been synchronized are put into a "DELAYED" state, as /proc/mdstat shows:

Personalities : [raid1]
md0 : active raid1 sdc1[2] sdb1[1] sda1[0]
      256896 blocks [3/3] [UUU]
     
md2 : active raid1 sdc3[2] sdb3[1] sda3[0]
      455892480 blocks [3/3] [UUU]
      [=>...................]  resync =  6.1% (27871296/455892480) finish=323.7min speed=22033K/sec
     
md1 : active raid1 sdc2[2] sdb2[1] sda2[0]
      32234304 blocks [3/3] [UUU]
        resync=DELAYED
     
unused devices: <none>

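The error is basically a false positive: the md1_resync thread is simply waiting its turn in uninterruptible sleep (the "D" state in the trace above) for longer than the 120-second hung-task threshold.  No harm is being done except for the message spam on the console and in the system log files.  It will happen every time this system does the weekly array sync, however (until it gets fixed).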
