Sunday 22 May 2016

Unusual disk failures

Drives always seem to be failing or having problems, more often than I gave them credit for. I had 3 drives do this at the same time during a simple archive creation process. I wonder if I had a bad batch; then again, the more drives you have, the higher the chance that something will fail.
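As a rough illustration of that last point: assuming each drive fails independently with the same probability p over some period (a simplification, since a bad batch or a shared faulty component breaks the independence), the chance that at least one of n drives fails is 1 - (1 - p)^n. The 3% figure below is a made-up value for the sketch, not a measured failure rate.

```python
def prob_any_failure(p_single: float, n_drives: int) -> float:
    """Chance that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


# Assumption: 3% per-drive failure probability, purely illustrative.
P_SINGLE = 0.03

for n in (1, 2, 8, 24):
    print(f"{n:2d} drives: {prob_any_failure(P_SINGLE, n):.3f}")
```

The number climbs quickly with the drive count, which is the whole argument for redundancy such as raidz2 plus spares.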


        NAME                         STATE     READ WRITE CKSUM
        data                         DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c0t50004CF210AD1C22d0    ONLINE       0     0     0
            spare-1                  DEGRADED     0     0   249
              c0t50004CF210BE51F1d0  DEGRADED     0     0    70
              c4t0d0                 ONLINE       0     0     0
            spare-2                  DEGRADED     1     0     2
              c0t50004CF210BE51F3d0  UNAVAIL      0     0     0
              c4t1d0                 ONLINE       0     0     0
            c0t50004CF210BE5214d0    ONLINE       0     0     0
            c5t3d0                   ONLINE       0     0     0
            c4t3d0                   ONLINE       0     0     0
        spares
          c4t1d0                     INUSE  
          c4t0d0                     INUSE  
     
        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t500A0751F0096E9Ed0  DEGRADED     0     0   196
            c0t500A0751F0097DA7d0  ONLINE       0     0     0

I tried reading some more data and ...

        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t500A0751F0096E9Ed0  DEGRADED     0     0 1.00K
            c0t500A0751F0097DA7d0  ONLINE       0     0     0


So 2 drives are degraded due to checksum errors when reading data back, and the other drive just seems not to be powered on at all (the LED on the front is inactive). Why?
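To pick out the affected devices without eyeballing the columns, here is a minimal sketch that scans pasted `zpool status` text for rows with a non-zero READ, WRITE or CKSUM counter. The function names are my own, and I am assuming that abbreviated counters such as 1.00K use binary suffixes (so about 1024); treat the converted numbers as approximate.

```python
import re

# Assumption: zpool abbreviates large counters with binary suffixes (1.00K ~ 1024).
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# A device row has exactly five columns: NAME STATE READ WRITE CKSUM.
_ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_count(text):
    """Convert a counter such as '70' or '1.00K' to an approximate integer."""
    if text[-1] in _SUFFIXES:
        return int(float(text[:-1]) * _SUFFIXES[text[-1]])
    return int(text)


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) rows with a non-zero counter."""
    rows = []
    for line in status_text.splitlines():
        match = _ROW.match(line)
        if not match or match.group(1) == "NAME":
            continue  # skip the header and lines that are not device rows
        name, state, *counts = match.groups()
        try:
            read, write, cksum = (parse_count(c) for c in counts)
        except ValueError:
            continue  # five words, but not numeric counters
        if read or write or cksum:
            rows.append((name, state, read, write, cksum))
    return rows


SAMPLE = """\
        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t500A0751F0096E9Ed0  DEGRADED     0     0 1.00K
            c0t500A0751F0097DA7d0  ONLINE       0     0     0
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

Note that a drive that is simply absent, like the UNAVAIL one in the data pool, shows zero counters on its own row, so a check like this only catches drives that are still answering but returning bad data.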

I'll re-architect the data pool. First I'll test out autoreplace and find out how it works (I assume you simply take the disk out, put a new one in, and it's all done). It will depend on hardware support as well, so it's best to test this. I made a comment in the Oracle Community: https://community.oracle.com/message/13836284#13836284

zpool get autoreplace data
NAME  PROPERTY     VALUE  SOURCE
data  autoreplace  on     local

In the end the motherboard was simply faulty: it caught fire near some small chips beside the LSI SAS controller, next to the heat-sink.
