10.07.2015 Views

Expert Oracle Exadata - Parent Directory


CHAPTER 9 RECOVERING EXADATA

)
where name='SCRATCH_DG'
and state='MOUNTED'

Our test database also detected the loss of the grid disk, as can be seen in the database alert log:

alert_SCRATCH.log
-----------------------
Tue Dec 28 08:40:54 2010
Errors in file /u01/app/oracle/diag/rdbms/scratch/SCRATCH/trace/SCRATCH_ckpt_22529.trc:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.12.5/SCRATCH_CD_05_cell03 at offset 26361217024 for data length 16384
ORA-27626: Exadata error: 201 (Generic I/O error)
WARNING: Read Failed. group:5 disk:3 AU:6285 offset:16384 size:16384
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 260 in group [5.1611847437] from disk SCRATCH_CD_05_CELL03 allocation unit 6285 reason error; if possible, will try another mirror side
NOTE: successfully read mirror side 2 of virtual extent 0 logical extent 1 of file 260 in group [5.1611847437] from disk SCRATCH_CD_05_CELL02 allocation unit 224
...
Tue Dec 28 08:40:54 2010
NOTE: disk 3 (SCRATCH_CD_05_CELL03) in group 5 (SCRATCH) is offline for reads
NOTE: disk 3 (SCRATCH_CD_05_CELL03) in group 5 (SCRATCH) is offline for writes

Notice that the database automatically switches to the mirror copy for data it can no longer read from the failed grid disk. This is ASM normal redundancy in action.

When we reinsert the disk drive, the storage cell returns the grid disk to a state of Active, and ASM brings the disk back online again.
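The alert log messages shown above follow a regular enough pattern that they can be scanned by a script. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name `summarize_alert_log` and both regular expressions are illustrative assumptions modeled only on the excerpt above. It assumes each message sits on a single line, and other Oracle versions may word these messages differently.

```python
import re

# Patterns modeled on the alert log excerpt above (an assumption: the exact
# wording may differ between Oracle versions and patch levels).
OFFLINE_RE = re.compile(
    r"NOTE: disk (\d+) \((\S+)\) in group (\d+) \((\S+)\) is offline for (reads|writes)"
)
MIRROR_OK_RE = re.compile(
    r"NOTE: successfully read mirror side (\d+) .*? from disk (\S+) allocation unit (\d+)"
)


def summarize_alert_log(lines):
    """Collect offline-disk notices and successful mirror reads.

    Returns a tuple (offline, mirror_reads):
      offline      maps (disk_group, disk_name) to the set of I/O modes
                   ("reads", "writes") reported as offline
      mirror_reads lists (disk_name, mirror_side) for each successful
                   read from an alternate mirror side
    """
    offline = {}
    mirror_reads = []
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            _disk_no, disk, _group_no, group, mode = match.groups()
            offline.setdefault((group, disk), set()).add(mode)
            continue
        match = MIRROR_OK_RE.search(line)
        if match:
            mirror_reads.append((match.group(2), int(match.group(1))))
    return offline, mirror_reads


# Sample lines copied from the excerpt above, joined onto single lines.
SAMPLE = [
    "WARNING: Read Failed. group:5 disk:3 AU:6285 offset:16384 size:16384",
    "NOTE: successfully read mirror side 2 of virtual extent 0 logical extent 1 "
    "of file 260 in group [5.1611847437] from disk SCRATCH_CD_05_CELL02 "
    "allocation unit 224",
    "NOTE: disk 3 (SCRATCH_CD_05_CELL03) in group 5 (SCRATCH) is offline for reads",
    "NOTE: disk 3 (SCRATCH_CD_05_CELL03) in group 5 (SCRATCH) is offline for writes",
]

offline, mirror_reads = summarize_alert_log(SAMPLE)
for (group, disk), modes in offline.items():
    print(f"{disk} in {group} is offline for: {', '.join(sorted(modes))}")
for disk, side in mirror_reads:
    print(f"read served from mirror side {side} on {disk}")
```

A disk reported offline for both reads and writes, together with at least one successful read from another mirror side, matches the situation described here: the grid disk is gone, but ASM redundancy is still serving the data.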
We can see that the grid disk has returned to a MOUNT_STATUS of CACHED, a HEADER_STATUS of MEMBER, and a STATE of NORMAL in the following query:

SYS:+ASM2> select dg.name, d.name, dg.state, d.mount_status, d.header_status, d.state
           from v$asm_disk d,
                v$asm_diskgroup dg
           where dg.name = 'SCRATCH'
           and dg.group_number = d.group_number
           order by 1,2;

NAME          NAME                      STATE      MOUNT_S HEADER_STATU STATE
------------- ------------------------- ---------- ------- ------------ ----------
SCRATCH       SCRATCH_CD_05_CELL01      MOUNTED    CACHED  MEMBER       NORMAL
SCRATCH       SCRATCH_CD_05_CELL02      MOUNTED    CACHED  MEMBER       NORMAL
SCRATCH       SCRATCH_CD_05_CELL03      MOUNTED    CACHED  MEMBER       NORMAL

It is likely that the disk group will need to catch up on writing data that queued up while the disk was offline. If the duration of the outage was short and the write activity was light, then the rebalance won't take long. It may even go unnoticed. But if the disk has been offline for several hours or days, or if there was a lot of write activity in the disk group during the outage, this could take a while. Generally speaking, the delay is not a problem, because it all happens in the background. During the resilvering process, ASM redundancy allows our databases to continue with no interruption to service.

If this had been an actual disk failure and we actually replaced the disk drive, we would need to wait for the RAID controller to acknowledge the new disk before it could be used. This doesn't take long but
