POWER SOLUTIONS



STORAGE TECHNOLOGY

Figure 1. Link level at which error recovery occurs with FC-TAPE (diagram labels: host server, host server)

Data on tape will be corrupted if any of the data blocks are written out of sequence. Once data is written incorrectly to tape, it cannot be corrected, hence the need to prevent the errors from occurring. In most cases, tape backup operations are interrupted because of an external event such as a target reset, misbehaving hardware devices, a SCSI command time-out, or a faulty link. Mechanisms are in place to detect and recover from many types of errors, and one such non-Fibre Channel mechanism is the error recovery process built into tape backup software applications.

When a data transmission fails, a well-written tape backup software application can perform some error recovery with a tape backup device. However, such error recovery is often limited by the type of error and can take several minutes. Although error recovery techniques are proprietary to each software vendor, a general process is implemented to identify the problem and attempt to recover from it. If an error is encountered, the tape software application relies on a SCSI check condition and sense data from the tape target. The tape software then identifies the current block position and compares it to the expected block position. As part of the error recovery process, tape backup software applications are designed to verify whether all data was successfully transferred to tape by reading back the last data transfer. Once this process is complete, the tape software continues with the tape backup operations.

Although error recovery can be performed effectively by tape backup software, this approach is time-consuming and can take up to 10 minutes. This may appear to be an acceptable amount of time to administrators, but it is extremely long compared to the error recovery time of an FC-TAPE device, which is typically less than five seconds.
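The block-position comparison that tape backup software performs during recovery can be illustrated with a minimal Python sketch. Because the article notes that real recovery techniques are proprietary to each vendor, everything here is a hypothetical illustration: the names `RecoveryDecision` and `compare_block_position` are invented, and the positions are assumed to be plain block counts reported by the application and the tape target.

```python
from dataclasses import dataclass


@dataclass
class RecoveryDecision:
    """Outcome of comparing the expected and reported tape positions (hypothetical type)."""
    in_sync: bool
    missing_blocks: int


def compare_block_position(expected_block: int, current_block: int) -> RecoveryDecision:
    """Compare the block position the application expects with the one the tape reports.

    Assumption: both positions are non-negative block counts. A real backup
    application would obtain the current position from the tape target after
    a SCSI check condition; that step is not modeled here.
    """
    if expected_block < 0 or current_block < 0:
        raise ValueError("block positions must be non-negative")
    missing = expected_block - current_block
    # in_sync is True only when the tape is exactly where the application expects it.
    return RecoveryDecision(in_sync=(missing == 0), missing_blocks=max(missing, 0))


decision = compare_block_position(expected_block=1000, current_block=996)
print(decision.in_sync, decision.missing_blocks)  # False 4
```

In this sketch a mismatch only reports how many blocks appear to be missing; deciding whether to read back the last transfer or abort the job is left to the (vendor-specific) application logic.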
With FC-TAPE, error detection and recovery operations are conducted at the link level (between the HBA and the tape device) and do not involve OS components such as the port driver. This approach is designed to make FC-TAPE error recovery and data retransmission more efficient than possible with tape backup software.

(Figure 1 diagram labels: Fibre Channel HBA (two), Fibre Channel–SCSI bridge, native Fibre Channel tape device, SCSI tape device)

Handling backup errors with FC-TAPE devices

The error recovery process for FC-TAPE devices occurs at the link level. During a Fibre Channel process login (PRLI) extended link service request, ports will exchange service parameters and declare the capabilities of each device. Inside the parameter field of the PRLI request are two entities related to recovery: the RETRY bit and the Confirm Completion Allowed bit. If the RETRY bit is set, this indicates that the request initiator supports retransmission of data frames, or that the target has the ability to request retransmission of data. Once the RETRY bit is set, two functions are enabled to assist in error recovery: Read Exchange Concise (REC) and Sequence Retransmission Request (SRR).

Read Exchange Concise function

The REC is an extended link service that requests additional information on an open exchange mainly based on the initiator's Source Identifier (S_ID) and the Originator Exchange Identifier (OX_ID), which are specified in the payload of the REC. A REC is transmitted by a device whenever the time-out value (REC_TOV) has expired. The REC_TOV is defined as one second more than the Error Detect time-out value (E_D_TOV). Usually, the E_D_TOV default is two seconds; therefore, the REC_TOV is set to three seconds. When the REC_TOV timer expires after three seconds, a REC is sent to the target device requesting information on the open exchange. For example, a SCSI command may take longer than three seconds to get a response and thus the REC_TOV timer would expire. If the target accepts the REC, it will respond with an accept (ACC) indicating the status of the exchange, via its payload. Otherwise, the target will reject the REC with a link service reject indicating why the REC was rejected.

Note that a failed data transmission is not the only reason a REC can be transmitted. Tape backup operations that take longer than the REC_TOV timer also cause the initiator to transmit a REC. The target will respond with an ACC indicating that the sequence is still in progress. The initiator will then transmit a REC every three seconds (the REC_TOV) until the operation is complete. Examples of this scenario include time-consuming commands such as a long erase to tape drives and media movement within libraries.

The ACC payload, in response to a REC, contains two word values indicating the current status of the exchange: the data transfer count (DTC) and the exchange status block (E_STAT). The DTC is the number of bytes successfully received by the device during a write operation, or the number of bytes transmitted by the

www.dell.com/powersolutions Reprinted from Dell Power Solutions, August 2005. Copyright © 2005 Dell Inc. All rights reserved.
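The REC timing described above (REC_TOV equal to E_D_TOV plus one second, with a REC repeated every REC_TOV while a command is still outstanding) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not driver or firmware code: the function names `rec_tov` and `rec_send_times` are hypothetical, and the sketch assumes that a response arriving exactly when the timer expires suppresses the REC.

```python
E_D_TOV_DEFAULT_S = 2.0  # usual E_D_TOV default cited in the article, in seconds


def rec_tov(e_d_tov_s: float = E_D_TOV_DEFAULT_S) -> float:
    """REC_TOV is defined as one second more than E_D_TOV."""
    return e_d_tov_s + 1.0


def rec_send_times(command_duration_s: float,
                   e_d_tov_s: float = E_D_TOV_DEFAULT_S) -> list[float]:
    """Return the times (seconds after command start) at which a REC would be sent.

    Assumption: the initiator sends a REC each time the REC_TOV timer expires
    while the command has not yet completed, and stops once it completes.
    """
    interval = rec_tov(e_d_tov_s)
    times = []
    t = interval
    while t < command_duration_s:
        times.append(t)
        t += interval
    return times


print(rec_tov())             # 3.0
print(rec_send_times(10.0))  # [3.0, 6.0, 9.0]
print(rec_send_times(2.5))   # []
```

With the default values, a command that takes 10 seconds (for example, a long erase) would trigger a REC at 3, 6, and 9 seconds, while a command that completes within 2.5 seconds would trigger none.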
