
April 2005

FICON Express2 Channel Performance Version 1.0

Cathy Cronin
zSeries I/O Performance
ccronin@us.ibm.com



Introduction

This white paper was developed to help IBM® field sales specialists and technical representatives understand the performance characteristics of FICON® Express2 channels.

What's New

FICON Express2 channels are a new generation of FICON channels that offer improved performance capability over previous generations of FICON Express and FICON channels. They are being introduced on the IBM eServer zSeries® 990 (z990) and zSeries 890 (z890).

Overview

IBM has made significant improvements to FICON channels since this product was initially shipped in 1999. The following chart depicts some of those improvements:

[Figure 1: FICON Express2 Channel Performance. Left bar chart: I/Os per second (thousands) for 4K block sizes with the channel 100% utilized, with bars at 1,200, 3,600, 6,000, 7,200, 9,200 and 13,000 (FICON Express2). Right bar chart: MB/sec throughput (full duplex) for large sequential reads/writes, with bars at 17, 74, 120, 170 and 270 (FICON Express2 at 2 Gbps).]

Reflected in the left bar chart are the "best can do" capabilities of each of the FICON channels in native FICON (FC) mode, measured at a point in time using an I/O driver benchmark program for 4K-byte read hits. 4K bytes is the size of most online database I/O operations. These are the maximum possible, or 100% channel utilization, 4K I/O rates for each channel.



Normally, customers should keep their channels at 50% or less channel utilization to achieve good online transaction response times.

Reflected in the right bar chart are the "best can do" capabilities of each of the FICON channels in native FICON (FC) mode, measured using an I/O driver benchmark program for 6x27K, or six half-track, reads and writes. This is representative of the type of channel programs used in disk-to-tape backup jobs or other highly sequential batch jobs. The original FICON channels run at a link speed of 1 Gigabit/second. FICON Express and FICON Express2 channels will auto-negotiate to either 1 Gigabit/s or 2 Gigabit/s, depending on the capability of the director or control unit port at the other end of the link.

As you can see, the FICON Express2 channel as first introduced on the IBM zSeries z890 and z990 represents a significant improvement in both 4K I/O per second throughput and maximum bandwidth capability compared to ESCON® and previous FICON offerings.

Please remember that this performance data was measured in a controlled environment running an I/O driver program. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.

This paper assumes that the reader is familiar with the basic benefits of FICON vs. ESCON technology. It explains in more detail the performance characteristics of FICON Express2 channels running in FC mode (native FICON), including DASD I/O driver benchmark results, CTC measurement results, and FICON Express2 channel and ESTI-M card level measurement results.

Please note that FICON Express2 channels do not support FCV (FICON Converter) mode for attachment to ESCON devices. For an introduction to the basic benefits of FICON vs. ESCON technology and for information on FCV mode performance, please see version 2 of the FICON and FICON Express Performance white paper on the zSeries I/O connectivity Web site at the following URL:

www.ibm.com/servers/eserver/zseries/connectivity/



Introduction to some terminology used in I/O processing

First, I would like to start by explaining some of the basic terms that I will be using in the rest of this paper.

[Figure 2: Some resources and terminology involved in I/O processing: zSeries CP, SAP/IOP, zSeries memory, MBA chip, ESTI link, ESTI-M card, STI link, the FICON channel card (FICON channel processor, PCI bus and fibre channel adapter), FC links, director F-ports and the CU N-port.]

As depicted in the top row of Figure 2, an I/O is initiated when a zSeries CP (Central Processor) executes a SSCH (start subchannel) instruction. This sends a signal to a SAP (System Assist Processor), also called an IOP (I/O Processor), that there is I/O work to do. It is the SAP's job to select which channel path to use to get to the device that is the target of this I/O. The SAP is also involved in processing the I/O interrupts that are sent back for most I/Os at the end of the I/O operation. Some channel programs generate PCIs (Program Controlled Interrupts), which can occur at designated points in the middle of an I/O operation.

The second row depicts the path that is followed for any data transfer that occurs during an I/O operation between the FICON channel card and zSeries memory. For a READ I/O, data is read from the device and stored into zSeries memory. For a WRITE I/O, data is fetched from zSeries memory and written to the device. There are 4 FICON Express2 channels on a FICON Express2 channel card that share a 1 GB/sec STI link connected to an ESTI-M card. Up to 4 channel cards of any type (ESCON, FICON Express or FICON Express2) can be



connected to the same ESTI-M card, and these would share a single 2 GB/sec ESTI link from the ESTI-M card to the MBA chip.

The third row depicts the path followed by commands and data frames transferred from a FICON channel to a FICON CU port. Each of the 4 FICON Express2 channels on the FICON Express2 channel card has its own PCI bus connected to an industry-standard Emulex fibre channel adapter chip, which handles the transmitting and receiving of frames across the 2 Gbps FC (fibre channel) link. The FC link can be connected point-to-point to a CU port or through a source and destination fabric port (F-port) on a director. Both the channel and the CU ports are called N-ports in the fabric. If two directors were cascaded together, the ports connecting the two directors would be called E-ports, and the link connecting the two directors is an ISL, or inter-switch link. One source of confusion that I have seen very often is the use of the term channel adapter, or even just channel, for the CU port. In this paper, when I use the term channel, I mean the chip on the card that is plugged into the zSeries CEC. It is important to understand that FICON channels and FICON CU ports can have very different performance capabilities. It is the performance capabilities of FICON Express2 channels that are presented in this paper.

In general, each of the various resources depicted above is utilized at a different level depending on the type of I/O that is being processed and the number of each resource (CPs, SAPs, MBA chips, ESTI-M cards, channel cards, director ports and CU ports) in the configuration. For the most part, with small block I/O operations, processors such as the FICON channel and the CU port are pushed to higher levels of utilization than the buses and links. In contrast, I/Os that transfer a lot of data push the buses and links to higher levels of utilization than the processors. The resource that gets pushed to the highest utilization will be the one that limits higher levels of throughput from being achieved.

FICON Express2 benchmark measurement results

To achieve maximum channel capabilities, I/O driver benchmark measurements were conducted using a configuration with 4 FICON Express2 channels on 4 different channel cards connected through three 2 Gbps directors to 4 ports on each of 6 different control unit (CU) or storage subsystem boxes, as depicted in Figure 3:



[Figure 3: Configuration used for FICON Express2 channel benchmark measurements: z990 FICON Express2 channels A1, B2, C3 and D4 connected through director(s) to CU box 1 through CU box 6.]

Please note that the response time results reported in this paper are the average of all of the LCUs (Logical Control Units) on the storage subsystems, or CU boxes, used for these measurements.

Measurements done in a point-to-point topology without directors, and/or using control units that have ports with lower I/O per second or MB/sec throughput capabilities than the FICON Express2 channels, will not push the channels to their maximum capability. Furthermore, if one is interested in determining the maximum capability of a CU port instead of a channel, then it is recommended that a configuration with multiple channels connected through a director be used to obtain the best results. An example of this is depicted in Figure 4.



[Figure 4: Recommended configuration for determining the maximum capability of a CU port for benchmark testing: multiple FICON channels (z990 FICON Express2 channels W1, X2, Y3 and Z4) connected through a FICON director to a single CU port.]

The four basic DASD I/O driver benchmark programs used to evaluate the capabilities of the new FICON Express2 channels are as follows:

1. 4K bytes per I/O: this channel program processes small blocks of I/O and is capable of achieving high I/O per second rates but much lower MB/sec rates than large block channel programs. With the appropriate read/write ratios and CU cache hit ratios, this benchmark is representative of online transaction processing workloads.

2. 6x27K bytes per I/O: this channel program processes 6 large blocks of 27K bytes each, or 6 half-tracks of data, and is capable of achieving high MB/sec but much lower I/O per second rates than the small block channel programs. It is representative of the type of channel programs used in disk-to-tape backup jobs or other highly sequential batch jobs.

3. 27K bytes per I/O: this channel program processes a single half track of data and achieves both I/O per second and MB/sec rates that are in between the extremes of the 4K and 6x27K bytes per I/O benchmarks.

4. 32x4K bytes per I/O: this channel program processes 32 small (4K byte) blocks of I/O and is representative of some DB2 pre-fetching utilities and other channel programs that process long chains of short blocks of data.

Figure 5 below shows the average of all of the LCU (Logical Control Unit) response times for the 4K read hit benchmark measurement, plotted with FICON Processor Utilization (FPU) percentages. Response times in milliseconds are on the left y-axis. FPU percentages are on the second, or right, y-axis. The knee of the response time curve occurs around 10,000 I/Os per second and just above 70% FICON Processor Utilization (FPU) when running this very simple 4K read hit benchmark workload. But most real production workloads are more complex than this simple benchmark, and in general we usually recommend that you keep FPU below 50% to achieve good online transaction response times. The 50% FPU point occurs between 6,000 and 7,000 I/Os per second when running this very simple 4K read hit benchmark workload.

[Figure 5: FICON Express2 (FEx2) 4K read hit response times and FICON processor utilization (FPU): response time in milliseconds and FPU% plotted against I/Os per second (thousands).]

Figure 6 shows the breakdown of response time components for this simple 4K read hit benchmark from 10% FICON Processor Utilization (FPU) through 70% FPU, or just before the knee of the response time curve. Both total response times and the response time components of IOSQ, PEND, DISC (disconnect) and CONN (connect) time can be found on the RMF Device Activity report. IOSQ time is the time that an I/O is delayed because another I/O from this system is already using the target device for this I/O. PEND time starts when the CP sends the I/O request to the SAP, includes the amount of time it takes for the channel to process the first few CCWs (Channel Command Words) in the channel program and send the commands to the Control Unit, and does not end until the Control Unit sends a CMR, or Command Response, back to the channel. Disconnect time is the amount of time it takes the Control Unit to service a CU cache miss and retrieve the data from the device. CONNECT time is basically the data transfer time for the I/O. So, in this 4K read hit benchmark, there is no DISC time since there are no CU cache misses, and there is no IOSQ time since the I/O driver program we use is designed to wait for an I/O to an individual device to finish before it issues another I/O to that same device. So, all we have is PEND time + CONN time. Since each I/O only transfers 4K bytes of data, CONN time is the smaller of the two components and remains relatively constant from low I/O rates to high I/O rates. On the other hand, PEND time grows as the FPU% grows. If we had used a point-to-point configuration to do this measurement, and if the CU port had less capability than the new FICON Express2 channel, then the PEND time would have grown faster as a function of the CU port processor utilization and we would not have been able to push the channel to its maximum capability.
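To make the arithmetic concrete, here is a minimal sketch (not from the original paper) of how the RMF components combine into total device response time; the sample millisecond values are hypothetical.

```python
def total_response_time_ms(iosq=0.0, pend=0.0, disc=0.0, conn=0.0):
    """Total device response time is the sum of the IOSQ, PEND, DISC and
    CONN components reported on the RMF Device Activity report (all in ms)."""
    return iosq + pend + disc + conn

# In the 4K read hit benchmark there are no cache misses (DISC = 0) and no
# device queuing (IOSQ = 0), so the total is just PEND + CONN.
print(total_response_time_ms(pend=0.5, conn=0.3))  # hypothetical values -> 0.8
```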

[Figure 6: Response time components for 4K read hits (4K bytes/IO, 100% CU cache hit ratio): PEND and CONN time in milliseconds at 10% through 70% FICON processor utilization.]

Figure 7 shows the response time components for a more realistic version of an online transaction processing workload, with a mix of reads and writes and a 70 to 80% CU cache hit ratio. In this case, disconnect time is the largest component of the total response time and the component that grows the most as the activity rate increases. PEND and CONNECT times are about equal up to the 60% FPU point. There is a more significant increase in total response time beyond the 50% FPU point than there was with the simpler 4K read benchmark.



[Figure 7: Response time components for 4K bytes/IO, 3:1 read/write ratio, 70 to 80% CU cache hit ratio: PEND, DISC and CONN time in milliseconds at 10% through 70% FICON processor utilization.]

Figures 8 and 9 represent two different ways of looking at the results of the 6x27K read hit benchmark. Figure 8 is a plot of response times and FICON processor utilization for the 6x27K read hit benchmark. Since this workload transfers over 165,000 bytes per I/O using a block size of 27K bytes, it stresses the links and buses more than it does the FICON channel processor. The maximum I/O rate achieved was only 1,100 I/Os per second, which only drove the FICON processor utilization to about 40%. Therefore, the channel processor is really not the resource that prevents this workload from achieving higher throughput. In general, for workloads that use large block sizes such as the 27K byte half-track size, it makes more sense to look at MB/sec instead of I/O per second, and at bus or link utilizations instead of processor utilizations. Figure 9 shows that this workload achieves 200 MB/sec, which is the limit of the 2 Gbps link.



[Figure 8: z990 FEx2 6x27K read hits, response times and FPU vs. I/Os per second per channel: response time in milliseconds and FICON processor utilization % plotted against 0 to 1,400 I/Os per second.]

In Figure 9, we look at the results of this same 6x27K read hit benchmark measurement with response times and FICON link utilization vs. MB/sec. FICON channel link utilizations are not directly reported by RMF but can be easily calculated by dividing the READ or WRITE MB/sec by the link capacity. In this case, with 2 Gbps links, the capacity is approximately 200 MB/sec. Here we see that the 2 Gbps link is the limit to achieving higher throughput, since it is the resource that is being pushed closest to 100% utilization. The "knee of the response time" curve generally occurs between 70 and 80% link utilization. If you are more interested in throughput than response times, then these utilization levels may be acceptable. If, however, you are running response time sensitive workloads, then it might be more appropriate to keep link utilizations below 50%.
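A minimal sketch of that link utilization calculation (the 200 MB/sec capacity for a 2 Gbps link comes from the text above; the function name and default are my own):

```python
def link_utilization_pct(mb_per_sec, link_capacity_mb_sec=200):
    """Estimate FICON link utilization from the RMF READ or WRITE MB/sec.
    A 2 Gbps link has a capacity of approximately 200 MB/sec in each direction."""
    return 100.0 * mb_per_sec / link_capacity_mb_sec

# For example, 168 READ MB/sec on a 2 Gbps link works out to 84% link utilization.
print(link_utilization_pct(168))
```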




[Figure 9: z990 FEx2 6x27K read hits, response times and link utilization vs. MB/sec per channel: response time in milliseconds and FICON link utilization % plotted against 0 to 200 MB/sec.]

Figure 10 shows the response time components for the 6x27K read hit benchmark. Since the data transfer size for this channel program is 162K bytes per I/O and it uses a large 27K block size, CONNECT time is the dominant part of the total response time. CONNECT time grows from under 2 ms at 20% channel link utilization to over 3 ms at very high (90%) link utilization levels, but these CONNECT times are still significantly better than the 10 ms measured for this benchmark a few years ago using ESCON channels. PEND time also grows a few tenths of a millisecond at high link utilizations. But at the 50% FICON Express2 channel link utilization level, total response times are only a few tenths of a millisecond higher than the best case response times for this workload.




[Figure 10: Response time components for 6x27K read hits (162K bytes/IO, 100% CU cache hit ratio): PEND and CONN time in milliseconds at 20% through 90% 2 Gbps link utilization (LU).]

Figure 11 depicts the results of the 6x27K read/write mix benchmark, where we achieve 270 MB/sec by taking advantage of the full duplex capabilities of FICON links and simultaneously processing some I/Os that READ from DASD and some I/Os that WRITE to DASD. The 270 MB/sec achieved for this benchmark using FICON Express2 channels is more than 50% higher than the maximum MB/sec that was achieved with the previous generation FICON Express channels. The Full Duplex Link Utilization (FDLU) plotted on the second y-axis in Figure 11 is calculated by dividing the sum of the READ + WRITE MB/sec by 400 MB/sec, which is the sum of the maximum instantaneous capabilities of the two directional 2 Gigabit per second (Gbps) links that exist between the FICON Express2 channel and the director port, where one 2 Gbps link transmits the commands and data frames from the channel to the director and the other 2 Gbps link transfers the commands and data frames in the opposite direction from the director to the channel.
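A minimal sketch of the FDLU calculation described above (the 400 MB/sec combined capacity is from the text; the sample READ/WRITE split is hypothetical):

```python
def full_duplex_link_utilization_pct(read_mb_sec, write_mb_sec,
                                     duplex_capacity_mb_sec=400):
    """FDLU = (READ + WRITE MB/sec) divided by the combined capacity of the
    two directional 2 Gbps links between the channel and the director port."""
    return 100.0 * (read_mb_sec + write_mb_sec) / duplex_capacity_mb_sec

# A hypothetical 135 READ + 135 WRITE MB/sec (270 MB/sec total) gives 67.5% FDLU.
print(full_duplex_link_utilization_pct(135, 135))
```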



[Figure 11: z990 FEx2 6x27K read/write mix, response times and Full Duplex Link Utilization (FDLU) vs. MB/sec per channel: response time in milliseconds and FDLU % plotted against 0 to 300 MB/sec.]

Figure 12 depicts the response time components for the 6x27K read/write mix benchmark. CONNECT time is the largest component and increases the most as full duplex link utilization increases.

[Figure 12: Response time components for the 6x27K read/write mix (162K bytes/IO, 100% CU cache hit ratio): PEND and CONN time in milliseconds at 10% through 60% full duplex link utilization (FDLU).]



Figure 13 shows the FICON Express2 channel PCI bus utilizations for this same 6x27K read/write mix benchmark. PCI bus utilizations are the bus utilizations reported on the RMF Channel Activity report for FICON Express2 channels, but there is another internal channel bus whose utilization is roughly 1.5 to 2 times the PCI bus utilization. This internal channel bus is the real resource that limits the 6x27K read/write benchmark from achieving more than 270 MB/sec, but it is highly unlikely that any real production workload would come anywhere near approaching this limit. For real production workloads, the most relevant resource limits to pay attention to are the channel and control unit processor and link limits, and these are the limits that I have highlighted for each benchmark measurement result presented in this paper.
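For a rough feel of that internal bus, a minimal sketch using the 1.5x to 2x factors mentioned above (the function and the sample PCI bus utilization are my own illustration):

```python
def internal_bus_utilization_range_pct(pci_bus_util_pct):
    """Rough range for the internal channel bus utilization, estimated as
    roughly 1.5 to 2 times the PCI bus utilization reported by RMF."""
    return (1.5 * pci_bus_util_pct, 2.0 * pci_bus_util_pct)

print(internal_bus_utilization_range_pct(40))  # hypothetical 40% PCI bus -> (60.0, 80.0)
```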

[Figure 13: z990 FEx2 6x27K read/write mix, FICON bus utilization (FBU) vs. MB/sec: FBU % plotted against total READ + WRITE MB/sec (0 to 300).]

Figures 14 and 15 represent two different ways of looking at the results of the 27K, or half-track, read hit benchmark. The first figure below is a plot of response times and FICON processor utilizations vs. I/Os per second. Here we see a sharp increase in response times just over 6,000 I/Os per second and at about 60% FICON processor utilization (FPU). The second 27K read hit graph plots response times and FICON channel link utilizations vs. MB/sec. The sharp increase in response times occurs at about 170 MB/sec, which is about 85% of the maximum link capability and indicates that the 2 Gbps link is the limiting resource for this workload. But processor utilizations are pushed to high levels as well. The 27K read hit benchmark pushes both processor and link utilizations to high levels, with link utilization slightly higher than processor utilization.



[Figure 14: FICON Express2 27K read hit, response times and channel processor utilization: response time in milliseconds and FPU% plotted against I/Os per second (thousands).]

[Figure 15: FICON Express2 27K read hit, response times and link utilizations: response time in milliseconds and link utilization % plotted against 0 to 200 MB/sec.]



Figure 16 shows the response time components for the 27K read hit benchmark. Because of the large 27K block size, CONNECT time is the largest component of the total response time. As link utilization (LU) increases, CONNECT time increases first by one tenth of a millisecond, and then PEND time also increases by one tenth of a millisecond. At 80% LU, both CONNECT time and PEND time are 0.4 ms higher than they were at 10% LU, another indicator that this workload is in the middle of the two extremes defined by processor limited workloads such as the 4K bytes per I/O benchmarks and bus or link limited workloads such as the 6x27K bytes per I/O benchmarks.

[Figure 16: Response time components for 27K read hits (27K bytes/IO, 100% CU cache hit ratio): PEND and CONN time in milliseconds at 10% through 80% 2 Gbps link utilization (LU).]

The 32x4K read hit benchmark depicted in Figure 17 is another benchmark that pushes both the processor and the link to high levels of utilization, with processor utilization slightly higher than link utilization. Figure 17 shows the response time components for the 32x4K read hit benchmark, which is a long chain of short blocks. Here, CONNECT time is the largest component, since the total data transfer size is 128K bytes per I/O even though the block size for each CCW is only 4K bytes. Since there is a separate CCW for each 4K byte block and the FICON Express2 channel processor works on each CCW separately, the processor gets pushed to high utilization levels for this workload. With this benchmark, CONNECT time starts out at a little over 2 ms at 10% FICON Processor Utilization (FPU) and increases to over 3 ms at 60% FPU, when just under 120 MB/sec is being transferred, which represents about 60% link utilization as well.



[Figure 17: Response time components for 32x4K read hits (128K bytes/IO, 100% CU cache hits): PEND and CONN time in milliseconds at 10% through 60% FICON processor utilization.]

The following table summarizes information from both the RMF Channel Activity and RMF FICON Director Activity reports at high levels of utilization for these benchmark measurements:

Channel program         FICON Express2      FICON Express2      READ MB/sec    Average FRAME
                        channel processor   channel link                       size in bytes
                        utilization         utilization
4K byte read hits       91%                 26%                 52 MB/sec      843
32x4K byte read hits    83%                 71%                 142 MB/sec     1,334
27K byte read hits      62%                 84%                 168 MB/sec     1,766
6x27K byte read hits    39%                 100%                200 MB/sec     1,967



The average frame size information from the RMF FICON Director Activity report can be used to determine if a workload is more likely to be processor or link limited. The following general rule-of-thumb can be applied (a simple classification sketch appears after this list):

If the average frame size is less than 1,000 bytes, then the workload is most likely processor limited.

If the average frame size is greater than 1,500 bytes, then the workload is most likely bus or link limited.

For workloads with average frame sizes greater than 1,000 bytes and less than 1,500 bytes, both channel processor and bus/link utilizations should be monitored.

More information about how to find these fields on the RMF Channel Activity and FICON Director Activity reports for your production workload can be found in the FICON RMF Information section of this paper.
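A minimal sketch of that rule-of-thumb (the thresholds are exactly the ones stated above; the function name is my own):

```python
def likely_limit_from_frame_size(avg_frame_bytes):
    """Classify a workload with the average frame size rule-of-thumb, using
    the average frame size from the RMF FICON Director Activity report."""
    if avg_frame_bytes < 1000:
        return "most likely channel processor limited"
    if avg_frame_bytes > 1500:
        return "most likely bus or link limited"
    return "monitor both channel processor and bus/link utilizations"

print(likely_limit_from_frame_size(843))   # 4K read hits from the table above
print(likely_limit_from_frame_size(1967))  # 6x27K read hits from the table above
```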

In summary, the performance results of 4 different DASD I/O driver benchmarks run on FICON Express2 channels were presented here. Response times and utilizations of the most pertinent channel resources for each benchmark were explained. For all of these benchmarks, the results are significantly better than previous generations of FICON, FICON Express and especially ESCON channels.



FICON Express2 CTC performance

For Channel-to-Channel (CTC) applications, the previous generation FICON Express channel was better than ESCON for all large block transfers. But for customers using CTC as a transport mechanism for small (1K bytes or less) XCF messages, ESCON CTC previously had the best response times at low activity rates. Now, as depicted in Figure 18, the new FICON Express2 CTC response times for short (1K bytes or less) XCF messages are 25 to 35% better than FICON Express and ESCON CTC response times at low activity rates. Furthermore, signals per second throughput rates at the 400 microsecond response time level for short messages across FICON Express2 CTC are 1.5 to 3 times better than FICON Express CTC and ESCON CTC link capabilities.

[Figure 18: FICON Express2 (FEx2) CTC response times with short XCF messages, better than ESCON: I/O response time in microseconds vs. signals per second (thousands) for ESCON, FEx and FEx2.]



FICON Express2 Card level performance

[Figure 19: FICON Express2 channel card configurations: 4 channels per card and 4 cards, or up to 16 channels, per ESTI-M card. zSeries memory connects through the MBA chip and a 2 GB/sec ESTI link to the ESTI-M card, which fans out over four 1 GB/sec STI links to four FICON Express2 channel cards.]

The 4 FICON Express2 channels on the same physical card all connect to a single 1 GB/sec STI link. Since 4 times the maximum capability of a single FICON Express2 channel exceeds 1 GB/sec, measurements were done to determine the maximum capability of the 1 GB/sec STI link. In Figure 20, these measurement results are compared to the previous generation FICON Express channel card, which had 2 channels per card connected to a 333 MB/sec STI link. For the FICON Express2 channel card, a maximum of 644 READ MB/sec, 651 WRITE MB/sec and 970 READ+WRITE MB/sec was measured, which represents a 2.5 to 3.5 times improvement compared to the previous generation FICON Express channel card.



[Figure 20: z990 FICON Express2 vs. FICON Express card level MB/sec comparison. FICON Express card (2 channels per card with a 333 MB/sec STI): 178, 265 and 276 MB/sec for reads, writes and the read/write mix. FICON Express2 card (4 channels per card with a 1 GB/sec STI): 644 reads, 651 writes and 970 read/write mix MB/sec, a 2.5x to 3.5x improvement at the card level.]

As depicted in Figure 19, the 4 FICON Express2 channel cards can be connected via an ESTI-M card to a single 2 GB/sec ESTI link. Since 4 times the maximum capability of a single FICON Express2 channel card exceeds 2 GB/sec, measurements were done to determine the maximum capability of a single 2 GB/sec ESTI link. As shown in Figure 21, a maximum of 1,551 READ MB/sec, 1,587 WRITE MB/sec and 1,843 READ + WRITE MB/sec was measured in this configuration using a 6x27K channel program.



[Figure 21: z990 FICON Express2 single card vs. 4 cards per 2 GB/sec STI domain. Single card: 644 reads, 651 writes and 970 read/write mix MB/sec. STI domain (4 FICON Express2 cards per 2 GB/sec STI domain): 1,551 reads, 1,587 writes and 1,843 read/write mix MB/sec, or roughly 78% of STI speed for reads, 79% for writes and 92% for the read/write mix.]

It has been my experience that the only time customers come close to pushing this level of I/O bandwidth in a single I/O domain is when running I/O driver benchmarks in a test environment. It is unlikely to be seen in a real customer production environment for several reasons. Normally the configurator will spread different types of channel cards across multiple ESTI-M cards, so that any one ESTI-M card might have a mixture of 1 ESCON card, 1 FICON Express channel card and 1 or 2 FICON Express2 channel cards. For many years, we have recommended that when configuring up to 8 channel paths per LCU, the individual channels in the path group be selected from different physical channel cards. In this way, if a particular LCU is running a high I/O bandwidth application, the MB/sec load will be spread across multiple channel cards, STI links, MBA chips and books. Furthermore, in any particular time interval, "hot spots" of activity tend to be limited to small groups of channels.

In any case, the maximum ESTI-M card capability is presented here for your awareness. To determine if you are one of the vast majority of customers that has NO reason to be concerned about this, you can take the following approach:



1. As a first pass, simply add up the READ and WRITE MB/sec from the RMF Channel Activity report for all of the channels configured on your system. If that sum is less than 1.5 GB/sec, you are done.

2. If not, then select the 16 channels with the highest MB/sec and add those up. If that sum is less than 1.5 GB/sec, you are done.

3. If not, then you need to more carefully determine which channels are plugged into which ESTI-M card (the PCHID report can help you do this) and add up the READ + WRITE MB/sec for those channels.

Again, with the exception of those customers running high I/O bandwidth benchmark tests, most customers will be able to stop at step 1. A minimal sketch of the first two passes appears below.
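The sketch below is one way to run those first two passes, assuming you have already pulled the per-channel READ + WRITE MB/sec values out of the RMF Channel Activity report; the 1.5 GB/sec threshold and the top-16 selection come from the steps above, while the data structure and sample numbers are hypothetical.

```python
ESTI_SCREEN_MB_SEC = 1500  # the 1.5 GB/sec screening threshold from the text

def esti_bandwidth_screen(channel_mb_sec):
    """channel_mb_sec maps CHPID -> total READ + WRITE MB/sec from the RMF
    Channel Activity report. Returns the step at which the check can stop."""
    total = sum(channel_mb_sec.values())
    if total < ESTI_SCREEN_MB_SEC:
        return "step 1: all channels total %.0f MB/sec, under 1.5 GB/sec, done" % total
    top16 = sum(sorted(channel_mb_sec.values(), reverse=True)[:16])
    if top16 < ESTI_SCREEN_MB_SEC:
        return "step 2: busiest 16 channels total %.0f MB/sec, done" % top16
    return "step 3: map channels to ESTI-M cards with the PCHID report and re-sum"

# Hypothetical example with a handful of lightly loaded channels.
print(esti_bandwidth_screen({"95": 119.3, "96": 80.0, "97": 45.5}))
```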



FICON RMF Information

This section of the white paper will explain the I/O performance information available on the following RMF reports:

1. Channel Path Activity report
2. Device Activity report
3. FICON Director Activity report
4. I/O Queuing Activity report

The primary RMF report of interest for FICON is the Channel Path Activity report. Figure 22 is an excerpt from this report.

C H A N N E L   P A T H   A C T I V I T Y

MODE: LPAR    CPMF: EXTENDED MODE    CSSID: 0

     CHANNEL PATH          UTILIZATION(%)          READ(MB/SEC)        WRITE(MB/SEC)
ID   TYPE   G  SHR     PART    TOTAL    BUS       PART     TOTAL      PART    TOTAL
95   FC_S   4   Y     61.11    61.11  32.56     119.34    119.34      0.00     0.00

Figure 22

FICON channels can be identified from the TYPE column; their type begins with FC:

Type FC indicates a native FICON channel.

Type FC_S indicates a native FICON channel connected to a switch or director.

Type FCV indicates a FICON bridge channel, which connects to an ESCON control unit via a bridge card in a 9032 model 5 ESCON director. FICON Express2 channels do not support FCV mode.

The ID column is the Channel Path ID, or CHPID, number. CHPID 95 is displayed in Figure 22.



The Generation (G) field tells you a combination of which generation of FICON channel is being used and the speed of the fibre channel link for this CHPID at the time the machine was IPL'd. A "4" appears in the G field for CHPID 95 in Figure 22. This means that this channel is a FICON Express2 channel with a link speed of 2 Gbps. If this channel were connected to a 1 Gbps director, then there would be a "3" in the G field. A "2" indicates a FICON Express channel with a link speed of 2 Gbps, and a "1" indicates a FICON Express channel operating at 1 Gbps.
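For quick reference, a small lookup table built from the G field values described above (the dictionary is simply my packaging of that description):

```python
# Meaning of the Generation (G) field on the RMF Channel Path Activity report.
G_FIELD = {
    1: "FICON Express channel operating at 1 Gbps",
    2: "FICON Express channel operating at 2 Gbps",
    3: "FICON Express2 channel operating at 1 Gbps",
    4: "FICON Express2 channel operating at 2 Gbps",
}

print(G_FIELD[4])  # CHPID 95 in Figure 22 reports G = 4
```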

For a given FICON channel, there are three possible entries under UTILIZATION (%):

1. PART denotes the FICON processor utilization due to this logical partition.
2. TOTAL denotes the FICON processor utilization for the sum of all the LPARs.
3. BUS denotes the FICON PCI bus utilization for the sum of all the LPARs.

The FICON processor is busy for channel program processing, which includes the processing of each individual channel command word (CCW) in the channel program, some setup activity at the beginning of the channel program, and cleanup at the end. A very precise algorithm is used for calculating zSeries FICON Express and FICON Express2 channel utilizations. This algorithm is based on monitoring the amount of time the channel processor spends doing various separate functions, and its results give a much more accurate measure of FICON processor busy time than the original algorithm based on counting command and data sequences, which is still used for 9672 G5/G6 FICON channels.

The FICON bus is busy for the actual transfer of command and data frames from the FICON channel chip to the fibre channel adapter chip, which is connected via the fibre channel link to the director or control unit. For FICON and FICON Express channels, the FICON bus is also busy when the FICON processor is polling for work to do. This is why one can see anywhere from 5 to 15% FICON bus utilization on the RMF Channel Activity report during time intervals when there are no I/Os active on those channels. The new FICON Express2 channels, however, no longer use the bus for polling, and therefore the bus utilization should be less than 1% for these channels when there are no I/Os active for an entire RMF reporting interval.

The actual FC channel processor and bus utilizations as reported by RMF will vary by workload and by channel type. As shown in Figure 22 above, FICON Express2 channels provide bandwidth information (MB/SEC) not available for ESCON channels. This is provided separately for READs and WRITEs, since the fibre channel link is full duplex, at both the logical partition level (PART) and the entire system level (TOTAL). Fibre channel link utilizations are not directly reported by RMF but can be easily calculated by dividing the



READ or WRITE MB/sec by the link capacity. Several examples of FICON Express2 channel processor, bus and link utilizations based on I/O driver benchmark measurements are displayed in Figures 5 through 17 of this paper.

With FICON Express2 channels, customers should continue to analyze their I/O activity by looking at the DASD or TAPE activity reports, just as they did with FICON, FICON Express and ESCON channels. An example of a Direct Access Device Activity report is shown in Figure 23.

Device activity report: response times and the benefit of PAVs

            D I R E C T   A C C E S S   D E V I C E   A C T I V I T Y
z/OS V1R6          SYSTEM ID xxxx          DATE 01/24/2005
                   RPT VERSION V1R5 RMF    TIME 11.09.28

                                 DEVICE    AVG   AVG   AVG  AVG   AVG   AVG   AVG
DEV    DEVICE  VOLUME PAV  LCU   ACTIVITY  RESP  IOSQ  CMR  DB    PEND  DISC  CONN
NUM    TYPE    SERIAL           RATE       TIME  TIME  DLY  DLY   TIME  TIME  TIME
4612   33903   DS3B02  1   0037   54.736    2.2   1.2  0.0  0.0    0.2   0.4   0.4
4613   33903   DS3B03  1   0037   48.996    8.7   5.3  0.0  0.0    0.2   1.8   1.4
4616   33903   DS3B06  1   0037   15.196    8.0   2.5  0.0  0.0    0.2   3.1   2.2
4617   33903   DS3B07  1   0037   20.761    9.7   3.6  0.0  0.0    0.2   3.3   2.6
461C   33903   DS3B0C  1   0037   17.189   13.6   6.6  0.0  0.0    0.2   3.8   2.9
461E   33903   DS3B0E  1   0037   41.288    9.0   4.9  0.0  0.0    0.2   2.3   1.7
               LCU         0037 1196.01     3.5   1.7  0.0  0.0    0.2   0.9   0.9

4612   33903   DS3B02  4   0037   55.669    0.5   0.0  0.0  0.0    0.2   0.1   0.2
4613   33903   DS3B03  4   0037   50.145    1.8   0.0  0.0  0.0    0.2   0.8   0.8
4616   33903   DS3B06  4   0037   13.828    8.2   0.0  0.0  0.0    0.2   4.3   3.7
4617   33903   DS3B07  4   0037   20.348    6.4   0.0  0.0  0.0    0.2   3.3   2.9
461C   33903   DS3B0C  4   0037   16.929    8.0   0.0  0.0  0.0    0.2   4.2   3.6
461E   33903   DS3B0E  4   0037   41.106    3.4   0.0  0.0  0.0    0.2   1.7   1.5
               LCU         0037 1226.54     1.7   0.0  0.0  0.0    0.2   0.7   0.8

Figure 23

Here one can examine the AVG RESP TIME and the various response time components (IOSQ, PEND, DISC and CONN times) for activity to the LCUs attached to the FICON Express2 channels. If response time is a problem, then the response time components need to be looked at. If disconnect time is a problem, then an increase in CU cache size might help. If IOSQ time is a problem, then Parallel Access Volumes might help. Figure 23 shows an example of the reduction in IOSQ time experienced on an IMS benchmark measurement when 4 PAVs were defined vs. 1. In this particular case, IOSQ time improved from an average of 1.7 ms to 0 ms for this LCU. If PEND or CONNECT times are too high, then one



can look at the FICON processor, bus and link utilizations. If any one of these utilizations is above 50%, then overuse of the FICON channel could be contributing to additional PEND and CONNECT time delays. If, on the other hand, PEND and CONNECT times are high and FICON channel utilizations are less than 50%, then overuse of a FICON director port or control unit port could be a contributing factor. If FICON channels from multiple CECs are connected to the same director destination port, then one must add up the activity from all the CECs to determine the total destination port activity. This total activity level should be less than the "knee of the curve" points depicted in the measurement results that appear in the white papers for the specific native FICON DASD or TAPE product that is being used.
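As a rough illustration of that triage, here is a minimal sketch of the decision points above; the function, its thresholds and the sample values are my own simplification, not an IBM tool.

```python
HIGH_MS = 1.0  # purely illustrative; what counts as "high" depends on the workload

def response_time_triage(iosq, disc, pend, conn, channel_util_pct):
    """Suggest where to look first. Times are RMF averages in milliseconds;
    channel_util_pct is the highest of the FICON processor, bus and link
    utilizations for the channels serving the LCU."""
    hints = []
    if iosq > HIGH_MS:
        hints.append("high IOSQ: Parallel Access Volumes (PAVs) might help")
    if disc > HIGH_MS:
        hints.append("high DISC: a larger CU cache might help")
    if pend > HIGH_MS or conn > HIGH_MS:
        if channel_util_pct > 50:
            hints.append("high PEND/CONN with channel utilization above 50%: "
                         "possible FICON channel overuse")
        else:
            hints.append("high PEND/CONN with channel utilization below 50%: "
                         "check director and control unit port loading")
    return hints

print(response_time_triage(iosq=1.7, disc=0.9, pend=0.2, conn=1.4,
                           channel_util_pct=61.1))
```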

One of the basic differences between native FICON and ESCON channel performance is the CONNECT time component of response time. Since an ESCON channel is only capable of executing one I/O at a time, the amount of time that it takes to execute the protocol plus data transfer components of CONNECT time is relatively constant from one I/O operation to the next with the same exact channel program. With FICON, however, CONNECT time can vary from one execution of a channel program to another. This is a side effect of the multiplexing capability of FICON. Since both the channel and the control unit can be concurrently executing multiple I/O operations, the individual data transfer frames of one I/O operation might get queued up behind the data transfer frames of another I/O operation. So, the CONNECT time of an I/O with FICON is dependent upon the number of I/O operations that are concurrently active on the same FICON channel, link and control unit connection. Multiplexing also means that the start and end of the CONNECT time for one native FICON I/O operation can overlap the start and end of the CONNECT time for several other native FICON I/O operations. But AVG CONN TIME for large block size transfers should be significantly less for native FICON channels than for the same transfer size on ESCON or FICON Bridge channels, due to the much faster (2 Gbps, or 200 MB/sec) link transfer speeds of native FICON vs. the 20 MB/sec link transfer speed of ESCON. Several examples of CONNECT times at various levels of FICON Express2 channel processor, bus and link utilizations are shown in the FICON Express2 benchmark measurement results displayed in Figures 5 through 17 of this paper.

Little's Law can be used to estimate the average number of open exchanges, or simultaneously active I/Os, or multiplexing level, for both a FICON channel and a control unit port for a given RMF interval. This formula is essentially a variation of the formula for calculating I/O intensity levels, which has been used for years to identify "hot spots" in an I/O configuration. I/O intensity levels are calculated by multiplying total response times by activity rates. The number of I/Os that are simultaneously active and transferring data between the channel and the control unit can be determined by multiplying the CONNECT time component of response time (in units of seconds, or milliseconds (ms) times 0.001) by the activity rate (in



units of I/Os per second). If there is only one LCU (logical control unit) connected to a single set of FICON channels, then the average number of open exchanges can be calculated by multiplying the activity rate for that LCU by the sum of the "CMR + CONN + DISC" times for that LCU, divided by the number of channels in the path group for that LCU. If there are multiple LCUs connected to a set of FICON channels, then the results of this calculation need to be summed over all these LCUs. Similarly, to determine the average number of open exchanges for a given physical CU port when there are multiple sets of channels from multiple LPARs on multiple CECs connected to the same set of CU ports, this calculation needs to be done for each LCU for each LPAR and then summed to get the total for the CU port.
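A minimal sketch of that open exchange estimate for a set of channels (the formula is the one stated above; the per-LCU record layout and the sample values are hypothetical, loosely shaped like LCU 0037 in Figure 23):

```python
def avg_open_exchanges_per_channel(lcus):
    """Estimate the average number of open exchanges on a set of FICON channels.
    Each LCU record supplies its RMF activity rate (I/Os per second), the CMR,
    CONN and DISC times in milliseconds, and the number of channel paths."""
    total = 0.0
    for lcu in lcus:
        service_sec = (lcu["cmr_ms"] + lcu["conn_ms"] + lcu["disc_ms"]) / 1000.0
        total += lcu["io_per_sec"] * service_sec / lcu["paths"]
    return total

print(avg_open_exchanges_per_channel(
    [{"io_per_sec": 1226.5, "cmr_ms": 0.0, "conn_ms": 0.8, "disc_ms": 0.7,
      "paths": 4}]))  # -> roughly 0.46 open exchanges per channel
```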

In any case, if the result of this calculation is a higher than normal value for your workload, then one must look at each of the components of the formula to determine the cause of the high number of open exchanges. AVG CMR DLY, or "command response" delay time, is a new field that has been added to the RMF Device Activity report for FICON. An example of this is displayed in Figure 23 above. AVG CMR DLY time is a subset of PEND time. As shown in Figure 24, when a channel opens a new exchange with a control unit by sending the first command in the channel program to the control unit, the control unit responds with a CMR. Architecturally, the official end of PEND time (for both FICON and ESCON) is designated by the time when the channel receives the CMR signal from the control unit.

Figure 24: FICON Command/Data Transfer (CCW = Channel Control Word, CE = Channel End, DE = Device End, CMR = Command Response). The diagram traces an I/O operation along the path cp ---> sap ---> channel ---> cu port ---> channel: the SSCH reaches the FICON Express2 channel, the channel sends CCW1 to the control unit, the control unit answers with a CMR, subsequent CCWs (CCW2, CCW3) are answered with command-end responses, and CE/DE marks completion at the device. Total PEND time covers this start-up, and CMR time, a subset of PEND time, begins when the exchange begins and ends when PEND time ends.



If the control unit is excessively busy with other I/O operations or exchanges that are already active, then this will be reflected in larger than normal AVG CMR DLY times. If DISC time is high, then the cause of a high number of average open exchanges could be low control unit cache hit ratios or contention for other internal control unit resources involved in reading or writing data from disk. Synchronous copying of data from primary DASD to secondary DASD located many kilometers away can also cause high DISCONNECT times. If CONN time is high, then the cause of a high number of open exchanges could be high channel utilization, high control unit port utilization, director port contention, long distances between the channel and the control unit, large data transfers, or the nature of the particular channel programs being executed. Channel (processor and bus) utilizations can be found on the RMF Channel Activity report. Unfortunately, control unit port utilizations are not reported directly on any RMF report. However, some information about FICON director ports that are connected to either control unit ports or channels can be found on the RMF FICON Director Activity report. An example of this report is shown in Figure 25.

RMF <strong>FICON</strong> Director Activity report<br />

F I C O N D I R E C T O R A C T I V I T Y<br />

z/OS V1R6 SYSTEM ID S08 DATE 12/01/2004<br />

RPT VERSION V1R5 RMF TIME 16.18.00<br />

IODF = 4C NO CREATION INFORMATION AVAILABLE ACT: POR<br />

SWITCH DEVICE: 00C2 SWITCH ID: ** TYPE: 006140 MODEL: 001 MAN: MCD<br />

PORT -CONNECTION- AVG FRAME AVG FRAME SIZE PORT BANDWIDTH (MB/SEC)<br />

ADDR UNIT ID PACING READ WRITE -- READ -- -- WRITE --<br />

note: channel program = 32x4K read<br />

49 CHP-H 95 0 70 1334 2.19 125.21<br />

7A CU ---- 0 1334 70 39.70 0.70<br />

7B CU ---- 0 1334 70 41.65 0.73<br />

83 CU BF00 0 1334 70 41.71 0.73<br />

compare MB/sec at CU port to max CU port capability to approximate CU<br />

port utilization and compare to CU link max MB/sec based on link speed<br />

(100MB/sec for 1Gbps or 200MB/sec for 2Gbps links) to get CU link<br />

utilization<br />

Figure 25



The first column “PORT ADDR” identifies the switch port address. The 2nd and 3rd<br />

“CONNECTION” columns identify what this switch port is connected to. The “UNIT”<br />

indicates whether it is a channel (CHP-H), a control unit port (CU) or, in the case where two

directors are cascaded, another switch port (SWITCH). The “ID” in column 3 is the CHPID<br />

number for the channel or the control unit address for the CU. The values in the “AVG<br />

FRAME PACING” column will be zero most of the time. This column is intended to display<br />

the amount of time that a frame is delayed when there are no more buffer credits available.<br />

The “AVG FRAME SIZE” columns display the average number of bytes per frame being<br />

“READ” into that director port or written out from that director port. These columns can be<br />

used to help understand if your workload is a processor or bus/link limited workload. The<br />

maximum frame size is 2K bytes. If your workload is transferring a small amount of data<br />

using small block sizes, such as the 4K bytes per I/O typically found in online transaction<br />

processing, then the average frame size will most likely be less than 1000 bytes and your<br />

workload will most likely be channel processor or control unit port processor limited. On the<br />

other hand, if your workload transfers a lot of data using large block sizes, then the average<br />

frame size will most likely be in the 1500 to 2000 byte range and your workload will most<br />

likely be channel or control unit bus or link limited. Figure 25 is an example of a workload<br />

that is in between these two extremes and has an average frame size of 1334 bytes. In this<br />

case, both processor and bus/link utilizations should be monitored.<br />
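A short Python sketch of this rule of thumb follows; the 1000- and 1500-byte thresholds are the ones quoted above, and the function name is simply illustrative.

def classify_limit(avg_frame_size_bytes):
    """Suggest which resource to watch, based on the RMF average frame size."""
    if avg_frame_size_bytes < 1000:
        return "likely channel processor or CU port processor limited"
    if avg_frame_size_bytes >= 1500:
        return "likely channel or CU bus/link limited"
    return "in between: monitor both processor and bus/link utilizations"

print(classify_limit(1334))  # the Figure 25 workload falls in the middle range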

The last two columns on this report, the “PORT BANDWIDTH (MB/SEC)” “READ” and<br />

“WRITE” columns contain the MB/sec that are being “READ” into that director port or<br />

written out from that director port. Please note that for an RMF interval in which 10 MB/sec of data is being “READ” from a device on a control unit, the 10 MB/sec value will appear in the “READ” column on the line for the director port connected to the control unit, but in the “WRITE” column for the director port connected to the channel on the RMF FICON Director Activity Report, and in the “READ(MB/SEC)” column of the channel in the RMF

<strong>Channel</strong> Activity Report. The “READs” and “WRITEs” on the <strong>FICON</strong> Director Activity<br />

report are from the perspective of the port, whereas the “READs” and “WRITEs” on the<br />

<strong>Channel</strong> Activity report are from the perspective of the higher level application. Figure 25 is<br />

an example of a benchmark measurement where about 40 MB/sec was “READ” from each of<br />

3 different control unit ports and over 120 MB/sec was written to a single channel, CHPID<br />

#95.<br />

To convert control unit port MB/sec data into control unit port utilizations, you also need to<br />

know what the maximum capability of the control unit port is for both small and large block<br />

sizes and whether your workload is a small or large block size workload. If a control unit<br />

vendor tells you or you run your own test to determine that the maximum capability of a<br />

single port on their box for 4K byte READs is 5000 I/Os per second, then this is the same as



seeing 20MB/sec in the READ MB/sec column and less than 1000 bytes in the AVG READ<br />

FRAME SIZE column for the CU port line on the RMF <strong>FICON</strong> Director Activity report. If<br />

your workload is reporting more than 10 READ MB/sec with an AVG READ FRAME SIZE<br />

less than 1000 bytes, then your workload is driving this CU port to greater than 50%<br />

utilization. Similarly, if a control unit vendor tells you or you run your own test to determine<br />

that the maximum capability of a single port on their box for half-track or 27K byte READs is<br />

about 2500 I/Os per second or about 70 MB/sec, then this is the same as seeing 70 MB/sec<br />

in the READ MB/sec column and greater than 1500 bytes in the AVG READ FRAME SIZE<br />

column for the CU port line on the RMF <strong>FICON</strong> Director Activity report. If your workload is<br />

reporting more than 35 READ MB/sec with an AVG READ FRAME SIZE greater than 1500<br />

bytes, then your workload is driving this CU port to greater than 50% utilization. Driving a<br />

CU port to greater than 50% utilization could be the cause of higher than normal CONN<br />

times which could result in higher than normal average open exchanges for that CU port or<br />

for any of the channels connected to that CU port.<br />
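A minimal Python sketch of this estimate is shown below; the per-port maximum capabilities (20 MB/sec for small-block reads, about 70 MB/sec for half-track reads) are simply the example figures quoted above and should be replaced with numbers from your control unit vendor or your own tests.

def cu_port_utilization(read_mb_sec, avg_read_frame_bytes,
                        small_block_max_mb_sec=20.0, large_block_max_mb_sec=70.0):
    """Approximate CU port utilization from FICON Director Activity fields.

    Small-frame workloads are compared against the port's small-block maximum
    (for example 5000 x 4K reads per second = 20 MB/sec); large-frame workloads
    against its large-block maximum (for example about 70 MB/sec).
    """
    if avg_read_frame_bytes < 1000:
        maximum = small_block_max_mb_sec
    else:
        maximum = large_block_max_mb_sec
    return read_mb_sec / maximum

# Roughly 40 MB/sec read with 1334-byte frames, as on the CU port lines of Figure 25
print(f"{cu_port_utilization(39.70, 1334):.0%}")  # about 57% of an assumed 70 MB/sec maximum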

If you were to ask the question, “what is an appropriate value for average open exchanges in<br />

an RMF interval for my workload?”, the answer, of course, would be “it depends on the<br />

characteristics of the workload”. The following example should illustrate this point.<br />

ACTIVITY    RESP   PEND   CMR   DISC   CONN   OPEX   CU H/R
5,634.1      1.8    0.3   0.2    1.2    0.4    2.6     88%
5,634.1      2.7    0.3   0.2    2.0    0.4    3.7     80%
5,634.1      3.7    0.3   0.2    3.0    0.4    5.1     70%
5,634.1      4.7    0.3   0.2    4.0    0.4    6.5     60%
5,634.1      5.7    0.3   0.2    5.0    0.4    7.9     50%
5,634.1      6.7    0.3   0.2    6.0    0.4    9.3     40%
5,634.1      7.7    0.3   0.2    7.0    0.4   10.7     30%

The first row of this table is taken from the RMF reports for a 15 minute interval of the LSPR<br />

OLTP-T workload measurement.<br />

ACTIVITY = I/Os per second rate.<br />

RESP = total response time for each I/O in ms.<br />

PEND = pend time.<br />

CMR = command response time, which is a subset of PEND time.<br />

DISC = disconnect time.<br />

CONN = connect time.<br />

OPEX = average number of open exchanges per channel. In this configuration, there were 4<br />

channels per LCU.<br />

CU H/R = control unit cache hit ratio.



This workload has a control unit cache hit ratio of 88% and a disconnect time of 1.2ms, which implies that it takes an average of about 10ms to resolve each CU cache miss. The rest

of the rows in the above table illustrate what the response time and average open exchanges<br />

per channel would be if instead of a CU H/R of 88%, this workload had an 80%, 70%, 60%,<br />

50%, 40% or 30% control unit cache hit ratio. For each 10% drop in CU H/R, disconnect<br />

time and total response times increase by 1ms. Average open exchanges per channel increase<br />

from 2.6 with a CU H/R of 88% to 10.7 with a CU H/R of 30%. So, if the nature of your<br />

workload is such that it has a poor CU cache hit ratio, then it is acceptable to have higher<br />

average open exchange values for this workload compared to a workload with much better<br />

CU cache hit ratios. Furthermore, adding additional channel paths to a workload with poor<br />

CU cache hit ratios is not the appropriate action to take. For this workload the channels are<br />

only 18% busy. To improve the performance of a workload with high disconnect times,<br />

attention needs to be paid to actions that will either improve the CU cache hit ratio or reduce<br />

the amount of time that it takes to resolve each CU cache miss.<br />
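The sensitivity rows in the table above can be approximated with the small Python sketch below; the 10 ms miss-resolution time is the estimate derived in the text, and the function is a model for illustration, not a measurement.

def project_open_exchanges(io_rate, pend_ms, cmr_ms, conn_ms, hit_ratio,
                           miss_penalty_ms=10.0, channels=4):
    """Project DISC time, response time and open exchanges for a cache hit ratio.

    DISC is modeled as (1 - hit ratio) * average miss resolution time, matching
    the roughly 10 ms per miss implied by 1.2 ms of DISC at an 88% hit ratio.
    """
    disc_ms = (1.0 - hit_ratio) * miss_penalty_ms
    resp_ms = pend_ms + disc_ms + conn_ms  # IOSQ assumed negligible here
    opex = io_rate * (cmr_ms + conn_ms + disc_ms) / 1000.0 / channels
    return disc_ms, resp_ms, opex

for hr in (0.88, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30):
    disc, resp, opex = project_open_exchanges(5634.1, 0.3, 0.2, 0.4, hr)
    # Small differences from the table above come from rounding in the RMF fields.
    print(f"CU H/R {hr:.0%}: DISC {disc:.1f} ms  RESP {resp:.1f} ms  OPEX {opex:.1f}")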

This is just one example of how values for average open exchanges can vary based on<br />

workload characteristics. In general, an acceptable average open exchange value should be determined for each workload based on experience of when bottom-line workload performance is or is not acceptable.

With ESCON, the additional queuing delays caused by having multiple I/Os concurrently<br />

active appear in the PEND or DISC time component of response time. If the same workload<br />

with the same activity rate and the same level of I/O concurrency is run on native <strong>FICON</strong><br />

channels instead of ESCON channels, then one could see the PEND and DISC time<br />

components of response time decrease and the CONNECT time component increase for small<br />

data transfer sizes. For large data transfers, the improved CONNECT time due to the 100<br />

MB/sec or 200 MB/sec link transfer speed will most likely offset any increased CONNECT<br />

time due to multiplexing queuing delays. Figure 26 illustrates the type of improvement in<br />

CONNECT time experienced on the z900 <strong>FICON</strong> and for <strong>FICON</strong> Express as compared with<br />

ESCON. The exact CONNECT time will, of course, vary depending on the details of the I/O<br />

configuration (type of storage system, number of devices, workload intensity, etc.). Figures 5<br />

through 17 of this paper show several examples of CONNECT times at various utilization<br />

levels for the new z990 <strong>FICON</strong> <strong>Express2</strong> channels.



Figure 26: Sample FICON vs. ESCON connect times (in ms) for large data transfer sizes. The bar chart compares connect times for ESCON, z900 FICON and FICON Express channels running 27K and 6x27K transfers (series: ESCON 27K, FICON 27K, FICON Express 27K, ESCON 6x27K, FICON 6x27K, FICON Express 6x27K).

In addition to the RMF <strong>Channel</strong> Activity, Device Activity and <strong>FICON</strong> Director Activity<br />

reports, the RMF I/O Queuing Activity report also provides information about your I/O<br />

configuration. Starting with z/OS V1R2 and RMF Release 12, several new fields were added<br />

to the I/O Queuing Activity report. Figures 27, 28 and 29 are examples of excerpts from this<br />

report. The “Initiative Queue” section of the report is the same as it has been for several<br />

years. The “IOP UTILIZATION” and the “RETRIES/SSCH” sections were added with z/OS<br />

V1R2. The “% IOP BUSY” column is the SAP utilization. The “I/O START RATE” column<br />

is the number of SSCHs per second sent from a CP to a particular SAP. The “INTERRUPT<br />

RATE” column is the number of I/O interrupts per second processed by each SAP. In<br />

general, if the channel programs being executed do not have the PCI (Program Controlled Interrupt) flag set, the total number of interrupts per second processed will be

equal to the total number of SSCHs per second processed. The “RETRIES/SSCH” section<br />

indicates the average number of times per SSCH that the SAP encountered a busy signal in<br />

the process of doing its path selection work for this I/O operation. There are four types of<br />

busies reported:<br />

1. CP busy = channel path busy,<br />

2. DP busy = director port busy<br />

3. CU busy = control unit port busy<br />

4. DV busy = device busy<br />




Each time a SAP encounters a busy and has to retry another path for a SSCH, additional SAP<br />

cycles are consumed and the %IOP busy or SAP utilization will increase. One of the benefits<br />

of native <strong>FICON</strong> is that it makes SAPs or IOPs more productive due to the reduction in<br />

busies or RETRIES/SSCH. For the same activity rate, one should see less IOP utilization %<br />

busy with native <strong>FICON</strong>, <strong>FICON</strong> Express and <strong>FICON</strong> <strong>Express2</strong> channels than with ESCON<br />

channels. One must be careful not to misinterpret IOP utilization %’s however. High IOP<br />

utilization %’s are usually an indicator of contention especially with ESCON channels,<br />

directors and control units. Adding additional IOPs will NOT help reduce channel<br />

configuration contention. One must identify the source of the configuration contention and<br />

fix it. Migrating from ESCON to native <strong>FICON</strong> configurations is a natural solution to this<br />

problem. Figures 27 and 28 represent a dramatic example of this. Figure 27 is from a z900<br />

ESCON configuration with a lot of contention. Specifically, for the time interval reported,<br />

there were a total of 4.73 retries per SSCH. 4.19 of these were channel path busies. This<br />

means that when the SAP tried to start a new I/O operation on an ESCON channel, that<br />

channel was already busy processing another I/O and the SAP had to try to find another<br />

ESCON channel path that was available for this I/O and on the average it did this 4.19 times<br />

per SSCH. This means that either the ESCON channels were operating at high utilizations or<br />

there were not enough paths per LCU defined to handle the number of I/O operations that<br />

were being issued simultaneously to the total number of LCU’s that shared the same set of<br />

ESCON paths. This can happen when there is a burst of activity during a subset of the total<br />

RMF interval, e.g. for a few minutes out of a 30 minute or longer interval. There was also an<br />

average of 0.54 director port busies per SSCH during this time interval. This means that two or more ESCON channel paths, most likely from multiple CECs in the same sysplex, were trying to connect to the same director port and control unit port at the same time; with ESCON, only one I/O operation to a given director port or CU port can be active at once. The other I/Os that attempt to use the same destination port will get DP busy signals. The rules of thumb for these statistics, which the short sketch after this list applies, are:
1. Keep SAP utilization (%IOP BUSY) below 70%.
2. AVG Q LNGTH should be less than 1.
3. Total RETRIES/SSCH should be less than 2, with the sum of DP, CU and DV busies per SSCH less than 1.
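The following Python sketch applies these rules of thumb to the system-level fields of the I/O Queuing Activity report; the parameter names are illustrative, and the thresholds are simply the ones listed above.

def check_ioq_rules(avg_q_length, pct_iop_busy, retries_per_ssch, dp_cu_dv_per_ssch):
    """Flag I/O Queuing Activity statistics that break the rules of thumb."""
    findings = []
    if pct_iop_busy >= 70.0:
        findings.append("SAP utilization (%IOP BUSY) is 70% or higher")
    if avg_q_length >= 1.0:
        findings.append("AVG Q LNGTH is 1 or higher")
    if retries_per_ssch >= 2.0:
        findings.append("total RETRIES/SSCH is 2 or higher")
    if dp_cu_dv_per_ssch >= 1.0:
        findings.append("DP + CU + DV busies per SSCH are 1 or higher")
    return findings or ["within the rules of thumb"]

# SYS line from Figure 27 below: AVG Q LNGTH 0.55, %IOP BUSY 65.15,
# 4.73 total retries/SSCH, of which 0.54 are DP busies (CU and DV are 0).
print(check_ioq_rules(0.55, 65.15, 4.73, 0.54))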



I/O Q U E U I N G A C T I V I T Y<br />

RPT VERSION V1R2 RMF<br />

Figure 27: from a z900 ESCON configuration with a lot of contention

- INITIATIVE QUEUE - ------- IOP UTILIZATION -------<br />

IOP ACTIVITY AVG Q % IOP I/O START INTERRUPT<br />

RATE LNGTH BUSY RATE RATE<br />

00 2745.205 0.77 68.02 2745.181 3684.715<br />

01 3236.994 0.11 53.70 3236.990 3566.626<br />

02 3067.562 0.82 73.73 3067.292 3262.451<br />

SYS 9049.758 0.55 65.15 9049.461 10513.79<br />

-------- RETRIES / SSCH ---------
IOP          CP      DP      CU      DV
      ALL    BUSY    BUSY    BUSY    BUSY
00    4.80   4.17    0.62    0.00    0.00
01    2.92   2.60    0.31    0.00    0.00
02    6.58   5.88    0.69    0.00    0.00
SYS   4.73   4.19    0.54    0.00    0.00

rules of thumb: avg q lngth < 1, %IOP busy < 70%, retries/ssch < 1 or 2

Figure 28 shows the dramatic improvements in effective SAP capacity after the migration<br />

from the z900 with all ESCON channels to the z990 with most of the I/O activity occurring<br />

on <strong>FICON</strong> channels. The number of RETRIES/SSCH went from 4.73 to 0.21 and average<br />

SAP utilizations dropped from over 65% to under 20%, resulting in a 3x improvement in<br />

effective SAP capacity. Improvements like this are not typical, however, and would be much less dramatic if the original ESCON configuration had been better tuned, to the point where RETRIES/SSCH was less than 1.



- - INITIATIVE QUEUE - ------- IOP UTILIZATION -------<br />

IOP ACTIVITY AVG Q % IOP I/O START INTERRUPT<br />

RATE LNGTH BUSY RATE RATE<br />

00 3424.947 0.01 13.53 3424.922 3374.204<br />

01 1969.652 0.00 5.02 1969.651 1921.234<br />

02 401.022 0.00 2.75 400.995 591.365

03 4950.215 0.02 35.58 4950.211 5147.980<br />

SYS 10745.84 0.01 14.22 10745.78 11034.79<br />

significant reduction in retries ... 3x improvement in effective SAP capacity

Figure 28: after migration to z990 and FICON (+ESCON)

-------- RETRIES / SSCH ---------
IOP          CP      DP      CU      DV
      ALL    BUSY    BUSY    BUSY    BUSY
00    0.15   0.14    0.00    0.01    0.00
01    0.25   0.24    0.00    0.01    0.00
02    0.23   0.22    0.00    0.01    0.00
03    0.24   0.16    0.07    0.01    0.00
SYS   0.21   0.17    0.03    0.01    0.00

Figures 27 and 28 show the average RETRIES/SSCH at the overall I/O configuration level.<br />

To identify which part of the overall I/O configuration is experiencing contention, one needs<br />

to look at the LCU section of the I/O Queuing activity report. An example of this is displayed<br />

in Figure 29. The first column is the LCU id and the 2nd column is the CU id. In the 3rd<br />

column is a list of the channel path ids for this LCU. Up to 8 channel paths can be defined<br />

per LCU. In Figure 29, 6 channel paths are defined for LCU 0222. The “CHPID TAKEN”<br />

column is the equivalent of an activity rate. It is the number of SSCHs per second that were<br />

executed on the channel paths defined for this LCU. The %DP BUSY column is the % of<br />

times that the SAP encountered a busy signal at an ESCON director port when attempting to<br />

select this path for a new SSCH. %DP BUSY will be 0 for native <strong>FICON</strong> due to the<br />

elimination of destination port busy signals with native <strong>FICON</strong> packet-switched directors.<br />

%CU BUSY should also be 0 for native <strong>FICON</strong> in most customer production environments.<br />

CU busies will only occur with native <strong>FICON</strong> when an individual CU port is being overloaded<br />

with work from many different <strong>FICON</strong> channels simultaneously. The high % CU BUSY (15%<br />

for path 03 & 14% for path 06) in Figure 29 is an example of <strong>FICON</strong> CU port contention.<br />

Further evidence of this contention is the high AVG CMR DLY times for these channel paths<br />

and the low CHPID TAKEN values or activity rates for channel paths 03 & 06 in comparison<br />

to the other channel paths defined for this LCU. The AVG CMR DLY of 203ms for channel<br />

path 03 and 207ms for channel path 06 indicates that the CU ports that are connected to<br />

these channel paths are taking a very long time to respond to the new SSCH work that the<br />

channel is trying to send to them. In contrast, the CU ports that are connected to channel



paths 01, 02, 04 and 05 are responding on average in about 0.5ms. In this case, if no errors<br />

were made in the IOCDS, then some “tuning” of the configuration is necessary to reduce<br />

these CU busies and achieve better response time results. Contention due to CU busies<br />

results in higher than normal PEND times and contributes to a higher than normal average<br />

number of open exchanges for this LCU. The solution to this is to identify the source of<br />

contention at the CU ports connected to channel paths 03 and 06 in this example and fix it.<br />

I/O Q U E U I N G A C T I V I T Y (from a configuration with FICON CU port contention causing high pend times)

LCU CONTROL UNITS<br />

0222 1000<br />

AVG AVG<br />

CHAN CHPID % DP % CU CUB CMR<br />

PATHS TAKEN BUSY BUSY DLY DLY<br />

01 89.256 0.00 0.00 0.0 0.5<br />

02 86.348 0.00 0.00 0.0 0.5<br />

03 1.908 0.00 15.08 0.0 203<br />

04 89.644 0.00 0.00 0.0 0.5<br />

05 86.055 0.00 0.00 0.0 0.5<br />

06 2.132 0.00 13.95 0.0 207<br />

* 355.34 0.00 0.19 0.0 2.9<br />

note: CU busies and high CMR delay times are NOT normal for native FICON ... they indicate CU port contention

Figure 29<br />

For <strong>FICON</strong> channels it is also possible to estimate the average number of bytes transferred<br />

per SSCH by dividing the MB/sec of a <strong>FICON</strong> channel from the <strong>Channel</strong> Path Activity report<br />

by the total SSCH/sec processed by a <strong>FICON</strong> channel from the I/O Queuing Activity report.<br />

The total SSCH/sec processed by a <strong>FICON</strong> channel can be determined by adding up all of the<br />

“CHPID TAKEN” fields on the I/O Queuing Activity report for each LCU that a single FICON

channel is connected to. If the average data transfer sizes of your channel programs are<br />

greater than 27K bytes, then your workload is most likely pushing the channel and CU port<br />

buses and links to higher levels of utilization than other resources and you should focus on<br />

the MB/sec fields on your RMF <strong>Channel</strong> Activity and <strong>FICON</strong> Director Activity Reports and<br />

compare these to the maximum capability of the <strong>FICON</strong> channels, CU ports and links used in<br />

your configuration.<br />
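This division can be sketched in a few lines of Python; the sample rates below are invented for illustration, and in practice both inputs come from the same RMF interval.

def avg_bytes_per_ssch(channel_mb_sec, chpid_taken_rates):
    """Estimate the average data transfer size per SSCH for one FICON channel.

    channel_mb_sec: total MB/sec for the CHPID from the Channel Path Activity report.
    chpid_taken_rates: CHPID TAKEN (SSCH/sec) values for this CHPID from every
    LCU it serves on the I/O Queuing Activity report.
    """
    total_ssch_per_sec = sum(chpid_taken_rates)
    return (channel_mb_sec * 1_000_000) / total_ssch_per_sec

# Hypothetical example: a channel moving 120 MB/sec across three LCUs
size = avg_bytes_per_ssch(120.0, [1500.0, 1200.0, 900.0])
print(f"about {size / 1024:.0f}K bytes per SSCH")  # over 27K, so likely bus/link limited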

In summary, the basics of performance analysis do not change with a <strong>FICON</strong> configuration<br />

versus an ESCON configuration. In both environments, an appropriate technique to use is to<br />

first calculate I/O intensities, which equal I/O rate multiplied by response time. This

analysis can be done at a device volume level, an LCU level, a physical CU box level or for a<br />

group of channels. The parts of the total I/O configuration that have the highest I/O



intensities are the “hot spots” of the configuration. These are the areas where configuration<br />

tuning has the potential for yielding the highest benefit. As explained above, the individual<br />

components of response time (IOSQ, DISC, PEND and CONN) will tell you where you should<br />

focus your efforts. The average open exchange calculation is a subset of the I/O intensity<br />

calculation that uses the DISC + CONN + CMR components of response time. Except in<br />

cases of extremely low control unit cache hit ratios, the open exchange limit is not the cause<br />

of high values of average open exchanges. Instead, high values for average open exchanges

are most likely the result of driving either the channels or the control unit to high levels of<br />

utilization. Tuning efforts need to be focused on the appropriate areas based on the DISC,<br />

CONN and CMR components of workload response times. If the <strong>FICON</strong> channel processor<br />

and bus utilizations as reported on the RMF <strong>Channel</strong> Activity report and link utilizations<br />

calculated from the MB/sec info are less than 50%, then the tuning efforts need to focus on<br />

the control units in the configuration.<br />
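As a sketch of the hot-spot technique summarized above, the following Python fragment ranks parts of a configuration by I/O intensity; all of the sample names and numbers are invented for illustration.

def rank_by_io_intensity(components):
    """Rank configuration components by I/O intensity = I/O rate x response time.

    components: mapping of name -> (I/Os per second, total response time in ms).
    """
    intensity = {name: rate * resp_ms for name, (rate, resp_ms) in components.items()}
    return sorted(intensity.items(), key=lambda item: item[1], reverse=True)

# Invented LCU-level example; the hottest entries are the tuning candidates.
sample = {"LCU 0100": (2500.0, 1.9), "LCU 0101": (900.0, 6.5), "LCU 0102": (350.0, 8.0)}
for name, value in rank_by_io_intensity(sample):
    print(f"{name}: intensity {value:.0f}")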

The basic architecture and design differences between <strong>FICON</strong> and ESCON resulted in many<br />

changes to the performance data that appear on RMF reports. Additional information in the<br />

form of <strong>FICON</strong> processor and bus utilizations, READ and WRITE MB/sec, AVG FRAME<br />

SIZE and AVG CMR DLY is provided to help analyze the multiplexing capability of <strong>FICON</strong>.<br />

Since ESCON is only capable of executing one I/O operation at a time, RMF reports the time<br />

that the entire CHPID path is busy for ESCON channel utilization. With <strong>FICON</strong>, we must<br />

consider the individual components of the total CHPID path such as the <strong>FICON</strong> channel<br />

processor and bus, the fibre link, the director destination port and the control unit port<br />

adapter microprocessor, bus and link. The charts and examples provided in this paper<br />

should help guide you in assessing the maximum capability of <strong>FICON</strong> <strong>Express2</strong> channels for<br />

your workload.



Conclusion<br />

The zSeries <strong>FICON</strong> <strong>Express2</strong> channels available on the z990 and z890 offer many benefits<br />

over ESCON and previous generations of <strong>FICON</strong> channels. The increased throughput and<br />

bandwidth capabilities of these channels offer the opportunity for improved performance with<br />

simpler configurations and reduced infrastructure over longer distances to meet the needs of<br />

future datacenter growth, including backup and disaster recovery requirements. The total native FICON solution – DASD, tape and printer attachments, directors, and the new and improved FICON Express2 channels – is available and ready for your installation.

Additional <strong>FICON</strong> product information is available on the <strong>IBM</strong> System Sales Web site and<br />

the zSeries I/O connectivity Web site at<br />

www.ibm.com/servers/eserver/zseries/connectivity/.<br />

Acknowledgements<br />

The data presented in this paper is based upon measurements carried out over several years<br />

using a mixture of <strong>IBM</strong> internal tools and non-<strong>IBM</strong> I/O driver programs, specifically <strong>Version</strong><br />

13 of the PAI/O Driver for z/OS. I would like to thank all of the reviewers of this paper for<br />

their helpful comments. Special thanks go to Mario Borelli for his continued support on this<br />

effort.



Copyright <strong>IBM</strong> Corporation 2005<br />

<strong>IBM</strong> Corporation<br />

Marketing Communications, Server Group<br />

Route 100<br />

Somers, NY 10589<br />

U.S.A.<br />

Produced in the United States of America<br />

04/05<br />

All Rights Reserved<br />

<strong>IBM</strong>, <strong>IBM</strong> eServer, <strong>IBM</strong> logo, ESCON, <strong>FICON</strong>, RMF, and zSeries are trademarks or registered trademarks of<br />

International Business Machines Corporation of the United States, other countries or both.<br />

Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States,<br />

other countries or both.<br />

Linux is a registered trademark of Linus Torvalds<br />

ON (LOGO) DEMAND BUSINESS is a trademark of International Business Machines Corporation.<br />

PAI/O is a trademark of <strong>Performance</strong> Associates, Inc.<br />

UNIX is a registered trademark of The Open Group in the United States and other countries.<br />

Intel is a trademark of Intel Corporation in the United States, other countries or both.<br />

Other company, product and service names may be trademarks or service marks of others.<br />

Information concerning non-<strong>IBM</strong> products was obtained from the suppliers of their products or their published<br />

announcements. Questions on the capabilities of the non-<strong>IBM</strong> products should be addressed with the suppliers.<br />

<strong>IBM</strong> hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our<br />

warranty terms apply.<br />

<strong>IBM</strong> may not offer the products, services or features discussed in this document in other countries, and the<br />

information may be subject to change without notice. Consult your local <strong>IBM</strong> business contact for information on<br />

the product or services available in your area.<br />

All statements regarding <strong>IBM</strong>’s future direction and intent are subject to change or withdrawal without notice, and<br />

represent goals and objectives only.<br />

<strong>Performance</strong> is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard<br />

<strong>IBM</strong> benchmarks in a controlled environment. The actual throughput that any user will experience will vary<br />

depending upon considerations such as the amount of multiprogramming in the user’s job stream, the<br />

I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given<br />

that an individual user will achieve throughput improvements equivalent to the performance ratios<br />

stated here.<br />

GM13-0702-00
