FICON Express2 Channel Performance Version 1.0 - IBM
April 2005<br />
<strong>FICON</strong> <strong>Express2</strong> <strong>Channel</strong><br />
<strong>Performance</strong> <strong>Version</strong> <strong>1.0</strong><br />
Cathy Cronin<br />
zSeries I/O <strong>Performance</strong><br />
ccronin@us.ibm.com
<strong>FICON</strong> <strong>Express2</strong> <strong>Channel</strong> <strong>Performance</strong> <strong>Version</strong> <strong>1.0</strong><br />
Page 1<br />
Introduction<br />
This white paper was developed to help <strong>IBM</strong> ® field sales specialists and technical<br />
representatives understand the performance characteristics of <strong>FICON</strong> ® <strong>Express2</strong> channels.<br />
What’s New<br />
<strong>FICON</strong> <strong>Express2</strong> channels are a new generation of <strong>FICON</strong> channels that offer improved<br />
performance capability over previous generations of <strong>FICON</strong> Express and <strong>FICON</strong> channels.<br />
They are being introduced on the <strong>IBM</strong> eServer zSeries ® 990 (z990) and zSeries 890 (z890).<br />
Overview<br />
<strong>IBM</strong> has made significant improvements to <strong>FICON</strong> channels since this product was initially<br />
shipped in 1999. The following chart depicts some of those improvements:<br />
[Figure 1: <strong>FICON</strong> <strong>Express2</strong> <strong>Channel</strong> <strong>Performance</strong>. Left bar chart: I/Os per second (k), 4K block sizes, channel 100% utilized; values of 1,200, 3,600, 6,000, 7,200, and 9,200 for earlier channels, and 13,000 for <strong>FICON</strong> <strong>Express2</strong>. Right bar chart: MB/sec throughput (full duplex), large sequential reads/writes; values of 17, 74, 120, and 170 for earlier channels, and 270 for <strong>FICON</strong> <strong>Express2</strong> at 2 Gbps.]<br />
The left bar chart reflects the "best can do" capabilities of each of the <strong>FICON</strong> channels<br />
in native <strong>FICON</strong> or FC mode, measured at a point in time using an I/O driver benchmark<br />
program for 4K byte read hits. 4K bytes is the size of most online database I/O operations.<br />
These are the maximum possible, or 100% channel utilization, 4K I/O rates for each channel.<br />
Normally customers should keep their channels at 50% or less channel utilization to achieve<br />
good online transaction response times.<br />
The right bar chart reflects the "best can do" capabilities of each of the <strong>FICON</strong><br />
channels in native <strong>FICON</strong> or FC mode, measured using an I/O driver benchmark program for<br />
6x27K, or six half-track, reads and writes. This is representative of the type of channel<br />
programs used in disk-to-tape backup jobs or other highly sequential batch jobs. The original<br />
<strong>FICON</strong> channels run at a link speed of 1 Gigabit/second. <strong>FICON</strong> Express and <strong>FICON</strong><br />
<strong>Express2</strong> channels auto-negotiate to either 1 Gbps or 2 Gbps, depending on the<br />
capability of the director or control unit port at the other end of the link.<br />
As you can see, the <strong>FICON</strong> <strong>Express2</strong> channel as first introduced on the <strong>IBM</strong> zSeries z890 and<br />
z990 represents a significant improvement in both 4K I/O per second throughput and<br />
maximum bandwidth capability compared to ESCON ® and previous <strong>FICON</strong> offerings.<br />
Please remember that this performance data was measured in a controlled environment<br />
running an I/O driver program. The actual throughput or performance that any user will<br />
experience will vary depending upon considerations such as the amount of<br />
multiprogramming in the user’s job stream, the I/O configuration, the storage configuration,<br />
and the workload processed.<br />
This paper assumes that the reader is familiar with the basic benefits of <strong>FICON</strong> vs. ESCON<br />
technology. It explains in more detail the performance characteristics of <strong>FICON</strong><br />
<strong>Express2</strong> channels running in FC mode (native <strong>FICON</strong>), including DASD I/O driver<br />
benchmark results, CTC measurement results, and <strong>FICON</strong> <strong>Express2</strong> channel- and ESTI-M<br />
card-level measurement results.<br />
Please note that <strong>FICON</strong> <strong>Express2</strong> channels do not support FCV (<strong>FICON</strong> Converter) mode for<br />
attachment to ESCON devices. For an introduction to the basic benefits of <strong>FICON</strong> vs.<br />
ESCON technology and for info on FCV mode performance, please see version 2 of the<br />
<strong>FICON</strong> and <strong>FICON</strong> Express <strong>Performance</strong> white paper on the zSeries I/O connectivity Web<br />
site at the following URL:<br />
www.ibm.com/servers/eserver/zseries/connectivity/
Introduction to some terminology used in I/O processing<br />
I will start by explaining some of the basic terms used in the rest of this paper.<br />
[Figure 2: Some resources and terminology involved in I/O processing. Top row: a zSeries CP issues an SSCH to the zSeries SAP/IOP, and an I/O interrupt is returned at completion. Middle row: data transfer between the <strong>FICON</strong> channel card and zSeries memory (store/fetch) flows over an STI link to the ESTI-M card and an ESTI link to the MBA chip. Bottom row: the <strong>FICON</strong> channel processor, via its PCI bus and fibre channel adapter, sends frames across an FC link through director F-ports to the CU N-port.]<br />
As depicted in the top row of Figure 2, an I/O is initiated when a zSeries CP (Central<br />
Processor) executes a SSCH (start subchannel) instruction. This sends a signal to a SAP<br />
(System Assist Processor), which is also called an IOP ( I/O Processor) that there is I/O work<br />
to do. It is the SAP’s job to select which channel path to use to get to the device which is the<br />
target of this I/O. The SAP is also involved in processing the I/O interrupts that are sent back<br />
for most I/Os at the end of the I/O operation. Some channel programs generate PCIs<br />
(Program-Controlled Interruptions), which can occur at designated points in the middle of an<br />
I/O operation.<br />
The second row depicts the path that is followed for any data transfer that occurs during an<br />
I/O operation between the <strong>FICON</strong> channel card and zSeries memory. For a READ I/O, data<br />
is READ from the device and stored into zSeries memory. For a WRITE I/O, data is fetched<br />
from zSeries memory and written to the device. There are 4 <strong>FICON</strong> <strong>Express2</strong> channels on a<br />
<strong>FICON</strong> <strong>Express2</strong> channel card that share a 1GB/sec STI link connected to an ESTI-M card.<br />
Up to 4 channel cards of any type (ESCON, <strong>FICON</strong> Express or <strong>FICON</strong> <strong>Express2</strong>) can be
connected to the same ESTI-M card and these would share a single 2GB/sec ESTI link from<br />
the ESTI-M card to the MBA chip.<br />
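A rough sketch of the shared-bus arithmetic above, treating 1 GB/sec as 1000 MB/sec; the equal-share calculation is my own simplification for illustration, not how the hardware actually arbitrates bandwidth:

```python
# Bus figures from the text: four FICON Express2 channels on a card share a
# 1 GB/sec STI link to an ESTI-M card, and up to four channel cards share a
# 2 GB/sec ESTI link from the ESTI-M card to the MBA chip.

STI_MB_SEC = 1000.0    # STI link shared by the 4 channels on one card
ESTI_MB_SEC = 2000.0   # ESTI link shared by up to 4 channel cards

def fair_share_per_channel(channels_per_card: int = 4, cards_per_esti: int = 4) -> float:
    """MB/sec available per channel if every channel drives its buses equally."""
    sti_share = STI_MB_SEC / channels_per_card
    esti_share = ESTI_MB_SEC / (channels_per_card * cards_per_esti)
    return min(sti_share, esti_share)

# With 4 fully populated cards, the ESTI link is the tighter per-channel share.
print(fair_share_per_channel())  # 125.0
```

In practice channels rarely drive their buses simultaneously at full rate, so this is a worst-case share, not a per-channel limit.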
The third row depicts the path followed by commands and data frames transferred from a<br />
<strong>FICON</strong> channel to a <strong>FICON</strong> CU port. Each of the 4 <strong>FICON</strong> <strong>Express2</strong> channels on the <strong>FICON</strong><br />
<strong>Express2</strong> channel card has its own PCI bus connected to an industry standard Emulex fibre<br />
channel adapter chip which handles the transmitting and receiving of frames across the<br />
2Gbps FC (fibre channel) link. The FC link could be connected point-to-point to a CU port<br />
or through a source and destination Fabric port (f-port) on a director. Both the channel and<br />
the CU ports are called N-ports in the Fabric. If two directors were cascaded together the<br />
ports connecting the two directors would be called E-ports and the link connecting the two<br />
directors is an ISL, or inter-switch link. A frequent source of confusion is the use of the term<br />
channel adapter, or even just channel, for the CU port. In this paper, when I use the term<br />
channel, I mean the chip on the card that is plugged into the zSeries CEC. It is<br />
important to understand that <strong>FICON</strong> channels and <strong>FICON</strong> CU ports can have very different<br />
performance capabilities. It is the performance capabilities of <strong>FICON</strong> <strong>Express2</strong> channels that<br />
are presented in this paper.<br />
In general, each of the various resources depicted above is utilized at a different level<br />
depending on the type of I/O that is being processed and the numbers of each resource (CPs,<br />
SAPs, MBA chips, ESTI-M cards, channel cards, director ports and CU ports) that are in the<br />
configuration. For the most part, with small block I/O operations, processors such as the<br />
<strong>FICON</strong> channel and the CU port are pushed to higher levels of utilization than the buses and<br />
links. In contrast, I/Os that transfer a lot of data push the buses and links to higher levels of<br />
utilization than the processors. The resource that gets pushed to the highest utilization will<br />
be the one that limits higher levels of throughput from being achieved.<br />
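The bottleneck logic above can be sketched as a toy model. The capacity figures (roughly 13,000 4K I/Os per second at 100% processor utilization, and roughly 200 MB/sec per 2 Gbps link) are taken from this paper's benchmark numbers; the linear utilization model itself is my own simplification:

```python
# Illustrative sketch (not IBM's model): given an I/O rate and bytes per I/O,
# estimate which resource -- channel processor or 2 Gbps link -- is pushed to
# the higher utilization.

MAX_IO_PER_SEC = 13000   # approx. FICON Express2 processor limit (4K read hits)
LINK_MB_PER_SEC = 200    # approx. one-directional 2 Gbps link capacity

def limiting_resource(io_per_sec: float, bytes_per_io: int) -> str:
    """Return which resource this workload stresses more under the toy model."""
    processor_util = io_per_sec / MAX_IO_PER_SEC
    link_util = (io_per_sec * bytes_per_io / 1_000_000) / LINK_MB_PER_SEC
    return "processor" if processor_util > link_util else "bus/link"

# Small-block I/O stresses the channel processor...
print(limiting_resource(6000, 4096))     # "processor"
# ...while large transfers stress the link.
print(limiting_resource(1000, 162000))   # "bus/link"
```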
<strong>FICON</strong> <strong>Express2</strong> benchmark measurement results<br />
To achieve maximum channel capabilities, I/O driver benchmark measurements were<br />
conducted using a configuration with 4 <strong>FICON</strong> <strong>Express2</strong> channels on 4 different channel<br />
cards connected through three 2Gbps directors to 4 ports on each of 6 different control unit<br />
(CU) or storage subsystem boxes as depicted in Figure 3:
[Figure 3: Configuration used for <strong>FICON</strong> <strong>Express2</strong> channel benchmark measurements: a z990 with <strong>FICON</strong> <strong>Express2</strong> channels A1, B2, C3, and D4 connected through director(s) to CU boxes 1 through 6.]<br />
Please note that the response time results reported in this paper are the average of all of the<br />
LCUs (Logical Control Units) on the storage subsystems or CU boxes used for these<br />
measurements.<br />
Measurements done in a point-to-point topology without directors and/or using control units<br />
that have ports with less I/O per second or MB/sec throughput capabilities than the <strong>FICON</strong><br />
<strong>Express2</strong> channels will not push the channels to their maximum capability. Furthermore, if<br />
one is interested in determining the maximum capability of a CU port instead of a channel,<br />
then it is recommended that a configuration with multiple channels connected through a<br />
director be used to obtain the best results. An example of this is depicted in Figure 4.
[Figure 4: Recommended configuration for determining the maximum capability of a CU port for benchmark testing: multiple <strong>FICON</strong> channels (z990 <strong>FICON</strong> <strong>Express2</strong> channels W1, X2, Y3, and Z4) connected through a <strong>FICON</strong> director to a single CU port on one CU box.]<br />
The four basic DASD I/O driver benchmark programs used to evaluate the capabilities of the<br />
new <strong>FICON</strong> <strong>Express2</strong> channels are as follows:<br />
1. 4K bytes per I/O: this channel program processes small blocks of I/O and is capable of<br />
achieving high I/O per second rates but much lower MB/sec rates than large block<br />
channel programs. With the appropriate read/write ratios and CU cache hit ratios, this<br />
benchmark is representative of online transaction processing workloads.<br />
2. 6x27K bytes per I/O: this channel program processes 6 large blocks with 27K bytes each<br />
or 6 half-tracks of data and is capable of achieving high MB/sec but much lower I/O per<br />
second than the small block channel programs. It is representative of the type of channel<br />
programs used in disk-to-tape backup jobs or other highly sequential batch jobs.<br />
3. 27K bytes per I/O: this channel program processes a single half track of data and achieves<br />
both I/O per second and MB/sec that are in between the extremes of the 4K and 6x27K<br />
bytes per I/O benchmarks.<br />
4. 32x4K bytes per I/O: this channel program processes 32 small (4K byte) blocks of I/O and<br />
is representative of some DB2 pre-fetching utilities and other channel programs that<br />
process long chains of short blocks of data.<br />
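Assuming the byte counts listed above (4K, 128K, 27K, and 162K per I/O), the throughput each channel program implies at a given I/O rate can be sketched as follows; the dictionary labels and the helper are illustrative, not part of the benchmark tooling:

```python
# Bytes transferred per I/O for the four DASD benchmark channel programs.
BENCHMARKS = {
    "4K read hit":    4 * 1024,        # single small block
    "32x4K":          32 * 4 * 1024,   # long chain of short blocks, 128K/IO
    "27K half-track": 27 * 1024,       # single half-track
    "6x27K":          6 * 27 * 1024,   # six half-tracks, ~162K/IO
}

def mb_per_sec(name: str, io_per_sec: float) -> float:
    """Approximate throughput in MB/sec (decimal MB) at a given I/O rate."""
    return io_per_sec * BENCHMARKS[name] / 1_000_000

# A 4K workload at a very high I/O rate still moves far less data than a
# 6x27K workload at a modest I/O rate.
print(round(mb_per_sec("4K read hit", 13000)))  # 53
print(round(mb_per_sec("6x27K", 1100)))         # 182
```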
Figure 5 below shows the average of all of the LCU (Logical Control Unit) response times for<br />
the 4K read hit benchmark measurement, plotted with <strong>FICON</strong> Processor Utilization (FPU) percentages.<br />
Response times in milliseconds are on the left y-axis; FPU percentages are on the second (right) y-axis.<br />
The knee of the response time curve occurs around 10,000 I/O’s per second and just above<br />
70% <strong>FICON</strong> Processor utilization (FPU) when running this very simple 4K read hit<br />
benchmark workload. But most real production workloads are more complex than this<br />
simple benchmark and in general, we usually recommend that you keep FPU below 50% to<br />
achieve good online transaction response times. The 50% FPU point occurs between 6000<br />
and 7000 I/O’s per second when running this very simple 4K read hit benchmark workload.<br />
[Figure 5: <strong>FICON</strong> <strong>Express2</strong> (FEx2) 4K read hit: response time in milliseconds (0 to 5, left axis) and <strong>FICON</strong> processor utilization % (0 to 100, right axis) vs. I/Os per second (0 to 14 thousand).]<br />
Figure 6 shows the breakdown of response time components for this simple 4k read hit<br />
benchmark from 10% <strong>FICON</strong> Processor Utilization(FPU) through 70% FPU or just before<br />
the knee of the response time curve. Both total response times and the response time<br />
components of IOSQ, PEND, DISC (disconnect) and CONN (connect) time can be found on<br />
the RMF Device Activity report. IOSQ time is the time that an I/O is delayed due to the fact<br />
that another I/O from this system is already using the target device. PEND time<br />
starts when the CP sends the I/O request to the SAP; it includes the time it takes<br />
for the channel to process the first few CCWs (<strong>Channel</strong> Command Words) in the channel<br />
program and send the commands to the Control Unit, and it ends when the Control Unit<br />
sends a CMR (Command Response) back to the channel. Disconnect time is the amount of<br />
time it takes the Control Unit to service a CU cache miss and retrieve the data from the<br />
device. CONNECT time is basically the data transfer time for the I/O. So, in this 4k read hit<br />
benchmark, there is no DISC time since there are no CU cache misses and there is no IOSQ<br />
time since the I/O driver program we use is designed to wait for an I/O to an individual<br />
device to finish before it issues another I/O to that same device. So, all we have is PEND<br />
time + CONN time. Since each I/O only transfers 4K bytes of data, CONN time is the smaller<br />
of the two components and remains relatively constant from low I/O rates to high I/O rates.<br />
On the other hand, PEND time grows as the FPU% grows. If we had used a point-to-point<br />
configuration to do this measurement and if the CU port had less capability than the new<br />
<strong>FICON</strong> <strong>Express2</strong> channel, then the PEND time would have grown faster as a function of the<br />
CU port processor utilization and we would not have been able to push the channel to its<br />
maximum capability.<br />
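As a minimal sketch of the decomposition described above (the field names follow the RMF Device Activity report; the numeric values are invented for illustration):

```python
# RMF device response time decomposes as IOSQ + PEND + DISC + CONN.
def total_response_ms(iosq: float, pend: float, disc: float, conn: float) -> float:
    return iosq + pend + disc + conn

# 4K read-hit benchmark case: no cache misses (DISC = 0) and the I/O driver
# never queues a second I/O on a busy device (IOSQ = 0), so only PEND + CONN
# remain. The 0.2 ms values are made-up illustrations, not measured data.
read_hit = total_response_ms(iosq=0.0, pend=0.2, disc=0.0, conn=0.2)
print(read_hit)  # 0.4
```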
[Figure 6: Response time components (PEND, CONN) for 4K read hits, 4K bytes/IO, 100% CU cache hit ratio, at 10% through 70% <strong>FICON</strong> processor utilization; response times range from 0 to 1 ms.]<br />
Figure 7 shows the response time components for a more realistic version of an online<br />
transaction processing workload with a mix of reads and writes and a 70 to 80% CU cache hit<br />
ratio. In this case disconnect time is the largest component of the total response time and the<br />
component that grows the most as the activity rate increases. PEND and CONNECT times<br />
are about equal up to the 60% FPU point. There is a more significant increase in total<br />
response time beyond the 50% FPU point than there was with the simpler 4K read<br />
benchmark.
[Figure 7: Response time components (PEND, DISC, CONN) for 4K bytes/IO, 3:1 read/write ratio, 70 to 80% CU cache hit ratio, at 10% through 70% <strong>FICON</strong> processor utilization; response times range from 0 to 5 ms.]<br />
Figures 8 and 9 represent two different ways of looking at the results of the 6x27k read hit<br />
benchmark. Figure 8 is a plot of response times and <strong>FICON</strong> processor utilization for the<br />
6x27K read hit benchmark. Since this workload transfers over 165,000 bytes per I/O using a<br />
block size of 27K bytes, it stresses the links and buses more than it does the <strong>FICON</strong> channel<br />
processor. The maximum I/Os per second achieved was only 1100 io/sec which only drove<br />
the <strong>FICON</strong> processor utilization to about 40%. Therefore the channel processor is really not<br />
the resource that prevents this workload from achieving higher throughput. In general, for<br />
workloads that use large block sizes such as the 27K byte half-track size, it makes more sense<br />
to look at MB/sec instead of I/O per second and bus or link utilizations instead of processor<br />
utilizations. Figure 9 shows that this workload achieves 200 MB/sec which is the limit of the<br />
2 Gbps link.
[Figure 8: z990 FEx2 6x27K read hits: average response time in ms (0 to 5, left axis) and FPU % (0 to 100, right axis) vs. I/Os per second per channel (0 to 1400).]<br />
In Figure 9, we look at the results of this same 6x27K read hit benchmark measurement with<br />
response times and <strong>FICON</strong> link utilization vs. MB/sec. <strong>FICON</strong> channel link utilizations are<br />
not directly reported by RMF but can easily be calculated by dividing the READ or WRITE<br />
MB/sec by the link capacity. In this case, with 2 Gbps links, the capacity is approximately<br />
200MB/sec. Here we see that the 2 Gbps link is the limit to achieving higher throughput<br />
since it is the resource that is being pushed closest to 100% utilization. The “knee of the<br />
response time” curve generally occurs between 70 and 80% link utilizations. If you are more<br />
interested in throughput than response times, then these utilization levels may be acceptable.<br />
If, however, you are running response time sensitive workloads then it might be more<br />
appropriate to keep link utilizations below 50%.<br />
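That calculation can be expressed directly; the 200 MB/sec figure is the approximate one-directional capacity of a 2 Gbps link stated above, and the helper itself is an illustrative sketch:

```python
# Link utilization derived from RMF-reported MB/sec, per the rule in the text:
# utilization = MB/sec divided by link capacity.

LINK_CAPACITY_MB_SEC = 200.0  # approx. one-directional capacity of a 2 Gbps link

def link_utilization_pct(mb_per_sec: float,
                         capacity: float = LINK_CAPACITY_MB_SEC) -> float:
    return 100.0 * mb_per_sec / capacity

# 170 MB/sec of reads on a 2 Gbps link is 85% link utilization -- past the
# 70 to 80% range where the knee of the response time curve appears.
print(link_utilization_pct(170))  # 85.0
```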
[Figure 9: z990 FEx2 6x27K read hits: average response time in ms (0 to 5, left axis) and link utilization % (right axis) vs. MB/sec per channel (0 to 200).]<br />
Figure 10 shows the response time components for the 6x27k read hit benchmark. Since the<br />
data transfer size for this channel program is 162K bytes per I/O and it uses a large 27k block<br />
size, CONNECT time is the dominant part of the total response time. CONNECT time grows<br />
from under 2ms at 20% channel link utilization to over 3ms at very high (90%) link<br />
utilization levels, but these CONNECT times are still significantly better than the 10ms<br />
measured for this benchmark a few years ago using ESCON channels. PEND time also grows<br />
a few tenths of a millisecond at high link utilizations. But at the 50% <strong>FICON</strong> <strong>Express2</strong><br />
channel link utilization level, total response times are only a few tenths of a millisecond<br />
higher than the best case response times for this workload.<br />
[Figure 10: Response time components (PEND, CONN) for 6x27K read hits, 162K bytes/IO, 100% CU cache hit ratio, at 20% through 90% 2 Gbps link utilization (LU); response times range from 0 to 5 ms.]<br />
Figure 11 depicts the results of the 6x27K read/write mix benchmark where we achieve<br />
270MB/sec by taking advantage of the full duplex capabilities of <strong>FICON</strong> links and<br />
simultaneously processing some I/O’s that READ from DASD and some I/O’s that WRITE to<br />
DASD. The 270 MB/sec achieved for this benchmark using <strong>FICON</strong> <strong>Express2</strong> channels is<br />
more than 50% higher than the maximum MB/sec that was achieved with the previous<br />
generation <strong>FICON</strong> Express channels. The Full Duplex Link Utilization (FDLU) plotted on<br />
the second y-axis in Figure 11 is calculated by dividing the sum of the READ + WRITE MB/sec<br />
by 400 MB/sec which is the sum of the maximum instantaneous capabilities of the two<br />
directional 2 Gigabit per second (Gbps) links that exist between the <strong>FICON</strong> <strong>Express2</strong> channel<br />
and the director port, where one 2 Gbps link transmits the commands and data frames from<br />
the channel to the director and the other 2 Gbps link transfers the commands and data<br />
frames in the opposite direction from the director to the channel.
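The FDLU calculation described above can be sketched as follows; the 140/130 read/write split in the example is an assumption for illustration, and only the roughly 270 MB/sec total comes from the measurement:

```python
# Full Duplex Link Utilization (FDLU), per the text: the sum of READ + WRITE
# MB/sec divided by 400 MB/sec (two directional 2 Gbps links at ~200 MB/sec each).

FULL_DUPLEX_CAPACITY_MB_SEC = 400.0  # 2 x 200 MB/sec, one link per direction

def fdlu_pct(read_mb_sec: float, write_mb_sec: float) -> float:
    return 100.0 * (read_mb_sec + write_mb_sec) / FULL_DUPLEX_CAPACITY_MB_SEC

# The 6x27K read/write mix peaked at about 270 MB/sec total (split assumed):
print(fdlu_pct(read_mb_sec=140, write_mb_sec=130))  # 67.5
```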
[Figure 11: z990 FEx2 6x27K read/write mix: response times in ms (0 to 10, left axis) and full duplex link utilization (FDLU) % (0 to 100, right axis) vs. MB/sec per channel (0 to 300).]<br />
Figure 12 depicts the response time components for the 6x27k read/write mix benchmark.<br />
CONNECT time is the largest component and increases the most as full duplex link<br />
utilization increases.<br />
[Figure 12: Response time components (PEND, CONN) for the 6x27K read/write mix, 162K bytes/IO, 100% CU cache hit ratio, at 10% through 60% FDLU; response times range from 0 to 6 ms.]<br />
Figure 13 shows the <strong>FICON</strong> <strong>Express2</strong> channel PCI bus utilizations for this same 6x27K<br />
Read/Write mix benchmark. PCI bus utilizations are the bus utilizations reported on the<br />
RMF <strong>Channel</strong> activity report for <strong>FICON</strong> <strong>Express2</strong> channels but there is another internal<br />
channel bus whose utilization is roughly 1.5 to 2 times the PCI bus utilization. This internal<br />
channel bus is the real resource that limits the 6x27K Read/Write benchmark from achieving<br />
higher than 270MB/sec but it is highly unlikely that any real production workload would<br />
come anywhere near this limit. For real production workloads, the most<br />
relevant resource limits to pay attention to are the channel and control unit processor and<br />
link limits; these are the limits I have highlighted for each benchmark measurement<br />
result presented in this paper.<br />
[Figure 13: z990 FEx2 6x27K read/write mix: <strong>FICON</strong> bus utilization (FBU) % (0 to 100) vs. total READ + WRITE MB/sec (0 to 300).]<br />
Figures 14 and 15 represent two different ways of looking at the results of the 27K or<br />
half-track read hit benchmark. The first figure below is a plot of response times and <strong>FICON</strong><br />
processor utilizations vs. io/sec. Here we see a sharp increase in response times just over<br />
6000 io/sec and at about 60% <strong>FICON</strong> processor utilization (FPU). The second 27K read hit<br />
graph plots response times and <strong>FICON</strong> channel link utilizations vs. MB/sec. The sharp<br />
increase in response times occurs at about 170 MB/sec which is about 85% of the maximum<br />
link capability and indicates that the 2 Gbps link is the limiting resource for this workload.<br />
But processor utilizations are pushed to high levels as well. The 27k read hit benchmark<br />
pushes both processor and link utilizations to high levels with link utilizations slightly higher<br />
than the processor.
[Figure 14: <strong>FICON</strong> <strong>Express2</strong> 27K read hit: response time in ms (0 to 5, left axis) and channel processor utilization % (right axis) vs. I/Os per second (0 to 7 thousand).]<br />
[Figure 15: <strong>FICON</strong> <strong>Express2</strong> 27K read hit: response time in ms (0 to 5, left axis) and link utilization % (right axis) vs. MB/sec (0 to 200).]<br />
Figure 16 shows the response time components for the 27k read hit benchmark. Because of the<br />
large 27k block size, CONNECT time is the largest component of the total response time. As<br />
link utilizations (LU) increase, CONNECT time increases first by one tenth of a millisecond,<br />
and then PEND time also increases by one tenth of a millisecond. At 80% LU, both<br />
CONNECT time and PEND time are 0.4ms higher than they were at 10% LU, another<br />
indicator that this workload is in the middle of the two extremes defined by processor limited<br />
workloads such as the 4K bytes per I/O benchmarks and bus or link limited workloads such<br />
as the 6x27K bytes per I/O benchmarks.<br />
[Figure 16: Response time components (PEND, CONN) for 27K read hits, 27K bytes/IO, 100% CU cache hit ratio, at 10% through 80% 2 Gbps link utilization (LU); response times range from 0 to 2 ms.]<br />
The 32x4k read hit benchmark depicted in Figure 17 is another benchmark that pushes both<br />
the processor and the link to high levels of utilization with the processor slightly higher than<br />
the link utilization. Figure 17 shows the response time components for a 32x4K read hit<br />
benchmark which is a long chain of short blocks. Here CONNECT time is the largest<br />
component since the total data transfer size is 128K bytes per I/O even though the block size<br />
for each CCW is only 4K bytes. Since there is a separate CCW for each of the 4K bytes and<br />
the <strong>FICON</strong> <strong>Express2</strong> channel processor works on each CCW separately, the processor gets<br />
pushed to high utilization levels for this workload. With this benchmark, CONNECT time<br />
starts out at a little over 2ms at 10% <strong>FICON</strong> Processor utilization (FPU) and increases to over<br />
3ms at 60% FPU, when just under 120MB/sec are being transferred, which represents about<br />
60% link utilization as well.
[Figure 17: Response time components (PEND, CONN) for 32x4K read hits, 128K bytes/IO, 100% CU cache hits, at 10% through 60% <strong>FICON</strong> processor utilization; response times range from 0 to 4 ms.]<br />
The following table summarizes information from both the RMF <strong>Channel</strong> Activity and RMF <strong>FICON</strong><br />
Director Activity reports at high levels of utilization for these benchmark measurements:<br />
Channel program         FICON Express2        FICON Express2     READ MB/sec   Average FRAME<br />
                        channel processor     channel link                     size in bytes<br />
                        utilization           utilization<br />
4K byte read hits             91%                  26%            52 MB/sec         843<br />
32x4K byte read hits          83%                  71%           142 MB/sec       1,334<br />
27K byte read hits            62%                  84%           168 MB/sec       1,766<br />
6x27K byte read hits          39%                 100%           200 MB/sec       1,967
The average frame size info from the RMF <strong>FICON</strong> Director Activity report can be used to<br />
determine if a workload is more likely to be processor or link limited. The following general<br />
rule-of-thumb can be applied:<br />
If the average frame size is less than 1000 bytes, then the workload is most likely<br />
processor limited.<br />
If the average frame size is greater than 1500 bytes, then the workload is most likely<br />
bus or link limited.<br />
For workloads with average frame sizes between 1000 and 1500 bytes, both channel<br />
processor and bus/link utilizations should be monitored.<br />
More information about how to find these fields on the RMF <strong>Channel</strong> Activity and<br />
<strong>FICON</strong> Director Activity reports for your production workload can be found in the<br />
<strong>FICON</strong> RMF Information section of this paper.<br />
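The rule-of-thumb above can be sketched as a small classifier. This is an illustrative helper only (the function name is invented here); the thresholds are the ones stated in this paper:

```python
def classify_workload(avg_frame_size_bytes):
    """Apply the paper's average-frame-size rule of thumb, using the
    AVG FRAME SIZE field from the RMF FICON Director Activity report."""
    if avg_frame_size_bytes < 1000:
        return "likely channel processor limited"
    if avg_frame_size_bytes > 1500:
        return "likely bus/link limited"
    return "monitor both processor and bus/link utilizations"

# Average frame sizes from the benchmark summary table above
for program, frame in [("4K read hits", 843), ("32x4K read hits", 1334),
                       ("27K read hits", 1766), ("6x27K read hits", 1967)]:
    print(program, "->", classify_workload(frame))
```

Applied to the four benchmarks, the classifier agrees with the measured bottlenecks: 4K read hits come out processor limited, 27K and 6x27K come out bus/link limited, and 32x4K falls in the middle band.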
In summary, the performance results of 4 different DASD I/O driver benchmarks run on<br />
<strong>FICON</strong> <strong>Express2</strong> channels were presented here. Response times and utilizations of the most<br />
pertinent channel resources for each benchmark were explained. For all of these<br />
benchmarks, the results are significantly better than those of previous-generation <strong>FICON</strong>, <strong>FICON</strong><br />
Express and especially ESCON channels.
<strong>FICON</strong> <strong>Express2</strong> CTC performance<br />
For <strong>Channel</strong>-to-<strong>Channel</strong> (CTC) applications, the previous generation <strong>FICON</strong> Express channel<br />
was better than ESCON for all large block transfers. But for customers using CTC as a<br />
transport mechanism for small (1K bytes or less) XCF messages, ESCON CTC previously had<br />
the best response times at low activity rates. Now, as depicted in Figure 18, the new <strong>FICON</strong><br />
<strong>Express2</strong> CTC response times for short (1K bytes or less) XCF messages are 25 to 35% better<br />
than <strong>FICON</strong> Express and ESCON CTC response times at low activity rates. Furthermore,<br />
signals per second throughput rates at the 400 usec response time level for short messages<br />
across <strong>FICON</strong> <strong>Express2</strong> CTC are 1.5 to 3 times better than <strong>FICON</strong> Express CTC and ESCON<br />
CTC link capabilities.<br />
Figure 18: <strong>FICON</strong> <strong>Express2</strong> (FEx2) CTC response times with short XCF messages, better than ESCON. I/O response time in microseconds (0 to 1000) vs. signals per second (0 to 15 thousand) for ESCON, FEx and FEx2.
<strong>FICON</strong> <strong>Express2</strong> Card level performance<br />
Figure 19: <strong>FICON</strong> <strong>Express2</strong> channel card configurations: 4 channels per card, 4 cards (up to 16 channels) per ESTI. zSeries memory connects through an MBA chip over an ESTI link to an ESTI-M card, which fans out over four STI links to four <strong>FICON</strong> <strong>Express2</strong> channel cards.<br />
The 4 <strong>FICON</strong> <strong>Express2</strong> channels on the same physical card all connect to a single 1 GB/sec<br />
STI link. Since 4 times the max capability of a single <strong>FICON</strong> <strong>Express2</strong> channel exceeds 1<br />
GB/sec, measurements were done to determine the max capability of the 1 GB/sec STI link.<br />
In Figure 20, these measurement results are compared to the previous generation <strong>FICON</strong><br />
Express channel card which had 2 channels per card connected to a 333 MB/sec STI link.<br />
For the <strong>FICON</strong> <strong>Express2</strong> channel card, a max of 644 READ MB/sec, 651 WRITE MB/sec<br />
and 970 READ+WRITE MB/sec was measured which represents a 2.5 to 3.5 times<br />
improvement compared to the previous generation <strong>FICON</strong> Express channel card.
Figure 20: z990 <strong>FICON</strong> <strong>Express2</strong> vs <strong>FICON</strong> Express card-level MB/sec comparison. 4 <strong>FICON</strong> <strong>Express2</strong> channels per card with a 1 GB/sec STI (644 READ, 651 WRITE, 970 READ+WRITE MB/sec) vs 2 <strong>FICON</strong> Express channels per card with a 333 MB/sec STI (178 to 276 MB/sec), a 2.5x to 3.5x improvement at the card level.<br />
As depicted in Figure 19, the 4 <strong>FICON</strong> <strong>Express2</strong> channel cards can be connected via an<br />
ESTI-M card to a single 2 GB/sec ESTI link. Since 4 times the max capability of a single<br />
<strong>FICON</strong> <strong>Express2</strong> channel card exceeds 2 GB/sec, measurements were done to determine the<br />
maximum capability of a single 2GB/sec ESTI link. As shown in Figure 21, a maximum of<br />
1551 READ MB/sec, 1587 WRITE MB/sec and 1843 READ + WRITE MB/sec was measured<br />
in this configuration using a 6x27K channel program.
Figure 21: z990 <strong>FICON</strong> <strong>Express2</strong> single card vs 4 cards per 2 GB/sec STI domain. Single card: 644 READ, 651 WRITE, 970 READ+WRITE MB/sec. STI domain: 1551 READ, 1587 WRITE, 1843 READ+WRITE MB/sec, which is 78% of STI speed for reads, 79% for writes and 92% for the read/write mix.<br />
It has been my experience that the only time customers come close to pushing this level of<br />
I/O bandwidth in a single I/O domain is when running I/O driver benchmarks in a test<br />
environment. This level is unlikely to be seen in a real customer production environment, for<br />
several reasons. Normally, the configurator will spread different types of channel cards across multiple<br />
ESTI-M cards so that any one ESTI-M card might have a mixture of 1 ESCON card, 1 <strong>FICON</strong><br />
Express channel card and 1 or 2 <strong>FICON</strong> <strong>Express2</strong> channel cards. For many years, we have<br />
recommended that when configuring up to 8 channel paths per LCU that the individual<br />
channels in the path group be selected from different physical channel cards. In this way if a<br />
particular LCU is running a high I/O bandwidth application the MB/sec load will be spread<br />
across multiple channel cards, STI links, MBA chips and books. Furthermore, in any<br />
particular time interval “hot spots” of activity tend to be limited to small groups of channels.<br />
In any case, the maximum ESTI-M card capability is presented here for your awareness. To<br />
determine if you are one of the vast majority of customers that has NO reason to be<br />
concerned about this, you can take the following approach:
1. As a first pass, you can simply add up the READ and WRITE MB/sec from the RMF<br />
<strong>Channel</strong> Activity report for all of the channels configured on your system. If that sum is<br />
less than 1.5 GB/sec, you are done.<br />
2. If not, then select the 16 channels with the highest MB/sec and add those up. If that sum<br />
is less than 1.5 GB/sec, you are done.<br />
3. If not, then you need to more carefully determine which channels are plugged into which<br />
ESTI-M card (the PCHID report can help you do this) and add up the READ + WRITE<br />
MB/sec for those channels.<br />
Again, with the exception of those customers running high I/O bandwidth benchmark tests,<br />
most customers will be able to stop at step 1.
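As a rough sketch, the three-step screen can be expressed as follows. The function and its per-CHPID input dictionary are hypothetical; in practice the MB/sec values would come from the RMF Channel Activity report and the per-card mapping from the PCHID report:

```python
def esti_m_concern_check(read_write_mb_per_chpid, threshold_mb=1500):
    """Three-pass screen of aggregate READ+WRITE MB/sec against the
    paper's 1.5 GB/sec comfort threshold for a 2 GB/sec ESTI-M link.
    Input: {chpid: READ+WRITE MB/sec} from the RMF Channel Activity report."""
    totals = sorted(read_write_mb_per_chpid.values(), reverse=True)
    if sum(totals) < threshold_mb:          # step 1: all channels
        return "done at step 1"
    if sum(totals[:16]) < threshold_mb:     # step 2: 16 busiest channels
        return "done at step 2"
    # step 3: map channels to ESTI-M cards (PCHID report) and sum per card
    return "check per-ESTI-M-card sums"

# Example: two busy FICON channels, far below the threshold
print(esti_m_concern_check({"95": 119.3, "96": 80.0}))  # -> done at step 1
```

For most configurations the sum in step 1 is already well under 1.5 GB/sec and the check ends there.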
<strong>FICON</strong> RMF Information<br />
This section of the white paper will explain the I/O performance information available on the<br />
following RMF reports:<br />
1. <strong>Channel</strong> Path Activity report<br />
2. Device Activity report<br />
3. <strong>FICON</strong> Director Activity report<br />
4. I/O Queuing Activity report<br />
The primary RMF report of interest for <strong>FICON</strong> is the <strong>Channel</strong> Path Activity report. Figure 22<br />
is an excerpt from this report.<br />
C H A N N E L P A T H A C T I V I T Y<br />
MODE: LPAR CPMF: EXTENDED MODE CSSID: 0<br />
CHANNEL PATH UTILIZATION(%) READ(MB/SEC) WRITE(MB/SEC)<br />
ID TYPE G SHR PART TOTAL BUS PART TOTAL PART TOTAL<br />
95 FC_S 4 Y 61.11 61.11 32.56 119.34 119.34 0.00 0.00<br />
Figure 22<br />
<strong>FICON</strong> channels can be identified from the TYPE column; their type begins with FC:<br />
type FC indicates a native <strong>FICON</strong> channel;<br />
type FC_S indicates a native <strong>FICON</strong> channel connected to a switch or director;<br />
type FCV indicates a <strong>FICON</strong> bridge channel, which connects to an ESCON control unit via a<br />
bridge card in a 9032 model 5 ESCON director. <strong>FICON</strong> <strong>Express2</strong> channels do not support<br />
FCV mode.<br />
The ID column is the <strong>Channel</strong> Path ID or CHPID number. CHPID 95 is displayed in Figure<br />
22.
The Generation (G) field tells you a combination of which generation <strong>FICON</strong> channel is<br />
being used and the speed of the fibre channel link for this CHPID at the time the machine<br />
was IPL’d. A “4” appears in the G field for CHPID 95 in Figure 22. This means that this<br />
channel is a <strong>FICON</strong> <strong>Express2</strong> channel with a link speed of 2 Gbps. If this channel were<br />
connected to a 1 Gbps director, then there would be a “3” in the G field. A “2” indicates a<br />
<strong>FICON</strong> Express channel with a link speed of 2 Gbps and a “1” indicates a <strong>FICON</strong> Express<br />
channel operating at 1Gbps.<br />
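The G-field decoding described above can be captured in a small lookup table. This is only an illustrative sketch covering the four values listed in this paper, not an official RMF mapping:

```python
# Generation (G) field values from the RMF Channel Path Activity report,
# as described in the text above
G_FIELD = {
    1: ("FICON Express", "1 Gbps"),
    2: ("FICON Express", "2 Gbps"),
    3: ("FICON Express2", "1 Gbps"),
    4: ("FICON Express2", "2 Gbps"),
}

generation, link_speed = G_FIELD[4]  # CHPID 95 in Figure 22 reports G=4
print(generation, "at", link_speed)  # -> FICON Express2 at 2 Gbps
```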
For a given <strong>FICON</strong> channel there are three possible entries under UTILIZATION (%):<br />
1. PART denotes the <strong>FICON</strong> processor utilization due to this logical partition.<br />
2. TOTAL denotes the <strong>FICON</strong> processor utilization for the sum of all the LPARs.<br />
3. BUS denotes the <strong>FICON</strong> PCI bus utilization for the sum of all the LPARs.<br />
The <strong>FICON</strong> processor is busy for channel program processing, which includes the processing<br />
of each individual channel command word (CCW) in the channel program and some setup<br />
activity at the beginning of the channel program and cleanup at the end. A very precise<br />
algorithm is used for calculating zSeries <strong>FICON</strong> Express and <strong>FICON</strong> <strong>Express2</strong> channel<br />
utilizations. This algorithm is based on monitoring the amount of time the channel processor<br />
spends doing various separate functions, and the results of this algorithm give a much more<br />
accurate measure of <strong>FICON</strong> processor busy time than the original algorithm based on<br />
counting command and data sequences, which is still used for 9672 G5/G6 <strong>FICON</strong> channels.<br />
The <strong>FICON</strong> bus is busy for the actual transfer of command and data frames from the <strong>FICON</strong><br />
channel chip to the fibre channel adapter chip, which is connected via the fibre channel link<br />
to the director or control unit. For <strong>FICON</strong> and <strong>FICON</strong> Express channels, the <strong>FICON</strong> bus is<br />
also busy when the <strong>FICON</strong> processor is polling for work to do. This is why one can see<br />
anywhere from 5 to 15% <strong>FICON</strong> bus utilization on the RMF <strong>Channel</strong> Activity report during<br />
time intervals when there are no I/Os active on those channels. The new <strong>FICON</strong> <strong>Express2</strong><br />
channels, however, no longer use the bus for polling and therefore the bus utilization should<br />
be less than 1% for these channels when there are no I/Os active for an entire RMF<br />
reporting interval.<br />
The actual FC channel processor and bus utilizations as reported by RMF will vary by<br />
workload and by channel type. As shown in Figure 22 above, <strong>FICON</strong> <strong>Express2</strong> channels<br />
provide bandwidth information (MB/SEC) not available for ESCON channels. This is<br />
provided separately for READs and WRITEs since the fibre channel link is full duplex, at<br />
both the logical partition level (PART) and the entire system level (TOTAL). Fibre channel<br />
link utilizations are not directly reported on RMF but can be easily calculated by dividing the
READ or WRITE MB/sec by the link capacity. Several examples of <strong>FICON</strong> <strong>Express2</strong> channel<br />
processor, bus and link utilizations based on I/O driver benchmark measurements are<br />
displayed in Figures 5 through 17 of this paper.<br />
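The division can be done as follows. This is a sketch assuming the usual convention of roughly 100 MB/sec of capacity per 1 Gbps of link speed; the function name is invented here:

```python
def link_utilization(mb_per_sec, link_gbps):
    """Approximate fibre channel link utilization from RMF MB/sec,
    taking a 1 Gbps link as 100 MB/sec and a 2 Gbps link as 200 MB/sec."""
    return mb_per_sec / (link_gbps * 100.0)

# CHPID 95 in Figure 22: 119.34 READ MB/sec on a 2 Gbps link
print(f"{link_utilization(119.34, 2):.0%}")  # prints 60%
```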
With <strong>FICON</strong> <strong>Express2</strong> channels, customers should continue to analyze their I/O activity by<br />
looking at the DASD or TAPE activity reports, just as they did with <strong>FICON</strong>, <strong>FICON</strong> Express<br />
and ESCON channels. An example of a Direct Access Device Activity report is shown in<br />
Figure 23.<br />
Device Activity report... response times... benefit of PAVs...<br />
D I R E C T A C C E S S D E V I C E A C T I V I T Y<br />
z/OS V1R6 SYSTEM ID xxxx DATE 01/24/2005<br />
RPT VERSION V1R5 RMF TIME 11.09.28<br />
DEVICE AVG AVG AVG AVG AVG AVG AVG<br />
DEV DEVICE VOLUME PAV LCU ACTIVITY RESP IOSQ CMR DB PEND DISC CONN<br />
NUM TYPE SERIAL RATE TIME TIME DLY DLY TIME TIME TIME<br />
4612 33903 DS3B02 1 0037 54.736 2.2 1.2 0.0 0.0 0.2 0.4 0.4<br />
4613 33903 DS3B03 1 0037 48.996 8.7 5.3 0.0 0.0 0.2 1.8 1.4<br />
4616 33903 DS3B06 1 0037 15.196 8.0 2.5 0.0 0.0 0.2 3.1 2.2<br />
4617 33903 DS3B07 1 0037 20.761 9.7 3.6 0.0 0.0 0.2 3.3 2.6<br />
461C 33903 DS3B0C 1 0037 17.189 13.6 6.6 0.0 0.0 0.2 3.8 2.9<br />
461E 33903 DS3B0E 1 0037 41.288 9.0 4.9 0.0 0.0 0.2 2.3 1.7<br />
LCU 0037 1196.01 3.5 1.7 0.0 0.0 0.2 0.9 0.9<br />
4612 33903 DS3B02 4 0037 55.669 0.5 0.0 0.0 0.0 0.2 0.1 0.2<br />
4613 33903 DS3B03 4 0037 50.145 1.8 0.0 0.0 0.0 0.2 0.8 0.8<br />
4616 33903 DS3B06 4 0037 13.828 8.2 0.0 0.0 0.0 0.2 4.3 3.7<br />
4617 33903 DS3B07 4 0037 20.348 6.4 0.0 0.0 0.0 0.2 3.3 2.9<br />
461C 33903 DS3B0C 4 0037 16.929 8.0 0.0 0.0 0.0 0.2 4.2 3.6<br />
461E 33903 DS3B0E 4 0037 41.106 3.4 0.0 0.0 0.0 0.2 1.7 1.5<br />
LCU 0037 1226.54 1.7 0.0 0.0 0.0 0.2 0.7 0.8<br />
Figure 23<br />
Here one can examine the AVG RESP TIME and various response time components (IOSQ,<br />
PEND, DISC and CONN times) for activity to the LCUs attached to the <strong>FICON</strong> <strong>Express2</strong><br />
channels. If response time is a problem, then the response time components need to be<br />
looked at. If disconnect time is a problem, then an increase in CU cache size might help. If<br />
IOSQ time is a problem, then Parallel Access Volumes might help. Figure 23 shows an<br />
example of the reduction in IOSQ time experienced on an IMS benchmark measurement<br />
when 4 PAVs were defined vs. 1. In this particular case, IOSQ time improved from an<br />
average of 1.7ms to 0ms for this LCU. If PEND or CONNECT times are too high, then one
can look at the <strong>FICON</strong> processor, bus and link utilizations. If any one of these utilizations is<br />
above 50% then overuse of the <strong>FICON</strong> channel could be contributing to additional PEND<br />
and CONNECT time delays. If, on the other hand, PEND and CONNECT times are high and<br />
<strong>FICON</strong> channel utilizations are less than 50%, then overuse of a <strong>FICON</strong> director port or<br />
control unit port could be contributing factors. If <strong>FICON</strong> channels from multiple CECs are<br />
connected to the same director destination port, then one must add up the activity from all<br />
the CECs to determine the total destination port activity. This total activity level should be<br />
less than the “knee of the curve” points depicted in the measurement results that appear in<br />
the white papers for the specific native <strong>FICON</strong> DASD or TAPE product that is being used.<br />
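This triage can be sketched as a small decision helper. The 50% guideline is the one stated above; the function itself is illustrative only, and in practice the utilizations come from the RMF Channel Activity report:

```python
def pend_conn_triage(processor_util, bus_util, link_util):
    """Given high PEND or CONNECT times, point at the likely culprit
    using the paper's 50% utilization guideline (all values in percent)."""
    if max(processor_util, bus_util, link_util) > 50:
        return "FICON channel overuse may be adding PEND/CONNECT delay"
    return "look at FICON director port and control unit port activity"

# CHPID 95 from Figure 22: 61.1% processor, 32.6% bus, about 60% link
print(pend_conn_triage(61.1, 32.6, 60.0))   # channel side is the suspect
print(pend_conn_triage(20.0, 10.0, 15.0))   # look beyond the channel
```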
One of the basic differences between native <strong>FICON</strong> and ESCON channel performance is the<br />
CONNECT time component of response time. Since an ESCON channel is only capable of<br />
executing one I/O at a time, the amount of time that it takes to execute the protocol + data<br />
transfer components of CONNECT time is relatively constant from one I/O operation to the<br />
next with the same exact channel program. With <strong>FICON</strong> however, CONNECT time can vary<br />
from one execution of a channel program to another. This is a side effect of the multiplexing<br />
capability of <strong>FICON</strong>. Since both the channel and the control unit can be concurrently<br />
executing multiple I/O operations, the individual data transfer frames of one I/O operation<br />
might get queued up behind the data transfer frames of another I/O operation. So, the<br />
CONNECT time of an I/O with <strong>FICON</strong> is dependent upon the number of I/O operations that<br />
are concurrently active on the same <strong>FICON</strong> channel, link and control unit connection.<br />
Multiplexing also means that the start and end of the CONNECT time for one native <strong>FICON</strong><br />
I/O operation can overlap the start and end of the CONNECT time for several other native<br />
<strong>FICON</strong> I/O operations. But AVG CONN TIME for large block size transfers should be<br />
significantly less for native <strong>FICON</strong> channels than for the same transfer size on ESCON or<br />
<strong>FICON</strong> Bridge channels due to the much faster (2 Gbps or 200 MB/sec) link transfer speeds<br />
of native <strong>FICON</strong> vs. the 20 MB/sec link transfer speed of ESCON. Several examples of<br />
CONNECT times at various levels of <strong>FICON</strong> <strong>Express2</strong> channel processor, bus and link<br />
utilizations are shown in the <strong>FICON</strong> <strong>Express2</strong> benchmark measurement results displayed in<br />
Figures 5 through 17 of this paper.<br />
Little’s Law can be used to estimate the average number of open exchanges, simultaneously<br />
active I/Os or multiplexing level for both a <strong>FICON</strong> channel and a control unit port for a<br />
given RMF interval. This formula is essentially a variation of the formula for calculating I/O<br />
intensity levels, which has been used for years to identify “hot spots” in an I/O configuration.<br />
I/O intensity levels are calculated by multiplying total response times by activity rates. The<br />
number of I/Os that are simultaneously active and transferring data between the channel and<br />
the control unit can be determined by multiplying the CONNECT time component of<br />
response time (in units of seconds, or milliseconds (ms) times 0.001) by the activity rate (in<br />
units of I/Os per second). If there is only 1 LCU (logical control unit) connected to a single<br />
set of <strong>FICON</strong> channels, then the average number of open exchanges can be calculated by<br />
multiplying the activity rate for that LCU by the sum of the “CMR + CONN + DISC” times<br />
for that LCU divided by the number of channels in the path group for that LCU. If there are<br />
multiple LCUs connected to a set of <strong>FICON</strong> channels, then the results of this calculation<br />
need to be summed for all these LCUs. Similarly, to determine the average number of<br />
exchanges for a given physical CU port if there are multiple sets of channels from multiple<br />
LPARs on multiple CECs connected to the same set of CU ports, this calculation needs to be<br />
done for each LCU for each LPAR and then summed to get the total for the CU port.<br />
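The calculation described above can be sketched as a small helper. It is illustrative only; the inputs are the activity rate (I/Os per second) and the CMR, CONN and DISC times (in ms) from the RMF Device Activity report:

```python
def avg_open_exchanges(lcus, channels_in_path_group):
    """Little's Law estimate of average open exchanges: sum over LCUs of
    activity_rate * (CMR + CONN + DISC) converted from ms to seconds,
    divided by the number of channels in the path group."""
    total = sum(rate * (cmr + conn + disc) / 1000.0
                for rate, cmr, conn, disc in lcus)
    return total / channels_in_path_group

# One LCU at 5,634.1 I/Os/sec with CMR 0.2 ms, CONN 0.4 ms, DISC 1.2 ms,
# over a 4-channel path group (the first row of the OLTP-T example table
# later in this paper)
print(round(avg_open_exchanges([(5634.1, 0.2, 0.4, 1.2)], 4), 1))  # -> 2.5
```

The result agrees, within rounding, with the OPEX value of 2.6 reported for that measurement interval.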
In any case, if the result of this calculation is a higher than normal value for your workload,<br />
then one must look at each of the components of the formula to determine the cause of the<br />
high number of open exchanges. AVG CMR DLY or “command response” delay time is a new<br />
field that has been added to the RMF Device Activity report for <strong>FICON</strong>. An example of this<br />
is displayed in Figure 23 above. AVG CMR DLY time is a subset of PEND time. As shown in<br />
Figure 24, when a channel opens a new exchange with a control unit by sending the first<br />
command in the channel program to the control unit, the control unit responds with a CMR.<br />
Architecturally, the official end to PEND time (for both <strong>FICON</strong> and ESCON) is designated by<br />
the time when the channel receives the CMR signal from the control unit.<br />
Figure 24: <strong>FICON</strong> Command/Data Transfer. CCW = <strong>Channel</strong> Command Word, CE = <strong>Channel</strong> End, DE = Device End, CMR = Command Response. The start subchannel (ssch) flows CP ---> SAP ---> channel ---> CU port ---> channel; the control unit answers the first CCW with a CMR, and CCWs then flow between the <strong>FICON</strong> <strong>Express2</strong> channel and the control unit/device until CE/DE. CMR time begins when the exchange begins and ends when PEND time ends, so it is a subset of total PEND time.
If the control unit is excessively busy with other I/O operations or exchanges that are already<br />
active, then this will be reflected in larger than normal AVG CMR DLY times. If DISC time is<br />
high, then the cause of a high number of average open exchanges could be low control unit<br />
cache hit ratios or contention in other internal resources of the control unit involved in<br />
reading or writing data from disk. Synchronous copying of data from primary DASD to<br />
secondary DASD located many kilometers away can also cause high DISCONNECT times.<br />
If CONN time is high, then the cause of a high number of open exchanges could be high<br />
channel utilizations, high control unit port utilizations, director port contention, long<br />
distances between the channel and the control unit, large data transfers, or the nature of the<br />
particular channel programs being executed. <strong>Channel</strong> (processor and bus) utilizations<br />
can be found on the RMF <strong>Channel</strong> Activity report. Unfortunately, control unit port<br />
utilizations are not reported directly on any RMF report. However, some information about<br />
<strong>FICON</strong> director ports that are connected to either control unit ports or channels can be found<br />
on the RMF <strong>FICON</strong> Director Activity report. An example of this report is shown in Figure<br />
25.<br />
RMF <strong>FICON</strong> Director Activity report<br />
F I C O N D I R E C T O R A C T I V I T Y<br />
z/OS V1R6 SYSTEM ID S08 DATE 12/01/2004<br />
RPT VERSION V1R5 RMF TIME 16.18.00<br />
IODF = 4C NO CREATION INFORMATION AVAILABLE ACT: POR<br />
SWITCH DEVICE: 00C2 SWITCH ID: ** TYPE: 006140 MODEL: 001 MAN: MCD<br />
PORT -CONNECTION- AVG FRAME AVG FRAME SIZE PORT BANDWIDTH (MB/SEC)<br />
ADDR UNIT ID PACING READ WRITE -- READ -- -- WRITE --<br />
note: channel program = 32x4K read<br />
49 CHP-H 95 0 70 1334 2.19 125.21<br />
7A CU ---- 0 1334 70 39.70 0.70<br />
7B CU ---- 0 1334 70 41.65 0.73<br />
83 CU BF00 0 1334 70 41.71 0.73<br />
compare MB/sec at CU port to max CU port capability to approximate CU<br />
port utilization and compare to CU link max MB/sec based on link speed<br />
(100MB/sec for 1Gbps or 200MB/sec for 2Gbps links) to get CU link<br />
utilization<br />
Figure 25
The first column “PORT ADDR” identifies the switch port address. The 2nd and 3rd<br />
“CONNECTION” columns identify what this switch port is connected to. The “UNIT”<br />
indicates whether it is a channel (CHP-H), a control unit port (CU) or, in the case where two<br />
directors are cascaded, another switch port (SWITCH). The “ID” in column 3 is the CHPID<br />
number for the channel or the control unit address for the CU. The values in the “AVG<br />
FRAME PACING” column will be zero most of the time. This column is intended to display<br />
the amount of time that a frame is delayed when there are no more buffer credits available.<br />
The “AVG FRAME SIZE” columns display the average number of bytes per frame being<br />
“READ” into that director port or written out from that director port. These columns can be<br />
used to help understand if your workload is a processor or bus/link limited workload. The<br />
maximum frame size is 2K bytes. If your workload is transferring a small amount of data<br />
using small block sizes, such as the 4K bytes per I/O typically found in online transaction<br />
processing, then the average frame size will most likely be less than 1000 bytes and your<br />
workload will most likely be channel processor or control unit port processor limited. On the<br />
other hand, if your workload transfers a lot of data using large block sizes, then the average<br />
frame size will most likely be in the 1500 to 2000 byte range and your workload will most<br />
likely be channel or control unit bus or link limited. Figure 25 is an example of a workload<br />
that is in between these two extremes and has an average frame size of 1334 bytes. In this<br />
case, both processor and bus/link utilizations should be monitored.<br />
The last two columns on this report, the “PORT BANDWIDTH (MB/SEC)” “READ” and<br />
“WRITE” columns contain the MB/sec that are being “READ” into that director port or<br />
written out from that director port. Please note that for an RMF interval where 10 MB/sec of<br />
data is being “READ” from a device on a control unit, the 10 MB/sec value will appear in the<br />
“READ” column on the line for the director port connected to the control unit, but in the<br />
“WRITE” column for the director port connected to the channel in the RMF <strong>FICON</strong><br />
Director Activity report, and in the “READ(MB/SEC)” column of the channel in the RMF<br />
<strong>Channel</strong> Activity report. The “READs” and “WRITEs” on the <strong>FICON</strong> Director Activity<br />
report are from the perspective of the port, whereas the “READs” and “WRITEs” on the<br />
<strong>Channel</strong> Activity report are from the perspective of the higher level application. Figure 25 is<br />
an example of a benchmark measurement where about 40 MB/sec was “READ” from each of<br />
3 different control unit ports and over 120 MB/sec was written to a single channel, CHPID<br />
#95.<br />
To convert control unit port MB/sec data into control unit port utilizations, you also need to<br />
know what the maximum capability of the control unit port is for both small and large block<br />
sizes and whether your workload is a small or large block size workload. If a control unit<br />
vendor tells you or you run your own test to determine that the maximum capability of a<br />
single port on their box for 4k byte READs is 5000 I/Os per second, then this is the same as
seeing 20MB/sec in the READ MB/sec column and less than 1000 bytes in the AVG READ<br />
FRAME SIZE column for the CU port line on the RMF <strong>FICON</strong> Director Activity report. If<br />
your workload is reporting more than 10 READ MB/sec with an AVG READ FRAME SIZE<br />
less than 1000 bytes, then your workload is driving this CU port to greater than 50%<br />
utilization. Similarly, if a control unit vendor tells you or you run your own test to determine<br />
that the maximum capability of a single port on their box for half-track or 27K byte READs is<br />
about 2500 I/Os per second or about 70 MB/sec, then this is the same as seeing 70 MB/sec<br />
in the READ MB/sec column and greater than 1500 bytes in the AVG READ FRAME SIZE<br />
column for the CU port line on the RMF <strong>FICON</strong> Director Activity report. If your workload is<br />
reporting more than 35 READ MB/sec with an AVG READ FRAME SIZE greater than 1500<br />
bytes, then your workload is driving this CU port to greater than 50% utilization. Driving a<br />
CU port to greater than 50% utilization could be the cause of higher than normal CONN<br />
times which could result in higher than normal average open exchanges for that CU port or<br />
for any of the channels connected to that CU port.<br />
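The MB/sec-to-utilization conversion described above can be sketched in a small helper. This is a minimal Python illustration, not an RMF interface; the function and parameter names are mine, and the default ceilings come from the example in the text (5000 x 4k reads/sec = 20 MB/sec small block, 2500 x 27k reads/sec = about 70 MB/sec large block) and should be replaced with vendor- or test-derived values for your control unit:

```python
def cu_port_utilization(read_mb_sec, avg_read_frame_size,
                        small_block_max_mb_sec=20.0,
                        large_block_max_mb_sec=70.0,
                        frame_size_threshold=1000):
    """Estimate CU port utilization from RMF FICON Director Activity
    report values (illustrative helper, not an RMF API).

    A small average frame size (below ~1000 bytes) indicates a
    small-block workload, so the small-block MB/sec ceiling applies;
    a large average frame size indicates a half-track style workload,
    so the large-block ceiling applies.
    """
    if avg_read_frame_size < frame_size_threshold:
        max_mb_sec = small_block_max_mb_sec   # 4k-read ceiling
    else:
        max_mb_sec = large_block_max_mb_sec   # 27k-read ceiling
    return read_mb_sec / max_mb_sec

# The text's two examples: more than 10 MB/sec at small frame sizes,
# or more than 35 MB/sec at large frame sizes, both put the CU port
# past 50% utilization.
print(cu_port_utilization(10.0, 800))    # 0.5
print(cu_port_utilization(35.0, 1600))   # 0.5
```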
If you were to ask, “what is an appropriate value for average open exchanges in an RMF interval for my workload?”, the answer, of course, would be “it depends on the characteristics of the workload”. The following example illustrates this point.
ACTIVITY   RESP   PEND   CMR   DISC   CONN   OPEX   CU H/R
5,634.1     1.8    0.3   0.2    1.2    0.4    2.6     88%
5,634.1     2.7    0.3   0.2    2.0    0.4    3.7     80%
5,634.1     3.7    0.3   0.2    3.0    0.4    5.1     70%
5,634.1     4.7    0.3   0.2    4.0    0.4    6.5     60%
5,634.1     5.7    0.3   0.2    5.0    0.4    7.9     50%
5,634.1     6.7    0.3   0.2    6.0    0.4    9.3     40%
5,634.1     7.7    0.3   0.2    7.0    0.4   10.7     30%
The first row of this table is taken from the RMF reports for a 15 minute interval of the LSPR OLTP-T workload measurement.
ACTIVITY = I/Os per second rate.
RESP = total response time for each I/O in ms.
PEND = pending time.
CMR = command response time, which is a subset of PEND time.
DISC = disconnect time.
CONN = connect time.
OPEX = average number of open exchanges per channel. In this configuration, there were 4 channels per LCU.
CU H/R = control unit cache hit ratio.
This workload has a control unit cache hit ratio of 88% and a disconnect time of 1.2 ms, which implies that it takes an average of about 10 ms to resolve each CU cache miss. The rest of the rows in the above table illustrate what the response time and average open exchanges per channel would be if, instead of a CU H/R of 88%, this workload had an 80%, 70%, 60%, 50%, 40% or 30% control unit cache hit ratio. For each 10% drop in CU H/R, disconnect time and total response time increase by 1 ms. Average open exchanges per channel increase from 2.6 with a CU H/R of 88% to 10.7 with a CU H/R of 30%. So, if the nature of your workload is such that it has a poor CU cache hit ratio, then it is acceptable to have higher average open exchange values for this workload compared to a workload with much better CU cache hit ratios. Furthermore, adding additional channel paths to a workload with poor CU cache hit ratios is not the appropriate action to take; for this workload the channels are only 18% busy. To improve the performance of a workload with high disconnect times, attention needs to be paid to actions that will either improve the CU cache hit ratio or reduce the amount of time that it takes to resolve each CU cache miss.
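The arithmetic behind the projected rows of the table can be sketched as follows. This is an illustrative Python model (the names are mine, not RMF fields): DISC is modeled as miss ratio x average miss-resolution time (12% misses x 10 ms = 1.2 ms at an 88% hit ratio), and open exchanges as I/O rate x (DISC + CONN + CMR) spread over the 4 channels. It reproduces the 80% through 30% rows; the measured 88% row rounds slightly differently in the report.

```python
def model_row(cu_hit_ratio, io_rate=5634.1, pend=0.3, cmr=0.2,
              conn=0.4, miss_ms=10.0, channels=4):
    """Project DISC, RESP and average open exchanges per channel for
    a given CU cache hit ratio (illustrative model, not RMF output).

    DISC = miss ratio x average miss-resolution time (ms).
    OPEX = I/O rate x (DISC + CONN + CMR) / channels.
    """
    disc = (1.0 - cu_hit_ratio) * miss_ms            # ms
    resp = pend + disc + conn                        # ms
    opex = io_rate * (disc + conn + cmr) / 1000.0 / channels
    return disc, resp, opex

for hr in (0.80, 0.70, 0.60, 0.50, 0.40, 0.30):
    disc, resp, opex = model_row(hr)
    print(f"CU H/R {hr:.0%}: DISC {disc:.1f}  RESP {resp:.1f}  OPEX {opex:.1f}")
```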
This is just one example of how values for average open exchanges can vary based on workload characteristics. In general, an acceptable average open exchange value should be determined for each workload based on experience of when bottom-line workload performance is acceptable or not.
With ESCON, the additional queuing delays caused by having multiple I/Os concurrently active appear in the PEND or DISC time component of response time. If the same workload with the same activity rate and the same level of I/O concurrency is run on native FICON channels instead of ESCON channels, then one could see the PEND and DISC time components of response time decrease and the CONNECT time component increase for small data transfer sizes. For large data transfers, the improved CONNECT time due to the 100 MB/sec or 200 MB/sec link transfer speed will most likely offset any increased CONNECT time due to multiplexing queuing delays. Figure 26 illustrates the type of improvement in CONNECT time experienced on the z900 FICON and for FICON Express as compared with ESCON. The exact CONNECT time will, of course, vary depending on the details of the I/O configuration (type of storage system, number of devices, workload intensity, etc.). Figures 5 through 17 of this paper show several examples of CONNECT times at various utilization levels for the new z990 FICON Express2 channels.
Figure 26: Sample FICON vs ESCON connect times for large data transfer sizes (bar chart; y-axis: connect times in ms, 0 to 12; series: ESCON 27K, FICON 27K, FICON Express 27K, ESCON 6x27K, FICON 6x27K, FICON Express 6x27K).
In addition to the RMF Channel Activity, Device Activity and FICON Director Activity reports, the RMF I/O Queuing Activity report also provides information about your I/O configuration. Starting with z/OS V1R2 and RMF Release 12, several new fields were added to the I/O Queuing Activity report. Figures 27, 28 and 29 are examples of excerpts from this report. The “Initiative Queue” section of the report is the same as it has been for several years. The “IOP UTILIZATION” and the “RETRIES/SSCH” sections were added with z/OS V1R2. The “% IOP BUSY” column is the SAP utilization. The “I/O START RATE” column is the number of SSCHs per second sent from a CP to a particular SAP. The “INTERRUPT RATE” column is the number of I/O interrupts per second processed by each SAP. In general, if the channel programs being executed do not have the PCI (Program Controlled Interruption) flag set, the total number of interrupts per second processed will be equal to the total number of SSCHs per second processed. The “RETRIES/SSCH” section indicates the average number of times per SSCH that the SAP encountered a busy signal in the process of doing its path selection work for this I/O operation. There are four types of busies reported:
1. CP busy = channel path busy
2. DP busy = director port busy
3. CU busy = control unit port busy
4. DV busy = device busy
Each time a SAP encounters a busy and has to retry another path for a SSCH, additional SAP cycles are consumed and the % IOP BUSY or SAP utilization will increase. One of the benefits of native FICON is that it makes SAPs or IOPs more productive due to the reduction in busies or RETRIES/SSCH. For the same activity rate, one should see lower IOP utilization with native FICON, FICON Express and FICON Express2 channels than with ESCON channels. One must be careful not to misinterpret IOP utilization percentages, however. High IOP utilization percentages are usually an indicator of contention, especially with ESCON channels, directors and control units. Adding additional IOPs will NOT help reduce channel configuration contention. One must identify the source of the configuration contention and fix it. Migrating from ESCON to native FICON configurations is a natural solution to this problem. Figures 27 and 28 represent a dramatic example of this. Figure 27 is from a z900 ESCON configuration with a lot of contention. Specifically, for the time interval reported, there were a total of 4.73 retries per SSCH, 4.19 of which were channel path busies. This means that when the SAP tried to start a new I/O operation on an ESCON channel, that channel was often already busy processing another I/O, and the SAP had to try to find another ESCON channel path that was available for this I/O; on average it did this 4.19 times per SSCH. This means that either the ESCON channels were operating at high utilizations, or there were not enough paths per LCU defined to handle the number of I/O operations that were being issued simultaneously to the total number of LCUs that shared the same set of ESCON paths. This can happen when there is a burst of activity during a subset of the total RMF interval, e.g. for a few minutes out of a 30 minute or longer interval. There was also an average of 0.54 director port busies per SSCH during this time interval. This means that 2 or more ESCON channel paths, most likely from multiple CECs in the same sysplex, were trying to connect to the same director port and control unit port at the same time; with ESCON, only 1 I/O operation to a given director port or CU port can be active at once. The other I/Os that attempt to use the same destination port will get DP busy signals. The rules of thumb available for these statistics are:
1. keep SAP utilization or % IOP BUSY below 70%,
2. AVG Q LNGTH should be less than 1, and
3. total RETRIES/SSCH should be less than 2, with the sum of DP, CU and DV busies per SSCH less than 1.
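The three rules of thumb above lend themselves to a simple automated check. The following is a hypothetical Python sketch; the dictionary keys are illustrative names for the report columns, not RMF record fields:

```python
def check_iop(stats):
    """Flag rule-of-thumb violations for one IOP row of the RMF I/O
    Queuing Activity report: SAP utilization below 70%, AVG Q LNGTH
    below 1, total RETRIES/SSCH below 2, and DP + CU + DV busies per
    SSCH below 1 (key names are illustrative)."""
    warnings = []
    if stats["pct_iop_busy"] >= 70.0:
        warnings.append("SAP utilization at or above 70%")
    if stats["avg_q_lngth"] >= 1.0:
        warnings.append("initiative queue length at or above 1")
    if stats["retries_per_ssch"] >= 2.0:
        warnings.append("total retries/SSCH at or above 2")
    if stats["dp_busy"] + stats["cu_busy"] + stats["dv_busy"] >= 1.0:
        warnings.append("DP+CU+DV busies per SSCH at or above 1")
    return warnings

# SYS row from the contended z900 ESCON configuration (Figure 27):
z900 = dict(pct_iop_busy=65.15, avg_q_lngth=0.55,
            retries_per_ssch=4.73, dp_busy=0.54, cu_busy=0.0,
            dv_busy=0.0)
print(check_iop(z900))   # only the retries/SSCH rule is violated
```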
Figure 27: I/O Queuing Activity report (RPT VERSION V1R2 RMF) from a z900 ESCON configuration with a lot of contention.

     - INITIATIVE QUEUE -        ------- IOP UTILIZATION -------
IOP  ACTIVITY    AVG Q    % IOP    I/O START    INTERRUPT
     RATE        LNGTH    BUSY     RATE         RATE
00   2745.205    0.77     68.02    2745.181      3684.715
01   3236.994    0.11     53.70    3236.990      3566.626
02   3067.562    0.82     73.73    3067.292      3262.451
SYS  9049.758    0.55     65.15    9049.461     10513.79

     -------- RETRIES / SSCH ---------
IOP  ALL     CP BUSY   DP BUSY   CU BUSY   DV BUSY
00   4.80    4.17      0.62      0.00      0.00
01   2.92    2.60      0.31      0.00      0.00
02   6.58    5.88      0.69      0.00      0.00
SYS  4.73    4.19      0.54      0.00      0.00

Rules of thumb: AVG Q LNGTH < 1, % IOP BUSY < 70%, RETRIES/SSCH < 1 or 2.
Figure 28 shows the dramatic improvements in effective SAP capacity after the migration from the z900 with all ESCON channels to the z990 with most of the I/O activity occurring on FICON channels. The number of RETRIES/SSCH went from 4.73 to 0.21 and average SAP utilization dropped from over 65% to under 20%, resulting in a 3x improvement in effective SAP capacity. Improvements like this are not typical, however, and would be much less dramatic if the original ESCON configuration had been better tuned, to the point where RETRIES/SSCH was less than 1.
Figure 28: I/O Queuing Activity report after migration to the z990 and FICON (+ESCON), showing a significant reduction in retries and a 3x improvement in effective SAP capacity.

     - INITIATIVE QUEUE -        ------- IOP UTILIZATION -------
IOP  ACTIVITY    AVG Q    % IOP    I/O START    INTERRUPT
     RATE        LNGTH    BUSY     RATE         RATE
00   3424.947    0.01     13.53    3424.922      3374.204
01   1969.652    0.00      5.02    1969.651      1921.234
02    401.022    0.00      2.75     400.995       591.365
03   4950.215    0.02     35.58    4950.211      5147.980
SYS  10745.84    0.01     14.22    10745.78     11034.79

     -------- RETRIES / SSCH ---------
IOP  ALL     CP BUSY   DP BUSY   CU BUSY   DV BUSY
00   0.15    0.14      0.00      0.01      0.00
01   0.25    0.24      0.00      0.01      0.00
02   0.23    0.22      0.00      0.01      0.00
03   0.24    0.16      0.07      0.01      0.00
SYS  0.21    0.17      0.03      0.01      0.00
Figures 27 and 28 show the average RETRIES/SSCH at the overall I/O configuration level. To identify which part of the overall I/O configuration is experiencing contention, one needs to look at the LCU section of the I/O Queuing Activity report. An example of this is displayed in Figure 29. The first column is the LCU id and the second column is the CU id. The third column lists the channel path ids for this LCU. Up to 8 channel paths can be defined per LCU. In Figure 29, 6 channel paths are defined for LCU 0222. The “CHPID TAKEN” column is the equivalent of an activity rate: it is the number of SSCHs per second that were executed on the channel paths defined for this LCU. The “% DP BUSY” column is the percentage of times that the SAP encountered a busy signal at an ESCON director port when attempting to select this path for a new SSCH. % DP BUSY will be 0 for native FICON due to the elimination of destination port busy signals with native FICON packet-switched directors. % CU BUSY should also be 0 for native FICON in most customer production environments. CU busies will only occur with native FICON when an individual CU port is being overloaded with work from many different FICON channels simultaneously. The high % CU BUSY (15% for path 03 and 14% for path 06) in Figure 29 is an example of FICON CU port contention. Further evidence of this contention is the high AVG CMR DLY times for these channel paths and the low CHPID TAKEN values, or activity rates, for channel paths 03 and 06 in comparison to the other channel paths defined for this LCU. The AVG CMR DLY of 203 ms for channel path 03 and 207 ms for channel path 06 indicates that the CU ports connected to these channel paths are taking a very long time to respond to the new SSCH work that the channel is trying to send to them. In contrast, the CU ports connected to channel paths 01, 02, 04 and 05 are responding on average in about 0.5 ms. In this case, if no errors were made in the IOCDS, then some “tuning” of the configuration is necessary to reduce these CU busies and achieve better response time results. Contention due to CU busies results in higher than normal PEND times and contributes to a higher than normal average number of open exchanges for this LCU. The solution is to identify the source of contention at the CU ports connected to channel paths 03 and 06 in this example and fix it.
Figure 29: I/O Queuing Activity report from a configuration with FICON CU port contention causing high pend times.

LCU   CONTROL UNITS
0222  1000

                                  AVG    AVG
CHAN   CHPID     % DP    % CU     CUB    CMR
PATHS  TAKEN     BUSY    BUSY     DLY    DLY
01     89.256    0.00     0.00    0.0    0.5
02     86.348    0.00     0.00    0.0    0.5
03      1.908    0.00    15.08    0.0    203
04     89.644    0.00     0.00    0.0    0.5
05     86.055    0.00     0.00    0.0    0.5
06      2.132    0.00    13.95    0.0    207
*      355.34    0.00     0.19    0.0    2.9

Note: CU busies and high CMR delay times are NOT normal for native FICON; they indicate CU port contention.
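The path-level symptom pattern described above (non-zero % CU BUSY together with a high AVG CMR DLY) can be picked out mechanically. A minimal Python sketch, with illustrative field names and thresholds (not RMF record fields), applied to the LCU 0222 rows from Figure 29:

```python
def flag_contended_paths(lcu_rows, cu_busy_pct=1.0, cmr_dly_ms=2.0):
    """Return the channel paths of an LCU that show signs of native
    FICON CU port contention: % CU BUSY above cu_busy_pct together
    with AVG CMR DLY above cmr_dly_ms (thresholds illustrative)."""
    return [row["chpid"] for row in lcu_rows
            if row["pct_cu_busy"] > cu_busy_pct
            and row["avg_cmr_dly"] > cmr_dly_ms]

# LCU 0222 rows from Figure 29:
lcu_0222 = [
    {"chpid": "01", "chpid_taken": 89.256, "pct_cu_busy": 0.00, "avg_cmr_dly": 0.5},
    {"chpid": "02", "chpid_taken": 86.348, "pct_cu_busy": 0.00, "avg_cmr_dly": 0.5},
    {"chpid": "03", "chpid_taken": 1.908,  "pct_cu_busy": 15.08, "avg_cmr_dly": 203},
    {"chpid": "04", "chpid_taken": 89.644, "pct_cu_busy": 0.00, "avg_cmr_dly": 0.5},
    {"chpid": "05", "chpid_taken": 86.055, "pct_cu_busy": 0.00, "avg_cmr_dly": 0.5},
    {"chpid": "06", "chpid_taken": 2.132,  "pct_cu_busy": 13.95, "avg_cmr_dly": 207},
]
print(flag_contended_paths(lcu_0222))   # ['03', '06']
```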
For FICON channels it is also possible to estimate the average number of bytes transferred per SSCH by dividing the MB/sec of a FICON channel from the Channel Path Activity report by the total SSCH/sec processed by that channel from the I/O Queuing Activity report. The total SSCH/sec processed by a FICON channel can be determined by adding up all of the “CHPID TAKEN” fields on the I/O Queuing Activity report for each LCU that a single FICON channel is connected to. If the average data transfer sizes of your channel programs are greater than 27K bytes, then your workload is most likely pushing the channel and CU port buses and links to higher levels of utilization than other resources, and you should focus on the MB/sec fields on your RMF Channel Activity and FICON Director Activity Reports and compare these to the maximum capability of the FICON channels, CU ports and links used in your configuration.
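The bytes-per-SSCH estimate above is a single division once the CHPID TAKEN rates are summed. A small Python sketch with hypothetical input values (names are illustrative):

```python
def avg_bytes_per_ssch(channel_mb_sec, chpid_taken_rates):
    """Estimate average bytes transferred per SSCH for a FICON
    channel: MB/sec from the Channel Path Activity report divided by
    the sum of the CHPID TAKEN rates of every LCU the channel serves
    (the approximation described in the text)."""
    total_ssch_sec = sum(chpid_taken_rates)
    return channel_mb_sec * 1_000_000 / total_ssch_sec

# Hypothetical channel moving 120 MB/sec across three LCUs:
size = avg_bytes_per_ssch(120.0, [1500.0, 1500.0, 1000.0])
print(f"{size:.0f} bytes/SSCH")   # 30000: above 27K, a large-transfer workload
```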
In summary, the basics of performance analysis do not change with a FICON configuration versus an ESCON configuration. In both environments, an appropriate technique is to first calculate I/O intensities, where I/O intensity equals I/O rate multiplied by response time. This analysis can be done at a device volume level, an LCU level, a physical CU box level or for a group of channels. The parts of the total I/O configuration that have the highest I/O intensities are the “hot spots” of the configuration. These are the areas where configuration tuning has the potential for yielding the highest benefit. As explained above, the individual components of response time (IOSQ, DISC, PEND and CONN) will tell you where you should focus your efforts. The average open exchange calculation is a subset of the I/O intensity calculation that uses the DISC + CONN + CMR components of response time. Except in cases of extremely low control unit cache hit ratios, the open exchange limit is not the cause of high values of average open exchanges. Instead, high values for average open exchanges are most likely the result of driving either the channels or the control unit to high levels of utilization. Tuning efforts need to be focused on the appropriate areas based on the DISC, CONN and CMR components of workload response times. If the FICON channel processor and bus utilizations as reported on the RMF Channel Activity report, and the link utilizations calculated from the MB/sec information, are less than 50%, then the tuning efforts need to focus on the control units in the configuration.
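The hot-spot technique summarized above amounts to computing I/O intensity (rate x response time) per component and ranking. A Python sketch using entirely hypothetical LCU figures:

```python
def io_intensity(rate, resp_ms):
    """I/O intensity = I/O rate x response time: milliseconds of I/O
    service accumulated per second.  Usable at the device, LCU,
    physical CU or channel-group level."""
    return rate * resp_ms

# Hypothetical LCUs: {id: (I/Os per second, response time in ms)}
lcus = {"0222": (355.34, 12.4), "0223": (900.0, 1.9), "0224": (120.0, 25.0)}
ranked = sorted(lcus, key=lambda k: io_intensity(*lcus[k]), reverse=True)
print(ranked)   # hottest LCU first: ['0222', '0224', '0223']
```

Note that the middle LCU has the highest I/O rate but the lowest intensity; rate alone does not identify the hot spot.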
The basic architecture and design differences between FICON and ESCON resulted in many changes to the performance data that appear on RMF reports. Additional information in the form of FICON processor and bus utilizations, READ and WRITE MB/sec, AVG FRAME SIZE and AVG CMR DLY is provided to help analyze the multiplexing capability of FICON. Since ESCON is only capable of executing one I/O operation at a time, RMF reports the time that the entire CHPID path is busy for ESCON channel utilization. With FICON, we must consider the individual components of the total CHPID path, such as the FICON channel processor and bus, the fibre link, the director destination port and the control unit port adapter microprocessor, bus and link. The charts and examples provided in this paper should help guide you in assessing the maximum capability of FICON Express2 channels for your workload.
Conclusion
The zSeries FICON Express2 channels available on the z990 and z890 offer many benefits over ESCON and previous generations of FICON channels. The increased throughput and bandwidth capabilities of these channels offer the opportunity for improved performance with simpler configurations and reduced infrastructure over longer distances, to meet the needs of future datacenter growth, including backup and disaster recovery requirements. The total native FICON solution – DASD, TAPE and Printer attachments, directors and the new and improved FICON Express2 channels – is available and ready for your installation. Additional FICON product information is available on the IBM System Sales Web site and the zSeries I/O connectivity Web site at www.ibm.com/servers/eserver/zseries/connectivity/.
Acknowledgements
The data presented in this paper is based upon measurements carried out over several years using a mixture of IBM internal tools and non-IBM I/O driver programs, specifically Version 13 of the PAI/O Driver for z/OS. I would like to thank all of the reviewers of this paper for their helpful comments. Special thanks go to Mario Borelli for his continued support on this effort.
Copyright IBM Corporation 2005
IBM Corporation
Marketing Communications, Server Group
Route 100
Somers, NY 10589
U.S.A.
Produced in the United States of America
04/05
All Rights Reserved
IBM, IBM eServer, IBM logo, ESCON, FICON, RMF, and zSeries are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries or both.
Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries or both.
Linux is a registered trademark of Linus Torvalds.
ON DEMAND BUSINESS (logo) is a trademark of International Business Machines Corporation.
PAI/O is a trademark of Performance Associates, Inc.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Intel is a trademark of Intel Corporation in the United States, other countries or both.
Other company, product and service names may be trademarks or service marks of others.
Information concerning non-IBM products was obtained from the suppliers of their products or their published announcements. Questions on the capabilities of the non-IBM products should be addressed with the suppliers.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
GM13-0702-00