Problem Reporting and Analysis Linux on System z ... - z/VM - IBM

Problem Reporting and Analysis Linux on System z –
How to survive a Linux Critical Situation

Sven Schuetz
Linux on System z Development and Service
sven@de.ibm.com

© 2011 IBM Corporation


IBM
Live Virtual Class – Linux on System z

Agenda

Introduction
How to help us to help you
Systems monitoring
How to dump a Linux on System z
Real customer cases


Introductory remarks

Problem analysis looks straightforward on the charts, but it may have taken weeks to get done.
A problem does not necessarily show up at its place of origin.
The more information is available, the sooner the problem can be solved, because gathering and submitting additional information again and again usually introduces delays.
This presentation can only introduce some tools and show how they can be used; comprehensive documentation on their capabilities is found in the documentation of the corresponding tool.
Do not forget to update your systems.


Describe the problem

Get as much information as possible about the circumstances:
– What is the problem?
– When did it happen? (date and time, important to dig into logs)
– Where did it happen? One or more systems, production or test environment?
– Is this a first-time occurrence?
– If it occurred before: how frequently does it occur?
– Is there any pattern?
– Was anything changed recently?
– Is the problem reproducible?
Write down as much information as possible about the problem!


Describe the environment

Machine setup
– Machine type (z196, z10, z9, ...)
– Storage server (ESS800, DS8000, other vendors' models)
– Storage attachment (FICON, ESCON, FCP, how many channels)
– Network (OSA (type, mode), HiperSockets), ...
Infrastructure setup
– Clients
– Other computer systems
– Network topologies
– Disk configuration
Middleware setup
– Databases, web servers, SAP, TSM, ... (including version information)


Trouble Shooting First-Aid Kit (1/2)

Install packages required for debugging
– s390-tools/s390-utils
• dbginfo.sh
– sysstat
• sadc/sar
• iostat
– procps
• vmstat, top, ps
– net-tools
• netstat
– dump tools crash / lcrash
• lcrash (lkcdutils) available with SLES9 and SLES10
• crash available on SLES11
• crash in all RHEL distributions
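Whether the tools from these packages are already installed can be checked with a quick loop. This is only a sketch: it looks the tools up in $PATH, so sadc is deliberately absent from the list, since it normally lives outside $PATH under /usr/lib64/sa/.

```shell
# Report which of the debugging tools are reachable via $PATH.
for tool in dbginfo.sh sar iostat vmstat top ps netstat lcrash crash; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```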


Trouble Shooting First-Aid Kit (2/2)

Collect dbginfo.sh output
– Proactively on a healthy system
– When problems occur – then compare with the healthy system
Collect system data
– Always archive syslog (/var/log/messages)
– Start the sadc (System Activity Data Collection) service when appropriate (please include disk statistics)
– Collect z/VM MONWRITE data when appropriate, if running under z/VM
When the system hangs
– Take a dump
• Include System.map, Kerntypes (if available) and the vmlinux file
– See the "Using the dump tools" book on
http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26ddt02.pdf
Enable extended tracing in /sys/kernel/debug/s390dbf for the subsystem in question
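The last point can be illustrated with a short sketch. Each s390dbf debug area exposes a writable level file in debugfs; the area name qeth_setup below is an assumption — list /sys/kernel/debug/s390dbf to see which areas exist on your system (writing the level requires root).

```shell
# Sketch: inspect and raise the trace level of one s390dbf debug area.
area=/sys/kernel/debug/s390dbf/qeth_setup   # assumed area name
if [ -d "$area" ]; then
    cat "$area/level"        # current level, 0 (off) .. 6 (most verbose)
    echo 6 > "$area/level"   # enable extended tracing
else
    echo "s390dbf not available on this system"
fi
```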


dbginfo Script (1/2)

dbginfo.sh is a script to collect various system-related files for debugging purposes. It generates a tar archive which can be attached to PMRs / Bugzilla entries.
Part of the s390-tools package in SUSE and recent Red Hat distributions
– dbginfo.sh gets continuously improved by service and development
Can be downloaded directly from the developerWorks website:
http://www.ibm.com/developerworks/linux/linux390/s390-tools.html
It is similar to the Red Hat tool sosreport or supportconfig from Novell

root@larsson:~> dbginfo.sh
Create target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
Change to target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
[...]


dbginfo Script (2/2)

Linux information:
– /proc/[version, cpu, meminfo, slabinfo, modules, partitions, devices, ...]
– System z specific device driver information: /proc/s390dbf (RHEL 4 only) or /sys/kernel/debug/s390dbf
– Kernel messages /var/log/messages
– Reads configuration files in the /etc/ directory [ccwgroup.conf, modules.conf, fstab]
– Uses several commands: ps, dmesg
– Query setup scripts
• lscss, lsdasd, lsqeth, lszfcp, lstape
– And much more
z/VM information:
– Release and service level: q cplevel
– Network setup: q [lan, nic, vswitch, v osa]
– Storage setup: q [set, v dasd, v fcp, q pav, ...]
– Configuration/memory setup: q [stor, v stor, xstore, cpus, ...]
– When the system runs as a z/VM guest, ensure that the guest has the appropriate privilege class authorities to issue the commands


SADC/SAR

Capture Linux performance data with sadc/sar
– CPU utilization
– Disk I/O overview and on device level
– Network I/O and errors on device level
– Memory usage/swapping
– ... and much more
– Reports statistics data over time and creates average values for each item
SADC example (for more see man sadc)
– System Activity Data Collector (sadc) --> data gatherer
– /usr/lib64/sa/sadc [options] [interval [count]] [binary outfile]
– /usr/lib64/sa/sadc 10 20 sadc_outfile


SADC/SAR (cont'd)

– /usr/lib64/sa/sadc -d 10 sadc_outfile
– -d option: statistics for disk
– Should be started as a service during system start
SAR example (for more see man sar)
– System Activity Report (sar) command --> reporting tool
– sar -A
– -A option: reports all the collected statistics
– sar -A -f sadc_outfile > sar_outfile
Please include the binary sadc data and the sar -A output when submitting SADC/SAR information to IBM support
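The collect-then-report flow above can be sketched end to end as follows. The sadc path and options are the ones from the slides; the guard is only there because sysstat may be missing or installed at a different path (e.g. /usr/lib/sa/sadc on some distributions).

```shell
# Collect a few samples with sadc, then format them with sar.
SADC=/usr/lib64/sa/sadc
if [ -x "$SADC" ]; then
    "$SADC" -d 2 3 sadc_outfile        # 3 samples, 2 seconds apart, with disk stats
    sar -A -f sadc_outfile > sar_outfile
    head -n 5 sar_outfile              # first lines of the human-readable report
else
    echo "sadc not found at $SADC - install the sysstat package"
fi
```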


CPU utilization

Per-CPU values: watch out for
– system time (kernel time)
– iowait time (slow I/O subsystem)
– steal time (time taken by other guests)


Disk I/O rates

read/write operations
– per I/O device
– tps: transactions
– rd/wr_secs: sectors
Is your I/O balanced?
Maybe you should stripe your LVs


Linux on System z dump tools

DASD dump tool
– Writes the dump directly to a DASD partition
– Uses the s390 standalone dump format
– ECKD and FBA DASDs supported
– Single-volume and multiple-volume (for large systems) dumps possible
– Works in z/VM and in LPAR
SCSI dump tool
– Writes the dump into a filesystem
– Uses the lkcd dump format
– Works in z/VM and in LPAR
VMDUMP
– Writes the dump to VM spool space (VM reader)
– z/VM-specific dump format; the dump must be converted
– Only available when running under z/VM
Tape dump tool
– Writes the dump directly to an ESCON/FICON tape device
– Uses the s390 standalone dump format


DASD dump tool – general usage

1. Format and partition the dump device
root@larsson:~> dasdfmt -f /dev/dasd<x> -b 4096
root@larsson:~> fdasd /dev/dasd<x>
2. Prepare the dump device in Linux
root@larsson:~> zipl -d /dev/dasd<x>1
3. Stop all CPUs
4. Store status
5. IPL the dump device
6. Copy the dump to Linux
root@larsson:~> zgetdump /dev/<dump device> > dump_file


DASD dump under z/VM

Prepare the dump device under Linux, if possible in a 64-bit environment:
root@larsson:~> zipl -d /dev/dasd<x>1
After a Linux crash, issue these commands on the 3270 console:
#cp cpu all stop
#cp cpu 0 store status
#cp i <dump device>
Wait until the dump is saved on the device:
00: zIPL v1.6.0 dump tool (64 bit)
00: Dumping 64 bit OS
00: 00000087 / 00000700 MB 0
...
00: Dump successful
Only a disabled wait PSW on older distributions
Attach the dump device to a Linux system with the dump tools installed
Store the dump to the Linux file system from the dump device (e.g. zgetdump)


DASD dump on LPAR (1/2)

[screenshot-only slide]

DASD dump on LPAR (2/2)

[screenshot-only slide]



Multi volume dump

zipl can now dump to multiple DASDs. It is now possible to dump system images which are larger than a single DASD.
– You can specify up to 32 ECKD DASD partitions for a multi-volume dump
What are dumps good for?
– Full snapshot of system state, taken at any point in time (e.g. after a system has crashed, or of a running system)
– Can be used to analyse system state beyond the messages written to the syslog
– Internal data structures not exported anywhere else
– Obtain messages which have not been written to the syslog due to a crash


Multi volume dump (cont'd)

How to prepare a set of ECKD DASD devices for a multi-volume dump (64-bit systems only)
– We use two DASDs in this example:
root@larsson:~> dasdfmt -f /dev/dasdc -b 4096
root@larsson:~> dasdfmt -f /dev/dasdd -b 4096
– Create the partitions with fdasd. The sum of the partition sizes must be sufficiently large (the memory size + 10 MB):
root@larsson:~> fdasd /dev/dasdc
root@larsson:~> fdasd /dev/dasdd
– Create a file called sample_dump_conf containing the device nodes (e.g. /dev/dasdc1) of the two partitions, separated by one or more line feed characters
– Prepare the volumes using the zipl command:
root@larsson:~> zipl -M sample_dump_conf
[...]
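The sample_dump_conf file described in the steps above can be created like this; the device nodes are the example partitions from this slide — substitute the partitions you created with fdasd.

```shell
# One device node per line, as zipl -M expects.
printf '%s\n' /dev/dasdc1 /dev/dasdd1 > sample_dump_conf
cat sample_dump_conf
```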


Multi volume dump (cont'd)

To obtain a dump with the multi-volume DASD dump tool, perform the following steps:
– Stop all CPUs, store status on the IPL CPU
– IPL the dump tool using one of the prepared volumes, either 4711 or 4712
– After the dump tool is IPLed, you'll see a message that indicates the progress of the dump. Then you can IPL Linux again
#cp cpu all stop
#cp cpu 0 store status
#cp ipl 4711
Copying a multi-volume dump to a file
– Use zgetdump without any option to copy the dump parts to a file:
root@larsson:~> zgetdump /dev/dasdc > mv_dump_file


Multi volume dump (cont'd)

Display information about the involved volumes:
root@larsson:~> zgetdump -d /dev/dasdc
'/dev/dasdc' is part of Version 1 multi-volume dump, which is
spread along the following DASD volumes:
0.0.4711 (online, valid)
0.0.4712 (online, valid)
[...]
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdc
Dump device: /dev/dasdc
>>> Dump header information
[...]



SCSI dump tool – general usage

1. Create a partition with PC-BIOS disk layout (fdisk)
2. Format the partition with an ext2 or ext3 filesystem
3. Install the dump tool:
– mount and prepare the disk:
root@larsson:~> mount /dev/sda1 /dumps
root@larsson:~> zipl -D /dev/sda1 -t /dumps
– Optional: /etc/zipl.conf:
dumptofs=/dev/sda1
target=/dumps
4. Stop all CPUs
5. Store status
6. IPL the dump device

The dump tool creates dumps directly in the filesystem
SCSI dump is supported for LPARs, and as of z/VM 5.4 also under z/VM


SCSI dump under z/VM

SCSI dump from z/VM is supported as of z/VM 5.4
Issue the SCSI dump:
#cp cpu all stop
#cp cpu 0 store status
#cp set dumpdev portname 47120763 00ce93a7 lun 47120000 00000000 bootprog 0
#cp ipl 4b49 dump
To access the dump, mount the dump partition


SCSI dump on LPAR

Select the CPC image for the LPAR to dump
Go to the Load panel
Issue the SCSI dump, specifying:
– FCP device
– WWPN
– LUN


VMDUMP

The only method to dump NSSes or DCSSes under z/VM
Works non-disruptively
Create the dump:
#cp vmdump to cmsguest
Receive the dump:
– Store the dump from the reader into a CMS dump file:
#cp dumpload
– Transfer the dump from CMS to Linux, e.g. via FTP
– NEW: vmur device driver:
root@larsson:~> vmur rec vmdump
Linux tool to convert a vmdump to lkcd format:
root@larsson:~> vmconvert vmdump linux.dump
Problem: the dump process is relatively slow


How to obtain information about a dump

Display information about the involved volume:
root@larsson:~> zgetdump -d /dev/dasdb
'/dev/dasdb' is Version 0 dump device.
Dump size limit: none
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdb1
Dump device: /dev/dasdb1
Dump created on: Thu Oct 8 15:44:49 2009
Magic number: 0xa8190173618f23fd
Version number: 3
Header size: 4096
Page size: 4096
Dumped memory: 1073741824
Dumped pages: 262144
Real memory: 1073741824
cpu id: 0xff00012320978000
System Arch: s390x (ESAME)
Build Arch: s390x (ESAME)
>>> End of Dump header



How to obtain information about a dump (cont'd)

To display information about the dump itself (dump header) and check whether the dump is valid, use lcrash with the options '-i' and '-d':
root@larsson:~> lcrash -i -d /dev/dasdb1
Dump Type: s390 standalone dump
Machine: s390x (ESAME)
CPU ID: 0xff00012320978000
Memory Start: 0x0
Memory End: 0x40000000
Memory Size: 1073741824
Time of dump: Thu Oct 8 15:44:49 2009
Number of pages: 262144
Kernel page size: 4096
Version number: 3
Magic number: 0xa8190173618f23fd
Dump header size: 4096
Dump level: 0x4
Build arch: s390x (ESAME)
Time of dump end: Thu Oct 8 15:45:01 2009
End Marker found! Dump is valid!


Automatic dump on panic (SLES 10/11, RHEL 5/6): dumpconf

The dumpconf tool configures a dump device that is used for an automatic dump in case of a kernel panic.
– The command can be installed as a service script under /etc/init.d/dumpconf or can be called manually
– Start the service: # service dumpconf start
– It reads the configuration file /etc/sysconfig/dumpconf
– Example configuration for a CCW dump device (DASD) and re-IPL after dump:
ON_PANIC=dump_reipl
DUMP_TYPE=ccw
DEVICE=0.0.4711


Automatic dump on panic (SLES 10/11, RHEL 5): dumpconf (cont'd)

– Example configuration for an FCP dump device (SCSI disk):
ON_PANIC=dump
DUMP_TYPE=fcp
DEVICE=0.0.4714
WWPN=0x5005076303004712
LUN=0x4047401300000000
BOOTPROG=0
BR_LBA=0
– Example configuration for re-IPL without taking a dump if a kernel panic occurs:
ON_PANIC=reipl
– Example of executing CP commands and rebooting from device 4711 if a kernel panic occurs:
ON_PANIC=vmcmd
VMCMD_1="MSG Starting VMDUMP"
VMCMD_2="VMDUMP"
VMCMD_3="IPL 4711"


Get dump and send it to service organization

DASD/Tape:
– Store the dump to the Linux file system from the dump device:
root@larsson:~> zgetdump /dev/<dump device> > dump_file
– Alternative: lcrash (compression possible)
root@larsson:~> lcrash -d /dev/dasdxx -s
SCSI:
– Get the dump from the filesystem
Additional files needed for dump analysis:
– SUSE (lcrash tool): /boot/System.map-xxx and /boot/Kerntypes-xxx
– Red Hat & SUSE (crash tool): vmlinux file (kernel with debug info), contained in the debug kernel rpms:
• Red Hat: kernel-debuginfo-xxx.rpm and kernel-debuginfo-common-xxx.rpm
• SUSE: kernel-default-debuginfo-xxx.rpm


Handling large dumps

Compress the dump and split it into parts of 1 GB
root@larsson:~> zgetdump /dev/dasdc1 | gzip | split -b 1G
Several compressed files such as xaa, xab, xac, ... are created
Create md5 sums of the compressed files
root@larsson:~> md5sum xa* > dump.md5
Upload all parts together with the md5 information
Verification of the parts by the receiver
root@larsson:~> md5sum -c dump.md5
xaa: OK
[....]
Merge the parts and uncompress the dump
root@larsson:~> cat xa* | gunzip -c > dump
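The whole compress/split/verify/merge workflow above can be tried end to end on any Linux system by substituting a small dummy file for the zgetdump output and 16 KB parts for 1 GB ones:

```shell
# Round-trip demonstration of the split/md5/merge workflow.
cd "$(mktemp -d)"
dd if=/dev/urandom of=dump bs=1024 count=64 2>/dev/null   # dummy "dump"
gzip -c dump | split -b 16k -        # creates parts xaa, xab, ... as in the slide
md5sum xa* > dump.md5
md5sum -c dump.md5                   # receiver-side verification
cat xa* | gunzip -c > dump.restored
cmp dump dump.restored && echo "round trip OK"
```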


Transferring dumps

Transferring single-volume dumps with ssh
root@larsson:~> zgetdump /dev/dasdc1 | ssh user@host "cat > dump_file_on_target_host"
Transferring multi-volume dumps with ssh
root@larsson:~> zgetdump /dev/dasdc | ssh user@host "cat > multi_volume_dump_file_on_target_host"
Transferring a dump with ftp
– Establish an ftp session with the target host, log in and set the transfer mode to binary
– Send the dump to the host
ftp> put |"zgetdump /dev/dasdc1" <dump file name>


Dump tool summary

DASD (stand-alone tool)
– Environment: VM & LPAR
– Preparation: zipl -d /dev/<node>
– Creation: stop CPUs & store status, ipl <dump device>
– Dump medium: ECKD or FBA
– Copy to filesystem: zgetdump /dev/<node> > dump_file
– Viewing: lcrash or crash

Tape (stand-alone tool)
– Environment: VM & LPAR
– Preparation: zipl -d /dev/<node>
– Creation: stop CPUs & store status, ipl <dump device>
– Dump medium: tape cartridges
– Copy to filesystem: zgetdump /dev/<node> > dump_file
– Viewing: lcrash or crash

SCSI (stand-alone tool)
– Environment: LPAR, and VM as of z/VM 5.4
– Preparation: mkdir /dumps/mydumps; zipl -D /dev/sda1 ...
– Creation: stop CPUs & store status, ipl <dump device>
– Dump medium: Linux file system on a SCSI disk
– Copy to filesystem: --- (the dump is written into the filesystem directly)
– Viewing: lcrash or crash

VMDUMP
– Environment: VM only
– Preparation: ---
– Creation: vmdump
– Dump medium: VM reader
– Copy to filesystem: Dumpload, ftp ..., vmconvert ...
– Viewing: lcrash or crash

See the "Using the dump tools" book on
http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html

34

Customer Cases

35

Availability: Guest spontaneously reboots

Configuration:
– Oracle RAC server or other HA solution under z/VM

Problem Description:
– Occasionally guests spontaneously reboot without any notification or console message

Tools used for problem determination:
– cp instruction trace of the (re)IPL code
– Crash dump taken after the trace was hit

[Diagram: Linux 1 and Linux 2, each an Oracle RAC server, exchange HA cluster communication and share an Oracle RAC database]

36

Availability: Guest spontaneously reboots - Steps to find root cause

Question: Who rebooted the system?

Step 1
– Find the address of the (re)IPL code in the system map
– Use this address to set an instruction trace

cd /boot
grep machine_restart System.map-2.6.16.60-0.54.5-default
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
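The same lookup can be scripted; the mock System.map below reuses the addresses from the slide, and awk extracts the one needed for the CP trace:

```shell
# Script the Step 1 lookup against a mock System.map that reuses the
# addresses shown above; awk picks the address of machine_restart.
set -e
cat > System.map-test <<'EOF'
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
EOF
addr=$(awk '$3 == "machine_restart" { print $1 }' System.map-test)
echo "trace address: $addr"   # then used as: CP CPU ALL TR IN R 10C364.4
```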

37

Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 2
– Set a CP instruction trace on the reboot address
– The system is halted at that address when a reboot is triggered

CP CPU ALL TR IN R 10C364.4
HCPTRI1027I An active trace set has turned RUN off
CP Q TR
NAME INITIAL (ACTIVE)
1 INSTR PSWA 0010C364-0010C367
TERM NOPRINT NORUN SIM
SKIP 00000 PASS 00000 STOP 00000 STEP 00000
CMD NONE
-> 000000000010C364' STMF EBCFF0780024 >> 000000003A557D48 CC 2

38

Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 3
– Take a dump when the (re)IPL code is hit

cp cpu all stop
cp store status
Store complete.
cp i 4fc6
Tracing active at IPL
HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
zIPL v1.6.3-0.24.5 dump tool (64 bit)
Dumping 64 bit OS
00000128 / 00001024 MB
......
00001024 / 00001024 MB
Dump successful
HCPIR450W CP entered, disabled wait PSW 00020000 80000000 00000000 00000000

39


Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 4
– Save the dump in a file

zgetdump /dev/dasdb1 > dump_file
Dump device: /dev/dasdb1
>>> Dump header information <<<
[...]
>>> End of Dump header <<<



Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 5
– Use (l)crash to find out which process triggered the reboot

STACK:
 0 start_kernel+950 [0x6a690e]
 1 _stext+32 [0x100020]
================================================================
TASK HAS CPU (1): 0x3f720650 (oprocd.bin):
LOWCORE INFO:
 -psw : 0x0704200180000000 0x000000000010c36a
 -function : machine_restart+6
 -prefix : 0x3f438000
 -cpu timer: 0x7fffffff 0xff9e6c00
 -clock cmp: 0x00c6ca69 0x22337760
 -general registers:
 [...]

/var/opt/oracle/product/crs/bin/oprocd.bin

STACK:
 0 __handle_sysrq+248 [0x361240]
 1 write_sysrq_trigger+98 [0x2be796]
 2 sys_write+392 [0x225a68]
 3 sysc_noemu+16 [0x1179a8]

41


Availability: Guest spontaneously reboots (cont'd)

Problem Origin:
– HA component erroneously detected a system hang
  • hangcheck_timer module did not receive a timer IRQ
  • z/VM 'time bomb' switch
  • TSA monitor

z/VM cannot guarantee 'real-time' behavior if overloaded
– Longest 'hang' observed: 37 seconds(!)

Solution:
– Offload HA workload from the overloaded z/VM
  • e.g. use a separate z/VM
  • or: run large Oracle RAC guests in LPAR

42

Network: network connection is too slow

Configuration:
– z/VSE running CICS, connection to DB2 in Linux on System z
– HiperSockets connection from Linux to z/VSE
– But also applies to HiperSockets connections between Linux and z/OS

Problem Description:
– When CICS transactions were monitored, some transactions take a couple of seconds instead of milliseconds

Tools used for problem determination:
– dbginfo.sh
– s390 debug feature
– sadc/sar
– CICS transaction monitor

43

Network: network connection is too slow (cont'd)

s390 debug feature
– Check for qeth errors:

cat /sys/kernel/debug/s390dbf/qeth_qerr
00 01282632346:099575 2 - 00 0000000180b20218 71 6f 75 74 65 72 72 00 | qouterr.
00 01282632346:099575 2 - 00 0000000180b20298 20 46 31 35 3d 31 30 00 | F15=10.
00 01282632346:099576 2 - 00 0000000180b20318 20 46 31 34 3d 30 30 00 | F14=00.
00 01282632346:099576 2 - 00 0000000180b20390 20 71 65 72 72 3d 41 46 | qerr=AF
00 01282632346:099576 2 - 00 0000000180b20408 20 73 65 72 72 3d 32 00 | serr=2.

dbginfo file
– Check the buffer count:

cat /sys/devices/qeth/0.0.1e00/buffer_count
16

Problem Origin:
– Too few inbound buffers

44

Network: network connection is too slow (cont'd)

Solution:
– Increase the inbound buffer count (default: 16, max: 128)
– Check the actual buffer count with 'lsqeth -p'
– Set the inbound buffer count in the appropriate config file:
  • SUSE SLES10:
    - in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200
    - add QETH_OPTIONS="buffer_count=128"
  • SUSE SLES11:
    - in /etc/udev/rules.d/51-qeth-0.0.f200.rules add ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
  • Red Hat:
    - in /etc/sysconfig/network-scripts/ifcfg-eth0
    - add OPTIONS="buffer_count=128"
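As a sketch, the SLES11 variant can be generated instead of typed by hand; the device ID 0.0.f200 is the example from the slide, and the rule file is written to the current directory here rather than /etc/udev/rules.d:

```shell
# Generate the SLES11-style udev rule for the example qeth device 0.0.f200.
# Written to the current directory here; the real target is /etc/udev/rules.d.
set -e
dev=0.0.f200
rules=51-qeth-$dev.rules
cat > "$rules" <<EOF
ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="$dev", ATTR{buffer_count}="128"
EOF
grep -q 'ATTR{buffer_count}="128"' "$rules" && echo "rule written: $rules"
```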

45

High disk response times

Configuration:
– z10, HDS storage server (HyperPAV enabled)
– z/VM, Linux with Oracle database
– VM-controlled minidisks attached to Linux, LVM on top

Problem description:
– I/O throughput not matching expectations
– Oracle database shows poor performance because of that
– One LVM volume showing significant stress

Tools used for problem determination:
– dbginfo.sh
– sadc/sar
– z/VM monitor data

46

High disk response times (cont'd)

Observation in Linux (iostat extract; the slide's callouts mark response time, throughput and utilization):

dm-9 0.00 0.00 49.75 0.00 19790.50 0.00 795.56 17.89 15.79 2.01 100.00

Conclusion
– PAV not being utilized
– No HyperPAV support in SLES10 SP2
– Static PAV not possible with the current setup (VM-controlled minidisks)
– Need to look for other ways to get more parallel I/O:
  – Link the same minidisk multiple times to a guest
  – Use smaller minidisks and increase striping in Linux
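The second option could look like the sketch below; device names, sizes and stripe parameters are hypothetical, and DRY_RUN=1 only records the commands so the sequence can be reviewed before touching any disk:

```shell
# Sketch of "smaller minidisks + striping": build a striped LV over four
# hypothetical minidisk partitions. DRY_RUN=1 records the commands in
# cmds.log instead of executing them.
DRY_RUN=1
PVS="/dev/dasdb1 /dev/dasdc1 /dev/dasdd1 /dev/dasde1"
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@" | tee -a cmds.log; else "$@"; fi; }
: > cmds.log
run pvcreate $PVS
run vgcreate vgdb $PVS
run lvcreate -i 4 -I 64 -L 20G -n lvdb vgdb   # 4 stripes, 64 KiB stripe size
```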

47

High disk response times (cont'd)

Initial and proposed setup

[Diagram: mapping from physical disk to logical disk(s) in VM to logical disk(s) in Linux]
1. Initial setup
2. Link minidisks to the guest multiple times (multipath in Linux)
3. Smaller disks, more stripes (striping via LVM / device mapper)

48

High disk response times (cont'd)

New observation in Linux
– Response times stay equal
– Throughput equal
– No PAV being used!!

49

High disk response times (cont'd)

Solution: check the PAV setup in VM

50

High disk response times (cont'd)

Finally:

dasdaen 133.33 0.00 278.11 0.00 12298.51 0.00 88.44 0.79 2.83 1.48 41.29
dasdcbt 161.19 0.00 248.26 0.00 12260.70 0.00 98.77 0.91 3.47 1.88 46.77
dasdfwc 149.75 0.00 266.17 0.00 12374.13 0.00 92.98 1.88 7.07 2.54 67.66
dasdael 162.19 0.00 250.25 0.00 12483.58 0.00 99.77 1.90 7.57 2.86 71.64
dasddyz 134.83 0.00 277.61 0.00 12431.84 0.00 89.56 0.75 2.71 1.68 46.77
dasdaem 151.24 0.00 266.17 0.00 12595.02 0.00 94.64 2.01 7.61 2.82 75.12
dasdcbr 169.65 0.00 242.79 0.00 12386.07 0.00 102.03 1.72 7.05 2.83 68.66
dasdfwd 162.69 0.00 249.25 0.00 12348.26 0.00 99.08 1.92 7.70 2.83 70.65
dasddyy 157.21 0.00 259.70 0.00 12409.95 0.00 95.57 2.58 9.96 3.05 79.10
dasddyx 174.63 0.00 237.81 0.00 12374.13 0.00 104.07 1.76 7.38 2.93 69.65
dasdcbs 144.78 0.00 272.14 0.00 12264.68 0.00 90.14 2.53 9.31 2.89 78.61
dasda 0.00 0.00 0.00 1.00 0.00 3.98 8.00 0.01 10.00 5.00 0.50
dasdq 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdss 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdadx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.
dasdawh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdamk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdaek 160.70 0.00 255.22 0.00 12382.09 0.00 97.03 2.27 8.95 2.88 73.63
dasdcbq 148.76 0.00 265.67 0.00 12372.14 0.00 93.14 2.14 8.01 2.85 75.62
dasddyw 162.19 0.00 254.23 0.00 12384.08 0.00 97.42 2.12 8.40 2.90 73.63
dasdfwe 146.27 0.00 271.64 0.00 12419.90 0.00 91.44 2.63 9.71 2.80 76.12
dasdfwf 162.19 0.00 249.75 0.00 12455.72 0.00 99.75 0.71 2.83 1.79 44.78
dasdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-1 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-2 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-3 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.05 7.32 0.53 87.56
dm-4 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-5 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-6 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-7 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.06 7.32 0.53 87.56
dm-8 0.00 0.00 0.00 1.99 0.00 7.96 8.00 0.00 0.00 0.00 0.00
dm-9 0.00 0.00 497.51 0.00 197900.50 0.00 795.56 7.89 15.79 2.01 100.00

51

Bonding throughput not matching expectations

Configuration:
– SLES10 system, connected via OSA card and using the bonding driver

Problem Description:
– Bonding only working with 100 Mbps
– FTP also slow

Tools used for problem determination:
– dbginfo.sh, netperf

Problem Origin:
– ethtool cannot determine the line speed correctly because qeth does not report it

Solution:
– Ignore the 100 Mbps message – upgrade to SLES11

bonding: bond1: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full

52

Availability: Unable to mount file system after LVM changes

Configuration:
– Linux HA cluster with two nodes
– Accessing the same DASDs, which are exported via OCFS2

Problem Description:
– Added one node to the cluster, brought the logical volume online
– Unable to mount the filesystem from any node after that

Tools used for problem determination:
– dbginfo.sh

[Diagram: Linux 1 and Linux 2 accessing an OCFS2 logical volume spanning dasda through dasdf]

53

Availability: Unable to mount file system after LVM changes (cont'd)

Problem Origin:
– LVM metadata was overwritten when adding the 3rd node ({pv|vg|lv}create)
– e.g. superblock not found

Solution:
– Extract the metadata from a running node (/etc/lvm/backup) and write it to disk again

[Diagram: Linux 1, Linux 2 and the new Linux 3 on the OCFS2 logical volume; the new node sees the disks in a different order (dasdf, e, c, a, b, d)]
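The recovery step could be sketched as below; the volume group name vg_ocfs2 is hypothetical, and DRY_RUN=1 records the commands instead of touching any disk:

```shell
# Sketch of the fix: replay the LVM metadata kept in /etc/lvm/backup on a
# surviving node. The VG name is hypothetical; DRY_RUN=1 only logs commands.
DRY_RUN=1
VG=vg_ocfs2
run() { if [ "$DRY_RUN" = 1 ]; then echo "$@" | tee -a recover.log; else "$@"; fi; }
: > recover.log
run vgcfgrestore -f /etc/lvm/backup/$VG $VG   # write saved metadata back to disk
run vgchange -ay $VG                          # reactivate the volume group
```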

54

Kernel panic: Low address protection

Configuration:
– z10 only
– High workload
– The more multithreaded applications are running, the more likely it occurs

Problem Description:
– Concurrent access to pages being removed from the page table

Tools used for problem determination:
– crash/lcrash

Problem Origin:
– Race condition in memory management in firmware

Solution:
– Upgrade to the latest firmware!
– Upgrade to the latest kernels – fix to be integrated in all supported distributions

55
