Problem Reporting and Analysis Linux on System z ... - z/VM - IBM
Problem Reporting and Analysis Linux on System z –
How to survive a Linux Critical Situation
Sven Schuetz
Linux on System z Development and Service
sven@de.ibm.com
© 2011 IBM Corporation
IBM
Live Virtual Class – Linux on System z
Agenda
Introduction
How to help us to help you
Systems monitoring
How to dump a Linux on System z
Real customer cases
Introductory remarks
Problem analysis looks straightforward on the charts, but it might have taken weeks to get it done.
A problem does not necessarily show up at its place of origin.
The more information is available, the sooner the problem can be solved, because gathering and submitting additional information again and again usually introduces delays.
This presentation can only introduce some tools and how they can be used; comprehensive documentation on their capabilities is found in the documentation of the corresponding tool.
Do not forget to update your systems.
Describe the problem
Get as much information as possible about the circumstances:
– What is the problem?
– When did it happen? (date and time; important to dig into logs)
– Where did it happen? One or more systems, production or test environment?
– Is this a first-time occurrence?
– If it occurred before: how frequently does it occur?
– Is there any pattern?
– Was anything changed recently?
– Is the problem reproducible?
Write down as much information as possible about the problem!
Describe the environment
Machine setup
– Machine type (z196, z10, z9, ...)
– Storage server (ESS800, DS8000, other vendors' models)
– Storage attachment (FICON, ESCON, FCP, how many channels)
– Network (OSA (type, mode), HiperSockets), ...
Infrastructure setup
– Clients
– Other computer systems
– Network topologies
– Disk configuration
Middleware setup
– Databases, web servers, SAP, TSM, ... (including version information)
Trouble Shooting First-Aid Kit (1/2)
Install packages required for debugging
– s390-tools/s390-utils
  • dbginfo.sh
– sysstat
  • sadc/sar
  • iostat
– procps
  • vmstat, top, ps
– net-tools
  • netstat
– dump tools crash / lcrash
  • lcrash (lkcdutils) available with SLES9 and SLES10
  • crash available on SLES11
  • crash in all RHEL distributions
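As a quick sanity check, the presence of these tools can be verified with a small script. This is a minimal sketch: the tool list and the /usr/lib64/sa location come from the slides, while the function name is made up.

```shell
# check_tools: print which of the given commands are not installed.
# sadc is often outside $PATH (e.g. /usr/lib64/sa), so that directory
# is checked as a fallback.
check_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 \
            || [ -x "/usr/lib64/sa/$tool" ] \
            || missing="$missing $tool"
    done
    echo "missing:$missing"
}

# The first-aid kit from the slide:
check_tools dbginfo.sh sadc sar iostat vmstat top ps netstat crash lcrash
```

Running this proactively shows at a glance which debugging packages still need to be installed before a critical situation occurs.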
Trouble Shooting First-Aid Kit (2/2)
Collect dbginfo.sh output
– Proactively on a healthy system
– When problems occur – then compare with the healthy system
Collect system data
– Always archive syslog (/var/log/messages)
– Start the sadc (System Activity Data Collection) service when appropriate (please include disk statistics)
– Collect z/VM MONWRITE data if running under z/VM, when appropriate
When the system hangs
– Take a dump
  • Include System.map, Kerntypes (if available) and the vmlinux file
– See the “Using the dump tools” book at
  http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26ddt02.pdf
Enable extended tracing in /sys/kernel/debug/s390dbf for the subsystem in question
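The syslog archiving step can be sketched as a small helper. Only the source path /var/log/messages comes from the slides; the function name and the archive layout are assumptions.

```shell
# archive_syslog: copy a log file into an archive directory with a
# timestamp and compress it, so a baseline from a healthy system is
# kept for later comparison.
archive_syslog() {
    syslog=$1
    archive_dir=$2
    mkdir -p "$archive_dir" || return 1
    ts=$(date +%Y-%m-%d-%H%M%S)
    cp "$syslog" "$archive_dir/messages-$ts" \
        && gzip "$archive_dir/messages-$ts" \
        && echo "$archive_dir/messages-$ts.gz"
}

# Typical use on the system (paths from the slides):
# archive_syslog /var/log/messages /var/log/archive
```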
dbginfo Script (1/2)
dbginfo.sh is a script that collects various system-related files for debugging purposes. It generates a tar archive which can be attached to PMRs / Bugzilla entries.
Part of the s390-tools package in SUSE and recent Red Hat distributions
– dbginfo.sh gets continuously improved by service and development
Can be downloaded directly from the developerWorks website:
http://www.ibm.com/developerworks/linux/linux390/s390-tools.html
It is similar to the Red Hat tool sosreport or supportconfig from Novell
root@larsson:~> dbginfo.sh
Create target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
Change to target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
[...]
dbginfo Script (2/2)
Linux information:
– /proc/[version, cpu, meminfo, slabinfo, modules, partitions, devices, ...]
– System z specific device driver information: /proc/s390dbf (RHEL 4 only) or /sys/kernel/debug/s390dbf
– Kernel messages /var/log/messages
– Reads configuration files in directory /etc/ [ccwgroup.conf, modules.conf, fstab]
– Uses several commands: ps, dmesg
– Query setup scripts
  • lscss, lsdasd, lsqeth, lszfcp, lstape
– And much more
z/VM information:
– Release and service level: q cplevel
– Network setup: q [lan, nic, vswitch, v osa]
– Storage setup: q [set, v dasd, v fcp, q pav, ...]
– Configuration/memory setup: q [stor, v stor, xstore, cpus, ...]
– When the system runs as a z/VM guest, ensure that the guest has the appropriate privilege class authorities to issue the commands
SADC/SAR
Capture Linux performance data with sadc/sar
– CPU utilization
– Disk I/O overview and on device level
– Network I/O and errors on device level
– Memory usage/swapping
– ... and much more
– Reports statistics data over time and creates average values for each item
SADC example (for more see man sadc)
– System Activity Data Collector (sadc) --> data gatherer
– /usr/lib64/sa/sadc [options] [interval [count]] [binary outfile]
– /usr/lib64/sa/sadc 10 20 sadc_outfile
SADC/SAR (cont'd)
– /usr/lib64/sa/sadc -d 10 sadc_outfile
– -d option: statistics for disk
– Should be started as a service during system start
SAR example (for more see man sar)
– System Activity Report (sar) command --> reporting tool
– sar -A
– -A option: reports all the collected statistics
– sar -A -f sadc_outfile > sar_outfile
Please include the binary sadc data and the sar -A output when submitting SADC/SAR information to IBM support
CPU utilization
Per-CPU values: watch out for
– system time (kernel time)
– iowait time (slow I/O subsystem)
– steal time (time taken by other guests)
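As an illustration, a simple filter over sar -u style output can flag intervals with suspicious iowait or steal values. The 10% thresholds are arbitrary, and the assumed column layout (time, CPU, %user, %nice, %system, %iowait, %steal, %idle) may differ between sysstat versions.

```shell
# flag_cpu_problems: scan "sar -u"-style lines and report intervals where
# iowait or steal time exceeds a hypothetical 10% threshold.
# Assumed columns: time CPU %user %nice %system %iowait %steal %idle
flag_cpu_problems() {
    awk '$2 == "all" && $6+0 > 10 { print $1, "high iowait:", $6 }
         $2 == "all" && $7+0 > 10 { print $1, "high steal:", $7 }'
}

# Typical use (layout assumption, verify against your sar version):
# sar -u -f sadc_outfile | flag_cpu_problems
```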
Disk I/O rates
read/write operations
– per I/O device
– tps: transactions
– rd/wr_secs: sectors
Is your I/O balanced? Maybe you should stripe your LVs.
Linux on System z dump tools
DASD dump tool
– Writes dump directly to a DASD partition
– Uses s390 standalone dump format
– ECKD and FBA DASDs supported
– Single-volume and multiple-volume (for large systems) dumps possible
– Works in z/VM and in LPAR
SCSI dump tool
– Writes dump into a filesystem
– Uses lkcd dump format
– Works in z/VM and in LPAR
VMDUMP
– Writes dump to VM spool space (VM reader)
– z/VM-specific dump format; dump must be converted
– Only available when running under z/VM
Tape dump tool
– Writes dump directly to an ESCON/FICON tape device
– Uses s390 standalone dump format
DASD dump tool – general usage
1. Format and partition the dump device
root@larsson:~> dasdfmt -f /dev/dasd<x> -b 4096
root@larsson:~> fdasd /dev/dasd<x>
2. Prepare the dump device in Linux
root@larsson:~> zipl -d /dev/dasd<x>1
3. Stop all CPUs
4. Store status
5. IPL the dump device
6. Copy the dump to Linux
root@larsson:~> zgetdump /dev/<dump_device> > dump_file
DASD dump under z/VM
Prepare the dump device under Linux, if possible in a 64-bit environment:
root@larsson:~> zipl -d /dev/dasd<x>1
After a Linux crash, issue these commands on the 3270 console:
#cp cpu all stop
#cp cpu 0 store status
#cp i <dump device>
Wait until the dump is saved on the device:
00: zIPL v1.6.0 dump tool (64 bit)
00: Dumping 64 bit OS
00: 00000087 / 00000700 MB
...
00: Dump successful
Older distributions show only a disabled wait PSW instead of progress messages
Attach the dump device to a Linux system with dump tools installed
Store the dump to the Linux file system from the dump device (e.g. zgetdump)
DASD dump on LPAR (1/2)
DASD dump on LPAR (2/2)
Multi-volume dump
zipl can now dump to multiple DASDs; it is now possible to dump system images which are larger than a single DASD.
– You can specify up to 32 ECKD DASD partitions for a multi-volume dump
What are dumps good for?
– A full snapshot of system state, taken at any point in time (e.g. after a system has crashed, or of a running system)
– Can be used to analyse system state beyond the messages written to the syslog
– Internal data structures not exported anywhere
– Obtain messages which have not been written to the syslog due to a crash
Multi-volume dump (cont'd)
How to prepare a set of ECKD DASD devices for a multi-volume dump (64-bit systems only)
– We use two DASDs in this example:
root@larsson:~> dasdfmt -f /dev/dasdc -b 4096
root@larsson:~> dasdfmt -f /dev/dasdd -b 4096
– Create the partitions with fdasd. The sum of the partition sizes must be sufficiently large (the memory size + 10 MB):
root@larsson:~> fdasd /dev/dasdc
root@larsson:~> fdasd /dev/dasdd
– Create a file called sample_dump_conf containing the device nodes (e.g. /dev/dasdc1) of the two partitions, separated by one or more line feed characters
– Prepare the volumes using the zipl command:
root@larsson:~> zipl -M sample_dump_conf
[...]
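The configuration-file step above can be sketched as follows. The helper name is made up, and the zipl call is left commented out because it only makes sense against real, formatted DASD partitions.

```shell
# write_dump_conf: write the given device nodes, one per line, into a
# configuration file as expected by "zipl -M".
write_dump_conf() {
    conf=$1
    shift
    printf '%s\n' "$@" > "$conf"
}

conf=$(mktemp)
write_dump_conf "$conf" /dev/dasdc1 /dev/dasdd1
# On a real system, then run:
# zipl -M "$conf"
```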
Multi-volume dump (cont'd)
To obtain a dump with the multi-volume DASD dump tool, perform the following steps:
– Stop all CPUs; store status on the IPL CPU
– IPL the dump tool using one of the prepared volumes, either 4711 or 4712
– After the dump tool is IPLed, you'll see a message that indicates the progress of the dump. Then you can IPL Linux again
#cp cpu all stop
#cp cpu 0 store status
#cp ipl 4711
Copying a multi-volume dump to a file
– Use zgetdump without any option to copy the dump parts to a file:
root@larsson:~> zgetdump /dev/dasdc > mv_dump_file
Multi-volume dump (cont'd)
Display information about the involved volumes:
root@larsson:~> zgetdump -d /dev/dasdc
'/dev/dasdc' is part of Version 1 multi-volume dump, which is
spread along the following DASD volumes:
0.0.4711 (online, valid)
0.0.4712 (online, valid)
[...]
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdc
Dump device: /dev/dasdc
>>> Dump header information <<<
[...]
SCSI dump tool – general usage
1. Create a partition with PC-BIOS disk layout (fdisk)
2. Format the partition with an ext2 or ext3 filesystem
3. Install the dump tool:
– mount and prepare the disk:
root@larsson:~> mount /dev/sda1 /dumps
root@larsson:~> zipl -D /dev/sda1 -t /dumps
– Optional: /etc/zipl.conf:
dumptofs=/dev/sda1
target=/dumps
4. Stop all CPUs
5. Store status
6. IPL the dump device
The dump tool creates dumps directly in the filesystem
SCSI dump is supported for LPARs and, as of z/VM 5.4, under z/VM
SCSI dump under z/VM
SCSI dump from z/VM is supported as of z/VM 5.4
Issue the SCSI dump:
#cp cpu all stop
#cp cpu 0 store status
#cp set dumpdev portname 47120763 00ce93a7 lun 47120000 00000000 bootprog 0
#cp ipl 4b49 dump
To access the dump, mount the dump partition
SCSI dump on LPAR
– Select the CPC image for the LPAR to dump
– Go to the Load panel
– Issue the SCSI dump, specifying:
  • FCP device
  • WWPN
  • LUN
VMDUMP
The only method to dump NSSes or DCSSes under z/VM
Works non-disruptively
Create the dump:
#cp vmdump to cmsguest
Receive the dump:
– Store the dump from the reader into a CMS dump file:
#cp dumpload
– Transfer the dump to Linux from CMS, e.g. via FTP
– NEW: vmur device driver:
root@larsson:~> vmur rec vmdump
Linux tool to convert a vmdump to lkcd format:
root@larsson:~> vmconvert vmdump linux.dump
Problem: the dump process is relatively slow
How to obtain information about a dump
Display information about the involved volume:
root@larsson:~> zgetdump -d /dev/dasdb
'/dev/dasdb' is Version 0 dump device.
Dump size limit: none
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdb1
Dump device: /dev/dasdb1
Dump created on: Thu Oct 8 15:44:49 2009
Magic number: 0xa8190173618f23fd
Version number: 3
Header size: 4096
Page size: 4096
Dumped memory: 1073741824
Dumped pages: 262144
Real memory: 1073741824
cpu id: 0xff00012320978000
System Arch: s390x (ESAME)
Build Arch: s390x (ESAME)
>>> End of Dump header <<<
How to obtain information about a dump (cont'd)
To display information about the dump itself (the dump header) and check whether the dump is valid, use lcrash with the options '-i' and '-d':
root@larsson:~> lcrash -i -d /dev/dasdb1
Dump Type: s390 standalone dump
Machine: s390x (ESAME)
CPU ID: 0xff00012320978000
Memory Start: 0x0
Memory End: 0x40000000
Memory Size: 1073741824
Time of dump: Thu Oct 8 15:44:49 2009
Number of pages: 262144
Kernel page size: 4096
Version number: 3
Magic number: 0xa8190173618f23fd
Dump header size: 4096
Dump level: 0x4
Build arch: s390x (ESAME)
Time of dump end: Thu Oct 8 15:45:01 2009
End Marker found! Dump is valid!
Automatic dump on panic (SLES 10/11, RHEL 5/6): dumpconf
The dumpconf tool configures a dump device that is used for an automatic dump in case of a kernel panic.
– The command can be installed as a service script under /etc/init.d/dumpconf or can be called manually
– Start the service: # service dumpconf start
– It reads the configuration file /etc/sysconfig/dumpconf
– Example configuration for a CCW dump device (DASD) and re-IPL after dump:
ON_PANIC=dump_reipl
DUMP_TYPE=ccw
DEVICE=0.0.4711
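Such a configuration can be generated from a script. This is a sketch: writing to the real /etc/sysconfig/dumpconf requires root, so the target path is a parameter here, and the helper name is made up.

```shell
# write_dumpconf: generate dumpconf content for a CCW dump device
# (DASD) with re-IPL after the dump, as shown on the slide.
write_dumpconf() {
cat > "$1" <<EOF
ON_PANIC=dump_reipl
DUMP_TYPE=ccw
DEVICE=$2
EOF
}

cfg=$(mktemp)
write_dumpconf "$cfg" 0.0.4711
# On a real system, write to /etc/sysconfig/dumpconf and then run:
# service dumpconf start
```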
Automatic dump on panic (SLES 10/11, RHEL 5): dumpconf (cont'd)
– Example configuration for an FCP dump device (SCSI disk):
ON_PANIC=dump
DUMP_TYPE=fcp
DEVICE=0.0.4714
WWPN=0x5005076303004712
LUN=0x4047401300000000
BOOTPROG=0
BR_LBA=0
– Example configuration for re-IPL without taking a dump, if a kernel panic occurs:
ON_PANIC=reipl
– Example of executing a CP command, and rebooting from device 4711 if a kernel panic occurs:
ON_PANIC=vmcmd
VMCMD_1="MSG Starting VMDUMP"
VMCMD_2="VMDUMP"
VMCMD_3="IPL 4711"
Get the dump and send it to the service organization
DASD/Tape:
– Store the dump to the Linux file system from the dump device:
root@larsson:~> zgetdump /dev/<dump_device> > dump_file
– Alternative: lcrash (compression possible)
root@larsson:~> lcrash -d /dev/dasdxx -s <...>
SCSI:
– Get the dump from the filesystem
Additional files needed for dump analysis:
– SUSE (lcrash tool): /boot/System.map-xxx and /boot/Kerntypes-xxx
– Red Hat & SUSE (crash tool): vmlinux file (kernel with debug info) contained in the debug kernel rpms:
  • Red Hat: kernel-debuginfo-xxx.rpm and kernel-debuginfo-common-xxx.rpm
  • SUSE: kernel-default-debuginfo-xxx.rpm
Handling large dumps
Compress the dump and split it into parts of 1 GB:
root@larsson:~> zgetdump /dev/dasdc1 | gzip | split -b 1G
Several compressed files such as xaa, xab, xac, ... are created
Create md5 sums of the compressed files:
root@larsson:~> md5sum xa* > dump.md5
Upload all parts together with the md5 information
Verification of the parts by the receiver:
root@larsson:~> md5sum -c dump.md5
xaa: OK
[....]
Merge the parts and uncompress the dump:
root@larsson:~> cat xa* | gunzip -c > dump
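The whole split/verify/merge procedure above can be rehearsed on a scratch file instead of a real dump, using 1 MB parts instead of 1 GB so it runs anywhere:

```shell
# Round trip of the split/verify/merge procedure, demonstrated on a
# small random file standing in for a real dump.
workdir=$(mktemp -d)
cd "$workdir"
dd if=/dev/urandom of=dump.orig bs=1024 count=3072 2>/dev/null

gzip -c dump.orig | split -b 1M     # sender: compress and split (xaa, xab, ...)
md5sum xa* > dump.md5               # sender: checksums travel with the parts

md5sum -c dump.md5                  # receiver: verify each part
cat xa* | gunzip -c > dump.merged   # receiver: merge and uncompress
cmp dump.orig dump.merged && echo "round trip OK"
```

Rehearsing with small files confirms the part naming and the checksum handling before a multi-gigabyte upload is attempted.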
Transferring dumps
Transferring single-volume dumps with ssh:
root@larsson:~> zgetdump /dev/dasdc1 | ssh user@host "cat > dump_file_on_target_host"
Transferring multi-volume dumps with ssh:
root@larsson:~> zgetdump /dev/dasdc | ssh user@host "cat > multi_volume_dump_file_on_target_host"
Transferring a dump with ftp:
– Establish an ftp session with the target host, log in and set the transfer mode to binary
– Send the dump to the host:
ftp> put |"zgetdump /dev/dasdc1" <dump_file_on_target_host>
Dump tool summary
– DASD (stand-alone tool; VM & LPAR)
  • Preparation: zipl -d /dev/<device>
  • Creation: stop all CPUs, store status, IPL the dump device
  • Dump medium: ECKD or FBA DASD
  • Copy to filesystem: zgetdump /dev/<device> > dump_file
  • Viewing: lcrash or crash
– Tape (stand-alone tool; VM & LPAR)
  • Preparation: zipl -d /dev/<device>
  • Creation: stop all CPUs, store status, IPL the dump device
  • Dump medium: tape cartridges
  • Copy to filesystem: zgetdump /dev/<device> > dump_file
  • Viewing: lcrash or crash
– SCSI (stand-alone tool; VM & LPAR)
  • Preparation: mkdir /dumps/mydumps; zipl -D /dev/sda1 ...
  • Creation: stop all CPUs, store status, IPL the dump device
  • Dump medium: Linux file system on a SCSI disk
  • Copy to filesystem: --- (the dump is written to the filesystem directly)
  • Viewing: lcrash or crash
– VMDUMP (VM only)
  • Preparation: ---
  • Creation: vmdump
  • Dump medium: VM reader
  • Copy to filesystem: DUMPLOAD, ftp ..., vmconvert ...
  • Viewing: lcrash or crash
See the “Using the dump tools” book at
http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
Customer Cases
Availability: Guest spontaneously reboots

Configuration:
– Oracle RAC server or other HA solution under z/VM

Problem Description:
– Occasionally guests spontaneously reboot without any notification or console message

Tools used for problem determination:
– cp instruction trace of the (re)IPL code
– Crash dump taken after the trace was hit

[Diagram: Linux 1 and Linux 2, each running an Oracle RAC server, linked by HA cluster communication and sharing an Oracle RAC database]
Availability: Guest spontaneously reboots - Steps to find root cause

Question: Who rebooted the system?

Step 1
– Find the address of the (re)ipl code in the system map
– Use this address to set an instruction trace

cd /boot
grep machine_restart System.map-2.6.16.60-0.54.5-default
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
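The lookup-and-strip step above can be scripted: take the symbol address from the system map, drop the leading zeros, and append the length of the range to trace (".4" covers the 4 bytes at the symbol). A sketch, with a one-line stand-in file in place of the real /boot/System.map:

```shell
# Stand-in System.map entry (copied from the grep output above):
printf '000000000010c364 T machine_restart\n' > /tmp/System.map.sample

# Build the CP TRACE operand used in Step 2 from the symbol address:
addr=$(awk '$3 == "machine_restart" { sub(/^0+/, "", $1); print toupper($1) }' /tmp/System.map.sample)
echo "CP CPU ALL TR IN R ${addr}.4"
```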
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 2
– Set a CP instruction trace on the reboot address
– The system is halted at that address when a reboot is triggered

CP CPU ALL TR IN R 10C364.4
HCPTRI1027I An active trace set has turned RUN off
CP Q TR
NAME  INITIAL  (ACTIVE)
1 INSTR PSWA 0010C364-0010C367
  TERM NOPRINT NORUN SIM
  SKIP 00000 PASS 00000 STOP 00000 STEP 00000
  CMD NONE
> 000000000010C364' STMG EBCFF0780024 >> 000000003A557D48 CC 2
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 3
– Take a dump when the (re)ipl code is hit

cp cpu all stop
cp store status
Store complete.
cp i 4fc6
Tracing active at IPL
HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
zipl v1.6.3-0.24.5 dump tool (64 bit)
Dumping 64 bit OS
00000128 / 00001024 MB
......
00001024 / 00001024 MB
Dump successful
HCPIR450W CP entered, disabled wait PSW 00020000 80000000 00000000 00000000
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 4
– Save the dump in a file

zgetdump /dev/dasdb1 > dump_file
Dump device: /dev/dasdb1
>>> Dump header information <<<
...
>>> End of Dump header <<<
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)

Step 5
– Use (l)crash to find out which process triggered the reboot

STACK:
 0 start_kernel+950 [0x6a690e]
 1 _stext+32 [0x100020]
================================================================
TASK HAS CPU (1): 0x3f720650 (oprocd.bin):
LOWCORE INFO:
 psw      : 0x0704200180000000 0x000000000010c36a
 function : machine_restart+6
 prefix   : 0x3f438000
 cpu timer: 0x7fffffff 0xff9e6c00
 clock cmp: 0x00c6ca69 0x22337760
 general registers:
 ...

STACK:
 0 __handle_sysrq+248 [0x361240]
 1 write_sysrq_trigger+98 [0x2be796]
 2 sys_write+392 [0x225a68]
 3 sysc_noemu+16 [0x1179a8]

The restart was triggered from user space: /var/opt/oracle/product/crs/bin/oprocd.bin wrote to the sysrq trigger.
Availability: Guest spontaneously reboots (cont'd)

Problem Origin:
– An HA component erroneously detected a system hang; candidates include:
  • the hangcheck_timer module not receiving its timer IRQ
  • the z/VM 'time bomb' switch
  • the TSA monitor
– z/VM cannot guarantee 'real-time' behavior if overloaded
– Longest 'hang' observed: 37 seconds(!)

Solution:
– Offload the HA workload from the overloaded z/VM
  • e.g. use a separate z/VM
  • or: run large Oracle RAC guests in LPAR
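Where offloading is not immediately possible, the window of the hangcheck_timer component mentioned above can be widened so that short z/VM scheduling delays are tolerated rather than treated as a hang. Illustrative values only, written to a stand-in path here; on a real system the file would live under /etc/modprobe.d/, and the margin must stay within what the cluster software accepts:

```shell
# Stand-in for /etc/modprobe.d/ so the fragment can be shown being written:
mkdir -p /tmp/modprobe.d
cat > /tmp/modprobe.d/hangcheck-timer.conf <<'EOF'
options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
EOF
cat /tmp/modprobe.d/hangcheck-timer.conf
```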
Network: network connection is too slow

Configuration:
– z/VSE running CICS, connecting to DB2 on Linux on System z
– HiperSockets connection from Linux to z/VSE
– But also applies to HiperSockets connections between Linux and z/OS

Problem Description:
– When CICS transactions were monitored, some transactions took a couple of seconds instead of milliseconds

Tools used for problem determination:
– dbginfo.sh
– s390 debug feature
– sadc/sar
– CICS transaction monitor
Network: network connection is too slow (cont'd)

s390 debug feature
– Check for qeth errors:
cat /sys/kernel/debug/s390dbf/qeth_qerr
00 01282632346:099575 2 00 0000000180b20218  71 6f 75 74 65 72 72 00 | qouterr.
00 01282632346:099575 2 00 0000000180b20298  20 46 31 35 3d 31 30 00 |  F15=10.
00 01282632346:099576 2 00 0000000180b20318  20 46 31 34 3d 30 30 00 |  F14=00.
00 01282632346:099576 2 00 0000000180b20390  20 71 65 72 72 3d 41 46 |  qerr=AF
00 01282632346:099576 2 00 0000000180b20408  20 73 65 72 72 3d 32 00 |  serr=2.

dbginfo file
– Check the inbound buffer count:
cat /sys/devices/qeth/0.0.1e00/buffer_count
16

Problem Origin:
– Too few inbound buffers
Network: network connection is too slow (cont'd)

Solution:
– Increase the inbound buffer count (default: 16, max: 128)
– Check the actual buffer count with 'lsqeth -p'
– Set the inbound buffer count in the appropriate config file:
  • SUSE SLES10:
    in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200
    add QETH_OPTIONS="buffer_count=128"
  • SUSE SLES11:
    in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
    ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
  • Red Hat:
    in /etc/sysconfig/network-scripts/ifcfg-eth0
    add OPTIONS="buffer_count=128"
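The attribute can also be changed through sysfs without editing config files; the qeth device must be offline while the value is written. A sketch using device address 0.0.f200 from the examples above, with a stand-in directory in place of sysfs so the write can be demonstrated:

```shell
# Stand-in for /sys/devices/qeth/0.0.f200 so the attribute write can be shown:
mkdir -p /tmp/qeth-0.0.f200
echo 128 > /tmp/qeth-0.0.f200/buffer_count
cat /tmp/qeth-0.0.f200/buffer_count

# On the real system (as root, device offline while writing):
#   echo 0   > /sys/devices/qeth/0.0.f200/online
#   echo 128 > /sys/devices/qeth/0.0.f200/buffer_count
#   echo 1   > /sys/devices/qeth/0.0.f200/online
```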
High Disk response times

Configuration:
– z10, HDS storage server (HyperPAV enabled)
– z/VM, Linux with Oracle database
– VM-controlled minidisks attached to Linux, LVM on top

Problem description:
– I/O throughput not matching expectations
– Oracle database shows poor performance because of that
– One LVM volume showing significant stress

Tools used for problem determination:
– dbginfo.sh
– sadc/sar
– z/VM monitor data
High Disk response times (cont'd)

Observation in Linux (iostat -x; the last three columns are await, svctm and %util):
dm-9 0.00 0.00 49.75 0.00 19790.50 0.00 795.56 17.89 15.79 2.01 100.00
– Throughput: 19790.50 sectors/s read
– Response time: 15.79 ms average wait
– Utilization: 100%

Conclusion:
– PAV not being utilized
– No HyperPAV support in SLES10 SP2
– Static PAV not possible with the current setup (VM-controlled minidisks)
– Need to look for other ways to get more parallel I/O:
  • Link the same minidisk multiple times to a guest
  • Use smaller minidisks and increase striping in Linux
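The saturation in the dm-9 line can be spotted mechanically: the last field of each iostat -x device line is %util, so a one-liner can flag devices that are effectively saturated. A sketch using two sample lines from these measurements:

```shell
# Two sample lines from the observation: one saturated LVM device, one idle DASD.
cat > /tmp/iostat.sample <<'EOF'
dm-9 0.00 0.00 49.75 0.00 19790.50 0.00 795.56 17.89 15.79 2.01 100.00
dasda 0.00 0.00 0.00 1.00 0.00 3.98 8.00 0.01 10.00 5.00 0.50
EOF

# Flag anything above 90% utilization (last field = %util):
awk '$NF + 0 > 90 { print $1 " saturated at " $NF "% util" }' /tmp/iostat.sample
```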
High Disk response times (cont'd)

Initial and proposed setup (each setup maps physical disks to VM logical disks to Linux logical disks):
1. Initial setup
2. Link minidisks to the guest multiple times; combine them with multipath in Linux
3. Smaller disks, more stripes: striping via LVM / device mapper

[Diagram: the three setups side by side, showing physical disks, VM logical disks and Linux logical disks, with multipath over multiply-linked minidisks and LVM/device-mapper striping over smaller disks]
High Disk response times (cont'd)

New observation in Linux:
– Response times stay equal
– Throughput equal
– No PAV being used!
High Disk response times (cont'd)

Solution: check the PAV setup in VM
High Disk response times (cont'd)

Finally:
dasdaen 133.33 0.00 278.11 0.00 12298.51 0.00 88.44 0.79 2.83 1.48 41.29
dasdcbt 161.19 0.00 248.26 0.00 12260.70 0.00 98.77 0.91 3.47 1.88 46.77
dasdfwc 149.75 0.00 266.17 0.00 12374.13 0.00 92.98 1.88 7.07 2.54 67.66
dasdael 162.19 0.00 250.25 0.00 12483.58 0.00 99.77 1.90 7.57 2.86 71.64
dasddyz 134.83 0.00 277.61 0.00 12431.84 0.00 89.56 0.75 2.71 1.68 46.77
dasdaem 151.24 0.00 266.17 0.00 12595.02 0.00 94.64 2.01 7.61 2.82 75.12
dasdcbr 169.65 0.00 242.79 0.00 12386.07 0.00 102.03 1.72 7.05 2.83 68.66
dasdfwd 162.69 0.00 249.25 0.00 12348.26 0.00 99.08 1.92 7.70 2.83 70.65
dasddyy 157.21 0.00 259.70 0.00 12409.95 0.00 95.57 2.58 9.96 3.05 79.10
dasddyx 174.63 0.00 237.81 0.00 12374.13 0.00 104.07 1.76 7.38 2.93 69.65
dasdcbs 144.78 0.00 272.14 0.00 12264.68 0.00 90.14 2.53 9.31 2.89 78.61
dasda 0.00 0.00 0.00 1.00 0.00 3.98 8.00 0.01 10.00 5.00 0.50
dasdq 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdss 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdadx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.
dasdawh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdamk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdaek 160.70 0.00 255.22 0.00 12382.09 0.00 97.03 2.27 8.95 2.88 73.63
dasdcbq 148.76 0.00 265.67 0.00 12372.14 0.00 93.14 2.14 8.01 2.85 75.62
dasddyw 162.19 0.00 254.23 0.00 12384.08 0.00 97.42 2.12 8.40 2.90 73.63
dasdfwe 146.27 0.00 271.64 0.00 12419.90 0.00 91.44 2.63 9.71 2.80 76.12
dasdfwf 162.19 0.00 249.75 0.00 12455.72 0.00 99.75 0.71 2.83 1.79 44.78
dasdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-1 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-2 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-3 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.05 7.32 0.53 87.56
dm-4 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-5 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-6 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-7 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.06 7.32 0.53 87.56
dm-8 0.00 0.00 0.00 1.99 0.00 7.96 8.00 0.00 0.00 0.00 0.00
dm-9 0.00 0.00 497.51 0.00 197900.50 0.00 795.56 7.89 15.79 2.01 100.00
Bonding throughput not matching expectations

Configuration:
– SLES10 system, connected via an OSA card and using the bonding driver

Problem Description:
– Bonding reported as only working at 100 Mbps
– FTP also slow

Tools used for problem determination:
– dbginfo.sh, netperf

Problem Origin:
– ethtool cannot determine the line speed correctly because qeth does not report it

Solution:
– Ignore the 100 Mbps message, or upgrade to SLES11

bonding: bond1: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full
Availability: Unable to mount file system after LVM changes

Configuration:
– Linux HA cluster with two nodes
– Accessing the same DASDs, which are exported via OCFS2

Problem Description:
– Added one node to the cluster and brought the logical volume online
– Unable to mount the filesystem from any node after that

Tools used for problem determination:
– dbginfo.sh

[Diagram: Linux 1 and Linux 2 sharing an OCFS2 filesystem on a logical volume built from dasda through dasdf]
Availability: Unable to mount file system after LVM changes (cont'd)

Problem Origin:
– The LVM metadata was overwritten when adding the 3rd node
– e.g. superblock not found

Solution:
– Extract the metadata from a running node (/etc/lvm/backup) and write it to disk again

[Diagram: Linux 1, Linux 2 and the newly added Linux 3 on the shared OCFS2 logical volume; the {pv|vg|lv}create commands run when adding Linux 3 clobbered the on-disk metadata, leaving the disks scrambled]
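The recovery step relies on LVM's automatic metadata backups: every metadata change is archived on the node that made it, and vgcfgrestore can write a saved copy back to the physical volumes. A sketch, assuming a hypothetical volume group named vgdata; the paths and name are illustrative, and a stand-in directory replaces /etc/lvm/backup here:

```shell
# Stand-in for /etc/lvm/backup so the lookup step can be shown:
mkdir -p /tmp/lvm-backup
printf '# LVM metadata backup for vgdata\n' > /tmp/lvm-backup/vgdata
ls /tmp/lvm-backup

# On the surviving node (as root, with the volume group deactivated):
#   vgcfgrestore -f /etc/lvm/backup/vgdata vgdata   # write the metadata back to disk
#   vgchange -ay vgdata                             # reactivate the volume group
```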
Kernel panic: Low address protection

Configuration:
– z10 only
– High workload
– More likely the more multithreaded applications are running

Problem Description:
– Concurrent access to pages being removed from the page table

Tools used for problem determination:
– crash/lcrash

Problem Origin:
– Race condition in memory management in firmware

Solution:
– Upgrade to the latest firmware!
– Upgrade to the latest kernels – fix to be integrated in all supported distributions