CHAPTER 12: MONITORING EXADATA PERFORMANCE

Kevin Says: I'm often surprised at the lack of focus paid to data placement on physical disks. The authors are correct to point out that so-called short-stroked I/O requests have some, if not all, seek time removed from the service time of the request. However, I'd like to elaborate on this point further. When discussing Exadata we often concentrate on data capacity of enormous proportion. However, it is seldom the case that all data is active in all queries. Most data warehouses have an active subset of the total database. One can easily envision the active portion of a 100TB data warehouse being as small as 1TB. Just one percent? How can this be? Always remember that Oracle Database, combined with Exadata, offers many "lines of defense" in front of the dreaded physical I/O. Before physical I/O there is a results cache miss, SGA buffer cache miss (yes, it has purpose even when Exadata is involved), partition elimination, and finally a storage index miss. Once through these "lines of defense," there is a physical I/O for data that is potentially compressed with a compression ratio of, say, 6:1. To that end, the 1TB active portion actually represents 6TB of data, a sizable amount.

Seek times are the lion's share of physical I/O service time. In fact, from a current cost perspective, the optimal data warehouse architecture would consist of seek-free rotational hard disk drives. While that may seem absurd, allow me to point out that 1TB is almost exactly 1 percent of the aggregate surface area of the 168 600GB SAS drives offered in a full rack X2 model with high-performance drives (168 x 600GB is roughly 100TB). Long seeks are not a fixed cost in modern data warehouses, especially when the active portion is such a small percentage of the whole. If the active portion of the data warehouse remains a fixed percentage, yet service times increase over time, the problem is likely fragmentation. Data scattered over the entirety of round, spinning magnetic media is not optimal, regardless of whether Oracle Database is connected to conventional storage or Exadata. Even with Exadata, the fundamentals still apply.

The whole point of this explanation so far is that it is possible to know what the ideal disk service times should be for Exadata cell disk drives; if the observed service times are consistently much higher than that, there is some problem with the storage hardware (disks or controllers). With SAN storage, high OS-level service times could also mean that there is queuing going on inside the SAN network or the storage array (for example, if a thousand other servers are hammering the same storage array with I/O). But as noted above, Exadata storage cells have dedicated storage; only the current cell OS can access it, and all I/Os are visible in iostat.
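As a minimal sketch of what to look at (the exact column set depends on your sysstat version; the example is generic Linux, not taken from the book), extended iostat output lets you separate time a request spent queuing inside the OS from time the device itself took to service it:

    # Run on the storage cell. -x = extended statistics, -m = megabytes,
    # 5 = refresh interval in seconds.
    iostat -xm 5

    # Columns worth watching for each cell disk device:
    #   svctm    - approximate time the device took to service one request
    #              (compare against the ideal service time discussed above)
    #   await    - total request latency: OS queue wait time plus service time
    #   avgqu-sz - average length of the OS-level I/O queue
    #   %util    - how busy the device was during the interval
    # If svctm is consistently far above the ideal, suspect the disks or
    # controllers; if await is much larger than svctm, requests are queuing
    # inside the host before they ever reach the device.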
So, what if the service time is OK, but Oracle (cellsrv) still sees bad I/O performance? This may mean there is queuing going on inside the cell's Linux OS, and iostat can show this information, too. You can have only a limited number of outstanding I/Os in the "on-the-fly" state against your storage controller LUNs. The storage controller needs to keep track of each outstanding I/O (for example, it has to remember where in host RAM to write the block once it is read from disk and arrives at the controller), and these I/O slots are limited. So the Linux kernel does not allow sending out more I/Os than the storage controller can handle; otherwise a SCSI reset would occur. These throttled I/Os have to wait in the OS disk device I/O queue, and they wait there uselessly; they have not even been sent to the storage controller yet. Only when a previously outstanding I/O operation completes will the first item in the queue be sent to the disks (assuming that the I/O latency deadline hasn't been reached for some request; cellsrv uses the Linux "deadline" I/O scheduler).
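To see this throttling mechanism for yourself on a Linux host, you can inspect sysfs directly. This is a generic Linux sketch rather than an Exadata-specific procedure, and sda is just an example device name:

    # The active I/O scheduler is shown in square brackets; on an Exadata
    # cell this should report "deadline".
    cat /sys/block/sda/queue/scheduler

    # Maximum number of I/Os the storage controller will accept "on the fly"
    # for this device; the kernel throttles anything beyond this.
    cat /sys/block/sda/device/queue_depth

    # Size of the OS-level request queue where throttled I/Os wait before
    # they are even sent to the controller.
    cat /sys/block/sda/queue/nr_requests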
