10.07.2015 Views

Expert Oracle Exadata - Parent Directory



CHAPTER 12 MONITORING EXADATA PERFORMANCE

One topic deserves some elaboration: the CPU usage monitoring, both in the database and cells.

The data warehouses and reporting systems, unlike OLTP databases, don’t usually require very quick response times for user queries. Of course, the faster a query completes, the better, but in a DW, people don’t really notice if a query ran in 35 seconds instead of 30 seconds. But a typical OLTP user would definitely notice if their 1-second query took 6 seconds occasionally. That’s one of the reasons why in OLTP servers you would not want to constantly run at 100% CPU utilization. You cannot do that and also maintain stable performance. In an OLTP system you must leave some headroom. In DW servers, however, the small fluctuations in performance would not be noticed, and you can afford to run at 100% of CPU utilization in order to get the most out of your investment.

However, Exadata complicates things. In addition to having multiple database nodes, you also have another whole layer of servers: the cells. Things get interesting especially when running Smart Scans with high parallelism against EHCC tables. That’s because offloaded decompression requires a lot of CPU cycles in the cells. Thus it is possible that for some workloads your cells’ CPUs will be 100% busy and unable to feed data back to the database layer fast enough. The database layer CPUs may be half idle, while cells could really use some extra CPU capacity.

The risk from cell utilization reaching 100% is the reason Oracle made cellsrv able to skip offload processing for some datablocks and pass these blocks straight back to the database (starting in cellsrv 11.2.2.3.0). The cell checks whether its CPU utilization is over 90% and whether the database CPU utilization (it’s sent in based on resource manager stats from the database) is lower than that. And if so, some blocks are not processed in the cells, but passed through to the database directly.
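
The two-part check described above can be pictured with a small sketch. This is not cellsrv code; it only restates the two conditions from the text in Python. The names `CpuSnapshot` and `may_skip_offload` are hypothetical, and the sketch assumes that "lower than that" means lower than the cell's own utilization. How cellsrv decides which blocks to pass through is not described here and is not modeled.

```python
from dataclasses import dataclass

# Threshold from the text: the cell checks whether its CPU utilization is over 90%.
CELL_CPU_THRESHOLD_PCT = 90.0


@dataclass
class CpuSnapshot:
    """Hypothetical container for the two utilization figures the text mentions."""
    cell_cpu_pct: float  # CPU utilization measured on the storage cell
    db_cpu_pct: float    # database CPU utilization as reported to the cell


def may_skip_offload(snap: CpuSnapshot) -> bool:
    """Return True when the pass-through conditions from the text both hold.

    Assumption: "lower than that" is read as "lower than the cell's own
    utilization". Block selection is deliberately left out.
    """
    cell_is_overloaded = snap.cell_cpu_pct > CELL_CPU_THRESHOLD_PCT
    db_has_more_headroom = snap.db_cpu_pct < snap.cell_cpu_pct
    return cell_is_overloaded and db_has_more_headroom
```

With a cell at 95% and a database node at 50%, the check returns True and some blocks would be eligible for pass-through; with the cell at 85% it returns False regardless of the database figure.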
The database then will decrypt (if needed) and decompress the blocks and perform projection and filtering in the database layer. This allows you to fully utilize all your CPU capacity in both layers of the Exadata cluster. However, this automatically means that if some blocks are suddenly processed in the database layer (instead of being offloaded to cells), then you may see unexpected CPU utilization spikes in the database layer, when cells are too busy. This shouldn’t be a problem on most DW systems, especially with properly configured resource manager, but you would want to watch out for this when running OLTP or other low-latency systems on Exadata.

As usual, Oracle provides good metrics about this pass-through feature. Whenever the offload processing is skipped for some blocks during a Smart Scan and these blocks are sent back to the database layer for processing, the statistic “cell physical IO bytes pushed back due to excessive CPU on cell” gets incremented in V$SESSTAT/V$SYSSTAT and AWR reports. Read more about this feature and statistic in Chapter 11.

Monitoring Exadata Storage Cell OS-level Metrics

Oracle Database and the cellsrv software do a good job of gathering various performance metrics, but there are still cases where you would want to use an OS tool instead. One of the reasons is that usually the V$ views in Oracle tell you what Oracle thinks it’s doing, but this may not necessarily be what’s really happening if you hit a bug or some other limitation of Oracle’s built-in instrumentation. One of the limitations is low-level I/O measurement.

Monitoring the Storage Cell Server’s I/O Metrics with iostat

Both Oracle Database and the storage cells do measure the I/O Completion Time; in other words, the response time of I/Os.
With synchronous I/O operations, Oracle’s I/O wait time is merely the system call (like pread) completion time. With asynchronous I/O, the response time measurement is trickier, as an asynchronous I/O submit system call will not block and wait; it will return immediately in microseconds and some I/O reaping system call will be executed later, which will mark the I/O operation as complete. The Exadata storage cells keep track of each asynchronous I/O request (for example, when it was
