10.07.2015 Views

Expert Oracle Exadata - Parent Directory



CHAPTER 15 COMPUTE NODE LAYOUT

With this many grid disks (2,688 in this case) visible to all ASM instances, it’s easy to see how they can be accidentally misallocated to the wrong ASM disk groups. To protect yourself from mistakes like that, you might want to consider using cell security to restrict the access of each ASM instance so that it only “sees” its own set of grid disks. For detailed steps on how to implement cell security, refer to Chapter 14.

RAC Clusters

Now that we’ve discussed how each compute node and storage cell can be configured in a fully independent fashion, let’s take a look at how they can be clustered together to provide high availability and horizontal scalability using RAC clusters. But before we do that, we’ll take a brief detour and establish what high availability and scalability are.

High availability (HA) is a fairly well understood concept, but it often gets confused with fault tolerance. In a truly fault-tolerant system, every component is redundant. If one component fails, another component takes over without any interruption to service. High availability also involves component redundancy, but failures may cause a brief interruption to service while the system reconfigures to use the redundant component. Work in progress during the interruption must be resubmitted or continued on the redundant component. The time it takes to detect a failure, reconfigure, and resume work varies greatly in HA systems. For example, active/passive Unix clusters have been used extensively to provide graceful failover in the event of a server crash. Now, you might chuckle to yourself when you see the words “graceful failover” and “crash” used in the same sentence (unless you work in the airline industry), so let me explain.
Graceful failover in the context of active/passive clusters means that when a system failure occurs, or a critical component fails, the resources that make up the application, database, and infrastructure are shut down on the primary system and brought back online on the redundant system automatically with as little downtime as possible. The alternative, and somewhat less graceful, type of failover would involve a phone call to your support staff at 3:30 in the morning. In active/passive clusters, the database and possibly other applications only run on one node at a time. Failover in this configuration can take several minutes to complete, depending on what resources and applications must be migrated. Oracle RAC uses an active/active cluster architecture. Failover on a RAC system commonly takes less than a minute to complete. True fault tolerance is generally very difficult and much more expensive to implement than high availability. The type of system and the impact (or cost) of a failure usually dictate which is more appropriate. Critical systems on an airliner, space station, or a life support system easily justify a fault-tolerant architecture. By contrast, a web application servicing the company’s retail storefront usually cannot justify the cost and complexity of a fully fault-tolerant architecture. Exadata is a high-availability architecture providing fully redundant hardware components. When Oracle RAC is used, this redundancy and fast failover are extended to the database tier.

When CPU, memory, or I/O resource limits for a single server are reached, additional servers must be added to increase capacity. The term “scalability” is often used synonymously with performance. That is, increasing capacity equals increasing performance. But the correlation between capacity and performance is not a direct one. Take, for example, a single-threaded, CPU-intensive program that takes 15 minutes to complete on a two-CPU server.
Assuming the server isn’t CPU-bound, installing two more CPUs is not going to make the process run any faster. If it can only run on one CPU at a time, it will only execute as fast as one CPU can process it. Performance will only improve if adding more CPUs allows a process to have more uninterrupted time on the processor. Neither will it run any faster if we run it on a four-node cluster. As the old saying goes, nine pregnant women cannot make a baby in one month. However, scaling out to four servers could mean that we can run four copies of our program concurrently, and get roughly four times the amount of work done in the same 15 minutes. To sum it up, scaling out adds capacity to your system. Whether or not it improves performance depends on how scalable your application is, and how heavily loaded your current system is. Keep in mind that Oracle
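
The distinction between capacity and performance in the 15-minute example can be illustrated with a small model. This is only a sketch under simplifying assumptions that the text does not state: every job is independent and single-threaded, a "worker" stands for one CPU or one cluster node, and there is no contention or coordination overhead. The function names are hypothetical and exist only for this illustration.

```python
import math


def elapsed_minutes(jobs: int, job_minutes: int, workers: int) -> int:
    """Wall-clock minutes to finish `jobs` independent single-threaded jobs.

    Assumption: each job occupies exactly one worker for `job_minutes`.
    Extra workers cannot speed up an individual job; they only allow
    more jobs to run at the same time.
    """
    waves = math.ceil(jobs / workers)  # batches of jobs run back to back
    return waves * job_minutes


def jobs_completed(window_minutes: int, job_minutes: int, workers: int) -> int:
    """Number of jobs finished within a fixed time window (idealized)."""
    return workers * (window_minutes // job_minutes)


if __name__ == "__main__":
    # One job: adding CPUs or nodes does not shorten its run time.
    for workers in (1, 2, 4):
        print(workers, "worker(s), 1 job:", elapsed_minutes(1, 15, workers), "min")

    # Same 15-minute window: more workers means more jobs completed.
    for workers in (1, 4):
        print(workers, "worker(s), 15 min:", jobs_completed(15, 15, workers), "job(s)")
```

In this model the single job always takes 15 minutes regardless of worker count, while four workers complete four jobs in the same window. Real systems will fall short of this ideal whenever the copies contend for shared resources.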
