HA for OpenStack: Connecting the dots
<strong>HA</strong> <strong>for</strong> <strong>OpenStack</strong>:<br />
<strong>Connecting</strong> <strong>the</strong> <strong>dots</strong><br />
Raghavan “Rags” Srinivas<br />
Rackspace<br />
<strong>OpenStack</strong> Meetup,<br />
Washington DC, Jan. 23rd, 2013
Rags<br />
• Solutions Architect at Rackspace for the OpenStack-based Rackspace Private Cloud<br />
• Speaker at JavaOne, RSA conferences, Sun Tech Days, JUGs, and other developer conferences<br />
• Trying to help make <strong>OpenStack</strong> more App Developer friendly
Agenda <br />
What is <strong>HA</strong>?<br />
<strong>HA</strong> of <strong>OpenStack</strong> APIs<br />
<strong>HA</strong> of RabbitMQ<br />
MySQL <strong>HA</strong><br />
A Peek into <strong>HA</strong> on Public Cloud<br />
Resources and Summary
<strong>OpenStack</strong> Design Tenets<br />
• Scalability and elasticity are our main goals<br />
• Any feature that limits our main goals must be optional<br />
• Everything should be asynchronous<br />
– If you can't do something asynchronously, see #2<br />
• All required components must be horizontally scalable<br />
• Always use a shared-nothing architecture (SN) or sharding<br />
– If you can't share nothing or shard, see #2<br />
• Distribute everything<br />
– Especially logic. Move logic to where state naturally exists.<br />
• Accept eventual consistency and use it where appropriate<br />
• Test everything<br />
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
What is <strong>HA</strong>?<br />
• Minimization of system downtime<br />
• Minimization of data/transaction loss<br />
• In the case of multiple (or interrelated) failures, minimizing data loss is preferred over minimizing system downtime<br />
<strong>HA</strong> as Nines Downtime/Year<br />
99% (two nines) 3.65 days<br />
99.9% 8.76 hours<br />
99.99% 52.56 minutes<br />
99.999% 5.26 minutes<br />
99.9999% (six nines) 31.5 seconds
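The downtime column follows directly from the availability percentage; a quick sketch of the arithmetic (assuming a 365-day year, as the table does):

```python
def downtime_per_year(availability: float) -> float:
    """Seconds of permitted downtime per year at a given availability."""
    year_seconds = 365 * 24 * 3600  # 365-day year, matching the table
    return (1.0 - availability) * year_seconds

# Reproduce the table, converting to convenient units
print(downtime_per_year(0.99) / 86400)    # two nines   -> 3.65 days
print(downtime_per_year(0.999) / 3600)    # three nines -> 8.76 hours
print(downtime_per_year(0.99999) / 60)    # five nines  -> 5.26 minutes
print(downtime_per_year(0.999999))        # six nines   -> 31.5 seconds
```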
Implementing <strong>HA</strong><br />
• Elimination of single points of failure (SPOFs)<br />
• Redundancy of network components such as switches and routers<br />
• Redundancy of applications and automatic service migrations<br />
• Redundancy of storage components<br />
• Redundancy of facilities services such as power, AC, etc.
Components (High Level)<br />
Client<br />
VIP<br />
NODE 1<br />
Replication Services<br />
Health Check<br />
Cluster Communication<br />
NODE 2<br />
Replication Services<br />
Health Check<br />
Cluster Communication
Concepts<br />
Stateless<br />
• There is no dependency between requests<br />
• No need for data replication/synchronization; a failed request may need to be restarted on a different node<br />
• Examples: Apache web server, Nova API, Nova Scheduler, etc.<br />
Stateful<br />
• An action typically comprises multiple requests<br />
• Data needs to be replicated and synchronized between redundant services (to preserve state and consistency)<br />
• Examples: MySQL, RabbitMQ, etc.
More Concepts<br />
• Failover: migration of a service from the “primary” to the “secondary”<br />
• Failback: migration of the service back to the “primary”<br />
• Switchover: a migration initiated manually
Still more concepts<br />
Active/Passive<br />
o There is a single master<br />
o Load balance stateless services using a VIP and a<br />
load balancer such as <strong>HA</strong>Proxy<br />
o For Stateful services a replacement resource can be<br />
brought online. A separate application monitors <strong>the</strong>se<br />
services, bringing <strong>the</strong> backup online as necessary<br />
o After a failover <strong>the</strong> system will encounter a speed<br />
bump since <strong>the</strong> passive node has to notice <strong>the</strong> fault<br />
in <strong>the</strong> active node and become active<br />
Active/Active<br />
o Multiple masters<br />
o Load balance stateless services using a VIP and a<br />
load balancer such as <strong>HA</strong>Proxy<br />
o Stateful Services are managed in such a way that<br />
services are redundant, and that all instances have<br />
an identical state<br />
o Updates to one instance of database would<br />
propagate to all o<strong>the</strong>r instances<br />
o After a failover <strong>the</strong> system will function in a<br />
degraded state
<strong>HA</strong> <strong>for</strong> <strong>OpenStack</strong><br />
• <strong>OpenStack</strong> APIs (nova, cinder, etc.)<br />
• RabbitMQ<br />
• MySQL<br />
• Cinder, Swift, and so on<br />
• Heat (still Work in Progress)<br />
• Application running on <strong>OpenStack</strong> (Application<br />
dependent)
<strong>HA</strong> on <strong>OpenStack</strong><br />
• Overall Philosophy (Don’t reinvent <strong>the</strong> wheel)<br />
• Leverage time-tested Linux utilities such as Keepalived, <strong>HA</strong>Proxy and Virtual IP<br />
(using VRRP)<br />
• Leverage Hardware Load Balancers<br />
• Leverage replication services <strong>for</strong> RabbitMQ/MySQL such as RabbitMQ<br />
Clustering, MySQL master-master replication, Corosync, Pacemaker, DRBD,<br />
Galera and so on
Keepalived<br />
• Based on the Linux Virtual Server (IPVS) kernel module, which provides layer 4 load balancing<br />
• Implements a set of checkers to maintain health and load balancing<br />
• <strong>HA</strong> is implemented using the VRRP protocol<br />
vrrp_script rabbitmq {<br />
    script "/usr/sbin/service rabbitmq-server status"  # check the service status<br />
    interval 5  # check every 5 seconds<br />
    weight -2   # reduce priority by 2 when the check fails<br />
    rise 2      # required number of successes for OK switch<br />
    fall 2      # required number of failures for KO switch<br />
}
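A vrrp_script like the one above is typically referenced from a vrrp_instance block, so that a failing check lowers the node's priority and moves the VIP to the peer. A hedged sketch (the interface name and priorities are illustrative; the VRID and VIP match the RabbitMQ diagram later in the deck):

```
vrrp_instance controller_vip {
    interface eth0             # illustrative interface name
    state MASTER               # the peer node is configured as BACKUP
    virtual_router_id 13       # VRID 13, as in the RabbitMQ HA diagram
    priority 101               # peer uses a lower value, e.g. 100
    virtual_ipaddress {
        192.168.236.199        # the shared VIP
    }
    track_script {
        rabbitmq               # the vrrp_script defined above
    }
}
```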
<strong>HA</strong>Proxy<br />
• Load balancing and proxying for HTTP and TCP applications<br />
• Handles many concurrent connections across multiple backends
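As a sketch of how a stateless OpenStack API might sit behind HAProxy (the listener name and backend addresses are illustrative; 8774 is the usual nova-api port):

```
listen nova_api
    bind 192.168.236.199:8774    # the shared VIP
    balance roundrobin
    option httpchk               # HTTP health check against each backend
    server controller1 192.168.236.11:8774 check inter 5s
    server controller2 192.168.236.12:8774 check inter 5s
```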
<strong>HA</strong> with Keepalived, VRRP &<br />
<strong>HA</strong>Proxy<br />
Application<br />
VRRP<br />
Network Layer<br />
Host1<br />
<strong>HA</strong>Proxy<br />
Keepalived<br />
Host2<br />
Backup<br />
Application Layer<br />
Realserver1<br />
Realserver2
<strong>HA</strong> on Rackspace Private<br />
Cloud<br />
INTERNET<br />
VIP(Keepalived, VRRP)<br />
<strong>HA</strong>Proxy<br />
Controller 1 Controller 2<br />
Active-Passive Infrastructure services<br />
(MySQL, Rabbit)<br />
Active-Active Infrastructure services<br />
(API services)<br />
Heartbeat<br />
Redundant Active-Passive<br />
Infrastructure services<br />
Redundant Active-Active<br />
Infrastructure services<br />
Compute Node 1 Compute Node 2<br />
Compute Node N<br />
VMs Instantiated
<strong>HA</strong> on Rackspace Private<br />
Cloud (switchover)<br />
INTERNET<br />
VIP(<strong>HA</strong>Proxy)<br />
Controller 1 Controller 2<br />
Active-Passive Infrastructure services<br />
(MySQL, Rabbit)<br />
Heartbeat<br />
Infrastructure services<br />
Compute Node 1 Compute Node 2<br />
Compute Node N<br />
VMs Instantiated
RabbitMQ <strong>HA</strong><br />
VRID 13<br />
192.168.236.199<br />
E<strong>the</strong>rnet<br />
Controller 1<br />
VRID 13<br />
IP address:<br />
192.168.236.11<br />
Master (Active)<br />
RabbitMQ<br />
RabbitMQ Clustering <br />
RabbitMQ<br />
Backup (Passive)<br />
Controller 2<br />
VRID 13<br />
IP address:<br />
192.168.236.12
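The clustering arrow in the diagram corresponds, roughly, to the standard RabbitMQ clustering commands. A hedged sketch (hostnames are illustrative, and the mirrored-queue policy is an assumption beyond the slide):

```
# On controller2: join the cluster formed by controller1
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@controller1
rabbitmqctl start_app

# Mirror all queues across cluster nodes (classic mirrored queues)
rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'
```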
MYSQL <strong>HA</strong>: MASTER/MASTER REPLICATION
MySQL – Master/Master<br />
Replication<br />
VRID 12<br />
192.168.236.198<br />
E<strong>the</strong>rnet<br />
Controller 1<br />
VRID 12<br />
IP address: 192.168.236.11<br />
Master (Active)<br />
MySQL<br />
Master/Master <br />
MySQL<br />
Backup (Passive)<br />
Controller 2<br />
VRID 12<br />
IP address:<br />
192.168.236.12
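Master/master replication is typically configured by making each node a replica of the other, with offset auto-increment values so the two masters never generate colliding keys. A minimal my.cnf sketch (server IDs and offsets are illustrative):

```
# controller1 (/etc/mysql/my.cnf)
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2   # two masters in the ring
auto_increment_offset    = 1   # controller2 uses server-id 2, offset 2
```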
MySQL – Master/Master<br />
Replication simplified
MYSQL <strong>HA</strong>: COROSYNC, PACEMAKER AND DRBD
Pacemaker, Corosync and DRBD<br />
Image from: http://dev.mysql.com/doc/refman/5.0/en/ha-drbd.html<br />
Pacemaker, Corosync, DRBD<br />
• Pacemaker: high-availability and load-balancing stack for the Linux platform; interacts with applications through Resource Agents (RA)<br />
• Corosync: Totem single-ring ordering and membership protocol; provides UDP- and InfiniBand-based messaging, quorum, and cluster membership to Pacemaker<br />
• DRBD: synchronizes data at the block-device level; used beneath a journaling file system (such as ext3 or ext4)
DRBD<br />
[Diagram: on each node, a service writes through the file system and buffer cache into DRBD; DRBD passes each write both to the local disk (disk scheduler, disk driver, disk) and, over TCP/IP through the NIC, to the peer node's DRBD device]
MYSQL <strong>HA</strong>: GALERA
Galera<br />
• Synchronous multi-master cluster technology for MySQL/InnoDB<br />
• MySQL patched for wsrep (Write Set REPlication)<br />
• Active/active multi-master topology<br />
• Read and write to any cluster node<br />
• True parallel replication, at row level<br />
• No slave lag or integrity issues<br />
[Diagram: clients open transparent connections to any of several DBMS nodes; each node runs the wsrep API on top of Galera replication]
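A minimal sketch of the wsrep settings that wire a MySQL node into a Galera cluster (cluster name, node addresses, and the SST method are illustrative):

```
[mysqld]
wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = "openstack_db"
wsrep_cluster_address = "gcomm://node1,node2,node3"  # at least 3 nodes
wsrep_sst_method      = rsync                        # state snapshot transfer
binlog_format         = ROW                          # wsrep requires row-based replication
```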
Multi-master replication<br />
• Based on optimistic concurrency control<br />
• If two transactions modify the same row on different nodes, one of the transactions will abort<br />
• The victim transaction gets a deadlock error<br />
• The application needs to handle this error
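A common way for the application to handle this is a retry wrapper around the transaction; a hedged Python sketch (DeadlockError stands in for whatever exception the database driver actually raises):

```python
import random
import time

class DeadlockError(Exception):
    """Stand-in for the driver's deadlock/conflict exception."""

def run_with_retry(txn, max_retries=3, base_delay=0.05):
    """Run a transaction callable, retrying on deadlock with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_retries:
                raise
            # jitter so two conflicting clients don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

For example, a transaction that is aborted once as the deadlock victim simply runs again on the next attempt and commits.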
Multi-master Replication<br />
read & write read & write read & write<br />
MySQL<br />
Multi-master cluster looks<br />
like one big database with<br />
multiple entry points
Multi-master conflicts<br />
write<br />
write<br />
MySQL MySQL MySQL<br />
GALERA REPLICATION
Multi-master conflicts<br />
write<br />
write<br />
MySQL MySQL MySQL<br />
Conflict detected<br />
GALERA REPLICATION
Multi-master conflicts<br />
OK write Deadlock<br />
error<br />
MySQL MySQL MySQL<br />
GALERA REPLICATION
<strong>OpenStack</strong> and Galera<br />
Image from http://www.severalnines.com/blog/clustering-mysql-backend-openstack
Galera on Rackspace Private<br />
Cloud/<strong>OpenStack</strong><br />
A How-To: OFFICIALLY UNSUPPORTED<br />
1. Install Rackspace Private Cloud on 2 controllers in HA mode (HAProxy, Keepalived, and VRRP are already installed)<br />
2. Install Galera (with wsrep) on 3 separate nodes<br />
3. mysqldump from the controller nodes to a Galera node<br />
4. Grant privileges to the OpenStack (nova, glance, etc.) and haproxy users<br />
5. Update the keepalived, haproxy, and OpenStack configuration files on the controller/compute nodes<br />
6. Stop/uninstall the MySQL services on the controller nodes and restart the controller nodes
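Steps 3 and 4 might look roughly like this (database names, hosts, and flags are illustrative; the exact commands depend on the deployment):

```
# Step 3, on a controller node: dump the OpenStack databases
mysqldump --single-transaction --all-databases > openstack.sql

# Step 3, on one Galera node: load the dump (Galera replicates it to the others)
mysql < openstack.sql

# Step 4: re-grant access for the OpenStack and haproxy users, e.g.
# GRANT ALL ON nova.* TO 'nova'@'%' IDENTIFIED BY '<password>';
```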
A PEEK INTO <strong>HA</strong> ON PUBLIC CLOUD
<strong>HA</strong> on <strong>the</strong> Public Cloud
<strong>HA</strong> methods<br />
• OpenStack APIs: no clustering/replication required (stateless); HA also serves as scale-out using HAProxy<br />
• RabbitMQ: RabbitMQ clustering, set up for single/multiple nodes<br />
• Heat: TBD; application dependent (no standard methods yet)<br />
• MySQL: many options, discussed on a later slide
<strong>HA</strong> methods <strong>for</strong> MySQL<br />
• Pacemaker/Corosync/DRBD (mirroring on block devices): well tested; more complex to set up; split-brain possibility<br />
• Keepalived/HAProxy/VRRP (works on MySQL master-master replication): simple to implement and understand; works for any storage system; master-master replication does not work beyond 2 nodes<br />
• Galera (based on write-set replication, wsrep): no slave lag; needs at least 3 nodes; deadlock errors on hotspot rows; relatively new<br />
• Others (MySQL Cluster, RHCS with DAS/SAN storage): some relatively new (GTID), some well tested; more complex setup
Resources<br />
• OpenStack HA guide<br />
• http://docs.openstack.org/high-availability-guide/content/ch-intro.html<br />
• https://wiki.ubuntu.com/ServerTeam/OpenStackHA<br />
• Other resources<br />
• http://www.rackspace.com/blog/implementing-high-availability-ha-for-rackspace-private-cloud/<br />
• http://www.rackspace.com/blog/high-availability-ha-with-galera-for-rackspace-private-cloud/<br />
• https://www.hastexo.com/<br />
• http://www.mysql.com/why-mysql/white-papers/mysql-high-availability-drbd-configuration-deployment-guide/<br />
• http://docwiki.cisco.com/wiki/OpenStack_Havana_Release:_High-Availability_Manual_Deployment_Guide<br />
• http://www.drbd.org/<br />
• http://www.codership.com/<br />
• http://www.severalnines.com/blog/clustering-mysql-backend-openstack<br />
• https://wiki.openstack.org/wiki/BasicDesignTenets<br />
• http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf
Book
Summary<br />
• In general, leverage existing methods of HA<br />
• There are several time-tested and more recent methods for implementing MySQL HA.<br />
• Rackspace Private Cloud provides Chef cookbooks and recipes for implementing HA via Keepalived, HAProxy, and VRRP.<br />
• Galera is gaining popularity; since it is active/active, it provides both scale-out and HA.<br />
• It takes only a few steps to get from Rackspace Private Cloud to MySQL with Galera (officially unsupported).<br />
• Corosync/Pacemaker/DRBD is recommended by Oracle/MySQL.<br />
• <strong>OpenStack</strong> <strong>HA</strong> guide goes through all <strong>the</strong>se options in more detail.
Thank you!<br />
Raghavan “Rags” Srinivas<br />
Solutions Architect<br />
Rackspace