29.11.2014 Views

HA for OpenStack: Connecting the dots


HA for OpenStack:
Connecting the dots
Raghavan “Rags” Srinivas
Rackspace
OpenStack Meetup, Washington DC on Jan. 23rd 2013


Rags
• Solutions Architect at Rackspace for OpenStack-based Rackspace Private Cloud
• Speaker at JavaOne, RSA conferences, Sun Tech Days, JUGs and other developer conferences
• Trying to help make OpenStack more App Developer friendly


Agenda
What is HA?
HA of OpenStack APIs
HA of RabbitMQ
MySQL HA
A Peek into HA on Public Cloud
Resources and Summary


OpenStack Design Tenets
1. Scalability and elasticity are our main goals
2. Any feature that limits our main goals must be optional
3. Everything should be asynchronous
   a) If you can't do something asynchronously, see #2
4. All required components must be horizontally scalable
5. Always use shared nothing architecture (SN) or sharding
   a) If you can't share nothing/shard, see #2
6. Distribute everything
   a) Especially logic. Move logic to where state naturally exists.
7. Accept eventual consistency and use it where it is appropriate.
8. Test everything

RACKSPACE® HOSTING | WWW.RACKSPACE.COM


What is HA?
• Minimization of system downtime
• Minimization of data/transaction loss
• In case of multiple (or interrelated) failures, minimization of data loss is preferred over minimization of system downtime

HA as Nines             Downtime/Year
99% (two nines)         3.65 days
99.9%                   8.76 hours
99.99%                  52.56 minutes
99.999%                 5.26 minutes
99.9999% (six nines)    31.5 seconds
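The downtime figures in the nines table follow directly from the availability percentage. The sketch below reproduces that arithmetic, assuming a 365-day year; the function and constant names are illustrative only and do not come from the slides.

```python
# Allowed downtime per year for a given availability percentage.
# Assumes a 365-day year; names here are illustrative.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR


if __name__ == "__main__":
    for percent in (99.0, 99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_seconds_per_year(percent)
        print(f"{percent}% -> {seconds:.1f} s ({seconds / 3600:.2f} h)")
```

Each additional nine divides the downtime budget by ten, which is why the step from five to six nines leaves only about half a minute per year.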


Implementing HA
• Elimination of Single Points of Failure (SPOFs)
• Redundancy of network components such as switches and routers
• Redundancy of applications and automatic service migrations
• Redundancy of storage components
• Redundancy of facilities services such as power, AC, etc.


Components (High Level)
[Diagram: a client connects through a VIP to Node 1 and Node 2. Each node is labeled with Replication Services, Health Check, and Cluster Communication.]


Concepts

State: Stateless
Description:
• There is no dependency between requests
• No need for data replication/synchronization. Failed request may need to be restarted on a different node.
Example: Apache web server, Nova API, Nova Scheduler, etc.

State: Stateful
Description:
• An action typically comprises multiple requests
• Data needs to be replicated and synchronized between redundant services (to preserve state and consistency)
Example: MySQL, RabbitMQ, etc.


More Concepts

Terminology: Description
Failover: Migration of a service from the “primary” to the “secondary”
Failback: Migration of service back to the “primary”
Switchover: Migration is initiated manually


Much more concepts

Active/Passive
o There is a single master
o Load balance stateless services using a VIP and a load balancer such as HAProxy
o For stateful services a replacement resource can be brought online. A separate application monitors these services, bringing the backup online as necessary
o After a failover the system will encounter a speed bump since the passive node has to notice the fault in the active node and become active

Active/Active
o Multiple masters
o Load balance stateless services using a VIP and a load balancer such as HAProxy
o Stateful services are managed in such a way that services are redundant, and that all instances have an identical state
o Updates to one instance of database would propagate to all other instances
o After a failover the system will function in a degraded state
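To make the "degraded state" point concrete, here is a minimal sketch of load balancing stateless services across several nodes. It is not HAProxy's algorithm or configuration; the class name, method names, and the backend names in the usage note are assumptions made for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        """Record a failed health check for a backend."""
        self.healthy.discard(backend)

    def mark_up(self, backend):
        """Return a known backend to the rotation."""
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        # One full pass over the ring is enough to find a healthy node.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

With two hypothetical backends, `controller1` and `controller2`, requests alternate between them. After `mark_down("controller1")` every request lands on `controller2`: the service stays up but runs with half its capacity, which is the degraded state described for Active/Active.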


HA for OpenStack
• OpenStack APIs (nova, cinder, etc.)
• RabbitMQ
• MySQL
• Cinder, Swift, and so on
• Heat (still Work in Progress)
• Application running on OpenStack (Application dependent)




HA on OpenStack
• Overall Philosophy (Don’t reinvent the wheel)
• Leverage time-tested Linux utilities such as Keepalived, HAProxy and Virtual IP (using VRRP)
• Leverage Hardware Load Balancers
• Leverage replication services for RabbitMQ/MySQL such as RabbitMQ Clustering, MySQL master-master replication, Corosync, Pacemaker, DRBD, Galera and so on


Keepalived
• Based on Linux Virtual Server (IPVS) kernel module providing layer 4 Load Balancing
• Implements a set of checkers to maintain health and Load Balancing
• HA is implemented using VRRP Protocol

    vrrp_script rabbitmq {
        script "/usr/sbin/service rabbitmq-server status"  # Check the service status
        interval 5   # check every 5 seconds
        weight -2    # adjust priority by -2 if KO
        fall 2       # required number of failures for KO switch
        rise 2       # required number of successes for OK switch
    }
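The interaction of the interval, weight, rise and fall settings is easier to see in a small model. The sketch below is a simplified illustration of the threshold behaviour, not Keepalived's implementation: it assumes positive rise and fall counts, models only the negative-weight case, and uses a hypothetical base priority of 100.

```python
from dataclasses import dataclass


@dataclass
class ScriptTracker:
    """Simplified model of a tracked health-check script (illustrative only)."""

    base_priority: int = 100  # hypothetical VRRP priority of this node
    weight: int = -2          # priority adjustment while the check is failed
    rise: int = 2             # consecutive successes needed to become healthy
    fall: int = 2             # consecutive failures needed to become failed
    healthy: bool = True
    streak: int = 0           # consecutive results contradicting current state

    def record(self, check_ok: bool) -> None:
        """Feed in one check result (one call per check interval)."""
        if check_ok == self.healthy:
            self.streak = 0
            return
        self.streak += 1
        needed = self.rise if check_ok else self.fall
        if self.streak >= needed:
            self.healthy = check_ok
            self.streak = 0

    @property
    def effective_priority(self) -> int:
        """Priority advertised to peers; lowered while the check is failed."""
        if self.weight < 0 and not self.healthy:
            return self.base_priority + self.weight
        return self.base_priority
```

A single failed check does not change anything; only the second consecutive failure lowers the priority, at which point a peer with a higher priority can take over the virtual IP. The same hysteresis applies on recovery, which avoids flapping when a service is briefly unresponsive.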


HAProxy
• Load Balancing and Proxying for HTTP and TCP Applications
• Works over multiple connections


HA with Keepalived, VRRP & HAProxy
[Diagram labels: Application; VRRP; Network Layer; Host1 (HAProxy, Keepalived); Host2 (Backup); Application Layer; Realserver1; Realserver2.]


HA on Rackspace Private Cloud
[Diagram: traffic from the Internet reaches a VIP (Keepalived, VRRP) and HAProxy in front of Controller 1 and Controller 2, which are linked by a heartbeat. Labels on the controllers: Active-Passive Infrastructure services (MySQL, Rabbit); Active-Active Infrastructure services (API services); Redundant Active-Passive Infrastructure services; Redundant Active-Active Infrastructure services. Below them: Compute Node 1, Compute Node 2, Compute Node N, with VMs instantiated.]


HA on Rackspace Private Cloud (switchover)
[Diagram: traffic from the Internet reaches a VIP (HAProxy) in front of Controller 1 and Controller 2, which are linked by a heartbeat. Labels on the controllers: Active-Passive Infrastructure services (MySQL, Rabbit); Infrastructure services. Below them: Compute Node 1, Compute Node 2, Compute Node N, with VMs instantiated.]




RabbitMQ HA
[Diagram: two controllers on the same Ethernet share VRID 13 with address 192.168.236.199. Controller 1 (VRID 13, IP address 192.168.236.11) runs RabbitMQ as Master (Active); Controller 2 (VRID 13, IP address 192.168.236.12) runs RabbitMQ as Backup (Passive). The two RabbitMQ instances are connected by RabbitMQ Clustering.]




MYSQL HA: MASTER/MASTER REPLICATION


MySQL – Master/Master Replication
[Diagram: two controllers on the same Ethernet share VRID 12 with address 192.168.236.198. Controller 1 (VRID 12, IP address 192.168.236.11) runs MySQL as Master (Active); Controller 2 (VRID 12, IP address 192.168.236.12) runs MySQL as Backup (Passive). The two MySQL instances are connected by Master/Master replication.]


MySQL – Master/Master Replication simplified


MYSQL HA: COROSYNC, PACEMAKER AND DRBD


Pacemaker, Corosync and DRBD
Image from: http://dev.mysql.com/doc/refman/5.0/en/ha-drbd.html



Pacemaker, Corosync, DRBD

Pacemaker
• High availability and load balancing stack for the Linux platform
• Interacts with applications through Resource Agents (RA)

Corosync
• Totem single-ring ordering and membership protocol
• UDP and InfiniBand based messaging, quorum, and cluster membership to Pacemaker

DRBD
• Synchronizes data at the block device
• Uses a journaling system (such as ext3 or ext4)


DRBD
[Diagram: two nodes with mirrored I/O stacks. Labels on each node: Service; File System; Buffer Cache; DRBD; Raw Device; TCP/IP; Disk Sched; Disk Driver; NIC Driver; Disk; NIC.]


MYSQL HA: GALERA


Galera
• Synchronous multi-master cluster technology for MySQL/InnoDB
• MySQL patched for wsrep (Write Set REPlication)
• Active/active multi-master topology
• Read and write to any cluster node
• True parallel replication, in row level
• No slave lag or integrity issues
[Diagram: clients reach three DBMS nodes through transparent connections; each DBMS sits on the wsrep API, and all three are joined by Galera Replication.]


Multi-master replication
• Based on Optimistic Concurrency Control
• In case of two transactions modifying the same row on different nodes, one of the transactions will abort
• Victim transaction will get Deadlock Error
• Application needs to handle this error
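One common way for an application to handle the deadlock error is to retry the whole transaction. MySQL reports a deadlock as error 1213 (ER_LOCK_DEADLOCK); how that surfaces depends on the database driver, so the sketch below uses a hypothetical `DeadlockError` exception as a stand-in, and `run_with_retry` is an illustrative helper rather than part of any library.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run `transaction` (a callable) and retry it if it is chosen as the victim.

    The callable must perform the complete transaction, from BEGIN to COMMIT,
    because the aborted attempt has been rolled back in full.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Back off with a little jitter so competing writers spread out.
            sleep(base_delay * attempt + random.uniform(0, base_delay))
```

Retrying is safe only if the transaction has no side effects outside the database. Frequent deadlocks on the same rows usually point to a hotspot that is better handled by routing those writes to a single node.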


Multi-master Replication
[Diagram: three clients each read and write to the MySQL cluster.]
Multi-master cluster looks like one big database with multiple entry points


Multi-master conflicts
[Diagram, step 1: two clients write to different MySQL nodes of a three-node cluster joined by Galera Replication.]


Multi-master conflicts
[Diagram, step 2: a conflict between the two writes is detected by Galera Replication.]


Multi-master conflicts
[Diagram, step 3: one write returns OK; the other write returns a Deadlock error.]


OpenStack and Galera
Image from http://www.severalnines.com/blog/clustering-mysql-backend-openstack


Galera on Rackspace Private Cloud/OpenStack
A How To: OFFICIALLY UNSUPPORTED
1. Install Rackspace Private Cloud on 2 controllers in HA mode (HAProxy, Keepalived and VRRP are already installed)
2. Install Galera (with wsrep) on 3 separate nodes
3. Mysqldump from the controller nodes to a Galera node
4. Grant privileges to the OpenStack (nova, glance, etc.) and haproxy users
5. Update the keepalived, haproxy and OpenStack configuration files on the controller/compute nodes
6. Stop/uninstall the MySQL services on the controller nodes and restart the controller nodes




A PEEK INTO HA ON PUBLIC CLOUD


HA on the Public Cloud




HA methods

Infrastructure: OpenStack APIs
Clustering/Replication Technique: None required (Stateless)
Characteristics:
• HA also serves as scale out using HAProxy

Infrastructure: RabbitMQ
Clustering/Replication Technique: RabbitMQ Clustering
Characteristics:
• RabbitMQ Clustering is set up for single/multiple nodes

Infrastructure: Heat
Clustering/Replication Technique: TBD
Characteristics:
• Application dependent (no standard methods yet).

Infrastructure: MySQL
Clustering/Replication Technique: Many
Characteristics:
• Discussed on a later slide


HA methods for MySQL

Clustering Method: Pacemaker/Corosync/DRBD
Replication Technique: Mirroring on Block Devices
Characteristics:
• Well tested, more complex to set up.
• Split brain possibility

Clustering Method: Keepalived/HAProxy/VRRP
Replication Technique: Works on MySQL master-master replication
Characteristics:
• Simple to implement and understand.
• Works for any storage system.
• Master-master replication does not work beyond 2 nodes.

Clustering Method: Galera
Replication Technique: Based on write-set Replication (wsrep)
Characteristics:
• No slave lag
• Needs at least 3 nodes
• Deadlock errors on hotspot rows.
• Relatively new

Clustering Method: Others
Replication Technique: MySQL Cluster, RHCS with DAS/SAN Storage
Characteristics:
• Some relatively new (GTID)
• Some well tested
• More complex setup
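The "needs at least 3 nodes" entry for Galera can be explained by majority quorum: the cluster keeps accepting writes only while more than half of its members can still see each other. The sketch below works through that arithmetic under the assumption that every node carries equal weight; the function names are illustrative.

```python
def quorum_size(nodes: int) -> int:
    """Smallest group of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, "nodes: quorum", quorum_size(n), "tolerates", tolerated_failures(n))
```

With two nodes the quorum is two, so losing either node (or the link between them) leaves no majority. Three nodes is the smallest cluster that survives a single failure, and an even node count adds no extra fault tolerance over the odd count below it.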


Resources
• OpenStack HA guide
  • http://docs.openstack.org/high-availability-guide/content/ch-intro.html
  • https://wiki.ubuntu.com/ServerTeam/OpenStackHA
• Other Resources
  • http://www.rackspace.com/blog/implementing-high-availability-ha-for-rackspace-private-cloud/
  • http://www.rackspace.com/blog/high-availability-ha-with-galera-for-rackspace-private-cloud/
  • https://www.hastexo.com/
  • http://www.mysql.com/why-mysql/white-papers/mysql-high-availability-drbd-configuration-deployment-guide/
  • http://docwiki.cisco.com/wiki/OpenStack_Havana_Release:_High-Availability_Manual_Deployment_Guide
  • http://www.drbd.org/
  • http://www.codership.com/
  • http://www.severalnines.com/blog/clustering-mysql-backend-openstack
  • https://wiki.openstack.org/wiki/BasicDesignTenets
  • http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf


Book


Summary
• In general, leverage existing methods of HA.
• There are several time-tested and more recent methods for implementing MySQL HA.
• Rackspace Private Cloud provides Chef cookbooks and recipes for implementing HA via Keepalived, HAProxy and VRRP.
• Galera is gaining more popularity. Since it’s Active/Active it does scale out and is HA.
• It takes only a few steps to get from Rackspace Private Cloud to MySQL with Galera (officially unsupported).
• Corosync/Pacemaker/DRBD is recommended by Oracle/MySQL.
• The OpenStack HA guide goes through all these options in more detail.


Thank you!
Raghavan “Rags” Srinivas
Solutions Architect
Rackspace
