
Migrate your Linux application to the Amazon cloud, Part 1: Initial migration

How to migrate your application into the cloud

Skill Level: Intermediate

Sean Walberg (sean@ertw.com)
Network Engineer

13 Jul 2010

Cloud computing and Infrastructure as a Service (IaaS) are well documented, but what's often not discussed is how to get a running application into a cloud environment. Discover how to move an application into the cloud and take advantage of the features this setup has to offer.

Read more by Sean: Browse all of Sean's articles on developerWorks.

Infrastructure as a Service (IaaS) is a great concept: You use computing resources; you pay for them. You want more computing power; you pay more. The downside of this model is that you're working with computers that you'll never see or know much about. Once you get over that, however, there's a lot to be gained by using IaaS.

Because the IaaS model is so different from the traditional model of buying servers, the way you manage your virtual computers changes. The way you run your application in the cloud also changes. Things you once took for granted, such as negligible latency between servers, are no longer givens.

This series of articles follows the migration of a web application from a single physical server to Amazon Elastic Compute Cloud (Amazon EC2). Along the way, you learn how to adapt your application to the cloud environment and how to take advantage of the features that the cloud has to offer. To start, you see a straight migration from one physical server to a cloud server.

© Copyright IBM Corporation 2010. All rights reserved.



Working with Amazon EC2

Amazon EC2 lets anyone with a credit card pay for servers by the hour, turning them on and off through an application programming interface (API). You have a variety of server types to choose from, depending on whether memory, disk, or CPU power is your primary concern, along with a suite of add-ons from persistent disks to load balancers. You pay only for what you use.

Alongside the Amazon EC2 offering are others that give you, among other things, payment processing, databases, and message queuing. In this article series, you will be using Amazon Simple Storage Service (Amazon S3), which gives you access to disk space on a pay-per-use basis.

The example application

The web application that this series uses for examples is a payroll service called SmallPayroll.ca, written with the Ruby on Rails framework and a PostgreSQL back end. It is typical of many web applications: It has a database tier, an application tier, and a set of static files like cascading style sheet (CSS) and JavaScript files. Users navigate various forms to input and manipulate data, and they generate reports.

The various components in use are:

• Nginx. The front-end web server for static files and balancer to the middle tier.
• Mongrel. The application server itself.
• Ruby. The language you write the application in.
• Gems. Third-party plug-ins and libraries for everything from database encryption to application-level monitoring.
• PostgreSQL. The Structured Query Language database engine.

Use of the site has exceeded the capacity of the single server that now houses it. Therefore, a migration to a new environment is in order, and this is a prime opportunity to move to the cloud.

Desired improvements

But simply moving from one server to a small number of cloud-based servers wouldn't take advantage of what can be done in the cloud, nor would it make for exciting reading. So, during the move, you'll make improvements, some of which are only possible in a cloud environment:


• Increased reliability. Because you can choose the size of server to run in the cloud, you can run multiple, smaller servers for redundancy.
• Capacity for both scale-up and scale-down. Servers are incrementally added to the pool as the service grows. However, the number of servers can also be increased to accommodate short-term spikes in traffic or decreased during periodic lulls.
• Cloud storage. Backups of the application data will be made to Amazon S3, eliminating the need for tape storage.
• Automation. Everything in the Amazon environment, from the servers to the storage to the load balancers, can be automated. Less time managing an application means more time for other, more productive things.

You'll make these improvements incrementally throughout this article series.

Testing and migration strategies

When deploying an application for the first time, you generally have the luxury of being able to test and tweak without the burden of production traffic. In contrast, when migrating an application, you have the challenge of users who are placing a load on the site. Once the new environment takes production traffic, the users will be expecting everything to work properly.

A migration does not necessarily mean zero downtime. It's much easier if you can take the service offline for a period of time. You can use this outage window to perform final data synchronizations and allow for any network changes to stabilize. The window should not be used to do the initial deploy to the new environment; that is, the new environment should be in an operational state before the application migration starts. With this in mind, the key points are synchronization of data between the environments and network changes.

As you plan your migration strategy, it helps to begin with a walk-through of your current environment. Answer the following questions:

• What software do I use on my servers to run the application?
• What software do I use on my servers to manage and monitor the application and server resources?
• Where is all the user data kept? In databases? In files?
• Are static assets, like images, CSS, and JavaScript files, stored somewhere else?


• What touchpoints into other systems does the application need?
• Have I backed everything up recently?

Notifying users

In general, notifying your users is a good thing, even if you don't anticipate any downtime. In the case of the SmallPayroll.ca application, users tend to use the site at a consistent interval, corresponding with their two-week payroll cycle. Therefore, two weeks' notice would be a reasonable period. Sites like Google AdWords, which is the administrative interface for the Google advertising platform, give about a week's notice. If your website is more of a news site where users would not be as disrupted if it were down for an hour, you may choose to give notification on the day of the outage.

The form of notification also varies depending on the nature of your site and how you currently communicate with your users. For SmallPayroll.ca, a prominent message when the user logs in will be enough. For example, a message like "The system will be unavailable between 12:01 a.m. and 1 a.m. Eastern time, June 24, 2010. Everything entered prior to this will still be saved. For more information, click here." This message provides the three key pieces of information that users need to know:

• When the outage will happen, including the time zone
• Reassurance that their data will be safe
• A pointer to further information

If possible, avoid using 12:00 a.m. or 12:00 p.m., as well as the term midnight. These tend to confuse people, as many are not sure if midnight on June 17 refers to early morning (12:01 a.m.) or very late at night (11:59 p.m.). Similarly, many are not sure whether noon means 12 a.m. or 12 p.m. It's much easier to add a minute and make the time unambiguous.

Your details may be different, especially if you anticipate partial functionality during the outage. If you decide that you are going to put the notice up only during the outage (such as for a news site), the same information will still be helpful. My favorite site outage screen was along the lines of "The site is down for maintenance; back up around 3 p.m. EST. Play this game of Asteroids while you're waiting!"

Don't neglect your internal users, either. If you have account representatives, you will want to give them notice in case their clients ask any questions.

DNS considerations

The domain name system (DNS) takes care of translating a name like www.example.com into an IP address like 192.0.32.10. Your computer connects to IP addresses, so this translation is important. When migrating from one environment to another, you are almost guaranteed to be using a different IP address (the exception would be if you're staying in the same physical building).

Computers cache the name-to-IP mapping for a certain period of time, known as the time to live (TTL), to reduce overall response time. When you make the switch from one environment to another, and therefore from one IP address to another, people who have the DNS entry cached will continue to try to use the old environment. The DNS entry for the application and its associated TTL must be managed carefully.

TTLs are typically between one hour and one day. In preparation for a migration, though, you would want the TTL to be something short, such as 5 minutes. This change must be made at least one TTL period before you intend to change the address, because computers get the TTL along with the name-to-IP mapping. For example, if the TTL for www.example.com were set to 86,400 seconds (one day), you would need to reset the TTL to 5 minutes at least one day before the migration.
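You can watch the TTL countdown yourself with a resolver query. As a sketch, the dig command below is the real check (it needs network access), while the sample answer line stands in for live output so the field positions are clear; the 300-second value is illustrative:

```shell
# A live query would be:
#   dig +noall +answer www.example.com A
# dig prints one answer per line: name, remaining TTL, class, type, address.
# A sample answer line (illustrative, not live output):
sample='www.example.com.  300  IN  A  192.0.32.10'

# The second field is the TTL; once you lower it at your DNS provider,
# caches expire the record at least this quickly.
ttl=$(echo "$sample" | awk '{print $2}')
echo "record is cached for up to $ttl seconds"
```

Running the dig query repeatedly against your own resolver shows the cached TTL ticking down, which is a handy way to confirm a lowered TTL has actually propagated.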

Decoupling the old and new environments

It is essential that you fully test your new environment before migrating. All testing should happen in isolation from the production environment, preferably with a snapshot of production data so you can better exercise the new environment.

Performing a full test with a snapshot of production data serves two purposes. The first is that you are more likely to spot errors if you are using real-world data, because it is more unpredictable than the test data used during development. Real-world data may refer to files that you forgot to copy over, or it may require certain configurations that were forgotten during your walk-through.

The second reason to use production data is that you can practice your migration at the same time as you load data. You should be able to prove most aspects of your migration plan, except for the actual switch of environments.

Even though you will be mocking up your new environment as if it were production, only one environment can be associated with the host name of the application. The easiest way around this restriction is to make a DNS override in your hosts file. In UNIX®, this file resides at /etc/hosts; in Windows®, it resides at C:\windows\system32\drivers\etc\hosts. Simply follow the format of the existing lines, and add an entry pointing your application's host name to its future IP address. Don't forget to do the same for any image servers or anything else that you will be moving. You will probably have to restart your browser, but after that, you will be able to enter your production URL and be taken to your new environment instead.
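The override itself is a single extra line. A minimal sketch, shown here against a scratch copy of the file so nothing on your workstation is touched (on a real client you would edit /etc/hosts itself, as root; the IP address and host name are example values):

```shell
# Scratch copy standing in for /etc/hosts (or the Windows equivalent).
hosts=$(mktemp)
printf '127.0.0.1\tlocalhost\n' > "$hosts"

# Point the production host name at its future cloud IP (example values):
printf '184.73.43.141\twww.example.com\n' >> "$hosts"

# Confirm the override is in place:
override=$(grep 'www.example.com' "$hosts")
echo "$override"
```

Remember to remove the line once the real DNS change goes live, or you will keep hitting the new environment even if you later roll back.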

An Amazon EC2 primer

The Amazon EC2 service allows you to pay for a virtual machine (VM) by the hour. Amazon offers several different types of machines and classifies them by their CPU, memory, and disk profiles. Amazon measures memory and disk in terms of gigabytes and CPU in terms of Amazon EC2 Compute Units (ECU), where 1 ECU is roughly equivalent to a 1.0-1.2GHz 2007-era AMD Opteron or Intel® Xeon® processor. For example, the standard small instance gives you 1.7GB of memory, 160GB of disk space, and 1 ECU of CPU. At the time of this writing, the biggest machine is the High-Memory Quadruple Extra Large, which has 68.4GB of memory, 1.7TB of disk space, and 26 ECUs split across eight virtual cores. Prices range from US$0.085 per hour for the smallest instance to US$2.40 per hour for the biggest.

An Amazon EC2 instance begins life as an Amazon Machine Image (AMI), which is a template you use to build any number of VMs. Amazon publishes some AMIs, and you can make your own and share them with others. Some of these user-created AMIs are available at no cost; some incur an hourly charge on top of the Amazon hourly charge. For example, IBM publishes several paid AMIs that let you pay for licensing on an hourly basis.

When you want to boot a VM, you choose the machine type and an AMI. The AMI is stored in Amazon S3 and copied to the root partition of your VM when you launch the instance. The root partition is always 10GB. The storage space associated with the machine type is called the instance storage or ephemeral storage and is presented to your VM as a separate drive. The storage is called ephemeral because when you shut down your instance, the information is gone forever. You are required to back up your own data periodically to protect against loss. This also means that if the physical host running your instance crashes, your instance is shut down and the ephemeral disk is lost.

The Amazon Machine Image

All AMIs are assigned an identifier by Amazon, such as ami-0bbd5462. Amazon provides some public AMIs, and other people have made their own AMIs public. You can choose to start with a public AMI and make your own modifications, or you can start from scratch. Any time you make changes to the root file system of an AMI, you can save it as a new AMI, which is called re-bundling.

In this series, you will be starting off with a publicly available CentOS image, though you can choose a different one. It is wise to spend some time looking through any image you use to make sure there are no extra accounts and that the packages are updated. It is also possible to roll your own AMI from scratch, but that is outside the scope of this article.

The Amazon API

All of the functionality necessary to start, stop, and use the Amazon EC2 cloud is available using a web service. Amazon publishes the specifications for the web services and also provides a set of command-line tools. You should download these tools before proceeding (see Resources). I also encourage you to look at the quick start guide (see Resources) to get your environment set up, which will save you a lot of typing.

You authenticate to the API using security credentials. These credentials are found from the Account link within the Amazon Web Services (AWS) management console (see Resources). You will need your X.509 certificate files and your access keys. Keep these safe! Anyone with them could use AWS resources and incur charges on your behalf.
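The command-line tools locate these credentials through environment variables. A hedged sketch of a typical shell setup follows; the directory and file names are placeholders for wherever you unpacked the tools and saved your own certificate and private key (the tools also need a working Java runtime):

```shell
# Where the EC2 API tools were unpacked (placeholder path):
export EC2_HOME="$HOME/ec2-api-tools"

# Your downloaded credentials (placeholder file names):
export EC2_PRIVATE_KEY="$HOME/.ec2/pk-mykey.pem"
export EC2_CERT="$HOME/.ec2/cert-mycert.pem"

# Put the ec2-* commands on the PATH:
export PATH="$EC2_HOME/bin:$PATH"

echo "EC2 tools configured under $EC2_HOME"
```

Placing these lines in your shell profile saves retyping them in every session, which is essentially what the quick start guide walks you through.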

Before you launch your first instance

Before you launch your first instance, you must generate Secure Shell (SSH) keys to authenticate to your new instance and set up the virtual firewall to protect your instance. Listing 1 shows the use of the ec2-add-keypair command to generate an SSH key pair.

Listing 1. Generating an SSH key pair

[sean@sergeant:~]$ ec2-add-keypair main
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAu8cTsq84bHLVhDG3n/fe9FGz0fs0j/FwZiDDovwfpxA/lijaedg6lA7KBzvn
...
-----END RSA PRIVATE KEY-----
[sean@sergeant:~]$ ec2-describe-keypairs
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01

The first command tells Amazon to generate a key pair with the name main. The first line of the result gives the hash of the key. The rest of the output is an unencrypted PEM private key. You must store this key somewhere, for example in ~/.ssh/main.pem. Amazon retains the public portion of the key, which will be made available to the VMs you launch.

The second command, ec2-describe-keypairs, asks Amazon for the current list of key pairs. The result is the name of the key pair, followed by the hash.
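Storing the key can be sketched as below: paste the PEM block into a file and tighten its permissions, because ssh refuses to use a private key that other users can read. The listing uses a temporary directory standing in for ~/.ssh, and the PEM body is a placeholder for the output from ec2-add-keypair:

```shell
# Temporary directory standing in for ~/.ssh:
keydir=$(mktemp -d)

# Save the PEM block printed by ec2-add-keypair (placeholder body here):
cat > "$keydir/main.pem" <<'EOF'
-----BEGIN RSA PRIVATE KEY-----
(paste the key body printed by ec2-add-keypair here)
-----END RSA PRIVATE KEY-----
EOF

# Owner-only permissions, or ssh will reject the key:
chmod 600 "$keydir/main.pem"
ls -l "$keydir/main.pem"
```

With the real key saved as ~/.ssh/main.pem and chmod 600 applied, the ssh -i option used later in this article will accept it.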

Each instance is protected by a virtual firewall that initially allows nothing in. Amazon EC2 calls these firewalls security groups and has API calls and commands to manipulate them. You will look at these more closely when the time comes. In the meantime, Listing 2 shows how to view your current groups.

Listing 2. Displaying the current security groups

[sean@sergeant:~]$ ec2-describe-group
GROUP   223110335193    default default group


Listing 2 shows a group called default with a description of "default group." The user ID associated with the group is 223110335193. There are no rules in this group. If there were, they would be described below the group with the word PERMISSION in the left column.

Preparing the cloud environment

The first step is to prepare the cloud environment to test the application. The new environment will mimic the current production environment.

Start by launching the AMI, which has an ID of ami-10b55379. Listing 3 shows the AMI being launched and the status being checked.

Listing 3. Launching the CentOS AMI

[sean@sergeant:~]$ ec2-run-instances ami-10b55379 -k main
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   pending   main   0   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   pending   main   0   E3D48CEE   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION   r-750fff1e   223110335193   default
INSTANCE   i-75aaf41e   ami-10b55379   ec2-184-73-43-141.compute-1.amazonaws.com   domU-12-31-39-00-64-71.compute-1.internal   running   main   0   E3D48CEE   m1.small   2010-05-15T02:02:57+0000   us-east-1a   aki-3038da59   ari-3238da5b   monitoring-disabled   184.73.43.141   10.254.107.127   instance-store

The first command launches the instance using the ami-10b55379 AMI and specifies that the key pair generated in Listing 1 is to be used to authenticate to the machine. The command returns several pieces of information, the most important being the instance identifier (i-75aaf41e), which is the identity of the machine in the Amazon EC2 cloud. The second command, ec2-describe-instances, lists all the running instances. In Listing 3, the instance identifier has been passed on the command line to show information about only that instance.

The state of the instance is listed as pending, which means that the instance is still being started. A large AMI can take 5 to 10 minutes just to start. Running the same command some time later shows that the state is running and that the instance has been given the external IP address 184.73.43.141. The internal IP address that starts with 10 is useful for talking within the Amazon EC2 cloud, but not now.
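Rather than re-running ec2-describe-instances by hand, you could poll until the state flips from pending to running. The sketch below exercises the loop shape; the stub function stands in for the real API call (which would parse the state column of the INSTANCE line from ec2-describe-instances) so the logic can be followed without live credentials:

```shell
# Stub standing in for: ec2-describe-instances i-75aaf41e
# (real code would extract the state column of the INSTANCE line).
# Here it pretends the instance becomes ready on the third poll.
instance_state() { [ "$1" -ge 3 ] && echo running || echo pending; }

n=0
state=pending
while [ "$state" != "running" ]; do
  n=$((n + 1))
  state=$(instance_state "$n")
  echo "poll $n: $state"
  # sleep 30   # real polling should pause between API calls
done
```

The same loop, with the stub swapped for the real describe command, is a first taste of the automation theme this series returns to later.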

You can then use SSH to connect to the server using the key you generated earlier. But first, you must allow SSH (22/TCP) in. Listing 4 shows how to authorize the connection and log in to your new server.


Understanding SSH keys

If you're not familiar with SSH keys, it's helpful to know that SSH can authenticate users with keys instead of passwords. You generate a key pair, which is composed of a public key and a private key. You keep the private key private and upload the public key to a file called authorized_keys, which is inside the $HOME/.ssh directory. When you connect to a server with SSH, the client can try to authenticate with the key. If that succeeds, you're logged in.

One property of the key pair is that messages encrypted with one of the keys can be decrypted only with the other. When you connect to a server, the server can encrypt a message with the public key stored in the authorized_keys file. If you can decrypt the message using your private key, the server knows you are authorized to log in without a password.

The next logical question is, "How is the authorized_keys file filled in with the public key that's stored with Amazon?" Each Amazon EC2 instance can talk to a web server in the Amazon EC2 cloud at http://169.254.169.254 and retrieve metadata about the instance. One of the URLs is http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key, which returns the public key associated with the image.

On startup, the AMI retrieves the public key and stores it in authorized_keys. This is done in /etc/init.d/getssh in the example AMI. It could just as easily happen in rc.local.
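A boot script along the lines of getssh can be sketched in a few lines. The metadata call only works from inside an instance, so a placeholder key and a temporary directory (standing in for root's .ssh directory) are used here to make the flow reproducible:

```shell
# On a real instance the key would come from the metadata service:
#   curl -s http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key
key='ssh-rsa AAAAB3placeholder main'      # placeholder public key

sshdir=$(mktemp -d)                       # stands in for /root/.ssh
chmod 700 "$sshdir"

# Append the key and lock down permissions, as sshd expects:
echo "$key" >> "$sshdir/authorized_keys"
chmod 600 "$sshdir/authorized_keys"

grep -c 'ssh-rsa' "$sshdir/authorized_keys"
```

Using append rather than overwrite preserves any keys you add by hand later, which is the behavior you usually want from a boot-time script.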

Another use for instance metadata is to pass information to the image. You could have one generic AMI that could be either a web server or a background job server and have the instance decide which services to start based on the parameters you pass when starting the image.

Listing 4. Connecting to the instance

[sean@sergeant:~]$ ec2-authorize default -p 22 -s $MYIP/32
...
[sean@sergeant:~]$ ssh -i ~/.ssh/main.pem root@184.73.43.141
The authenticity of host '184.73.43.141 (184.73.43.141)' can't be established.
RSA key fingerprint is af:c2:1e:93:3c:16:76:6b:c1:be:47:d5:81:82:89:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '184.73.43.141' (RSA) to the list of known hosts.
...

The first command allows port 22 (TCP is the default protocol) in from your IP address. The /32 means that only your host is allowed, not the entire network. The ssh command then connects to the server using the private key.

Installing Ruby

CentOS comes with a dated Ruby version, so you will install Ruby Enterprise Edition (REE), which is a high-performance Ruby interpreter that's compatible with the current 1.8.7 branch of Ruby. Despite the expensive-sounding name, the software is open source. Listing 5 shows how to install REE.

Listing 5. Installing REE

# rpm -e ruby ruby-libs
# yum -y install gcc-c++ zlib-devel openssl-devel readline-devel
...
Complete!
# wget http://rubyforge.org/frs/download.php/71096/ruby-enterprise-1.8.7-2010.02.tar.gz
...
# tar -xzf ruby-enterprise-1.8.7-2010.02.tar.gz
# ruby-enterprise-1.8.7-2010.02/installer -a /opt/ree

The first two commands from Listing 5 remove the default Ruby installation and install a C compiler and a few necessary development packages. wget downloads the current REE tarball, which is then unpacked by tar. Finally, the last command runs the installer with an option to accept all defaults and place the results in /opt/ree. The installer is smart enough to tell you the commands you have to run if you're missing some packages, so look closely at the output if the installation isn't working.

After Ruby is installed, add the bin directory to your path with export PATH="/opt/ree/bin:$PATH", which you can place in the system-wide /etc/bashrc or in the .bashrc file in your home directory.

Installing PostgreSQL

The PostgreSQL server is part of the CentOS distribution, so all you need to do is install it with the yum utility. Listing 6 shows how to install PostgreSQL and make sure it will start on boot.

Listing 6. Installing PostgreSQL

# yum -y install postgresql-server postgresql-devel
...
Installed: postgresql-devel.i386 0:8.1.21-1.el5_5.1
           postgresql-server.i386 0:8.1.21-1.el5_5.1
Dependency Installed: postgresql.i386 0:8.1.21-1.el5_5.1
                      postgresql-libs.i386 0:8.1.21-1.el5_5.1
Complete!
# chkconfig postgresql on

The yum command installs packages from a repository. In Listing 6, you are installing the PostgreSQL server component and development libraries. Doing so automatically pulls in the core database utilities and any other packages you need. You will not need the development package yet, but when it comes time to integrate Rails and PostgreSQL, you will need the libraries inside postgresql-devel.

By default, the database stores its files in /var/lib/pgsql/data, which is part of the root file system. You move this directory to the instance storage on /mnt, as shown in Listing 7.

Listing 7. Moving the PostgreSQL data store to /mnt

# mv /var/lib/pgsql/data /mnt
# ln -s /mnt/data /var/lib/pgsql/data
# service postgresql start

After entering the commands in Listing 7, PostgreSQL is running out of /mnt.

Next, you must enable password logins for the payroll_prod database (which you'll create in the next step). By default, PostgreSQL does not use passwords: It uses an internal identification system. Simply add:

host "payroll_prod" all 127.0.0.1/32 md5

to the top of /var/lib/pgsql/data/pg_hba.conf, and then run:

su - postgres -c 'pg_ctl reload'

to make the change take effect. With this configuration, normal logins to PostgreSQL don't need a password (which is why the reload command didn't need a password), but any access to the payroll database will.

The final step is to set up the Rails database from the command line. Run su - postgres -c psql, and follow along in Listing 8.

Listing 8. Creating the user and database

postgres=# create user payroll with password 'secret';
CREATE ROLE
postgres=# create database payroll_prod;
CREATE DATABASE
postgres=# grant all privileges on database payroll_prod to payroll;
GRANT

And with that, your database is created.
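It's worth confirming that the pg_hba.conf rule actually gates the new database. One way, sketched below, is a TCP login as the payroll user: -h 127.0.0.1 forces a "host" connection, which is what the md5 rule matches. The command is built and printed rather than executed, since it needs the PostgreSQL server from Listing 6 running; run the printed command on the new server:

```shell
# A socket login (plain "psql") still uses ident and needs no password;
# a TCP login as payroll should prompt for the password set in Listing 8.
cmd='psql -h 127.0.0.1 -U payroll payroll_prod -c "select 1;"'
echo "run on the new server: $cmd"
```

A successful "select 1" after the password prompt proves both the user grant and the md5 rule are in effect before the application ever touches the database.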

Migrating the data

For testing, you should grab a database dump of your production environment from a certain point in time so that you have something to test with. The SmallPayroll application stores data both in the database and on the file system. The database will be dumped using the pg_dump command that comes with PostgreSQL; the file system data will be copied with rsync. The database will have to be wiped and re-transferred for the final migration because of the nature of database dumps, but the file system data only needs to transfer new and changed files, because rsync can detect when a file hasn't changed. Thus, the testing part of the plan helps speed up the migration, because most of the data will already be there.

The fastest way to copy the database is to run:

pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz

on your production machine, copy dbbackup.gz to the cloud server, and then run:

zcat dbbackup.gz | psql payroll_prod

This simply creates a compressed dump of the database on one server and then replays all the transactions on the other server.

rsync is just as simple. From your production server, run:

rsync -avz -e "ssh -i .ssh/main.pem" /var/uploads/ root@174.129.138.83:/var/uploads/

This command copies everything in /var/uploads from the current production server to the new server. If you run it again, only the changed files are copied over, saving you time on later synchronizations.
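The dump and sync steps above can be wrapped into one rehearsal script, so that the final cutover during the outage window is a single command. A sketch, defaulting to a dry run that only prints each step; the host address and key path are the example values used in this section, and eval-based execution assumes you trust the commands as written:

```shell
#!/bin/sh
NEWHOST=${NEWHOST:-174.129.138.83}   # new server's address (example value)
KEY=${KEY:-$HOME/.ssh/main.pem}      # SSH key saved earlier
DRY_RUN=${DRY_RUN:-1}                # set DRY_RUN=0 to really run the steps

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else eval "$@"; fi; }

run "pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz"
run "scp -i $KEY /tmp/dbbackup.gz root@$NEWHOST:/tmp/"
run "ssh -i $KEY root@$NEWHOST 'zcat /tmp/dbbackup.gz | psql payroll_prod'"
run "rsync -avz -e 'ssh -i $KEY' /var/uploads/ root@$NEWHOST:/var/uploads/"
```

Rehearsing with the dry run, then with DRY_RUN=0 against the test environment, is exactly the "practice your migration" step described above.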

Because you are copying the database over, you do not have to apply your Rails migrations first. Rails will believe the database is up to date, because you already copied over the schema_migrations table.

Deploying the Rails application

At this point, you have the base server set up but not your application. You must install some basic gems, along with any gems your application requires, before your application will run. Listing 9 shows the commands to update your gems. Note that you must be in the root of your Rails application, so copy it over to your server first.

Listing 9. Updating RubyGems and installing your gems

# gem update --system
Updating RubyGems
Nothing to update
# gem install rails mongrel mongrel-cluster postgres
Successfully installed rails-2.3.8
Building native extensions. This could take a while...
Successfully installed gem_plugin-0.2.3
Successfully installed daemons-1.1.0
Successfully installed cgi_multipart_eof_fix-2.5.0
Successfully installed mongrel-1.1.5
Successfully installed mongrel_cluster-1.0.5
Building native extensions. This could take a while...
Successfully installed postgres-0.7.9.2008.01.28
7 gems installed
...
# rake gems:install
(in /home/payroll)
gem install haml
Successfully installed haml-3.0.12
1 gem installed
Installing ri documentation for haml-3.0.12...
Installing RDoc documentation for haml-3.0.12...
gem install money
...

The first command makes sure that RubyGems itself is up to date. The second<br />

command installs some helpful gems:<br />

• rails. The Ruby on Rails framework<br />

• postgres. The database driver that lets you use PostgreSQL with<br />

ActiveRecord<br />

• mongrel. An application server used to host the Rails application<br />

• mongrel_cluster. Utilities to let you start and stop groups of mongrels<br />

at the same time<br />

The last command runs a Rails task to install all the extra gems that the application<br />

requires. If you didn't use the config.gem directive in your config/environment.rb<br />

file, then you may have to install your extra gems by hand using the gem install<br />

gemname command.<br />
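For reference, config.gem declarations live in config/environment.rb in Rails 2.3. A minimal sketch, assuming the application uses the haml and money gems shown in Listing 9 (the version constraints are illustrative):

```ruby
# config/environment.rb (Rails 2.3) -- declaring gems here lets
# `rake gems:install` install everything the application needs.
Rails::Initializer.run do |config|
  config.gem "haml", :version => ">= 3.0.12"
  config.gem "money"
  # Use :lib when the file to require differs from the gem name,
  # e.g. (illustrative): config.gem "aws-s3", :lib => "aws/s3"
end
```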

Try to start your application with the RAILS_ENV=production script/console<br />

command. If this command succeeds, stop it, and then launch your pack of<br />

mongrels with:<br />

mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />

If the first command doesn't succeed, you will get plenty of error messages to help<br />

you find the problem, which is usually a missing gem or file. Take this opportunity to<br />

go back and put in any missing config.gem directives so that you don't forget the<br />

gem in the future.<br />
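If the application doesn't already have a mongrel_cluster.yml, a minimal sketch follows; the path matches this article's layout, while the user, group, and PID file location are assumptions:

```yaml
# /home/payroll/current/config/mongrel_cluster.yml
# Two mongrels starting at port 8100 (so 8100 and 8101), matching the
# nginx upstream block shown in Listing 11.
cwd: /home/payroll/current
environment: production
port: "8100"
servers: 2
pid_file: tmp/pids/mongrel.pid  # assumed location
user: payroll                   # assumed deploy user
group: payroll
```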

Installing a front-end web server<br />

Nginx is the web server of choice for many virtual environments. It has low overhead<br />


and is good at proxying connections to a back-end service like mongrel. Listing 10<br />

shows how to install nginx.<br />

Listing 10. Installing nginx<br />

# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm<br />

...<br />

# yum install nginx<br />

...<br />

Running Transaction<br />

Installing : nginx [1/1]<br />

Installed: nginx.i386 0:0.6.39-4.el5<br />

Complete!<br />

# chkconfig nginx on<br />

Listing 10 installs the Extra Packages for Enterprise Linux® (EPEL) repository, then<br />

installs nginx and makes sure it will come up on startup.<br />

Listing 11. An nginx configuration for a Rails application<br />

# Two mongrels, balanced based on least connections<br />

upstream mongrel-payroll {<br />

fair;<br />

server 127.0.0.1:8100;<br />

server 127.0.0.1:8101;<br />

}<br />

server {<br />

listen 80;<br />

server_name app.smallpayroll.ca;<br />

root /home/payroll/current/public;<br />

gzip_static on;<br />

access_log /var/log/nginx/app.smallpayroll.ca_log main;<br />

error_page 404 /404.html;<br />

location / {<br />

# Because we're proxying, set some environment variables indicating this<br />

proxy_set_header X-Real-IP $remote_addr;<br />

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;<br />

proxy_set_header Host $http_host;<br />

proxy_redirect false;<br />

proxy_max_temp_file_size 0;<br />

# Serve static files out of Root (eg public)<br />

if (-f $request_filename) {<br />

break;<br />

}<br />

# Handle page cached actions by looking for the appropriately named file<br />

if (-f $request_filename.html) {<br />

rewrite (.*) $1.html;<br />

break;<br />

}<br />

# Send all other requests to mongrel<br />

if (!-f $request_filename) {<br />

proxy_pass http://mongrel-payroll;<br />

break;<br />


}<br />

}<br />

error_page 500 502 503 504 /500.html;<br />

location = /500.html {<br />

root /home/payroll/current/public;<br />

}<br />

}<br />

Listing 11 shows a fairly typical nginx configuration, with some elements thrown in to<br />

handle Rails page caching and send dynamic requests to an upstream mongrel. You<br />

could map other URLs to file names here, if needed.<br />
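For example, a hypothetical block that serves the /uploads/ URL straight from the directory synchronized earlier, bypassing the mongrels entirely, might look like this inside the server block:

```nginx
# Hypothetical: serve uploaded files directly from disk.
location /uploads/ {
    root /var;      # /uploads/report.pdf -> /var/uploads/report.pdf
    expires 24h;    # allow browsers to cache static uploads
}
```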

With the configuration in place, service nginx start starts the web server.<br />

Testing<br />

For testing, it would be helpful to be able to refer to your cloud instance using the<br />

regular domain name of your application, because you want to ensure that you're<br />

using your test site and not the production site. You do this through a local DNS<br />

override. In Windows, edit C:\windows\system32\drivers\etc\hosts; in UNIX, edit<br />

/etc/hosts. Add a line like:<br />

x.x.x.x app.smallpayroll.ca<br />

where x.x.x.x is the IP address of your cloud server and app.smallpayroll.ca is the<br />

name of your application. Restart your browser, and browse to your website. You will<br />

be using the cloud version of your application now. (Don't forget to comment out the<br />

line you just added when you want to go back to the production version!)<br />

At this point, you should be able to test that the cloud version of your application<br />

works just as well as the production version; fix any problems you find. Make careful<br />

note of whatever you find, as you'll want to script it in case you launch a second<br />

server. Because you're using the cloud version of your application, you can delete<br />

and restore your database without any users complaining.<br />

Bundling the new AMI<br />

The last thing to do is re-bundle your AMI. Any time you start a new instance, you<br />

lose everything in /mnt, and your root partition is reset to whatever is in the AMI.<br />

There's nothing you can do yet about the /mnt problem, but re-bundling makes sure<br />

that your AMI is just the way you left it.<br />

If the AMI you are starting from does not have the AMI tools, you can install them<br />

with the following command:<br />

rpm -i --nodeps http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.noarch.rpm<br />


Bundling an AMI is a three-step process:<br />

1. Create the image on the instance itself.<br />

2. Upload the image to Amazon S3.<br />

3. Register the AMI.<br />

Before proceeding, shut down your mongrel and PostgreSQL instances, just to<br />

make sure any open files are handled correctly. You must also copy your X.509<br />

keys, found in the Amazon Console, to /mnt on your server. Listing 12 shows the<br />

first two steps of bundling, which are done on the VM itself.<br />

Listing 12. Bundling the AMI<br />

# ec2-bundle-vol -d /mnt -e /mnt --privatekey /mnt/pk-mykey.pem \<br />

--cert /mnt/cert-mycert.pem --user 223110335193 -p centos-ertw<br />

Please specify a value for arch [i386]:<br />

Copying / into the image file /mnt/centos-ertw...<br />

...<br />

Generating digests for each part...<br />

Digests generated.<br />

Creating bundle manifest...<br />

ec2-bundle-vol complete.<br />

# ec2-upload-bundle -b ertw.com -m /mnt/centos-ertw.manifest.xml \<br />

--secret-key MYSECRETKEY --access-key MYACCESSKEY<br />

Creating bucket...<br />

Uploading bundled image parts to the S3 bucket ertw.com ...<br />

...<br />

Uploaded centos-ertw.part.37<br />

Uploading manifest ...<br />

Uploaded manifest.<br />

Bundle upload completed.<br />

The first command generates the bundle, specifying that /mnt is to be ignored and<br />

that the bundle will go in /mnt (the -e and -d options, respectively). The --privatekey,<br />

--cert, and --user options point to your security credentials and AWS user ID,<br />

which are all found in the account settings of your AWS Management Console. The<br />

last option, -p, lets you name this AMI to differentiate it from others.<br />

The first command will run for about 10 minutes, depending on how full your root<br />

partition is. The second command uploads the bundle to Amazon S3. The -b option<br />

specifies a bucket name, which will be created if it doesn't exist already. The -m<br />

option points to the manifest file created in the last step. The last two options are<br />

your Amazon S3 credentials, which are found right next to your X.509 credentials in<br />

the AWS Management Console. Just remember that X.509 credentials are used for<br />

Amazon EC2 operations, while Amazon S3 uses text keys.<br />

Finally, run the command:<br />


ec2-register ertw.com/centos-ertw.manifest.xml<br />

to register the AMI, and you will see the AMI identifier to use from now on. Note that<br />

the ec2-register command is not distributed with the AMI tools, so it's easiest to run it<br />

from the server where you started the original AMI. You could also install the<br />

Amazon EC2 tools on your Amazon EC2 instance.<br />

Performing the migration<br />

Now that you've got your cloud environment running, the migration itself should be<br />

rather simple. You've verified that everything works: All that remains is to<br />

resynchronize the data and cut over in an orderly fashion.<br />

Premigration tasks<br />

Some time before the migration, make sure you lower the TTL of your domain name<br />

records to 5 minutes. You should also develop a checklist of the steps you will take<br />

to move everything over, the tests you want to run to verify that everything is<br />

working, and the procedure to back out of the change, if necessary.<br />

Make sure your users are notified of the migration!<br />

Just before your migration time, take another look at your cloud environment to<br />

make sure it is ready to be synchronized and accept production traffic.<br />

Migrating the application<br />

To migrate the application, perform the following steps:<br />

1. Disable the current production site or put it in read-only mode, depending<br />

on the nature of the site.<br />

Because most of SmallPayroll's requests involve writing to the database<br />

or file system, the site will be disabled. The Capistrano deployment gem<br />

includes a task, cap deploy:web:disable, that puts a maintenance<br />

page on the site informing users that the site is down for maintenance.<br />

2. Stop the application services in the cloud environment in preparation for<br />

the data migration by killing your mongrel processes.<br />

3. Copy your database over the same way you did for testing.<br />

4. Re-run rsync, if necessary.<br />


5. Restart the application servers with the command:<br />

mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />

6. Make sure your hosts file is pointing to the cloud environment, and<br />

perform some smoke tests. Make sure users can log in and browse the<br />

site.<br />
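The steps above can be collected into a rough cutover script. Treat this as a sketch, not a turnkey tool: the host, key, and paths are the ones used earlier in this article, it assumes the cloud database has been dropped and recreated to receive the dump, and each step deserves a watchful eye when run for real:

```shell
#!/bin/sh
# Sketch of the SmallPayroll cutover; not a turnkey tool.
set -e

CLOUD=root@174.129.138.83   # cloud server from the rsync example
KEY=.ssh/main.pem

# 1. Put the maintenance page up on the production site.
cap deploy:web:disable

# 2. Stop the cloud mongrels so nothing writes during the copy.
ssh -i "$KEY" "$CLOUD" \
  "mongrel_rails cluster::stop -C /home/payroll/current/config/mongrel_cluster.yml"

# 3. Dump the production database and replay it on the cloud server
#    (assumes payroll_prod there was dropped and recreated beforehand).
pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz
scp -i "$KEY" /tmp/dbbackup.gz "$CLOUD":/tmp/
ssh -i "$KEY" "$CLOUD" "zcat /tmp/dbbackup.gz | psql payroll_prod"

# 4. Re-run rsync; only files changed since the last sync transfer.
rsync -avz -e "ssh -i $KEY" /var/uploads/ "$CLOUD":/var/uploads/

# 5. Restart the application servers.
ssh -i "$KEY" "$CLOUD" \
  "mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml"
```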

Updating DNS<br />

If your smoke tests pass, then you can change your DNS records to point to your<br />

cloud environment. At this point, I find it helpful to keep a tail -f running on the<br />

web server's log file to watch for people coming in to the site.<br />

Chances are that your local DNS server still has the old information cached for the<br />

next 5 minutes. You can verify this with the dig command, as shown in Listing 13.<br />

Listing 13. Verifying the DNS server is caching the query<br />

# dig app.smallpayroll.ca @172.16.0.23<br />

; <<>> DiG 9.3.4 <<>> app.smallpayroll.ca @172.16.0.23<br />

; (1 server found)<br />

;; global options: printcmd<br />

;; Got answer:<br />

;; ->>HEADER<<-<br />

...<br />

Summary<br />

Migrating the application to the cloud involved the following steps:<br />

1. Set up the new environment.<br />

2. Test with a copy of production data.<br />

3. Turn off the old environment.<br />

4. Copy production data over to the new environment.<br />

5. Change DNS to point to the new environment.<br />

Despite now being "in the cloud," the application is probably worse off than it was<br />

before. Consider the following points:<br />

• The application is still running on one server.<br />

• If the server crashes, all the data is lost.<br />

• You have less control over performance than you do on a physical server.<br />

• The machine and application are not locked down.<br />

In the next article, you'll learn how to overcome these problems and start building a<br />

more robust environment for your application.<br />


Resources<br />

Learn<br />

• In the Cloud Computing zone on developerWorks, get the resources you need<br />

to develop and deploy applications in the cloud and keep on top of recent cloud<br />

developments.<br />

• Request instance metadata from your Amazon EC2 instance to get information<br />

about the instance, from the SSH keys it should use to user-specified<br />

information.<br />

• Start your cloud adventure by looking around the AWS Management Console.<br />

• If you're going to work with Amazon EC2, you should familiarize yourself with<br />

the various guides that Amazon provides.<br />

• Learn about the IBM AMIs from Amazon's perspective and from IBM's<br />

perspective.<br />

• In the developerWorks Linux zone, find hundreds of how-to articles and<br />

tutorials, as well as downloads, discussion forums, and a wealth of other<br />

resources for Linux developers and administrators.<br />

• Stay current with developerWorks technical events and webcasts focused on a<br />

variety of IBM products and IT industry topics.<br />

• Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM<br />

products and tools, as well as IT industry trends.<br />

• Watch developerWorks on-demand demos ranging from product installation and<br />

setup demos for beginners, to advanced functionality for experienced<br />

developers.<br />

• Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on<br />

developerWorks.<br />

Get products and technologies<br />

• Ruby Enterprise Edition is a high-performance Ruby implementation that can be<br />

used by itself or along with Phusion Passenger to integrate with Apache or<br />

nginx. Either way, you get access to faster memory management and improved<br />

garbage collection.<br />

• Sign up for the IBM Industry Application Platform AMI for Development Use to<br />

get started with various IBM products in the cloud. Remember that you have to<br />

go through a checkout process, but you're not going to be charged anything<br />

until you use it. You can also use ami-90ed0ff9.<br />

• The Amazon EC2 API tools are used to communicate with the Amazon API to<br />

launch and terminate instances and re-bundle new ones. These tools are<br />


periodically updated as new features are introduced to Amazon EC2, so it's<br />

worth checking back after product announcements for updates to this page. You<br />

will need at least the 2009-05-15 update, because you'll be using some of the<br />

load-balancing features later.<br />

• Evaluate IBM products in the way that suits you best: Download a product trial,<br />

try a product online, use a product in a cloud environment, or spend a few hours<br />

in the SOA Sandbox learning how to implement Service Oriented Architecture<br />

efficiently.<br />

Discuss<br />

• Get involved in the My developerWorks community. Connect with other<br />

developerWorks users while exploring the developer-driven blogs, forums,<br />

groups, and wikis.<br />

About the author<br />

Sean Walberg<br />

Sean Walberg is a network engineer and the author of two books on<br />

networking. He has worked in various industries, including health care<br />

and media.<br />

