Migrate your Linux application to the Amazon cloud, Part 1: Initial migration

How to migrate your application into the cloud

Skill Level: Intermediate

Sean Walberg (sean@ertw.com)
Network Engineer

13 Jul 2010
Cloud computing and Infrastructure as a Service (IaaS) are well documented, but what's often not discussed is how to get a running application into a cloud environment. Discover how to move an application into the cloud and take advantage of the features this setup has to offer.
Infrastructure as a Service (IaaS) is a great concept: You use computing resources; you pay for them. You want more computing power; you pay more. The downside of this model is that you're working with computers that you'll never see or know much about. Once you get over that, however, there's a lot to be gained by using IaaS.

Because the IaaS model is so different from the traditional model of buying servers, the way you manage your virtual computers changes. The way you run your application in the cloud also changes. Things you once took for granted, such as negligible latency between servers, are no longer givens.

This series of articles follows the migration of a web application from a single physical server to Amazon Elastic Compute Cloud (Amazon EC2). Along the way, you learn how to adapt your application to the cloud environment and how to take advantage of the features that the cloud has to offer. To start, you see a straight migration from one physical server to a cloud server.
Initial migration
Trademarks
© Copyright IBM Corporation 2010. All rights reserved.
developerWorks® ibm.com/developerWorks
Working with Amazon EC2

Amazon EC2 lets anyone with a credit card pay for servers by the hour, turning them on and off through an application programming interface (API). You have a variety of server types to choose from—depending on whether memory, disk, or CPU power is your primary concern—along with a suite of add-ons from persistent disks to load balancers. You pay only for what you use.

Alongside the Amazon EC2 offering are others that give you, among other things, payment processing, databases, and message queuing. In this article series, you will be using Amazon Simple Storage Service (Amazon S3), which gives you access to disk space on a pay-per-use basis.
The example application

The web application that this series uses for examples is a payroll service called SmallPayroll.ca, written with the Ruby on Rails framework and a PostgreSQL back end. It is typical of many web applications: It has a database tier, an application tier, and a set of static files like cascading style sheet (CSS) and JavaScript files. Users navigate various forms to input and manipulate data, and they generate reports.

The various components in use are:

• Nginx. The front-end web server for static files and load balancer to the middle tier.
• Mongrel. The application server itself.
• Ruby. The language the application is written in.
• Gems. Third-party plug-ins and libraries for everything from database encryption to application-level monitoring.
• PostgreSQL. The Structured Query Language (SQL) database engine.

Use of the site has exceeded the capacity of the single server that now houses it. Therefore, a migration to a new environment is in order, and this is a prime opportunity to move to the cloud.
Desired improvements

But simply moving from one server to a small number of cloud-based servers wouldn't take advantage of what can be done in the cloud, nor would it make for exciting reading. So, during the move, you'll make improvements, some of which are only possible in a cloud environment:
• Increased reliability. Because you can choose the size of server to run in the cloud, you can run multiple, smaller servers for redundancy.
• Capacity for both scale-up and scale-down. Servers are incrementally added to the pool as the service grows. However, the number of servers can also be increased to accommodate short-term spikes in traffic or decreased during periodic lulls.
• Cloud storage. Backups of the application data will be made to Amazon S3, eliminating the need for tape storage.
• Automation. Everything in the Amazon environment—from the servers to the storage to the load balancers—can be automated. Less time managing an application means more time for other, more productive things.

You'll make these improvements incrementally throughout this article series.
Testing and migration strategies

When deploying an application for the first time, you generally have the luxury of being able to test and tweak without the burden of production traffic. In contrast, when migrating an application, you have the challenge of users who are placing a load on the site. Once the new environment takes production traffic, the users will be expecting everything to work properly.

A migration does not necessarily mean zero downtime. It's much easier if you can take the service offline for a period of time. You can use this outage window to perform final data synchronizations and allow for any network changes to stabilize. The window should not be used to do the initial deploy to the new environment—that is, the new environment should be in an operational state before the application migration starts. With this in mind, the key points are synchronization of data between the environments and network changes.

As you plan your migration strategy, it helps to begin with a walk-through of your current environment. Answer the following questions:

• What software do I use on my servers to run the application?
• What software do I use on my servers to manage and monitor the application and server resources?
• Where is all the user data kept? In databases? In files?
• Are static assets, like images, CSS, and JavaScript files, stored somewhere else?
• What touchpoints into other systems does the application need?
• Have I backed everything up recently?
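A small script run on the current server is a convenient way to start collecting answers. The sketch below uses only portable commands; the comments name the additional checks a real walk-through would add, which depend on your stack:

```shell
# Minimal walk-through helper (sketch): gathers a few facts about the
# current server as a starting point for answering the questions above.
inventory() {
    echo "== kernel =="
    uname -sr
    echo "== mounted filesystems =="
    df -P | awk 'NR > 1 { print $6 }'
    # A real walk-through would also capture installed packages
    # (rpm -qa), running processes (ps aux), listening ports
    # (netstat -tlnp), and scheduled jobs (crontab -l).
}
```

Redirect the output to a file (for example, `inventory > walkthrough.txt`) and keep it with your migration plan.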
Notifying users

In general, notifying your users is a good thing, even if you don't anticipate any downtime. In the case of the SmallPayroll.ca application, users tend to use the site at a consistent interval, corresponding with their two-week payroll cycle. Therefore, two weeks' notice would be a reasonable period. Sites like Google AdWords, the administrative interface for Google's advertising platform, give about a week's notice. If your website is more of a news site, where users would not be as disrupted if it were down for an hour, you may choose to give notification on the day of the outage.
The form of notification also varies depending on the nature of your site and how you currently communicate with your users. For SmallPayroll.ca, a prominent message when the user logs in will be enough. For example, a message like "The system will be unavailable between 12:01 a.m. and 1 a.m. Eastern time, June 24, 2010. Everything entered prior to this will still be saved. For more information, click here." This message provides the three key pieces of information that users need to know:

• When the outage will happen, including the time zone
• Reassurance that their data will be safe
• A pointer to further information

If possible, avoid using 12:00 a.m. or 12:00 p.m., as well as the term midnight. These tend to confuse people: many are not sure whether midnight on June 17 refers to early morning (12:01 a.m.) or very late at night (11:59 p.m.). Similarly, many are not sure whether noon means 12 a.m. or 12 p.m. It's much easier to add a minute and make the time unambiguous.
Your details may be different, especially if you anticipate partial functionality during the outage. If you decide that you are going to put the notice up only during the outage (such as for a news site), the same information will still be helpful. My favorite site outage screen was along the lines of "The site is down for maintenance; back up around 3 p.m. EST. Play this game of Asteroids while you're waiting!"

Don't neglect your internal users, either. If you have account representatives, you will want to give them notice in case their clients ask any questions.
DNS considerations

The domain name system (DNS) takes care of translating a name like www.example.com into an IP address like 192.0.32.10. Your computer connects to
IP addresses, so this translation is important. When migrating from one environment to another, you are almost guaranteed to be using a different IP address (the exception would be if you're staying in the same physical building).

Computers cache the name-to-IP mapping for a certain period of time, known as the time to live (TTL), to reduce overall response time. When you make the switch from one environment to another—and therefore from one IP address to another—people who have the DNS entry cached will continue to try to use the old environment. The DNS entry for the application and its associated TTL must be managed carefully.

TTLs are typically between one hour and one day. In preparation for a migration, though, you want the TTL to be something short, such as 5 minutes. This change must be made at least one TTL period before you intend to change the address, because computers get the TTL along with the name-to-IP mapping. For example, if the TTL for www.example.com were set to 86,400 seconds (one day), you would need to lower the TTL to 5 minutes at least one day before the migration.
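In a BIND-style zone file, the change is a single field on the record. The fragment below is a hypothetical example; the name and address are from the text above, not from a real zone:

```
; before the migration window (was 86400 = one day):
www     300     IN      A       192.0.32.10     ; TTL lowered to 5 minutes
```

After the migration has settled, raise the TTL back to its normal value so resolvers stop hammering your name servers.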
Decoupling the old and new environments

It is essential that you fully test your new environment before migrating. All testing should happen in isolation from the production environment, preferably with a snapshot of production data so you can better exercise the new environment.

Performing a full test with a snapshot of production data serves two purposes. The first is that you are more likely to spot errors if you are using real-world data, because it is more unpredictable than the test data used during development. Real-world data may refer to files that you forgot to copy over or may require configurations that were forgotten during your walk-through.

The second reason to use production data is that you can practice your migration at the same time as you load data. You should be able to prove most aspects of your migration plan, except for the actual switch of environments.
Even though you will be mocking up your new environment as if it were production, only one environment can be associated with the host name of the application. The easiest way to get around this restriction is to add a DNS override in your hosts file. In UNIX®, this file resides at /etc/hosts; in Windows®, it resides at C:\windows\system32\drivers\etc\hosts. Simply follow the format of the existing lines, and add an entry pointing your application's host name to its future IP address. Don't forget to do the same for any image servers or anything else that you will be moving. You will probably have to restart your browser, but after that, you will be able to enter your production URL and be taken to your new environment instead.
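For example, assuming the new environment will answer on 184.73.43.141, the override is a single line. The host names here are illustrative:

```
# /etc/hosts override for testing the new environment
184.73.43.141   www.smallpayroll.ca smallpayroll.ca
```

Remove the line once the real DNS change is made, or you will keep testing against whatever that address points to.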
An Amazon EC2 primer
The Amazon EC2 service allows you to pay for a virtual machine (VM) by the hour. Amazon offers several different types of machines and classifies them by their CPU, memory, and disk profiles. Amazon measures memory and disk in terms of gigabytes and CPU in terms of EC2 Compute Units (ECUs), where 1 ECU is roughly equivalent to a 1.0-1.2GHz 2007-era AMD Opteron or Intel® Xeon® processor. For example, the standard small instance gives you 1.7GB of memory, 160GB of disk space, and 1 ECU of CPU. At the time of this writing, the biggest machine is the High-Memory Quadruple Extra Large instance, which has 68.4GB of memory, 1.7TB of disk space, and 26 ECUs split across eight virtual cores. Prices range from US$0.085 per hour for the smallest instance to US$2.40 per hour for the biggest.
An Amazon EC2 instance begins life as an Amazon Machine Image (AMI), which is a template you use to build any number of VMs. Amazon publishes some AMIs, and you can make your own and share them with others. Some of these user-created AMIs are available at no cost; some incur an hourly charge on top of the Amazon hourly charge. For example, IBM publishes several paid AMIs that let you pay for licensing on an hourly basis.

When you want to boot a VM, you choose the machine type and an AMI. The AMI is stored in Amazon S3 and copied to the root partition of your VM when you launch the instance. The root partition is always 10GB. The storage space associated with the machine type is called the instance storage or ephemeral storage and is presented to your VM as a separate drive. The storage is called ephemeral because when you shut down your instance, the information is gone forever. You are required to back up your own data periodically to protect against loss. This also means that if the physical host running your instance crashes, your instance is shut down and the ephemeral disk is lost.
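Because the ephemeral disk disappears with the instance, periodic backups are not optional. A minimal sketch of the archiving step is a timestamped tarball that you would then copy off the instance, for example to Amazon S3 with a separate upload tool. The function name and layout here are illustrative, not part of any Amazon tooling:

```shell
# Archive a data directory into a timestamped tarball (sketch).
# The tarball would then be shipped off-instance, e.g. to Amazon S3.
backup_dir() {
    src=$1      # directory to back up
    dest=$2     # directory to write the tarball into
    stamp=$(date +%Y%m%d%H%M%S)
    tar -czf "$dest/backup-$stamp.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
    echo "$dest/backup-$stamp.tar.gz"   # print the path for the caller
}
```

Run from cron, something like `backup_dir /var/lib/pgsql/data /mnt/backups` would give you a restorable snapshot to upload.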
The Amazon Machine Image

All AMIs are assigned an identifier by Amazon, such as ami-0bbd5462. Amazon provides some public AMIs, and other people have made their own AMIs public. You can choose to start with a public AMI and make your own modifications, or you can start from scratch. Any time you make changes to the root file system of an AMI, you can save it as a new AMI, which is called re-bundling.

In this series, you will be starting off with a publicly available CentOS image, though you can choose a different one. It is wise to spend some time looking through any image you use to make sure there are no extra accounts and that the packages are updated. It is also possible to roll your own AMI from scratch, but that is outside the scope of this article.
The Amazon API

All of the functionality necessary to start, stop, and use the Amazon EC2 cloud is available using a web service. Amazon publishes the specifications for the web
services and also provides a set of command-line tools. You should download these tools before proceeding (see Resources). I also encourage you to look at the quick start guide (see Resources) to get your environment set up, which will save you a lot of typing.

You authenticate to the API using security credentials, which are found under the Account link in the Amazon Web Services (AWS) Management Console (see Resources). You will need your X.509 certificate files and your access keys. Keep these safe! Anyone with them could use AWS resources and incur charges on your behalf.
Before you launch your first instance

Before you launch your first instance, you must generate Secure Shell (SSH) keys to authenticate to your new instance and set up the virtual firewall to protect your instance. Listing 1 shows the use of the ec2-add-keypair command to generate an SSH key pair.

Listing 1. Generating an SSH key pair
[sean@sergeant:~]$ ec2-add-keypair main
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAu8cTsq84bHLVhDG3n/fe9FGz0fs0j/FwZiDDovwfpxA/lijaedg6lA7KBzvn
...
-----END RSA PRIVATE KEY-----
[sean@sergeant:~]$ ec2-describe-keypairs
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
The first command tells Amazon to generate a key pair with the name main. The first line of the result gives the hash of the key. The rest of the output is an unencrypted PEM private key. You must store this key somewhere—for example, in ~/.ssh/main.pem. Amazon retains the public portion of the key, which will be made available to the VMs you launch.

The second command, ec2-describe-keypairs, asks Amazon for the current list of key pairs. The result is the name of the key pair, followed by the hash.
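The saved key must be readable only by you, or ssh will refuse to use it. A small helper like the following (illustrative, not part of the Amazon tools) creates the file with owner-only permissions:

```shell
# Write a private key from standard input to a file that only the
# owner can read (ssh rejects keys with looser permissions).
save_private_key() {
    keyfile=$1
    umask 077           # files created below default to 0600
    cat > "$keyfile"
    chmod 600 "$keyfile"
}
```

For example, `ec2-add-keypair main | sed -n '/BEGIN/,/END/p' | save_private_key ~/.ssh/main.pem` trims the KEYPAIR header line and stores just the PEM block.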
Each instance is protected by a virtual firewall that initially allows nothing in. Amazon EC2 calls these firewalls security groups and has API calls and commands to manipulate them. You will look at these more closely when the time comes. In the meantime, Listing 2 shows how to view your current groups.
Listing 2. Displaying the current security groups

[sean@sergeant:~]$ ec2-describe-group
GROUP 223110335193 default default group
Listing 2 shows a group called default with a description of "default group." The user ID associated with the group is 223110335193. There are no rules in this group; if there were, they would be listed below the group with the word PERMISSION in the left column.
Preparing the cloud environment

The first step is to prepare the cloud environment to test the application. The new environment will mimic the current production environment.

Start by launching the AMI, which has an ID of ami-10b55379. Listing 3 shows the AMI being launched and the status being checked.
Listing 3. Launching the CentOS AMI

[sean@sergeant:~]$ ec2-run-instances ami-10b55379 -k main
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 pending main 0 m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 pending main 0 E3D48CEE m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 ec2-184-73-43-141.compute-1.amazonaws.com domU-12-31-39-00-64-71.compute-1.internal running main 0 E3D48CEE m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled 184.73.43.141 10.254.107.127 instance-store
The first command launches the instance using the ami-10b55379 AMI and specifies that the key pair generated in Listing 1 is to be used to authenticate to the machine. The command returns several pieces of information, the most important being the instance identifier (i-75aaf41e), which is the identity of the machine in the Amazon EC2 cloud. The second command, ec2-describe-instances, lists all the running instances. In Listing 3, the instance identifier has been passed on the command line to show information about only that instance.

The state of the instance is listed as pending, which means that the instance is still being started. This AMI is large, so it typically takes 5-10 minutes just to start. Running the same command some time later shows that the state is running and that the external IP address 184.73.43.141 has been assigned. The internal IP address that starts with 10 is useful for talking within the Amazon EC2 cloud, but not now.
You can then use SSH to connect to the server using the key you generated earlier. But first, you must allow SSH (22/TCP) in. Listing 4 shows how to authorize the connection and log in to your new server.
Understanding SSH keys

If you're not familiar with SSH keys, it's helpful to know that SSH can authenticate users with keys instead of passwords. You generate a key pair, which is composed of a public key and a private key. You keep the private key private and upload the public key to a file called authorized_keys inside the $HOME/.ssh directory. When you connect to a server with SSH, the client can try to authenticate with the key. If that succeeds, you're logged in.

One property of the key pair is that messages encrypted with one of the keys can be decrypted only with the other. When you connect to a server, the server can encrypt a message with the public key stored in the authorized_keys file. If you can decrypt the message using your private key, the server knows you are authorized to log in without a password.
The next logical question is, "How is the authorized_keys file filled in with the public key that's stored with Amazon?" Each Amazon EC2 instance can talk to a web server in the Amazon EC2 cloud at http://169.254.169.254 and retrieve metadata about the instance. One of the URLs is http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key, which returns the public key associated with the image.

On startup, the AMI retrieves the public key and stores it in authorized_keys. This is done in /etc/init.d/getssh in the example AMI. It could just as easily happen in rc.local.
Another use for instance metadata is to pass information to the image. You could have one generic AMI that could be either a web server or a background job server and have the instance decide which services to start based on the parameters you pass when starting the image.
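That idea can be sketched as a startup-script function: fetch a role string from the user-data URL, then map it to services. The role names and service lists below are hypothetical, and the curl call is shown only in the comment because it works only from inside an instance:

```shell
# Map a role string to the services that instance should start.
# The role would come from instance user-data, e.g.:
#   ROLE=$(curl -s http://169.254.169.254/latest/user-data)
services_for_role() {
    case "$1" in
        web) echo "nginx mongrel" ;;
        db)  echo "postgresql" ;;
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
}
```

An rc.local script could then loop over `services_for_role "$ROLE"` and call `service $s start` for each name.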
Listing 4. Connecting to the instance

[sean@sergeant:~]$ ec2-authorize default -p 22 -s $MYIP/32
...
[sean@sergeant:~]$ ssh -i ~/.ssh/main.pem root@184.73.43.141
The authenticity of host '184.73.43.141 (184.73.43.141)' can't be established.
RSA key fingerprint is af:c2:1e:93:3c:16:76:6b:c1:be:47:d5:81:82:89:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '184.73.43.141' (RSA) to the list of known hosts.
...
The first command allows port 22 (TCP is the default option) in from a source of your IP address. The /32 means that only that one host is allowed, not the entire network. The ssh command then connects to the server using the private key.
Installing Ruby
CentOS comes with a dated Ruby version, so you will install Ruby Enterprise Edition (REE), a high-performance Ruby interpreter that's compatible with the current 1.8.7 branch of Ruby. Despite the expensive-sounding name, the software is open source. Listing 5 shows how to install REE.
Listing 5. Installing REE

# rpm -e ruby ruby-libs
# yum -y install gcc-c++ zlib-devel openssl-devel readline-devel
...
Complete!
# wget http://rubyforge.org/frs/download.php/71096/ruby-enterprise-1.8.7-2010.02.tar.gz
...
# tar -xzf ruby-enterprise-1.8.7-2010.02.tar.gz
# ruby-enterprise-1.8.7-2010.02/installer -a /opt/ree
The first two commands in Listing 5 remove the default Ruby installation and install a C compiler and a few necessary development packages. wget downloads the current REE tarball, which is then unpacked by tar. Finally, the last command runs the installer with options to accept all defaults and place the results in /opt/ree. The installer is smart enough to tell you the commands you have to run if you're missing some packages, so look closely at the output if the installation isn't working.

After Ruby is installed, add the bin directory to your path with export PATH="/opt/ree/bin:$PATH", which you can place in the system-wide /etc/bashrc or in the .bashrc file in your home directory.
Installing PostgreSQL

The PostgreSQL server is part of the CentOS distribution, so all you need to do is install it with the yum utility. Listing 6 shows how to install PostgreSQL and make sure it starts on boot.

Listing 6. Installing PostgreSQL

# yum -y install postgresql-server postgresql-devel
...
Installed: postgresql-devel.i386 0:8.1.21-1.el5_5.1
postgresql-server.i386 0:8.1.21-1.el5_5.1
Dependency Installed: postgresql.i386 0:8.1.21-1.el5_5.1
postgresql-libs.i386 0:8.1.21-1.el5_5.1
Complete!
# chkconfig postgresql on
The yum command installs packages from a repository. In Listing 6, you are installing the PostgreSQL server component and development libraries. Doing so automatically pulls in the core database utilities and any other packages you need. You will not need the development package yet, but when it comes time to integrate
Rails and PostgreSQL, you will need the libraries inside postgresql-devel.
By default, the database stores its files in /var/lib/pgsql/data, which is part of the root file system. Move this directory to the instance storage on /mnt, as shown in Listing 7.

Listing 7. Moving the PostgreSQL data store to /mnt

# mv /var/lib/pgsql/data /mnt
# ln -s /mnt/data /var/lib/pgsql/data
# service postgresql start

After entering the commands in Listing 7, PostgreSQL is running out of /mnt.
Next, you must enable password logins for the payroll_prod database (which you'll create in the next step). By default, PostgreSQL does not use passwords: It uses an internal identification system. Simply add:

host "payroll_prod" all 127.0.0.1/32 md5

to the top of /var/lib/pgsql/data/pg_hba.conf, and then run:

su - postgres -c 'pg_ctl reload'

to make the change take effect. With this configuration, normal local logins to PostgreSQL don't need a password (which is why the reload command didn't need one), but any access to the payroll database over TCP will.

The final step is to set up the Rails database from the command line. Run su - postgres -c psql, and follow along in Listing 8.
Listing 8. Creating the user and database

postgres=# create user payroll with password 'secret';
CREATE ROLE
postgres=# create database payroll_prod;
CREATE DATABASE
postgres=# grant all privileges on database payroll_prod to payroll;
GRANT

And with that, your database is created.
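The Rails side needs matching connection settings. Assuming standard Rails conventions, the production stanza of config/database.yml would look roughly like this (the credentials are the ones created in Listing 8; treat the rest as a sketch):

```yaml
# config/database.yml (production stanza, sketch)
production:
  adapter: postgresql
  database: payroll_prod
  username: payroll
  password: secret
  host: 127.0.0.1    # TCP connection, so the pg_hba.conf md5 rule applies
  encoding: utf8
```

Using host 127.0.0.1 rather than a UNIX socket is what makes the md5 password rule you added to pg_hba.conf take effect.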
Migrating the data

For testing, you should grab a database dump of your production environment from a certain point in time so that you have something to test with. The SmallPayroll
application stores data both in the database and on the file system. The database will be dumped using the pg_dump command that comes with PostgreSQL; the file system data will be copied with rsync. The database will have to be wiped and re-transferred for the final migration because of the nature of database dumps, but the file system transfer only needs to move new and changed files, because rsync can detect when a file hasn't changed. Thus, the testing part of the plan helps speed up the migration, because most of the data will already be there.
The fastest way to copy the database is to run:

pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz

on your production machine, copy dbbackup.gz to the cloud server, and then run:

zcat dbbackup.gz | psql payroll_prod

This creates a compressed dump of the database on one server and then replays all the transactions on the other server.
rsync is just as simple. From your production server, run:

rsync -avz -e "ssh -i .ssh/main.pem" /var/uploads/ root@174.129.138.83:/var/uploads/

This command copies everything in /var/uploads from the current production server to the new server. If you run it again, only the changed files are copied over, saving you time on later synchronizations.

Because you are copying the database over, you do not have to apply your Rails migrations first. Rails will believe the database is up to date, because you already copied over the schema_migrations table.
Deploying the Rails application

At this point, you have the base server set up but not your application. You must install some basic gems, along with any gems your application requires, before your application will run. Listing 9 shows the commands to update your gems. Note that you must be in the root of your Rails application, so copy it over to your server first.
Listing 9. Updating RubyGems and installing your gems

# gem update --system
Updating RubyGems
Nothing to update
# gem install rails mongrel mongrel-cluster postgres
Successfully installed rails-2.3.8
Building native extensions. This could take a while...
Successfully installed gem_plugin-0.2.3
Successfully installed daemons-1.1.0
Successfully installed cgi_multipart_eof_fix-2.5.0
Successfully installed mongrel-1.1.5
Successfully installed mongrel_cluster-1.0.5
Building native extensions. This could take a while...
Successfully installed postgres-0.7.9.2008.01.28
7 gems installed
...
# rake gems:install
(in /home/payroll)
gem install haml
Successfully installed haml-3.0.12
1 gem installed
Installing ri documentation for haml-3.0.12...
Installing RDoc documentation for haml-3.0.12...
gem install money
...
The first command makes sure that RubyGems itself is up to date. The second<br />
command installs some helpful gems:<br />
• rails. The Ruby on Rails framework<br />
• postgres. The database driver that lets you use PostgreSQL with<br />
ActiveRecord<br />
• mongrel. An application server used to host the Rails application<br />
• mongrel_cluster. Utilities to let you start and stop groups of mongrels<br />
at the same time<br />
The last command runs a Rails task to install all the extra gems that the application<br />
requires. If you didn't use the config.gem directive in your config/environment.rb<br />
file, then you may have to install your extra gems by hand using the gem install<br />
gemname command.<br />
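For reference, config.gem declarations live in config/environment.rb inside the Rails::Initializer block. A sketch using the gems from Listing 9 follows; the version numbers and the :lib option are illustrative, not values from the article:

```ruby
# config/environment.rb (fragment) -- declare the gems the application
# needs so that `rake gems:install` can install them all at once.
# Versions shown here are illustrative.
Rails::Initializer.run do |config|
  config.gem "haml", :version => ">= 3.0.12"
  config.gem "money"
  config.gem "postgres", :lib => "postgres"
end
```

With these declarations in place, a fresh server needs only `rake gems:install` instead of a hand-maintained list of `gem install` commands.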
Try to start your application with the RAILS_ENV=production script/console<br />
command. If this command succeeds, stop it, and then launch your pack of<br />
mongrels with:<br />
mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />
If the first command doesn't succeed, you will get plenty of error messages to help<br />
you find the problem, which is usually a missing gem or file. Take this opportunity to<br />
go back and put in any missing config.gem directives so that you don't forget the<br />
gem in the future.<br />
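The cluster configuration file passed to mongrel_rails above is worth seeing once. A minimal mongrel_cluster.yml consistent with this setup might look like the following; the user, port, and server count here are assumptions that must line up with the nginx upstream shown in Listing 11 (two mongrels on ports 8100 and 8101):

```yaml
# /home/payroll/current/config/mongrel_cluster.yml (illustrative sketch)
# Two mongrels on ports 8100-8101, matching the nginx upstream in Listing 11.
user: payroll
group: payroll
cwd: /home/payroll/current
environment: production
address: 127.0.0.1
port: "8100"
servers: 2
pid_file: tmp/pids/mongrel.pid
```

mongrel_cluster starts `servers` processes on consecutive ports beginning at `port`, so adding a third mongrel means changing one number here and adding one `server` line to nginx.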
Installing a front-end web server<br />
Nginx is the web server of choice for many virtual environments. It has low overhead<br />
and is good at proxying connections to a back-end service like mongrel. Listing 10<br />
shows how to install nginx.<br />
Listing 10. Installing nginx<br />
# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm<br />
...<br />
# yum install nginx<br />
...<br />
Running Transaction<br />
Installing : nginx [1/1]<br />
Installed: nginx.i386 0:0.6.39-4.el5<br />
Complete!<br />
# chkconfig nginx on<br />
Listing 10 installs the Extra Packages for Enterprise Linux® (EPEL) repository, then<br />
installs nginx and makes sure it will come up on startup. Listing 11 shows an nginx<br />
configuration for the Rails application.<br />
Listing 11. An nginx configuration for a Rails application<br />
# Two mongrels, balanced based on least connections<br />
upstream mongrel-payroll {<br />
    fair;<br />
    server 127.0.0.1:8100;<br />
    server 127.0.0.1:8101;<br />
}<br />
server {<br />
    listen 80;<br />
    server_name app.smallpayroll.ca;<br />
    root /home/payroll/current/public;<br />
    gzip_static on;<br />
    access_log /var/log/nginx/app.smallpayroll.ca_log main;<br />
    error_page 404 /404.html;<br />
    location / {<br />
        # Because we're proxying, set some request headers indicating this<br />
        proxy_set_header X-Real-IP $remote_addr;<br />
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;<br />
        proxy_set_header Host $http_host;<br />
        proxy_redirect false;<br />
        proxy_max_temp_file_size 0;<br />
        # Serve static files out of root (eg public)<br />
        if (-f $request_filename) {<br />
            break;<br />
        }<br />
        # Handle page-cached actions by looking for the appropriately named file<br />
        if (-f $request_filename.html) {<br />
            rewrite (.*) $1.html;<br />
            break;<br />
        }<br />
        # Send all other requests to mongrel<br />
        if (!-f $request_filename) {<br />
            proxy_pass http://mongrel-payroll;<br />
            break;<br />
        }<br />
    }<br />
    error_page 500 502 503 504 /500.html;<br />
    location = /500.html {<br />
        root /home/payroll/current/public;<br />
    }<br />
}<br />
Listing 11 shows a fairly typical nginx configuration, with some elements thrown in to<br />
handle Rails page caching and send dynamic requests to an upstream mongrel. You<br />
could map other URLs to file names here, if needed.<br />
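One common example of such a mapping is a check for Capistrano's maintenance page, so that cap deploy:web:disable (used later during the cutover) takes effect without touching nginx. This fragment is an assumption about your deployment layout, not part of the article's Listing 11; it would go inside the location / block:

```nginx
# Serve Capistrano's maintenance page for every request while it exists
# (it is created by cap deploy:web:disable and removed by web:enable).
if (-f $document_root/system/maintenance.html) {
    rewrite ^(.*)$ /system/maintenance.html last;
    break;
}
```

Because the check is a file-existence test, enabling and disabling maintenance mode never requires an nginx reload.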
With the configuration in place, service nginx start starts the web server.<br />
Testing<br />
For testing, it would be helpful to be able to refer to your cloud instance using the<br />
regular domain name of your application, because you want to ensure that you're<br />
using your test site and not the production site. You do this through a local DNS<br />
override. In Windows, edit C:\windows\system32\drivers\etc\hosts; in UNIX, edit<br />
/etc/hosts. Add a line like:<br />
x.x.x.x app.smallpayroll.ca<br />
where x.x.x.x is the IP address of your cloud server and app.smallpayroll.ca is the<br />
name of your application. Restart your browser, and browse to your website. You will<br />
be using the cloud version of your application now. (Don't forget to comment out the<br />
line you just added when you want to go back to the production version!)<br />
At this point, you should be able to test that the cloud version of your application<br />
works just as well as the production version; fix any problems you find. Make careful<br />
note of whatever you find, as you'll want to script it in case you launch a second<br />
server. Because you're using the cloud version of your application, you can delete<br />
and restore your database without any users complaining.<br />
Bundling the new AMI<br />
The last thing to do is re-bundle your AMI. Any time you start a new instance, you<br />
lose everything in /mnt, and your root partition is reset to whatever is in the AMI.<br />
There's nothing you can do yet about the /mnt problem, but re-bundling makes sure<br />
that your AMI is just the way you left it.<br />
If the AMI you are starting from does not have the AMI tools, you can install them<br />
with following command:<br />
rpm -i --nodeps http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.noarch.rpm<br />
Bundling an AMI is a three-step process:<br />
1. Create the image on the instance itself.<br />
2. Upload the image to Amazon S3.<br />
3. Register the AMI.<br />
Before proceeding, shut down your mongrel and PostgreSQL instances, just to<br />
make sure any open files are handled correctly. You must also copy your X.509<br />
keys, found in the Amazon Console, to /mnt on your server. Listing 12 shows the<br />
first two steps of bundling, which are done on the VM itself.<br />
Listing 12. Bundling the AMI<br />
# ec2-bundle-vol -d /mnt -e /mnt --privatekey /mnt/pk-mykey.pem \<br />
--cert /mnt/cert-mycert.pem --user 223110335193 -p centos-ertw<br />
Please specify a value for arch [i386]:<br />
Copying / into the image file /mnt/centos-ertw...<br />
...<br />
Generating digests for each part...<br />
Digests generated.<br />
Creating bundle manifest...<br />
ec2-bundle-vol complete.<br />
# ec2-upload-bundle -b ertw.com -m /mnt/centos-ertw.manifest.xml \<br />
--secret-key MYSECRETKEY --access-key MYACCESSKEY<br />
Creating bucket...<br />
Uploading bundled image parts to the S3 bucket ertw.com ...<br />
...<br />
Uploaded centos-ertw.part.37<br />
Uploading manifest ...<br />
Uploaded manifest.<br />
Bundle upload completed.<br />
The first command generates the bundle, specifying that /mnt is to be ignored and<br />
that the bundle will go in /mnt (the -e and -d options, respectively). The<br />
--privatekey, --cert, and --user options point to your security credentials and AWS user ID,<br />
which are all found in the account settings of your AWS Management Console. The<br />
last option, -p, lets you name this AMI to differentiate it from others.<br />
The first command will run for about 10 minutes, depending on how full your root<br />
partition is. The second command uploads the bundle to Amazon S3. The -b option<br />
specifies a bucket name, which will be created if it doesn't exist already. The -m<br />
option points to the manifest file created in the last step. The last two options are<br />
your Amazon S3 credentials, which are found right next to your X.509 credentials in<br />
the AWS Management Console. Just remember that X.509 credentials are used for<br />
Amazon EC2 operations, while Amazon S3 uses text keys.<br />
Finally, run the command:<br />
ec2-register ertw.com/centos-ertw.manifest.xml<br />
to register the AMI, and you will see the AMI identifier to use from now on. Note that<br />
the ec2-register command is not distributed with the AMI, so it's easiest to run it<br />
from the server where you started the original AMI. You could also install the<br />
Amazon EC2 tools on your Amazon EC2 instance.<br />
Performing the migration<br />
Now that you've got your cloud environment running, the migration itself should be<br />
rather simple. You've verified that everything works: All that remains is to<br />
resynchronize the data and cut over in an orderly fashion.<br />
Premigration tasks<br />
Some time before the migration, make sure you lower the TTL of your domain name<br />
records to 5 minutes. You should also develop a checklist of the steps you will take<br />
to move everything over, the tests you want to run to verify that everything is<br />
working, and the procedure to back out of the change, if necessary.<br />
Make sure your users are notified of the migration!<br />
Just before your migration time, take another look at your cloud environment to<br />
make sure it is ready to be synchronized and accept production traffic.<br />
Migrating the application<br />
To migrate the application, perform the following steps:<br />
1. Disable the current production site or put it in read-only mode, depending<br />
on the nature of the site.<br />
Because most of SmallPayroll's requests involve writing to the database<br />
or file system, the site will be disabled. The Capistrano deployment gem<br />
includes a task, cap deploy:web:disable, that puts a maintenance<br />
page on the site informing users that the site is down for maintenance.<br />
2. Stop the application services in the cloud environment in preparation for<br />
the data migration by killing your mongrel processes.<br />
3. Copy your database over the same way you did for testing.<br />
4. Re-run rsync, if necessary.<br />
5. Restart the application servers with the command:<br />
mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />
6. Make sure your hosts file is pointing to the cloud environment, and<br />
perform some smoke tests. Make sure users can log in and browse the<br />
site.<br />
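The smoke test in step 6 can itself be scripted. Below is a minimal sketch using Ruby's standard Net::HTTP; the hostname is this article's example, and a real check would also exercise login and a few key pages rather than just the front page:

```ruby
require 'net/http'

# Minimal smoke test: fetch a page and report whether the response was a
# success (2xx). Relies on the hosts-file override directing the name to
# the cloud server. Any connection error counts as a failure.
def smoke_test(host, path = "/", port = 80)
  Net::HTTP.get_response(host, path, port).is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# Example: smoke_test("app.smallpayroll.ca")
```

Running this before and after the DNS change gives you a quick, repeatable pass/fail instead of clicking around by hand.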
Updating DNS<br />
If your smoke tests pass, then you can change your DNS records to point to your<br />
cloud environment. At this point, I find it helpful to keep a tail -f running on the<br />
web server's log file to watch for people coming in to the site.<br />
Chances are that your local DNS server still has the old information cached for the<br />
next 5 minutes. You can verify this with the dig command, as shown in Listing 13.<br />
Listing 13. Verifying the DNS server is caching the query<br />
# dig app.smallpayroll.ca @172.16.0.23<br />
; <<>> DiG 9.3.4 <<>> app.smallpayroll.ca @172.16.0.23<br />
; (1 server found)<br />
;; global options: printcmd<br />
;; Got answer:<br />
;; ->>HEADER<<- ...<br />
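Rather than re-running dig by hand until the cache expires, you can poll until the resolver hands back the new address. A small Ruby sketch follows; the commented hostname and address are this article's example values:

```ruby
require 'resolv'

# True once the local resolver returns the new server's address for the
# name. Resolv consults the hosts file and then DNS, so it sees the same
# answer your browser would.
def cutover_complete?(hostname, new_ip)
  Resolv.getaddresses(hostname).include?(new_ip)
rescue Resolv::ResolvError
  false
end

# Example: poll once a minute until the new record is visible.
# until cutover_complete?("app.smallpayroll.ca", "174.129.138.83")
#   sleep 60
# end
```

Once this returns true from the networks you care about, traffic is flowing to the cloud server and the old one can be retired.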
1. Set up the new environment.<br />
2. Test with a copy of production data.<br />
3. Turn off the old environment.<br />
4. Copy production data over to the new environment.<br />
5. Change DNS to point to the new environment.<br />
Despite now being "in the cloud," the application is probably worse off than it was<br />
before. Consider the following points:<br />
• The application is still running on one server.<br />
• If the server crashes, all the data is lost.<br />
• You have less control over performance than you do on a physical server.<br />
• The machine and application are not locked down.<br />
In the next article, you'll learn how to overcome these problems and start building a<br />
more robust environment for your application.<br />
Resources<br />
Learn<br />
• In the Cloud Computing zone on developerWorks, get the resources you need<br />
to develop and deploy applications in the cloud and keep on top of recent cloud<br />
developments.<br />
• Request instance metadata from your Amazon EC2 instance to get information<br />
about the instance, from the SSH keys it should use to user-specified<br />
information.<br />
• Start your cloud adventure by looking around the AWS Management Console.<br />
• If you're going to work with Amazon EC2, you should familiarize yourself with<br />
the various guides that Amazon provides.<br />
• Learn about the IBM AMIs from Amazon's perspective and from IBM's<br />
perspective.<br />
• In the developerWorks Linux zone, find hundreds of how-to articles and<br />
tutorials, as well as downloads, discussion forums, and a wealth of other<br />
resources for Linux developers and administrators.<br />
• Stay current with developerWorks technical events and webcasts focused on a<br />
variety of IBM products and IT industry topics.<br />
• Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM<br />
products and tools, as well as IT industry trends.<br />
• Watch developerWorks on-demand demos ranging from product installation and<br />
setup demos for beginners, to advanced functionality for experienced<br />
developers.<br />
• Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on<br />
developerWorks.<br />
Get products and technologies<br />
• Ruby Enterprise Edition is a high-performance Ruby implementation that can be<br />
used by itself or along with Phusion Passenger to integrate with Apache or<br />
nginx. Either way, you get access to faster memory management and improved<br />
garbage collection.<br />
• Sign up for the IBM Industry Application Platform AMI for Development Use to<br />
get started with various IBM products in the cloud. Remember that you have to<br />
go through a checkout process, but you're not going to be charged anything<br />
until you use it. You can also use ami-90ed0ff9.<br />
• The Amazon EC2 API tools are used to communicate with the Amazon API to<br />
launch and terminate instances and re-bundle new ones. These tools are<br />
periodically updated as new features are introduced to Amazon EC2, so it's<br />
worth checking back after product announcements for updates to this page. You<br />
will need at least the 2009-05-15 update, because you'll be using some of the<br />
load-balancing features later.<br />
• Evaluate IBM products in the way that suits you best: Download a product trial,<br />
try a product online, use a product in a cloud environment, or spend a few hours<br />
in the SOA Sandbox learning how to implement Service Oriented Architecture<br />
efficiently.<br />
Discuss<br />
• Get involved in the My developerWorks community. Connect with other<br />
developerWorks users while exploring the developer-driven blogs, forums,<br />
groups, and wikis.<br />
About the author<br />
Sean Walberg<br />
Sean Walberg is a network engineer and the author of two books on<br />
networking. He has worked in various industries, including health care<br />
and media.<br />