Migrate your Linux application to the Amazon cloud, Part 1: Initial migration

How to migrate your application into the cloud

Skill Level: Intermediate

Sean Walberg (sean@ertw.com)
Network Engineer

13 Jul 2010
Cloud computing and Infrastructure as a Service (IaaS) are well documented, but what's often not discussed is how to get a running application into a cloud environment. Discover how to move an application into the cloud and take advantage of the features this setup has to offer.
Infrastructure as a Service (IaaS) is a great concept: You use computing resources; you pay for them. You want more computing power; you pay more. The downside of this model is that you're working with computers that you'll never see or know much about. Once you get over that, however, there's a lot to be gained by using IaaS.

Because the IaaS model is so different from the traditional model of buying servers, the way you manage your virtual computers changes. The way you run your application in the cloud also changes. Things you once took for granted, such as negligible latency between servers, are no longer givens.

This series of articles follows the migration of a web application from a single physical server to Amazon Elastic Compute Cloud (Amazon EC2). Along the way, you learn how to adapt your application to the cloud environment and how to take advantage of the features that the cloud has to offer. To start, you see a straight migration from one physical server to a cloud server.
Initial migration
Trademarks
© Copyright IBM Corporation 2010. All rights reserved.
developerWorks® ibm.com/developerWorks
Working with Amazon EC2

Amazon EC2 lets anyone with a credit card pay for servers by the hour, turning them on and off through an application programming interface (API). You have a variety of server types to choose from—depending on whether memory, disk, or CPU power is your primary concern—along with a suite of add-ons from persistent disks to load balancers. You pay only for what you use.

Alongside the Amazon EC2 offering are others that give you, among other things, payment processing, databases, and message queuing. In this article series, you will be using Amazon Simple Storage Service (Amazon S3), which gives you access to disk space on a pay-per-use basis.
The example application

The web application that this series uses for examples is a payroll service called SmallPayroll.ca, written with the Ruby on Rails framework and a PostgreSQL back end. It is typical of many web applications: It has a database tier, an application tier, and a set of static files like cascading style sheet (CSS) and JavaScript files. Users navigate various forms to input and manipulate data, and they generate reports.

The various components in use are:

• Nginx. The front-end web server for static files and load balancer to the middle tier.
• Mongrel. The application server itself.
• Ruby. The language the application is written in.
• Gems. Third-party plug-ins and libraries for everything from database encryption to application-level monitoring.
• PostgreSQL. The Structured Query Language (SQL) database engine.

Use of the site has exceeded the capacity of the single server that now houses it. Therefore, a migration to a new environment is in order, and this is a prime opportunity to move to the cloud.
Desired improvements

But simply moving from one server to a small number of cloud-based servers wouldn't take advantage of what can be done in the cloud, nor would it make for exciting reading. So, during the move, you'll make improvements, some of which are only possible in a cloud environment:
• Increased reliability. Because you can choose the size of server to run in the cloud, you can run multiple, smaller servers for redundancy.
• Capacity for both scale-up and scale-down. Servers are incrementally added to the pool as the service grows. However, the number of servers can also be increased to accommodate short-term spikes in traffic or decreased during periodic lulls.
• Cloud storage. Backups of the application data will be made to Amazon S3, eliminating the need for tape storage.
• Automation. Everything in the Amazon environment—from the servers to the storage to the load balancers—can be automated. Less time managing an application means more time for other, more productive things.

You'll make these improvements incrementally throughout this article series.
Testing and migration strategies

When deploying an application for the first time, you generally have the luxury of being able to test and tweak without the burden of production traffic. In contrast, when migrating an application, you have the challenge of users who are placing a load on the site. Once the new environment takes production traffic, the users will be expecting everything to work properly.

A migration does not necessarily mean zero downtime. It's much easier if you can take the service offline for a period of time. You can use this outage window to perform final data synchronizations and allow for any network changes to stabilize. The window should not be used to do the initial deploy to the new environment—that is, the new environment should be in an operational state before the application migration starts. With this in mind, the key points are synchronization of data between the environments and network changes.

As you plan your migration strategy, it helps to begin with a walk-through of your current environment. Answer the following questions:

• What software do I use on my servers to run the application?
• What software do I use on my servers to manage and monitor the application and server resources?
• Where is all the user data kept? In databases? In files?
• Are static assets, like images, CSS, and JavaScript files, stored somewhere else?
• What touchpoints into other systems does the application need?
• Have I backed everything up recently?
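A small script run on the current server is a convenient way to start collecting answers. The sketch below uses only portable commands; the comments name the additional checks a real walk-through would add, which depend on your stack:

```shell
# Minimal walk-through helper (sketch): gathers a few facts about the
# current server as a starting point for answering the questions above.
inventory() {
    echo "== kernel =="
    uname -sr
    echo "== mounted filesystems =="
    df -P | awk 'NR > 1 { print $6 }'
    # A real walk-through would also capture installed packages
    # (rpm -qa), running processes (ps aux), listening ports
    # (netstat -tlnp), and scheduled jobs (crontab -l).
}
```

Redirect the output to a file (for example, `inventory > walkthrough.txt`) and keep it with your migration plan.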
Notifying users

In general, notifying your users is a good thing, even if you don't anticipate any downtime. In the case of the SmallPayroll.ca application, users tend to use the site at a consistent interval, corresponding with their two-week payroll cycle. Therefore, two weeks' notice would be a reasonable period. Sites like Google AdWords, the administrative interface for Google's advertising platform, give about a week's notice. If your website is more of a news site, where users would not be as disrupted if it were down for an hour, you may choose to give notification on the day of the outage.
The form of notification also varies depending on the nature of your site and how you currently communicate with your users. For SmallPayroll.ca, a prominent message when the user logs in will be enough. For example, a message like "The system will be unavailable between 12:01 a.m. and 1 a.m. Eastern time, June 24, 2010. Everything entered prior to this will still be saved. For more information, click here." This message provides the three key pieces of information that users need to know:

• When the outage will happen, including the time zone
• Reassurance that their data will be safe
• A pointer to further information

If possible, avoid using 12:00 a.m. or 12:00 p.m., as well as the term midnight. These tend to confuse people: many are not sure whether midnight on June 17 refers to early morning (12:01 a.m.) or very late at night (11:59 p.m.). Similarly, many are not sure whether noon means 12 a.m. or 12 p.m. It's much easier to add a minute and make the time unambiguous.
Your details may be different, especially if you anticipate partial functionality during the outage. If you decide that you are going to put the notice up only during the outage (such as for a news site), the same information will still be helpful. My favorite site outage screen was along the lines of "The site is down for maintenance; back up around 3 p.m. EST. Play this game of Asteroids while you're waiting!"

Don't neglect your internal users, either. If you have account representatives, you will want to give them notice in case their clients ask any questions.
DNS considerations

The domain name system (DNS) takes care of translating a name like www.example.com into an IP address like 192.0.32.10. Your computer connects to
IP addresses, so this translation is important. When migrating from one environment to another, you are almost guaranteed to be using a different IP address (the exception would be if you're staying in the same physical building).

Computers cache the name-to-IP mapping for a certain period of time, known as the time to live (TTL), to reduce overall response time. When you make the switch from one environment to another—and therefore from one IP address to another—people who have the DNS entry cached will continue to try to use the old environment. The DNS entry for the application and its associated TTL must be managed carefully.

TTLs are typically between one hour and one day. In preparation for a migration, though, you want the TTL to be something short, such as 5 minutes. This change must be made at least one TTL period before you intend to change the address, because computers get the TTL along with the name-to-IP mapping. For example, if the TTL for www.example.com were set to 86,400 seconds (one day), you would need to lower the TTL to 5 minutes at least one day before the migration.
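In a BIND-style zone file, the change is a single field on the record. The fragment below is a hypothetical example; the name and address are from the text above, not from a real zone:

```
; before the migration window (was 86400 = one day):
www     300     IN      A       192.0.32.10     ; TTL lowered to 5 minutes
```

After the migration has settled, raise the TTL back to its normal value so resolvers stop hammering your name servers.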
Decoupling the old and new environments

It is essential that you fully test your new environment before migrating. All testing should happen in isolation from the production environment, preferably with a snapshot of production data so you can better exercise the new environment.

Performing a full test with a snapshot of production data serves two purposes. The first is that you are more likely to spot errors if you are using real-world data, because it is more unpredictable than the test data used during development. Real-world data may refer to files that you forgot to copy over or may require configurations that were forgotten during your walk-through.

The second reason to use production data is that you can practice your migration at the same time as you load data. You should be able to prove most aspects of your migration plan, except for the actual switch of environments.
Even though you will be mocking up your new environment as if it were production, only one environment can be associated with the host name of the application. The easiest way to get around this restriction is to add a DNS override in your hosts file. In UNIX®, this file resides at /etc/hosts; in Windows®, it resides at C:\windows\system32\drivers\etc\hosts. Simply follow the format of the existing lines, and add an entry pointing your application's host name to its future IP address. Don't forget to do the same for any image servers or anything else that you will be moving. You will probably have to restart your browser, but after that, you will be able to enter your production URL and be taken to your new environment instead.
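For example, assuming the new environment will answer on 184.73.43.141, the override is a single line. The host names here are illustrative:

```
# /etc/hosts override for testing the new environment
184.73.43.141   www.smallpayroll.ca smallpayroll.ca
```

Remove the line once the real DNS change is made, or you will keep testing against whatever that address points to.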
An Amazon EC2 primer
The Amazon EC2 service allows you to pay for a virtual machine (VM) by the hour. Amazon offers several different types of machines and classifies them by their CPU, memory, and disk profiles. Amazon measures memory and disk in terms of gigabytes and CPU in terms of EC2 Compute Units (ECUs), where 1 ECU is roughly equivalent to a 1.0-1.2GHz 2007-era AMD Opteron or Intel® Xeon® processor. For example, the standard small instance gives you 1.7GB of memory, 160GB of disk space, and 1 ECU of CPU. At the time of this writing, the biggest machine is the High-Memory Quadruple Extra Large instance, which has 68.4GB of memory, 1.7TB of disk space, and 26 ECUs split across eight virtual cores. Prices range from US$0.085 per hour for the smallest instance to US$2.40 per hour for the biggest.
An Amazon EC2 instance begins life as an Amazon Machine Image (AMI), which is a template you use to build any number of VMs. Amazon publishes some AMIs, and you can make your own and share them with others. Some of these user-created AMIs are available at no cost; some incur an hourly charge on top of the Amazon hourly charge. For example, IBM publishes several paid AMIs that let you pay for licensing on an hourly basis.

When you want to boot a VM, you choose the machine type and an AMI. The AMI is stored in Amazon S3 and copied to the root partition of your VM when you launch the instance. The root partition is always 10GB. The storage space associated with the machine type is called the instance storage or ephemeral storage and is presented to your VM as a separate drive. The storage is called ephemeral because when you shut down your instance, the information is gone forever. You are required to back up your own data periodically to protect against loss. This also means that if the physical host running your instance crashes, your instance is shut down and the ephemeral disk is lost.
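Because the ephemeral disk disappears with the instance, periodic backups are not optional. A minimal sketch of the archiving step is a timestamped tarball that you would then copy off the instance, for example to Amazon S3 with a separate upload tool. The function name and layout here are illustrative, not part of any Amazon tooling:

```shell
# Archive a data directory into a timestamped tarball (sketch).
# The tarball would then be shipped off-instance, e.g. to Amazon S3.
backup_dir() {
    src=$1      # directory to back up
    dest=$2     # directory to write the tarball into
    stamp=$(date +%Y%m%d%H%M%S)
    tar -czf "$dest/backup-$stamp.tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
    echo "$dest/backup-$stamp.tar.gz"   # print the path for the caller
}
```

Run from cron, something like `backup_dir /var/lib/pgsql/data /mnt/backups` would give you a restorable snapshot to upload.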
The Amazon Machine Image

All AMIs are assigned an identifier by Amazon, such as ami-0bbd5462. Amazon provides some public AMIs, and other people have made their own AMIs public. You can choose to start with a public AMI and make your own modifications, or you can start from scratch. Any time you make changes to the root file system of an AMI, you can save it as a new AMI, which is called re-bundling.

In this series, you will be starting off with a publicly available CentOS image, though you can choose a different one. It is wise to spend some time looking through any image you use to make sure there are no extra accounts and that the packages are updated. It is also possible to roll your own AMI from scratch, but that is outside the scope of this article.
The Amazon API

All of the functionality necessary to start, stop, and use the Amazon EC2 cloud is available using a web service. Amazon publishes the specifications for the web
services and also provides a set of command-line tools. You should download these tools before proceeding (see Resources). I also encourage you to look at the quick start guide (see Resources) to get your environment set up, which will save you a lot of typing.

You authenticate to the API using security credentials, which are found under the Account link in the Amazon Web Services (AWS) Management Console (see Resources). You will need your X.509 certificate files and your access keys. Keep these safe! Anyone with them could use AWS resources and incur charges on your behalf.
Before you launch your first instance

Before you launch your first instance, you must generate Secure Shell (SSH) keys to authenticate to your new instance and set up the virtual firewall to protect your instance. Listing 1 shows the use of the ec2-add-keypair command to generate an SSH key pair.

Listing 1. Generating an SSH key pair
[sean@sergeant:~]$ ec2-add-keypair main
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
-----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAu8cTsq84bHLVhDG3n/fe9FGz0fs0j/FwZiDDovwfpxA/lijaedg6lA7KBzvn
...
-----END RSA PRIVATE KEY-----
[sean@sergeant:~]$ ec2-describe-keypairs
KEYPAIR main 40:88:59:b1:c5:bc:05:a1:5e:7c:61:23:5f:bc:dd:fe:75:f0:48:01
The first command tells Amazon to generate a key pair with the name main. The first line of the result gives the hash of the key. The rest of the output is an unencrypted PEM private key. You must store this key somewhere—for example, in ~/.ssh/main.pem. Amazon retains the public portion of the key, which will be made available to the VMs you launch.

The second command, ec2-describe-keypairs, asks Amazon for the current list of key pairs. The result is the name of the key pair, followed by the hash.
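The saved key must be readable only by you, or ssh will refuse to use it. A small helper like the following (illustrative, not part of the Amazon tools) creates the file with owner-only permissions:

```shell
# Write a private key from standard input to a file that only the
# owner can read (ssh rejects keys with looser permissions).
save_private_key() {
    keyfile=$1
    umask 077           # files created below default to 0600
    cat > "$keyfile"
    chmod 600 "$keyfile"
}
```

For example, `ec2-add-keypair main | sed -n '/BEGIN/,/END/p' | save_private_key ~/.ssh/main.pem` trims the KEYPAIR header line and stores just the PEM block.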
Each instance is protected by a virtual firewall that initially allows nothing in. Amazon EC2 calls these firewalls security groups and has API calls and commands to manipulate them. You will look at these more closely when the time comes. In the meantime, Listing 2 shows how to view your current groups.
Listing 2. Displaying the current security groups

[sean@sergeant:~]$ ec2-describe-group
GROUP 223110335193 default default group
Listing 2 shows a group called default with a description of "default group." The user ID associated with the group is 223110335193. There are no rules in this group; if there were, they would be listed below the group with the word PERMISSION in the left column.
Preparing the cloud environment

The first step is to prepare the cloud environment to test the application. The new environment will mimic the current production environment.

Start by launching the AMI, which has an ID of ami-10b55379. Listing 3 shows the AMI being launched and the status being checked.
Listing 3. Launching the CentOS AMI

[sean@sergeant:~]$ ec2-run-instances ami-10b55379 -k main
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 pending main 0 m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 pending main 0 E3D48CEE m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled instance-store
[sean@sergeant:~]$ ec2-describe-instances i-75aaf41e
RESERVATION r-750fff1e 223110335193 default
INSTANCE i-75aaf41e ami-10b55379 ec2-184-73-43-141.compute-1.amazonaws.com domU-12-31-39-00-64-71.compute-1.internal running main 0 E3D48CEE m1.small 2010-05-15T02:02:57+0000 us-east-1a aki-3038da59 ari-3238da5b monitoring-disabled 184.73.43.141 10.254.107.127 instance-store
The first command launches the instance using the ami-10b55379 AMI and specifies that the key pair generated in Listing 1 is to be used to authenticate to the machine. The command returns several pieces of information, the most important being the instance identifier (i-75aaf41e), which is the identity of the machine in the Amazon EC2 cloud. The second command, ec2-describe-instances, lists all the running instances. In Listing 3, the instance identifier has been passed on the command line to show information about only that instance.

The state of the instance is listed as pending, which means that the instance is still being started. This AMI is large, so it typically takes 5-10 minutes just to start. Running the same command some time later shows that the state is running and that the external IP address 184.73.43.141 has been assigned. The internal IP address that starts with 10 is useful for talking within the Amazon EC2 cloud, but not now.
You can then use SSH to connect to the server using the key you generated earlier. But first, you must allow SSH (22/TCP) in. Listing 4 shows how to authorize the connection and log in to your new server.
Understanding SSH keys

If you're not familiar with SSH keys, it's helpful to know that SSH can authenticate users with keys instead of passwords. You generate a key pair, which is composed of a public key and a private key. You keep the private key private and upload the public key to a file called authorized_keys inside the $HOME/.ssh directory. When you connect to a server with SSH, the client can try to authenticate with the key. If that succeeds, you're logged in.

One property of the key pair is that messages encrypted with one of the keys can be decrypted only with the other. When you connect to a server, the server can encrypt a message with the public key stored in the authorized_keys file. If you can decrypt the message using your private key, the server knows you are authorized to log in without a password.
The next logical question is, "How is the authorized_keys file filled in with the public key that's stored with Amazon?" Each Amazon EC2 instance can talk to a web server in the Amazon EC2 cloud at http://169.254.169.254 and retrieve metadata about the instance. One of the URLs is http://169.254.169.254/latest/meta-data/public-keys/0/openssh-key, which returns the public key associated with the image.

On startup, the AMI retrieves the public key and stores it in authorized_keys. This is done in /etc/init.d/getssh in the example AMI. It could just as easily happen in rc.local.
Another use for instance metadata is to pass information to the image. You could have one generic AMI that could be either a web server or a background job server and have the instance decide which services to start based on the parameters you pass when starting the image.
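That idea can be sketched as a startup-script function: fetch a role string from the user-data URL, then map it to services. The role names and service lists below are hypothetical, and the curl call is shown only in the comment because it works only from inside an instance:

```shell
# Map a role string to the services that instance should start.
# The role would come from instance user-data, e.g.:
#   ROLE=$(curl -s http://169.254.169.254/latest/user-data)
services_for_role() {
    case "$1" in
        web) echo "nginx mongrel" ;;
        db)  echo "postgresql" ;;
        *)   echo "unknown role: $1" >&2; return 1 ;;
    esac
}
```

An rc.local script could then loop over `services_for_role "$ROLE"` and call `service $s start` for each name.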
Listing 4. Connecting to the instance

[sean@sergeant:~]$ ec2-authorize default -p 22 -s $MYIP/32
...
[sean@sergeant:~]$ ssh -i ~/.ssh/main.pem root@184.73.43.141
The authenticity of host '184.73.43.141 (184.73.43.141)' can't be established.
RSA key fingerprint is af:c2:1e:93:3c:16:76:6b:c1:be:47:d5:81:82:89:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '184.73.43.141' (RSA) to the list of known hosts.
...
The first command allows port 22 (TCP is the default option) in from a source of your IP address. The /32 means that only that one host is allowed, not the entire network. The ssh command then connects to the server using the private key.
Installing Ruby
CentOS comes with a dated Ruby version, so you will install Ruby Enterprise Edition (REE), a high-performance Ruby interpreter that's compatible with the current 1.8.7 branch of Ruby. Despite the expensive-sounding name, the software is open source. Listing 5 shows how to install REE.
Listing 5. Installing REE

# rpm -e ruby ruby-libs
# yum -y install gcc-c++ zlib-devel openssl-devel readline-devel
...
Complete!
# wget http://rubyforge.org/frs/download.php/71096/ruby-enterprise-1.8.7-2010.02.tar.gz
...
# tar -xzf ruby-enterprise-1.8.7-2010.02.tar.gz
# ruby-enterprise-1.8.7-2010.02/installer -a /opt/ree
The first two commands in Listing 5 remove the default Ruby installation and install a C compiler and a few necessary development packages. wget downloads the current REE tarball, which is then unpacked by tar. Finally, the last command runs the installer with options to accept all defaults and place the results in /opt/ree. The installer is smart enough to tell you the commands you have to run if you're missing some packages, so look closely at the output if the installation isn't working.

After Ruby is installed, add the bin directory to your path with export PATH="/opt/ree/bin:$PATH", which you can place in the system-wide /etc/bashrc or in the .bashrc file in your home directory.
Installing PostgreSQL

The PostgreSQL server is part of the CentOS distribution, so all you need to do is install it with the yum utility. Listing 6 shows how to install PostgreSQL and make sure it starts on boot.

Listing 6. Installing PostgreSQL

# yum -y install postgresql-server postgresql-devel
...
Installed: postgresql-devel.i386 0:8.1.21-1.el5_5.1
postgresql-server.i386 0:8.1.21-1.el5_5.1
Dependency Installed: postgresql.i386 0:8.1.21-1.el5_5.1
postgresql-libs.i386 0:8.1.21-1.el5_5.1
Complete!
# chkconfig postgresql on
The yum command installs packages from a repository. In Listing 6, you are installing the PostgreSQL server component and development libraries. Doing so automatically pulls in the core database utilities and any other packages you need. You will not need the development package yet, but when it comes time to integrate
Rails and PostgreSQL, you will need the libraries inside postgresql-devel.
By default, the database stores its files in /var/lib/pgsql/data, which is part of the root file system. Move this directory to the instance storage on /mnt, as shown in Listing 7.

Listing 7. Moving the PostgreSQL data store to /mnt

# mv /var/lib/pgsql/data /mnt
# ln -s /mnt/data /var/lib/pgsql/data
# service postgresql start

After entering the commands in Listing 7, PostgreSQL is running out of /mnt.
Next, you must enable password logins for the payroll_prod database (which you'll create in the next step). By default, PostgreSQL does not use passwords: It uses an internal identification system. Simply add:

host "payroll_prod" all 127.0.0.1/32 md5

to the top of /var/lib/pgsql/data/pg_hba.conf, and then run:

su - postgres -c 'pg_ctl reload'

to make the change take effect. With this configuration, normal local logins to PostgreSQL don't need a password (which is why the reload command didn't need one), but any access to the payroll database over TCP will.

The final step is to set up the Rails database from the command line. Run su - postgres -c psql, and follow along in Listing 8.
Listing 8. Creating the user and database

postgres=# create user payroll with password 'secret';
CREATE ROLE
postgres=# create database payroll_prod;
CREATE DATABASE
postgres=# grant all privileges on database payroll_prod to payroll;
GRANT

And with that, your database is created.
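The Rails side needs matching connection settings. Assuming standard Rails conventions, the production stanza of config/database.yml would look roughly like this (the credentials are the ones created in Listing 8; treat the rest as a sketch):

```yaml
# config/database.yml (production stanza, sketch)
production:
  adapter: postgresql
  database: payroll_prod
  username: payroll
  password: secret
  host: 127.0.0.1    # TCP connection, so the pg_hba.conf md5 rule applies
  encoding: utf8
```

Using host 127.0.0.1 rather than a UNIX socket is what makes the md5 password rule you added to pg_hba.conf take effect.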
Migrating the data

For testing, you should grab a database dump of your production environment from a certain point in time so that you have something to test with. The SmallPayroll
application stores data both in the database and on the file system. The database will be dumped using the pg_dump command that comes with PostgreSQL; the file system data will be copied with rsync. The database will have to be wiped and re-transferred for the final migration because of the nature of database dumps, but the file system transfer only needs to move new and changed files, because rsync can detect when a file hasn't changed. Thus, the testing part of the plan helps speed up the migration, because most of the data will already be there.
The fastest way to copy the database is to run:

pg_dump payroll_prod | gzip -c > /tmp/dbbackup.gz

on your production machine, copy dbbackup.gz to the cloud server, and then run:

zcat dbbackup.gz | psql payroll_prod

This creates a compressed dump of the database on one server and then replays all the transactions on the other server.
rsync is just as simple. From your production server, run:

rsync -avz -e "ssh -i .ssh/main.pem" /var/uploads/ root@174.129.138.83:/var/uploads/

This command copies everything in /var/uploads from the current production server to the new server. If you run it again, only the changed files are copied over, saving you time on later synchronizations.

Because you are copying the database over, you do not have to apply your Rails migrations first. Rails will believe the database is up to date, because you already copied over the schema_migrations table.
Deploying the Rails application

At this point, you have the base server set up but not your application. You must install some basic gems, along with any gems your application requires, before your application will run. Listing 9 shows the commands to update your gems. Note that you must be in the root of your Rails application, so copy it over to your server first.
Listing 9. Updating RubyGems and installing your gems

# gem update --system
Updating RubyGems
Nothing to update
# gem install rails mongrel mongrel-cluster postgres
Successfully installed rails-2.3.8
Building native extensions. This could take a while...
Successfully installed gem_plugin-0.2.3
Successfully installed daemons-1.1.0
Successfully installed cgi_multipart_eof_fix-2.5.0
Successfully installed mongrel-1.1.5
Successfully installed mongrel_cluster-1.0.5
Building native extensions. This could take a while...
Successfully installed postgres-0.7.9.2008.01.28
7 gems installed
...
# rake gems:install
(in /home/payroll)
gem install haml
Successfully installed haml-3.0.12
1 gem installed
Installing ri documentation for haml-3.0.12...
Installing RDoc documentation for haml-3.0.12...
gem install money
...
The first command makes sure that RubyGems itself is up to date. The second<br />
command installs some helpful gems:<br />
• rails. The Ruby on Rails framework<br />
• postgres. The database driver that lets you use PostgreSQL with<br />
ActiveRecord<br />
• mongrel. An application server used to host the Rails application<br />
• mongrel_cluster. Utilities to let you start and stop groups of mongrels<br />
at the same time<br />
The last command runs a Rails task to install all the extra gems that the application<br />
requires. If you didn't use the config.gem directive in your config/environment.rb<br />
file, then you may have to install your extra gems by hand using the gem install<br />
gemname command.<br />
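For reference, config.gem declarations live in config/environment.rb inside the Rails::Initializer block. A sketch using the gems from Listing 9 follows; the version numbers and the :lib option are illustrative, not values from the article:

```ruby
# config/environment.rb (fragment) -- declare the gems the application
# needs so that `rake gems:install` can install them all at once.
# Versions shown here are illustrative.
Rails::Initializer.run do |config|
  config.gem "haml", :version => ">= 3.0.12"
  config.gem "money"
  config.gem "postgres", :lib => "postgres"
end
```

With these declarations in place, a fresh server needs only `rake gems:install` instead of a hand-maintained list of `gem install` commands.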
Try to start your application with the RAILS_ENV=production script/console<br />
command. If this command succeeds, stop it, and then launch your pack of<br />
mongrels with:<br />
mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />
If the first command doesn't succeed, you will get plenty of error messages to help<br />
you find the problem, which is usually a missing gem or file. Take this opportunity to<br />
go back and put in any missing config.gem directives so that you don't forget the<br />
gem in the future.<br />
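The cluster configuration file passed to mongrel_rails above is worth seeing once. A minimal mongrel_cluster.yml consistent with this setup might look like the following; the user, port, and server count here are assumptions that must line up with the nginx upstream shown in Listing 11 (two mongrels on ports 8100 and 8101):

```yaml
# /home/payroll/current/config/mongrel_cluster.yml (illustrative sketch)
# Two mongrels on ports 8100-8101, matching the nginx upstream in Listing 11.
user: payroll
group: payroll
cwd: /home/payroll/current
environment: production
address: 127.0.0.1
port: "8100"
servers: 2
pid_file: tmp/pids/mongrel.pid
```

mongrel_cluster starts `servers` processes on consecutive ports beginning at `port`, so adding a third mongrel means changing one number here and adding one `server` line to nginx.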
Installing a front-end web server<br />
Nginx is the web server of choice for many virtual environments. It has low overhead<br />
and is good at proxying connections to a back-end service like mongrel. Listing 10<br />
shows how to install nginx.<br />
Listing 10. Installing nginx<br />
# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm<br />
...<br />
# yum install nginx<br />
...<br />
Running Transaction<br />
Installing : nginx [1/1]<br />
Installed: nginx.i386 0:0.6.39-4.el5<br />
Complete!<br />
# chkconfig nginx on<br />
Listing 10 installs the Extra Packages for Enterprise Linux® (EPEL) repository, then<br />
installs nginx and makes sure it will come up on startup. Listing 11 shows an nginx<br />
configuration for the Rails application.<br />
Listing 11. An nginx configuration for a Rails application<br />
# Two mongrels, balanced based on least connections<br />
upstream mongrel-payroll {<br />
    fair;<br />
    server 127.0.0.1:8100;<br />
    server 127.0.0.1:8101;<br />
}<br />
server {<br />
    listen 80;<br />
    server_name app.smallpayroll.ca;<br />
    root /home/payroll/current/public;<br />
    gzip_static on;<br />
    access_log /var/log/nginx/app.smallpayroll.ca_log main;<br />
    error_page 404 /404.html;<br />
    location / {<br />
        # Because we're proxying, set some request headers indicating this<br />
        proxy_set_header X-Real-IP $remote_addr;<br />
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;<br />
        proxy_set_header Host $http_host;<br />
        proxy_redirect false;<br />
        proxy_max_temp_file_size 0;<br />
        # Serve static files out of root (eg public)<br />
        if (-f $request_filename) {<br />
            break;<br />
        }<br />
        # Handle page-cached actions by looking for the appropriately named file<br />
        if (-f $request_filename.html) {<br />
            rewrite (.*) $1.html;<br />
            break;<br />
        }<br />
        # Send all other requests to mongrel<br />
        if (!-f $request_filename) {<br />
            proxy_pass http://mongrel-payroll;<br />
            break;<br />
        }<br />
    }<br />
    error_page 500 502 503 504 /500.html;<br />
    location = /500.html {<br />
        root /home/payroll/current/public;<br />
    }<br />
}<br />
Listing 11 shows a fairly typical nginx configuration, with some elements thrown in to<br />
handle Rails page caching and send dynamic requests to an upstream mongrel. You<br />
could map other URLs to file names here, if needed.<br />
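One common example of such a mapping is a check for Capistrano's maintenance page, so that cap deploy:web:disable (used later during the cutover) takes effect without touching nginx. This fragment is an assumption about your deployment layout, not part of the article's Listing 11; it would go inside the location / block:

```nginx
# Serve Capistrano's maintenance page for every request while it exists
# (it is created by cap deploy:web:disable and removed by web:enable).
if (-f $document_root/system/maintenance.html) {
    rewrite ^(.*)$ /system/maintenance.html last;
    break;
}
```

Because the check is a file-existence test, enabling and disabling maintenance mode never requires an nginx reload.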
With the configuration in place, service nginx start starts the web server.<br />
Testing<br />
For testing, it would be helpful to be able to refer to your cloud instance using the<br />
regular domain name of your application, because you want to ensure that you're<br />
using your test site and not the production site. You do this through a local DNS<br />
override. In Windows, edit C:\windows\system32\drivers\etc\hosts; in UNIX, edit<br />
/etc/hosts. Add a line like:<br />
x.x.x.x app.smallpayroll.ca<br />
where x.x.x.x is the IP address of your cloud server and app.smallpayroll.ca is the<br />
name of your application. Restart your browser, and browse to your website. You will<br />
be using the cloud version of your application now. (Don't forget to comment out the<br />
line you just added when you want to go back to the production version!)<br />
At this point, you should be able to test that the cloud version of your application<br />
works just as well as the production version; fix any problems you find. Make careful<br />
note of whatever you find, as you'll want to script it in case you launch a second<br />
server. Because you're using the cloud version of your application, you can delete<br />
and restore your database without any users complaining.<br />
Bundling the new AMI<br />
The last thing to do is re-bundle your AMI. Any time you start a new instance, you<br />
lose everything in /mnt, and your root partition is reset to whatever is in the AMI.<br />
There's nothing you can do yet about the /mnt problem, but re-bundling makes sure<br />
that your AMI is just the way you left it.<br />
If the AMI you are starting from does not have the AMI tools, you can install them<br />
with following command:<br />
rpm -i --nodeps http://s3.amazonaws.com/ec2-downloads/ec2-ami-tools.noarch.rpm<br />
Bundling an AMI is a three-step process:<br />
1. Create the image on the instance itself.<br />
2. Upload the image to Amazon S3.<br />
3. Register the AMI.<br />
Before proceeding, shut down your mongrel and PostgreSQL instances, just to<br />
make sure any open files are handled correctly. You must also copy your X.509<br />
keys, found in the Amazon Console, to /mnt on your server. Listing 12 shows the<br />
first two steps of bundling, which are done on the VM itself.<br />
Listing 12. Bundling the AMI<br />
# ec2-bundle-vol -d /mnt -e /mnt --privatekey /mnt/pk-mykey.pem \<br />
--cert /mnt/cert-mycert.pem --user 223110335193 -p centos-ertw<br />
Please specify a value for arch [i386]:<br />
Copying / into the image file /mnt/centos-ertw...<br />
...<br />
Generating digests for each part...<br />
Digests generated.<br />
Creating bundle manifest...<br />
ec2-bundle-vol complete.<br />
# ec2-upload-bundle -b ertw.com -m /mnt/centos-ertw.manifest.xml \<br />
--secret-key MYSECRETKEY --access-key MYACCESSKEY<br />
Creating bucket...<br />
Uploading bundled image parts to the S3 bucket ertw.com ...<br />
...<br />
Uploaded centos-ertw.part.37<br />
Uploading manifest ...<br />
Uploaded manifest.<br />
Bundle upload completed.<br />
The first command generates the bundle, specifying that /mnt is to be ignored and<br />
that the bundle will go in /mnt (the -e and -d options, respectively). The<br />
--privatekey, --cert, and --user options point to your security credentials and AWS user ID,<br />
which are all found in the account settings of your AWS Management Console. The<br />
last option, -p, lets you name this AMI to differentiate it from others.<br />
The first command will run for about 10 minutes, depending on how full your root<br />
partition is. The second command uploads the bundle to Amazon S3. The -b option<br />
specifies a bucket name, which will be created if it doesn't exist already. The -m<br />
option points to the manifest file created in the last step. The last two options are<br />
your Amazon S3 credentials, which are found right next to your X.509 credentials in<br />
the AWS Management Console. Just remember that X.509 credentials are used for<br />
Amazon EC2 operations, while Amazon S3 uses text keys.<br />
Finally, run the command:<br />
ec2-register ertw.com/centos-ertw.manifest.xml<br />
to register the AMI, and you will see the AMI identifier to use from now on. Note that<br />
the ec2-register command is not distributed with the AMI, so it's easiest to run it<br />
from the server where you started the original AMI. You could also install the<br />
Amazon EC2 tools on your Amazon EC2 instance.<br />
Performing the migration<br />
Now that you've got your cloud environment running, the migration itself should be<br />
rather simple. You've verified that everything works: All that remains is to<br />
resynchronize the data and cut over in an orderly fashion.<br />
Premigration tasks<br />
Some time before the migration, make sure you lower the TTL of your domain name<br />
records to 5 minutes. You should also develop a checklist of the steps you will take<br />
to move everything over, the tests you want to run to verify that everything is<br />
working, and the procedure to back out of the change, if necessary.<br />
Make sure your users are notified of the migration!<br />
Just before your migration time, take another look at your cloud environment to<br />
make sure it is ready to be synchronized and accept production traffic.<br />
Migrating the application<br />
To migrate the application, perform the following steps:<br />
1. Disable the current production site or put it in read-only mode, depending<br />
on the nature of the site.<br />
Because most of SmallPayroll's requests involve writing to the database<br />
or file system, the site will be disabled. The Capistrano deployment gem<br />
includes a task, cap deploy:web:disable, that puts a maintenance<br />
page on the site informing users that the site is down for maintenance.<br />
2. Stop the application services in the cloud environment in preparation for<br />
the data migration by killing your mongrel processes.<br />
3. Copy your database over the same way you did for testing.<br />
4. Re-run rsync, if necessary.<br />
5. Restart the application servers with the command:<br />
mongrel_rails cluster::start -C /home/payroll/current/config/mongrel_cluster.yml<br />
6. Make sure your hosts file is pointing to the cloud environment, and<br />
perform some smoke tests. Make sure users can log in and browse the<br />
site.<br />
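The smoke test in step 6 can itself be scripted. Below is a minimal sketch using Ruby's standard Net::HTTP; the hostname is this article's example, and a real check would also exercise login and a few key pages rather than just the front page:

```ruby
require 'net/http'

# Minimal smoke test: fetch a page and report whether the response was a
# success (2xx). Relies on the hosts-file override directing the name to
# the cloud server. Any connection error counts as a failure.
def smoke_test(host, path = "/", port = 80)
  Net::HTTP.get_response(host, path, port).is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# Example: smoke_test("app.smallpayroll.ca")
```

Running this before and after the DNS change gives you a quick, repeatable pass/fail instead of clicking around by hand.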
Updating DNS<br />
If your smoke tests pass, then you can change your DNS records to point to your<br />
cloud environment. At this point, I find it helpful to keep a tail -f running on the<br />
web server's log file to watch for people coming in to the site.<br />
Chances are that your local DNS server still has the old information cached for the<br />
next 5 minutes. You can verify this with the dig command, as shown in Listing 13.<br />
Listing 13. Verifying the DNS server is caching the query<br />
# dig app.smallpayroll.ca @172.16.0.23<br />
; <<>> DiG 9.3.4 <<>> app.smallpayroll.ca @172.16.0.23<br />
; (1 server found)<br />
;; global options: printcmd<br />
;; Got answer:<br />
;; ->>HEADER<<- ...<br />
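Rather than re-running dig by hand until the cache expires, you can poll until the resolver hands back the new address. A small Ruby sketch follows; the commented hostname and address are this article's example values:

```ruby
require 'resolv'

# True once the local resolver returns the new server's address for the
# name. Resolv consults the hosts file and then DNS, so it sees the same
# answer your browser would.
def cutover_complete?(hostname, new_ip)
  Resolv.getaddresses(hostname).include?(new_ip)
rescue Resolv::ResolvError
  false
end

# Example: poll once a minute until the new record is visible.
# until cutover_complete?("app.smallpayroll.ca", "174.129.138.83")
#   sleep 60
# end
```

Once this returns true from the networks you care about, traffic is flowing to the cloud server and the old one can be retired.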
1. Set up the new environment.<br />
2. Test with a copy of production data.<br />
3. Turn off the old environment.<br />
4. Copy production data over to the new environment.<br />
5. Change DNS to point to the new environment.<br />
Despite now being "in the cloud," the application is probably worse off than it was<br />
before. Consider the following points:<br />
• The application is still running on one server.<br />
• If the server crashes, all the data is lost.<br />
• You have less control over performance than you do on a physical server.<br />
• The machine and application are not locked down.<br />
In the next article, you'll learn how to overcome these problems and start building a<br />
more robust environment for your application.<br />
Resources<br />
Learn<br />
• In the Cloud Computing zone on developerWorks, get the resources you need<br />
to develop and deploy applications in the cloud and keep on top of recent cloud<br />
developments.<br />
• Request instance metadata from your Amazon EC2 instance to get information<br />
about the instance, from the SSH keys it should use to user-specified<br />
information.<br />
• Start your cloud adventure by looking around the AWS Management Console.<br />
• If you're going to work with Amazon EC2, you should familiarize yourself with<br />
the various guides that Amazon provides.<br />
• Learn about the IBM AMIs from Amazon's perspective and from IBM's<br />
perspective.<br />
• In the developerWorks Linux zone, find hundreds of how-to articles and<br />
tutorials, as well as downloads, discussion forums, and a wealth of other<br />
resources for Linux developers and administrators.<br />
• Stay current with developerWorks technical events and webcasts focused on a<br />
variety of IBM products and IT industry topics.<br />
• Attend a free developerWorks Live! briefing to get up-to-speed quickly on IBM<br />
products and tools, as well as IT industry trends.<br />
• Watch developerWorks on-demand demos ranging from product installation and<br />
setup demos for beginners, to advanced functionality for experienced<br />
developers.<br />
• Follow developerWorks on Twitter, or subscribe to a feed of Linux tweets on<br />
developerWorks.<br />
Get products and technologies<br />
• Ruby Enterprise Edition is a high-performance Ruby implementation that can be<br />
used by itself or along with Phusion Passenger to integrate with Apache or<br />
nginx. Either way, you get access to faster memory management and improved<br />
garbage collection.<br />
• Sign up for the IBM Industry Application Platform AMI for Development Use to<br />
get started with various IBM products in the cloud. Remember that you have to<br />
go through a checkout process, but you're not going to be charged anything<br />
until you use it. You can also use ami-90ed0ff9.<br />
• The Amazon EC2 API tools are used to communicate with the Amazon API to<br />
launch and terminate instances and re-bundle new ones. These tools are<br />
periodically updated as new features are introduced to Amazon EC2, so it's<br />
worth checking back after product announcements for updates to this page. You<br />
will need at least the 2009-05-15 update, because you'll be using some of the<br />
load-balancing features later.<br />
• Evaluate IBM products in the way that suits you best: Download a product trial,<br />
try a product online, use a product in a cloud environment, or spend a few hours<br />
in the SOA Sandbox learning how to implement Service Oriented Architecture<br />
efficiently.<br />
Discuss<br />
• Get involved in the My developerWorks community. Connect with other<br />
developerWorks users while exploring the developer-driven blogs, forums,<br />
groups, and wikis.<br />
About the author<br />
Sean Walberg<br />
Sean Walberg is a network engineer and the author of two books on<br />
networking. He has worked in various industries, including health care<br />
and media.<br />