The Real World: Amazon EC2 (Linux)

It was all the rage a while ago to spend 10 cents on an Amazon EC2 instance for an hour, blog about it, and deactivate the instance without doing any actual work. Not very useful.

Recently I had the chance to use EC2 for a commercial project for 2 months, so I thought I’d share some real-world experiences.

A friend of mine with a small West Coast data center (1/2 a rack or so, slow 2000-2005 vintage Intel servers) needed to do something big: build out a flexible internal SOAS renderfarm to generate promotional morphing video clips for an entertainment-industry client.

The existing capacity was about 1 morph per second sustained, according to benchmarks.

The system requirements were:

  • linux OS running apache, FFmpeg and proprietary software
  • static IP address to reach each server from the permanent data center
  • easy and affordable to scale up or down (possibly 10x or more) with 4 hours notice or less. Start with 10 morphs/second, scalable to 100+ morphs/second.
  • Core 2 Duo CPUs for rendering (number crunching)
  • not much disk space, bandwidth (results hosted on Akamai) or memory needed – mostly CPU
  • West Coast location for servers preferred, not critical though
  • no important data to be stored off-site

The requirement to install custom software leaves you with dedicated servers or Amazon EC2. All the other grid offerings I looked at (Google Grid, Mosso, others) had limitations on what software you could install, basically PHP or python for web use – no root access, can’t install ffmpeg or proprietary binaries.

I like dedicated servers, but their problem is that generally you pay by the month, so it’s hard to quickly reduce capacity and expenses depending on advertising spikes. Sometimes new servers can be provisioned quickly, and other times they’re “sold out.” Not so “elastic.”

Some good American business dedicated server hosts are theplanet and gigeservers.

I was somewhat familiar with EC2 from various conferences, especially the Hadoop Conference last year. Of the talks I've attended, the ones by The New York Times (cloud OCR) and AutoCAD (vendor metadata processing) influenced me the most.

I did some benchmarking on the 20 cent instance (c1.medium: High-CPU Medium Instance – 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each). 32-bit, 1.7GB RAM, 350GB disk) and liked the performance.

After that I added 2 large 80 cent instances (c1.xlarge: High-CPU Extra Large Instance – 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each). 64-bit, 7GB RAM, 1690GB disk).

I noticed one c1.xlarge was somewhat faster than the other, so you may want to benchmark a few different instances and choose the fastest ones.
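A rough sketch of how such a comparison could be scripted, assuming you run the same workload on each candidate instance and collect the timings by hand. The helper names `time_run` and `rank_by_time` and the sample timings are made up for illustration; they are not from the project.

```shell
# Hypothetical helper: run a command once and print elapsed wall-clock seconds.
time_run() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical helper: read "name seconds" lines on stdin, print fastest first.
rank_by_time() {
    sort -k2,2n
}

# Example with made-up timings gathered from three instances.
printf '%s\n' 'xlarge-a 41' 'xlarge-b 37' 'medium-a 150' | rank_by_time
```

On each instance you would call `time_run` with your real render command, note the result next to the instance name, and keep the instances that sort to the top.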

The pre-made AMI linux images are all a little different. I had no problems with the 20 cent one, but the 80 cent one was not so smooth – the Perl CPAN module was not installed, and installing it conflicted with the Amazon tools. So I had to yum remove the Amazon tools, then yum install perl-cpan.

I ended up going with two of the monster 80 cent instances, primarily to reduce the number of systems to maintain and software release targets, keeping the 20 cent instance for test/prod.

Amazon instances appear as regular linux Xen instances, so it’s just linux as usual.

In the end, we used EC2 for 2 months with no problems.

Overall the project was a success, and I was happy with EC2. The invoice was about $1,300/month ($1.80/hour x 24 hours x 30 days) total for the 3 instances, including negligible bandwidth. Here’s an online AWS calculator to play with.
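The invoice arithmetic can be sketched in shell. The hourly rates are the ones quoted in this post (20 cents for c1.medium, 80 cents for c1.xlarge); the function name `monthly_cost` and the flat 30-day month are assumptions for illustration, and bandwidth is ignored.

```shell
# monthly_cost MEDIUM_COUNT XLARGE_COUNT
# Hypothetical helper: prints the estimated bill in whole dollars
# for a 30-day month. Rates are in cents per hour, as quoted in the post.
monthly_cost() {
    medium=$1
    xlarge=$2
    cents_per_hour=$(( medium * 20 + xlarge * 80 ))
    echo $(( cents_per_hour * 24 * 30 / 100 ))
}

# The setup described here: 1 c1.medium + 2 c1.xlarge
monthly_cost 1 2   # prints 1296
```

Changing the counts gives a quick feel for what scaling up or down would do to the bill.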

Some suggestions to Amazon would be to make provisioning possible from the web portal instead of just the command-line tools to help newbs get rolling, and do better testing on the images – CPAN.pm is a commonly-used Perl module indeed.

Note that FFmpeg (HEAD) is twice as fast as the older rpms floating around download sites.

Sample EC2 Commands

$ ec2-add-keypair gsg-keypair -K /root/.ec2/pk-xxx.pem -C /root/.ec2/cert-xxx.pem
$ ec2-describe-images -o self -o amazon
$ ec2-run-instances ami-2bb65342 -k gsg-keypair -t c1.medium
$ ec2-describe-instances
$ ssh -2 -i id_rsa_gsg_keypair ec2-67-202-32-93.compute-1.amazonaws.com
$ ec2-allocate-address
ADDRESS 75.101.148.165
$ ec2-describe-addresses
ADDRESS 75.101.148.165
$ ec2-associate-address -i i-aa46e5c3  75.101.148.165
$ ec2-authorize default -p 80 -K /root/.ec2/pk-xxx.pem -C /root/.ec2/cert-xxx.pem
GROUP           default
PERMISSION              default ALLOWS  tcp     80      80      FROM    CIDR    0.0.0.0/0
$ ssh -2 -i key aws01

         __|  __|_  )  Rev: 2
         _|  (     /
        ___|\___|___|

 Welcome to an EC2 Public Image
                       : - )

    Getting Started

    __ c __ /etc/ec2/release-notes.txt

[root@domU-12-31-38-00-46-01 ~]# uname -a
Linux domU-12-31-38-00-46-01 2.6.16-xenU #1 SMP Mon May 28 03:41:49 SAST 2007 i686 athlon i386 GNU/Linux

[root@domU-12-31-38-00-46-01 ~]# date
Sun Sep 28 21:17:28 EDT 2008

[root@domU-12-31-38-00-46-01 ~]# cat > /etc/yum.repos.d/dag.repo

[dag]
name=Dag RPM Repository for Red Hat Enterprise Linux
baseurl=http://apt.sw.be/redhat/el$releasever/en/$basearch/dag
gpgcheck=0
enabled=1
^D

[root@domU-12-31-38-00-46-01 ~]# exit

$ ec2-terminate-instances i-aa46e5c3

Rent or Own: Amazon EC2 vs. Colocation Comparison for Hadoop Clusters
George Reese: On Why I Don’t Like Auto-Scaling in the Cloud
More Adventures in Amazon EC2 (and EBS)
AWS Management Console

One Response to “The Real World: Amazon EC2 (Linux)”

  1. Anonymous says:

    any detailed step by step tips for Amazon EC2?
    ps: FFmpeg great, and i like MPlayer more…
