I have to say that seeing Amazon's announcement today of Amazon Elastic MapReduce caught me off guard quite a bit. Having previously worked on a project which used a reasonably sized (100 nodes) Hadoop cluster running on EC2, I am familiar with many of the pains of setting up and running Hadoop on EC2. The reason I found this announcement so surprising is that it demonstrates Amazon's willingness to provide even more middleware services. Many of the other AWS services like S3, SQS, and SDB provide very fundamental services, namely storage and queuing. I'm not saying that everyone on AWS uses all three services, but rather that just about everything running on EC2 requires some form of storage and some form of queuing. Thus if Amazon can provide the services that just about everyone needs, there is a decent chance that Amazon can get some of those people to use their services, either for convenience or for cost. MapReduce is definitely quite popular right now, and for good reason, but it is a far cry from being as fundamental as something like storage.
The point of that whole discussion is this: Amazon just made their first true Platform-as-a-Service (PaaS) offering. I know Amazon is the Infrastructure-as-a-Service (IaaS) provider, but Hadoop is absolutely a platform in the same way that Rails or Django is a platform. Sure, it doesn't serve out websites (unless you count the JobTracker), but that is in no way part of the definition of what makes a platform. This is a particularly interesting move because it opens up the possibility of Amazon providing more platform-layer services (Rails, anyone?) and encroaching on the space currently occupied by the Google App Engines and Herokus of the world. One might ask why Amazon would venture into already occupied territory. Why compete with people already providing those services? It's simple: what is likely the most common use of Amazon EC2? If you guessed hosting websites you are correct, and if you guessed hosting Rails websites you get a bonus point. Since that is the case, it is pretty clear that EC2 is providing things that Heroku is not, whether it be flexibility, cost, or otherwise. So why not exploit that fact, make your customers happy, and make money from it as well?
Go ahead, do a double take if you have to, and make sure you got those decimal places right. Yeah, so running a job using Elastic MapReduce (EMR?) is effectively 15% of what it would cost to run it yourself on EC2. Ridiculous. To be honest, it does not make any sense to me that they are able to offer such a discounted price for a service that gives you the exact same machine as what you would get for $0.10/hour. I am going to have to think about that one for a while.
Either way, that turned something that was already dirt cheap ($0.10/hour) into something even cheaper than dirt ($0.015/hour), and I am very excited about the prospects and implications. As stated on CNET: "Bring your datamining to us".
Update: Dave has correctly pointed out that I missed a very big sentence in the Elastic MapReduce description which is "Amazon Elastic MapReduce pricing is in addition to normal Amazon EC2 and Amazon S3 pricing." That makes quite a bit more sense.
Even so, the 15% premium is a tiny price to pay to not have to deal with bringing up and tearing down servers all the time, along with the headache of actually getting the thing set up.
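To make the corrected arithmetic concrete, here is a quick sketch in Python. It assumes the two per-instance-hour figures quoted in this post ($0.10 for EC2 and $0.015 for Elastic MapReduce) and ignores S3 and any other charges; the function names are made up for illustration and the 100-node cluster is just the size I mentioned earlier.

```python
# Rates quoted in this post, in USD per instance-hour (assumed, not looked up).
EC2_RATE = 0.10    # plain EC2 instance
EMR_FEE = 0.015    # Elastic MapReduce charge, billed on top of EC2


def hourly_cost(nodes, use_emr):
    """Hourly cost of a cluster, with or without the EMR fee added on."""
    per_node = EC2_RATE + (EMR_FEE if use_emr else 0.0)
    return nodes * per_node


def emr_premium():
    """EMR fee expressed as a fraction of the plain EC2 rate."""
    return EMR_FEE / EC2_RATE


if __name__ == "__main__":
    nodes = 100  # hypothetical cluster size
    print(f"Self-managed on EC2: ${hourly_cost(nodes, False):.2f}/hour")
    print(f"Elastic MapReduce:   ${hourly_cost(nodes, True):.2f}/hour")
    print(f"Premium over EC2:    {emr_premium():.0%}")
```

For a 100-node cluster that works out to roughly $10.00 versus $11.50 per hour, which is where the 15% premium comes from.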