Installing RubyGems faster

One of the things I learned over the summer while using Rails quite a bit is that the gem installation process can be slow, especially when you are installing lots of gems. In many cases it is not even installing the actual gem that takes all the time, it's generating the rdoc and ri information. In my experience very few developers actually use the rdoc information on their local boxes, and no one should be looking at rdocs on your production environment, so why bother installing them?
You can prevent those from being generated by adding a couple of flags to the end of the install command (e.g. "gem install rspec --no-ri --no-rdoc"). This is nice, but I seem to always forget to add the flags until it's too late and the gems are already installing. This can be fixed by adding the flags to your gemrc so it happens automatically. Simply open your ~/.gemrc file and add the following line to the end: "gem: --no-ri --no-rdoc". My .gemrc was created by root so I needed sudo to edit the file, but this may not be the case for you. Just for reference, my .gemrc now looks like this:

---
:sources:
 - http://gems.rubyforge.org/
 - http://gems.github.com
:benchmark: false
:backtrace: false
:update_sources: true
:bulk_threshold: 1000
:verbose: true
gem: --no-ri --no-rdoc
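
If you would rather not open the file in an editor at all, appending the line from the shell works just as well (prefix it with sudo tee -a or edit as root if your .gemrc is owned by root like mine was):

echo 'gem: --no-ri --no-rdoc' >> ~/.gemrc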

As a quick test I installed the cucumber gem on my local box without the flags and it took 31 seconds. After changing my gemrc to include the flags the same installation took 13 seconds, a pretty nice improvement. If you are deploying your app in an environment like RightScale where your machines are configured at boot time, I would certainly include that line in your gemrc; it should speed up the gem installation process a good deal.
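
If you want to run a similar comparison yourself, something along these lines should do it (a rough sketch; keep in mind that any dependencies left behind by the first install will make the second run look faster than it really is):

time gem install cucumber
gem uninstall cucumber
time gem install cucumber --no-ri --no-rdoc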

The cloud solves a lot of problems, stupidity isn't one of them

The past few weeks brought some really unfortunate news for users of T-Mobile's Sidekick phones. It started with an outage of their data service beginning on Friday the 2nd and lasting four dreadfully long days. During that time users couldn't access the Internet or, more importantly, their contact information, since that information is all stored remotely. News only got worse this weekend when Danger (the company that makes the phone, also a Microsoft subsidiary) announced that the data not stored on the phones "almost certainly has been lost" and that the chances of it being recovered were "extremely low".

A big undertone to this event has been "Should we continue to trust cloud computing content providers with our personal information?" (from this Slashdot article). Many people have pointed out that Microsoft purchased Danger about a year ago, and that this catastrophic data loss has cast a cloud over the soon-to-be-launched Microsoft cloud, Azure. This is where I take issue.

First off, since when has the simple act of storing data remotely constituted cloud computing? Regardless of what definition of the cloud you subscribe to, it probably has the words "virtualization", "elasticity", and "pay-per-use" in it somewhere. I don't see any of those three things, or any other cloud-like properties, that would lead me to believe that Sidekick == cloud. However, let us take this ridiculous assumption that the Sidekick is cloud and run with it.

While there has been no official announcement from Danger regarding the cause of the data loss, word has surfaced that it was the result of a botched SAN upgrade. While things certainly can go very wrong when messing with a SAN, the kicker is that no backup was made prior to attempting the stunt. As far as I can tell, no backups were made at all (or at least none that worked). Like the title says, the cloud solves a lot of problems, but stupidity isn't one of them. With its seemingly unlimited amount of storage and minimal cost, it's just plain stupid not to make backups of any important data. Better yet, get it all nice and encrypted and use something like the Simple Cloud API to back it up to multiple storage providers. Why not? In the long run the cost of keeping tons of backups of that data is so trivial that it shouldn't warrant a second thought.

Given that Danger was purchased by Microsoft, many have now brought up the question: how does this affect Azure? The answer: it doesn't. If you have ever worked for or dealt with a large company you know that it takes a long time to get anything done. It has only been 18 months since Microsoft purchased Danger, so I have a hard time believing that much changed for Danger aside from the sign on the building (if that). This is particularly true of system architecture, where things are so complicated that the old adage "if it ain't broke don't fix it" often holds true until the very bitter end. It is pretty clear that the Danger infrastructure wasn't running on Azure, and the two have very little in common. I think the only real impact this incident will have on Microsoft is that they will receive more questions about the reliability of their data storage. Their response will be that they replicate data three times across multiple geographically separate data centers (this is just a guess), everyone will report this back to their CIOs, who will approve, and then everyone goes home happy. End of story.

Updated: Microsoft has just confirmed that they have been able to recover most, if not all, of the data lost in the Sidekick outage. That's great news for Sidekick customers; while they were out of a usable phone for a few weeks, getting their contacts back is a big win. As it turns out, they did have some backups in place and were able to recover from them, although it took quite a while (and still continues). While ultimately little or no data may have been lost, the damage has been done.