Designer's Guide to Web Performance

I have found myself reading quite a bit about usability and web design recently. While I have learned a lot about design, I have also learned that there is a large portion of the design community which is not terribly familiar with how to improve the performance of the websites they are creating. While a pixel-perfect layout is a beautiful thing, no one wants to wait 20 seconds for it to load. Since the web design community has taught me so much, I wanted to give back by writing this post on some simple techniques for improving site performance.

Welcome to Web Performance 101, here is your first assignment:

The above is a waterfall chart which shows how long the pieces of a particular website take to load. To be nice I've blanked out the website URLs, but I left the file extensions since they will come in handy later. What this chart shows us is that initially the browser made a request for the index ("/") page, and 1342 ms later it had received all of the HTML for that page. One second in, the browser makes a request for the first javascript file, which takes 585 ms to load. So on and so forth for about five seconds. While there was more to the chart, I cut it off for brevity.

Progressive Rendering

You have probably noticed that oftentimes your browser starts drawing a website before the entire thing has loaded; this is called progressive rendering. This is really nice because, if done correctly, it can make your website "feel" much faster for end users. In order for the browser to start drawing the page it needs two things: the HTML and the CSS. Until the HTML and CSS are downloaded your users are going to be staring at a blank white screen. This is why it is really important to put CSS in the head of the document, before javascript and images. In the chart above there is a vertical green line at about 2.25 seconds; that is when the browser could start rendering the page. Had the designer of the site put the CSS before those five javascript files, the page could have started rendering at 1.5 seconds or even earlier. It's such a simple and easy change that there really is no reason not to make it.
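As a sketch, with hypothetical file names, the ordering looks like this:

```html
<html>
<head>
  <!-- Stylesheets first: the browser can begin rendering once these arrive -->
  <link rel="stylesheet" href="styles.css">
</head>
<body>
  <!-- page content here -->

  <!-- Scripts after the CSS (or at the end of the body),
       so they cannot delay the first paint -->
  <script src="app.js"></script>
</body>
</html>
```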

Javascript: The Performance Killer

While every website is going to have a waterfall pattern, the goal is to have a very steep waterfall. The further the bars extend out to the right of the graph, the longer the site takes to load. Hence, we want to shoot for a very steep and short waterfall, which should translate into a fast site. Unfortunately this chart looks more like stairs than a waterfall, and the reason is javascript.

While it may not be obvious to you, browsers are pretty smart. For example, when fetching the images on a website most browsers can download more than one at a time. Older browsers (IE7, Firefox 2) usually download two items at a time, and newer browsers (IE8, Firefox 3) download six or more at a time. Notice how there is overlap between items 8 & 9 as well as 10 & 11; those items were being downloaded in parallel. Then notice how there is no overlap between items 2-7. This is because, unlike images or CSS, javascript blocks the browser from downloading anything else. While newer browsers tend to do a better job at this and download scripts in parallel, most of the world is still running IE7, which is what was used to generate the above chart.

One way to improve this would be to take all the separate javascript files and combine them into a single larger file. This will cut out much of the overhead incurred by downloading many files. Note how most of each javascript bar is actually green; that is "time to first byte." It is the time between when your browser says "Hey, give me foo.js" and when it finally receives the first byte of that file. You can barely even see the blue lines at the end, which represent the time spent downloading the actual data. By combining those javascript files together you cut out most of the time spent waiting for the data to come back. There are many tools online and available for download which will automatically concatenate the files for you.
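If you'd rather roll your own, it only takes a few lines. Here is a minimal Ruby sketch (the `concatenate` helper and the file names are hypothetical):

```ruby
# Combine several javascript files into one to cut per-request overhead.
# Order matters if the scripts depend on one another, so list the sources
# explicitly rather than globbing the directory.
def concatenate(sources, destination)
  combined = sources.map { |path| File.read(path) }.join("\n")
  File.write(destination, combined)
  combined
end

# e.g. concatenate(["jquery.js", "plugins.js", "site.js"], "all.js")
```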

Zipping it Up

The easiest way to ensure that your site loads fast is to send less data. Less data means less time required to download it, which means better performance. This is where data compression comes in. Most big web servers provide the ability to compress the data they send out into a much smaller format so that it can be transferred more quickly. Once it reaches the browser, the browser is smart enough to know that the data is compressed, and it will uncompress it and read it just the same way it normally would. The most popular compression scheme used today is called GZIP, and it can often cut file size to a half or even a third. It is important to note that this compression is only effective on text data, so you want to gzip your HTML, CSS, and javascript files. You don't want to gzip images, since that usually eats up more server CPU than it's worth. The best thing about gzip is that it only requires a few extra lines in the server config file and then you're done. This website has a good guide for how to enable it for Apache.
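As a rough sketch, on Apache 2.x with mod_deflate enabled, the relevant lines look something like this (adjust the MIME types to match what your server actually sends):

```apache
# Compress text responses on the fly with mod_deflate
AddOutputFilterByType DEFLATE text/html text/plain text/css
AddOutputFilterByType DEFLATE application/javascript application/x-javascript
```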

As I mentioned above, it is often a good idea to combine all of your javascript files into a single file, and the same holds true for CSS. So if you have your CSS split across many files, combine it into a single file so that it loads faster. An extra bonus to this is that compression rates tend to get better as files get larger. Thus, while the amount of text is still the same, you will probably end up transferring less data and skipping much of the time wasted waiting for it to arrive.
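You can see the effect with Ruby's Zlib library, which uses the same DEFLATE algorithm that gzip is built on; the stylesheet contents here are made up for illustration:

```ruby
require "zlib"

# Two hypothetical stylesheets with the kind of repetition CSS tends to have.
reset  = "body { margin: 0; padding: 0; }\n" * 40
layout = "#main { margin: 0 auto; padding: 1em; }\n" * 40

# Compress them separately, then as one combined file.
separate = Zlib::Deflate.deflate(reset).bytesize +
           Zlib::Deflate.deflate(layout).bytesize
combined = Zlib::Deflate.deflate(reset + layout).bytesize

# The combined file compresses better: the second stylesheet can reuse the
# dictionary built up while compressing the first.
puts "separately: #{separate} bytes, combined: #{combined} bytes"
```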

That's where we will stop for today. Hopefully you learned a few things about web performance which you can apply to the websites you are working on. Keep in mind that there are a lot more tips and tricks for improving performance; this is only the tip of the iceberg.

Private clouds are transitional

Given that I am going to be talking with the Eucalyptus guys in a week I figured it was a pretty opportune time to sit down and really think about private clouds. Much to the chagrin of some people, I firmly believe that private clouds are still clouds. In my mind the real question is where do private clouds fit in the cloud ecosystem?

Let's get one thing straight: private clouds are transitional. What I mean by this is that private clouds are not going to be here forever; they are instead filling a very important gap in the current cloud landscape. While a large number of servers could be moved into the cloud today, there are still quite a few use cases which don't allow for such rapid change.

To illustrate one such example, let's say you head IT for some Fortune 500 mega-corporation. After a few months of waiting, your purchase order for X racks of servers (finally) goes through. You estimated that with the servers you have ordered you should have sufficient capacity for the next year. Shortly thereafter you reassess the opportunities provided by the cloud and deem it fit for use by your company. You are so enthusiastic about the change that you even decide to join Lew Moorman and Marc Benioff on stage at some cloud event to chant "No more servers!" and pledge never to buy another one. However, upon returning to the reality of work, you recall that in addition to the servers you currently have running in your data center, the new racks you just ordered are going to provide you sufficient capacity for the next year. Assuming a 3-5 year lifespan for the average server, you are looking at at least two years before you can move a majority of your servers into the cloud. Enter the private cloud. You already have the resources, and there is no reason for them to sit around and gather dust. Make your own private cloud with the existing resources and expand out into the public cloud as necessary. Once the servers in your data center have run their course, toss them and move to the cloud.

The other big elephant in the room with regards to moving to the public cloud is compliance. These days it seems like every big industry has its own compliance and regulatory constraints that must be met. Whether it's PCI for credit card processing or HIPAA in the health fields, almost none of the big cloud vendors have met the requirements for becoming compliant. In fact, it's not even clear that they are trying. Unfortunately, regulation is not something that can be skirted around; it is a big-time show stopper. This means that companies in industries which have regulatory requirements are going to be in a holding pattern around the public cloud for the foreseeable future. What's the next best thing? That's right, the private cloud.

While private clouds are transitional, they will be around until the aforementioned issues are addressed. For some this may be on the order of months or a year, but for others it will probably be much longer. The regulatory issue in particular is not an overnight fix; I think it is going to be a big parachute that prevents the cloud from running at full sprint for some time. So while the private cloud is transitional, that transition period is looking like it's going to be a long one.

Rails named_scope Time Surprise

I have written previously about how much I like named_scopes in Rails, and I still do very much. After using them for some time, though, I got tripped up by an issue that surprised me a bit. I thought it would be a good idea to document it here in case others run into the same thing.

To demonstrate the issue, let's say I have an app with a User model. On the home page I want to display a list of the users who have signed up in the last hour. This is an excellent use case for a named scope. We can start by creating a named scope called "recent" which will then allow us to simply say "User.recent" to retrieve all the recently created accounts from the database. This seems simple enough, so I went ahead and wrote it up as follows:
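Sketched out, assuming the Rails 2.x named_scope API, the scope looked something like this:

```ruby
class User < ActiveRecord::Base
  # BAD: 1.hour.ago is evaluated once, when this class is first loaded,
  # and that frozen timestamp is baked into every query the scope runs.
  named_scope :recent_bad, :conditions => ["created_at > ?", 1.hour.ago]
end
```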

Now you will notice that I named it recent_bad, and that is because this named scope is BAD! Take a look at the queries generated when I call recent_bad three times; notice anything wrong? It's subtle. Note how the date after created_at, "2010-01-05 16:55:44", never changes. For effect I made the model acts_as_paranoid so you can see what the timestamp should be. What is happening here? The named_scope is defined at the class level, which means that when the User class is loaded the condition is evaluated once and then never again. This is why the time only changes when the server is restarted. In order to avoid this issue, simply put the condition within a lambda as follows:
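Assuming the same Rails 2.x API, the lambda version wraps the options hash so it is re-evaluated on every call:

```ruby
class User < ActiveRecord::Base
  # GOOD: the lambda defers evaluation, so 1.hour.ago is recomputed
  # each time User.recent is actually called.
  named_scope :recent, lambda {
    { :conditions => ["created_at > ?", 1.hour.ago] }
  }
end
```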

Now you will see that the created_at time updates as it should. It's a subtle bug, and one that caught me by surprise.

Slashdot loves Cloud Computing!

One of the reasons I love reading Slashdot is that the inhabitants of its community are unlike any other. This morning I woke up to an article on the front page about a venture capitalist who was defending the incredibly popular social game Farmville. Being interested in the topic, I thought I would take a gander at the comments; that is when I came across this gem (direct link):

    This needs to be the year that those of us with even the slightest degree of technical knowledge take a stand against the goddamn "Cloud".
    It sounds fantastic in theory, but once in the real world, Cloud Computing falls flat on its face. My development and ops teams wasted too much time dealing with Cloud providers over the past year. So my resolution this year is to tell anyone who proposes the use of anything Cloud to cram it. We aren't doing it any longer. It's a failed approach.
    Just last week, during the holidays, we had to scramble after one of our Cloud providers ran into some hardware problems and couldn't get our service restored in a timely manner. After the outage exceeded my threshold, I called up my best developers and had them put together a locally-hosted solution in a rush, and payed them quite a bit more than usual due to the inconvenient timing. Then I called up the Cloud provider and basically told our rep there that we are done using them and their shitty service. Then I called up the manager in our company who recommended them, and told him to basically go smoke a horse's cock.

The commenter was apparently so proud of their work that they decided to post it anonymously. Now keep in mind that the article was about social gaming; it had nothing to do with the cloud. While Farmville does run on Amazon EC2, the article does not mention or discuss that at any point. Regardless, let's take a look at this comment and pretend that it was posted in a reasonable context.

Perhaps my favorite thing about the comment is the fact that it makes these huge substantive claims yet provides absolutely zero reasoning behind them. For example, "It sounds fantastic in theory, but once in the real world, Cloud Computing falls flat on its face." Really? In what sense? You must mean Animoto scaling from 40 to 4,000 servers in 3 days. Or how about the millions of people who are using Google Apps? Both of those are certainly real world, and as far as I can tell there was very little falling on faces. In fact, the cloud helped Animoto avoid falling on its face!

As if the first claim wasn't enough, the end of the second paragraph provides a real doozy. It completely writes off Cloud Computing by stating that "[i]t's a failed approach." Well, I'd better get on the phone and tell all the people who have put 64 billion objects into Amazon S3. That might take a little while...

The last paragraph at least provides a little bit of background on why this particular individual will "tell anyone who proposes the use of anything Cloud to cram it." (Well, if he is going to tell them all to cram it, maybe he can make all those calls to the S3 users for me...) This is where I start to get a little empathetic. Downtime sucks, it really does. Customers and providers can both agree that downtime sucks, since everyone loses when it rears its ugly head. My question is this: if this service is so critical, then why wasn't it built to be fault tolerant? If you are truly concerned about availability then you need to either a) build the service such that it can withstand failure or b) have an SLA in place with the provider. But honestly, who wants to do that? Instead I would recommend following the actions of the commenter, which are to 1) not take the necessary precautionary steps to avoid downtime and then 2) complain when there is downtime. What's next, eating three Big Macs a day and then complaining when you need triple-bypass surgery?

Lastly, I would like to commend the brave commenter for being brazen enough to tell a manager at his/her company to "basically go smoke a horse's cock." I can see your career blossoming as we speak.