Managing Talent: A Running Metaphor

When I first joined the cross-country team in high school I naively thought that running was all the same: you just ran. Then came the first day we ran hills. It was then that I realized that running involves a lot more technique and skill than was initially apparent. One of the most important things I learned was how to run downhill.

On that first day of running hills our coach told us to “really work it going up the hill and recover on the way down.” After reaching the top of the hill I began my “recovery” back down the hill (i.e. jogging slowly). However, I noticed that everyone else was flying past me on the way down. Not only that, but when they reached the bottom they seemed even more rested and recovered than I was.

After practice I asked one of the guys on the varsity team why everyone’s “recovery” speed was so much faster than mine. In true Southern California surfer fashion he said, “Dude, you just got to learn how to let your legs go.” Let my legs go? What the hell does that mean? Is that some sort of weird surfer zen thing?

What he meant was this: when running uphill you really have to churn your legs to keep moving. When you reach the top and start going down, your legs are already churning. With the help of gravity, they will keep churning unless you actively try to stop them (see also: Newton’s first law). It is the same effect you get when you drive downhill and put your car in neutral.

Once you figure out how to “let your legs go” you have just two things to worry about. The first is regulating your speed. Your legs can take you far faster than you can safely go, so you have to constantly regulate your speed to go as fast as is safely possible. The second is avoiding obstacles. It only takes a small rock or divot to catch you off guard and send you tumbling down the hill. The farther away you spot an obstacle, the easier it is to avoid.

Managing talented people fundamentally uses the same technique as running downhill. Imagine your team as the legs and you, the manager, as the head. When you have a talented team, they should be able to move incredibly quickly. While their ability to move fast is a good thing, it is your job as the manager to make sure your team is going as fast as it can while still being safe. If you allow the team to move too fast, they will eventually trip and fall. Hopefully it’s just a scrape or a bruise and they can get right back at it. But sometimes the injury is really serious, and the recovery process is long and painful.

Avoiding obstacles is the second skill you need to master. As the manager, you are the vision. You see what’s coming, both short and long term, and it is your job to steer the team appropriately. You have to track what is right in front of you as well as what is on the horizon. It’s important to remember that the farther away you see an obstacle, the more smoothly you can steer the team around it. If you are really good, your team won’t even notice the small course correction you made a while back that let you “effortlessly” avoid the giant boulder that would have eventually blocked their way.

If there is one piece of advice I can give you about managing a talented team it is: “let your team go.”

Eatability Testing: Why don’t more restaurants do it?

Two things I’ve observed about people in New York are that everyone jaywalks and no one cooks. Everyone eats out all the time. However, since most people are busy, a large amount of that food is served via takeout and delivery rather than eaten in. This presents a challenge that few restaurants seem to realize: the long delay between when the food is prepared and when it is actually eaten.

How many times has this happened to you? It is late on a Sunday evening and you call one of your favorite restaurants for a little delivery. You pick your favorite item off the menu, place your order, and wait in anticipation. Thirty minutes and a cash exchange later, the food is on your dinner table. You pop open the styrofoam container, grab a bite, and yuck, it’s cold.

Across the street from where I work there is a fantastic Vietnamese sandwich shop called Num Pang. Their menu is filled with tons of delicious sandwiches like ginger barbecue brisket and hoisin veal meatballs. The trick with eating Num Pang is that you have to eat it immediately. In the ten short minutes it takes to get your order and walk back to the office, the bread has already started to go soggy. If someone happens to catch you in the hallway, you might as well say goodbye to your sandwich and hello to a (not so) hot mess.

This raises the question: why don’t more restaurants and takeout joints test the experience of eating food from their establishment? Prepare an order, leave it on the counter for 20 minutes, and then have the chef eat it. Better yet, have the delivery person take it out with an order and then bring it back. I’d imagine most chefs would be surprised at the “presentation” and taste of their food after it has taken a bike ride through NYC in the winter. What was once a nice pad thai dinner will likely have turned into a cold ball of yuck.

Why don’t they? I suspect many restaurants are in for a rude awakening when they start eatability testing their menus.

Here are some of the most common missteps I’ve seen:

  • If you are serving something on bread (hamburgers, sandwiches, etc.), any sort of sauce or liquid on it is going to make the bread soggy within five minutes. If you can, just put the sauce in a container on the side. I’m sure some health-conscious customers would appreciate it as well.
  • If you have to serve it with sauce, make sure to put the sauce as far away from the edges of the bread as possible. Otherwise, the sauce leaks out of the sides and makes the top and bottom of the bread soggy as well.
  • Separate hot and cold, just like you would at the grocery store. Styrofoam insulates surprisingly poorly, so packing a cold dessert on top of a hot soup is a bad idea. Also account for leakage; some well-placed napkins can go a long way.
  • Account for the cooking and cooling that happen during travel. Talk to any chef in a food competition and they will tell you that residual heat can have a large effect on the taste of a meal. Plan for the 10-20 minutes of delivery time on every to-go order.
  • Test & iterate. There will always be surprises, so eatability test different items on the menu to make sure each one is as good as you expect it to be.

Soundtracked: Understanding how we consume music

Lately I’ve become fascinated with music, particularly how people talk about and discover new music. For the last few months I’ve been talking to just about anyone and everyone about music, and I’ve learned some surprising things that I plan to write about in a few blog posts. Here is the first set of observations I’ve made about how we consume music.

Music genres are useless

Just as an experiment, go ask a few people what type of music they listen to. Chances are they will respond with something frustratingly generic like “I listen to rock music” or “I like hip-hop.” Given that, do you feel like you have a strong sense of that person’s musical taste? Didn’t think so. While most people describe their musical taste in terms of genres, those genres carry very little meaning for the person hearing them. That is because high-level genres have become so overloaded that they could mean practically anything. The Foo Fighters (stadium rock), Pink Floyd (progressive rock), and The Decemberists (folk rock) all fall under the “rock” genre, yet in terms of sound they are miles apart. That is why I no longer ask people what type or genre of music they listen to; it’s useless.

Finding out what people actually listen to

Now that we know music genres are useless, how do you actually ask someone what type of music they like and get a useful response? Initially I tried asking people what their favorite bands/artists were. Interestingly, I found that people struggled mightily with this question. There was often a good minute of hemming and hawing before I would get even a single band. After a lot of thought many people would toss out a few bands and then trail off with “yeah, I guess that’s it...” Not happy with the result, I experimented with a few other questions.

The most effective question I’ve found is asking for the last few bands/artists someone has listened to. Most people have no problem rattling off the last three or four bands they listened to, and while it’s a biased sample, it’s far easier for them to answer and much more useful than genres. For example, if someone tells me the last few artists they listened to were Fleet Foxes and The Freelance Whales, I’d immediately know they like softer indie rock with great vocals, lots of harmonies, and acoustic guitars. That is so much more accurate and useful than if they had said “I like indie rock.”

New is a relative term

When people are sick of their current music they will often ask their friends if they know any “new music.” Most people interpret this to mean music that has been recently released. But the reason people ask is that they want music they haven’t heard before, not because they have some aversion to older music. That’s why even when people ask for new music, you can generally interpret it to mean “new to them” rather than new in terms of release date. The beauty of music being “new to them” is that even if you know something is old, as long as they haven’t heard it yet, it is new to them. For example, when people ask me for Taking Back Sunday-style rock music I’ll recommend We Are Scientists - With Love and Squalor. Even though the album was released in 2005, the band was still pretty small at the time, so most people didn’t (and still don’t) know about them. Score one for the relative newness of music.

That is all for now.

Web Application Caching Strategies: Generational caching

This is the second post in a two-part series on caching strategies. The first post provided an introduction and described write-through caching.

In the previous post in this series on caching strategies we described write-through caching. While it is an excellent strategy, it has somewhat limited application. To see why that is the case and how generational caching improves upon it, let’s start with an example.

Imagine that you are creating a basic blogging application. It is essentially just a single “posts” table with some metadata associated with each post (title, created date, etc). Blogs have two main types of pages you need to be concerned about: pages for individual posts (e.g. “/posts/3”) and pages with lists of posts (e.g. “/posts/”). The question becomes: how can you effectively cache this type of data?

Caching pages for individual posts is a straightforward application of the write-through caching strategy I discussed in a previous post. Each post gets mapped to a cache key based on the object type and id (e.g. “Post/3”) and every update to a post is written back to the cache.

The more difficult question is how to handle pages with lists of posts, like the home page or pages for posts in a certain category. To keep the cache consistent with the data store, any time a post is updated, every cache key that contains the post needs to be expired. This is difficult to track, since a post can live in many cache keys at the same time (e.g. latest ten posts, posts in the Ruby category, posts favorited by user 15). While you could try to programmatically figure out all the keys that contain a given post, that is a cumbersome and error-prone process. A cleaner way of accomplishing this is what is called generational caching.

Generational caching maintains a "generation" value for each type of object. Each time an object is updated, its associated generation value is incremented. Using the post example, any time someone updates a post object, we increment the post generation as well. Then, any time we read/write a grouped object in the cache, we include the generation value in the key. Here is a sequence of actions and what would occur in the cache when performing those actions:
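
To make that concrete, suppose the post generation lives in the cache under a key like "Post/generation" and the home page caches its list under "latest_posts" (both names are just illustrative choices for this walkthrough):

  1. Read the latest posts. The current generation is 1, so we look up "latest_posts/generation=1". It is a miss, so we query the database and store the result under that key.
  2. Read the latest posts again. "latest_posts/generation=1" is now a hit, and no database query is needed.
  3. Update post 3. We write the post to the database, write it through to "Post/3", and increment "Post/generation" to 2.
  4. Read the latest posts again. The key is now "latest_posts/generation=2", which misses, so we query the database for the fresh list and cache it under the new key. The stale "latest_posts/generation=1" entry is simply never read again.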

By including the generation in the cache key, any time a post is updated the subsequent request will miss the cache and query the database for the data. The result of this strategy is that any time a post object is updated or deleted, all keys containing multiple posts are implicitly expired. I say implicitly expired because we never actually have to delete the objects from the cache; by incrementing the generation we ensure that the old keys are simply never accessed again.

Here is some sample code for how this could be implemented (in pseudo-Python):
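
Roughly speaking, it might look like the following sketch. The cache.get/set/incr and database.put/query calls are stand-ins for whatever cache and database clients you actually use, and posts are assumed to have an id attribute:

    # cache.* and database.* are placeholder clients, not a specific library.

    def post_generation():
        # The generation counter itself is stored in the cache.
        generation = cache.get("Post/generation")
        if generation is None:
            generation = 1
            cache.set("Post/generation", generation)
        return generation

    def update_post(post):
        post_generation()                        # make sure the counter exists
        database.put("posts", post)
        cache.set("Post/" + str(post.id), post)  # write-through the single post
        cache.incr("Post/generation")            # implicitly expire every list key

    def latest_posts():
        # Any key that holds multiple posts includes the current generation.
        key = "latest_posts/generation=" + str(post_generation())
        posts = cache.get(key)
        if posts is None:
            posts = database.query(
                "SELECT * FROM posts ORDER BY created DESC LIMIT 10")
            cache.set(key, posts)
        return posts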

After implementing this strategy in multiple applications, I want to point out a few things about how it works and its performance properties. In general I have found that using this strategy can dramatically improve application performance and considerably lessen database load. It can save tons of expensive table scans from happening in the database, and by sparing the database those requests, the queries that do hit the database complete more quickly.

In order to maintain cache consistency this strategy is conservative in nature, which results in keys being expired that don’t necessarily need to be. For example, if you update a post in one particular category, this strategy will expire the keys for all of the categories. While this may seem somewhat inefficient and ripe for optimization, I’ve often found that most applications are so read-heavy that these types of optimizations don’t make a noticeable difference in overall performance. Plus, the code to implement those optimizations becomes application- or model-specific, and more difficult to maintain.

I previously mentioned that in this strategy nothing is ever explicitly deleted from the cache. This has some implications for the caching tool and eviction policy that you use. The strategy was designed to be used with caches that employ a Least Recently Used (LRU) eviction policy (like Memcached). An LRU policy results in keys with old generations being evicted first, which is precisely what you want. Other eviction policies (e.g. FIFO) can be used, although they may not be as effective.

Overall, I’ve found generational caching to be a very elegant, clean, and performant caching strategy that can be applied in a wide variety of applications. So next time you need to do some caching, don’t forget about generations!

Web Application Caching Strategies: Write-through caching

You have probably heard about all of the big websites that rely heavily on caching to scale their infrastructure. Take a look at the Wikipedia entry on Memcached and you will find a veritable who's who of big internet companies. While we know these websites do caching, I have found very little written about how they actually do it. Unfortunately, a lot of caching is so particular to the individual website that it isn't very useful for most developers. There are, however, a few overarching caching "strategies" which, if done correctly, guarantee that you will never get stale data from your cache. In this series I'm going to discuss two of these strategies: write-through and generational caching.

The first and simplest strategy is write-through caching. When you write to the database, you also write the new entry into the cache. That way, on subsequent get requests the value should always be in the cache and never need to hit the database. The only ways a request would miss the cache are a) the cache filled up and the value was purged, or b) server failure. Here is some sample code in Python for how this works. I'm using database.get/put/delete as shorthand for SELECT/INSERT/DELETE in your database of choice.
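
As a rough sketch (cache.get/set/delete stand in for your cache client, and the exact database call signatures are just assumptions for the example):

    # database.get/put/delete are the shorthands described above;
    # cache.get/set/delete are placeholders for something like a Memcached client.

    def get_user(user_id):
        key = "User/" + str(user_id)
        user = cache.get(key)
        if user is None:
            # Cache miss (eviction or an earlier failure): fall back to the
            # database and repopulate the cache.
            user = database.get("users", user_id)
            cache.set(key, user)
        return user

    def save_user(user):
        # Write to the database and write the same value through to the cache.
        database.put("users", user)
        cache.set("User/" + str(user.id), user)

    def delete_user(user_id):
        database.delete("users", user_id)
        cache.delete("User/" + str(user_id))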

The strategy itself is really simple to understand and for many workloads can result in dramatic performance improvements and decreased database load.

While this is a simple and clean caching strategy, there are a few things you should be aware of to avoid some common issues when implementing it.

Oftentimes people will cache database objects using the database id as the key. This can result in conflicts when caching multiple types of objects in the same cache. A simple solution is to prepend the type of the object to the front of the cache key (e.g. “User/17”).
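
One way to do that, assuming each object exposes an id attribute, is a small helper along these lines:

    def cache_key(obj):
        # e.g. cache_key(user) -> "User/17"
        return type(obj).__name__ + "/" + str(obj.id)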

Next, for any put/delete operation against the database, it is important to check that the operation completed successfully before updating the cache. Without this type of error checking you can end up in situations where the database update failed but the cache update happened anyway, which leaves the cache inconsistent.
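
For example, if database.put returns whether the write succeeded (an assumption for this sketch), the save_user function above becomes:

    def save_user(user):
        # Only touch the cache once the database write is known to have succeeded.
        if database.put("users", user):
            cache.set("User/" + str(user.id), user)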

While this strategy is effective for caching single objects, most applications fetch multiple objects from the database (e.g. get all books owned by "Joe"). For a strategy which handles multiple objects, see the next post in the series about generational caching.