Web Application Caching Strategies: Write-through caching

You have probably heard about all of the big websites which heavily rely on caching to scale their infrastructure. Take a look at the Wikipedia entry on Memcached and you will find a veritable who's who of big internet companies. While we know these websites do caching, I have found very little written about how they actually do caching. Unfortunately a lot of caching is so particular to the individual website that it isn't very useful for most developers. There are however a few overarching caching "strategies" which if done correctly will guarantee you will never get stale data from your cache. In this series I'm going to discuss two of these strategies, write-through and generational caching.

The first and simplest strategy is write-through caching. What happens is when you write to the database, you also write the new entry into the cache. That way, on subsequent get requests the value should always be in the cache, and never need to hit the database. The only way a request would miss the cache is because a) the cache filled up and the value was purged or b) server failure. Here is some sample code in Python for how this works. I'm using database.get/put/delete as shorthands for SELECT/INSERT/DELETE in your database of choice.



The strategy itself is really simple to understand and for many workloads can result in dramatic performance improvements and decreased database load.

While a simple and clean caching strategy there are a few things you should be aware of to avoid some common issues when implementing this strategy.

Often times people will cache database objects by using the database id as the key. This can result in conflicts when caching multiple types of objects in the same cache. A simple solution for this is to prepend the type of the object to the front of the cache key (e.g. “User/17”).

Next, for any put/delete operations to the database it is important it check that those operations completed successfully before updating the cache. Without this type of error checking you can end up in situations where the database update failed but the cache update happened anyways, which results in an inconsistent cache.

While this strategy is effective for caching single objects, most applications fetch multiple objects from the database (e.g. get all books owned by "Joe"). For a strategy which handles multiple objects, see the next post in the series about generational caching.

2 comments:

Unknown said...

Sort, simple, and effective. Thanks Jon!

Anonymous said...

These are the best strategies of web app caching. As a fresh developer, I learn many functions of this application from this post. iOS Event Applications

Post a Comment