NoSQL: The Paradox of Choice

I have been watching a lot of TED talks recently and while there are a ton of excellent talks, the one I want to talk about today is called "The Paradox of Choice" by Barry Schwartz (he has a book by the same title). While watching his talk it struck me how relevant the topic is to NoSQL databases. I want to talk about these connections and their implications.

More != Better

I want start with an idea that many people know intuitively but is still worth calling out explicitly. Having more options is not necessarily a good thing, in fact, in many cases its a bad thing. In his talk Schwartz cites a study which showed that for every additional 10 mutual funds offered by an employer, participation went down by 2%. As the number of options increases, the amount of time and effort required to make a decision increases significantly, to the point where you are put into a state of paralysis as a result of the myriad of options. For all intents and purposes the birth of the NoSQL database came with the publication of the BigTable paper by Google and the Dynamo paper by Amazon circa 2006. Since then, the number of NoSQL databases has gone from two, to five, to 29. That's right, http://nosql-database.org/ currently lists 29 different NoSQL databases each of which has slightly different feature set and benefit/drawback trade offs. Good luck picking the right one.

More Options => Higher Expectations

When many options exist it is only natural for us to expect that one of them hasto have the features you are looking for. With only one option it doesn't matter if it has the features you want since you don't have a choice. Prior to the NoSQL movement there was only one game in town and it was the relational database (MySQL/PostGREs/MSSQL), so you had no choice but to grin and bear it. However, now that there are almost 30 options (there probably will be by the time I'm done with this post) one of them has to have the right mix of peanut butter and chocolate to fit my tastebuds. Unfortunately this rarely turns out to be the case.

Higher expectations not only apply to features, but also performance. The promise of infinite scalability will draw a lot of eyes and ears, but in order to win them over you have to show users tangible benefits. I cant count how many blog posts I have read about people/companies who are using MySQL as a key-value store because it is faster then the key-value stores themselves. In order for NoSQL databases to win, users need to be able to benchmark the database and be impressed by its performance.

What have we learned from this? For all the hype it is getting, NoSQL is in a rough spot. The number of options is large and there aren't clear winners in any category. A few of the databases seem to be rising to the top (Cassandra for instance), but until enough people get behind a relatively small number of databases (I say pick 3) the knowledge, tool support, and amount of helpful information available online is going to be to painfully low.

3 comments:

stevecrozz said...

Just out of curiosity, where exactly have you read about 'people/companies using MySQL as a key-value store because its faster than key-value stores'? I've read nothing like that, but I'd certainly be interested in reading what you're referring to.

Jonathan Kupferman said...

There are a few websites that come to mind. FriendFeed uses MySQL as a key-value store with a little twist added in for secondary indices as described here. Reddit is another example which uses PostGREs as a key-value store. They gave a talk about their architecture and scaling at Pycon which is available here. In it they mention how they tested some of the NoSQL dbs and found that PostGREs was the fastest key value store out of all of them.

Anonymous said...

Are you sure your information about reddit is correct?

They were using memcachedb, but recently switched to Cassandra for thier key-value storage.

Link: http://blog.reddit.com/2010/03/she-who-entangles-men.html

Post a Comment