Be Wary of the Paranoid

I recall distinctly when first learning about Rails and plugins that the very first one I used was acts_as_paranoid. Something about actually deleting data concerned me and so I figured adding acts_as_paranoid to some important tables in my application would save a lot of headaches. While it is tremendously useful it also has a pretty big unintended consequence that I think gets overlooked by most. Lets take an example query generated by acts_as_parnoid.

>> User.find_by_id(7)
SELECT * FROM `users` WHERE (`users`.`id` = 7) AND (users.deleted_at IS NULL OR users.deleted_at > '2009-11-26 02:17:28')

Looks harmless right? What I want to draw attention to is the "OR users.deleted_at" part. In order to ensure that the user isn't deleted it checks not only that the deleted_at field is NULL, but also that deleted_at is greater then the current time. In reality the IS NULL check is sufficient unless you are setting the deleted_at of some object to be in the future. I have yet to see anyone actually use it in that way. This is what makes the use of the current time in the query so bad, it is slowing down tons of Rails applications and most people don't even know it.

One important thing to notice about the MySQL query cache is that it is pretty dumb. Basically it caches the incoming query string exactly as written and then stores its associated result set. This becomes a problem when you use something like the current time in the query string, it functions as a cache-buster each second. So at 0 seconds you make a query and it is stored in the cache, then at 0.5 seconds you make the same query and it is read from cache, then at 1.0 seconds you make the same query but it will miss the cache since the time has increased by a second. This means that anything written to the query cache which uses acts_as_paranoid effectively has a 1 second expiration time. That's awful , and all that for the 0.005% of users who want to expire things in the future. Not to mention the fact that it completely pollutes the cache with old data which never gets touched a second after its written.

Alright, enough moaning, here's how to fix the problem. Open up paranoid.rb and in the "with_deleted_scope" function rip out "OR #{table_name}.#{deleted_attribute} > ?" along with the current_time variable after it. Similarly in has_many_through_without_deleted_assocation.rb in the construct_conditions method delete the same string where it is appended to the conditions variable. Keep in mind that if you are setting deleted_at to values in the future then you don't want to make this change. But for everyone else, enjoy the improvement in your query cache hit rate.

As a final note, for the tables which you have made paranoid you probably also want to consider adding an index which includes the deleted_at field since it will be a condition of every SQL query on that table.

Updated: There is a fork of acts_as_paranoid courtesy of mikelovesrobots that provides the fixes that I talked about previously, it is available here. I'm gonna switch out my versions of acts_as_paranoid for this one, I'd suggest you do the same.

0 comments:

Post a Comment