Performance
Caching in Rails
There are two hard things in computer science: naming things, cache invalidation, and off-by-one errors. Let’s talk about that second one.
Caching is difficult to get right, but Rails provides you with some excellent tools to help make it easy. In this tutorial, we’re going to cover how to add caching to your Rails application to make it super snappy.
Setup
Get the Blogger project from GitHub and run setup procedures:
All existing tests should pass. Optionally, run the tests continuously while developing by running guard.
Simple Data Caching
Computers compute. Sometimes, they do a lot of computing. Sometimes, they need to do so much computing that it takes a really, really long time. Often, that’s not acceptable: we want our sites to be fast. So what to do?
The answer is a cache.
We’re working with the blog application you may have used in previous
tutorials, but if you haven’t, it’s pretty straightforward. Authors write
Articles that have Comments and Tags. We’re going to add some caching to
the admin dashboard to improve its performance.
The problem
"If you can not measure it, you can not improve it." - Lord Kelvin
When you have a web application, statistics on its usage are nice to have. They can help you optimize conversions, plan new content based on what’s been popular in the past, and tons of other things.
Let’s load up the site and check it out:
If you open http://localhost:3000 in your browser, you’ll see something like this:
That part at the bottom is what we’ll be improving upon. We collect a number
of statistics about the articles, comments, and the number of words in the sum
of them. This page renders pretty quickly at the moment, but as the number of
articles and comments goes up, it can get really slow. Let’s change things so
that we can see this difference. Open up db/seeds.rb and up the numbers:
Then, re-build the database:
Now load up the site again, and it should feel… slow.
Five seconds. Hit refresh. Another five seconds. This situation is unacceptable, so let’s fix it.
The simplest thing: instance variable memoization
The easiest possible thing that we can do is to Just Use Ruby. Let’s check
out the DashboardController, where the calculation is done:
All the logic, captured in the models. Nice. Let’s check out
Article.total_word_count:
So we load up all the Articles, and loop through them. This works, but
of course will be super slow: we load up every Article on each request! We
can use memoization to improve this situation. Memoization is a technique where
a method is made faster by not repeating calculations that were previously
done. In Ruby, this is most commonly accomplished through instance variables:
Now, the first time we use total_word_count, it will save the answer into the
@total_word_count variable, and due to the ||=, we won’t re-perform the
calculation on subsequent calls.
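To see the ||= trick in isolation, here’s a minimal sketch in plain Ruby. It assumes a stand-in Article class with a hard-coded BODIES list instead of a database, and a hypothetical calculations counter (not part of the Blogger app) that records how often the slow path really runs:

```ruby
# Hypothetical stand-in for the Article model: plain Ruby, no database.
class Article
  BODIES = ["one two three", "four five"].freeze # pretend table rows

  class << self
    attr_reader :calculations # how many times the slow path actually ran

    def total_word_count
      # ||= only evaluates the right-hand side while the variable is nil or false
      @total_word_count ||= slow_word_count
    end

    private

    def slow_word_count
      @calculations = (@calculations || 0) + 1
      BODIES.inject(0) { |sum, body| sum + body.split.size }
    end
  end
end

first  = Article.total_word_count
second = Article.total_word_count
puts first, second, Article.calculations
```

One caveat: ||= re-runs the calculation whenever the stored value is nil or false, so it only memoizes truthy results.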
Go ahead and do the same thing for the other methods called in the
DashboardController#index.
Let’s test this out: start up your server again with rails s and load the
page, then hit refresh:
Whoa! Here’s my diff:
Just three lines changed, adding three variables, and we went from 4,000 ms to 43 ms! So we’re done, right?
Not so fast. The first page load still takes four seconds. And it’ll take four seconds every time we restart our server, as well. That’s often not good enough. And since our cached value never expires, it will also be wrong as soon as we add a new article or comment. What we need are two things:
- A way to persist our cached value across restarts of the server.
- An ‘expiration strategy’ to update the value in the cache when things change.
Better: Rails.cache
Luckily, people smarter than you or me have thought about this problem. There
are multiple bits of software called ‘key/value stores’ that can tackle this
exact situation. From the name, you can infer that a key/value store… stores
keys and values. You’ve already used keys and values in Ruby, with Hashes.
So basically, a key/value store is like a giant, persistent Ruby Hash. In
this section, we’ll explore Redis, which is an excellent key/value store.
You need to follow the instructions for installing and configuring Redis to follow this part of the tutorial.
There are multiple key/value stores, and so Rails provides an interface to use any one you want. Just remember the Rails API, and you can use Redis, Memcached, or another store that you may fancy.
The API is quite simple. Here’s how you’d save a value in a Ruby Hash:
Easy, right? Well, with Rails, you just do this:
Simple! If you check out the Redis console, you can see the value has been stored into Redis:
That "ns:count" there shows we have a count key in the ns namespace.
Now, we could use this to save things in our Rails application, but first, I want to show you a better syntax that you can use. If you were to implement it right now, I bet you’d do something like this:
Check to see if we have it cached, if not, calculate and store it, then return the answer. What’s the matter with this?
In a word: #fetch.
Ruby has a really convenient method on Hash called #fetch. Here, let me
show you how it works:
Neat! The block we pass into #fetch gives us a value to return if there’s
no key in the hash. One bad part about #fetch, though:
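Here’s that #fetch behavior as a small sketch you can paste into IRB. The counts Hash and its keys are made up for illustration:

```ruby
counts = { "articles" => 10 }

# Key present: the block is ignored and the stored value comes back.
hit = counts.fetch("articles") { 0 }

# Key missing: the block's result is returned...
miss = counts.fetch("comments") { 42 }

# ...but nothing was written back to the Hash.
stored = counts.key?("comments")

puts hit, miss, stored
```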
It doesn’t actually save the value into our Hash. Bummer. However, Rails.cache not only implements #fetch, but also stores the value into the cache:
Awesome! Now, we can write our method in a much simpler way:
Look at that! Way nicer. We don’t need to repeat the key, we don’t need to check the value, and it’s all nice and clear. Easy!
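To make the difference concrete without needing Redis, here’s a rough sketch that uses a hypothetical TinyCache class as a stand-in for Rails.cache. It only imitates read, write, and fetch, and the COMMENT_BODIES list stands in for the comments table:

```ruby
# Hypothetical in-memory stand-in for Rails.cache; only read, write and fetch
# are sketched here, and real stores do much more (expiry, serialization).
class TinyCache
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end

  # Return the cached value, or run the block, store its result, and return it.
  def fetch(key)
    return @data[key] if @data.key?(key)
    @data[key] = yield
  end
end

CACHE = TinyCache.new
COMMENT_BODIES = ["nice post", "I disagree strongly"].freeze # pretend rows

# Manual version: read, check, compute, write, return.
def manual_word_count
  cached = CACHE.read("comment_total_word_count")
  return cached unless cached.nil?
  total = COMMENT_BODIES.inject(0) { |sum, body| sum + body.split.size }
  CACHE.write("comment_total_word_count", total)
  total
end

# fetch version: the key appears once and the block only runs on a miss.
def fetched_word_count
  CACHE.fetch("comment_total_word_count") do
    COMMENT_BODIES.inject(0) { |sum, body| sum + body.split.size }
  end
end
```

Both methods return the same number; the second just has fewer places to get the key or the nil check wrong.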
One small note: you may notice that we’ve added comment_ to the front of our
key. This is because we have a total_word_count for both Articles and
Comments, and if they shared a key, we’d get the wrong answer.
There’s one other tricky bit with the cache. Check out the #most_popular
method:
This stores an Article object in our cache. That won’t work. You should try to
only store primitive objects into the cache. So, we have to do this:
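As a rough illustration of the idea (not the Blogger app’s actual code), this sketch caches only the id of the most popular article and looks the record up on every call. ArticleRow, ARTICLES, and STORE are hypothetical stand-ins for the model, the table, and the cache:

```ruby
# Hypothetical stand-ins: a Struct for the model, Hashes for the table and cache.
ArticleRow = Struct.new(:id, :title, :comment_count)
ARTICLES = {
  1 => ArticleRow.new(1, "Hello", 3),
  2 => ArticleRow.new(2, "Caching", 9)
}
STORE = {} # plays the role of the cache

def most_popular
  # Cache only the integer id, which is cheap and safe to serialize...
  id = STORE["article_most_popular"] ||= ARTICLES.values.max_by(&:comment_count).id
  # ...and look the record up fresh on every call.
  ARTICLES[id]
end

puts most_popular.title
```

Because only the id is cached, edits to the record itself (a new title, say) show up right away; only the choice of which article is most popular can go stale.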
Go ahead and implement these tactics for all three of our methods that need caching. When you hit the server, the very first page load should be slow, and then all the rest should be fast. Restart your server, and it should still be fast. Great!
However, that first page load is still slow. Not cool. And if we add a new
Article or Comment, it doesn’t update. Bummer. Luckily, these two problems
have the same solution.
Cache expiration
We need a strategy to recalculate our cached values. The simplest one is to invalidate our cache whenever the data changes. This approach is really simple:
Now, when we make a new Article or Comment (or update it), the cache gets
blown away. Check it out:
Obviously, this is a bit heavy-handed: the first request after each update or creation will now be super slow, but at least we’re correct. Also, we’re blowing away the entire cache every time; if we were caching other values, this would get rid of them too.
So, to do this right:
- We need the list of keys that we need to invalidate.
- We need to only remove those keys when we update the correct models.
On top of that, this still means that we have one slow request per creation or update. So, to fix that, instead of removing the cache, we need to update it with the new value. To do that:
- We need to keep track of which calculation goes with which key, and update accordingly.
- We need to know which calculations depend on each other. For example, the total words calculation relies on both Articles and Comments, but the most popular article calculation only worries about Articles.
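A minimal sketch of what tracking those dependencies by hand might look like. The key names and the DEPENDENT_KEYS table are assumptions for illustration, and a plain Hash plays the role of the cache:

```ruby
# Which cached keys depend on which model; the names are illustrative only.
DEPENDENT_KEYS = {
  article: ["total_word_count", "article_most_popular"],
  comment: ["total_word_count"]
}.freeze

# Remove only the keys that depend on the model that changed.
def expire_for(cache, model)
  DEPENDENT_KEYS.fetch(model).each { |key| cache.delete(key) }
  cache
end

cache = {
  "total_word_count"     => 1200,
  "article_most_popular" => 799,
  "unrelated"            => "kept"
}
expire_for(cache, :comment)
puts cache.keys.inspect
```

Even in this toy version, every new cached value means another entry to keep in sync by hand.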
Caching is hard.
If you’re an experienced Rails dev, you might already be crafting a DSL in your
mind and typing bundle gem awesome_cache into your terminal. Stop! There’s
actually a better way!
Key-based cache expiration
What if I told you that Rails could handle all these details for you? Enter key-based cache expiration. Here’s the lowdown.
- The cache is append-only. What this means is that we never change the value of the cache, but change the key instead. You’ll see why this matters in a moment.
- The key is calculated based on the object that the cache is based on. So, when the object changes, the key changes. That’s why it’s append-only: when it needs to be expired, we get a different key.
- You might be thinking, "If we never delete things, won’t we just run out of memory?" Many key/value stores that are used as caches automatically evict older entries, so we just don’t care; we let the store handle that. They do this based on a ‘least recently used’ algorithm.
- You can nest objects, and that ties their keys (and therefore, their values) together. So if I’m caching Posts and their Comments, and I nest Comments inside of Posts, then when I get a new Comment, the Post’s cache gets invalidated.
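Here’s a toy version of the idea in plain Ruby. The Post struct, its cache_key format, and the FRAGMENTS Hash are all stand-ins invented for this sketch; Rails builds its keys differently in detail, but the principle of tying the key to the record’s updated_at is the same:

```ruby
# Hypothetical record: the key is derived from the id and the last update time.
Post = Struct.new(:id, :title, :updated_at) do
  def cache_key
    "posts/#{id}-#{updated_at.to_i}"
  end
end

FRAGMENTS = {} # append-only: entries are added, never overwritten

def render_post(post)
  FRAGMENTS[post.cache_key] ||= "<h1>#{post.title}</h1>"
end

post = Post.new(1, "Hello", Time.at(1_000))
render_post(post)

post.title = "Hello, world"
post.updated_at = Time.at(2_000) # roughly what saving the record would do
fresh = render_post(post)

puts FRAGMENTS.keys.inspect
```

The old entry is still sitting in the Hash, but nothing will ever ask for its key again, which is why eviction can be left to the store.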
That’s it! Let’s try it with the ‘all articles’ page first. Start up your
server and hit http://localhost:3000/articles in your browser.
Brutal! We have to load a thousand articles, a hundred tags, and count all the comments… mega slow.
To begin, we have to add the cache_digests gem to our Gemfile, then
bundle:
If we were in Rails 4, we wouldn’t need to do this.
Anyway, the first thing we need to do is modify our associations to touch
their parent objects. For example, in app/models/article.rb:
This is needed on the belongs_to side of associations, as children don’t
need to be updated when their parent is. The Article, Comment, and
Tagging models all need to be updated.
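To picture what touching does, here’s a small simulation in plain Ruby. BlogArticle, BlogComment, and the explicit now argument are hypothetical stand-ins; in the real app the :touch option on the association does this work when a child record is saved:

```ruby
# Hypothetical parent record whose cache key depends on updated_at.
BlogArticle = Struct.new(:id, :updated_at) do
  def touch(now)
    self.updated_at = now
  end

  def cache_key
    "articles/#{id}-#{updated_at.to_i}"
  end
end

# Hypothetical child record that touches its parent when saved.
BlogComment = Struct.new(:article, :body) do
  def save(now)
    article.touch(now) # bump the parent's timestamp
    self
  end
end

article = BlogArticle.new(799, Time.at(1_000))
before  = article.cache_key
BlogComment.new(article, "First!").save(Time.at(2_000))
after   = article.cache_key

puts before, after
```

Saving the child changes the parent’s key, so anything cached under the old key is simply never read again.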
Next, we need to enable caching in development by modifying
config/environments/development.rb:
Normally, this would be false, but since we’re experimenting with caching, we want it on.
Then, we need to actually add caching to our view. Modify
app/views/articles/index.html.erb to use a cache block:
Now, each of these blocks will have a cache based on their object, and if we
modify one of them, only its cache will be invalidated. Try it out: open up
your browser, hit http://localhost:3000, and then refresh. As before, the
first hit should be slow, but after that, it should be snappy.
If we examine Redis, we can see all of the keys in there, too:
Nice! We still have that bad first page load, but rather than blow away the
entire cache, a change only blows away the affected portion of it, so that’s a nice
improvement. Let’s try it out on the show page for an Article. Find one with
a lot of comments, or just add a bunch of comments to one in IRB. Mine is #799,
so I opened up http://localhost:3000/articles/799 in my browser. It has 15
comments:
Not too shabby. Let’s examine the show view; it’s in
app/views/articles/show.html.erb:
We need to tell Rails two things:
- We want to cache all of this based on the Article.
- We want to nest in a cache for the Comments.
The first part is easy:
If you refresh the page a few times, you’ll see the cache warm up and things will get snappy.
Nice. Let’s try adding a comment by using a form at the bottom. Fill it out…
and hit submit…
Nice! See that ‘Read fragment’ and ‘write fragment’? Because of our :touch,
when the Comment was created, it updated the timestamp on the Article,
which updated the cache.
There’s one big problem with this, though. Since we’ve modified our Article’s
updated_at, it now shows that it was last modified now. That didn’t really
happen. You might not care this time, but for some applications, this isn’t
great. Wouldn’t there be a better way?
Turns out there is! We can just nest the cache blocks and then :touch isn’t
needed. Let’s remove them from the models, and then try adding another comment:
Oh no! It still says 16 total comments, and this troll-y comment I left before is the ‘newest’ one. But I added another! Where’d it go?
Well, because our updated_at wasn’t modified for @article, we used the
old cache and didn’t update to the new one. Bummer. So what to do? We can
see the nested dependencies with this rake task:
Nothing. Let’s fix that. Change the view template a bit:
and make a new partial (in app/views/comments/_comment.html.erb):
Let’s examine those dependencies again:
Great, so now Rails will know that we rely on this partial as well. Rails will cache each one individually, as well as tie them to the greater view. Awesome.
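One way to picture what that dependency listing buys us: as I understand it, cache_digests folds a digest of the template, and of the partials it depends on, into the fragment’s cache key, so editing a partial also expires the fragments of views that render it. A rough sketch of that idea, with made-up template sources and a hand-written dependency table (not the gem’s actual code):

```ruby
require "digest/md5"

# Hypothetical template sources, keyed by path; real views live on disk.
TEMPLATES = {
  "articles/show"     => "<h1>title</h1> render comments",
  "comments/_comment" => "<p>body</p>"
}

# Which templates each template renders, written by hand for this sketch.
DEPENDENCIES = { "articles/show" => ["comments/_comment"] }

# Digest of a template's own source plus the digests of everything it renders.
def template_digest(name)
  nested = DEPENDENCIES.fetch(name, []).map { |dep| template_digest(dep) }
  Digest::MD5.hexdigest(TEMPLATES.fetch(name) + nested.join)
end

before = template_digest("articles/show")
TEMPLATES["comments/_comment"] = "<p class='comment'>body</p>" # edit the partial
after = template_digest("articles/show")

puts before == after
```

Changing only the partial changes the digest of the view that renders it, which is the behavior you want from nested fragments.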
Hit refresh, and you should see 17 (or whatever number you had +1) comments. Awesome.
One last problem
In the case of our dashboard, we can’t simply use cache_digests because
the objects are simple numbers, not objects with an updated_at. Fixing
this problem is currently left as an advanced exercise.
More Resources
- Memoization
- How Key-based Cache Expiration Works
- cache_digests gem
- The ‘caching’ branch of blogger_advanced. Commits roughly correspond to sections of this tutorial.