There are two hard things in computer science: naming things, cache invalidation, and off-by-one errors. Let’s talk about that second one.
Caching is difficult to get right, but Rails provides you with some excellent tools to help make it easy. In this tutorial, we’re going to cover how to add caching to your Rails application to make it super snappy.
Get the Blogger project from GitHub and run the setup steps:
All existing tests should pass. Optionally, keep the tests running continuously while you develop.
Simple Data Caching
Computers compute. Sometimes, they do a lot of computing. Sometimes, they need to do so much computing that it takes a really, really long time. Often, that’s not acceptable: we want our sites to be fast. So what to do?
The answer is a cache.
We’re working with the blog application you may have used in previous tutorials, but if you haven’t, it’s pretty straightforward: Articles that have Comments and Tags. We’re going to add some caching to the admin dashboard to improve its performance.
"If you can not measure it, you can not improve it." - Lord Kelvin
When you have a web application, statistics on its usage are nice to have. They can help you optimize conversions, plan new content based on what’s been popular in the past, and tons of other things.
Let’s load up the site and check it out:
If you open http://localhost:3000 in your browser, you’ll see something like this:
That part at the bottom is what we’ll be improving upon. We collect a number
of statistics about the articles, comments, and the number of words in the sum
of them. This page renders pretty quickly at the moment, but as the number of
articles and comments goes up, it can get really slow. Let’s change things so
that we can see this difference. Open up
db/seeds.rb and up the numbers:
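The exact constants aren’t reproduced here, but the idea is simply to crank up numbers along these lines (names and values are guesses):

```ruby
# db/seeds.rb: bigger numbers mean a slower, more realistic dashboard
article_count        = 1000
comments_per_article = 10
```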
Then, re-build the database:
Now load up the site again, and it should feel… slow.
Five seconds. Hit refresh. Another five seconds. This situation is unacceptable, so let’s fix it.
The simplest thing: instance variable memoization
The easiest possible thing that we can do is to Just Use Ruby. Let’s check
DashboardController, where the calculation is done:
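The controller isn’t reproduced here; based on the description, its shape is roughly this (method and variable names are guesses):

```ruby
# Hypothetical sketch of app/controllers/dashboard_controller.rb
class DashboardController < ApplicationController
  def show
    # All the heavy lifting is delegated to the models
    @article_count    = Article.count
    @comment_count    = Comment.count
    @total_word_count = Article.total_word_count + Comment.total_word_count
    @most_popular     = Article.most_popular
  end
end
```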
All the logic, captured in the models. Nice. Let’s check out the word count calculation:
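The method isn’t shown here; from the description, it’s roughly this (a sketch, with the word-counting detail a guess):

```ruby
# app/models/article.rb: naive version, loads every Article on each call
def self.total_word_count
  all.sum { |article| article.body.split.size }
end
```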
So we load up all the
Articles, and loop through them. This works, but
of course will be super slow: we load up every Article on each request! We
can use memoization to improve this situation. Memoization is a technique where
a method is made faster by not repeating calculations that were previously
done. In Ruby, this is most commonly accomplished through instance variables:
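Since the original snippet isn’t reproduced here, here’s a self-contained sketch of the `||=` pattern; the class and data are made up for illustration:

```ruby
class Dashboard
  attr_reader :calculations

  def initialize(article_bodies)
    @article_bodies = article_bodies
    @calculations = 0 # counts how many times we actually compute
  end

  def total_word_count
    # ||= only runs the right-hand side when @total_word_count is nil
    @total_word_count ||= begin
      @calculations += 1
      @article_bodies.sum { |body| body.split.size }
    end
  end
end

dashboard = Dashboard.new(["hello world", "caching is hard"])
dashboard.total_word_count # => 5
dashboard.total_word_count # => 5, without recomputing
dashboard.calculations     # => 1
```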
Now, the first time we use
total_word_count, it will save the answer into the
@total_word_count variable, and due to the
||=, we won’t re-perform the
calculation on subsequent calls.
Go ahead and do the same thing for the other methods called in the controller.
Let’s test this out: start up your server again with
rails s and load the
page, then hit refresh:
Whoah! Here’s my diff:
Just three lines changed, adding three variables, and we went from 4,000 ms to 43 ms! So we’re done, right?
Not so fast. The first page load still takes four seconds. And it’ll take four seconds every time we restart our server, as well. That’s often not good enough. And since our cached value never expires, it will also be wrong as soon as we add a new article or comment. What we need are two things:
- A way to persist our cached value across restarts of the server.
- An ‘expiration strategy’ to update the value in the cache when things change.
Luckily, people smarter than you or me have thought about this problem. There
are multiple bits of software called ‘key/value stores’ that can tackle this
exact situation. From the name, you can infer that a key/value store… stores
keys and values. You’ve already used keys and values in Ruby, with Hashes.
So basically, a key/value store is like a giant, persistent Ruby Hash. In
this section, we’ll explore Redis, which is an excellent key/value store.
You need to follow the instructions for installing and configuring Redis to follow this part of the tutorial.
There are multiple key/value stores, and so Rails provides an interface to use any one you want. Just remember the Rails API, and you can use Redis, Memcached, or another store that you may fancy.
The API is quite simple. Here’s how you’d save a value in a Ruby Hash:
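For example:

```ruby
cache = {}
cache["count"] = 42
cache["count"] # => 42
```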
Easy, right? Well, with Rails, you just do this:
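The calls look like this (the key name is arbitrary):

```ruby
Rails.cache.write("count", 42)
Rails.cache.read("count") # => 42
```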
Simple! If you check out the Redis console, you can see the value has been stored into Redis:
"ns:count" there shows we have a
count key in the
Now, we could use this to save things in our Rails application, but first, I want to show you a better syntax that you can use. If you were to implement it right now, I bet you’d do something like this:
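Probably a hand-rolled read/compute/write dance like this (a sketch; names and counting logic are guesses):

```ruby
def self.total_word_count
  value = Rails.cache.read("total_word_count")
  if value.nil?
    value = all.sum { |comment| comment.body.split.size }
    Rails.cache.write("total_word_count", value)
  end
  value
end
```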
Check to see if we have it cached, if not, calculate and store it, then return the answer. What’s the matter with this?
In a word: duplication. Ruby has a really convenient method on Hash for
exactly this, #fetch. Here, let me show you how it works:
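For instance:

```ruby
h = { "count" => 42 }

h.fetch("count") { 0 }   # => 42 (key present, block ignored)
h.fetch("missing") { 0 } # => 0 (block supplies a fallback value)
h.key?("missing")        # => false (fetch did NOT store the fallback)
```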
Neat! The block we pass into
#fetch gives us a value to return if there’s
no key in the hash. One bad part about #fetch, though: it doesn’t actually
save the value into our Hash. Bummer. However,
Rails.cache not only implements
#fetch, but also stores the value into the cache:
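A sketch of the behavior:

```ruby
Rails.cache.fetch("count") { 42 } # miss: runs the block, stores 42, returns 42
Rails.cache.fetch("count") { 99 } # hit: returns the cached 42; the block never runs
```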
Awesome! Now, we can write our method in a much simpler way:
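With fetch, the method collapses to something like this (a sketch; the counting logic is a guess):

```ruby
# app/models/comment.rb
def self.total_word_count
  Rails.cache.fetch("comment_total_word_count") do
    all.sum { |comment| comment.body.split.size }
  end
end
```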
Look at that! Way nicer. We don’t need to repeat the key, we don’t need to check the value, and it’s all nice and clear. Easy!
One small note: you may notice that we’ve added comment_ to the front of our
key. This is because we have a total_word_count for both Articles and
Comments, and if they shared a key, we’d get the wrong answer.
There’s one other tricky bit with the cache. Check out the most popular article calculation:
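Presumably something like this, caching the whole record (Article.most_popular is assumed to be an existing model method):

```ruby
def most_popular
  Rails.cache.fetch("most_popular") do
    Article.most_popular
  end
end
```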
This stored an Article object in our cache. That won’t work: you should try
to store only primitive objects in the cache. So, we have to do this:
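That is, cache just the article’s id and look the record up fresh (a sketch; the key name is a guess):

```ruby
def most_popular
  article_id = Rails.cache.fetch("most_popular_id") do
    Article.most_popular.id
  end
  Article.find(article_id)
end
```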
Go ahead and implement these tactics for all three of our methods that need caching. When you hit the server, the very first page load should be slow, and then all the rest should be fast. Restart your server, and it should still be fast. Great!
However, that first page load is still slow. Not cool. And if we add a new
Comment, it doesn’t update. Bummer. Luckily, these two problems
have the same solution.
We need a strategy to recalculate our cached values. The simplest one is to invalidate our cache whenever the data changes. This method is really easy, and really simple:
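A sketch of the callback approach on the Comment model (Article would get an equivalent callback):

```ruby
class Comment < ActiveRecord::Base
  belongs_to :article

  after_save :expire_cache

  private

  def expire_cache
    # Heavy-handed: wipes every cached value, not just ours
    Rails.cache.clear
  end
end
```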
Now, when we make a new
Comment (or update it), the cache gets
blown away. Check it out:
Obviously, this is a bit heavy-handed: the first request after each update or creation will be super slow, but at least we’re now correct. Also, we’re blowing away the entire cache every time; if we were caching other values, this would get rid of them too.
So, to do this right:
- We need the list of keys that we need to invalidate.
- We need to only remove those keys when we update the correct models.
On top of that, this still means that we have one slow request per creation or update. So, to fix that, instead of removing the cache, we need to update it with the new value. To do that:
- We need to keep track of which calculation goes with which key, and update accordingly.
- We need to know which calculations depend on each other. For example, the total words calculation relies on both Articles and Comments, but the most popular article calculation only worries about Comments.
Caching is hard.
If you’re an experienced Rails dev, you might already be crafting a DSL in your
mind and typing
bundle gem awesome_cache into your terminal. Stop! There’s
actually a better way!
Key-based cache expiration
What if I told you that Rails could handle all these details for you? Enter key-based cache expiration. Here’s the lowdown.
- The cache is append-only. What this means is that we never change the value of the cache, but change the key instead. You’ll see why this matters in a moment.
- The key is calculated based on the object that the cache is based on. So, when the object changes, the key changes. That’s why it’s append-only: when it needs to be expired, we get a different key.
- You might be thinking, "If we never delete things, won’t we just run out of memory?" Many key/value stores that are used as caches automatically evict older entries, usually with a ‘least recently used’ algorithm, so we just don’t care; we let the store handle that.
- You can nest objects, and that ties their keys (and therefore, their values) together. So if I’m caching Posts and their Comments, and I nest Comments inside of Posts, then when I add a new Comment, the parent Post’s cache gets invalidated.
That’s it! Let’s try it with the ‘all articles’ page first. Start up your
server and hit
http://localhost:3000/articles in your browser.
Brutal! We have to load a thousand articles, a hundred tags, and count all the comments… mega slow.
To begin, we have to add the cache_digests gem to our Gemfile, then bundle.
If we were on Rails 4, we wouldn’t need to do this.
Anyway, the first thing we need to do is modify our associations to touch
their parent objects. This is needed on the belongs_to side of associations,
as children don’t need to be updated when their parent is. The Comment and
Tagging models both need to be updated.
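For instance, the Comment association might pick up a touch option (a sketch):

```ruby
# app/models/comment.rb
class Comment < ActiveRecord::Base
  belongs_to :article, touch: true
end
```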
Next, we need to enable caching in development by modifying the development environment config:
Normally, this would be false, but since we’re experimenting with caching, we want it on.
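The relevant flag is Action Controller’s perform_caching setting:

```ruby
# config/environments/development.rb
config.action_controller.perform_caching = true
```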
Then, we need to actually add caching to our view. Modify
app/views/articles/index.html.erb to use a cache block:
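A sketch of what the cached loop might look like (the markup inside each block is a guess):

```erb
<% @articles.each do |article| %>
  <% cache article do %>
    <h2><%= link_to article.title, article_path(article) %></h2>
    <p><%= pluralize article.comments.count, "Comment" %></p>
  <% end %>
<% end %>
```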
Now, each of these blocks will have a cache based on their object, and if we
modify one of them, only its cache will be invalidated. Try it out: open up
your browser, hit
http://localhost:3000, and then refresh. As before, the
first hit should be slow, but after that, it should be snappy.
If we examine Redis, we can see all of the keys in there, too:
Nice! We still have that bad first page load, but rather than blowing away the
entire cache, a change only blows away its own portion of the cache, so that’s
a nice improvement. Let’s try it out on the show page for an Article. Find one
with a lot of comments, or just add a bunch of comments to one in IRB. Mine is
#799, so I opened up http://localhost:3000/articles/799 in my browser. It has
15 comments. Not too shabby. Let’s examine the show view:
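The full template isn’t reproduced here; structurally, it’s something like this (markup and attribute names are guesses):

```erb
<h1><%= @article.title %></h1>
<%= @article.body %>

<h3><%= pluralize @article.comments.count, "Comment" %></h3>
<% @article.comments.each do |comment| %>
  <p><%= comment.author_name %> said:</p>
  <%= comment.body %>
<% end %>
```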
We need to tell Rails two things:
- We want to cache all of this based on the Article.
- We want to nest in a cache for the Comments.
The first part is easy:
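Wrapping the template in a cache block keyed on the article (a sketch):

```erb
<% cache @article do %>
  <h1><%= @article.title %></h1>
  <%= @article.body %>
  <%# ... comments and the rest of the view ... %>
<% end %>
```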
If you refresh the page a few times, you’ll see the cache warm up and things will get snappy.
Nice. Let’s try adding a comment by using a form at the bottom. Fill it out…
and hit submit…
Nice! If you watch the log, you’ll see a ‘Read fragment’ and a ‘Write
fragment’. Because of our touch option, when the Comment was created, it
updated the timestamp on the Article, which updated the cache.
There’s one big problem with this, though. Since we’ve modified our Article’s
updated_at, the article now shows that it was last modified just now. That
didn’t really happen. You might not care this time, but for some applications,
this isn’t great. Isn’t there a better way?
Turns out there is! We can just nest the cache blocks and let Rails expire
things as needed. Let’s remove the touch options from the models, and then try
adding another comment:
Oh no! It still says 16 total comments, and this troll-y comment I left before is the ‘newest’ one. But I added another! Where’d it go?
Well, because our @article’s updated_at wasn’t modified, we used the old cache
and didn’t update to the new one. Bummer. So what to do? We can see the nested
dependencies with the rake task that cache_digests provides:
rake cache_digests:nested_dependencies TEMPLATE=articles/show
Nothing. Let’s fix that. Change the view template a bit:
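The change is to render comments through a partial, so the digest tracker can see the dependency (a sketch):

```erb
<%= render partial: "comment", collection: @article.comments %>
```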
and make a new partial for individual comments:
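The partial wraps each comment in its own nested cache block; markup and attribute names are guesses:

```erb
<% cache comment do %>
  <p><%= comment.author_name %> said:</p>
  <%= comment.body %>
<% end %>
```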
Let’s examine those dependencies again:
Great, so now Rails will know that we rely on this partial as well. Rails will cache each one individually, as well as tie them to the greater view. Awesome.
Hit refresh, and you should see 17 (or whatever number you had +1) comments. Awesome.
One last problem
In the case of our dashboard, we can’t simply use key-based expiration: the
values are simple numbers, not objects with an updated_at. Solving this
problem is currently left as an advanced exercise.