Rules-based Cache Management

Jay, Director of Development
April 1, 2011

For some clients, caching is a dirty word. It's mysterious voodoo magic that runs behind the scenes and means their content is never updated when they think it will be. Part of the problem is that strictly time-based caches are essentially arbitrary: why expire a page every 15 minutes if its content hasn't been updated?

Today we'll look at one approach to keeping site admins and editors happy: on-demand cache refreshing with the Rules module and the Cache Actions module.

Our Cache Setup

First, let's discuss the caching I might use on a high-traffic site. Pressflow plus Varnish is great for anonymous traffic, and we use them on any medium-to-large website. Ideally we'd keep Varnish's copy of a particular page as long as possible rather than arbitrarily refreshing it all the time. For instance, if we set the minimum cache lifetime to 5 minutes, any page requested less often than once every 5 minutes will have expired before its next visitor arrives, so on some sites there's a good chance you're still serving a lot of uncached pages.
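To make the 5-minute example concrete, here is a toy model that counts cache misses for a single page. It is written in Python purely for illustration and is not Drupal or Varnish code. A time-based lifetime is modelled with `ttl`, and a "cache until something changes" policy is modelled with `ttl=None` plus a list of edit times. The function name and request pattern are assumptions made for the sketch.

```python
def count_misses(request_times, ttl=None, invalidations=()):
    """Count cache misses for a single page.

    request_times -- sorted request timestamps, in seconds
    ttl           -- cache lifetime in seconds, or None to keep a copy until invalidated
    invalidations -- timestamps at which the page's content actually changed
    """
    changes = sorted(invalidations)
    next_change = 0
    cached_at = None  # when the current copy was stored; None means nothing is cached
    misses = 0
    for t in request_times:
        # An edit that happened before this request throws the cached copy away.
        while next_change < len(changes) and changes[next_change] <= t:
            cached_at = None
            next_change += 1
        expired = cached_at is not None and ttl is not None and t - cached_at >= ttl
        if cached_at is None or expired:
            misses += 1  # rebuild the page and cache it
            cached_at = t
    return misses


# One request every 10 minutes for an hour.
requests = list(range(0, 3600, 600))
print(count_misses(requests, ttl=300))               # 6: every request misses
print(count_misses(requests))                        # 1: only the first request misses
print(count_misses(requests, invalidations=[1500]))  # 2: one extra miss after the edit
```

With a 5-minute lifetime, a page visited every 10 minutes never gets a cache hit. Event-driven invalidation rebuilds the page only when its content actually changes.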

Now, for authenticated traffic I really like Authcache*. Authcache lets you use either Memcache or the database's cache_page bin. Memcache is much faster and scales higher, but pages for authenticated traffic need to be refreshed more promptly, and the database usually gives us more control over when the cache gets refreshed. So we typically use the database with Authcache.

That covers page-level caching, but it's also beneficial to use contributed modules' native caching for individual elements on a page. Most notably, Panels and Views both have caching that can speed up page execution when a page does have to be served uncached. The Cache Actions module lets you set the cache for Views and Panels to 'Rules-based cache', which essentially means they are cached indefinitely until a rule explicitly clears them. That is the approach I describe below.
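A "Rules-based cache" setting effectively turns a cache bin into one with no expiry: entries stay until something clears them. The sketch below models that behaviour in Python. It is not Cache Actions' actual implementation. The `wildcard` option loosely mirrors the prefix-clearing idea behind Drupal's `cache_clear_all()`, and the `views:...` cache ids are hypothetical.

```python
class RulesBasedCache:
    """A cache bin whose entries never expire on a timer; only explicit clears remove them."""

    def __init__(self):
        self._entries = {}

    def get(self, cid):
        return self._entries.get(cid)

    def set(self, cid, data):
        self._entries[cid] = data

    def clear(self, cid, wildcard=False):
        """Drop one entry, or every entry whose id starts with `cid` when wildcard is True."""
        if wildcard:
            for key in [k for k in self._entries if k.startswith(cid)]:
                del self._entries[key]
        else:
            self._entries.pop(cid, None)


def cached_render(cache, cid, render):
    """Return cached output for `cid`, calling `render()` only on a miss."""
    data = cache.get(cid)
    if data is None:
        data = render()
        cache.set(cid, data)
    return data


renders = []


def render_blog_view():
    renders.append(1)  # count how often the expensive render actually runs
    return "<ul>...</ul>"


views_bin = RulesBasedCache()
cached_render(views_bin, "views:blog_listing:page_1", render_blog_view)
cached_render(views_bin, "views:blog_listing:page_1", render_blog_view)
print(len(renders))  # 1: the second call is served from cache
views_bin.clear("views:blog_listing:", wildcard=True)  # what a Rules action would trigger
cached_render(views_bin, "views:blog_listing:page_1", render_blog_view)
print(len(renders))  # 2: rebuilt once after the explicit clear
```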

Cache Management Rules

The Rules module allows developers to set up 'events' (such as a node save) that trigger system actions like sending an email, displaying a message to the user, or, in our case, clearing some cache bins. Used in combination with Rules, the Cache Actions module clears various cache bins, including those for Views and Panels, when a Rules event fires. Cache Actions doesn't include everything I need, though, so I have supplemented it with some custom logic.
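The event, condition and action flow described above can be modelled in a few lines. This is a toy Python dispatcher, not the Rules module's API. The event name `node_update`, the dict-backed views cache and the helper names are illustrative assumptions.

```python
from collections import defaultdict


class RuleEngine:
    """Toy model of Rules: an event triggers every rule whose condition passes."""

    def __init__(self):
        self._rules = defaultdict(list)  # event name -> [(condition, action), ...]

    def add_rule(self, event, action, condition=None):
        self._rules[event].append((condition, action))

    def fire(self, event, context):
        for condition, action in self._rules[event]:
            if condition is None or condition(context):
                action(context)


def clear_cache_action(cache_bin, cids):
    """Build an action that removes the given ids from a dict-backed cache bin."""
    def action(context):
        for cid in cids:
            cache_bin.pop(cid, None)
    return action


views_cache = {"front_page": "<cached>", "blog_listing": "<cached>", "events": "<cached>"}

engine = RuleEngine()
engine.add_rule(
    "node_update",
    clear_cache_action(views_cache, ["front_page", "blog_listing"]),
    condition=lambda ctx: ctx["node"]["type"] == "blog",
)

engine.fire("node_update", {"node": {"type": "page"}})  # condition fails, nothing cleared
engine.fire("node_update", {"node": {"type": "blog"}})  # clears the two blog-related views
print(sorted(views_cache))  # ['events']
```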

First, I created a new custom event called "nodequeue save" that taps into Nodequeue's save event to trigger cache clearing. I also wanted to clear page-level caches, so I created a Rules action that wipes out the associated page's entries in both cache_page and Varnish. For Varnish, the action performs a curl request that tells Varnish to purge that specific cached page.
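The Varnish half of that action amounts to sending one HTTP `PURGE` request for the page's URL, the same request `curl -X PURGE <url>` sends. Below is a minimal Python sketch using the standard library. It assumes your VCL has been set up to handle `PURGE` (Varnish's default VCL does not) and that the sending host is allowed to purge. The function names and example URL are hypothetical, and clearing the matching `cache_page` row is not shown.

```python
import urllib.request


def build_purge_request(base_url, path):
    """Build the request `curl -X PURGE <url>` would send for one page."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, method="PURGE")


def purge_page(base_url, path, timeout=5):
    """Ask Varnish to drop its cached copy of `path` and return the HTTP status code.

    Assumes the VCL handles PURGE and that this host is permitted to send it;
    urlopen() raises HTTPError if Varnish refuses the request.
    """
    request = build_purge_request(base_url, path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


req = build_purge_request("http://www.example.com/", "/blog/archive")
print(req.get_method(), req.full_url)  # PURGE http://www.example.com/blog/archive
```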

In my custom cache page manager module (working title), I've also made a node page's cache clear automatically whenever the node is saved or a comment on it is published. For node pages I use hook_comment() and hook_nodeapi() (Drupal 6.x) instead of Rules.
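The logic behind those two hooks is small: when a node is saved, or a comment on it is published, drop every page-cache entry that shows that node. The Python sketch below is a language-neutral model of that idea, not the module's PHP. It assumes page-cache ids are absolute URLs, which is how Drupal 6 keys `cache_page`. The `alias` field and the function names are hypothetical. In Drupal 6 the triggers would be the `insert`/`update` ops of `hook_nodeapi()` and the `publish` op of `hook_comment()`.

```python
def node_page_cids(base_url, node):
    """Page-cache ids (absolute URLs) under which a node's page can be cached."""
    base = base_url.rstrip("/")
    cids = [f"{base}/node/{node['nid']}"]
    if node.get("alias"):  # hypothetical path alias such as "blog/my-post"
        cids.append(f"{base}/{node['alias']}")
    return cids


def on_node_saved(page_cache, base_url, node):
    """Analogue of reacting to hook_nodeapi()'s 'insert'/'update' ops."""
    for cid in node_page_cids(base_url, node):
        page_cache.pop(cid, None)


def on_comment_published(page_cache, base_url, comment, load_node):
    """Analogue of hook_comment()'s 'publish' op: refresh the parent node's page."""
    on_node_saved(page_cache, base_url, load_node(comment["nid"]))


nodes = {7: {"nid": 7, "alias": "blog/my-post"}, 8: {"nid": 8}}
page_cache = {
    "http://example.com/node/7": "<html>old</html>",
    "http://example.com/blog/my-post": "<html>old</html>",
    "http://example.com/node/8": "<html>ok</html>",
}
on_comment_published(page_cache, "http://example.com", {"nid": 7}, nodes.get)
print(sorted(page_cache))  # ['http://example.com/node/8']
```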

With node detail pages handled, I go through every major section of the site and construct rules that update that section's cache whenever something displayed on it changes. In effect, each page refreshes on demand. To find those dependencies, I look at every content type and note where that content appears around the website.

Say, for example, you have a blog node that appears as a teaser on the homepage, in a blog section, in a blog archive and maybe a few other places on the site. I would write a rule for blog saves (actually two rules: one for new content, one for updates) that clears the cache for the homepage, the blog section and anywhere else a blog might appear. If that blog appears in a view, I clear that specific view's cache. Large websites also typically have major "section fronts" that need to stay current, and this approach helps with that too.
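One way to keep those rules manageable is to write the "where does this content type appear" audit down as data and generate the insert and update rules from it. The Python sketch below shows that idea. The dependency map, the event names `node_insert`/`node_update` and the view names are all hypothetical, and the page and view caches are modelled as sets of cached keys.

```python
# Hypothetical dependency map: where each content type appears on the site.
# Pages are paths ("" is the homepage); views are View names whose caches to clear.
DEPENDENCIES = {
    "blog": {
        "pages": ["", "blog", "blog/archive"],
        "views": ["blog_listing", "homepage_teasers"],
    },
}

# New and updated content fire different events, hence two rules per content type.
NODE_EVENTS = ("node_insert", "node_update")


def build_rules(dependencies):
    """Return {(event, node_type): (pages, views)} covering every content type."""
    rules = {}
    for node_type, deps in dependencies.items():
        for event in NODE_EVENTS:
            rules[(event, node_type)] = (tuple(deps["pages"]), tuple(deps["views"]))
    return rules


def handle_event(rules, event, node, page_cache, views_cache):
    """Clear every page and view that shows this node's type; return what was targeted."""
    pages, views = rules.get((event, node["type"]), ((), ()))
    for path in pages:
        page_cache.discard(path)
    for view in views:
        views_cache.discard(view)
    return pages, views


rules = build_rules(DEPENDENCIES)
page_cache = {"", "blog", "blog/archive", "about"}
views_cache = {"blog_listing", "homepage_teasers", "events_calendar"}
handle_event(rules, "node_insert", {"type": "blog"}, page_cache, views_cache)
print(sorted(page_cache), sorted(views_cache))  # ['about'] ['events_calendar']
```

Keeping the map in one place means adding a new listing page or view is a one-line change, rather than editing two hand-built rules per content type.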

As you might imagine, this takes some work to configure. Depending on the size of your site, creating rules for cache management could take a while.

If you go this route, though, you're left with a lot of long-lived caches hanging around. That's a good thing, because some content is updated far more often than other content. This kind of setup responds to specific 'events' and updates caches accordingly instead of blindly expiring them on a timer. While this solution might not work for every site, it is a strategy worth considering.

---------------
*Authcache with Pressflow requires a workaround that I helped discover. You can read about it on the Authcache project homepage.
