Caching Strategies for Scaling Drupal: Foundations

Regardless of the purpose of your Drupal site, it is important that the site be reliably available and performant for your users. For those of us with limited resources at our disposal, it isn’t feasible to scale up hardware indefinitely, so we need to get more out of the hardware we already have.

Thankfully, Drupal provides us with a number of tools in core, and even more in the contrib community, that make caching accessible to even the least technical amongst us. Let’s walk through the basics of the Drupal cache and discuss the importance of properly configuring cache with the goal of avoiding common missteps.

Anatomy of a web request

Let’s say you have a Drupal site, freshly minted from the most recent release of Drupal core, and you’ve built out a basic site with content, users, and a theme. When someone visits that site there are a number of things Drupal needs to do in order to deliver a response to that user. It is important to keep in mind that Drupal builds pages dynamically, which gives you great flexibility when customizing sections of your site based on path, user roles, product categories, or any other criteria.

That customization comes at a cost, however, because Drupal has to make calls to the database and use server memory to execute PHP functions. There is, of course, a limit to the number of concurrent PHP processes a server can run at any given time, and this is the most common bottleneck in uncached Drupal performance. When all PHP processes are tied up, Apache must hold each new request until a process becomes free, so requests queue up and response times climb. Eventually, when those queues grow too long, requests will time out, which is what we want to avoid.
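To make that bottleneck concrete, here is a back-of-the-envelope sketch in Python. The worker count, build time, and traffic figures are made-up numbers for illustration, not measurements from a real Drupal server, and the function names are hypothetical.

```python
def max_requests_per_second(php_workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when every request occupies one PHP process."""
    if php_workers <= 0 or seconds_per_request <= 0:
        raise ValueError("worker count and request time must be positive")
    return php_workers / seconds_per_request


def backlog_growth_per_second(
    arrival_rate: float, php_workers: int, seconds_per_request: float
) -> float:
    """How quickly waiting requests pile up; 0.0 when capacity keeps pace."""
    capacity = max_requests_per_second(php_workers, seconds_per_request)
    return max(0.0, arrival_rate - capacity)


# Assumed figures: 20 PHP processes, 0.5 seconds to build one uncached page,
# and 60 requests arriving every second.
capacity = max_requests_per_second(20, 0.5)        # 40.0 requests per second
backlog = backlog_growth_per_second(60, 20, 0.5)   # 20.0 more waiting each second
print(capacity, backlog)
```

Under these assumptions the queue never drains: every second another 20 requests are left waiting, and sooner or later some of them time out.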

So, how do we avoid a rush on PHP processes if Drupal needs those resources to build pages? The first thing we should do is enable page caching to prevent Drupal from doing the same work twice, or two hundred times. Going back to the example above, when a user types your site’s address into a browser, the process to deliver a page to that user looks like this:

  1. DNS resolves your domain and the request is routed to your hosting provider.
  2. Drupal receives the request and boots up.
  3. Drupal connects to its database to pull together all of the elements of that page.
  4. PHP code turns the query results into an object which is passed to the theme engine.
  5. A themed page is rendered and delivered to the user.

Now, that’s a very simplified workflow, but it gives you an idea of what Drupal is doing in broad strokes. With our currently uncached site this workflow must be stepped through for every single request, even if two users make the exact same request, which is where caching can help us reduce the load on server resources.
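The cost of repeating that workflow can be sketched in a few lines of Python. This is a toy model, not Drupal code: the function names are hypothetical stand-ins for steps 2 through 5, and the counter simply tallies how often each step runs.

```python
from collections import Counter

work_done = Counter()  # tallies how often each expensive step runs


def bootstrap() -> None:
    """Stand-in for Drupal booting up (step 2)."""
    work_done["bootstrap"] += 1


def query_database(path: str) -> dict:
    """Stand-in for the database queries that gather page elements (step 3)."""
    work_done["db_query"] += 1
    return {"path": path, "title": f"Content for {path}"}


def render_theme(page: dict) -> str:
    """Stand-in for building and theming the page (steps 4 and 5)."""
    work_done["render"] += 1
    return f"<html><h1>{page['title']}</h1></html>"


def handle_request(path: str) -> str:
    bootstrap()
    page = query_database(path)
    return render_theme(page)


# Two visitors ask for exactly the same page.
first = handle_request("/about")
second = handle_request("/about")
print(first == second, dict(work_done))
```

Both visitors receive identical markup, yet every step ran twice. That duplicated work is what page caching is meant to remove.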

Starting from scratch with Drupal cache

Let's lay out a scenario where an uncached Drupal site can be optimized, so redundant requests aren’t sent to origin (Apache, PHP, MySQL). We’ll start by enabling Drupal core’s cache settings, which are located at Administration > Configuration > Development > Performance. Here you’ll see a number of options in Drupal 7, while the Drupal 8 version of this page is a bit more straightforward.

In Drupal 7 you’ll see an option to enable page caching for anonymous users, an option to enable caching of blocks, and then fields to set the lifetime of cached items. Each site will have its own requirements around which TTLs (time to live settings) work best for its use case, but let’s start with caching enabled and a 5 minute setting for "Expiration of cached pages." With this configuration, when a request is made for an uncached page (like in the scenario above), a static copy is stored in MySQL once the page is rendered. Subsequent requests for that page over the next 5 minutes are served the static copy rather than having Drupal query, build, and render a new dynamic copy. This results in far fewer requests to origin until the TTL of the cached page expires. After the 5 minute cache TTL expires, the next request goes to origin and a new static copy is stored in cache.
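The TTL behavior described above can be sketched as a small Python class. This is a minimal illustration, not how Drupal implements its page cache: the `PageCache` class and `build_page` function are hypothetical, a dictionary stands in for the database table, and the clock is injected so the 5 minute window can be stepped through without waiting.

```python
from typing import Callable, Dict, List, Tuple


class PageCache:
    """Serve a stored copy of a page until its TTL expires, then rebuild it."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float]) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, html)

    def get_or_build(self, path: str, build: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]                 # cache hit: no trip to origin
        html = build(path)                  # cache miss or expired: go to origin
        self._store[path] = (now + self.ttl, html)
        return html


fake_now = [0.0]          # a controllable clock, in seconds
builds: List[str] = []    # records every trip to origin


def build_page(path: str) -> str:
    builds.append(path)
    return f"<html>{path} built at t={fake_now[0]:.0f}</html>"


cache = PageCache(ttl_seconds=300, clock=lambda: fake_now[0])

at_start = cache.get_or_build("/about", build_page)      # miss: page is built
fake_now[0] = 120
after_2_min = cache.get_or_build("/about", build_page)   # hit: stored copy served
fake_now[0] = 301
after_expiry = cache.get_or_build("/about", build_page)  # expired: page is rebuilt
print(len(builds))  # 2
```

Three requests produced only two builds, and any number of additional requests inside the 5 minute window would still cost just one.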

In Drupal 8, core caching options have been streamlined and fewer settings are needed to achieve the same results. You’ll now see a “Page cache maximum age” option that takes the place of the often-confusing TTL options in Drupal 7. Anonymous caching is on by default, so a checkbox is no longer needed. Similarly, the block caching architecture was changed for Drupal 8 and blocks now use the same caching system as pages, so there’s no need for added settings there.

The last options on the cache settings page for both Drupal 7 and 8 are the aggregation toggles for JS and CSS files. Enabling both settings is recommended, as this can reduce the number of calls made for individual assets and can result in smaller file sizes through the optimization that takes place during aggregation and caching.
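As a rough illustration of why aggregation helps, the Python sketch below combines two small stylesheets into one. It is a toy: the file names and contents are invented, and Drupal's real aggregation is more involved than stripping whitespace and concatenating.

```python
from typing import Dict


def aggregate(files: Dict[str, str]) -> str:
    """Concatenate assets in order, dropping blank lines and indentation."""
    parts = []
    for source in files.values():
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        parts.append("\n".join(lines))
    return "\n".join(parts)


# Hypothetical stylesheets that would otherwise be requested one by one.
css_files = {
    "layout.css": "body {\n  margin: 0;\n}\n\n",
    "theme.css": "h1 {\n  color: navy;\n}\n",
}

bundle = aggregate(css_files)
bytes_before = sum(len(source) for source in css_files.values())
bytes_after = len(bundle)
print(len(css_files), "requests become 1;", bytes_before, "bytes become", bytes_after)
```

The browser makes one request instead of two, and the combined file is slightly smaller than the originals added together.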

Beyond Drupal Core

Now that we’ve discussed basic cache configuration for Drupal, we should look at further optimizations that let our applications scale much further. Continuing the scenario above, the current limitation with scaling core caching is the database layer. Because Drupal’s cache is written to the database, the database must have capacity to handle administrative load (creating nodes, configuring Views, enabling modules, etc.) in addition to reading, writing, and updating cached objects. Therefore, our goal should be to move Drupal’s cache out of the database and into an external service. To take it a step further, we should look at other tools and systems to use as proxy caching layers that optimize responses for the majority of visitors.
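To see why moving the cache out of the database matters, consider the Python sketch below. Everything in it is a hypothetical stand-in: `Database` only counts operations, `DatabaseCache` mimics core's default of storing cache entries in the database, and `ExternalCache` represents a separate key-value cache service without naming any particular product.

```python
from typing import Callable, Dict, Optional


class Database:
    """Stand-in for the site's database; counts every operation it handles."""

    def __init__(self) -> None:
        self.operations = 0
        self.rows: Dict[str, str] = {}


class DatabaseCache:
    """Cache entries live in the database, so each read and write costs it work."""

    def __init__(self, db: Database) -> None:
        self.db = db

    def get(self, key: str) -> Optional[str]:
        self.db.operations += 1
        return self.db.rows.get(key)

    def set(self, key: str, value: str) -> None:
        self.db.operations += 1
        self.db.rows[key] = value


class ExternalCache:
    """Cache entries live in a separate service and never touch the database."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def serve(path: str, cache, db: Database) -> str:
    cached = cache.get(path)
    if cached is not None:
        return cached
    db.operations += 1  # the content query needed to build the page
    html = f"<html>{path}</html>"
    cache.set(path, html)
    return html


def database_load(make_cache: Callable[[Database], object], requests: int = 100) -> int:
    """Count database operations caused by repeated requests for one page."""
    db = Database()
    cache = make_cache(db)
    for _ in range(requests):
        serve("/about", cache, db)
    return db.operations


with_db_cache = database_load(lambda db: DatabaseCache(db))
with_external = database_load(lambda db: ExternalCache())
print(with_db_cache, with_external)  # 102 1
```

In this toy model, 100 requests for a cached page still cost the database 102 operations when the cache lives there, but only the single content query when the cache is external. That leaves the database free for administrative work.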

Each of those next steps is worthy of an extensive discussion of its own, which we plan to address in a follow-up article.