Coding with Cache Tags in Drupal 8

Cache tags are a game changer for your caching strategy in Drupal 8.

Expiry vs invalidation

Until Drupal 8, Drupal had only one caching strategy, called cache expiration: computed output was cached for a fixed period of time (e.g. 1 hour). This approach has two downsides:

  1. You could not see newer content until the expiration time (time-to-live, or TTL) had lapsed.
  2. You had to compute the output again after the expiration period, even if nothing had changed.

Drupal 8 introduced another option called cache invalidation. Here you set the cache lifetime to permanent and invalidate (purge) the cached item when it is no longer relevant. Drupal 8 does this by storing metadata about each cached item. When an event occurs, such as an update to a node, that metadata is used to find all cache items containing data computed from the updated node, and those items are invalidated. This solves both problems of cache expiration:

  1. Newer content can be seen immediately.
  2. Cached output doesn’t need to be recomputed if nothing has changed.

Pretty neat, huh? But it also means that any successful cache invalidation strategy depends on accurate cache tagging everywhere, including your custom code!
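To make the difference concrete, here is a toy, framework-free model in plain PHP. The ToyCache class and the TOY_PERMANENT constant are hypothetical stand-ins, not Drupal's cache API; they only show how an expiring item stays stale until its TTL lapses, while a tagged item disappears as soon as its tag is invalidated.

```php
<?php
// Toy cache (hypothetical, not Drupal's API) supporting both strategies.
const TOY_PERMANENT = -1; // Like Cache::PERMANENT: never expires on its own.

final class ToyCache {
  /** @var array<string, array{data: mixed, expire: int, tags: string[]}> */
  private array $items = [];

  public function set(string $cid, $data, int $expire, array $tags = []): void {
    $this->items[$cid] = ['data' => $data, 'expire' => $expire, 'tags' => $tags];
  }

  /** Returns the cached data, or null on a miss or once the TTL has lapsed. */
  public function get(string $cid, int $now) {
    $item = $this->items[$cid] ?? null;
    if ($item === null) {
      return null;
    }
    if ($item['expire'] !== TOY_PERMANENT && $item['expire'] <= $now) {
      // Expiration: the item is only dropped once its TTL is up.
      unset($this->items[$cid]);
      return null;
    }
    return $item['data'];
  }

  /** Invalidation: drop every item carrying at least one of the given tags. */
  public function invalidateTags(array $tags): void {
    foreach ($this->items as $cid => $item) {
      if (array_intersect($item['tags'], $tags) !== []) {
        unset($this->items[$cid]);
      }
    }
  }
}

$now = 1000;
$cache = new ToyCache();
// Expiration strategy: cached for one hour, no tags.
$cache->set('expiry:123', 'Old title', $now + 3600);
// Invalidation strategy: cached permanently, tagged with the node it was built from.
$cache->set('invalidation:123', 'Old title', TOY_PERMANENT, ['node:123']);

// Node 123 is edited a minute later.
$cache->invalidateTags(['node:123']);
var_dump($cache->get('expiry:123', $now + 60));       // Still "Old title": stale for up to an hour.
var_dump($cache->get('invalidation:123', $now + 60)); // NULL: rebuilt on the next request.
```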

Custom code & cache invalidation

Using cache tags

First up, let’s look at how you might use Drupal’s native application caching with cache tags:


use Drupal\node\Entity\Node;
use Drupal\Core\Cache\Cache;

$nid = 123;
$cid = 'markdown:' . $nid;

// Look for the item in cache so we don't have to do the work if we don't need to.
if ($item = \Drupal::cache()->get($cid)) {
  return $item->data;
}

// Build up the markdown array we're going to use later.
$node = Node::load($nid);
$markdown = [
  // getTitle() returns a plain string; get('title')->getValue() would return a nested array.
  'title' => sprintf('## %s', $node->getTitle()),
  //...
];

// Set the cache so we don't need to do this work again until $node changes.
\Drupal::cache()->set($cid, $markdown, Cache::PERMANENT, $node->getCacheTags());

Look at the last line, where the item is written to the cache backend. The last two arguments passed to set() are what make this code obey a cache invalidation strategy:

  • Cache::PERMANENT tells the cache backend not to expire the item after a fixed time; it stays cached for as long as possible, until it is invalidated (or evicted by the backend).
  • $node->getCacheTags() passes in the node’s cache tags (e.g. node:123), which are stored with the cached item.


Now, whenever Entity::save() (or an overriding implementation of it) is called, the entity’s cache tags are invalidated. If the node being saved is the one above, that includes our cached item. Drupal does this using Drupal\Core\Cache\Cache::invalidateTags(), like this:

use Drupal\Core\Cache\Cache;

Cache::invalidateTags($node->getCacheTagsToInvalidate());

You can call this function yourself when an event occurs that Drupal is not aware of but that requires cache tags to be invalidated (there is an example at the end of this post).
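Invalidating a tag doesn't necessarily mean searching every cache item. As far as I understand, core’s default database backend keeps an invalidation counter per tag and stores a checksum of those counters with each item (see CacheTagsChecksumInterface), so stale items are detected when they are read. The sketch below is a simplified, hypothetical model of that idea in plain PHP; ChecksumCache is not core’s code.

```php
<?php
// Toy model (hypothetical) of tag invalidation via per-tag counters:
// nothing is searched or deleted, stale items are detected on read.
final class ChecksumCache {
  /** @var array<string, int> Tag => number of times it was invalidated. */
  private array $invalidations = [];
  /** @var array<string, array{data: mixed, tags: string[], checksum: int}> */
  private array $items = [];

  /** Sum of the current invalidation counters for a set of tags. */
  private function checksum(array $tags): int {
    $sum = 0;
    foreach ($tags as $tag) {
      $sum += $this->invalidations[$tag] ?? 0;
    }
    return $sum;
  }

  public function set(string $cid, $data, array $tags): void {
    // Remember the checksum of the item's tags at write time.
    $this->items[$cid] = ['data' => $data, 'tags' => $tags, 'checksum' => $this->checksum($tags)];
  }

  public function get(string $cid) {
    $item = $this->items[$cid] ?? null;
    // Counters only go up, so a changed sum means a tag was invalidated since the write.
    if ($item === null || $item['checksum'] !== $this->checksum($item['tags'])) {
      return null;
    }
    return $item['data'];
  }

  public function invalidateTags(array $tags): void {
    // One counter bump per tag, however many items carry it.
    foreach (array_unique($tags) as $tag) {
      $this->invalidations[$tag] = ($this->invalidations[$tag] ?? 0) + 1;
    }
  }
}

$cache = new ChecksumCache();
$cache->set('markdown:123', ['title' => '## Hello'], ['node:123']);
$cache->set('markdown:456', ['title' => '## Other'], ['node:456']);
$cache->invalidateTags(['node:123']);
var_dump($cache->get('markdown:123')); // NULL: node:123 changed after the write.
var_dump($cache->get('markdown:456')); // Still cached.
```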

Cache item variation

While cache tags identify what needs to be invalidated, they don’t tell you whether a cached item is fit for purpose. For that, the cache ID must vary based on every environmental variable that affects the cached item. For example, if your site uses entity translations, make sure the language code is part of any cache ID you create:

$cid = 'markdown:' . $node->id() . ':' . $node->get('langcode')->value;
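When an ID depends on several variables, it helps to build it in one place and in a stable order. A minimal sketch in plain PHP, where build_cid() is a hypothetical helper rather than a Drupal API, and the name=value format is this sketch’s own choice:

```php
<?php
// Hypothetical helper (not a Drupal API): builds a cache ID from a prefix plus
// every variable the output depends on, in a stable order.
function build_cid(string $prefix, array $variations): string {
  // Same ID no matter what order callers list the variations in.
  ksort($variations);
  $parts = [$prefix];
  foreach ($variations as $name => $value) {
    $parts[] = $name . '=' . $value;
  }
  return implode(':', $parts);
}

$en = build_cid('markdown:123', ['langcode' => 'en']);
$fr = build_cid('markdown:123', ['langcode' => 'fr']);
echo $en, PHP_EOL; // markdown:123:langcode=en
echo $fr, PHP_EOL; // markdown:123:langcode=fr
```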

Cache contexts in the Render API

Inside the Render API, the renderer performs the cache lookup for you, and it may not know all the environmental variables required to produce an accurate cache ID. This is where cache contexts come in. Let’s look at using cache contexts inside the Render API:

/**
 * Implements hook_preprocess_block().
 */
function mymodule_preprocess_block(&$variables) {
  // Unique cache per search string.
  if ($variables['elements']['#id'] === 'search_hero_type_1') {
    $variables['search_string'] = \Drupal::request()->query->get('search');
    // Append rather than overwrite, so any existing contexts are preserved.
    $variables['#cache']['contexts'][] = 'url.query_args:search';
  }
}

In this example, the “search_string” variable depends on the value of the “search” query parameter in the URL, so the block’s cache must vary on that value. Adding the cache context “url.query_args:search” makes Drupal include the value of the “search” query parameter in the cache ID. This ensures that we don’t accidentally pull a cached item with a different “search_string” value.
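Conceptually, the renderer resolves each context token to a value from the current request and folds it into the cache ID. The toy resolver below uses hypothetical functions and supports only url.query_args:<name> (Drupal’s real cache context services do much more), but it shows why two searches get two cache entries, while a block without the context would share one:

```php
<?php
// Toy context resolution (hypothetical; only url.query_args:<name> is supported).
function resolve_context(string $token, array $query): string {
  $prefix = 'url.query_args:';
  if (strpos($token, $prefix) !== 0) {
    throw new InvalidArgumentException("Unsupported cache context: $token");
  }
  $name = substr($token, strlen($prefix));
  return $token . '=' . ($query[$name] ?? '');
}

// Cache ID = block ID plus one resolved value per context.
function block_cid(string $block_id, array $contexts, array $query): string {
  $parts = [$block_id];
  foreach ($contexts as $token) {
    $parts[] = resolve_context($token, $query);
  }
  return implode(':', $parts);
}

$contexts = ['url.query_args:search'];
echo block_cid('search_hero_type_1', $contexts, ['search' => 'drupal']), PHP_EOL;
// search_hero_type_1:url.query_args:search=drupal
echo block_cid('search_hero_type_1', $contexts, ['search' => 'php']), PHP_EOL;
// search_hero_type_1:url.query_args:search=php
echo block_cid('search_hero_type_1', [], ['search' => 'php']), PHP_EOL;
// search_hero_type_1 (without the context, every search shares one entry)
```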

Cache keys in the Render API

The more entities and contexts you have inside your cached items, the higher the probability a cache lookup will either miss (no such variation exists) or be invalidated (affected by many entity change events).

The Render API works with a deep tree of keyed arrays that maps out a rendering structure. Caching only the array at the top would generally result in frequent invalidations, so sublevels of the array are cached as well; that way they are not rebuilt when other areas of the render array are invalidated.

This is where cache keys come into play. They provide keys that are eventually combined with contexts to generate a cache ID used to store that level of the render array in cache.

use Drupal\Core\Cache\Cache;

$build['#cache'] = [
  'keys' => ['entity_view', 'node', $node->id()],
  'contexts' => ['languages'],
  'tags' => $node->getCacheTags(),
  'max-age' => Cache::PERMANENT,
];

The above code would store $build in the render cache under a cache ID built from the keys plus the resolved context values, for example (simplified) ‘entity_view:node:123:en’. If the ‘keys’ key is not provided at a given level, no caching happens at that level. Either way, the cache tags bubble up to the top level of the render array and become part of the enclosing cache item’s tags.
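The interplay of keys, contexts and bubbling can be modelled in a few lines of plain PHP. This is a hypothetical simplification, not the Render API: children live under a made-up 'children' key, only elements with keys get their own entry, each entry’s ID is the keys plus the resolved context values, and tags from every level are merged upwards. merge_tags() merges and de-duplicates in the spirit of Cache::mergeTags().

```php
<?php
// Toy render tree (hypothetical, a heavy simplification of the Render API).

/** Merges two tag lists, removing duplicates and sorting the result. */
function merge_tags(array $a, array $b): array {
  $tags = array_values(array_unique(array_merge($a, $b)));
  sort($tags);
  return $tags;
}

/**
 * Walks $element and returns [bubbled tags, cache entries], where each entry
 * maps a cache ID to the tags stored with it. Only levels with keys are cached.
 */
function render_cache(array $element, array $context_values): array {
  $cache = $element['#cache'] ?? [];
  $tags = $cache['tags'] ?? [];
  $entries = [];
  foreach ($element['children'] ?? [] as $child) {
    [$child_tags, $child_entries] = render_cache($child, $context_values);
    // Tags bubble up whether or not this level is cached.
    $tags = merge_tags($tags, $child_tags);
    $entries += $child_entries;
  }
  if (!empty($cache['keys'])) {
    $parts = $cache['keys'];
    foreach ($cache['contexts'] ?? [] as $context) {
      $parts[] = $context_values[$context];
    }
    $entries[implode(':', $parts)] = $tags;
  }
  return [$tags, $entries];
}

$page = [
  // No keys: this level is not cached on its own.
  '#cache' => ['tags' => ['config:system.site']],
  'children' => [
    ['#cache' => ['keys' => ['entity_view', 'node', 123], 'contexts' => ['languages'], 'tags' => ['node:123']]],
    ['#cache' => ['keys' => ['entity_view', 'node', 456], 'contexts' => ['languages'], 'tags' => ['node:456']]],
  ],
];
[$tags, $entries] = render_cache($page, ['languages' => 'en']);
print_r($tags);                // config:system.site, node:123, node:456
print_r(array_keys($entries)); // entity_view:node:123:en, entity_view:node:456:en
```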


Cache tags and entity queries

With an invalidation cache strategy, remember that any data you use needs to be registered with cache tags (and contexts or keys if using a render array). This is especially the case when using entity queries:


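use Drupal\node\Entity\Node;
use Drupal\Core\Cache\Cache;

// $brand is the brand node whose products we are listing.
$nodes = Node::loadMultiple(\Drupal::entityQuery('node')
  ->condition('type', 'product')
  ->condition('field_product_brand', $brand->id())
  ->condition('status', 1)
  ->execute());

// Build a list of cache tags from the retrieved nodes. The [] initial value
// matters: without it, array_reduce() passes NULL to the typed $tags parameter.
$tags = array_reduce($nodes, function (array $tags, Node $node) {
  return Cache::mergeTags($tags, $node->getCacheTags());
}, []);

// Also tag the item with a custom tag so it can be invalidated per brand.
$tags = Cache::mergeTags($tags, ['my_products:' . $brand->id()]);

\Drupal::cache()->set('my_products:' . $brand->id(), $nodes, Cache::PERMANENT, $tags);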

In the code above, the cache tags from the queried nodes are aggregated together to store inside the cache item. Now take another look, because something is wrong… What if a new product node is added?

If a new product is added, how do we invalidate the right my_products:<id> cache item? Drupal is not smart enough to do this by itself yet. Instead, we need to tag the cache item with a custom tag such as my_products:<id>, hook into the node save and delete events ourselves, and invalidate that tag manually:

use Drupal\Core\Cache\Cache;
use Drupal\node\NodeInterface;

/**
 * Implements hook_node_presave().
 */
function mymodule_node_presave(NodeInterface $node) {
  mymodule_invalidate_node($node);
}

/**
 * Implements hook_node_delete().
 */
function mymodule_node_delete(NodeInterface $node) {
  mymodule_invalidate_node($node);
}

/**
 * Invalidates the custom cache item associated with a brand.
 */
function mymodule_invalidate_node(NodeInterface $node) {
  $tags = [];
  if ($node->getType() == 'product' && !$node->get('field_product_brand')->isEmpty()) {
    // target_id avoids loading the brand and works even if it was deleted.
    $tags[] = 'my_products:' . $node->get('field_product_brand')->target_id;
  }
  elseif ($node->getType() == 'brand') {
    $tags[] = 'my_products:' . $node->id();
  }

  if (!empty($tags)) {
    Cache::invalidateTags($tags);
  }
}
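The decision logic is easier to test once it is pulled out of the hooks into a pure function. The sketch below is a hypothetical, Drupal-free version of mymodule_invalidate_node(): nodes are plain arrays and products_tags_to_invalidate() is an invented name. The optional $original argument, which also invalidates the previous brand when a product moves between brands, is an assumption beyond the hooks above; in Drupal 8 the unchanged entity is typically available as $node->original during presave.

```php
<?php
// Hypothetical, Drupal-free version of the logic in mymodule_invalidate_node().
// Nodes are plain arrays; $original (the node before the save) is an extra
// assumption so a product that switches brands also clears its old brand's list.
function products_tags_to_invalidate(array $node, ?array $original = null): array {
  $tags = [];
  if ($node['type'] === 'product') {
    foreach ([$node, $original] as $version) {
      if ($version !== null && !empty($version['brand_id'])) {
        $tags[] = 'my_products:' . $version['brand_id'];
      }
    }
  }
  elseif ($node['type'] === 'brand') {
    $tags[] = 'my_products:' . $node['id'];
  }
  return array_values(array_unique($tags));
}

print_r(products_tags_to_invalidate(['type' => 'product', 'id' => 9, 'brand_id' => 7]));
// my_products:7
print_r(products_tags_to_invalidate(
  ['type' => 'product', 'id' => 9, 'brand_id' => 8],
  ['type' => 'product', 'id' => 9, 'brand_id' => 7]
));
// my_products:8 and my_products:7
```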

This is an existing problem for the Views module too.

Core mitigates this by adding <entity_type>_list tags to Views. For example, a view of nodes carries a tag called ‘node_list’. Whenever a node is saved, every node view is invalidated as well. The same applies to all entity types.

Wait. Views are everywhere!

Yup, it kind of kills the caching strategy. There is an issue on Drupal.org (see #2145751) that suggests also including the bundle name in these tags, which might improve performance a little. Otherwise, this is a bit of a gotcha!
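Here is a toy illustration of why list tags are such a blunt instrument, and what a narrower tag could buy you. The view names and the node_list:<bundle> tag format are hypothetical; the sketch simply checks which cached views share a tag with those invalidated when a single product node is saved.

```php
<?php
// Toy illustration (hypothetical views and tag formats): which cached views
// are cleared by one node save, depending on how coarse the list tag is.
function invalidated_views(array $view_tags, array $invalidated_tags): array {
  $hit = [];
  foreach ($view_tags as $view => $tags) {
    if (array_intersect($tags, $invalidated_tags) !== []) {
      $hit[] = $view;
    }
  }
  return $hit;
}

// Every node view carries the entity-type list tag.
$entity_type_tags = [
  'latest_articles' => ['node_list'],
  'product_catalog' => ['node_list'],
  'staff_directory' => ['node_list'],
];
// Hypothetical bundle-level list tags.
$bundle_tags = [
  'latest_articles' => ['node_list:article'],
  'product_catalog' => ['node_list:product'],
  'staff_directory' => ['node_list:staff'],
];

// Saving product node 42:
print_r(invalidated_views($entity_type_tags, ['node:42', 'node_list']));     // All three views.
print_r(invalidated_views($bundle_tags, ['node:42', 'node_list:product']));  // Only product_catalog.
```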

However, if you use a caching proxy such as Acquia Cloud or Acquia Cloud Edge, you can mitigate this at the page caching level:

  1. In the Purge module admin UI, go to the core tags queuer and blacklist tags like node_list. This prevents purge requests from being sent to the upstream cache proxies connected to the Purge module (like Acquia Purge, Akamai, Cloudflare, etc.).
  2. Add the Views Custom Cache Tags module to your codebase and configure each View that anonymous visitors see with custom cache tags. The naming matters less than the ability to pass in tokens that represent the view’s contextual filters. In most cases, you’ll want one cache tag per contextual filter.
  3. Implement hook_entity_presave() and hook_entity_delete(), and invalidate the custom cache tags added to your views according to the entity relationships in your data model.
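The three steps can be pictured as tags flowing from Drupal to the proxy through a filter. In this hypothetical plain-PHP sketch, tags_to_purge() drops blacklisted tags (exact matching is an assumption of the sketch) and brand_view_tag() stands in for a custom tag built from a view’s contextual filter; neither function belongs to the Purge or Views Custom Cache Tags modules.

```php
<?php
// Hypothetical sketch, not the Purge or Views Custom Cache Tags API.

// Step 1: drop blacklisted tags before they are queued for purging.
function tags_to_purge(array $invalidated, array $blacklist): array {
  return array_values(array_diff($invalidated, $blacklist));
}

// Step 2: one custom tag per contextual filter value, e.g. a products view
// filtered by brand ID. The tag name is made up for illustration.
function brand_view_tag(int $brand_id): string {
  return 'view_products_by_brand:' . $brand_id;
}

// Step 3: a product in brand 7 is saved, so its node tag, the core list tag
// and the custom tag for the brand 7 view are all invalidated.
$invalidated = ['node:42', 'node_list', brand_view_tag(7)];
print_r(tags_to_purge($invalidated, ['node_list']));
// Only node:42 and view_products_by_brand:7 reach the proxy.
```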

Conclusion

A cache invalidation strategy will help you achieve killer performance and get the most mileage out of your underlying hardware platform. But it doesn’t come for free: you need to review it, test that it works, and ensure your custom code uses it correctly.

In my experience, Drupal core handles cache tags very well, and with Views Custom Cache Tags it can be quite easy to produce a strong invalidation strategy. However, if you have a lot of custom code (e.g. controllers, REST plugins), it is quite likely that it won’t support an invalidation strategy and you’ll have to settle for expiration instead.