Cache Strategy Considerations for Drupal Backend with Node.js Frontend

An increasingly common setup is to use Drupal as the CMS / backend of an application with a frontend Node.js application: Drupal holds the content and handles things like search indexing and integration with third-party tools (a DAM, for example), while the Node.js application provides the presentation layer. This post assumes that the Node.js application makes JSON:API requests to the Drupal application via the Drupal core JSON:API module.

With this setup, it’s critically important to plan a cache strategy, to avoid making excessive requests to the backend and potentially sacrificing performance. Here are some considerations.

Objective: For the best results in terms of performance and utilization, both the Drupal backend and the Node.js frontend should have a cache strategy in place. 

Note that there are differences in architecture between the two. For example, a Drupal application on Acquia Cloud will have a Varnish cache layer; a Node.js application will not. A Drupal application out-of-the-box has tooling for managing cache control headers, while a Node.js application might or might not.

The big-picture objective outlined here is to use a CDN layer for both: move as much of the backend traffic as possible to a CDN layer, and also move as much of the frontend traffic as possible to a CDN layer.

Approach on the Drupal side:
Cache strategy varies depending on the application’s requirements. For example, the type of content, how often it’s updated, and how quickly updates must be presented to site visitors can vary from one application to another.

For a typical site where you want content updates to be reflected promptly, consider the Purge-based cache strategy outlined in this blog post, where the basic approach is:

  • Make Varnish the arbiter of what’s “fresh”
    • Set a relatively high cache maxage in Drupal, so content is cached in Varnish for a long time
    • When content changes are made in Drupal, invalidate Varnish cache for pages containing the updated content, by configuring Purge (which implements cache tag-based invalidation).
  • Cache page responses (and static files) in a CDN layer, and set the TTL (time-to-live) in your cache rules for pages fairly low
    • So the CDN will check Varnish frequently for updated versions
    • And if the content hasn’t changed, Varnish will return a 304 Not Modified response (which is fast and inexpensive), and the CDN layer will continue to serve the cached version
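
The revalidation step in the list above can be sketched in TypeScript. This is only a toy model for a single URL, not how any particular CDN or Varnish is implemented: `EdgeCache`, `Origin`, and `OriginResponse` are hypothetical names, time is passed in as plain seconds, and the origin is assumed to compare an ETag to decide between a full response and a 304.

```typescript
// Hypothetical model of one cached URL at the CDN edge (illustration only).
interface OriginResponse {
  status: 200 | 304;
  etag: string;
  body?: string; // absent on a 304
}

// Stand-in for Varnish: receives the ETag the edge already holds, if any.
type Origin = (ifNoneMatch?: string) => OriginResponse;

class EdgeCache {
  private readonly ttlSeconds: number;
  private readonly origin: Origin;
  private entry?: { body: string; etag: string; checkedAt: number };

  constructor(ttlSeconds: number, origin: Origin) {
    this.ttlSeconds = ttlSeconds;
    this.origin = origin;
  }

  get(now: number): string {
    const entry = this.entry;
    // Within the (short) edge TTL: serve from the edge without asking origin.
    if (entry && now - entry.checkedAt < this.ttlSeconds) {
      return entry.body;
    }
    // TTL expired: revalidate, sending the ETag of the copy we hold.
    const response = this.origin(entry?.etag);
    if (response.status === 304 && entry) {
      // Unchanged: keep the stored body and restart the TTL clock.
      entry.checkedAt = now;
      return entry.body;
    }
    // Changed (or first request): store the new body.
    const fresh = { body: response.body ?? "", etag: response.etag, checkedAt: now };
    this.entry = fresh;
    return fresh.body;
  }
}
```

With a 60-second TTL, requests inside the window never leave the edge, and a request after the window costs only a 304 round trip unless the content has actually changed.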

Approach on the Node side:
Since there isn’t a Varnish layer, the TTL at the CDN layer will determine how often a request will hit your Node.js application directly. Pages won’t get updated with new content until this TTL expires (and new content is served from the backend’s cache layers). A typical strategy is to set this to something fairly short but acceptably long, say a minute or five minutes.
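
If your CDN is configured to respect origin headers (rather than, or in addition to, rules set in the CDN itself), the same TTL can be expressed by the Node.js application. Here is a minimal TypeScript sketch; the `EdgePolicy` type and the `cacheControlFor` helper are hypothetical names for illustration, and whether `s-maxage` is honored depends on your CDN’s configuration.

```typescript
// Hypothetical policy shape: separate lifetimes for browsers and the CDN edge.
interface EdgePolicy {
  browserSeconds: number; // how long a visitor's browser may reuse the page
  edgeSeconds: number;    // how long shared caches (the CDN) may reuse it
}

// Builds a Cache-Control value. `max-age` applies to all caches,
// while `s-maxage` overrides it for shared caches such as a CDN.
function cacheControlFor(policy: EdgePolicy): string {
  if (policy.edgeSeconds <= 0) {
    // Assumption: pages that must not be cached at the edge are not stored at all.
    return "private, no-store";
  }
  return `public, max-age=${policy.browserSeconds}, s-maxage=${policy.edgeSeconds}`;
}

// Five minutes at the edge, always revalidated in the browser.
const pagePolicy: EdgePolicy = { browserSeconds: 0, edgeSeconds: 300 };

// In a Node.js http handler this would be applied with something like:
//   res.setHeader("Cache-Control", cacheControlFor(pagePolicy));
```

Keeping the browser lifetime short and the edge lifetime longer means the CDN absorbs the traffic, while visitors still pick up a refreshed page as soon as the edge has one.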

Let’s suppose you set a TTL of five minutes in the CDN caching rule(s) for your Node.js application, combined with a TTL of one minute in the Drupal application’s CDN caching rule(s), and you’re using the Purge-based strategy noted above for Drupal. Assuming the page being requested is cacheable (see considerations list below):

  • A request for a given page will be served from the Node.js application’s CDN layer for five minutes.
  • Then that TTL will expire, and the next request for that page will be served by the Node.js application.
  • The Node.js application will make JSON:API requests to the Drupal application, to fetch what’s needed to compile the page.
  • These will hit the CDN layer for the Drupal application, which will serve cached responses for each endpoint for one minute.
  • When the one-minute TTL for the Drupal CDN cache has expired for any requested endpoint, the next request to that endpoint will hit Drupal’s Varnish layer.
  • At the Varnish layer, if a cached version is available and unchanged, Varnish will return “304 Not Modified” and the CDN will continue to serve its cached version.
    • or, if the Varnish TTL has expired or Purge has invalidated it, Varnish will pass the request to Drupal.
    • and Drupal will compile the page and send it to Varnish, which will in turn return the new version to the CDN layer, which in turn will send this new version along.
  • The Node.js application will receive the response from the Drupal CDN layer (cached or new version) for each request to an endpoint.
  • The Node.js application will compile the new version of the page, and send it to the Node.js application’s CDN layer, which will reset the cache TTL and deliver the page to the visitor’s browser.
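
The walkthrough above can be modeled as two stacked TTL caches. The TypeScript below is a simplified simulation, not a real CDN: `ttlLayer` and `buildStack` are hypothetical helpers, time is plain seconds, a single page and a single endpoint stand in for the whole site, and Varnish plus Purge is assumed to always return the currently published version.

```typescript
// A layer takes the current time (in seconds) and returns a response body.
type Fetcher = (now: number) => string;

// Wraps an upstream layer in a simple TTL cache for one URL.
function ttlLayer(ttlSeconds: number, upstream: Fetcher): Fetcher {
  let cached: { value: string; storedAt: number } | undefined;
  return (now) => {
    if (cached && now - cached.storedAt < ttlSeconds) {
      return cached.value; // still within TTL: upstream is not contacted
    }
    const value = upstream(now);
    cached = { value, storedAt: now };
    return value;
  };
}

// Builds the stack from the example: 5-minute Node.js CDN, 1-minute Drupal CDN.
function buildStack() {
  let published = "v1";
  let drupalHits = 0;

  // Assumption: Varnish + Purge always hands back the published version.
  const drupal: Fetcher = () => {
    drupalHits += 1;
    return published;
  };
  const drupalCdn = ttlLayer(60, drupal);
  // The Node.js application compiles a page from the JSON:API response.
  const nodeApp: Fetcher = (now) => `<html>${drupalCdn(now)}</html>`;
  const nodeCdn = ttlLayer(300, nodeApp);

  return {
    visit: nodeCdn,           // a visitor requesting the page
    otherConsumer: drupalCdn, // another page or client hitting the same endpoint
    publish: (version: string) => { published = version; },
    drupalHits: () => drupalHits,
  };
}
```

For example, with a visit at t=0, another consumer refreshing the Drupal CDN at t=250, and a content change published at t=251, `visit(300)` still compiles the old page (the Drupal CDN copy is only 50 seconds old), and the new version first appears at `visit(600)`. That is 349 seconds of staleness, just under the worst case of roughly 360 seconds, which is the two TTLs added together.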

The overall result, ideally, is that most cacheable requests don’t bootstrap Drupal, and don’t utilize Views at the Varnish layer, either. Only when there’s new content or the (relatively long) TTL is reached do we need to bootstrap Drupal and compile a new version.

Considerations:

  • Both the Drupal responses and the Node.js responses must have appropriate cache headers for the content to be cacheable in the CDN layer.
    • So be sure to review cache headers, and troubleshoot cases where you expect the response to be cacheable, but something is making it uncacheable.
    • Typical issues might be forms on the page, or CAPTCHA or reCAPTCHA making the page uncacheable.
  • Make sure you have a “cache everything” page rule, both in the Drupal CDN settings and in the Node.js CDN settings. This is the “catch-all” rule for caching pages.
  • In addition, you might need some more specific caching rules, which might be a little different for Drupal vs. Node.js. For example:
    • Possibly some Drupal paths/patterns should not be cached in CDN, or some paths/patterns could be cached for a longer time.
    • Similarly in Node.js, you might have certain types of paths/pages that can have a longer TTL on the edge, and others where refreshing more often is a requirement.
  • Be sure to review static files caching, which is handled differently than caching for pages.
    • Note that in Drupal 10, the older two-week default setting in .htaccess for static files caching in Varnish was changed to one year.
    • Depending on your site’s requirements, you might want to modify this default, and/or modify or add caching rules at the CDN layer for static files.
  • Consider using jsonapi_extras to disable certain resources / to manage the JSON:API configuration (turn things off that you don’t need).
  • As with any Drupal site, review permissions, and ensure delivery of restricted content is in fact restricted as expected.
  • While there are caching libraries / Node.js tools to emulate some of what you can get in the Drupal caching layers (memcache, for example), use these with caution, as they tend to be hard to manage. For example, tooling for clearing caches might need to be built by your team, adding complexity.
  • Be cautious of features/methods in the Node.js application that are designed to rebuild large portions of the site all at once. If it’s absolutely necessary to do this, ensure that limits are built in, to avoid sending a flood of traffic to origin all at the same time.
  • If pages are “mostly cacheable” but some component or feature is making the page uncacheable, consider loading that component or feature in the browser instead, such as via AJAX.
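
For the first consideration above (reviewing cache headers), a small helper can flag the usual reasons a response won’t be stored at the edge. This TypeScript sketch is a set of heuristics rather than a description of any specific CDN’s behavior; `edgeCacheBlockers` and `HeaderMap` are hypothetical names, and the checks assume a CDN that follows origin `Cache-Control` headers.

```typescript
// Response headers as a plain object, e.g. copied from curl -I or browser dev tools.
type HeaderMap = Record<string, string>;

// Returns human-readable reasons a shared cache is likely to skip this response.
// An empty array means no obvious blocker was found (not a guarantee of caching).
function edgeCacheBlockers(headers: HeaderMap): string[] {
  // Header names are case-insensitive, so normalize them first.
  const h: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = value;
  }

  const blockers: string[] = [];
  const directives = (h["cache-control"] ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);

  if (directives.length === 0) {
    blockers.push("no Cache-Control header");
  }
  for (const d of ["private", "no-store", "no-cache"]) {
    if (directives.includes(d)) {
      blockers.push(`Cache-Control contains ${d}`);
    }
  }
  // A lifetime of zero (or none at all) leaves nothing for the edge to reuse.
  const hasLifetime = directives.some((d) => /^(s-maxage|max-age)=[1-9]\d*$/.test(d));
  if (directives.length > 0 && !hasLifetime) {
    blockers.push("no positive max-age or s-maxage");
  }
  // Many CDNs refuse to cache responses that set cookies.
  if ("set-cookie" in h) {
    blockers.push("Set-Cookie present");
  }
  if ((h["vary"] ?? "").trim() === "*") {
    blockers.push("Vary: *");
  }
  return blockers;
}
```

Running this against a page that unexpectedly misses the cache (for instance, one where a form has caused Drupal to send `must-revalidate, no-cache, private`) gives a quick list of headers to investigate.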