Drupal 8: Configuration Performance Management

(Part 2 of the "Site Performance with Drupal 8" blog series.) Just getting started? Check out Part 1.
(This article does not represent the current state of Drupal 8 development.)

Drupal sites are all about content, but a site is more than just its code and the data it stores. The difference between production and staging isn't just which database the site points to: it's the entire configuration. Storing, updating, and accessing this configuration is a big part of what Drupal does. Each module needs to read a different part of this configuration, and any CMS has to manage those values behind the scenes to make sure users get the right content, fast.

A single request, using the kernel.terminate event

Monitoring Drupal 8's configuration management system - including changes to the module's own settings. Here, the kernel.terminate event (dispatched after the page has been sent to the user) contains an update to the cache_config table to account for enabling additional tracing features. (Using TraceView)

Currently, Drupal's variable system provides a consistent interface for modules to store their configuration, and one of the benefits is letting them leverage variable caching. Rather than reading every variable as a separate row from SQL, the entire array of variables is serialized into a single cache entry so it can be retrieved in one read on later requests. The speed benefits can be increased even further by storing this cache in an in-memory data store like memcached rather than a database table.

However, optimizing for reads has resulted in optimizing against writes. Any call to variable_set() will invalidate the variable cache and force a future request to rebuild it. Worse, since the rebuilding process requires obtaining a semaphore lock, many concurrent requests may be left waiting for the task to finish. The upshot is that some of your users will be left waiting for something to show up on the page, and they may be waiting quite a while if you store many variables or are under load. Since these waiting processes consume resources, it's quite possible for a well-performing site to quickly buckle if many variables are suddenly modified.
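The read/write trade-off described above can be sketched with a small toy model. This is a minimal illustration in Python, not Drupal's PHP implementation: the `VariableStore` class, its `rebuilds` counter, and the sample variable names are all hypothetical, and the semaphore lock is left out entirely.

```python
import pickle


class VariableStore:
    """Toy model of a read-optimized variable system (not Drupal code)."""

    def __init__(self, rows):
        self.rows = dict(rows)  # stands in for the SQL variable table
        self.cache = None       # stands in for the single serialized cache entry
        self.rebuilds = 0       # how many times the whole cache was rebuilt

    def _load_all(self):
        if self.cache is None:
            # Cache miss: read every row and serialize the whole array at once.
            self.cache = pickle.dumps(self.rows)
            self.rebuilds += 1
        return pickle.loads(self.cache)

    def get(self, name, default=None):
        # Reads are cheap as long as the serialized array is still cached.
        return self._load_all().get(name, default)

    def set(self, name, value):
        self.rows[name] = value
        # Any write discards the entire cache, not just the changed entry.
        self.cache = None


store = VariableStore({"site_name": "Example", "cron_last": 0})
store.get("site_name")
store.get("cron_last")      # second read reuses the cached array
store.set("cron_last", 42)  # invalidates everything
store.get("site_name")      # forces a full rebuild for an unrelated variable
```

In this model a single write costs one full rebuild on the next read, regardless of how many variables changed. On a real site that rebuild happens behind a lock, which is where the waiting requests come from.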

If you're a Drupal developer, none of that is likely to be news to you. But did you know that a lot of this common knowledge won't apply to Drupal 8 sites? Thanks to a major effort by the Configuration Management Initiative (CMI), the existing variable system has been almost completely uprooted. While variable_get() and variable_set() remain in the API, they perform a completely different function and won't be used by more than a handful of modules. Switching over is going to make Drupal much friendlier to version control and cross-environment migrations, but it also means that we have to relearn how configuration affects performance.

Unfortunately, there isn't much existing work to draw upon. Despite the adoption of Symfony (which I discussed in my last blog post), the CMI didn't settle on using its Config component. Instead, they created a new configuration service provider out of whole cloth. This service, which is accessible through the config() convenience function defined in Drupal.php, takes a factory approach by instantiating a configuration object rather than just returning a value. This is a great example of forward progress in process as well as features: the code is properly Symfonic, dependency-injected, object-oriented PHP, yet it's born entirely out of the Drupal development workflow.

The new configuration system operates similarly to Strongarm in concept, where the database acts like a cache for filesystem-based configuration, but there are some important implementation differences. Each module defines a YAML file with its default configuration. After installation, this file is migrated into a randomly named directory in /sites/default/files/, which contains a 'canonical' production directory with the current site settings and a staging directory for pending changes. Modified configuration files can be pushed to the staging directory (e.g. via rsync or FTP) and then synchronized into the production directory and the database through the admin GUI.
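To make the staging-to-production step concrete, here is a rough sketch of how such a synchronization could be planned. It is written in Python as a concept model and is not how Drupal implements it: `plan_sync` is a hypothetical helper, the directories are modeled as plain dictionaries of file name to contents, and it assumes the staging directory holds a full snapshot (so a file missing from staging counts as a deletion), which the description above does not confirm.

```python
def plan_sync(active, staging):
    """Work out what must change for `active` to match `staging`.

    Both arguments map a configuration file name to its contents.
    Assumption: staging is a complete snapshot, not a partial overlay.
    """
    create = sorted(name for name in staging if name not in active)
    update = sorted(
        name for name in staging
        if name in active and staging[name] != active[name]
    )
    delete = sorted(name for name in active if name not in staging)
    return {"create": create, "update": update, "delete": delete}


# Hypothetical file names and contents, for illustration only.
active = {
    "example.settings.yml": "items_per_page: 10\n",
    "other.settings.yml": "enabled: true\n",
}
staging = {
    "example.settings.yml": "items_per_page: 25\n",
    "other.settings.yml": "enabled: true\n",
}

plan = plan_sync(active, staging)
```

With the sample data above, only `example.settings.yml` ends up in the `update` list. Note that even a one-file change requires reading and comparing every file in both directories to produce the plan.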

Unfortunately, the synchronization process isn't exactly fast, which could be a concern for certain deployment workflows:

An 'out of the box' Drupal 8 site synchronizes its staging YAML files into production, which requires updates to the database and the filesystem. Even though only a single module settings file was updated, the process required 532 file_get_contents() calls and 783 database calls.

Once the configuration has been synchronized, another key difference is how it's retrieved. The variable (D6) and bootstrap (D7) cache tables have been replaced with a more granular cache_config table, and entries in it are lazy-loaded as they are requested. Also, unlike its predecessor variable_get(), a call to config() always retrieves a tree of configuration options under the provided namespace. This hierarchical approach has organizational benefits (no more foo_bar_custom_variable_two!), but it can also improve performance if the costs of lazy-loading are outweighed by the benefits of not loading irrelevant variables.
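The lazy-loading, namespace-oriented retrieval described above can be modeled in a few lines. This is a sketch in Python under stated assumptions, not the actual Drupal service: `ConfigFactory`, `ConfigObject`, the `reads` counter, and the `example.settings` and `other.settings` namespaces are hypothetical, and a plain dictionary stands in for the cache_config table.

```python
class ConfigObject:
    """One namespace's tree of settings (toy model)."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    def get(self, key, default=None):
        # Walk a dotted key such as "list.items_per_page" down the tree.
        node = self.data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node


class ConfigFactory:
    """Hands out config objects, loading each namespace only on first use."""

    def __init__(self, storage):
        self.storage = storage  # stands in for rows in cache_config
        self.loaded = {}        # namespaces already fetched in this request
        self.reads = 0          # how many storage lookups were needed

    def config(self, name):
        if name not in self.loaded:
            self.reads += 1
            self.loaded[name] = ConfigObject(name, self.storage.get(name, {}))
        return self.loaded[name]


factory = ConfigFactory({
    "example.settings": {"list": {"items_per_page": 25}},
    "other.settings": {"enabled": True},
})

per_page = factory.config("example.settings").get("list.items_per_page")
again = factory.config("example.settings").get("list.missing", "fallback")
```

After these two calls the factory has performed one storage lookup, and `other.settings` was never touched. That is the trade being made: each namespace costs its own lookup, but namespaces a request never asks for cost nothing.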

The new configuration system wins on features, but to test how it compares on performance, I extended my work from my last post and modified the TraceView bundle to track the dispatch time of other Symfony events. This allowed me to monitor the amount of time spent making calls to the configuration subsystem. The results were quite interesting:

Comparing the performance of the Tracker default View on Drupal 7 (top) and Drupal 8 (bottom) installs, using 50 items of content created with Devel Generate. While the Drupal 8 request took more time overall, likely due to the many known performance regressions, lazy-loading configuration from the database as needed was still faster than loading all variables from memcached at the beginning of the request. (Generated with the AppNeta TraceView module)

It's not clear that there is any way to do a one-to-one comparison between Drupal 7 and 8 given the major architectural changes. However, the result above makes sense: most Drupal variables won't be relevant on most page loads, so even if it takes longer to load each variable this way, it can still be a beneficial choice as long as you're selective about what configuration you retrieve. The CMI has made some great decisions about the features of the new configuration system, and it looks like they've done their part to ensure that it's performant, too.