The insides of AM:A's module: from heuristics to hypermedia & React

Acquia Migrate: Accelerate has been open source for 2 months (see the announcement, drupal.org project and tutorial).

Time for a peek behind the curtain now that the entire source code is visible — how does the module actually work? 🕵️

Heuristics

A hard requirement from the beginning was to empower non-technical users to be able to perform migrations. The UI had to prioritize the site builder persona as much as possible:

  • Good: content types, vocabularies and path aliases ✅
  • Bad: migration plugins, source/process/destination plugins, YAML without validation ❌

(Or as I like to put it: "if it could contain an underscore, it should require clicking on 'Details'".)

How can this possibly be achieved? The Drupal Migrate API has no infrastructure for this.

In fact, migration source plugins and migration definitions are arguably not even developer-centric, but database table-centric (which many developers don't need to know the details of): they're designed to get all rows from one Drupal 7 database table into the equivalent tables in Drupal 8/9/10.
This is why, for example, the d7_url_alias migration can only migrate all path aliases at once, and not just those for a certain content type (AM:A's core patch to change that: #3122649). This makes sense for developers who know exactly what data is stored where (which is indeed necessary knowledge for complex migrations), but it is not something every developer necessarily needs to know. It's definitely something the site builder should not need to know.

On top of that, the user of the Migrate API is expected to know in which order to execute migrations. One often hears about "migration scripts", which use Drush to execute one specific migration plugin after another. Again, this is fine to expect for complex migrations. But it should not be necessary in most cases, and it's completely unreasonable to ask of a site builder.
One thing that surprised us while working on AM:A was that migrations of entities with file/image fields do not depend on the migration of files, resulting in stubs getting created for everything (AM:A's core patch to change that: #3123775).
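
To illustrate why declared dependencies matter: once every migration plugin lists what it requires, a valid execution order can be computed instead of being hand-written in a script. The following is a minimal sketch of that idea in TypeScript, not AM:A's or Drupal core's actual code. The MigrationPlugin shape and the executionOrder function are hypothetical, and the dependency of d7_node:article on d7_file is only an illustration of the kind of dependency the core patch adds.

```typescript
// Hypothetical, simplified shape of a migration plugin: an ID plus the IDs it requires.
interface MigrationPlugin {
  id: string;
  requires: string[];
}

// Returns plugin IDs ordered so that every plugin comes after everything it requires
// (a depth-first topological sort). Throws on a cycle or an unknown requirement.
function executionOrder(plugins: MigrationPlugin[]): string[] {
  const byId: Record<string, MigrationPlugin> = {};
  for (const plugin of plugins) byId[plugin.id] = plugin;

  const state: Record<string, "visiting" | "done"> = {};
  const order: string[] = [];

  const visit = (id: string): void => {
    if (state[id] === "done") return;
    if (state[id] === "visiting") throw new Error(`Dependency cycle at ${id}`);
    const plugin = byId[id];
    if (plugin === undefined) throw new Error(`Unknown migration plugin: ${id}`);
    state[id] = "visiting";
    for (const dep of plugin.requires) visit(dep);
    state[id] = "done";
    order.push(id);
  };

  for (const plugin of plugins) visit(plugin.id);
  return order;
}

// Illustrative input: the node migration is listed first but must run last.
const migrationOrder = executionOrder([
  { id: "d7_node:article", requires: ["d7_node_type", "d7_file"] },
  { id: "d7_file", requires: [] },
  { id: "d7_node_type", requires: [] },
]);
console.log(migrationOrder);
// → ["d7_node_type", "d7_file", "d7_node:article"]
```

Without the dependency on d7_file in the input, nothing would force files to be migrated before the nodes that reference them, which is exactly the situation that leads to stubs.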

By applying patches to Drupal core that increase the granularity of migration plugins, as well as adding dependencies to some migrations that are missing them (and updating the ones that do exist to take into account the increased granularity), AM:A is able to achieve this:

What do you see here? Well:

  1. We're looking at the "Document media items" migration.
  2. AM:A is redefining what "a migration" actually is: AM:A makes it into a higher level concept, one that we believe better matches the mental model of the site builder. (And honestly: we think it better matches the mental model of anybody working on a Drupal migration.)
  3. The "underlying migrations" (migration plugins, to use the correct terminology) are d7_file_plain_source_field:document, d7_file_plain_type:document, and so on. Note how the last 3 list a "(X of Y)" suffix. These are the "data migration plugins"; all preceding ones are "supporting configuration migration plugins". The latter are responsible for configuring Drupal 9 in such a way that Drupal can actually store the data to be migrated.
  4. This migration depends on the "Public files" migration because one of the underlying migrations depends on d7_file.
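
Point 4 can be sketched in a few lines: a high-level migration depends on another one as soon as any of its underlying plugins requires a plugin owned by that other migration. The TypeScript below is only an illustration under assumptions, not AM:A's implementation. The MigrationCluster shape, the clusterDependencies function and the example_document_data plugin ID are hypothetical, as is the choice of which plugin requires d7_file.

```typescript
// Hypothetical shape: a high-level "AM:A migration" groups several migration plugins.
interface MigrationCluster {
  label: string;
  pluginIds: string[];
}

// Lifts plugin-level requirements to cluster-level dependencies.
// pluginRequires maps a plugin ID to the plugin IDs it requires.
function clusterDependencies(
  clusters: MigrationCluster[],
  pluginRequires: Record<string, string[]>,
): Record<string, string[]> {
  // Index which cluster owns each plugin.
  const owner: Record<string, string> = {};
  for (const cluster of clusters) {
    for (const id of cluster.pluginIds) owner[id] = cluster.label;
  }

  const result: Record<string, string[]> = {};
  for (const cluster of clusters) {
    const deps: string[] = [];
    for (const id of cluster.pluginIds) {
      for (const required of pluginRequires[id] || []) {
        const other = owner[required];
        // Requirements within the same cluster are not cluster-level dependencies.
        if (other !== undefined && other !== cluster.label && deps.indexOf(other) === -1) {
          deps.push(other);
        }
      }
    }
    result[cluster.label] = deps;
  }
  return result;
}

const clusterDeps = clusterDependencies(
  [
    { label: "Public files", pluginIds: ["d7_file"] },
    {
      label: "Document media items",
      pluginIds: [
        "d7_file_plain_source_field:document",
        "d7_file_plain_type:document",
        "example_document_data", // hypothetical data migration plugin
      ],
    },
  ],
  {
    // Assumed requirements, for illustration only.
    "d7_file_plain_type:document": ["d7_file_plain_source_field:document"],
    "example_document_data": ["d7_file", "d7_file_plain_type:document"],
  },
);
console.log(clusterDeps);
// → { "Public files": [], "Document media items": ["Public files"] }
```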

As an end user, if I don't click the "Details" tab, I never see any of this. I would only ever see the dozens of "AM:A migrations", not the hundreds of migration plugins.

How does AM:A figure out what the underlying migrations should be if the Migrate API does not have infrastructure for this? Does it use core patches? No. Does it use magic? No. It uses a bunch of heuristics to achieve this — it's definitely imperfect but works >95% of the time without the need for tweaks to its heuristics. (For the curious: there is a very detailed comment at the top of MigrationClusterer.php, and MigrationClusterer::getHeuristics() lists all heuristics, what they impact and their weight.)
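
To give a feel for what "a bunch of weighted heuristics" can look like, here is a deliberately tiny TypeScript sketch. It is not a port of MigrationClusterer.php: the two heuristics, their weights, the "lower weight is consulted first" rule and the PluginInfo shape are all assumptions made up for this example.

```typescript
// Hypothetical, simplified view of a migration plugin.
interface PluginInfo {
  id: string;
  isConfiguration: boolean;
}

// A heuristic either proposes a cluster label for a plugin or abstains (null).
interface Heuristic {
  name: string;
  weight: number; // assumption: lower weights are consulted first
  match: (plugin: PluginInfo) => string | null;
}

// Two made-up heuristics, intentionally listed out of weight order.
const exampleHeuristics: Heuristic[] = [
  {
    name: "shared-configuration",
    weight: 10,
    match: (p) => (p.isConfiguration ? "Shared configuration" : null),
  },
  {
    name: "bundle-derivative",
    weight: 0,
    // Uses the part after the last colon of a derived plugin ID as the label.
    match: (p) => {
      const colon = p.id.lastIndexOf(":");
      return colon === -1 ? null : p.id.slice(colon + 1);
    },
  },
];

// Assigns every plugin to the label of the first heuristic (by weight) that matches.
function clusterPlugins(
  plugins: PluginInfo[],
  heuristics: Heuristic[],
): Record<string, string[]> {
  const ordered = heuristics.slice().sort((a, b) => a.weight - b.weight);
  const clusters: Record<string, string[]> = {};
  for (const plugin of plugins) {
    let label = "Other"; // fallback when every heuristic abstains
    for (const heuristic of ordered) {
      const proposed = heuristic.match(plugin);
      if (proposed !== null) {
        label = proposed;
        break;
      }
    }
    (clusters[label] = clusters[label] || []).push(plugin.id);
  }
  return clusters;
}

const exampleClusters = clusterPlugins(
  [
    { id: "d7_file_plain_source_field:document", isConfiguration: true },
    { id: "d7_file_plain_type:document", isConfiguration: true },
    { id: "d7_filter_format", isConfiguration: true },
  ],
  exampleHeuristics,
);
console.log(exampleClusters);
// → { document: ["d7_file_plain_source_field:document", "d7_file_plain_type:document"],
//     "Shared configuration": ["d7_filter_format"] }
```

The point of the weights in this sketch is that a more specific heuristic gets to claim a plugin before a more generic one does; for the real rules, the comment at the top of MigrationClusterer.php remains the reference.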

This hard requirement of a UX that is approachable for everyone is a key reason why AM:A has an intimidating 40 (!) core patches configured to be applied: many of these patches add derivers to core's migration plugins to increase granularity and thereby allow more piecemeal migrations.
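
For readers unfamiliar with derivers: conceptually, a deriver turns one migration plugin into many, one per "thing" (here: per bundle), each with a narrower source. In Drupal, derivers are PHP classes; the TypeScript below only sketches the concept. The MigrationDefinition shape, the deriveByBundle function, the base:entity_type:bundle ID format and the source keys are assumptions for illustration, not what the core patches actually produce.

```typescript
// Hypothetical, simplified migration definition.
interface MigrationDefinition {
  id: string;
  label: string;
  source: Record<string, string>;
}

// Produces one derived definition per bundle, each restricted to that bundle.
function deriveByBundle(
  base: MigrationDefinition,
  entityType: string,
  bundles: string[],
): MigrationDefinition[] {
  return bundles.map((bundle) => ({
    id: `${base.id}:${entityType}:${bundle}`, // assumed ID format
    label: `${base.label} (${bundle})`,
    source: { ...base.source, entity_type: entityType, bundle }, // assumed source keys
  }));
}

const derivedAliases = deriveByBundle(
  { id: "d7_url_alias", label: "URL aliases", source: { plugin: "d7_url_alias" } },
  "node",
  ["article", "page"],
);
console.log(derivedAliases.map((definition) => definition.id));
// → ["d7_url_alias:node:article", "d7_url_alias:node:page"]
```

With definitions like these, the path aliases of articles can be migrated together with the articles themselves, instead of all aliases at once.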

Hypermedia & React

The AM:A UI is fully decoupled, and:

  • is written in React by Peter "zrpnr" Weber
  • is tightly integrated with Drupal's routing and theme system (did you notice above that the Drupal toolbar is still present and working as it otherwise would?) thanks to the Decoupled Pages module written by Gabe Sullice
  • is fetching data and triggering migration operations through a JSON:API implementation provided by the acquia_migrate module — also written by Gabe Sullice

All of those aspects can be visualized in a single GIF:

This module was not written on top of JSON:API Hypermedia because it does not use Drupal entities (we could in theory have modeled the "AM:A migrations" as some kind of computed entity, but that would have introduced more layers of complexity for little gain).

The React UI receives application bootstrapping data from the route definition, with the glue provided by the Decoupled Pages module.

The server side defines a number of link relation types. These describe the semantics of links ("what concept does this link point to? what kind of operation does it trigger?"), enabling the AM:A React UI (or any competing UI!) to implement logic for each link relation type once and then have it work "automatically" for any new links that trigger different operations but with the same semantics! For example, the import, rollback, import-and-rollback and refresh operations all use the same link relation type: https://drupal.org/project/acquia_migrate#link-rel-start-batch-process. The React application only had to implement the logic for discovering, presenting and requesting these links (including tailored retry semantics) once, and that single implementation serves all 4 of those operations.

If we add a new "retry any rows with messages" operation, or an "erase messages" operation, then:

  • the server side would have to implement a route + server-side logic to generate a response
  • the server side would then have to add a new link with an appropriate title to present to the end user, but use the same rel
  • the client side would need to make zero changes
  • … and these new operations would show up in the AM:A UI!
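
The "zero client changes" claim is easier to believe with a small sketch. The TypeScript below is not the actual AM:A React code: the Link shape follows JSON:API-style link objects (href, rel, title), while the batchActions function, the link keys and the /example/... URLs are hypothetical. Only the link relation type URI is taken from the text above.

```typescript
const START_BATCH_PROCESS =
  "https://drupal.org/project/acquia_migrate#link-rel-start-batch-process";

// A link as the server might describe it (shape assumed for this sketch).
interface Link {
  href: string;
  rel: string;
  title: string;
}

// What the UI needs in order to render a button.
interface Action {
  label: string;
  href: string;
}

// The client keys its logic on the link relation type, never on the link's name.
function batchActions(links: Record<string, Link>): Action[] {
  return Object.keys(links)
    .map((key) => links[key])
    .filter((link) => link.rel === START_BATCH_PROCESS)
    .map((link) => ({ label: link.title, href: link.href }));
}

// Links the server sends today (hypothetical URLs).
const linksToday: Record<string, Link> = {
  self: { href: "/example/migrations/1", rel: "self", title: "Self" },
  import: { href: "/example/migrations/1/import", rel: START_BATCH_PROCESS, title: "Import" },
  rollback: { href: "/example/migrations/1/rollback", rel: START_BATCH_PROCESS, title: "Rollback" },
};

// The server later adds an operation: a new link, a new title, the same rel.
const linksTomorrow: Record<string, Link> = {
  ...linksToday,
  "retry-rows": {
    href: "/example/migrations/1/retry-rows",
    rel: START_BATCH_PROCESS,
    title: "Retry rows with messages",
  },
};

console.log(batchActions(linksToday).map((action) => action.label));
// → ["Import", "Rollback"]
console.log(batchActions(linksTomorrow).map((action) => action.label));
// → ["Import", "Rollback", "Retry rows with messages"]
```

The batchActions function never mentions "import" or "rollback" by name, which is why the third button appears without touching the client.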

This is why initially, a lot of time was spent on the UI: both the client and server needed to grow the necessary basic infrastructure. But then … we increasingly discovered that the building blocks we needed were the same, and we didn't need to build anything new — except for refining the semantics of the link relation types we had defined (we had little to no experience actually implementing Hypermedia).

It ended up working very well 🤩 Both for us (very low maintenance) and for the end user (great UX).

If you want more details still, see Gabe's previous blog post about this at https://dev.acquia.com/blog/tightly-integrated-loosely-coupled and the post on his personal blog at https://www.sullice.com/posts/2019/09/20/a-new-era-for-drupals-jsonapi/.