Revered management thinker Peter Drucker once wrote, “If you can’t replicate something because you don’t understand it, then it really hasn’t been invented; it’s only been done.” In many ways content modeling in Drupal has been done without being invented. There is no accepted method of analysis, no common format for specification, no process for change management, no best practices for testing. Consequently outcomes are highly variable. For this reason, we’re developing a discipline for content modeling at Acquia. It’s drastically reducing both costs and defect rates for us, and we’re sharing it with you in this post.
Introducing the Drupal Spec Tool
The cornerstone of the discipline is something we call the Drupal Spec Tool, consisting of a spreadsheet and automated testing infrastructure.
Why a Specification Tool?
A standard specification format is an industry’s answer to the difficulty of validating, communicating, and conceptualizing complex solutions. Every professional domain has its own: object-oriented programmers have the UML, architects and engineers have the blueprint, and mathematicians have mathematical notation.
Without the blueprint, architects couldn’t validate a building design without actually constructing it; they couldn’t communicate it for execution except in lengthy prose; and no one could understand or reason about the design except by directly inspecting the in-progress construction. Anyone who has watched Drupal content model details pass from customer to technical architect to implementer to QA will readily perceive the parallels.
Jira (or your favorite ticketing system) is not a good tool for specifying a content model. It’s great for managing the work of implementing one, but it’s not made for clearly presenting extensive technical details that change over time. It does nothing to help you ask the right questions when you’re with the customer (much less format the answers). And when someone later asks what the customer decided on a given point, you don’t want to have to search through tickets to find an answer that may or may not have been changed in a comment or a later ticket.
Why a Spreadsheet?
Because everything is a spreadsheet, really. The human brain seems to be wired to understand data in grids. Want to spot patterns in a complex data set? Arrange it in columns and rows. There’s a reason we learn our multiplication tables in grade school.
We chose Google Sheets in particular for its sharing and collaboration features, including access control and commenting.
Do I Need It?
If you’re not meaningfully customizing the out-of-the-box content model of the Drupal distribution you’re using, this tool may be overkill for you, like creating a blueprint for your mobile home. But the more important your content model is to the business, the more complex it is, the more often it changes, and the more people need to understand it, the more specifying it will pay dividends.
How big are the dividends, you ask? We wondered, too, so we ran some numbers. On a recent large (Drupal 8) enterprise build we asked developers to estimate the amount of time they ordinarily spend per week on content-model-related activities, including implementation, communication, and change management. The reported average was eight hours per person. That felt low to us, but all the better.
Next we introduced the tool and tracked hours performing the same activities. After some basic training, we averaged five hours per person per week, about a 38% savings. The rate of change to the content model remained basically constant over the course of the project, ending with a massive 165 content entity type bundles, 630 fields, and 246 relationships (what a perfect test!), so we extrapolate that, given ten people over 46 project weeks with an hourly billing rate of $150, we probably saved over $200,000.
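For readers who want to check the extrapolation or plug in their own numbers, here is a minimal sketch of the arithmetic. The inputs are the figures quoted above; the function and variable names are ours, chosen only for illustration.

```python
# Figures reported in the post.
HOURS_BEFORE = 8      # estimated hours per person per week without the tool
HOURS_AFTER = 5       # tracked hours per person per week with the tool
PEOPLE = 10
PROJECT_WEEKS = 46
HOURLY_RATE = 150     # billing rate in USD


def estimate_savings(hours_before, hours_after, people, weeks, rate):
    """Return (fraction of weekly hours saved, total dollars saved)."""
    hours_saved_per_week = hours_before - hours_after
    fraction_saved = hours_saved_per_week / hours_before
    dollars_saved = hours_saved_per_week * people * weeks * rate
    return fraction_saved, dollars_saved


fraction_saved, dollars_saved = estimate_savings(
    HOURS_BEFORE, HOURS_AFTER, PEOPLE, PROJECT_WEEKS, HOURLY_RATE
)
print(f"{fraction_saved:.1%} of weekly hours saved")  # 37.5% of weekly hours saved
print(f"${dollars_saved:,} saved")                    # $207,000 saved
```

The 37.5% rounds to the "about 38%" quoted above, and the total of $207,000 is where "over $200,000" comes from.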
Numbers aside (stunning as they are), the consensus on the dev team was that they would never go back to the old way of working. As one developer humorously put it, “Words cannot properly express how chicken-with-its-head-cut-off life is without this system.”
Design and Features
The specification tool is designed to guide the user through capturing the most important decisions about the content model, to dynamically validate their correctness as much as possible, and then to provide tools for analyzing the resultant design.
The “Overview” Sheet
The “Overview” sheet (or tab) provides a high-level overview of your Drupal specification, including the number of each type of configuration specified versus completed. It’s entirely dynamically generated, so everything but the project name and base Drupal distribution fields is protected.
All sheets share certain common design features. Column headers have self-explanatory labels or notes with explanations. (Notes are indicated by a black triangle in the top right of the cell. Hover to see them.)
Columns and sheets generally proceed from left to right in descending order of logical priority. Infrequently used columns and sheets are hidden by default. (These will be discussed later.) Others that don’t apply for a given project can be hidden without affecting the dynamic features of the tool.
Wherever possible, fields have validation rules to prevent specifying impossible configurations and present valid options in dropdowns. Empty cells that should be filled out with the customer have a red background until completed. Those that can be filled out alone afterward have a yellow background.
Query results and computed values are always italicized, and ranges that shouldn’t be manually edited are protected when possible.
The Specification Sheets
Sheets for specification data entry are colored, and related sheets share the same color. The "Bundles" and "Fields" sheets are for specifying your content types, block types, vocabularies, media types, and such (which Drupal internally refers to as “bundles”) and the fields that are attached to them. The other specification sheets should be self-explanatory:
- "Views" and "Views displays"
- "Migrations" and "Migration mappings"
- "Moderation states" and "Moderation state transitions"
- "User roles"
The “Diagrams” and “Behat” Sheets
The two gray-colored sheets at the end dynamically generate PlantUML diagram markup and Gherkin for Behat tests, respectively. The “Diagrams” sheet has links to PlantUML implementations for converting the markup to graphics that can be stored in the codebase or supplementary documentation if desired. The generated Gherkin depends on our open source Behat contexts.
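To give a feel for what generating diagram markup from specification rows involves, here is a rough sketch in Python. It is not the spreadsheet's actual formulas, and the markup the tool produces may well look different. The `Reference` tuple, the `alias` helper, and the sample bundles are all hypothetical.

```python
import re
from typing import NamedTuple


class Reference(NamedTuple):
    source: str   # label of the bundle that holds the reference field
    field: str    # label of the reference field
    target: str   # label of the bundle the field points to


def alias(label: str) -> str:
    """Turn a human-readable label into a simple identifier for PlantUML."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def to_plantuml(bundles: list[str], references: list[Reference]) -> str:
    """Build PlantUML markup with one class per bundle and one arrow per reference."""
    lines = ["@startuml"]
    for label in bundles:
        lines.append(f'class "{label}" as {alias(label)}')
    for ref in references:
        lines.append(f"{alias(ref.source)} --> {alias(ref.target)} : {ref.field}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical sample data, not taken from any real specification.
markup = to_plantuml(
    ["Blog post", "Staff bio"],
    [Reference("Blog post", "Author", "Staff bio")],
)
print(markup)
```

The printed markup can be pasted into any PlantUML renderer to produce a small two-box diagram with one labeled arrow.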
The Hidden Sheets
Several infrequently used or “under the hood” sheets are hidden and can be accessed from the “View” menu under “Hidden sheets”:
- "Bundles CSV" and "Fields CSV" contain the same data as the Gherkin tables on the Behat tab for those who prefer a CSV format. (They look great on GitHub.)
- "Settings" contains static values for field validation and dropdowns, mostly taken from Drupal source code. If you need to add options provided by a contrib module or custom code, this is where to add them.
- "Queries" is for Google Sheets functions and queries that drive the dynamic features of the tool. You shouldn’t need to touch it.
Using the Tool
A tool is a dead artifact without a corresponding process. With compliments to Eric Clapton, it’s in the way that you use it. ;)
The first use of the tool is in the early elicitation of requirements. The first client conversations intended to yield a list of content types, even for the purpose of estimation, should be guided by the tool to avoid surprise discoveries later on.
Start with “Bundles” and ask the customer to list all the “things” that live on their site. Don’t fall into the trap of asking about content types or other “Drupalese”. Speak in business terms: blog posts, press releases, events, staff bios, recipes, as well as videos, stock tickers, image galleries, and other page components. The beauty of the “Bundles” designation is that it ignores technical distinctions and encourages business-level thinking.
List the “things” in the “Name” column. Skip the “Machine name” and “X” columns for now. (We’ll come back to them.) And for each row, begin by asking the customer to provide a one sentence description. This forces clarity up-front (after all, if you can’t explain something simply you don’t understand it well) and becomes automatic UI documentation that we all know nobody provides after the fact.
Next, ask for a current, live example, if available, and link to it. This will serve you later when you can’t remember what a certain bundle is for or your front end developer asks what it should look like. (If there is no current, live example to point to, this is a great place to document the need for a mockup!)
Armed with a good description and example, follow the remaining columns in order. Hide any that are irrelevant to your project or context. By the time you get to the “Type” column, you’ve gathered enough detailed requirements that the decision between content type, vocabulary, and the like practically makes itself most of the time. Document any details that don’t have their own column under “Settings/notes”.
When you’ve finished this exercise, ask the customer to look over the details and confirm them. If they have no changes, mark the “X” (or status) column of the new rows “a” for “Approved and ready to implement”. Next, ask if any “things” are missing. If not, move on to the “Fields” tab and complete it in like manner. The other tabs follow all the same conventions and are used the same way.
Once the specification is complete it can be handed off to the delivery team for implementation. It is safest at this point to “freeze” it in order to provide an unchangeable artifact to work with, much as you would cut a release tag for a given state of your codebase. Duplicate the spreadsheet and add a version number to the resultant document’s name. A helpful convention is to use the sprint number with a point release, e.g. “v1.0” or “v2.3”.
The duplicate is now the “frozen” specification. It must not be modified, because changes cannot be automatically merged into the original, upstream document and may be lost, and you don’t want to change the requirements while developers are implementing them. Create a ticket to track the work and include a link to the document. Keep the ticket free from all other details to avoid conflicting details and misdirection. This is key to the efficiency and cost savings of the process. Something like “Implement specification v1.0” is a good summary.
If you have a working Behat setup (pro tip: BLT), you can include the generated tests from the “Behat” sheet for automatic assertion of many aspects of the specification. (Yay, BDD!) You’ll just need our Behat contexts installed.
Next comes the implementation work itself. Two hidden columns, “Dev” and “QA”, are provided on each sheet to track progress. Un-hide these by clicking on the arrows in the column header on the right border of the “X” (status) column. The implementer can put their initials in the “Dev” column of each row as they complete the corresponding work. Later the reviewer can put their initials in the “QA” column as they verify each row.
Once the ticket has been accepted, return to the canonical version of the specification and mark the completed rows as such by changing their “X” (status) cells to “x” for “Implemented and done”. This is a good time to refresh any diagrams you’ve generated from the “Diagrams” sheet.
Rare is the content model that suffers no change over time. More likely, requirements will evolve, the customer will change their mind on points, feature development will necessitate additions, or you’ll find you just plain made mistakes the first time. When the inevitable happens, you need to update the specification.
To begin with, never implement changes without first updating the specification. As they say, the only thing worse than no documentation is incorrect documentation. Let slip the first indication that the process isn’t respected or the specification doesn’t matter and it’s only a matter of time before you find it on the street, looted and burned.
Specify additions to the content model just as in initial discovery, by adding new rows.
For each modification, change the “X” (status) cell to “c” for “Changed since implemented” and update the other cells as appropriate. Make it easier for the implementer to identify the change by calling it out in a note on the “X” cell, e.g., “Changed to required” or “Changed form widget”.
To specify a removal, change the “X” (status) cell to “d” for “To be deleted”. (If you just delete the row from the sheet, there will be nothing to tell the implementer to make the change!)
When done specifying a round of changes, freeze the specification again. If your content model is large, there’s a “To-do” filter view that will temporarily restrict the display to only rows that still need to be implemented without affecting others’ view of the sheet. Other than that, the implementation process is identical. Rinse and repeat.
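The status codes amount to a tiny state model, which can be sketched as follows. The four codes and their meanings come from the steps above. Which codes count as “to do” is our assumption (everything approved, changed, or marked for deletion but not yet implemented), and the row dictionaries are hypothetical stand-ins for spreadsheet rows.

```python
# Status codes used in the "X" column, as described in the post.
STATUS_MEANINGS = {
    "a": "Approved and ready to implement",
    "x": "Implemented and done",
    "c": "Changed since implemented",
    "d": "To be deleted",
}

# Assumption: these are the statuses that still require implementation work.
TODO_STATUSES = {"a", "c", "d"}


def todo_rows(rows):
    """Return only the rows whose "X" status still needs implementing."""
    return [row for row in rows if row.get("X", "").strip() in TODO_STATUSES]


# Hypothetical rows standing in for lines on the "Bundles" sheet.
rows = [
    {"Name": "Blog post", "X": "x"},
    {"Name": "Press release", "X": "c"},
    {"Name": "Event", "X": "a"},
    {"Name": "Recipe", "X": "d"},
    {"Name": "Staff bio", "X": ""},   # not yet approved by the customer
]

for row in todo_rows(rows):
    print(f'{row["Name"]}: {STATUS_MEANINGS[row["X"]]}')
```

Rows with an empty status are left out here on the assumption that nothing should be implemented before the customer has approved it.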
Gotchas, Limitations, and Known Issues
We’re pleased with the tool, but this is its first public release, and we’d be lying if we said there weren’t any wrinkles to iron out. Here are a few things to look out for as you use it.
- You can’t pick more than one value for the “Ref. bundle” field (target bundle for reference fields) because Google Sheets validation doesn’t support multiple dropdown selections. We just hand type additional values, separated with commas. It captures the necessary details for implementers, but the cell will get flagged as having an error, and it will break the entity relationship diagram (ERD). This is probably the biggest limitation of the tool.
- A large enough data set can break the Behat test generation tab due to character length limits in Google Sheets.
- If you give two bundles the same label it will confuse the queries and break the auto-generated entity relationship diagram (ERD), even if they’re of different content entity types (e.g., a content type and a vocabulary). It’s probably best from a UX perspective not to name multiple things identically anyway. ¯\_(ツ)_/¯
- If you add a row at the top of a sheet it will confuse the queries, and the new row won’t be included in cell validation dropdowns, diagrams, or tests. When adding a new row that belongs at the top, it’s best to add it between rows two (2) and three (3) and then drag it into position.
- Google Sheets conditional formatting ranges are a little fragile, and if you drag rows around a lot you may find the zebra striping, for example, behaving strangely here and there.
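Two of these gotchas are easy to check for outside the spreadsheet, for example against an exported copy of the data. The sketch below is only an illustration: the function names are hypothetical, and the exact-match comparison of labels is an assumption about what counts as "the same label".

```python
from collections import Counter


def split_ref_bundles(cell: str) -> list[str]:
    """Split a hand-typed, comma-separated "Ref. bundle" cell into bundle labels."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def duplicate_labels(labels: list[str]) -> list[str]:
    """Return labels that appear more than once (exact match after trimming)."""
    counts = Counter(label.strip() for label in labels)
    return sorted(label for label, count in counts.items() if count > 1)


# Hypothetical sample values.
print(split_ref_bundles("Blog post, Staff bio ,"))
print(duplicate_labels(["Tags", "Event", "Tags"]))
```

A check like this does not fix the diagram breakage, but it can flag the offending cells before someone goes looking for the cause.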
We hope to enhance and improve the tool over time. Here are a few things we’re thinking about:
- Add support for contact forms and/or webforms.
- Add more validation, e.g., to ensure that bundle level specification implying fields actually have the corresponding fields, or to restrict “Form widget” dropdown options to those that are actually available for the given “Field type”.
- Add sheets for theming details.
- Blue skies and unicorns!
There’s so much opportunity for improvement and expansion, and we’d love for you to be a part of it! Please offer feedback, including bug reports and feature requests for the spreadsheet or the Behat contexts, at https://github.com/acquia/drupal-spec-tool. Thanks for joining us!
Acquia Developer Center May 30, 2018 June 13, 2018