Better tests through mutation testing; or, killing mutants for fun and profit

Quis custodiet ipsos custodes (who watches the watchers?)

Automated unit tests are one of the best ways of ensuring code quality. But how do you measure and ensure the quality of your unit tests?

The coverage metric is the traditional way of measuring test quality. Coverage measures what percentage of a codebase is executed during unit tests. For instance, if your codebase has 100 lines of code and only 90 of them run during unit tests, it is 90% covered.

Coverage alone isn't enough

But coverage alone is a poor measure of test quality because it doesn’t ensure that the tests actually test your code, only that they run it. Having 100% coverage only guarantees that every line ran without a fatal error on the inputs your tests happened to use. But saying that your code won’t burn down, fall over, and sink into the swamp is hardly a measure of quality.

Mutation testing complements coverage by ensuring that your tests... well, actually test something! 🤯

It does this by mutating (i.e., intentionally breaking) your codebase and then checking that your tests catch the breaking change. If the change is caught, the mutant is “killed”. If the change is not caught, the mutant “escapes”. The ratio of killed mutants to total mutants, expressed as a percentage, is the mutation score indicator (MSI). Like coverage, a higher MSI is better!
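As a rough sketch of that arithmetic, the snippet below computes the score from a killed count and a total count. The function name mutationScore is hypothetical and not part of any library; it simply follows the killed-to-total definition given above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: percentage of generated mutants that the tests killed.
 */
function mutationScore(int $killed, int $total): float
{
    if ($total <= 0) {
        throw new InvalidArgumentException('Total mutants must be positive.');
    }
    if ($killed < 0 || $killed > $total) {
        throw new InvalidArgumentException('Killed mutants must be between 0 and the total.');
    }

    // Multiply first so that whole-number percentages stay exact.
    return 100 * $killed / $total;
}

// Example: 8 of 20 mutants killed.
printf("MSI: %.0f%%\n", mutationScore(8, 20)); // prints "MSI: 40%"
```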

Mutation testing by example

https://github.com/danepowell/mutation-example

Fortunately, mutation testing is easy to implement in PHP using the Infection framework. Consider this codebase that implements the card game War. Follow along by cloning it yourself (you’ll just need PHP 8.1 and the PHP pcov extension installed):

pecl install pcov # If you don't already have pcov installed
git clone https://github.com/danepowell/mutation-example.git --branch coverage-only
cd mutation-example
composer install
./vendor/bin/phpunit --coverage-text

If you check out the coverage-only branch and run ./vendor/bin/phpunit --coverage-text, you’ll see it has 100% coverage. Sounds great, right? But look more closely at the test cases. What do you notice?

public function testAnnounceWinner($card1, $card2, $expectedWinner): void {
  $war = new War($card1, $card2);
  $this->assertStringContainsString('!', $war->announceWinner());
}

None of the test cases have meaningful assertions. Imagine what would happen if we "accidentally" flipped the inequality on line 36 of src/War.php:

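public function announceWinner(): string
{
  $card1value = self::getCardValue($this->card1);
  $card2value = self::getCardValue($this->card2);
  // Line 36: the comparison we are about to break.
  if ($card1value > $card2value) {
    return "Player 1 wins!";
  }
  if ($card1value < $card2value) {
    return "Player 2 wins!";
  }
  return "It's a war!";
}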

Our program would return the wrong result for every hand that isn't a tie, and our tests would still pass. So much for great test coverage!

This thought experiment is exactly what mutation testing implements as a practice. To see this in action, run ./vendor/bin/infection --show-mutations:

$ ./vendor/bin/infection --show-mutations
...
20 mutations were generated:
       8 mutants were killed
       0 mutants were configured to be ignored
       0 mutants were not covered by tests
      12 covered mutants were not detected
       0 errors were encountered
       0 syntax errors were encountered
       0 time outs were encountered
       0 mutants required more time than configured

Metrics:
         Mutation Score Indicator (MSI): 40%
         Mutation Code Coverage: 100%
         Covered Code MSI: 40%

Notice the 40% MSI alongside 100% code coverage. This tells a fuller story of our test "quality" (such as it is). Now look at the escaped mutants.

10) /Users/dane.powell/src/danepowell/mutation-example/src/War.php:36    [M] GreaterThanNegotiation

--- Original
+++ New
@@ @@
     {
         $card1value = self::getCardValue($this->card1);
         $card2value = self::getCardValue($this->card2);
-        if ($card1value > $card2value) {
+        if ($card1value <= $card2value) {
             return "Player 1 wins!";
         }
         if ($card1value < $card2value) {

This mutant demonstrates the exact scenario we mentioned, i.e., changing the inequality so that the wrong winner is returned. Because our tests didn't fail in response to the mutation, it counts as an escaped mutant.

Now for the fun part: killing those mutants! Check out the main branch, which adds assertions such as this:

public function testAnnounceWinner($card1, $card2, $expectedWinner): void {
  $war = new War($card1, $card2);
  $this->assertSame($expectedWinner, $war->announceWinner());
}

Then re-run the mutation tests.

$ git checkout main
$ ./vendor/bin/infection --show-mutations
20 mutations were generated:
      20 mutants were killed
       0 mutants were configured to be ignored
       0 mutants were not covered by tests
       0 covered mutants were not detected
       0 errors were encountered
       0 syntax errors were encountered
       0 time outs were encountered
       0 mutants required more time than configured

Metrics:
         Mutation Score Indicator (MSI): 100%
         Mutation Code Coverage: 100%
         Covered Code MSI: 100%

You’ll see that by simply adding a few meaningful assertions, we've increased the MSI to 100% and, more importantly, significantly improved the quality of our tests.
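To see why the stronger assertion makes the difference, here is a standalone sketch that runs both kinds of check against a mutated comparison. Everything in it is a simplified stand-in: announceWinner and announceWinnerMutant are hypothetical free functions that take card values directly, and the $weak and $strong closures only imitate the two PHPUnit assertions.

```php
<?php
declare(strict_types=1);

// Simplified stand-in for War::announceWinner(), taking card values directly.
function announceWinner(int $card1value, int $card2value): string
{
    if ($card1value > $card2value) {
        return "Player 1 wins!";
    }
    if ($card1value < $card2value) {
        return "Player 2 wins!";
    }
    return "It's a war!";
}

// The same function with the mutation applied: ">" replaced by "<=".
function announceWinnerMutant(int $card1value, int $card2value): string
{
    if ($card1value <= $card2value) {
        return "Player 1 wins!";
    }
    if ($card1value < $card2value) {
        return "Player 2 wins!";
    }
    return "It's a war!";
}

// Weak check: only looks for an exclamation mark in the result.
$weak = fn (string $result): bool => str_contains($result, '!');
// Strong check: compares the result against the expected winner.
$strong = fn (string $result, string $expected): bool => $result === $expected;

// Ace (14) against a 2: player 1 should win.
$mutantResult = announceWinnerMutant(14, 2);

var_dump($weak($mutantResult));                     // bool(true): the mutant escapes
var_dump($strong($mutantResult, "Player 1 wins!")); // bool(false): the mutant is killed
```

The weak check passes for any of the three possible return values, so no mutation of the comparisons can ever make it fail.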

Mutators

Going back to the coverage-only branch, let's look at how the mutations were generated. You'll see a "mutator" name printed to the right of each mutation, such as GreaterThan and LessThan.

11) src/War.php:40    [M] LessThan

--- Original
+++ New
@@ @@
         if ($card1value > $card2value) {
             return "Player 1 wins!";
         }
-        if ($card1value < $card2value) {
+        if ($card1value <= $card2value) {
             return "Player 2 wins!";
         }
         return "It's a war!";
     }
 }

The Infection library has dozens of mutators built in, each of which tries to break your code in a unique way. In addition to GreaterThan and LessThan, which shift the boundary of an inequality (for example, changing < to <=), you’ll see, among others, the DecrementInteger and IncrementInteger mutators, which work by subtracting one from or adding one to the integers they find.

8) src/War.php:28    [M] IncrementInteger

--- Original
+++ New
@@ @@
             'J' => 11,
             'Q' => 12,
             'K' => 13,
-            'A' => 14,
+            'A' => 15,
         };
     }
     public function announceWinner() : string
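To make the idea of a mutator concrete, here is a toy version of the LessThan behavior shown above. It is only a sketch under stated assumptions: lessThanMutants is a hypothetical function, it works on raw tokens from PHP's built-in PhpToken class, and it is not how Infection itself is implemented.

```php
<?php
declare(strict_types=1);

/**
 * Toy mutator (hypothetical): yields one mutated copy of $source for each
 * "<" operator, with that single operator changed to "<=".
 *
 * @return Generator<int, string>
 */
function lessThanMutants(string $source): Generator
{
    $tokens = PhpToken::tokenize($source);

    foreach ($tokens as $index => $token) {
        // Single-character tokens use their ASCII code point as the id, so
        // this matches the operator but not "<" inside strings or "<?php".
        if ($token->id !== ord('<')) {
            continue;
        }

        // Rebuild the source, swapping only the token at $index.
        $mutated = '';
        foreach ($tokens as $i => $t) {
            $mutated .= ($i === $index) ? '<=' : $t->text;
        }

        yield $mutated;
    }
}

$source = '<?php if ($a < $b) { return "Player 2 wins!"; }';
foreach (lessThanMutants($source) as $mutant) {
    echo $mutant, PHP_EOL;
}
```

Each yielded string is one mutant. A mutation testing tool would then run the covering tests against every mutant and record whether any test failed.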

Implementing mutation testing

Keep reading for some tips and best practices, and when you're ready, refer to the thorough Infection documentation to get started.

The best way to implement mutation testing is via continuous integration. For a brand-new codebase, implement it from the start and require a minimum MSI for all pull requests. When adding mutation testing to an existing codebase, consider running Infection with the --git-diff-lines option so that only added or changed lines are mutated. These practices ensure that test quality gradually improves over time without requiring a major up-front investment in mutant-killing.

If you use GitHub Actions, make sure the annotations logger (enabled by default) is working so you can see escaped mutants right in the PR!

[Image: GitHub annotation]

Also consider setting up a Stryker Dashboard account to see mutation results beautifully rendered and navigable, as in this example from Acquia CLI:

[Image: Stryker Dashboard]

Finally, keep in mind that Infection simply runs your existing PHPUnit tests under the hood. Furthermore, it runs these tests dozens of times each, in random order, and in parallel when configured with more than one thread (see the --threads option). This means that your PHPUnit tests need to be idempotent and safe to run concurrently, and any underlying stability or performance issues will be exacerbated by mutation testing.
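One common source of trouble is tests that share a fixed file path or other global state. As a minimal sketch of the alternative, the hypothetical helper withScratchFile below gives each test its own temporary file and cleans it up afterwards, so repeated or concurrent runs cannot collide. It uses plain PHP rather than PHPUnit so that it runs on its own.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $test with a fresh scratch file and always
 * removes the file afterwards, so no state leaks between runs.
 */
function withScratchFile(callable $test): mixed
{
    $path = tempnam(sys_get_temp_dir(), 'war');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    try {
        return $test($path);
    } finally {
        // Clean up even when the test throws.
        if (file_exists($path)) {
            unlink($path);
        }
    }
}

// Usage: the test body never needs to know or hard-code the file location.
$result = withScratchFile(function (string $path): string {
    file_put_contents($path, "Player 1 wins!");
    return (string) file_get_contents($path);
});

var_dump($result === "Player 1 wins!");
```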

Happy hunting!

Mutation testing is one of the best ways to ensure test quality, and it takes just a few minutes to set up if your tests already follow best practices. Get started and get support by referring to the Infection docs and Infection GitHub page. As a reference, also check out Acquia CLI, an open-source Acquia product which implements mutation testing on its own codebase via a GitHub Actions workflow.

Now go forth, kill some mutants, and improve your tests. Happy hunting!