Mutation testing in the real world

In my last article, I helped you set up mutation testing using the Infection library on an example project that implemented a simple card game. Of course, the real world is not so simple. When you try to implement mutation testing in your own project, you’re likely to encounter stumbling blocks and questions such as:

  • How do I interpret a mutation test report?
  • How should I handle timeouts and errors?
  • Why do tests that pass in PHPUnit fail in Infection?
  • How do I exclude test cases from mutation testing?

These issues can seem overwhelming at first, but fear not! In this post I’ll walk you through how to get mutation tests running (and running well) on a real project.

Interpreting mutation reports

I’ve updated my example Mutation Testing application to demonstrate two of the most common problems in mutation testing: errors and timeouts. Try it yourself:

pecl install pcov # If you don't already have pcov installed
git clone https://github.com/danepowell/mutation-example.git --branch errors
cd mutation-example
composer install
./vendor/bin/infection --show-mutations

The results should look something like this:

[Image: Mutation testing report]

It may not be immediately obvious what each part of this report indicates, and indeed, whether it’s good or bad. Let’s break it down:

  • The first few lines indicate that Infection was able to generate 43 mutations. This means that Infection attempted to mutate (break) your code 43 times. Recall that Infection has a finite set of mutators it can apply to your source code, so the number of mutations generated will be a function of the size of your code base and how many mutators are applicable. This number is not a reflection of code quality.
  • We then see 38 mutants were killed. This means that of those 43 mutations, 38 caused tests to fail. This is a good thing! Killed mutants are good and indicate high quality tests. Recall that the MSI is the ratio of detected mutants to mutations generated. Infection counts timeouts and errors as detected alongside killed mutants, so the MSI here is (38 + 1 + 1)/43, or 93%.
  • 3 covered mutants were not detected. This means that of 43 mutations, three did not cause tests to fail even though tests cover the mutated code. These escaped mutants are bad and indicate poor quality tests. See the first post in the series for tips on how to kill mutants.
  • 1 errors were encountered. This means that when Infection mutated your code, it didn’t just break tests, it actually caused a fatal error such as memory exhaustion via an infinite loop. See the next section for an example and tips on how to fix these errors. Errors are bad because they slow down tests, but they don’t indicate any problem with your code.
  • 1 time outs were encountered. Timeouts indicate that the tests took too long to run against a mutant. Often this is due to a mutation causing a wait condition to never terminate. See the next section on how to fix them. Like errors, timeouts are bad because they slow down tests, but they don’t indicate any problem with your code.
  • 0 mutants required more time than configured. Remember how Infection runs each test case once before applying mutators? Part of the purpose of this check run is to measure how long each test case takes. If the runtime is longer than the configured timeout, Infection skips the test entirely to avoid wasting time on tests it is reasonably certain will time out anyway. Skipped mutants are bad because they reduce test coverage. Try to speed up these tests or increase the timeout.
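
The 93% figure in the report can be reproduced by hand. Here is a minimal sketch, assuming Infection counts timeouts and errors as detected mutants alongside killed ones; the function name is my own and is not part of Infection:

```php
<?php
declare(strict_types=1);

/**
 * Mutation score indicator as a percentage.
 * Assumption: killed, timed-out, and errored mutants all count as detected.
 */
function mutationScoreIndicator(int $killed, int $timedOut, int $errors, int $total): float
{
    if ($total === 0) {
        return 0.0; // Nothing was mutated, so there is nothing to score.
    }

    return ($killed + $timedOut + $errors) / $total * 100;
}

// Counts from the report above: 38 killed, 1 timeout, 1 error, 43 mutants in total.
printf("MSI: %.0f%%\n", mutationScoreIndicator(38, 1, 1, 43)); // prints "MSI: 93%"
```

Note that fixing the error and the timeout would not raise this score, since both already count as detected. Only killing the three escaped mutants would.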

Handling timeouts and errors

As mentioned, timeouts and errors don’t indicate a problem with the quality of your source code; they are simply a byproduct of mutation testing. However, they are still a cause for concern because over time they can degrade the performance of mutation tests by increasing the amount of required time and resources (especially memory).

For instance, consider the following mutation which generates an error:

[Image: Increment RuntimeError]

In this case, the Increment mutator causes an infinite loop, leading to memory exhaustion. Besides wasting time and resources on a test that you know will fail, this can destabilize the machine running the tests. It’s best to address these errors by disabling mutators on the affected line using a comment such as /** @infection-ignore-all */.

The same mutation could result in a timeout instead of an error if the process runs out of time before it runs out of other resources:

[Image: Increment Timeout]

In either case, the easiest fix is to disable mutators for that line of code.
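
To make this concrete, here is a rough sketch of what the annotation might look like on a loop. The function is hypothetical and not taken from the example repository, and it assumes the annotation covers the statement it precedes:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a list with the same greeting repeated.
 */
function repeatGreeting(int $times): array
{
    $lines = [];

    // Without the annotation, a mutator could change $i++ to $i--. The loop
    // condition would then never become false, and $lines would grow until
    // PHP ran out of memory or Infection hit its timeout.
    /** @infection-ignore-all */
    for ($i = 0; $i < $times; $i++) {
        $lines[] = 'hello';
    }

    return $lines;
}

print_r(repeatGreeting(3));
```

The trade-off is that nothing on the annotated statement gets mutated anymore, so it is worth keeping the annotation as close as possible to the line that actually misbehaves.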

You might be inclined to increase the Infection timeout in order to “fix” timeouts. Unless you know that your code legitimately takes longer than the timeout to run, this is likely to just exacerbate the problem and either convert timeouts to errors (i.e., memory exhaustion) or make tests take longer.

Why tests pass in PHPUnit and fail in Infection

When you add mutation testing to an existing project, the problem you’re most likely to encounter is that mutation tests fail when running the initial test suite, resulting in a rather alarming error:

[Image: Infection failing test]

Before actually running any mutation tests, Infection ensures your tests are passing by running the initial test suite. This is an important step, because otherwise failing tests would appear as killed mutants, erroneously boosting your mutation score indicator (MSI).

You might wonder why your tests pass in PHPUnit and fail in Infection. The most likely answer is that Infection runs tests in parallel and random order, whereas PHPUnit by default runs tests serially in a static order.

To ensure PHPUnit behaves like Infection, use a tool like ParaTest to run tests in parallel and configure PHPUnit to run tests in random order by updating your phpunit.xml:

<phpunit executionOrder="random">
   <!--  ...  -->
</phpunit>

The Infection documentation provides guidance on the most common root causes of test failures and some workarounds. Having fully independent test cases that can run in parallel will make your tests much faster and more robust. For instance, Acquia CLI is a great example of how to implement mutation testing on a complex application and is able to run over 350 functional tests in less than 3 seconds.
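
If it’s not obvious how a test can depend on execution order, the following plain-PHP sketch may help. It deliberately avoids PHPUnit so it can run on its own; the ScoreBoard class and the closures standing in for test methods are hypothetical:

```php
<?php
declare(strict_types=1);

// Hypothetical class with hidden static state (not from the example project).
final class ScoreBoard
{
    private static int $score = 0;

    public static function add(int $points): int
    {
        self::$score += $points;

        return self::$score;
    }

    public static function reset(): void
    {
        self::$score = 0;
    }
}

// Stand-ins for two test methods. The second silently relies on the first
// having already added 5 points.
$coupled = [
    'firstRound'  => fn (): bool => ScoreBoard::add(5) === 5,
    'secondRound' => fn (): bool => ScoreBoard::add(2) === 7,
];

// Independent versions: each one builds the state it needs from scratch.
$independent = [
    'firstRound' => function (): bool {
        ScoreBoard::reset();

        return ScoreBoard::add(5) === 5;
    },
    'secondRound' => function (): bool {
        ScoreBoard::reset();
        ScoreBoard::add(5);

        return ScoreBoard::add(2) === 7;
    },
];

/** Runs each check in the given order and reports whether all of them passed. */
function allPass(array $tests): bool
{
    ScoreBoard::reset(); // Every run starts from a clean slate.
    foreach ($tests as $test) {
        if (!$test()) {
            return false;
        }
    }

    return true;
}

var_dump(allPass($coupled));                    // bool(true): declared order
var_dump(allPass(array_reverse($coupled)));     // bool(false): reversed order
var_dump(allPass($independent));                // bool(true)
var_dump(allPass(array_reverse($independent))); // bool(true): order no longer matters
```

In a real PHPUnit suite, the reset would typically live in a setUp() method so that every test starts from a known state.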

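If you follow the best practices defined here and run PHPUnit tests yourself prior to running Infection, you can save even more time by using the --skip-initial-tests flag. Because Infection then no longer generates coverage itself, you also need to point it at your existing coverage reports with the --coverage option.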

Excluding tests from Infection

If you have tests that cannot be parallelized, or that otherwise are incompatible with mutation testing, it’s easy to exclude them by adding the testFrameworkOptions directive to infection.json5. For instance, a real-world configuration file that excludes test cases annotated as “serial” might look like this:

{
    "$schema": "vendor/infection/infection/resources/schema.json",
    "source": {
        "directories": [
            "src"
        ]
    },
    "logs": {
        "stryker": {
            "report": "main"
        },
        "github":  true,
        "html": "var/infection.html"
    },
    "mutators": {
        "@default": true
    },
    "timeout": 300,
    "testFrameworkOptions": "--exclude-group=serial"
}

Any value you provide for testFrameworkOptions will be passed directly to PHPUnit, so you could also use --filter or similar arguments.
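
For the --exclude-group=serial option above to have any effect, the tests in question need to be tagged with that group. Here is a sketch of what that might look like using PHPUnit’s @group docblock annotation; the test class is hypothetical, and newer PHPUnit versions favor attributes over docblock annotations, so check which form your version expects:

```php
<?php
declare(strict_types=1);

use PHPUnit\Framework\TestCase;

/**
 * Hypothetical test that writes to a fixed path and so cannot run in parallel.
 *
 * @group serial
 */
final class SharedLockFileTest extends TestCase
{
    public function testLockFileCanBeWritten(): void
    {
        // Every run uses the same path, so parallel runs would collide.
        $path = sys_get_temp_dir() . '/mutation-example.lock';

        file_put_contents($path, 'locked');
        self::assertSame('locked', file_get_contents($path));

        unlink($path);
    }
}
```

Running ./vendor/bin/phpunit --exclude-group=serial directly is a quick way to confirm the group is being skipped before handing the same option to Infection.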

Now it’s your turn

Hopefully this demystifies mutation testing and you're ready to go improve your tests. If you get stuck, be sure to check the comprehensive Infection documentation, and if you're still stuck, open an issue on GitHub.