Process Improvement for Managing Project Complexity and Scale

All projects of scale go through a phase when they outgrow the loose, undefined processes that work for small projects. As a project grows in size and complexity, its processes need to be better defined so that they can handle the additional scale. What sets successful projects apart from unsuccessful ones is that successful teams continuously identify when and where processes are breaking down, and continuously improve them. I recently saw a project team recognize that its quality assurance and deployment processes were starting to break, and fix those processes before any major issues occurred.

The team looked at the project and found a number of areas that had started to cause them pain while doing their jobs. One of the first issues raised was an early decision not to use automated functional tests. The team recognized that as the system had grown in size and complexity, manual testing was no longer feasible. Another issue was that the rollback plans for each release were not fully developed or tested. The team had simply assumed that rolling back the code in the source control system and restoring the database would be enough. As complexity increased, they decided that more time was needed to plan and test the rollback for each release properly. Finally, the deployment process itself had been too loose, with an ad-hoc communication path between developers, quality assurance, project management and the client during releases. The team did not feel that all members knew when deployments were going to be made, that adequate post-release testing was happening, or, if there was an issue, when and how to roll back the release. The project team has started to take corrective action on these issues to ensure that it continues to deliver a quality product to the client.

With the findings from the project retrospective in hand, the team has implemented a number of new processes to fix the above issues. The first is the deployment and rollback plan. The team will not only document the deployment and rollback plan, but will also test those plans to ensure that they work prior to the release. Much of each plan consists of very basic procedures, such as backing up the database before the release, documenting the configuration changes that the deployment requires, and documenting how to roll those changes back in the event of an unsuccessful release. The team now has a checklist of post-deployment tasks with task owners to ensure that nothing is forgotten. They are also planning to hold a “Go/No Go” meeting with all parties represented: development, quality assurance, and the client. This is to ensure that everyone is aware of the new functionality to be deployed, the bugs that were fixed, the results of quality assurance, and any known issues that are outstanding. This meeting will give the client a clearer “final word” on whether the release should be deployed or delayed to fix any outstanding issues. These small changes should help future deployments go smoothly; if issues do occur, the release can be quickly reverted to its original state.
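To make the checklist and the “Go/No Go” decision concrete, here is a minimal sketch in Python. It is not the team's actual tooling; the class names, fields, and decision rules (`ChecklistItem`, `ReleaseReview`, `go_no_go`, `unfinished`) are assumptions made up for illustration, based only on the items described above.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistItem:
    """One post-deployment task and the person responsible for it (hypothetical)."""
    task: str
    owner: str
    done: bool = False


@dataclass
class ReleaseReview:
    """Facts gathered for the Go/No Go meeting (hypothetical fields)."""
    rollback_tested: bool
    qa_passed: bool
    open_blockers: list = field(default_factory=list)
    client_approves: bool = False


def go_no_go(review):
    """Return ("GO" or "NO GO", list of reasons for a NO GO)."""
    reasons = []
    if not review.rollback_tested:
        reasons.append("rollback plan not tested")
    if not review.qa_passed:
        reasons.append("quality assurance has not signed off")
    if review.open_blockers:
        reasons.append("blocking issues: " + ", ".join(review.open_blockers))
    if not review.client_approves:
        # The client keeps the final word on the release.
        reasons.append("client has not approved")
    return ("GO" if not reasons else "NO GO", reasons)


def unfinished(checklist):
    """List (task, owner) pairs that are still open after deployment."""
    return [(item.task, item.owner) for item in checklist if not item.done]
```

A review with a tested rollback plan, a QA sign-off, no blockers and client approval yields `"GO"`; anything else yields `"NO GO"` together with the reasons to discuss in the meeting. Keeping an owner on every checklist item makes it obvious whom to ask when a task is still open.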

Longer term, the team is moving toward a better Quality Management solution. This solution will include a continuous integration tool, the Drupal SimpleTest module for unit tests (the system is on Drupal 6), and Selenium for browser and functional tests. Using these tools will allow the team to truly regression test the system and make sure that improvements to the system or changes in configuration are not breaking the build. The tests that are created will be linked, manually, to the requirements and bug issues in the team's Redmine project management system, so the team understands what was tested and whether it passed or failed. This will also allow them to understand how changes to requirements will affect the testing procedure. These steps will take a while to implement fully, because the automated tests and their links to requirements have to be back-ported gradually while the team continues to develop new functionality. Finally, there is the possibility of adding performance testing and security testing to the continuous integration environment in the future, to further improve the testing process. When these changes are fully implemented, the team expects to save a tremendous amount of time and effort and to provide a better quality product.

Taking time to review a project's processes is time well spent to ensure its continued success. This is often difficult to do under constant pressure to deliver within tight time frames, but it pays off later through added efficiency and by averting major customer-facing issues with website quality.