Project Applications are the process by which we invite new code contributors to the contributed project repository. This process needs to be made more scalable. In recent weeks the review times have gone down dramatically, but it will be tough to sustain that improvement. I believe solving this problem will require both automation and more humans.* Read through to the end for details on a program where I will mentor anyone interested in learning to do security reviews of a module.
Project application process history
The "project application" process has a relatively long history in the Drupal project. Going back as far as I remember, it was the "cvs application" process, where people confirmed they would follow the guidelines for CVS and stated a "motivation", and as long as the motivation was decent they got access. Lots of people applied and then didn't follow the guidelines or never contributed anything. So, it then started to include a sample project so you could show you were legitimately interested and were already following some guidelines. This process was all done on a mailing list, which meant few people were able to follow and learn from the process. In July of 2009 it was moved to the CVS Applications project, which followed roughly the same process but was at least in the open, where the barriers to entry as a reviewer were lower, the process was more transparent, and anyone could potentially benefit from the feedback in a review. Along the way the requirements to get approved have changed in a typical organic manner, with plenty of discussion but little unanimous agreement. In addition to security and licensing, applications are now reviewed for code style, correctness in use of APIs, duplication of existing projects, and quality of documentation. This is a big change from how things were done years ago, but the nature of the Drupal community has also changed a lot in that time.
In March of 2011 we moved to the Project Application process. The process is roughly:
- Create a git sandbox - a new concept introduced by the git migration which has limited review and limited features (no namespace, no downloadable tarball)
- Once your sandbox is in a good state, create an issue in the project application queue
- Wait somewhere between a few days and a few months to get reviewed, fix any problems identified in the review, do some learning about Drupal best practices, and eventually have your project accepted.
The last step - especially the waiting aspect - has caused a lot of concern.
Process inflection point: Drupalcon London
At Drupalcon London, several people interested in the process got together to research the current system, analyze the problems, and brainstorm solutions, and then gave a presentation of their ideas to improve the review process. There were two big takeaways:
- the current system was taking too long, focusing on the wrong things, and frustrating new contributors
- scaling the process with human reviewers is hard, doesn't help with some areas, and we should focus on "automating" as much as possible
"Automating" takes on two meanings here. First, automate some truly automatable tasks like looking for files named LICENSE* or identifying the lack of a README.txt and alerting the user to fix those problems with links to documentation. This frees up human reviewers, and it is emotionally easier for the applicant to get that feedback from a robot. Second, it also means we could create a "quiz" which would help automate some of the API and security checks people do while also providing more consistent coverage of areas of Drupal. Regarding coverage, consider a typical Drupal module which might touch on 2 elements of security but miss 20. The quiz can help to cover all 20 items.
Work is underway on automating these things in a step-by-step manner. We'll add some, see how it works, and consider adding more as we can. If it's taken to its conclusion, this will require improvements to the testing infrastructure and to automated code review tools like coder and secure code review, among others.
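As a rough illustration of the first kind of automation, here is a minimal Python sketch of a check for LICENSE* files and a missing README.txt. It is not the tooling actually being built: the choice of Python, the function names, and the example.com documentation links are all placeholders of mine, and it assumes that any top-level file matching LICENSE* should be flagged.

```python
import fnmatch
import os
import tempfile

# Placeholder documentation links (hypothetical; swap in the real handbook pages).
DOCS = {
    "license": "https://example.com/docs/licensing",
    "readme": "https://example.com/docs/readme",
}


def check_sandbox(path):
    """Return a list of (problem, docs_link) tuples for a checked-out sandbox."""
    names = sorted(os.listdir(path))
    problems = []

    # Assumption: any top-level file named LICENSE* is something to flag.
    for name in names:
        is_file = os.path.isfile(os.path.join(path, name))
        if is_file and fnmatch.fnmatchcase(name, "LICENSE*"):
            problems.append((f"Found license file: {name}", DOCS["license"]))

    # Flag the lack of a README.txt.
    if "README.txt" not in names:
        problems.append(("No README.txt found", DOCS["readme"]))

    return problems


def format_report(problems):
    """Turn the problem list into the text a review robot could post."""
    if not problems:
        return "No automated problems found."
    return "\n".join(f"- {text} (see {link})" for text, link in problems)


if __name__ == "__main__":
    # Small demo on a throwaway directory with a LICENSE.txt and no README.txt.
    with tempfile.TemporaryDirectory() as sandbox:
        open(os.path.join(sandbox, "LICENSE.txt"), "w").close()
        print(format_report(check_sandbox(sandbox)))
```

Each check stays small and independent, so more can be added one at a time, which fits the step-by-step approach of adding some, seeing how it works, and then adding more.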
There are two big posts you should refer to if you are interested in how to automate the process. The first is the Jthorson battle plan, which draws heavily on the Drupalcon London presentation for ideas. The second, somewhat related, is Jim Berry's ReviewDriven.com-based proposal to revamp the testing.
Meanwhile back at the ranch...
Manual Drupal project reviews continue
Two months after Drupalcon London, the manual review process has to continue. For most of that time it stayed roughly the same: applications that needed review hovered in the 200+ range and projects took anywhere from 1 week to as long as 8 weeks to get a review.
However, in the last 2 weeks it has gotten a lot better. It's hard to say exactly why, but Klaus Purer of epiqo and Scott Reynen of Aten Design Group have both done an amazing number of reviews (I've done a fair number as well ;) ). Just take a look at this graph showing the "needs review" queue falling while we see a rise in the lines for the needs work and "all fixed" (a combination of fixed and closed fixed). The needs work queue dropped as low as 24, though it's back up a bit. After the initial cleanup we're getting to a point where the "needs work" queue is starting to fall and the "all fixed" is starting to rise.
How you can help!
So, now we get to the good stuff.
If you want to get involved we need help in a variety of places.
You can join the IRC channel #drupal-codereview or view resources in the Code Review group, both of which will help you get plugged in to all the initiatives I described above. If you want to do reviews, you should especially read the resources in the "Go Ahead and Review" area on the group homepage. If you want to help with automation, see this wiki of tasks to automate.
Here is what I think is an important part. For folks who will do manual reviews, I am making an offer: you do the first step in the review, looking for items in the how to review section, and I'll do one-on-one mentoring through the security review. More on that idea in a blog post tomorrow.
*Important side note
There is also a vocal contingent that thinks the solution is to open the floodgates. If we let anyone create a project with their own namespace and full tarball downloads, then that would remove the whole need to build this crazy bureaucracy/automation stuff. The drawbacks there are namespace squatting, the problem of module duplication leading to confusion about which module to use, and that it would likely lead to large (or just larger?) numbers of insecure modules. Those problems can probably all be solved in other ways, and it's important to note the plan I'm working toward is not the only way to "solve" this problem.