What is a quick way to integrate static analysis in a big project?
By Evgeniy Ryzhkov
So, you're a developer working on a project containing a lot of (perhaps a whole lot of) source code - say, over 100 Mbytes.
You have read a pile of our articles about how we analyze various open-source projects and decided to check your own project with some code analyzer. You've run the tool on your code and got over one thousand messages. Well, one thousand messages is an optimistic estimate. There may well be more than ten thousand. But you aren't a lazybones, are you? You set about reviewing them. And, oops, as early as the fifth message you find a real bug! The same with the seventh, ninth, twelfth, and fifteenth. You note down a dozen or so real bugs caught by the analyzer and go to your boss saying:
"Have a look here, Chief. I downloaded a cool analyzer. It caught ten real bugs in half an hour. And in total it output one (two, three, four) thousand warnings. Let's buy it! The guys and I will work on that and fix all the messages in two or three weeks. And after that we'll be getting 0 messages. That'll mean we are tough programmers writing super quality code!"
And though you are already anticipating a new medal being pinned on your chest (you are doing your best for the project's sake after all, aren't you?), your boss will most likely answer something like this:
"You've got nothing to do, huh? You want to distract the whole team for three weeks to do bug fixing when we have a new release in a month! So what if those bugs are real? We've been doing well all this time with them in the code. Sure, it would be great not to make mistakes in new code. So just don't make them! But don't you touch the old code! It has already been tested and the clients have paid for it. And who will pay the team for three more weeks of work? No, we are not buying that analyzer, nor touching the old code. And you seem to have no current tasks to do. OK, I'll give you a few - and they must be finished tomorrow!"
Is this scenario real? Yes, it is. The worst thing is that a nice idea of adopting static analysis fell through because it seemed you had to review and fix all the messages generated by the code analyzer. And if it outputs a thousand messages after every run, how will you tell old warnings from new ones? Is there a means to solve this problem?
How do we solve this problem in PVS-Studio?
A couple of releases ago, we added to PVS-Studio a mechanism to suppress old or "uninteresting" messages. By version 5.22, we got it completely tested and tuned, and it has proved to be so convenient and useful that we recommend it to everyone thinking of integrating static analysis into their projects. And now I will tell you how to use this mechanism.
So, imagine you have a project of 5 million lines of code (just as an example), with a codebase of 150 Mbytes of source files. You've had it checked by the analyzer and got a few thousand messages from the General Analysis rule set. You'd be glad to fix them, but the project manager is not willing to allocate you time for that.
OK, no problem. Once you're finished analyzing the entire solution, go to the menu PVS-Studio -> Suppress Messages... The dialog box is simple: you just need to press "Suppress Current Messages" and then Close. After that, you will see 0 messages in the PVS-Studio window. And even if you analyze the whole code once again, you will still get 0 messages. But if you start writing new code or modifying the old code, the analyzer will generate warnings on those changes - unless you make it perfect right from the start, of course.
How does it work?
The working principle behind this mechanism is pretty simple - at least now, after the number of tests we had to carry out to find the best way to implement it. OK, so what happens from the user's viewpoint when they press the "Suppress Current Messages" button?
In the project folder, a base of .suppress-files is created. Each project (.vcxproj-file) gets an associated .suppress-file. This file stores all the information about the messages generated by the analyzer when analyzing the project. It is these messages that every new analysis output is compared to.
This comparison is not head-on, of course. We take into account the message code and the text of the current code line, as well as the previous and next ones. But we don't take into account the line number. Suppose a message was generated for text at the end of a file and stored in the base that way. Adding lines to the beginning of the file shifts the line triggering the message, but we can track that and won't let the analyzer generate the "uninteresting" message again. But when one of the lines (previous, current, or next one) changes, the analyzer restores the warning, since the message's context has changed and we can treat the message as if it were output for new (modified) code.
What do you do with these .suppress-files? If PVS-Studio is run on one machine, say, a build server, you can store these files right in the project folders. If that's not convenient (for example, you do a clean build every time), then prior to launching the analyzer, you can copy the files into the project folder via robocopy from another location - this utility will keep the folder structure unchanged.
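The matching rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not PVS-Studio's actual implementation: the names SuppressedMessage, context_key and is_suppressed are hypothetical, and trimming whitespace before comparing lines is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressedMessage:
    """Hypothetical record: a warning code plus three lines of context."""
    code: str        # e.g. "V501"
    previous: str    # text of the line before the flagged one
    current: str     # text of the flagged line
    following: str   # text of the line after the flagged one


def context_key(code, lines, index):
    """Build the key for a warning reported on lines[index] (0-based).

    The line number itself is deliberately left out of the key.
    """
    def text(i):
        # Assumption: surrounding whitespace is ignored when comparing.
        return lines[i].strip() if 0 <= i < len(lines) else ""

    return SuppressedMessage(code, text(index - 1), text(index), text(index + 1))


def is_suppressed(code, lines, index, base):
    """True if a warning with the same code and context is already in the base."""
    return context_key(code, lines, index) in base


old = ["int a = 1;", "if (a == a) {", "  run();", "}"]
base = {context_key("V501", old, 1)}

# Lines added at the top shift the line number, but the context is intact.
shifted = ["// header", "#include <x.h>"] + old
print(is_suppressed("V501", shifted, 3, base))   # True

# Editing a neighbouring line changes the context, so the warning comes back.
edited = ["int a = 2;", "if (a == a) {", "  run();", "}"]
print(is_suppressed("V501", edited, 1, base))    # False
```

Keying on the text of three lines rather than on a line number is what lets the base survive unrelated edits elsewhere in the file.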
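If robocopy is not at hand, the same copy step can be done with a short script. Below is a minimal sketch in Python, assuming the suppress files can be found with the "*.suppress" name pattern (an assumption, check the actual file names in your project folders); the function name copy_suppress_files and both folder arguments are hypothetical.

```python
import shutil
from pathlib import Path


def copy_suppress_files(storage_root, project_root):
    """Copy every *.suppress file from storage_root into project_root,
    recreating the same relative sub-folders. Returns the copied paths."""
    storage_root = Path(storage_root)
    project_root = Path(project_root)
    copied = []
    for src in sorted(storage_root.rglob("*.suppress")):
        # Keep the path relative to the storage root so the layout is preserved.
        dst = project_root / src.relative_to(storage_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

A build script would call this once, right after the clean checkout and before the analyzer is launched.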
If there are several developers working with the analyzer simultaneously, you can upload the .suppress-files into the repository. The question is how to synchronize these .suppress-files between the developers. On the one hand, it's no problem at all as you are working with XML files. On the other hand, you don't even need to synchronize (or modify in any way) these files at all. You just need to create them once when integrating the analyzer into the development process for the first time, and there's no need to touch them afterwards. And of course you should try to keep the number of messages at 0 in future.
Note. And how do you keep the number of messages at 0 if the analyzer generates false positives every now and then? There are a few methods to suppress single warnings in these cases. We will tell you about those a bit later.
So what is this mechanism for?
If some of you haven't got it yet, here is an absolutely concrete example. You can add all the messages into a base as described above (to get a 0 messages output). Then you run the analyzer nightly on a build server (learn how to set up the analyzer to launch on a build server) and it will generate new messages only - that is, messages for new code written by your team during the day. These messages are stored not only in the .plog format (the analyzer's report in the XML format) but as a text file as well, in the same folder. This text file can then be sent by email to the project developers via any suitable application, for example SendEmail.
In the morning, the programmers will see the analyzer-generated messages in their email boxes and start fixing the errors even without having to install PVS-Studio on their computers. But if you still want to open the log (.plog-file), you can find it on the build server. This trick can help you save a lot of money on PVS-Studio licenses.
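Any mail tool will do for this step. As one possible alternative to SendEmail, here is a hedged Python sketch that wraps the text report into an email message using only the standard library. The function name build_report_email, the subject line, and the UTF-8 encoding of the report are assumptions; the SMTP host in the comment is a placeholder.

```python
from email.message import EmailMessage
from pathlib import Path


def build_report_email(report_path, sender, recipients):
    """Wrap the analyzer's text report into an email message (not sent here)."""
    report = Path(report_path)
    # Assumption: the report is readable as UTF-8; bad bytes are replaced.
    body = report.read_text(encoding="utf-8", errors="replace")
    msg = EmailMessage()
    msg["Subject"] = f"PVS-Studio: new warnings ({report.name})"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(body if body.strip() else "No new analyzer warnings.")
    return msg


# Sending is left to the caller, for example:
#   import smtplib
#   with smtplib.SMTP("mail.example.com") as smtp:   # placeholder host
#       smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to try the script on the build server before pointing it at a real mail server.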
By the way, you can specify what messages (or, more exactly, messages of what severity levels) should get into the text report. It is done through the OutputLogFilter option found in the Specific Analyzer Settings tab in PVS-Studio's settings. We recommend that you include in this file General Analysis messages of Level 1 and Level 2.
A small note on the incremental analysis mode
However, we still recommend that developers first of all use PVS-Studio on their local machines in the incremental analysis mode. I mean that they should tick the "Analysis after Build (Modified files only)" checkbox in the PVS-Studio menu. With this option enabled, the analyzer keeps tracking your progress and automatically starts analyzing all the files you have compiled (that is, it tracks .obj-files being modified). If the analyzer doesn't find anything, you will never even notice it has been running. And if it does, you will see a pop-up message about errors found.
Incremental analysis supports the base of "uninteresting" messages. If you have such a base, then incremental analysis messages will only refer to freshly written code. And if you don't, the tool will generate messages for the entire file being analyzed.
The incremental analysis mode allows fixing errors right after they have been made, even before they get into the version control system. And as you know from McConnell's book, it's at this stage that a bug is cheapest to fix. Therefore, we do recommend that you use PVS-Studio both for daily runs on the server and in the incremental analysis mode on the programmers' computers, whenever possible.
How to fix messages in the base?
OK, now you've got static analysis integrated into your project and the analyzer outputs just a few messages per day which your team eliminates right away. Cool. But imagine now you've got a free week you can spend on fixing old bugs. How do you get to them? There are two ways.
- Or, if you have daily analysis set up on the build server, you simply need to go to the folder containing the last analysis' results and find the following files there:
- SolutionName.plog - log with new messages only;
- SolutionName.plog.txt - text log, the same as SolutionName.plog;
- SolutionName_WithSuppressedMessages.plog - all the messages including "uninteresting" ones.
It is this file SolutionName_WithSuppressedMessages.plog that you need to work with. Open it to see all the messages. This file will be large at first as there are lots of warnings. But if you fix them at least from time to time, it will eventually get small and maybe you will even be able to give it up completely (and, consequently, .suppress-files too).
There is one thing you should understand. If you have time and chance, you can get back to the old "uninteresting" messages and fix them at any time. We recommend doing this because those errors still remain errors and ought to be eliminated.
Doesn't it contradict the Mark as False Alarm feature?
PVS-Studio provides the Mark As False Alarm command which is used to mark messages as false positives. It implies adding a comment of the //-V501 pattern into the code line triggering the message. On coming across this comment, the analyzer won't generate the V501 warning for that line.
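To make the marker rule concrete, here is a small Python sketch of how a tool could check a source line for such a comment. It is only an illustration under stated assumptions, not the analyzer's real logic: the function name is_marked_false_alarm is hypothetical, and so is the handling of several markers on one line.

```python
import re

# Matches markers of the form //-V501 and captures the numeric part.
MARKER = re.compile(r"//-V(\d+)")


def is_marked_false_alarm(source_line, code):
    """True if source_line carries a //-V<number> marker for the given code."""
    wanted = code.upper().lstrip("V")
    return any(number == wanted for number in MARKER.findall(source_line))


line = "if (a == a) { run(); } //-V501"
print(is_marked_false_alarm(line, "V501"))   # True
print(is_marked_false_alarm(line, "V502"))   # False
```

Note that the marker silences one warning code on one line only; any other diagnostic on the same line would still be reported.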
The mechanism described in this article and the Mark As False Alarm mechanism in no way contradict each other. They are used for slightly different purposes. Mass suppression of uninteresting messages is used to mark messages in bulk when you are only starting to integrate the analyzer with your project, while Mark As False Alarm is used to tell the analyzer not to get angry with single code fragments.
Well, one could mark the whole code as False Alarm even before the new mechanism appeared, but people are usually a bit scared of making so many edits in the code. Besides, it's not clear how to work with the old bugs - remove everything marked as False Alarm? And what if there really were false alarms there?
However, it's wrong to use the mass suppression mechanism for suppressing single messages, too.
So, these are two separate features designed to solve different tasks. Don't mix them up.
So, we offer the following approach when integrating static analysis into a live project:
- Mark all the messages as "uninteresting" through the "Suppress Messages..." command.
- From now on, the analyzer will generate messages for new code only, starting from the next launch.
- If necessary, you can at any time get back to the old bugs hidden after the first launch.
It allows your project to start benefiting from static analysis immediately.