Be careful what you wish for… (false positive error reporting)
I wrote:
I gather that VB intends to publish both the statistically sound false-positive rates and the industry-standard “nice” rates. I’m hoping that they do so.
Well, my wish was granted! (you’ll need to register)
For each product they stated both “FP rate” and “FP of total mail corpus”. Once various special cases (see below) have been removed, FP rates of 1-3% were observed. Like any other actual measurement, these numbers are not necessarily representative of anything, and they may have been worsened by VB’s particular experimental setup; however, they are at the upper end of what BoxSentry is seeing each time we perform an analysis.
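To show why the two figures can look so different, here is a minimal sketch. It assumes that “FP rate” means false positives divided by the number of legitimate (ham) messages, and that “FP of total mail corpus” means false positives divided by all messages, ham and spam together. That reading is mine, not a definition taken from VB’s report, and the counts below are invented purely for illustration.

```python
def fp_rates(false_positives: int, ham_total: int, spam_total: int) -> tuple[float, float]:
    """Return (FP rate over ham, FP as a share of the whole corpus).

    Assumed definitions (not taken from VB's report):
      FP rate          = false positives / ham messages
      FP of mail corpus = false positives / (ham + spam messages)
    """
    if ham_total <= 0 or spam_total < 0:
        raise ValueError("ham_total must be positive and spam_total non-negative")
    fp_rate = false_positives / ham_total
    fp_of_corpus = false_positives / (ham_total + spam_total)
    return fp_rate, fp_of_corpus


# Hypothetical counts: 20 legitimate messages blocked out of 1,000 ham,
# in a corpus that also contains 9,000 spam messages.
fp_rate, fp_of_corpus = fp_rates(false_positives=20, ham_total=1000, spam_total=9000)
print(f"FP rate: {fp_rate:.2%}")
print(f"FP of total mail corpus: {fp_of_corpus:.2%}")
```

With these made-up counts the same 20 blocked messages come out as 2% of ham but only 0.2% of the whole corpus, which is why the second figure always looks “nicer” when spam dominates the mail stream.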
Before commenting on the specifics, I’d like to address what I see as an error in VB’s report. Anyone who has completed high-school physics will have dealt with measurement and the need to discard erroneous measurements before reaching conclusions. The most common reason for choosing to do this is the observation of results that lie so far outside the norm that they are clearly incorrect. (Scenario: you’re analysing data on [human] childbirth, and one of your subjects has a recorded birth weight of 6125 kg. This is clearly an error and should be removed from your data before calculating average birth weights.) This occurred with one of VB’s subjects. The report says:
a false positive rate of over 25% of all ham messages is almost certainly a sign of product misconfiguration
While I feel a certain degree of schadenfreude at this product’s results, I am inclined to agree with this assessment and therefore feel that VB’s results would have been more sound had this product been removed from the results entirely. The decision to keep the rules constant for any complete run of the tests is a good idea, but not to the extent of producing obviously incorrect results.
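The birth-weight scenario can be sketched in a few lines. This is only an illustration of discarding a clearly erroneous measurement before averaging: the plausible range and every weight other than the 6125 kg entry are hypothetical values I have made up, and a simple range check is just one of several ways to flag such an error.

```python
from statistics import mean

# Hypothetical plausibility bounds for a human birth weight, in kg.
PLAUSIBLE_KG = (0.3, 7.0)


def discard_implausible(weights_kg, bounds=PLAUSIBLE_KG):
    """Split measurements into (kept, dropped) using a simple range check."""
    low, high = bounds
    kept = [w for w in weights_kg if low <= w <= high]
    dropped = [w for w in weights_kg if not (low <= w <= high)]
    return kept, dropped


# Made-up sample; only the 6125 kg entry comes from the scenario above.
recorded = [3.2, 3.5, 2.9, 6125.0, 3.8]
kept, dropped = discard_implausible(recorded)

print("dropped:", dropped)
print(f"mean with the error included: {mean(recorded):.2f} kg")
print(f"mean after discarding it:     {mean(kept):.2f} kg")
```

Leaving the bad record in pushes the average above a tonne, while removing it gives roughly 3.35 kg for this sample. A single misconfigured product can distort a comparative test in the same way.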
I’d also be inclined to discount the ClamAV and SpamAssassin results because, in both cases, the spam-catch rate was too low to be useful: the false negatives (spam reaching the inbox) were a considerable multiple of the true negatives (legitimate email reaching the inbox).
This leaves just three products. Despite VB’s careful disclaimers, their results are largely what I’d expected, albeit at the high end of the range. Let us hope that these results aren’t enough to discourage vendor participation in future trials.