Lost in Reception

Virus Bulletin’s trial results and the meaning of “false positive”

Posted in Uncategorized by Roland Turner on March 25, 2009

Virus Bulletin is gearing up to perform regular tests of anti-spam systems as an addition to their existing coverage of anti-virus systems.
They’ve performed their first trial and published anonymised results that are interesting, to say the least. They report false positive rates for the submitted systems falling between 0.04% and 0.4%. Unfortunately:

Following industry standards, the false positive rate, or ‘FP rate’, is the ratio of the number of false positives relative to the total number of emails.

Correctly determining what numbers to use when calculating a false positive rate requires only a rudimentary knowledge of statistics: the rate is the number of false positives divided by the number of legitimate messages, not by the number of all messages. Early in spam’s history (when it was <1% of all email), getting the denominator wrong (“all messages” rather than “all legitimate messages”) made very little difference. Now that spam is closer to 95% of all email – and still climbing – this small error makes an enormous difference. At some stage vendors chose to report false-positive rates calculated over total message volume, rather than legitimate message volume, because it made the figures look a little better. Sadly, there’s no convenient time for a vendor to undo that practice, even though the difference now makes the published figures more or less meaningless.
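To make the effect of the denominator concrete, here is a minimal sketch. It assumes a filter that loses a fixed 1% of legitimate mail, and the three spam shares are illustrative values rather than measurements; the function name `reported_fp_rate` is my own. It relies only on the identity FP / total = (FP / ham) × (ham / total).

```python
def reported_fp_rate(fp_rate_over_ham: float, spam_fraction: float) -> float:
    """Return the FP rate over ALL mail, given the FP rate over legitimate
    mail (in percent) and the fraction of all mail that is spam."""
    ham_fraction = 1.0 - spam_fraction
    # FP / total = (FP / ham) * (ham / total)
    return fp_rate_over_ham * ham_fraction


# Hypothetical filter that loses 1% of legitimate mail, at three
# illustrative spam shares.
for spam_fraction in (0.01, 0.50, 0.95):
    reported = reported_fp_rate(1.0, spam_fraction)
    print(f"spam at {spam_fraction:.0%}: 1.00% of legitimate mail lost, "
          f"reported as {reported:.2f}%")
```

With spam at 1% of traffic the two figures are nearly identical; at 95% the same filter appears twenty times better than it is.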

According to VB:

During this period, the filters saw a total of 20,764 emails, 877 of which were classified as ham by VB’s employees (the recipients)

Changing the denominator from 20,764 to 877 multiplies the result by 20,764 / 877 ≈ 23.676. This is a pretty large correction; the “correct” figures for VB’s trial run are therefore roughly 0.95% – 9.5%.
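The correction is easy to check. The sketch below uses only the two counts VB reports and the two ends of their published range; the function and constant names are my own, and the individual systems’ false positive counts are not known to me, so the published percentages are simply rescaled.

```python
TOTAL_EMAILS = 20_764  # all messages seen by the filters, per VB
HAM_EMAILS = 877       # messages classified as ham by VB's employees


def rescale_fp_rate(rate_over_all_pct: float,
                    total: int = TOTAL_EMAILS,
                    ham: int = HAM_EMAILS) -> float:
    """Convert an FP rate computed over all mail (in percent) into
    an FP rate computed over legitimate mail only (in percent)."""
    return rate_over_all_pct * total / ham


factor = TOTAL_EMAILS / HAM_EMAILS
print(f"correction factor: {factor:.3f}")

# The two ends of the range VB published.
for published in (0.04, 0.4):
    corrected = rescale_fp_rate(published)
    print(f"published {published}% -> {corrected:.2f}% of legitimate mail")
```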

But does it matter? A number like 0.04% looks small enough to disregard, but once it’s clear that it means the loss of roughly 1 in 100 legitimate messages, a potential customer is likely to see it as rather more important. Pity the vendor with the 0.4% result; roughly 1 legitimate message in 10 will go astray!

I gather that VB intends to publish both the statistically sound false-positive rates and the industry-standard “nice” rates. I’m hoping that they do so.
