Lost in Reception

Terry Zink on calculating false-positive rates

Posted in Uncategorized by Roland Turner on July 21, 2010

One of my bugbears when talking to customers about message loss through spam filter false-positive errors is that most email security vendors understate their false-positive rates by about an order of magnitude. Terry has noticed this too:

The industry cheats quite a bit with their SLAs, the language is deliberately ambiguous.  If a company claims a 1 in 25,000 false positive SLA, what that means is that they permit 1 false positive per 25,000 messages.  This means that if the spam/ham ratio is 10:1, then in 25,000 messages there will be 2272 hams and 22,728 spam messages.  If one of the good messages is flagged as spam, then the good mail FP rate is 1/2272 = 0.04%, which is actually quite high.  Yet by saying that you permit 1 in 25,000 messages, and messages is not defined but assumed to be both spam + non-spam, vendors have permitted themselves a lot of leeway when calculating how accurate their product is against good mail… by a factor of 10.
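To make the arithmetic in the quote concrete, here is a minimal Python sketch. The function name `per_ham_fp_rate` is hypothetical. Like the quote, it assumes that the SLA's "messages" counts spam and ham together and that the spam:ham ratio is fixed. It rounds the ham count down, as the quoted example does.

```python
def per_ham_fp_rate(sla_total: int, spam_to_ham: float, false_positives: int = 1):
    """Hypothetical helper: re-express a vendor SLA of `false_positives` per
    `sla_total` messages as a false-positive rate over legitimate mail only.

    Assumes the SLA's message count includes both spam and ham, and that
    spam arrives at `spam_to_ham` times the rate of ham (i.e. ratio N:1).
    Returns (ham_count, spam_count, fp_rate_over_ham).
    """
    ham = int(sla_total / (spam_to_ham + 1))  # round down, as in the quoted example
    spam = sla_total - ham
    return ham, spam, false_positives / ham


ham, spam, real_rate = per_ham_fp_rate(25_000, 10)
apparent_rate = 1 / 25_000  # the rate the SLA appears to promise

print(ham, spam)                          # ham and spam counts in 25,000 messages
print(f"{apparent_rate:.4%} vs {real_rate:.4%}")
print(f"understated by {real_rate / apparent_rate:.1f}x")
```

With a 10:1 ratio this prints 2272 and 22728, then 0.0040% against 0.0440%, an understatement of about 11x. Under these assumptions the factor is simply the spam:ham ratio plus one, which is the order of magnitude the quote describes.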

Combining positive and negative approaches to security

Posted in Uncategorized by Roland Turner on July 14, 2010

Most of the loss of legitimate email through filtering arises because the mindset is about blocking bad messages. BoxSentry has been making the case for several years that combining this approach with the ability to recognise most legitimate messages will improve the accuracy of filtering and, in many cases, reduce the resource cost of doing so. I’ve just noticed a three-year-old white paper by F5 making a similar argument about security in general.

Murphy and Salchow describe one way of looking at combining “positive” (everything not permitted is prohibited) and “negative” (everything not prohibited is permitted) approaches to security and make an efficiency argument for choosing a combined approach rather than using one or the other exclusively. The argument makes some sense, but overlooks the fact that in environments where a great deal of what distinguishes good from bad behaviour is unknowable, even in principle, the quantitative efficiency argument quickly grows into a qualitative argument about what is and isn’t possible. That is, the “more efficient” combined model is really being compared with single-approach models in which the cost of providing equivalent protection would be infinite.

I agree with Murphy and Salchow’s argument that the approaches should be combined, but I believe that the argument for doing so is considerably stronger than what they offer.

(Stated another way, they end up understating the impact because they fail to acknowledge that the number of unknown behaviours – good or bad – is usually effectively infinite, meaning that their graph on page 4 is incorrect. Of course there are situations in which all possible behaviours can reliably be finitely enumerated – for example there are only 65536 TCP port numbers that a client can be attempting to connect to – and those are situations where classifying all possible behaviours as either bad or possibly good is straightforward, but these ceased being important places to look long before the paper was written in 2007.)
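The distinction between the two single approaches and a combined one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Murphy and Salchow's model or BoxSentry's implementation. The names `classify`, `positive_only`, `negative_only` and the port policy are hypothetical. The combined model reports items that match neither list as UNKNOWN instead of forcing a guess. Over a finite domain such as the 65536 TCP ports, every item can be listed explicitly, so UNKNOWN never arises.

```python
from enum import Enum


class Verdict(Enum):
    PERMIT = "permit"
    BLOCK = "block"
    UNKNOWN = "unknown"


def positive_only(item, known_good):
    # Positive model: everything not permitted is prohibited.
    return Verdict.PERMIT if item in known_good else Verdict.BLOCK


def negative_only(item, known_bad):
    # Negative model: everything not prohibited is permitted.
    return Verdict.BLOCK if item in known_bad else Verdict.PERMIT


def classify(item, known_good, known_bad):
    """Hypothetical combined model: recognise known-good, then known-bad.

    Anything matched by neither set is reported as UNKNOWN rather than
    being silently permitted or prohibited.
    """
    if item in known_good:
        return Verdict.PERMIT
    if item in known_bad:
        return Verdict.BLOCK
    return Verdict.UNKNOWN


# Finite, enumerable domain: TCP port numbers 0-65535.
ALL_PORTS = range(65536)
ALLOWED_PORTS = {25, 587}                       # hypothetical policy
BLOCKED_PORTS = set(ALL_PORTS) - ALLOWED_PORTS  # every other port, listed explicitly

# Every port receives a definite verdict.
port_verdicts = {p: classify(p, ALLOWED_PORTS, BLOCKED_PORTS) for p in ALL_PORTS}
```

For an open-ended domain such as sender addresses, the same call returns UNKNOWN for anything not yet seen. For example, `classify("new@example.org", {"partner@example.com"}, {"spammer@example.net"})` returns `Verdict.UNKNOWN`. That residual category is what neither single approach can handle without either blocking legitimate mail or admitting bad mail.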
