Machine learning technology to fight image-based spam
Proofpoint, Inc. announced the availability of a new version of its machine learning-based anti-spam technology that features enhanced protection against the latest image-based spam attacks. The Proofpoint Spam Detection module, powered by Proofpoint MLX machine learning technology, offers the industry’s highest effectiveness against hard-to-detect image-based spam, using a unique combination of machine learning algorithms and patent-pending image analysis techniques. Proofpoint MLX provides outstanding accuracy against all types of spam by examining more than 200,000 structural, reputation and content attributes using a combination of advanced statistical analysis engines, powered by patent-pending machine learning techniques. Traditional anti-spam solutions evaluate only a limited number of attributes and are unable to decisively classify spam, leading to low effectiveness and a high number of legitimate messages misclassified as spam (“false positives”).
The advanced methods used in Proofpoint MLX are superior both to simple statistical techniques such as Bayesian filtering and to signature- or fingerprint-based techniques, which are easily fooled by spammers. Proofpoint continues to be at the forefront in the battle against image-based spam, from both primary research and practical development perspectives. The latest generation of Proofpoint MLX machine learning technology applies both artificial intelligence and advanced image analysis methods to the problem of correctly identifying image-based spam. The new analysis techniques used by Proofpoint MLX to combat image-based and botnet-delivered spam include:
— Automated image extraction threshold analysis: Proofpoint’s backend systems automatically detect images being used in new spam campaigns by examining high frequency variations across images.
— Fuzzy matching for obfuscated images: Proofpoint MLX detects obfuscated spam images by using techniques that mimic the way human beings perceive spam. Proofpoint has developed a variety of highly effective but minimally compute-intensive techniques that “see through” obfuscation tricks used by today’s image spammers.
— Animated GIF spam detection: In one of the newest spammer tricks, an image-based spam payload is “hidden” in a single frame of an animated GIF. Proofpoint MLX analyzes the structural and temporal attributes of animated images to identify those with spam characteristics.
— Dynamic botnet protection: Proofpoint MLX Dynamic Reputation continually profiles IP-level connections and source IP addresses, monitoring for activity characteristic of botnets. When botnet IPs are detected, Proofpoint MLX automatically rejects image-based and other types of spam from those sources.
— Predominant correlation: Proofpoint uses a machine learning technique known as information gain to identify the very best attributes (or clues) to use in detecting spam versus valid mail. From the millions of available attributes, information gain selects those that are most valuable. Proofpoint has taken this technique a step further with the introduction of predominant correlation-based attribute selection. This new technique allows Proofpoint MLX to identify attributes that are redundant and automatically remove them, ensuring that only the most effective indicators of spam are considered. This intelligent approach to attribute analysis maximizes effectiveness (the system’s ability to accurately detect spam) and performance (the system’s ability to rapidly process messages) at the same time.
— URL analysis techniques: Proofpoint’s backend systems perform statistical analyses of URLs from Proofpoint honeypots and customer sites, coupled with correlative analysis of URLs and the IP addresses hosting them. By using advanced network analysis techniques, Proofpoint MLX can determine if a sending IP address is associated with a known malicious URL or suspicious ISP and use these associations as a strong indicator of spam.
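The attribute selection idea described under "Predominant correlation" above can be illustrated with a small sketch. This is not Proofpoint's implementation, which the announcement does not describe. It is a toy example loosely in the spirit of published correlation-based filters: attributes are ranked by how much they reveal about the spam label, and an attribute is dropped when an already-kept attribute predicts it at least as well as it predicts the label. The attribute names, the sample data, the `min_relevance` threshold and the use of symmetrical uncertainty as the correlation measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def symmetrical_uncertainty(a, b):
    """Information gain normalized to the range [0, 1]."""
    denom = entropy(a) + entropy(b)
    if denom == 0:
        return 0.0
    return 2.0 * information_gain(a, b) / denom


def select_attributes(features, labels, min_relevance=0.05):
    """Keep relevant attributes and drop those made redundant by a kept one.

    `features` maps a (hypothetical) attribute name to its per-message values.
    """
    relevance = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    # Most informative attributes first; weak ones are discarded outright.
    ranked = sorted(
        (name for name in features if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(features[name], features[other])
            >= relevance[name]
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept


# Toy data: 1 = spam, 0 = valid mail. All values are invented.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
FEATURES = {
    "has_image": [1, 1, 1, 0, 0, 0, 0, 0],
    "image_only_body": [1, 1, 1, 0, 0, 0, 0, 0],       # duplicates has_image
    "has_unsubscribe_link": [1, 0, 1, 0, 1, 0, 1, 0],  # carries no signal
    "from_new_ip": [1, 1, 0, 1, 0, 1, 0, 0],

}

if __name__ == "__main__":
    print(select_attributes(FEATURES, LABELS))
```

On this toy data the sketch keeps `has_image` and `from_new_ip`. It drops `image_only_body` because that attribute is fully predicted by `has_image`, and it drops `has_unsubscribe_link` because its information gain is zero. Fewer attributes mean less work per message, which is the effectiveness and performance trade-off the item describes.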