The rise of non-English language spear phishing emails
Business email compromise (BEC) threats are one of the many tried-and-tested tactics cybercriminals use to target their victims. These tend to be brief messages of only a few lines, with no URLs, attached files or other elements that security systems can scan and identify as malicious.
No problem, you might think, because AI algorithms can now easily spot BEC threats. Natural Language Processing (NLP), for instance, is one AI technique that can be brought to bear effectively against BEC. Defenders can use NLP algorithms to detect tell-tale signs of a threat, such as requests for wire transfers, invoice payments or gift cards. These algorithms can even pick up the urgent language cybercriminals use to pressure people into handing over cash, credentials or anything else they’re after.
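To make the idea concrete, here is a minimal rule-based sketch of the kind of signals described above: a payment-style request combined with urgent wording. The indicator lists and the `bec_indicators` function are hypothetical and purely illustrative; real NLP-based products learn these signals from large labelled datasets rather than hard-coding a handful of patterns.

```python
import re

# Hypothetical indicator lists, for illustration only.
REQUEST_PATTERNS = [r"\bwire transfer\b", r"\binvoice\b", r"\bgift cards?\b"]
URGENCY_PATTERNS = [r"\burgent(ly)?\b", r"\bimmediately\b", r"\btoday\b", r"\bconfidential\b"]


def bec_indicators(body: str) -> dict:
    """Report which request and urgency patterns appear in an email body."""
    text = body.lower()
    requests = [p for p in REQUEST_PATTERNS if re.search(p, text)]
    urgency = [p for p in URGENCY_PATTERNS if re.search(p, text)]
    return {
        "requests": requests,
        "urgency": urgency,
        # Flag only when a payment-style request is paired with pressure to act fast.
        "suspicious": bool(requests) and bool(urgency),
    }


if __name__ == "__main__":
    msg = "Hi, I need you to buy gift cards for a client today. Keep this confidential."
    print(bec_indicators(msg)["suspicious"])  # True
```

Requiring both a request and an urgency cue is one simple way to cut false positives: an ordinary invoice email, or a routine "today" in a meeting note, is not flagged on its own.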
The language of cybercrime
In the past, most BEC emails were written in English, which meant defense systems could be tuned to recognize red-flag words and phrases in this widely used international language. Recently, however, our researchers have observed an increase in the number of BEC emails written in Italian, Spanish, German and Slovenian. A simple switch to another language poses a major challenge for AI systems designed as English-first protection, giving criminals a way to slip past security defenses with relatively rudimentary tactics.
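A minimal sketch of why a language switch works: a filter that only consults an English lexicon finds nothing in an Italian BEC message, while the same filter with an Italian lexicon added catches it. The lexicons and the `flagged_terms` function are hypothetical; the Italian terms are ordinary words ("bonifico" means bank transfer, "urgente" means urgent), not any vendor's rule set.

```python
import re

# Hypothetical per-language lexicons, for illustration only.
LEXICONS = {
    "en": {"wire transfer", "bank transfer", "urgent", "password"},
    "it": {"bonifico", "urgente", "password"},
}


def flagged_terms(body: str, languages: list[str]) -> set[str]:
    """Return lexicon terms from the given languages that occur in the body."""
    text = body.lower()
    hits = set()
    for lang in languages:
        for term in LEXICONS.get(lang, set()):
            # Whole-word match, so English "urgent" does not fire inside Italian "urgente".
            if re.search(rf"\b{re.escape(term)}\b", text):
                hits.add(term)
    return hits


italian_bec = "Ciao, effettua subito un bonifico al fornitore. È urgente."
print(flagged_terms(italian_bec, ["en"]))        # set()
print(flagged_terms(italian_bec, ["en", "it"]))  # both Italian terms are found
```

The English-only run returns an empty set even though the message asks for an urgent bank transfer; the detection logic is unchanged, only the vocabulary it knows about differs.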
We have also observed an overall rise in the number of BEC emails in recent months. Statistics gathered from Microsoft 365 users in Europe and the US show that 950 BEC emails were detected in the US in December 2020 and 1,100 in Europe. In January 2021, those figures rose to 1,250 and 1,500 respectively.
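Expressed as month-over-month growth, the December and January figures quoted above work out as follows. This is simple arithmetic on the stated counts; the `pct_increase` helper is just a name chosen for the example.

```python
# Monthly BEC detections among Microsoft 365 users, as quoted in the text.
detections = {
    "US": {"December": 950, "January": 1250},
    "Europe": {"December": 1100, "January": 1500},
}


def pct_increase(before: int, after: int) -> float:
    """Percentage change from one month to the next."""
    return (after - before) / before * 100


for region, counts in detections.items():
    change = pct_increase(counts["December"], counts["January"])
    print(f"{region}: +{change:.1f}%")
# US: +31.6%
# Europe: +36.4%
```

In other words, detections grew by roughly a third in both regions within a single month, with Europe growing slightly faster.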
The problem, and its solution, lies in how AI algorithms are trained. Security teams can teach AI to recognize trigger words that indicate a phishing attempt or any other technique used to fool human targets. It’s straightforward to flag any email containing words such as “password” or “bank transfer” and trigger a warning telling the recipient or security staff that the message is potentially dangerous. Likewise, if an email manages to sneak past the protection layers, employees can be trained not to act on emails that are clearly sent with malicious intent.
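To illustrate what "teaching" an algorithm trigger words can mean, here is a minimal sketch of a multinomial naive Bayes classifier, a classic text-classification technique swapped in for illustration; it is not any particular vendor's model. The six training sentences are hypothetical toy data, and the `train` and `classify` functions are names chosen for this example.

```python
import math
from collections import Counter

# Hypothetical toy training set; real systems learn from far larger labelled corpora.
TRAINING = [
    ("please confirm your password via this link", "phish"),
    ("urgent bank transfer needed before noon", "phish"),
    ("send the wire transfer today and keep it quiet", "phish"),
    ("agenda for tomorrow's project meeting", "ham"),
    ("lunch menu for the team offsite", "ham"),
    ("quarterly report draft attached for review", "ham"),
]


def train(samples):
    """Count word frequencies per label for a multinomial naive Bayes model."""
    word_counts = {"phish": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in samples:
        word_counts[label].update(text.lower().split())
        doc_counts[label] += 1
    vocab = set().union(*word_counts.values())
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the given text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior
        for word in text.lower().split():
            # Laplace smoothing so an unseen word does not zero out the probability.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("urgent password reset", model))        # phish
print(classify("project meeting agenda", model))       # ham
print(classify("effettua subito un bonifico", model))  # ham
```

The last line shows the weakness discussed in this article: every word of the Italian request is absent from the English-only training data, so the model has no phishing evidence to go on and falls back to "ham".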
If an attacker switches to, say, Slovenian, they can bypass AI systems built to analyze English-language emails. A Slovenian email that sneaks past the defenses won’t necessarily lead to disaster, because staff will hopefully not respond to it. The real problem arises if the attacker’s address is whitelisted. If an email gets past an organization’s security perimeter and the sender’s address is flagged as a safe account, the threat level rises dramatically.
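A minimal sketch of why a whitelisted (allowlisted) sender raises the stakes, assuming a gateway that lets allowlisted senders skip content scanning. Both gateway functions and the `content_looks_malicious` check are hypothetical stand-ins, not the behavior of any specific product.

```python
from dataclasses import dataclass


@dataclass
class Email:
    sender: str
    body: str


def content_looks_malicious(body: str) -> bool:
    """Hypothetical stand-in for a real content scanner."""
    text = body.lower()
    return "bonifico" in text or "wire transfer" in text


def naive_gateway(email: Email, allowlist: set[str]) -> str:
    # Allowlisted senders skip content scanning entirely: the risky pattern.
    if email.sender in allowlist:
        return "deliver"
    return "quarantine" if content_looks_malicious(email.body) else "deliver"


def safer_gateway(email: Email, allowlist: set[str]) -> str:
    # Content is scanned for every sender; the allowlist only softens the action.
    if content_looks_malicious(email.body):
        return "warn" if email.sender in allowlist else "quarantine"
    return "deliver"


allowlist = {"partner@example.com"}
msg = Email("partner@example.com", "Effettua subito un bonifico.")
print(naive_gateway(msg, allowlist))  # deliver
print(safer_gateway(msg, allowlist))  # warn
```

Once an attacker's address is on the allowlist, the naive gateway delivers whatever they send; keeping content checks in place for trusted senders at least surfaces a warning.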
Once a non-English message slips through email security defenses, a smart attacker can use more advanced tactics to fool the target. We’re now seeing criminals use far more polished language than the comical nonsense of old, avoiding grammatical errors and phrasing that would obviously arouse suspicion.
Sometimes they engage victims in conversation to lower their guard and increase the chances of being whitelisted. Using simple open-source intelligence (OSINT) techniques, they can find ways to get the target to drop their defenses and push them into taking action, allowing the attacker to achieve their criminal objectives.
How to beat the new threat
To protect against non-English BEC threats, organizations need to work with their email security vendors. Unfortunately, while these vendors typically have an abundance of English-language data with which to train their algorithms, their datasets in other languages are much smaller, particularly for less widely spoken languages. A small dataset carries a big risk: it is likely to contain gaps and omissions that give canny cybercriminals opportunities to bypass security mechanisms.
To strengthen their AI algorithms, vendors must expand their datasets and invest significant resources in updating their detection engines. This takes time and constant attention. Whenever a new threat is detected, it should be reported so that email security vendors can enrich their datasets and provide better protection.
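One way to picture the coverage gap and the reporting loop is a per-language count of labelled samples, with reported threats added as they come in. The `LabelledCorpus` class, the 1,000-sample threshold and the starting counts in the usage example are all hypothetical placeholders, not real vendor figures.

```python
from collections import Counter

MIN_SAMPLES = 1000  # hypothetical threshold, not an industry figure


class LabelledCorpus:
    """Toy per-language store of labelled BEC samples (illustrative only)."""

    def __init__(self, initial_counts=None):
        self.counts = Counter(initial_counts or {})
        self.samples = []

    def report(self, language: str, text: str) -> None:
        """Record a newly reported threat so it can feed the next training run."""
        self.samples.append((language, text))
        self.counts[language] += 1

    def under_covered(self, languages, minimum=MIN_SAMPLES):
        """Languages below the minimum sample count, least covered first."""
        return sorted(
            (lang for lang in languages if self.counts[lang] < minimum),
            key=lambda lang: self.counts[lang],
        )


# Hypothetical starting counts, for illustration only.
corpus = LabelledCorpus({"en": 5000, "de": 1200, "it": 300})
print(corpus.under_covered(["en", "de", "it", "sl"]))  # ['sl', 'it']
corpus.report("sl", "example reported message")
```

Each report nudges an under-covered language toward the threshold, which is why prompt reporting matters most for the languages where a vendor's data is thinnest.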
When organizations choose a vendor, size and global footprint matter. If an email security firm operates in territories around the world that use many different languages, its datasets are likely to be richer and more useful. After all, a global problem requires a global solution. To tackle non-English email threats, organizations need to think big or risk big consequences.