Identifying deceptive behavior in user-generated content
In this interview, JT Buser, Manager of Authenticity and Fraud at Bazaarvoice, talks about challenges involved in identifying deceptive behavior in user-generated content as well as interesting techniques he’s seen scammers use.
Bazaarvoice is a network that connects brands and retailers to consumers. Each month, more than 500 million people view and share opinions, questions, and experiences about tens of millions of products in the Bazaarvoice network.
What are the challenges involved in identifying deceptive behavior in user-generated content?
Like most other anti-fraud efforts, one of the main challenges is volume. Bazaarvoice sees more than 500 million monthly unique users submitting or reading the content on our network, and that equates to a vast amount of content being generated.
After you address the challenge of volume, you have to start with an understanding of the types of practices that are generally associated with deceptive behavior in user-generated content. On the one hand, you have practices that are not inherently fraudulent but do compromise the authenticity of the content, such as editing submitted content to correct misspellings. Those can be addressed by appropriate policies and controls on who has access to the content once it has been submitted. Then you have behavior that is fairly objective and black-and-white, and comparatively easy to identify with the right data: automated submissions by bots, programs, or scripts, as well as relatively unsophisticated activity such as people creating fake usernames, businesses posting inaccurate reviews or false assertions intended to demote a competitor, or businesses that self-promote.
A bigger challenge is identifying instances of solicitation, whether by a reviewer or by a business or its agents. Intercepting those efforts requires a much broader view of the activity coming onto and taking place within the network, and we have taken some unique steps to monitor this type of activity, including pattern recognition strategies and “undercover” work.
How big of a problem are false online reviews? What are some of the consequences?
There is a wide variety of practices that affect the trustworthiness of platforms that capture and display user-generated content, such as ratings and reviews. It’s not hard to find well-publicized instances of people or small groups within businesses who are creating fake usernames, posting inaccurate reviews, or spreading false assertions about products and services. However, when you consider the instances of fraud in the context of the vast quantity of user-generated content being submitted or posted across the Internet every day, you start to see that fraudulent activity is more the exception than the rule.
That said, you have to remember that we’re dealing with content that embodies someone’s opinion about a product, service, or company. Consumers make purchase decisions based on that content, and businesses use it to represent their brand. When you view it that way, you realize that any amount of fake content has consequences. Authenticity and trust are among the few constants that help businesses create lasting, valuable relationships with consumers. And consumers today are far more informed, aware, and empowered; they will hold businesses accountable for anything that diminishes their trust. That means the reviews we collect and display on behalf of our clients only have value if they are authentic.
Because of that, our mission as a company comes with a responsibility to safeguard authenticity. While no system is bulletproof, our mix of anti-fraud technologies and human analysis gives us a high degree of confidence that we are identifying all inauthentic content submitted to our platform. On the whole, the amount of fraud we encounter on our network is quite small (less than 1% of all content submitted, all of which is prohibited from being displayed), but one of the things we clearly see is that the prevalence of deceptive behavior varies by market or product type (travel and tourism vs. apparel vs. CPG, etc.) and by geography (global or local).
What are some of the most interesting techniques you’ve seen scammers use?
Nothing surprises me anymore. When we first started looking at this problem six years ago, it was pretty cut and dried. Most fraudulent content came from a single individual or a small group of individuals who had a direct relationship with the product or service, and they were not very sophisticated: they would come in with brute force from easily identifiable locations.
Just like any other fraud, as we began to block this type of activity, the scammers got more creative in the ways they tried to circumvent our system. We still see brute-force attempts, but now we also get individuals who are much more sophisticated in attempting to evade detection and hide their identity. In some cases, individuals are developing their own skills; in others, those who want to deceive are seeking out people who are skilled in the art of evading detection. With one business, we were able to pinpoint the exact day on which they hired someone to be a professional scammer for them. They began with brute force from an easily identifiable location and over time began using low-level evasion techniques. Then, in a single day, their activity shifted to extremely sophisticated techniques.
We also deal with the solicitation of reviews, where scammers pay or teach apparently legitimate individuals to conduct fraud on their behalf. Some of these attempts go several layers deep. For example, we identified one company that hired a representative who, in turn, hired an agency that hired local students and taught them how to run an astroturfing campaign.
Of course, you have really determined scammers who use a combination of techniques. In one instance, we intercepted a company submitting fake reviews via brute force from their own corporate headquarters. Over time, as we continued to block the content, we saw them shift to submitting fraudulent content using people in the Philippines and shift again to using individuals in China before eventually submitting fake content once again from their corporate headquarters.
Each time they shifted locations, they also changed other elements of the content, such as length and grammar. Later, we identified suspicious activity associated with the same company, except this time we traced the content to a post on Craigslist that was soliciting reviews for the brand for $5 apiece. I won’t detail precisely how we were able to monitor this string of activity but, suffice it to say, we continuously blocked this content.
How does Bazaarvoice fight the scammers?
We approach authenticity using a combination of policies and technologies. At a high level, the major components of our authenticity system include:
1. Device identification and reputation technologies that allow us to identify the submission source of all reviews (from any Internet-enabled device, including PCs, smartphones, tablets, laptops, or consoles) and expose hidden relationships, while still protecting users’ privacy.
2. Advanced algorithms that analyze source data, review text, and multiple other attributes to help identify patterns suggestive of fraudulent activity.
3. Business rules that flag submissions based on thresholds associated with evidence, profiles, velocity, geolocation, or other anomalies indicative of fraudulent activity.
4. System-wide analysis capabilities that aggregate large volumes of disparate data from the entire Bazaarvoice network to enable real-time decision screening via a fraud scoring engine.
5. Verified purchase technologies that match the source of submitted content to a specific transaction.
6. Dedicated fraud analysts who monitor suspicious reviews and make a final determination regarding their authenticity.
7. Client audits that help ensure the activities of our own clients do not violate the terms of our Authenticity Policy (e.g., attempts to “un-reject” identified fraudulent content) and which may be suggestive of other efforts to circumvent the system.
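The interview describes these components only at a high level. The Python below is a minimal sketch of how business rules (component 3), a fraud score (component 4), a verified-purchase signal (component 5), and analyst review (component 6) might fit together. It is only an illustration. `Submission`, `FraudScorer`, and every threshold and weight are hypothetical assumptions, not Bazaarvoice's actual engine.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical fields; a real system would carry far more signals.
    device_id: str            # device fingerprint from identification technology
    country: str              # geolocated country code of the submission
    timestamp: float          # seconds since epoch
    verified_purchase: bool   # matched to a specific transaction


class FraudScorer:
    """Illustrative rule-based scorer; thresholds and weights are assumptions."""

    def __init__(self, velocity_window=3600.0, velocity_limit=3,
                 expected_countries=frozenset({"US"}), review_threshold=0.5):
        self.velocity_window = velocity_window      # seconds
        self.velocity_limit = velocity_limit        # max submissions per window
        self.expected_countries = expected_countries
        self.review_threshold = review_threshold
        self._history = defaultdict(deque)          # device_id -> recent timestamps

    def score(self, sub):
        score, reasons = 0.0, []

        # Velocity rule: too many submissions from one device inside the window.
        # Assumes submissions arrive in timestamp order.
        times = self._history[sub.device_id]
        while times and sub.timestamp - times[0] > self.velocity_window:
            times.popleft()
        times.append(sub.timestamp)
        if len(times) > self.velocity_limit:
            score += 0.5
            reasons.append("velocity")

        # Geolocation rule: submission from outside the expected market.
        if sub.country not in self.expected_countries:
            score += 0.3
            reasons.append("geolocation")

        # A verified purchase is evidence in the other direction.
        if sub.verified_purchase:
            score -= 0.4
            reasons.append("verified_purchase")

        # Clamp the combined score to [0, 1].
        return min(max(score, 0.0), 1.0), reasons

    def route(self, sub):
        # High scores go to a human analyst for the final call, not auto-rejection.
        score, reasons = self.score(sub)
        decision = "analyst_review" if score >= self.review_threshold else "publish"
        return decision, score, reasons


if __name__ == "__main__":
    scorer = FraudScorer()
    for i in range(4):
        print(scorer.route(Submission("dev-1", "US", 100.0 * i, False)))
```

In this sketch, the fourth submission from `dev-1` within the hour exceeds the velocity limit and is routed to analyst review. A single weak signal does not hold content back. For example, a geolocation hit alone scores 0.3 and is published, while geolocation combined with velocity crosses the threshold. This loosely mirrors the idea of one tactic's weakness being offset by another's strength.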
These components work together to prevent a single point of failure by allowing the weakness of one tactic to be countered by the strength of another. As a result, we effectively intercept and combat the variety of deceptive practices that affect the trustworthiness of online reviews, including but not limited to:
- Spam, bots, and repetitive script scenarios
- Individuals utilizing various evasion techniques to conduct both small and large-scale fraud efforts
- Supplier solicited fraud (i.e., attempts by individuals to solicit payment in exchange for writing fake reviews via sites such as Fiverr, Craigslist, etc.)
- Agent solicited fraud (i.e., attempts by an agency to procure fake reviews on behalf of a client)
- Employee solicited fraud (i.e., attempts by businesses or their employees to solicit reviews from friends, family, etc.)
- Compliance concerns associated with employees or agents authoring reviews
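For repetitive script scenarios, one common, generic technique is near-duplicate text detection. The interview does not attribute this technique to Bazaarvoice. Each review is split into overlapping character shingles, and the resulting sets are compared with Jaccard similarity. The sketch below is only an illustration. The `shingles` and `near_duplicates` helpers, the 5-character shingle size, and the 0.6 threshold are all hypothetical choices.

```python
import re
from itertools import combinations


def shingles(text, k=5):
    """Return the set of k-character shingles of lowercased, whitespace-normalized text."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    if len(normalized) < k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(reviews, threshold=0.6, k=5):
    """Return pairs of review ids whose shingle sets have Jaccard similarity >= threshold."""
    sets = {rid: shingles(text, k) for rid, text in reviews.items()}
    pairs = []
    # Compare every pair once; fine for a small batch, quadratic in general.
    for (id_a, s_a), (id_b, s_b) in combinations(sets.items(), 2):
        if jaccard(s_a, s_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


if __name__ == "__main__":
    batch = {
        "r1": "Great blender, works perfectly and is easy to clean!",
        "r2": "Great blender, works perfectly and is easy to clean!!",
        "r3": "The motor burned out after two weeks of light use.",
    }
    print(near_duplicates(batch))
```

The pairwise loop compares every pair of reviews, so its cost grows quadratically. At network scale, a technique such as MinHash with locality-sensitive hashing is typically used to find candidate pairs without comparing every pair.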