Can You Trust Your Data Quality? How to Protect Your Insights from Bots and Fraud
The robots are getting smarter, the hackers and fraudsters more sophisticated. And they’re coming for you(r data).
No, this isn’t the opening of an episode of 60 Minutes. But when data is the backbone of decision-making in business, especially around customer and brand experiences, data quality has never been more vulnerable – or more critical.
And the impact is much bigger than some executives may realize.
With the majority of today’s quantitative and even qualitative research conducted online, up to 30% of the data collected gets thrown out because of quality concerns and panel fraud. Even a single compromised dataset can mean a not-so-small chunk of your data investment is wasted.
So, what can companies do to fight back?
The DRG’s Senior Director of Research Services and a data quality expert, Rob Skog, dives into the evolving risks companies are facing today, sharing insights and best practices for staying a step ahead in the battle for data integrity. Read on to find out how to protect your data quality, prevent fraud, and maintain the integrity of your insights – while avoiding the (avoidable) hits to your bottom line.
The DRG: What are some common data quality concerns companies have today when ensuring the reliability and trustworthiness of data and insights?
Rob Skog:
Data trustworthiness and reliability is a huge concern for a lot of companies. We’ve always heard the horror stories of bots infiltrating customer surveys and totally undermining the data. But now threats are so much more prevalent, with new technology and AI making it easier for bad actors to do what they do. Even in the last year alone, bots have gotten so much smarter. And the thing is, it often starts with a human understanding how to get through a screener and then a bot taking over after that.
Companies also need to make sure they’re receiving feedback from real people who are taking the time to read and truthfully respond to their survey questions. Verifying responses has become a bigger challenge for research teams, and then there’s making sure the responses come from unique individuals. Because, unfortunately, many people will become members of multiple panels and take the same survey multiple times to collect more incentives.
This is especially true for studies with high incentives for participating, usually in the B2B space. Bots and malicious survey takers have a huge impact on the integrity of the data. Without constant monitoring, it’s safe to assume that 20% or more of a data set would not be reliable. That’s a big deal, especially for those who don’t have the resources or an insights partner on deck to help prevent it.
The DRG: Wow, there’s a lot companies have to watch out for, but has it always been this way? How has this changed in recent years?
RS:
The issues really increased shortly after Covid started. More people were at home looking for ways to make money – like taking incentivized surveys. There was also an increase in things like “survey farms” and “bot farms.” These are organized operations that aim to complete the same survey as many times as possible, or as many different surveys as possible, to collect incentives for completion. They caught a lot of panel groups off guard at the beginning of Covid and have remained a problem ever since. Panels had to change the way they recruited their panelists, and the best ones now require panelists to go through a waiting period and even a triple opt-in. That’s why it’s so important to vet your panel partners or ask your research vendors about their vetting process.
The DRG: We really did see a switch in the last few years. How do you think this will change the state of data quality in the next few?
RS:
As AI and machine learning advance, bots will become more sophisticated and better at mimicking human behavior, which will make them harder to detect and filter out. To combat this, companies will need to update their bot-detection methods more frequently to keep up.
And as companies experience more data integrity issues, they’ll demand greater transparency and evidence of data quality measures from their insights partners. Which isn’t a bad thing – to be honest, transparency and quality should really be table stakes for a good partner.
The DRG: With that in mind, what should an insights partner be doing to validate the authenticity of survey responses?
RS:
This may seem obvious, but make sure you are working with reputable, vetted panel partners. With this one, you usually get what you pay for. We also recommend incorporating things like reCAPTCHA, honeypots, and trap questions to cover your bases. These can all be easily implemented and give you that extra layer of security. And that layer is definitely needed.
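For illustration, here is a minimal sketch of how a honeypot field and a trap question might be checked once responses come in. The field names (honeypot_field, trap_answer) and the expected trap answer are hypothetical assumptions, not the tooling of any specific survey platform or of The DRG.

```python
# Illustrative sketch only: screening raw survey responses against a honeypot
# field and a trap question. Field names and values are hypothetical.

def passes_basic_checks(response: dict) -> bool:
    """Return False if a response trips the honeypot or fails the trap question."""
    # Honeypot: an input hidden from human respondents (e.g., via CSS).
    # Bots that auto-fill every field tend to populate it.
    if response.get("honeypot_field"):
        return False

    # Trap question: e.g., "Please select 'Strongly disagree' for this item."
    # Any other answer suggests the respondent isn't reading carefully.
    if response.get("trap_answer") != "Strongly disagree":
        return False

    return True


# Example: the second response fills the honeypot and misses the trap question.
responses = [
    {"id": "r1", "honeypot_field": "", "trap_answer": "Strongly disagree"},
    {"id": "r2", "honeypot_field": "https://spam.example", "trap_answer": "Agree"},
]
print([r["id"] for r in responses if not passes_basic_checks(r)])  # ['r2']
```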
The DRG: What’s a best-in-class approach to ensuring data quality and integrity?
RS:
Well, it starts with a mix of automated tools and human validation to ensure that data is of high quality and integrity. For example, we regularly use bot-detection and other automated tools to alert us to data “red flags” like speeders, straight-liners, and suspicious response patterns. Then, we go in and verify them.
Something else to look out for is several nearly identical verbatims or comments across respondents, which could be an indication that a survey was infiltrated. And if you identify bad respondents, always replace them with new ones that completely pass your quality reviews.
Beware of suspicious “red flag” response patterns (a rough detection sketch follows the list):
- Large numbers of survey completions at identical times (especially late at night or in the early morning hours)
- Sudden spikes in responses from traditionally hard-to-reach groups
- Multiple identical or near-identical verbatim responses
- Unnaturally formal or dictionary-like response patterns
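To make those red flags concrete, here is a rough, simplified sketch of the kind of automated checks an analyst might run over a batch of completed responses. The record fields (seconds_to_complete, likert_answers, completed_at, verbatim) and the thresholds are assumptions for illustration, not The DRG’s actual detection rules.

```python
# Illustrative sketch only: flag speeders, straight-liners, timestamp clusters,
# and near-identical verbatims in a batch of survey responses.
from collections import Counter
from difflib import SequenceMatcher


def flag_suspicious_responses(responses, min_seconds=120, similarity=0.9):
    """Map respondent id -> list of red flags found in this batch."""
    flags = {}

    def add(resp_id, reason):
        flags.setdefault(resp_id, []).append(reason)

    # Speeders: completed implausibly fast for the survey length.
    for r in responses:
        if r["seconds_to_complete"] < min_seconds:
            add(r["id"], "speeder")

    # Straight-liners: the same answer chosen on every grid/Likert item.
    for r in responses:
        if len(set(r["likert_answers"])) == 1:
            add(r["id"], "straight-liner")

    # Clusters of completions at identical timestamps.
    ts_counts = Counter(r["completed_at"] for r in responses)
    for r in responses:
        if ts_counts[r["completed_at"]] > 1:
            add(r["id"], "timestamp cluster")

    # Near-identical open-ended verbatims across different respondents.
    for i, a in enumerate(responses):
        for b in responses[i + 1:]:
            if SequenceMatcher(None, a["verbatim"], b["verbatim"]).ratio() >= similarity:
                add(a["id"], "duplicate verbatim")
                add(b["id"], "duplicate verbatim")

    return flags
```

In keeping with the mix of automated tools and human validation described above, flagged IDs would go to a human reviewer for verification rather than being dropped automatically.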
Beyond all of that, I’d recommend working closely with panel partners to make sure they’re also taking measures to combat these kinds of issues.
The DRG: Last question. When it comes down to it, what’s the real impact of all this? Why should companies care about data quality right now?
RS:
Well, your data isn’t real unless real respondents are taking your surveys.
If the data collected is compromised by bots, fraud, or duplicates, the results will be inaccurate and pretty much useless. That leads to flawed decisions, wasted resources and money, missed opportunities, and even damage to a company’s brand and reputation.
Analytics and insights are how businesses shape their strategies, their products, and their marketing campaigns. When the data is accurate, the decisions built on it can succeed. When the data is compromised, the negative consequences ripple across the whole business.



