To a business owner, there’s nothing more frustrating than running a plethora of ad creative tests on TikTok or Facebook without getting any conclusive feedback about which ad is performing the best.
Nothing, that is, except getting the wrong conclusive feedback.
For instance, look at this very beautiful ad creative analysis from a reputable app.
Can you see why this analysis might be misleading? If not, you’re not alone.
Note: this screenshot is from Motion, and I want to call out that there’s nothing wrong with Motion App. It’s a fantastic tool, and I chose it for this screenshot because it’s probably the best-looking tool out there for analyzing creative. BUT, Motion App, and all of the other ad creative tools, can’t “think for you”. They aren’t naturally looking for Simpson’s Paradox and lurking variables (all stuff that I’ll explain in more detail below).
Digging into the data
Let’s take a look at this from a different angle. If you were presented with this data, which ad would you say is the winner?
If you picked Ad B, you’d be in the majority.
Most people would say that Ad B is clearly the winner. Both ads have plenty of spend and plenty of purchases behind them, but Ad B’s CPA is lower (better) than Ad A’s, and Ad B’s ROAS is higher (better) than Ad A’s.
It seems like this couldn’t be any more obvious — right?
Here comes Simpson’s Paradox…
Let’s expand these cells so we can see what’s going on underneath them. But first, a super nerd alert.
🚨 Super Nerd Alert
If you’re a super nerd (like me) and you want more information about the history of Simpson’s Paradox, this is for you… if not, skip to the next section 🙂
According to a paper written by Judea Pearl at UCLA…
“Simpson’s paradox refers to a phenomena whereby the association between a pair of variables (X, Y ) reverses sign upon conditioning of a third variable, Z, regardless of the value taken by Z. If we partition the data into subpopulations, each representing a specific value of the third variable, the phenomena appears as a sign reversal between the associations measured in the disaggregated subpopulations relative to the aggregated data, which describes the population as a whole.
Edward H. Simpson first addressed this phenomenon in a technical paper in 1951, but Karl Pearson et al. in 1899 and Udny Yule in 1903, had mentioned a similar effect earlier. All three reported associations that disappear, rather than reversing signs upon aggregation. Sign reversal was first noted by Cohen and Nagel (1934) and then by Blyth (1972) who labeled the reversal “paradox,” presumably because the surprise that association reversal evokes among the unwary appears paradoxical at first.”
👉 This is the section to skip to if you’re not a Super Nerd
When we open this up, we can see that these ads were running to three different prospecting audiences — an interest audience, a lookalike audience (LAL), and a wide-open audience.
But here’s where it gets really fun. I’ve gone ahead and highlighted a few rows for you to make it easy to follow along.
When it comes to CPA… Ad A outperforms Ad B on every single audience. Wait, what?!
Oh, it gets better.
When it comes to ROAS… Ad A again outperforms Ad B on every single audience.
If you’re frustratedly busting out your TI-85 to crunch the numbers manually to prove that I did the math wrong — I’m not offended. That’s sorta the whole point of a paradox — something that seems “absurd or self-contradictory” but isn’t.
So, which ad is really the winner?
Well, if we group our ad data at the campaign level, it’s Ad B, but if we group our ad data at the Ad Set level, the data says that Ad A is the winner.
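To see how both of those statements can be true at once, here is a minimal Python sketch. The numbers are invented for illustration and are not the figures from the screenshots; the audience labels, the `ROWS` table, and the `summarize` helper are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical numbers, invented for illustration only.
# Each row: (ad, audience, spend, purchases, revenue)
ROWS = [
    ("A", "Interest", 1000.0, 100, 5000.0),
    ("A", "LAL", 2000.0, 100, 5000.0),
    ("A", "Broad", 8000.0, 200, 10000.0),
    ("B", "Interest", 7500.0, 600, 30000.0),
    ("B", "LAL", 2500.0, 100, 5000.0),
    ("B", "Broad", 1000.0, 20, 1000.0),
]


def summarize(rows, key):
    """Sum spend, purchases, and revenue per group, then derive CPA and ROAS."""
    totals = defaultdict(lambda: [0.0, 0, 0.0])
    for ad, audience, spend, purchases, revenue in rows:
        group = key(ad, audience)
        totals[group][0] += spend
        totals[group][1] += purchases
        totals[group][2] += revenue
    return {
        group: {"cpa": spend / purchases, "roas": revenue / spend}
        for group, (spend, purchases, revenue) in totals.items()
    }


# Campaign level: one row per ad, audiences rolled up together.
campaign_level = summarize(ROWS, key=lambda ad, audience: ad)

# Ad Set level: one row per (ad, audience) pair.
ad_set_level = summarize(ROWS, key=lambda ad, audience: (ad, audience))

for ad, stats in sorted(campaign_level.items()):
    print(f"Campaign  Ad {ad}: CPA {stats['cpa']:.2f}, ROAS {stats['roas']:.2f}")

for (ad, audience), stats in sorted(ad_set_level.items()):
    print(f"{audience:9} Ad {ad}: CPA {stats['cpa']:.2f}, ROAS {stats['roas']:.2f}")
```

In this made-up data, Ad A has the lower CPA and the higher ROAS inside every audience, yet Ad B looks better on both metrics once the audiences are rolled up, because Ad B put most of its spend into the cheapest audience.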
Since each Ad Set is a different audience, it’s more beneficial to analyze the data that way. In statistics, we call these hidden groupings “lurking variables”. Sometimes it’s hard to know if you have any lurking variables that should be used for categorizing results into different distribution groups, but I’ve got one more example that I think will make this significantly clearer for you.
What if our lurking variable was a prospecting audience vs. a retargeting audience?
It usually costs more money to acquire a purchase from someone who hasn’t been to your website (prospecting) than it does to acquire a purchase from someone who just recently visited your website (retargeting).
In this case, Ad A had more of its spend going towards the more difficult audience, so even though it had a lower CPA than Ad B on Prospecting, the CPA for Ad A on Prospecting was still higher than the CPA for Ad B on Retargeting.
The two ads split their spend unequally between the more difficult audience and the easier one. That’s why Ad A can beat Ad B in both audiences and still end up with an aggregate result that seems to suggest that Ad B is the better ad creative.
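The weighting can be written out explicitly. The notation here is my own shorthand, not something from the tools: for a single ad, let S_P and S_R be its spend on Prospecting and Retargeting, and N_P and N_R the purchases it got from each.

```latex
\mathrm{CPA}_{\text{total}}
  = \frac{S_P + S_R}{N_P + N_R}
  = \frac{N_P}{N_P + N_R}\,\mathrm{CPA}_P
  + \frac{N_R}{N_P + N_R}\,\mathrm{CPA}_R,
\qquad
\mathrm{CPA}_P = \frac{S_P}{N_P},
\quad
\mathrm{CPA}_R = \frac{S_R}{N_R}
```

The rolled-up CPA is a weighted average of the two audience CPAs, and the weights are each audience’s share of that ad’s purchases. Two ads with different audience mixes are averaged with different weights, so an ad that leans on the expensive audience can have the higher rolled-up CPA even when it is cheaper in both audiences.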
Tom Grigg has a great illustration (which I’ve sloppily edited) that helps this make sense visually. The image on the left shows what happens if you slice through the data in aggregate, while the image on the right shows what that data looks like when you account for the lurking variables.
Practical application for advertising analysis
The key is to look for lurking variables rather than taking rollup analyses at face value. Audience is an obvious one, but here are a few other lurking variables that could really mess up your analysis:
- Campaign optimization (e.g. optimizing for traffic vs. optimizing for purchases)
- Ad Platform
- Landing pages
- Different time periods (e.g. did one ad get more spend on weekends while another got more spend on weekdays?)
There are lots of great tools out there that can help you analyze your advertising, but the tool can’t think on your behalf — you’re still responsible for understanding the lurking variables and making sure you set up the analysis in a way that will give you more meaningful insight into the results.
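One way to build this habit is to export your results with a candidate lurking variable as a column and check whether the rollup winner survives the split. Here is a rough Python sketch of that check. The `check_reversal` and `winner_by_cpa` helpers and the `EXAMPLE_ROWS` numbers are hypothetical, written for this post rather than taken from any ad tool, and the sketch assumes every row has at least one purchase.

```python
def winner_by_cpa(stats):
    """Return the ad with the lowest CPA; stats maps ad -> (spend, purchases)."""
    return min(stats, key=lambda ad: stats[ad][0] / stats[ad][1])


def check_reversal(rows):
    """Compare the rollup winner with the winner inside each segment.

    rows: iterable of (ad, segment, spend, purchases), purchases > 0.
    Returns (aggregate_winner, segment_winners, reversed_flag), where
    reversed_flag is True when the rollup winner loses in every segment.
    """
    overall = {}
    by_segment = {}
    for ad, segment, spend, purchases in rows:
        s, p = overall.get(ad, (0.0, 0))
        overall[ad] = (s + spend, p + purchases)
        seg_stats = by_segment.setdefault(segment, {})
        s, p = seg_stats.get(ad, (0.0, 0))
        seg_stats[ad] = (s + spend, p + purchases)

    aggregate_winner = winner_by_cpa(overall)
    segment_winners = {
        segment: winner_by_cpa(stats) for segment, stats in by_segment.items()
    }
    reversed_flag = bool(segment_winners) and all(
        winner != aggregate_winner for winner in segment_winners.values()
    )
    return aggregate_winner, segment_winners, reversed_flag


# Hypothetical export: (ad, segment, spend, purchases)
EXAMPLE_ROWS = [
    ("A", "Prospecting", 9000.0, 225),
    ("A", "Retargeting", 1000.0, 100),
    ("B", "Prospecting", 2000.0, 40),
    ("B", "Retargeting", 8000.0, 640),
]

winner, per_segment, flipped = check_reversal(EXAMPLE_ROWS)
print("Rollup winner:", winner)
print("Winner per segment:", per_segment)
print("Simpson-style reversal:", flipped)
```

You could run the same check once per candidate variable (audience, platform, landing page, day of week). A reversal on any of them is a sign that the rollup number deserves a closer look before you declare a winner.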
Going beyond advertising
This isn’t just about advertising. Simpson’s Paradox pops up all over the internet, and it can really cause a lot of issues when bad analyses are spread through social media. These are just a few examples where the aggregate data shows a very different conclusion than the lurking variables reveal — prime fodder for sensationalized headlines by journalists, politicians, and Karen:
- US Median Wage Decline by David Smith
- Gender Discrimination at UC Berkeley by Tom Grigg
- Batting Average for Derek Jeter vs. David Justice by Eric Hart and Mariam Walla
- Median Income Change for Women 25+ by Simona Dobreva
- Trump vs. Hillary and the US Electoral College by Suraj Malpani
… and the list goes on!
Take some time to read through the articles above and use them to better understand the data you rely on to make decisions, both in advertising and in everyday life.