Statistical Significance & A/B Testing


Websites should be constantly evolving to improve their performance, so that they don't stagnate and so that you maximise the return from your visitors. Whatever service you provide, you always need to generate as many leads or sales as possible from the traffic coming to your site.

One thing I think needs highlighting when A/B testing is how to decide when a test has become statistically significant and you're ready to pull the trigger on a permanent change.


The nature of search volume when A/B testing  

Search volume, and the intent behind it, fluctuates constantly. You'll see peaks and troughs in all of your online marketing campaigns, driven by an enormous number of variables such as day of the week, time of day and the weather (this is why machine learning technology is so effective).

By the very nature of an A/B test, two groups of people see different variations of a page so that you can establish which one the audience prefers and then implement it going forward. However, even if you showed variation A to both groups, their performance would still differ.

Each individual digests information differently. The only way to get a perfectly like-for-like comparison would be to show the different variations to the same individuals under the same circumstances, which is of course impossible.
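The point that two groups seeing the same page still perform differently can be illustrated with a simulated "A/A test". In this minimal Python sketch, both groups see variation A and share the same true conversion rate. The figures (500 visitors per group, a 5% conversion rate) and the function name are hypothetical and chosen only for illustration.

```python
import random

def simulate_aa_test(visitors_per_group, true_rate, seed=None):
    """Hypothetical helper: show the SAME page to two random groups and
    return each group's observed conversion rate."""
    rng = random.Random(seed)
    conversions = [
        # Each visitor converts independently with probability true_rate.
        sum(rng.random() < true_rate for _ in range(visitors_per_group))
        for _ in range(2)
    ]
    return [c / visitors_per_group for c in conversions]

if __name__ == "__main__":
    # Assumed figures: 500 visitors per group, true conversion rate of 5%.
    for run in range(5):
        group_1, group_2 = simulate_aa_test(500, 0.05, seed=run)
        print(f"run {run}: group 1 = {group_1:.1%}, group 2 = {group_2:.1%}")
```

Across runs, the two observed rates will typically differ even though nothing about the page has changed. That random noise is what a significance test has to account for.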

Instead, we have to make a decision based on how two completely separate groups of people engage with your content, driven by their search intent and the fact that they're your target market.

This is where statistical significance comes into play.


Statistical Significance

Statistical significance boils down to minimising the risk that the difference you observe between the variations is simply down to chance, or a “gamble”. Of course, you can never be 100% certain that a change will be positive, A/B test or not. Generally, though, if an A/B test based on a large amount of data shows a positive result, you can expect the change to perform positively once it is implemented.
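One common way to quantify this element of chance is a two-proportion z-test. It estimates how likely a difference at least as large as the one observed would be if both variations actually converted at the same rate. The Python sketch below uses only the standard library. The function name and example figures are hypothetical, and it is not necessarily the method any particular testing tool uses.

```python
import math

def two_proportion_z_test(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled conversion rate, i.e. assumes
    under the null hypothesis that both variations convert equally.
    """
    p_a = conv_a / visitors_a
    p_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if se == 0:
        # Every visitor behaved the same way: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical test: 5,000 visitors per variation,
    # 200 conversions for A (4%) and 250 for B (5%).
    z, p = two_proportion_z_test(200, 5000, 250, 5000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

With these hypothetical figures, the p-value comes out at roughly 0.016. That is below the commonly used 0.05 threshold, which corresponds to 95% confidence. The normal approximation is generally reasonable for large samples but becomes less reliable when conversion counts are small.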

When you reach statistical significance, however, is less clear-cut. For example, if your homepage receives 1,000 visitors a day, you'll be able to reach statistical significance relatively quickly. If you want to test the form fields on a contact page that gets 15 visitors a day, however, you'll need to run the test for much longer to establish statistical significance.
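The effect of traffic volume on test duration is simple arithmetic. This sketch assumes, purely for illustration, that each variation needs 6,000 visitors and that traffic is split evenly between two variations. Your real requirement depends on your own data, and the function name is hypothetical.

```python
import math

def days_to_reach_sample(required_per_variation, daily_visitors, variations=2):
    """Days until every variation has received the required number of
    visitors, assuming traffic is split evenly between variations."""
    total_needed = required_per_variation * variations
    return math.ceil(total_needed / daily_visitors)

# Assumed requirement: 6,000 visitors per variation.
print(days_to_reach_sample(6_000, 1_000))  # homepage: 12 days
print(days_to_reach_sample(6_000, 15))     # contact page: 800 days
```

Under these assumptions, a test that could finish in under two weeks on the homepage would take more than two years on the contact page.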

The obvious question, then, is: when does my test reach statistical significance?

The truth is that there's no universal answer. It depends on your own data, such as your traffic volume, your current conversion rate and the size of the improvement you're hoping to detect. You'll need to use this information to decide how long to run the A/B test for and how many visitors should pass through it before you can be confident in the result.
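For a starting estimate, you can use a standard normal-approximation formula for comparing two conversion rates. It gives the visitors needed per variation from four inputs: your baseline conversion rate, the rate you hope the new variation achieves, the significance level (alpha) and the statistical power. Here is a minimal Python sketch. The function name, the defaults and the example rates are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed per variation for a two-sided
    two-proportion test (normal approximation)."""
    if baseline_rate == expected_rate:
        raise ValueError("expected_rate must differ from baseline_rate")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (baseline_rate + expected_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(baseline_rate * (1 - baseline_rate)
                             + expected_rate * (1 - expected_rate))
    ) ** 2
    return math.ceil(numerator / (expected_rate - baseline_rate) ** 2)

# Hypothetical goal: detect a lift from a 4% to a 5% conversion rate.
print(sample_size_per_variation(0.04, 0.05))
```

With this formula, detecting a lift from 4% to 5% at alpha = 0.05 and 80% power needs roughly 6,750 visitors per variation. Smaller expected improvements push the number up sharply, because the required sample grows with the inverse square of the difference.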


A quick tip

If you're running an A/B test with Google Optimize and you're not the type of person who can sit back and let the test run, you may find yourself visiting your site more often than usual, on multiple devices and perhaps even in incognito mode. This contaminates your data and certainly won't help you establish which variation performs best.

A great way to ensure that you, or anyone else in your office, doesn't affect the data is to exclude your IP address in Google Analytics. This ensures that visits to the site from that IP address don't affect the overall data pool.
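Google Analytics handles this exclusion through its own settings. If you also analyse exported or raw visit data yourself, you can apply the same idea in code. This minimal Python sketch makes three assumptions: each visit is a dictionary with hypothetical `ip` and `converted` fields, 203.0.113.0/24 (a reserved documentation range) stands in for your office network, and the helper names are illustrative.

```python
import ipaddress

# Hypothetical office network; 203.0.113.0/24 is reserved for documentation.
INTERNAL_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def is_internal(ip):
    """True if the address falls inside any of the internal networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

def exclude_internal_visits(visits):
    """Drop visits from internal IPs before calculating test results."""
    return [v for v in visits if not is_internal(v["ip"])]

visits = [
    {"ip": "203.0.113.7", "converted": True},    # someone in the office
    {"ip": "198.51.100.23", "converted": False},
    {"ip": "192.0.2.45", "converted": True},
]
clean = exclude_internal_visits(visits)
print(len(clean))  # 2
```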



In summary, A/B testing needs to be carried out in a controlled environment and given the chance to show whether a change is likely to yield positive returns if implemented.

How long a test needs to run and how large the sample needs to be are determined case by case. There certainly needs to be a period of monitoring, though, and we should only implement the winning variation of an A/B test once it has reached statistical significance.
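The decision described above can be written down as a simple rule. This is a hypothetical sketch, not a standard. It assumes you have already calculated the required sample size and the test's p-value. The seven-day minimum, intended to cover day-of-week swings in traffic, and the 0.05 threshold are illustrative defaults that you should set for your own site.

```python
def ready_to_implement(visitors_per_variation, required_per_variation,
                       days_run, p_value, alpha=0.05, min_days=7):
    """Hypothetical decision rule: act only once every variation has
    reached the planned sample size, the test has run for at least
    `min_days`, and the result is significant at level `alpha`."""
    if min(visitors_per_variation) < required_per_variation:
        return False  # keep monitoring: not enough data yet
    if days_run < min_days:
        return False  # assumed minimum to cover a full weekly cycle
    return p_value < alpha

# Hypothetical test: 7,000 visitors per variation after 14 days, p = 0.01.
print(ready_to_implement([7_000, 7_000], 6_745, 14, 0.01))  # True
```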