Statistics Basics for A/B Testing: Part 1
Knowledge of basic statistics is extremely important in designing and analyzing A/B tests accurately. Over the next few posts, I will be reviewing some important concepts all conversion optimization professionals should know and apply.
We need statistics any time it isn’t feasible to study the whole population to understand their attitudes, opinions, preferences etc. Let me explain with an example. Suppose we are interested in knowing which place in town has hotter coffee — McDonald’s or Starbucks. It would be difficult to measure each and every cup of coffee at both these places without some fancy gadgets.
So what we do instead is take a few cups of coffee (i.e. a sample) from both McDonald’s and Starbucks and measure their temperature. This data from both the samples can then be used to infer which place serves hotter coffee.
So, the population is the entire pool of users or things we want to measure. For a website, it is all of your website visitors. When we A/B test, we study a sample of visitors in experience A and a sample of visitors in experience B to make inferences about the population, for example to see whether a new homepage design or a different checkout flow performs better.
Going back to the coffee cups example, now that we have the data, what do we do? We need to find a way to compare the two samples. This is where we use the “mean”, which is a measure of central tendency. A measure of central tendency is a single value that describes a set of data by identifying its central position. Median and mode are other measures of central tendency, and when the data is skewed they can describe the center better than the mean.
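To make the three measures concrete, here is a minimal sketch using Python's standard `statistics` module. The temperatures are made-up illustrative values, not real measurements, and the variable names are my own.

```python
import statistics

# Hypothetical temperatures (°F) for six cups of coffee; illustrative only.
cups = [165, 172, 175, 175, 178, 185]

mean_temp = statistics.mean(cups)      # sum of values divided by the count
median_temp = statistics.median(cups)  # middle value of the sorted data
mode_temp = statistics.mode(cups)      # most frequent value

# Add one unusually hot cup to skew the data.
skewed_cups = cups + [240]
skewed_mean = statistics.mean(skewed_cups)
skewed_median = statistics.median(skewed_cups)

print(mean_temp, median_temp, mode_temp)
print(skewed_mean, skewed_median)
```

In the skewed sample the single outlier pulls the mean up by several degrees while the median stays where it was, which is why the median can be the better summary for skewed data.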
When A/B testing, this could be your conversion rate, average order value or any other metric close to the point of change that measures its impact.
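Both of these metrics are means. As a rough sketch, assuming each visitor is recorded as 1 (converted) or 0 (did not convert), the conversion rate is simply the mean of those values. The data and names below are hypothetical.

```python
# Hypothetical outcomes for ten visitors: 1 = converted, 0 = did not convert.
visitor_outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical order values (in dollars) for the two visitors who converted.
order_values = [40.0, 60.0]

# Conversion rate is the mean of the 0/1 outcomes.
conversion_rate = sum(visitor_outcomes) / len(visitor_outcomes)

# Average order value is the mean of the order amounts.
average_order_value = sum(order_values) / len(order_values)

print(f"Conversion rate: {conversion_rate:.1%}")
print(f"Average order value: ${average_order_value:.2f}")
```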
Once we plot the different temperatures in each of the samples, we get a distribution for each one: most cups cluster around a central value, with fewer cups at the hotter and cooler ends.
For a roughly symmetric distribution, the value at the middle is the mean. Although the average temperature for Starbucks coffee is 175ºF, you may find some cups that are 165ºF or 172ºF or 185ºF. This variability gives the data different shapes. The data can be tightly clustered around the mean if it has low variability, or spread more widely if it has high variability.
The shape of the data or the spread of the data is the variability. Standard Deviation is a measure of this variability. Standard deviation is calculated by first taking the average squared distance of each data point from the mean and then taking the square root of that.
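The calculation described above can be sketched in a few lines. This is an illustration with made-up temperatures, and `std_dev` is a hypothetical helper name. One detail worth flagging: dividing by the number of data points gives the population standard deviation, while for a sample the usual convention is to divide by one less than the number of data points, so the sketch supports both.

```python
import math
import statistics

# Hypothetical temperatures (°F); illustrative only.
cups = [165, 172, 175, 175, 178, 185]

def std_dev(values, sample=True):
    """Standard deviation computed step by step.

    sample=True divides by n - 1 (the usual convention for a sample);
    sample=False divides by n (population standard deviation).
    """
    n = len(values)
    mean = sum(values) / n
    # Squared distance of each data point from the mean.
    squared_distances = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return math.sqrt(sum(squared_distances) / divisor)

sample_sd = std_dev(cups)
population_sd = std_dev(cups, sample=False)

# Cross-check against the standard library.
print(sample_sd, statistics.stdev(cups))
print(population_sd, statistics.pstdev(cups))
```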
It is important to note that the variability or the spread of the data directly influences the sample size that we need to confidently estimate the population mean. The more the data varies, the larger the sample we need for the same level of precision.
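To see how variability drives sample size, here is a simplified sketch based on the textbook formula for estimating a single mean within a chosen margin of error, n = (z × standard deviation / margin of error)². The function name and the numbers are my own assumptions, and the formula relies on a normal approximation. Sample size calculations for a full A/B test involve more inputs than this.

```python
import math
from statistics import NormalDist

def required_sample_size(std_dev, margin_of_error, confidence=0.95):
    """Approximate sample size to estimate a mean within +/- margin_of_error.

    Uses the normal approximation n = (z * std_dev / margin_of_error) ** 2.
    """
    # z-value for a two-sided interval, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical standard deviations of 5°F and 10°F, same 2°F margin of error.
low_variability_n = required_sample_size(std_dev=5, margin_of_error=2)
high_variability_n = required_sample_size(std_dev=10, margin_of_error=2)

print(low_variability_n, high_variability_n)
```

Doubling the standard deviation roughly quadruples the required sample size, because the standard deviation is squared in the formula.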
Now that we know the mean and the variability of the sample, we need to estimate the population parameter. This is done by using confidence intervals. In the context of A/B testing, a confidence interval is a range around the sample estimate that is likely to contain the true value, so it is a measure of the reliability of the estimate. How much error we are willing to allow, i.e. the confidence level, is a business decision. Since we don’t know the true conversion rate of the population and are using the sample to infer that, we use the confidence interval to understand our risk of sampling error.
Confidence intervals are calculated using the following:
- Mean: Sample mean that the confidence interval wraps around
- Standard Deviation & Sample Size: These determine how wide the confidence interval needs to be. A higher standard deviation and a lower sample size mean more uncertainty, and therefore a wider interval.
- Confidence level: How confident we want to be that our estimate of the parameter is within that confidence interval
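Confidence intervals add a margin of error around the mean for a certain level of confidence. So, in our coffee cups example, we can say that we are 95% confident that the average temperature of Starbucks coffee is 175ºF ± 15ºF, i.e. between 160ºF and 190ºF. This means that if we repeated the sampling many times, about 95% of the intervals built this way would contain the true average temperature. It does not mean that 95% of individual cups are between 160ºF and 190ºF; the interval describes the average, not single cups.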
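Putting the three ingredients together (mean, standard deviation with sample size, and confidence level), a confidence interval for a mean can be sketched as below. This is a minimal illustration with made-up temperatures and a hypothetical function name. It uses the normal approximation, whereas for a sample this small a t-distribution would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(values)
    sample_mean = mean(values)
    # Standard error: sample standard deviation scaled by sqrt of sample size.
    standard_error = stdev(values) / math.sqrt(n)
    # z-value for a two-sided interval, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical temperatures (°F); illustrative only.
cups = [165, 172, 175, 175, 178, 185]

low_95, high_95 = mean_confidence_interval(cups, confidence=0.95)
low_99, high_99 = mean_confidence_interval(cups, confidence=0.99)

print(f"95% interval: {low_95:.1f} to {high_95:.1f}")
print(f"99% interval: {low_99:.1f} to {high_99:.1f}")
```

Asking for more confidence (99% instead of 95%) widens the interval, and collecting more cups would narrow it.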
When comparing the average temperature of Starbucks and McDonald’s coffee, it is important to take the confidence intervals into account as well. This has an important implication for A/B tests: we do not want the confidence intervals to heavily overlap between variations, because heavy overlap means the observed difference could simply be due to sampling error.
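For an A/B test on conversion rate, the comparison might look like the sketch below. The visitor and conversion counts are invented, the function names are hypothetical, and the interval uses the simple normal approximation for a proportion. Treat the overlap check as a rough visual heuristic: intervals that do not overlap suggest a real difference, but overlapping intervals do not prove there is none.

```python
import math
from statistics import NormalDist

def conversion_rate_interval(conversions, visitors, confidence=0.95):
    """Normal-approximation confidence interval for a conversion rate."""
    rate = conversions / visitors
    standard_error = math.sqrt(rate * (1 - rate) / visitors)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return rate - z * standard_error, rate + z * standard_error

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical results: 5,000 visitors per variation.
interval_a = conversion_rate_interval(conversions=200, visitors=5000)  # 4.0%
interval_b = conversion_rate_interval(conversions=270, visitors=5000)  # 5.4%
interval_c = conversion_rate_interval(conversions=215, visitors=5000)  # 4.3%

print("A vs B overlap:", intervals_overlap(interval_a, interval_b))
print("A vs C overlap:", intervals_overlap(interval_a, interval_c))
```

With these made-up numbers, A and B are clearly separated while A and C overlap heavily, so the small lift seen in C would need more data before we could trust it.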
This post reviewed population, sample, measures of central tendency, standard deviation and confidence intervals. I will be covering other topics in statistics relevant to A/B testing in the next few posts. Stay tuned!