What is A/B Test Significance Calculator?

The A/B Test Significance Calculator is a free online tool that tells you whether the difference between two test variants is likely to be real or could plausibly be random chance. Enter the visitors and conversions for your control (Variant A) and your variation (Variant B), and the tool runs a two-proportion z-test to produce the z-score, the p-value, the confidence level, and a clear verdict on the winner. You can pick your own confidence level (90%, 95% or 99%) and choose a one-tailed or two-tailed test. Beyond the verdict, it also reports each observed conversion rate, the absolute lift (in percentage points), the relative lift (as a percentage), and a 95% confidence interval on the difference. You can optionally add Variants C and D to compare more than two options at once, and a built-in sample size planner tells you how many visitors per variant you need before you start. Everything runs in your browser with no signup, so your test data stays completely private.
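For reference, the standard pooled two-proportion z-test behind these outputs can be written as follows. This assumes the tool uses the usual pooled-variance form, which the page does not spell out; the symbols n (visitors), c (conversions) and Φ (the standard normal cumulative distribution function) are notation introduced here.

```latex
\hat{p}_A = \frac{c_A}{n_A}, \qquad \hat{p}_B = \frac{c_B}{n_B}, \qquad \hat{p} = \frac{c_A + c_B}{n_A + n_B}

z = \frac{\hat{p}_B - \hat{p}_A}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}

p_{\text{two-tailed}} = 2\bigl(1 - \Phi(|z|)\bigr), \qquad p_{\text{one-tailed}} = 1 - \Phi(z), \qquad \text{confidence} = 1 - p

\text{absolute lift} = 100\,(\hat{p}_B - \hat{p}_A)\ \text{pp}, \qquad \text{relative lift} = 100\,\frac{\hat{p}_B - \hat{p}_A}{\hat{p}_A}\ \%
```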

How to use A/B Test Significance Calculator?

Checking the statistical significance of an A/B test takes only a moment:

  1. Enter the number of visitors and conversions for Variant A, your control or original version.
  2. Enter the visitors and conversions for Variant B, the variation you are testing against the control. Optionally add Variants C and D to compare several variations at once.
  3. Choose your confidence level (90/95/99%) and whether the test is one-tailed or two-tailed.
  4. Click Calculate Significance. The tool applies a two-proportion z-test and computes the z-score, p-value, confidence level, lift, and the confidence interval on the difference.
  5. Read the verdict and interpretation. If confidence reaches your chosen threshold, the declared winner is statistically significant at that level; otherwise the result is not yet significant and the test should keep running.
  6. Use the Required Sample Size Planner before launching: enter your baseline conversion rate and the minimum detectable effect (the smallest relative lift you want to detect, e.g. 10 for a 10% relative lift) to see how many visitors per variant you need at 80% power.
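Steps 1 to 5 can be sketched in Python as a single function. This is a minimal sketch, not the tool's source: the helper name `analyze`, the verdict wording, and the pooled-variance z-test are assumptions, while the `confidence` and `tails` parameters mirror the settings in step 3.

```python
import math
from statistics import NormalDist


def analyze(visitors_a, conv_a, visitors_b, conv_b, confidence=0.95, tails=2):
    """Hypothetical helper: rates, lifts, pooled z-test and verdict for B vs. A."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate and standard error under the assumption of no real difference.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    normal = NormalDist()
    if tails == 2:
        p_value = 2 * (1 - normal.cdf(abs(z)))  # B differs from A in either direction
    else:
        p_value = 1 - normal.cdf(z)  # only "B beats A" counts as evidence
    if p_value >= 1 - confidence:
        verdict = "Not yet significant: keep the test running"
    elif rate_b > rate_a:
        verdict = "B is the winner"
    else:
        verdict = "A is the winner"
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "absolute_lift_pp": (rate_b - rate_a) * 100,
        "relative_lift_pct": (rate_b - rate_a) / rate_a * 100,
        "z": z,
        "p_value": p_value,
        "confidence": 1 - p_value,
        "verdict": verdict,
    }


result = analyze(1000, 100, 1000, 130, confidence=0.95, tails=2)
print(result["verdict"], f"(p = {result['p_value']:.4f})")
```

Running the same data with `confidence=0.99` returns the "Not yet significant" verdict, because a p-value of about 0.035 clears the 95% bar but not the 99% one.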

Why use this tool?

Acting on an A/B test that has not reached significance is one of the most common and costly mistakes in marketing. A variant can look like a clear winner purely by chance, and rolling it out can quietly hurt conversions. This calculator reduces that risk by quantifying how confident you can be that the difference is real. The p-value is the probability of seeing a difference at least as large as the one you observed if the two variants actually performed the same, and 95% confidence (a p-value below 0.05) is the conventional bar for a trustworthy decision. The confidence interval on the difference shows the plausible range of the true effect: if it crosses zero, the result is inconclusive. A one-tailed test gives more power when you only care whether B beats A, while a two-tailed test is the safer default. The sample size planner helps you avoid underpowered tests by sizing the experiment up front. To get reliable results, wait for at least 100 conversions per variant, run the test across full business cycles, and do not stop early just because the numbers look good. Because everything runs locally in your browser, your test data is never uploaded.
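The advice against stopping early can be checked with a small simulation. This sketch uses hypothetical parameters (a 10% true rate in both arms, 2,000 visitors per arm, 10 interim looks) and runs A/A tests in which no real difference exists, so every declared winner is a false positive. It compares stopping at the first look with p < 0.05 against a single look at the planned sample size.

```python
import math
import random


def p_value(n_a, c_a, n_b, c_b):
    """Two-tailed pooled two-proportion z-test."""
    pooled = (c_a + c_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation yet, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (c_b / n_b - c_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_aa(runs=500, visitors=2000, looks=10, rate=0.10, alpha=0.05, seed=7):
    """A/A tests: both arms share one true rate, so every 'winner' is a false positive."""
    rng = random.Random(seed)
    step = visitors // looks
    peeking_wins = planned_wins = 0
    for _ in range(runs):
        a = [rng.random() < rate for _ in range(visitors)]
        b = [rng.random() < rate for _ in range(visitors)]
        # Peeking: check after every `step` visitors and stop at the first p < alpha.
        for n in range(step, visitors + 1, step):
            if p_value(n, sum(a[:n]), n, sum(b[:n])) < alpha:
                peeking_wins += 1
                break
        # Disciplined: a single look at the planned sample size.
        if p_value(visitors, sum(a), visitors, sum(b)) < alpha:
            planned_wins += 1
    return peeking_wins / runs, planned_wins / runs


peeking, planned = simulate_aa()
print(f"false winners with peeking: {peeking:.1%}, with one planned look: {planned:.1%}")
```

Because the last peek is the same as the planned look, peeking can only add false winners, never remove them; with ten looks the false-positive rate lands well above the nominal 5%.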

Examples

A clear winner

Variant A converts 100 of 1,000 visitors (10%) and Variant B converts 130 of 1,000 (13%). The two-tailed z-test reports a p-value of about 0.035 and over 96% confidence, with a relative lift of 30%, so B is a trustworthy winner at the 95% level.
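These figures can be reproduced by hand with the pooled two-proportion z-test, assuming the tool uses that standard form (Φ is the standard normal cumulative distribution function):

```latex
\hat{p} = \frac{100 + 130}{1000 + 1000} = 0.115

SE = \sqrt{0.115 \times 0.885 \times \left(\frac{1}{1000} + \frac{1}{1000}\right)} \approx 0.014267

z = \frac{0.13 - 0.10}{0.014267} \approx 2.103, \qquad p = 2\bigl(1 - \Phi(2.103)\bigr) \approx 0.0355

\text{confidence} = 1 - p \approx 96.5\%, \qquad \text{relative lift} = 100 \times \frac{0.13 - 0.10}{0.10} = 30\%
```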

Comparing several variants

You test a control plus three variations (B, C and D). The comparison table shows each rate, its lift versus the control, the p-value, and a per-variant verdict, so you can see at a glance which variations have genuinely pulled ahead.
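A comparison table like this can be sketched by running a separate two-tailed z-test of each variant against control A. The function name `compare_to_control` and the visitor and conversion counts below are hypothetical. This sketch applies no correction for running several comparisons at once, and the page does not say whether the tool does; with more variants, the chance that at least one looks significant by luck alone grows.

```python
import math
from statistics import NormalDist


def compare_to_control(control, variants, confidence=0.95):
    """Two-tailed pooled z-test of each (visitors, conversions) variant against the control."""
    n_a, c_a = control
    rate_a = c_a / n_a
    rows = []
    for name, (n, c) in variants.items():
        rate = c / n
        pooled = (c_a + c) / (n_a + n)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n))
        z = (rate - rate_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        rows.append({
            "variant": name,
            "rate": rate,
            "lift_pct": (rate - rate_a) / rate_a * 100,
            "p_value": p_value,
            "significant": p_value < 1 - confidence,
        })
    return rows


# Hypothetical data: (visitors, conversions) per variant.
table = compare_to_control((1000, 100), {"B": (1000, 130), "C": (1000, 105), "D": (1000, 85)})
for row in table:
    status = "significant" if row["significant"] else "not significant"
    print(f"{row['variant']}: {row['rate']:.1%}  lift {row['lift_pct']:+.1f}%  "
          f"p = {row['p_value']:.3f}  {status}")
```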

Planning before you launch

With a 5% baseline and a goal of detecting a 10% relative lift at 80% power, the planner reports roughly 31,000 visitors per variant — telling you the test needs serious traffic before it can conclude.
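The planner's figure can be approximated with a common textbook sample-size formula for comparing two proportions (two-tailed, 95% confidence, 80% power). The tool's exact formula is not stated, so treat this as a sketch whose result may differ slightly; `visitors_per_variant` is a hypothetical name.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed in each variant to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # target rate after the lift
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


print(visitors_per_variant(0.05, 0.10))
```

For a 5% baseline and a 10% relative lift it returns a little over 31,000 visitors per variant, in line with the planner's estimate.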

Frequently Asked Questions

What is statistical significance in an A/B test?

It means the observed difference between two variants is large enough that it would be unlikely to arise by random chance alone if the variants truly performed the same. A result is usually considered significant at 95% confidence or higher, though you can choose 90% or 99% in this tool.

What does the p-value mean?

The p-value is the probability of observing a difference at least as large as the one measured, assuming there is no real difference between the variants. A lower p-value means stronger evidence that the difference is real; a p-value of 0.05 corresponds to 95% confidence.
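That definition can be made concrete with a permutation (shuffle) test, which is a different technique from the tool's z-test. It repeatedly scatters the observed conversions at random across all visitors, as if the variant made no difference, and counts how often the gap between the groups is at least as large as the real one. The function name, shuffle count and seed are assumptions for illustration.

```python
import random


def permutation_p_value(n_a, c_a, n_b, c_b, shuffles=5_000, seed=42):
    """Estimate a two-tailed p-value by randomly reassigning conversions to variants."""
    rng = random.Random(seed)
    total_visitors = n_a + n_b
    total_conversions = c_a + c_b
    # Compare rate gaps by cross-multiplying so ties are exact integer comparisons:
    # |c_b/n_b - c_a/n_a| scaled by n_a * n_b.
    observed = abs(c_b * n_a - c_a * n_b)
    extreme = 0
    for _ in range(shuffles):
        # With no real difference, which visitors converted is unrelated to the
        # variant, so place the conversions on randomly chosen visitors.
        converted = rng.sample(range(total_visitors), total_conversions)
        shuffled_b = sum(1 for i in converted if i < n_b)  # slots below n_b are variant B
        shuffled_a = total_conversions - shuffled_b
        if abs(shuffled_b * n_a - shuffled_a * n_b) >= observed:
            extreme += 1
    return extreme / shuffles


print(f"permutation p-value: {permutation_p_value(1000, 100, 1000, 130):.3f}")
```

For the clear-winner example (100 of 1,000 versus 130 of 1,000), the result lands close to, though not exactly at, the z-test's p-value of about 0.035, because the two methods approximate the same quantity in different ways.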

Should I use a one-tailed or two-tailed test?

A two-tailed test checks whether B differs from A in either direction and is the safer default. A one-tailed test only checks whether B beats A and has more statistical power, but use it only when you decided before the test that you care solely about an improvement.
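The trade-off shows up directly in the critical z-scores and p-values. This sketch assumes z is computed as B minus A, so a positive z means B is ahead; the z-score of 1.8 is a hypothetical value chosen to fall between the one-tailed and two-tailed thresholds.

```python
from statistics import NormalDist

normal = NormalDist()


def p_values(z):
    """Return (two_tailed, one_tailed) p-values for a z-score where positive z means B > A."""
    two_tailed = 2 * (1 - normal.cdf(abs(z)))
    one_tailed = 1 - normal.cdf(z)
    return two_tailed, one_tailed


# Critical z-scores at 95% confidence.
print(f"two-tailed needs |z| > {normal.inv_cdf(0.975):.3f}")  # 1.960
print(f"one-tailed needs  z  > {normal.inv_cdf(0.95):.3f}")   # 1.645

two, one = p_values(1.8)  # hypothetical z-score between the two thresholds
print(f"z = 1.8: two-tailed p = {two:.4f}, one-tailed p = {one:.4f}")
```

At z = 1.8 the one-tailed p-value is below 0.05 while the two-tailed one is not: that is the extra power, and the extra risk, of committing to one direction in advance.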

What is the confidence interval on the difference?

It is the plausible range for the true difference between the two conversion rates. If the interval crosses zero, the result is inconclusive; if it lies entirely above or below zero, the difference is statistically significant at that confidence level.
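A 95% interval on the difference can be sketched with the simple Wald formula (difference ± 1.96 unpooled standard errors). The page does not say which interval method the tool uses, so this is an assumption, and `diff_confidence_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(n_a, c_a, n_b, c_b, level=0.95):
    """Wald interval for rate_b - rate_a using the unpooled standard error."""
    rate_a, rate_b = c_a / n_a, c_b / n_b
    diff = rate_b - rate_a
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return diff - z * se, diff + z * se


low, high = diff_confidence_interval(1000, 100, 1000, 130)
if low > 0 or high < 0:
    print(f"[{low:+.2%}, {high:+.2%}] excludes zero: significant at this level")
else:
    print(f"[{low:+.2%}, {high:+.2%}] crosses zero: inconclusive")
```

For the clear-winner example the interval runs from about +0.2 to +5.8 percentage points, entirely above zero. Because this interval uses the unpooled error while the z-test pools both variants, the two can disagree on borderline results.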

How do I calculate the required sample size?

Enter your current baseline conversion rate and the minimum relative lift you want to detect. The planner uses an 80% statistical power standard to estimate the visitors needed per variant before you start the test.
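One common formula for this estimate, assuming a two-tailed test at significance level α and power 1 - β = 0.80 (so z at 1 - β is about 0.84), is shown below; p₁ is the baseline rate and MDE is the relative lift as a fraction. The tool's exact formula is not stated.

```latex
p_2 = p_1\,(1 + \text{MDE}), \qquad \bar{p} = \frac{p_1 + p_2}{2}

n = \frac{\left(z_{1-\alpha/2}\sqrt{2\,\bar{p}\,(1 - \bar{p})} + z_{1-\beta}\sqrt{p_1(1 - p_1) + p_2(1 - p_2)}\right)^2}{(p_2 - p_1)^2}
```

Because n scales with 1/(p₂ - p₁)², halving the detectable lift roughly quadruples the required traffic.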

How many conversions do I need per variant?

Aim for at least 100 conversions per variant before trusting a result, and run the test across full business cycles of one to two weeks to smooth out daily fluctuations.
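This advice can be turned into a quick planning check. Both helpers are hypothetical: `run_length_days` converts a required sample size and your daily traffic into a run length rounded up to whole weeks, and `enough_conversions` checks the 100-conversion minimum. The 5,000 daily visitors and the even traffic split are assumed figures.

```python
import math


def run_length_days(visitors_per_variant, daily_visitors, variants=2, cycle_days=7):
    """Days needed to reach the sample size, rounded up to whole business cycles."""
    daily_per_variant = daily_visitors / variants  # assumes traffic is split evenly
    days = math.ceil(visitors_per_variant / daily_per_variant)
    cycles = max(1, math.ceil(days / cycle_days))  # always at least one full cycle
    return cycles * cycle_days


def enough_conversions(conversions_per_variant, minimum=100):
    """True once every variant has at least `minimum` conversions."""
    return all(c >= minimum for c in conversions_per_variant)


print(run_length_days(31_000, 5_000))  # 14 (two full weeks)
print(enough_conversions([100, 130]))  # True
```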

Is the A/B test calculator free?

Yes. The tool is completely free with no signup, no limits, and no account required. It runs in your browser, so your test data is never uploaded or stored.

A/B Test Significance Calculator | Marketing Manager
