Enter your test inputs
Use either realistic sample sizes (traffic) or paste your actual test numbers. Sliders update the results instantly; then hit "Calculate" for a clean summary.
Compare your Control (A) and Variant (B) conversion rates to estimate absolute lift, relative uplift, incremental conversions, and a simple significance check (two-proportion z-test). Built for fast product decisions, and for sanity-checking "we got a 12% lift!" claims before you ship.
An A/B test compares two versions of the same experience: the Control (A) and the Variant (B). Each group sees one version, and you measure how often users complete the target action (purchase, signup, click, etc.). The simplest way to express performance is the conversion rate: conversions divided by visitors.
If the control conversion rate is p₁ and control visitors are n₁, then expected control conversions are x₁ = n₁ × p₁. Same for the variant: x₂ = n₂ × p₂. In this calculator, you input rates directly, and we compute conversions behind the scenes. (We round them to whole numbers for the significance estimate.)
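For illustration, here is a minimal Python sketch of that arithmetic, using hypothetical variable names (n1, p1, and so on) rather than the calculator's actual internals:

```python
# Hypothetical inputs: 20,000 visitors per group, 3.0% vs 3.3% conversion rates.
n1, p1 = 20_000, 0.030   # control visitors and conversion rate
n2, p2 = 20_000, 0.033   # variant visitors and conversion rate

x1 = round(n1 * p1)      # expected control conversions, rounded to a whole number
x2 = round(n2 * p2)      # expected variant conversions
print(x1, x2)            # 600 660
```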
There are two common ways to talk about improvement: absolute lift (the raw difference in rates, p₂ − p₁, expressed in percentage points) and relative uplift (that difference divided by the control rate, (p₂ − p₁) / p₁, expressed as a percentage of the baseline).
Both are valid. Absolute lift is often more "honest" because it shows the raw gap. Relative uplift is better for comparing across funnels with different baselines. For communication, many teams report both: "+0.3pp (+10%)".
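As a rough sketch, reusing the same hypothetical rates, both numbers fall out of one line each:

```python
p1, p2 = 0.030, 0.033            # control and variant conversion rates (hypothetical)

abs_lift = p2 - p1               # absolute lift: 0.003, i.e. +0.3 percentage points
rel_uplift = (p2 - p1) / p1      # relative uplift: 0.10, i.e. +10% of the baseline
print(f"+{abs_lift * 100:.1f}pp (+{rel_uplift * 100:.0f}%)")   # +0.3pp (+10%)
```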
The question executives care about is: "If we ship B to everyone, what changes?" A simple estimate treats your control rate as the counterfactual: what B traffic would have done without the change.
This is the "direct" uplift value. It doesn't include second-order effects like refunds, churn, or margin. If you want to model those, set "value per conversion" to contribution margin or lifetime value instead of AOV.
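Here is a minimal sketch of that estimate, assuming a per-conversion value of $50 (AOV) and the same hypothetical traffic and rates as above:

```python
n2, p1, p2 = 20_000, 0.030, 0.033    # variant traffic, control rate, variant rate (hypothetical)
value_per_conversion = 50.0          # swap in margin or LTV to model second-order effects

incremental_conversions = n2 * (p2 - p1)                 # roughly 60 extra conversions
incremental_value = incremental_conversions * value_per_conversion
print(round(incremental_conversions), round(incremental_value))   # 60 3000
```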
A/B results are noisy because conversions are probabilistic. The z-test asks: "If there were truly no difference between A and B, how surprising is the observed gap?" The test uses a pooled estimate of the underlying conversion probability: p̂ = (x₁ + x₂) / (n₁ + n₂). The standard error of the difference is SE = √(p̂(1 − p̂)(1/n₁ + 1/n₂)), and the z-score is z = (p₂ − p₁) / SE.
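A minimal Python sketch of that computation (standard library only; the variable names are illustrative):

```python
import math

n1, x1 = 20_000, 600    # control visitors and conversions (hypothetical)
n2, x2 = 20_000, 660    # variant visitors and conversions (hypothetical)

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)                                # pooled conversion probability
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))   # standard error of the difference
z = (p2 - p1) / se
print(round(z, 2))      # about 1.72 for this example
```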
We convert z into a two-sided p-value (the probability of seeing a result this extreme under the null). If the p-value is below α, you can call it "statistically significant" under this model. If not, that doesn't prove the effect is zero; it means you haven't shown strong evidence yet.
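Continuing the sketch, a two-sided p-value can be derived from the z-score with the standard normal CDF; two_sided_p_value is just an illustrative helper, not part of the calculator:

```python
import math

def two_sided_p_value(z: float) -> float:
    """Probability of a |z| at least this large under the null (standard normal)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

alpha = 0.05
p_value = two_sided_p_value(1.72)          # z from the example above
print(round(p_value, 3), p_value < alpha)  # about 0.085 False -> not significant at α = 0.05
```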
Alongside the p-value, we compute an approximate confidence interval for the difference in rates: Δ ± z* × √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂). The interval shows a plausible range for the true lift. If the interval crosses 0, the result is not significant at that confidence level.
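And a sketch of the interval itself, again with the same hypothetical numbers and the 95% two-sided critical value z* ≈ 1.96:

```python
import math

n1, p1 = 20_000, 0.030   # control visitors and rate (hypothetical)
n2, p2 = 20_000, 0.033   # variant visitors and rate (hypothetical)
z_star = 1.96            # two-sided critical value for 95% confidence

delta = p2 - p1
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)    # unpooled SE for the interval
low, high = delta - z_star * se, delta + z_star * se
print(f"[{low * 100:+.2f}pp, {high * 100:+.2f}pp]")        # about [-0.04pp, +0.64pp]: crosses 0
```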
The best teams don't ship because a dashboard is green; they ship because the evidence is strong and the change makes sense. Use this simple checklist:
If your p-value is close to α (say 0.06 at α = 0.05), don't panic. Consider running longer, increasing the sample, or repeating the test. In growth work, replicability matters as much as a single "win."
These examples show why sample size matters. Copy them into the sliders above and see how the p-value and confidence interval change.
Absolute lift is +0.3pp, relative uplift is +10%, and incremental conversions are about 60 (20,000 × 0.003). At $50 each, that's about $3,000 in incremental value for the test sample alone, and much more if the effect persists when rolled out.
Relative uplift is +30%, which sounds amazing, but with only 1,000 users per group, the uncertainty is huge. The confidence interval can easily include zero or even negative outcomes. The right move is usually to run longer.
The absolute lift is only +0.03pp, which feels tiny. But at scale, it can be meaningful, and because traffic is huge, it may still become statistically significant. Micro-lifts are common in mature funnels.
It's a strong quick check, but not a full experiment platform. It uses a standard two-proportion z-test. For very low conversion rates, sequential testing, or multiple variants, you may want a more advanced approach.
α is the threshold used to decide significance. Confidence is 1 − α. For example, α = 0.05 corresponds to 95% confidence.
It means you don't have strong evidence of a difference yet under this test. The true effect might still be positive; your sample may just be too small (low power).
Two-sided is safer and is the default here. One-sided can be appropriate if you truly only care about improvement and you commit to that plan ahead of time. Many teams still stick with two-sided to avoid overconfidence.
Use average order value (AOV) for purchases. For signups, use expected LTV per signup, or even a conservative "qualified lead value." It's okay to use a rough value; you're looking for order-of-magnitude clarity.
Long enough to cover weekly seasonality and reach an adequate sample size. Many ecommerce tests need at least 1–2 weeks. This calculator helps you see how p-values and intervals tighten as sample size increases.
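To get a feel for that tightening, here is a small sketch (same hypothetical rates as the earlier examples) showing how the 95% margin of error on the lift shrinks as per-group sample size grows:

```python
import math

p1, p2 = 0.030, 0.033   # hypothetical control and variant rates
for n in (1_000, 5_000, 20_000, 100_000):
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)   # SE of the difference at this sample size
    print(f"n = {n:>7,} per group -> margin of error ±{1.96 * se * 100:.2f}pp")
```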
Internal links help you compare decisions quickly (and help users discover the next tool).
If youâre doing many tests at once, be careful with false positives. Consider adjusting your decision thresholds or using a formal experimentation platform.
MaximCalculator builds fast, human-friendly tools. Treat results as estimates and validate with your analytics stack.