A/B Testing: Definition, How It Works, and Common Mistakes

A/B testing is a controlled experiment in which two versions of a single element are shown to separate audience segments to determine which performs better on a defined metric. Version A is the control (the original); version B is the variant with one change applied. The method is also called split testing. Marketers use A/B testing to make data-driven decisions about headlines, CTAs, email subject lines, landing pages, ad creative, and pricing displays.

How A/B Testing Works

A/B testing follows a fixed sequence. Deviating from any step introduces error that can invalidate the results.

Identify what to test. Choose one element with a direct connection to a measurable outcome, such as a landing page headline, a CTA button label, or an email subject line.
Write a hypothesis. State which change you expect to improve which metric, and by how much. Example: “Changing the CTA from ‘Learn More’ to ‘Start Free’ will increase click-through rate by at least 10%.”
Create the variant. Build version B with a single change. Do not change multiple elements in the same experiment.
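Determine sample size and duration. Use a sample size calculator to find the minimum number of visitors per variant needed to detect the expected effect at 95% statistical confidence and a chosen statistical power (80% is a common default). Then divide the total required sample by your daily traffic to estimate how long the test must run.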
Split traffic randomly. Visitors are assigned to version A or B at random. Random assignment ensures the two groups are comparable.
Wait for the predetermined endpoint. Do not stop the test based on interim results. Run for at least one full business cycle (typically 7 to 14 days) to capture day-of-week behavioral variation.
Analyze and implement. Evaluate statistical significance at the endpoint. Deploy the winning variant. Document the hypothesis, parameters, results, and confidence level for future reference.
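The "Split traffic randomly" step is often implemented by hashing a stable visitor identifier instead of flipping a coin on every page view, so a returning visitor keeps seeing the same version. The sketch below is a minimal Python illustration under that assumption; `assign_variant`, the experiment name, and the user-ID format are hypothetical and do not describe how linkutm or any particular testing tool assigns traffic.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Return 'A' or 'B' for a visitor (hypothetical helper).

    `split` is the share of traffic sent to A. The same user_id and
    experiment always produce the same answer.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Treat the first 8 bytes as a uniformly distributed 64-bit integer
    # and compare it with the split point scaled to that range.
    value = int.from_bytes(digest[:8], "big")
    return "A" if value < split * 2**64 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_variant(f"user-{i}", "cta-start-free")] += 1
    print(counts)  # roughly 5,000 visitors in each group
```

Including the experiment name in the hash key lets the same visitor fall into different groups across unrelated experiments, which keeps separate tests independent of one another.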

linkutm’s A/B test URL generator creates separate UTM-tagged URLs for each variant, allowing GA4 to attribute conversion data by variant directly in the Acquisition report.

A/B Testing vs Multivariate Testing

A/B testing and multivariate testing (MVT) are related but suited to different situations.

	A/B Testing	Multivariate Testing
Variables changed	One	Multiple simultaneously
Traffic required	Moderate	High (split across all combinations)
What it reveals	Which version wins	Which element combination works best
Speed to result	Faster	Slower
Best for	Single-element decisions	Interaction effects between elements

A/B testing is appropriate for most marketing decisions. Multivariate testing becomes useful when you need to understand how multiple page elements interact with each other, and only when traffic volume is large enough to support the increased number of combinations without extending the test to an impractical length.
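The traffic cost of multivariate testing follows from multiplication: every combination of element versions is a separate cell that needs its own sample. The sketch below assumes, for illustration only, that each cell needs the same 10,000 visitors as one arm of an A/B test; real requirements depend on the effect sizes and the analysis method. `total_visitors` is a hypothetical helper.

```python
from math import prod


def total_visitors(versions_per_element, visitors_per_cell):
    """Return (number of cells, total visitors) when every combination is tested."""
    cells = prod(versions_per_element)  # e.g. 3 x 2 x 2 = 12 combinations
    return cells, cells * visitors_per_cell


# A/B test: one element with two versions
print(total_visitors([2], 10_000))        # (2, 20000)
# MVT: 3 headlines x 2 hero images x 2 CTA labels
print(total_visitors([3, 2, 2], 10_000))  # (12, 120000)
```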

What to A/B Test

A/B testing applies to any element where two versions can be served to comparable audience segments and a measurable conversion event exists.

Common marketing applications:

Email campaigns: Subject lines, preview text, sender name, CTA text
Landing pages: Headlines, hero images, form field count, CTA button copy and color
Paid ads: Ad copy, image vs. video, headline variations, audience targeting
Pricing pages: Plan order, price display format, feature bullet emphasis
Push notifications: Message copy, send time, notification frequency

In paid advertising, UTM parameters with distinct utm_content values differentiate A/B test variants without separate landing pages. Two ad creatives pointing to the same page can use utm_content=creative-a and utm_content=creative-b so GA4 segments conversion data by creative automatically.
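A minimal sketch of tagging two creatives that point to the same landing page, using Python's standard `urllib.parse` module. The base URL and the source, medium, and campaign values are placeholders, and `utm_url` is a hypothetical helper, not linkutm's generator.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def utm_url(base_url: str, source: str, medium: str, campaign: str, content: str) -> str:
    """Append UTM parameters to a URL, preserving any existing query string."""
    parts = urlsplit(base_url)
    utm = urlencode({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "utm_content": content,  # the only value that differs between variants
    })
    query = f"{parts.query}&{utm}" if parts.query else utm
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


for creative in ("creative-a", "creative-b"):
    print(utm_url("https://example.com/pricing", "facebook", "paid-social",
                  "spring-sale", creative))
```

Keeping source, medium, and campaign identical across both URLs means the variants differ only in `utm_content`, so any reporting difference can be attributed to the creative.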

Common A/B Testing Mistakes

Testing multiple elements at once. Changing the headline and the CTA button in the same experiment makes it impossible to attribute any result to either change. One element per test.

Stopping a test early. Checking results mid-test and ending it when a variant appears to be winning is called the “peeking problem.” It dramatically inflates the false-positive rate, producing winners that fail to hold up after implementation. Set the endpoint before the test begins and do not move it.
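The peeking problem can be demonstrated with an A/A simulation: both arms share the same true conversion rate, so every declared winner is a false positive. This sketch assumes a 10% conversion rate, 10 interim checks of 200 visitors per arm each, and a two-sided z-test with the usual |z| > 1.96 threshold for 95% confidence; all of these values are illustrative. Stopping at the first significant check typically yields a false-positive rate well above the roughly 5% obtained by evaluating only at the fixed endpoint.

```python
import math
import random


def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using the pooled conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(runs=500, looks=10, visitors_per_look=200, rate=0.10, seed=7):
    """Return (false-positive rate with peeking, false-positive rate at the endpoint)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Both arms draw from the SAME true rate: there is no real difference.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if abs(z_stat(conv_a, n, conv_b, n)) > 1.96:
                significant_at_any_look = True  # a peeker would stop and declare a winner
        peeking_hits += significant_at_any_look
        # A disciplined tester evaluates only once, after the final look.
        fixed_hits += abs(z_stat(conv_a, n, conv_b, n)) > 1.96
    return peeking_hits / runs, fixed_hits / runs


if __name__ == "__main__":
    peek, fixed = simulate_aa()
    print(f"Stop at first significant look: {peek:.1%}")
    print(f"Evaluate only at the endpoint:  {fixed:.1%}")
```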

Ignoring practical significance. A result can reach 95% statistical confidence while delivering negligible business impact. A 0.1 percentage point conversion rate improvement on 400 monthly visitors produces fewer than 5 extra conversions per year. Define the minimum improvement worth implementing before running the test.
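The arithmetic behind that example can be expressed as a small helper that converts an absolute lift (in percentage points) into extra conversions per year. The function name and the second set of inputs are hypothetical.

```python
def extra_conversions_per_year(monthly_visitors, lift_pp):
    """Extra conversions per year from an absolute lift of `lift_pp` percentage points."""
    return monthly_visitors * 12 * (lift_pp / 100)


print(round(extra_conversions_per_year(400, 0.1), 1))     # 4.8
print(round(extra_conversions_per_year(50_000, 0.5), 1))  # 3000.0
```

Comparing this figure with the minimum improvement defined before the test shows whether a statistically significant winner is worth the cost of implementing it.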

Running tests too short. Tests shorter than 7 days often miss behavioral variation across the week. E-commerce sites see different conversion behavior on weekends; B2B sites often see lower engagement on Fridays. The minimum recommended duration is 7 days, and 14 days provides more reliable results.

Not documenting outcomes. Teams that do not record test hypotheses, parameters, and results repeat failed experiments and cannot build incrementally on past learnings.
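One lightweight way to keep records consistent is a fixed structure with one field per item worth documenting: hypothesis, parameters, result, and confidence level. The sketch below is a hypothetical Python dataclass serialized to JSON; the field names and example values are assumptions, not a standard format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ExperimentRecord:
    name: str
    hypothesis: str
    metric: str
    sample_size_per_variant: int
    duration_days: int
    result: str = "pending"              # e.g. "B won", "no significant difference"
    confidence: Optional[float] = None   # e.g. 0.95
    notes: List[str] = field(default_factory=list)


record = ExperimentRecord(
    name="cta-start-free",
    hypothesis="Changing the CTA from 'Learn More' to 'Start Free' increases CTR by at least 10%",
    metric="click-through rate",
    sample_size_per_variant=14_000,
    duration_days=14,
)
print(json.dumps(asdict(record), indent=2))
```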

Frequently Asked Questions

What is A/B testing?

A/B testing is a controlled experiment that compares two versions of one element to determine which performs better on a specific metric. Version A is the original; version B has one change applied. Traffic is divided randomly between the two versions and results are evaluated once the predetermined sample size and duration are reached.

What is the difference between A/B testing and split testing?

They are the same methodology. Split testing is an informal synonym for A/B testing. Some practitioners use “split URL testing” specifically to describe experiments where the two variants live on entirely different URLs rather than being served dynamically on the same URL, but the underlying method is identical.

How many visitors does an A/B test need?

The required sample size depends on four inputs: the baseline conversion rate, the minimum detectable effect (the smallest improvement worth detecting), the confidence level, and the statistical power (the probability of detecting a real effect of that size, commonly set at 80%). A page converting at 3% that you want to improve by 20% relative (to 3.6%) needs roughly 14,000 visitors per variant at 95% confidence and 80% power. Calculating sample size before starting prevents the common mistake of stopping tests too early.
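This estimate can be reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test; `sample_size_per_variant` is a hypothetical helper, and dedicated calculators may use slightly different formulas and return slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors per variant for a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


print(sample_size_per_variant(0.03, 0.20))  # roughly 14,000
```

Because the effect size enters squared in the denominator, halving the minimum detectable effect roughly quadruples the required sample.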

How long should an A/B test run?

At minimum, 7 days. Ideally 14 days. Tests shorter than 7 days risk capturing only a slice of weekly behavioral variation. The test should also continue until the predetermined sample size is reached, regardless of interim results. These two conditions (time duration and sample size) must both be met before calling a result valid.

To check whether observed conversion differences between variants are statistically meaningful, use the conversion rate calculator at linkutm, which computes conversion rates and supports comparison across test variants.
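For reference, a two-sided two-proportion z-test is one common way to evaluate a conversion difference at the predetermined endpoint. This is a minimal sketch, not the method linkutm's calculator uses; the conversion counts are made up, and a p-value below 0.05 corresponds to the 95% confidence threshold used throughout this article.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test. Returns (rate_a, rate_b, relative_lift, p_value)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in conversions; the test is undefined.")
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, (rate_b - rate_a) / rate_a, p_value


rate_a, rate_b, lift, p = two_proportion_z_test(420, 14_000, 510, 14_000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  lift: {lift:+.1%}  p = {p:.4f}")
```

The test is only valid when run once at the planned endpoint; running it repeatedly while data is still arriving reintroduces the peeking problem.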