Reading Experiment Results

Once your experiment has been running for a while, FlagPal gives you a detailed results view. Here's how to understand everything you see.


  1. Click Experiments in the left sidebar
  2. Find your experiment in the list and click on it (or click View)
  3. You'll be taken to the experiment's results page

The Results Overview

The results page has two main sections: the Result Summary Table and a Result comparison over time chart.


The Result Summary Table

This table shows how each variant performed against each metric you attached to the experiment. It looks like this:

Variant   | Exposure metric | Goal metric | Ratio | Probability to be the best
Variant A | 1010            | 30          | 2.97  | 1%
Variant B | 1017            | 51          | 5.01  | 99% (Results are statistically significant!)

Reading the Table

  • Variant: The name of the variant you're testing.
  • Exposure metric: How many times the selected Metric was recorded for users exposed to the variant. Use the drop-down to select any Metric collected in this Experiment (e.g. clicked a button, visited a page).
  • Goal metric: The number of times the goal Metric was achieved (same as above, you can select any Metric recorded in this Experiment).
  • Ratio: The goal metric divided by the exposure metric for each variant. A ratio lets you compare different kinds of metrics on equal footing: clicks vs. conversions gives a conversion rate, and conversions vs. revenue gives average order value (AOV).
  • Probability to be the best: The probability that this variant is the best performing variant.
  • Results are statistically significant!: Indicates that the difference between variants is not due to random chance.
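The Ratio column is simple division of the two counts. A minimal sketch of the calculation, using the numbers from the table above:

```python
def ratio(goals, exposures):
    """Goal metric divided by exposure metric.

    For Boolean metrics this is a conversion rate (often shown as a
    percentage); for Money metrics it is an average value per exposure."""
    return goals / exposures

# Values from the summary table above:
print(round(100 * ratio(30, 1010), 2))   # Variant A: 2.97
print(round(100 * ratio(51, 1017), 2))   # Variant B: 5.01
```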

Example:

You ran an experiment on a checkout button and measured checkout_started, purchase_completed, and order_value. By selecting different metrics in each drop-down, you can compare the performance of different metrics and gain more insight into the experiment's results.

Let's start by selecting our checkout_started as the Exposure metric and purchase_completed as the Goal metric.

Variant      | Exposure metric | Goal metric | Ratio | Probability to be the best
Blue button  | 1010            | 30          | 2.97  | 1%
Green button | 1017            | 51          | 5.01  | 99% (Results are statistically significant!)

How to read this: Out of 1010 users who saw the blue button, 30 made a purchase (2.97%). Out of 1017 users who saw the green button, 51 made a purchase (5.01%). The green button shows a much higher conversion rate, and your results are statistically significant.
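FlagPal doesn't document the exact statistical model behind the Probability to be the best column, but one common way such a number is produced is a Bayesian comparison: sample each variant's plausible true conversion rate from a Beta posterior many times and count how often one beats the other. A sketch of that idea (an illustrative model, not necessarily FlagPal's):

```python
import random

def prob_b_beats_a(a_exposures, a_goals, b_exposures, b_goals,
                   draws=20_000, seed=1):
    """Monte Carlo estimate of P(variant B's true rate > variant A's),
    using a Beta(1 + goals, 1 + misses) posterior for each variant.
    Illustrative only; FlagPal's internal method may differ."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + a_goals, 1 + a_exposures - a_goals)
        rate_b = rng.betavariate(1 + b_goals, 1 + b_exposures - b_goals)
        wins += rate_b > rate_a
    return wins / draws

p = prob_b_beats_a(1010, 30, 1017, 51)
print(f"Green button wins in about {p:.0%} of simulations")
```

With the counts above, the simulation lands around 99%, matching the intuition that a 5.01% rate over 1000+ users is very unlikely to beat 2.97% by chance alone.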

Now let's change our Exposure metric to purchase_completed and the Goal metric to order_value:

Variant      | Exposure metric | Goal metric | Ratio | Probability to be the best
Blue button  | 30              | 750         | 25    | N/A
Green button | 51              | 765         | 15    | N/A (Results are statistically significant!)

How to read this: Users who saw the green button generated an average of $15 per purchase, vs $25 for the blue button. (Note: the average is across all purchases in the variant.) Even though the green button's AOV is lower than the blue button's, its much higher conversion rate makes it the more effective option for driving revenue.

Lastly, let's change our Exposure metric to checkout_started and keep the Goal metric as order_value:

Variant      | Exposure metric | Goal metric | Ratio | Probability to be the best
Blue button  | 1010            | 750         | 0.74  | N/A
Green button | 1017            | 765         | 0.75  | N/A (Results are statistically significant!)

How to read this: Users who saw the green button generated an average of $0.75 per session, vs $0.74 for the blue button. (Note: the average is across all users in the variant, even if they didn't make a purchase.)
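All three views above are derived from the same raw counts; only the metric pairing changes. A short sketch that recomputes each table's Ratio column from the example numbers:

```python
# Raw counts per variant from the example experiment (sessions, purchases,
# and total revenue in dollars):
blue  = {"checkout_started": 1010, "purchase_completed": 30, "order_value": 750}
green = {"checkout_started": 1017, "purchase_completed": 51, "order_value": 765}

def view(variant, exposure_metric, goal_metric):
    """Recompute the Ratio column for a given Exposure/Goal pairing."""
    return variant[goal_metric] / variant[exposure_metric]

# Conversion rate (purchases per checkout session):
print(round(100 * view(green, "checkout_started", "purchase_completed"), 2))  # 5.01
# Average order value (revenue per purchase):
print(view(blue, "purchase_completed", "order_value"))                        # 25.0
# Revenue per session (revenue per checkout started):
print(round(view(green, "checkout_started", "order_value"), 2))               # 0.75
```

Switching the drop-downs never changes the underlying data, only which ratio you are looking at.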


The Charts

Below the summary table, FlagPal shows a chart of your metric data over time.

What the Chart Shows

The chart plots the ratio of your selected Exposure and Goal metrics over time, with a separate line per variant. This lets you see:

  • Trends: Is one variant consistently outperforming the other, or did it spike and drop?
  • Stability: Have the results leveled off, or are they still fluctuating?
  • Anomalies: Was there a sudden change on a particular day that might explain a result?

Reading the Chart

  • The X-axis is time (days the experiment has been running)
  • The Y-axis is the metric ratio (conversion rate, average order value, etc.)
  • Each line represents one variant

A healthy chart shows two relatively smooth lines that diverge over time. The more stable the gap between them, the more confident you can be in the result.

A noisy chart with lines crossing back and forth usually means you don't have enough data yet. Wait for more users before drawing conclusions.


Interpreting Results

When to Declare a Winner

You can be more confident in your results when:

  1. Impressions are substantial — each variant has at least several hundred users, ideally 500+
  2. The experiment has run for at least a week — this accounts for day-of-week variation in user behavior
  3. The gap is consistent over time — the winning variant has been ahead for multiple days in a row, not just briefly
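The three conditions above can be expressed as a simple checklist. This is a heuristic sketch using this guide's rules of thumb (the 500-user, 7-day, and 3-days-ahead thresholds are assumptions, not a built-in FlagPal feature):

```python
def ready_to_call(per_variant_users, days_running, days_winner_ahead):
    """Heuristic readiness gate: only trust a result once exposure,
    duration, and consistency all hold. Thresholds are rules of thumb
    from this guide, not a FlagPal feature."""
    return (min(per_variant_users) >= 500
            and days_running >= 7
            and days_winner_ahead >= 3)

print(ready_to_call([1010, 1017], days_running=14, days_winner_ahead=5))  # True
print(ready_to_call([120, 130], days_running=3, days_winner_ahead=1))     # False
```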

What "Winner" Means

The variant that performs better on your primary metric (the one defined in your hypothesis) is the winner. For a Boolean metric, that's the higher conversion rate; for a Money metric, the higher average revenue per user.

When There's No Clear Winner

Sometimes experiments produce inconclusive results — both variants perform about the same. This is also valuable information! It means:

  • The change you made doesn't matter much to users
  • You can choose either option based on other factors (cost, simplicity, design preference)
  • You might need to test a more significant change

When a Variant Is Losing

If a new variant is clearly underperforming the control, stop the experiment early and keep the control. There's no point in exposing more users to a worse experience just to collect more data.


Acting on Results

The Variant Won — What Now?

  1. Deactivate the experiment (toggle it off)
  2. Create an Experience that sets the winning variant's feature values for all users
  3. Activate the Experience

Now 100% of users get the winner, and you can move on to the next experiment.

The Control Won (or There's No Difference)

  1. Deactivate the experiment
  2. No new Experience needed — the default behaviour (no experiment) already delivers the control values
  3. Document the result in the experiment's Description field

Document Before You Move On

Before closing an experiment, always record the outcome. Edit the experiment's Description to include:

Result: [winner] won with [X%] vs [Y%] for [other variant].
Rolled out on [date]. / No rollout — keeping control.

This creates a historical record that helps your team learn from experiments over time.


Common Questions

Why does the same user sometimes seem to be counted multiple times?
If a metric event fires multiple times for the same user when it shouldn't (e.g. entering the experiment should only happen once), review your Targeting Rules: check that the flag you're changing in variants is added as a rule with an (empty) value. See Targeting Rules for more details. If your Goal metrics are counted multiple times, review your application's logic. It can be completely normal for a goal metric to be recorded repeatedly (a customer can make many purchases after entering an Experiment), but it can also point to a bug in your application.

Why do my numbers not add up perfectly?
Even when variants are split 50/50, the counts will rarely match exactly. Forcing every enrollment into perfect balance would be impractical, so FlagPal uses weighted randomization: each variant is equally represented over time, but individual counts drift slightly.
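A quick simulation makes this concrete. Randomly assigning users 50/50 almost never produces identical counts (this is an illustration of randomization in general; FlagPal's actual bucketing mechanism may differ):

```python
import random

# Simulate 2,027 users being randomly assigned 50/50 to two variants:
rng = random.Random(7)
counts = {"A": 0, "B": 0}
for _ in range(2027):
    counts[rng.choice(["A", "B"])] += 1

print(counts)  # close to, but not exactly, a 50/50 split
```

With an odd number of enrollments the counts cannot be equal at all, and even with an even number the gap only has to stay small relative to the total.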

The experiment has been running for two weeks and still no clear winner. Should I keep going?
If the difference between variants is very small (< 1%) and shows no sign of growing, the experiment is likely inconclusive. Accept the null result, document it, and stop the experiment.