HOW IT WORKS

Our Methodology

We believe bettors deserve to understand where the numbers come from. This page explains exactly how we model match probabilities, generate confidence scores, and identify market edges — without giving away the implementation details that make our model work. Our methodology is designed to be sport-agnostic. FIFA World Cup 2026 is our inaugural application; the framework extends to any sport where team strength and historical match data are available.

The Model: Dixon-Coles

Our predictions are built on the Dixon-Coles model, first published by Mark Dixon and Stuart Coles in 1997. It models the number of goals each team scores as a Poisson process driven by that team's attack strength and the opponent's defense strength, producing a full probability distribution over all possible scorelines, from which we derive home win, draw, and away win probabilities.

The key improvement Dixon-Coles made over simple Poisson models is a low-score correction factor — it adjusts the probability of 0-0, 1-0, 0-1, and 1-1 scorelines, which naive Poisson consistently misprices. In tournaments where defensive tactics dominate, this correction matters significantly.
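To make the mechanics concrete, here is a minimal sketch of how the scoreline grid becomes match probabilities under the Dixon-Coles correction. The lambda, mu, and rho values are illustrative placeholders, not our fitted parameters.

```python
import math

def tau(x, y, lam, mu, rho):
    """Dixon-Coles dependence correction, applied only to the four
    low-score lines (0-0, 0-1, 1-0, 1-1); elsewhere goals are independent."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def outcome_probs(lam, mu, rho, max_goals=10):
    """Sum the corrected scoreline grid into (home win, draw, away win).
    lam/mu are expected home/away goals; rho is the low-score correction."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away

# Illustrative values only, not fitted parameters
h, d, a = outcome_probs(lam=1.6, mu=1.1, rho=-0.1)
```

The correction terms are constructed so they cancel in aggregate: the grid still sums to 1, but probability mass is shifted among the four low-scoring results.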

WHY THIS MODEL

Dixon-Coles is well-understood, peer-reviewed, and empirically validated across decades of international match data. We chose it because it is accurate, explainable, and appropriate for the data volumes available in international sports — not because it is fashionable. Neural network approaches require far more data than any single sport's international tournament history provides.

The Data

For WC2026, our model is trained on StatsBomb open event data covering four major international tournaments: FIFA World Cup 2018, FIFA World Cup 2022, UEFA Euro 2020/21, and Copa América 2024. For future sports, we will use the best available play-by-play or event data for that sport.

262 MATCHES · 6,619 SHOTS TRACKED · 59 NATIONAL TEAMS · 4 TOURNAMENTS

For each team we compute attack and defense strength ratings using expected goals (xG) rather than raw goals scored. xG measures shot quality, not just outcomes, which makes it a more stable signal of team ability, less affected by the randomness of whether individual shots happened to go in. xG is a soccer-specific metric; for other sports we would use the equivalent efficiency statistic (points per possession, expected points added, and so on).
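A minimal sketch of how xG-based strength ratings can be derived. The normalization against the tournament-wide average xG and the team codes are illustrative assumptions, not our exact recipe.

```python
from collections import defaultdict

def strength_ratings(rows):
    """rows: (team, xg_for, xg_against) tuples, one per team per match.
    Returns {team: (attack, defense)} as ratios to the overall average xG,
    so 1.0 is average; attack above 1.0 is good, defense below 1.0 is good."""
    xg_for = defaultdict(list)
    xg_against = defaultdict(list)
    for team, xf, xa in rows:
        xg_for[team].append(xf)
        xg_against[team].append(xa)
    n_rows = sum(len(v) for v in xg_for.values())
    baseline = sum(x for v in xg_for.values() for x in v) / n_rows
    return {
        t: (sum(xg_for[t]) / len(xg_for[t]) / baseline,
            sum(xg_against[t]) / len(xg_against[t]) / baseline)
        for t in xg_for
    }

# Toy data: a strong side and a weak side over two matches each
ratings = strength_ratings([
    ("ARG", 2.0, 0.5), ("ARG", 1.8, 0.7),
    ("PAN", 0.9, 1.5), ("PAN", 1.1, 1.3),
])
```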

For the six teams whose WC2026 qualification was not confirmed at model training time, we substitute a ranking-based prior derived from FIFA world rankings. These fixtures are clearly marked with a TBD indicator in the dashboard and their confidence scores reflect the additional uncertainty.

The Pipeline

Raw data flows through a five-stage pipeline before becoming the probabilities you see in the dashboard.

01 INGEST: Pull StatsBomb match and shot data for all four tournaments
02 FEATURES: Compute xG-based attack and defense strength per team per tournament
03 TRAIN: Fit Dixon-Coles parameters via maximum likelihood estimation
04 VALIDATE: Backtest on historical holdout data (RPS 0.183, accuracy 62.5%)
05 PREDICT: Generate probabilities for all 72 WC2026 group stage fixtures
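The training step can be sketched as a negative log-likelihood over the independent-Poisson core of the model. The multiplicative attack-times-defense parameterization, the toy match data, and the omission of the low-score correction are simplifications for illustration, not our production setup; a real fit would minimize this function over all team parameters with a numerical optimizer such as scipy.optimize.minimize.

```python
import math

def poisson_logpmf(k, lam):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def neg_log_likelihood(ratings, matches):
    """ratings: {team: (attack, defense)}; matches: (home, away, hg, ag).
    Expected goals are modelled multiplicatively as attack x opposing
    defense; an optimizer would minimize this over all parameters."""
    nll = 0.0
    for home, away, hg, ag in matches:
        lam = ratings[home][0] * ratings[away][1]  # expected home goals
        mu = ratings[away][0] * ratings[home][1]   # expected away goals
        nll -= poisson_logpmf(hg, lam) + poisson_logpmf(ag, mu)
    return nll

# Toy results where A is clearly stronger than B
matches = [("A", "B", 3, 0), ("A", "B", 2, 1)]
fitted = {"A": (2.0, 0.6), "B": (0.8, 1.2)}  # separates the teams
flat = {"A": (1.2, 1.0), "B": (1.2, 1.0)}    # treats them as equal
nll_fitted = neg_log_likelihood(fitted, matches)
nll_flat = neg_log_likelihood(flat, matches)
```

Ratings that separate the two teams assign higher probability to the observed lopsided results, so their negative log-likelihood is lower; maximum likelihood estimation searches for the ratings that drive it as low as possible.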

Validation

We validate the model using two standard metrics for probabilistic soccer prediction:

Ranked Probability Score (RPS) measures how well the model's probability distributions match actual outcomes across home win, draw, and away win. Lower is better. Our model scores 0.183 — consistent with published benchmarks for well-calibrated Dixon-Coles implementations on international data.

Result accuracy measures how often the outcome the model rated most likely actually occurred on our holdout data. Our model achieves 62.5%, meaningfully above the naive baseline of always predicting the historically most frequent outcome (~47% on international soccer data).
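For concreteness, the RPS for a single three-outcome fixture compares cumulative predicted probabilities against the cumulative observed outcome, so near-misses on the ordered scale (home, draw, away) are penalized less than distant ones:

```python
def rps(probs, outcome):
    """Ranked Probability Score for an ordered (home, draw, away)
    distribution. outcome: index of the observed result (0/1/2).
    Lower is better; 0 means a perfectly confident correct forecast."""
    cum_p = cum_o = score = 0.0
    for i, p in enumerate(probs):
        cum_p += p
        cum_o += 1.0 if i == outcome else 0.0
        score += (cum_p - cum_o) ** 2
    return score / (len(probs) - 1)

perfect = rps([1.0, 0.0, 0.0], 0)        # certain and correct -> 0
uniform = rps([1/3, 1/3, 1/3], 0)        # no information -> 5/18
```

A headline figure like ours is simply this per-fixture score averaged over every holdout match.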

HONEST CAVEAT

No model predicts soccer with certainty. International soccer has high variance — upsets are frequent, and small sample sizes per team mean ratings carry meaningful uncertainty. Our predictions represent probability estimates, not guarantees. Always bet responsibly.

The Confidence Score

Each fixture displays a signal score from 0 to 10. This is not a prediction of the winner — it is a measure of how actionable the fixture is. A score of 8.5 means the model sees a clear, decisive outcome. A score of 2.1 means the fixture is close to a three-way coinflip.

PRE-TOURNAMENT FORMULA
score = (model_confidence × 0.6) + (xG_spread × 0.4)
model_confidence = 1 − Shannon entropy of the H/D/A probability distribution
xG_spread = min(|home_xG − away_xG| / 2.0, 1.0)
POST-ODDS FORMULA (from late April)
score = (model_confidence × 0.4) + (xG_spread × 0.3) + (market_edge × 0.3)
market_edge = min(|model_prob − market_implied_prob| / 0.15, 1.0)
STRONG (7.0 – 10.0, e.g. 8.4): Decisive model plus a meaningful xG gap or market edge
MODERATE (4.0 – 6.9, e.g. 5.1): Some lean, but meaningful uncertainty remains
WEAK (0 – 3.9, e.g. 2.3): Near-coinflip, with low model confidence and a tight xG spread
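The pre-tournament formula can be sketched as follows. This is a minimal illustration, assuming the Shannon entropy is normalized by log 3 (so it lands in [0, 1]) and the combined score is scaled to the displayed 0-10 range; the example probabilities and xG values are made up.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a discrete distribution, scaled to [0, 1]
    by the entropy of a uniform three-way split (log 3)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def confidence_score(probs, home_xg, away_xg):
    """Pre-tournament signal score, scaled to 0-10 (scaling assumed)."""
    model_confidence = 1 - normalized_entropy(probs)
    xg_spread = min(abs(home_xg - away_xg) / 2.0, 1.0)
    return 10 * (model_confidence * 0.6 + xg_spread * 0.4)

# A decisive fixture vs. a near three-way coinflip (illustrative inputs)
strong = confidence_score([0.75, 0.15, 0.10], home_xg=2.1, away_xg=0.8)
tossup = confidence_score([0.34, 0.33, 0.33], home_xg=1.2, away_xg=1.1)
```

A lopsided probability distribution has low entropy, so model_confidence is high; a near-uniform one pushes the score toward zero regardless of the xG term.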

Market Edge

From late April, when bookmakers post WC2026 odds, we compute an edge signal for each fixture and outcome. Edge measures how much our model disagrees with the market's implied probability after removing the bookmaker's vig (overround).

EDGE FORMULA
edge = model_probability − vig_removed_implied_probability
Positive edge = model thinks the outcome is more likely than the market prices.
Negative edge = market prices the outcome higher than our model does.
Vig removal uses the standard normalization method across all three outcomes.

We average implied probabilities across multiple major bookmakers to produce a consensus market line. This reduces noise from individual book errors or position management and gives a cleaner signal of true market consensus.
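The edge calculation above can be sketched end to end, assuming proportional normalization for vig removal and a simple mean across books for the consensus line; the decimal odds are made-up examples.

```python
def implied_probs(odds):
    """Decimal odds for (home, draw, away) -> vig-removed implied
    probabilities via proportional normalization: divide each raw
    inverse-odds probability by the overround so they sum to 1."""
    raw = [1.0 / o for o in odds]
    overround = sum(raw)
    return [r / overround for r in raw]

def consensus(books):
    """Average vig-removed implied probabilities across bookmakers."""
    per_book = [implied_probs(o) for o in books]
    return [sum(b[i] for b in per_book) / len(per_book) for i in range(3)]

def edge(model_probs, market_probs):
    """Positive where the model rates an outcome above the market."""
    return [m - b for m, b in zip(model_probs, market_probs)]

# Illustrative lines from two hypothetical books, and a model that
# likes the home side more than the market does
books = [(2.00, 3.50, 4.00), (1.95, 3.60, 4.20)]
market = consensus(books)
edges = edge([0.55, 0.25, 0.20], market)
```

Because each book's probabilities are normalized before averaging, a book with a heavier overround does not drag the consensus line.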

WHAT WE DON'T SHOW

We do not publish the specific parameter values from our fitted model (attack ratings, defense ratings, the low-score correction coefficient, or our shrinkage weights). These are the implementation details that represent our investment in this work. The methodology above gives you everything you need to understand and evaluate our numbers — the proprietary part is the execution.

Model Updates

Group stage predictions are generated before the tournament begins and do not change as matches are played. This is a deliberate design choice — pre-tournament predictions reflect our model's assessment of team quality, not a reactive recalibration after surprising results.

Market odds and the edge signal update hourly once bookmakers post lines. The confidence score recalibrates automatically as odds populate — no manual intervention required.

We plan to publish updated predictions for the knockout rounds using group stage results as additional training signal, giving the model more recent data on each team's in-tournament form. For future sports and competitions, prediction update cadence will be adapted to the structure of that competition.