How good is Driblab’s Expected Goals (xG) model?

Category: Team Analysis

We analysed more than 44,000 shots to test whether Driblab's Expected Goals (xG) model values shots accurately. The results confirm that our model is well balanced.


Short answer: Very good.

Long answer: 

It is not unusual to see discrepancies between the xG values of different providers. Expected goals have a clear definition in theory, but putting it into practice is difficult. A new season is about to start, and has already started in many leagues, so we want to test our xG model. The main tool of football analytics needs to be accurate in order to assess teams and players properly.

But how can an xG model be tested? A shot with 0.22 xG means that if an average player took the same shot in the same circumstances a hundred times, he would score about 22 goals. But this shot happens only once. Does our model give it too much value? Or do we underestimate the chance? The best way to assess the model is to look not at one shot but at 44,406 shots. This is the number of shots (excluding penalty kicks and own goals) that were taken in Europe’s top five leagues, the Copa América and the Euro.

One may think that it is impossible to miss that shot. Our model gives it a value of 0.85 xG. When our team’s striker misses a 0.85 xG chance, we think that even we could have scored it. But the missing 0.15 xG tells us that roughly one out of every seven of these shots is missed, and this is the one. Throughout the seasons, we recorded six other shots with that much xG, and all of them were scored. This is, obviously, a coincidence, but it still works to illustrate the meaning of Expected Goals.
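The arithmetic behind that intuition can be checked in a few lines. The sketch below treats every 0.85 xG shot as an independent event with the same scoring probability, which is a simplifying assumption; the function names are illustrative only.

```python
def miss_rate(xg: float) -> float:
    """Probability that a single shot with this xG is not scored."""
    return 1.0 - xg


def prob_all_scored(xg: float, n_shots: int) -> float:
    """Probability that n independent shots with the same xG are all scored."""
    return xg ** n_shots


xg = 0.85
print(f"Miss rate: {miss_rate(xg):.2f} (about 1 in {1 / miss_rate(xg):.1f})")
print(f"Chance that 6 such shots are all scored: {prob_all_scored(xg, 6):.3f}")
```

Under these assumptions, a 0.85 xG shot is missed about once in every 6.7 attempts, and six such shots are all scored roughly 38% of the time.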

During the Euro, our model expected a total of 124.75 non-penalty goals, and 122 non-penalty goals were actually scored. After all, only a few national teams took part and not many matches were played. Taking the seven competitions we mentioned together, we have 1,905 matches in which a total of 4,485.8 non-penalty goals were expected and 4,581 were scored. This underestimation of 2.07%, which is not statistically significant, shows that our model is well balanced. For comparison, in the previous season in these leagues we had an overestimation of 0.16%.
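The balance check on the seven competitions is simple arithmetic on the two totals. The sketch below reproduces it and adds a rough significance check. That check is an assumption made for this sketch, not necessarily the test applied to the model: it approximates the total number of goals as a Poisson variable whose mean is the expected total. The function names are illustrative.

```python
import math


def relative_deviation(expected: float, scored: int) -> float:
    """Share of the scored goals the model did not predict (positive = underestimation)."""
    return (scored - expected) / scored


def poisson_z_score(expected: float, scored: int) -> float:
    """Rough z-score, assuming the goal total is Poisson with mean equal to `expected`."""
    return (scored - expected) / math.sqrt(expected)


expected_goals = 4485.8  # non-penalty xG over the seven competitions
goals = 4581             # non-penalty goals actually scored

dev = relative_deviation(expected_goals, goals)
z = poisson_z_score(expected_goals, goals)
print(f"Deviation: {dev:.2%}")
print(f"Approximate z-score: {z:.2f}")
```

With these totals the deviation comes out at about 2.1% and the z-score at about 1.4, below the 1.96 threshold commonly used for significance at the 5% level. The Poisson approximation ignores the per-shot xG values, which would be needed for an exact variance, so the figure is only indicative.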

But this does not tell the whole story. We might be overestimating low-value shots and underestimating high-value ones, or not accounting for all shot zones or shot types. As an example, we will look at shots of different values. The graphic above shows our predictions against reality. We put the shots in bins of 0.01 xG. That is, a predicted probability of 0.04 xG covers shots between 0.035 and 0.045 xG. The size of each circle represents the number of shots (more shots mean less deviation), and we show what share of these shots were actually converted. An R-squared of 0.968 mathematically confirms our intuition: Driblab’s Expected Goals model is well weighted.
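A calibration check of this kind can be sketched as follows. This is a minimal illustration, not the code behind the graphic: the shots are synthetic, the minimum of 30 shots per bin is an arbitrary choice, and R-squared is computed here against the perfect-calibration line (observed rate equal to predicted xG) without weighting, which may differ from how the 0.968 figure was obtained.

```python
import math
import random


def calibration_table(xg_values, outcomes, width=0.01):
    """Bin shots by predicted xG; return (bin_centre, n_shots, conversion_rate) rows."""
    counts, goals = {}, {}
    for xg, scored in zip(xg_values, outcomes):
        idx = math.floor(xg / width + 0.5)  # nearest bin: 0.035-0.045 -> 0.04
        counts[idx] = counts.get(idx, 0) + 1
        goals[idx] = goals.get(idx, 0) + scored
    return [(round(idx * width, 6), counts[idx], goals[idx] / counts[idx])
            for idx in sorted(counts)]


def r_squared(predicted, observed):
    """Coefficient of determination of observed rates against the y = x line."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic, well-calibrated shots (NOT real data): each shot is scored
# with probability equal to its xG.
rng = random.Random(0)
xg_values = [min(rng.betavariate(1, 8), 0.99) for _ in range(20000)]
outcomes = [1 if rng.random() < xg else 0 for xg in xg_values]

# Keep only bins with enough shots for the conversion rate to be meaningful.
rows = [row for row in calibration_table(xg_values, outcomes) if row[1] >= 30]
centres = [centre for centre, _, _ in rows]
rates = [rate for _, _, rate in rows]
print(f"{len(rows)} bins, R-squared = {r_squared(centres, rates):.3f}")
```

Sparsely populated bins at high xG are noisy, which is why the sketch drops them before computing R-squared.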

For big chances (xG over 0.33), only a small number of shots are recorded, so convergence may not happen. Here is a histogram of the number of shots taken and goals scored. We can see that most shots are worth less than 0.05 xG, and that the number of big chances decreases drastically.

Expected goals also allow us to evaluate team performance. In this case, even though there is an obvious correlation between expected and actual goals, some teams have scored more than expected, and others, such as Brighton, have clearly underperformed. Being efficient might hand you the league title (see Lille). And even though underperforming in the domestic league might leave you in fourth place, Chelsea showed their true potential in the Champions League.

Driblab’s Expected Goals model is the basis for many other metrics, so we have worked to make it very accurate. Here we have shown some of the internal analysis the model undergoes in order to reduce any possible bias. A metric that allows us to assess players, teams and even leagues is of the utmost importance to us. And the next time your team’s striker misses a golden chance, remember that even 0.97 xG shots are missed every now and then.

We are Driblab, a consultancy specialized in football analytics and big data; our work is focused on advising and minimizing risk in professional football decision-making in areas related to talent detection and footballer evaluations. Our database has more than 180,000 players from more than 180 competitions, covering information from all over the world. Here you can learn more about how we work and what we offer.

Author: Joan Hernanz

