How good is Driblab’s Expected Goals (xG) model?

Category: Team Analysis

We analysed more than 44,000 shots to test whether Driblab's Expected Goals (xG) model values shots accurately. The result confirms that our model is well balanced.

Published: 03/08/2021

Short answer: Very good.

Long answer: 

It is not unusual to see discrepancies between the xG values of different providers. Expected goals have a clear definition in theory, but it is difficult to put into practice. A new season is about to start, and has already started in many leagues, so we want to test our xG model. The main tool of football analytics needs to be accurate in order to assess teams and players properly.

But how can an xG model be tested? A shot with 0.22 xG means that, if an average player took that shot in the same circumstances a hundred times, he would score about 22 goals. But this shot happens only once. Does our model give it too much value? Or do we underestimate the chance? The best way to assess the model is by looking not at one shot, but at 44,406 shots. This is the number of shots (excluding penalty kicks and own goals) that were taken in Europe’s top 5 leagues, the Copa América and the Euro.

One may think that it is impossible to miss that shot. Our model gives it a value of 0.85 xG. When our team’s striker misses a 0.85 xG chance, we think that even we could have scored it. But the missing 0.15 xG tells us that roughly one out of every seven of these shots is missed, and this is the one. Over the seasons, we recorded six other shots with that much xG, and all were scored. This is, obviously, a coincidence, but it still serves to illustrate the meaning of Expected Goals.
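To put a rough number on that coincidence, here is a minimal Python sketch. It assumes that every one of these shots has exactly the same 0.85 scoring probability and that shots are independent of each other, which is a simplification; the function name is ours and only illustrative.

```python
from math import comb


def prob_exact_goals(shots: int, goals: int, xg: float) -> float:
    """Binomial probability of scoring exactly `goals` from `shots` identical chances."""
    return comb(shots, goals) * xg**goals * (1 - xg) ** (shots - goals)


# Six shots at 0.85 xG, all of them scored.
all_six_scored = prob_exact_goals(6, 6, 0.85)

# Seven shots at 0.85 xG, exactly one of them missed.
one_miss_in_seven = prob_exact_goals(7, 6, 0.85)

print(f"All six scored:    {all_six_scored:.3f}")
print(f"One miss in seven: {one_miss_in_seven:.3f}")
```

Under these assumptions, six goals out of six comes up a little more than a third of the time, so a perfect run over such a small sample is not surprising.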

During the Euro, our model expected a total of 124.75 non-penalty goals, and 122 were actually scored. However, only a few national teams took part and relatively few matches were played. Taking the seven competitions we mentioned, we have 1,905 matches in which a total of 4,485.8 non-penalty goals were expected and 4,581 were scored. An underestimation of 2.07%, which is not statistically significant, shows that our model is well balanced. For comparison, in the previous season in these leagues we had an overestimation of 0.16%.
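The arithmetic behind these percentages can be sketched in a few lines of Python. Dividing the gap by the goals actually scored is our assumption about the denominator (dividing by expected goals gives a very similar figure), and the function name is illustrative.

```python
def relative_gap(expected_goals: float, actual_goals: float) -> float:
    """Signed gap relative to goals scored.

    Positive means the model underestimated, negative means it overestimated.
    """
    return (actual_goals - expected_goals) / actual_goals


# Totals quoted in the text (non-penalty goals only).
euro = relative_gap(124.75, 122)
all_competitions = relative_gap(4485.8, 4581)

print(f"Euro:                   {euro:+.2%}")
print(f"All seven competitions: {all_competitions:+.2%}")
```

With this convention the Euro comes out as a small overestimation and the seven competitions together as an underestimation of roughly 2.1%, in line with the figure quoted above.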

But this does not tell the whole story. We might be overestimating low-value shots and underestimating high-value ones, or failing to account for all shot zones or shot types. As an example, we’ll take a look at shots of different values. The graphic above shows our prediction against reality. We put the shots in bins of 0.01 xG. That is, a predicted probability of 0.04 xG covers shots between 0.035 and 0.045 xG. The size of each circle represents the number of shots (more shots mean less deviation), and we show what share of these shots were actually converted. An R-squared of 0.968 mathematically confirms our intuition: Driblab’s Expected Goals model is well calibrated.
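The binning described above can be sketched in plain Python. This is not our production code: the function names are illustrative, the four sample shots are made up, and since the exact formula behind the 0.968 is not spelled out here, the `r_squared` helper uses one common definition (observed conversion rate against the ideal line where prediction equals reality) as an assumption.

```python
from collections import defaultdict


def calibration_by_bin(xg_values, scored_flags, width=0.01):
    """Group shots into xG bins centred on multiples of `width`.

    Returns a list of (bin_centre, shots, goals, conversion_rate) tuples,
    sorted by bin centre.
    """
    shots = defaultdict(int)
    goals = defaultdict(int)
    for xg, scored in zip(xg_values, scored_flags):
        index = round(xg / width)  # e.g. 0.036 and 0.044 both land in the 0.04 bin
        shots[index] += 1
        goals[index] += int(scored)
    return [
        (index * width, shots[index], goals[index], goals[index] / shots[index])
        for index in sorted(shots)
    ]


def r_squared(predicted, observed):
    """R-squared of observed rates against the ideal line observed == predicted.

    Assumes the observed rates are not all identical.
    """
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Made-up sample: three shots around 0.04 xG and one big chance.
sample_xg = [0.036, 0.040, 0.044, 0.85]
sample_scored = [False, True, False, True]

for centre, n_shots, n_goals, rate in calibration_by_bin(sample_xg, sample_scored):
    print(f"{centre:.2f} xG: {n_goals}/{n_shots} scored ({rate:.2f})")
```

With real data, the bin centres and conversion rates returned by `calibration_by_bin` would be the two series passed to `r_squared`.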

For big chances (xG over 0.33), only a small number of shots are recorded, so convergence may not happen. Here is a histogram of the number of shots taken and goals scored. We can see that most shots are worth less than 0.05 xG, and that the number of big chances decreases drastically.
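Why do bins with few shots struggle to converge? A quick way to see it is the standard error of a conversion rate, sketched below. The shot counts are hypothetical and chosen only to contrast a crowded low-xG bin with a sparse big-chance bin; shots are assumed to be independent.

```python
from math import sqrt


def conversion_std_error(xg: float, shots: int) -> float:
    """Standard error of the observed conversion rate for `shots` chances of value `xg`."""
    return sqrt(xg * (1 - xg) / shots)


# Hypothetical bin sizes, for illustration only.
crowded_bin = conversion_std_error(0.04, 5000)  # many low-value shots
sparse_bin = conversion_std_error(0.50, 25)     # very few big chances

print(f"0.04 xG bin, 5000 shots: +/- {crowded_bin:.3f}")
print(f"0.50 xG bin,   25 shots: +/- {sparse_bin:.3f}")
```

The sparse bin's observed rate can easily sit several percentage points away from the prediction even when the model is right, which is why the big-chance end of the chart looks noisier.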

Expected goals also allow us to evaluate team performance. In this case, even though there is an obvious correlation between expected and actual goals, some teams have scored more than expected, and others, such as Brighton, have clearly underperformed. Being efficient might hand you the league title (see Lille). And even though underperforming in the domestic league might leave you in fourth place, Chelsea showed their true potential in the Champions League.
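Over- and underperformance is simply goals scored minus expected goals. A minimal sketch follows; the team names and numbers are placeholders invented for the example, not real season data, and the function name is illustrative.

```python
def goals_minus_xg(teams: dict[str, tuple[int, float]]) -> list[tuple[str, float]]:
    """Return (team, goals - xG), from biggest overperformer to biggest underperformer."""
    differences = ((name, goals - xg) for name, (goals, xg) in teams.items())
    return sorted(differences, key=lambda item: item[1], reverse=True)


# Placeholder values: (non-penalty goals scored, non-penalty xG).
sample_teams = {
    "Team A": (60, 52.3),
    "Team B": (40, 51.0),
    "Team C": (45, 44.6),
}

ranking = goals_minus_xg(sample_teams)
for name, difference in ranking:
    print(f"{name}: {difference:+.1f}")
```

A positive value means a team scored more than the chances it created would suggest, and a negative value means it left goals on the table.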

Driblab’s Expected Goals model underpins many other metrics, so we have worked to make it very accurate. Here we have shown some of the internal analysis the model undergoes in order to reduce any possible bias. A metric that allows us to assess players, teams and even leagues is of the utmost importance to us. And the next time your team’s striker misses a golden chance, remember that even 0.97 xG shots are missed every now and then.

We are Driblab, a consultancy specialized in football analytics and big data; our work is focused on advising and minimizing risk in professional football decision-making in areas related to talent detection and footballer evaluations. Our database has more than 180,000 players from more than 180 competitions, covering information from all over the world. Here you can learn more about how we work and what we offer.

Author: Joan Hernanz
