Do 'Expected' goals statistics suck?

Why not just use the statistic we already have for such events, namely a goalie's save percentage?

Number of shots stopped, divided by number of shots faced.

Simple. Don't need "years of data" to create these new stats for what "might" happen.
Because SV% is heavily influenced by the team more so than GAA is.
 
My opinion on "Advanced Stats" in hockey dramatically changed a few years back when I got paired up with two guys for a round of golf. Turned out both of them worked for different NHL clubs in the analytics department.

One of them had been with his team for about five years and was part of an "early adopter" organization that built out one of the first "analytics" departments in the league. The other had spent a few years with one team and had just changed organizations that summer.

I had a long, fascinating talk about hockey analytics with them over the next few hours, and both of them agreed, despite having careers in the field, that "advanced stats" don't work for hockey.

One of them in particular was adamant that there is absolutely nothing that you can learn from "Advanced Stats" that the organization doesn't already know from their actual scouts. There are no "hidden" attributes and there is zero replacement for actually watching the players play.

One of them said that you could conduct an entire baseball draft using only your analytics team, and you'd probably do pretty well in the draft looking at just data, stats and numbers.

He said in hockey, that is absolutely false, and no team would ever draft a player they hadn't seen play "live" before because there are way too many variables that the stats don't tell you.

One of them freely admitted that his department isn't really a factor, at all, in building their draft board, and that trades were often made without even requesting a report from the analytics department. His GMs (he had worked for two, and it was the same with both) said that the analytics department would just echo what the professional scouts had already told them (this guy carries the puck well, makes a good first pass, has a high IQ, doesn't make unforced errors, etc.). The primary use of their data and reports was for contract negotiations: any way they could justify lowering a contract offer to an existing player in the organization, backed by "stats". Otherwise they were largely ignored.

One of them mockingly said that there is no such thing as "the moneyball" player where you find that hidden gem that is being overlooked in hockey. One of the two guys was actively trying to get a job in baseball but couldn't find an opportunity and ended up taking another hockey job.
There’s nothing wrong with using multiple advanced stats to help inform decisions combined with decision making capabilities. This thread isn’t arguing that. It’s looking at specific advanced stats, in this case expected goals and using it as a basis for predicting goals. Which it isn’t, at least not on its own. The more fancy stats you build into a model the “better” it can be. But most don’t stand on their own.
 
Ken Hitchcock is Exhibit A, B, and C of how team play impacts goaltenders' save percentages.

Either you can conclude that his coaching style focused on limiting high-danger scoring chances at the expense of higher-volume, low-danger shot attempts, or you can conclude that all goaltenders got better when they joined his teams and then got worse when they left.
 
If you think about it, 0.16 is well above average: assuming a 0.900 save%, 0.1 would be your average shot.

NHL goalies are very good and they do frequently stop cross-crease attempts.
Depends on how you define "average" for a shot, but the median shot is actually far, far below 0.1 xG. The vast majority of shots have something like 0.01, 0.02, 0.03 xG. 0.1 xG would be classified as a medium-danger shot, of which there are only a couple per game.

The main problem here is the assumption that percentages should be averaged additively, even though by the nature of how percentages work, they should be using multiplicative averages. If you do so, you'll find that the proper average is actually far lower than what you would expect.

Expected goals follow an exponential distribution, and in order to take a meaningful arithmetic mean you would need to flatten them first. Alternatively, use the pre-computed z-scores for the gamma distribution (while individual expected goal samples follow the exponential distribution, their sum follows the gamma distribution).
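To put rough numbers on the averaging point, here is a minimal Python sketch. It reads "multiplicative average" as the geometric mean, which is an assumption about what was meant, and the per-shot xG values are invented purely for illustration.

```python
import statistics

# Hypothetical per-shot xG values, invented for illustration:
# lots of low-danger shots and a couple of dangerous ones.
shot_xg = [0.01, 0.02, 0.02, 0.03, 0.03, 0.05, 0.08, 0.10, 0.30, 0.70]

arithmetic_mean = statistics.fmean(shot_xg)          # additive average
geometric_mean = statistics.geometric_mean(shot_xg)  # multiplicative average
median = statistics.median(shot_xg)                  # the "typical" shot

print(f"arithmetic mean: {arithmetic_mean:.3f}")
print(f"geometric mean:  {geometric_mean:.3f}")
print(f"median:          {median:.3f}")
```

With a skewed sample like this one, the two big chances drag the arithmetic mean up to 0.134, while the median shot sits at 0.04 and the geometric mean also lands well below the arithmetic one.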
 

Interesting. I feel there is a way to measure a hockey equivalent of WAR in evaluating players.

Another area would be how much a center is improving their linemates and how much a winger is driving play.
It’s what we do with goalie stats.

.900 is baseline, .8xx is shit and .910+ is good/great.
I have problems with this without factoring in shot/chance quality.

For example, you could have gotten 27 shots and scored 3, for a sv% around 0.889.

When you watch the game, the team had 5 odd-man chances and 5 other high-danger scoring chances.

Same stats from a different game where the D played strong. They got 3 PP goals, and one shot was a point-shot deflection.
The rest of the shots were generally from between the dots and the boards or from around the blue line.


It does not truly measure the goalie.
It's not a goalie stat.

The goalie is not fighting the other goalie and he's not doing worse because the other goalie is doing better. He's acting relatively independently. Therefore, it's a bit more reasonable to have a baseline quantity of saves.

When a team does better in its share of xG, the other inherently does worse. It's a completely different set of variables.

You could score a goalie based on quality saves vs quality scoring chances/shots.
 
I'd bet an incredible amount of money that this convo didn't happen. Two people working in analytics willingly telling others that their job is completely worthless. No shot.
 
Rebounds are the most dangerous shots statistically.

You are implying that 4th line plugs can juice their numbers by getting flurries of rebound chances, but they don't realistically get those chances often at all. Most bad 4th liners average less than a shot per game.

Ovechkin generally generates 3-4 shots per game. A lot of Ovechkin's chances aren't grade A chances, which is why his career shooting percentage is relatively low for a first-line player. Ovie just gets more chances than almost everyone else, though.

It's not about juicing numbers. It's about assigning a blanket value to a shooting event almost entirely based on from where the shooting event took place, and then deciding you can conclude how many goals you are expected to score based on that data. It's true that a 4th liner doesn't often get that many rebound chances but it does happen, and not all that rarely either. You can't dismiss these occurrences as not statistically significant and at the same time accept xG conclusions at their face value.

Expected goals, at least the publicly available models, don't sufficiently account for where the other players are on the ice. How can one possibly call it a comprehensive appraisal of what has happened, then?

Flurry-adjusted expected goals was specifically made to deal with this.
Without flurry:
0.7 + 0.7 + 0.7 = 2.1 xG
With flurry:
0.7 + 0.21 + 0.063 = 0.973 xG
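
For anyone wondering where 0.21 and 0.063 come from: the numbers match discounting each shot by the chance that none of the earlier shots in the flurry already went in. A minimal Python sketch of that reading follows; the function name is made up here, and actual public models may implement flurry adjustment differently.

```python
def flurry_adjusted_xg(shot_xgs):
    """Sum xG over a flurry, weighting each shot by the probability
    that no earlier shot in the same flurry has already scored."""
    total = 0.0
    p_no_goal_yet = 1.0
    for xg in shot_xgs:
        total += p_no_goal_yet * xg   # this shot only matters if play is still live
        p_no_goal_yet *= 1.0 - xg     # chance the flurry continues past this shot
    return total


flurry = [0.7, 0.7, 0.7]
print(f"without flurry adjustment: {sum(flurry):.3f}")               # 2.100
print(f"with flurry adjustment:    {flurry_adjusted_xg(flurry):.3f}")  # 0.973
```

Done this way, one possession can never be worth more than a single goal, no matter how many rebounds get whacked at.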

[Attached image: 1741629311711.png]


So even more than 10x what an Ovechkin one timer from his office at the top of the circle is worth in the model then. Totally flawless definitely.
Also, seems like you just don't understand that expected goals aren't tied to the shooter. That doesn't mean that you cannot adjust for the shooter separately from expected goals, by applying shooter-dependent modifiers on the raw expected goals-value. This, of course, is what you should be doing. Using it as an argument against expected goals is, frankly, ridiculous.

It's akin to saying that a player's shots on goal aren't worth tracking because players score at different %s. However, in reality, you should consider both the shooting frequency and the shooting percentage to get the most complete view. It's not rocket science, merely common sense.

Could really do without the condescension of "well you must not understand it if you think it's bad". I understand the model just fine, and it's because I understand it that I don't think it's worth as much as many of the advanced stat zealots do. The entire premise is flawed in its current form and paints a woefully incomplete picture of what occurred during a game.

You're telling me you are applying shooter-dependent and goalie-dependent modifiers for every expected goals-value you come across? Is that what you're doing? Come on. Frankly ridiculous is right. That's no longer a stat that is useful if you have to do a bunch of stuff to it after it's been recorded to make it meaningful. That's not rocket science, merely common sense.

I always love the example of some plug taking a bunch of shots in the crease because like, when does that ever happen??

Sam Carrick, who is a damn good 4th liner, has 64 shots in 64 games. David Pastrnak has 272.

The idea that the elite players in this league, who generate quality looks, aren't also the guys generating most of the volume is completely contrary to reality.

Well, if we're accepting a baseline blanket 'expected goals' stat applicable to all players and all situations, it really shouldn't matter who is taking the shot, should it? But it does. I never argued that the elite players aren't generating the most volume either.


Since you all took issue with my initial scenario, let's explore a couple more for how xG is calculated. These are common occurrences in NHL games.
*I know different sites use differing models and apply different filters and gradients etc but they are all at their basic form built upon something similar to the posted diagram above.*

1a) Kevin Rooney starts his shift in the DZ. He loses the faceoff and spends the next 73 seconds along with his linemates chasing the puck around under pressure from his opponents. His goalie bails him out several times. Finally the opponents' forward line tires and goes for a line change, leaving Rooney's defenseman in the corner with the puck for an easy breakout with lots of open ice. He passes to Rooney while the rest of his line changes. Rooney, being the responsible, boilerplate, nothing NHL player he is, knows he must get the puck deep to allow the next line to establish themselves on the ice. He takes the open ice and crosses the blueline with both opposing defenders in front of him. Rather than just dumping it in the corner, Kev decides he should probably get at least one SoG this week, so he wrists a 72mph shot at the goalie from just inside the blueline. Unless he shot some heroin between periods, the goalie gloves it easily and sets the table for the weakside D to start a breakout.

This is deemed worthy of 0.03 xGs.

1b) Cale Makar starts his shift in the OZ. MacKinnon wins the faceoff and the winger gets it back to Makar. Makar controls the puck and fakes sending it back down low to open up the middle of the ice. He walks the blueline as layers of traffic gather in front of his opponent's net. The goalie can't see the puck through the traffic and stands taller in his perimeter stance to try to find it. Makar sees this and sees an opening through the moving bodies. With no windup he wrists a 72mph shot from just inside the blueline. The shot is well placed and he scores under the goalie's glove. Goalie never saw it and curses his winger for missing his assignment and not blocking the point shot.

This is deemed worthy of 0.03 xGs.

2a) Ryan Lomberg takes a pass with speed in the NZ after some nice work by his other winger to win a high board battle. He has some room so he takes the defender wide and executes some respectable net drive by keeping his feet moving. He's by himself as his linemates are busy picking fleas off each other by the bench. He takes a contested shot along the ice from in tight to the net, the goalie has the angle covered easily and the shot bounces harmlessly to the corner.

This is deemed worthy of 0.3 xGs.

2b) Patrick Kane, vacating his defensive responsibilities as he senses the puck is about to turn over, creeps past the mouth-breathing weakside defenseman still standing on the blueline trying to keep the pressure on. Sure enough, Kane's center wins the puck along the boards and lifts a nice outlet area pass to center ice. Kane grabs the puck and is in alone on the goalie. He fakes going high glove and freezes our poor tender before making a move to the backhand and roofing it crossbar down.

This is deemed worthy of 0.3 xGs.


Go ahead and handwave away these differences if you want, but to me they are not reconcilable. It's not that xG models tell us nothing or aren't worth considering, it's that they are incomplete and aren't actually providing what they claim to. Over season-long samples they can definitely be useful and predictive, but stuff like the deserve-to-win'o'meter or NST's charting for individual games just isn't that useful without the context of seeing the game happen. It's not one or the other, eye test vs advanced stats; both should be considered as parts of an overall picture.
 
Not only that, but sometimes we can't even tell what direction it's going in.

Bad defense can raise save percentage through sheer volume.
Exactly! The worst defensive teams still know how to take the middle of the ice away. Bad defense results in a lot of time spent in the D zone, and the resulting volume of low-danger chances is a product of that bad defense. Goals are scored from low-danger chances too, so the more low-danger chances a team willingly gives up, the more goals it allows. However, SV% also rises as a result.

High danger chances come from bad mistakes that are one off type plays. Falling down at the blue line, or a bad pass, or a broken stick are where HD chances come from. HD chances are very random at times.
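
The volume effect on save percentage is easy to check with arithmetic. A small Python sketch, where the save rates by danger level and the shot counts are hypothetical numbers picked for illustration and not taken from any real model:

```python
def goals_and_save_pct(shots, save_rate):
    """Average goals against and overall save percentage for a shot mix.

    shots: dict mapping danger level -> number of shots faced
    save_rate: dict mapping danger level -> assumed save rate on those shots
    """
    total_shots = sum(shots.values())
    goals = sum(n * (1.0 - save_rate[level]) for level, n in shots.items())
    return goals, (total_shots - goals) / total_shots


# Hypothetical save rates by danger level (assumptions, not real data).
SAVE_RATE = {"high": 0.85, "low": 0.97}

# Same number of high-danger chances, but the loose team bleeds low-danger volume.
tight_goals, tight_sv = goals_and_save_pct({"high": 10, "low": 10}, SAVE_RATE)
loose_goals, loose_sv = goals_and_save_pct({"high": 10, "low": 30}, SAVE_RATE)

print(f"tight defense: {tight_goals:.1f} goals against, {tight_sv:.3f} sv%")
print(f"loose defense: {loose_goals:.1f} goals against, {loose_sv:.3f} sv%")
```

Under these assumptions the loose team allows more goals (2.4 vs 1.8) while its goalie posts the better save percentage (.940 vs .910), which is exactly the direction problem described above.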
 
You're operating under the assumption that xG operates solely based on distance. Here, just as an example, is the list of variables used in Moneypuck's model.

1.) Shot Distance From Net
2.) Time Since Last Game Event
3.) Shot Type (Slap, Wrist, Backhand, etc)
4.) Speed From Previous Event
5.) Shot Angle
6.) East-West Location on Ice of Last Event Before the Shot
7.) If Rebound, difference in shot angle divided by time since last shot
8.) Last Event That Happened Before the Shot (Faceoff, Hit, etc)
9.) Other team's # of skaters on ice
10.) East-West Location on Ice of Shot
11.) Man Advantage Situation
12.) Time since current Powerplay started
13.) Distance From Previous Event
14.) North-South Location on Ice of Shot
15.) Shooting on Empty Net

Based on those parameters, the Rooney and Makar shots are two completely different plays.

The Lomberg and Kane plays are more similar, but Kane going to the backhand changes #5, #6, #10, and probably #4 too if he's completely unfettered as you described.

The potential issue with xG models on certain plays is that they don't factor in ability level when two players take very similar shots. You have not described very similar shots at all.
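
To make that concrete, here is a rough Python sketch that records the Rooney and Makar shots using a handful of the variables listed above and reports which ones differ. The field names and every value are invented guesses at how the two plays might be logged; this is not MoneyPuck's data format or code.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotContext:
    # Hypothetical subset of the listed variables; names are made up.
    distance_ft: float                  # 1.) shot distance from net
    seconds_since_last_event: float     # 2.) time since last game event
    shot_type: str                      # 3.) shot type
    angle_deg: float                    # 5.) shot angle
    last_event: str                     # 8.) last event before the shot
    distance_from_last_event_ft: float  # 13.) distance from previous event


def differing_fields(a: ShotContext, b: ShotContext) -> list[str]:
    """Return the names of the fields on which two shots differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Invented values: same spot, same shot type, very different lead-up.
rooney = ShotContext(distance_ft=60.0, seconds_since_last_event=6.0,
                     shot_type="wrist", angle_deg=5.0,
                     last_event="opponent shot", distance_from_last_event_ft=130.0)
makar = ShotContext(distance_ft=60.0, seconds_since_last_event=4.0,
                    shot_type="wrist", angle_deg=5.0,
                    last_event="faceoff", distance_from_last_event_ft=35.0)

print(differing_fields(rooney, makar))
# ['seconds_since_last_event', 'last_event', 'distance_from_last_event_ft']
```

Distance, angle and shot type match, but a model that also looks at the preceding event has inputs that tell the two plays apart.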
 
And this, kids, is why Connor Hellebuyck has an absurdly high GSAX or whatever abbreviation the spreadsheet jockeys are using.
Hellebuyck only faces shots from 4th liners and that's pumping up his GSAX?

This thread has been pretty interesting.
 
Well again, this is the basic model that these other ones such as MoneyPuck are built off of. That basic xG model would record the xG statistic described in my scenarios exactly as I've stated it would. I realize I have not described similar shots at all; that was kind of the point. Even after we apply all those 15 other variables and modifiers to it, I would judge the initial data the model is working off of as flawed and incomplete. Maybe others have a lower bar, but I feel that if you're going to claim to have proven how many goals a hockey team should have scored in a game, the concept shouldn't have holes that are so easily picked apart. I also have my doubts that all those variables in MoneyPuck's model are recorded and collated accurately, as their site looks like it's from 2006, but that's beside the point.

xG operates on two basic parameters to determine the shot quality. Shot distance and shot angle. That's literally it. Do those two basic parameters seem to provide a complete picture of how dangerous a shot is to you?
 
