Principles of Estimating Ice Time

Iain Fyffe

Hockey fact-checker
The subject of estimating ice time for seasons before it was tracked came up in another thread. Killion suggested I start a thread on this topic. So here we go.

However, as I've suggested, if you do start a thread on it, just make sure you're doing it in a compartmentalized format by era, by position, by roster sizes & number of games etc. But sure... good idea. Will require some serious knowledge of the game & its eras, of actual strategies & in-game situations, reliance upon star players & the movement within games of utility players (Red Kelly, e.g.) from the blue line to forward & back, resulting in greater ice time than one might imagine. Hull on the point on the PP. In PK situations and against which opponents. Offensive stars, first-liners "sat" late in a game when the score's out of reach & so on. Defensive pairings racking up astronomical minutes. The change in the ebb & flow pre-short-shift. Pacing. A great many factors involved and if you're game, by all means, start the thread(s).

I'm only going to get into the basic principles here, and specifically I'm only going to address the post-expansion era, because before 1967 we're missing the juiciest data for estimating ice time; estimates before expansion will necessarily use a somewhat different methodology, and will be less reliable.

To begin with, to estimate ice time with any semblance of accuracy you need to do it team-by-team, not player-by-player. It's a team's total available ice time that is divided amongst its players, so that's where you have to start. If you just look at an individual player, you could end up with an unsupportable estimate such as Bobby Hull playing 40 minutes per game. You need to start with the team.

You also want to be systematic. I won't use the term objective, although it could arguably apply, because there is judgment involved in building the model in the first place. However, you want it to be systematic in order to avoid personal or cognitive biases entering into it. You want to be able to apply the same estimator to every player, and not make exceptions here and there. Obviously no estimate will be perfect, and some players will be off by more than others. Indeed it's possible that certain types of players will be off by more than others. But a model with a known error range is better than "I don't know", and it's better than a guess which has an unknown error range.

So what sort of inputs should we use? Obviously, games played are important. Scoring totals are useful, but only to a limited degree due to the variation in rates at which players accumulate points. What we really want is something that provides some reflection of the player being on the ice, and that is subject to relatively little influence by the individual player. By far the best candidate is the component parts of plus-minus. Not the plus-minus rating itself, which is useless here, but the totals of goals for and goals against, power-play and non-power-play, that the player is on the ice for (that is, TGF, PGF, TGA and PGA). A player has less control over these items than he does over his goal or assist total, since they concern not only teammates but opponents as well.

Whenever a player is on the ice, he leaves behind data that he was there. Regardless of why a player was on the ice, he will leave behind statistical clues that he was on the ice. And that's what we're concerned with here. If you want to interpret the estimate afterwards to determine why the estimate is such, that's fine, but first you need to have the estimate. As such, much of what Killion mentions above does not need to be considered individually. We're looking for the player's ice time, not the coach's reasoning for giving him that amount of ice time. If a player plays both forward and defence in a game, that will affect his TGF, PGF, TGA and PGA. If first-liners sit late in some games, that will affect their TGF, PGF, TGA and PGA since you cannot accumulate them if you're not on the ice. Again, to develop an estimate it doesn't matter why a player was or was not on the ice, what matters is that he was or was not on the ice. If you get bogged down by such details, you risk missing the forest for the trees.

So the plus-minus components are the building blocks of the ice time estimate, specifically comparing a player's totals to the team's totals. (If we had shots for and shots against while the player was on the ice, like we do today, that would be even better because they're more granular, but we can only work with what we have.) I'm not going to go into detail on my exact method right now, in part because it's currently undergoing revision for an idea I had recently. However, we can look at some aggregate figures to get a sense of the accuracy at the moment. Let's look at 1997/98, the first season ice time data was tracked by the NHL.

If we look at the top 50 centres by actual ice time that season, we find they played a total of 73,377 actual minutes, and my estimator has a total of 73,855 estimated minutes. The total error (which is the sum of the absolute values of the differences between the estimated and actual ice time for each player) is 4,114. So the error is 5.6% of the estimated time, which means that the average estimate is 5.6% off.

For the top 50 left wings, the figures are: 61,204 actual minutes; 61,752 estimated minutes; 3,532 total error; 5.7% error rate.

For the top 50 right wings, the figures are: 66,763 actual minutes; 66,905 estimated minutes; 3,244 total error; 4.8% error rate.

For the top 100 defencemen, the figures are: 163,459 actual minutes; 157,122 estimated minutes; 9,665 total error; 6.2% error rate.
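
The error rates quoted above can be reproduced from the aggregate figures alone. The following is a minimal sketch that only re-does the arithmetic in this post (total absolute error divided by total estimated minutes); the function and variable names are made up for illustration, and the per-player estimates themselves are not reproduced here.

```python
# Aggregate 1997/98 figures quoted in the post:
# (actual minutes, estimated minutes, total absolute error)
groups = {
    "C (top 50)": (73_377, 73_855, 4_114),
    "LW (top 50)": (61_204, 61_752, 3_532),
    "RW (top 50)": (66_763, 66_905, 3_244),
    "D (top 100)": (163_459, 157_122, 9_665),
}


def error_rate(total_abs_error, estimated_minutes):
    """Total absolute error as a share of total estimated minutes."""
    return total_abs_error / estimated_minutes


def net_bias(actual_minutes, estimated_minutes):
    """Signed over- or under-estimate of the group total, relative to actual."""
    return (estimated_minutes - actual_minutes) / actual_minutes


# Percentages rounded to one decimal place, as in the post.
rates = {g: round(100 * error_rate(err, est), 1) for g, (act, est, err) in groups.items()}
bias = {g: round(100 * net_bias(act, est), 1) for g, (act, est, err) in groups.items()}

for g in groups:
    print(f"{g}: error rate {rates[g]}%, net bias {bias[g]:+}%")
```

The net bias line is an extra that is not in the post: it shows that the three forward groups are over-estimated in total by less than 1%, while the defencemen total comes out roughly 4% low, so part of the defencemen's error runs in one direction rather than cancelling out.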

So the estimates for the defencemen are the least reliable, but still a 6% error is pretty decent. So this is the model that gets used for previous seasons, when we do not have actual ice times but we do have plus-minus components recorded, from 1967/68 onward. And as for any changes in the way the game has been played since then, just remember: it does not matter why the player was on the ice, or when the player was on the ice, it matters that he was on the ice, and that when players are on the ice, they leave behind clues that they were on the ice. That is the guiding principle.

It doesn't matter that shifts are shorter now; goals are still scored for and against every player that is on the ice at a particular time. It doesn't matter if a forward plays the point on the PP; he leaves behind clues that this was the case. As I said, certain types of players (that is, players used in certain sets of situations) will have a greater average error, but no system is perfect, and being systematic is better, on the whole, than assigning minutes based on inspection, which allows personal biases to enter into it and will not consider all available information.
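
To make the team-first idea concrete, here is a deliberately naive sketch. It is not the estimator discussed in this post (that method is not disclosed here); it simply assumes that a group's available minutes can be split in proportion to each player's non-power-play on-ice goal events, (TGF - PGF) + (TGA - PGA). The player labels, their numbers and the 4,000 team minutes are all hypothetical.

```python
def es_events(tgf, pgf, tga, pga):
    """Non-power-play goals for and against while the player was on the ice."""
    return (tgf - pgf) + (tga - pga)


def allocate_minutes(on_ice_events, available_minutes):
    """Split a fixed pool of minutes in proportion to on-ice goal events.

    Assumption for illustration only: a player's share of the group's
    events equals his share of the group's ice time.
    """
    total = sum(on_ice_events.values())
    if total == 0:
        raise ValueError("no on-ice events to allocate by")
    return {name: available_minutes * ev / total for name, ev in on_ice_events.items()}


# Hypothetical defence corps: (TGF, PGF, TGA, PGA) for each player.
defence = {
    "D1": (110, 30, 90, 20),
    "D2": (90, 20, 80, 20),
    "D3": (60, 5, 55, 10),
    "D4": (50, 0, 50, 10),
    "D5": (40, 0, 35, 5),
    "D6": (30, 0, 35, 5),
}

# Two defencemen are on the ice at a time, so the pool is twice the
# team's (hypothetical) 4,000 non-power-play minutes.
pool = 2 * 4_000
events = {name: es_events(*parts) for name, parts in defence.items()}
estimate = allocate_minutes(events, pool)

for name, minutes in estimate.items():
    print(f"{name}: {minutes:.0f} estimated minutes")
```

Because the pool is fixed at the team level, no single player can end up with an impossible total, which is the point of starting with the team rather than the individual.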
 

TheDevilMadeMe

Registered User
Aug 28, 2006
52,271
6,990
Brooklyn
So there is about a 5-6% error rate between the estimates created by your method and the actual recorded ice time in 1997-98. I've seen it posited on this board that the error would likely increase the farther back in time you go. The reason - this method is calibrated to the modern player usage, while the farther back you go, the longer the shifts and the shorter the benches. Do you agree or disagree?
 

Iain Fyffe

Hockey fact-checker
So there is about a 5-6% error rate between the estimates created by your method and the actual recorded ice time in 1997-98. I've seen it posited on this board that the error would likely increase the farther back in time you go. The reason - this method is calibrated to the modern player usage, while the farther back you go, the longer the shifts and the shorter the benches. Do you agree or disagree?
I'd say I mostly disagree. There will always be X number of players on the ice when a goal is scored, both for the scoring team and their opponents. That's not going to change regardless of the length of shifts or benches. If a player plays 25 minutes in two-minute shifts, or 25 minutes in 45-second shifts, there's no particular reason to think his plus-minus components will be affected significantly, assuming he's not the only player taking such shifts. He may be playing shorter shifts and therefore at a higher pace, but so are his opponents.
 

reckoning

Registered User
Jan 4, 2005
7,098
1,456
I appreciate the work that has gone into estimating ice times. As someone whose favorite era was the 70s, it's interesting to see the results from those years.

I think the main question mark would be if defensive forwards are being undervalued by it. If an elite defensive forward is succeeding at shutting down the opposition, but not creating any offence for his team, it would stand to reason that his TGA and TGF would be lower than a teammate who was the opposite (good offence, no defence). So it's possible that the defensive minded player may have less TGF/TGA, but more icetime.

Iain, with the players who were off the most between your estimates and the actual times for '98, were they more defensive minded than offensive minded?
 

BraveCanadian

Registered User
Jun 30, 2010
15,391
4,696
So there is about a 5-6% error rate between the estimates created by your method and the actual recorded ice time in 1997-98. I've seen it posited on this board that the error would likely increase the farther back in time you go. The reason - this method is calibrated to the modern player usage, while the farther back you go, the longer the shifts and the shorter the benches. Do you agree or disagree?

I think that was me!

No question player usage changed between 67 and 97.

Even something fairly recent like TV timeouts would affect it because coaches were getting free timeouts and could play the guys they wanted to play more... more. That shows up in GF and GA but how much? Was that a contributing factor in the front-liners seeming to get a bigger and bigger portion of the offense as the 90s went on? I think it contributed.

Another thing that makes me question the accuracy is that obviously if we're going by GF and GA then the amount of GF and GA on average would impact how many data points you have to work with. Lower scoring seasons are going to be tougher than maybe the high flying 80s might be.

As someone else pointed out, a defensive player who was very effective but not generating much offense would not have many GF or GA and have to be fudged in..

Which brings us to a topic that I recall from earlier discussions about the icetime estimates.. aren't there fudge factors built into the estimates based on the subjective categorization of players?

ie. First liners for example are

estimate x some estimated number = estimated ice time

while second liners are

estimate x some other estimated number = ice time estimate?

I think it is just a pretty tough thing to do based on fairly sparse data.
 

Canadiens1958

Registered User
Nov 30, 2007
20,020
2,783
Lake Memphremagog, QC.
When Was Ice Time Tracked?

When was ice time first tracked - well at least as early as 1936 - thanks to a recent find by Overpass:

http://hfboards.mandatory.com/showpost.php?p=80516921&postcount=282

One goalie, 14 skater roster. Very revealing about how the players and lines were used. How Charlie Conacher received extra ice time and other nuances of the game.

Basic fact is that the required game sheets exist back to the twenties, question of getting access from the NHL offices.

The publishing of the NHL boxscore was rather hit and miss but the techniques to generate them were not. The techniques and methodology were readily available, and other than some unfortunate round-off to the nearest minute are viable.

The history of hockey deserves the respect of 100% accuracy.
 

Bear of Bad News

"The Worst Guy on the Site" - user feedback
Sep 27, 2005
14,338
29,555
Basic fact is that the required game sheets exist back to the twenties, question of getting access from the NHL offices.

The history of hockey deserves the respect of 100% accuracy.

I've seen the old NHL game sheets - they do not have the ice time totals that you suggest.

And 100% accuracy is impossible in many (most?) cases. Is your suggestion that we don't discuss it at all in those cases (out of respect)?

We do the best that we can. Don't let perfect be the enemy of good.
 

plusandminus

Registered User
Mar 7, 2011
1,411
269
I studied this subject thoroughly in 2011 or 2012 (and before that in 2003), and posted about it here.
(I did it for every player on every team during all the years icetime was available.)

My experience, and advice, is to take estimated ice times with a large grain of salt.
The estimates are often correct within a few percentage points, but unfortunately they are sometimes way off, and the problem is that we don't know when they are.

I also think one should accept that we don't know how many ES goals for and against different players were on ice for. I'm against ignoring short handed goals, as if they never happened, because they did happen and they surely do affect the ES stats.


I also wonder, what is the point of trying to estimate past icetimes? (Apart from out of curiosity.)

If you want to find out how much ice time a player factually had, why not just accept we don't know? Instead we can focus on how large a percentage of the team's GF and GA a player was on ice for, keeping in mind that this does not tell us their ice times.

If you want to use the ice time estimates for further calculations, I advise against it. It will just create partly misleading results.

For example, if trying to get insight into how good a player was at preventing goals, the effect of the goalie is a key factor that should/"must" be determined. Unfortunately it's not easy to adjust for goalie influence.

A common mistake people make is to over-interpret statistical differences between players. Small differences can almost be ignored, as things like "luck" and error margins (from adjustments, for example) probably affect things more. For example, they can rate a player ahead of another because he scored 10% more points, while totally ignoring everything else (role on team, "luck", etc.).
So if we know that ice times have an "average error rate of 5%" or so (just to take a number), we need differences significantly larger than that to draw a conclusion such as "A probably played more than B", but even then we don't know, since in some cases the errors are even larger.

I think we cannot draw ice time related conclusions for seasons before ice time was tracked.
 

Canadiens1958

100% Accuracy

I've seen the old NHL game sheets - they do not have the ice time totals that you suggest.

And 100% accuracy is impossible in many (most?) cases. Is your suggestion that we don't discuss it at all in those cases (out of respect)?

We do the best that we can. Don't let perfect be the enemy of good.

Seems that you are striving for 100% accuracy in the goalie stats from the eighties;

http://hfboards.mandatory.com/showthread.php?t=1601949&page=4

but arguing against it here.
 

Bear of Bad News

Seems that you are striving for 100% accuracy in the goalie stats from the eighties;

http://hfboards.mandatory.com/showthread.php?t=1601949&page=4

but arguing against it here.

I'm certainly not arguing both sides (as you suggest). Let me clarify.

The goal is 100% accuracy, yes. Do I think that I'll ever get all the way there? No.

I'm not letting that fact stop me from continuing to research, however. I started doing goaltender game logs in 1994, and the amount of data available was minuscule compared to now. I expect that it's only going to get better.

I know that the results I've currently got published on my webpage are not 100% accurate. Should I not publish them "out of respect"? Of course not. They're miles better than anything that's existed before, and many people have used them in their own research.

One of my hobbies is mountain climbing. When you start a climb, it typically seems like an impossible task. During these climbs, a friend of mine likes to ask "how do you eat an elephant?" The answer is "one bite at a time".

Don't let perfect be the enemy of good. Feel free to have a bite of elephant, and contribute to the effort towards better understanding. :)
 

Bear of Bad News

Back to the point of the thread - ice times from the 2013-14 season aren't accurate to the second in all cases. Does that mean that we should ignore them?
 

Canadiens1958

100% Accuracy II

I'm certainly not arguing both sides (as you suggest). Let me clarify.

The goal is 100% accuracy, yes. Do I think that I'll ever get all the way there? No.

I'm not letting that fact stop me from continuing to research, however. I started doing goaltender game logs in 1994, and the amount of data available was minuscule compared to now. I expect that it's only going to get better.

I know that the results I've currently got published on my webpage are not 100% accurate. Should I not publish them "out of respect"? Of course not. They're miles better than anything that's existed before, and many people have used them in their own research.

One of my hobbies is mountain climbing. When you start a climb, it typically seems like an impossible task. During these climbs, a friend of mine likes to ask "how do you eat an elephant?" The answer is "one bite at a time".


Don't let perfect be the enemy of good. Feel free to have a bite of elephant, and contribute to the effort towards better understanding. :)

Point is that there is a definite lack of respect for each bite of the elephant from certain posters here.

E.M. Orlick was criticized after only one of his articles was found, not the complete collection, to which other articles were later added. Likewise, it is common practice to question the memories of the likes of Henry Joseph because they may be less than 100% accurate.

Now we have evidence of Charlie Conacher playing 39 minutes out of a 60-minute playoff game. Is this being celebrated as a bite of the elephant? No, suddenly what is available in game logs is questioned.

So unless you support the contributions, partial as they may be of a Henry Joseph as being significant bites of the elephant, you are arguing both sides.

Unless you support the value of the 1936 find posted by Overpass and reposted above while encouraging similar research for more similar finds as opposed to estimates that you have previously stated would never be 100% accurate:

http://hfboards.mandatory.com/showpost.php?p=89392779&postcount=119

Predictive models or estimates inherently lack 100% accuracy by your own admission. Above in this thread, there is an admission that in a best case scenario there will be a 5-6% margin of error that will increase significantly the further back the model goes.


Seems that estimates are just a poor substitute for doing the actual research.
 

Bear of Bad News

Who says that there's only one way to advance the research? Predictive models are a very valid method for understanding the sport of hockey. No one's forcing anyone to pay attention - feel free to change the channel if the mood strikes you.

No one's showing a lack of respect ("definite" or otherwise) to the history of the sport. No one. Why in the world would anyone do that much research just to "disrespect" the sport that we all love?
 

TheDevilMadeMe

I'd say I mostly disagree. There will always be X number of players on the ice when a goal is scored, both for the scoring team and their opponents. That's not going to change regardless of the length of shifts or benches. If a player plays 25 minutes in two-minute shifts, or 25 minutes in 45-second shifts, there's no particular reason to think his plus-minus components will be affected significantly, assuming he's not the only player taking such shifts. He may be playing shorter shifts and therefore at a higher pace, but so are his opponents.

Okay. I hope that once you are finished tweaking that you release what goes into the formula - only possible way to fully evaluate it, and in particular whether any of the variables might be affected by BraveCanadian's concerns.
 

TheDevilMadeMe

I've seen the old NHL game sheets - they do not have the ice time totals that you suggest.

And 100% accuracy is impossible in many (most?) cases. Is your suggestion that we don't discuss it at all in those cases (out of respect)?

We do the best that we can. Don't let perfect be the enemy of good.

To add to this, I would love it if someone were able to get the NHL to release game sheets giving player ice time back to the 1920s. I'm not at all convinced that such sheets will exist for all or even most games, but if they do, and the NHL would be willing to give them up, it would be fantastic.

But until someone does so, using the data we do have to estimate ice time is the best we have, and I applaud the effort. The OP is completely honest about the error rate; if a third party purports that the estimates are more accurate than they actually are, that is hardly the OP's fault.
 

Iain Fyffe

Hockey fact-checker
Point is that there is a definite lack of respect for each bite of the elephant from certain posters here.
Gee, who could you be referring to here?

You are, of course, missing the point. There is a substantial difference between estimating past ice times, knowing that they are estimates and cannot be considered fully accurate, and establishing facts as they happened in the past.

Orlick's research is dealing with things like - who played the first ice hockey games in Canada, and where, and when. These are facts that should be establishable.

Estimating ice time is used to gain a little insight into the more recent past, but is not an attempt to establish facts. As such, different standards are appropriate.

Also, please refrain from dragging other threads into this one. This thread is for discussing estimated ice times. It has nothing to do with the origins of hockey.

Now we have evidence of Charlie Conacher playing 39 minutes out of a 60-minute playoff game. Is this being celebrated as a bite of the elephant? No, suddenly what is available in game logs is questioned.
I wouldn't question that. We know that some forwards in 1927/28 probably played 45 minutes or more per game, so a forward playing 39 minutes in the 1930s is certainly plausible. The 1930s are not the 1960s.

Seems that estimates are just a poor substitute for doing the actual research.
Doing what actual research? Are you suggesting that ice times for 1967/68 to 1996/97 are actually recorded somewhere? If not, then this comment is completely off-base.
 

Mickey Marner

Registered User
Jul 9, 2014
19,893
21,755
Dystopia
Iain, do you factor in IPP (individual points percentage; PTS/TGF) into your estimates? It seems what percentage of TGF a player generates when he is on the ice would be worth considering.
 

Iain Fyffe

Hockey fact-checker
I think the main question mark would be if defensive forwards are being undervalued by it. If an elite defensive forward is succeeding at shutting down the opposition, but not creating any offence for his team, it would stand to reason that his TGA and TGF would be lower than a teammate who was the opposite (good offence, no defence).
Absolutely, this is one group that I suspect would be systematically under-estimated, for the very reasons you set out. The problem with testing this is that we cannot be sure, before very recent seasons, who was playing against the top talent. We would know some of them, but wouldn't be able to account for all of them.

I should probably run the analysis for a more recent year, where we do have some estimate of quality of competition, to see if there's a pattern.
 

Iain Fyffe

Hockey fact-checker
Even something fairly recent like TV timeouts would affect it because coaches were getting free timeouts and could play the guys they wanted to play more.. more. That shows up in GF and GA but how much?
It shows up in the GF and GA in the sense of there being more for the top players because they play more. But is there reason to believe it affects the distribution of GF/GA per minute of ice time?

Another thing that makes me question the accuracy is that obviously if we're going by GF and GA then the amount of GF and GA on average would impact how many data points you have to work with. Lower scoring seasons are going to be tougher than maybe the high flying 80s might be.
This is true, greater granularity should produce better estimates.

Which brings us to a topic that I recall from earlier discussions about the icetime estimates.. aren't there fudge factors built into the estimates based on the subjective categorization of players?
That depends on whose version you're using, surely. In the first version I did years ago, there was a fudge factor built in. But I no longer do that, because such inherent subjectivity is undesirable, and the system is much more sophisticated now which obviates the need for it anyway.
 

Iain Fyffe

Hockey fact-checker
When was ice time first tracked - well at least as early as 1936 - thanks to a recent find by Overpass:
Actually the New York Times published season-end stats for 1927/28 that included total ice time for the season. Not sure if it was tracked by the league or by the paper, but it's nice to have a contemporary source to give us some idea for the time, even knowing that it can't be considered 100% accurate.
 

Canadiens1958

Sample Size and Available Minutes

Questions about your sample size and the selection of top 50 or top 100 skaters.

The 1997-98 NHL season featured 26 teams playing an 82-game schedule.

http://www.hockey-reference.com/leagues/NHL_1998.html

This means that the regular season schedule featured a total of 82 x 13 = 1,066 games. I will not consider overtime minutes for the sake of brevity, but each sixty-minute game had a potential of 120 minutes at each of the Center, RW and LW positions (60 per team), and 240 minutes for the defensemen.

For the season this generates potentially 127,920 minutes at each of C, LW and RW, and 255,840 defensemen minutes, again excluding any regular season overtime minutes. So your samples cover somewhere in the 50-65% range of the actual regular season, non-overtime minutes. Your largest margin of error happens when you look at the largest sample: 100 defensemen yield a 6.2% margin of error, compared to 4.8%, 5.6% and 5.7% when a sample of 50 is used. So if all the minutes and all the players were considered, your margin of error would only go up, well beyond 6.2%, since the estimates on the marginal players are the trickiest. Expansion teams or teams with huge roster turnover in a season would be especially problematic in this regard.
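The schedule arithmetic above can be checked in a few lines. This sketch assumes, as the post does, regulation time only and a full complement of one C, one LW, one RW and two D per team on the ice at all times (so penalties are ignored). The coverage figures divide the top-50/top-100 actual minutes quoted earlier in the thread by those pools.

```python
TEAMS = 26
GAMES_PER_TEAM = 82
REGULATION_MINUTES = 60

# Each game involves two teams, so divide team-games by two.
games = TEAMS * GAMES_PER_TEAM // 2

# Per game: 60 minutes x 2 teams at each forward position,
# and twice that for the two defence slots.
per_forward_position = games * REGULATION_MINUTES * 2
defence = games * REGULATION_MINUTES * 2 * 2

# Actual minutes of the top 50 C / LW / RW and top 100 D, as quoted earlier.
sample_actual = {"C": 73_377, "LW": 61_204, "RW": 66_763, "D": 163_459}

coverage = {
    pos: mins / (defence if pos == "D" else per_forward_position)
    for pos, mins in sample_actual.items()
}

print(games, per_forward_position, defence)
for pos, share in coverage.items():
    print(f"{pos}: {share:.1%} of available minutes")
```

Under these assumptions the samples cover roughly half to two-thirds of the available minutes, with left wings at the low end and defencemen at the high end.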

A sampling of, say, 60/120, where 1/3 of the sample is from the top group, 1/3 is from the middle group and 1/3 is from the bottom group, would provide data with a lower margin of error. For forwards this would mean (20/20/20) for each position, while defensemen would be (40/40/40).

This would allow you to consider all information for all skaters from top to bottom without biases that may arise from a top 50 or 100 perspective.
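One way the top/middle/bottom sampling could be set up is sketched below. The post does not say how players would be picked within each group, so the random draw (with a fixed seed), the equal-sized tiers and the function name are all assumptions for illustration, and the player list is synthetic.

```python
import random


def tiered_sample(players, per_tier, seed=0):
    """Rank players by actual minutes, split the ranking into three tiers
    of (nearly) equal size, and draw per_tier players from each tier.

    players is a list of (name, actual_minutes) tuples.
    """
    ranked = sorted(players, key=lambda p: p[1], reverse=True)
    n = len(ranked)
    bounds = [0, n // 3, 2 * n // 3, n]
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    sample = []
    for lo, hi in zip(bounds, bounds[1:]):
        tier = ranked[lo:hi]
        sample.extend(rng.sample(tier, min(per_tier, len(tier))))
    return sample


# Synthetic pool of 90 players at one forward position (not real data).
players = [(f"player_{i}", 1800 - 15 * i) for i in range(90)]
sample = tiered_sample(players, per_tier=20)
print(len(sample), "players sampled")
```

The same function with `per_tier=40` and a pool of defencemen would give the 40/40/40 split.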
 

Iain Fyffe

Hockey fact-checker
I also wonder, what is the point of trying to estimate past icetimes? (Apart from out of curiosity.)
To try to shed some light on recent history. To help us understand the players from the 1960s to 1980s better.

Instead we can focus on how large a percentage of the team's GF and GA a player was on ice for, keeping in our heads that, based on that, we do not know about their icetimes.
What's the difference? Think of estimated ice time as just another way of presenting the information contained in the percentage of team GF and GA, if that helps you. Putting the data in another form. That's what the most basic of ice time estimates does.
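As a rough illustration of "putting the data in another form", here is a deliberately naive sketch. It is not the estimator discussed in this thread (whose details are not given); it simply assumes that a player's share of the team's on-ice goal events equals his share of the team's minutes. The function name and all of the numbers are hypothetical.

```python
def naive_ice_time(player_gf, player_ga, team_gf, team_ga, team_minutes):
    """Scale the team's minutes by the player's share of on-ice goal events.

    Assumption (mine): goals for and against are spread evenly over the
    team's minutes, so share of goal events ~ share of ice time.
    """
    share = (player_gf + player_ga) / (team_gf + team_ga)
    return share * team_minutes


# Hypothetical player: on the ice for 80 of the team's 250 goals for
# and 70 of its 250 goals against, over 82 sixty-minute games.
team_minutes = 82 * 60
estimate = naive_ice_time(80, 70, 250, 250, team_minutes)
print(f"{estimate:.0f} minutes, {estimate / 82:.1f} per game")
```

The output is the same 30% share of goal events, just re-expressed in minutes. Anything more refined (game state, position, team-level constraints) would have to be layered on top of this.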

If you want to use the icetime estimates for further calculations, I advise against it. It will just create partly misleading results.
As will doing any calculations that do not consider ice time. Per-game calculations, for example, are misleading because they do not consider differences in ice time.

There are no calculations that will not be "partly misleading." That is, there is no perfect calculation. Since none of them are perfect, avoiding one because it is not perfect seems incongruous.
 

Iain Fyffe

Hockey fact-checker
This would allow you to consider all information for all skaters from top to bottom without biases that may arise from a top 50 or 100 perspective.
Don't confuse the presentation of some of the data, with the method of developing the model. All players are considered in the model. Moreover, I presented the top players because generally speaking, they're the players we're interested in. If your system sacrifices some accuracy in players who played 20 games that season, in exchange for greater accuracy in players who played every game that season, that's a useful trade-off.

Taken as a whole, the average error for defencemen is 7.0% and for forwards it's 8.0%.
 

plusandminus

Registered User
To try to shed some light on recent history. To help us understand the players from the 1960s to 1980s better.

That I have nothing against.

What's the difference? Think of estimated ice time as just another way of presenting the information contained in the percentage of team GF and GA, if that helps you. Putting the data in another form. That's what the most basic of ice time estimates does.

No! They are two different things. One tells you how much factual icetime a player really had, the other does not (it only tells you how many GF and GA the player was on ice for, which is something else).

As will doing any calculations that do not consider ice time. Per-game calculations, for example, are misleading because they do not consider differences in ice time.

Not necessarily. I think per game stats should be per game stats, regardless of the players' icetimes in those games.

If, however, wanting to look at "per minute" or "per 60 minute" stats, icetimes are necessary. And then it is a bad idea to use GF+GA to create "estimated icetimes".


I am very sure about what I wrote in my previous post (and what I write in this one), and am surprised it isn't common knowledge here among hockey fans. If people have examined things, and published what can be learned, I wish those learnings could sort of stick (and be used to build further upon).

In this case, GF+GA is not reliable when estimating icetimes.
I'm not sure if this is understandable, but if 5 % is the average error, it means that many times the error is LARGER than that - sometimes much larger. And we do not know when it is.

Also, ES GF+GA cannot be calculated by simply taking TGF (or TGA) minus PPGF (or PPGA). It's a fact everyone working with stats should be aware of. SH goals affect things significantly.

Combining those two things, will make things even worse. We end up with such unreliable stats that we do not know how they fit reality.
Then, when people look at the presentations, they notice that "Player A is 5 % better than player B", and draw conclusions based on it, while in reality - if using factual data - it might as well have been the other way around.

And so on... Error rates grow even further.
Estimated icetimes are useless when determining how a player did on a "per x minute" basis.
 

Iain Fyffe

Hockey fact-checker
No! They are two different things. One tells you how much factual icetime a player really had, the other does not (it only tells you how many GF and GA the player was on ice for, which is something else).
It does no such thing. It tells you about how much ice time a player probably had.

I'm talking about ice time estimates here. Why would you pay any attention to GF% and GA% if you weren't interested in the approx. proportion of the team's minutes the player was playing?

Not necessarily. I think per game stats should be per game stats, regardless of the players' icetimes in those games.
That's great, but my point is that in this context "game" for Player A is not necessarily the same as "game" for Player B, making the results inherently misleading. If one player plays 20 minutes including 5 on the PP, while the other plays only 12 minutes at ES, then any per-game comparison between them is inherently misleading on the player level, because you're not comparing players on the same basis.
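A quick worked example of the 20-minute versus 12-minute case. The 0.5 points per game figure is made up, and the sketch ignores the PP/ES split mentioned above; it only shows how identical per-game numbers turn into different per-60 rates once ice time is taken into account.

```python
def per_60(rate_per_game, minutes_per_game):
    """Convert a per-game rate into a per-60-minutes rate."""
    return rate_per_game / minutes_per_game * 60


# Hypothetical: both players score 0.5 points per game.
player_a = per_60(0.5, 20)  # plays 20 minutes a game
player_b = per_60(0.5, 12)  # plays 12 minutes a game

print(f"A: {player_a:.2f} per 60, B: {player_b:.2f} per 60")
```

On a per-game basis the two look identical; per 60 minutes, Player B comes out at about 2.5 against about 1.5 for Player A.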

I am very sure about what I wrote in my previous post (and what I write in this one), and am surprised it isn't common knowledge here among hockey fans. If people have examined things, and published what can be learned, I wish those learnings could sort of stick (and be used to build further upon).
You'll have to be more specific here. What are you surprised isn't common knowledge?

In this case, GF+GA is not reliable when estimating icetimes.
I'm not sure if this is understandable, but if 5 % is the average error, it means that many times the error is LARGER than that - sometimes much larger. And we do not know when it is.
It is understandable. 5% is the average error, but the error can be 0% for some players and 15% for others (that's about the highest for my estimator). And as long as you bear that in mind, there's no issue.
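To picture what an average of 5% with a range from 0% to about 15% looks like, here is a small sketch. The seven per-player percentage errors are invented to match those three figures; they are not output from the estimator.

```python
from statistics import mean

# Hypothetical absolute errors, in percent, for seven players (not real data).
pct_errors = [0, 1, 2, 4, 5, 8, 15]

average = mean(pct_errors)
worst = max(pct_errors)
above_average = [e for e in pct_errors if e > average]

print(f"average {average}%, worst {worst}%, "
      f"{len(above_average)} of {len(pct_errors)} players above the average")
```

Both statements hold at once in this toy set: the average is 5%, and a couple of players are off by noticeably more than that.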

What do you mean by reliable? Something does not have to be 100% accurate to be useful.

Also, ES GF+GA cannot be calculated by simply taking TGF (or TGA) minus PPGF (or PPGA). It's a fact everyone working with stats should be aware of. SH goals affect things significantly.
Absolutely, SH GF and GA have to be estimated, which can be done fairly well. It's part of what is built into the average error.

Combining those two things, will make things even worse. We end up with such unreliable stats that we do not know how they fit reality.
Then, when people look at the presentations, they notice that "Player A is 5 % better than player B", and draw conclusions based on it, while in reality - if using factual data - it might as well have been the other way around.
I don't disagree that people can and will misuse such estimates. That doesn't mean it's an issue with the estimates, it's an issue with how people use them.

I don't disagree that people don't understand the role that normal variation plays in player statistics. But the solution to that is not to stop using statistics because some people might misuse them. Illustrate how they're misused, when they're misused.
 
