Iain Fyffe
Hockey fact-checker
The subject of estimating ice time for seasons before it was tracked came up in another thread. Killion suggested I start a thread on this topic. So here we go.
Killion wrote:
However, as I've suggested, if you do start a thread on it, just make sure you're doing it in a compartmentalized format: by era, by position, by roster size and number of games, etc. But sure... good idea. It will require some serious knowledge of the game and its eras, of actual strategies and in-game situations, of the reliance upon star players, and of the movement of utility players (Red Kelly, for example) from the blue line to forward and back within games, resulting in greater ice time than one might imagine. Hull on the point on the PP. PK situations, and against which opponents. Offensive stars and first-liners "sat" late in a game when the score's out of reach, and so on. Defensive pairings racking up astronomical minutes. The change in the ebb and flow before short shifts. Pacing. A great many factors are involved, and if you're game, by all means, start the thread(s).
I'm only going to get into the basic principles here, and specifically I'm only going to address the post-expansion era, because before 1967 we're missing the juiciest data for estimating ice time; estimates before expansion will necessarily use a somewhat different methodology, and will be less reliable.
To begin with, to estimate ice time with any semblance of accuracy you need to do it team-by-team, not player-by-player. It's a team's total available ice time that is divided amongst its players, so that's where you have to start. If you just look at an individual player, you could end up with an unsupportable estimate such as Bobby Hull playing 40 minutes per game. You need to start with the team.
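A minimal Python sketch of the team-first constraint, for illustration only. It assumes every game is 60 minutes of regulation with five skaters on the ice at all times (ignoring penalties, pulled goalies and overtime), and the function names are hypothetical rather than part of any existing tool. Whatever produces the raw per-player numbers, rescaling them to the team's available skater-minutes keeps a roster from collectively claiming more ice time than the team actually had.

```python
def team_skater_minutes(games: int, minutes_per_game: float = 60.0, skaters: int = 5) -> float:
    """Rough total skater ice time available to a team over a season.

    Assumes regulation-only games with five skaters on the ice at all times;
    penalties, pulled goalies and overtime all shift the true figure.
    """
    return games * minutes_per_game * skaters


def scale_to_team(raw: dict[str, float], team_total: float) -> dict[str, float]:
    """Rescale raw per-player estimates so they sum exactly to the team's total."""
    factor = team_total / sum(raw.values())
    return {player: minutes * factor for player, minutes in raw.items()}


budget = team_skater_minutes(82)  # 82 * 60 * 5 = 24,600 skater-minutes
# Hypothetical raw estimates that overshoot a 4,000-minute budget get scaled down proportionally.
print(budget, scale_to_team({"A": 3000.0, "B": 2000.0}, 4000.0))
```

Under those assumptions an 82-game team has 24,600 skater-minutes to hand out. Uniform rescaling is just the simplest way to enforce that budget; a real model might distribute the correction differently.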
You also want to be systematic. I won't use the term objective, although it could arguably apply, because there is judgment involved in building the model in the first place. However, you want it to be systematic in order to avoid personal or cognitive biases entering into it. You want to be able to apply the same estimator to every player, and not make exceptions here and there. Obviously no estimate will be perfect, and some players will be off by more than others. Indeed it's possible that certain types of players will be off by more than others. But a model with a known error range is better than "I don't know", and it's better than a guess which has an unknown error range.
So what sort of inputs should we use? Obviously, games played are important. Scoring totals are useful, but only to a limited degree, because players accumulate points at widely varying rates. What we really want is something that reflects the player's presence on the ice and is subject to relatively little influence by the individual player. By far the best candidate is the set of component parts of plus-minus. Not the plus-minus rating itself, which is useless here, but the totals of goals for and goals against, power-play and non-power-play, that the player is on the ice for (that is, TGF and TGA, total goals for and against, and PGF and PGA, their power-play portions). A player has less control over these items than he does over his goal or assist totals, since they involve not only his teammates but his opponents as well.
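To show why the rating itself is useless for this purpose, here is a small Python sketch. It assumes the usual relationship between the components: plus-minus counts only non-power-play goals, so it equals (TGF - PGF) - (TGA - PGA). The class name and both stat lines are invented for illustration. Both players finish at +1, but one was on the ice for more than four times as many goals, which is exactly the information the rating throws away.

```python
from dataclasses import dataclass


@dataclass
class OnIce:
    """One player-season's plus-minus components (on-ice goal totals)."""
    tgf: int  # total goals for while the player was on the ice
    pgf: int  # power-play goals for while on the ice
    tga: int  # total goals against while on the ice
    pga: int  # power-play goals against (opponent on the power play) while on the ice

    def plus_minus(self) -> int:
        # Assumed usual definition: power-play goals are excluded in both directions.
        return (self.tgf - self.pgf) - (self.tga - self.pga)

    def involvement(self) -> int:
        # Every goal the player was on the ice for, regardless of direction.
        return self.tgf + self.tga


# Invented stat lines: identical ratings, very different amounts of on-ice evidence.
depth = OnIce(tgf=20, pgf=2, tga=18, pga=1)
workhorse = OnIce(tgf=90, pgf=30, tga=85, pga=26)
for name, line in [("depth", depth), ("workhorse", workhorse)]:
    print(name, line.plus_minus(), line.involvement())
```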
Whenever a player is on the ice, he leaves behind data that he was there. Regardless of why a player was on the ice, he will leave behind statistical clues that he was on the ice. And that's what we're concerned with here. If you want to interpret the estimate afterwards to determine why it comes out as it does, that's fine, but first you need to have the estimate. As such, much of what Killion mentions above does not need to be considered individually. We're looking for the player's ice time, not the coach's reasoning for giving him that amount of ice time. If a player plays both forward and defence in a game, that will affect his TGF, PGF, TGA and PGA. If first-liners sit late in some games, that will affect their TGF, PGF, TGA and PGA, since you cannot accumulate them if you're not on the ice. Again, to develop an estimate it doesn't matter why a player was or was not on the ice; what matters is that he was or was not on the ice. If you get bogged down by such details, you risk missing the forest for the trees.
So the plus-minus components are the building blocks of the ice time estimate, specifically comparing a player's totals to the team's totals. (If we had shots for and shots against while the player was on the ice, as we do today, that would be even better because they're more granular, but we can only work with what we have.) I'm not going to go into detail on my exact method right now, in part because I'm currently revising it based on an idea I had recently. However, we can look at some aggregate figures to get a sense of its current accuracy. Let's look at 1997/98, the first season for which the NHL tracked ice time.
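The exact method is deliberately left out of the post, so the following is not it. It is a hypothetical, deliberately naive way to compare a player's totals to the team's totals, written in Python: assume goals arrive at a steady rate within each situation, so a player on the ice for a fraction f of his team's goals in a situation played roughly f of the team's clock minutes in that situation. The team's even-strength, power-play and shorthanded clock minutes are treated as known inputs (for pre-1997/98 seasons they are not), the "non-power-play" bucket lumps shorthanded goals in with even strength, and all names and figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Plus-minus components: one player's on-ice totals, or the team's totals."""
    tgf: int
    pgf: int
    tga: int
    pga: int


def share(player: int, team: int) -> float:
    # Fraction of the team's goals in a situation that the player was on the ice for.
    return player / team if team else 0.0


def naive_minutes(p: Components, t: Components,
                  es_min: float, pp_min: float, pk_min: float) -> float:
    """Hypothetical proportional estimator (NOT the method described in the post)."""
    es = share((p.tgf - p.pgf) + (p.tga - p.pga), (t.tgf - t.pgf) + (t.tga - t.pga))
    pp = share(p.pgf, t.pgf)  # on-ice power-play goals for
    pk = share(p.pga, t.pga)  # on-ice goals against while shorthanded
    return es * es_min + pp * pp_min + pk * pk_min


player = Components(tgf=70, pgf=30, tga=45, pga=10)  # invented player line
team = Components(tgf=210, pgf=60, tga=200, pga=50)  # invented team totals
print(naive_minutes(player, team, es_min=4100, pp_min=420, pk_min=400))
```

With these invented figures the player gets 0.25 × 4,100 + 0.5 × 420 + 0.2 × 400 = 1,315 minutes. Because each goal is shared by every skater on the ice, the shares across a full roster sum to roughly the number of skaters on the ice, so estimates of this kind stay close to the team's total ice time by construction.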
If we look at the top 50 centres by actual ice time that season, we find they played a total of 73,377 actual minutes, and my estimator gives them a total of 73,855 estimated minutes. The total error (the sum, over all players, of the absolute difference between estimated and actual ice time) is 4,114 minutes. That is 5.6% of the estimated time, which means that, weighted by ice time, the average estimate is 5.6% off.
For the top 50 left wings, the figures are: 61,204 actual minutes; 61,752 estimated minutes; 3,532 total error; 5.7% error rate.
For the top 50 right wings, the figures are: 66,763 actual minutes; 66,905 estimated minutes; 3,244 total error; 4.8% error rate.
For the top 100 defencemen, the figures are: 163,459 actual minutes; 157,122 estimated minutes; 9,665 total error; 6.2% error rate.
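The error measure used above is easy to reproduce. This Python sketch first applies the per-player definition to made-up numbers: over- and under-estimates cancel in the totals but not in the error, which is how the forward totals agree to within 1% while the error rates sit around 5%. It then recomputes the reported rates from the published aggregates, dividing by estimated minutes as the post does. The function names are illustrative only.

```python
def total_abs_error(actual: list[float], estimated: list[float]) -> float:
    """Sum over players of |estimated - actual| minutes."""
    if len(actual) != len(estimated):
        raise ValueError("need exactly one estimate per player")
    return sum(abs(e - a) for a, e in zip(actual, estimated))


# Made-up three-player example: the totals match exactly, the error does not.
actual = [1500, 1400, 1300]
estimated = [1550, 1350, 1300]
print(sum(actual) == sum(estimated), total_abs_error(actual, estimated))

# (actual minutes, estimated minutes, total absolute error) for 1997/98, as reported above.
FIGURES = {
    "C (top 50)": (73_377, 73_855, 4_114),
    "LW (top 50)": (61_204, 61_752, 3_532),
    "RW (top 50)": (66_763, 66_905, 3_244),
    "D (top 100)": (163_459, 157_122, 9_665),
}


def summarize(actual: int, estimated: int, abs_error: int) -> tuple[float, int]:
    """Error rate (% of estimated minutes, one decimal) and net estimated-minus-actual minutes."""
    return round(100 * abs_error / estimated, 1), estimated - actual


for group, figures in FIGURES.items():
    rate, net = summarize(*figures)
    print(f"{group}: {rate}% error, net {net:+,} minutes")
```

Dividing by actual minutes instead barely changes the forward figures but gives about 5.9% for the defencemen. The net column also shows the defence estimates running roughly 6,300 minutes short in aggregate, while the forward totals come out slightly high.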
So the estimates for the defencemen are the least reliable, but a 6% error is still pretty decent. This is the model that gets used for earlier seasons, from 1967/68 onward, when we do not have actual ice times but do have plus-minus components recorded. And as for any changes in the way the game has been played since then, just remember: it does not matter why the player was on the ice, or when the player was on the ice; it matters that he was on the ice, and that when players are on the ice, they leave behind clues that they were on the ice. That is the guiding principle.
It doesn't matter that shifts are shorter now; goals for and against are still recorded for every player who is on the ice at the time. It doesn't matter if a forward plays the point on the PP; he leaves behind clues that this was the case. As I said, certain types of players (that is, players used in certain sets of situations) will have a greater average error, but no system is perfect, and being systematic is better, on the whole, than assigning minutes by inspection, which lets personal biases creep in and does not consider all of the available information.