The Problem with HR's Adjusted Points

20feet

Registered User
Nov 1, 2022
8
13
A couple years ago, I set out to create my own ranking of the greatest players of all time, based in part on Hockey Reference's Adjusted stats. I soon realized that players from the 40s, 50s and 60s rate poorly according to these stats, and I wondered if that was an accurate reflection of their achievements, or an artifact of HR's methods.

Most of the adjustments they make are pretty common-sense. You obviously want to adjust for the length of season, goals per game, and assists per goal. But they also adjust for roster size, applying a penalty to players before 82-83, when rosters were smaller. I've concluded that this adjustment is a mistake.

The reasoning, I think, is that smaller rosters mean players get more TOI on average per game. Players on a 16-man roster play 11% more on average than those on an 18-man roster; shouldn't we adjust their point totals accordingly? This sounds reasonable, but we can check whether this adjustment produces plausible results.

I'll focus on Adjusted Goals (G/A) here, because then I can ignore the assists per goal adjustment. If you look at the 37 players who've scored 60 G/A, the first thing you might notice is there are a lot from the 1920s and 30s. Look closer, and you'll see very few from the following decades. Specifically, there are:

10 from 1925-32 - more than one per year!
3 from 1933-71 - one every 13 years
3 from 1972-82 - one every 3 1/2 years
21 from 1983-2024 - one every 2 years

The shift after 1932 is stark, from 60 G/A seasons being routine, to almost non-existent. I've partitioned the last 3 eras based on changes to roster sizes: from 16 skaters to 17 in 1972, and to the current standard of 18 in 1983. There were several other changes to rosters in the 60s and earlier, but generally rosters were smaller than 17. You can see that the differences between these eras are quite dramatic.
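The per-era frequencies quoted above can be re-derived with a few lines of Python. This is only a sketch: the counts and year ranges are copied from the list in this post, and treating each range as an inclusive span of years is an assumption.

```python
# Counts of 60+ adjusted-goal seasons per era, copied from the list above.
# Each era is (label, first_year, last_year, count); the year ranges are
# treated as inclusive spans, which is an assumption of this sketch.
ERAS = [
    ("1925-32", 1925, 1932, 10),
    ("1933-71", 1933, 1971, 3),
    ("1972-82", 1972, 1982, 3),
    ("1983-2024", 1983, 2024, 21),
]


def years_per_season(first_year, last_year, count):
    """Average number of years between qualifying seasons in an era."""
    return (last_year - first_year + 1) / count


total_seasons = sum(count for _, _, _, count in ERAS)

for label, first_year, last_year, count in ERAS:
    gap = years_per_season(first_year, last_year, count)
    print(f"{label}: {count} seasons, one every {gap:.1f} years")
```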

It's unclear exactly what numbers HR is using for historical roster sizes, but one account of the history is here. Bear in mind, adjusting for a roster of 17 means removing 5.5% from a player's goal total. If they play on a roster of 16, they're penalized by 11%, and if it's 14, 22%!
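A minimal sketch of the arithmetic behind those percentages, assuming the adjustment simply scales a total by (skaters / 18). That scaling is an assumption inferred from the figures in this post, not HR's published formula, and the function names are made up for illustration.

```python
BASELINE_SKATERS = 18  # current standard roster, per the post


def roster_penalty(skaters, baseline=BASELINE_SKATERS):
    """Fraction removed from a total when it is scaled by skaters / baseline."""
    return 1 - skaters / baseline


def extra_ice_time(skaters, baseline=BASELINE_SKATERS):
    """How much more the average skater plays on the smaller roster."""
    return baseline / skaters - 1


for skaters in (17, 16, 14):
    print(
        f"{skaters} skaters: penalty {roster_penalty(skaters):.1%}, "
        f"average ice time +{extra_ice_time(skaters):.1%}"
    )
```

Under this assumption the penalties come out to roughly 5.6%, 11.1% and 22.2%, in line with the figures above. Note that the share removed (2/18) and the extra average ice time (18/16 - 1) are slightly different numbers.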

I've looked at this a few different ways, and I'm convinced that the roster size adjustment is misguided, and is artificially depressing the numbers of players before 1983. I'll have more to share in the future, but I'm curious what others think, and if someone else out there is publishing adjusted stats without the roster size adjustment.
 

The Panther

Registered User
Mar 25, 2014
19,772
16,655
Tokyo, Japan
There are numerous complications with ANY employed system of "adjusted stats", and the one used by Hockey Reference -- if I remember correctly -- is based on a baseball system, which is a more linear and simple sport in terms of scoring than is hockey.

The roster-size impact you note is one of the factors to consider, yes. But I'd say it's a fairly minor factor. After all, the top players are always going to get the most ice-time. A top scorer isn't going to get 11% more ice-time before 1982-83. More likely, a 4th liner is going to have had more.

Anyway, other factors not taken into account by Hockey Ref. (just off the top of my head):

-- Distribution of scoring across lines/teams
This is probably the biggest problem with adjusting scoring simply based on League scoring rates. It's possible, for example, that in 1975 an average second line accounted for 24% of a team's total goals. But maybe in 2011 an average second line accounted for just 15% of a team's total goals. The simple Hockey Ref. system doesn't take this into account, which means top-line scorers in 1975 will be 'punished' more and top-line scorers in 2011 'rewarded' more.

-- Differences in PP frequency / PP goals
Obviously, a huge number of PP opportunities will drive up offense. Hockey Ref. doesn't take this into account at all.

-- Length of Season
I suppose there's no way to account for this statistically, but logically we can say that it's harder to maintain a scoring (or winning) pace for 80 games than it is for 50 games (as in 1943). Again, Hockey Ref. doesn't take this into account.

-- Outliers skewing stats
An obvious example here would be Boston in 1970-71 (literally more than doubling other teams in scoring; having 7 of the top-10 scorers in the League) or the Gretzky-Oilers of the early-80s driving up scoring averages. This 'skewing' would be far more pronounced the smaller the League is (i.e., in the very old days).

There are lots of other factors, too, but those are just some.
 

overpass

Registered User
Jun 7, 2007
5,399
3,295
The biggest issue, especially as it relates to pre-expansion hockey, is that the era adjustment is flawed. It implicitly assumes that the average team in a 6 team league is of equal strength to the average team in a 30 team league.

Next calculate the era adjustment, which we will do by dividing 6 by the league average goals per game without the player in question. In 1952-53 a total of 1006 goals were scored in 210 games. Without Howe this works out to (1006 - 49) / 210 = 4.56 goals per game, so our era adjustment is 6 / 4.56 = 1.32.
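The quoted calculation can be reproduced directly. This is a sketch of that one step only; the function name and the 6.0 goals-per-game baseline parameter are labels chosen here for illustration, with the baseline taken from the "dividing 6 by" wording above.

```python
def era_adjustment(league_goals, player_goals, games, baseline_gpg=6.0):
    """Return (goals per game without the player, era adjustment factor)."""
    gpg_without_player = (league_goals - player_goals) / games
    return gpg_without_player, baseline_gpg / gpg_without_player


# 1952-53 figures from the quoted passage: 1006 league goals in 210 games,
# with Howe's 49 goals removed from the league total.
gpg_1953, factor_1953 = era_adjustment(1006, 49, 210)
print(f"{gpg_1953:.2f} goals per game, era adjustment {factor_1953:.2f}")
```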

In particular, it doesn't consider that talent was more concentrated when the league had fewer teams. If you compare 1966-67 (6 teams) to 1968-69 (12 teams), the top scorers scored significantly more in the larger league. We know the difference is because 6 new teams of players who couldn't make the 66-67 NHL were added, so it was in fact easier to score points in the 1968-69 NHL. But the formula makes no distinction between the average team of 66-67 and the average team of 68-69, although 6 low scoring expansion teams are included in the 1968-69 average which were not in the 1966-67 average.

Compare the top scorers of the 6 team 1966-67 season to the 12 team 1968-69 season.

1966-67 - The top 10 scorers averaged 67.7 points scored. H-R adjusts them to an average of 73.2 adjusted points.

1968-69 - The top 10 scorers averaged 93.5 points scored. H-R adjusts them to an average of 91.7 adjusted points.

When the adjustments conclude that the top 10 scorers were an average of 18 points better just 2 years later, you know they are missing something.

Another way to look at this is that in 1966-67, the average team scored 209 goals. In 1968-69, the average team scored 227 goals - but the average Original Six team, i.e. the same teams from the 1966-67 league, scored 260 goals. It should be obvious that, if comparing how easy it was to score in the two seasons, 260 vs 209 is more accurate than 227 vs 209. But H-R's method compares 227 and 209.
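To put the two comparisons side by side, here is a small sketch using the team totals quoted above. The schedule lengths (70 games in 1966-67, 76 in 1968-69) are not stated in the post and are added here as an assumption so the totals can be compared per game.

```python
# Average goals per team, as quoted in the post above.
AVG_TEAM_GOALS_1967 = 209      # 1966-67, six teams
AVG_TEAM_GOALS_1969 = 227      # 1968-69, all twelve teams
AVG_ORIGINAL_SIX_1969 = 260    # 1968-69, Original Six teams only

# Schedule lengths are an assumption added for this sketch (not in the post).
GAMES_1967 = 70
GAMES_1969 = 76


def per_game_ratio(goals_later, games_later, goals_earlier, games_earlier):
    """Ratio of per-game scoring in the later season to the earlier one."""
    return (goals_later / games_later) / (goals_earlier / games_earlier)


league_ratio = per_game_ratio(
    AVG_TEAM_GOALS_1969, GAMES_1969, AVG_TEAM_GOALS_1967, GAMES_1967
)
original_six_ratio = per_game_ratio(
    AVG_ORIGINAL_SIX_1969, GAMES_1969, AVG_TEAM_GOALS_1967, GAMES_1967
)

print(f"League average:     {league_ratio:.2f}x")
print(f"Original Six only:  {original_six_ratio:.2f}x")
```

On these numbers the league-wide per-game rate barely moves, while the Original Six teams score roughly 15% more per game. That is the gap a league-average adjustment cannot see.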

It's easy to say the method is wrong, but it's not so easy to say how to fix it. You would have to have a model for the depth of scoring talent in each NHL season. I don't propose to do so here, just to say that H-R's pre-expansion adjusted points cannot be compared to their post-expansion adjusted points.
 

Golden_Jet

Registered User
Sep 21, 2005
25,011
12,686
Maybe look at rule changes over the years as well, e.g.:
At one time, no forward passing
Limited number of players in certain zones
6 skaters on the ice vs. 5
Goalies had to serve penalties, with a skater put in the net, etc.

Seems like a good history summary here
 

20feet

Registered User
Nov 1, 2022
8
13
Ya, you could identify a lot of factors that contribute to changes in scoring over the years. That context could be interesting, particularly in outlier cases like the 71 Bruins, but at a high level, it's useful to have a quick mechanical adjustment that tells you how 50 goals in 1945 compares to 69 in 2024. I think the roster adjustment is a major flaw that prevents HR's adjusted stats from doing that effectively.
 

20feet

Registered User
Nov 1, 2022
8
13
We can look at how scoring rates were affected by various rule changes, but generally these effects should all be reflected in goals per game and assists per goal.
 

20feet

Registered User
Nov 1, 2022
8
13
This is a good point, especially as it relates to the years immediately following expansion. As you say, it would be difficult to correct this mechanically. But I think making the basic adjustments is still valuable. Even though adjusting for length of season, goals per game, and assists per goal doesn't put all players on a fully even footing, it's a good starting point to which additional context can be added.

The problem with the roster size adjustment is it depresses the totals of all players before 82-83, suggesting that there were very few great offensive seasons for about 50 years. It's worse than missing context - it's a distortion that will persist regardless of any context you add on top.
 

overpass

Registered User
Jun 7, 2007
5,399
3,295
When it comes to the roster size adjustment and your observation about era effects, keep in mind that H-R is applying these formulas to adjusted points for all players, not just the top scorers. So your exercise in using the very top scoring seasons of all time to validate the formula only applies to a very small percentage of players. How does H-R's formula perform for third line forwards? To #2 defencemen? Isn't it equally important to validate the formula for them?

I don't think the roster size adjustment of 82-83 is a problem. As far as I can tell, most teams did in fact use their fourth line and third defence pairing more frequently after the 82-83 roster size change. The biggest change may have been for defencemen, as 3 defence pairings became the norm and stars were no longer playing every other shift. Most teams also saw their forward scoring more evenly distributed, at least for 5-10 years after the change. The roster size adjustment is probably appropriate for 95% of players. The players it may not work as well for are the top-end superstars like Gretzky and Lemieux, who went out for 24-27 minutes/game whether there were 16 skaters or 18 skaters.

From 1990 on, there were some trends that changed the distribution of scoring so first line forwards scored a higher percentage of points. One was an increase in the percentage of goals scored on the power play. Another was the change in third/fourth line forwards to be even more focused on checking and defensive play. Another related trend was that third and fourth line shifts were shortened, and top lines got more shifts/ice time as a result.

H-R's adjusted points formula is supposed to apply to all players, so it doesn't account for these distribution changes. If you added an adjustment to reduce adjusted points post 1990, with the goal of normalizing the very top scorers to historical levels, it would also reduce the adjusted points for depth players in this era, with no justification.

In my opinion, if you really want an adjusted points formula that will work well for the greatest scorers of all eras, the formula needs to adjust for the actual conditions that star scorers faced. You can't expect a one-size-fits-all formula to do the job.
 

20feet

Registered User
Nov 1, 2022
8
13
I agree that HR's roster size adjustment is probably more appropriate for players who are lower in the lineup. But the star players are the ones I'm interested in comparing, and I think that's by far the most common use. No one's rushing to HR to see how Johnny Wilson's 42 points in 52-53 compare to second-liners today. They want to know how Howe's 95 points compare to Lemieux or McDavid.

Knocking 14% off adjusted totals in 52-53 (15 skaters on the road, 16 at home) may be appropriate for whoever's debating Johnny Wilson vs. Yegor Sharangovich, but for Howe, that's the difference between an adjusted points peak of 131 and 152.
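The 131 vs. 152 gap follows from undoing that scaling. A minimal sketch, assuming the roster adjustment multiplies a total by (skaters / 18) and that 52-53 is treated as 15.5 skaters (the average of 15 on the road and 16 at home). Both are assumptions inferred from the numbers in this post, and the function name is hypothetical.

```python
BASELINE_SKATERS = 18


def remove_roster_adjustment(adjusted_total, skaters, baseline=BASELINE_SKATERS):
    """Undo a (skaters / baseline) scaling applied to an adjusted total."""
    return adjusted_total * baseline / skaters


# 15 skaters on the road and 16 at home, averaged (assumption).
skaters_1953 = (15 + 16) / 2

penalty = 1 - skaters_1953 / BASELINE_SKATERS
howe_without_roster_adj = remove_roster_adjustment(131, skaters_1953)

print(f"Roster penalty: {penalty:.0%}")
print(f"Howe 52-53 without it: {howe_without_roster_adj:.0f} adjusted points")
```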

If we're deep in a Howe vs McDavid debate, I'm happy to discuss whether we should account for the distribution of scoring on the power play or top line usage in the modern era. But I think an adjusted point total that addresses the obvious differences of season length and goal/assist rates is an important starting point, before weighing other factors that are harder to apply mechanically across all NHL history.

I'm happy to keep HR's adjusted stats around for whoever's comparing historical depth players, but I think it's leading us astray when discussing the all-time greats.
 

overpass

Registered User
Jun 7, 2007
5,399
3,295
Fair enough, maybe we don't really disagree then on the details of the adjustments, more on the way they are presented.

I think the idea that one formula can make all adjustments for all players is a regrettable importation from baseball. It's very possible that hockey might require different adjusted points formulae for different questions. Maybe a standard adjustment for the player pages, then a custom adjustment for the list of best adjusted point totals of all time, with an explanation of the difference.
 

20feet

Registered User
Nov 1, 2022
8
13
Ya, I don't think you can generate a single stat that will put all players on even footing. But I haven't seen anyone publish what I think would be the best basic stats for comparing top offensive players across eras (at least since the mid-1930s): HR's adjusted stats, minus the roster adjustment.

HR's adjusted stats get used in a lot of places, like Paul Pidutti's Adjusted Hockey. He then applies a "Timeline Penalty" because the league now draws from a larger population and fitness and technology are always improving. That's how he has Howe ranked below Crosby and Jagr. Now, people can have different opinions about what to adjust for when comparing stars across eras, but if the stat you start with is already shaving as much as 22% off of earlier stars and you don't realize it, you end up penalizing them more than you intend to.
 

BraveCanadian

Registered User
Jun 30, 2010
15,203
4,408
Ya, I don't think you can generate a single stat that will put all players on even footing. But I haven't seen anyone publish what I think would be the best basic stats for comparing top offensive players across eras (at least since the mid-1930s): HR's adjusted stats, minus the roster adjustment.

HR's adjusted stats get used in a lot of places, like Paul Pidutti's Adjusted Hockey. He then applies a "Timeline Penalty" because the league now draws from a larger population and fitness and technology are always improving. That's how he has Howe ranked below Crosby and Jagr. Now, people can have different opinions about what to adjust for when comparing stars across eras, but if the stat you start with is already shaving as much as 22% off of earlier stars and you don't realize it, you end up penalizing them more than you intend to.
[Image: 1722027475798.gif]
 

frisco

Some people claim that there's a woman to blame...
Sep 14, 2017
3,693
2,793
Northern Hemisphere
I'd like to see a straight goals/game adjustment only without the roster size adjustments. A person could make their own adjustments for roster size (or not) if they wanted but it would be nice to have a straight statistical one as a foundation to start with.

Once you start moving into more subjective areas (like the effects of different roster sizes, longer dead-time commercial breaks or distribution of scoring within lines) poor assumptions can skew things.

That being said, pre-end of WW II (before the red line) stuff is never really going to work well no matter how one does it. And obviously, in a general sense, the farther the time periods being compared, the more the basic game of hockey changes, making comparisons less legitimate.

My Best-Carey
 

BraveCanadian

Registered User
Jun 30, 2010
15,203
4,408
frisco said: "Once you start moving into more subjective areas (like the effects of different roster sizes, longer dead-time commercial breaks or distribution of scoring within lines) poor assumptions can skew things."

Adjusting by average goals per game automatically builds in a bunch of poor assumptions.
 

BraveCanadian

Registered User
Jun 30, 2010
15,203
4,408

People have already pointed out upthread how bad it is to adjust players' totals based only on average scoring. You only need to look at some of the completely wacky results over time to see how ridiculous they can be, especially once you go beyond front liners who get the major share of PP time.
 

ijuka

Registered User
May 14, 2016
23,089
16,234
This "adjusted point"-stuff still is pretty funny. Almost all the methodologies have to do with "logic" and "what makes sense" rather than what actually works.

Just use machine learning. Infinitely more accurate.


For example, let's say we want to rank goal scorers between different seasons. Compare #1 to #500 goalscorer from every single season to one another, having the two seasons' average statistics as features. You'd get probably hundreds of thousands of data points, if not millions. Then you'd optimize for the most accurate solution.

And there, you have a far more accurate adjusted statistics-method than any of these arbitrary ones.
 

ijuka

Registered User
May 14, 2016
23,089
16,234
As someone who does machine learning for a living, please do tell.

Specifically what techniques are you advocating for?
Which techniques?

Okay. I'd use linear regression where the targets are the multipliers in logarithmic space.

I'd compare #1 to #1, #2 to #2, and so forth, for each season as the data points. The overall season data would be used as features. And to start with, probably 5on5 g/60 would be the simplest stat to use.

If linear regression struggles with it, then I'd use a neural network instead. Probably something like 32, 16, 8 dense layers with ReLU activations and Adam, but I mean, it shouldn't be an issue.


I'm not sure why you're asking me this if you do machine learning for a living. It's not a particularly challenging problem.
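For readers who want to see the shape of the rank-matched regression described in this post, here is a rough sketch on synthetic data. Everything in it is an assumption made for illustration: the data is randomly generated (not real NHL numbers), the only feature is league goals per game rather than the 5on5 g/60 stat mentioned above, and the model is a one-parameter least-squares fit through the origin in log space.

```python
import math
import random

random.seed(0)

# Synthetic stand-in data (NOT real NHL numbers): each season has a league
# scoring rate, and the goal total of the rank-r scorer scales with it.
N_SEASONS, N_RANKS = 20, 50
TRUE_BETA = 1.0  # exponent used to generate the fake data

league_gpg = [random.uniform(5.0, 8.0) for _ in range(N_SEASONS)]
goals = [
    [
        60.0 / math.sqrt(rank) * gpg ** TRUE_BETA
        * math.exp(random.gauss(0.0, 0.05))
        for rank in range(1, N_RANKS + 1)
    ]
    for gpg in league_gpg
]


def fit_log_multiplier_slope(league_gpg, goals):
    """Least-squares slope through the origin for rank-matched season pairs.

    Feature: difference in log league goals per game between two seasons.
    Target: log of the goal multiplier between same-rank scorers.
    """
    sum_xy = sum_xx = 0.0
    n = len(league_gpg)
    for a in range(n):
        for b in range(a + 1, n):
            x = math.log(league_gpg[a]) - math.log(league_gpg[b])
            for goals_a, goals_b in zip(goals[a], goals[b]):
                y = math.log(goals_a) - math.log(goals_b)
                sum_xy += x * y
                sum_xx += x * x
    return sum_xy / sum_xx


def predicted_multiplier(gpg_from, gpg_to, beta):
    """Multiplier that converts a goal total from one season's scoring level to another's."""
    return math.exp(beta * (math.log(gpg_to) - math.log(gpg_from)))


beta_hat = fit_log_multiplier_slope(league_gpg, goals)
print(f"fitted exponent: {beta_hat:.3f}")
print(f"multiplier from 7.5 to 6.0 gpg: {predicted_multiplier(7.5, 6.0, beta_hat):.3f}")
```

With real data the feature set would be richer, and the concerns raised later in the thread about data availability and overfitting would still apply.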
 

Bear of Bad News

"The Worst Guy on the Site" - user feedback
Sep 27, 2005
13,874
28,522
I appreciate the explanation. You’re oversimplifying. Fitting a GLM to the available data ignores a lot. And even the data you describe doesn’t exist for that many years.

You’ll necessarily overfit because you don’t have enough data to model, let alone to have a holdback set.
 

ijuka

Registered User
May 14, 2016
23,089
16,234
Bear of Bad News said: "I appreciate the explanation. You’re oversimplifying. Fitting a GLM to the available data ignores a lot. And even the data you describe doesn’t exist for that many years."

It ignores a lot, but less than arbitrarily using arbitrarily chosen data. And it nevertheless reaches the most accurate possible result that can be reached with the data available. The limit is the amount and quality of data. But other methods use that same data anyway.

Bear of Bad News said: "You’ll necessarily overfit because you don’t have enough data to model, let alone to have a holdback set."

I'd say that 50000 data points has a lesser chance of overfitting than 50 data points. And I already told you the feature count would be relatively low, which makes overfitting less of a concern.

Not to mention, think about what our use case is, and what overfitting means in that context.

But if you think overfitting is a concern, use a Bayesian linear model instead. This will not overfit by design and is completely driven by data. If it doesn't have enough data for certain estimates, it introduces uncertainty proportionally. This is more complex, though.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,369
15,370
Off topic, but I find Pidutti's system interesting. It's certainly well intentioned (even though I disagree with several of his assumptions, and the output can be questionable). I might get around to doing a deeper dive at some point.
 

Dingo

Registered User
Jul 13, 2018
1,892
1,878
Has anyone ever done even strength points adjusted by league even strength goals per game, thus eliminating power play inequalities completely?
 

Bear of Bad News

"The Worst Guy on the Site" - user feedback
Sep 27, 2005
13,874
28,522
Ah, I think I see what you're intending - thank you (this is why I ask for clarification).

If your intended sample size is player-seasons, then that does get to a non-trivial sample. My primary concern is the granularity of the features (for instance, if you use a leaguewide total for a value then everyone's value for, say, 1997 will be identical). It's worth trying because then you could at least see where the holes are.
 

20feet

Registered User
Nov 1, 2022
8
13
Hockey Outsider said: "Off topic but I find Pidutti's system interesting. It's certainly well intentioned (even though I disagree with several of his assumptions, and the output can be questionable). I might get around to doing a deeper dive at some point."
Ya, I disagree with some of his choices, but my main objection is that because a lot of his inputs are HR Adjusted stats, he's penalizing earlier players more than he (or his readers) realize.
 
