Alright well, time to bump this thread.
I pretty much skipped 2020 due to the weirdness of everything going on, but I wanted to take a stab at this year.
This year it's a bit different. I decided to take someone else's advice and essentially glue together stats from the past 2 seasons to create a larger data set. What this meant was that I had to tackle something that had been on my todo list for a long time.
You see, when entering players into my database, I try to keep it simple: league played, games played, points. Simplicity is the idea, after all. But what to do if a player plays in multiple leagues? Generally speaking, I take the league the player played the most games in. If it's close, I take the league that is strongest...but even that isn't so cut and dried. My general rule of thumb is "be most favorable to the player." So if a guy played 25 games in league A and put up 25 points, and then was promoted to much stronger league B and played 25 games there but only had 1 point, I will probably use his league A stats, figuring that he probably wasn't getting ice time in league B, and I won't count it against him. It's not exactly a science, and some subjectivity has had to go into it, but usually it's a pretty easy call, and there have only been a small handful of players for whom this would have made any real difference.
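For comparison, the old one-league rule might be sketched roughly like this. It's only an illustration under assumptions: the `CLOSE_RATIO` cutoff for "close" is a made-up number, the league strengths in the example are invented, and "most favorable to the player" is read here as "best strength-adjusted PPG," which is just one way to pin down a rule that in practice involves some judgment.

```python
from dataclasses import dataclass


@dataclass
class LeagueLine:
    league: str
    gp: int          # games played
    points: int
    strength: float  # league strength relative to the reference league (1.00)

    @property
    def adjusted_ppg(self) -> float:
        # Points per game scaled by league strength.
        return self.points / self.gp * self.strength


# Hypothetical cutoff: a league counts as "close" if its GP is at least
# 75% of the most-played league's GP.
CLOSE_RATIO = 0.75


def pick_single_league(lines: list[LeagueLine]) -> LeagueLine:
    """Illustrative sketch of the one-league-per-player rule, not the actual code."""
    most_gp = max(ln.gp for ln in lines)
    # Only leagues with a GP total close to the most-played league are candidates.
    candidates = [ln for ln in lines if ln.gp >= CLOSE_RATIO * most_gp]
    # Be most favorable to the player: keep the best strength-adjusted PPG.
    return max(candidates, key=lambda ln: ln.adjusted_ppg)


# The 25 GP / 25 GP example above, with made-up strengths.
choice = pick_single_league([
    LeagueLine("League A", 25, 25, 0.80),
    LeagueLine("League B", 25, 1, 1.00),
])
print(choice.league)  # League A
```

If one league clearly dominates the games played, it is the only candidate, so a hot streak in a handful of games elsewhere can't override it.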
But what I've always wanted to do was to basically take a weighted average. If a player plays 40 games in league A and 20 games in league B, I will use both data points, but I will weight league A more heavily.
(Warning, math to follow)
The math on this is simple. Take this example:
[TABLE="class: brtb_item_table"][TBODY][TR][TD]League[/TD][TD]GP[/TD][TD]P[/TD][TD]P/GP [/TD][TD]Lg strength [/TD][/TR]
[TR][TD]League A[/TD][TD]35[/TD][TD]30[/TD][TD] 0.86[/TD][TD] 1.00[/TD][/TR]
[TR][TD]League B[/TD][TD]15[/TD][TD]23[/TD][TD] 1.53[/TD][TD] 0.85[/TD][/TR][/TBODY][/TABLE]
The player played 15 games in the weaker League B (85% as strong as League A) and put up 1.53 PPG there before being promoted to League A, where he put up 0.86 PPG.
(Note: I made up these numbers; this is not a real player.)
So we now calculate the strength-adjusted PPG for each league (PPG times league strength) and then take an average weighted by games played, which for this player is:
((0.86 * 1.00 * 35) + (1.53 * 0.85 * 15)) / 50 = 0.99
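In general form, for each league $i$ a player appeared in, with games played $\mathrm{GP}_i$, points $P_i$ and league strength $s_i$, the calculation is below. Because $\mathrm{PPG}_i \cdot \mathrm{GP}_i = P_i$, it reduces to total strength-adjusted points divided by total games, which also avoids rounding in the P/GP column.

$$
\text{score} = \frac{\sum_i \mathrm{PPG}_i \, s_i \, \mathrm{GP}_i}{\sum_i \mathrm{GP}_i} = \frac{\sum_i P_i \, s_i}{\sum_i \mathrm{GP}_i}
$$

For the example player: $(30 \cdot 1.00 + 23 \cdot 0.85) / 50 = 49.55 / 50 = 0.991 \approx 0.99$.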
So his final result is that he played in multiple leagues and has a "score" of 0.99. This "score" doesn't mean anything on its own; it's just a ranking mechanism.
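A minimal Python sketch of the same calculation, assuming each league is stored as a (GP, points, strength) tuple. The function name and data layout are hypothetical, not the actual database code; it works from raw points, so it doesn't depend on the rounded PPG values.

```python
def mixed_league_score(lines):
    """Games-weighted average of strength-adjusted PPG across leagues.

    lines: (gp, points, strength) tuples, one per league.
    """
    lines = list(lines)  # allow any iterable; we loop over it twice
    total_gp = sum(gp for gp, _, _ in lines)
    if total_gp == 0:
        raise ValueError("player has no games played")
    # PPG * strength * GP reduces to points * strength for each league.
    weighted_points = sum(points * strength for _, points, strength in lines)
    return weighted_points / total_gp


# The made-up player from the table: League A, then League B.
example = [(35, 30, 1.00), (15, 23, 0.85)]
print(round(mixed_league_score(example), 2))  # 0.99
```

A player who only appeared in one league gets plain strength-adjusted PPG, so single-league and mixed-league players end up on the same scale.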
Anyway, let me know if you have any thoughts or suggestions; I'm just being transparent about the methodology. I have been *wanting* to do this for a long time but wasn't super incentivized to, since it didn't affect things enough. But now that I'm looking at each player's last 2 seasons, dealing with players who played in multiple leagues is far more of a problem. So I finally did it.
One other thing. Since I'm taking the last two seasons, I haven't yet decided whether I should also weight this season slightly more than last season. I feel like I should, but there are reasons not to as well. I'm not really sure. For now, I am not. So if a player played 2020 in the Allsvenskan and 2021 in the SHL, I combine the data and use the methodology outlined above to compute a weighted average.
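If I do end up weighting seasons, one way it could plug into the same formula is to scale each league-season's games by a season weight. This is a sketch under assumptions: the `season_weights` parameter and the 0.5 value are hypothetical, and the stat lines are invented.

```python
def combined_score(rows, season_weights=None):
    """Weighted average of strength-adjusted PPG across seasons and leagues.

    rows: (season, gp, points, strength) tuples, one per league per season.
    season_weights: optional {season: weight}; seasons not listed get 1.0,
    so with no weights every game counts equally (the current approach).
    """
    weights = season_weights or {}
    weighted_points = 0.0
    weighted_gp = 0.0
    for season, gp, points, strength in rows:
        w = weights.get(season, 1.0)
        weighted_points += w * points * strength  # strength-adjusted points
        weighted_gp += w * gp
    if weighted_gp == 0:
        raise ValueError("no weighted games played")
    return weighted_points / weighted_gp


# Made-up player: 2020 in a weaker league, 2021 in the reference league.
rows = [(2020, 40, 20, 0.80), (2021, 40, 20, 1.00)]
print(round(combined_score(rows), 3))               # 0.45
print(round(combined_score(rows, {2020: 0.5}), 3))  # 0.467
```

Setting the 2020 weight to 0 is equivalent to dropping 2020 entirely, so one function covers "ignore last season," "count it equally," and anything in between.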
With all that out of the way, I have a very preliminary top 10, which I think is going to be quite different from other top 10s. Your feedback on who I might be missing or over- or under-evaluating might help me find bugs in the code I wrote to implement the methodology, so it would be very appreciated.
Here we go. First draft!
1. Xavier Bourgault
2. Matthew Coronato
3. Zachary L'Heureux
4. Dylan Guenther
5. Ryder Korczak
6. Isaac Belliveau
7. Zachary Bolduc
8. Valtteri Koskela
9. Viljami Marjala
10. Alexander Kisakov
There are definitely some kinks to work out here. The first thing I noticed was that William Eklund fell pretty far from where I had him before. This is basically because his 2 points in 20 SHL games last year are hurting him now, when they weren't a factor before. This might be an indication that I should weight 2020 numbers less than 2021 numbers: in all of my previous drafts those numbers wouldn't have hurt the player, and they're only hurting him now because I need to glue last year's stats on to increase the sample size.
If I ignore 2020 altogether, but otherwise leave the "mixed league" method in place I get quite a different top 10!
1. William Eklund
2. Olen Zellweger
3. Xavier Bourgault
4. Matthew Coronato
5. Kent Johnson
6. Olivier Nadeau
7. Matthew Beniers
8. Brandt Clarke
9. Cole Sillinger
10. Ayrton Martino
So yeah, not sure! Maybe I should adopt the approach of only gluing on 2020 stats when I have to? So Olen Zellweger, with his 11 GP, would have his 2020 stats added on, but William Eklund, with his 40 GP, is fine as is. I'm really not sure. What do y'all think?
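That "only if needed" rule could look something like this. `MIN_GP = 20` is an arbitrary placeholder (the right cutoff is exactly the open question), the function names are hypothetical, and the stat lines are made up rather than Zellweger's or Eklund's actual numbers.

```python
# Hypothetical threshold: below this many 2021 GP, glue on 2020 stats.
MIN_GP = 20


def score(rows):
    """Games-weighted average of strength-adjusted PPG; rows are (gp, points, strength)."""
    total_gp = sum(gp for gp, _, _ in rows)
    if total_gp == 0:
        raise ValueError("no games played")
    return sum(points * strength for _, points, strength in rows) / total_gp


def rows_to_use(rows_2021, rows_2020, min_gp=MIN_GP):
    """Use 2021 alone if the sample is big enough, otherwise add 2020 too."""
    if sum(gp for gp, _, _ in rows_2021) >= min_gp:
        return list(rows_2021)
    return list(rows_2021) + list(rows_2020)


# Made-up players mirroring the 11 GP vs 40 GP cases above.
small_sample = rows_to_use([(11, 8, 1.00)], [(30, 15, 0.90)])
big_sample = rows_to_use([(40, 30, 1.00)], [(20, 2, 1.00)])
print(len(small_sample), len(big_sample))  # 2 1
print(round(score(big_sample), 2))         # 0.75
```

The trade-off is a hard edge at the threshold: a player just under it gets last season glued on while a player just over it doesn't, which a sliding season weight would smooth out.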