The Data-Based Drafting Thread (what players would a Potato pick?)

This thread is a great exercise in analytics and data analysis but I have one pretty big gripe about the premise of calling it the potato. Calling it a potato is spitting in the face of everyone who put in the hard work of generating the league equivalency values and the many hours I’m sure it took Melvin to develop the algorithm and rank the draft eligibles according to the published research. You’re basically reducing those researchers’ contributions to something that an inanimate object is capable of doing. And why? So that you could make fun of Benning by saying he’s dumber than a potato. It’s really insulting to everyone and dishonest to represent this model as “brain dead” when it took a lot of smart people to work hard to create the model. It’s not a potato. Imagine if someone said Google’s search engine algorithms were as smart as a potato? They would be laughed out of the room and rightly so.

When you think about it, it puts a really dark, negative, and arrogant twist on something that should be a wonderful and exciting approach to drafting.


Lol so pathetic in terms of always trying to defend Benning. Make that same argument without trying to defend Benning and I might agree.

However, the idea of calling it potato drafting (and @Melvin can correct me) is to show that an inanimate object could generate a reasonable prospect ranking based on available statistical data. It’s to set a baseline for determining what “skill” there is in the visual test or scouting methodology.

I mean, we can call it the minimalist operating review of numbers scouting method if you prefer.
 
You know what sucks? When people complain about the name of something, but don't even understand what they're complaining about.

No "research" happened here. No other equivalencies are used. No other rankings are used.

RMB please don't come in here and ruin this great thread. Maybe read the OP again to get a better idea of what this is.

It's essentially scouting from the stat sheet without ever watching a single player, that's why it's called the potato - potatoes don't have eyes.
 
This thread has nothing to do with Benning whatsoever and if it triggers some sort of weird Benning defense response in you then you should maybe spend some time thinking about why it provokes such a reaction. The first page of this thread does not mention that name and is my refuge from those discussions.

This whole thing is basically a revamp of an exercise that I first tried in 2004.
 
I posted full team evaluations over a year ago to the website here. I talk about some of the biases and why we shouldn't draw too many conclusions from the potato's performance, and I also rank the teams by total NHL TOI vs. the TOI of the players the potato selected. The Canucks do okay here, ranking 18th. They would almost certainly rank higher now because of Gaudette and Hughes, but I haven't gotten around to doing updated evaluations.

For the most part I focus on Vancouver in this thread because this is the Vancouver forum but in no way has this exercise had any sort of Vancouver focus and to suggest otherwise is without merit. I really couldn't care less how Vancouver does specifically in this analysis. If it turns out with the new evaluations that Vancouver is #1 I will say so.
 
@Melvin can you speak to this? I'm curious.

Sure. I love questions like this.

Heiskanen was ranked very close to the top-20. He just barely missed the cut. Ultimately his production just wasn't that amazing. 10 in 37 in Liiga is 1st round material but he just didn't quite rank as highly as some others.

Makar was hurt by his league. I just didn't have much data on the AJHL to go on, so it hurt his ranking a lot. This is where your scouts can theoretically provide value: by informing you about leagues that we don't really know a lot about.

Only 27 players in my database had been drafted from the AJHL prior to Cale Makar, the top ones being:

Nick Johnson
Joe Colborne
Mason Raymond
Matt Frattin
Scott McCulloch

So... uh .... ¯\_(ツ)_/¯

Hughes also was not that far off the list but was docked for his size. There just have not been many 5'9" defenders doing what he has done and honestly I think people don't realize that he is one of the highest-drafted short defenders of all time. Guys who have produced like him like Sam Girard and Jordan Subban tend to go in the 2nd - 3rd round. He's really a unique and special case as a short offensive-defender who went in the top-10.

All 3 of these guys DID rank in the top-50 so it's not like they were "DO NOT DRAFT" or something but all 3 had warning signs that frankly are not controversial and are the same ones that we were talking about as humans. Poor production, poor quality of play and poor size.
 
I do wonder how much the change in style of play the NHL has undergone in the last 10-20 years skews your data.

By that I mean a player like Hughes would have been killed 25 years ago, but with the rule changes and style changes they get underrated because a 5’9” defender couldn’t produce the numbers that would accurately reflect the potential of today’s game.
 
I think a lot of that was self-fulfilling prophecy, even before, for what it's worth. I won't pretend to have any data, but I kind of feel that smaller, skilled players would always have done largely OK in the NHL if they were legitimately good, it's just that few GMs were willing to take a chance on them. You always heard about how much guys like Fleury or Ronning "beat the odds", but then how many actual examples of failed counterparts were there? I dare say almost none. It's not like these guys were regularly given a chance to fail and they never got drafted very high, as @Melvin notes.

Relatedly, I've never bought into the idea that there has actually been a massive change in play in the NHL in recent years. There is just more enlightened data management. I think physicality has always been incredibly overrated.
 

I think the problem is that people are not good at dealing with probability and so players who are small tended to fall into a binary "DO NOT DRAFT" bucket instead of just having their size factored in. I think it's absolutely legit that there is a demerit that comes with size but for many years that demerit was far too severe. Like, yes, Brayden Point should rank lower than he would be if he was 6'2", but he shouldn't fall to the f***ing 3rd round for Christ sakes. And even though the potato was sort-of designed to be non-scientific, it still handles this sort of thing in a more precise way, because it will dock Point by a more appropriate like 5% instead of just striking him from the list.
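To make the difference concrete, here is a small Python sketch contrasting a hard size cutoff with a proportional demerit. Everything in it is an assumption for illustration: the player names, scores, and heights are invented, the 72-inch threshold is hypothetical, and the 5% figure is only the ballpark mentioned above, not the potato's actual formula.

```python
def rank_with_cutoff(players, min_height_in=72):
    """Binary approach: anyone under the (hypothetical) height cutoff is struck."""
    kept = [p for p in players if p[2] >= min_height_in]
    return [name for name, _, _ in sorted(kept, key=lambda p: p[1], reverse=True)]


def rank_with_demerit(players, min_height_in=72, demerit=0.05):
    """Proportional approach: short players keep their spot in the pool
    but have their score docked by a small percentage."""
    adjusted = [
        (name, score * (1 - demerit) if height < min_height_in else score)
        for name, score, height in players
    ]
    return [name for name, _ in sorted(adjusted, key=lambda p: p[1], reverse=True)]


# Invented players: (name, production score, height in inches).
players = [
    ("Short scorer", 1.20, 70),
    ("Tall scorer", 1.15, 74),
    ("Tall depth", 0.80, 75),
]

print(rank_with_cutoff(players))
print(rank_with_demerit(players))
```

With these made-up numbers the cutoff drops the short scorer from the list entirely, while the demerit only moves him from first to second.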

I think there IS evidence that short players (especially defenders) tend to turn out worse than similar-producing players of a larger size, but for a long time this effect was greatly exaggerated. With 2018 we saw a crazy correction where like 9 tiny defenders were drafted in the first round. Time will tell if this ends up being justified but I actually feel like this might be an over-correction where these players are now overvalued. As they say, wait and see.
 
Watching Kaliyev score his first goal last night I thought I remembered his name but he was only the alternate pick. But still, way to go potato! :thumbu:
 
Alright well, time to bump this thread.

I pretty much skipped 2020 due to the weirdness of everything going on, but I wanted to take a stab at this year.

This year it's a bit different. I decided to take someone else's advice and essentially glue together stats from the past 2 seasons to create a larger data set. What this meant was that I had to tackle something that had been on my todo list for a long time.

You see, when entering players into my database, I try to keep it simple. League played, games played, points. Simplicity is the idea, after all. But what to do if a player plays in multiple leagues? Generally speaking, I take the league the player played the most games in. If it's close, I take the league that is strongest...but even that isn't so cut and dried. My general rule of thumb is "be most favorable to the player." So if a guy played 25 games in league A and put up 25 points, and then was promoted to much stronger league B and played 25 games there, but only had 1 point, I will probably use his league A stats, figuring that he probably wasn't getting ice time in league B, and I won't count it against him. It's not exactly a science, and some subjectivity has had to go into it, but usually it's a pretty easy call and there have only been a small handful of players for whom this would have made any real difference.
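The single-league rule of thumb above could be sketched roughly like this in Python. This is one interpretation, not the actual code: the function name, the `close_margin` cutoff for what counts as "close", the league strength values, and the choice to break ties by league-adjusted points per game are all assumptions made for illustration.

```python
def pick_primary_stint(stints, close_margin=5):
    """Pick the single league stint used to rate a player.

    stints: list of (league, games_played, points, league_strength) tuples.
    close_margin: hypothetical cutoff for what counts as a "close" games gap.
    """
    played = [s for s in stints if s[1] > 0]
    if not played:
        raise ValueError("player has no games played")
    most_games = max(s[1] for s in played)
    # Any stint within close_margin games of the longest one is a contender.
    contenders = [s for s in played if most_games - s[1] <= close_margin]
    # Tie-break in the player's favor: best league-adjusted points per game.
    return max(contenders, key=lambda s: s[2] / s[1] * s[3])


# Invented example mirroring the text: 25 points in 25 games in weaker
# league A, then 1 point in 25 games after promotion to stronger league B.
split_season = [("League A", 25, 25, 0.60), ("League B", 25, 1, 1.00)]
print(pick_primary_stint(split_season)[0])  # League A
```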

But what I've always wanted to do was to basically take a weighted average. If a player plays 40 games in league A and 20 games in league B, I will use both data points, but I will weight league A more heavily.

(Warning, math to follow)

The math on this is simple. Take this example:
[TABLE="class: brtb_item_table"][TBODY][TR][TD]League[/TD][TD]GP[/TD][TD]P[/TD][TD]P/GP[/TD][TD]League strength[/TD][/TR]
[TR][TD]League A[/TD][TD]35[/TD][TD]30[/TD][TD]0.86[/TD][TD]1.00[/TD][/TR]
[TR][TD]League B[/TD][TD]15[/TD][TD]23[/TD][TD]1.53[/TD][TD]0.85[/TD][/TR][/TBODY][/TABLE]

The player played 15 games in the weaker league (85% the strength of League A) and put up 1.53 PPG there before being promoted to League A, where he put up 0.86 PPG.

(Note: I made up these numbers; this is not a real player.)

So we would now calculate the adjusted PPG for each and then take a weighted average, which for this player will be:

((0.86 * 1.00 * 35) + (1.53 * 0.85 * 15)) / 50 = 0.99

So his final result is that he played in multiple leagues and has a "score" of 0.99. This "score" doesn't mean anything on its own; it's just a ranking mechanism.
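For anyone who wants to check the arithmetic, here is a minimal Python sketch of the games-weighted average described above. The `Stint` class and `blended_score` function are names invented for this illustration, not the actual code behind the potato. It works from raw points rather than the rounded P/GP figures, which gives the same 0.99 after rounding.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    league: str
    games_played: int
    points: int
    league_strength: float  # relative to the reference league (1.00)


def blended_score(stints):
    """Games-weighted average of league-adjusted points per game."""
    total_games = sum(s.games_played for s in stints)
    if total_games == 0:
        raise ValueError("player has no games played")
    # points * strength equals (points / games) * strength * games,
    # so summing it and dividing by total games is the weighted average.
    adjusted_points = sum(s.points * s.league_strength for s in stints)
    return adjusted_points / total_games


# The made-up player from the table above.
example = [
    Stint("League A", 35, 30, 1.00),
    Stint("League B", 15, 23, 0.85),
]
print(round(blended_score(example), 2))  # 0.99
```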

Anyway, let me know if you have any thoughts or suggestions; I'm just being transparent about the methodology. I have been *wanting* to do this for a long time but was not super incentivized to do so as it didn't affect things enough. Now that I am looking at the players' last 2 seasons, having to deal with players playing in multiple leagues is far more of a problem. So I finally did it.

One other thing. I haven't yet decided, if I am taking the last two seasons, whether I should also weight this season slightly more than last season. I feel like I should but there are reasons not to as well. I'm not really sure. For now, I am not. So if a player played 2020 in Allsvenskan and 2021 in SHL, I am combining the data and using the methodology outlined above to compute a weighted average.

With all that out of the way I have a very preliminary top 10, which I think is going to be quite different from other top 10s. Your feedback here on who I might be missing or over/under evaluating might help me find bugs in the code that I wrote to implement the methodology, so it would be very appreciated.

Here we go. First draft!

1. Xavier Bourgault
2. Matthew Coronato
3. Zachary L'Heureux
4. Dylan Guenther
5. Ryder Korczak
6. Isaac Belliveau
7. Zachary Bolduc
8. Valtteri Koskela
9. Viljami Marjala
10. Alexander Kisakov

There are definitely some kinks to work out here. The first thing I noticed was that William Eklund fell pretty far from where I had him before. This is basically because the 2 points in 20 SHL games last year is hurting him now, when it wasn't a factor before. This might be an indication that I should do something to weigh 2020 numbers less than 2021 numbers, since in all of my previous drafts those numbers wouldn't hurt the player, and it's only hurting this player because I need to glue last year's stats on to increase the sample size.

If I ignore 2020 altogether, but otherwise leave the "mixed league" method in place I get quite a different top 10!

1. William Eklund
2. Olen Zellweger
3. Xavier Bourgault
4. Matthew Coronato
5. Kent Johnson
6. Olivier Nadeau
7. Matthew Beniers
8. Brandt Clarke
9. Cole Sillinger
10. Ayrton Martino

So yeah, not sure! Maybe I should adopt the approach of only gluing on 2020 stats if I have to? So Olen Zellweger and his 11 GP will have his 2020 stats added on, but William Eklund and his 40 GP is fine as is. I'm really not sure. What do y'all think?
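The "only glue on 2020 if I have to" idea could be sketched like this in Python. The function name and the `min_current_gp` cutoff of 20 are placeholders, since the post does not settle on a value, and the stat lines are invented; only the 11 GP and 40 GP sample sizes echo the examples above.

```python
def stints_to_evaluate(current, prior, min_current_gp=20):
    """Use this season alone if the sample is big enough, else add last season.

    current, prior: lists of (league, games_played, points) tuples.
    min_current_gp: hypothetical cutoff; the post does not settle on a value.
    """
    current_gp = sum(gp for _, gp, _ in current)
    if current_gp >= min_current_gp:
        return list(current)
    return list(current) + list(prior)


# Invented stat lines for two hypothetical players.
short_season = stints_to_evaluate([("League A", 11, 9)], [("League A", 50, 30)])
full_season = stints_to_evaluate([("League B", 40, 20)], [("League B", 20, 2)])
print(len(short_season), len(full_season))  # 2 1
```

The returned stints could then be fed into whatever blended scoring is in use, so the cutoff only decides which seasons count.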
 

Interesting results, thanks for sharing. I agree that you should probably use this season's stats unless the player has a small sample size due to the covid situation and/or injury. Not sure what the arbitrary cutoff should be for a season to be considered a small sample size though.
 
Alright, so I ironed out a few kinks.

It turns out that a bunch of Euro leagues have been restructured and renamed, and that messed things up a bit. I also did end up putting something in place to discount 19-20 stats a bit. So I am still taking both seasons into account, but last season's numbers are worth more than 2020's numbers. The end result is, probably not surprisingly, something of a combination of the two lists I put above:

1. William Eklund
2. Xavier Bourgault
3. Dylan Guenther
4. Cole Sillinger
5. Kent Johnson
6. Ayrton Martino
7. Matthew Coronato
8. Owen Power
9. Zachary L'Heureux
10. Brandt Clarke

Could tweak it a bit more but feel this is a pretty reasonable ranking in the spirit of how the potato has ranked players in the past. Eklund absolutely makes sense as its #1 choice.
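One way a season discount like this could work is to scale the older season's games and points by a weight before blending. The sketch below is an assumption about the mechanics, not the actual implementation: the function name, the 0.5 default weight, and the stat line are all made up for illustration.

```python
def discounted_score(stints, prior_season_weight=0.5):
    """Blend seasons, counting older games at a reduced weight.

    stints: iterable of (games_played, points, league_strength, is_current).
    prior_season_weight: hypothetical discount; 1.0 treats both seasons
    equally and 0.0 ignores the older season altogether.
    """
    weighted_points = 0.0
    weighted_games = 0.0
    for games, points, strength, is_current in stints:
        weight = 1.0 if is_current else prior_season_weight
        weighted_points += weight * points * strength
        weighted_games += weight * games
    if weighted_games == 0:
        raise ValueError("no weighted games to evaluate")
    return weighted_points / weighted_games


# Invented player: 20 points in 40 games this season, 2 in 20 last season,
# both in the same league (strength 1.00).
two_seasons = [(40, 20, 1.00, True), (20, 2, 1.00, False)]
for weight in (1.0, 0.5, 0.0):
    print(weight, round(discounted_score(two_seasons, weight), 3))
```

A weight of 1.0 reproduces the plain two-season blend and 0.0 reproduces the "ignore 2020" ranking, so anything in between lands somewhere between the two earlier lists.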
 
I've always wondered what an ideal number of games played would be so that sample size isn't an issue when looking at production for prospects. Obviously this isn't always attainable depending on things like league structuring or even injuries, but is there some sort of threshold?

Like maybe at 50 GP you'd feel comfortable saying this isn't just a tiny sample? I'm just basing this off a personal theory that short-term trends in juniors may have more significance, as teenagers are probably still on a steeper development curve with more growing to do, compared to a guy who's 23-24 with a skillset that's largely already set in stone.

The other problem is of course European teens who move up to men's leagues and put up 0 points because they're playing limited minutes on a 4th line. I think league equivalencies are useful but is it even possible to adjust for that ice-time factor somehow?
 

it’s really tough. I don’t think there is a real answer. I am setting it to 20 right now cause I just figure, if you haven’t played 20 games in the last two seasons combined, I’m not really comfortable evaluating you. But it’s entirely arbitrary.
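As a rough illustration, a combined-games cutoff like that could be applied as a simple filter before ranking. The names below and the player data are invented; only the 20-game figure comes from the post.

```python
MIN_COMBINED_GP = 20  # the admittedly arbitrary cutoff mentioned above


def has_enough_games(games_by_season, min_gp=MIN_COMBINED_GP):
    """True if games played across the supplied seasons reach the cutoff."""
    return sum(games_by_season.values()) >= min_gp


# Invented players and game counts, keyed by season label.
pool = {
    "Player X": {"2019-20": 8, "2020-21": 11},
    "Player Y": {"2019-20": 20, "2020-21": 40},
}
ranked_pool = [name for name, games in pool.items() if has_enough_games(games)]
print(ranked_pool)  # ['Player Y']
```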

I wish I had some way of using ice time data as well, but tbh the biggest thing I’ve learned from this project is that if you’re playing in a men’s league at all in your draft year, you project extremely well. That is why the potato has Eklund #1 even though his numbers aren’t really eye-popping; just the fact that he is playing in the SHL at all is very impressive historically.
 
Yeah, ice time and quality of linemates can definitely skew things for a purely stat-based ranking like this, but it’s still interesting to look at the results.
 
I guess I never posted the 2020 picks. I thought I did but just went through the thread and don't see them.

82. Adam Wilsby (Canucks took Joni Jurmo)
113. William Villeneuve (Jackson Kunz)
144. Axel Rindell (Jacob Truscott)
175. Samuel Johannesson (Dmitry Zlodeyev)
191. Oskar Magnusson (Viktor Persson)

The potato liked defenders last year, idk.

Wilsby was an overager but moved to the SHL in 2021 and put up 18 points as a defender, pretty good. There might be a sign-ability problem there, not sure.
Villeneuve was taken by the Leafs -- didn't have a great year but did get into 2 AHL games.
Rindell was also taken by the Leafs -- 26 points in the Finnish Liiga. Pretty good also.
Johannesson was taken by CBJ, played in the SHL all year but only had 4 points.
Magnusson, the only forward the potato took for the Canucks, spent most of the year in Swedish Tier III.
 
I think you’re right on the money when you said you needed to weight the 2020 numbers.

Great work, this year's list is gonna be interesting!
 
Big fan of your work Melvin.

Thanks.
