The Data-Based Drafting Thread (what players would a Potato pick?)

Melvin · May 27, 2018

There seems to be enough interest in this topic to merit continuing it, but I was starting to clutter up the management thread with material that is not directly related to current management, and I have nowhere else to put it, so I am starting a new thread.

I have spent the last several months working on a system for drafting players based on nothing more than publicly available information, namely games played and points in the player's draft year, along with the league played in and biographical data (birthday and height/weight as listed on NHL.com at the time of the draft).

The purpose of this exercise was to establish a "baseline" that one could use to properly evaluate a team's draft performance. How well are they doing at drafting? Well, compare their picks to the picks that this simple system would have made, and you have your answer. If a team cannot consistently do better than "the potato," then some difficult questions should be asked of the scouting staff.

I have put a lot more information on the methodology and the intention on my blog, so I will just link to the introduction there rather than repeating everything here. I have made several follow-up posts there which talk about a few other topics.

Honestly, the system performs better than I would have expected, and it is something into which I have now invested substantial time. I have made several tweaks and added considerably more data to it since my original post, including draft data going back to 2009.

Because this is the Canucks forum, here are the picks the Canucks would have made since 2009, taking the best player available with the Canucks' actual picks and applying the latest version of my formula. Note that the system does not know where a player was actually drafted, or in some cases whether he was drafted at all, so it makes some substantial reaches. Note also that these might differ slightly from what is on the blog, but not substantially. Feel free to ask me if you have any questions about any particular player.

2009: Brandon Pirri, Anton Rodin, Mike Hoffman, Benjamin Cassavant, Curtis McKenzie, Brandon Kozun, Michael Cichy

2010: Jesper Fast, Brendan Gallagher, Artemi Panarin, Alexei Marchenko, Brendan Ranford

2011: Shane Prince, Jean-Gabriel Pageau, Andrew Fritsch, Ondrej Palat, Joel Lowry, Josh Manson, Ryan Dzingel, Henrik Tommernes

2012: Esa Lindell, Jujhar Khaira, Alexander Kerfoot, Matej Beran, Emil Lundberg

2013: Alexander Wennberg, Artturi Lehkonen, Sven Andrighetto, Eric Locke, Andreas Johnsson, Juuso Ikonen, Brendan Harms

2014: William Nylander, David Pastrnak, Brayden Point, Viktor Arvidsson, Spencer Watson, Axel Holmstrom, August Gunnarsson

2015: Anthony Beauvillier, Anthony Richard, Andrew Mangiapane, Nikita Korostelev, Jonathan Davidsson, Tim McGauley, Kay Schweri

2016: Matthew Tkachuk, Vitaly Abramov, David Bernhardt, Maxime Fortier, Brayden Burke, Tim Wahlgren

2017: Elias Pettersson, Jason Robertson, Jonah Gadjovich, Igor Shvyryov, Matthew Strome, Artem Minulin, Austen Keating, Ivan Chekhovich

The same formula is used for every draft and is not altered or tweaked in any way for any particular draft year. The 2017 draft has also not been used to calibrate the model. In each year I have tried to include as many undrafted players as I could find, to truly represent the pool of available players, but this information is difficult to find and some may be missing.
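For anyone who wants to reproduce lists like these, here is a minimal Python sketch of best-player-available drafting. It is not the author's code: the `Prospect` record, the `potato_picks` function and the availability rule (a player is gone once his real draft slot has passed, while undrafted players are always available) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prospect:
    name: str
    score: float                # model score; higher is better
    actual_pick: Optional[int]  # overall pick in the real draft, None if undrafted


def potato_picks(pool: List[Prospect], team_picks: List[int]) -> List[Tuple[int, str]]:
    """Best-player-available simulation for one team (hypothetical helper).

    Assumption: at overall pick N a player is available if he was not
    actually drafted before N and the potato has not already taken him.
    """
    ranked = sorted(pool, key=lambda p: p.score, reverse=True)
    taken = set()
    selections = []
    for pick in sorted(team_picks):
        for p in ranked:
            gone = p.actual_pick is not None and p.actual_pick < pick
            if not gone and p.name not in taken:
                taken.add(p.name)
                selections.append((pick, p.name))
                break  # one player per pick
    return selections
```

With a made-up pool of `Player A` (score 0.95, drafted 3rd), `Player B` (0.90, drafted 30th), `Player C` (0.80, undrafted) and `Player D` (0.70, drafted 10th), picks 5 and 33 return `[(5, 'Player B'), (33, 'Player C')]`: Player A is already gone at 5, and by 33 the best remaining player is the undrafted one.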

Remember, scouts are paid for their predictions, and just as you would want to know how a money manager is doing by comparing his predictions to a standard model for picking stocks, so too should you compare your scouting to a standard model for picking players, and probably look to invest your money elsewhere if it compares poorly.

How the Canucks have performed compared to this simple baseline is 100% up to your evaluation of these players. I have also spent a lot of time on this topic but won't get into the actual team evaluations here because this post is already too long. I have team evaluations and rankings on my blog if you want to read them there.

As with any system, some drafts are looking better than others. The 2009 and 2015 drafts are not good in general when applying this model to all 30 teams, while the 2010, 2011 and 2014 drafts are looking quite strong. The 2016 and 2017 drafts are too early to call one way or the other.

Finally, I will close this post by posting the top-20 for both 2017 and 2018. I was originally going to do every draft since 2009 but this post is already so long and nobody is going to read it all. I have the 2018 draft also posted on my blog along with further commentary, however I will answer any questions here and take any requests for further information.

First, 2017:

1. Elias Pettersson
2. Lias Andersson
3. Nico Hischier
4. Nick Suzuki
5. Nolan Patrick
6. Cody Glass
7. Jason Robertson
8. Conor Timmins
9. Gabriel Vilardi
10. Kole Lind
11. Jonah Gadjovich
12. Martin Necas
13. Nicolas Hague
14. Igor Shvyryov
15. Owen Tippett
16. Kailer Yamamoto
17. Cal Foote
18. Aleksi Heponiemi
19. Matthew Strome
20. Antoine Morand
21. Michael Rasmussen

The Canucks actually managed to get 3 of the top ten from this system, although it preferred Jason Robertson to Kole Lind. I am not a big fan of lists, and surely parts of this are always going to be laughable, but the important thing to me is the overall performance of applying this methodology to every pick and comparing it to the overall performance of teams. There are going to be some massive misses in both cases, so the long-run evaluation is more important than the rankings. On the one hand, the model had Sven Baertschi ahead of Nikita Kucherov in 2011, but on the other hand, so did the scouts, and at least the model had Kucherov in the top 20 (Baertschi was ranked 14th, drafted 13th; Kucherov was ranked 15th, drafted 58th). I think keeping perspective on this is important.

I think very clearly the biggest place where this methodology will differ from scouts is with players with perceived skating issues. Guys like Matthew Strome and Jonah Gadjovich were available much later because of their skating. This is not something that I can account for in the data (yet!). This seems like a very clear space where scouting can add value and should be able to outperform this baseline. If it were "me" making the picks, I would definitely want to consider skating ability and factor it into the rankings.

OK, finally we get to 2018. Brace yourself!

1. Rasmus Dahlin
2. Evan Bouchard
3. Andrei Svechnikov
4. Noah Dobson
5. Filip Zadina
6. Jesperi Kotkaniemi
7. Isac Lundeström
8. Martin Kaut
9. Jacob Olofsson
10. Filip Hållander
11. Joe Veleno
12. Alexander Alexeyev
13. Akil Thomas
14. Oliver Wahlstrom
15. Ryan McLeod
16. David Gustafsson
17. Jonatan Berggren
18. Linus Karlsson
19. Carl Wassenius
20. Nathan Dunkley

Without repeating what I wrote on the blog, it is a big draft for European players: guys like Lundestrom, Kaut, Olofsson and Hallander are highly ranked here but could be grabbed with later picks. Most of these players are separated by the slimmest of margins and could be moved around anywhere. The biggest gap is after Dahlin, which the scouts seem to see as well.

I will be honest, I have worked very hard on this and probably put 200-300 hours into it at this point, far exceeding my original intentions of making the laziest system possible. I am excited to see how it does for 2018 but also aware that any one draft can boom or bomb, and the 2018 draft having a lot of defenders makes it even more difficult, so it wouldn't surprise me if it performs more like the 2015 draft than the 2014 one.

In any case, this should give us a relevant baseline against which we can compare the Canucks' picks. When the draft occurs, I will post our picks here in "real time," picking for the Canucks the best player available according to the system. I will also post "my" pick, which takes into account some expectations about where players will go, something the system does not account for.

If you have any questions, comments or suggestions I am happy to answer them, although I will note just one more time that more information is on the blog so if you want some more details I would encourage you to at least read the introduction post there.

Thank you for your time.

EDIT: I made a transcription error when copying into this post and missed Conor Timmins in the 2017 rankings; he should have been between Robertson and Vilardi. I have updated the post but now show the top 21 so that I am not removing anything.

PuckMunchkin · May 27, 2018

Awesome job! Was looking forward to this when it was mentioned in one of the threads.

(I did read it all too!)

Canadian Canuck · May 27, 2018

Wow, thank you for the information, interesting to see!

Kel Varnson · May 27, 2018

Good stuff! Interesting that we got 3 of the top-performing players in this metric last year, as the team did talk about a more analytics-based system.

2014 draft of Nylander & Pastrnak would've drastically altered the look of this organization. What could've been..

Black Noise · May 27, 2018

Jesus Christ, the forward core would be stacked if they used this model.

Nylander, Pastrnak, Panarin, Hoffman, Point, Tkachuk, Palat, Gallagher and Arvidsson is an insane haul in 8 drafts.

Someone hire this man!

Melvin · May 27, 2018

Black Noise said:
Jesus Christ, the forward core would be stacked if they used this model.

Nylander, Pastrnak, Panarin, Hoffman, Point, Tkachuk, Palat, Gallagher and Arvidsson is an insane haul in 8 drafts.

Someone hire this man!

It is definitely easier to pick forwards in this way than defenders, that's for sure. Although this methodology makes it impossible to blow a 3rd overall on someone like Gudbranson, it will also not be able to find someone like Hampus Lindholm. It's a trade-off, of course.

opendoor · May 27, 2018

It's probably way too much work (or maybe it already accounts for this), but I'd be interested to see how different the model would be if it only used data that existed prior to each of the draft years in question. So for instance, Pastrnak and Nylander's success in the NHL likely improves the equivalency numbers for Allsvenskan players. But if you exclude any data after 2014 in calculating the equivalency numbers, would they be ranked as highly in the 2014 draft as they were, or is their success influencing their high ranking when looking back, creating a bit of a feedback loop?

Jack Burton · May 27, 2018

Very impressive and interesting. Good job But Gillis

Is there any way you can add more than just their draft-year stats to your model? Say, their last 2 or 3 years?

I'm an old-school armchair scout who puts loads of time into watching these prospects, and I have always found that young prospects who have shown consistency/improvement in the last few years before being drafted have done well at the NHL level.

I'd be very interested to see how much your 2018 list would change if you added their previous years' stats in.

Hoping it's possible, but if not, then :cheers:

opendoor · May 27, 2018

It is pretty funny looking back at some of the ways scouts outsmarted themselves though. A guy like Point was #1 in the WHL for U18 in production and was #3 among draft eligible players behind Reinhart and Draisaitl. Yet 12 other WHL forwards got picked before him, most of whom weren't even in the same universe offensively. And it's not like Point was leeching points off of someone else; he had 65% more points than the 2nd highest scoring player on his team.

Yeah he was undersized, but at a certain point you can't ignore him producing 2-3x the offense of some of the WHL forwards taken prior to him.

I in the Eye · May 27, 2018

I wouldn't expect the lists to be similar to other scouting lists, because I think other lists not based solely on data run into the loop of following other lists.

If it was mine, I'd call it the "Potato Drafting Thread" to hit home the point that this is how a potato could draft, with a data-based list like this.

Edit: The link isn't showing up for me on mobile... I only get signatures on desktop.

Melvin · May 27, 2018

opendoor said:
It's probably way too much work (or maybe it already accounts for this), but I'd be interested to see how different the model would be if it only used data that existed prior to each of the draft years in question. So for instance, Pastrnak and Nylander's success in the NHL likely improves the equivalency numbers for Allsvenskan players. But if you exclude any data after 2014 in calculating the equivalency numbers, would they be ranked as highly in the 2014 draft as they were, or is their success influencing their high ranking when looking back, creating a bit of a feedback loop?

Yes, this is something I have been highly conscious of since I started, so I am glad someone pointed it out.

As much as possible I have tried to avoid this, but it's also hard to avoid wanting to use as much data as possible so it's a trade-off. For the most part I used the 2010-2013 drafts for a lot of the "training" and then ran evaluations for 2014-2017. There is a bit of subjectivity here. For example including the 2015 draft in the league evaluations would cause the rating of the Finnish Liiga to be higher but I am choosing to ignore this and leaving the Liiga rated how it is. I just did the Allsvenskan again for 2009-2013, not using the 2014 draft at all, and I came to the same factor. The more I work backwards and keep adding older drafts the more confident I am that this isn't a big problem.

I can tell you it doesn't make a massive difference though, and especially as I add more and more data; for example recently adding in the 2009 draft to my dataset did not result in me changing any of the factors. But I will admit that of all the possible flaws with the model I have been able to think of, this one has gnawed at me the most. At least we can say for sure that the 2017 and 2018 drafts do not have this possible problem, and we will see how they look in the coming years.
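A minimal sketch of the train/evaluate split described in this post, in Python. The row schema (dicts with a `draft_year` key) and the `split_by_draft_year` helper are hypothetical; the default year ranges mirror the 2010-2013 training and 2014-2017 evaluation drafts mentioned above. The point of the guard is that any factor fitted on the training rows never sees the later NHL results of evaluation-year players, which is the feedback loop opendoor raised.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]  # hypothetical schema: one dict per player, with a "draft_year" key

TRAIN_YEARS = range(2010, 2014)  # 2010-2013 drafts, per the post
EVAL_YEARS = range(2014, 2018)   # 2014-2017 drafts, per the post


def split_by_draft_year(rows: List[Row],
                        train_years: Iterable[int] = TRAIN_YEARS,
                        eval_years: Iterable[int] = EVAL_YEARS) -> Tuple[List[Row], List[Row]]:
    """Return (train, evaluation) rows; fit league/age factors on `train` only."""
    train_set, eval_set = set(train_years), set(eval_years)
    overlap = train_set & eval_set
    if overlap:
        raise ValueError(f"draft years in both sets: {sorted(overlap)}")
    train = [r for r in rows if r["draft_year"] in train_set]
    evaluation = [r for r in rows if r["draft_year"] in eval_set]
    return train, evaluation
```

Re-deriving a single factor from older drafts only, as was done here for the Allsvenskan, would look like `split_by_draft_year(rows, range(2009, 2014), range(2014, 2018))`.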

Having said that, there are definitely some leagues where it is tougher to tell because of less data available, and it's more subjective in terms of how exactly to rate it. The Czech league for example just doesn't have a lot to go on, and where you rank Kaut in this list depends largely on how you rate the strength of this league, so his placement here is a bit of a guess.

Melvin · May 27, 2018

Jack Burton said:
Very impressive and interesting. Good job But Gillis

Is there any way you can add more than just their draft-year stats to your model? Say, their last 2 or 3 years?

I'm an old-school armchair scout who puts loads of time into watching these prospects, and I have always found that young prospects who have shown consistency/improvement in the last few years before being drafted have done well at the NHL level.

I'd be very interested to see how much your 2018 list would change if you added their previous years' stats in.

Hoping it's possible, but if not, then

Yeah, I've thought about this. In particular, once the 2018 draft is done I'd like to see if it's possible to do a very early 2019 list based on D-1, by researching how much value the D-1 year adds to the predictive power of the model.

So .... maybe!

Edit: This would be especially useful in some rare cases where I basically had to ignore a player because he didn't play in his draft year (Alex Galchenyuk.)

Melvin · May 27, 2018

opendoor said:
It is pretty funny looking back at some of the ways scouts outsmarted themselves though. A guy like Point was #1 in the WHL for U18 in production and was #3 among draft eligible players behind Reinhart and Draisaitl. Yet 12 other WHL forwards got picked before him, most of whom weren't even in the same universe offensively. And it's not like Point was leeching points off of someone else; he had 65% more points than the 2nd highest scoring player on his team.

Yeah he was undersized, but at a certain point you can't ignore him producing 2-3x the offense of some of the WHL forwards taken prior to him.

Yeah I think what happens is that scouts have the right idea to discount players for certain reasons, but the discounts applied are much too heavy.

Another example is "Russian Factor." I get that possibly Panarin was passed over year after year because teams felt he wouldn't come to North America, and, hell, maybe he even told teams he didn't want to come over. But you can't possibly tell me that using a 7th round pick on Sawyer Hannay is a better idea than just taking a chance on the Russian. Put another way, what is going to be harder: convincing a Russian player to come play in the NHL, or developing Sawyer Hannay into an NHL player?

Jack Burton · May 27, 2018

But Gillis said:
Yeah, I've thought about this. In particular, once the 2018 draft is done I'd like to see if it's possible to do a very early 2019 list based on D-1, by researching how much value the D-1 year adds to the predictive power of the model.

So .... maybe!

Edit: This would be especially useful in some rare cases where I basically had to ignore a player because he didn't play in his draft year (Alex Galchenyuk.)

I think you should definitely try it.

I have a feeling that you're onto something much bigger than proving a potato can do just as good a job as an NHL franchise.

I'm wondering if the BPAs would stick out like a sore thumb????

I'd test it with a draft from like 10 years ago and punch in 3 years worth of their stats and see what gets spit out.

Melvin · May 27, 2018

Further to opendoor's post about teams overthinking things at times, I really think this might be the case with Evan Bouchard.

Since 2009, here is a list of defenders who had 80+ points in their draft year (in any league) and were not undersized (over 6' tall):

- Evan Bouchard

Since 2009, here is a list of defenders who had 70+ points in their draft year (in any league) and were not undersized (over 6' tall):

- Evan Bouchard

Like, the dude is a 6'2" defender with 87 points in 67 games. The next-highest guy on his team is a forward with 57 points. The next-highest defender has 25 points. Twenty. Five.

But people are saying he looks lazy or doesn't skate real fast or whatever? Hmmm. Maybe I'm going to look like a fool in a few years, but I am fascinated to see how it turns out.
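The two lists in this post amount to a simple filter over draft-year seasons. Here is a sketch in Python; the `DraftSeason` record and `big_scoring_defenders` function are hypothetical, and "over 6' tall" is read as strictly taller than 72 inches. Only Bouchard's 87 points and 6'2" listing come from the post; any other rows would be your own data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DraftSeason:
    name: str
    position: str    # "F", "D" or "G"
    draft_year: int
    points: int      # points in the draft year, any league
    height_in: int   # listed height in inches (6'2" = 74)


def big_scoring_defenders(seasons: List[DraftSeason], min_points: int,
                          since_year: int = 2009, over_height_in: int = 72) -> List[str]:
    """Defenders drafted since `since_year` with at least `min_points` in their
    draft year who were taller than `over_height_in` inches (6' = 72 in)."""
    return [s.name for s in seasons
            if s.position == "D"
            and s.draft_year >= since_year
            and s.points >= min_points
            and s.height_in > over_height_in]
```

Running it once with `min_points=80` and once with `min_points=70` reproduces the two queries; per the post, both return only Evan Bouchard on the author's data.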

Tim Calhoun · May 28, 2018

I began doing something similar back in 2015 but gave up halfway through to work on a game prediction model instead.

Could you share your code? Assuming you did this in something like R/Python and not Excel.

Tim Calhoun · May 28, 2018

One thing to take into account is players whose production was inflated by playing with high-end linemates.

This would prevent over ranking players like Zack Phillips (played with Huberdeau) and Angelo Esposito (played with Radulov) in past drafts. Though I guess the downside is that it would hurt players like Ehlers and Drouin. Just some food for thought...

PuckMunchkin · May 28, 2018

Is this very different from SEAL adjusted scoring? The Big SEAL Reveal: Adjusted Scoring For First Time Eligible Prospects

Or just your variation on the same theme?

clunk · May 28, 2018

Holy **** that 2014 draft would have been absolutely insane.

PuckMunchkin · May 28, 2018

clunk said:
Holy **** that 2014 draft would have been absolutely insane.

That's like a 1-year-rebuild level draft.

drax0s · May 28, 2018

But Gillis said:
But people are saying he looks lazy or doesn't skate real fast or whatever? Hmmm. Maybe I'm going to look like a fool in a few years, but I am fascinated to see how it turns out.

This is the benefit to making picks based on a system. Sometimes it works - sometimes it doesn't. In the end you don't look dumb, the system just shat the bed.

Hard to argue with a draft of Nylander, Pastrnak, Point and Arvidsson though. I thought the Nucks 2004 draft was good. :|

Melvin · May 28, 2018

Tim Calhoun said:
I began doing something similar back in 2015 but gave up halfway through to work on a game prediction model instead.

Could you share your code? Assuming you did this in something like R/Python and not Excel.

Most of it is just in SQL scripts. I have a database in Azure with the logic housed in a couple of views, and a couple of very basic Python scripts on top of that.

It is all very simple, and intentionally so. Doesn't require the use of R.

Pts/game * posfactor * LGfactor * agefactor * heightfactor, and I basically filter out goalies and players with fewer than 20 games played.

That's pretty much it. I've evaluated a few other factors as research pieces and will continue to do so, but I want to keep it simple enough for anyone to understand. It is the baseline against which fancier statistical models should be evaluated.
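Since the formula is a straight product of factors, it can be sketched in a few lines of Python. This is not the author's SQL: the actual factor values are not given here, so the position/league lookup tables and the age/height functions are caller-supplied placeholders, and the `Skater` record, `potato_score` and `rank` names are hypothetical. The filtering follows the post: goalies and players with fewer than 20 games played are dropped before scoring.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

MIN_GAMES = 20  # per the post: players with fewer than 20 GP are filtered out


@dataclass
class Skater:
    name: str
    position: str   # "F", "D" or "G"
    league: str
    games: int
    points: int
    age: float      # age at the draft, in years (assumed input to the age factor)
    height_in: int  # listed height in inches (assumed input to the height factor)


def potato_score(s: Skater, pos_factor: Dict[str, float], league_factor: Dict[str, float],
                 age_factor: Callable[[float], float],
                 height_factor: Callable[[int], float]) -> float:
    """Pts/game * posfactor * LGfactor * agefactor * heightfactor."""
    ppg = s.points / s.games
    return (ppg * pos_factor[s.position] * league_factor[s.league]
            * age_factor(s.age) * height_factor(s.height_in))


def rank(skaters: List[Skater], pos_factor: Dict[str, float], league_factor: Dict[str, float],
         age_factor: Callable[[float], float],
         height_factor: Callable[[int], float]) -> List[Tuple[str, float]]:
    # Drop goalies and small samples first, then sort by score, best first.
    eligible = [s for s in skaters if s.position != "G" and s.games >= MIN_GAMES]
    scored = [(s.name, potato_score(s, pos_factor, league_factor, age_factor, height_factor))
              for s in eligible]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

As a usage example with invented numbers: if a defender's points counted double and every other factor were 1.0, a defender with 52 points in 65 games (0.8 per game, scored 1.6) would rank ahead of a forward with 90 points in 60 games (1.5). Passing the factors in as arguments keeps the sketch honest about what it does not know, and makes it easy to test split factors (for example, separate league factors for forwards and defenders) later.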

RobertKron · May 28, 2018

But Gillis said:
Most of it is just in SQL scripts. I have a database in Azure with the logic housed in a couple of views, and a couple of very basic Python scripts on top of that.

It is all very simple, and intentionally so. Doesn't require the use of R.

Pts/game * posfactor * LGfactor * agefactor * heightfactor, and I basically filter out goalies and players with fewer than 20 games played.

That's pretty much it. I've evaluated a few other factors as research pieces and will continue to do so, but I want to keep it simple enough for anyone to understand. It is the baseline against which fancier statistical models should be evaluated.

You left out the part where you doctor the data.

Seriously, though, do you think there's really any way to get away from the potato's bias toward forwards?

Melvin · May 28, 2018

PuckMunchkin said:
Is this very different from SEAL adjusted scoring? The Big SEAL Reveal: Adjusted Scoring For First Time Eligible Prospects

Or just your variation on the same theme?

I guess so, although they have different goals and are doing things that are more complicated than what I have done.

For example, I have looked into the early/late birthday stuff and found no evidence that taking that into account makes for better projections. I haven't looked into the situational stuff.

Really, you would want to look at picks taken based on their model and compare it to mine to see how much value all their extra stuff adds compared to my deliberately simplistic one.

Melvin · May 28, 2018

RobertKron said:
You left out the part where you doctor the data.

Seriously, though, do you think there's really any way to get away from the potato's bias toward forwards?

I think so, but nothing immediately comes to mind.

Right now I am applying pretty flat factors but I've been thinking about splitting them to see how that impacts things. Like is there a difference in league quality if you are looking at a defender vs a forward? All the talk about discounting Dobson because "Q defender" seems to suggest that there is. And same thing with age. I just said that the early birthday/late birthday stuff doesn't seem to say much but some research has suggested it might be useful for defenders specifically.

At the end of the day though, I am reliant on points and games played until I get my hands on different data.
