Player Discussion Jake "Big Tuna" Virtanen | XVII Nikolaj Who...?

Status
Not open for further replies.

rune74

Registered User
Oct 10, 2008
9,228
552
Who decides what is meaningful data and what isn't? All data is meaningful, even if some of it ends up being noise rather than useful data. You are just shoving data you don't like to the sidelines because you do not like the conclusions it draws.

The reason you want more data is because it paints a clearer picture and it can show trends and eliminate noise. Saying that small subsets of data are meaningless/irrelevant is just false. Saying that Virtanen played well in October is exactly the same as saying that Virtanen was statistically doing very well in October. I'm doing a micro evaluation of the data to draw the conclusion that, within that subset of data, a player has performed well. Again, the reason you'd want more data is to predict trends and to eliminate noise. Now, Virtanen's October might go against his trend and turn out to be noise. But we do not know that, so dismissing it as useless is not accurate.

Here is the reality. Virtanen's CURRENT statistics show that he has performed well in the previous 9 games. However, we do not have enough data to determine whether this is an anomaly or a trend. To determine this we need more data.

I'm saying that because Virtanen is statistically doing well, he deserves a bigger role. It's a simple evaluation.

Here's a question: when you add in his other games in the NHL and take in his corsi as seen last year in the AHL, what do you get?
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
Who decides what is meaningful data and what isn't? All data is meaningful, even if some of it ends up being noise rather than useful data. You are just shoving data you don't like to the sidelines because you do not like the conclusions it draws.

Now you are being childish. I could not care less about what "it draws." Data cannot draw its own conclusions; you are the one who wants to draw the conclusions from the data, and I am disagreeing with your methods of drawing them. I really don't care if Jake has played well or not. Maybe he has, maybe he hasn't. It is irrelevant to this discussion.

The reason you want more data is because it paints a clearer picture and it can show trends and eliminate noise. Saying that small subsets of data are meaningless/irrelevant is just false.

Have you ever heard the expression "measure twice, cut once?" Why is it? Because the first measurement may have been an error. By measuring twice you are removing some noise since it is harder to make the same error both times. Now, the expression is not "measure 309 times, cut once" because direct measurements with regards to physical area are typically pretty precise and do not contain much noise. Hockey data, on the other hand, contains a lot of it. There are a lot of things that can impact a player's Corsi rating that have nothing to do with how the player played. That makes it noisy. You want to collect more data so that these things even out, but until you do, they are absolutely meaningless. To assert that data can never be meaningless is demonstrably false.

Imagine if Jake comes on the ice, makes a blunderous play, but is bailed out by a teammate. Then the teammate makes a couple incredible plays and it leads to 4 shots on goal. Jake then leaves the ice. Has Jake played well? Do the measurements show that Jake has played well? No. The fact that he was a +4 in Corsi for that stretch of play has absolutely nothing to do with his actual play and is just noise. If you can understand why it would be ludicrous to make conclusions about his play based on one shift, then you can expand that to one game and even 10 games where I still believe there is too much noise to gather anything useful (my opinion, of course.) You are correct that I am not the authority over what is meaningful and what is not. It is my opinion that the samples you are using are too small and too noisy to draw any conclusions with regards to how any particular player is playing within that sample.

Saying that Virtanen played well in October is exactly the same as saying that Virtanen was statistically doing very well in October. I'm doing a micro evaluation of the data to draw the conclusion that, within that subset of data, a player has performed well. Again, the reason you'd want more data is to predict trends and to eliminate noise. Now, Virtanen's October might go against his trend and turn out to be noise. But we do not know that, so dismissing it as useless is not accurate.

This is again a misunderstanding of the concept of noise. It has nothing to do with trends or sustainability of his play. It has to do with margin of error and the indirect nature of the statistic. If your argument is that Jake has played well, I will not counter that argument. That is neither here nor there. I am arguing against the notion that we are able to draw conclusions about a player's play based on ~60 minutes of exceptionally noisy on-ice data.

Here is the reality. Virtanen's CURRENT statistics show that he has performed well in the previous 9 games. However, we do not have enough data to determine whether this is an anomaly or a trend. To determine this we need more data.

No. That is just not correct at all. His current statistics show that the team has witnessed good results when Jake has been on the ice, which may or may not be an indication of how well Jake has played.

For the record, I think that Jake likely has played well, and that his numbers do somewhat reflect some strong play for the most part, but I cannot let my biases interfere with how I review the data.
 

Gaunce4gm

Trusted Hockey Man
Dec 5, 2015
1,976
781
Victoria B.C.
Here's a question: when you add in his other games in the NHL and take in his corsi as seen last year in the AHL, what do you get?

An elite possession player.

I don't think there's ever been a sample size that's suggested anything other than a positive possession player. They're just "too small" a sample size.
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
An elite possession player.

I don't think there's ever been a sample size that's suggested anything other than a positive possession player. They're just "too small" a sample size.

They were absolutely terrible last season, not that that means anything either.
 

rune74

Registered User
Oct 10, 2008
9,228
552
An elite possession player.

I don't think there's ever been a sample size that's suggested anything other than a positive possession player. They're just "too small" a sample size.


It will always be too small a sample size. Is it the magical 300 games that will make it not so?
 

Nuckles

_________
Apr 27, 2010
28,877
5,240
heck
One little thing to keep in mind is that Virtanen is barely starting in the defensive zone, almost as little as the Sedins this year.

hmmf1Tk.png
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
Now you are being childish. I could not care less about what "it draws." Data cannot draw its own conclusions; you are the one who wants to draw the conclusions from the data, and I am disagreeing with your methods of drawing them. I really don't care if Jake has played well or not. Maybe he has, maybe he hasn't. It is irrelevant to this discussion.


Have you ever heard the expression "measure twice, cut once?" Why is it? Because the first measurement may have been an error. By measuring twice you are removing some noise since it is harder to make the same error both times. Now, the expression is not "measure 309 times, cut once" because direct measurements with regards to physical area are typically pretty precise and do not contain much noise. Hockey data, on the other hand, contains a lot of it. There are a lot of things that can impact a player's Corsi rating that have nothing to do with how the player played. That makes it noisy. You want to collect more data so that these things even out, but until you do, they are absolutely meaningless. To assert that data can never be meaningless is demonstrably false.

Imagine if Jake comes on the ice, makes a blunderous play, but is bailed out by a teammate. Then the teammate makes a couple incredible plays and it leads to 4 shots on goal. Jake then leaves the ice. Has Jake played well? Do the measurements show that Jake has played well? No. The fact that he was a +4 in Corsi for that stretch of play has absolutely nothing to do with his actual play and is just noise. If you can understand why it would be ludicrous to make conclusions about his play based on one shift, then you can expand that to one game and even 10 games where I still believe there is too much noise to gather anything useful (my opinion, of course.) You are correct that I am not the authority over what is meaningful and what is not. It is my opinion that the samples you are using are too small and too noisy to draw any conclusions with regards to how any particular player is playing within that sample.



This is again a misunderstanding of the concept of noise. It has nothing to do with trends or sustainability of his play. It has to do with margin of error and the indirect nature of the statistic. If your argument is that Jake has played well, I will not counter that argument. That is neither here nor there. I am arguing against the notion that we are able to draw conclusions about a player's play based on ~60 minutes of exceptionally noisy on-ice data.



No. That is just not correct at all. His current statistics show that the team has witnessed good results when Jake has been on the ice, which may or may not be an indication of how well Jake has played.

For the record, I think that Jake likely has played well, and that his numbers do somewhat reflect some strong play for the most part, but I cannot let my biases interfere with how I review the data.
Honestly, I really don't feel like arguing the validity of advanced stats right now. There is a forum for that on HF. I don't claim to be an expert, and honestly I think you are mostly wrong, but it's really not a topic that I want to delve deeper into.

If you think the potential for noise is too large to make an accurate assessment of a player's play, then that's up to you. However, I do not.
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
One little thing to keep in mind is that Virtanen is barely starting in the defensive zone, almost as little as the Sedins this year.

hmmf1Tk.png
Yup he needs to play more in defensive roles.

I think he can be good in a defensive role; it's just that Green doesn't trust him to play defensive minutes.
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
Honestly, I really don't feel like arguing the validity of advanced stats right now. There is a forum for that on HF. I don't claim to be an expert, and honestly I think you are mostly wrong, but it's really not a topic that I want to delve deeper into.

If you think the potential for noise is too large to make an accurate assessment of a player's play, then that's up to you. However, I do not.

It is fine if you do not want to argue but do not misrepresent my position. I frequently use these stats as evaluation tools, but only with at least 40-50 games worth of data when a lot of things can be balanced out.

I would like to find common ground. Can you agree that using it for a single shift would make little sense? Can you agree that a player being +4 in one shift does not necessarily mean that the player played well during that shift? If you can at least agree with that, then we agree that the data is meaningless in a sufficiently small sample size, and it is just a matter of where you are willing to draw the line.

I am disappointed that you do not want to argue this topic because it challenges your beliefs. Saying "You are wrong but I am not going to argue" is pretty weak.

Edit: Here is an article you may want to read, if you change your mind about delving into this topic. These are the relevant parts that get at what I am talking about:

In the end, Fistric was on the ice for 281 shots-at-net at even strength, but he only made some contribution to an actual scoring chance on 31 of those 281 shots, and he only made a mistake on a scoring chance against on 48 of the 318 shots-at-net against he was on the ice for.
...
From a five-year study on goals for and against the Oilers, we know that 30 per cent of the plus marks on goals are handed out to players who had nothing to do with the goal being scored, and that 50 per cent of the minus marks on goals against are handed out to players who made no mistake.
This same 30 per cent false positive and 50 per cent false negative ratio almost certainly applies to shots-at-net for and against, and is likely higher on shot counts, given that goals tend to come from slightly or greatly more involved sequences of play than mere shots.

But if 40 per cent of the time a player is getting a plus or minus mark he doesn’t merit, that builds in a lot of error and randomness into a system of rating a player. That is what Corsi does.

Again, with a sufficient and diverse sample size, a lot of that error and randomness goes away, but ten games just is not enough.
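
For anyone who wants to see the gap in the Fistric numbers from the excerpt above, here is a rough Python sketch. The variable names are my own and the only inputs are the four counts quoted from the article; it is an illustration of the arithmetic, not the article's method.

```python
# Counts taken from the article excerpt quoted above (even strength).
shots_for_on_ice = 281        # shots-at-net for while Fistric was on the ice
shots_for_involved = 31       # of those, ones where he contributed to a scoring chance
shots_against_on_ice = 318    # shots-at-net against while he was on the ice
shots_against_at_fault = 48   # of those, ones where he made a mistake on a chance against

# Raw on-ice share of shot attempts, which is what Corsi-style counting credits him with.
on_ice_share = shots_for_on_ice / (shots_for_on_ice + shots_against_on_ice)

# Fraction of on-ice events he was directly involved in, per the article's tracking.
involved_for_rate = shots_for_involved / shots_for_on_ice
at_fault_against_rate = shots_against_at_fault / shots_against_on_ice

print(f"on-ice share of shot attempts: {on_ice_share:.1%}")
print(f"shots for he contributed to:   {involved_for_rate:.1%}")
print(f"shots against he was at fault: {at_fault_against_rate:.1%}")
```

Most of the events that go into the on-ice number are ones the player had no direct hand in, which is why a short stretch can be dominated by what his linemates and opponents did.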
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
It is fine if you do not want to argue but do not misrepresent my position. I frequently use these stats as evaluation tools, but only with at least 40-50 games worth of data when a lot of things can be balanced out.

I would like to find common ground. Can you agree that using it for a single shift would make little sense? Can you agree that a player being +4 in one shift does not necessarily mean that the player played well during that shift? If you can at least agree with that, then we agree that the data is meaningless in a sufficiently small sample size, and it is just a matter of where you are willing to draw the line.

I am disappointed that you do not want to argue this topic because it challenges your beliefs.

The reason I do not feel like continuing this argument is because it bores me. I've already had these sorts of arguments before and come to the same conclusion.

You are also not understanding what I'm doing. I'm not making a definitive statement on Virtanen as a player. I'm laying the data out, limited as it may be, and saying "it's promising."


But you evaluate on 40-50 games. I personally think that is not enough time to evaluate someone, and you are going to get too much noise when making a definitive statement about the player. Only 40-50 games is not enough data to create an accurate picture of the player. A player needs at least 300 games before you can evaluate him as a player, or else you are too influenced by noise. That's basically the same argument you are trying to make, but I made it against you. It's an argument that can always be one-upped.

I completely understand what you mean.

If you have 1/10 and 10/100 and then add one to each, you get 2/10 compared to 11/100; the smaller sample will be completely skewed and will have a larger chance of being wrong. However, I see no problem in saying "hey look at this 1/10 data, it's promising". That's a completely exaggerated way of explaining what I'm doing, but I understand your argument. I don't think it's necessary because I'm not saying "Look WOW! AMAZING 1/10!!!!".

We have 77 minutes of ice time to look at. Virtanen will probably get around 1000 ES TOI this season, if he stays up. So we are at about 7.7% of his projected total TOI.
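
The arithmetic in this post can be checked with a short Python sketch. The function name `shift_from_one_more` is made up for illustration, and the 1000-minute season total is just the projection given above, not a known figure.

```python
from fractions import Fraction

def shift_from_one_more(successes, trials):
    """How far the observed rate moves when one more success is counted
    over the same number of trials (e.g. 1/10 -> 2/10)."""
    before = Fraction(successes, trials)
    after = Fraction(successes + 1, trials)
    return after - before

small_sample_shift = shift_from_one_more(1, 10)     # 1/10 -> 2/10
large_sample_shift = shift_from_one_more(10, 100)   # 10/100 -> 11/100

# Share of the projected even-strength ice time observed so far.
minutes_so_far = 77
projected_minutes = 1000  # assumption: the poster's own season projection
season_share = Fraction(minutes_so_far, projected_minutes)

print(f"small sample moves by {float(small_sample_shift):.0%}")
print(f"large sample moves by {float(large_sample_shift):.0%}")
print(f"share of projected TOI observed: {float(season_share):.1%}")
```

One extra event moves the small sample ten times as far as the large one, which is the skew described above.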
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
The reason I do not feel like continuing this argument is because it bores me. I've already had these sorts of arguments before and come to the same conclusion.

You are also not understanding what I'm doing. I'm not making a definitive statement on Virtanen as a player. I'm laying the data out, limited as it may be, and saying "it's promising."

No, I understand completely. You are assuming that the data so far paints a picture of his play, and I am disputing this. You do not seem to accept that a player can play very poorly and still wind up with the same numbers that Jake has (which I am not saying is the case).

But you evaluate on 40-50 games. I personally think that is not enough time to evaluate someone, and you are going to get too much noise when making a definitive statement about the player. Only 40-50 games is not enough data to create an accurate picture of the player. A player needs at least 300 games before you can evaluate him as a player, or else you are too influenced by noise. That's basically the same argument you are trying to make, but I made it against you. It's an argument that can always be one-upped.

Yep. This is the crux of analytics! This will always be the case: you will never have enough data, and as soon as you have a lot of it you run the risk of some of it being stale. There is no magical point where it becomes enough. You are looking at this as very black and white; obviously 300 would be better than 50, and I would agree that even 50 is probably not enough, but at least you can be confident after 50 games that it is getting clearer. The odds of a player playing poorly for 50 games and having great metrics are fairly low. The odds of a player playing poorly for 10 games and having great metrics are much higher, because you are not controlling for nearly as many things. You will always be influenced by noise, and trying to separate out the signal is a full-time job for anyone working in analytics (myself included).
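
To put rough numbers on the 10-game versus 50-game comparison, here is a minimal Python sketch under a deliberately crude model. Everything in it is an assumption for illustration: each on-ice shot attempt is treated as an independent coin flip, the "poor" player's true share is set at 48%, "great metrics" means an observed share of 55% or better, and he is on the ice for 15 shot attempts per game. Real hockey data is messier than this.

```python
import math

def tail_probability(n_events, true_share, observed_share):
    """Exact binomial probability that at least observed_share of n_events
    go 'for' when each event independently goes 'for' with true_share."""
    k_min = math.ceil(observed_share * n_events)
    total = 0.0
    for k in range(k_min, n_events + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space for stability
        log_term = (
            math.lgamma(n_events + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_events - k + 1)
            + k * math.log(true_share)
            + (n_events - k) * math.log(1.0 - true_share)
        )
        total += math.exp(log_term)
    return total

EVENTS_PER_GAME = 15   # hypothetical on-ice shot attempts per game, both directions
TRUE_SHARE = 0.48      # hypothetical true share for a player who is playing poorly
GREAT_SHARE = 0.55     # hypothetical cutoff for "great metrics"

p_after_10 = tail_probability(10 * EVENTS_PER_GAME, TRUE_SHARE, GREAT_SHARE)
p_after_50 = tail_probability(50 * EVENTS_PER_GAME, TRUE_SHARE, GREAT_SHARE)

print(f"chance of 'great' numbers after 10 games: {p_after_10:.4%}")
print(f"chance of 'great' numbers after 50 games: {p_after_50:.4%}")
```

Under these assumptions the chance of a flattering result by luck alone is far larger after 10 games than after 50, and that is before accounting for teammates, opponents, and zone starts, which this model ignores.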

I completely understand what you mean.

If you have 1/10 and 10/100 and then add one to each, you get 2/10 compared to 11/100; the smaller sample will be completely skewed and will have a larger chance of being wrong. However, I see no problem in saying "hey look at this 1/10 data, it's promising". That's a completely exaggerated way of explaining what I'm doing, but I understand your argument. I don't think it's necessary because I'm not saying "Look WOW! AMAZING 1/10!!!!".

You say you understand what I mean but then you say something that shows you do not understand what I mean. If the sample is so small and so fraught with noise that it could be almost entirely explained by random chance, then it is not logical to say that it is "promising" or that it says anything at all about the player in question.

Finally, if you do not want to argue this because it is "boring" then you are free to not reply, but do not try to hit-and-run me by misrepresenting my argument, claiming I am wrong and then being unwilling to argue further. It is pretty simple to just not respond.
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
No, I understand completely. You are assuming that the data so far paints a picture of his play, and I am disputing this. You do not seem to accept that a player can play very poorly and still wind up with the same numbers that Jake has (which I am not saying is the case).

You are assuming that the data in 40-50 games paints a picture of his play; I can dispute this.

If someone can put up these numbers of shots/scoring chances/individual corsi at the level Jake has, how can you dispute it? If you rank top of the charts in individual scoring chances, individual corsi, and shots, how the hell can you claim he's a freeloader? Those are HIS INDIVIDUAL statistics.

We can run around in circles in this argument. It's not worth it.
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
You are assuming that the data in 40-50 games paints a picture of his play; I can dispute this.

If you want to take the position that 50 games is still not a large enough sample, then I won't argue. This has been studied by people smarter than I, and generally half a season is seen as a meaningful dataset, but I won't argue with someone who thinks more data is required. Of course, then you cannot simultaneously cling to the idea that Jake's stats after 10 games somehow mean something. I am not sure what you are trying to accomplish here.

If someone can put up these numbers of shots/scoring chances/individual corsi at the level Jake has, how can you dispute it?

You are moving the goalposts by citing different metrics than the ones with which I had taken issue. I am not arguing about Jake's play, I am arguing against the use of his on-ice corsi data as a means of evaluating his play.

We can run around in circles in this argument. It's not worth it.

Then stop responding. Nobody is forcing you to reply except for your need to get the last word.

I am continuing to engage because I enjoy this conversation. If you do not, you can cease at any time.
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
If you want to take the position that 50 games is still not a large enough sample, then I won't argue. This has been studied by people smarter than I, and generally half a season is seen as a meaningful dataset, but I won't argue with someone who thinks more data is required. Of course, then you cannot simultaneously cling to the idea that Jake's stats after 10 games somehow mean something. I am not sure what you are trying to accomplish here.



You are moving the goalposts by citing different metrics than the ones with which I had taken issue. I am not arguing about Jake's play, I am arguing against the use of his on-ice corsi data as a means of evaluating his play.



Then stop responding. Nobody is forcing you to reply except for your need to get the last word.

I am continuing to engage because I enjoy this conversation. If you do not, you can cease at any time.

So anyone making the argument above the 50 game threshold you won't argue with. But anything below you will...

Why so arbitrary?
 

Melvin

21/12/05
Sep 29, 2017
15,207
28,115
Vancouver, BC
So anyone making the argument above the 50 game threshold you won't argue with. But anything below you will...

All I am trying to establish with you is that there actually is a threshold. Your initial position seemed to be that any amount of data is enough and you won't even acknowledge to me that one shift is too small a data set from which to draw conclusions about a player's play.

If I say it's 50 games, you say it is what....1 game? 1 shift? 1 second? If you will at least acknowledge that there exists a sufficiently small sample size that you are unable to draw any conclusions from it, good or bad, "promising" or not, then I will consider this a closed discussion.

Again, if the player plays one shift and leaves the ice +4 in corsi would you consider that a "promising" result, or do you agree that it's silly to assume that he played well from that one shift just because he had a +4 rating?
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
All I am trying to establish with you is that there actually is a threshold. Your initial position seemed to be that any amount of data is enough and you won't even acknowledge to me that one shift is too small a data set from which to draw conclusions about a player's play.

If I say it's 50 games, you say it is what....1 game? 1 shift? 1 second? If you will at least acknowledge that there exists a sufficiently small sample size that you are unable to draw any conclusions from it, good or bad, "promising" or not, then I will consider this a closed discussion.

Again, if the player plays one shift and leaves the ice +4 in corsi would you consider that a "promising" result, or do you agree that it's silly to assume that he played well from that one shift just because he had a +4 rating?

My choice of how many games to use to evaluate a player is just as arbitrary as yours. I think it's perfectly fine to do a micro-analysis of a player every game. But it's important to acknowledge that you should not draw firm conclusions from the data, because it is a small sample size. And it's important not to use definitive language when evaluating a player on a single-game or single-shift basis.
 

Hyzer

Jimbo is fired - the good guys won
Aug 10, 2012
4,941
2,162
Vancouver
If I recall, when Virty was on the Hitmen, his corsi and possession stats were actually very, very good. I will see if I can dig up the junior advanced stats when he was playing or at least find the site.
 

WTG

December 5th
Jan 11, 2015
24,408
8,798
Pickle Time Deli & Market
If I recall, when Virty was on the Hitmen, his corsi and possession stats were actually very, very good. I will see if I can dig up the junior advanced stats when he was playing or at least find the site.

CHL stats was the website, IIRC. I'm pretty sure the site has shut down and the guy got hired by an NHL team.
 

Hyzer

Jimbo is fired - the good guys won
Aug 10, 2012
4,941
2,162
Vancouver
CHL stats was the website, IIRC. I'm pretty sure the site has shut down and the guy got hired by an NHL team.


Yup, it was CHL stats, and I can see that it's down now. Too bad, it had some really good advanced stats for the WHL when I was looking at and comparing Virtanen and Bean to others during the draft year. Very unfortunate. I am 100% sure that Virtanen's advanced stats were actually surprisingly good during his draft year and his draft+1 year. Even during the draft+1 year, when he didn't score as much, his possession and corsi were still very good IIRC.

Too bad.
 