Overtime Game Statistical Oddity

The equation for the variance when using the binomial model is Var(X) = np(1-p), and the standard deviation is just the square root of the variance. I simply used (30/2)*100*27 as my sample size: 30 teams with 27 games each, divided by two because every game involves two teams, over 100 years.
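For anyone who wants to plug numbers into that formula, here is a minimal Python sketch. The sample size is the one given above; the value of p is not stated in this post, so the 25% OT rate used here is an assumption borrowed from a figure mentioned later in the thread.

```python
import math

def binomial_variance(n, p):
    """Variance of a binomial count: Var(X) = n * p * (1 - p)."""
    return n * p * (1 - p)

# Sample size from the post: 30 teams, 27 games each, halved because
# every game involves two teams, over 100 years.
n = (30 // 2) * 100 * 27

# ASSUMPTION: 25% of games go to OT (not stated in this post).
p = 0.25

variance = binomial_variance(n, p)
std_dev = math.sqrt(variance)

print(f"n = {n}, variance = {variance}, standard deviation = {std_dev:.2f}")
```

Swapping in a different p only changes the one assumed constant.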



If you would like to learn more about stats I highly recommend the following books:
Probability Theory: The Logic of Science by Jaynes
Naked Statistics: Stripping the Dread from the Data by Wheelan
They are some of the best basic statistics books out there. A calculus background is very useful when you start hitting some of the higher topics.

For data analysis, you absolutely must read
Data Analysis Using SQL and Excel by Linoff
It's a gem.

As for me, I'd like to eventually do advanced sports statistics for a living. I've been working on my own hockey analysis research here at Columbia, which I'll hopefully be able to publish sometime in the middle of next year.

It almost definitely won't work out, though, so I'll likely be looking at consulting/big data (companies like 1010data). In any case, if you really want to get into any analytics, a strong programming background is pretty much a prereq at this point. I'd start off with Python then move on to C++.

Thanks. BTW, it was brought to my attention that not all of the samples are independent, since the 27 game stretches overlap. If there are 3 OT games in the first 27 game sample there is a 0% chance that there will be 1 or fewer OT games in the 2nd one. So I think I bit off more than I could chew with this analysis.

How much do you know about finance? If I remember correctly, variance in finance is calculated the exact same way, no? I was reading a bit on risk management and that equation looks awfully similar. In stocks I think the variance and SD are a way of measuring volatility in the stock market. I assume this is something similar. I am, however, having a hard time figuring out what p you used.
 
Eh, Tawnos, if you read through the thread you'll notice I overlooked 2 HUGE factors. If you look at ANY 27 game sample across ALL teams and not just 1 team, you get a 99.96% chance that within a given year a team will have a 27 game stretch with 1 or fewer OTs.

I forgot to look at all the 27 game samples. Within 1 season there are 56 per team (82 − 27 + 1). In the first post I took only 1 27 game stretch, and after that took 3 a season, so I needed 149 samples to get to over 50% (and that's BARELY 50%). I also forgot to include every team (30) and all 56 samples per team. That's 1,680 such samples a year.

So the chance that a team has 1 or fewer OT games in any 1 particular 27 game sample is less than 0.5%, but the chance that at least 1 of the 1,680 samples in a season has 1 or fewer OTs is 1 − (1 − p)^1,680, where p is that single-sample chance. That actually is a HUGE number: 99.96%. So the chance that NO team has any such 27 game stretch of 1 or fewer OT games is 0.04%. In other words it should pretty much happen every season and the Rangers are nothing special.

It just looks weird because it's the first 27 games. Since we're only looking at 1 such sample per team (no one remembers 27 game samples in the middle of the season), there's only an approximately 13% chance that a team does it to start the season. So there's a reason it looks weird: when we look at it with the eye test we're omitting 1,650 samples (1,680 − 30) because we're only looking at the first 27 games per team.
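The arithmetic in this post can be sketched in a few lines of Python. Two things here are assumptions rather than anything confirmed in the thread: the 25% OT rate used to get the single-sample probability, and the treatment of all 1,680 samples as independent (which, as noted earlier in the thread, overlapping stretches are not). Because the exact p used in the post isn't given, the printed values may differ a little from the percentages quoted above.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for a binomial count with n trials and success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def prob_at_least_once(p_single, samples):
    """Chance an event with probability p_single occurs in at least one of
    `samples` tries, ASSUMING the tries are independent."""
    return 1 - (1 - p_single) ** samples

OT_RATE = 0.25  # ASSUMPTION: share of games that go to OT

# Chance that one particular 27 game stretch has 1 or fewer OT games.
p_single = prob_at_most(1, 27, OT_RATE)

windows_per_team = 82 - 27 + 1        # 27 game stretches in an 82 game season
all_windows = 30 * windows_per_team   # every team, every stretch

print(f"single stretch:            {p_single:.4%}")
print(f"first 27 games, 30 teams:  {prob_at_least_once(p_single, 30):.2%}")
print(f"any stretch, any team:     {prob_at_least_once(p_single, all_windows):.2%}")
```

Since the overlapping stretches are correlated, the last figure is best read as a rough upper-end estimate rather than an exact probability.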

Great. Except that I wasn't talking about any of that. What you haven't looked into is what the NEXT 27 game stretch looks like for those teams. Hypothetically speaking, if the average of OT games for all teams in a 54 game stretch is 8 games, when a team has only 1 in the first 27 games, do they often have 7 in the next 27? (none of those numbers are correct... just an example)

Like I said, it would be interesting to see if the trend swings the other way for this particular team. Not really interested in probabilities in that sense, but rather what the real results are going to be.
 

Well if those were independent events I don't see why 7 games in the next 27 is more likely than the average amount of games for a 8 game stretch. So say 8 per 54 is the average and 4 per 27, I don't see why 7 for the next 27 is more likely than 4 for the next 27. Unless you think they're not independent.
 
Sorry I don't see why 7 games in the next 27 is more likely than an average amount of games for a 27 game stretch, not 8 game stretch.
 

What I'm saying is that it will be interesting to see if, at the end of another 27 game stretch, the trend matches what the probability says it should. That's all. And then another 27 games beyond that, if it doesn't.

Sorta like how people talk about shooting % being unsustainable in one way or the other. Is this an unsustainable trend the Rangers have been on and does it correct itself over time to better match the probability (or average, since there's no other way to determine probability)? Or is it far enough into the season where the trend won't be able to correct itself fully? We'll see.
 

I guess you're talking about regression to the mean. I don't know. Are 82 games enough to regress to the mean? 27 games is 1/3 of that. Maybe if they played 1,000 games or something but I figure it wouldn't happen all at once.
 
Basically, considering that 25% of games go to OT, I'd assume that the likeliest scenario is X + 3X = 27, where X is the number of OT games. So that's 4X = 27, and X is about 7. (I actually believe I got that number before.) However, without doing the math, I'd think that if you took enough samples, the ones with, say, 7 OT games and 20 regulation games would be a plurality but far from a majority. So you're bound to have stretches of 1 OT game or 11 OT games over a large enough sample. If the sample were large enough (say 1,000 games), you'd likely get an above average stretch that would average it out. However, since we're taking 82 games, it seems pretty unlikely that it would average out, because you basically have 2 more shots (2 samples of 27).
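The "plurality but not a majority" hunch can be checked by doing the math. This is a rough sketch under the binomial model, using the 25% OT rate from the post and assuming games are independent:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count with n trials and success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N_GAMES = 27
OT_RATE = 0.25  # the 25% figure from the post

# Probability of each possible OT count, from 0 up to 27.
pmf = [binom_pmf(k, N_GAMES, OT_RATE) for k in range(N_GAMES + 1)]

most_likely = max(range(N_GAMES + 1), key=lambda k: pmf[k])

print(f"most likely OT count: {most_likely} (probability {pmf[most_likely]:.1%})")
print(f"P(exactly 7 OT games) = {pmf[7]:.1%}")
print(f"P(1 or fewer)         = {pmf[0] + pmf[1]:.2%}")
print(f"P(11 or more)         = {sum(pmf[11:]):.2%}")
```

Under these assumptions 6 and 7 OT games come out equally likely, and neither is anywhere near a majority of stretches, so counts well away from 7 are expected fairly often.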
 
Some years back I was close to going to Columbia until I found out they required Engineers to take foreign language courses :D
And, as for a sports analytics book, I would recommend The Book by Tom Tango. It gets deep into statistics and makes a lot of conclusions from them. It also shows a lot of the math behind the work. It's solely baseball focused though.

It's literally any language, though.

I took a semester of Polish (which I'm fluent in already) and a semester of Yiddish (didn't learn anything, got an A). You could learn Swedish and call out to the King :laugh:
 

When people talk about regression to the mean in sh %, that was mainly a while back. Nowadays, when some coaches' systems specifically rely on shots shots shots from wherever, you could very well see a high Corsi and low sh % that doesn't regress to the mean.

What you were talking about earlier is conditional probability, which doesn't really work here because these events are at the very least quasi-independent, i.e. how the last 27 games went has no bearing on whether the next game goes to OT. You could use conditional probability for something like how many of these next 27 will go to OT given that Callahan is out (say statistically Callahan makes the team go to OT 3% more than on average).

Regressing to the mean doesn't mean what a lot of people here think it does. If we go through a spurt of 2% sh% and the mean is 7%, it doesn't mean that soon we'll have 12% sh%. It means that in the future, sh% should be 7% over a very long period of games. This period is much longer than the spurt of 2% sh%, and its sheer length will make the 2% statistically insignificant.
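A quick way to see this is to blend a cold spell with a growing number of later shots taken at the mean rate. The 2% and 7% figures are the ones from the example above; the shot counts are made-up numbers purely for illustration.

```python
def blended_rate(cold_rate, cold_shots, mean_rate, later_shots):
    """Overall shooting rate after a cold spell followed by shots at the mean rate."""
    goals = cold_rate * cold_shots + mean_rate * later_shots
    return goals / (cold_shots + later_shots)

COLD_RATE = 0.02   # the 2% spurt from the example
MEAN_RATE = 0.07   # the 7% long-run mean from the example
COLD_SHOTS = 300   # HYPOTHETICAL length of the cold spell, in shots

# The later shots are at 7%, never at 12%, yet the overall rate still
# drifts toward 7% as the cold spell becomes a smaller share of the total.
for later_shots in (0, 300, 3_000, 30_000):
    overall = blended_rate(COLD_RATE, COLD_SHOTS, MEAN_RATE, later_shots)
    print(f"{later_shots:>6} later shots -> overall sh% = {overall:.2%}")
```

The overall rate approaches 7% from below without ever overshooting it, which is the point: nothing has to "make up" for the cold spell.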


There's a lot more that goes into sh% though so it's hard to say. A coaching system that emphasizes lots of point shots will inevitably have a lower sh% than other teams. Based on the current rate of shot distance and shot quality projects, I predict that within 2 years we'll be able to gather all the shots taken by a team within a year and using previous trends, predict (without knowing the real sh% of that team beforehand) to within half a percent their sh %.
 
:laugh: Didn't I say "sorta like"? Obviously, I know it's not the same thing.

Yeah, I know what you mean, I was just trying to get a different point across (not to you specifically).

The problem is that people here assume 82 games is enough to regress towards the mean. It's not. But it's "good enough", you know? Technically to get a nice distribution you should get like 1000, that's why league shot percentage doesn't vary much year to year, but team shot percentage does.
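To put a rough number on "82 isn't enough but 1000 is", here is a small sketch using the standard error of a proportion, sqrt(p(1-p)/n). The 25% OT rate is the figure used elsewhere in the thread, and the games are assumed to be independent.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of an observed proportion over n independent trials."""
    return sqrt(p * (1 - p) / n)

OT_RATE = 0.25  # ASSUMPTION: OT rate taken from elsewhere in the thread

# One 27 game stretch, one full season, and a "nice" 1,000 game sample.
for games in (27, 82, 1000):
    se = standard_error(OT_RATE, games)
    print(f"{games:>5} games: observed OT rate typically within +/- {se:.1%} of the true rate")
```

The same scaling is why a league-wide percentage, built on many more events than any single team's, moves much less from year to year.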

Same thing applies to curves for grades in classes - you really shouldn't use one for classes of fewer than 300 or so, but it still works "ok" for classes of 50. You don't get a nice distribution, and the class's average ability might be above or below the general average. Therefore the work that earns a B one year might earn an A or a C the next. But that's just how it is.

Like,
[Image: chap06h.jpg]

Not a great distribution. But it's almost frankly good enough.

[Image: BD3x.jpg]

Not perfect, but is anyone really going to argue that it's not good enough? It's more likely to have outliers, but not so much that it can't be approximated as a model.
 
I bought Naked Statistics and it's very interesting! I find it amazing how many people hate and can't understand statistics. I always loved them. I make mistakes (this thread being exhibit A) but most of statistics is pretty intuitive and very interesting. I will say I used to completely not understand probability, so I guess I can see that, but most stat classes barely talk about it.
 
