Where to find historical stats for download?

plusandminus

Registered User
Mar 7, 2011
(I have searched, both by google and here on hfboards. Also, sorry for non-fluent English.)

Is there any website or API online where you can download stats like the ones on hockeyref?

Looking at hockeyref (I recently subscribed), there is miscellaneous data like TGF, TGA, PPGF and PPGA for every player. One can even list the stats for every player in the league in one long list. Unfortunately, for players who played for several teams during a season, that list shows only the TOT row. So it's incomplete; I would want it broken up as on this page (see link), where for the 35-year-old Gretzky you see his stats with LAK and with STL separately.
Wayne Gretzky Stats | Hockey-Reference.com
The list showing the misc stats for all players (you might need to subscribe) only lists the TOT row for players who played for multiple teams during a season.

I also wonder if there exists data about which players were on ice when a goal was scored.
18 years ago, it was listed for each game at nhl.com. I found a way to write a program that downloaded that stat for all 1230 games. I had to download the whole HTML pages, so the data was buried here and there in plain HTML, and there was inconsistency in how it was presented, etc. I probably spent more than 100 hours on the program.
But today? Have they stopped listing the information?

Ideally, I would just want a long list (text/excel/json/etc) showing like (made up example):
2003-02-14 3rd 11:45 COL-WSH PP Forsberg (Sakic, Foote), 21,19,52,4,23,1, 33,34,32,12,1,-)
where 21=Forsberg, 19=Sakic, etc, followed by the opponents on ice.
Or even better would be the player names. Or best of all the hockeyref keys, like forsbpe01, etc.
I really prefer hockeyref's keys, as opposed to more synthetic ones like 2342, 4325, 8439, etc. But those would do too, since I can translate them.
This would add a lot of information about who played with whom, or against whom, and it would also open up for things like factual ES+/-, etc.
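
A minimal sketch of what one row of such a list could look like as a JSON line. The class name `GoalEvent` and all field names are made up for illustration; they are not taken from nhl.com, hockeyref or any API. The values come from the made-up example above, with jersey numbers as player identifiers (hockeyref keys like forsbpe01 could be dropped in instead).

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class GoalEvent:
    """One goal with everyone on the ice (hypothetical layout)."""
    date: str                  # ISO date of the game
    period: int
    time: str                  # mm:ss elapsed in the period
    scoring_team: str
    opponent: str
    strength: str              # "ES", "PP" or "SH"
    scorer: str
    assists: List[str]
    on_ice_for: List[str]      # scoring team's players, goalie included
    on_ice_against: List[str]  # opponents, goalie included


def to_json_line(event: GoalEvent) -> str:
    """Serialize one goal as a single line of JSON."""
    return json.dumps(asdict(event))


example = GoalEvent(
    date="2003-02-14", period=3, time="11:45",
    scoring_team="COL", opponent="WSH", strength="PP",
    scorer="21", assists=["19", "52"],
    on_ice_for=["21", "19", "52", "4", "23", "1"],
    on_ice_against=["33", "34", "32", "12", "1"],
)
print(to_json_line(example))
```

One goal per line keeps the file easy to append to and easy to load into Excel or a database later.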

(A few years ago, I tried the hockey summary project. Great work by the people involved, but unfortunately the data contained lots of errors that took a lot of time to correct. It was also completely missing data for some seasons.)


So is there some api somewhere?
Or some place where the data is available as downloadable files?

(Though this thread is stat oriented, historical stats are referred to in basically every thread here.)
 
For your first query, try the NHL website's own stats page where you can use the filters to search for a particular player on a particular team for a particular season (Wayne Gretzky Kings and Blues) and get the various reports available. Of course, you actually have to search for the particular player and particular team, and the goals for and against report isn't available for that year, so maybe not what you're looking for.

For your second query, you can find HTML reports which provide game summaries for about the past twenty years. However, they do not seem to exist, or at least are not accessible, before 2000-2001. These give you a boxscore with information including who was on the ice on both sides (given by their jersey numbers). There is also the Stats API, which provides JSON files for most historical games (though some are missing for every season from 1942-1943). It has much of the boxscore information, but critically not who else was on the ice for the events. The Stats API has apparently been pretty thoroughly documented online already (sucks for me, I sort of learned it all myself lol).

I've actually used the Stats API to gather the boxscore data I could for players' points, putting them into JSON files which store every player (that actually scored a point) as a "dictionary"/key-value pair of the player's name with an array of all the points they scored in a season, the point information including:

[Image: boxscore info.png]


I used it to get all sorts of stuff I always wanted for historical stats (points together, primary points, empty net points, points on goalies, score effects, etc). The plan was to put it all up on a website sometime, but I haven't gotten around to it yet for various reasons (I'm just lazy, kinda suck at computers and also kinda hate computers lol, plus the NHL now includes some of the stuff in their own stats reports).

Also, the way I parsed the data notably doesn't contain two specific things I wanted: who else was on the ice as you mentioned (which doesn't exist in the API anyway) and specific strength information (5v5 and so on) which doesn't exist explicitly, but could actually be derived painstakingly by cross referencing penalty plays and times with scoring plays and times.
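
A rough sketch of that cross-referencing idea: count how many penalties each team is serving at the moment of a goal and turn that into a "5v4"-style label. The `Penalty` class and the function names are hypothetical, and the logic is deliberately simplified. It assumes 5 skaters per side at full strength and ignores minors that end early on a power-play goal, coincidental penalties, misconducts, pulled goalies and overtime formats, all of which a real derivation would have to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Penalty:
    team: str     # team serving the penalty
    start: int    # seconds elapsed in the game when it was assessed
    minutes: int  # length of the penalty, e.g. 2 or 5


def skaters(team: str, penalties: List[Penalty], t: int) -> int:
    """Skaters a team has on the ice at game second t (never fewer than 3)."""
    serving = sum(
        1 for p in penalties
        # Assumption: a penalty counts only strictly between start and expiry.
        if p.team == team and p.start < t < p.start + 60 * p.minutes
    )
    return max(3, 5 - serving)


def strength_at(t: int, scoring_team: str, other_team: str,
                penalties: List[Penalty]) -> str:
    """Label such as '5v4', seen from the scoring team's side."""
    return f"{skaters(scoring_team, penalties, t)}v{skaters(other_team, penalties, t)}"


penalties = [Penalty(team="WSH", start=600, minutes=2)]
print(strength_at(660, "COL", "WSH", penalties))  # goal one minute into the minor
```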

Also, you should check out morehockeystats.com by @morehockeystats. The face2face report on his site shows players +/- against each other, so he has the data from the HTML reports already there. Perhaps he would run the reports you need from his dataset? It may cost you.

You can also try to reach out to one of the big sports stats organizations like Elias Sports Bureau to ask for datasets or reports, but that'll likely cost you as well if you even manage to get a response.

Good to see at least someone else is interested in collecting this stuff, I've been wanting to do this sort of thing for like over ten years now. Maybe I will eventually get some of these historic stats up.
 
For your second query, you can find HTML reports which provide HTML game summaries for games for about the past twenty years.

Thank you very much!

Yeah, now I remember more. It was the gs-files (game summaries) and es-files (event summaries) I downloaded from nhl.com. There was some inconsistency in the way they stored the data, and I remember especially that Montreal home games were stored quite differently (for example in French). There could also be errors in the files.
Thanks to you, I now even found the program I wrote (in Visual Basic, it seems) 18+ years ago. Unfortunately it's not currently runnable, as I've upgraded Windows a couple of times since then.
It was the 2002-03 season I focused upon back then.

Today's gs-files (game summaries) even seem to list the names of the skaters on ice.


I understand. But yeah, I was hoping for a simpler way, rather than having to look up each player.


Thanks. So the Stats API has data back to at least 1942-43? (I wasn't able to find that info anywhere on their site. I also couldn't find the price for using the API.) Unfortunate that it's incomplete.
I'll wait a little regarding the Stats API, to focus on nhl.com and hockeyreference. Hockeyreference has the ES/PP/SH info for goals. And one can also use the gs-files to figure out how many players on each team were on the ice, but (as you say) only for the last 20 seasons.

(By the way, I think the NHL should make ES +/- an official stat. And it should only include ES goals scored when both teams had a goalie on the ice.)
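
To make that definition concrete, here is a small sketch that computes such an ES +/- from goal records that already carry the on-ice players. The record layout (`strength`, `scoring_goalie_on_ice`, `defending_goalie_on_ice`, `on_ice_for`, `on_ice_against`) is hypothetical and only there for illustration; the lists are assumed to hold skaters only.

```python
from collections import Counter
from typing import Dict, Iterable


def es_plus_minus(goals: Iterable[Dict]) -> Counter:
    """+1 / -1 per skater, counting only ES goals with both goalies in net."""
    totals: Counter = Counter()
    for goal in goals:
        if goal["strength"] != "ES":
            continue  # skip power-play and shorthanded goals
        if not (goal["scoring_goalie_on_ice"] and goal["defending_goalie_on_ice"]):
            continue  # skip empty-net and extra-attacker goals
        for player in goal["on_ice_for"]:
            totals[player] += 1
        for player in goal["on_ice_against"]:
            totals[player] -= 1
    return totals


goals = [
    {"strength": "ES", "scoring_goalie_on_ice": True, "defending_goalie_on_ice": True,
     "on_ice_for": ["A", "B"], "on_ice_against": ["X", "Y"]},
    {"strength": "PP", "scoring_goalie_on_ice": True, "defending_goalie_on_ice": True,
     "on_ice_for": ["A"], "on_ice_against": ["X"]},
    {"strength": "ES", "scoring_goalie_on_ice": True, "defending_goalie_on_ice": False,
     "on_ice_for": ["B"], "on_ice_against": ["Y"]},
]
print(dict(es_plus_minus(goals)))
```

Only the first goal counts here: the second is a power-play goal and the third went into an empty net.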


Thanks for the links.
I'll start to see what I can gain from the gs- and es-files.

Yeah, I've done these things. What I did in early 2003 was probably quite unusual for someone to do. I even analyzed every shift, and things like zone starts, how much player A played with player B, etc. It took plenty of hard disk space and took time to run, but with today's hardware that should be easier.

However, lack of time is a factor. I work full time and have other things than hockey stats I need to focus on.
 
Right, I really wanted the NHL to have these HTML reports from earlier on, but it looks like 2000-2001 is the earliest we will get. Who was on ice on both teams when a goal was scored must have been tracked all the way back to when plus/minus started being tracked, but this report really is the only place I know it exists that is available to us.

Should be pretty easy to scrape the data if you know HTML (I am not super good with it and so haven't gotten it yet), just iterate through all the game IDs.

NHL's game ID system is nice and easy: http://www.nhl.com/scores/htmlreports/ + season (e.g. 20012002) + / + GS + game type (01 for preseason/all star/etc, 02 for regular season, 03 for playoff) + game id + .HTM

The game id for the regular season just runs from 0001 up to (number of teams that season * number of games per team that season) / 2, for example 30 * 82 / 2 = 1230 (last season would be an exception).

The game id for playoffs is four digits: first digit 0, second digit the round (1-4), third digit the matchup (1-8), fourth digit the game (1-7). You'll want to handle empty cases here to make it programmatic rather than finding the actual values for every playoff year.
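
The scheme above can be sketched as a few small helpers. The function names are made up, and the `report` parameter assumes the es-files follow the same naming as the GS files with an "ES" prefix. The code only builds candidate URLs; downloading, and skipping the playoff combinations that never happened, is left out.

```python
from typing import Iterator

BASE = "http://www.nhl.com/scores/htmlreports/"


def report_url(season: str, game_type: int, game_id: int, report: str = "GS") -> str:
    """Build a report URL, e.g. season '20012002', game type 2, game id 1."""
    return f"{BASE}{season}/{report}{game_type:02d}{game_id:04d}.HTM"


def regular_season_ids(teams: int, games_per_team: int) -> range:
    """Game ids 1 .. teams * games_per_team / 2 (each game involves two teams)."""
    return range(1, teams * games_per_team // 2 + 1)


def playoff_ids() -> Iterator[int]:
    """All candidate playoff ids 0RMG: round 1-4, matchup 1-8, game 1-7."""
    for rnd in range(1, 5):
        for matchup in range(1, 9):
            for game in range(1, 8):
                yield rnd * 100 + matchup * 10 + game


print(report_url("20012002", 2, 1))
print(len(regular_season_ids(30, 82)))
```

Many of the playoff candidates will not exist (later rounds have fewer matchups, and most series end before game 7), so a scraper would just skip any URL that does not return a report.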

The Stats API is provided by the NHL. It's been available for some years now (roughly since the big redesign of the website). It is what is used to build the Scores reports on the NHL site. It contains data back to the first season. In fact, the data from before the start of the Original Six era is more complete. I believe when the game boxscores were digitized, they started with the earliest period of NHL history and actually did a thorough job with it. There are over 1000 missing games ("missing" means the play data that has the boxscore info is missing in the report) up to 2017-2018 (maybe more now, I haven't gotten the last few seasons yet) but only two before 1942-1943, and I don't believe those games are actually missing, just that they were both 0-0 games with no relevant play data available (goals or penalties).

The Stats API is free to use for now and contains a fairly comprehensive set of context around a goal scored as shown above. Only thing it unfortunately is missing is who was on the ice, and you'd need to derive the actual strength beyond even strength/power play/shorthanded from cross referencing penalty times and score times (at least for older games, this info may actually exist for newer games to build the NHL's own stat reports).

These do a decent job documenting the various APIs available (one being discussed here is the "game" API for boxscore info):
How to Retrieve Player Stats from the NHL's undocumented REST API | Hacker Noon
dword4/nhlapi

I played around a lot with the liveData in the game API learning through trial and error (shootout goals were a complete pain that confounded me for months lol) so happy to give you pointers if you need them, just send me a message.
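
As a starting point, here is a sketch of pulling the scoring plays out of one game's JSON once it has been downloaded. The key names (`liveData`, `plays`, `allPlays`, `scoringPlays`, `playerType`, `periodType` and so on) are assumptions based on how the community documentation linked above describes the game feed, so check them against a real response before relying on them. The function name and the output layout are made up.

```python
from typing import Dict, List


def scoring_plays(feed: Dict) -> List[Dict]:
    """Extract goals from a game feed dict (assumed key names, see above)."""
    plays = feed.get("liveData", {}).get("plays", {})
    all_plays = plays.get("allPlays", [])
    goals = []
    for idx in plays.get("scoringPlays", []):  # assumed: indexes into allPlays
        play = all_plays[idx]
        about = play.get("about", {})
        if about.get("periodType") == "SHOOTOUT":  # assumed value; skip shootout goals
            continue
        names: Dict[str, List[str]] = {}
        for entry in play.get("players", []):
            names.setdefault(entry["playerType"], []).append(entry["player"]["fullName"])
        result = play.get("result", {})
        goals.append({
            "period": about.get("period"),
            "time": about.get("periodTime"),
            "team": play.get("team", {}).get("triCode"),
            "strength": result.get("strength", {}).get("code"),
            "empty_net": result.get("emptyNet", False),
            "scorer": (names.get("Scorer") or [None])[0],
            "assists": names.get("Assist", []),
        })
    return goals
```

Using `.get` with defaults throughout means a game with missing play data simply yields an empty list instead of raising an error.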

However, lack of time is a factor. I work full time and have other things than hockey stats I need to focus on.

So I did quite a bit of this work at work lol! Hey, I work as product in tech so I was sharpening my technical skills :sarcasm:
 
Thanks! Yeah, I get it. I had forgotten that 01 is preseason, 02 is regular season, and 03 is playoffs. Great to have it all available.

I also found the play by play stats. For example:
Play By Play
I remember using it to, for example, examine how much faceoffs meant. The conclusions were that where the faceoff takes place is more important than who "wins" it, and that the advantage of having an offensive zone faceoff lasted for about 8-10 seconds, after which it was rather the other team that was more likely to score.
One can use it for things like "shot +/-" too, although that is a stat I think most people seem to misunderstand. (It's not shooting that matters, it's scoring goals, and there is often not a strong correlation between the two.)
One can also use it to see how often different players started shifts together.
And zone starts.
And to see who got penalized and who it was "drawn by".
(I also know there are sites listing these things.)


Thanks. I'll check it out a bit later. First the gamelogs and event stats. :)


So I did quite a bit of this work at work lol! Hey, I work as product in tech so I was sharpening my technical skills :sarcasm:

Yeah, look at it as a win-win. :)
 
