National Hockey League Statistics
We can look at NHL statistics by team, using data from hockey-reference.com, or by player, using ESPN NHL Statistics.
Statistics by Team
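# team abbreviation as used in the hockey-reference.com URL (EDM = Edmonton Oilers)
team = 'EDM'
# year in which the season ends ('2019' gives the 2018-19 season)
year = '2019'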
# download the data
import pandas as pd
team_stats_url = 'https://www.hockey-reference.com/teams/'+team+'/'+year+'_games.html'
team_stats = pd.read_html(team_stats_url)[0]
# clean up the data
team_stats = team_stats[team_stats['Date']!='Date'].set_index('GP').drop(columns=['W','L','OL','Streak','Notes'])
team_stats.columns = ['Date', 'Away', 'Opponent', 'Goals For', 'Goals Against', 'Win', 'Overtime', 'Attendance', 'Duration']
team_stats = team_stats.fillna(0).replace('@', 1).replace('OT', 1).replace('W',1).replace('SO',1).replace('L',0)
# convert text string columns to number columns
team_stats['Goals For'] = pd.to_numeric(team_stats['Goals For'])
team_stats['Goals Against'] = pd.to_numeric(team_stats['Goals Against'])
team_stats['Attendance'] = pd.to_numeric(team_stats['Attendance'])
# convert duration in h:mm to duration in minutes
duration_values = team_stats['Duration'].str.split(':', expand=True).astype(int)
team_stats['Duration'] = duration_values[0]*60 + duration_values[1]
# display the data
team_stats
GP | Date | Away | Opponent | Goals For | Goals Against | Win | Overtime | Attendance | Duration |
---|---|---|---|---|---|---|---|---|---|
1 | 2018-10-06 | 1 | New Jersey Devils | 2 | 5 | 0 | 0 | 12044 | 151 |
2 | 2018-10-11 | 1 | Boston Bruins | 1 | 4 | 0 | 0 | 17565 | 148 |
3 | 2018-10-13 | 1 | New York Rangers | 2 | 1 | 1 | 0 | 17085 | 144 |
4 | 2018-10-16 | 1 | Winnipeg Jets | 5 | 4 | 1 | 1 | 15321 | 149 |
5 | 2018-10-18 | 0 | Boston Bruins | 3 | 2 | 1 | 1 | 18347 | 146 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
78 | 2019-03-30 | 0 | Anaheim Ducks | 1 | 5 | 0 | 0 | 18347 | 141 |
79 | 2019-04-01 | 1 | Vegas Golden Knights | 1 | 3 | 0 | 0 | 18367 | 144 |
80 | 2019-04-02 | 1 | Colorado Avalanche | 2 | 6 | 0 | 0 | 17021 | 142 |
81 | 2019-04-04 | 0 | San Jose Sharks | 2 | 3 | 0 | 0 | 18347 | 147 |
82 | 2019-04-06 | 1 | Calgary Flames | 3 | 1 | 1 | 0 | 19289 | 145 |
82 rows × 9 columns
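The h:mm-to-minutes conversion in the cleanup cell can be checked without downloading anything. The following is a minimal sketch on a hand-made frame; the name `sample` and its three values are assumptions, chosen so the results match the first three durations in the table above.

```python
import pandas as pd

# hypothetical sample values in the h:mm format of the Duration column
sample = pd.DataFrame({'Duration': ['2:31', '2:28', '2:24']})

# split on ':' into an hours column (0) and a minutes column (1), as integers
parts = sample['Duration'].str.split(':', expand=True).astype(int)

# total minutes = hours * 60 + minutes
sample['Minutes'] = parts[0]*60 + parts[1]

print(sample)
```

Here 2:31 becomes 2*60 + 31 = 151 minutes, which is the value shown for game 1.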
Statistics by Player
This data set contains the following columns for each player in the NHL:
GP: Games Played
G: Goals
A: Assists
PTS: Points
+/-: Plus/Minus Rating
PIM: Penalty Minutes
PTS/G: Points Per Game
SOG: Shots on Goal
PCT: Shooting Percentage
GWG: Game-Winning Goals
G.1: Power-Play Goals
A.1: Power-Play Assists
G.2: Short-Handed Goals
A.2: Short-Handed Assists
This will take a while to run, since it needs to get data from multiple pages.
# download the data
import pandas as pd
points_url = 'http://www.espn.com/nhl/statistics/player/_/stat/points'
pages = []
for i in range(20):
    try:
        # each page starts at player number 1, 41, 81, ...
        p = pd.read_html(points_url+'/count/'+str(1+40*i), header=1)[0]
        # drop repeated header rows and empty rows, then fill blank ranks (ties) from the row above
        p = p[p['PLAYER']!='PLAYER'].dropna(subset=['PLAYER']).ffill()
        pages.append(p)
    # skip pages that fail to load, e.g. if the site has run out of data
    except Exception:
        continue
# combine all pages into one table with a fresh index
points = pd.concat(pages, ignore_index=True)
# convert text string columns to number columns
for column in points.columns:
    if column != 'PLAYER' and column != 'TEAM':
        points[column] = pd.to_numeric(points[column])
# split the player name and position into two columns
name_and_position = points['PLAYER'].str.split(',', expand=True)
points['POSITION'] = name_and_position[1].str.strip()
points['PLAYER'] = name_and_position[0]
# display the data
points
 | RK | PLAYER | TEAM | GP | G | A | PTS | +/- | PIM | PTS/G | SOG | PCT | GWG | G.1 | A.1 | G.2 | A.2 | POSITION |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Nikita Kucherov | TB | 20 | 6 | 20 | 26 | 13 | 20 | 1.30 | 72 | 8.3 | 1 | 0 | 7 | 0 | 0 | RW |
1 | 2 | Nathan MacKinnon | COL | 15 | 9 | 16 | 25 | 13 | 12 | 1.67 | 65 | 13.8 | 0 | 3 | 6 | 0 | 0 | C |
2 | 2 | Brayden Point | TB | 18 | 9 | 16 | 25 | 10 | 8 | 1.39 | 54 | 16.7 | 2 | 1 | 3 | 0 | 0 | C |
3 | 4 | Miro Heiskanen | DAL | 22 | 5 | 18 | 23 | 5 | 2 | 1.05 | 47 | 10.6 | 0 | 2 | 6 | 0 | 0 | D |
4 | 5 | Mikko Rantanen | COL | 15 | 7 | 14 | 21 | 11 | 6 | 1.40 | 55 | 12.7 | 0 | 2 | 6 | 0 | 0 | RW |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
370 | 361 | Marcus Foligno | MIN | 4 | 0 | 1 | 1 | -2 | 5 | 0.25 | 4 | 0.0 | 0 | 0 | 0 | 0 | 0 | LW |
371 | 361 | Carl Hagelin | WSH | 8 | 0 | 1 | 1 | -4 | 2 | 0.13 | 4 | 0.0 | 0 | 0 | 0 | 0 | 0 | LW |
372 | 361 | Jake Evans | MTL | 6 | 0 | 1 | 1 | -1 | 0 | 0.17 | 3 | 0.0 | 0 | 0 | 0 | 0 | 0 | C |
373 | 361 | Ilya Kovalchuk | WSH | 8 | 0 | 1 | 1 | 0 | 2 | 0.13 | 5 | 0.0 | 0 | 0 | 0 | 0 | 0 | LW |
374 | 361 | Morgan Geekie | CAR | 8 | 0 | 1 | 1 | -1 | 0 | 0.13 | 4 | 0.0 | 0 | 0 | 0 | 0 | 0 | C |
375 rows × 18 columns
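The column names G.1, A.1, G.2 and A.2 are hard to read; the numeric suffixes appear to come from pandas de-duplicating repeated G and A headers when it parses the page. As a rough sketch, they could be renamed to match the column descriptions listed earlier. The frame `points_sample` and the replacement names in `readable_names` are assumptions for illustration; the two rows are copied from the table above.

```python
import pandas as pd

# small hand-made frame using the same column names as the player table
points_sample = pd.DataFrame({
    'PLAYER': ['Nikita Kucherov', 'Nathan MacKinnon'],
    'G': [6, 9], 'A': [20, 16],
    'G.1': [0, 3], 'A.1': [7, 6],
    'G.2': [0, 0], 'A.2': [0, 0],
})

# hypothetical descriptive names for the suffixed columns
readable_names = {
    'G.1': 'PP Goals', 'A.1': 'PP Assists',
    'G.2': 'SH Goals', 'A.2': 'SH Assists',
}

# rename returns a new frame; columns not in the mapping are left alone
points_sample = points_sample.rename(columns=readable_names)

print(points_sample.columns.tolist())
```

The same `rename` call should work on the full `points` table, assuming the site still returns these column names.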