{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)\n", "\n", "\"Open" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# National Hockey League Statistics\n", "\n", "We can look at NHL statistics by team or by player, using data from [hockey-reference.com](https://www.hockey-reference.com) or [ESPN NHL Statistics](http://www.espn.com/nhl/statistics).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics by Team" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateAwayOpponentGoals ForGoals AgainstWinOvertimeAttendanceDuration
GP
12018-10-061New Jersey Devils250012044151
22018-10-111Boston Bruins140017565148
32018-10-131New York Rangers211017085144
42018-10-161Winnipeg Jets541115321149
52018-10-180Boston Bruins321118347146
..............................
782019-03-300Anaheim Ducks150018347141
792019-04-011Vegas Golden Knights130018367144
802019-04-021Colorado Avalanche260017021142
812019-04-040San Jose Sharks230018347147
822019-04-061Calgary Flames311019289145
\n", "

82 rows × 9 columns

\n", "
" ], "text/plain": [ " Date Away Opponent Goals For Goals Against Win \\\n", "GP \n", "1 2018-10-06 1 New Jersey Devils 2 5 0 \n", "2 2018-10-11 1 Boston Bruins 1 4 0 \n", "3 2018-10-13 1 New York Rangers 2 1 1 \n", "4 2018-10-16 1 Winnipeg Jets 5 4 1 \n", "5 2018-10-18 0 Boston Bruins 3 2 1 \n", ".. ... ... ... ... ... ... \n", "78 2019-03-30 0 Anaheim Ducks 1 5 0 \n", "79 2019-04-01 1 Vegas Golden Knights 1 3 0 \n", "80 2019-04-02 1 Colorado Avalanche 2 6 0 \n", "81 2019-04-04 0 San Jose Sharks 2 3 0 \n", "82 2019-04-06 1 Calgary Flames 3 1 1 \n", "\n", " Overtime Attendance Duration \n", "GP \n", "1 0 12044 151 \n", "2 0 17565 148 \n", "3 0 17085 144 \n", "4 1 15321 149 \n", "5 1 18347 146 \n", ".. ... ... ... \n", "78 0 18347 141 \n", "79 0 18367 144 \n", "80 0 17021 142 \n", "81 0 18347 147 \n", "82 0 19289 145 \n", "\n", "[82 rows x 9 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "team = 'EDM'\n", "year = '2019'\n", "\n", "# download the data\n", "import pandas as pd\n", "team_stats_url = 'https://www.hockey-reference.com/teams/'+team+'/'+year+'_games.html'\n", "team_stats = pd.read_html(team_stats_url)[0]\n", "# clean up the data\n", "team_stats = team_stats[team_stats['Date']!='Date'].set_index('GP').drop(columns=['W','L','OL','Streak','Notes'])\n", "team_stats.columns = ['Date', 'Away', 'Opponent', 'Goals For', 'Goals Against', 'Win', 'Overtime', 'Attendance', 'Duration']\n", "team_stats = team_stats.fillna(0).replace('@', 1).replace('OT', 1).replace('W',1).replace('SO',1).replace('L',0)\n", "# convert text string columns to number columns\n", "team_stats['Goals For'] = pd.to_numeric(team_stats['Goals For'])\n", "team_stats['Goals Against'] = pd.to_numeric(team_stats['Goals Against'])\n", "team_stats['Attendance'] = pd.to_numeric(team_stats['Attendance'])\n", "# convert duration in h:mm to duration in minutes\n", "duration_values = team_stats['Duration'].str.split(':', expand=True).astype(int)\n", "team_stats['Duration'] = duration_values[0]*60 + duration_values[1]\n", "# display the data\n", "team_stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics by Player\n", "\n", "This data set contains the following columns for each player in the NHL:\n", "- GP: Games Played\n", "- G: Goals\n", "- A: Assists\n", "- PTS: Points\n", "- +/-: Plus/Minus Rating\n", "- PIM: Penalty Minutes\n", "- PTS/G: Points Per Game\n", "- SOG: Shots on Goal\n", "- PCT: Shooting Percentage\n", "- GWG: Game-Winning Goals\n", "- G.1: Power-Play Goals\n", "- A.1: Power-Play Assists\n", "- G.2: Short-Handed Goals\n", "- A.2: Short-Handed Assists\n", "\n", "This will take a while to run, since it needs to get data from multiple pages." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
RKPLAYERTEAMGPGAPTS+/-PIMPTS/GSOGPCTGWGG.1A.1G.2A.2POSITION
01Nikita KucherovTB206202613201.30728.310700RW
12Nathan MacKinnonCOL159162513121.676513.803600C
22Brayden PointTB18916251081.395416.721300C
34Miro HeiskanenDAL2251823521.054710.602600D
45Mikko RantanenCOL15714211161.405512.702600RW
.........................................................
370361Marcus FolignoMIN4011-250.2540.000000LW
371361Carl HagelinWSH8011-420.1340.000000LW
372361Jake EvansMTL6011-100.1730.000000C
373361Ilya KovalchukWSH8011020.1350.000000LW
374361Morgan GeekieCAR8011-100.1340.000000C
\n", "

375 rows × 18 columns

\n", "
" ], "text/plain": [ " RK PLAYER TEAM GP G A PTS +/- PIM PTS/G SOG PCT \\\n", "0 1 Nikita Kucherov TB 20 6 20 26 13 20 1.30 72 8.3 \n", "1 2 Nathan MacKinnon COL 15 9 16 25 13 12 1.67 65 13.8 \n", "2 2 Brayden Point TB 18 9 16 25 10 8 1.39 54 16.7 \n", "3 4 Miro Heiskanen DAL 22 5 18 23 5 2 1.05 47 10.6 \n", "4 5 Mikko Rantanen COL 15 7 14 21 11 6 1.40 55 12.7 \n", ".. ... ... ... .. .. .. ... ... ... ... ... ... \n", "370 361 Marcus Foligno MIN 4 0 1 1 -2 5 0.25 4 0.0 \n", "371 361 Carl Hagelin WSH 8 0 1 1 -4 2 0.13 4 0.0 \n", "372 361 Jake Evans MTL 6 0 1 1 -1 0 0.17 3 0.0 \n", "373 361 Ilya Kovalchuk WSH 8 0 1 1 0 2 0.13 5 0.0 \n", "374 361 Morgan Geekie CAR 8 0 1 1 -1 0 0.13 4 0.0 \n", "\n", " GWG G.1 A.1 G.2 A.2 POSITION \n", "0 1 0 7 0 0 RW \n", "1 0 3 6 0 0 C \n", "2 2 1 3 0 0 C \n", "3 0 2 6 0 0 D \n", "4 0 2 6 0 0 RW \n", ".. ... ... ... ... ... ... \n", "370 0 0 0 0 0 LW \n", "371 0 0 0 0 0 LW \n", "372 0 0 0 0 0 C \n", "373 0 0 0 0 0 LW \n", "374 0 0 0 0 0 C \n", "\n", "[375 rows x 18 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# download the data\n", "points_url = 'http://www.espn.com/nhl/statistics/player/_/stat/points'\n", "import pandas as pd\n", "for i in range(20):\n", " try:\n", " p = pd.read_html(points_url+'/count/'+str(1+40*i), header=1)[0]\n", " p = p[p['PLAYER']!='PLAYER'].dropna(subset=['PLAYER']).fillna(method='ffill')\n", " if i == 0:\n", " points = p\n", " else:\n", " points = points.append(p).reset_index().drop(columns='index')\n", " # if the site has run out of data\n", " except:\n", " pass\n", "# convert text string columns to number columns\n", "for column in points.columns:\n", " if column != 'PLAYER' and column != 'TEAM':\n", " points[column] = pd.to_numeric(points[column])\n", "# split the player name and position into two columns\n", "points['POSITION'] = points['PLAYER'].str.split(',', expand=True)[1]\n", "points['PLAYER'] = points['PLAYER'].str.split(',', expand=True)[0]\n", "# display the data\n", "points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }