{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Music is ubiquitous\n",
"\n",
"Music is one of the ways we humans express ourselves. Not only can music help form your identity, it has also helps people to connect in this globalizing world. Sharing an interest for a particular kind of music has allowed artists to travel across the globe. \n",
"\n",
"With online streaming platforms it has become easier ever to listen music from artists from different countries. Let's analyse the dataset from [Spotify](https://spotifycharts.com/regional), an audio streaming platform, which is available on [Kaggle](https://www.kaggle.com/datasets). \n",
"\n",
"The [original dataset](https://www.kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking/version/3) (with more than 3 million rows!) contained information about top 200 artists in daily charts in various countries for 2017. It has been modified to contain only top 10 artists which reduced the number of rows to 191848, still a [big dataset](https://en.wikipedia.org/wiki/Big_data) to work with. Als, note that the top artists were selected based on the *streams* of their tracks i.e. *number of times* songs were played by Spotify users."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Popularity of an artist across the globe \n",
"Let's check how many times tracks from a user-selected artist were streamed on Spotify. \n",
"\n",
"Run the code cells below and check what our dataset contains."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# You can comment out the following line if the module is already installed\n",
"!pip install wikipedia\n",
"\n",
"# Import Python libraries\n",
"import pandas as pd\n",
"import calendar\n",
"import plotly.express as px\n",
"import plotly.graph_objects as go\n",
"from ipywidgets import interact, fixed, widgets, Layout, Button, Box, fixed, HBox, VBox\n",
"import wikipedia\n",
"from IPython.display import clear_output\n",
"import random\n",
"\n",
"# Don't show warnings in output\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"clear_output()\n",
"\n",
"print('Libraries successfully imported.')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import the dataset and remove rows with missing entries\n",
"df = pd.read_csv('./Data/top_10_artists_spotify_2017_final.csv').dropna()\n",
"\n",
"# Import country codes (required for the choropleth map)\n",
"country_codes = pd.read_csv('./Data/country_codes.csv',sep='\\t')\n",
"country_codes['2let'] = country_codes['2let'].str.lower() # Convert country codes to lower case\n",
"\n",
"# Display the top 5 rows\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define a callback function for \"Show Popularity\" button\n",
"def show_popularity(ev):\n",
" clear_output(wait=True)\n",
" \n",
" # Define display order for the buttons and menus\n",
" display(Box(children = [artist_menu], layout = box_layout))\n",
" display(Box(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'row', align_items= 'center', width='100%', justify_content = 'center')))\n",
"\n",
" # Find total streams for a user-selected artist across countries\n",
" subset = df[df['Artist'] == artist_menu.value].groupby('Region')\n",
" total_streams = subset['Streams'].sum().to_frame('Streams').reset_index()\n",
" \n",
" # Merge the 3 letter country codes (required in plotly express) with the data\n",
" final = total_streams.merge(country_codes, left_on='Region', right_on='2let', how='inner')\\\n",
" .drop('2let',1)\\\n",
" .rename(columns={'3let':'Country Code'})\n",
" \n",
" # Find the wikipedia page for the artist \n",
" try:\n",
" p = wikipedia.page(artist_menu.value)\n",
" # If can't find the exact page, get the closest one \n",
" except wikipedia.exceptions.DisambiguationError as e:\n",
" s = e.options[0]\n",
" p = wikipedia.page(s)\n",
" \n",
" # Plot the choropleth map for the user-selected artist\n",
" fig = px.choropleth(final, # dataframe with required data \n",
" locations=\"Country Code\", # Column containing country codes\n",
" color=\"Streams\", # Color of country should be based on number of streams\n",
" hover_name=\"Countrylet\", # title to add to hover information\n",
" hover_data=[\"Streams\"], # data to add to hover information\n",
" projection='natural earth', # preferred view of choropleth map\n",
" \n",
" # Add title for the map (hyperlinks for wiki page of artist and dataset are added) \n",
" title = 'Popularity of {} by streams of tracks in daily Top 10 chart (2017)
Source: \\\n",
"\\\n",
"Spotify through Kaggle Datasets'.format(p.url,artist_menu.value))\n",
" fig.update_layout(geo=dict(showcountries=True))\n",
" # Show the figure\n",
" fig.show()\n",
"print('Defined the show_popularity function.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the code cell below and select the artist you want to analyze popularity of. Don't forget to click on `Show Popularity` button."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Layout for widgets\n",
"box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')\n",
"style = {'description_width': 'initial'}\n",
"\n",
"# Create dropdown menu for Artist\n",
"artist_menu = widgets.Dropdown(options = df['Artist'].sort_values().unique(), description ='Artist: ', style = style, disabled=False)\n",
"\n",
"# Create Show Popularity button and define click events\n",
"show_button = widgets.Button(button_style= 'info', description=\"Show Popularity\")\n",
"show_button.on_click(show_popularity)\n",
"\n",
"# Define display order for the buttons and menus\n",
"display(Box(children = [artist_menu], layout = box_layout))\n",
"display(Box(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'row', align_items= 'center', width='100%', justify_content = 'center')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try it for various artists. If you don't know much about the artist, click the link in the title of the map which will take you to the *Wikipedia* page of the artist. Hover your mouse over different countries and see how many times tracks were streamed.\n",
"\n",
"### Questions:\n",
"1. Which artists were popular in North America, Europe, as well as the Asia-Pacific region?\n",
"2. Share your thoughts on how music (and media in general) has brought people together and become the globalizing force."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Artists popular in different countries\n",
"Let's try to gauge the effect of globalization on music listened by people of various countries. We will plot the top artists streamed on Spotify.\n",
"\n",
"Run the code cells below and see that the additional columns (`Country Name` and `Country Code`) are added to the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Add Country Name and 3 letter Country Code to the dataset\n",
"df_with_country_names = df.merge(country_codes, left_on='Region', right_on='2let', how='inner')\\\n",
" .drop('2let',1)\\\n",
" .rename(columns={'3let':'Country Code','Countrylet':'Country Name'})\n",
"\n",
"# Display top 5 rows\n",
"df_with_country_names.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a dataframe containing names of 12 months and number\n",
"months_all = pd.DataFrame(calendar.month_name[1:],columns={'Name'})\n",
"months_all['Number'] = list(range(1,13))\n",
"\n",
"# Define a callback function for \"Show Top Artists\" button\n",
"def show_top_3_artists(ev):\n",
" clear_output(wait=True)\n",
" \n",
" # Define display order for the buttons and menus\n",
" display(Box(children = [country_menu], layout = box_layout))\n",
" display(Box(children = [top_x_artists_slider,show_button1], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))\n",
" \n",
" # How many top artists will be shown in the chart (user-selected value)\n",
" top_x_artists = top_x_artists_slider.value\n",
" \n",
" # Query the data for the user-selected country\n",
" subset = df_with_country_names[df_with_country_names['Country Name'] == country_menu.value]\n",
" \n",
" # Convert Dates column to pandas datetime format and identify in which month the date falls in\n",
" # (This is required as we are trying to plot a bar chart on a monthly basis)\n",
" subset['Date'] = pd.to_datetime(subset['Date'])\n",
" subset['Month'] = [months_all['Name'][months_all['Number'] == i.month].iloc[0] for i in subset['Date']]\n",
" \n",
" total_streams = [] # Create an empty list\n",
" \n",
" # Import colors to be assigned to each artist\n",
" df_colors = pd.read_csv('./Data/colors_plotly.csv', header=None) # Import list of colors available in plotly\n",
" count = 0 # Initiate count\n",
" colors = {} # Create an empty dictionary\n",
" \n",
" # Create a plotly figure object (like an empty figure)\n",
" fig = go.Figure()\n",
" \n",
" # Get the number of streams for all artists for each month in Year 2017\n",
" for i in range(0,months_all.shape[0]):\n",
" month_curr = months_all['Name'][i] # Get the month name\n",
" top_artists = subset[subset['Month'] == month_curr].groupby('Artist') # Group data by the artists\n",
" \n",
" # Calculate total streams \n",
" total_streams = top_artists['Streams'].sum().sort_values(ascending=False).to_frame('Streams').reset_index()[:top_x_artists]\n",
" \n",
" # Add a trace in the bar chart \n",
" if(total_streams.shape[0] > 0):\n",
" for j in range(0,top_x_artists):\n",
" \n",
" # Assign color to an artist\n",
" if total_streams['Artist'].iloc[j] not in colors.keys():\n",
" colors[total_streams['Artist'].iloc[j]] = list(df_colors[0])[count] # Assign color in order from the list\n",
" count += 1 # Increase count\n",
" \n",
" fig.add_trace(go.Bar(name=total_streams['Artist'].iloc[j], x=[(month_curr)], y=[total_streams['Streams'].iloc[j]], \\\n",
" showlegend = False, width = 0.8, offset=[0,0], text = total_streams['Artist'].iloc[j], textposition='auto',\n",
" marker={'color':colors[total_streams['Artist'].iloc[j]]}))\n",
" \n",
" # Change the chart layout\n",
" fig.update_layout(xaxis_tickangle=-45, # Angle of x-axis ticks\n",
" xaxis_tickmode='linear', # Tickmode of x-axis ticks\n",
" yaxis=dict(title='Streams
(k=thousand, M=million)'), # Title for y-axis\n",
" \n",
" \n",
" # Title for the bar chart\n",
" title_text='Top {} artist(s) in {} by streams of tracks in \\\n",
"daily Top 10 chart (2017)
Source: \\\n",
"Spotify through Kaggle Datasets'.format(top_x_artists_slider.value, country_menu.value))\n",
"\n",
" # Show figure \n",
" fig.show()\n",
"print('Successfully defined the show_top_3_artists function.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the cell below. Select the country and move the slider to choose how many top artist you want to see in the bar chart. Then click on `Show Top Artists` button."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# Layout for widgets\n",
"box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')\n",
"style = {'description_width': 'initial'}\n",
"\n",
"# Create dropdown menu for Artist and slider for How Many Top Artists\n",
"country_menu = widgets.Dropdown(options = df_with_country_names['Country Name'].sort_values().unique(), description ='Country: ', style = style, disabled=False)\n",
"top_x_artists_slider = widgets.IntSlider(value = 3, min = 1, max = 5, description = \"How Many Top Artists\", style = style)\n",
"\n",
"# Create Show Top Artists button and define click events\n",
"show_button1 = widgets.Button(button_style= 'info', description=\"Show Top Artists\")\n",
"show_button1.on_click(show_top_3_artists)\n",
"\n",
"# Define display order for the buttons and menus\n",
"display(Box(children = [country_menu], layout = box_layout))\n",
"display(Box(children = [top_x_artists_slider,show_button1], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hover the mouse to see artist names and the number of times their tracks were streamed. \n",
"\n",
"### Questions:\n",
"1. Are the top three artists in Canada all Canadians?\n",
"2. Do you think music can affect the cultural identity of various communities?\n",
"3. Are you aware of steps taken by countries, particularly Canada, to protect local artists or performers in this era of globalization?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"nbTranslate": {
"displayLangs": [
"*"
],
"hotkey": "alt-t",
"langInMainMenu": true,
"sourceLang": "en",
"targetLang": "fr",
"useGoogleTranslate": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}