{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)\n", "\n", "\"Open" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Languages: An important identity in globalizing world\n", "\n", "Language is an important identity of the people. Not only does language provide a medium to communicate, but also it binds the community together. Therefore, it is essential to preserve linguistic diversity in the globalizing world. \n", "\n", "Let's check out various languages that once existed or currently exist on Earth. [UNESCO](https://en.unesco.org/) has compiled an [atlas](http://www.unesco.org/languages-atlas/index.php?hl=en&page=atlasmap) which contains various statistics about languages. A [choropleth map](https://en.wikipedia.org/wiki/Choropleth_map) is used to visualize this linguistic geographical dataset. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# You can comment out the following lines if these modules are already installed\n", "!pip install googletrans\n", "!pip install gTTS\n", "\n", "# Import Python libraries\n", "import pandas as pd\n", "import plotly.graph_objects as go\n", "from plotly.subplots import make_subplots\n", "from ipywidgets import interact, fixed, widgets, Layout, Button, Box, fixed, HBox, VBox\n", "import googletrans\n", "import gtts\n", "from IPython.display import Audio, display, clear_output\n", "\n", "# Don't show warnings in output\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "clear_output()\n", "\n", "print('Libraries successfully imported.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the dataset\n", "df = pd.read_csv('./Data/languoid.csv')\n", "\n", "# Data clean up - keep required columns\n", "df = df[df['level'] == 'language'] \\\n", " [['name','child_dialect_count','latitude','longitude','status']] \\\n", " .dropna() # remove rows with missing entries\n", "\n", "# Display top 5 rows\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This cell will take at least couple of minutes to run as we are trying to plot more than 7000 languages\n", "\n", "# Define color scheme\n", "colors = {'safe':'rgb(61,145,64)', 'vulnerable':'rgb(31,117,254)', 'definitely endangered':'rgb(137,207,240)', \\\n", " 'severely endangered':'rgb(255,191,0)', 'critically endangered':'rgb(255,0,0)', 'extinct':'rgb(0,0,0)'}\n", "\n", "# Create a plotly figure object (like an empty figure)\n", "fig = go.Figure()\n", "\n", "# Create a marker with desired properties for each language in the dataset\n", "for i in range(0,df.shape[0]):\n", " df_row = df.iloc[i] # Pass each row in the dataset\n", " fig.add_trace(go.Scattergeo(\n", " lon = [df_row['longitude']], # longitude\n", " lat = [df_row['latitude']], # latitude \n", " text = 'Dialects: {0}
Status: {1}'.format(df_row['child_dialect_count'],df_row['status']), # text accompanying the dataset\n", " name = df_row['name'], # name of the language\n", " hoverinfo = 'name + text', # specify information to be shown when a pointer is hovered over a marker\n", " marker = dict(\n", " size=9,\n", " color=colors[df_row['status']],\n", " line_width=0), # marker properties\n", " showlegend = False # remove legend\n", " )\n", " )\n", " \n", "# Update figure properties \n", "fig.update_layout(\n", " # Add title (see how hyperlink is added)\n", " title_text = 'Languages around the world
\\\n", "Source: \\\n", "UNESCO',\n", " # Other geological properties of the map\n", " geo = dict(\n", " resolution=50,\n", " showcoastlines=True,\n", " showframe=False,\n", " showland=True,\n", " landcolor=\"lightgray\",\n", " showcountries=True,\n", " countrycolor=\"white\" ,\n", " coastlinecolor=\"white\",\n", " bgcolor='rgba(255, 255, 255, 0.0)'\n", " )\n", ")\n", "\n", "# Show the figure\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hover over markers in the map and see the dialects and status of various languages. Feel free to zoom in/out as necessary to spot various languages in a country of your choice.\n", "\n", "The colors indicate the [status of a language](http://www.unesco.org/languages-atlas/en/statistics.html).\n", "\n", "### Questions:\n", "\n", "1. Which color indicates a language that is safe from extinction? Which are the two continents with most safe languages?\n", "2. Why do you think languages disappear? What are possible threats to linguistic diversity?\n", "3. Could we revive a language that is on the brink of extinction? How?\n", "4. Do you think that having more dialects makes a language safer or more vulnerable?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Languages spoken within a country\n", "\n", "Now that we know about various languages, let's dive a bit deeper. We'll analyze languages spoken in various countries and how they have changed over time, using a demographic dataset from the [United Nations](http://data.un.org/Data.aspx?d=POP&f=tableCode:27)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import the dataset\n", "df2 = pd.read_csv('./Data/UNdata_Countrywise.csv')\n", "df2 = df2.iloc[:-79] # Remove footnotes\n", "\n", "# Display top 5 rows\n", "df2.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def show_bar_chart(ev):\n", " clear_output(wait=True)\n", " \n", " # Define display order for the buttons and menus\n", " display(Box(children = [country_menu], layout = box_layout))\n", " display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))\n", " \n", " # Filter the data for given country and gender type\n", " to_be_removed = ['Unknown','Not stated','None','Total'] # Remove these categories\n", " df2_sub = df2[df2['Country or Area'] == country_menu.value] \\\n", " [df2['Sex'] == 'Both Sexes'] \\\n", " [df2['Area'] == 'Total'] \\\n", " [~df2['Language'].isin(to_be_removed)]\n", " \n", " # Plot the pie chart\n", " if(df2_sub.shape[0] == 0): \n", " print('Sorry... data not available :-(') # Show comment if data is not available\n", " else:\n", " years = df2_sub['Year'].unique()[0:2] # Show data for two latest years\n", " \n", " # Make subplots\n", " fig = make_subplots(rows=1, cols=len(years), subplot_titles=['Year {}'.format(i) for i in years])\n", "\n", " # Add trace for each year's bar chart\n", " for i,j in enumerate(years):\n", " \n", " new_df = df2_sub[df2_sub['Year'] == j] # Filter the data for each subplot\n", " new_df['Value'] = (100 * new_df['Value'] / new_df['Value'].sum()).round(1) # Convert population to percent\n", " new_df = new_df.sort_values('Value',ascending=False)[:5] # Sort values in descending order\n", " \n", " fig.add_trace(go.Bar(name='Year {}'.format(j),x=new_df['Language'],y=new_df['Value'],\n", " showlegend=False),1,i+1) # Add the trace for each subplot\n", "\n", " # Update yaxis properties of subplots\n", " fig.update_yaxes(title_text=\"Population (%)\", row=1, col=1, range=[0,100])\n", " fig.update_yaxes(title_text=\"Population (%)\", row=1, col=2, range=[0,100])\n", "\n", " # Update the title of the figure\n", " fig.update_layout(title_text='Top 5 languages spoken in {}
\\\n", "Source: \\\n", "United Nations'.format(country_menu.value))\n", " \n", " fig.show()\n", "print('Defined the function show_bar_chart')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cell below and select the country and gender you want to visualize. Don't forget to click on `Show Bar Chart` button." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Layout for widgets\n", "box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')\n", "style = {'description_width': 'initial'}\n", "\n", "# Create dropdown menu for Country and Gender\n", "country_menu = widgets.Dropdown(options = df2['Country or Area'].unique(), description ='Country: ', style = style, disabled=False)\n", "\n", "# Create Show Pie Chart button and define click events\n", "show_button = widgets.Button(button_style= 'info', description=\"Show Bar Chart\")\n", "show_button.on_click(show_bar_chart)\n", "\n", "# Define display order for the buttons and menus\n", "display(Box(children = [country_menu], layout = box_layout))\n", "display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each bar chart here is for a year in which data were collected. Move your cursor around to see how much of the total population speak a particular language and how that pattern has changed over time.\n", "\n", "### Questions:\n", "1. List the top 5 languages in Canada according to the latest data (2016).\n", "2. Has use of languages other than *English* and *French* increased among Canadians over time? \n", "3. How do you think globalization affects the spread languages across the borders?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring other languages\n", "\n", "Let's learn a little of some other languages. Here we will translate some English words into a language of your choice. The translation will be accompanied by an audio clip to show you how the translated word is pronounced." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_languages = gtts.lang.tts_langs()\n", "all_languages = {v: k for k, v in all_languages.items()}\n", "\n", "def translate_and_pronounce(ev):\n", " clear_output(wait=True)\n", " \n", " # Define display order for the buttons and menus\n", " display(Box(children = [textbox_input, languages_menu], layout = box_layout))\n", " display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))\n", "\n", " # Create a translator class (like a template) using Google Translator\n", " translator = googletrans.Translator()\n", " translation_data = translator.translate(textbox_input.value,\\\n", " dest=all_languages[languages_menu.value]) # Supply text to translate and language\n", " translation = translation_data.extra_data['translation'][0][0] # Extract translated data\n", " \n", " # Extract pronunciation for the translated text\n", " if(len(translation_data.extra_data['translation']) == 2):\n", " pronunciation = translation_data.extra_data['translation'][1][-1]\n", " else:\n", " pronunciation = 'None - Speak as you read' # For languages using Roman letters\n", " print('\\n')\n", " \n", " # Create and display textbox for \"Text\" and dropdown menu for languages\n", " textbox_translation = widgets.Text(value = translation, description = 'Translation: ', disabled = True, style = style)\n", " textbox_pronunciation = widgets.Text(value = pronunciation, description = 'Pronunciation: ', disabled = True, style = style)\n", " display(Box(children = [textbox_translation, textbox_pronunciation], layout = box_layout)) \n", " \n", " # Create an audio file for the translated text using Google Text-to-Speech\n", " tts = gtts.gTTS(translation, lang=all_languages[languages_menu.value])\n", " tts.save('audio.mp3')\n", " \n", " # Display audio widget\n", " display(Audio('audio.mp3'))\n", "print('Defined the translate_and_pronounce function, run the cell below to use it.')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Layout for widgets\n", "box_layout = Layout(display='flex', flex_flow='row', align_items='center', width='100%', justify_content = 'center')\n", "style = {'description_width': 'initial'}\n", "\n", "# Create textbox for \"Text\" and dropdown menu for languages\n", "textbox_input = widgets.Text(value = '',description = 'Text: ', disabled = False)\n", "languages_menu = widgets.Dropdown(options = list(all_languages.keys())[:-18], description ='Language: ', style = style, disabled=False)\n", "\n", "# Create Translate button and define click events\n", "show_button = widgets.Button(button_style= 'info', description=\"Translate\")\n", "show_button.on_click(translate_and_pronounce)\n", "\n", "# Define display order for the buttons and menus\n", "display(Box(children = [textbox_input, languages_menu], layout = box_layout))\n", "display(VBox(children = [show_button], layout = Layout(display= 'flex', flex_flow= 'column', align_items= 'center', width='100%', justify_content = 'center')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Questions:\n", "\n", "1. In this era of globalization, how important do you think it is for people to learn additional languages?\n", "2. List some scripts (alphabets) that are different from the [Roman](https://en.wikipedia.org/wiki/Latin_script) one we use in English (e.g. [Devanagari](https://en.wikipedia.org/wiki/Devanagari))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "nbTranslate": { "displayLangs": [ "*" ], "hotkey": "alt-t", "langInMainMenu": true, "sourceLang": "en", "targetLang": "fr", "useGoogleTranslate": true } }, "nbformat": 4, "nbformat_minor": 2 }