Stats Project - Income Per Person
by Annya Marx
For this project we used secondary data from Gapminder about countries’ gross domestic product (GDP) per person.
Research Question
Are there more countries with a high GDP per person or a low GDP per person? How does Canada compare to other countries?
Getting Data
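import pandas as pd
# Google Sheets identifiers for the Gapminder GDP-per-person data
spreadsheet_key = '10vHiHnBQre07TwX75vTc_H1lf-w5-hbe5mZH4ro6QNE'
spreadsheet_gid = '140930349'
# Build the CSV export link for that sheet
csv_link = 'https://docs.google.com/spreadsheets/d/'+spreadsheet_key+'/export?gid='+spreadsheet_gid+'&format=csv'
# Skip the first two lines of the file, then drop every row that has a missing value
data = pd.read_csv(csv_link, skiprows=2)
data = data.dropna()
data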
 | geo | Country Name | 1800 | 1801 | 1802 | 1803 | 1804 | 1805 | 1806 | 1807 | ... | 2031 | 2032 | 2033 | 2034 | 2035 | 2036 | 2037 | 2038 | 2039 | 2040 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | afg | Afghanistan | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | 603.0 | ... | 2546.0 | 2602.0 | 2657.0 | 2711.0 | 2767.0 | 2823.0 | 2880.0 | 2939.0 | 2999.0 | 3060.0 |
1 | alb | Albania | 667.0 | 667.0 | 667.0 | 667.0 | 667.0 | 668.0 | 668.0 | 668.0 | ... | 19358.0 | 19781.0 | 20197.0 | 20613.0 | 21034.0 | 21463.0 | 21899.0 | 22345.0 | 22799.0 | 23263.0 |
2 | dza | Algeria | 715.0 | 716.0 | 717.0 | 718.0 | 719.0 | 720.0 | 721.0 | 722.0 | ... | 14343.0 | 14607.0 | 14890.0 | 15188.0 | 15495.0 | 15810.0 | 16131.0 | 16459.0 | 16794.0 | 17135.0 |
3 | and | Andorra | 1197.0 | 1199.0 | 1201.0 | 1204.0 | 1206.0 | 1208.0 | 1210.0 | 1212.0 | ... | 73605.0 | 75142.0 | 76689.0 | 78256.0 | 79850.0 | 81475.0 | 83132.0 | 84823.0 | 86548.0 | 88308.0 |
4 | ago | Angola | 618.0 | 620.0 | 623.0 | 626.0 | 628.0 | 631.0 | 634.0 | 637.0 | ... | 6109.0 | 6227.0 | 6352.0 | 6480.0 | 6611.0 | 6745.0 | 6883.0 | 7023.0 | 7165.0 | 7311.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
199 | africa | Africa | 687.0 | 687.0 | 688.0 | 689.0 | 689.0 | 688.0 | 690.0 | 694.0 | ... | 5660.0 | 5754.0 | 5850.0 | 5948.0 | 6048.0 | 6150.0 | 6255.0 | 6361.0 | 6470.0 | 6581.0 |
200 | asia | Asia | 811.0 | 810.0 | 808.0 | 806.0 | 804.0 | 802.0 | 800.0 | 798.0 | ... | 20113.0 | 20513.0 | 20903.0 | 21292.0 | 21683.0 | 22080.0 | 22485.0 | 22896.0 | 23316.0 | 23744.0 |
201 | europe | Europe | 1710.0 | 1710.0 | 1723.0 | 1728.0 | 1740.0 | 1741.0 | 1746.0 | 1759.0 | ... | 40132.0 | 40957.0 | 41806.0 | 42675.0 | 43564.0 | 44471.0 | 45398.0 | 46343.0 | 47308.0 | 48293.0 |
202 | americas | The Americas | 1382.0 | 1388.0 | 1392.0 | 1382.0 | 1377.0 | 1387.0 | 1395.0 | 1395.0 | ... | 34509.0 | 35183.0 | 35883.0 | 36603.0 | 37340.0 | 38094.0 | 38864.0 | 39651.0 | 40455.0 | 41277.0 |
203 | world | World | 1003.0 | 1003.0 | 1005.0 | 1005.0 | 1006.0 | 1006.0 | 1006.0 | 1008.0 | ... | 21167.0 | 21523.0 | 21877.0 | 22231.0 | 22590.0 | 22953.0 | 23322.0 | 23697.0 | 24077.0 | 24464.0 |
201 rows × 243 columns
We have data for 201 countries or regions, for the years 1800 to 2040 (which includes projections).
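The last rows of the table (africa, asia, europe, americas, world) look like regional aggregates rather than individual countries, so they are counted alongside countries in the statistics that follow. Here is a minimal sketch of how they could be filtered out. It assumes those five geo codes are the only aggregates (the elided rows may contain others), and `drop_aggregates` is a hypothetical helper name. The sample rows reuse the 1800 values shown above.

```python
import pandas as pd

# Assumption: these five geo codes, visible at the end of the table,
# are the only aggregate rows in the data.
AGGREGATE_CODES = ['africa', 'asia', 'europe', 'americas', 'world']

def drop_aggregates(df, code_column='geo', codes=AGGREGATE_CODES):
    """Return a copy of df without rows whose code is in codes (hypothetical helper)."""
    return df[~df[code_column].isin(codes)].copy()

# Small stand-in for the real DataFrame, using 1800 values from the table above
sample = pd.DataFrame({
    'geo': ['afg', 'alb', 'africa', 'world'],
    'Country Name': ['Afghanistan', 'Albania', 'Africa', 'World'],
    '1800': [603.0, 667.0, 687.0, 1003.0],
})

countries_only = drop_aggregates(sample)
print(countries_only['Country Name'].tolist())
```

Applied to `data`, this would leave fewer than 201 rows and would likely shift the statistics below slightly, so it is offered only as a way to check how much the aggregates matter.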
2019 Statistics
Let’s look at summary statistics for the year 2019.
columns = ['Country Name', '2019']
data[columns].describe()
 | 2019 |
---|---|
count | 201.000000 |
mean | 18896.213930 |
std | 19699.824134 |
min | 631.000000 |
25% | 3931.000000 |
50% | 12143.000000 |
75% | 28774.000000 |
max | 113331.000000 |
The 50% row above is the median (12143). Let’s confirm that by calculating it directly.
data['2019'].median()
12143.0
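As a cross-check, the 50% row reported by `describe()` should always match `median()`. A small sketch with made-up values (not Gapminder data) shows the two agreeing:

```python
import pandas as pd

# Made-up values for illustration only
values = pd.Series([1, 2, 3, 4, 100])

summary = values.describe()
median_from_describe = summary['50%']
median_direct = values.median()

print(median_from_describe, median_direct)
```

The single large value (100) pulls the mean up but leaves the median at the middle value, which is why the median is a useful companion to the mean for skewed data.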
The mode is not a useful measure of central tendency here, since every value in this column is unique.
len(data['2019'].unique())
201
We do see a large range in the data (631 to 113331), meaning that there is a large difference between the poorest countries and the richest countries in terms of GDP per person.
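To put numbers on that spread, here is a short sketch that computes the range and the interquartile range from the `describe()` output above. The variable names are ours, and the values are copied from the table rather than read from `data`.

```python
# Values copied from the describe() output for 2019
minimum = 631.0
q1 = 3931.0
q3 = 28774.0
maximum = 113331.0

value_range = maximum - minimum   # spread between richest and poorest
iqr = q3 - q1                     # spread of the middle 50% of rows
max_to_min_ratio = maximum / minimum

print('Range', value_range)
print('IQR', iqr)
print('Max / min', round(max_to_min_ratio, 1))
```

The interquartile range is much smaller than the full range, which suggests that a few very high values account for much of the overall spread.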
Visualizations
Bar Charts
Let’s create a bar chart of our sorted 2019 GDP per person data.
import plotly.express as px
fig = px.bar(data.sort_values('2019'), x='Country Name', y='2019', title='2019 GDP Per Person')
fig.show()
It looks like there are three countries that may be considered outliers for their high GDP per person (Qatar, Luxembourg, and Singapore). However, they are probably not skewing the results significantly, and don’t need to be removed before looking at central tendency and dispersion.
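The outliers above were picked by eye from the chart. One common rule of thumb, which is not necessarily the right one for this data, flags values more than 1.5 times the interquartile range beyond the quartiles. Here is a minimal sketch using the 2019 quartiles from `describe()`; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-offs of the 1.5 x IQR rule of thumb."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles copied from the describe() output for 2019
lower_fence, upper_fence = tukey_fences(3931.0, 28774.0)
print(lower_fence, upper_fence)

# With the real DataFrame, the flagged rows could then be listed with:
# data[data['2019'] > upper_fence][['Country Name', '2019']]
```

The lower fence is negative, so no low value can be flagged. The rule may flag more than the three countries named above, since it marks every row above the upper fence.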
To compare some countries, let’s make a bar chart comparing Canada’s 2019 GDP per person to that of the top five and bottom five countries.
sorted_data = data.sort_values('2019')
bottom_five = sorted_data.head()['Country Name'].tolist()
top_five = sorted_data.tail()['Country Name'].tolist()
countries = ['Canada']
countries.extend(bottom_five)
countries.extend(top_five)
px.bar(sorted_data[sorted_data['Country Name'].isin(countries)], x='Country Name', y='2019', title='2019 GDP Per Person')
It looks like Canada’s GDP per person is closer to the top five. Let’s compare it to the mean and median.
print('Mean', data['2019'].mean())
print('Median', data['2019'].median())
canada_gdp = data.loc[data['Country Name'] == 'Canada', '2019'].iloc[0]
print('Canada', canada_gdp)
Mean 18896.213930348258
Median 12143.0
Canada 44181.0
We can see that Canada’s GDP per person is about 2.3 times the mean value, and about 3.6 times the median value.
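The comparison can be made explicit by dividing Canada’s value by each measure. This short sketch uses the three numbers printed above; the variable names are ours.

```python
# Values copied from the printed output above
mean_2019 = 18896.213930348258
median_2019 = 12143.0
canada_2019 = 44181.0

ratio_to_mean = canada_2019 / mean_2019
ratio_to_median = canada_2019 / median_2019

print('Canada / mean', round(ratio_to_mean, 1))
print('Canada / median', round(ratio_to_median, 1))
```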
Histogram
Next, let’s create a histogram with about 6 bins (Plotly treats nbins as an upper limit when it picks the bin edges).
px.histogram(data, x='2019', nbins=6, title='Histogram of 2019 GDP Per Person')
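The histogram shows that the data are not normally distributed. The distribution is right-skewed: there are many more countries in the lowest GDP-per-person bins than in the higher ones, which matches the mean (18896) being well above the median (12143).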
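The shape can also be checked numerically. The sketch below counts values in six equal-width bins and checks the sign of the skewness. It uses only six values quoted earlier (the minimum, quartiles, Canada, and the maximum) as a stand-in for the full column, so the counts are illustrative and will not match the chart. Plotly also picks rounded bin edges, so its bins may differ from these.

```python
import numpy as np
import pandas as pd

# Stand-in data: six 2019 values quoted earlier, not the full 201-row column
values = pd.Series([631.0, 3931.0, 12143.0, 28774.0, 44181.0, 113331.0])

# Six equal-width bins between the smallest and largest value
counts, edges = np.histogram(values, bins=6)
print(counts.tolist())

# A positive skewness and a mean above the median both point to a long right tail
skewness = values.skew()
print(skewness > 0, values.mean() > values.median())
```

With the real data, replacing `values` with `data['2019']` would give the counts and skewness for all rows.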
Conclusion
Based on 2019 data, there are many more countries in our world with a low gross domestic product per person than with a high one. Canada’s GDP per person is well above both the mean and the median.
It would be interesting to see whether and how this has changed since 1800, and how it is projected to change through 2040.