Stats Project - Income Per Person

by Annya Marx

For this project we used secondary data from Gapminder about countries’ gross domestic product (GDP) per person.

Research Question

Are there more countries with a high GDP per person or a low GDP per person? How does Canada compare to other countries?

Getting Data

spreadsheet_key = '10vHiHnBQre07TwX75vTc_H1lf-w5-hbe5mZH4ro6QNE'
spreadsheet_gid = '140930349'

import pandas as pd
csv_link = 'https://docs.google.com/spreadsheets/d/'+spreadsheet_key+'/export?gid='+spreadsheet_gid+'&format=csv'
data = pd.read_csv(csv_link, skiprows=2)
data = data.dropna()
data
geo Country Name 1800 1801 1802 1803 1804 1805 1806 1807 ... 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040
0 afg Afghanistan 603.0 603.0 603.0 603.0 603.0 603.0 603.0 603.0 ... 2546.0 2602.0 2657.0 2711.0 2767.0 2823.0 2880.0 2939.0 2999.0 3060.0
1 alb Albania 667.0 667.0 667.0 667.0 667.0 668.0 668.0 668.0 ... 19358.0 19781.0 20197.0 20613.0 21034.0 21463.0 21899.0 22345.0 22799.0 23263.0
2 dza Algeria 715.0 716.0 717.0 718.0 719.0 720.0 721.0 722.0 ... 14343.0 14607.0 14890.0 15188.0 15495.0 15810.0 16131.0 16459.0 16794.0 17135.0
3 and Andorra 1197.0 1199.0 1201.0 1204.0 1206.0 1208.0 1210.0 1212.0 ... 73605.0 75142.0 76689.0 78256.0 79850.0 81475.0 83132.0 84823.0 86548.0 88308.0
4 ago Angola 618.0 620.0 623.0 626.0 628.0 631.0 634.0 637.0 ... 6109.0 6227.0 6352.0 6480.0 6611.0 6745.0 6883.0 7023.0 7165.0 7311.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
199 africa Africa 687.0 687.0 688.0 689.0 689.0 688.0 690.0 694.0 ... 5660.0 5754.0 5850.0 5948.0 6048.0 6150.0 6255.0 6361.0 6470.0 6581.0
200 asia Asia 811.0 810.0 808.0 806.0 804.0 802.0 800.0 798.0 ... 20113.0 20513.0 20903.0 21292.0 21683.0 22080.0 22485.0 22896.0 23316.0 23744.0
201 europe Europe 1710.0 1710.0 1723.0 1728.0 1740.0 1741.0 1746.0 1759.0 ... 40132.0 40957.0 41806.0 42675.0 43564.0 44471.0 45398.0 46343.0 47308.0 48293.0
202 americas The Americas 1382.0 1388.0 1392.0 1382.0 1377.0 1387.0 1395.0 1395.0 ... 34509.0 35183.0 35883.0 36603.0 37340.0 38094.0 38864.0 39651.0 40455.0 41277.0
203 world World 1003.0 1003.0 1005.0 1005.0 1006.0 1006.0 1006.0 1008.0 ... 21167.0 21523.0 21877.0 22231.0 22590.0 22953.0 23322.0 23697.0 24077.0 24464.0

201 rows × 243 columns

We have data for 201 countries or regions, for the years 1800 to 2040 (which includes projections).
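The export link built above follows a fixed pattern, so it can be wrapped in a small helper. This is only a sketch: the function name `build_csv_link` is hypothetical (it is not part of pandas or of the original notebook), and it assumes the sheet is shared publicly so that the CSV export works without signing in.

```python
def build_csv_link(spreadsheet_key, spreadsheet_gid):
    """Return the CSV export URL for one tab (gid) of a Google Sheet."""
    return (
        'https://docs.google.com/spreadsheets/d/' + spreadsheet_key
        + '/export?gid=' + spreadsheet_gid + '&format=csv'
    )

# Same key and gid as in the cell above
csv_link = build_csv_link('10vHiHnBQre07TwX75vTc_H1lf-w5-hbe5mZH4ro6QNE', '140930349')
print(csv_link)
```

The resulting string can be passed to `pd.read_csv` exactly as before.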

2019 Statistics

Let’s look at statistical calculations for the year 2019.

columns = ['Country Name', '2019']
data[columns].describe()
2019
count 201.000000
mean 18896.213930
std 19699.824134
min 631.000000
25% 3931.000000
50% 12143.000000
75% 28774.000000
max 113331.000000

The 50% row in that table is the median. Let’s confirm it by calculating the median directly.

data['2019'].median()
12143.0

The mode is not a useful measure of central tendency here, since every value in this column is unique.

len(data['2019'].unique())
201

We do see a large range in the data (631 to 113331), meaning that there is a large difference between the poorest countries and the richest countries in terms of GDP per person.
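To put numbers on that spread, here is a minimal sketch that works only from the summary values printed by `describe()` above. The variable names are chosen for illustration, and the interquartile range is an extra measure that the analysis above does not use.

```python
# Values copied from the describe() output above
minimum = 631.0
maximum = 113331.0
q1 = 3931.0     # 25% row
q3 = 28774.0    # 75% row

data_range = maximum - minimum   # spread between richest and poorest
iqr = q3 - q1                    # spread of the middle half of the data
ratio = maximum / minimum        # how many times larger the maximum is

print('Range', data_range)
print('Interquartile range', iqr)
print('Max / min', round(ratio, 1))
```

The range depends only on the two most extreme countries, while the interquartile range describes the middle half of the data, so the two can tell quite different stories.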

Visualizations

Bar Charts

Let’s create a bar chart of our sorted 2019 GDP per person data.

import plotly.express as px
fig = px.bar(data.sort_values('2019'), x='Country Name', y='2019', title='2019 GDP Per Person')
fig.show()

It looks like three countries (Qatar, Luxembourg, and Singapore) may be considered outliers for their high GDP per person. However, they are probably not skewing the results significantly, so they don’t need to be removed before looking at central tendency and dispersion.
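The outliers above were identified by eye from the chart. One common numerical check, which this project does not use, is the 1.5 × IQR rule: values beyond 1.5 interquartile ranges from the quartiles are flagged. The sketch below applies it to the quartiles from `describe()`. Only the maximum is tested, because it is the only individual high value printed above.

```python
# Quartiles and maximum copied from the describe() output above
q1 = 3931.0
q3 = 28774.0
maximum = 113331.0

iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as high outliers
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as low outliers

print('Upper fence', upper_fence)
print('Lower fence', lower_fence)
print('Maximum is above the upper fence:', maximum > upper_fence)
```

With the full table loaded, the same test could be applied to every row with `data[data['2019'] > upper_fence]`.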

To compare some countries, let’s make a bar chart comparing Canada’s 2019 GDP per person to the top five and bottom five countries.

sorted_data = data.sort_values('2019')
bottom_five = sorted_data.head()['Country Name'].tolist()
top_five = sorted_data.tail()['Country Name'].tolist()
countries = ['Canada']
countries.extend(bottom_five)
countries.extend(top_five)
px.bar(sorted_data[sorted_data['Country Name'].isin(countries)], x='Country Name', y='2019', title='2019 GDP Per Person')

It looks like Canada’s GDP per person is closer to the top five. Let’s compare it to the mean and median.

print('Mean', data['2019'].mean())
print('Median', data['2019'].median())

# Look up Canada's row label, then select the row and column in a single .loc call
canada_row = data[data['Country Name']=='Canada'].index[0]
print('Canada', data.loc[canada_row, '2019'])
Mean 18896.213930348258
Median 12143.0
Canada 44181.0

We can see that Canada’s GDP per person is about 2.3 times the mean value and about 3.6 times the median value.
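The comparison can be made explicit by dividing Canada’s value by the mean and by the median. This is a small sketch that uses the three numbers printed above. The variable names are illustrative.

```python
# Values copied from the printed output above
mean_gdp = 18896.213930348258
median_gdp = 12143.0
canada_gdp = 44181.0

ratio_to_mean = canada_gdp / mean_gdp
ratio_to_median = canada_gdp / median_gdp

print('Canada / mean', round(ratio_to_mean, 2))
print('Canada / median', round(ratio_to_median, 2))
```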

Histogram

Next let’s create a histogram with 6 bins.

px.histogram(data, x='2019', nbins=6, title='Histogram of 2019 GDP Per Person')

The histogram shows that the data are not normally distributed. The distribution is skewed to the right: there are a lot more countries in the lower GDP per person bins than in the higher ones.
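The shape of the histogram can be cross-checked against the summary statistics. One rough indicator, which this project does not otherwise use, is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. A positive value suggests a longer tail toward high values. The sketch below uses the mean, median, and standard deviation from `describe()`.

```python
# Values copied from the describe() output above
mean_gdp = 18896.213930
median_gdp = 12143.0
std_gdp = 19699.824134

# Pearson's second skewness coefficient: positive means a longer right tail
skewness = 3 * (mean_gdp - median_gdp) / std_gdp

print('Approximate skewness', round(skewness, 2))
print('Mean is greater than median:', mean_gdp > median_gdp)
```

With the full table loaded, `data['2019'].skew()` gives pandas' own sample skewness, which is calculated differently and will not match this number exactly.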

Conclusion

Based on 2019 data, there are many more countries in our world with a low gross domestic product per person than with a high one. Canada’s GDP per person is well above both the mean and the median.

It would be interesting to see if and how this has changed over the years, and how it is predicted to change over time.
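As a possible starting point for that follow-up, the wide table (one column per year) could be reshaped into a long table (one row per country and year), which is the layout that line charts usually expect. This is a sketch under assumptions: the small `sample` frame holds only a few values copied from the table output near the top, not the full dataset, and the column names `Year` and `GDP per person` are chosen for illustration.

```python
import pandas as pd

# A few values copied from the table output above (not the full dataset)
sample = pd.DataFrame({
    'Country Name': ['Afghanistan', 'Albania'],
    '1800': [603.0, 667.0],
    '1801': [603.0, 667.0],
    '2039': [2999.0, 22799.0],
    '2040': [3060.0, 23263.0],
})

# Turn the year columns into rows: one row per country per year
long_form = sample.melt(id_vars='Country Name', var_name='Year', value_name='GDP per person')

# Year labels are strings in the wide table, so convert them to integers
long_form['Year'] = long_form['Year'].astype(int)
long_form = long_form.sort_values(['Country Name', 'Year']).reset_index(drop=True)

print(long_form)
```

A long table like this could then be passed to `px.line` with `x='Year'`, `y='GDP per person'`, and `color='Country Name'` to draw one line per country.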
