Module 6 Unit 3 - Tutorial: Working with Open Data#
Unit Learning Objectives#
By the end of this unit, you will be able to:
Understand what open data is
Understand that there are many reliable open data sources available to use for free
What is Open Data?#
The Government of Canada defines open data as “structured data that is machine-readable, freely shared, used and built on without restrictions.” Or less technically stated: “Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.”
These descriptions are a good place to start, but we gain more insight into the principles and standards of open data by digging into the requirements a bit further, specifically:
Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form (e.g. a comma-separated values, or CSV, file).
Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution, including the ability to combine it with data from elsewhere.
Universal Participation: everyone must be able to use, re-use and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
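To make the "convenient and modifiable form" requirement a little more concrete, here is a minimal sketch of why a format like CSV is easy to work with. It uses only Python's standard library. The sample data, the column names (site, month, avg_temp_c) and the helper functions load_rows and mean_by_site are all invented for illustration and do not come from any real open data source.

```python
import csv
import io

# Invented sample data standing in for a downloaded open data CSV file.
# The sites and temperatures are made up for illustration only.
SAMPLE_CSV = """site,month,avg_temp_c
Site A,January,-10.5
Site A,July,18.0
Site B,January,-7.0
Site B,July,16.5
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def mean_by_site(rows):
    """Average the avg_temp_c column separately for each site."""
    values = {}
    for row in rows:
        values.setdefault(row["site"], []).append(float(row["avg_temp_c"]))
    return {site: sum(temps) / len(temps) for site, temps in values.items()}


rows = load_rows(SAMPLE_CSV)
averages = mean_by_site(rows)
print(averages)
```

Because the file is machine-readable, a few lines of code are enough to load it and start asking questions of it. The same information locked in a scanned image or a PDF table would need manual retyping first.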
Classroom Benefits#
The benefits gained from the publication of open data are often related to improved awareness of governmental and/or societal issues. These include improved efficiency of public services through sharing of data between public agencies, the development of new services or businesses that rely on open data, and increased societal benefits through transparent disclosure of data and information.
Access to open data is available to any classroom in Canada with an internet connection and has the potential to be transformative for both educators and students. For example, one common issue for educators is that classroom resources such as textbooks often contain outdated information. Easy access to open data provides a source of current data, helping to ensure that lessons and lesson plans remain relevant and engaging for students. As well, data sources that were previously paper-bound can be digitized and transformed in different ways to enhance existing lessons or create lessons that were not previously possible.
Accessing Open Data Sources#
Open data is published by many sources, including municipalities, provinces, federal governments, and other public and private sector organizations. You may be surprised at who publishes open data and how easy it is to access.
For example, in 2017, three Alberta cities placed in the top 20 nationwide for their open data initiatives, according to the Open Cities Index.
The City of Edmonton placed 1st
The City of Calgary placed 4th
Strathcona County placed 11th
Each of these cities has robust open data infrastructure and datasets available for use by the public. The Government of Alberta and the Federal Government of Canada also have large open data repositories available to the public that are continuously updated.
City of Edmonton
What to Look for
Although many open data sources exist, not all of them are equally easy to analyze. It’s also important to consider the reliability of the data source: some online data are inaccurate, or even intentionally misleading.
As with any information source, open data is affected by the biases of the people who have curated it. Whenever possible, try to use sources that don’t have a vested interest in certain data results, or consider comparing multiple sources with differing interests.
For example, if you’re looking to study the health effects of tobacco, a dataset from a study funded by a tobacco company may show different results than one from an independent study.
Always find out your data’s origins before you draw any major conclusions from it!
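Checking where data comes from is a judgment call, but checking how complete it is can be partly automated. The sketch below counts missing entries in each column of a CSV file before any analysis is done. It is only an illustration: the sample data is invented, and the count_missing helper and the MISSING_MARKERS set are assumptions about how gaps might be recorded, since real datasets mark missing values in many different ways.

```python
import csv
import io

# Invented sample data with two gaps in the last column (illustration only).
SAMPLE_CSV = """site,month,avg_temp_c
Site A,January,-10.5
Site A,July,
Site B,January,n/a
Site B,July,16.5
"""

# Assumed placeholders for "no value"; adjust to match the dataset's documentation.
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def count_missing(text):
    """Return (number of data rows, {column name: number of missing entries})."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in counts:
            # A short row yields None for absent columns; treat that as missing too.
            value = (row[name] or "").strip().lower()
            if value in MISSING_MARKERS:
                counts[name] += 1
    return total_rows, counts


total_rows, missing = count_missing(SAMPLE_CSV)
for column, n in missing.items():
    print(f"{column}: {n} of {total_rows} values missing")
```

A column with many missing values is not necessarily unusable, but it is a signal to read the dataset's documentation before relying on it.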
External Activity#
Let’s dig into these ideas further in a Jupyter notebook. Follow the steps below to get the Intro to Open Data notebook and open it in your Callysto account.
Start by opening the notebook file. Alternatively, you can download it and then upload the notebook file to your Callysto Hub. If you need a reminder on how to do this, revisit Module 4, Unit 5, Lesson 2.
Launch the new notebook, Intro to Open Data, by clicking on the file name in the Callysto Hub.
Once the notebook has loaded, don’t forget to click Run All to display the contents.
Read through the Intro to Open Data notebook and follow any interaction prompts. The notebook gives an overview of how some datasets are better than others, lists some extra sources of open data, and shows several examples of open data being analyzed within Jupyter notebooks.
External Activity: Using Open Data in Jupyter Notebooks#
Now that you have read through the Jupyter Notebook on open data, you will get the opportunity to explore a sample Jupyter Notebook that uses open data.
In the activity below, each of the available notebooks showcases a different open data source on a different subject and demonstrates how you might enhance an existing lesson or activity with open data.
If you are not currently logged in to your Callysto account, do so now.
Find and open one of the following notebooks. Clicking the link will automatically add the file to your Callysto Hub and then open it.
Electrical Conductivity
Shakespeare and Statistics
When the notebook opens, click Run All to display the contents.
Work through the notebook as if you were a student, completing any questions and activities.