
Module 1 Unit 3 - Data Science and Critical Thinking

Data Science and Critical Thinking

Data science can be a powerful critical thinking tool, but mistakes or distortions can also mislead us. Just as we teach students media literacy and fact-checking, it’s important to introduce students to critical thinking about data science.


Let’s say we wanted to use data to predict the number of Elvis impersonators at some point in the future. From our research, we learn that in 1977 there were 170 Elvis impersonators and in the year 2000 there were about 85,000.

Assuming exponential growth, we would extrapolate that by 2020 there should have been 21,912,675 Elvis impersonators.

A line chart using historical data to predict the exponential growth of Elvis impersonators by the year 2020. According to an exponential-growth model, there would be 21,912,675 impersonators in 2020.


That’s a lot of Elvises! (Elvi?)

A more conservative estimate might assume linear growth, but even then, that would mean a worldwide total of 141,621 Elvis impersonators in 2020.

A line chart using historical data to predict linear growth of Elvis impersonators by 2020. According to a linear-growth model, there would be 141,621 Elvis impersonators by 2020.


These predictions are obviously silly, but they serve as a reminder that predictive analytics is neither simple nor easy.
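The two extrapolations above can be sketched in a few lines of Python. This is a minimal sketch that assumes only the two data points quoted in the text (170 impersonators in 1977 and 85,000 in 2000) and fits each model exactly through them; the function names are illustrative. Fitted this way, the exponential model gives roughly 18.9 million impersonators in 2020 and the linear model about 159,000. Those figures differ from the ones in the charts, which were presumably produced with a different fit, and that sensitivity to modelling choices is part of why extrapolation is risky.

```python
# The two data points quoted in the text: (year, number of Elvis impersonators).
YEAR_0, COUNT_0 = 1977, 170
YEAR_1, COUNT_1 = 2000, 85_000


def exponential_forecast(year: int) -> float:
    """Fit N(t) = N0 * g**(t - t0) through both points and evaluate it at `year`."""
    # Annual growth factor g chosen so the curve passes through both data points.
    growth = (COUNT_1 / COUNT_0) ** (1 / (YEAR_1 - YEAR_0))
    return COUNT_0 * growth ** (year - YEAR_0)


def linear_forecast(year: int) -> float:
    """Fit N(t) = N0 + m * (t - t0) through both points and evaluate it at `year`."""
    slope = (COUNT_1 - COUNT_0) / (YEAR_1 - YEAR_0)  # impersonators added per year
    return COUNT_0 + slope * (year - YEAR_0)


if __name__ == "__main__":
    for name, model in [("exponential", exponential_forecast), ("linear", linear_forecast)]:
        print(f"{name:>11}: {model(2020):,.0f} Elvis impersonators in 2020")
```

Both models agree perfectly with the historical data they were fitted to, yet they disagree by a factor of more than 100 about 2020. Fitting the data you have says little about whether the model holds beyond it.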


“Extrapolating” by XKCD is licensed under CC BY-NC 2.5

Data science in the media

Data-based journalism is present in everything from sports statistics and fashion trends to stock market reports and public policy. In many cases, it can have a large impact on public opinion or behaviour.

The COVID-19 pandemic brought data analysis to the forefront for many people. Daily news included concepts such as exponential growth and calls from the medical community to flatten the curve.

Conclusions and predictions based on data are increasingly shared in both traditional and social media, and both we and our students need to be able to evaluate them critically in order to navigate them safely.


“Correlation” by XKCD is licensed under CC BY-NC 2.5

When consuming media that involves data science or any sort of analytics, some good questions to ask include:

  • Is this informed by data, or just by opinion (or “everybody knows”)?

  • Are there data visualizations or tables?

  • Are there sources cited and/or can you access the original data?

  • Who is leading the discussion and drawing the conclusions? What are their qualifications?

  • Is there some sort of call to action that may have unintended consequences?

🏷️ Activity


  1. Play the Factitious News Game at factitious.augamestudio.com/#/

  2. Review some of the spurious correlations at tylervigen.com/spurious-correlations

  3. Read about an example of data science in action, an analysis of gender in film scripts: pudding.cool/2017/03/film-dialogue
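Many spurious correlations arise simply because two unrelated series both trend in the same direction over the same period. The sketch below computes the Pearson correlation coefficient for two hypothetical, made-up yearly series that have nothing to do with each other but both grow over ten years; the coefficient comes out at about 0.9. The formula is written out by hand so the calculation is visible (Python 3.10+ also provides `statistics.correlation`).

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the normalizing factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, invented series over ten years. Neither causes the other;
# they simply both increase over time.
trend_a = [12 + 2 * i for i in range(10)]   # grows linearly
trend_b = [1.5 ** i for i in range(10)]     # grows exponentially

r = pearson_r(trend_a, trend_b)
print(f"r = {r:.2f}")
```

A high `r` only says the two series move together; it cannot tell us why, which is exactly the trap the examples on that site illustrate.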

The dangers of bias

As we’ve explored, data analysis can help us make decisions. Unfortunately, bias in data collection, cleaning or analysis can result in decisions that are harmful.

Imagine being asked to fill out a survey that included questions like “Is a longer school day a good idea or a great idea?” or “Have you stopped hitting your students?”

Surveys that use leading or loaded questions like these will produce biased data. But while many people can spot this kind of poor question design, other sources of bias in data collection can be more subtle.

For example, if a researcher collects data from a sample that is not representative of the population they intend to study, their data will be biased and any conclusions they make about that population will likely be faulty.
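A small simulation makes the effect of an unrepresentative sample concrete. The sketch below assumes a hypothetical population of 1,000 students spread across three schools, with invented weekly homework hours; nothing here is real data, and the school names are placeholders. A sample that includes each school in proportion to its size recovers the population mean, while a convenience sample taken at a single school does not.

```python
# Hypothetical population: (school, weekly homework hours) for 1,000 students.
# All values are invented, and every student in a school is given the same
# value to keep the arithmetic easy to follow.
population = (
    [("School A", 30)] * 300
    + [("School B", 20)] * 400
    + [("School C", 8)] * 300
)


def mean_hours(sample):
    """Average weekly homework hours in a list of (school, hours) records."""
    return sum(hours for _, hours in sample) / len(sample)


# Every 10th student from a list ordered by school: each school is
# represented in proportion to its share of the population.
representative_sample = population[::10]

# Convenience sample: 100 students, all surveyed at one school.
convenience_sample = [s for s in population if s[0] == "School A"][:100]

if __name__ == "__main__":
    print(f"population mean:       {mean_hours(population):.1f}")
    print(f"representative sample: {mean_hours(representative_sample):.1f}")
    print(f"convenience sample:    {mean_hours(convenience_sample):.1f}")
```

Here the representative sample reproduces the population mean of 19.4 hours, while the single-school sample reports 30 hours. Collecting more data from the same school would not fix that error.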


The book Weapons of Math Destruction by Cathy O’Neil provides a variety of examples of “how big data increases inequality and threatens democracy,” including the IMPACT Teacher Evaluation System. IMPACT is intended to evaluate teachers’ performance and reward them for doing a good job. However, in addition to concerns about the reliability of its measurements over time and between administrators, its evaluations rely heavily on student test scores, which tend to be strongly correlated with socioeconomic factors. For more on this, see the Journal of Poverty article Socioeconomic Status and Intelligence: Why Test Scores Do Not Equal Merit.

Ideally, the data we use to gain insights and make decisions should be both accurate and precise. Accurate data are close to the true values they describe; precise data are consistent, so repeating a measurement gives similar results.
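Accuracy and precision can be measured separately. In this minimal sketch, accuracy is summarized as bias (how far the average reading is from the true value) and precision as spread (the sample standard deviation of repeated readings). The true value and the readings from three imaginary instruments are invented for illustration.

```python
import statistics

TRUE_VALUE = 100.0  # hypothetical true value of the quantity being measured

# Invented repeated measurements from three imaginary instruments.
instruments = {
    "accurate and precise": [99.8, 100.1, 100.0, 99.9, 100.2],
    "precise, not accurate": [104.9, 105.1, 105.0, 104.8, 105.2],
    "accurate, not precise": [92.0, 108.0, 97.0, 103.0, 100.0],
}


def summarize(readings, true_value=TRUE_VALUE):
    """Return (bias, spread): bias reflects accuracy, spread reflects precision."""
    bias = statistics.mean(readings) - true_value  # average error
    spread = statistics.stdev(readings)            # variation between readings
    return bias, spread


if __name__ == "__main__":
    for name, readings in instruments.items():
        bias, spread = summarize(readings)
        print(f"{name:<22} bias={bias:+.2f}  spread={spread:.2f}")
```

The second instrument is the most worrying in practice: its readings agree closely with one another, which can look trustworthy, yet every one of them is about 5 units too high.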


Just as critical thinking is necessary when navigating data science referenced in the media, we should ask questions when data science is driving decisions that we or others make.

  • What are the possible biases of the authors, promoters, or funders?

  • Are the data or analytics being chosen to support or promote a particular worldview?

  • Are there possible unfair advantages provided to individuals or groups?

  • Are there possible significant negative impacts for individuals or groups?

  • How will this affect future generations? (see “seventh generation”)

🏷️ Activity

In the U.S., teacher value-added models (VAMs) have been used to estimate the impact teachers have on student test scores, and those estimates can contribute to a teacher’s evaluation score. Now, imagine that you are a teacher who is assigned an evaluation score every July based on your students’ test scores as well as other metrics. Talk to a friend or family member about what questions you might have.
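Real value-added models are statistical, typically regression-based, models that adjust for students’ prior achievement and other factors, and they are far more complex than anything shown here. The sketch below is only a hypothetical illustration of why the choice of metric matters: with invented fall and spring scores for two imaginary classes, ranking teachers by raw end-of-year scores and ranking them by average growth puts the same two teachers in opposite order.

```python
# Hypothetical classes: (fall score, spring score) for each student, out of 100.
# All numbers are invented for illustration.
classes = {
    "Teacher A": [(85, 88), (90, 92), (80, 83), (88, 90)],  # students start high
    "Teacher B": [(50, 62), (55, 66), (45, 58), (60, 70)],  # students start low
}


def raw_average(scores):
    """Mean end-of-year (spring) score; ignores where students started."""
    return sum(spring for _, spring in scores) / len(scores)


def average_gain(scores):
    """Mean fall-to-spring gain; a crude stand-in for 'value added'."""
    return sum(spring - fall for fall, spring in scores) / len(scores)


if __name__ == "__main__":
    for teacher, scores in classes.items():
        print(f"{teacher}: raw average {raw_average(scores):.2f}, "
              f"average gain {average_gain(scores):.2f}")
```

Teacher A looks stronger on raw scores (88.25 vs. 64.00), while Teacher B’s students gained far more (11.50 vs. 2.50 points). Neither number says anything about why, which is the kind of question the activity asks you to consider.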

Example questions

  • As we consider how to integrate data synthesis and artificial intelligence into our classrooms in safe and supportive ways, what kinds of metrics do you think are ethical in learning spaces? Why are some metrics more or less ethical than others? What can we do as educators to provide ethical context around metrics and data collection for assessment? What are the possible implications of data collection for assessment and teacher practice?

  • When the test scores were assigned, would there be a way to tell how each score was collected? What story would a test score tell? Would all test-score narratives be equal?

  • What kinds of metrics would be most valuable to you in terms of student scores on a test or exam? What would the scores reveal to you? What changes might you make to your practice, based on an analysis of student test scores?

  • What questions would you have about the type of evaluation, which students took it, what it was testing, who would use the resulting data and how others might use it, and how long and where the data would be stored?

Conclusion

As we have access to increasing amounts of information, it is imperative that we learn how to critically evaluate information and information sources. As educators, we can positively impact our future society by fostering these skills in our students.

In the next unit, we’ll look at how to get started with data science.
