Statistics Reflections

Motivation

In the MIDS program, we emphasize thinking critically about data analysis and upholding principles of diversity, equity, and inclusion. Quantitative fields like data science are often viewed as “amoral,” but human judgment always plays a role in data collection and analysis. As data scientists, we must carefully consider the societal factors that play a role in our work. The materials presented with this assignment explore how statistics/data science interact with society at large.

This assignment connects to the fourth course learning objective: Make careful and critical decisions about model building and consider real-world implications.

Instructions

You are required to complete four statistics reflections throughout the semester. You can select any four from the list of materials below.

  • Begin your assignment with a header that includes your name and the title and source of your selected material.

  • Reflect on the material selected. What did it make you think about? How does it affect your work as a data scientist? Note that you are required to engage with the material and provide your thoughts/reactions; you should NOT be summarizing the piece.

  • While there is no minimum word count for the reflections, you are expected to meaningfully engage with the material (2-3 paragraphs). You can use the suggested questions to get started, but you are not required to answer them in your response.

  • In Gradescope, select the appropriate reflection assignment based on the due date. All of the assignments are already available, so you can complete them at any time!

I am always available if you have any questions or comments about the content presented in these materials. Additionally, if you come across a piece that is not on this list and you would like to use it for a statistics reflection, you are welcome to send me an email. Please include a link to the piece and a brief description of how it connects data science and society.

Note: It is not appropriate to use ChatGPT for this assignment. The task is to offer your personal reflections/thoughts; there are no right or wrong answers here.

Deadlines

  1. September 8, 11:59 PM

  2. September 22, 11:59 PM

  3. October 6, 11:59 PM

  4. November 3, 11:59 PM

Submit your reflection to the appropriate assignment on Gradescope.

Materials

A Primer on Non-Binary Gender and Big Data
The author offers several questions you may want to consider in your response:
  • What potential insights might we derive from working with non-binary gender and data?
  • What are the risks to gender minorities in relation to data?
  • What kinds of variation do we see across culture, context, and history?
  • How might non-binary gender and data deal with intersectionality? (Click this link to learn more about intersectionality.)
Additionally, you may want to consider:
  • Is it always useful/important to collect data on gender? In which domains might it be more important than others?

How eugenics shaped statistics
  • Consider this quote: "The separation was everything—not how much, what else might explain it, or why it mattered, just that it was there." Reflect on what this means for you as a data scientist.
  • The article argues that you cannot separate the science from the scientist. Do you agree?
  • Reflect on the argument that Galton, Pearson, and Fisher's views are "a product of their time."

To predict and serve?
This article may be of particular interest to those focused on social justice. The authors work with the Human Rights Data Analysis Group (and Kristian Lum is a Duke alum!).
  • In data analysis, we often use the phrase "garbage in, garbage out" to refer to models based on bad data. How does this relate to the problem of using biased data for predictive policing?
  • The last sentence states that data-driven approaches should be inclusive. How do you think this can be done? Is it possible to use biased data without reaching biased predictions?
  • Can you think of other examples in which using data to predict future behavior could do more harm than good?

Loud Numbers: Listen to whichever one of the podcast episodes looks most interesting to you. Also, read the FAQs under the What We Do tab. You might also be interested in the Talks and Articles listed under the Community tab.
  • One of the FAQs mentions accessibility. Consider the limitations of only representing data visually and how data sonification may make data more accessible.
  • How might we use the other senses (touch, smell, taste) to represent data? Have you seen data represented in any of these ways before?

Fairness in Machine Learning with Sherri Rose (interview starts at 8:15 and ends at 46:00)
  • Reflect on Dr. Rose's recommendations for ensuring fairness in machine learning.
  • Consider Dr. Rose's comments about the "single metric leaderboard" and what it means for you as a data scientist.

Peter Donnelly: How stats fool juries (you can skip past the stats jokes and start at 3:30)
  • This talk is from 2007. Can you think of more recent examples of misleading statistics/misuse of statistical principles?
  • How do you think situations like the trial Donnelly describes can be avoided?