Statistics Reflections

Motivation

In the MIDS program, we emphasize thinking critically about data analysis and upholding principles of diversity, equity, and inclusion. Quantitative fields like data science are often viewed as “amoral,” but human judgment always plays a role in data collection and analysis. As data scientists, we must carefully consider the societal factors that play a role in our work. The materials presented with this assignment explore how statistics/data science interact with society at large.

This assignment connects to the third course learning objective: Make careful and critical decisions about model building and consider real-world implications.

Instructions

You are required to complete four statistics reflections throughout the semester. You can select any four from the list of materials below.

  • Begin your assignment with a header that includes your name and the title and source of your selected material.

  • Reflect on the material selected. What did it make you think about? How does it affect your work as a data scientist? Note that you are required to engage with the material and provide your thoughts/reactions; you should NOT be summarizing the piece.

  • While there is no minimum word count for the reflections, you are expected to meaningfully engage with the material (2-3 paragraphs). You can use the suggested questions to get started, but you are not required to answer them in your response.

  • In Gradescope, select the appropriate reflection assignment based on the due date. All of the assignments are already available, so you can complete them at any time!

I am always available if you have any questions or comments about the content presented in these materials. Additionally, if you come across a piece that is not on this list and you would like to use it for a statistics reflection, you are welcome to send me an email. Please include a link to the piece and a brief description of how it connects data science and society.

Deadlines

  1. September 15, 11:59 PM

  2. October 6, 11:59 PM

  3. October 27, 11:59 PM

  4. November 17, 11:59 PM

Submit your reflection to the appropriate assignment on Gradescope.

Materials

A Primer on Non-Binary Gender and Big Data
The author offers several questions you may want to consider in your response:
  • What potential insights might we derive from working with non-binary gender and data?
  • What are the risks to gender minorities in relation to data?
  • What kinds of variation do we see across culture, context, and history?
  • How might work on non-binary gender and data deal with intersectionality? (Click this link to learn more about intersectionality.)
Additionally, you may want to consider:
  • Is it always useful/important to collect data on gender? In which domains might it be more important than others?

How eugenics shaped statistics
  • Consider this quote: "The separation was everything—not how much, what else might explain it, or why it mattered, just that it was there." Reflect on what this means for you as a data scientist.
  • The article argues that you cannot separate the science from the scientist. Do you agree?
  • Reflect on the argument that Galton's, Pearson's, and Fisher's views are "a product of their time."

Abolish Big Data
  • What does Milner mean when she says "abolish big data"?
  • What role does data literacy play in the use of big data tools? What role do data scientists have in the implementation of these tools?

Data sonification - from deep space research to improving lives through cancer research, and
Making data sing | Margaret Anne Schedel
  • What are the advantages and disadvantages of data sonification?
  • How might data sonification make data more accessible to people with disabilities? Can you think of any other ways data could be made more accessible?

Fairness in Machine Learning with Sherri Rose (interview starts at 8:15 and ends at 46:00)
  • Reflect on Dr. Rose's recommendations for ensuring fairness in machine learning.
  • Consider Dr. Rose's comments about the "single metric leaderboard" and what it means for you as a data scientist.

Peter Donnelly: How stats fool juries (you can skip past the stats jokes and start at 3:30)
  • This talk is from 2007. Can you think of more recent examples of misleading statistics/misuse of statistical principles?
  • How do you think situations like the trial Donnelly describes can be avoided?