IDS 702: Data Modeling and Representation
Fall 2023
Duke University
Course Objectives
Developing an understanding of statistical modeling is a key component of becoming a data scientist. Statistical models are used to answer research questions and obtain meaningful insights from many kinds of data.
Broadly, this course will cover the following topics:
- Linear Regression
- Generalized Linear Models
- Special topics, including survival models and hierarchical models
But here in the MIDS program, understanding the content is only the beginning. Successful data scientists are critical thinkers, problem solvers, effective communicators, and enthusiastic collaborators. With that in mind, this course has four key learning objectives. By the end of the course, students should be able to:
- Fit and interpret statistical models, including linear and generalized linear models
- Map a research question and dataset to the appropriate statistical model
- Make careful and critical decisions about model building and consider real-world implications
- Communicate (through written and oral communication) model results to a broad audience
Course Components
Classwise
We will use a platform called Classwise for lecture videos before class meetings. You will be required to engage with prep materials (mostly videos) and answer comprehension questions in Classwise before coming to class. The prep materials will primarily cover theoretical modeling concepts. Class meetings will then focus on implementation in R and application exercises. Classwise materials will be posted on the course website.
The Classwise course component connects to the first learning objective: Fit and interpret statistical models, including linear and generalized linear models.
Application exercises
During class meetings, we will complete application exercises. The goal of these exercises is to practice implementing and interpreting statistical models in R. You are encouraged to work on these exercises in small groups.
The application exercise course component connects to the first learning objective: Fit and interpret statistical models, including linear and generalized linear models.
Data analysis assignments
You will have three data analysis assignments to complete during the semester. Data analysis assignments require fitting and interpreting statistical models and communicating results to a broad audience. Each assignment will have a unique structure to develop written and oral communication skills.
You are encouraged to talk to each other about general concepts, or to the instructor/TAs. However, the write-ups, solutions, and code MUST be entirely your own work. The assignments must be typed up using Quarto and submitted on Gradescope. Note that you will not be able to make online submissions after the due date, so be sure to submit before the Gradescope-specified deadline.
Data analysis assignments connect to all four learning objectives.
Statistics reflections
You will be responsible for four statistics reflections throughout the semester. I have assigned six pieces that cover various topics related to the interaction between data science and society. For each reflection, you will write about the piece that you select. Questions are provided as prompts, but you are not required to answer them in your reflection. Grades will be based on completion and thoughtful engagement.
The chosen articles/videos address topics that may be sensitive and/or uncomfortable, including racism, eugenics, and gender identity. It is crucial that you engage in the reflections thoughtfully and respectfully. I seek to create a classroom environment that not only acknowledges diversity in all forms, but celebrates it. In that endeavor, I would be remiss not to acknowledge the discriminatory ways statistical science has been used both historically and currently. In having these important discussions, I want you to critically examine and appreciate the power (good and bad) of statistics as you begin your career as a data scientist. More information can be found on the statistics reflections page on the course website.
Statistics reflections connect to the third course learning objective: Make careful and critical decisions about model building and consider real-world implications.
Team project
You and your team will apply the knowledge and skills learned throughout this course to analyze a dataset that interests you. The project should be an in-depth statistical analysis of a particular research question. Your team will select the dataset. Teams will be assigned. More detailed information will be available on the course website. The project will have multiple components:
- Proposal
- Exploratory data analysis report
- Statistical analysis plan
- Final deliverable and presentation
- Team member evaluation
The team project connects to all four course learning objectives.
Grade Calculation
| Component | Percentage |
|---|---|
| Participation (Classwise) | 10% |
| Statistics reflections | 15% |
| Data analysis assignments | 45% |
| Team project | 30% |
Letter grade scales may be adjusted at the end of the semester. Cumulative averages \(\geq 90\%\) are guaranteed at least an A-, cumulative averages \(\geq 80\%\) are guaranteed at least a B-, and cumulative averages \(\geq 70\%\) are guaranteed at least a C-.
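As an illustration of how the weights and guaranteed cutoffs combine, here is a minimal Python sketch. The function names and the example scores are hypothetical, and the sketch assumes each component is scored on a 0-100 scale; it is not an official grade calculator, and actual letter grade scales may be adjusted as noted above.

```python
# Component weights from the grade table (percent of the final grade).
WEIGHTS = {
    "Participation (Classwise)": 10,
    "Statistics reflections": 15,
    "Data analysis assignments": 45,
    "Team project": 30,
}

# Guaranteed minimum letter grades by cumulative average, highest cutoff first.
GUARANTEES = [(90, "A-"), (80, "B-"), (70, "C-")]


def cumulative_average(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_minimum(average):
    """Return the guaranteed minimum letter grade, or None if below every cutoff."""
    for cutoff, letter in GUARANTEES:
        if average >= cutoff:
            return letter
    return None


# Hypothetical component scores, for illustration only.
example_scores = {
    "Participation (Classwise)": 100,
    "Statistics reflections": 90,
    "Data analysis assignments": 85,
    "Team project": 88,
}

average = cumulative_average(example_scores)
print(average, guaranteed_minimum(average))
```

With these hypothetical scores, the weighted sum is (10 × 100 + 15 × 90 + 45 × 85 + 30 × 88) / 100, which falls between the 80% and 90% cutoffs, so the guaranteed minimum is a B-.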
Regrade requests can be made on Gradescope within 24 hours of the assignment’s grade release. Regrade requests for final project reports/presentations must be made within 12 hours of grade release.
There are no make-ups for any graded work except for cases of medical/personal/familial emergencies. Make-ups/extension requests must be made to the instructor BEFORE the assignment deadline.
Course policies
Late submissions
You (or your team when applicable) will lose 50% of the total points on each assignment if you submit within the first 24 hours after it is due. You will lose 100% of the total points if you submit later than that without explicit approval from the instructor.
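The late policy can be read as a simple rule on the number of hours past the deadline. The sketch below is one hedged reading of it: the function name is hypothetical, a submission exactly 24 hours late is assumed to count as within the first 24 hours, and the outcome for an instructor-approved submission is left unspecified because the policy does not state it.

```python
from typing import Optional


def late_penalty_fraction(hours_late: float,
                          instructor_approved: bool = False) -> Optional[float]:
    """Fraction of the assignment's total points lost under the late policy.

    Assumptions (not stated in the policy itself):
    - exactly 24 hours late still counts as "within the first 24 hours";
    - with explicit instructor approval after 24 hours, the outcome is decided
      by the instructor, so None is returned instead of a number.
    """
    if hours_late <= 0:
        return 0.0  # submitted on time: no penalty
    if hours_late <= 24:
        return 0.5  # within the first 24 hours after the deadline
    if instructor_approved:
        return None  # not specified by the policy
    return 1.0  # later than 24 hours without approval


for hours in (0, 3, 24, 30):
    print(hours, late_penalty_fraction(hours))
```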
Class meeting attendance
I expect all students to attend all class meetings. However, I know that things come up once in a while, so I expect at least 90% of the class to be present at each class meeting. If class meeting attendance begins to drop below the 90% threshold consistently, I will institute a more stringent, individual attendance policy. Note that class meetings will not be recorded.
Academic integrity
As a student in this course, you have agreed to uphold the Duke Community Standard as well as the practices specific to this course. This means that the work you submit is your own, even if you discuss assignments with your classmates. Additionally, when consulting resources (books, internet articles including Stack Exchange, ChatGPT), you must cite them. If you have any questions about what should be cited in your work, feel free to reach out to the instructor.
Inclusive community
It is my intent that students from all diverse backgrounds and perspectives be well-served by this course, that students’ learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength, and benefit. It is my intent to present materials and activities that are respectful of diversity and in alignment with Duke’s commitment to diversity and inclusion. Your suggestions are encouraged and appreciated. Please let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups.
Furthermore, I would like to create a learning environment for my students that supports a diversity of thoughts, perspectives and experiences, and honors your identities. To help accomplish this:
- If you feel like your performance in the class is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. If you prefer to speak with someone outside of the course, I encourage you to speak with MIDS administrators.
- I (like many people) am still in the process of learning about diverse perspectives and identities. If something was said in class (by anyone) that made you feel uncomfortable, please let me or a member of the teaching team know.
Resources
Duke Counseling & Psychological Services (CAPS) helps Duke students enhance strengths and develop abilities to successfully live, grow, and learn in their personal and academic lives. CAPS offers many services to Duke students, including brief individual and group counseling, couples counseling, and more. CAPS staff also provide outreach to student groups, particularly programs supportive of at-risk populations, on a wide range of issues impacting them in various aspects of campus life. CAPS provides services to students via Telehealth. To initiate services, you can contact their front desk at 919-660-1000.
If there is any portion of the course that is not accessible to you due to challenges with technology or the course format, please let me know so we can make appropriate accommodations. The Student Disability Access Office (SDAO) is available to ensure that students are able to engage with their courses and related assignments.
The Academic Resource Center provides learning resources to help you maximize your academic capabilities.