Code and Data for the Social Sciences: A Practitioner's Guide

Title: Code and Data for the Social Sciences: A Practitioner's Guide
Publication Type: Miscellaneous
Year of Publication: 2014
Authors: Gentzkow, Matthew, and Jesse M. Shapiro
Keywords: code, Data analysis, Data Mining, Data reuse, Data Standards, research methods
Abstract: What does it mean to do empirical social science? Asking good questions. Digging up novel data. Designing statistical analysis. Writing up results. For many of us, most of the time, what it means is writing and debugging code. We write code to clean data, to transform data, to scrape data, and to merge data. We write code to execute statistical analyses, to simulate models, to format results, to produce plots. We stare at, puzzle over, fight with, and curse at code that isn’t working the way we expect it to. We dig through old code trying to figure out what we were thinking when we wrote it, or why we’re getting a different result from the one we got the week before. Even researchers lucky enough to have graduate students or research assistants who write code for them still spend a significant amount of time reviewing code, instructing on coding style, or fixing broken code. Though we all write code for a living, few of the economists, political scientists, psychologists, sociologists, or other empirical researchers we know have any formal training in computer science. Most of them picked up the basics of programming without much effort, and have never given it much thought since. Saying they should spend more time thinking about the way they write code would be like telling a novelist that she should spend more time thinking about how best to use Microsoft Word. Sure, there are people who take whole courses in how to change fonts or do mail merge, but anyone moderately clever just opens the thing up and figures out how it works along the way. This manual began with a growing sense that our own version of this self-taught seat-of-the-pants approach to computing was hitting its limits.
Copyright (c) 2014, Matthew Gentzkow and Jesse M. Shapiro. E-mail:, Please cite this document as: Gentzkow, Matthew and Jesse M. Shapiro. 2014. Code and Data for the Social Sciences: A Practitioner’s Guide. University of Chicago mimeo,, last updated January 2014.
Full Text
Chicago Booth and NBER March 10, 2014