Research at the Harvard DSI centers on four themes that reflect faculty interests and strengths. The inaugural research themes (networks and markets, personalized health, data-driven scientific discovery, and evidence-based policy) are intentionally multidisciplinary to align with research interests across Harvard Schools. Further, they are built on a foundational commitment to advancing data science methodologies.
PERSONALIZED HEALTH | It is increasingly possible to form a much more comprehensive view of individuals: our fitness routines, the air we breathe, the housing we live in, the health care we have access to, family history, eating habits, engagement with social media, genetic data, and so on. With this, data science can begin to account for the millions of variables that make us unique, variables that can have life-or-death consequences for a patient, and help health care providers make the right decisions. The promise is to develop individualized treatments, diagnostic tools, and policy interventions to revolutionize the way we prevent, diagnose, and treat disease. Projects may involve building predictive models for environmental exposures and disease, creating targeted genetic therapies for the individual patient, and exploring topics related to the interpretability and robustness of machine learning.
EVIDENCE-BASED POLICY | Good public policy creates the most effective interventions to improve society. By analyzing a wealth of data, whether economic, environmental, societal, or demographic, we can quantify the efficacy of current interventions and design new experiments. Societal challenges of interest include those around economic opportunity, education, transportation, health care, environmental protections, justice, and more. Projects may involve developing new methodologies for causal inference from observational data, meta-studies to understand the impact of policy, large-scale experimentation, and translation of scientific results into public policy.
NETWORKS AND MARKETS | The digital revolution continues to affect everything about our social and economic lives: the way we make purchase decisions, the information we consume and the opinions we form, the way finance operates at micro and macro scales, and the very fabric of democratic society. On one hand, digitization provides new opportunities for understanding human decision-making, social processes, and the way that markets function. At the same time, there are profound questions that we need to grapple with. What should we be optimizing for? What effect will this rapid transformation have on employment? What about concerns regarding privacy, fairness, and equal access? Projects may involve exploring the role of artificial intelligence in consumer choice and markets more broadly, the analysis and design of new economic systems around blockchains and consensus mechanisms, the study of contests and open innovation, and understanding the impact of social networks and online media on the electoral process.
DATA-DRIVEN SCIENTIFIC DISCOVERY | For generations, science has started from a hypothesis, with data collected to confirm or refute the hypothesis, leading to additional theories. Today this is frequently turned upside down. The opportunity for discovery can occur without a hypothesis, arising from the application of machine learning and data science methods to massive data sets. This is leading to breakthrough discoveries across many fields, including finding new medicines, identifying energy-efficient materials, predicting earthquakes, locating star formations in the Milky Way, and uncovering the design principles of the circuits in our brains. At the same time, researchers are building new tools for automated discovery and guarding against false discovery.
METHODOLOGY | As massive amounts of data are generated from science, engineering, social sciences, and medicine, researchers are grappling with how to extract value from data to gain new understanding and to benefit society. To do this effectively, we will continue to need new methodological breakthroughs to extract useful knowledge robustly and at scale from complex and often messy information sources. Harvard is host to a wealth of expertise in what can be termed the “science of the data,” and is a world leader in the development of associated techniques. The Data Science Initiative aims to unite efforts across the university. By bringing together our computer science, statistics, and domain-focused scholars, there is a unique opportunity to accelerate progress: to establish an entirely new and better way of learning from data and making difficult and important decisions using data.