Aldrich Hall 008, Harvard Business School, Boston, MA 02134
Join us for five presentations on causal inference by Ph.D. candidates from across the university. Refreshments will be provided.
Learning What to Learn: Experimental Design when Combining Experimental with Observational Evidence
Experiments deliver credible treatment-effect estimates but, because they are costly, are often restricted to specific sites, small populations, or particular mechanisms. A common practice across several fields is therefore to combine experimental estimates with reduced-form or structural external (observational) evidence to answer broader policy questions, such as those involving general equilibrium effects or external validity. We develop a unified framework for the design of experiments when combined with external evidence, i.e., choosing which experiment(s) to run and how to allocate sample size under arbitrary budget constraints. Because observational evidence may suffer from bias that is unknown ex ante, we evaluate designs using a minimax proportional-regret criterion that compares any candidate design to an oracle that knows the observational study's bias and jointly chooses the design and estimator. This yields a transparent bias-variance trade-off that does not require the researcher to specify a bias bound and relies only on information already needed for conventional power calculations. We illustrate the framework by (i) designing cash-transfer experiments aimed at estimating general equilibrium effects and (ii) optimizing site selection for microfinance interventions.
Speaker: Aristotle Epanomeritakis
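As a toy illustration of the bias-variance trade-off described in the first abstract, the sketch below is a hypothetical example, not the authors' framework. It assumes the target is theta = theta1 + theta2, a fixed budget of N units is split between an experiment on theta1 (n1 units) and an experiment on theta2 (N - n1 units), and an independent observational estimate of theta2 has known variance v_obs but unknown bias. For each bias value the oracle picks both the allocation and the MSE-optimal weight on the observational estimate; for simplicity each candidate allocation is also paired with its bias-optimal weight, and it is scored by its worst-case ratio of risk to oracle risk over a grid of biases. The function names, unit outcome variances, and bias grid are all assumptions made for illustration.

```python
import numpy as np


def combined_mse(v_exp, v_obs, bias):
    """MSE of the best fixed-weight combination of an unbiased experimental
    estimate (variance v_exp) and an independent observational estimate with
    variance v_obs and the given bias."""
    mse_obs = v_obs + bias ** 2
    return v_exp * mse_obs / (v_exp + mse_obs)


def design_risk(n1, N, sigma1, sigma2, v_obs, bias):
    """Risk for theta = theta1 + theta2 when n1 of N units go to the theta1 experiment."""
    v1 = sigma1 ** 2 / n1          # variance of the experimental estimate of theta1
    v2 = sigma2 ** 2 / (N - n1)    # variance of the experimental estimate of theta2
    return v1 + combined_mse(v2, v_obs, bias)


def minimax_allocation(N, sigma1, sigma2, v_obs, bias_grid):
    """Allocation minimizing worst-case proportional regret over bias_grid."""
    designs = np.arange(1, N)  # candidate n1 values: 1, ..., N - 1
    risk = np.array([[design_risk(n1, N, sigma1, sigma2, v_obs, b)
                      for b in bias_grid] for n1 in designs])
    oracle = risk.min(axis=0)                    # oracle knows the bias
    worst_ratio = (risk / oracle).max(axis=1)    # worst-case risk / oracle risk
    i = int(np.argmin(worst_ratio))
    return int(designs[i]), float(worst_ratio[i])


bias_grid = np.linspace(0.0, 2.0, 201)
n1_star, regret = minimax_allocation(N=100, sigma1=1.0, sigma2=1.0,
                                     v_obs=0.01, bias_grid=bias_grid)
print(n1_star, regret)
```

Without the observational estimate the symmetric split n1 = 50 would minimize variance. Here the external estimate partly informs theta2, so the minimax allocation shifts units toward the theta1 experiment, but only as far as the worst-case bias on the grid allows. No bias bound beyond the grid itself is specified.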
GenAI-powered Inference
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models, such as large language models and diffusion models, not only to generate unstructured data at scale but also to extract low-dimensional representations that are guaranteed to capture their underlying structure. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying the associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates’ facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
Speaker: Kentaro Nakamura
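The downstream step in the GPI abstract, applying machine learning to low-dimensional representations to estimate an effect with uncertainty, can be illustrated with a standard cross-fitted partialling-out (double machine learning) estimator for a partially linear model. This is a generic stand-in, not the GPI package's API and not necessarily its estimator. The matrix R is simulated in place of representations extracted from a generative model, ridge regression is an arbitrary choice of nuisance learner, and the function names are hypothetical.

```python
import numpy as np


def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_train.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def cross_fit_effect(R, T, Y, n_folds=5, lam=1.0, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*T + g(R) + e,
    with a heteroskedasticity-robust standard error."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    t_res, y_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisance models are fit on the other folds and evaluated out of fold.
        t_res[test] = T[test] - ridge_predict(R[train], T[train], R[test], lam)
        y_res[test] = Y[test] - ridge_predict(R[train], Y[train], R[test], lam)
    theta = np.sum(t_res * y_res) / np.sum(t_res ** 2)
    psi = (y_res - theta * t_res) * t_res             # per-unit score contributions
    se = np.sqrt(np.mean(psi ** 2) / np.mean(t_res ** 2) ** 2 / n)
    return theta, se


rng = np.random.default_rng(42)
n, p = 2000, 10
R = rng.normal(size=(n, p))                  # stand-in for GenAI-derived representations
T = 0.5 * (R @ rng.normal(size=p)) + rng.normal(size=n)      # treatment depends on R
Y = 2.0 * T + R @ rng.normal(size=p) + rng.normal(size=n)    # true effect is 2
theta_hat, se = cross_fit_effect(R, T, Y)
print(f"estimate {theta_hat:.3f}, 95% CI half-width {1.96 * se:.3f}")
```

Cross-fitting keeps each unit's residuals free of overfitting to that unit, which is what makes the simple standard error above defensible when flexible learners are swapped in for ridge.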
Monotonic Path-Specific Effects: Application to Estimating Educational Returns
Conventional research on educational effects typically either employs a “years of schooling” measure of education, or dichotomizes attainment as a point-in-time treatment. Yet, such a conceptualization of education is misaligned with the sequential process by which individuals make educational transitions. In this paper, I propose a causal mediation framework for the study of educational effects on outcomes such as earnings. The framework considers the effect of a given educational transition as operating indirectly, via progression through subsequent transitions, as well as directly, net of these transitions. I demonstrate that the average treatment effect (ATE) of education can be additively decomposed into mutually exclusive components that capture these direct and indirect effects. The decomposition has several special properties which distinguish it from conventional mediation decompositions of the ATE, properties that facilitate less restrictive identification assumptions as well as identification of all causal paths in the decomposition. An analysis of the returns to high school completion in the NLSY97 cohort suggests that the payoff to a high school degree stems overwhelmingly from its direct labor market returns. Mediation via college attendance, completion and graduate school attendance is small because of individuals’ low counterfactual progression rates through these subsequent transitions.
Speaker: Aleksei Opacic
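A simplified two-transition version shows how a small indirect component can coexist with a large ATE. In notation introduced here for illustration (not necessarily the paper's), let A_1 indicate high school completion and A_2 college attendance, with the monotonicity restriction that A_2 = 0 whenever A_1 = 0. Write Y(a_1, a_2) for potential earnings and A_2(1) for potential college attendance had the individual completed high school. Because Y(0) = Y(0, 0) under monotonicity, the ATE telescopes into a direct and an indirect component, and the indirect component factors through the counterfactual progression rate:

```latex
\begin{aligned}
\mathrm{ATE} &= \mathbb{E}\big[Y(1, A_2(1)) - Y(0, 0)\big] \\
&= \underbrace{\mathbb{E}\big[Y(1, 0) - Y(0, 0)\big]}_{\text{direct}}
 + \underbrace{\mathbb{E}\big[Y(1, A_2(1)) - Y(1, 0)\big]}_{\text{via college attendance}}, \\
\mathbb{E}\big[Y(1, A_2(1)) - Y(1, 0)\big]
&= \Pr\big(A_2(1) = 1\big)\,
   \mathbb{E}\big[Y(1, 1) - Y(1, 0) \mid A_2(1) = 1\big].
\end{aligned}
```

The last line follows because Y(1, A_2(1)) - Y(1, 0) = A_2(1)(Y(1, 1) - Y(1, 0)) for binary A_2(1). Even a large college premium therefore contributes little to the indirect component when Pr(A_2(1) = 1) is low.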
Priors, Pooling, and Profiles: Unraveling Information Borrowing in Bayesian Hierarchical Models
Bayesian hierarchical models are widely used to stabilize group-specific estimates through partial pooling, with applications across the biomedical and social sciences as well as policy research. However, the amount, direction, and mechanism of information borrowing in these models often remain opaque. We develop a transparent, design-based interpretation of Bayesian hierarchical linear models by deriving closed-form implied weights for a broad class of group-level estimators. Each group-specific estimate admits an exact decomposition into (i) local weights, which capture within-group information, and (ii) global weights, which quantify information borrowed from other groups. We show that both sets of weights solve convex optimization problems with interpretable balance and regularization structures. This formulation, to our knowledge, makes explicit a previously implicit connection between Bayesian hierarchical models and generalization in causal inference: the target distribution, or synthetic covariate profile, induced by a hierarchical model interpolates between a fully pooled profile and a group-specific profile, with the prior variance governing the degree of pooling. We further decompose the global weights to characterize how between-group similarity drives information borrowing. Building on these results, we introduce diagnostics that quantify information borrowing, effective sample size, synthetic covariate balance, and cross-group influence. These tools provide practical insight into the behavior of hierarchical models beyond prior specification and clarify the mechanics of partial pooling and study representativeness.
Speaker: Wenqi Shi
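For intuition about local and global weights, the simplest case can be worked out directly: an intercept-only normal hierarchical model with known within-group variance sigma2, known between-group (prior) variance tau2, and a flat prior on the grand mean. There the posterior mean of group j is lambda_j * ybar_j + (1 - lambda_j) * mu_hat, with lambda_j = tau2 / (tau2 + sigma2 / n_j) and mu_hat the precision-weighted average of the group means, so every group estimate is a fixed linear combination of all outcomes. The sketch below is a hypothetical helper, not the authors' covariate-adjusted derivation or their diagnostics; it recovers those unit-level weights and splits each group's total weight into a local share and a borrowed (global) share.

```python
import numpy as np


def hierarchical_implied_weights(groups, sigma2, tau2):
    """Unit-level weights W such that the posterior group means equal W @ y in the
    intercept-only normal hierarchical model with known variances and a flat
    prior on the grand mean."""
    groups = np.asarray(groups)
    labels, idx, n = np.unique(groups, return_inverse=True, return_counts=True)
    lam = tau2 / (tau2 + sigma2 / n)          # weight on the group's own mean
    prec = 1.0 / (tau2 + sigma2 / n)
    omega = prec / prec.sum()                 # weights of group means in the grand mean
    W = np.empty((len(labels), len(groups)))
    for j in range(len(labels)):
        W[j] = (1 - lam[j]) * omega[idx] / n[idx]   # borrowed through the grand mean
        W[j, idx == j] += lam[j] / n[j]             # direct use of own-group data
    local = np.array([W[j, idx == j].sum() for j in range(len(labels))])
    return labels, W, local, 1.0 - local            # each row of W sums to 1


rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], [5, 20, 2])    # unequal group sizes
y = rng.normal(size=groups.size)
labels, W, local, borrowed = hierarchical_implied_weights(groups, sigma2=1.0, tau2=0.5)
theta_hat = W @ y                            # posterior group means
```

Small groups borrow the most (here the two-unit group has the largest borrowed share). Letting tau2 grow pushes every lambda_j toward 1 (no pooling), while tau2 near 0 pulls every group toward the common grand mean (complete pooling), which is the sense in which the prior variance governs the degree of pooling.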
Understanding Spatial Regression Models from a Weighting Perspective in an Observational Study of Superfund Remediation
A key challenge in environmental health research is unmeasured spatial confounding, driven by unobserved spatially structured variables that influence both treatment and outcome. A common approach is to fit a spatial regression that models the outcome as a linear function of treatment and covariates, with a spatially structured error term to account for unmeasured spatial confounding. However, it remains unclear to what extent spatial regression actually accounts for such forms of confounding in finite samples, and whether this regression adjustment can be reformulated from a design-based perspective. Motivated by an observational study on the effect of Superfund site remediation on birth outcomes, we present a weighting framework for causal inference that unifies three canonical classes of spatial regression models—random effects, conditional autoregressive, and Gaussian process models—and reveals how they implicitly construct causal contrasts across space. Specifically, we show that: (i) the spatial error term induces approximate balance on a latent set of covariates and therefore adjusts for a specific form of unmeasured confounding; and (ii) the covariance structure of the spatial error can be equivalently represented as regressors in a linear model. Building on these insights, we introduce a new estimator that jointly addresses multiple forms of unmeasured spatial confounding and develop visual diagnostics. Using our new estimator, we find evidence of a small but beneficial effect of remediation on the percentage of small vulnerable newborns.
Speaker: Sophie Woodward
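One basic ingredient of the weighting view can be sketched for the Gaussian-process case. With a known spatial covariance Sigma, the generalized least squares (GLS) coefficient on treatment is a fixed linear combination w'y of the outcomes, where w is the treatment row of (X' Sigma^-1 X)^-1 X' Sigma^-1. Because these weights satisfy w'X = (0, 1, 0), they exactly balance the observed covariates and, when X includes an intercept, sum to +1 over treated units and -1 over controls, so the estimate is a weighted treated-minus-control contrast. The exponential covariance, its parameters, and the simulated locations are assumptions for illustration; this is not the authors' new estimator or diagnostics, and treating Sigma as known sidesteps its estimation.

```python
import numpy as np


def gls_implied_weights(X, Sigma):
    """Rows of (X' Sigma^-1 X)^-1 X' Sigma^-1; row k holds the weights that the
    GLS estimate of coefficient k places on each outcome."""
    Sinv_X = np.linalg.solve(Sigma, X)                 # Sigma^-1 X
    return np.linalg.solve(X.T @ Sinv_X, Sinv_X.T)     # (X' Sigma^-1 X)^-1 X' Sigma^-1


def exponential_cov(coords, variance=1.0, range_=0.3, nugget=0.1):
    """Exponential spatial covariance plus a nugget (independent noise)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return variance * np.exp(-d / range_) + nugget * np.eye(len(coords))


rng = np.random.default_rng(7)
n = 60
coords = rng.uniform(size=(n, 2))                      # hypothetical site locations
x = rng.normal(size=n)                                 # an observed covariate
treat = (rng.uniform(size=n) < 0.4).astype(float)      # binary treatment, e.g. remediated
X = np.column_stack([np.ones(n), treat, x])
w = gls_implied_weights(X, exponential_cov(coords))[1]  # weights for the treatment coefficient

print(w[treat == 1].sum(), w[treat == 0].sum(), w @ x)  # ≈ 1, -1, 0 up to rounding
```

Replacing Sigma with the identity matrix recovers the ordinary least squares weights, so comparing the two weight vectors shows how the spatial error term reallocates weight across locations.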




