Edward McFowland III, Harvard University
October 20, 2022
Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem
Abstract: Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy uses predictive modeling techniques to “mine” variables of interest from available data, then includes those variables in an econometric framework to estimate causal effects. However, because the predictions from machine learning models are inevitably imperfect, econometric analyses based on the predicted variables likely suffer from bias due to measurement error.
Discussant: Iavor Bojinov
Edward Kennedy, Carnegie Mellon
November 3, 2022
Optimal nonparametric estimation of heterogeneous causal effects
Abstract: Estimation of heterogeneous causal effects — i.e., how effects of policies and treatments vary across units — is fundamental to medical, social, and other sciences, and plays a crucial role in optimal treatment allocation, generalizability, subgroup effects, and more. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there have remained important theoretical gaps in understanding if and when such methods make optimally efficient use of the data at hand. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). This talk surveys work across two recent papers in this context. First, we study a two-stage doubly robust estimator and give a new error bound, which, despite its generality, yields sharper results than those in the current literature. The second contribution is aimed at understanding the fundamental statistical limits of CATE estimation. We resolve this long-standing problem by deriving a minimax lower bound, with matching upper bound obtained via a new estimator based on higher order influence functions. Applications in medicine and political science are considered.
Discussants: Nima Hejazi and James M. Robins
Fabrizia Mealli, University of Florence
November 17, 2022
Selecting Subpopulations for Causal Inference in Regression Discontinuity Designs
Abstract: The Brazilian Bolsa Familia program is a conditional cash transfer program that aims to reduce short-term poverty through direct cash transfers and to fight long-term poverty by increasing human capital among poor Brazilian people. Eligibility for Bolsa Familia benefits depends on a cutoff rule, which classifies the study as a regression discontinuity (RD) design. Extracting causal information from RD studies is challenging. Following Li, Mattei, and Mealli (2015) and Branson and Mealli (2019), we formally describe the RD design as a local randomized experiment within the potential outcome approach. Under this framework, causal effects can be identified and estimated on a subpopulation where a local overlap assumption, a local SUTVA, and a local ignorability assumption hold. We first discuss the potential advantages of this framework in settings where the assumptions are judged plausible. These advantages concern the definition of the causal estimands, the design and analysis of the study, and the interpretation and generalizability of the results.
Discussants: Matthew Blackwell and Rui Duan
Elizabeth Ogburn, Johns Hopkins
December 1, 2022
Disentangling confounding and dependence in spatial statistics
Abstract: In the first half of the talk, I will explain how “nonsense associations” can arise when an exposure and an outcome of interest exhibit similar patterns of dependence. Nonsense associations are distinct from confounding, but this distinction has been blurred in many dependent-data application areas. In the second half of the talk I will discuss implications for understanding the literature on spatial confounding and propose a causally informed approach to spatial confounding.
Discussant: Kate Hu
Alberto Abadie, MIT
March 2, 2023
When should you adjust standard errors for clustering?
Abstract: Clustered standard errors, with clusters defined by factors such as geography, are widespread in empirical research in economics and many other disciplines. Formally, clustered standard errors adjust for the correlations induced by sampling the outcome variable from a data-generating process with unobserved cluster-level components. However, the standard econometric framework for clustering leaves important questions unanswered: (i) Why do we adjust standard errors for clustering in some ways but not others, for example, by state but not by gender, and in observational studies but not in completely randomized experiments? (ii) Is the clustered variance estimator valid if we observe a large fraction of the clusters in the population? (iii) In what settings does the choice of whether and how to cluster make a difference? We address these and other questions using a novel framework for clustered inference on average treatment effects. In addition to the common sampling component, the new framework incorporates a design component that accounts for the variability induced on the estimator by the treatment assignment mechanism. We show that, when the number of clusters in the sample is a non-negligible fraction of the number of clusters in the population, conventional clustered standard errors can be severely inflated, and propose new variance estimators that correct for this bias.
Discussants: Laura Hatfield and Xiao-Li Meng
Georgia Papadogeorgou, University of Florida
April 6, 2023
Spatial causal inference in the presence of unmeasured confounding and interference
Abstract: Causal inference in spatial settings is met with unique challenges and opportunities. On one hand, a unit’s outcome can be affected by the exposure at many locations, leading to interference. On the other hand, unmeasured spatial variables can confound the effect of interest. Our work has two overarching goals. First, using causal diagrams, we illustrate that spatial confounding and interference can manifest as each other, meaning that investigating the presence of one can lead to wrongful conclusions in the presence of the other, and that statistical dependencies in the exposure variable can render standard analyses invalid. This can have crucial implications for analyzing data with spatial or other dependencies, and for understanding the effect of interventions on dependent units. Second, we propose a parametric approach to mitigate bias from local and neighborhood unmeasured spatial confounding and account for interference simultaneously. This approach is based on simultaneous modeling of the exposure and the outcome while accounting for the presence of spatially structured unmeasured predictors of both variables. We illustrate our approach with a simulation study and with an analysis of the local and interference effects of sulfur dioxide emissions from power plants on cardiovascular mortality.
Ivan Diaz, NYU
May 4, 2023
Causal influence, causal effect, and the identification of mediation parameters
Abstract: Recent approaches to causal inference have focused on the identification and estimation of causal effects, defined as (properties of) the distribution of counterfactual outcomes under hypothetical actions that alter the nodes of a graphical model. In this talk we will explore an alternative approach using the concept of causal influence, defined through operations that alter the information propagated through the edges of a directed acyclic graph. Causal influence may be more useful than causal effects in settings in which interventions on the causal agents are infeasible or of no substantive interest, for example when considering gender, race, or genetics as a causal agent. Furthermore, the “information transfer” interventions proposed allow us to address a long-standing problem in causal mediation analysis, namely the non-parametric identification of path-specific effects in the presence of treatment-induced mediator-outcome confounding.
Discussants: Sara Lodi and Jessica Young