Newly Funded: Six Competitive Research Awards

The Harvard Data Science Initiative (HDSI) Competitive Research Fund provides targeted seed and bridge funding to Harvard faculty who propose novel methods, innovations, or solutions to data science challenges. Since 2017, the HDSI has provided more than $2.2 million in funding across the University. Learn more about the Competitive Research Fund.

Decoding Gene Regulation with DNA Large Language Models in the Mammalian Brain

Under the direction of Michael Greenberg, this project adapts DNA large language models to chart gene-regulatory activity across mammalian brain cell types. By integrating conditional-LoRA adapters with single-cell embeddings and activity-dependent signals, the team will train models on a five-million-cell multi-omic atlas, evaluate cross-species and cross-region generalization, and apply interpretability tools to identify human regulatory variants linked to neurological disease. Resulting datasets and model checkpoints will be openly released.

Learn more about Professor Greenberg’s research.

Enhancing Research Data Metadata Using Generative AI

Guided by Gary King, this project applies generative AI to automatically extract, standardize, and ontology-align variable-level metadata in Harvard Dataverse datasets. Fine-tuned open-source language models, paired with user studies and integration into Dataverse, aim to improve dataset discovery, interpretability, and interoperability, ultimately scaling across the Dataverse Network to strengthen FAIR data practices.

Learn more about Professor King’s research.

Generating Diversified Protein Conformations via Sequential Sampling

Building on recent advances in protein modeling, Samuel Kou’s team is developing a sequential-sampling framework that perturbs AlphaFold2 inputs to generate a wider landscape of biologically meaningful protein conformations. After early success recovering known conformational states, the project will refine sampling criteria and methodologies to systematically expand structural coverage.

Learn more about Professor Kou’s research.

Adversarial Challenges in AI Image Models: Unveiling the Unknown Unknowns

Vijay Janapa Reddi’s project establishes a data-centric benchmarking framework to surface hidden safety vulnerabilities in text-to-image generative models. By crowd-sourcing prompts that provoke harmful or biased outputs and pairing them with automated evaluation tools, the effort will produce a robust dataset and methodology to improve the safety and reliability of future image-generation systems.

Learn more about Professor Reddi’s research.

AI-Driven Discovery of Poly-specific Pharmaceutical Compounds that Induce Multidrug Resistance in N. gonorrhoeae

In work led by Pamela Silver, researchers are coupling a fluorescence-based assay with graph neural networks to identify small molecules—including FDA-approved drugs—that activate multidrug resistance pathways in N. gonorrhoeae. The team will fine-tune models on experimental measurements, virtually screen large chemical libraries, and validate candidate molecules to produce a searchable dataset and predictive tool for identifying resistance-inducing compounds.

Learn more about Professor Silver’s research.

Data-Driven Discovery of Electrophysiological Biomarkers of Pain

Shriya Srinivasan’s team is creating objective electrophysiological biomarkers of pain by decoding compound peripheral nerve potentials using fiber-aware preprocessing and sequential machine-learning models. Drawing on recordings from 106 rats across six nerve-injury paradigms, the project aims to enable real-time classification of pain type, state, and intensity for future personalized clinical applications.

Learn more about Professor Srinivasan’s research.