Public Service Data Science Graduate Fellowship

The HDSI Public Service Data Science Graduate Fellowship supports Master’s students in Harvard’s data science programs (Biomedical Informatics, Health Data Science, Data Science) who want to explore career paths at not-for-profit and public sector organizations through a summer internship.

The fellowship includes a $10,000 stipend to support an unpaid summer internship at a not-for-profit or public sector organization that either (a) applies data science to solve social challenges, or (b) advocates for responsible data science.

NOW CLOSED | 2020 Request for Applications

2020 Fellows


Iris Braunstein (Harvard Medical School)
Dana-Farber Cancer Institute

Dr. Eli Van Allen at Dana-Farber Cancer Institute is leading the development of a patient education and data donation portal. We aim to help cancer patients understand how and why they can access their own medical data, and optionally share it with Dana-Farber for research purposes. We hope that helping patients gain access to their clinical information will help them engage more effectively in their own care, while providing a foundation for building a rich research database in oncology. This project follows observations of the challenges patients face in acquiring their personal medical data, detailed in this Forbes article.

2019 Fellows


Rachel Moon (Harvard John A. Paulson School of Engineering and Applied Sciences)
Austin Pets Alive!

During her 10-week summer internship, Rachel worked on various data science and software engineering projects. Her main project was building the Matchmaking App, a web application that works like Tinder to match potential adopters with dogs. When APA! visitors interested in adopting dogs sign in at the reception area, the web app uses information about their home environment to recommend dogs that they can take home, prioritizing long-stay dogs in need of a home. The Matchmaking App is currently being used by APA! visitors who are interested in adopting dogs. The app not only matches visitors with the animals they can adopt, but also streamlines the adoption process by facilitating the work of the matchmakers (adoption managers), and therefore helps shelter dogs get adopted.
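The recommendation step described above can be illustrated with a minimal sketch. It is not the APA! implementation: the `Dog` and `Home` fields, the compatibility rules, and the `recommend` function are all hypothetical, chosen only to show the idea of filtering by home environment and then ranking long-stay dogs first.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    # Hypothetical fields; the real app's data model is not described in the source.
    name: str
    days_in_shelter: int
    needs_yard: bool
    ok_with_kids: bool


@dataclass
class Home:
    # Hypothetical home-environment answers collected at sign-in.
    has_yard: bool
    has_kids: bool


def is_compatible(dog: Dog, home: Home) -> bool:
    """Return True if the dog's needs fit the visitor's home (assumed rules)."""
    if dog.needs_yard and not home.has_yard:
        return False
    if home.has_kids and not dog.ok_with_kids:
        return False
    return True


def recommend(dogs: list[Dog], home: Home, limit: int = 3) -> list[Dog]:
    """Filter to compatible dogs, then list the longest-staying dogs first."""
    compatible = [dog for dog in dogs if is_compatible(dog, home)]
    compatible.sort(key=lambda dog: dog.days_in_shelter, reverse=True)
    return compatible[:limit]


if __name__ == "__main__":
    shelter = [
        Dog("Biscuit", days_in_shelter=12, needs_yard=False, ok_with_kids=True),
        Dog("Duke", days_in_shelter=240, needs_yard=True, ok_with_kids=True),
        Dog("Luna", days_in_shelter=95, needs_yard=False, ok_with_kids=True),
        Dog("Rex", days_in_shelter=300, needs_yard=False, ok_with_kids=False),
    ]
    visitor = Home(has_yard=False, has_kids=True)
    for dog in recommend(shelter, visitor):
        print(dog.name, dog.days_in_shelter)
```

Sorting by days in shelter is only one way to prioritize long-stay dogs; a weighted score that also reflects how well a dog fits the home would be a natural alternative.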

For her second project, Rachel assessed the dog adoption capacity of Travis County, where APA! and Austin are located. Because APA! brings in animals from outside the county as well, the organization wanted to assess whether APA!'s out-of-county intakes are over-saturating Travis County's adoption capacity, thereby negatively impacting adoption rates at the Travis County city shelter, Austin Animal Center (AAC). Rachel collected and cleaned dog intake and outcome data from AAC and APA! and performed statistical and mathematical analyses to determine whether this was the case.

Rachel’s Blogpost

Pradeep Mangalath (Harvard Medical School)
Cures Within Reach for Cancer

One of Pradeep’s primary projects was to develop an evidence synthesis framework for drug repurposing. Each year, 17 million people are diagnosed with cancer and nearly $1 trillion is spent on cancer care. Hundreds of FDA-approved non-cancer generic drugs have shown promise for treating cancer in preclinical or small-scale clinical studies. Because of their long history of safe patient use, low cost, and widespread availability, repurposing these drugs represents an opportunity to rapidly improve patient outcomes and reduce healthcare costs.

Working with our collaborators at IBM Research’s Science for Social Good Initiative and Northeastern University, Pradeep’s team developed machine learning algorithms and applied natural language processing (NLP) methods trained on biomedical literature to identify scientific studies describing anti-cancer activity for any drug of interest. This is a challenging problem that involves searching for contextually specific information among large volumes of text, applying pattern-recognition techniques to identify relevant phrases that signal evidence, and inferring levels of therapeutic evidence contained in the identified signal. For example, a paper might peripherally mention a drug, such as aspirin, without the drug actually being used to treat cancer in the study. This is just one example of noise that needed to be computationally identified and isolated.
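The aspirin example can be made concrete with a toy sketch. It is a simple rule-based stand-in, not the machine learning and NLP models the team built: the cue phrases, the cancer terms, and the `evidence_sentences` function are assumptions made purely for illustration. It keeps only sentences in which the drug appears alongside both a treatment cue and a cancer term, so a peripheral mention is filtered out as noise.

```python
import re

# Hypothetical cue lists; a real system would learn such signals from labeled literature.
TREATMENT_CUES = ("treated with", "treatment with", "administration of", "received")
CANCER_TERMS = ("cancer", "tumor", "tumour", "carcinoma")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def evidence_sentences(abstract: str, drug: str) -> list[str]:
    """Return sentences where the drug co-occurs with a treatment cue and a cancer term."""
    drug = drug.lower()
    hits = []
    for sentence in split_sentences(abstract):
        lowered = sentence.lower()
        if drug not in lowered:
            continue  # the drug is not mentioned in this sentence
        has_cue = any(cue in lowered for cue in TREATMENT_CUES)
        has_cancer = any(term in lowered for term in CANCER_TERMS)
        if has_cue and has_cancer:
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    relevant = (
        "Patients with colorectal cancer were treated with aspirin for 12 months. "
        "Recurrence was lower than in the control group."
    )
    peripheral = (
        "Participants who reported taking aspirin were excluded. "
        "We studied metformin in breast cancer."
    )
    print(evidence_sentences(relevant, "aspirin"))
    print(evidence_sentences(peripheral, "aspirin"))
```

Keyword rules like these miss paraphrases and negation, which is one reason trained models are a better fit for the problem as described.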

Pradeep’s Blogpost