The HDSI Public Service Data Science Graduate Fellowship supports Master’s students in Harvard’s data science programs (Biomedical Informatics, Health Data Science, Data Science) who want to explore career paths at not-for-profit and public sector organizations through a summer internship.
The fellowship includes a $10,000 stipend to support an unpaid summer internship at a not-for-profit or public sector organization that either (a) applies data science to solve social challenges, or (b) advocates for responsible data science.
2022 Request for Applications (Application Closed)
I work on many projects in the growing world of DataPoint Armenia. My direct research contribution at DataPoint Armenia is identifying misinformation and hate speech in Twitter data from the Armenian-Azerbaijani war. The 2020 Armenia-Azerbaijan war is a unique case study for investigating astroturfing from a data science perspective: because the Azerbaijani government restricted access to all social media in the country except Twitter, open-source tweets became an ideal source for tracking coordinated information campaigns. I am very excited to share the results of this project once we finish it this summer.
Other projects under my supervision include medical records classification, economic impact analysis in Armenia, and building an Armenian spell checker. I also lead the educational workshop committee, where we develop data science workshops.
More than 70% of patients with ulcerative colitis ultimately require surgery. With the advent of expensive medical treatments, the timing of surgery has become an incredibly important question; patients treated medically for too long have worse outcomes. We currently have a prediction model that leverages elements of explainable machine learning methods to predict and visualize transitions to surgery for these patients. My work is to build a platform that takes this machine learning model and makes it accessible to the physician at the bedside.
Karun will be working as a data science intern for the Health Improvement Team (HIT) at Cambridge Health Alliance. His focus will be on applying data science to strengthen HIT’s approach to facilitating collaborative community health needs assessments, monitoring health indicators of importance to the community, and evaluating outcomes of implementation strategies.
Dr. Eli Van Allen at Dana-Farber Cancer Institute is leading the development of a patient education and data donation portal. We aim to help cancer patients understand how and why they can access their own medical data, and optionally share it with Dana-Farber for research purposes. We hope that helping patients gain access to their clinical information will help them engage more effectively in their own care, while providing a foundation to build a rich research database in oncology. This project follows observations of patient challenges in acquiring personal medical data, detailed in this Forbes article.
During her 10-week summer internship, Rachel worked on various data science and software engineering projects. Her main project was building a Matchmaking App, a web application that works like Tinder to match potential adopters with dogs. When APA! visitors interested in adopting dogs sign in at the reception area, the web app uses their home environment information to recommend dogs that they can take home, prioritizing long-stay dogs in need of a home. The Matchmaking App is currently being used by APA! visitors who are interested in adopting dogs. The app not only matches visitors with the animals they can adopt, but also streamlines the adoption process by facilitating the work of the matchmakers (adoption managers), and therefore helps shelter dogs get adopted.
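The recommendation step described above can be illustrated with a minimal sketch. It is not the APA! app's actual code: the `Dog` and `Home` fields, the compatibility rules, and the sample data are all hypothetical, chosen only to show the idea of filtering by home environment and then ranking long-stay dogs first.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    # Hypothetical fields; the real app's data model is not described in the source.
    name: str
    days_in_shelter: int
    needs_yard: bool
    good_with_kids: bool


@dataclass
class Home:
    # Hypothetical home-environment answers collected at sign-in.
    has_yard: bool
    has_kids: bool


def is_compatible(dog: Dog, home: Home) -> bool:
    """Return True if the dog's needs fit the visitor's home (assumed rules)."""
    if dog.needs_yard and not home.has_yard:
        return False
    if home.has_kids and not dog.good_with_kids:
        return False
    return True


def recommend(dogs: list[Dog], home: Home, limit: int = 3) -> list[Dog]:
    """Keep compatible dogs, then put the longest-staying dogs first."""
    compatible = [dog for dog in dogs if is_compatible(dog, home)]
    compatible.sort(key=lambda dog: dog.days_in_shelter, reverse=True)
    return compatible[:limit]


# Made-up sample data for illustration only.
dogs = [
    Dog("Biscuit", 12, needs_yard=False, good_with_kids=True),
    Dog("Moose", 240, needs_yard=True, good_with_kids=True),
    Dog("Pepper", 95, needs_yard=False, good_with_kids=False),
    Dog("Juniper", 180, needs_yard=False, good_with_kids=True),
]
apartment_with_kids = Home(has_yard=False, has_kids=True)
recommended_names = [dog.name for dog in recommend(dogs, apartment_with_kids)]
print(recommended_names)  # ['Juniper', 'Biscuit']
```

Sorting by length of stay after filtering is one simple way to prioritize long-stay dogs; a production system would likely weigh more factors.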
For her second project, Rachel assessed the dog adoption capacity of Travis County, where APA! and Austin are located. Because APA! brings in animals even from outside the county, they wanted to assess whether APA!’s out-of-county intakes are over-saturating Travis County’s adoption capacity, thereby negatively impacting the adoption rates of the Travis County city shelter, Austin Animal Center (AAC). Rachel collected and cleaned AAC’s and APA!’s dog intake and outcome data and performed statistical and mathematical analyses to determine whether this was the case.
One of Pradeep’s primary projects was to develop an evidence synthesis framework for drug repurposing. Each year, 17 million people are diagnosed with cancer and nearly $1 trillion is spent on cancer care. Hundreds of FDA-approved non-cancer generic drugs have shown promise for treating cancer in preclinical or small-scale clinical studies. Because of their long history of safe patient use, low cost, and widespread availability, repurposing these drugs represents an opportunity to rapidly improve patient outcomes and reduce healthcare costs.
Working with our collaborators at IBM Research’s Science for Social Good Initiative and Northeastern University, Pradeep’s team developed machine learning algorithms and applied natural language processing (NLP) methods trained on biomedical literature to identify scientific studies describing anti-cancer activity for any drug of interest. This is a challenging problem that involves searching for contextually specific information among large volumes of text, applying pattern-recognition techniques to identify relevant phrases that signal evidence, and inferring levels of therapeutic evidence contained in the identified signal. For example, a paper might peripherally mention a drug, such as aspirin, without the drug actually being used to treat cancer in the study. This is just one example of noise that needed to be computationally identified and isolated.
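The kind of noise described above, a drug mentioned only in passing, can be illustrated with a deliberately simple keyword heuristic. This is not the team's method, which used machine learning and NLP models trained on biomedical literature; the cue lists, function names, and example abstracts below are hypothetical and exist only to show the distinction between a peripheral mention and a treatment context.

```python
import re

# Hypothetical cue phrases; a trained model would learn such signals from data.
TREATMENT_CUES = ("treated with", "administered", "inhibited", "reduced tumor")
CANCER_TERMS = ("cancer", "tumor", "carcinoma")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break on whitespace after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def evidence_sentences(text: str, drug: str) -> list[str]:
    """Return sentences where the drug, a cancer term, and a treatment cue co-occur."""
    drug = drug.lower()
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        mentions_drug = drug in lowered
        mentions_cancer = any(term in lowered for term in CANCER_TERMS)
        has_cue = any(cue in lowered for cue in TREATMENT_CUES)
        if mentions_drug and mentions_cancer and has_cue:
            hits.append(sentence)
    return hits


# Made-up abstracts for illustration only.
relevant = (
    "Mice with colorectal tumors were treated with aspirin for six weeks. "
    "Aspirin reduced tumor volume compared with controls."
)
peripheral = (
    "Patients taking aspirin for cardiovascular disease were excluded. "
    "The study evaluated a new chemotherapy regimen for breast cancer."
)

print(len(evidence_sentences(relevant, "aspirin")))    # 2
print(len(evidence_sentences(peripheral, "aspirin")))  # 0
```

The second abstract mentions both aspirin and cancer, but never in the same sentence with a treatment cue, so it is set aside as a peripheral mention. Keyword rules like these break down quickly on real literature, which is why the problem calls for learned models.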