AI and Data Science: Integrating Artificial and Human Ecosystems

114 Western Avenue, 2nd Floor, Allston, MA

Join us in celebrating the 5th Anniversary of Harvard Data Science Review (HDSR) at an exciting series of talks and panel discussions devoted to AI presented by a group of HDSR’s editors and authors.

The symposium will kick off with a fascinating presentation by an historian of science on the history of AI since the French Revolution and will conclude in the afternoon with a talk by the Director of the MIT Laboratory for Financial Engineering at the Sloan School of Management on the use of generative AI for providing financial investment advice.

Other topics will include AI and data science education, AI and ethical considerations, and the impact of the generative AI revolution on AI policy and governance.

This event is free and open to the public.

AGENDA:

Welcome | 9:00 AM – 9:15 AM

  • Xiao-Li Meng, Founding Editor-in-Chief, Harvard Data Science Review, Harvard University

Opening Remarks | 9:15 AM – 9:30 AM

  • John Shaw, Vice Provost, Harvard University

Fictions of Man: Automation, Abstraction, and Artificial Intelligence | 9:30 AM – 10:20 AM

Abstract: Artificial intelligence has been widely accused of arrogance, hubris even – a vast body of critical scholarship argues that AI’s capacities have been consistently over-exaggerated and that difficult problems have often been over-simplified so that AI can be celebrated for having “solved” them. But AI has another history, one rife with anxiety and doubt. AI was developed in the context of the so-called “control sciences” of the American Cold War – systems engineering, cybernetics, game theory, and automata studies foremost among them. These in turn participate in epistemic and political histories that extend through European industrialization and the Enlightenment. This talk will situate modern AI in the history of attempts to be in control (of populations, of geopolitics, of nature) and the vision of human beings and human decision making that runs alongside.


Break | 10:20 AM – 10:40 AM


Generative AI and Statistical Data Science: Opportunity and Challenges | 10:40 AM – 11:30 AM

Abstract: This talk introduces and explores the dynamic interplay between generative AI and statistical data science, highlighting both the transformative opportunities and emerging challenges. We will delve into how generative models, such as autoregressive language models and diffusion models, are reshaping data analysis, prediction, and decision-making. We will discuss the mathematical underpinnings of this methodology and open theoretical questions. The session aims to provide insights into leveraging these technologies to advance data science while conscientiously navigating their complexities.


Panel: Amid Advancement, Apprehension, & Ambivalence: AI in the Human Ecosystem | 11:30 AM – 12:30 PM

Moderator: 

Speakers: 


Lunch | 12:30 PM – 1:30 PM


Future Shock: How Did the Generative AI Revolution Trigger an International AI Policy and Governance Crisis? | 1:30 PM – 2:30 PM

  • David Leslie, The Alan Turing Institute and Queen Mary University of London

Abstract: Despite the abruptness that characterized the eruption of the generative AI (GenAI) revolution, the extent to which this triggered ‘future shock’ among AI policy and governance communities across the globe remains debatable. Decades of policy development in areas such as cybersecurity, consumer protection, intellectual property, online safety, and data privacy and protection had yielded standards, laws, and regulations that formed a robust conceptual basis upon which policymakers could have drawn in confronting the many risks presented by the industrial scaling of GenAI technologies. Likewise, for several years leading up to the release of ChatGPT, the coalescence of global stakeholders around the key values and principles needed for responsible and trustworthy AI should have prepared the AI policy and governance community to respond forcefully, and coherently, to the myriad societal challenges posed by GenAI’s rapid spread. In this talk, Professor Leslie will explore why this has not been the case. Professor Leslie will focus, in particular, on how enforcement gaps and lack of regulatory AI capacity, difficulties in moving from AI principles to practice, and dynamics of unprecedented technological and industrial scaling and centralization have lain behind the fraught reality of an AI policy and governance ecosystem that has struggled to cope with the widespread societal dislocation effected by the rapid advent of the GenAI revolution.


Bridging the Gap: A Comprehensive Approach to Responsible Data Science Education | 2:30 PM – 3:20 PM

  • Bin Yu, University of California, Berkeley

Abstract: The rapid advancement of AI relies heavily on the foundation of data science, yet its education significantly lags behind its demand in practice. The upcoming book ‘Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making’ (Yu and Barter, MIT Press, 2024; free online at www.vdsbook.com) tackles this gap by promoting Predictability, Computability, and Stability (PCS) as core principles for trustworthy data insights. It thoroughly integrates these principles into the Data Science Life Cycle (DSLC), from problem formulation to data cleaning and to result communication, fostering a new standard for responsible data analysis. In this talk, I will delve into the book’s motivations, comparing its approach with traditional ones. Using materials from chapters on data cleaning and clustering analysis, I will demonstrate PCS’s practical applications and describe four types of homework assignments (True/False, conceptual, mathematical, and data analysis and coding) to solidify learners’ grasp. Time permitting, I will discuss a prostate cancer research case study, illustrating PCS’s effectiveness in real-world data analysis.


Break | 3:20 PM – 3:40 PM


Panel: Democratizing Data: Discovering Data Use and Value for Research and Policy | 3:40 PM – 4:40 PM

Moderator:

Speakers:

  • Nancy Potok, New York University and George Washington University
  • Emilda Rivers, National Center for Science and Engineering Statistics (NCSES) at the National Science Foundation (NSF)

Abstract: This panel provides an overview of the Democratizing Data project and its importance for supporting the Evidence Act. It discusses the potential ways in which the approach can be used by both programmatic and statistical agencies. The participants will also discuss a vision for the value of data search and discovery platforms in supporting researchers from marginalized and under-represented communities.


Can ChatGPT Plan Your Retirement?: Generative AI and Financial Advice | 4:40 PM – 5:30 PM

  • Andrew Lo, Massachusetts Institute of Technology

Abstract: The emergence of generative AI as a powerful tool for the masses creates both challenges and opportunities in how it is used and potentially abused. In this talk, Prof. Lo will consider the three issues facing most, if not all, applications of large language models (LLMs): domain-specific expertise; the ability to tailor that expertise to a user’s unique situation; and trustworthiness, including adherence to the user’s moral and ethical standards and conformity to regulatory guidelines and oversight. For concreteness, the talk will focus on the narrow context of financial advice, which serves as an ideal test bed both for determining the possible shortcomings of current LLMs and for exploring ways to overcome them. The goal will not be to provide solutions to these challenges, which will likely take years to develop, but to propose a framework and road map for solving them as part of a larger research agenda for improving generative AI in all applications.


Agenda subject to change.