Industry Seminar: Anupam Datta, Snowflake

What is your Agent’s GPA? Measuring and improving agent quality

Abstract: Agents have goals; they plan and act to achieve those goals, often alternating between planning and action steps and refining their plans after reflecting on the results of their actions. Observing that agent failures arise when goals, plans, and actions are not aligned, we introduce a framework for evaluating and improving an agent’s GPA, or Goal-Plan-Action alignment. We create a set of four evaluation metrics (Plan Quality, Plan Adherence, Execution Efficiency, and Logical Consistency) that use LLM judges to measure an agent’s GPA. Plan Quality checks whether an agent’s plans are aligned with its goals. Plan Adherence checks whether an agent’s actions are aligned with its plan. Execution Efficiency checks whether the agent achieved its goal in the most efficient way. Finally, Logical Consistency checks whether an agent’s actions were logically consistent, which can help identify issues in planning and action steps. Our experimental results on two benchmark datasets, an internal dataset for the Snowflake Intelligence data agent and the public GAIA dataset, show that this framework (a) provides a systematic way to cover a broad range of agent failures; (b) exhibits strong agreement between human and LLM judges, ranging from 80% to over 95%; and (c) informs significant improvements to agent performance by aiding prompt optimization, tool selection, and more.
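To make the human-versus-LLM-judge agreement figures concrete, here is a minimal sketch of how per-metric agreement could be tallied over the four GPA metrics. It assumes binary pass/fail labels and simple percent agreement. The abstract does not say which labeling scheme or agreement measure was used, and the `JudgedTrace` record and `percent_agreement` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass

# The four GPA metrics named in the abstract (identifiers are illustrative).
GPA_METRICS = (
    "plan_quality",
    "plan_adherence",
    "execution_efficiency",
    "logical_consistency",
)


@dataclass
class JudgedTrace:
    """Hypothetical record: one agent trace scored on one metric by a human and an LLM judge."""
    metric: str
    human_pass: bool  # assumed binary human label
    judge_pass: bool  # assumed binary LLM-judge label


def percent_agreement(traces: list[JudgedTrace]) -> dict[str, float]:
    """Return, per metric, the fraction of traces where human and LLM-judge labels match.

    Simple percent agreement is an assumption here; a chance-corrected
    statistic such as Cohen's kappa could be substituted.
    """
    result: dict[str, float] = {}
    for metric in GPA_METRICS:
        rows = [t for t in traces if t.metric == metric]
        if not rows:
            continue  # skip metrics with no labeled traces
        matches = sum(t.human_pass == t.judge_pass for t in rows)
        result[metric] = matches / len(rows)
    return result
```

Computing agreement separately per metric, rather than pooling all labels, mirrors the abstract’s framing of a range of agreement values across metrics instead of a single number.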

Seminar Recording


Anupam Datta

Principal Research Scientist and Snowflake AI Research Lead
Snowflake

Bio

Anupam Datta is a Principal Research Scientist and Snowflake AI Research Lead at Snowflake. He joined Snowflake through its acquisition of TruEra, where he served as Co-Founder, President, and Chief Scientist from 2019 to 2024. Datta was on the faculty of Carnegie Mellon University from 2007 to 2022, most recently as a tenured Professor of Electrical & Computer Engineering and Computer Science. His current research focuses on Trustworthy AI, spanning the evaluation, explainability, fairness, and adversarial robustness of ML models and GenAI applications.