Agentic AI for Scientific Discovery: Benchmarks, Frameworks, and Applications

Zonglin Yang, MiroMind
Chandan Reddy, Virginia Tech
Xinya Du, University of Texas at Dallas

Detailed Tutorial Description

Overview

The rise of large language models (LLMs) has introduced a paradigm shift in how AI can contribute to science. Beyond serving as static predictors, LLMs can function as agents that actively generate, refine, and evaluate hypotheses. This tutorial provides a structured overview of how agentic AI can accelerate the scientific discovery process, grounded in recent advances in benchmarks, frameworks, and applications.

Motivation

Traditional machine learning excels at prediction but falls short in hypothesis-driven discovery, where novelty, interpretability, and iterative reasoning are essential. The promise of agentic AI lies in closing this gap. By structuring the discovery process into two complementary phases, we highlight how AI can play an active role in advancing science:

  1. Hypothesis Generation – AI agents propose candidate hypotheses by retrieving inspirations, composing associations, and ranking them for plausibility.
  2. Feedback and Refinement – Hypotheses are iteratively improved using diverse feedback signals, including data fit, reasoning consistency, symbolic decomposition, and benchmark performance.

This cycle mirrors the way human scientists move from initial ideas to refined, testable hypotheses, but accelerates it through automated reasoning and structured agentic workflows.
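The two-phase cycle above can be sketched as a simple generate-score-refine loop. This is a minimal illustration only: the function names (`generate_hypotheses`, `score`, `refine`) and the substring-matching "feedback signal" are placeholders, not APIs from any particular agentic framework, and in practice each step would be backed by an LLM call or a symbolic evaluator.

```python
# Minimal sketch of the two-phase discovery loop described above.
# All names and the toy scoring rule are illustrative placeholders.

def generate_hypotheses(inspirations, k=3):
    """Phase I: compose candidate hypotheses by recombining inspirations."""
    return [f"hypothesis combining {a} and {b}"
            for a in inspirations for b in inspirations if a != b][:k]

def score(hypothesis, evidence):
    """Stand-in feedback signal (e.g., data fit or a consistency check)."""
    return sum(term in hypothesis for term in evidence)

def refine(hypothesis, feedback):
    """Phase II: revise a hypothesis in light of its feedback score."""
    return hypothesis + f" [refined, score={feedback}]"

def discovery_loop(inspirations, evidence, rounds=2):
    """Alternate generation and feedback-driven refinement."""
    candidates = generate_hypotheses(inspirations)
    for _ in range(rounds):
        scored = [(score(h, evidence), h) for h in candidates]
        best_score, best = max(scored)
        candidates = [refine(best, best_score)]
    return candidates[0]
```

In a real agentic system, `generate_hypotheses` would retrieve inspirations from literature and `score` would draw on the feedback signals listed above; the loop structure, however, is the same.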

Tutorial Outline

  1. Introduction to Agentic AI in Science
    • From prediction to discovery
    • Defining “agentic AI” and distinguishing it from static LLM use
    • Motivating examples
  2. Phase I: Hypothesis Generation
    • Inspiration retrieval and knowledge recombination
    • From qualitative hypotheses to symbolic formulations
    • Ranking strategies and novelty assessment
  3. Phase II: Feedback and Refinement
    • Iterative optimization using feedback signals
    • Data-driven evaluation, symbolic decomposition, and reasoning consistency checks
    • Hierarchical refinement from coarse ideas to fine-grained hypotheses
  4. Benchmarks for Scientific Discovery
    • Limitations of existing datasets (memorization vs reasoning)
    • Principles for robust benchmark design
    • Recent benchmarks for equations, hypotheses, and surfaces
  5. Frameworks for Agentic Discovery
    • Decomposition strategies, memory mechanisms, and feedback loops
    • Integration of evolutionary search and reinforcement learning
    • Examples of agentic workflows
  6. Applications Across Sciences
    • Social sciences (open-domain hypothesis generation)
    • Natural sciences (equation discovery, symbolic modeling)
    • Broader applications in AI for science
  7. Challenges and Future Directions
    • Reliability, interpretability, reproducibility
    • Balancing creativity and validity
    • Toward hybrid AI–science collaborations

Target Audience

Researchers and practitioners in machine learning, NLP, and AI for science who are interested in symbolic reasoning, agentic frameworks, and automated discovery. The tutorial is accessible to those with general familiarity with LLMs and does not require deep domain expertise.

Learning Outcomes

Participants will gain:

  • An understanding of how agentic AI structures discovery into hypothesis generation and feedback-driven refinement
  • Familiarity with recent benchmarks and frameworks for LLM-based scientific discovery, including symbolic regression
  • Awareness of open challenges in reliability, interpretability, reproducibility, and balancing creativity with validity
Reading List

Introduction

Pre-experiment Phase

Experiment-guided Phase (Efficient Experimentation)

Symbolic Regression Methods

Search-based Symbolic Regression Methods

Learning-based Symbolic Regression Methods

Learning + Search Symbolic Regression Methods

LLM-guided Symbolic Regression Methods

Symbolic Regression Benchmarks

Experiment-guided Phase (Costly Experimentation)