Journal of East China Normal University (Natural Science) ›› 2025, Vol. 2025 ›› Issue (5): 43-52. doi: 10.3969/j.issn.1000-5641.2025.05.005

• AI-Enabled Open Source Technologies and Applications •

ATBench: Benchmark for evaluating analysis trajectories in end-to-end data analysis

Xufei WANG1,2, Huarong XU1,2, Panfeng CHEN1,2, Mei CHEN1,2, Dan MA1,2,*(), Zhengxi CHEN1,2, Xu TIAN1,2, Hui LI1,2   

  1. State Key Laboratory of Public Big Data, Guiyang 550025, China
    2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  • Received: 2025-06-27 Online: 2025-09-25 Published: 2025-09-25
  • Contact: Dan MA E-mail: dma@gzu.edu.cn

Abstract:

This paper introduces ATBench, a benchmark for evaluating analysis trajectories in end-to-end data analysis tasks, addressing the limited granularity and domain coverage of existing benchmarks. An analysis trajectory is the process by which an agent, through iterative interactions, poses questions, derives insights, and formulates conclusions around a specific analysis goal. Drawing on both existing benchmarks and real Kaggle tasks, we construct 151 evaluation datasets spanning eight distinct domains, using an annotation strategy that balances goal-driven and exploratory approaches. We further propose a fine-grained evaluation metric, the analysis trajectory score, to assess an agent's coherent analytical capability across an end-to-end data analysis task. Experimental results show that ATBench is stable and discriminative, effectively distinguishing performance differences among models on analytical tasks. The results also expose agents' limitations in coherent analysis and insight discovery, providing data-driven support for future improvements.
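The notion of an analysis trajectory described above can be sketched as a simple data structure. Note that the step fields and the mean-based aggregation below are illustrative assumptions for exposition only; they are not the paper's actual definition of the analysis trajectory score.

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisStep:
    """One question-to-insight iteration in a trajectory (hypothetical fields)."""
    question: str
    insight: str
    quality: float  # hypothetical per-step quality in [0, 1]

@dataclass
class AnalysisTrajectory:
    """A sequence of steps an agent takes toward one analysis goal."""
    goal: str
    steps: list[AnalysisStep] = field(default_factory=list)

    def trajectory_score(self) -> float:
        """Toy aggregate: mean of per-step quality (a placeholder, not ATBench's metric)."""
        if not self.steps:
            return 0.0
        return sum(s.quality for s in self.steps) / len(self.steps)

# Example: a two-step trajectory toward a churn-analysis goal.
traj = AnalysisTrajectory(goal="Explain customer churn drivers")
traj.steps.append(AnalysisStep("Which segment churns most?", "Segment B, 32%", 0.8))
traj.steps.append(AnalysisStep("Does tenure correlate with churn?", "Yes, r = -0.41", 0.6))
print(round(traj.trajectory_score(), 2))  # prints 0.7
```

A fine-grained, per-step representation like this is what lets a metric credit coherence across the whole trajectory rather than only the final conclusion.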

Key words: agent, data analysis, benchmark
