OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

基本信息

来源: arxiv
原始来源: https://arxiv.org/abs/2606.09826v1
作者: Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, Xiaojuan Qi
分类: cs.CV
论文时间: 2026-06-08T17:59:43Z
论文 PDF: https://arxiv.org/pdf/2606.09826v1.pdf

来源摘要/节选

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.

来源说明

当前只保存了官方论文摘要，不代表论文全文。请以原始来源为准。

本页只呈现已做哈希绑定的来源证据，不包含基于旧正文或缺失原文的扩展推断。

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

基本信息

来源摘要/节选

来源说明

应用场景

AI/ML项目

大语言模型

从首次观测到传播链