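arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translation, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.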

2604.04425 2026-04-07 cs.CV

HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance

Green Rosh, Prateek Kukreja, Vishakha SR, Pawan Prasad B H

Comments Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Abstract

The emergence of virtual reality has necessitated the generation of detailed and customizable 3D hand models for interaction in the virtual world. However, the current methods for 3D hand model generation are both expensive and cumbersome, offering very little customizability to the users. While recent advancements in zero-shot text-to-3D synthesis have enabled the generation of diverse and customizable 3D models using Score Distillation Sampling (SDS), they do not generalize very well to 3D hand model generation, resulting in unnatural hand structures, view-inconsistencies and loss of details. To address these limitations, we introduce HandDreamer, the first method for zero-shot 3D hand model generation from text prompts. Our findings suggest that view-inconsistencies in SDS are primarily caused by the ambiguity in the probability landscape described by the text prompt, resulting in similar views converging to different modes of the distribution. This is particularly aggravated for hands due to the large variations in articulations and poses. To alleviate this, we propose to use MANO hand model based initialization and a hand skeleton guided diffusion process to provide a strong prior for the hand structure and to ensure view and pose consistency. Further, we propose a novel corrective hand shape guidance loss to ensure that all the views of the 3D hand model converge to view-consistent modes, without leading to geometric distortions. Extensive evaluations demonstrate the superiority of our method over the state-of-the-art methods, paving a new way forward in 3D hand model generation.

2604.04420 2026-04-07 cs.LG cs.AI

Is Prompt Selection Necessary for Task-Free Online Continual Learning?

Seoyoung Park, Haemin Lee, Hankook Lee

Comments Accepted to CVPR Findings 2026. The code is available at https://github.com/efficient-learning-lab/SinglePrompt

2604.04419 2026-04-07 cs.CV

BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing

Kaiwen Wang, Kaili Zheng, Rongrong Deng, Yiming Shi, Chenyi Guo, Ji Wu

Abstract

Recent multimodal large language models (MLLMs) have shown strong capabilities in general video understanding, driving growing interest in automatic sports commentary generation. However, existing benchmarks for this task focus exclusively on team sports such as soccer and basketball, leaving combat sports entirely unexplored. Notably, combat sports present distinct challenges: critical actions unfold within milliseconds with visually subtle yet semantically decisive differences, and professional commentary contains a substantially higher proportion of tactical analysis compared to team sports. In this paper, we present BoxComm, a large-scale dataset comprising 445 World Boxing Championship match videos with over 52K commentary sentences from professional broadcasts. We propose a structured commentary taxonomy that categorizes each sentence into play-by-play, tactical, or contextual, providing the first category-level annotation for sports commentary benchmarks. Building on this taxonomy, we introduce two novel and complementary evaluations tailored to sports commentary generation: (1) category-conditioned generation, which evaluates whether models can produce accurate commentary of a specified type given video context; and (2) commentary rhythm assessment, which measures whether freely generated commentary exhibits appropriate temporal pacing and type distribution over continuous video segments, capturing a dimension of commentary competence that prior benchmarks have not addressed. Experiments on multiple state-of-the-art MLLMs reveal that current models struggle on both evaluations. We further propose EIC-Gen, an improved baseline incorporating detected punch events to supply structured action cues, yielding consistent gains and highlighting the importance of perceiving fleeting and subtle events for combat sports commentary.

2604.04411 2026-04-07 cs.CL cs.AI cs.CV

Responses Fall Short of Understanding: Revealing the Gap between Internal Representations and Responses in Visual Document Understanding

Haruka Kawasaki, Ryota Tanaka, Kyosuke Nishida

Comments Accepted to CVPR2026 workshop (MULA)

2604.04410 2026-04-07 cs.LG cs.AI cs.CL stat.ML

Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

Hiroshi Takahashi, Tomoharu Iwata, Atsutoshi Kumagai, Sekitoshi Kanai, Masanori Yamada, Kosuke Nishida, Kazutoshi Shinoda

Comments Code is available at https://github.com/takahashihiroshi/rdro

2604.04409 2026-04-07 cs.RO cs.MA

FORMULA: FORmation MPC with neUral barrier Learning for safety Assurance

Qintong Xie, Weishu Zhan, Peter Chin

Comments Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026

2604.04406 2026-04-07 cs.CV

3D-Fixer: Coarse-to-Fine In-place Completion for 3D Scenes from a Single Image

Ze-Xin Yin, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie

Comments 17 pages, 10 figures, CVPR 2026, project page: https://zx-yin.github.io/3dfixer

2604.04402 2026-04-07 cs.CV

UENR-600K: A Large-Scale Physically Grounded Dataset for Nighttime Video Deraining

Pei Yang, Hai Ci, Beibei Lin, Yiren Song, Mike Zheng Shou

2604.04401 2026-04-07 cs.RO cs.LG cs.SY eess.SY

ReinVBC: A Model-based Reinforcement Learning Approach to Vehicle Braking Controller

Haoxin Lin, Junjie Zhou, Daheng Xu, Yang Yu

2604.04399 2026-04-07 cs.AI

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

Yuwen Zhai, Runze Li, Liang Wang, Nian Shi, Liwu Xu, Wei Zhang, Ran Lin, Bo Xu, Benlei Cui

2604.04394 2026-04-07 cs.LG cs.SY eess.SY

Finite-Time Analysis of Q-Value Iteration for General-Sum Stackelberg Games

Narim Jeong, Donghwan Lee

Comments 8 pages

2604.04386 2026-04-07 cs.AI

Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis

Jiayu Fu, Mourad Heddaya, Chenhao Tan

Comments 8 pages (without reference and appendix), 4 figures, 1 table, accepted by ICLR 2026 Workshop of Logical Reasoning of Large Language Models

2604.04383 2026-04-07 cs.AI cs.MA math.OC

Optimizing Service Operations via LLM-Powered Multi-Agent Simulation

Yanyuan Wang, Xiaowei Zhang

2604.04380 2026-04-07 cs.LG

CPT: Controllable and Editable Design Variations with Language Models

Karthik Suresh, Amine Ben Khalifa, Li Zhang, Wei-ting Hsu, Fangzheng Wu, Vinay More, Asim Kadav

Comments 18 pages, 6 figures, Accepted at NeurIPS 2025 Workshop on Generative and Protective AI for Content Creation (GenProCC 2025)

2604.04379 2026-04-07 cs.CV

Reinforce to Learn, Elect to Reason: A Dual Paradigm for Video Reasoning

Songyuan Yang, Weijiang Yu, Jilin Ma, Ziyu Liu, Guijian Tang, Wenjing Yang, Huibin Tan, Nong Xiao

Comments Accepted at CVPR 2026. Camera-ready version

2604.04374 2026-04-07 cs.RO cs.AI cs.HC

Towards Considerate Human-Robot Coexistence: A Dual-Space Framework of Robot Design and Human Perception in Healthcare

Yuanchen Bai, Zijian Ding, Ruixiang Han, Niti Parikh, Wendy Ju, Angelique Taylor

2604.04373 2026-04-07 cs.AI cs.LG

Decocted Experience Improves Test-Time Inference in LLM Agents

Maohao Shen, Kaiwen Zha, Zexue He, Zhang-Wei Hong, Siru Ouyang, J. Jon Ryu, Prasanna Sattigeri, Suhas Diggavi, Gregory Wornell

2604.04372 2026-04-07 cs.CV

Graph-to-Frame RAG: Visual-Space Knowledge Fusion for Training-Free and Auditable Video Reasoning

Songyuan Yang, Weijiang Yu, Ziyu Liu, Guijian Tang, Wenjing Yang, Huibin Tan, Nong Xiao

Comments Accepted at CVPR 2026. Camera-ready version

2604.04364 2026-04-07 cs.LG cs.AI

Context is All You Need

Jean Erik Delanois, Shruti Joshi, Ryan Golden, Teresa Nick, Maxim Bazhenov

2604.04363 2026-04-07 cs.CV cs.AI cs.LG

Integer-Only Operations on Extreme Learning Machine Test Time Classification

Emerson Lopes Machado, Cristiano Jacques Miosso, Ricardo Pezzuol Jacobi

Comments 14 pages. Originally written in 2015; archived in 2026

2604.04359 2026-04-07 cs.CL cs.AI

GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

Tianyi Zhang, Andreas Marfurt

Comments To appear in the Proceedings of KG-LLM @ LREC 2026

2604.04357 2026-04-07 cs.CV

Spatially-Weighted CLIP for Street-View Geo-localization

Ting Han, Fengjiao Li, Chunsong Chen, Haoling Huang, Yiping Chen, Meiliu Wu

2604.04356 2026-04-07 cs.AI cs.CL cs.LG cs.PF

REAM: Merging Improves Pruning of Experts in LLMs

Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev

Comments code is at https://github.com/SamsungSAILMontreal/ream

2604.04349 2026-04-07 cs.RO cs.LG

Adversarial Robustness Analysis of Cloud-Assisted Autonomous Driving Systems

Maher Al Islam, Amr S. El-Wakeel

2604.04348 2026-04-07 cs.SD cs.CV cs.MM

OmniSonic: Towards Universal and Holistic Audio Generation from Video and Text

Weiguo Pian, Saksham Singh Kushwaha, Zhimin Chen, Shijian Deng, Kai Wang, Yunhui Guo, Yapeng Tian

Comments CVPR 2026

2604.04347 2026-04-07 cs.AI

RoboPhD: Evolving Diverse Complex Agents Under Tight Evaluation Budgets

Andrew Borthwick, Stephen Ash, Anthony Galczak

Comments 20 pages, 1 figure

Abstract

2026 has brought an explosion of interest in LLM-guided evolution of agentic artifacts, with systems like GEPA and Autoresearch demonstrating that LLMs can iteratively improve prompts, code, and agent architectures across diverse domains. As adoption accelerates, a central question emerges: given the same information, the same seed agent, and the same objective, which optimization algorithm yields the best results under the same evaluation budget? This question becomes critical when evaluations are expensive, such as when they require human judgment or multiple LLM calls. We present the first systematic comparison of three optimization paradigms -- Elo tournament selection (RoboPhD), Pareto-based selection (GEPA), and greedy hill-climbing (Autoresearch) -- across four benchmarks spanning abstract reasoning, cloud scheduling, SQL generation, and financial QA, all under a fixed budget of 1,500 evaluations. RoboPhD introduces validation-free evolution: instead of splitting the budget between training and validation, it uses Elo competition on training data to simultaneously evaluate agents and drive evolution. All three systems receive seed agents with diagnostic print() statements that evolution can grow, enabling self-instrumenting agents that develop increasingly informative diagnostics for the benefit of their evolutionary successors. Using a single default configuration, RoboPhD outperforms both GEPA and Autoresearch on three of four benchmarks, losing only on the simplest task, where the winning solution (from our Autoresearch adaptation) required under 90 lines of code. On ARC-AGI, RoboPhD evolves a 22-line seed agent into a 1,013-line multi-strategy system, improving accuracy from 27.8% to 65.8% using Gemini 3.1 Flash Lite as the solver. We release RoboPhD as a versatile toolkit under the MIT license with a simple optimize_anything() API for evolving diverse complex agents.
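
The abstract above names Elo competition as the signal that both evaluates agents and drives evolution. As a rough illustration only, the sketch below shows the standard Elo rating update for a single pairwise comparison. It is not taken from the RoboPhD code: the function names, the K-factor of 32, and the starting rating of 1500 are assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k (the K-factor) is an assumed value, not one reported in the abstract.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical agents start level; the variant wins one head-to-head round.
ratings = {"seed_agent": 1500.0, "variant": 1500.0}
ratings["variant"], ratings["seed_agent"] = elo_update(
    ratings["variant"], ratings["seed_agent"], score_a=1.0
)
print(ratings)
```

Because each update only needs the outcome of a head-to-head comparison, ratings of this kind can be maintained from pairwise results alone; how RoboPhD schedules matches and selects parents is not specified in the abstract.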

2604.04343 2026-04-07 cs.LG

Deep Kuratowski Embedding Neural Networks for Wasserstein Metric Learning

Andrew Qing He

2604.04342 2026-04-07 cs.LG stat.ML

Generative models for decision-making under distributional shift

Xiuyuan Cheng, Yunqin Zhu, Yao Xie

Comments Under review for INFORMS TutORials in Operations Research, 2026

2604.04341 2026-04-07 cs.AI

Implementing surrogate goals for safer bargaining in LLM-based agents

Caspar Oesterheld, Maxime Riché, Filip Sondej, Jesse Clifton, Vincent Conitzer

2604.04339 2026-04-07 cs.AI cs.LG

Thermodynamic-Inspired Explainable GeoAI: Uncovering Regime-Dependent Mechanisms in Heterogeneous Spatial Systems

Sooyoung Lim, Zhenlong Li, Zi-Kui Liu