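arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.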

2604.07468 2026-04-10 cs.AI

M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery

Hanyi Liu, Zhonghao Jiu, Minghao Wang, Yuhang Xie, Heran Yang

Comments 13 pages, 5 figures, submitted to IEEE Access

Abstract

Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem: resemblance is necessary but not sufficient evidence. Most prior systems reduce influence discovery to embedding similarity or label-driven graph completion, while recent multimodal large language models (LLMs) remain vulnerable to temporal inconsistency and unverified attributions. This paper introduces M-ArtAgent, an evidence-based multimodal agent that reframes implicit influence discovery as probabilistic adjudication. It follows a four-phase protocol consisting of Investigation, Corroboration, Falsification, and Verdict, governed by a Reasoning and Acting (ReAct)-style controller that assembles verifiable evidence chains from images and biographies, enforces art-historical axioms, and subjects each hypothesis to adversarial falsification via a prompt-isolated critic. Two theory-grounded operators, StyleComparator for Wölfflin formal analysis and ConceptRetriever for ICONCLASS-based iconographic grounding, ensure that intermediate claims are formally auditable. On the balanced WikiArt Influence Benchmark-100 (WIB-100) of 100 artists and 2,000 directed pairs, M-ArtAgent achieves 83.7% positive-class F1, 0.666 Matthews correlation coefficient (MCC), and 0.910 area under the receiver operating characteristic curve (ROC-AUC), with leakage-control and robustness checks confirming that the gains persist when explicit influence phrases are masked. By coupling multimodal perception with domain-constrained falsification, M-ArtAgent demonstrates that implicit influence analysis benefits from historically grounded adjudication rather than pattern matching alone.
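
For readers unfamiliar with two of the metrics quoted in the abstract above, the following is a minimal sketch of how positive-class F1 and MCC are computed from a binary confusion matrix. The function name `f1_and_mcc` and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its reported numbers.

```python
import math


def f1_and_mcc(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (positive-class F1, MCC) for a binary confusion matrix."""
    # Precision and recall of the positive class, guarding against empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC uses all four cells, so it also reflects performance on negatives.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical counts for a balanced set of 2,000 pairs
# (1,000 positive, 1,000 negative); NOT results from the paper.
f1, mcc = f1_and_mcc(tp=800, fp=150, fn=200, tn=850)
print(f"F1={f1:.3f}  MCC={mcc:.3f}")
```

F1 ignores true negatives, whereas MCC accounts for all four cells of the confusion matrix, which is one reason the two are often reported together.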

2604.07467 2026-04-10 cs.CL cs.LG

Lexical Tone is Hard to Quantize: Probing Discrete Speech Units in Mandarin and Yorùbá

Opeyemi Osakuade, Simon King

Comments Accepted at Speech Prosody 2026

2604.07457 2026-04-10 cs.RO cs.AI cs.LG

CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection

Ziyang Cheng, Haoyu Wei, Hang Yin, Xiuwei Xu, Bingyao Yu, Jie Zhou, Jiwen Lu

Comments 14 pages, 8 figures. Under review. Project page and videos: https://shepherd1226.github.io/CMP

2604.07455 2026-04-10 cs.AI cs.LG cs.LO

Munkres' General Topology Autoformalized in Isabelle/HOL

Dustin Bryant, Jonathan Julián Huerta y Munive, Cezary Kaliszyk, Josef Urban

2604.07430 2026-04-10 cs.CV

HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents

Tencent Robotics X, HY Vision Team: Xumin Yu, Zuyan Liu, Ziyi Wang, He Zhang, Yongming Rao, Fangfu Liu, Yani Zhang, Ruowen Zhao, Oran Wang, Yves Liang, Haitao Lin, Minghui Wang, Yubo Dong, Kevin Cheng, Bolin Ni, Rui Huang, Han Hu, Zhengyou Zhang, Linus, Shunyu Yao

Abstract

We introduce HY-Embodied-0.5, a family of foundation models specifically designed for real-world embodied agents. To bridge the gap between general Vision-Language Models (VLMs) and the demands of embodied agents, our models are developed to enhance the core capabilities required by embodied intelligence: spatial and temporal visual perception, alongside advanced embodied reasoning for prediction, interaction, and planning. The HY-Embodied-0.5 suite comprises two primary variants: an efficient model with 2B activated parameters designed for edge deployment, and a powerful model with 32B activated parameters targeted for complex reasoning. To support the fine-grained visual perception essential for embodied tasks, we adopt a Mixture-of-Transformers (MoT) architecture to enable modality-specific computing. By incorporating latent tokens, this design effectively enhances the perceptual representation of the models. To improve reasoning capabilities, we introduce an iterative, self-evolving post-training paradigm. Furthermore, we employ on-policy distillation to transfer the advanced capabilities of the large model to the smaller variant, thereby maximizing the performance potential of the compact model. Extensive evaluations across 22 benchmarks, spanning visual perception, spatial reasoning, and embodied understanding, demonstrate the effectiveness of our approach. Our MoT-2B model outperforms similarly sized state-of-the-art models on 16 benchmarks, while the 32B variant achieves performance comparable to frontier models such as Gemini 3.0 Pro. In downstream robot control experiments, we leverage our robust VLM foundation to train an effective Vision-Language-Action (VLA) model, achieving compelling results in real-world physical evaluations. Code and models are open-sourced at https://github.com/Tencent-Hunyuan/HY-Embodied.

2604.07429 2026-04-10 cs.CV cs.AI cs.HC

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin, Hwee Tou Ng, Mike Zheng Shou

Comments 23 pages, 8 figures

2604.07428 2026-04-10 cs.LG cs.AI

Regret-Aware Policy Optimization: Environment-Level Memory for Replay Suppression under Delayed Harm

Prakul Sunil Hiremath

Comments 18 pages, 3 figures. Includes theoretical analysis and experiments on graph diffusion environments

2604.07427 2026-04-10 cs.CV

Personalizing Text-to-Image Generation to Individual Taste

Anne-Sofie Maerten, Juliane Verwiebe, Shyamgopal Karthik, Ameya Prabhu, Johan Wagemans, Matthias Bethge

2604.07426 2026-04-10 cs.LG cs.AI

GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

Prakul Sunil Hiremath

Comments 20 pages, 2 figures, 7 tables; reinforcement learning, world models

2604.07424 2026-04-10 cs.AI cs.CY cs.MA

An Analysis of Artificial Intelligence Adoption in NIH-Funded Research

Navapat Nananukul, Mayank Kejriwal

2604.07423 2026-04-10 cs.RO cs.LG

OpenPRC: A Unified Open-Source Framework for Physics-to-Task Evaluation in Physical Reservoir Computing

Yogesh Phalak, Wen Sin Lor, Apoorva Khairnar, Benjamin Jantzen, Noel Naughton, Suyi Li

Comments 23 pages, 7 figures

2604.07422 2026-04-10 cs.LG

Multimodal Large Language Models for Multi-Subject In-Context Image Generation

Yucheng Zhou, Dubing Chen, Huan Zheng, Jianbing Shen

Comments ACL 2026

2604.07417 2026-04-10 cs.SD eess.AS

Semantic-Emotional Resonance Embedding: A Semi-Supervised Paradigm for Cross-Lingual Speech Emotion Recognition

Ya Zhao, Yinfeng Yu, Liejun Wang

Comments Main paper (6 pages). Accepted for publication by IEEE International conference on Multimedia and Expo 2026 (ICME 2026)

2604.07416 2026-04-10 cs.LG cond-mat.mtrl-sci physics.comp-ph

Bayesian Optimization for Mixed-Variable Problems in the Natural Sciences

Yuhao Zhang, Ti John, Matthias Stosiek, Patrick Rinke

2604.07412 2026-04-10 cs.LG physics.data-an

Physics-informed neural operators for the in situ characterization of locally reacting sound absorbers

Jonas M. Schmid, Johannes D. Schmid, Martin Eser, Steffen Marburg

2604.07411 2026-04-10 cs.LG cs.AI

Reinforcement Learning with Reward Machines for Sleep Control in Mobile Networks

Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp

Comments Under review

2604.07409 2026-04-10 cs.LG eess.IV

GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design

Chenchen Xu, Min Zhou, Tiezheng Ge, Weiwei Xu

Comments arXiv admin note: text overlap with arXiv:2303.14377

2604.07405 2026-04-10 cs.LG cs.AI

Conservation Law Breaking at the Edge of Stability: A Spectral Theory of Non-Convex Neural Network Optimization

Daniel Nobrega Medeiros

Comments 13 pages, 4 figures, 1 table, 23 experiments. Code available at https://github.com/danielxmed/TheLocalMinimumParadox

2604.07402 2026-04-10 cs.LG eess.IV

Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

Yucheng Zhou, Jianbing Shen

Comments ACL 2026 Findings

2604.07399 2026-04-10 cs.LG

Critical Patch-Aware Sparse Prompting with Decoupled Training for Continual Learning on the Edge

Wonseon Lim, Jaesung Lee, Dae-Won Kim

Comments Accepted to CVPR 2026. 10 pages, 8 figures

2604.07397 2026-04-10 cs.LG cs.AI

Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training

Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado

Comments CVPRW in the proceedings of CVPR 2026

2604.07395 2026-04-10 cs.RO cs.AI cs.CV

A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring

Wenze Wang, Mehdi Hosseinzadeh, Feras Dayoub

Comments Project page: https://wenzewwz123.github.io/Agentic-Loop/

2604.07394 2026-04-10 cs.LG cs.CL

Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference

Quantong Qiu, Zhiyi Hong, Yi Yang, Haitian Wang, Kebin Liu, Qingqing Dang, Juntao Li, Min Zhang

2604.07390 2026-04-10 cs.LG cs.IT math.IT

A Graph Foundation Model for Wireless Resource Allocation

Yucheng Sheng, Jiacheng Wang, Le Liang, Hao Ye, Shi Jin

2604.07385 2026-04-10 cs.LG cs.AI

Playing DOOM with 1.3M Parameters: Specialized Small Models vs Large Language Models for Real-Time Game Control

David Golchinfar, Daryoush Vaziri, Alexander Marquardt

Comments 17 pages, 3 figures, 3 tables. Code and model weights available at https://github.com/VAGOsolutions/SauerkrautLM-Doom-MultiVec

2604.07384 2026-04-10 cs.LG cs.AI

Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health

Shresth Verma, Arpan Dasgupta, Neha Madhiwalla, Aparna Taneja, Milind Tambe

2604.07380 2026-04-10 cs.LG

The Lifecycle of the Spectral Edge: From Gradient Learning to Weight-Decay Compression

Yongzhong Xu

Comments 15 pages, 12 figures

2604.07378 2026-04-10 cs.RO

Evaluation as Evolution: Transforming Adversarial Diffusion into Closed-Loop Curricula for Autonomous Vehicles

Yicheng Guo, Jiaqi Liu, Chengkai Xu, Peng Hang, Jian Sun

2604.07369 2026-04-10 cs.LG cs.AI

The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior

Ameen Patel, Felix Lee, Kyle Liang, Joseph Thomas

2604.07363 2026-04-10 cs.LG

Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models

Hongjian Zou, Yidan Wang, Qi Ding, Yixuan Liao, Xiaoxin Chen

Comments 28 pages, 26 figures, 8 tables