arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2603.09859 2026-03-11 cs.LG cs.AI cs.NI cs.SY eess.SY

A Graph-Based Approach to Spectrum Demand Prediction Using Hierarchical Attention Networks

Mohamad Alkadamani, Halim Yanikomeroglu, Amir Ghasemi

Comments 7 pages, 6 figures. Presented at IEEE GLOBECOM 2025, Taiwan. To appear in the conference proceedings

2603.09853 2026-03-11 cs.SD cs.AI

SCENEBench: An Audio Understanding Benchmark Grounded in Assistive and Industrial Use Cases

Laya Iyer, Angelina Wang, Sanmi Koyejo

Comments Accepted to EACL 2026 (Main Conference). 10 pages, 10 figures. Camera-ready version

2603.09842 2026-03-11 cs.LG stat.ME stat.ML

A Unified Hierarchical Multi-Task Multi-Fidelity Framework for Data-Efficient Surrogate Modeling in Manufacturing

Manan Mehta, Zhiqiao Dong, Yuhang Yang, Chenhui Shao

2603.09835 2026-03-11 cs.CL

Chow-Liu Ordering for Long-Context Reasoning in Chain-of-Agents

Naman Gupta, Vaibhav Singh, Arun Iyer, Kirankumar Shiragur, Pratham Grover, Ramakrishna B. Bairi, Ritabrata Maiti, Sankarshan Damle, Shachee Mishra Gupta, Rishikesh Maurya, Vageesh D. C

Comments Published as a workshop paper at ICLR 2026 Workshop MemAgents

2603.09826 2026-03-11 cs.CV

VLM-Loc: Localization in Point Cloud Maps via Vision-Language Models

Shuhao Kang, Youqi Liao, Peijie Wang, Wenlong Liao, Qilin Zhang, Benjamin Busam, Xieyuanli Chen, Yun Liu

Comments CVPR 2026

2603.09825 2026-03-11 cs.CV

BrainSTR: Spatio-Temporal Contrastive Learning for Interpretable Dynamic Brain Network Modeling

Guiliang Guo, Guangqi Wen, Lingwen Liu, Ruoxian Song, Peng Cao, Jinzhu Yang, Fei Wang, Xiaoli Liu, Osmar R. Zaiane

2603.09821 2026-03-11 cs.CL

One-Eval: An Agentic System for Automated and Traceable LLM Evaluation

Chengyu Shen, Yanheng Hou, Minghui Pan, Runming He, Zhen Hao Wong, Meiyi Qiang, Zhou Liu, Hao Liang, Peichao Lai, Zeang Sheng, Wentao Zhang

2603.09820 2026-03-11 cs.SD

EmoSURA: Towards Accurate Evaluation of Detailed and Long-Context Emotional Speech Captions

Xin Jing, Andreas Triantafyllopoulos, Jiadong Wang, Shahin Amiriparian, Jun Luo, Björn Schuller

Comments Submitted to Interspeech 2026

2603.09819 2026-03-11 cs.CV

ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada

Comments 13 pages

2603.09815 2026-03-11 cs.LG cs.AI

Correction of Transformer-Based Models with Smoothing Pseudo-Projector

Vitaly Bulgakov

Comments 29 pages, 23 figures

2603.09809 2026-03-11 cs.CV

RA-SSU: Towards Fine-Grained Audio-Visual Learning with Region-Aware Sound Source Understanding

Muyi Sun, Yixuan Wang, Hong Wang, Chen Su, Man Zhang, Xingqun Qi, Qi Li, Zhenan Sun

Comments Accepted by IEEE TMM

详情

英文摘要

Audio-Visual Learning (AVL) is one fundamental task of multi-modality learning and embodied intelligence, displaying the vital role in scene understanding and interaction. However, previous researchers mostly focus on exploring downstream tasks from a coarse-grained perspective (e.g., audio-visual correspondence, sound source localization, and audio-visual event localization). Considering providing more specific scene perception details, we newly define a fine-grained Audio-Visual Learning task, termed Region-Aware Sound Source Understanding (RA-SSU), which aims to achieve region-aware, frame-level, and high-quality sound source understanding. To support this goal, we innovatively construct two corresponding datasets, i.e. fine-grained Music (f-Music) and fine-grained Lifescene (f-Lifescene), each containing annotated sound source masks and frame-by-frame textual descriptions. The f-Music dataset includes 3,976 samples across 22 scene types related to specific application scenarios, focusing on music scenes with complex instrument mixing. The f-Lifescene dataset contains 6,156 samples across 61 types representing diverse sounding objects in life scenarios. Moreover, we propose SSUFormer, a Sound-Source Understanding TransFormer benchmark that facilitates both the sound source segmentation and sound region description with a multi-modal input and multi-modal output architecture. Specifically, we design two modules for this framework, Mask Collaboration Module (MCM) and Mixture of Hierarchical-prompted Experts (MoHE), to respectively enhance the accuracy and enrich the elaboration of the sound source description. Extensive experiments are conducted on our two datasets to verify the feasibility of the task, evaluate the availability of the datasets, and demonstrate the superiority of the SSUFormer, which achieves SOTA performance on the Sound Source Understanding benchmark.

URL PDF HTML ☆

赞 0 踩 0

2603.09793 2026-03-11 cs.LG

Information Theoretic Bayesian Optimization over the Probability Simplex

Federico Pavesi, Antonio Candelieri, Noémie Jaquier

Comments 16 pages, 5 figures

2603.09789 2026-03-11 cs.LG cs.AI quant-ph

A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines

Yixiong Chen

详情

英文摘要

Accurate forecasting of financial market volatility is crucial for risk management, option pricing, and portfolio optimization. Traditional econometric models and classical machine learning methods face challenges in handling the inherent non-linear and non-stationary characteristics of financial time series. In recent years, the rapid development of quantum computing has provided a new paradigm for solving complex optimization and sampling problems. This paper proposes a novel hybrid quantum-classical computing framework aimed at combining the powerful representation capabilities of classical neural networks with the unique advantages of quantum models. For the specific task of financial market volatility forecasting, we designed and implemented a hybrid model based on this framework, which combines a Long Short-Term Memory (LSTM) network with a Quantum Circuit Born Machine (QCBM). The LSTM is responsible for extracting complex dynamic features from historical time series data, while the QCBM serves as a learnable prior module, providing the model with a high-quality prior distribution to guide the forecasting process. We evaluated the model on two real financial datasets consisting of 5-minute high-frequency data from the Shanghai Stock Exchange (SSE) Composite Index and CSI 300 Index. Experimental results show that, compared to a purely classical LSTM baseline model, our hybrid quantum-classical model demonstrates significant advantages across multiple key metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and QLIKE loss, proving the great potential of quantum computing in enhancing the capabilities of financial forecasting models. More broadly, the proposed hybrid framework offers a flexible architecture that may be adapted to other machine learning tasks involving high-dimensional, complex, or non-linear data distributions.

URL PDF HTML ☆

赞 0 踩 0

2603.09786 2026-03-11 cs.AI

Quantifying the Necessity of Chain of Thought through Opaque Serial Depth

Jonah Brown-Cohen, David Lindner, Rohin Shah

2603.09782 2026-03-11 cs.RO

TIMID: Time-Dependent Mistake Detection in Videos of Robot Executions

Nerea Gallego, Fernando Salanova, Claudio Mannarano, Cristian Mahulea, Eduardo Montijano

Comments 8 pages, 5 figures , IROS submission

2603.09774 2026-03-11 cs.AI

World2Mind: Cognition Toolkit for Allocentric Spatial Reasoning in Foundation Models

Shouwei Ruan, Bin Wang, Zhenyu Wu, Qihui Zhu, Yuxiang Zhang, Hang Su, Yubin Wang

2603.09772 2026-03-11 cs.CV cs.CR

Removing the Trigger, Not the Backdoor: Alternative Triggers and Latent Backdoors

Gorka Abad, Ermes Franch, Stefanos Koffas, Stjepan Picek

2603.09761 2026-03-11 cs.RO

MuxGel: Simultaneous Dual-Modal Visuo-Tactile Sensing via Spatially Multiplexing and Deep Reconstruction

Zhixian Hu, Zhengtong Xu, Sheeraz Athar, Juan Wachs, Yu She

Comments Submitted to IROS 2026

2603.09760 2026-03-11 cs.CV cs.RO eess.IV

PanoAffordanceNet: Towards Holistic Affordance Grounding in 360° Indoor Environments

Guoliang Zhu, Wanjun Jia, Caoyang Shao, Yuheng Zhang, Zhiyong Li, Kailun Yang

Comments The source code and benchmark dataset will be made publicly available at https://github.com/GL-ZHU925/PanoAffordanceNet

2603.09759 2026-03-11 cs.CV

LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

Mingyu Kang, Hyein Seo, Yuna Jeong, Junhyeong Park, Yong Suk Choi

2603.09758 2026-03-11 cs.CL

Beyond Fine-Tuning: Robust Food Entity Linking under Ontology Drift with FoodOntoRAG

Jan Drole, Ana Gjorgjevikj, Barbara Korouši'c Seljak, Tome Eftimov

Comments Preprint

2603.09745 2026-03-11 cs.RO

Caterpillar-Inspired Spring-Based Compressive Continuum Robot for Bristle-based Exploration

Zhixian Hu, Yu She, Juan Wachs

Comments Accepted by RoboSoft 2026

2603.09743 2026-03-11 cs.CV

LAP: A Language-Aware Planning Model For Procedure Planning In Instructional Videos

Lei Shi, Victor Aregbede, Andreas Persson, Martin Längkvist, Amy Loutfi, Stephanie Lowry

2603.09740 2026-03-11 cs.RO cs.CV

Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments

Haoyuan Li, Rui Liu, Hehe Fan, Yi Yang

Comments 28 pages, 10 figures

2603.09737 2026-03-11 cs.CV cs.RO eess.IV

$M^2$-Occ: Resilient 3D Semantic Occupancy Prediction for Autonomous Driving with Incomplete Camera Inputs

Kaixin Lin, Kunyu Peng, Di Wen, Yufan Chen, Ruiping Liu, Kailun Yang

Comments The source code will be publicly released at https://github.com/qixi7up/M2-Occ

2603.09733 2026-03-11 cs.CV cs.MA

FetalAgents: A Multi-Agent System for Fetal Ultrasound Image and Video Analysis

Xiaotian Hu, Junwei Huang, Mingxuan Liu, Kasidit Anmahapong, Yifei Chen, Yitong Luo, Yiming Huang, Xuguang Bai, Zihan Li, Yi Liao, Haibo Qu, Qiyuan Tian

2603.09727 2026-03-11 cs.LG

A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System

Luyao Zou, Hayoung Oh, Chu Myaet Thwal, Apurba Adhikary, Seohyeon Hong, Zhu Han

Comments 15 pages, 6 figures

2603.09718 2026-03-11 cs.CV

GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System

Zhiye Tang, Qiudan Zhang, Lei Zhang, Junhui Hou, You Yang, Xu Wang

2603.09716 2026-03-11 cs.AI

AutoAgent: Evolving Cognition and Elastic Memory Orchestration for Adaptive Agents

Xiaoxing Wang, Ning Liao, Shikun Wei, Chen Tang, Feiyu Xiong

详情

英文摘要

Autonomous agent frameworks still struggle to reconcile long-term experiential learning with real-time, context-sensitive decision-making. In practice, this gap appears as static cognition, rigid workflow dependence, and inefficient context usage, which jointly limit adaptability in open-ended and non-stationary environments. To address these limitations, we present AutoAgent, a self-evolving multi-agent framework built on three tightly coupled components: evolving cognition, on-the-fly contextual decision-making, and elastic memory orchestration. At the core of AutoAgent, each agent maintains structured prompt-level cognition over tools, self-capabilities, peer expertise, and task knowledge. During execution, this cognition is combined with live task context to select actions from a unified space that includes tool calls, LLM-based generation, and inter-agent requests. To support efficient long-horizon reasoning, an Elastic Memory Orchestrator dynamically organizes interaction history by preserving raw records, compressing redundant trajectories, and constructing reusable episodic abstractions, thereby reducing token overhead while retaining decision-critical evidence. These components are integrated through a closed-loop cognitive evolution process that aligns intended actions with observed outcomes to continuously update cognition and expand reusable skills, without external retraining. Empirical results across retrieval-augmented reasoning, tool-augmented agent benchmarks, and embodied task environments show that AutoAgent consistently improves task success, tool-use efficiency, and collaborative robustness over static and memory-augmented baselines. Overall, AutoAgent provides a unified and practical foundation for adaptive autonomous agents that must learn from experience while making reliable context-aware decisions in dynamic environments.

URL PDF HTML ☆

赞 0 踩 0

2603.09714 2026-03-11 cs.SD cs.AI cs.CL eess.AS

MUGEN: Evaluating and Improving Multi-audio Understanding of Large Audio-Language Models

Chih-Kai Yang, Yun-Shao Tsai, Yu-Kai Guo, Ping-Le Tsai, Yen-Ting Piao, Hung-Wei Chen, Ting-Lin Hsiao, Yun-Man Hsu, Ke-Han Lu, Hung-yi Lee

Comments 6 pages, 3 figures, 3 tables. Dataset: https://huggingface.co/Multi-Audio-Grounding