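arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translation, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.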

2603.11133 2026-03-13 cs.LG

Higher-Order Modular Attention: Fusing Pairwise and Triadic Interactions for Protein Sequences

Shirin Amiraslani, Xin Gao

Comments 11, 4 figures

Details

English abstract

Transformer self-attention computes pairwise token interactions, yet protein sequence-to-phenotype relationships often involve cooperative dependencies among three or more residues that dot-product attention does not capture explicitly. We introduce Higher-Order Modular Attention, HOMA, a unified attention operator that fuses pairwise attention with an explicit triadic interaction pathway. To make triadic attention practical on long sequences, HOMA employs block-structured, windowed triadic attention. We evaluate on three TAPE benchmarks for Secondary Structure, Fluorescence, and Stability. Our attention mechanism yields consistent improvements across all tasks compared with standard self-attention and efficient variants including block-wise attention and Linformer. These results suggest that explicit triadic terms provide complementary representational capacity for protein sequence prediction at controllable additional computational cost.

2603.11131 2026-03-13 cs.LG

Beyond Barren Plateaus: A Scalable Quantum Convolutional Architecture for High-Fidelity Image Classification

Radhakrishnan Delhibabu

2603.11130 2026-03-13 cs.RO

Robust Co-design Optimisation for Agile Fixed-Wing UAVs

Adrian Andrei Buda, Xavier Chen, Nicolò Botteghi, Urban Fasel

2603.11123 2026-03-13 cs.SD cs.CL

Uni-ASR: Unified LLM-Based Architecture for Non-Streaming and Streaming Automatic Speech Recognition

Yinfeng Xia, Jian Tang, Junfeng Hou, Gaopeng Xu, Haitao Yao

Comments Submitted to Interspeech 2026

2603.11121 2026-03-13 cs.LG

High-resolution weather-guided surrogate modeling for data-efficient cross-location building energy prediction

Piragash Manmatharasan, Girma Bitsuamlak, Katarina Grolinger

Details

DOI: 10.1016/j.enbuild.2026.117251
Journal ref: Energy and Buildings, 359 (2026), 117251

English abstract

Building design optimization often depends on physics-based simulation tools such as EnergyPlus, which, although accurate, are computationally expensive and slow. Surrogate models provide a faster alternative, yet most are location-specific, and even weather-informed variants require simulations from many sites to generalize to unseen locations. This limitation arises because existing methods do not fully exploit the short-term weather-driven energy patterns shared across regions, restricting their scalability and reusability. This study introduces a high-resolution (weekly) weather-informed surrogate modeling approach that enhances model reusability across locations. By capturing recurring short-term weather-energy demand patterns common to multiple regions, the proposed method produces a generalized surrogate that performs well beyond the training location. Unlike previous weather-informed approaches, it does not require extensive simulations from multiple sites to achieve strong generalization. Experimental results show that when trained on a single location, the model maintains high predictive accuracy for other sites within the same climate zone, with no noticeable performance loss, and exhibits only minimal degradation when applied across different climate zones. These findings demonstrate the potential of climate-informed generalization for developing scalable and reusable surrogate models, supporting more sustainable and optimized building design practices.

2603.11119 2026-03-13 cs.LG

Group Resonance Network: Learnable Prototypes and Multi-Subject Resonance for EEG Emotion Recognition

Renwei Meng

Comments 12 pages, 5 figures

2603.11118 2026-03-13 cs.LG math.PR

A Learning-Based Superposition Operator for Non-Renewal Arrival Processes in Queueing Networks

Eliran Sherzer

2603.11117 2026-03-13 cs.LG

Learning Tree-Based Models with Gradient Descent

Sascha Marton

Comments PhD thesis

Details

English abstract

Tree-based models are widely recognized for their interpretability and have proven effective in various application domains, particularly in high-stakes domains. However, learning decision trees (DTs) poses a significant challenge due to their combinatorial complexity and discrete, non-differentiable nature. As a result, traditional methods such as CART, which rely on greedy search procedures, remain the most widely used approaches. These methods make locally optimal decisions at each node, constraining the search space and often leading to suboptimal tree structures. Additionally, their demand for custom training methods precludes a seamless integration into modern machine learning (ML) approaches. In this thesis, we propose a novel method for learning hard, axis-aligned DTs through gradient descent. Our approach utilizes backpropagation with a straight-through operator on a dense DT representation, enabling the joint optimization of all tree parameters, thereby addressing the two primary limitations of traditional DT algorithms. First, gradient-based training is not constrained by the sequential selection of locally optimal splits but, instead, jointly optimizes all tree parameters. Second, by leveraging gradient descent for optimization, our approach seamlessly integrates into existing ML approaches, e.g., for multimodal and reinforcement learning tasks, which inherently rely on gradient descent. These advancements allow us to achieve state-of-the-art results across multiple domains, including interpretable DTs for small tabular datasets, advanced models for complex tabular data, multimodal learning, and interpretable reinforcement learning without information loss. By bridging the gap between DTs and gradient-based optimization, our method significantly enhances the performance and applicability of tree-based models across various ML domains.
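
The straight-through idea mentioned in the abstract can be illustrated on the smallest possible tree. The sketch below is not the thesis's method or its dense DT representation; it is a hypothetical depth-1 stump with fixed leaf values 0 and 1, where the forward pass uses a hard split and the backward pass borrows the gradient of a sigmoid relaxation (one common straight-through variant). All function names and hyperparameters are assumptions made for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hard_split(x: float, threshold: float) -> float:
    """Forward pass: hard decision, 1.0 routes right and 0.0 routes left."""
    return 1.0 if x - threshold > 0.0 else 0.0


def straight_through_grad(x: float, threshold: float) -> float:
    """Backward pass: surrogate derivative of the split w.r.t. the threshold.

    The hard step has zero gradient almost everywhere, so the gradient of
    sigmoid(x - threshold) is substituted instead (assumed surrogate).
    """
    s = sigmoid(x - threshold)
    return -s * (1.0 - s)


def fit_threshold(
    xs: Sequence[float],
    ys: Sequence[float],
    threshold: float = 0.0,
    lr: float = 0.5,
    epochs: int = 200,
) -> float:
    """Gradient descent on the mean squared error of a stump with leaves 0 and 1."""
    for _ in range(epochs):
        grad = 0.0
        for x, y in zip(xs, ys):
            pred = hard_split(x, threshold)  # hard prediction in the forward pass
            grad += 2.0 * (pred - y) * straight_through_grad(x, threshold)
        threshold -= lr * grad / len(xs)
    return threshold


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 0.0, 1.0, 1.0]
    print("learned threshold:", fit_threshold(xs, ys))
```

Because predictions stay hard throughout, the fitted model is still an ordinary axis-aligned split; only the update signal comes from the soft relaxation.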

2603.11114 2026-03-13 cs.LG cs.AI

Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers

Mynampati Sri Ranganadha Avinash

Comments 11 pages, 5 figures. Empirical analysis of routing behavior in sparse Mixture-of-Experts transformers using OLMoE

2603.11110 2026-03-13 cs.RO cs.AI

ResWM: Residual-Action World Model for Visual RL

Jseen Zhang, Gabriel Adineera, Jinzhou Tan, Jinoh Kim

Comments Submitted to KDD 2026

Details

English abstract

Learning predictive world models from raw visual observations is a central challenge in reinforcement learning (RL), especially for robotics and continuous control. Conventional model-based RL frameworks directly condition future predictions on absolute actions, which makes optimization unstable: the optimal action distributions are task-dependent, unknown a priori, and often lead to oscillatory or inefficient control. To address this, we introduce the Residual-Action World Model (ResWM), a new framework that reformulates the control variable from absolute actions to residual actions -- incremental adjustments relative to the previous step. This design aligns with the inherent smoothness of real-world control, reduces the effective search space, and stabilizes long-horizon planning. To further strengthen the representation, we propose an Observation Difference Encoder that explicitly models the changes between adjacent frames, yielding compact latent dynamics that are naturally coupled with residual actions. ResWM is integrated into a Dreamer-style latent dynamics model with minimal modifications and no extra hyperparameters. Both imagination rollouts and policy optimization are conducted in the residual-action space, enabling smoother exploration, lower control variance, and more reliable planning. Empirical results on the DeepMind Control Suite demonstrate that ResWM achieves consistent improvements in sample efficiency, asymptotic returns, and control smoothness, significantly surpassing strong baselines such as Dreamer and TD-MPC. Beyond performance, ResWM produces more stable and energy-efficient action trajectories, a property critical for robotic systems deployed in real-world environments. These findings suggest that residual action modeling provides a simple yet powerful principle for bridging algorithmic advances in RL with the practical requirements of robotics.
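
The reformulation from absolute to residual actions described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the update rule a_t = clip(a_{t-1} + delta_t), the action bounds, and the helper names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def apply_residual_actions(
    initial_action: float,
    residuals: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Turn per-step residuals into absolute actions.

    Assumed update rule (not from the paper):
        a_t = clip(a_{t-1} + delta_t, low, high)
    """
    actions = []
    previous = initial_action
    for delta in residuals:
        current = min(high, max(low, previous + delta))  # keep within bounds
        actions.append(current)
        previous = current
    return actions


def total_variation(actions: Sequence[float]) -> float:
    """Sum of absolute step-to-step changes, a simple smoothness proxy."""
    return sum(abs(b - a) for a, b in zip(actions, actions[1:]))


if __name__ == "__main__":
    actions = apply_residual_actions(0.0, [0.5, 0.5, 0.5, -0.25])
    print(actions, total_variation(actions))
```

With small residuals, consecutive actions can differ only by a small amount, which is one way to read the abstract's claim about smoother control and a reduced effective search space.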

2603.11106 2026-03-13 cs.CV cs.RO

RC-NF: Robot-Conditioned Normalizing Flow for Real-Time Anomaly Detection in Robotic Manipulation

Shijie Zhou, Bin Zhu, Jiarui Yang, Xiangyu Zhao, Jingjing Chen, Yu-Gang Jiang

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2603.11099 2026-03-13 cs.LG cs.AI

Graph Tokenization for Bridging Graphs and Transformers

Zeyuan Guo, Enmao Diao, Cheng Yang, Chuan Shi

Comments Accepted as a poster at ICLR 2026. Code is available at https://github.com/BUPT-GAMMA/Graph-Tokenization-for-Bridging-Graphs-and-Transformers

2603.11094 2026-03-13 cs.LG

Fingerprinting Concepts in Data Streams with Supervised and Unsupervised Meta-Information

Ben Halstead, Yun Sing Koh, Patricia Riddle, Mykola Pechenizkiy, Albert Bifet, Russel Pears

2603.11093 2026-03-13 cs.AI cs.RO

A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms

Kejin Yu, Yuhan Sun, Taiqiang Wu, Ruixu Zhang, Zhiqiang Lin, Yuxin Meng, Junjie Wang, Yujiu Yang

Comments Published in TMLR (March 2026) | OpenReview: https://openreview.net/forum?id=XwQ7dc4bqn

Details

English abstract

The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and generalizable reasoning. Although current AD systems manage structured environments, they consistently falter in long-tail scenarios and complex social interactions that require human-like judgment. Meanwhile, the advent of large language and multimodal models (LLMs and MLLMs) presents a transformative opportunity to integrate a powerful cognitive engine into AD systems, moving beyond pattern matching toward genuine comprehension. However, a systematic framework to guide this integration is critically lacking. To bridge this gap, we provide a comprehensive review of this emerging field and argue that reasoning should be elevated from a modular component to the system's cognitive core. Specifically, we first propose a novel Cognitive Hierarchy to decompose the monolithic driving task according to its cognitive and interactive complexity. Building on this, we further derive and systematize seven core reasoning challenges, such as the responsiveness-reasoning trade-off and social-game reasoning. Furthermore, we conduct a dual-perspective review of the state-of-the-art, analyzing both system-centric approaches to architecting intelligent agents and evaluation-centric practices for their validation. Our analysis reveals a clear trend toward holistic and interpretable "glass-box" agents. In conclusion, we identify a fundamental and unresolved tension between the high-latency, deliberative nature of LLM-based reasoning and the millisecond-scale, safety-critical demands of vehicle control. For future work, a primary objective is to bridge the symbolic-to-physical gap by developing verifiable neuro-symbolic architectures, robust reasoning under uncertainty, and scalable models for implicit social negotiation.

2603.11089 2026-03-13 cs.SD cs.MM eess.AS

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation

Nolan Chan, Timmy Gang, Yongqian Wang, Yuzhe Liang, Dingdong Wang

Comments Accepted at ICASSP 2026

2603.11085 2026-03-13 cs.RO cs.CV cs.MA

Edge-Assisted Multi-Robot Visual-Inertial SLAM with Efficient Communication

Xin Liu, Shuhuan Wen, Jing Zhao, Tony Z. Qiu, Hong Zhang

Comments 13 pages, 18 figures

Details

DOI: 10.1109/TASE.2024.3376427
Journal ref: IEEE Transactions on Automation Science and Engineering, 22 (2025) 2186-2198

English abstract

The integration of cloud computing and edge computing is an effective way to achieve globally consistent, real-time multi-robot Simultaneous Localization and Mapping (SLAM). Cloud computing effectively addresses the limited computing, communication, and storage capacity of terminal equipment. However, limited bandwidth and extremely long communication links between terminal devices and the cloud seriously degrade the performance of multi-robot SLAM systems. To reduce the computational cost of feature tracking and improve the real-time performance of the robot, a lightweight SLAM method using optical flow tracking based on pyramid IMU prediction is proposed. On this basis, a centralized multi-robot SLAM system with a robot-edge-cloud layered architecture is proposed to realize real-time collaborative SLAM. It avoids the limited on-board computing resources and low execution efficiency of a single robot. In this framework, only the feature points and keyframe descriptors are transmitted, and they are losslessly encoded and compressed to enable real-time remote information transmission with limited bandwidth resources. This design reduces the bandwidth actually occupied during data transmission without the loss of SLAM accuracy that data compression would otherwise cause. In experiments on the EuRoC dataset, our method transmits features with a lower data volume than the current most advanced local feature compression method, and it achieves the same or better positioning accuracy under low computational load than the current advanced centralized multi-robot SLAM scheme.

2603.11080 2026-03-13 cs.RO

SELF-VLA: A Skill Enhanced Agentic Vision-Language-Action Framework for Contact-Rich Disassembly

Chang Liu, Sibo Tian, Xiao Liang, Minghui Zheng

2603.11077 2026-03-13 cs.RO cs.SY eess.SY

TATIC: Task-Aware Temporal Learning for Human Intent Inference from Physical Corrections in Human-Robot Collaboration

Jiurun Song, Xiao Liang, Minghui Zheng

2603.11076 2026-03-13 cs.AI cs.SE

DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Aili Chen, Chi Zhang, Junteng Liu, Jiangjie Chen, Chengyu Du, Yunji Li, Ming Zhong, Qin Wang, Zhengmao Zhu, Jiayuan Song, Ke Ji, Junxian He, Pengyu Zhao, Yanghua Xiao

2603.11074 2026-03-13 cs.RO cs.SY eess.SY

DRAFTO: Decoupled Reduced-space and Adaptive Feasibility-repair Trajectory Optimization for Robotic Manipulators

Yichang Feng, Xiao Liang, Minghui Zheng

2603.11072 2026-03-13 cs.RO cs.AI

OA-NBV: Occlusion-Aware Next-Best-View Planning for Human-Centered Active Perception on Mobile Robots

Boxun Hu, Chang Chang, Jiawei Ge, Man Namgung, Xiaomin Lin, Axel Krieger, Tinoosh Mohsenin

2603.11071 2026-03-13 cs.RO cs.CV cs.LG

TinyNav: End-to-End TinyML for Real-Time Autonomous Navigation on Microcontrollers

Pooria Roy, Nourhan Jadallah, Tomer Lapid, Shahzaib Ahmad, Armita Afroushe, Mete Bayrak

Comments 6 pages, 7 figures, presented at CUCAI2026 (Canadian Undergraduate Conference on AI, https://cucai.ca)

2603.11067 2026-03-13 cs.CL cs.AI

Summarize Before You Speak with ARACH: A Training-Free Inference-Time Plug-In for Enhancing LLMs via Global Attention Reallocation

Jingtao Wang, Yucong Wang, Jun Ding, Rui Cai, Xun Wang

2603.11053 2026-03-13 cs.CL cs.IT cs.LG math.IT

Speculative Decoding Scaling Laws (SDSL): Throughput Optimization Made Simple

Amirhossein Bozorgkhoo, Igor Molybog

2603.11052 2026-03-13 cs.LG cs.AI

Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates

Haoze Song, Zhihao Li, Mengyi Deng, Xin Li, Duyi Pan, Zhilu Lai, Wei Wang

2603.11049 2026-03-13 cs.LG

Comparison of Outlier Detection Algorithms on String Data

Philip Maus

Comments A bachelor's thesis comparing the local outlier factor algorithm against a new regular expression learner-based syntactical outlier detection algorithm for single-word string data

2603.10711 2026-03-13 cs.RO cs.SY eess.SY

Parallel-in-Time Nonlinear Optimal Control via GPU-native Sequential Convex Programming

Yilin Zou, Zhong Zhang, Maxime Robic, Fanghua Jiang

2603.10689 2026-03-13 cs.LG cs.AI

Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova, Mikhail Pautov

2603.10648 2026-03-13 cs.CV

Less is More: Decoder-Free Masked Modeling for Efficient Skeleton Representation Learning

Jeonghyeok Do, Yun Chen, Geunhyuk Youk, Munchurl Kim

Comments Please visit our project page at https://kaist-viclab.github.io/SLiM_site/

2603.10577 2026-03-13 cs.AI cs.HC

CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents

Marta Sumyk, Oleksandr Kosovan

Comments This work has been accepted to appear at the HEAL @ CHI 2026 Workshop on Human-centered Evaluation and Auditing of Language Models