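arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.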

2603.27181 2026-03-31 cs.RO cs.AI

An End-to-end Flight Control Network for High-speed UAV Obstacle Avoidance based on Event-Depth Fusion

Dikai Shang, Jingyue Zhao, Shi Xu, Nanyang Ye, Lei Wang

Comments 7 pages, 10 figures

Abstract

Achieving safe, high-speed autonomous flight in complex environments with static, dynamic, or mixed obstacles remains challenging, as a single perception modality is incomplete. Depth cameras are effective for static objects but suffer from motion blur at high speeds. Conversely, event cameras excel at capturing rapid motion but struggle to perceive static scenes. To exploit the complementary strengths of both sensors, we propose an end-to-end flight control network that achieves feature-level fusion of depth images and event data through a bidirectional cross-attention module. The end-to-end network is trained via imitation learning, which relies on high-quality supervision. Building on this insight, we design an efficient expert planner using Spherical Principal Search (SPS). This planner reduces computational complexity from $O(n^2)$ to $O(n)$ while generating smoother trajectories, achieving a success rate of over 80% at 17 m/s, nearly 20% higher than traditional planners. Simulation experiments show that our method attains a 70-80% success rate at 17 m/s across varied scenes, surpassing single-modality and unidirectional fusion models by 10-20%. These results demonstrate that bidirectional fusion effectively integrates event and depth information, enabling more reliable obstacle avoidance in complex environments with both static and dynamic objects.
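
As a rough illustration of what "bidirectional cross-attention" between two feature streams can look like, the sketch below lets depth tokens attend over event tokens and event tokens attend over depth tokens, then concatenates the two enriched token lists. It is a minimal single-head version without learned projections; the function names, the residual connection, the concatenation step, and the toy feature values are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token attends over all context tokens (single head).

    Assumption: no learned Q/K/V projections; the context tokens serve
    as both keys and values. A residual connection keeps the query's
    own features in the output.
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    enriched = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        attended = [sum(w * k[d] for w, k in zip(weights, context))
                    for d in range(dim)]
        enriched.append([qi + ai for qi, ai in zip(q, attended)])
    return enriched


def bidirectional_fusion(depth_tokens, event_tokens):
    """Run cross-attention in both directions and concatenate the results."""
    depth_enriched = cross_attend(depth_tokens, event_tokens)  # depth queries events
    event_enriched = cross_attend(event_tokens, depth_tokens)  # events query depth
    return depth_enriched + event_enriched


# Toy 2-dimensional features (hypothetical values).
depth = [[1.0, 0.0], [0.0, 1.0]]
event = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
fused = bidirectional_fusion(depth, event)
```

A unidirectional variant would keep only one of the two `cross_attend` calls, which is the kind of baseline the abstract compares against.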

2603.27179 2026-03-31 cs.CV

Reasoning-Driven Anomaly Detection and Localization with Image-Level Supervision

Yizhou Jin, Yuezhu Feng, Jinjin Zhang, Peng Wang, Qingjie Liu, Yunhong Wang

Comments Accepted to CVPR 2026

2603.27176 2026-03-31 cs.CV

MEDIC-AD: Towards Medical Vision-Language Model's Clinical Intelligence

Woohyeon Park, Jaeik Kim, Sunghwan Steve Cho, Pa Hong, Wookyoung Jeong, Yoojin Nam, Namjoon Kim, Ginny Y. Wong, Ka Chun Cheung, Jaeyoung Do

2603.27170 2026-03-31 cs.CV

MultiLoc: Multi-view Guided Relative Pose Regression for Fast and Robust Visual Re-Localization

Nobel Dang, Bing Li

2603.27169 2026-03-31 cs.AI

Aligning LLMs with Graph Neural Solvers for Combinatorial Optimization

Shaodi Feng, Zhuoyi Lin, Yaoxin Wu, Haiyan Yin, Yan Jin, Senthilnath Jayavelu, Xun Xu

Comments 18 pages, 3 figures

2603.27165 2026-03-31 cs.CV

RiskProp: Collision-Anchored Self-Supervised Risk Propagation for Early Accident Anticipation

Yiyang Zou, Tianhao Zhao, Peilun Xiao, Hongyu Jin, Longyu Qi, Yuxuan Li, Liyin Liang, Yifeng Qian, Chunbo Lai, Yutian Lin, Zhihui Li, Yu Wu

Comments Accepted by CVPR 2026

2603.27164 2026-03-31 cs.AI cs.CL

daVinci-LLM: Towards the Science of Pretraining

Yiwei Qin, Yixiu Liu, Tiantian Mi, Muhang Xie, Zhen Huang, Weiye Si, Pengrui Lu, Siyuan Feng, Xia Wu, Liming Liu, Ye Luo, Jinlong Hou, Qipeng Guo, Yu Qiao, Pengfei Liu

Abstract

The foundational pretraining phase determines a model's capability ceiling, as post-training struggles to overcome capability foundations established during pretraining, yet it remains critically under-explored. This stems from a structural paradox: organizations with computational resources operate under commercial pressures that inhibit transparent disclosure, while academic institutions possess research freedom but lack pretraining-scale computational resources. daVinci-LLM occupies this unexplored intersection, combining industrial-scale resources with full research freedom to advance the science of pretraining. We adopt a fully-open paradigm that treats openness as scientific methodology, releasing complete data processing pipelines, full training processes, and systematic exploration results. Recognizing that the field lacks systematic methodology for data processing, we employ the Data Darwinism framework, a principled L0-L9 taxonomy from filtering to synthesis. We train a 3B-parameter model from random initialization across 8T tokens using a two-stage adaptive curriculum that progressively shifts from foundational capabilities to reasoning-intensive enhancement. Through 200+ controlled ablations, we establish that: processing depth systematically enhances capabilities, establishing it as a critical dimension alongside volume scaling; different domains exhibit distinct saturation dynamics, necessitating adaptive strategies from proportion adjustments to format shifts; compositional balance enables targeted intensification while preventing performance collapse; and evaluation protocol choices shape our understanding of pretraining progress. By releasing the complete exploration process, we enable the community to build upon our findings and systematic methodologies to form accumulative scientific knowledge in pretraining.

2603.27159 2026-03-31 cs.LG cs.SY eess.SY math.OC

Online Learning of Kalman Filtering: From Output to State Estimation

Lintao Ye, Ankang Zhang, Ming Chi, Bin Du, Jianghai Hu

2603.27156 2026-03-31 cs.LG cs.AI

GSR-GNN: Training Acceleration and Memory-Saving Framework of Deep GNNs on Circuit Graph

Yuebo Luo, Shiyang Li, Yifei Feng, Vishal Kancharla, Shaoyi Huang, Caiwen Ding

Comments 8 pages including references; accepted to DAC 2026

2603.27154 2026-03-31 cs.LG cs.AI cs.DS

A Tight Expressivity Hierarchy for GNN-Based Entity Resolution in Master Data Management

Ashwin Ganesan

2603.27153 2026-03-31 cs.LG

Preconditioned Attention: Enhancing Efficiency in Transformers

Hemanth Saratchandran

Comments AISTATS 2026

2603.27143 2026-03-31 cs.CV cs.LG

Follow Your Heart: Landmark-Guided Transducer Pose Scoring for Point-of-Care Echocardiography

Zaiyang Guo, Jessie N. Dong, Filippos Bellos, Jilei Hao, Emily J. MacKay, Trevor Chan, Shir Goldfinger, Sethu Reddy, Steven Vance, Jason J. Corso, Alison M. Pouch

Comments Accepted for oral presentation at the International Symposium on Biomedical Imaging 2026

2603.27138 2026-03-31 cs.LG

ScoutAttention: Efficient KV Cache Offloading via Layer-Ahead CPU Pre-computation for LLM Inference

Qiuyang Zhang, Kai Zhou, Ding Tang, Kai Lu, Cheng Li, Zhenyu Yang, Peng Xu, Jiguang Wan

Comments Accepted at the 63rd Design Automation Conference (DAC 2026)

2603.27135 2026-03-31 cs.LG stat.ML

Spectral-Aware Text-to-Time Series Generation with Billion-Scale Multimodal Meteorological Data

Shijie Zhang

Comments Accepted by IJCNN 2026 (WCCI)

2603.27119 2026-03-31 cs.LG cs.AI

Bayesian-Symbolic Integration for Uncertainty-Aware Parking Prediction

Alireza Nezhadettehad, Arkady Zaslavsky, Abdur Rakib, Seng W. Loke

Comments Accepted at IEEE ITSC 2025 (to appear)

2603.27116 2026-03-31 cs.AI cs.IR cs.NE

The Price of Meaning: Why Every Semantic Memory System Forgets

Sambartha Ray Barman, Andrey Starenky, Sofia Bodnar, Nikhil Narasimhan, Ashwin Gopinath

2603.27115 2026-03-31 cs.CV

SJD-VP: Speculative Jacobi Decoding with Verification Prediction for Autoregressive Image Generation

Bingqi Shan, Baoquan Zhang, Xiaochen Qi, Xutao Li, Yunming Ye, Liqiang Nie

2603.27114 2026-03-31 cs.LG stat.ME

Maximin Learning of Individualized Treatment Effect on Multi-Domain Outcomes

Yuying Lu, Wenbo Fei, Yuanjia Wang, Molei Liu

2603.27113 2026-03-31 cs.LG cond-mat.mtrl-sci stat.ML

Hierarchy-Guided Topology Latent Flow for Molecular Graph Generation

Urvi Awasthi, Alexander Arjun Lobo, Leonid Zhukov

Comments 22 pages, 2 figures, 6 tables. Accepted to ICLR 2026 AI4Mat Workshop

2603.27112 2026-03-31 cs.CV

RailVQA: A Benchmark and Framework for Efficient Interpretable Visual Cognition in Automatic Train Operation

Sen Zhang, Runmei Li, Zhichao Zheng, Yuhe Zhang, Jiani Li, Kailun Zhang, Tao Zhang, Wenjun Wu, Qunbo Wang

2603.27108 2026-03-31 cs.CV

MotiMem: Motion-Aware Approximate Memory for Energy-Efficient Neural Perception in Autonomous Vehicles

Haohua Que, Mingkai Liu, Jiayue Xie, Haojia Gao, Jiajun Sun, Hongyi Xu, Handong Yao, Fei Qiao

Comments 8 pages, 6 figures, conference

2603.27103 2026-03-31 cs.CV

LLM Enhanced Action Recognition via Hierarchical Global-Local Skeleton-Language Model

Ruosi Wang, Fangwei Zuo, Lei Li, Zhaoqiang Xia

Abstract

Skeleton-based human action recognition has achieved remarkable progress in recent years. However, most existing GCN-based methods rely on short-range motion topologies, which not only struggle to capture long-range joint dependencies and complex temporal dynamics but also limit cross-modal semantic alignment and understanding due to insufficient modeling of action semantics. To address these challenges, we propose a hierarchical global-local skeleton-language model (HocSLM), enabling the large action model to be more representative of action semantics. First, we design a hierarchical global-local network (HGLNet) that consists of a composite-topology spatial module and a dual-path hierarchical temporal module. By synergistically integrating multi-level global and local modules, HGLNet achieves dynamically collaborative modeling at both global and local scales while preserving prior knowledge of human physical structure, significantly enhancing the model's representation of complex spatio-temporal relationships. Then, a large vision-language model (VLM) is employed to generate textual descriptions from the original RGB video sequences, providing rich action semantics for further training the skeleton-language model. Furthermore, we introduce a skeleton-language sequential fusion module that combines the features from HGLNet with the generated descriptions and uses a skeleton-language model (SLM) to align skeletal spatio-temporal features and textual action descriptions precisely within a unified semantic space. The SLM can significantly enhance HGLNet's semantic discrimination and cross-modal understanding capabilities. Extensive experiments demonstrate that the proposed HocSLM achieves state-of-the-art performance on three mainstream benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, and Northwestern-UCLA.

2603.27101 2026-03-31 cs.CV cs.LG

PRUE: A Practical Recipe for Field Boundary Segmentation at Scale

Gedeon Muhawenayo, Caleb Robinson, Subash Khanal, Zhanpei Fang, Isaac Corley, Alexander Wollam, Tianyi Gao, Leonard Strnad, Ryan Avery, Lyndon Estes, Ana M. Tárano, Nathan Jacobs, Hannah Kerner

Comments 12 pages, 3 figures, supplementary material. Accepted at CVPR 2026 (IEEE/CVF Conference on Computer Vision and Pattern Recognition)

2603.27086 2026-03-31 cs.CV

EFlow: Fast Few-Step Video Generator Training from Scratch via Efficient Solution Flow

Dogyun Park, Yanyu Li, Sergey Tulyakov, Anil Kag

2603.27083 2026-03-31 cs.CV

LightCtrl: Training-free Controllable Video Relighting

Yizuo Peng, Xuelin Chen, Kai Zhang, Xiaodong Cun

Comments Accepted at ICLR 2026

2603.27076 2026-03-31 cs.AI

When Verification Hurts: Asymmetric Effects of Multi-Agent Feedback in Logic Proof Tutoring

Tahreem Yasir, Sutapa Dey Tithi, Benyamin Tabarsi, Dmitri Droujkov, Sam Gilson, Yasitha Rajapaksha, Xiaoyi Tian, Arun Ramesh, DongKuan Xu, Tiffany Barnes

Comments 21 pages, 1 figure

2603.27070 2026-03-31 cs.CV

Structural Graph Probing of Vision-Language Models

Haoyu He, Yue Zhuo, Yu Zheng, Qi R. Wang

Comments IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2603.27066 2026-03-31 cs.LG cs.AI math.OC

Dynamic resource matching in manufacturing using deep reinforcement learning

Saunak Kumar Panda, Yisha Xiang, Ruiqi Liu

Comments 29 pages, 6 figures, 3 tables; Published in European Journal of Operational Research, Vol. 318(2), 2024

DOI: 10.1016/j.ejor.2024.05.027
Journal ref: European Journal of Operational Research, vol. 318, no. 2, pp. 408-423, 2024, ISSN 0377-2217

Abstract

Matching plays an important role in the logical allocation of resources across a wide range of industries. The benefits of matching have been increasingly recognized in manufacturing industries. In particular, capacity sharing has received much attention recently. In this paper, we consider the problem of dynamically matching demand-capacity types of manufacturing resources. We formulate the multi-period, many-to-many manufacturing resource-matching problem as a sequential decision process. The formulated manufacturing resource-matching problem involves large state and action spaces, and it is not practical to accurately model the joint distribution of various types of demands. To address the curse of dimensionality and the difficulty of explicitly modeling the transition dynamics, we use a model-free deep reinforcement learning approach to find optimal matching policies. Moreover, to tackle the issue of infeasible actions and slow convergence due to initial biased estimates caused by the maximum operator in Q-learning, we introduce two penalties to the traditional Q-learning algorithm: a domain knowledge-based penalty based on a prior policy and an infeasibility penalty that conforms to the demand-supply constraints. We establish theoretical results on the convergence of our domain knowledge-informed Q-learning, providing a performance guarantee for small-size problems. For large-size problems, we further inject our modified approach into the deep deterministic policy gradient (DDPG) algorithm, which we refer to as domain knowledge-informed DDPG (DKDDPG). In our computational study, including small- and large-scale experiments, DKDDPG consistently outperformed traditional DDPG and other RL algorithms, yielding higher rewards and demonstrating greater efficiency in time and episodes.
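
To make the idea of adding two penalties to the Q-learning update concrete, here is a small tabular sketch. The abstract does not give the exact form of either penalty, so everything specific here is an assumption for illustration: the penalties are modeled as constants subtracted from the reward (one when the chosen action deviates from a prior policy's action, one when the action is infeasible), and the function name, parameter names, and toy state/action sets are hypothetical.

```python
def penalized_q_update(q, state, action, reward, next_state, *,
                       prior_action, feasible,
                       alpha=0.5, gamma=0.9,
                       prior_penalty=1.0, infeasible_penalty=10.0):
    """One tabular Q-learning step with two assumed reward penalties."""
    shaped = reward
    if action != prior_action:
        # Assumed domain-knowledge penalty: deviating from the prior policy.
        shaped -= prior_penalty
    if not feasible:
        # Assumed infeasibility penalty: action violates a supply/demand limit.
        shaped -= infeasible_penalty
    # Standard Q-learning target with the max over next-state actions.
    best_next = max(q[next_state].values())
    target = shaped + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Toy setup: the state is remaining capacity (0..2), the action is units allocated.
q_table = {s: {a: 0.0 for a in (0, 1, 2)} for s in (0, 1, 2)}

# Allocating 2 units with capacity 1 is infeasible and deviates from the prior (1 unit).
bad = penalized_q_update(q_table, 1, 2, 2.0, 0, prior_action=1, feasible=False)

# Allocating 1 unit is feasible and matches the prior policy.
good = penalized_q_update(q_table, 1, 1, 1.0, 0, prior_action=1, feasible=True)
```

With these toy numbers the infeasible action ends up with a strongly negative value while the feasible, prior-consistent action gets a positive one, which is the steering effect the penalties are meant to produce early in training.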

2603.27065 2026-03-31 cs.CL

Story2Proposal: A Scaffold for Structured Scientific Paper Writing

Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen

Comments 10 pages, 4 figures

2603.27062 2026-03-31 cs.LG stat.ML

Conformalized Signal Temporal Logic Inference under Covariate Shift

Yixuan Wang, Danyang Li, Matthew Cleaveland, Roberto Tron, Mingyu Cai