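arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.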

2603.16303 2026-03-18 cs.RO

Toward Deep Representation Learning for Event-Enhanced Visual Autonomous Perception: the eAP Dataset

Jinghang Li, Shichao Li, Qing Lian, Peiliang Li, Xiaozhi Chen, Yi Zhou

English abstract

Recent visual autonomous perception systems achieve remarkable performances with deep representation learning. However, they fail in scenarios with challenging illumination. While event cameras can mitigate this problem, there is a lack of a large-scale dataset to develop event-enhanced deep visual perception models in autonomous driving scenes. To address the gap, we present the eAP (event-enhanced Autonomous Perception) dataset, the largest dataset with event cameras for autonomous perception. We demonstrate how eAP can facilitate the study of different autonomous perception tasks, including 3D vehicle detection and object time-to-contact (TTC) estimation, through deep representation learning. Based on eAP, we demonstrate the first successful use of events to improve a popular 3D vehicle detection network in challenging illumination scenarios. eAP also enables a devoted study of the representation learning problem of object TTC estimation. We show how a geometry-aware representation learning framework leads to the best event-based object TTC estimation network that operates at 200 FPS. The dataset, code, and pre-trained models will be made publicly available for future research.

2603.16302 2026-03-18 cs.CV

Micro-AU CLIP: Fine-Grained Contrastive Learning from Local Independence to Global Dependency for Micro-Expression Action Unit Detection

Jinsheng Wei, Fengzhou Guo, Yante Li, Haoyu Chen, Guanming Lu, Guoying Zhao

2603.16299 2026-03-18 cs.CL

PyPhonPlan: Simulating phonetic planning with dynamic neural fields and task dynamics

Sam Kirkham

Comments Submitted to Interspeech 2026

2603.16285 2026-03-18 cs.CV

Persistent Story World Simulation with Continuous Character Customization

Jinlu Zhang, Qiyun Wang, Baoxiang Du, Jiayi Ji, Jing He, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun, Rongrong Ji

2603.16280 2026-03-18 cs.SD eess.AS

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS

Zihao Zheng, Wen Wu, Chao Zhang, Mengyue Wu, Xuenan Xu

Comments Submitted to Interspeech 2026

2603.16279 2026-03-18 cs.RO stat.ML

Agile Interception of a Flying Target using Competitive Reinforcement Learning

Timothée Gavin, Simon Lacroix, Murat Bronz

2603.16277 2026-03-18 cs.LG physics.flu-dyn

Physics-integrated neural differentiable modeling for immersed boundary systems

Chenglin Li, Hang Xu, Jianting Chen, Yanfei Zhang

Comments 22 pages, 15 figures

English abstract

Accurately, efficiently, and stably computing complex fluid flows and their evolution near solid boundaries over long horizons remains challenging. Conventional numerical solvers require fine grids and small time steps to resolve near-wall dynamics, resulting in high computational costs, while purely data-driven surrogate models accumulate rollout errors and lack robustness under extrapolative conditions. To address these issues, this study extends existing neural PDE solvers by developing a physics-integrated differentiable framework for long-horizon prediction of immersed-boundary flows. A key design aspect of the framework includes an important improvement, namely the structural integration of physical principles into an end-to-end differentiable architecture incorporating a PDE-based intermediate velocity module and a multi-direct forcing immersed boundary module, both adhering to the pressure-projection procedure for incompressible flow computation. The computationally expensive pressure projection step is substituted with a learned implicit correction using ConvResNet blocks to reduce cost, and a sub-iteration strategy is introduced to separate the embedded physics module's stability requirement from the surrogate model's time step, enabling stable coarse-grid autoregressive rollouts with large effective time increments. The framework uses only single-step supervision for training, eliminating long-horizon backpropagation and reducing training time to under one hour on a single GPU. Evaluations on benchmark cases of flow past a stationary cylinder and a rotationally oscillating cylinder at Re=100 show the proposed model consistently outperforms purely data-driven, physics-loss-constrained, and coarse-grid numerical baselines in flow-field fidelity and long-horizon stability, while achieving an approximately 200-fold inference speedup over the high-resolution solver.

2603.16269 2026-03-18 cs.CV

FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition

Jinsheng Wei, Zhaodi Xu, Guanming Lu, Haoyu Chen, Jingjie Yan

2603.16264 2026-03-18 cs.AI

Adaptive Theory of Mind for LLM-based Multi-Agent Coordination

Chunjiang Mu, Ya Zeng, Qiaosheng Zhang, Kun Shao, Chen Chu, Hao Guo, Danyang Jia, Zhen Wang, Shuyue Hu

Comments Accepted by AAAI 2026

2603.16261 2026-03-18 cs.CV cs.AI

AW-MoE: All-Weather Mixture of Experts for Robust Multi-Modal 3D Object Detection

Hongwei Lin, Xun Huang, Chenglu Wen, Cheng Wang

2603.16258 2026-03-18 cs.CL

Is Semi-Automatic Transcription Useful in Corpus Creation? Preliminary Considerations on the KIParla Corpus

Martina Simonotti, Ludovica Pannitto, Eleonora Zucchini, Silvia Ballarè, Caterina Mauri

2603.16257 2026-03-18 cs.CV

Point-to-Mask: From Arbitrary Point Annotations to Mask-Level Infrared Small Target Detection

Weihua Gao, Wenlong Niu, Jie Tang, Man Yang, Jiafeng Zhang, Xiaodong Peng

2603.16256 2026-03-18 cs.CV

When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition

Xiaokun Sun, Yubo Wang, Haoyu Cao, Linli Xu

2603.16250 2026-03-18 cs.CV cs.AI

Visual Prompt Discovery via Semantic Exploration

Jaechang Kim, Yotaro Shimose, Zhao Wang, Kuang-Da Wang, Jungseul Ok, Shingo Takamatsu

English abstract

LVLMs encounter significant challenges in image understanding and visual reasoning, leading to critical perception failures. Visual prompts, which incorporate image manipulation code, have shown promising potential in mitigating these issues. Although this has emerged as a promising direction, previous methods for visual prompt generation have focused on tool selection rather than diagnosing and mitigating the root causes of LVLM perception failures. Because of the opacity and unpredictability of LVLMs, optimal visual prompts must be discovered through empirical experiments, which have relied on manual human trial-and-error. We propose an automated semantic exploration framework for discovering task-wise visual prompts. Our approach enables diverse yet efficient exploration through agent-driven experiments, minimizing human intervention and avoiding the inefficiency of per-sample generation. We introduce a semantic exploration algorithm named SEVEX, which addresses two major challenges of visual prompt exploration: (1) the distraction caused by lengthy, low-level code and (2) the vast, unstructured search space of visual prompts. Specifically, our method leverages an abstract idea space as a search space, a novelty-guided selection algorithm, and a semantic feedback-driven ideation process to efficiently explore diverse visual prompts based on empirical results. We evaluate SEVEX on the BlindTest and BLINK benchmarks, which are designed to assess LVLM perception. Experimental results demonstrate that SEVEX significantly outperforms baseline methods in task accuracy, inference efficiency, exploration efficiency, and exploration stability. Notably, our framework discovers sophisticated and counter-intuitive visual strategies that go beyond conventional tool usage, offering a new paradigm for enhancing LVLM perception through automated, task-wise visual prompts.

2603.16245 2026-03-18 cs.CV cs.CL

How to Utilize Complementary Vision-Text Information for 2D Structure Understanding

Jiancheng Dong, Pengyue Jia, Derong Xu, Jiawei Cheng, Jingyu Peng, Chao Zhang, Bowen Liu, Xin Sun, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiangyu Zhao

Comments 16 pages, 5 figures

2603.16244 2026-03-18 cs.CL

More Rounds, More Noise: Why Multi-Turn Review Fails to Improve Cross-Context Verification

Song Tae-Eun

Comments 10 pages, 2 figures

2603.16243 2026-03-18 cs.CV cs.AI

RASLF: Representation-Aware State Space Model for Light Field Super-Resolution

Zeqiang Wei, Kai Jin, Kuan Song, Xiuzhuang Zhou, Wenlong Chen, Min Xu

Comments 10 pages, 5 figures

2603.16241 2026-03-18 cs.CV

Exclusivity-Guided Mask Learning for Semi-Supervised Crowd Instance Segmentation and Counting

Jiyang Huang, Hongru Cheng, Wei Lin, Jia Wan, Antoni B. Chan

2603.16240 2026-03-18 cs.RO

Industrial cuVSLAM Benchmark & Integration

Charbel Abi Hana, Kameel Amareen, Mohamad Mostafa, Dmitry Slepichev, Hesam Rabeti, Zheng Wang, Mihir Acharya, Anthony Rizk

2603.16238 2026-03-18 cs.CV

PureCLIP-Depth: Prompt-Free and Decoder-Free Monocular Depth Estimation within CLIP Embedding Space

Ryutaro Miya, Kazuyoshi Fushinobu, Tatsuya Kawaguchi

Comments 12 pages, 4 figures

2603.16223 2026-03-18 cs.LG

Dual Consensus: Escaping from Spurious Majority in Unsupervised RLVR via Two-Stage Vote Mechanism

Kaixuan Du, Meng Cao, Hang Zhang, Yukun Wang, Xiangzhou Huang, Ni Li

Comments 10 pages, 5 figures

2603.16219 2026-03-18 cs.CL

SpecSteer: Synergizing Local Context and Global Reasoning for Efficient Personalized Generation

Hang Lv, Sheng Liang, Hao Wang, Yongyue Zhang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Enhong Chen

2603.16218 2026-03-18 cs.RO

Enabling Dynamic Tracking in Vision-Language-Action Models via Time-Discrete and Time-Continuous Velocity Feedforward

Johannes Hechtl, Philipp Schmitt, Georg von Wichert, Wolfram Burgard

2603.16211 2026-03-18 cs.CV

Leveling3D: Leveling Up 3D Reconstruction with Feed-Forward 3D Gaussian Splatting and Geometry-Aware Generation

Yiming Huang, Baixiang Huang, Beilei Cui, Chi Kit Ng, Long Bai, Hongliang Ren

Comments 26 pages, 10 figures

2603.16210 2026-03-18 cs.AI

MOSAIC: Composable Safety Alignment with Modular Control Tokens

Jingyu Peng, Hongyu Chen, Jiancheng Dong, Maolin Wang, Wenxi Li, Yuchen Li, Kai Zhang, Xiangyu Zhao

2603.16207 2026-03-18 cs.AI

Proactive Rejection and Grounded Execution: A Dual-Stage Intent Analysis Paradigm for Safe and Efficient AIoT Smart Homes

Xinxin Jin, Zhengwei Ni, Zhengguo Sheng, Victor C. M. Leung

2603.16206 2026-03-18 cs.LG cs.CL

Offline Exploration-Aware Fine-Tuning for Long-Chain Mathematical Reasoning

Yongyu Mu, Jiali Zeng, Fandong Meng, JingBo Zhu, Tong Xiao

Comments Working in process

2603.16200 2026-03-18 cs.LG

Online Semi-infinite Linear Programming: Efficient Algorithms via Function Approximation

Yiming Zong, Jiashuo Jiang

2603.16197 2026-03-18 cs.AI cs.CL

Are Large Language Models Truly Smarter Than Humans?

Eshwar Reddy M, Sourav Karmakar

Comments 15 pages, 2 figures, 7 tables

2603.16196 2026-03-18 cs.RO

PanguMotion: Continuous Driving Motion Forecasting with Pangu Transformers

Quanhao Ren, Yicheng Li, Nan Song