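arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.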

2604.03117 2026-04-06 cs.CV

Revealing Physical-World Semantic Vulnerabilities: Universal Adversarial Patches for Infrared Vision-Language Models

Chengyin Hu, Yuxian Dong, Yikun Guo, Xiang Chen, Junqi Wu, Jiahuan Long, Yiwei Wei, Tingsong Jiang, Wen Yao

Abstract

Infrared vision-language models (IR-VLMs) have emerged as a promising paradigm for multimodal perception in low-visibility environments, yet their robustness to adversarial attacks remains largely unexplored. Existing adversarial patch methods are mainly designed for RGB-based models in closed-set settings and are not readily applicable to the open-ended semantic understanding and physical deployment requirements of infrared VLMs. To bridge this gap, we propose Universal Curved-Grid Patch (UCGP), a universal physical adversarial patch framework for IR-VLMs. UCGP integrates Curved-Grid Mesh (CGM) parameterization for continuous, low-frequency, and deployable patch generation with a unified representation-driven objective that promotes subspace departure, topology disruption, and stealth. To improve robustness under real-world deployment and domain shift, we further incorporate Meta Differential Evolution and EOT-augmented TPS deformation modeling. Rather than manipulating labels or prompts, UCGP directly disrupts the visual representation space, weakening cross-modal semantic alignment. Extensive experiments demonstrate that UCGP consistently compromises semantic understanding across diverse IR-VLM architectures while maintaining cross-model transferability, cross-dataset generalization, real-world physical effectiveness, and robustness against defenses. These findings reveal a previously overlooked robustness vulnerability in current infrared multimodal systems.

2604.03114 2026-04-06 cs.CV cs.AI

Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning

Zhangyun Tan, Zeliang Zhang, Susan Liang, Yolo Yunlong Tang, Lisha Chen, Chenliang Xu

2604.03110 2026-04-06 cs.CL

Multi-Aspect Knowledge Distillation for Language Model with Low-rank Factorization

Zihe Liu, Yulong Mao, Jinan Xu, Xinrui Peng, Kaiyu Huang

2604.03098 2026-04-06 cs.LG cs.AI cs.CL

Co-Evolution of Policy and Internal Reward for Language Agents

Xinyu Wang, Hanwei Wu, Jingwei Song, Shuyuan Zhang, Jiayi Zhang, Fanqi Kong, Tung Sum Thomas Kwok, Xiao-Wen Chang, Yuyu Luo, Chenglin Wu, Bang Liu

Comments: 20 pages, 13 figures

2604.03096 2026-04-06 cs.RO

An Open-Source LiDAR and Monocular Off-Road Autonomous Navigation Stack

Rémi Marsal, Quentin Picard, Adrien Poiré, Sébastien Kerbourc'h, Thibault Toralba, Clément Yver, Alexandre Chapoutot, David Filliat

2604.03094 2026-04-06 cs.CV cs.AI

A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification

David Mike-Ewewie, Panhapiseth Lim, Priyanka Kumar

2604.03092 2026-04-06 cs.RO

Flash-Mono: Feed-Forward Accelerated Gaussian Splatting Monocular SLAM

Zicheng Zhang, Ke Wu, Xiangting Meng, Keyu Liu, Jieru Zhao, Wenchao Ding

Journal ref: International Conference on Learning Representations, 2026

Abstract

Monocular 3D Gaussian Splatting SLAM suffers from critical limitations in time efficiency, geometric accuracy, and multi-view consistency. These issues stem from the time-consuming $\textit{Train-from-Scratch}$ optimization and the lack of inter-frame scale consistency from single-frame geometry priors. We contend that a feed-forward paradigm, leveraging multi-frame context to predict Gaussian attributes directly, is crucial for addressing these challenges. We present Flash-Mono, a system composed of three core modules: a feed-forward prediction frontend, a 2D Gaussian Splatting mapping backend, and an efficient hidden-state-based loop closure module. We trained a recurrent feed-forward frontend model that progressively aggregates multi-frame visual features into a hidden state via cross attention and jointly predicts camera poses and per-pixel Gaussian properties. By directly predicting Gaussian attributes, our method bypasses the burdensome per-frame optimization required in optimization-based GS-SLAM, achieving a $\textbf{10x}$ speedup while ensuring high-quality rendering. The power of our recurrent architecture extends beyond efficient prediction. The hidden states act as compact submap descriptors, facilitating efficient loop closure and global $\mathrm{Sim}(3)$ optimization to mitigate the long-standing challenge of drift. For enhanced geometric fidelity, we replace conventional 3D Gaussian ellipsoids with 2D Gaussian surfels. Extensive experiments demonstrate that Flash-Mono achieves state-of-the-art performance in both tracking and mapping quality, highlighting its potential for embodied perception and real-time reconstruction applications. Project page: https://victkk.github.io/flash-mono.

2604.03072 2026-04-06 cs.CV

MI-Pruner: Crossmodal Mutual Information-guided Token Pruner for Efficient MLLMs

Jiameng Li, Aleksei Tiulpin, Matthew B. Blaschko

Comments: 9 pages

2604.03071 2026-04-06 cs.AI

Automatic Textbook Formalization

Fabian Gloeckle, Ahmad Rammal, Charles Arnal, Remi Munos, Vivien Cabannes, Gabriel Synnaeve, Amaury Hayat

Comments: 19 pages

2604.03069 2026-04-06 cs.CV

SparseSplat: Towards Applicable Feed-Forward 3D Gaussian Splatting with Pixel-Unaligned Prediction

Zicheng Zhang, Xiangting Meng, Ke Wu, Wenchao Ding

2604.03065 2026-04-06 cs.RO

Joint Prediction of Human Motions and Actions in Human-Robot Collaboration

Alessandra Bulanti, Alessandro Carfì, Fulvio Mastrogiovanni

Comments: 8 pages, 6 figures. Submitted to IEEE AIM 2026

2604.03064 2026-04-06 cs.CV

Gram-MMD: A Texture-Aware Metric for Image Realism Assessment

Joé Napolitano, Pascal Nguyen

Comments: 13 pages, 15 figures, 2 tables. Preprint

2604.03057 2026-04-06 cs.CL cs.AI

Querying Structured Data Through Natural Language Using Language Models

Hontan Valentin-Micu, Bunea Andrei-Alexandru, Tantaroudas Nikolaos Dimitrios, Popovici Dan-Matei

Comments: in publication

2604.03045 2026-04-06 cs.CV cs.MM

STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models

Linfeng Fan, Yuan Tian, Ziwei Li, Zhiwu Lu

Comments: Preprint

2604.03040 2026-04-06 cs.CV

QVAD: A Question-Centric Agentic Framework for Efficient and Training-Free Video Anomaly Detection

Lokman Bekit, Hamza Karim, Nghia T Nguyen, Yasin Yilmaz

2604.03023 2026-04-06 cs.RO

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

Siwei Ju, Jan Tauberschmidt, Oleg Arenz, Peter van Vliet, Jan Peters

Abstract

Learning high-performance control policies that remain consistent with expert behavior is a fundamental challenge in robotics. Reinforcement learning can discover high-performing strategies but often departs from desirable human behavior, whereas imitation learning is limited by demonstration quality and struggles to improve beyond expert data. We propose a behavior-constrained reinforcement learning framework that improves beyond demonstrations while explicitly controlling deviation from expert behavior. Because expert-consistent behavior in dynamic control is inherently trajectory-level, we introduce a receding-horizon predictive mechanism that models short-term future trajectories and provides look-ahead rewards during training. To account for the natural variability of human behavior under disturbances and changing conditions, we further condition the policy on reference trajectories, allowing it to represent a distribution of expert-consistent behaviors rather than a single deterministic target. Empirically, we evaluate the approach in high-fidelity race car simulation using data from professional drivers, a domain characterized by extreme dynamics and narrow performance margins. The learned policies achieve competitive lap times while maintaining close alignment with expert driving behavior, outperforming baseline methods in both performance and imitation quality. Beyond standard benchmarks, we conduct human-grounded evaluation in a driver-in-the-loop simulator and show that the learned policies reproduce setup-dependent driving characteristics consistent with the feedback of top-class professional race drivers. These results demonstrate that our method enables learning high-performance control policies that are both optimal and behavior-consistent, and can serve as reliable surrogates for human decision-making in complex control systems.

2604.03016 2026-04-06 cs.AI

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Qianshan Wei, Yishan Yang, Siyi Wang, Jinglin Chen, Binyu Wang, Jiaming Wang, Shuang Chen, Zechen Li, Yang Shi, Yuqi Tang, Weining Wang, Yi Yu, Chaoyou Fu, Qi Li, Yi-Fan Zhang

2604.03015 2026-04-06 cs.LG math.PR stat.ML

Generating DDPM-based Samples from Tilted Distributions

Himadri Mandal, Dhruman Gupta, Rushil Gupta, Sarvesh Ravichandran Iyer, Agniv Bandyopadhyay, Achal Bassamboo, Varun Gupta, Sandeep Juneja

Comments: 33 pages, 4 figures

2604.03008 2026-04-06 cs.RO

Asymptotically-Bounded 3D Frontier Exploration enhanced with Bayesian Information Gain

John Lewis, Meysam Basiri, Pedro U. Lima

Comments: Submitted for review to IEEE Robotics and Automation Letters (RA-L)

2604.03006 2026-04-06 cs.RO

A Flow Matching Framework for Soft-Robot Inverse Dynamics

Hang Yang, Fangju Yang, Yangming Zhang, Ibrahim Alsarraj, Yuhao Wang, Zhenye Luo, Zixi Chen, Ke Wu

2604.03004 2026-04-06 cs.CL cs.AI

R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning

Wanlong Liu, Bo Zhang, Chenliang Li, Shaopeng Lai, Yuning Wu, Xuanyu Lei, Ming Yan

Comments: 31 pages

2604.03002 2026-04-06 cs.CV

Explicit Time-Frequency Dynamics for Skeleton-Based Gait Recognition

Seoyeon Ko, Yeojin Song, Egene Chung, Luca Quagliato, Taeyong Lee, Junhyug Noh

Comments: 5 pages, 1 figure, to appear in ICASSP 2026

2604.02990 2026-04-06 cs.LG cs.AI cs.DC

FedSQ: Optimized Weight Averaging via Fixed Gating

Cristian Pérez-Corral, Jose I. Mestre, Alberto Fernández-Hernández, Manuel F. Dolz, José Duato, Enrique S. Quintana-Ortí

2604.02986 2026-04-06 cs.LG cs.AI cs.CL

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness

Shinnosuke Ono, Johannes Ackermann, Soichiro Nishimori, Takashi Ishida, Masashi Sugiyama

Comments: 27 pages, 7 figures

2604.02979 2026-04-06 cs.CV

Not All Frames Deserve Full Computation: Accelerating Autoregressive Video Generation via Selective Computation and Predictive Extrapolation

Hanshuai Cui, Zhiqing Tang, Zhi Yao, Fanshuai Meng, Weijia Jia, Wei Zhao

2604.02977 2026-04-06 cs.CV

Effect of Input Resolution on Retinal Vessel Segmentation Performance: An Empirical Study Across Five Datasets

Amarnath R

Comments: 12 pages, 4 figures, 3 tables

2604.02973 2026-04-06 cs.CV

Exploring Motion-Language Alignment for Text-driven Motion Generation

Ruxi Gu, Zilei Wang, Wei Wang

Comments: 10 pages, 8 figures

2604.02972 2026-04-06 cs.CL

NeuReasoner: Towards Explainable, Controllable, and Unified Reasoning via Mixture-of-Neurons

Haonan Dong, Kehan Jiang, Haoran Ye, Wenhao Zhu, Zhaolu Kang, Guojie Song

2604.02971 2026-04-06 cs.AI

InfoSeeker: A Scalable Hierarchical Parallel Agent Framework for Web Information Seeking

Ka Yiu Lee, Yuxuan Huang, Zhiyuan He, Huichi Zhou, Weilin Luo, Kun Shao, Meng Fang, Jun Wang

2604.02967 2026-04-06 cs.AI cs.CL

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

Kehan Jiang, Haonan Dong, Zhaolu Kang, Zhengzhou Zhu, Guojie Song