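arXivDaily: a daily arXiv digest that mirrors the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical & systems engineering.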

2603.28569 2026-03-31 cs.LG cs.AI cs.IR cs.PF

CirrusBench: Evaluating LLM-based Agents Beyond Correctness in Real-World Cloud Service Environments

Yi Yu, Guangquan Hu, Chenghuang Shen, Xingyan Liu, Jing Gu, Hangyi Sun, Junzhuo Ma, Weiting Liu, Jianfeng Liu, Mingyue Pu, Yu Wang, Zhengdong Xiao, Rui Xie, Longjiu Luo, Qianrong Wang, Gurong Cui, Honglin Qiao, Wenlian Lu

Comments Submitted for SIGKDD 2026

2603.28568 2026-03-31 cs.CV

XSPA: Crafting Imperceptible X-Shaped Sparse Adversarial Perturbations for Transferable Attacks on VLMs

Chengyin Hu, Jiaju Han, Xuemeng Sun, Qike Zhang, Yiwei Wei, Ang Li, Chunlei Meng, Xiang Chen, Jiahuan Long

Abstract

Vision-language models (VLMs) rely on a shared visual-textual representation space to perform tasks such as zero-shot classification, image captioning, and visual question answering (VQA). While this shared space enables strong cross-task generalization, it may also introduce a common vulnerability: small visual perturbations can propagate through the shared embedding space and cause correlated semantic failures across tasks. This risk is particularly important in interactive and decision-support settings, yet it remains unclear whether VLMs are robust to highly constrained, sparse, and geometrically fixed perturbations. To address this question, we propose X-shaped Sparse Pixel Attack (XSPA), an imperceptible structured attack that restricts perturbations to two intersecting diagonal lines. Compared with dense perturbations or flexible localized patches, XSPA operates under a much stricter attack budget and thus provides a more stringent test of VLM robustness. Within this sparse support, XSPA jointly optimizes a classification objective, cross-task semantic guidance, and regularization on perturbation magnitude and along-line smoothness, inducing transferable misclassification as well as semantic drift in captioning and VQA while preserving visual subtlety. Under the default setting, XSPA modifies only about 1.76% of image pixels. Experiments on the COCO dataset show that XSPA consistently degrades performance across all three tasks. Zero-shot accuracy drops by 52.33 points on OpenAI CLIP ViT-L/14 and 67.00 points on OpenCLIP ViT-B/16, while GPT-4-evaluated caption consistency decreases by up to 58.60 points and VQA correctness by up to 44.38 points. These results suggest that even highly sparse and visually subtle perturbations with fixed geometric priors can substantially disrupt cross-task semantics in VLMs, revealing a notable robustness gap in current multimodal systems.
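
The abstract describes a sparse support (two intersecting diagonals) plus penalties on perturbation magnitude and along-line smoothness. The following is a minimal pure-Python sketch of those pieces under stated assumptions: a single-channel perturbation, one-pixel-wide corner-to-corner lines, and hypothetical names and weights (diagonal_paths, x_support, xspa_objective, lam_sem, lam_mag, lam_smooth). The model-dependent classification and cross-task semantic losses are passed in as plain numbers. This is an illustration, not the authors' implementation.

```python
# Hypothetical sketch of an X-shaped sparse perturbation support and the
# magnitude / along-line smoothness regularizers described for XSPA.
# Assumptions (not from the paper): single-channel perturbation, 1-pixel-wide
# lines, illustrative weights. Model-dependent losses are passed in as floats.

def diagonal_paths(height, width):
    """Return the main and anti-diagonal as ordered lists of (row, col)."""
    scale = (width - 1) / max(height - 1, 1)
    main = [(r, round(r * scale)) for r in range(height)]
    anti = [(r, (width - 1) - c) for r, c in main]
    return main, anti

def x_support(height, width):
    """Set of pixels the attack may modify: the union of both diagonals."""
    main, anti = diagonal_paths(height, width)
    return set(main) | set(anti)

def project_to_support(delta, support):
    """Zero every perturbation value that lies off the X-shaped support."""
    return [[v if (r, c) in support else 0.0 for c, v in enumerate(row)]
            for r, row in enumerate(delta)]

def magnitude_penalty(delta, support):
    """Squared L2 norm of the perturbation restricted to the support."""
    return sum(delta[r][c] ** 2 for r, c in support)

def along_line_smoothness(delta, path):
    """Sum of squared differences between consecutive pixels on one line."""
    return sum((delta[r1][c1] - delta[r0][c0]) ** 2
               for (r0, c0), (r1, c1) in zip(path, path[1:]))

def xspa_objective(cls_loss, sem_loss, delta, height, width,
                   lam_sem=1.0, lam_mag=0.01, lam_smooth=0.1):
    """Weighted sum of the four terms; lower is assumed better for the attacker.

    cls_loss and sem_loss stand in for the model-dependent classification and
    cross-task semantic objectives, already signed so the attacker minimizes them.
    """
    support = x_support(height, width)
    smooth = sum(along_line_smoothness(delta, p)
                 for p in diagonal_paths(height, width))
    return (cls_loss + lam_sem * sem_loss
            + lam_mag * magnitude_penalty(delta, support)
            + lam_smooth * smooth)

if __name__ == "__main__":
    h = w = 224
    support = x_support(h, w)
    print(len(support), f"{len(support) / (h * w):.2%}")
```

With one-pixel lines on a 224×224 image the support is 448 pixels, about 0.89% of the image. The abstract does not state the line width or input resolution behind its 1.76% figure, so thicker lines or a different resolution would be needed to match it.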

2603.28565 2026-03-31 cs.RO cs.CV

StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation

Yiran Shi, Dongqi Guo, Tianchen Zhao, Feng Gao, Liangzhi Shi, Chao Yu, ZhiJian Mo, Qihua Xiao, XiaoShuai Peng, Qingmin Liao, Yu Wang

2603.28560 2026-03-31 cs.CV

Curriculum-Guided Myocardial Scar Segmentation for Ischemic and Non-ischemic Cardiomyopathy

Nivetha Jayakumar, Jonathan Pan, Shuo Wang, Bishow Paudel, Nisha Hosadurg, Cristiane C. Singulane, Sivam Bhatt, Amit R. Patel, Miaomiao Zhang

2603.28558 2026-03-31 cs.AI

T-Norm Operators for EU AI Act Compliance Classification: An Empirical Comparison of Lukasiewicz, Product, and Gödel Semantics in a Neuro-Symbolic Reasoning System

Adam Laabs

Comments 11 pages, 8 tables, open-source code and dataset at https://github.com/TriStiX-LS/LggT-core

2603.28555 2026-03-31 cs.CV cs.AI

Domain-Invariant Prompt Learning for Vision-Language Models

Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt

2603.28550 2026-03-31 cs.CV

MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures

Tim Strohmeyer, Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Ahmed Nassar, Peter Staar

Comments 15 pages, to be published in CVPR 2026

2603.28548 2026-03-31 cs.CV

Seen2Scene: Completing Realistic 3D Scenes with Visibility-Guided Flow

Quan Meng, Yujin Chen, Lei Li, Matthias Nießner, Angela Dai

Comments Project page: https://quan-meng.github.io/projects/seen2scene/ Video: https://www.youtube.com/watch?v=5qJYLjMsJe8

2603.28547 2026-03-31 cs.CV

GEditBench v2: A Human-Aligned Benchmark for General Image Editing

Zhangqi Jiang, Zheng Sun, Xianfang Zeng, Yufeng Yang, Xuanyang Zhang, Yongliang Wu, Wei Cheng, Gang Yu, Xu Yang, Bihan Wen

Comments 30 pages, 24 figures

2603.28545 2026-03-31 cs.RO cs.CV

ManipArena: Comprehensive Real-world Evaluation of Reasoning-Oriented Generalist Robot Manipulation

Yu Sun, Meng Cao, Ping Yang, Rongtao Xu, Yunxiao Yan, Runze Xu, Liang Ma, Roy Gan, Andy Zhai, Qingxuan Chen, Zunnan Xu, Hao Wang, Jincheng Yu, Lucy Liang, Qian Wang, Ivan Laptev, Ian D Reid, Xiaodan Liang

Comments Technical report for CVPR 2026 Challenge ManipArena

2603.28542 2026-03-31 cs.RO

Feel Robot Feels: Tactile Feedback Array Glove for Dexterous Manipulation

Feiyu Jia, Xiaojie Niu, Sizhe Yang, Qingwei Ben, Tao Huang, Feng zhao, Jingbo Wang, Jiangmiao Pang

Comments 13 pages, 16 figures

2603.28534 2026-03-31 cs.CL physics.data-an

Compressing Transformer Language Models via Matrix Product Operator Decomposition: A Case Study on PicoGPT

Younes Javanmard, Tanmoy Pandit, Masoud Mardani

2603.28515 2026-03-31 cs.CL

EarlySciRev: A Dataset of Early-Stage Scientific Revisions Extracted from LaTeX Writing Traces

Léane Jourdan, Julien Aubert-Béduchaud, Yannis Chupin, Marah Baccari, Florian Boudin

Comments Accepted to NSLP@LREC

2603.28512 2026-03-31 cs.CL

TIEG-Youpu Solution for NeurIPS 2022 WikiKG90Mv2-LSC

Feng Nie, Zhixiu Ye, Sifa Xie, Shuang Wu, Xin Yuan, Liang Yao, Jiazhen Peng, Xu Cheng

Comments 6 pages, 1 figure

2603.28508 2026-03-31 cs.CV

Generalizable Detection of AI Generated Images with Large Models and Fuzzy Decision Tree

Fei Wu, Guanghao Ding, Zijian Niu, Zhenrui Wang, Lei Yang, Zhuosheng Zhang, Shilin Wang

2603.28503 2026-03-31 cs.CV

Bridging the Geometry Mismatch: Frequency-Aware Anisotropic Serialization for Thin-Structure SSMs

Jin Bai, Huiyao Zhang, Qi Wen, Ningyang Li, Shengyang Li, Atta ur Rahman, Xiaolin Tian

2603.28499 2026-03-31 cs.LG cs.AI cs.GT

Next-Token Prediction and Regret Minimization

Mehryar Mohri, Clayton Sanford, Jon Schneider, Kiran Vodrahalli, Yifan Wu

2603.28493 2026-03-31 cs.CV

ConceptWeaver: Weaving Disentangled Concepts with Flow

Jintao Chen, Aiming Hao, Xiaoqing Chen, Chengyu Bai, Chubin Chen, Yanxun Li, Jiahong Wu, Xiangxiang Chu, Shanghang Zhang

2603.28480 2026-03-31 cs.CV

INSID3: Training-Free In-Context Segmentation with DINOv3

Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, Stefan Roth

Comments CVPR 2026. Project page: https://visinf.github.io/INSID3

2603.28475 2026-03-31 cs.RO

Tac2Real: Reliable and GPU Visuotactile Simulation for Online Reinforcement Learning and Zero-Shot Real-World Deployment

Ningyu Yan, Shuai Wang, Xing Shen, Hui Wang, Hanqing Wang, Yang Xiang, Jiangmiao Pang

Comments 27 pages, 12 figures

2603.28474 2026-03-31 cs.CV cs.AI

CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains

Wenhan Wang, Zhixiang Zhou, Zhongtian Ma, Yanzhu Chen, Ziyu Lin, Hao Sheng, Pengfei Liu, Honglin Ma, Wenqi Shao, Qiaosheng Zhang, Yu Qiao

2603.28467 2026-03-31 cs.RO

Communications-Aware NMPC for Multi-Rotor Aerial Relay Networks Under Jamming Interference

Giuseppe Silano, Daniel Bonilla Licea, Davide Liuzza, Antonio Franchi, Martin Saska

Comments This work has been submitted to the IEEE for possible publication

2603.28466 2026-03-31 cs.CV stat.ML

Post-hoc Self-explanation of CNNs

Ahcène Boubekki, Line H. Clemmensen

2603.28455 2026-03-31 cs.LG cs.AI cs.CV cs.DC stat.ML

FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation

Tiantian Wang, Xiang Xiang, Simon S. Du

2603.28444 2026-03-31 cs.AI cs.CL

Entropic Claim Resolution: Uncertainty-Driven Evidence Selection for RAG

Davide Di Gioia

Comments Preprint

2603.28436 2026-03-31 cs.SD

A Probabilistic Generative Model for Spectral Speech Enhancement

Marco Hidalgo-Araya, Raphaël Trésor, Bart Van Erp, Wouter W. L. Nuijten, Thijs Van De Laar, Bert De Vries

Comments Submitted to the IEEE Open Journal of Signal Processing

2603.28430 2026-03-31 cs.LG cs.CL

IsoQuant: Hardware-Aligned SO(4) Isoclinic Rotations for LLM KV Cache Compression

Zhongping Ji

Comments 11 pages

2603.28427 2026-03-31 cs.RO cs.CV

Tele-Catch: Adaptive Teleoperation for Dexterous Dynamic 3D Object Catching

Weiguang Zhao, Junting Dong, Rui Zhang, Kailin Li, Qin Zhao, Kaizhu Huang

2603.28426 2026-03-31 cs.CL cs.SC

Structural-Ambiguity-Aware Translation from Natural Language to Signal Temporal Logic

Kosei Fushimi, Kazunobu Serizawa, Junya Ikemoto, Kazumune Hashimoto

2603.28425 2026-03-31 cs.CV

From Pixels to Reality: Physical-Digital Patch Attacks on Real-World Camera

Victoria Leonenkova, Ekaterina Shumitskaya, Dmitriy Vatolin, Anastasia Antsiferova

Comments Accepted to the PerCom 2026 Demo