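arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.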

2604.08048 2026-04-10 cs.CV

Guiding a Diffusion Model by Swapping Its Tokens

Weijia Zhang, Yuehao Liu, Shanyan Guan, Wu Ran, Yanhao Ge, Wei Li, Chao Ma

Comments Accepted by CVPR 2026 (Oral)

English abstract

Classifier-Free Guidance (CFG) is a widely used inference-time technique to boost the image quality of diffusion models. Yet, its reliance on text conditions prevents its use in unconditional generation. We propose a simple method to enable CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction via simple token swap operations, and use the direction between it and the clean prediction to steer sampling towards higher-fidelity distributions. In practice, we swap pairs of most semantically dissimilar token latents in either spatial or channel dimensions. Unlike existing methods that apply perturbation in a global or less constrained manner, our approach selectively exchanges and recomposes token latents, allowing finer control over perturbation and its influence on generated samples. Experiments on MS-COCO 2014, MS-COCO 2017, and ImageNet datasets demonstrate that the proposed Self-Swap Guidance (SSG), when applied to popular diffusion models, outperforms previous condition-free methods in image fidelity and prompt alignment under different set-ups. Its fine-grained perturbation granularity also improves robustness, reducing side-effects across a wider range of perturbation strengths. Overall, SSG extends CFG to a broader scope of applications including both conditional and unconditional generation, and can be readily inserted into any diffusion model as a plug-in to gain immediate improvements.
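
The swap-and-steer idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`most_dissimilar_pair`, `swap_tokens`, `guided_prediction`, `toy_denoiser`) are hypothetical, cosine similarity as the dissimilarity measure and the swap of a single pair are assumptions, and the extrapolation form `clean + scale * (clean - perturbed)` is assumed by analogy with CFG-style guidance.

```python
import math

def cosine(u, v):
    """Cosine similarity between two token vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def most_dissimilar_pair(tokens):
    """Return indices (i, j) of the pair with the lowest cosine similarity."""
    best_pair, best_sim = None, float("inf")
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            sim = cosine(tokens[i], tokens[j])
            if sim < best_sim:
                best_pair, best_sim = (i, j), sim
    return best_pair

def swap_tokens(tokens, pair):
    """Return a copy of the token list with the two indexed tokens exchanged."""
    i, j = pair
    swapped = [list(t) for t in tokens]
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

def guided_prediction(clean, perturbed, scale):
    """Assumed CFG-like rule: move away from the perturbed prediction."""
    return [
        [c + scale * (c - p) for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(clean, perturbed)
    ]

def toy_denoiser(tokens):
    """Hypothetical position-dependent stand-in for a diffusion model."""
    return [[(k + 1) * x for x in t] for k, t in enumerate(tokens)]

tokens = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
pair = most_dissimilar_pair(tokens)            # tokens 0 and 2 point in opposite directions
clean = toy_denoiser(tokens)
perturbed = toy_denoiser(swap_tokens(tokens, pair))
guided = guided_prediction(clean, perturbed, scale=0.5)
```

In this toy setting only the two swapped positions are changed by the guidance; the untouched token keeps its clean prediction because its perturbed and clean outputs coincide.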

2604.08045 2026-04-10 cs.CV

Adapting Foundation Models for Annotation-Efficient Adnexal Mass Segmentation in Cine Images

Francesca Fati, Alberto Rota, Adriana V. Gregory, Anna Catozzo, Maria C. Giuliano, Mrinal Dhar, Luigi De Vitis, Annie T. Packard, Francesco Multinu, Elena De Momi, Carrie L. Langstraat, Timothy L. Kline

2604.08042 2026-04-10 cs.CV cs.AI

3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

Hongcan Xiao, Xinyue Xiao, Yilin Wang, Yue Zhang, Yonggang Qi

Comments CVPR 2026 Highlight

2604.08038 2026-04-10 cs.CV

Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection

Jun Li, Yingying Shi, Zhixuan Ruan, Nan Guo, Jianhua Xu

2604.08036 2026-04-10 cs.LG cs.RO

PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC

Mohsen Amiri, Mohsen Amiri, Ali Beikmohammadi, Sindri Magnússon, Mehdi Hosseinzadeh

Comments 8 pages, 3 figures

2604.08034 2026-04-10 cs.CV

Rotation Equivariant Convolutions in Deformable Registration of Brain MRI

Arghavan Rezvani, Kun Han, Anthony T. Wu, Pooya Khosravi, Xiaohui Xie

Comments Accepted at the 2026 International Symposium on Biomedical Imaging (ISBI) Poster 4-page paper presentation

2604.08033 2026-04-10 cs.AI cs.MA cs.NI

IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling

Zhaomeng Zhou, Lan Zhang, Junyang Wang, Mu Yuan, Junda Lin, Jinke Song

Comments To appear in ACM MobiCom 2026; 13 pages, 12 figures

2604.08032 2026-04-10 cs.AI cs.RO

"Why This Avoidance Maneuver?" Contrastive Explanations in Human-Supervised Maritime Autonomous Navigation

Joel Jose, Andreas Madsen, Andreas Brandsæter, Tor A. Johansen, Erlend M. Coates

Comments Submitted to IEEE Intelligent Transportation Systems Conference (ITSC) 2026

2604.08031 2026-04-10 cs.RO cs.CV

Open-Ended Instruction Realization with LLM-Enabled Multi-Planner Scheduling in Autonomous Vehicles

Jiawei Liu, Xun Gong, Fen Fang, Muli Yang, Bohao Qu, Yunfeng Hu, Hong Chen, Xulei Yang, Qing Guo

2604.08030 2026-04-10 cs.LG cs.AI

From Universal to Individualized Actionability: Revisiting Personalization in Algorithmic Recourse

Lena Marie Budde, Ayan Majumdar, Richard Uth, Markus Langer, Isabel Valera

Comments 27 pages, 8 figures, 6 tables

2604.08016 2026-04-10 cs.AI cs.LG

Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs

Moein Salimi, Shaygan Adim, Danial Parnian, Nima Alighardashi, Mahdi Jafari Siavoshani, Mohammad Hossein Rohban

English abstract

Regardless of its foundational role in human discovery and sense-making, abductive reasoning--the inference of the most plausible explanation for an observation--has been relatively underexplored in Large Language Models (LLMs). Despite the rapid advancement of LLMs, the exploration of abductive reasoning and its diverse facets has thus far been disjointed rather than cohesive. This paper presents the first survey of abductive reasoning in LLMs, tracing its trajectory from philosophical foundations to contemporary AI implementations. To address the widespread conceptual confusion and disjointed task definitions prevalent in the field, we establish a unified two-stage definition that formally categorizes prior work. This definition disentangles abduction into Hypothesis Generation, where models bridge epistemic gaps to produce candidate explanations, and Hypothesis Selection, where the generated candidates are evaluated and the most plausible explanation is chosen. Building upon this foundation, we present a comprehensive taxonomy of the literature, categorizing prior work based on their abductive tasks, datasets, underlying methodologies, and evaluation strategies. In order to ground our framework empirically, we conduct a compact benchmark study of current LLMs on abductive tasks, together with targeted comparative analyses across model sizes, model families, evaluation styles, and the distinct generation-versus-selection task typologies. Moreover, by synthesizing recent empirical results, we examine how LLM performance on abductive reasoning relates to deductive and inductive tasks, providing insights into their broader reasoning capabilities. Our analysis reveals critical gaps in current approaches--from static benchmark design and narrow domain coverage to narrow training frameworks and limited mechanistic understanding of abductive processes...

2604.08009 2026-04-10 cs.RO

AgiPIX: Bridging Simulation and Reality in Indoor Aerial Inspection

Sasanka Kuruppu Arachchige, Juan Jose Garcia, Changda Tian, Lauri Suomela, Panos Trahanias, Adriana Tapus, Joni-Kristian Kämäräinen

Comments Submitted for ICUAS 2026, 9 pages, 11 figures

2604.08008 2026-04-10 cs.CV cs.AI cs.LG

SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving

Felix Embacher, Jonas Uhrig, Marius Cordts, Markus Enzweiler

Comments To be published in CVPR 2026

2604.08005 2026-04-10 cs.LG

Preference Redirection via Attention Concentration: An Attack on Computer Use Agents

Dominik Seip, Matthias Hein

2604.08004 2026-04-10 cs.AI

Evaluating Counterfactual Explanation Methods on Incomplete Inputs

Francesco Leofante, Daniel Neider, Mustafa Yalçıner

2604.08001 2026-04-10 cs.LG cs.AI stat.ML

The ecosystem of machine learning competitions: Platforms, participants, and their impact on AI development

Ioannis Nasios

2604.08000 2026-04-10 cs.AI cs.CL cs.CV cs.HC cs.MA

PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory

Zhifei Xie, Zongzheng Hu, Fangda Ye, Xin Zhang, Haobo Chai, Zihang Liu, Pengcheng Wu, Guibin Zhang, Yue Liao, Xiaobin Hu, Deheng Ye, Chunyan Miao, Shuicheng Yan

Comments Technical report; Work in progress

2604.07999 2026-04-10 cs.LG

Benchmarking Deep Learning for Future Liver Remnant Segmentation in Colorectal Liver Metastasis

Anthony T. Wu, Arghavan Rezvani, Kela Liu, Roozbeh Houshyar, Pooya Khosravi, Whitney Li, Xiaohui Xie

Comments Accepted at the 2026 International Symposium on Biomedical Imaging (ISBI) Oral 4-page paper presentation

2604.07997 2026-04-10 cs.CV

Few-Shot Incremental 3D Object Detection in Dynamic Indoor Environments

Yun Zhu, Jianjun Qian, Jian Yang, Jin Xie, Na Zhao

Comments Accepted by CVPR 2026

2604.07991 2026-04-10 cs.CV cs.MM

MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models

Zile Guo, Zhan Chen, Enze Zhu, Kan Wei, Yongkang Zou, Xiaoxuan Liu, Lei Wang

English abstract

Recent advances in world models have demonstrated strong capabilities in simulating physical reality, making them an increasingly important foundation for embodied intelligence. For UAV agents in particular, accurate prediction of complex 3D dynamics is essential for autonomous navigation and robust decision-making in unconstrained environments. However, under the highly dynamic camera trajectories typical of UAV views, existing world models often struggle to maintain spatiotemporal physical consistency. A key reason lies in the distribution bias of current training data: most existing datasets exhibit restricted 2.5D motion patterns, such as ground-constrained autonomous driving scenes or relatively smooth human-centric egocentric videos, and therefore lack realistic high-dynamic 6-DoF UAV motion priors. To address this gap, we present MotionScape, a large-scale real-world UAV-view video dataset with highly dynamic motion for world modeling. MotionScape contains over 30 hours of 4K UAV-view videos, totaling more than 4.5M frames. This novel dataset features semantically and geometrically aligned training samples, where diverse real-world UAV videos are tightly coupled with accurate 6-DoF camera trajectories and fine-grained natural language descriptions. To build the dataset, we develop an automated multi-stage processing pipeline that integrates CLIP-based relevance filtering, temporal segmentation, robust visual SLAM for trajectory recovery, and large-language-model-driven semantic annotation. Extensive experiments show that incorporating such semantically and geometrically aligned annotations effectively improves the ability of existing world models to simulate complex 3D dynamics and handle large viewpoint shifts, thereby benefiting decision-making and planning for UAV agents in complex environments. The dataset is publicly available at https://github.com/Thelegendzz/MotionScape

2604.07986 2026-04-10 cs.CV

DP-DeGauss: Dynamic Probabilistic Gaussian Decomposition for Egocentric 4D Scene Reconstruction

Tingxi Chen, Zhengxue Cheng, Houqiang Zhong, Su Wang, Rong Xie, Li Song

2604.07981 2026-04-10 cs.CL cs.AI cs.LG

A Decomposition Perspective to Long-context Reasoning for LLMs

Yanling Xiao, Huaibing Xie, Guoliang Zhao, Shihan Dou, Shaolei Wang, Yiting Liu, Nantao Zheng, Cheng Zhang, Pluto Zhou, Zhisong Zhang, Lemao Liu

2604.07980 2026-04-10 cs.CV

Object-Centric Stereo Ranging for Autonomous Driving: From Dense Disparity to Census-Based Template Matching

Qihao Huang

Comments 10 pages, 4 figures

2604.07973 2026-04-10 cs.AI

How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace

Baining Zhao, Ziyou Wang, Jianjie Fang, Zile Zhou, Yanggang Xu, Yatai Ji, Jiacheng Xu, Qian Zhang, Weichen Zhang, Chen Gao, Xinlei Chen

2604.07966 2026-04-10 cs.CV

Lighting-grounded Video Generation with Renderer-based Agent Reasoning

Ziqi Cai, Taoyu Yang, Zheng Chang, Si Li, Han Jiang, Shuchen Weng, Boxin Shi

Comments Accepted to CVPR 2026

2604.07965 2026-04-10 cs.CV cs.AI cs.LG

DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing

Gyanendra Das, Sai Satyam Jena

Comments Accepted at CVPR 2026

English abstract

Model editing aims to update a model's knowledge, adding new concepts and changing relevant information, without retraining. Lifelong editing is a challenging task, prone to disrupting previously learned concepts, especially for Vision Language Models (VLMs), because sequential edits can lead to degraded reasoning and cross-modal misalignment. Existing VLM knowledge editing methods based on gated adapters, activation edits, and parameter merging techniques address the catastrophic forgetting seen in full fine-tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with unrelated concepts. We hypothesize that this instability persists because current methods algorithmically control edits via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA), which by design mitigates this limitation by decomposing the representation space into a set of orthogonal semantic subspaces and proposing edits only in those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision-language representations. This process structurally isolates concepts, enabling precise, non-interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi-term loss function for maintaining task fidelity, edit locality, and cross-modal alignment. With the base model frozen, our method achieves 98 percent single-edit success, remains over 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and achieves the best backward transfer (BWT) scores on continual instruction tuning benchmarks. Extensive experiments demonstrate DSCA's state-of-the-art stability and knowledge retention capability in continual lifelong editing across various datasets and benchmarks.
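
The claim that edits confined to one orthogonal subspace do not disturb the others can be shown with a small sketch. It is not DSCA itself: the clustering and PCA steps are omitted, the two bases are hand-picked orthonormal vectors, and the names `project` and `apply_edit` are hypothetical.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(vec, basis):
    """Project vec onto the span of an orthonormal basis."""
    out = [0.0] * len(vec)
    for b in basis:
        coeff = dot(vec, b)
        out = [o + coeff * x for o, x in zip(out, b)]
    return out

def apply_edit(representation, delta, basis):
    """Add only the part of delta that lies inside the chosen subspace."""
    local_delta = project(delta, basis)
    return [r + d for r, d in zip(representation, local_delta)]

# Two mutually orthogonal subspaces of a 3-D toy representation space
# (hand-picked here; the abstract obtains them via clustering and PCA).
subspace_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
subspace_b = [[0.0, 0.0, 1.0]]

representation = [1.0, 2.0, 3.0]
delta = [0.5, -1.0, 4.0]  # a raw edit that would also disturb subspace_b

edited = apply_edit(representation, delta, subspace_a)

# Coordinates along subspace_b before and after the edit
before_b = [dot(representation, b) for b in subspace_b]
after_b = [dot(edited, b) for b in subspace_b]
```

Because the edit is projected onto `subspace_a` before being applied, the coordinate along `subspace_b` is identical before and after, which is the structural isolation the abstract describes.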

2604.07964 2026-04-10 cs.AI cs.LG

Are we still able to recognize pearls? Machine-driven peer review and the risk to creativity: An explainable RAG-XAI detection framework with markers extraction

Alin-Gabriel Văduva, Simona-Vasilica Oprea, Adela Bâra

2604.07963 2026-04-10 cs.CL cs.AI cs.LG

Rethinking Data Mixing from the Perspective of Large Language Models

Yuanjian Xu, Tianze Sun, Changwei Xu, XinLong Zhao, Jianing Hao, Ran Chen, Yang Liu, Ruijie Xu, Stephen Chen, Guang Zhang

2604.07962 2026-04-10 cs.LG

Is your algorithm unlearning or untraining?

Eleni Triantafillou, Ahmed Imtiaz Humayun, Monica Ribero, Alexander Matt Turner, Michael C. Mozer, Georgios Kaissis

2604.07958 2026-04-10 cs.CV

ImVideoEdit: Image-learning Video Editing via 2D Spatial Difference Attention Blocks

Jiayang Xu, Fan Zhuo, Majun Zhang, Changhao Pan, Zehan Wang, Siyu Chen, Xiaoda Yang, Tao Jin, Zhou Zhao