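arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.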

2604.11711 2026-04-14 cs.CV

Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models

Nhan Ho, Luu Le, Thanh-Huy Nguyen, Thien Nguyen, Xiaofeng Liu, Ulas Bagci

Comments Accepted at CV4Clinic, CVPR 2026. 10 pages, 4 figures

Abstract

Occlusion, where target structures are partially hidden by surgical instruments or overlapping tissues, remains a critical yet underexplored challenge for foundation segmentation models in clinical endoscopy. We introduce OccSAM-Bench, a benchmark designed to systematically evaluate SAM-family models under controlled, synthesized surgical occlusion. Our framework simulates two occlusion types (i.e., surgical tool overlay and cutout) across three calibrated severity levels on three public polyp datasets. We propose a novel three-region evaluation protocol that decomposes segmentation performance into full, visible-only, and invisible targets. This metric exposes behaviors that standard amodal evaluation obscures, revealing two distinct model archetypes: Occluder-Aware models (SAM, SAM 2, SAM 3, MedSAM3), which prioritize visible tissue delineation and reject instruments, and Occluder-Agnostic models (MedSAM, MedSAM2), which confidently predict into occluded regions. SAM-Med2D aligns with neither and underperforms across all conditions. Ultimately, our results demonstrate that occlusion robustness is not uniform across architectures, and model selection must be driven by specific clinical intent: whether to prioritize conservative visible-tissue segmentation or the amodal inference of hidden anatomy.
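The three-region idea in the abstract can be illustrated with a small sketch. This is not the OccSAM-Bench implementation: the choice of Dice as the overlap score, the way each region is restricted by the occluder mask, and the names `dice` and `three_region_dice` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]  # (row, col); masks are represented as sets of pixels


def dice(pred: Set[Pixel], target: Set[Pixel]) -> float:
    """Dice overlap between two pixel sets; defined as 1.0 when both are empty."""
    if not pred and not target:
        return 1.0
    return 2.0 * len(pred & target) / (len(pred) + len(target))


def three_region_dice(
    pred: Set[Pixel], target: Set[Pixel], occluder: Set[Pixel]
) -> Dict[str, float]:
    """Hypothetical decomposition: score the full target, the part outside
    the occluder (visible), and the part under the occluder (invisible)."""
    return {
        "full": dice(pred, target),
        "visible": dice(pred - occluder, target - occluder),
        "invisible": dice(pred & occluder, target & occluder),
    }


# Toy case: a 4x4 target whose right half (columns 2-3) is covered.
target = {(r, c) for r in range(4) for c in range(4)}
occluder = {(r, c) for r in range(4) for c in range(2, 6)}

conservative_pred = target - occluder  # segments only what is visible
amodal_pred = set(target)              # predicts into the occluded region

scores_conservative = three_region_dice(conservative_pred, target, occluder)
scores_amodal = three_region_dice(amodal_pred, target, occluder)
```

In this toy case the conservative prediction scores perfectly on the visible region and zero on the invisible one, while its full-target score sits in between. That gap is the kind of behavior a single full-target score would hide.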

2604.11709 2026-04-14 cs.AI

A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment

Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby

2604.11708 2026-04-14 cs.RO cs.SE cs.SY eess.SY

ACT: Automated CPS Testing for Open-Source Robotic Platforms

Aditya A. Krishnan, Donghoon Kim, Hokeun Kim

2604.11707 2026-04-14 cs.CV

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

Efstathios Karypidis, Spyros Gidaris, Nikos Komodakis

2604.11705 2026-04-14 cs.AI cs.CL cs.RO cs.SY eess.SY

Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

Deeksha Prahlad, Daniel Fan, Hokeun Kim

2604.11704 2026-04-14 cs.LG cs.AI

Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning

Nicolas Rodriguez-Alvarez, Fernando Rodriguez-Merino

2604.11703 2026-04-14 cs.AI

DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness

Javad M Alizadeh, Genhui Zheng, Chiu C Tan, Yuzhou Chen, Omar Martinez, Philip McCallion, Ying Ding, Chenguang Yang, AnneMarie Tomosky, Huanmei Wu

Comments This manuscript has been accepted at the 14th IEEE International Conference on Healthcare Informatics (ICHI 2026)

2604.11699 2026-04-14 cs.CL cs.AI cs.LG

Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning

Jieying Xue, Phuong Minh Nguyen, Ha Thanh Nguyen, May Myo Zin, Ken Satoh

Comments Accepted at ICAIL 2026

Abstract

This work aims to improve the generalization of logic-based legal reasoning systems by integrating recent advances in NLP with legal-domain adaptive few-shot learning techniques using LLMs. Existing logic-based legal reasoning pipelines typically rely on fine-tuned models to map natural-language legal cases into logical formulas before forwarding them to a symbolic reasoner. However, such approaches are heavily constrained by the scarcity of high-quality annotated training data. To address this limitation, we propose a novel LLM-based legal reasoning framework that enables effective in-context learning through retrieval-augmented generation. Specifically, we introduce Legal2LogicICL, a few-shot retrieval framework that balances diversity and similarity of exemplars at both the latent semantic representation level and the legal text structure level. In addition, our method explicitly accounts for legal structure by mitigating entity-induced retrieval bias in legal texts, where lengthy and highly specific entity mentions often dominate semantic representations and obscure legally meaningful reasoning patterns. Our Legal2LogicICL constructs informative and robust few-shot demonstrations, leading to accurate and stable logical rule generation without requiring additional training. In addition, we construct a new dataset, named Legal2Proleg, which is annotated with alignments between legal cases and PROLEG logical formulas to support the evaluation of legal semantic parsing. Experimental results on both open-source and proprietary LLMs demonstrate that our approach significantly improves accuracy, stability, and generalization in transforming natural-language legal case descriptions into logical representations, highlighting its effectiveness for interpretable and reliable legal reasoning. Our code is available at https://github.com/yingjie7/Legal2LogicICL.
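One generic way to balance similarity and diversity when picking few-shot exemplars is maximal marginal relevance (MMR). The sketch below uses MMR as a stand-in and is not the Legal2LogicICL retrieval procedure; the function names, the `trade_off` parameter, and the toy two-dimensional embeddings are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_exemplars(
    query: Sequence[float],
    pool: Sequence[Sequence[float]],
    k: int,
    trade_off: float = 0.5,
) -> List[int]:
    """Greedy MMR: each step picks the candidate that maximizes
    trade_off * similarity-to-query minus
    (1 - trade_off) * max similarity to already selected exemplars."""
    selected: List[int] = []
    remaining = list(range(len(pool)))
    while remaining and len(selected) < k:

        def mmr_score(i: int) -> float:
            relevance = cosine(query, pool[i])
            redundancy = max(
                (cosine(pool[i], pool[j]) for j in selected), default=0.0
            )
            return trade_off * relevance - (1.0 - trade_off) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy embeddings: candidates 0 and 1 are near-duplicates, 2 is different.
query = [1.0, 0.2]
pool = [[1.0, 0.1], [1.0, 0.15], [0.5, 1.0]]

balanced = select_exemplars(query, pool, k=2, trade_off=0.5)
similarity_only = select_exemplars(query, pool, k=2, trade_off=1.0)
```

With `trade_off=1.0` the selection reduces to plain nearest-neighbor retrieval and returns the two near-duplicates. With `trade_off=0.5`, the second pick is the dissimilar candidate instead.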

2604.11689 2026-04-14 cs.CV cs.RO

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

Dujun Nie, Fengjiao Chen, Qi Lv, Jun Kuang, Xiaoyu Li, Xuezhi Cao, Xunliang Cai

Comments Project: https://meituan-longcat.github.io/LARYBench Code: https://github.com/meituan-longcat/LARYBench Dataset: https://huggingface.co/datasets/meituan-longcat/LARYBench

2604.11687 2026-04-14 cs.CL

Please Make it Sound like Human: Encoder-Decoder vs. Decoder-Only Transformers for AI-to-Human Text Style Transfer

Utsav Paneru

Comments 12 pages, 3 figures, 2 tables

2604.11685 2026-04-14 cs.CV

Unfolding 3D Gaussian Splatting via Iterative Gaussian Synopsis

Yuqin Lu, Yang Zhou, Yihua Dai, Guiqing Li, Shengfeng He

2604.11680 2026-04-14 cs.RO

Dual-Control Frequency-Aware Diffusion Model for Depth-Dependent Optical Microrobot Microscopy Image Generation

Lan Wei, Zongcai Tan, Kangyi Lu, Jian-Qing Zheng, Dandan Zhang

2604.11668 2026-04-14 cs.CV

UNIGEOCLIP: Unified Geospatial Contrastive Learning

Guillaume Astruc, Eduard Trulls, Jan Hosang, Loic Landrieu, Paul-Edouard Sarlin

2604.11666 2026-04-14 cs.CL cs.AI cs.LG

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal

Comments First two authors contributed equally. Code: https://github.com/The-Inscrutable-X/AIDoubleAgentDefenders

Abstract

As large language models (LLMs) become the engine behind conversational systems, their ability to reason about the intentions and states of their dialogue partners (i.e., form and use a theory-of-mind, or ToM) becomes increasingly critical for safe interaction with potentially adversarial partners. We propose a novel privacy-themed ToM challenge, ToM for Steering Beliefs (ToM-SB), in which a defender must act as a Double Agent to steer the beliefs of an attacker with partial prior knowledge within a shared universe. To succeed on ToM-SB, the defender must engage with and form a ToM of the attacker, with a goal of fooling the attacker into believing they have succeeded in extracting sensitive information. We find that strong frontier models like Gemini3-Pro and GPT-5.4 struggle on ToM-SB, often failing to fool attackers in hard scenarios with partial attacker prior knowledge, even when prompted to reason about the attacker's beliefs (ToM prompting). To close this gap, we train models on ToM-SB to act as AI Double Agents using reinforcement learning, testing both fooling and ToM rewards. Notably, we find a bidirectionally emergent relationship between ToM and attacker-fooling: rewarding fooling success alone improves ToM, and rewarding ToM alone improves fooling. Across four attackers with different strengths, six defender methods, and both in-distribution and out-of-distribution (OOD) evaluation, we find that gains in ToM and attacker-fooling are well-correlated, highlighting belief modeling as a key driver of success on ToM-SB. AI Double Agents that combine both ToM and fooling rewards yield the strongest fooling and ToM performance, outperforming Gemini3-Pro and GPT-5.4 with ToM prompting on hard scenarios. We also show that ToM-SB and AI Double Agents can be extended to stronger attackers, demonstrating generalization to OOD settings and the upgradability of our task.

2604.11663 2026-04-14 cs.AI

Why Do Large Language Models Generate Harmful Content?

Rajesh Ganguli, Raha Moraffah

2604.11662 2026-04-14 cs.CL

Hidden Failures in Robustness: Why Supervised Uncertainty Quantification Needs Better Evaluation

Joe Stacey, Hadas Orgad, Kentaro Inui, Benjamin Heinzerling, Nafise Sadat Moosavi

2604.11655 2026-04-14 cs.CL cs.AI cs.MA

RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents

Riccardo Rosati, Edoardo Colucci, Massimiliano Bolognini, Adriano Mancini, Paolo Sernani

2604.11653 2026-04-14 cs.CV

GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays

David Wong, Zeynep Isik, Bin Wang, Marouane Tliba, Gorkem Durak, Elif Keles, Halil Ertugrul Aktas, Aladine Chetouani, Cagdas Topel, Nicolo Gennaro, Camila Lopes Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur, Andrew C. Gordon, Ayis Pyrros, Frank H. Miller, Amir Borhani, Hatice Savas, Eric Hart, Elizabeth Krupinski, Ulas Bagci

Comments This work appears in ACM ETRA 2026

2604.11640 2026-04-14 cs.RO cs.SY eess.SY

Micro-Dexterity in Biological Micromanipulation: Embodiment, Perception, and Control

Kangyi Lu, Lan Wei, Zongcai Tan, Dandan Zhang

2604.11639 2026-04-14 cs.LG

Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures

Maxim Bolshim, Alexander Kugaevskikh

Comments 45 pages, 9 figures, 17 tables. Submitted to Neural Networks (Elsevier). Code: https://github.com/comiam/dag-hesse

2604.11637 2026-04-14 cs.CV

STS-Mixer: Spatio-Temporal-Spectral Mixer for 4D Point Cloud Video Understanding

Wenhao Li, Xueying Jiang, Gongjie Zhang, Xiaoqin Zhang, Ling Shao, Shijian Lu

Comments Accepted by CVPR 2026, Open Sourced

2604.11636 2026-04-14 cs.CV

MorphoFlow: Sparse-Supervised Generative Shape Modeling with Adaptive Latent Relevance

Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Elhabian

2604.11627 2026-04-14 cs.CV

POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs

Haicheng Wang, Yuan Liu, Yikun Liu, Zhemeng Yu, Zhongyin Zhao, Yangxiu You, Zilin Yu, Le Tian, Xiao Zhou, Jie Zhou, Weidi Xie, Yanfeng Wang

2604.11625 2026-04-14 cs.LG cs.AI

SCNO: Spiking Compositional Neural Operator -- Towards a Neuromorphic Foundation Model for Nuclear PDE Solving

Samrendra Roy, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam

2604.11611 2026-04-14 cs.CL cs.LG

Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation

Jiashu Yao, Heyan Huang, Zeming Liu, Yuhang Guo

Comments preprint

2604.11610 2026-04-14 cs.CL

Self-Evolving LLM Memory Extraction Across Heterogeneous Tasks

Yuqing Yang, Tengxiao Liu, Wang Bill Zhu, Taiwei Shi, Linxin Song, Robin Jia

2604.11609 2026-04-14 cs.AI cs.HC

Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

Benjamin Maltbie, Shivam Raval

2604.11590 2026-04-14 cs.CV

Learning Robustness at Test-Time from a Non-Robust Teacher

Stefano Bianchettin, Giulio Rossolini, Giorgio Buttazzo

Abstract

Nowadays, pretrained models are increasingly used as general-purpose backbones and adapted at test-time to downstream environments where target data are scarce and unlabeled. While this paradigm has proven effective for improving clean accuracy on the target domain, adversarial robustness has received far less attention, especially when the original pretrained model is not explicitly designed to be robust. This raises a practical question: can a pretrained, non-robust model be adapted at test-time to improve adversarial robustness on a target distribution? To address this question, this work studies how adversarial training strategies behave when integrated into adaptation schemes for the unsupervised test-time setting, where only a small set of unlabeled target samples is available. It first analyzes how classical adversarial training formulations can be extended to this scenario, showing that straightforward distillation-based adaptations remain unstable and highly sensitive to hyperparameter tuning, particularly when the teacher itself is non-robust. To address these limitations, the work proposes a label-free framework that uses the predictions of a non-robust teacher model as a semantic anchor for both the clean and adversarial objectives during adaptation. We further provide theoretical insights showing that our formulation yields a more stable alternative to the self-consistency-based regularization commonly used in classical adversarial training. Experiments evaluate the proposed approach on CIFAR-10 and ImageNet under induced photometric transformations. The results support the theoretical insights by showing that the proposed approach achieves improved optimization stability, lower sensitivity to parameter choices, and a better robustness-accuracy trade-off than existing baselines in this post-deployment test-time setting.

2604.11589 2026-04-14 cs.CV

MLLM-as-a-Judge Exhibits Model Preference Bias

Shuitsu Koyama, Yuiga Wada, Daichi Yashima, Komei Sugiura

2604.11587 2026-04-14 cs.RO

Optimal Kinodynamic Motion Planning Through Anytime Bidirectional Heuristic Search with Tight Termination Condition

Yi Wang, Bingxian Mu, Shahab Shokouhi, May-Win Thein