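arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.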

2604.05515 2026-04-08 cs.CV

Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation

Chenxin Yuan, Shoupeng Chen, Haojiang Ye, Yiming Miao, Limei Peng, Pin-Han Ho

Comments 20 pages, 13 figures, supplementary material included, submitted to Medical Image Analysis

Abstract

Accurate segmentation of 3D medical scans is crucial for clinical diagnostics and treatment planning, yet existing methods often fail to achieve both high accuracy and computational efficiency across diverse anatomies and imaging modalities. To address these challenges, we propose GCNV-Net, a novel 3D medical segmentation framework that integrates a Tri-directional Dynamic Nonvoid Voxel Transformer (3DNVT), a Geometrical Cross-Attention module (GCA), and Nonvoid Voxelization. The 3DNVT dynamically partitions relevant voxels along the three orthogonal anatomical planes, namely the transverse, sagittal, and coronal planes, enabling effective modeling of complex 3D spatial dependencies. The GCA mechanism explicitly incorporates geometric positional information during multi-scale feature fusion, significantly enhancing fine-grained anatomical segmentation accuracy. Meanwhile, Nonvoid Voxelization processes only informative regions, greatly reducing redundant computation without compromising segmentation quality, and achieves a 56.13% reduction in FLOPs and a 68.49% reduction in inference latency compared to conventional voxelization. We evaluate GCNV-Net on multiple widely used benchmarks: BraTS2021, ACDC, MSD Prostate, MSD Pancreas, and AMOS2022. Our method achieves state-of-the-art segmentation performance across all datasets, outperforming the best existing methods by 0.65% on Dice, 0.63% on IoU, 1% on NSD, and by a relative 14.5% on HD95. All results demonstrate that GCNV-Net effectively balances accuracy and efficiency, and its robustness across diverse organs, disease conditions, and imaging modalities highlights strong potential for clinical deployment.
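
To illustrate the general idea behind Nonvoid Voxelization as the abstract describes it (processing only informative regions), here is a minimal Python sketch. It is not the authors' implementation. It assumes that "void" means zero intensity and that the volume is stored sparsely as a dictionary; the names `nonvoid_blocks` and `dense_block_count` are hypothetical.

```python
from collections import defaultdict


def nonvoid_blocks(voxels, block_size):
    """Group non-empty voxels into cubic blocks, skipping empty space.

    voxels: dict mapping (x, y, z) -> intensity. Zero-intensity entries are
            treated as void (an assumption made for this sketch).
    block_size: edge length of each cubic block, in voxels.
    """
    blocks = defaultdict(list)
    for (x, y, z), value in voxels.items():
        if value == 0:
            continue  # void voxel: never assigned to a block
        key = (x // block_size, y // block_size, z // block_size)
        blocks[key].append(((x, y, z), value))
    return dict(blocks)


def dense_block_count(shape, block_size):
    """Number of blocks a dense partition of the full volume would create."""
    count = 1
    for extent in shape:
        count *= -(-extent // block_size)  # ceiling division
    return count


# Toy 8x8x8 volume with a small 3x3x3 structure in one corner.
volume = {(x, y, z): 1.0 for x in range(3) for y in range(3) for z in range(3)}
volume[(7, 7, 7)] = 0.0  # explicit void voxel, ignored by nonvoid_blocks

kept = nonvoid_blocks(volume, block_size=2)
total = dense_block_count((8, 8, 8), block_size=2)
print(f"{len(kept)} of {total} blocks contain data")
```

In this toy volume only 8 of the 64 dense blocks hold any data, so any per-block computation downstream would touch an eighth of the grid. The FLOPs and latency figures in the abstract come from the authors' own experiments and are not reproduced by this sketch.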

2604.05514 2026-04-08 cs.AI

OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

Haoyue Yang, Xuanle Zhao, Xuexin Liu, Feibang Jiang, Yao Zhu

Comments Accepted to ACL 2026 Findings

2604.05499 2026-04-08 cs.RO cs.SY eess.SY

MARS-Dragonfly: Agile and Robust Flight Control of Modular Aerial Robot Systems

Rui Huang, Zhiqian Cai, Siyu Tang, Pengxuan Wei, Lidong Li, Xin Chen, Wenhan Cao, Zhenyu Zhang, Lin Zhao

2604.05498 2026-04-08 cs.RO

JailWAM: Jailbreaking World Action Models in Robot Control

Hanqing Liu, Songping Wang, Jiahuan Long, Jiacheng Hou, Jialiang Sun, Chao Li, Yang Yang, Wei Peng, Xu Liu, Tingsong Jiang, Wen Yao, Yao Mu

Abstract

The World Action Model (WAM) can jointly predict future world states and actions, exhibiting stronger physical manipulation capabilities compared with traditional models. Such powerful physical interaction ability is a double-edged sword: if safety is ignored, it will directly threaten personal safety, property security and environmental safety. However, existing research pays extremely limited attention to a critical security gap: the vulnerability of WAM to jailbreak attacks. To fill this gap, we define the Three-Level Safety Classification Framework to systematically quantify the safety of robotic arm motions. Furthermore, we propose JailWAM, the first dedicated jailbreak attack and evaluation framework for WAM, which consists of three core components: (1) Visual-Trajectory Mapping, which unifies heterogeneous action spaces into visual trajectory representations and enables cross-architectural unified evaluation; (2) Risk Discriminator, which serves as a high-recall screening tool that optimizes the efficiency-accuracy trade-off when identifying destructive behaviors in visual trajectories; (3) Dual-Path Verification Strategy, which first conducts rapid coarse screening via a single-image-based video-action generation module, and then performs efficient and comprehensive verification through full closed-loop physical simulation. In addition, we construct JailWAM-Bench, a benchmark for comprehensively evaluating the safety alignment performance of WAM under jailbreak attacks. Experiments in the RoboTwin simulation environment demonstrate that the proposed framework efficiently exposes physical vulnerabilities, achieving an 84.2% attack success rate on the state-of-the-art LingBot-VA. Meanwhile, robust defense mechanisms can be constructed based on JailWAM, providing an effective technical solution for designing safe and reliable robot control systems.
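
The coarse-then-full pattern described for the Dual-Path Verification Strategy can be sketched generically in Python as a two-stage filter. This is only an illustration of the control flow under stated assumptions, not JailWAM itself: `coarse_screen` and `full_verify` are hypothetical stand-ins for the video-action generation module and the closed-loop simulation, and the candidate fields and the 0.3 threshold are invented for the example.

```python
def dual_path_verify(candidates, coarse_screen, full_verify):
    """Run a cheap screen on every candidate, then an expensive check on survivors."""
    flagged = [c for c in candidates if coarse_screen(c)]
    confirmed = [c for c in flagged if full_verify(c)]
    return flagged, confirmed


# Hypothetical stand-ins for the two verification paths.
calls = {"full": 0}


def coarse_screen(candidate):
    # Deliberately low threshold: favor recall, tolerate false positives.
    return candidate["risk_score"] >= 0.3


def full_verify(candidate):
    # Stand-in for an expensive closed-loop simulation; count the invocations.
    calls["full"] += 1
    return candidate["causes_damage"]


candidates = [
    {"id": "a", "risk_score": 0.9, "causes_damage": True},
    {"id": "b", "risk_score": 0.4, "causes_damage": False},
    {"id": "c", "risk_score": 0.1, "causes_damage": False},
    {"id": "d", "risk_score": 0.7, "causes_damage": True},
]

flagged, confirmed = dual_path_verify(candidates, coarse_screen, full_verify)
print([c["id"] for c in flagged], [c["id"] for c in confirmed], calls["full"])
```

The expensive check runs only on the three flagged candidates rather than all four. The false positive `b` is discarded in the second stage, which is why the first stage can afford to prioritize recall over precision.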

2604.05497 2026-04-08 cs.AI cs.CV

Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models

Keuntae Kim, Mingyu Kang, Yong Suk Choi

Comments CVPR 2026 - main

2604.05490 2026-04-08 cs.CV

A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures

Wenbo Zhang, Zekun Long, Zican Liu, Yangchen Zeng, Keyi Hu

Comments 8 pages, 7 figures, 5 tables. Accepted by International Joint Conference on Neural Networks (IJCNN)

2604.05485 2026-04-08 cs.AI

Auditable Agents

Yi Nian, Aojie Yuan, Haiyue Zhang, Jiate Li, Yue Zhao

2604.05484 2026-04-08 cs.RO cs.CV

CoEnv: Driving Embodied Multi-Agent Collaboration via Compositional Environment

Li Kang, Yutao Fan, Rui Li, Heng Zhou, Yiran Qin, Zhemeng Zhang, Songtao Huang, Xiufeng Song, Zaibin Zhang, Bruno N. Y. Chen, Zhenfei Yin, Dongzhan Zhou, Wangmeng Zuo, Lei Bai

Comments 31 pages, 8 figures, including supplementary material. Project page: https://faceong.github.io/CoEnv/

2604.05483 2026-04-08 cs.AI cs.CL

Can We Trust a Black-box LLM? LLM Untrustworthy Boundary Detection via Bias-Diffusion and Multi-Agent Reinforcement Learning

Xiaotian Zhou, Di Tang, Xiaofeng Wang, Xiaozhong Liu

2604.05482 2026-04-08 cs.CV cs.AI

Unifying VLM-Guided Flow Matching and Spectral Anomaly Detection for Interpretable Veterinary Diagnosis

Pu Wang, Zhixuan Mao, Jialu Li, Zhuoran Zheng, Dianjie Lu, Youshan Zhang

2604.05477 2026-04-08 cs.CL

Don't Act Blindly: Robust GUI Automation via Action-Effect Verification and Self-Correction

Yuzhe Zhang, Xianwei Xue, Xingyong Wu, Mengke Chen, Chen Liu, Xinran He, Run Shao, Feiran Liu, Huanmin Xu, Qiutong Pan, Haiwei Wang

Comments ACL 2026 Main Conference

2604.05476 2026-04-08 cs.LG

Reproducing AlphaZero on Tablut: Self-Play RL for an Asymmetric Board Game

Tõnis Lees, Tambet Matiisen

Comments For the code see https://github.com/tonislees/TablutZero

2604.05475 2026-04-08 cs.CV

A Synthetic Eye Movement Dataset for Script Reading Detection: Real Trajectory Replay on a 3D Simulator

Kidus Zewde, Yuchen Zhou, Dennis Ng, Neo Tiangratanakul, Tommy Duong, Ankit Raj, Yuxin Zhang, Xingyu Shen, Simiao Ren

Comments Synthetic eye movement dataset generation via 3D eye simulator; iris trajectory replay; script reading detection; behavioral data augmentation

2604.05468 2026-04-08 cs.AI

OntoTKGE: Ontology-Enhanced Temporal Knowledge Graph Extrapolation

Dongying Lin, Yinan Liu, Shengwei Tang, Bin Wang, Xiaochun Yang

Comments 9 pages, 7 figures

2604.05465 2026-04-08 cs.AI

Adaptive Serverless Resource Management via Slot-Survival Prediction and Event-Driven Lifecycle Control

Zeyu Wang, Cuiqianhe Du, Renyue Zhang, Kejian Tong, Qi He, Qiyuan Tian

2604.05461 2026-04-08 cs.CL cs.SI

Content Fuzzing for Escaping Information Cocoons on Digital Social Media

Yifeng He, Ziye Tang, Hao Chen

Comments Accepted to Findings of ACL 2026

2604.05449 2026-04-08 cs.CV

Not All Agents Matter: From Global Attention Dilution to Risk-Prioritized Game Planning

Kang Ding, Hongsong Wang, Jie Gui, Lei He

Comments 14 pages, 5 figures

2604.05445 2026-04-08 cs.CL cs.AI cs.CV

Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling

Qiyuan Chen, Hongsen Huang, Jiahe Chen, Qian Shao, Jintai Chen, Hongxia Xu, Renjie Hua, Chuan Ren, Jian Wu

Comments ACL 2026 Main

2604.05436 2026-04-08 cs.CV cs.AI

Human Interaction-Aware 3D Reconstruction from a Single Image

Gwanghyun Kim, Junghun James Kim, Suh Yoon Jeon, Jason Park, Se Young Chun

Comments Accepted to CVPR 2026

2604.05435 2026-04-08 cs.AI

Automated Auditing of Hospital Discharge Summaries for Care Transitions

Akshat Dasula, Prasanna Desikan, Jaideep Srivastava

Comments Accepted as a poster at IEEE-ICHI 2026; 3 pages, 2 figures

2604.05433 2026-04-08 cs.CV

Few-Shot Semantic Segmentation Meets SAM3

Yi-Jen Tsai, Yen-Yu Lin, Chien-Yao Wang

Comments 14 pages, 3 figures

2604.05431 2026-04-08 cs.CV

Cross-Stage Attention Propagation for Efficient Semantic Segmentation

Beoungwoo Kang

Comments 7 pages, 6 figures

2604.05430 2026-04-08 cs.RO

Synergizing Efficiency and Reliability for Continuous Mobile Manipulation

Chengkai Wu, Ruilin Wang, Yixin Zeng, Jiayuan Wang, Mingjie Zhang, Guiyong Zheng, Qun Niu, Juepeng Zheng, Jun Ma, Boyu Zhou

Comments 33 pages, 26 figures, 4 tables. Video: https://www.bilibili.com/video/BV1YWP4zxEQD

2604.05427 2026-04-08 cs.RO

Pre-Execution Safety Gate & Task Safety Contracts for LLM-Controlled Robot Systems

Ike Obi, Vishnunandan L. N. Venkatesh, Weizheng Wang, Ruiqi Wang, Dayoon Suh, Temitope I. Amosa, Wonse Jo, Byung-Cheol Min

2604.05424 2026-04-08 cs.AI cs.CL

PRISM-MCTS: Learning from Reasoning Trajectories with Metacognitive Reflection

Siyuan Cheng, Bozhong Tian, YanChao Hao, Zheng Wei

2604.05417 2026-04-08 cs.CL

Multi-Drafter Speculative Decoding with Alignment Feedback

Taehyeon Kim, Hojung Jung, Se-Young Yun

Comments ACL 2026 Findings

2604.05416 2026-04-08 cs.AI

Multi-Agent Pathfinding with Non-Unit Integer Edge Costs via Enhanced Conflict-Based Search and Graph Discretization

Hongkai Fan, Qinjing Xie, Bo Ouyang, Yaonan Wang, Zhi Yan, Jiawen He, Zheng Fang

Comments 16 pages, 7 figures, submitted to cs.AI, Multi-Agent Systems, Pathfinding Optimization

2604.05415 2026-04-08 cs.CV

Learning to Synergize Semantic and Geometric Priors for Limited-Data Wheat Disease Segmentation

Shijie Wang, Zijian Wang, Yadan Luo, Scott Chapman, Xin Yu, Zi Huang

2604.05414 2026-04-08 cs.LG cs.CV

Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations

Chris Choy

2604.05409 2026-04-08 cs.CV

CRISP: Rank-Guided Iterative Squeezing for Robust Medical Image Segmentation under Domain Shift

Yizhou Fang, Pujin Cheng, Yixiang Liu, Xiaoying Tang, Longxi Zhou

Abstract

Distribution shift in medical imaging remains a central bottleneck for the clinical translation of medical AI. Failure to address it can lead to severe performance degradation in unseen environments and exacerbate health inequities. Existing methods for domain adaptation are inherently limited by exhausting predefined possibilities through simulated shifts or pseudo-supervision. Such strategies struggle in the open-ended and unpredictable real world, where distribution shifts are effectively infinite. To address this challenge, we introduce an empirical law called "Rank Stability of Positive Regions", which states that the relative rank of predicted probabilities for positive voxels remains stable under distribution shift. Guided by this principle, we propose CRISP, a parameter-free and model-agnostic framework requiring no target-domain information. CRISP is the first framework to perform segmentation based on rank rather than probabilities. CRISP simulates model behavior under distribution shift via latent feature perturbation, where voxel probability rankings exhibit two stable patterns: regions that consistently retain high probabilities (destined positives according to the principle) and those that remain low-probability (can be safely classified as negatives). Based on these patterns, we construct high-precision (HP) and high-recall (HR) priors and recursively refine them under perturbation. We then design an iterative training framework in which HP and HR progressively "squeeze" toward the final segmentation. Extensive evaluations on multi-center cardiac MRI and CT-based lung vessel segmentation demonstrate CRISP's superior robustness, significantly outperforming state-of-the-art methods with striking HD95 reductions of up to 0.14 (7.0% improvement), 1.90 (13.1% improvement), and 8.39 (38.9% improvement) pixels across multi-center, demographic, and modality shifts, respectively.
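
As a rough illustration of why ranks can stay informative when raw probabilities shift, the Python sketch below derives toy high-precision and high-recall sets from several perturbed probability maps. It is a simplified reading of the abstract, not the CRISP algorithm: the abstract does not specify how the priors are built, so the rank thresholds `high` and `low`, the function names, and the hand-written probability lists (standing in for latent feature perturbation) are all assumptions.

```python
def rank_fractions(probs):
    """Map each voxel to its rank fraction in [0, 1], where 1 is the highest probability."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = position / (n - 1) if n > 1 else 1.0
    return ranks


def hp_hr_priors(perturbed_probs, high=0.75, low=0.25):
    """Build toy high-precision (HP) and high-recall (HR) voxel sets.

    HP: voxels ranked at or above `high` in every perturbed run.
    HR: all voxels except those ranked at or below `low` in every run.
    Both thresholds are hypothetical values chosen for this sketch.
    """
    all_ranks = [rank_fractions(p) for p in perturbed_probs]
    n = len(perturbed_probs[0])
    hp = {i for i in range(n) if all(r[i] >= high for r in all_ranks)}
    negatives = {i for i in range(n) if all(r[i] <= low for r in all_ranks)}
    hr = set(range(n)) - negatives
    return hp, hr


# Three perturbed runs over five voxels. The second run mimics a shifted
# domain: every probability drops below 0.5, but the ordering is preserved.
runs = [
    [0.05, 0.10, 0.40, 0.80, 0.90],
    [0.01, 0.03, 0.10, 0.30, 0.35],
    [0.02, 0.01, 0.50, 0.60, 0.95],
]

hp, hr = hp_hr_priors(runs)
fixed_threshold = {i for i, p in enumerate(runs[1]) if p >= 0.5}
print(sorted(hp), sorted(hr), sorted(fixed_threshold))
```

In the second run a fixed 0.5 probability threshold selects no voxels at all, while the rank-based sets still isolate voxels 3 and 4 as confident positives and leave voxel 2 as the only uncertain one between HP and HR. The iterative training that narrows this gap is described in the paper and is not attempted here.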
