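arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.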

2603.11214 2026-03-18 cs.AI cs.LG

Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Linus Folkerts, Will Payne, Simon Inman, Philippos Giavridis, Joe Skinner, Sam Deverett, James Aung, Ekin Zorer, Michael Schmatz, Mahmoud Ghanem, John Wilkinson, Alan Steer, Vy Hong, Jessica Wang

Abstract

We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges, a 32-step corporate network attack and a 7-step industrial control system attack, that require chaining heterogeneous capabilities across extended action sequences. By comparing seven models released over an eighteen-month period (August 2024 to February 2026) at varying inference-time compute budgets, we observe two capability trends. First, model performance scales log-linearly with inference-time compute, with no observed plateau: increasing from 10M to 100M tokens yields gains of up to 59%, requiring no specific technical sophistication from the operator. Second, each successive model generation outperforms its predecessor at fixed token budgets: on the corporate network range, average steps completed at 10M tokens rose from 1.7 (GPT-4o, August 2024) to 9.8 (Opus 4.6, February 2026). The best single run completed 22 of 32 steps, corresponding to roughly 6 of the estimated 14 hours a human expert would need. On the industrial control system range, performance remains limited, though the most recent models are the first to reliably complete steps, averaging 1.2-1.4 of 7 (max 3).
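One way to read the log-linear claim in this abstract: under a model of the form score = a + b * log10(tokens), every tenfold increase in token budget adds the same constant b to the score. The sketch below fits such a line by ordinary least squares. The function names and the data points are hypothetical placeholders chosen for illustration; they are not values from the paper.

```python
import math


def fit_log_linear(tokens, scores):
    """Least-squares fit of score = a + b * log10(tokens)."""
    xs = [math.log10(t) for t in tokens]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx              # gain per tenfold increase in tokens
    a = mean_y - b * mean_x    # intercept at log10(tokens) = 0
    return a, b


def predict(a, b, tokens):
    """Score predicted by the fitted log-linear model."""
    return a + b * math.log10(tokens)


# Hypothetical illustrative data, NOT taken from the paper.
budgets = [1e6, 1e7, 1e8]
steps = [2.0, 5.0, 8.0]
a, b = fit_log_linear(budgets, steps)
```

A fitted line of this kind only describes the range of budgets that was measured; extrapolating it beyond that range is an assumption, not something the abstract supports.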

2603.10967 2026-03-18 cs.CV

Med-DualLoRA: Local Adaptation of Foundation Models for 3D Cardiac MRI

Joan Perramon-Llussà, Amelia Jiménez-Sánchez, Grzegorz Skorupko, Fotis Avgoustidis, Carlos Martín-Isla, Karim Lekadir, Polyxeni Gkontra

Comments 11 pages, 2 figures. Submitted to MICCAI 2026

2603.10349 2026-03-18 cs.CV

EmoStory: Emotion-Aware Story Generation

Jingyuan Yang, Rucong Chen, Weibin Luo, Hui Huang

Comments accepted to ICME

2603.08117 2026-03-18 cs.AI cs.IR

UIS-Digger: Towards Comprehensive Research Agent Systems for Real-world Unindexed Information Seeking

Chang Liu, Chuqiao Kuang, Tianyi Zhuang, Yuxin Cheng, Huichi Zhou, Xiaoguang Li, Lifeng Shang

Comments 21 pages, 5 figures, ICLR 2026

2603.06691 2026-03-18 cs.CV cs.RO

One-Shot Badminton Shuttle Detection for Mobile Robots

Florentin Dipner, William Talbot, Turcan Tuna, Andrei Cramariuc, Marco Hutter

2603.06471 2026-03-18 cs.CV

Match4Annotate: Propagating Sparse Video Annotations via Implicit Neural Feature Matching

Zhuorui Zhang, Roger Pallarès-López, Praneeth Namburi, Brian W. Anthony

2603.06289 2026-03-18 cs.CV

FlowMotion: Training-Free Flow Guidance for Video Motion Transfer

Zhen Wang, Youcan Xu, Jun Xiao, Long Chen

Comments CVPR 2026, Code: https://github.com/HKUST-LongGroup/FlowMotion

2603.06183 2026-03-18 cs.CL cs.AI cs.CV

CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation

Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmaish, Sung Eun Kim, Oishi Banerjee, Pranav Rajpurkar

Abstract

We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. Unlike prior metrics, CRIMSON incorporates full clinical context, including patient age, indication, and guideline-based decision rules, and prevents normal or clinically insignificant findings from exerting disproportionate influence on the overall score. The framework categorizes errors into a comprehensive taxonomy covering false findings, missing findings, and eight attribute-level errors (e.g., location, severity, measurement, and diagnostic overinterpretation). Each finding is assigned a clinical significance level (urgent, actionable non-urgent, non-actionable, or expected/benign), based on a guideline developed in collaboration with attending cardiothoracic radiologists, enabling severity-aware weighting that prioritizes clinically consequential mistakes over benign discrepancies. CRIMSON is validated through strong alignment with clinically significant error counts annotated by six board-certified radiologists in ReXVal (Kendall's tau = 0.61-0.71; Pearson's r = 0.71-0.84), and through two additional benchmarks that we introduce. In RadJudge, a targeted suite of clinically challenging pass-fail scenarios, CRIMSON shows consistent agreement with expert judgment. In RadPref, a larger radiologist preference benchmark of over 100 pairwise cases with structured error categorization, severity modeling, and 1-5 overall quality ratings from three cardiothoracic radiologists, CRIMSON achieves the strongest alignment with radiologist preferences. We release the metric, the evaluation benchmarks, RadJudge and RadPref, and a fine-tuned MedGemma model to enable reproducible evaluation of report generation, all available at https://github.com/rajpurkarlab/CRIMSON.

2603.03378 2026-03-18 cs.LG cs.AI

AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis

Pei Yang, Wanyi Chen, Asuka Yuxi Zheng, Xueqian Li, Xiang Li, Haoqin Tu, Jie Xiao, Yifan Pang, Dongdong Zhang, Fuqiang Li, Alfred Long, Lynn Ai, Eric Yang, Bill Shi

2603.02976 2026-03-18 cs.RO

DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent Space

Jiwon Park, Dongkyu Lee, I Made Aswin Nahrendra, Jaeyoung Lim, Hyun Myung

2603.01953 2026-03-18 cs.RO cs.AI cs.CV

Closed-Loop Action Chunks with Dynamic Corrections for Training-Free Diffusion Policy

Pengyuan Wu, Pingrui Zhang, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

Comments Accepted by ICRA2026

2603.00917 2026-03-18 cs.CL cs.AI

Prompt Sensitivity and Answer Consistency of Small Open-Source Language Models for Clinical Question Answering in Low-Resource Healthcare

Shravani Hariprasad

Comments 30 pages, 7 figures, 2 tables

2603.00512 2026-03-18 cs.CV

Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding

Wang Chen, Yuhui Zeng, Yongdong Luo, Tianyu Xie, Luojun Lin, Jiayi Ji, Yan Zhang, Xiawu Zheng

Comments Accepted at CVPR 2026

Abstract

Frame selection is crucial due to high frame redundancy and limited context windows when applying Large Vision-Language Models (LVLMs) to long videos. Current methods typically select frames with high relevance to a given query, resulting in a disjointed set of frames that disregards the narrative structure of the video. In this paper, we introduce Wavelet-based Frame Selection by Detecting Semantic Boundary (WFS-SB), a training-free framework that presents a new perspective: effective video understanding hinges not only on high relevance but, more importantly, on capturing semantic shifts, the pivotal moments of narrative change that are essential to comprehending the holistic storyline of the video. However, direct detection of abrupt changes in the query-frame similarity signal is often unreliable due to high-frequency noise arising from model uncertainty and transient visual variations. To address this, we leverage the wavelet transform, which provides an ideal solution through its multi-resolution analysis in both time and frequency domains. By applying this transform, we decompose the noisy signal into multiple scales and extract a clean semantic change signal from the coarsest scale. We identify the local extrema of this signal as semantic boundaries, which segment the video into coherent clips. Building on this, WFS-SB comprises a two-stage strategy: first, adaptively allocating a frame budget to each clip based on a composite importance score; and second, within each clip, employing the Maximal Marginal Relevance approach to select a diverse yet relevant set of frames. Extensive experiments show that WFS-SB significantly boosts LVLM performance, e.g., improving accuracy by 5.5% on VideoMME, 9.5% on MLVU, and 6.2% on LongVideoBench, consistently outperforming state-of-the-art methods. Our code is available at https://github.com/MAC-AutoML/WFS-SB.
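The pipeline this abstract describes (wavelet decomposition, local extrema as boundaries, then Maximal Marginal Relevance within a clip) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not name the wavelet family or say which coefficients form the "semantic change signal", so the Haar wavelet, the use of coarsest-level detail coefficients, the `lam` trade-off parameter, and all function names are assumptions made for illustration. The adaptive frame-budget step is omitted.

```python
import math


def haar_step(signal):
    """One level of the Haar transform: (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def coarse_detail(signal, levels):
    """Detail coefficients at the coarsest of `levels` Haar levels (assumed choice)."""
    approx, detail = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
    return detail


def local_extrema(values):
    """Indices whose value is strictly above or strictly below both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if (values[i] - values[i - 1]) * (values[i] - values[i + 1]) > 0
    ]


def mmr_select(relevance, similarity, k, lam=0.5):
    """Greedy Maximal Marginal Relevance: trade relevance against redundancy."""
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any frame already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical clip: frames 0 and 1 are near-duplicates, frame 2 is distinct.
relevance = [0.9, 0.85, 0.3]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = mmr_select(relevance, similarity, k=2)
```

In the hypothetical clip, MMR skips the second near-duplicate frame in favour of the less relevant but distinct one, which is the "diverse yet relevant" behaviour the abstract refers to.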

2603.00170 2026-03-18 cs.CV cs.AI cs.NE

A Novel Evolutionary Method for Automated Skull-Face Overlay in Computer-Aided Craniofacial Superimposition

Práxedes Martínez-Moreno, Andrea Valsecchi, Pablo Mesejo, Pilar Navarro-Ramírez, Valentino Lugli, Sergio Damas

Comments 11 pages, 6 figures, 3 tables

2603.00154 2026-03-18 cs.RO cs.HC

Trust in Autonomous Human-Robot Collaboration: Effects of Responsive Interaction Policies

Shauna Heron, Meng Cheng Lau

2602.24144 2026-03-18 cs.CV

Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation

Muquan Li, Hang Gou, Yingyi Ma, Rongzheng Wang, Ke Qin, Tao He

Comments Accepted to CVPR 2026

2602.23592 2026-03-18 cs.RO cs.AI cs.SE

KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning

Zebin Yang, Tong Xie, Baotong Lu, Shaoshan Liu, Bo Yu, Meng Li

Comments DAC 2026

2602.22896 2026-03-18 cs.RO

DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation

Zebin Yang, Yijiahao Qi, Tong Xie, Bo Yu, Shaoshan Liu, Meng Li

Comments DAC 2026

2602.22579 2026-03-18 cs.RO cs.SE

Metamorphic Testing of Vision-Language Action-Enabled Robots

Pablo Valle, Sergio Segura, Shaukat Ali, Aitor Arrieta

2602.22118 2026-03-18 cs.RO

System Design of the Ultra Mobility Vehicle: A Driving, Balancing, and Jumping Bicycle Robot

Benjamin Bokser, Daniel Gonzalez, Aaron Preston, Alex Bahner, Annika Wollschläger, Arianna Ilvonen, Asa Eckert-Erdheim, Ashwin Khadke, Bilal Hammoud, Dean Molinaro, Fabian Jenelten, Henry Mayne, Howie Choset, Igor Bogoslavskyi, Itic Tinman, James Tigue, Jan Preisig, Kaiyu Zheng, Kenny Sharma, Kim Ang, Laura Lee, Liana Margolese, Nicole Lin, Oscar Frias, Paul Drews, Ravi Boggavarapu, Rick Burnham, Samuel Zapolsky, Sangbae Kim, Scott Biddlestone, Sean Mayorga, Shamel Fahmi, Surya Singh, Tyler McCollum, Velin Dimitrov, William Moyne, Yu-Ming Chen, Farbod Farshidian, Marco Hutter, David Perry, Al Rizzi, Gabe Nelson

Comments 17 Pages, 11 figures, 3 movies, 2 tables

2602.22092 2026-03-18 cs.CV

Overview of the CXR-LT 2026 Challenge: Multi-Center Long-Tailed and Zero Shot Chest X-ray Classification

Hexin Dong, Yi Lin, Pengyu Zhou, Xuan Zhong Feng, Alan Clint Legasto, Mingquan Lin, Hao Chen, Yuzhe Yang, George Shih, Yifan Peng

2602.21637 2026-03-18 cs.CV

CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis

Di Zhang, Zhangpeng Gong, Xiaobo Pang, Jiashuai Liu, Junbo Lu, Hao Cui, Jiusong Ge, Zhi Zeng, Kai Yi, Yinghua Li, Si Liu, Tingsong Yu, Haoran Wang, Mireia Crispin-Ortuzar, Weimiao Yu, Chen Li, Zeyu Gao

Comments Accepted to CVPR 2026

2602.21435 2026-03-18 cs.CV

Synergizing Understanding and Generation with Interleaved Analyzing-Drafting Thinking

Shengqiong Wu, Bobo Li, Xinkai Wang, Xiangtai Li, Lei Cui, Furu Wei, Shuicheng Yan, Hao Fei, Tat-seng Chua

Comments 28 pages, 17 figures, 6 tables, ICLR conference

2602.19570 2026-03-18 cs.CV

VALD: Multi-Stage Vision Attack Detection for Efficient LVLM Defense

Nadav Kadvil, Malak Fares, Ayellet Tal

2602.16564 2026-03-18 cs.LG cs.CR

A Scalable Approach to Solving Simulation-Based Network Security Games

Michael Lanier, Yevgeniy Vorobeychik

2602.13693 2026-03-18 cs.CV

A WDLoRA-Based Multimodal Generative Framework for Clinically Guided Corneal Confocal Microscopy Image Synthesis in Diabetic Neuropathy

Xin Zhang, Liangxiu Han, Tam Sobeih, Yue Shi, Yalin Zheng, Uazman Alam, Maryam Ferdousi, Rayaz Malik

Abstract

Corneal Confocal Microscopy (CCM) is a sensitive tool for assessing small-fiber damage in Diabetic Peripheral Neuropathy (DPN), yet the development of robust, automated deep learning-based diagnostic models is limited by scarce labelled data and fine-grained variability in corneal nerve morphology. Although Artificial Intelligence (AI)-driven foundation generative models excel at natural image synthesis, they often struggle in medical imaging due to limited domain-specific training, compromising the anatomical fidelity required for clinical analysis. To overcome these limitations, we propose a Weight-Decomposed Low-Rank Adaptation (WDLoRA)-based multimodal generative framework for clinically guided CCM image synthesis. WDLoRA is a parameter-efficient fine-tuning (PEFT) mechanism that decouples magnitude and directional weight updates, enabling foundation generative models to independently learn the orientation (nerve topology) and intensity (stromal contrast) required for medical realism. By jointly conditioning on nerve segmentation masks and disease-specific clinical prompts, the model synthesises anatomically coherent images across the DPN spectrum (Control, T1NoDPN, T1DPN). A comprehensive three-pillar evaluation demonstrates that the proposed framework achieves state-of-the-art visual fidelity (Fréchet Inception Distance (FID): 5.18) and structural integrity (Structural Similarity Index Measure (SSIM): 0.630), significantly outperforming GAN and standard diffusion baselines. Crucially, the synthetic images preserve gold-standard clinical biomarkers and are statistically equivalent to real patient data. When used to train automated diagnostic models, the synthetic dataset improves downstream diagnostic accuracy by 2.1% and segmentation performance by 2.2%, validating the framework's potential to alleviate data bottlenecks in medical AI.
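The decoupling of magnitude and direction mentioned in this abstract can be written out as follows. The abstract does not state the formula, so this assumes WDLoRA follows the commonly used weight-decomposed low-rank formulation (as in DoRA); all symbols here are assumptions, not the authors' notation.

```latex
% Assumed formulation: frozen pretrained weight W_0, trainable low-rank
% update BA (direction), trainable per-column magnitude vector m.
W' = m \, \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c},
\qquad
W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
m \in \mathbb{R}^{1 \times k},\;
r \ll \min(d, k)
```

Here $\lVert \cdot \rVert_c$ denotes the column-wise vector norm, so the fraction carries only direction and $m$ carries only scale. Under this reading, the low-rank factors $B$ and $A$ would adjust orientation while $m$ adjusts intensity.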

2602.09021 2026-03-18 cs.RO cs.CV

$χ_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

Checheng Yu, Chonghao Sima, Gangcheng Jiang, Hai Zhang, Haoguang Mai, Hongyang Li, Huijie Wang, Jin Chen, Kaiyang Wu, Li Chen, Lirui Zhao, Modi Shi, Ping Luo, Qingwen Bu, Shijia Peng, Tianyu Li, Yibo Yuan

2602.06698 2026-03-18 cs.RO

Crowd-FM: Learned Optimal Selection of Conditional Flow Matching-generated Trajectories for Crowd Navigation

Antareep Singha, Laksh Nanwani, Mathai Mathew P., Samkit Jain, Phani Teja Singamaneni, Arun Kumar Singh, K. Madhava Krishna

Comments Accepted at IEEE ICRA 2026. Authors Antareep Singha and Laksh Nanwani have equal contributions

2602.06533 2026-03-18 cs.AI cs.CL

LogicSkills: A Structured Benchmark for Formal Reasoning in Large Language Models

Brian Rabern, Philipp Mondorf, Barbara Plank

Comments 12 pages, 5 figures

2601.19913 2026-03-18 cs.CL cs.AI

From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text

Shinwoo Park, Yo-Sub Han