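arXivDaily daily academic digest: syncs the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and more.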

2604.13513 2026-04-16 cs.RO

A transformable slender microrobot inspired by nematode parasites for interventional endovascular surgery

Xin Yang, Dongliang Fan, Yunteng Ma, Yuxuan Liao, Diancheng Li, U Kei Cheang, Bo Peng, Hongqiang Wang

English abstract

Cardiovascular diseases cause around 17.9 million deaths globally each year, and their treatment is challenging because of the confined space and complex topology of the vascular network and the high risks of surgery. Despite decades of development, robots, though promising, still face a trade-off between versatility and maneuverability. Inspired by nematodes, parasites that live, feed, and move within the human vascular system, this work develops a transformable slender magnetic microrobot. Guided by experiments and analyses, we optimize the robot's fabrication and geometry and create a slender prototype with an aspect ratio greater than 100 (diameter under 200 μm, length over 20 mm), consisting of an ultrathin polymer string with magnetic beads uniformly distributed along its body and a large bead at its head. The prototype shows high flexibility (maximum curvature: 0.904 mm⁻¹) and strong locomotion capability (maximum speed: 125 mm/s). The nematode-inspired robot can also pass through sharp turns with a radius of 0.84 mm and through holes distributed in three-dimensional (3D) space. We further demonstrate its potential for interventional surgery by navigating it through a narrow blood-vessel mold, where it deforms its slender body to wrap and transport a drug 95 times heavier than itself and finally releases the drug at the target position. The robot also shows potential for embolization by transforming and winding itself into an aneurysm phantom, and it exhibits excellent injectability, being successfully withdrawn into and injected through a syringe needle 1.2 mm in diameter.
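
As a quick check on the reported geometry, the figures above can be combined using the standard definitions aspect ratio = length / diameter and radius of curvature = 1 / curvature. The symbols L (length), d (diameter), and κ (curvature) are introduced here for illustration only; the bound follows directly from the stated length (over 20 mm) and diameter (under 200 μm).

```latex
\mathrm{AR} = \frac{L}{d} > \frac{20\,\mathrm{mm}}{0.2\,\mathrm{mm}} = 100,
\qquad
r_{\min} = \frac{1}{\kappa_{\max}} = \frac{1}{0.904\,\mathrm{mm}^{-1}} \approx 1.11\,\mathrm{mm}
```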

2604.13509 2026-04-16 cs.CV

DiT as Real-Time Rerenderer: Streaming Video Stylization with Autoregressive Diffusion Transformer

Hengye Lyu, Zisu Li, Yue Hong, Yueting Weng, Jiaxin Shi, Hanwang Zhang, Chen Liang

2604.13504 2026-04-16 cs.LG cs.AI cs.CL cs.MA cs.RO

Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning

Shentong Mo

2604.13495 2026-04-16 cs.CV

ADP-DiT: Text-Guided Diffusion Transformer for Brain Image Generation in Alzheimer's Disease Progression

Juneyong Lee, Geonwoo Baek, Ikbeom Jang

Comments 15 pages, 3 figures, accepted to ICPR 2026

2604.13492 2026-04-16 cs.RO cs.CV

RadarSplat-RIO: Indoor Radar-Inertial Odometry with Gaussian Splatting-Based Radar Bundle Adjustment

Pou-Chun Kung, Yuan Tian, Zhengqin Li, Yue Liu, Eric Whitmire, Wolf Kienzle, Hrvoje Benko

2604.13488 2026-04-16 cs.AI

Towards Scalable Lightweight GUI Agents via Multi-role Orchestration

Ziwei Wang, Junjie Zheng, Leyang Yang, Sheng Zhou, Xiaoxuan Tang, Zhouhua Fang, Zhiwei Liu, Dajun Chen, Yong Li, Jiajun Bu

Comments Findings of ACL 2026

2604.13481 2026-04-16 cs.LG cs.AI physics.ao-ph

Monthly Diffusion v0.9: A Latent Diffusion Model for the First AI-MIP

Kyle J. C. Hall, Maria J. Molina

2604.13472 2026-04-16 cs.LG cs.AI cs.MA

Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus

Zijian Zhao, Jing Gao, Sen Li

2604.13471 2026-04-16 cs.LG

Computational framework for multistep metabolic pathway design

Peter Zhiping Zhang, Jeffrey D. Varner

2604.13470 2026-04-16 cs.LG stat.ML

Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

Nafiz Ishtiaque, Syed Arefinul Haque, Kazi Ashraful Alam, Fatima Jahara

Comments 10+19 pages

2604.13465 2026-04-16 cs.LG eess.SP

Adaptive Unknown Fault Detection and Few-Shot Continual Learning for Condition Monitoring in Ultrasonic Metal Welding

Ahmadreza Eslaminia, Kuan-Chieh Lu, Klara Nahrstedt, Chenhui Shao

Comments 20 pages, 10 figures

English abstract

Ultrasonic metal welding (UMW) is widely used in industrial applications but is sensitive to tool wear, surface contamination, and material variability, which can lead to unexpected process faults and unsatisfactory weld quality. Conventional monitoring systems typically rely on supervised learning models that assume all fault types are known in advance, limiting their ability to handle previously unseen process faults. To address this challenge, this paper proposes an adaptive condition monitoring approach that enables unknown fault detection and few-shot continual learning for UMW. Unknown faults are detected by analyzing hidden-layer representations of a multilayer perceptron and applying a statistical thresholding strategy. Once detected, samples of unknown fault types are incorporated into the existing model through a continual learning procedure that selectively updates only the final layers of the network, which enables the model to recognize new fault types while preserving knowledge of existing classes. To accelerate labeling, a cosine-similarity transformation combined with a clustering algorithm groups similar unknown samples, thereby reducing manual labeling effort. Experimental results on a multi-sensor UMW dataset demonstrate that the proposed method achieves 96% accuracy in detecting unseen fault conditions while maintaining reliable classification of known classes. After incorporating a new fault type using only five labeled samples, the updated model achieves 98% testing classification accuracy. These results demonstrate that the proposed approach enables adaptive monitoring with minimal retraining cost and time. The approach provides a scalable solution for continual learning in condition monitoring, where new process conditions may continually emerge, and is extensible to other manufacturing processes.
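
The abstract names the ingredients of the detection and labeling steps but not the exact statistic or clustering algorithm. Below is a minimal, stdlib-only Python sketch under assumed choices, not the authors' method: a hidden-layer vector is flagged as an unknown fault when its distance to the nearest known-class centroid exceeds mean + k·std of the training distances (k = 3 is a hypothetical default), and flagged samples are grouped by a simple greedy cosine-similarity "leader" clustering, standing in for the unspecified clustering algorithm. All function names and parameters are hypothetical, and the final-layer continual-learning update is not shown.

```python
import math
from statistics import mean, stdev


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_threshold(hidden_by_class, k=3.0):
    """Assumed rule: per-class centroids of hidden-layer vectors, plus one
    global threshold = mean + k * std of each training sample's distance
    to its own class centroid."""
    centroids = {c: centroid(vs) for c, vs in hidden_by_class.items()}
    dists = [euclidean(v, centroids[c])
             for c, vs in hidden_by_class.items() for v in vs]
    return centroids, mean(dists) + k * stdev(dists)


def is_unknown(h, centroids, threshold):
    """Flag h as an unknown fault if it is far from every known centroid."""
    return min(euclidean(h, c) for c in centroids.values()) > threshold


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def group_by_cosine(samples, min_sim=0.9):
    """Greedy leader clustering: put each sample in the first group whose
    leader (first member) has cosine similarity >= min_sim, else open a
    new group. Returns lists of sample indices."""
    groups = []
    for i, s in enumerate(samples):
        for g in groups:
            if cosine(s, samples[g[0]]) >= min_sim:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    # Toy 2-D "hidden-layer" vectors for two known classes (made-up data).
    hidden = {
        "normal": [[0.0, 0.1], [0.1, 0.0], [0.0, -0.1], [-0.1, 0.0]],
        "worn_tool": [[5.0, 5.1], [5.1, 5.0], [4.9, 5.0], [5.0, 4.9]],
    }
    cents, thr = fit_threshold(hidden)
    print(is_unknown([10.0, -10.0], cents, thr))
    print(group_by_cosine([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]))
```

Grouping by direction (cosine) rather than magnitude means that samples of the same new fault with different signal amplitudes can still land in one group, so an operator labels one group instead of every sample; the threshold `min_sim` is an assumed tuning knob.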

2604.13460 2026-04-16 cs.LG cs.AI

From Order to Distribution: A Spectral Characterization of Forgetting in Continual Learning

Zonghuan Xu, Xingjun Ma

2604.13456 2026-04-16 cs.LG cs.CV

MyoVision: A Mobile Research Tool and NEATBoost-Attention Ensemble Framework for Real Time Chicken Breast Myopathy Detection

Chaitanya Pallerla, Siavash Mahmoudi, Dongyi Wang

Comments Accepted at CVPR 2026 MetaFoods Workshop. 11 pages, 5 figures

2604.13453 2026-04-16 cs.LG

FAST: A Synergistic Framework of Attention and State-space Models for Spatiotemporal Traffic Prediction

Xinjin Li, Jinghan Cao, Mengyue Wang, Yue Wu, Longxiang Yan, Yeyang Zhou, Ziqi Sha, Yu Ma

Comments Accepted by ICME 2026

2604.13452 2026-04-16 cs.CL

CANVAS: Continuity-Aware Narratives via Visual Agentic Storyboarding

Ishani Mondal, Yiwen Song, Mihir Parmar, Palash Goyal, Jordan Boyd-Graber, Tomas Pfister, Yale Song

2604.13448 2026-04-16 cs.CV cs.AI

A Study of Failure Modes in Two-Stage Human-Object Interaction Detection

Lemeng Wang, Qinqian Lei, Vidhi Bakshi, Daniel Yi, Yifan Liu, Jiacheng Hou, Asher Seng Hao, Zheda Mai, Wei-Lun Chao, Robby T. Tan, Bo Wang

Comments Accepted to SAUAFG Workshop at CVPR 2026

2604.13441 2026-04-16 cs.RO

Robust Energy-Aware Routing for Air-Ground Cooperative Multi-UAV Delivery in Wind-Uncertain Environments

Tianshun Li, Hongliang Lu, Yanggang Sheng, Zhongzhen Wang, Haoang Li, Xinhu Zheng

2604.13440 2026-04-16 cs.LG cs.AI

A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models

Jason Kong, Nilesh Prasad Pandey, Flavio Ponzina, Tajana Rosing

2604.13438 2026-04-16 cs.LG

WIN-U: Woodbury-Informed Newton-Unlearning as a retain-free Machine Unlearning Framework

Xingjian Zhao, Mohammad Mohammadi Amiri, Malik Magdon-Ismail

Comments 21 pages, 3 figures, under review at COLM2026

2604.13432 2026-04-16 cs.CV cs.AI

MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis

Simin Huo, Ning Li

Comments 20 pages. Extended version of CVPR 2026 Findings paper. Neurocomputing (Elsevier) under review

2604.13425 2026-04-16 cs.CV

VibeFlow: Versatile Video Chroma-Lux Editing through Self-Supervised Learning

Yifan Li, Pei Cheng, Bin Fu, Shuai Yang, Jiaying Liu

2604.13419 2026-04-16 cs.CV

Physically-Guided Optical Inversion Enable Non-Contact Side-Channel Attack on Isolated Screens

Zhiwen Zheng, Yuheng Qiao, Xiaoshuai Zhang, Zhao Huang, Tao Zhang, Huiyu Zhou, Shaowei Jiang, Jin Liu, Wenwen Tang, Xingru Huang

2604.13418 2026-04-16 cs.CL cs.AI cs.CV

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments

Han Wang, David Wan, Hyunji Lee, Thinh Pham, Mikaela Cankosyan, Weiyuan Chen, Elias Stengel-Eskin, Tu Vu, Mohit Bansal

Comments First three authors contributed equally. Project Page: https://merrin-benchmark.github.io/

English abstract

Motivated by the underspecified, multi-hop nature of search queries and the multimodal, heterogeneous, and often conflicting nature of real-world web results, we introduce MERRIN (Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments), a human-annotated benchmark for evaluating search-augmented agents. MERRIN measures AI agents' ability to identify relevant modalities, retrieve multimodal evidence, and perform multi-hop reasoning over noisy web sources. It differs from prior work in three important aspects: (1) using natural language queries without explicit modality cues, (2) incorporating underexplored modalities such as video and audio, and (3) requiring the retrieval of complex, often noisy or conflicting multimodal evidence during web search. We evaluate diverse search agents powered by ten models, including strong closed-source models (e.g., GPT-5.4-mini, Gemini 3/3.1 Flash/Pro) and open-weight models (Qwen3-4B/30B/235B), across three search settings (no search, native search, and agentic search). Our results show that MERRIN is highly challenging: the average accuracy across all agents is 22.3%, with the best-performing agent reaching only 40.1%. We further observe that while stronger agents like Gemini Deep Research achieve higher performance, gains are modest due to over-exploration; they take more steps and use more tools, but are often distracted by conflicting or partially relevant web content, leading to incorrect answers. Compared to humans, these agents consume more resources yet achieve lower accuracy, largely due to inefficient source selection and an overreliance on text modalities. These findings highlight the need for search agents capable of robust search and reasoning across diverse modalities in noisy web environments, making MERRIN a valuable testbed for evaluating such capabilities.

2604.13416 2026-04-16 cs.CV cs.AI

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Cheng-You Lu, Yi-Shan Hung, Wei-Ling Chi, Hao-Ping Wang, Charlie Li-Ting Tsai, Yu-Cheng Chang, Yu-Lun Liu, Thomas Do, Chin-Teng Lin

2604.13414 2026-04-16 cs.LG cs.AI

Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

2604.13413 2026-04-16 cs.LG

Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Tianyi Li, Kaiyu Tang, Xiao Li, Jing Li

2604.13409 2026-04-16 cs.CV

CausalDisenSeg: A Causality-Guided Disentanglement Framework with Counterfactual Reasoning for Robust Brain Tumor Segmentation Under Missing Modalities

Bo Liu, Yulong Zou, Jin Hong

2604.13405 2026-04-16 cs.RO

Singularity Avoidance in Inverse Kinematics: A Unified Treatment of Classical and Learning-based Methods

Vishnu Rudrasamudram, Hariharasudan Malaichamee

2604.13403 2026-04-16 cs.CV

Why Multimodal In-Context Learning Lags Behind? Unveiling the Inner Mechanisms and Bottlenecks

Yu Wang, Sharon Li

Comments ACL Main 2026

2604.13398 2026-04-16 cs.CL cs.AI

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning

Shihao Zhang, Ziwei Wang, Jie Zhou, Yulan Wu, Qin Chen, Zhikai Lei, Liyang Yu, Liang Dou, Liang He