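arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.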

2604.17388 2026-04-29 cs.LG cs.AI

Back to Repair: A Minimal Denoising Network for Time Series Anomaly Detection

Kadir-Kaan Özer, René Ebeling, Markus Enzweiler

Comments: 9 pages, 6 figures, 5 tables

Details

Abstract (English)

We introduce JuRe (Just Repair), a minimal denoising network for time series anomaly detection that exposes a central finding: architectural complexity is unnecessary when the training objective correctly implements the manifold-projection principle. JuRe consists of a single depthwise-separable convolutional residual block with hidden dimension 128, trained to repair corrupted time series windows and scored at inference by a fixed, parameter-free structural discrepancy function. Despite using no attention, no latent variable, and no adversarial component, JuRe ranks second on the TSB-AD multivariate benchmark (AUC-PR 0.404, 180 series, 17 datasets) and second on the UCR univariate archive by AUC-PR (0.198, 250 series), leading all neural baselines on AUC-PR and VUS-PR. Component ablation on TSB-AD identifies training-time corruption as the dominant factor ($Δ$AUC-PR $= 0.047$ on removal), confirming that the denoising objective, not network capacity, drives detection quality. Pairwise Wilcoxon signed-rank tests establish statistical significance against 21 of 25 baselines on TSB-AD. Code is available at the URL https://github.com/iis-esslingen/JuRe.
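
As a rough illustration of why a depthwise-separable block stays small, the sketch below compares weight counts for a standard 1D convolution and a depthwise-separable one at the hidden dimension of 128 stated in the abstract. The kernel size of 7 is an assumption made for this example (the abstract does not give it), the helper names are hypothetical, and biases and normalization layers are ignored.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a standard 1D convolution (bias terms omitted)."""
    return c_in * c_out * kernel


def depthwise_separable_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a depthwise conv (one filter per input channel)
    followed by a 1x1 pointwise conv (bias terms omitted)."""
    depthwise = c_in * kernel   # each channel is filtered independently
    pointwise = c_in * c_out    # 1x1 conv mixes information across channels
    return depthwise + pointwise


HIDDEN = 128  # hidden dimension stated in the abstract
KERNEL = 7    # assumed kernel size, NOT taken from the abstract

standard = conv1d_params(HIDDEN, HIDDEN, KERNEL)
separable = depthwise_separable_params(HIDDEN, HIDDEN, KERNEL)
print(standard, separable)  # prints: 114688 17280
```

Under these assumptions the separable layer needs roughly 6.6 times fewer weights than the standard one; the actual JuRe parameter count depends on details available only in the paper and its code.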

2604.17188 2026-04-29 cs.CL cs.AI

Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization

Xiaoyong Mei, Tingting Zuo, Da Chen, Guangyu Hu, Xiangyu Wen, Chao Duan, Mingyan Zhang, Fudan Zheng

2604.16812 2026-04-29 cs.AI

Introspection Adapters: Training LLMs to Report Their Learned Behaviors

Keshav Shenoy, Li Yang, Abhay Sheshadri, Sören Mindermann, Jack Lindsey, Sam Marks, Rowan Wang

2604.14862 2026-04-29 cs.CL cs.AI

Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

Yifan Le

Comments: 11 pages, 3 figures

2604.14807 2026-04-29 cs.AI cs.CL

The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows

Hyunwoo Kim, Harin Yu, Hanau Yi

2604.14389 2026-04-29 cs.CL cs.AI

BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking

Hyunkyung Park, Arkaitz Zubiaga

Comments: 15 pages, 7 figures; minor source and formatting cleanup; results unchanged

2604.12827 2026-04-29 cs.LG cs.AI stat.ML

Loop Corrections to the Training Error and Generalization Gap of Random Feature Models

Taeyoung Kim

Comments: 28 pages, 12 figures

2604.10946 2026-04-29 cs.LG math.OC

Learning to Adapt: In-Context Learning Beyond Stationarity

Zhen Qin, Jiachen Jiang, Zhihui Zhu

2604.10873 2026-04-29 cs.AI cs.CC cs.LG

A Quantitative Definition of Intelligence

Kang-Sin Choi

Comments: 27 pages; v2: syntax is semantics

2604.10103 2026-04-29 cs.CV

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

Ruibin Li, Tao Yang, Fangzhou Ai, Tianhe Wu, Shilei Wen, Bingyue Peng, Lei Zhang

2604.08567 2026-04-29 cs.CL cs.MA

Multi-User Large Language Model Agents

Shu Yang, Shenzhe Zhu, Hao Zhu, José Ramón Enríquez, Di Wang, Alex Pentland, Michiel A. Bakker, Jiaxin Pei

2604.07927 2026-04-29 cs.AI

EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools

Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He

2604.07802 2026-04-29 cs.CV cs.AI

Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models

Shaotian Li, Shangze Li, Chuancheng Shi, Wenhua Wu, Yanqiu Wu, Xiaohan Yu, Fei Shen, Tat-Seng Chua

2604.07105 2026-04-29 cs.RO

Genie Sim PanoRecon: Fast Immersive Scene Generation from Single-View Panorama

Zhijun Li, Yongxin Su, Di Yang, Jichao Wang, Zheyuan Xing, Qian Wang, Maoqing Yao

2604.05959 2026-04-29 cs.CV cs.LG

Multi-Modal Landslide Detection from Sentinel-1 SAR and Sentinel-2 Optical Imagery Using Multi-Encoder Vision Transformers and Ensemble Learning

Ioannis Nasios

Details

DOI: 10.1016/j.rsase.2026.102037

Abstract (English)

Landslides represent a major geohazard with severe impacts on human life, infrastructure, and ecosystems, underscoring the need for accurate and timely detection approaches to support disaster risk reduction. This study proposes a modular, multi-model framework that fuses Sentinel-2 optical imagery with Sentinel-1 Synthetic Aperture Radar (SAR) data, for robust landslide detection. The methodology leverages multi-encoder vision transformers, where each data modality is processed through separate lightweight pretrained encoders, achieving strong performance in landslide detection. In addition, the integration of multiple models, particularly the combination of neural networks and gradient boosting models (LightGBM and XGBoost), demonstrates the power of ensemble learning to further enhance accuracy and robustness. Derived spectral indices, such as NDVI, are integrated alongside original bands to enhance sensitivity to vegetation and surface changes. The proposed methodology achieves a state-of-the-art F1 score of 0.919 on landslide detection, addressing a patch-based classification task rather than pixel-level segmentation and operating without pre-event Sentinel-2 data, highlighting its effectiveness in a non-classical change detection setting. It also demonstrated top performance in a machine learning competition, achieving a strong balance between precision and recall and highlighting the advantages of explicitly leveraging the complementary strengths of optical and radar data. The conducted experiments and research also emphasize scalability and operational applicability, enabling flexible configurations with optical-only, SAR-only, or combined inputs, and offering a transferable framework for broader natural hazard monitoring and environmental change applications. Full training and inference code can be found in https://github.com/IoannisNasios/sentinel-landslide-cls.
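
The abstract mentions NDVI as one of the derived spectral indices. A minimal sketch of the standard NDVI formula, (NIR - Red) / (NIR + Red), is shown below. It assumes the usual Sentinel-2 band choice (B8 for near-infrared, B4 for red); the function names and the convention of returning 0.0 when both bands are zero are assumptions for this example, not details taken from the paper.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for a single pixel.

    Returns 0.0 when both bands are zero (an assumed convention
    to avoid division by zero, e.g. for no-data pixels).
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel-wise to two equally shaped 2D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 1x2 patch: one vegetated pixel, one no-data pixel.
patch = ndvi_band([[3000, 0]], [[1000, 0]])
print(patch)  # prints: [[0.5, 0.0]]
```

The resulting index band could then be stacked with the original bands as an extra input channel, which is one plausible reading of "integrated alongside original bands".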

2604.05271 2026-04-29 cs.CV

Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition

Gabriel E. Lima, Valfride Nascimento, Eduardo Santos, Eduil Nascimento, Rayson Laroca, David Menotti

Comments: Accepted for publication in the Journal of the Brazilian Computer Society (JBCS)

Details

DOI: 10.5753/jbcs.2026.5899

Abstract (English)

Extracting vehicle information from surveillance images is essential for intelligent transportation systems, enabling applications such as traffic monitoring and criminal investigations. While Automatic License Plate Recognition (ALPR) is widely used, Fine-Grained Vehicle Classification (FGVC) offers a complementary approach by identifying vehicles based on attributes such as color, make, model, and type. Although there have been advances in this field, existing studies often assume well-controlled conditions, explore limited attributes, and overlook FGVC integration with ALPR. To address these gaps, we introduce UFPR-VeSV, a dataset comprising 24,945 images of 16,297 unique vehicles with annotations for 13 colors, 26 makes, 136 models, and 14 types. Collected from the Military Police of Paraná (Brazil) surveillance system, the dataset captures diverse real-world conditions, including partial occlusions, nighttime infrared imaging, and varying lighting. All FGVC annotations were validated using license plate information, with text and corner annotations also being provided. A qualitative and quantitative comparison with established datasets confirmed the challenging nature of our dataset. A benchmark using five deep learning models further validated this, revealing specific challenges such as handling multicolored vehicles, infrared images, and distinguishing between vehicle models that share a common platform. Additionally, we apply two optical character recognition models to license plate recognition and explore the joint use of FGVC and ALPR. The results highlight the potential of integrating these complementary tasks for real-world applications. The UFPR-VeSV dataset is publicly available at: https://github.com/Lima001/UFPR-VeSV-Dataset.

2604.02467 2026-04-29 cs.CV cs.AI

VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

Mengtian Li, Yuwei Lu, Feifei Li, Chenqi Gan, Zhifeng Xie, Xi Wang

Comments: 28 pages, 10 figures

2604.01798 2026-04-29 cs.CV cs.AI

A deep learning pipeline for PAM50 subtype classification using histopathology images and multi-objective patch selection

Arezoo Borji, Gernot Kronreif, Bernhard Angermayr, Francisco Mario Calisto, Ali Abbasian Ardakani, Wolfgang Birkfellner, Inna Servetnyk, Yinyin Yuan, Sepideh Hatamikia

2604.00931 2026-04-29 cs.AI

PsychAgent: An Experience-Driven Lifelong Learning Agent for Self-Evolving Psychological Counselor

Yutao Yang, Junsong Li, Qianjun Pan, Jie Zhou, Kai Chen, Qin Chen, Jingyuan Zhao, Ningning Zhou, Xin Li, Liang He

2604.00485 2026-04-29 cs.LG

The Rashomon Effect for Visualizing High-Dimensional Data

Yiyang Sun, Haiyang Huang, Gaurav Rajesh Parikh, Cynthia Rudin

Comments: The paper is accepted in AISTATS 2026

2603.29844 2026-04-29 cs.RO cs.AI cs.CV cs.LG

DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA

Yi Chen, Yuying Ge, Hui Zhou, Mingyu Ding, Yixiao Ge, Xihui Liu

Comments: Project page: https://xpeng-robotics.github.io/dial

Details

Abstract (English)

The development of Vision-Language-Action (VLA) models has been significantly accelerated by pre-trained Vision-Language Models (VLMs). However, most existing end-to-end VLAs treat the VLM primarily as a multimodal encoder, directly mapping vision-language features to low-level actions. This paradigm underutilizes the VLM's potential in high-level decision making and introduces training instability, frequently degrading its rich semantic representations. To address these limitations, we introduce DIAL, a framework bridging high-level decision making and low-level motor execution through a differentiable latent intent bottleneck. Specifically, a VLM-based System-2 performs latent world modeling by synthesizing latent visual foresight within the VLM's native feature space; this foresight explicitly encodes intent and serves as the structural bottleneck. A lightweight System-1 policy then decodes this predicted intent together with the current observation into precise robot actions via latent inverse dynamics. To ensure optimization stability, we employ a two-stage training paradigm: a decoupled warmup phase where System-2 learns to predict latent futures while System-1 learns motor control under ground-truth future guidance within a unified feature space, followed by seamless end-to-end joint optimization. This enables action-aware gradients to refine the VLM backbone in a controlled manner, preserving pre-trained knowledge. Extensive experiments on the RoboCasa GR1 Tabletop benchmark show that DIAL establishes a new state-of-the-art, achieving superior performance with 10x fewer demonstrations than prior methods. Furthermore, by leveraging heterogeneous human demonstrations, DIAL learns physically grounded manipulation priors and exhibits robust zero-shot generalization to unseen objects and novel configurations during real-world deployment on a humanoid robot.

2603.29080 2026-04-29 cs.CV cs.LG

Is the Modality Gap a Bug or a Feature? A Robustness Perspective

Rhea Chowers, Oshri Naparstek, Udi Barzelay, Yair Weiss

2603.28750 2026-04-29 cs.LG

Immediate Derivatives Suffice for Online Recurrent Adaptation

Aur Shalev Merin

Comments: 25 pages, 4 figures, 19 tables. Submitted to NeurIPS 2026

2603.26783 2026-04-29 cs.CV cs.AI

Can We Change the Stroke Size for Easier Diffusion?

Yunwei Bai, Ying Kiat Tan, Yao Shu, Tsuhan Chen

2603.26554 2026-04-29 cs.LG stat.ML

Sharp Capacity Scaling of Spectral Optimizers in Learning Associative Memory

Juno Kim, Eshaan Nichani, Denny Wu, Alberto Bietti, Jason D. Lee

Comments: 84 pages, 9 figures

2603.25268 2026-04-29 cs.CL cs.AI

CRAFT: Grounded Multi-Agent Coordination Under Partial Information

Abhijnan Nath, Hannah VanderHoeven, Nikhil Krishnaswamy

Comments: Added revisions, corrected typos and additional analysis

2603.20645 2026-04-29 cs.LG

Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity

Zixuan Zhang, Kaixuan Huang, Tuo Zhao, Mengdi Wang, Minshuo Chen

2603.20092 2026-04-29 cs.LG

How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models

Luca Ambrogioni

2603.19124 2026-04-29 cs.RO

Tendon-Actuated Robots with a Tapered, Flexible Polymer Backbone: Design, Fabrication, and Modeling

Harald Minde Hansen, Nandita Gallacher, Nicholas B. Andrews, Kristin Y. Pettersen, Jan Tommy Gravdahl, Mario di Castro

Details

Abstract (English)

This paper presents the design, modeling, and fabrication of 3D-printed, tendon-actuated continuum robots featuring a flexible, tapered backbone constructed from thermoplastic polyurethane (TPU). Our scalable design incorporates an integrated electronics base housing that enables direct tendon tension control and sensing via actuators and compression load cells. Unlike many continuum robots that are single-purpose and costly, the proposed design prioritizes customizability, rapid assembly, and low cost while enabling high curvature and enhanced distal compliance through geometric tapering, thereby supporting a broad range of compliant robotic inspection and manipulation tasks. We develop a generalized forward kinetostatic model of the tapered backbone based on Cosserat rod theory using a Newtonian approach, extending existing tendon-actuated Cosserat rod formulations to explicitly account for spatially varying backbone cross-sectional geometry. The model captures the graded stiffness profile induced by the tapering and enables systematic exploration of the configuration space as a function of the geometric design parameters. Specifically, we analyze how the backbone taper angle influences the robot's configuration space and manipulability. The model is validated against motion capture data, achieving centimeter-level shape prediction accuracy after calibrating Young's modulus via a line search that minimizes modeling error. We further demonstrate teleoperated grasping using an endoscopic gripper routed along the continuum robot, mounted on a 6-DoF robotic arm. Parameterized iLogic/CAD scripts are provided for rapid geometry generation and scaling. The presented framework establishes a simple, rapid, and reproducible pathway from parametric design to controlled tendon actuation for tapered, tendon-driven continuum robots manufactured using fused deposition modeling 3D printers.
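
The calibration step described in the abstract (a line search over Young's modulus that minimizes modeling error) can be sketched with a golden-section search. This is only an illustration: the abstract does not say which line-search method the authors use, and `toy_shape_error`, its minimum at 26.0, and the search bounds are hypothetical stand-ins for the real model-versus-motion-capture error.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


def toy_shape_error(youngs_modulus_mpa: float) -> float:
    """Hypothetical stand-in for the shape-prediction error.

    A real version would run the kinetostatic model with this modulus
    and compare the predicted backbone shape to motion capture data.
    The minimum at 26.0 is an arbitrary value chosen for the example.
    """
    return (youngs_modulus_mpa - 26.0) ** 2


best = golden_section_search(toy_shape_error, 1.0, 100.0)
print(round(best, 3))  # prints: 26.0
```

Golden-section search needs no derivatives of the error, which suits a setting where each evaluation is a full model solve; it does assume the error is unimodal over the search interval.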

2603.17729 2026-04-29 cs.CV cs.AI

SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

Jingxiao Yang, DaLin He, Miao Pan, Kaixiang Yao, Ge Su, Wenqi Zhang, Yifeng Hu, Tangwei Li, Yuke Li, Xuhong Zhang

Comments: preprint, under review