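arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.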

2604.01715 2026-04-03 cs.CV

SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing

Thinh Dao, Zhen Wang, Kien T. Pham, Long Chen

Details

English abstract

Recent advances in flow-based generative models have enabled training-free, text-guided image editing by inverting an image into its latent noise and regenerating it under a new target conditional guidance. However, existing methods struggle to preserve source fidelity: higher-order solvers incur additional model inferences, truncated inversion constrains editability, and feature injection methods lack architectural transferability. To address these limitations, we propose SteerFlow, a model-agnostic editing framework with strong theoretical guarantees on source fidelity. In the forward process, we introduce an Amortized Fixed-Point Solver that implicitly straightens the forward trajectory by enforcing velocity consistency across consecutive timesteps, yielding a high-fidelity inverted latent. In the backward process, we introduce Trajectory Interpolation, which adaptively blends target-editing and source-reconstruction velocities to keep the editing trajectory anchored to the source. To further improve background preservation, we introduce an Adaptive Masking mechanism that spatially constrains the editing signal with concept-guided segmentation and source-target velocity differences. Extensive experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium demonstrate that SteerFlow consistently achieves better editing quality than existing methods. Finally, we show that SteerFlow extends naturally to a complex multi-turn editing paradigm without accumulating drift.
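
The abstract describes blending target-editing and source-reconstruction velocities and spatially constraining the editing signal with a mask. A minimal sketch of that general idea follows. It is not the authors' implementation: the function names, the fixed scalar `weight`, the per-element `mask`, and the plain Euler update are all assumptions made for illustration, and the paper's adaptive rule for choosing the blend is not reproduced here.

```python
def blend_velocities(v_src, v_tgt, weight, mask=None):
    """Blend two velocity vectors element-wise.

    Hypothetical rule (an assumption, not the paper's):
        v = v_src + weight * mask * (v_tgt - v_src)
    weight = 0 reproduces the source velocity, weight = 1 the target velocity.
    A mask entry of 0 keeps that element on the source trajectory.
    """
    if len(v_src) != len(v_tgt):
        raise ValueError("v_src and v_tgt must have the same length")
    if mask is None:
        mask = [1.0] * len(v_src)
    if len(mask) != len(v_src):
        raise ValueError("mask must have the same length as the velocities")
    return [s + weight * m * (t - s) for s, t, m in zip(v_src, v_tgt, mask)]


def euler_step(x, v, dt):
    """Advance the state x by one explicit Euler step along velocity v."""
    return [xi + dt * vi for xi, vi in zip(x, v)]


# Toy usage: the second element is masked out, so it follows the source velocity.
v = blend_velocities([1.0, 1.0], [3.0, 3.0], weight=0.5, mask=[1.0, 0.0])
x_next = euler_step([0.0, 0.0], v, dt=0.5)
```

In a real flow model the velocities would be tensors predicted by the network under the source and target prompts; plain lists are used here only to keep the sketch dependency-free.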

2604.01714 2026-04-03 cs.CV

End-to-End Shared Attention Estimation via Group Detection with Feedback Refinement

Chihiro Nakatani, Norimichi Ukita, Jean-Marc Odobez

Comments Accepted to CVPR 2026 Workshop (GAZE 2026)

2604.01712 2026-04-03 cs.LG cs.AI eess.SP physics.comp-ph

Transformer self-attention encoder-decoder with multimodal deep learning for response time series forecasting and digital twin support in wind structural health monitoring

Feiyu Zhou, Marios Impraimakis

Comments 21 pages, 22 figures, 9 tables. This version corresponds to the published article in Computers & Structures. https://doi.org/10.1016/j.compstruc.2026.108216

Details

DOI: 10.1016/j.compstruc.2026.108216
Journal ref: Computers and Structures 326 (2026) 108216

English abstract

The wind-induced structural response forecasting capabilities of a novel transformer methodology are examined here. The model also provides a digital twin component for bridge structural health monitoring. Firstly, the approach uses the temporal characteristics of the system to train a forecasting model. Secondly, the vibration predictions are compared to the measured ones to detect large deviations. Finally, the identified cases are used as an early-warning indicator of structural change. The artificial intelligence-based model outperforms approaches for response forecasting as no assumption on wind stationarity or on structural normal vibration behavior is needed. Specifically, wind-excited dynamic behavior suffers from uncertainty related to obtaining poor predictions when the environmental or traffic conditions change. This results in a hard distinction of what constitutes normal vibration behavior. To this end, a framework is rigorously examined on real-world measurements from the Hardanger Bridge monitored by the Norwegian University of Science and Technology. The approach captures accurate structural behavior in realistic conditions, and with respect to the changes in the system excitation. The results, importantly, highlight the potential of transformer-based digital twin components to serve as next-generation tools for resilient infrastructure management, continuous learning, and adaptive monitoring over the system's lifecycle with respect to temporal characteristics.
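
The abstract's second and third steps (comparing predicted vibrations with measured ones and treating large deviations as an early-warning indicator) can be sketched as a simple residual check. This is only an illustration under stated assumptions: the function name `flag_deviations`, the parameters `baseline_len` and `k`, and the "mean plus k standard deviations" threshold are hypothetical choices, not the rule used in the paper.

```python
from statistics import mean, stdev


def flag_deviations(measured, predicted, baseline_len, k=3.0):
    """Return indices whose absolute forecast error is unusually large.

    The first `baseline_len` samples are assumed to represent normal
    behavior. The threshold is mean + k * std of their absolute residuals
    (an assumed rule for illustration). Only later samples are checked.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if not 2 <= baseline_len <= len(measured):
        raise ValueError("baseline_len must be between 2 and len(measured)")
    residuals = [abs(m - p) for m, p in zip(measured, predicted)]
    baseline = residuals[:baseline_len]
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i in range(baseline_len, len(residuals))
            if residuals[i] > threshold]


# Toy usage: the last sample deviates strongly from its forecast.
measured = [0.0, 0.1, -0.1, 0.05, 0.0, 2.0]
predicted = [0.0] * 6
alerts = flag_deviations(measured, predicted, baseline_len=5)
```

A fixed Gaussian-style threshold is the simplest option. A deployed monitor would likely need a threshold that adapts to changing wind and traffic conditions, which is the difficulty the abstract points to.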

2604.01711 2026-04-03 cs.CL

Human-Guided Reasoning with Large Language Models for Vietnamese Speech Emotion Recognition

Truc Nguyen, Then Tran, Binh Truong, Phuoc Nguyen T. H

Comments 6 pages, 2 figures. Dataset of 2,764 Vietnamese speech samples across three emotion classes

2604.01709 2026-04-03 cs.CV

Bias mitigation in graph diffusion models

Meng Yu, Kun Zhan

Comments Accepted to ICLR 2025!

2604.01708 2026-04-03 cs.RO cs.AI

OpenGo: An OpenClaw-Based Robotic Dog with Real-Time Skill Switching

Hanbing Li, Xuewei Cao, Zhiwen Zeng, Yuhan Wu, Yanyong Zhang, Yan Xia

Comments 11 pages, 6 figures

2604.01705 2026-04-03 cs.CL cs.AI

Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy

Ruijie Yang, Yan Zhu, Peiyao Fu, Te Luo, Zhihua Wang, Xian Yang, Quanlin Li, Pinghong Zhou, Shuo Wang

Comments Under review at npj Digital Medicine

2604.01703 2026-04-03 cs.RO

3-D Relative Localization for Multi-Robot Systems with Angle and Self-Displacement Measurements

Chenyang Liang, Liangming Chen, Baoyi Cui, Jie Mei

Comments 29 pages, 28 figures

Details

DOI: 10.1177/02783649251363276
Journal ref: The International Journal of Robotics Research, 2025

English abstract

Realizing relative localization by leveraging inter-robot local measurements is a challenging problem, especially in the presence of measurement noise. Motivated by this challenge, in this paper we propose a novel and systematic 3-D relative localization framework based on inter-robot interior angle and self-displacement measurements. Initially, we propose a linear relative localization theory comprising a distributed linear relative localization algorithm and sufficient conditions for localizability. According to this theory, robots can determine their neighbors' relative positions and orientations in a purely linear manner. Subsequently, in order to deal with measurement noise, we present an advanced Maximum a Posteriori (MAP) estimator by addressing three primary challenges existing in the MAP estimator. Firstly, it is common to formulate the MAP problem as an optimization problem, whose inherent non-convexity can result in local optima. To address this issue, we reformulate the linear computation process of the linear relative localization algorithm as a Weighted Total Least Squares (WTLS) optimization problem on manifolds. The optimal solution of the WTLS problem is more accurate, which can then be used as initial values when solving the optimization problem associated with the MAP problem, thereby reducing the risk of falling into local optima. The second challenge is the lack of knowledge of the prior probability density of the robots' relative positions and orientations at the initial time, which is required as an input for the MAP estimator. To deal with it, we combine the WTLS with a Neural Density Estimator (NDE). Thirdly, to prevent the increasing size of the relative positions and orientations to be estimated as the robots continuously move when solving the MAP problem, a marginalization mechanism is designed, which ensures that the computational cost remains constant.
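
For readers unfamiliar with the estimator named in the abstract, the generic MAP problem can be written as below. This is the textbook form, not the paper's specific formulation. The symbols are assumptions introduced here: $x$ for the unknown relative positions and orientations, $z$ for the stacked measurements, $h$ for a measurement model, and $\Sigma$ for the noise covariance. The last equality additionally assumes additive Gaussian noise, $z = h(x) + n$ with $n \sim \mathcal{N}(0, \Sigma)$.

```latex
\[
\hat{x}_{\mathrm{MAP}}
  = \arg\max_{x} \, p(x \mid z)
  = \arg\max_{x} \, p(z \mid x)\, p(x)
  = \arg\min_{x} \left[ \tfrac{1}{2} \lVert z - h(x) \rVert_{\Sigma^{-1}}^{2} - \log p(x) \right]
\]
```

Read this way, the two difficulties the abstract mentions are visible in the objective: a nonlinear $h$ makes the problem non-convex, so the initial guess matters, and the prior density $p(x)$ has to be supplied before the problem can be solved.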

2604.01700 2026-04-03 cs.CV cs.MM

Can Video Diffusion Models Predict Past Frames? Bidirectional Cycle Consistency for Reversible Interpolation

Lingyu Liu, Yaxiong Wang, Li Zhu, Zhedong Zheng

2604.01696 2026-04-03 cs.RO

A Graph Neural Network Approach for Solving the Ranked Assignment Problem in Multi-Object Tracking

Robin Dehler, Martin Herrmann, Jan Strohbeck, Michael Buchholz

Comments 2024 IEEE Intelligent Vehicles Symposium (IV)

2604.01694 2026-04-03 cs.LG cs.AI cs.CL

MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning

Sten Rüdiger, Sebastian Raschka

2604.01693 2026-04-03 cs.CV

From Understanding to Erasing: Towards Complete and Stable Video Object Removal

Dingming Liu, Wenjing Wang, Chen Li, Jing Lyu

2604.01683 2026-04-03 cs.LG cs.CL

Coupled Query-Key Dynamics for Attention

Barak Gahtan, Alex M. Bronstein

2604.01682 2026-04-03 cs.CL

PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment

Chenning Xu, Mao Zheng, Mingyang Song

2604.01681 2026-04-03 cs.RO cs.AI

Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast-Slow Planning

Jiayi Chen, Shuai Wang, Guangxu Zhu, Chengzhong Xu

Comments 8 pages, 12 figures

2604.01679 2026-04-03 cs.CV

BTS-rPPG: Orthogonal Butterfly Temporal Shifting for Remote Photoplethysmography

Ba-Thinh Nguyen, Thi-Duyen Ngo, Thanh-Trung Huynh, Thanh-Ha Le, Huy-Hieu Pham

2604.01678 2026-04-03 cs.CV

Director: Instance-aware Gaussian Splatting for Dynamic Scene Modeling and Understanding

Yuheng Jiang, Yiwen Cai, Zihao Wang, Yize Wu, Sicheng Li, Zhuo Su, Shaohui Jiao, Lan Xu

Comments Project page: https://caiyw2023.github.io/Director/

2604.01675 2026-04-03 cs.CV

HOT: Harmonic-Constrained Optimal Transport for Remote Photoplethysmography Domain Adaptation

Ba-Thinh Nguyen, Thi-Duyen Ngo, Thanh-Trung Huynh, Thanh-Ha Le, Huy-Hieu Pham

2604.01671 2026-04-03 cs.CL

PRCCF: A Persona-guided Retrieval and Causal-aware Cognitive Filtering Framework for Emotional Support Conversation

Yanxin Luo, Xiaoyu Zhang, Jing Li, Yan Gao, Donghong Han

Comments 14 pages, 6 figures, 5 tables. Submitted to Transactions of the Association for Computational Linguistics (TACL)

2604.01670 2026-04-03 cs.AI

Hierarchical Memory Orchestration for Personalized Persistent Agents

Junming Liu, Yifei Sun, Weihua Cheng, Haodong Lei, Yuqi Li, Yirong Chen, Ding Wang

Comments 10 pages, 5 figures, 7 tables

2604.01669 2026-04-03 cs.CV cs.AI

Robust Embodied Perception in Dynamic Environments via Disentangled Weight Fusion

Juncen Guo, Xiaoguang Zhu, Jingyi Wu, Jingyu Zhang, Jingnan Cai, Zhenghao Niu, Liang Song

Comments Accepted by ICME 2026

2604.01667 2026-04-03 cs.AI cs.CV

M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis

Rui Dong, Xiaotong Zhang, Jiaxing Li, Yueying Li, Jiayin Wei, Youyong Kong

2604.01666 2026-04-03 cs.CV

DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho

Comments Accepted to CVPR 2026. Website: https://jinwonjoon.github.io/DynaVid/

2604.01664 2026-04-03 cs.AI

ContextBudget: Budget-Aware Context Management for Long-Horizon Search Agents

Yong Wu, YanZhao Zheng, TianZe Xu, ZhenTao Zhang, YuanQiang Yu, JiHuai Zhu, Chao Ma, BinBin Lin, BaoHua Dong, HangCheng Zhu, RuoHui Huang, Gang Yu

2604.01661 2026-04-03 cs.AI

Ontology-Aware Design Patterns for Clinical AI Systems: Translating Reification Theory into Software Architecture

Florian Odi Stummer

Comments 7 design patterns, 3 tables, 1 figure, arXiv cs.AI preprint

2604.01659 2026-04-03 cs.RO

AURA: Multimodal Shared Autonomy for Real-World Urban Navigation

Yukai Ma, Honglin He, Selina Song, Wayne Wu, Bolei Zhou

Comments 17 pages, 18 figures, 4 tables, conference

2604.01657 2026-04-03 cs.CL

What Do Claim Verification Datasets Actually Test? A Reasoning Trace Analysis

Delip Rao, Chris Callison-Burch

Comments 11 pages

2604.01654 2026-04-03 cs.CV cs.AI cs.MM

Moiré Video Authentication: A Physical Signature Against AI Video Generation

Yuan Qing, Kunyu Zheng, Lingxiao Li, Boqing Gong, Chang Xiao

Comments 17 pages, 14 figures

2604.01653 2026-04-03 cs.LG cs.HC

Cognitive Energy Modeling for Neuroadaptive Human-Machine Systems using EEG and WGAN-GP

Sriram Sattiraju, Vaibhav Gollapalli, Aryan Shah, Timothy McMahan

2604.01652 2026-04-03 cs.AI cs.CL

ThinknCheck: Grounded Claim Verification with Compact, Reasoning-Driven, and Interpretable Models

Delip Rao, Feijiang Han, Chris Callison-Burch

Comments 15 pages