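arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical & systems engineering, and more.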

2603.05761 2026-03-09 cs.LG

Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing

Vansh Bansal, James G Scott

Abstract

Rectified Flow (RF) models achieve state-of-the-art generation quality, yet controlling them for precise tasks -- such as semantic editing or blind image recovery -- remains a challenge. Current approaches bifurcate into inversion-based guidance, which suffers from "geometric locking" by rigidly adhering to the source trajectory, and posterior sampling approximations (e.g., DPS), which are computationally expensive and unstable. In this work, we propose Score-Guided Proximal Projection (SGPP), a unified framework that bridges the gap between deterministic optimization and stochastic sampling. We reformulate the recovery task as a proximal optimization problem, defining an energy landscape that balances fidelity to the input with realism from the pre-trained score field. We theoretically prove that this objective induces a normal contraction property, geometrically guaranteeing that out-of-distribution inputs are snapped onto the data manifold, and that it effectively reaches the posterior mode constrained to the manifold. Crucially, we demonstrate that SGPP generalizes state-of-the-art editing methods: RF-inversion is effectively a limiting case of our framework. By relaxing the proximal variance, SGPP enables "soft guidance," offering a continuous, training-free trade-off between strict identity preservation and generative freedom.
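The abstract frames recovery as a proximal optimization whose energy balances fidelity to the input against realism from a pre-trained score field. The sketch below is a toy illustration of that general idea only, not the authors' SGPP algorithm: it assumes a quadratic fidelity term weighted by 1/(2·lam), swaps the learned rectified-flow score field for the analytic score of an isotropic Gaussian, and minimizes the energy by plain gradient descent. The names `gaussian_score`, `proximal_map`, and `lam` are hypothetical.

```python
import numpy as np


def gaussian_score(x, mu, s2):
    """Score (gradient of the log-density) of an isotropic Gaussian N(mu, s2 * I).

    Toy stand-in for a learned score field (assumption, not the paper's model).
    """
    return -(x - mu) / s2


def proximal_map(y, score_fn, lam, step, n_steps=500):
    """Minimize E(x) = ||x - y||^2 / (2 * lam) - log p(x) by gradient descent.

    lam acts like a proximal variance: a small lam keeps x close to the input y
    (fidelity), a large lam lets the prior pull x toward high-density regions.
    Only the score is needed, because grad(-log p(x)) = -score(x).
    """
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(n_steps):
        grad = (x - y) / lam - score_fn(x)
        x = x - step * grad
    return x


# Usage: compare the iterative result with the closed-form Gaussian minimizer.
mu, s2 = np.zeros(2), 1.0
y = np.array([2.0, -4.0])


def score(x):
    return gaussian_score(x, mu, s2)


for lam in (0.1, 1.0, 10.0):
    # Step size 0.5 / L, where L = 1/lam + 1/s2 is the curvature of E for this toy prior.
    step = 0.5 / (1.0 / lam + 1.0 / s2)
    x_hat = proximal_map(y, score, lam, step)
    closed_form = (s2 * y + lam * mu) / (s2 + lam)
    print(lam, x_hat, closed_form)
```

In this toy setting the minimizer has the closed form (s2·y + lam·mu)/(s2 + lam), so a small `lam` returns essentially the input (strict fidelity) and a large `lam` pulls the output toward the prior mean, loosely mirroring the trade-off between identity preservation and generative freedom that the abstract attributes to relaxing the proximal variance.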

2603.05760 2026-03-09 cs.LG

MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation

Rifny Rachman, Josh Tingey, Richard Allmendinger, Wei Pan, Pradyumn Shukla, Bahrul Ilmi Nasution

2603.05758 2026-03-09 cs.CV cs.GR cs.LG

Full Dynamic Range Sky-Modelling For Image Based Lighting

Ian J. Maquignaz

2603.05757 2026-03-09 cs.RO

EmboAlign: Aligning Video Generation with Compositional Constraints for Zero-Shot Manipulation

Gehao Zhang, Zhenyang Ni, Payal Mohapatra, Han Liu, Ruohan Zhang, Qi Zhu

2603.05754 2026-03-09 cs.RO

Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation

Dian Yu, Qingchuan Zhou, Bingkun Huang, Majid Khadiv, Zewen Yang

2603.05751 2026-03-09 cs.RO cs.HC

Vision-Language System using Open-Source LLMs for Gestures in Medical Interpreter Robots

Thanh-Tung Ngo, Emma Murphy, Robert J. Ross

2603.05750 2026-03-09 cs.CL

NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories

Genet Asefa Gesese, Zongxiong Chen, Shufan Jiang, Mary Ann Tan, Zhaotai Liu, Sonja Schimmler, Harald Sack

Comments To be published (Accepted at WWW'26)

2603.05748 2026-03-09 cs.RO

Environment-Aware Path Generation for Robotic Additive Manufacturing of Structures

Mahsa Rabiei, Reza Moini

2603.05739 2026-03-09 cs.LG cs.AI

Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment

Ved Sriraman, Adam Block

Comments 52 pages

2603.05732 2026-03-09 cs.CV

From Phase Grounding to Intelligent Surgical Narratives

Ethan Peterson, Huixin Zhan

2603.05729 2026-03-09 cs.CV

Unlocking ImageNet's Multi-Object Nature: Automated Large-Scale Multilabel Annotation

Junyu Chen, Md Yousuf Harun, Christopher Kanan

Comments Accepted to CVPR 2026 Findings

2603.05727 2026-03-09 cs.CL cs.NA math.NA

Structured Multidimensional Representation Learning for Large Language Models

Alaa El Ichi, Khalide Jbilou, Mohamed El Guide, Franck Dufrenois

Comments 25 pages, 6 figures. Preprint of a journal submission

2603.05723 2026-03-09 cs.CL cs.AI

Cultural Perspectives and Expectations for Generative AI: A Global Survey Approach

Erin van Liemt, Renee Shelby, Andrew Smart, Sinchana Kumbale, Richard Zhang, Neha Dixit, Qazi Mamunur Rashid, Jamila Smith-Loud

Comments 21 pages, 5 figures, 6 tables

2603.05716 2026-03-09 cs.RO cs.SY eess.SY

Introducing the transitional autonomous vehicle lane-changing dataset: Empirical Experiments

Abhinav Sharma, Zijun He, Danjue Chen

2603.05711 2026-03-09 cs.CV

Any to Full: Prompting Depth Anything for Depth Completion in One Stage

Zhiyuan Zhou, Ruofeng Liu, Taichi Liu, Weijian Zuo, Shanshan Wang, Zhiqing Hong, Desheng Zhang

2603.05708 2026-03-09 cs.CV

Interpretable Perception and Reasoning for Audiovisual Geolocation

Yiyang Su, Xiaoming Liu

2603.05706 2026-03-09 cs.AI

Reasoning Models Struggle to Control their Chains of Thought

Chen Yueh-Han, Robert McCarthy, Bruce W. Lee, He He, Ian Kivlichan, Bowen Baker, Micah Carroll, Tomek Korbak

2603.05697 2026-03-09 cs.CV

MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents

Dannong Xu, Zhongyu Yang, Jun Chen, Yingfang Yuan, Ming Hu, Lei Sun, Luc Van Gool, Danda Pani Paudel, Chun-Mei Feng

2603.05694 2026-03-09 cs.LG cs.FL

Warm Starting State-Space Models with Automata Learning

William Fishell, Sam Nicholas Kouteili, Mark Santolucito

2603.05690 2026-03-09 cs.CL

FreeTxt-Vi: A Benchmarked Vietnamese-English Toolkit for Segmentation, Sentiment, and Summarisation

Hung Nguyen Huy, Mo El-Haj, Dawn Knight, Paul Rayson

Comments 10 pages

2603.05686 2026-03-09 cs.CV

OWL: A Novel Approach to Machine Perception During Motion

Daniel Raviv, Juan D. Yepes

Abstract

We introduce a perception-related function, OWL, designed to address the complex challenges of 3D perception during motion. It derives its values directly from two fundamental visual motion cues, with one set of cue values per point per time instant. During motion, two visual motion cues relative to a fixation point emerge: 1) perceived local visual looming of points near the fixation point, and 2) perceived rotation of the rigid object relative to the fixation point. It also expresses the relation between two well-known physical quantities, the relative instantaneous directional range and directional translation in 3D between the camera and any visible 3D point, without explicitly requiring their measurement or prior knowledge of their individual values. OWL offers a unified, analytical time-based approach that enhances and simplifies key perception capabilities, including scaled 3D mapping and camera heading. Simulations demonstrate that OWL achieves geometric constancy of 3D objects over time and enables scaled 3D scene reconstruction from visual motion cues alone. By leveraging direct measurements from raw visual motion image sequences, OWL values can be obtained without prior knowledge of stationary environments, moving objects, or camera motion. This approach employs minimalistic, pixel-based, parallel computations, providing an alternative real-time representation for 3D points in relative motion. OWL bridges the gap between theoretical concepts and practical applications in robotics and autonomous navigation and may unlock new possibilities for real-time decision-making and interaction, potentially serving as a building block for next-generation autonomous systems. This paper offers an alternative perspective on machine perception, with implications that may extend to natural perception and contribute to a better understanding of behavioral psychology and neural functionality.

2603.05673 2026-03-09 cs.LG cs.SC math.AG

Reinforcement Learning for Power-Flow Network Analysis

Alperen Ergur, Julia Lindberg, Vinny Miller

Comments More experiments will be added soon.

2603.05671 2026-03-09 cs.LG

The Value of Graph-based Encoding in NBA Salary Prediction

Junhao Su, David Grimsman, Christopher Archibald

Comments 6 pages, IEEE conference template style. Submitted to IETC 2026; decision expected on March 22.

2603.05670 2026-03-09 cs.RO

TransMASK: Masked State Representation through Learned Transformation

Sagar Parekh, Preston Culbertson, Dylan P. Losey

2603.05663 2026-03-09 cs.CV

Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding

Jiaqi Li, Shuntian Zheng, Yixian Shen, Jia-Hong Huang, Xiaoman Lu, Minzhe Ni, Yu Guan

2603.05651 2026-03-09 cs.CL cs.AI cs.HC

The Fragility Of Moral Judgment In Large Language Models

Tom van Nuenen, Pratik S. Sachdeva

Comments 22 pages, 7 figures, 10 tables, plus appendices

2603.05641 2026-03-09 cs.RO cs.HC

RFM-HRI: A Multimodal Dataset of Medical Robot Failure, User Reaction and Recovery Preferences for Item Retrieval Tasks

Yashika Batra, Giuliano Pioldi, Promise Ekpo, Arman Sayatqyzy, Purnjay Maruur, Shalom Otieno, Kevin Ching, Angelique Taylor

Abstract

While robots deployed in real-world environments inevitably experience interaction failures, understanding how users respond through verbal and non-verbal behaviors remains under-explored in human-robot interaction (HRI). This gap is particularly significant in healthcare-inspired settings, where interaction failures can directly affect task performance and user trust. We present the Robot Failures in Medical HRI (RFM-HRI) Dataset, a multimodal dataset capturing dyadic interactions between humans and robots embodied in crash carts, where communication failures are systematically induced during item retrieval tasks. Through Wizard-of-Oz studies with 41 participants across laboratory and hospital settings, we recorded responses to four failure types (speech, timing, comprehension, and search) derived from three years of crash-cart robot interaction data. The dataset contains 214 interaction samples including facial action units, head pose, speech transcripts, and post-interaction self-reports. Our analysis shows that failures significantly degrade affective valence and reduce perceived control compared to successful interactions. Failures are strongly associated with confusion, annoyance, and frustration, while successful interactions are characterized by surprise, relief, and confidence in task completion. Emotional responses also evolve across repeated failures, with confusion decreasing and frustration increasing over time. This work contributes (1) a publicly available multimodal dataset (RFM-HRI), (2) analysis of user responses to different failure types and preferred recovery strategies, and (3) a crash-cart retrieval scenario enabling systematic comparison of recovery strategies with implications for safety-critical failure recovery. Our findings provide a foundation for failure detection and recovery methods in embodied HRI.

2603.05638 2026-03-09 cs.RO

Control Lyapunov Functions for Underactuated Soft Robots

Huy Pham, Zach J. Patterson

Comments 8 pages, 5 figures, 2 tables. Submitted for publication to a conference

2603.05629 2026-03-09 cs.CV

Rethinking Concept Bottleneck Models: From Pitfalls to Solutions

Merve Tapli, Quentin Bouniot, Wolfgang Stammer, Zeynep Akata, Emre Akbas

Comments Accepted to CVPR 2026

2603.05625 2026-03-09 cs.LG

Identifying Adversary Characteristics from an Observed Attack

Soyon Choi, Scott Alfeld, Meiyi Ma