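arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.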

2603.23227 2026-03-25 cs.RO

Efficient Hybrid SE(3)-Equivariant Visuomotor Flow Policy via Spherical Harmonics for Robot Manipulation

Qinglun Zhang, Shen Cheng, Tian Dan, Haoqiang Fan, Guanghui Liu, Shuaicheng Liu

Comments Accepted by CVPR 2026

Abstract

While existing equivariant methods enhance data efficiency, they suffer from high computational intensity, reliance on single-modality inputs, and instability when combined with fast-sampling methods. In this work, we propose E3Flow, a novel framework that addresses the critical limitations of equivariant diffusion policies. E3Flow overcomes these challenges, successfully unifying efficient rectified flow with stable, multi-modal equivariant learning for the first time. Our framework is built upon spherical harmonic representations to ensure rigorous SO(3) equivariance. We introduce a novel invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual modalities (point clouds and images), injecting rich visual cues into the spherical harmonic features. We evaluate E3Flow on 8 manipulation tasks from MimicGen and further conduct 4 real-world experiments to validate its effectiveness in physical environments. Simulation results show that E3Flow achieves a 3.12% improvement in average success rate over the state-of-the-art Spherical Diffusion Policy (SDP) while simultaneously delivering a 7x inference speedup. E3Flow thus demonstrates a new and highly effective trade-off between performance, efficiency, and data efficiency for robotic policy learning. Code: https://github.com/zql-kk/E3Flow.

2603.23220 2026-03-25 cs.LG cs.AI stat.ML

General Machine Learning: Theory for Learning Under Variable Regimes

Aomar Osmani

Comments 56 pages

2603.23190 2026-03-25 cs.CV

Gaze-Regularized VLMs for Ego-Centric Behavior Understanding

Anupam Pani, Yanchao Yang

2603.23186 2026-03-25 cs.CV

ViKey: Enhancing Temporal Understanding in Videos via Visual Prompting

Yeonkyung Lee, Dayun Ju, Youngmin Kim, Seil Kang, Seong Jae Hwang

Comments Accepted to CVPR 2026

2603.23184 2026-03-25 cs.CL cs.AI stat.AP

ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment

Hao Wang, Haocheng Yang, Licheng Pan, Lei Shen, Xiaoxi Li, Yinuo Wang, Zhichao Chen, Yuan Lu, Haoxuan Li, Zhouchen Lin

2603.23182 2026-03-25 cs.RO cs.SY eess.SY

Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots

Álvaro Belmonte-Baeza, José Luis Ramón, Leonard Felicetti, Miguel Cazorla, Jorge Pomares

Comments Accepted for publication in The International Journal of Robotics Research (23-Mar-2026)

2603.23179 2026-03-25 cs.CV

Gimbal360: Differentiable Auto-Leveling for Canonicalized $360^\circ$ Panoramic Image Completion

Yuqin Lu, Haofeng Liu, Yang Zhou, Jun Liang, Shengfeng He, Jing Li

Comments Project page: https://orange-3dv-team.github.io/Gimbal360

2603.23178 2026-03-25 cs.AI

SAiW: Source-Attributable Invisible Watermarking for Proactive Deepfake Defense

Bibek Das, Chandranath Adak, Soumi Chattopadhyay, Zahid Akhtar, Soumya Dutta

Abstract

Deepfakes generated by modern generative models pose a serious threat to information integrity, digital identity, and public trust. Existing detection methods are largely reactive, attempting to identify manipulations after they occur and often failing to generalize across evolving generation techniques. This motivates the need for proactive mechanisms that secure media authenticity at the time of creation. In this work, we introduce SAiW, a Source-Attributed Invisible watermarking Framework for proactive deepfake defense and media provenance verification. Unlike conventional watermarking methods that treat watermark payloads as generic signals, SAiW formulates watermark embedding as a source-conditioned representation learning problem, where watermark identity encodes the originating source and modulates the embedding process to produce discriminative and traceable signatures. The framework integrates feature-wise linear modulation to inject source identity into the embedding network, enabling scalable multi-source watermark generation. A perceptual guidance module derived from human visual system priors ensures that watermark perturbations remain visually imperceptible while maintaining robustness. In addition, a dual-purpose forensic decoder simultaneously reconstructs the embedded watermark and performs source attribution, providing both automated verification and interpretable forensic evidence. Extensive experiments across multiple deepfake datasets demonstrate that SAiW achieves high perceptual quality while maintaining strong robustness against compression, filtering, noise, geometric transformations, and adversarial perturbations. By binding digital media to its origin through invisible yet verifiable markers, SAiW enables reliable authentication and source attribution, providing a scalable foundation for proactive deepfake defense and trustworthy media provenance.

2603.23173 2026-03-25 cs.LG math.OC

A Schrödinger Eigenfunction Method for Long-Horizon Stochastic Optimal Control

Louis Claeys, Artur Goldman, Zebang Shen, Niao He

Comments Accepted to ICLR 2026, code available at https://github.com/lclaeys/eigenfunction-solver

2603.23172 2026-03-25 cs.CL

From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service

Haoyu He, Jinyu Zhuang, Haoran Chu, Shuhang Yu, J&T AI Group, Hao Wang, Kunpeng Han

2603.23168 2026-03-25 cs.CV

GSwap: Realistic Head Swapping with Dynamic Neural Gaussian Field

Jingtao Zhou, Xuan Gao, Dongyu Liu, Junhui Hou, Yudong Guo, Juyong Zhang

Comments Accepted to TVCG, Project page: https://ustc3dv.github.io/GSwap/

2603.23162 2026-03-25 cs.RO

LiZIP: An Auto-Regressive Compression Framework for LiDAR Point Clouds

Aditya Shibu, Kayvan Karim, Claudio Zito

Comments 8 pages

2603.23161 2026-03-25 cs.CV

Dual Contrastive Network for Few-Shot Remote Sensing Image Scene Classification

Zhong Ji, Liyuan Hou, Xuan Wang, Gang Wang, Yanwei Pang

2603.23153 2026-03-25 cs.CV

VoDaSuRe: A Large-Scale Dataset Revealing Domain Shift in Volumetric Super-Resolution

August Leander Høeg, Sophia Wiinberg Bardenfleth, Hans Martin Kjer, Tim Bjørn Dyrby, Vedrana Andersen Dahl, Anders Bjorholm Dahl

Comments 18 pages, 15 figures. To be published in the proceedings of the Computer Vision and Pattern Recognition Conference 2026

2603.23152 2026-03-25 cs.RO

PHANTOM Hand

Teng Yan, Jiongxu Chen, Qixiang Hua, Yue Yu, Zihang Wang, Yaohua Liu, Bingzhuo Zhong

Comments 8 pages. Submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2026

2603.23149 2026-03-25 cs.AI

Describe-Then-Act: Proactive Agent Steering via Distilled Language-Action World Models

Massimiliano Pappa, Luca Romani, Valentino Sacco, Alessio Palma, Stéphane Lathuilière, Fabio Galasso, Xavier Alameda-Pineda, Indro Spinelli

2603.23136 2026-03-25 cs.CL cs.LG

HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature

Devvrat Joshi, Islem Rekik

Abstract

Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entities, often fail to generalize across domains, and typically overlook the hierarchical nature of scientific knowledge. While general-purpose large language models (LLMs) offer adaptability, they are computationally expensive and yield inconsistent accuracy on specialized tasks. As a result, current KGs are shallow and inconsistent, limiting their utility for exploration and synthesis. We propose a two-stage framework for scalable, zero-shot scientific KG construction. The first stage, Z-NERD, introduces (i) Orthogonal Semantic Decomposition (OSD), which promotes domain-agnostic entity recognition by isolating semantic "turns" in text, and (ii) a Multi-Scale TCQK attention mechanism that captures coherent multi-word entities through n-gram-aware attention heads. The second stage, HGNet, performs relation extraction with hierarchy-aware message passing, explicitly modeling parent, child, and peer relations. To enforce global consistency, we introduce two complementary objectives: a Differentiable Hierarchy Loss to discourage cycles and shortcut edges, and a Continuum Abstraction Field (CAF) Loss that embeds abstraction levels along a learnable axis in Euclidean space. This is the first approach to formalize hierarchical abstraction as a continuous property within standard Euclidean embeddings, offering a simpler alternative to hyperbolic methods. We release SPHERE (https://github.com/basiralab/SPHERE), a multi-domain benchmark for hierarchical relation extraction. Our framework establishes a new state of the art on SciERC, SciER, and SPHERE, improving NER by 8.08% and RE by 5.99% on out-of-distribution tests. In zero-shot settings, gains reach 10.76% for NER and 26.2% for RE.

2603.23134 2026-03-25 cs.LG stat.AP

A Bayesian Learning Approach for Drone Coverage Network: A Case Study on Cardiac Arrest in Scotland

Tathagata Basu, Edoardo Patelli, Gianluca Filippi, Ben Parsonage, Christy Maddock, Massimiliano Vasile, Marco Fossati, Adam Loyd, Shaun Marshall, Paul Gowens

2603.23132 2026-03-25 cs.CV

InterDyad: Interactive Dyadic Speech-to-Video Generation by Querying Intermediate Visual Guidance

Dongwei Pan, Longwei Guo, Jiazhi Guan, Luying Huang, Yiding Li, Haojie Liu, Haocheng Feng, Wei He, Kaisiyuan Wang, Hang Zhou

Comments Project Page: https://interdyad.github.io/

2603.23126 2026-03-25 cs.CV

3rd Place of MeViS-Audio Track of the 5th PVUW: VIRST-Audio

Jihwan Hong, Jaeyoung Do

Comments 4 pages, 2 figures. Technical report for the CVPR 2026 PVUW Workshop (MeViS-Audio Track)

2603.23122 2026-03-25 cs.CV

PiCo: Active Manifold Canonicalization for Robust Robotic Visual Anomaly Detection

Teng Yan, Binkai Liu, Shuai Liu, Yue Yu, Bingzhuo Zhong

Comments 16 pages. Submitted to the European Conference on Computer Vision (ECCV) 2026

2603.23118 2026-03-25 cs.CV cs.MM

SMSP: A Plug-and-Play Strategy of Multi-Scale Perception for MLLMs to Perceive Visual Illusions

Jinzhe Tu, Ruilei Guo, Zihan Guo, Junxiao Yang, Shiyao Cui, Minlie Huang

2603.23116 2026-03-25 cs.CV

Automatic Segmentation of 3D CT scans with SAM2 using a zero-shot approach

Miquel Lopez Escoriza, Pau Amargant Alvarez

Comments 11 pages, 5 figures

2603.23115 2026-03-25 cs.CV

AgentFoX: LLM Agent-Guided Fusion with eXplainability for AI-Generated Image Detection

Yangxin Yu, Yue Zhou, Bin Li, Kaiqing Lin, Haodong Li, Jiangqun Ni, Bo Cao

2603.23114 2026-03-25 cs.AI cs.CL cs.CY cs.HC

Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

Adrian Sauter, Mona Schirmer

Comments preprint

2603.23112 2026-03-25 cs.RO

Active Robotic Perception for Disease Detection and Mapping in Apple Trees

Hayden Feddock, Francisco Yandun, Srđan Aćimović, Abhisesh Silwal

Comments 8 pages, 6 figures, IROS 2026 conference

2603.23104 2026-03-25 cs.CV

NeuroSeg Meets DINOv3: Transferring 2D Self-Supervised Visual Priors to 3D Neuron Segmentation via DINOv3 Initialization

Yik San Cheng, Runkai Zhao, Weidong Cai

Comments 17 pages, 12 figures, and 11 tables. Accepted to CVPR 2026

2603.23091 2026-03-25 cs.CL

When Language Models Lose Their Mind: The Consequences of Brain Misalignment

Gabriele Merlin, Mariya Toneva

Comments Accepted at ICLR 2026

2603.23086 2026-03-25 cs.LG cs.CV

Policy-based Tuning of Autoregressive Image Models with Instance- and Distribution-Level Rewards

Orhun Buğra Baran, Melih Kandemir, Ramazan Gokberk Cinbis

2603.23079 2026-03-25 cs.RO

AirSimAG: A High-Fidelity Simulation Platform for Air-Ground Collaborative Robotics

Yangjie Cui, Xin Dong, Boyang Gao, Jinwu Xiang, Daochun Li, Zhan Tu