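arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.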

2603.09712 2026-03-11 cs.RO

Robotic Scene Cloning: Advancing Zero-Shot Robotic Scene Adaptation in Manipulation via Visual Prompt Editing

Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Tiancai Wang, Chang Wen Chen, Haoqiang Fan, Zhenzhong Chen

Abstract

Modern robots can perform a wide range of simple tasks and adapt to diverse scenarios in the well-trained environment. However, deploying pre-trained robot models in real-world user scenarios remains challenging due to their limited zero-shot capabilities, often necessitating extensive on-site data collection. To address this issue, we propose Robotic Scene Cloning (RSC), a novel method designed for scene-specific adaptation by editing existing robot operation trajectories. RSC achieves accurate and scene-consistent sample generation by leveraging a visual prompting mechanism and a carefully tuned condition injection module. Not only transferring textures but also performing moderate shape adaptations in response to the visual prompts, RSC demonstrates reliable task performance across a variety of object types. Experiments across various simulated and real-world environments demonstrate that RSC significantly enhances policy generalization in target environments.

2603.09706 2026-03-11 cs.AI

OOD-MMSafe: Advancing MLLM Safety from Harmful Intent to Hidden Consequences

Ming Wen, Kun Yang, Jingyu Zhang, Yuxuan Liu, Shiwen Cui, Shouling Ji, Xingjun Ma

Comments: 30 pages

2603.09703 2026-03-11 cs.CV

ProGS: Towards Progressive Coding for 3D Gaussian Splatting

Zhiye Tang, Lingzhuo Liu, Shengjie Jiao, Qiudan Zhang, Junhui Hou, You Yang, Xu Wang

2603.09702 2026-03-11 cs.CV

TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR

Fayaz Ali Dharejo, Sharif S. M. A., Aiman Khalil, Nachiket Chaudhary, Rizwan Ali Naqvi, Radu Timofte

2603.09696 2026-03-11 cs.CV

TemporalDoRA: Temporal PEFT for Robust Surgical Video Question Answering

Luca Carlini, Chiara Lena, Cesare Hassan, Danail Stoyanov, Elena De Momi, Sophia Bano, Mobarak I. Hoque

2603.09693 2026-03-11 cs.LG cond-mat.mtrl-sci physics.comp-ph

Physics-informed neural operator for predictive parametric phase-field modelling

Nanxi Chen, Airong Chen, Rujin Ma

2603.09691 2026-03-11 cs.CL cs.AI

ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

Dechuan Teng, Chunlin Lu, Libo Qin, Wanxiang Che

Comments: Published at International Journal of Machine Learning and Cybernetics (IJMLC)

DOI: 10.1007/s13042-025-02823-6
Journal ref: Int. J. Mach. Learn. & Cyber. 17, 127 (2026)

Abstract

Existing end-to-end modeling methods for modular task-oriented dialog systems are typically tailored to specific datasets, making it challenging to adapt to new dialog scenarios. In this work, we propose ESAinsTOD, a unified End-to-end Schema-Aware Instruction-tuning framework for general Task-Oriented Dialog modeling. This framework introduces a structured methodology to go beyond simply fine-tuning Large Language Models (LLMs), enabling flexible adaptation to various dialogue task flows and schemas. Specifically, we leverage full-parameter fine-tuning of LLMs and introduce two alignment mechanisms to make the resulting system both instruction-aware and schema-aware: (i) instruction alignment, which ensures that the system faithfully follows task instructions to complete various task flows from heterogeneous TOD datasets; and (ii) schema alignment, which encourages the system to make predictions adhering to the specified schema. In addition, we employ session-level end-to-end modeling, which allows the system to access the results of previously executed task flows within the dialogue history, to bridge the gap between the instruction-tuning paradigm and the real-world application of TOD systems. Empirical results show that while a fine-tuned LLM serves as a strong baseline, our structured approach provides significant additional benefits. In particular, our findings indicate that: (i) ESAinsTOD outperforms state-of-the-art models by a significant margin on end-to-end task-oriented dialog modeling benchmarks: CamRest676, In-Car and MultiWOZ; (ii) more importantly, it exhibits superior generalization capabilities across various low-resource settings, with the proposed alignment mechanisms significantly enhancing zero-shot performance; and (iii) our instruction-tuning paradigm substantially improves the model's robustness against data noise and cascading errors.

2603.09685 2026-03-11 cs.CL cs.AI cs.IR

Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records

Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van Es

Comments: 17 pages, 3 figures, 5 tables

2603.09684 2026-03-11 cs.LG

On Catastrophic Forgetting in Low-Rank Decomposition-Based Parameter-Efficient Fine-Tuning

Muhammad Ahmad, Jingjing Zheng, Yankai Cao

2603.09681 2026-03-11 cs.CV

Improving 3D Foot Motion Reconstruction in Markerless Monocular Human Motion Capture

Tom Wehrbein, Bodo Rosenhahn

Comments: Accepted at the 2026 International Conference on 3D Vision (3DV)

2603.09675 2026-03-11 cs.LG cs.AI

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

Federico Bello, Gonzalo Chiarlone, Marcelo Fiori, Gastón García González, Federico Larroca

2603.09673 2026-03-11 cs.CV

VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM

Anh Thuan Tran, Jana Kosecka

Comments: Accepted to CVPR 2026

2603.09662 2026-03-11 cs.LG

No evaluation without fair representation: Impact of label and selection bias on the evaluation, performance and mitigation of classification models

Magali Legast, Toon Calders, François Fouss

Comments: 31 pages, 14 figures + appendix. Submitted to the ACM Journal on Responsible Computing

2603.09661 2026-03-11 cs.LG

FreqCycle: A Multi-Scale Time-Frequency Analysis Method for Time Series Forecasting

Boya Zhang, Shuaijie Yin, Huiwen Zhu, Xing He

Comments: 18 pages, 17 figures, accepted to AAAI 2026. Code available at https://github.com/boya-zhang-ai/FreqCycle

2603.09657 2026-03-11 cs.CV cs.AI cs.ET eess.IV

When to Lock Attention: Training-Free KV Control in Video Diffusion

Tianyi Zeng, Jincheng Gao, Tianyi Wang, Zijie Meng, Miao Zhang, Jun Yin, Haoyuan Sun, Junfeng Jiao, Christian Claudel, Junbo Tan, Xueqian Wang

Comments: 18 pages, 9 figures, 3 tables

2603.09654 2026-03-11 cs.CL cs.IR

Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025

Isabelle Augenstein

2603.09653 2026-03-11 cs.CV cs.RO

OTPL-VIO: Robust Visual-Inertial Odometry with Optimal Transport Line Association and Adaptive Uncertainty

Zikun Chen, Wentao Zhao, Yihe Niu, Tianchen Deng, Jingchuan Wang

2603.09651 2026-03-11 cs.LG physics.geo-ph

Well Log-Guided Synthesis of Subsurface Images from Sparse Petrography Data Using cGANs

Ali Sadeghkhani, A. Assadi, B. Bennett, A. Rabbani

Comments: 6 pages, 3 figures. Extended abstract presented at the Fifth EAGE Digitalization Conference & Exhibition, 24-26 March 2025, United Kingdom

2603.09641 2026-03-11 cs.AI cs.IR

PRECEPT: Planning Resilience via Experience, Context Engineering & Probing Trajectories: A Unified Framework for Test-Time Adaptation with Compositional Rule Learning and Pareto-Guided Prompt Evolution

Arash Shahmansoori

Comments: 50 pages, 14 figures. Code and reproducibility resources: https://github.com/arash-shahmansoori/precept-framework

2603.09624 2026-03-11 cs.CV

Decoder-Free Distillation for Quantized Image Restoration

S. M. A. Sharif, Abdur Rehman, Seongwan Kim, Jaeho Lee

2603.09621 2026-03-11 cs.CV

Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution

Shuting Liu, Lei Zhang, Wei Huang, Zhao Zhang, Zizhou Wang

Comments: Accepted to ICASSP

2603.09616 2026-03-11 cs.CL

Surgical Repair of Collapsed Attention Heads in ALiBi Transformers

Palmer Schallon

Comments: 15 pages, 7 figures, 2 supplementary figures. Code: https://github.com/Palmerschallon/bloom-head-surgery Checkpoints: https://huggingface.co/TheNexus42/bloom-1b7-head-surgery

2603.09611 2026-03-11 cs.CV

ParTY: Part-Guidance for Expressive Text-to-Motion Synthesis

KunHo Heo, SuYeon Kim, Yonghyun Gwon, Youngbin Kim, MyeongAh Cho

Comments: Accepted by CVPR 2026. Code: https://github.com/VisualScienceLab-KHU/ParTY

2603.09606 2026-03-11 cs.LG

Learning the Hierarchical Organization in Brain Network for Brain Disorder Diagnosis

Jingfeng Tang, Peng Cao, Guangqi Wen, Jinzhu Yang, Xiaoli Liu, Osmar R. Zaiane

2603.09601 2026-03-11 cs.LG stat.ME stat.ML

MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation

Elisabeth Sommer James, Asger Hobolth, Marta Pelizzola

2603.09596 2026-03-11 cs.RO

A Generalized Voronoi Graph based Coverage Control Approach for Non-Convex Environment

Zuyi Guo, Ronghao Zheng, Meiqin Liu, Senlin Zhang

Comments: 8 pages, 7 figures, published to ACC 2026

2603.09595 2026-03-11 cs.CL

Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models

Shreyas Meher

Comments: 33 pages, 5 figures, 13 tables (including appendix)

2603.09589 2026-03-11 cs.LG cs.NA math.NA

Memorization capacity of deep ReLU neural networks characterized by width and depth

Xin Yang, Yunfei Yang

2603.09585 2026-03-11 cs.RO

Towards Terrain-Aware Safe Locomotion for Quadrupedal Robots Using Proprioceptive Sensing

Peiyu Yang, Jiatao Ding, Wei Pan, Claudio Semini, Cosimo Della Santina

Comments: 8 pages, 10 figures

2603.09582 2026-03-11 cs.CV

BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers

Chaodong Xiao, Zhengqiang Zhang, Lei Zhang

Comments: Accepted by CVPR 2026