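arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.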

2603.18719 2026-03-26 cs.CV cs.AI

Ontology-Guided Diffusion for Zero-Shot Visual Sim2Real Transfer

Mohamed Youssef, Mayar Elfares, Anna-Maria Meer, Matteo Bortoletto, Andreas Bulling

English abstract

Bridging the simulation-to-reality (sim2real) gap remains challenging as labelled real-world data is scarce. Existing diffusion-based approaches rely on unstructured prompts or statistical alignment, which do not capture the structured factors that make images look real. We introduce Ontology-Guided Diffusion (OGD), a neuro-symbolic zero-shot sim2real image translation framework that represents realism as structured knowledge. OGD decomposes realism into an ontology of interpretable traits -- such as lighting and material properties -- and encodes their relationships in a knowledge graph. From a synthetic image, OGD infers trait activations and uses a graph neural network to produce a global embedding. In parallel, a symbolic planner uses the ontology traits to compute a consistent sequence of visual edits needed to narrow the realism gap. The graph embedding conditions a pretrained instruction-guided diffusion model via cross-attention, while the planned edits are converted into a structured instruction prompt. Across benchmarks, our graph-based embeddings better distinguish real from synthetic imagery than baselines, and OGD outperforms state-of-the-art diffusion methods in sim2real image translations. Overall, OGD shows that explicitly encoding realism structure enables interpretable, data-efficient, and generalisable zero-shot sim2real transfer.

2603.18336 2026-03-26 cs.RO

ManiDreams: An Open-Source Library for Robust Object Manipulation via Uncertainty-aware Task-specific Intuitive Physics

Gaotian Wang, Kejia Ren, Andrew S. Morgan, Kaiyu Hang

Comments 9 pages, 10 figures. Project page at https://rice-robotpi-lab.github.io/ManiDreams/

2603.18067 2026-03-26 cs.CV cs.DB

DarkDriving: A Real-World Day and Night Aligned Dataset for Autonomous Driving in the Dark Environment

Wuqi Wang, Haochen Yang, Baolu Li, Jiaqi Sun, Xiangmo Zhao, Zhigang Xu, Qing Guo, Haigen Min, Tianyun Zhang, Hongkai Yu

Comments 8 pages, 8 figures. Accepted to ICRA 2026

2603.17889 2026-03-26 cs.CV

Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation

Yingjie Chen, Shilun Lin, Cai Xing, Binxin Yang, Long Zhou, Qixin Yan, Wenjing Wang, Dingming Liu, Hao Liu, Chen Li, Jing Lyu

2603.13909 2026-03-26 cs.LG cs.AI cs.DC

FedPBS: Proximal-Balanced Scaling Federated Learning Model for Robust Personalized Training for Non-IID Data

Eman M. AbouNassar, Amr Elshall, Sameh Abdulah

2603.13904 2026-03-26 cs.CV cs.AI cs.LG cs.RO

Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition

Seokmin Lee, Yunghee Lee, Byeonghyun Pak, Byeongju Woo

Comments Accepted to CVPR 2026 Workshop: Pixel-level Video Understanding in the Wild

2603.13528 2026-03-26 cs.RO cs.CV

Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis

Dayou Li, Jiuzhou Lei, Hao Wang, Lulin Liu, Yunhao Yang, Zihan Wang, Bangya Liu, Minghui Zheng, Zhiwen Fan

2603.13334 2026-03-26 cs.LG cs.CV cs.PL

Lipschitz-Based Robustness Certification Under Floating-Point Execution

Toby Murray

2603.13119 2026-03-26 cs.CV cs.AI

Geometry-Guided Camera Motion Understanding in VideoLLMs

Haoan Feng, Sri Harsha Musunuri, Guan-Ming Su

Comments 10 pages, 7 figures, supplementary included CVPR2026 PVUW

2603.12665 2026-03-26 cs.RO

TacVLA: Contact-Aware Tactile Fusion for Robust Vision-Language-Action Manipulation

Kaidi Zhang, Heng Zhang, Zhengtong Xu, Zhiyuan Zhang, Md Rakibul Islam Prince, Xiang Li, Xiaojing Han, Yuhao Zhou, Arash Ajoudani, Yu She

Comments 9 pages, 7 figures

2603.12252 2026-03-26 cs.CV cs.CL

EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models

Xuanlang Dai, Yujie Zhou, Long Xing, Jiazi Bu, Xilin Wei, Yuhong Liu, Beichen Zhang, Kai Chen, Yuhang Zang

Comments 23 pages, 18 figures, The code and dataset are publicly available at https://lennoxdai.github.io/EndoCoT-Webpage/

2603.11804 2026-03-26 cs.CV cs.LG

OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs

Stefan Maria Ailuro, Mario Markov, Mohammad Mahdi, Delyan Boychev, Luc Van Gool, Danda Pani Paudel

2603.11442 2026-03-26 cs.AI cs.CV

GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics

Yan Zhang, Simiao Ren, Ankit Raj, En Wei, Dennis Ng, Alex Shen, Jiayu Xue, Yuxin Zhang, Evelyn Marotta

Comments 12 pages, 7 figures, 7 tables

2603.11380 2026-03-26 cs.CV

DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding

Mingzhe Tao, Ruiping Liu, Junwei Zheng, Yufan Chen, Kedi Ying, M. Saquib Sarfraz, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

Comments Accepted to CVPR DriveX Workshop. Dataset and Code: https://github.com/jtjmd/DRIVEXQA

2602.23479 2026-03-26 cs.CL

FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records

Michael Frew, Nishit Bheda, Bryan Tripp

Comments Accepted to LREC 2026 CL4Health Workshop

2602.22212 2026-03-26 cs.CV

Neu-PiG: Neural Preconditioned Grids for Fast Dynamic Surface Reconstruction on Long Sequences

Julian Kaltheuner, Hannah Dröge, Markus Plack, Patrick Stotko, Reinhard Klein

Comments CVPR 2026, Code: https://github.com/vc-bonn/neu-pig

2602.21917 2026-03-26 cs.CV

Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration

Chen Wu, Ling Wang, Zhuoran Zheng, Yuning Cui, Zhixiong Yang, Xiangyu Chen, Yue Zhang, Weidong Jiang, Jingyuan Xia

Comments Accepted by CVPR26

2602.21415 2026-03-26 cs.LG cs.SY eess.SY

Benchmarking State Space Models, Transformers, and Recurrent Networks for US Grid Forecasting

Sunki Hong, Jisoo Lee

Comments 11 pages, 2 figures, 8 tables

2602.19900 2026-03-26 cs.CV cs.GR

ExpPortrait: Expressive Portrait Generation via Personalized Representation

Junyi Wang, Yudong Guo, Boyang Guo, Shengming Yang, Juyong Zhang

Comments CVPR 2026, Project Page: https://ustc3dv.github.io/ExpPortrait/

2602.19345 2026-03-26 cs.LG cs.AI

Smooth Gate Functions for Soft Advantage Policy Optimization

Egor Denisov, Svetlana Glazyrina, Maksim Kryzhanovskiy, Roman Ischenko

2602.18996 2026-03-26 cs.CV

Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction

Shannan Yan, Leqi Zheng, Keyu Lv, Jingchen Ni, Hongyang Wei, Jiajun Zhang, Guangting Wang, Jing Lyu, Chun Yuan, Fengyun Rao

Comments The paper has been accepted to CVPR 2026 main track

2602.12684 2026-03-26 cs.RO cs.LG

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Rui Cai, Jun Guo, Xinze He, Piaopiao Jin, Jie Li, Bingxuan Lin, Futeng Liu, Wei Liu, Fei Ma, Kun Ma, Feng Qiu, Heng Qu, Yifei Su, Qiao Sun, Dong Wang, Donghao Wang, Yunhong Wang, Rujie Wu, Diyun Xiang, Yu Yang, Hangjun Ye, Yuan Zhang, Quanyun Zhou

Comments Project page: https://xiaomi-robotics-0.github.io

2602.07150 2026-03-26 cs.LG cs.AI cs.SE

On Randomness in Agentic Evals

Bjarni Haukur Bjarnason, André Silva, Martin Monperrus

2602.02378 2026-03-26 cs.CL cs.AI

From Sycophancy to Sensemaking: Premise Governance for Human-AI Decision Making

Raunak Jain

2602.00381 2026-03-26 cs.CV cs.LG

Modeling Image-Caption Rating from Comparative Judgments

Kezia Minni, Qiang Zhang, Monoshiz Mahbub Khan, Zhe Yu

Comments 12 pages

English abstract

Image caption rating is becoming increasingly important because computer-generated captions are used extensively for descriptive annotation. However, rating the accuracy of captions in describing images is time-consuming and subjective in nature. In contrast, it is often easier for people to compare (between two pairs) which image-caption pair better matches each other. In this study, we propose a machine learning framework that models such comparative judgments instead of direct ratings. The model can then be applied to rank unseen image-caption pairs in the same way as a regression model trained on direct ratings. Inspired by a state-of-the-art regression approach, we extracted visual and text features using a pre-trained ViLBERT model and tweaked the learning parameters of the baseline model to improve the model performance. This new regression model (with Kendall's $\tau_c=0.812$) outperformed the baseline model (with Kendall's $\tau_c=0.758$) on the VICR dataset. The same model structure was applied to the comparative learning framework. Trained on comparative judgments (image-caption pair A better matches each other than image-caption pair B), the comparative learning model achieved a performance similar (with Kendall's $\tau_c=0.804$) to that of the regression model. In addition, a small-scale human subject study was conducted to compare the cost and quality of direct ratings, pairwise comparisons, and same-image comparisons. The results showed that comparative judgments yielded faster results and greater agreement among human annotators than direct ratings. These results suggest that collecting comparative judgments instead of direct ratings as training data labels is promising for lower annotation costs and greater consistency. The model trained on such comparative judgments can perform as well as the model trained on direct ratings.
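
The idea in the abstract above, learning a scorer from "A matches better than B" judgments and then checking its ranking with Kendall's tau-c, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: the linear scorer, the Bradley-Terry-style logistic loss on score differences, the two made-up features per item, and the hidden quality used to generate the comparisons. The paper itself uses ViLBERT features and its own model structure.

```python
import math


def kendall_tau_c(x, y):
    """Stuart's tau-c between two equal-length score lists (O(n^2) pair scan)."""
    n = len(x)
    m = min(len(set(x)), len(set(y)))
    if n != len(y) or m < 2:
        raise ValueError("need equal-length inputs with >= 2 distinct values each")
    balance = 0  # concordant minus discordant pairs; tied pairs contribute 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            balance += (s > 0) - (s < 0)
    return 2 * m * balance / (n * n * (m - 1))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus_neg(margin):
    """Numerically stable log(1 + exp(-margin))."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_loss(weights, diffs):
    """Mean logistic loss over (winner - loser) feature differences."""
    return sum(softplus_neg(dot(weights, d)) for d in diffs) / len(diffs)


def train_pairwise(diffs, epochs=500, lr=0.5):
    """Full-batch gradient descent on the pairwise logistic loss."""
    weights = [0.0] * len(diffs[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for d in diffs:
            # d/d(margin) of log(1 + exp(-margin)) is sigmoid(margin) - 1
            coeff = sigmoid(dot(weights, d)) - 1.0
            for k, dk in enumerate(d):
                grad[k] += coeff * dk / len(diffs)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Hypothetical toy data: two made-up features per image-caption pair.
features = [(0.1, 0.9), (0.4, 0.5), (0.8, 0.2), (0.3, 0.1), (0.9, 0.7), (0.6, 0.6)]
# Hypothetical hidden quality, used only to simulate annotator comparisons.
hidden_quality = [2 * f0 - f1 for f0, f1 in features]

# One comparative judgment per pair of items: (winner index, loser index).
pairs = [
    (i, j) if hidden_quality[i] > hidden_quality[j] else (j, i)
    for i in range(len(features))
    for j in range(i + 1, len(features))
]
diffs = [[a - b for a, b in zip(features[w], features[l])] for w, l in pairs]

weights = train_pairwise(diffs)
scores = [dot(weights, f) for f in features]
tau = kendall_tau_c(scores, hidden_quality)
print("learned weights:", weights)
print("Kendall tau-c vs. hidden quality:", tau)
```

The scorer never sees a direct rating, only which item of each pair won, yet its scores can still be compared against a reference ranking with tau-c. No particular output value is claimed for this toy run.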

2601.19178 2026-03-26 cs.AI

CollectiveKV: Decoupling and Sharing Collaborative Information in Sequential Recommendation

Jingyu Li, Zhaocheng Du, Qianhui Zhu, Kaiyuan Li, Zhicheng Zhang, Song-Li Wu, Chaolang Li, Pengwen Dai

Comments Accepted by ICLR 2026

2601.12527 2026-03-26 cs.CV cs.GR

Deep Feature Deformation Weights

Richard Liu, Itai Lang, Rana Hanocka

Comments Project page at https://threedle.github.io/dfd

2601.12410 2026-03-26 cs.AI

Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation

Dingyi Yang, Junqi Zhao, Xue Li, Ce Li, Boyang Li

Comments 23 pages, 11 figures

2601.10305 2026-03-26 cs.CV cs.AI

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Hengyu Shen, Tiancheng Gu, Bin Qin, Lan Wu, Yuling Wu, Shuo Tan, Zelong Sun, Jun Wang, Nan Wu, Xiang An, Weidong Cai, Ziyong Feng, Kaicheng Yang

Comments 19 pages, 11 figures, 7 tables

2601.09195 2026-03-26 cs.CL cs.AI

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, Yujiu Yang