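arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.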

2603.20194 2026-03-23 cs.CV

MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints

Yu Qi, Xinyi Xu, Ziyu Guo, Siyuan Ma, Renrui Zhang, Xinyan Chen, Ruichuan An, Ruofan Xing, Jiayi Zhang, Haojie Huang, Pheng-Ann Heng, Jonathan Tremblay, Lawson L. S. Wong

Abstract

Video generative models show emerging reasoning behaviors. For reliable deployment, it is essential that generated events remain causally consistent across frames, a property we define as reasoning coherence. To address the lack of reasoning coherence evaluation in the literature, we propose MME-CoF-Pro, a comprehensive video reasoning benchmark for assessing reasoning coherence in video models. Specifically, MME-CoF-Pro contains 303 samples across 16 categories, ranging from visual logical to scientific reasoning. It introduces Reasoning Score, an evaluation metric that assesses the necessary intermediate reasoning steps at the process level, and includes three evaluation settings, (a) no hint, (b) text hint, and (c) visual hint, enabling a controlled investigation into the underlying mechanisms of reasoning hint guidance. Evaluation results on 7 open- and closed-source video models reveal several insights: (1) Video generative models exhibit weak reasoning coherence, decoupled from generation quality. (2) Text hints boost apparent correctness but often cause inconsistency and hallucinated reasoning. (3) Visual hints benefit structured perceptual tasks but struggle with fine-grained perception. Website: https://video-reasoning-coherence.github.io/

2603.20193 2026-03-23 cs.CV cs.AI cs.LG

From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering

Xinyi Shang, Yi Tang, Jiacheng Cui, Ahmed Elhagry, Salwa K. Al Khatib, Sondos Mahmoud Bsharat, Jiacheng Liu, Xiaohan Zhao, Jing-Hao Xue, Hao Li, Salman Khan, Zhiqiang Shen

Comments Code and data at: https://github.com/VILA-Lab/PIXAR (Accepted in CVPR 2026 Findings, but not opted in)

2603.20192 2026-03-23 cs.CV cs.AI

LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Jiazheng Xing, Fei Du, Hangjie Yuan, Pengwei Liu, Hongbin Xu, Hai Ci, Ruigang Niu, Weihua Chen, Fan Wang, Yong Liu

Comments ICLR 2026 Camera Ready Version. Code and Models: https://jiazheng-xing.github.io/lumosx-home/

2603.20191 2026-03-23 cs.CV

Deterministic Mode Proposals: An Efficient Alternative to Generative Sampling for Ambiguous Segmentation

Sebastian Gerard, Josephine Sullivan

2603.20188 2026-03-23 cs.CV

Wildfire Spread Scenarios: Increasing Sample Diversity of Segmentation Diffusion Models with Training-Free Methods

Sebastian Gerard, Josephine Sullivan

Comments Accepted at NLDL 2026. This version contains small corrections compared to the initial publication, see appendix for details

2603.20186 2026-03-23 cs.CV

Improving Image-to-Image Translation via a Rectified Flow Reformulation

Satoshi Iizuka, Shun Okamoto, Kazuhiro Fukui

2603.20185 2026-03-23 cs.CV cs.AI cs.CL

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking

Jingyang Lin, Jialian Wu, Jiang Liu, Ximeng Sun, Ze Wang, Xiaodong Yu, Jiebo Luo, Zicheng Liu, Emad Barsoum

Comments Accepted at CVPR 2026

2603.20184 2026-03-23 cs.LG stat.ML

Kolmogorov-Arnold causal generative models

Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Comments 14 pages, 8 figures, 3 tables, 5 algorithms, preprint

2603.20174 2026-03-23 cs.CV

TinyML Enhances CubeSat Mission Capabilities

Luigi Capogrosso, Michele Magno

Comments Accepted at the 17th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS) 2026

2603.20170 2026-03-23 cs.AI

Learning Dynamic Belief Graphs for Theory-of-mind Reasoning

Ruxiao Chen, Xilei Zhao, Thomas J. Cova, Frank A. Drews, Susu Xu

2603.20169 2026-03-23 cs.CV cs.MM

EgoForge: Goal-Directed Egocentric World Simulator

Yifan Shen, Jiateng Liu, Xinzhuo Li, Yuanzhe Liu, Bingxuan Li, Houze Yang, Wenqi Jia, Yijiang Li, Tianjiao Yu, James Matthew Rehg, Xu Cao, Ismini Lourentzou

2603.20165 2026-03-23 cs.SD eess.AS

Audio Avatar Fingerprinting: An Approach for Authorized Use of Voice Cloning in the Era of Synthetic Audio

Candice R. Gerstner

2603.20164 2026-03-23 cs.RO cs.AI

The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning

Jiyu Lim, Youngwoo Yoon, Kwanghyun Park

Comments Accepted to ICRA 2026. 8 pages, 9 figures, Project page: https://limjiyu99.github.io/inner-critic/

2603.20162 2026-03-23 cs.CL

Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models

Sai Koneru, Elphin Joe, Christine Kirchhoff, Jian Wu, Sarah Rajtmajer

2603.20161 2026-03-23 cs.CL cs.AI cs.LG

Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models

Qi Cao, Andrew Gambardella, Takeshi Kojima, Yutaka Matsuo, Yusuke Iwasawa

Comments EACL 2026

2603.20155 2026-03-23 cs.LG cs.CV stat.ML

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

2603.20149 2026-03-23 cs.CL cs.AI cs.LG

Enhancing Hyperspace Analogue to Language (HAL) Representations via Attention-Based Pooling for Text Classification

Ali Sakour, Zoalfekar Sakour

Comments 7 pages, 1 figure, 1 table

2603.20148 2026-03-23 cs.CV

Can Large Multimodal Models Inspect Buildings? A Hierarchical Benchmark for Structural Pathology Reasoning

Hui Zhong, Yichun Gao, Luyan Liu, Hai Yang, Wang Wang, Haowei Zhang, Xinhu Zheng

2603.20147 2026-03-23 cs.RO

AGILE: A Comprehensive Workflow for Humanoid Loco-Manipulation Learning

Huihua Zhao, Rafael Cathomen, Lionel Gulich, Wei Liu, Efe Arda Ongan, Michael Lin, Shalin Jain, Soha Pouya, Yan Chang

2603.20143 2026-03-23 cs.CV

Synergistic Perception and Generative Recomposition: A Multi-Agent Orchestration for Expert-Level Building Inspection

Hui Zhong, Yichun Gao, Luyan Liu, Xusen Guo, Zhaonian Kuang, Qiming Zhang, Xinhu Zheng

2603.20132 2026-03-23 cs.LG

Revisiting Gene Ontology Knowledge Discovery with Hierarchical Feature Selection and Virtual Study Group of AI Agents

Cen Wan, Alex A. Freitas

2603.20129 2026-03-23 cs.RO

KUKAloha: A General, Low-Cost, and Shared-Control based Teleoperation Framework for Construction Robot Arm

Yifan Xu, Qizhang Shen, Vineet Kamat, Carol Menassa

Comments 9 pages, 4 figures, 1 table

2603.20128 2026-03-23 cs.CV

Generalizable NGP-SR: Generalizable Neural Radiance Fields Super-Resolution via Neural Graph Primitives

Wanqi Yuan, Omkar Sharad Mayekar, Connor Pennington, Nianyi Li

2603.20121 2026-03-23 cs.RO

Not an Obstacle for Dog, but a Hazard for Human: A Co-Ego Navigation System for Guide Dog Robots

Ruiping Liu, Jingqi Zhang, Junwei Zheng, Yufan Chen, Peter Seungjune Lee, Di Wen, Kunyu Peng, Jiaming Zhang, Kailun Yang, Katja Mombaur, Rainer Stiefelhagen

2603.20116 2026-03-23 cs.CV cs.AI

Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning

Jiajie Li, Chenhui Xu, Meihuan Liu, Jinjun Xiong

2603.20115 2026-03-23 cs.LG q-bio.BM q-bio.QM

Conditioning Protein Generation via Hopfield Pattern Multiplicity

Jeffrey D. Varner

Abstract

Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored sequences equally and cannot direct generation toward a functional subset of interest. We show that a single scalar parameter, added as a bias to the sampler's attention logits, continuously shifts generation from the full family toward a user-specified subset, with no retraining and no change to the model architecture. A practitioner supplies a small set of sequences (for example, hits from a binding screen) and a multiplicity ratio that controls how strongly generation favors them. The method is agnostic to what the subset represents: binding, stability, specificity, or any other property. We find that the conditioning is exact at the level of the sampler's internal representation, but that the decoded sequence phenotype can fall short because the dimensionality reduction used to encode sequences does not always preserve the residue-level variation that defines the functional split. We term this discrepancy the calibration gap and show that it is predicted by a simple geometric measure of how well the encoding separates the functional subset from the rest of the family. Experiments on five Pfam families (Kunitz, SH3, WW, Homeobox, and Forkhead domains) confirm the monotonic relationship between separation and gap across a fourfold range of geometries. Applied to omega-conotoxin peptides targeting a calcium channel involved in pain signaling, curated seeding from 23 characterized binders produces over a thousand candidates that preserve the primary pharmacophore and all experimentally identified binding determinants. These results show that stochastic attention enables practitioners to expand a handful of experimentally characterized sequences into diverse candidate libraries without retraining a generative model.
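The scalar bias on attention logits described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the bias, so this example assumes it is the logarithm of the multiplicity ratio, which follows from a standard softmax identity: storing a pattern r times is equivalent to adding log r to its logit. The function names, the toy patterns, and the parameter `beta` are hypothetical and are not taken from the paper.

```python
import math

def attention_weights(query, patterns, beta, in_subset, rho):
    """Softmax attention over stored patterns with a log-multiplicity bias.

    query:     list of floats
    patterns:  list of equal-length lists of floats (the stored memories)
    in_subset: list of bools marking the favored patterns
    rho:       multiplicity ratio (> 0); rho = 1 recovers unbiased attention
    Assumption: the bias is log(rho); the paper's exact form is not stated.
    """
    bias = math.log(rho)
    logits = [
        beta * sum(p_i * q_i for p_i, q_i in zip(p, query)) + (bias if s else 0.0)
        for p, s in zip(patterns, in_subset)
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def subset_mass(weights, in_subset):
    """Total attention weight assigned to the favored subset."""
    return sum(w for w, s in zip(weights, in_subset) if s)

# Toy memory of three 2-D patterns; only the second is in the favored subset.
patterns = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
in_subset = [False, True, False]
query = [0.5, 0.5]

for rho in (1, 2, 8):
    w = attention_weights(query, patterns, 1.0, in_subset, rho)
    print(rho, round(subset_mass(w, in_subset), 3))
```

Under this assumption, raising `rho` shifts attention mass toward the marked patterns without changing the stored patterns or the query, which mirrors the abstract's claim that no retraining or architectural change is required.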

2603.20111 2026-03-23 cs.LG cs.AI

Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture -- Bridging Predictive and Generative Self-Supervised Learning

Moritz Gögl, Christopher Yau

2603.20109 2026-03-23 cs.LG cs.IT math.IT

GO-GenZip: Goal-Oriented Generative Sampling and Hybrid Compression

Pietro Talli, Qi Liao, Alessandro Lieto, Parijat Bhattacharjee, Federico Chiariotti, Andrea Zanella

2603.20108 2026-03-23 cs.LG cs.CR

Trojan horse hunt in deep forecasting models: Insights from the European Space Agency competition

Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Artur Janicki, Przemysław Biecek, Ambros Marzetta, Atul Pande, Lalit Chandra Routhu, Swapnil Srivastava, Evridiki Ntagiou

Comments 43 pages, 18 figures

2603.20105 2026-03-23 cs.LG cs.AI

The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus

Amartya Roy, Rasul Tutunov, Xiaotong Ji, Matthieu Zimmer, Haitham Bou-Ammar