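arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.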

2604.15312 2026-04-17 cs.CV

Bidirectional Cross-Modal Prompting for Event-Frame Asymmetric Stereo

Ninghui Xu, Fabio Tosi, Lihui Wang, Jiawei Han, Luca Bartolomei, Zhiting Yao, Matteo Poggi, Stefano Mattoccia

Comments CVPR 2026. Code URL: https://github.com/xnh97/Bi-CMPStereo

Abstract

Conventional frame-based cameras capture rich contextual information but suffer from limited temporal resolution and motion blur in dynamic scenes. Event cameras offer an alternative visual representation with higher dynamic range free from such limitations. The complementary characteristics of the two modalities make event-frame asymmetric stereo promising for reliable 3D perception under fast motion and challenging illumination. However, the modality gap often leads to marginalization of domain-specific cues essential for cross-modal stereo matching. In this paper, we introduce Bi-CMPStereo, a novel bidirectional cross-modal prompting framework that fully exploits semantic and structural features from both domains for robust matching. Our approach learns finely aligned stereo representations within a target canonical space and integrates complementary representations by projecting each modality into both event and frame domains. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in accuracy and generalization.

2604.15311 2026-04-17 cs.CV

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng, Liang Zheng

Comments Accepted by CVPR 2026. Project page: https://rockeycoss.github.io/leapalign/

2604.15309 2026-04-17 cs.CV cs.AI cs.CL

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo

2604.15308 2026-04-17 cs.CV

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Hao Gao, Shaoyu Chen, Yifan Zhu, Yuehao Song, Wenyu Liu, Qian Zhang, Xinggang Wang

Comments Project page: https://hgao-cv.github.io/RAD-2

2604.15307 2026-04-17 quant-ph cs.IT math.IT

Heuristic Search for Minimum-Distance Upper-Bound Witnesses in Quantum APM-LDPC Codes

Kenta Kasai

2604.15306 2026-04-17 cs.AI cs.LG

Generalization in LLM Problem Solving: The Case of the Shortest Path

Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri

2604.15302 2026-04-17 cs.AI cs.CL cs.LG

Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

Manan Gupta, Dhruv Kumar

Comments Under Review

2604.15299 2026-04-17 cs.CV

AnimationBench: Are Video Models Good at Character-Centric Animation?

Leyi Wu, Pengjun Fang, Kai Sun, Yazhou Xing, Yinwei Wu, Songsong Wang, Ziqi Huang, Dan Zhou, Yingqing He, Ying-Cong Chen, Qifeng Chen

Comments Project Page: https://animationbench.github.io Code: https://github.com/VideoVerses/AnimationBench

2604.15298 2026-04-17 quant-ph cs.DS

Super-Constant Weight Dicke States in Constant Depth Without Fanout

Lucas Gretta, Meghal Gupta, Malvika Raj Joshi

2604.15295 2026-04-17 cs.IT math.IT

Reed--Muller Codes Achieve the Symmetric Capacity on Finite-State Channels

Henry D. Pfister, Navin Kashyap, Jean-Francois Chamberland, Galen Reeves

Comments 14 pages, extended version of paper accepted to ISIT 2026

2604.15294 2026-04-17 cs.AI

How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

Zhen Yang, Ping Jian, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu

Comments Published as a main-conference paper at The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

2604.15291 2026-04-17 cs.CV cs.AI

AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto

2604.15289 2026-04-17 cs.RO

Abstract Sim2Real through Approximate Information States

Yunfu Deng, Yuhao Li, Josiah P. Hanna

2604.15285 2026-04-17 stat.ML cs.LG math.ST stat.TH

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

Víctor Soto-Larrosa, Nuria Torrado, Edmundo J. Huertas

2604.15282 2026-04-17 cs.IT math.IT

Bandwidth Cost of Locally Repairable Convertible Codes in the Global Merge Regime

Saransh Chopra, Shubhransh Singhvi, K. V. Rashmi

Comments This is an extended version of an IEEE ISIT 2026 paper with the same title

2604.15281 2026-04-17 cs.CV cs.RO

R3D: Revisiting 3D Policy Learning

Zhengdong Hong, Shenrui Wu, Haozhe Cui, Boyi Zhao, Ran Ji, Yiyang He, Hangxing Zhang, Zundong Ke, Jun Wang, Guofeng Zhang, Jiayuan Gu

2604.15280 2026-04-17 cs.CV cs.AI

Why Do Vision Language Models Struggle To Recognize Human Emotions?

Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh

Abstract

Understanding emotions is a fundamental ability for intelligent systems to be able to interact with humans. Vision-language models (VLMs) have made tremendous progress in the last few years for many visual tasks, potentially offering a promising solution for understanding emotions. However, it is surprising that even the most sophisticated contemporary VLMs struggle to recognize human emotions or to outperform even specialized vision-only classifiers. In this paper we ask the question "Why do VLMs struggle to recognize human emotions?", and observe that the inherently continuous and dynamic task of facial expression recognition (DFER) exposes two critical VLM vulnerabilities. First, emotion datasets are naturally long-tailed, and the web-scale data used to pre-train VLMs exacerbates this head-class bias, causing them to systematically collapse rare, under-represented emotions into common categories. We propose alternative sampling strategies that prevent favoring common concepts. Second, temporal information is critical for understanding emotions. However, VLMs are unable to represent temporal information over dense frame sequences, as they are limited by context size and the number of tokens that can fit in memory, which poses a clear challenge for emotion recognition. We demonstrate that the sparse temporal sampling strategy used in VLMs is inherently misaligned with the fleeting nature of micro-expressions (0.25-0.5 seconds), which are often the most critical affective signal. As a diagnostic probe, we propose a multi-stage context enrichment strategy that utilizes the information from "in-between" frames by first converting them into natural language summaries. This enriched textual context is provided as input to the VLM alongside sparse keyframes, preventing attentional dilution from excessive visual data while preserving the emotional trajectory.
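
The mismatch between sparse frame sampling and brief expressions described in this abstract can be illustrated with a little arithmetic. The sketch below is not the authors' method or evaluation setup: the function names, the midpoint sampling rule, the 16-second clip, and the frame budgets are all hypothetical choices made for illustration. It counts how many uniformly sampled frames land inside a short expression window.

```python
def uniform_frame_times(clip_seconds: float, num_frames: int) -> list[float]:
    """Timestamps of frames taken at the midpoints of equal-length segments.

    Midpoint sampling is an assumption for this sketch; real pipelines may
    sample differently.
    """
    step = clip_seconds / num_frames
    return [(i + 0.5) * step for i in range(num_frames)]


def frames_inside_event(
    clip_seconds: float,
    num_frames: int,
    event_start: float,
    event_duration: float,
) -> int:
    """Count sampled frames that fall in [event_start, event_start + event_duration)."""
    event_end = event_start + event_duration
    times = uniform_frame_times(clip_seconds, num_frames)
    return sum(1 for t in times if event_start <= t < event_end)


if __name__ == "__main__":
    # Hypothetical 16-second clip with a 0.5-second expression starting at 3.5 s.
    for budget in (8, 64):
        hits = frames_inside_event(16.0, budget, event_start=3.5, event_duration=0.5)
        print(f"{budget} frames -> {hits} inside the expression window")
```

With these hypothetical numbers, an 8-frame budget samples every 2 seconds and misses the expression entirely, while a 64-frame budget samples every 0.25 seconds and places two frames inside it.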

2604.15279 2026-04-17 cs.DC

Wave-Based Dispatch for Circuit Cutting in Hybrid HPC--Quantum Systems

Ricard S. García-Raigada, Josep Jorba, Sergio Iserte

Comments 18 pages

2604.15278 2026-04-17 cs.SD eess.AS

A Manual Bar-by-Bar Tempo Measurement Protocol for Polyphonic Chamber Music Recordings: Design, Validation, and Application to Beethoven's Piano and Cello Sonatas

Ignasi Sole

2604.15273 2026-04-17 cs.LG quant-ph

How Embeddings Shape Graph Neural Networks: Classical vs Quantum-Oriented Node Representations

Nouhaila Innan, Antonello Rosato, Alberto Marchisio, Muhammad Shafique

Comments 6 pages. Accepted at IJCNN 2026

2604.15272 2026-04-17 cs.PL cs.AI cs.LG

Prism: Symbolic Superoptimization of Tensor Programs

Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia

2604.15270 2026-04-17 cs.SE

Enhancing Large Language Models with Retrieval Augmented Generation for Software Testing and Inspection Automation

Zoe Fingleton, Nazanin Siavash, Armin Moin

2604.15269 2026-04-17 quant-ph cs.LG math.ST stat.TH

Cloning is as Hard as Learning for Stabilizer States

Nikhil Bansal, Matthias C. Caro, Gaurav Mahajan

Comments 10 + 33 + 8 pages

2604.15267 2026-04-17 cs.GT cs.AI cs.CL cs.CY cs.MA

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin

Comments 65 pages, 38 Figures, 8 Tables, 17 Listings

2604.15266 2026-04-17 cs.LO cs.PL

Simplifying Safety Proofs with Forward-Backward Reasoning and Prophecy

Eden Frenkel, Kenneth L. McMillan, Oded Padon, Sharon Shoham

2604.15252 2026-04-17 eess.SY cs.SY math.OC

Tube-Based Robust Data-Driven Predictive Control

Chi Wang, David Angeli

Comments 16 pages, 5 figures

2604.15248 2026-04-17 quant-ph cs.CC

IQP circuits for 2-Forrelation

Quentin Buzet, André Chailloux

Comments 27 pages

2604.15247 2026-04-17 cs.CG

Orthogonal Strip Partitioning of Polygons: Lattice-Theoretic Algorithms and Lower Bounds

Jaehoon Chung

Abstract

We study a variant of a polygon partition problem, introduced by Chung, Iwama, Liao, and Ahn [ISAAC'25]. Given orthogonal unit vectors $\mathbf{u},\mathbf{v}\in \mathbb{R}^2$ and a polygon $P$ with $n$ vertices, we partition $P$ into connected pieces by cuts parallel to $\mathbf{v}$ such that each resulting subpolygon has width at most one in direction $\mathbf{u}$. We consider the value version, which asks for the minimum number of strips, and the reporting version, which outputs a compact encoding of the cuts in an optimal strip partition. We give efficient algorithms and lower bounds for both versions on three classes of polygons of increasing generality: convex, simple, and self-overlapping. For convex polygons, we solve the value version in $O(\log n)$ time and the reporting version in $O\!\left(h \log\left(1 + \frac{n}{h}\right)\right)$ time, where $h$ is the width of $P$ in direction $\mathbf{u}$. We prove matching lower bounds in the decision-tree model, showing that the reporting algorithm is input-sensitive optimal with respect to $h$. For simple polygons, we present $O(n \log n)$-time, $O(n)$-space algorithms for both versions and prove an $\Omega(n)$ lower bound. For self-overlapping polygons, we extend the approach for simple polygons to obtain $O(n \log n)$-time, $O(n)$-space algorithms for both versions, and we prove a matching $\Omega(n \log n)$ lower bound in the algebraic computation-tree model via a reduction from the $\delta$-closeness problem. Our approach relies on a lattice-theoretic formulation of the problem. We represent strip partitions as antichains of intervals in the Clarke--Cormack--Burkowski lattice, originally developed for minimal-interval semantics in information retrieval. Within this lattice framework, we design a dynamic programming algorithm that uses the lattice operations of meet and join.
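
For intuition about the value version on convex polygons, here is a minimal sketch. It rests on an assumption not stated in the abstract: that for a convex polygon the minimum number of strips equals the extent $h$ of $P$ in direction $\mathbf{u}$, rounded up. It also uses a plain linear scan over the vertices rather than the $O(\log n)$ algorithm the abstract reports, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def extent_in_direction(polygon: list[Point], u: Point) -> float:
    """Extent h of the polygon along unit vector u, via a linear scan of vertices."""
    projections = [x * u[0] + y * u[1] for x, y in polygon]
    return max(projections) - min(projections)


def min_strips_convex(polygon: list[Point], u: Point) -> int:
    """Assumed minimum number of strips of width at most one for a convex polygon.

    Every line parallel to v meets a convex polygon in a single segment, so
    k - 1 cuts produce exactly k pieces, and k must be at least h.
    """
    h = extent_in_direction(polygon, u)
    return max(1, math.ceil(h))


if __name__ == "__main__":
    rectangle = [(0.0, 0.0), (3.0, 0.0), (3.0, 2.0), (0.0, 2.0)]
    print(min_strips_convex(rectangle, (1.0, 0.0)))  # extent 3 along the x-axis
    print(min_strips_convex(rectangle, (0.0, 1.0)))  # extent 2 along the y-axis
```

The sketch applies only to the convex case. Simple and self-overlapping polygons need the lattice-based algorithms the abstract describes, because a single cut line can cross such a polygon in several separate segments.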

2604.15244 2026-04-17 cs.CL

From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

Kiran Purohit, Ramasuri Narayanam, Soumyabrata Pal

2604.15242 2026-04-17 cs.LG

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

Come Fiegel, Pierre Menard, Tadashi Kozuno, Michal Valko, Vianney Perchet