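arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.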

2604.15994 2026-04-20 cs.AI

ReactBench: A Benchmark for Topological Reasoning in MLLMs on Chemical Reaction Diagrams

Qiang Xu, Shengyuan Bai, Yu Wang, He Cao, Leqing Chen, Yuanyuan Liu, Bin Feng, Zijing Liu, Yu Li

Abstract

Multimodal Large Language Models (MLLMs) excel at recognizing individual visual elements and reasoning over simple linear diagrams. However, when faced with complex topological structures involving branching paths, converging flows, and cyclic dependencies, their reasoning capabilities degrade sharply, even on tasks as basic as counting endpoints. Existing benchmarks fail to probe this gap, focusing on semantic comprehension rather than structural reasoning. We introduce ReactBench, a benchmark that reveals fundamental limitations in structural reasoning through chemical reaction diagrams. These real-world scientific diagrams offer an ideal testbed because they naturally span diverse structures from linear chains to cyclic graphs, while requiring both precise local recognition and coherent global reasoning. Our benchmark comprises 1,618 expert-annotated QA pairs across four hierarchical task dimensions. Extensive evaluation across 17 MLLMs reveals a significant performance gap exceeding 30% between anchor-based tasks and holistic structural reasoning tasks. Controlled ablations confirm this bottleneck lies in reasoning, not perception. These findings expose a fundamental deficit in structural understanding and establish directions for advancing visual reasoning.
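
To make the "counting endpoints" task mentioned in the abstract concrete, here is a minimal sketch. It assumes a reaction diagram can be represented as a directed edge list (reactant to product) and that an endpoint is a node with no outgoing edge. Both the representation and the helper name `count_endpoints` are illustrative assumptions, not ReactBench's annotation format or evaluation code.

```python
from collections import defaultdict


def count_endpoints(edges):
    """Count nodes that have no outgoing edge in a directed graph.

    `edges` is a list of (source, target) pairs. This is a hypothetical
    stand-in for a parsed reaction diagram.
    """
    out_degree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        out_degree[src] += 1
    return sum(1 for node in nodes if out_degree[node] == 0)


# Three toy topologies of the kinds the abstract lists.
linear = [("A", "B"), ("B", "C")]                             # A -> B -> C
branching = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]  # B splits into two paths
cyclic = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D")]     # cycle A-B-C with an exit to D

print(count_endpoints(linear), count_endpoints(branching), count_endpoints(cyclic))
# prints: 1 2 1
```

On an explicit graph the count is trivial. According to the abstract, the difficulty for MLLMs is reasoning over this structure when it is given only as an image.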

2604.15979 2026-04-20 cs.CV

MMGait: Towards Multi-Modal Gait Recognition

Chenye Wang, Qingyuan Cai, Saihui Hou, Aoqi Li, Yongzhen Huang

Comments CVPR 2026

2604.15977 2026-04-20 cs.LG

Impact of Nonlinear Power Amplifier on Massive MIMO: Machine Learning Prediction Under Realistic Radio Channel

Marcin Hoffmann, Paweł Kryszkiewicz

Comments Accepted for publication in IEEE Transactions on Vehicular Technology

Abstract

Massive MIMO (M-MIMO) is a crucial technology for increasing the spectral and energy efficiency of wireless networks. Most current works assume that M-MIMO arrays are equipped with a linear front end. However, ongoing efforts to make wireless networks more energy-efficient push the hardware to its limits, where nonlinear behavior appears. This is an especially common problem for multicarrier systems, e.g., OFDM used in 4G, 5G, and possibly also in 6G, which is characterized by a high Peak-to-Average Power Ratio. While the impact of a nonlinear Power Amplifier (PA) on an OFDM signal is well characterized, it is a relatively new topic for M-MIMO OFDM systems. Most recent works either neglect nonlinear effects or use simplified models suited to Rayleigh or LoS radio channel models. In this paper, we first theoretically characterize the nonlinear distortion in the M-MIMO system under commonly used radio channel models. Then, using 3D Ray Tracing (3D-RT) software, we demonstrate that these models are not very accurate. Instead, we propose two models: a statistical one and an ML-based one using 3D-RT results. The proposed statistical model uses the Generalized Extreme Value (GEV) distribution to model the Signal to Distortion Ratio (SDR) for victim users, who receive nonlinear distortion, e.g., as interference from neighboring cells. The proposed ML model aims to predict the SDR for a scheduled user (receiving nonlinear distortion along with the desired signal), based on the spatial characteristics of the radio channel and the operating point of each PA feeding the M-MIMO antenna array. The predicted SDR can then be used to perform PA-aware per-user power allocation. The results show about a 12% median gain in user throughput achieved by the proposed ML-based power allocation scheme over the state-of-the-art fixed operating point scheme.
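
For readers unfamiliar with the GEV distribution named in the abstract, here is a minimal sketch of its cumulative distribution function in the standard (location, scale, shape) parameterization. The function name `gev_cdf` and every parameter value below are illustrative assumptions. The abstract does not give fitted parameters, and this is not the authors' model.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the Generalized Extreme Value distribution.

    mu: location, sigma: scale (> 0), xi: shape.
    xi == 0 gives the Gumbel case; otherwise the support is bounded
    on one side, where 1 + xi * (x - mu) / sigma reaches zero.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical parameters for an SDR distribution in dB (not from the paper).
mu, sigma, xi = 20.0, 3.0, -0.2

# Probability that a victim user's SDR falls below 15 dB under these assumptions.
p_below_15 = gev_cdf(15.0, mu, sigma, xi)
print(round(p_below_15, 4))
```

With a fitted distribution of this form, the share of victim users whose SDR falls below a chosen threshold can be read directly from the CDF.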

2604.15972 2026-04-20 cs.AI cs.CL cs.MA

Weak-Link Optimization for Multi-Agent Reasoning and Collaboration

Haoyu Bian, Chaoning Zhang, Jiaquan Zhang, Xingyao Li, Yuanfang Guo, Wei Dong, Yang Yang

Comments 13 pages, 4 figures. Submitted to CAAI Transactions on Intelligence Technology

2604.15961 2026-04-20 cs.LG

Evaluating quality in synthetic data generation for large tabular health datasets

Jean-Baptiste Escudié, Benjamin Barnes, Stefan Meisegeier, Klaus Kraywinkel, Fabian Prasser, Nils Körber

2604.15948 2026-04-20 cs.CV

From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance

Jinhao Shen, Haoqian Du, Xulu Zhang, Xiao-Yong Wei, Qing Li

2604.15946 2026-04-20 cs.CV cs.RO

SENSE: Stereo OpEN Vocabulary SEmantic Segmentation

Thomas Campagnolo, Ezio Malis, Philippe Martinet, Gaétan Bahl

2604.15945 2026-04-20 cs.CL cs.LG

RAGognizer: Hallucination-Aware Fine-Tuning via Detection Head Integration

Fabian Ridder, Laurin Lessel, Malte Schilling

Comments accepted at IJCNN 2026

2604.15941 2026-04-20 cs.CV cs.GR

Neural Gabor Splatting: Enhanced Gaussian Splatting with Neural Gabor for High-frequency Surface Reconstruction

Haato Watanabe, Nobuyuki Umetani

Comments Accepted to CVPR 2026

2604.15940 2026-04-20 cs.LG stat.AP

(Weighted) Adaptive Radius Near Neighbor Search: Evaluation for WiFi Fingerprint-based Positioning

Khang Le, Joaquín Torres-Sospedra, Philipp Müller

Comments 11 pages, 2 figures, 2 tables, submitted to IPIN 2026

2604.15938 2026-04-20 cs.RO

VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation

Xinglei Yu, Zhenyang Liu, Shufeng Nan, Simo Wu, Yanwei Fu

2604.15923 2026-04-20 cs.SD cs.CV

Hierarchical Codec Diffusion for Video-to-Speech Generation

Jiaxin Ye, Gaoxiang Cong, Chenhui Wang, Xin-Cheng Wen, Zhaoyang Li, Boyuan Cao, Hongming Shan

Comments CVPR 2026

2604.15917 2026-04-20 cs.CV

Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions

Bo Zhao, Kairui Guo, Runnan Du, Haiyang Sun, Pengshan Wang, Huan Yang, Kun Gai, Yixin Cao, Wei Ji

Comments 9 pages

2604.15911 2026-04-20 cs.CV

Efficient Video Diffusion Models: Advancements and Challenges

Shitong Shao, Lichen Bai, Pengfei Wan, James Kwok, Zeke Xie

2604.15907 2026-04-20 cs.RO

A Reconfigurable Pneumatic Joint Enabling Localized Selective Stiffening and Shape Locking in Vine-Inspired Robots

Ayodele James Oyejide, Ustaz A. Yaqub, Samir Erturk, Eray A. Baran, Fabio Stroppa

Comments Original Article

2604.15903 2026-04-20 cs.CV

AeroDeshadow: Physics-Guided Shadow Synthesis and Penumbra-Aware Deshadowing for Aerospace Imagery

Wei Lu, Zi-Yang Bo, Fei-Fei Sang, Yi Liu, Xue Yang, Si-Bao Chen

Comments 13 pages, 12 figures

2604.15893 2026-04-20 cs.CV

PolarMAE: Efficient Fetal Ultrasound Pre-training via Semantic Screening and Polar-Guided Masking

Meng Lv, Yapeng Li, Hang Su, Juhua Liu, Bo Du

Comments 10 pages, 6 figures, 3 tables

2604.15890 2026-04-20 cs.RO

Robust Fleet Sizing for Multi-UAV Inspection Missions under Synchronized Replacement Demand

Vishal Ramesh, Antony Thomas

2604.15877 2026-04-20 cs.AI cs.CL cs.MA

Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents

Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He

2604.15875 2026-04-20 cs.CV

CLOTH-HUGS: Cloth Aware Human Gaussian Splatting

Sadia Mubashshira, Nazanin Amini, Kevin Desai

2604.15873 2026-04-20 cs.CL

How Hypocritical Is Your LLM judge? Listener-Speaker Asymmetries in the Pragmatic Competence of Large Language Models

Judith Sieker, Sina Zarrieß

Comments Accepted at ACL 2026 (Findings)

2604.15871 2026-04-20 cs.CV cs.AI

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

Lifan Jiang, Tianrun Wu, Yuhang Pei, Chenyang Wang, Boxi Wu, Deng Cai

2604.15866 2026-04-20 cs.CL cs.AI cs.LG

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

Siun Kim, Hyung-Jin Yoon

Comments 9 pages, 3 figures; Accepted to the ACL 2026 Main Conference

2604.15865 2026-04-20 cs.RO

DTEA: A Dual-Topology Elastic Actuator Enabling Real-Time Switching Between Series and Parallel Compliance

Vishal Ramesh, Aman Singh, Shishir Kolathaya

2604.15862 2026-04-20 cs.CV

Splats in Splats++: Robust and Generalizable 3D Gaussian Splatting Steganography

Yijia Guo, Wenkai Huang, Tong Hu, Gaolei Li, Yang Li, Yuxin Hong, Liwen Hu, Xitong Ling, Jianhua Li, Shengbo Chen, Tiejun Huang, Lei Ma

Abstract

3D Gaussian Splatting (3DGS) has recently redefined the paradigm of 3D reconstruction, striking an unprecedented balance between visual fidelity and computational efficiency. As its adoption proliferates, safeguarding the copyright of explicit 3DGS assets has become paramount. However, existing invisible message embedding frameworks struggle to reconcile secure and high-capacity data embedding with intrinsic asset utility, often disrupting the native rendering pipeline or exhibiting vulnerability to structural perturbations. In this work, we present Splats in Splats++, a unified and pipeline-agnostic steganography framework that seamlessly embeds high-capacity 3D/4D content directly within the native 3DGS representation. Grounded in a principled analysis of the frequency distribution of Spherical Harmonics (SH), we propose an importance-graded SH coefficient encryption scheme that achieves imperceptible embedding without compromising the original expressive power. To fundamentally resolve the geometric ambiguities that lead to message leakage, we introduce a Hash-Grid Guided Opacity Mapping mechanism. Coupled with a novel Gradient-Gated Opacity Consistency Loss, our formulation enforces a stringent spatial-attribute coupling between the original and hidden scenes, effectively projecting the discrete attribute mapping into a continuous, attack-resilient latent manifold. Extensive experiments demonstrate that our method substantially outperforms existing approaches, achieving up to 6.28 dB higher message fidelity, 3× faster rendering, and exceptional robustness against aggressive 3D-targeted structural attacks (e.g., GSPure). Furthermore, our framework exhibits remarkable versatility, generalizing seamlessly to 2D image embedding, 4D dynamic scene steganography, and diverse downstream tasks.

2604.15859 2026-04-20 cs.LG cs.AI

QuantSightBench: Evaluating LLM Quantitative Forecasting with Prediction Intervals

Jeremy Qin, Maksym Andriushchenko

2604.15857 2026-04-20 cs.CV

AHS: Adaptive Head Synthesis via Synthetic Data Augmentations

Taewoong Kang, Hyojin Jang, Sohyun Jeong, Seunggi Moon, Gihwi Kim, Hoon Jin Jung, Jaegul Choo

Comments CVPR 2026. Project page: https://keh0t0.github.io/AHS/

2604.15856 2026-04-20 cs.CV cs.AI

Robust Multispectral Semantic Segmentation under Missing or Full Modalities via Structured Latent Projection

Irem Ulku, Erdem Akagündüz, Ömer Özgür Tanrıöver

Comments 15 pages, 7 figures, 9 tables

Abstract

Multimodal remote sensing data provide complementary information for semantic segmentation, but in real-world deployments, some modalities may be unavailable due to sensor failures, acquisition issues, or challenging atmospheric conditions. Existing multimodal segmentation models typically address missing modalities by learning a shared representation across inputs. However, this approach can introduce a trade-off by compromising modality-specific complementary information and reducing performance when all modalities are available. In this paper, we tackle this limitation with CBC-SLP, a multimodal semantic segmentation model designed to preserve both modality-invariant and modality-specific information. Inspired by the theoretical results on modality alignment, which state that perfectly aligned multimodal representations can lead to sub-optimal performance in downstream prediction tasks, we propose a novel structured latent projection approach as an architectural inductive bias. Rather than enforcing this strategy through a loss term, we incorporate it directly into the architecture. In particular, to use the complementary information effectively while maintaining robustness under random modality dropout, we structure the latent representations into shared and modality-specific components and adaptively transfer them to the decoder according to the random modality availability mask. Extensive experiments on three multimodal remote sensing image sets demonstrate that CBC-SLP consistently outperforms state-of-the-art multimodal models across full and missing modality scenarios. Besides, we empirically demonstrate that the proposed strategy can recover the complementary information that may not be preserved in a shared representation. The code is available at https://github.com/iremulku/Multispectral-Semantic-Segmentation-via-Structured-Latent-Projection-CBC-SLP-.

2604.15854 2026-04-20 cs.RO

Limits of Lamarckian Evolution Under Pressure of Morphological Novelty

Jed R Muff, Karine Miras, A. E. Eiben

Comments 8 pages, 7 figures, Submitted to WCCI 2026

2604.15853 2026-04-20 cs.CV

Learning to Look before Learning to Like: Incorporating Human Visual Cognition into Aesthetic Quality Assessment

Liwen Yu, Chi Liu, Xiaotong Han, Congcong Zhu, Minghao Wang, Sheng Shen

Comments Accepted for Poster Presentation at CogSci 2026