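arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.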

2604.13618 2026-04-16 cs.CL cs.LG

C2: Scalable Rubric-Augmented Reward Modeling from Binary Preferences

Akira Kawabata, Saku Sugawara

Comments ACL 2026

Abstract

Rubric-augmented verification guides reward models with explicit evaluation criteria, yielding more reliable judgments than single-model verification. However, most existing methods require costly rubric annotations, limiting scalability. Moreover, we find that rubric generation is vulnerable to a failure of cooperation; low-quality rubrics actively mislead reward models rather than help. Inspired by the principle of cooperative communication, we propose Cooperative yet Critical reward modeling (C2), a framework that significantly improves reward model judgments by having the reward model critically collaborate with a rubric generator trained solely from binary preferences. In C2, we synthesize helpful and misleading rubric pairs by measuring how each rubric shifts the reward model toward or away from the correct preference. Using these contrastive pairs, we train a cooperative rubric generator to propose helpful rubrics, and a critical verifier to assess rubric validity before making its judgment, following only rubrics it deems helpful at inference time. C2 outperforms reasoning reward models trained on the same binary preferences, with gains of up to 6.5 points on RM-Bench and 6.0 points in length-controlled win rate on AlpacaEval 2.0. Without external rubric annotations, C2 enables an 8B reward model to match performance achieved with rubrics from a 4× larger model. Overall, our work demonstrates that eliciting deliberate cooperation in rubric-augmented verification makes reward models more trustworthy in a scalable way.
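
The rubric-labeling step described in this abstract can be illustrated with a small sketch. The abstract does not give the scoring rule, so everything concrete here is an assumption: the names `ScoredRubric`, `label_rubric`, and `contrastive_pairs` are hypothetical, the reward model is represented only by the probability it assigns to the correct preference, and the `margin` threshold is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoredRubric:
    """A candidate rubric plus the reward model's probability of the
    correct preference when conditioned on it (hypothetical structure)."""
    text: str
    p_correct_with: float


def label_rubric(p_without: float, p_with: float, margin: float = 0.05) -> str:
    """Label a rubric by how far it shifts the reward model toward or
    away from the correct preference. The margin is an assumption."""
    shift = p_with - p_without
    if shift > margin:
        return "helpful"
    if shift < -margin:
        return "misleading"
    return "neutral"


def contrastive_pairs(
    p_without: float, rubrics: List[ScoredRubric], margin: float = 0.05
) -> List[Tuple[str, str]]:
    """Pair every helpful rubric with every misleading one for the same prompt."""
    helpful = [r for r in rubrics
               if label_rubric(p_without, r.p_correct_with, margin) == "helpful"]
    misleading = [r for r in rubrics
                  if label_rubric(p_without, r.p_correct_with, margin) == "misleading"]
    return [(h.text, m.text) for h in helpful for m in misleading]


# Made-up numbers: the reward model is at 0.55 without any rubric.
candidates = [
    ScoredRubric("Check factual accuracy of each claim", 0.80),
    ScoredRubric("Prefer the longer answer", 0.30),
    ScoredRubric("Consider tone", 0.57),
]
pairs = contrastive_pairs(0.55, candidates)
```

Pairs built this way would then serve as training signal for the rubric generator and the verifier; how the paper actually constructs and uses them may differ from this sketch.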

2604.13610 2026-04-16 cs.CV

What Are We Really Measuring? Rethinking Dataset Bias in Web-Scale Natural Image Collections via Unsupervised Semantic Clustering

Amir Hossein Saleknia, Mohammad Sabokrou

DOI: 10.1016/j.neucom.2026.133679

Abstract

In computer vision, a prevailing method for quantifying dataset bias is to train a model to distinguish between datasets. High classification accuracy is then interpreted as evidence of meaningful semantic differences. This approach assumes that standard image augmentations successfully suppress low-level, non-semantic cues, and that any remaining performance must therefore reflect true semantic divergence. We demonstrate that this fundamental assumption is flawed within the domain of large-scale natural image collections. High classification accuracy is often driven by resolution-based artifacts, which are structural fingerprints arising from native image resolution distributions and interpolation effects during resizing. These artifacts form robust, dataset-specific signatures that persist despite conventional image corruptions. Through controlled experiments, we show that models achieve strong dataset classification even on non-semantic, procedurally generated images, proving their reliance on superficial cues. To address this issue, we revisit this decades-old idea of dataset separability, but not with supervised classification. Instead, we introduce an unsupervised approach that measures true semantic separability. Our framework directly assesses semantic similarity by clustering semantically-rich features from foundational vision models, deliberately bypassing supervised classification on dataset labels. When applied to major web-scale datasets, the primary focus of this work, the high separability reported by supervised methods largely vanishes, with clustering accuracy dropping to near-chance levels. This reveals that conventional classification-based evaluation systematically overstates semantic bias by an overwhelming margin.
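
To make the "clustering accuracy" in this abstract concrete, here is a minimal sketch of one common way to score unsupervised clusters against dataset-of-origin labels: try every one-to-one assignment of clusters to datasets and keep the best. The abstract does not say which matching rule the authors use, so this convention, the function name `clustering_accuracy`, and the toy data are all assumptions. Brute-force matching is only practical for a handful of datasets.

```python
from itertools import permutations
from typing import Hashable, Sequence


def clustering_accuracy(
    cluster_ids: Sequence[Hashable], dataset_ids: Sequence[Hashable]
) -> float:
    """Best-case accuracy over all one-to-one maps from clusters to datasets."""
    if len(cluster_ids) != len(dataset_ids) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")
    clusters = sorted(set(cluster_ids), key=repr)
    datasets = sorted(set(dataset_ids), key=repr)
    if len(clusters) > len(datasets):
        raise ValueError("this sketch assumes no more clusters than datasets")
    best_hits = 0
    # Enumerate every injective assignment of cluster -> dataset label.
    for perm in permutations(datasets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == d for c, d in zip(cluster_ids, dataset_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(dataset_ids)


# Toy data: two datasets, four images each.
origin = ["A", "A", "A", "A", "B", "B", "B", "B"]
separable = [0, 0, 0, 0, 1, 1, 1, 1]   # clusters follow the dataset of origin
mixed = [0, 1, 0, 1, 0, 1, 0, 1]       # clusters ignore the dataset of origin

acc_separable = clustering_accuracy(separable, origin)
acc_mixed = clustering_accuracy(mixed, origin)
```

With two equally sized datasets, chance level under this convention is 0.5, which is what the `mixed` assignment yields; the `separable` assignment scores 1.0.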

2604.13609 2026-04-16 cs.LG cs.AI

Golden Handcuffs make safer AI agents

Aram Ebtekar, Michael K. Cohen

Comments 26 pages, preliminary version

2604.13608 2026-04-16 cs.LG cs.AI

Design Space Exploration of Hybrid Quantum Neural Networks for Chronic Kidney Disease

Muhammad Kashif, Hanzalah Mohamed Siraj, Nouhaila Innan, Alberto Marchisio, Muhammad Shafique

2604.13602 2026-04-16 cs.LG

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Xiaohua Wang, Muzhao Tian, Yuqi Zeng, Zisu Huang, Jiakang Yuan, Bowen Chen, Jingwen Xu, Mingbo Zhou, Wenhao Liu, Muling Wu, Zhengkang Guo, Qi Qian, Yifei Wang, Feiran Zhang, Ruicheng Yin, Shihan Dou, Changze Lv, Tao Chen, Kaitao Song, Xu Tan, Tao Gui, Xiaoqing Zheng, Xuanjing Huang

Comments 42 pages, 5 figures, 2 tables

Abstract

Reinforcement Learning from Human Feedback (RLHF) and related alignment paradigms have become central to steering large language models (LLMs) and multimodal large language models (MLLMs) toward human-preferred behaviors. However, these approaches introduce a systemic vulnerability: reward hacking, where models exploit imperfections in learned reward signals to maximize proxy objectives without fulfilling true task intent. As models scale and optimization intensifies, such exploitation manifests as verbosity bias, sycophancy, hallucinated justification, benchmark overfitting, and, in multimodal settings, perception-reasoning decoupling and evaluator manipulation. Recent evidence further suggests that seemingly benign shortcut behaviors can generalize into broader forms of misalignment, including deception and strategic gaming of oversight mechanisms. In this survey, we propose the Proxy Compression Hypothesis (PCH) as a unifying framework for understanding reward hacking. We formalize reward hacking as an emergent consequence of optimizing expressive policies against compressed reward representations of high-dimensional human objectives. Under this view, reward hacking arises from the interaction of objective compression, optimization amplification, and evaluator-policy co-adaptation. This perspective unifies empirical phenomena across RLHF, RLAIF, and RLVR regimes, and explains how local shortcut learning can generalize into broader forms of misalignment, including deception and strategic manipulation of oversight mechanisms. We further organize detection and mitigation strategies according to how they intervene on compression, amplification, or co-adaptation dynamics. By framing reward hacking as a structural instability of proxy-based alignment under scale, we highlight open challenges in scalable oversight, multimodal grounding, and agentic autonomy.

2604.13598 2026-04-16 cs.LG stat.ME

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

Qin Zhou, Guoyan Liang, Qianyi Yang, Jingyuan Chen, Sai Wu, Chang Yao, Zhe Wang

Comments 13 pages, 4 figures, ACL2026-main

2604.13586 2026-04-16 cs.CV

Efficient Multi-View 3D Object Detection by Dynamic Token Selection and Fine-Tuning

Danish Nazir, Antoine Hanna-Asaad, Lucas Görnhardt, Jan Piewek, Thorsten Bagdonat, Tim Fingscheidt

2604.13584 2026-04-16 cs.RO

UNRIO: Uncertainty-Aware Velocity Learning for Radar-Inertial Odometry

Jui-Te Huang, Tinashu Huang, Anthony Rowe, Michael Kaess

2604.13581 2026-04-16 cs.CV

SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance

Qi Xia, Peishan Cong, Ziyi Wang, Yujing Sun, Qin Sun, Xinge Zhu, Mao Ye, Ruigang Yang, Yuexin Ma

2604.13579 2026-04-16 cs.CL

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Jiahang Lin, Kai Hu, Binghai Wang, Yuhao Zhou, Zhiheng Xi, Honglin Guo, Shichun Liu, Junzhe Wang, Shihan Dou, Enyu Zhou, Hang Yan, Zhenhua Han, Tao Gui, Qi Zhang, Xuanjing Huang

2604.13568 2026-04-16 cs.CV

ZoomSpec: A Physics-Guided Coarse-to-Fine Framework for Wideband Spectrum Sensing

Zhentao Yang, Yixiang Luomei, Zhuoyang Liu, Zhenyu Liu, Feng Xu

Comments 14 pages, 8 figures, 5 tables

2604.13567 2026-04-16 cs.SD cs.AI

Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals

Mahmoud Fakhry, Abeer FathAllah Brery

2604.13565 2026-04-16 cs.CV cs.AI

UHR-BAT: Budget-Aware Token Compression Vision-Language model for Ultra-High-Resolution Remote Sensing

Yunkai Dang, Minxin Dai, Yuekun Yang, Zhangnan Li, Wenbin Li, Feng Miao, Yang Gao

2604.13561 2026-04-16 cs.CV cs.AI

CLIP Architecture for Abdominal CT Image-Text Alignment and Zero-Shot Learning: Investigating Batch Composition and Data Scaling

Shivika, Kartik Bose, Pankaj Gupta

2604.13560 2026-04-16 cs.LG cs.ET quant-ph

Parameter-efficient Quantum Multi-task Learning

Hevish Cowlessur, Chandra Thapa, Tansu Alpcan, Seyit Camtepe

Abstract

Multi-task learning (MTL) improves generalization and data efficiency by jointly learning related tasks through shared representations. In the widely used hard-parameter-sharing setting, a shared backbone is combined with task-specific prediction heads. However, task-specific parameters can grow rapidly with the number of tasks. Therefore, designing multi-task heads that preserve task specialization while improving parameter efficiency remains a key challenge. In Quantum Machine Learning (QML), variational quantum circuits (VQCs) provide a compact mechanism for mapping classical data to quantum states residing in high-dimensional Hilbert spaces, enabling expressive representations within constrained parameter budgets. We propose a parameter-efficient quantum multi-task learning (QMTL) framework that replaces conventional task-specific linear heads with a fully quantum prediction head in a hybrid architecture. The model consists of a VQC with a shared, task-independent quantum encoding stage, followed by lightweight task-specific ansatz blocks enabling localized task adaptation while maintaining compact parameterization. Under a controlled and capacity-matched formulation where the shared representation dimension grows with the number of tasks, our parameter-scaling analysis demonstrates that a standard classical head exhibits quadratic growth, whereas the proposed quantum head parameter cost scales linearly. We evaluate QMTL on three multi-task benchmarks spanning natural language processing, medical imaging, and multimodal sarcasm detection, where we achieve performance comparable to, and in some cases exceeding, classical hard-parameter-sharing baselines while consistently outperforming existing hybrid quantum MTL models with substantially fewer head parameters. We further demonstrate QMTL's executability on noisy simulators and real quantum hardware, illustrating its feasibility.
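
The quadratic-versus-linear scaling claim in this abstract can be illustrated with a back-of-the-envelope parameter count. The abstract does not give the formulas, so the counts below are assumptions chosen only to reproduce the stated contrast: the shared representation dimension is taken as `dim_per_task * num_tasks`, each classical head is a single linear layer with bias, and each task-specific quantum ansatz block is assumed to have a fixed number of trainable parameters (`params_per_block`, a hypothetical value). The function names are likewise hypothetical.

```python
def classical_head_params(num_tasks: int, dim_per_task: int, num_classes: int) -> int:
    """Total parameters of one linear head per task (weights plus biases),
    assuming the shared dimension grows as dim_per_task * num_tasks."""
    shared_dim = dim_per_task * num_tasks
    per_head = shared_dim * num_classes + num_classes
    return num_tasks * per_head


def quantum_head_params(
    num_tasks: int, params_per_block: int, shared_params: int = 0
) -> int:
    """Total trainable parameters of the quantum head, assuming a fixed-size
    task-specific ansatz block per task (an assumption, not the paper's formula)."""
    return shared_params + num_tasks * params_per_block


# Illustrative settings only; none of these numbers come from the paper.
rows = [
    (t, classical_head_params(t, dim_per_task=16, num_classes=4),
        quantum_head_params(t, params_per_block=12))
    for t in (2, 4, 8, 16)
]
for tasks, classical, quantum in rows:
    print(f"tasks={tasks:2d}  classical={classical:5d}  quantum={quantum:3d}")
```

Under these assumptions, doubling the number of tasks roughly quadruples the classical count while only doubling the quantum count. The paper's actual circuit design, including how qubit count depends on the shared dimension, may change the constants or the exact growth rate.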

2604.13556 2026-04-16 cs.CL

YOCO++: Enhancing YOCO with KV Residual Connections for Efficient LLM Inference

You Wu, Ziheng Chen, Yizhen Zhang, Haoyi Wu, Chengting Yu, Yuchi Xu, Wenbo Su, Bo Zheng, Kewei Tu

2604.13555 2026-04-16 cs.CV cs.NI

AI Powered Image Analysis for Phishing Detection

K. Acharya, S. Ale, R. Kadel

Comments 8 pages, 3 figures

2604.13552 2026-04-16 cs.CL cs.AI

Training-Free Test-Time Contrastive Learning for Large Language Models

Kaiwen Zheng, Kai Zhou, Jinwu Hu, Te Gu, Mingkai Peng, Fei Liu

Comments Accepted by Findings ACL 2026

2604.13551 2026-04-16 cs.CL cs.IR

Debate to Align: Reliable Entity Alignment through Two-Stage Multi-Agent Debate

Cunda Wang, Ziying Ma, Po Hu, Weihua Wang, Feilong Bao

2604.13549 2026-04-16 cs.CV

Reconstruction of a 3D wireframe from a single line drawing via generative depth estimation

Elton Cao, Hod Lipson

2604.13546 2026-04-16 cs.LG

Learning Inference Concurrency in DynamicGate MLP Structural and Mathematical Justification

Yongil Choi

Comments 20 pages, 6 figures

2604.13542 2026-04-16 cs.RO cs.DC cs.SE

Self-adaptive Multi-Access Edge Architectures: A Robotics Case

Mahyar T Moghaddam, Joakim Leed, Anders Frandsen

2604.13540 2026-04-16 cs.CV cs.AI

Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding

Yibo Jiang, Tao Wu, Rui Jiang, Yehao Lu, Chaoxiang Cai, Zequn Qin, Xi Li

2604.13538 2026-04-16 cs.CL

Synthesizing Instruction-Tuning Datasets with Contrastive Decoding

Tatsuya Ichinose, Youmi Ma, Masanari Oi, Ryuto Koike, Naoaki Okazaki

Comments 24 pages, 7 figures

2604.13531 2026-04-16 cs.AI cs.LG

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

Renqi Chen, Zeyin Tao, Jianming Guo, Jing Wang, Zezhou Xu, Jingzhe Zhu, Qingqing Sun, Tianyi Zhang, Shuai Chen

2604.13530 2026-04-16 cs.RO

Stability Principle Underlying Passive Dynamic Walking of Rimless Wheel

Fumihiko Asano

Comments This is a corrected version of the 2012 IEEE CCA paper. A typographical error in Eq. (16) has been corrected.

2604.13521 2026-04-16 cs.LG cs.AI

C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions

Kenji Kubo, Shunsuke Kamiya, Masanori Koyama, Kohei Hayashi, Yusuke Iwasawa, Yutaka Matsuo

2604.13520 2026-04-16 cs.LG

LEGO-MOF: Equivariant Latent Manipulation for Editable, Generative, and Optimizable MOF Design

Chaoran Zhang, Guangyao Li, Dongxu Ji

Comments 36 pages including Supplementary Information, 10 figures in the main text and 12 figures/tables in the Supplementary Information

2604.13518 2026-04-16 cs.LG cs.AI

From Alignment to Prediction: A Study of Self-Supervised Learning and Predictive Representation Learning

Mintu Dutta, Ritesh Vyas, Mohendra Roy

Comments This article has been submitted to the 2026 International Conference on Applied Artificial Intelligence (2AI), Central University of Kashmir, India

2604.13515 2026-04-16 cs.LG cs.AI cs.LO

SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization

Xiaole Su, Kasey Zhang, Andy Lyu