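arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.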

2604.15764 2026-04-20 cs.LG cs.AI

When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Comments 6 pages, 1 figure, 7 tables, 1 algorithm

Abstract

Early-exit neural networks enable adaptive computation by allowing confident predictions to exit at intermediate layers, achieving 2-8$\times$ inference speedup. Despite widespread deployment, their generalization properties lack theoretical understanding -- a gap explicitly identified in recent surveys. This paper establishes a unified PAC-Bayesian framework for adaptive-depth networks. (1) Novel Entropy-Based Bounds: We prove the first generalization bounds depending on exit-depth entropy $H(D)$ and expected depth $\mathbb{E}[D]$ rather than maximum depth $K$, with sample complexity $\mathcal{O}((\mathbb{E}[D] \cdot d + H(D))/\epsilon^2)$. (2) Explicit Constructive Constants: Our analysis yields the leading coefficient $\sqrt{2\ln 2} \approx 1.177$ with complete derivation. (3) Provable Early-Exit Advantages: We establish sufficient conditions under which adaptive-depth networks strictly outperform fixed-depth counterparts. (4) Extension to Approximate Label Independence: We relax the label-independence assumption to $\epsilon$-approximate policies, broadening applicability to learned routing. (5) Comprehensive Validation: Experiments across 6 architectures on 7 benchmarks demonstrate tightness ratios of 1.52-3.87$\times$ (all $p < 0.001$) versus $>$100$\times$ for classical bounds. Bound-guided threshold selection matches validation-tuned performance within 0.1-0.3%.
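
The two quantities the abstract builds on, expected exit depth $\mathbb{E}[D]$ and exit-depth entropy $H(D)$, can be illustrated with a minimal sketch. It assumes a known probability of exiting at each of the $K$ exits and measures entropy in nats. The names `exit_probs`, `d`, and `eps` are hypothetical, and the abstract does not define $d$, so it is treated here as a generic per-layer complexity term. The proxy below only evaluates the expression inside the $\mathcal{O}(\cdot)$ with constants dropped; it is not the paper's bound.

```python
import math


def exit_depth_stats(exit_probs):
    """Return (expected depth, entropy in nats) for exits numbered 1..K.

    exit_probs[k-1] is the assumed probability that an input leaves at exit k.
    """
    if any(p < 0 for p in exit_probs):
        raise ValueError("exit probabilities must be non-negative")
    if abs(sum(exit_probs) - 1.0) > 1e-9:
        raise ValueError("exit probabilities must sum to 1")
    expected_depth = sum(k * p for k, p in enumerate(exit_probs, start=1))
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in exit_probs if p > 0)
    return expected_depth, entropy


def complexity_proxy(exit_probs, d, eps):
    """Evaluate (E[D] * d + H(D)) / eps^2, ignoring all constants."""
    expected_depth, entropy = exit_depth_stats(exit_probs)
    return (expected_depth * d + entropy) / eps ** 2


if __name__ == "__main__":
    early_heavy = [0.5, 0.25, 0.25]   # most inputs exit at the first exit
    fixed_depth = [0.0, 0.0, 1.0]     # every input uses the full depth K = 3
    print(exit_depth_stats(early_heavy))
    print(complexity_proxy(early_heavy, d=100, eps=0.1))
    print(complexity_proxy(fixed_depth, d=100, eps=0.1))
```

Under these assumptions, a policy that exits early most of the time has a smaller $\mathbb{E}[D] \cdot d$ term than a fixed-depth network, at the cost of a nonzero entropy term.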

2604.15760 2026-04-20 cs.AI cs.GT

KWBench: Measuring Unprompted Problem Recognition in Knowledge Work

Ankit Maloo

Comments 37 pages, 8 figures

Abstract

We introduce the first version of KWBench (Knowledge Work Bench), a benchmark for unprompted problem recognition in large language models: can an LLM identify a professional scenario before attempting to solve it? Existing frontier benchmarks have saturated, and most knowledge-work evaluations to date reduce to extraction or task completion against a specification. KWBench targets the step before that: recognizing the governing structure of the situation from raw inputs alone. The benchmark contains 223 tasks sourced from practitioners across acquisitions, contract negotiations, clinical pharmacy, organizational politics, fraud analysis, and incentive design. Each task encodes a formal game-theoretic pattern (principal-agent conflict, signaling, mechanism design failure, strategic omission, coalitional dynamics, strategic interdependence) and carries structured ground truth recording the expert reading of the situation and the anticipated failure modes. Models receive raw data and a task prompt with no indication of problem type. Scoring is a three-tier rubric gated by a mandatory conjunctive check. Mandatory criteria encode the predicted wrong paths. We evaluate 16 models. The best model passes on 27.9% of tasks. The top two models agree on only 31.7% of their passes. Among the top 8, 44 tasks are solved by exactly one model; routing across the top 8 covers 50.7% of the benchmark, nearly double the best single model. Conditional on passing, quality scores converge (approximately 83% across models); unconditional scores do not. The same models articulate the relevant game-theoretic concept correctly when asked, then fail to apply it unprompted. We release KWBench to shift how frontier models are evaluated on knowledge work, scoring them on whether they recognize the right problem from the situation alone, not only on how well they execute once the problem has been framed for them.
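
The routing figures in the abstract (best single model versus coverage across several models, and tasks solved by exactly one model) are set-union arithmetic, which the following sketch illustrates. The function name `coverage_stats`, the model labels, and the pass sets are hypothetical toy data, not KWBench results. "Routing" is assumed here to mean an oracle that always picks a model that passes, if any does.

```python
def coverage_stats(passes, n_tasks):
    """Summarize per-model pass sets.

    passes: dict mapping a model name to the set of task ids it passes.
    Returns (best single-model pass rate, oracle-routing coverage,
             number of tasks passed by exactly one model).
    """
    best_single = max(len(tasks) for tasks in passes.values()) / n_tasks
    covered = set().union(*passes.values())
    # Count how many models pass each covered task.
    solvers = {task: 0 for task in covered}
    for tasks in passes.values():
        for task in tasks:
            solvers[task] += 1
    solved_by_one = sum(1 for count in solvers.values() if count == 1)
    return best_single, len(covered) / n_tasks, solved_by_one


if __name__ == "__main__":
    toy_passes = {            # toy data for illustration only
        "model_a": {1, 2, 3},
        "model_b": {3, 4},
        "model_c": {5},
    }
    print(coverage_stats(toy_passes, n_tasks=10))  # (0.3, 0.5, 4)
```

In the toy data, the union covers more tasks than any single model because the models pass largely different tasks, which is the same effect the abstract reports for the top 8 models.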

2604.15757 2026-04-20 cs.LG

Multi-objective Reinforcement Learning With Augmented States Requires Rewards After Deployment

Peter Vamplew, Cameron Foale

2604.15756 2026-04-20 cs.CL cs.CV

TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models

Jinlun Ye, Jiang Liao, Runhe Lai, Xinhua Lu, Jiaxin Zhuang, Zhiyong Gan, Ruixuan Wang

Comments Accepted to CVPR 2026

2604.15750 2026-04-20 cs.LG cs.AI

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference

Xiang Xia, Wuyang Zhang, Jiazheng Liu, Cheng Yan, Yanyong Zhang

Abstract

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive language generation due to their potential for parallel decoding and global refinement of the entire sequence. To unlock this potential, DLM inference must carefully balance generation quality and decoding speed. Recent block-wise DLM decoding methods improve this trade-off by performing diffusion-based decoding sequentially in blocks. However, existing methods typically rely on fixed block schedules or current-step local signals to determine block boundaries, and use conservative confidence-based parallel decoding to avoid conflicts, limiting the quality-speed trade-off. In this paper, we argue that block-wise DLM inference requires more suitable signals for its two core decisions: cross-step signals for determining block boundaries, and token-level conflict signals for parallel decoding. Based on this view, we propose DepCap, a training-free framework for efficient block-wise DLM inference. Specifically, DepCap instantiates the cross-step signal as the influence of the last decoded block and uses it to adaptively determine how far the next block should extend, while identifying a conflict-free subset of tokens for safe parallel decoding within each block, enabling substantial inference acceleration with negligible quality degradation. DepCap is a plug-and-play method applicable to various DLMs, and compatible with existing KV-cache strategies for block-wise DLM. An information-theoretic analysis further suggests that the cumulative last-block influence on a candidate block is approximately additive across tokens, supporting the proposed block-partitioning criterion. Experimental results show that DepCap achieves favorable speed-quality trade-offs across multiple DLM backbones and reasoning and coding benchmarks, with up to 5.63$\times$ speedup without significant performance degradation.

2604.15742 2026-04-20 cs.LG hep-th stat.ML

Collective Kernel EFT for Pre-activation ResNets

Hidetoshi Kawase, Toshihiro Ota

Comments 20 pages

2604.15741 2026-04-20 cs.CL cs.AI

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

Ponhvoan Srey, Xiaobao Wu, Cong-Duy Nguyen, Anh Tuan Luu

Comments Accepted at ACL 2026 (Main Conference)

2604.15738 2026-04-20 cs.LG

Why Colors Make Clustering Harder: Global Integrality Gaps, the Price of Fairness, and Color-Coupled Algorithms in Chromatic Correlation Clustering

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

2604.15736 2026-04-20 cs.CV cs.CL

RefereeBench: Are Video MLLMs Ready to be Multi-Sport Referees

Yichen Xu, Yuanhang Liu, Chuhan Wang, Zihan Zhao, Jinghan Luo, Jianzhe Ma, Wenxuan Wang, Qin Jin

Comments Work in Progress

2604.15735 2026-04-20 cs.CV cs.AI

Sketch and Text Synergy: Fusing Structural Contours and Descriptive Attributes for Fine-Grained Image Retrieval

Siyuan Wang, Hanchen Gao, Guangming Zhu, Jiang Lu, Yiyue Ma, Tianci Wu, Jincai Huang, Liang Zhang

Comments Image Retrieval, Hand-drawn Sketch, Multi-stage Cross-modal Feature Alignment

2604.15729 2026-04-20 cs.CV cs.AI

MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis

Sicheng Chen, Chad Wong, Tianyi Zhang, Enhui Chai, Zeyu Liu, Fei Xia

Abstract

Whole Slide Image (WSI) analysis is pivotal in computational pathology, enabling cancer diagnosis by integrating morphological and architectural cues across magnifications. Multiple Instance Learning (MIL) serves as the standard framework for WSI analysis. Recently, Mamba has become a promising backbone for MIL, overtaking Transformers due to its efficiency and global context modeling capabilities originating from Natural Language Processing (NLP). However, existing Mamba-based MIL approaches face three critical challenges: (1) disruption of 2D spatial locality during 1D sequence flattening; (2) sub-optimal modeling of fine-grained local cellular structures; and (3) high memory peaks during inference on resource-constrained edge devices. Studies like MambaOut reveal that Mamba's SSM component is redundant for local feature extraction, where Gated CNNs suffice. Recognizing that WSI analysis demands both fine-grained local feature extraction akin to natural images, and global context modeling akin to NLP, we propose MambaBack, a novel hybrid architecture that harmonizes the strengths of Mamba and MambaOut. First, we propose the Hilbert sampling strategy to preserve the 2D spatial locality of tiles within 1D sequences, enhancing the model's spatial perception. Second, we design a hierarchical structure comprising a 1D Gated CNN block based on MambaOut to capture local cellular features, and a BiMamba2 block to aggregate global context, jointly enhancing multi-scale representation. Finally, we implement an asymmetric chunking design, allowing parallel processing during training and chunking-streaming accumulation during inference, minimizing peak memory usage for deployment. Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods. Source code and datasets are publicly available.
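
The idea of ordering 2D tiles along a Hilbert curve so that neighbors in the 1D sequence are also neighbors on the slide can be sketched with the classic Hilbert-curve index. This is a generic illustration on a square grid whose side is a power of two. The abstract does not give the details of the paper's Hilbert sampling strategy, so the function names `hilbert_index` and `hilbert_order` and the grid assumption are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of tile (x, y) along the Hilbert curve of an n x n grid.

    n must be a power of two; this is the standard iterative xy-to-index form.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_order(n):
    """All tiles of an n x n grid, sorted by their Hilbert-curve index."""
    tiles = [(x, y) for x in range(n) for y in range(n)]
    return sorted(tiles, key=lambda tile: hilbert_index(n, *tile))


if __name__ == "__main__":
    print(hilbert_order(2))  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Unlike row-major flattening, where the last tile of one row is followed by the first tile of the next, consecutive tiles in this ordering are always grid neighbors.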

2604.15727 2026-04-20 cs.AI cs.LG cs.LO

Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants

Sankalp Gilda, Shlok Gilda

Comments 10 pages + 3 pages references. Accepted as a poster at the ICLR 2026 Workshop for LLM Reasoning

2604.15726 2026-04-20 cs.AI

LLM Reasoning Is Latent, Not the Chain of Thought

Wenshuo Wang

2604.15725 2026-04-20 cs.LG cs.AI

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

Zehao Wang, Lanjun Wang

2604.15723 2026-04-20 cs.CV cs.AI

Diffusion Autoencoder for Unsupervised Artifact Restoration in Handheld Fundus Images

Mathumetha Palani, Kavya Puthumana, Ayantika Das, Ganapathy Krishnamurthi

Comments 5 pages, 2 figures, 1 table - IEEE ISBI 2025 conference

2604.15718 2026-04-20 cs.CV cs.AI cs.CR cs.DB cs.LG

NeuroLip: An Event-driven Spatiotemporal Learning Framework for Cross-Scene Lip-Motion-based Visual Speaker Recognition

Junguang Yao, Wenye Liu, Stjepan Picek, Yue Zheng

Abstract

Visual speaker recognition based on lip motion offers a silent, hands-free, and behavior-driven biometric solution that remains effective even when acoustic cues are unavailable. Compared to traditional methods that rely heavily on appearance-dependent representations, lip motion encodes subject-specific behavioral dynamics driven by consistent articulation patterns and muscle coordination, offering inherent stability across environmental changes. However, capturing these robust, fine-grained dynamics is challenging for conventional frame-based cameras due to motion blur and low dynamic range. To exploit the intrinsic stability of lip motion and address these sensing limitations, we propose NeuroLip, an event-based framework that captures fine-grained lip dynamics under a strict yet practical cross-scene protocol: training is performed under a single controlled condition, while recognition must generalize to unseen viewing and lighting conditions. NeuroLip features (1) a Temporal-aware Voxel Encoding module with adaptive event weighting, (2) a Structure-aware Spatial Enhancer that amplifies discriminative behavioral patterns by suppressing noise while preserving vertically structured motion information, and (3) a Polarity Consistency Regularization mechanism to retain motion-direction cues encoded in event polarities. To facilitate systematic evaluation, we introduce DVSpeaker, a comprehensive event-based lip-motion dataset comprising 50 subjects recorded under four distinct viewpoint and illumination scenarios. Extensive experiments demonstrate that NeuroLip achieves near-perfect matched-scene accuracy and robust cross-scene generalization, attaining over 71% accuracy on unseen viewpoints and nearly 76% under low-light conditions, outperforming representative existing methods by at least 8.54%. The dataset and code are publicly available at https://github.com/JiuZeongit/NeuroLip.

2604.15715 2026-04-20 cs.CL cs.AI

GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows

Jize Wang, Xuanxuan Liu, Yining Li, Songyang Zhang, Yijun Wang, Zifei Shan, Xinyi Le, Cailian Chen, Xinping Guan, Dacheng Tao

2604.15710 2026-04-20 cs.SD

VoxMind: An End-to-End Agentic Spoken Dialogue System

Tianle Liang, Yifu Chen, Shengpeng Ji, Yijun Chen, Zhiyang Jia, Jingyu Lu, Fan Zhuo, Xueyi Pu, Yangzhuo Li, Zhou Zhao

Comments Accepted to ACL 2026 Main Conference. Code and data available at https://github.com/MM-Speech/VoxMind

2604.15709 2026-04-20 cs.AI

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

Chenyi Huang, Haoting Zhang, Jingxu Xu, Zeyu Zheng, Yunduan Lin

2604.15708 2026-04-20 cs.CV

APC: Transferable and Efficient Adversarial Point Counterattack for Robust 3D Point Cloud Recognition

Geunyoung Jung, Soohong Kim, Inseok Kong, Jiyoung Jung

Comments Accepted by CVPR 2026 Findings

2604.15707 2026-04-20 cs.CV

LP$^{2}$DH: A Locality-Preserving Pixel-Difference Hashing Framework for Dynamic Texture Recognition

Ruxin Ding, Jianfeng Ren, Heng Yu, Jiawei Li, Xudong Jiang

2604.15706 2026-04-20 cs.CL

Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

Zijun Wang, Haoqin Tu, Weidong Zhou, Yiyang Zhou, Xiaohuan Zhou, Bingni Zhang, Weiguo Feng, Taifeng Wang, Cihang Xie, Fengze Liu

2604.15705 2026-04-20 cs.LG

Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning

Xiaoyu Yang, En Yu, Wei Duan, Jie Lu

2604.15703 2026-04-20 cs.CV

P3T: Prototypical Point-level Prompt Tuning with Enhanced Generalization for 3D Vision-Language Models

Geunyoung Jung, Soohong Kim, Kyungwoo Song, Jiyoung Jung

Comments Accepted by ICRA 2026

2604.15701 2026-04-20 cs.CL

Improving Reasoning Capabilities in Small Models through Mixture-of-Layers Distillation with Stepwise Attention on Key Information

Yao Chen, Jiawei Sheng, Wenyuan Zhang, Tingwen Liu

Comments Accepted at EMNLP 2025

2604.15699 2026-04-20 cs.LG cs.SI

Graph self-supervised learning based on frequency corruption

Haojie Li, Mengjiao Zhang, Guanfeng Liu, Qiang Hu, Yan Wang, Junwei Du

Comments 11 pages, 4 tables, 3 figures. Accepted at The ACM Web Conference 2026 (WWW 2026)

2604.15687 2026-04-20 cs.CL

Preference Estimation via Opponent Modeling in Multi-Agent Negotiation

Yuta Konishi, Kento Yamamoto, Eisuke Sonomoto, Rikuho Takeda, Ryo Furukawa, Yusuke Muraki, Takafumi Shimizu, Kazuma Fukumura, Yuya Kanemoto, Takayuki Ito, Shiyao Ding

Comments Accepted to Findings of ACL 2026

2604.15681 2026-04-20 cs.CV

Self-Supervised Angular Deblurring in Photoacoustic Reconstruction via Noisier2Inverse

Markus Haltmeier, Nadja Gruber, Gyeongha Hwang

2604.15679 2026-04-20 cs.LG cs.AI cs.CV

Hierarchical Active Inference using Successor Representations

Prashant Rangarajan, Rajesh P. N. Rao

Comments Accepted for publication in Neural Computation (MIT Press). 82 pages, 29 figures

2604.15678 2026-04-20 cs.CV

HyCal: A Training-Free Prototype Calibration Method for Cross-Discipline Few-Shot Class-Incremental Learning

Eunju Lee, MiHyeon Kim, JuneHyoung Kwon, Yoonji Lee, JiHyun Kim, Soojin Jang, YoungBin Kim

Comments Accepted to CVPR 2026. Eunju Lee and MiHyeon Kim contributed equally as co-first authors