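arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.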

2509.00084 2026-04-28 cs.LG cs.AI cs.CL

Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs

Qibin Wang, Pu Zhao, Shaohan Huang, Fangkai Yang, Lu Wang, Furu Wei, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Details

English abstract

Test-time scaling (TTS) has gained widespread attention for enhancing LLM reasoning. Existing approaches such as Best-of-N and majority voting are limited as their performance depends on the quality of candidate responses, making them unable to produce a correct solution when all candidates are incorrect. Parallel self-refinement, generating multiple candidates and synthesizing a refined answer conditioned on them, offers a promising alternative, but the underlying mechanism driving its effectiveness remains obscure. To bridge this gap in understanding, we introduce a new metric, the Refinement Gap, designed to quantify the relative improvement of self-refinement beyond majority voting. We show that the Refinement Gap exhibits a clear scaling trend with model size and is only weakly correlated with the base capability. Based on this discovery, we propose Generative Self-Refinement (GSR), a parallel test-time scaling framework that transfers the refinement policy from larger teacher models with higher refinement gap into smaller students. Crucially, GSR jointly trains a single model to generate strong candidates and refine a better final answer based on these candidates. Experimental results demonstrate that our method achieves state-of-the-art performance across five mathematical benchmarks over other parallel aggregation methods, while the learned refinement skill transfers across multiple model scales and families and exhibits robust generalization to an out-of-distribution domain.
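The limitation of majority voting described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula for the Refinement Gap, so `relative_gap` below is only an assumed form (relative improvement over the voting accuracy), and the `refined` answers are hypothetical placeholders rather than output of any model.

```python
from collections import Counter


def majority_vote(candidates):
    """Return the most frequent answer among the sampled candidates."""
    return Counter(candidates).most_common(1)[0][0]


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def relative_gap(acc_refine, acc_vote):
    """Assumed form of the gap: relative improvement of refinement over voting."""
    return (acc_refine - acc_vote) / acc_vote


# Toy data: three problems with four sampled candidate answers each.
gold = ["12", "7", "3"]
candidates = [
    ["12", "12", "5", "12"],  # the majority answer is correct
    ["7", "7", "8", "7"],     # the majority answer is correct
    ["4", "4", "9", "4"],     # every candidate is wrong
]

voted = [majority_vote(c) for c in candidates]
vote_acc = accuracy(voted, gold)

# Hypothetical refined answers: a refiner may produce an answer that
# appears in none of the candidates, which voting can never do.
refined = ["12", "7", "3"]
gap = relative_gap(accuracy(refined, gold), vote_acc)
```

On the third problem the vote returns "4" because no candidate contains the correct answer, which is the failure case the abstract attributes to selection-based aggregation.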


2508.19652 2026-04-28 cs.CV

Self-Rewarding Vision-Language Model via Reasoning Decomposition

Zongxia Li, Wenhao Yu, Chengsong Huang, Zhenwen Liang, Rui Liu, Fuxiao Liu, Jingxi Che, Dian Yu, Jordan Boyd-Graber, Haitao Mi, Dong Yu

Comments 16 pages, two figures

2507.20088 2026-04-28 cs.LG math-ph math.MP math.OC stat.ML

Learning Latent Graph Geometry via Fixed-Point Schrödinger-Type Activation: A Theoretical Study

Dmitry Pasechnyuk-Vilensky, Martin Takáč

Comments 50 pages

2507.09245 2026-04-28 cs.CL

Swa-bhasha Resource Hub: Romanized Sinhala to Sinhala Transliteration Systems and Data Resources

Deshan Sumanathilaka, Sameera Perera, Sachithya Dharmasiri, Maneesha Athukorala, Anuja Dilrukshi Herath, Rukshan Dias, Pasindu Gamage, Ruvan Weerasinghe, Y. H. P. P. Priyadarshana

Comments 15 pages, 5 Tables, 3 figures

2507.06542 2026-04-28 cs.LG cs.DC cs.MA stat.ML

On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning

Tongtian Zhu, Tianyu Zhang, Mingze Wang, Zhanpeng Zhou, Can Wang

Comments We discover and theoretically explain why and when a single global parameter merging in decentralized learning can recover the performance of federated learning, even in highly heterogeneous and communication-constrained environments

2505.16518 2026-04-28 cs.CL cs.AI

CUB: Benchmarking Context Utilisation Techniques for Language Models

Lovisa Hagström, Youna Kim, Haeun Yu, Sang-goo Lee, Richard Johansson, Hyunsoo Cho, Isabelle Augenstein

Comments Accepted at ACL 2026, 33 pages

2505.12009 2026-04-28 cs.CV

LatentStealth: Unnoticeable and Efficient Adversarial Attacks on Expressive Human Pose and Shape Estimation

Zhiying Li, Guanggang Geng, Yeying Jin, Shuyuan Lin, Fengyuan Ma, Zhaoxin Fan, Lili Wang

Comments 10 pages, 6 figures

2505.02922 2026-04-28 cs.LG

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference

Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jing Liu, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Cheng Li, Yuqing Yang, Fan Yang, Mao Yang

Comments 16 pages; Accepted by VLDB 2026

Details

DOI: 10.14778/3796195.3796212
Journal ref: PVLDB, 19(5): 1016-1031, 2026

English abstract

Recent large language models (LLMs) are rapidly extending their context windows, yet inference throughput lags due to increasing GPU memory and bandwidth demands. This is because the key-value (KV) cache, an intermediate structure storing token representations, grows linearly with context length and requires an iterative linear scan for attention computation. A promising direction to accelerate long-context inference is to exploit attention's inherent sparsity by offloading the KV cache to CPU memory and retrieving only a small subset of tokens important to the current generation step. However, prior sparse attention approaches struggle to balance accuracy and retrieval cost due to varying sparsity patterns and inefficient GPU-CPU memory management. We present RetroInfer, a vector storage engine that realizes a sparsity-based KV cache for long-context inference. RetroInfer introduces an Attention-aWare VEctor index (wave index), which fundamentally improves the tradeoff between attention accuracy and retrieval cost through tripartite attention approximation, accuracy-bound attention estimation, and segmented clustering. We also design the wave buffer, a GPU-CPU buffer manager that assigns computation and manages data across heterogeneous hardware. We evaluate RetroInfer across a range of models and workloads, demonstrating up to 4.4x decoding throughput over full attention at 120K context and up to 12.2x over sparse attention baselines at 1 million tokens, all while preserving full-attention-level accuracy.
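The general idea of attending only to a small subset of cached tokens can be sketched as follows. This is a generic top-k sparse attention illustration on toy data, not RetroInfer's wave index or wave buffer; all function names are hypothetical, and `top_k_indices` still scans every key, which is the cost a vector index is meant to avoid.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values, indices=None):
    """Single-query attention over the cached tokens listed in `indices`."""
    indices = list(range(len(keys))) if indices is None else list(indices)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, keys[i])) / scale for i in indices]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, indices)) for d in range(dim)]


def top_k_indices(query, keys, k):
    """Indices of the k keys with the largest dot product with the query."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


# Toy KV cache with five tokens; two of them dominate the attention weights.
query = [1.0, 0.0]
keys = [[8.0, 0.0], [7.0, 0.0], [-3.0, 0.0], [-4.0, 1.0], [0.0, -2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

full = attention(query, keys, values)
selected = top_k_indices(query, keys, k=2)
sparse = attention(query, keys, values, selected)
```

When the attention weights are concentrated on a few tokens, as in this toy cache, `sparse` stays close to `full` while reading only two of the five cached entries.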


2505.01595 2026-04-28 cs.CL cs.AI cs.LG

Always Tell Me The Odds: Fine-grained Conditional Probability Estimation

Liaoyaqi Wang, Zhengping Jiang, Anqi Liu, Benjamin Van Durme

2504.10527 2026-04-28 cs.AI cs.CY

Explainable Artificial Intelligence Techniques for Interpretation of Food Models: a Review

Leonardo Arrighi, Ingrid Alves de Moraes, Marco Zullich, Michele Simonato, Douglas Fernandes Barbin, Sylvio Barbon Junior

Comments 47 pages, 10 figures, 7 tables

2504.09499 2026-04-28 cs.LG cs.AI

Decoding the mechanisms of the Hattrick football manager game using Bayesian network structure learning

Anthony C. Constantinou, Nicholas Higgins, Neville K. Kitson

2503.10666 2026-04-28 cs.CL cs.AI cs.LG

Green Prompting: Characterizing Prompt-driven Energy Costs of LLM Inference

Marta Adamska, Daria Smirnova, Hamid Nasiri, Zhengxin Yu, Peter Garraghan

Comments 9 pages, 5 figures

2501.07237 2026-04-28 cs.LG cs.AI

GWT: Scalable Optimizer State Compression for Large Language Model Training

Ziqing Wen, Ping Luo, Jiahuan Wang, Kun Yuan, Dongsheng Li, Tao Sun

2410.05970 2026-04-28 cs.CV cs.AI cs.CL

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

Xudong Xie, Hao Yan, Liang Yin, Yang Liu, Jing Ding, Minghui Liao, Yuliang Liu, Wei Chen, Xiang Bai

Comments Accepted by International Journal of Computer Vision (IJCV)

2408.00923 2026-04-28 cs.CV cs.AI

Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization

Róisín Luo, Alexandru Drimbarean, James McDermott, Colm O'Riordan

Comments Accepted by The 35th British Machine Vision Conference (BMVC 2024)

Details

English abstract

This paper explores a novel paradigm in low-bit (i.e. 4 bits or lower) quantization, differing from existing state-of-the-art methods, by framing optimal quantization as an architecture search problem within convolutional neural networks (ConvNets). Our framework, dubbed CoRa (Optimal Quantization Residual Convolutional Operator Low-Rank Adaptation), is motivated by two key aspects. Firstly, quantization residual knowledge, i.e. the information lost between floating-point weights and quantized weights, has long been neglected by the research community. Reclaiming the critical residual knowledge, with an infinitesimal extra parameter cost, can reverse performance degradation without training. Secondly, state-of-the-art quantization frameworks search for optimal quantized weights to address the performance degradation. Yet, the vast search spaces in weight optimization pose a challenge for efficient optimization in large models. For example, state-of-the-art BRECQ requires 2 × 10^4 iterations to quantize models. Fundamentally differing from existing methods, CoRa searches for the optimal architectures of low-rank adapters, reclaiming critical quantization residual knowledge, within search spaces that are many orders of magnitude smaller than the weight spaces. The low-rank adapters approximate the quantization residual weights, which are discarded in previous methods. We evaluate our approach over multiple pre-trained ConvNets on ImageNet. CoRa achieves performance comparable to both state-of-the-art quantization-aware training and post-training quantization baselines, in 4-bit and 3-bit quantization, using fewer than 250 iterations on a small calibration set of 1600 images. Thus, CoRa establishes a new state of the art in terms of optimization efficiency in low-bit quantization.
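The notion of a quantization residual approximated by a low-rank term can be illustrated with a small sketch. It is not CoRa: it uses a toy 3 x 3 matrix, a simple uniform quantizer, and a rank-1 fit by power iteration, all of which are assumptions chosen for illustration, whereas the paper searches over adapter architectures for convolutional operators.

```python
import math


def quantize(weights, bits):
    """Uniform quantization of a matrix to 2**bits levels over its value range."""
    flat = [w for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / (2 ** bits - 1)
    return [[lo + round((w - lo) / step) * step for w in row] for row in weights]


def subtract(a, b):
    """Element-wise difference of two matrices of the same shape."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def frobenius(matrix):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def rank_one(matrix, iters=50):
    """Rank-1 approximation sigma * u v^T of a matrix via power iteration."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        v = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return [[sigma * u[i] * v[j] for j in range(cols)] for i in range(rows)]


# Toy floating-point weights (hypothetical values).
weights = [[0.11, -0.42, 0.35], [0.27, 0.08, -0.19], [-0.33, 0.41, 0.02]]

quantized = quantize(weights, bits=2)
residual = subtract(weights, quantized)        # information lost by quantization
adapter = rank_one(residual)                   # low-rank estimate of the residual
residual_after = subtract(residual, adapter)   # error left after adding the adapter
```

Here `frobenius(residual_after)` is smaller than `frobenius(residual)`, which shows in miniature how a low-rank term can recover part of what quantization discards.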


2407.14974 2026-04-28 cs.LG cs.AI

Out of Spuriousity: Improving Robustness to Spurious Correlations without Group Annotations

Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

Comments Accepted to TMLR

2405.20642 2026-04-28 cs.LG stat.ML

Learning Under Moral Hazard with Instrumental Regression and Generalized Method of Moments

Shiliang Zuo

2405.04211 2026-04-28 cs.CV

Leveraging Medical Foundation Model Features in Graph Neural Network-Based Retrieval of Breast Histopathology Images

Nematollah Saeidi, Hossein Karshenas, Bijan Shoushtarian, Sepideh Hatamikia, Ramona Woitek, Amirreza Mahbod

Comments 29 pages

Details

DOI: 10.1002/ima.70346
Journal ref: International Journal of Imaging Systems and Technology, 2026

English abstract

Breast cancer is the most common cancer type in women worldwide. Early detection and appropriate treatment can significantly reduce its impact. While histopathology examinations play a vital role in rapid and accurate diagnosis, they often require experienced medical experts for proper recognition and cancer grading. Automated image retrieval systems have the potential to assist pathologists in identifying cancerous tissues, thereby accelerating the diagnostic process. Nevertheless, proposing an accurate image retrieval model is challenging due to considerable variability among the tissue and cell patterns in histological images. In this work, we leverage the features from foundation models in a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval. Our results confirm the superior performance of models trained with foundation model features compared to those using pre-trained convolutional neural networks (up to 7.7% and 15.5% for mAP and mMV, respectively), with the pre-trained general-purpose self-supervised model for computational pathology (UNI) delivering the best overall performance. By evaluating two publicly available histology image datasets of breast cancer, our top-performing model, trained with UNI features, achieved average mAP/mMV scores of 96.7%/91.5% and 97.6%/94.2% for the BreakHis and BACH datasets, respectively. Our proposed retrieval model has the potential to be used in clinical settings to enhance diagnostic performance and ultimately benefit patients.


2403.16958 2026-04-28 cs.CV

TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving

Quang-Huy Che, Duc-Tri Le, Minh-Quan Pham, Vinh-Tiep Nguyen, Duc-Khai Lam

Details

Journal ref: Computers and Electrical Engineering 128 (2025) 110694

English abstract

Semantic segmentation is a fundamental perception task in autonomous driving, particularly for identifying drivable areas and lane markings to enable safe navigation. However, most state-of-the-art (SOTA) models are computationally intensive and unsuitable for real-time deployment on resource-constrained embedded devices. In this paper, we introduce TwinLiteNet+, an enhanced multi-task segmentation model designed for real-time drivable area and lane segmentation with high efficiency. TwinLiteNet+ employs a hybrid encoder architecture that integrates stride-based dilated convolutions and depthwise separable dilated convolutions, balancing representational capacity and computational cost. To improve task-specific decoding, we propose two lightweight upsampling modules, Upper Convolution Block (UCB) and Upper Simple Block (USB), alongside a Partial Class Activation Attention (PCAA) mechanism that enhances segmentation precision. The model is available in four configurations, ranging from the ultra-compact TwinLiteNet+ Nano (34K parameters) to the high-performance TwinLiteNet+ Large (1.94M parameters). On the BDD100K dataset, TwinLiteNet+ Large achieves 92.9% mIoU for drivable area segmentation and 34.2% IoU for lane segmentation, surpassing existing state-of-the-art models while requiring 11x fewer floating-point operations (FLOPs) for computation. Extensive evaluations on embedded devices demonstrate superior inference speed, quantization robustness (INT8/FP16), and energy efficiency, validating TwinLiteNet+ as a compelling solution for real-world autonomous driving systems. Code is available at https://github.com/chequanghuy/TwinLiteNetPlus.
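To make the cost argument behind depthwise separable dilated convolutions concrete, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The channel sizes are illustrative assumptions, not the TwinLiteNet+ configuration, and biases are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return dilation * (k - 1) + 1


# Illustrative layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 128, 3)          # 73728
separable = depthwise_separable_params(64, 128, 3)   # 8768
extent = effective_kernel(3, dilation=2)             # 5
```

Dilation widens the receptive field (a 3 x 3 kernel with dilation 2 spans 5 x 5 positions) without adding weights, while the separable factorization cuts the weight count of this example layer by roughly a factor of eight.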


2401.13568 2026-04-28 cs.RO

Investigating the Performance of Soft Robotic Adaptive Feet with Longitudinal and Transverse Arches

Anna Pace, Giorgio Grioli, Alice Ghezzi, Antonio Bicchi, Manuel G. Catalano

Comments Submitted to Frontiers in Robotics and AI

2401.03563 2026-04-28 cs.CL cs.IR

Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning

Yingqian Min, Kun Zhou, Dawei Gao, Wayne Xin Zhao, He Hu, Yaliang Li

Comments 14 pages, work in progress

2312.08410 2026-04-28 cs.LG math.PR stat.ML

Universal approximation property of Banach space-valued random feature models including random neural networks

Ariel Neufeld, Philipp Schmocker

Comments 52 pages, 4 figures, 4 tables

2105.12708 2026-04-28 cs.CL cs.SD eess.AS

Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition

Julia Pritzen, Michael Gref, Dietlind Zühlke, Christoph Schmidt

Comments Submitted to LREC 2022

1912.13213 2026-04-28 cs.LG math.OC stat.ML

A Modern Introduction to Online Learning

Francesco Orabona

Comments Major update: One new chapter (Online Learning to X); massive tightening of all the math; simplification of the betting algorithm that loses a constant fraction of money; exp-concave functions are now for extended-real-valued functions; new layout for publication; added index

2604.24242 2026-04-28 cs.RO

OpenPodcar2: a robust, ROS2 vehicle for self-driving research

Rakshit Soni, Chris Waltham, Md Umar Ibrahim, Mark Crampton, Charles Fox

2604.24238 2026-04-28 cs.LG

GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models

Yiming Zhang, Sitong Liu, Ke Li, Zhihong Wu, Alex Cloninger, Melvin Leok

2604.24235 2026-04-28 cs.CV

Touchless Intraoperative Image Access System Based on Vision-Based Hand Tracking

Yin Lin, Domenico Aquino, Alberto Redaelli, Massimiliano Del Bene, Riccardo Barbieri, Simona Ferrante

2604.24234 2026-04-28 cs.CV

Graph-augmented Segmentation of Complex Shapes in Laser Powder bed Fusion for Enhanced In Situ Inspection

Stefano Raimondo, Matteo Bugatti, Marco Grasso

Comments Submitted to IEEE Transactions on Automation Science and Engineering (T-ASE)

2604.24230 2026-04-28 cs.CV

Radiomics- and Clinical Feature-Driven Prediction of Volumetric Response in Skull-Base Meningioma after CyberKnife Radiosurgery

Yin Lin, Elena De Martin, Giacomo Conte, Domenico Aquino, Cristiana Pedone, Alberto Redaelli, Riccardo Barbieri, Laura Fariselli, Simona Ferrante

2604.24224 2026-04-28 cs.LG

IMPA-Net: Meteorology-Aware Multi-Scale Attention and Dynamic Loss for Extreme Convective Radar Nowcasting

Haofei Cui, Guangxin He, Juanzhen Sun, Jingjia Luo, Haonan Chen, Xiaoran Zhuang, Mingxuan Chen, Xian Xiao