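arXivDaily daily academic digest: mirrors the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.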

2604.21611 2026-04-24 cs.CL cs.AI

Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

Hao-Yuan Chen

Abstract

Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision, via Verbal Process Supervision (VPS), a training-free framework that uses structured natural-language critique from a stronger supervisor to guide an iterative generate-critique-refine loop up to a round budget R. Across GPQA Diamond, AIME 2025, and LiveCodeBench V6 (covering both closed and open models), VPS yields three key results. First, on GPQA Diamond, GPT-5.4 (High) | GPT-5.4 (Low) reaches 94.9% at R=4, surpassing the 94.1% state of the art without gradient updates. Second, on AIME 2025, VPS enables strong weak-actor rescue, boosting scores from 11.7-26.7% to 63.3-90.0% (up to +63.3 points). Third, at matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points and Self-Consistency@5 by +5.0 pp (GPQA) and +8.3 pp (LiveCodeBench), isolating critique granularity as the key driver. Performance scales with the supervisor-actor capability gap (Pearson r=0.90) and degrades when errors are not linguistically expressible (e.g., code synthesis), motivating hybrid verbal-executable methods. These results establish critique granularity as a new axis of inference-time scaling.
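The abstract describes VPS only at a high level: an actor answers, a stronger supervisor returns a structured natural-language critique, and the actor refines, for at most R rounds. A minimal Python sketch of such a generate-critique-refine loop follows. It assumes hypothetical `actor` and `supervisor` callables standing in for model calls, a hypothetical `Critique` record with an accept flag, and that R counts supervisor critiques. It is not the author's implementation; the toy actor and supervisor only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    accept: bool   # hypothetical: supervisor judges the current answer acceptable
    feedback: str  # structured natural-language critique of the reasoning


# Hypothetical signatures: actor(problem, previous_answer, feedback) -> answer,
# supervisor(problem, answer) -> Critique. Real systems would call LLMs here.
Actor = Callable[[str, Optional[str], Optional[str]], str]
Supervisor = Callable[[str, str], Critique]


def verbal_process_supervision(
    problem: str, actor: Actor, supervisor: Supervisor, rounds: int
) -> Tuple[str, List[Critique]]:
    """Generate-critique-refine loop with at most `rounds` supervisor critiques."""
    answer = actor(problem, None, None)  # initial generation, no feedback yet
    history: List[Critique] = []
    for _ in range(rounds):
        critique = supervisor(problem, answer)
        history.append(critique)
        if critique.accept:  # stop early once the supervisor is satisfied
            break
        # Refine: the actor sees its previous answer plus the verbal critique.
        answer = actor(problem, answer, critique.feedback)
    return answer, history


# Toy stand-ins that only demonstrate the loop mechanics.
def toy_actor(problem: str, previous: Optional[str], feedback: Optional[str]) -> str:
    return "5" if feedback is None else "4"  # wrong first, corrected after critique


def toy_supervisor(problem: str, answer: str) -> Critique:
    if answer == "4":
        return Critique(True, "Correct.")
    return Critique(False, "The addition step is wrong: 2 + 2 equals 4, not 5.")


final, trace = verbal_process_supervision("What is 2 + 2?", toy_actor, toy_supervisor, rounds=4)
print(final, len(trace))  # 4 2
```

Stopping as soon as the supervisor accepts means the budget R is an upper bound on rounds rather than a fixed cost, which is one plausible reading of "up to a round budget R".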

2604.21593 2026-04-24 cs.CL

Language as a Latent Variable for Reasoning Optimization

Linjuan Wu, Haoran Wei, Jialong Tang, Shuang Luo, Baosong Yang, Yongliang Shen, Weiming Lu

Comments 17 pages, 7 figures, Under review

2604.21592 2026-04-24 cs.CV

Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Minghao Yin, Wenbo Hu, Jiale Xu, Ying Shan, Kai Han

2604.21590 2026-04-24 cs.CL

AgenticQwen: Training Small Agentic Language Models with Dual Data Flywheels for Industrial-Scale Tool Use

Yuanjie Lyu, Chengyu Wang, Haonan Zheng, Yuanhao Yue, Junbing Yan, Ming Wang, Jun Huang

2604.21584 2026-04-24 cs.AI cs.CE cs.LG

CoFEE: Reasoning Control for LLM-Based Feature Discovery

Maximilian Westermann, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Yagiz Ihlamur, Kelvin Amoaba, Joseph Ternasky, Fuat Alican, Yigit Ihlamur

2604.21575 2026-04-24 cs.CV cs.GR

OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

Zeyu Cai, Yuliang Xiu, Renke Wang, Zhijing Shao, Xiaoben Li, Siyuan Yu, Chao Xu, Yang Liu, Baigui Sun, Jian Yang, Zhenyu Zhang

Comments Project Page: https://zcai0612.github.io/OmniFit/

2604.21573 2026-04-24 cs.CV q-bio.QM

CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction

Changfan Wang, Xinran Wang, Donghai Liu, Fei Su, Lulu Sun, Zhicheng Zhao, Zhu Meng

2604.21572 2026-04-24 cs.CV

Deep kernel video approximation for unsupervised action segmentation

Silvia L. Pintea, Jouke Dijkstra

Comments Accepted at ICPR 2026

2604.21571 2026-04-24 cs.AI cs.LG

Separable Expert Architecture: Toward Privacy-Preserving LLM Personalization via Composable Adapters and Deletable User Proxies

Chris Schneider, Philipp Schoenegger, Ben Bariach

2604.21568 2026-04-24 cs.RO

A Bayesian Reasoning Framework for Robotic Systems in Autonomous Casualty Triage

Szymon Rusiecki, Cecilia Morales, Pia Störy, Kimberly Elenberg, Leonard Weiss, Artur Dubrawski

Comments Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)

2604.21567 2026-04-24 cs.LG cs.AI

Hybrid Deep Learning Approach for Coupled Demand Forecasting and Supply Chain Optimization

Nusrat Yasmin Nadia, Md Habibul Arif, Habibor Rahman Rabby, Md Iftekhar Monzur Tanvir, Md. Jakir Hossen, M. F. Mridha

Comments The paper has been accepted in the journal Computers, Materials & Continua

2604.21556 2026-04-24 cs.AI cs.SE

Probabilistic Verification of Neural Networks via Efficient Probabilistic Hull Generation

Jingyang Li, Xin Chen, Hongfei Fu, Guoqiang Li

Comments 22 pages, 5 figures

2604.21555 2026-04-24 cs.CL

Finding Meaning in Embeddings: Concept Separation Curves

Paul Keuren, Marc Ponsen, Robert Ayoub Bagheri

Comments The code is open source and available on GitHub at https://github.com/pkun-cbs/ConceptSeparationCurves. Original conference paper

2604.21554 2026-04-24 cs.AI

Engaged AI Governance: Addressing the Last Mile Challenge Through Internal Expert Collaboration

Simon Jarvers, Orestis Papakyriakopoulos

2604.21549 2026-04-24 cs.AI stat.ME

Unbiased Prevalence Estimation with Multicalibrated LLMs

Fridolin Linder, Thomas Leeper, Daniel Haimovich, Niek Tax, Lorenzo Perini, Milan Vojnovic

2604.21546 2026-04-24 cs.CV

Component-Based Out-of-Distribution Detection

Wenrui Liu, Hong Chang, Ruibing Hou, Shiguang Shan, Xilin Chen

2604.21541 2026-04-24 cs.RO

X2-N: A Transformable Wheel-legged Humanoid Robot with Dual-mode Locomotion and Manipulation

Yan Ning, Xingzhou Chen, Delong Li, Hao Zhang, Hanfu Gai, Tongyuan Li, Cheng Zhang, Zhihui Peng, Ling Shi

2604.21537 2026-04-24 cs.AI cond-mat.stat-mech cs.GT cs.SI physics.data-an

The CriticalSet problem: Identifying Critical Contributors in Bipartite Dependency Networks

Sebastiano A. Piccolo, Andrea Tagarelli

2604.21530 2026-04-24 cs.CV cs.AI

Attention-based multiple instance learning for predominant growth pattern prediction in lung adenocarcinoma wsi using foundation models

Laura Valeria Perez-Herrera, M. J. Garcia-Gonzalez, Karen Lopez-Linares

2604.21527 2026-04-24 cs.LG

A temporal deep learning framework for calibration of low-cost air quality sensors

Arindam Sengupta, Tony Bush, Ben Marner, Jose Miguel Pérez, Soledad Le Clainche

2604.21525 2026-04-24 cs.CL

Job Skill Extraction via LLM-Centric Multi-Module Framework

Guojing Li, Zichuan Fu, Junyi Li, Faxue Liu, Wenxia Zhou, Yejing Wang, Jingtong Gao, Maolin Wang, Rungen Liu, Wenlin Zhang, Xiangyu Zhao

Comments 5 pages, 5 figures, 3 tables

2604.21523 2026-04-24 cs.CV cs.CL

Seeing Isn't Believing: Uncovering Blind Spots in Evaluator Vision-Language Models

Mohammed Safi Ur Rahman Khan, Sanjay Suryanarayanan, Tushar Anand, Mitesh M. Khapra

2604.21519 2026-04-24 cs.CV

Gmd: Gaussian mixture descriptor for pair matching of 3D fragments

Meijun Xiong, Zhenguo Shi, Xinyu Zhou, Yuhe Zhang, Shunli Zhang

Comments 24 pages, 10 figures. Published in Multimedia Systems

2604.21515 2026-04-24 cs.AI cs.LO

Satisfying Rationality Postulates of Structured Argumentation Through Deductive Support -- Technical Report

Marcos Cramer, Tom Friese

2604.21510 2026-04-24 cs.CL

OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

Xinyu Zhang, Boxuan Zhang, Yuchen Wan, Lingling Zhang, YiXing Yao, Bifan Wei, Yaqiang Wu, Jun Liu

2604.21508 2026-04-24 cs.AI q-bio.BM

BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature

Jiaxian Yan, Jintao Zhu, Yuhang Yang, Qi Liu, Kai Zhang, Zaixi Zhang, Xukai Liu, Boyan Zhang, Kaiyuan Gao, Jinchuan Xiao, Enhong Chen

Comments 20 pages, 5 figures, 1 table

Abstract

Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with the rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer inter-structure relationships, and exact molecular construction is delegated to domain chemistry tools. For rigorous evaluation and method development, we further establish BioVista, a comprehensive benchmark comprising 16,457 bioactivity entries curated from 500 publications. BioMiner achieves an F1 score of 0.32 for bioactivity triplets, validating its extraction ability and providing a quantitative baseline. BioMiner's practical utility is demonstrated through three applications: (1) extracting 82,262 data points from 11,683 papers to build a pre-training database that improves downstream model performance by 3.9%; (2) enabling a human-in-the-loop workflow that doubles the amount of high-quality NLRP3 bioactivity data, contributing to a 38.6% improvement over 28 QSAR models and the identification of 16 hit candidates with novel scaffolds; and (3) accelerating protein-ligand complex bioactivity annotation on the PoseBusters dataset, achieving a 5.59-fold speedup and a 5.75% accuracy improvement over manual workflows.
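The headline metric here is an F1 score over bioactivity triplets. The abstract does not say what each triplet contains or how matches are scored, so the sketch below assumes, hypothetically, that a triplet is a (ligand, measurement type, value) tuple and that a prediction counts only on an exact match. The example triplets are invented for illustration and are not taken from BioVista.

```python
from typing import Hashable, Iterable, Tuple


def triplet_f1(
    predicted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of triplets.

    Assumption: each triplet is a hashable tuple and must match a gold triplet
    exactly; the paper's actual matching rules may be more lenient.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)  # triplets present in both sets
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical (ligand, measurement type, value) triplets for illustration only.
gold = {
    ("ligand-A", "IC50", "12 nM"),
    ("ligand-B", "Ki", "3.4 uM"),
    ("ligand-C", "IC50", "0.8 uM"),
}
pred = {
    ("ligand-A", "IC50", "12 nM"),  # exact match
    ("ligand-B", "Ki", "34 uM"),    # wrong value, so it does not count
}
p, r, f = triplet_f1(pred, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.333 0.4
```

Because a single wrong field invalidates the whole triplet under exact matching, triplet-level F1 is a strict metric, which is worth keeping in mind when reading a score such as 0.32.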

2604.21495 2026-04-24 cs.LG cs.AI cs.CL

Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

Hanjun Cho, Gahyun Yoo, Hanseong Kim, Jay-Yoon Lee

Comments Accepted to TACL. This is a pre-MIT Press publication version

2604.21489 2026-04-24 cs.RO cs.AI

MISTY: High-Throughput Motion Planning via Mixer-based Single-step Drifting

Yining Xing, Zehong Ke, Yiqian Tu, Zhiyuan Liu, Wenhao Yu, Jianqiang Wang

Comments 8 pages, 4 figures, 3 tables. Submitted to IEEE Robotics and Automation Letters (RA-L)

2604.21481 2026-04-24 cs.CL

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Srija Anand, Ashwin Sankar, Ishvinder Sethi, Aaditya Pareek, Kartik Rajput, Gaurav Yadav, Nikhil Narasimhan, Adish Pandya, Deepon Halder, Mohammed Safi Ur Rahman Khan, Praveen S, Shobhit Banga, Mitesh M Khapra

2604.21480 2026-04-24 cs.AI

Efficient Agent Evaluation via Diversity-Guided User Simulation

Itay Nakash, George Kour, Ateret Anaby-Tavor