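arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.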

2604.14265 2026-04-17 cs.LG cs.AI

Reinforcement Learning via Value Gradient Flow

Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang

Comments ICLR 2026

Abstract

We study behavior-regularized reinforcement learning (RL), where regularization toward a reference distribution (the dataset in offline RL or the base model in LLM RL finetuning) is essential to prevent value over-optimization caused by erroneous out-of-distribution extrapolation. Existing methods rely either on reparameterized policy gradients, which are difficult to scale to large generative models, or on rejection sampling, which can be overly conservative when attempting to move beyond the behavior support. In this paper, we propose Value Gradient Flow (VGF), a scalable new paradigm for behavior-regularized RL. VGF casts behavior-regularized RL as an optimal transport problem that maps the reference distribution to the value-induced optimal policy distribution. We solve this transport problem via discrete gradient flow, where value gradients guide particles initialized from the reference distribution. Our analysis shows that VGF imposes regularization implicitly by controlling the transport budget. VGF eliminates explicit policy parameterization while remaining expressive and flexible, which enables adaptive test-time scaling by adjusting the transport budget. Extensive experiments demonstrate that VGF significantly outperforms prior methods, achieving state-of-the-art results on offline RL benchmarks (D4RL, OGBench) and LLM RL tasks. Code and runs can be found at https://ryanxhr.github.io/vgf.
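
The particle picture in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional quadratic value function, the Gaussian reference distribution, the `target` constant, and the use of a step count as a stand-in for the "transport budget" are all assumptions made here for illustration only.

```python
import random


def value_gradient(a, target=2.0):
    # Gradient of a hypothetical toy value function Q(a) = -(a - target)^2.
    return -2.0 * (a - target)


def gradient_flow(particles, step_size, num_steps):
    """Move every particle along the value gradient for num_steps steps.

    The product step_size * num_steps plays the role of a transport
    budget in this sketch: a small budget keeps particles near the
    reference samples, a large budget moves them toward high value.
    """
    out = list(particles)
    for _ in range(num_steps):
        out = [a + step_size * value_gradient(a) for a in out]
    return out


def mean(xs):
    return sum(xs) / len(xs)


# Particles initialized from an assumed reference distribution N(0, 0.5^2).
rng = random.Random(0)
reference = [rng.gauss(0.0, 0.5) for _ in range(1000)]

small_budget = gradient_flow(reference, step_size=0.05, num_steps=2)
large_budget = gradient_flow(reference, step_size=0.05, num_steps=20)

print(mean(reference), mean(small_budget), mean(large_budget))
```

With a zero-step budget the particles are exactly the reference samples; increasing the number of steps moves their mean monotonically toward the maximizer of the toy value function, which is the sense in which the budget trades off regularization against value in this sketch.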

2604.14262 2026-04-17 cs.LG cs.AI

GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

Yangyue Wang, Harshvardhan Sikka, Yash Mathur, Tony Zhou, Jinu Nyachhyon, Pranav Guruprasad

Comments 26 Pages, 17 Figures, 9 Tables

2604.14261 2026-04-17 cs.CL cs.AI

ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang

2604.14254 2026-04-17 cs.AI cs.LO

Formalizing Kantian Ethics: Formula of the Universal Law Logic (FULL)

Taylor Olson

2604.14251 2026-04-17 cs.LG

Calibrate-Then-Delegate: Safety Monitoring with Risk and Budget Guarantees via Model Cascades

Edoardo Pona, Milad Kazemi, Mehran Hosseini, Yali Du, David Watson, Osvaldo Simeone, Nicola Paoletti

2604.14249 2026-04-17 cs.LG stat.ML

Metric-Aware Principal Component Analysis (MAPCA): A Unified Framework for Scale-Invariant Representation Learning

Michael Leznik

Comments 12 pages, one figure

2604.14237 2026-04-17 cs.LG

TOPCELL: Topology Optimization of Standard Cell via LLMs

Zhan Song, Yu-Tung Liu, Chen Chen, Guoheng Sun, Jiaqi Yin, Chia-tung Ho, Ang Li, Haoxing Ren, Cunxi Yu

Comments Accepted to the 63rd ACM/IEEE Design Automation Conference (DAC 2026). 7 pages, 4 figures

2604.14235 2026-04-17 cs.LG cs.AI

Graph-Based Fraud Detection with Dual-Path Graph Filtering

Wei He, Wensheng Gan, Philip S. Yu

Comments Neural Networks

2604.14232 2026-04-17 cs.LG cs.AI

Explainable Graph Neural Networks for Interbank Contagion Surveillance: A Regulatory-Aligned Framework for the U.S. Banking Sector

Mohammad Nasir Uddin

Comments 28 pages, submitted to Research in International Business and Finance (RIBAF)

2604.14231 2026-04-17 cs.LG cs.AI cs.NE

Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation

Mohammad Nasir Uddin, Md Munna Aziz

Comments 28 pages. Submitted to Engineering Applications of Artificial Intelligence (Elsevier). IEEE-CIS dataset (590,540 transactions). Includes SGAE algorithm, SHAP stability evaluation, and OCC/SR 11-7 regulatory compliance mapping

2604.14221 2026-04-17 cs.AI

Fun-TSG: A Function-Driven Multivariate Time Series Generator with Variable-Level Anomaly Labeling

Pierre Lotte, André Péninou, Olivier Teste

2604.14218 2026-04-17 cs.CL cs.AI

MEME-Fusion@CHiPSAL 2026: Multimodal Ablation Study of Hate Detection and Sentiment Analysis on Nepali Memes

Samir Wagle, Reewaj Khanal, Abiral Adhikari

Comments PrePrint

2604.14214 2026-04-17 cs.CL cs.AI

CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization

Deep Shah, Sanket Badhe, Nehal Kathrotia, Priyanka Tiwari

Comments Accepted at ICLR 2026 Workshop on Logical Reasoning of Large Language Models

2604.14210 2026-04-17 cs.CL cs.SE

Chinese Language Is Not More Efficient Than English in Vibe Coding: A Preliminary Study on Token Cost and Problem-Solving Rate

Simiao Ren, Xingyu Shen, Yuchen Zhou, Dennis Ng, Ankit Raj

2604.14209 2026-04-17 cs.LG cs.AI stat.ML

Towards Verified and Targeted Explanations through Formal Methods

Hanchen David Wang, Diego Manzanas Lopez, Preston K. Robinette, Ipek Oguz, Taylor T. Johnson, Meiyi Ma

Comments Paper has been accepted at JAIR

2604.14206 2026-04-17 cs.LG q-fin.PM stat.ML

Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training

Adhiraj Chattopadhyay

Comments 18 pages of main text. 10 pages of appendices. 35 references. Around 13 figures

2604.14204 2026-04-17 cs.SD cs.AI eess.AS

Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition

Chengling Guo, Yuntao Shou, Tao Meng, Wei Ai, Yun Tan, Keqin Li

Comments 16 pages

2604.14198 2026-04-17 cs.LG cs.AI cs.CL

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

Bingbing Wen, Sirajul Salekin, Feiyang Kang, Bill Howe, Lucy Lu Wang, Javier Movellan, Manjot Bilkhu

2604.14197 2026-04-17 cs.CL cs.AI

The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure

David A. Cook

Comments Presents the novel PICCO framework for LLM prompting, derived through a structured multi-database search and rigorous comparative synthesis of 11 published prompting frameworks. Submitted in PDF/A format to preserve the structure and readability of several multi-page tables central to the framework and methodology; these contain dense structured information that is best preserved in PDF form

2604.14191 2026-04-17 cs.CL cs.LG

Attention to Mamba: A Recipe for Cross-Architecture Distillation

Abhinav Moudgil, Ningyuan Huang, Eeshan Gunesh Dhekane, Pau Rodríguez, Luca Zappella, Federico Danieli

2604.14180 2026-04-17 cs.CL cs.AI

Internal Knowledge Without External Expression: Probing the Generalization Boundary of a Classical Chinese Language Model

Jiuting Chen, Yuan Lian, Hao Wu, Tianqi Huang, Hiroshi Sasaki, Makoto Kouno, Jongil Choi

Comments 15 pages, 5 figures, supplementary material included

2604.14179 2026-04-17 cs.CL cs.AI

An Underexplored Frontier: Large Language Models for Rare Disease Patient Education and Communication -- A scoping review

Zaifu Zhan, Yu Hou, Kai Yu, Min Zeng, Anita Burgun, Xiaoyi Chen, Rui Zhang

2604.14178 2026-04-17 cs.AI q-bio.NC

Simulating Human Cognition: Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM-based AI systems

Hong Su

2604.14177 2026-04-17 cs.CL cs.AI

Listen, Correct, and Feed Back: Spoken Pedagogical Feedback Generation

Junhong Liang, Yifan Lu, Ekaterina Kochmar, Fajri Koto

Comments NLP8506 course project

2604.14176 2026-04-17 cs.LG cs.AI stat.ML

The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery

Haiyang Zheng, Nan Pu, Yaqi Cai, Teng Long, Wenjing Li, Nicu Sebe, Zhun Zhong

Comments Accepted by CVPR26

2604.14175 2026-04-17 cs.CL cs.AI

QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment

Mohammad AL-Smadi

Comments Accepted for publication at CL4Health 2026 workshop, LREC2026 conference

2604.14172 2026-04-17 cs.CL cs.AI

Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations

Ziyin Zhou, Jianyi Zhang, Xu Ji, Yilong Li, Jiameng Han, Zhangchi Zhao

2604.14171 2026-04-17 cs.CL cs.AI

Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali

Ananda Rimal, Adarsha Rimal

Comments 31 pages, 4 figures, 14 tables

Abstract

Romanized Nepali, the Nepali language written in the Latin alphabet, is the dominant medium for informal digital communication in Nepal, yet it remains critically under-resourced in the landscape of Large Language Models (LLMs). This study presents a systematic benchmarking of linguistic adaptation across three comparable-sized open-weight models: Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B. We evaluate these architectures under zero-shot and fine-tuned settings using a curated bilingual dataset of 10,000 transliterated instruction-following samples. Performance is quantified across five metrics spanning seven measurement dimensions: Perplexity (PPL), BERTScore, chrF++, ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, capturing fluency, phonetic consistency, and semantic integrity. Models were fine-tuned using Quantized Low-Rank Adaptation (QLoRA) with Rank-Stabilized LoRA (rsLoRA) at rank r=32 on dual NVIDIA Tesla T4 GPUs, training only approximately 1% of each model's parameters in under 27 total GPU-hours. In the zero-shot setting, all three models fail to generate Romanized Nepali, each exhibiting a distinct architecture-specific failure mode. Following fine-tuning, all three resolve these failures and converge to a BERTScore of approximately 0.75 and chrF++ greater than 23. A dimension-wise assessment across ten criteria identifies Qwen3-8B as the recommended architecture overall, being the only model to produce semantically relevant zero-shot output and leading all structural alignment metrics post-SFT. The adaptation headroom hypothesis is confirmed: Llama-3.1-8B, despite its weakest zero-shot baseline, achieves the largest absolute fine-tuning gains in PPL (Delta = -49.77) and BERTScore (Delta = +0.3287), making it the preferred choice for iterative low-resource development pipelines. This work establishes the first rigorous baseline for Romanized Nepali adaptation in comparable-sized open-weight LLMs.

2604.14170 2026-04-17 cs.CL cs.AI

Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning

Qi Dong, Ziheng Lin, Ning Ding

2604.14169 2026-04-17 cs.CL

Chronological Knowledge Retrieval: A Retrieval-Augmented Generation Approach to Construction Project Documentation

Ioannis-Aris Kostis, Natalia Sanchiz, Steeve De Schryver, François Denis, Pierre Schaus