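arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.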

2603.28204 2026-04-06 cs.LG cs.AI

ERPO: Token-Level Entropy-Regulated Policy Optimization for Large Reasoning Models

Song Yu, Li Li, Wenwen Zhao, Zhisheng Yang

Comments 17 pages, 5 figures

Abstract

Reinforcement learning from verifiable rewards has significantly advanced the reasoning capabilities of large language models. However, Group Relative Policy Optimization (GRPO) typically assigns a uniform, sequence-level advantage to all tokens, thereby overlooking the intrinsic information heterogeneity along reasoning chains. We show that this coarse-grained credit assignment leads to premature entropy collapse and encourages the model to generate redundant, low-quality reasoning paths. Through systematic empirical analysis, we identify Critical Decision Pivots (CDPs): transient high-entropy states where the policy's trajectory is most sensitive to perturbations. These pivots represent the "forks in the road" where effective multi-path exploration is most crucial yet often suppressed by uniform advantage signals. Building on these insights, we propose Entropy-Regulated Policy Optimization (ERPO), which transitions the optimization focus from coarse sequences to fine-grained token dynamics. ERPO introduces three synergistic components: (i) Entropy-aware Gating, which adaptively amplifies exploration at CDPs to facilitate diverse path discovery; (ii) Bucket-based Implicit Normalization, which mitigates difficulty bias by aligning token progress windows; and (iii) Result-anchored Advantage Synthesis, which re-weights token-level signals via outcome-driven anchors. Extensive experiments on competitive mathematical benchmarks demonstrate that ERPO significantly outperforms GRPO. Notably, ERPO not only boosts reasoning accuracy but also yields significantly more concise and robust derivation paths, while achieving performance comparable to large models with orders of magnitude more parameters.
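The contrast drawn in the abstract, between a single sequence-level advantage and token-level uncertainty, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the commonly used GRPO normalization (reward minus group mean, divided by group standard deviation; the population standard deviation is chosen here) and pairs it with a per-token Shannon entropy that could flag high-entropy positions. The function names, the `eps` constant, and the entropy threshold are all hypothetical.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each response's reward against its sampling group.

    Assumption: population standard deviation and a small eps for
    numerical stability; actual GRPO implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_tokens(advantage, num_tokens):
    """Copy one sequence-level advantage to every token (uniform credit)."""
    return [advantage] * num_tokens


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def high_entropy_positions(token_dists, threshold):
    """Indices whose entropy exceeds a hypothetical threshold."""
    return [i for i, dist in enumerate(token_dists)
            if token_entropy(dist) > threshold]


# Four sampled responses to one prompt, with binary verifiable rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Every token of the first response receives the same advantage.
per_token = broadcast_to_tokens(advantages[0], num_tokens=3)

# Toy next-token distributions for a three-token response.
pivots = high_entropy_positions(
    [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]], threshold=0.5
)
```

In this toy setup only the middle token, whose distribution is evenly split, exceeds the threshold, yet it receives the same advantage as its neighbors. The abstract does not give formulas for ERPO's gating, bucket-based normalization, or advantage synthesis, so none of them is reproduced here.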

2603.26584 2026-04-06 cs.CV

Scene Grounding In the Wild

Tamir Cohen, Leo Segre, Shay Shomer-Chai, Shai Avidan, Hadar Averbuch-Elor

Comments CVPR 2026. Project page at https://tau-vailab.github.io/SceneGround/

2603.26535 2026-04-06 cs.AI

PAPO: Stabilizing Rubric Integration Training via Decoupled Advantage Normalization

Zelin Tan, Zhouliang Yu, Bohan Lin, Zijie Geng, Hejia Geng, Yudong Zhang, Mulei Zhang, Yang Chen, Shuyue Hu, Zhenfei Yin, Chen Zhang, Lei Bai

Comments 16 pages, 9 figures

2603.25744 2026-04-06 cs.CV

MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models

Bocheng Zou, Mu Cai, Mark Stanley, Dingfu Lu, Yong Jae Lee

Comments Project Page: https://murf-vfm.github.io/

2603.24326 2026-04-06 cs.CV cs.AI cs.IR

Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang, Jing Zhang, Jun Zhang, Xing Wei, Yi Liu, Dianhai Yu, Yanjun Ma

Comments Accepted by CVPR 2026

2603.23766 2026-04-06 cs.CV

Semantic Iterative Reconstruction: One-Shot Universal Anomaly Detection

Ning Zhu

Comments 8 pages, 2 figures, 4 tables

2603.23750 2026-04-06 cs.CL

IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge

Ali Abdelaal, Mohammed Nader Al Haffar, Mahmoud Fawzi, Walid Magdy

Comments Leaderboard link: https://huggingface.co/spaces/islamicmmlu/leaderboard

2603.23032 2026-04-06 cs.CV cs.RO

Generative Event Pretraining with Foundation Model Alignment

Jianwen Cao, Jiaxu Xing, Nico Messikommer, Davide Scaramuzza

2603.22869 2026-04-06 cs.AI

Chain-of-Authorization: Embedding authorization into large language models

Yang Li, Yule Liu, Xinlei He, Youjian Zhao, Qi Li, Ke Xu

Comments 23 pages, 7 figures

2603.21991 2026-04-06 cs.LG cs.AI

$λ$-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks

Cristian Pérez-Corral, Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Enrique S. Quintana-Ortí

2603.21210 2026-04-06 cs.LG cs.CE

Pretrained Video Models as Differentiable Physics Simulators for Urban Wind Flows

Janne Perini, Rafael Bischof, Moab Arar, Ayça Duran, Michael A. Kraus, Siddhartha Mishra, Bernd Bickel

2603.20554 2026-04-06 cs.CV

When Negation Is a Geometry Problem in Vision-Language Models

Fawaz Sammani, Tzoulio Chamiti, Paul Gavrikov, Nikos Deligiannis

Comments Accepted to CVPR (Multimodal Algorithmic Reasoning Workshop) 2026

2603.20266 2026-04-06 cs.LG cs.AI

JointFM-0.1: A Foundation Model for Multi-Target Joint Distributional Prediction

Stefan Hackmann

2603.19798 2026-04-06 cs.SD cs.CL eess.AS

Borderless Long Speech Synthesis

Xingchen Song, Di Wu, Dinghao Zhou, Pengyu Cheng, Hongwu Ding, Yunchao He, Jie Wang, Shengfan Shen, Sixiang Lv, Lichun Fan, Hang Su, Yifeng Wang, Shuai Wang, Meng Meng, Jian Luan

2603.18545 2026-04-06 cs.CV cs.AI

CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models

Xiang Chen, Fangfang Yang, Chunlei Meng, Yuxian Dong, Ang Li, Yiwei Wei, Jiahuan Long, Jiujiang Guo, Chengyin Hu

2603.17714 2026-04-06 cs.AI

From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving

A. Humnabadkar, A. Sikdar, B. Cave, H. Zhang, N. Bessis, A. Behera

Comments Accepted manuscript - Transactions on Intelligent Transportation Systems

2603.17677 2026-04-06 cs.CL cs.AI cs.LG

Adaptive Guidance for Retrieval-Augmented Masked Diffusion Models

Jaemin Kim, Jong Chul Ye

2603.17069 2026-04-06 cs.CV

Edge-Efficient Two-Stream Multimodal Architecture for Non-Intrusive Bathroom Fall Detection

Haitian Wang, Yiren Wang, Xinyu Wang, Sheldon Fung, Atif Mansoor

Comments This paper has been accepted for poster presentation at IEEE ICME 2026

Abstract

Falls in wet bathroom environments are a major safety risk for seniors living alone. Recent work has shown that mmWave-only, vibration-only, and existing multimodal schemes, such as vibration-triggered radar activation, early feature concatenation, and decision-level score fusion, can support privacy-preserving, non-intrusive fall detection. However, these designs still treat motion and impact as loosely coupled streams, depending on coarse temporal alignment and amplitude thresholds, and do not explicitly encode the causal link between radar-observed collapse and floor impact or address timing drift, object drop confounders, and latency and energy constraints on low-power edge devices. To this end, we propose a two-stream architecture that encodes radar signals with a Motion-Mamba branch for long-range motion patterns and processes floor vibration with an Impact-Griffin branch that emphasizes impact transients and cross-axis coupling. Cross-conditioned fusion uses low-rank bilinear interaction and a Switch-MoE head to align motion and impact tokens and suppress object-drop confounders. The model keeps inference cost suitable for real-time execution on a Raspberry Pi 4B gateway. We construct a bathroom fall detection benchmark dataset with frame-level annotations, comprising more than 3 h of synchronized mmWave radar and triaxial vibration recordings across eight scenarios under running water, together with subject-independent training, validation, and test splits. On the test split, our model attains 96.1% accuracy, 94.8% precision, 88.0% recall, a 91.1% macro F1 score, and an AUC of 0.968. Compared with the strongest baseline, it improves accuracy by 2.0 percentage points and fall recall by 1.3 percentage points, while reducing latency from 35.9 ms to 15.8 ms and lowering energy per 2.56 s window from 14200 mJ to 10750 mJ on the Raspberry Pi 4B gateway.
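The latency and energy figures quoted in the abstract can be put on a common relative scale with a short calculation. The Python sketch below uses only the four numbers reported above; the helper names are hypothetical and the derived percentages are simple arithmetic, not results stated by the authors.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity dropped, relative to its old value."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster the new latency is than the old one."""
    return before / after


# Values as reported in the abstract (Raspberry Pi 4B gateway).
latency_before_ms, latency_after_ms = 35.9, 15.8
energy_before_mj, energy_after_mj = 14200, 10750  # per 2.56 s window

latency_cut = relative_reduction(latency_before_ms, latency_after_ms)
energy_cut = relative_reduction(energy_before_mj, energy_after_mj)
latency_speedup = speedup(latency_before_ms, latency_after_ms)

print(f"latency reduced by {latency_cut:.1%} ({latency_speedup:.2f}x faster)")
print(f"energy per window reduced by {energy_cut:.1%}")
```

Under these numbers the latency drops by roughly 56% and the energy per window by roughly 24%, so the reported gain in speed is proportionally larger than the gain in energy.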

2603.14267 2026-04-06 cs.CV cs.AI cs.MM cs.SD

DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

Ngoc-Son Nguyen, Thanh V. T. Tran, Jeongsoo Choi, Hieu-Nghia Huynh-Nguyen, Truong-Son Hy, Van Nguyen

Comments Accepted at CVPR 2026 Findings

2603.12711 2026-04-06 cs.CV

Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

Jing Yang, Hui Xue, Shipeng Zhu, Pengfei Fang

Comments Accepted by CVPR 2026

2603.06928 2026-04-06 cs.RO

Failure Mechanisms and Risk Estimation for Legged Robot Locomotion on Granular Slopes

Xingjue Liao, Feifei Qian

2603.01765 2026-04-06 cs.CV

Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Minseok Seo, Wonjun Lee, Jaehyuk Jang, Changick Kim

Comments 17 pages, 7 figures [We achieved a new Pareto frontier in test-time depth completion.]

2603.01589 2026-04-06 cs.LG cs.AI

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

Xiangyang Zhu, Yuan Tian, Qi Jia, Kaiwei Zhang, Zicheng Zhang, Chunyi Li, Kaiyuan Ji, Dongrui Liu, Zijian Chen, Lu Sun, Renrui Zhang, Yan Teng, Jing Shao, Wei Sun, Xia Hu, Yu Qiao, Guangtao Zhai

2602.23113 2026-04-06 cs.LG

Learning Physical Operators using Neural Operators

Vignesh Gopakumar, Ander Gray, Dan Giles, Lorenzo Zanisi, Matt J. Kusner, Timo Betcke, Stanislas Pamela, Marc Peter Deisenroth

2602.22911 2026-04-06 cs.LG cs.AI cs.CL

CeRA: Overcoming the Linear Ceiling of Low-Rank Adaptation via Capacity Expansion

Hung-Hsuan Chen

2602.18523 2026-04-06 cs.LG cs.AI

The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure

Yongzhong Xu

Comments 42 pages, 34 figures, 15 tables

2602.16967 2026-04-06 cs.LG cs.AI

Early-Warning Signals of Grokking via Loss-Landscape Geometry

Yongzhong Xu

Comments 33 pages, 16 figures

2602.16746 2026-04-06 cs.LG cs.AI

Low-Dimensional and Transversely Curved Optimization Dynamics in Grokking

Yongzhong Xu

Comments 37 pages, 25 figures

2602.13889 2026-04-06 cs.CV cs.LG

Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification

Daniel Chen, Zaria Zinn, Marcus Lowe

2602.10516 2026-04-06 cs.CV

3DXTalker: Unifying Identity, Lip Sync, Emotion, and Spatial Dynamics in Expressive 3D Talking Avatars

Zhongju Wang, Zhenhong Sun, Beier Wang, Yifu Wang, Daoyi Dong, Huadong Mo, Hongdong Li