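arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.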

2509.13866 2026-03-24 cs.LG cs.AI cs.CL

Masked Diffusion Models as Energy Minimization

Sitong Chen, Shen Nie, Jiacheng Sun, Zijin Feng, Zhenguo Li, Ji-Rong Wen, Chongxuan Li

Details

Journal ref: Published at NeurIPS 2025

English abstract

We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy formulations (kinetic, conditional kinetic, and geodesic energy) are mathematically equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. This unification not only clarifies the theoretical foundations of MDMs, but also motivates practical improvements in sampling. By parameterizing interpolation schedules via Beta distributions, we reduce the schedule design space to a tractable 2D search, enabling efficient post-training tuning without model modification. Experiments on synthetic and real-world benchmarks demonstrate that our energy-inspired schedules outperform hand-crafted baselines, particularly in low-step sampling settings.
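To make the two-parameter search space concrete, here is a minimal sketch of one way a Beta-parameterized schedule could be turned into per-step unmasking counts. It assumes the schedule is read as the Beta(a, b) CDF evaluated on a uniform time grid; that reading, the function names `beta_cdf_grid` and `unmask_counts`, and the midpoint-rule integration are illustrative assumptions, not the authors' implementation.

```python
def beta_cdf_grid(a, b, steps, resolution=10_000):
    """Approximate the Beta(a, b) CDF at t = k / steps for k = 0..steps."""
    width = 1.0 / resolution
    cum = [0.0]
    for i in range(resolution):
        # Midpoint rule: never evaluates the density at the endpoints 0 and 1.
        x = (i + 0.5) * width
        cum.append(cum[-1] + x ** (a - 1) * (1 - x) ** (b - 1))
    total = cum[-1]  # normalizing constant, so the last CDF value is exactly 1
    return [cum[round(k * resolution / steps)] / total for k in range(steps + 1)]


def unmask_counts(a, b, steps, seq_len):
    """Hypothetical helper: how many tokens to unmask at each sampling step."""
    cdf = beta_cdf_grid(a, b, steps)
    # Cumulative number of unmasked tokens after each step (assumed reading).
    targets = [round(c * seq_len) for c in cdf]
    return [targets[k + 1] - targets[k] for k in range(steps)]


if __name__ == "__main__":
    # a = b = 1 is the uniform distribution, i.e. a linear schedule.
    print(unmask_counts(1.0, 1.0, steps=4, seq_len=8))  # [2, 2, 2, 2]
    # An illustrative front-loaded choice; the values are not from the paper.
    print(unmask_counts(2.0, 5.0, steps=8, seq_len=128))
```

Under these assumptions, post-training tuning would amount to scanning a grid of (a, b) pairs and keeping the pair that scores best on a chosen sample-quality metric, with the model weights left untouched.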

2509.04597 2026-03-24 cs.CV

DisPatch: Disarming Adversarial Patches in Object Detection with Diffusion Models

Jin Ma, Mohammed Aldeen, Christopher Salas, Feng Luo, Mashrur Chowdhury, Mert Pesé, Long Cheng

2508.20861 2026-03-24 cs.LG

Practical Physical Layer Authentication for Mobile Scenarios Using a Synthetic Dataset Enhanced Deep Learning Approach

Yijia Guo, Junqing Zhang, Y.-W. Peter Hong

Details

DOI: 10.1109/TIFS.2025.3602265

English abstract

The Internet of Things (IoT) is ubiquitous thanks to the rapid development of wireless technologies. However, the broadcast nature of wireless transmissions makes device authentication highly vulnerable. Physical layer authentication emerges as a promising approach by exploiting the unique channel characteristics. However, a practical scheme applicable to dynamic channel variations is still missing. In this paper, we proposed a deep learning-based physical layer channel state information (CSI) authentication for mobile scenarios and carried out comprehensive simulation and experimental evaluation using IEEE 802.11n. Specifically, a synthetic training dataset was generated based on the WLAN TGn channel model and the autocorrelation and the distance correlation of the channel, which can significantly reduce the overhead of manually collecting experimental datasets. A convolutional neural network (CNN)-based Siamese network was exploited to learn the temporal and spatial correlation between the CSI pair and output a score to measure their similarity. We adopted a synergistic methodology involving both simulation and experimental evaluation. The experimental testbed consisted of WiFi IoT development kits and a few typical scenarios were specifically considered. Both simulation and experimental evaluation demonstrated excellent generalization performance of our proposed deep learning-based approach and excellent authentication performance. In our practical measurements, the proposed scheme improved the area under the curve (AUC) by 0.03 compared to the fully connected network-based (FCN-based) Siamese model and by 0.06 compared to the correlation-based benchmark algorithm.
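The AUC figures above compare models that output a similarity score per CSI pair. As a rough illustration of what that metric measures, the sketch below computes AUC as the probability that a genuine pair scores higher than an impostor pair, counting ties as one half. The function name `pairwise_auc` and the scores are made up for illustration; they are not data or code from the paper.

```python
def pairwise_auc(genuine_scores, impostor_scores):
    """AUC as the fraction of (genuine, impostor) pairs ranked correctly."""
    wins = 0.0
    for g in genuine_scores:
        for s in impostor_scores:
            if g > s:
                wins += 1.0   # genuine pair ranked above the impostor pair
            elif g == s:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(genuine_scores) * len(impostor_scores))


if __name__ == "__main__":
    # Made-up similarity scores: higher means "more likely the same device".
    genuine = [0.9, 0.8]
    impostor = [0.1, 0.85]
    print(pairwise_auc(genuine, impostor))  # 0.75
```

This quadratic-time form is only meant to show the definition; for large score sets a rank-based computation would be the usual choice.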

2508.20066 2026-03-24 cs.CV

PAUL: Uncertainty-Guided Partition and Augmentation for Robust Cross-View Geo-Localization under Noisy Correspondence

Zheng Li, Xueyi Zhang, Yanming Guo, Yuxiang Xie, Ding Zhaoyun, Siqi Cai, Haizhou Li, Mingrui Lao

Comments: 10 pages

Details

English abstract

Cross-view geo-localization is a critical task for UAV navigation, event detection, and aerial surveying, as it enables matching between drone-captured and satellite imagery. Most existing approaches embed multi-modal data into a joint feature space to maximize the similarity of paired images. However, these methods typically assume perfect alignment of image pairs during training, which rarely holds true in real-world scenarios. In practice, factors such as urban canyon effects, electromagnetic interference, and adverse weather frequently induce GPS drift, resulting in systematic alignment shifts where only partial correspondences exist between pairs. Despite its prevalence, this source of noisy correspondence has received limited attention in current research. In this paper, we formally introduce and address the Noisy Correspondence on Cross-View Geo-Localization (NC-CVGL) problem, aiming to bridge the gap between idealized benchmarks and practical applications. To this end, we propose PAUL (Partition and Augmentation by Uncertainty Learning), a novel framework that partitions and augments training data based on estimated data uncertainty through uncertainty-aware co-augmentation and evidential co-training. Specifically, PAUL selectively augments regions with high correspondence confidence and utilizes uncertainty estimation to refine feature learning, effectively suppressing noise from misaligned pairs. Distinct from traditional filtering or label correction, PAUL leverages both data uncertainty and loss discrepancy for targeted partitioning and augmentation, thus providing robust supervision for noisy samples. Comprehensive experiments validate the effectiveness of individual components in PAUL, which consistently achieves superior performance over other competitive noisy-correspondence-driven methods across various noise ratios.

2508.12632 2026-03-24 cs.CL

Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection

Chi Wang, Min Gao, Zongwei Wang, Junwei Yin, Kai Shu, Chenghua Lin

Comments: Published in WWW 2026

2508.12335 2026-03-24 cs.RO cs.SY eess.SY

Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Rashmi Dabir, Moritz Diehl

Comments: 20 pages, 17 figures

2508.05141 2026-03-24 cs.LG cs.NA math.NA

Deep Neural Networks with General Activations: Super-Convergence in Sobolev Norms

Yahong Yang, Juncai He

Comments: 56 pages, 7 figures

2508.05108 2026-03-24 cs.LG

Learning from Similarity-Confidence and Confidence-Difference

Tomoya Tate, Kosuke Sugiyama, Masato Uchida

Comments: 41 pages, 13 figures. arXiv admin note: text overlap with arXiv:2310.05632 by other authors

2508.04865 2026-03-24 cs.LG cs.PL

Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment

Aleksander Boruch-Gruszecki, Yangtian Zi, Zixuan Wu, Tejas Oberoi, Carolyn Jane Anderson, Joydeep Biswas, Arjun Guha

Comments: 30 pages, 19 figures. Accepted at ICLR 2026. For data, code, artifacts, see https://agnostics.abgru.me

2508.02833 2026-03-24 cs.LG

TIC-GRPO: Provable and Efficient Optimization for Reinforcement Learning from Human Feedback

Lei Pang, Jun Luo, Ruinan Jin

Comments: 44 pages

2508.02258 2026-03-24 cs.CV

Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning

Wenchuan Zhang, Jingru Guo, Hengzhe Zhang, Penghao Zhang, Jie Chen, Shuwan Zhang, Zhang Zhang, Yuhao Yi, Hong Bu

2508.01503 2026-03-24 cs.CL

A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents

Clayton Cohn, Surya Rayala, Namrata Srivastava, Joyce Horn Fonteles, Shruti Jain, Xinying Luo, Divya Mereddy, Naveeduddin Mohammed, Gautam Biswas

Comments: Published in the proceedings of AAAI 2026 (main technical track)

2507.08704 2026-03-24 cs.CL cs.AI

Knowledge Fusion via Bidirectional Information Aggregation

Songlin Zhai, Guilin Qi, Yue Wang, Yuan Meng

Details

English abstract

Knowledge graphs (KGs) are the cornerstone of the semantic web, offering up-to-date representations of real-world entities and relations. Yet large language models (LLMs) remain largely static after pre-training, causing their internal knowledge to become outdated and limiting their utility in time-sensitive web applications. To bridge this gap between dynamic knowledge and static models, a prevalent approach is to enhance LLMs with KGs. However, prevailing methods typically rely on parameter-invasive fine-tuning, which risks catastrophic forgetting and often degrades LLMs' general capabilities. Moreover, their static integration frameworks cannot keep pace with the continuous evolution of real-world KGs, hindering their deployment in dynamic web environments. To bridge this gap, we introduce KGA (Knowledge Graph-guided Attention), a novel framework that dynamically integrates external KGs into LLMs exclusively at inference time without any parameter modification. Inspired by research on neuroscience, we rewire the self-attention module by innovatively introducing two synergistic pathways: a bottom-up knowledge fusion pathway and a top-down attention guidance pathway. The bottom-up pathway dynamically integrates external knowledge into input representations via input-driven KG fusion, which is akin to the stimulus-driven attention process in the human brain. Complementarily, the top-down pathway aims to assess the contextual relevance of each triple through a goal-directed verification process, thereby suppressing task-irrelevant signals and amplifying knowledge-relevant patterns. By synergistically combining these two pathways, our method supports real-time knowledge fusion. Extensive experiments on four benchmarks verify KGA's strong fusion performance and efficiency.

2507.00462 2026-03-24 cs.CV

Unleashing the Potential of All Test Samples: Mean-Shift Guided Test-Time Adaptation

Jizhou Han, Chenhao Ding, SongLin Dong, Yuhang He, Xinyuan Gao, Yihong Gong

Comments: Accepted by IEEE TCSVT. This is the author's version which has not been fully edited and content may change prior to final publication

2506.21220 2026-03-24 cs.LG cs.CL

Complexity-aware fine-tuning

Andrey Goncharov, Daniil Vyazhev, Petr Sychev, Edvard Khalafyan, Alexey Zaytsev

2506.21076 2026-03-24 cs.CV

PoseMaster: A Unified 3D Native Framework for Stylized Pose Generation

Hongyu Yan, Kunming Luo, Weiyu Li, Kaiyi Zhang, Yixun Liang, Jingwei Huang, Chunchao Guo, Ping Tan

Comments: Accepted by CVPR 2026

2506.20294 2026-03-24 cs.CV

Ctrl-Z Sampling: Scaling Diffusion Sampling with Controlled Random Zigzag Explorations

Shunqi Mao, Wei Guo, Chaoyi Zhang, Jieting Long, Ke Xie, Weidong Cai

Comments: 43 pages, 12 figures, 10 tables

2506.14186 2026-03-24 cs.RO cs.LG cs.SY eess.SY

Differentiable Simulation of Hard Contacts with Soft Gradients for Learning and Control

Anselm Paulus, A. René Geist, Pierre Schumacher, Vít Musil, Simon Rappenecker, Georg Martius

2506.13113 2026-03-24 cs.AI econ.GN q-fin.EC

Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning

Stella C. Dong, James R. Finlay

Comments: The authors have determined that the current version contains incomplete analysis and preliminary results that are not suitable for public dissemination. The paper is withdrawn pending major revision

2506.09935 2026-03-24 cs.CV

LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning

Jiangyong Huang, Xiaojian Ma, Xiongkun Linghu, Junchao He, Qing Li, Song-Chun Zhu, Yixin Chen, Baoxiong Jia, Siyuan Huang

Comments: Project page: https://leo-vl.github.io

2506.05736 2026-03-24 cs.LG cs.AI

Generalized Incremental Learning under Concept Drift across Evolving Data Streams

En Yu, Jie Lu, Guangquan Zhang

2506.04559 2026-03-24 cs.CV

Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning

Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

Comments: ICLR 2026

2506.02845 2026-03-24 cs.CV

Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments

Di Wen, Lei Qi, Kunyu Peng, Kailun Yang, Fei Teng, Ao Luo, Jia Fu, Yufan Chen, Ruiping Liu, Yitian Shi, M. Saquib Sarfraz, Rainer Stiefelhagen

Comments: 16 pages, 4 figures; code is available at https://github.com/LEI-QI-233/HAR-in-Space

2505.23667 2026-03-24 cs.AI

Formula-R1: Incentivizing LLM Reasoning over Complex Tables with Numerical Computation via Formula-Driven Reinforcement Learning

Lang Cao, Jingxian Xu, Hanbing Liu, Jinyu Wang, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang

2505.17782 2026-03-24 cs.CV

Thalia: A Global, Multi-Modal Dataset for Volcanic Activity Monitoring

Nikolas Papadopoulos, Nikolaos Ioannis Bountos, Maria Sdraka, Andreas Karavias, Gustau Camps-Valls, Ioannis Papoutsis

2505.15054 2026-03-24 cs.CL cs.AI cs.LG q-bio.BM

MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation

Feiyang Cai, Jiahui Bai, Tao Tang, Guijuan He, Joshua Luo, Tianyu Zhu, Srikanth Pilla, Gang Li, Ling Liu, Feng Luo

Comments: ICLR-2026 Camera-Ready version

2505.11404 2026-03-24 cs.CV cs.AI

Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner

Wenchuan Zhang, Penghao Zhang, Jingru Guo, Tao Cheng, Jie Chen, Shuwan Zhang, Zhang Zhang, Yuhao Yi, Hong Bu

Details

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 40(33): 28418-28426, 2026

English abstract

Recent advances in vision language models (VLMs) have enabled broad progress in the general medical field. However, pathology remains a more challenging subdomain, with current pathology-specific VLMs exhibiting limitations in both diagnostic accuracy and reasoning plausibility. Such shortcomings are largely attributable to the nature of current pathology datasets, which are primarily composed of image-description pairs that lack the depth and structured diagnostic paradigms employed by real-world pathologists. In this study, we leverage pathology textbooks and real-world pathology experts to construct high-quality, reasoning-oriented datasets. Building on this, we introduce Patho-R1, a multimodal RL-based pathology Reasoner, trained through a three-stage pipeline: (1) continued pretraining on 3.5 million image-text pairs for knowledge infusion; (2) supervised fine-tuning on 500k high-quality Chain-of-Thought samples for reasoning incentivizing; (3) reinforcement learning using Group Relative Policy Optimization and Decoupled Clip and Dynamic sAmpling Policy Optimization strategies for multimodal reasoning quality refinement. To further assess the alignment quality of our dataset, we propose Patho-CLIP, trained on the same figure-caption corpus used for continued pretraining. Comprehensive experimental results demonstrate that both Patho-CLIP and Patho-R1 achieve robust performance across a wide range of pathology-related tasks, including zero-shot classification, cross-modal retrieval, Visual Question Answering, and Multiple Choice Questions. Our project is available at the Patho-R1 repository: https://github.com/Wenchuan-Zhang/Patho-R1.

2505.07775 2026-03-24 cs.CL cs.AI cs.CY

Must Read: A Comprehensive Survey of Computational Persuasion

Nimet Beyza Bozdag, Shuhaib Mehri, Xiaocheng Yang, Hyeonjeong Ha, Zirui Cheng, Esin Durmus, Jiaxuan You, Heng Ji, Gokhan Tur, Dilek Hakkani-Tür

Comments: Accepted to ACM Computing Surveys

2504.14636 2026-03-24 cs.LG cs.AI

AlphaZero-Edu: Democratizing Access to AlphaZero

Ruitong Li, Aisheng Mo, Guowei Su, Ru Zhang, Binjie Guo, Haohan Jiang, Xurong Lin, Hongyan Wei, Jie Li, Zhiyuan Qian, Zhuhao Zhang, Xiaoyuan Cheng

2504.09396 2026-03-24 cs.LG cs.AI stat.ML

Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes

Stella C. Dong