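arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.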

2510.25514 2026-04-02 stat.ML cs.LG

Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

Maik Overmars, Jasper Goseling, Richard Boucherie

Details

DOI: 10.1145/3797823.3797851
Journal ref: SIGMETRICS Perform. Eval. Rev. 53 (2026) 91-96

Abstract

We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighting the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference between the on-policy and off-policy process. This improves upon known results in the literature that state that convergence holds for a sufficiently small discount factor by establishing an explicit bound. Convergence is with probability one and achieves projected Bellman error equal to zero. To obtain these results, we adapt the stochastic approximation framework that was used by Tsitsiklis and Van Roy [1997] for the on-policy case, to the off-policy case. We illustrate our results using different types of reversible Markov chains, such as one-dimensional random walks and random walks on a weighted graph.
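To make the setting in the abstract concrete, the following is a minimal Python/NumPy sketch, not the authors' code. It assumes one common formulation of off-policy TD(0): the target chain `P` supplies the transitions while states are weighted by the stationary distribution `mu` of a different behaviour chain. The expected update is then theta <- theta + alpha * (b - A theta) with A = Phi^T D (I - gamma P) Phi and b = Phi^T D r, where D = diag(mu). The sketch checks detailed balance, tests whether all eigenvalues of A have positive real parts (the usual stability condition for the expected update), and solves for the fixed point, at which the projected Bellman residual vanishes. All function names, the 5-state random walks, the features, the rewards, and the discount factor are illustrative assumptions; the explicit discount-factor bound from the paper is not reproduced here.

```python
import numpy as np


def random_walk(n, p_right):
    """1-D random walk on {0, ..., n-1}; blocked moves at the ends become self-loops."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, min(s + 1, n - 1)] += p_right
        P[s, max(s - 1, 0)] += 1.0 - p_right
    return P


def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to one."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def is_reversible(P, pi):
    """Detailed balance: pi_i * P_ij == pi_j * P_ji for all i, j."""
    flow = np.diag(pi) @ P
    return np.allclose(flow, flow.T)


def td0_expected_system(P, mu, Phi, r, gamma):
    """Return (A, b) of the expected TD(0) update theta <- theta + alpha * (b - A @ theta)."""
    D = np.diag(mu)
    A = Phi.T @ D @ (np.eye(len(mu)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return A, b


def is_stable(A):
    """True if every eigenvalue of A has a strictly positive real part."""
    return bool(np.all(np.linalg.eigvals(A).real > 0))


# Illustrative setup: every value below is an assumption, not taken from the paper.
n, gamma = 5, 0.5
P = random_walk(n, 0.5)                            # target (on-policy) chain
mu = stationary_distribution(random_walk(n, 0.7))  # behaviour-chain state weights
Phi = np.column_stack([np.ones(n), np.arange(n) / (n - 1)])  # two linear features
r = np.zeros(n)
r[-1] = 1.0                                        # reward only in the last state

A, b = td0_expected_system(P, mu, Phi, r, gamma)
theta = np.linalg.solve(A, b)                      # fixed point of the expected update
residual = Phi.T @ np.diag(mu) @ (r + gamma * P @ Phi @ theta - Phi @ theta)

print("P reversible:", is_reversible(P, stationary_distribution(P)))
print("expected update stable:", is_stable(A))
print("projected Bellman residual:", residual)
```

Setting `mu` to the stationary distribution of `P` itself recovers the on-policy case analysed by Tsitsiklis and Van Roy; raising `gamma` or making the two walks more different is a simple way to probe when the stability check starts to fail.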


2510.24906 2026-04-02 cs.GT cs.AI

Fair Indivisible Payoffs through Shapley Value

Mikołaj Czarnecki, Michał Korniak, Oskar Skibski, Piotr Skowron

2510.15669 2026-04-02 stat.ML cs.LG

Disentanglement of Sources in a Multi-Stream Variational Autoencoder

Veranika Boukun, Jörg Lücke

Comments 14 pages, 4 figures; expanded literature review, added Algorithm 1, and included new benchmarking results on fixed number of overlapping MNIST sources

2510.09997 2026-04-02 cs.GR cs.CV

CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting

Zhigang Cheng, Mingchao Sun, Yu Liu, Zengye Ge, Luyang Tang, Mu Xu, Yangyan Li, Peng Pan

Comments Accepted by ICLR 2026 poster

2510.09534 2026-04-02 stat.ML cs.LG

Conditional Flow Matching for Bayesian Posterior Inference

Percy S. Zhai, So Won Jeong, Veronika Ročková

2510.05097 2026-04-02 cs.GR cs.CV

Pulp Motion: Framing-aware multimodal camera and human motion generation

Robin Courant, Xi Wang, David Loiseaux, Marc Christie, Vicky Kalogeiton

Comments Project page: https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/

2510.00512 2026-04-02 q-bio.MN cs.AI cs.LG

Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction

Yuanfang Xiang, Lun Ai

Comments Accepted at ICLR 2026

2509.19162 2026-04-02 math.CO cs.LG hep-th math.GR

CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs (Brief version)

A. Chervov, D. Fedoriaka, E. Konstantinova, A. Naumov, I. Kiselev, A. Sheveleva, I. Koltsov, S. Lytkin, A. Smolensky, A. Soibelman, F. Levkovich-Maslyuk, R. Grimov, D. Volovich, A. Isakov, A. Kostin, M. Litvinov, N. Vilkin-Krom, A. Bidzhiev, A. Krasnyi, M. Evseev, E. Geraseva, L. Grunwald, S. Galkin, E. Koldunov, S. Diner, A. Chevychelov, E. Kudasheva, A. Sychev, A. Kravchenko, Z. Kogan, A. Natyrova, L. Shishina, L. Cheldieva, V. Zamkovoy, D. Kovalenko, O. Papulov, S. Kudashev, D. Shiltsov, R. Turtayev, O. Nikitina, D. Mamayeva, S. Nikolenko, M. Obozov, A. Titarenko, A. Dolgorukova, A. Aparnev, O. Debeaupuis, S. Alami C., H. Isambert

Comments 46 pages, 30 figures; v2: typos fixed

2508.21236 2026-04-02 cs.SI cs.LG stat.AP

Population-Scale Network Embeddings Expose Educational Divides in Network Structure Related to Right-Wing Populist Voting

Malte Lüken, Javier Garcia-Bernardo, Sreeparna Deb, Flavio Hafner, Megha Khosla

Comments 29 pages, 6 figures, Supplementary Materials available at https://github.com/odissei-explainable-network/netaudit; update text introduction, results, and discussion

2508.15860 2026-04-02 eess.IV cs.CV eess.AS

Robust Residual Finite Scalar Quantization for Neural Compression

Xiaoxu Zhu, Xiaojie Yu, Guangchao Yao, Yiming Ren, Baoxiang Li

Comments 5 pages, 2 figures

2508.11216 2026-04-02 math.NA cs.CV cs.NA

Coupled Reconstruction of 2D Blood Flow and Vessel Geometry from Noisy Images via Physics-Informed Neural Networks and Quasi-Conformal Mapping

Han Zhang, Xue-Cheng Tai, Jean-Michel Morel, Raymond H. Chan

2508.02473 2026-04-02 cs.SE cs.LG

NES: An Instruction-Free, Low-Latency Next Edit Suggestion Framework Powered by Learned Historical Editing Trajectories

Xinfang Chen, Siyang Xiao, Xianying Zhu, Junhong Xie, Ming Liang, Dajun Chen, Wei Jiang, Yong Li, Peng Di

Comments Accepted by FSE'26 Industry Track

2505.21580 2026-04-02 stat.ML cs.LG math.ST stat.TH

A Pure Hypothesis Test for Inhomogeneous Random Graph Models Based on a Kernelised Stein Discrepancy

Anum Fatima, Gesine Reinert

Comments 53 pages, 21 figures

2505.19225 2026-04-02 eess.IV cs.CV

Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding

Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

2505.13911 2026-04-02 eess.IV cs.AI cs.CV

Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

Ruijie Zhao, Zuopeng Tan, Xiao Xue, Longfei Zhao, Bing Li, Zicheng Liao, Ying Ming, Jiaru Wang, Ran Xiao, Sirong Piao, Rui Zhao, Qiqi Xu, Wei Song

2505.04959 2026-04-02 eess.IV cs.CV

MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation

Tengya Peng, Ruyi Zha, Qing Zou

2504.09279 2026-04-02 stat.ML cs.LG math.OC math.ST stat.TH

No-Regret Generative Modeling via Parabolic Monge-Ampère PDE

Nabarun Deb, Tengyuan Liang

Comments 30 pages, 7 figures. Journal version accepted for publication in the Annals of Statistics

2503.19091 2026-04-02 math.OC cs.CC cs.LG cs.NA math.NA stat.ML

High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise

Yuchen Fang, Javad Lavaei, Sen Na

Comments 66 pages, 7 figures

2503.08228 2026-04-02 cs.SE cs.AI cs.CL cs.PF

Investigating Execution-Aware Language Models for Code Optimization

Federico Di Menna, Luca Traini, Gabriele Bavota, Vittorio Cortellessa

2502.05181 2026-04-02 cs.CY cs.AI cs.LG

Enhancing Team Diversity with Generative AI: A Novel Project Management Framework

Johnny Chan, Yuming Li

Comments A published version can be found from here - https://www.computer.org/csdl/proceedings-article/compsac/2024/769600b648/1ZIUInSDC0w

2411.10174 2026-04-02 cs.CR cs.AI

A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks

Benoit Coqueret, Mathieu Carbone, Olivier Sentieys, Gabriel Zaid

2410.22729 2026-04-02 stat.ML cs.LG math.ST stat.TH

Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots

Vincent Guan, Joseph Janssen, Hossein Rahmani, Andrew Warren, Stephen Zhang, Elina Robeva, Geoffrey Schiebinger

2410.09236 2026-04-02 eess.AS cs.SD

Enhancing Infant Crying Detection with Gradient Boosting for Improved Emotional and Mental Health Diagnostics

Kyunghun Lee, Lauren M. Henry, Eleanor Hansen, Elizabeth Tandilashvili, Lauren S. Wakschlag, Elizabeth Norton, Daniel S. Pine, Melissa A. Brotman, Francisco Pereira

2407.13477 2026-04-02 eess.SY cs.RO cs.SY

The Construction of a Soft Gripper Based on Magnetorheological Elastomer with Permanent Magnet

Jakub Bernat, Pawel Czopek, Paulina Superczynska, Piotr Gajewski, Agnieszka Marcinkowska

2405.15132 2026-04-02 stat.ML cs.LG math.ST stat.CO stat.ME stat.TH

Scale-adaptive and robust intrinsic dimension estimation via optimal neighbourhood identification

Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

2307.14012 2026-04-02 stat.ML cs.LG

MCMC-Correction of Score-Based Diffusion Models for Model Composition

Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson

Comments 27 pages. Published in Entropy 28(3):351 (2026). This version matches the published content

1811.06026 2026-04-02 cs.GT cs.DS cs.LG

Incentivizing Exploration with Selective Data Disclosure

Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu

Comments The ACM-EC 2020 conference publication corresponds to the Feb'20 version. Section 7 ("robustness") and Section 8 (the numerical study) were added in, resp., Dec'20 and Nov'24. New discussions (Section 3.2.1 and Appendix B) were added in April'26, as well as a partial reframing of the motivating story to emphasize transparency and deemphasize commitment

2604.00316 2026-04-02 stat.ML cs.LG

Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Marcel Tomàs Bernal, Neil Rohit Mallinar, Mikhail Belkin

2604.00314 2026-04-02 eess.IV cs.AI

Prompt-Guided Prefiltering for VLM Image Compression

Bardia Azizian, Ivan V. Bajic

Comments 7 pages, 5 figures. Accepted to IEEE ICME 2026. Code: https://github.com/bardia-az/pgp-vlm-compression

2604.00283 2026-04-02 eess.SY cs.LG cs.SY

Data-Driven Reachability Analysis via Diffusion Models with PAC Guarantees

Yanliang Huang, Peng Xie, Wenyuan Wu, Zhuoqi Zeng, Amr Alanwar

Comments 8 pages, 5 figures, submitted to the 65th IEEE Conference on Decision and Control (CDC 2026)