arXivDaily arXiv每日学术速递 周一至周五更新
重置
全部学科分类 1717
2510.25514 2026-04-02 stat.ML cs.LG

Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

Maik Overmars, Jasper Goseling, Richard Boucherie

详情
Journal ref
SIGMETRICS Perform. Eval. Rev. 53 (2026) 91-96
英文摘要

We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function approximation can lead to divergence of the algorithm. Existing results for this setting modify the algorithm, for instance by reweighing the updates using importance sampling. This establishes convergence at the expense of additional complexity. In contrast, our approach is to analyse the standard algorithm, but to restrict our attention to the class of reversible Markov chains. We demonstrate convergence under this mild reversibility condition on the structure of the chain, which in many applications can be assumed using domain knowledge. In particular, we establish a convergence guarantee under an upper bound on the discount factor in terms of the difference between the on-policy and off-policy process. This improves upon known results in the literature that state that convergence holds for a sufficiently small discount factor by establishing an explicit bound. Convergence is with probability one and achieves projected Bellman error equal to zero. To obtain these results, we adapt the stochastic approximation framework that was used by Tsitsiklis and Van Roy [1997 for the on-policy case, to the off-policy case. We illustrate our results using different types of reversible Markov chains, such as one-dimensional random walks and random walks on a weighted graph.

2510.24906 2026-04-02 cs.GT cs.AI

Fair Indivisible Payoffs through Shapley Value

Mikołaj Czarnecki, Michał Korniak, Oskar Skibski, Piotr Skowron

详情
英文摘要

We consider the problem of payoff division in indivisible coalitional games, where the value of the grand coalition is a natural number. This number represents a certain quantity of indivisible objects, such as parliamentary seats, kidney exchanges, or top features contributing to the outcome of a machine learning model. The goal of this paper is to propose a fair method for dividing these objects among players. To achieve this, we define the indivisible Shapley value and study its properties. We demonstrate our proposed technique using three case studies, in particular, we use it to identify key regions of an image in the context of an image classification task.

2510.15669 2026-04-02 stat.ML cs.LG

Disentanglement of Sources in a Multi-Stream Variational Autoencoder

Veranika Boukun, Jörg Lücke

Comments 14 pages, 4 figures; expanded literature review, added Algorithm 1, and included new benchmarking results on fixed number of overlapping MNIST sources

详情
英文摘要

Variational autoencoders (VAEs) are among leading approaches to address the problem of learning disentangled representations. Typically a single VAE is used and disentangled representations are sought within its single continuous latent space. In this paper, we propose and provide a proof of concept for a novel Multi-Stream Variational Autoencoder (MS-VAE) that achieves disentanglement of sources by combining discrete and continuous latents. The discrete latents are used in an explicit source combination model, that superimposes a set of sources as part of the MS-VAE decoder. We formally define the MS-VAE approach, derive its inference and learning equations, and numerically investigate its principled functionality. The MS-VAE model is very flexible and can be trained using little supervision (we use fully unsupervised learning after pretraining with some labels). In our numerical experiments, we explored the ability of the MS-VAE approach in separating both superimposed hand-written digits as well as sound sources. For the former task we used superimposed MNIST digits (an increasingly common benchmark). For sound separation, our experiments focused on the task of speaker diarization in a recording conversation between two speakers. In all cases, we observe a clear separation of sources and competitive performance after training. For digit superpositions, performance is particularly competitive in complex mixtures (e.g., three and four digits). For the speaker diarization task, we observe an especially low rate of missed speakers and a more precise speaker attribution. Numerical experiments confirm the flexibility of the approach across varying amounts of supervision, and we observed high performance, e.g., when using just 10% of the labels for pretraining.

2510.09997 2026-04-02 cs.GR cs.CV

CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting

Zhigang Cheng, Mingchao Sun, Yu Liu, Zengye Ge, Luyang Tang, Mu Xu, Yangyan Li, Peng Pan

Comments Accepted by ICLR 2026 poster

详情
英文摘要

Level of Detail (LoD) is a fundamental technique in real-time computer graphics for managing the rendering costs of complex scenes while preserving visual fidelity. Traditionally, LoD is implemented using discrete levels (DLoD), where multiple, distinct versions of a model are swapped out at different distances. This long-standing paradigm, however, suffers from two major drawbacks: it requires significant storage for multiple model copies and causes jarring visual ``popping" artifacts during transitions, degrading the user experience. We argue that the explicit, primitive-based nature of the emerging 3D Gaussian Splatting (3DGS) technique enables a more ideal paradigm: Continuous LoD (CLoD). A CLoD approach facilitates smooth, seamless quality scaling within a single, unified model, thereby circumventing the core problems of DLOD. To this end, we introduce CLoD-GS, a framework that integrates a continuous LoD mechanism directly into a 3DGS representation. Our method introduces a learnable, distance-dependent decay parameter for each Gaussian primitive, which dynamically adjusts its opacity based on viewpoint proximity. This allows for the progressive and smooth filtering of less significant primitives, effectively creating a continuous spectrum of detail within one model. To train this model to be robust across all distances, we introduce a virtual distance scaling mechanism and a novel coarse-to-fine training strategy with rendered point count regularization. Our approach not only eliminates the storage overhead and visual artifacts of discrete methods but also reduces the primitive count and memory footprint of the final model. Extensive experiments demonstrate that CLoD-GS achieves smooth, quality-scalable rendering from a single model, delivering high-fidelity results across a wide range of performance targets.

2510.09534 2026-04-02 stat.ML cs.LG

Conditional Flow Matching for Bayesian Posterior Inference

Percy S. Zhai, So Won Jeong, Veronika Ročková

详情
英文摘要

We propose a generative multivariate posterior sampler via flow matching. It offers a simple training objective, and does not require access to likelihood evaluation. The method learns a dynamic, block-triangular velocity field in the joint space of data and parameters, which results in a deterministic transport map from a source distribution to the desired posterior. The inverse map, named vector rank, is accessible by reversibly integrating the velocity over time. It is advantageous to leverage the dynamic design: proper constraints on the velocity yield a monotone map, which leads to a conditional Brenier map, enabling a fast and simultaneous generation of Bayesian credible sets whose contours correspond to level sets of Monge-Kantorovich data depth. Our approach is computationally lighter compared to GAN-based and diffusion-based counterparts, and is capable of capturing complex posterior structures. Finally, frequentist theoretical guarantee on the consistency of the recovered posterior distribution, and of the corresponding Bayesian credible sets, is provided.

2510.05097 2026-04-02 cs.GR cs.CV

Pulp Motion: Framing-aware multimodal camera and human motion generation

Robin Courant, Xi Wang, David Loiseaux, Marc Christie, Vicky Kalogeiton

Comments Project page: https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/

详情
英文摘要

Treating human motion and camera trajectory generation separately overlooks a core principle of cinematography: the tight interplay between actor performance and camera work in the screen space. In this paper, we are the first to cast this task as a text-conditioned joint generation, aiming to maintain consistent on-screen framing while producing two heterogeneous, yet intrinsically linked, modalities: human motion and camera trajectories. We propose a simple, model-agnostic framework that enforces multimodal coherence via an auxiliary modality: the on-screen framing induced by projecting human joints onto the camera. This on-screen framing provides a natural and effective bridge between modalities, promoting consistency and leading to more precise joint distribution. We first design a joint autoencoder that learns a shared latent space, together with a lightweight linear transform from the human and camera latents to a framing latent. We then introduce auxiliary sampling, which exploits this linear transform to steer generation toward a coherent framing modality. To support this task, we also introduce the PulpMotion dataset, a human-motion and camera-trajectory dataset with rich captions, and high-quality human motions. Extensive experiments across DiT- and MAR-based architectures show the generality and effectiveness of our method in generating on-frame coherent human-camera motions, while also achieving gains on textual alignment for both modalities. Our qualitative results yield more cinematographically meaningful framings setting the new state of the art for this task. Code, models and data are available in our \href{https://www.lix.polytechnique.fr/vista/projects/2025_pulpmotion_courant/}{project page}.

2510.00512 2026-04-02 q-bio.MN cs.AI cs.LG

Adaptive Data-Knowledge Alignment in Genetic Perturbation Prediction

Yuanfang Xiang, Lun Ai

Comments Accepted at ICLR 2026

详情
英文摘要

The transcriptional response to genetic perturbation reveals fundamental insights into complex cellular systems. While current approaches have made progress in predicting genetic perturbation responses, they provide limited biological understanding and cannot systematically refine existing knowledge. Overcoming these limitations requires an end-to-end integration of data-driven learning and existing knowledge. However, this integration is challenging due to inconsistencies between data and knowledge bases, such as noise, misannotation, and incompleteness. To address this challenge, we propose ALIGNED (Adaptive aLignment for Inconsistent Genetic kNowledgE and Data), a neuro-symbolic framework based on the Abductive Learning (ABL) paradigm. This end-to-end framework aligns neural and symbolic components and performs systematic knowledge refinement. We introduce a balanced consistency metric to evaluate the predictions' consistency against both data and knowledge. Our results show that ALIGNED outperforms state-of-the-art methods by achieving the highest balanced consistency, while also re-discovering biologically meaningful knowledge. Our work advances beyond existing methods to enable both the transparency and the evolution of mechanistic biological understanding.

2509.19162 2026-04-02 math.CO cs.LG hep-th math.GR

CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs (Brief version)

A. Chervov, D. Fedoriaka, E. Konstantinova, A. Naumov, I. Kiselev, A. Sheveleva, I. Koltsov, S. Lytkin, A. Smolensky, A. Soibelman, F. Levkovich-Maslyuk, R. Grimov, D. Volovich, A. Isakov, A. Kostin, M. Litvinov, N. Vilkin-Krom, A. Bidzhiev, A. Krasnyi, M. Evseev, E. Geraseva, L. Grunwald, S. Galkin, E. Koldunov, S. Diner, A. Chevychelov, E. Kudasheva, A. Sychev, A. Kravchenko, Z. Kogan, A. Natyrova, L. Shishina, L. Cheldieva, V. Zamkovoy, D. Kovalenko, O. Papulov, S. Kudashev, D. Shiltsov, R. Turtayev, O. Nikitina, D. Mamayeva, S. Nikolenko, M. Obozov, A. Titarenko, A. Dolgorukova, A. Aparnev, O. Debeaupuis, S. Alami C., H. Isambert

Comments 46 pages, 30 figures; v2: typos fixed

详情
英文摘要

This is the third paper of the CayleyPy project applying artificial intelligence to problems in group theory. We announce the first public release of CayleyPy, an open source Python library for computations with Cayley and Schreier graphs. Compared with systems such as GAP and Sage, CayleyPy handles much larger graphs and performs several orders of magnitude faster. Using CayleyPy we obtained about 200 new conjectures on Cayley and Schreier graphs, focused on diameters and growth. For many Cayley graphs of symmetric groups Sn we observe quasi polynomial diameter formulas: a small set of quadratic or linear polynomials indexed by n mod s. We conjecture that this is a general phenomenon, giving efficient diameter computation despite the problem being NP hard. We propose a refinement of the Babai type conjecture on diameters of Sn: n^2/2 + 4n upper bounds in the undirected case, compared to previous O(n^2) bounds. We also provide explicit generator families, related to involutions in a square with whiskers pattern, conjectured to maximize the diameter; search confirms this for all n up to 15. We further conjecture an answer to a question posed by V M Glushkov in 1968 on directed Cayley graphs generated by a cyclic shift and a transposition. For nilpotent groups we conjecture an improvement of J S Ellenberg's results on upper unitriangular matrices over Z/pZ, showing linear dependence of diameter on p. Some conjectures are LLM friendly, naturally stated as sorting problems verifiable by algorithms or Python code. To benchmark path finding we created more than 10 Kaggle datasets. CayleyPy works with arbitrary permutation or matrix groups and includes over 100 predefined generators. Our growth computation code outperforms GAP and Sage up to 1000 times in speed and size.

2508.21236 2026-04-02 cs.SI cs.LG stat.AP

Population-Scale Network Embeddings Expose Educational Divides in Network Structure Related to Right-Wing Populist Voting

Malte Lüken, Javier Garcia-Bernardo, Sreeparna Deb, Flavio Hafner, Megha Khosla

Comments 29 pages, 6 figures, Supplementary Materials available at https://github.com/odissei-explainable-network/netaudit; update text introduction, results, and discussion

详情
英文摘要

Administrative registry data can be used to construct population-scale networks whose ties reflect shared social contexts between persons. With machine learning, such networks can be encoded into numerical representations -- embeddings -- that automatically capture an individual's position within the network. We created embeddings for all persons in the Dutch population from a population-scale network that represents five shared contexts: neighborhood, work, family, household, and school. To assess the informativeness of these embeddings, we used them to predict right-wing populist voting. Embeddings alone predicted right-wing populist voting above chance-level but performed worse than individual characteristics. Combining the best subset of embeddings with individual characteristics only slightly improved predictions. After transforming the embeddings to make their dimensions more sparse and orthogonal, we found that one embedding dimension was strongly associated with the outcome. Mapping this dimension back to the population network revealed that differences in educational ties and attainment corresponded to distinct network structures associated with right-wing populist voting. Our study contributes methodologically by demonstrating how population-scale network embeddings can be made interpretable, and substantively by linking structural network differences in education to right-wing populist voting.

2508.15860 2026-04-02 eess.IV cs.CV eess.AS

Robust Residual Finite Scalar Quantization for Neural Compression

Xiaoxu Zhu, Xiaojie Yu, Guangchao Yao, Yiming Ren, Baoxiang Li

Comments 5 pages, 2 figures

详情
英文摘要

Finite Scalar Quantization (FSQ) offers simplified training but suffers from residual magnitude decay in multi-stage settings, where subsequent stages receive exponentially weaker signals. We propose Robust Residual Finite Scalar Quantization (RFSQ), addressing this fundamental limitation through two novel conditioning strategies: learnable scaling factors and invertible layer normalization. Our experiments across audio and image modalities demonstrate RFSQ's effectiveness and generalizability. In audio reconstruction at 24 bits/frame, RFSQ-LayerNorm achieves 3.646 DNSMOS, a 3.6% improvement over state-of-the-art RVQ (3.518). On ImageNet, RFSQ achieves 0.102 L1 loss and 0.100 perceptual loss, with LayerNorm providing 9.7% L1 improvement and 17.4% perceptual improvement over unconditioned variants. The LayerNorm strategy consistently outperforms alternatives by maintaining normalized input statistics across stages, effectively preventing exponential magnitude decay that limits naive residual approaches. RFSQ combines FSQ's simplicity with multi-stage quantization's representational power, establishing a new standard for neural compression across diverse modalities.

2508.11216 2026-04-02 math.NA cs.CV cs.NA

Coupled Reconstruction of 2D Blood Flow and Vessel Geometry from Noisy Images via Physics-Informed Neural Networks and Quasi-Conformal Mapping

Han Zhang, Xue-Cheng Tai, Jean-Michel Morel, Raymond H. Chan

详情
英文摘要

Blood flow imaging provides important information for hemodynamic behavior within the vascular system and plays an essential role in medical diagnosis and treatment planning. However, obtaining high-quality flow images remains a significant challenge. In this work, we address the problem of denoising flow images that may suffer from artifacts due to short acquisition times or device-induced errors. We formulate this task as an optimization problem, where the objective is to minimize the discrepancy between the modeled velocity field, constrained to satisfy the Navier-Stokes equations, and the observed noisy velocity data. To solve this problem, we decompose it into two subproblems: a fluid subproblem and a geometry subproblem. The fluid subproblem leverages a Physics-Informed Neural Network to reconstruct the velocity field from noisy observations, assuming a fixed domain. The geometry subproblem aims to infer the underlying flow region by optimizing a quasi-conformal mapping that deforms a reference domain. These two subproblems are solved in an alternating Gauss-Seidel fashion, iteratively refining both the velocity field and the domain. Upon convergence, the framework yields a high-quality reconstruction of the flow image. We validate the proposed method through experiments on synthetic flow data in a converging channel geometry under varying levels of Gaussian noise, and on real-like flow data in an aortic geometry with signal-dependent noise. The results demonstrate the effectiveness and robustness of the approach. Additionally, ablation studies are conducted to assess the influence of key hyperparameters.

2508.02473 2026-04-02 cs.SE cs.LG

NES: An Instruction-Free, Low-Latency Next Edit Suggestion Framework Powered by Learned Historical Editing Trajectories

Xinfang Chen, Siyang Xiao, Xianying Zhu, Junhong Xie, Ming Liang, Dajun Chen, Wei Jiang, Yong Li, Peng Di

Comments Accepted by FSE'26 Industry Track

详情
英文摘要

Code editing is a frequent yet cognitively demanding task in software development. Existing AI-powered tools often disrupt developer flow by requiring explicit natural language instructions and suffer from high latency, limiting real-world usability. We present NES (Next Edit Suggestion), an instruction-free, low-latency code editing framework that leverages learned historical editing trajectories to implicitly capture developers' goals and coding habits. NES features a dual-model architecture: one model predicts the next edit location and the other generates the precise code change, both without any user instruction. Trained on our open-sourced SFT and DAPO datasets, NES achieves state-of-the-art performance (75.6% location accuracy, 27.7% exact match rate) while delivering suggestions in under 250ms. Deployed at Ant Group, NES serves over 20,000 developers through a seamless Tab-key interaction, achieving effective acceptance rates of 51.55% for location predictions and 43.44% for edits, demonstrating its practical impact in real-world development workflows.

2505.21580 2026-04-02 stat.ML cs.LG math.ST stat.TH

A Pure Hypothesis Test for Inhomogeneous Random Graph Models Based on a Kernelised Stein Discrepancy

Anum Fatima, Gesine Reinert

Comments 53 pages, 21 figures

详情
英文摘要

Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in high dimensions, kernelised Stein discrepancy (KSD) tests are a powerful tool. Here, we develop a KSD-type test for IRG models that can be carried out with a single observation of the network. The test applies to a network of any size, but is particularly interesting for small networks for which asymptotic tests are not warranted. We also provide theoretical guarantees.

2505.19225 2026-04-02 eess.IV cs.CV

Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding

Chenglong Ma, Yuanfeng Ji, Jin Ye, Zilong Li, Chenhui Wang, Junzhi Ning, Wei Li, Lihao Liu, Qiushan Guo, Tianbin Li, Junjun He, Hongming Shan

详情
英文摘要

Autoregressive modeling has driven major advances in multimodal AI, yet its application to medical imaging remains constrained by the absence of a unified image tokenizer that simultaneously preserves fine-grained anatomical structures and rich clinical semantics across heterogeneous modalities. Existing approaches jointly optimize image reconstruction and textual semantic objectives, relying on large-scale image-caption pairs and are prone to gradient interference. This is ill-suited for the medical domain where paired data are scarce and abundant unpaired images remain unexploited. This work identifies these issues in building unified medical image tokenizers, and introduces a principled two-stage training framework using visual representation as a bridge to address them. The propose visual representation alignment stage enables the utilization of large-scale unpaired medical images to ensure reconstruction fidelity and establish foundational semantics, alleviating the interference and better preparing for the second stage where fine-grained textual semantics are injected using image-text pairs. The resulting tokenizer, MedITok, is trained on over 33 million medical images spanning 9 modalities and 2 million image-text pairs. MedITok achieves state-of-the-art performance on 30+ benchmarks spanning 9 imaging modalities and 4 task families. It further enables autoregressive modeling for diagnostic and generative applications, serving as a scalable component for future multimodal models with unified synthesis and understanding capabilities in the medical domain. Project page: https://github.com/Masaaki-75/meditok

2505.13911 2026-04-02 eess.IV cs.AI cs.CV

Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

Ruijie Zhao, Zuopeng Tan, Xiao Xue, Longfei Zhao, Bing Li, Zicheng Liao, Ying Ming, Jiaru Wang, Ran Xiao, Sirong Piao, Rui Zhao, Qiqi Xu, Wei Song

详情
Journal ref
J Digit Imaging. Inform. med. (2026)
英文摘要

Pulmonary segment segmentation is crucial for cancer localization and surgical planning. However, the pixel-wise annotation of pulmonary segments is laborious, as the boundaries between segments are indistinguishable in medical images. To this end, we propose a weakly supervised learning (WSL) method, termed Anatomy-Hierarchy Supervised Learning (AHSL), which consults the precise clinical anatomical definition of pulmonary segments to perform pulmonary segment segmentation. Since pulmonary segments reside within the lobes and are determined by the bronchovascular tree, i.e., artery, airway and vein, the design of the loss function is founded on two principles. First, segment-level labels are utilized to directly supervise the output of the pulmonary segments, ensuring that they accurately encompass the appropriate bronchovascular tree. Second, lobe-level supervision indirectly oversees the pulmonary segment, ensuring their inclusion within the corresponding lobe. Besides, we introduce a two-stage segmentation strategy that incorporates bronchovascular priori information. Furthermore, a consistency loss is proposed to enhance the smoothness of segment boundaries, along with an evaluation metric designed to measure the smoothness of pulmonary segment boundaries. Visual inspection and evaluation metrics from experiments conducted on a private dataset demonstrate the effectiveness of our method.

2505.04959 2026-04-02 eess.IV cs.CV

MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation

Tengya Peng, Ruyi Zha, Qing Zou

详情
英文摘要

This study presents an unsupervised, motion-resolved reconstruction framework for high-resolution, free-breathing pulmonary magnetic resonance imaging (MRI), utilizing a three-dimensional Gaussian representation (3DGS). The proposed method leverages 3DGS to address the challenges of motion-resolved 3D isotropic pulmonary MRI reconstruction by enabling data smoothing between voxels for continuous spatial representation. Pulmonary MRI data acquisition is performed using a golden-angle radial sampling trajectory, with respiratory motion signals extracted from the center of k-space in each radial spoke. Based on the estimated motion signal, the k-space data is sorted into multiple respiratory phases. A 3DGS framework is then applied to reconstruct a reference image volume from the first motion state. Subsequently, a patient-specific convolutional neural network is trained to estimate the deformation vector fields (DVFs), which are used to generate the remaining motion states through spatial transformation of the reference volume. The proposed reconstruction pipeline is evaluated on six datasets from six subjects and bench-marked against three state-of-the-art reconstruction methods. The experimental findings demonstrate that the proposed reconstruction framework effectively reconstructs high-resolution, motion-resolved pulmonary MR images. Compared with existing approaches, it achieves superior image quality, reflected by higher signal-to-noise ratio and contrast-to-noise ratio. The proposed unsupervised 3DGS-based reconstruction method enables accurate motion-resolved pulmonary MRI with isotropic spatial resolution. Its superior performance in image quality metrics over state-of-the-art methods highlights its potential as a robust solution for clinical pulmonary MR imaging.

2504.09279 2026-04-02 stat.ML cs.LG math.OC math.ST stat.TH

No-Regret Generative Modeling via Parabolic Monge-Ampère PDE

Nabarun Deb, Tengyuan Liang

Comments 30 pages, 7 figures. Journal version accepted for publication in the Annals of Statistics

详情
英文摘要

We introduce a novel generative modeling framework based on a discretized parabolic Monge-Ampère PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative refinement in the space of Brenier maps using a mirror gradient descent step. We establish theoretical guarantees for generative modeling through the lens of no-regret analysis, demonstrating that the iterates converge to the optimal Brenier map under a variety of step-size schedules. As a technical contribution, we derive a new Evolution Variational Inequality tailored to the parabolic Monge-Ampère PDE, connecting geometry, transportation cost, and regret. Our framework accommodates non-log-concave target distributions, constructs an optimal sampling process via the Brenier map, and integrates favorable learning techniques from generative adversarial networks and score-based diffusion models. As direct applications, we illustrate how our theory paves new pathways for generative modeling and variational inference.

2503.19091 2026-04-02 math.OC cs.CC cs.LG cs.NA math.NA stat.ML

High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise

Yuchen Fang, Javad Lavaei, Sen Na

Comments 66 pages, 7 figures

详情
英文摘要

In this paper, we consider nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We propose a Trust-Region Stochastic Sequential Quadratic Programming (TR-SSQP) method and establish its high-probability iteration complexity bounds for identifying first- and second-order $ε$-stationary points. In our algorithm, we assume that exact objective values, gradients, and Hessians are not directly accessible but can be estimated via zeroth-, first-, and second-order probabilistic oracles. Compared to existing complexity studies of SSQP methods that rely on a zeroth-order oracle with sub-exponential tail noise (i.e., light-tailed) and focus mostly on first-order stationarity, our analysis accommodates biased (also referred to as irreducible in the literature) and heavy-tailed noise in the zeroth-order oracle, and significantly extends the analysis to second-order stationarity. We show that under heavy-tailed noise conditions, our SSQP method achieves the same high-probability first-order iteration complexity bounds as in the light-tailed noise setting, while further exhibiting promising second-order iteration complexity bounds. Specifically, the method identifies a first-order $ε$-stationary point in $\mathcal{O}(ε^{-2})$ iterations and a second-order $ε$-stationary point in $\mathcal{O}(ε^{-3})$ iterations with high probability, provided that $ε$ is lower bounded by a constant determined by the bias magnitude (i.e., the irreducible noise) in the estimation. We validate our theoretical findings and evaluate practical performance of our method on CUTEst benchmark test set.

2503.08228 2026-04-02 cs.SE cs.AI cs.CL cs.PF

Investigating Execution-Aware Language Models for Code Optimization

Federico Di Menna, Luca Traini, Gabriele Bavota, Vittorio Cortellessa

详情
Journal ref
2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC), Ottawa, ON, Canada, 2025, pp. 204-215
英文摘要

Code optimization is the process of enhancing code efficiency, while preserving its intended functionality. This process often requires a deep understanding of the code execution behavior at run-time to identify and address inefficiencies effectively. Recent studies have shown that language models can play a significant role in automating code optimization. However, these models may have insufficient knowledge of how code execute at run-time. To address this limitation, researchers have developed strategies that integrate code execution information into language models. These strategies have shown promise, enhancing the effectiveness of language models in various software engineering tasks. However, despite the close relationship between code execution behavior and efficiency, the specific impact of these strategies on code optimization remains largely unexplored. This study investigates how incorporating code execution information into language models affects their ability to optimize code. Specifically, we apply three different training strategies to incorporate four code execution aspects -- line executions, line coverage, branch coverage, and variable states -- into CodeT5+, a well-known language model for code. Our results indicate that execution-aware models provide limited benefits compared to the standard CodeT5+ model in optimizing code.

2502.05181 2026-04-02 cs.CY cs.AI cs.LG

Enhancing Team Diversity with Generative AI: A Novel Project Management Framework

Johnny Chan, Yuming Li

Comments A published version can be found from here - https://www.computer.org/csdl/proceedings-article/compsac/2024/769600b648/1ZIUInSDC0w

详情
英文摘要

This research-in-progress paper presents a new project management framework that utilises GenAI technology. The framework is designed to address the common challenge of uniform team compositions in academic and research project teams, particularly in universities and research institutions. It does so by integrating sociologically identified patterns of successful team member personalities and roles, using GenAI agents to fill gaps in team dynamics. This approach adds an additional layer of analysis to conventional project management processes by evaluating team members' personalities and roles and employing GenAI agents, fine-tuned on personality datasets, to fill specific team roles. Our initial experiments have shown improvements in the model's ability to understand and process personality traits, suggesting the potential effectiveness of GenAI teammates in real-world project settings. This paper aims to explore the practical application of AI in enhancing team diversity and project management

2411.10174 2026-04-02 cs.CR cs.AI

A Divide-and-Conquer Strategy for Hard-Label Extraction of Deep Neural Networks via Side-Channel Attacks

Benoit Coqueret, Mathieu Carbone, Olivier Sentieys, Gabriel Zaid

详情
英文摘要

During the past decade, Deep Neural Networks (DNNs) proved their value on a large variety of subjects. However despite their high value and public accessibility, the protection of the intellectual property of DNNs is still an issue and an emerging research field. Recent works have successfully extracted fully-connected DNNs using cryptanalytic methods in hard-label settings, proving that it was possible to copy a DNN with high fidelity, i.e., high similitude in the output predictions. However, the current cryptanalytic attacks cannot target complex, i.e., not fully connected, DNNs and are limited to special cases of neurons present in deep networks. In this work, we introduce a new end-to-end attack framework designed for model extraction of embedded DNNs with high fidelity. We describe a new black-box side-channel attack which splits the DNN in several linear parts for which we can perform cryptanalytic extraction and retrieve the weights in hard-label settings. With this method, we are able to adapt cryptanalytic extraction, for the first time, to non-fully connected DNNs, while maintaining a high fidelity. We validate our contributions by targeting several architectures implemented on a microcontroller unit, including a Multi-Layer Perceptron (MLP) of 1.7 million parameters and a shortened MobileNetv1. Our framework successfully extracts all of these DNNs with high fidelity (88.4% for the MobileNetv1 and 93.2% for the MLP). Furthermore, we use the stolen model to generate adversarial examples and achieve close to white-box performance on the victim's model (95.8% and 96.7% transfer rate).

2410.22729 2026-04-02 stat.ML cs.LG math.ST stat.TH

Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots

Vincent Guan, Joseph Janssen, Hossein Rahmani, Andrew Warren, Stephen Zhang, Elina Robeva, Geoffrey Schiebinger

详情
英文摘要

Stochastic differential equations (SDEs) are a fundamental tool for modelling dynamic processes, including gene regulatory networks (GRNs), contaminant transport, financial markets, and image generation. However, learning the underlying SDE from data is a challenging task, especially if individual trajectories are not observable. Motivated by burgeoning research in single-cell datasets, we present the first comprehensive approach for jointly identifying the drift and diffusion of an SDE from its temporal marginals. Assuming linear drift and additive diffusion, we show that non-identifiability can only arise if the initial distribution possesses generalized rotational symmetries. We further prove that even if this condition holds, the drift and diffusion can almost always be recovered from the marginals. Additionally, we show that the causal graph of any SDE with additive diffusion can be recovered from the identified SDE parameters. To complement this theory, we adapt entropy-regularized optimal transport to handle anisotropic diffusion, and introduce APPEX (Alternating Projection Parameter Estimation from $X_0$), an iterative algorithm designed to estimate the drift, diffusion, and causal graph of an additive noise SDE, solely from temporal marginals. We show that APPEX iteratively decreases Kullback-Leibler divergence to the true solution, and demonstrate its effectiveness on simulated data from linear additive noise SDEs.

2410.09236 2026-04-02 eess.AS cs.SD

Enhancing Infant Crying Detection with Gradient Boosting for Improved Emotional and Mental Health Diagnostics

Kyunghun Lee, Lauren M. Henry, Eleanor Hansen, Elizabeth Tandilashvili, Lauren S. Wakschlag, Elizabeth Norton, Daniel S. Pine, Melissa A. Brotman, Francisco Pereira

详情
英文摘要

Infant crying can serve as a crucial indicator of various physiological and emotional states. This paper introduces a comprehensive approach detecting infant cries within audio data. We integrate Wav2Vec with traditional audio features and employ Gradient Boosting Machines for cry classification. We validate our approach on a real world dataset, demonstrating significant performance improvements over existing methods.

2407.13477 2026-04-02 eess.SY cs.RO cs.SY

The Construction of a Soft Gripper Based on Magnetorheological Elastomer with Permanent Magnet

Jakub Bernat, Pawel Czopek, Paulina Superczynska, Piotr Gajewski, Agnieszka Marcinkowska

详情
英文摘要

Recently, magnetorheological elastomers have become an interesting smart material with many new designs for robotics. A variety of applications have been built with magnetorheological elastomers, such as vibration absorbers, actuators, or grippers, showing that this material is promising for soft robotics. In this work, the novel concept of a gripper is proposed, exploring the features of a magnetorheological elastomer and permanent magnet. The gripper uses the energy of a permanent magnet to provide a self-closing gripping mechanism. The usage of flexible material enables one to hold delicate objects of various shapes. This paper presents the rolling effect of magnetorheological elastomer and permanent magnet, the design process, and the features of the soft gripper. The effectiveness of the soft gripper was validated in a series of experiments that involved lifting different objects.

2405.15132 2026-04-02 stat.ML cs.LG math.ST stat.CO stat.ME stat.TH

Scale-adaptive and robust intrinsic dimension estimation via optimal neighbourhood identification

Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

详情
英文摘要

The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically at a small scale, the ID is very large, as the data are affected by measurement errors. At large scale, the ID can also appear erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that for distances smaller than the correct scale the density of the data is constant. In the presented framework, to estimate the density it is necessary to know the ID, therefore, this condition is imposed self-consistently. We illustrate the usefulness and robustness of this procedure to noise by benchmarks on artificial and real-world datasets.

2307.14012 2026-04-02 stat.ML cs.LG

MCMC-Correction of Score-Based Diffusion Models for Model Composition

Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson

Comments 27 pages. Published in Entropy 28(3):351 (2026). This version matches the published content

详情
Journal ref
Entropy 28(3):351, 2026
英文摘要

Diffusion models can be parameterized in terms of either score or energy function. The energy parameterization is attractive as it enables sampling procedures such as Markov Chain Monte Carlo (MCMC) that incorporates a Metropolis--Hastings (MH) correction step based on energy differences between proposed samples. Such corrections can significantly improve sampling quality, particularly in the context of model composition, where pre-trained models are combined to generate samples from novel distributions. Score-based diffusion models, on the other hand, are more widely adopted and come with a rich ecosystem of pre-trained models. However, they do not, in general, define an underlying energy function, making MH-based sampling inapplicable. In this work, we address this limitation by retaining score parameterization and introducing a novel MH-like acceptance rule based on line integration of the score function. This allows the reuse of existing diffusion models while still combining the reverse process with various MCMC techniques, viewed as an instance of annealed MCMC. Through experiments on synthetic and real-world data, we show that our MH-like samplers {yield relative improvements of similar magnitude to those observed} with energy-based models, without requiring explicit energy parameterization.

1811.06026 2026-04-02 cs.GT cs.DS cs.LG

Incentivizing Exploration with Selective Data Disclosure

Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu

Comments The ACM-EC 2020 conference publication corresponds to the Feb'20 version. Section 7 ("robustness") and Section 8 (the numerical study) were added in, resp., Dec'20 and Nov'24. New discussions (Section 3.2.1 and Appendix B) were added in April'26, as well as a partial reframing of the motivating story to emphasize transparency and deemphasize commitment

详情
英文摘要

We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions. The recommendation system presents each agent with actions and rewards from a subsequence of past agents, chosen ex ante. Thus, the agents engage in sequential social learning, moderated by these subsequences. We asymptotically attain optimal regret rate for exploration, using a flexible frequentist behavioral model and mitigating rationality and commitment assumptions inherent in prior work. We suggest three components of effective recommendation systems: independent focus groups, group aggregators, and interlaced information structures.

2604.00316 2026-04-02 stat.ML cs.LG

Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Marcel Tomàs Bernal, Neil Rohit Mallinar, Mikhail Belkin

详情
英文摘要

Grokking occurs when a model achieves high training accuracy but generalization to unseen test points happens long after that. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator in order to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we empirically show that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance group, explaining the dependence of generalization on symmetry.

2604.00314 2026-04-02 eess.IV cs.AI

Prompt-Guided Prefiltering for VLM Image Compression

Bardia Azizian, Ivan V. Bajic

Comments 7 pages, 5 figures. Accepted to IEEE ICME 2026. Code: https://github.com/bardia-az/pgp-vlm-compression

详情
英文摘要

The rapid progress of large Vision-Language Models (VLMs) has enabled a wide range of applications, such as image understanding and Visual Question Answering (VQA). Query images are often uploaded to the cloud, where VLMs are typically hosted, hence efficient image compression becomes crucial. However, traditional human-centric codecs are suboptimal in this setting because they preserve many task-irrelevant details. Existing Image Coding for Machines (ICM) methods also fall short, as they assume a fixed set of downstream tasks and cannot adapt to prompt-driven VLMs with an open-ended variety of objectives. We propose a lightweight, plug-and-play, prompt-guided prefiltering module to identify image regions most relevant to the text prompt, and consequently to the downstream task. The module preserves important details while smoothing out less relevant areas to improve compression efficiency. It is codec-agnostic and can be applied before conventional and learned encoders. Experiments on several VQA benchmarks show that our approach achieves a 25-50% average bitrate reduction while maintaining the same task accuracy. Our source code is available at https://github.com/bardia-az/pgp-vlm-compression.

2604.00283 2026-04-02 eess.SY cs.LG cs.SY

Data-Driven Reachability Analysis via Diffusion Models with PAC Guarantees

Yanliang Huang, Peng Xie, Wenyuan Wu, Zhuoqi Zeng, Amr Alanwar

Comments 8 pages, 5 figures, submitted to the 65th IEEE Conference on Decision and Control (CDC 2026)

详情
英文摘要

We present a data-driven framework for reachability analysis of nonlinear dynamical systems that requires no explicit model. A denoising diffusion probabilistic model learns the time-evolving state distribution of a dynamical system from trajectory data alone. The predicted reachable set takes the form of a sublevel set of a nonconformity score derived from the reconstruction error, with the threshold calibrated via the Learn Then Test procedure so that the probability of excluding a reachable state is bounded with high probability. Experiments on three nonlinear systems, a forced Duffing oscillator, a planar quadrotor, and a high-dimensional reaction-diffusion system, confirm that the empirical miss rate remains below the Probably Approximately Correct (PAC) bound while scaling to state dimensions beyond the reach of classical grid-based and polynomial methods.