arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.12163 2026-03-13 cs.LG cs.AI math.ST stat.ML stat.TH

A Quantitative Characterization of Forgetting in Post-Training

Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan

Abstract

Continual post-training of generative models is widely used, yet a principled understanding of when and why forgetting occurs remains limited. We develop theoretical results under a two-mode mixture abstraction (representing old and new tasks), proposed by Chen et al. (2025) (arXiv:2510.18874), and formalize forgetting in two forms: (i) mass forgetting, where the old mixture weight collapses to zero, and (ii) old-component drift, where an already-correct old component shifts during training. For equal-covariance Gaussian modes, we prove that forward-KL objectives trained on data from the new distribution drive the old weight to zero, while reverse-KL objectives converge to the true target (thereby avoiding mass forgetting) and perturb the old mean only through overlap-gated misassignment probabilities controlled by the Bhattacharyya coefficient, yielding drift that decays exponentially with mode separation and a locally well-conditioned geometry with exponential convergence. We further quantify how replay interacts with these objectives. For forward-KL, replay must modify the training distribution to change the population optimum; for reverse-KL, replay leaves the population objective unchanged but prevents finite-batch old-mode starvation through bounded importance weighting. Finally, we analyze three recently proposed near-on-policy post-training methods, SDFT (arXiv:2601.19897), TTT-Discover (arXiv:2601.16175), and OAPL (arXiv:2602.19362), via the same lens and derive explicit conditions under which each retains old mass and exhibits overlap-controlled drift. Overall, our results show that forgetting can be precisely quantified based on the interaction between divergence direction, geometric behavioral overlap, sampling regime, and the visibility of past behavior during training.
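
As a rough illustration of the overlap quantity this abstract refers to, the sketch below evaluates the Bhattacharyya coefficient between two Gaussian modes that share an isotropic covariance, using the standard closed form exp(-||mu_a - mu_b||^2 / (8 sigma^2)). The function name, the isotropic-covariance restriction, and the sample separations are assumptions made for this example; it is not the paper's code and does not reproduce its drift bounds.

```python
import math


def bhattacharyya_coefficient(mu_a, mu_b, sigma):
    """Overlap of N(mu_a, sigma^2 I) and N(mu_b, sigma^2 I).

    Assumes both modes share the same isotropic covariance sigma^2 I,
    a special case of the equal-covariance setting in the abstract.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    return math.exp(-sq_dist / (8.0 * sigma ** 2))


# Hypothetical separations between an "old" mode at 0 and a "new" mode.
for separation in (0.0, 1.0, 2.0, 4.0, 8.0):
    bc = bhattacharyya_coefficient([0.0], [separation], sigma=1.0)
    print(f"separation={separation:4.1f}  BC={bc:.6f}")
```

The coefficient equals 1 for coincident modes and shrinks exponentially in the squared separation, which is one way to read the claim that overlap-gated effects fade as the modes move apart.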

2603.12102 2026-03-13 stat.ML cs.LG stat.CO stat.ME

Wasserstein Gradient Flows for Batch Bayesian Optimal Experimental Design

Louis Sharrock

Abstract

Bayesian optimal experimental design (BOED) provides a powerful, decision-theoretic framework for selecting experiments so as to maximise the expected utility of the data to be collected. In practice, however, its applicability can be limited by the difficulty of optimising the chosen utility. The expected information gain (EIG), for example, is often high-dimensional and strongly non-convex. This challenge is particularly acute in the batch setting, where multiple experiments are to be designed simultaneously. In this paper, we introduce a new approach to batch EIG-based BOED via a probabilistic lifting of the original optimisation problem to the space of probability measures. In particular, we propose to optimise an entropic regularisation of the expected utility over the space of design measures. Under mild conditions, we show that this objective admits a unique minimiser, which can be explicitly characterised in the form of a Gibbs distribution. The resulting design law can be used directly as a randomised batch-design policy, or as a computational relaxation from which a deterministic batch is extracted. To obtain scalable approximations when the batch size is large, we then consider two tractable restrictions of the full batch distribution: a mean-field family, and an i.i.d. product family. For the i.i.d. objective, and formally for its mean-field extension, we derive the corresponding Wasserstein gradient flow, characterise its long-time behaviour, and obtain particle-based algorithms via space-time discretisations. We also introduce doubly stochastic variants that combine interacting particle updates with Monte Carlo estimators of the EIG gradient. Finally, we illustrate the performance of the proposed methods in several numerical experiments, demonstrating their ability to explore multimodal optimisation landscapes and obtain high-utility batches in challenging examples.

2603.12060 2026-03-13 cs.LG cs.AI math.ST stat.ML stat.TH

Chemical Reaction Networks Learn Better than Spiking Neural Networks

Sophie Jaffard, Ivo F. Sbalzarini

Comments Keywords: Chemical Reaction Networks, Spiking Neural Networks, Supervised Learning, Classification, Mass-Action Kinetics, Statistical Learning Theory, Regret Bounds, Model Complexity

Abstract

We mathematically prove that chemical reaction networks without hidden layers can solve tasks for which spiking neural networks require hidden layers. Our proof uses the deterministic mass-action kinetics formulation of chemical reaction networks. Specifically, we prove that a certain reaction network without hidden layers can learn a classification task previously proved to be achievable by a spiking neural network with hidden layers. We provide analytical regret bounds for the global behavior of the network and analyze its asymptotic behavior and Vapnik-Chervonenkis dimension. In a numerical experiment, we confirm the learning capacity of the proposed chemical reaction network for classifying handwritten digits in pixel images, and we show that it solves the task more accurately and efficiently than a spiking neural network with hidden layers. This provides a motivation for machine learning in chemical computers and a mathematical explanation for how biological cells might exhibit more efficient learning behavior within biochemical reaction networks than neuronal networks.

2603.11991 2026-03-13 cs.CL cs.AI cs.LG stat.ML

BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

Ilias Aarab

Comments Accepted at ICLR 2026. 31 pages, 5 figures, 9 tables. Code: https://github.com/IliasAarab/btzsc ; Dataset: https://huggingface.co/datasets/btzsc/btzsc ; Leaderboard: https://huggingface.co/spaces/btzsc/btzsc-leaderboard . Proceedings of the Fourteenth International Conference on Learning Representations (ICLR 2026), 2026

Abstract

Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models fine-tuned for natural language inference (NLI), recent advances in text-embedding models, rerankers, and instruction-tuned large language models (LLMs) have challenged the dominance of NLI-based architectures. Yet, systematically comparing these diverse approaches remains difficult. Existing evaluations, such as MTEB, often incorporate labeled examples through supervised probes or fine-tuning, leaving genuine zero-shot capabilities underexplored. To address this, we introduce BTZSC, a comprehensive benchmark of 22 public datasets spanning sentiment, topic, intent, and emotion classification, capturing diverse domains, class cardinalities, and document lengths. Leveraging BTZSC, we conduct a systematic comparison across four major model families, NLI cross-encoders, embedding models, rerankers and instruction-tuned LLMs, encompassing 38 public and custom checkpoints. Our results show that: (i) modern rerankers, exemplified by Qwen3-Reranker-8B, set a new state-of-the-art with macro F1 = 0.72; (ii) strong embedding models such as GTE-large-en-v1.5 substantially close the accuracy gap while offering the best trade-off between accuracy and latency; (iii) instruction-tuned LLMs at 4--12B parameters achieve competitive performance (macro F1 up to 0.67), excelling particularly on topic classification but trailing specialized rerankers; (iv) NLI cross-encoders plateau even as backbone size increases; and (v) scaling primarily benefits rerankers and LLMs over embedding models. BTZSC and accompanying evaluation code are publicly released to support fair and reproducible progress in zero-shot text understanding.
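
Since the headline numbers in this abstract are macro F1 scores, a minimal sketch of that metric may help: per-class F1 is computed one-vs-rest and then averaged without class weighting. The function name, the choice to average over every label seen in either list, and the toy labels are assumptions for this example; the benchmark's own evaluation code may differ in such details.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Assumption: classes are all labels appearing in y_true or y_pred,
    and a class with no true or predicted instances scores 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sentiment example with hypothetical labels.
truth = ["pos", "pos", "neg", "neg"]
preds = ["pos", "neg", "neg", "neg"]
score = macro_f1(truth, preds)
print(f"macro F1 = {score:.4f}")
```

Because every class contributes equally, a model that ignores rare classes is penalised more under macro F1 than under plain accuracy.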

2603.11989 2026-03-13 cs.LG math.OC stat.ML

On-Average Stability of Multipass Preconditioned SGD and Effective Dimension

Simon Vary, Tyler Farghly, Ilja Kuzborskij, Patrick Rebeschini

Comments 35 pages, 1 figure

Abstract

We study trade-offs between the population risk curvature, geometry of the noise, and preconditioning on the generalisation ability of the multipass Preconditioned Stochastic Gradient Descent (PSGD). Many practical optimisation heuristics implicitly navigate this trade-off in different ways -- for instance, some aim to whiten gradient noise, while others aim to align updates with expected loss curvature. When the geometry of the population risk curvature and the geometry of the gradient noise do not match, an aggressive choice that improves one aspect can amplify instability along the other, leading to suboptimal statistical behavior. In this paper we employ on-average algorithmic stability to connect generalisation of PSGD to the effective dimension that depends on these sources of curvature. While existing techniques for on-average stability of SGD are limited to a single pass, as a first contribution we develop a new on-average stability analysis for multipass SGD that handles the correlations induced by data reuse. This allows us to derive excess risk bounds that depend on the effective dimension. In particular, we show that an improperly chosen preconditioner can yield suboptimal effective dimension dependence in both optimisation and generalisation. Finally, we complement our upper bounds with matching, instance-dependent lower bounds.

2603.11965 2026-03-13 stat.ML cs.LG stat.ME

Uncovering Locally Low-dimensional Structure in Networks by Locally Optimal Spectral Embedding

Hannah Sansford, Nick Whiteley, Patrick Rubin-Delanchy

Abstract

Standard Adjacency Spectral Embedding (ASE) relies on a global low-rank assumption often incompatible with the sparse, transitive structure of real-world networks, causing local geometric features to be 'smeared'. To address this, we introduce Local Adjacency Spectral Embedding (LASE), which uncovers locally low-dimensional structure via weighted spectral decomposition. Under a latent position model with a kernel feature map, we treat the image of latent positions as a locally low-dimensional set in infinite-dimensional feature space. We establish finite-sample bounds quantifying the trade-off between the statistical cost of localisation and the reduced truncation error achieved by targeting a locally low-dimensional region of the embedding. Furthermore, we prove that sufficient localisation induces rapid spectral decay and the emergence of a distinct spectral gap, theoretically justifying low-dimensional local embeddings. Experiments on synthetic and real networks show that LASE improves local reconstruction and visualisation over global and subgraph baselines, and we introduce UMAP-LASE for assembling overlapping local embeddings into high-fidelity global visualisations.

2603.11960 2026-03-13 stat.ME cond-mat.mtrl-sci

Bayesian Model Calibration with Integrated Discrepancy: Addressing Inexact Dislocation Dynamics Models

Liam Myhill, Enrique Martinez Saez, Sez Russcher

Comments Preprint with arxiv formatting

Abstract

In this work, a novel approach to Bayesian model calibration routines is developed which reinterprets the traditional definition of model discrepancy as defined by Kennedy and O'Hagan (KOH). The novelty lies in the integration of $δ_θ(x_i)$ GPs within the simulator, which is approximated as a GP surrogate model to ensure computational tractability. This approach assumes that the utilized simulator sufficiently predicts observed trends when calibrated with respect to the application domain, and that all model-form errors can be attributed to uncertainty in the input parameters. In contrast, the KOH method assumes discrepancy to be inherently decoupled from the simulator, acting as a 'catch-all' for various sources of model error. The new method is applied to Molecular Dynamics (MD) observations of the critical stress to drive dislocation dipoles, and equivalent predictions using a Discrete Dislocation Dynamics simulator whose coarse-grained physical interpretation of the underlying physical mechanisms requires calibration against MD observations. We present an overview of similar state-aware calibration routines, differentiate the provided approach by redefining the commonly used discrepancy Gaussian process, and benchmark against KOH. A philosophical argument as to when application of the proposed method is appropriate is provided, and future directions for expanding upon this methodology are proposed.

2603.10520 2026-03-13 astro-ph.EP astro-ph.IM stat.AP

Spectral Decomposition Reveals Surface Processes on Europa

Gideon Yoffe, Sahar Shahaf

Comments Accepted for publication in The Astrophysical Journal

Abstract

Competing processes shape Europa's surface: geological activity replenishes material through resurfacing, while bombardment by charged particles alters surface chemical composition. Each process leaves distinct spectral signatures. We present a novel data-driven analysis of JWST NIRSpec-IFU observations of Europa's leading hemisphere across three observing geometries, targeting nine spectral bands sensitive to water ice, radiolytic products, and volatiles. Through spectral factorization, we isolate the dominant components of spectral variability and reconstruct their spatial distributions. We find that CO2 enrichment extends beyond Tara Regio, and covers multiple chaos units in a lens-like pattern. These CO2-enriched areas co-occur with anomalous ice-texture signatures. Together, these findings suggest that enrichment in volatiles on Europa may reflect retention-favorable near-surface microphysics as well as emplacement, refining how they are interpreted in the context of surface--interior exchange. This has implications for interpreting the sources and supply rates of extant carbon-bearing species and, ultimately, for assessing Europa's habitability.

2602.23151 2026-03-13 math.CA math.PR math.ST stat.TH

High-dimensional Laplace asymptotics up to the concentration threshold

Alexander Katsevich, Anya Katsevich

Comments Change from v1: added new result on normalizing flow style posterior approximation

Abstract

We study high-dimensional Laplace-type integrals $I(λ):=(λ/2π)^{d/2}\int_{\mathbb R^d} g(x)e^{-λf(x)}dx$ in the regime where both $d$ and $λ$ are large. Existing rigorous Laplace-expansion results in growing dimension are largely confined to the "Gaussian-approximation" regime $d^2/λ\to0$, which excludes many practically relevant settings that lie beyond this threshold but still satisfy the concentration condition $d/λ\to0$. We close this gap by deriving an explicit asymptotic expansion for $\log I(λ)$ with quantitative remainder bounds that remain valid throughout this intermediate region, arbitrarily close to the concentration threshold. Fix $L\ge1$ and assume that, in a neighborhood of the global minimizer of $f$, the operator norms of derivatives of $f$ and $g$ are bounded independently of $d,λ$ up to orders $2L+2$ and $2L$, respectively. Assuming also some mild global growth conditions, we prove $$\log I(λ)=\sum_{k=1}^{L-1} b_k(f,g)λ^{-k}+O(d^{L+1}/λ^L), \qquad d^{L+1}/λ^L\to0,$$ with coefficients satisfying $b_k(f,g)=O(d^{k+1})$. Moreover, the $b_k(f,g)$ coincide with the coefficients from the formal cumulant expansion of $\log I(λ)$. We also study computation for concentrating densities $π(x)\propto e^{-λf(x)}$. For smooth observables $g$, our expansion yields closed-form, analytic approximations of $\mathbb E_{X\simπ}[g(X)]$. For sampling, we construct explicit polynomial transports $x_L$ such that $π_L:=(x_L)_\# N(0,λ^{-1}I_d)$ satisfies $\mathrm{TV}(π,π_L)\lesssim d^{L+1}/λ^L$ for $L=1,2,3,\dots$, yielding an accurate procedure arbitrarily close to the concentration threshold $d=o(λ)$.

2512.17113 2026-03-13 stat.ME stat.CO

A systematic assessment of Large Language Models for constructing two-level fractional factorial designs

Alan R. Vazquez, Kilian M. Rother, Marco V. Charles-Gonzalez

Comments 31 pages, 11 tables

Abstract

Two-level fractional factorial designs permit the study of multiple factors using a limited number of runs. Traditionally, these designs are obtained from catalogs available in standard textbooks or statistical software. Modern Large Language Models (LLMs) can now produce two-level fractional factorial designs, but the quality of these designs has not been previously assessed. In this paper, we perform a systematic evaluation of two popular classes of LLMs, namely GPT and Gemini models, to construct two-level fractional factorial designs with 8, 16, and 32 runs, and 4 to 26 factors. To this end, we use prompting techniques to develop a high-quality set of design construction tasks for the LLMs. We compare the designs obtained by the LLMs with the best-known designs in terms of resolution and minimum aberration criteria. We show that the LLMs can effectively construct optimal 8-, 16-, and 32-run designs with up to eight factors.
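
For readers unfamiliar with the object being constructed, here is a minimal sketch of a textbook 8-run, 4-factor design: a full factorial in A, B, C with the fourth column generated as D = ABC, which gives the defining relation I = ABCD (resolution IV). The function name and the -1/+1 coding are choices made for this example, not taken from the paper, and the paper's prompts and catalog comparisons are not reproduced.

```python
from itertools import product


def half_fraction_design():
    """Build the 2^(4-1) design with generator D = A*B*C.

    Each run is a tuple (A, B, C, D) of factor levels coded -1 / +1.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b * c))  # D is aliased with the ABC interaction
    return runs


design = half_fraction_design()
for run in design:
    print(" ".join(f"{level:+d}" for level in run))

# Balance and orthogonality checks on the main-effect columns.
n_factors = 4
balanced = all(sum(run[j] for run in design) == 0 for j in range(n_factors))
orthogonal = all(
    sum(run[i] * run[j] for run in design) == 0
    for i in range(n_factors)
    for j in range(i + 1, n_factors)
)
print("balanced:", balanced, "orthogonal:", orthogonal)
```

Checks of this kind (balance, column orthogonality, and the word lengths in the defining relation) are the sort of properties one would verify before comparing a generated design against a catalog entry.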

2512.06297 2026-03-13 cs.LG cond-mat.dis-nn cond-mat.stat-mech cs.AI stat.ML

Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks

Luca Di Carlo, Chase Goddard, David J. Schwab

Comments ICLR 2026

Abstract

Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a single convex basin and rarely explore intermediate points. We resolve this paradox by identifying entropic barriers arising from the interplay between curvature variations along these paths and noise in optimization dynamics. Empirically, we find that curvature systematically rises away from minima, producing effective forces that bias noisy dynamics back toward the endpoints - even when the loss remains nearly flat. These barriers persist longer than energetic barriers, shaping the late-time localization of solutions in parameter space. Our results highlight the role of curvature-induced entropic forces in governing both connectivity and confinement in deep learning landscapes.

2512.06210 2026-03-13 stat.AP cs.LG

Forests of Uncertaint(r)ees: Using tree-based ensembles to estimate probability distributions of future conflict

Daniel Mittermaier, Tobias Bohne, Martin Hofer, Daniel Racek

Comments 23 pages, 4 figures, 3 tables. Replication code available at https://github.com/ccew-unibw/uncertaintrees

Abstract

Predictions of fatalities from violent conflict on the PRIO-GRID-month (pgm) level are characterized by high levels of uncertainty, limiting their usefulness in practical applications. We discuss the two main sources of uncertainty for this prediction task, the nature of violent conflict and data limitations, embedding conflict prediction in the wider literature on uncertainty quantification in machine learning. Based on this, we develop a strategy to quantify uncertainty in conflict forecasting, shifting from traditional point predictions to full predictive distributions. Our approach combines multiple tree-based classifiers and distributional regressors in a custom AutoML setup, estimating distributions for each pgm individually. We also test the integration of regional models in spatial ensembles as a potential avenue to reduce uncertainty by lowering data requirements and accounting for systematic differences between conflict contexts. The models are able to consistently outperform a suite of benchmarks derived from conflict history in predictions up to one year in advance. Marginal differences in model-wide metrics emphasize the need to understand their behavior for a given prediction problem, in this case characterized by extremely high zero-inflatedness. Addressing this, we complement our evaluation with a simulation experiment, which demonstrates that our models reflect meaningful performance improvements, which can be traced back to conflict-affected regions. Lastly, we show that the integration of regional models does not decrease performance, opening avenues to integrate additional data sources in the future.

2511.06967 2026-03-13 stat.ME stat.CO stat.ML

Approximate Bayesian inference for cumulative probit regression models

Emanuele Aliverti

Abstract

Ordinal categorical data are routinely encountered in many practical applications. When the primary goal is to construct a regression model for ordinal outcomes, cumulative link models represent one of the most popular choices to link the cumulative probabilities of the response with a set of covariates through a parsimonious linear predictor, shared across response categories. As the number of observations grows, standard sampling algorithms for Bayesian inference scale poorly, making posterior computation increasingly challenging for large datasets. In this article, we propose three scalable algorithms for approximating the posterior distribution of the regression coefficients in cumulative probit models relying on Variational Bayes and Expectation Propagation. We compare the proposed approaches with inference based on Markov Chain Monte Carlo, demonstrating superior computational performance and remarkable accuracy. Finally, we illustrate the utility of the proposed algorithms on a challenging case study to investigate the structure of a criminal network.

2510.07204 2026-03-13 econ.EM math.ST stat.ME stat.TH

Beyond the Oracle Property: Adaptive LASSO in Cointegrating Regressions with Local-to-Unity Regressors

Karsten Reichold, Ulrike Schneider

Abstract

This paper derives new asymptotic results for the adaptive LASSO estimator in cointegrating regressions, allowing for uncertainty about whether the regressors are exact unit root processes. We study model selection probabilities, estimator consistency, and limiting distributions under standard and moving-parameter asymptotics. We further derive uniform convergence rates and the fastest local-to-zero rates detectable by the estimator under conservative and consistent tuning. For consistent tuning, we construct confidence regions that are easy to implement, uniformly valid over the parameter space, and achieve sure asymptotic coverage without requiring knowledge or estimation of local-to-unity or long-run covariance parameters. Simulation results reveal that the finite-sample distribution of the adaptive LASSO estimator can deviate substantially from the oracle property, whereas moving-parameter asymptotics provide much more accurate approximations. Consequently, in addition to being infeasible in applications due to their dependence on non-estimable nuisance parameters, oracle-based confidence regions are often too small to achieve adequate coverage in empirically relevant scenarios with small but non-zero coefficients. In contrast, the proposed confidence regions are always feasible and deliver reliable coverage across the parameter space. An empirical application to predicting the U.S. unemployment rate illustrates their practical usefulness for quantifying uncertainty around adaptive LASSO estimates.

2507.14132 2026-03-13 stat.ME

A Bayesian Dirichlet Auto-Regressive Conditional Heteroskedasticity Model for Forecasting Currency Shares

Harrison Katz, Robert E. Weiss

Abstract

We analyze daily Airbnb service-fee shares across eleven settlement currencies, a compositional series that shows bursts of volatility after shocks such as the COVID-19 pandemic. Standard Dirichlet time series models assume constant precision and therefore miss these episodes. We introduce B-DARMA-DARCH, a Bayesian Dirichlet autoregressive moving average model with a Dirichlet ARCH component, which lets the precision parameter follow an ARMA recursion. The specification preserves the Dirichlet likelihood so forecasts remain valid compositions while capturing clustered volatility. Simulations and out-of-sample tests show that B-DARMA-DARCH lowers forecast error and improves interval calibration relative to Dirichlet ARMA and log-ratio VARMA benchmarks, providing a concise framework for settings where both the level and the volatility of proportions matter.

2506.17373 2026-03-13 stat.ME q-bio.QM

A practical identifiability criterion leveraging weak-form parameter estimation

Nora Heitzman-Breen, Vanja Dukic, David M. Bortz

Abstract

In this work, we define a practical identifiability criterion, (e, q)-identifiability, based on a parameter e, reflecting the noise in observed variables, and a parameter q, reflecting the mean-square error of the parameter estimator. This criterion is better able to encompass changes in the quality of the parameter estimate due to increased noise in the data (compared to existing criteria based solely on average relative errors). Furthermore, we leverage a weak-form equation error-based method of parameter estimation for systems with unobserved variables to assess practical identifiability far more quickly than output error-based parameter estimation. We do so by generating weak-form input-output equations using differential algebra techniques, as previously proposed by Boulier et al. [1], and then applying Weak-form Estimation of Nonlinear Dynamics (WENDy) to obtain parameter estimates. This method is computationally efficient and robust to noise, as demonstrated through two classical biological modelling examples.

2505.22034 2026-03-13 stat.ME math.ST stat.TH

Random irregular histograms

Oskar Høgberg Simensen, Dennis Christensen, Nils Lid Hjort

Journal ref
Computational Statistics & Data Analysis (2026)
Abstract

We propose a new method of histogram construction, providing a fully Bayesian approach to irregular histograms. Our procedure applies Bayesian model selection to a piecewise constant model of the underlying distribution, resulting in a method that selects both the number of bins as well as their location based on the data in a fully automatic fashion. We show that the histogram estimate is consistent with respect to the Hellinger metric under mild regularity conditions, and that it attains a convergence rate equal to the minimax rate (up to a logarithmic factor) for Hölder continuous densities. Simulation studies indicate that the new method performs comparably to other histogram procedures, both for minimizing the estimation error and for identifying modes. A software implementation is included as supplementary material.

2411.12184 2026-03-13 stat.ME cs.AI cs.LG

Testability of Instrumental Variables in Additive Nonlinear, Non-Constant Effects Models

Xichen Guo, Zheng Li, Biwei Huang, Yan Zeng, Zhi Geng, Feng Xie

Abstract

We address the issue of the testability of instrumental variables derived from observational data. Most existing testable implications are centered on scenarios where the treatment is a discrete variable, e.g., instrumental inequality (Pearl, 1995), or where the effect is assumed to be constant, e.g., instrumental variables condition based on the principle of independent mechanisms (Burauel, 2023). However, treatments can often be continuous variables, such as drug dosages or nutritional content levels, and non-constant effects may occur in many real-world scenarios. In this paper, we consider an additive nonlinear, non-constant effects model with unmeasured confounders, in which treatments can be either discrete or continuous, and propose an Auxiliary-based Independence Test (AIT) condition to test whether a variable is a valid instrument. We first show that, under the completeness condition, if the candidate instrument is valid, then the AIT condition holds. Moreover, we illustrate the implications of the AIT condition and demonstrate that, under certain additional conditions, the AIT condition is necessary and sufficient to detect all invalid IVs. We also extend the AIT condition to include covariates and introduce a practical testing algorithm. Experimental results on both synthetic and three different real-world datasets show the effectiveness of our proposed condition.

2411.03387 2026-03-13 cs.LG stat.ML

Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner

Valentyn Melnychuk, Stefan Feuerriegel, Mihaela van der Schaar

Journal ref
Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, Canada, 2024
Abstract

Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimating averaged causal quantities, such as the conditional average treatment effect, but also understanding the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the covariate-conditional level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call AU-learner. We further show that our AU-learner has several strengths in that it satisfies Neyman-orthogonality and, thus, quasi-oracle efficiency. Finally, we propose a fully-parametric deep learning instantiation of our AU-learner.

2311.11321 2026-03-13 stat.ML cs.AI cs.LG

Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation

Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel

Journal ref
Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024), Vienna, Austria
Abstract

State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic refutation framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATE is non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose a neural refutation framework which performs partial identification of CATE or, equivalently, aims at estimating lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our refutation framework is of direct relevance in practice where the validity of CATE estimation is of importance.

2603.11916 2026-03-13 stat.ME

Distributionally balanced sampling designs

Anton Grafström, Wilmer Prentius

Comments 16 pages, 3 figures

Abstract

We propose Distributionally Balanced Designs (DBD), a new class of probability sampling designs that target representativeness at the level of the full auxiliary distribution rather than selected moments. In disciplines such as ecology, forestry, and environmental sciences, where field data collection is expensive, maximizing the information extracted from a limited sample is critical. More precisely, DBD can be viewed as minimum discrepancy designs that minimize the expected discrepancy between the sample and population auxiliary distributions. The key idea is to construct samples whose empirical auxiliary distribution closely matches that of the population. We present a first implementation of DBD based on an optimized circular ordering of the population, combined with random selection of a contiguous block of units. The ordering is chosen to minimize the design-expected energy distance, a discrepancy measure that captures differences between distributions beyond low-order moments. This criterion promotes strong spatial spread, and yields low variance for Horvitz-Thompson estimators of totals of functions that vary smoothly with respect to auxiliaries. Simulation results show that approximate DBD achieves better distributional fit than state-of-the-art methods such as the local pivotal and local cube designs. Hence, DBD can improve the reliability of estimates from costly field data, making distributional balancing effective for constructing representative surveys in resource-constrained applications.
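
The construction described in this abstract (a circular ordering of the population, a randomly placed contiguous block, and the design-expected energy distance as the criterion) can be sketched on a toy one-dimensional population. Everything below is an assumption made for illustration: the helper names, the V-statistic form of the energy distance, the uniform choice of block start, and the two hand-written orderings. The paper's optimisation of the ordering is not attempted here.

```python
def energy_distance(sample, population):
    """V-statistic energy distance between two 1-D empirical distributions."""
    def mean_abs(xs, ys):
        return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

    return (2.0 * mean_abs(sample, population)
            - mean_abs(sample, sample)
            - mean_abs(population, population))


def expected_energy_distance(ordering, population, n):
    """Average energy distance over all contiguous blocks of size n.

    The ordering is treated as circular and every start position is taken
    to be equally likely, so each unit has inclusion probability n / N.
    """
    size = len(ordering)
    total = 0.0
    for start in range(size):
        block = [population[ordering[(start + k) % size]] for k in range(n)]
        total += energy_distance(block, population)
    return total / size


# Toy auxiliary variable: 12 units with values 0..11 (hypothetical data).
population = [float(i) for i in range(12)]
sorted_order = list(range(12))                           # neighbours are similar
spread_order = [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]    # neighbours are dissimilar

d_sorted = expected_energy_distance(sorted_order, population, n=3)
d_spread = expected_energy_distance(spread_order, population, n=3)
print(f"sorted ordering: {d_sorted:.3f}")
print(f"spread ordering: {d_spread:.3f}")
```

On this toy example the spread ordering places dissimilar units next to each other, so every contiguous block covers the range of the auxiliary variable and the expected discrepancy is lower than for the sorted ordering.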

2603.11909 2026-03-13 cs.LG cs.AI stat.ML

EnTransformer: A Deep Generative Transformer for Multivariate Probabilistic Forecasting

Rajdeep Pathak, Rahul Goswami, Madhurima Panja, Palash Ghosh, Tanujit Chakraborty

Abstract

Reliable uncertainty quantification is critical in multivariate time series forecasting problems arising in domains such as energy systems and transportation networks, among many others. Although Transformer-based architectures have recently achieved strong performance for sequence modeling, most probabilistic forecasting approaches rely on restrictive parametric likelihoods or quantile-based objectives. They can struggle to capture complex joint predictive distributions across multiple correlated time series. This work proposes EnTransformer, a deep generative forecasting framework that integrates engression, a stochastic learning paradigm for modeling conditional distributions, with the expressive sequence modeling capabilities of Transformers. The proposed approach injects stochastic noise into the model representation and optimizes an energy-based scoring objective to directly learn the conditional predictive distribution without imposing parametric assumptions. This design enables EnTransformer to generate coherent multivariate forecast trajectories while preserving Transformers' capacity to effectively model long-range temporal dependencies and cross-series interactions. We evaluate our proposed EnTransformer on several widely used benchmarks for multivariate probabilistic forecasting, including Electricity, Traffic, Solar, Taxi, KDD-cup, and Wikipedia datasets. Experimental results demonstrate that EnTransformer produces well-calibrated probabilistic forecasts and consistently outperforms the benchmark models.

2603.11897 2026-03-13 q-fin.RM stat.AP

Deriving the term-structure of loan write-off risk under IFRS 9 by using survival analysis: A benchmark study

Arno Botha, Mohammed Gabru, Marcel Muller, Janette Larney

Comments 16871 words, 44 pages, 12 Figures

Abstract

The estimation of marginal loan write-off probabilities is a non-trivial task when modelling the loss given default (LGD) risk parameter in credit risk. We explore two types of survival models in estimating the overall write-off probability over default spell time, where these probabilities form the term-structure of write-off risk in aggregate. These survival models include a discrete-time hazard (DtH) model and a conditional inference survival tree. Both models are compared to a cross-sectional logistic regression model for write-off risk. All of these (first-stage) models are then ensconced in a broader two-stage LGD-modelling approach, wherein a loss severity model is estimated in the second stage. In expanding the model suite, a novel dichotomisation step is introduced for collapsing the write-off probability into a 0/1-value, prior to LGD-calculation. A benchmark study is subsequently conducted amongst the resulting LGD-models. We find that the DtH-model outperforms other two-stage LGD-models admirably across most diagnostics. However, a single-stage LGD-model still had the best results, likely due to the peculiar 'L-shaped' LGD-distribution in our data. Ultimately, we believe that our tutorial-style work can enhance LGD-modelling practices when estimating the expected credit loss under IFRS 9.

2603.11835 2026-03-13 stat.ML cs.LG

Hypercomplex Widely Linear Processing: Fundamentals for Quaternion Machine Learning

Sayed Pouria Talebi, Clive Cheong Took

Comments Contributed chapter to appear in Handbook of Statistics Volume 54: Multidimensional Signal Processing, Elsevier, 2026

Abstract

Numerous attempts have been made to replicate the success of complex-valued algebra in engineering and science in other hypercomplex domains such as quaternions, tessarines, biquaternions, and octonions. Perhaps none have matched the success of quaternions. The most useful feature of quaternions lies in their ability to model three-dimensional rotations, which, in turn, have found various industrial applications such as in aeronautics and computer graphics. Recently, we have witnessed a renaissance of quaternions due to the rise of machine learning. To equip the reader to contribute to this emerging research area, this chapter lays down the foundation for: (i) augmented statistics for modelling quaternion-valued random processes, (ii) widely linear models to exploit such advanced statistics, (iii) quaternion calculus and algebra for algorithmic derivations, and (iv) mean square estimation for practical considerations. For ease of exposition, several examples are offered to facilitate the learning, understanding, and (hopefully) the adoption of this multidimensional domain.

2603.11784 2026-03-13 cs.LG stat.ML

Language Generation with Replay: A Learning-Theoretic View of Model Collapse

Giorgio Racca, Michal Valko, Amartya Sanyal

Abstract

As scaling laws push the training of frontier large language models (LLMs) toward ever-growing data requirements, training pipelines are approaching a regime where much of the publicly available online text may be consumed. At the same time, widespread LLM usage increases the volume of machine-generated content on the web; together, these trends raise the likelihood of generated text re-entering future training corpora, increasing the associated risk of performance degradation often called model collapse. In practice, model developers address this concern through data cleaning, watermarking, synthetic-data policies, or, in some cases, blissful ignorance. However, the problem of model collapse in generative models has not been examined from a learning-theoretic perspective: we study it through the theoretical lens of the language generation in the limit framework, introducing a replay adversary that augments the example stream with the generator's own past outputs. Our main contribution is a fine-grained learning-theoretic characterization of when replay fundamentally limits generation: while replay is benign for the strongest notion of uniform generation, it provably creates separations for the weaker notions of non-uniform generation and generation in the limit. Interestingly, our positive results mirror heuristics widely used in practice, such as data cleaning, watermarking, and output filtering, while our separations show when these ideas can fail.

2603.11764 2026-03-13 cs.LG stat.ML

A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

Botao Chen, Jongyeong Lee, Chansoo Kim, Junya Honda

Abstract

This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate for an efficient algorithm with favorable regret for adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has remained unknown, unlike that of Follow-the-Regularized-Leader (FTRL), whose optimality has been proved for various online learning tasks. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to $m$-set semi-bandits, which are a special case of combinatorial semi-bandits, showing that FTPL with Fréchet and Pareto distributions with certain parameters achieves the best possible regret of $O(\sqrt{mdT})$ in the adversarial setting. We also show that FTPL with Fréchet and Pareto distributions with a certain parameter achieves logarithmic regret in the stochastic setting, establishing the Best-of-Both-Worlds optimality of FTPL for $m$-set semi-bandit problems. Furthermore, we extend conditional geometric resampling to $m$-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from the $O(d^2)$ of the original geometric resampling to $O(md(\log(d/m)+1))$ without sacrificing the regret performance.

2603.11761 2026-03-13 stat.ME

Causal Influence Maximization with Steady-State Guarantees

Renjie Cao, Zhuoxin Yan, Xinyan Su, Zhiheng Zhang

Abstract

Influence maximization in networks is a central problem in machine learning and causal inference, where an intervention on a subset of individuals triggers a diffusion process through the network. Existing approaches typically optimize short-horizon rewards or rely on strong parametric assumptions, offering limited guarantees for long-run causal outcomes. In this work, we address the problem of selecting a seed set to maximize the total steady-state potential outcome under budget constraints. Theoretically, we demonstrate that under a low-probability propagation assumption, the high-dimensional path-dependent dynamics can be compressed into a low-dimensional exposure mapping with a bounded second-order approximation error. Leveraging this structural reduction, we propose CIM, a two-stage framework that first learns shape-constrained exposure-response functions from observational data and then optimizes the objective via a greedy strategy. Our approach bridges causal inference with network optimization, providing provable guarantees for both the estimation of outcome functions and the approximation ratio of the influence maximization.

2603.11757 2026-03-13 cs.LG cs.AI stat.ML

Exploiting Expertise of Non-Expert and Diverse Agents in Social Bandit Learning: A Free Energy Approach

Erfan Mirzaei, Seyed Pooya Shariatpanahi, Alireza Tavakoli, Reshad Hosseini, Majid Nili Ahmadabadi

Abstract

Personalized AI-based services involve a population of individual reinforcement learning agents. However, most reinforcement learning algorithms focus on harnessing individual learning and fail to leverage the social learning capabilities commonly exhibited by humans and animals. Social learning integrates individual experience with observing others' behavior, presenting opportunities for improved learning outcomes. In this study, we focus on a social bandit learning scenario where a social agent observes other agents' actions without knowledge of their rewards. The agents independently pursue their own policy without explicit motivation to teach each other. We propose a free energy-based social bandit learning algorithm over the policy space, where the social agent evaluates others' expertise levels without resorting to any oracle or social norms. Accordingly, the social agent integrates its direct experiences in the environment and others' estimated policies. The theoretical convergence of our algorithm to the optimal policy is proven. Empirical evaluations validate the superiority of our social learning method over alternative approaches in various scenarios. Our algorithm strategically identifies the relevant agents, even in the presence of random or suboptimal agents, and skillfully exploits their behavioral information. In addition to societies including expert agents, in the presence of relevant but non-expert agents, our algorithm significantly enhances individual learning performance, where most related methods fail. Importantly, it also maintains logarithmic regret.

2603.11730 2026-03-13 stat.ME

Including historical control data in simultaneous inference for pre-clinical multi-arm studies

Max Menssen, Carsten Kneuer, Gyamfi Akyianu, Christian Röver, Tim Friede, Frank Schaarschmidt

Comments 48 pages, 12 figures

Abstract

In pre- and non-clinical toxicology, the reduction of animal use is highly desirable. Although approaches for possible sample size reduction in the concurrent control group were suggested previously under the virtual control groups framework for continuous endpoints, methodology that is applicable to binary outcomes that occur in long-term carcinogenicity studies is currently missing. In order to augment animals in the current control group with historical control data, we propose approaches that rely on dynamic Bayesian borrowing and simultaneous credible intervals for risk ratios. Several operating characteristics such as familywise error rate (FWER) and power are assessed via Monte-Carlo simulations and compared to those of approaches that rely on pooling of historical and current observations. It turned out that under optimal conditions, Bayesian approaches based on robustified prior distributions enable a substantial reduction of the control group's sample size, while still controlling the FWER up to a satisfactory level. Furthermore, at least to some extent, these approaches were able to protect against possible drift. This highlights the potential of Bayesian study designs to reduce animal use in toxicology through re-use of the large pool of existing control data.

2603.11728 2026-03-13 stat.ME stat.CO

A Semiparametric Nonlinear Mixed Effects Model with Penalized Splines Using Automatic Differentiation

Matteo D'Alessandro, Magne Thoresen, Øystein Sørensen

Abstract

We present an estimation procedure for nonlinear mixed-effects models in which the population trajectory is represented by penalized splines and adapted to individuals via subject-specific transformation parameters. By exploiting the mixed model representation of penalized splines, the level of smoothness can be estimated jointly with other variance components. The integration over random effects needed to obtain the marginal likelihood is carried out using the Laplace approximation. Exact derivatives for evaluation and maximization of the resulting likelihood are obtained via automatic differentiation implemented through Template Model Builder. In simulation studies, the method produces improved inferential performance and reduced computational burden when compared to the existing procedure. The approach is further illustrated through a case study on infant height growth in the first two years of life.

2603.11705 2026-03-13 stat.ME stat.AP

Effective Degrees of Freedom for Balanced Repeated Replication and Paired Jackknife Variance Estimates: A Unified Approach via Stratum Contrasts

Matthias von Davier

Abstract

Balanced repeated replication (BRR) and the jackknife are two widely used methods for estimating variances in stratified samples with two primary sampling units per stratum. While both methods produce variance estimators that can be expressed as sums of squared stratum-level contrasts, they differ fundamentally in their construction and in the dependence structure of their replicate estimates. This article examines the independence properties of the components contributing to these variance estimators. For BRR, we show that although the replicate estimates themselves are correlated, the balancing property of Hadamard matrices collapses the variance estimator into a sum of independent stratum-specific components. For the jackknife, the independence of components follows directly from the construction. Using these independence results, we derive the variance of each variance estimator and establish a direct connection to the Welch-Satterthwaite degrees of freedom approximation. This yields a practical formula for estimating degrees of freedom when constructing confidence intervals for population totals. The derivation highlights the unified treatment of both replication methods and provides insights into their relative efficiency and applicability.
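
For orientation, the classical Welch-Satterthwaite approximation that this abstract connects to can be written as a short function. This is a sketch of the textbook formula only, $\nu \approx (\sum_h v_h)^2 / \sum_h (v_h^2 / \nu_h)$, applied under the assumption that each stratum contributes one squared contrast with one degree of freedom; the paper's own formula may differ, and the contrast values below are made up.

```python
def welch_satterthwaite_df(components, component_dfs):
    """Classical Welch-Satterthwaite effective degrees of freedom for a
    variance estimate that is a sum of independent components."""
    total = sum(components)
    denom = sum(v ** 2 / k for v, k in zip(components, component_dfs))
    return total ** 2 / denom

# Hypothetical stratum-level contrasts (illustrative numbers only).
contrasts = [1.5, -0.7, 2.1, 0.3]
components = [d * d for d in contrasts]       # squared contrasts
df = welch_satterthwaite_df(components, [1] * len(components))
print(f"effective degrees of freedom: {df:.2f}")
```

With equal components the result equals the number of strata; the more unequal the squared contrasts, the closer the effective degrees of freedom fall toward 1.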

2603.11701 2026-03-13 stat.ML cs.LG

Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret

Mustafa Cavus

Comments 19 pages, 3 figures

Abstract

Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic nature of label collection: observed training labels represent only a single realization of the underlying ground-truth probabilities. While theoretical frameworks for observational multiplicity have been established for logistic regression, their implications for non-smooth, partition-based models like decision trees remain underexplored. In this paper, we introduce two complementary notions of observational multiplicity for decision tree classifiers: leaf regret and structural regret. Leaf regret quantifies the intrinsic variability of predictions within a fixed leaf due to finite-sample noise, while structural regret captures variability induced by the instability of the learned tree structure itself. We provide a formal decomposition of observational multiplicity into these two components and establish statistical guarantees. Our experimental evaluation across diverse credit risk scoring datasets confirms the near-perfect alignment between our theoretical decomposition and the empirically observed variance. Notably, we find that structural regret is the primary driver of observational multiplicity, accounting for over 15 times the variability of leaf regret in some datasets. Furthermore, we demonstrate that utilizing these regret measures as an abstention mechanism in selective prediction can effectively identify arbitrary regions and improve model safety, elevating recall from 92% to 100% on the most stable sub-populations. These results establish a rigorous framework for quantifying observational multiplicity, aligning with recent advances in algorithmic safety and interpretability.

2603.11685 2026-03-13 stat.AP math.ST stat.CO stat.ME stat.TH

On the Unit Teissier Distribution: Properties, Estimation Procedures and Applications

Zuber Akhter, Mohamed A. Abdelaziz, M. Z. Anis, Ahmed Z. Afify

Abstract

The Teissier distribution, originally proposed by Teissier [31], was designed to model mortality due to aging in domestic animals. More recently, Krishna et al. [19] introduced the Unit Teissier (UT) distribution on the interval (0, 1) through the transformation $X=e^{-Y}$, where $Y$ follows the Teissier distribution. In their work, the authors derived several fundamental properties of the UT distribution and investigated parameter estimation using maximum likelihood, least squares, weighted least squares and Bayesian methods. Building upon this work, the present paper develops additional theoretical and inferential results for the UT distribution. In particular, closed-form expressions for single moments of order statistics and L-moments are obtained, and characterization results based on truncated moments are established. Furthermore, several alternative parameter estimation methods are considered, including maximum product of spacings, Cramér-von Mises, Anderson-Darling, right-tail Anderson-Darling, percentile and L-moment estimation, while the estimation methods previously studied by Krishna et al. [19] are also included for comparison. Extensive simulation studies under various parameter settings and sample sizes are conducted to assess and compare the performance of the estimators. Finally, the flexibility and practical utility of the UT distribution are demonstrated using a real dataset.

2603.11660 2026-03-13 stat.AP q-fin.RM

One-Shot Individual Claims Reserving

Ronald Richman, Mario V. Wüthrich

Abstract

Individual claims reserving has not yet become established in actuarial practice. We attribute this to the absence of a satisfactory methodology: existing approaches tend to be either overly complex or insufficiently flexible and robust for practical use. Building on the classical chain-ladder (CL) method, we introduced a new perspective on individual claims reserving in Richman and Wüthrich [arXiv:2602.15385]. This manuscript has sparked considerable discussion within the actuarial community. The aim of the present paper is to continue and deepen that discussion, with the ultimate goal of advancing toward a new standard for micro-level reserving.

2603.11532 2026-03-13 math.OC cs.LG stat.ME

Simultaneous estimation of multiple discrete unimodal distributions under stochastic order constraints

Yasuhiro Yoshida, Noriyoshi Sukegawa, Jiro Iwanaga

Abstract

We study the problem of estimating multiple discrete unimodal distributions, motivated by search behavior analysis on a real-world platform. To incorporate prior knowledge of precedence relations among distributions, we impose stochastic order constraints and formulate the estimation task as a mixed-integer convex quadratic optimization problem. Experiments on both synthetic and real datasets show that the proposed method reduces the Jensen-Shannon divergence by 2.2% on average (up to 6.3%) when the sample size is small, while performing comparably to existing methods when sufficient data are available.

2603.11524 2026-03-13 stat.ME

Robust Joint Modeling for Data with Continuous and Binary Responses

Yu Wang, Ran Jin, Lulu Kang

Comments 25 pages of main texts, 13 pages of supplement, 8 figures

Abstract

In many supervised learning applications, the response consists of both continuous and binary outcomes. Studies have shown that jointly modeling such mixed-type responses can substantially improve predictive performance compared to separate analyses. But outliers pose a new challenge to the existing likelihood-based modeling approaches. In this paper, we propose a new robust joint modeling framework for data with both continuous and binary responses. It is based on the density power divergence (DPD) loss function with the $\ell_1$ regularization. The proposed framework leads to a sparse estimator that simultaneously predicts continuous and binary responses in high-dimensional input settings while down-weighting influential outliers and mislabeled samples. We also develop an efficient proximal gradient algorithm with Barzilai-Borwein spectral step size and a robust information criterion (RIC) for data-driven selection of the penalty parameters. Extensive simulation studies under a variety of contamination schemes demonstrate that the proposed method achieves lower prediction error and more accurate parameter estimation than several competing approaches. A real case study on wafer lapping in semiconductor manufacturing further illustrates the practical gains in predictive accuracy, robustness, and interpretability of the proposed framework.

2603.11497 2026-03-13 econ.EM stat.ME

Variance Estimation with Dependence and Heterogeneous Means

Luther Yap

Abstract

This paper considers the problem of estimating the variance of a sum of a triangular array of random vectors with heterogeneous means. When random vectors exhibit two-way cluster dependence or weak dependence, standard variance estimators designed under homogeneous means can underestimate the true variance, which results in subsequent tests being oversized. To restore validity, this paper proposes a simple conservative variance estimator robust to heterogeneous means and shows its asymptotic validity.

2603.11478 2026-03-13 stat.ME cs.DS

Graph Generation Methods under Partial Information

Tong Sun, Jianshu Hao, Michael C. Fu, Guangxin Jiang

Comments 53 pages, 10 figures

Abstract

We study the problem of generating graphs with prescribed degree sequences for bipartite, directed, and undirected networks. We first propose a sequential method for bipartite graph generation and establish a necessary and sufficient interval condition that characterizes the admissible number of connections at each step, thereby guaranteeing global feasibility. Based on this result, we develop bipartite graph enumeration and sampling algorithms suitable for different problem sizes. We then extend these bipartite graph algorithms to the directed and undirected cases by incorporating additional connection constraints, as well as feasibility verification and symmetric connection steps, while preserving the same algorithmic principles. Finally, numerical experiments demonstrate the performance of the proposed algorithms, particularly their scalability to large instances where existing methods become computationally prohibitive.
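
As background for the feasibility question raised in this abstract, the classical Gale-Ryser condition decides whether any simple bipartite graph realizes a given pair of degree sequences. The sketch below implements that well-known global check; it is not the step-wise interval condition proposed in the paper, and the function name is an assumption.

```python
def gale_ryser_feasible(a, b):
    """Return True if a simple bipartite graph exists whose left-side
    degrees are `a` and right-side degrees are `b` (Gale-Ryser theorem)."""
    if any(d < 0 for d in list(a) + list(b)):
        return False
    if sum(a) != sum(b):
        return False
    a_sorted = sorted(a, reverse=True)
    partial = 0
    for k, a_k in enumerate(a_sorted, start=1):
        partial += a_k
        # The k largest left degrees cannot exceed what the right side can
        # absorb when each right vertex accepts at most k of those edges.
        if partial > sum(min(b_j, k) for b_j in b):
            return False
    return True

print(gale_ryser_feasible([2, 1, 1], [2, 2]))
print(gale_ryser_feasible([3, 1], [2, 2]))
```

The second call fails the check because a left vertex of degree 3 cannot be connected to only two right vertices without repeating an edge.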

2603.11474 2026-03-13 stat.ME stat.AP

Dynamic Bayesian regression quantile synthesis for forecasting outlook-at-risk

Genya Kobayashi, Shonosuke Sugasawa, Yuta Yamauchi, Dongu Han

Abstract

This paper proposes dynamic Bayesian regression quantile synthesis (DRQS), a novel method for quantile forecasting within the Bayesian predictive synthesis (BPS) framework designed to combine quantile-specific information from multiple agent models. While existing BPS approaches primarily focus on mean forecasting, our method directly targets the conditional quantiles of the response variable by utilizing the asymmetric Laplace distribution for the synthesis function. The resulting framework can be interpreted as a dynamic quantile linear model with latent predictors. We extend the univariate DRQS to a multivariate setting, factor DRQS (FDRQS), by introducing a time-varying latent factor structure for the synthesis weights. This allows the model to leverage cross-sectional dependencies and shared information across multiple time series simultaneously. We develop an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior inference, utilizing data augmentation and forward-filtering backward-sampling. Empirical applications to US inflation and global GDP growth demonstrate the improved performance of the proposed methods for quantile forecasting. In particular, FDRQS exhibits superior resilience during periods of extreme economic stress, such as the COVID-19 pandemic, by adaptively rebalancing agent contributions and capturing emergent global dependencies.
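
The link between the asymmetric Laplace distribution and quantile targeting can be illustrated with the check (pinball) loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$: maximizing an asymmetric Laplace likelihood in its location parameter is equivalent to minimizing this loss, whose minimizer is the $\tau$-quantile. The sketch below shows only that static fact on simulated data; it does not reproduce the dynamic synthesis model, and the data, grid, and names are assumptions.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss: u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

rng = np.random.default_rng(1)
y = rng.normal(size=2000)            # simulated responses (illustrative)
tau = 0.1

# Minimize the average check loss over a grid of candidate locations.
grid = np.linspace(-3.0, 3.0, 601)
avg_loss = np.array([check_loss(y - q, tau).mean() for q in grid])
q_hat = grid[avg_loss.argmin()]

print(f"grid minimizer: {q_hat:.3f}, sample quantile: {np.quantile(y, tau):.3f}")
```

Up to the grid resolution, the two printed values should agree, which is why an asymmetric Laplace working likelihood lets a Bayesian model target a chosen quantile rather than the mean.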

2603.11465 2026-03-13 stat.ME

Prediction-Oriented Transfer Learning for Survival Analysis

Yu Gu, Donglin Zeng, D. Y. Lin

Abstract

Transfer learning is beneficial for survival analysis, especially when the target study has a limited number of events. However, existing transfer learning methods rely on the restrictive assumption that the target and source studies share similar parameters under Cox models, and most require access to individual-level source data. In this article, we propose a novel transfer learning framework that enhances model-based survival prediction by transferring predictive rather than distributional knowledge from source studies. Our approach employs flexible semiparametric transformation models for the target data while eliminating the need to model or share the source data. The ingeniously designed penalty enables simple and stable computation via an EM algorithm. We rigorously establish the asymptotic properties of the proposed estimator and show that it achieves a faster convergence rate than the target-only estimator when source knowledge is sufficiently accurate. We demonstrate the advantages of our methods through extensive simulation studies and an application to two major breast cancer studies.

2603.11385 2026-03-13 stat.ME stat.AP

Multivariate Functional Principal Component Analysis for Mixed-Type mHealth Data: An Application to Mood Disorders

Debangan Dey, Rahul Ghosal, Kathleen Merikangas, Vadim Zipunnikov

Comments 30 pages, 12 figures, 2 tables

Abstract

Modern mobile health (mHealth) assessment combines self-reported measures of participants' health experiences with passively collected health behavior data throughout the day. These data are collected across multiple measurement scales, including continuous (physical activity), truncated (pain), ordinal (mood), and binary (daily life events). When indexed by time of day and stacked across assessment domains, these data structures can be treated as multivariate functional data comprising continuous, truncated, ordinal, and binary variables. Motivated by these applications, we propose a multivariate functional principal component analysis for mixed-type data ($M^2$FPCA). The approach is based on a semiparametric Gaussian copula model and assumes that the observed data arise from an underlying multivariate generalized latent nonparanormal functional process. Latent temporal and inter-variable dependence are estimated semiparametrically through Kendall's tau bridging method. Two covariance estimation procedures are developed: a fully multivariate block-wise estimator and a computationally efficient alternative based on partial separability that assumes shared principal components across domains. The proposed method yields interpretable latent functional principal component scores that can serve as participant-specific digital biomarkers. Simulation studies demonstrate the method's competitive performance under various complex dependence structures. The method is applied to mHealth data from 307 participants in the National Institute of Mental Health Family Study of Mood and Affective Spectrum Disorders. Our approach identifies time-of-day patterns shared across mood, anxiety, energy, and physical activity that meaningfully stratify mood disorder subtypes.

2603.11368 2026-03-13 stat.ML cs.LG econ.EM stat.AP stat.ME

Spatially Robust Inference with Predicted and Missing at Random Labels

Stephen Salerno, Zhenke Wu, Tyler McCormick

Abstract

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a process which has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.

2603.11355 2026-03-13 cs.LG stat.AP

Teleodynamic Learning a new Paradigm For Interpretable AI

Enrique ter Horst, Juan Diego Zambrano

Abstract

We introduce Teleodynamic Learning, a new paradigm for machine learning in which learning is not the minimization of a fixed objective, but the emergence and stabilization of functional organization under constraint. Inspired by living systems, this framework treats intelligence as the coupled evolution of three quantities: what a system can represent, how it adapts its parameters, and which changes its internal resources can sustain. We formalize learning as a constrained dynamical process with two interacting timescales: inner dynamics for continuous parameter adaptation and outer dynamics for discrete structural change, linked by an endogenous resource variable that both shapes and is shaped by the trajectory. This perspective reveals three phenomena that standard optimization does not naturally capture: self-stabilization without externally imposed stopping rules, phase-structured learning dynamics that move from under-structuring through teleodynamic growth to over-structuring, and convergence guarantees grounded in information geometry rather than convexity. We instantiate the framework in the Distinction Engine (DE11), a teleodynamic learner grounded in Spencer-Brown's Laws of Form, information geometry, and tropical optimization. On standard benchmarks, DE11 achieves 93.3 percent test accuracy on IRIS, 92.6 percent on WINE, and 94.7 percent on Breast Cancer, while producing interpretable logical rules that arise endogenously from the learning dynamics rather than being imposed by hand. More broadly, Teleodynamic Learning unifies regularization, architecture search, and resource-bounded inference within a single principle: learning as the co-evolution of structure, parameters, and resources under constraint. This opens a thermodynamically grounded route to adaptive, interpretable, and self-organizing AI.

2603.11315 2026-03-13 stat.AP math.ST stat.TH

Finite-Sample Decision Instability in Threshold-Based Process Capability Approval

Fei Jiang, Lei Yang

Comments 14 pages, 6 figures

Abstract

Process capability indices such as $C_{pk}$ are widely used in manufacturing quality control to support supplier qualification and product release decisions based on fixed acceptance thresholds (e.g., $C_{pk} \geq 1.33$). In practice, these decisions rely on sample-based estimates computed from moderate sample sizes ($n \approx$ 20-50), yet the stochastic nature of the estimator is often overlooked when interpreting threshold compliance. This study establishes a local asymptotic characterization of decision behavior when the true process capability lies near a fixed threshold. Under standard regularity conditions, if the true capability equals the threshold, the acceptance probability converges to 0.5 as sample size increases, implying that a fixed $C_{pk}$ gate embeds an inherent boundary decision risk even under ideal distributional assumptions. When the true capability deviates from the threshold by $O(n^{-1/2})$, the decision probability converges to a non-degenerate limit governed by a scaled signal-to-noise ratio. Monte Carlo simulations and an empirical study on 880 manufacturing dimensions demonstrate substantial resampling-based decision instability near the commonly used 1.33 criterion. These findings provide a probabilistic interpretation of threshold-based capability decisions and quantitative guidance for assessing boundary-induced release risk in engineering practice.
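
The boundary behaviour described above can be checked with a small Monte Carlo sketch. It is an illustration under stated assumptions, not the authors' simulation design: the process is normal with mean 0 and standard deviation 1, the upper specification limit is binding (the lower limit is placed far away so the estimator is smooth at the true parameters), and the function name `acceptance_probability` is hypothetical.

```python
import numpy as np

def acceptance_probability(n, true_cpk=1.33, threshold=1.33, reps=4000, seed=0):
    """Estimate P(Cpk_hat >= threshold) for samples of size n (illustrative setup)."""
    rng = np.random.default_rng(seed)
    sigma = 1.0
    usl = 3.0 * true_cpk * sigma   # assumed: upper spec limit is the binding one
    lsl = -10.0 * sigma            # assumed: lower spec limit far from the process
    x = rng.normal(0.0, sigma, size=(reps, n))
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)     # sample standard deviation
    cpk_hat = np.minimum(usl - mean, mean - lsl) / (3.0 * sd)
    return float(np.mean(cpk_hat >= threshold))

if __name__ == "__main__":
    for n in (20, 50, 1000):
        print(n, acceptance_probability(n))
```

With the true capability set equal to the threshold, the estimated acceptance probability should sit near one half for large `n`, which is the coin-flip behaviour the abstract describes.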

2603.11304 2026-03-13 stat.ML cs.AI cs.LG stat.ME

Worst-case low-rank approximations

Anya Fries, Markus Reichstein, David Blei, Jonas Peters

Abstract

Real-world data in health, economics, and environmental sciences are often collected across heterogeneous domains (such as hospitals, regions, or time periods). In such settings, distributional shifts can make standard PCA unreliable, in that, for example, the leading principal components may explain substantially less variance in unseen domains than in the training domains. Existing approaches (such as FairPCA) have proposed to consider worst-case (rather than average) performance across multiple domains. This work develops a unified framework, called wcPCA, applies it to other objectives (yielding novel estimators such as norm-minPCA and norm-maxregret, which are better suited for applications with heterogeneous total variance) and analyzes their relationship. We prove that for all objectives, the estimators are worst-case optimal not only over the observed source domains but also over all target domains whose covariance lies in the convex hull of the (possibly normalized) source covariances. We establish consistency and asymptotic worst-case guarantees of empirical estimators. We extend our methodology to matrix completion, another problem that makes use of low-rank approximations, and prove approximate worst-case optimality for inductive matrix completion. Simulations and two real-world applications on ecosystem-atmosphere fluxes demonstrate marked improvements in worst-case performance, with only minor losses in average performance.
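
To make the contrast between average and worst-case performance concrete, the following sketch compares the leading direction of pooled PCA with a direction chosen to maximize the minimum explained variance across domains. It is a brute-force toy in two dimensions, not the wcPCA estimator from the paper; the function names and the grid search are assumptions made for illustration.

```python
import numpy as np

def explained_variance(direction, cov):
    """Variance captured by projecting onto a single direction."""
    v = direction / np.linalg.norm(direction)
    return float(v @ cov @ v)

def pooled_pca_direction(covs):
    """Leading eigenvector of the average covariance (standard pooled PCA)."""
    pooled = np.mean(covs, axis=0)
    eigvals, eigvecs = np.linalg.eigh(pooled)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def worst_case_direction_2d(covs, n_grid=3600):
    """Grid search over unit vectors in R^2 for the max-min explained variance."""
    angles = np.linspace(0.0, np.pi, n_grid, endpoint=False)
    candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    worst = [min(explained_variance(v, c) for c in covs) for v in candidates]
    return candidates[int(np.argmax(worst))]

if __name__ == "__main__":
    covs = [np.diag([5.0, 1.0]), np.diag([1.0, 2.0])]
    for name, v in [("pooled", pooled_pca_direction(covs)),
                    ("worst-case", worst_case_direction_2d(covs))]:
        print(name, [round(explained_variance(v, c), 3) for c in covs])
```

In this toy setup the pooled direction explains a variance of 5 in the first domain but only 1 in the second, while the max-min direction explains about 1.8 in both.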

2603.11283 2026-03-13 astro-ph.IM astro-ph.CO stat.AP

Two Point Correlation Function Estimation with Contaminated Data

Arya Farahi

Comments 22 pages, comments are welcome

Abstract

The two-point correlation function (2PCF) is a cornerstone of precision cosmology, yet its estimation from imaging surveys is vulnerable to contamination and incompleteness arising from imperfect target selection and pipeline-level inclusion decisions. In practice, the scientific target is a physically defined population, while the working catalog is constructed from noisy measurements and selection cuts, leading to mismatches between true and observed inclusion. These errors are often spatially structured, correlating with survey depth, observing conditions, and foregrounds, and can imprint spurious large-scale power or suppress the true clustering signal. High-resolution spectroscopic samples provide gold-standard inclusion in the target population but are typically available for only a small subset of objects. We introduce a prediction-powered Landy-Szalay (PP-LS) estimator that combines noisy inclusion labels across the full catalog with exact labels on a small spectroscopic subset while preserving the standard random-catalog normalization for survey geometry and selection. PP-LS debiases pair counts using residual-based, design-weighted corrections computed only on the labeled subset, requiring no probability calibration, known misclassification rates, or explicit modeling of contamination. Under simple random sampling of the labeled subset, we establish recovery of the oracle (true-label) Landy-Szalay pair counts and thus consistency for the target 2PCF. In simulations with clustered and spatially structured contaminants, PP-LS removes the bias of naive catalog-level estimators while achieving substantially lower variance than spectroscopic-only clustering. The resulting estimator is statistically principled, computationally lightweight, and integrates directly with standard pair-counting pipelines, enabling robust clustering inference in next-generation surveys.

2603.11282 2026-03-13 stat.ME math.ST stat.ML stat.TH

Outrigger local polynomial regression

Elliot H. Young, Rajen D. Shah, Richard J. Samworth

Abstract

Standard local polynomial estimators of a nonparametric regression function employ a weighted least squares loss function that is tailored to the setting of homoscedastic Gaussian errors. We introduce the outrigger local polynomial estimator, which is designed to achieve distributional adaptivity across different conditional error distributions. It modifies a standard local polynomial estimator by employing an estimate of the conditional score function of the errors and an 'outrigger' that draws on the data in a broader local window to stabilise the influence of the conditional score estimate. Subject to smoothness and moment conditions, and only requiring consistency of the conditional score estimate, we first establish that even under the least favourable settings for the outrigger estimator, the asymptotic ratio of the worst-case local risks of the two estimators is at most $1$, with equality if and only if the conditional error distribution is Gaussian. Moreover, we prove that the outrigger estimator is minimax optimal over Hölder classes up to a multiplicative factor $A_{β,d}$, depending only on the smoothness $β\in (0,\infty)$ of the regression function and the dimension $d$ of the covariates. When $β\in (0,1]$, we find that $A_{β,d} \leq 1.69$, with $\lim_{β\searrow 0} A_{β,d} = 1$. A further attraction of our proposal is that we do not require structural assumptions such as independence of errors and covariates, or symmetry of the conditional error distribution. Numerical results on simulated and real data validate our theoretical findings; our methodology is implemented in R and available at https://github.com/elliot-young/outrigger.

2603.11258 2026-03-13 stat.ME math.PR

Continuous-time modeling and bootstrap for Schnieper's reserving

Nicolas Baradel

Abstract

We revisit Schnieper's model, which decomposes incurred but not reported (IBNR) reserves into two components: reserves for newly reported claims (true IBNR) and reserves for changes over time in the estimated cost of already reported claims (IBNER). We propose a continuous-time stochastic model for the aggregate claims process, driven by a random Poisson measure for the arrival of newly reported claims and by Brownian motion for the cost fluctuations of reported claims. This framework is consistent with the key assumptions of Schnieper's original approach. Within this setting, we develop a bootstrap method to estimate the full predictive distribution of claims reserves. Our approach naturally accounts for asymmetry, ensures non-negativity, and respects intrinsic bounds on reserves, without requiring additional assumptions. We illustrate the method through a case study and compare it with alternative stochastic techniques based on Schnieper's model.

2603.11229 2026-03-13 stat.ML cs.LG

Trustworthy predictive distributions for rare events via diagnostic transport maps

Elizabeth Cucuzzella, Rafael Izbicki, Ann B. Lee

Comments 19 pages, 5 figures, 2 tables

Abstract

Forecast systems in science and technology are increasingly moving beyond point prediction toward methods that produce full predictive distributions of future outcomes y, conditional on high-dimensional and complex sequences of inputs x. However, even when forecast systems provide a full predictive distribution, the result is rarely calibrated with respect to all x and y. The estimated density can be especially unreliable in low-frequency or out-of-distribution regimes, where accurate uncertainty quantification and a means for human experts to verify results are most needed to establish trust in models. In this paper, we take an initial predictive distribution as given and treat it as a useful but potentially misspecified base model. We then introduce diagnostic transport maps, covariate-dependent probability-to-probability maps that quantify how the base model's probabilities should be adjusted to better match the true conditional distribution of calibration data. At deployment, these maps provide the user with real-time local diagnostics that reveal where the model fails and how it fails (including bias, dispersion, skewness, and tail errors), while also producing a recalibrated predictive distribution through a simple composition with the base model. We apply diagnostic transport maps to short-term tropical cyclone intensity forecasting and show that an easy-to-fit parametric version identifies evolutionary modes associated with local miscalibration and improves the predictive performance for rare events, including 24-hour rapid intensity change, as compared to the operational forecasts of the National Hurricane Center.

2603.11138 2026-03-13 stat.ML cs.LG math.ST stat.TH

Deep regression learning from dependent observations with minimum error entropy principle

William Kengne, Modou Wade

Abstract

This paper considers nonparametric regression from strongly mixing observations. The proposed approach is based on deep neural networks with the minimum error entropy (MEE) principle. We study two estimators: the non-penalized deep neural network (NPDNN) and the sparse-penalized deep neural network (SPDNN) predictors. Upper bounds of the expected excess risk are established for both estimators over the classes of Hölder and composition Hölder functions. For the models with Gaussian error, the rates of the upper bound obtained match (up to a logarithmic factor) the lower bounds established by Schmidt-Hieber (2020), showing that both the MEE-based NPDNN and SPDNN estimators from strongly mixing data can achieve the minimax optimal convergence rate.

2603.11134 2026-03-13 math.ST cs.LG stat.TH

Conformal e-prediction in the presence of confounding

Vladimir Vovk, Ruodu Wang

Comments 8 pages, 2 figures

Abstract

This note extends conformal e-prediction to cover the case where there is observed confounding between the random object $X$ and its label $Y$. We consider both the case where the observed data is IID and a case where some dependence between observations is permitted.

2603.11128 2026-03-13 stat.ML cs.LG cs.NE

Efficient Approximation to Analytic and $L^p$ functions by Height-Augmented ReLU Networks

ZeYu Li, FengLei Fan, TieYong Zeng

Abstract

This work addresses two fundamental limitations in neural network approximation theory. We demonstrate that a three-dimensional network architecture enables a significantly more efficient representation of sawtooth functions, which serves as the cornerstone in the approximation of analytic and $L^p$ functions. First, we establish substantially improved exponential approximation rates for several important classes of analytic functions and offer a parameter-efficient network design. Second, for the first time, we derive a quantitative and non-asymptotic approximation of high orders for general $L^p$ functions. Our techniques advance the theoretical understanding of the neural network approximation in fundamental function spaces and offer a theoretically grounded pathway for designing more parameter-efficient networks.

2603.11125 2026-03-13 stat.ML cs.LG

Co-Diffusion: An Affinity-Aware Two-Stage Latent Diffusion Framework for Generalizable Drug-Target Affinity Prediction

Yining Qian, Pengjie Wang, Yixiao Li, An-Yang Lu, Cheng Tan, Shuang Li, Lijun Liu

Abstract

Predicting drug-target affinity is fundamental to virtual screening and lead optimization. However, existing deep models often suffer from representation collapse in stringent cold-start regimes, where the scarcity of labels and domain shifts prevent the learning of transferable pharmacophores and binding motifs. In this paper, we propose Co-Diffusion, a novel affinity-aware framework that redefines DTA prediction as a constrained latent denoising process to enhance generalization. Co-Diffusion employs a two-stage paradigm: Stage I establishes an affinity-steered latent manifold by aligning drug and target embeddings under an explicit supervised objective, ensuring that the latent space reflects the intrinsic binding landscape. Stage II introduces modality-specific latent diffusion as a stochastic perturb-and-denoise regularizer, forcing the model to recover consistent affinity semantics from noisy structural representations. This approach effectively mitigates the reconstruction-regression conflict common in generative DTA models. Theoretically, we show that Co-Diffusion maximizes a variational lower bound on the joint likelihood of drug structures, protein sequences, and binding strength. Extensive experiments across multiple benchmarks demonstrate that Co-Diffusion significantly outperforms state-of-the-art baselines, particularly yielding superior zero-shot generalization on unseen molecular scaffolds and novel protein families, paving a robust path for in silico drug prioritization in unexplored chemical spaces.

2603.11113 2026-03-13 stat.ME math.ST stat.ML stat.TH

Partition-Based Functional Ridge Regression for High-Dimensional Data

Shaista Ashraf, Ismail Shah, Farrukh Javed

Comments 32 pages, 5 figures

Abstract

This paper proposes a partition-based functional ridge regression framework to address multicollinearity, overfitting, and interpretability in high-dimensional functional linear models. The coefficient function vector \( \boldsymbol{\beta}(s) \) is decomposed into two components, \( \boldsymbol{\beta}_1(s) \) and \( \boldsymbol{\beta}_2(s) \), representing dominant and weaker functional effects. This partition enables differential ridge penalization across functional blocks, so that important signals are preserved while less informative components are more strongly shrunk. The resulting approach improves numerical stability and enhances interpretability without relying on explicit variable selection. We develop three estimators: the Functional Ridge Estimator (FRE), the Functional Ridge Full Model (FRFM), and the Functional Ridge Sub-Model (FRSM). Under standard regularity conditions, we establish consistency and asymptotic normality for all estimators. Simulation results reveal a clear bias-variance trade-off where FRSM performs best in small samples through strong variance reduction, whereas FRFM achieves superior accuracy in moderate to large samples by retaining informative functional structure through adaptive penalization. An empirical application to Canadian weather data further demonstrates improved predictive performance, reduced variance inflation, and clearer identification of influential functional effects. Overall, partition-based ridge regularization provides a practical and theoretically grounded method for high-dimensional functional regression.

2603.11084 2026-03-13 stat.ME q-bio.QM

Realizing Common Random Numbers: Event-Keyed Hashing for Causally Valid Stochastic Models

Vince Buffalo, Carl A. B. Pearson, Daniel Klein

Abstract

Agent-based models (ABMs) are widely used to estimate causal treatment effects via paired counterfactual simulation. A standard variance reduction technique is common random numbers (CRNs), which couples replicates across intervention scenarios by sharing the same random inputs. In practice, CRNs are implemented by reusing the same base seed, but this relies on a critical assumption: that the same draw index corresponds to the same modeled event across scenarios. Stateful pseudorandom number generators (PRNGs) violate this assumption whenever interventions alter the simulation's execution path, because any change in control flow shifts the draw index used for all downstream events. We argue that this execution-path-dependent draw indexing is not only a variance-reduction nuisance, but represents a fundamental mismatch between the scientific causal structure ABMs are intended to encode and the program-level causal structure induced by stateful PRNG implementations. Formalizing this through the lens of structural causal models (SCMs), we show that standard PRNG practices yield causally incoherent paired counterfactual comparisons even when the mechanistic specification is otherwise sound. We show that a remedy is to combine counter-based random number generators (e.g., Philox/Threefry) with event identifiers. This decouples random number generation from simulation execution order by making random draws explicit functions of the particular modeled event that called them, restoring the stable event-indexed exogenous structure assumed by SCMs.
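
The idea of keying random draws to modeled events rather than to execution order can be sketched as follows. This is a minimal illustration, not the authors' implementation: it swaps in a SHA-256 hash from the Python standard library in place of a counter-based generator such as Philox or Threefry, and the names `event_uniform` and `simulate`, the toy infection model, and the 0.3 infection probability are all hypothetical.

```python
import hashlib
import struct

def event_uniform(seed, *event_key):
    """Uniform draw in [0, 1) that depends only on (seed, event_key)."""
    payload = repr((seed,) + tuple(event_key)).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep the top 53 bits of the first 8 bytes so the value is an exact double.
    (word,) = struct.unpack(">Q", digest[:8])
    return (word >> 11) / float(1 << 53)

def simulate(seed, n_agents, vaccinated=frozenset()):
    """Toy one-step infection model; returns the set of infected agents."""
    infected = set()
    for agent in range(n_agents):
        if agent in vaccinated:
            continue  # the execution path changes, other agents' draws do not
        if event_uniform(seed, "infection", agent, 0) < 0.3:
            infected.add(agent)
    return infected

if __name__ == "__main__":
    baseline = simulate(seed=1, n_agents=100)
    treated = simulate(seed=1, n_agents=100, vaccinated=frozenset({0, 1, 2}))
    # Unvaccinated agents have identical outcomes in both scenarios.
    print(treated == baseline - {0, 1, 2})
```

Because each draw is a pure function of the seed and the event identifier, skipping agents in the treated scenario cannot shift the draws used for anyone else. A stateful generator consumed in loop order would not give that guarantee.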

2603.11060 2026-03-13 cs.SI math.PR stat.OT

LLY Ricci Reweighting in Stochastic Block Models: Uniform Curvature Concentration and Finite-Horizon Tracking

Varun Kotharkar

Abstract

We study curvature-driven edge reweighting for community recovery in the balanced two-block stochastic block model. Given a graph G with initial weights equal to the adjacency matrix, we iteratively update edge weights using Lin-Lu-Yau (Ollivier-type) Ricci curvature, while all transportation costs are computed in the unweighted graph metric. In a moderate-density regime we prove uniform concentration of edge curvatures and show that a single Ricci reweighting step produces a two-level weighting that amplifies within-block connectivity relative to across-block connectivity. As a consequence, spectral clustering on the reweighted graph has a strictly larger population eigengap, and we obtain corresponding non-asymptotic perturbation bounds and Davis-Kahan misclustering guarantees. We further analyze a fixed finite horizon of iterated reweighting, where the random iterates track a deterministic two-weight recursion uniformly over the time horizon. This yields a principled finite-horizon curvature flow interpretation for community detection in a canonical random graph model.

2603.10065 2026-03-13 cs.IT cs.AI cs.SY eess.SY math.IT stat.ME

The Epistemic Support-Point Filter: Jaynesian Maximum Entropy Meets Popperian Falsification

Moriba Kemessia Jah

Abstract

This paper proves that the Epistemic Support-Point Filter (ESPF) is the unique optimal recursive estimator within the class of epistemically admissible evidence-only filters. Where Bayesian filters minimize mean squared error and are driven toward an assumed truth, the ESPF minimizes maximum entropy and surfaces what has not been proven impossible -- a fundamentally different epistemic commitment with fundamentally different failure modes. Two results locate this theorem within the broader landscape of estimation theory. The first is a unification: the ESPF's optimality criterion is the log-geometric mean of the alpha-cut volume family in the Hölder mean hierarchy. The Popperian minimax bound and the Kalman MMSE criterion occupy the p=+inf and p=0 positions on the same curve. Possibility and probability are not competing frameworks: they are the same ignorance functional evaluated under different alpha-cut geometries. The Kalman filter is the Gaussian specialization of the ESPF's optimality criterion, not a separate invention. The second result is a diagnostic: numerical validation over a 2-day, 877-step Smolyak Level-3 orbital tracking run shows that possibilistic stress manifests through necessity saturation and surprisal escalation rather than MVEE sign change -- a direct consequence of the Hölder ordering, not an empirical observation. Three lemmas establish the result: the Possibilistic Entropy Lemma decomposes the ignorance functional; the Possibilistic Cramér-Rao Bound limits entropy reduction per measurement; the Evidence-Optimality Lemma proves minimum-q selection is the unique minimizer and that any rule incorporating prior possibility risks race-to-bottom bias.

2603.09602 2026-03-13 math.ST stat.TH

Inhomogeneous Submatrix Detection

Mor Oren-Loberman, Dvir Jerbi, Tamir Bendory, Wasim Huleihel

Abstract

In this paper, we study the problem of detecting multiple hidden submatrices in a large Gaussian random matrix when the planted signal is inhomogeneous across entries. Under the null hypothesis, the observed matrix has independent and identically distributed standard normal entries. Under the alternative, there exist several planted submatrices whose entries deviate from the background in one of two ways: in the mean-shift model, planted entries (templates) have nonzero and possibly varying means; in the variance-shift model, planted entries have inflated and possibly varying variances. We consider two placement regimes for the planted submatrices. In the first, the row and column index sets are arbitrary. Motivated by scientific applications, in the second regime the row and column indices are restricted to be consecutive. For both alternatives and both placement regimes, we analyze the statistical limits of detection by proving information-theoretic lower bounds and by designing algorithms that match these bounds up to logarithmic factors, for a wide family of templates.

2603.08771 2026-03-13 stat.ML cs.IT cs.LG math.IT

Micro-Diffusion Compression - Binary Tree Tweedie Denoising for Online Probability Estimation

Roberto Tacconelli

Comments 12 pages, 1 figure

Abstract

We present Midicoth, a lossless compression system that introduces a micro-diffusion denoising layer for improving probability estimates produced by adaptive statistical models. In compressors such as Prediction by Partial Matching (PPM), probability estimates are smoothed by a prior to handle sparse observations. When contexts have been seen only a few times, this prior dominates the prediction and produces distributions that are significantly flatter than the true source distribution, leading to compression inefficiency. Midicoth addresses this limitation by treating prior smoothing as a shrinkage process and applying a reverse denoising step that corrects predicted probabilities using empirical calibration statistics. To make this correction data-efficient, the method decomposes each byte prediction into a hierarchy of binary decisions along a bitwise tree. This converts a single 256-way calibration problem into a sequence of binary calibration tasks, enabling reliable estimation of correction terms from relatively small numbers of observations. The denoising process is applied in multiple successive steps, allowing each stage to refine residual prediction errors left by the previous one. The micro-diffusion layer operates as a lightweight post-blend calibration stage applied after all model predictions have been combined, allowing it to correct systematic biases in the final probability distribution. Midicoth combines five fully online components: an adaptive PPM model, a long-range match model, a trie-based word model, a high-order context model, and the micro-diffusion denoiser applied as the final stage.
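
The bitwise decomposition mentioned above can be illustrated with a short sketch that turns one 256-way byte distribution into eight conditional binary probabilities along a most-significant-bit-first tree. It shows only the decomposition step, not Midicoth's denoising correction; the function name `bit_path_probabilities` is hypothetical, and the input is assumed to be strictly positive (as it would be after prior smoothing).

```python
import numpy as np

def bit_path_probabilities(probs, byte):
    """Probability of each observed bit given the higher bits, MSB first."""
    probs = np.asarray(probs, dtype=float)
    lo, hi = 0, 256                      # current subtree covers symbols [lo, hi)
    path = []
    for k in range(7, -1, -1):
        mid = (lo + hi) // 2
        total = probs[lo:hi].sum()
        p_one = probs[mid:hi].sum() / total   # P(bit k = 1 | higher bits)
        bit = (byte >> k) & 1
        path.append(p_one if bit else 1.0 - p_one)
        lo, hi = (mid, hi) if bit else (lo, mid)
    return path

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(np.ones(256))
    path = bit_path_probabilities(probs, byte=0x41)
    # The product of the eight binary probabilities recovers the byte probability.
    print(np.prod(path), probs[0x41])
```

Each of the eight nodes on the path is a binary prediction, so a calibration table indexed by tree node only needs to estimate one probability per node rather than a full 256-way distribution.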

2603.06506 2026-03-13 stat.ML cs.LG

Semantics-Aware Caching for Concept Learning

Louis Mozart Kamdem Teyou, Caglar Demir, Axel-Cyrille Ngonga Ngomo

Abstract

Concept learning is a form of supervised machine learning that operates on knowledge bases in description logics. State-of-the-art concept learners often rely on an iterative search through a countably infinite concept space. In each iteration, they retrieve instances of candidate solutions to select the best concept for the next iteration. While simple learning problems might require a few dozen instance retrieval calls to find a fitting solution, complex learning problems might necessitate thousands of calls. We alleviate the resulting runtime challenge by presenting a semantics-aware caching approach. Our cache is essentially a subsumption-aware map that links concepts to a set of instances via crisp set operations. Our experiments on 5 datasets with 4 symbolic reasoners, a neuro-symbolic reasoner, and 5 popular pagination policies demonstrate that our cache can reduce the runtime of concept retrieval and concept learning by an order of magnitude while being effective for both symbolic and neuro-symbolic reasoners.

2603.01470 2026-03-13 cs.LG stat.ML

Randomized Kriging Believer for Parallel Bayesian Optimization with Regret Bounds

Shuhei Sugiura, Ichiro Takeuchi, Shion Takeno

Abstract

We consider an optimization problem of an expensive-to-evaluate black-box function, in which we can obtain noisy function values in parallel. For this problem, parallel Bayesian optimization (PBO) is a promising approach, which aims to optimize with fewer function evaluations by selecting a diverse input set for parallel evaluation. However, existing PBO methods suffer from poor practical performance or lack theoretical guarantees. In this study, we propose a PBO method, called randomized kriging believer (KB), based on a well-known KB heuristic and inheriting the advantages of the original KB: low computational complexity, a simple implementation, versatility across various BO methods, and applicability to asynchronous parallelization. Furthermore, we show that our randomized KB achieves Bayesian expected regret guarantees. We demonstrate the effectiveness of the proposed method through experiments on synthetic and benchmark functions and emulators of real-world data.

2601.13010 2026-03-13 q-bio.PE stat.ME

Extracting useful information about reversible evolutionary processes from irreversible evolutionary accumulation models

Iain G. Johnston

Abstract

Evolutionary accumulation models (EvAMs) are an emerging class of machine learning methods designed to infer the evolutionary pathways by which features are acquired. Applications include cancer evolution (accumulation of mutations), anti-microbial resistance (accumulation of drug resistances), genome evolution (organelle gene transfers), and more diverse themes in biology and beyond. Following these themes, many EvAMs assume that features are gained irreversibly -- no loss of features can occur. Reversible approaches do exist but are often computationally (much) more demanding and statistically less stable. Our goal here is to explore whether useful information about evolutionary dynamics which are in reality reversible can be obtained from modelling approaches with an assumption of irreversibility. We identify, and use simulation studies to quantify, errors involved in neglecting reversible dynamics, and show the situations in which approximate results from tractable models can be informative and reliable. In particular, EvAM inferences about the relative orderings of acquisitions and the core dynamic structure of evolutionary pathways -- which features are likely present when another is acquired -- are robust to reversibility in many cases, while estimations of uncertainty and feature interactions are more error-prone.

2512.18492 2026-03-13 stat.ME

A Bayesian likely responder approach for the analysis of randomized controlled trials

Annan Deng, Carole Siegel, Hyung G. Park

Abstract

An important goal of precision medicine is to personalize medical treatment by identifying individuals who are most likely to benefit from a specific treatment. The Likely Responder (LR) framework, which identifies a subpopulation where treatment response is expected to exceed a certain clinical threshold, plays a role in this effort. However, the LR framework, and more generally, data-driven subgroup analyses, often fail to account for uncertainty in the estimation of model-based data-driven subgrouping. We propose a simple two-stage approach that integrates subgroup identification with subsequent subgroup-specific inference on treatment effects. We incorporate model estimation uncertainty from the first stage into subgroup-specific treatment effect estimation in the second stage, by utilizing Bayesian posterior distributions from the first stage. We evaluate our method through simulations, demonstrating that the proposed Bayesian two-stage model produces better calibrated confidence intervals than naïve approaches. We apply our method to an international COVID-19 treatment trial, which shows substantial variation in treatment effects across data-driven subgroups.

2511.00617 2026-03-13 cs.LG cs.AI cs.CL stat.ML

Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering

Eric Bigelow, Daniel Wurgaft, YingQiao Wang, Noah Goodman, Tomer Ullman, Hidenori Tanaka, Ekdeep Singh Lubana

Abstract

Large language models (LLMs) can be controlled at inference time through prompts (in-context learning) and internal activations (activation steering). Different accounts have been proposed to explain these methods, yet their common goal of controlling model behavior raises the question of whether these seemingly disparate methodologies can be seen as specific instances of a broader framework. Motivated by this, we develop a unifying, predictive account of LLM control from a Bayesian perspective. Specifically, we posit that both context- and activation-based interventions impact model behavior by altering its belief in latent concepts: steering operates by changing concept priors, while in-context learning leads to an accumulation of evidence. This results in a closed-form Bayesian model that is highly predictive of LLM behavior across context- and activation-based interventions in a set of domains inspired by prior work on many-shot in-context learning. This model helps us explain prior empirical phenomena - e.g., sigmoidal learning curves as in-context evidence accumulates - while predicting novel ones - e.g., additivity of both interventions in log-belief space, which results in distinct phases such that sudden and dramatic behavioral shifts can be induced by slightly changing intervention controls. Taken together, this work offers a unified account of prompt-based and activation-based control of LLM behavior, and a methodology for empirically predicting the effects of these interventions.
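
The qualitative picture in this abstract (steering shifts the prior, in-context examples accumulate evidence, and the two add in log-belief space) can be illustrated with a toy two-hypothesis model. The functional form below is an assumption chosen for simplicity, with a constant log-likelihood ratio per example; the paper's fitted closed-form model may differ, and the function name and all parameter values are hypothetical.

```python
import math

def belief(n_examples, steering=0.0, prior_log_odds=-4.0, evidence_per_example=0.5):
    """Posterior belief in a latent concept under a toy additive log-odds model."""
    # Assumed form: steering moves the prior, each example adds a fixed amount of evidence.
    log_odds = prior_log_odds + steering + n_examples * evidence_per_example
    return 1.0 / (1.0 + math.exp(-log_odds))

if __name__ == "__main__":
    for n in (0, 4, 8, 12, 16):
        print(n, round(belief(n), 3), round(belief(n, steering=2.0), 3))
```

Under these assumptions the belief follows a sigmoid in the number of examples, and a steering shift of 2.0 has the same effect as four extra examples, so a small change in either control near the midpoint produces a large change in behavior.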

2510.05440 2026-03-13 stat.ML cs.CR cs.LG

Refereed Learning

Ran Canetti, Ephraim Linder, Connor Wagaman

Abstract

We initiate an investigation of learning tasks in a setting where the learner is given access to two competing provers, only one of which is honest. Specifically, we consider the power of such learners in assessing purported properties of opaque models. Following prior work in complexity theory that considers the power of competing provers in various settings, we call this setting refereed learning. After formulating a general definition of refereed learning tasks, we show refereed learning protocols that obtain a level of accuracy that far exceeds what is obtainable at comparable cost without provers, or even with a single prover. We concentrate on the task of choosing the better one out of two black-box models, with respect to some ground truth. While we consider a range of parameters, perhaps our most notable result is in the high-precision range: For all $\varepsilon>0$ and ambient dimension $d$, our learner makes only one query to the ground truth function, communicates only $(1+\frac{1}{\varepsilon^2})\cdot\text{poly}(d)$ bits with the provers, and outputs a model whose loss is within a multiplicative factor of $(1+\varepsilon)$ of the best model's loss. Obtaining comparable loss with a single prover would require the learner to access the ground truth at almost all of the points in the domain. We also present lower bounds that demonstrate the optimality of our protocols in a number of respects, including prover complexity, number of samples, and need for query access.

2510.04579 2026-03-13 cs.LG math.MG stat.ML

Busemann Functions in the Wasserstein Space: Existence, Closed-Forms, and Applications to Slicing

Clément Bonet, Elsa Cazelles, Lucas Drumetz, Nicolas Courty

Comments Published as a conference paper at AISTATS 2026

Abstract

The Busemann function has recently found much interest in a variety of geometric machine learning problems, as it naturally defines projections onto geodesic rays of Riemannian manifolds and generalizes the notion of hyperplanes. As several sources of data can be conveniently modeled as probability distributions, it is natural to study this function in the Wasserstein space, which carries a rich formal Riemannian structure induced by Optimal Transport metrics. In this work, we investigate the existence and computation of Busemann functions in Wasserstein space, which admits geodesic rays. We establish closed-form expressions in two important cases: one-dimensional distributions and Gaussian measures. These results enable explicit projection schemes for probability distributions on $\mathbb{R}$, which in turn allow us to define novel Sliced-Wasserstein distances over Gaussian mixtures and labeled datasets. We demonstrate the efficiency of those original schemes on synthetic datasets as well as transfer learning problems.
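
As background for the slicing application, the sketch below shows the standard closed forms for the 2-Wasserstein distance in one dimension, which is what makes sliced constructions cheap. It does not compute Busemann functions and is not the paper's projection scheme; the function names are illustrative, and the empirical version assumes both samples have the same size and uniform weights.

```python
import math


def w2_empirical_1d(x, y):
    """2-Wasserstein distance between two equal-size 1D samples.

    In one dimension the optimal coupling matches sorted values (quantiles).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes samples of equal size")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def w2_gaussian_1d(mean1, std1, mean2, std2):
    """2-Wasserstein distance between N(mean1, std1^2) and N(mean2, std2^2)."""
    return math.sqrt((mean1 - mean2) ** 2 + (std1 - std2) ** 2)
```

A sliced distance would average such one-dimensional quantities over many projection directions; that step is omitted here.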

2509.22961 2026-03-13 stat.ME

Measuring capacities in multimodal maritime port systems with anchorage queues

Debojjal Bagchi, Kyle Bathgate, Kenneth N. Mitchell, Magdalena I. Asborno, Marin M. Kress, Stephen D. Boyles

Abstract

This paper presents a framework for estimating the capacity of a multimodal maritime port system handling vessels of multiple classes. Port system capacity can be categorized into two distinct types: operating capacity, defined as the maximum number of vessels that can be processed over an extended period under stable operating conditions, and ultimate capacity, defined as the absolute maximum vessel throughput achievable irrespective of stability. Distinguishing between these two capacity measures is critical for long-term planning and resilience analysis, as ports may temporarily operate above sustainable levels following disruptions or during demand surges. Despite the importance of this distinction, existing port capacity models generally do not provide methods to compute port-level capacity estimates that clearly differentiate between operating and ultimate capacity. We introduce methods to estimate both capacity measures for seaport systems. We apply the proposed framework using the Port of Houston, Texas as a case study. Operating capacity is estimated using a parsimonious queueing-theoretic model, while ultimate capacity is estimated by fitting an ordinary differential equation model to simulation outputs. We estimate an operating capacity of approximately 0.9 vph and an ultimate capacity of approximately 1.4 vph for the Port of Houston. Sensitivity analysis of key port resources indicates that liquid-bulk terminals constitute the primary bottlenecks under stable operating conditions, whereas pilot availability becomes the dominant bottleneck following disruptions. These methods can be used in port planning to determine the expected operational and resilience gains of a given infrastructure intervention, or to identify bottlenecks in a complex, multimodal port environment.

2509.11821 2026-03-13 stat.ME astro-ph.IM physics.data-an

Covering Unknown Correlations in Bayesian Priors by Inflating Uncertainties

Lukas Koch

Comments 5 pages, added citations and acknowledgments

Journal ref 2026 JINST 21 P01040
Abstract

Bayesian analyses require that all variable model parameters are given a prior probability distribution. This can pose a challenge for analyses where multiple experiments are combined if these experiments use different parametrisations for their nuisance parameters. If the parameters in the two models describe exactly the same physics, they should be 100% correlated in the prior. If the parameters describe independent physics, they should be uncorrelated. But if they describe related or overlapping physics, it is not trivial to determine what the joint prior distribution should look like. Even if the priors for each experiment are well motivated, the unknown correlations between them can have unintended consequences for the posterior probability of the parameters of interest, potentially leading to underestimated uncertainties. In this paper we show that it is possible to choose a prior parametrisation that ensures conservative posterior uncertainties for the parameters of interest under some very general assumptions.

2509.02337 2026-03-13 stat.ML cs.LG math.ST stat.TH

Distribution estimation via Flow Matching with Lipschitz guarantees

Lea Kunkel

Abstract

Flow Matching, a promising approach in generative modeling, has recently gained popularity. Relying on ordinary differential equations, it offers a simple and flexible alternative to diffusion models, which are currently the state of the art. Despite its empirical success, the mathematical understanding of its statistical power is so far very limited. This is largely due to the sensitivity of theoretical bounds to the Lipschitz constant of the vector field that drives the ODE. In this work, we study the assumptions that lead to controlling this dependency. Based on these results, we derive a convergence rate for the Wasserstein-$1$ distance between the estimated distribution and the target distribution which improves on previous results in high-dimensional settings. This rate applies to certain classes of unbounded distributions and, in particular, does not require log-concavity.

2504.12760 2026-03-13 stat.ME

Robust Covariate Adjustment in Multi-Center Randomized Trials

Muluneh Alene, Stijn Vansteelandt, Kelly Van Lancker

Abstract

Augmented inverse probability weighting and G-computation with canonical generalized linear models have become increasingly popular for estimating average treatment effects (ATEs) in randomized experiments. These methods leverage outcome prediction models to adjust for imbalances in baseline covariates across treatment arms, improving power compared to unadjusted analyses, while controlling Type I error, even when models are misspecified. In multi-center trials they are often implemented without accounting for clustering by centers. We investigate how ignoring center-level correlation can impair estimation, degrade coverage of confidence intervals, and obscure interpretation. We find these issues to be especially acute for estimators of counterfactual means, as shown through simulations and clarified via theoretical arguments. To address these challenges, we develop semiparametric efficient estimators of counterfactual means and ATE defined for a randomly sampled center and patient. These estimators leverage outcome prediction models to improve efficiency yet retain large-sample unbiasedness under model misspecification. We further introduce an inference framework, inspired by random-effects meta-analysis, tailored to settings with many small centers. Incorporating center effects into the prediction models yields substantial efficiency gains, particularly when treatment effects vary across centers. Simulations and application to the WASH Benefits Bangladesh trial illustrate strong finite-sample performance of the proposed methods.

2502.13698 2026-03-13 stat.ME stat.AP

Multi-view biclustering via non-negative matrix tri-factorisation

Ella S. C. Orme, Theodoulos Rodosthenous, Marina Evangelou

Journal ref Pattern Recognition. 172 (Part B), 112454 (2026)
Abstract

Multi-view data is ever more apparent as methods for production, collection and storage of data become more feasible both practically and fiscally. However, not all features are relevant to describe the patterns for all individuals. Multi-view biclustering aims to simultaneously cluster both rows and columns, discovering clusters of rows as well as their view-specific identifying features. A novel multi-view biclustering approach based on non-negative matrix factorisation, named ResNMTF, is proposed. Extensive experiments on both synthetic and real datasets demonstrate that ResNMTF successfully identifies both overlapping and non-exhaustive biclusters, without pre-existing knowledge of the number of biclusters present, and is able to incorporate any combination of shared dimensions across views. Further, to address the lack of a suitable bicluster-specific intrinsic measure, the popular silhouette score is extended to the bisilhouette score. The bisilhouette score is demonstrated to align well with known extrinsic measures, and proves useful as a tool for hyperparameter tuning as well as visualisation.

2502.13325 2026-03-13 q-fin.RM math.PR math.ST q-fin.MF stat.TH

Arbitrage-free catastrophe reinsurance valuation for compound dynamic contagion claims

Jiwook Jang, Patrick J. Laub, Tak Kuen Siu, Hongbiao Zhao

Abstract

In this paper, we consider catastrophe stop-loss reinsurance valuation for a reinsurance company with dynamic contagion claims. To deal with conventional and emerging catastrophic events, we propose the use of a compound dynamic contagion process for the catastrophic component of the liability. Under the premise that there is an absence of arbitrage opportunity in the market, we obtain arbitrage-free premiums for these contracts. To this end, the Esscher transform is adopted to specify an equivalent martingale probability measure. We show that reinsurers have various ways of levying the security loading on the net premiums to quantify the catastrophic liability in light of the growing challenges posed by emerging risks arising from climate change, cyberattacks, and pandemics. We numerically compare arbitrage-free catastrophe stop-loss reinsurance premiums via the Monte Carlo simulation method. We also compare them with those from generalised compound Hawkes/compound Cox cases. Sensitivity analyses are performed by changing the retention level, the Esscher parameters and the intensity parameters.

2412.20555 2026-03-13 stat.ME

Parameter-Specific Bias Diagnostics in Random-Effects Panel Data Models

Andrew T. Karl

Abstract

The Hausman specification test assesses the random-effects specification by comparing the random-effects estimator with a fixed-effects alternative. This note shows how a recently proposed bias diagnostic for linear mixed models can complement that test in random-effects panel-data applications. The diagnostic delivers parameter-specific internal estimates of finite-sample bias, together with permutation-based $p$-values, from a single fitted random-effects model. We illustrate its use in a gasoline-demand panel and in a value-added model for teacher evaluation using publicly available R packages, and we discuss how the resulting coefficient-specific bias summaries can be incorporated into routine practice.

2412.12213 2026-03-13 cs.LG q-fin.CP stat.ML

Finance-Informed Neural Network: Learning the Geometry of Option Pricing

Amine M. Aboussalah, Xuanze Li, Cheng Chi, Raj Patel

Abstract

We propose a Finance-Informed Neural Network (FINN) for option pricing and hedging that integrates financial theory directly into machine learning. Instead of training on observed option prices, FINN is learned through a self-supervised replication objective based on dynamic hedging, ensuring economic consistency by construction. We show theoretically that minimizing replication error recovers the arbitrage-free pricing operator and yields economically meaningful sensitivities. Empirically, FINN accurately recovers classical Black–Scholes prices and performs robustly in stochastic volatility environments, including the Heston model, while remaining stable in settings where analytical solutions are unavailable or unreliable. Fundamental pricing relationships such as put–call parity emerge endogenously. When applied to implied-volatility surface reconstruction, FINN produces surfaces that are consistently closer to observed market-implied volatilities than those obtained from Heston calibrations, indicating superior out-of-sample adaptability and reduced structural bias. Importantly, FINN extends beyond liquid option markets: it can be trained directly on historical spot prices to construct coherent option prices and Greeks for assets with no listed options. More broadly, FINN defines a new paradigm for financial pricing, in which prices are learned from replication and risk-control principles rather than inferred from parametric assumptions or direct supervision on option prices. By reframing option pricing as the learning of a pricing operator rather than the fitting of prices, FINN offers practitioners a practical and scalable tool for pricing, hedging, and risk management across both established and emerging financial markets.

2410.16004 2026-03-13 math.ST math.PR stat.ML stat.TH

Are Bayesian networks typically faithful?

Philip Boeken, Patrick Forré, Joris M. Mooij

Abstract

Faithfulness is a common assumption in causal inference, often motivated by the fact that the faithful parameters of linear Gaussian and discrete Bayesian networks are typical, and the folklore belief that this should also hold for other classes of Bayesian networks. We address this open question by showing that among all Bayesian networks over a given DAG, the faithful Bayesian networks are indeed 'typical': they constitute a dense, open set with respect to the total variation metric. This does not directly imply that faithfulness is typical in restricted classes of Bayesian networks that are often considered in statistical applications. To this end we consider the class of Bayesian networks parametrised by conditional exponential families, for which we show that under regularity conditions, the faithful parameters constitute a dense and open set, the unfaithful parameters have Lebesgue measure zero, and the induced faithful distributions are open and dense in the weak topology. This extends the existing results for linear Gaussian and discrete Bayesian networks. We also show for nonparametric classes of Bayesian networks with uniformly equicontinuous and uniformly bounded conditional densities that the faithful Bayesian networks are open and dense in the weak topology. All these results also hold for Bayesian networks with latent variables, if faithfulness is only required to hold with respect to the latent projection. Finally, for the considered conditional exponential family parametrisations and nonparametric conditional density models, the topological properties of conditional independence imply the existence of a consistent conditional independence test. Together with the topological properties of faithfulness, this implies that sound constraint-based causal discovery algorithms like PC and FCI are consistent on an open and dense, and hence 'typical', set of Bayesian networks.

2410.08009 2026-03-13 stat.AP

Quasi-average predictions and regression to the trend: an application to the M6 financial forecasting competition

Jose M. G. Vilar

Comments 12 pages, 5 figures

Journal ref International Journal of Forecasting, Volume 41, Issue 4, 2025, Pages 1505-1513
Abstract

The efficient market hypothesis considers all available information already reflected in asset prices and limits the possibility of consistently achieving above-average returns by trading on publicly available data. We analyzed low dispersion prediction methods and their application to the M6 financial forecasting competition. Predictive averages and regression to the trend offer slight but potentially consistent advantages over the reference indexes. We put these results in the context of high variability approaches, which, if not accompanied by high information content, are bound to underperform the benchmark index as they are prone to overfit the past. In general, predicting the expected values under high uncertainty conditions, such as those assumed by the efficient market hypothesis, is more effective on average than trying to predict actual values.

2409.07412 2026-03-13 cs.LG stat.ML

Geometry of Singular Foliations and Learning Manifolds in ReLU Networks via the Data Information Matrix

Eliot Tron, Rita Fioresi

Abstract

Understanding how real data is distributed in high-dimensional spaces is the key to many tasks in machine learning. We want to provide a natural geometric structure on the space of data employing a ReLU neural network trained as a classifier. Through the Data Information Matrix (DIM), a variation of the Fisher information matrix, the model will discern a singular foliation structure on the space of data. We show that the singular points of such a foliation are contained in a measure-zero set, and that a local regular foliation exists almost everywhere. Experiments show that the data is correlated with the leaves of such a foliation. Moreover, we show the potential of our approach for knowledge transfer by analyzing the spectrum of the DIM to measure distances between datasets.

2312.05169 2026-03-13 q-fin.PM cs.NA math.NA q-fin.CP stat.ML

Onflow: a model free, online portfolio allocation algorithm robust to transaction fees

Gabriel Turinici, Pierre Brugiere

Abstract

We introduce Onflow, a reinforcement learning method for optimizing portfolio allocation via gradient flows. Our approach dynamically adjusts portfolio allocations to maximize expected log returns while accounting for transaction costs. Using a softmax parameterization, Onflow updates allocations through an ordinary differential equation derived from gradient flow methods. This algorithm belongs to the large class of stochastic optimization procedures; we measure its efficiency by comparing our results to the theoretical values in a log-normal framework and to standard benchmarks from the 'old NYSE' dataset. For log-normal assets with zero transaction costs, Onflow replicates the Markowitz optimal portfolio, achieving the best possible allocation. Numerical experiments on the 'old NYSE' dataset show that Onflow leads to dynamic asset allocation strategies whose performance is: a) comparable to benchmark strategies such as Cover's Universal Portfolio or the "multiplicative updates" approach of Helmbold et al. when transaction costs are zero, and b) better than previous procedures when transaction costs are high. Onflow can even remain efficient in regimes where other dynamic allocation techniques no longer work. Onflow is a promising portfolio management strategy that relies solely on observed prices, requiring no assumptions about asset return distributions. This makes it robust against model risk, offering a practical solution for real-world trading strategies.
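
To make the softmax parameterization concrete, here is a minimal sketch of online gradient ascent on the log return, with weights given by a softmax over unconstrained parameters. This is an explicit Euler discretization under simplifying assumptions (zero transaction costs, a fixed step size) and is not the Onflow update itself; the function names, the step size, and the price relatives are all hypothetical.

```python
import math


def softmax(theta):
    """Map unconstrained parameters to portfolio weights that sum to one."""
    top = max(theta)
    exps = [math.exp(t - top) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def log_return_gradient(theta, price_relatives):
    """Gradient of log(w . r) with respect to theta, where w = softmax(theta)."""
    weights = softmax(theta)
    growth = sum(w * r for w, r in zip(weights, price_relatives))
    # d/d theta_k of log(w . r) equals w_k * (r_k - w . r) / (w . r)
    return [w * (r - growth) / growth for w, r in zip(weights, price_relatives)]


def allocate_online(price_relatives_seq, n_assets, step=0.1):
    """Take one gradient-ascent step per observed period; no transaction costs."""
    theta = [0.0] * n_assets
    for relatives in price_relatives_seq:
        grad = log_return_gradient(theta, relatives)
        theta = [t + step * g for t, g in zip(theta, grad)]
    return softmax(theta)


# Hypothetical data: asset 0 gains 10% each period, asset 1 loses 10%.
final_weights = allocate_online([[1.1, 0.9]] * 200, n_assets=2)
```

Because the weights come from a softmax, they stay positive and sum to one without any projection step, which is the main appeal of the parameterization in a sketch like this.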

2307.11465 2026-03-13 cs.LG stat.AP

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

Camillo Maria Caruso, Valerio Guarrasi, Sara Ramella, Paolo Soda

Comments 24 pages, 4 figures

Journal ref Computer Methods and Programs in Biomedicine 254 (2024) 108308
Abstract

In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available data. By making use of losses designed ad hoc for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities, obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
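
For readers unfamiliar with the metric, the sketch below computes the plain (Harrell's) C-index with right-censoring. It is not the time-dependent Ct-index reported above and is not the authors' evaluation code; the function name and toy data are illustrative. A pair of patients is comparable only when the one with the shorter time actually experienced the event.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 3.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; the abstract's figures are on a 0 to 100 scale.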

2110.11149 2026-03-13 stat.ME math.ST stat.TH

Asymptotics of cut distributions and robust modular inference using Posterior Bootstrap

Emilia Pompe, Mikołaj J. Kasprzak, Pierre E. Jacob

Comments Major revision, including new results on the control of the Laplace approximation error for cut posteriors

Abstract

Bayesian inference provides a framework to combine various model components with shared parameters, allowing joint uncertainty estimation and the use of all available data sources. Unfortunately, misspecification of any part of the model might propagate to all other parts and can lead to unsatisfactory results. Cut distributions have been proposed as a remedy, where the information is prevented from flowing along certain directions. We study cut distributions from an asymptotic perspective and obtain a Bernstein-von Mises theorem, as well as a Laplace approximation with quantitative bounds. We then propose an algorithm based on the Posterior Bootstrap that delivers credible regions with the nominal frequentist asymptotic coverage. The proposed methods are illustrated with numerical experiments in a variety of examples, including causal inference with propensity scores.

2109.02236 2026-03-13 stat.ME

Predictive Distributions and the Transition from Sparse to Dense Functional Data

Álvaro Gajardo, Xiongtao Dai, Hans-Georg Müller

Abstract

A representation of Gaussian distributed sparsely sampled longitudinal data in terms of predictive distributions for their functional principal component scores (FPCs) maps available data for each subject to a multivariate Gaussian predictive distribution. Of special interest is the case where the number of observations per subject increases in the transition from sparse (longitudinal) to dense (functional) sampling of underlying stochastic processes. We study the convergence of the predicted scores given noisy longitudinal observations towards the true but unobservable FPCs, and under Gaussianity demonstrate the shrinkage of the entire predictive distribution towards a point mass located at the true FPCs and also extensions to the shrinkage of functional $K$-truncated predictive distributions when the truncation point $K=K(n)$ diverges with sample size $n$. To address the problem of non-consistency of point predictions, we construct predictive distributions aimed at predicting outcomes for the case of sparsely sampled longitudinal predictors in functional linear models and derive asymptotic rates of convergence for the $2$-Wasserstein metric between true and estimated predictive distributions. Predictive distributions are illustrated for longitudinal data from the Baltimore Longitudinal Study of Aging.