arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2604.06169 2026-04-08 cs.LG cs.AI cs.CL stat.ML

In-Place Test-Time Training

Guhao Feng, Shengjie Luo, Kai Hua, Ge Zhang, Di He, Wenhao Huang, Tianle Cai

Comments ICLR 2026 Oral Presentation; Code is released at https://github.com/ByteDance-Seed/In-Place-TTT

The static "train then deploy" paradigm fundamentally limits Large Language Models (LLMs) from dynamically adapting their weights in response to continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time, yet its potential in the current LLM ecosystem is hindered by critical barriers including architectural incompatibility, computational inefficiency and misaligned fast weight objectives for language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective explicitly aligned with the Next-Token-Prediction task governing autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, results in a highly scalable algorithm compatible with context parallelism. Extensive experiments validate our framework's effectiveness: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts up to 128k, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation study results provide further insight into our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.

2604.06123 2026-04-08 stat.CO cs.LG econ.EM stat.ME

A Large-Scale Empirical Comparison of Meta-Learners and Causal Forests for Heterogeneous Treatment Effect Estimation in Marketing Uplift Modeling

Aman Singh

Comments 6 pages

Estimating Conditional Average Treatment Effects (CATE) at the individual level is central to precision marketing, yet systematic benchmarking of uplift modeling methods at industrial scale remains limited. We present UpliftBench, an empirical evaluation of four CATE estimators: S-Learner, T-Learner, X-Learner (all with LightGBM base learners), and Causal Forest (EconML), applied to the Criteo Uplift v2.1 dataset comprising 13.98 million customer records. The near-random treatment assignment (propensity AUC = 0.509) provides strong internal validity for causal estimation. Evaluated via Qini coefficient and cumulative gain curves, the S-Learner achieves the highest Qini score of 0.376, with the top 20% of customers ranked by predicted CATE capturing 77.7% of all incremental conversions, a 3.9x improvement over random targeting. SHAP analysis identifies f8 as the dominant heterogeneous treatment effect (HTE) driver among the 12 anonymized covariates. Causal Forest uncertainty quantification reveals that 1.9% of customers are confident persuadables (lower 95% CI > 0) and 0.1% are confident sleeping dogs (upper 95% CI < 0). Our results provide practitioners with evidence-based guidance on method selection for large-scale uplift modeling pipelines.
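
The 3.9x figure follows from the two percentages quoted in the abstract: targeting 20% of customers at random would be expected to capture about 20% of incremental conversions, so the lift is the ratio of the two shares. A minimal Python check of that arithmetic (the function name `lift_over_random` is ours, not from the paper):

```python
def lift_over_random(captured_share: float, targeted_share: float) -> float:
    """Ratio of the captured share of incremental conversions to the share
    expected when the same fraction of customers is targeted at random."""
    if not 0 < targeted_share <= 1:
        raise ValueError("targeted_share must be in (0, 1]")
    return captured_share / targeted_share


# Figures quoted in the abstract: top 20% of customers capture 77.7%.
lift = lift_over_random(captured_share=0.777, targeted_share=0.20)
print(round(lift, 1))  # 3.9
```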

2604.06116 2026-04-08 q-fin.ST econ.EM q-fin.RM stat.ME stat.ML

Sequential Audit Sampling with Statistical Guarantees

Masahiro Kato, Kei Nakagawa

Financial statement auditing is conducted under a risk-based evidence approach to obtain reasonable assurance. In practice, auditors often perform additional sampling or related procedures when an initial sample does not provide a sufficient basis for a conclusion. Across jurisdictions, current standards and practice manuals acknowledge such extensions, while the statistical design of sequential audit procedures has not been fully explored. This study formulates audit sampling with additional, sequentially collected items as a sequential testing problem for a finite population under sampling without replacement. We define null and alternative hypotheses in terms of a tolerable deviation rate, specify stopping and decision rules, and formulate exact sequential boundary conditions in terms of finite-population error probabilities. For practical implementation, we calibrate those boundaries by Monte Carlo simulation at least-favorable deviation rates. The exact design yields ex ante control of decision error probabilities, and the simulation-based implementation approximates that design while allowing the computation of expected stopping times. The framework is most naturally suited to attribute auditing and deviation-rate auditing, especially tests of controls, and it can be extended to one-sided, two-stage, and truncated designs.
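
To make the finite-population setting concrete: when n items are drawn without replacement from a population containing a fixed number of deviations, the number of deviations in the sample is hypergeometric. The Python sketch below computes the probability of observing at most c deviations for a single fixed-size sample. It is only the building block, not the sequential boundaries or the Monte Carlo calibration described in the abstract, and the function and parameter names are hypothetical.

```python
from math import comb


def prob_at_most(c: int, n: int, population: int, deviations: int) -> float:
    """P(X <= c) where X counts deviations in a sample of size n drawn
    without replacement from `population` items, `deviations` of which deviate."""
    if not 0 <= deviations <= population or not 0 <= n <= population:
        raise ValueError("need 0 <= deviations <= population and 0 <= n <= population")
    total = comb(population, n)
    favourable = sum(
        comb(deviations, k) * comb(population - deviations, n - k)
        for k in range(0, min(c, n, deviations) + 1)
    )
    return favourable / total


# Small example: 10 items, 3 deviations, sample of 2, zero deviations observed.
# C(7, 2) / C(10, 2) = 21 / 45.
print(prob_at_most(c=0, n=2, population=10, deviations=3))
```

Evaluating this probability at the tolerable deviation rate gives the chance of accepting a population that sits exactly at the boundary of the null hypothesis, which is the kind of error probability a sampling plan is designed to control.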

2604.06065 2026-04-08 math.ST math.PR stat.ML stat.TH

Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities

Arthur Stéphanovitch

Under general assumptions on the target distribution $p^\star$, we establish a sharp Lipschitz regularity theory for flow-matching vector fields and diffusion-model scores, with optimal dependence on time and dimension. As applications, we obtain Wasserstein discretization bounds for Euler-type samplers in dimension $d$: with $N$ discretization steps, the error achieves the optimal rate $\sqrt{d}/N$ up to logarithmic factors. Moreover, the constants do not deteriorate exponentially with the spatial extent of $p^\star$. We also show that the one-sided Lipschitz control yields a globally Lipschitz transport map from the standard Gaussian to $p^\star$, which implies Poincaré and log-Sobolev inequalities for a broad class of probability measures.

2604.06032 2026-04-08 stat.ML cs.LG

Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification

Courtney Franzen, Farhad Pourkamali-Anaraki

Comments 48 pages

Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.
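
As an illustration of the idea, the sketch below fits a Dirichlet to an ensemble of softmax vectors for one input using a textbook method-of-moments estimator. It relies on the identity Var(p_k) = mu_k (1 - mu_k) / (alpha_0 + 1), where alpha_0 is the sum of the Dirichlet parameters. The paper's exact estimator and its maximum-likelihood refinement may differ; the function name `dirichlet_mom` and the choice to average the per-class precision estimates are assumptions made here.

```python
from statistics import fmean, pvariance


def dirichlet_mom(ensemble_probs):
    """Method-of-moments Dirichlet fit.

    ensemble_probs: list of softmax vectors, one per ensemble member,
    all for the same input. Returns the list of Dirichlet parameters.
    """
    n_classes = len(ensemble_probs[0])
    means = [fmean(p[k] for p in ensemble_probs) for k in range(n_classes)]
    variances = [pvariance([p[k] for p in ensemble_probs]) for k in range(n_classes)]
    # Each class with nonzero variance yields one estimate of alpha_0;
    # averaging them is an illustrative choice, not the paper's rule.
    estimates = [m * (1 - m) / v - 1 for m, v in zip(means, variances) if v > 0]
    if not estimates:
        raise ValueError("ensemble members agree exactly; precision is unbounded")
    alpha0 = fmean(estimates)
    return [m * alpha0 for m in means]


ensemble = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
print(dirichlet_mom(ensemble))
```

A large alpha_0 means the ensemble members agree closely, while a small alpha_0 signals disagreement and therefore higher predictive uncertainty.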

2604.05993 2026-04-08 cs.LG stat.ML

Data Distribution Valuation Using Generalized Bayesian Inference

Cuong N. Nguyen, Cuong V. Nguyen

Comments Paper published at AISTATS 2026

We investigate the data distribution valuation problem, which aims to quantify the values of data distributions from their samples. This is a recently proposed problem that is related to but different from classical data valuation and has a range of applications. For this problem, we develop a novel framework called Generalized Bayes Valuation that utilizes generalized Bayesian inference with a loss constructed from transferability measures. This framework allows us to solve, in a unified way, seemingly unrelated practical problems, such as annotator evaluation and data augmentation. Using Bayesian principles, we further broaden the applicability of our framework by extending it to the continuous data stream setting. Our experimental results confirm the effectiveness and efficiency of our framework in different real-world scenarios.

2604.05974 2026-04-08 stat.ME

Nonparametric Statistical Inference for Multivariate Niche Overlap

Jonas Beck, Solomon Harrar

In ecological studies, niche overlap is often used to quantify species interaction and dynamics. This paper develops a robust, nonparametric statistical framework for quantifying and analyzing multivariate niche overlap. Parametric methods are often constrained by restrictive assumptions and tend to underperform in complex multivariate settings. We introduce a nonparametric overlap index and propose estimators for it. Further, we investigate asymptotic properties of the estimators. We also propose bootstrap-based inference procedures that enable statistical testing and simultaneous confidence intervals in small-sample settings. Extensive numerical examples demonstrate that our proposed methods maintain correct size and exhibit robust power across various scenarios. We illustrate the practical utility of our methodology using stable isotope measurements from multiple fish species and provide distinct ecological insights regarding species niche differentiation.

2604.05910 2026-04-08 math.PR math-ph math.AP math.FA math.MP math.ST stat.TH

Well-posedness and Hurst parameter estimation for fluid equations driven by fractional transport noise

Alexandra Blessing Neamtu, Dan Crisan, Oana Lang

Comments 43 pages

We study a two-dimensional incompressible vorticity equation on the torus driven by transport-type fractional Brownian noise with Hurst parameter $H \in (1/2,1)$. The model captures persistent, long-range correlated forcing consistent with inertial-range scaling laws and fractional Brownian approximations of turbulent fluctuations. A central ingredient of our approach is a version of the sewing lemma adapted to a class of integrands that includes, but is not limited to, transport-type structures. This result provides a flexible tool for constructing the Young integral and serves as a basis for analysing a wider class of stochastic partial differential equations. Using this approach, we establish existence and uniqueness of solutions via a fixed point argument and investigate statistical properties of the flow. In particular, we study quadratic functionals of the solution and derive an estimator for the Hurst parameter $H$.

2604.05842 2026-04-08 cs.LG cs.IT math.IT stat.ML

Expectation Maximization (EM) Converges for General Agnostic Mixtures

Avishek Ghosh

Comments Accepted at IEEE International Symposium on Information Theory (ISIT 2026)

Mixture of linear regression is well studied in statistics and machine learning, where the data points are generated probabilistically using $k$ linear models. Algorithms like Expectation Maximization (EM) may be used to recover the ground truth regressors for this problem. Recently, in \cite{pal2022learning,ghosh_agnostic}, the mixed linear regression problem was studied in the agnostic setting, where no generative model on data is assumed. Rather, given a set of data points, the objective is to fit $k$ lines by minimizing a suitable loss function. It is shown that a modification of EM, namely gradient EM, converges exponentially to an appropriately defined loss minimizer even in the agnostic setting. In this paper, we study the problem of fitting $k$ parametric functions to a given set of data points. We adhere to the agnostic setup. However, instead of fitting lines equipped with quadratic loss, we consider arbitrary parametric function fitting equipped with a strongly convex and smooth loss. This framework encompasses a large class of problems including mixed linear regression (regularized), mixed linear classifiers (mixed logistic regression, mixed Support Vector Machines) and mixed generalized linear regression. We propose and analyze gradient EM for this problem and show that with proper initialization and separation condition, the iterates of gradient EM converge exponentially to appropriately defined population loss minimizers with high probability. This shows the effectiveness of EM-type algorithms, which converge to the optimal solution in the non-generative setup beyond mixture of linear regression.

2604.04360 2026-04-08 stat.ME

Generalized win fraction regression for composite survival endpoints

Zhiqiang Cao, Xi Fang, Fan Li

We propose a generalized win fraction regression framework for prioritized composite survival outcomes. The framework models the conditional win fraction through a chosen link function (including identity, logit, or probit), thereby accommodating multi-component time-to-event endpoints within a unified regression structure. To handle right censoring, we construct inverse-probability-of-censoring-weighted estimating equations that target the win fraction as if censoring were absent. Under the identity link, regression parameters characterize covariate associations on the natural win fraction scale. Under the logit link, they characterize the log odds of winning -- a new and complementary effect measure that treats ties as failures to win, imposing a more conservative standard than the win ratio or win odds. When there are no ties, the logit win fraction model reduces to proportional win fraction regression; moreover, the unweighted version of our estimating equations numerically coincides with the proportional win fraction point estimator regardless of ties. We establish large-sample properties of the proposed estimators and derive a consistent sandwich variance estimator that accounts for uncertainty from the estimated censoring weights. Extensive simulations examine finite-sample performance across link functions and censoring rates, and our method is illustrated through a reanalysis of the HF-ACTION clinical trial.

2604.03541 2026-04-08 cs.LG stat.ML

Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks

Benjamin S. Knight, Ahsaas Bajaj

This study surveys the historical development of regularization, tracing its evolution from stepwise regression in the 1960s to recent advancements in formal error control, structured penalties for non-independent features, Bayesian methods, and l0-based regularization (among other techniques). We empirically evaluate the performance of four canonical frameworks -- Ridge, Lasso, ElasticNet, and Post-Lasso OLS -- across 134,400 simulations spanning a 7-dimensional manifold grounded in eight production-grade machine learning models. Our findings demonstrate that for prediction accuracy when the sample-to-feature ratio is sufficient (n/p >= 78), Ridge, Lasso, and ElasticNet are nearly interchangeable. However, we find that Lasso recall is highly fragile under multicollinearity; at high condition numbers (kappa) and low SNR, Lasso recall collapses to 0.18 while ElasticNet maintains 0.93. Consequently, we advise practitioners against using Lasso or Post-Lasso OLS at high kappa with small sample sizes. The analysis concludes with an objective-driven decision guide to assist machine learning engineers in selecting the optimal scikit-learn-supported framework based on observable feature space attributes.

2603.28917 2026-04-08 math.OC cs.LG cs.SY eess.SY stat.ML

Symmetrizing Bregman Divergence on the Cone of Positive Definite Matrices: Which Mean to Use and Why

Tushar Sial, Abhishek Halder

This work uncovers variational principles behind symmetrizing the Bregman divergences induced by generic mirror maps over the cone of positive definite matrices. We show that computing the canonical means for this symmetrization can be posed as minimizing the desired symmetrized divergences over a set of mean functionals defined axiomatically to satisfy certain properties. For the forward symmetrization, we prove that the arithmetic mean over the primal space is canonical for any mirror map over the positive definite cone. For the reverse symmetrization, we show that the canonical mean is the arithmetic mean over the dual space, pulled back to the primal space. Applying this result to three common mirror maps used in practice, we show that the canonical means for reverse symmetrization, in those cases, turn out to be the arithmetic, log-Euclidean and harmonic means. Our results improve understanding of existing symmetrization practices in the literature, and can be seen as a navigational chart to help decide which mean to use when.
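
For reference, the three means named in the abstract have the following standard definitions for positive definite matrices $A$ and $B$, and each can be written as an arithmetic mean in dual coordinates $\nabla\phi$ pulled back to the primal space. The pairing of specific mirror maps with specific means in the closing comment is our own illustration, worked out from the pullback formula; the three mirror maps the paper actually treats may differ.

```latex
% Standard means of two positive definite matrices A and B
\begin{align*}
M_{\mathrm{arith}}(A,B) &= \tfrac{1}{2}(A + B), \\
M_{\mathrm{logE}}(A,B)  &= \exp\!\Big(\tfrac{1}{2}\big(\log A + \log B\big)\Big), \\
M_{\mathrm{harm}}(A,B)  &= \Big(\tfrac{1}{2}\big(A^{-1} + B^{-1}\big)\Big)^{-1}.
\end{align*}
% Arithmetic mean over the dual space, pulled back to the primal space,
% for a mirror map \phi with invertible gradient
\[
M_{\phi}(A,B) = (\nabla\phi)^{-1}\!\Big(\tfrac{1}{2}\big(\nabla\phi(A) + \nabla\phi(B)\big)\Big).
\]
% Illustrative choices (assumed here, not taken from the paper):
%   \nabla\phi(X) = X        gives M_arith,
%   \nabla\phi(X) = \log X   gives M_logE,
%   \nabla\phi(X) = -X^{-1}  gives M_harm.
```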

2602.10370 2026-04-08 stat.ML cs.LG stat.ME

Causal Effect Estimation with Learned Instrument Representations

Frances Dean, Jenna Fields, Radhika Bhalerao, Marie Charpignon, Ahmed Alaa

Instrumental variable (IV) methods mitigate bias from unobserved confounding in observational causal inference but rely on the availability of a valid instrument, which can often be difficult or infeasible to identify in practice. In this paper, we propose a representation learning approach that constructs instrumental representations from observed covariates, which enable IV-based estimation even in the absence of an explicit instrument. Our model (ZNet) achieves this through an architecture that mirrors the structural causal model of IVs; it decomposes the ambient feature space into confounding and instrumental components, and is trained by enforcing empirical moment conditions corresponding to the defining properties of valid instruments (i.e., relevance, exclusion restriction, and instrumental unconfoundedness). Importantly, ZNet is compatible with a wide range of downstream two-stage IV estimators of causal effects. Our experiments demonstrate that ZNet can (i) recover ground-truth instruments when they already exist in the ambient feature space and (ii) construct latent instruments in the embedding space when no explicit IVs are available. Our work suggests that ZNet can be used as a module for causal inference in general observational settings.

2509.06076 2026-04-08 econ.GN q-fin.EC stat.AP

DETERring more than Deforestation: Environmental Enforcement Reduces Violence in the Amazon

Rafael Araujo, Vitor Possebom, Gabriela Setti

We estimate the impact of environmental law enforcement on violence in the Brazilian Amazon. The introduction of the Real-Time Deforestation Detection System (DETER), which enabled the government to monitor deforestation in real time and issue fines for illegal clearing, significantly reduced homicides in the region. To identify causal effects, we exploit exogenous variation in satellite monitoring generated by cloud cover as an instrument for enforcement intensity. Our estimates imply that the expansion of state presence through DETER prevented approximately 1,477 homicides per year, a 15% reduction in homicides. These results show that a replicable environmental enforcement policy produces social benefits.

2507.05084 2026-04-08 cs.LG stat.ML

Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Maria-Florina Balcan, Saumya Goyal, Dravyansh Sharma

Comments 55 pages

Modern regression problems often involve high-dimensional data, and careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.

2505.24868 2026-04-08 math.ST stat.ML stat.TH

Consistent line clustering using geometric hypergraphs

Kalle Alaluusua, Konstantin Avrachenkov, B. R. Vinay Kumar, Lasse Leskelä

Comments Major revision: new information-theoretic analysis for latent sampling laws concentrating near the intersection, recovery results for arbitrary fixed angles between the latent lines, revised spectral clustering guarantees, and substantial expository improvements (60 pages, 5 figures, 1 table)

Subspace clustering becomes inherently difficult near intersections, where points from different subspaces are barely separated. Most existing theoretical results address this issue by imposing separation or sampling assumptions that limit the statistical effect of points near the intersection. We study a minimal setting of two intersecting lines in which the latent sampling law places polynomially large mass in small neighborhoods of the intersection. We derive information-theoretic lower bounds for exact and almost exact recovery under Gaussian noise. In particular, we show that the exact-recovery threshold is determined by the rate at which the latent law concentrates near the intersection. Since any two points are collinear, pairwise information alone does not reveal whether they are sampled from the same latent line. We therefore construct a hypergraph in which nearly collinear triples form hyperedges, and study the resulting hypergraph similarity matrix. Under a simple regularity condition on the latent distribution, we introduce a spectral algorithm that achieves the information-theoretic bounds up to polylogarithmic factors.
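
The hypergraph construction can be pictured with a small Python sketch: every triple of planar points whose triangle has area below a tolerance becomes a hyperedge. The area criterion, the tolerance `tol`, and the function names are assumptions made for illustration; the paper's definition of "nearly collinear" and its similarity matrix may differ.

```python
from itertools import combinations


def triangle_area(p, q, r):
    """Area of the triangle with 2D vertices p, q, r (zero iff collinear)."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) / 2


def near_collinear_triples(points, tol):
    """Index triples (i, j, k), i < j < k, whose triangle area is at most tol."""
    return [
        (i, j, k)
        for (i, p), (j, q), (k, r) in combinations(enumerate(points), 3)
        if triangle_area(p, q, r) <= tol
    ]


# Three noisy points along the x-axis and three points on the y-axis,
# sharing the origin as the intersection point.
points = [(0, 0), (1, 0), (2, 0.01), (0, 1), (0, 2)]
print(near_collinear_triples(points, tol=0.05))  # [(0, 1, 2), (0, 3, 4)]
```

Note that an area threshold is not scale-invariant, and triples clustered tightly around the intersection have small area regardless of which lines they come from, which is exactly the regime the abstract identifies as hard.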

2404.01566 2026-04-08 econ.EM stat.ME

Heterogeneous Treatment Effects and Causal Mechanisms

Jiawei Fu, Tara Slough

The credibility revolution advances the use of research designs that permit identification and estimation of causal effects. However, understanding which mechanisms produce measured causal effects remains a challenge. The dominant current approach to the quantitative evaluation of mechanisms relies on the detection of heterogeneous treatment effects (HTEs) with respect to pre-treatment covariates. This paper develops a framework to understand when the existence of such heterogeneous treatment effects can support inferences about the activation of a mechanism. We show first that this design cannot provide evidence of mechanism activation without additional, generally implicit, exclusion assumptions. Further, even when these assumptions are satisfied, the presence of HTEs supports the inference that a mechanism is active, but the absence of HTEs is generally uninformative about mechanism activation. We provide novel guidance for interpretation and research design in light of these findings.

2604.05829 2026-04-08 cs.LG stat.ML

Bivariate Causal Discovery Using Rate-Distortion MDL: An Information Dimension Approach

Tiago Brogueira, Mário A. T. Figueiredo

Comments 22 pages

Approaches to bivariate causal discovery based on the minimum description length (MDL) principle approximate the (uncomputable) Kolmogorov complexity of the models in each causal direction, selecting the one with the lower total complexity. The premise is that nature's mechanisms are simpler in their true causal order. Inherently, the description length (complexity) in each direction includes the description of the cause variable and that of the causal mechanism. In this work, we argue that current state-of-the-art MDL-based methods do not correctly address the problem of estimating the description length of the cause variable, effectively leaving the decision to the description length of the causal mechanism. Based on rate-distortion theory, we propose a new way to measure the description length of the cause, corresponding to the minimum rate required to achieve a distortion level representative of the underlying distribution. This distortion level is deduced using rules from histogram-based density estimation, while the rate is computed using the related concept of information dimension, based on an asymptotic approximation. Combining it with a traditional approach for the causal mechanism, we introduce a new bivariate causal discovery method, termed rate-distortion MDL (RDMDL). We show experimentally that RDMDL achieves competitive performance on the Tübingen dataset. All the code and experiments are publicly available at github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data.

2604.05778 2026-04-08 math.DS physics.chem-ph stat.ML

Effective Dynamics and Transition Pathways from Koopman-Inspired Neural Learning of Collective Variables

Alexander Sikorski, Luca Donati, Marcus Weber, Christof Schütte

The ISOKANN (Invariant Subspaces of Koopman Operators Learned by Artificial Neural Networks) framework provides a data-driven route to extract collective variables (CVs) and effective dynamics from complex molecular systems. In this work, we integrate the theoretical foundation of Koopman operators with Krylov-like subspace algorithms and reduced dynamical modeling to build a coherent picture of how to describe metastable transitions in high-dimensional systems based on CVs. Starting from the identification of CVs based on dominant invariant subspaces, we derive the corresponding effective dynamics on the latent space and connect these to transition rates and times, committor functions, and transition pathways. The combination of Koopman-based learning and reduced-dimensional effective dynamics yields a principled framework for computing transition rates and pathways from simulation data. Numerical experiments on one-, two-, and three-dimensional benchmark potentials illustrate the ability of ISOKANN to reconstruct the coarse-grained kinetics and reproduce transition times across enthalpic and entropic barriers.

2604.05759 2026-04-08 stat.CO stat.ME stat.ML

High-dimensional reliability-based design optimization using stochastic emulators

M. Moustapha, B. Sudret

Reliability-based design optimization (RBDO) is traditionally formulated as a nested optimization and reliability problem. Although surrogate models are generally employed to improve efficiency, the approach remains computationally prohibitive in high-dimensional settings. This paper proposes a novel RBDO framework based on a stochastic simulator viewpoint, in which the deterministic limit-state function and the uncertainty in the model inputs are combined into a unified stochastic representation. Under this formulation, the system response conditioned on a given design is modeled directly through its output distribution, rather than through an explicit limit-state function. Stochastic emulators are constructed in the design space to approximate the conditional response distribution, enabling the semi-analytical evaluation of failure probabilities or associated quantiles without resorting to Monte Carlo simulation. Two classes of stochastic emulators are investigated, namely generalized lambda models and stochastic polynomial chaos expansions. Both approaches provide a deterministic mapping between design variables and reliability constraints, which breaks the classical double-loop structure of RBDO and allows the use of standard deterministic optimization algorithms. The performance of the proposed approach is evaluated on a set of benchmark problems with dimensionality ranging from low to very high, including a case with stochastic excitation. The results are compared against a Kriging-based approach formulated in the full input space. The proposed method yields substantial computational gains, particularly in high-dimensional settings. While its efficiency is comparable to Kriging for low-dimensional problems, it significantly outperforms Kriging as the dimensionality increases.

2604.05669 2026-04-08 stat.ML cs.LG

Efficient machine unlearning with minimax optimality

Jingyi Xie, Linjun Zhang, Sai Li

There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss in particular, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of remaining data when only the pre-trained estimator, forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.

2604.05518 2026-04-08 math.OC cs.LG cs.SY eess.SY stat.ML

Optimal Centered Active Excitation in Linear System Identification

Kaito Ito, Alexandre Proutiere

Comments 11 pages

We propose an active learning algorithm for linear system identification with optimal centered noise excitation. Notably, our algorithm, based on ordinary least squares and semidefinite programming, attains the minimal sample complexity while allowing for efficient computation of an estimate of a system matrix. More specifically, we first establish lower bounds of the sample complexity for any active learning algorithm to attain the prescribed accuracy and confidence levels. Next, we derive a sample complexity upper bound of the proposed algorithm, which matches the lower bound for any algorithm up to universal factors. Our tight bounds are easy to interpret and explicitly show their dependence on the system parameters such as the state dimension.

2604.05513 2026-04-08 stat.ME

From Unsupervised to Guided Clustering: A Variational Implementation

Violaine Courrier, Christophe Biernacki

Clustering is viewed as an unsupervised technique, but in practice it requires guidance to uncover meaningful structures. We formalize this with guided clustering, a paradigm that uses a guiding variable to steer the discovery process, and introduce the Guided Clustering Variational Autoencoder (GCVAE) as its deep generative realization. GCVAE learns a latent space structured as a Gaussian Mixture Model by optimizing a variational objective that forces the representation to be maximally informative about the guiding variable. This framework allows the resulting clustering to be reoriented by changing the guiding variable, yielding clusters that are meaningful for the specified context. Experiments on public (MNIST-SVHN) and proprietary connected health devices data demonstrate GCVAE's ability to discover coherent and task-relevant clusters in complex settings.

2604.05470 2026-04-08 stat.ME

Evaluating Black-Box Classifiers via Stable Adaptive Two-Sample Inference

Yuchen Chen, Jing Lei

Comments 30 pages

Abstract

We consider the problem of evaluating black-box multi-class classifiers. In the standard setup, we observe class labels $Y \in \{0,1,\ldots,M-1\}$ generated according to the conditional distribution $Y \mid X \sim \text{Multinom}\big(\eta(X)\big)$, where $X$ denotes the features and $\eta$ maps from the feature space to the $(M-1)$-dimensional simplex. A black-box classifier is an estimate $\hat{\eta}$ for which we make no assumptions about the training algorithm. Given holdout data, our goal is to evaluate the performance of the classifier $\hat{\eta}$. Recent work suggests treating this as a goodness-of-fit problem by testing the hypothesis $H_0: \rho((X,Y),(X',Y')) \le \delta$, where $\rho$ is some metric between two distributions, and $(X',Y') \sim P_X \times \text{Multinom}(\hat{\eta}(X))$. Combining ideas from algorithmic fairness, Neyman-Pearson lemma, and conformal p-values, we propose a new methodology for this testing problem. The key idea is to generate a second sample $(X',Y') \sim P_X \times \text{Multinom}\big(\hat{\eta}(X)\big)$ allowing us to reduce the task to two-sample conditional distribution testing. Using part of the data, we train an auxiliary binary classifier called a distinguisher to attempt to distinguish between the two samples. The distinguisher's ability to differentiate samples, measured using a rank-sum statistic, is then used to assess the difference between $\hat{\eta}$ and $\eta$. Using techniques from cross-validation central limit theorems, we derive an asymptotically rigorous test under suitable stability conditions of the distinguisher.

2604.05469 2026-04-08 stat.ME cs.LG stat.ML

Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models

Giulio Valentino Dalla Riva

Abstract

We study language models as evolving model organisms and ask when autoregressive next-token learning selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen--Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. We then determine when this fixed-encoding analysis applies to transformer families: frozen dense and frozen Mixture-of-Experts transformers satisfy it, in-context learning does not enlarge the model's separation set, and per-task adaptation breaks the premise. The framework predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions under explicit heredity, variation, and selection assumptions. Exact finite-ecology checks and controlled microgpt experiments validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism in a regime where the relevant quantities are directly observable. The goal is not to model frontier systems at scale, but to use small language models as laboratory organisms for theory about representational selection.

2604.05462 2026-04-08 stat.ML cs.LG math.ST stat.TH

Hierarchical Contrastive Learning for Multimodal Data

Huichao Li, Junhan Yu, Doudou Zhou

Comments 34 pages, 11 figures

Abstract

Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.

2604.05460 2026-04-08 stat.ME cs.AI

LLM Evaluation as Tensor Completion: Low Rank Structure and Semiparametric Efficiency

Jiachun Li, David Simchi-Levi, Will Wei Sun

Abstract

Large language model (LLM) evaluation platforms increasingly rely on pairwise human judgments. These data are noisy, sparse, and non-uniform, yet leaderboards are reported with limited uncertainty quantification. We study this as semiparametric inference for a low-rank latent score tensor observed through pairwise comparisons under Bradley-Terry-Luce-type models. This places LLM evaluation in a new tensor completion setting with structured observations, non-uniform sampling, and pairwise contrasts. Our target is a smooth functional $ψ(T^\star)$, including linear estimands such as ability gaps and nonlinear ones such as win probabilities. We derive the information operator on the low-rank tangent space, the efficient influence function, and the semiparametric efficiency bound, then construct a one-step debiased estimator with asymptotic normality. A central challenge is that the information operator is anisotropic and does not commute with the tangent-space projection, creating a bottleneck absent from isotropic models. We introduce a score-whitening method that equalizes local Fisher information and restores stable inference at the optimal sample-complexity scale. Our results provide a principled framework for uncertainty quantification in LLM evaluation and more broadly for inference on low-rank structures from pairwise data.
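
As a small illustration of the win probabilities mentioned above, the following sketch evaluates the standard Bradley-Terry-Luce comparison probability for two scalar latent scores. The function name `btl_win_probability` and the example scores are hypothetical, and the low-rank tensor structure and the debiased estimator from the paper are not modeled here.

```python
import math


def btl_win_probability(score_i, score_j):
    """Bradley-Terry-Luce probability that item i beats item j.

    P(i beats j) = 1 / (1 + exp(-(score_i - score_j))), a nonlinear
    function of the ability gap score_i - score_j.
    """
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


# Hypothetical latent scores for two models under one evaluation condition.
gap = 2.0 - 0.5
p_win = btl_win_probability(2.0, 0.5)
print(f"ability gap = {gap:.2f}, win probability = {p_win:.3f}")
```

The ability gap is a linear functional of the scores, while the win probability is a nonlinear one, which matches the two kinds of estimands the abstract distinguishes.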

2604.05337 2026-04-08 stat.ML cs.LG

Individual-heterogeneous sub-Gaussian Mixture Models

Huan Qing

Comments 32 pages, 4 figures, 2 tables

Abstract

The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.

2604.05303 2026-04-08 cs.LG cs.NA math.NA physics.comp-ph stat.ML

Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation

Guang Lin, Christian Moya, Di Qi, Xuda Ye

Abstract

Sampling physical systems with rough energy landscapes is hindered by rare events and metastable trapping. While Boltzmann generators already offer a solution, their reliance on the reverse Kullback-Leibler divergence frequently induces catastrophic mode collapse, missing specific modes in multi-modal distributions. Here, we introduce the Jeffreys Flow, a robust generative framework that mitigates this failure by distilling empirical sampling data from Parallel Tempering trajectories using the symmetric Jeffreys divergence. This formulation effectively balances local target-seeking precision with global mode coverage. We show that minimizing Jeffreys divergence suppresses mode collapse and structurally corrects inherent inaccuracies via distillation of the empirical reference data. We demonstrate the framework's scalability and accuracy on highly non-convex multidimensional benchmarks, including the systematic correction of stochastic gradient biases in Replica Exchange Stochastic Gradient Langevin Dynamics and the massive acceleration of exact importance sampling in Path Integral Monte Carlo for quantum thermal states.

2604.05285 2026-04-08 stat.ME cs.LG

Robust Learning of Heterogeneous Dynamic Systems

Shuoxun Xu, Zijian Guo, Brooke R. Staveland, Robert T. Knight, Lexin Li

Abstract

Ordinary differential equations (ODEs) provide a powerful framework for modeling dynamic systems arising in a wide range of scientific domains. However, most existing ODE methods focus on a single system, and do not adequately address the problem of learning shared patterns from multiple heterogeneous dynamic systems. In this article, we propose a novel distributionally robust learning approach for modeling heterogeneous ODE systems. Specifically, we construct a robust dynamic system by maximizing a worst-case reward over an uncertainty class formed by convex combinations of the derivatives of trajectories. We show the resulting estimator admits an explicit weighted average representation, where the weights are obtained from a quadratic optimization that balances information across multiple data sources. We further develop a bi-level stabilization procedure to address potential instability in estimation. We establish rigorous theoretical guarantees for the proposed method, including consistency of the stabilized weights, an error bound for robust trajectory estimation, and asymptotic validity of pointwise confidence intervals. We demonstrate that the proposed method considerably improves generalization performance over alternative solutions, through both extensive simulations and the analysis of intracranial electroencephalogram data.

2604.05283 2026-04-08 stat.ME

Truncation by death in the sufficient cause framework

Bronner P. Gonçalves, Eiji Yamamoto, Etsuji Suzuki

Abstract

The sufficient cause framework has been used for decades to improve our understanding of both basic and more complex causal concepts in epidemiology, such as mediation and interaction. Here, we make use of this framework to provide a description of truncation by death, in which the outcome of interest is undefined for individuals who die before the time of assessment at the end of follow-up. We explain the non-causal nature of the crude estimand that compares outcomes by treatment levels conditional on observed survival by showing that it corresponds to a comparison of distinct risk status types, which are defined based on the susceptibility to sufficient causes. Further, expressions for the crude estimand and for the survivor average causal effect, a causal estimand defined under the principal stratification approach, are provided in terms of population-level joint frequencies of the background factors of sufficient causes. Finally, we also describe conditions, based on background factors of sufficient causes, under which the survivor average causal effect is null. Our description of this problem, which studies truncation by death from a new perspective, might encourage further analyses of principal stratification-based estimands using sufficient causes.

2604.05275 2026-04-08 stat.AP

Statistical Analysis of Spatial and Temporal Variability of Maximum Precipitation Events on the Rio Grande do Sul

Cleber Souza Corrêa

Comments 9 pages, 2 figures, published in Journal of Aerospace Technology and Management (JATM). São José dos Campos, Vol. 4, No. 2, pp. 227-235, Apr.-Jun., 2012

Journal ref
Journal of Aerospace Technology and Management (JATM). São José dos Campos, Vol. 4, No. 2, pp. 227-235, Apr.-Jun., 2012
Abstract

This article presents a statistical analysis of precipitation in the state of Rio Grande do Sul. The aim was to identify spatial and temporal patterns of maximum precipitation by fitting a theoretical variogram to maximum annual rainfalls and their times of occurrence. According to the literature, this pattern is associated with phenomena typical of middle latitudes, such as low- and high-level jets and the interactions between them. A relationship between maximum annual rainfalls and predominant synoptic configurations was found some years ago. This work therefore sought to understand the climatic characteristics that are important in aerospace and aeronautics, since extreme weather can have numerous consequences for these activities. Using and validating the proposed method would make it possible to apply it in other regions of interest to the Brazilian aerospace sector. Understanding these climatological features of the atmospheric circulation dynamics, and analyzing maximum annual rainfall, would allow more efficient and appropriate climate trend forecasts and their application in aerospace activities.

2604.05188 2026-04-08 physics.soc-ph stat.ME

Ratio of Quantiles Indicates Burstiness with Fewer False Negatives than the Conventional Burstiness Parameter

Joshua Z. Stadlan, Michelle Birkett, Jason H. Rife

Comments 41 pages, 14 figures; for associated code, see https://github.com/jstadlan-compass/burstiness-tail-index

Abstract

Complexity researchers view burstiness--fluctuating levels of activity--as evidence of hidden interactions within the system generating the activity signal. Yet, current burstiness metrics miss evidence of burstiness in some moderately bursty distributions and under moderate sampling conditions. The canonical Burstiness Parameter (BP) compares distributions of timing statistics to the exponential distribution, representing the timing of independent random events, but it provides false negatives for some parameter ranges of power laws, with and without cut-offs. We introduce a metric that maintains BP's measurement approach but reduces false negatives: the Burstiness Tail-based Index (BTI). Based on ratios of differences in quantiles, BTI correctly classifies bursty distributions over certain parameter ranges misclassified by BP. Additionally, we find BTI to be more robust than BP in the presence of limited sample sizes and short observation windows, using simulated samples drawn from distributions correctly classified by BP in their analytical form. As a case study, we revisit an analysis of human activity data and find that the choice of BTI over BP influences interpretations of the timescales of burstiness in the dataset. Given these analytical, simulated, and empirical results, we argue for BTI's practical advantage over BP in assessing burstiness in real-world temporal signals for complexity research and time series modeling.
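
For orientation, the conventional burstiness parameter can be computed in a few lines. This sketch assumes BP refers to the widely used coefficient B = (sigma - mu) / (sigma + mu) of the inter-event times, which is about 0 for exponential timing, -1 for perfectly regular timing, and approaches 1 for extremely bursty timing. The function name and the sample data are hypothetical, and the BTI proposed in the paper is not reproduced because the abstract does not give its formula.

```python
import statistics


def burstiness_parameter(inter_event_times):
    """Conventional burstiness parameter B = (sigma - mu) / (sigma + mu).

    mu and sigma are the mean and (population) standard deviation of the
    inter-event times. Assumes at least one positive inter-event time.
    """
    mu = statistics.fmean(inter_event_times)
    sigma = statistics.pstdev(inter_event_times)
    return (sigma - mu) / (sigma + mu)


regular = [2.0] * 10            # perfectly periodic events
bursty = [0.1] * 9 + [100.0]    # a burst of events followed by a long gap

print("regular:", burstiness_parameter(regular))
print("bursty: ", round(burstiness_parameter(bursty), 3))
```

A signal is conventionally read as bursty when B is positive, so a false negative in the abstract's sense is a bursty distribution for which B comes out at or below zero.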

2604.05057 2026-04-08 cs.LG stat.ML

Blind-Spot Mass: A Good-Turing Framework for Quantifying Deployment Coverage Risk in Machine Learning Systems

Biplab Pal, Santanu Bhattacharya, Madanjit Singh

Comments 15 pages, 7 figures, 1 table; submitted to Journal of Machine Learning Research (JMLR)

Abstract

Blind-spot mass is a Good-Turing framework for quantifying deployment coverage risk in machine learning. In modern ML systems, operational state distributions are often heavy-tailed, implying that a long tail of valid but rare states is structurally under-supported in finite training and evaluation data. This creates a form of 'coverage blindness': models can appear accurate on standard test sets yet remain unreliable across large regions of the deployment state space. We propose blind-spot mass B_n(tau), a deployment metric estimating the total probability mass assigned to states whose empirical support falls below a threshold tau. B_n(tau) is computed using Good-Turing unseen-species estimation and yields a principled estimate of how much of the operational distribution lies in reliability-critical, under-supported regimes. We further derive a coverage-imposed accuracy ceiling, decomposing overall performance into supported and blind components and separating capacity limits from data limits. We validate the framework in wearable human activity recognition (HAR) using wrist-worn inertial data. We then replicate the same analysis in the MIMIC-IV hospital database with 275 admissions, where the blind-spot mass curve converges to the same 95% at tau = 5 across clinical state abstractions. This replication across structurally independent domains - differing in modality, feature space, label space, and application - shows that blind-spot mass is a general ML methodology for quantifying combinatorial coverage risk, not an application-specific artifact. Blind-spot decomposition identifies which activities or clinical regimes dominate risk, providing actionable guidance for industrial practitioners on targeted data collection, normalization/renormalization, and physics- or domain-informed constraints for safer deployment.
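
The Good-Turing idea behind this metric can be sketched briefly. The classical Good-Turing estimate of the probability mass of unseen states is the number of states observed exactly once divided by the sample size. The second function below is a naive empirical plug-in for the mass of states with support below a threshold. It is a hypothetical illustration only and should not be read as the paper's estimator of B_n(tau), whose exact form the abstract does not state. All names and data are assumptions.

```python
from collections import Counter


def good_turing_missing_mass(observations):
    """Classical Good-Turing estimate of the unseen-state mass: N1 / n,

    where N1 is the number of states seen exactly once and n is the
    total number of observations.
    """
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


def low_support_empirical_mass(observations, tau):
    """Naive plug-in: fraction of samples in states seen fewer than tau times.

    Hypothetical helper for illustration, not the paper's B_n(tau).
    """
    counts = Counter(observations)
    return sum(c for c in counts.values() if c < tau) / len(observations)


# Hypothetical activity-state log with a heavy tail of rare states.
states = ["walk"] * 6 + ["run"] * 2 + ["fall", "jump"]
print("unseen mass estimate:", good_turing_missing_mass(states))
print("low-support mass (tau=3):", low_support_empirical_mass(states, tau=3))
```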

2604.05055 2026-04-08 stat.ME math.ST stat.TH

Hypothesis Testing for Penalized Estimating Equations with Cross-Fitted Covariance Calibration

Jing Zhou, Zhe Zhang

Abstract

We study hypothesis testing for penalized estimators in settings where the full marginal distribution of a multivariate response is difficult to specify, such as longitudinal data with correlated measurements or high-dimensional heteroscedastic regression. Assuming that the conditional mean model is correctly specified, we establish that the penalized estimating equations admit a $\sqrt{n}$-consistent solution, even when the working covariance structure is misspecified. Our inferential target is a low-dimensional subvector of parameters associated with the mean model. We show that the resulting test statistic converges to a $χ^2$ distribution, and that its asymptotic power depends on the nuisance covariance function. To mitigate this dependence, we propose estimating the covariance function via cross-fitting, which provides a calibrated and robust procedure for inference.

2604.05008 2026-04-08 stat.ML cs.LG q-fin.MF q-fin.ST

Generative Path-Law Jump-Diffusion: Sequential MMD-Gradient Flows and Generalisation Bounds in Marcus-Signature RKHS

Daniel Bloch

Abstract

This paper introduces a novel generative framework for synthesising forward-looking, càdlàg stochastic trajectories that are sequentially consistent with time-evolving path-law proxies, thereby incorporating anticipated structural breaks, regime shifts, and non-autonomous dynamics. By framing path synthesis as a sequential matching problem on restricted Skorokhod manifolds, we develop the Anticipatory Neural Jump-Diffusion (ANJD) flow, a generative mechanism that effectively inverts the time-extended Marcus-sense signature. Central to this approach is the Anticipatory Variance-Normalised Signature Geometry (AVNSG), a time-evolving precision operator that performs dynamic spectral whitening on the signature manifold to ensure contractivity during volatile regime shifts and discrete aleatoric shocks. We provide a rigorous theoretical analysis demonstrating that the joint generative flow constitutes an infinitesimal steepest descent direction for the Maximum Mean Discrepancy functional relative to a moving target proxy. Furthermore, we establish statistical generalisation bounds within the restricted path-space and analyse the Rademacher complexity of the whitened signature functionals to characterise the expressive power of the model under heavy-tailed innovations. The framework is implemented via a scalable numerical scheme involving Nyström-compressed score-matching and an anticipatory hybrid Euler-Maruyama-Marcus integration scheme. Our results demonstrate that the proposed method captures the non-commutative moments and high-order stochastic texture of complex, discontinuous path-laws with high computational efficiency.

2604.04993 2026-04-08 stat.ML cs.CR cs.LG stat.ME

The Hiremath Early Detection (HED) Score: A Measure-Theoretic Evaluation Standard for Temporal Intelligence

Prakul Sunil Hiremath

Comments 11 pages. Introduces a measure-theoretic framework for predictive velocity including the Hiremath Standard Table. Dedicated to the Hiremath lineage

Abstract

We introduce the Hiremath Early Detection (HED) Score, a principled, measure-theoretic evaluation criterion for quantifying the time-value of information in systems operating over non-stationary stochastic processes subject to abrupt regime transitions. Existing evaluation paradigms, chiefly the ROC/AUC framework and its downstream variants, are temporally agnostic: they assign identical credit to a detection at t + 1 and a detection at t + tau for arbitrarily large tau. This indifference to latency is a fundamental inadequacy in time-critical domains including cyber-physical security, algorithmic surveillance, and epidemiological monitoring. The HED Score resolves this by integrating a baseline-neutral, exponentially decaying kernel over the posterior probability stream of a target regime, beginning precisely at the onset of the regime shift. The resulting scalar simultaneously encodes detection acuity, temporal lead, and pre-transition calibration quality. We prove that the HED Score satisfies three axiomatic requirements: (A1) Temporal Monotonicity, (A2) Invariance to Pre-Attack Bias, and (A3) Sensitivity Decomposability. We further demonstrate that the HED Score admits a natural parametric family indexed by the Hiremath Decay Constant (lambda_H), whose domain-specific calibration constitutes the Hiremath Standard Table. As an empirical vehicle, we present PARD-SSM (Probabilistic Anomaly and Regime Detection via Switching State-Space Models), which couples fractional Stochastic Differential Equations (fSDEs) with a Switching Linear Dynamical System (S-LDS) inference backend. On the NSL-KDD benchmark, PARD-SSM achieves a HED Score of 0.0643, representing a 388.8 percent improvement over a Random Forest baseline (0.0132), with statistical significance confirmed via block-bootstrap resampling (p < 0.001). We propose the HED Score as the successor evaluation standard to ROC/AUC.

2604.04987 2026-04-08 cs.LG cs.AI math.OC stat.ML

Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling

Yongchang Hao, Lili Mou

Comments Camera-ready version. Accepted at ICLR 2026

Abstract

Speculative sampling (SpS) has been successful in accelerating the decoding throughput of auto-regressive large language models by leveraging smaller draft models. SpS strictly enforces the generated distribution to match that of the verifier LLM. This is unnecessarily restrictive as slight variations of the verifier's distribution, such as sampling with top-$k$ or temperature, would also be acceptable. Typical acceptance sampling (TAS) alleviates this issue by accepting more tokens using entropy-based heuristics. However, this approach distorts the verifier distribution, potentially degrading output quality when the verifier encodes critical information. In this work, we formalize the speculative sampling algorithm through the lens of constrained optimization. Based on this formulation, we propose Cactus (constrained acceptance speculative sampling), a method that guarantees controlled divergence from the verifier distribution while increasing acceptance rates. Empirical results across a wide range of benchmarks confirm the effectiveness of our approach.
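
For context, the baseline acceptance rule that SpS uses to match the verifier distribution exactly can be sketched as follows. This is the standard speculative sampling step (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q), not the Cactus method, whose constrained formulation the abstract does not spell out. The function names and the toy distributions are hypothetical.

```python
import random


def speculative_accept(token, p_verifier, q_draft, rng=random):
    """Accept a drafted token with probability min(1, p(token) / q(token)).

    Assumes q_draft[token] > 0, which holds for any token the draft sampled.
    """
    ratio = p_verifier[token] / q_draft[token]
    return rng.random() < min(1.0, ratio)


def residual_distribution(p_verifier, q_draft):
    """Distribution to resample from after a rejection: norm(max(0, p - q)).

    Only needed when a rejection occurred, which implies p != q and
    therefore a strictly positive normalizer.
    """
    residual = [max(0.0, p - q) for p, q in zip(p_verifier, q_draft)]
    total = sum(residual)
    return [r / total for r in residual]


# Toy three-token vocabulary.
p = [0.5, 0.3, 0.2]   # verifier distribution
q = [0.2, 0.5, 0.3]   # draft distribution

print("token 0 accepted:", speculative_accept(0, p, q))  # ratio 2.5, always accepted
print("residual:", residual_distribution(p, q))
```

Token 1 in this toy example is accepted only with probability 0.3 / 0.5 = 0.6, which is the kind of strictness that relaxed acceptance rules aim to loosen.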

2604.04963 2026-04-08 stat.ML cs.LG

Learning Nonlinear Regime Transitions via Semi-Parametric State-Space Models

Prakul Sunil Hiremath

Comments 12 pages, 1 figure, 2 tables

Abstract

We develop a semi-parametric state-space model for time-series data with latent regime transitions. Classical Markov-switching models use fixed parametric transition functions, such as logistic or probit links, which restrict flexibility when transitions depend on nonlinear and context-dependent effects. We replace this assumption with learned functions $f_0, f_1 \in \mathcal{H}$, where $\mathcal{H}$ is either a reproducing kernel Hilbert space or a spline approximation space, and define transition probabilities as $p_{jk,t} = \operatorname{sigmoid}(f(\mathbf{x}_{t-1}))$. The transition functions are estimated jointly with emission parameters using a generalized Expectation-Maximization algorithm. The E-step uses the standard forward-backward recursion, while the M-step reduces to a penalized regression problem with weights from smoothed occupation measures. We establish identifiability conditions and provide a consistency argument for the resulting estimators. Experiments on synthetic data show improved recovery of nonlinear transition dynamics compared to parametric baselines. An empirical study on financial time series demonstrates improved regime classification and earlier detection of transition events.

2604.04961 2026-04-08 stat.ML cs.LG econ.EM math.ST stat.TH

Identification and Inference in Nonlinear Dynamic Network Models

Diego Vallarino

Abstract

We study identification and inference in nonlinear dynamic systems defined on unknown interaction networks. The system evolves through an unobserved dependence matrix governing cross-sectional shock propagation via a nonlinear operator. We show that the network structure is not generically identified, and that identification requires sufficient spectral heterogeneity. In particular, identification arises when the network induces non-exchangeable covariance patterns through heterogeneous amplification of eigenmodes. When the spectrum is concentrated, dependence becomes observationally equivalent to common shocks or scalar heterogeneity, leading to non-identification. We provide necessary and sufficient conditions for identification, characterize observational equivalence classes, and propose a semiparametric estimator with asymptotic theory. We also develop tests for network dependence whose power depends on spectral properties of the interaction matrix. The results apply to a broad class of economic models, including production networks, contagion models, and dynamic interaction systems.

2604.02656 2026-04-08 stat.ML cs.LG

Transfer Learning for Meta-analysis Under Covariate Shift

Zilong Wang, Ali Abdeen, Turgay Ayer

Comments Accepted to IEEE ICHI 2026 Early Bird Track (Oral Presentation)

Abstract

Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard IPD meta-analysis and transport estimators. We propose a placebo-anchored transport framework that treats source-trial outcomes as abundant proxy signals and target-trial placebo outcomes as scarce, high-fidelity gold labels to calibrate baseline risk. A low-complexity (sparse) correction anchors proxy outcome models to the target population, and the anchored models are embedded in a cross-fitted doubly robust learner, yielding a Neyman-orthogonal, target-site doubly robust estimator for patient-level heterogeneous treatment effects when target treated outcomes are available. We distinguish two regimes: in connected targets (with a treated arm), the method yields target-identified effect estimates; in disconnected targets (placebo-only), it reduces to a principled screen--then--transport procedure under explicit working-model transport assumptions. Experiments on synthetic data and a semi-synthetic IHDP benchmark evaluate pointwise CATE accuracy, ATE error, ranking quality for targeting, decision-theoretic policy regret, and calibration. Across connected settings, the proposed method is best or near-best and improves substantially over proxy-only, target-only, and transport baselines at small target sample sizes; in disconnected settings, it retains strong ranking performance for targeting while pointwise accuracy depends on the strength of the working transport condition.

2603.17466 2026-04-08 math.DS cs.NA math.NA stat.CO

A Full-Density Approach to Simulating Random Iteration Equations with Applications

Wolfgang Hoegele

Abstract

The goal of this study is to introduce a unified computational framework for simulating random iteration equations (RIE), understood as iteration equations containing random variables. The novelty of this work is that full probability densities of the state vectors are propagated stepwise through the iterations, avoiding the need for repeated pathwise Monte Carlo simulations of the iteration equation. The presentation of the methodology builds on recent work on static random equations, which keeps it conceptually efficient, and is intentionally accessible. Based on previous work, the modeling requirements for RIEs allow for potential nonsmooth nonlinearities and stochasticities in the transfer function, as well as nonstandard probability densities and diffusion processes. As results, we present illustrative applications to the simulation of nonlinear random and stochastic differential equations, a novel full-density gradient descent method (FDGD) for global optimization under uncertainty, and examples of chaotic mappings, in order to demonstrate the breadth of the framework's utility. Overall, the presentation is exploratory in character and encourages new applications and theoretical studies.

2602.15089 2026-04-08 cs.LG stat.ML

Triplet Feature Fusion for Equipment Anomaly Prediction: An Open-Source Methodology Using Small Foundation Models

Takato Yasuno

Comments 15 pages, 8 figures, 7 tables

Abstract

Predicting equipment anomalies before they escalate into failures is a critical challenge in industrial facility management. Existing approaches rely either on hand-crafted threshold rules, which lack generalizability, or on large neural models that are impractical for on-site, air-gapped deployments. We present an industrial methodology that resolves this tension by combining open-source small foundation models into a unified 1,116-dimensional Triplet Feature Fusion pipeline. This pipeline integrates: (1) statistical features ($x \in \mathbb{R}^{28}$) derived from 90-day sensor histories, (2) time-series embeddings ($y \in \mathbb{R}^{64}$) from a LoRA-adapted IBM Granite TinyTimeMixer (TTM, 133K parameters), and (3) multilingual text embeddings ($z \in \mathbb{R}^{1024}$) extracted from Japanese equipment master records via multilingual-e5-large. The concatenated triplet $h = [x; y; z]$ is processed by a LightGBM classifier (< 3 MB) trained to predict anomalies at 30-, 60-, and 90-day horizons. All components use permissive open-source licenses (Apache 2.0 / MIT). The inference-time pipeline runs entirely on CPU in under 2 ms, enabling edge deployment on co-located hardware without cloud dependency. On a dataset of 64 HVAC units comprising 67,045 samples, the triplet model achieves Precision = 0.992, F1 = 0.958, and ROC-AUC = 0.998 at the 30-day horizon. Crucially, it reduces the False Positive Rate from 0.6 percent (baseline) to 0.1 percent, an 83 percent reduction attributable to equipment-type conditioning via text embedding $z$. Cluster analysis reveals that the embeddings align time-series signatures with distinct fault archetypes, explaining how compact multilingual representations improve discrimination without explicit categorical encoding.

2512.01026 2026-04-08 math.ST stat.TH

Asymptotic inference in a stationary quantum time series

Michael Nussbaum, Arleta Szkoła

Abstract

We consider a statistical model of an n-mode quantum Gaussian state which is shift invariant and also gauge invariant. Such models can be considered analogs of classical Gaussian stationary time series, parametrized by their spectral density. Defining an appropriate quantum spectral density as the parameter, we establish that the quantum Gaussian time series model is asymptotically equivalent to a classical nonlinear regression model given as a collection of independent geometric random variables. The asymptotic equivalence is established in the sense of the quantum Le Cam distance between statistical models (experiments). The geometric regression model has a further classical approximation as a certain Gaussian white noise model with a transformed quantum spectral density as signal. In this sense, the result is a quantum analog of the asymptotic equivalence of classical spectral density estimation and Gaussian white noise, which is known for Gaussian stationary time series. In a forthcoming version of this preprint, we will also identify a quantum analog of the periodogram and provide optimal parametric and nonparametric estimates of the quantum spectral density.

2511.21497 2026-04-08 stat.CO stat.ME

Nested ensemble Kalman filter for static parameter inference in nonlinear state-space models

Andrew Golightly, Sarah E. Heaps, Chris Sherlock, Laura E. Wadkin, Darren J. Wilkinson

Comments 31 pages

Details
English abstract

The ensemble Kalman filter (EnKF) is a popular technique for performing inference in state-space models (SSMs), particularly when the dynamic process is high-dimensional. Unlike reweighting methods such as sequential Monte Carlo (SMC, i.e. particle filters), the EnKF leverages either the linear Gaussian structure of the SSM or an approximation thereof, to maintain diversity of the sampled latent states (the so-called ensemble members) via shifting-based updates. Joint parameter and state inference using an EnKF is typically achieved by augmenting the state vector with the static parameter. In this case, it is assumed that both parameters and states follow a linear Gaussian state-space model, which may be unreasonable in practice. In this paper, we combine the reweighting and shifting methods by replacing the particle filter used in the SMC^2 algorithm of Chopin et al. (2013), with the ensemble Kalman filter. Hence, parameter particles are weighted according to the estimated observed-data likelihood from the latest observation computed by the EnKF, and particle diversity is maintained via a resample-move step that targets the marginal parameter posterior under the EnKF. Extensions to the resulting algorithm are proposed, such as the use of a delayed acceptance kernel in the rejuvenation step and incorporation of nonlinear observation models. We illustrate the resulting methodology via several applications.
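
The shifting-based update mentioned in this abstract can be illustrated with a minimal scalar sketch. It assumes a stochastic (perturbed-observation) EnKF and an observation model y = x + noise with noise ~ N(0, obs_var); the function name and arguments are hypothetical, and this is not the nested algorithm proposed in the paper.

```python
import random
import statistics


def enkf_update(ensemble, y_obs, obs_var, rng=None):
    """Stochastic EnKF update for a scalar state observed as y = x + noise."""
    rng = rng or random.Random(0)
    prior_var = statistics.variance(ensemble)   # sample variance of the ensemble
    gain = prior_var / (prior_var + obs_var)    # scalar Kalman gain
    obs_sd = obs_var ** 0.5
    # Shift each member towards its own perturbed copy of the observation.
    return [x + gain * (y_obs + rng.gauss(0.0, obs_sd) - x) for x in ensemble]
```

Every member is moved rather than reweighted, which is why the ensemble keeps its diversity: with a noise-free observation the gain is 1 and all members collapse onto the observation, while a very noisy observation leaves them almost unchanged.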

2511.14292 2026-04-08 stat.ME stat.AP

Covariate Adjustment for the Win Odds: Application to Cardiovascular Outcomes Trials

Cyrill Scheidegger, Simon Wandel, Tobias Mütze

Details
English abstract

Covariate adjustment can enhance precision and power in clinical trials, yet its application to the win odds remains unclear. The win odds is an extension of the win ratio that counts ties as half a win for the treatment and the control group, respectively. In their original form, both the win ratio and the win odds rely on comparing each individual from the treatment group to each individual from the control group in a pairwise manner, and count the number of wins, losses, and ties from these pairwise comparisons. A priori, it is not clear how covariate adjustment can be implemented for the win odds. To address this, we establish a connection between the win odds and the marginal probabilistic index, a measure for which covariate adjustment theory is well-developed. Using this connection, we show how covariate adjustment for the win odds is possible, leading to potentially more precise estimators and greater power as compared to the unadjusted win odds. We present the underlying theory for covariate adjustment for the win odds in an accessible way and apply the method to synthetic data based on the CANTOS trial (ClinicalTrials.gov identifier: NCT01327846) characteristics, to a subset of the HF-ACTION trial data (ClinicalTrials.gov identifier: NCT00047437), and to simulated data to study the operating characteristics of the method. We observe that there is indeed a potential gain in power when the win odds is adjusted for baseline covariates if the baseline covariates are prognostic for the outcome. This comes at the cost of a slight inflation of the type I error rate for small sample sizes.
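
The unadjusted win odds described in this abstract can be sketched as a direct pairwise count. The example assumes a single numeric outcome where larger values are better; the function name is hypothetical, and the covariate-adjusted estimator from the paper is not shown.

```python
def win_odds(treatment, control):
    """Unadjusted win odds from all treatment-vs-control pairs (larger = better)."""
    wins = losses = ties = 0
    for t in treatment:
        for c in control:
            if t > c:
                wins += 1
            elif t < c:
                losses += 1
            else:
                ties += 1
    # Each tie counts as half a win for both arms.
    # Undefined (division by zero) if there are neither losses nor ties.
    return (wins + 0.5 * ties) / (losses + 0.5 * ties)
```

For instance, `win_odds([3, 2, 2], [1, 2])` gives 4 wins, 0 losses, and 2 ties, so the win odds is (4 + 1) / (0 + 1) = 5.0.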

2511.09425 2026-04-08 cs.LG stat.ML

Supporting Evidence for the Adaptive Feature Program across Diverse Models

Yicheng Li, Qian Lin

Details
English abstract

Theoretically exploring the advantages of neural networks might be one of the most challenging problems in the AI era. An adaptive feature program has recently been proposed to analyze feature learning, the characteristic property of neural networks, in a more abstract way. Motivated by the celebrated Le Cam equivalence, we advocate the over-parameterized sequence models to further simplify the analysis of the training dynamics of adaptive feature program and present several pieces of supporting evidence for the adaptive feature program. More precisely, after having introduced the feature error measure (FEM) to characterize the quality of the learned feature, we show that the FEM is decreasing during the training process of several concrete adaptive feature models including linear regression, single/multiple index models, etc. We believe that this hints at the potential successes of the adaptive feature program.

2509.02617 2026-04-08 stat.ML cs.LG stat.CO

Gaussian process surrogate with physical law-corrected prior for multi-coupled PDEs defined on irregular geometry

Pucheng Tang, Hongqiao Wang, Wenzhou Lin, Qian Chen, Heng Yong

Comments 28 pages, 15 figures, 6 tables

Details
English abstract

Parametric partial differential equations (PDEs) serve as fundamental mathematical tools for modeling complex physical phenomena, yet repeated high-fidelity numerical simulations across parameter spaces remain computationally prohibitive. In this work, we propose a physical law-corrected prior Gaussian process (LC-prior GP) for efficient surrogate modeling of parametric PDEs. The proposed method employs proper orthogonal decomposition (POD) to represent high-dimensional discrete solutions in a low-dimensional modal coefficient space, significantly reducing the computational cost of kernel optimization compared with standard GP approaches in full-order spaces. The governing physical laws are further incorporated to construct a law-corrected prior to overcome the limitation of existing physics-informed GP methods that rely on linear operator invariance, which enables applications to nonlinear and multi-coupled PDE systems without kernel redesign. Furthermore, the radial basis function-finite difference (RBF-FD) method is adopted for generating training data, allowing flexible handling of irregular spatial domains. The resulting differentiation matrices are independent of solution fields, enabling efficient optimization in the physical correction stage without repeated assembly. The proposed framework is validated through extensive numerical experiments, including nonlinear multi-parameter systems and scenarios involving multi-coupled physical variables defined on different two-dimensional irregular domains to highlight the accuracy and efficiency compared with baseline approaches.

2507.13301 2026-04-08 stat.CO stat.AP stat.ML

mNARX+: A surrogate model for complex dynamical systems using manifold-NARX and automatic feature selection

S. Schär, S. Marelli, B. Sudret

Details
Journal ref
Computer Methods in Applied Mechanics and Engineering, vol. 449 Part I, February 2026, 118550
English abstract

We propose an automatic approach for manifold nonlinear autoregressive with exogenous inputs (mNARX) modeling that leverages the feature-based structure of functional-NARX (F-NARX) modeling. This novel approach, termed mNARX+, preserves the key strength of the mNARX framework, which is its expressivity allowing it to model complex dynamical systems, while simultaneously addressing a key limitation: the heavy reliance on domain expertise to identify relevant auxiliary quantities and their causal ordering. Our method employs a data-driven, recursive algorithm that automates the construction of the mNARX model sequence. It operates by sequentially selecting temporal features based on their correlation with the model prediction residuals, thereby automatically identifying the most critical auxiliary quantities and the order in which they should be modeled. This procedure significantly reduces the need for prior system knowledge. We demonstrate the effectiveness of the mNARX+ algorithm on two case studies: a Bouc-Wen oscillator with strong hysteresis and a complex aero-servo-elastic wind turbine simulator. The results show that the algorithm provides a systematic, data-driven method for creating accurate and stable surrogate models for complex dynamical systems.

2507.04567 2026-04-08 stat.ME stat.AP

Inverse Probability Weighting for Recurrent Event Models

Jiren Sun, Tobias Mutze, Richard Cook, Tianmeng Lyu

Details
English abstract

Recurrent events are common and important clinical trial endpoints in many disease areas, e.g., cardiovascular hospitalizations in heart failure, relapses in multiple sclerosis, or exacerbations in asthma. During a trial, patients may experience intercurrent events, that is, events after treatment assignment which affect the interpretation or existence of the outcome of interest. In many settings, a treatment effect in the scenario in which the intercurrent event would not occur is of clinical interest. A proper estimation method of such a hypothetical treatment effect has to account for all confounders of the recurrent event process and the intercurrent event. In this paper, we propose estimators targeting hypothetical estimands in recurrent events with proper adjustments of baseline and internal time-varying covariates. Specifically, we apply inverse probability weighting (IPW) to the commonly used Lin-Wei-Yang-Ying (LWYY) and negative binomial (NB) models in recurrent event analysis. Simulation studies demonstrate that our approach outperforms alternative analytical methods in terms of bias and power.

2506.15272 2026-04-08 stat.ME

A penalized least squares estimator for extreme-value mixture models

Anas Mourahib, Anna Kiriliouk, Johan Segers

Details
English abstract

Estimating the parameters of max-stable parametric models poses significant challenges, particularly when some parameters lie on the boundary of the parameter space. This situation arises when a subset of variables exhibits extreme values simultaneously, while the remaining variables do not -- a phenomenon commonly referred to as an extreme direction. A novel estimator is proposed for the parameters of a general parametric mixture model, incorporating a threshold exceedances approach based on a pseudo-norm penalization. The latter plays a crucial role in accurately identifying parameters at the boundary of the parameter space. Additionally, the estimator comes with a data-driven algorithm to detect groups of variables corresponding to extreme directions. The performance of the estimator is assessed in terms of both parameter estimation and the identification of extreme directions through extensive simulation studies. Finally, the method is applied to two real-world datasets: discharge measurements at stations along the Danube river, and financial portfolio losses from stocks listed on the NYSE, AMEX, and NASDAQ. In both applications, the sets of variables that can become large simultaneously are identified.

2505.00711 2026-04-08 math.ST stat.TH

Global Activity Scores

Ruilong Yue, Giray Ökten

Details
English abstract

We introduce a new global sensitivity measure, the global activity scores. The measure is based on finite differences of the underlying function, in contrast to several sensitivity measures in the literature that are based on derivatives of the function. We establish its theoretical connection with Sobol' sensitivity indices and demonstrate its performance through numerical examples. In these examples, we compare global activity scores with Sobol' sensitivity indices, derivative-based sensitivity measures, and activity scores. The results show that in the presence of additive noise or high variability, global activity scores provide more stable and reliable identification of influential variables than derivative-based measures and activity scores, which are more sensitive to noise. In noiseless settings, however, all three approaches yield comparable results.

2505.00629 2026-04-08 stat.ME math.ST stat.TH

Expected Weighted D-optimal Designs for Experiments with Mixed Factors

Siting Lin, Yifei Huang, Jie Yang

Comments 42 pages, 13 tables, and 4 figures

Details
English abstract

Optimal designs can help experimenters obtain more accurate parameter estimates with reduced experimental time and cost. In this paper, we characterize the Expected Weighted (EW) D-optimal designs as robust designs against unknown parameter values for experiments under a general parametric model with discrete and continuous factors. When a pilot study is available, we recommend sample-based EW D-optimal designs for subsequent experiments. Otherwise, we recommend EW D-optimal designs under a prior distribution for model parameters. We propose an EW ForLion algorithm for finding EW D-optimal designs with mixed factors, and justify that the designs found by our algorithm are EW D-optimal. To facilitate potential users in practice, we also develop a rounding algorithm that converts an approximate design with mixed factors to exact designs with prespecified grid points and the total number of experimental units. By applying our algorithms for real experiments under multinomial logistic models or generalized linear models, we show that our designs are highly efficient with respect to locally D-optimal designs and more robust against parameter value misspecifications.

2504.13620 2026-04-08 math.PR math.ST stat.TH

Set-valued conditional functionals of random sets

Tobias Fissler, Ilya Molchanov

Comments 30 pages

Details
Journal ref
Mathematical Methods of Operations Research, 2026
English abstract

Many key quantities in statistics and probability theory such as the expectation, quantiles, expectiles and many risk measures are law-determined maps from a space of random variables to the reals. We call such a law-determined map, which is normalised, positively homogeneous, monotone and translation equivariant, a gauge function. Considered as a functional on the space of distributions, we can apply such a gauge to the conditional distribution of a random variable. This results in conditional gauges, such as conditional quantiles or conditional expectations. In this paper, we apply such scalar gauges to the support function of a random closed convex set $\bX$. This leads to a set-valued extension of a gauge function. We also introduce a conditional variant whose values are themselves random closed convex sets. In special cases, this functional becomes the conditional set-valued quantile or the conditional set-valued expectation of a random set. In particular, in the unconditional setup, if $\bX$ is a random translation of a deterministic cone and the gauge is either a quantile or an expectile, we recover the cone distribution functions studied by Andreas Hamel and his co-authors. In the conditional setup, the conditional quantile of a random singleton yields the conditional version of the half-space depth-trimmed regions.

2504.13382 2026-04-08 stat.AP

Intelligent data collection for network discrimination in material flow analysis using Bayesian optimal experimental design

Jiankan Liao, Xun Huan, Daniel Cooper

Comments 21 pages for manuscript, 8 pages for supporting information and bibliography, 8 figures

Details
Journal ref
Journal of Industrial Ecology 29 (2025) 2005-2023
English abstract

Material flow analyses (MFAs) are powerful tools for highlighting resource efficiency opportunities in supply chains. MFAs are often represented as directed graphs, with nodes denoting processes and edges representing mass flows. However, network structure uncertainty -- uncertainty in the presence or absence of flows between nodes -- is common and can compromise flow predictions. While collection of more MFA data can reduce network structure uncertainty, an intelligent data acquisition strategy is crucial to optimize the resources (person-hours and money spent on collecting and purchasing data) invested in constructing an MFA. In this study, we apply Bayesian optimal experimental design (BOED), based on the Kullback-Leibler divergence, to efficiently target high-utility MFA data -- data that minimizes network structure uncertainty. We introduce a new method with reduced bias for estimating expected utility, demonstrating its superior accuracy over traditional approaches. We illustrate these advances with a case study on the U.S. steel sector MFA, where the expected utility of collecting specific single pieces of steel mass flow data aligns with the actual reduction in network structure uncertainty achieved by collecting said data from the United States Geological Survey and the World Steel Association. The results highlight that the optimal MFA data to collect depends on the total amount of data being gathered, making it sensitive to the scale of the data collection effort. Overall, our methods support intelligent data acquisition strategies, accelerating uncertainty reduction in MFAs and enhancing their utility for impact quantification and informed decision-making.

2504.03943 2026-04-08 stat.ML cond-mat.mtrl-sci cs.LG

Multi-Variable Batch Bayesian Optimization in Materials Research: Synthetic Data Analysis of Noise Sensitivity and Problem Landscape Effects

Imon Mia, Armi Tiihonen, Anna Ernst, Anusha Srivastava, Tonio Buonassisi, William Vandenberghe, Julia W. P. Hsu

Details
English abstract

Bayesian optimization (BO), a machine learning method, is increasingly used to guide experimental optimization tasks in materials science. To emulate the large number of input variables and noise-containing results in experimental materials research, we perform batch BO simulation of six design variables with a range of noise levels. Two test cases relevant for materials science problems are examined: a needle-in-a-haystack case (Ackley function) that may be encountered in, e.g., molecule optimizations, and a smooth landscape with a local optimum in addition to the global optimum (Hartmann function) that may be encountered in, e.g., material composition optimization. We show learning curves, performance metrics, and visualization to effectively track the optimization progression and evaluate how the optimization outcomes are affected by noise, batch-picking method, choice of acquisition function, and exploration hyperparameter values. We find that the effects of noise depend on the problem landscape: noise degrades the optimization results of a needle-in-a-haystack search (Ackley) dramatically more. However, with increasing noise, we observe an increasing probability of landing on the local optimum in Hartmann. Therefore, prior knowledge of the problem domain structure and noise level is essential when designing BO for materials research experiments. Synthetic data studies -- with known ground truth and controlled noise levels -- enable us to isolate and evaluate the impact of different batch BO components, e.g., acquisition policy, objective metrics, and hyperparameter values, before transitioning to the inherent uncertainties of real experimental systems. The results and methodology of this study will facilitate greater use of BO in guiding experimental materials research, specifically in settings with a large number of design variables to optimize.

2411.10646 2026-04-08 math.ST stat.ME stat.TH

Wasserstein Spatial Depth

François Bachoc, Alberto González-Sanz, Jean-Michel Loubes, Yisha Yao

Details
English abstract

Modeling observations as random distributions embedded within Wasserstein spaces is becoming increasingly popular across scientific fields, as it captures the variability and geometric structure of the data more effectively. However, the distinct geometry and unique properties of Wasserstein spaces pose challenges to the application of conventional statistical tools, which are primarily designed for Euclidean spaces. Consequently, adapting and developing new methodologies for analysis within Wasserstein spaces has become essential. The space of distributions on $\mathbb{R}^d$ with $d>1$ is not linear and "mimics" the geometry of a Riemannian manifold. In this paper, we extend the concept of statistical depth to distribution-valued data, introducing the notion of Wasserstein spatial depth. This new measure provides a way to rank and order distributions, enabling the development of order-based clustering techniques and inferential tools. We show that Wasserstein spatial depth (WSD) preserves critical properties of conventional statistical depths, notably, ranging within $[0,1]$, transformation and geodesic invariance, vanishing at infinity, reaching a maximum at the geometric median, and continuity. Regarding robustness, we characterize the breakdown points of the empirical depth regions and the influence function of the WSD. Additionally, the population WSD has a straightforward plug-in estimator based on sampled empirical distributions. We establish the estimator's consistency and asymptotic normality. We also provide a two-sample test for populations of distributions based on the WSD. Finally, extensive simulations and a real-data application showcase the practical efficacy of the WSD.

2408.02667 2026-04-08 stat.ME

An Online Meta-Level Adaptive Design Framework with Targeted Learning Inference: Applications to Evaluating and Utilizing Surrogate Outcomes in Adaptive Designs

Wenxin Zhang, Aaron Hudson, Maya Petersen, Mark van der Laan

Details
English abstract

Adaptive designs are increasingly used in clinical trials and online experiments to improve participant outcomes by dynamically updating treatment allocation as data accumulate. In practice, experimenters often consider multiple candidate designs, each with distinct trade-offs. However, typically only one design is implemented at a time, leaving benefits and costs of alternative designs unobserved and unquantified. To address this, we propose a novel meta-level adaptive design framework that enables real-time, data-driven evaluation and selection among candidate adaptive designs. Specifically, we define a new class of causal estimands to evaluate adaptive designs and propose Targeted Maximum Likelihood Estimators for these estimands. These estimators are asymptotically normal while accommodating dependence in adaptive-design data without parametric assumptions, enabling online selection among candidate designs. We further apply this framework to a motivating example where multiple surrogates of a long-term outcome are considered for updating randomization probabilities in adaptive experiments. Unlike existing surrogate evaluation methods, our approach comprehensively quantifies surrogates' utility to accelerate detection of heterogeneous treatment effects, expedite updates to treatment randomization, and improve participant outcomes, facilitating dynamic selection among surrogate-guided designs. Overall, our framework provides a unified approach for evaluating opportunities and costs of various adaptive designs and guiding real-time decision-making in adaptive experiments.

2404.19367 2026-04-08 math.ST stat.TH

Parametric estimation and LAN property of the birth-death-move process with mutations

Lisa Balsollier, Frédéric Lavancier

Details
Journal ref
Electronic Journal of Statistics 2026, 20 (1), pp.1280-1323
English abstract

A birth-death-move process with mutations is a Markov model for a system of interacting marked particles that move over time, with births and deaths. In addition, the mark of each particle may also change, which constitutes a mutation. Assuming a parametric form for this model, we derive its likelihood expression and prove its local asymptotic normality. The efficiency and asymptotic distribution of the maximum likelihood estimator, with an explicit expression of its covariance matrix, are deduced. The underlying technical assumptions are shown to be satisfied by several natural parametric specifications. As an application, we leverage this model to analyse the joint dynamics of two types of proteins in a living cell that are involved in the exocytosis process. Our approach enables us to quantify the so-called colocalization phenomenon, answering an important question in microbiology.

2403.18072 2026-04-08 stat.CO cs.LG stat.ME stat.ML

Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo

Shijie Zhong, Wanggang Shen, Tommie Catanach, Xun Huan

Comments 28 pages, 19 figures

Details
Journal ref
SIAM/ASA Journal on Uncertainty Quantification 14 (2026) 19-47
English abstract

Optimal experimental design (OED) provides a systematic approach to quantify and maximize the value of experimental data. Under a Bayesian approach, conventional OED maximizes the expected information gain (EIG) on model parameters. However, we are often interested in not the parameters themselves, but predictive quantities of interest (QoIs) that depend on the parameters in a nonlinear manner. We present a computational framework of predictive goal-oriented OED (GO-OED) suitable for nonlinear observation and prediction models, which seeks the experimental design providing the greatest EIG on the QoIs. In particular, we propose a nested Monte Carlo estimator for the QoI EIG, featuring Markov chain Monte Carlo for posterior sampling and kernel density estimation for evaluating the posterior-predictive density and its Kullback-Leibler divergence from the prior-predictive. The GO-OED design is then found by maximizing the EIG over the design space using Bayesian optimization. We demonstrate the effectiveness of the overall nonlinear GO-OED method, and illustrate its differences versus conventional non-GO-OED, through various test problems and an application of sensor placement for source inversion in a convection-diffusion field.

2403.13027 2026-04-08 cs.LG cs.CR cs.IT math.IT stat.ML

Towards Better Statistical Understanding of Watermarking LLMs

Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li

Details
English abstract

In this paper, we study the problem of watermarking large language models (LLMs). We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the red-green list watermarking algorithm. We show that the optimal solution to the optimization problem enjoys a nice analytical property which provides a better understanding and inspires the algorithm design for the watermarking process. We develop an online dual gradient ascent watermarking algorithm in light of this optimization formulation and prove its asymptotic Pareto optimality between model distortion and detection ability. Such a result guarantees an increased average green list probability, and hence detection ability, explicitly (in contrast to previous results). Moreover, we provide a systematic discussion on the choice of the model distortion metrics for the watermarking problem. We justify our choice of KL divergence and present issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
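
As background for the red-green list scheme this abstract builds on, here is a minimal sketch with a fixed logit bias. The seeding by previous token id, the parameter names `gamma` and `delta`, and both function names are assumptions for illustration; the paper's online dual gradient ascent algorithm, which adapts the bias, is not reproduced here.

```python
import random


def green_list(prev_token, vocab_size, gamma=0.5):
    """Pseudo-random green list seeded by the previous token id."""
    rng = random.Random(prev_token)   # stand-in for a keyed hash function
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, prev_token, delta=2.0, gamma=0.5):
    """Add a fixed bias delta to the logits of green-list tokens."""
    green = green_list(prev_token, len(logits), gamma)
    return [l + delta if i in green else l for i, l in enumerate(logits)]
```

A detector that knows the seeding rule can recompute each green list and count how often generated tokens fall in it; a larger `delta` makes detection easier but distorts the model's distribution more, which is the trade-off the abstract formalizes.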

2402.15095 2026-04-08 math.ST cs.DS cs.LG math.PR stat.TH

The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime

Shuyang Gong, Zhangsong Li

Comments 31 pages; updated funding information

Details
English abstract

Motivated by the problem of matching two correlated random geometric graphs, we study the problem of matching two Gaussian geometric models correlated through a latent node permutation. Specifically, given an unknown permutation $π^*$ on $\{1,\ldots,n\}$ and given $n$ i.i.d. pairs of correlated Gaussian vectors $\{X_{π^*(i)},Y_i\}$ in $\mathbb{R}^d$ with noise parameter $σ$, we consider two types of (correlated) weighted complete graphs with edge weights given by $A_{i,j}=\langle X_i,X_j \rangle$, $B_{i,j}=\langle Y_i,Y_j \rangle$. The goal is to recover the hidden vertex correspondence $π^*$ based on the observed matrices $A$ and $B$. For the low-dimensional regime where $d=O(\log n)$, Wang, Wu, Xu, and Yolou [WWXY22+] established the information thresholds for exact and almost exact recovery in matching correlated Gaussian geometric models. They also conducted numerical experiments for the classical Umeyama algorithm. In our work, we prove that this algorithm achieves exact recovery of $π^*$ when the noise parameter $σ=o(d^{-3}n^{-2/d})$, and almost exact recovery when $σ=o(d^{-3}n^{-1/d})$. Our results approach the information thresholds up to a $\operatorname{poly}(d)$ factor in the low-dimensional regime.

2307.02719 2026-04-08 cs.LG stat.ML

Understanding Uncertainty Sampling via Equivalent Loss

Shang Liu, Xiaocheng Li

Comments An updated version of the previous paper titled "Understanding Uncertainty Sampling". Added a major result of sample complexity and other theoretical results; cut the experiment part

Details
English abstract

Uncertainty sampling is a prevalent active learning algorithm that sequentially queries the annotations of data samples about which the current prediction model is uncertain. However, the use of uncertainty sampling has been largely heuristic: There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss, nor a theoretical guarantee that prescribes a standard protocol to implement the algorithm. In this work, we systematically examine uncertainty sampling algorithms in the binary classification problem via a notion of equivalent loss which depends on the used uncertainty measure and the original loss function, and establish that an uncertainty sampling algorithm is optimizing against such an equivalent loss. The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. When the convexity is preserved, we give a sample complexity result for the equivalent loss, and later translate it into a binary loss guarantee via the surrogate link function. We prove the asymptotic superiority of uncertainty sampling over passive learning via this approach under mild conditions. We also discuss some potential extensions, including the pool-based setting and potential generalization to multi-class classification as well as regression problems.
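
One common heuristic form of the query rule discussed in this abstract, for binary classification, picks the pool sample whose predicted probability is closest to 0.5. The sketch below assumes that particular uncertainty measure and a hypothetical function name; the abstract itself notes there is no consensus definition, so this is one option among several.

```python
def most_uncertain(probabilities):
    """Index of the pool sample whose predicted P(y=1) is closest to 0.5."""
    return min(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
```

For example, `most_uncertain([0.9, 0.55, 0.1, 0.3])` selects index 1, the sample the model is least sure about.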

2306.10430 2026-04-08 stat.ML cs.AI cs.LG stat.CO stat.ME

Variational Sequential Optimal Experimental Design using Reinforcement Learning

Wanggang Shen, Jiayuan Dong, Xun Huan

Details
Journal ref
Computer Methods in Applied Mechanics and Engineering 444 (2025) 118068
English abstract

We present variational sequential optimal experimental design (vsOED), a novel method for optimally designing a finite sequence of experiments within a Bayesian framework with information-theoretic criteria. vsOED employs a one-point reward formulation with variational posterior approximations, providing a provable lower bound to the expected information gain. Numerical methods are developed following an actor-critic reinforcement learning approach, including derivation and estimation of variational and policy gradients to optimize the design policy, and posterior approximation using Gaussian mixture models and normalizing flows. vsOED accommodates nuisance parameters, implicit likelihoods, and multiple candidate models, while supporting flexible design criteria that can target designs for model discrimination, parameter inference, goal-oriented prediction, and their weighted combinations. We demonstrate vsOED across various engineering and science applications, illustrating its superior sample efficiency compared to existing sequential experimental design algorithms.

2305.02657 2026-04-08 stat.ML cs.LG

On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains

Yicheng Li, Zixiong Yu, Guhan Chen, Qian Lin

Details
English abstract

In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on a general domain rather than $\mathbb S^{d}$. This class of kernel functions includes, but is not limited to, the neural tangent kernel associated with neural networks with different depths and various activation functions. After proving that the dynamics of training wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we can further illustrate the minimax optimality of the wide neural network provided that the underlying truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of NTK. We also show that the overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels might also be of independent interest.

2206.04236 2026-04-08 cs.CR cs.DS cs.LG stat.ML

Edgeworth Accountant: An Analytical Approach to Differential Privacy Composition

Hua Wang, Sheng Gao, Huanyu Zhang, Milan Shen, Weijie J. Su, Jiayuan Wu

Details
English abstract

In privacy-preserving data analysis, many procedures and algorithms are structured as compositions of multiple private building blocks. As such, an important question is how to efficiently compute the overall privacy loss under composition. This paper introduces the Edgeworth Accountant, an analytical approach to composing differential privacy guarantees for private algorithms. Leveraging the $f$-differential privacy framework, the Edgeworth Accountant accurately tracks privacy loss under composition, enabling a closed-form expression of privacy guarantees through privacy-loss log-likelihood ratios (PLLRs). As implied by its name, this method applies the Edgeworth expansion to estimate and define the probability distribution of the sum of the PLLRs. Furthermore, by using a technique that simplifies complex distributions into simpler ones, we demonstrate the Edgeworth Accountant's applicability to any noise-addition mechanism. Its main advantage is providing $(ε, δ)$-differential privacy bounds that are non-asymptotic and do not significantly increase computational cost. This feature sets it apart from previous approaches, in which the running time increases with the number of mechanisms under composition. We conclude by showing how our Edgeworth Accountant offers accurate estimates and tight upper and lower bounds on $(ε, δ)$-differential privacy guarantees, especially tailored for training private models in deep learning and federated analytics.