arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.25678 2026-03-27 cs.CE econ.GN q-fin.EC stat.AP

Concentration And Distribution of Container Flows In Mauritania's Maritime System (2019-2022)

Mohamed Bouka, Moulaye Abdel Kader Ould Moulaye Ismail

Abstract

Small, trade-dependent economies often exhibit limited maritime connectivity, yet empirical evidence on the structural configuration of their container systems remains limited. This study analyzes route concentration and node distributions in Mauritania's maritime container system during 2019-2022 using shipment-level data measured in forty-foot equivalent units (FFE). Routes, origin nodes, destination nodes, and industries are represented as FFE-weighted probability distributions, and concentration and divergence metrics are used to assess structural properties. The results show strong corridor concentration across the seven observed routes (HHI = 0.296), with the top three accounting for approximately 84% of total FFE. Node structures differ by direction: imports are associated with a highly concentrated set of destination nodes (HHI = 0.848), while exports originate from only two origin nodes (HHI = 0.567) and are distributed across a large number of destinations (HHI = 0.053). Industry distributions are more concentrated for exports (HHI = 0.352) than for imports (HHI = 0.096), with frozen fish and seafood accounting for more than 53% of export volume. Temporal analysis shows that route concentration remains stable over time (HHI ~ 0.293-0.303), while node distributions exhibit measurable variation, particularly for export destinations (JSD ~ 0.395) and import origins (JSD ~ 0.250).
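The two metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the FFE totals are hypothetical, and the base-2 logarithm in the Jensen-Shannon divergence is an assumption (the abstract does not state the base or whether the square-root distance form is used).

```python
import math

def normalize(weights):
    """Turn non-negative weights (e.g. FFE totals) into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

def hhi(shares):
    """Herfindahl-Hirschman index: sum of squared shares, between 1/n and 1."""
    return sum(s * s for s in shares)

def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logs (assumed), so it lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical FFE totals for seven routes in two years (not the paper's data).
year_a = normalize([400, 300, 150, 60, 40, 30, 20])
year_b = normalize([380, 310, 160, 70, 30, 30, 20])
print(round(hhi(year_a), 3), round(jsd(year_a, year_b), 4))
```

An HHI near 1/n indicates evenly spread flows, while a value near 1 indicates that a single route or node dominates.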

2603.25668 2026-03-27 stat.ME

A generalized Bayesian approach to multiple changepoint analysis

Yuhui Wang, Andrew M. Thomas, Michael Jauch

Comments 36 pages, 10 figures

Abstract

We introduce a generalized Bayesian method for multiple changepoint analysis with a loss function inspired by multinomial logistic regression. The method does not require a specification of the data-generating process and avoids restrictive assumptions on the nature of changepoints. From the joint posterior distribution, we can make simultaneous inference on the locations of changepoints and the coefficients of a multinomial logistic regression model for distinguishing data across homogeneous segments. The multinomial logistic regression coefficients provide a familiar means of interpreting potentially complex changes. To select the number of changepoints, we leverage posterior summaries that measure whether the multinomial logistic classifier can distinguish data from either side of a potential changepoint. To simulate from the generalized posterior distribution, we present a Gibbs sampler based on Pólya-Gamma data augmentation. We assess the accuracy and flexibility of our method through simulation studies featuring different types of changes and demonstrate its interpretability through applications to financial network data and topological data derived from nanoparticle videos.

2603.25657 2026-03-27 math.OC stat.ML

Instance-optimal stochastic convex optimization: Can we improve upon sample-average and robust stochastic approximation?

Liwei Jiang, Ashwin Pananjady

Comments 51 pages, 5 figures

Abstract

We study the unconstrained minimization of a smooth and strongly convex population loss function under a stochastic oracle that introduces both additive and multiplicative noise; this is a canonical and widely-studied setting that arises across operations research, signal processing, and machine learning. We begin by showing that standard approaches such as sample average approximation and robust (or averaged) stochastic approximation can lead to suboptimal -- and in some cases arbitrarily poor -- performance with realistic finite sample sizes. In contrast, we demonstrate that a carefully designed variance reduction strategy, which we term VISOR for short, can significantly outperform these approaches while using the same sample size. Our upper bounds are complemented by finite-sample, information-theoretic local minimax lower bounds, which highlight fundamental, instance-dependent factors that govern the performance of any estimator. Taken together, these results demonstrate that an accelerated variant of VISOR is instance-optimal, achieving the best possible sample complexity up to logarithmic factors while also attaining optimal oracle complexity. We apply our theory to generalized linear models and improve upon classical results. In particular, we obtain the best-known non-asymptotic, instance-dependent generalization error bounds for stochastic methods, even in linear regression.

2603.25622 2026-03-27 cs.DS cs.LG math.ST stat.ML stat.TH

The Geometry of Efficient Nonconvex Sampling

Santosh S. Vempala, Andre Wibisono

Abstract

We present an efficient algorithm for uniformly sampling from an arbitrary compact body $\mathcal{X} \subset \mathbb{R}^n$ from a warm start under isoperimetry and a natural volume growth condition. Our result provides a substantial common generalization of known results for convex bodies and star-shaped bodies. The complexity of the algorithm is polynomial in the dimension, the Poincaré constant of the uniform distribution on $\mathcal{X}$ and the volume growth constant of the set $\mathcal{X}$.

2603.25579 2026-03-27 stat.ML cond-mat.dis-nn cs.LG

The Rules-and-Facts Model for Simultaneous Generalization and Memorization in Neural Networks

Gabriele Farné, Fabrizio Boncoraglio, Lenka Zdeborová

Abstract

A key capability of modern neural networks is their capacity to simultaneously learn underlying rules and memorize specific facts or exceptions. Yet, theoretical understanding of this dual capability remains limited. We introduce the Rules-and-Facts (RAF) model, a minimal solvable setting that enables precise characterization of this phenomenon by bridging two classical lines of work in the statistical physics of learning: the teacher-student framework for generalization and Gardner-style capacity analysis for memorization. In the RAF model, a fraction $1 - \varepsilon$ of training labels is generated by a structured teacher rule, while a fraction $\varepsilon$ consists of unstructured facts with random labels. We characterize when the learner can simultaneously recover the underlying rule - allowing generalization to new data - and memorize the unstructured examples. Our results quantify how overparameterization enables the simultaneous realization of these two objectives: sufficient excess capacity supports memorization, while regularization and the choice of kernel or nonlinearity control the allocation of capacity between rule learning and memorization. The RAF model provides a theoretical foundation for understanding how modern neural networks can infer structure while storing rare or non-compressible information.
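The label-generating mechanism described above can be sketched roughly as follows. This is an illustrative assumption-laden sketch, not the authors' setup: the linear sign teacher, the Gaussian inputs, and the choice to mark each example as a "fact" independently with probability $\varepsilon$ are all hypothetical choices made for the example.

```python
import random

def raf_labels(inputs, teacher, eps, rng):
    """Label each input by the teacher rule with prob. 1 - eps, else by a fair coin."""
    labels, is_fact = [], []
    for x in inputs:
        fact = rng.random() < eps
        y = rng.choice([-1, 1]) if fact else teacher(x)
        labels.append(y)
        is_fact.append(fact)
    return labels, is_fact

def sign_teacher(w):
    """Hypothetical linear teacher: sign of the inner product with w."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

rng = random.Random(0)
dim, n, eps = 5, 1000, 0.2
w_star = [rng.gauss(0, 1) for _ in range(dim)]
xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
ys, facts = raf_labels(xs, sign_teacher(w_star), eps, rng)
print(sum(facts) / n)  # empirical fraction of "fact" examples, roughly eps for large n
```

A learner that generalizes must recover `w_star` from the rule-labeled examples, while memorization requires fitting the coin-flip labels as well.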

2603.25530 2026-03-27 stat.ML cs.NA math.NA

Adaptive Subspace Modeling With Functional Tucker Decomposition

Noah Steidle, Joppe De Jonghe, Mariya Ishteva

Comments 18 pages, 12 figures

Abstract

Tensors provide a structured representation for multidimensional data, yet discretization can obscure important information when such data originates from continuous processes. We address this limitation by introducing a functional Tucker decomposition (FTD) that embeds mode-wise continuity constraints directly into the decomposition. The FTD employs reproducing kernel Hilbert spaces (RKHS) to model continuous modes without requiring an a-priori basis, while preserving the multi-linear subspace structure of the Tucker model. Through RKHS-driven representation, the model yields adaptive and expressive factor descriptions that enable targeted modeling of subspaces. The value of this approach is demonstrated in domain-variant tensor classification. In particular, we illustrate its effectiveness with classification tasks in hyperspectral imaging and multivariate time series analysis, highlighting the benefits of combining structural decomposition with functional adaptability.

2603.25509 2026-03-27 econ.EM cs.LG stat.AP stat.ME stat.ML

Conformal Prediction for Nonparametric Instrumental Regression

Masahiro Kato

Abstract

We propose a method for constructing distribution-free prediction intervals in nonparametric instrumental variable regression (NPIV), with finite-sample coverage guarantees. Building on the conditional guarantee framework in conformal inference, we reformulate conditional coverage as marginal coverage over a class of IV shifts $\mathcal{F}$. Our method can be combined with any NPIV estimator, including sieve 2SLS and other machine-learning-based NPIV methods such as neural network minimax approaches. Our theoretical analysis establishes distribution-free, finite-sample coverage over a practitioner-chosen class of IV shifts.

2603.25480 2026-03-27 cs.AI math.ST stat.TH

Retraining as Approximate Bayesian Inference

Harrison Katz

Abstract

Model retraining is usually treated as an ongoing maintenance task. But as Harrison Katz now argues, retraining can be better understood as approximate Bayesian inference under computational constraints. The gap between a continuously updated belief state and your frozen deployed model is "learning debt," and the retraining decision is a cost minimization problem with a threshold that falls out of your loss function. In this article Katz provides a decision-theoretic framework for retraining policies. The result is evidence-based triggers that replace calendar schedules and make governance auditable. For readers less familiar with the Bayesian and decision-theoretic language, key terms are defined in a glossary at the end of the article.

2603.25466 2026-03-27 stat.ML cs.LG math.ST stat.TH

Residual-as-Teacher: Mitigating Bias Propagation in Student-Teacher Estimation

Kakei Yamamoto, Martin J. Wainwright

Abstract

We study statistical estimation in a student-teacher setting, where predictions from a pre-trained teacher are used to guide a student model. A standard approach is to train the student to directly match the teacher's outputs, which we refer to as student soft matching (SM). This approach directly propagates any systematic bias or mis-specification present in the teacher, thereby degrading the student's predictions. We propose and analyze an alternative scheme, known as residual-as-teacher (RaT), in which the teacher is used to estimate residuals in the student's predictions. Our analysis shows how the student can thereby emulate a proximal gradient scheme for solving an oracle optimization problem, and this provably reduces the effect of teacher bias. For general student-teacher pairs, we establish non-asymptotic excess risk bounds for any RaT fixed point, along with convergence guarantees for the student-teacher iterative scheme. For kernel-based student-teacher pairs, we prove a sharp separation: the RaT method achieves the minimax-optimal rate, while the SM method incurs constant prediction error for any sample size. Experiments on both synthetic data and ImageNette classification under covariate shift corroborate our theoretical findings.

2603.25455 2026-03-27 stat.AP q-bio.QM

A Bayesian Gamma-power-mixture survival regression model: predicting the recurrence of prostate cancer post-prostatectomy

Tommy Walker Mackay, Mingtong Xu, Shahrokh F. Shariat, Roger Sewell

Comments 19 pages, 13 figures, 3 tables

Abstract

In a dataset of 423 patients who had had radical prostatectomy for localised prostate cancer we estimated the apparent Shannon information (ASI) about time to biochemical recurrence in various subsets of the available pre-operative variables using a Bayesian Gamma-power-mixture survival regression model. In all the subsets examined the ASI was positive with posterior probability greater than 0.975. Using only age and results of pre-operative blood tests (PSA and biomarkers) we achieved 0.232 (0.180 to 0.290) nats ASI (0.335 (0.260 to 0.419) bits) (posterior mean and equitailed 95% posterior confidence intervals). This is more than double the mean posterior ASI previously achieved on the same dataset by a subset of the current authors using a log-skew-Student-mixture model, and is greater than that previous value with posterior probability greater than 0.99. Additionally using pre- or post-operative Gleason grades, operative findings, clinical stage, and presence or absence of extraprostatic extension or seminal vesicle invasion did not increase the ASI extracted. However, removing the blood-based biomarkers and replacing them with either pre-operative Gleason grades or findings available from MRI scanning greatly reduced the available ASI to 0.077 (0.038 to 0.120) and 0.088 (0.045 to 0.132) nats, respectively (both less than the values using blood-based biomarkers with posterior probability greater than 0.995). A greedy approach to selection of the best biomarkers gave TGFbeta1, VCAM1, IL6sR, and uPA in descending order of importance from those examined.
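The nats and bits figures quoted above are related by a factor of $\ln 2$. A small sketch of the conversion follows; the function name is hypothetical, and because it starts from the rounded nats values, the last printed digit can differ slightly from the bits values in the abstract.

```python
import math

def nats_to_bits(nats):
    """Convert information from nats (natural log) to bits (log base 2)."""
    return nats / math.log(2)

# Posterior mean and 95% interval endpoints quoted in the abstract, in nats.
for nats in (0.232, 0.180, 0.290):
    print(f"{nats:.3f} nats = {nats_to_bits(nats):.3f} bits")
```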

2603.25397 2026-03-27 stat.ME cs.AI cs.LG stat.ML

A Causal Framework for Evaluating ICU Discharge Strategies

Sagar Nagaraj Simha, Juliette Ortholand, Dave Dongelmans, Jessica D. Workum, Olivier W. M. Thijssens, Ameen Abu-Hanna, Giovanni Cinà

Comments 8 pages, 2 figures, 2 tables

Abstract

In this applied paper, we address the difficult open problem of when to discharge patients from the Intensive Care Unit. This can be conceived as an optimal stopping scenario with three added challenges: 1) the evaluation of a stopping strategy from observational data is itself a complex causal inference problem, 2) the composite objective is to minimize the length of intervention and maximize the outcome, but the two cannot be collapsed to a single dimension, and 3) the recording of variables stops when the intervention is discontinued. Our contributions are two-fold. First, we generalize the implementation of the g-formula Python package, providing a framework to evaluate stopping strategies for problems with the aforementioned structure, including positivity and coverage checks. Second, with a fully open-source pipeline, we apply this approach to MIMIC-IV, a public ICU dataset, demonstrating the potential for strategies that improve upon current care.

2603.25370 2026-03-27 stat.ML cs.LG

A Distribution-to-Distribution Neural Probabilistic Forecasting Framework for Dynamical Systems

Tianlin Yang, Hailiang Du, Louis Aslett

Comments 11 pages, 5 figures

Abstract

Probabilistic forecasting provides a principled framework for uncertainty quantification in dynamical systems by representing predictions as probability distributions rather than deterministic trajectories. However, existing forecasting approaches, whether physics-based or neural-network-based, remain fundamentally trajectory-oriented: predictive distributions are usually accessed through ensembles or sampling, rather than evolved directly as dynamical objects. A distribution-to-distribution (D2D) neural probabilistic forecasting framework is developed to operate directly on predictive distributions. The framework introduces a distributional encoding and decoding structure around a replaceable neural forecasting module, using kernel mean embeddings to represent input distributions and mixture density networks to parameterise output predictive distributions. This design enables recursive propagation of predictive uncertainty within a unified end-to-end neural architecture, with model training and evaluation carried out directly in terms of probabilistic forecast skill. The framework is demonstrated on the Lorenz63 chaotic dynamical system. Results show that the D2D model captures nontrivial distributional evolution under nonlinear dynamics, produces skillful probabilistic forecasts without explicit ensemble simulation, and remains competitive with, and in some cases outperforms, a simplified perfect model benchmark. These findings point to a new paradigm for probabilistic forecasting, in which predictive distributions are learned and evolved directly rather than reconstructed indirectly through ensemble-based uncertainty propagation.

2603.25311 2026-03-27 stat.ML cs.LG

Practical Efficient Global Optimization is No-regret

Jingyi Wang, Haowei Wang, Nai-Yuan Chiang, Juliane Mueller, Tucker Hartland, Cosmin G. Petra

Abstract

Efficient global optimization (EGO) is one of the most widely used noise-free Bayesian optimization algorithms. It comprises the Gaussian process (GP) surrogate model and expected improvement (EI) acquisition function. In practice, when EGO is applied, a scalar matrix of a small positive value (also called a nugget or jitter) is usually added to the covariance matrix of the deterministic GP to improve numerical stability. We refer to this EGO with a positive nugget as the practical EGO. Despite its wide adoption and empirical success, to date, cumulative regret bounds for practical EGO have yet to be established. In this paper, we present for the first time the cumulative regret upper bound of practical EGO. In particular, we show that practical EGO has sublinear cumulative regret bounds and thus is a no-regret algorithm for commonly used kernels including the squared exponential (SE) and Matérn kernels ($\nu>\frac{1}{2}$). Moreover, we analyze the effect of the nugget on the regret bound and discuss the theoretical implications for its choice. Numerical experiments are conducted to support and validate our findings.
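The two ingredients mentioned above, the nugget on the covariance matrix and the EI acquisition function, can be illustrated with a minimal sketch. It assumes scalar inputs, a unit-variance SE kernel, and a minimization convention for EI; these choices and all function names are assumptions for illustration, not taken from the paper.

```python
import math

def se_kernel(x, y, length_scale=1.0):
    """Squared exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)

def covariance(xs, nugget=0.0):
    """Kernel matrix K + nugget * I for the sampled points xs."""
    return [[se_kernel(a, b) + (nugget if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]

def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization, given GP posterior mean and std at a point."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

# Two nearly coincident samples make K close to singular; the nugget lifts the determinant.
xs = [0.5, 0.5 + 1e-9]
for nugget in (0.0, 1e-6):
    (a, b), (c, d) = covariance(xs, nugget)
    print(nugget, a * d - b * c)
print(expected_improvement(mu=0.2, sigma=0.3, best=0.0))
```

The determinant comparison shows why the nugget helps numerically: without it, clustered samples leave the covariance matrix effectively non-invertible in floating point.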

2603.25280 2026-03-27 cs.IT math.IT math.ST stat.TH

List Estimation

Nikola Zlatanov, Amin Gohari, Farzad Shahrivari, Mikhail Rudakov

Abstract

Classical estimation outputs a single point estimate of an unknown $d$-dimensional vector from an observation. In this paper, we study $k$-list estimation, in which a single observation is used to produce a list of $k$ candidate estimates and performance is measured by the expected squared distance from the true vector to the closest candidate. We compare this centralized setting with a symmetric decentralized MMSE benchmark in which $k$ agents observe conditionally i.i.d. measurements and each agent outputs its own MMSE estimate. On the centralized side, we show that optimal $k$-list estimation is equivalent to fixed-rate $k$-point vector quantization of the posterior distribution and, under standard regularity conditions, admits an exact high-rate asymptotic expansion with explicit constants and decay rate $k^{-2/d}$. On the decentralized side, we derive lower bounds in terms of the small-ball behavior of the single-agent MMSE error; in particular, when the conditional error density is bounded near the origin, the benchmark distortion cannot decay faster than order $k^{-2/d}$. We further show that if the error density vanishes at the origin, then the decentralized benchmark is provably unable to match the centralized $k^{-2/d}$ exponent, whereas the centralized estimator retains that scaling. Gaussian specializations yield explicit formulas and numerical experiments corroborate the predicted asymptotic behavior. Overall, the results show that, in the scaling with $k$, one observation combined with $k$ carefully chosen candidates can be asymptotically as effective as (and in some regimes strictly better than) this MMSE-based decentralized benchmark with $k$ independent observations.
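The performance measure described above, the expected squared distance from the true vector to the closest of $k$ candidates, can be approximated from posterior samples. The sketch below is a plain Monte Carlo average with hypothetical sample values; it does not implement the paper's optimal quantizer.

```python
def list_distortion(samples, candidates):
    """Average squared distance from each sample to its closest candidate."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return sum(min(sq_dist(s, c) for c in candidates) for s in samples) / len(samples)

# Hypothetical posterior samples in d = 1 (illustrative values only).
samples = [(-1.0,), (-0.8,), (0.9,), (1.1,)]
print(list_distortion(samples, [(0.05,)]))          # k = 1: a single point estimate
print(list_distortion(samples, [(-0.9,), (1.0,)]))  # k = 2: one candidate per cluster
```

With a bimodal posterior like this one, a second candidate lowers the distortion sharply, which is the intuition behind the link to $k$-point vector quantization.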

2603.25235 2026-03-27 stat.AP

Bayesian Inference for Epidemic Final Size Datasets with Hidden Underlying Household Structure

Joseph Brooks, Thomas House, Lorenzo Pellis, Joe Hilton

Abstract

Households represent a key unit of interest in infectious disease epidemiology, in both empirical studies and mathematical modelling. The within-household transmission potential of a disease is often summarised by a secondary attack ratio (SAR). Despite its widespread use, the SAR depends on the household size distribution (HHSD) seen during the study period, making it difficult to generalise to new contexts. Extending estimates of transmission potential to new populations instead requires estimates of person-to-person transmission rates which can be convoluted with data on population structure to parametrise mechanistic transmission models. In this study we present a new Bayesian inference method which uses an MCMC algorithm to infer the transmission intensity by imputing the unreported household structure underlying the epidemic. This method can be run on household epidemiological data reported at varying levels of resolution. For synthetic data from a realistic underlying HHSD, we were able to achieve over 95% coverage in our estimates of transmission rate consistently. We were also able to consistently achieve over 95% coverage for data generated with a pathological underlying HHSD, given strong information about the HHSD. Using an existing dataset which recorded micro-scale household epidemiological outcomes during the COVID-19 pandemic, we show that stratifying observed SARs by household size substantially reduces the uncertainty in estimates. Our findings suggest that researchers conducting household epidemiological studies can improve the utility of results for infectious disease modellers by reporting household-stratified estimates. These results aim to encourage the reporting of higher resolution outputs in epidemiological field work as, in the absence of strong priors, transmission parameters were not easily identifiable from low resolution datasets, which are often reported.

2603.25224 2026-03-27 stat.ML cs.LG

Fair regression under localized demographic parity constraints

Arthur Charpentier, Christophe Denis, Romuald Elie, Mohamed Hebiri, François HU

Abstract

Demographic parity (DP) is a widely used group fairness criterion requiring predictive distributions to be invariant across sensitive groups. While natural in classification, full distributional DP is often overly restrictive in regression and can lead to substantial accuracy loss. We propose a relaxation of DP tailored to regression, enforcing parity only at a finite set of quantile levels and/or score thresholds. Concretely, we introduce a novel $(\ell, Z)$-fair predictor, which imposes groupwise CDF constraints of the form $F_{f|S=s}(z_m) = \ell_m$ for prescribed pairs $(\ell_m, z_m)$. For this setting, we derive closed-form characterizations of the optimal fair discretized predictor via a Lagrangian dual formulation and quantify the discretization cost, showing that the risk gap to the continuous optimum vanishes as the grid is refined. We further develop a model-agnostic post-processing algorithm based on two samples (labeled for learning a base regressor and unlabeled for calibration), and establish finite-sample guarantees on constraint violation and excess penalized risk. In addition, we introduce two alternative frameworks where we match group and marginal CDF values at selected score thresholds. In both settings, we provide closed-form solutions for the optimal fair discretized predictor. Experiments on synthetic and real datasets illustrate an interpretable fairness-accuracy trade-off, enabling targeted corrections at decision-relevant quantiles or thresholds while preserving predictive performance.
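A localized parity constraint of this kind asks that, within every sensitive group, the CDF of the predictions at a threshold z equal a prescribed level. The sketch below only measures how far a set of predictions is from satisfying such constraints on empirical data; the function names and the toy values are hypothetical, and the paper's post-processing algorithm is not reproduced here.

```python
def group_cdf(scores, groups, z):
    """Empirical CDF value P(score <= z) within each sensitive group."""
    out = {}
    for g in set(groups):
        vals = [s for s, gg in zip(scores, groups) if gg == g]
        out[g] = sum(v <= z for v in vals) / len(vals)
    return out

def max_violation(scores, groups, constraints):
    """Largest gap |groupwise CDF at z - level| over groups and (level, z) pairs."""
    return max(abs(cdf - level)
               for level, z in constraints
               for cdf in group_cdf(scores, groups, z).values())

# Hypothetical predictions and group labels (illustrative values only).
scores = [0.1, 0.4, 0.6, 0.9, 0.2, 0.5, 0.7, 0.8]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(max_violation(scores, groups, [(0.5, 0.55)]))
```

Checking only a few (level, threshold) pairs is what makes the criterion weaker than full distributional parity, which would require the whole groupwise CDFs to coincide.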

2603.23783 2026-03-27 cs.LG cs.AI math.OC math.PR stat.ML

Probabilistic Geometric Alignment via Bayesian Latent Transport for Domain-Adaptive Foundation Models

Aueaphum Aueawatthanaphisut, Kuepon Auewattanapisut

Comments 11 pages, 8 Figures, 25 Equations, 5 Tables and 3 Theorems

Abstract

Adapting large-scale foundation models to new domains with limited supervision remains a fundamental challenge due to latent distribution mismatch, unstable optimization dynamics, and miscalibrated uncertainty propagation. This paper introduces an uncertainty-aware probabilistic latent transport framework that formulates domain adaptation as a stochastic geometric alignment problem in representation space. A Bayesian transport operator is proposed to redistribute latent probability mass along Wasserstein-type geodesic trajectories, while a PAC-Bayesian regularization mechanism constrains posterior model complexity to mitigate catastrophic overfitting. The proposed formulation yields theoretical guarantees on convergence stability, loss landscape smoothness, and sample efficiency under distributional shift. Empirical analyses demonstrate substantial reduction in latent manifold discrepancy, accelerated transport energy decay, and improved covariance calibration compared with deterministic fine-tuning and adversarial domain adaptation baselines. Furthermore, bounded posterior uncertainty evolution indicates enhanced probabilistic reliability during cross-domain transfer. By establishing a principled connection between stochastic optimal transport geometry and statistical generalization theory, the proposed framework provides new insights into robust adaptation of modern foundation architectures operating in heterogeneous environments. These findings suggest that uncertainty-aware probabilistic alignment constitutes a promising paradigm for reliable transfer learning in next-generation deep representation systems.

2601.07325 2026-03-27 stat.ML cs.LG math.ST stat.TH

Robust Bayesian Inference via Variational Approximations of Generalized Rho-Posteriors

EL Mahdi Khribch, Pierre Alquier

Comments 45 pages including the proofs in appendices, 16 figures

Abstract

We introduce the $\widetilde{\rho}$-posterior, a modified version of the $\rho$-posterior, obtained by replacing the supremum over competitor parameters with a softmax aggregation. This modification allows a PAC-Bayesian analysis of the $\widetilde{\rho}$-posterior. This yields finite-sample oracle inequalities with explicit convergence rates that inherit the key robustness properties of the original framework, in particular, graceful degradation under model misspecification and data contamination. Crucially, the PAC-Bayesian oracle inequalities extend to variational approximations of the $\widetilde{\rho}$-posterior, providing theoretical guarantees for tractable inference. Numerical experiments on exponential families, regression, and real-world datasets confirm that the resulting variational procedures achieve robustness competitive with theoretical predictions at computational cost comparable to standard variational Bayes.

2512.09433 2026-03-27 stat.ME

Multiply-robust Estimator of Cumulative Incidence Function Difference for Right-Censored Competing Risks Data

Yifei Tian, Ying Wu

Abstract

In causal inference, estimating the average treatment effect is a central objective, and in the context of competing risks data, this effect can be quantified by the cause-specific cumulative incidence function (CIF) difference. While doubly robust estimators offer a more robust way to estimate causal effects from observational studies, they remain inconsistent if both models are misspecified. To improve robustness, we develop a multiply robust estimator for the difference in cause-specific CIFs using right-censored competing risks data. The proposed framework integrates the pseudo-value approach, which transforms the censored, time-dependent CIF into a complete-data outcome, with the multiply robust estimation framework. By specifying multiple candidate models for both the propensity score and the outcome regression, the resulting estimator is consistent and asymptotically unbiased, provided that at least one of the multiple propensity score or outcome regression models is correctly specified. Simulation studies show that our multiply robust estimator remains virtually unbiased and maintains nominal coverage rates under various model misspecification scenarios and a wide range of censoring rates. Finally, the proposed multiply robust model is illustrated using the Right Heart Catheterization dataset.

2509.18708 2026-03-27 stat.ME math.ST stat.CO stat.TH

Optimization-centric cutting feedback for semiparametric models

Linda S. L. Tan, David J. Nott, David T. Frazier

Abstract

Complex statistical models are often built by combining multiple submodels, called modules. Here we consider modular inference where the modules contain both parametric and nonparametric components. In such cases, standard Bayesian inference can be highly sensitive to misspecification in any module, and influential prior specifications for the nonparametric components can compromise inference for the parametric components, and vice versa. We propose a novel "optimization-centric" approach to cutting feedback for semiparametric modular inference, which can address misspecification and prior-data conflicts. The proposed cut posteriors are defined via a variational optimization problem like other generalized posteriors, but regularization is based on Rényi divergence, instead of Kullback-Leibler divergence (KLD). We show empirically that defining the cut posterior using Rényi divergence delivers more robust inference than KLD, and Rényi divergence reduces the tendency to underestimate uncertainty when the variational approximations impose strong parametric or independence assumptions. Novel posterior concentration results that accommodate the Rényi divergence and allow for semiparametric components are derived, extending existing results for cut posteriors that only apply to KLD and parametric models. These new methods are demonstrated in a benchmark example and two real examples: Gaussian process adjustments for confounding in causal inference and misspecified copula models with nonparametric marginals.

2508.19897 2026-03-27 stat.ML cs.AI cs.LG

The Information Dynamics of Generative Diffusion

Dejan Stancevic, Luca Ambrogioni

Comments 25 pages

Journal ref Entropy 2026, 28(2), 195
Abstract

Generative diffusion models have emerged as a powerful class of models in machine learning, yet a unified theoretical understanding of their operation is still developing. This paper provides an integrated perspective on generative diffusion by connecting the information-theoretic, dynamical, and thermodynamic aspects. We demonstrate that the rate of conditional entropy production during generation (i.e., the generative bandwidth) is directly governed by the expected divergence of the score function's vector field. This divergence, in turn, is linked to the branching of trajectories and generative bifurcations, which we characterize as symmetry-breaking phase transitions in the energy landscape. Beyond ensemble averages, we demonstrate that symmetry-breaking decisions are revealed by peaks in the variance of pathwise conditional entropy, capturing heterogeneity in how individual trajectories resolve uncertainty. Together, these results establish generative diffusion as a process of controlled, noise-induced symmetry breaking, in which the score function acts as a dynamic nonlinear filter that regulates both the rate and variability of information flow from noise to data.

2508.18768 2026-03-27 stat.ML cs.LG

Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

Mengmeng Li, Philipp J. Schneider, Jelisaveta Aleksić, Daniel Kuhn

Comments Published at ICLR 2026

Abstract

We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees $\widetilde{\mathcal{O}}(\sqrt{T})$ regret in the adversarial regime and $\widetilde{\mathcal{O}}(\ln T)$ regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the $K$-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.
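The idea of reducing a $K$-dimensional projection to a single-variable root-finding problem can be illustrated on a simpler, swapped-in problem: the Euclidean projection onto the capped simplex. This is not the paper's entropy-regularized projection. Here the KKT conditions give each coordinate as a clipped shift of the input, so only one scalar multiplier (called `lam` below, a hypothetical name) has to be found, which bisection does.

```python
def project_capped_simplex(v, m, iters=100):
    """Euclidean projection of v onto {x in [0,1]^K : sum(x) = m} via bisection on lam."""
    def total(lam):
        # KKT form of the solution: x_i = clip(v_i - lam, 0, 1).
        return sum(min(1.0, max(0.0, vi - lam)) for vi in v)
    # total(lo) equals K (all coordinates clipped to 1) and total(hi) equals 0.
    lo, hi = min(v) - 1.0, max(v)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > m:
            lo = mid   # sum too large: increase the multiplier
        else:
            hi = mid   # sum too small or exact: decrease the multiplier
    lam = 0.5 * (lo + hi)
    return [min(1.0, max(0.0, vi - lam)) for vi in v]

# Hypothetical scores for K = 3 base arms, selecting m = 1 unit of mass.
print(project_capped_simplex([0.9, 0.5, 0.1], 1))
```

Each bisection step costs one pass over the $K$ coordinates, so the whole projection stays linear in $K$ up to the fixed number of iterations.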

2508.03272 2026-03-27 cs.LG cs.IT math.IT stat.ML

The alpha-beta divergence for real and complex data

Sergio Cruces

Details
Journal ref
IEEE Signal Processing Letters (2026)
English abstract

Divergences are fundamental to the information criteria that underpin most signal processing algorithms. The alpha-beta family of divergences, designed for non-negative data, offers a versatile framework that parameterizes and continuously interpolates several separable divergences found in existing literature. This work extends the definition of alpha-beta divergences to accommodate complex data, specifically when the arguments of the divergence are complex vectors. This novel formulation is designed in such a way that, by setting the divergence hyperparameters to unity, it particularizes to the well-known Euclidean and Mahalanobis squared distances. Other choices of hyperparameters yield practical separable and non-separable extensions of several classical divergences. In the context of the problem of approximating a complex random vector, the centroid obtained by optimizing the alpha-beta mean distortion has a closed-form expression, whose interpretation sheds light on the distinct roles of the divergence hyperparameters. These contributions may have wide potential applicability, as there are many signal processing domains in which the underlying data are inherently complex.

2506.03726 2026-03-27 cs.DL stat.AP

Introducing multiverse analysis to bibliometrics: The case of team size effects on disruptive research

Christian Leibel, Lutz Bornmann

Comments 50 pages, 3 figures, 11 tables. Quantitative Science Studies 2026

Details
English abstract

Although bibliometrics has become an essential tool in the evaluation of research performance, bibliometric analyses are sensitive to a range of methodological choices. Subtle choices in data selection, indicator construction, and modeling decisions can substantially alter results. Ensuring robustness (meaning that findings hold up under different reasonable scenarios) is therefore critical for credible research and research evaluation. To address this issue, this study introduces multiverse analysis to bibliometrics. Multiverse analysis is a statistical tool that enables analysts to transparently discuss modeling assumptions and thoroughly assess model robustness. Whereas standard robustness checks usually cover only a small subset of all plausible models, multiverse analysis includes all plausible models. The benefits of multiverse analysis are illustrated by assessing the robustness of the findings reported by Wu et al. (2019), who observed that small teams tend to produce more disruptive research than large teams. While we found robust evidence of a negative effect of team size on disruption scores, the effect size depends substantially on the model specification. Our findings underscore the importance of assessing the multiverse robustness of bibliometric results to clarify their practical implications.

2505.19807 2026-03-27 cs.LG stat.ML

Density Ratio-Free Doubly Robust Proxy Causal Learning

Bariscan Bozkurt, Houssam Zenati, Dimitri Meunier, Liyuan Xu, Arthur Gretton

Comments NeurIPS published version

Details
English abstract

We study the problem of causal function estimation in the Proxy Causal Learning (PCL) framework, where confounders are not observed but proxies for the confounders are available. Two main approaches have been proposed: outcome bridge-based and treatment bridge-based methods. In this work, we propose two kernel-based doubly robust estimators that combine the strengths of both approaches, and naturally handle continuous and high-dimensional variables. Our identification strategy builds on a recent density ratio-free method for treatment bridge-based PCL; furthermore, in contrast to previous approaches, it does not require indicator functions or kernel smoothing over the treatment variable. These properties make it especially well-suited for continuous or high-dimensional treatments. By using kernel mean embeddings, we propose the first density ratio-free doubly robust estimators for proxy causal learning, which have closed-form solutions and strong uniform consistency guarantees. Our estimators outperform existing methods on PCL benchmarks, including a prior doubly robust method that requires both kernel smoothing and density ratio estimation.

2505.19046 2026-03-27 stat.ML cs.LG

When Models Don't Collapse: On the Consistency of Iterative MLE

Daniel Barzilai, Ohad Shamir

Details
English abstract

The widespread use of generative models has created a feedback loop, in which each generation of models is trained on data partially produced by its predecessors. This process has raised concerns about model collapse: A critical degradation in performance caused by repeated training on synthetic data. However, different analyses in the literature have reached different conclusions as to the severity of model collapse. As such, it remains unclear how concerning this phenomenon is, and under which assumptions it can be avoided. To address this, we theoretically study model collapse for maximum likelihood estimation (MLE), in a natural setting where synthetic data is gradually added to the original data set. Under standard assumptions (similar to those long used for proving asymptotic consistency and normality of MLE), we establish non-asymptotic bounds showing that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions (beyond MLE consistency) are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set. To the best of our knowledge, these are the first rigorous examples of iterative generative modeling with accumulating data that rapidly leads to model collapse.

2505.00450 2026-03-27 stat.ME stat.AP

Spatial vertical regression for spatial panel data: Evaluating the effect of the Florentine tramway's first line on commercial vitality

Giulio Grossi, Alessandra Mattei, Georgia Papadogeorgou

Details
English abstract

Synthetic control methods are commonly used in panel data settings to evaluate the effect of an intervention. In many of these cases, the treated and control units correspond to spatial units such as regions or neighborhoods. Our approach addresses the challenge of understanding how an intervention applied at specific locations influences the surrounding area. Traditional synthetic control applications may struggle with defining the effective area of impact, the extent of treatment propagation across space, and the variation of effects with distance from the treatment sites. To address these challenges, we introduce Spatial Vertical Regression (SVR) within the Bayesian paradigm. This innovative approach allows us to accurately predict the outcomes in varying proximities to the treatment sites, while meticulously accounting for the spatial structure inherent in the data. Specifically, rooted in the vertical regression framework of the synthetic control method, SVR employs a Gaussian process to ensure that the imputation of missing potential outcomes for areas at different distances from the treatment sites is spatially coherent, reflecting the expectation that nearby areas experience similar outcomes and have similar relationships to control areas. This approach is particularly pertinent to our study on the Florentine tramway's first line construction. We study its influence on the local commercial landscape, focusing on how business prevalence varies at different distances from the tram stops.

2504.21419 2026-03-27 stat.ML cs.LG math.ST stat.TH

Kernel Density Machines

Andrea Della Vecchia, Damir Filipovic, Paul Schneider

Details
English abstract

We introduce kernel density machines (KDM), an agnostic kernel-based framework for learning the Radon-Nikodym derivative (density) between probability measures under minimal assumptions. KDM applies to general measurable spaces and avoids the structural requirements common in classical nonparametric density estimators. We construct a sample estimator and prove its consistency and a functional central limit theorem. To enable scalability, we develop Nystrom-type low-rank approximations and derive optimal error rates, filling a gap in the literature where such guarantees for density learning have been missing. We demonstrate the versatility of KDM through applications to kernel-based two-sample testing and conditional distribution estimation, the latter enjoying dimension-free guarantees beyond those of locally smoothed methods. Experiments on simulated and real data show that KDM is accurate, scalable, and competitive across a range of tasks.

2504.08482 2026-03-27 math.ST stat.TH

Winsorized mean estimation with heavy tails and adversarial contamination

Anders Bredahl Kock, David Preinerstorfer

Details
English abstract

Finite-sample upper bounds on the estimation error of a winsorized mean estimator of the population mean in the presence of heavy tails and adversarial contamination are established. In comparison to existing results, the winsorized mean estimator we study avoids a sample splitting device and winsorizes substantially fewer observations, which improves its applicability and practical performance.
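
For readers unfamiliar with winsorization, a minimal sketch follows. It implements the textbook symmetric winsorized mean, in which the `k` smallest and `k` largest observations are clipped to the nearest retained order statistic. The abstract does not state how the paper chooses its winsorization points, so the parameter `k` and the function name are assumptions made for illustration only.

```python
def winsorized_mean(xs, k):
    """Textbook symmetric winsorized mean.

    The k smallest values are raised to the (k+1)-th smallest value and the
    k largest values are lowered to the (k+1)-th largest value; the clipped
    sample is then averaged. Nothing is discarded, unlike a trimmed mean.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("xs must be non-empty")
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    ordered = sorted(xs)
    lower, upper = ordered[k], ordered[n - 1 - k]
    clipped = [min(max(x, lower), upper) for x in xs]
    return sum(clipped) / n
```

With `k = 0` the function reduces to the ordinary sample mean, which makes the effect of a single gross outlier easy to compare.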

2504.08435 2026-03-27 math.ST stat.TH

High-dimensional Gaussian and bootstrap approximations for robust means

Anders Bredahl Kock, David Preinerstorfer

Details
English abstract

Recent years have witnessed much progress on Gaussian and bootstrap approximations to the distribution of sums of independent random vectors with dimension $d$ large relative to the sample size $n$. However, for any number of moments $m>2$ that the summands may possess, there exist distributions such that these approximations break down if $d$ grows faster than the polynomial barrier $n^{\frac{m}{2}-1}$. In this paper, we establish Gaussian and bootstrap approximations to the distributions of winsorized and trimmed means that allow $d$ to grow at an exponential rate in $n$ as long as $m>2$ moments exist. The approximations remain valid under some amount of adversarial contamination. Our implementations of the winsorized and trimmed means do not require knowledge of $m$. As a consequence, the performance of the approximation guarantees ``adapts'' to $m$.

2503.16104 2026-03-27 cs.CY cs.CR stat.AP

Doing More With Less: Mismatch-Based Risk-Limiting Audits

Alexander Ek, Michelle Blom, Philip B. Stark, Peter J. Stuckey, Vanessa J. Teague, Damjan Vukcevic

Comments 15 pages, 2 figures. Presented at Voting'25. The current version fixes a few minor errors

Details
Journal ref
FC 2025 Workshops, Lecture Notes in Computer Science 15754 (2026) 241-255
English abstract

One approach to risk-limiting audits (RLAs) compares randomly selected cast vote records (CVRs) to votes read by human auditors from the corresponding ballot cards. Historically, such methods reduce audit sample sizes by considering how each sampled CVR differs from the corresponding true vote, not merely whether they differ. Here we investigate the latter approach, auditing by testing whether the total number of mismatches in the full set of CVRs exceeds the minimum number of CVR errors required for the reported outcome to be wrong (the "CVR margin"). This strategy makes it possible to audit more social choice functions and simplifies RLAs conceptually, which makes it easier to explain than some other RLA approaches. The cost is larger sample sizes. "Mismatch-based RLAs" only require a lower bound on the CVR margin, which for some social choice functions is easier to calculate than the effect of particular errors. When the population rate of mismatches is low and the lower bound on the CVR margin is close to the true CVR margin, the increase in sample size is small. However, the increase may be very large when errors include errors that, if corrected, would widen the CVR margin rather than narrow it; errors that affect the margin between candidates other than the reported winner with the fewest votes and the reported loser with the most votes; or errors that affect different margins.

2312.10431 2026-03-27 cs.LG stat.ML

Continuous Diffusion for Mixed-Type Tabular Data

Markus Mueller, Kathrin Gruber, Dennis Fok

Comments published at ICLR 2025

Details
English abstract

Score-based generative models, commonly referred to as diffusion models, have proven to be successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. CDTD is based on a novel combination of score matching and score interpolation to enforce a unified continuous noise distribution for both continuous and categorical features. We explicitly acknowledge the necessity of homogenizing distinct data types by relying on model-specific loss calibration and initialization schemes. To further address the high heterogeneity in mixed-type tabular data, we introduce adaptive feature- or type-specific noise schedules. These ensure balanced generative performance across features and optimize the allocation of model capacity across features and diffusion time. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models, captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts sample quality. Replication code is available at https://github.com/muellermarkus/cdtd.

2209.10053 2026-03-27 math.PR math.ST stat.ML stat.TH

Instance-dependent uniform tail bounds for empirical processes

Sohail Bahmani

Comments accepted for publication in IEEE Transactions on Information Theory

Details
English abstract

We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial ``deflation'' step to the standard generic chaining argument. The resulting tail bound is the sum of the complexity of the ``deflated function class'' in terms of a generalization of Talagrand's $γ$ functional, and the deviation of the function instance, both of which are formulated based on the natural seminorm induced by the corresponding Cramér functions. Leveraging another less demanding natural seminorm, we also show similar bounds, though with implicit dependence on the sample size, in the more general case where finite exponential moments cannot be assumed. We also provide approximations of the tail bounds in terms of the more prevalent Orlicz norms or their ``incomplete'' versions under suitable moment conditions.

2603.25055 2026-03-27 math.ST stat.TH

Kendall Correlation Coefficient for non-Identically Distributed Variables

Alexei Stepanov

Details
English abstract

In the present paper, we discuss for the first time the theoretical Kendall correlation coefficient for non-identical bivariate data. In the non-identical case, we first introduce a theoretical Kendall correlation coefficient $τ_n$ and show that the expected value of the rank Kendall correlation coefficient $\tilde{τ}_n$ is equal to $τ_n$. We then prove that $\tilde{τ}_n$ converges in probability to $τ=\lim_{n\rightarrow\infty} τ_n$. These facts enable us to state that $τ_n$ is a correctly defined theoretical Kendall correlation coefficient for the non-identical case. We also support our theoretical results by simulation experiments.
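
As a point of reference, the standard rank Kendall coefficient (tau-a) can be computed as in the sketch below. The abstract does not spell out the paper's exact definition of the rank coefficient, so treating it as the usual pairwise-sign statistic is an assumption, and the function name is hypothetical.

```python
def kendall_tau(xs, ys):
    """Standard sample Kendall coefficient (tau-a).

    Averages sign(x_i - x_j) * sign(y_i - y_j) over all pairs i < j, so
    concordant pairs contribute +1, discordant pairs -1, and ties 0.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2")

    def sign(v):
        return (v > 0) - (v < 0)

    total = sum(
        sign(xs[i] - xs[j]) * sign(ys[i] - ys[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 2.0 * total / (n * (n - 1))
```

This direct double loop costs $O(n^2)$ time, which is adequate for small simulation checks; faster sorting-based algorithms exist for large samples.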

2603.25047 2026-03-27 cs.LG stat.ML

The Order Is The Message

Jordan LeDoux

Comments 51 pages, 12 figures

Details
English abstract

In a controlled experiment on modular arithmetic ($p = 9973$), varying only example ordering while holding all else constant, two fixed-ordering strategies achieve 99.5% test accuracy by epochs 487 and 659 respectively from a training set comprising 0.3% of the input space, well below established sample complexity lower bounds for this task under IID ordering. The IID baseline achieves 0.30% after 5,000 epochs from identical data. An adversarially structured ordering suppresses learning entirely. The generalizing model reliably constructs a Fourier representation whose fundamental frequency is the Fourier dual of the ordering structure, encoding information present in no individual training example, with the same fundamental emerging across all seeds tested regardless of initialization or training set composition. We discuss implications for training efficiency, the reinterpretation of grokking, and the safety risks of a channel that evades all content-level auditing.

2603.25024 2026-03-27 stat.ML cs.LG

Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method

Chenxu Yu, Wenqi Fang

Details
English abstract

As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention due to their solid theoretical foundations and strong potential for real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a large number of function evaluations (NFEs), resulting in high computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework along with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image classification and sequence modeling, achieving lower NFEs and improved predictive accuracy.
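
For context, the generic Nesterov accelerated gradient update that the method's name refers to can be sketched as below. This is the plain optimizer form applied to an arbitrary gradient function; the abstract does not describe how NAG is wired into the SDE-BNN dynamics, so nothing here should be read as the authors' model. The names `nesterov_minimize`, `lr`, and `momentum` are illustrative.

```python
def nesterov_minimize(grad, x0, lr=0.05, momentum=0.9, steps=500):
    """Plain Nesterov accelerated gradient descent on a list of floats.

    The gradient is evaluated at the look-ahead point x + momentum * v rather
    than at x itself, which is what separates NAG from classical momentum.
    """
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        lookahead = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad(lookahead)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


def quadratic_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]


minimizer = nesterov_minimize(quadratic_grad, [5.0, -3.0])
```

On this ill-conditioned toy quadratic the iterates contract toward the origin; the step size and momentum were picked by hand for the example and are not tuned values from any paper.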

2603.25017 2026-03-27 stat.ME stat.ML

Discrete Causal Representation Learning

Wenjin Zhang, Yixin Wang, Yuqi Gu

Details
English abstract

Causal representation learning seeks to uncover causal relationships among high-level latent variables from low-level, entangled, and noisy observations. Existing approaches often either rely on deep neural networks, which lack interpretability and formal guarantees, or impose restrictive assumptions like linearity, continuous-only observations, and strong structural priors. These limitations particularly challenge applications with a large number of discrete latent variables and mixed-type observations. To address these challenges, we propose discrete causal representation learning (DCRL), a generative framework that models a directed acyclic graph among discrete latent variables, along with a sparse bipartite graph linking latent and observed layers. This design accommodates continuous, count, and binary responses through flexible measurement models while maintaining interpretability. Under mild conditions, we prove that both the bipartite measurement graph and the latent causal graph are identifiable from the observed data distribution alone. We further propose a three-stage estimate-resample-discovery pipeline: penalized estimation of the generative model parameters, resampling of latent configurations from the fitted model, and score-based causal discovery on the resampled latents. We establish the consistency of this procedure, ensuring reliable recovery of the latent causal structure. Empirical studies on educational assessment and synthetic image data demonstrate that DCRL recovers sparse and interpretable latent causal structures.

2603.25010 2026-03-27 stat.ME

Bayesian Propensity Score-Augmented Latent Factor Models for Causal Inference with Time-Series Cross-Sectional Data

Licheng Liu

Details
English abstract

We propose a Bayesian propensity score-augmented latent factor model for causal inference with time-series cross-sectional data. The framework explicitly models the treatment assignment mechanism by incorporating latent factor loadings, while the outcome model flexibly incorporates the propensity score, for example through stratification. Relative to existing approaches, the proposed method provides greater flexibility and captures additional heterogeneity across propensity-score strata, enabling more credible comparisons between treated and control units within each stratum. For estimation and inference, we adopt an approximate Bayesian procedure to address the model feedback problem common in Bayesian propensity score analysis. We demonstrate the performance of the proposed method through Monte Carlo simulations and an empirical application examining the effect of political connections on firm value.

2603.24974 2026-03-27 math.OC cs.LG stat.ML

The Value of Information in Resource-Constrained Pricing

Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi

Comments Extended version of the NeurIPS 2025 paper (arXiv:2501.14155). This version adds phase transition, surrogate-assisted variance reduction under model misspecification, and numerical experiments

Details
English abstract

Firms that price perishable resources -- airline seats, hotel rooms, seasonal inventory -- now routinely use demand predictions, but these predictions vary widely in quality. Under hard capacity constraints, acting on an inaccurate prediction can irreversibly deplete inventory needed for future periods. We study how prediction uncertainty propagates into dynamic pricing decisions with linear demand, stochastic noise, and finite capacity. A certified demand forecast with known error bound $ε^0$ specifies where the system should operate: it shifts regret from $O(\sqrt{T})$ to $O(\log T)$ when $ε^0 \lesssim T^{-1/4}$, and we prove this threshold is tight. A misspecified surrogate model -- biased but correlated with true demand -- cannot set prices directly but reduces learning variance by a factor of $(1-ρ^2)$ through control variates. The two mechanisms compose: the forecast determines the regret regime; the surrogate tightens estimation within it. All algorithms rest on a boundary attraction mechanism that stabilizes pricing near degenerate capacity boundaries without requiring non-degeneracy assumptions. Experiments confirm the phase transition threshold, the variance reduction from surrogates, and robustness across problem instances.
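
The $(1-ρ^2)$ factor is the classical variance reduction of a control variate with the optimal coefficient, and it can be checked numerically. The sketch below uses synthetic toy data and hypothetical names (`control_variate_adjust`, `x_mean`); it illustrates the generic technique only and is not the paper's pricing algorithm.

```python
import random

def mean(vs):
    return sum(vs) / len(vs)

def covariance(xs, ys):
    # Unbiased sample covariance; covariance(v, v) is the sample variance.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def control_variate_adjust(ys, xs, x_mean):
    """Return y - c * (x - x_mean) with c = Cov(x, y) / Var(x).

    x plays the role of a surrogate whose mean x_mean is known. The surrogate
    may be biased for y; only its correlation with y matters.
    """
    c = covariance(xs, ys) / covariance(xs, xs)
    return [y - c * (x - x_mean) for x, y in zip(xs, ys)]

# Synthetic toy data: a surrogate x correlated with, but biased for, y.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1.0 + 0.8 * x + rng.gauss(0.0, 0.6) for x in xs]

adjusted = control_variate_adjust(ys, xs, x_mean=0.0)
rho_sq = covariance(xs, ys) ** 2 / (covariance(xs, xs) * covariance(ys, ys))
ratio = covariance(adjusted, adjusted) / covariance(ys, ys)
```

Because `c` is estimated from the same sample, `ratio` equals `1 - rho_sq` up to floating-point error, where `rho_sq` is the squared sample correlation.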

2603.24916 2026-03-27 cs.LG stat.ML

Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML

Yassien Shaalan

Comments 12 pages, 5 figures. Accepted at MLSys 2026. TinyML / on-device learning paper on hypernetwork-based compression for ECG and other 1D biosignals, with integer-only inference on commodity MCUs. Evaluated on Apnea-ECG, PTB-XL, and MIT-BIH. Camera-ready version with additional datasets, experiments, and insights will appear after May 2026

Details
Journal ref
MLSys 2026
English abstract

Deploying neural networks on microcontrollers is constrained by kilobytes of flash and SRAM, where 1x1 pointwise (PW) mixers often dominate memory even after INT8 quantization across vision, audio, and wearable sensing. We present HYPER-TINYPW, a compression-as-generation approach that replaces most stored PW weights with generated weights: a shared micro-MLP synthesizes PW kernels once at load time from tiny per-layer codes, caches them, and executes them with standard integer operators. This preserves commodity MCU runtimes and adds only a one-off synthesis cost; steady-state latency and energy match INT8 separable CNN baselines. Enforcing a shared latent basis across layers removes cross-layer redundancy, while keeping PW1 in INT8 stabilizes early, morphology-sensitive mixing. We contribute (i) TinyML-faithful packed-byte accounting covering generator, heads/factorization, codes, kept PW1, and backbone; (ii) a unified evaluation with validation-tuned t* and bootstrap confidence intervals; and (iii) a deployability analysis covering integer-only inference and boot versus lazy synthesis. On three ECG benchmarks (Apnea-ECG, PTB-XL, MIT-BIH), HYPER-TINYPW shifts the macro-F1 versus flash Pareto frontier: at about 225 kB it matches a roughly 1.4 MB CNN while being 6.31x smaller (84.15% fewer bytes), retaining at least 95% of large-model macro-F1. Under 32-64 kB budgets it sustains balanced detection where compact baselines degrade. The mechanism applies broadly to other 1D biosignals, on-device speech, and embedded sensing tasks where per-layer redundancy dominates, indicating a wider role for compression-as-generation in resource-constrained ML systems. Beyond ECG, HYPER-TINYPW transfers to TinyML audio: on Speech Commands it reaches 96.2% test accuracy (98.2% best validation), supporting broader applicability to embedded sensing workloads where repeated linear mixers dominate memory.

2603.24899 2026-03-27 econ.EM stat.AP

Calibrating Resident Surveys with Operational Data in Community Planning

Irene S. Gabashvili

Comments 13 pages, 2 figures, 1 table

Details
English abstract

Community associations rely heavily on resident surveys to guide decisions about amenities, infrastructure, and services. However, survey responses reflect perceptions that may not directly correspond to underlying operational conditions. This study bridges that gap by calibrating survey-based satisfaction measures against objective utilization data. Using parking and facility data from Tellico Village, we map perceived problem rates to utilization exceedance probabilities to estimate behavioral congestion thresholds. Results show that dissatisfaction emerges near effective capacity - once spatial, temporal, and informational constraints are considered - rather than at nominal capacity limits. Perceived difficulty is concentrated among active users and is shaped by operational frictions and incomplete system knowledge. These findings demonstrate that perceived congestion reflects constraints on access and reliability, not simply physical shortages. By distinguishing between effective and nominal capacity, the proposed framework enables more accurate diagnosis of system conditions. We propose incorporating behavioral metrics into community performance frameworks to support better decision-making, reduce unnecessary capital expansion, and target operational improvements more effectively.

2603.24875 2026-03-27 stat.ME

Post-selection inference in generalized linear models via parametric programming

Qinyan Shen, Karl Gregory, Xianzheng Huang

Details
English abstract

We propose a unified framework to draw inferences for regression coefficients in a generalized linear model (GLM) following Lasso-based variable selection. We adapt to non-Gaussian GLMs a recently developed parametric programming strategy for post-selection inference in the linear model with a Gaussian response by drawing parallels between maximum likelihood estimation in GLMs and least squares estimation in linear models. We then conduct post-selection inference based on a linearized model for pseudo response and covariate data strategically created based on the raw data. Using synthetic data generated from regression models for three different types of non-Gaussian responses in simulation experiments, we demonstrate that the proposed method effectively corrects the naive inference that ignores variable selection while achieving greater efficiency than a polyhedral-based post-selection adjustment.

2603.24859 2026-03-27 stat.ME math.ST stat.TH

Interpretable Causal Graphical Models for Equilibrium Systems with Confounding

Kai Z. Teh, Kayvan Sadeghi, Terry Soo

Details
English abstract

In applications, quantities of interest are often modelled in equilibrium or an equilibrium solution is sought. The presence of confounding makes causal inference in this setting challenging. We provide interpretable graphical models for equilibrium systems with confounding using anterial graphs (Lauritzen and Sadeghi, 2018), a class of graphs containing directed acyclic graphs, ancestral graphs, and chain graphs. In this setting, we provide valid graphical representations of both counterfactual variables and observational variables, which we relate to counterfactual graphs (Shpitser and Pearl, 2007) and single-world intervention graphs (Richardson and Robins, 2013). As an application of this graphical representation, we provide an element-wise procedure of selecting adjustment sets that flexibly include and exclude given covariates.

2603.24833 2026-03-27 stat.ME econ.EM stat.ML

Robust Matrix Estimation with Side Information

Anish Agarwal, Jungjun Choi, Ming Yuan

Details
English abstract

We introduce a flexible framework for high-dimensional matrix estimation to incorporate side information for both rows and columns. Existing approaches, such as inductive matrix completion, often impose restrictive structure (for example, an exact low-rank covariate interaction term, linear covariate effects, and limited ability to exploit components explained by only one side, row or column, or by neither) and frequently omit an explicit noise component. To address these limitations, we propose to decompose the underlying matrix as the sum of four complementary components: a (possibly nonlinear) interaction between row and column characteristics, a row characteristic-driven component, a column characteristic-driven component, and a residual low-rank structure unexplained by observed characteristics. By combining sieve-based projection with nuclear-norm penalization, each component can be estimated separately and these estimated components can then be aggregated to yield a final estimate. We derive convergence rates that highlight robustness across a range of model configurations depending on the informativeness of the side information. We further extend the method to partially observed matrices under both missing-at-random and missing-not-at-random mechanisms, including block-missing patterns motivated by causal panel data. Simulations and a real-data application to tobacco sales show that leveraging side information improves imputation accuracy and can enhance treatment-effect estimation relative to standard low-rank and spectral-based alternatives.

2603.24820 2026-03-27 stat.ME stat.CO

Robust Twoblock Simultaneous Dimension Reduction

Sven Serneels

Details
English abstract

This paper introduces robust twoblock (RTB) simultaneous dimension reduction, which is the first statistically robust method to perform simultaneous dimension reduction in two blocks of variables and allows the model complexity in each block to be fine-tuned individually. The paper proposes both a dense and a sparse version of the new method. Sparse RTB is the first robust estimator that allows both the model complexity and the degree of sparsity to be selected for each block individually. RTB thereby makes it possible to optimally extract and summarize the relevant portion of information in each block of data, even in the presence of outliers. As a corollary, the estimators can be recombined into a single estimate of regression coefficients for multivariate regression that is operable when the number of variables exceeds the number of cases in each block. An extensive simulation study illustrates that the new methods are resistant to different types of outliers, while maintaining estimation efficiency across a range of dimensionality settings. These findings hold true for both the dense and the sparse method. The methods' performance is further illustrated on two example data sets and a straightforward algorithm is presented and made accessible in an open source repository.

2603.24792 2026-03-27 stat.ME math.ST stat.TH

Improving online FDR procedures via online analogs of e-closure and compound e-values

Ziyu Xu, Lasse Fischer, Aaditya Ramdas

Comments 44 pages, 9 figures

Details
English abstract

In many scientific applications, hypotheses are generated and tested continuously in a stream. We develop a framework for improving online multiple testing procedures with false discovery rate (FDR) control under arbitrary dependence. Our approach is two-fold: we construct methods via the online e-closure principle, as well as a novel formulation of online compound e-values that is defined through donations. This yields strict power improvements over state-of-the-art e-value and p-value procedures while retaining FDR control. We further derive algorithms that compute the decision at time $t$ in $O(\log t)$ time, and we demonstrate improved empirical performance on synthetic and real data.

2603.24783 2026-03-27 stat.ME q-bio.GN stat.AP

Causal Discovery on Dependent Mixed Data with Applications to Gene Regulatory Network Inference

Alex Chen, Qing Zhou

Abstract

Causal discovery aims to infer causal relationships among variables from observational data, typically represented by a directed acyclic graph (DAG). Most existing methods assume independent and identically distributed observations, an assumption often violated in modern applications. In addition, many datasets contain a mixture of continuous and discrete variables, which further complicates causal modeling when dependence across samples is present. To address these challenges, we propose a de-correlation framework for causal discovery from dependent mixed data. Our approach formulates a structural equation model with latent variables that accommodates both continuous and discrete variables while allowing correlated Gaussian errors across units. We estimate the dependence structure among samples via a pairwise maximum likelihood estimator for the covariance matrix and develop an EM algorithm to impute latent variables underlying discrete observations. A de-correlation transformation of the recovered latent data enables the use of standard causal discovery algorithms to learn the underlying causal graph. Simulation studies demonstrate that the proposed method substantially improves causal graph recovery compared with applying standard methods directly to the original dependent data. We apply our approach to single-cell RNA sequencing data to infer gene regulatory networks governing embryonic stem cell differentiation. The inferred regulatory networks show significantly improved predictive likelihood on test data, and many high-confidence edges are supported by known regulatory interactions reported in the literature.

2603.24727 2026-03-27 econ.TH cs.GT math.OC stat.OT

Adversarial Selection

Alma Cohen, Alon Klement, Zvika Neeman, Eilon Solan

Abstract

In many institutional settings, $k$ items are selected with the goal of representing the underlying distribution of claims, opinions, or characteristics in a large population. We study environments with two adversarial parties whose preferences over the selected items are commonly known and opposed. We propose the Quantile Mechanism: one party partitions the population into $k$ disjoint subsets, and the other selects one item from each subset. We show that this procedure is optimally representative among all feasible mechanisms, and illustrate its use in jury selection, multi-district litigation, and committee formation.
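
A minimal sketch of the partition-then-select procedure described above, under assumptions that are not taken from the abstract: each item is summarized by a single numeric score, the partitioning party forms $k$ contiguous blocks of near-equal size after sorting by that score, and the selecting party takes its preferred extreme in each block. The function names are hypothetical.

```python
def partition_into_quantile_blocks(scores, k):
    """Sort the scores and split them into k contiguous, disjoint blocks."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    ordered = sorted(scores)
    n = len(ordered)
    # Block i covers positions [i*n//k, (i+1)*n//k), so block sizes differ by at most one.
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def select_one_per_block(blocks, choose=max):
    """The second party picks exactly one item from every block."""
    return [choose(block) for block in blocks]


# Illustrative population of 12 scores, split into k = 3 blocks.
population = [3, 9, 1, 7, 5, 8, 2, 6, 4, 10, 12, 11]
blocks = partition_into_quantile_blocks(population, k=3)
selected_high = select_one_per_block(blocks, choose=max)
selected_low = select_one_per_block(blocks, choose=min)
```

Whichever extreme the selecting party prefers, each pick stays inside its own block, which is one way to see why the selected items cannot all be drawn from a single tail of the population.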

2603.24718 2026-03-27 stat.ME

Wavelet-based estimation in aggregated functional data with positive and correlated errors

Alex Rodrigo dos Santos Sousa, João Victor Siqueira Rodrigues, Vitor Ribas Perrone, Raul Gomes Rocha

Abstract

We consider the statistical problem of estimating constituent curves from observations of their aggregated curves, referred to as \textit{aggregated functional data}, in models with strictly positive random errors following a Gamma distribution and correlated errors structured through AR(1) and ARFIMA processes. This problem arises in several areas of knowledge, such as chemometrics, for example, when absorbance curves of the constituents of a given substance must be estimated from its aggregated absorbance curve according to the Beer--Lambert law. In this context, we propose Bayesian wavelet-based methods to estimate the component functions within a functional data analysis framework. This approach has the advantage of accurately estimating curves with important local features, such as discontinuities, peaks, and oscillations, due to the representation properties of functions in wavelet bases. We further evaluate the performance of the proposed method through computational simulations, as well as applications to real data.

2603.24715 2026-03-27 astro-ph.IM astro-ph.CO astro-ph.GA stat.AP

A scalable Bayesian framework for galaxy emission line detection and redshift estimation

Alexander Kuhn, Bonnabelle Zabelle, Sara Algeri, Galin L. Jones, Claudia Scarlata

Abstract

Estimating galaxy redshifts is crucial for constraining key physical quantities like those in the equation of state of dark energy. Modern telescopes such as the James Webb Space Telescope, the Euclid Space Telescope, and the NASA Nancy Grace Roman Space Telescope are producing massive amounts of spectroscopic data that enable precise redshift estimation. However, a galaxy's redshift can be estimated only when emission lines are present in the observed spectrum, which is unknown a priori. A novel Bayesian approach to estimating redshift and simultaneously testing for the presence of emission lines is developed. Although modern spectroscopic surveys involve millions of spectra and give rise to highly multimodal posterior distributions, the proposed framework remains computationally efficient, admitting a parallelizable implementation suitable for large-scale inference.

2603.24704 2026-03-27 stat.ME cs.LG stat.AP stat.ML

Conformal Selective Prediction with General Risk Control

Tian Bai, Ying Jin

Abstract

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among "positive" cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we obtain binary trust decisions with finite-sample error control. SCoRE avoids the need for uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.

2603.24643 2026-03-27 stat.AP

A capture-recapture hidden Markov model framework for register-based inference of population size and dynamics

Lucy Y Brown, Eleni Matechou, Bruno Santos, Eleonora Mussino

Comments Submitted to Annals of Applied Statistics. Main paper: 20 pages (5 figures, 1 table). Supplementary material: 26 pages

Abstract

Accurate inference on population dynamics, such as migration and changes in population size, is essential for policymaking, resource allocation and demographic research. Traditional censuses are expensive, infrequent and not timely, leading many countries to adopt register-based approaches to replace or complement them. A primary challenge is that such registers are incomplete: even when individuals are present, their activities may not generate records in specific registers, resulting in false negative observation error. Conversely, some registers arise from administrative or household-level processes, so that individuals may appear in registers despite being absent, leading to false positive observation error. Existing approaches often either rely on ad-hoc decisions that ignore one or both error types, offer inference on population snapshots but not dynamics, or are computationally too slow for practical use. We propose a scalable framework for inferring population size and dynamics from register data, building on Cormack-Jolly-Seber type capture-recapture models formulated as hidden Markov models. Inference is carried out using maximum likelihood estimation, with uncertainty quantified via the Bag of Little Bootstraps. The model accounts for temporary emigration, incorporates an arbitrary number of possibly interacting registers subject to both error types, and allows observation probabilities to vary with individual characteristics and unobservable heterogeneity. We illustrate the approach using Swedish population registers, where overcoverage - individuals registered as living in the country although they are no longer present - provides a motivating example. The application yields new insights into population dynamics and individual trajectories.
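
As a rough illustration of how a hidden Markov model can carry both error types, the sketch below evaluates the likelihood of one individual's 0/1 history in a single register with the forward algorithm. It is far simpler than the framework described above (one register, no covariates, no heterogeneity), and the parameter names and values are hypothetical.

```python
from itertools import product


def register_hmm_likelihood(observations, init_present, stay, enter, detect, false_pos):
    """Forward-algorithm likelihood of a 0/1 register history under a two-state HMM.

    Hidden states are present / absent. A present individual is recorded with
    probability `detect` (1 - detect is the false-negative rate); an absent
    individual is recorded with probability `false_pos`.
    """
    def emission(obs):
        p_present = detect if obs == 1 else 1.0 - detect
        p_absent = false_pos if obs == 1 else 1.0 - false_pos
        return p_present, p_absent

    e_p, e_a = emission(observations[0])
    alpha_p = init_present * e_p
    alpha_a = (1.0 - init_present) * e_a
    for obs in observations[1:]:
        e_p, e_a = emission(obs)
        # `stay`: present -> present; `enter`: absent -> present (return after emigration).
        next_p = (alpha_p * stay + alpha_a * enter) * e_p
        next_a = (alpha_p * (1.0 - stay) + alpha_a * (1.0 - enter)) * e_a
        alpha_p, alpha_a = next_p, next_a
    return alpha_p + alpha_a


params = dict(init_present=0.9, stay=0.8, enter=0.3, detect=0.7, false_pos=0.05)
lik_always_seen = register_hmm_likelihood((1, 1, 1), **params)
# Sanity check: the probabilities of all 2^3 possible histories add up to one.
total = sum(register_hmm_likelihood(h, **params) for h in product([0, 1], repeat=3))
```

Maximum likelihood estimation would then maximize the product of such likelihoods over individuals; that step is omitted here.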

2603.24640 2026-03-27 q-fin.RM math.PR math.ST stat.TH

Ordering results for extreme claim amounts based on random number of claims

Sangita Das

Journal ref
Ricerche di Matematica, 2026
Abstract

Consider two sequences of heterogeneous and independent portfolios of risks $T_1,T_2,\ldots$ and $T^*_{1}, T^*_{2},\ldots$, and let $N_1$ and $N_2$ be two positive integer-valued random variables, independent of the $T_i$ and the $T^*_i$, respectively. In this article, we investigate different stochastic inequalities involving $\min\{T_1,\ldots,T_{N_1}\}$ and $\min\{T^*_1,\ldots,T^*_{N_2}\}$, and $\max\{T_1,\ldots,T_{N_1}\}$ and $\max\{T^*_1,\ldots,T^*_{N_2}\}$, in the sense of the usual stochastic order and the reversed hazard rate order concerning the multivariate chain majorization order. These new results strengthen and generalize some of the well-known results in the literature, including \cite{barmalzan2017ordering}, \cite{balakrishnan2018} and \cite{kundu2021_shock}, for the case of random claim sizes. Several numerical examples are provided to highlight the applicability of this work. Finally, some interesting applications of our results in reliability theory and auction theory are presented.

2603.24638 2026-03-27 cs.LG cond-mat.mtrl-sci physics.chem-ph physics.comp-ph stat.ML

How unconstrained machine-learning models learn physical symmetries

Michelangelo Domina, Joseph William Abbott, Paolo Pegolo, Filippo Bigi, Michele Ceriotti

Comments 15 pages, 9 figures

Abstract

The requirement of generating predictions that exactly fulfill the fundamental symmetry of the corresponding physical quantities has profoundly shaped the development of machine-learning models for physical simulations. In many cases, models are built using constrained mathematical forms that ensure that symmetries are enforced exactly. However, unconstrained models that do not obey rotational symmetries are often found to have competitive performance, and to be able to \emph{learn} an approximately equivariant behavior to a high level of accuracy with a simple data augmentation strategy. In this paper, we introduce rigorous metrics to measure the symmetry content of the learned representations in such models, and assess the accuracy with which the outputs fulfill the equivariance condition. We apply these metrics to two unconstrained, transformer-based models operating on decorated point clouds (a graph neural network for atomistic simulations and a PointNet-style architecture for particle physics) to investigate how symmetry information is processed across architectural layers and is learned during training. Based on these insights, we establish a rigorous framework for diagnosing spectral failure modes in ML models. Enabled by this analysis, we demonstrate that one can achieve superior stability and accuracy by strategically injecting the minimum required inductive biases, preserving the high expressivity and scalability of unconstrained architectures while guaranteeing physical fidelity.

2603.24632 2026-03-27 stat.ME

Estimation in moderately misspecified models

Nils Lid Hjort

Comments 31 pages, 1 figure. Statistical Research Report, Department of Mathematics, University of Oslo, from May 1993, but arXiv'd March 2026

Abstract

Suppose data are fitted to some parametric model but that the true model happens to be one with an additional parameter. When a parameter is to be estimated one can use likelihood estimation in the wider model or in the narrow model. Including the extra parameter in the model means less bias but larger sampling variability. Two basic questions are addressed in this article. (i) Just how much misspecification can the narrow model tolerate? In the context of a large-sample moderate misspecification framework we find a surprisingly simple, sharp, and general answer. There is effectively a `tolerance radius' around a given narrow model, inside of which narrow estimation is more precise than wide estimation for all estimands. This is computed in a selection of examples that also demonstrate the degree of robustness of important standard methods against moderate incorrectness of the model under which they are optimal. (ii) Are there other estimators that work well both under narrow and wide circumstances? We discuss several possibilities and propose some new procedures. All methods are compared in a broad large-sample performance study.

2603.24613 2026-03-27 cs.CG math.AT math.OC stat.ML

Persistence-based topological optimization: a survey

Mathieu Carriere, Yuichi Ike, Théo Lacombe, Naoki Nishikawa

Abstract

Computational topology provides a tool, persistent homology, to extract quantitative descriptors from structured objects (images, graphs, point clouds, etc.). These descriptors can then be involved in optimization problems, typically as a way to incorporate topological priors or to regularize machine learning models. This is usually achieved by minimizing adequate, topologically-informed losses based on these descriptors, which, in turn, naturally raises theoretical and practical questions about the possibility of optimizing such loss functions using gradient-based algorithms. This has been an active research field in the topological data analysis community over the last decade, and various techniques have been developed to enable optimization of persistence-based loss functions with gradient descent schemes. This survey presents the current state of this field, covering its theoretical foundations and algorithmic aspects, and showcasing practical uses in several applications. It includes a detailed introduction to persistence theory and, as such, aims to be accessible to mathematicians and data scientists who are new to the field. It is accompanied by an open-source library which implements the different approaches covered in this survey, providing a convenient playground for researchers to get familiar with the field.

2603.22208 2026-03-27 stat.AP stat.ME stat.ML stat.OT

Identification of physiological shock in intensive care units via Bayesian regime switching models

Emmett B. Kendall, Jonathan P. Williams, Curtis B. Storlie, Misty A. Radosevich, Erica D. Wittwer, Matthew A. Warner

Abstract

Detection of occult hemorrhage (i.e., internal bleeding) in patients in intensive care units (ICUs) can pose significant challenges for critical care workers. Because blood loss may not always be clinically apparent, clinicians rely on monitoring vital signs for specific trends indicative of a hemorrhage event. The inherent difficulties of diagnosing such an event can lead to late intervention by clinicians, which has catastrophic consequences. Therefore, a methodology for early detection of hemorrhage has wide utility. We develop a Bayesian regime switching model (RSM) that analyzes trends in patients' vitals and labs to provide a probabilistic assessment of the underlying physiological state that a patient is in at any given time. This article is motivated by a comprehensive dataset we curated from Mayo Clinic of 33,924 real ICU patient encounters. Longitudinal response measurements are modeled as a vector autoregressive process conditional on all latent states up to the current time point, and the latent states follow a Markov process. We present a novel Bayesian sampling routine to learn the posterior probability distribution of the latent physiological states, and we develop an approach to account for pre-ICU-admission physiological changes. A simulation and a real case study illustrate the effectiveness of our approach.

2603.00704 2026-03-27 stat.ME econ.EM

Robustifying Empirical Bayes

Roger Koenker, Jiaying Gu

Abstract

Two strategies are explored for robustifying classical denoising procedures for the Gaussian sequence model. First, the Hodges and Lehmann (1952) restricted Bayes approach is used to reduce sensitivity to the specification of the initial prior distribution. Second, alternatives to the Gaussian noise assumption are explored. In both cases proposals of Huber (1964) and Mallows (1978) play a crucial role.

2602.20844 2026-03-27 math.ST cs.IT math.IT math.PR stat.ME stat.ML stat.TH

Maximum entropy based testing in network models: ERGMs and constrained optimization

Subhro Ghosh, Rathindra Nath Karmakar, Samriddha Lahiry

Comments 71 pages, authors are listed in alphabetical order of their surnames

Abstract

Stochastic network models play a central role across a wide range of scientific disciplines, and questions of statistical inference arise naturally in this context. In this paper we investigate goodness-of-fit and two-sample testing procedures for statistical networks based on the principle of maximum entropy (MaxEnt). Our approach formulates a constrained entropy-maximization problem on the space of networks, subject to prescribed structural constraints. The resulting test statistics are defined through the Lagrange multipliers associated with the constrained optimization problem, which, to our knowledge, is novel in the statistical networks literature. We establish consistency in the classical regime where the number of vertices is fixed. We then consider asymptotic regimes in which the graph size grows with the sample size, developing tests for both dense and sparse settings. In the dense case, we analyze exponential random graph models (ERGMs) (including the Erdős-Rényi models), while in the sparse regime our theory applies to Erdős-Rényi graphs. Our analysis leverages recent advances in nonlinear large deviation theory for random graphs. We further show that the proposed Lagrange-multiplier framework connects naturally to classical score tests for constrained maximum likelihood estimation. The results provide a unified entropy-based framework for network model assessment across diverse growth regimes.

2602.15091 2026-03-27 stat.ML cs.IT cs.LG math.IT

Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs

Ali Khalesi, Mohammad Reza Deylam Salehi

Abstract

Mixture-of-Experts (MoE) architectures decompose prediction tasks into specialized expert sub-networks selected by a gating mechanism. This letter adopts a communication-theoretic view of MoE gating, modeling the gate as a stochastic channel operating under a finite information rate. Within an information-theoretic learning framework, we specialize a mutual-information generalization bound and develop a rate-distortion characterization $D(R_g)$ of finite-rate gating, where $R_g:=I(X; T)$, yielding (under a standard empirical rate-distortion optimality condition) $\mathbb{E}[R(W)] \le D(R_g)+\delta_m+\sqrt{(2/m)\, I(S; W)}$. The analysis yields capacity-aware limits for communication-constrained MoE systems, and numerical simulations on synthetic multi-expert models empirically confirm the predicted trade-offs between gating rate, expressivity, and generalization.

2602.12023 2026-03-27 econ.EM math.ST stat.ML stat.TH

Decomposition of Spillover Effects Under Misspecification: Pseudo-true Estimands and a Local-Global Extension

Yechan Park, Xiaodong Yang

Abstract

Applied work under interference typically models outcomes as functions of own treatment and a low-dimensional exposure mapping of others' treatments, even when that mapping may be misspecified. We ask what policy object such exposure-based procedures target. Taking the marginal policy effect as primitive, we show that any researcher-chosen exposure mapping induces a unique pseudo-true outcome model: the best approximation to the underlying potential outcomes within the class of functions that depend only on that mapping. This yields a decomposition of the marginal policy effect into exposure-based direct and spillover effects, and each component optimally approximates its oracle counterpart, with a sign-preserving interpretation under monotonicity. We then study a structured misspecification setting in which outcomes depend on both network spillovers and a global equilibrium channel, while the analyst may model only one. In this setting, we obtain a sharper asymptotic decomposition into direct, local, and global components, implying that existing estimators recover their respective oracle channel-specific effects even when the other channel is present but omitted from the maintained model. The analysis also yields phase transitions in convergence rates and higher-order expansions for Z-estimators. A semi-synthetic experiment calibrated to a large cash-transfer study illustrates the empirical relevance of the framework.

2512.09295 2026-03-27 math.ST cs.LG math.PR stat.ML stat.TH

Distributional Shrinkage II: Higher-Order Scores Encode Brenier Map

Tengyuan Liang

Comments 25 pages

Abstract

Consider the additive Gaussian model $Y = X + \sigma Z$, where $X \sim P$ is an unknown signal, $Z \sim N(0,1)$ is independent of $X$, and $\sigma > 0$ is known. Let $Q$ denote the law of $Y$. We construct a hierarchy of denoisers $T_0, T_1, \ldots, T_\infty \colon \mathbb{R} \to \mathbb{R}$ that depend only on higher-order score functions $q^{(m)}/q$, $m \geq 1$, of $Q$ and require no knowledge of the law $P$. The $K$-th order denoiser $T_K$ involves scores up to order $2K{-}1$ and satisfies $W_r(T_K \sharp Q, P) = O(\sigma^{2(K+1)})$ for every $r \geq 1$; in the limit, $T_\infty$ recovers the monotone optimal transport map (Brenier map) pushing $Q$ onto $P$. We provide a complete characterization of the combinatorial structure governing this hierarchy through partial Bell polynomial recursions, making precise how higher-order score functions encode the Brenier map. We further establish rates of convergence for estimating these scores from $n$ i.i.d. draws from $Q$ under two complementary strategies: (i) plug-in kernel density estimation, and (ii) higher-order score matching. The construction reveals a precise interplay among higher-order Fisher-type information, optimal transport, and the combinatorics of integer partitions.
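
To make the stated orders concrete, here is a small check in a case where everything is available in closed form: a Gaussian signal law $P = N(0, \tau^2)$, which is an assumption made purely for illustration. Then $Q = N(0, \tau^2 + \sigma^2)$, the first-order score is $q'(y)/q(y) = -y/(\tau^2+\sigma^2)$, and the Brenier map from $Q$ to $P$ is $y \mapsto y\,\tau/\sqrt{\tau^2+\sigma^2}$. The half-score shrinkage $y + (\sigma^2/2)\,q'(y)/q(y)$ used below is an assumed form of a first-order denoiser; the abstract does not state the formula for $T_1$.

```python
import math


def w2_error(scale, tau, sigma):
    """W_2 distance between N(0, tau^2) and the law of scale * Y, with Y ~ N(0, tau^2 + sigma^2).

    For centered one-dimensional Gaussians, W_2 is the absolute difference
    of the standard deviations.
    """
    return abs(scale * math.sqrt(tau**2 + sigma**2) - tau)


def compare(tau, sigma):
    s2 = tau**2 + sigma**2  # variance of Y
    return {
        # No denoising at all.
        "identity": w2_error(1.0, tau, sigma),
        # y + (sigma^2 / 2) * score(y): assumed first-order denoiser.
        "half_score": w2_error(1.0 - sigma**2 / (2 * s2), tau, sigma),
        # y + sigma^2 * score(y): the posterior mean (Tweedie's formula).
        "full_score": w2_error(1.0 - sigma**2 / s2, tau, sigma),
    }


errors_coarse = compare(tau=1.0, sigma=0.2)
errors_fine = compare(tau=1.0, sigma=0.1)
# Halving sigma divides an O(sigma^2) error by about 4 and an O(sigma^4) error by about 16.
ratio_identity = errors_coarse["identity"] / errors_fine["identity"]
ratio_half = errors_coarse["half_score"] / errors_fine["half_score"]
ratio_full = errors_coarse["full_score"] / errors_fine["full_score"]
```

In this Gaussian case the identity map and the posterior mean both leave an $O(\sigma^2)$ distributional error (the posterior mean over-shrinks), while the half-score map leaves an $O(\sigma^4)$ error, in line with the rate $O(\sigma^{2(K+1)})$ for $K = 0$ and $K = 1$.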

2510.26930 2026-03-27 stat.ME stat.CO

The Interplay between Bayesian Inference and Conformal Prediction

Nina Deliu, Brunero Liseo

Comments 16 pages, 2 figures

Journal ref
Philosophical Transactions of the Royal Society A 2026
Abstract

Conformal prediction has emerged as a cutting-edge methodology in statistics and machine learning, providing prediction intervals with finite-sample frequentist coverage guarantees. Yet, its interplay with Bayesian statistics, often criticised for lacking frequentist guarantees, remains underexplored. Recent work has suggested that conformal prediction can serve to "calibrate" Bayesian credible sets, thereby imparting frequentist validity and motivating deeper investigation into frequentist-Bayesian hybrids. We further argue that Bayesian procedures have the potential to enhance conformal prediction, not only in terms of more informative intervals, but also for achieving nearly optimal solutions under a decision-theoretic framework. Thus, the two paradigms can be jointly used for a principled balance between validity and efficiency. This work provides a basis for bridging this gap. After surveying existing ideas, we formalise the Bayesian conformal inference framework, covering challenging aspects such as statistical efficiency and computational complexity.

2503.13115 2026-03-27 cs.LG cs.AI math.PR stat.ML

Beyond Propagation of Chaos: A Stochastic Algorithm for Mean Field Optimization

Chandan Tankala, Dheeraj M. Nagaraj, Anant Raj

Journal ref
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:5410-5440, 2025
Abstract

Gradient flow in the 2-Wasserstein space is widely used to optimize functionals over probability distributions and is typically implemented using an interacting particle system with $n$ particles. Analyzing these algorithms requires showing (a) that the finite-particle system converges and/or (b) that the resultant empirical distribution of the particles closely approximates the optimal distribution (i.e., propagation of chaos). However, establishing efficient sufficient conditions can be challenging, as the finite particle system may produce heavily dependent random variables. In this work, we study the virtual particle stochastic approximation, originally introduced for Stein Variational Gradient Descent. This method can be viewed as a form of stochastic gradient descent in the Wasserstein space and can be implemented efficiently. In popular settings, we demonstrate that our algorithm's output converges to the optimal distribution under conditions similar to those for the infinite particle limit, and it produces i.i.d. samples without the need to explicitly establish propagation of chaos bounds.

2411.12135 2026-03-27 stat.ML cs.LG

Exact Risk Curves of signSGD in High-Dimensions: Quantifying Preconditioning and Noise-Compression Effects

Ke Liang Xiao, Noah Marshall, Atish Agarwala, Elliot Paquette

Abstract

In recent years, signSGD has garnered interest both as a practical optimizer and as a simple model for understanding adaptive optimizers like Adam. Though there is a general consensus that signSGD acts to precondition optimization and reshapes noise, quantitatively understanding these effects in theoretically solvable settings remains difficult. We present an analysis of signSGD in a high-dimensional limit, and derive a limiting SDE and ODE to describe the risk. Using this framework we quantify four effects of signSGD: effective learning rate, noise compression, diagonal preconditioning, and gradient noise reshaping. Our analysis is consistent with experimental observations but moves beyond that by quantifying the dependence of these effects on the data and noise distributions. We conclude with a conjecture on how these results might be extended to Adam.
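
For readers unfamiliar with the optimizer, the sketch below shows the signSGD update, which moves every coordinate by a fixed step against the sign of the stochastic gradient. The least-squares problem, the synthetic data, and the step size are illustrative assumptions, not the setting analyzed in the paper.

```python
import random


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_sgd_least_squares(data, dim, lr=0.01, steps=2000, seed=0):
    """Run signSGD on the loss 0.5 * (w.x - y)^2 using one random sample per step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        x, y = rng.choice(data)
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        # The stochastic gradient is residual * x; only its coordinate-wise sign is used.
        w = [wi - lr * sign(residual * xi) for wi, xi in zip(w, x)]
    return w


# Noiseless synthetic data generated from a known weight vector (illustrative only).
data_rng = random.Random(1)
w_true = [1.0, -2.0, 0.5]
data = []
for _ in range(200):
    x = [data_rng.gauss(0, 1) for _ in w_true]
    data.append((x, sum(a * b for a, b in zip(w_true, x))))

w_hat = sign_sgd_least_squares(data, dim=3)
max_abs_error = max(abs(a - b) for a, b in zip(w_hat, w_true))
```

Because every coordinate moves by exactly `lr` per step regardless of the gradient magnitude, the iterate with a constant step size hovers around the minimizer rather than converging to it exactly.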

2405.17490 2026-03-27 cs.LG stat.ML

Revisit, Extend, and Enhance Hessian-Free Influence Functions

Ziao Yang, Han Yue, Jian Chen, Hongfu Liu

Abstract

Influence functions serve as crucial tools for assessing sample influence in model interpretation, subset training set selection, noisy label detection, and more. By employing a first-order Taylor expansion, influence functions can estimate sample influence without the need for expensive model retraining. However, applying influence functions directly to deep models presents challenges, primarily due to the non-convex nature of the loss function and the large number of model parameters. This difficulty not only makes computing the inverse of the Hessian matrix costly but also means the inverse does not exist in some cases. Various approaches, including matrix decomposition, have been explored to expedite and approximate the inversion of the Hessian matrix, with the aim of making influence functions applicable to deep models. In this paper, we revisit a specific, albeit naive, yet effective approximation method known as TracIn. This method substitutes the inverse of the Hessian matrix with an identity matrix. We provide deeper insights into why this simple approximation method performs well. Furthermore, we extend its applications beyond measuring model utility to include considerations of fairness and robustness. Finally, we enhance TracIn through an ensemble strategy. To validate its effectiveness, we conduct experiments on synthetic data and extensive evaluations on noisy label detection, sample selection for large language model fine-tuning, and defense against adversarial attacks.
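
Replacing the inverse Hessian with the identity reduces an influence score to an inner product of two loss gradients. The sketch below illustrates this for a logistic model; the model, the example points, and the sign convention (positive means a gradient step on the training point lowers the test loss to first order) are assumptions for illustration, and the paper's exact definition may differ.

```python
import math


def logistic_grad(w, x, y):
    """Gradient in w of the logistic loss log(1 + exp(-y * w.x)), with label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * xi for xi in x]


def identity_hessian_influence(w, train_point, test_point):
    """Inner product of training and test gradients (inverse Hessian replaced by I)."""
    g_train = logistic_grad(w, *train_point)
    g_test = logistic_grad(w, *test_point)
    return sum(a * b for a, b in zip(g_train, g_test))


w = [0.5, -0.25]                      # hypothetical current parameters
test_point = ([1.0, 2.0], +1)
helpful = ([1.0, 1.5], +1)            # similar features, same label as the test point
harmful = ([1.0, 1.5], -1)            # similar features, opposite label

score_helpful = identity_hessian_influence(w, helpful, test_point)
score_harmful = identity_hessian_influence(w, harmful, test_point)
```

Under this convention a mislabeled training point near a test point receives a negative score, which is the kind of signal used for noisy label detection.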

2402.11394 2026-03-27 math.PR econ.EM math.ST stat.TH

Maximal Inequalities for Empirical Processes under General Mixing Conditions

Demian Pouzo

Abstract

This paper provides a bound for the supremum of sample averages over a class of functions for a general class of mixing stochastic processes with arbitrary mixing rates. Regardless of the speed of mixing, the bound is comprised of a concentration rate and a novel measure of complexity. The speed of mixing, however, affects the former quantity implying a phase transition. Fast mixing leads to the standard root-n concentration rate, while slow mixing leads to a slower concentration rate whose speed depends on the mixing structure. Our findings are applied to obtain new Glivenko-Cantelli type results.

2311.10270 2026-03-27 cs.LG cs.NA cs.SI eess.SP math.NA stat.ML

Multiscale Hodge Scattering Networks for Data Analysis

Naoki Saito, Stefan C. Schonsheck, Eugene Shvarts

Comments 20 Pages, Comments Welcome

Journal ref
ACHA.84.2026
Abstract

We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction builds on multiscale basis dictionaries on simplicial complexes -- namely, the $\kappa$-GHWT and $\kappa$-HGLET -- which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar--Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). Both the $\kappa$-GHWT and the $\kappa$-HGLET form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs adopt a layered structure analogous to a convolutional neural network (CNN), cascading the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation -- akin to local pooling in CNNs -- that can be performed either locally or per scale. Such pooling operations are more difficult to define in traditional scattering networks based on Morlet wavelets and in geometric scattering networks based on Diffusion Wavelets. As a result, our approach extracts a rich set of descriptive yet robust features that can be combined with simple machine learning models (e.g., logistic regression or support vector machines) to achieve high-accuracy classification with far fewer trainable parameters than most modern graph neural networks require. Finally, we demonstrate the effectiveness of MHSNs on three distinct problem types: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.

2309.03142 2026-03-27 math.AT math.ST stat.TH

Euler Characteristics and Homotopy Types of Definable Sublevel Sets, with Applications to Topological Data Analysis

Mattie Ji, Kun Meng

Comments 22 pages, 2 figures, Accepted by Homology, Homotopy and Applications

Abstract

Given a definable function $f: S \to \mathbb{R}$ on a definable set $S$, we study sublevel sets of the form $S^f_t \coloneqq \{x \in S: f(x) \leq t\}$ for all $t \in \mathbb{R}$. Using o-minimal structures, we prove that the Euler characteristic of $S^f_t$ is right-continuous with respect to $t$. Furthermore, when $S$ is compact, we show that $S^f_{t+\delta}$ deformation retracts to $S^f_t$ for all sufficiently small $\delta > 0$. Applying these results, we also characterize the connections between the following concepts in topological data analysis: the Euler characteristic transform (ECT), smooth ECT, Euler-Radon transform (ERT), and smooth ERT.
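
A discrete stand-in can illustrate the right-continuity statement. The sketch below assumes a finite simplicial complex with a function given on its vertices and takes, for each threshold $t$, the subcomplex of simplices whose vertices all satisfy $f \leq t$. This setting and the function name are assumptions for illustration; the paper works with general definable sets.

```python
def euler_characteristic_curve(simplices, vertex_values, thresholds):
    """Euler characteristic of the sublevel complex at each threshold t.

    A simplex belongs to the sublevel complex when all of its vertices have f <= t.
    """
    curve = []
    for t in thresholds:
        chi = 0
        for simplex in simplices:
            if max(vertex_values[v] for v in simplex) <= t:
                # A simplex with m vertices has dimension m - 1 and contributes (-1)^(m - 1).
                chi += (-1) ** (len(simplex) - 1)
        curve.append(chi)
    return curve


# Boundary of a triangle (a topological circle) with one function value per vertex.
vertex_values = {"a": 0.0, "b": 1.0, "c": 2.0}
simplices = [("a",), ("b",), ("c",), ("a", "b"), ("b", "c"), ("a", "c")]
thresholds = [-1.0, 0.0, 0.5, 1.0, 2.0]
curve = euler_characteristic_curve(simplices, vertex_values, thresholds)
```

Because membership is decided with `<=`, each jump of the curve is attained exactly at the critical value: the empty set becomes a point at $t = 0$ and the circle closes at $t = 2$, so the step function is continuous from the right.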