arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.06668 2026-05-08 math.AG math.GN math.SG

Rational homology disk degenerations of elliptic surfaces

Marcos Canedo, Giancarlo Urzúa

In this paper, a $\mathbb{Q}$HD singularity is a weighted homogeneous normal surface singularity admitting a rational homology disk ($\mathbb{Q}$HD) smoothing. These singularities are rational but often not log canonical. We classify all $\mathbb{Q}$HD degenerations of nonsingular projective elliptic surfaces, extending Kawamata's classification of the case with only Wahl singularities (i.e., log terminal $\mathbb{Q}$HD singularities). We also realize all $\mathbb{Q}$HD degenerations of Dolgachev surfaces $D_{a,b}$ with one $\mathbb{Q}$HD singularity, for every pair of integers $a,b$. For each such degeneration, we construct a minimal semi log canonical (slc) birational model via a Seifert partial resolution in the sense of Wahl followed by semistable flips. Finally, we prove that these minimal slc models are unobstructed and deform to the recent degenerations of Dolgachev surfaces constructed by D. Lee and Y. Lee.

2605.06656 2026-05-08 cs.LG cs.DM cs.ET math.OC

Why Global LLM Leaderboards Are Misleading: Small Portfolios for Heterogeneous Supervised ML

Jai Moondra, Ayela Chughtai, Bhargavi Lanka, Swati Gupta

Ranking LLMs via pairwise human feedback underpins current leaderboards for open-ended tasks, such as creative writing and problem-solving. We analyze ~89K comparisons in 116 languages among 52 LLMs from Arena, and show that the best-fit global Bradley-Terry (BT) ranking is misleading. Nearly 2/3 of the decisive votes cancel out, and even the top 50 models according to the global BT ranking are statistically indistinguishable (pairwise win probabilities are at most 0.53 within the top 50 models). We trace this failure to strong, structured heterogeneity of opinions across language, task, and time. Moreover, we find that *language* plays a key role. Grouping by language (and language families) massively increases the agreement of votes, resulting in two orders of magnitude higher spread in the Elo scores (i.e., very consistent rankings). What appears as global noise is in fact a mixture of coherent but conflicting subpopulations. To address such heterogeneity in supervised machine learning, we introduce the framework of $(λ, ν)$-portfolios, which are small sets of models that achieve a prediction error of at most $λ$, "covering" at least a $ν$ fraction of users. We formulate this as a variant of the set cover problem and provide guarantees using the VC dimension of the underlying set system. On the Arena data, our algorithms recover just 5 distinct BT rankings that cover over 96% of votes at a modest $λ$, compared to the 21% coverage by the global ranking. We also provide a portfolio of 6 LLMs that cover twice as many votes as the top-6 LLMs from a global ranking. We further construct portfolios for a classification problem on the COMPAS dataset using an ensemble of fairness-regularized classification models and show that these portfolios can be used to detect blind spots in the data, which might be of independent interest to policymakers.
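
As a rough illustration of the Bradley-Terry (BT) model named in the abstract, the sketch below fits BT strengths to a small table of pairwise wins with the standard minorization-maximization update. The toy data, the function names `fit_bradley_terry` and `win_probability`, and the iteration count are assumptions made for illustration; none of them comes from the paper or from Arena.

```python
def fit_bradley_terry(wins, n_items, iters=200):
    """Fit Bradley-Terry strengths with the classic MM update.

    wins[(i, j)] is the number of times item i beat item j (toy data).
    Under BT, P(i beats j) = p[i] / (p[i] + p[j]).
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            total_wins = sum(wins.get((i, j), 0) for j in range(n_items) if j != i)
            denom = 0.0
            for j in range(n_items):
                if j == i:
                    continue
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p.append(total_wins / denom if denom > 0 else p[i])
        scale = n_items / sum(new_p)
        p = [x * scale for x in new_p]  # normalize so strengths average to 1
    return p


def win_probability(p, i, j):
    """Model probability that item i beats item j."""
    return p[i] / (p[i] + p[j])


# Hypothetical votes among three models: each pair was compared 10 times.
wins = {(0, 1): 6, (1, 0): 4, (1, 2): 7, (2, 1): 3, (0, 2): 8, (2, 0): 2}
strengths = fit_bradley_terry(wins, 3)
```

A fitted win probability close to 0.5 between two items is the kind of near-tie that the abstract describes as statistically indistinguishable.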

2605.06654 2026-05-08 cs.LG cs.AI math.OC

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Yuxing Liu, Jianyu Wang, Tong Zhang

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while achieving the same or better performance on the new task, than other optimizers and, possibly surprisingly, LoRA, during the supervised finetuning (SFT) stage. We term this phenomenon optimizer-model consistency. To better understand it, through controlled experiments and theoretical analysis, we show that: 1) optimizers can shape the models by having regularization effects on the activations, leading to different landscapes around the pretrained checkpoints; 2) in response to this regularization effect, the weight update in SFT should follow some specific structures to lower forgetting of the knowledge learned in pretraining, which can be obtained by using the same optimizer. Moreover, we specifically compare Muon and AdamW when they are employed throughout the pretraining and SFT stages and find that Muon performs worse when finetuned for reasoning tasks. With a synthetic language modeling experiment, we demonstrate that this can come from Muon's strong tendency towards rote memorization, which may hurt pattern acquisition with a small amount of data, as for SFT.

2605.06626 2026-05-08 math.DS math-ph math.MP

Integrable perturbations of polynomial Hamiltonian systems

Dmitry Treschev

Comments 7 pages

We consider a Hamiltonian system on the symplectic space $({\mathbb{R}}^{2n}, dy\wedge dx)$ with a real-analytic Hamiltonian $H : {\mathbb{R}}^{2n}\to {\mathbb{R}}$. We assume that the system has a non-degenerate equilibrium position at the origin. Under some nonresonance assumptions we prove the following. For any positive integer $M$ there exists a real-analytic function $F:{\mathbb{R}}^{2n}\to{\mathbb{R}}$ such that (1) $F = O\big( (|x|+|y|)^{M+1} \big)$ at the origin, (2) the system with Hamiltonian $H+F$ is completely integrable in ${\mathbb{R}}^{2n}$.

2605.06622 2026-05-08 math.FA

On the plasticity of the unit spheres of $\ell_1$, $\ell_{\infty}$, $c$, and Hilbert spaces

Maksym Levchenko, Olesia Zavarzina

This paper demonstrates the expand-contract plasticity of the unit spheres of $\ell_1$, $\ell_{\infty}$, and $c$. Furthermore, it establishes the strong plasticity of the unit spheres of Hilbert spaces.

2605.06621 2026-05-08 math.CO math.MG

Point sets avoiding near-integer distances

Ritesh Goenka, Kenneth Moore

Comments 15 pages, 1 figure

Let $d \in \mathbb{N}$, $δ\in (0, 1/2)$, and $X > 0$. Denote by $N_d(X, δ)$ the maximum number of points in a subset of the closed Euclidean ball of radius $X$ in $\mathbb{R}^d$ such that every pairwise distance is at least $δ$ away from any integer. In the planar case, Sárközy proved that for every $\varepsilon > 0$, $N_2(X, δ) = Ω_δ(X^{1/2-\varepsilon})$ as $X \rightarrow \infty$ whenever $δ$ is sufficiently small in terms of $\varepsilon$, while Konyagin proved the almost matching upper bound $N_2(X,δ) = O_δ(X^{1/2})$. We study this problem in higher dimensions, addressing a question of Erdős and Sárközy. Extending Sárközy's construction, we show that for every $\varepsilon > 0$, $N_3(X, δ) = Ω_δ(X^{1-\varepsilon})$ for $δ$ sufficiently small in terms of $\varepsilon$. We also provide a lifting lemma from integer distance sets to sets avoiding near-integer distances via bilipschitz embeddings of snowflaked Euclidean spaces. This allows us to prove a linear lower bound $N_4(X,δ) = Ω_δ(X)$ for all sufficiently small $δ$. Finally, adapting Konyagin's approach, we prove the upper bound $N_d(X, δ) = O_{d, δ}(X^{d/2})$ for all $d \in \mathbb{N}$.

2605.06618 2026-05-08 math.OC

MTRBO: Multiple trust-region based Bayesian optimization

Sourav Das, Debjani Chakraborty, Pabitra Mitra

Bayesian Optimization (BO) is a popular framework for optimizing black-box functions. Despite its effectiveness, BO is often inefficient for high-dimensional problems due to the exponential growth of the search space, heterogeneity of the objective function, and a low sampling budget. To overcome these issues, this work proposes a multiple trust region-based Bayesian optimization technique (MTRBO). A trust region is a localized region within which an optimization model is trusted to approximate the objective function accurately. Assuming a Gaussian process (GP) as a prior belief about the objective function and based on the posterior mean and variance functions, the method adaptively exploits near the promising current solution inside a trust region. It also explores the most uncertain region of the search space inside another trust region. The theoretical global convergence property of the proposed method is established. The method is then benchmarked against other state-of-the-art trust-region-based Bayesian optimization algorithms, demonstrating superior performance on a variety of non-convex and high-dimensional test functions. The proposed method outperforms others in terms of solution quality within the sampling budget (the number of function evaluations). The proposed method is applied to the portfolio optimization problem to verify its applicability in real-world scenarios.

2605.06617 2026-05-08 math.AC

Connectedness in Codimension One and the Non-$S_2$ Locus

Likun Xie

Comments 27 pages, comments welcome

We formulate a structural principle for finite $S_2$-objects: coherent $S_2$-sheaves and finitely generated graded $S_2$-modules decompose canonically according to the connected components in codimension $1$ of their support. This gives criteria relating indecomposability of $S_2$-objects to connectedness in codimension $1$ of their supports, and extends the Hochster--Huneke correspondences for complete local rings between connectedness in codimension $1$, indecomposability of canonical modules, and localness of the $S_2$-ifications. As a consequence, if $A$ is a local ring admitting a canonical module $ω_A$, there are canonical decompositions of both $ω_A$ and the $S_2$-ification $\operatorname{End}_A(ω_A)$ whose indecomposable summands are the canonical modules and $S_2$-ifications of the quotient rings associated to the connected components in codimension $1$. We then apply this viewpoint to the non-$S_2$ locus. For $A$ equidimensional and unmixed, this locus is naturally realized as $\operatorname{Supp}_A C$ via the $S_2$-ification sequence $0 \to A \to \operatorname{End}_A(ω_A) \to C \to 0$. The natural map between deficiency modules $K^{\dim C+1}(A)\to K^{\dim C}(C)$ identifies the canonical module $K^{\dim C}(C)$ with the $S_2$-hull of $K^{\dim C+1}(A)$. Under suitable conditions, this allows codimension-$1$ connectedness of the non-$S_2$ locus to be detected by the deficiency module $K^{\dim C+1}(A)$. We illustrate the theory with examples and apply it to codimension $2$ lattice ideals, obtaining connectedness-in-codimension-$1$ results for the non-$S_2$ loci of certain toric and lattice rings.

2605.06615 2026-05-08 cs.LG cs.AI cs.CL math.OC

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds

Hongyi Tao, Dingzhi Yu, Lijun Zhang

Comments Code is available at https://github.com/Dingzhen230/SignSGD_Outperforms_SGD

Sign-based optimization algorithms, such as SignSGD and Muon, have garnered significant attention for their remarkable performance in training large foundation models. Despite this empirical success, we still lack a theoretical understanding of when and why these sign-based methods outperform vanilla SGD. The core obstacle is that under standard smoothness and finite variance conditions, SGD is known to be minimax optimal for finding stationary points measured by $\ell_2$-norms, thereby fundamentally precluding any complexity gains for sign-based methods in standard settings. To overcome this barrier, we analyze sign-based optimizers leveraging $\ell_1$-norm stationarity, $\ell_\infty$-smoothness, and a separable noise model, which can better capture the coordinate-wise nature of signed updates. Under this distinct problem geometry, we derive matched upper and lower bounds for SignSGD and explicitly characterize the problem class in which SignSGD provably dominates SGD. Specifically, we compare the \emph{upper bound of SignSGD} with the \emph{lower bound of SGD}, illustrating that SignSGD effectively reduces the complexity by a factor of $d$ under \emph{sparse noise}, where $d$ is the problem dimension. Furthermore, we elevate this framework to the matrix domain, providing an equivalent optimal lower bound for the Muon optimizer, proving that extending the sign operator to matrices preserves this optimal scaling with dimensionality. Finally, we bridge our theoretical bounds to practice, demonstrating that the theoretical superiority of SignSGD accurately predicts its faster convergence during the pretraining of a 124M parameter GPT-2 model.
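
To make the coordinate-wise nature of signed updates concrete, the sketch below contrasts one plain SGD step with one SignSGD step on a toy, badly scaled quadratic. The objective, step size, and helper names are assumptions chosen for illustration; the code does not reproduce the paper's noise model or bounds.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sgd_step(w, grad, lr):
    """Plain gradient step: each coordinate moves in proportion to its gradient."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def signsgd_step(w, grad, lr):
    """SignSGD step: each coordinate moves by exactly lr, using only the gradient sign."""
    return [wi - lr * sign(gi) for wi, gi in zip(w, grad)]


def grad_quadratic(w, scales):
    """Gradient of the toy objective f(w) = 0.5 * sum(scales[i] * w[i] ** 2)."""
    return [s * wi for s, wi in zip(scales, w)]


def l1_norm(v):
    """The l1 norm, the stationarity measure mentioned in the abstract."""
    return sum(abs(x) for x in v)


scales = [100.0, 1.0, 0.01]   # hypothetical, badly conditioned curvature
w0 = [1.0, 1.0, 1.0]
g = grad_quadratic(w0, scales)
w_sgd = sgd_step(w0, g, lr=0.005)
w_sign = signsgd_step(w0, g, lr=0.005)
```

In this toy case the SGD step is dominated by the steepest coordinate, while the signed step moves every coordinate by the same amount.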

2605.06589 2026-05-08 math.AP math.OC

Master equations with an individual noise on finite state graphs

Wilfrid Gangbo, Sebastian Munoz, Jeremy Wu, Zhaoyu Zhang

We develop a classical well-posedness and regularity theory on a finite connected weighted graph for an extended mean field game system, its associated master equation, and a Hamilton-Jacobi-Bellman equation on the probability simplex, all in the presence of an individual noise operator. The geometric structure is inherited from the logarithmic-mean activation functional of discrete optimal transport, under which the entropic Fokker-Planck equation appears as a gradient flow on the graph and the individual noise operator is a bilinear form in the probability vector and the Wasserstein gradient. A central technical step is a quantitative preservation-of-positivity estimate for the discrete continuity equation, which rules out finite-time boundary degeneracy and yields a classical solution theory for the master equation on the open simplex without imposing any boundary condition. As an application, we recover a Nash equilibrium interpretation of the discrete system in terms of Markov chains on the graph. Our setup is inspired by the computational algorithms for optimal mass transport of [10, 11] and provides a rigorous well-posedness theory for several of the equations derived in [25].

2605.06585 2026-05-08 cs.LG math.OC

Distributionally-Robust Learning to Optimize

Vinit Ranjan, Jisun Park, Bartolomeo Stellato

We propose a distributionally robust approach to learning hyperparameters for first-order methods in convex optimization. Given a dataset of problem instances, we minimize a Wasserstein distributionally robust version of the performance estimation problem (PEP) over algorithm parameters such as step sizes. Our framework unifies two extremes: as the robustness radius vanishes, we recover classical learning to optimize (L2O); as it grows, we recover worst-case optimal algorithm design via PEP. We solve the resulting problem with stochastic gradient descent, differentiating through the solution of an inner semidefinite program at each step. We prove high-probability bounds showing that the true risk of the learned algorithm is at most the in-sample L2O optimum plus a slack that shrinks with the sample size, and is no worse than the worst-case PEP bound. On unconstrained quadratic minimization, LASSO, and linear programming benchmarks, our learned algorithms achieve strong out-of-sample performance with certifiable robustness, outperforming both worst-case optimal and vanilla L2O baselines.

2605.06573 2026-05-08 math.FA

Common frequently hypercyclic random vectors

Augustin Mouze, Vincent Munnier

We study common frequently hypercyclic vectors for countable families of weighted backward shifts acting on $\ell_p$ spaces, $1\leq p<\infty$. Using probabilistic techniques, we develop a general existence criterion, complemented by a non-existence result. These insights are then applied to the specific setting of countable families of polynomials of weighted backward shifts, providing conditions under which they share a common frequently hypercyclic vector.

2605.06572 2026-05-08 cs.CV cs.NA math.NA

Solving Minimal Problems Without Matrix Inversion Using FFT-Based Interpolation

Haidong Wu, Snehal Bhayani, Janne Heikkilä

Comments Accepted to CVPR 2026

Estimating camera geometry typically involves solving minimal problems formulated as systems of multivariate polynomial equations, which often pose computational challenges when using existing Gröbner-basis or resultant-based methods due to matrix inversion needed in the online solver. Here we propose a sampling-based, matrix inversion-free method that constructs the solvers using sparse hidden-variable resultants. The determinant polynomial in the hidden variable is efficiently reconstructed via inverse fast Fourier transform interpolation from sampled evaluations, avoiding symbolic expansion. Solving this polynomial yields the hidden variable, and the remaining unknowns are recovered by identifying rank-1 deficient submatrices and applying Cramer's rule. A greatest common divisor-based criterion ensures robust submatrix identification under noise. Experiments on diverse minimal problems demonstrate that the proposed solver achieves strong numerical stability and competitive runtime, particularly for small-scale problems, providing a practical alternative to traditional Gröbner-basis and resultant-based solvers.
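
The interpolation idea in the abstract can be sketched on a toy case: evaluate the determinant of a polynomial matrix at roots of unity, then recover the coefficients of the determinant polynomial with an inverse FFT, with no symbolic expansion. The 2x2 matrix, the sample count, and all function names below are hypothetical; this is a minimal sketch of the general technique, not the authors' solver.

```python
import cmath


def _fft(values, sign):
    """Recursive radix-2 transform; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = _fft(values[0::2], sign)
    odd = _fft(values[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def inverse_fft(values):
    """Inverse DFT: x[j] = (1/n) * sum_k values[k] * exp(+2*pi*i*j*k/n)."""
    n = len(values)
    return [v / n for v in _fft(values, +1)]


def matrix_at(z):
    """Hypothetical 2x2 matrix whose entries are polynomials in the hidden variable z."""
    return [[z + 1, 2], [z, z * z - 3]]


def det2(m):
    """Determinant of a 2x2 matrix, evaluated numerically."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det_coefficients(n_samples):
    """Recover coefficients of det(M(z)) from samples at z_k = exp(-2*pi*i*k/n)."""
    samples = [
        det2(matrix_at(cmath.exp(-2j * cmath.pi * k / n_samples)))
        for k in range(n_samples)
    ]
    # The samples are the forward DFT of the coefficient vector,
    # so the inverse FFT returns the coefficients (lowest degree first).
    return inverse_fft(samples)


coeffs = det_coefficients(8)
```

Here the determinant is z^3 + z^2 - 5z - 3, so the recovered coefficients are -3, -5, 1, 1 followed by zeros, up to rounding. The sample count must exceed the degree of the determinant polynomial.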

2605.06570 2026-05-08 cs.LG math.OC q-fin.CP q-fin.MF q-fin.RM

SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation

Dmitri Goloubentsev, Natalija Karpichina

Comments 27 pages, 8 tables. Three domains: natural gas storage, pension fund ALM, pharmaceutical manufacturing. Benchmark code and trained policies available on request

Many real-world problems require sequential decisions under uncertainty: when to inject or withdraw gas from storage, how to rebalance a pension portfolio each month, what temperature profile to run through a pharmaceutical reactor chain. Dynamic programming solves small instances exactly but scales exponentially in state dimensions. Black-box reinforcement learning handles high-dimensional states but trains slowly and produces no sensitivities. We introduce SNAPO (Smooth Neural Adjoint Policy Optimization), a framework that embeds a neural policy inside a known, differentiable simulator, replaces hard constraints with smooth approximations, and computes exact gradients of the objective with respect to all policy parameters and all inputs in a single adjoint pass. We demonstrate SNAPO on three domains: natural gas storage (training in under a minute, 365 forward curve sensitivities at no additional cost per sensitivity), pension fund asset-liability management (6.5x-200x sensitivity speedup over bump-and-revalue, scaling with the number of risk factors), and pharmaceutical manufacturing (cross-unit sensitivities through a 4-unit process chain, with 20 ICH Q8 regulatory sensitivities from 5 adjoint passes in 74.5 milliseconds). All sensitivities are produced by the same backward pass that trains the policy, at a cost proportional to one reverse pass regardless of how many sensitivities are computed.

2605.06569 2026-05-08 math.AP math-ph math.DS math.MP math.NT math.SP

Equidistribution of Eigenfunctions of Quantum Cat Maps

Robert Koirala

Comments 16 pages, 4 figures, comments welcome

We prove that the short-period eigenfunctions of quantum cat maps constructed by Kim and the author equidistribute on $\mathbb{T}^2$ in the sense of semiclassical measures. We also show that their logarithmically large $\ell^\infty$-norm is asymptotically concentrated on a bounded number of coordinates. Thus, for this explicit family, strong coordinate localization coexists with semiclassical equidistribution. These results confirm the behavior suggested by earlier numerical evidence of Kim and the author, and contrast with the scarring phenomena for short-period eigenfunctions observed by Faure, Nonnenmacher, and De Bièvre.

2605.06565 2026-05-08 math.GT

Minimal Homotopies in Three Dimensions: A Cable System Approach

Lia Buchbinder, Bala Krishnamoorthy, Kevin R. Vixie

Comments 23 pages, 5 figures

We study null homotopies of immersed spheres in $\mathbb{R}^3$ and the volume they sweep during contraction. For a smooth immersion with finitely many transverse self-intersections, we introduce a cable system that connects each bounded region of the complement to the exterior. From this construction we define the cable index and prove that it agrees with the Brouwer degree on each complementary region. Using this identification, we derive a degree-weighted lower bound for the swept volume of any Lipschitz null homotopy. We show that the bound is attained whenever the homotopy is sense-preserving, meaning the surface moves in a consistent direction, and the index evolves monotonically along the homotopy. In addition, in the case where the immersion arises as the boundary of an immersed ball, we construct an explicit homotopy that realizes this lower bound via a deformation of the ball. Finally, we present a linear-time algorithm that computes all cable indices from a finite cable system, providing a concrete and computable method for evaluating the lower bound.

2605.06556 2026-05-08 math.PR

Probability of Quota Violations in Divisor Apportionment Methods with Nonzero Allocations

Tyler C. Wunder, Joseph Cutrone

Apportionment assigns indivisible items among groups. By the Balinski-Young theorem, no method can satisfy both house monotonicity and the quota rule. This paper investigates quota violations caused by nonzero allocation constraints, and derives exact probability formulas for their frequency. Such violations occur in systems like the U.S. House of Representatives, where each state is guaranteed at least one seat. We analyze the three-state case, introduce the $τ$ statistic to parametrize population distributions, and prove an Asymptotic Quota Stabilization theorem: for fixed $τ$, quota behavior stabilizes as populations grow, yielding probability results for quota violations determined by the set of ultimately violatory $τ$ values. Applying this framework to the five classical divisor methods, we derive exact probability formulas. Additionally, we show that as the number of seats $M \to \infty$, these probabilities converge to method-specific constants. These results provide a precise, quantitative foundation for evaluating the fairness and frequency of quota violations in constrained apportionment systems.
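
A small sketch may help show how a one-seat minimum can force a quota violation. It implements the five classical divisor methods in their highest-averages form, starting every state at one seat, and then checks the quota rule. The populations, the function names, and the choice of the highest-averages formulation are assumptions for illustration; the paper's $τ$ statistic and probability formulas are not reproduced here.

```python
import math

# Divisor criteria d(s) for a state currently holding s >= 1 seats.
DIVISORS = {
    "adams": lambda s: s,
    "dean": lambda s: s * (s + 1) / (s + 0.5),
    "huntington_hill": lambda s: math.sqrt(s * (s + 1)),
    "webster": lambda s: s + 0.5,
    "jefferson": lambda s: s + 1,
}


def apportion(populations, seats, method="webster"):
    """Give every state one seat, then award remaining seats by highest priority."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    d = DIVISORS[method]
    alloc = [1] * len(populations)
    for _ in range(seats - len(populations)):
        best = max(range(len(populations)), key=lambda k: populations[k] / d(alloc[k]))
        alloc[best] += 1
    return alloc


def quota_violations(populations, alloc):
    """Indices of states whose seats fall outside [floor(quota), ceil(quota)]."""
    total_pop = sum(populations)
    seats = sum(alloc)
    bad = []
    for i, (p, a) in enumerate(zip(populations, alloc)):
        quota = p * seats / total_pop
        if a < math.floor(quota) or a > math.ceil(quota):
            bad.append(i)
    return bad


# Hypothetical three-state example with 10 seats.
populations = [9050, 850, 100]
alloc = apportion(populations, 10, method="jefferson")
violators = quota_violations(populations, alloc)
```

In this example the large state has a quota of 9.05 but receives only 8 seats, because the two small states each hold their guaranteed seat.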

2605.06549 2026-05-08 math.OC

Stochastic Non-Smooth Non-Convex Optimization with Decision-Dependent Distributions

Chengchang Liu, Zongqi Wan, Haishan Ye, John C. S. Lui

We study stochastic zeroth-order optimization with decision-dependent distributions, where the sampling law depends on the current decision and only noisy function values are available. For the non-smooth non-convex setting, we establish an explicit convergence guarantee for finding a $(δ,ε)$-Goldstein stationary point with stochastic zeroth-order oracle (SZO) complexity of $\mathcal{O}(d^2δ^{-3}ε^{-3})$. In addition, we show that the above complexity can be achieved with a single SZO feedback per iteration. We further extend the analysis to smooth and Hessian-Lipschitz objectives, obtaining complexities $\mathcal{O}(d^2ε^{-6})$ and $\mathcal{O}(d^2ε^{-9/2})$, respectively. In the Hessian-Lipschitz case, this improves the best-known dependence on $ε$ for decision-dependent zeroth-order methods by a factor of $ε^{-1/2}$.
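
For readers unfamiliar with zeroth-order oracles, the sketch below shows a generic two-point gradient estimator based on uniform smoothing over the sphere, which uses only function values. It is a textbook construction under assumed names and a toy linear objective. It is not the paper's algorithm, it uses two evaluations rather than the single feedback the abstract mentions, and it ignores the decision-dependent sampling law.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def two_point_estimate(f, x, delta, rng):
    """Estimate a gradient of the delta-smoothed f from two function values."""
    d = len(x)
    u = random_unit_vector(d, rng)
    f_plus = f([xi + delta * ui for xi, ui in zip(x, u)])
    f_minus = f([xi - delta * ui for xi, ui in zip(x, u)])
    scale = d * (f_plus - f_minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, delta, n_samples, seed=0):
    """Average many independent estimates to reduce variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = two_point_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


# Toy linear objective with known gradient a (hypothetical test function).
a = [1.0, 2.0, 3.0]
f = lambda x: sum(ai * xi for ai, xi in zip(a, x))
g_hat = averaged_estimate(f, [0.0, 0.0, 0.0], delta=0.1, n_samples=20000)
```

For a linear objective the averaged estimate should approach the true gradient `a` as the number of samples grows.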

2605.06543 2026-05-08 cond-mat.stat-mech cond-mat.soft math-ph math.MP

A Rayleigh criterion for mechanical instability: inducing activity by chemo-mechanical coupling

Aaron Beyen, Francesco Casini, Christian Maes

Comments 36 pages, 14 figures

Instabilities in thermodynamic systems are often undesirable, as they can lead to loss of control or even catastrophic behavior. Yet, the same mechanisms can also generate rich nonequilibrium behavior and may play a constructive role in living systems. We introduce a theoretical framework, inspired by Rayleigh's analysis of thermoacoustic instabilities, to study the emergence of mechanical activity. In particular, we derive Rayleigh-like criteria governing the onset of activity and the generation of rotational motion in a slow Newtonian probe coupled to driven chemical processes, described by Markov jump processes. These criteria are expressed in terms of the phase relation between entropic and frenetic contributions, providing a transparent condition for when chemical driving results in sustained rotational or active mechanical motion.

2605.06526 2026-05-08 physics.flu-dyn cs.NA math.NA

Reduced-Order Modeling of Parameterized Visco-Plastic Shallow Flows

Md Rezwan Bin Mizan, Ilya Timofeyev, Maxim Olshanskii

We propose a non-intrusive reduced-order modeling framework for parametrized visco-plastic free-surface flows governed by a shallow-water formulation of Herschel--Bulkley fluids. These flows exhibit strong nonlinearities, non-smooth rheology, moving fronts, and yield surfaces, making efficient surrogate modeling particularly challenging. To address this challenge, we employ a tensor-based approach in which the solution manifold is approximated using a low-rank representation obtained via higher-order singular value decomposition of snapshot data over a structured parameter space. The resulting tensorial reduced-order model (TROM) enables rapid online evaluation by directly reconstructing solution trajectories from the compressed representation, thereby avoiding the need to perform time integration of a reduced dynamical system. The proposed non-intrusive framework can be interpreted as an encoder--decoder architecture with a compressed latent representation and efficient multilinear decoding. Numerical experiments demonstrate that the proposed approach accurately captures key flow features, including front propagation, plug and shear regions, and near-stopping dynamics, while achieving substantial computational speedups relative to full-order simulations.

2605.06521 2026-05-08 math.ST math.OC stat.TH

Time-sensitive anytime-valid testing

Eugenio Clerico, Tobias Wegel, Iskander Azangulov, Patrick Rebeschini

Anytime-valid tests allow evidence to be checked during data collection: one can either continue testing or stop and reject the null while still controlling type-I error. Yet, in many applications rejection is useful only if it comes soon enough. We introduce a time-sensitive testing-by-betting framework that favours early rejection by assigning rewards to rejection times and maximising their expected value under a given alternative. This encompasses hard deadlines and softer time preferences. The resulting optimal control problem admits a Bellman representation in terms only of time and evidence against the null, rather than the full history. For hard deadlines, we show that the simple-vs-simple case reduces to a finite-horizon Neyman--Pearson problem and identify the corresponding optimal e-process. Furthermore, we show that exponentially decaying rewards admit a stationary approximation, yielding the exponential-decay-optimal (EDO) criterion: a finite-time-scale counterpart to the classical growth-rate-optimal (GRO) viewpoint in anytime-valid statistics, with the GRO criterion recovered in the large-time-scale limit.
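
The basic testing-by-betting mechanism can be sketched for a simple-vs-simple coin problem: a bettor's wealth is multiplied by the likelihood ratio at each step, and the null is rejected once wealth reaches 1/α, or never if a hard deadline passes first. The fixed likelihood-ratio bet, the fair-coin null, and the function name are assumptions for illustration. The sketch shows the classical baseline only, not the time-sensitive optimal strategy derived in the paper.

```python
def rejection_time(observations, q, alpha, deadline):
    """Bet against H0: P(x = 1) = 0.5 using the alternative P(x = 1) = q.

    Wealth starts at 1 and is multiplied by the likelihood ratio each round.
    Rejecting when wealth >= 1 / alpha controls type-I error at level alpha
    (Ville's inequality). Returns the 1-based rejection time, or None if the
    deadline passes without rejection.
    """
    wealth = 1.0
    for t, x in enumerate(observations[:deadline], start=1):
        likelihood_alt = q if x == 1 else 1.0 - q
        wealth *= likelihood_alt / 0.5
        if wealth >= 1.0 / alpha:
            return t
    return None


# Hypothetical data streams.
all_heads = [1] * 20
alternating = [1, 0] * 10

t_heads = rejection_time(all_heads, q=0.75, alpha=0.05, deadline=20)
t_tight = rejection_time(all_heads, q=0.75, alpha=0.05, deadline=5)
t_alt = rejection_time(alternating, q=0.75, alpha=0.05, deadline=20)
```

With all heads, wealth grows by a factor of 1.5 per round and first reaches 20 at round 8, so a deadline of 5 rounds yields no rejection under this particular bet. A deadline-aware strategy would bet differently, which is the question the abstract addresses.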

2605.06518 2026-05-08 math.DG math.PR

Absolute continuity of generalized Wasserstein barycenters of finitely many measures

Jianyu Ma

Consider a complete Riemannian manifold $(M, g)$ and optimal transport problems on it with cost functions of the form $c(x,y) = h(d_{g}(x,y))$. We study the absolute continuity of the corresponding generalized Wasserstein barycenters of finitely many marginal measures. For general strictly convex profiles $h$ lacking $\mathcal{C}^2$-smoothness, such as $h(d)= d^p / p$ with $1 < p < 2$ that defines the $p$-Wasserstein space, the singularity at $d=0$ prevents the barycenter from inheriting absolute continuity from a single marginal measure, as it does in the quadratic case. To overcome this singularity, recent Euclidean results require the absolute continuity of all marginals. Building upon the approximation framework toward absolute continuity in arXiv:2310.13832, we extend the Euclidean advancements to the manifold setting. Stripping away the implicit reliance on flat translational symmetry and local coordinate calculations of their Euclidean proofs, our work handles the singularity in a geometrically transparent way, revealing the precise analytic condition on the cost profile that governs the necessary assumptions.

2605.06516 2026-05-08 math.OC cs.AI

Learning to Cut: Reinforcement Learning for Benders Decomposition

Haochen Cai, Xian Yu

Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.

2605.06511 2026-05-08 math.PR cs.DM

Logarithmic Mixing of Random Walks on Dynamical Random Cluster Models

Andreas Galanis, Leslie Ann Goldberg, Xandru Mifsud

Comments 43 pages, 1 figure

We study random walks on dynamically evolving graphs, where the environment is given by a time-dependent subset of the edges of an underlying graph. Concretely, following the recently introduced framework of Lelli and Stauffer, we consider a random walk interacting with a dynamical random-cluster environment, in which edges are updated with rate $μ>0$ according to Glauber dynamics with parameters $p$ and $q$, and the walker moves at rate 1 but may only traverse edges that are present at the time of the move. This setting introduces strong dependencies between the walk and the environment, as edge-update probabilities depend on the global connectivity structure. We focus on the case where the underlying graph is a random $d$-regular graph and the parameters lie in the subcritical regime $p < p_{\mathrm{u}}(q, d)$ where it is known that the Glauber dynamics mixes quickly. Our main result is to show that for any $\varepsilon >0$ and all $q \ge 1$, for all $p$ in the subcritical regime, the mixing time of the joint process is $Θ(\log n)$ (in continuous time) whenever $μ\geq \varepsilon \log n$. This matches the mixing time of the simple random walk on a static random regular graph, showing that in this regime the evolving environment does not slow down mixing. Our proof is based on a coupling argument that uses path-count techniques to overcome the dependencies in the edge dynamics by controlling the structure of the environment along typical trajectories.

2605.06504 2026-05-08 math-ph math.MP

Eigenstates with Infinite Position Moments

Michal Jex

We prove necessary and sufficient conditions for Schrödinger operators to have zero-energy bound states at the threshold of the essential spectrum with bounded $k$-th moment. This result extends the results of D. Hundertmark, M. Jex, and M. Lange [Forum of Mathematics, Sigma 11 (2023)].

2605.06503 2026-05-08 math.AP

Sharp local well-posedness for the Hirota-Satsuma system

Rafael Deiga

Comments 35 pages, 3 figures, 3 tables

We establish sharp local existence results for the Hirota-Satsuma system in $H^k(\mathbb{R}) \times H^s(\mathbb{R})$, depending on the ratio between the dispersion of the components. These theorems significantly generalize previous works, which were restricted to the diagonal case of equal regularity $s=k$. Furthermore, we extend the known global well-posedness theory to the off-diagonal regime. The argument relies on the Fourier restriction norm method coupled with the concept of integrated-by-parts strong solution - a framework that generalizes the classical notion of strong solution.

2605.06495 2026-05-08 math.OC cs.SY eess.SY

Global self-optimizing control of batch processes

Chenchen Zhou, Hongxin Su, Xinhui Tang, Yi Cao, Shuang-hua Yang

Details
Journal ref
Journal of Process Control, Volume 135, March 2024, 103163
English abstract

This work aims to achieve near-optimal operation for a class of batch processes by employing self-optimizing control (SOC). Compared with a continuous process, a batch process exhibits stronger nonlinearity and dynamics because of its non-steady operating conditions. This necessitates a global version of SOC to achieve satisfactory performance. However, the existing global SOC (gSOC) is not directly applicable to batch processes because of the causality among variables, so the original gSOC must be extended to batch processes. Beyond the nonconvexity of the original gSOC problem, the extension faces additional challenges. In particular, the causality arising from the dynamics of batch processes imposes structural constraints on the controlled variables (CVs), making the CV selection problem even more difficult. To address these challenges, the gSOC problem is recast in a vectorized formulation, and it is proved that the structural constraints considered are linear in this formulation. Moreover, a novel shortcut method is proposed to efficiently find sub-optimal but more transparent solutions to this problem. The effectiveness of the new approach is validated through a case study of a fed-batch reactor, where CVs are constructed through a combination matrix with a repetitive structure, resulting in a simple SOC scheme. This simplicity facilitates the implementation of the SOC approach and enhances its practical applicability and robustness.

2605.06479 2026-05-08 stat.ML cs.LG math.ST stat.TH

Risk-Controlled Post-Processing of Decision Policies

Sunay Joshi, Tao Wang, Hamed Hassani, Edgar Dobriban

Details
English abstract

Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk. At the finite-sample level, given a fitted fallback policy and score, we develop a post-processing algorithm that uses calibration data to select a threshold. Leveraging tools from algorithmic stability and stochastic processes, we show that under regularity conditions, in the i.i.d. setting, the expected excess risk of the post-processed policy is $O(\log n/n)$. In the special case when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability. In this setting, we also give high-probability near-optimality guarantees on the post-processed policy. Experiments on a COVID-19 radiograph diagnosis task, an LLM routing problem, and a synthetic multiclass decision task show that targeted post-processing can meet or nearly meet risk budgets while preserving substantially more agreement with the baseline than score-blind random mixing.
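The threshold structure described above can be illustrated with a minimal plug-in sketch. This is not the paper's algorithm: it omits any finite-sample correction and simply scans candidate thresholds on calibration data. The function names `select_threshold` and `post_process`, and the convention that a larger score means a larger estimated benefit from switching to the fallback, are assumptions made for illustration.

```python
def select_threshold(scores, loss_base, loss_fallback, alpha):
    """Pick the largest threshold whose empirical calibration risk is at most alpha.

    scores        : per-example score (assumed: larger = more benefit from switching)
    loss_base     : per-example loss if the baseline action is kept
    loss_fallback : per-example loss if the fallback action is used
    alpha         : risk budget
    A larger threshold means fewer switches, hence more agreement with the baseline.
    Returns None if no candidate threshold meets the budget.
    """
    n = len(scores)
    # float("inf") corresponds to never switching; then try each score, largest first.
    candidates = [float("inf")] + sorted(set(scores), reverse=True)
    for tau in candidates:
        risk = sum(
            lf if s >= tau else lb
            for s, lb, lf in zip(scores, loss_base, loss_fallback)
        ) / n
        if risk <= alpha:
            return tau
    return None


def post_process(baseline_action, fallback_action, score, tau):
    """Follow the baseline unless the score reaches the selected threshold."""
    return fallback_action if score >= tau else baseline_action


# Hypothetical calibration data: the baseline errs on the three high-score examples.
scores = [0.9, 0.8, 0.1, 0.2, 0.7]
loss_base = [1, 1, 0, 0, 1]
loss_fallback = [0, 0, 0, 0, 0]
tau = select_threshold(scores, loss_base, loss_fallback, alpha=0.2)
```

Scanning from the largest threshold downward returns the first feasible candidate, which is the one that switches on the fewest calibration examples. The empirical risk need not be monotone in the threshold, so every candidate is evaluated explicitly.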

2605.06471 2026-05-08 math.CO math.PR

Leap generators for composition schemes

Éric Fusy, Carine Pivoteau

Comments 37 pages

Details
English abstract

Leap generators were introduced in [Duchon et al. '04] for exact-size random generation of structures in a class of the form $\mathcal{C}=\mathrm{Seq}(\mathcal{B})$ (sequence construction), in the supercritical case. We extend these generators to supercritical composition schemes $\mathcal{C}=\mathcal{A}\circ\mathcal{B}$. Compared to the sequence construction, the obtained exact-size random generator for $\mathcal{C}$ still has linear time complexity (under conditions on the sampling complexity in $\mathcal{A}$ and $\mathcal{B}$), but perfect uniformity of the distribution is lost in general. However, the distribution on $\mathcal{C}_n$, called the leap distribution, is asymptotically uniform, the total variation distance from the uniform distribution being $(c+o(1))n^{-1/2}$ for an explicit constant $c$. These generators are simple to implement and can be applied to several classes of walks and trees, in particular Pólya trees. Leap generators can also be given for certain critical composition schemes, those relating planar map families, where this time the total variation distance to the uniform distribution is $\sim c\,n^{-1/3}$ for an explicit constant $c$.
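For orientation, the schemes named above have a standard generating-function reading in analytic combinatorics. The symbols $A(z)$, $B(z)$, $C(z)$, $\rho_A$, $\rho_B$ and $\tau_B$ are notation assumed here and do not appear in the abstract, so the paper's own conventions may differ.

```latex
% Assumed notation: A(z), B(z), C(z) are the generating functions of the classes
% (with B(0) = 0), \rho_A and \rho_B are the radii of convergence of A and B,
% and \tau_B = B(\rho_B).
\begin{align*}
\mathcal{C}=\mathrm{Seq}(\mathcal{B}) &\;\Longrightarrow\; C(z)=\frac{1}{1-B(z)}, \\
\mathcal{C}=\mathcal{A}\circ\mathcal{B} &\;\Longrightarrow\; C(z)=A\bigl(B(z)\bigr), \\
\text{supercritical:}\quad & \tau_B > \rho_A
  \qquad (\text{for } \mathrm{Seq}\colon\ \rho_A = 1,\ \text{so } \tau_B > 1).
\end{align*}
```

In the supercritical case the dominant singularity of $C(z)$ is the point $\rho<\rho_B$ where $B(\rho)=\rho_A$, so it comes from the outer function $A$ rather than from $B$.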

2605.06469 2026-05-08 math.OC cs.LG cs.SY eess.SY

Dynamic Controlled Variables Based Dynamic Self-Optimizing Control

Chenchen Zhou, Shaoqi Wang, Hongxin Su, Xinhui Tang, Yi Cao, Shuang-Hua Yang

Details
Journal ref
Journal of Process Control, 2024, 138: 103228
English abstract

Self-optimizing control is a strategy for selecting controlled variables, where the economic objective guides the selection and design of controlled variables, with the expectation that maintaining the controlled variables at constant values can achieve optimization effects, translating the process optimization problem into a process control problem. Currently, self-optimizing control is widely applied to steady-state optimization problems. However, the development of process systems exhibits a trend towards refinement, highlighting the importance of optimizing dynamic processes such as batch processes and grade transitions. This paper formally introduces the self-optimizing control problem for dynamic optimization, termed the dynamic self-optimizing control problem, extending the original definition of self-optimizing control. A novel concept, "dynamic controlled variables" (DCVs), is proposed, and an implicit control policy is presented based on this concept. The paper theoretically analyzes the advantages and generality of DCVs compared to explicit control strategies and elucidates the relationship between DCVs and traditional controllers. Moreover, this paper puts forth a data-driven approach to designing self-optimizing DCVs, which considers DCV design as a mapping identification problem and employs deep neural networks to parameterize the variables. Three case studies validate the efficacy and superiority of DCVs in approximating multi-valued and discontinuous functions, as well as their application to dynamic optimization problems with non-fixed horizons, which traditional self-optimizing control methods are unable to address.