arXivDaily: arXiv daily academic digest, updated Monday through Friday
stat.ML Machine Learning (55)
2606.12260 2026-06-11 econ.TH cs.AI cs.GT cs.LG stat.ML new submission

Market Design for AI: Beyond the Copyright Binary

Yan Dai, Maryam Farboodi, Negin Golrezaei, Sepehr Shahshahani

AI summary: Using static and dynamic game models, this paper analyzes why both the "free-for-all" and "strong intellectual property" regimes fail in the market for AI training data, and proposes a market design in which a data intermediary internalizes externalities and subsidizes innovative contributions.

Abstract

How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency.

2606.12058 2026-06-11 stat.ML cond-mat.dis-nn cs.LG new submission

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

Itay Lavie, Kirsten Fischer, Andrey Lekov, Frederic Van Maele, Zohar Ringel, Moritz Helias

AI summary: By analyzing a single-layer softmax attention network trained on a copy task, the paper develops a Bayesian theory showing a phase transition in the posterior over the attention matrix; in contrast to linear attention, softmax attention exhibits a first-order phase transition.

Abstract

Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a \emph{first-order phase transition} while in linear attention an initial \emph{second-order phase transition} is followed by a smooth, continuous evolution toward the structured attention pattern (\emph{crossover}). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models.

2606.12047 2026-06-11 cs.CV cs.AI stat.ML new submission

Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding

Tarandeep Singh, Soumyanetra Pal, Soham Biswas, Nishanth Chandran

Affiliations: Netradyne

AI summary: Proposes a three-stage pipeline that uses vision-language similarity, metadata-driven multi-prompt reasoning, and open-vocabulary detection to achieve temporal localization, semantic classification, and spatial grounding of accidents in zero-shot video, with a substantial performance gain.

Comments
Accepted at the AUTOPILOT Workshop, CVPR 2026 (non-archival). Workshop Paper ID 15

Abstract

In this paper, we address the problem of zero-shot understanding of accidents from surveillance videos by identifying when an impact event occurs, what type of impact it is, and where in the frame it occurs using natural language. We propose a three-stage pipeline that decomposes the accident understanding into when, what, and where. The first stage extracts a short temporal window around the impact using vision-language similarity. In the second stage, we perform metadata-driven multi-prompt reasoning with five complementary views (baseline, motion, geometry, contrast, and tiebreaker) and resolve disagreement via an entropy-gated pairwise adjudicator. Finally, we localize the impact with an open-vocabulary detector queried on the predicted accident type and scene layout, and aggregate detections across keyframes using a score-weighted centroid. Our pipeline achieves a substantial improvement in the harmonic-mean score over a centre-of-frame baseline on the zero-shot ACCIDENT @ CVPR benchmark. We show that decomposing zero-shot video understanding into temporal localization, semantic classification, and spatial grounding enables more reliable reasoning with vision-language models than direct prompting alone.

2606.11988 2026-06-11 cs.LG stat.ML new submission

What Uncertainties Do We Need for Dynamical Systems?

Yusuf Sale, Christopher Bülte, Felix Czaja, Joshua Stiller, Eyke Hüllermeier

Affiliations: Institute of Computer Science, LMU Munich; Munich Center for Machine Learning (MCML); Department of Mathematics, LMU Munich; German Research Center for Artificial Intelligence (DFKI, DSA)

AI summary: Examines uncertainty in dynamical systems from a machine learning perspective, distinguishing aleatoric from epistemic uncertainty and discussing how the goals of representing and quantifying uncertainty differ across tasks.

Comments
EIML@ICML

Abstract

The distinction between aleatoric and epistemic uncertainty has received considerable attention in machine learning research, mainly in the context of supervised learning but also in other settings such as generative modeling. In this paper, we offer a machine learning perspective on uncertainty modeling for dynamical systems, which has been studied much less so far. In particular, we ask: what uncertainties do we need for dynamical systems? We discuss sources of uncertainty, clarify their nature (aleatoric or epistemic), and consider how the objectives of representing and quantifying uncertainty vary across different tasks.

2606.11968 2026-06-11 cs.LG stat.ML new submission

Efficient Multinomial Logistic Bandit via Frequent Directions

Linzhe He, Yu-Jie Zhang, Sifan Yang, Lijun Zhang

Affiliations: State Key Laboratory of Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University; Paul G. Allen School of Computer Science & Engineering, University of Washington

AI summary: To address the high-dimensional computational bottleneck of multinomial logistic bandits, proposes EOFD-MLogB, which integrates frequent directions matrix sketching, reducing the per-round cost to O(Kd(m+K)^2) time and O(Kd(m+K)) space, and proves a regret bound close to that of the original algorithm.

Abstract

This paper studies efficient online algorithms for multinomial logistic bandits (MLogB), where the feedback distribution over $K+1$ outcomes follows a multinomial logistic model of $d$-dimensional action vectors. A representative UCB-type algorithm, OFUL-MLogB, achieves a regret bound of $\tilde{\mathcal{O}}(Kd\sqrt{T})$, but still requires $\mathcal{O}(K^3d^3)$ time and $\mathcal{O}(K^2d^2)$ space per round due to parameter estimation and optimistic reward construction, which is prohibitive in high-dimensional settings. To address this limitation, we propose EOFD-MLogB, which integrates frequent directions matrix sketching into OFUL-MLogB. By maintaining a low-rank SVD sketch of the accumulated Hessian, constrained online Newton updates in parameter estimation and $Kd \times K$ spectral-norm computations in the reward bonus are reduced to one-dimensional root-finding tasks and $K \times K$ eigenvalue computations, respectively. This yields dominant per-round time complexity $\mathcal{O}(Kd(m+K)^2)$ and space complexity $\mathcal{O}(Kd(m+K))$, where $m \ll d$ is the sketch size. We further prove a regret bound of $\tilde{\mathcal{O}}(\Delta_T(Kd\ln\Delta_T+m)\sqrt{T})$, where the sketching error factor $\Delta_T$ is controlled by the $m$-truncated spectral tail of the Hessian. Thus, when the Hessian is approximately low-rank, the regret is close to that of OFUL-MLogB. Experiments validate the computational efficiency and competitive performance.
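The sketching primitive named in the abstract can be illustrated in isolation. The following is a minimal sketch of generic frequent directions on a stream of rows, not the EOFD-MLogB algorithm itself: the function name `frequent_directions`, the doubled working buffer, and the plain NumPy SVD are illustrative assumptions, and the paper's Hessian sketch, Newton updates, and bonus computation are not reproduced.

```python
import numpy as np

def frequent_directions(rows, sketch_size):
    """Return a (sketch_size x d) sketch B of the row stream A with B^T B close to A^T A."""
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    buf = np.zeros((2 * sketch_size, d))  # working buffer, twice the sketch size
    used = 0

    def shrink(buf):
        # Rotate onto the singular directions and subtract the (sketch_size+1)-th
        # squared singular value, which zeroes out the bottom half of the buffer.
        _, s, vt = np.linalg.svd(buf, full_matrices=False)
        delta = s[sketch_size] ** 2 if s.size > sketch_size else 0.0
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        out = np.zeros_like(buf)
        out[: s.size] = s[:, None] * vt
        return out

    for row in rows:
        if used == 2 * sketch_size:
            buf = shrink(buf)
            used = sketch_size
        buf[used] = row
        used += 1
    return shrink(buf)[:sketch_size]
```

With this shrinkage rule the spectral error of `B.T @ B` against `A.T @ A` is at most the squared Frobenius norm of `A` divided by `sketch_size`, which is why a small sketch suffices when the spectrum decays quickly.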

2606.11949 2026-06-11 cs.LG cs.CR stat.ML new submission

Online Shift Detection and Conformal Adaptation for Deployed Safety Classifiers

Jun Wen Leong

AI summary: Proposes an online monitoring system that uses calibrated sequential statistics to detect distribution shift and a conformal abstention layer that adapts thresholds to recover the target error rate, achieving 86.6% valid detection across 800 experimental cells.

Comments
16 pages, 4 figures, 7 tables. Code and data at this https URL

Abstract

We present an online monitoring system for distributional shift in deployed safety classifiers, using calibrated sequential statistics to detect when a classifier has moved out of distribution. Upon detection, a conformal abstention layer adapts decision thresholds to recover a target error rate epsilon=0.1. In a pre-registered factorial evaluation (4 classifiers x 5 shift conditions x 20 seeds x 2 window sizes, 800 cells), the system achieves 86.6% valid detection (693/800, 95% CI [84.1%, 88.8%]) with mean latency of 39.5 steps. Detection holds across three ground-truth regimes: synthetic onset (86.6%), real temporal jailbreaks (85%, 17/20), and GCG adversarial attacks. Weighted conformal prediction recovers up to 39 pp of lost coverage for DeBERTa (ESS=46/300) but collapses for all other classifiers (ESS~300): logistic density ratio estimation achieves perfect source/target separability in high-dimensional embedding spaces, clipping all importance weights to the floor. DeBERTa shows a gradient from effective correction (paraphrase, ESS=46) to near-total collapse (adversarial suffix, ESS=206). PCA to 32 dimensions breaks the collapse, recovering 33 pp for Llama Guard and 21 pp for ShieldGemma. Variance decomposition reveals classifier (eta^2=0.243), shift type (eta^2=0.237), and their interaction (eta^2=0.185) all contribute substantially to detection latency variance (all p<0.001), indicating per-classifier monitoring profiles are necessary.
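The headline figure (693/800 with a 95% CI of [84.1%, 88.8%]) can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes a Wilson score interval, which reproduces the quoted bounds to one decimal place. The function name `wilson_interval` is illustrative.

```python
import math

def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials ** 2)) / denom
    return center - half, center + half

low, high = wilson_interval(693, 800)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "84.1% to 88.8%"
```

A plain normal-approximation interval on the same counts gives roughly 84.3% to 89.0%, so the Wilson form is the better match to the reported numbers.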

2605.04893 2026-06-11 cs.LG cs.CL stat.ML version update

Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

Dominik Dahlem, Diego Maniloff, Mac Misiura

AI summary: Studies two failure shapes of attention routing in language models (over-concentration or over-diffusion), proves that symmetric spectral diagnostics are blind to direction, establishes a theoretical floor on transport capacity in causal attention, and proposes a two-axis diagnostic based on capacity and direction.

Comments
48 pages, 6 figures, 7 tables; 81-page online supplement (proofs, additional experiments, dataset statistics) as an ancillary file

Abstract

When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: over-concentrating on a narrow set of positions, or spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. We study these shapes as a diagnostic characterization, computed from attention matrices under \emph{forced scoring} of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport \emph{capacity}; we prove that every transpose-invariant spectral diagnostic of this operator is structurally \emph{orientation-blind} (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a converse to the blindness theorem bounding any Lipschitz diagnostic's transpose sensitivity by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62-0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu.

2606.11911 2026-06-11 stat.ML cs.LG math.AT new submission

From Persistence to Survival: Hypothesis Testing, Effect Sizes and Vectorisation for Topological Features

Juliette Murris, Bernadette Stolz, Karsten Borgwardt

AI summary: Proposes STRAND, which treats persistence diagrams as survival data and uses the persistence survival function to unify hypothesis testing, effect-size computation, and vectorisation, validated on synthetic data and real benchmarks.

Abstract

Persistence diagrams are common representations in topological data analysis, but they do not naturally live in a vector space, and the statistical tools developed for comparing them have largely evolved separately from those used for downstream prediction. We introduce STRAND (Survival Topological Representation ANalysis of Diagrams), which treats (collections of) PDs as survival data: each topological feature with persistence value $p = d - b$ is a fully observed time-to-event, and the persistence survival function $S(t) = \mathbb{P}(p > t)$ is the central object for comparing diagrams. From this single representation we derive (i) a non-parametric two-sample test with calibrated Type I error and high power from a small number of diagrams; (ii) interpretable effect sizes; and (iii) a 1-Wasserstein-stable feature vector for downstream machine learning. We validate calibration and power on synthetic manifolds with controlled topology, demonstrate competitive vectorisation across 14 graph and 3D point cloud benchmarks, and apply the method to study functional brain connectivity in fMRI/neuroscience data. To our knowledge, STRAND is the first method to provide hypothesis testing and vectorisation for persistence diagrams from a single coherent and interpretable representation.
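The central object in the abstract, $S(t) = \mathbb{P}(p > t)$ with $p = d - b$, has a direct empirical counterpart for a single diagram. The sketch below is an illustration under assumptions: the function name `persistence_survival` and the representation of a diagram as an array of (birth, death) pairs are hypothetical, and the STRAND test, effect sizes, and handling of collections of diagrams are not reproduced.

```python
import numpy as np

def persistence_survival(diagram, grid):
    """Empirical S(t): the fraction of features whose persistence exceeds t."""
    diagram = np.asarray(diagram, dtype=float)
    persistence = diagram[:, 1] - diagram[:, 0]  # p = d - b for each feature
    grid = np.asarray(grid, dtype=float)
    # Compare every grid value t against every persistence value, then average
    return (persistence[None, :] > grid[:, None]).mean(axis=1)

diagram = [(0.0, 1.0), (0.0, 3.0), (1.0, 2.0), (2.0, 6.0)]  # persistence: 1, 3, 1, 4
curve = persistence_survival(diagram, grid=[0.0, 1.0, 2.0, 3.5, 4.0])
print(curve)  # values 1.0, 0.5, 0.5, 0.25, 0.0
```

Evaluating the curve on a fixed grid turns each diagram into a fixed-length vector, which is the sense in which one representation can serve both comparison and downstream learning.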

2606.11865 2026-06-11 stat.ML cs.LG new submission

Conformal Bayes under Label Shift: Post-Hoc Calibration vs. In-Training Adaptation

Seungjin Choi

AI summary: Studies conformal Bayes under label shift, restoring target-domain coverage through importance-weighted conformal calibration and comparing two strategies, post-hoc calibration and in-training adaptation; the latter acts as a debiasing step under biased training.

Comments
2nd Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026)

Abstract

Conformal Bayes combines Bayesian posterior predictives with conformal calibration to produce prediction sets that are both statistically valid and geometrically efficient. We study conformal Bayes under label shift from a unified perspective, identifying two complementary approaches that restore nominal target-domain coverage through importance-weighted conformal calibration but operate through independent mechanisms. \emph{Post-hoc calibration} tilts the posterior predictive toward the target domain and corrects the conformal threshold via an importance-weighted quantile, leaving the parameter posterior unchanged. \emph{In-training adaptation} tilts the parameter posterior itself to the target domain, producing a corrected predictive whose highest predictive density region serves as the highest predictive density (HPD) based prediction set under the fitted target predictive; efficiency is model-dependent and does not imply finite-sample conditional optimality. Two controlled experiments show that in an unbiased training regime both strategies achieve valid coverage equally, while in a lead-optimization regime in-training adaptation acts as a debiasing operator, reducing interval width at unchanged coverage.
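The importance-weighted quantile mentioned for post-hoc calibration can be sketched as follows. This is a simplified illustration, not the paper's procedure: the function name `weighted_quantile` is hypothetical, the weights are assumed to be given (for label shift they would be ratios of target to source label probabilities), and the extra weight that weighted conformal methods usually assign to the test point is omitted for brevity.

```python
import numpy as np

def weighted_quantile(scores, weights, level):
    """Smallest calibration score whose normalized cumulative weight reaches `level`."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cumulative = np.cumsum(weights[order]) / weights.sum()
    index = np.searchsorted(cumulative, level, side="left")
    return scores[order][min(index, len(scores) - 1)]

scores = np.arange(1.0, 11.0)           # calibration nonconformity scores 1..10
uniform = np.ones(10)                   # no shift: ordinary empirical quantile
shifted = np.array([1.0] * 9 + [11.0])  # target domain upweights the hardest example
print(weighted_quantile(scores, uniform, 0.55), weighted_quantile(scores, shifted, 0.55))
```

When the target domain places more mass on examples with high scores, the threshold moves up, which widens the prediction sets enough to restore coverage on the target.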

2606.11775 2026-06-11 math.MG q-bio.QM stat.ML new submission

Magnitude-Based Features for Multispecies Spatial Data

Julia Sollberger, Joshua Bull, Sara Kališnik, Bernadette Stolz

AI summary: Proposes global and local magnitude-based feature vectors for analyzing interactions in multispecies spatial data, and demonstrates their ability to identify spatial heterogeneity and support classification on synthetic tumour microenvironment data and human colorectal cancer tissue microarray data.

Comments
32 pages, 24 figures

Abstract

Multispecies spatial data arise in many applications where interactions between different entities are central to system behaviour, including biomedical imaging, geospatial analysis, and species ecology. Despite their importance, relatively few quantitative tools exist to capture such interactions. In this work, we propose magnitude-based features for the analysis of multispecies spatial data. Magnitude is a real-valued invariant of finite metric spaces that can be interpreted as an effective number of points, incorporating both spatial configuration and scale. We develop global and local magnitude feature vectors and demonstrate their utility on synthetic tumour microenvironment data, and in tissue microarray data from human colorectal cancer samples. Locally, the method identifies distinct neighbourhood types and reveals spatial heterogeneity; in the model, this includes radial patterns associated with different qualitative outcomes of the simulations, while in the real-world data it reflects the importance of tertiary lymphoid structure-like interactions between B and T cell populations. Globally, the approach recovers known classifications of long-term simulation outcomes across parameter regimes in synthetic data, and suggests important roles for CD4+ T cells and CD163+ macrophages in distinguishing patients with favourable Crohn's-like reactions from unfavourable diffuse immune infiltration. Together, these results suggest that magnitude-based features provide a powerful and flexible tool for the analysis of multispecies spatial data.
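The "effective number of points" reading of magnitude can be made concrete with the standard definition for a finite metric space: form the similarity matrix with entries exp(-t * d(x_i, x_j)) and sum the entries of its inverse. The sketch below assumes Euclidean points and a hypothetical function name `magnitude`; the paper's multispecies global and local feature vectors are not reproduced.

```python
import math
import numpy as np

def magnitude(points, scale=1.0):
    """Magnitude of a finite Euclidean point set viewed at the given scale t."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    similarity = np.exp(-scale * dists)
    # The weighting w solves similarity @ w = 1; magnitude is the sum of the weights
    weights = np.linalg.solve(similarity, np.ones(len(points)))
    return float(weights.sum())

pair = [[0.0, 0.0], [1.0, 0.0]]
for t in (0.01, 1.0, 100.0):
    print(t, magnitude(pair, scale=t), 2.0 / (1.0 + math.exp(-t)))
```

For two points at distance 1 the closed form is 2 / (1 + exp(-t)), so the pair counts as about one point at coarse scales and about two at fine scales.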

2606.11746 2026-06-11 astro-ph.IM stat.ML new submission

Time Series Analysis in Machine Learning

Antonio Pagliaro, Anna Anzalone

AI summary: Reviews time series analysis from a machine learning perspective, covering classical statistical models and modern machine learning methods and emphasizing principles that apply across domains.

Comments
Invited chapter for the edited book "Machine Learning Techniques for Astrophysics and Cosmology" (Eds. Cosimo Bambi, Vinay Kashyap, Swarnim Shashank, Naoki Yoshida, Springer Singapore, expected in 2026). Submitted version

Abstract

Time series analysis is a fundamental component of machine learning, especially in astrophysics and cosmology where temporal data abound. This chapter provides a pedagogical review of time series analysis techniques from a machine learning perspective. We cover the basic concepts of time series (stationarity, autocorrelation, seasonality), classical statistical models (autoregressive, moving average, ARIMA, exponential smoothing, state-space models), and modern machine learning approaches. In particular, we discuss how traditional statistical methods lay the groundwork, and then explore machine learning methods for time series, including feature-based regression, tree-based ensemble methods, hidden Markov models, Gaussian processes, and deep learning models (recurrent neural networks, convolutional networks, transformers). Throughout, we illustrate with examples drawn from multiple domains (e.g. astronomy, weather forecasting, finance) to emphasize common principles. The goal is to equip readers with both the theoretical understanding and practical context to apply machine learning techniques for time series analysis in their research.

2606.11738 2026-06-11 stat.ML cs.LG new submission

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

Junzhuo Gao, Ling Peng, Xu Guo, Heng Lian

AI summary: For online estimation of high-dimensional generalized linear models with streaming data, proposes a gradient-enhanced surrogate loss that removes the batch-number constraint, extends it to distributed streaming data, derives non-asymptotic error bounds, and shows improved accuracy in experiments.

Abstract

We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries, which modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and removes the batch-number constraint in previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Instead of directly applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, our adjusted approach does not require the clients to compute the full surrogate loss. We derive non-asymptotic error bounds under the high-dimensional scaling, without the stringent constraint on the number of batches in the previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.

2606.11711 2026-06-11 cs.LG stat.ML new submission

Capacity-Constrained Online Convex Optimization with Delayed Feedback

Alexander Ryabchenko, Idan Attias, Daniel M. Roy

Affiliations: Department of Statistical Sciences, University of Toronto; Vector Institute; Institute for Data, Econometrics, Algorithms, and Learning (IDEAL), hosted by UIC and TTIC

AI summary: Studies delayed online convex optimization under a hard capacity constraint (at most C pending rounds tracked at once); by introducing a semi-clairvoyant model and a Delayed-Weighted FTRL algorithm, it gives the first regret bounds for capacity-constrained OCO under convex and strongly convex losses.

Abstract

Online learning with delayed feedback typically assumes that the learner can track all pending rounds until their feedback arrives. In practice, tracking resources are finite, and feedback from untracked rounds is permanently lost. In this paper, we study delayed online convex optimization (OCO) under a hard capacity constraint, where at most $C$ pending rounds can be tracked at any time. To model delay information, we introduce a semi-clairvoyant model that refines the clairvoyant assumption from prior work: rather than requiring delays to be known at prediction time, the learner observes delay expirations online, consistent with the classical unconstrained delayed setting. Our approach proceeds via a reduction to a novel ``delayed and weighted'' OCO problem, using a scheduler that randomizes tracking decisions and importance-weights the resulting observations. For this base problem, we propose and analyze Delayed-Weighted FTRL and its bandit analogue, establishing regret bounds that explicitly characterize the interaction between time-varying weights and delayed feedback. Combining these base learners with our schedulers yields the first regret guarantees for capacity-constrained OCO under convex and strongly convex losses, for both first-order and bandit feedback. For first-order feedback, capacity $C = \Omega(\log T)$ suffices to recover standard delayed OCO rates up to logarithmic factors. For bandit feedback, the regret rates are modulated by powers of $(1 + \sigma_{\text{max}}/C)$, where $\sigma_{\text{max}}$ is the maximum number of pending observations at any time. This allows the regret bound to degrade gracefully when $C < \sigma_{\text{max}}$, while remaining sublinear.

2606.11646 2026-06-11 cs.LG q-bio.QM stat.ML new submission

Tree-Structured Orthonormal Decomposition of the Aitchison Simplex

Daisuke Yamada, Qijun Zhang, Travis Pence, Barbara B. Bendlin, Federico Rey, Vikas Singh

AI summary: Proposes PolyILR, which uses tree structure to give an orthonormal decomposition of compositional data, yielding stable and interpretable features on microbiome and single-cell data, and establishes a theoretical connection to softmax classifiers.

Comments
Accepted at ICML 2026. To appear in PMLR vol. 306

Abstract

Compositional data -- vectors encoding relative proportions -- arise across scientific domains, including ecology, geochemistry, and genomics. The features in these data often come with known hierarchical structure (e.g., taxonomies, phylogenies, ontologies), yet existing methods either ignore this structure, discard the intrinsic Aitchison geometry, are designed for binary trees, or yield incomplete coordinate systems. We describe PolyILR, a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. Our construction defines a weighted local geometry at each internal node capturing full branching structure, then lifts these to a global orthonormal basis where every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiscale tree resolution. We also establish a novel theoretical connection to softmax classifiers, suggesting possible applications to probabilistic modeling.

2606.11574 2026-06-11 cs.LG cond-mat.mtrl-sci physics.chem-ph stat.ML new submission

Range-Aware Bayesian Optimization for Discovering Diverse Designs within Target Property Windows

Shengli Jiang, Jason Wu, Charles M. Schroeder, Michael A. Webb

Affiliations: Department of Chemical and Biological Engineering, Princeton University

AI summary: Proposes a range-aware Bayesian optimization framework whose acquisition function directly scores the posterior probability that a candidate satisfies a target range; on benchmark tasks and practical case studies it finds more diverse valid designs than standard methods.

Comments
64 pages, 6 main text figures, 17 supporting figures, 6 supporting tables

Abstract

In many materials and product design problems, desirable candidates exhibit properties that fall within an acceptable range rather than achieve a single optimum. Recovering multiple, distinct solutions that satisfy such specifications is also practically valuable, as some candidates may be preferred for reasons of cost, processability, or robustness that are difficult to encode directly in an objective function. Here, we develop a range-aware Bayesian optimization (BO) framework in which the acquisition function directly scores the posterior probability that a candidate satisfies a target range. The framework naturally extends to parallel pursuit of multiple distinct specifications over a shared candidate space. Across benchmark tasks, range-aware acquisition consistently recovers larger and more diverse sets of valid designs than standard BO baselines and recent goal-seeking methods. Its utility is further demonstrated in two practically motivated design case studies involving optimizing reaction conditions for polymer synthesis and sequence-defined oligomer discovery for prescribed optical absorption bands, supported by quantum chemical calculations. These results suggest that range-aware BO can provide a practical and sample-efficient foundation for specification-driven design, particularly when design flexibility and solution diversity are important considerations.
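The acquisition idea in the abstract, scoring the posterior probability that a candidate lands inside a target range, has a closed form when the surrogate's posterior at each candidate is Gaussian. That Gaussian form is an assumption here (it is what a Gaussian process surrogate would give); the abstract does not specify the surrogate, and the function name `prob_in_range` is illustrative.

```python
import math

def prob_in_range(mean, std, low, high):
    """P(low <= Y <= high) for Y ~ Normal(mean, std^2)."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return cdf(high) - cdf(low)

# Hypothetical candidates as (posterior mean, posterior std) for one property,
# scored against the target window [2.0, 3.0]
candidates = {"A": (2.5, 0.2), "B": (2.5, 1.5), "C": (3.4, 0.3)}
scores = {name: prob_in_range(m, s, 2.0, 3.0) for name, (m, s) in candidates.items()}
print(max(scores, key=scores.get), scores)
```

Unlike an expected-improvement style score, this quantity does not keep rewarding values beyond the window, so many distinct candidates inside the range can score similarly well.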

2606.11570 2026-06-11 stat.ML cs.LG stat.ME new submission

Enhancing Spectral Embedding through Robust and Flexible Knowledge Transfer in Electronic Health Records

Feiqing Huang, Zongqi Xia, Rong Ma, Tianxi Cai

AI总结 提出一种基于谱的无监督表示学习框架,通过从更广泛人群提取知识矩阵并放松信号对齐假设,为罕见病队列生成低维嵌入,在模拟和真实多发性硬化症数据中优于现有方法。

详情
AI中文摘要

我们提出了一种基于谱的无监督表示学习框架,用于从电子健康记录中为罕见病队列的临床概念和患者导出低维嵌入,其中数据是高维的但样本量有限。为了克服这一挑战,我们引入了一个从更广泛人群中提取的知识矩阵,该矩阵与罕见病队列共享部分重叠的子空间。我们的方法不同于现有方法,它放松了潜在数据矩阵和知识矩阵之间严格的一对一信号对齐假设,允许更灵活和现实的结构化共享形式。我们引入了一种新颖的两步谱嵌入过程:首先,我们从知识矩阵中识别并移除不相关的成分;然后,我们应用基于投影的方法分别恢复共享和异质成分。模拟和对真实世界多发性硬化症队列的分析表明,所提出的方法优于竞争方法,特别是在共享信号较弱且仅部分对齐的挑战性场景中,这在罕见病数据中很常见。

English abstract

We propose a spectral-based, unsupervised representation learning framework to derive low-dimensional embeddings for clinical concepts and patients in rare disease cohorts from electronic health records, where data are high-dimensional but sample sizes are limited. To overcome this challenge, we incorporate a knowledge matrix extracted from a broader population that shares a partially overlapping subspace with the rare-disease cohort. Our method departs from existing approaches by relaxing restrictive one-to-one signal-alignment assumptions between the latent data matrix and knowledge matrix, allowing more flexible and realistic forms of structured sharing. We introduce a novel two-step spectral embedding procedure: first, we identify and remove irrelevant components from the knowledge matrix; then, we apply a projection-based method to separately recover shared and heterogeneous components. Simulations and an analysis of a real-world multiple sclerosis cohort show that the proposed method outperforms competing approaches, particularly in challenging scenarios where shared signals are weak and only partially aligned, as is common in rare-disease data.

2606.11510 2026-06-11 q-bio.QM q-bio.PE stat.ML New submission

Continuous biome representations from Earth observation embeddings

从地球观测嵌入中提取连续生物群落表示

Maxwell B. Joseph, Flávia De Souza Mendes, Dieu My T. Nguyen, Camile Sothe, Christopher B. Anderson (Planet Labs PBC)

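AI summary: To address the way discrete biome maps compress ecological continuity, the paper learns continuous probabilistic representations from satellite image embeddings; validated on six Brazilian biomes and 4,672 plant species, these representations predict species distributions better than discrete labels.

Comments: 8 pages, 4 figures
AI Chinese abstract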

生物群落随空间连续变化,但生物群落图通过分类边界压缩了这种变化,特别是在生态过渡带,过渡群落具有独特的生态特征。地球观测基础模型通过密集嵌入编码光谱、空间和时间信息,能否将离散的生物群落图转换为更好地捕捉生态变化的连续表示?本文在Clay v1.5卫星图像嵌入上拟合线性分类器,从分类图中预测生物群落标签。softmax输出产生一个连续概率向量,其维度对应命名的生物群落类别。我们使用巴西六个生物群落、130万个嵌入和10015个保留的森林清查样地(涵盖4672种植物)评估该方法。连续生物群落表示在预测物种出现方面优于离散生物群落标签(10次空间交叉验证中平均每物种AUC 0.618 vs. 0.570)。分解这一增益表明,改进来自分级概率输出的连续性,而非标签重新分配;该模式在距生物群落边界的所有距离上均成立。原始1024维嵌入仍然是我们测试的最强预测因子(平均AUC 0.646 vs. 0.618),但连续表示恢复了嵌入相对于离散标签的大部分增益。这种简单方法为分类地图标签提供了概率替代方案,保留了其含义,同时编码了离散地图抑制的分级变化。

English abstract

Biotic communities vary continuously across space, yet biome maps impose categorical boundaries that compress this variation, particularly at ecotones where transitional communities are ecologically distinct. Could Earth observation (EO) foundation models, which encode spectral, spatial, and temporal information with dense embeddings, convert discrete biome maps into continuous representations that better capture ecological variation? Here, we fit a linear classifier on Clay v1.5 satellite image embeddings to predict biome labels from a categorical map. The softmax output yields a continuous probability vector whose dimensions correspond to named biome classes. We evaluate this approach using six Brazilian biomes, 1.3 million embeddings, and 10,015 withheld forest inventory plots spanning 4,672 plant species. The continuous biome representation outperforms discrete biome labels for predicting species occurrence (mean per-species AUC 0.618 vs. 0.570 across 10 spatial cross-validation folds). Decomposing this gain shows that continuity in the graded probability output, rather than label reassignment, accounts for the improvement; the pattern holds across all distances from biome boundaries. The raw 1024-dimensional embedding remains the strongest predictor we tested (mean AUC 0.646 vs. 0.618), but the continuous representation recovers most of the embedding's gain over discrete labels. This simple approach provides a probabilistic replacement for categorical map labels, preserving their meaning while encoding graded variation that discrete maps suppress.
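
The classifier step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the random stand-in embeddings, and the weight values are all assumptions, and only the shapes (1024-dimensional embeddings, six classes) come from the abstract.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def continuous_biome_representation(embeddings, weights, bias):
    """Map (n, d) embeddings to (n, n_classes) class-probability vectors."""
    return softmax(embeddings @ weights + bias)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 1024))        # stand-in for image embeddings
weights = rng.normal(size=(1024, 6)) * 0.01    # hypothetical fitted weights, six classes
bias = np.zeros(6)

probs = continuous_biome_representation(embeddings, weights, bias)
hard_labels = probs.argmax(axis=1)             # the discrete label is the argmax
```

Each row of `probs` is a graded alternative to the single entry in `hard_labels`: a location near a boundary can carry mass on two classes instead of being forced into one.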

2606.11487 2026-06-11 math.ST math.PR stat.ML New submission

Unbiased Derivative Estimation for Stationary Mean of Parameterized Markov chains

参数化马尔可夫链平稳均值的无偏导数估计

Jeffrey Wang, Chang-han Rhee

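AI summary: Proposes an unbiased estimator for the gradient of the stationary mean of parameterized Markov chains that is efficient under slow mixing, needs no prior knowledge of the density function, and suits neural-network parameterizations.

Comments: Preliminary draft. Full version in preparation
AI Chinese abstract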

我们提出了一种新方法,用于无偏估计与参数化马尔可夫链族相关的平稳均值的梯度。当马尔可夫链具有慢混合率时,我们的估计器特别高效。我们的方法不需要特定的参数化,除了一个预言机来评估给定数据点的转移密度及其梯度,而无需关于密度函数本身的任何额外知识。这使得我们的估计器适用于与神经网络相关的参数化。该估计器在效率方面可能实现大幅提升。数值实验证实了理论预测的良好性能。

English abstract

We propose a new approach to unbiased estimation of the gradients of the stationary means associated with parametrized families of Markov chains. Our estimators are particularly efficient when the Markov chains mix slowly. Our approach does not require a specific parametrization: it needs only an oracle that evaluates the transition density and its gradient at a given data point, without any additional knowledge of the density function itself. This makes our estimator suitable for parametrizations associated with neural networks. The estimator can potentially achieve large gains in efficiency. Numerical experiments confirm the good performance predicted by the theory.

2606.11473 2026-06-11 cs.LG cs.AI stat.ML New submission

CRUMB: Efficient Prior Fitted Network Inference via Distributionally Matched Context Batching

CRUMB: 通过分布匹配上下文批处理实现高效先验拟合网络推理

Jamie Heredge, Mattia J. Villani, Pranav Deshpande, Akshay Seshadri, Niraj Kumar

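Affiliation: Global Technology Applied Research, JPMorganChase

AI summary: Proposes CRUMB, which clusters the queries, selects training subsets by minimizing the maximum mean discrepancy, and then runs exact inference, accelerating prior-fitted network inference without retraining; it outperforms comparable methods on 51 datasets.

Comments: 26 pages, 13 figures
AI Chinese abstract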

先验拟合网络(PFNs)是一类有前景的表格基础模型,执行上下文学习,其中整个带标签的训练集作为上下文提供,并在单次前向传播中生成测试查询的预测。然而,许多PFN架构中二次缩放的自注意力机制使得对于非常大的训练数据集推理变得不可行。我们提出CRUMB(使用最小化MMD批处理的聚类检索),一个三阶段推理包装器:(i)聚类测试查询,(ii)通过贪心最小化最大均值差异(MMD)为每个聚类选择一个小型、分布匹配的训练子集,(iii)在每个缩减上下文的批次上执行精确的PFN推理。CRUMB是架构无关的,无需重新训练。在51个数据集的TabArena基准测试中,跨三种PFN架构(TabPFNv2、TabICLv1、TabICLv2)评估,我们展示了CRUMB优于类似的最先进的上下文选择策略。我们还展示了CRUMB对协变量漂移具有鲁棒性,因为MMD最小化步骤自然有助于对齐训练上下文分布以匹配当前测试批次分布。

English abstract

Prior-fitted networks (PFNs) are a promising class of tabular foundation models that perform in-context learning, whereby the entire labelled training set is supplied as context, and predictions for test queries are produced in a single forward pass. However, the quadratically scaling self-attention mechanism in many PFN architectures makes inference prohibitive for very large training datasets. We propose CRUMB (Clustered Retrieval Using Minimised-MMD Batching), a three-stage inference wrapper that (i) clusters the test queries, (ii) selects a small, distributionally matched training subset for each cluster by greedily minimising the maximum mean discrepancy (MMD), and (iii) runs exact PFN inference on each reduced-context batch. CRUMB is architecture-agnostic and requires no retraining. On the 51-dataset TabArena benchmark, evaluated across three PFN architectures (TabPFNv2, TabICLv1, TabICLv2), we show that CRUMB outperforms similar state-of-the-art context selection strategies. We also show that CRUMB is resilient to covariate drift, as the MMD-minimisation step naturally helps align the training context distribution to match the current test batch distributions.
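
Stage (ii) of the pipeline above, greedy MMD minimisation, might look like the sketch below. It is a hedged illustration rather than the CRUMB implementation: the RBF kernel, the `gamma` parameter, and the function names are assumptions, and the clustering and PFN inference stages are omitted.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel between the rows of a and the rows of b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def greedy_mmd_subset(train_x, query_x, size, gamma=1.0):
    """Greedily pick `size` training rows whose empirical distribution
    has small squared MMD to the query rows (assumed RBF kernel)."""
    k_tt = rbf_kernel(train_x, train_x, gamma)
    k_tq_mean = rbf_kernel(train_x, query_x, gamma).mean(axis=1)
    diag = np.diag(k_tt)
    selected = []
    within = 0.0                               # sum of k over selected pairs
    cross = 0.0                                # sum of mean kernel to queries
    to_selected = np.zeros(len(train_x))       # sum_j k(candidate, selected_j)
    for _ in range(size):
        m = len(selected) + 1
        new_within = within + 2.0 * to_selected + diag
        new_cross = cross + k_tq_mean
        # Squared MMD after adding each candidate, up to a query-only constant.
        scores = new_within / m**2 - 2.0 * new_cross / m
        if selected:
            scores[selected] = np.inf          # never pick a row twice
        best = int(np.argmin(scores))
        selected.append(best)
        within, cross = new_within[best], new_cross[best]
        to_selected += k_tt[:, best]
    return selected
```

The running sums keep each greedy step linear in the number of training rows once the kernel matrix is built; building that matrix is quadratic, so a practical version would need a cheaper kernel evaluation for very large training sets.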

2606.11437 2026-06-11 cs.DS cs.AI cs.LG stat.ML New submission

The Power of Test-Time Training for Approximate Sampling

测试时训练对近似采样的威力

Noah Golowich, Ankur Moitra, Dhruv Rohatgi

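AI summary: Formalizes test-time training (TTT) as the problem of sampling from a distribution in a known class, proves a quadratic lower bound on query complexity, and shows that the bound can be circumvented when the size of the class is bounded, providing a theoretical framework for TTT.

AI Chinese abstract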

从复杂概率分布中高效采样是一个基本问题,近年来随着生成式AI的兴起,这一问题变得越来越重要,因为从大语言模型(LLM)中提出的复杂采样程序已被用于解决具有挑战性的推理问题。然而,这类采样算法的有效性受到LLM与特定采样任务之间关系的限制,这推动了测试时训练(TTT)框架的发展。TTT通过根据推理时收到的部分生成和奖励反馈更新模型权重来工作,从而适应特定问题。在这项工作中,我们提出了一种TTT的形式化,将其定义为从属于已知分布类$F$的给定概率测度$\mu^\star$中生成样本的问题,给定一个提供$\mu^\star$近似密度估计的预言机$\hat \mu$。这与Jerrum、Valiant和Vazirani(1986)以及Jerrum和Sinclair(1989)的开创性工作中研究的将采样约化为近似计数的问题密切相关:即当$F$是所有分布的类时,它恰好与上述计数到采样的约化一致。在本文中,我们首先证明了在给定对$\hat \mu$的查询访问的情况下,从$\mu^\star$采样的查询复杂度的二次下界(对于足够大的类$F$),从而表明Jerrum和Sinclair(1989)提出并由Hayes和Sinclair(2010)改进的随机游走方法是最优的。这回答了Hayes和Sinclair提出的一个开放问题。然后,我们证明如果$F$的大小适当受限,这个下界可以被规避。正如我们所讨论的,后一个结果可以被视为TTT的抽象,因此代表了为TTT发展一个原则性理论框架的起点。

English abstract

Efficiently sampling from a complex probability distribution is a fundamental problem which has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated sampling procedures from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT). TTT works by updating a model's weights in response to partial generations and reward feedback received at inference time, thus adapting to the particular problem. In this work, we propose a formalization for TTT as the problem of producing a sample from a given probability measure $\mu^\star$ belonging to a known class ${F}$ of distributions, given an oracle $\hat \mu$ which yields approximate density estimates for $\mu^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): namely, when ${F}$ is the class of all distributions, it coincides exactly with the aforementioned counting-to-sampling reduction. In this paper, we first show a quadratic lower bound on the query complexity of sampling from $\mu^\star$ given query access to $\hat \mu$ (for sufficiently large classes ${F}$), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010), is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of ${F}$ is bounded appropriately. As we discuss, this latter result can be viewed as an abstraction of TTT, and thus represents a starting point for the development of a principled theoretical framework for TTT.

2606.11417 2026-06-11 cs.LG cs.AI stat.ML New submission

Signed Compression Progress on a Sealed Audit is Goodhart-Resistant

密封审计上的有符号压缩进展是古德哈特抵抗的

Ayush Mittal, Dhruv Gupta

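AI summary: Proposes signed compression progress as an intrinsic motivation, proves that cumulative reward equals audit improvement, and gives a false-positive budget for finite audit panels, making the signal resistant to Goodhart's law.

Comments: 16 pages, 7 figures. Lean 4 (Mathlib) mechanized core and ARC-TGI experiment code: this https URL
AI Chinese abstract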

压缩进展是一个长期提出的内在动机方案:当智能体的世界模型在预测或压缩经验方面变得更好时给予奖励。民间声称这种奖励是“可信的”,因为它只在学习时支付。我们使这一点精确化并证明它。如果内在奖励是固定密封审计损失的有符号减少,即 r_t = E(theta_{t-1}) - E(theta_t),那么累积奖励恰好望远镜式地归结为端点审计改进,因此没有策略可以在真实审计性能停滞或下降时无限推高奖励。对于有限审计面板,同样的结果成立,并带有尖锐的假阳性预算:累积经验奖励最多为真实审计改进加上 2 Delta_n(F, delta),即模型类的均匀审计偏差。这是无水平依赖的:一旦密封面板均匀控制该类,随时间变化的适应性无需付出代价。该定理还识别了失败模式:如果进展被截断、在智能体自身流上评分、暴露于可重用面板上的高容量模型,或应用于使 Delta_n 无效的神经类,则保证消失。我们给出了结构核心(望远镜式、有限审计界、有限吉布斯和熵下限)的 Lean 4 机械化,以及在 ARC-TGI 网格变换生成器上带有自适应保留攻击的实验套件。实验证实了理论:有限审计偏差按 n^{-0.527} 缩放;有符号进展抵抗截断农场、流泄漏和噪声电视好奇心;朴素的可重用审计可被黑盒标量反馈利用,而标准发布防御将攻击保持在 2 Delta_n 阈值以下。密封审计上的有符号压缩进展是真正改进的会计信号。

English abstract

Compression progress is a long-standing proposal for intrinsic motivation: reward an agent when its world model becomes better at predicting or compressing experience. The folk claim is that this reward is "credible" because it is paid only for learning. We make this precise and prove it. If intrinsic reward is the signed decrease of a fixed sealed-audit loss, r_t = E(theta_{t-1}) - E(theta_t), then cumulative reward telescopes exactly to endpoint audit improvement, so no policy can push reward up indefinitely while true audit performance stagnates or degrades. For finite audit panels the same result holds with a sharp false-positive budget: cumulative empirical reward is at most true audit improvement plus 2 Delta_n(F, delta), the uniform audit deviation of the model class. This is horizon-free: adaptivity over time costs nothing once the sealed panel uniformly controls the class. The theorem also identifies the failure modes: the guarantee disappears if progress is clipped, scored on the agent's own stream, exposed to a high-capacity model on a reusable panel, or applied to a neural class that makes Delta_n vacuous. We give a Lean 4 mechanization of the structural core (telescoping, the finite-audit bound, finite Gibbs, and the entropy floor) and an experiment suite on ARC-TGI grid-transformation generators with adaptive holdout attacks. Experiments confirm the theory: finite-audit deviation scales as n^{-0.527}; signed progress resists clip-farming, stream leakage, and noisy-TV curiosity; naive reusable audits are exploitable by black-box scalar feedback, while standard release defenses keep the attack below the 2 Delta_n threshold. Signed compression progress on a sealed audit is an accounting signal of genuine improvement.
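
The telescoping identity, and the clipping failure mode named above, can be checked on a toy sequence. The loss values below are invented for illustration and are not taken from the paper's experiments.

```python
# Hypothetical sealed-audit losses E(theta_0), ..., E(theta_5) for a model
# that oscillates instead of improving steadily.
losses = [1.0, 0.6, 0.9, 0.5, 0.8, 0.5]

# Signed progress: r_t = E(theta_{t-1}) - E(theta_t).
signed = [prev - cur for prev, cur in zip(losses, losses[1:])]

# Clipped progress keeps only the positive part of each reward.
clipped = [max(r, 0.0) for r in signed]

total_signed = sum(signed)
total_clipped = sum(clipped)
improvement = losses[0] - losses[-1]   # endpoint audit improvement
```

Here `total_signed` matches `improvement` because every intermediate loss cancels, whereas `total_clipped` exceeds it: an agent that repeatedly forgets and relearns can collect clipped reward without any net audit gain.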

2606.11402 2026-06-11 stat.CO astro-ph.IM stat.ML New submission

GraphGP: Scalable Gaussian Processes with Vecchia's Approximation

GraphGP: 基于Vecchia近似的可扩展高斯过程

Benjamin Dodge, Philipp Frank, Susan E. Clark

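AI summary: Proposes GraphGP, an algorithm that uses Vecchia's approximation and GPU acceleration to scale Gaussian processes to nearly a billion parameters with linear time and memory complexity, for arbitrary point distributions over a large dynamic range.

Comments: Accepted to Conference on Physics and AI at Stanford University (PAI 2026)
AI Chinese abstract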

高斯过程是建模连续场的强大工具,但其朴素的$\mathcal{O}(N^3)$计算成本和$\mathcal{O}(N^2)$内存需求常常限制其实际应用。Vecchia近似是一种针对平稳、衰减核的稀疏精度矩阵近似,它将每个点仅条件于其$k$个最近邻。我们提出GraphGP,一种用于Vecchia近似的GPU算法,可扩展到近十亿参数,具有线性时间和内存需求,并能处理大动态范围内的任意点分布。我们的关键贡献是:(1) 一种比特反转k-d树排序,允许高效邻居搜索同时最大化批处理并行性;(2) 一种可微的CUDA实现,比纯JAX基线显著更快且内存效率更高。GraphGP提供了推理所需的构建块,包括前向生成、逆应用、对数行列式和核参数导数。

English abstract

Gaussian processes are a powerful tool for modeling continuous fields, but their naive $\mathcal{O}(N^3)$ computational cost and $\mathcal{O}(N^2)$ memory requirement often limit their practical use. Vecchia's approximation is a sparse precision matrix approximation for stationary, decaying kernels that conditions each point only on its $k$ nearest neighbors. We present GraphGP, a GPU algorithm for Vecchia's approximation that scales to nearly a billion parameters with linear time and memory requirements, handling arbitrary point distributions over a large dynamic range. Our key contributions are (1) a bit-reversed k-d tree ordering that allows efficient neighbor searches while also maximizing batch parallelism, and (2) a differentiable CUDA implementation, which is substantially faster and more memory efficient than our pure JAX baseline. GraphGP provides the building blocks for inference, including forward generation, inverse application, log-determinant, and kernel parameter derivatives.
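
For orientation, Vecchia's approximation can be written in a few lines of NumPy. This is a didactic sketch under stated assumptions, not GraphGP: it uses the array order as the conditioning order, a brute-force neighbour search, a unit-variance RBF kernel, and a hypothetical `noise` term on the diagonal, so it is quadratic in the number of points rather than linear.

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Unit-variance RBF kernel between the rows of a and the rows of b.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / length**2)

def vecchia_logpdf(x, y, k, noise=1e-2):
    """Approximate zero-mean GP log-density of y at points x, conditioning
    each point on its k nearest neighbours among earlier points."""
    total = 0.0
    for i in range(len(x)):
        prev = np.arange(i)
        if len(prev) > k:
            dist = ((x[prev] - x[i]) ** 2).sum(axis=-1)
            prev = prev[np.argsort(dist)[:k]]
        mean, var = 0.0, 1.0 + noise           # prior mean and variance
        if len(prev) > 0:
            k_nn = rbf(x[prev], x[prev]) + noise * np.eye(len(prev))
            k_in = rbf(x[i:i + 1], x[prev])[0]
            w = np.linalg.solve(k_nn, k_in)    # conditional regression weights
            mean = w @ y[prev]
            var = var - w @ k_in
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total
```

With `k` at least the number of points minus one, every point conditions on all earlier points and the sum reduces to the exact Gaussian log-density by the chain rule; smaller `k` trades accuracy for cost.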

2606.11347 2026-06-11 stat.ML cs.LG math.OC New submission

Annealed Entropic Allocation for Ranking and Selection

退火熵分配用于排序与选择

Xin Fei, Juergen Branke

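AI summary: Proposes the Annealed Entropic Allocation framework, which replaces the non-smooth maximin large-deviation rate objective with a weighted log-sum-exp surrogate and adds a saddlepoint approximation to improve finite-budget discrimination; numerical experiments show strong performance when several candidates are close.

AI Chinese abstract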

我们提出了退火熵分配,一种用于排序与选择中顺序预算分配的退火加权软最小化框架。核心思想是用加权log-sum-exp替代非光滑的极大极小大偏差率目标,该替代通过软最小化权重聚合特定候选对的得分,从而在多个候选几乎同时活跃时缓解硬切换。为了提升有限预算下的区分能力,我们引入了鞍点近似——一种从精细化的成对尾部渐近性导出的次指数修正。由于这些修正是次指数的,且平滑参数退火至零,该替代保持了与经典极大极小公式相同的一阶大偏差目标。我们证明了该替代一致收敛于硬最小值,软最小化权重集中于活跃候选,并且在固定权重下,诱导的目标分配映射在单纯形内部是连续的。在高斯和指数实例上的数值实验展示了竞争性能,尤其是在多个候选几乎持平时。

English abstract

We propose Annealed Entropic Allocation, an annealed weighted soft-min framework for sequential budget allocation in ranking and selection. The central idea is to replace the non-smooth maximin large-deviation rate objective with a weighted log-sum-exp surrogate that aggregates challenger-specific pairwise scores through soft-min weights, mitigating hard switching when several challengers are nearly active. To improve finite-budget discrimination, we incorporate the saddlepoint approximation -- a sub-exponential correction derived from refined pairwise tail asymptotics. Because these corrections are sub-exponential and the smoothing parameter is annealed to zero, the surrogate preserves the same first-order large-deviation target as the classical maximin formulation. We show that the surrogate converges uniformly to the hard minimum, that the soft-min weights concentrate on the active challengers, and that, under fixed weights, the induced target allocation map is continuous on the simplex interior. Numerical experiments on Gaussian and exponential instances demonstrate competitive performance, especially when multiple challengers are nearly tied.
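
The smoothing idea can be illustrated with a generic weighted log-sum-exp soft-min. The sketch below assumes a simple form, $-\eta \log \sum_i w_i e^{-s_i/\eta}$ with prior weights $w_i$ summing to one; the paper's surrogate, including its saddlepoint correction, may differ, and the names `soft_min`, `eta`, and `prior` are illustrative.

```python
import numpy as np

def soft_min(scores, eta, prior=None):
    """Return the weighted log-sum-exp soft-min of `scores` at temperature
    `eta`, together with the soft-min weights over the entries."""
    scores = np.asarray(scores, dtype=float)
    if prior is None:
        prior = np.full(scores.shape, 1.0 / scores.size)
    z = -scores / eta + np.log(np.asarray(prior, dtype=float))
    z_max = z.max()                            # stabilise the exponentials
    lse = z_max + np.log(np.exp(z - z_max).sum())
    weights = np.exp(z - lse)                  # proportional to prior * exp(-s/eta)
    return -eta * lse, weights

pairwise_scores = [1.0, 1.05, 2.0]             # two nearly tied challengers
value_warm, weights_warm = soft_min(pairwise_scores, eta=0.1)
value_cold, weights_cold = soft_min(pairwise_scores, eta=1e-3)
```

At the warmer temperature the two nearly tied entries share the weight, which is what avoids hard switching; as `eta` is annealed toward zero the value approaches the hard minimum and the weight concentrates on the smallest score.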

2606.11339 2026-06-11 math.OC cs.AI cs.LG eess.SY stat.ML New submission

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

松弛全局几何下分布式优化的量化随机原始-对偶方法

Susmit Sarkar, Abhinav Raghuvanshi, Kushal Chakrabarti, Mayank Baranwal

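AI summary: Proposes q-PDGD, a quantized stochastic primal-dual method, and proves linear convergence to a neighborhood or O(1/k) convergence under relaxed global geometry, matching the best-known centralized stochastic complexity.

Comments: Accepted to UAI
AI Chinese abstract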

我们研究具有随机梯度和有限比特通信(由随机(无偏)量化建模)的分布式优化。我们提出q-PDGD,一种量化的随机原始-对偶方法,并在松弛全局几何下对其进行分析。在受限割线不等式(RSI)下,常数步长产生线性收缩到由梯度噪声、量化失真和网络连通性确定的显式邻域,而递减步长在没有共享最小化器假设的情况下实现O(1/k)收敛。在Polyak-Lojasiewicz(PL)不等式下,我们在相同的随机量化设置中获得线性到邻域的收敛。我们的结果在预言复杂度上匹配已知最优的集中随机速率,并通过实验证明了量化水平、步长选择和图结构之间的预测权衡。

English abstract

We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under the restricted secant inequality (RSI), a constant step-size yields linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity, while a diminishing step-size achieves O(1/k) convergence without shared-minimizer assumptions. Under the Polyak-Łojasiewicz (PL) inequality, we obtain linear-to-neighborhood convergence in the same stochastic quantized setting. Our results match the best-known centralized stochastic rates in oracle complexity, and are supported by experiments demonstrating the predicted tradeoffs between quantization level, step-size choice, and graph structure.
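
A common way to obtain a random unbiased quantizer is stochastic rounding onto a uniform grid. The abstract does not say which quantizer q-PDGD uses, so the sketch below is an assumed example; the function name and the `step` parameter are illustrative.

```python
import numpy as np

def stochastic_quantize(x, step, rng):
    """Round each entry of x to a neighbouring multiple of `step`, rounding
    up with probability equal to the fractional position in the cell, so
    that the expectation of the output equals x."""
    scaled = np.asarray(x, dtype=float) / step
    lower = np.floor(scaled)
    round_up = rng.random(scaled.shape) < (scaled - lower)
    return step * (lower + round_up)
```

A coarser `step` needs fewer bits per message but adds more variance, which is the kind of quantization distortion the abstract says enters the size of the convergence neighborhood.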

2606.11283 2026-06-11 cs.DS cs.LG stat.ML New submission

Fixed-Parameter Tractability of Private Synthetic Data Generation

私有合成数据生成的固定参数可处理性

Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi

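AI summary: Studies synthetic data generation under differential privacy, establishes fixed-parameter tractability in the treewidth of the query family's incidence graph, and proposes two optimal algorithms.

AI Chinese abstract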

我们研究在差分隐私下生成合成数据的问题。我们建立了该问题的固定参数可处理性(FPT),其中参数是查询族关联图的树宽。我们的算法在所有情况下都达到最优错误率,并通过两种不同方法实现:第一种基于线性规划(LP)和LP对偶分离问题的FPT;第二种基于子采样私有乘法权重方法,其中我们获得了从吉布斯分布采样的FPT。两种方法都通过树分解上的动态规划框架统一。

English abstract

We study the problem of generating synthetic data under differential privacy. We establish fixed-parameter tractability (FPT) for this problem where the parameter is the treewidth of the query family's incidence graph. Our algorithms attain optimal error rates across all regimes and are realized by two different approaches: the first is based on linear programming (LP) and the FPT of the separation problem for the LP dual; the second is based on a subsampled private multiplicative weights method, where we obtain FPT for sampling from Gibbs distributions. Both approaches are unified by a dynamic programming framework over a tree decomposition.

2606.11118 2026-06-11 cs.LG math.OC math.PR stat.AP stat.ML Updated version

Data-Driven Dynamic Assortment in Online Platforms: Learning about Two Sides

在线平台中的数据驱动动态分类:学习双边信息

Rahul Roy, Nur Sunar, Jayashankar M. Swaminathan

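AI summary: For two-sided service platforms, proposes a data-driven algorithm that dynamically optimizes the assortment when customer and seller choice parameters are unknown, and proves that its regret grows polylogarithmically over time and is rate-optimal.

AI Chinese abstract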

我们研究了一个在离散时间环境下,具有不完全信息和异质顾客的双边服务平台上的动态分类问题。在每个周期,一位顾客到达寻求服务,平台选择一组卖家进行展示。顾客根据多项逻辑选择模型,最多向分类中的一个卖家提出交易。经过固定数量的周期后,卖家审查收到的提议,并根据另一个多项逻辑选择模型,每位卖家最多选择一个顾客,然后循环重复。一个关键挑战是平台事先不知道顾客或卖家的选择模型参数。据我们所知,这是首次研究双边选择参数均未知的动态分类问题。我们开发了一种数据驱动算法,该算法在优化平台目标的同时学习这些参数。我们使用遗憾值来评估性能,该遗憾值衡量相对于一个预知所有参数和顾客到达时间的先知基准的收入损失。我们证明该算法的最坏情况遗憾值随时间呈多对数增长,并推导出匹配的下界,从而确定其速率最优性。

English abstract

We study a dynamic assortment problem on a two-sided service platform with incomplete information and heterogeneous customers in a discrete-time setting. In each period, a customer arrives seeking service, and the platform chooses an assortment of sellers to display. The customer then proposes a transaction to at most one seller in the assortment according to a multinomial logit choice model. After a fixed number of periods, sellers review the proposals they have received and each chooses at most one customer according to another multinomial logit choice model, after which the cycle repeats. A key challenge is that the platform does not know the choice-model parameters of either customers or sellers in advance. To our knowledge, this is the first study of a dynamic assortment problem in which both sides' choice parameters are unknown. We develop a data-driven algorithm that learns these parameters while optimizing the platform's objective over time. We evaluate performance using regret, which measures revenue loss relative to a clairvoyant benchmark that knows all parameters and customer arrivals in advance. We show that the algorithm's worst-case regret grows polylogarithmically over time, and we derive a matching lower bound, establishing its rate optimality.
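
The customer-side multinomial logit step can be sketched as below. The utilities, the seller names, and the convention that the outside option (no proposal) has utility zero are assumptions made for illustration; the abstract states only that each customer proposes to at most one seller under an MNL model.

```python
import numpy as np

def mnl_choice_probabilities(utilities, assortment):
    """Return MNL choice probabilities over the displayed sellers, with the
    key None standing for the outside option (assumed utility 0)."""
    if not assortment:
        return {None: 1.0}
    v = np.asarray([utilities[s] for s in assortment], dtype=float)
    shift = max(v.max(), 0.0)                  # stabilise the exponentials
    expv = np.exp(v - shift)
    outside = np.exp(-shift)
    denom = outside + expv.sum()
    probs = {s: e / denom for s, e in zip(assortment, expv)}
    probs[None] = outside / denom
    return probs

# Hypothetical seller utilities; in the paper's setting these are unknown
# to the platform and must be learned.
example = mnl_choice_probabilities({"a": 1.0, "b": 0.2, "c": -0.5}, ["a", "b"])
```

Adding a seller to the assortment lowers every other displayed seller's probability, which is why the displayed set is a decision variable rather than simply showing everyone.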

2606.10212 2026-06-11 math.ST stat.ML Updated version

Intrinsic Riemannian Cross-covariance for Manifold-valued Random Objects

流形值随机对象的内蕴黎曼互协方差

Carlos Soto, Cheng Wang, Yujing Huang, Xiaoyu Chen

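AI summary: Proposes a Riemannian cross-covariance that maps local variations to a common tangent space via parallel transport, enabling estimation of second-order statistics for random objects on manifolds; proves its asymptotic properties and validates it on spheres, SPD manifolds, and heart valve shape data.

Comments: 31 pages, 16 figures
AI Chinese abstract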

协方差估计是表示学习、降维和依赖建模中基本的二阶统计量。虽然协方差在欧几里得空间中已被充分理解,但对于位于非线性黎曼流形上的随机对象(在现代机器学习应用中日益常见,涉及形状、对称正定(SPD)矩阵等),协方差定义不明确。本文引入了一种针对流形值随机对象的内蕴黎曼互协方差。我们的方法通过平行传输将局部变化映射到公共切空间来定义协方差和相关,从而得到一个独立于任意坐标选择的二阶描述符。我们证明了所提出的协方差继承了欧几里得对应物的理想性质,并刻画了其渐近行为。在球面和SPD流形上的数值研究,以及在Kendall形状空间中心脏瓣膜形状的真实数据实验,证明了我们估计量的有效性并验证了所述性质。我们的结果将黎曼协方差定位为非欧几里得表示空间中二阶学习和分析的基本工具。

English abstract

Covariance estimation yields a fundamental second-order statistic underlying representation learning, dimension reduction, and dependence modeling. While covariance has been well understood in Euclidean spaces, it is ill-defined for random objects residing on nonlinear Riemannian manifolds, which increasingly arise in modern machine learning applications involving shapes, symmetric positive definite (SPD) matrices, etc. This paper introduces an intrinsic Riemannian cross-covariance for manifold-valued random objects. Our approach defines covariance and correlation by transporting local variations to a common tangent space via parallel transport, yielding a second-order descriptor that is independent of arbitrary coordinate choices. We establish that the proposed covariance inherits desirable properties of its Euclidean counterparts and characterize its asymptotic behavior. Numerical studies on spheres and SPD manifolds, together with real-data experiments on heart valve shapes in Kendall's shape space, demonstrate the effectiveness of our estimators and verify the stated properties. Our results position the Riemannian covariance as a fundamental tool for second-order learning and analysis in non-Euclidean representation spaces.

2606.08493 2026-06-11 q-bio.GN cs.LG stat.ML Updated version

Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement

在组织图上通过监督解缠查询反事实

Abdul Moeed, Stefan Schrod, Martin Rohbeck, Marc Jan Bonder, Pavlo Lutsik, Oliver Stegle, Daniel Dimitrov

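AI summary: Formalizes tissue graph counterfactuals as spatial interventions and proposes Cellina, a framework that uses supervised disentanglement to separate a cell's intrinsic state from its spatial context for counterfactual prediction; it outperforms existing methods on colorectal cancer and mouse brain data.

AI Chinese abstract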

组织图反事实询问在改变的空间邻居上下文中细胞的表达将如何变化。这类查询对于预测组织中细胞行为至关重要,但缺乏统一定义,现有方法针对特定干预类型或将细胞视为独立同分布。在这项工作中,我们首先将组织图反事实形式化为一类空间干预,这些干预要么重新连接细胞之间的边(边扰动),要么修改其邻居的表达(节点扰动)。然后,我们介绍Cellina(https://cellina.readthedocs.io),一个使用监督解缠将细胞内在状态从其空间上下文中分解出来的框架,将后者作为反事实预测的条件输入。在跨越结直肠癌和小鼠大脑中超过250万个空间分辨细胞的基准测试中,Cellina在组织扰动、解缠和可扩展性方面优于空间感知和非空间的竞争对手。此外,我们展示了Cellina以无监督方式揭示生物学上不同的癌症子域,并实现靶向邻居扰动模拟。

English abstract

Tissue graph counterfactuals ask how a cell's expression would change under altered spatial neighbor contexts. Such queries are central to predicting cell behavior in tissues, but lack a unified definition, with existing methods targeting specific intervention types or treating cells as i.i.d. In this work, we first formalize tissue graph counterfactuals as a class of spatial interventions that either rewire connections between cells (edge perturbation) or modify the expression of their neighbors (node perturbation). We then introduce Cellina (https://cellina.readthedocs.io), a framework that uses supervised disentanglement to decompose a cell's intrinsic state from its spatial context, using the latter as a conditioning input for counterfactual predictions. Across benchmarks spanning over 2.5 million spatially-resolved cells in colorectal cancer and mouse brain, Cellina outperforms spatially-informed and non-spatial competitors in in-silico graph perturbations, disentanglement, and scalability. Additionally, we show that Cellina reveals biologically distinct cancer subdomains in an unsupervised manner and enables targeted neighbor perturbation simulations.

2606.05551 2026-06-11 stat.ML cs.AI cs.LG Updated version

Conformal Risk-Averse Decision Making with Action Conditional Guarantee

具有行动条件保证的共形风险规避决策

Zihan Zhu, Shayan Kiyani, George Pappas, Hamed Hassani

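AI summary: Proposes action-conditional conformal prediction, using a pinball-loss minimization algorithm to optimize action-conditional value-at-risk and to provide finite-sample, action-conditional safety guarantees.

AI Chinese abstract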

由机器学习模型驱动的可靠决策管道需要具有明确安全保证的不确定性量化(UQ)方法。共形预测通过将ML预测包装成预测集来提供这种UQ,而Kiyani等人(2025b)的最新工作表明,这些集合可以转化为最优的风险规避决策策略——但仅继承边际安全保证。我们通过以下方式推广并加强了他们的结果:(i)引入行动条件共形预测,该预测产生明确条件于决策者所采取的每个行动的安全保证;(ii)表明行动条件预测集可作为风险规避决策者旨在优化行动条件风险价值的可行决策空间的代理;(iii)提出一种基于分位数损失最小化的原则性有限样本算法,将Gibbs等人(2025)的框架与行动条件保证联系起来。在两个真实世界数据集上的实验证实,我们的方法在行动条件性能上显著优于共形基线。

English abstract

Reliable decision-making pipelines powered by machine learning models require uncertainty quantification (UQ) methods that come with explicit safety guarantees. Conformal prediction provides such UQ by wrapping ML predictions into prediction sets, and recent work by Kiyani et al. (2025b) established that these sets can be translated into optimal risk-averse decision policies, although the resulting policies inherit only marginal safety guarantees. We generalize and strengthen their results by (i) introducing action-conditional conformal prediction, which yields safety guarantees conditioned explicitly on each action taken by the decision maker, (ii) showing that action-conditional prediction sets serve as a proxy for the feasible decision space for risk-averse decision makers aiming to optimize action-conditional value-at-risk, and (iii) proposing a principled finite-sample algorithm based on pinball-loss minimization, connecting the framework of Gibbs et al. (2025) to action-conditional guarantees. Experiments on two real-world datasets confirm that our approach significantly improves action-conditional performance over conformal baselines.
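
The link between pinball-loss minimization and quantiles, which item (iii) relies on, can be seen in the simplest case of fitting a single constant threshold. The sketch below is only that special case, with synthetic scores and illustrative names; it is not the paper's action-conditional algorithm.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Mean pinball (quantile) loss of the constant prediction q at level tau."""
    diff = np.asarray(y, dtype=float) - q
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

rng = np.random.default_rng(0)
scores = rng.exponential(size=2000)            # synthetic conformity scores
tau = 0.9

# Search over the observed scores for the constant that minimizes the loss.
candidates = np.sort(scores)
losses = [pinball_loss(scores, q, tau) for q in candidates]
best = candidates[int(np.argmin(losses))]
empirical_quantile = np.quantile(scores, tau)
```

The minimizer `best` lands at the empirical 0.9-quantile of the scores, up to the spacing between neighbouring samples. Replacing the constant with a function of features or actions is what turns this into a conditional calibration problem.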

2605.27478 2026-06-11 stat.ML cs.LG math.PR Updated version

Triangular-Reference Schrödinger Bridges for Time Series Generation

三角参考薛定谔桥用于时间序列生成

Gabriele Bocchi

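AI summary: Proposes the Triangular-Reference Schrödinger Bridge framework, which uses an intervalwise frozen, degenerate diffusion reference and a hierarchical latent volatility structure to achieve conservative time series generation while preserving the variational core of entropy minimization.

AI Chinese abstract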

我们引入了用于时间序列的三角参考薛定谔桥(TR-SBTS),这是SBTS框架的一种保守扩展,其中布朗参考被替换为区间冻结的、可能退化的扩散参考,在潜在波动率水平的层次上呈三角形。该构造是在增广状态空间上的单一熵投影,变分约束在时间和潜在水平上联合施加,并通过相对熵的分解层次展开。SBTS的变分核心得以保留:熵最小化器是参考的h-变换,在每个冻结区间上,最优动力学在活跃协方差方向的仿射叶上具有对数梯度漂移公式,即使冻结协方差是秩亏的也成立。我们建立了冻结近似的稳定性以及相应正则化核估计量的收敛性。该构造通过一个有限维条件映射实现,该映射由三种互补的过去约简组成——块PCR摘要、由运行时冻结协方差累积量诱导的过去增量的参考感知马氏核,以及在同一参考度量下的过去窗口WLS漂移回归器——以及一个耦合的状态-协方差桥步骤,其中每个潜在水平为上一水平产生动态参考,并由协方差描述符总结;该构造在数值实验上进行了评估。

English abstract

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.