arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.02385 2026-06-02 q-bio.NC cs.LG

How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations

William Dorrell

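AI summary This paper extends local optimality analysis to the nonnegative joint-optimisation problem, derives constraints relating optimal sparse autoencoder (SAE) features to the data distribution, explains behaviours such as hierarchical splitting and absorption, residual structure, and dense antipodal features, and constructs a novel large-dictionary convex problem to explore the wide atom-per-datapoint limit.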

Comments 27 pages, 5 figures

Abstract

Sparse Autoencoders (SAEs) have found success parsing neural representations into interpretable concepts, providing a basis for understanding and control. However, what exactly SAEs extract, and, correspondingly, the scientific conclusions we can draw from them, are not obvious. Empirically, the proof is in the pudding: SAEs learn interpretable features. Theoretically, we lack a clear account of what properties a 'concept' must satisfy for an SAE to extract it. There has been extensive identifiability work studying the conditions under which sparse coding recovers ground-truth features; however, these approaches tend to focus on simple data-generating models (e.g. sparse independent features) which poorly approximate the internet-swallowing language-model representations on which SAEs are trained. Here, avoiding data-generating models, we ask simply what properties any dictionary learning optimum must satisfy. Concretely, we extend local optimality analyses (Gribonval & Schnass, 2010) to the nonnegative joint-optimisation problem that vanilla SAEs approximate, and derive constraints relating optimal SAE features to their distributions. We use these constraints to explain a range of observed SAE behaviours - hierarchical splitting & absorption, the structure of residuals, and dense antipodal features - each reflecting how L1+nonnegativity interact with data to structure optimal dictionaries. Finally, we construct a novel large-dictionary convex problem and explore the wide atom-per-datapoint limit. In sum, we hope to tease model assumptions from unexpected observations, letting us learn more from SAEs' successes and provide principles for designing their successors.
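
Illustrative sketch (not taken from the paper): the abstract refers to an L1-penalised, nonnegative sparse-coding problem. Assuming the common form 0.5*||x - D a||^2 + lam*||a||_1 with a >= 0, the code for one data point under a fixed dictionary D can be found by projected proximal gradient steps. The function names, the penalty weight `lam`, and the solver choice are assumptions made here for illustration; the paper's joint optimisation over D and a is not reproduced.

```python
import numpy as np

def nonneg_sparse_code(x, D, lam=0.1, n_iter=500):
    """Minimise 0.5*||x - D a||^2 + lam*sum(a) subject to a >= 0, with D fixed."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        # Gradient step followed by the proximal map of the L1 penalty on a >= 0.
        a = np.maximum(a - step * (grad + lam), 0.0)
    return a

def objective(x, D, a, lam=0.1):
    """Reconstruction error plus L1 penalty (assumed objective, see lead-in)."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))
```

With an identity dictionary the solution is simply max(x - lam, 0), which shows how L1 plus nonnegativity zeroes out small or negative coordinates.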

2606.02247 2026-06-02 stat.ML cs.LG

ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

David Rundel, Fabian Fumagalli, Maximilian Muschalik, Bernd Bischl, Matthias Feurer

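AI summary Proposes ShaplEIG, which uses a Gaussian process surrogate and expected information gain to select coalitions adaptively for efficient Shapley value estimation, significantly improving sample efficiency in the low-budget regime.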

Comments Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This is particularly relevant in settings where the value function is costly and the number of evaluations is severely limited, such as retraining-based feature importance, data valuation, and hyperparameter importance. For this purpose, we propose ShaplEIG, a Bayesian experimental design approach that approximates the expensive value function using a Gaussian process surrogate and adaptively selects coalitions based on their expected information gain about the Shapley values. By the linearity of the Shapley values in the value function, we show that the expected information gain is available in closed form. Furthermore, we propose an efficient computation scheme that reduces the complexity from exponential to polynomial in the number of players via elementary symmetric polynomials. In extensive experiments across diverse costly applications, our method consistently improves sample efficiency in the low-budget regime over state-of-the-art baselines.
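
Illustrative sketch (not the ShaplEIG method): the exponential cost mentioned in the abstract comes from enumerating every coalition. The code below computes exact Shapley values by brute force for a hypothetical three-player toy game; the game and the function names are assumptions chosen only to make the scaling concrete.

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, value):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Hypothetical toy game: players 0 and 1 are worth 1 only together; player 2 adds 1 alone.
def toy_value(s):
    return (1.0 if {0, 1} <= s else 0.0) + (1.0 if 2 in s else 0.0)

phi = exact_shapley([0, 1, 2], toy_value)
```

Each player requires 2^(n-1) marginal contributions, which is why sampling-based or adaptive schemes are needed once value-function calls are expensive.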

2606.02117 2026-06-02 stat.ML cs.LG stat.ME

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

Tingting Wang, Yunyi Zhang, Benyou Wang

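AI summary Proposes ProbRes, a post-hoc probabilistic calibration method that improves probabilistic forecasts by explicitly learning volatility dynamics, handles heteroskedastic data effectively, and is validated both theoretically and experimentally.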

Abstract

Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During training, ProbRes employs two architecture-agnostic modules to separately model the conditional mean and conditional volatility. At the inference stage, it generates predictive distributions by resampling normalized residuals. ProbRes is applicable to both univariate and multivariate time series and remains robust under a wide range of error distributions, including non-Gaussian innovations with conditional heteroskedasticity. Theoretical results demonstrate ProbRes's validity and experiments on both synthetic and real-world datasets show that ProbRes accurately captures predictive distributions and produces well-calibrated prediction intervals.
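
Illustrative sketch (not the authors' implementation): the inference step described in the abstract, resampling normalized residuals, can be pictured as follows. The conditional mean and volatility are taken as given numbers here; how ProbRes fits them is not reproduced, and all function names are hypothetical.

```python
import random

def standardized_residuals(y, mean, vol):
    """Residuals rescaled by the fitted conditional volatility."""
    return [(yi - mi) / vi for yi, mi, vi in zip(y, mean, vol)]

def resample_forecast(next_mean, next_vol, residuals, n_samples=1000, seed=0):
    """Draw predictive samples as mean + volatility * (resampled residual)."""
    rng = random.Random(seed)
    return [next_mean + next_vol * rng.choice(residuals) for _ in range(n_samples)]

def interval(samples, level=0.9):
    """Empirical central prediction interval from the samples."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi
```

Because the residuals are resampled rather than assumed Gaussian, the predictive samples inherit whatever shape the standardized errors have, while the volatility term rescales them for the forecast step.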

2606.02080 2026-06-02 cs.MA cs.AI cs.CV

Agentic-J: An AI Agent for Biological Microscopy Image Analysis

Lukas Johanns, Marilin Moor, Davide Panzeri, Yu Zhou, Xinyi Chen, Nora F. K. Pauly, Zixuan Pan, Matthias Gunzer, Andreas Müller, Yiyu Shi, Hedi Peterson, Jianxu Chen

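AI summary Proposes Agentic-J, a container-based multi-agent AI assistant that integrates ImageJ/Fiji tools behind a natural-language interface, enabling traceable, reproducible bioimage analysis workflows from cell segmentation to multi-condition quantification.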

Comments Presented at Cell Biology at Scale 2026 (Poster). The Agentic-J project is available at https://mmv-lab.github.io/Agentic-J/

Abstract

Biological image analysis increasingly demands integration across heterogeneous tools, programming environments, and domain knowledge that few researchers can command simultaneously. We present Agentic-J, a containerised, multi-agent AI assistant, primarily for ImageJ/Fiji, that enables biologists to specify analysis tasks in natural language, from nuclei segmentation and cell tracking to multi-condition quantification. The agent generates executable scripts organised into a documented project structure, so every analysis decision is traceable and the workflow can be reproduced or shared. The specialised sub-agents handle plugin management, code generation, debugging, quality assurance, and statistical reporting. In this paper we introduce the system's design, demonstrate real biological microscopy image analysis workflows, and detail the technical implementation.

2606.02055 2026-06-02 cs.IT cs.LG cs.SI math.IT stat.ML

Query-Limited Community Recovery in Stochastic Block Models

Sabyasachi Basu, Manuj Mukherjee, Lutz Oettershagen, Suhas Thejaswi

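AI summary Studies exact community recovery in the two-community stochastic block model under limited, noisy access to network data via adaptive query strategies, and proves that adaptive querying can beat the information-theoretic limit of the non-adaptive benchmark.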

Abstract

We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.
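
Illustrative sketch (not from the paper): the access model in the abstract can be simulated in a few lines. The graph generator, the parameter names, and the fixed split into two equal communities are assumptions for illustration; no recovery algorithm is implemented.

```python
import random

def sample_sbm(n, p_in, p_out, rng):
    """Two equal communities; edge probability p_in within, p_out across."""
    labels = [0 if v < n // 2 else 1 for v in range(n)]
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return labels, adj

def noisy_neighborhood_query(adj, v, reveal_prob, rng):
    """Reveal each true neighbour of v independently; never a non-neighbour."""
    return {u for u in adj[v] if rng.random() < reveal_prob}
```

Repeated queries to the same vertex return independent subsets of its neighbourhood, so an adaptive learner can spend extra queries only on the vertices whose community is still uncertain.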

2606.02038 2026-06-02 physics.app-ph cs.LG

Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints

Reda Snaiki, Abdelatif Merabtine

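AI summary Proposes an uncertainty-aware graph neural network framework that reconstructs daily maximum temperature fields from sparse sensors, supports distance-constrained sensor placement and probabilistic exceedance mapping, and outperforms traditional methods in a Montreal-area evaluation.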

Abstract

Reconstructing spatially continuous daily temperature fields from sparse observations is important for urban climate monitoring and heat-risk analysis, but practical deployments are limited by sensor budgets and spacing constraints. This study proposes an uncertainty-aware graph neural network (GNN) framework for reconstructing daily maximum temperature fields from sparse sensors while supporting distance-constrained sensor placement and probabilistic exceedance mapping. The model predicts both the temperature field and a spatially varying predictive uncertainty field using a graph-attention-based mean-residual architecture trained with a Gaussian negative log-likelihood. Sensor placement is addressed using a Proper Orthogonal Decomposition with QR factorization (POD-QR) strategy with a 4 km minimum inter-sensor distance constraint and is compared with random feasible placement and farthest-point sampling. The framework is evaluated over a Montreal-area polygon using Daymet v4.1 daily temperature data (1 km resolution) under a strict temporal hold-out protocol (training: 2020-2023; testing: 2024). Across sensor budgets (10-40 sensors), the proposed GNN consistently outperforms inverse distance weighting and ordinary kriging in RMSE and MAE on unobserved nodes. Sensor-placement effects are most pronounced at low budgets and diminish at higher budgets, with a practical saturation regime emerging around 30 sensors under the imposed spacing constraint. Probabilistic evaluation further shows improved uncertainty calibration with increasing sensor density and a better sharpness-calibration trade-off than kriging. These results support the proposed framework as an effective tool for uncertainty-aware temperature field reconstruction and decision-oriented heat-risk mapping.
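
Illustrative sketch (not the authors' code): the abstract mentions training with a Gaussian negative log-likelihood and producing probabilistic exceedance maps. Assuming a per-node Gaussian predictive distribution with mean `mu` and standard deviation `sigma`, both quantities have simple closed forms. The function names and the Gaussian exceedance formula are assumptions made for illustration.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def exceedance_probability(mu, sigma, threshold):
    """P(T > threshold) when T ~ N(mu, sigma^2)."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))
```

The loss penalises an overconfident small `sigma` when the error is large, which is what lets the predicted uncertainty field be calibrated and then turned into per-node exceedance probabilities.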

2606.02008 2026-06-02 stat.ML cs.LG

Provable Data Scaling Law for Meta Learning via Complexity Minimization

Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

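AI summary Proposes a complexity minimization framework that minimizes the worst-case downstream model complexity across source domains, proving theoretically that more pre-training data in meta-learning improves few-shot adaptation.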

Abstract

Pre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior, which learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst-case such complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures this scaling behavior; in particular, we show that the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.

2606.01502 2026-06-02 cs.DC cs.AI cs.NI

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein

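AI summary Using experiments on a real multi-node H100 cluster, this study characterizes the performance of Multi-head Latent Attention (MLA) in cross-instance settings, proposes a topology-aware cost model and a route/fetch/local predicate, and shows that routing the query is more efficient than moving the cache in most cases.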

Abstract

Frontier LLMs increasingly decide what a query attends to with a sparse-attention indexer that picks a few KV-cache blocks per query: attention's unit is now a small, reusable chunk. Agentic workloads hammer it: many sub-agents query one large codebase, reusing the same blocks. When that corpus outgrows one GPU it is partitioned across instances, so a query and the blocks it selects often sit on different GPUs: answering it means attention across instances. The reflex of prior cross-instance KV systems is to move the cache: pull the selected blocks to the requester. Multi-head Latent Attention inverts the arithmetic, compressing each token's key and value into one narrow vector, so a routed query row is only ~1 KB, smaller than the chunk it attends; routing the query is then often cheaper than moving the cache. Which primitive wins, over which fabric and request shape, is uncharted, least of all on device-initiated RDMA that makes per-request cross-node transfers cheap. We characterize cross-instance MLA attention on a real multi-node H100 cluster, distilling two reusable artifacts: a topology-aware cost model (probe / transfer / compute / return / merge) and a closed-form route/fetch/local predicate, whose constants we measure on real IBGDA, where the model tracks batched round-trips to within ~7%. At decode it routes the query, trading the cost of moving the cache (a ~3 ms re-adaptation splice for a contiguous chunk, or a scattered gather under selection) for a tens-of-microsecond round trip, and picks the fabric by probe latency, not peak bandwidth. We instantiate the cost model and predicate for MLA, but neither is MLA-specific: they apply wherever compression or sparse selection shrinks attention to small chunks (DeepSeek-V3.2, V4, and GLM-5.1 today). Extending them to a new architecture requires measuring just two coefficients: the routed payload and fetch's move-the-cache cost.
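
Illustrative sketch (a toy, not the paper's closed-form predicate): the abstract names five cost stages (probe, transfer, compute, return, merge) and a three-way route/fetch/local decision. A minimal way to picture this is to sum the stage costs for routing and compare against the cost of fetching. The function names, the additive form, and any numbers plugged in are assumptions; the measured constants from the paper are not reproduced.

```python
def route_cost_us(probe, transfer, compute, ret, merge):
    """Sum the five stages named in the abstract (all in microseconds)."""
    return probe + transfer + compute + ret + merge

def choose_primitive(blocks_are_local, route_us, fetch_us):
    """Toy decision rule: stay local if possible, else take the cheaper option."""
    if blocks_are_local:
        return "local"
    return "route" if route_us < fetch_us else "fetch"
```

With a routed round trip in the tens of microseconds and a cache move in the millisecond range, as the abstract reports at decode, this toy rule picks "route".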

2606.00703 2026-06-02 cs.IT cs.AI cs.LG math.IT

Information-Theoretic Lower Bounds for Bit-Constrained Stochastic Optimization via a Reduction to Compressed Gaussian Mean Estimation

Munsik Kim

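AI summary This paper gives an exact reduction from optimizing a strongly convex quadratic family to interactively compressed Gaussian mean estimation, derives unconditional lower bounds for bit-constrained stochastic optimization, and provides a near-matching achievability result.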

Abstract

Low-precision pretraining (FP8, MXFP4, NVFP4) is now standard for frontier language models, yet the literature is almost entirely achievability -- algorithms and empirical scaling laws -- with no matching characterization of what is information-theoretically possible. We study a $B$-bit quantized stochastic first-order oracle: an optimizer interacts for $T$ rounds and receives, each round, a $B$-bit adaptive public-coin description of its stochastic gradient. Our main contribution is an exact reduction from optimizing a strongly convex quadratic family to interactively compressed Gaussian mean estimation -- under the $B$-bit oracle the query carries no information, so optimization collapses exactly onto a sequential distributed-estimation problem. This yields two unconditional lower bounds, a communication bound $TB = \Omega(d)$ and a statistical bound $T = \Omega(\sigma^2 d / \epsilon^2)$, and the sharp product-form bound $T = \Omega((\sigma^2 d / \epsilon^2) \max\{1, d/B\})$. The product form is also unconditional: a $B$-bit transcript carries at most $O(TB / \sigma^2)$ of Fisher trace about the mean, so bits rather than dimension limit the recoverable information, and combined with the multivariate van Trees inequality this gives the bound directly, without bounded-likelihood-ratio truncation. We give a near-matching achievability result with exact per-round bit accounting under a bounded-dynamic-range oracle, tight up to a logarithmic factor; the lower bound is for truly Gaussian (unbounded) gradients, and closing this oracle gap is left open. A sequential rate-distortion perspective extends the reduction to correlated and drifting oracles and corrects an earlier conjecture: positive noise correlation raises the bound by $(1+\rho)/(1-\rho)$ rather than relaxing it. The bounds give an information-theoretic baseline for any low-bit gradient path, not an optimality claim about deployed FP4 systems.
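
Illustrative sketch: the product-form bound quoted in the abstract can be evaluated numerically to see how the bit budget enters. The constant hidden by the Omega notation is not given in the abstract, so it is set to 1 here as an assumption; the function name is hypothetical and the output is a scaling guide, not a round count.

```python
def round_lower_bound(d, sigma, eps, bits_per_round):
    """(sigma^2 d / eps^2) * max(1, d/B), with the unspecified constant set to 1."""
    statistical = sigma ** 2 * d / eps ** 2
    return statistical * max(1.0, d / bits_per_round)
```

For fixed dimension, noise level, and accuracy, cutting the bits per round from d to d/10 multiplies the bound by 10, while raising the budget beyond d bits per round gives no further reduction.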

2606.00302 2026-06-02 stat.ML cs.LG

ERICA: Quantifying Replicability of Cluster Analysis

Siamak K. Sorooshyari, Manuel A. Rivas, Robert Tibshirani

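AI summary Proposes the ERICA framework, which computes a statistic from iterative clustering assignments to quantify whether the cluster structure in a dataset is replicable; applied to synthetic data and breast cancer gene expression data, it finds the synthetic clusters replicable and some of the real data non-replicable.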

Abstract

Despite being ubiquitous in science, clustering remains a technique whose results are not quantitatively scrutinized via a framework. We present an analysis called evaluating replicability via iterative clustering assignments (ERICA) that is applied to a dataset to determine whether clusters are identified in a replicable manner. The pipeline computes a statistic that describes whether structure is found in a dataset. Quantitative visualization methods are presented to answer important questions such as the similarity between clusters, and the identity of points that may be outliers. When tested on synthetic data, the findings show clusters being discovered in a replicable manner. However, we note a possibility for non-replicable results when the pipeline is applied to three gene expression datasets for breast cancer subtype validation. The study underscores the need for rigorous inspection and offers a practical tool for doing so.

2606.00265 2026-06-02 stat.ML cs.LG

Out-of-Distribution generalization of quantile regression with heavy tailed inputs: an SVM approach

Baptiste Leroux, Clément Dombry, Anne Sabourin

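AI summary For quantile regression extrapolation when covariates take unusually large values, proposes a Support Vector Machine (SVM) framework that uses reproducing kernel Hilbert spaces to handle high-dimensional, nonlinear settings and establishes finite-sample learning guarantees.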

Comments 48 pages, 5 figures

Abstract

We study quantile regression in an extrapolation regime where the covariate takes unusually large values. Under regular variation assumptions, extreme observations can be effectively characterized through their angular components, enabling learning strategies that focus on the angle of the most extreme observations. This approach is formalized through the minimization of an asymptotic conditional risk that localizes learning in the tail of the covariate distribution. We propose a novel Support Vector Machine (SVM) framework for extreme quantile regression, leveraging reproducing kernel Hilbert spaces to handle high-dimensional and nonlinear settings. Our method also accommodates unbounded response variables and avoids restrictive transformations. We establish finite-sample learning guarantees under mild regularity assumptions. The proposed framework unifies ideas from statistical learning and multivariate extremes, providing a tractable and theoretically grounded approach to extrapolation. We complement our theoretical findings with an empirical study on river flow data from the Danube, demonstrating the practical relevance of our methods.
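
Illustrative sketch (not the authors' estimator): two ingredients mentioned in the abstract can be written down directly. The pinball loss is the standard loss for quantile regression, and the angular component of an observation is x/||x||, kept only for the observations with the largest norm. Using the Euclidean norm, a top-k cut-off, and these function names are assumptions made for illustration; the SVM and kernel machinery is not reproduced.

```python
import math

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss at level tau."""
    diff = y - q_pred
    return tau * diff if diff >= 0 else (tau - 1) * diff

def extreme_angles(xs, k):
    """Angles x/||x|| of the k observations with the largest Euclidean norm."""
    by_norm = sorted(xs, key=lambda x: math.hypot(*x), reverse=True)[:k]
    return [tuple(c / math.hypot(*x) for c in x) for x in by_norm]
```

Restricting training to the angles of the most extreme covariates is what localizes learning in the tail, while the asymmetric loss targets the chosen quantile level.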

2606.00154 2026-06-02 cs.SE cs.AI

Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages

Fan Wu, Lishuai Dong, Cuiyun Gao, Yujia Chen, Yiming Huang, Yang Xiao, Qing Liao

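AI summary To address existing benchmarks' neglect of complex interactive behaviour, proposes the WebIGBench benchmark, comprising 103 real complex webpages and 871 interactive actions, and designs a new evaluation pipeline to test multimodal large language models on interactive webpage code generation.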

Abstract

Recent advancements in multimodal large language models (MLLMs) have achieved remarkable progress in multimodal reasoning and code generation, catalyzing a new paradigm for front-end development. In particular, these models can directly transform visual designs into executable code, significantly improving the efficiency and adaptability of web development. Modern web applications are dynamic and interactive, featuring frequent user-page interactions. However, existing benchmarks largely evaluate the code generation of static webpages, ignoring the complex interactive behaviors in real-world applications. Besides, their evaluation criteria remain confined to visual fidelity and code structure, overlooking the interaction consistency between the generated and the reference webpages. To address these limitations, we introduce WebIGBench, the first benchmark designed to evaluate code generation for interactive webpages with complex interactions. By combining manually designed interaction paths with UI automation, we collected 103 complex webpages from real-world websites. This benchmark covers 5 popular interactive action types (e.g., click, input) involving 871 distinct interactive actions. Moreover, we propose a novel evaluation pipeline to address the gap in automated assessment of interactive actions. Extensive experiments on several representative MLLMs reveal the performance boundaries of current models in interactive webpage code generation using WebIGBench. The proposed benchmark is available at https://github.com/anoa12159-hue/WebIGBench_eval.

2201.10838 2026-06-02 cs.CR cs.LG

Privacy-Preserving Logistic Regression Training with A Faster Gradient Variant

John Chiang

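AI summary Proposes a gradient variant called the quadratic gradient to accelerate privacy-preserving logistic regression training, reaching comparable performance within only four iterations under homomorphic encryption.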

Abstract

Training logistic regression over encrypted data has emerged as a prominent approach to addressing security concerns in recent years. In this paper, we introduce an efficient gradient variant, termed the quadratic gradient, which is specifically designed for privacy-preserving logistic regression while remaining equally effective in plaintext optimization. By incorporating this quadratic gradient, we enhance Nesterov's Accelerated Gradient (NAG), Adaptive Gradient (AdaGrad), and Adam algorithms. We evaluate these enhanced algorithms across various datasets, with experimental results demonstrating state-of-the-art convergence rates that significantly outperform traditional first-order gradient methods. Furthermore, we apply the enhanced NAG method to implement homomorphic logistic regression training, achieving comparable performance within only four iterations. The proposed quadratic-gradient approach offers a unified framework that synergizes the advantages of first-order gradient methods and second-order Newton-type methods, suggesting broad applicability to diverse numerical optimization tasks.

2602.13906 2026-06-02 stat.ML cs.LG

How Accurately Can a Gaussian Approximate Stochastic Approximation Iterates?

Shaan Ul Haque, Zedong Wang, Zixuan Zhang, Siva Theja Maguluri

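AI summary This paper approximates the finite-time distribution of stochastic approximation iterates by a sequence of Gaussians with recursively defined covariance and gives explicit bounds on the Wasserstein-1 distance, yielding tail bounds on the error and a convergence rate for asymptotic normality.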

Comments 63 pages, 6 figures

Abstract

Stochastic approximation (SA) is a method for finding the root of an operator perturbed by noise. The focus of this paper is studying the distribution of SA iterates in finite time. In general, it is not possible to characterize the exact distribution, and therefore our goal is to find an approximation which can yield useful tail bounds. Inspired by the rich literature on the asymptotic normality of rescaled SA iterates, we approximate the pre-limit distributions by a sequence of Gaussians whose covariance is recursively defined. In particular, we establish explicit bounds on the Wasserstein-1 distance between the rescaled iterate at time $k$ and the aforementioned Gaussian for various choices of step-sizes. Since these covariances converge to the classical asymptotic limit, our analysis also provides a convergence rate for asymptotic normality as a by-product. As an immediate consequence of our bounds, we obtain tail bounds on the error of SA iterates at any time. Finally, we establish the sharpness of our rates by providing matching lower bounds and validate our findings through simulations. We obtain the sharp rates by first studying the convergence rate of the discrete Ornstein-Uhlenbeck (O-U) process driven by general noise, whose stationary distribution is identical to the limiting Gaussian distribution of the rescaled SA iterates. We believe that this is of independent interest, given its connection to sampling literature. The analysis involves adapting Stein's method for Gaussian approximation to handle the matrix weighted sum of i.i.d. random variables. The desired finite-time bounds for SA are obtained by characterizing the error dynamics between the rescaled SA iterate and the discrete time O-U process and combining it with the convergence rate of the latter process.

2404.03685 2026-06-02 physics.soc-ph cs.AI

Cooperative Evolutionary Pressure and Diminishing Returns Might Explain the Fermi Paradox: On What Super-AIs Are Like

Daniel Vallstrom

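AI summary From a broad evolutionary perspective, discusses how cooperative pressure and diminishing returns on resources could leave super-AIs without an incentive to colonize, thereby explaining the Fermi paradox.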

Comments copy editing and minor fixes; moved all supplementary programs to github; added references

Abstract

With an evolutionary approach, the basis of morality can be explained as adaptations to problems of cooperation. With 'evolution' taken in a broad sense, AIs that satisfy the conditions for evolution to apply will be subject to the same cooperative evolutionary pressure as biological entities. Here the adaptiveness of increased cooperation as material safety and wealth increase is discussed -- for humans, for other societies, and for AIs. Diminishing beneficial returns from increased access to material resources also suggest the possibility that, on the whole, there will be no incentive to, for instance, colonize entire galaxies, thus providing a possible explanation of the Fermi paradox, wondering where everybody is. It is further argued that old societies could engender and eventually give way to super-AIs, since it is likely that super-AIs are feasible, and fitter. Closing is an aside on effective ways for morals and goals to affect life and society, emphasizing environments, cultures, and laws, and exemplified by how to eat. 'Diminishing returns' is defined as less than roots, the inverse of infeasibility. It is also noted that there can be no exponential colonization or reproduction, for mathematical reasons, as each entity takes up a certain amount of space. Appended are an algorithm for colonizing, for example, a galaxy quickly, models of the evolution of cooperation and fairness under diminishing returns, and software for simulating signaling development.

2509.23544 2026-06-02 stat.ML cs.AI cs.LG stat.ME

End-to-End Deep Learning for Predicting Metric Space-Valued Outputs

Yidong Zhou, Su I Iao, Hans-Georg Müller

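AI summary Proposes the E2M framework, which makes geometry-aware predictions of metric space-valued outputs using weighted Fréchet means with weights learned by a neural network; it comes with theoretical guarantees and achieves state-of-the-art performance on several kinds of structured outputs.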

Comments 38 pages, 4 figures, 9 tables

Details
Journal ref
Journal of Machine Learning Research, 27:1--38, 2026
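AI-generated abstract (translated from Chinese)

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. Such outputs are naturally modeled as elements of a general metric space, where classical regression techniques that rely on vector-space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M predicts via a weighted Fréchet mean of the training outputs, with weights learned by a neural network conditioned on the input. This construction gives a principled, geometry-aware prediction mechanism that avoids surrogate embeddings and restrictive parametric assumptions while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem characterizing the model's expressive capacity and a convergence analysis of the entropy-regularized training objective. In extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, E2M consistently achieves state-of-the-art performance, with advantages that grow at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the framework's flexibility and practical utility.

English abstract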

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces, where classical regression techniques that rely on vector space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M performs prediction via weighted Fréchet means over training outputs, where the weights are learned by a neural network conditioned on the input. This construction provides a principled mechanism for geometry-aware prediction that avoids surrogate embeddings and restrictive parametric assumptions, while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem that characterizes the expressive capacity of the model and a convergence analysis of the entropy-regularized training objective. Through extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, we show that E2M consistently achieves state-of-the-art performance, with its advantages becoming more pronounced at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the flexibility and practical utility of this framework.
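
To make "weighted Fréchet means over training outputs" concrete, here is a minimal sketch for one special case: one-dimensional probability distributions under the 2-Wasserstein metric, where the weighted Fréchet mean has a closed form (the weighted average of quantile functions). The `scores` array is a hypothetical stand-in for the output of a trained network; nothing here reproduces the authors' implementation.

```python
import numpy as np
from statistics import NormalDist


def softmax(scores):
    """Turn arbitrary real scores into nonnegative weights that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    e = np.exp(scores - scores.max())
    return e / e.sum()


def weighted_frechet_mean_w2(train_quantiles, weights):
    """Weighted Fréchet mean of 1-D distributions under the 2-Wasserstein metric.

    train_quantiles: (n_train, n_grid) array, one quantile function per row,
    all evaluated on the same probability grid. In 1-D the minimiser is the
    weighted average of the quantile functions.
    """
    return np.asarray(weights) @ np.asarray(train_quantiles)


# Toy training outputs: N(mu_i, 1) distributions stored as quantile functions.
grid = np.linspace(0.01, 0.99, 99)
base = np.array([NormalDist().inv_cdf(u) for u in grid])
means = np.array([0.0, 1.0, 2.0])
train_quantiles = means[:, None] + base[None, :]

# Hypothetical network scores for one test input (assumption, not learned here).
scores = np.array([0.2, 1.5, -0.3])
weights = softmax(scores)
prediction = weighted_frechet_mean_w2(train_quantiles, weights)
```

In this toy case the prediction is again a unit-variance normal whose mean is the weighted average of the training means. For other metric spaces, such as networks or SPD matrices, the Fréchet mean generally has to be computed numerically.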

2311.00260 2026-06-02 cs.GT cs.LG

Incentivized Collaboration in Active Learning

Lee Cohen, Han Shao

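AI summary For the setting where multiple rational agents collaborate to learn labels from a common hypothesis, proposes an incentivized collaboration framework: it designs strictly individually rational protocols under which joining the collaboration does not increase an agent's expected label complexity, and gives collaboration protocols whose label complexity is comparable to the best known tractable approximation algorithm.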

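Details
AI-generated abstract (translated from Chinese)

In collaborative active learning, where multiple agents try to learn labels from a common hypothesis, we introduce a new framework for incentivized collaboration. Rational agents aim to obtain labels for their datasets while keeping label complexity to a minimum. We focus on designing (strictly) individually rational (IR) collaboration protocols, which ensure that agents cannot reduce their expected label complexity by acting alone. We first prove that, given any optimal active learning algorithm, the collaboration protocol that runs the algorithm as is on the entire data is already IR. However, computing the optimal algorithm is NP-hard. We therefore provide collaboration protocols that achieve (strict) IR and are comparable, in label complexity, to the best known tractable approximation algorithm.

English abstract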

In collaborative active learning, where multiple agents try to learn labels from a common hypothesis, we introduce an innovative framework for incentivized collaboration. Here, rational agents aim to obtain labels for their data sets while keeping label complexity at a minimum. We focus on designing (strict) individually rational (IR) collaboration protocols, ensuring that agents cannot reduce their expected label complexity by acting individually. We first show that given any optimal active learning algorithm, the collaboration protocol that runs the algorithm as is over the entire data is already IR. However, computing the optimal algorithm is NP-hard. We therefore provide collaboration protocols that achieve (strict) IR and are comparable with the best known tractable approximation algorithm in terms of label complexity.

2606.02579 2026-06-02 hep-ph hep-ex

New Windows on Heavy Dark Matter: Mineral Melt Modelling and X-Ray Readout for Muscovite Mica

Yilda Boukhtouchen, Joseph Bramante, Andrew Buchanan, Alexander Hayes, Matthew Leybourne, Jennika McIntosh, Anupam Ray, Aaron Shugar

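AI summary Proposes using muscovite mica as a paleodetector: a Sedov-Taylor thermal spike model simulates the melt tracks formed when heavy composite dark matter passes through mica, and a rapid X-ray fluorescence mapping readout based on copper-backing contrast enables large-area scans for micron-scale damage features, giving new detection sensitivity to opaque and diffuse composite dark matter.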

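Details
AI-generated abstract (translated from Chinese)

Muscovite mica is a translucent, layered silicate mineral whose basal cleavage, low radiogenic background, gigayear exposure times, and demonstrated track retention over geological timescales make it a strong target for rare particle searches. In this work, we develop a new framework for detecting heavy composite dark matter using muscovite mica as a paleodetector. We model the melt tracks formed by heavy composite dark matter passing through mica with a Sedov-Taylor thermal spike formalism, validate the sub-micron regime with SRIM/TRIM simulations of nuclear recoil cascades, and calibrate the phonon efficiency that governs local energy deposition. We demonstrate a novel readout method, rapid X-ray fluorescence mapping with a copper-backing contrast technique, that can identify micron-scale damage features in cleaved mica sheets over macroscopic scan areas, and we calibrate the minimum detectable track size using laser-ablated defect regions. We give projected sensitivities for opaque and diffuse composite dark matter, including a sub-melt hole-channel detection mode for large composites that are substantially attenuated by overburden. We also revisit earlier dark matter exclusions based on etched mica searches and point out shortcomings that undermine the robustness of those constraints.

English abstract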

Muscovite mica is a translucent, layered silicate mineral whose basal cleavage, low radiogenic background, gigayear exposures, and demonstrated track retention over geological timescales make it a compelling target for rare particle searches. In this work, we develop a new framework for detecting heavy composite dark matter using muscovite mica as a paleodetector. We model melt track formation by heavy composite dark matter transiting through mica using a Sedov-Taylor thermal spike formalism, and validate the sub-micron regime with SRIM/TRIM simulations of nuclear recoil cascades, which also calibrate the phonon efficiency governing local energy deposition. We demonstrate a novel readout method using rapid X-ray fluorescence mapping with a copper backing contrast technique, capable of identifying micron-scale damage features in cleaved mica sheets over macroscopic scan areas, and calibrate the minimum detectable track size using laser-ablated defect regions. We present projected sensitivities for opaque and diffuse composite dark matter, including a sub-melt hole-channel detection mode for large composites substantially attenuated by overburden. We also revisit prior dark matter exclusions from etched mica searches, identifying shortcomings that compromise the robustness of these constraints.

2606.02571 2026-06-02 physics.optics

Multilayer Babinet metamaterial to initiate nonreciprocal topological phenomena and generalized Faraday rotation

Balázs Bánhelyi, Miklós Waldhauser, Virág Szünstein, Ákos Sebők-Pap, Olivér Ardelán, Anna Kőházi-Kis, Dávid Vass, András Szenes, David Keene, Maxim Durach, Mária Csete

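AI summary By optimizing multilayers of Babinet complementary periodic structures built from arrays of spherical plasmonic nanoresonators, achieves generalized Faraday rotation; mechanisms including symmetry breaking, Brillouin zone folding, and interlayer coupling yield nonreciprocal rotation and asymmetric transmission in spectrally overlapping regions.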

Comments 48 pages, 11 + 15 figures

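Details
AI-generated abstract (translated from Chinese)

Multilayers of Babinet complementary periodic structures built from miniarrays of spherical plasmonic nanoresonators were optimized to ensure generalized Faraday rotation. Nonreciprocal rotation and asymmetric transmission were achieved in spectrally overlapping regions thanks to rich physics involving (i) symmetry breaking via coupled localized modes, (ii) Brillouin zone folding caused by constituent sub-lattices that form in-plane twisted coupled loops, and (iii) interlayer coupling between Babinet complementary patterns. The nanophotonic phenomena include (i) quasi-BIC resonances, (ii) hierarchically coupled localized and propagating modes that lead to time-periodic Floquet modulation, and (iii) the initialization of synthetic potentials that can be tuned independently through intra- and interlayer parameters. The unique bianisotropic composites produce a synthetic vector gauge and an emulated magnetic field, which appear as tilted, precessing magnetic dipoles with an accompanying time-periodic modulation that inherently ensures a synthetic dimension. The asymmetric transmission is enhanced in the classical sense along quantized flat bands, and in mixed and forward bases within finite wavelength and tilting intervals that overlap with nonreciprocal polarization rotation. Reshaping of the transmitted pulse demonstrates beating between nearby resonant modes, and the loss can be compensated with active additional layers, giving Faraday isolator capability. The multilayers synthesize topological phenomena in high-dimensional synthetic parameter spaces.

English abstract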

Multilayers of Babinet complementary periodic structures constructed with miniarrays of spherical plasmonic nanoresonators were optimized to ensure Generalized Faraday Rotation. Nonreciprocal rotation and asymmetric transmission were achieved in spectrally overlapping regions due to the rich physics involving (i) symmetry breaking via coupled localized modes, (ii) Brillouin zone-folding stemming from constituent sub-lattices forming in-plane twisted coupled loops, (iii) interlayer coupling between Babinet complementary patterns. The nanophotonic phenomena include (i) quasi-BIC resonances, (ii) hierarchically coupled localized and propagating modes that result in time-periodic Floquet modulation, (iii) initialization of synthetic potentials tuneable independently via intra- and inter-layer parameters. The unique bianisotropic composites result in a synthetic vector gauge and emulated magnetic field manifesting itself in tilted-precessing magnetic dipoles; the accompanying modulation, being time-periodic, inherently ensures a synthetic dimension. The asymmetric transmission is enhanced in the classical sense along quantized flat bands, and in mixed and forward bases inside finite wavelength-and-tilting intervals overlapping with nonreciprocal polarization rotation. The transmitted pulse re-shaping proves beating of nearby resonant modes; the loss can be compensated with active ad-layers, thereby resulting in Faraday isolator capability. The multilayers synthesize topological phenomena in high-dimensional synthetic parameter spaces.

2606.02570 2026-06-02 math.PR math.DG

Stochastic completeness for landmark space

Karen Habermann, Stefan Sommer

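AI summary Studies stochastic completeness of landmark spaces under Riemannian metrics induced by right-invariant metrics on subgroups of the diffeomorphism group of the shape domain, proving it for any number of landmarks via Grigor'yan's volume growth criterion and an eigenvalue lower bound.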

Comments 12 pages

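Details
AI-generated abstract (translated from Chinese)

We study stochastic completeness of landmark spaces equipped with Riemannian metrics induced by right-invariant metrics on subgroups of the diffeomorphism group of the shape domain. We extend an earlier stochastic completeness result, which covered only the case of exactly two landmarks, to landmark spaces with any number of landmarks. This follows the characterization of geodesic completeness for landmark spaces with arbitrary numbers of landmarks, and so completes the completeness characterization for landmark spaces by covering the stochastic case. The proof uses Grigor'yan's volume growth criterion for stochastic completeness, which requires a suitable upper bound on the volume of growing geodesic balls. We obtain quantitative control of geodesic balls in landmark space by bounding both their Euclidean size and the rate at which pairwise landmark distances can approach zero. We then combine this with a lower bound on the smallest eigenvalue of the landmark cometric, expressed through the Fourier transform of the kernel, to get volume growth bounds sufficient to prove stochastic completeness of landmark spaces for wide classes of kernels, including Matérn kernels.

English abstract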

We study stochastic completeness for landmark spaces equipped with Riemannian metrics induced by right-invariant metrics on subgroups of the diffeomorphism group of the shape domain. We extend a previous stochastic completeness result, which only covers the case of exactly two landmarks, to landmark spaces with any number of landmarks. This succeeds the characterization of geodesic completeness for landmark spaces with arbitrary numbers of landmarks, and thus finishes the completeness characterization for landmark spaces by covering the stochastic case. The proof makes use of Grigor'yan's volume growth criterion for stochastic completeness, which requires a suitable upper bound for the volume of growing geodesic balls. We obtain quantitative controls for geodesic balls in the landmark space by bounding both its Euclidean size and the rate at which pairwise landmark distances can approach zero. We then combine this with a lower bound on the minimal eigenvalue of the landmark cometric in terms of the Fourier transform of the kernel to yield volume growth bounds sufficient to prove stochastic completeness of landmark spaces for wide classes of kernels, including Matérn kernels.
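
For orientation, Grigor'yan's volume growth criterion is usually stated as follows for a geodesically complete Riemannian manifold, writing $V(x_0,r)$ for the volume of the geodesic ball of radius $r$ about a point $x_0$ (the notation is introduced here and may differ from the paper's): if \[ \int^{\infty} \frac{r\,dr}{\log V(x_0,r)} = \infty , \] then the manifold is stochastically complete. In particular, a bound of the form $V(x_0,r) \le \exp(C r^2)$ for large $r$ is enough, which is why an upper bound on the volume of geodesic balls is the key estimate.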

2606.02567 2026-06-02 math.FA cs.IT math.IT

Strong Polarization and Entropy

Daniel Galicer, Oscar Ortega-Moreno, Damián Pinasco

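AI summary Proves a weighted strong polarization inequality for unit vectors in a real Hilbert space, gives applications to polarization of products of linear functionals and to a strengthening of Bang's theorem, and relates the inequality to Shannon entropy.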

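Details
AI-generated abstract (translated from Chinese)

We show that for any set of $n$ unit vectors $v_1,\ldots,v_n$ in a real Hilbert space and positive numbers $p_1,\ldots,p_n$ satisfying $\sum_j p_j = 1$, there exists a unit vector $u$ such that \[ \sum_{j=1}^n \frac{p_j^2}{\langle v_j, u\rangle^2}\leq 1. \] This inequality is a weighted version of the strong polarization inequality. As direct corollaries, it gives a polarization inequality for products of powers of linear functionals and a strengthening of Bang's classical plank theorem in Hilbert spaces. The proof follows the approach used by Martínez and Ortega-Moreno in their recent resolution of the strong polarization conjecture posed by Ball and Frenkel. We further note that our weighted inequality has a Shannon-entropy interpretation: in a random sensing model, the entropy of the weights controls the minimum expected logarithmic loss.

English abstract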

We show that for any set of $n$ unit vectors $v_1,\ldots,v_n$ in a real Hilbert space and positive numbers $p_1,\ldots,p_n$ satisfying $\sum_j p_j = 1$, there exists a unit vector $u$ such that \[ \sum_{j=1}^n \frac{p_j^2}{\langle v_j, u\rangle^2}\leq 1. \] This inequality is a weighted version of the strong polarization inequality. As immediate corollaries, it yields a polarization inequality for products of powers of linear functionals and a strengthening of Bang's classical plank theorem for Hilbert spaces. The proof follows the approach introduced by Martínez and Ortega-Moreno in their recent solution to the strong polarization conjecture posed by Ball and Frenkel. We further note that our weighted inequality admits a Shannon-entropy interpretation: in a random sensing model, the entropy of the weights controls the minimum expected logarithmic loss.
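
As a quick sanity check (an illustration added here, not taken from the paper), the constant 1 is attained when the $v_j$ are orthonormal. Taking $u = \sum_j \sqrt{p_j}\, v_j$ gives $\|u\|^2 = \sum_j p_j = 1$ and $\langle v_j, u\rangle^2 = p_j$, so \[ \sum_{j=1}^n \frac{p_j^2}{\langle v_j,u\rangle^2} = \sum_{j=1}^n p_j = 1 . \] No unit vector does better in this case: with $a_j = \langle v_j,u\rangle^2$ and $\sum_j a_j \le 1$, the Cauchy-Schwarz inequality gives $1 = \big(\sum_j p_j\big)^2 \le \big(\sum_j p_j^2/a_j\big)\big(\sum_j a_j\big) \le \sum_j p_j^2/a_j$.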

2606.02566 2026-06-02 astro-ph.GA astro-ph.CO hep-ph

Mergers Matter: Gravothermal Collapse in Dwarf Halos with Self-Interacting Dark Matter

Maya Silverman, Abdelaziz Hussein, Arpit Arora, Mariangela Lisanti, Manoj Kaplinghat, Lina Necib, Andreas Thoyas, Stephanie O'Neil, Robyn E. Sanderson, Xuejian Shen, Jorge Moreno

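AI summary Simulations of six dwarf halos show that mergers inject orbital kinetic energy and alter heat transport: halos with quiescent merger histories undergo core collapse, while halos with sustained mergers do not, which may produce dark-matter-deficient galaxies.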

Comments 14 pages, 3 Figures, 3 Tables

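Details
AI-generated abstract (translated from Chinese)

Self-interacting dark matter (SIDM) models with large cross sections at relative velocities below $\sim100\,{\rm km \, s}^{-1}$ can be tested with dwarf galaxy observations. We analyze six dark-matter-only zoom-in halos of $\sim10^{10}\,{\rm M}_\odot$ with different assembly histories, adopting a cross section per unit mass of $\sigma/m = 70\,{\rm cm^2 \, g^{-1}}$. We find that mergers inject orbital kinetic energy into the halo, changing the heat transport and gravothermal evolution of the core. Three of the six halos, those with the most quiescent merger histories, show clear signs of core collapse in these simulations. Halos with sustained mergers do not collapse. In addition, merger-induced heat transport drives the central densities of two non-collapsing halos well below the predictions of the gravothermal fluid model. These findings suggest a new mechanism for producing dark-matter-deficient galaxies and broaden the diversity of rotation curves beyond what halo concentration alone predicts. Merger histories are therefore essential for understanding the central density profiles of dwarf halos in SIDM.

English abstract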

Self-Interacting Dark Matter (SIDM) models with large cross sections at relative velocities below $\sim100\,{\rm km \, s}^{-1}$ can be tested with dwarf galaxy observations. We analyze six dark-matter-only zoom-in $\sim10^{10}\,{\rm M}_\odot$ halos with diverse assembly histories, adopting a cross section over mass of $\sigma/m = 70\,{\rm cm^2 \, g^{-1}}$. We find that mergers inject orbital kinetic energy into the halo, altering the heat transport and the gravothermal evolution of the core. Three of the six halos -- those with the most quiescent merger histories -- show clear signs of core collapse in these simulations. Halos with sustained mergers do not collapse. Furthermore, merger-induced heat transport drives two non-collapsing halos to central densities well below the predictions of the gravothermal fluid model. These findings suggest a novel mechanism for producing dark-matter-deficient galaxies and expanding the diversity of rotation curves beyond what halo concentration alone predicts. Merger histories are thus essential for understanding central density distributions of dwarf galaxy halos in SIDM.

2606.02561 2026-06-02 math.OA math.FA math.QA

Pure UCP Maps on Finite Toeplitz Systems and Quantum Gromov--Hausdorff Convergence

Ritul Duhan, Abhay Jindal

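AI summary Characterizes the pure UCP maps from the finite Toeplitz operator system $T_d$ to $M_n$ and proves that their space, under the matricial Connes distance, converges in the Gromov-Hausdorff sense to the space of normalized positive matrix-valued Borel measures on the unit circle.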

Comments 33 pages

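Details
AI-generated abstract (translated from Chinese)

We study pure unital completely positive maps on the finite Toeplitz operator system $T_d$ of $d \times d$ Toeplitz matrices. The first main result gives an explicit characterization of pure UCP maps from $T_d$ to $M_n$ in terms of positive $n\times n$ matrix-valued trigonometric polynomials of degree at most $d-1$. This characterization provides a checkable criterion for deciding whether a given UCP map is pure. As a first application, we show that every pure UCP map from $T_d$ to $M_n$ has a unique UCP extension to the generated $C^*$-algebra. As a second application, we prove that, for each fixed $n$, the space of pure UCP maps from $T_d$ to $M_n$, equipped with the matricial Connes distance, converges in the Gromov-Hausdorff sense to the space of normalized positive $n\times n$ matrix-valued Borel measures on the unit circle, equipped with the matricial Monge-Kantorovich distance.

English abstract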

We study pure unital completely positive maps on the finite Toeplitz operator system $ T_{d}$ of $d \times d$ Toeplitz matrices. Our first main result gives an explicit characterization of pure UCP maps from $T_{d}$ to $M_n$ in terms of positive $n\times n$ matrix-valued trigonometric polynomials of degree at most $d-1$. This characterization provides a checkable criterion for deciding when a given UCP map is pure. As a first application, we show that every pure UCP map from $ T_{d}$ to $M_n$ admits a unique UCP extension to the generated $C^*$-algebra. As a second application, we prove that, for each fixed $n$, the space of pure UCP maps from $T_{d}$ to $M_n$, equipped with the matricial Connes distance, converges in the Gromov--Hausdorff sense to the space of normalized positive $n\times n$ matrix-valued Borel measures on the unit circle, equipped with the matricial Monge--Kantorovich distance.

2606.02560 2026-06-02 physics.atom-ph cond-mat.quant-gas physics.app-ph physics.optics quant-ph

A Mid-Infrared Platform Based on Strontium Tweezer Arrays

Aaron Holman, Ximo Sun, Bojeong Seo, Joshua Corn, Zezheng Zhu, Yuan Xu, Jiahao Wu, Nanfang Yu, Dmytro Filin, Marianna Safronova, Sebastian Will

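AI summary Uses $^{88}$Sr atoms in optical tweezer arrays to access a mid-infrared transition (2923 nm); by identifying a magic wavelength and demonstrating single-atom control, it provides a platform for studying collective emission phenomena and dipolar many-body physics.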

Comments 10 pages, 7 main figures, 3 appendix figures

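Details
AI-generated abstract (translated from Chinese)

Subwavelength atomic tweezer arrays, in which atoms can be placed at distances smaller than their emission wavelength, have been proposed as a versatile platform for studying collective emission phenomena such as superradiance and subradiance. Experimentally, realizing such arrays has been a challenge because typical emission wavelengths lie in the visible or near-infrared and are short compared with typical tweezer spacings of order micrometers. Here, we use $^{88}$Sr atoms in optical tweezer arrays to access a mid-infrared transition at 2923 nm ($5s5p\:^{3}P_{2} \rightarrow 5s4d\:^{3}D_{3}$). We identify a magic trapping wavelength at 597.14(3) nm and demonstrate high-fidelity single-atom preparation and imaging. In addition, using 2923 nm light, we demonstrate resolved-sideband cooling of tweezer-trapped strontium. Besides enabling studies of collective emission phenomena in flexibly arranged atoms, our platform opens new opportunities for dipolar many-body physics and for enhanced control over Rydberg dynamics and the strontium fine-structure qubit.

English abstract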

Subwavelength atomic tweezer arrays, in which atoms can be positioned at distances smaller than their emission wavelength, have been proposed as a versatile platform to study collective emission phenomena, such as superradiance and subradiance. Experimentally, the realization of such arrays has been a challenge as typical emission wavelengths in the visible or near-infrared are short compared to typical tweezer spacings in the micrometer range. Here, we use $^{88}$Sr atoms in optical tweezer arrays to access a mid-infrared transition at 2,923 nm ($5s5p\:^{3}P_{2} \rightarrow\, 5s4d\:^{3}D_{3}$). We identify a magic trapping wavelength at 597.14(3) nm and demonstrate single-atom preparation and imaging with high fidelity. In addition, using 2,923 nm light, we demonstrate resolved-sideband cooling of tweezer-trapped strontium. Beyond enabling studies of collective emission phenomena in flexible arrangements of atoms, our platform opens novel opportunities for dipolar many-body physics and enhanced control over Rydberg dynamics and the strontium fine-structure qubit.

2606.02558 2026-06-02 math.GR

Conjugacy Problem for Dehn Twists of Free Products of Free Abelian Groups

Amir Y. Weiss Behar, Chris Karpinski, Bratati Som

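AI summary Studies the solvability of the conjugacy problem for Dehn twist automorphisms of finitely generated free products of free abelian groups.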

Comments 27 pages. Comments welcome

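Details
AI-generated abstract (translated from Chinese)

We prove solvability of the conjugacy problem for Dehn twist automorphisms of finitely generated free products of free abelian groups.

English abstract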

We show solubility of the conjugacy problem for Dehn twist automorphisms of finitely generated free products of free abelian groups.

2606.02557 2026-06-02 physics.ins-det hep-ex

Full Characterization of a Mock Nuclear Waste Barrel with Muon Tomography using Micromegas Detectors

Raphaël Bajou, David Attié, Héctor Gómez, Irakli Mandjavidze, Philippe Mas

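AI summary Muon tomography based on multiple Coulomb scattering, combined with Micromegas detectors, delivers high-precision three-dimensional imaging and material discrimination of the internal structure of a 205-liter mock nuclear waste barrel.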

Comments 12 pages, 15 figures

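Details
AI-generated abstract (translated from Chinese)

Muon tomography based on multiple Coulomb scattering offers a way to image dense and shielded objects non-destructively using naturally occurring cosmic-ray muons. In the context of nuclear waste characterization, we experimentally imaged a 205-liter mock waste barrel using a dedicated 1 m$^2$ muon scattering tomography test bench. The system uses multiplexed resistive Micromegas detectors, giving stable, high-precision muon tracking. Monte Carlo simulations are first used to characterize material-dependent scattering signatures and to quantitatively assess identification performance with statistical reconstruction. Objective discrimination thresholds are then defined from these simulation results and applied to experimental data to localize and identify internal anomalies. Using an Angle Statistics Reconstruction algorithm, we achieve a spatial resolution of 10 mm and demonstrate three-dimensional imaging of an internal structure containing both low- and high-radiation-length materials. Material discrimination performance is evaluated with receiver operating characteristic analysis, giving high identification efficiency for dense metallic inclusions such as lead and steel (AUC $\geq$ 0.96) within acquisition times of a few days, while cavities also show strong contrast. The experimental results agree well with detailed Monte Carlo simulations. By establishing a continuous workflow from simulation-based performance characterization to application on measured data, this work provides a quantitatively validated framework for muon scattering tomography of complex, shielded objects.

English abstract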

Muon tomography based on multiple Coulomb scattering provides a non-destructive method to image dense and shielded objects using naturally occurring cosmic-ray muons. In the context of nuclear waste characterization, we present the experimental imaging of a 205-L mock waste barrel using a dedicated 1 m$^2$ muon scattering tomography test bench. The system employs multiplexed resistive Micromegas detectors, enabling stable and high-precision muon tracking. Monte Carlo simulations are first used to characterize material-dependent scattering signatures and to quantitatively assess identification performance using statistical reconstruction. These simulation-based results are then used to define objective discrimination thresholds, which are subsequently applied to experimental data for the localization and identification of internal anomalies. Using an Angle Statistics Reconstruction algorithm, we achieve a spatial resolution of 10 mm and demonstrate the three-dimensional imaging of an internal structure containing both low- and high-radiation-length materials. Material discrimination performance is evaluated using receiver operating characteristic analysis, yielding high identification efficiency for dense metallic inclusions such as lead and steel (AUC $\geq$ 0.96) within acquisition times of a few days, while cavities also exhibit strong contrast. Experimental results show good agreement with detailed Monte Carlo simulations. By establishing a continuous workflow from simulation-based performance characterization to practical application on measured data, this work provides a quantitatively validated framework for muon scattering tomography applied to complex, shielded objects.
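
Why scattering separates materials by radiation length can be seen from the standard Highland approximation for the width of the multiple Coulomb scattering angle. The sketch below is illustrative only: the 3 GeV/c momentum, the 10 cm thickness, and the rounded radiation lengths are assumptions chosen for the example, iron stands in for steel, and none of this reproduces the reconstruction algorithm used in the paper.

```python
import math

MUON_MASS_MEV = 105.66  # muon rest mass in MeV/c^2


def highland_theta0(momentum_mev, thickness_cm, rad_length_cm, charge=1):
    """Width (rad) of the projected scattering angle, Highland approximation.

    theta0 = 13.6 MeV / (beta * c * p) * |z| * sqrt(x / X0)
             * (1 + 0.038 * ln(x * z^2 / (X0 * beta^2)))
    """
    energy = math.hypot(momentum_mev, MUON_MASS_MEV)
    beta = momentum_mev / energy
    t = thickness_cm / rad_length_cm  # thickness in radiation lengths
    log_term = 1.0 + 0.038 * math.log(t * charge**2 / beta**2)
    return 13.6 / (beta * momentum_mev) * abs(charge) * math.sqrt(t) * log_term


# Approximate radiation lengths in cm (rounded textbook values, assumed here).
RAD_LENGTH_CM = {"lead": 0.56, "iron": 1.76, "concrete": 11.6}

# Scattering width for a 3 GeV/c muon crossing 10 cm of each material.
angles = {
    name: highland_theta0(3000.0, 10.0, x0) for name, x0 in RAD_LENGTH_CM.items()
}
```

Shorter radiation lengths give wider angular distributions, so for equal thickness lead scatters more than iron, which scatters more than concrete. Threshold-based discrimination relies on this ordering.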

2606.02554 2026-06-02 physics.plasm-ph

Modeling Torque Induced Alignment in a Dusty Plasma System

Benny Rodriguez Saenz, Diana Jimenez Marti, Lorin Swint Matthews, Truell W. Hyde

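AI summary Uses self-consistent numerical simulations to study the rotational dynamics of irregular charged dust aggregates in a plasma sheath, finding that the sheath electric field is the main driver of rotation and aligns the aggregate's electric dipole moment with the field direction, while the ion wake contributes an opposing torque through its axial component and a destabilizing contribution through its transverse components.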

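Details
AI-generated abstract (translated from Chinese)

Irregular dust aggregates immersed in plasma sheaths experience several orientation-dependent torques that can change their rotational dynamics and stability. Using self-consistent numerical simulations, we study the rotational dynamics of charged irregular aggregates under conditions representative of a GEC rf plasma cell. The aggregates rotate freely in a unidirectional sheath electric field that drives an ion flow, so that the torque contributions acting on the aggregate can be evaluated throughout the motion. The results show that the sheath electric field is the main driver of rotation and aligns the aggregate's electric dipole moment with the sheath field direction. The ion wake modifies this alignment: its axial field component produces an opposing torque, while its transverse components add a destabilizing contribution that leads to small-amplitude oscillations about the equilibrium orientation. The rotational equilibrium is described by an interaction energy well whose spring constant and depth grow with the sheath electric field strength, indicating stronger alignment and greater resistance to angular perturbations at higher fields. A second-order multipole expansion of the aggregate-ion interaction shows that the dipole term dominates the ion contribution to the aligning torque, supporting a dipole-ion approximation under the conditions examined. These results identify the sheath electric field as the principal stabilizing mechanism for the rotation of irregular aggregates and clarify how ion wake fields perturb the equilibrium orientation.

English abstract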

Irregular dust aggregates immersed in plasma sheaths experience several orientation-dependent torques that can modify their rotational dynamics and stability. Here, we investigate the rotational dynamics of charged irregular aggregates under conditions representative of a GEC rf plasma cell using self-consistent numerical simulations. The aggregates rotate freely in a unidirectional sheath electric field that drives an ion flow, allowing the torque contributions acting on the aggregate to be evaluated throughout the motion. The results show that the sheath electric field is the main driver of rotation and aligns the aggregate electric dipole moment with the sheath field direction. The ion wake modifies this alignment: its axial field component produces an opposing torque, while its transverse components introduce a destabilizing contribution that leads to small oscillations about the equilibrium orientation. The rotational equilibrium is described by an interaction energy well whose spring constant and depth increase with the sheath electric field magnitude, indicating stronger alignment and greater resilience to angular perturbations at higher fields. A second-order multipole expansion of the aggregate-ion interaction shows that the dipolar term governs the ion contribution to the aligning torque, supporting a dipole-ion approximation across the examined conditions. These results identify the sheath electric field as the principal stabilizing mechanism for irregular aggregate rotation and clarify how ion wake fields perturb the equilibrium orientation.
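
The alignment picture can be sketched with the textbook case of an ideal point dipole $\mathbf{p}$ of fixed magnitude in a uniform field $\mathbf{E}$, ignoring the ion wake entirely (a simplification made here for illustration, not the simulation model used in the paper): \[ \boldsymbol{\tau} = \mathbf{p}\times\mathbf{E}, \qquad U(\theta) = -pE\cos\theta \approx -pE + \tfrac{1}{2}\,pE\,\theta^2 . \] In this idealization the angular spring constant is $k = pE$ and the well depth is $2pE$, so both scale linearly with the field magnitude, in line with the trend reported for the interaction energy well.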

2606.02550 2026-06-02 stat.AP physics.ao-ph

Probabilistic storyline attribution using machine learning

Frieder Loer, Maybritt Schillinger, Sebastian Sippel

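AI summary Proposes distributional autoencoders (DAEs) that generate climate counterfactuals conditioned on the atmospheric circulation state and the global warming level for probabilistic storyline attribution, and illustrates the changes in conditional intensity and probability ratio using the 2003 European heatwave.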

Comments main text: 19 pages and 4 figures

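Details
AI-generated abstract (translated from Chinese)

A fundamental goal of climate attribution is to estimate how forced climate change affects observed extreme weather events. The storyline attribution method compares an observed weather event, conditional on its atmospheric dynamic state (the atmospheric circulation), in the current 'factual' climate with an event with very similar circulation conditions in a hypothetical 'counterfactual' climate. However, physical climate models cannot directly transfer these storyline counterfactuals across different climate forcing states. Statistical and machine learning techniques may overcome this limitation, but emulating circulation-conditional extreme events under different climate states is challenging. Here, we demonstrate distributional autoencoders (DAEs) as a general method for generating climate counterfactuals. They model the full distribution of spatially resolved European temperature fields conditional on the atmospheric circulation state and the mean global warming level. These distributions make it possible to derive meaningful conditional probability ratios, a particular advantage of the DAE-based storyline approach. We train DAEs on fully coupled climate model simulations and evaluate the modelled distributions across different factual and storyline-based counterfactual climate model simulations. In an illustrative case study, we revisit the 2003 European heatwave and use ERA5 circulation to generate counterfactuals for a hypothetical '2003-like European heatwave', assumed to occur a quarter century (2028) and a half century (2053) after 2003. The conditional intensity would rise from 29.3 °C in 2003 to 30.3 °C in 2028 and 32.1 °C in 2053, with conditional probability ratios of 2.1 and 3.2, respectively, relative to 2003.

English abstract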

A fundamental goal in climate attribution is to estimate how forced climate change contributes to observed extreme weather events. The storyline attribution method compares an observed weather event, conditional on its atmospheric dynamic state (i.e., atmospheric circulation), in the current, 'factual' climate to an event with very similar circulation conditions in a hypothetical, 'counterfactual' climate. However, physical climate models cannot directly transfer these storyline counterfactuals across different climate forcing states. Statistical and machine learning techniques may overcome this limitation; yet, emulating circulation-conditional extreme events under different climate states is challenging. Here, we demonstrate distributional autoencoders (DAEs) as a versatile method for generating climate counterfactuals. They model the full distribution of spatially resolved European temperature fields conditional on the atmospheric circulation state and the mean global warming level. These distributions allow for deriving meaningful conditional probability ratios, which is a particular advantage of the DAE-based storyline approach. We train DAEs on fully coupled climate model simulations and we evaluate the modelled distributions across different factual and storyline-based counterfactual climate model simulations. In an illustrative case study, we revisit the 2003 European heatwave and we generate counterfactuals for a hypothetical '2003-like European heatwave' using ERA5 circulation, which we hypothesize to occur a quarter century (2028) and a half century (2053) after 2003. The conditional intensity would increase from 29.3 °C in 2003 to 30.3 °C and 32.1 °C in 2028 and 2053, respectively, and conditional probability ratios would be 2.1 and 3.2 when compared to 2003.
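
A probability ratio compares how likely a threshold exceedance is under two climate states. The sketch below estimates it from two sets of samples; the synthetic Gaussian samples, the 1 °C shift, and the 29 °C threshold are made-up illustration values, and the function name `probability_ratio` is ours, not part of the paper's method or code.

```python
import numpy as np


def probability_ratio(reference, scenario, threshold):
    """Ratio P(T >= threshold | scenario) / P(T >= threshold | reference),
    estimated from empirical exceedance frequencies of two sample sets."""
    p_ref = np.mean(np.asarray(reference) >= threshold)
    p_scn = np.mean(np.asarray(scenario) >= threshold)
    if p_ref == 0:
        raise ValueError("no reference samples exceed the threshold")
    return p_scn / p_ref


# Synthetic stand-ins for samples from two conditional temperature distributions.
rng = np.random.default_rng(0)
reference = rng.normal(loc=28.0, scale=1.0, size=100_000)  # degrees C
scenario = reference + 1.0  # same shape, shifted warmer by 1 degree C

pr = probability_ratio(reference, scenario, threshold=29.0)
```

A ratio above 1 means the exceedance is more likely in the scenario than in the reference. With a full conditional distribution available, as a distributional model provides, the same ratio can be evaluated at any threshold.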

2606.02549 2026-06-02 physics.chem-ph physics.atom-ph physics.comp-ph

Diagrammatic Monte Carlo for positron-molecule many-body theory

T. A. Scott, S. K. Gregg, D. G. Green

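AI summary Proposes a diagrammatic Monte Carlo method that stochastically samples ladder series contributions and extrapolates to infinite order with Cesàro-Riesz resummation, enabling efficient computation of the positron-molecule correlation potential; benchmarks on lithium hydride agree quantitatively with exact diagonalisation.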

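Details
AI-generated abstract (translated from Chinese)

We present a diagrammatic Monte Carlo method for evaluating the ladder series contributions to the correlation potential (self energy) of a positron in the field of a molecule. Within the Tamm-Dancoff approximation, the $GW$@TDHF, virtual-positronium ($T$-matrix), and positron-hole Goldstone ladder series contributions are stochastically sampled order by order (the approximation is exact for the latter two classes), and Cesàro-Riesz resummation is used to extrapolate to infinite order. Gaussian basis sets are used and Coulomb matrix elements are represented via density fitting, so the three-centre integrals are the largest arrays that must be held in memory. Compared with the exact solution of the Bethe-Salpeter equations [J. Hofierka, B. Cunningham, C. M. Rawlins, C. H. Patterson and D. G. Green, Nature 606, 688 (2022)], the stochastic method reduces the memory needed for the largest arrays by a factor of order the number of molecular orbitals in the basis, $N\sim10^2$--$10^3$. Benchmark results for lithium hydride agree quantitatively with exact diagonalisation and, in particular, demonstrate successful stochastic summation of the infinite electron-positron ladder series for virtual positronium.

English abstract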

A diagrammatic Monte Carlo evaluation of the ladder series contributions to the correlation potential (self energy) of a positron in the field of a molecule is presented. The $GW$@TDHF, virtual-positronium ($T$-matrix), and positron-hole Goldstone ladder series contributions are stochastically sampled order-by-order within the Tamm-Dancoff approximation, which is exact for the latter two classes, with Cesàro-Riesz resummation used to extrapolate to infinite order. Gaussian bases are employed and Coulomb matrix elements are represented via density fitting, with the three-centre integrals being the largest arrays required to be stored in memory. The stochastic approach thus realizes a reduction in memory of the largest arrays required on the order of the number of molecular orbitals in the basis $N\sim10^2$--$10^3$ compared to the exact deterministic solution of Bethe-Salpeter equations [J. Hofierka, B. Cunningham, C. M. Rawlins, C. H. Patterson and D. G. Green, Nature 606, 688 (2022)]. Benchmark results for lithium hydride show quantitative agreement with exact diagonalisation, notably demonstrating the successful stochastic summation of the virtual-positronium infinite electron-positron ladder series.
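
The idea behind resummation can be illustrated with plain Cesàro (C,1) summation, which averages the partial sums of a series and can assign a finite value even where the partial sums oscillate. This is a generic textbook sketch on a geometric series; the specific Cesàro-Riesz scheme and the diagrammatic series used in the paper are not reproduced here.

```python
import numpy as np


def cesaro_sum(terms):
    """Cesàro (C,1) sum: the arithmetic mean of the partial sums."""
    partial_sums = np.cumsum(np.asarray(terms, dtype=float))
    return partial_sums.mean()


orders = np.arange(2000)

# Grandi's series 1 - 1 + 1 - ...: the partial sums oscillate between 1 and 0,
# but their average settles at 1/2, the value of 1/(1 - x) at x = -1.
grandi = (-1.0) ** orders
grandi_value = cesaro_sum(grandi)

# A convergent geometric series (x = -0.5): Cesàro summation agrees with the
# ordinary limit 1/(1 - x) = 2/3, up to a correction that vanishes like 1/N.
geometric = (-0.5) ** orders
geometric_value = cesaro_sum(geometric)
```

The averaging leaves convergent series unchanged in the limit while taming oscillating ones, which is what makes resummation useful for extrapolating an order-by-order expansion.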

2606.02547 2026-06-02 cs.GT

Pluralistic Leaderboards

Nika Haghtalab, Ariel D. Procaccia, Han Shao, Serena Lutong Wang, Kunhe Yang

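AI summary To address the way heterogeneous user preferences distort rankings from the conventional Bradley-Terry model, proposes a locally stable mechanism grounded in social choice theory that yields a stable pluralistic leaderboard from only a small number of comparisons per user.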

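Details
AI-generated abstract (translated from Chinese)

Recent leaderboard-based evaluations of large language models aggregate user feedback by fitting a Bradley-Terry model to pairwise comparisons, producing a single global ranking based on a latent quality score. Although this approach is attractive for its simplicity, it is incompatible with heterogeneous preferences: when LLMs are used across diverse tasks and use cases, users who prefer fundamentally different model behaviors can be systematically misrepresented when collapsed into a single quality score. To address this, we study pluralistic leaderboards that aim to remain stable with respect to heterogeneous user populations. Drawing on ideas from social choice theory, we adopt the notion of local stability, which requires that no model outside the top-$k$ positions is collectively preferred to the top-$k$ set by more than an $O(1/k)$ fraction of users. Building on techniques from the social choice literature, we design an alternative leaderboard mechanism that satisfies local stability while eliciting only $\widetilde{O}(k)$ pairwise comparisons per user, where $k$ is the size of the prefix for which stability is guaranteed. Using data from LMArena, we show that standard Bradley-Terry aggregation can violate local stability in practice, whereas our method provides stronger stability guarantees.

English abstract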

Recent leaderboard-based evaluations of large language models aggregate user feedback by fitting a Bradley--Terry model to pairwise comparisons, producing a single global ranking based on a latent quality score. While appealing for its simplicity, this approach is incompatible with heterogeneous preferences: when LLMs are used across diverse tasks and use cases, users who favor fundamentally different model behaviors can be systematically misrepresented when collapsed into a single quality score. To address this issue, we study pluralistic leaderboards that aim to remain stable with respect to heterogeneous user populations. Drawing on ideas from social choice theory, we adapt the notion of local stability, which requires that no model outside the top-$k$ positions is collectively preferred to the top-$k$ set by more than an $O(1/k)$ fraction of users. Building on techniques from the social choice literature, we design an alternative leaderboard mechanism that satisfies local stability while eliciting only $\widetilde{O}(k)$ pairwise comparisons per user, where $k$ is the size of the prefix for which stability is guaranteed. Using data from LMArena, we show that standard Bradley--Terry aggregation can violate local stability in practice, whereas our method provides substantially stronger stability guarantees.
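
For readers unfamiliar with the baseline being critiqued, the sketch below fits a Bradley-Terry model to a small win matrix with the classical minorization-maximization update. It shows the standard aggregation step only, not the paper's mechanism; the win counts are invented for the example, and the code assumes every model has at least one win.

```python
import numpy as np


def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Maximum-likelihood Bradley-Terry strengths via the MM update.

    wins[i, j] is the number of comparisons in which model i beat model j.
    Returns strengths normalised to sum to 1. Assumes each model wins at
    least once, so that all strengths stay strictly positive.
    """
    wins = np.asarray(wins, dtype=float)
    games = wins + wins.T  # comparisons per pair (zero on the diagonal)
    total_wins = wins.sum(axis=1)
    p = np.full(wins.shape[0], 1.0 / wins.shape[0])
    for _ in range(n_iter):
        # MM step: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p


# Invented win counts for three models, ten comparisons per pair.
wins = np.array([
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
])
strengths = fit_bradley_terry(wins)
ranking = np.argsort(-strengths)  # model indices from strongest to weakest
```

The fit collapses all comparisons into one score per model. That is the step that can hide disagreement between user groups, since two populations with opposite preferences can average out to a ranking that neither of them holds.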