arXivDaily: arXiv daily academic digest, updated Monday through Friday
2402.10665 2026-05-26 cs.LG cs.CV

Soft Dice Confidence: A Near-Optimal Confidence Estimator for Selective Prediction in Semantic Segmentation

Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva

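AI summary: For selective prediction in semantic segmentation, proposes SDC, a near-optimal confidence estimator based on the Dice coefficient. It is provably close to the optimal estimator when the marginal posterior probabilities are known, and its plug-in version outperforms prior methods when they are only estimated.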

Comments 48 pages, 11 figures

Abstract

In semantic segmentation, even state-of-the-art deep learning models fall short of the performance required in certain high-stakes applications such as medical image analysis. In these cases, performance can be improved by allowing a model to abstain from making predictions when confidence is low, an approach known as selective prediction. While well-known in the classification literature, selective prediction has been underexplored in the context of semantic segmentation. This paper tackles the problem by focusing on image-level abstention, which involves producing a single confidence estimate for the entire image, in contrast to previous approaches that focus on pixel-level uncertainty. Assuming the Dice coefficient as the evaluation metric for segmentation, two main contributions are provided in this paper: (i) In the case of known marginal posterior probabilities, we derive the optimal confidence estimator, which is observed to be intractable for typical image sizes. Then, an approximation computable in linear time, named Soft Dice Confidence (SDC), is proposed and proven to be tightly bounded to the optimal estimator. (ii) When only an estimate of the marginal posterior probabilities is known, we propose a plug-in version of the SDC and show it outperforms all previous methods, including those requiring additional tuning data. These findings are supported by experimental results on both synthetic data and real-world data from six medical imaging tasks, including out-of-distribution scenarios, positioning the SDC as a reliable and efficient tool for selective prediction in semantic segmentation.
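
As a rough illustration of image-level selective prediction, the sketch below scores an image with a soft Dice overlap between the per-pixel foreground probabilities and their thresholded mask, and abstains when the score is too low. The formula, the function names, the 0.5 threshold, the 0.8 cutoff, and the empty-mask convention are all assumptions made for illustration; the abstract does not state the SDC definition, so the paper remains the authoritative source.

```python
def soft_dice_confidence(probs, threshold=0.5):
    """Soft Dice overlap between probabilities and their thresholded mask.

    Assumed form: 2 * sum(p_i * y_i) / (sum(p_i) + sum(y_i)),
    where y_i = 1 if p_i >= threshold else 0. Runs in linear time.
    """
    mask = [1 if p >= threshold else 0 for p in probs]
    denom = sum(probs) + sum(mask)
    if denom == 0:
        # Assumed convention: no predicted and no expected foreground.
        return 1.0
    overlap = sum(p * y for p, y in zip(probs, mask))
    return 2.0 * overlap / denom


def predict_or_abstain(probs, min_confidence=0.8):
    """Return the hard mask, or None to abstain on the whole image."""
    if soft_dice_confidence(probs) < min_confidence:
        return None
    return [1 if p >= 0.5 else 0 for p in probs]
```

A single score per image is what makes abstention an image-level decision: the caller either receives a full mask or nothing.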

2310.04981 2026-05-26 cs.CV cs.LG

Compositional Semantics for Open Vocabulary Spatio-semantic Representations

Robin Karlsson, Francisco Lepe-Salazar, Kazuya Takeda

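AI summary: Proposes latent compositional semantic embeddings z* as a knowledge representation for queryable spatio-semantic memories, proves their existence, optimality, and discoverability, and introduces a sufficient similarity inference method that improves inference over overlapping semantics.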

Comments Preprint

Abstract

Vision-language models (VLMs) transform environment percepts into vision-language semantics interpretable by LLMs. However, completing complex tasks often requires reasoning about information beyond what is currently perceived. We propose latent compositional semantic embeddings z* as a principled learning-based knowledge representation for queryable spatio-semantic memories. We mathematically prove that z* can always be found, and that the optimal z* is the centroid for any set Z. We derive a probabilistic bound for estimating separability of related and unrelated semantics. We prove that z* is discoverable from visual appearance and singular descriptions by iterative gradient descent. We experimentally verify our findings on four embedding spaces including CLIP and SBERT. Our results show that z* can represent up to 10 semantics encoded by SBERT, and up to 100 semantics for ideal uniformly distributed high-dimensional embeddings. We introduce three new datasets with overlapping semantics to show that common VLMs trained on conventional nonoverlapping annotations discover z*. Our novel sufficient similarity inference method overcomes fundamental limitations of conventional inference, and improves higher-level overlapping semantic inference performance by 19.63 mIoU on average.
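
One way to build intuition for the centroid claim: for unit-norm embeddings scored by cosine similarity, the unit vector with the highest total similarity to a set Z is the normalized sum of its members. The sketch below checks this on toy vectors. The function names and the toy set are hypothetical, and the paper may state its optimality criterion differently.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid_embedding(embeddings):
    """Normalized sum of the embeddings, a candidate for z*."""
    dim = len(embeddings[0])
    total = [sum(e[i] for e in embeddings) for i in range(dim)]
    return normalize(total)


def mean_cosine(query, embeddings):
    """Average cosine similarity between the query and each embedding."""
    q = normalize(query)
    sims = [sum(a * b for a, b in zip(q, normalize(e))) for e in embeddings]
    return sum(sims) / len(sims)


# Three orthogonal "semantics" in a toy 3-D embedding space.
Z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z_star = centroid_embedding(Z)
```

Here `z_star` is closer on average to every member of `Z` than any single member is, which is the property a queryable memory needs from a compositional embedding.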

2305.11663 2026-05-26 cs.LG cs.AI cs.CL cs.CY

Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

Jill Walker Rettberg

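AI summary: Tests the method proposed by Munk et al. of using failed machine learning predictions to identify ambiguous, rich cases for qualitative analysis. A simple kNN classifier is applied to data on actions by fictional characters interacting with machine vision technologies; the unpredictable actions prove more ambivalent and emotionally loaded, supporting the method's applicability in the humanities.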

Journal ref
Big Data & Society 9(2) 2022

Abstract

This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning as a method to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels and videogames, I trained a simple machine learning algorithm (using the kNN algorithm in R) to predict whether or not an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities where machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions that the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.
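
The idea of mining mispredictions can be sketched in a few lines. The commentary's own code is in R and is not reproduced here; the Python below is an independent illustration that uses a leave-one-out pass (an assumption, since the abstract does not describe the evaluation split) over hypothetical toy data.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to `query`."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def failed_predictions(dataset, k=3):
    """Leave-one-out pass: indices of the cases the classifier gets wrong."""
    failures = []
    for i, (features, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_predict(rest, features, k) != label:
            failures.append(i)
    return failures


# Hypothetical data: one "passive" case sits among "active" neighbours.
toy = [((0.0,), "active"), ((1.0,), "active"), ((2.0,), "active"),
       ((1.5,), "passive"),
       ((10.0,), "passive"), ((11.0,), "passive"), ((12.0,), "passive")]
print(failed_predictions(toy))  # prints [3]
```

The returned indices are the candidates for close reading: the cases where the features alone do not explain the label.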

2105.13431 2026-05-26 cs.LG cs.AI cs.SY eess.SY

An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes

Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel

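AI summary: To address the risk that model uncertainty creates for policies in offline reinforcement learning, proposes EvC, a risk-aware policy selection method built on the Bayesian formalism that selects robust policies by maximizing a risk-aware objective over the Bayesian posterior.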

Comments Preprint, under review

Journal ref
Artificial Intelligence, Volume 354, 2026

Abstract

In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimate of the Value function of the relative Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the scope of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application, which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority, we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty abiding by the Bayesian formalism, and (2) selects the policy that maximizes a risk-aware objective over the Bayesian posterior among a fixed set of candidate policies provided, for instance, by the current baselines. We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes. In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners who aim to apply offline planning and reinforcement learning solvers in the real world.
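
The selection step can be sketched as follows, assuming each candidate policy has already been evaluated on several MDPs sampled from the posterior. The abstract does not name the risk-aware objective; conditional value at risk (CVaR) is used here purely as an assumed example, and the policy names and returns are hypothetical.

```python
import math


def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of returns (lower tail)."""
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def select_policy(returns_by_policy, alpha=0.25):
    """Pick the candidate whose returns across posterior samples have the best CVaR."""
    return max(returns_by_policy,
               key=lambda name: cvar(returns_by_policy[name], alpha))


# Hypothetical value of each candidate policy in 4 MDPs sampled from the posterior.
returns_by_policy = {
    "aggressive": [10.0, 10.0, 10.0, -5.0],
    "cautious": [5.0, 5.0, 5.0, 4.0],
}
```

With `alpha=0.25` the cautious policy is chosen because its worst case is better; with `alpha=1.0` the objective reduces to the posterior mean and the aggressive policy wins, which is the exploitation versus caution trade-off in miniature.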

2605.16562 2026-05-26 cs.CL cs.DL

Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4

Deyan Ginev, Brian Caruso, Bruce Miller, Jeff Sank, Jacob Weiskoff

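AI summary: Reports on the ongoing development of arXiv's HTML Papers offering, highlighting community-driven improvements, corpus-scale conversion, MathML 4 Intent annotations, and a Rust port of LaTeXML from 2025 and early 2026, with the aim of improving HTML fidelity, accessibility, and compute efficiency.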

Comments 6 pages, ICMS 2026

Abstract

We report on the ongoing development of arXiv's HTML Papers offering, available on every new TeX/LaTeX submission since its initial release in 2023. The main highlights from 2025 and early 2026 are: (i) community-driven improvements to HTML fidelity and service health, with roughly half of 6,000 user reports resolved; (ii) corpus-scale conversion work aimed at 90% error-free HTML (currently 75%); (iii) initial MathML 4 Intent annotations for accessible speech output; (iv) an in-progress Rust port of LaTeXML, reducing compute costs and enabling faster previews on submission. The arXiv HTML Papers project remains experimental, but is gradually maturing as we better understand the needs of arXiv's readers and the technical opportunities presented by new standards and by advances in programming languages and AI.

2605.10543 2026-05-26 cs.CV

TIE: Time Interval Encoding for Video Generation over Events

Zhilei Shu, Shangwen Zhu, Zihang Liang, Xiaofan Li, Qianyu Peng, Xinyu Cui, Bo Ye, Yiming Li, Fan Cheng, Jian Zhao, Yang Cao, Zheng-Jun Zha, Ruili Feng

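AI summary: Proposes Time Interval Encoding (TIE), an interval-aware generalization of rotary position embeddings that addresses the inability of Diffusion Transformers (DiT) to represent time intervals when generating video with overlapping events, substantially improving temporal controllability.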

Abstract

Director-style prompting, robotic action prediction, and interactive video agents demand temporal grounding over concurrent events -- a regime in which 68% of general clips and over 99% of robotics/gameplay clips contain overlapping events, yet existing multi-event generators rest on a single-active-prompt assumption. However, modern video generators, such as Diffusion Transformers (DiT), represent time as discrete points through point-wise positional encodings. This formulation creates a fundamental dimension mismatch: temporally extended intervals and overlapping events are mathematically unrepresentable to the attention mechanism. In this paper, we propose Time Interval Encoding (TIE), a principled, plug-and-play interval-aware generalization of rotary embeddings that elevates time intervals to first-class primitives inside DiT cross-attention. Rather than introducing another heuristic interval embedding, we show that, within RoPE-compatible bilinear attention, TIE is characterized by two basic principles: Temporal Integrability, which requires an event to aggregate positional evidence over its full duration, and Duration Invariance, which removes the trivial bias toward longer intervals. Under a uniform kernel, this characterization yields an efficient closed-form sinc-based solution that preserves the standard attention interface and naturally attenuates boundary noise through interval integration. Empirically, TIE preserves the visual quality of the base DiT model while substantially improving temporal controllability. In our experiments on the OmniEvents dataset, it improves human-verified Temporal Constraint Satisfaction Rate from 77.34% to 96.03% and reduces temporal boundary error from 0.261s to 0.073s, while also improving trajectory-level temporal alignment metrics. The code and dataset are available at https://github.com/MatrixTeam-AI/TIE.
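
To see why a uniform kernel leads to a sinc term, consider a single rotary frequency ω. Averaging the point-wise phase exp(iωt) uniformly over an interval [a, b] gives exp(iω(a+b)/2) · sin(ω(b−a)/2) / (ω(b−a)/2). The sketch below checks this identity numerically; it illustrates only the underlying integral, not the paper's implementation, and all function names are hypothetical.

```python
import cmath
import math


def interval_phase(omega, start, end):
    """Closed-form average of exp(i*omega*t) for t uniform on [start, end]."""
    mid = 0.5 * (start + end)
    half_width_phase = 0.5 * omega * (end - start)
    if half_width_phase == 0:
        sinc = 1.0  # zero-length interval: plain point-wise rotation
    else:
        sinc = math.sin(half_width_phase) / half_width_phase
    return cmath.exp(1j * omega * mid) * sinc


def interval_phase_numeric(omega, start, end, steps=2000):
    """Midpoint-rule estimate of the same average, for comparison."""
    dt = (end - start) / steps
    total = sum(cmath.exp(1j * omega * (start + (i + 0.5) * dt))
                for i in range(steps))
    return total / steps
```

A zero-length interval recovers the ordinary point encoding, while longer intervals shrink the magnitude at high frequencies, which is consistent with the boundary-noise attenuation the abstract mentions.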

2603.18363 2026-05-26 cs.CL cs.AI cs.LG

PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

Ruishuo Chen, Yu Chen, Zhuoran Li, Longbo Huang

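AI summary: Proposes PowerFlow, a framework that reformulates unsupervised fine-tuning as a distribution matching problem. Using a GFlowNet with a length-aware Trajectory-Balance objective and targeting α-power distributions, it directionally elicits either logical reasoning or creativity in LLMs.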

Comments Camera-ready version accepted at ICML 2026

Abstract

Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which often lack a well-defined theoretical optimization target and are prone to degenerative biases. In this work, we introduce PowerFlow, a principled framework that reformulates unsupervised fine-tuning as a distribution matching problem. By casting GFlowNet as an amortized variational sampler for unnormalized densities, we propose a length-aware Trajectory-Balance objective that explicitly neutralizes the structural length biases inherent in autoregressive generation. By targeting $α$-power distributions, PowerFlow enables the directional elicitation of the dual nature of LLMs: sharpening the distribution ($α> 1$) to intensify logical reasoning, or flattening it ($α< 1$) to unlock expressive creativity. Extensive experiments demonstrate that PowerFlow consistently outperforms existing RLIF methods, matching or even exceeding supervised GRPO. Furthermore, by mitigating over-sharpening in aligned models, our approach achieves simultaneous gains in diversity and quality, shifting the Pareto frontier in creative tasks.
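
The effect of an α-power target can be seen on a toy categorical distribution, where the target is q(x) ∝ p(x)^α. This sketch only shows the sharpening and flattening behaviour; it does not involve GFlowNet training or sequence-level probabilities, and the function name and numbers are hypothetical.

```python
def power_distribution(probs, alpha):
    """Return q with q[i] proportional to probs[i] ** alpha."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


base = [0.5, 0.25, 0.25]
sharpened = power_distribution(base, 2.0)  # alpha > 1: mass moves toward the mode
flattened = power_distribution(base, 0.5)  # alpha < 1: mass moves toward uniform
```

For this `base`, the most likely outcome rises from 0.5 to about 0.667 under α = 2 and drops to about 0.414 under α = 0.5.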

2509.13389 2026-05-26 cs.AI

From Next Token Prediction to (STRIPS) World Models

Carlos Núñez-Molina, Vicenç Gómez, Hector Geffner

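AI summary: Studies whether next-token prediction can yield world models that support planning, introducing two architectures, the STRIPS Transformer and a standard transformer, and evaluating training accuracy, generalization, and planning performance on five classical planning domains.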

Abstract

We study whether next-token prediction can yield world models that truly support planning, in a controlled symbolic setting where propositional STRIPS action models are learned from action traces alone and correctness can be evaluated exactly. We introduce two architectures. The first is the STRIPS Transformer, a symbolically aligned model grounded in theoretical results linking transformers and the formal language structure of STRIPS domains. The second is a standard transformer architecture without explicit symbolic structure built in, for which we study different positional encoding schemes and attention aggregation mechanisms. We evaluate both architectures on five classical planning domains, measuring training accuracy, generalization, and planning performance across domains and problem sizes. Interestingly, both approaches can be used to produce models that support planning with off-the-shelf STRIPS planners over exponentially many unseen initial states and goals. Although the STRIPS Transformer incorporates a strong symbolic inductive bias, it is harder to optimize and requires larger datasets to generalize reliably. In contrast, a standard transformer with stick-breaking attention achieves near-perfect training accuracy and strong generalization. Finally, standard transformers without stick-breaking attention do not generalize to long traces, whereas a symbolic STRIPS model extracted from a transformer trained on shorter traces does.

2602.05448 2026-05-26 cs.LG

BlitzRank: Principled Zero-shot Ranking Agents with Tournament Graphs

Sheshansh Agrawal, Thien Hang Nguyen, Douwe Kiela

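AI summary: Proposes a k-wise ranking method built on a tournament graph framework. By aggregating comparisons into a global preference graph and computing its transitive closure, it identifies the top-m items with fewer oracle calls and uses 25-40% fewer tokens than comparable methods in LLM reranking.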

Comments ICML 2026 spotlight

Abstract

Selecting the top $m$ from $n$ items via expensive $k$-wise comparisons is central to settings ranging from LLM-based document reranking to crowdsourced evaluation and tournament design. Existing methods either rely on heuristics that discard comparison information, or exploit it at prohibitive cost. We introduce a tournament graph framework that provides a principled foundation for $k$-wise ranking. Our key observation is that each $k$-item comparison reveals an induced tournament of $\binom{k}{2}$ pairwise preferences; aggregating these into a global preference graph and computing its transitive closure yields many additional orderings without further oracle calls. We formalize when the current top-$m$ output is certifiably determined and design a greedy query schedule that maximizes information gain towards identifying the top-$m$ items. The framework also gracefully handles non-transitive preferences -- cycles induced by real-world oracles -- by collapsing them into equivalence classes that yield principled tiered rankings. Applied to LLM reranking across 14 benchmarks and 5 models, BlitzRank achieves Pareto dominance over existing approaches: matching or exceeding accuracy while requiring 25--40% fewer tokens than comparable methods; against pairwise reranking, it achieves near-identical quality with 7$\times$ fewer tokens. Code available at https://github.com/ContextualAI/BlitzRank.
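
The key observation can be illustrated with a small sketch: each ranked list of k items contributes C(k, 2) pairwise edges, and the transitive closure of the merged graph yields orderings that were never queried. This is not the BlitzRank code from the linked repository; the function names are hypothetical, and the sketch assumes acyclic preferences, so it omits the cycle-collapsing step.

```python
from itertools import combinations


def edges_from_ranking(ranking):
    """A k-item ranking (best first) implies C(k, 2) pairwise preferences."""
    return set(combinations(ranking, 2))


def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Two 3-wise comparisons that share the item "c".
direct = edges_from_ranking(["a", "b", "c"]) | edges_from_ranking(["c", "d", "e"])
inferred = transitive_closure(direct) - direct
```

Two oracle calls give 6 direct preferences, and the closure adds 4 more, so all 10 pairs among the five items are ordered without a third call.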

2602.20191 2026-05-26 cs.LG cs.AI cs.CL

MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Any-Precision LLM

Dongwei Wang, Jinhee Kim, Seokho Han, Denis Gudovskiy, Yohei Nakata, Tomoyuki Okuno, KhayTze Peong, Kang Eun Jeon, Jong Hwan Ko, Yiran Chen, Huanrui Yang

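AI summary: To address the poor generalization of any-precision LLM quantization under dynamic runtime constraints, proposes MoBiQuant, a Mixture-of-Bits quantization framework driven by token sensitivity. It combines many-in-one recursive residual quantization with a token-aware router for flexible inference, matching or surpassing frontier single-precision PTQ while saving memory and raising throughput.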

Comments 20 pages, 10 figures

Abstract

Dynamic runtime latency and memory constraints necessitate flexible large language model (LLM) deployment, where an LLM can be inferred with various quantization precisions based on available computational resources. Recent work on such any-precision quantization either relies on hardware-inefficient vector quantization or induces additional scaling factors when switching between bit-widths. Meanwhile, existing post-training quantization (PTQ) methods calibrated for a fixed low precision show poor generalizability under runtime precision change. In this work, we attribute the source of poor generalization across bit-widths to a precision-dependent \textit{outlier migration} phenomenon where the distribution of PTQ-sensitive tokens changes across precisions. Motivated by this observation, we propose \texttt{MoBiQuant}, a novel any-precision Mixture-of-Bits quantization framework that adjusts weight precision for flexible LLM inference based on token sensitivity. Specifically, we propose a many-in-one recursive residual quantization that can iteratively reconstruct higher-precision weights at runtime and mitigates \textit{outlier migration} with a token-aware router to dynamically select the optimal inference precision of each token. Extensive experiments show that \texttt{MoBiQuant} matches or surpasses frontier single-precision PTQ while exhibiting strong elasticity, achieving significant memory savings and throughput gains of up to $1.34\times$ over state-of-the-art any-precision methods.
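
The general idea of residual quantization, where one stored set of parts serves several precisions, can be sketched on a single scalar weight. This is a generic illustration under loud assumptions: a uniform scalar quantizer, a step that shrinks by 4x per stage (two extra bits), and hypothetical function names. The paper's actual quantizer, bit allocation, and token-aware router are not reproduced here.

```python
def quantize(value, step):
    """Round to the nearest multiple of `step` (uniform quantizer)."""
    return round(value / step) * step


def residual_stages(weight, first_step=0.5, stages=3, shrink=4.0):
    """Quantize the weight, then repeatedly quantize what is left over."""
    parts, residual, step = [], weight, first_step
    for _ in range(stages):
        q = quantize(residual, step)
        parts.append(q)
        residual -= q
        step /= shrink  # each later stage uses a finer grid
    return parts


weight = 0.8137
parts = residual_stages(weight)
low = parts[0]     # coarsest reconstruction, uses the first stage only
high = sum(parts)  # finer reconstruction from the same stored parts
```

Summing more parts gives a closer reconstruction, so a runtime component could choose how many stages to add per token instead of storing a separate model per bit-width.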

2602.17162 2026-05-26 cs.AI q-bio.GN

JEPA-DNA: Grounding Genomic Foundation Models through Joint-Embedding Predictive Architectures

Ariel Larey, Elay Dahan, Amit Bleiweiss, Raizy Kellerman, Guy Leib, Omri Nayshool, Dan Ofer, Tal Zinger, Dan Dominissini, Gideon Rechavi, Nicole Bussola, Simon Lee, Shane O'Connell, Dung Hoang, Marissa Wirth, Alexander W. Charney, Nati Daniel, Yoli Shavit

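AI summary: Proposes JEPA-DNA, a framework that combines a Joint-Embedding Predictive Architecture with generative objectives. By supervising global sequence embeddings in a latent space, it shifts the learning signal from token recovery to semantic alignment and improves linear probing and zero-shot performance on 17 genomic benchmark tasks, setting a new state of the art.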

Abstract

Genomic Foundation Models (GFMs) typically rely on Masked Language Modeling (MLM) or Next-Token Prediction (NTP) to learn the "Laws of Nature". While effective at capturing local syntax, these generative paradigms prioritize token-level reconstruction over high-level functional context. We introduce JEPA-DNA, a model-agnostic continual training framework that integrates a Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. By supervising global sequence embeddings in a latent space, JEPA-DNA forces models to predict the functional representations of masked genomic segments, shifting the learning signal from token recovery to semantic alignment. We evaluate JEPA-DNA on 17 diverse genomic benchmark tasks, demonstrating consistent gains in linear probing and zero-shot performance regardless of the underlying GFM architecture or generative objective. Our framework establishes a new state-of-the-art for GFMs, surpassing the best existing models by bridging generative precision with latent semantic grounding. Through extensive ablation studies, we further characterize the synergistic interplay between generative and latent objectives. Our code is publicly available at https://github.com/NVIDIA-Digital-Bio/JEPA-DNA.

2601.20539 2026-05-26 cs.AI cs.CL

PathWise: Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs

Oguzhan Gungordu, Siheng Xiong, Faramarz Fekri

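AI summary: Proposes PathWise, a multi-agent reasoning framework that models heuristic generation as a sequential decision process over an entailment graph. A policy agent, a world model agent, and critic agents cooperate to achieve state-aware planning, converging faster and generalizing better on combinatorial optimization problems.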

Comments Accepted to ICML 2026

Abstract

Large Language Models (LLMs) have enabled automated heuristic design (AHD) for combinatorial optimization problems (COPs), but existing frameworks' reliance on fixed evolutionary rules and static prompt templates often leads to myopic heuristic generation, redundant evaluations, and limited reasoning about how new heuristics should be derived. We propose a novel multi-agent reasoning framework, referred to as Planning through World Model for Automated Heuristic Design via Self-Evolving LLMs (PathWise), which formulates heuristic generation as a sequential decision process over an entailment graph serving as a compact, stateful memory of the search trajectory. This approach allows the system to carry forward past decisions and reuse or avoid derivation information across generations. A policy agent plans evolutionary actions, a world model agent generates heuristic rollouts conditioned on those actions, and critic agents provide routed reflections summarizing lessons from prior steps, shifting LLM-based AHD from trial-and-error evolution toward state-aware planning through reasoning. Experiments across diverse COPs show that PathWise converges faster to better heuristics, generalizes across different LLM backbones, and scales to larger problem sizes.

2512.12677 2026-05-26 cs.CL cs.AI

Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

Amirhossein Yousefiramandi, Ciaran Cooney

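AI summary: Explores fine-tuning decoder-only LLMs for text classification under resource constraints, comparing an embedding-based classification head with instruction-tuning and using 4-bit quantization with LoRA for efficient training. The embedding-head approach matches or exceeds fine-tuned BERT baselines on single-label classification, while instruction-tuning is competitive only in the multi-label setting with a large trainable parameter budget.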

Comments 20 pages, 5 figures

Abstract

We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pretrained causal LLM and fine-tuning it on the task, using the LLM's final-token embedding as a sequence representation, and (2) instruction-tuning the LLM in a prompt-to-response format for classification. To enable single-GPU fine-tuning of models up to 8B parameters, we combine 4-bit model quantization with Low-Rank Adaptation (LoRA) for parameter-efficient training. Experiments on two patent benchmarks, a 5-class single-label internal corpus and the public WIPO-Alpha multi-label dataset with 14 categories, show that the embedding-head approach matches or exceeds fine-tuned BERT baselines on single-label classification while training 10-30x fewer parameters. Instruction-tuning is competitive only in the multi-label regime, and only with substantially larger trainable budgets of at least 100M parameters. These results demonstrate that directly leveraging the internal representations of causal LLMs, together with efficient fine-tuning techniques, yields strong classification performance under limited computational resources. We discuss the advantages of each approach and outline practical guidelines and future directions for optimizing LLM fine-tuning in classification scenarios.

2512.05402 2026-05-26 cs.LG cs.AI cs.CE cs.NE

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction

Sithumi Wickramasinghe, Bikramjit Das, Dorien Herremans

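AI summary: Proposes MineROI-Net, a Transformer-based deep learning framework that formulates Bitcoin ASIC hardware acquisition as a time series classification task predicting the return-on-investment class within one year. It reaches 83.2% accuracy and an 83.5% macro F1-score on data from 20 ASIC miners released between 2015 and 2024.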

Abstract

Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a capital-intensive industry, there is little guidance on when to purchase new Application-Specific Integrated Circuit (ASIC) hardware, and no prior computational frameworks address this decision problem. We address this gap by formulating hardware acquisition as a time series classification task, predicting whether purchasing ASIC machines yields profitable (Return on Investment (ROI) >= 1), marginal (0 < ROI < 1), or unprofitable (ROI <= 0) returns within one year. We propose MineROI-Net, an open-source Transformer-based architecture designed to capture multi-scale temporal patterns in mining profitability. Evaluated on data from 20 ASIC miners released between 2015 and 2024 across diverse market regimes, MineROI-Net outperforms recurrent, convolutional, and attention-based baselines, achieving 83.2% accuracy and 83.5% macro F1-score. The model demonstrates strong economic relevance, achieving 97.8% precision in detecting unprofitable periods and 81.5% precision in detecting profitable ones, while avoiding misclassifying profitable scenarios as unprofitable and vice versa. These results indicate that MineROI-Net offers a practical, data-driven tool for timing mining hardware acquisitions, potentially reducing financial risk in capital-intensive mining operations.
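
The three target classes follow directly from the ROI thresholds given in the abstract, and a labeling helper makes the boundaries explicit. The function name is hypothetical, and how the one-year ROI itself is computed is not specified in the abstract, so it is taken as an input here.

```python
def roi_class(roi):
    """Map a one-year ROI to the three classes named in the abstract."""
    if roi >= 1:
        return "profitable"
    if roi > 0:
        return "marginal"
    return "unprofitable"
```

Note that the boundaries are asymmetric: an ROI of exactly 1 counts as profitable, while an ROI of exactly 0 counts as unprofitable.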

2509.23413 2026-05-26 cs.LG

URS: A Unified Neural Routing Solver for Cross-Problem Zero-Shot Generalization

Changliang Zhou, Canhong Yu, Shunyu Yao, Xi Lin, Zhenkun Wang, Yu Zhou, Qingfu Zhang

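AI summary: Proposes URS, a unified neural routing solver that uses a unified data representation and a Mixed Bias Module so that a single model generalizes zero-shot across 110 vehicle routing problem variants (including 99 unseen variants) and scales to instances with up to 7000 nodes.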

Comments accepted by ICML 2026

英文摘要

Multi-task neural routing solvers have emerged as a promising paradigm for their ability to solve multiple vehicle routing problems (VRPs) using a single model. However, existing neural solvers typically rely on predefined problem constraints or require per-problem fine-tuning, which substantially limits their zero-shot generalization ability to unseen VRP variants. To address this critical bottleneck, we propose URS, a unified neural routing solver that achieves zero-shot generalization across a wide range of unseen VRPs with a single model. We propose a unified data representation (UDR) that replaces problem enumeration with data unification, thereby broadening the problem coverage and reducing reliance on domain expertise. In addition, we introduce a Mixed Bias Module (MBM) during encoding to improve node embeddings, which efficiently captures multiple priors inherent to various problems. On top of the UDR, we develop a problem-conditioned parameter generator to further improve zero-shot generalization. Extensive experiments show that URS consistently produces high-quality solutions for 110 VRP variants (including 99 unseen variants) while demonstrating impressive scalability to large-scale instances with up to 7000 nodes. To the best of our knowledge, URS is the first neural solver to handle over 100 VRP variants with a single model. Our code is available at https://github.com/CIAM-Group/URS.

2505.20110 2026-05-26 cs.LG cs.AI

Beyond the Proxy: Trajectory-Distilled Guidance for Offline GFlowNet Training

超越代理:用于离线GFlowNet训练的轨迹蒸馏指导

Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang

AI总结 提出轨迹蒸馏GFlowNet(TD-GFN),利用逆强化学习从离线轨迹中提取稠密边奖励,通过DAG剪枝和优先反向采样指导策略,避免代理模型,提升离线GFlowNet训练的收敛速度和样本质量。

Comments Camera-ready version accepted at ICML 2026

详情
AI中文摘要

生成流网络(GFlowNets)擅长采样多样化的高奖励对象。在许多实际应用中,由于无法进行主动奖励查询,这些模型必须使用静态离线数据集进行训练。主流的训练方法通常依赖代理模型为在线采样的轨迹提供奖励反馈。然而,由于数据稀缺或评估成本高,构建可靠的代理往往具有挑战性。虽然现有的无代理方法试图解决这一问题,但它们通常施加粗糙的约束,限制了模型有效探索的能力。为了克服这些限制,我们提出了轨迹蒸馏GFlowNet(TD-GFN),一种新颖的无代理训练框架。TD-GFN利用逆强化学习(IRL)从离线轨迹中提取稠密的、转移级别的边奖励,为高效探索提供丰富的结构指导。关键的是,为了确保鲁棒性,这些奖励通过DAG剪枝和优先反向采样间接指导策略。这种设计确保梯度更新仅依赖于数据集中的真实终端奖励,从而防止错误传播。实验结果表明,TD-GFN在收敛速度和样本质量上显著优于广泛的现有基线,为离线GFlowNet训练建立了更鲁棒和高效的范式。

英文摘要

Generative Flow Networks (GFlowNets) excel at sampling diverse, high-reward objects. In many practical applications where active reward queries are infeasible, these models must be trained using static offline datasets. Prevailing training methods typically rely on a proxy model to provide reward feedback for online sampled trajectories. However, constructing a reliable proxy is often challenging due to data scarcity or high evaluation costs. While existing proxy-free approaches attempt to address this, they often impose coarse constraints that limit the model's ability to explore effectively. To overcome these limitations, we propose Trajectory-Distilled GFlowNet (TD-GFN), a novel proxy-free training framework. TD-GFN utilizes inverse reinforcement learning (IRL) to extract dense, transition-level edge rewards from offline trajectories, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards guide the policy indirectly through DAG pruning and prioritized backward sampling. This design ensures that gradient updates rely exclusively on ground-truth terminal rewards from the dataset, thereby preventing error propagation. Empirical results demonstrate that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.

2509.15543 2026-05-26 cs.LG

Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noise

重尾噪声下的非凸分布式随机双层优化

Xinwen Zhang, Yihan Zhang, Heng Liang, Hongchang Gao

AI总结 针对重尾噪声下的非凸双层优化问题,提出一种无需裁剪的归一化随机方差缩减梯度下降算法,并首次给出严格收敛性证明。

详情
AI中文摘要

现有的分布式随机优化方法假设下层损失函数是强凸的且随机梯度噪声具有有限方差,这些强假设在现实机器学习模型中通常不满足。例如,语言数据上的学习通常导致重尾梯度。为了解决这些局限性,我们针对重尾噪声下的非凸双层优化问题,开发了一种新颖的分布式随机双层优化算法。具体地,我们提出了一种归一化随机方差缩减双层梯度下降算法,该算法不依赖于任何裁剪操作。此外,通过创新性地在重尾噪声下对非凸分布式双层优化问题中的相互依赖梯度序列进行界定的方法,我们建立了其收敛速率。据我们所知,这是第一个在重尾噪声下具有严格理论保证的分布式双层优化算法。大量的实验结果证实了我们的算法在处理重尾噪声方面的有效性。

英文摘要

Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions are typically not satisfied in real-world machine learning models. For example, learning on language data typically leads to heavy-tailed gradients. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for the nonconvex bilevel optimization problem under heavy-tailed noise. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noise for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noise. The extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noise.
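To illustrate why normalization can stand in for clipping under heavy-tailed noise, here is a minimal sketch of a single normalized gradient step. It is not the paper's algorithm: there is no variance reduction, no bilevel structure, and no decentralized communication, and the names `normalized_step`, `lr`, and `eps` are assumptions made for this example.

```python
import numpy as np


def normalized_step(x, grad_estimate, lr, eps=1e-12):
    """One normalized update: move a distance of at most `lr` along -grad.

    Because the gradient estimate is divided by its own norm, a single
    extreme (heavy-tailed) sample cannot produce an arbitrarily large step,
    and no clipping threshold has to be tuned.
    """
    grad_estimate = np.asarray(grad_estimate, dtype=float)
    norm = np.linalg.norm(grad_estimate)
    return np.asarray(x, dtype=float) - lr * grad_estimate / (norm + eps)


if __name__ == "__main__":
    x = np.zeros(2)
    huge_gradient = np.array([1e30, -1e30])  # stands in for an extreme noise draw
    x_new = normalized_step(x, huge_gradient, lr=0.1)
    print(np.linalg.norm(x_new - x))  # step length stays bounded by lr
```

The design point is that the step length depends only on `lr`, not on the magnitude of the noisy gradient, so only the direction of the estimate influences the update.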

2509.10452 2026-05-26 cs.CL cs.LG

WhisTLE: Deeply Supervised, Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers

WhisTLE: 深度监督的文本领域自适应方法用于预训练语音识别Transformer

Akshat Pandey, Karun Kumar, Raphael Tang

AI总结 提出WhisTLE,一种通过变分自编码器建模文本到编码器输出并微调解码器的文本领域自适应方法,显著降低词错误率。

Comments 10 pages

详情
AI中文摘要

预训练的自动语音识别(ASR)模型(如Whisper)表现良好,但仍需领域自适应以处理未见过的用语。在许多实际场景中,收集语音数据不切实际,因此需要仅文本的自适应。我们提出WhisTLE,一种用于预训练编码器-解码器ASR模型的深度监督文本自适应方法。WhisTLE训练一个变分自编码器(VAE)从文本建模编码器输出,并使用学习到的文本到潜在编码器微调解码器,可选地与文本到语音(TTS)自适应结合。在推理时,恢复原始编码器,不产生额外运行时成本。在四个数据集和四个ASR模型上,带有TTS的WhisTLE相对降低了49.0%的词错误率(WER),并在112个场景中的100个中优于所有非WhisTLE基线。我们还发现WhisTLE与任何其他领域自适应方法的组合都能互补增强;因此我们建议在标准流程中纳入WhisTLE以自适应编码器-解码器ASR模型。

英文摘要

Pretrained automatic speech recognition (ASR) models such as Whisper perform well but still need domain adaptation to handle unseen parlance. In many real-world settings, collecting speech data is impractical, necessitating text-only adaptation. We propose WhisTLE, a deeply supervised, text-only adaptation method for pretrained encoder-decoder ASR models. WhisTLE trains a variational autoencoder (VAE) to model encoder outputs from text and fine-tunes the decoder using the learned text-to-latent encoder, optionally combined with text-to-speech (TTS) adaptation. At inference, the original encoder is restored, incurring no extra runtime cost. Across four datasets and four ASR models, WhisTLE with TTS reduces word error rate (WER) by a relative 49.0% and outperforms all non-WhisTLE baselines in 100 of 112 scenarios. We also find that WhisTLE additively complements any combination of other domain adaptation approaches; we thus recommend the inclusion of WhisTLE during standard processes for adapting encoder-decoder ASR models.
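The headline figure in this abstract is a relative, not absolute, reduction in word error rate. As a reminder of what that means, the sketch below computes the relative reduction from a baseline and an adapted WER. The function name and the example numbers are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return (baseline - adapted) / baseline, the relative WER reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical values: a baseline WER of 20.0% adapted down to 10.2%.
    reduction = relative_wer_reduction(20.0, 10.2)
    print(f"{100 * reduction:.1f}% relative reduction")
```

With these made-up inputs the absolute drop is 9.8 points, while the relative reduction is 49%, which is why the two should not be confused when comparing adaptation methods.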

2509.02113 2026-05-26 cs.LG cs.AI cs.CR cs.SI

HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis

HiGraph:用于恶意软件分析的大规模层次图数据集

Han Chen, Hanchen Wang, Hongmei Chen, Ying Zhang, Lu Qin, Wenjie Zhang

AI总结 针对现有图方法忽略软件层次结构的问题,提出包含2亿控制流图和59.5万函数调用图的大规模层次图数据集HiGraph,用于构建抗混淆和演化的鲁棒恶意软件检测器。

Comments updated dataset statistics

详情
AI中文摘要

基于图的恶意软件分析的进展受到缺乏捕捉软件固有层次结构的大规模数据集的严重限制。现有方法通常将程序简化为单层图,未能建模高层功能交互与低层指令逻辑之间的关键语义关系。为填补这一空白,我们引入了HiGraph,这是用于恶意软件分析的最大公开层次图数据集,包含嵌套在595K个函数调用图(FCG)中的超过2亿个控制流图(CFG)。这种两层表示保留了构建对代码混淆和恶意软件演化具有鲁棒性的检测器所必需的结构语义。我们通过大规模分析展示了HiGraph的实用性,揭示了良性软件和恶意软件的不同结构特性,将其确立为社区的基础基准。数据集和工具可在https://higraph.org公开获取。

英文摘要

The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single-level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce HiGraph, the largest public hierarchical graph dataset for malware analysis, comprising over 200M Control Flow Graphs (CFGs) nested within 595K Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.

2405.01906 2026-05-26 cs.AI cs.LG cs.NE

Instance-Conditioned Adaptation for Large-scale Generalization of Neural Routing Solver

实例条件适应:神经路由求解器的大规模泛化

Changliang Zhou, Xi Lin, Zhenkun Wang, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang

AI总结 提出实例条件适应模型(ICAM),通过简单高效的实例条件适应函数和低复杂度的适应模块,显著提升神经路由求解器在大规模旅行商问题(TSP)、容量车辆路径问题(CVRP)和非对称旅行商问题(ATSP)上的泛化性能,同时保持快速推理速度。

Comments 13 pages, 5 figures

详情
Journal ref
IEEE Transactions on Intelligent Transportation Systems, 2026
AI中文摘要

神经组合优化(NCO)方法在无需专家知识的情况下,展现出了解决智能交通系统路由问题的巨大潜力。然而,现有的构造性NCO方法仍难以解决大规模实例,这严重限制了其应用前景。为了解决这些关键缺陷,本文提出了一种新颖的实例条件适应模型(ICAM),以实现神经路由求解器更好的大规模泛化。特别地,我们设计了一个简单而高效的实例条件适应函数,以较小的时空开销显著提升现有NCO模型的泛化性能。此外,通过对不同注意力机制之间信息融合性能的系统研究,我们进一步提出了一个强大且低复杂度的实例条件适应模块,为不同规模的实例生成更好的解。在合成实例和基准实例上的大量实验结果表明,我们提出的方法能够在解决大规模旅行商问题(TSP)、容量车辆路径问题(CVRP)和非对称旅行商问题(ATSP)时,以非常快的推理时间获得有希望的结果。我们的代码可在 https://github.com/CIAM-Group/ICAM 获取。

英文摘要

The neural combinatorial optimization (NCO) method has shown great potential for solving routing problems of intelligent transportation systems without requiring expert knowledge. However, existing constructive NCO methods still struggle to solve large-scale instances, which significantly limits their application prospects. To address these crucial shortcomings, this work proposes a novel Instance-Conditioned Adaptation Model (ICAM) for better large-scale generalization of neural routing solvers. In particular, we design a simple yet efficient instance-conditioned adaptation function to significantly improve the generalization performance of existing NCO models with a small time and memory overhead. In addition, with a systematic investigation on the performance of information incorporation between different attention mechanisms, we further propose a powerful yet low-complexity instance-conditioned adaptation module to generate better solutions for instances across different scales. Extensive experimental results on both synthetic and benchmark instances show that our proposed method is capable of obtaining promising results with a very fast inference time in solving large-scale Traveling Salesman Problems (TSPs), Capacitated Vehicle Routing Problems (CVRPs), and Asymmetric Traveling Salesman Problems (ATSPs). Our code is available at https://github.com/CIAM-Group/ICAM.

2412.15678 2026-05-26 cs.CV

Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network

多对时序句子定位的多线程知识迁移网络

Xiang Fang, Wanlong Fang, Changshuo Wang, Daizong Liu, Keke Tang, Jianfeng Dong, Pan Zhou, Beibei Li

AI总结 提出多对时序句子定位新任务,并设计多线程知识迁移网络,通过跨模态对比、原型对齐和自适应负样本选择实现多对视频-查询对的协同训练。

Comments Accepted by AAAI 2025

详情
AI中文摘要

给定一些包含未修剪视频和句子查询的视频-查询对,时序句子定位(TSG)旨在定位这些视频中与查询相关的片段。尽管先前优秀的TSG方法取得了显著成功,但它们单独训练每个视频-查询对,忽略了不同对之间的关系。我们观察到,相似的视频/查询内容不仅有助于TSG模型更好地理解和泛化跨模态表示,还能帮助模型定位一些复杂的视频-查询对。先前的方法遵循单线程框架,无法共同训练不同的对,并且通常花费大量时间重新获取冗余知识,限制了其实际应用。为此,在本文中,我们提出了一种全新的设置:多对TSG,旨在共同训练这些对。特别地,我们提出了一种新颖的视频-查询共同训练方法,即多线程知识迁移网络,以有效且高效地定位各种视频-查询对。首先,我们挖掘不同查询之间的空间和时间语义以相互协作。为了同时学习模态内和模态间表示,我们设计了一个跨模态对比模块,通过自监督策略探索语义一致性。为了充分对齐不同对之间的视觉和文本表示,我们设计了一种原型对齐策略,以1)匹配对象原型和短语原型以实现空间对齐,以及2)对齐活动原型和句子原型以实现时间对齐。最后,我们开发了一个自适应负样本选择模块,以自适应地生成跨模态匹配的阈值。大量实验表明了我们提出方法的有效性和效率。

英文摘要

Given some video-query pairs with untrimmed videos and sentence queries, temporal sentence grounding (TSG) aims to locate query-relevant segments in these videos. Although previous TSG methods have achieved remarkable success, they train each video-query pair separately and ignore the relationship between different pairs. We observe that similar video/query content not only helps the TSG model better understand and generalize the cross-modal representation but also assists the model in locating some complex video-query pairs. Previous methods follow a single-thread framework that cannot co-train different pairs and usually spend much time re-obtaining redundant knowledge, limiting their real-world applications. To this end, in this paper, we pose a brand-new setting: Multi-Pair TSG, which aims to co-train these pairs. In particular, we propose a novel video-query co-training approach, Multi-Thread Knowledge Transfer Network, to locate a variety of video-query pairs effectively and efficiently. Firstly, we mine the spatial and temporal semantics across different queries so that they cooperate with each other. To learn intra- and inter-modal representations simultaneously, we design a cross-modal contrast module to explore the semantic consistency by a self-supervised strategy. To fully align visual and textual representations between different pairs, we design a prototype alignment strategy to 1) match object prototypes and phrase prototypes for spatial alignment, and 2) align activity prototypes and sentence prototypes for temporal alignment. Finally, we develop an adaptive negative selection module to adaptively generate a threshold for cross-modal matching. Extensive experiments show the effectiveness and efficiency of our proposed method.

2412.06284 2026-05-26 cs.CV

Your Data Is Not Perfect: Towards Cross-Domain Out-of-Distribution Detection in Class-Imbalanced Data

你的数据并不完美:面向类别不平衡数据中的跨域分布外检测

Xiang Fang, Arvind Easwaran, Blaise Genest, Ponnuthurai Nagaratnam Suganthan

AI总结 针对跨域类别不平衡的分布外检测问题,提出基于原型对齐的不确定性感知自适应语义对齐网络(UASA),通过标签驱动原型、自适应阈值和不确定性感知聚类缩小域间隙、语义间隙和类别不平衡间隙。

Comments Accepted by Expert Systems with Applications

详情
AI中文摘要

以往的OOD检测系统只关注ID和OOD样本之间的语义差距。除了语义差距,我们还面临两个额外的差距:源域和目标域之间的域差距,以及不同类别之间的类别不平衡差距。事实上,来自不同域的相似对象应该属于同一类别。在本文中,我们引入了一个现实且具有挑战性的设置:类别不平衡的跨域OOD检测(CCOD),该设置包含一个标注良好(但通常较小)的源集用于训练,并在一个未标注(但通常较大)的目标集上进行OOD检测。我们不假设目标域仅包含OOD类别或类别平衡:目标数据集的类别分布不必与源数据集相同。为了应对这一具有挑战性的设置,我们提出了一种基于原型对齐策略的新型不确定性感知自适应语义对齐网络(UASA)。具体来说,我们首先在源域中构建标签驱动的原型,并利用这些原型进行目标分类以缩小域差距。我们不是使用固定阈值进行OOD检测,而是生成自适应样本级阈值来处理语义差距。最后,我们进行不确定性感知聚类,将语义相似的目标样本分组,以缓解类别不平衡差距。在三个具有挑战性的基准上的大量实验表明,我们提出的UASA以较大优势优于最先进的方法。

英文摘要

Previous OOD detection systems only focus on the semantic gap between ID and OOD samples. Besides the semantic gap, we are faced with two additional gaps: the domain gap between source and target domains, and the class-imbalance gap between different classes. In fact, similar objects from different domains should belong to the same class. In this paper, we introduce a realistic yet challenging setting: class-imbalanced cross-domain OOD detection (CCOD), which contains a well-labeled (but usually small) source set for training and conducts OOD detection on an unlabeled (but usually larger) target set for testing. We do not assume that the target domain contains only OOD classes or that it is class-balanced: the distribution among classes of the target dataset need not be the same as the source dataset. To tackle this challenging setting with an OOD detection system, we propose a novel uncertainty-aware adaptive semantic alignment (UASA) network based on a prototype-based alignment strategy. Specifically, we first build label-driven prototypes in the source domain and utilize these prototypes for target classification to close the domain gap. Rather than utilizing fixed thresholds for OOD detection, we generate adaptive sample-wise thresholds to handle the semantic gap. Finally, we conduct uncertainty-aware clustering to group semantically similar target samples to relieve the class-imbalance gap. Extensive experiments on three challenging benchmarks demonstrate that our proposed UASA outperforms state-of-the-art methods by a large margin.

2409.02416 2026-05-26 cs.LG stat.ML

Relative Translation Invariant Wasserstein Distance

相对平移不变Wasserstein距离

Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

AI总结 受Bures距离启发,提出相对平移不变Wasserstein距离RW_p,证明其度量性质,并设计双层算法计算离散分布间的RW_p距离,当p=2时提出RW_2-LP和RW_2-Sinkhorn算法以提高数值稳定性,实验验证了算法在减少数值误差和实际雷暴模式检索中的有效性。

Comments Accepted by Transactions on Machine Learning Research (TMLR). Final accepted version. The implementation is publicly available at https://github.com/DRKWang/rw_metric

详情
AI中文摘要

受Bures距离启发,我们引入了一类新的距离族:相对平移不变Wasserstein距离,记为$RW_p$,作为经典Wasserstein距离$W_p$($p \in [1, +\infty)$)的推广。我们证明了$RW_p$定义了一个有效的度量,并表明这类度量比经典Wasserstein距离更具内在性。设计了一种双层算法来计算任意离散分布之间的一般$RW_p$距离。此外,当$p=2$时,我们证明在离散设定下最优耦合矩阵在分布平移下不变,并进一步提出了两种算法,即$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法,以提高计算$W_2$距离和最优耦合矩阵解的数值稳定性。最后,我们进行了三个实验来验证我们的理论结果和算法。前两个实验报告了$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法(无论是否归一化)相比标准算法能显著减少数值误差。第三个实验表明$RW_p$算法在计算上具有可扩展性,并适用于实际应用中相似雷暴模式的检索。

英文摘要

Motivated by the Bures distance, we introduce a new family of distances, relative translation invariant Wasserstein distances, denoted by $RW_p$, as an extension of the classical Wasserstein distances $W_p$ for $p \in [1, +\infty)$. We establish that $RW_p$ defines a valid metric and demonstrate that this type of metric is more intrinsic than the classical Wasserstein distance. A bi-level algorithm is designed to compute the general $RW_p$ distance between arbitrary discrete distributions. Moreover, when $p = 2$, we show that the optimal coupling matrix is invariant under distributional translation in the discrete setting, and we further propose two algorithms, the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, to improve the numerical stability of computing $W_2$ distance and the optimal coupling matrix solutions. Finally, we conduct three experiments to validate our theoretical results and algorithms. The first two experiments report that the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, both with and without normalization, can significantly reduce the numerical errors compared to standard algorithms. The third experiment shows that $RW_p$ algorithms are computationally scalable and applicable to the retrieval of similar thunderstorm patterns in practical applications.
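The translation invariance for $p = 2$ can be checked numerically in a toy setting. The sketch below assumes that $RW_2$ is the smallest $W_2$ distance over all translations of one distribution (an assumption about the definition, which the abstract does not spell out) and restricts itself to one-dimensional empirical distributions with equally many, equally weighted points, where the optimal coupling simply pairs sorted samples. Under that assumption the best translation aligns the two means, so $W_2^2$ splits into the squared mean difference plus $RW_2^2$. The function names are hypothetical, and this is not the paper's $\mathrm{RW}_2$-LP or $\mathrm{RW}_2$-Sinkhorn algorithm.

```python
import numpy as np


def w2_squared_1d(x, y):
    """Squared W_2 between two equal-size, equal-weight 1D samples.

    In one dimension the optimal coupling pairs the sorted points.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same number of points")
    return float(np.mean((x - y) ** 2))


def rw2_squared_1d(x, y):
    """Squared RW_2 under the assumed definition: W_2 after centering both samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return w2_squared_1d(x - x.mean(), y - y.mean())


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 3.0])
    y = np.array([2.0, 2.5, 6.0])
    mean_gap_sq = (x.mean() - y.mean()) ** 2
    # W_2^2 equals the squared mean gap plus RW_2^2, and RW_2 ignores shifts of y.
    print(w2_squared_1d(x, y), mean_gap_sq + rw2_squared_1d(x, y))
    print(rw2_squared_1d(x, y), rw2_squared_1d(x, y + 10.0))
```

Removing the mean gap before solving the transport problem is also one plausible reading of why centering helps numerical stability: the cost entries stay small even when the two distributions are far apart.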

2605.26100 2026-05-26 cs.SE cs.AI

Beyond Summaries: Structure-Aware Labeling of Code Changes with Large Language Models

超越摘要:基于结构感知的代码变更标注与大型语言模型

Bar Weiss, Antonio Abu-Nassar, Adi Sosnovich, Karen Yorav

AI总结 提出两阶段流水线,利用大型语言模型对代码补丁中的变更进行基于分类的标注,捕获结构关系和语义属性,以提升代码审查效率。

Comments 13 pages, 6 figures

详情
AI中文摘要

代码审查是软件工程中的关键实践,然而现代项目中代码补丁的规模和频率不断增长,加上AI代码助手的广泛采用,使得人工审查越来越具有挑战性。识别补丁中的变更类型(如重命名、移动或逻辑修改)可以通过实现优先级排序、过滤和自动化来显著提高审查效率。然而,现有的基于LLM的代码审查方法主要集中在摘要和评论生成上,结构化代码审查尚未得到充分探索。在本文中,我们系统研究了使用大型语言模型(LLMs)对代码补丁中的代码变更进行基于分类的标注。我们引入了一个两阶段流水线,首先为差异块分配标签,然后对其进行细化以捕获结构关系和语义属性,例如重命名传播和类型变更。我们的方法采用少样本提示来生成与语言无关且可定制的标签,无需传统静态分析流水线的工程开销。我们在一个手动策划的自然和合成补丁基准上,跨多个上下文配置评估了四个LLM。我们的最佳配置实现了高达84%的召回率和81%的精确率,并在提取关系和属性元数据方面具有高准确性。这些结果表明,基于LLM的标注可以通过实现灵活、多语言和自动化友好的代码审查工作流,有效补充静态分析。

英文摘要

Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, together with the widespread adoption of AI code assistants, make manual review increasingly challenging. Identifying the types of changes within a patch, such as renames, moves, or logic modifications, can substantially improve review efficiency by enabling prioritization, filtering, and automation. However, existing LLM-based approaches to code review have largely focused on summarization and comment generation, leaving structured code reviews underexplored. In this paper, we present a systematic study of using large language models (LLMs) for taxonomy-based labeling of code changes in a code patch. We introduce a two-stage pipeline that assigns labels to diff hunks and then refines them to capture structural relationships and semantic attributes, such as rename propagation and type changes. Our approach employs few-shot prompting to produce language-agnostic and customizable labels, without the engineering overhead of traditional static-analysis pipelines. We evaluate four LLMs across multiple context configurations on a manually curated benchmark of natural and synthetic patches. Our best configuration achieves up to 84% recall and 81% precision, with high accuracy in extracting relational and attribute metadata. These results suggest that LLM-based labeling can effectively complement static analysis by enabling flexible, multilingual, and automation-friendly code review workflows.

2605.26087 2026-05-26 stat.ML cs.LG

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

DiscoverPhysics: 基准测试LLMs的即用型科学思维

Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

AI总结 提出DiscoverPhysics交互式基准,通过让LLM代理探索物理定律偏离现实的模拟世界,评估其设计实验、修正假设和发现物理规律的能力。

详情
AI中文摘要

前沿LLM现在在广泛的物理评估中表现强劲,但很难区分真正的推理与对已知科学的回忆。我们引入了DiscoverPhysics,一个交互式基准,要求LLM代理发现一个模拟世界的运动定律,该世界的物理故意偏离我们自己的世界。我们构建了22个世界,分别由屏蔽重力、分数幂重力、多物种耦合、隐藏暗物质样粒子、非坐标无关物理以及时变相互作用等支配。每个世界由N体模拟器按需生成,代理提出多轮实验,观察原始轨迹数据,最终提交对世界物理的自然语言解释以及推断定律的Python实现。由于解决一个世界需要代理设计信息性实验并修正其假设,该基准探测了在实验历史之上的长程推理。我们沿着两个互补轴评估提交:保留粒子的轨迹MSE和LLM评判的解释分数,该分数遵循专家编写的评估每个世界概念理解的规则。在11个前沿模型中,我们发现最强的代理仅通过一半的世界,并且在那些必须揭示潜在结构的世界中持续失败。开源模型在设计信息性实验和从数据中提取结论的能力方面明显落后于商业模型。我们进一步发现,良好的预测准确性并不能保证高质量的解释,并且概念理解依赖于通过精心选择的实验进行假设修正。

英文摘要

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator, for which the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score following an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in their ability to design informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.
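The first evaluation axis, trajectory MSE on held-out particles, can be sketched as a plain mean squared error between predicted and simulated positions. The array layout (time steps, particles, spatial dimensions) and the function name are assumptions for illustration; the benchmark's actual scoring code, normalization, and pass threshold are not described in the abstract.

```python
import numpy as np


def trajectory_mse(predicted, observed):
    """Mean squared error over all time steps, particles, and coordinates.

    Both inputs are assumed to have shape (time_steps, particles, dims).
    """
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed trajectories must share a shape")
    return float(np.mean((predicted - observed) ** 2))


if __name__ == "__main__":
    # Hypothetical data: 5 time steps, 2 held-out particles, 3 dimensions.
    observed = np.zeros((5, 2, 3))
    predicted = np.ones((5, 2, 3))  # every coordinate is off by exactly 1
    print(trajectory_mse(predicted, observed))
```

A low score on this axis only says the submitted law predicts well on the held-out particles; as the abstract notes, it does not by itself show that the accompanying explanation is correct.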

2605.26062 2026-05-26 cs.GR cs.CV

Look Both Ways Before You Cross: Lifting Cross Fields From 2D Visual Priors

过马路前左右看:从2D视觉先验中提取交叉场

Dale Decatur, Jacob Serfaty, Oded Stein, Amir Vaxman, Rana Hanocka

AI总结 提出CrossLift方法,利用文本到图像先验从2D图像中提取方向信号,通过两次平滑插值将其反投影到网格表面,生成语义对齐的交叉场和四边形网格。

Comments Project page at: https://crosslift.github.io/

详情
AI中文摘要

我们提出了CrossLift,一种由图像中的视觉特征引导的网格交叉场计算技术。我们利用强大的文本到图像先验,这些先验能够合成特征对齐的二维四边形网格图像。我们将此信号提取为2D图像中明确的逐像素方向,然后将其反投影到网格表面。我们通过在网格表面上执行两次平滑插值(首先在每个视图内,然后在多个视图之间)来聚合这些候选表面方向。我们在每次插值中为候选方向提出基于置信度的自定义权重,这使我们能够解决同一面上的候选方向之间的冲突,并将我们的场平滑插值到被遮挡的面。我们的方法是模块化的,可以与许多不同的2D视觉先验一起使用。我们展示了在纹理对齐四边形网格以及使用粗略的用户绘制线条作为信号的交互式交叉场设计中的额外应用。我们在多种有机和机械形状上展示了CrossLift的有效性,并生成了与现有方法相比具有优越语义对齐的四边形网格。项目页面:https://crosslift.github.io/

英文摘要

We present CrossLift, a technique for computing cross fields on meshes guided by visual features in images. We leverage powerful text-to-image priors that are capable of synthesizing images of feature-aligned quad meshes in 2D. We extract this signal as explicit per-pixel directions in the 2D images, which we then back-project to the mesh surface. We aggregate these candidate surface directions by performing two smooth interpolations on the mesh surface (first within each view and second across multiple views). We propose custom confidence-based weights for the candidate directions in each interpolation that allow us to resolve conflicts between candidates on the same face and smoothly interpolate our field to occluded faces. Our method is modular and can be used with many different 2D visual priors. We show additional applications to texture-aligned quad meshing as well as interactive cross-field design using coarse, user-drawn lines as signal. We demonstrate the effectiveness of CrossLift on a diverse set of both organic and mechanical shapes and produce quad meshes that exhibit superior semantic alignment as compared to existing methods. Project page at: https://crosslift.github.io/

2605.26059 2026-05-26 physics.flu-dyn cs.LG

Accelerating Bayesian inverse design in computational fluid dynamics using neural operators

利用神经算子加速计算流体力学中的贝叶斯逆向设计

Bipin Tiwari, Omer San

AI总结 本文提出将神经算子代理模型嵌入MCMC采样循环,在保持后验结构的同时实现超过三个数量级的加速,用于计算流体力学中的贝叶斯逆向设计。

详情
Journal ref
Mach. Learn. Comput. Sci. Eng 2, 14 (2026)
AI中文摘要

贝叶斯逆向设计提供了一个原则性框架,用于从稀疏流场观测中推断空气动力学几何形状并量化不确定性。然而,其在计算流体力学(CFD)中的实际应用受到基于梯度的马尔可夫链蒙特卡洛(MCMC)采样所需重复高保真模拟成本的严重限制。虽然通常提出代理模型来降低这一成本,但它们对后验几何和不确定性(尤其是激波主导流)的影响仍知之甚少。在这项工作中,我们证明神经算子代理可以直接嵌入MCMC推断循环中,同时保持后验结构。通过准一维喷管流的全贝叶斯逆公式,我们证明几何参数化在可辨识性和后验条件中起决定性作用,其中三次B样条产生稳定且物理意义明确的不确定性估计。基于该公式,在No-U-Turn采样器中用CFD生成数据训练的深度算子网络替代CFD求解器,同时保持似然模型、先验和采样配置不变。在从稀疏到完全观测的范围内,基于代理的推断再现了CFD参考的后验几何和不确定性趋势。由于代理集成,总推断时间减少到一秒以下,对应超过三个数量级的加速。此外,直接逆神经算子作为逆向设计的确定性替代方案被研究,无需后验采样即可实现单次几何重建。这些结果表明,神经算子加速的贝叶斯推断能够为空气动力学应用实现实用的、不确定性感知的逆向设计工作流程。

英文摘要

Bayesian inverse design provides a principled framework for inferring aerodynamic geometries from sparse flow observations while quantifying uncertainty. However, its practical use in computational fluid dynamics (CFD) is severely limited by the cost of repeated high-fidelity simulations required for gradient-based Markov chain Monte Carlo (MCMC) sampling. While surrogate models are commonly proposed to reduce this cost, their effect on posterior geometry and uncertainty, especially for shock-dominated flows, remains poorly understood. In this work, we demonstrate that neural operator surrogates can be embedded directly within the MCMC inference loop while preserving posterior structure. Using a fully Bayesian inverse formulation of quasi-one-dimensional nozzle flow, we demonstrate that geometry parameterization plays a decisive role in identifiability and posterior conditioning, with cubic B-splines yielding stable and physically meaningful uncertainty estimates. Building on this formulation, a Deep Operator Network trained on CFD-generated data is substituted for the CFD solver within a No-U-Turn Sampler, while keeping the likelihood model, priors, and sampling configuration unchanged. Across sparse to fully observed regimes, surrogate-based inference reproduces the posterior geometry and uncertainty trends of the CFD reference. As a result of surrogate integration, total inference time is reduced to under one second, corresponding to a speedup exceeding three orders of magnitude. In addition, a direct inverse neural operator is examined as a deterministic alternative for inverse design, enabling single-shot geometry reconstruction without posterior sampling. These results demonstrate that neural operator-accelerated Bayesian inference enables practical, uncertainty-aware inverse design workflows for aerodynamic applications.

2605.26000 2026-05-26 stat.ML cs.LG stat.ME

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

超越有限方差的随机梯度下降统计推断

Jose Blanchet, Peter Glynn, Wenhao Yang

AI总结 针对随机梯度下降中梯度方差可能无限的问题,提出一种基于联合弱收敛和自正则化统计量的模型无关置信域构建方法,并通过子采样校准实现渐近有效推断。

详情
AI中文摘要

随机梯度下降(SGD)是大规模统计学习和随机优化的基础算法。然而,当随机梯度具有无限方差时,基于SGD迭代的统计推断仍然具有挑战性,因为相关的极限分布依赖于未知的冗余参数。在本文中,我们开发了一种高效、模型无关的方法,用于从SGD轨迹构建置信域,该方法适用于有限方差和无限方差两种情况。该过程基于Polyak-Ruppert平均估计量和由SGD轨迹上的随机梯度构建的经验二阶矩归一化器的联合弱收敛结果。这种联合极限产生了一个自归一化统计量,其中主要的尾部依赖尺度项相互抵消。然后,我们使用子采样校准方案来估计相关的临界值,避免了对尾部指数、慢变函数或稳定律参数的显式估计。由此产生的置信域易于实现,并且在有限二阶矩和无限二阶矩情况下都是渐近有效的。模拟研究显示了在各种设置下的可靠覆盖,支持所提出的方法作为随机优化中不确定性量化的实用工具。

英文摘要

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
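The estimator at the center of this procedure is the Polyak-Ruppert average of the SGD iterates. The sketch below shows only that averaging step on a scalar problem, with a decaying step size of the form `lr0 / k**decay`. The step-size schedule, the function names, and the Student-t noise in the usage example are assumptions for illustration. The paper's second-moment normalizer and subsampling calibration are not reproduced here.

```python
import numpy as np


def averaged_sgd(grad_fn, x0, n_steps, lr0=0.5, decay=0.75):
    """Run scalar SGD and return (last iterate, Polyak-Ruppert average).

    grad_fn(x) returns a (possibly noisy) gradient estimate at x.
    """
    x = float(x0)
    avg = 0.0
    for k in range(1, n_steps + 1):
        x -= lr0 / k**decay * grad_fn(x)
        avg += (x - avg) / k  # running mean of the iterates x_1, ..., x_k
    return x, avg


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def noisy_grad(x):
        # Gradient of 0.5 * (x - 2)^2 plus Student-t noise with 1.5 degrees
        # of freedom, which has infinite variance.
        return (x - 2.0) + rng.standard_t(1.5)

    last, average = averaged_sgd(noisy_grad, x0=0.0, n_steps=5000)
    print(last, average)
```

The running-mean update avoids storing the whole trajectory, which matters when the average is computed alongside other trajectory statistics in a single pass.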

2605.25168 2026-05-26 eess.IV cs.AI cs.CV

Methodology for Creating a Clinically Verified Dermoscopic Image Dataset

创建临床验证的皮肤镜图像数据集的方法论

Kozachok Elena Sergeevna

AI总结 提出一种结合移动皮肤镜图像采集标准操作程序、结构化元数据信息模型和多阶段专家验证的方法,构建临床验证的皮肤镜图像数据集,用于医学信息学研究。

Comments 22 pages, 5 figures, 5 tables

详情
AI中文摘要

本研究提出了一种构建临床验证的皮肤镜图像数据集的方法,用于医学信息学研究。该工作的相关性在于,自动化诊断支持系统的性能不仅取决于图像数量,还取决于图像采集过程的可重复性、结构化元数据的完整性以及诊断标签的可靠性。国际数据集主要是在与俄罗斯常规门诊实践和移动皮肤镜显著不同的条件下创建的。所提出的方法整合了三个相互关联的组成部分:(1)通过移动皮肤镜采集图像的标准操作程序(SOP),(2)一个信息模型,包含16个结构化元数据字段,组织成六个临床导向的块,采用ISIC兼容的符号表示,以及(3)多阶段专家验证诊断标签(初始临床注释、三位专家的共识审查以及所有恶性肿瘤的组织学确认)。使用该方法,在2025年6月至2026年5月期间,收集了来自443名患者的1026张独特的皮肤镜图像数据集。从1044条初始记录中排除了18个重复项。该数据集包括九个疾病类别;所有39个恶性病变(18个黑色素瘤、15个基底细胞癌和6个鳞状细胞癌)均经过组织学验证。患者年龄范围为2至90岁(中位年龄38岁),其中女性279人(63%),男性164人(37%)。每张图像都附有专家注释的皮肤镜结构和明确的verification_stage字段,指示诊断确认的水平。所得数据集作为临床验证的试点资源,适用于独立模型评估、域偏移分析、可解释性研究和进一步扩展。

英文摘要

This study presents a methodology for constructing a clinically verified dataset of dermatoscopic images for medical informatics research. The relevance of the work is driven by the fact that the performance of automated diagnostic support systems depends not only on the volume of images, but also on the reproducibility of the image acquisition procedure, the completeness of structured metadata, and the reliability of diagnostic labels. International collections were primarily created under conditions that differ substantially from routine Russian outpatient practice and mobile dermatoscopy. The proposed methodology integrates three interconnected components: (1) a standard operating procedure (SOP) for acquiring images via mobile dermatoscopy, (2) an information model comprising 16 structured metadata fields organized into six clinically oriented blocks in ISIC-compatible notation, and (3) a multi-stage expert verification of diagnostic labels (initial clinical annotation, consensus review by three specialists, and histological confirmation of all malignant neoplasms). Using this methodology, a dataset of 1,026 unique dermatoscopic images from 443 patients was collected between June 2025 and May 2026. From 1,044 initial records, 18 duplicates were excluded. The dataset includes nine nosological categories; all 39 malignant lesions (18 melanomas, 15 basal cell carcinomas, and 6 squamous cell carcinomas) were histologically verified. Patient age ranged from 2 to 90 years (median 38), with 279 females (63%) and 164 males (37%). Each image is accompanied by expert-annotated dermatoscopic structures and an explicit verification_stage field indicating the level of diagnostic confirmation. The resulting dataset serves as a pilot clinically verified resource suitable for independent model evaluation, domain shift analysis, interpretability studies, and further expansion.
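The counts reported in this abstract are internally consistent, which can be confirmed with a short arithmetic check. All numbers below are taken from the abstract; the variable and function names are illustrative only.

```python
def share_percent(part: int, total: int) -> int:
    """Percentage of `total` made up by `part`, rounded to the nearest integer."""
    return round(100 * part / total)


# Figures quoted in the abstract.
initial_records = 1044
duplicates = 18
unique_images = initial_records - duplicates

malignant = {
    "melanoma": 18,
    "basal cell carcinoma": 15,
    "squamous cell carcinoma": 6,
}
malignant_total = sum(malignant.values())

females, males = 279, 164
patients = females + males

if __name__ == "__main__":
    print(unique_images, malignant_total, patients)
    print(share_percent(females, patients), share_percent(males, patients))
```

The check reproduces the stated 1,026 unique images, 39 malignant lesions, and 443 patients, with the sex split rounding to 63% and 37%.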

2605.23082 2026-05-26 stat.ML cs.AI cs.LG

KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis

KAPLAN: 用于生存分析的Kolmogorov-Arnold可预测可学习激活网络

Stelios Boulitsakis Logothetis, Angela Wood, Pietro Liò

AI总结 提出KAPLAN-HR模型,利用B样条Kolmogorov-Arnold网络非参数估计条件风险函数,通过深层架构自动捕捉交互和时变效应,并证明其收敛速率仅依赖于表示平滑性,从而缓解维度灾难,在六个临床数据集上达到或超越现有方法。

Comments 9 pages, 3 figures, 13 supplementary pages. Submitted to NeurIPS 2026

详情
AI中文摘要

生存分析旨在建模协变量和时间如何共同影响右删失下的事件时间分布。经典方法如Cox模型和广义加性模型(GAM)需要手动指定交互和时变效应,这在丰富的临床数据集上越来越不切实际。我们引入了KAPLAN-HR,一种B样条Kolmogorov-Arnold网络(KAN),用于非参数估计条件风险函数作为协变量和时间的联合函数。单层KAPLAN-HR模型恢复GAM,而更深层的架构通过组合捕捉交互和时变效应。我们为非参数KAN风险估计器建立了收敛速率,该速率仅依赖于底层KAN表示的平滑性,而不依赖于协变量维度,从而缓解了KAN可表示目标的维度灾难。在六个临床基准数据集的评估中,KAPLAN-HR匹配或超过了已建立的统计和深度学习生存方法的预测性能。

英文摘要

Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical methods such as the Cox model and generalised additive models (GAMs) require interactions and time-varying effects to be manually specified, which is increasingly impractical on rich clinical datasets. We introduce KAPLAN-HR, a B-spline Kolmogorov-Arnold Network (KAN) for nonparametric estimation of the conditional hazard as a joint function of covariates and time. A single-layer KAPLAN-HR model recovers a GAM, while deeper architectures capture interactions and time-varying effects through composition. We establish a convergence rate for the nonparametric KAN hazard estimator that depends only on the smoothness of the underlying KAN representation and not on the covariate dimension, thereby mitigating the curse of dimensionality for KAN-representable targets. In evaluations over six clinical benchmark datasets, KAPLAN-HR matches or exceeds the predictive performance of established statistical and deep learning survival methods.