arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.06065 2026-06-02 stat.ML cond-mat.dis-nn cs.CL cs.LG

Deep networks learn to parse uniform-depth context-free languages from local statistics

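Jack T. Parley, Francesco Cagnetta, Matthieu Wyart

Affiliations GitHub

AI summary Introduces a tunable class of probabilistic context-free grammars and an inference algorithm modeled on deep convolutional networks to show how linguistic structure emerges from local statistics, and validates the predictions on deep convolutional and transformer architectures.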

Comments Accepted as regular paper at ICML 2026

Abstract

Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for studying these questions. However, prior work has focused either on the post-hoc characterization of the parsing-like algorithms used by trained networks, or on the learnability of PCFGs with fixed syntax, where parsing is unnecessary. Here, we (i) introduce a tunable class of PCFGs in which both the degree of ambiguity and the correlation structure across scales can be controlled; (ii) provide a learning mechanism, an inference algorithm inspired by the structure of deep convolutional networks, that links learnability and sample complexity to specific language statistics; and (iii) validate our predictions empirically across deep convolutional and transformer-based architectures. Overall, we propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.

2602.01460 2026-06-02 math.OC cs.LG

Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator

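Haoyu Han, Heng Yang

Affiliations math.OC

AI summary Studies the noise-to-signal ratio (NSR) of the REINFORCE policy-gradient estimator. By characterizing the NSR exactly for linear and polynomial systems, it finds that the NSR is highly non-uniform across the policy parameter space and typically grows, or even blows up, as the policy approaches an optimum, causing training instability and policy collapse.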

Abstract

Policy-gradient methods are widely used in reinforcement learning, yet training often becomes unstable or slows down as learning progresses. We study this phenomenon through the noise-to-signal ratio (NSR) of a policy-gradient estimator, defined as the estimator variance (noise) normalized by the squared norm of the true gradient (signal). Our main result is that, for (i) finite-horizon linear systems with Gaussian policies and linear state-feedback, and (ii) finite-horizon polynomial systems with Gaussian policies and polynomial feedback, the NSR of the REINFORCE estimator can be characterized exactly, either in closed form or via numerical moment-evaluation algorithms, without approximation. For general nonlinear dynamics and expressive policies (including neural policies), we further derive a general upper bound on the variance. These characterizations enable a direct examination of how NSR varies across policy parameters and how it evolves along optimization trajectories (e.g., SGD and Adam). Across a range of examples, we find that the NSR landscape is highly non-uniform and typically increases as the policy approaches an optimum; in some regimes it blows up, which can trigger training instability and policy collapse.
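
The NSR definition in this abstract can be illustrated with a minimal sketch on a toy problem that is not taken from the paper: a one-step task with a Gaussian policy a ~ N(theta, sigma^2) and reward -(a - target)^2. The function name, the reward, and all parameter values are assumptions chosen for illustration. The variance is estimated by Monte Carlo, while the true gradient -2(theta - target) is known in closed form for this toy reward.

```python
import numpy as np

def reinforce_nsr(theta, target=0.0, sigma=0.5, n=200_000, seed=0):
    """Monte Carlo NSR of the single-sample REINFORCE estimator (toy setup)."""
    rng = np.random.default_rng(seed)
    a = theta + sigma * rng.standard_normal(n)   # actions sampled from the policy
    reward = -(a - target) ** 2                  # hypothetical quadratic reward
    score = (a - theta) / sigma**2               # d/dtheta of log N(a; theta, sigma^2)
    g = reward * score                           # REINFORCE gradient samples
    true_grad = -2.0 * (theta - target)          # exact gradient of the expected reward
    return g.var() / true_grad**2                # noise divided by signal

nsr_far = reinforce_nsr(2.0)    # policy mean far from the optimum
nsr_near = reinforce_nsr(0.1)   # policy mean close to the optimum
print(nsr_far, nsr_near)
```

In this toy case the signal vanishes at the optimum while the estimator variance does not, so the NSR grows as theta approaches the target, which is consistent with the qualitative behavior the abstract reports.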

2510.08948 2026-06-02 cs.IR cs.AI

SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

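Nan Lu, Yurong Hu, Jiaquan Fang, Yan Liu, Rui Dong, Yiming Wang, Rui Lin, Shaoyi Xu

Affiliations Beijing Jiaotong University; JD.com; Southeast University; Zhejiang University

AI summary Proposes the Sherlock framework, which combines structured knowledge with LLM reasoning through a domain knowledge base, two-stage retrieval-augmented generation, and a self-evolving data flywheel to improve the efficiency and accuracy of e-commerce risk case investigation.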

Abstract

Effective e-commerce risk management requires in-depth case investigations to identify emerging fraud patterns in highly adversarial environments. However, manual investigation typically requires analyzing the associations and couplings among multi-source heterogeneous data, a labor-intensive process that limits efficiency. While Large Language Models (LLMs) show promise in automating these analyses, their deployment is hindered by the complexity of risk scenarios and the sparsity of long-tail domain knowledge. To address these challenges, we propose Sherlock, a framework that integrates structured domain knowledge with LLM-based reasoning through three core modules. First, we construct a domain Knowledge Base (KB) by distilling structured expertise from heterogeneous knowledge sources. Second, we design a two-stage retrieval-augmented generation strategy tailored for case investigation, which combines input contextual augmentation with a Reflect & Refine module to fully leverage the KB for improved analysis quality. Finally, we develop an integrated platform for operations and annotation to drive a self-evolving data flywheel. By combining real-time hotfixes through KB updates with periodic logic alignment via post-training, we facilitate continuous system evolution to counteract adversarial drifts. Online A/B tests at JD.com demonstrate that Sherlock achieves an 82% Expert Acceptance Rate (EAR) and a 386.7% increase in daily investigation throughput. An additional 90-day evaluation shows that the flywheel successfully recovers from performance decay caused by changing tactics twice, raising the EAR ceiling by around 3.5% through autonomous model updates.

2602.07083 2026-06-02 cs.SE cs.AI

Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation

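Yongqing Jiang, Jianze Wang, Zhiqi Shen, Zhenghong Lin, Jiayuan Wang, Yijian Yang, Kaoshan Dai, Haoran Luo

Affiliations Sichuan University; Nanyang Technological University; Fuzhou University

AI summary To address physically inconsistent code generated by large language models in engineering modeling, proposes a framework combining domain knowledge construction, constraint alignment, and verification-driven evaluation, introduces the CivilInstruct dataset and the MBEval benchmark, and uses two-stage fine-tuning to improve the executability and physical consistency of generated models.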

Abstract

Structural modeling is a fundamental component of computational engineering science, in which even minor physical inconsistencies or specification violations may invalidate downstream simulations. The potential of large language models (LLMs) for automatic generation of modeling code has been demonstrated. However, non-executable or physically inconsistent outputs remain prevalent under stringent engineering constraints. A framework for physics-consistent automatic building modeling is therefore proposed, integrating domain knowledge construction, constraint-oriented model alignment, and verification-driven evaluation. CivilInstruct is introduced as a domain-specific dataset that formalizes structural engineering knowledge and constraint reasoning to enable simulation-ready model generation. A two-stage fine-tuning strategy is further employed to enforce constraint satisfaction and application programming interface compliance, substantially reducing hallucinated and non-conforming outputs. MBEval is presented as a verification-driven benchmark that evaluates executability and structural dynamics consistency through closed-loop validation. Experimental results show consistent improvements over baselines across rigorous verification metrics. Our code is available at https://github.com/Jovanqing/AutoBM.

2602.05395 2026-06-02 stat.ML cs.AI cs.LG

Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

Jingkai Huang, Will Ma, Zhengyuan Zhou

Affiliations Stern School of Business, New York University, New York, USA; Graduate School of Business, Columbia University, New York, USA

AI summary Uses Bayesian prior information and an L-aggregated stopping policy to stop sampling early once sufficient consistency is reached, minimizing sampling cost while efficiently identifying the most consistent LLM answer.

Comments Accepted to ICML 2026. Camera-ready version

Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
Abstract

A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
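
The sample-and-stop idea can be sketched as follows. This is not the paper's Bayesian policy: the posterior computation is not given in the abstract, so a plain prior-free margin rule is swapped in here. The sketch only shows the bookkeeping the abstract describes for L=3, namely that a stopping decision can be driven by the two most frequent answer counts. The function name, the `margin` threshold, and `max_samples` are hypothetical.

```python
from collections import Counter

def sample_until_consistent(sampler, margin=3, max_samples=40):
    """Draw answers one at a time; stop once the leader is `margin` ahead.

    `sampler` is any zero-argument callable returning one answer (assumed
    to wrap an LLM call). Only the top two counts enter the decision.
    """
    counts = Counter()
    for n in range(1, max_samples + 1):
        counts[sampler()] += 1
        top = counts.most_common(2)
        runner_up = top[1][1] if len(top) > 1 else 0
        if top[0][1] - runner_up >= margin:
            return top[0][0], n          # answer and number of samples used
    return counts.most_common(1)[0][0], max_samples

# Usage with a canned answer stream standing in for model responses.
stream = iter(["42", "41", "42", "42", "42"])
answer, used = sample_until_consistent(lambda: next(stream))
print(answer, used)
```

A Bayesian version would replace the fixed margin with a threshold on the posterior probability that the current leader is the mode.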

2602.02250 2026-06-02 math.OC cs.LG

Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences

Viktor Stein, Adwait Datar, Nihat Ay

Affiliations Department of Mathematics, Technical University of Munich and Munich Center for Machine Learning, Germany (the majority of the work was conducted while at the Institute of Mathematics at the Technical University of Berlin, Germany, and the Berlin Mathematical School); Institute for Data Science Foundations, Hamburg University of Technology, Germany; Santa Fe Institute, USA

AI summary To address the failure of the KL divergence under support mismatch and low noise, proposes KL variants based on transport geometry that remove the singularity in linear time-invariant systems, yielding well-posed control problems and better closed-loop performance.

Comments 37 pages, 9 figures, comments welcome. Accepted @ ICML'26

Abstract

Kullback-Leibler (KL) divergence regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise regimes. Using a unified information-geometric framework, we introduce KL analogs by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometries, and derive closed-form expressions for common distribution families. Between elliptic distributions, these divergences remain finite for degenerating equal covariances and yield a geometric interpretation of regularization heuristics used in Kalman ensemble methods. We demonstrate the utility of these divergences in KL-regularized optimal control. In the fully tractable setting of linear time-invariant systems with Gaussian process noise, the classical KL reduces to a quadratic control penalty that becomes singular as process noise vanishes. Our variants remove this singularity and yield well-posed problems. In both the double integrator and cart-pole examples, the resulting controls preserve nontrivial feedback and achieve better closed-loop performance.
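
The low-noise degeneracy mentioned in this abstract can be seen in the simplest case. The sketch below uses two standard closed forms for one-dimensional Gaussians with equal variance: the classical KL divergence, (m1 - m2)^2 / (2 sigma^2), and the 2-Wasserstein distance, |m1 - m2|. The Wasserstein distance is shown only as a familiar transport-based contrast; it is not the paper's Wasserstein KL or Kalman-Wasserstein KL divergence, whose formulas the abstract does not give.

```python
def kl_equal_variance(m1, m2, sigma):
    """KL(N(m1, sigma^2) || N(m2, sigma^2)); diverges as sigma -> 0."""
    return (m1 - m2) ** 2 / (2.0 * sigma**2)

def w2_equal_variance(m1, m2):
    """2-Wasserstein distance between the same pair; independent of sigma."""
    return abs(m1 - m2)

for sigma in (1.0, 0.1, 0.01):
    print(sigma, kl_equal_variance(0.0, 1.0, sigma), w2_equal_variance(0.0, 1.0))
```

The KL value scales like 1/sigma^2 for a fixed mean gap, which is the kind of singular penalty the abstract refers to as process noise vanishes.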

2601.06199 2026-06-02 eess.AS cs.AI cs.SD

FastSLM: Hierarchical Temporal Abstraction for Efficient Long-Form Speech Adaptation

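Junseok Lee, Sangyong Lee, Chang-Jae Chun

Affiliations OKESTRO; Sejong University

AI summary To address token explosion for long-form speech inputs, proposes the FastSLM architecture, whose Hierarchical Temporal Abstractor (HTA) reaches an extreme compression rate of 1.67 tokens per second (a 97% reduction), substantially cutting computation and parameters while remaining competitive with state-of-the-art models on long-form speech benchmarks.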

Comments Title updated

Abstract

Scaling Multimodal Large Language Models (MLLMs) to long-form speech is bottlenecked by the explosive growth of input tokens. Unlike images or videos, audio lacks overlapping information, making extreme 1-token compression highly susceptible to the loss of fine-grained acoustic cues. To overcome this, we propose FastSLM, a token-efficient architecture featuring the Hierarchical Temporal Abstractor (HTA). HTA progressively distills non-overlapping acoustic features across multiple temporal scales, achieving an extreme compression rate of 1.67 tokens per second, a 97% reduction, without losing critical context. Experimental results show that FastSLM achieves competitive performance with state-of-the-art models on long-form benchmarks despite operating with significantly fewer FLOPs and parameters. The source code and model checkpoints are available at https://anonymous.4open.science/r/FastSLM-8BD3.
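
The two figures in this abstract can be related with a short calculation. The sketch assumes that the 97% reduction is measured against an uncompressed token rate, which the abstract does not state explicitly; under that assumption the baseline rate follows directly. The variable names are illustrative.

```python
tokens_per_second = 1.67   # compressed rate quoted in the abstract
reduction = 0.97           # quoted reduction, assumed relative to the uncompressed rate

# If compressed = baseline * (1 - reduction), the baseline rate is:
implied_baseline = tokens_per_second / (1.0 - reduction)

# Token budget for one hour of audio at the compressed rate.
tokens_per_hour = tokens_per_second * 3600

print(f"implied baseline: {implied_baseline:.1f} tokens/s")
print(f"one hour of audio: {tokens_per_hour:.0f} tokens")
```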

2601.22784 2026-06-02 stat.ML cs.LG

Approximating $f$-Divergences with Rank Statistics

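Viktor Stein, José Manuel de Frutos

Affiliations Department of Mathematics, Technical University of Munich and Munich Center for Machine Learning, Germany (the majority of the work was conducted while at the Institute of Mathematics at the Technical University of Berlin, Germany, and the Berlin Mathematical School); Department of Signal Theory

AI summary Proposes a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the rank distribution, proves monotonicity, a lower-bound property, and convergence rates, and extends the method to a sliced version for high-dimensional data.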

Comments 40 pages, 16 figures, 6 tables, accepted at ICML'26. Comments welcome!

Abstract

We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $μ$ and $ν$ to a rank histogram on $\{ 0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding a rank-statistic divergence estimator. We prove that the resulting estimator of the divergence is monotone in $K$, is always a lower bound of the true $f$-divergence, and we establish quantitative convergence rates for $K\to\infty$ under mild regularity of the quantile-domain density ratio. To handle high-dimensional data, we define the sliced rank-statistic $f$-divergence by averaging the univariate construction over random projections, and we provide convergence results for the sliced limit as well. We also derive finite-sample deviation bounds along with asymptotic normality results for the estimator. Finally, we empirically validate the approach by benchmarking against neural baselines and illustrating its use as a learning objective in generative modeling experiments.
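
One plausible reading of the rank-histogram construction is sketched below; the details are assumptions, since the abstract does not fix them. Here each trial draws one point from $μ$ and $K$ points from $ν$, records how many of the $ν$ points fall below the $μ$ point (a rank in $\{0, \ldots, K\}$), and the empirical rank histogram is compared with the uniform distribution using the discrete KL divergence as the choice of $f$. Which distribution supplies the $K$ points, the direction of the divergence, and the function names are all hypothetical.

```python
import numpy as np

def normal(mean):
    """Hypothetical sampler factory: unit-variance Gaussian with the given mean."""
    return lambda rng, size: rng.normal(mean, 1.0, size)

def rank_kl_to_uniform(sample_mu, sample_nu, K, n_trials, rng):
    y = sample_mu(rng, n_trials)               # one mu-draw per trial
    x = sample_nu(rng, (n_trials, K))          # K nu-draws per trial
    ranks = (x < y[:, None]).sum(axis=1)       # rank of y among the K draws
    q = np.bincount(ranks, minlength=K + 1) / n_trials
    nz = q > 0                                 # skip empty bins (0 * log 0 = 0)
    return float(np.sum(q[nz] * np.log(q[nz] * (K + 1))))

rng = np.random.default_rng(0)
same = rank_kl_to_uniform(normal(0.0), normal(0.0), K=4, n_trials=20_000, rng=rng)
shifted = rank_kl_to_uniform(normal(1.0), normal(0.0), K=4, n_trials=20_000, rng=rng)
print(same, shifted)
```

When the two distributions coincide and are continuous, the rank is uniform on $\{0, \ldots, K\}$, so the statistic is close to zero up to sampling noise; a mean shift skews the histogram and raises it.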

2601.20115 2026-06-02 cs.AR cs.AI

How Much Progress Has There Been in NVIDIA Datacenter GPUs?

Emanuele Del Sozzo, Martin Fleming, Kenneth Flamm, Neil Thompson

Affiliations MIT FutureTech, Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA, USA

AI summary Analyzes progress in the compute performance, memory, power consumption, and price of NVIDIA datacenter GPUs from the mid-2000s through 2025, and assesses the impact of U.S. export controls.

Abstract

As the role of modern Graphics Processing Units (GPUs) becomes increasingly essential for several computing tasks, analyzing their past and current progress is paramount for determining future constraints on scientific research. This is particularly compelling in the Artificial Intelligence (AI) domain, where rapid technological advancements and fierce global competition have led the United States to recently implement export control regulations limiting international access to advanced AI chips. Consequently, this paper examines technical progress in NVIDIA datacenter GPUs from the mid-2000s through 2025. Our main results identify doubling times of 1.43 and 1.67 years for FP16 and FP32 dense operations, while FP64 doubling times range from 2.05 to 3.79 years. Off-chip memory size and bandwidth have grown at slower rates than computing performance, doubling every 3.29 to 3.41 years, whereas the release prices and power consumption roughly doubled every 5.03 and 15 years, respectively. Moreover, our cross-vendor comparison of the top-performing GPUs per year shows that NVIDIA's performance advantage is narrowing, but not enough to compel a major market shift. Finally, we quantify the potential implications of current U.S. export control regulations and the consequent performance gaps, which the recently proposed policy changes could shrink from 23.6X to 3.54X.
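
The doubling times quoted in this abstract can be converted into annual growth rates with the standard relation growth = 2^(1/T), which assumes steady exponential growth over the period. The sketch below applies it to the figures from the abstract; the dictionary labels and helper name are illustrative.

```python
def annual_growth(doubling_time_years):
    """Yearly multiplicative growth implied by a doubling time (exponential trend)."""
    return 2.0 ** (1.0 / doubling_time_years)

# Doubling times in years, as quoted in the abstract.
doubling_times = {
    "FP16 dense": 1.43,
    "FP32 dense": 1.67,
    "release price": 5.03,
    "power consumption": 15.0,
}

for name, years in doubling_times.items():
    pct = (annual_growth(years) - 1.0) * 100.0
    print(f"{name}: about {pct:.1f}% per year")
```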

2601.21959 2026-06-02 stat.ML cs.LG

Near-Optimal Private Tests for Simple and MLR Hypotheses

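Yu-Wei Chen, Raghu Pasupathy, Jordan Awan

Affiliations Department of Statistics, Purdue University, West Lafayette; Department of Statistics, University of Pittsburgh

AI summary Under Gaussian differential privacy, for simple, one-sided, and two-sided hypothesis tests under monotone likelihood ratio conditions, proposes a private mean estimator with data-driven clamping bounds and builds private test statistics that match the asymptotic relative efficiency of the non-private most powerful tests while conservatively controlling type I error.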

Abstract

We develop a near-optimal testing procedure under the framework of Gaussian differential privacy for simple as well as one- and two-sided tests under monotone likelihood ratio conditions. Our mechanism is based on a private mean estimator with data-driven clamping bounds, whose population risk matches the private minimax rate up to logarithmic factors. Using this estimator, we construct private test statistics that achieve the same asymptotic relative efficiency as the non-private, most powerful tests while maintaining conservative type I error control. In addition to our theoretical results, our numerical experiments show that our private tests outperform competing DP methods and offer comparable power to the non-private most powerful tests, even at moderately small sample sizes and privacy loss budgets.
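
The building block behind a clamped private mean can be sketched with the standard Gaussian mechanism. This sketch uses fixed clamping bounds passed in by the caller, not the data-driven bounds the abstract describes, whose construction is not given here. It relies on two standard facts: replacing one of n records changes a mean of values clamped to [lower, upper] by at most (upper - lower) / n, and adding Gaussian noise with standard deviation sensitivity / mu satisfies mu-Gaussian differential privacy. The function name is hypothetical.

```python
import numpy as np

def gdp_clamped_mean(x, lower, upper, mu, rng):
    """Clamped mean released through the Gaussian mechanism (mu-GDP)."""
    x = np.clip(np.asarray(x, dtype=float), lower, upper)  # bound each record
    sensitivity = (upper - lower) / len(x)                  # replace-one sensitivity
    noise = rng.normal(0.0, sensitivity / mu)               # calibrated Gaussian noise
    return float(x.mean() + noise)

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, 10_000)
print(gdp_clamped_mean(data, 0.0, 1.0, mu=1.0, rng=rng))
```

Tight bounds reduce the noise scale but bias the estimate when data fall outside them, which is the trade-off that motivates choosing the bounds from the data.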

2601.07742 2026-06-02 cond-mat.mtrl-sci cs.LG

PFT: Phonon Fine-tuning for Machine Learned Interatomic Potentials

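Teddy Koker, Abhijeet Gangan, Mit Kotak, Jaime Marian, Tess Smidt

Affiliations University of California, Berkeley

AI summary Proposes phonon fine-tuning (PFT), which supervises second-order force constants to correct the curvature of machine learned interatomic potentials (MLIPs), markedly improving the accuracy of predicted phonon thermodynamic properties.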

Comments 17 pages, 11 figures, ICML 2026

Abstract

Many materials properties depend on higher-order derivatives of the potential energy surface, yet machine learned interatomic potentials (MLIPs) trained with a standard loss on energy, force, and stress errors can exhibit error in curvature, degrading the prediction of vibrational properties. We introduce phonon fine-tuning (PFT), which directly supervises second-order force constants of materials by matching MLIP energy Hessians to DFT-computed force constants from finite displacement phonon calculations. To scale to large supercells, PFT stochastically samples Hessian columns and computes the loss with a single Hessian-vector product. We also use a simple co-training scheme to incorporate upstream data to mitigate catastrophic forgetting. On the MDR Phonon benchmark, PFT improves Nequix MP by 55% on average across phonon thermodynamic properties and achieves state-of-the-art accuracy among models trained on Materials Project trajectories. PFT also generalizes to improve properties beyond second derivatives, improving thermal conductivity predictions that rely on third-order derivatives of the potential energy.
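
The column-sampling step can be sketched on a toy energy. A Hessian column is the Hessian applied to a one-hot vector, so one Hessian-vector product per sampled column is enough to form a loss against reference force constants. This sketch swaps in a central finite difference of an analytic gradient for the Hessian-vector product; a real MLIP would presumably use automatic differentiation. The toy energy, function names, and `eps` value are assumptions and do not come from the paper.

```python
import numpy as np

def grad_energy(x):
    """Gradient of the toy energy 0.5*sum((x[i+1]-x[i])^2) + 0.25*sum(x^4)."""
    g = x**3                 # derivative of the on-site quartic term
    d = np.diff(x)           # bond extensions x[i+1] - x[i]
    g[:-1] -= d              # each bond pulls on its left atom
    g[1:] += d               # and on its right atom
    return g

def hessian_column(grad_fn, x, j, eps=1e-5):
    """Column j of the Hessian via one central-difference Hessian-vector product."""
    v = np.zeros_like(x)
    v[j] = 1.0
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2.0 * eps)

def sampled_column_loss(grad_fn, x, reference_hessian, rng):
    """Mean squared error on one randomly sampled Hessian column."""
    j = rng.integers(len(x))
    col = hessian_column(grad_fn, x, j)
    return float(np.mean((col - reference_hessian[:, j]) ** 2))

x0 = np.zeros(4)
print(hessian_column(grad_energy, x0, 1))
```

Sampling one column per step keeps the cost at one extra gradient-sized computation instead of building the full Hessian, which is the scaling argument the abstract makes for large supercells.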

2601.21237 2026-06-02 cs.DS cs.CL cs.LG

Characterizing the Effect of Noise in Language Generation in the Limit

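Aaron Li, Ian Zhang

Affiliations Harvard University; Duke University

AI summary Within the language-generation-in-the-limit model, analyzes how noisy strings affect generatability, proving that a single noisy string strictly reduces the set of generatable collections and that any finite amount of noise is equivalent to a single noisy string, and gives the first characterization of non-uniform noise-dependent generatability.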

Comments ICML 2026

Abstract

Kleinberg and Mullainathan recently proposed a formal framework for studying the phenomenon of language generation, called language generation in the limit. In this model, an adversary gives an enumeration of example strings from an unknown target language, and the algorithm is tasked with correctly generating unseen strings from the target language within finite time. Refined notions of non-uniform and uniform generation were later introduced by Li, Raman, and Tewari (2025), and a noisy model was introduced by Raman and Raman (2025), which allows the adversary to insert extraneous strings. A natural question in the noisy model is to quantify the effect of noise, by studying the impact of each additional extraneous string. We show two complementary results in this setting. We first show that for both uniform and non-uniform generation, a single noisy string strictly reduces the set of collections that can be generated, thus answering an open question in Raman and Raman (2025). Then, we show for both uniform and non-uniform generation that generation with a single noisy string is equivalent to generation with any finite amount of noise, sharply contrasting with the strict hierarchy for noisy generation in the limit shown by Bai, Panigrahi, and Zhang (2026). Finally, we leverage our previous results to provide the first known characterization for non-uniform noise-dependent generatability.

2601.18798 2026-06-02 cs.MM cs.AI

ELF: A Family of Encoder-Free ECG-Language Models

William Han, Tony Chen, Chaojing Duan, Xiaoyu Song, Yihang Yao, Yuzhe Yang, Michael A. Rosenberg, Emerson Liu, Ding Zhao

Affiliations Carnegie Mellon University; Allegheny Health Network; University of California Los Angeles; University of Colorado; Allergy and Immunology

AI summary Proposes ELF, a family of three encoder-free ECG-language model architectures that simplify the architecture and training pipeline while matching or exceeding existing state-of-the-art models on two datasets.

Comments 31 pages, 11 figures

Abstract

ECG-Language Models (ELMs) extend recent advances in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most existing ELMs inherit Vision-Language Model (VLM) design choices and rely on pretrained ECG encoders, introducing substantial architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, a family of three encoder-free ELM architectures that remain competitive with, and often outperform, prior state-of-the-art ELMs across two datasets despite substantially simpler architectures and training pipelines. All code and data are available at github.com/ELM-Research/ECG-Language-Models.

2511.12487 2026-06-02 cs.NE cs.AI cs.CL

ToxSearch: Evolving Prompts for Toxicity Search in Large Language Models

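Onkar Shelar, Travis Desell

Affiliations Rochester Institute of Technology

AI summary Proposes ToxSearch, a black-box evolutionary framework that tests the safety of large language models by evolving prompts in a synchronous steady-state loop, and analyzes the behavior of individual operators and cross-model transfer.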

Comments 16 pages

Journal ref
In: García-Sánchez, P., Díaz Álvarez, J., Murphy, A. (eds) Applications of Evolutionary Computation. EvoApplications 2026. Lecture Notes in Computer Science, vol 16525. Springer, Cham
Abstract

Large Language Models remain vulnerable to adversarial prompts that elicit toxic content even after safety alignment. We present ToxSearch, a black-box evolutionary framework that tests model safety by evolving prompts in a synchronous steady-state loop. The system employs a diverse set of operators, including lexical substitutions, negation, back-translation, paraphrasing, and two semantic crossover operators, while a moderation oracle provides fitness guidance. Operator-level analysis shows heterogeneous behavior: lexical substitutions offer the best yield-variance trade-off, semantic-similarity crossover acts as a precise low-throughput inserter, and global rewrites exhibit high variance with elevated refusal costs. Using elite prompts evolved on LLaMA 3.1 8B, we observe practically meaningful but attenuated cross-model transfer, with toxicity roughly halving on most targets, smaller LLaMA 3.2 variants showing the strongest resistance, and some cross-architecture models retaining higher toxicity. These results suggest that small, controllable perturbations are effective vehicles for systematic red-teaming and that defenses should anticipate cross-model reuse of adversarial prompts rather than focusing only on single-model hardening.

2601.14323 2026-06-02 cs.CR cs.AI cs.RO

SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models

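Bingxin Xu, Yuzhang Shang, Binghui Wang, Emilio Ferrara

Affiliations University of Southern California; University of Central Florida; Illinois Institute of Technology

AI summary Targeting the intra-chunk visual open-loop vulnerability created by action chunking and delta pose representations in vision-language-action models, proposes SilentDrift, a stealthy black-box backdoor attack that uses the Smootherstep function to build C2-continuous perturbations and a keyframe attack strategy to reach a high attack success rate at a low poisoning rate.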

Comments Accepted to ACL Findings 2026

Abstract

Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems: the combination of action chunking and delta pose representations creates an intra-chunk visual open-loop. This mechanism forces the robot to execute K-step action sequences, allowing per-step perturbations to accumulate through integration. We propose SILENTDRIFT, a stealthy black-box backdoor attack exploiting this vulnerability. Our method employs the Smootherstep function to construct perturbations with guaranteed C2 continuity, ensuring zero velocity and acceleration at trajectory boundaries to satisfy strict kinematic consistency constraints. Furthermore, our keyframe attack strategy selectively poisons only the critical approach phase, maximizing impact while minimizing trigger exposure. The resulting poisoned trajectories are visually indistinguishable from successful demonstrations. Evaluated on LIBERO, SILENTDRIFT achieves a 93.2% Attack Success Rate with a poisoning rate under 2%, while maintaining a 95.3% Clean Task Success Rate.
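
The C2-continuity claim rests on a property of the standard Smootherstep polynomial, s(t) = 6t^5 - 15t^4 + 10t^3 on [0, 1], whose first and second derivatives both vanish at t = 0 and t = 1. The sketch below only checks that property; it assumes the abstract refers to this common quintic form and does not reproduce anything else from the paper.

```python
def smootherstep(t):
    """Quintic Smootherstep: rises from 0 to 1 on [0, 1]."""
    return 6 * t**5 - 15 * t**4 + 10 * t**3

def smootherstep_velocity(t):
    """First derivative, 30 t^2 (1 - t)^2."""
    return 30 * t**2 * (1 - t) ** 2

def smootherstep_acceleration(t):
    """Second derivative, 60 t (1 - t) (1 - 2t)."""
    return 60 * t * (1 - t) * (1 - 2 * t)

for t in (0.0, 0.5, 1.0):
    print(t, smootherstep(t), smootherstep_velocity(t), smootherstep_acceleration(t))
```

Because velocity and acceleration are zero at both endpoints, a profile shaped by this function joins a resting segment without a jump in either quantity.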

2509.06093 2026-06-02 cs.DB cond-mat.mtrl-sci cs.AI cs.CL

Language-Native Materials Processing Design by Lightly Structured Text Database and Reasoning Large Language Model

Yuze Liu, Zhaoyuan Zhang, Xiangsheng Zeng, Yihe Zhang, Leping Yu, Liu Yang, Lejia Wang, Xi Yu

Affiliations State Key Laboratory of Advanced Materials for Intelligent Sensing, Key Laboratory of Organic Integrated Circuit, Ministry of Education & Tianjin Key Laboratory of Molecular Optoelectronic Sciences, Department of Chemistry, School of Science, Tianjin University; Language Intelligence Technology Co., Ltd.; College of Intelligence and Computing, Tianjin University; School of Materials and Chemical Engineering, Ningbo University of Technology

AI summary Recasts materials synthesis planning as a text reasoning problem, combining a lightly structured knowledge substrate with retrieval-augmented generation and experience-augmented reasoning, and reaches a high-quality protocol for boron nitride nanosheet exfoliation within three iterative rounds.

Abstract

Materials synthesis procedures are predominantly documented as narrative text in papers, protocols, and laboratory records, placing them beyond the reach of conventional data-driven optimization frameworks. This language-native character poses a particular challenge for complex, multistage processes such as the preparation of boron nitride nanosheets (BNNS), where outcomes depend on path-dependent choices in exfoliation and functionalization. Here, we recast synthesis planning of the materials as a text reasoning problem enabled by a lightly structured knowledge substrate that preserves the procedural logic and causal contexts while exposing computable elements for retrieval. Built on this representation, our framework combines semantic matching, lexical search, and parameter-aware filtering to support retrieval-augmented generation with more accurate and better-grounded synthesis guidance. We further introduce experience-augmented reasoning, in which iteratively refined text guides distilled from multi-source narratives support hypothesis generation, failure diagnosis, and protocol revision. We validated the framework in the targeted exfoliation of BNNS, a synthesis problem governed by multivariate constraints and limited transferability of literature protocols across laboratory settings. By integrating dispersed literature evidence with experimentally observed failure modes, the system converged within only three iterative rounds on a high-performing protocol that yielded high-quality ultrathin nanosheets meeting the target specifications, substantially shortening what is often a prolonged cycle of expert-led trial-and-error. By enabling language-native reasoning over procedural knowledge, this framework moves AI beyond literature assistance toward active synthesis planning, adaptation, and acceleration in complex materials workflows.

2210.12860 2026-06-02 math.OC cs.CC cs.LG

Explicit Second-Order Min-Max Optimization: Practical Algorithms and Complexity Analysis

显式二阶极小极大优化:实用算法与复杂度分析

Tianyi Lin, Panayotis Mertikopoulos, Michael I. Jordan

发表机构 * Department of Electrical Engineering and Computer Sciences(电气工程与计算机科学系) Department of Statistics(统计学系) University of California, Berkeley(加州大学伯克利分校) Department of Industrial Engineering and Operations Research(工业工程与运作研究系) Columbia University(哥伦比亚大学) Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG(格勒诺布尔阿尔卑斯大学、CNRS、Inria、格勒诺布尔INP、LIG)

AI总结 针对凸-凹无约束极小极大优化问题,提出并分析了几种不精确正则化牛顿型方法,证明其迭代在界集内且平均迭代在O(ε^{-2/3})次内收敛到ε-鞍点,并通过Schur分解和线性系统求解器高效求解子问题,在合成基准和AUC最大化实际应用中优于一阶方法。

Comments Accepted by TMLR; Adding funding information; 35 pages

详情
AI中文摘要

我们提出并分析了几种不精确正则化牛顿型方法,用于寻找凸-凹无约束极小极大优化问题的全局鞍点。与一阶方法相比,我们对二阶方法在极小极大优化中的理解相对有限,因为利用二阶信息获得全局收敛率可能更加复杂。在本文中,我们研究了即使在不精确情况下,二阶信息如何用于加速额外梯度方法。特别地,我们证明了所提出的方法生成的迭代保持在有界集内,并且平均迭代在受限间隙函数意义下在O(ε^{-2/3})次内收敛到ε-鞍点。我们还提供了一个简单的例程来求解每次迭代的子问题,该例程需要一次Schur分解和O(log log(1/ε))次对拟上三角系统的线性系统求解器调用。因此,我们的方法通过将所需Schur分解次数减少O(log log(1/ε))因子,改进了现有的基于线搜索的二阶极小极大优化方法。最后,我们在合成基准和来自标准LIBSVM数据集上AUC最大化的实际应用上评估了我们的方法,发现所提出的二阶方法在这些问题上比代表性的一阶方法具有更强的实际效率。

英文摘要

We propose and analyze several inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems. Compared to first-order methods, our understanding of second-order methods for min-max optimization is relatively limited, as obtaining global rates of convergence with second-order information can be much more involved. In this paper, we examine how second-order information is used to speed up extra-gradient methods, even under inexactness. In particular, we show that the proposed methods generate iterates that remain within a bounded set and that the averaged iterates converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function. We also provide a simple routine for solving the subproblem at each iteration, requiring a single Schur decomposition and $O(\log\log(1/\epsilon))$ calls to a linear system solver in a quasi-upper-triangular system. Thus, our method improves the existing line-search-based second-order min-max optimization methods by shaving off an $O(\log\log(1/\epsilon))$ factor in the required number of Schur decompositions. Finally, we evaluate our method on both synthetic benchmarks and a real-world application arising from AUC maximization on standard LIBSVM datasets, and find that the proposed second-order approach delivers stronger practical efficiency than representative first-order methods on these problems.

2511.06163 2026-06-02 eess.IV cs.CV cs.LG physics.med-ph

Cross-Modal Fine-Tuning of 3D Convolutional Foundation Models for ADHD Classification with Low-Rank Adaptation

基于低秩适应的3D卷积基础模型跨模态微调用于ADHD分类

Jyun-Ping Kao, Shinyeong Rho, Shahar Lazarev, Hyun-Hae Cho, Fangxu Xing, Taehoon Shin, C. -C. Jay Kuo, Jonghye Woo

发表机构 * National Institute of Mental Health, National Institutes of Health(国家精神卫生研究所,国立卫生研究院)

AI总结 提出一种参数高效的迁移学习方法,通过3D低秩适应(LoRA)将预训练于CT图像的3D卷积基础模型微调至MRI的ADHD分类任务,在公开扩散MRI数据集上达到71.9%准确率和0.716 AUC,仅需164万可训练参数。

Comments Accepted for presentation at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026

详情
Journal ref
2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), pp. 1-4
AI中文摘要

儿童注意缺陷/多动障碍(ADHD)的早期诊断在改善教育和心理健康结果中起着关键作用。然而,由于异质性表现和与其他疾病的重叠症状,使用神经影像数据诊断ADHD仍然具有挑战性。为了解决这一问题,我们提出了一种新颖的参数高效迁移学习方法,将预训练于CT图像的大规模3D卷积基础模型适应于基于MRI的ADHD分类任务。我们的方法通过将3D卷积核分解为2D低秩更新,在3D中引入低秩适应(LoRA),大幅减少可训练参数,同时实现优越性能。在公开扩散MRI数据库上的五折交叉验证评估中,我们的3D LoRA微调策略取得了最先进的结果,一个模型变体达到71.9%的准确率,另一个达到0.716的AUC。两个变体仅使用164万可训练参数(比完全微调的基础模型少113倍以上)。我们的结果代表了神经影像中基础模型首次成功的跨模态(CT到MRI)适应之一,为ADHD分类建立了新的基准,同时大幅提高了效率。

英文摘要

Early diagnosis of attention-deficit/hyperactivity disorder (ADHD) in children plays a crucial role in improving outcomes in education and mental health. Diagnosing ADHD using neuroimaging data, however, remains challenging due to heterogeneous presentations and overlapping symptoms with other conditions. To address this, we propose a novel parameter-efficient transfer learning approach that adapts a large-scale 3D convolutional foundation model, pre-trained on CT images, to an MRI-based ADHD classification task. Our method introduces Low-Rank Adaptation (LoRA) in 3D by factorizing 3D convolutional kernels into 2D low-rank updates, dramatically reducing trainable parameters while achieving superior performance. In a five-fold cross-validated evaluation on a public diffusion MRI database, our 3D LoRA fine-tuning strategy achieved state-of-the-art results, with one model variant reaching 71.9% accuracy and another attaining an AUC of 0.716. Both variants use only 1.64 million trainable parameters (over 113x fewer than a fully fine-tuned foundation model). Our results represent one of the first successful cross-modal (CT-to-MRI) adaptations of a foundation model in neuroimaging, establishing a new benchmark for ADHD classification while greatly improving efficiency.

2601.04539 2026-06-02 cs.NE cs.AI cs.LG

Paradoxical noise preference in RNNs

RNN中的矛盾噪声偏好

Noah Eckstein, Manoj Srinivasan

发表机构 * Department of Mechanical and Aerospace Engineering(机械与航空航天工程系)

AI总结 研究发现,在循环神经网络中,训练时注入的噪声在测试时移除反而会降低性能,网络偏好训练时的噪声水平,该现象源于噪声引起的固定点偏移。

Comments Published in Transactions on Machine Learning Research (TMLR), 2026; 21 pages, 8 figures

详情
Journal ref
Transactions on Machine Learning Research, 2026
AI中文摘要

在用于模拟生物神经网络的循环神经网络(RNN)中,通常在训练期间引入噪声以模拟生物变异性和正则化学习。预期在测试时去除噪声应保持或提高性能。与这一直觉相反,我们发现连续时间RNN(CTRNN)通常在训练噪声水平或接近该水平时表现最佳。这种噪声偏好通常出现在噪声注入到神经激活函数内部时;而在激活函数外部注入噪声训练的网络在零噪声时表现最佳。该现象在多种任务中对于足够大的训练噪声鲁棒地出现;我们还展示了该现象出现在前馈神经网络中,而不仅仅是RNN中。我们的分析表明,该现象源于RNN底层随机动力学中固定点(平稳分布)的噪声诱导偏移。这些固定点偏移依赖于噪声水平,并在去除噪声时使网络输出产生偏差,从而降低性能。分析和数值结果表明,当神经状态在激活函数非线性附近运行时会产生偏差,此时噪声被不对称地衰减,而性能优化激励了在这些非线性附近运行;对于噪声在激活函数内部的网络存在这种性能激励,而外部噪声的网络则没有,这解释了为什么只有内部噪声网络表现出偏好。因此,网络可能过拟合到训练噪声本身,而不仅仅是输入-输出数据。该现象不同于随机共振,后者中非零噪声增强信号处理。我们的发现揭示了训练噪声可以成为神经网络学习到的计算的一部分,对理解神经群体动力学和设计鲁棒的人工RNN具有启示意义。

英文摘要

In recurrent neural networks (RNNs) used to model biological neural networks, noise is typically introduced during training to emulate biological variability and regularize learning. The expectation is that removing the noise at test time should preserve or improve performance. Contrary to this intuition, we find that continuous-time RNNs (CTRNNs) often perform best at or near the training noise level. This noise preference typically arises when noise is injected inside the neural activation function; networks trained with noise injected outside the activation function perform best with zero noise. The phenomenon arises robustly in diverse tasks for large enough training noise; we also show the phenomenon arising in feedforward neural networks, not just in RNNs. Our analyses show that the phenomenon stems from noise-induced shifts of fixed points (stationary distributions) in the underlying stochastic dynamics of the RNNs. These fixed point shifts are noise-level dependent and bias the network outputs when the noise is removed, degrading performance. Analytical and numerical results show that the bias arises when neural states operate near activation-function nonlinearities, where noise is asymmetrically attenuated, and that performance optimization incentivizes operation near these nonlinearities; such performance incentives exist for networks with noise inside, but not outside, the activation function, explaining why only noise-in networks show the preference. Thus, networks can overfit to the training noise itself rather than just to the input-output data. The phenomenon is distinct from stochastic resonance, wherein nonzero noise enhances signal processing. Our findings reveal that training noise can become an integral part of the computation learned by neural networks, with implications for understanding neural population dynamics and for the design of robust artificial RNNs.
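The asymmetric attenuation of noise near a saturating nonlinearity can be illustrated with a single tanh unit. The sketch below is a toy illustration, not the paper's CTRNN setup: the function names, the operating point, and the Gaussian noise model are all assumptions made here. It compares the mean output when noise enters inside the activation, tanh(x + ξ), with the mean output when noise is added outside, tanh(x) + ξ.

```python
import math

def gaussian_mean(fn, sigma, n=4001, span=8.0):
    """Approximate E[fn(xi)] for xi ~ N(0, sigma^2) with a normalized grid sum."""
    if sigma == 0.0:
        return fn(0.0)
    step = 2.0 * span / (n - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n):
        z = -span + k * step            # standard-normal coordinate
        w = math.exp(-0.5 * z * z)      # unnormalized Gaussian weight
        total += w * fn(sigma * z)
        weight_sum += w
    return total / weight_sum

def mean_output_noise_inside(x, sigma):
    """Mean of tanh(x + xi): noise passes through the nonlinearity."""
    return gaussian_mean(lambda xi: math.tanh(x + xi), sigma)

def mean_output_noise_outside(x, sigma):
    """Mean of tanh(x) + xi: noise is added after the nonlinearity."""
    return gaussian_mean(lambda xi: math.tanh(x) + xi, sigma)

if __name__ == "__main__":
    x, sigma = 1.5, 0.5                 # hypothetical operating point near saturation
    print("noise-free output  :", math.tanh(x))
    print("mean, noise inside :", mean_output_noise_inside(x, sigma))
    print("mean, noise outside:", mean_output_noise_outside(x, sigma))
```

In this toy, the mean with noise outside matches the noise-free output, while the mean with noise inside is pulled below tanh(1.5) because upward fluctuations are squashed more than downward ones. A unit tuned to give the right mean output under noise would therefore be biased once the noise is removed.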

2601.00672 2026-06-02 math.NA cs.LG cs.NA

Sparse FEONet: A Low-Cost, Memory-Efficient Operator Network via Finite-Element Local Sparsity for Parametric PDEs

稀疏FEONet:通过有限元局部稀疏性实现低计算成本、高内存效率的参数化PDE算子网络

Seungchan Ko, Jiyeon Kim, Dongwook Shin

发表机构 * Department of Mathematics, Inha University(仁荷大学数学系) Department of Mathematics, Ajou University(亚洲大学数学系)

AI总结 针对参数化PDE的有限元算子网络(FEONet)在大规模问题中计算成本高、精度下降的问题,提出一种基于有限元结构的新型稀疏网络架构,在保持精度相当的同时显著提升计算效率,并给出理论逼近和稳定性分析。

详情
AI中文摘要

本文研究了有限元算子网络(FEONet),这是一种用于参数化问题的算子学习方法,最初由J. Y. Lee、S. Ko和Y. Hong在《Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs》(SIAM J. Sci. Comput., 47(2), C501-C528, 2025)中提出。FEONet在有限元空间上实现参数到解的映射,并采用无需训练数据的训练过程,同时在一大类问题上表现出高精度和鲁棒性。然而,随着单元数量的增加,其计算成本上升且精度可能下降,这给大规模问题带来了显著挑战。在本文中,我们受有限元结构启发,提出一种新的稀疏网络架构来解决这一问题。通过大量数值实验,我们表明所提出的稀疏网络在保持相当精度的同时,在计算成本和效率方面实现了显著改进。我们还建立了理论结果,证明稀疏架构能够有效逼近目标算子,并提供了稳定性分析以确保可靠的训练和预测。

英文摘要

In this paper, we study the finite element operator network (FEONet), an operator-learning method for parametric problems, originally introduced in J. Y. Lee, S. Ko, and Y. Hong, Finite Element Operator Network for Solving Elliptic-Type Parametric PDEs, SIAM J. Sci. Comput., 47(2), C501-C528, 2025. FEONet realizes the parameter-to-solution map on a finite element space and admits a training procedure that does not require training data, while exhibiting high accuracy and robustness across a broad class of problems. However, its computational cost increases and its accuracy may deteriorate as the number of elements grows, posing notable challenges for large-scale problems. To address this issue, we propose a new sparse network architecture motivated by the structure of the finite elements. Through extensive numerical experiments, we show that the proposed sparse network achieves substantial improvements in computational cost and efficiency while maintaining comparable accuracy. We also establish theoretical results demonstrating that the sparse architecture can approximate the target operator effectively and provide a stability analysis ensuring reliable training and prediction.

2601.00389 2026-06-02 cs.CR cs.LG cs.NI

NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion

NOS-Gate: 面向消费网关的时序控制规避下队列感知流式入侵检测系统

Muhammad Bilal, Omer Tariq, Hasan Ahmed

发表机构 * School of Computing and Communications, Lancaster University(计算与通信学院,兰卡斯特大学) School of Computing, Korea Advanced Institute of Science and Technology(计算学院,韩国科学技术院)

AI总结 提出一种轻量级流式入侵检测系统NOS-Gate,基于网络优化脉冲动力学和K-of-M持久规则,在时序控制规避下实现高召回率低延迟的加密流量元数据检测。

Comments 9 pages, 3 figures, 4 tables. M. Bilal, O. Tariq and H. Ahmed, "NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion," in IEEE Transactions on Consumer Electronics, doi: 10.1109/TCE.2026.3682516

详情
AI中文摘要

时序和突发模式可能通过加密泄露,自适应攻击者可利用这一点。这削弱了独立消费网关中仅基于元数据的检测能力。因此,消费网关需要在严格的CPU和延迟预算下,仅使用元数据对加密流量进行流式入侵检测。我们提出了一种针对独立网关的流式入侵检测系统,该系统为每个流实例化一个源自网络优化脉冲(NOS)动力学的轻量级两状态单元,称为NOS-Gate。NOS-Gate对固定长度的元数据特征窗口进行评分,并在K-of-M持久规则下触发可逆缓解措施,在加权公平队列(WFQ)下暂时降低该流的权重。我们使用可执行程序worlds基准测试评估了NOS-Gate在时序控制规避下的性能,该基准测试指定了良性设备进程、可审计的攻击者预算、竞争结构以及数据包级WFQ重放以量化队列影响。所有方法均通过烧入分位数阈值进行无标签校准。在多个可复现的worlds和恶意事件中,在达到0.1%假阳性率的工作点下,NOS-Gate实现了0.952的事件召回率,而最佳基线为0.857。在门控下,它将p99.9排队延迟和p99.9附带延迟降低,CPU上每个流窗口的平均评分成本约为2.09微秒。

英文摘要

Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consumer gateways need streaming intrusion detection on encrypted traffic using metadata only, under tight CPU and latency budgets. We present a streaming IDS for stand-alone gateways that instantiates a lightweight two-state unit derived from Network-Optimised Spiking (NOS) dynamics per flow, named NOS-Gate. NOS-Gate scores fixed-length windows of metadata features and, under a $K$-of-$M$ persistence rule, triggers a reversible mitigation that temporarily reduces the flow's weight under weighted fair queueing (WFQ). We evaluate NOS-Gate under timing-controlled evasion using an executable "worlds" benchmark that specifies benign device processes, auditable attacker budgets, contention structure, and packet-level WFQ replay to quantify queue impact. All methods are calibrated label-free via burn-in quantile thresholding. Across multiple reproducible worlds and malicious episodes, at an achieved $0.1\%$ false-positive operating point, NOS-Gate attains 0.952 incident recall versus 0.857 for the best baseline in these runs. Under gating, it reduces p99.9 queueing delay and p99.9 collateral delay with a mean scoring cost of $\approx 2.09\,\mu\mathrm{s}$ per flow-window on CPU.
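A K-of-M persistence rule of the kind mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NOS-Gate implementation: the class name, the score threshold, and the weight-reduction factor are hypothetical, and the NOS scoring unit itself is not modeled. The gate fires only while at least K of the last M window scores exceed the threshold, which makes the mitigation reversible once scores drop again.

```python
from collections import deque

class KofMGate:
    """Fire when at least k of the last m window scores exceed a threshold."""

    def __init__(self, k, m, threshold):
        if not 1 <= k <= m:
            raise ValueError("require 1 <= k <= m")
        self.k = k
        self.threshold = threshold
        self.recent = deque(maxlen=m)   # sliding record of the last m exceedances

    def update(self, score):
        """Record one window score and return True if mitigation is active."""
        self.recent.append(score > self.threshold)
        return sum(self.recent) >= self.k

def wfq_weight(base_weight, mitigated, reduction=0.25):
    """Hypothetical reversible mitigation: scale the flow's WFQ weight down."""
    return base_weight * reduction if mitigated else base_weight

if __name__ == "__main__":
    gate = KofMGate(k=3, m=5, threshold=0.8)
    scores = [0.9, 0.1, 0.95, 0.2, 0.85, 0.1, 0.1, 0.1]   # made-up window scores
    for s in scores:
        active = gate.update(s)
        print(s, active, wfq_weight(1.0, active))
```

Requiring persistence across several windows trades a short detection delay for fewer one-off false alarms, and because the window slides, the flow's weight is restored automatically when exceedances stop.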

2512.22060 2026-06-02 cs.CR cs.CL cs.CY

Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management

面向安全与合规的人工智能:NLP模型生命周期管理的组织标准与协议

Sunil Arora, John Hastings

发表机构 * ICAART 2026

AI总结 提出SC-NLP-LMF六阶段框架,通过系统综述整合NIST AI RMF等标准,结合偏差检测、差分隐私等方法,保障NLP系统从开发到退役的安全与合规,并以医疗案例验证其有效性。

Comments 9 pages, 2 tables, 1 figure

详情
Journal ref
Proc. of the 18th International Conference on Agents and Artificial Intelligence - Vol. 2: ICAART, 1624-1634, 2026
AI中文摘要

自然语言处理系统越来越多地用于医疗、金融和政府等敏感领域,处理大量个人和受监管数据。然而,这些系统引入了与安全、隐私和法规遵从相关的独特风险,现有的人工智能治理框架未能完全解决这些问题。本文介绍了安全合规的NLP生命周期管理框架(SC-NLP-LMF),这是一个全面的六阶段模型,旨在确保NLP系统从开发到退役的安全运行。该框架通过对45篇同行评审和监管来源进行基于PRISMA的系统综述而开发,与领先标准(包括NIST AI RMF、ISO/IEC 42001:2023、欧盟AI法案和MITRE ATLAS)保持一致。它集成了偏差检测、隐私保护(差分隐私、联邦学习)、安全部署、可解释性和安全模型退役等成熟方法。一个医疗案例研究展示了SC-NLP-LMF如何检测新兴术语漂移(例如,与COVID相关的语言)并指导合规的模型更新。该框架为组织提供了一个实用的、覆盖全生命周期的结构,用于在高风险环境中开发、部署和维护安全且负责任的NLP系统。

英文摘要

Natural Language Processing (NLP) systems are increasingly used in sensitive domains such as healthcare, finance, and government, where they handle large volumes of personal and regulated data. However, these systems introduce distinct risks related to security, privacy, and regulatory compliance that are not fully addressed by existing AI governance frameworks. This paper introduces the Secure and Compliant NLP Lifecycle Management Framework (SC-NLP-LMF), a comprehensive six-phase model designed to ensure the secure operation of NLP systems from development to retirement. The framework, developed through a systematic PRISMA-based review of 45 peer-reviewed and regulatory sources, aligns with leading standards, including NIST AI RMF, ISO/IEC 42001:2023, the EU AI Act, and MITRE ATLAS. It integrates established methods for bias detection, privacy protection (differential privacy, federated learning), secure deployment, explainability, and secure model decommissioning. A healthcare case study illustrates how SC-NLP-LMF detects emerging terminology drift (e.g., COVID-related language) and guides compliant model updates. The framework offers organizations a practical, lifecycle-wide structure for developing, deploying, and maintaining secure and accountable NLP systems in high-risk environments.

2512.18043 2026-06-02 cs.CR cs.AI cs.CY

Securing Agentic AI Systems -- A Multilayer Security Framework

保护自主AI系统——一种多层安全框架

Sunil Arora, John Hastings

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对自主AI系统的独特安全挑战,本文采用设计科学研究方法,提出了一种生命周期感知的安全框架MAAIS,并引入自主AI的CIAA概念,通过多层防御机制确保AI生命周期的机密性、完整性、可用性和问责性,最后利用MITRE ATLAS进行验证。

Comments 6 pages, 2 figures, 1 table

详情
Journal ref
2025 IEEE 5th International Conference on Robotics, Automation, and Artificial Intelligence (RAAI)
AI中文摘要

保护自主人工智能(AI)系统需要应对由自主、决策和自适应行为引入的复杂网络风险。自主AI系统正越来越多地部署在工业、组织以及网络安全、金融和医疗等关键领域。然而,它们的自主性带来了独特的安全挑战,包括未经授权的操作、对抗性操纵和动态环境交互。现有的AI安全框架未能充分应对这些挑战或自主AI的独特细微差别。本研究采用设计科学研究(DSR)方法,开发了一种专门针对自主AI系统的生命周期感知安全框架。本文介绍了MAAIS,一个自主安全框架,以及自主AI的CIAA(机密性、完整性、可用性和问责性)概念。MAAIS集成了多个防御层,以在AI生命周期中维护CIAA。通过映射已建立的MITRE ATLAS(人工智能系统对抗威胁全景)AI策略进行框架验证。本研究为在企业环境中安全部署和治理自主AI提供了一种结构化、标准化且基于框架的方法。该框架面向企业CISO、安全、AI平台和工程团队,并提供了保护自主AI工作负载的详细分步方法。

英文摘要

Securing Agentic Artificial Intelligence (AI) systems requires addressing the complex cyber risks introduced by autonomous, decision-making, and adaptive behaviors. Agentic AI systems are increasingly deployed across industries, organizations, and critical sectors such as cybersecurity, finance, and healthcare. However, their autonomy introduces unique security challenges, including unauthorized actions, adversarial manipulation, and dynamic environmental interactions. Existing AI security frameworks do not adequately address these challenges or the unique nuances of agentic AI. This research develops a lifecycle-aware security framework specifically designed for agentic AI systems using the Design Science Research (DSR) methodology. The paper introduces MAAIS, an agentic security framework, and the agentic AI CIAA (Confidentiality, Integrity, Availability, and Accountability) concept. MAAIS integrates multiple defense layers to maintain CIAA across the AI lifecycle. Framework validation is conducted by mapping with the established MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) AI tactics. The study contributes a structured, standardized, and framework-based approach for the secure deployment and governance of agentic AI in enterprise environments. This framework is intended for enterprise CISOs, security, AI platform, and engineering teams and offers a detailed step-by-step approach to securing agentic AI workloads.

2512.02342 2026-06-02 math.OC cs.LG stat.ML

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

非光滑优化的保护性随机Polyak步长:无需小(次)梯度的鲁棒性能

Dimitris Oikonomou, Nicolas Loizou

发表机构 * Mathematical Institute for Data Science (MINDS), Johns Hopkins University, Baltimore, MD, USA(数据科学数学研究所(MINDS),约翰霍普金斯大学,巴尔的摩,MD,美国) Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA(计算机科学系,约翰霍普金斯大学,巴尔的摩,MD,美国) Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA(应用数学与统计学系,约翰霍普金斯大学,巴尔的摩,MD,美国)

AI总结 针对非光滑凸优化问题,提出保护性随机Polyak步长(SPS_safe)用于随机次梯度方法,在无需强假设下提供收敛保证,并融入动量机制,实验验证其在深度神经网络训练中避免梯度消失的鲁棒性。

Comments 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

随机Polyak步长(SPS)已被证明是随机梯度下降(SGD)的一个有前景的选择,在光滑凸和非凸优化问题(包括深度神经网络训练)上,与最先进方法相比具有竞争性能。然而,该方法向非光滑设置的扩展仍处于早期阶段,通常依赖于插值假设或需要知道最优解。在这项工作中,我们为随机次梯度方法提出了一种新的SPS变体——保护性SPS(SPS$_{safe}$),并在无需强假设的情况下为非光滑凸优化提供了严格的收敛保证。我们进一步将动量融入更新规则中,得到了同样严格的理论结果。在凸基准和深度神经网络上的综合实验证实了我们的理论:所提出的步长在现有自适应基线中实现了竞争性能,并在广泛的问题设置中表现出稳定行为。最后,在深度神经网络训练的背景下,我们的步长下的梯度范数不会崩溃到(接近)零,表明了对梯度消失的鲁棒性。

英文摘要

The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate our theory: the proposed step size is competitive with existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Finally, in the context of deep neural network training, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.
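For orientation, the classical capped Polyak step applied to a subgradient method can be sketched as below. The abstract does not spell out the safeguard used in SPS$_{safe}$, so this sketch does not reproduce it; it shows only the well-known rule step = min((f(x) - f*) / (c * ||g||^2), gamma_max). The function names, the one-dimensional objective |x - target|, the zero step on a zero subgradient, and the constants are assumptions chosen for illustration.

```python
def polyak_step(loss, loss_star, grad, c=1.0, gamma_max=0.5):
    """Capped Polyak step: min((loss - loss_star) / (c * grad^2), gamma_max)."""
    grad_sq = grad * grad
    if grad_sq == 0.0:
        return 0.0          # assumption of this sketch: take no step on a zero subgradient
    return min((loss - loss_star) / (c * grad_sq), gamma_max)

def subgradient_abs(x, target):
    """A subgradient of f(x) = |x - target| (0 is chosen at the kink)."""
    if x > target:
        return 1.0
    if x < target:
        return -1.0
    return 0.0

def run(x0, target, steps, gamma_max=0.5):
    """Subgradient method on f(x) = |x - target| with optimal value f* = 0."""
    x = x0
    for _ in range(steps):
        loss = abs(x - target)
        g = subgradient_abs(x, target)
        x -= polyak_step(loss, 0.0, g, gamma_max=gamma_max) * g
    return x

if __name__ == "__main__":
    print(run(0.0, 3.0, steps=10))                  # capped steps of 0.5, then lands on 3.0
    print(run(10.0, 3.0, steps=1, gamma_max=100.0)) # uncapped: one step reaches the minimizer
```

The division by the squared (sub)gradient norm is where small gradients make the uncapped step large, which is why capped and safeguarded variants are studied.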

2512.10234 2026-06-02 cs.HC cs.AI

InFerActive: Interactive Tree-Based Exploration of LLM Sampling for Safety Evaluation

InFerActive: 面向安全评估的LLM采样交互式树状探索

Junhyeong Hwangbo, Soohyun Lee, Hyeon Jeon, Kyochul Jang, Minsoo Cheong, Youngjae Yu, Jinwook Seo

发表机构 * Seoul National University(首尔国立大学)

AI总结 提出InFerActive系统,通过广度优先采样构建可导航短语树,提升LLM安全评估中低概率有害输出的覆盖率和效率,相比随机采样减少5倍样本量。

Comments v2: Revised version

详情
AI中文摘要

即使在评估中表现安全的LLM,在部署时仍可能产生有害响应。由于随机采样对同一提示产生不同响应,低概率的有害输出仍可能大规模到达用户。常见的人工评估工作流为每个提示生成大量随机样本,并在静态电子表格中审查。这种做法扩展性差,迫使评估者反复重读近乎重复的前缀。为解决此问题,我们提出InFerActive,一个交互式系统,将采样结果可视化为可导航的短语树,允许评估者按需过滤、探索和扩展生成空间。InFerActive利用广度优先采样,一种新颖的树构建过程,在匹配随机采样的有害响应覆盖范围的同时,所需样本最多减少5.0倍。两项受控用户研究(各N=12)表明,InFerActive在评估效率和覆盖率上显著优于电子表格和基本树基线。

英文摘要

Even LLMs that appear safe during evaluation can still produce harmful responses in deployment. Because stochastic sampling yields different responses to the same prompt, low-probability harmful outputs can still reach users at scale. Common human evaluation workflows generate many random samples per prompt and review them in static spreadsheets. The practice scales poorly, forcing evaluators to repeatedly reread near-duplicate prefixes. To address this, we present InFerActive, an interactive system that visualizes sampling results as a navigable tree of readable phrases, allowing evaluators to filter, explore, and extend the generation space on demand. InFerActive utilizes breadth-first sampling, a novel tree construction procedure that matches the harmful-response coverage of random sampling while requiring up to 5.0x fewer samples. Two controlled user studies (N = 12 each) demonstrate that InFerActive significantly improves evaluation efficiency and coverage over both spreadsheet and basic tree baselines.
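The idea of merging near-duplicate prefixes into a navigable tree can be sketched with a plain prefix tree over tokenized samples. This is an illustrative sketch only: it does not implement the paper's breadth-first sampling or its phrase segmentation, and the node layout, function names, and example responses are assumptions.

```python
def build_prefix_tree(samples):
    """Merge token sequences into a tree; each node counts the samples passing through it."""
    root = {"count": 0, "children": {}}
    for tokens in samples:
        node = root
        node["count"] += 1
        for tok in tokens:
            node = node["children"].setdefault(tok, {"count": 0, "children": {}})
            node["count"] += 1
    return root

def render(node, depth=0, lines=None):
    """Return an indented listing in which every shared prefix appears once."""
    if lines is None:
        lines = []
    for tok, child in sorted(node["children"].items()):
        lines.append("  " * depth + f"{tok} ({child['count']})")
        render(child, depth + 1, lines)
    return lines

if __name__ == "__main__":
    # Hypothetical sampled responses, already split into tokens.
    samples = [
        ["I", "can", "help"],
        ["I", "can", "not"],
        ["I", "will"],
    ]
    print("\n".join(render(build_prefix_tree(samples))))
```

With the samples grouped this way, a reviewer reads the shared prefix "I can" once and sees how many samples continue down each branch, instead of rereading it in every spreadsheet row.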

2511.01064 2026-06-02 stat.ML cs.LG stat.CO

Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry

存在偶对称和椭圆对称时变分推断的广义保证

Charles C. Margossian, Isaac E. Rankin, Lawrence K. Saul

发表机构 * University of British Columbia, Department of Statistics(不列颠哥伦比亚大学统计学系) Flatiron Institute, Center for Computational Mathematics(Flatiron研究所计算数学中心)

AI总结 本文证明,对于所有f-散度,在偶对称和椭圆对称条件下,变分推断的驻点能分别恢复目标密度的均值和相关矩阵,推广了先前对逆KL散度的结果。

详情
AI中文摘要

变分推断(VI)通过在易处理的分布族中寻找最佳匹配$q$来近似目标密度$p$。最佳变分近似通过最小化分布之间的散度$D(p||q)$得到,目前已提出多种散度作为VI的目标函数,不同选择导致不同近似。我们证明,即使这些散度具有不同的最小化器,所得近似都遵循某些对称匹配原则。具体来说,我们的结果适用于所有$f$-散度,这是一大类包括逆和前向Kullback-Leibler散度以及$\alpha$-散度的散度。我们证明,在存在偶对称时,$f$-散度的任何驻点都保证恢复$p$的均值;同样,在存在椭圆对称时,任何驻点都保证恢复其相关矩阵。为获得这些保证,我们假设$p$和$q$是单峰的,但值得注意的是,我们不要求它们是对数凹、轻尾或处处光滑的。这些保证推广了先前对逆Kullback-Leibler散度在$p$为对数凹时得到的结果。它们还扩展到目标密度$p$仅在其部分坐标上呈现对称性的情况。这些部分对称性自然出现在贝叶斯层次模型中,其中先验诱导出具有挑战性的几何结构,但仍具有对称轴。

英文摘要

Variational inference (VI) approximates a target density $p$ by the best match $q$ in a family of tractable distributions. The best variational approximation is found by minimizing a divergence between distributions, $D(p||q)$, and several divergences have been proposed as objective functions for VI, with different choices leading to different approximations. We show that even when these divergences have different minimizers, the resulting approximations all abide by certain symmetry-matching principles. Specifically, our results hold for all $f$-divergences, a broad class which includes the reverse and forward Kullback-Leibler divergences and the $\alpha$-divergences. We show that in the presence of even symmetry, any stationary point of an $f$-divergence is guaranteed to recover the mean of $p$, and likewise, in the presence of elliptical symmetry, any stationary point is guaranteed to recover its correlation matrix. To obtain these guarantees we assume that $p$ and $q$ are unimodal, but notably we do not require them to be log-concave, light-tailed, or even everywhere-smooth. These guarantees generalize a previous result obtained for the reverse Kullback-Leibler divergence when $p$ is log-concave. They also extend to cases where the target density $p$ only exhibits symmetry along some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry.

2512.06906 2026-06-02 cs.SE cs.CR cs.DB cs.LG

MINES: Explainable Anomaly Detection through Web API Invariant Inference

MINES:通过Web API不变式推断实现可解释的异常检测

Wenjie Zhang, Yun Lin, Chun Fung Amos Kwok, Xiwen Teoh, Xiaofei Xie, Frank Liauw, Hongyu Zhang, Jin Song Dong

发表机构 * National University of Singapore(新加坡国立大学) Shanghai Jiao Tong University(上海交通大学) Singapore Management University(新加坡管理大学) GovTech Singapore(新加坡政府科技局) Chongqing University(重庆大学)

AI总结 提出MINES方法,通过从模式级别推断可解释的API不变式来检测Web应用异常,显著降低误报率并提高召回率。

Comments Accepted by ICSE 2026

详情
AI中文摘要

检测Web应用的异常对于提供可靠的Web服务至关重要,这些应用是现代公司和政府运行的重要基础设施。许多现代Web应用基于Web API(例如RESTful、SOAP和WebSockets)运行,其暴露性会招致有意攻击或无意非法访问,导致系统行为异常。然而,此类异常可能与正常日志共享非常相似的日志,缺少用于日志区分的关键信息(可能存在于数据库中)。此外,日志实例可能包含噪声,这会进一步误导最先进的日志学习解决方案学习虚假相关性,从而产生用于异常检测的浅层模型和规则。在这项工作中,我们提出MINES,它从模式级别而非详细的原始日志实例推断可解释的API不变式用于异常检测,能够(1)显著区分日志中的噪声以识别精确的正常行为,以及(2)检测超出已记录日志的异常行为。技术上,MINES(1)将API签名转换为表模式以增强原始数据库模式;(2)在增强的数据库模式上推断潜在的数据库约束,以捕获API与数据库表之间的潜在关系。MINES使用LLM基于两个给定的表结构提取潜在关系,并使用正常日志实例拒绝或接受LLM生成的不变式。最后,MINES将推断的约束转换为不变式,生成用于验证运行时日志的Python代码。我们在TrainTicket、NiceFish、Gitea、Mastodon和NextCloud基准测试上针对Web篡改攻击,与LogRobust、LogFormer和WebNorm等基线进行了广泛评估。结果表明,MINES在引入几乎零误报的情况下实现了对异常的高召回率,代表了新的最先进水平。

英文摘要

Detecting anomalies in web applications, which are important infrastructure for running modern companies and governments, is crucial for providing reliable web services. Many modern web applications operate on web APIs (e.g., RESTful, SOAP, and WebSockets), and their exposure invites intended attacks or unintended illegal visits, causing abnormal system behaviors. However, the logs of such anomalies can be very similar to normal logs and can lack crucial information (which may reside in the database) for log discrimination. Further, log instances can also be noisy, which can further mislead state-of-the-art log-learning solutions into learning spurious correlations, resulting in superficial models and rules for anomaly detection. In this work, we propose MINES, which infers explainable API invariants for anomaly detection from the schema level instead of from detailed raw log instances, and which can (1) significantly discriminate noise in logs to identify precise normalities and (2) detect abnormal behaviors beyond the instrumented logs. Technically, MINES (1) converts API signatures into table schemas to enhance the original database schema; and (2) infers the potential database constraints on the enhanced database schema to capture the potential relationships between APIs and database tables. MINES uses an LLM to extract potential relationships based on two given table structures, and uses normal log instances to reject or accept LLM-generated invariants. Finally, MINES translates the inferred constraints into invariants and generates Python code for verifying the runtime logs. We extensively evaluate MINES on web-tamper attacks on the benchmarks of TrainTicket, NiceFish, Gitea, Mastodon, and NextCloud against baselines such as LogRobust, LogFormer, and WebNorm. The results show that MINES achieves high recall for the anomalies while introducing almost zero false positives, indicating a new state-of-the-art.

2511.08223 2026-06-02 stat.CO cs.LG cs.NA math.NA

High-Performance Variance-Covariance Matrix Construction Using an Uncentered Gram Formulation

使用非中心Gram形式的高性能方差-协方差矩阵构建

Felix Reichel

发表机构 * Department of Economics, Johannes Kepler University Linz(经济系,约翰尼斯·开普勒大学林茨)

AI总结 本文通过非中心Gram矩阵和修正项等价于成对差异定义,避免了显式中心化,将计算简化为一个p×p外积和一次减法,在Python基准测试中显著提升运行速度。

Comments 17 pages, 9 figures, 1 table

详情
Journal ref
Accepted at International Journal of Parallel, Emergent and Distributed Systems, 2026, Taylor & Francis, Unpublished
AI中文摘要

Reichel (2025) 将bariance定义为一种成对差异度量,该度量可以仅使用标量求和在线性时间内重写。我们通过证明涉及非中心Gram矩阵和修正项的标准矩阵表达式在代数上与成对差异定义相同,同时避免了显式中心化,将此思想扩展到协方差矩阵。然后计算简化为一个p×p维的外积和一次减法。Python中的基准测试显示出明显的运行时间增益,特别是在没有BLAS优化的情况下。可选的更快Gram矩阵例程(如RXTX, Rybin et al., 2025)进一步降低了总体成本。

英文摘要

Reichel (2025) defined the bariance as a pairwise-difference measure that can be rewritten in linear time using only scalar sums. We extend this idea to the covariance matrix by showing that the standard matrix expression involving the uncentered Gram matrix and a correction term is algebraically identical to the pairwise-difference definition while avoiding explicit centering. The computation then reduces to one outer product of dimension p-by-p and a single subtraction. Benchmarks in Python show clear runtime gains, especially when BLAS optimizations are absent. Optionally faster Gram-matrix routines such as RXTX (Rybin et al., 2025) further reduce overall cost.
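The identity described above can be checked numerically in a few lines. The sketch below is a pure-Python illustration rather than the paper's benchmarked code, and the function names and toy data are assumptions. It computes the sample covariance as (X^T X - s s^T / n) / (n - 1), where s is the vector of column sums, and compares it with a pairwise-difference form, taken here to be the standard one: the sum over all ordered pairs of (x_i - x_j)(x_i - x_j)^T divided by 2n(n - 1).

```python
def covariance_uncentered_gram(rows):
    """Sample covariance from the uncentered Gram matrix, without centering the data."""
    n, p = len(rows), len(rows[0])
    col_sums = [sum(r[j] for r in rows) for j in range(p)]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    # Correction term: outer product of the column sums divided by n (equals n * mean * mean^T).
    return [[(gram[i][j] - col_sums[i] * col_sums[j] / n) / (n - 1)
             for j in range(p)] for i in range(p)]

def covariance_pairwise(rows):
    """Reference: average of outer products of all pairwise differences, O(n^2 p^2)."""
    n, p = len(rows), len(rows[0])
    acc = [[0.0] * p for _ in range(p)]
    for a in rows:
        for b in rows:
            d = [a[k] - b[k] for k in range(p)]
            for i in range(p):
                for j in range(p):
                    acc[i][j] += d[i] * d[j]
    return [[acc[i][j] / (2 * n * (n - 1)) for j in range(p)] for i in range(p)]

if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]]   # toy data, 4 samples x 2 features
    print(covariance_uncentered_gram(data))
    print(covariance_pairwise(data))
```

The two results agree on this toy data. One general caveat: subtracting two large, nearly equal quantities can lose precision when the column means are large relative to the spread, so the uncentered form is best suited to well-scaled data.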

2512.02328 2026-06-02 q-bio.QM cs.LG

Molecular Embedding-Based Algorithm Selection in Protein-Ligand Docking

基于分子嵌入的蛋白质-配体对接算法选择

Jiabao Brad Wang, Siyuan Cao, Hongxuan Wu, Yiliang Yuan, Mustafa Misir

发表机构 * Division of Natural and Applied Sciences, Duke Kunshan University(昆山杜克大学自然科学与应用科学系) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学机器学习系)

AI总结 提出MolAS轻量级算法选择模型,利用预训练蛋白质和配体嵌入的注意力池化与浅层残差解码器预测对接算法性能,在五个基准上相比单一最优算法绝对提升高达15个百分点,并缩小了与虚拟最优算法之间17-66%的差距。

Comments 40 pages, 16 figures, 8 tables; updated to the accepted manuscript version

详情
Journal ref
J Cheminform 18, 47 (2026)
AI中文摘要

选择有效的对接算法高度依赖于具体情境,没有单一方法能在结构、化学和协议范围内可靠地表现。MolAS是一种轻量级算法选择模型,通过注意力池化和浅层残差解码器,从预训练的蛋白质和配体嵌入中预测每个算法的性能。使用数百到数千个标记复合物,MolAS在五个对接基准上相比单一最优算法(SBS)实现了高达15个百分点的绝对改进,并缩小了虚拟最优算法(VBS)与SBS之间17-66%的差距。对选择频率、边际条件可靠性和基准级预言结构分析表明,当工作流定义的预言景观具有低胜者熵和合理可分离的顶级求解器区域时,MolAS最有效,但在协议不匹配导致求解器排名变化和诱导标签改变时性能下降。这些结果表明,在评估的范围内,鲁棒性受限于工作流和协议引起的求解器层次不稳定性,而非表示能力,将MolAS定位为固定管线的领域内选择器以及评估对接算法选择是否适定的诊断工具。

英文摘要

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, and protocol regimes. MolAS is a lightweight algorithm-selection model that predicts per-algorithm performance from pretrained protein and ligand embeddings using attentional pooling and a shallow residual decoder. With hundreds to a few thousand labelled complexes, MolAS achieves up to a 15 percentage-point absolute improvement over the single-best solver (SBS) and closes 17-66% of the Virtual Best Solver (VBS)-SBS gap across five docking benchmarks. Analyses of selection frequencies, margin-conditioned reliability, and benchmark-level oracle structure indicate that MolAS is most effective when the workflow-defined oracle landscape has low winner entropy and a reasonably separable top-solver region, but degrades under protocol mismatch that shifts solver rankings and changes the induced labels. These results suggest that, in the evaluated regime, robustness is limited less by representational capacity than by workflow- and protocol-induced instability in solver hierarchies, positioning MolAS as an in-domain selector for fixed pipelines and as a diagnostic tool for assessing when docking algorithm selection is well-posed.
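The "fraction of the VBS-SBS gap closed" is conventionally computed as (selector - SBS) / (VBS - SBS) when higher scores are better. The sketch below assumes that convention, which the abstract does not state explicitly, and uses made-up success rates; the function name is hypothetical.

```python
def gap_closed(selector, sbs, vbs):
    """Fraction of the VBS-SBS gap recovered by a selector (higher scores are better)."""
    if vbs <= sbs:
        raise ValueError("VBS must exceed SBS for the gap to be defined")
    return (selector - sbs) / (vbs - sbs)

if __name__ == "__main__":
    # Hypothetical docking success rates, not values from the paper.
    sbs, vbs, selector = 0.50, 0.80, 0.65
    print(f"absolute gain over SBS: {100 * (selector - sbs):.1f} percentage points")
    print(f"gap closed: {100 * gap_closed(selector, sbs, vbs):.1f}%")
```

A value of 0 means the selector does no better than always running the single best solver, and a value of 1 means it matches the per-instance oracle.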

2505.17648 2026-06-02 econ.GN cs.AI q-fin.EC

Simulating Macroeconomic Expectations in Survey Experiments with LLM-based Economic Agents

基于LLM的经济主体在调查实验中模拟宏观经济预期

Jianhao Lin, Lexuan Sun, Yixin Yan

发表机构 * Lingnan College, Sun Yat-sen University(中山大学岭南学院)

AI总结 提出一个利用基于大语言模型的经济主体(LLM Agents)模拟调查实验中宏观经济预期的框架,通过复现三种代表性调查设计验证其有效性,发现LLM Agents能生成与人类高度相似的预期分布并捕捉定性模式,其中先验信息对匹配分布至关重要。

详情
AI中文摘要

我们引入了一个框架,利用基于大语言模型的经济主体(LLM Agents)模拟调查实验中的宏观经济预期。我们构建了配备多个功能模块的LLM Agents,这些模块能够检索个人特征、先验预期和动态外部信息。我们通过复现三种涵盖不同类型受访者各种预期的代表性调查设计来验证我们的框架。结果表明,LLM Agents生成的预期分布与人类数据高度相似,并在开放式回答中捕捉到与人类一致的定性模式。评估显示,先验信息对于匹配分布至关重要,而个人和外部信息驱动类似人类的思维过程。我们的发现为在总体水平上缩小生成式AI与人类之间的信念差距提供了指导,同时界定了该框架的边界。

英文摘要

We introduce a framework for simulating macroeconomic expectations in survey experiments using LLM-based economic agents (LLM Agents). We construct LLM Agents equipped with several functional modules that retrieve personal characteristics, prior expectations, and dynamic external information. We validate our framework by recapitulating three representative survey designs covering various expectations across different types of respondents. Our results show that LLM Agents generate expectation distributions highly similar to human data and capture human-aligned qualitative patterns in open-ended responses. Evaluation reveals that priors are crucial for matching distributions, whereas personal and external information drive human-like thought processes. Our findings offer guidance for narrowing the belief gap between generative AI and humans at the aggregate level while delineating the boundaries of the framework.