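arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.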

2512.03121 2026-05-22 cs.CR cs.AI

Lost in Modality: Evaluating the Effectiveness of Text-Based Membership Inference Attacks on Large Multimodal Models

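Ziyi Tong, Feifei Sun, Le Minh Nguyen

AI summary: This paper evaluates the effectiveness of text-based membership inference attacks (MIAs) on large multimodal models. It finds that the attacks perform similarly in in-distribution settings, whereas in out-of-distribution settings the visual input acts as a regularizer that effectively masks membership signals.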

Comments accepted by ESANN 2026

2511.04106 2026-05-22 physics.soc-ph cs.CL cs.CY stat.AP

Sub-exponential Growth Dynamics in Complex Systems: A Piecewise Power-Law Model for the Diffusion of New Words and Names

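Hayafumi Watanabe

AI summary: This paper proposes a piecewise power-law model for describing complex growth curves and, by analyzing large-scale datasets, finds that sub-exponential growth is a common pattern of social diffusion.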

DOI: 10.1103/f3d5-2tb8
Journal ref: Physical Review E (2026)

Abstract

The diffusion of ideas and language in society has conventionally been described by S-shaped models, such as the logistic curve. However, the role of sub-exponential growth -- a slower-than-exponential pattern known in epidemiology -- has been largely overlooked in broader social phenomena. Here, we present a piecewise power-law model to characterize complex growth curves with a few parameters. We systematically analyzed a large-scale dataset of approximately one billion Japanese blog articles linked to Wikipedia vocabulary, and observed consistent patterns in web search trend data (English, Spanish, and Japanese). Our analysis of 2,963 items, selected for reliable estimation (e.g., sufficient duration/peak, monotonic growth), reveals that 1,625 (55%) diffusion patterns without abrupt level shifts were adequately described by one or two segments. For single-segment curves, we found that (i) the mode of the shape parameter $α$ was near 0.5, indicating prevalent sub-exponential growth; (ii) the peak diffusion scale is primarily determined by the growth rate $R$, with minor contributions from $α$ or the duration $T$; and (iii) $α$ showed a tendency to vary with the nature of the topic, being smaller for niche/local topics and larger for widely shared ones. Furthermore, a micro-behavioral model of outward (stranger) vs. inward (community) contact suggests that $α$ can be interpreted as an index of the preference for outward-oriented communication. These findings suggest that sub-exponential growth is a common pattern of social diffusion, and our model provides a practical framework for consistently describing, comparing, and interpreting complex and diverse growth curves.
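As a rough illustration of what sub-exponential growth means, the sketch below uses the generalized-growth form dy/dt = R * y^α familiar from epidemiology. That each segment of the paper's piecewise model has this functional form is an assumption made here for illustration; the abstract does not give the equation. The function and parameter names are hypothetical.

```python
import math


def generalized_growth(t, y0, rate, alpha):
    """Closed-form solution of dy/dt = rate * y**alpha with y(0) = y0.

    Assumed illustrative form (not taken from the paper):
    alpha = 1 gives exponential growth, 0 <= alpha < 1 gives
    slower-than-exponential (polynomial-in-time) growth.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 1.0:
        return y0 * math.exp(rate * t)
    # Separation of variables: y^(1-alpha) = y0^(1-alpha) + (1-alpha)*rate*t
    base = y0 ** (1.0 - alpha) + (1.0 - alpha) * rate * t
    return base ** (1.0 / (1.0 - alpha))


if __name__ == "__main__":
    for t in (0, 1, 2, 3):
        print(t, generalized_growth(t, y0=1.0, rate=2.0, alpha=0.5))
```

With α = 0.5 the curve grows quadratically in t: for y0 = 1 and R = 2, y(3) = (1 + 3)^2 = 16, whereas α = 1 with the same rate would give e^6, roughly 403.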

2509.26005 2026-05-22 stat.ML cs.LG

BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories under Spatio-Temporal Vector Fields

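Rui-Yang Zhang, Lachlan Astfalck, Edward Cripps, David S. Leslie, Henry B. Moss

AI summary: This paper proposes a formal active learning methodology for guiding the placement of Lagrangian observers to infer time-dependent vector fields, using a physics-informed spatio-temporal Gaussian process surrogate model. Because observers are continuously advected through the field, the utility of a candidate placement must account for their likely future trajectories; the proposed method, BALLAST, does this and shows noticeable benefits on both synthetic and high-fidelity ocean current models. The paper also develops a new GP inference method, the Vanilla SPDE Exchange (VaSE), to make GP posterior sampling more efficient.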

Comments ICML 2026

Abstract

We introduce a formal active learning methodology for guiding the placement of Lagrangian observers to infer time-dependent vector fields -- a key task in oceanography, marine science, and ocean engineering -- using a physics-informed spatio-temporal Gaussian process surrogate model. The majority of existing placement campaigns either follow standard 'space-filling' designs or relatively ad-hoc expert opinions. A key challenge to applying principled active learning in this setting is that Lagrangian observers are continuously advected through the vector field, so they make measurements at different locations and times. It is, therefore, important to consider the likely future trajectories of placed observers to account for the utility of candidate placement locations. To this end, we present BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories. We observe noticeable benefits of BALLAST-aided sequential observer placement strategies on both synthetic and high-fidelity ocean current models. In addition, we developed a novel GP inference method -- the Vanilla SPDE Exchange (VaSE) -- to boost the GP posterior sampling efficiency, which is also of independent interest.
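The look-ahead idea rests on the fact that a drifter's future measurement locations are determined by the field itself. The sketch below is not BALLAST; it only illustrates how a candidate placement can be rolled forward through a time-dependent velocity field with forward Euler steps. The field `toy_field`, the step size, and all function names are hypothetical.

```python
def toy_field(x, y, t):
    """Hypothetical time-dependent velocity field (u, v): a rotation about
    the origin whose angular speed grows linearly with time."""
    omega = 1.0 + 0.1 * t
    return -omega * y, omega * x


def advect(x0, y0, field, t0=0.0, dt=0.01, steps=100):
    """Forward-Euler trajectory of a passive drifter released at (x0, y0).

    Returns a list of (t, x, y) tuples, one per time step, including the
    release point.
    """
    x, y, t = x0, y0, t0
    path = [(t, x, y)]
    for _ in range(steps):
        u, v = field(x, y, t)
        x, y, t = x + dt * u, y + dt * v, t + dt
        path.append((t, x, y))
    return path
```

Each tuple in the returned path is a space-time point at which the drifter would take a measurement. A look-ahead criterion would score a placement by what is learned along such a path, presumably using velocity fields sampled from the GP posterior in place of a known field.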

2507.05660 2026-05-22 cs.CR cs.AI cs.CL

Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI

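Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Ka-Shing Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath

AI summary: This study proposes Optimus, a framework that combines a training-free toxicity classification scheme with a dual-strategy alignment process to mitigate toxicity introduced during fine-tuning. It remains effective even when the toxicity classifier is heavily biased (up to 85% degradation in recall) and outperforms the state-of-the-art defense StarDSS.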

Comments Accepted at ACM CODASPY 2026

Abstract

Customizing Large Language Models (LLMs) on untrusted datasets poses severe risks of injecting toxic behaviors. In this work, we introduce Optimus, a novel defense framework designed to mitigate fine-tuning harms while preserving conversational utility. Unlike existing defenses that rely heavily on precise toxicity detection or restrictive filtering, Optimus addresses the critical challenge of ensuring robust mitigation even when toxicity classifiers are imperfect or biased. Optimus integrates a training-free toxicity classification scheme that repurposes the safety alignment of commodity LLMs, and employs a dual-strategy alignment process combining synthetic "healing data" with Direct Preference Optimization (DPO) to efficiently steer models toward safety. Extensive evaluations demonstrate that Optimus mitigates toxicity even when relying on extremely biased classifiers (with up to 85% degradation in Recall). Optimus outperforms the state-of-the-art defense StarDSS and exhibits strong resilience against adaptive adversarial and jailbreak attacks. Our source code and datasets are available at https://github.com/secml-lab-vt/Optimus
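For readers unfamiliar with Direct Preference Optimization, the sketch below computes the standard per-pair DPO loss from sequence log-probabilities under the policy and a frozen reference model. It shows only the generic objective; how Optimus builds its preference pairs and combines DPO with healing data is not specified in the abstract, and the default beta = 0.1 is an arbitrary placeholder.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    where each argument is the log-probability of a full response.
    """
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In a toxicity-mitigation setting, one natural choice (an assumption here) is to treat the safe reply as the chosen response and the toxic reply as the rejected one, so that minimizing the loss raises the policy's preference for the safe reply relative to the reference model.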

2505.22749 2026-05-22 q-bio.NC cs.AI cs.LG cs.NE

Self-orthogonalizing attractor neural networks emerging from the free energy principle

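Tamas Spisak, Karl Friston

AI summary: Building on the free energy principle, this paper studies how self-organizing attractor dynamics emerge from first principles in random dynamical systems. The approach needs no explicitly imposed learning or inference rules, yields efficient and biologically plausible dynamics, and results in a multi-level Bayesian active inference process. Analysis and simulations show that the resulting networks favor approximately orthogonalized attractor representations, which improves generalization and the mutual information between hidden causes and observable effects.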

Comments 27 pages main text, 8 pages appendix, 7 figures; interactive manuscript available at: https://pni-lab.github.io/fep-attractor-network Associated GitHub repository: https://github.com/pni-lab/fep-attractor-network

DOI: 10.1016/j.neucom.2026.133472
Journal ref: Neurocomputing (2026): 133472

Abstract

Attractor dynamics are a hallmark of many complex systems, including the brain. Understanding how such self-organizing dynamics emerge from first principles is crucial for advancing our understanding of neuronal computations and the design of artificial intelligence systems. Here we formalize how attractor networks emerge from the free energy principle applied to a universal partitioning of random dynamical systems. Our approach obviates the need for explicitly imposed learning and inference rules and identifies emergent, but efficient and biologically plausible inference and learning dynamics for such self-organizing systems. These result in a collective, multi-level Bayesian active inference process. Attractors on the free energy landscape encode prior beliefs; inference integrates sensory data into posterior beliefs; and learning fine-tunes couplings to minimize long-term surprise. Analytically and via simulations, we establish that the proposed networks favor approximately orthogonalized attractor representations, a consequence of simultaneously optimizing predictive accuracy and model complexity. These attractors efficiently span the input subspace, enhancing generalization and the mutual information between hidden causes and observable effects. Furthermore, while random data presentation leads to symmetric and sparse couplings, sequential data fosters asymmetric couplings and non-equilibrium steady-state dynamics, offering a natural generalization of conventional Boltzmann Machines. Our findings offer a unifying theory of self-organizing attractor networks, providing novel insights for AI and neuroscience.

2503.02885 2026-05-22 cs.CY cs.CL cs.HC

"Would You Want an AI Tutor?" Understanding Stakeholder Perceptions of LLM-based Systems in the Classroom

Caterina Fuligni, Daniel Dominguez Figaredo, Armanda Lewis, Julia Stoyanovich

AI summary: This paper examines stakeholder perceptions of deploying LLM-based systems in the classroom and proposes a stakeholder-first framework to support more deliberate decision-making.

Abstract

Large Language Models (LLMs) have gained traction in educational settings, often framed as virtual tutors or teaching assistants. Following early skepticism and bans, many schools and universities have begun integrating these systems into curricula. Yet decisions about whether and how to deploy LLM-based tools are frequently made without systematic engagement with the full range of stakeholders they affect. In this paper, we argue that understanding stakeholder perceptions of LLM-based systems in the classroom is not a matter of measuring approval or acceptance, but of identifying whose concerns are surfaced, in which contexts, and with what implications for responsible design and governance. We introduce Contextualized Perceptions for the Adoption of LLMs in Education (Co-PALE), a stakeholder-first framework that connects educational context, responsible AI principles, and categories of perception to support more deliberate decision-making about the adoption of LLM-based tools. We ground Co-PALE through a targeted analysis of prior work to diagnose recurring gaps in how stakeholder perceptions are studied, and through contextually distinct educational scenarios that illustrate how the same technology raises different concerns for different stakeholders. We further examine how university faculty and K-12 parents make sense of the framework through focus groups, using their reflections to surface tensions and uncertainties. Co-PALE supports more systematic reasoning about whether, where, and for whom LLM-based tools should be deployed in education.

2502.13822 2026-05-22 stat.ML cs.LG

Uncertainty quantification for Markov chain induced martingales with application to temporal difference learning

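Weichen Wu, Yuting Wei, Alessandro Rinaldo

AI summary: This paper develops new high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains and applies them to the analysis of temporal difference (TD) learning. It obtains high-probability consistency guarantees that match the asymptotic variance and establishes distributional convergence rates for the Gaussian approximation of the TD estimator.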
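As background on the algorithm being analyzed, the sketch below shows one TD(0) update for a linear value estimate. Linear function approximation, the feature vectors, and the constant step size are assumptions chosen for illustration; the summary above does not state which TD variant or step-size schedule the paper studies.

```python
def td0_update(theta, phi_s, phi_next, reward, gamma, step_size):
    """One TD(0) update for a linear value estimate V(s) = theta . phi(s).

    theta: current weight vector; phi_s, phi_next: feature vectors of the
    current and next state; returns the updated weight vector.
    """
    v_s = sum(w * f for w, f in zip(theta, phi_s))
    v_next = sum(w * f for w, f in zip(theta, phi_next))
    td_error = reward + gamma * v_next - v_s
    return [w + step_size * td_error * f for w, f in zip(theta, phi_s)]
```

Because consecutive states come from a Markov chain, the noise in these updates is not independent across steps, which is the kind of dependence that Markov-chain-induced martingale bounds are designed to handle.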

2209.03358 2026-05-22 cs.NE cs.AI cs.CR cs.CV cs.LG

Attacking the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples

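Nuo Xu, Kaleel Mahmood, Haowen Fang, Ethan Rathbun, Caiwen Ding, Wujie Wen

AI summary: This paper studies the robustness of spiking neural networks (SNNs) to adversarial examples, analyzes the transferability of adversarial attacks, and proposes the Mixed Dynamic Spiking Estimation (MDSE) attack to generate adversarial examples that are more effective against both SNN and non-SNN models.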

Comments Accepted manuscript. Published in Neurocomputing, Volume 656, 2025, Article 131506. Available online 12 September 2025. DOI: 10.1016/j.neucom.2025.131506

DOI: 10.1016/j.neucom.2025.131506
Journal ref: Neurocomputing, Volume 656, 2025, 131506

Abstract

Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and recent advances in classification performance. However, unlike traditional deep learning approaches, the study of SNN robustness to adversarial examples remains relatively underdeveloped. In this work, we advance the adversarial attack side of SNNs through three contributions. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient estimator, even for adversarially trained SNNs. Second, using the best single surrogate gradient estimator, we analyze the transferability of adversarial attacks across SNNs, Vision Transformers (ViTs) and CNNs. Our analysis reveals two key gaps: no existing white-box attack exploits multiple surrogate gradient estimators for SNNs, and no single-model attack reliably generates adversarial examples that simultaneously fool both SNN and non-SNN models. For our third contribution, we develop the Mixed Dynamic Spiking Estimation (MDSE) attack to address these issues. MDSE uses a dynamic gradient estimation scheme to fully exploit multiple surrogate gradient estimator functions and generates adversarial examples capable of fooling SNN and non-SNN models simultaneously. MDSE is up to 91.4% more effective on SNN/ViT model ensembles and provides a 3x boost on adversarially trained SNN ensembles compared to conventional white-box attacks like Auto-PGD. Experiments cover three datasets (CIFAR-10, CIFAR-100, ImageNet) and nineteen classifier models (seven per CIFAR dataset, five for ImageNet). Our implementation of MDSE and the evaluated models is publicly available at https://github.com/nuoxuxxx/attacking-the-spike-mdse.

2605.22060 2026-05-22 cs.CR cs.AI

Safeguarding Text-to-Image Generative Models Against Unauthorized Knowledge Distillation

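Yilan Gao, Sida Huang, Hongyuan Zhang, Xuelong Li

AI summary: This paper proposes WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified perturbation budget in order to hinder unauthorized knowledge distillation and capability replication.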

Abstract

Closed-weight generative services are increasingly deployed through query-based APIs, where users can obtain generated outputs while model parameters remain inaccessible. However, such deployment does not prevent model stealing: an attacker can repeatedly query the service, collect large volumes of released synthetic images, and use them as training data for a private substitute model. This query-output-driven process enables unauthorized knowledge distillation and capability replication without direct access to the original weights. To mitigate this threat, a practical defense should preserve the visual fidelity of released images, provide explicit control over perturbation magnitude, and scale efficiently to large-volume output release. We present WaveGuard, a single-pass, generator-based protection framework that safeguards released synthetic images under a user-specified perturbation budget. WaveGuard employs a frequency-aware perturbation generator to inject structured, imperceptible perturbations that maintain perceptual utility for benign viewers while reducing the usefulness of protected images as training data for unauthorized student models. Extensive experiments under WikiArt-related synthetic-output distillation settings show that WaveGuard achieves a favorable efficacy-fidelity-efficiency trade-off, with explicit imperceptibility control and substantial gains in protection efficiency.

2605.22041 2026-05-22 cs.CR cs.LG

RADAR: Defending RAG Dynamically against Retrieval Corruption

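Ziyuan Chen, Yueming Lyu, Yi Liu, Weixiang Han, Jing Dong, Caifeng Shan, Tieniu Tan

AI summary: RADAR models reliable context selection as a graph-based energy minimization problem and solves it exactly with a max-flow/min-cut algorithm. A Bayesian memory node recursively updates the belief state, balancing stability against adversarial attacks with adaptation to genuine knowledge changes. On dynamic datasets it achieves better robustness and response quality than baseline methods, with small storage overhead.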

2605.22039 2026-05-22 cs.DC cs.AI cs.CR cs.MS

Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments

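Prajwal Panth

AI summary: This paper proposes a secure parallel determinant computation framework that uses methods such as Composite Element Distortion to perform privacy-preserving determinant computation across distributed edge servers, targeting the real-time requirements of edge computing environments.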

Comments 15 pages, 7 figures, 5 tables. This paper was first made public in October 2024 and subsequently posted as v1 on TechRxiv (Dec 10, 2025): https://doi.org/10.36227/techrxiv.176539387.75109768/v1. The present arXiv submission is identical to that version (v1)

Abstract

The advent of edge computing has enabled resource-constrained clients to delegate intensive computational tasks to distributed edge servers, especially within Internet of Things (IoT) environments. Among such tasks, Matrix Determinant Computation (MDC) remains critical for applications in control systems, cryptography, and machine learning. However, the cubic complexity of traditional determinant algorithms makes them unsuitable for real-time processing in constrained edge scenarios. We propose a Secure Parallel Determinant Computation (SPDC) framework, which provides strong security guarantees, including privacy-preserving MDC, across N distributed edge servers. The framework achieves privacy through Composite Element Distortion (CED), a lightweight encryption method that combines Element-wise Obfuscation (EWO) and the Panth Rotation Theorem (PRT) to conceal both structural and numerical matrix content while preserving determinant properties. Parallel LU decomposition is used to distribute encrypted matrix blocks across an arbitrary number of untrusted edge servers, enabling efficient and scalable determinant computation. A one-way communication model further reduces coordination overhead by eliminating inter-server interactions. To ensure result integrity with minimal client burden, we further introduce two verification algorithms: Q_2, a probabilistic scalar method, and Q_3, a deterministic and low-complexity alternative. Mathematical analysis demonstrates that the proposed framework provides strong privacy and security guarantees, low computational overhead, and deployment flexibility, making it well-suited for secure, scalable, and real-time MDC in distributed edge-assisted systems.
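The cubic cost mentioned above comes from elimination-based determinant algorithms. The sketch below computes a determinant by Gaussian elimination with partial pivoting (equivalent to an LU factorization), taking the product of the pivots with a sign flip per row swap. It is a single-machine baseline for illustration only; it does not implement the SPDC framework, CED, or the block-parallel distribution described in the abstract.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting, O(n^3)."""
    a = [list(map(float, row)) for row in matrix]  # work on a copy
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")
    det = 1.0
    for col in range(n):
        # Choose the largest-magnitude pivot in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # each row swap flips the sign
        det *= a[col][col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col + 1, n):
                a[row][k] -= factor * a[col][k]
    return det
```

The three nested loops are where the cubic cost arises, and that elimination work is what a scheme of this kind would offload to edge servers.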

2605.22010 2026-05-22 stat.ML cs.LG

Uniform-in-Time Weak Propagation-of-Chaos in Shallow Neural Networks

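Margalit Glasgow, Joan Bruna

AI summary: This paper studies one-hidden-layer neural networks trained with gradient descent in the feature-learning regime, relating the output of the finite-width network to that of its infinite-width counterpart and studying the long-time behavior through the mean-field dynamics.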

Comments 46 pages

Abstract

We consider one-hidden-layer neural networks trained in the feature-learning regime using gradient descent, and relate the output of the finite-width network $f_{\hat{ρ}_t^m}$ to its infinite-width counterpart $f_{ρ_t^{MF}}$, which evolves in the mean-field dynamics. While constant-time horizon bounds for $\|f_{ρ_t^{MF}} - f_{\hat{ρ}_t^m}\|$ may be obtained via standard Grönwall estimates, the long-time behavior of the fluctuation is a more delicate matter. Uniform-in-time bounds often rely on (local) strong convexity in the landscape or Logarithmic Sobolev inequalities present in noisy gradient dynamics. In this work, we establish non-asymptotic weak propagation-of-chaos that holds uniformly in time, obtained by exploiting instead the convergence rate of the mean-field deterministic Wasserstein-gradient-flow dynamics. Specifically, denoting by $L_t$ the mean-field excess MSE loss at time $t$ and $m$ the number of neurons, under standard regularity assumptions and the condition $\int_0^\infty L_t^{1/2} dt = O(\log d)$, we obtain the uniform-in-time bound $\|f_{ρ_t^{MF}} - f_{\hat{ρ}_t^m}\|^2 \lesssim \text{poly}(d) m^{-\min(1,c/6)}$ whenever $L_t \lesssim t^{-c}$. Our result holds in a noiseless setting and does not make any assumptions on the geometry of the landscape near the optimum, and extends seamlessly to other forms of discretization, including finite number of samples and time discretization. A key takeaway of our result is that whenever the convergence rate of the mean-field, population-loss dynamics is faster than $t^{-2}$, we can attain a loss of $ε$ with only $\text{poly}(d/ε)$ neurons, training samples, and GD steps.
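The $t^{-2}$ threshold in the final sentence can be read off from the integrability condition. The following is a minimal check, assuming the power-law rate $L_t \lesssim t^{-c}$ holds for $t \ge 1$ and ignoring both the behavior on $[0, 1]$ and the dependence of constants on $d$, neither of which the abstract specifies:

```latex
\int_1^\infty L_t^{1/2}\,dt \;\lesssim\; \int_1^\infty t^{-c/2}\,dt
= \begin{cases}
\dfrac{2}{c-2}, & c > 2,\\[6pt]
\infty, & c \le 2.
\end{cases}
```

So this bound on the tail of the integral is finite exactly when $c > 2$, that is, when the mean-field loss decays faster than $t^{-2}$. In that regime the exponent satisfies $\min(1, c/6) > 1/3$, and it saturates at $1$ once $c \ge 6$.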

2605.22001 2026-05-22 cs.CR cs.AI cs.CL

Blind Spots in the Guard: How Domain-Camouflaged Injection Attacks Evade Detection in Multi-Agent LLM Systems

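Aaditya Pai

AI summary: This paper studies how domain-camouflaged injection attacks evade detection in multi-agent LLM systems by mimicking the domain vocabulary and authority structures of the target document. It defines the Camouflage Detection Gap (CDG), the difference in detection rate between static and camouflaged payloads, and shows both that multi-agent debate architectures amplify static injection attacks and that detector augmentation has limited effectiveness.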

Comments 8 pages, 3 figures, 2 tables. Submitted to EMNLP 2026 ARR cycle

Abstract

Injection detectors deployed to protect LLM agents are calibrated on static, template-based payloads that announce themselves as override directives. We identify a systematic blind spot: when payloads are generated to mimic the domain vocabulary and authority structures of the target document, what we call domain-camouflaged injection, standard detectors fail to flag them, with detection rates dropping from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash. We formalize this as the Camouflage Detection Gap (CDG), the difference in injection detection rate between static and camouflaged payloads. Across 45 tasks spanning three domains and two model families, CDG is large and statistically significant ($\chi^2 = 38.03$, $p < 0.001$ for Llama; $\chi^2 = 17.05$, $p < 0.001$ for Gemini), with zero reverse discordant pairs in either case. We additionally evaluate Llama Guard 3, a production safety classifier, which detects zero camouflaged payloads ($\mathrm{IDR}_{\mathrm{camouflage}} = 0.000$), confirming that the blind spot extends beyond few-shot detectors to dedicated safety classifiers. We further show that multi-agent debate architectures amplify static injection attacks by up to 9.9x on smaller models, while stronger models show collective resistance. Targeted detector augmentation provides only partial remediation (10.2% improvement on Llama, 78.7% on Gemini), suggesting the vulnerability is architectural rather than incidental for weaker models. Our framework, task bank, and payload generator are released publicly.
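To make the metric concrete, the sketch below computes the CDG from a pair of detection rates and, separately, a McNemar statistic with continuity correction for paired outcomes. The choice of McNemar's test is an inference from the phrase "discordant pairs", not something the abstract states, and the function names and the pair counts in the usage note are hypothetical.

```python
def camouflage_detection_gap(idr_static, idr_camouflage):
    """CDG: detection rate on static payloads minus rate on camouflaged ones."""
    return idr_static - idr_camouflage


def mcnemar_statistic(b, c):
    """McNemar chi-square statistic with continuity correction.

    b: pairs detected when static but missed when camouflaged.
    c: pairs missed when static but detected when camouflaged
       (the "reverse" discordant pairs).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    return (abs(b - c) - 1) ** 2 / (b + c)
```

With the rates quoted above, the CDG is 0.938 - 0.097 = 0.841 for Llama 3.1 8B and 1.000 - 0.556 = 0.444 for Gemini 2.0 Flash. For hypothetical counts b = 40 and c = 0, `mcnemar_statistic` returns 39^2 / 40 = 38.025.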

2605.21996 2026-05-22 cs.SE cs.AI

From Patches to Trajectories: Privileged Process Supervision for Software-Engineering Agents

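Murong Ma, Tianyu Chen, Yun Lin, Shuai Lu, Qinglin Zhu, Yeyun Gong, Zhiyong Huang, Peng Cheng, Yan Lu, Jin Song Dong

AI summary: This paper proposes Patches-to-Trajectories (P2T), which uses developer-authored reference patches to improve how training data for software-engineering agents is curated, improving both the effectiveness and the efficiency of training.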

Abstract

Supervised fine-tuning (SFT) on long teacher trajectories is the dominant way to instill investigation and reasoning in open software-engineering (SWE) agents. Since every retained response becomes an imitation target, the student inherits the final outcome and intermediate flaws, including ungrounded leaps and redundant loops. High-quality training data must be effective (each step is grounded and narrows the agent's epistemic gap to the correct fix) and efficient (each step is information-bearing rather than redundant or looping). Existing recipes filter or relabel teacher rollouts using only a binary terminal verifier, which does not directly target these axes and provides no supervision on instances where the teacher fails. Most real issues include a developer-authored reference patch, $p^\star$, revealing the file paths, runtime behaviors, and coding conventions presupposed by the correct fix, yet standard pipelines discard it. We propose Patches-to-Trajectories (P2T), which uses $p^\star$ as privileged information during curation and formulates trajectory construction as bi-objective optimization over per-step effectiveness and trajectory length. A reverse phase distills $p^\star$ into a latent process graph, $G^\star$, of contextual facts and solution milestones. A forward phase curates trajectories from blinded teacher continuations by scoring per-step progress against $G^\star$ under a leakage-blocking groundedness check and retaining the shortest effective segments. Using only 1.8k curated SWE-Gym instances, P2T improves effectiveness and efficiency over outcome-filtered SFT and its tool-error-masking variant. On SWE-bench Verified, it raises Pass@1 by up to 10.8 points while reducing per-instance inference cost by ~15%, with consistent gains on SWE-bench Lite. Size-matched ablations and qualitative analysis further isolate trajectory quality from data scale.

2605.21970 2026-05-22 eess.IV cs.CV

Entropy-Guided Self-Supervised Learning for Medical Image Classification

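Joao Florindo, Viviane Moura

AI summary: This paper proposes a deep learning framework that combines self-supervised learning and transfer learning, pairing an entropy-guided masked autoencoder with an ImageNet-pretrained model to improve the performance and robustness of medical image classification.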

Abstract

Accurate and robust medical image classification is paramount for early disease diagnosis and treatment planning. However, challenges such as limited annotated data, high intra-class variability, and subtle inter-class differences often hinder the performance of deep learning models. This paper introduces a synergistic deep learning framework that leverages the strengths of self-supervised learning and transfer learning for enhanced medical image classification. Our approach employs two distinct ConvNeXt-Tiny models: one pre-trained on a large-scale natural image dataset (ImageNet) and another pre-trained using an entropy-guided Masked Autoencoder (MAE) on the target medical dataset. Both models are then fine-tuned on specific medical image classification tasks. A final ensemble strategy, based on averaging predicted probabilities, is utilized to combine the complementary insights from these two models. Rigorous experimental validation across four diverse medical imaging datasets (Breast Ultrasound Images (BUSI), International Skin Imaging Collaboration (ISIC) 2018, Kvasir, and COVID) demonstrates the superior performance and robustness of our ensemble approach. The MAE pre-training significantly improves feature learning on domain-specific data, while the ImageNet pre-training provides strong generalizable features. The ensemble consistently achieves state-of-the-art results, outperforming individual models and existing methods, highlighting the efficacy of combining diverse pre-training strategies for challenging medical image analysis.
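The ensemble step described above, averaging predicted probabilities, can be sketched as follows. The function name is illustrative, and equal weighting of the two models is an assumption; the abstract does not say whether the average is weighted.

```python
def ensemble_predict(probs_a, probs_b):
    """Average two models' class-probability vectors and pick the argmax.

    probs_a, probs_b: per-class probabilities from the two fine-tuned
    models (e.g. ImageNet-pretrained and MAE-pretrained), in the same
    class order. Returns (predicted_class_index, averaged_probabilities).
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("probability vectors must have the same length")
    averaged = [(pa + pb) / 2.0 for pa, pb in zip(probs_a, probs_b)]
    predicted = max(range(len(averaged)), key=averaged.__getitem__)
    return predicted, averaged
```

Averaging lets a confident prediction from one model outweigh a weak, conflicting preference from the other, which is the usual motivation for combining models that were pre-trained in different ways.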

2605.21969 2026-05-22 cs.IR cs.AI

LLM Retrieval for Stable and Predictable Ad Recommendations

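Vinodh Kumar Sunkara, Satheeshkumar Karuppusamy, Hangjun Xu, Sai Deepika Regani, Kshitij Gupta, Gaby Nahum, Sneha Iyer, Jean-Baptiste Fiot, Yinglong Guo, Xiaowen Guo, Atul Jangra, Yucheng Liu, Jinghao Yan, Vijay Pappu, Benjamin Schulte, Deepak Chandra

AI summary: This paper proposes a new evaluation framework for quantifying the stability and predictability of ads recommender systems and presents an online-validated semantic candidate generation framework powered by fine-tuned LLMs, which achieves significant improvements in stability and predictability by making the system more semantically aware.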

Comments SIGIR 2026 AgentSearch Workshop, Melbourne Australia

AI中文摘要

传统的广告推荐系统主要专注于使用召回率或归一化折扣累计增益（NDCG）等传统指标来优化点击或转化事件的预测准确性。随着生成AI技术的超大规模增长，广告库存和流动性不断增加，预测的稳定性和可预测性变得越来越关键。直观地说，预测的稳定性和可预测性可以定义为量化系统对小扰动（广告、创意）的鲁棒性，缺乏这些特性可能导致广告商可感知的问题，如重复性、冷启动和探索不足。本文介绍了一种新的评估框架，用于量化广告推荐系统的稳定性和可预测性，并提出了一个基于微调大语言模型（LLM）的在线验证语义候选生成框架，该框架在这些指标上实现了显著改进，通过从根本上提高系统的语义感知能力。该方法从广告创意中提取层次化的语义属性以获得LLM表示，这些表示作为基于图的扩展的基础，确保检索到的候选者包含广告的语义变体，保证来自广告商的小创意变体产生一致且可解释的用户交付结果。我们测试了这种LLM广告检索框架在大规模工业广告推荐系统中的表现，证明了在离线和在线A/B实验中均实现了显著改进，展示了可预测性和传统性能指标的提升。尽管在广告堆栈中进行了评估，但这是一个通用的框架，可广泛应用于面临类似扩展和可预测性挑战的其他大规模推荐和检索系统。

英文摘要

Traditional ads recommendation systems have primarily focused on optimizing for prediction accuracy of click or conversion events using canonical metrics such as recall or normalized discounted cumulative gain (NDCG). With the hyper-growth of ads inventory and liquidity driven by generative AI technologies, prediction stability and predictability are becoming increasingly critical. Intuitively, prediction stability and predictability quantify system robustness with respect to minor or noisy input (ads, creatives) perturbations; their absence could lead to advertiser-perceivable problems such as repeatability, cold start, and under-exploration. In this paper, we introduce a new evaluation framework for quantifying the stability and predictability of an ads recommender system, and present an online-validated semantic candidate generation framework powered by fine-tuned Large Language Models (LLMs) that showed significant improvement along these metrics by fundamentally improving the semantic awareness of the system. The approach extracts hierarchical semantic attributes from ad creatives to obtain LLM representations, which serve as the foundation for graph-based expansion, ensuring that the retrieved candidates encapsulate semantic variants of an ad and that small creative variants from the advertiser yield consistent and explainable delivery results to the user. We tested this LLM ads retrieval framework in a large-scale industrial ads recommendation system, demonstrating significant improvements across offline and online A/B experiments, showcasing gains in both predictability and traditional performance metrics. Although evaluated in the ads stack, this is a general framework that can be applied broadly to any large-scale recommendation or retrieval system facing similar scaling and predictability challenges.
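
The abstract does not spell out its stability metric, so the sketch below is only one plausible reading: measure how much the retrieved candidate set changes across minor variants of the same ad, using mean pairwise Jaccard overlap. The function names, the variant labels, and the candidate IDs are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two candidate lists, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieval_stability(retrieved_by_variant):
    """Mean pairwise Jaccard overlap across creative variants of one ad."""
    variants = list(retrieved_by_variant.values())
    pairs = [
        (i, j)
        for i in range(len(variants))
        for j in range(i + 1, len(variants))
    ]
    if not pairs:
        return 1.0
    return sum(jaccard(variants[i], variants[j]) for i, j in pairs) / len(pairs)


# Hypothetical candidates retrieved for an ad and two small creative variants.
retrieved = {
    "original": ["c1", "c2", "c3", "c4"],
    "variant_a": ["c1", "c2", "c3", "c5"],
    "variant_b": ["c1", "c2", "c3", "c4"],
}
score = retrieval_stability(retrieved)
```

A score of 1.0 would mean every variant retrieves the same candidates; lower values indicate that small creative edits change what the system returns.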

2605.21933 2026-05-22 cond-mat.stat-mech cs.AI cs.LG

Thermodynamic Irreversibility of Training Algorithms

训练算法的热力学不可逆性

Liu Ziyin, Yuanjie Ren, Adam Levine, Isaac Chuang

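AI summary: This paper proposes a general framework for defining and analyzing the irreversibility of training algorithms, proves that four different approaches are equivalent to leading order in the step size η, and shows how irreversibility gives rise to emergent forces associated with time-reversal symmetry breaking.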

Comments preprint

2605.21916 2026-05-22 quant-ph cs.LG

A2QTGN: Adaptive Amplitude Quantum-Integrated Temporal Graph Network for Dynamic Link Prediction

A2QTGN：自适应幅度量子集成时间图网络用于动态链接预测

Nouhaila Innan, M. Murali Karthick, Simeon Kandan Sonar, Vivek Chaturvedi, Muhammad Shafique

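AI summary: This paper proposes A2QTGN, a hybrid quantum-classical framework that combines adaptive amplitude encoding with a Temporal Graph Network for dynamic link prediction; it represents node interaction features as quantum states and selectively refreshes amplitude embeddings based on temporal activity to improve temporal representation.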

Comments 9 pages, 3 figures

AI中文摘要

动态链接预测对于建模复杂系统中演变的交互至关重要，包括社交、通信、金融和交通网络。经典时间图模型捕捉序列依赖性，但可能难以表示大规模动态图中同时和快速变化的节点-边交互。我们提出A2QTGN（自适应幅度量子集成时间图网络），一种混合量子-经典框架，结合自适应幅度编码与时间图网络骨干。所提出机制将节点交互特征表示为量子状态，并根据时间活动选择性刷新幅度嵌入，保留稳定节点状态的同时强调有意义的结构变化。此设计减少了不必要的量子重编码并改进了时间表示以进行链接预测。在五个时间图基准数据集上的实验表明，A2QTGN在多样化的动态图中实现了强大的预测和排名性能。消融研究证实了量子嵌入模块和自适应更新策略的重要性，而使用嘈杂后端和有限真实设备执行的硬件感知推断支持了近期量子辅助时间图学习的可行性。

英文摘要

Dynamic link prediction is important for modeling evolving interactions in complex systems, including social, communication, financial, and transportation networks. Classical temporal graph models capture sequential dependencies, but they may struggle to represent concurrent and rapidly changing node-edge interactions in large dynamic graphs. We propose A2QTGN (Adaptive Amplitude Quantum-Integrated Temporal Graph Network), a hybrid quantum-classical framework that combines adaptive amplitude encoding with a Temporal Graph Network backbone. The proposed mechanism represents node interaction features as quantum states and selectively refreshes amplitude embeddings based on temporal activity, preserving stable node states while emphasizing meaningful structural changes. This design reduces unnecessary quantum re-encoding and improves temporal representation for link prediction. Experiments on five Temporal Graph Benchmark datasets show that A2QTGN achieves strong predictive and ranking performance across diverse dynamic graphs. Ablation studies confirm the importance of both the quantum embedding module and the adaptive update strategy, while hardware-aware inference using a noisy backend and limited real-device execution supports the feasibility of near-term quantum-assisted temporal graph learning.
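
To make the two ideas in this abstract concrete, the sketch below shows standard amplitude encoding (pad a feature vector to a power-of-two length and scale it to unit L2 norm, so the entries can serve as state amplitudes) together with a simple activity-gated refresh. The gating rule, the `threshold` parameter, and both function names are assumptions for illustration; the paper's actual update criterion is not given here.

```python
import math


def amplitude_encode(features):
    """Pad to the next power of two and scale to unit L2 norm."""
    if not any(features):
        raise ValueError("cannot encode an all-zero vector")
    size = 1
    while size < len(features):
        size *= 2
    padded = list(features) + [0.0] * (size - len(features))
    norm = math.sqrt(sum(x * x for x in padded))
    return [x / norm for x in padded]


def refresh_embedding(cached_state, features, activity, threshold):
    """Re-encode only when temporal activity exceeds the threshold (assumed rule)."""
    if cached_state is None or activity > threshold:
        return amplitude_encode(features)
    return cached_state


state = amplitude_encode([3.0, 4.0, 0.0])
# A quiet node keeps its cached state; an active node is re-encoded.
kept = refresh_embedding(state, [1.0, 0.0, 0.0], activity=0.1, threshold=0.5)
updated = refresh_embedding(state, [1.0, 0.0, 0.0], activity=0.9, threshold=0.5)
```

A vector of length 2^n needs n qubits under amplitude encoding, which is why skipping re-encoding for inactive nodes can save state-preparation work.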

2605.21915 2026-05-22 cs.CR cs.LG

CCLab: Adversarial Testing of Learning- and Non-Learning-Based Congestion Controllers

CCLab: 学习型和非学习型拥塞控制器的对抗测试

Zhi Chen, Shehab Sarar Ahmed, Chenkai Wang, Brighten Godfrey, Gang Wang

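AI summary: This paper proposes CCLab, a framework for systematically evaluating the robustness of learning-based and non-learning-based congestion controllers under adversarial conditions; it finds that learning-based controllers are generally more robust than traditional algorithms under adversarial testing, and shows that adversarial traces can be used to train more robust congestion controllers.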

Comments 13 pages for main paper, 16 pages in total

AI中文摘要

拥塞控制器（CCs）对网络性能至关重要，但其在恶劣条件下的鲁棒性仍不够了解。尽管最近的学习型CCs在受控环境中表现出色，但当控制器的输入信号被破坏或环境条件变得系统性挑战时，其与传统CCs的性能对比尚不清楚。本文介绍CCLab，一种对抗测试框架，用于系统评估学习型和非学习型CCs的鲁棒性。CCLab包含一个基于强化学习（RL）的对抗代理，在闭环中与拥塞控制策略协同工作，生成受约束的扰动，无论是对输入信号（特征级）还是外部网络条件（环境级），同时通过显式约束保持现实性。利用此框架，我们在特征级和环境级对抗性条件下比较学习型和非学习型CCs。尽管两种类型的CCs在对抗测试中均出现性能下降，但学习型CCs总体上比传统人工设计算法更鲁棒。最后，我们展示对抗性追踪可用于训练更鲁棒的CCs，其在挑战性和正常条件下均优于现有学习型CCs。

英文摘要

Congestion controllers (CCs) are critical to network performance, and yet their robustness under adverse conditions remains insufficiently understood. While recent learning-based CCs have demonstrated strong performance in controlled environments, it is unclear how they compare to traditional CCs when controllers' input signals are corrupted or when environmental conditions become systematically challenging. In this paper, we introduce CCLab, an adversarial testing framework for systematically evaluating the robustness of both learning-based and non-learning-based CCs. CCLab includes a reinforcement learning (RL)-based adversarial agent that operates in a closed loop with the congestion control policy, generating bounded perturbations either on input signals (feature-level) or on external network conditions (environment-level), while preserving realism through explicit constraints. Using this framework, we compare learning-based CCs with non-learning-based CCs under both feature-level and environment-level adversarial conditions. While both types of CCs suffer from performance degradation under adversarial testing, we find that learning-based CCs, in general, are more robust than traditional human-designed algorithms. Finally, we show that our adversarial traces can be used to train more robust CCs that outperform existing learning-based CCs under both challenging and normal conditions.

2605.21903 2026-05-22 eess.SY cs.AI cs.LG cs.NE cs.SY

Engineering Hybrid Physics-Informed Neural Networks for Next-Generation Electricity Systems: A State-of-the-Art Review

为下一代电力系统工程混合物理指导神经网络：最新综述

Joseph Nyangon

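AI summary: This paper reviews hybrid physics-informed machine learning architectures for electricity systems, covering physics-informed neural networks (PINNs), Deep Operator Networks (DeepONets), Fourier Neural Operators, Extreme Learning Machine-enhanced PINNs, graph-based PINNs (PIGNNs), and domain-decomposition PINNs. It surveys their use in field analysis, fault detection, digital twins, surrogate modeling, and control optimization, and reports that embedding first-principles constraints such as Maxwell's equations improves predictive accuracy, simulation time, and generalization.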

Comments 59 pages, 6 Figures

DOI: 10.3389/frai.2026.1751785

AI中文摘要

将机器学习与领域特定物理相结合，正在改变电力系统的设计、监测和控制；在这些系统中，数据稀缺、可解释性有限以及必须满足物理定律的要求，限制了纯数据驱动模型。物理指导机器学习（PIML）通过将控制方程直接嵌入学习过程来解决这些限制，为工业4.0应用提供了准确、高效且可扩展的解决方案。本文综述了用于电力系统的混合PIML架构，包括物理指导神经网络（PINNs）、深度算子网络（DeepONets）、傅里叶神经算子、极限学习机增强的PINNs、基于图的PINNs（PIGNNs）和域分解PINNs。每种方法都通过涵盖场分析、故障检测、数字孪生、代理建模和控制优化的案例研究加以考察。综述显示，嵌入麦克斯韦方程和其他第一性原理约束显著提高了稀疏和噪声数据下的预测精度，使仿真时间相对于有限元方法减少了多个数量级，并增强了跨运行工况的泛化能力。混合框架在参数敏感性、动态行为和鲁棒性方面始终优于纯数据驱动的基线，同时支持实时数字孪生校准和不确定性量化。持续存在的挑战包括刚性多尺度问题的训练不稳定、高保真模型的计算成本以及缺乏标准化基准。研究结果表明，PIML推动了从黑箱数据驱动方法向透明的物理指导策略的范式转变，为韧性和智能电力系统的持续创新奠定了基础。

英文摘要

The integration of machine learning with domain-specific physics is transforming the design, monitoring, and control of electricity systems, where data scarcity, limited interpretability, and the need to enforce physical laws constrain purely data-driven models. Physics-informed machine learning (PIML) addresses these limitations by embedding governing equations directly into the learning process, yielding accurate, efficient, and scalable solutions for Industry 4.0 applications. This article reviews hybrid PIML architectures for electricity systems, including physics-informed neural networks (PINNs), Deep Operator Networks (DeepONets), Fourier Neural Operators, Extreme Learning Machine-enhanced PINNs, graph-based PINNs (PIGNNs), and domain-decomposition PINNs. Each approach is examined through case studies spanning field analysis, fault detection, digital twins, surrogate modeling, and control optimization. The review shows that embedding Maxwell's equations and other first-principles constraints substantially improves predictive accuracy under sparse and noisy data, reduces simulation time by orders of magnitude relative to finite element methods, and enhances generalization across operating regimes. Hybrid frameworks consistently outperform purely data-driven baselines on parameter sensitivity, dynamic behavior, and robustness, while supporting real-time digital-twin calibration and uncertainty quantification. Persistent challenges include training instability for stiff multi-scale problems, computational cost of high-fidelity models, and the absence of standardized benchmarks. The findings demonstrate that PIML enables a paradigm shift from black-box data-driven methods to transparent, physics-informed strategies, positioning the field for sustained innovation in resilient and intelligent electricity systems.

2605.21859 2026-05-22 q-bio.PE cs.LG q-bio.QM

PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference

PhylaFlow：在Billera-Holmes-Vogtmann树空间中进行混合流匹配用于系统发育推断

Yasha Ektefaie, Leo Cui, Shrey Jain, Marinka Zitnik, Pardis Sabeti

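AI summary: This study proposes PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in Billera-Holmes-Vogtmann tree space to improve the efficiency and accuracy of phylogenetic inference.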

Comments 9 pages, 3 figures

AI中文摘要

系统发育树是混合对象：分支长度连续变化，而拓扑结构通过边收缩和扩展离散变化。Billera-Holmes-Vogtmann（BHV）树空间为这种结构提供了规范的几何表示，将每个完全解析的拓扑表示为一个欧几里得卦限（orthant），并将拓扑变化表示为跨越共享的低维边界的运动。我们引入PhylaFlow，一种混合流匹配模型，该模型在BHV树空间中学习后验盆地运输。PhylaFlow在从随机起始树到短程后验样本的BHV测地路径上训练，将卦限内的连续分支长度运动与学习到的边界事件和离散拓扑转换耦合在一起。我们以操作性的方式评估所学到的几何结构：如果流到达后验相关区域，那么以其终端树初始化或由其引导的有限预算贝叶斯细化应能更高效地恢复后验支持的拓扑。在DS1-DS8系统发育后验基准上，PhylaFlow相对于经典初始化方法显著降低了初始Tree-KL。在有限预算的MrBayes细化后，直接PhylaFlow在大多数数据集上改进了早期和中期的拓扑恢复轨迹，而split-guided PhylaFlow-MCMC在最困难的案例中取得了最强的结果。在相同的细化预算下，最好的PhylaFlow变体在八个数据集中的七个上优于短预热，并在八个数据集中的五个上优于PhyloGFN。在联合序列条件实验中，序列嵌入能够引导后验分裂恢复，尽管精确的后验拓扑恢复仍处于初步阶段。这些结果表明，混合流匹配可以学习BHV树空间中的可操作运输，并为贝叶斯系统发育推断提供几何感知的提议机制。

英文摘要

Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topology transitions. We evaluate the learned geometry operationally: if the flow reaches posterior-relevant regions, finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across DS1-DS8 phylogenetic posterior benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC obtains the strongest hard-case results. The best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight under the same refinement budget. In a joint sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary. These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.

2605.21846 2026-05-22 stat.ME cs.LG stat.ML

Causal Discovery in Structural VAR Models Under Equal Noise Variance

在等噪声方差假设下结构VAR模型中的因果发现

SeyedSina Seyedi HasanAbadi, Fahimeh Arab, Erfan Nozari, AmirEmad Ghassami

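AI summary: This paper studies causal discovery in linear Gaussian structural VAR models under an equal noise variance assumption and proposes ENVAR, a sparsity-based procedure that searches the observational equivalence class for a sparse structural representative, evaluated on synthetic data and an fMRI dataset.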

AI中文摘要

从多变量时间序列中进行因果发现具有挑战性，因为因果效应可能在时间上和同一采样间隔内同时发生。这个问题在神经科学等应用中尤为重要，其中采样率可能相对粗糙，而同时效应不一定形成无环图。我们研究了在等噪声方差假设下线性高斯结构VAR模型中的因果发现，这意味着结构噪声项具有共同的方差。与基于DAG的横断面等噪声方差设置不同，此处考虑的时间序列设置通常不会导致因果图的唯一点识别。相反，多种结构VAR参数化可以诱导相同的平稳观测过程定律。我们引入了一种针对此设置的观测等价性概念，并展示相应的等价类由结构方程的正交变换以及全局正比例尺度共同刻画。这种刻画导致了观测对齐差异，即比较结构模型模去保持观测定律的变换。基于这一理论，我们提出ENVAR，一种基于稀疏性的方法，用于在诱导的观测等价类中搜索稀疏的归一化结构代表。我们评估了所提出的方法在合成结构VAR数据和fMRI数据集上的性能。

英文摘要

Causal discovery from multivariate time series is challenging when causal effects may occur both across time and within the same sampling interval. This issue is especially important in applications such as neuroscience, where the sampling rate may be coarse relative to the underlying dynamics and contemporaneous effects need not form an acyclic graph. We study causal discovery in linear Gaussian structural VAR models under an equal noise variance assumption, meaning that the structural noise terms have a common variance. Unlike the DAG-based cross-sectional equal noise variance setting, the time-series setting considered here does not generally yield point identification of a unique causal graph. Instead, multiple structural VAR parameterizations can induce the same stationary observed process law. We introduce a notion of observational equivalence tailored to this setting and show that the corresponding equivalence class is characterized by orthogonal transformations of the structural equations together with a global positive scale. This characterization leads to an equivalence-aware model discrepancy, the observational alignment discrepancy, which compares structural models modulo transformations that preserve the observed law. Building on this theory, we propose ENVAR, a sparsity-based procedure that searches over the induced observational equivalence class for a sparse normalized structural representative. We evaluate the proposed methodology on synthetic structural VAR data and on an fMRI dataset.
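
The non-identifiability claim can be checked numerically in a simplified setting. The sketch below assumes a two-variable, one-lag structural VAR written as A0 x_t = A1 x_{t-1} + e_t with Cov(e_t) = σ²I; this parameterization and all numbers are assumptions chosen for illustration and may differ from the paper's. Left-multiplying both structural matrices by an orthogonal Q and rescaling everything by c > 0 leaves the reduced-form lag coefficient A0⁻¹A1 and the innovation covariance σ²A0⁻¹A0⁻ᵀ unchanged, so the Gaussian observed process has the same law.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def scale(a, c):
    return [[c * v for v in row] for row in a]


def reduced_form(a0, a1, sigma):
    """Lag coefficient and innovation covariance implied by (A0, A1, sigma)."""
    a0_inv = inverse(a0)
    coef = matmul(a0_inv, a1)
    cov = scale(matmul(a0_inv, transpose(a0_inv)), sigma ** 2)
    return coef, cov


# Hypothetical structure; A0 encodes a contemporaneous cycle between x1 and x2.
a0 = [[1.0, -0.5], [0.3, 1.0]]
a1 = [[0.4, 0.1], [0.0, 0.2]]
sigma = 1.0

theta, c = 0.7, 2.0
q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]  # orthogonal (rotation) matrix

a0_alt = scale(matmul(q, a0), c)
a1_alt = scale(matmul(q, a1), c)
sigma_alt = c * sigma

coef, cov = reduced_form(a0, a1, sigma)
coef_alt, cov_alt = reduced_form(a0_alt, a1_alt, sigma_alt)
max_gap = max(
    abs(x - y)
    for m, n in ((coef, coef_alt), (cov, cov_alt))
    for row_m, row_n in zip(m, n)
    for x, y in zip(row_m, row_n)
)
```

Here `max_gap` is zero up to floating-point error even though `a0_alt` looks nothing like `a0`, which is why an extra criterion such as sparsity is needed to pick a representative from the class.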

2605.21835 2026-05-22 eess.IV cs.AI cs.CV physics.med-ph

An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

一种开放的多中心全身FDG PET/CT基础模型用于肿瘤分割

Xiaofeng Liu, Qianru Zhang, Thibault Marin, Menghua Xia, Chi Liu, Georges El Fakhri, Jinsong Ouyang

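AI summary: This paper presents an open, multi-center, whole-body FDG PET/CT foundation model built on 4,997 harmonized scans from four public datasets; a hierarchical UNet-shaped architecture with early channel-wise concatenation lets anatomical and metabolic features interact, improving label efficiency and cross-modality representation learning for tumor segmentation.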

Comments Code available at: https://github.com/liu-xiaofeng/Foundation-Model-for-PET-CT

AI中文摘要

协同解读来自计算机断层扫描（CT）的解剖信息和来自正电子发射断层扫描（PET）的代谢信息，对肿瘤成像十分重要。然而，现有的PET/CT深度学习方法大多针对特定任务，通常在单中心队列上训练，或者采用双分支融合方案，这会推迟跨模态交互，并且未能充分利用PET和CT之间的早期空间对应关系。为了解决这些限制，我们提出了一种开源的、多中心的全身FDG PET/CT基础模型，利用了来自四个公开数据集的4,997份标准化扫描。我们的框架采用层次化的UNet形骨干网络，并在早期进行通道拼接，使解剖和代谢特征从第一个嵌入层开始交互。我们进一步引入基于零均值填补的掩码自编码目标，并结合加权全局重建损失。这种设计避免了可学习掩码标记在掩码区域边界处产生的非物理强度不连续性。在下游AutoPET病变分割中，所提出的模型显示出很强的标签效率：仅使用10%的标注训练数据，即可达到与在完整数据集上从头训练的模型相当的性能。在极端的5-shot线性探测下，联合PET/CT预训练也比分模态预训练取得了更高的Dice分数。这一多中心基础模型展示了PET/CT肿瘤分割的标签效率和跨模态表征学习能力。它为推进自动化肿瘤成像提供了稳健的开源基础，显著减少了临床实践中对大规模手动标注的需求。

英文摘要

The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation, combined with a weighted global reconstruction loss. This design avoids non-physical intensity discontinuities at masked-region boundaries that arise from learnable mask tokens. On downstream AutoPET lesion segmentation, the proposed models demonstrate strong label efficiency: with only 10% of the labeled training data, they achieve performance comparable to models trained from scratch on the full dataset. Under extreme 5-shot linear probing, joint PET/CT pretraining also achieves higher Dice scores than separated-modality pretraining. This multi-center foundation model demonstrates label efficiency and cross-modality representation learning for PET/CT tumor segmentation. It provides a robust, open-source basis for advancing automated oncologic imaging, significantly reducing the need for large-scale manual annotations in clinical practice.
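
One way to picture early channel-wise concatenation and zero-mean imputation is the toy sketch below, which uses a flattened 1D signal in place of a 3D volume. It assumes each modality is standardized to zero mean, so that filling masked patches with 0.0 amounts to imputing the mean; the intensity values, patch size, and function names are hypothetical, and the paper's actual normalization and masking scheme may differ.

```python
import statistics


def standardize(values):
    """Shift to zero mean and scale to unit variance (variance floor of 1.0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]


def mask_patches(channels, patch_size, masked_patches):
    """Set every voxel of the selected patches to 0.0 in all channels."""
    masked = [list(ch) for ch in channels]
    for p in masked_patches:
        for i in range(p * patch_size, (p + 1) * patch_size):
            for ch in masked:
                ch[i] = 0.0
    return masked


# Hypothetical CT (Hounsfield-like) and PET (uptake-like) values for 8 voxels.
ct = standardize([-1000.0, -950.0, 40.0, 60.0, 300.0, 320.0, 30.0, 20.0])
pet = standardize([0.0, 0.1, 1.2, 1.5, 8.0, 9.0, 1.1, 0.9])

stacked = [ct, pet]  # early fusion: one two-channel input instead of two branches
masked_input = mask_patches(stacked, patch_size=2, masked_patches=[1, 3])
```

Because the fill value equals the channel mean, a masked patch sits at a neutral intensity rather than at a learned token value, which is the intuition behind avoiding artificial jumps at mask boundaries.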

2605.21805 2026-05-22 stat.CO cs.LG stat.ML

Truncated Neural Likelihood Estimation for Simulation-Based Inference in State-Space Models

截断神经似然估计用于状态空间模型中的基于模拟的推断

Kostas Tsampourakis, Víctor Elvira

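AI summary: This paper proposes truncated sequential neural likelihood (T-SNL), which addresses the limitations of standard sequential neural likelihood (SNL) for inference in state-space models, namely its large simulation requirements, poor scaling with sequence length, and lack of amortization, and thereby improves accuracy, stability, and robustness.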

AI中文摘要

状态空间模型（SSMs）是强大的概率工具，用于建模具有潜在动态的时变系统。SSMs中的推断涉及对潜在状态和参数的估计。在本文中，我们关注参数推断；由于似然函数难以处理，这在SSMs中通常是一个极具挑战性的问题。最近，序列神经似然（SNL）等神经估计方法在贝叶斯推断问题中显示出有前景的结果。我们证明，当SNL应用于SSM设置时，存在重要的局限，例如需要大量的模拟样本才能达到中等性能、随序列长度的扩展性差，并且不具备摊销（amortized）特性。随后我们提出一种新的推断算法，称为截断SNL（T-SNL），以解决SNL的这些局限。我们的算法更加准确，在训练过程中更加稳定和鲁棒，对更长的时间序列具有更好的扩展性，并且在新观测可用时可以摊销。我们的实验表明，T-SNL是一种样本效率高、鲁棒且灵活的算法，优于其他方法。

英文摘要

State-space models (SSMs) are powerful probabilistic tools for modeling time-varying systems with latent dynamics. Inference in SSMs involves the estimation of latent states and parameters. In this work, we focus on parameter inference, which for SSMs is in general a very challenging problem due to the intractability of the likelihood. Recently, neural estimation methods, such as sequential neural likelihood (SNL), have shown promising results in Bayesian inference problems. In this paper, we show that SNL, when applied to the SSM setting, suffers from important limitations: it requires a large number of simulated samples to achieve moderate performance, scales poorly with sequence length, and is not amortized. We then introduce a novel inference algorithm called truncated-SNL (T-SNL), which addresses these limitations. Our algorithm is more accurate, more stable and robust during training, more scalable to longer temporal sequences, and can be amortized when new observations become available. Our experiments show that T-SNL is a sample-efficient, robust, and flexible algorithm that outperforms other approaches.

2605.21804 2026-05-22 eess.IV cs.CV cs.LG

Mapping Tomato Cropping Systems in California Using AlphaEarth Geospatial Embeddings and Deep Learning Analysis

使用AlphaEarth地理空间嵌入和深度学习分析映射加利福尼亚州番茄种植系统

Mohammadreza Narimani, Alireza Pourreza, Parastoo Farajpoor

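AI summary: This study evaluates whether Google DeepMind's AlphaEarth geospatial embeddings can serve as an alternative for mapping tomato cropping systems in California; it builds a balanced reference dataset from LandIQ 2018 crop polygons and uses a U-Net segmentation model with Monte Carlo dropout to map tomato fields with high accuracy.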

Comments 5 pages, 3 figures, 1 table. Preprint submitted to ASABE 2026 AIM

AI中文摘要

田间尺度的作物地图支持供应链预测和政策制定，但州级作物识别仍常常依赖于回顾性调查或基于手工设计光谱特征的遥感工作流程。这些流程可以做到准确，但需要重复预处理，并且在跨年份应用时往往失去鲁棒性。本研究评估了Google DeepMind的AlphaEarth地理空间嵌入能否作为一种可直接用于分析的替代方案，用于绘制加利福尼亚州的加工番茄种植系统。使用LandIQ 2018的作物多边形构建了包含4,742个番茄地块和4,742个非番茄地块的平衡参考数据集。对于每个多边形，提取64波段的AlphaEarth嵌入影像块（chip）并与二值掩码对齐，然后划分为空间独立的训练集（n = 6,638）、验证集（n = 1,422）和测试集（n = 1,424）。在AWS SageMaker上使用由掩码二元交叉熵和软Dice损失组成的复合损失训练了U-Net分割模型。为了补充硬预测，在推理阶段保留蒙特卡洛dropout，并对每个影像块重复100次，以估计预测均值和方差。在独立测试集上，模型实现了99.19%的像素准确率、98.69%的精确率、99.40%的召回率、99.04%的F1分数、98.11%的交并比和99.02%的影像块准确率。不确定性地图在田块边缘附近始终最高，在田块内部较低。结果表明，AlphaEarth嵌入保留了与作物相关的空间和时间结构，可以在无需手动特征工程的情况下支持准确的田间尺度番茄制图。

英文摘要

Field-scale crop maps support supply-chain forecasting and policy, yet statewide crop identification still often depends on retrospective surveys or remote-sensing workflows built around hand-engineered spectral features. Those pipelines can be accurate, but they require repeated preprocessing and often lose robustness across years. This study evaluated whether Google DeepMind's AlphaEarth geospatial embeddings can serve as an analysis-ready alternative for mapping processing tomato systems in California. LandIQ 2018 crop polygons were used to assemble a balanced reference dataset of 4,742 tomato and 4,742 non-tomato fields. For each polygon, 64-band AlphaEarth embedding chips were extracted and aligned with binary masks, then divided into spatially independent training (n = 6,638), validation (n = 1,422), and test (n = 1,424) sets. A U-Net segmentation model was trained on AWS SageMaker using a composite masked binary cross-entropy and soft Dice loss. To complement hard predictions, Monte Carlo dropout was retained at inference and repeated 100 times per chip to estimate predictive mean and variance. On the independent test set, the model achieved 99.19% pixel accuracy, 98.69% precision, 99.40% recall, 99.04% F1 score, 98.11% intersection over union, and 99.02% chip accuracy. Uncertainty maps were consistently highest near field edges and low within field interiors. The results show that AlphaEarth embeddings retain crop-relevant spatial and temporal structure and can support accurate, field-scale tomato mapping without manual feature engineering.
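
The Monte Carlo dropout step (keep dropout active at inference, repeat the forward pass, then summarize) can be illustrated with a toy model. The sketch below replaces the U-Net with a hypothetical three-feature logistic unit so it runs with the standard library only; the weights, inputs, and dropout rate are made up, and only the default of 100 passes mirrors the abstract.

```python
import math
import random
import statistics


def predict_with_dropout(features, weights, bias, drop_prob, rng):
    """One stochastic forward pass of a toy logistic unit with inverted dropout."""
    keep = 1.0 - drop_prob
    logit = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:          # feature survives this pass
            logit += (x / keep) * w      # rescale so the expected input is unchanged
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout(features, weights, bias, drop_prob=0.2, passes=100, seed=0):
    """Predictive mean and variance over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [
        predict_with_dropout(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.fmean(samples), statistics.pvariance(samples)


mean_prob, var_prob = mc_dropout([0.8, -0.3, 1.5], [1.2, 0.7, -0.4], bias=0.1)
```

In a segmentation setting the same summary would be computed per pixel, giving a mean probability map and a variance map; high variance near field edges is the pattern the abstract reports.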

2605.21789 2026-05-22 hep-ex cs.AI

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

基于补丁层次注意力的高效粒子喷注标记变压器

Aaron Wang, Zihan Zhao, Alan Xia, Chang Sun, Abhijith Gandrakota, Jennifer Ngadiuba, Richard Cavanaugh, Javier Duarte

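AI summary: This paper proposes the Patch Hierarchical Attention Transformer (PHAT-JeT), which combines a physics-inspired geometric message-passing module with a patch-based hierarchical attention mechanism for efficient particle jet tagging under limited resources, achieving the best performance among resource-constrained models on four benchmarks.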

AI中文摘要

实时喷注标记对于在大型强子对撞机的高通量探测器中识别短寿命粒子衰变至关重要，其中负责决定存储哪些碰撞事件的实时触发系统对延迟和准确性提出了严格要求。尽管变换器架构在计算不受限制时能够实现最高的喷注标记准确性，但其二次自注意力成本使得在触发预算内进行推理变得受限。现有的高效变体虽然降低了计算成本，但会阻碍分类性能。为了解决这一限制，我们引入了Patch Hierarchical Attention Transformer (PHAT-JeT)，它结合了两个机制：一个受物理启发的几何信息传递模块，用于编码局部探测器平面结构，以及一个基于补丁的层次注意力方案，该方案在小粒子组内计算精确的注意力，同时通过轻量级补丁-标记通信保持全局上下文。在受限预算内，PHAT-JeT在四个基准测试（hls4ml、JetClass、Top Tagging 和 Quark-Gluon）中实现了所有资源受限喷注标记模型中的最佳准确性和背景拒绝率。我们的代码可在 https://github.com/aaronw5/PHAT-JeT 上获得。

英文摘要

Real-time jet tagging is critical for identifying short-lived particle decays in the high-throughput detectors of the Large Hadron Collider, where real-time trigger systems responsible for deciding which collision events to store impose strict latency and accuracy constraints. While transformer architectures achieve the highest jet tagging accuracy when compute is unconstrained, their quadratic self-attention cost makes inference restrictive on trigger budget. Existing efficient variants reduce the computational cost, but hinder the classification performance. To address this limitation, we introduce the Patch Hierarchical Attention Transformer (PHAT-JeT), which combines two mechanisms: a physics-inspired geometric message-passing module that encodes local detector-plane structure, and a hierarchical patch-based attention scheme that computes exact attention within small particle groups while preserving global context through lightweight patch-token communication. Within a restricted budget, PHAT-JeT achieves state-of-the-art accuracy and background rejection among all resource-constrained jet tagging models on four benchmarks (hls4ml, JetClass, Top Tagging, and Quark-Gluon). Our code is available at https://github.com/aaronw5/PHAT-JeT.

2605.21777 2026-05-22 cs.HC cs.AI

Understanding Perspectives of Patients, Caregivers and Clinicians towards Emerging Collaborative-decision Making Technologies

理解患者、护理人员和临床医生对新兴协作决策技术的看法

Ray-Yuan Chung, Athena Ortega, Zixuan Xu, Daeun Yoo, Jaime Snyder, Wanda Pratt, Aaron Wightman, Ryan Hutson, Cozumel Pruette, Ari Pollack

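AI summary: This study examines how attitudes toward collaborative decision-making technologies differ among patients, caregivers, and clinicians; it finds that acceptance is tied to users' trust in the technology and calls for design and implementation strategies that build or foster that trust.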

Comments Accepted at The Workshop on Interactive Systems in Healthcare (WISH) at AMIA Annual Symposium 2025

2605.21773 2026-05-22 cs.CR cs.LG

HIDBench: Benchmarking Large Language Models for Host-Based Intrusion Detection

HIDBench: 用于基于主机入侵检测的大型语言模型评估

Danyu Sun, Jinghuai Zhang, Yuan Tian, Zhou Li

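AI summary: This paper presents HIDBench, a benchmark for evaluating how well large language models support host-based intrusion detection systems (HIDS), and reveals performance gaps and sensitivity to data complexity on complex system logs.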

AI中文摘要

近年来，基准测试努力已推动了大型语言模型（LLMs）在网络安全中的评估，包括渗透测试和漏洞识别等任务。然而，入侵检测从系统日志这一关键网络安全任务仍未被探索。在本文中，我们提出一个新的基准测试，以评估LLM在支持基于主机的入侵检测系统（HIDS）中的能力。该任务需要在大规模、嘈杂且高度不平衡的系统日志上进行细粒度推理，其中良性与恶意活动之间的复杂相互作用使得可靠检测具有挑战性。我们的基准测试统一了三个公开的系统日志数据集，DARPA-E3、DARPA-E5和NodLink，并引入了一个数据构建管道，将原始主机遥测数据转换为LLM兼容的输入，从而在现实入侵检测设置下进行系统评估。我们对前沿LLM的评估揭示了在不同数据集上的显著性能差距。尽管许多模型在更简单的数据集上实现了高精度（通常超过0.8），但当系统日志变得更加嘈杂和复杂时，其性能显著下降，MCC经常低于0.5，误报率急剧上升。我们进一步分析了模型行为，并识别出不同的模式，包括具有低误报率的保守检测器和产生过多警报的过度敏感模型。总体而言，我们的结果表明，尽管LLM在HIDS中显示了强大的潜力，但其有效性对数据复杂性高度敏感，稳健的系统设计对于可靠的部署至关重要。

英文摘要

Recent benchmark efforts have advanced the evaluation of large language models (LLMs) in cybersecurity, including tasks such as penetration testing and vulnerability identification. However, a critical cybersecurity task, namely intrusion detection from system logs, remains unexplored. In this work, we present a new benchmark to assess LLMs' capabilities in supporting host-based intrusion detection systems (HIDS). This task requires fine-grained reasoning over large-scale, noisy, and highly imbalanced system logs, where complex interactions between benign and malicious activities make reliable detection challenging. Our benchmark unifies three public system log datasets, DARPA-E3, DARPA-E5, and NodLink, and introduces a data construction pipeline that transforms raw host telemetry into LLM-compatible inputs, enabling systematic evaluation under realistic intrusion detection settings. Our evaluation of frontier LLMs reveals substantial performance gaps across datasets. While many models achieve high precision (often above 0.8) on simpler datasets, their performance degrades significantly as system logs become noisier and more complex, with MCC frequently dropping below 0.5 and false positive rates increasing sharply. We further analyze model behavior and identify distinct regimes, including conservative detectors with low false positive rates and over-sensitive models that generate excessive alerts. Overall, our results highlight that while LLMs show strong potential for HIDS, their effectiveness is highly sensitive to data complexity, and robust system design is essential for reliable deployment.
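
The gap between high precision and low MCC reported above is easy to reproduce on imbalanced data. The sketch below computes precision, false positive rate, and the Matthews correlation coefficient from a confusion matrix using the standard formulas; the counts are hypothetical and are not taken from the benchmark.

```python
import math


def detection_metrics(tp, fp, tn, fn):
    """Precision, false positive rate, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "fpr": fpr, "mcc": mcc}


# Hypothetical log: 100 malicious events among 10,000, most of them missed.
metrics = detection_metrics(tp=20, fp=4, tn=9896, fn=80)
```

With these counts the detector is rarely wrong when it raises an alert, yet it misses 80 of 100 attacks, so MCC stays well below precision. This is why a correlation-style metric that uses all four cells is informative for rare-event detection.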

2605.21736 2026-05-22 stat.ML cs.AI cs.LG

Support-aware offline policy selection for advertising marketplaces

面向广告市场的支持感知离线策略选择

Prashant Shekhar, Caroline Howard

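AI summary: This paper proposes a support-aware offline decision framework for reserve-policy selection in advertising marketplaces, which converts logged evidence into a conservative decision object so that validation decisions are reliable rather than resting on point-estimate rankings alone.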

AI中文摘要

记录的广告拍卖使离线保留价格评估变得有吸引力但有风险。回放表可以识别具有大显眼收益增益的策略，但它们也可能隐藏弱阈值支持、多重比较效应、子组伤害和投标者响应不确定性。现有的回放和离线策略评估方法估计或排名策略价值，但它们不能直接回答可用证据是否足够强以证明验证的问题。本文开发了一种支持感知的离线决策框架用于保留策略选择。与其输出单一的点估计胜者，该框架将记录证据转化为保守的决策对象，包括认证的策略、统计上被主导的替代方案以及需要进一步验证的未解决候选者。主要理论结果给出了一种统一的有限目录保证，显示在同时控制不确定性和保守支持门控的情况下，该框架保留了最佳通过策略，同时排除了具有认证遗憾的策略。支持性结果描述了支持本地化的回放泛化，建立了信息论阈值解析极限，并量化了异质投标者响应如何推翻本地化回放排名。在iPinYou实时竞价日志上的实验显示，领先的保留规则在第二季实现了47.66%的回放提升，同时实现了40.71%的下限提升，在第三季实现了43.87%的冻结超时回放提升。该框架将19个策略目录减少到两个策略验证短名单，同时在44个广告商、交易所和地区段中认证无害。结果支持核心主张，即离线保留策略评估应产生认证的验证决策，而非仅依赖点估计排名。

英文摘要

Logged advertising auctions make offline reserve-price evaluation attractive but risky. Replay tables can identify policies with large apparent yield gains, yet they can also hide weak threshold support, multiple-comparison effects, subgroup harm, and bidder-response uncertainty. Existing replay and off-policy evaluation methods estimate or rank policy values, but they do not directly answer the operational question of whether the available evidence is strong enough to justify validation. This paper develops a support-aware offline decision framework for reserve-policy selection. Rather than outputting a single point-estimate winner, the framework converts logged evidence into a conservative decision object consisting of certified policies, statistically dominated alternatives, and unresolved candidates requiring further validation. The main theoretical result gives a unified finite-catalog guarantee showing that, under simultaneous uncertainty control and conservative support gates, the framework preserves the best gate-passing policy while eliminating only policies with certified regret. Supporting results characterize support-localized replay generalization, establish information-theoretic threshold-resolution limits, and quantify when heterogeneous bidder response can overturn localized replay rankings. Experiments on iPinYou real-time-bidding logs show that the leading reserve rule achieves a 47.66% replay lift in season two, a 40.71% simultaneous lower-bound lift, and a 43.87% frozen out-of-time replay lift in season three. The framework reduces a 19-policy catalog to a two-policy validation shortlist while certifying non-harm across 44 advertiser, exchange, and region segments. The results support the central claim that offline reserve-policy evaluation should produce certified validation decisions rather than point-estimate rankings alone.
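
The three-way decision object (certified, dominated, unresolved) can be sketched with simultaneous confidence bounds and a support gate. The rules below are an illustrative assumption, not the paper's procedure: a policy is dominated when its upper bound falls below the best lower bound among gate-passing policies, certified when it passes the gate and its lower bound beats the baseline, and unresolved otherwise. All field names and numbers are hypothetical.

```python
def classify_policies(policies, baseline_value=0.0):
    """Split a finite catalog into certified, dominated, and unresolved policies.

    Each policy is a dict with simultaneous bounds "lower" and "upper" on its
    lift over the baseline and a boolean "passes_gate" for the support check.
    """
    gate_passing = [p for p in policies if p["passes_gate"]]
    best_lower = max((p["lower"] for p in gate_passing), default=baseline_value)

    result = {"certified": [], "dominated": [], "unresolved": []}
    for p in policies:
        if p["upper"] < best_lower:
            # Even its optimistic value trails a supported competitor.
            result["dominated"].append(p["name"])
        elif p["passes_gate"] and p["lower"] > baseline_value:
            result["certified"].append(p["name"])
        else:
            result["unresolved"].append(p["name"])
    return result


catalog = [
    {"name": "reserve_a", "lower": 0.40, "upper": 0.55, "passes_gate": True},
    {"name": "reserve_b", "lower": 0.10, "upper": 0.30, "passes_gate": True},
    {"name": "reserve_c", "lower": 0.20, "upper": 0.90, "passes_gate": False},
    {"name": "reserve_d", "lower": -0.05, "upper": 0.45, "passes_gate": True},
]
decision = classify_policies(catalog)
```

Under these rules the gate-passing policy with the highest lower bound can never be marked dominated, since its own upper bound is at least that lower bound. This mirrors, in a simplified form, the idea of preserving the best gate-passing policy while discarding only policies with certified regret.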

2605.21722 2026-05-22 cond-mat.stat-mech cond-mat.mtrl-sci cs.LG

MetaDNS: Enhancing Exploration in Discrete Neural Samplers via Well-Tempered Metadynamics

MetaDNS: 通过良好温控元动力学增强离散神经采样器的探索

Xiaochen Du, Juno Nam, Jaemoo Choi, Wei Guo, Sathya Edamadaka, Junyi Sha, Elton Pan, Yongxin Chen, Molei Tao, Rafael Gómez-Bombarelli

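AI summary: This paper proposes MetaDNS, a general framework that integrates well-tempered metadynamics into discrete diffusion or autoregressive samplers to address mode collapse when sampling multimodal discrete distributions with energy barriers, and to enable free energy reconstruction.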

Comments Accepted at ICML 2026

AI中文摘要

从具有多个模式和能量屏障的离散分布进行采样对于机器学习和计算物理都是基础性的。最近的离散神经采样器如MDNS在模式崩溃和无法采样模式之间高能屏障区域方面存在缺陷，这对自由能估计和相变理解至关重要。我们提出了元动力学离散神经采样器（MetaDNS），一种将良好温控元动力学整合到离散扩散或自回归采样器中的通用框架。通过在选定的低维坐标上维持一个适应性、历史依赖性的偏置势能，MetaDNS强迫探索之前不可达的区域，使自由能重建成为可能，这在标准神经采样器中由于缺乏高能样本而不可行。在具有挑战性的低温基准上，包括Ising、Potts和铜-金二元合金，MetaDNS再现了热力学分布。与基于MCMC的元动力学相比，MetaDNS也实现了相当的探索，所需偏置沉积步骤更少。

英文摘要

Sampling from discrete distributions with multiple modes and energy barriers is fundamental to machine learning and computational physics. Recent discrete neural samplers like MDNS suffer from mode collapse and fail to sample high-energy barrier regions between modes, which is critical for free energy estimation and understanding phase transitions. We propose Metadynamics Discrete Neural Sampler (MetaDNS), a general framework integrating well-tempered metadynamics into discrete diffusion or autoregressive samplers. By maintaining an adaptive, history-dependent bias potential along selected low-dimensional coordinates, MetaDNS forces exploration of previously inaccessible regions, enabling free energy reconstruction infeasible with standard neural samplers due to a lack of high-energy samples. On challenging low-temperature benchmarks including Ising, Potts, and the copper-gold binary alloy, MetaDNS reproduces the thermodynamic distribution. Compared to MCMC-based metadynamics, MetaDNS also achieves comparable exploration requiring fewer bias deposition steps.
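
The well-tempered bias update at the heart of this approach can be shown on a toy problem. The sketch below is not MetaDNS: it swaps the neural sampler for exact sampling from the biased Boltzmann distribution over five discrete bins of a collective variable, and uses a delta kernel so that only the visited bin receives bias. The energies and all parameter values are hypothetical, with k_B set to 1 and `delta_t` standing for k_B ΔT. The deposit height follows the standard well-tempered rule w = w0 · exp(−V(s)/ΔT).

```python
import math
import random


def well_tempered_bias(energies, temperature=1.0, delta_t=4.0,
                       w0=0.5, steps=2000, seed=0):
    """Accumulate a well-tempered bias over discrete collective-variable bins."""
    rng = random.Random(seed)
    bias = [0.0] * len(energies)
    visits = [0] * len(energies)
    for _ in range(steps):
        # Sample a bin from the biased distribution exp(-(E + V) / T).
        weights = [math.exp(-(e + v) / temperature)
                   for e, v in zip(energies, bias)]
        s = rng.choices(range(len(energies)), weights=weights)[0]
        visits[s] += 1
        # Deposit height shrinks as bias accumulates at the visited bin.
        bias[s] += w0 * math.exp(-bias[s] / delta_t)
    return bias, visits


energies = [0.0, 2.0, 6.0, 2.0, 0.0]  # two wells separated by a barrier at bin 2
bias, visits = well_tempered_bias(energies)
```

As bias fills the wells, the barrier bin becomes reachable and gets sampled. In the well-tempered limit the free energy can be estimated, up to a constant, as F(s) ≈ −((T + ΔT)/ΔT) · V(s), which is the kind of reconstruction an unbiased sampler cannot provide without barrier samples.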
