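arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.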

2511.11830 2026-05-29 math.OC cs.LG

A Computational Method for Solving the Stochastic Joint Replenishment Problem in High Dimensions

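Barış Ata, Wouter van Eekelen, Yuan Zhong

AI summary: For the high-dimensional stochastic joint replenishment problem, the authors propose a simulation-based computational method built on deep neural networks and an impulse-control approximation; it matches or outperforms existing benchmarks on problems with up to 50 SKUs.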

Comments 71 pages, 5 figures

2510.20743 2026-05-29 cs.HC cs.AI cs.CL

Empathic Prompting: Non-Verbal Context Integration for Multimodal LLM Conversations

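Lorenzo Stacchio, Andrea Ubaldi, Alessandro Galdelli, Maurizio Mauri, Emanuele Frontoni, Andrea Gaggioli

AI summary: Proposes the Empathic Prompting framework, which integrates a facial expression recognition service to feed non-verbal emotional cues implicitly into LLM conversations, enabling smooth multimodal interaction without explicit user control.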

Abstract

We present Empathic Prompting, a novel framework for multimodal human-AI interaction that enriches Large Language Model (LLM) conversations with implicit non-verbal context. The system integrates a commercial facial expression recognition service to capture users' emotional cues and embeds them as contextual signals during prompting. Unlike traditional multimodal interfaces, empathic prompting requires no explicit user control; instead, it unobtrusively augments textual input with affective information for conversational and smoothness alignment. The architecture is modular and scalable, allowing integration of additional non-verbal modules. We describe the system design, implemented through a locally deployed DeepSeek instance, and report a preliminary service and usability evaluation (N=5). Results show consistent integration of non-verbal input into coherent LLM outputs, with participants highlighting conversational fluidity. Beyond this proof of concept, empathic prompting points to applications in chatbot-mediated communication, particularly in domains like healthcare or education, where users' emotional signals are critical yet often opaque in verbal exchanges.
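
As a rough illustration of embedding emotional cues as contextual signals during prompting, the sketch below prepends a detected expression label to the user's text. The `EmotionReading` type, the `build_prompt` helper, the label names, and the confidence threshold are all hypothetical; the abstract does not specify the recognition service's output format or the prompt template.

```python
from dataclasses import dataclass


@dataclass
class EmotionReading:
    """Hypothetical output of a facial expression recognition service."""
    label: str         # e.g. "joy", "sadness", "neutral"
    confidence: float  # in [0, 1]


def build_prompt(user_text: str, reading: EmotionReading,
                 min_confidence: float = 0.6) -> str:
    """Augment the user's text with an implicit, non-verbal context line.

    Low-confidence and neutral readings are dropped so that noisy
    detections do not steer the conversation (an assumption of this sketch).
    """
    if reading.confidence < min_confidence or reading.label == "neutral":
        return user_text
    context = (f"[Non-verbal context: the user appears to express "
               f"{reading.label} (confidence {reading.confidence:.2f}).]")
    return f"{context}\n{user_text}"
```

The user never types the context line; it is added before the text reaches the model, which is one plausible reading of "no explicit user control".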

2510.12310 2026-05-29 cs.CR cs.LG

DeepTrust: Multi-Step Classification through Dissimilar Adversarial Representations for Robust Android Malware Detection

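Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà

AI summary: Proposes DeepTrust, a metaheuristic that cascades a sequence of conditionally activated heterogeneous classifiers and maximizes the divergence of their internal representations, achieving robust detection under feature-space evasion attacks; it took first place in the 2025 IEEE SaTML competition.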

DOI: 10.1016/j.eswa.2026.132961

Abstract

Over the last decade, machine learning has been extensively applied to identify malicious Android applications. However, such approaches remain vulnerable to adversarial examples, i.e., examples that are subtly manipulated to fool a machine learning model into making incorrect predictions. This research presents DeepTrust, a novel metaheuristic that arranges flexible classifiers, like deep neural networks, into an ordered sequence where the final decision is made by a single internal model based on conditions activated in cascade. In the Robust Android Malware Detection competition at the 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), DeepTrust secured first place and achieved state-of-the-art results, outperforming the next-best competitor by up to 266% under feature-space evasion attacks. This is accomplished while maintaining the highest detection rate on non-adversarial malware and a false positive rate below 1%. The method's efficacy stems from maximizing the divergence of the learned representations among the internal models. By using classifiers that induce fundamentally dissimilar embeddings of the data, the decision space becomes unpredictable for an attacker. This frustrates the iterative perturbation process inherent to evasion attacks, enhancing system robustness without compromising accuracy on clean examples.

2510.12152 2026-05-29 stat.ML cs.LG

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

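Chaiwon Kim, Jongyeong Lee, Min-hwan Oh

AI summary: For the decoupled multi-armed bandit problem, proposes an efficient Follow-the-Perturbed-Leader policy that achieves constant regret in the stochastic regime and optimal $O(\sqrt{KT})$ regret in the adversarial regime, while avoiding convex optimization and resampling and thereby substantially reducing computational cost.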

Comments Accepted to ICML 2026, 31 pages

Abstract

We study the decoupled multi-armed bandit problem, where the learner separately selects one arm for exploration and one, possibly different, arm for exploitation at each round. In this setting, the loss of the explored arm is observed but not incurred, whereas the loss of the exploited arm is incurred without being observed. We propose an efficient Follow-the-Perturbed-Leader (FTPL) policy that achieves a Best-of-Both-Worlds (BOBW) guarantee, with constant regret in the stochastic regime and optimal $O(\sqrt{KT})$ regret in the adversarial regime. A key feature of our method is that it completely avoids both the convex optimization required by prior BOBW policies and the resampling procedures typically used in FTPL bandit policies. This allows FTPL to fully realize its computational efficiency advantages, leading to substantial reductions in computational cost. We empirically confirm that our policy not only improves the runtime but also demonstrates superior regret performance in both regimes.
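
To make the decoupled setting concrete, here is a toy FTPL loop. It is a sketch under stated assumptions, not the authors' policy: the uniform exploration rule, the Fréchet perturbations with shape 2, the $1/\sqrt{t}$ learning rate, and the names `frechet` and `decoupled_ftpl` are all choices made for illustration. Because the exploration probability $1/K$ is known in this toy version, the importance-weighted estimate needs no resampling.

```python
import math
import random


def frechet(rng: random.Random, shape: float = 2.0) -> float:
    """Sample a Frechet(shape) variable by inverse CDF: X = E^(-1/shape), E ~ Exp(1)."""
    e = max(rng.expovariate(1.0), 1e-12)  # guard against an exact zero
    return e ** (-1.0 / shape)


def decoupled_ftpl(loss_fn, num_arms: int, horizon: int, seed: int = 0) -> float:
    """Toy decoupled FTPL loop (illustrative, not the paper's policy).

    loss_fn(t, arm) returns a loss in [0, 1]. Each round the learner
    exploits the perturbed leader and explores a uniformly random arm.
    Returns the total loss incurred by the exploited arms.
    """
    rng = random.Random(seed)
    est = [0.0] * num_arms  # importance-weighted cumulative loss estimates
    total = 0.0
    for t in range(1, horizon + 1):
        eta = 1.0 / math.sqrt(t)  # assumed learning-rate schedule
        # Exploitation: follow the perturbed leader.
        exploit = min(range(num_arms),
                      key=lambda i: eta * est[i] - frechet(rng))
        total += loss_fn(t, exploit)  # incurred, but not observed
        # Exploration: observed, but not incurred.
        explore = rng.randrange(num_arms)
        # Dividing by the known probability 1/K keeps the estimate unbiased.
        est[explore] += num_arms * loss_fn(t, explore)
    return total
```

For example, `decoupled_ftpl(lambda t, arm: 0.1 if arm == 0 else 0.9, num_arms=3, horizon=2000)` runs the loop on a fixed-loss instance where arm 0 is best.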

2510.10020 2026-05-29 stat.ML cs.LG q-bio.BM

Calibrating Generative Models to Distributional Constraints

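Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe

AI summary: Addresses the calibration problem in which statistics of a generative model's sampling distribution deviate from their desired values. The authors formalize calibration as a constrained optimization problem and fine-tune with two surrogate objectives, a relaxation loss and a reward loss, substantially reducing calibration error under hundreds of simultaneous constraints in applications including protein design, image generation, and language modeling.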

Comments To appear at the International Conference on Machine Learning (ICML), 2026. Codebase accompanying the paper is available at: https://github.com/smithhenryd/cgm

2509.23573 2026-05-29 cs.CR cs.AI

Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence

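Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi

AI summary: Using a human-in-the-loop categorization framework, this paper identifies and validates three domain-specific cognitive failure modes of LLMs in CTI reasoning (spurious correlations, contradictory knowledge, and constrained generalization) and shows that targeted defenses significantly reduce failure rates.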

Abstract

Large language models (LLMs) are increasingly used to help security analysts manage the surge of cyber threats, automating tasks from vulnerability assessment to incident response. Yet in operational CTI workflows, reliability gaps remain substantial. Existing explanations often point to generic model issues (e.g., hallucination), but we argue the dominant bottleneck is the threat landscape itself: CTI is heterogeneous, volatile, and fragmented. Under these conditions, evidence is intertwined, crowdsourced, and temporally unstable, which are properties that standard LLM-based studies rarely capture. In this paper, we present a comprehensive empirical study of LLM vulnerabilities in CTI reasoning. We introduce a human-in-the-loop categorization framework that robustly labels failure modes across the CTI lifecycle, avoiding the brittleness of automated "LLM-as-a-judge" pipelines. We identify three domain-specific cognitive failures: spurious correlations from superficial metadata, contradictory knowledge from conflicting sources, and constrained generalization to emerging threats. We validate these mechanisms via causal interventions and show that targeted defenses reduce failure rates significantly. Together, these results offer a concrete roadmap for building resilient, domain-aware CTI agents.

2509.23571 2026-05-29 cs.CR cs.AI

Benchmarking LLM-Assisted Blue Teaming via Standardized Threat Hunting

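Yuqiao Meng, Luoxi Tang, Feiyang Yu, Xi Li, Guanhua Yan, Ping Yang, Zhaohan Xi

AI summary: This paper proposes the CyberTeam benchmark, which builds standardized workflows and modular operational steps to evaluate the effectiveness of LLMs in blue-team threat hunting, revealing both the improvements brought by standardized design and the limitations of open-ended reasoning.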

Comments ICML'26

Abstract

As cyber threats continue to grow in scale and sophistication, blue team defenders increasingly require advanced tools to proactively detect and mitigate risks. Large Language Models (LLMs) offer promising capabilities for enhancing threat analysis. However, their effectiveness in real-world blue team threat-hunting scenarios remains insufficiently explored. This paper presents CyberTeam, a benchmark designed to guide LLMs in blue teaming practice. CyberTeam constructs a standardized workflow in two stages. First, it models realistic threat-hunting workflows by capturing the dependencies among analytical tasks from threat attribution to incident response. Next, each task is addressed through a set of operational modules tailored to its specific analytical requirements. This transforms threat hunting into a structured sequence of reasoning steps, with each step grounded in a discrete operation and ordered according to task-specific dependencies. Guided by this framework, LLMs are directed to perform threat-hunting tasks through modularized steps. Overall, CyberTeam integrates 30 tasks and 9 operational modules to guide LLMs through standardized threat analysis. We evaluate both leading LLMs and state-of-the-art cybersecurity agents, comparing CyberTeam against open-ended reasoning strategies. Our results highlight the improvements enabled by standardized design, while also revealing the limitations of open-ended reasoning in real-world threat hunting.

2509.21707 2026-05-29 stat.ML cs.LG stat.ME

SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning

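Jiawei Shan, Zhifeng Chen, Yiming Dong, Yazhen Wang, Jiwei Zhao

AI summary: Proposes a method that safely and adaptively aggregates multiple black-box predictions of uncertain quality; it is guaranteed to do no worse than using the labeled data alone and, when a perfect prediction exists, attains a faster convergence rate or the semiparametric efficiency bound.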

Abstract

Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions of uncertain quality for both inference and prediction tasks. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through small-scale simulations and two real-data analyses with distinct scientific goals. A user-friendly R package, sada, is provided to facilitate practical implementation.

2509.19318 2026-05-29 eess.SP cs.RO

Scensory: Real-Time Robotic Olfactory Perception for Joint Identification and Source Localization

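Yanbaihui Liu, Erica Babusci, Claudia K. Gunsch, Boyuan Chen

AI summary: Proposes Scensory, a learning-based robotic olfaction framework that uses neural networks to decode temporal dynamics in short time series from affordable, cross-sensitive VOC sensor arrays, simultaneously identifying fungal species (up to 89.85% accuracy) and localizing their source (up to 87.31% accuracy).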

Comments Our project website is at: http://generalroboticslab.com/Scensory

Abstract

While robotic perception has advanced rapidly in vision and touch, enabling robots to reason about indoor fungal contamination from weak, diffusion-dominated chemical signals remains an open challenge. We introduce Scensory, a learning-based robotic olfaction framework that simultaneously identifies fungal species and localizes their source from short time series measured by affordable, cross-sensitive VOC sensor arrays. Temporal VOC dynamics encode both chemical and spatial signatures, which we decode through neural networks trained on robot-automated data collection with spatial supervision. Across five fungal species, Scensory achieves up to 89.85% species accuracy and 87.31% source localization accuracy under ambient conditions with 3-7s sensor inputs. These results demonstrate real-time, spatially grounded perception from diffusion-dominated chemical signals, enabling scalable and low-cost source localization for robotic indoor environmental monitoring.

2509.05771 2026-05-29 stat.ML cs.LG math.OC

Risk-averse Fair Multi-class Classification

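Darinka Dentcheva, Xiangyu Tian

AI summary: Building on the theory of coherent risk measures and systemic risk, proposes a risk-averse multi-class classification framework suited to noisy, scarce, and unreliably labeled data. A system-theoretic approach with non-linear aggregation leads to a two-stage stochastic program solved by a regularized decomposition algorithm, and the framework also helps enforce fairness.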

Abstract

We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem), and the labeling might be unreliable. In the first part of our paper, we provide the foundation for the use of systemic risk models and show how to apply them in the context of linear and kernel-based multi-class problems. A more advanced formulation via a system-theoretic approach with non-linear aggregation is proposed, which leads to a two-stage stochastic programming problem. A risk-averse regularized decomposition method is designed to solve the problem. We use a popular multi-class method as a benchmark in the performance analysis of the proposed classification methods. We illustrate our ideas by proposing several generalizations of that method using coherent measures of risk. The viability of the proposed risk-averse methods is supported theoretically and numerically. Additionally, we demonstrate that the application of systemic risk measures facilitates enforcing fairness in classification. Analysis and experiments regarding the fairness of the proposed models are carefully conducted. For all methods, our numerical experiments demonstrate that they are robust in the presence of unreliable training data and perform better on unknown data than the methods minimizing expected classification errors. Furthermore, the performance improves when the number of classes increases.
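
To illustrate what replacing the expected classification error with a coherent risk measure can mean, the sketch below computes the empirical Conditional Value-at-Risk (CVaR) of per-sample losses. CVaR is one standard coherent risk measure; the abstract does not say which measures the authors use, so this choice and the helper name `empirical_cvar` are assumptions made for illustration.

```python
import math


def empirical_cvar(losses, alpha: float) -> float:
    """Empirical CVaR at level alpha in [0, 1) via the Rockafellar-Uryasev formula.

    CVaR_alpha = min_t { t + E[(loss - t)^+] / (1 - alpha) }; for an empirical
    sample the minimum is attained at the lower alpha-quantile of the losses.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses)
    n = len(ordered)
    k = max(math.ceil(alpha * n), 1)  # 1-based index of the alpha-quantile
    t = ordered[k - 1]
    excess = sum(max(x - t, 0.0) for x in ordered)
    return t + excess / ((1.0 - alpha) * n)
```

With `alpha = 0` this reduces to the ordinary mean loss; as `alpha` grows, the measure concentrates on the worst-classified samples, which is the sense in which such an objective is risk-averse.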

2508.15151 2026-05-29 eess.IV cs.CV

Zero-shot CT Super-Resolution using Diffusion-based 2D Projection Priors and Signed 3D Gaussians

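Jeonghyun Noh, Hyun-Jic Oh, Won-Ki Jeong

AI summary: Proposes a zero-shot 3D CT super-resolution framework that upsamples 2D projections with a diffusion model to obtain priors and reconstructs the high-resolution CT volume using signed 3D Gaussian splatting (NAB-GS), achieving superior performance at 4x super-resolution on public datasets.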

Comments MICCAI 2026 early accepted

Abstract

Computed tomography (CT) is important in clinical diagnosis, but acquiring high-resolution (HR) CT is constrained by radiation exposure risks. While deep learning-based super-resolution (SR) methods have shown promise for reconstructing HR CT from low-resolution (LR) inputs, supervised approaches require paired datasets that are often unavailable. Zero-shot methods address this limitation by operating on single LR inputs; however, they frequently fail to recover fine structural details due to limited LR information within individual volumes. To overcome these limitations, we propose a novel zero-shot 3D CT SR framework that integrates diffusion-based upsampled 2D projection priors into the 3D reconstruction process. Specifically, our framework consists of two stages: (1) LR CT projection SR, training a diffusion model on abundant X-ray data to upsample LR projections, thereby enhancing the scarce information inherent in the LR inputs. (2) 3D CT volume reconstruction, using 3D Gaussian splatting with our novel Negative Alpha Blending (NAB-GS), which models positive and negative Gaussian densities to learn signed residuals between diffusion-generated HR and upsampled LR projections. Our framework demonstrates superior quantitative and qualitative performance on two public datasets, and expert evaluations present the framework's clinical potential at 4x.

2507.21429 2026-05-29 stat.ML cs.LG

From Sublinear to Linear: Local Convergence in Finite-Width Networks via Locally Polyak-Lojasiewicz Regions

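Agnideep Aich, Ashit Baran Aich, Bruce Wade

AI summary: This paper studies local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss, showing that a local Polyak-Łojasiewicz inequality together with NTK positivity conditions yields linear convergence within a locally quasi-convex region.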

Abstract

We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but only gives a sublinear rate. We show that if the empirical Neural Tangent Kernel is positive at initialization, Lipschitz stable on the LQCR, and compatible with the LQCR radius, then the squared loss satisfies a local Polyak-Łojasiewicz inequality with constant $\mu = \lambda_0 - L_\Theta\, r(\mathcal{R}) > 0$. Combined with fixed-step iterate containment in the LQCR, imposed as a hypothesis in the linear-rate theorem, this yields linear convergence on the region. The LQCR supplies localization; fixed-step containment is imposed as a hypothesis in the linear-rate theorem; and the PL inequality comes from NTK conditioning under squared loss. The result is therefore a sufficient local condition, not a claim that this mechanism is necessary or unique for fast convergence. Empirically, we probe the theory through NTK spectral gap, parameter drift, empirical PL ratio, and suboptimality decay. On binary MNIST, the NTK remains positive, the PL ratio has a positive lower envelope, and the loss shows geometric decay on the stable regime. In a width ablation, the fixed-step width-$1024$ run leaves the local regime; reducing the step size lowers final drift from $1.870$ to $0.158$, restores the observed local-regime diagnostics, and yields the largest empirical PL-ratio lower envelope observed in the study. A CNN robustness check on a CIFAR-10 subset shows the PL-ratio envelope remains positive across three seeds, with a positive lower envelope across all three seeds on the stable regime.
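
The step from a PL inequality to a linear rate is the standard textbook argument, sketched below. The smoothness constant $\beta$, the step-size condition, and the factor $\tfrac{1}{2}$ in the PL inequality are assumptions of this sketch; the paper's exact hypotheses and normalization may differ.

```latex
% Assumptions: L is beta-smooth on the region R, the update is
% theta_{t+1} = theta_t - eta * grad L(theta_t) with 0 < eta <= 1/beta,
% and every iterate stays inside R.
\begin{align*}
\text{(PL)}\quad
  & \tfrac{1}{2}\,\|\nabla L(\theta)\|^2 \;\ge\; \mu\,\bigl(L(\theta) - L^*\bigr)
    \quad \text{for all } \theta \in \mathcal{R}, \\
\text{(descent)}\quad
  & L(\theta_{t+1}) \;\le\; L(\theta_t) - \eta\Bigl(1 - \tfrac{\beta\eta}{2}\Bigr)\|\nabla L(\theta_t)\|^2
    \;\le\; L(\theta_t) - \tfrac{\eta}{2}\,\|\nabla L(\theta_t)\|^2, \\
\text{(combine)}\quad
  & L(\theta_{t+1}) - L^* \;\le\; (1 - \eta\mu)\,\bigl(L(\theta_t) - L^*\bigr)
    \;\Longrightarrow\;
    L(\theta_t) - L^* \;\le\; (1 - \eta\mu)^t\,\bigl(L(\theta_0) - L^*\bigr).
\end{align*}
```

The containment hypothesis matters because the PL inequality is only assumed on $\mathcal{R}$: once an iterate leaves the region, the contraction factor $1 - \eta\mu$ is no longer guaranteed.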

2507.21114 2026-05-29 cs.IR cs.AI cs.CV

Page image classification for content-specific data processing

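Kateryna Lutsai

AI summary: Addressing the diverse content of historical document page images in humanities digitization projects and the difficulty of sorting them manually, this study develops and evaluates an AI- and machine-learning-based image classification system that automatically sorts pages by content category (such as text type, graphical elements, and layout) to support tailored downstream analysis pipelines.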

Comments Dataset licensing issues occurred

Abstract

Digitization projects in the humanities often generate vast quantities of page images from historical documents, presenting significant challenges for manual sorting and analysis. These archives contain diverse content, including various text types (handwritten, typed, printed), graphical elements (drawings, maps, photos), and layouts (plain text, tables, forms). Efficiently processing this heterogeneous data requires automated methods to categorize pages based on their content, enabling tailored downstream analysis pipelines. This project addresses this need by developing and evaluating an image classification system specifically designed for historical document pages, leveraging advancements in artificial intelligence and machine learning. The set of categories was chosen to facilitate content-specific processing workflows, separating pages requiring different analysis techniques (e.g., OCR for text, image analysis for graphics).

2506.04602 2026-05-29 cs.GT cs.LG

MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball

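Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li

AI summary: Proposes a Shapley-value-based MVP evaluation framework that ranks players through feature processing, win-loss model training, and contribution allocation, combined with causality-based optimization, and validates its effectiveness on an NBA dataset.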

Abstract

The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). Establishing an explainable and practical MVP evaluation method is very challenging. In our study, we specifically focus on play-by-play data, which records related events during the game, such as assists and points. We aim to address the challenges by introducing a new MVP evaluation framework, denoted as MVP-Shapley, which leverages Shapley values. This approach encompasses feature processing, win-loss model training, Shapley value allocation, and MVP ranking determination based on players' contributions. Additionally, we optimize our algorithm to align with expert voting results from the perspective of causality. Finally, we substantiated the efficacy of our method through validation using the NBA dataset and the Dunk City Dynasty dataset and implemented online deployment in the industry.
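
The Shapley value allocation step can be illustrated on a tiny example. The sketch below computes exact Shapley values by averaging marginal contributions over all player orderings; the `toy_win_prob` function is a made-up stand-in for a trained win-loss model, and neither it nor the helper name `shapley_values` comes from the paper, which works with features extracted from play-by-play data.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    value(frozenset_of_players) -> float is the coalition's worth.
    The cost grows factorially, so this is only practical for a few players.
    """
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def toy_win_prob(coalition):
    """Hypothetical worth: A adds 0.4, B adds 0.2, and A with B adds a 0.2 synergy."""
    worth = 0.0
    if "A" in coalition:
        worth += 0.4
    if "B" in coalition:
        worth += 0.2
    if "A" in coalition and "B" in coalition:
        worth += 0.2
    return worth
```

For this toy game, `shapley_values(["A", "B", "C"], toy_win_prob)` assigns A 0.5, B 0.3, and C 0.0 (up to floating-point rounding): the synergy is split evenly between A and B, and the values sum to the full team's worth of 0.8.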

2505.24503 2026-05-29 cs.GT cs.AI

Online Fair Division with Additional Information

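Tzeh Yuan Neoh, Jannik Peters, Nicholas Teh

AI summary: Studies online fair division of indivisible goods; by introducing normalization information and frequency predictions, it achieves stronger fairness guarantees than previously known and provides robust learning-augmented variants.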

Comments Appears in the 43rd International Conference on Machine Learning (ICML), 2026

Abstract

We study the problem of fairly allocating indivisible goods to agents in an online setting, where goods arrive sequentially and must be allocated irrevocably. Focusing on the popular fairness notions of envy-freeness, proportionality, and maximin share fairness (and their approximate variants), we investigate how access to future information changes what guarantees are achievable. Without any information, we prove strong impossibility results even for approximate fairness. With normalization information (agents' total values), we provide an algorithm that achieves stronger fairness guarantees than previously known results, and show matching impossibilities for stronger notions. With frequency predictions (value multisets without order), we design a meta-algorithm that lifts a broad class of offline "share-based" guarantees to the online setting, matching the best-known offline bounds. Finally, we provide learning-augmented variants of both models: under noisy totals or noisy frequency predictions, our guarantees are robust and degrade gracefully with the error parameters.
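
As a concrete instance of an approximate fairness notion, the sketch below checks envy-freeness up to one good (EF1) for a finished allocation. It assumes additive, non-negative valuations and assumes EF1 is among the approximate variants the abstract has in mind; the helper name `is_ef1` is hypothetical, and the code checks an allocation after the fact rather than implementing any of the paper's online algorithms.

```python
def is_ef1(valuations, allocation) -> bool:
    """Check envy-freeness up to one good under additive, non-negative valuations.

    valuations[i][g] is agent i's value for good g;
    allocation[i] is the collection of goods held by agent i.
    Agent i is satisfied with respect to j if removing i's favorite good
    from j's bundle eliminates i's envy.
    """
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j or not allocation[j]:
                continue  # an empty bundle cannot be envied
            other = sum(valuations[i][g] for g in allocation[j])
            best = max(valuations[i][g] for g in allocation[j])
            if own < other - best:
                return False
    return True
```

In the online setting each good must be placed irrevocably on arrival, so a check like this can only confirm the outcome; it cannot repair an early decision that later turns out to be unfair.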

2410.19371 2026-05-29 stat.ML cs.CR cs.LG

Noise-Aware Differentially Private Variational Inference

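Talal Alrawajfeh, Joonas Jälkö, Antti Honkela

AI summary: To address the problem that differential privacy makes downstream inference unreliable, proposes a noise-aware approximate Bayesian inference method based on stochastic gradient variational inference that applies to high-dimensional and non-conjugate models and improves the accuracy of the resulting posterior estimates.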

Comments 26 pages, 4 figures

2410.15236 2026-05-29 cs.CR cs.AI cs.LG

Jailbreaking and Mitigation of Vulnerabilities in Large Language Models

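Benji Peng, Hanxuan Chen, Keyu Chen, Qian Niu, Ziqian Bi, Ming Liu, Pohsun Feng, Tianyang Wang, Lawrence K. Q. Yan, Yizhu Wen, Yichao Zhang, Caitlyn Heqi Yin, Xinyuan Song, Riyang Bao, Jiacheng Shi

AI summary: This paper surveys the vulnerabilities of large language models to prompt injection and jailbreaking attacks, categorizes attack methods, evaluates defense strategies, and identifies research gaps and future directions.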

DOI: 10.63336/Eureka.47
Journal ref: Eureka 1(1) (2026) 26-61

Abstract

Large Language Models (LLMs) have transformed artificial intelligence by advancing natural language understanding and generation, enabling applications across fields beyond healthcare, software engineering, and conversational systems. Despite these advancements in the past few years, LLMs have shown considerable vulnerabilities, particularly to prompt injection and jailbreaking attacks. This review analyzes the state of research on these vulnerabilities and presents available defense strategies. We roughly categorize attack approaches into prompt-based, model-based, multimodal, and multilingual, covering techniques such as adversarial prompting, backdoor injections, and cross-modality exploits. We also review various defense mechanisms, including prompt filtering, transformation, alignment techniques, multi-agent defenses, and self-regulation, evaluating their strengths and shortcomings. We also discuss key metrics and benchmarks used to assess LLM safety and robustness, noting challenges like the quantification of attack success in interactive contexts and biases in existing datasets. Identifying current research gaps, we suggest future directions for resilient alignment strategies, advanced defenses against evolving attacks, automation of jailbreak detection, and consideration of ethical and societal impacts. This review emphasizes the need for continued research and cooperation within the AI community to enhance LLM security and ensure their safe deployment.

2410.10398 2026-05-29 cs.CE cs.AI

Are LLMs Socially Adaptive? Contrasting Belief Evolution in Large Language Models and Humans

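Yu Lei, Hao Liu, Chengxing Xie, Songjia Liu, Zhiyu Yin, Canyu Chen, Guohao Li, Philip Torr, Zhen Wu

AI summary: This study proposes FairMindSim, a simulation benchmark grounded in social psychology, and BREM, a Belief-Reward Alignment Behavior Evolution Model. Comparing the decision dynamics of humans and LLMs in continuous economic games, it finds that mid-capability models exhibit rigid aggression marked by over-punishment, whereas frontier models move toward human-like restraint and leniency as reasoning capability increases.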

Comments KDD 2026 Oral

Abstract

As large language models (LLMs) increasingly engage in complex social interactions, ensuring that their behaviors align with human ethical principles and intentions, known as value alignment, has become a critical scientific challenge. Existing benchmarks often rely on static assessments and fail to capture the longitudinal dynamics of decision-making or the latent cognitive processes driving agent behavior. In this work, we propose FairMindSim, a realistic simulation benchmark rooted in social psychology that evaluates alignment through continuous economic games. To move beyond black-box observations, we introduce the Belief-Reward Alignment Behavior Evolution Model (BREM), a probabilistic framework that formalizes decision-making as a dynamic trade-off between maximizing extrinsic rewards and upholding intrinsic beliefs. We conducted a large-scale comparative study involving 1,017 human participants and ten LLMs, including GPT-5 and Gemini-3-Pro. Our experimental results reveal a capability-linked nonlinear empirical trend in the Third-Party Punishment (TPP) game. Mid-capability models exhibit rigid, algorithmic aggression characterized by over-punishment, while frontier models show a convergence of restraint and a shift toward human-like leniency as reasoning capabilities scale. Furthermore, using BREM, we decompose agents' longitudinal decision dynamics and find that more advanced models better balance conflicting objectives by reducing belief-action inconsistency. Our contributions provide a standardized protocol for psychological stress testing and an interpretable mechanism for analyzing the longitudinal evolution of AI alignment in controlled social-dilemma settings.

2404.10706 2026-05-29 cs.CY cs.CL cs.HC cs.SI

Cross-Language Evolution of Divergent Collective Memory Around the Arab Spring

阿拉伯之春的跨语言分歧性集体记忆演化

H. Laurie Jones, Brian C. Keegan

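AI summary: Analyzes archived content of Arab Spring-related articles on the Arabic and English Wikipedias from 2011 to 2024, defines multilingual measures of event salience, deliberation, contextualization, and consolidation of collective memory, and reveals how cross-language content similarity evolves over time.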

DOI: 10.1609/icwsm.v20i1.42699

AI中文摘要

阿拉伯之春是始于2011年的一系列历史性抗议活动，这些抗议推翻了多国政府并导致了重大冲突。对于此类事件的集体记忆可能因政治、文化和语言因素而在不同社会语境中存在显著差异。尽管维基百科在记录历史及当前事件方面发挥着重要作用，但关于维基百科文章在重大事件发生后如何持续演化数年或数十年的问题却鲜有关注。利用2011年至2024年间阿拉伯语和英语维基百科中阿拉伯之春相关主题的存档内容，我们定义并评估了围绕阿拉伯之春的事件显著性、商议、语境化和集体记忆巩固的多语言度量。我们关于维基百科文章跨语言内容相似性时间演化的发现，对于在线集体记忆过程的理论构建以及基于这些数据训练的语言模型的评估具有启示意义。

英文摘要

The Arab Spring was a historic set of protests beginning in 2011 that toppled governments and led to major conflicts. Collective memories of events like these can vary significantly across social contexts in response to political, cultural, and linguistic factors. While Wikipedia plays an important role in documenting both historic and current events, little attention has been given to how Wikipedia articles, created in the aftermath of major events, continue to evolve over years or decades. Using the archived content of Arab Spring-related topics across the Arabic and English Wikipedias between 2011 and 2024, we define and evaluate multilingual measures of event salience, deliberation, contextualization, and consolidation of collective memory surrounding the Arab Spring. Our findings about the temporal evolution of the Wikipedia articles' content similarity across languages have implications for theorizing about online collective memory processes and evaluating linguistic models trained on these data.

2308.13222 2026-05-29 physics.comp-ph cs.LG physics.flu-dyn stat.ML

Bayesian Reasoning for Physics Informed Neural Networks

物理信息神经网络的贝叶斯推理

Krzysztof M. Graczyk, Kornel Witkowski

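AI summary: Proposes an evidence-driven Bayesian formulation of physics-informed neural networks that uses a Laplace approximation to compute the model evidence efficiently, automatically optimizes the loss weights between PDE residuals, boundary conditions, and observational data, and is validated for solution accuracy and uncertainty quantification on the heat, wave, and Burgers' equations.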

Comments 21 pages, 12 figures, re-edit the description of the Bayesian framework, some of the content moved to Appendix. Discussion of numerical performance added, as well as related approaches

DOI: 10.1103/29bd-jfhz
Journal ref: Phys. Rev. E 113, 055307 (2026)

AI中文摘要

我们引入了一种基于证据驱动的贝叶斯物理信息神经网络公式，能够自动优化偏微分方程残差、边界条件和观测数据之间的损失权重。与现有基于采样或变分推理的贝叶斯PINN方法不同，所提方法使用拉普拉斯近似解析计算模型证据，从而无需后验采样即可实现高效的超参数调优和模型比较。我们在热方程、波动方程和伯格斯方程上演示了该方法，获得了与精确解或参考解一致的结果。在伯格斯方程示例中，我们进一步展示了该框架自然地整合了控制方程和含噪声测量中的信息，在统一的贝叶斯框架内提供了预测不确定性。

英文摘要

We introduce an evidence-driven Bayesian formulation of physics-informed neural networks that enables automatic optimization of loss weights between PDE residuals, boundary conditions, and observational data. Unlike existing Bayesian PINN approaches based on sampling or variational inference, the proposed method uses a Laplace approximation to compute model evidence analytically, enabling efficient hyperparameter tuning and model comparison without posterior sampling. We demonstrate the method on the heat, wave, and Burgers' equations, obtaining solutions in agreement with exact or reference results. In the Burgers' equation example, we further show that the framework naturally integrates information from governing equations and noisy measurements, providing predictive uncertainties within a unified Bayesian setting.
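
For orientation, the textbook Laplace approximation to the model evidence can be written as below. This is the standard form and not necessarily the exact parametrization used in the paper. The symbols are notation assumed for this sketch: $\mathbf{w}$ are the network weights, $\mathbf{w}^*$ is the posterior mode, $\boldsymbol{\alpha}$ collects the hyperparameters (here, the loss weights), $W$ is the number of weights, and $\mathbf{A}$ is the Hessian of the negative log posterior at the mode.

```latex
\ln p(\mathcal{D} \mid \boldsymbol{\alpha})
\approx \ln p(\mathcal{D} \mid \mathbf{w}^*, \boldsymbol{\alpha})
+ \ln p(\mathbf{w}^* \mid \boldsymbol{\alpha})
+ \frac{W}{2} \ln 2\pi
- \frac{1}{2} \ln \det \mathbf{A},
\qquad
\mathbf{A} = -\nabla_{\mathbf{w}} \nabla_{\mathbf{w}}
\ln \big[ p(\mathcal{D} \mid \mathbf{w}, \boldsymbol{\alpha}) \,
p(\mathbf{w} \mid \boldsymbol{\alpha}) \big] \Big|_{\mathbf{w} = \mathbf{w}^*}
```

Because the right-hand side needs only the mode and the Hessian evaluated there, candidate values of $\boldsymbol{\alpha}$ can be compared by maximizing this expression, with no posterior sampling.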

2605.30354 2026-05-29 hep-th

Quiver Approach to Symmetry Theories

对称性理论的箭图方法

Vivek Chakrabhavi, Mirjam Cvetič, Jonathan J. Heckman, Shani Meynet

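AI summary: Proposes an algebraic method based on quiver path algebras for extracting the global symmetry anomaly data of 5D superconformal field theories from Calabi-Yau cones in M-theory backgrounds, applicable where the geometric computation is unknown or combinatorially complex.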

Comments 55 pages + appendices, 12 figures

2605.30340 2026-05-29 gr-qc

Carr criterion and mass gaps in non-singular primordial black hole formation

非奇异原初黑洞形成中的Carr判据与质量间隙

Jens Boos, Arif Kağan Gündoğdu, Marek Hartenfels

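AI summary: By deriving effective Friedmann equations with a gravitational regulator $\ell$, the paper finds that the regulator induces a primordial black hole mass gap and gives a modified form of the Carr criterion, directly linking the primordial black hole abundance to the gravitational regulator.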

Comments 8 pages, 3 figures, comments welcome!

AI中文摘要

非奇异引力理论预计在早期宇宙中具有相关性。本文推导了一组有效弗里德曼方程，描述了在存在引力调节子ℓ的情况下物质壳层的动力学。我们发现这样的调节子会诱导一个原初黑洞质量间隙，使得低于某个质量$M_\text{gap}(\ell, R_H)$时无法形成黑洞。该质量间隙的数量级由调节子$\sim c^2\ell/G$设定，并依赖于形成时的视界半径$R_H$的次主导项。最后，我们证明在一系列广泛的状态方程参数$\omega=0\dots 1/3$范围内，质量间隙暗示了形如$\delta_H > 2G M_\text{gap}/R_H - 1$的Carr判据。如果视界大小与调节子同量级，即$R_H \sim \ell$，则这一新判据比传统的原初黑洞形成Carr判据更强。这直接将原初黑洞丰度与引力调节子的存在联系起来。

英文摘要

Non-singular gravitational theories are expected to be relevant in the early universe. In this paper, we derive a set of effective Friedmann equations describing the dynamics of matter shells in the presence of a gravitational regulator $\ell$. We find that such a regulator induces a primordial black hole mass gap such that below a certain mass $M_\text{gap}(\ell, R_H)$ no black holes can form. The order of magnitude of this mass gap is set by the regulator $\sim c^2\ell/G$, with subleading dependence on the horizon radius at the time of formation $R_H$. Finally, we show that over a wide range of equation of state parameters $\omega = 0 \dots 1/3$, the mass gap implies a Carr criterion of the form $\delta_H > 2G M_\text{gap}/R_H - 1$. If the horizon size is of the same order as the regulator, $R_H \sim \ell$, this new criterion is stronger than the traditional Carr criterion for primordial black hole formation. This connects the primordial black hole abundance directly to the presence of gravitational regulators.

2605.30321 2026-05-29 math.PR math.ST stat.TH

A Bayesian Proof and Interpretation of Talagrand's Majorizing Measure Theorem

Talagrand 优势测度定理的贝叶斯证明与解释

Ilias Zadik

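AI summary: Using a Bayesian approach based on two area identities for the Gaussian additive model, the paper compares the maximum likelihood estimator with the Bayes-optimal estimator and gives a concise proof of the lower bound in Talagrand's majorizing measure theorem.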

2605.30316 2026-05-29 cond-mat.mes-hall cond-mat.supr-con

Visualizing orbital magnetism in electron doped rhombohedral multilayer graphene

电子掺杂菱面体多层石墨烯中的轨道磁性可视化

Owen I. Sheekey, Trevor B. Arp, Benjamin A. Foutty, Ruoxi Zhang, Tixuan Tan, Ludwig F. W. Holleis, Yi Guo, Sandesh S. Kalantre, Canxun Zhang, Mark Zakharyan, David Gong, Aidan Keough, Youngjoon Choi, Ysun Choi, Siyuan Xu, Tian Xie, Ben Hodder Alexander, Marisa Hocking, Qingrui Cao, Martin E. Huber, Takashi Taniguchi, Kenji Watanabe, Chenhao Jin, Etienne Lantagne-Hurtubise, Aaron Sharpe, Trithep Devakul, Andrea F. Young

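AI summary: Uses nanoSQUID magnetometry to measure the orbital magnetization of electron-doped rhombohedral multilayer graphene, revealing a finite-momentum 'ring of fire' Berry curvature distribution in the quarter metal phase and directly showing that the superconducting state carries an orbital magnetic moment, confirming its chiral nature.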

AI中文摘要

在高位移场下，电子掺杂的菱面体多层石墨烯具有异常平坦的带底和近乎理想的量子几何。该区域的实验观察到“四分之一金属”的形成，其中电子液体凝聚成单一的自旋和谷味。值得注意的是，最近的实验在密度和位移场调谐的参数空间的同一区域发现了零电阻态，归因于由有限动量库珀对凝聚体表征的手性超导体的形成。在这里，我们使用纳米SQUID-on-tip磁强计绘制了厚度在3到13层之间的电子掺杂菱面体石墨烯器件的轨道磁化。四分之一金属相内的磁化在有限密度处达到峰值，与贝里曲率集中在有限动量“火环”中的现象一致。在四层样品中，关联输运和局部磁测量数据表明超导态具有有限的轨道磁矩，直接证明了其手性性质。我们进一步表明，在金属态中广泛观察到的电阻随机切换源于密度调谐的谷分辨总磁矩的符号变化。这导致在典型栅极控制序列下形成亚稳态磁畴，并且还可用于在整个器件中实现电场控制的轨道矩切换。出乎意料的是，我们发现手性超导体表观正常态特有的磁不均匀性，暗示了应变调谐的磁性和非磁性基态之间的竞争。我们的结果指出了在窄层数范围内观察到手性超导性背后的微妙能量竞争。

英文摘要

Electron-doped rhombohedral multilayer graphene at high displacement field features an exceptionally flat band minimum with near-ideal quantum geometry. Experiments in this regime observe the formation of a 'quarter metal,' in which the electron liquid condenses into a single spin and valley flavor. Remarkably, recent experiments have found a zero-resistance state in the same region of the density- and displacement-field-tuned parameter space, attributed to the formation of a chiral superconductor characterized by a finite-momentum Cooper pair condensate. Here, we use nanoSQUID-on-tip magnetometry to map the orbital magnetization of electron-doped rhombohedral graphene devices ranging in thickness between 3 and 13 layers. Magnetization within the quarter metal phases peaks at finite density, consistent with concentration of the Berry curvature in a finite-momentum 'ring of fire'. Correlating transport and local magnetometry data in a tetralayer sample reveals that the superconducting state has a finite orbital magnetic moment, providing direct evidence of its chiral nature. We further show that widely observed stochastic switching of the resistivity in the metallic regime arises from a density-tuned sign change in the valley-resolved total magnetic moment. This leads to the formation of metastable magnetic domains under typical gate control sequences and can also be harnessed for electric-field-controlled switching of the orbital moment across the entire device. Unexpectedly, we find magnetic inhomogeneity specific to the apparent normal state of the chiral superconductor, suggestive of a strain-tuned competition between magnetic and non-magnetic ground states. Our results point to a subtle energetic competition underlying the observation of chiral superconductivity in a narrow range of layer numbers.

2605.30314 2026-05-29 cs.MA

SpecBench: Evaluating Specification-Level Reasoning for Software Engineering LLM Agents

SpecBench: 评估软件工程LLM代理的规约级推理能力

Grant Hamblin, Kevin Song, Zhanda Zhu, Anand Jayarajan, Sihang Liu, Nandita Vijaykumar, Gennady Pekhimenko

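AI summary: Introduces the SpecBench benchmark, which derives tasks from RFC processes to evaluate the specification-level reasoning of LLM agents: their ability, without execution feedback, to identify omissions, ambiguities, inconsistencies, or incorrect assumptions in initial design proposals.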

AI中文摘要

软件工程（SWE）代理正从代码生成过渡到全软件开发生命周期自动化。该生命周期中的一个关键阶段是规约设计：通过专家评审将初始提案转化为经过仔细考虑的需求。现有基准如SWE-Bench侧重于实现，通过衡量代理在给定固定、精确的设计需求下生成代码的能力。这种表述假设规约是正确且完整的。在现实世界中复杂且关键的软件系统中，初始规约往往不完整且有缺陷，需要经过广泛的专家评审和修订才能被接受用于实现。为填补这一空白，我们引入SpecBench来评估规约级推理：生成完整、无歧义、一致且正确的系统规约的能力。SpecBench任务源自成熟开源项目使用的请求评论（RFC）过程。对于每个任务，代理获得初始设计提案、项目代码库以及所有过去的项目RFC讨论。代理的任务是识别规约缺陷：初始提案中的遗漏、歧义、不一致或错误假设。我们根据历史RFC评审中专家维护者提出的批评来评估预测。SpecBench包含来自5个不同仓库的任务：Kubernetes、React、Rust、TVM和vLLM。我们在SpecBench上评估了最先进的SWE代理，分析了它们在无执行反馈下推理系统设计的能力。表现最佳的代理GPT-5.4达到了44.4%的准确率。

英文摘要

Software engineering (SWE) agents are transitioning from code generation to full software development lifecycle automation. A critical phase in this lifecycle is specification design: transforming initial proposals into carefully considered requirements through expert review. Existing benchmarks such as SWE-Bench are implementation-focused: they measure the agent's ability to generate code given fixed, precise design requirements. This formulation assumes specifications are correct and complete. In complex and critical real-world software systems, initial specifications are often incomplete and flawed, requiring extensive expert review and revision before being accepted for implementation. To fill this gap, we introduce SpecBench to evaluate specification-level reasoning: the ability to generate complete, unambiguous, consistent, and correct system specifications. SpecBench tasks are derived from the Request for Comments (RFC) process used by mature open-source projects. For each task, an agent is given an initial design proposal, the project codebase, and all past project RFC discussions. The agent is tasked with identifying specification deficiencies: omissions, ambiguities, inconsistencies, or incorrect assumptions in the initial proposal. We evaluate predictions against critiques raised by expert maintainers during historical RFC reviews. SpecBench contains tasks from 5 diverse repositories: Kubernetes, React, Rust, TVM, and vLLM. We evaluate state-of-the-art SWE agents on SpecBench, analyzing their capacity to reason about system design without execution feedback. The best-performing agent, GPT-5.4, achieves 44.4% accuracy.

2605.30312 2026-05-29 cs.CR

DP-SAPF: Saliency-Aware Parameter Fine-tuning of Public Models for Differentially Private Image Synthesis

DP-SAPF: 面向差分隐私图像合成的公共模型显著性感知参数微调

Chen Gong, Kecen Li, Zinan Lin, Tianhao Wang

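AI summary: Proposes DP-SAPF, which uses a saliency-aware strategy to select the most critical parameters of a public model for Low-Rank Adaptation (LoRA) fine-tuning, reducing noise accumulation in differentially private image synthesis and improving the utility and fidelity of the synthetic images.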

Comments Accepted at Usenix Security 2026; code available at https://github.com/2019ChenGong/DP-SAPF

AI中文摘要

差分隐私（DP）图像合成生成保留敏感数据集统计特征的图像，使得在提供严格隐私泄露保证的同时，能够进行敏感数据分析和使用。现有方法使用差分隐私随机梯度下降（DP-SGD）在敏感图像上微调公共模型以生成合成图像。但在敏感图像上完全微调公共模型计算成本高昂，因为当前公共模型通常包含大量参数。最近的工作启发式地在公共模型的所有注意力层参数上使用低秩适应（LoRA）以减少可训练参数数量。然而，我们认为在DP设置中，对所有注意力层参数进行穷举LoRA覆盖是次优的，因为它会导致训练过程中的噪声累积和崩溃。为解决此问题，我们提出DP-SAPF，它使用显著性感知策略来识别在DP下进行LoRA训练的特定目标参数。DP-SAPF的灵感来源于较大的梯度表示较高的显著性，表明这些参数对DP学习最为关键。具体来说，我们将敏感图像输入公共模型，计算梯度，并向梯度添加噪声以满足DP。然后，DP-SAPF识别最显著的参数，即在敏感图像上表现出高梯度幅度的参数，用于DP微调。在四个敏感图像数据集上的实验表明，与没有参数选择的微调方法相比，DP-SAPF提高了合成图像的效用和保真度，同时需要更少的计算资源。

英文摘要

Differentially private (DP) image synthesis generates images that preserve the statistical characteristics of a sensitive dataset, enabling sensitive data analysis and usage while providing rigorous guarantees against privacy leakage. Existing methods fine-tune public models using DP Stochastic Gradient Descent (DP-SGD) on sensitive images to generate synthetic images. However, fully fine-tuning public models on sensitive images is computationally expensive, because current public models typically contain a large number of parameters. Recent work heuristically applies Low-Rank Adaptation (LoRA) to all attention-layer parameters of public models to reduce the number of trainable parameters. We argue that exhaustive LoRA coverage across all attention-layer parameters is suboptimal in a DP setting, as it leads to noise accumulation and collapse during private training. To address this issue, we propose DP-SAPF, which uses a saliency-aware strategy to identify specific target parameters for LoRA training under DP. DP-SAPF is inspired by the observation that larger gradients signify higher saliency, indicating that these parameters are most critical for DP learning. Specifically, we feed the sensitive images into public models, compute gradients, and add noise to the gradients to satisfy DP. Then, DP-SAPF identifies the most salient parameters, those exhibiting high gradient magnitudes on sensitive images, for DP fine-tuning. Experiments on four sensitive image datasets show that DP-SAPF improves the utility and fidelity of synthetic images while requiring fewer computational resources than fine-tuning methods without parameter selection.
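
The selection step described above (compute gradients on sensitive images, add noise, keep the parameters with the largest gradient magnitudes) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation. It clips per-example gradients and adds Gaussian noise in the usual DP-SGD style, scores whole layers by mean absolute noisy gradient, and performs no privacy accounting. The layer names, the `Grad` type, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List

Grad = Dict[str, List[float]]  # hypothetical: layer name -> flattened gradient values


def clip_gradient(grad: Grad, max_norm: float) -> Grad:
    """Rescale one example's gradient so its global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for values in grad.values() for g in values))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return {name: [g * scale for g in values] for name, values in grad.items()}


def noisy_mean_gradient(per_example: List[Grad], max_norm: float,
                        noise_multiplier: float, rng: random.Random) -> Grad:
    """Clip each example's gradient, sum them, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example]
    sigma = noise_multiplier * max_norm  # noise scale tied to the clipping bound
    n = len(per_example)
    noisy: Grad = {}
    for name, values in per_example[0].items():
        noisy[name] = [
            (sum(c[name][i] for c in clipped) + rng.gauss(0.0, sigma)) / n
            for i in range(len(values))
        ]
    return noisy


def select_salient_layers(noisy_grad: Grad, k: int) -> List[str]:
    """Return the k layers with the largest mean absolute noisy gradient."""
    score = {name: sum(abs(g) for g in values) / len(values)
             for name, values in noisy_grad.items()}
    return sorted(score, key=lambda name: score[name], reverse=True)[:k]
```

In a full pipeline, only the selected layers would then receive LoRA adapters for private fine-tuning. How the privacy budget is split between this selection step and training is a design choice the sketch does not cover.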

2605.30308 2026-05-29 cs.DB

Zero-Scan Data Quality: Leveraging Table Format Metadata for Continuous Observability at Scale

零扫描数据质量：利用表格式元数据实现大规模持续可观测性

Mohit Verma, Shantanu Rawat, Christian Bush, Sumedh Sakdeo, Lokesh Amarnath Ravindranathan, Dwarak Bakshi

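AI summary: Proposes a metadata-first approach that uses statistics computed at write time by modern table formats such as Apache Iceberg (commit timestamps, record counts, null counts, and value bounds) for continuous data quality monitoring without scanning data. Deployed at LinkedIn across 200,000+ Iceberg tables (800+ PB), it satisfies approximately 60% of user-defined data quality rules at zero marginal compute cost and reduces profiling resource consumption by around 50%.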

Comments To appear in the 1st International Workshop on Data FORMATS for Modern Architectures and Workloads (FORMATS '26), Bengaluru, India, May 2026

DOI: 10.1145/3802514.3812601

AI中文摘要

现代表格式如Apache Iceberg在写入时计算并存储元数据提交时间戳、记录计数以及列级统计信息（如空值计数和值边界），作为文件写入的一部分。这些统计信息服务于查询规划，但与数据质量监控需求高度重叠。我们描述了一种元数据优先方法，将写入时统计信息重新用于持续数据质量可观测性：异常检测、漂移监控、空值率跟踪，无需扫描任何数据。在LinkedIn的20万+ Iceberg表（800+ PB）上部署，该方法以零边际计算成本满足约60%的用户定义数据质量规则，并将分析资源消耗降低约50%。通过轻量级计数器（求和、零值计数、布尔计数）和增量可合并草图（用于不同计数的Theta草图、用于分位数的KLL草图）扩展清单统计信息，可进一步提高元数据可满足的覆盖率，接近生产数据质量规则的90%。我们在生产数据上验证了草图的准确性、可合并性和存储开销，并提出表格式应将每个文件的草图存储在Puffin侧边文件中，遵循与现有清单统计信息相同的存储后聚合模式。

英文摘要

Modern table formats such as Apache Iceberg compute and store metadata (commit timestamps, record counts, and column-level statistics such as null counts and value bounds) at write time as part of file writing. These statistics serve query planning, yet they overlap substantially with data quality (DQ) monitoring needs. We describe a metadata-first approach that repurposes write-time statistics for continuous DQ observability (anomaly detection, drift monitoring, null-rate tracking) without scanning any data. Deployed at LinkedIn across 200,000+ Iceberg tables (800+ PB), this approach satisfies approximately 60% of user-defined DQ rules at zero marginal compute cost and reduces profiling resource consumption by around 50%. Extending manifest statistics with lightweight counters (sum, zero-value counts, boolean counts) and incrementally mergeable sketches (Theta sketches for distinct counts, KLL sketches for quantiles) can further raise metadata-satisfiable coverage to close to 90% of production DQ rules. We validate sketch accuracy, mergeability, and storage overhead on production data and propose that table formats should store per-file sketches in Puffin sidecar files, following the same store-then-aggregate pattern used for existing manifest statistics.
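
To make the store-then-aggregate idea concrete, the sketch below merges per-file statistics into table-level statistics and evaluates two simple rules (a null-rate ceiling and a value-range check) without touching any rows. It is a minimal illustration under stated assumptions: `FileStats`, `aggregate`, and `check_rules` are hypothetical names, the rule set is invented for the example, and nothing here uses the actual Iceberg API or metadata layout.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for one column, recorded at write time."""
    record_count: int
    null_count: int
    lower_bound: float
    upper_bound: float


def aggregate(files: Sequence[FileStats]) -> FileStats:
    """Merge per-file statistics into table-level statistics without reading data."""
    if not files:
        raise ValueError("no file statistics to aggregate")
    return FileStats(
        record_count=sum(f.record_count for f in files),
        null_count=sum(f.null_count for f in files),
        lower_bound=min(f.lower_bound for f in files),
        upper_bound=max(f.upper_bound for f in files),
    )


def check_rules(stats: FileStats, max_null_rate: float,
                allowed_min: float, allowed_max: float) -> List[str]:
    """Evaluate simple data quality rules from metadata only; return violated rule names."""
    violations = []
    null_rate = stats.null_count / stats.record_count if stats.record_count else 0.0
    if null_rate > max_null_rate:
        violations.append("null_rate")
    if stats.lower_bound < allowed_min or stats.upper_bound > allowed_max:
        violations.append("value_range")
    return violations
```

Counts add and bounds combine with `min` and `max`, so these statistics merge exactly across files. Distinct counts and quantiles do not merge this way, which is presumably why the abstract turns to mergeable sketches for them.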

2605.30306 2026-05-29 cs.DM

On abelian periodicity of purely morphic words

纯形态词的阿贝尔周期性

Arina Filimonova, Svetlana Puzynina

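AI summary: Studies abelian periodicity of infinite words generated by morphisms, gives a characterization of the binary morphisms that generate abelian periodic words, and provides an upper bound for purely abelian periodic words that makes the characterization algorithmic.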

AI中文摘要

由形态生成的无限词的周期性判定是组合数学中一个经典结果，由Harju、Linna和Pansiot在80年代提出。本文关注该问题的阿贝尔版本。两个词称为阿贝尔等价，如果它们包含每个字母的出现次数相同。无限词$s$称为最终阿贝尔周期，如果它可以分解为$s=uv_1v_2v_3\cdots$，其中$v_i$是阿贝尔等价的词。如果$u$为空，则$s$称为纯阿贝尔周期。我们给出了生成阿贝尔周期词的二元形态的如下刻画：由二元形态$f$生成的词是阿贝尔周期的当且仅当要么它是周期的，要么存在整数$K$和词$u$、$v$、$u'$、$v'$使得$f^K(a) = uv$，$f^K(b) = u'v'$，$u\sim_{ab} u'$，并且$vu$和$v'u'$是阿贝尔周期且具有阿贝尔等价的周期。对于纯阿贝尔周期词的情况，我们还给出了$K$的上界，使得所得到的刻画是算法化的。

英文摘要

The decidability of periodicity of infinite words generated by morphisms is a classical result in combinatorics on words, due to Harju, Linna and Pansiot in the 1980s. In this paper, we are interested in this question in the abelian setting. Two words are called abelian equivalent if they contain the same number of occurrences of each letter. An infinite word $s$ is called ultimately abelian periodic if it can be factorized as $s=uv_1v_2v_3\cdots$, where the $v_i$ are abelian equivalent words. If $u$ is empty, then $s$ is called purely abelian periodic. We provide the following characterization of binary morphisms generating abelian periodic words: a word generated by a binary morphism $f$ is abelian periodic if and only if either it is periodic or there exist an integer $K$ and words $u$, $v$, $u'$, $v'$ such that $f^K(a) = uv$, $f^K(b) = u'v'$, $u\sim_{ab} u'$, and $vu$ and $v'u'$ are abelian periodic with abelian equivalent periods. For the case of purely abelian periodic words, we also provide an upper bound on $K$, which makes the obtained characterization algorithmic.
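
The definitions above are easy to experiment with on finite prefixes. The sketch below tests abelian equivalence, iterates a morphism, and checks whether the complete blocks of a prefix are pairwise abelian equivalent. The function names are hypothetical. A finite check like this can refute abelian periodicity for a given preperiod and block length but cannot prove it for the infinite word, and it does not implement the paper's characterization.

```python
from collections import Counter
from typing import Dict


def abelian_equivalent(x: str, y: str) -> bool:
    """True if x and y contain the same number of occurrences of each letter."""
    return Counter(x) == Counter(y)


def iterate_morphism(rules: Dict[str, str], seed: str, steps: int) -> str:
    """Apply the morphism given by `rules` to `seed`, `steps` times."""
    word = seed
    for _ in range(steps):
        word = "".join(rules[letter] for letter in word)
    return word


def blocks_abelian_equivalent(word: str, preperiod: int, block: int) -> bool:
    """Check that the complete length-`block` blocks of word[preperiod:]
    are pairwise abelian equivalent (a trailing partial block is ignored)."""
    tail = word[preperiod:]
    blocks = [tail[i:i + block] for i in range(0, len(tail) - block + 1, block)]
    return all(abelian_equivalent(blocks[0], b) for b in blocks[1:])


# Thue-Morse morphism: every length-2 block of the fixed point is "ab" or "ba".
thue_morse = iterate_morphism({"a": "ab", "b": "ba"}, "a", 5)
# Fibonacci morphism: the prefix starts "ab", "aa", so length-2 blocks already differ.
fibonacci = iterate_morphism({"a": "ab", "b": "a"}, "a", 6)
```

Abelian equivalent words necessarily have the same length, which is why the check can use a single fixed block length.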

2605.30305 2026-05-29 astro-ph.CO

Augmented Correlation Functions for Spectroscopic Galaxy Surveys

光谱星系巡天的增强相关函数

Davide Bianchi

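AI summary: Proposes the augmented correlation function framework, which extends the standard two-point correlation function with latent dimensions obtained from transformations of the galaxy field to extract more cosmological information from spectroscopic galaxy surveys, and demonstrates improved constraints on cosmological parameters using the Quijote simulations.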

Comments 28 pages, 8 figures

AI中文摘要

星系红移巡天编码了由非线性引力演化、星系偏差和红移空间畸变产生的丰富信息，其中只有部分可通过标准两点统计量获取。受对灵活且计算高效的替代方法的需求驱动，我们引入了增强相关函数，这是一个通用框架，其中星系场的任意变换定义了额外的“潜在”维度，扩展了标准两点相关函数，并分离了常规分析中平均掉的聚类特性。作为概念验证，我们研究了一个由星系密度场的逆拉普拉斯算子的成对梯度构建的潜在变量，表明所得统计量自然地区分了与内落和流出对相关的聚类状态。基于$νΛ\mathrm{CDM}$宇宙学中Quijote模拟的$z=1$暗物质晕目录的Fisher预测，我们发现增强相关函数系统地给出了所有考虑的宇宙学参数更紧的约束。尽管鉴于分析的探索性质以及Fisher预测和模拟的局限性，这些改进应被视为指示性的，但我们的结果展示了增强相关函数作为从光谱星系巡天中提取额外信息的灵活框架的潜力。

英文摘要

Galaxy redshift surveys encode a wealth of information generated by nonlinear gravitational evolution, galaxy bias, and redshift-space distortions, only part of which is accessible through standard two-point statistics. Motivated by the need for flexible and computationally efficient alternatives, we introduce the augmented correlation function, a general framework in which an arbitrary transformation of the galaxy field defines additional "latent" dimensions that extend the standard two-point correlation function and isolate clustering properties averaged out in conventional analyses. As a proof of concept, we study a latent variable constructed from the pairwise gradient of the inverse Laplacian of the galaxy density field, showing that the resulting statistics naturally distinguish clustering regimes associated with infalling and outflowing pairs. Using Fisher forecasts based on $z=1$ halo catalogues from the Quijote simulations within $\nu\Lambda\mathrm{CDM}$ cosmology, we find that the augmented correlation systematically yields tighter constraints on all cosmological parameters considered. Although these improvements should be regarded as indicative given the exploratory nature of the analysis and the limitations of Fisher forecasts and simulations, our results demonstrate the potential of augmented correlations as a flexible framework for extracting additional information from spectroscopic galaxy surveys.

2605.30304 2026-05-29 quant-ph physics.optics

Analytical model for structured light propagation through a turbulent atmosphere

结构光在大气湍流中传播的解析模型

Konstantin Kravtsov

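AI summary: Based on the split-step approach and a mode-based optical field representation, proposes an analytical model of how turbulence-induced phase fluctuations redistribute optical power among spatial modes, and gives a simple solution in the form of a matrix exponential.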

Comments 12 pages, 3 figures

AI中文摘要

我们开发了一个简单的解析框架，用于空间光模式在湍流大气中的传播。该框架基于分步法和基于模式的光场表示，直接评估湍流引起的相位波动如何消耗原始模式中的光功率并将其重新分配到相邻的空间模式中。重要的是，在均匀信道中，这种功率转移随传播距离线性变化，从而以矩阵指数的形式为任意距离提供了简单的解。转移速率由湍流谱与一对相互作用空间模式的接受谱之间的空间光谱重叠决定。该模型预测每个空间模式中的平均功率，并且在单个模式强烈主导所有其他模式时是精确的。我们的预测与中等到强湍流水平的模拟结果显示出相当好的一致性。该模型还证实了先前作为经验观察已知的随模式阶数的标度关系。

英文摘要

We develop a straightforward analytical framework for the propagation of spatial light modes through a turbulent atmosphere. Built upon the split-step approach with the mode-based optical field representation, it directly assesses how turbulence-induced phase fluctuations deplete the optical power in the original mode and re-distribute it into neighboring spatial modes. Importantly, this power transfer scales linearly with the propagation distance in a uniform channel, yielding a simple solution for arbitrary distances in the form of a matrix exponential. The transfer rate is determined by the spatial spectral overlap between the turbulence spectrum and the acceptance spectrum for a pair of interacting spatial modes. The model predicts the average power in each spatial mode and is exact when a single mode strongly dominates all others. Our predictions show reasonably good agreement with simulations up to medium-to-strong turbulence levels. The model also confirms the scalings with mode order previously known as empirical observations.
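
One way to read the matrix-exponential statement is as a linear rate equation for the vector of average modal powers, $dP/dz = R\,P$, whose solution in a uniform channel is $P(z) = \exp(Rz)\,P(0)$. The sketch below evaluates that solution for a toy case. The rate matrix `R`, its numerical values, and the symmetric, lossless two-mode setup are assumptions made for illustration and are not taken from the paper. The matrix exponential is computed with a plain truncated Taylor series, which is adequate only for small, well-scaled matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def expm(m: Matrix, terms: int = 40) -> Matrix:
    """Matrix exponential by truncated Taylor series: sum over k of m^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def propagate(rate: Matrix, power0: List[float], distance: float) -> List[float]:
    """Average modal powers after `distance`: P(z) = exp(R z) P(0)."""
    transfer = expm([[r * distance for r in row] for row in rate])
    return [sum(t * p for t, p in zip(row, power0)) for row in transfer]


# Two modes exchanging power symmetrically at rate r per unit length (illustrative values).
r = 0.1
rate = [[-r, r], [r, -r]]
powers = propagate(rate, [1.0, 0.0], 5.0)
# Closed form for this 2x2 case: P_0(z) = (1 + exp(-2 r z)) / 2
expected = (1 + math.exp(-2 * r * 5.0)) / 2
```

In this toy case each column of the rate matrix sums to zero, so total power is conserved and the launched mode decays toward an even split with its neighbor.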
