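arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.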

2605.20896 2026-05-25 cs.CR cs.AI cs.LG

GenAI-Driven Threat Detection with Microsoft Security Copilot

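Scott Freitas, Amir Gharib

Affiliations * Microsoft Security Research

AI summary: This paper presents the Dynamic Threat Detection Agent (DTDA), an autonomous agent system that strengthens Microsoft Security Copilot's ability to detect stealthy cyber threats. DTDA combines a unified activity timeline, versioned LLM prompt contracts, a planner-executor investigation loop, and dynamic alert generation to continuously analyze security incidents and produce explainable detections. In a 120-day online evaluation in production, it reached 80.1% precision based on customer feedback.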

Abstract

Defending against today's increasingly sophisticated cyberattacks requires security analysts to continuously translate evolving attacker tradecraft into detection logic. This places defenders in a reactive posture, requiring constantly updated expertise across an increasingly fragmented security landscape. We introduce the Dynamic Threat Detection Agent (DTDA), an always-on adaptive agent that continuously investigates security incidents across Microsoft Defender to uncover hidden threats and generate explainable detections when attack-story gaps are found. DTDA combines: (1) a unified activity timeline spanning alerts, events, user and entity behavior analytics, and threat intelligence; (2) versioned LLM prompt contracts with schema validation, grounding requirements, bounded retries, and fail-closed suppression; (3) a planner-executor investigation loop that generates attack-specific hypotheses and gathers supporting and refuting evidence; and (4) dynamic alert generation with a context-relevant title, severity, MITRE mappings, remediation guidance, implicated entities, and natural-language attack description. Integrated into Microsoft Security Copilot and deployed across tens of thousands of Defender customers, DTDA operates continuously at industry scale. In a 120-day online evaluation, DTDA achieves 80.1% precision from customer feedback while generating novel alerts for approximately 15% of investigated incidents. In offline evaluation, DTDA recovers hidden malicious activity with 0.78 F1 using GPT-5.4, improving over GPT-4.1 by 0.12 F1 and outperforming the baseline by 0.26 F1 points. Operationally, DTDA processes single-incident investigations end-to-end in a median of 28 minutes at a median token cost of USD 2.04, with a 0.38% job-level failure rate. These results demonstrate that autonomous agents can identify missed malicious activity at a production scale.
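
The second component in the abstract (prompt contracts with schema validation, grounding requirements, bounded retries, and fail-closed suppression) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the field names, the severity values, the `call_model` callable, and the retry limit are hypothetical and are not taken from DTDA.

```python
import json
from typing import Callable, Optional

# Hypothetical contract: required fields and their expected JSON types.
REQUIRED_FIELDS = {"title": str, "severity": str, "evidence_ids": list}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate(raw: str, known_evidence: set[str]) -> Optional[dict]:
    """Return the parsed alert if it satisfies the contract, otherwise None."""
    try:
        alert = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(alert, dict):
        return None
    # Schema validation: every required field is present with the right type.
    for field, kind in REQUIRED_FIELDS.items():
        if not isinstance(alert.get(field), kind):
            return None
    if alert["severity"] not in ALLOWED_SEVERITIES:
        return None
    # Grounding: at least one citation, and every cited id exists in the timeline.
    ids = alert["evidence_ids"]
    if not ids or not all(isinstance(e, str) and e in known_evidence for e in ids):
        return None
    return alert


def run_contract(call_model: Callable[[int], str],
                 known_evidence: set[str],
                 max_attempts: int = 3) -> Optional[dict]:
    """Bounded retries; returning None means the alert is suppressed (fail closed)."""
    for attempt in range(max_attempts):
        alert = validate(call_model(attempt), known_evidence)
        if alert is not None:
            return alert
    return None
```

The design choice worth noting is the final `return None`: when no attempt passes validation, the sketch emits nothing rather than a partially valid alert.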


2605.18370 2026-05-25 stat.ML cs.LG math.ST stat.TH

On Stability and Decomposition of Sample Quantiles under Heavy-Tailed Distributions

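Choudur Lakshminarayan

Affiliations * School of Business, Stevens Institute of Technology

AI summary: This paper studies the stability and decomposition of sample quantiles indexed by estimated parameters under heavy-tailed distributions, with a focus on Value-at-Risk (VaR) estimation for linear projections of financial returns. The standard Bahadur representation under a fixed distribution cannot separate the instability contributed by the projection direction from that contributed by the quantile threshold. The paper proposes a Q-Q orthogonality formulation that separates the two effects and decomposes the gap between the sample quantile and the population quantile into three terms: the shift caused by the projection direction, the fluctuation of the sample quantile, and a remainder.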

Comments 0 figures

Abstract

We study sample quantiles of distributions indexed by estimated parameters, with a focus on Value-at-Risk related to linear projections of financial returns whose underlying probability law is heavy-tailed. In this setting, both the projection direction and the empirical quantile threshold are estimated from the data, so the standard Bahadur representation under a fixed distribution does not separate the distinct sources of instability. A canonical starting point is Bahadur's representation, which expresses the sample quantile through the empirical distribution function plus a remainder term (Bahadur, 1966). Empirical-process theory provides a usable scaffolding through the mechanics of half-spaces, symmetric differences, and Glivenko-Cantelli uniform convergence. These tools yield stability bounds, but absorb changes in projection direction and changes in quantile threshold into a single symmetric-difference measure. Interestingly, a global uniform-convergence requirement is imposed on what is intrinsically a local quantile-stability problem. This paper introduces a Q-Q orthogonality formulation for separating projection-direction and quantile-threshold effects. The object of interest is the difference between the empirical quantile computed using the estimated projection direction and the population quantile computed at the reference projection direction. We decompose this difference into three terms, $\hat q_\alpha(\hat w)-q_\alpha(w_0)=D_1+D_2+D_3$. Here, $D_1$ measures the population quantile movement induced by perturbing the projection direction, $D_2$ measures the empirical quantile fluctuation with the projection direction held fixed, and $D_3$ is the Bahadur-type remainder.
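
The abstract describes the three terms only verbally. One telescoping that is consistent with those descriptions is written out below; it is an illustrative assumption, not the paper's stated definition, and in particular the choice to fix the direction at $w_0$ in $D_2$ is a guess.

```latex
\begin{align*}
D_1 &= q_\alpha(\hat w) - q_\alpha(w_0)
  && \text{(population quantile shift from perturbing the direction)} \\
D_2 &= \hat q_\alpha(w_0) - q_\alpha(w_0)
  && \text{(empirical fluctuation with the direction held fixed)} \\
D_3 &= \bigl[\hat q_\alpha(\hat w) - q_\alpha(\hat w)\bigr]
     - \bigl[\hat q_\alpha(w_0) - q_\alpha(w_0)\bigr]
  && \text{(remainder)} \\[4pt]
D_1 + D_2 + D_3 &= \hat q_\alpha(\hat w) - q_\alpha(w_0).
\end{align*}
```

Summing the three lines, the terms $q_\alpha(\hat w)$ and $\hat q_\alpha(w_0)$ cancel and the two $q_\alpha(w_0)$ terms net to a single $-q_\alpha(w_0)$, which is how the identity on the last line follows.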


2605.17767 2026-05-25 stat.ML cs.LG

Feature Learning in Linear-Width Two-Layer Networks: Two vs. One Step of Gradient Descent

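Behrad Moniri, Hamed Hassani

Affiliations * University of Pennsylvania

AI summary: This paper studies feature learning in two-layer neural networks in the linear-width regime, focusing on how the hidden-layer weights change at the second step of gradient descent. Going beyond earlier one-step analyses, the authors characterize the spectrum of the updated weights and show that it behaves like a spiked random matrix with multiple outliers, each corresponding to a learned direction. They also find that reusing the training batch, rather than drawing independent batches, makes it possible to learn directions whose information exponent exceeds one, so the benefit of batch reuse persists in wide networks.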

Abstract

We study feature learning in two-layer neural networks within the linear-width regime, where the number of hidden neurons, sample size, and input dimension scale proportionally. While recent work has analyzed feature learning via a single step of gradient descent on the first-layer weights in this regime, such one-step update schemes are fundamentally limited: the update to the weights is approximately rank-one, captures only a single direction, and requires the target function to have an information exponent of one. In this paper, we go beyond one-step updates to provide a full characterization of the features learned during the second step of gradient descent with step sizes $\eta_1 \asymp N^{\alpha_1}$ and $\eta_2 \asymp N^{\alpha_2}$ for $\alpha_1, \alpha_2 \in [0, 0.5)$, where $N$ is the number of hidden neurons. We derive a spectral characterization of the updated weights, demonstrating that they behave as a spiked random matrix with multiple outliers, each corresponding to a learned direction. We show that the number of outliers is determined by the parameters $\alpha_1, \alpha_2$ through $\lfloor \frac{\alpha_2}{1/2 - \alpha_1} \rfloor$. Furthermore, by analyzing the alignment between the learned directions and the target function, we identify a gap between training with independent versus reused batches. While independent batches restrict learning to directions with an information exponent of one, batch reuse enables the second update to capture directions even when the information exponent exceeds one, provided that $\alpha_1, \alpha_2$ are chosen properly. This shows that the benefits of batch reuse, previously observed in narrow-width regimes, persist in the linear-width limit as well. By characterizing these early-phase evolutions, our work proposes a tractable framework for studying optimization and feature learning phenomenology in modern overparameterized networks.
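
The outlier-count expression in the abstract can be evaluated for concrete step-size exponents. The helper below is a small sketch that computes only the stated floor formula; the function name is hypothetical, and exact rational arithmetic is used so that values near an integer boundary are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def num_outliers(alpha1, alpha2) -> int:
    """Evaluate floor(alpha2 / (1/2 - alpha1)) for alpha1, alpha2 in [0, 1/2)."""
    a1, a2 = Fraction(alpha1), Fraction(alpha2)
    half = Fraction(1, 2)
    if not (0 <= a1 < half and 0 <= a2 < half):
        raise ValueError("both exponents must lie in [0, 0.5)")
    return floor(a2 / (half - a1))


# Worked values: 0.4 / 0.25 = 1.6 and 0.45 / 0.1 = 4.5.
print(num_outliers("0.25", "0.4"))   # 1
print(num_outliers("0.4", "0.45"))   # 4
```

Passing the exponents as strings (or `Fraction` objects) keeps them exact; a float such as `0.1` is not exactly one tenth and can shift the result at a boundary.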


2605.17245 2026-05-25 cs.NI cs.LG

An Efficient Machine Learning-based Framework for Detection and Prevention of Frauds in Telecom Networks

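Praveen Hegde, Mishal Shah

Affiliations * Verizon; Bloomberg LP; Atlanta, USA; Jersey City, USA

AI summary: This paper proposes a machine-learning framework for detecting and preventing fraud in telecom networks. Using a call detail record (CDR) dataset of more than 100,000 customer records, the study applies feature preprocessing, data balancing, and model training, then evaluates several machine-learning models. Random Forest (RF) is reported to reach 99.9% on accuracy, precision, recall, and F1-score, making it the most effective model in the comparison.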

Comments Peer-reviewed and presented at 2025 International Conference on Advancement in Communication and Computing Technology (INOACC-2025); self-published by the author due to a sustained 13-month indexing delay by the organizers. Contains 7 pages and 7 figures

Journal ref International Conference on Advancement in Communication and Computing Technology (INOACC), 2025

Abstract

Telecommunication fraud is an acute problem that leads to substantial material losses and compromises the reliability of telecom systems worldwide. Only effective and efficient detection mechanisms can help to deal with these threats, though there are certain shifts in the approaches to fraud detection. This paper evaluates the performance of AI-driven models for fraud detection in telecommunication networks using the Telecom Call Detail Record (CDR) dataset, which contains 101,174 customer records with 17 attributes, including 8,830 fraud cases. In feature preprocessing, missing values were handled, followed by Min-Max scaling and data balancing using the SMOTE technique. Random Forest (RF) and XGBoost models were trained on the dataset for predictive analysis. F1-score, ROC AUC, recall, accuracy, time, and precision were used as indicators to compare the performance of the two models. RF reached an accuracy of 99.9%, while XGBoost reached 99.7%. Findings show that the suggested framework successfully detects fraud with few misclassifications. Several machine learning models were evaluated and contrasted, such as RF, XGBoost, DBSCAN, RoBERTa, and K-means. Among all the models, RF gave the highest performance, with an accuracy of 99.9%, precision of 99.9%, recall of 99.9%, and F1-score of 99.9%, outperforming XGBoost, GNN, and BERT. The findings emphasize RF as the most effective model for detecting fraudulent activities in telecom networks, ensuring robust and reliable prevention of fraud.
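
Two quantities behind the abstract can be made concrete with a short standard-library sketch: Min-Max scaling of a feature column, and the class imbalance implied by 8,830 fraud cases in 101,174 records. The function names are hypothetical and the confusion-matrix counts in the usage lines are made up for illustration; they are not results from the paper.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Figures taken from the abstract.
TOTAL_RECORDS = 101_174
FRAUD_CASES = 8_830
fraud_rate = FRAUD_CASES / TOTAL_RECORDS   # roughly 8.7% of records
majority_accuracy = 1 - fraud_rate         # accuracy of always predicting "not fraud"

# Illustrative counts only (not from the paper).
print(min_max_scale([2, 4, 6]))
print(precision_recall_f1(tp=90, fp=10, fn=30))
```

Because a classifier that never flags fraud would already score about 91% accuracy on the original class distribution, precision, recall, and F1 carry more information than accuracy alone for this dataset.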


2605.16283 2026-05-25 cs.CY cs.AI

Can the Recovery Mechanism Survive AI? Skill Formation, Labor, and What Current Measurement Misses

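Aysa Xuemo Fan

Affiliations * Aysa Xuemo Fan

AI summary: This paper examines how generative AI may disrupt the traditional mechanism of skill formation, arguing that AI could be the first technology to break the historical cycle between technological progress and educational adaptation. Drawing on labor economics, large-scale AI interaction data, and skill-formation experiments, it makes three contributions: a stock-versus-flow framework showing that AI currently augments existing workers while straining the pipeline that develops the next generation; a systematic gap analysis finding that existing studies leave the knowledge dimension of cognition unmeasured and that AI improves performance without improving learning; and an extended cognitive taxonomy that distinguishes AI interaction patterns that preserve learning from those that erode it. The paper stresses that AI's societal risk lies not in replacing teachers but in eliminating the productive struggle through which the next generation's capacity forms.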

Abstract

Throughout the modern era, when new technologies displaced workers, societies adapted through the same mechanism: education raised the cognitive ceiling, producing workers capable of tasks machines could not yet reach. Generative AI may be the first technology to break this cycle, because it now operates at the top of that ceiling. Drawing on labor economics, deployment data from millions of AI conversations across multiple platforms, original reanalysis of two public datasets, and skill-formation experiments, this paper develops three contributions. First, a stock-versus-flow framework showing that economic data and education data tell divergent stories about the same technology: augmentation dominates current workers, but the developmental pipeline producing the next generation is under strain. Second, a systematic gap analysis of the evidence base, revealing that the knowledge dimension of cognition is unmeasured across all major studies, that the three studies measuring learning outcomes (each $n < 200$) consistently find AI improves performance without improving learning ($d = 1.21$ in our cross-platform reanalysis), and that no study bridges professional and student populations. Third, an extended cognitive taxonomy (judgment under uncertainty, epistemic identity, and epistemic agency) applied to three cases from the evidence to distinguish AI interaction patterns that preserve learning from structurally similar ones that erode it. The paper argues that AI's societal risk lies not in replacing teachers but in eliminating the productive struggle through which the next generation's capacity forms, and proposes a research and design agenda targeting what current measurement systems miss.


2605.15652 2026-05-25 cs.NE cs.AI

Bridging Silicon and the Hippocampus: Algebro-Deterministic Memory "VaCoAl" as a Substrate for Vector-HaSH and TEM

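Hiroyuki Chuma, Kanji Otsuka, Yoichi Sato

Affiliations * Institute of Innovation Research, Hitotsubashi University; Meisei University; Shuhari System

AI summary: This study proposes VaCoAl, an algebro-deterministic hyperdimensional memory architecture built on Galois-field linear-feedback shift registers, as a common mathematical foundation for Vector-HaSH and TEM. VaCoAl replaces random projections with deterministic diffusion, matching the quasi-orthogonality of Vector-HaSH while guaranteeing bit-exact reproducibility, and introduces a path-integral confidence ratio to model the multiplicative decay observed in memory replay. The study also maps VaCoAl onto hippocampal circuitry and links it to the ladder of causation, offering theoretical support for connecting computational neuroscience and hyperdimensional computing.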

Comments 52 pages, 5 figures, 1 table, 3 appendices

Abstract

Vector-HaSH and the Tolman-Eichenbaum Machine (TEM) propose the hippocampal-entorhinal circuit factorizes memory via a grid-cell scaffold for compositional replay. Concurrently, human iEEG shows sharp-wave ripples gate recall and multi-hop replay fidelity decays multiplicatively. Yet, these fields lack a shared algebraic foundation. We introduce VaCoAl, an algebro-deterministic hyperdimensional memory architecture built on Galois-field linear-feedback shift registers. Its deterministic Galois-field diffusion offers a substrate-level alternative to Vector-HaSH's random projections, matching quasi-orthogonality while ensuring bit-exact reproducibility. Furthermore, the path-integral Confidence Ratio CR2 provides an algebraically tractable model for the empirically observed multiplicative replay decay. Biologically, VaCoAl's two operating regimes align with the EC-CA3 direct and EC-DG-CA3 trisynaptic pathways, explaining their 520-Myr conservation. Independent cellular evidence supports that the DG-CA3 pathway implements a biophysical homologue of Galois-field arithmetic. We also link this framework to Judea Pearl's Ladder of Causation. Reversible GF(2) binding provides the surgical algebra for the do-operator (Rung 2), and VaCoAl's dual-orthogonalizer architecture supplies the parallel substrate required for counterfactual reasoning (Rung 3). Ultimately, we prove these formal correspondences and derive testable iEEG predictions, uniting computational neuroscience, electrophysiology, and hyperdimensional computing.
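
Two ingredients named in the abstract, a Galois-field linear-feedback shift register as a deterministic bit source and reversible GF(2) binding, can be sketched generically in Python. This is not VaCoAl's design: the 16-bit register width, the tap mask `0xB400` (a commonly used textbook example), the seeds, and the function names are all assumptions chosen for illustration.

```python
def lfsr_bits(seed: int, n_bits: int, mask: int = 0xB400) -> list[int]:
    """Emit n_bits output bits from a 16-bit Galois LFSR started at a nonzero seed."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a nonzero 16-bit value")
    state = seed
    out = []
    for _ in range(n_bits):
        lsb = state & 1          # output bit is the low bit of the state
        out.append(lsb)
        state >>= 1
        if lsb:                  # Galois form: XOR the tap mask when a 1 is shifted out
            state ^= mask
    return out


def bind(a: list[int], b: list[int]) -> list[int]:
    """GF(2) binding: element-wise XOR, which is its own inverse."""
    return [x ^ y for x, y in zip(a, b)]


key = lfsr_bits(0xACE1, 256)
value = lfsr_bits(0x1234, 256)
bound = bind(key, value)
recovered = bind(bound, key)     # XOR with the same key undoes the binding
print(recovered == value)        # True
```

The same seed always yields the same bit sequence, which is the sense in which such a generator is bit-exactly reproducible, and XOR-ing twice with the same key recovers the original vector exactly.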


2605.11215 2026-05-25 cs.DC cs.AI

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

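Ziyue Liu, Zhengyang Wang, Ruijie Zhang, Avinash Maurya, Hui Zhou, Paul Hovland, Sheng Di, Franck Cappello, Bogdan Nicolae, Zheng Zhang

Affiliations * University of California at Santa Barbara; Argonne National Laboratory

AI summary: Hardware faults have become routine when pre-training large language models on massive GPU clusters, which calls for resilient training systems. This paper proposes ReCoVer, a system that achieves robust pre-training through fault-tolerant collective communication and a versatile-workload policy. Its core idea is to keep the number of microbatches per iteration constant so that gradients remain stochastically equivalent to those of a failure-free run. ReCoVer supports multiple parallelism schemes and preserves the training trajectory under GPU failures; experiments report 2.23x higher effective throughput than checkpoint-and-restart baselines after successive failures.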

Comments Preprint

Abstract

Pre-training large language models on massive GPU clusters has made hardware faults routine rather than rare, driving the need for resilient training systems. Yet existing frameworks either focus on specific parallelism schemes or risk drifting away from a failure-free training trajectory. We propose ReCoVer, a resilient LLM pre-training system that upholds a single invariant: each iteration keeps the number of microbatches constant, ensuring per-iteration gradients remain stochastically equivalent to a failure-free run. The framework is organized as three decoupled protocol layers: (1) fault-tolerant collectives that isolate faults and prevent them from propagating across replicas; (2) in-step fine-grained recovery that preserves intra-iteration progress and prevents gradient corruption; (3) a versatile-workload policy that dynamically redistributes microbatch quotas across the survivors. The design is parallelism-agnostic, integrating directly with both 3D parallelism and Hybrid Sharded Data Parallel (HSDP) as a drop-in substrate. We evaluate our implementation on end-to-end pre-training tasks on up to 512 GPUs; ReCoVer successfully preserves the training trajectory of a failure-free reference despite the loss of 256 GPUs spread across the run. Compared with checkpoint-and-restart baselines, ReCoVer demonstrates $2.23\times$ higher effective throughput after successive failures. This advantage results in ReCoVer processing 74.9% more tokens at 234 GPU-hours, with the gap widening as training continues.
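
The invariant in the abstract (a constant number of microbatches per iteration, with quotas redistributed across survivors) can be illustrated with a toy quota calculation. This is a sketch under simple assumptions and not ReCoVer's actual policy: the even-split rule, the rank names, and the function name are hypothetical.

```python
def redistribute_microbatches(total_microbatches: int, survivors: list[str]) -> dict[str, int]:
    """Split a fixed per-iteration microbatch count as evenly as possible across survivors."""
    if not survivors:
        raise ValueError("no surviving ranks left to run the iteration")
    base, extra = divmod(total_microbatches, len(survivors))
    # The first `extra` survivors take one additional microbatch each.
    return {rank: base + (1 if i < extra else 0) for i, rank in enumerate(survivors)}


ranks = [f"gpu{i}" for i in range(8)]
before = redistribute_microbatches(64, ranks)      # 8 microbatches per rank
after = redistribute_microbatches(64, ranks[:5])   # three ranks lost
print(sum(before.values()), sum(after.values()))   # 64 64
```

The per-rank load rises after failures, but the total per iteration stays at 64, which is the property the abstract ties to keeping gradients comparable to a failure-free run.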


2605.11053 2026-05-25 cs.CR cs.AI cs.LG

Content-Aware Attack Detection in LLM Agent Tool-Call Traffic: An Empirical Study of Features, Architectures, and Evaluation Protocols

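Sultan Zavrak

Affiliations * Department of Computer Engineering, Duzce University

AI summary: This paper studies attack detection in the traffic generated when LLM agents call external tools. It proposes a content-aware detection framework that models each agent session as a graph and classifies it using sentence-embedding features. Comparing several graph neural networks with classical machine-learning models, the study finds that content-level features are essential to detection performance and that tree ensembles on pooled SBERT embeddings outperform both the GNNs and the MLP in the primary RAS-Eval setting. It also shows that the way data is split strongly affects evaluation results, an issue that prior work has not adequately addressed.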

Comments v2: renamed manuscript (brand removed; descriptive title). No changes to methodology, results, tables, or figures

Abstract

The Model Context Protocol (MCP) has become a widely adopted interface for LLM agents to invoke external tools, yet learned monitoring of MCP tool-call traffic remains underexplored. In this article, the proposed detector is presented as an attack detection framework for MCP tool-call traffic that encodes each agent session as a graph (tool calls as nodes, sequential and data-flow links as edges), enriches nodes with sentence-embedding features over arguments and responses, and classifies sessions as benign or attacked. Three GNN architectures (GAT, GCN, GraphSAGE), a no-graph MLP, and classical baselines (XGBoost, random forest, logistic regression, linear SVM) are evaluated, with the full architecture comparison conducted on RAS-Eval (task-stratified splits) and GraphSAGE retained as the GNN baseline on ATBench and a combined-source variant (both label-stratified). Three findings emerge. First, content-level features are essential: metadata-only detection plateaus around an AUROC of 0.64 regardless of architecture, while content embeddings push the AUROC above 0.89. Second, naive random-split evaluation inflates AUROC by up to 26 percentage points relative to task-disjoint splits, a memorization confound that prior agent-detection work has not addressed. Third, the detection signal resides primarily in the SBERT content embeddings: an AUROC of 0.975 was reached by tree ensembles on pooled embeddings, performing, for the most part, better than the neural architectures in the primary RAS-Eval setting including GNNs (0.917) and the MLP (0.896), and self-supervised pre-training does not deliver a label-efficiency advantage on this task.
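
The second finding contrasts random splits with task-disjoint splits. A minimal sketch of a task-disjoint split is shown below, assuming each session carries a task identifier; the data layout, the function name, and the 25% test fraction are illustrative assumptions and not the paper's protocol.

```python
import random
from collections import defaultdict


def task_disjoint_split(sessions, test_fraction=0.25, seed=0):
    """Split (session_id, task_id) pairs so that no task appears in both train and test."""
    by_task = defaultdict(list)
    for session_id, task_id in sessions:
        by_task[task_id].append(session_id)
    tasks = sorted(by_task)
    random.Random(seed).shuffle(tasks)       # shuffle tasks, not individual sessions
    n_test = max(1, round(len(tasks) * test_fraction))
    test_tasks, train_tasks = tasks[:n_test], tasks[n_test:]
    train = [s for t in train_tasks for s in by_task[t]]
    test = [s for t in test_tasks for s in by_task[t]]
    return train, test, set(train_tasks), set(test_tasks)


# Eight hypothetical tasks with three sessions each.
sessions = [(f"s{t}-{k}", f"task{t}") for t in range(8) for k in range(3)]
train, test, train_tasks, test_tasks = task_disjoint_split(sessions)
print(len(train), len(test), train_tasks & test_tasks)
```

Grouping by task before shuffling is what prevents a model from scoring well simply because it saw near-identical sessions of the same task during training.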


2605.10220 2026-05-25 astro-ph.GA cs.LG

Stellar Age Compression Reshapes Interpretations of the Milky Way Thick-Disk Formation History

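Zhipeng Zhang

Affiliations * China Mobile Research Institute; China Mobile GBA (Greater Bay Area) Innovation Institute

AI summary: The formation timescale of the Milky Way thick disk is a central question in Galactic archaeology. By comparing two independent stellar age scales, spectroscopically inferred ages and asteroseismic ages, this study finds that key observables of the thick-disk formation history change systematically under seismic anchoring, suggesting that earlier support for rapid formation may be affected by stellar age compression. It further shows that a compressive age transformation alone can reproduce the rapid-formation signatures without assuming an intrinsically bursty formation history, so statistical interpretations of the Galaxy's formation history may depend strongly on how stellar age is defined.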

Abstract

The formation timescale of the Milky Way thick disk is one of the central debates in Galactic archaeology. The age-metallicity relation (AMR), formation timescale, and chemical evolution gradients are frequently used to infer a rapid assembly, short-timescale enrichment, and bursty formation history of the thick disk. However, stellar ages are not directly observable, introducing the potential risk that inferred ages may harbor a systematic compression tied to observational quality. In this paper, we use the same stellar sample and identical physical covariate matching conditions, but two independent age scales, spectroscopic inferred ages (astroNN) and asteroseismic ages (APOKASC-3), to compare the observable signatures of the thick-disk formation history. We find that several key observables previously supporting a rapid thick-disk formation are systematically weakened under seismic anchoring: the AMR slope flattens from $-3.29$ to $-1.86$ Gyr dex$^{-1}$ ($\Delta a = +1.43$), the formation timescale widens from 3.04 to 3.55 Gyr, and the peak formation age shifts from 9.1 to 6.0 Gyr. Through transport inversion experiments, we further show that additive noise can only broaden the age distribution and cannot reproduce the above pattern, whereas a compressive transport map ($\lambda < 1$) simultaneously reproduces a narrower age distribution, a steeper AMR, and rapid-formation-like observables. This result indicates that the compression transformation itself is sufficient to generate rapid-formation-friendly observables without requiring an intrinsically bursty formation history. Our findings reveal that statistical interpretations of the Milky Way formation history may depend sensitively on the stellar age definition itself.
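
The contrast the abstract draws between additive noise and a compressive map ($\lambda < 1$) can be seen in a toy calculation on the width of an age distribution. The sketch below is an illustration under stated assumptions, not the paper's transport inversion: the uniform synthetic ages, the choice to compress about the sample mean, and the values $\lambda = 0.6$ and $\sigma = 1.5$ Gyr are all hypothetical.

```python
import random
import statistics


def compress(ages, lam):
    """Linear compressive map about the sample mean: a -> mean + lam * (a - mean)."""
    mean = statistics.fmean(ages)
    return [mean + lam * (a - mean) for a in ages]


def add_noise(ages, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [a + rng.gauss(0.0, sigma) for a in ages]


rng = random.Random(0)
ages = [rng.uniform(2.0, 12.0) for _ in range(5000)]   # synthetic ages in Gyr

width = statistics.pstdev(ages)
width_compressed = statistics.pstdev(compress(ages, 0.6))
width_noisy = statistics.pstdev(add_noise(ages, 1.5, rng))
print(width_compressed < width < width_noisy)
```

A linear compression scales the spread by exactly $\lambda$, while independent noise adds variance, so only the compressive map narrows the distribution. The sketch says nothing about the AMR slope, which depends on how ages relate to metallicity.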


2605.10219 2026-05-25 math.OC cs.CC cs.LG

Parameterized Complexity of Stationarity Testing for Piecewise-Affine Functions and Shallow CNN Losses

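Yuhan Ye

Affiliations * MIT

AI summary: This paper studies the parameterized complexity of testing approximate first-order stationarity at a given point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. Taking the ambient dimension $d$ as the parameter, the author gives XP algorithms in fixed dimension for the tractable sides and proves W[1]-hardness for the complementary sides. The hardness results extend to the training losses of shallow ReLU convolutional neural networks, so the same parameterized-complexity picture holds for these simple CNN training problems.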

Comments 32 pages, 1 figure, 1 table

Abstract

We study the parameterized complexity of testing approximate first-order stationarity at a prescribed point for continuous piecewise-affine (PA) functions, a basic task in nonsmooth optimization. PA functions form a canonical model for nonsmooth stationarity testing and capture the local polyhedral geometry that appears in ReLU-type training losses. Recent work by Tian and So (SODA 2025) shows that testing approximate stationarity notions for PA functions is computationally intractable in the worst case, and identifies fixed-dimensional tractability as an open direction. We address this direction from the viewpoint of parameterized complexity, with the ambient dimension $d$ as the parameter. In this paper, we give XP algorithms in fixed dimension for the tractable sides, and prove W[1]-hardness for the complementary sides. Moreover, lower bounds under the Exponential Time Hypothesis rule out algorithms running in time $\rho(d)\cdot\mathrm{size}^{o(d)}$ for any computable function $\rho$, where $\mathrm{size}$ denotes the total binary encoding length of the stationarity-testing instance. As a further consequence, our results yield the corresponding parameterized complexity picture for testing local minimality of continuous PA functions. We further extend our hardness results to a family of shallow ReLU CNN training losses, with stationarity tested in the trainable weight space. Thus, the same parameterized-complexity picture also appears for simple CNN training losses.
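
To make "testing approximate stationarity at a prescribed point" concrete, the sketch below handles only the easiest special case: a one-dimensional convex function given as a maximum of affine pieces. This restriction is an assumption made for illustration. The paper concerns general, possibly nonconvex, PA functions in dimension $d$, where the problem is hard; the function names and the tolerance are hypothetical.

```python
def active_slope_interval(pieces, x, tol=1e-9):
    """For f(x) = max_i (a_i * x + b_i), return [min, max] slope over the active pieces at x.

    In this convex one-dimensional case the interval is the subdifferential of f at x.
    """
    values = [a * x + b for a, b in pieces]
    top = max(values)
    active = [a for (a, _), v in zip(pieces, values) if top - v <= tol]
    return min(active), max(active)


def is_approx_stationary(pieces, x, eps):
    """True if the distance from 0 to the subdifferential interval is at most eps."""
    lo, hi = active_slope_interval(pieces, x)
    dist = max(0.0, lo, -hi)    # 0 if lo <= 0 <= hi, else distance to the nearer endpoint
    return dist <= eps


abs_value = [(1.0, 0.0), (-1.0, 0.0)]            # f(x) = |x|
print(is_approx_stationary(abs_value, 0.0, 0.0))  # True: 0 lies in [-1, 1]
print(is_approx_stationary(abs_value, 1.0, 0.5))  # False: the only active slope is 1
```

In one dimension the check reduces to an interval test; the difficulty studied in the paper comes from higher dimensions and nonconvex piece structure, which this sketch deliberately avoids.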


2605.07717 2026-05-25 cs.SE cs.AI

The AI-Native Large-Scale Agile Software Development Manifesto

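Ricardo Britto, Fredrik Palmgren, Nishrith Saini, Marcus Ohlin

Affiliations * Ericsson, Sweden; Blekinge Institute of Technology, Sweden

AI summary: Although agile methods are widely used, true agility remains hard to achieve in large-scale software development. This paper proposes the AI-Native Large-Scale Agile Software Development Manifesto, which treats AI as a first-class participant rather than an auxiliary tool and redefines how large-scale development is organized. The manifesto rests on six principles and aims to replace meeting-driven, document-heavy, sequential processes with intelligent, adaptive, continuously learning systems, improving agility at the organizational level.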

Abstract

Despite the widespread adoption of agile methods, achieving true agility at scale remains elusive. Large-scale agile frameworks remain largely human-centric and manual, relying on coordination meetings, artifact synchronization, and role-based handoffs that inhibit real-time adaptation. Meanwhile, rapid advances in AI, particularly large language models, have begun transforming software engineering, yet their potential for organizational-level agility remains underexplored. We present the AI-Native Large-Scale Agile Software Development Manifesto: a set of values and principles that redefine how large-scale software development is organized when AI becomes a first-class participant rather than a peripheral tool. The manifesto is grounded in six principles (parallel processes, intent-driven teams, living knowledge, verification-first assurance, orchestrated agent workforces, and reusable blueprints) that together shift development from a meeting-driven, document-heavy, sequential process to an intelligent, adaptive, continuously learning system.


2605.06936 2026-05-25 cs.AR cs.AI cs.MA

Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

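Pengju Liu, Nuo Xu, Jinwei Tang, Yu Cao, Caiwen Ding

Affiliations * University of Minnesota

AI summary: This paper introduces PostEDA-Bench, a hierarchical benchmark for evaluating LLM-based agents on "last mile" tasks in the electronic design automation (EDA) flow: fixing design rule check (DRC) violations and converging power-performance-area (PPA) targets. The benchmark contains 145 tasks covering DRC fixing and single- and multi-objective PPA optimization, and it supports machine-checkable evaluation with EDA toolchains. Experiments show that current LLMs handle synthetic DRC and single-objective PPA tasks reasonably well but degrade sharply on the more practical DRC-Reasoning tasks (best success rate 36.66%) and PPA-Multi tasks (best success rate 20.00%), which points to trade-off reasoning as a major remaining challenge.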

Abstract

LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule Check (DRC) violations and converging Power-Performance-Area (PPA) targets after tool runs. Existing EDA-LLM benchmarks, however, omit DRC fixing entirely and rely on flat hierarchies tied to a single toolchain. We introduce PostEDA-Bench, a hierarchical benchmark with 145 tasks across DRC-Essential, DRC-Reasoning, PPA-Mono, and PPA-Multi, supported by EDA toolchains with machine-checkable evaluation. Across eight commercial and open-source LLMs under multiple agent scaffolds, we find that agents handle synthetic DRC-Essential and single-objective PPA-Mono reasonably well but degrade sharply on the more practical DRC-Reasoning, where the best success rate is 36.66%, and PPA-Multi, where the best success rate is 20.00%; vision augmentation consistently enhances DRC-Bench; and trade-off reasoning, rather than knob knowledge, is the dominant PPA-Multi bottleneck.


2605.05704 2026-05-25 cs.CR cs.AI

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

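Zhe Liu, Zonghao Ying, Wenxin Zhang, Quanchen Zou, Deyue Zhang, Dongdong Yang, Xiangzheng Zhang, Hao Peng

Affiliations * School of Cyber Science and Technology, Beihang University, Beijing, China; Institute of Artificial Intelligence, Beihang University, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; AI Security Lab, Beijing, China (360 AI Security Lab)

AI summary: As large language models (LLMs) gain autonomous reasoning and tool-execution abilities, they face new safety risks in real-world use. To address the difficulty existing defenses have in balancing safety and utility, this paper proposes SafeHarbor, a hierarchical memory-augmented guardrail framework. It extracts context-aware defense rules through adversarial generation and uses an information-entropy-driven self-evolution mechanism to optimize the memory structure dynamically, preserving safety while improving responses to legitimate requests. Experiments report a peak benign utility of 63.6% on GPT-4o with a refusal rate above 93% against harmful requests.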

Comments Accepted by ICML 2026

AI中文摘要

基础模型的最新进展已将LLM从被动对话系统转变为能够推理和执行工具的自主智能体。虽然这些能力带来了巨大的实用价值，但也引入了新的安全风险，因为对手可以操纵智能体在现实环境中执行有害操作。现有的防御策略可以缓解此类威胁，但往往难以平衡安全性和实用性，导致对良性用户请求的过度拒绝。为了缓解这种权衡，我们提出了SafeHarbor，一种新颖的框架，旨在为LLM智能体建立精确的决策边界。与静态指南不同，SafeHarbor通过增强对抗生成提取上下文感知的防御规则。我们设计了一个本地分层记忆系统用于动态规则注入，提供了一种无需训练、高效且即插即用的解决方案。此外，我们引入了一种基于信息熵的自进化机制，通过动态节点分裂和合并持续优化记忆结构。大量实验表明，SafeHarbor在模糊的良性任务和明确的恶意攻击上都达到了最先进的性能，特别是在GPT-4o上实现了63.6%的峰值良性效用，同时保持对有害请求超过93%的稳健拒绝率。源代码已公开在https://github.com/ljj-cyber/SafeHarbor。

英文摘要

Recent advances in foundation models have transformed LLMs from passive conversational systems into autonomous agents capable of reasoning and tool execution. While these capabilities unlock substantial practical value, they also introduce new security risks, as adversaries can manipulate agents into performing harmful actions in real-world environments. Existing defense strategies mitigate such threats but frequently struggle to balance safety and utility, resulting in over-refusal of benign user requests. To mitigate this trade-off, we propose SafeHarbor, a novel framework designed to establish precise decision boundaries for LLM agents. Unlike static guidelines, SafeHarbor extracts context-aware defense rules through enhanced adversarial generation. We design a local hierarchical memory system for dynamic rule injection, offering a training-free, efficient, and plug-and-play solution. Furthermore, we introduce an information entropy-based self-evolution mechanism that continuously optimizes the memory structure through dynamic node splitting and merging. Extensive experiments demonstrate that SafeHarbor achieves state-of-the-art performance on both ambiguous benign tasks and explicit malicious attacks, notably attaining a peak benign utility of 63.6% on GPT-4o while maintaining a robust refusal rate exceeding 93% against harmful requests. The source code is publicly available at https://github.com/ljj-cyber/SafeHarbor.

2605.04118 2026-05-25 q-bio.QM cs.AI

ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation

ProtDBench: 蛋白质结合物设计与评估的统一基准

Cong Liu, Milong Ren, Jiaqi Guan, Chengyue Gong, Jinyuan Sun, Xinshi Chen, Wenzhi Xiao

发表机构 * AMLab, AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands（阿姆斯特丹大学AMLab与AI4Science实验室，荷兰阿姆斯特丹）

AI总结本文提出ProtDBench，一个统一的蛋白质结合物设计与评估基准框架，旨在解决当前研究中因评估标准不统一而导致的性能指标难以比较的问题。该框架定义了标准化的任务、评估流程和成功标准，并引入基于固定预算和结构多样性的评估指标，揭示了不同验证方法和过滤规则对性能评估的影响。ProtDBench为蛋白质结合物设计方法提供了公平、可复现的评估体系，支持在实际条件下进行系统对比。

AI中文摘要

近年来，从头蛋白质结合物设计的进展使得越来越多的实验验证成为可能，但由于缺乏标准化的评估协议，报道的计算指标仍然难以解释或跨研究比较。我们引入了ProtDBench，一个标准化且考虑通量的蛋白质结合物设计评估框架。ProtDBench定义了统一的基准任务、评估协议和成功标准，能够系统分析评估设计如何影响观察到的性能。利用一个大型湿实验标注数据集，我们分析了常用的结构预测模型作为评估验证器，揭示了在相同过滤协议下显著的验证器依赖偏差和有限的一致性。然后，我们在固定评估协议下，针对十个不同的蛋白质靶点，对代表性的开源生成式结合物设计方法进行了基准测试。除了每条序列的成功率外，ProtDBench还基于固定的24小时预算纳入了考虑通量的指标，以及考虑结构多样性的聚类级成功标准。总之，这些结果揭示了过滤规则、成功定义以及考虑通量的评估在计算效率、成功率和结构多样性之间引起的系统性差异。总体而言，ProtDBench提供了一个公平且可复现的评估流程，支持在现实评估设置下对蛋白质结合物设计方法进行系统且受控的比较。

英文摘要

Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce ProtDBench, a standardized and throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour budget, as well as cluster-level success criteria to account for structural diversity. Together, these results expose systematic differences induced by filtering rules, success definitions, and throughput-aware evaluation between computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic and controlled comparison of protein binder design methods under realistic evaluation settings.

2604.25755 2026-05-25 quant-ph cs.CV physics.comp-ph

Quantum-Inspired Robust and Scalable SAR Object Classification

量子启发的鲁棒可扩展SAR目标分类

Maximilian Scharf, Marco Trenti, Felix Bock, Padraig Davidson, Tobias Brosch, Benjamin Rodrigues de Miranda, Sigurd Huber, Timo Felser

发表机构 * Tensor AI Solutions GmbH（Tensor AI解决方案有限公司）； Ulm University（乌尔姆大学）； Institute for Complex Quantum Systems（复杂量子系统研究所）； Hensoldt Sensors GmbH（亨索尔特传感器有限公司）； German Aerospace Center (DLR)（德国航空航天中心（DLR））； Microwaves and Radar Institute（微波与雷达研究所）

AI总结本文研究了合成孔径雷达（SAR）图像分类中面对噪声干扰和动态范围大的挑战，以及在边缘设备上部署时对模型鲁棒性与效率的平衡需求。研究探索了张量网络在提升分类鲁棒性及模型压缩方面的潜力，特别是其对数据中毒攻击的抵御能力。与以往基于传统神经网络的方法不同，本文聚焦于张量网络在目标分类中的鲁棒性与模型简化能力，表明其在应对复杂环境和资源限制方面具有显著优势，为雷达应用和深度学习方法提供了新的见解。

Comments 6 pages, 6 figures, EUSAR 2026 conference

2604.07796 2026-05-25 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Order-Optimal Sequential 1-Bit Mean Estimation in General Tail Regimes

一般尾部情形下阶最优的序贯1比特均值估计

Ivan Lau, Jonathan Scarlett

发表机构 * National University of Singapore（新加坡国立大学）

AI总结本文研究了在1比特通信约束下的均值估计问题，提出了一种基于随机阈值查询的自适应均值估计方法，每个1比特反馈表示样本是否超过顺序选择的阈值。该估计器对任意具有有界均值和有界中心矩的分布具有$(ε, δ)$-PAC性质，且在所有尾部分布情形下均达到最优的样本复杂度。研究还揭示了1比特量化在有限方差情况下的基本性能限制，并展示了自适应方法相比非自适应方法在样本效率上的显著优势。

Comments This article substantially extends the AISTATS version, arXiv:2509.21940

AI中文摘要

本文研究了1比特通信约束下的均值估计问题。我们提出了一种新颖的自适应均值估计器，仅基于随机化阈值查询，其中每个1比特输出指示给定样本是否超过顺序选择的阈值。对于任何具有有界均值$\mu\in [-\lambda, \lambda]$和有界$k$阶中心矩$\mathbb{E}[|X-\mu|^k] \le \sigma^k$（$k>1$固定）的分布，我们的估计器是$(\varepsilon, \delta)$-PAC的。此外，我们的样本复杂度在所有此类尾分布下都是阶数最优的，即对于每个这样的$k$值。对于$k\neq 2$，我们的估计器的样本复杂度匹配未量化极小极大下界加上不可避免的$O(\log(\lambda/\sigma))$定位代价。对于有限方差情形（$k=2$），我们的估计器的样本复杂度有额外的乘法$O(\log(\sigma/\varepsilon))$惩罚，并且我们建立了新的信息论下界，表明该惩罚是1比特量化的基本限制。我们还建立了一个显著的适应性差距：对于阈值查询和更一般的区间查询，任何非自适应估计器的样本复杂度必须与搜索空间参数$\lambda/\sigma$线性增长，使其样本效率远低于我们的自适应方法。最后，我们提出了算法变体，这些变体（i）处理未知的采样预算，（ii）在给定（可能宽松的）界限下适应未知尺度参数$\sigma$，（iii）仅需两个自适应阶段即可实现阶数最优样本复杂度，但以更一般的1比特查询为代价，以及（iv）利用每个1比特查询的多个局部样本按比例减少通信成本。

英文摘要

In this paper, we study the problem of mean estimation under 1-bit communication constraints. We propose a novel adaptive mean estimator based solely on randomized threshold queries, where each 1-bit outcome indicates whether a given sample exceeds a sequentially chosen threshold. Our estimator is $(ε, δ)$-PAC for any distribution with a bounded mean $μ\in [-λ, λ]$ and a bounded $k$-th central moment $\mathbb{E}[|X-μ|^k] \le σ^k$ for any fixed $k > 1$. Moreover, our sample complexity is order-optimal in all such tail regimes, i.e., for every such $k$ value. For $k \neq 2$, our estimator's sample complexity matches the unquantized minimax lower bounds plus an unavoidable $O(\log(λ/σ))$ localization cost. For the finite-variance case ($k=2$), our estimator's sample complexity has an extra multiplicative $O(\log(σ/ε))$ penalty, and we establish a novel information-theoretic lower bound showing that this penalty is a fundamental limit of 1-bit quantization. We also establish a significant adaptivity gap: for both threshold queries and more general interval queries, the sample complexity of any non-adaptive estimator must scale linearly with the search space parameter $λ/σ$, rendering it vastly less sample efficient than our adaptive approach. Finally, we present algorithmic variants that (i) handle an unknown sampling budget, (ii) adapt to an unknown scale parameter $σ$ given (possibly loose) bounds, (iii) require only two stages of adaptivity to achieve order-optimal sample complexity at the expense of more general 1-bit queries, and (iv) leverage multiple local samples per 1-bit query to proportionally reduce communication costs.
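The idea of a randomized threshold query can be illustrated with a much simpler, non-adaptive scheme than the estimator described in the abstract. The sketch below is not the paper's algorithm: it assumes every sample lies in $[-λ, λ]$ (a stronger assumption than a bounded mean), draws each threshold uniformly at random, and uses the identity $P(X > T \mid X) = (X + λ)/(2λ)$. The function name `one_bit_mean_estimate` and the toy data are hypothetical.

```python
import random

def one_bit_mean_estimate(samples, lam, rng):
    """Estimate E[X] from one bit per sample, assuming every X lies in [-lam, lam].

    Each sample is compared against an independent threshold T ~ Uniform[-lam, lam].
    Since P(X > T | X) = (X + lam) / (2 * lam), the fraction of 1-bits is an
    unbiased estimate of (E[X] + lam) / (2 * lam).
    """
    bits = [1 if x > rng.uniform(-lam, lam) else 0 for x in samples]
    frac = sum(bits) / len(bits)
    # Invert the affine map from the mean to the probability of a 1-bit.
    return 2.0 * lam * frac - lam

rng = random.Random(0)
lam = 4.0
true_mean = 1.5
# Hypothetical data: uniform on [true_mean - 1, true_mean + 1], which sits inside [-lam, lam].
samples = [rng.uniform(true_mean - 1.0, true_mean + 1.0) for _ in range(200_000)]
estimate = one_bit_mean_estimate(samples, lam, rng)
print(f"estimate = {estimate:.3f}")
```

In this naive scheme the standard error of the estimate scales with `lam`, so a wide search range costs many samples. That is in the same spirit as the adaptivity gap mentioned in the abstract, although the exact rates proved there are not reproduced by this sketch.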

2604.05129 2026-05-25 cs.GT cs.LG

No Coin Left Behind: Maximizing Strategic Surplus Against No-Regret Dynamics

不遗漏任何硬币：对抗无遗憾动态的最大化战略剩余

Yiheng Su, Emmanouil-Vasileios Vlatakis-Gkaragkounis

发表机构 * University of Wisconsin–Madison（威斯康星大学麦迪逊分校）

AI总结本文研究了在零和博弈中，如何对抗使用固定步长的Follow-the-Regularized-Leader（FTRL）学习者，最大化战略盈余。作者证明了从FTRL学习者中提取与遗憾尺度相关的盈余是该方法族的固有特性，而非特定实现的结果，并提出了两个关键结果：固定最大最小优化器下，盈余与学习者的次优动作数量成正比；交替优化器下，无论均衡结构如何，均可保证一定规模的盈余。研究还揭示了正则化器的几何二分现象，并提出了衡量正则化器对学习者策略敏感程度的指标。

AI中文摘要

我们研究了在 $n\times m$ 两人零和博弈中，对抗使用恒定步长 $\eta$ 的跟随正则化领导者（FTRL）学习器时，先知优化器在 $T$ 轮博弈中可获得的战略剩余。与之前的分析不同，我们表明这种遗憾尺度剩余的提取是 FTRL 家族的固有特征，而非特定实例的产物。首先，对于固定的最大最小优化器，我们建立了一个阶为 $\Omega(N_{\mathrm{sub}}/\eta)$ 的普遍规律，证明效用剩余随学习器次优动作数量 $N_{\mathrm{sub}}$ 缩放，并在没有次优动作时消失。其次，对于交替优化器，在随机博弈中，无论均衡结构如何，都能以高概率保证 $\Omega(\eta T/\mathrm{poly}(n,m))$ 的剩余。我们的分析揭示了一个尖锐的几何二分法：非陡峭正则化器允许优化器通过有限时间消除次优动作实现最大瞬态剩余，而陡峭正则化器则引入一个消失的尾部修正，可能延迟剩余饱和。最后，我们讨论了这种优势在双边收益不确定性下是否持续存在，并提出了一个易感性度量，量化哪些正则化器最容易受到学习器感知的战略引导。

英文摘要

We investigate the strategic surplus obtainable against a Follow-the-Regularized-Leader (FTRL) learner with constant step size $η$ in $n\times m$ two-player zero-sum games played over $T$ rounds against a clairvoyant optimizer. In contrast with prior analysis, we show that the extraction of such regret-scale surplus is an inherent feature of the FTRL family, rather than an artifact of specific instantiations. First, for a fixed max-min optimizer, we establish a sweeping law of order $Ω(N_{\mathrm{sub}}/η)$, proving that utility surplus scales with the number of the learner's suboptimal actions $N_{\mathrm{sub}}$ and vanishes in their absence. Second, for an alternating optimizer, a surplus of $Ω(ηT/\mathrm{poly}(n,m))$ can be guaranteed regardless of the equilibrium structure, with high probability, in random games. Our analysis uncovers a sharp geometric dichotomy: non-steep regularizers allow the optimizer to realize the maximal transient surplus via finite-time elimination of suboptimal actions, whereas steep regularizers introduce a vanishing tail correction that can delay surplus saturation. Finally, we discuss whether this leverage persists under bilateral payoff uncertainty and propose a susceptibility measure quantifying which regularizers are most vulnerable to learner-aware strategic steering.
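As a rough illustration of the setting, the following sketch runs FTRL with a negative-entropy regularizer (the exponential-weights update) at constant step size against a fixed opponent strategy. It is a toy under stated assumptions, not the paper's construction: the payoff matrix `A`, the opponent strategy `y`, and the values of `eta` and `T` are all hypothetical, and no surplus is computed.

```python
import math

def ftrl_entropic(cum_payoff, eta):
    """FTRL with negative-entropy regularizer: x_i is proportional to exp(eta * cumulative payoff_i)."""
    m = max(cum_payoff)  # subtract the max for numerical stability
    weights = [math.exp(eta * (u - m)) for u in cum_payoff]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 3x2 payoff matrix for the learner (row player).
A = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]   # row 2 is strictly dominated, hence suboptimal
y = [0.5, 0.5]       # fixed opponent strategy (an equilibrium strategy of this toy game)
eta, T = 0.1, 500

cum = [0.0] * len(A)
for _ in range(T):
    x = ftrl_entropic(cum, eta)  # learner's mixed strategy, based on payoffs seen so far
    payoff = [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in A]
    cum = [c + p for c, p in zip(cum, payoff)]

final = ftrl_entropic(cum, eta)
print([round(p, 6) for p in final])
```

The entropic regularizer is steep in the usual sense, so the weight on the dominated action shrinks exponentially but never reaches zero exactly. This is one way to picture the tail behaviour the abstract attributes to steep regularizers.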

2603.24226 2026-05-25 cs.IR cs.LG

Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking

联合模型参数缩放与通用域数据集成用于电商搜索排序

Liren Yu, Caiyuan Li, Feiyi Dong, Tao Zhang, Zhixuan Zhang, Dan Ou, Haihong Tang, Bo Zheng

发表机构 * Taobao & Tmall Group of Alibaba, Hangzhou, China； Taobao & Tmall Group of Alibaba, Beijing, China； Taobao & Tmall Group of Alibaba

AI总结本文研究了电商搜索排序中模型参数扩展与数据质量提升的联合优化问题，指出单纯增加模型规模效果有限，而异构大规模行为数据的处理也难以仅靠架构调整解决。为此，作者提出UniScale框架，包含两个核心组件：ES$^3$系统通过引入跨域示例和全局监督信号扩展训练数据，HHSFT模型则通过分层特征交互和用户兴趣融合处理异构数据。实验表明，UniScale在离线和在线测试中均显著提升了搜索效果，包括订单量和GMV的提升。

详情

AI中文摘要

工业搜索、广告和推荐的缩放研究主要强调扩大模型容量或改进架构。然而在现实系统中，性能不仅受限于模型大小，还受限于训练数据的质量和分布。我们的实证分析显示了两个关键瓶颈：单独增加参数带来的收益逐渐减小，且异构大规模行为数据引入的挑战无法仅通过架构调整完全解决。为解决此问题，我们提出了UniScale，一个将数据缩放与模型设计相结合的统一框架。UniScale包含两个组件。首先，ES$^3$，一个全空间样本构建系统，通过用全局归因的监督信号丰富域内搜索上下文，并引入反映用户在可比内容曝光条件下决策的跨域示例，将监督范围扩展到传统采样训练数据之外。其次，HHSFT，一个异构层次融合Transformer，旨在通过跨整个行为空间的层次化特征交互和用户兴趣融合，利用由此产生的大规模异构数据。这些组件共同实现了比仅以结构为中心的优化更有效的缩放。实验表明，UniScale持续改善离线性能，并展现出有利的缩放行为。在大型电商搜索平台的在线A/B测试中，它带来了1.70%的购买量提升和2.04%的GMV提升。

英文摘要

Scaling studies for industrial search, advertising, and recommendation have largely emphasized enlarging model capacity or refining architectures. Yet in real-world systems, performance is constrained not only by model size but also by the quality and distribution of training data. Our empirical analysis shows two key bottlenecks: increasing parameters alone yields progressively smaller gains, and the challenges introduced by heterogeneous, large-scale behavior data cannot be fully resolved by architecture tuning in isolation. To address this issue, we present UniScale, a unified framework that couples data scaling with model design. UniScale consists of two components. First, ES$^3$, an entire-space sample construction system, broadens supervision beyond conventional sampled training data by enriching intra-domain search contexts with globally attributed supervisory signals and introducing cross-domain examples that reflect user decisions under comparable content exposure conditions. Second, HHSFT, a heterogeneous hierarchical fusion transformer, is tailored to exploit the resulting large-scale heterogeneous data through hierarchical feature interaction and user-interest fusion across the entire behavior space. Together, these components enable more effective scaling than structure-centric optimization alone. Experiments show that UniScale consistently improves offline performance and demonstrates favorable scaling behavior. In online A/B tests on a large e-commerce search platform, it delivers a 1.70% increase in purchases and a 2.04% lift in GMV.

2603.18551 2026-05-25 math.OC cs.CC cs.LG

Learning Decision-Sufficient Representations for Linear Optimization

学习线性优化的决策充分表示

Yuhan Ye, Saurabh Amin, Asuman Ozdaglar

发表机构 * MIT（麻省理工学院）

AI总结本文研究如何构建压缩数据集以恢复具有未知成本向量的线性规划问题中的最优决策。作者证明了确定决策相关维度 $d^\star$ 是 NP 难的，并提出了一种点态充分性概念，从而在多项式时间内构造出适用于单个成本向量的决策数据集。进一步地，他们提出了一种累积算法，在独立同分布成本假设下实现稳定压缩，并给出了分布无关的 PAC 保证，同时将决策充分性表示应用于上下文线性优化，获得了更优的泛化界。

Comments 45 pages plus appendix, 2 figures. Accepted at COLT 2026

详情

AI中文摘要

我们研究如何构建压缩数据集，使其足以恢复未知成本向量$c$位于先验集$\mathcal{C}$中的线性规划的最优决策。Bennouna等人最近的工作通过内在的决策相关维度$d^\star$给出了充分决策数据集（SDDs）的精确几何刻画。然而，他们构建最小规模SDD的算法需要求解混合整数规划。在本文中，我们建立了硬度结果，表明计算$d^\star$是NP难的，判定数据集是否全局充分是coNP难的，从而解决了Bennouna等人提出的一个近期开放问题。为了应对这种最坏情况下的难解性，我们引入了点态充分性，这是一种要求对单个成本向量充分的松弛。在非退化条件下，我们提供了一种多项式时间的切割平面算法来构建点态充分的决策数据集。在具有独立同分布成本的数据驱动框架下，我们进一步提出了一种累积算法，该算法跨样本聚合决策相关方向，产生一个大小至多为$d^\star$的稳定压缩方案。这导致了一个无分布PAC保证：以高概率，在训练样本上，新样本的点态充分失败概率至多为$\tilde{O}(d^\star/n)$，且该速率在对数因子意义下是紧的。最后，我们将决策充分表示应用于上下文线性优化，获得压缩预测器，其泛化界为$\tilde{O}(\sqrt{d^\star/n})$而非$\tilde{O}(\sqrt{d/n})$，其中$d$是环境成本维度。

英文摘要

We study how to construct compressed datasets that suffice to recover optimal decisions in linear programs with an unknown cost vector $c$ lying in a prior set $\mathcal{C}$. Recent work by Bennouna et al. provides an exact geometric characterization of sufficient decision datasets (SDDs) via an intrinsic decision-relevant dimension $d^\star$. However, their algorithm for constructing minimum-size SDDs requires solving mixed-integer programs. In this paper, we establish hardness results showing that computing $d^\star$ is NP-hard and deciding whether a dataset is globally sufficient is coNP-hard, thereby resolving a recent open problem posed by Bennouna et al. To address this worst-case intractability, we introduce pointwise sufficiency, a relaxation that requires sufficiency for an individual cost vector. Under nondegeneracy, we provide a polynomial-time cutting-plane algorithm for constructing pointwise-sufficient decision datasets. In a data-driven regime with i.i.d. costs, we further propose a cumulative algorithm that aggregates decision-relevant directions across samples, yielding a stable compression scheme of size at most $d^\star$. This leads to a distribution-free PAC guarantee: with high probability over the training sample, the pointwise sufficiency failure probability on a fresh draw is at most $\tilde{O}(d^\star/n)$, and this rate is tight up to logarithmic factors. Finally, we apply decision-sufficient representations to contextual linear optimization, obtaining compressed predictors with generalization bounds scaling as $\tilde{O}(\sqrt{d^\star/n})$ rather than $\tilde{O}(\sqrt{d/n})$, where $d$ is the ambient cost dimension.

2603.18123 2026-05-25 eess.IV cs.AI

Understanding Task Aggregation for Generalizable Ultrasound Foundation Models

理解可泛化超声基础模型的任务聚合

Fangyijie Wang, Tanya Akumu, Vien Ngoc Dang, Amelia Jiménez-Sánchez, Jieyun Bai, Guénolé Silvestre, Karim Lekadir, Kathleen M. Curran

发表机构 * Research Ireland Centre for Research Training in Machine Learning； Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain； School of Medicine, University College Dublin, Dublin, Ireland； School of Computer Science, University College Dublin, Dublin, Ireland； Institució Catalana de Recerca i Estudis Avançats (ICREA)； Department of Cardiovascular Surgery, The First Affiliated Hospital of Jinan University, Jinan University, Guangzhou, China； Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand

AI总结该研究探讨了如何在通用超声基础模型中有效整合多种临床任务，分析了任务聚合策略对模型性能的影响。研究提出，任务性能下降并非源于模型容量不足，而是任务异质性与训练数据规模之间的相互作用被忽视所致。为此，作者提出了基于DINOv3的多器官多任务框架M2DINO，并通过系统实验发现，任务聚合的效果高度依赖于数据规模，统一训练在低数据场景下表现更稳定，而临床分组训练可能带来负面影响。研究还揭示了不同任务类型对聚合策略的敏感性差异，为超声基础模型的设计提供了重要指导。

详情

AI中文摘要

基础模型有望在单一框架内统一多个临床任务，但最近的超声研究报告称统一模型可能不如特定任务基线。我们假设这种退化并非源于模型容量限制，而是由于任务聚合策略忽略了任务异质性与可用训练数据规模之间的相互作用。在这项工作中，我们系统分析了何时可以联合学习异质超声任务而不损失性能，为统一临床成像模型中的任务聚合建立了实用标准。我们引入了M2DINO，一个基于DINOv3的多器官、多任务框架，配备任务条件专家混合模块以实现自适应容量分配。我们系统评估了涵盖分割、分类、检测和回归的27项超声任务，采用三种范式：特定任务、临床分组和全任务统一训练。结果表明，聚合效果强烈依赖于训练数据规模。虽然临床分组训练可以在数据丰富的环境中提高性能，但在低数据环境中可能引发显著的负迁移。相比之下，全任务统一训练在临床组间表现出更一致的性能。我们进一步观察到，在我们的实验中，任务敏感性因任务类型而异：与回归和分类相比，分割显示出最大的性能下降。这些发现为超声基础模型提供了实用指导，强调聚合策略应同时考虑训练数据可用性和任务特性，而非仅依赖临床分类。

英文摘要

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified models can underperform task-specific baselines. We hypothesize that this degradation arises not from model capacity limitations, but from task aggregation strategies that ignore interactions between task heterogeneity and available training data scale. In this work, we systematically analyze when heterogeneous ultrasound tasks can be jointly learned without performance loss, establishing practical criteria for task aggregation in unified clinical imaging models. We introduce M2DINO, a multi-organ, multi-task framework built on DINOv3 with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation. We systematically evaluate 27 ultrasound tasks spanning segmentation, classification, detection, and regression under three paradigms: task-specific, clinically-grouped, and all-task unified training. Our results show that aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. We further observe that task sensitivity varies by task type in our experiments: segmentation shows the largest performance drops compared with regression and classification. These findings provide practical guidance for ultrasound foundation models, emphasizing that aggregation strategies should jointly consider training data availability and task characteristics rather than relying on clinical taxonomy alone.

2603.15278 2026-05-25 eess.SY cs.RO cs.SY

Encirclement Guaranteed Finite-Time Capture against Unknown Evader Strategies

针对未知逃逸策略的包围保证有限时间捕获

Dinesh Patra, Prajakta Surve, Ashish R. Hota, Shaunak D. Bopardikar

发表机构 * Department of Electrical Engineering, IIT Kharagpur（印度理工学院Kharagpur分校电气工程系）； Department of Electrical and Computer Engineering, Michigan State University（密歇根州立大学电气与计算机工程系）

AI总结本文研究了在二维无界环境中，多个追捕者在未知逃逸策略下对单个逃逸者进行有限时间捕获的问题。提出了一类保证在有限时间内完成捕获并保持逃逸者始终被包围的策略，且该策略对逃逸者的策略具有鲁棒性。研究还推导了捕获时间的上界，并通过数值实验验证了所提方法的有效性。

2603.04005 2026-05-25 cs.IT cs.LG math.IT

Training-Free Rate-Distortion-Perception Traversal With Diffusion

基于扩散模型的免训练率-失真-感知遍历

Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang

发表机构 * Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong（信息工程系，香港中文大学，香港）； College of Electronic and Information Engineering, Shenzhen University, Shenzhen（电子与信息工程学院，深圳大学，深圳）

AI总结本文研究了有损压缩中比特率、重构保真度和感知质量之间的率-失真-感知（RDP）权衡问题，提出了一种无需重新训练即可遍历整个RDP曲面的免训练框架。该方法结合预训练的扩散模型与反向信道编码模块，引入了一种基于分数缩放的概率流ODE解码器，并从理论上证明了该解码器在AWGN观测下对失真-感知权衡是最优的，且整体框架在高斯情形下达到最优RDP函数。实验表明，该框架能够灵活有效地利用预训练扩散模型实现对RDP三元权衡的自适应压缩。

Comments Accepted by the Forty-Third International Conference on Machine Learning (ICML) 2026

AI中文摘要

率失真感知（RDP）权衡刻画了有损压缩的基本极限，同时考虑比特率、重建保真度和感知质量。虽然最近的神经压缩方法提高了感知性能，但它们通常在RDP曲面上的固定点运行，需要重新训练以针对不同的权衡。在这项工作中，我们提出了一个无需训练的框架，利用预训练扩散模型遍历整个RDP曲面。我们的方法将反向信道编码（RCC）模块与新颖的分数缩放概率流ODE解码器相结合。我们从理论上证明，所提出的扩散解码器在AWGN观测下对失真-感知权衡是最优的，并且带有RCC模块的整体框架在高斯情况下实现了最优RDP函数。跨多个数据集的实证结果证明了该框架在使用预训练扩散模型导航三元RDP权衡时的灵活性和有效性。我们的结果为自适应、面向感知质量的压缩建立了一种实用且具有理论依据的方法。

英文摘要

The rate-distortion-perception (RDP) tradeoff characterizes the fundamental limits of lossy compression by jointly considering bitrate, reconstruction fidelity, and perceptual quality. While recent neural compression methods have improved perceptual performance, they typically operate at fixed points on the RDP surface, requiring retraining to target different tradeoffs. In this work, we propose a training-free framework that leverages pre-trained diffusion models to traverse the entire RDP surface. Our approach integrates a reverse channel coding (RCC) module with a novel score-scaled probability flow ODE decoder. We theoretically prove that the proposed diffusion decoder is optimal for the distortion-perception tradeoff under AWGN observations and that the overall framework with the RCC module achieves the optimal RDP function in the Gaussian case. Empirical results across multiple datasets demonstrate the framework's flexibility and effectiveness in navigating the ternary RDP tradeoff using pre-trained diffusion models. Our results establish a practical and theoretically grounded approach to adaptive, perception-aware compression.

2602.13480 2026-05-25 cs.CR cs.LG

MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection

MELT：用于高风险 Memecoin 发行检测的行为轨迹数据集

Sihao Hu, Selim Furkan Tekin, Yichang Xu, Ling Liu

发表机构 * School of Computer Science（计算机科学学院）

AI总结本文提出MELT，一个用于检测高风险模因币发行的行为轨迹数据集。该数据集基于Solana区块链，包含超过41,000次模因币发行的2亿多笔交易，提取了包括交易类型、账户协调行为等结构化行为记录，揭示了发行方隐藏真实控制权的策略。MELT还提供了122个行为特征和风险等级标注，支持大规模监督学习，并通过实验验证了其在风险检测中的有效性，为模因币投资风险缓解提供了新方法。

详情

AI中文摘要

Launchpad 已成为发行 memecoin 的主要机制，使投资者面临现有 rug-pull 检测方法无法捕捉的新型高风险发行。我们认为，检测这些威胁需要结构化的行为轨迹，这些轨迹隐藏在原始异构区块链数据之下，即内部人员如何积累、协调和解除头寸。为了实现这种分析，我们引入了 MELT（Memecoin 发行轨迹），这是第一个用于分析和检测 Solana 上高风险 memecoin 发行的行为轨迹数据集。MELT 覆盖了 41k+ 个 memecoin 发行，包含 200M+ 笔交易，这些交易被解析为类型化的行为记录，区分了交换、洗盘交易、转账和铸造。除了每个账户的行为外，MELT 还贡献了捆绑轨迹数据，该数据链接了同一实体控制的账户，揭示平均 36.5% 的代币供应由协调账户持有，这是一种隐藏策略，使真正的所有权集中度不被不知情的买家察觉。在这些轨迹之上，MELT 提供了 122 个行为特征和风险级别标注，使得在总体规模上进行监督学习成为可能。我们在高风险发行检测任务上对代表性 ML 模型进行了基准测试。将其预测整合到一个简单的 memecoin 选择策略中，显著减少了投资损失，证明了行为轨迹可以转化为风险缓解。我们的数据集和代码可在 https://github.com/git-disl/MELT 获取。

英文摘要

Launchpads have become the dominant mechanism for issuing memecoins, exposing investors to a new class of high-risk launches that existing rug-pull detection methods cannot capture. We argue that detecting these threats requires structured behavioral traces that underlie raw heterogeneous blockchain data, i.e., how insiders accumulate, coordinate, and unwind positions. To enable such analysis, we introduce MELT (MEmecoin Launch Trace), the first behavioral trace dataset for analyzing and detecting high-risk memecoin launches on Solana. MELT covers 41k+ memecoin launches with 200M+ transactions parsed into typed behavioral records that distinguish swaps, wash trades, transfers, and mints. Beyond per-account behaviors, MELT contributes bundle-trace data that links accounts controlled by the same entity, revealing that, on average, 36.5% of token supply is held by coordinated accounts, a concealment strategy that disguises the true ownership concentration from unsuspecting buyers. On top of these traces, MELT provides 122 behavioral features and risk-level annotations, enabling supervised learning at a population scale. We benchmark representative ML models on the high-risk launch detection task. Integrating their predictions into a simple memecoin selection strategy reduces investment loss significantly, demonstrating that behavioral traces can be translated into risk mitigation. Our dataset and code are available at https://github.com/git-disl/MELT.
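The notion of supply held by coordinated accounts can be made concrete with a small sketch. It does not use the MELT schema or the authors' bundling method: it assumes a hypothetical list of pairwise `links` between accounts believed to share an owner, groups them with a union-find structure, and reports the share of supply held by any group of at least two accounts. All account names and balances are made up.

```python
def find(parent, a):
    # Path-halving find for a dict-based union-find.
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def coordinated_share(balances, links, min_bundle_size=2):
    """Fraction of total supply held by accounts that belong to a linked bundle.

    Assumes every account named in `links` also appears in `balances`.
    """
    parent = {acct: acct for acct in balances}
    for a, b in links:
        parent[find(parent, a)] = find(parent, b)  # merge the two groups
    sizes = {}
    for acct in balances:
        root = find(parent, acct)
        sizes[root] = sizes.get(root, 0) + 1
    total = sum(balances.values())
    held = sum(bal for acct, bal in balances.items()
               if sizes[find(parent, acct)] >= min_bundle_size)
    return held / total

# Hypothetical token balances and ownership links (toy data, not from MELT).
balances = {"dev": 150, "w1": 100, "w2": 50, "alice": 400, "bob": 300}
links = [("dev", "w1"), ("w1", "w2")]
share = coordinated_share(balances, links)
print(f"coordinated share = {share:.1%}")
```

Union-find keeps the grouping near-linear in the number of links, which matters if the same idea were applied to transaction volumes on the scale the abstract reports.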

2602.13249 2026-05-25 q-bio.BM cs.AI cs.LG

A Systematic Evaluation of Co-folding Model Representations for Small-Molecule Learning

小分子学习的共折叠模型表示的系统评估

Hyosoon Jang, Hyunjin Seo, Honghui Kim, Seonghyun Park, Taewon Kim, Yunhui Jang, Sungsoo Ahn

发表机构 * KAIST（韩国科学技术院）

AI总结本文系统评估了基于蛋白质-配体共折叠的模型在小分子学习中的表示能力。研究使用现代共折叠模型Boltz2，将其原子级配体表示迁移到独立的小分子任务中，结果表明其性能在ADMET基准测试中达到或超越现有模型，并提升了分子生成建模和结构引导的配体优化效率。此外，Boltz2的表示与传统独立分子监督方法具有互补性，并可应用于强化学习以增强分子发现过程。这些结果表明，蛋白质-配体共折叠是一种有前景的小分子表示学习预训练范式。

详情

AI中文摘要

小分子基础模型通常仅在独立分子数据上进行预训练，这与视觉和语言模型不同，后者通常受益于跨模态或关系监督。蛋白质-配体共折叠通过将模型暴露于原子级配体-蛋白质相互作用，提供了这种监督的分子类似物，引发了一个问题：共折叠模型能否产生强大的小分子表示。我们使用现代共折叠模型Boltz2研究这个问题，通过将其原子级配体表示转移到独立的小分子任务。通过系统探测和蒸馏，我们表明Boltz2表示在ADMET基准上匹配或超越现有模型，加速分子生成建模，并提高结构引导配体优化的样本效率。我们进一步发现Boltz2表示与从传统独立分子监督（包括3D构象、生物测定标签和量子化学性质）中学习到的表示互补。最后，我们将表示对齐扩展到强化学习，表明密集的表示级监督可以补充分子发现中的标量奖励。这些结果将蛋白质-配体共折叠确定为小分子表示学习的有前景的预训练范式，并将Boltz2定位为强大的现成分子基础模型。

英文摘要

Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue of such supervision by exposing models to atom-level ligand-protein interactions, raising the question of whether co-folding models can yield strong small-molecule representations. We study this question using Boltz2, a modern co-folding model, by transferring its atom-level ligand representations to standalone small-molecule tasks. Through systematic probing and distillation, we show that Boltz2 representations match or outperform existing models on the ADMET benchmark, accelerate molecular generative modeling, and improve sample efficiency in structure-guided ligand optimization. We further find that Boltz2 representations are complementary to those learned from conventional standalone molecular supervision, including 3D conformers, bioassay labels, and quantum-chemical properties. Finally, we extend representation alignment to reinforcement learning, showing that dense representation-level supervision can complement scalar rewards in molecular discovery. These results identify protein-ligand co-folding as a promising pretraining paradigm for small-molecule representation learning and position Boltz2 as a strong, off-the-shelf molecular foundation model.

2602.13241 2026-05-25 cs.CY cs.AI cs.HC

Empowering 9-1-1 Calltaking Training with Generative AI: Experiences and Lessons Learned

赋能 9-1-1 接警培训：生成式 AI 的经验与教训

Zirong Chen, Yilin Liu, Meiyi Ma

发表机构 * College of Connected Computing（连接计算学院）； Vanderbilt University（范德比大学）

AI总结该研究探讨了如何利用生成式人工智能（GenAI）提升9-1-1紧急电话接线员的培训效率，以应对人员短缺和传统培训方式难以扩展的问题。研究团队与纳什维尔都会区紧急通信部（MNDEC）合作，开发并部署了一套基于生成式AI的培训系统，经过六个月的实际应用，系统覆盖了190名用户，进行了1120次培训。通过分析包含98429次用户交互的部署日志，研究总结出四条关键经验，为在公共安全领域应用AI驱动培训系统提供了切实可行的设计与治理建议。

Comments Accepted at IEEE SmartComp 2026

详情

AI中文摘要

紧急接警员是公共安全响应的第一操作环节，每年处理超过 2.4 亿次呼叫，同时面临持续的培训危机：许多中心的人员短缺超过 25%，而培训一名新员工可能需要多达 720 小时的一对一指导，这会使得经验丰富的人员脱离现役。传统培训方法在这些限制下难以扩展，限制了覆盖范围和反馈及时性。与 Metro Nashville 紧急通信部（MNDEC）合作，我们在现实约束下设计、开发和部署了一个基于生成式 AI 的接警培训系统。在六个月内，部署从初始试点扩展到 190 名运营用户，覆盖 1120 次培训会话，暴露了在受控或纯模拟评估中基本不可见的系统交付、严谨性、弹性和人为因素方面的系统性挑战。通过分析记录 98429 次用户交互、组织流程和利益相关者参与模式的部署日志，我们提炼出四个关键教训，每个教训都附有具体的设计和治理实践。这些教训为在安全关键公共部门环境中寻求交付 AI 驱动培训系统的研究人员和实践者提供了基于实践的指导，在这些环境中，实际约束从根本上塑造了以人为本的设计。

英文摘要

Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, deployment scaled from initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to deliver AI-driven training systems in safety-critical public sector environments where practical constraints fundamentally shape human-centric design.

2602.08927 2026-05-25 stat.ML cs.LG stat.ME

Online monotone density estimation and log-optimal calibration

在线单调密度估计与对数最优校准

Rohan Hore, Ruodu Wang, Aaditya Ramdas

发表机构 * Department of Statistics and Data Science, Carnegie Mellon University, USA（统计与数据科学系，卡内基梅隆大学，美国）； Department of Statistics and Actuarial Science, University of Waterloo, Canada（统计与精算科学系，滑铁卢大学，加拿大）

AI总结本文研究在线单调密度估计问题，即从序列观测数据中可预测地构建密度估计器。作者提出了两种在线估计方法：一种是经典Grenander估计器的在线版本，另一种是受在线学习中指数加权方法启发的专家聚合估计器。理论分析表明，在密度单调的设定下，所提估计器与真实密度之间的累积对数似然差距具有$O(n^{1/3})$的上界，并且专家聚合估计器相对于最优离线估计器具有$\sqrt{n\log{n}}$的路径遗憾界。此外，作者还展示了该问题与序贯假设检验中对数最优p值到e值校准的联系，并基于所提方法构建了经验自适应的校准器。

Comments 31 pages, 2 figures

详情

AI中文摘要

我们研究在线单调密度估计问题，其中密度估计器必须根据顺序观测数据以可预测的方式构建。我们提出两种在线估计器：经典Grenander估计器的在线类比，以及受在线学习文献中指数加权方法启发的专家聚合估计器。在良好指定的随机设定下，即底层密度是单调的，我们证明在线估计器与真实密度之间的期望累积对数似然差距具有$O(n^{1/3})$界。我们进一步建立了专家聚合估计器相对于事后选择的离线最优单调估计器的$\sqrt{n\log{n}}$路径后悔界，对观测序列的正则性假设要求极低。作为一个独立兴趣的应用，我们证明构建用于序贯假设检验的对数最优p-to-e校准器的问题可以表述为在线单调密度估计问题。我们调整所提出的估计器以构建经验自适应的p-to-e校准器，并证明其最优性。数值实验验证了理论结果。

英文摘要

We study the problem of online monotone density estimation, where density estimators must be constructed in a predictable manner from sequentially observed data. We propose two online estimators: an online analogue of the classical Grenander estimator, and an expert aggregation estimator inspired by exponential weighting methods from the online learning literature. In the well-specified stochastic setting, where the underlying density is monotone, we show that the expected cumulative log-likelihood gap between the online estimators and the true density admits an $O(n^{1/3})$ bound. We further establish a $\sqrt{n\log{n}}$ pathwise regret bound for the expert aggregation estimator relative to the best offline monotone estimator chosen in hindsight, under minimal regularity assumptions on the observed sequence. As an application of independent interest, we show that the problem of constructing log-optimal p-to-e calibrators for sequential hypothesis testing can be formulated as an online monotone density estimation problem. We adapt the proposed estimators to build empirically adaptive p-to-e calibrators and establish their optimality. Numerical experiments illustrate the theoretical results.
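The link between p-to-e calibration and monotone densities can be seen in the classical fixed calibrator $f(p) = κ p^{κ-1}$ with $κ \in (0, 1)$: it is decreasing on $(0, 1]$ and integrates to 1 over $[0, 1]$, so it is itself a monotone density. The sketch below uses that fixed calibrator only; it is not the adaptive calibrator proposed in the paper, and the p-value stream is hypothetical.

```python
def calibrator(p, kappa=0.5):
    """Classical p-to-e calibrator f(p) = kappa * p**(kappa - 1) for kappa in (0, 1).

    f is decreasing on (0, 1] and integrates to 1 over [0, 1] (its antiderivative
    is p**kappa), so it is a monotone density on the unit interval.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    return kappa * p ** (kappa - 1.0)

# Hypothetical stream of p-values. Multiplying the resulting e-values is
# justified here under the assumption that the p-values are independent.
p_values = [0.01, 0.20, 0.04, 0.50]
wealth = 1.0
for p in p_values:
    wealth *= calibrator(p)

print(f"accumulated e-value = {wealth:.3f}")
```

A fixed `kappa` has to be chosen before seeing any data. Learning the shape of the calibrator from the observed p-values is where, according to the abstract, online monotone density estimation enters.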

2602.00979 2026-05-25 cs.CR cs.AI cs.CL

GradingAttack: Exposing Security Vulnerabilities in LLM Based Educational Grading Agents

Xueyi Li, Zhuoneng Zhou, Zitao Liu, Yongdong Wu

Affiliations: Guangdong Institute of Smart Education; Jinan University

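AI summary: As large language models (LLMs) are widely adopted for automatic short answer grading, their security is drawing increasing attention. This paper proposes GradingAttack, a fine-grained adversarial attack framework for systematically evaluating security vulnerabilities in LLM-based educational grading agents. Its token-level and prompt-level attack strategies manipulate grading outcomes while remaining highly stealthy, exposing the weak defenses of current systems against adversarial attacks and underscoring the need for secure and trustworthy educational agent systems.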

Abstract

Large language models (LLMs) are increasingly deployed as educational agents for automatic short answer grading (ASAG) in real-world educational environments, significantly boosting assessment efficiency and scalability. However, when these grading agents operate "in the wild", their vulnerability to adversarial manipulation raises critical concerns about agent security and trustworthiness. In this paper, we introduce GradingAttack, a fine-grained adversarial attack framework that systematically evaluates the security vulnerabilities of LLM-based educational grading agents. Specifically, we design token-level and prompt-level attack strategies that manipulate agent grading outcomes while maintaining high stealth, exposing fundamental weaknesses in current agent deployments. Experiments on multiple datasets demonstrate that both attack strategies effectively compromise grading agents, with prompt-level attacks achieving higher success rates and token-level attacks exhibiting superior stealth capability. Our findings reveal that current LLM-based educational agents lack robust defenses against adversarial attacks, underscoring the urgent need for developing secure and trustworthy agent systems for critical educational applications.

2601.22367 2026-05-25 stat.ML cs.LG

Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation

Shiyi Sun, Geoff K. Nicholls, Jeong Eun Lee

Affiliations: Department of Statistics, University of Oxford, Oxford, United Kingdom; Department of Statistics, University of Auckland, Auckland, New Zealand

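AI summary: This paper proposes a generalized Bayesian inference method based on neural posterior estimation, in which a temperature parameter $\beta$ mitigates overconfidence under model misspecification and improves robustness. It gives a fully amortized variational approximation that draws posterior samples for any dataset and any value of $\beta$ in a single forward pass, without simulator calls or MCMC. With two complementary training strategies, the method performs comparably to non-amortized MCMC methods on several standard simulation-based inference benchmarks, with high efficiency and stability.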

Comments Accepted at ICML 2026

Abstract

Generalized Bayesian Inference (GBI) tempers a loss with a temperature $\beta > 0$ to mitigate overconfidence and improve robustness under model misspecification, but existing GBI methods typically rely on costly MCMC or SDE-based samplers and must be re-run for each new dataset and each $\beta$ value. We give the first fully amortized variational approximation for the tempered posterior family by training a single data- and $\beta$-conditioned neural posterior estimator that enables sampling in a single forward pass, without simulator calls or inference-time MCMC. We introduce two complementary training routes: one synthesizes off-manifold samples from the tempered joint distribution, and the other reweights a fixed base dataset using self-normalized importance sampling (SNIS). We show that the SNIS-weighted objective provides a consistent forward-KL fit to the tempered posterior with finite weight variance. Across four standard simulation-based inference benchmarks, including the chaotic Lorenz-96 system, our $\beta$-amortized estimator achieves competitive posterior approximations, in standard two-sample metrics, matching non-amortized MCMC-based power-posterior samplers over a wide range of temperatures.
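
To make the idea of self-normalized importance sampling toward a tempered posterior concrete, here is a toy sketch, not the paper's training procedure. It assumes a conjugate Gaussian model chosen here for illustration (prior $\theta \sim N(0, 1)$, likelihood $x \mid \theta \sim N(\theta, 1)$) and a power posterior proportional to prior times likelihood raised to $\beta$, whose mean is $\beta x / (1 + \beta)$ in closed form. The function name `snis_tempered_mean` and all defaults are hypothetical.

```python
import numpy as np

def snis_tempered_mean(x_obs, beta, n_samples=200_000, seed=0):
    """SNIS estimate of the tempered-posterior mean in a toy Gaussian model.

    Proposal: the prior N(0, 1). Target: prior(theta) * N(x_obs | theta, 1)**beta.
    The model and the name of this function are assumptions of the sketch.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=n_samples)   # draws from the prior
    # Unnormalized log-weights: beta times the Gaussian log-likelihood
    # (additive constants cancel after self-normalization).
    log_w = -0.5 * beta * (x_obs - theta) ** 2
    log_w -= log_w.max()                           # numerical stability
    w = np.exp(log_w)
    w /= w.sum()                                   # self-normalize
    return float(np.sum(w * theta))

if __name__ == "__main__":
    x_obs, beta = 1.0, 0.5
    print("SNIS estimate:", snis_tempered_mean(x_obs, beta))
    print("closed form  :", beta * x_obs / (1.0 + beta))
```

The same weights could in principle reweight any fixed set of prior draws for several values of $\beta$ without drawing new samples, which is the general appeal of reweighting a fixed base dataset. How the paper builds its weighted training objective is not specified in the abstract.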

2601.21198 2026-05-25 cs.DC cs.AI cs.LG

ZipMoE: Efficient On-Device MoE Serving via Lossless Compression and Cache-Affinity Scheduling

Yuchen Yang, Yaru Zhao, Pu Yang, Shaowei Wang, Zhi-Hua Zhou

Affiliations: School of Electronic Science and Engineering, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China

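AI summary: This paper presents ZipMoE, an efficient MoE serving system for edge devices that targets the high memory consumption of MoE architectures in large language models on resource-constrained devices. By combining the hardware characteristics of edge devices with the statistical redundancy of MoE parameters, ZipMoE uses a caching-scheduling co-design with provable performance guarantees that turns on-device MoE inference from an I/O-bound workload into a compute-centric workflow amenable to efficient parallelization. Experiments on several edge computing platforms show substantially lower inference latency and higher throughput than existing state-of-the-art systems.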

Comments ICML 2026

Abstract

While Mixture-of-Experts (MoE) architectures substantially bolster the expressive power of large language models, their prohibitive memory footprint severely impedes practical deployment on resource-constrained edge devices, especially when model behavior must be preserved without relying on lossy quantization. In this paper, we present ZipMoE, an efficient and semantically lossless on-device MoE serving system. ZipMoE exploits the synergy between the hardware properties of edge devices and the statistical redundancy inherent to MoE parameters via a caching-scheduling co-design with a provable performance guarantee. Fundamentally, our design shifts the paradigm of on-device MoE inference from an I/O-bound bottleneck to a compute-centric workflow that enables efficient parallelization. We implement a prototype of ZipMoE and conduct extensive experiments on representative edge computing platforms using popular open-source MoE models and real-world workloads. Our evaluation reveals that ZipMoE achieves up to $72.77\%$ inference latency reduction and up to $6.76\times$ higher throughput than the state-of-the-art systems. Our code is available at: https://github.com/npnothard/ZipMoE-ICML26.

2601.19117 2026-05-25 eess.IV cs.CV stat.AP

Optimized $k$-means color quantization of digital images in machine-based and human perception-based colorspaces

Ranjan Maitra

Affiliations: Department of Statistics, Iowa State University

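AI summary: This study examines $k$-means color quantization of digital images in different colorspaces, comparing RGB, CIE-XYZ, and CIE-LUV/CIE-HCL at several quantization levels. Measured by Visual Information Fidelity (VIF), $k$-means performs best in RGB in about half of the cases; at higher quantization levels CIE-XYZ usually does better, and in some cases at lower levels CIE-LUV is best. The study also analyzes how the distributions of hue, chromaticity, and luminance affect the choice of colorspace, giving finer-grained guidance for color quantization across scenarios.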

Comments 25 pages, 11 figures, 5 tables, accepted in the Journal of Electronic Imaging

Journal ref Journal of Electronic Imaging, Vol. 35, Issue 2, 023002 (Mar 2026)

DOI: 10.1117/1.JEI.35.2.023002

Abstract

Color quantization represents an image using a fraction of its original number of colors while only minimally losing its visual quality. The $k$-means algorithm is commonly used in this context, but has mostly been applied in the machine-based RGB colorspace composed of the three primary colors. However, some recent studies have indicated its improved performance in human perception-based colorspaces. We investigated the performance of $k$-means color quantization at four quantization levels in the RGB, CIE-XYZ, and CIE-LUV/CIE-HCL colorspaces, on 148 varied digital images spanning a wide range of scenes, subjects and settings. The Visual Information Fidelity (VIF) measure numerically assessed the quality of the quantized images, and showed that in about half of the cases, $k$-means color quantization is best in the RGB space, while at other times, and especially for higher quantization levels ($k$), the CIE-XYZ colorspace is where it usually does better. There are also some cases, especially at lower $k$, where the best performance is obtained in the CIE-LUV colorspace. Further analysis of the performances in terms of the distributions of the hue, chromaticity and luminance in an image presents a nuanced perspective and characterization of the images for which each colorspace is better for $k$-means color quantization.
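
For readers unfamiliar with the technique, $k$-means color quantization can be sketched as clustering the pixels of an image and replacing each pixel with its cluster center. The code below is a minimal illustration using plain Lloyd iterations in whatever colorspace the input array is already expressed in. The function name `kmeans_quantize`, the random initialization, and the fixed iteration count are assumptions made here, not the paper's optimized procedure, and conversion to CIE-XYZ or CIE-LUV would have to happen before the call.

```python
import numpy as np

def kmeans_quantize(image, k, n_iter=20, seed=0):
    """Quantize an (H, W, C) image to at most k colors with Lloyd's algorithm.

    Hypothetical helper for illustration; returns a float array of the same
    shape in which every pixel is replaced by its cluster center.
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)

    # Initialize centers at distinct colors that occur in the image.
    unique = np.unique(pixels, axis=0)
    n_centers = min(k, len(unique))
    centers = unique[rng.choice(len(unique), size=n_centers, replace=False)]

    def assign(centers):
        # Squared Euclidean distance from every pixel to every center.
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for j in range(n_centers):
            members = pixels[labels == j]
            if len(members):                 # leave empty clusters unchanged
                centers[j] = members.mean(axis=0)

    labels = assign(centers)                 # final assignment
    return centers[labels].reshape(image.shape)
```

The distance matrix holds one entry per pixel and center, so this sketch suits small images only. A comparison across colorspaces like the one in the abstract would run the same routine on converted pixel values and score each result with a quality measure such as VIF.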
