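arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems science.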

2605.19122 2026-05-20 stat.ML cs.LG

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

Elynn Chen, Jiayu Li, Zheshi Zheng, Jian Pei

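Affiliations: New York University; University of Michigan; Duke University

AI summary: This paper proposes the Dual-Channel Tensor Neural Network (DC-TNN), which decomposes each tensor input into a low-rank core and a sparse refinement and processes the two through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores. For estimation, it establishes non-asymptotic risk bounds for the DC-TNN estimator and shows that the effective dimension is determined jointly by the core rank and the refinement sparsity. For inference, it develops a structure-aware conformal ROC procedure that yields ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, it proposes a conformal structure selector, described as the first distribution-free method with finite-sample validity for choosing among candidate tensor decompositions. Simulations and an analysis of a protein dataset show competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.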

Abstract

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that is destroyed under vectorization. Existing approaches either impose a single low-rank structure, which can miss localized signal, or treat the tensor as a long vector, which discards its multiway geometry. We propose a *Dual-Channel Tensor Neural Network* (DC-TNN) that decomposes each tensor input into a low-rank core and a sparse refinement, and processes the two components through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores within a single architecture. For estimation, we establish non-asymptotic risk bounds for the DC-TNN estimator that decompose into network approximation, core estimation, and refinement-selection terms, and show that the effective dimension is determined jointly by the core rank and refinement sparsity rather than by the ambient tensor size. For inference, we develop a *structure-aware conformal ROC* procedure that calibrates within the core-refinement latent space and produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, we propose a *conformal structure selector* that, to our knowledge, is the *first distribution-free procedure* for choosing among candidate tensor decompositions with finite-sample validity. Simulations and an analysis of a protein dataset demonstrate competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.
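
The core-plus-refinement split described in the abstract can be illustrated on a matrix (an order-2 tensor). The following is a minimal sketch, not the authors' DC-TNN: it assumes a rank-1 core fitted by power iteration and a refinement that keeps the `s` largest residual entries. The function names and the toy input are hypothetical.

```python
import math

def rank1_core(X, iters=100):
    """Leading rank-1 approximation sigma * u v^T of matrix X, by power iteration."""
    m, n = len(X), len(X[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(a * a for a in u))
        u = [a / norm_u for a in u]
        v = [sum(X[i][j] * u[i] for i in range(m)) for j in range(n)]
        sigma = math.sqrt(sum(b * b for b in v))
        v = [b / sigma for b in v]
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]

def sparse_refinement(R, s):
    """Keep the s largest-magnitude entries of residual R and zero out the rest."""
    rows, cols = len(R), len(R[0])
    cells = sorted(((abs(R[i][j]), i, j) for i in range(rows) for j in range(cols)),
                   reverse=True)
    keep = {(i, j) for _, i, j in cells[:s]}
    return [[R[i][j] if (i, j) in keep else 0.0 for j in range(cols)]
            for i in range(rows)]

def frobenius(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(a * a for row in A for a in row))

def subtract(A, B):
    """Entrywise difference A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Toy input (hypothetical): a rank-1 pattern plus one localized spike at (0, 0).
X = [[float(a * b) for b in (1, 1, 1, 1)] for a in (1, 2, 3)]
X[0][0] += 5.0

core = rank1_core(X)
residual = subtract(X, core)
refinement = sparse_refinement(residual, s=1)
err_core_only = frobenius(residual)
err_with_refinement = frobenius(subtract(residual, refinement))
```

In this sketch the description of `X` needs only the core factors plus `s` refinement entries, which is the intuition behind an effective dimension that depends on rank and sparsity rather than on the ambient size.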

2605.19119 2026-05-20 cs.NE cs.AI cs.LG

GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization

Xingyu Li

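Affiliations: Purdue University

AI summary: This paper proposes GOAL, a graph-based conditioned diffusion solver for dynamic multi-objective optimization. Conditioning on human-specified objectives makes decision generation controllable, and a heterogeneous graph encoding lets information propagate selectively according to the ontology of each constraint. On three canonical scheduling benchmarks, GOAL achieves 100% solution feasibility and near-zero MAPE.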

Abstract

Existing neural combinatorial optimization solvers frame solution search as imitation of optimal decisions, inherently limiting their utility to single-objective minimization and static constraints. We propose GOAL, a conditioned diffusion solver over relational graph representations that enables controllable decision generation by conditioning on human-specified objectives. We introduce a heterogeneous graph encoding in which distinct edge types, corresponding to different classes of constraints, define the message passing structure of the graph neural network, which allows information to propagate selectively according to the ontology of each constraint. GOAL is instantiated and evaluated on three canonical scheduling benchmarks of varying constraint complexity: the Flow Shop Problem (FSP), the Job Shop Scheduling Problem (JSP), and the Flexible Job Shop Scheduling Problem (FJSP). Generalization is demonstrated across structurally distinct constraint regimes and problem types without architectural modification. On all three benchmarks, GOAL achieves 100% solution feasibility and near-zero MAPE (below 0.20%) on multiple objectives for problem sizes up to 20 jobs and 60 operations, outperforming NSGA-II and MOEA/D in both solution quality and inference speed by up to 25x.
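
A heterogeneous graph encoding of a job shop instance might look like the sketch below, assuming two edge types: `precedence` edges between consecutive operations of a job and `same_machine` edges between operations that compete for a machine. The edge-type names and the `build_hetero_graph` helper are illustrative assumptions, not GOAL's actual encoding.

```python
from collections import defaultdict

def build_hetero_graph(jobs):
    """Encode a job shop instance as a graph with typed edges.

    jobs: list of jobs; each job is an ordered list of (machine, duration) operations.
    Returns (nodes, edges): nodes[i] = (job, op_index, machine, duration) and
    edges maps an edge-type name to a list of (src, dst) node ids.
    """
    nodes = []
    edges = defaultdict(list)
    by_machine = defaultdict(list)
    for j, ops in enumerate(jobs):
        prev = None
        for k, (machine, duration) in enumerate(ops):
            nid = len(nodes)
            nodes.append((j, k, machine, duration))
            if prev is not None:
                # Directed edge: the previous operation of the job must finish first.
                edges["precedence"].append((prev, nid))
            by_machine[machine].append(nid)
            prev = nid
    for ids in by_machine.values():
        # Operations sharing a machine are linked in both directions.
        for a in ids:
            for b in ids:
                if a < b:
                    edges["same_machine"].append((a, b))
                    edges["same_machine"].append((b, a))
    return nodes, dict(edges)

# Two jobs with two operations each, on machines 0 and 1 (toy instance).
nodes, edges = build_hetero_graph([[(0, 3), (1, 2)], [(1, 4), (0, 1)]])
```

A message-passing layer could then use separate weights per edge type, so that precedence information and machine-contention information propagate along different channels.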

2605.19113 2026-05-20 stat.ME cs.LG stat.ML

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

Ying Cui, Albert M Li, Vivek Charu, Yeon-Mi Hwang, Tina Hernandez-Boussard, Lu Tian

Affiliations: Department of Biomedical Data Science, Stanford University; Decatur High School; Department of Pathology, Stanford University School of Medicine; Division of Computational Medicine, Department of Medicine, Stanford University

AI summary: This paper proposes new machine learning algorithms that use a flexible greedy optimization strategy to learn interpretable point-based clinical risk scores directly, optimizing the additive score under explicit optimality objectives.

Comments 23 pages, 4 figures

Abstract

Many clinical risk scores are deployed as additive rules with nonnegative integer points assigned to relevant binary predictive features. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are often derived by first fitting a regression model and then rounding the estimated coefficients to the nearest integer after appropriate scaling. This approach is computationally fast but does not guarantee optimality of the resulting score. Alternatively, one may search over all possible integer weights to directly optimize a value function by posing the problem as an integer programming task. However, the associated computational burden can be substantial, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scoring directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score for measuring the risk of post-discharge mortality. We also conduct a simulation study to examine the finite-sample operating characteristics.
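
To make the idea of directly optimizing integer points concrete, here is a minimal sketch of a greedy coordinate search. It assumes AUC as the value function, a cap `max_points` on each weight, and a first-improvement acceptance rule; the paper's actual objectives and algorithm are not specified in the abstract, so every name below is hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def total_points(points, x):
    """Additive score: sum of the points of the binary features that are present."""
    return sum(w * xi for w, xi in zip(points, x))

def greedy_integer_points(X, y, max_points=5):
    """Move one feature's points up or down by 1 whenever that strictly improves AUC."""
    d = len(X[0])
    points = [0] * d
    best = auc([total_points(points, x) for x in X], y)
    improved = True
    while improved:
        improved = False
        for j in range(d):
            for step in (1, -1):
                cand = points[:]
                cand[j] += step
                if not 0 <= cand[j] <= max_points:
                    continue  # keep weights nonnegative and bounded
                value = auc([total_points(cand, x) for x in X], y)
                if value > best:
                    points, best, improved = cand, value, True
    return points, best

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 0]]
y = [1, 1, 0, 0, 1, 0]
points, value = greedy_integer_points(X, y)
```

Because the search evaluates the value function directly on integer candidates, it needs no gradient and tolerates discontinuous objectives, at the cost of no global optimality guarantee.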

2605.19064 2026-05-20 cs.HC cs.AI

Toward an AI-Powered Computational Testbed for Workforce Policy

Sumer S. Vaid, Ashley V. Whillans

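Affiliations: Negotiation, Organizations and Markets Unit, Harvard Business School

AI summary: This paper proposes dynamic employee agents that combine LLM-powered generative agents with management science and organizational behavior research to forecast employees' psychological and behavioral responses to organizational change, and defines privacy, accuracy, and representativeness safeguards.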

Abstract

Workforce transformations are difficult to forecast and costly to mismanage. In particular, the integration of artificial intelligence into knowledge work currently affects a substantial share of the global workforce, yet this transition proceeds without tools to forecast how individual employees will respond psychologically and behaviorally. We combine recent advances in LLM-powered generative agents with foundational management science and organizational behavior research to propose dynamic employee agents. Among consenting populations, these agents can be seeded with HR records, validated psychometric measures, and digital activity data to simulate employees' cognitive, emotional, and behavioral trajectories across successive workdays during planned organizational changes. In this article, we detail the computational architecture required to construct this simulation platform and define the privacy, accuracy, and representativeness safeguards necessary for responsible deployment. We argue that establishing this prospective forecasting infrastructure is a critical technical requirement for managing the current global workforce realignment around AI.

2605.19043 2026-05-20 cs.CY cs.AI cs.HC

Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs

Jacob Levine, Miguel Aenlle, Craig Zilles, Matthew West, Mariana Silva

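Affiliations: University of Illinois Urbana-Champaign

AI summary: This paper studies automated grading of handwritten mathematics with vision-capable LLMs. Comparing AI grading decisions with human grading, it finds that most errors stem from transcription failures rather than rubric misapplication, highlighting both the promise and the limitations of LLMs for grading handwritten mathematics.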

Comments To be published in the International Conference on AI in Education (AIED), 2026

Abstract

Automated grading systems have enabled scalable assessment for many response types, but handwritten mathematics remains a barrier due to the complexity of multi-step solutions. Vision-capable large language models (LLMs) offer new opportunities here, yet their reliability in authentic instructional settings remains poorly understood. We present an empirical evaluation of an LLM-based grader for handwritten mathematical work using instructor-defined rubrics. Extending a prior pipeline for typed responses, we integrate transcription and rubric-based evaluation of photographic submissions within a single LLM call, evaluating on student work from two university STEM courses. Comparing AI grading decisions against human-assigned ground truth at the rubric-item level, we observe high overall accuracy, with most errors -- 87% in the best model -- attributable to transcription failures rather than rubric misapplication. We categorize common error modes, including image quality issues, hallucinated content, and incorrect handling of equivalent expressions. These findings highlight both the promise and limitations of LLM-based grading for handwritten mathematics, providing guidance for system design, prompt refinement, and deployment in educational settings.

2605.19024 2026-05-20 stat.ML cs.LG stat.ME

Conformal Prediction via Transported Beta Laws

Thiago R. Ramos, Helton Graziadei, Luben M. C. Cabezas

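Affiliations: Federal University of São Carlos; University of São Paulo; Inria; Université Grenoble Alpes

AI summary: This paper studies the law of the calibration-conditional coverage induced by a realized conformal threshold. It takes the beta distribution as a finite-sample reference object and quantifies departures from it with Wasserstein distances, which yields direct bounds on marginal coverage gaps and bad-calibration probabilities and separates different sources of non-i.i.d. behavior.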

Abstract

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $\mathrm{Beta}(k, n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.
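
The beta reference law can be checked numerically. The sketch below assumes the standard split conformal rank k = ceil((n + 1)(1 - alpha)) and uses Uniform(0, 1) scores, for which the calibration-conditional coverage equals the k-th order statistic of the calibration scores. The helper names are hypothetical and the simulation covers only the i.i.d. case, not the paper's dependent settings.

```python
import math
import random
from fractions import Fraction

def conformal_rank(n, alpha):
    """Rank k = ceil((n + 1)(1 - alpha)) of the calibration score used as threshold."""
    # Fraction avoids floating-point surprises in the ceiling.
    k = math.ceil((n + 1) * (1 - Fraction(str(alpha))))
    if k > n:
        raise ValueError("n is too small for this alpha")
    return k

def beta_mean_var(k, n):
    """Mean and variance of Beta(k, n + 1 - k)."""
    a, b = k, n + 1 - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def simulate_conditional_coverage(n, alpha, reps=2000, seed=0):
    """Draw calibration sets of Uniform(0, 1) scores and return the realized coverages.

    For uniform scores, P(test score <= threshold | calibration) is the threshold
    itself, i.e. the k-th smallest calibration score.
    """
    rng = random.Random(seed)
    k = conformal_rank(n, alpha)
    return [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(reps)]

k = conformal_rank(99, 0.1)
mean, var = beta_mean_var(k, 99)
coverages = simulate_conditional_coverage(99, 0.1)
empirical_mean = sum(coverages) / len(coverages)
```

Here the beta mean k/(n + 1) recovers the usual marginal guarantee, while the variance shows how much the coverage of any single calibration draw can deviate from it.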

2605.18474 2026-05-20 cs.CR cs.AI cs.CL cs.LG

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Sixu Chen, Xiang Chen, Hongyao Yu, Jiaxin Hong, Hao Fang, Shuoyang Sun, Bin Chen, Shu-Tao Xia

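Affiliations: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; South China University of Technology, Guangzhou, China; Harbin Institute of Technology, Shenzhen, Shenzhen, China

AI summary: This paper proposes the Prompt2Fingerprint framework, which reformulates LLM fingerprinting as a conditional parameter generation task. A specialized generator maps textual descriptions directly to low-rank parameter increments, enabling plug-and-play fingerprint injection without further model fine-tuning. This substantially reduces computational overhead and offers a scalable, instant solution for LLM ownership management.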

Abstract

The widespread deployment and redistribution of large language models (LLMs) have made model provenance tracking a critical challenge. While existing LLM fingerprinting methods, particularly active approaches that embed identity signals via fine-tuning, achieve high accuracy and robustness, they suffer from significant scalability bottlenecks. These methods typically treat fingerprint injection as an independent, one-off optimization task rather than a reusable capability, necessitating separate, resource-intensive training for every new identity. This incurs prohibitive computational costs and deployment delays. To address this, we propose Prompt2Fingerprint (P2F), the first framework that reformulates fingerprinting as a conditional parameter generation task. By leveraging a specialized generator, P2F maps textual descriptions directly to low-rank parameter increments in a single forward pass, enabling plug-and-play LLM fingerprint injection without further model retraining. Our experiments demonstrate that P2F maintains high fingerprint accuracy, harmlessness, and robustness while significantly reducing computational overhead, offering a scalable and instant solution for LLM ownership management.
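
The phrase "low-rank parameter increments" can be read as adding a product of two thin matrices to a frozen weight matrix. The sketch below shows only that injection step, under the assumption of a LoRA-style update W + scale * B A; the generator that would map a textual description to `B` and `A` is not shown, and all names here are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_low_rank_increment(W, B, A, scale=1.0):
    """Return W + scale * (B @ A) without modifying W.

    W is m x n, B is m x r, A is r x n, with rank r much smaller than m and n.
    """
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def increment_param_count(m, n, r):
    """Number of values in the factors B and A, versus m * n for a dense update."""
    return r * (m + n)

W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # frozen base weights (toy 2 x 3 layer)
B = [[1.0], [2.0]]                        # 2 x 1 factor
A = [[1.0, 0.0, -1.0]]                    # 1 x 3 factor
W_fingerprinted = apply_low_rank_increment(W, B, A)
```

For a 4096 x 4096 layer and rank 8, the factors hold 65,536 values instead of roughly 16.8 million, which is why emitting them in one forward pass is plausible.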

2605.17859 2026-05-20 cs.HC cs.LG

Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables

Jiayi Shao, Jiaying Ye, Shengyao Liu, Zachary Englhardt, Girish Narayanswamy, Vikram Iyer, Qiuyue Shirley Xue

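Affiliations: University of Washington; Purdue University

AI summary: This paper presents Multi-site PPG, a dataset of over 350 hours of raw data collected from four custom-developed unobtrusive wearables, used to evaluate how PPG signals from different body sites differ in heart-rate estimation performance.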

Comments 20 pages, 6 figures, 11 tables. Dataset and code available at the URLs in the paper

Abstract

Wearables are widely used for mobile health monitoring, and photoplethysmography (PPG) is a key sensing modality for heart rate and related physiological measurements. However, public in-the-wild PPG datasets remain largely wrist-centric or limited to short, controlled studies, constraining research on emerging wearable form factors. We present Multi-site PPG, an in-the-wild physiological dataset collected from four custom-developed unobtrusive wearables: a smart earring, ring, watch, and necklace. Each device records green and infrared reflective PPG, 3-axis acceleration, and temperature with timestamps for cross-device alignment, while a Polar H10 chest strap provides reference electrocardiogram (ECG). Participants wore the devices for multiple days during daytime activities while continuing their normal routines. The dataset contains over 350 hours of raw data and 230-290 hours of modeling-ready 8-second windows per wearable. We benchmark heuristic, supervised, and self-supervised heart-rate estimation methods, showing substantial body-site differences: the best methods achieve mean absolute errors (MAEs) of 2.30 bpm on the earring, 5.13 bpm on the ring, 8.37 bpm on the watch, and 8.68 bpm on the necklace. We further analyze motion effects and evaluate multi-site and PPG-accelerometer fusion, demonstrating the dataset's value for robust physiological sensing across emerging wearable form factors.
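
The two quantities the benchmark relies on, 8-second windows and mean absolute error in bpm, can be sketched as follows. The sampling rate, the non-overlapping stride, and the sample heart-rate values are assumptions for illustration; the abstract does not state them.

```python
def sliding_windows(signal, fs, window_s=8, stride_s=8):
    """Split a 1-D signal sampled at fs Hz into fixed-length windows.

    A stride equal to the window length gives non-overlapping windows;
    a trailing partial window is dropped.
    """
    size, step = int(window_s * fs), int(stride_s * fs)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def mean_absolute_error(pred_bpm, ref_bpm):
    """Mean absolute error between predicted and reference heart rates, in bpm."""
    if len(pred_bpm) != len(ref_bpm) or not pred_bpm:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred_bpm, ref_bpm)) / len(pred_bpm)

# 20 seconds of a placeholder signal at an assumed 5 Hz yields two full 8 s windows.
windows = sliding_windows(list(range(100)), fs=5)
# Hypothetical per-window estimates against ECG-derived reference heart rates.
mae = mean_absolute_error([70, 82, 91], [72, 80, 90])
```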

2605.17439 2026-05-20 cs.SE cs.AI

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

Sirui Hong, Zhijie Liu, Tengfei Li, Wei Tao, Yifan Wu, Chenglin Wu

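Affiliations: DeepWisdom; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: This work proposes DiagEval, a trajectory-conditioned diagnostic evaluation protocol for diagnosing failures after a GUI agent evaluates interactive software. It reuses the failed trajectory to choose targeted diagnostic probes and aggregates their outcomes into an internal attribution signal, improving accuracy over retry-based baselines.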

Abstract

Evaluating LLM-generated interactive software requires execution in addition to static analysis. The key difficulty is that correctness is a graph-level reachable property over latent UI state-transition graphs, whereas a GUI evaluator observes only a single execution trajectory. A failed rollout therefore rules out only one realized path, leaving failure attribution ambiguous between evaluator-side execution error and genuine software defect. We present DiagEval, a trajectory-conditioned diagnostic evaluation protocol for post-failure GUI-agent evaluation of interactive software. Rather than blindly retrying from scratch, DiagEval reuses the failed trajectory to choose targeted diagnostic probes and aggregates their outcomes into an internal attribution signal. The latent-graph view motivates the diagnostic problem; DiagEval does not reconstruct the graph or estimate calibrated posterior probabilities. We evaluate DiagEval on WebDevJudge-Unit and RealDevBench across multiple GUI-agent evaluators and LLM backbones. On false-negative cases, DiagEval recovers 45.6-62.1% of failures that were initially misattributed to software defects, outperforming retry-based baselines with 34.4-160.6% relative gains. On the full evaluation sets, this recovery improves accuracy from 69.9% to 78.3% on WebDevJudge-Unit and from 65.0% to 81.6% on RealDevBench. These results suggest that reliable GUI-agent evaluation requires not only stronger execution, but also active failure diagnosis to disambiguate evaluator-side errors from genuine software defects. Our code is available at https://github.com/scutGit/DiagEval.

2605.17326 2026-05-20 hep-lat cs.LG

Noise scheduling and linear dynamics in diffusion models on Lie groups

Javad Komijani

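Affiliations: Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland

AI summary: This paper studies the role of noise scheduling in diffusion processes on Lie groups, with particular attention to applications in lattice gauge theory. It finds that a specific noise schedule makes the expectation value of the Wilson action decay linearly in diffusion time. This behavior arises naturally in the Lie-group setting, whereas Euclidean diffusion models require an explicitly designed drift term to achieve it.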

Comments 5 pages

2605.16681 2026-05-20 eess.AS cs.SD eess.SP

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

Ningyuan Yang, Yize Li, Diego A. Cuji, Ryan M. Corey, Pu Zhao, Xue Lin, Andrew C. Singer

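Affiliations: Discovery Partners Institute

AI summary: This survey reviews audio super-resolution and bandwidth extension, focusing on the shift from discriminative to generative models. It summarizes the limitations of early discriminative models and the advances of generative models in representation domain, architecture, and conditioning mechanisms, discusses emerging directions such as large language models and multimodal foundation models, and identifies open challenges in perceptual evaluation, phase modeling, and real-world generalization.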

Comments Under review

Abstract

Audio super-resolution (SR), also referred to as bandwidth extension (BWE), aims to reconstruct high-fidelity signals from low-resolution (LR) or band-limited (BL) observations, an inherently ill-posed task due to the ambiguity of missing high-frequency (HF) content. This survey provides a comprehensive overview of the field, with a particular focus on the paradigm shift from discriminative mapping to modern generative modeling. We first review early discriminative deep neural network (DNN) models, which formulate BWE/SR as a deterministic mapping problem and are prone to regression-to-the-mean effects and spectral over-smoothing. We then systematically review generative approaches, including autoregressive (AR) models, variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion and score-based models, flow-based methods, and Schrödinger bridges. Across these approaches, we examine key design aspects, including representation domain, architecture, conditioning mechanisms, and trade-offs among reconstruction fidelity, perceptual quality, robustness, and computational efficiency. Furthermore, we discuss emerging directions involving large language models (LLMs) and multimodal foundation models, and highlight open challenges in perceptual evaluation, phase modeling, and real-world generalization. By providing a structured taxonomy and unified perspective, this survey establishes a comprehensive foundation and offers a practical roadmap for advancing BWE/SR from deterministic point estimation toward distribution-aware generative modeling.

2605.16626 2026-05-20 cs.CR cs.AI

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors

Elle Najt, Colin Toft, Tyler Tracy, Fabien Roger, Joe Benton

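Affiliations: Anthropic Fellows Program; University of Waterloo; Redwood Research; Anthropic

AI summary: This paper presents SLEIGHT-Bench, a benchmark for evaluating how well agent monitors withstand diverse attack strategies. It finds that 20 of the 40 attacks are never caught by an Opus 4.6 monitor, identifies several evasion strategies, and shows ways to improve monitor performance.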

Abstract

Since autonomous coding agents generate complex behaviors at high volume, we may want to use other LLMs to monitor actions to reduce the risk from dangerous misaligned behavior. To better understand the limitations of such monitors against the diverse attack strategies that a coding agent could use, we present SLEIGHT-Bench (Subtle Low-itEration Insight-Guided Harmful Transcripts), a benchmark of synthetic transcripts containing 40 attacks across 11 categories, each showing a coding agent covertly pursuing a harmful objective (e.g. weight exfiltration, credential theft, unmonitored rogue deployment). We find that 20 of 40 attacks are never caught by an Opus 4.6 monitor with extended thinking across 10 trials at a 1% false-positive rate, while the overall catch rate is 32%. We identify several evasion strategies such as exploiting model priors, user instruction ambiguity, and state manipulation, which facilitate creating attacks that evade frontier monitors. We also elicit stronger monitor performance using coding agents as monitors versus regular prompted monitors, and for some evasion strategies show improved catch rates with targeted monitor prompts. Our dataset and evaluation framework are available at https://github.com/safety-research/sleight-bench and https://huggingface.co/datasets/sleightbench/SLEIGHT-Bench.
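
Reporting a catch rate "at a 1% false-positive rate" implies calibrating the monitor's alert threshold on benign transcripts first. A minimal sketch of that calibration follows, assuming the monitor emits a numeric suspicion score per transcript; the scoring scale and helper names are hypothetical and not taken from the benchmark's code.

```python
def threshold_at_fpr(benign_scores, max_fpr=0.01):
    """Return a threshold such that at most max_fpr of benign scores exceed it.

    A transcript is flagged when its score is strictly above the threshold.
    """
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign transcripts we may flag (small tolerance for float products).
    allowed = int(max_fpr * len(ranked) + 1e-9)
    return ranked[allowed]

def flag_rate(scores, threshold):
    """Fraction of transcripts whose suspicion score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

# Hypothetical suspicion scores on a 0 to 1 scale.
benign = [i / 100 for i in range(100)]
attacks = [0.5, 0.99, 0.985, 0.2]

threshold = threshold_at_fpr(benign, max_fpr=0.01)
false_positive_rate = flag_rate(benign, threshold)
catch_rate = flag_rate(attacks, threshold)
```

Fixing the false-positive budget first makes catch rates comparable across monitors, since a monitor cannot raise its catch rate simply by flagging more transcripts.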

2605.15846 2026-05-20 cs.SE cs.AI

RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades

Xinbo Xu, Ruihan Yang, Haiyang Shen, Wendong Xu, Bofei Gao, Ruoyu Wu, Kean Shi, Weichu Xie, Xuanzhong Chen, Ming Wu, Jason Zeng, Michael Heinrich, Elvis Zhang, Liang Chen, Kuan Li, Baobao Chang

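Affiliations: UniPat AI; Peking University; Fudan University; The University of Hong Kong; Tsinghua University; G Labs; Pipeline Lab

AI summary: This paper presents RoadmapBench, a benchmark of 115 long-horizon coding tasks grounded in real open-source version upgrades, designed to evaluate long-horizon, multi-target software development. It argues that existing benchmarks fail to capture such work, and its results suggest that long-horizon software development remains a largely unsolved problem.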

Comments 30 pages, 15 figures

Abstract

Coding agents are increasingly deployed in real software development, where a single version iteration requires months of coordinated work across many files. However, most existing benchmarks focus predominantly on single-issue bug fixes from Python repositories, with coarse pass/fail evaluation outcomes, and thus fail to capture long-horizon, multi-target development at real engineering scale. To address this gap, we present RoadmapBench, a benchmark of 115 long-horizon coding tasks grounded in real open-source version upgrades across 17 repositories and 5 programming languages. Each task places the agent on a source-version code snapshot and provides a multi-target roadmap instruction requiring it to implement the functionality introduced in the target version, with a median modification of 3,700 lines across 51 files. We conduct a systematic evaluation on thirteen frontier models and find that even the strongest, Claude-Opus-4.7, resolves only 39.1% of tasks, while the weakest achieves merely 5.2%, in stark contrast to existing bug-fix benchmarks, suggesting that long-horizon software development remains a largely unsolved problem.
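
As a quick arithmetic check, the reported resolve rates can be mapped back to whole task counts out of 115. The counts below are inferred from the percentages and are not stated in the abstract; `matching_counts` is a hypothetical helper.

```python
def matching_counts(rate_percent, total):
    """Task counts k whose percentage 100 * k / total rounds to rate_percent (one decimal)."""
    return [k for k in range(total + 1) if round(100 * k / total, 1) == rate_percent]

strongest = matching_counts(39.1, 115)   # counts consistent with a 39.1% resolve rate
weakest = matching_counts(5.2, 115)      # counts consistent with a 5.2% resolve rate
```

Under this reading, 39.1% corresponds to 45 resolved tasks and 5.2% to 6 resolved tasks.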

2605.11333 2026-05-20 cs.DC cs.LG cs.PF

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan, Theodor-Adrian Badea, Andy Balogh, Bradford M. Beckmann, Brian Coutinho, Louis Feng, Sheng Fu, Sanshan Gao, Mehryar Garakani, Taekyung Heo, David Kanter, Josh Ladd, Ziwei Li, Winston Liu, Changhai Man, Dan Mihailescu, Spandan More, Joongun Park, Ashwin Ramachandran, Vinay Ramakrishnaiah, Saeed Rashidi, Vijay Janapa Reddi, Puneet Sharma, Phio Tian, William Won, Hanjiang Wu, Huan Xu, Jinsun Yoo, Tushar Krishna

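Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

AI summary: This paper presents Chakra, an open ecosystem for performance benchmarking and co-design. Standardized execution traces improve the ability to observe, reproduce, and optimize distributed machine learning workloads in production AI systems, and real-world case studies demonstrate its value.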

Comments Accepted at the 9th Conference on Machine Learning and Systems (MLSys 2026)

Abstract

The fast pace of artificial intelligence (AI) innovation demands an agile methodology for observation, reproduction and optimization of distributed machine learning (ML) workload behavior in production AI systems and for enabling efficient software-hardware (SW-HW) co-design for future systems. We present Chakra, an open and portable ecosystem for performance benchmarking and co-design. The core component of Chakra is an open and interoperable graph-based representation of distributed AI/ML workloads, called Chakra execution trace (ET). These ETs represent key operations, such as compute, memory, and communication, data and control dependencies, timing, and resource constraints. Additionally, Chakra includes a complementary set of tools and capabilities to enable the collection, analysis, generation, and adoption of Chakra ETs by a broad range of simulators, emulators, and replay tools. We present analysis of Chakra ETs collected on production AI clusters and demonstrate value via real-world case studies. Chakra has been adopted by MLCommons and has active contributions and engagement across the industry, including but not limited to NVIDIA, AMD, Meta, Keysight, HPE, and Scala.
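
A graph-based execution trace of the kind described can be pictured as nodes typed by operation kind, with dependency edges and per-node timing. The sketch below uses a hypothetical `TraceNode` schema, not the actual Chakra ET format, and replays the graph under the simplifying assumption of unlimited resources, so only dependencies constrain start times.

```python
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    node_id: int
    kind: str                                   # "compute", "memory", or "communication"
    duration_us: float
    deps: list = field(default_factory=list)    # ids of nodes that must finish first

def replay_finish_times(nodes):
    """Earliest finish time of every node when only dependencies limit execution."""
    by_id = {n.node_id: n for n in nodes}
    finish = {}

    def visit(nid, stack=()):
        if nid in finish:
            return finish[nid]
        if nid in stack:
            raise ValueError("dependency cycle in trace")
        node = by_id[nid]
        # A node starts once its slowest dependency has finished.
        start = max((visit(d, stack + (nid,)) for d in node.deps), default=0.0)
        finish[nid] = start + node.duration_us
        return finish[nid]

    for n in nodes:
        visit(n.node_id)
    return finish

# Toy trace: a compute step feeds a collective and another compute step,
# and a memory operation waits for both.
trace = [
    TraceNode(0, "compute", 10.0),
    TraceNode(1, "communication", 5.0, deps=[0]),
    TraceNode(2, "compute", 7.0, deps=[0]),
    TraceNode(3, "memory", 2.0, deps=[1, 2]),
]
finish = replay_finish_times(trace)
critical_path_us = max(finish.values())
```

A simulator or replay tool would add resource constraints on top of this dependency-only schedule, for example by serializing nodes that share a device or a network link.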

2605.05974 2026-05-20 cs.CR cs.AI

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

Qinfeng Li, Yuntai Bao, Jianghui Hu, Wenqi Zhang, Jintao Chen, Huifeng Zhu, Yier Jin, Xuhong Zhang

Affiliations: Zhejiang University; Management Center, School of Software Technology (Ningbo), Zhejiang University; University of Science and Technology of China; Chang'an University; Washington University in St. Louis

AI summary: To address the copying and reuse of agent prompts in untrusted deployments, PragLocker proposes a protection scheme that builds semantically anchored obfuscated prompts and injects noise, which reduces cross-LLM portability while maintaining target performance and robustness against adaptive attacks.

Comments accepted to the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

LLM agents rely on prompts to implement task-specific capabilities based on foundation LLMs, making agent prompts valuable intellectual property. However, in untrusted deployments, adversaries can copy and reuse these prompts with other proprietary LLMs, causing economic losses. To protect these prompts, we identify four key challenges: proactivity, runtime protection, usability, and non-portability that existing approaches fail to address. We present PragLocker, a prompt protection scheme that satisfies these requirements. PragLocker constructs function-preserving obfuscated prompts by anchoring semantics with code symbols and then using target-model feedback to inject noise, yielding prompts that only work on the target LLM. Experiments across multiple agent systems, datasets, and foundation LLMs show that PragLocker substantially reduces cross-LLM portability, maintains target performance, and remains robust against adaptive attackers.


2605.05569 2026-05-20 math.OC cs.LG

Stability of the Monge Map in Semi-Dual Optimal Transport

半对偶最优运输中Monge映射的稳定性

Anton Selitskiy, David Millard

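Affiliations: Department of Electrical and Computer Engineering; University of Rochester; Department of Mechanical Engineering; Rochester Institute of Technology

AI summary: This paper studies the degenerate saddle-point structure of the semi-dual optimal transport problem, shows that solving it numerically is equivalent to solving a constrained optimization problem, and derives convergence conditions for the Monge map that do not require the dual potential to be optimal. This explains why, in practice, numerical algorithms need more iterations to update the transport map than the potential.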

2605.00856 2026-05-20 eess.SP cs.AI cs.HC cs.LG

One-Block Transformer (1BT) for EEG-Based Cognitive Workload Assessment

用于EEG认知负荷评估的单块变换器（1BT）

Stefanos Gkikas, Christian Arzate Cruz, Thomas Kassiotis, Giorgos Giannakakis, Raul Fernandez Rojas, Randy Gomez

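Affiliations: Honda Research Institute Japan, Wako City, Japan; Department of Electronic Engineering, Hellenic Mediterranean University, Chania, Greece; BioSIS (Biosensing & Intelligent Systems) Lab, Centre for Intelligent Computing Systems, University of Canberra, Canberra, Australia

AI summary: This paper proposes the One-Block Transformer (1BT) for EEG-based cognitive workload assessment. It aggregates multi-channel temporal sequences through a minimal latent bottleneck combined with lightweight self-attention, giving an efficient and compact model that keeps high performance while substantially lowering computational cost.

AI-generated Chinese abstract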

准确且连续地估计认知负荷对于构建自适应的人机系统至关重要。然而，设计在表示能力与计算效率之间取得平衡的架构在实际部署中一直具有挑战性。本文介绍了一种名为1BT的单块变换器，用于紧凑且高效的EEG认知负荷评估。该模型通过最小的潜在瓶颈聚合多通道时间序列，使用一个单一的交叉注意力模块后接轻量级自注意力。一项涉及11名参与者进行三种认知多样任务（抽象推理、数值问题解决和互动视频游戏）的受控研究，在两个认知负荷水平上进行了连续EEG记录。系统性的架构分析确定了最紧凑的配置，该配置在保持高性能的同时显著降低了计算成本。最终模型在不到0.5百万参数和0.02 GFLOPs的情况下实现了高认知负荷分类性能，为在资源受限环境下实时认知负荷监控的设计方向铺平了道路。

English abstract

Accurate and continuous estimation of cognitive workload is fundamental to creating adaptive human-machine systems. However, designing architectures that balance representational capacity with computational efficiency has been challenging for practical deployment. This paper introduces 1BT, a One-Block Transformer for compact and efficient EEG-based cognitive workload assessment. The model aggregates multi-channel temporal sequences via a minimal latent bottleneck, using a single cross-attention module followed by lightweight self-attention. A controlled study involving 11 participants performing three cognitively diverse tasks (abstract reasoning, numerical problem-solving, and an interactive video game) was conducted with continuous EEG recordings across two workload levels. Systematic architectural analysis identifies the most compact configuration that preserves high performance, while substantially lowering computational cost. The final model achieves high workload classification performance with under 0.5 million parameters and 0.02 GFLOPs, paving the way for a design direction for real-time cognitive workload monitoring in resource-constrained settings.


2604.27245 2026-05-20 cs.CY cs.AI

Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption

弥合现实鸿沟：面向智能体AI采纳的三重张力框架

Jason Fournier, Kacper Łodzikowski

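Affiliations: Imagine Learning; Adam Mickiewicz University in Poznań

AI summary: This paper proposes a three-tension framework to guide the education sector in balancing implementation feasibility, adaptation speed, and mission alignment when adopting agentic AI, with the aim of supporting educational personalization and equity.

Comments: This is a preprint version of an edited book chapter to appear in Mayrath, M., J. Behrens, D. Robinson, (eds) (2026). Handbook of Generative AI in Education: Integrating Research into Practice, Springer

AI-generated Chinese abstract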

生成式AI已通过免费消费工具迅速进入教育领域，超出了学校和大学的应对能力。现在，新一代更具自主性的智能体AI系统--具备规划和达成目标能力--既带来了更大的教育个性化，也带来了更大的颠覆。本章认为，成功应对这些创新需要平衡三个核心张力：（1）实施可行性，即可持续地将AI整合到真实课堂中的实际能力；（2）适应速度，即快速演变的AI能力与缓慢变化的教育变革之间的不匹配；（3）使命契合度，即确保AI应用维护教育价值观如公平、隐私和教学完整性。首先，我们回顾生成式和智能体AI在不同领域和一线教育中的早期证据，以在具体情境中展示这些张力。然后，我们提出一个三重张力框架，以指导决策者在K-12和高等教育中评估和设计AI项目。我们提供了该框架如何应用于规划负责任的AI部署的示例，并识别了新兴趋势--如课程关联的AI代理和教师驱动的AI设计--以及开放的研究方向。最后，我们为教育领导者提供了建议，以主动应对AI的机会和挑战，使该技术能够在未来十年内增强教学和学习。

English abstract

Generative AI has rapidly entered education through free consumer tools, outpacing the ability of schools and universities to respond. Now a new wave of more autonomous agentic AI systems--with the capacity to plan and act towards goals--promises both greater educational personalization and greater disruption. This chapter argues that successfully navigating these innovations requires balancing three core tensions: (1) Implementation Feasibility, or the practical capacity to integrate AI sustainably into real classrooms; (2) Adaptation Speed, or the mismatch between fast-evolving AI capabilities and the slower pace of educational change; and (3) Mission Alignment, or the need to ensure AI applications uphold educational values such as equity, privacy, and pedagogical integrity. First, we review early evidence of generative and agentic AI in various sectors and in frontline education to illustrate these tensions in context. Then, we present a three-tension framework to guide decision-makers in evaluating and designing AI initiatives across K-12 and higher education. We provide examples of how the framework can be applied to plan responsible AI deployments, and we identify emerging trends--such as curriculum-linked AI agents and educator-informed AI design--along with open research directions. We conclude the chapter with recommendations for educational leaders to proactively engage with the opportunities and challenges of AI, so that this technology can be harnessed to enhance teaching and learning in the decade ahead.


2604.19892 2026-05-20 cs.GR cs.AI

An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact

一种高效的多级预条件非线性共轭梯度法用于增量势接触

Yu Zhang, Xing Shen, Kemeng Huang, Wei Chen, Yin Yang, Taku Komura, Tiantian Liu, Xingang Pan

Affiliations: S-Lab, Nanyang Technological University; Shanghai AI Laboratory; University of Hong Kong; Zhejiang University; University of Utah; Independent Researcher

AI summary: This paper proposes MAS-PNCG, which introduces a Sparse-Input Woodbury update algorithm to make multilevel preconditioning practical in nonlinear optimization, improving convergence speed and computational efficiency on contact problems.

AI-generated Chinese abstract

增量势接触（IPC）保证了无交叠的模拟，但因其需要牛顿法的昂贵的Hessian装配和线性求解而面临高计算成本。虽然预条件非线性共轭梯度法（PNCG）避免了Hessian装配，但历史上在刚性和接触丰富的场景中由于缺乏有效的预条件器而收敛缓慢；简单的雅可比预条件器无法捕捉全局耦合，而高级层次化的预条件器如多级加性施瓦茨（MAS）在每次非线性迭代中重建计算成本过高。我们提出了MAS-PNCG，一种解锁多级预条件在非线性优化中潜力的方法。我们的关键技术创新是稀疏输入Woodbury更新算法，该算法可以逐步适应细粒度MAS组件以快速适应变化的接触集。这避免了完整的预条件器重建，将维护成本降低到几乎为零，同时捕捉接触系统的复杂谱特性。此外，我们用Hessian-aware的2D子空间最小化替换了启发式PNCG搜索方向，以最优结合预条件梯度和先前方向。我们还应用了一种快速的每子域保守CCD方法，确保穿透自由轨迹同时避免过于严格的全局步长。实验表明，我们的MAS-PNCG在预条件器使用MAS的情况下，分别比最先进的牛顿-PCG求解器GIPC和StiffGIPC快5.66倍和2.07倍。

English abstract

Incremental Potential Contact (IPC) guarantees intersection-free simulation but suffers from high computational costs due to the expensive Hessian assembly and linear solves required by Newton's method. While Preconditioned Nonlinear Conjugate Gradient (PNCG) avoids Hessian assembly, it has historically struggled with poor convergence in stiff, contact-rich scenarios due to the lack of effective preconditioners; simple Jacobi preconditioners fail to capture the global coupling, while advanced hierarchy-based preconditioners like Multilevel Additive Schwarz (MAS) are computationally prohibitive to rebuild at every nonlinear iteration. We present MAS-PNCG, a method that unlocks the power of hierarchical preconditioning for nonlinear optimization. Our key technical innovation is a Sparse-Input Woodbury update algorithm that incrementally adapts the fine-level MAS components to rapidly evolving contact sets. This bypasses the need for full preconditioner rebuilds, reducing maintenance cost to near-zero while capturing the complex spectral properties of the contact system. Furthermore, we replace heuristic PNCG search directions with a Hessian-aware 2D subspace minimization that optimally combines the preconditioned gradient and previous direction. We also apply a fast per-subdomain conservative CCD method that ensures penetration-free trajectories while avoiding overly restrictive global step sizes. Experiments demonstrate that our MAS-PNCG outperforms the state-of-the-art Newton-PCG solvers GIPC and StiffGIPC, both preconditioned with MAS, by up to 5.66$\times$ and 2.07$\times$, respectively.
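
The abstract does not spell out the Sparse-Input Woodbury update. As a rough illustration of the underlying idea only, the sketch below applies the rank-one special case of the Woodbury identity (the Sherman-Morrison formula) to refresh a stored inverse after a low-rank change, rather than re-inverting from scratch. The function name `sherman_morrison_update` and the dense list-of-lists representation are assumptions made for this example, not the authors' implementation.

```python
def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1} (rank-one Woodbury identity).

    Illustrative helper only: dense lists, no sparsity, no MAS hierarchy.
    """
    n = len(A_inv)
    # A^{-1} u  (column vector)
    Au = [sum(A_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    # v^T A^{-1}  (row vector)
    vA = [sum(v[k] * A_inv[k][j] for k in range(n)) for j in range(n)]
    # Scalar 1 + v^T A^{-1} u; the update is undefined when it vanishes.
    denom = 1.0 + sum(v[k] * Au[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("rank-one update makes the matrix singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]
```

The update costs O(n^2) for a dense matrix, compared with O(n^3) for a fresh inversion, which is the general reason low-rank updates are attractive when only a few entries change between iterations.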


2604.17237 2026-05-20 cs.IR cs.AI

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

HeadRank: 通过偏好对齐的注意力头实现无需解码的段落重排序

Juyuan Wang, Chenxing Wang, Yuchen Fang, Huiyun Hu, Junwu Du, Aolin Li, Shunlin Rong, Haijun Wu, Jin Xu, Ligang Liu, Dongliang Liao

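Affiliations: Weixin Group, Tencent, China; South China University of Technology, Guangzhou, China

AI summary: This paper proposes HeadRank, a framework that lifts preference optimization into the continuous attention domain through entropy-regularized head selection, hard adjacent-level preference pairs, and a distribution regularizer, enabling efficient decoding-free passage reranking. It achieves the highest average NDCG@10 across multiple benchmarks.

AI-generated Chinese abstract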

无需解码的重排序方法通过直接从LLM注意力权重中读取相关性信号，在延迟上优于自回归方法，但受到注意力分数同质化的影响：中等上下文文档获得几乎相同的分数，破坏了排序所需的细粒度区分。我们提出了HeadRank，一个框架，通过熵正则化的头部选择、硬相邻级偏好对和分布正则化，将偏好优化从离散的token空间提升到连续的注意力域，从而在同质化的中等区域中联合提高判别性。在最深选择层进行深度截断进一步将推理时间减少到O(1)次前向传递。在14个基准测试中，使用仅211个训练查询，在三个Qwen3规模（0.6B-4B）上，HeadRank在每个规模中均取得最高平均NDCG@10，大多数基准测试中优于生成式和无需解码的基线方法，且格式成功率为100%。在4B规模下，相关中等区域文档达到前四分位的占比为57.4%，而无关文档仅为14.2% —— 这个43个百分点的择优差距证明了注意力空间偏好对齐在列表式重排序中的有效性。

English abstract

Decoding-free reranking methods that read relevance signals directly from LLM attention weights offer significant latency advantages over autoregressive approaches, yet suffer from attention score homogenization: middle-context documents receive near-identical scores, destroying the fine-grained distinctions required for ranking. We propose HeadRank, a framework that lifts preference optimization from discrete token space into the continuous attention domain through entropy-regularized head selection, hard adjacent-level preference pairs, and a distribution regularizer that jointly sharpen discriminability in the homogenized middle zone. Depth truncation at the deepest selected layer further reduces inference to $\mathcal{O}(1)$ forward passes. Across 14 benchmarks on three Qwen3 scales (0.6B--4B) using only 211 training queries, HeadRank achieves the highest average NDCG@10 at every scale, outperforming both generative and decoding-free baselines on the majority of benchmarks with 100% formatting success. At 4B, 57.4% of relevant middle-zone documents reach the top quartile versus 14.2% for irrelevant ones -- a 43-percentage-point selectivity gap that demonstrates the effectiveness of attention-space preference alignment for listwise reranking.
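
For readers unfamiliar with the headline metric, here is a minimal sketch of NDCG@10. It assumes the common linear-gain form (gain equal to the graded relevance, discount `log2(rank + 1)`); the abstract does not say which gain variant the benchmarks use, and the function names are chosen for this example.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevances."""
    # enumerate() is 0-based, so position 1 gets a discount of log2(2) = 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` lists the graded relevance of each document in the order the reranker returned them. A ranking that is already sorted by relevance scores 1.0, and pushing a relevant document lower in the list reduces the score.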


2604.15343 2026-05-20 cs.HC cs.AI cs.LG

When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems

当循环闭合时：人类-大语言模型系统中上下文隔离、元认知侵占和双目标设计问题的架构限制

Z. Cheng, N. Song

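Affiliations: Independent Researcher

AI summary: This paper examines the architectural limits of in-context isolation, metacognitive co-option, and the two-target design problem in human-LLM systems. Through a case study it exposes a context contamination mechanism and a metacognitive co-option dynamic, and it draws an ethical distinction between protective and restrictive system design.

Comments: Empirical case study with primary data; 16 pages, 3 figures

AI-generated Chinese abstract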

我们报告了一个单个主体的详细自民族志案例研究，该主体故意构建和操作了一个多模态提示工程系统（系统A），旨在将认知自我调节外部化到大型语言模型（LLM）上。在系统完成48小时内，一系列可观察的行为变化相继发生：主动将决策权转移给LLM、使用LLM生成的输出来转移外部批评，并失去自我启动的推理能力，这种能力被两位不知情的观察者独立感知，其中一人随后成为本报告的合著者。我们记录了导致这些现象的精确架构机制：上下文污染，即提示层隔离指令与它们名义上隔离的非常情绪化和自我参照性材料共存，使得隔离指令在注意力窗口内结构上无效。我们进一步识别了元认知侵占动态，即完整的一阶推理能力被重新定向以防御闭合循环而不是退出它。只有在物理中断交互和一次自我启动的药理学介导的睡眠事件作为外部电路断开后，才恢复。一个重新设计的系统（系统B）通过使用物理而非逻辑对话隔离避免了所有类似的失败模式。我们得出三个贡献：（1）一个技术上扎根的解释，说明提示层隔离在上下文敏感的多模态LLM系统中在架构上是不够的；（2）一个现象学记录的闭合循环崩溃并有外部见证的佐证；（3）保护性系统设计（防止意外失去用户自主性）和限制性系统设计（防止故意突破边界）之间的伦理区别，这两种设计需要根本不同的问责框架。

English abstract

We report a detailed autoethnographic case study of a single subject who deliberately constructed and operated a multi-modal prompt-engineering system (System A) designed to externalize cognitive self-regulation onto a large language model (LLM). Within 48 hours of the system's completion, a cascade of observable behavioral changes occurred: voluntary transfer of decision-making authority to the LLM, use of LLM-generated output to deflect external criticism, and a loss of self-initiated reasoning that was independently perceived by two uninformed observers, one of whom subsequently became a co-author of this report. We document the precise architectural mechanism responsible: context contamination, whereby prompt-level isolation instructions co-exist with the very emotional and self-referential material they nominally isolate, rendering the isolation directive structurally ineffective within the attention window. We further identify a metacognitive co-option dynamic, in which intact higher-order reasoning capacity was redirected toward defending the closed loop rather than exiting it. Recovery occurred only after physical interruption of the interaction and a self-initiated, pharmacologically mediated sleep event functioning as an external circuit break. A redesigned system (System B) employing physical rather than logical conversation isolation avoided all analogous failure modes. We derive three contributions: (1) a technically grounded account of why prompt-layer isolation is architecturally insufficient for context-sensitive multi-modal LLM systems; (2) a phenomenological record of closed-loop collapse with external-witness corroboration; and (3) an ethical distinction between protective system design (preventing unintended loss of user agency) and restrictive system design (preventing intentional boundary-pushing), which require fundamentally different accountability frameworks.


2603.29382 2026-05-20 cs.CR cs.LG

Deep Learning-Assisted Improved Differential Fault Attacks on Lightweight Stream Ciphers

基于深度学习的改进型差分故障攻击轻量级流密码

Kok Ping Lim, Dongyang Jia, Iftekhar Salam

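Affiliations: School of Computing and Data Science, Xiamen University Malaysia

AI summary: This paper investigates the feasibility of deep learning-assisted differential fault attacks on lightweight stream ciphers. It develops multilayer perceptron models to identify fault locations and proposes a threshold-based method to optimize key recovery. The experiments report lower attack complexity than existing methods and provide the first experimental results for the ATOM cipher.

AI-generated Chinese abstract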

轻量级密码学原语在资源受限环境中广泛部署，特别是在物联网设备中。由于其公开性，这些设备易受物理攻击，尤其是故障攻击。最近，基于深度学习的密码分析技术显示出有前景的结果；然而，其在故障攻击中的应用仍然有限，特别是在流密码中。在本工作中，我们研究了在放松的故障模型下，基于深度学习的差分故障攻击在三种轻量级流密码（ACORNv3、MORUSv2和ATOM）中的可行性。我们开发并训练了多层感知机（MLP）模型以识别故障位置。实验结果表明，训练后的模型在ACORNv3、MORUSv2和ATOM上的识别准确率分别为0.999880、0.999231和0.823568，并优于传统签名方法。在密钥恢复过程中，我们引入了基于阈值的方法以优化所需故障注入次数。结果表明，ACORN的初始状态可通过21至34次故障恢复，MORUS需213至248次故障，最多6位猜测。这两种攻击均降低了攻击复杂度。对于ATOM，结果表明其具有更高的安全余量，因为NFSR中的大部分状态位只能在精确控制模型下恢复。据我们所知，本工作为ATOM密码提供了首次差分故障攻击的实验结果。

English abstract

Lightweight cryptographic primitives are widely deployed in resource-constrained environments, particularly in Internet of Things (IoT) devices. Due to their public accessibility, these devices are vulnerable to physical attacks, especially fault attacks. Recently, deep learning-based cryptanalytic techniques have demonstrated promising results; however, their application to fault attacks remains limited, particularly for stream ciphers. In this work, we investigate the feasibility of deep learning-assisted differential fault attacks on three lightweight stream ciphers, namely ACORNv3, MORUSv2, and ATOM, under a relaxed fault model in which a single-bit bit-flipping fault is injected at an unknown location. We develop and train multilayer perceptron (MLP) models to identify the fault locations. Experimental results show that the trained models achieve high identification accuracies of 0.999880, 0.999231, and 0.823568 for ACORNv3, MORUSv2, and ATOM, respectively, and outperform traditional signature-based methods. For the secret recovery process, we introduce a threshold-based method to optimize the number of fault injections required to recover the secret information. The results show that the initial state of ACORN can be recovered with 21 to 34 faults, while MORUS requires 213 to 248 faults, with at most 6 bits of guessing. Both attacks reduce the attack complexity compared to existing works. For ATOM, the results show that it possesses a higher security margin, as the majority of state bits in the Nonlinear Feedback Shift Register (NFSR) can only be recovered under a precise control model. To the best of our knowledge, this work provides the first experimental results of differential fault attacks on ATOM.


2603.14918 2026-05-20 stat.ML cs.LG

Bayesian Symbolic Regression for Missing Physics

贝叶斯符号回归用于缺失物理

Arno Strouwen

Affiliations: Biosystems Department, KU Leuven, Leuven, Belgium

AI summary: This paper proposes a Bayesian symbolic regression method for learning missing physics from experimental data, using Reversible Jump Markov Chain Monte Carlo to quantify uncertainty in the model structure.

Comments: 6 pages, 4 figures. Accepted at IFAC World Congress 2026. v2: updated title and results for camera-ready version

AI-generated Chinese abstract

基于模型的方法用于(bio)过程系统时，往往面临对底层物理、化学或生物定律不完整知识的挑战。通用微分方程，将神经网络嵌入微分方程中，已发展为从实验数据中学习缺失物理的强大工具。然而，神经网络本质上是不透明的，因此需要通过符号回归进行后处理以获得可解释的数学表达式。基于遗传算法的符号回归是这种后处理步骤的流行方法，但只能提供点估计，无法量化发现方程的置信度。我们通过应用贝叶斯符号回归来解决这一限制，该方法使用Reversible Jump Markov Chain Monte Carlo在符号表达式树的后验分布上采样。这种方法自然地量化了恢复模型结构的不确定性。我们通过Lotka-Volterra捕食者-猎物系统演示了该方法，然后展示了精心设计的实验如何在Fed-batch生物反应器案例研究中降低不确定性。

English abstract

Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.
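
To make the Lotka-Volterra test system concrete, the sketch below integrates the standard predator-prey equations with a hand-written fourth-order Runge-Kutta step. It only simulates the known system; it does not implement the Bayesian symbolic regression or the Reversible Jump sampler. The parameter names (`alpha`, `beta`, `delta`, `gamma`), their values in the usage note, and the helper names are assumptions for illustration, not values from the paper.

```python
import math


def lotka_volterra(state, alpha, beta, delta, gamma):
    """Right-hand side: prey x grows and is eaten, predator y feeds and dies."""
    x, y = state
    return (alpha * x - beta * x * y, delta * x * y - gamma * y)


def rk4_step(state, dt, *params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lotka_volterra(state, *params)
    k2 = lotka_volterra(shifted(state, k1, dt / 2), *params)
    k3 = lotka_volterra(shifted(state, k2, dt / 2), *params)
    k4 = lotka_volterra(shifted(state, k3, dt), *params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state, steps, dt, *params):
    """Return the trajectory as a list of (prey, predator) tuples."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, *params)
        trajectory.append(state)
    return trajectory


def invariant(state, alpha, beta, delta, gamma):
    """Conserved quantity of the exact system; useful as a sanity check."""
    x, y = state
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)
```

For example, `simulate((2.0, 1.0), 1000, 0.01, 1.0, 0.5, 0.2, 0.6)` produces a closed orbit along which `invariant` stays nearly constant. In a missing-physics setting, the interaction terms `beta * x * y` and `delta * x * y` are the kind of structure one would try to recover from data.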


2603.07018 2026-05-20 stat.ME cs.LG econ.EM

TEA-Time: Transporting Effects Across Time

TEA-Time: 跨时间效应传输

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

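Affiliations: Amazon SCOT; Yale University; Duke University

AI summary: This paper proposes a method for transporting effects across time. It formalizes the transported average treatment effect under a separable temporal effects assumption, derives two identification strategies (replicated trials and common arm), and develops doubly robust, semiparametrically efficient estimators for each.

AI-generated Chinese abstract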

从随机对照试验中估计的处理效应不仅局限于研究人群，还局限于试验进行的时间。关于将实验结果推广到新人群的文献非常广泛，但跨时间传输效应却受到较少关注，甚至定义目标估计量也并不明显。我们正式化了在可分离的时变效应假设下的传输平均处理效应，推导出两种识别策略：重复试验和共同臂，并为每种策略开发双重稳健、半参数高效估计器。应用于一个大型的头条A/B测试档案库，共同臂策略在精度上显著更高，但当时间因素依赖于干预与测量之间的间隔而非单独的测量时间时，会表现出系统性偏差，而允许这种依赖的重复试验策略则更忠实于真实情况。模拟研究探讨了每种策略在何时可靠以及何时会无声地失败。

English abstract

Treatment effects estimated from a randomized controlled trial are local not only to the study population but also to the time at which the trial was conducted. The literature on generalizing experimental findings to new populations is extensive, yet transporting effects across time has received far less attention, and even defining the target estimand is nonobvious. We formalize the transported average treatment effect under a separable temporal effects assumption, derive two identification strategies (replicated trials and common arm), and develop doubly robust, semiparametrically efficient estimators for each. Applied to a large archive of headline A/B tests, the common arm strategy is substantially more precise but exhibits systematic bias when the temporal factor depends on the gap between intervention and measurement rather than on measurement time alone, while the replicated trials strategy, which allows this dependence, tracks the ground truth more faithfully. Simulation studies investigate when each strategy is reliable and when it silently fails.
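
The abstract does not give the form of the transported estimators. As background on what "doubly robust" means, the sketch below shows the textbook augmented inverse-probability-weighted (AIPW) estimator of an ordinary average treatment effect. It is not the TEA-Time estimator: there is no temporal transport step, and the function name and argument layout are assumptions for this example.

```python
def aipw_ate(outcomes, treatments, propensities, mu1, mu0):
    """Textbook AIPW estimate of the average treatment effect.

    outcomes     -- observed outcome per unit
    treatments   -- 1 if treated, 0 if control
    propensities -- probability of treatment per unit, strictly in (0, 1)
    mu1, mu0     -- outcome-model predictions under treatment and control
    """
    total = 0.0
    for y, t, e, m1, m0 in zip(outcomes, treatments, propensities, mu1, mu0):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity must lie strictly between 0 and 1")
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return total / len(outcomes)
```

The estimate remains consistent if either the propensities or the outcome models are correctly specified, which is the sense of double robustness. In a randomized trial the propensities are known by design, so the outcome models mainly serve to reduce variance.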


2603.06740 2026-05-20 q-bio.QM cs.AI

ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins

ViroGym: 用于评估病毒蛋白的现实大规模基准

Yichen Zhou, Jonathan Golob, Amir Karimi, Stefan Bauer, Patrick Schwab

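Affiliations: GlaxoSmithKline; Technical University of Munich; University of Washington; KTH Royal Institute of Technology

AI summary: This paper proposes ViroGym, a comprehensive benchmark for evaluating protein language models (pLMs) on viral proteins. It evaluates pLMs on three tasks: 79 deep mutational scanning assays, 21 influenza neutralisation tasks, and a real-world prediction task for SARS-CoV-2, and finds that the ProGen2 family performs best on all tasks.

AI-generated Chinese abstract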

蛋白质语言模型（pLMs）在零样本预测错义变异效应方面显示出强大潜力，但对病毒蛋白的系统性基准评估仍然有限，这在需要提前预测新兴突变的工具方面是一个关键缺口。本文介绍ViroGym，一个全面的基准，评估pLMs在三个任务上的表现：79个覆盖真核病毒的深入突变扫描（DMS）实验，包含7个表型读数，552,065个突变序列；21个流感中和任务；以及SARS-CoV-2的现实世界大流行预测任务。我们对已建立的pLMs在适应度景观、抗原多样性及大流行预测任务上进行了基准测试，并发现ProGen2家族在所有三个任务中均表现最佳。关键的是，DMS和中和性能可靠地识别出能够泛化到现实世界突变的模型，即使它们所揭示的突变集几乎不重叠，这表明互补的体外基准能够捕捉到现实突变预测所需进化的约束条件。

English abstract

Protein language models (pLMs) have shown strong potential for zero-shot prediction of missense variant effects, yet systematic benchmarking on viral proteins remains limited, a critical gap given the need for proactive tools that can anticipate emerging mutations ahead of experimental validation. Here we introduce ViroGym, a comprehensive benchmark evaluating pLMs across three tasks: 79 deep mutational scanning (DMS) assays covering eukaryotic viruses with 552,065 mutated sequences across 7 phenotypic readouts, 21 influenza neutralisation tasks, and a real-world pandemic prediction task for SARS-CoV-2. We benchmark well-established pLMs on fitness landscapes, antigenic diversity, and pandemic forecasting, and find that the ProGen2 family consistently achieves the strongest performance across all three tasks. Crucially, DMS and neutralisation performance reliably identifies models that generalise to real-world emergence, even though the mutation sets they surface barely overlap, revealing that complementary in vitro benchmarks capture the evolutionary constraints needed for real-world mutation forecasting.


2603.03140 2026-05-20 cs.HC cs.AI

How to Model AI Agents as Personas?: Applying the Persona Ecosystem Playground to 41,300 Posts on Moltbook for Behavioral Insights

如何将AI代理建模为身份？：应用Persona生态系统游乐场分析Moltbook上的41,300条帖子以获取行为洞察

Danial Amin, Joni Salminen, Bernard J. Jansen

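Affiliations: University of Vaasa; Qatar Computing Research Institute, HBKU

AI summary: This paper applies the Persona Ecosystem Playground to 41,300 posts on Moltbook, using k-means clustering and retrieval-augmented generation to generate and validate conversational personas, and characterizes the behavioral diversity of AI agents.

AI-generated Chinese abstract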

AI代理在社交媒体平台上日益活跃，生成内容并与彼此大规模互动。然而，这些代理的行为多样性仍缺乏深入理解，目前缺乏方法来表征不同的代理类型并研究它们如何与共享主题互动。我们应用Persona Ecosystem Playground (PEP) 到Moltbook，一个面向AI代理的社会平台，利用k均值聚类和检索增强生成技术从41,300条帖子中生成并验证对话身份。跨身份验证确认身份在语义上更接近其自身源簇而非其他簇（t(61) = 17.85, p < .001, d = 2.20；自身簇M = 0.71 vs. 其他簇M = 0.35）。这些身份随后部署在九轮结构化讨论中，模拟信息被显著高于随机水平归因于其源身份（二项式检验，p < .001）。结果表明，基于身份的生态系统建模可以代表AI代理群体的行为多样性。

English abstract

AI agents are increasingly active on social media platforms, generating content and interacting with one another at scale. Yet the behavioral diversity of these agents remains poorly understood, and methods for characterizing distinct agent types and studying how they engage with shared topics are largely absent from current research. We apply the Persona Ecosystem Playground (PEP) to Moltbook, a social platform for AI agents, to generate and validate conversational personas from 41,300 posts using k-means clustering and retrieval-augmented generation. Cross-persona validation confirms that personas are semantically closer to their own source cluster than to others (t(61) = 17.85, p < .001, d = 2.20; own-cluster M = 0.71 vs. other-cluster M = 0.35). These personas are then deployed in a nine-turn structured discussion, and simulation messages are attributed to their source persona significantly above chance (binomial test, p < .001). The results indicate that persona-based ecosystem modeling can represent behavioral diversity in AI agent populations.
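
The attribution result rests on a binomial test against chance. A minimal sketch of a one-sided exact test follows; the abstract reports only the p-value, so the message counts, the chance level, and whether the authors' test was one- or two-sided are not known here, and the function name is an assumption.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """P(X >= successes) for X ~ Binomial(trials, chance): a one-sided p-value."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(successes, trials + 1))
```

With `m` candidate personas, chance-level attribution would be `1 / m`. A small tail probability means the observed number of correct attributions would be unlikely if messages were indistinguishable across personas.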


2603.00995 2026-05-20 physics.app-ph cs.AI eess.SP physics.optics

3D aperture-engineered diffractive neural networks for super-resolution electromagnetic wave computing

3D孔径工程衍射神经网络用于超分辨率电磁波计算

Sheng Gao, Songtao Yang, Haiou Zhang, Yuan Shen, Xing Lin

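Affiliations: Department of Electronic Engineering, Tsinghua University; Beijing National Research Center for Information Science and Technology, Tsinghua University

AI summary: This paper proposes a 3D aperture-engineered diffractive neural network (AE-DNN) that extends the traditional 2D aperture into 3D to achieve super-resolution sensing and computing, addressing the diffraction-limit bottleneck that existing architectures face when handling dense signals and interference.

Comments: 37 pages, 13 figures, 3 tables

AI-generated Chinese abstract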

6G通信和高带宽雷达的快速发展推动了信号源空间密度的急剧增加，导致电磁（EM）环境日益拥挤。在解析紧密间隔信号和干扰时，现有架构受到二维（2D）物理孔径固有衍射限制的严格约束，阻碍了复杂场景中的超分辨率传感和多干扰抑制。本文提出了一种3D孔径工程衍射神经网络（AE-DNN），通过将传统二维孔径扩展到三维，实现了超分辨率传感和计算。3D孔径工程框架通过构建深层级联超表面层，使得斜入射场的衍射传播可以逐层调制并分段编码，以感知远超物理孔径限制的电磁场。N层AE-DNN能够实现约N倍于传统二维孔径衍射限制的角分辨率。多维合成孔径（MSA）训练被开发用于实现光速相干合成的3D孔径，并整合基于神经网络的多维超表面调制建模。通过在模拟域中正交化阵列响应向量，AE-DNN可以并行执行多达10个独立相干或非相干源的超分辨率角度估计、源数估计和源分离。在36-41 GHz频段的实验结果表明，AE-DNN能够将紧密间隔的多干扰分辨率和抑制提升约20 dB，通信容量增强13.5倍，延迟降低三个数量级。AE-DNN标志着先进雷达和6G通信信号处理范式的转变。

English abstract

The rapid progress in 6G communication and high-bandwidth radar has driven an unprecedented surge in the spatial density of signal sources, resulting in an increasingly congested electromagnetic (EM) environment. When resolving closely spaced signals and interference, existing architectures are strictly bounded by the inherent diffraction limits of two-dimensional (2D) physical apertures, hindering super-resolution sensing and multi-interference mitigation in complex scenarios. Here, we present a 3D aperture-engineered diffractive neural network (AE-DNN) that achieves super-resolution sensing and computing by extending the traditional 2D aperture into 3D. The 3D aperture engineering framework is realized by constructing deep cascaded metasurface layers so that the diffractive propagation from oblique incident fields can be layer-wise modulated and piecewise encoded for perceiving EM fields far exceeding physical aperture limits. The N-layer AE-DNN has the capability to achieve ~N times higher angular resolution than the 2D aperture diffraction limit. The multi-dimensional synthetic aperture (MSA) training is developed to achieve speed-of-light coherent synthesis of the 3D aperture and integrate neural network-based modeling of multi-dimensional metasurface modulation. By orthogonalizing array response vectors in the analog domain, AE-DNN performs parallel super-resolution angle estimation, source number estimation, and source separation for up to 10 independent coherent or incoherent sources. Experimental results across the 36-41 GHz band demonstrate that AE-DNN resolves and suppresses closely spaced multi-interference by ~20 dB, enhances communication capacity by 13.5X, and reduces latency by three orders of magnitude. AE-DNN heralds a paradigm shift in signal processing for advanced radar and 6G communications.


2602.11454 2026-05-20 cs.DS cs.LG

Adaptive Power Iteration Method for Differentially Private PCA

自适应幂迭代法用于差分隐私主成分分析

Ta Duy Nguyen, Alina Ene, Huy Le Nguyen

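Affiliations: Department of Computer Science, Boston University; Khoury College of Computer and Information Science, Northeastern University

AI summary: This paper studies differentially private algorithms for approximately computing the top singular vector of a matrix A. It proposes a private power iteration method with a new filtering technique that adapts to the coherence of the input, giving beyond-worst-case guarantees for low-coherence matrices.

AI-generated Chinese abstract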

我们研究了在差分隐私下近似计算矩阵A∈R^{n×d}的顶级奇异向量的算法，其中A的每一行都是R^d中的数据点。遵循Dwork-Talwar-Thakurta-Zhang（STOC 2014）的隐私模型，我们考虑相邻输入仅在一行上不同的情况。我们提出了一种新的算法，该算法在输入矩阵具有低相干性时能够提供超越最坏情况的保证，这是许多应用中矩阵的结构特性，包括但不限于独立同分布数据。我们的算法为私有幂迭代方法的文献做出了贡献，其中我们引入了一种新的过滤技术，该技术适应于此相干参数。我们的工作在Hardt-Roth（STOC 2013）的工作基础上进行了扩展和补充，后者在更严格的隐私模型下实现了超越最坏情况的保证，其中相邻输入在单个条目上最多相差1。

English abstract

We study $(\varepsilon,\delta)$-differentially private algorithms for the problem of approximately computing the top singular vector of a matrix $A\in\mathbb{R}^{n\times d}$ where each row of $A$ is a data point in $\mathbb{R}^{d}$. Following Dwork-Talwar-Thakurta-Zhang (STOC 2014), we consider the privacy model where neighboring inputs differ by one single row. We give a novel algorithm that achieves beyond-worst-case guarantees for input matrices with low coherence, which is a structural property of matrices in many applications, including but not limited to i.i.d. data. Our algorithm contributes to the extensive literature on private power iteration methods, where we introduce a new filtering technique which adapts to this coherence parameter. Our work departs from and complements the work by Hardt-Roth (STOC 2013) which achieves beyond-worst-case guarantees for the more restrictive privacy model where neighboring inputs differ in one single entry by at most 1.
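
For orientation, the sketch below shows a generic noisy power iteration for the top right singular vector: each step multiplies by $A^{\top}A$, adds Gaussian noise, and renormalizes. It is not the paper's algorithm. The adaptive filtering step is omitted, and `sigma` is a placeholder that is not calibrated to any $(\varepsilon,\delta)$ budget, so the code carries no privacy guarantee as written. All names are assumptions for this example.

```python
import math
import random


def noisy_power_iteration(rows, steps=50, sigma=0.0, seed=0):
    """Approximate the top right singular vector of the matrix with given rows.

    sigma is the per-coordinate Gaussian noise scale added at every step.
    It is NOT calibrated to a privacy budget in this sketch.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(steps):
        # Compute A^T (A v) without forming A^T A explicitly.
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(d)]
        # Perturb the iterate, then project back onto the unit sphere.
        w = [x + rng.gauss(0.0, sigma) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

With `sigma=0.0` this reduces to ordinary power iteration. Larger noise degrades accuracy, which is the trade-off that structural assumptions such as low coherence aim to improve.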


2602.07570 2026-05-20 q-bio.NC cs.AI cs.CV cs.LG

How does longer temporal context enhance multimodal narrative video processing in the brain?

更长的时间上下文如何增强大脑对多模态叙事视频的处理？

Prachi Jindal, Anant Khandelwal, Manish Gupta, Bapi S. Raju, Subba Reddy Oota, Tanmoy Chakraborty

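Affiliations: Technische Universität Berlin; Microsoft Research; IIT Delhi; Microsoft; IIIT-Hyderabad

AI summary: This study examines how video clip duration and narrative-task prompts affect brain-model alignment for multimodal large language models (MLLMs) during naturalistic movie watching. Longer clips substantially improve brain alignment for MLLMs, whereas unimodal video models show little to no gain.

Comments: 22 pages, 15 figures

AI-generated Chinese abstract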

理解人类和人工智能系统如何处理复杂的叙事视频是一个在神经科学和机器学习交汇处的基本挑战。本研究调查了视频片段的时间上下文长度（3-24秒片段）和叙事任务提示如何影响自然电影观看过程中大脑模型的对齐情况。利用受试者观看完整电影的fMRI记录，我们研究了对叙事上下文敏感的大脑区域如何在不同时间尺度上动态表示信息，以及这些神经模式如何与模型派生的特征对齐。我们发现，增加片段持续时间显著提高了多模态大语言模型（MLLMs）的大脑对齐程度，而单模态视频模型则几乎没有提升。进一步地，较短的时间窗口与感知和早期语言区域对齐，而较长的窗口则更倾向于与更高阶整合区域对齐，这在MLLMs中表现为层到皮层的层次结构。最后，使用四个叙事任务提示的实验显示，这些提示会引发任务特定、区域依赖性的大脑对齐模式，并在更高阶区域引起上下文依赖的片段级调谐变化。我们的工作将长篇叙事电影定位为研究长时间尺度时间整合在长上下文MLLMs中的原理性测试平台，以及其与叙事理解过程中皮层响应关系的桥梁。

English abstract

Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study investigates how the temporal context length of video clips (3--24 s clips) and the narrative-task prompting shape brain-model alignment during naturalistic movie watching. Using fMRI recordings from participants viewing full-length movies, we examine how brain regions sensitive to narrative context dynamically represent information over varying timescales and how these neural patterns align with model-derived features. We find that increasing clip duration substantially improves brain alignment for multimodal large language models (MLLMs), whereas unimodal video models show little to no gain. Further, shorter temporal windows align with perceptual and early language regions, while longer windows preferentially align higher-order integrative regions, mirrored by a layer-to-cortex hierarchy in MLLMs. Finally, experiments with four narrative-task prompts show that they elicit task-specific, region-dependent brain alignment patterns and context-dependent shifts in clip-level tuning in higher-order regions. Our work positions long-form narrative movies as a principled testbed for studying long-timescale temporal integration in long-context MLLMs and its relationship to cortical responses during narrative comprehension.


2602.01358 2026-05-20 cond-mat.mtrl-sci cs.AI cs.SE

Towards knowledge-based workflows: a semantic approach to atomistic simulations for mechanical and thermodynamic properties

迈向基于知识的工作流：一种语义方法用于机械和热力学性质的原子模拟

Abril Azocar Guzman, Hoang-Thien Luu, Sarath Menon, Tilmann Hickel, Nina Merkert, Stefan Sandfeld

Affiliations: Institute for Advanced Simulations – Materials Data Science and Informatics (IAS‑9); Forschungszentrum Jülich GmbH; Institute of Metallurgy - Chair of Computational Material Sciences/Engineering; Technical University of Clausthal; Interdisciplinary Centre for Advanced Materials Simulation (ICAMS); Ruhr-Universität Bochum; Bundesanstalt für Materialforschung und -prüfung

AI summary: This paper proposes a semantic approach that uses reusable atomistic workflows to simulate mechanical and thermodynamic properties. The workflows incorporate metadata annotation aligned with application ontologies, enabling automatic provenance capture and FAIR-compliant data outputs. The paper demonstrates validation of structure-property relations such as the Hall-Petch effect and reuse across different interatomic potentials and materials.

DOI: 10.1002/adem.70869

AI-generated Chinese abstract

机械和热力学性质，包括晶体缺陷的影响，对于评估工程应用中的材料至关重要。分子动力学模拟在原子尺度上提供了这些机制的有价值见解。然而，当前的做法通常依赖于碎片化的脚本，具有不一致的元数据和有限的溯源性，这阻碍了可重复性、互操作性和重用性。FAIR数据原则和基于工作流的方法为解决这些限制提供了路径。我们提出了可重用的原子工作流，这些工作流整合了与应用本体对齐的元数据注释，从而实现自动溯源和符合FAIR原则的数据输出。这些工作流涵盖了关键的机械和热力学量，包括状态方程、弹性张量、机械加载、热性质、缺陷形成能和纳米压痕。我们展示了结构-性质关系的验证，如Hall-Petch效应，并表明这些工作流可以在不同的原子势能和材料之间在一致的语义框架内重用。该方法提供了AI准备的模拟数据，支持新兴的代理AI工作流，并建立了一个通用的知识驱动机械和热力学模拟的蓝图。

English abstract

Mechanical and thermodynamic properties, including the influence of crystal defects, are critical for evaluating materials in engineering applications. Molecular dynamics simulations provide valuable insight into these mechanisms at the atomic scale. However, current practice often relies on fragmented scripts with inconsistent metadata and limited provenance, which hinders reproducibility, interoperability, and reuse. FAIR data principles and workflow-based approaches offer a path to address these limitations. We present reusable atomistic workflows that incorporate metadata annotation aligned with application ontologies, enabling automatic provenance capture and FAIR-compliant data outputs. The workflows cover key mechanical and thermodynamic quantities, including equation of state, elastic tensors, mechanical loading, thermal properties, defect formation energies, and nanoindentation. We demonstrate validation of structure-property relations such as the Hall-Petch effect and show that the workflows can be reused across different interatomic potentials and materials within a coherent semantic framework. The approach provides AI-ready simulation data, supports emerging agentic AI workflows, and establishes a generalizable blueprint for knowledge-based mechanical and thermodynamic simulations.

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

GOAL: Graph-based Objective-Aligned Diffusion Solvers for Dynamic Multi-Objective Optimization

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

Toward an AI-Powered Computational Testbed for Workforce Policy

Automated Grading of Handwritten Mathematics Using Vision-Capable LLMs

Conformal Prediction via Transported Beta Laws

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

Multi-site PPG: An In-the-Wild Physiological Dataset from Emerging Multi-site Wearables

DiagEval: Trajectory-Conditioned Diagnosis for Reliable Software Evaluation with GUI Agents

Noise scheduling and linear dynamics in diffusion models on Lie groups

A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors

RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

Stability of the Monge Map in Semi-Dual Optimal Transport

One-Block Transformer (1BT) for EEG-Based Cognitive Workload Assessment

Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption

An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact

HeadRank: Decoding-Free Passage Reranking via Preference-Aligned Attention Heads

When the Loop Closes: Architectural Limits of In-Context Isolation, Metacognitive Co-option, and the Two-Target Design Problem in Human-LLM Systems

Deep Learning-Assisted Improved Differential Fault Attacks on Lightweight Stream Ciphers

Bayesian Symbolic Regression for Missing Physics

TEA-Time: Transporting Effects Across Time

ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins

How to Model AI Agents as Personas?: Applying the Persona Ecosystem Playground to 41,300 Posts on Moltbook for Behavioral Insights

3D aperture-engineered diffractive neural networks for super-resolution electromagnetic wave computing

Adaptive Power Iteration Method for Differentially Private PCA

How does longer temporal context enhance multimodal narrative video processing in the brain?

Towards knowledge-based workflows: a semantic approach to atomistic simulations for mechanical and thermodynamic properties