arXivDaily: arXiv daily academic digest, updated Monday through Friday
All subject categories: 3406
2605.25878 2026-05-26 eess.IV cs.CV

A Clinically Validated Foundation Model for Comprehensive Lung Pathology Interpretation

Zhengrui Guo, Zhengyu Zhang, Jiabo Ma, Yihui Wang, Fengtao Zhou, Yingxue Xu, Ling Liang, Chenglong Zhao, Qi Xie, Jinbang Li, Shujing Guo, Fangyi Han, Zhijian Cen, Ziyi Liu, Cheng Jin, Junlin Hou, Zhixuan Chen, Yu Cai, Lijuan Qu, Shifu Chen, Yueping Liu, Zhe Wang, Xiuming Zhang, Muyan Cai, Li Liang, Hao Chen

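AI summary: Proposes PulmoFoundation, a lung pathology foundation model built on Virchow2 through subspecialty pretraining on about 40,000 H&E-stained whole-slide images. Validated on 32 clinical tasks, in a prospective study, and in a randomized controlled trial, it substantially improves diagnostic accuracy, efficiency, and consistency.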

English abstract

Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current computational pathology (CPath) approaches rely on task-specific models for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a multi-center, prospectively validated, randomized controlled trial (RCT)-evaluated foundation model for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care. Built upon Virchow2 via subspecialty-specific pretraining using ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, our model achieves clinical-grade performance in core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, our model achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could reduce additional second-review burden for 68.8% of biopsies and 83.0% of frozen sections, and defer 44.5% of immunohistochemistry (IHC) stain orders, with positive predictive values (PPVs) of 1.0, 0.991, and 0.966, respectively. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% with AI vs. 83.8% without AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
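
A minimal sketch of how a pre-specified triage threshold trades coverage (the share of cases spared extra review) against positive predictive value. It assumes one AI score per case and a binary label that is 1 when the triaged decision would have been correct; the function name and toy numbers are illustrative assumptions, not PulmoFoundation's actual procedure or data.

```python
from typing import Sequence, Tuple


def triage_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Triage every case whose AI score is at least `threshold`.

    labels[i] is 1 if the triaged decision for case i would have been correct.
    Returns (fraction of cases triaged, PPV among the triaged cases).
    """
    triaged = [y for s, y in zip(scores, labels) if s >= threshold]
    if not triaged:
        return 0.0, float("nan")  # nothing triaged: PPV is undefined
    fraction = len(triaged) / len(scores)
    ppv = sum(triaged) / len(triaged)
    return fraction, ppv


# Toy example: a stricter threshold triages fewer cases but makes fewer errors.
scores = [0.99, 0.97, 0.95, 0.80, 0.60, 0.30]
labels = [1, 1, 1, 0, 1, 0]
print(triage_at_threshold(scores, labels, 0.9))
print(triage_at_threshold(scores, labels, 0.5))
```

Fixing the threshold before the prospective study, as the abstract describes, is what keeps the reported coverage and PPV from being tuned on the evaluation data.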

2605.25868 2026-05-26 cs.HC cs.LG

The Timing Dependencies of Trust: Speed, Accuracy, and cBCI Neuro-Decoupling in Human-AI Teams

Christopher Baker, Stephen Hinton, Akashdeep Nijjar, Riccardo Poli, Caterina Cinel, Tom Reed, Stephen Fairclough

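AI summary: Comparing a fast, low-accuracy AI assistant (FLA-AI) with a slow, high-accuracy one (SA-AI), using a collaborative brain-computer interface (cBCI) and an adaptive Riemannian Oracle, the study shows that AI response time determines how teams fail: fast AI triggers blind compliance, slow AI triggers delayed cognitive conflict, and hybrid fusion improves team performance.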

English abstract

The speed and accuracy of an artificial teammate fundamentally alter the failure states of Human-AI integration. While high-speed AI interventions risk inducing reflexive blind compliance, delayed interventions can induce ambiguous cognitive conflict. This study investigates how the fundamental characteristics of an in-task AI assistant, Fast/Less-Accurate (FLA-AI) versus Slow/Accurate (SA-AI), impact the synergy of Collaborative Brain-Computer Interface (cBCI) teams in a Virtual Reality drone task. Seventeen operators completed continuous search tasks under high cognitive workload while their spatial covariance was mapped using a 2D Adaptive Riemannian Oracle. The results mathematically demonstrate that AI timing dictates the mechanism of team failure. Fast AI induced instant, blind compliance; human accuracy under deception collapsed to 50.2%, and pure behavioural teams (N=8) failed to scale beyond 74.1%. In contrast, Slow AI induced delayed cognitive conflict; humans hesitated (61.1% accuracy), but N=8 behavioural teams eventually recovered to 100.0%. Crucially, the Riemannian Oracle mathematically adapted to these states: it heavily restricted temporal windows (< 0.8s) to intercept fast reflexive compliance, while widening windows (> 1.2s) to capture delayed cognitive conflict. Integrating these isolated veridical signals via Hybrid Fusion successfully rescued the Fast AI team (+7.6% at N=8) and significantly accelerated the recovery of smaller Slow AI teams (+6.9% at N=4). These findings prove that cBCI synergy is heavily contingent on the temporal dynamics of trust, providing a critical framework for designing dynamically gated Human-AI systems.
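
The abstract does not spell out how Hybrid Fusion works, so the following is only a generic illustration of the kind of confidence-weighted majority vote often used to form group decisions in collaborative BCIs. The blend weight `alpha`, the function names, and the toy confidences are assumptions; nothing here reproduces the authors' Riemannian Oracle.

```python
from typing import List, Sequence


def team_decision(decisions: Sequence[int], confidences: Sequence[float]) -> int:
    """Confidence-weighted majority vote over +1/-1 decisions (ties resolve to +1)."""
    total = sum(d * c for d, c in zip(decisions, confidences))
    return 1 if total >= 0 else -1


def hybrid_confidence(
    behavioural: Sequence[float], neural: Sequence[float], alpha: float = 0.5
) -> List[float]:
    """Blend per-member behavioural and neural confidence; alpha is a hypothetical mixing weight."""
    return [alpha * b + (1 - alpha) * n for b, n in zip(behavioural, neural)]


# Toy team of five: two confident members are right (+1),
# three hesitant members follow a misleading AI cue (-1).
decisions = [1, 1, -1, -1, -1]
behavioural = [0.9, 0.8, 0.3, 0.3, 0.3]
neural = [0.9, 0.9, 0.2, 0.2, 0.2]
print(team_decision(decisions, [1.0] * 5))                              # unweighted majority
print(team_decision(decisions, hybrid_confidence(behavioural, neural)))  # hybrid-weighted
```

In this toy case the unweighted majority follows the misleading cue (-1), while down-weighting low-confidence members recovers the correct answer (+1); that is the intuition behind fusing neural and behavioural signals, not a claim about the paper's numbers.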

2605.25859 2026-05-26 math.ST cs.LG stat.TH

Minimax Limits of k-Fold Cross-Validation via Majority

Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

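AI summary: By analyzing the cross-validation mean-squared error of the majority algorithm in binary classification, the paper characterizes the minimax limits of k-fold cross-validation and proves that, when the number of folds k grows with the sample size n, the minimax mean-squared error of any empirical risk minimization algorithm is lower-bounded by Ω(√k/n).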

English abstract

We study the mean-squared error of $k$-fold cross-validation as a risk estimator, with particular emphasis on how its accuracy depends on the number of folds $k$. Despite the widespread use of cross-validation, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates. To obtain sharp and interpretable results, we focus on the majority algorithm in binary classification, a minimal yet nontrivial empirical risk minimization procedure. We provide a fine-grained analysis of its cross-validation behavior, showing that even this simple algorithm exhibits subtle and delicate phenomena for which existing theory provides loose and even vacuous bounds. Leveraging this analysis, we introduce a minimax framework for cross-validation risk estimation and prove that no empirical risk minimization algorithm can achieve an $O(1/n)$ minimax mean-squared error when the number of folds grows with the number of samples $n$; instead, a lower bound of order $\Omega(\sqrt{k}/n)$ is unavoidable. Our results reveal fundamental limitations of cross-validation as a data-reuse strategy, clarify gaps and inaccuracies in prior theoretical work, and position the majority algorithm as a natural benchmark that any tight analysis of cross-validation should be able to explain.
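
To make the object of study concrete, here is a minimal sketch of the $k$-fold cross-validation risk estimate for the majority algorithm. It assumes 0/1 labels, contiguous folds, and that $k$ divides $n$; features are omitted because the majority rule ignores them. The tie-breaking rule (ties predict 1) is an assumption of this sketch.

```python
from typing import Sequence


def majority_predict(train_labels: Sequence[int]) -> int:
    """Majority algorithm: predict the most frequent training label (ties -> 1)."""
    ones = sum(train_labels)
    return 1 if 2 * ones >= len(train_labels) else 0


def kfold_cv_risk(labels: Sequence[int], k: int) -> float:
    """k-fold cross-validation estimate of the 0-1 risk of the majority algorithm."""
    n = len(labels)
    if n % k != 0:
        raise ValueError("this sketch assumes k divides n")
    m = n // k
    errors = 0
    for f in range(k):
        held_out = labels[f * m:(f + 1) * m]
        train = list(labels[:f * m]) + list(labels[(f + 1) * m:])
        pred = majority_predict(train)
        errors += sum(1 for y in held_out if y != pred)
    return errors / n


print(kfold_cv_risk([1, 1, 1, 0], 4))  # leave-one-out on imbalanced labels
print(kfold_cv_risk([1, 1, 0, 0], 4))  # leave-one-out on perfectly balanced labels
```

The second call illustrates one of the delicate effects the abstract alludes to: with perfectly balanced labels, removing a held-out point always tips the training majority to the opposite class, so leave-one-out reports a risk of 1.0 even though the majority rule's true risk on such data is about 0.5.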

2605.25856 2026-05-26 cs.HC cs.AI

Explaining Too Much? Understanding How Large Language Model Reasoning Traces Influence Performance and Metacognition

Daniela Fernandes, Daniel Buschek, Lev Tankelevitch, Thomas Kosch, Robin Welsch

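AI summary: A user study of how showing large language model reasoning traces (full or summarized) affects task performance, trust, hedonic appeal, and the calibration of self-assessment. Traces improve the subjective experience without any performance gain, and no trace format prevents overconfident self-assessment.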

Comments 27 pages, 5 figures, 9 tables

English abstract

Large Language Model interfaces are increasingly verbose, exposing intermediate reasoning traces alongside final answers. Traces are framed as transparency mechanisms, yet it is unclear how people use them to solve problems. We report a preregistered between-subjects study (N = 559) in which participants solved ten LSAT-style reasoning problems under one of three conditions: an Answer-only baseline, a Full-trace revealed before the answer, and a Summary-trace presented alongside the answer. Summaries preserved task performance at the no-trace baseline while significantly elevating trust and hedonic appeal, establishing that trace exposure shifts subjective appraisal of the interaction without bringing performance benefits. Under an open-weight reasoning model exposing verbose intermediate output, full traces additionally impaired performance relative to the answer-only baseline. Across all conditions, participants substantially overestimated their performance, and no trace format supported calibrated self-evaluation. Further analysis indicates that hedonic appeal, not trust, carries the indirect path to overestimation, consistent with a processing-fluency account. Reasoning traces are best understood as user-facing interface artifacts rather than transparent windows into model cognition, and calibration is unlikely to emerge from the traces themselves and may best be scaffolded by interactions that elicit users' own reasoning first.

2605.25836 2026-05-26 cs.CR cs.AI cs.CL

TTPrint: Evidence-Grounded TTP Extraction via Diverge-then-Converge Verification

Yutong Cheng, Changze Li, Raihan Sultan Pasha Basuki, Qian Cui, Wei Ding, Peng Gao

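AI summary: Proposes TTPrint, which follows a diverge-then-converge design (extract broadly, then verify rigorously) that combines deterministic evidence localization with verification against authoritative definitions, and substantially improves macro-F1 on document-level TTP extraction.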

Comments Preprint

English abstract

Extracting MITRE ATT&CK techniques from cyber threat intelligence (CTI) reports is an open-set, multi-label problem requiring both high recall (not missing techniques) and high precision (not hallucinating unsupported ones). Existing methods (rule-based, supervised, and LLM-based) struggle to achieve both: rule-based and supervised approaches lack generalizability across diverse attack descriptions, while LLM-based approaches that couple candidate generation and validation within a single inference step suffer from limited recall and precision simultaneously. We propose TTPrint, which addresses this challenge through a diverge-then-converge design inspired by how human analysts work: first extracting broadly, then verifying rigorously. In the divergent phase, reports are decomposed into atomic behaviors and candidate techniques are proposed broadly. A deterministic span localization stage then anchors each candidate to a specific evidence window in the source text. A convergent verification stage retains only candidates supported by both the localized evidence and the authoritative MITRE definition. We contribute two evaluation resources, a cleaned TRAM benchmark (TRAM-Clean) and a new annotated dataset (TTPrint-Bench), to address known annotation noise in existing benchmarks and elevate the task to document-level TTP extraction. On TRAM-Clean and TTPrint-Bench, TTPrint achieves 76.48% and 87.39% macro-F1 respectively, outperforming the leading baseline by 63.5% and 29.4%. A multi-backbone analysis across six LLMs and a threshold sensitivity study further demonstrate generalizability across model choices and provide practical guidance for parameter selection.
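
The localize-then-verify idea can be sketched with a deliberately simple stand-in. In the sketch below, a candidate technique is anchored to the sentence window with the highest token coverage, and the convergent step keeps only candidates whose coverage clears a threshold. TTPrint verifies with an LLM against the MITRE definitions; the lexical overlap, the window size, the threshold, and the short technique descriptions here are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def localize(candidate: str, sentences: List[str], window: int = 2) -> Tuple[int, float]:
    """Deterministically anchor a candidate to the sentence window with the highest token coverage.

    Returns (start sentence index, fraction of candidate tokens found in that window).
    """
    cand = tokens(candidate)
    best_start, best_score = 0, 0.0
    for start in range(max(1, len(sentences) - window + 1)):
        span = tokens(" ".join(sentences[start:start + window]))
        score = len(cand & span) / len(cand) if cand else 0.0
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def converge(candidates: Dict[str, str], sentences: List[str], threshold: float = 0.6) -> List[str]:
    """Keep only technique IDs whose description is covered by some evidence window."""
    return [tid for tid, desc in candidates.items()
            if localize(desc, sentences)[1] >= threshold]


report = [
    "The actor sent phishing emails with malicious attachments.",
    "Victims opened the attachment and a PowerShell script ran.",
    "Files were later copied to a remote server.",
]
# Diverge: a deliberately broad candidate set (descriptions are illustrative stand-ins).
candidates = {
    "T1566": "phishing emails with malicious attachments",
    "T1059": "powershell script ran",
    "T1486": "data encrypted for impact ransomware",
}
print(converge(candidates, report))
```

The broad candidate set contains an unsupported technique (T1486); because no evidence window covers it, the convergent step drops it, which is the precision side of the design.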

2605.25826 2026-05-26 math.NA cs.CE cs.LG cs.NA

Branched Signature Kernel Solvers for ODEs with rough Single-Trajectory signals

Munawar Ali, Qi Feng, Charlie Pyle, George Xu

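AI summary: For ODEs driven by a single rough signal, proposes a branched signature kernel solver based on count sampling and kernel collocation that yields accurate, stable predictions.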

Comments 39 pages, 12 figures

English abstract

We develop a branched signature kernel solver for linear and nonlinear ordinary differential equations driven by a single observed trajectory of a possibly rough forcing signal, a setting that arises naturally in earthquake engineering, finance, biology, and structural health monitoring, where the forcing is observed exactly once and the solver must respect the underlying physical law without recourse to an ensemble of realizations. Two ingredients are new. First, a count-sampling construction turns the single observation into a hierarchical family of $N+1$ nested training paths on which the branched signature kernel can be evaluated; this allows the signature kernel machinery, originally designed for multi-realization regression problems, to operate on a single-trajectory observation. Second, a kernel-collocation framework places the ansatz either on the highest-order derivative of the solution (with lower derivatives recovered by integrating the kernel) or on the solution itself (after $m$-fold integration of the ODE). We prove a universal approximation theorem for the branched signature kernel, leveraging the Hairer–Kelly morphism to express branched signature evaluations through geometric signatures of time-extended paths. The offline solver is extended to a streaming Test/Train/Retrain protocol with closed-form online updates in the linear case and scalar Newton steps in the nonlinear case. Numerical experiments on six benchmarks (El-Centro earthquake displacement, the Solow capital-stock model, an fBM-driven second-order ODE, a forced Duffing oscillator, a path-dependent Arias-intensity-degraded oscillator with variable coefficients, and a noisy Kuramoto phase-oscillator system) show that the branched signature-kernel solver delivers accurate, stable predictions across all regimes.
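
The abstract does not give the branched kernel or the count-sampling construction in enough detail to reproduce. As a hedged illustration of the ingredient it says branched evaluations reduce to, namely geometric signatures of time-extended paths, the sketch below computes the level-1 and level-2 signature of a piecewise-linear path with Chen's identity, plus a signature kernel truncated at level 2. It is an ordinary (not branched) signature kernel, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Path = Sequence[Sequence[float]]


def signature_level2(path: Path) -> Tuple[List[float], List[List[float]]]:
    """Levels 1 and 2 of the signature of a piecewise-linear path, built segment by segment."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(path[:-1], path[1:]):
        inc = [bj - aj for aj, bj in zip(a, b)]
        for i in range(d):
            for j in range(d):
                # Chen's identity: S2 <- S2 + S1 (x) inc + inc (x) inc / 2, using S1 before this segment.
                s2[i][j] += s1[i] * inc[j] + inc[i] * inc[j] / 2.0
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


def time_extend(values: Sequence[float]) -> List[List[float]]:
    """Lift a scalar series (at least 2 points) to the 2-D path t -> (t, x_t) on a uniform grid in [0, 1]."""
    n = len(values)
    return [[i / (n - 1), v] for i, v in enumerate(values)]


def truncated_sig_kernel(x: Path, y: Path) -> float:
    """k(x, y) = 1 + <S1(x), S1(y)> + <S2(x), S2(y)>, the signature kernel truncated at level 2."""
    a1, a2 = signature_level2(x)
    b1, b2 = signature_level2(y)
    d = len(a1)
    k = 1.0 + sum(p * q for p, q in zip(a1, b1))
    k += sum(a2[i][j] * b2[i][j] for i in range(d) for j in range(d))
    return k


forcing = [0.0, 0.4, -0.1, 0.3]  # a single (toy) observed forcing trajectory
print(truncated_sig_kernel(time_extend(forcing), time_extend(forcing[:3])))
```

Time extension matters because a one-dimensional path's signature only sees its total increment; adding the time channel makes the level-2 terms (including the Lévy area between t and x) sensitive to when the forcing moved.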

2605.25811 2026-05-26 stat.ME cs.LG stat.ML

Geometry Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

Kwangho Kim

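AI summary: For high-dimensional counterfactual distribution learning, proposes two diffusion-guided, geometry-adaptive smoothing estimators whose error is governed by an effective dimension rather than the ambient dimension, supported by semi-synthetic experiments on CelebA.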

English abstract

We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to unfavorable scaling and unstable local inference. We propose two diffusion-guided estimators based on semiparametric debiasing: diffusion-informed smoothing for counterfactual densities and diffusion-informed score smoothing for counterfactual scores. The estimators combine causal nuisance adjustment with geometry-adaptive localization driven by diffusion score information, removing first-order nuisance bias while aligning smoothing with local outcome geometry. We establish asymptotic expansions, risk bounds, and inference procedures for smoothed density and score-based targets, with ambient density inference obtained under additional approximation conditions. Under structural geometry conditions, the leading stochastic error is governed by an effective dimension induced by the diffusion-guided kernel, rather than by the ambient dimension. Semi-synthetic experiments based on CelebA show steeper error decay for geometry-adaptive methods, supporting the proposed effective-dimension theory.
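
For orientation, here is a minimal sketch of the standard construction the abstract builds on: a doubly robust (AIPW-style) estimate of a kernel-smoothed counterfactual density at a point, using a plain isotropic Gaussian kernel in one dimension. The nuisance functions are passed in as callables and are assumptions of the sketch. This is the isotropic baseline the abstract contrasts against; it does not implement the diffusion-guided, geometry-adaptive kernel.

```python
import math
from typing import Callable, Sequence


def gaussian_kernel(u: float, h: float) -> float:
    """Isotropic 1-D Gaussian kernel K_h(u) with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def dr_counterfactual_density(
    y0: float,
    X: Sequence[float],
    A: Sequence[int],
    Y: Sequence[float],
    a: int,
    h: float,
    propensity: Callable[[float], float],
    outcome_smooth: Callable[[float, float], float],
) -> float:
    """Doubly robust estimate of E[K_h(Y^a - y0)], the smoothed density of Y^a at y0.

    propensity(x) estimates P(A = a | X = x); outcome_smooth(x, y0) estimates
    E[K_h(Y - y0) | X = x, A = a]. The weighted residual term removes first-order nuisance bias.
    """
    total = 0.0
    for x, ai, yi in zip(X, A, Y):
        m = outcome_smooth(x, y0)
        if ai == a:
            total += (gaussian_kernel(yi - y0, h) - m) / propensity(x)
        total += m
    return total / len(Y)
```

With a single bandwidth `h` for every direction, the variance of this estimator degrades with the ambient dimension of the outcome; replacing the kernel with one aligned to the local outcome geometry is the step the paper's diffusion-guided estimators take.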

2605.25749 2026-05-26 cs.IR cs.AI cs.LG

DeGRe: Dense-supervised Generative Reranking for Recommendation

Chaotian Song, Jingyao Zhang, Chenghao Chen, Zisen Sang, Dehai Zhao, Guodong Cao, Boxi Wu, Deng Cai, Jia Jia

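AI summary: Proposes DeGRe, which uses dense supervision signals obtained through offline exploration (a Lookahead Evaluator) to train an Online Generator that decodes with a single greedy pass, addressing heuristic label bias and credit assignment in reranking.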

Comments Accepted to KDD 2026 (ADS Track)

English abstract

In multi-stage recommender systems, reranking optimizes overall utility by capturing intra-list contextual dependencies, yet its central challenge lies in exploring optimal sequences within an exponentially large permutation space. Recent studies have shifted towards end-to-end generative frameworks, which typically leverage list-wise rewards or preference alignment to guide generator training. However, these methods still face two critical issues. First is the heuristic label bias. Existing methods often construct training targets based on simple rules, such as promoting clicked items to the top, while ignoring causal dependencies within the list context. Second is the credit assignment problem. Sparse list-level posterior rewards fail to directly guide intermediate steps in sequence generation, leading to ambiguous optimization directions. To address these issues, we propose DeGRe (Dense-supervised Generative Reranking), a generative reranking framework that bridges the gap between offline exploration and online efficiency through dense supervision. The core of DeGRe lies in its offline-online decoupled design. During the offline phase, we introduce a Lookahead Evaluator based on cumulative regression, which leverages beam search to actively mine high-value lookahead sequences in the unexposed space. During training, we transform the step-wise value estimations from the evaluator into dense supervision signals and distill them into a lightweight Online Generator. This mechanism enables the generator to internalize lookahead planning capabilities, requiring only a single efficient greedy decoding pass during online inference to approximate the global optimum. Experiments demonstrate that DeGRe outperforms baseline models on public benchmarks and industrial datasets. We have successfully deployed DeGRe on Taobao Flash Shopping, significantly improving online recommendations.
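
The gap between offline exploration and online greedy decoding can be illustrated with a toy step-value function. The sketch below contrasts beam search over partial lists (the offline, lookahead-style search) with a single greedy pass (the online decoding mode). In DeGRe the step values come from a learned cumulative-regression evaluator; here `toy_value` is a hand-written, hypothetical stand-in, and the beam width is arbitrary.

```python
from typing import Callable, List, Sequence, Tuple

Prefix = Tuple[int, ...]
StepValue = Callable[[Prefix, int], float]  # value of appending `item` after `prefix`


def greedy_rerank(items: Sequence[int], step_value: StepValue, length: int) -> Prefix:
    """Online-style decoding: a single pass that always appends the best next item."""
    seq: Prefix = ()
    remaining = list(items)
    for _ in range(length):
        best = max(remaining, key=lambda it: step_value(seq, it))
        seq += (best,)
        remaining.remove(best)
    return seq


def beam_rerank(
    items: Sequence[int], step_value: StepValue, length: int, beam_width: int = 3
) -> Tuple[Prefix, float]:
    """Offline-style exploration: keep the `beam_width` partial lists with the highest cumulative value."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(length):
        expanded = [
            (seq + (it,), total + step_value(seq, it))
            for seq, total in beams
            for it in items
            if it not in seq
        ]
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0]


def toy_value(prefix: Prefix, item: int) -> float:
    """Hypothetical list-aware value: item 2 is only worth much right after item 1."""
    if prefix and prefix[-1] == 1 and item == 2:
        return 1.5
    return {0: 1.0, 1: 0.9, 2: 0.1}[item]


print(greedy_rerank([0, 1, 2], toy_value, 2))
print(beam_rerank([0, 1, 2], toy_value, 2))
```

On this toy list, greedy decoding returns (0, 1) with value 1.9, while beam search finds (1, 2) with value 2.4 because it can see the intra-list interaction. Distilling step-wise values from such searches into the generator is how, per the abstract, DeGRe aims to make a single greedy pass approximate the better list.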

2605.25746 2026-05-26 cs.MA cs.AI

Multi-Agent Coordination Adaptation via Structure-Guided Orchestration

Haoran Li, Shulun Chen, Shaoyuan Sun, Hanchen Wang

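AI summary: Proposes MACA, which frames multi-agent coordination probabilistically as joint posterior inference over structure and orchestration and uses a task- and budget-conditioned structural prior to guide policy-based orchestration, improving performance by 8.42% on average while using 43.19% fewer tokens.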

Comments 21 pages

English abstract

As large language model (LLM)-based multi-agent systems scale to handle increasingly complex tasks, balancing structural stability against dynamic adaptability becomes harder. Existing systems typically adopt either structure-centric methods, committing to structures determined upfront that limit fine-grained control, or orchestration-centric methods, adapting decisions dynamically while leaving coordination structure implicit and unstable. To address this challenge, we revisit multi-agent coordination from a probabilistic perspective, casting it as posterior inference over the joint distribution of structure and orchestration. We introduce MACA, an automated coordination framework that learns a task- and budget-conditioned structural prior over agent participation and interactions. This prior guides a policy-based orchestration as an approximation to posterior inference, enabling efficient solutions with fine-grained control. Across benchmarks, MACA outperforms adaptive multi-agent baselines by an average of 8.42% while using 43.19% fewer tokens. Further investigation reveals that joint adaptation of structure and orchestration suppresses redundant interactions, converging coordination toward task-effective execution.

2605.25710 2026-05-26 physics.chem-ph cond-mat.mtrl-sci cs.LG physics.comp-ph

Machine Learning Multiscale Interactions

Àlex Solé, Sergio Suárez-Dou, Albert Mosella-Montoro, Silvia Gómez-Coca, Eliseo Ruiz, Alexandre Tkatchenko, Javier Ruiz-Hidalgo

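AI summary: Proposes the Multiscale Structural Ensemble (MuSE), a hierarchical model that builds multiscale representations through soft coarse-graining pooling and couples with several machine learning force fields to accurately capture quantum-mechanical interactions across scales.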

English abstract

Realistic physical systems are characterised by emergent interactions across multiple length and time scales, posing a significant challenge for predictive machine learning (ML) models. Most scientific ML models focus on a narrow range of interactions. While machine learning force fields (MLFFs) offer near-quantum accuracy, the ubiquitous message-passing layers miss long-range many-body effects. Here we introduce the Multiscale Structural Ensemble (MuSE), a hierarchical model that uses Soft Coarse-Graining Pooling to construct coarse representations from smooth fractional assignments of atoms to coarse nodes, enabling MLFF modules to operate across multiple scales. MuSE is architecture-agnostic and coupled with SO3krates, MACE, and PaiNN MLFFs for both molecules and materials. We demonstrate the power of MuSE through Hessian-based benchmarks, folding trajectories for biomolecules, and energy profiles in molecule-graphene nanostructures, where MuSE accurately captures quantum-mechanical interactions at relevant scales, unlike other recent long-range ML models.
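
A minimal sketch of soft coarse-graining pooling in the spirit of soft-assignment graph pooling (as in DiffPool): each atom gets a softmax distribution over coarse nodes, coarse features are assignment-weighted sums of atom features, and coarse positions are assignment-weighted centroids. Whether MuSE sums or averages features, and how it produces the assignment logits, are assumptions here; the sketch also ignores rotational equivariance.

```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]


def soft_coarse_grain(
    logits: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    positions: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Pool atoms into coarse nodes using smooth fractional assignments.

    logits[i][c] scores atom i for coarse node c. Returns (coarse_features, coarse_positions).
    """
    S = [softmax(row) for row in logits]  # each atom's assignment weights sum to 1
    n_atoms, n_coarse = len(S), len(S[0])
    dim_f, dim_x = len(features[0]), len(positions[0])
    coarse_f = [
        [sum(S[i][c] * features[i][k] for i in range(n_atoms)) for k in range(dim_f)]
        for c in range(n_coarse)
    ]
    coarse_x = []
    for c in range(n_coarse):
        w = sum(S[i][c] for i in range(n_atoms))  # total fractional mass of coarse node c
        coarse_x.append([sum(S[i][c] * positions[i][k] for i in range(n_atoms)) / w
                         for k in range(dim_x)])
    return coarse_f, coarse_x
```

Because every atom's weights sum to one, the summed coarse features equal the summed atom features, and the smooth assignments keep the pooling differentiable so the coarse level can be trained jointly with the force-field modules.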

2605.25701 2026-05-26 cs.DC cs.CL cs.IR cs.NI

Neural Router: Semantic Content Matching for Agentic AI

Lauri Lovén, Abhishek Kumar, Alexander Engelhardt, Alaa Saleh, Roberto Morabito, Xiaoli Liu, Naser Hossein Motlagh, Sasu Tarkoma

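AI summary: Proposes using large language models as the semantic-matching engine of a content-based publish/subscribe broker, characterizes the cost-accuracy trade-off through a context-window crossover and a discrimination-capacity crossover, and provides three composable algorithms and a framework for autonomic LLM-tier selection.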

Comments 35 pages, 12 figures. Combined main paper and electronic supplement, folded into one document for arXiv

English abstract

Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum, bridging the vocabulary and modality gaps that defeat keyword and embedding filters. Framed as offline multi-label retrieval over three public datasets spanning social-media, legal, and smart-home sensor domains (six LLMs, seven baselines), our central contribution is a two-crossover cost-accuracy characterisation: an analytical context-window crossover below which a CoverAndMerge compression pipeline reduces LLM invocations, and an empirical discrimination-capacity crossover above which matching accuracy collapses independently of context budget, by a model-dependent factor of parameter count and training generation. Two findings carry practical weight: above the discrimination crossover, compression cannot recover accuracy and only frontier-scale models clear large subscription sets; and in that regime, backend choice dominates configuration choice, so model selection, not pipeline tuning, is the primary operator lever. We accompany this with three composable algorithms and a per-cluster Quality-of-Experience framework for autonomic LLM-tier selection.
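
The abstract does not describe CoverAndMerge's internals, so the sketch below only illustrates why the number of LLM invocations depends on the context window: subscriptions are packed into as few prompts as fit the token budget, using generic first-fit-decreasing bin packing. The token counts, the per-prompt overhead, and the function name are hypothetical; this is not the paper's algorithm.

```python
from typing import List, Sequence


def pack_subscriptions(
    token_counts: Sequence[int], budget: int, overhead: int = 0
) -> List[List[int]]:
    """First-fit-decreasing packing of subscriptions into prompts of at most `budget` tokens.

    `overhead` is a hypothetical fixed per-prompt cost (instructions plus the event text).
    Each returned batch stands for one LLM call per published event.
    """
    capacity = budget - overhead
    order = sorted(range(len(token_counts)), key=lambda i: token_counts[i], reverse=True)
    batches: List[List[int]] = []
    loads: List[int] = []
    for i in order:
        t = token_counts[i]
        if t > capacity:
            raise ValueError(f"subscription {i} does not fit in a single prompt")
        for b, load in enumerate(loads):
            if load + t <= capacity:  # first existing prompt with room
                batches[b].append(i)
                loads[b] += t
                break
        else:  # no existing prompt had room: open a new one
            batches.append([i])
            loads.append(t)
    return batches


subs = [300, 200, 500, 400, 100]  # hypothetical token lengths of five subscriptions
print(len(pack_subscriptions(subs, budget=1000, overhead=100)))  # calls with a small window
print(len(pack_subscriptions(subs, budget=2000, overhead=100)))  # calls with a larger window
```

Once the whole subscription set fits in one prompt, further context does not reduce calls, and, per the abstract, packing more subscriptions into a prompt eventually runs into the model's discrimination capacity, which compression cannot fix.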

2605.25682 2026-05-26 cs.DC cs.AI

Profiling-Driven Adaptive Distributed Transformer Inference on Embedded Edge Deployment

Muhammad Azlan Qazi, Alexandros Iosifidis, Qi Zhang

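AI summary: Combines Segment Means compression with lightweight offline profiling to choose adaptively between local and distributed execution at runtime, addressing the CPU-GPU communication bottleneck in distributed Transformer inference on embedded devices; compared with full-tensor exchange, it cuts latency by 65%-77% and energy consumption by 34%-52%.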

English abstract

Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet the practical benefits on real hardware remain unclear: prior work relies largely on simulations that overlook hardware-specific communication overheads. We present a hardware prototype study on NVIDIA Jetson Orin Nano devices connected over WiFi. Our key finding is that the dominant bottleneck is not just network bandwidth but also CPU-GPU staging during communication. Because Jetson's integrated GPU architecture lacks the PCIe/NVLink pathway that NCCL requires, all inter-device data communication must be routed through GLOO and staged in CPU memory, an overhead that scales with communication data volume and makes full-tensor exchange slower than single-device inference across batch sizes for medium-sized models such as ViT. We therefore evaluate Prism, which combines Segment Means compression with lightweight offline profiling to adaptively select between local and distributed execution at runtime. Experiments show that this strategy reduces latency by 65%-77% and energy consumption by 34%-52% relative to full-tensor exchange in a static distributed execution setup, demonstrating that profiling-driven adaptation is essential for practical distributed Transformer inference on embedded hardware.
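
A minimal sketch of the two ingredients named above, under stated assumptions: "Segment Means" is taken to mean averaging contiguous segments of the token sequence before it is sent between devices, and the runtime choice is a lookup into an offline latency profile keyed by batch size. The profile numbers are hypothetical, not measurements from the paper, and the function names are illustrative.

```python
from typing import Dict, List, Sequence


def segment_means(tokens: Sequence[Sequence[float]], num_segments: int) -> List[List[float]]:
    """Compress a (seq_len x dim) activation to (num_segments x dim) by averaging contiguous segments."""
    n = len(tokens)
    if not 0 < num_segments <= n:
        raise ValueError("num_segments must be between 1 and seq_len")
    dim = len(tokens[0])
    out = []
    for s in range(num_segments):
        lo, hi = s * n // num_segments, (s + 1) * n // num_segments  # non-empty since num_segments <= n
        seg = tokens[lo:hi]
        out.append([sum(t[d] for t in seg) / len(seg) for d in range(dim)])
    return out


def choose_mode(batch_size: int, profile: Dict[int, Dict[str, float]]) -> str:
    """Pick the mode with the lowest offline-profiled latency at the nearest profiled batch size."""
    nearest = min(profile, key=lambda b: abs(b - batch_size))
    latencies = profile[nearest]
    return min(latencies, key=latencies.get)


# Hypothetical offline profile in milliseconds (not measurements from the paper).
profile = {
    1: {"local": 10.0, "distributed": 14.0},
    8: {"local": 60.0, "distributed": 45.0},
}
print(choose_mode(2, profile), choose_mode(6, profile))
```

Compression shrinks the data that must be staged through CPU memory, and the profile lookup falls back to local execution whenever communication would still cost more than it saves.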

2605.25673 2026-05-26 cs.CR cs.AI

Referential Security as a New Paradigm for AI Evaluations

Dan Ristea, Vasilios Mavroudis

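AI summary: Addresses the unstable identifiers that result from continuously updated AI systems by proposing referential security, which treats model identity as a verifiable property so that evaluations are reproducible, longitudinal audits remain valid, and equivalence can be established across providers.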

English abstract

Security evaluations inherently depend on stable identifiers. Any finding, audit, or regulatory decision must remain attached to the specific artifact it pertains to. Continuously updated artificial intelligence systems violate this core assumption, with public model designations remaining static while underlying weights, prompts, retrieval mechanisms, misuse classifiers, inference settings, and serving infrastructures undergo unannounced modifications. Consequently, current evaluations frequently apply to superficial labels rather than identifiable and distinct systems. To resolve this, we propose referential security as a new paradigm for AI evaluation. The fundamental security question extends beyond whether a model is safe to whether subsequent parties can conclusively determine which system a specific safety claim addressed. This approach reframes model identity as an empirically verifiable property and separates referential stability from the substantive security claims it conditions. This framework brings tractability to three critical workflows that current practices handle poorly. Specifically, it enables reproducible evaluation, longitudinal audit validity, and cross-provider equivalence. By grounding these evaluations in verifiable artifacts, our approach ensures that safety audits and regulatory findings maintain their empirical utility across the operational lifecycle of dynamic systems.
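
One concrete way to attach a finding to a specific artifact rather than a public label is to fingerprint everything that defines the served system. The sketch below hashes a canonical JSON encoding of the deployment's components with SHA-256; the component names and values are hypothetical. A hash alone does not make identity empirically verifiable in the paper's sense: someone still has to be able to check that the hashed components are what is actually being served.

```python
import hashlib
import json
from typing import Any, Dict


def deployment_fingerprint(components: Dict[str, Any]) -> str:
    """SHA-256 digest of a canonical JSON encoding of the components that define a served system."""
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two deployments behind the same public model name (all field values are hypothetical).
before = {
    "public_name": "example-model",
    "weights_digest": "sha256:aaaa",
    "system_prompt": "You are a helpful assistant.",
    "misuse_classifier": "v1",
    "temperature": 0.7,
}
after = dict(before, misuse_classifier="v2")  # silent update to a single component
print(deployment_fingerprint(before) == deployment_fingerprint(after))
```

Sorting keys and fixing separators makes the encoding canonical, so the same configuration always yields the same reference, while any silent change to weights, prompts, classifiers, or settings yields a different one.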

2605.25665 2026-05-26 cs.SE cs.AI

Meta-Engineering Harnesses for AI-Native Software Production: A Contract-Driven Adversarial Verification Architecture with Early Deployment Report

Satadru Sengupta, Tamunokorite Briggs, Ivan Myshakivskyi

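AI summary: Proposes a meta-engineering harness that uses contracts, role-specialized AI agents, and adversarial verification to continuously produce, verify, and improve AI-native software, and reports an early deployment of 17 features in a CTO-as-a-service setting for small service firms as initial evidence of its reliability.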

Comments 17 pages, 2 figures, early deployment report

English abstract

AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for production environments where software must be continuously produced, verified, deployed, maintained, and adapted across many operational contexts and long time horizons. We present a meta-engineering harness: a software-production architecture that transforms operational and product feature requirements into explicit contracts, routes work through role-specialized AI agents, performs independent and adversarial verification, and continuously improves itself through structured failure classification and outer-loop calibration. The harness is designed for settings in which software delivery is not a one-time project but an ongoing operating function. In our motivating application, CTO-as-a-service for small service firms, the system manages websites, booking flows, payment systems, backoffice workflow automations, and AI-agent interfaces as continuously evolving technical infrastructure rather than one-off deliverables. We describe the layered architecture, including two-pass contract compilation, persistent markdown memory with specialization records, attention-based and independence-based verifications, a four-way failure arbiter, and outer-loop calibration. We report results from an early production deployment spanning 17 features over several weeks, including a detailed in-app payments case study that revealed contract incompleteness and verification-boundary issues. These observations directly drove targeted improvements to the harness. The contribution is an implemented, measurable, and extensible verification architecture for making AI-native service-as-a-software production reliable, auditable, and improvable over time.

2605.25664 2026-05-26 cs.HC cs.AI cs.AR cs.CY

Posture Clip: Sit properly or I won't let you work

Arka Majhi, Aparajita Mondal

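AI summary: Proposes PostureClip, a clip-on device that discourages users from working while hunched over by blacking out the screen and restoring it; experiments show that it significantly improves sitting posture angle and reduces time spent bent over.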

Comments: Published online by Cambridge University Press on 14 May 2026

Journal ref: Wearable Technologies, 7, e5 (2026)

Abstract

Poor posture is a significant concern due to its detrimental effects on health and productivity. This paper presents a collar-clipped device called PostureClip, designed to restrict users from sitting and working at a bent angle by blacking out the screen and restoring it once posture is corrected, thereby promoting better posture. The device integrates sensors and feedback mechanisms to provide real-time posture feedback to users. To evaluate the effectiveness of PostureClip, a controlled experiment was conducted with participants (n=165) who worked on a laptop/PC for over 6 hours per day. The participants were randomly assigned to two intervention groups (IG1, n=54; IG2, n=55), which used the collar-clipped device, and a control group (CG, n=56), which did not use the device. IG1 received no feedback, while IG2 received feedback from the device through notifications and subsequent darkening of the screen. The study was conducted in the participants' office environments for 4 weeks, and metrics such as posture angle, duration of bent angle, and user feedback were collected. Analysis revealed significant improvements in posture angle (p<0.001) and a significant reduction in bent-angle duration (p<0.01) for the group using PostureClip with feedback, compared with the group without feedback and the control group (which received no intervention). The qualitative analysis of user feedback highlighted the device's ease of use, effectiveness in providing timely feedback, and positive impact on participants' awareness and habits regarding posture. These results indicate that PostureClip is an effective tool for promoting better posture during sedentary work.
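The IG2 feedback loop (notify first, then darken the screen, restore on correction) can be pictured as a small timer-based controller. This is a minimal sketch, not the device firmware: the bend threshold and the 5 s / 15 s delays are hypothetical values, and `PostureController` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class PostureController:
    """Hypothetical IG2-style feedback loop; all thresholds are assumptions."""
    bend_threshold_deg: float = 20.0  # tilt beyond this counts as "bent" (assumed)
    notify_after_s: float = 5.0       # sustained bending before a notification (assumed)
    darken_after_s: float = 15.0      # sustained bending before the screen goes dark (assumed)
    bent_time_s: float = 0.0          # length of the current bent episode
    total_bent_s: float = 0.0         # cumulative bent duration (a study metric)
    state: str = "ok"

    def step(self, angle_deg: float, dt_s: float) -> str:
        if angle_deg > self.bend_threshold_deg:
            self.bent_time_s += dt_s
            self.total_bent_s += dt_s
        else:
            # Posture corrected: reset the episode timer, which restores the screen.
            self.bent_time_s = 0.0
        if self.bent_time_s >= self.darken_after_s:
            self.state = "dark"
        elif self.bent_time_s >= self.notify_after_s:
            self.state = "notify"
        else:
            self.state = "ok"
        return self.state
```

Feeding one reading per second at 30 degrees yields "notify" on the fifth second and "dark" on the fifteenth; a single upright reading returns the state to "ok". An IG1-style logger would record `total_bent_s` without ever changing the screen.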

2605.25648 2026-05-26 stat.ML cs.LG

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

Yuan-Hao Wei

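AI summary: Proposes the StrTransformer framework, which directly optimizes the latent source matrix using source-wise structured Transformer branches and an observation-space mixer, enabling blind source recovery and branch-wise latent modeling.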

Abstract

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the latent source matrix together with an observation-space mixer and source-wise structural Transformer branches. The mixer enforces reconstruction consistency, while each Transformer branch imposes a differentiable structural constraint on one latent source trajectory. Specifically, each source is converted into multi-scale patch tokens, randomly masked, processed by a locality-biased Transformer, and evaluated through a masked patch reconstruction energy. This energy acts as an implicit source-wise structural prior. To encourage different latent branches to specialize into different temporal regimes, StrTransformer further introduces an ordered multi-scale controller that learns branch-specific patch-scale weights, ordered scale centers, and locality attention slopes. The resulting objective combines observation reconstruction, source-wise structural regularization, and modular auxiliary penalties for separation and scale specialization. We analyze the decoupling and coupling structure of the objective, the regularized exact-reconstruction fiber, and the reduction of permutation symmetry induced by ordered branch descriptors. A controlled case study shows that the learned branches converge to distinct temporal-scale structures and recover source-aligned latent trajectories under post-hoc evaluation.
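The shape of the objective described above (observation reconstruction plus a masked-patch structural energy per source branch and per patch scale) can be sketched in NumPy. This is an illustration under stated assumptions: the mixer is taken to be a plain matrix `A`, and `mean_fill_predictor` is a hypothetical stand-in for the locality-biased Transformer; the ordered multi-scale controller and auxiliary penalties are omitted.

```python
import numpy as np


def patchify(s, p):
    """Split a 1-D source trajectory into non-overlapping patches of length p
    (a trailing remainder shorter than p is dropped)."""
    n = len(s) // p
    return s[: n * p].reshape(n, p)


def mean_fill_predictor(patches, mask):
    """Placeholder for the locality-biased Transformer: predicts every masked
    patch as the mean of the visible patches."""
    out = patches.copy()
    out[mask] = patches[~mask].mean(axis=0)
    return out


def masked_patch_energy(s, p, mask_ratio, rng, predictor=mean_fill_predictor):
    """Mean squared error of the predictor on randomly masked patches of s."""
    patches = patchify(s, p)
    if len(patches) == 0:
        return 0.0
    mask = rng.random(patches.shape[0]) < mask_ratio
    if mask.all():
        mask[rng.integers(mask.size)] = False  # keep at least one visible patch
    if not mask.any():
        return 0.0
    pred = predictor(patches, mask)
    return float(np.mean((pred[mask] - patches[mask]) ** 2))


def objective(X, A, S, scales, scale_weights, lam, rng, mask_ratio=0.5):
    """Observation reconstruction + source-wise structural energy.
    X: (m, T) observations, A: (m, k) assumed linear mixer, S: (k, T) latent sources,
    scale_weights[k][j]: weight that branch k places on patch scale scales[j]."""
    recon = float(np.sum((X - A @ S) ** 2))
    struct = 0.0
    for k in range(S.shape[0]):
        for p, w in zip(scales, scale_weights[k]):
            struct += w * masked_patch_energy(S[k], p, mask_ratio, rng)
    return recon + lam * struct
```

In the paper, `S`, the mixer, and the branch networks would be optimized jointly by gradient descent; here only the forward objective is shown, and the branch-specific `scale_weights` play the role of the learned patch-scale weights.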

2605.25640 2026-05-26 physics.ins-det cs.LG hep-ex nucl-ex

3D Magnetic Field Reconstruction and Mapping with Physics-Informed Neural Networks

Haohan Yu, Zhanxu Hao, Bingzhi Li, Zejia Lu, Xiang Chen, Liang Li

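AI summary: Proposes a physics-informed neural network (PINN) framework that embeds Maxwell's equations directly in the loss function and adds a physics-residual loss at measurement points, achieving high-precision 3D magnetic field reconstruction with accuracy at the 10^{-4} level in simulation and the 10^{-3} level in experiments.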

Abstract

Accurate reconstruction of magnetic fields in inaccessible regions is vital for many high-precision experiments in physics. Traditional methods, such as spherical harmonic expansion, often suffer from truncation errors that limit their precision. This study proposes an advanced Physics-Informed Neural Network (PINN) framework for high-precision 3D magnetic field mapping. Unlike conventional data-driven models, the proposed PINN integrates Maxwell's equations directly into the loss function, enforcing divergence-free and curl-free conditions across the entire domain. A key innovation is the inclusion of explicit physics-residual losses at measurement locations, ensuring rigorous physical consistency beyond random collocation sampling. Validation using simulated data achieves a reconstruction accuracy of $10^{-4}$, a tenfold improvement over existing PINN benchmarks. Furthermore, experimental validation using a custom coil assembly demonstrates robust reconstruction with sub-percent relative accuracy, reaching the $10^{-3}$ level under ambient conditions. This AI-driven methodology provides a robust, high-precision solution for field monitoring and measurement in complex experimental environments where direct sensor placement is restricted.
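The loss structure described above (a data term plus divergence-free and curl-free residuals evaluated at both random collocation points and the measurement locations) can be sketched as follows. This is a hedged illustration: a real PINN would differentiate the network with automatic differentiation, whereas this sketch uses central finite differences on any callable field, and the weight `w_phys` is an assumed hyperparameter.

```python
import numpy as np


def div_curl(field, pts, h=1e-4):
    """Central-difference divergence and curl of a vector field B: R^3 -> R^3
    evaluated at points pts of shape (n, 3)."""
    J = np.zeros((len(pts), 3, 3))  # J[n, i, j] = dB_i / dx_j
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        J[:, :, j] = (field(pts + e) - field(pts - e)) / (2 * h)
    div = J[:, 0, 0] + J[:, 1, 1] + J[:, 2, 2]
    curl = np.stack([J[:, 2, 1] - J[:, 1, 2],
                     J[:, 0, 2] - J[:, 2, 0],
                     J[:, 1, 0] - J[:, 0, 1]], axis=1)
    return div, curl


def pinn_loss(field, meas_pts, meas_B, colloc_pts, w_phys=1.0):
    """Data misfit + physics residuals (div B = 0, curl B = 0)."""
    data = np.mean(np.sum((field(meas_pts) - meas_B) ** 2, axis=1))
    # Physics residuals at collocation points AND at the measurement points.
    pts = np.vstack([colloc_pts, meas_pts])
    div, curl = div_curl(field, pts)
    phys = np.mean(div ** 2) + np.mean(np.sum(curl ** 2, axis=1))
    return data + w_phys * phys
```

As a sanity check, the field B = (x, y, -2z) is the gradient of a harmonic potential, so both residuals vanish, while B = (x, y, z) has divergence 3 and the rotation field (-y, x, 0) has curl (0, 0, 2).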

2605.25608 2026-05-26 stat.ML cs.LG

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

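AI summary: Using norm-constrained deep neural networks, this paper establishes approximation rates and excess risk bounds for learning sparse compositional functions, showing that deep networks can exploit hierarchical representations to avoid the curse of dimensionality.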

Abstract

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates based on parameter counts and sample complexity guarantees for compositional models without incurring the curse of dimensionality (CoD). To study overparameterized regimes, where the number of parameters exceeds the sample size, we develop a framework that measures complexity via the parameter norm. Within this approach, we establish approximation rates and excess risk bounds for learning sparse compositional functions whose compositional structure is represented by directed acyclic graphs (DAGs), using Frobenius norm-constrained deep neural networks. Our results have broad applicability since every function that is efficiently Turing computable admits sparse compositional representations. In particular, we cover a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates we derive show that deep networks can exploit the compositional structure of the target functions, effectively avoiding the CoD through hierarchical representations.
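To make the two ingredients concrete, the sketch below shows a sparse compositional target on a binary-tree DAG and a bias-free ReLU network whose layers are kept inside a Frobenius-norm ball by projection (as projected gradient descent would do after each step). The target function, the per-layer radius, and the network shape are hypothetical choices; the paper's exact constraint may differ.

```python
import numpy as np


def project_frobenius(W, radius):
    """Euclidean projection of W onto the ball {||W||_F <= radius}."""
    norm = np.linalg.norm(W)  # Frobenius norm for a 2-D array
    return W if norm <= radius else W * (radius / norm)


def relu_net(x, weights):
    """Bias-free ReLU network applied to the rows of x."""
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)
    return x @ weights[-1]


def lipschitz_upper_bound(weights):
    """Product of Frobenius norms; bounds the network's Lipschitz constant
    because ReLU is 1-Lipschitz and spectral norm <= Frobenius norm."""
    return float(np.prod([np.linalg.norm(W) for W in weights]))


def binary_tree_target(x):
    """Hypothetical sparse compositional target: every DAG node reads only
    two inputs, f = h3(h1(x1, x2), h2(x3, x4))."""
    h1 = np.tanh(x[:, 0] + x[:, 1])
    h2 = np.tanh(x[:, 2] * x[:, 3])
    return np.tanh(h1 - h2)
```

The norm product, rather than the parameter count, is the kind of complexity measure that stays meaningful when the network is overparameterized.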

2605.25605 2026-05-26 eess.AS cs.LG

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu

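AI summary: Studies how unbalanced datasets affect the performance of stimulus reconstruction-based auditory attention decoding and proposes a leave-one-paired-envelope-out cross-validation protocol to prevent inflated decoding accuracy.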

Abstract

In the past decade, numerous studies have applied deep neural networks (DNNs) to auditory attention decoding (AAD) from electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performance of stimulus reconstruction-based AAD remains unexplored. In this study, three publicly available EEG-AAD datasets (KUL, DTU, and NJU cEEGrid) are used to construct both balanced and unbalanced experimental conditions. We hypothesize and demonstrate that stimulus reconstruction-based DNN decoders tend to produce overestimated decoding performance on unbalanced datasets. To address this issue, we propose a leave-one-paired-envelope-out (LOPEO) cross-validation protocol. Experimental results confirm that LOPEO effectively prevents inflated decoding accuracy on unbalanced datasets. While balanced datasets are generally preferred in experimental design, LOPEO provides a principled evaluation framework for unbalanced datasets that have already been published, filling an important gap in the field.
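One plausible reading of the LOPEO name is a grouped split in which each fold holds out every EEG segment that used one pair of competing speech envelopes (attended and unattended), so the decoder never sees the test stimuli during training. The sketch below encodes that reading; `pair_ids` is a hypothetical per-segment label, and the paper's exact fold construction may differ.

```python
import numpy as np


def lopeo_splits(pair_ids):
    """Hypothetical leave-one-paired-envelope-out splitter.

    pair_ids[i] identifies the envelope pair presented in segment i.
    Yields (train_idx, test_idx) with no envelope pair shared between them."""
    pair_ids = np.asarray(pair_ids)
    for pid in np.unique(pair_ids):
        test = np.flatnonzero(pair_ids == pid)
        train = np.flatnonzero(pair_ids != pid)
        yield train, test
```

Structurally this is leave-one-group-out cross-validation with envelope pairs as the groups; scikit-learn's `LeaveOneGroupOut` would produce the same folds.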

2605.25592 2026-05-26 stat.ML cs.LG

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

Joongkyu Lee, Min-hwan Oh

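AI summary: Proposes a computationally efficient optimal experimental design framework for the multinomial logit (MNL) model that achieves statistical efficiency and scalability via mixed-integer linear programming and a polynomial-time relaxation, and applies it to best assortment identification under linear utilities and non-uniform revenues.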

Comments: Accepted at ICML 2026

Abstract

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $\Delta$ is the minimum revenue gap.
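The quantity an optimal design for MNL bandits has to reason about is the Fisher information contributed by one single-choice observation from an assortment. For linear utilities $\theta^\top x_i$ and an outside option of utility 0, it equals $\sum_{i\in S} p_i x_i x_i^\top - m m^\top$ with $m=\sum_{i\in S} p_i x_i$. The sketch below computes it; it does not reproduce the paper's MILP or lifted design, and the names are illustrative.

```python
import numpy as np
from math import comb


def mnl_probs(theta, X, S):
    """MNL choice probabilities over assortment S, with an outside
    (no-choice) option of utility 0. Returns (p_items, p_outside)."""
    u = np.exp(X[list(S)] @ theta)
    denom = 1.0 + u.sum()
    return u / denom, 1.0 / denom


def mnl_fisher(theta, X, S):
    """Fisher information of one choice from S:
    sum_i p_i x_i x_i^T - m m^T with m = sum_i p_i x_i."""
    p, _ = mnl_probs(theta, X, S)
    XS = X[list(S)]
    m = p @ XS
    return (XS * p[:, None]).T @ XS - np.outer(m, m)


# The action space grows combinatorially: choosing K=5 of N=50 items
n_assortments = comb(50, 5)  # 2118760 candidate assortments
```

Because the information depends on the whole assortment through the normalizer, a design over assortments cannot simply be assembled item by item, which is why enumeration over all $\binom{N}{K}$ subsets quickly becomes impractical.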

2605.25590 2026-05-26 stat.ML cs.LG

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

Joongkyu Lee, Min-hwan Oh

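AI summary: Proposes DOMD-GLB, which uses discounted online mirror descent to handle nonstationary generalized linear bandits, achieving dynamic regret bounds while keeping per-round computation and memory costs at O(1).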

Abstract

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that utilizes discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O} \big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$ in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $\Gamma_T$ the number of change points, and $c_\mu$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.
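The key property claimed above, per-round cost that does not grow with $t$, follows from keeping only a fixed-size state. The sketch below is a hypothetical online-Newton-style mirror-descent update for a logistic (Bernoulli) GLB with a discounted curvature matrix; the discount `gamma`, step size `eta`, and regularizer `lam` are assumptions, and this is not the exact DOMD-GLB update rule.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DiscountedOMDLogistic:
    """Hypothetical discounted mirror-descent estimator for a logistic GLB.
    The state (theta, V) has fixed size, so each round costs O(d^3) for the
    linear solve, independent of the round index t."""

    def __init__(self, d, gamma=0.99, lam=1.0, eta=1.0):
        self.gamma, self.lam, self.eta = gamma, lam, eta
        self.theta = np.zeros(d)
        self.V = lam * np.eye(d)

    def update(self, x, r):
        mu = sigmoid(x @ self.theta)
        # Discount old curvature and add the new rank-one term; the
        # (1 - gamma) * lam * I term keeps the regularizer at exactly lam * I.
        self.V = (self.gamma * self.V + mu * (1 - mu) * np.outer(x, x)
                  + (1 - self.gamma) * self.lam * np.eye(len(x)))
        grad = (mu - r) * x  # gradient of the logistic loss at theta
        # Mirror-descent step with the quadratic mirror map 0.5 * ||theta||_V^2.
        self.theta = self.theta - self.eta * np.linalg.solve(self.V, grad)
        return self.theta
```

Discounting lets old observations fade geometrically without storing them, which is the alternative to sliding windows or restarts that re-solve an MLE over a growing history.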

2605.25541 2026-05-26 cs.CG cs.AI cs.HC cs.LG

TopoAlign: Topology-Aware Visual Representation Alignment

Xinyuan Yan, Rita Sevastjanova, Mennatallah El-Assady, Bei Wang

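AI summary: Proposes TopoAlign, a framework that uses mapper graphs from topological data analysis to compare, from a topological perspective, the structural alignment of representations across models or layers through joint force-directed optimization, automatic detection of structurally matching regions, and motif-based queries.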

Abstract

Neural networks encode inputs as high-dimensional vectors, known as representations, that capture how models process data by encoding task-relevant structure and semantics. Representation alignment refers to the degree to which different models, layers, or training conditions produce similar representations for the same inputs, with important implications for model interpretation, selection, and robustness analysis. Existing approaches to measure alignment primarily rely on geometric properties, such as neighborhood and cluster similarity, offering limited insight into the global organization of representations. In this work, we present TopoAlign, a topology-aware framework for visually comparing model representations from a structural perspective. Leveraging mapper graphs from topological data analysis, TopoAlign jointly analyzes graphs constructed from representations of shared inputs across different models or layers. The framework supports a top-down comparative workflow: it first performs global structure alignment via joint force-directed optimization to produce coordinated graph layouts; it then identifies local correspondences through automated detection of structurally matching regions, visualized with Bubble Sets; and finally it enables fine-grained pattern inspection through motif-based queries and membrane-inspired visualizations. We demonstrate TopoAlign through case studies on language and multimodal models, complemented by expert feedback. Our results show that TopoAlign provides meaningful insights into representation structure and alignment from a topological perspective.
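TopoAlign builds on mapper graphs. A minimal mapper construction is: cover the range of a lens (filter) function with overlapping intervals, cluster the points in each interval's preimage, make each cluster a node, and connect nodes that share points. The sketch below assumes a 1-D lens and single-linkage clustering at a fixed distance threshold; TopoAlign's actual lens, cover, and clustering choices are not stated in the abstract.

```python
import numpy as np
from itertools import combinations


def cluster_threshold(points, idx, eps):
    """Single-linkage clusters: connected components of the eps-neighbour graph."""
    unseen = set(idx)
    clusters = []
    while unseen:
        seed = unseen.pop()
        comp, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in list(unseen) if np.linalg.norm(points[i] - points[j]) <= eps]
            for j in near:
                unseen.remove(j)
                comp.add(j)
                stack.append(j)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(points, lens, n_intervals=5, overlap=0.3, eps=0.5):
    """Nodes are clusters within overlapping lens intervals; edges join
    nodes that share at least one data point."""
    lo, hi = float(lens.min()), float(lens.max())
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = np.where((lens >= a) & (lens <= b))[0]
        nodes.extend(cluster_threshold(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges
```

For ten evenly spaced points on a line with the x coordinate as lens, this yields a five-node path graph. Comparing two models would mean building one such graph per representation of the same inputs and then aligning them.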

2605.25536 2026-05-26 cs.SE cs.AI

A Tertiary Review of Large Language Model-Based Code Generating Tasks: Trends, Challenges, and Future Directions

Muslim Chochlov, Michael English, Jim Buckley

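AI summary: This tertiary review synthesizes 30 secondary studies (2017-2025), analyzing evidence on publication trends, effects, scenarios, integration challenges, and future directions for LLM-based code-generating tasks. It finds high benchmark accuracy but weak generalization, fragile robustness, pervasive efficiency issues, and under-reported toxicity and bias; the main challenges concern economic feasibility, evaluation validity, and socio-technical integration.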

Abstract

Context. Large language models (LLMs) are increasingly applied to code-generating tasks (CGTs) in software engineering. While reported results are promising, the broader effects of such applications and their integration into real-world development remain insufficiently understood, and existing tertiary studies provide little insight in this area. Objective. This tertiary study consolidates secondary evidence on LLM-based CGTs, synthesizing the publication landscape, effects, scenarios, integration challenges, and future research directions. Method. Following systematic review guidelines, we searched relevant digital libraries, complemented by backward and forward snowballing and a screening step. Study quality was assessed, and extraction reliability was audited with inter-rater agreement statistics. Evidence was synthesized using SWEBOK knowledge areas and the HELM framework. Results. We identify 30 secondary studies published between 2017 and 2025, with rapid growth since 2023. Accuracy appears strong on benchmarks but is weakly supported for real-world generalization; robustness is fragile across tasks and configurations; efficiency constraints are pervasive; and toxicity and bias are under-reported. Dominant challenges concern economic feasibility, evaluation validity, and socio-technical integration. Future directions point to domain-aware model improvement and the need for holistic, standardized evaluation. Conclusion. LLM-based CGTs represent a fast-maturing yet unevenly evaluated research area, highlighting the need for domain-aware model improvements, holistic and standardized evaluation, and attention to efficiency and associated costs.
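The abstract does not name the inter-rater agreement statistic used to audit extraction reliability; Cohen's kappa is one common choice for two raters, so the sketch below assumes it. Kappa compares observed agreement with the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    if p_exp == 1.0:
        return float("nan")  # both raters used one identical label: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)
```

For example, ratings `[1, 1, 0, 0]` and `[1, 0, 0, 0]` agree on 75% of items, but chance agreement is 50%, so kappa is 0.5.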

2605.25526 2026-05-26 stat.ML cs.LG

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

Hideitsu Hino, Keisuke Yano

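AI summary: Studies the geometry of determinantal point processes (DPPs) and their conditioned version, the $k$-DPP, via spectral decomposition, revealing how the identifiability of the spectral and eigenspace-rotation parameters changes for $k$-DPPs, and characterizes the resulting identifiability gap.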

Comments: 10 pages

Abstract

We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=U\Lambda U^{\top}$. The spectrum $\Lambda$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace orientation $U$ governs the conditional law within each fixed-cardinality stratum. Conditioning on cardinality $k$ yields the $k$-DPP, for which the identifiability structure changes fundamentally: the spectral parameter becomes identifiable only up to a common scale, and the eigenspace rotation parameter is identifiable only through squared minors of the eigenvector matrix. We characterize the identifiability gap precisely, via three explicit invariances (scale, sign similarity, and eigenspace rotation) and a dimension-counting theorem showing the existence of additional continuous non-identifiability whenever $\binom{N}{k}<N(N+1)/2$. In contrast, for the full DPP the non-identifiability comes only from the discrete sign similarity.
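Two of the facts above can be checked numerically on a small kernel: the $k$-DPP normalizer $\sum_{|S|=k}\det(L_S)$ equals the elementary symmetric polynomial $e_k(\lambda)$ of the eigenvalues (so the full-DPP cardinality law is $e_k(\lambda)/\prod_i(1+\lambda_i)$), and the $k$-DPP is unchanged by a common rescaling $L\mapsto cL$ or a sign similarity $L\mapsto DLD$. The brute-force enumeration below is only meant for tiny $N$.

```python
import numpy as np
from itertools import combinations


def elementary_symmetric(lams, k):
    """e_k(lams) via the standard one-pass recursion."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for lam in lams:
        for j in range(k, 0, -1):
            e[j] += lam * e[j - 1]
    return e[k]


def kdpp_probs(L, k):
    """Brute-force k-DPP: P(S) proportional to det(L_S) over all |S| = k."""
    subsets = list(combinations(range(L.shape[0]), k))
    dets = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    return subsets, dets / dets.sum()


def dpp_cardinality(L):
    """Full-DPP law of |Y|: e_k(lambda) / prod(1 + lambda_i), k = 0..N."""
    lams = np.linalg.eigvalsh(L)
    Z = np.prod(1.0 + lams)
    return np.array([elementary_symmetric(lams, k) for k in range(len(lams) + 1)]) / Z
```

Scaling by $c$ multiplies every $\det(L_S)$ with $|S|=k$ by $c^k$, which cancels in the normalization; this is why only the spectrum's shape, not its scale, is identifiable from a $k$-DPP.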

2605.25509 2026-05-26 stat.ML cs.LG

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

Xifeng Zhang, Jin Zhao

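AI summary: Proposes FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients and solutions and uses guided sampling to perform forward simulation and inverse recovery from sparse observations, with error guarantees.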

Comments: 50 pages, 8 figures, 4 tables

Abstract

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and solutions (or final states), enabling both forward simulation and inverse recovery with limited paired data. At inference, sampling is guided by a composite loss that enforces agreement with sparse measurements and reduces the PDE residual; we support deterministic, stochastic, and hybrid samplers. We provide error guarantees for these guided procedures. For the deterministic optimizer, a coercivity condition ensures trajectory boundedness and a phase-wise contraction yields logarithmic complexity in the target accuracy. For the stochastic sampler, we introduce adaptive guidance and assume dissipativity of the velocity field to obtain uniform moment bounds independent of the noise-floor parameter. This leads to polynomial-time error bounds, and a matching lower bound shows constant guidance induces an unavoidable positive bias, motivating adaptivity. A hybrid deterministic-stochastic analysis is also provided. Experiments on static and time-dependent benchmark PDEs demonstrate competitive accuracy and faster inference than diffusion-based generative models.
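A deterministic guided sampler of the kind described above integrates the learned velocity field while subtracting the gradient of a composite loss (sparse-measurement misfit plus PDE residual). The sketch below assumes a 1-D grid, the toy PDE $u''=0$ discretized by second differences, a selection matrix `M` for the sparse sensors, and a placeholder velocity `v(u, t) = -u` in place of a trained flow-matching network; the weights `beta` and `lam` are hypothetical.

```python
import numpy as np


def second_diff(n):
    """Discrete 1-D second-difference operator; D @ u is the residual of u'' = 0
    on interior nodes."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D


def composite_loss(u, M, y, D, beta):
    """Sparse-observation misfit plus weighted PDE residual."""
    return float(np.sum((M @ u - y) ** 2) + beta * np.sum((D @ u) ** 2))


def composite_grad(u, M, y, D, beta):
    return 2 * M.T @ (M @ u - y) + 2 * beta * D.T @ (D @ u)


def guided_euler(u0, velocity, M, y, D, beta=0.1, lam=1.0, steps=50):
    """Euler steps of du/dt = v(u, t) - lam * grad(composite loss)."""
    u, dt = u0.copy(), 1.0 / steps
    for n in range(steps):
        t = n * dt
        u = u + dt * (velocity(u, t) - lam * composite_grad(u, M, y, D, beta))
    return u
```

With `lam=0` the sampler simply follows the flow; with guidance switched on, the final sample fits the sensors and the PDE better. A constant `lam` is the simplest choice; the abstract's lower bound suggests that for stochastic sampling an adaptive schedule is preferable.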

2605.25505 2026-05-26 cs.CY cs.AI econ.GN physics.soc-ph q-fin.EC

Generative AI impacts on intra-urban inequality and skill premium in Beijing

Xiliu He, Haoxiang Zhao, Mingyi Ma, Edward Wen Chuan Lai, Koei Enomoto, Anni Hu, Jiatong Li, Lingyun Chu, Yuan Lai

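AI summary: Using 5 million Beijing job postings from 2018-2024, the authors assess task-level exposure with five large language models to build a neighborhood-level generative AI exposure index. They find exposure concentrated in core districts, producing wage stagnation and a "high-skill trap" in high-exposure neighborhoods, which challenges the theory of skill-biased technological change.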

Comments: 21 pages, 8 figures

Abstract

Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-urban inequality remain largely unknown. Using 5 million job postings from Beijing (2018-2024), we construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models. We examine the spatial, structural and causal mechanisms of this shock. We find that GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide. Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers, a "high-skill trap." This wage penalty is driven by task de-skilling and intensified labor-market crowding. A difference-in-differences design centered on ChatGPT's release supports a causal interpretation. These findings challenge the prevailing theory of skill-biased technological change and provide a basis for inclusive AI governance in global technology hubs.
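The difference-in-differences logic can be illustrated with the classic 2x2 version: the change in mean wages for high-exposure neighborhoods after ChatGPT's release, minus the same change for low-exposure neighborhoods. The actual study likely uses a continuous exposure index with fixed effects and controls; this sketch assumes a binary `treated` flag and a binary `post` flag, and also shows the equivalent regression, whose interaction coefficient equals the 2x2 estimate.

```python
import numpy as np


def did_estimate(y, treated, post):
    """Classic 2x2 difference-in-differences on group means."""
    y, treated, post = map(np.asarray, (y, treated, post))

    def cell_mean(g, t):
        return y[(treated == g) & (post == t)].mean()

    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


def did_regression(y, treated, post):
    """OLS of y on [1, treated, post, treated*post]; the interaction
    coefficient is the DiD estimate."""
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    X = np.column_stack([np.ones(len(treated)), treated, post, treated * post])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[3]
```

Differencing against the control group removes city-wide wage trends that affect both groups equally, which is what supports the causal reading, under the usual parallel-trends assumption.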

2605.25460 2026-05-26 stat.ML cs.LG

Mean-Shift PCA by Knockoff Mean

Mengda Li, Zeng Li, Jianfeng Yao

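AI summary: Proposes removing mean-shift noise from PCA by deliberately introducing a knockoff mean perturbation, uses random matrix theory to prove that mean-shift spikes are spectrally separable from the eigenvalues of the original covariance, and designs a two-stage PCA algorithm.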

Comments: ICML 2026

Abstract

Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noisy components from PCA by deliberately introducing a knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in the sample mean: a small fraction of samples from a shifted distribution can cause large deviations in the leading principal components. In high-dimensional regimes, existing Robust PCA approaches cannot handle the mean-shift contamination structure inherent in the mixture model. Using tools from Random Matrix Theory, we prove that the mean-shift spikes are spectrally separable from the stable eigenvalues of the original covariance. Furthermore, the original eigenspace remains asymptotically invariant to the contamination, independent of the mixture weight. Exploiting this spectral stability, we propose a simple two-stage PCA algorithm that adds a knockoff mean and then identifies and removes the mean-shift component using only standard PCA operations.
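The sensitivity claim above is easy to reproduce: shifting a 10% fraction of samples adds a rank-one term $\varepsilon(1-\varepsilon)\mu\mu^\top$ to the covariance, which creates an outlying eigenvalue whose eigenvector points along the shift. The sketch below only demonstrates this mean-shift spike; it does not reproduce the paper's knockoff-based two-stage algorithm, and the dimensions and shift size are arbitrary choices.

```python
import numpy as np


def leading_direction(Z):
    """Leading eigenpair of the sample covariance of the rows of Z."""
    C = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    return vals[-1], vecs[:, -1]


rng = np.random.default_rng(0)
n, p, eps = 2000, 50, 0.1
Z = rng.standard_normal((n, p))      # clean data with identity covariance
mu = np.zeros(p)
mu[0] = 10.0                         # hypothetical mean shift along e_1
shifted = rng.random(n) < eps        # contaminate roughly a fraction eps
Z_cont = Z + shifted[:, None] * mu

val_clean, _ = leading_direction(Z)
val_cont, vec_cont = leading_direction(Z_cont)
# The contaminated top eigenvalue is close to 1 + eps * (1 - eps) * 100 = 10 and
# its eigenvector aligns with e_1, while the clean top eigenvalue stays near the
# Marchenko-Pastur edge (1 + sqrt(p / n))^2.
```

The spike sits far above the bulk, which is the kind of spectral separation the paper exploits to identify and remove the mean-shift component.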

2605.25454 2026-05-26 cs.HC cs.AI cs.CL cs.CY cs.SI

AI Content Moderation in Therapy Conversations

Jiwon Kim, Claire Wang, Taeung Yoon, Sabelle Huang, Koustuv Saha

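AI summary: Audits how three mainstream content moderation systems (from OpenAI, Meta, and Google) flag real therapy conversations, revealing how they limit the potential of LLMs to act as therapists.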

Abstract

Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes. However, LLMs like ChatGPT or Llama are often developed with content moderation guardrails that prevent them from discussing sensitive subjects with users for both liability and safety purposes, and this inability to broach these subjects may affect their capacity as therapists. In this study, we perform an algorithm audit on three state-of-the-art moderation systems (OpenAI's moderation endpoint, Meta's Llama Guard, and Google's Shield Gemma) to investigate the extent to which these systems flag the content of real-life therapy sessions as undesirable. Our results raise implications for the limitations that users and organizations may encounter when designing LLMs to play the part of a therapist.

2605.25452 2026-05-26 stat.ME cs.LG stat.ML

Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks

Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

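AI summary: Surveys theoretical progress on the generalization of graph neural networks from three statistical frameworks: learning theory, infinite-parameter/infinite-graph asymptotics, and random graph models.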

Comments: 15 pages, 4 figures, submission for Special Issue in AStA Advances in Statistical Analysis

Abstract

Graph Neural Networks (GNN) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in various fields, from social network analysis to drug discovery. However, there is limited mathematical understanding of the performance of GNNs. We discuss the various perspectives used to study statistical generalisation in GNNs. We identify three broad frameworks. The first approach, rooted in learning theory, relies on uniform convergence bounds and the complexity of the hypothesis class of specific GNN architectures. This approach also builds on the expressivity of GNNs, typically studied through the lens of graph isomorphism tests. The second principle is to simplify the neural architecture by analysing GNNs under the asymptotics of infinitely many parameters or infinite graph size. This approach approximates GNNs using Gaussian processes, neural tangent kernels or graphon neural network operators, which allow studying the generalisation or stability of trained GNNs. The third framework studies GNNs under random graph models, often the contextual stochastic block model, and derives non-asymptotic error rates using tools from high-dimensional statistics. We highlight some key theoretical results and discuss a few limitations and open research questions for each perspective.
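The third framework studies GNNs on data drawn from a contextual stochastic block model. The sketch below samples a two-class CSBM with 1-D Gaussian features and applies one GCN-style mean aggregation, showing the effect that this line of analysis quantifies: averaging over mostly same-class neighbours shrinks within-class noise faster than it shrinks the class-mean gap. The parameter values are arbitrary, and the single-layer aggregation is a simplification rather than any specific architecture from the survey.

```python
import numpy as np


def sample_csbm(n, p_in, q_out, mu, sigma, rng):
    """Two-class CSBM: labels +/-1, edge probability p_in within a class and
    q_out across classes, 1-D features x_i = y_i * mu + N(0, sigma^2)."""
    y = np.where(rng.random(n) < 0.5, 1, -1)
    prob = np.where(y[:, None] == y[None, :], p_in, q_out)
    A = np.triu(rng.random((n, n)) < prob, 1)  # sample the upper triangle only
    A = (A | A.T).astype(float)                # symmetric, no self-loops
    x = y * mu + sigma * rng.standard_normal(n)
    return A, x, y


def mean_aggregate(A, x):
    """One mean-aggregation step over neighbours plus a self-loop."""
    A_hat = A + np.eye(len(A))
    return (A_hat @ x) / A_hat.sum(axis=1)


def snr(x, y):
    """Class-mean gap divided by the pooled within-class standard deviation."""
    gap = abs(x[y == 1].mean() - x[y == -1].mean())
    return gap / np.sqrt(0.5 * (x[y == 1].var() + x[y == -1].var()))
```

With `p_in=0.5`, `q_out=0.05`, and `mu=0.3` on 400 nodes, the raw features are barely separable, while one aggregation step raises the signal-to-noise ratio by a large factor.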

2605.25426 2026-05-26 cs.GR cs.CV

Learning View-Dependent Splatting Kernels

Huakeng Ding, Zhanpeng Liu, Fan Pei, Kun Zhou, Hongzhi Wu

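AI summary: Proposes a differentiable framework that automatically learns view-dependent 2D kernels to improve novel-view synthesis quality and representation efficiency in splatting-based pipelines.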

Comments: Accepted to SIGGRAPH 2026. 10 pages, 8 figures

Abstract

We present a differentiable framework to automatically learn view-dependent 2D kernels in a splatting-based pipeline to improve reconstruction quality and representation efficiency for novel 3D view synthesis. Our volumetric primitive is defined as a bounding ellipsoid and a 3D-kernel latent vector. We first learn a projection network to output a 2D-kernel latent, taking the attributes of the ellipsoid and the 3D-kernel latent as input. Next, the result is sent to a decoder to produce a radially symmetric 2D kernel in terms of Mahalanobis distance, bounded by the projected ellipsoid. The neural networks along with per-primitive attributes are jointly optimized. The effectiveness of our approach is demonstrated on standard benchmarks, comparing favorably against state-of-the-art techniques on both analytical and learned kernels. Finally, we extend the idea to learn general 2D kernels for 2D splatting as well as image representation.
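The decoder's output described above is a radially symmetric 2D kernel expressed in terms of Mahalanobis distance and bounded by the projected ellipsoid. The sketch below evaluates such a kernel at pixel positions; the radial profile is given as sampled values (a hypothetical stand-in for the decoder output), and the bound is modelled as a hard cutoff at Mahalanobis radius `r_max`, which is an assumption.

```python
import numpy as np


def mahalanobis_sq(pix, center, cov2d):
    """Squared Mahalanobis distance of each pixel (n, 2) from a projected 2D Gaussian."""
    d = pix - center
    return np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov2d), d)


def kernel_weight(pix, center, cov2d, profile_r, profile_w, r_max=3.0):
    """Radially symmetric kernel in Mahalanobis distance, zero beyond r_max.
    (profile_r, profile_w) sample a radial profile; in the paper this would
    come from the learned decoder."""
    r = np.sqrt(mahalanobis_sq(pix, center, cov2d))
    w = np.interp(r, profile_r, profile_w)  # piecewise-linear radial profile
    return np.where(r <= r_max, w, 0.0)
```

Choosing `profile_w = exp(-r**2 / 2)` recovers the usual truncated Gaussian splat; a learned, view-dependent profile generalizes this while keeping the same elliptical support.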