arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2606.04378 2026-06-04 cs.CL

DLLG: Dynamic Logit-Level Gating of LLM Experts

Bingnan Li, Zhaoyang Zhang, Xiaoze Liu, Yantao Shen, Shuli Jiang, Shuo Yang, Wei Xia, Zhuowen Tu, Stefano Soatto

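AI summary: Proposes DLLG, a framework in which a lightweight gating module learns token-level expert fusion weights from sparse response-level supervision, enabling dynamic logit-level ensembling without token-level labels or expert retraining; it outperforms routing, heuristic ensembling, and parameter-merging methods on reasoning and code benchmarks.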

Abstract

Leveraging multiple specialized LLMs can combine complementary strengths, but existing approaches trade adaptability for stability: routing commits prematurely, heuristic ensembling depends on fragile proxies, and parameter merging introduces interference. We propose DLLG (Dynamic Logit-Level Gating), a dynamic logit-level ensembling framework that learns token-level expert fusion from sparse response-level supervision. A lightweight gating module predicts step-wise fusion weights, linking trajectory-level correctness to generation without token-level labels or expert retraining. Across diverse reasoning and code benchmarks, DLLG consistently outperforms strong routing, heuristic ensembling, and parameter-merging baselines across model scales, highlighting learned logit-level fusion as a robust and scalable paradigm for integrating specialized experts.
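
To make the idea of logit-level gating concrete, here is a minimal sketch, not the authors' implementation. At each decoding step a gate produces one weight per expert, and the fused next-token logits are the weighted sum of the experts' logits. The function names `softmax` and `fuse_logits`, the toy gate scores, the expert names, and the three-token vocabulary are all assumptions made for illustration; the abstract does not say how the gate is parameterized or trained.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_logits(expert_logits, gate_scores):
    """Fuse per-expert next-token logits with gate weights.

    expert_logits: one logit vector per expert (same vocabulary size).
    gate_scores: one unnormalized score per expert for this step
                 (hypothetical output of a learned gating module).
    """
    weights = softmax(gate_scores)
    vocab_size = len(expert_logits[0])
    fused = [
        sum(w * logits[v] for w, logits in zip(weights, expert_logits))
        for v in range(vocab_size)
    ]
    return fused, weights

# Toy step: two hypothetical experts disagree on the best token.
math_expert = [2.0, 0.5, -1.0]
code_expert = [0.0, 3.0, -1.0]
fused, weights = fuse_logits([math_expert, code_expert], gate_scores=[0.0, 2.0])
next_token = max(range(len(fused)), key=fused.__getitem__)
```

Because the gate favors the second expert at this step, the fused distribution picks that expert's preferred token; a different gate output at the next step could shift the balance, which is what "step-wise" fusion refers to.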

2606.04375 2026-06-04 cs.LG stat.ML

When Do Fewer Coordinates Suffice in DP-SGD?

Huiqi Zhang, Fang Xie

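AI summary: Proposes TP-TopK, a two-phase method for coordinate-sparse private training without public data, in which a private warm-up phase identifies a coordinate support so that the noise term scales with the active dimension k rather than the full parameter dimension d; a nonconvex stationarity bound gives the condition under which coordinate restriction is beneficial.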

Comments
14 pages
Abstract

Differentially private stochastic gradient descent (DP-SGD) injects noise into every updated coordinate, making the injected noise energy scale with the ambient parameter dimension \(d\). We ask when private training can update fewer coordinates without losing the signal needed for optimization. We propose TP-TopK (Two-Phase TopK DP-SGD), a two-phase method for coordinate-sparse private training without public data, in which a private warm-up phase identifies a coordinate support used to guide the main training phase. We give a criterion characterizing when coordinate restriction can be beneficial, show via a nonconvex stationarity bound that under this condition the relevant noise term scales with the active dimension \(k\) rather than the full parameter dimension \(d\), and provide a lower bound on the reliability of warm-up-based coordinate ranking. Experiments on MNIST, FMNIST, and CIFAR-10 show that learned coordinate supports can retain more gradient energy than size-matched random supports, with the largest gains when the active dimension is small and warm-up scores are informative.
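
The scaling claim can be illustrated with a small sketch under strong simplifying assumptions. It is not the TP-TopK algorithm: gradient clipping and privacy accounting are omitted, and the warm-up scores here are a non-private stand-in, whereas the paper estimates them privately. The helper names `topk_support`, `noisy_sparse_update`, and `expected_noise_energy` are hypothetical. The point is only that i.i.d. Gaussian noise of scale sigma on m coordinates has expected squared norm m * sigma^2, so restricting updates to k coordinates reduces that term from d * sigma^2 to k * sigma^2.

```python
import random

def topk_support(scores, k):
    """Indices of the k largest scores (hypothetical warm-up ranking)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def noisy_sparse_update(grad, support, sigma, rng):
    """Keep only coordinates in `support` and add Gaussian noise to them."""
    keep = set(support)
    return [g + rng.gauss(0.0, sigma) if i in keep else 0.0
            for i, g in enumerate(grad)]

def expected_noise_energy(num_coords, sigma):
    """E||z||^2 for i.i.d. N(0, sigma^2) noise on `num_coords` coordinates."""
    return num_coords * sigma ** 2

d, k, sigma = 1000, 50, 1.0
rng = random.Random(0)
grad = [rng.gauss(0.0, 1.0) for _ in range(d)]
scores = [abs(g) for g in grad]  # non-private stand-in for warm-up scores
support = topk_support(scores, k)
update = noisy_sparse_update(grad, support, sigma, rng)
full_energy = expected_noise_energy(d, sigma)    # 1000.0
sparse_energy = expected_noise_energy(k, sigma)  # 50.0
```

In this toy setting the noise energy drops by a factor of d / k = 20; whether the retained coordinates carry enough gradient signal is the question the paper's criterion addresses.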

2606.04374 2026-06-04 cs.IR cs.AI

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

Bokang Wang, Xing Fang, Mingmin Jin, Jing Wang, Zhentao Song, Guangxin Song, Jianbo Zhu

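AI summary: To address the difficulty continuous embeddings have in capturing fine-grained attribute distinctions in e-commerce search, proposes the Discrete Semantic Identifier Relevance Model (DSIRM), which uses query-bridged contrastive quantization to inject query-item interaction supervision and learn relevance-aware partitions, and uses a generative LLM to predict item identifiers, improving offline AUC by 1.54%.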

Comments
Jing Wang (Corresponding Author)
Abstract

Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a promising alternative, existing SID generation methods rely heavily on unsupervised quantization. In realistic scenarios, the lack of explicit supervision often makes it more difficult to dictate which items should share an SID, resulting in limited capability for query-dependent ranking. To address the issue of unsupervised SIDs, we propose to explicitly model discrete relevance features and develop a Discrete Semantic Identifier Relevance Model (DSIRM). Specifically, we present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions. On the other hand, we explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs yields discriminative features that perfectly complement dense signals. Extensive experimental results on Tmall's production data show that our proposed approach has achieved better results, improving offline AUC by +1.54%. Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR), proving its massive industrial value.
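
One plausible reading of "hierarchical prefix matching" is sketched below; the abstract does not give the exact feature definition, so this is an assumption. Each SID is treated as a tuple of codebook indices ordered from coarsest to finest level, and the feature is how many leading levels the query SID and item SID share. The function name `prefix_match_features` and all SID values are made up for illustration.

```python
def prefix_match_features(query_sid, item_sid):
    """Return (shared_prefix_len, per-level flags) for two SIDs.

    A level counts as matched only if every coarser level also matched,
    which is what makes the comparison hierarchical.
    """
    flags = []
    matched = True
    for q, i in zip(query_sid, item_sid):
        matched = matched and (q == i)
        flags.append(1 if matched else 0)
    return sum(flags), flags

query_sid = (12, 7, 3)  # hypothetical SID predicted for a query
items = {
    "item_a": (12, 7, 9),  # agrees on the two coarsest levels
    "item_b": (12, 4, 3),  # agrees on the coarsest level only
    "item_c": (5, 7, 3),   # disagrees at the coarsest level
}
features = {name: prefix_match_features(query_sid, sid)
            for name, sid in items.items()}
```

Note that `item_c` scores zero even though its last two codes equal the query's: under a residual quantizer, finer codes are only meaningful relative to the coarser ones, so a mismatch at the first level breaks the match.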

2606.04370 2026-06-04 eess.AS cs.SD eess.SP

Masked Wavelet Scattering Transform Neural Field for Sound Field Reconstruction

Xinmeng Luan, Samuel A. Verburg, Efren Fernandez-Grande, Gary Scavone

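AI summary: Proposes a method for sound field reconstruction from sparse observations that uses the wavelet scattering transform as a multi-scale feature extractor, combined with neural field optimization and a masking strategy, and validates it on HRTF upsampling.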

Comments
5 pages, 2 figures, conference
Abstract

In this paper, we propose a reconstruction framework that leverages the Wavelet Scattering Transform (WST) as a multi-scale feature extractor to impose statistical priors under sparse observation conditions. The reconstruction problem is formulated as an optimization task and solved using a neural field, with the WST incorporated into the training loss function. As a proof of concept, we validate the proposed method on HRTF upsampling. A masking strategy is applied to the WST coefficients, resulting in a two-phase procedure. The first phase learns a binary mask from a small multi-subject dataset, while the second phase applies the learned mask to the WST coefficients of an individual HRTF to preserve informative statistical structures during reconstruction. Validation against baseline methods, which also serve as an ablation study of the different components of the framework, demonstrates the effectiveness of the proposed approach.

2606.04369 2026-06-04 cs.CV

VT-3DAD: Cross-Category 3D Anomaly Detection via Visual-Text Normal Space Alignment

Zi Wang, Katsuya Hotta, Yawen Zou, Koichiro Kamide, Yijin Wei, Chao Zhang, Jun Yu

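AI summary: Proposes VT-3DAD, a training-free framework that uses frozen CLIP encoders to extract multi-view visual features and textual normal anchors, and fuses visual and semantic deviation for cross-category 3D anomaly detection, achieving state-of-the-art performance on ShapeNetPart.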

Abstract

Few-shot cross-category 3D anomaly detection aims to determine whether an unknown point cloud belongs to a target normal category using only a few normal references. Existing training-based methods usually require category-wise optimization, while recent training-free methods based on multi-view CLIP visual features mainly rely on visual similarity and may be confused by geometrically similar categories. In this paper, we propose VT-3DAD, a training-free framework for cross-category 3D anomaly detection via Visual-Text Normal Space Alignment. Given few-shot normal references and a test point cloud, VT-3DAD first generates realistic multi-view depth maps and extracts view-wise features using a frozen CLIP visual encoder. The visual branch measures reference-test deviation in the multi-view feature space. In parallel, depth-aware and 3D-aware prompts are encoded by the frozen CLIP text encoder to construct textual normal anchors, which provide semantic normality constraints for the target category. The final anomaly score is obtained by fusing visual deviation from normal references and semantic deviation from the textual normal space. Experiments on the ShapeNetPart dataset demonstrate that VT-3DAD achieves state-of-the-art performance. In particular, VT-3DAD improves the one-shot average AUC-ROC from 92.49% to 94.80% compared with the visual-only baseline, while also reducing the average standard deviation from 5.64 to 3.41.
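
The score fusion described above can be sketched as follows. This is an illustrative assumption, not the paper's formula: the abstract does not say which distance is used or how the two deviations are combined, so cosine distance and a convex combination with a hypothetical weight `alpha` are swapped in. The two-dimensional "features" stand in for CLIP embeddings.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def anomaly_score(test_views, ref_views, text_anchor, alpha=0.5):
    """Fuse visual and semantic deviation (alpha is a hypothetical weight)."""
    n = len(test_views)
    # Visual deviation: distance from each test view to its closest reference view.
    visual = sum(min(cosine_distance(t, r) for r in ref_views)
                 for t in test_views) / n
    # Semantic deviation: distance from each test view to the textual normal anchor.
    semantic = sum(cosine_distance(t, text_anchor) for t in test_views) / n
    return alpha * visual + (1.0 - alpha) * semantic

ref_views = [[1.0, 0.0], [0.9, 0.1]]  # toy features of normal references
text_anchor = [1.0, 0.0]              # toy embedding of a "normal" prompt
normal_score = anomaly_score([[1.0, 0.05]], ref_views, text_anchor)
anomalous_score = anomaly_score([[0.0, 1.0]], ref_views, text_anchor)
```

A sample close to both the references and the text anchor scores low, while one far from both scores high; the semantic term is what could separate geometrically similar categories that the visual term alone confuses.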

2606.04367 2026-06-04 cs.CL cs.HC

GlossAssist -- A Tool to Simplify Corpus Creation and Study the Effect of NLP Models in Low-Resource Documentation Settings

Bhargav Shandilya, Matt Buchholz, Alexis Palmer

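AI summary: Proposes GlossAssist, an automatic glossing tool built on a retrieval-based architecture, which uses an editable lexicon and an active-learning feedback loop to improve the accuracy and interpretability of morpheme glossing in low-resource language documentation.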

Comments
6 pages, 3 figures
Abstract

Interlinear glossed text (IGT) is the standard format for linguistic annotation in language documentation. Producing it manually, however, is often slow and costly. Automated glossing systems have improved substantially in recent years, but adoption among field linguists remains limited. Existing tools are designed to be evaluated rather than used, offering no interpretable path for correction or the incorporation of linguistic expertise back into model behavior. We present GlossAssist, a glossing tool built around the retrieval-based architecture of CWoMP (Contrastive Word-Morpheme Pre-training), which grounds predictions in a mutable lexicon of learned morpheme representations. In conjunction with CWoMP, our system treats each correction by an annotator as part of an active learning setting, which expands the lexicon and improves future predictions without having to retrain the model. In this paper, we present our interface and argue that this feedback loop should be treated as a design requirement for NLP tools aimed at documentary linguists.

2606.04366 2026-06-04 cs.LG cs.NA math.NA

MeshTok: Efficient Multi-Scale Tokenization for Scalable PDE Transformers

Yanshun Zhao, Xiaoyu Peng, Jiamin Jiang, Congcong Zhu, Jingrun Chen

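AI summary: Proposes MeshTok, a framework inspired by adaptive mesh refinement whose multi-scale tokenization lets a single unified Transformer capture both coarse-grained global context and fine-grained local detail, improving the efficiency-accuracy trade-off in PDE modeling.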

Comments
ICML 2026
Abstract

Conventional patchified Transformers operate on uniform spatial partitions, distributing computational effort evenly across the domain irrespective of local features. This inflexible tokenization scheme is inherently limited in its ability to efficiently represent and process solutions to complex PDEs. To address this, we propose MeshTok, an adaptive mesh refinement (AMR)-inspired tokenization and sequence modeling framework. This method selectively refines spatial regions exhibiting sharp gradients, transient features, or multiscale structures, generating a heterogeneous set of multiscale tokens defined on a fixed simulation grid. These tokens are processed within a unified Transformer sequence, enabling the model to simultaneously capture coarse-grained global context and fine-grained local details without requiring specialized architectural components. Although adaptive refinement moderately increases token count, it promotes a more targeted allocation of computational resources to physically informative regions, which we view as a practical inductive bias rather than a formal optimality guarantee. Experimental evaluations across multiple PDE families and benchmark datasets demonstrate that MeshTok consistently improves the efficiency-accuracy trade-off compared to uniform-grid baselines. This suggests adaptive multiscale tokenization as a scalable and generalizable design principle for neural PDE modeling. Code is available at https://github.com/SCAILab-USTC/MeshTok.
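
As a rough illustration of AMR-style tokenization, the sketch below recursively splits a 1D field into patches, refining only where the local variation exceeds a threshold. It is a simplified stand-in and not MeshTok itself: the paper works on simulation grids with its own refinement criterion, while the function `adaptive_patches`, the max-minus-min indicator, and the threshold value are assumptions for this example.

```python
def adaptive_patches(values, start, size, min_size, threshold):
    """Split [start, start + size) while the local variation is large.

    Returns a list of (start, size) patches; each patch would become one token.
    """
    patch = values[start:start + size]
    variation = max(patch) - min(patch)
    if size <= min_size or variation <= threshold:
        return [(start, size)]
    half = size // 2
    left = adaptive_patches(values, start, half, min_size, threshold)
    right = adaptive_patches(values, start + half, size - half, min_size, threshold)
    return left + right

# 16 grid points, flat except for a jump between indices 12 and 13.
field = [0.0] * 13 + [1.0] * 3
tokens = adaptive_patches(field, 0, 16, min_size=2, threshold=0.1)
```

Here the smooth region is covered by two coarse patches and only the neighborhood of the jump is refined down to the minimum size, giving a mixed-resolution token sequence that a single Transformer can consume.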

2606.04365 2026-06-04 cs.CV cs.AI

Multi-Granularity 3D Kidney Lesion Characterization from CT Volumes

Renjie Liang, Zhengkang Fan, Jinqian Pan, Chenkun Sun, Jiang Bian, Russell Terry, Jie Xu

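AI summary: Proposes LesionDETR, a DETR-based architecture with size-distance Hungarian matching and a hierarchical loss that predicts four clinical attributes per lesion from CT volumes, reaching a bilateral side-level abnormality AUC of 0.799 on UF-Health.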

Abstract

Radiology reports describe kidney lesions by type, size, enhancement, and attenuation, yet existing 3D methods predict only at the patient or organ level. We reformulate kidney CT characterization as a per-lesion set-prediction task: one model emits a variable number of lesions per kidney, each with four clinical attributes. We curated 2,619 CT volumes from 788 patients at one academic medical center, with multi-granularity side- and per-lesion labels, and used KiTS23 (489 cases) for zero-shot external validation. We propose LesionDETR, a DETR-style architecture with size-distance Hungarian matching and a hierarchical loss that aggregates per-slot outputs to side-level objectives. Across four input representations and six encoder initializations, two design choices dominate: a segmentation mask as an input channel, and same-domain abdominal pretraining (SuPreM); generic large-corpus pretraining is no better than random initialization. LesionDETR reaches bilateral side-level abnormality AUC $0.799 \pm 0.009$ on UF-Health and $0.817 \pm 0.072$ on KiTS23. A count-conditioned variant reaches per-lesion mAP $0.190 \pm 0.083$ on cystic lesions; rare solid-lesion AP stays at the noise floor, pointing to targeted data collection, not architecture, as the next bottleneck. The framework yields verified per-lesion predictions for downstream structured report generation.

2606.04362 2026-06-04 cs.IR cs.CL

Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic

Keisuke Watanabe, Kazuki Nakayashiki

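AI summary: Using a natural experiment with untreated pages on the same domain as a control, this study separates the causal effect of answer engine optimization (AEO) on referral traffic from the confounding effect of the platform's own growth.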

Comments
9 pages, 4 figures, 1 table
Abstract

Large language model (LLM) "answer engines" such as ChatGPT now send measurable referral traffic to the open web, and a practice analogous to search engine optimization, here called Answer Engine Optimization (AEO), has emerged. Public AEO success stories typically quote large raw growth multiples, but raw referral growth is confounded by the rapid platform-level growth of the answer engines themselves. We report a longitudinal field study on a single high-traffic domain (glasp.co) whose corpus of hundreds of thousands of YouTube question-and-answer pages received a defined bundle of AEO interventions in January 2026 (detailed in Section 4). Because the interventions were concentrated on one subset of the site, the untreated remainder of the same domain acts as a contemporaneous control that absorbs the platform tailwind. Using first-party analytics and server logs rather than probabilistic third-party estimators, we find: (1) raw growth is dominated by the platform tailwind: on monthly aggregates total ChatGPT referrals grew 5.7x while untreated pages on the same domain grew 3.5x over the same window; (2) an interrupted time-series model on the weekly treated/control ratio estimates a discrete, intervention-aligned level increase of 1.82x (95% CI 1.31-2.54, HAC p=0.001), robust across engagement-filtered traffic (2.27x) and alternative specifications; (3) however, a conservative placebo-in-time permutation test yields p=0.16, so the effect is suggestive, not conclusive, given a short and noisy pre-period; and (4) Google organic clicks to treated pages did not fall beyond the ambient site-wide trend and indexation was preserved, consistent with the SEO-protection rule. The methodological message, separating treatment from platform tailwind with an on-domain control, matters more than any single multiple, and implies that headline AEO multiples substantially overstate causal effect.
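
The gap between the headline multiple and the estimated effect can be checked with back-of-the-envelope arithmetic using only the figures quoted in the abstract. This normalization is an illustration, not the paper's interrupted time-series estimate; in particular, the 5.7x figure covers total referrals (treated and untreated pages together), so dividing by the control multiple is only a rough adjustment. The variable names are assumptions.

```python
raw_multiple = 5.7      # total ChatGPT referral growth (from the abstract)
control_multiple = 3.5  # growth of untreated pages on the same domain

# Rough growth relative to the platform tailwind absorbed by the control.
adjusted = raw_multiple / control_multiple  # about 1.63

# Reported interrupted time-series estimate and its 95% confidence interval.
its_estimate, ci_low, ci_high = 1.82, 1.31, 2.54
within_ci = ci_low < adjusted < ci_high
```

The rough figure lands inside the reported confidence interval and far below 5.7x, which is consistent with the abstract's message that most of the raw growth is platform tailwind.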

2606.04361 2026-06-04 eess.SY cs.MA cs.RO cs.SY math.DS math.OC

When Freshness Is Not Enough: Distribution-Aware Age of Information for Networked LQR Control

Abdullah Y. Etcibasi, C. Emre Koksal, Eylem Ekici

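AI summary: Shows that, in networked control systems, minimizing mean Age of Information (AoI) alone is not enough to optimize LQR tracking performance; the full distribution of inter-scheduling intervals, including higher-order and exponential moments, must be considered.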

Abstract

Age of Information (AoI) has become a central metric for the design of wireless update systems, especially in applications where fresh measurements support tracking, estimation, and control. Despite its popularity, the use of mean AoI or peak AoI as a surrogate for closed-loop performance is often motivated by intuition rather than by a control-theoretic derivation. This paper examines whether minimizing the mean AoI is in fact optimal for networked control systems. For scalar linear time-invariant systems with delayed intermittent updates, we show that, under state-independent scheduling policies, the infinite-horizon LQR tracking problem reduces to an optimization over the distribution of inter-scheduling intervals. The resulting objective depends on higher-order statistical moments, and in unstable or correlated regimes on exponential moments, of the inter-scheduling process rather than only on its mean. Consequently, policies with identical mean AoI can induce substantially different tracking costs. We further extend the analysis to disturbances with exponentially decaying autocorrelation and derive equivalent cost formulations that expose the role of the full interval distribution. Finally, we validate the theory using real vehicle trajectories from the NGSIM US-101 dataset. The empirical results match the predicted performance trends, demonstrating that mean AoI alone is insufficient for control-oriented network design.
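
The claim that identical mean AoI can hide different costs can be reproduced in a toy model, which is a simplification and not the paper's LQR formulation. Assume a discrete-time renewal update process in which the age takes the values 1, 2, ..., X within an interval of length X, and use the open-loop prediction error variance of the scalar system x[t+1] = a*x[t] + w[t] (unit noise variance, no delay, no control cost) as a proxy for tracking cost. The two interval distributions and a = 1.2 are chosen for illustration.

```python
def mean_interval(dist):
    """Expected inter-update interval; dist maps length X to probability."""
    return sum(p * x for x, p in dist.items())

def mean_age(dist):
    """Time-average age when the age runs 1, 2, ..., X in each interval."""
    total_age = sum(p * x * (x + 1) / 2 for x, p in dist.items())
    return total_age / mean_interval(dist)

def error_variance(age, a, noise_var=1.0):
    """Prediction error variance of x[t+1] = a*x[t] + w[t] after `age` steps."""
    return noise_var * sum(a ** (2 * j) for j in range(age))

def mean_cost(dist, a):
    """Time-average error variance under the same renewal process."""
    total = sum(p * sum(error_variance(t, a) for t in range(1, x + 1))
                for x, p in dist.items())
    return total / mean_interval(dist)

periodic = {3: 1.0}              # update every 3 slots
bursty = {1: 2 / 3, 4: 1 / 3}    # mostly short gaps, occasionally a long one
a = 1.2                          # unstable open-loop dynamics

age_periodic, age_bursty = mean_age(periodic), mean_age(bursty)
cost_periodic, cost_bursty = mean_cost(periodic, a), mean_cost(bursty, a)
```

Both policies have a mean age of 2, yet the bursty policy's cost is about 2.91 against about 2.65 for the periodic one, because with |a| > 1 the error grows geometrically in the age and the occasional long gap dominates.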

2606.04360 2026-06-04 cs.CL cs.LG

Deliberate Evolution: Agentic Reasoning for Sample-Efficient Symbolic Regression with LLMs

Xinyu Pang, Zhanke Zhou, Xuan Li, Fangrui Lv, Shanshan Wei, Sen Cui, Bo Han, Changshui Zhang

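AI summary: Proposes Deliberate Evolution, a framework that decouples symbolic generation from search control and uses adaptive operators, analytical tools, and reflective memory to outperform existing LLM-based symbolic regression methods with only 40% of the sample budget.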

Comments
ICML 2026
Abstract

Symbolic regression (SR) discovers compact mathematical expressions from data, yet recent LLM-based evolutionary methods remain sample-inefficient because they rely mainly on scalar feedback such as MSE. We identify a core limitation: existing methods conflate candidate proposal with search guidance, requiring the LLM to infer how to evolve an expression, diagnose its errors, and reuse past experience from a single score. To address this, we propose Deliberate Evolution (DE), an agentic framework that decouples symbolic generation from search control. DE guides LLM proposals with adaptive operators for search direction, analytical tools for structural diagnosis, and reflective memory for trajectory-level experience. Experiments on LLM-SRBench show that DE consistently outperforms representative LLM-based SR baselines across diverse scientific domains while using only 40% of the standard sample budget.

2606.04358 2026-06-04 cs.SD eess.AS math.CO

Gauss Circle Lattices with Geometric Convolutions for Synthesizing High Dimensional Image-Source Room Impulse Responses

Yuancheng Luo

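AI summary: Proposes a method that reduces lattice point counting in the image-source model to the classic Gauss circle problem, lowering the computational bound from O(k^N) to O(N k^2 log k), and extends it to frequency-dependent and reflection-weighted image sources in higher dimensions via geometric convolution.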

Comments
Accepted for publication at the 29th International Conference on Digital Audio Effects 2026
Abstract

The image-source model (ISM) is a widely adopted method for efficiently simulating acoustic room impulse responses (RIRs) under specular reflection assumptions. Acoustic paths between source and receiver are traced to lattice points computed from successive reflections over bounding planes of the room. In rectangular rooms, the total number of image sources is polynomial in the RIR's duration, or the equivalent distance $k$, with degree equal to the number of room dimensions $N$. Direct ISM simulations are therefore upper-bounded in compute by $O(k^N)$, and consider only cases of $N \leq 3$ for tractability and real-world applications. This work proposes an alternative computational method that lowers the asymptotic compute bound to $O(N k^2 \log k)$ for integer coordinates and room dimensions by reducing ISM lattice point counting to the classic Gauss circle problem (GCP). We extend the lattice counting model to frequency-dependent and reflection-weighted image sources in higher dimensions, relating solutions between successive dimensions via the convolution operator. Two constructions for realizing RIRs are presented, along with time-frequency controls, error and run-time analysis, and RIR statistics.
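
The idea of relating successive dimensions by convolution can be illustrated on the plain lattice-counting problem. The sketch below counts integer points by squared norm: the 1D counts are convolved with themselves once per added dimension. It is a simplified illustration and not the paper's method: it uses unit lattice spacing, no reflection or frequency weighting, and a direct convolution, so it does not attain the stated O(N k^2 log k) bound (an FFT-based convolution would be needed for that). All function names are assumptions.

```python
def square_counts(k):
    """c[n] = number of integers x in [-k, k] with x*x == n, for 0 <= n <= k*k."""
    c = [0] * (k * k + 1)
    for x in range(-k, k + 1):
        c[x * x] += 1
    return c

def convolve_truncated(a, b):
    """Direct convolution of two sequences, truncated to len(a) terms."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j >= len(out):
                break
            out[i + j] += ai * bj
    return out

def lattice_counts(k, n_dims):
    """r[n] = number of integer points in n_dims dimensions with squared norm n."""
    one_dim = square_counts(k)
    r = one_dim
    for _ in range(n_dims - 1):
        r = convolve_truncated(r, one_dim)
    return r

def points_within_radius(k, n_dims):
    """Number of integer lattice points with Euclidean norm at most k."""
    return sum(lattice_counts(k, n_dims))

disc_r5 = points_within_radius(5, 2)  # Gauss circle count for radius 5
ball_r2 = points_within_radius(2, 3)
```

Each entry of `lattice_counts` groups all points at the same distance from the origin, which is the quantity an image-source simulation needs per arrival time; the cost grows with the number of distance bins rather than with the number of individual lattice points.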

2606.04355 2026-06-04 cs.RO

Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling

Yuanchu Liang, Edward Kim, J. Arden Knoll, Wil Thomason, Zachary Kingston, Lydia E. Kavraki, Hanna Kurniawati

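AI summary: Proposes ROP-RAS3, an online POMDP solver based on rapid state sampling that generates macro actions and biases belief-space sampling to solve long-horizon POMDPs, and outperforms existing methods in success rate across high-dimensional continuous, discrete, and hybrid spaces.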

Comments
@inproceedings{Liang2026Thinking, title = {Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling}, author = {Yuanchu Liang and Edward Kim and J.Arden Knoll and Wil Thomason and Zachary Kingston and Lydia E. Kavraki and Hanna Kurniawati}, year = 2026, booktitle = {International Journal of Robotics Research (to appear)} }
Abstract

Partially Observable Markov Decision Processes (POMDPs) are a general and principled framework for motion planning under uncertainty. Despite tremendous improvement in the scalability of POMDP solvers, long-horizon POMDPs remain difficult to solve. To alleviate the difficulty, this paper proposes a new approximate online POMDP solver, called Reference-Based Online POMDP Planning via Rapid State Space Sampling (ROP-RAS3). ROP-RAS3 uses novel extremely fast sampling-based motion planning techniques to sample the state space and generate a diverse set of macro actions online, which are then used to bias belief-space sampling and infer high-quality policies without requiring exhaustive enumeration of the action space -- a fundamental constraint for modern online POMDP solvers. ROP-RAS3 converges to a near-optimal reference-based solution at a rate that depends on the number of sampled actions, rather than the size of the action space. ROP-RAS3 is evaluated on various long-horizon POMDPs with up to 3000 lookahead steps and 35-dimensional state spaces, where the state, action and observation spaces can be continuous, discrete, or a hybrid of discrete and continuous. Although the reference-based optimal solution may not be the same as the optimal POMDP solution, empirical results indicate that in all of these problems, in terms of success rate, ROP-RAS3 outperforms other state-of-the-art methods by up to several times. We also demonstrate the capability of our approach in a physical robot demonstration. This work extends the theory and empirical results of our ISRR24 paper. Code can be found at https://github.com/RDLLab/ROPRAS3.

2606.04345 2026-06-04 cs.CV cs.AI cs.LG

HYolo: An Intelligent IoT-Based Object Detection System Using Hypergraph Learning

Isha Abid, Fawad Khan, Muhammad Khuram Shahzad

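AI summary: Proposes HYolo, a framework that integrates hypergraph learning into the YOLO architecture to model high-order feature relationships, improving mAP@50 on the COCO dataset by approximately 12%.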

Comments
8 pages, multiple figures
Abstract

This paper presents HYolo, an intelligent IoT-based object detection framework that integrates hypergraph learning into the YOLO architecture. Traditional YOLO-based object detection models primarily capture pairwise feature interactions and may fail to model complex high-order relationships among objects and contextual features. To address this limitation, HYolo incorporates hypergraph learning to capture richer contextual dependencies and improve object representation. Experimental evaluation on the COCO dataset demonstrates significant performance improvements over baseline YOLO models. The proposed approach achieves approximately 12% improvement in mAP@50 while enhancing overall detection accuracy and robustness. By modeling high-order feature relationships, HYolo provides improved contextual understanding and more reliable object detection performance in IoT-based environments. The results indicate that integrating hypergraph learning into object detection pipelines offers a promising direction for intelligent and context-aware IoT vision systems.

2606.04343 2026-06-04 cs.CV

Robust Multi-view Clustering against Imperfect Information

Zhichao Huang, Haochen Zhou, Hao Wang, Mouxing Yang, Xi Peng

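AI summary: To address imperfect information in multi-view data, namely missing views and noisy correspondences, proposes Posterior-guided Latent Counterpart Inference (PLCI), which treats cross-view counterparts as latent variables and combines instance-level reliability with prototype-level semantic transport to handle both challenges in a unified way.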

Comments
19 pages, 11 figures
Abstract

Real-world multi-view data always suffer from the imperfect information problem, where the view-specific observations are absent (i.e., Incomplete Views, IV) and cross-view correspondences are mismatched (i.e., Noisy Correspondences, NC) for certain instances. As a remedy, numerous IV- and NC-oriented multi-view clustering (MvC) methods have been proposed; these, however, require either reliable correspondences or sufficiently complete instances, and thus stop short of addressing the imperfect information problem. In contrast, we observe that both IV and NC challenges originate from the same issue of imperfect cross-view counterpart information, where the counterpart of an anchor instance in another view might be either unavailable or unreliable. Based on this observation, we propose a novel robust MvC framework, termed Posterior-guided Latent Counterpart Inference (PLCI), which can handle both IV and NC in a unified manner. Specifically, PLCI formulates the desired cross-view counterpart of each anchor instance as a latent variable, and integrates both instance-level reliability and prototype-level semantic transport to infer the posterior distribution of the latent counterpart. Extensive experiments on six widely used multi-view datasets against 10 state-of-the-art MvC methods demonstrate the effectiveness of PLCI for tackling the imperfect information problem. The code will be released upon acceptance.

2606.04342 2026-06-04 cs.LG cs.AI

Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty

Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho

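AI summary: Using a conditional uncertainty gap, this paper proves a fundamental trade-off between MSE optimality and marginal realism in multi-step time series forecasting, and shows empirically that a small sacrifice in MSE (at most 5%) can yield substantial gains in marginal realism (median 17.3%).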

Comments
12 pages, Accepted for KDD 2026 Research track
Abstract

Multi-step time series forecasting (MSF) is commonly evaluated using point-wise error metrics such as mean squared error (MSE), implicitly treating the conditional mean as a sufficient target. We show that this can be misleading under conditional uncertainty, where the conditional expectation becomes unrepresentative of typical realized values at longer horizons. We formalize this effect through a conditional uncertainty gap and prove that whenever this gap is nonzero, no deterministic predictor can simultaneously minimize MSE and match the marginal distribution of realized futures. This establishes a fundamental, model-agnostic trade-off between point accuracy and marginal realism in MSF evaluation. Using controlled stochastic dynamical systems and nine real-world forecasting benchmarks, we empirically characterize the resulting accuracy-realism frontier and quantify the practical cost of MSE-only model selection. As conditional uncertainty increases with forecast horizon, the attainable set expands into a pronounced Pareto front, separating MSE-optimal but under-dispersed predictors from methods that trade accuracy for realistic marginal variability. Across benchmarks, we find that small relaxations in MSE ($\le 5\%$) frequently unlock disproportionate gains in marginal realism, with median improvements of 17.3% and gains exceeding 30% in some datasets. We further show that common forecasting strategies systematically occupy different regions of this frontier: direct multi-output predictors concentrate near the accuracy-optimal extreme, while recursive strategies and sample-based inference favor marginal realism. Together, these results expose a structural failure mode of MSE-based evaluation in long-horizon forecasting and recast strategy and inference selection as navigation of an unavoidable accuracy-realism trade-off.
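The tension between MSE optimality and marginal realism can be seen in a toy setting. The sketch below is not the paper's conditional uncertainty gap or its benchmarks. It assumes a hypothetical future that lands near +1 or -1 with equal probability, and compares the MSE-optimal constant forecast (the mean) with a forecast placed on one of the modes. All names and the distribution are illustrative assumptions.

```python
import random
import statistics

def simulate_futures(n, seed=0):
    """Draw toy futures from two symmetric modes near +1 and -1 (assumed setup)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) + rng.gauss(0.0, 0.1) for _ in range(n)]

def mse(prediction, futures):
    """Mean squared error of a single constant forecast."""
    return sum((y - prediction) ** 2 for y in futures) / len(futures)

futures = simulate_futures(10_000)

# The sample mean minimizes MSE among constant forecasts.
mean_forecast = statistics.fmean(futures)
# A forecast that looks like a typical realization (one of the two modes).
mode_forecast = 1.0

mse_mean = mse(mean_forecast, futures)
mse_mode = mse(mode_forecast, futures)

# Spread of realized futures; any constant forecast has zero spread.
realized_spread = statistics.pstdev(futures)
```

In this toy case the mean forecast wins on MSE while sitting near 0, a value the futures almost never take, and its spread of zero is far from the realized spread of roughly 1.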

2606.04340 2026-06-04 cs.CL

Noisy memory encoding explains negative polarity illusions

噪声记忆编码解释了负极词幻觉

Yuhan Zhang, Edward Gibson

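AI summary: Drawing on the lossy-context surprisal theory of Hahn et al. (2022), this study proposes that imperfect sentence encoding gives rise to negative polarity illusions, and supports the hypothesis that determiner similarity strengthens the illusion through acceptability judgment experiments with six novel determiner pairs.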

详情
Comments
21 pages, 5 figures, submitted for journal publication
AI中文摘要

像“The authors that no critics recommended have ever received acknowledgment for a best-selling novel”这样的句子有时被认为可接受,尽管严格来说它不合语法,因为负极词“ever”在其位置未获许可。这种行为效应有时被称为“负极词幻觉”。这里我们提出,Hahn等人(2022)的有损上下文惊奇理论——即人们对复杂句子的编码不完美——可能解释这种效应。我们假设人们对主句和从句主语中的限定词记忆表征较差,并可能设想一种限定词交换来许可“ever”。我们提出,这些位置上更相似的限定词会引发更强的幻觉效应。使用六种新型限定词对(例如,“few”和“many”,“few”和“most”)的可接受性判断任务支持了我们的提议,具体表明,即使没有时间压力,新句子“Many authors that few critics recommended have ever received acknowledgment for a best-selling novel”也比规范句引发了更强的幻觉。这些结果进一步支持了人类语言处理是不完美且资源理性的观点:面对工作记忆限制,人类理性地从噪声语言输入中重构最可能的内容,以促进下游处理。

英文摘要

A sentence like "The authors that no critics recommended have ever received acknowledgment for a best-selling novel" is sometimes rated as acceptable even though, strictly speaking, it is ungrammatical because the negative polarity word "ever" is not licensed where it is. This behavioral effect is sometimes called a "negative polarity illusion". Here we propose that the lossy context surprisal theory of Hahn et al. (2022) -- whereby people have an imperfect encoding of complex sentences -- might explain this effect. We hypothesize that people have poor memory representation of the determiners in the main-clause and embedded-clause subjects and could entertain a determiner exchange that licenses "ever". We propose that more similar determiners in those positions would trigger stronger illusion effects. Acceptability judgment tasks with six novel determiner pairs (e.g., "few" and "many", "few" and "most") support our proposal, showing, specifically, that a novel sentence, "Many authors that few critics recommended have ever received acknowledgment for a best-selling novel", triggered a much stronger illusion than the canonical one even without time pressure. These results offer further support for the suggestion that human language processing is imperfect and resource-rational: in the face of working memory limitations, humans rationally reconstruct what is most likely from noisy linguistic input to facilitate downstream processing.

2606.04339 2026-06-04 cs.LG

Literature-Guided Minimax Optimization of Virtual Epilepsy Neurostimulation

文献引导的虚拟癫痫神经刺激极小化优化

Cathy Liu

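AI summary: Proposes a literature-guided minimax optimization pipeline that combines PubMed-scale hypothesis extraction, TVB Epileptor simulation, and LLM-guided black-box optimization to maximize worst-case reward for robust neurostimulation design.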

详情
Comments
9 pages, 4 figures. Code and interactive essay at https://github.com/liuzhitong330/tvb-llm-robust-neurostim
AI中文摘要

癫痫的计算模型有望实现患者特异性治疗设计,但大多数优化工作流程仍搜索平均表现良好的参数。在神经调控中,这是一个薄弱目标:改善平均响应的方案仍可能对网络最不耐受刺激的患者失败。我们提出一种文献引导的极小化优化流程,结合PubMed规模假设提取、虚拟大脑(TVB)Epileptor模拟和大语言模型引导的黑箱优化。优化器提出内在模型控制参数或临床可解释的外部刺激方案;TVB对采样的虚拟患者评估每个方案;目标函数最大化最坏情况奖励,定义为模拟癫痫活动的负方差。在内在模型控制实验中,最佳存档参数集将最坏情况奖励从-0.5285提升至-0.3182,比基线提高39.8%。临床风格的外部刺激搜索产生较小的最坏情况改善(1.7%),尽管有55%的响应率和阳性颞叶亚组信号,但20名虚拟患者队列未显示总体获益(p=0.9019)。该研究应被视为鲁棒、文献感知的神经刺激设计的计算机概念验证,而非临床证据。

英文摘要

Computational models of epilepsy promise patient-specific treatment design, but most optimization workflows still search for parameters that perform well on average. In neuromodulation, this is a weak target: a protocol that improves the mean response can still fail in the patient whose network is least tolerant to stimulation. We present a literature-guided minimax pipeline that couples PubMed-scale hypothesis extraction, The Virtual Brain (TVB) Epileptor simulations, and large-language-model-guided black-box optimization. The optimizer proposes either intrinsic model-control parameters or clinically interpretable external-stimulation protocols; TVB evaluates each proposal across sampled virtual patients; and the objective maximizes worst-case reward, defined as the negative variance of simulated seizure activity. In the intrinsic model-control experiment, the best archived parameter set improved worst-case reward from -0.5285 to -0.3182, a 39.8% gain over baseline. The clinical-style external-stimulation search produced a much smaller worst-case improvement (1.7%), and a 20-patient virtual cohort showed no aggregate benefit (p=0.9019), despite a 55% responder rate and a positive temporal-lobe subgroup signal. The study should be read as an in silico proof of concept for robust, literature-aware neurostimulation design, not as clinical evidence.
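The worst-case objective described above can be sketched in a few lines. The abstract defines reward as the negative variance of simulated seizure activity and scores a proposal by its worst patient; the function names, the use of population variance, and the toy traces below are assumptions for illustration, not the authors' code. The last line rechecks the reported 39.8% gain from the two quoted reward values.

```python
import statistics

def reward(seizure_trace):
    """Reward as the negative variance of a simulated activity trace.
    Population variance is an assumption; the abstract does not specify."""
    return -statistics.pvariance(seizure_trace)

def worst_case_reward(traces_by_patient):
    """Score a proposal by its least favorable virtual patient."""
    return min(reward(trace) for trace in traces_by_patient.values())

def relative_gain(baseline, improved):
    """Improvement relative to the magnitude of the baseline reward."""
    return (improved - baseline) / abs(baseline)

# Hypothetical traces for two virtual patients (not data from the paper).
toy_traces = {
    "patient_a": [0.0, 1.0, 0.0, 1.0],
    "patient_b": [0.0, 2.0, 0.0, 2.0],
}
toy_worst = worst_case_reward(toy_traces)

# Worst-case rewards quoted in the abstract: -0.5285 (baseline), -0.3182 (best).
reported_gain = relative_gain(-0.5285, -0.3182)
```

Maximizing `worst_case_reward` rather than the mean reward is what makes the search a minimax problem: a proposal is only as good as its outcome on the hardest patient.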

2606.04338 2026-06-04 cs.LG cs.CR

Federated Learning for Multi-Center Sepsis Early Prediction with Privacy-Preserving

联邦学习用于隐私保护的多中心脓毒症早期预测

Xixi Tian, Di Wu, Xiang Liu, Yiziting Zhu, Yujie Li, Xin Shu, Bin Yi

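AI summary: To address the privacy sensitivity and distributed nature of multi-center medical data, proposes a federated-learning approach to distributed collaborative modeling that achieves prediction accuracy comparable to a centralized model while avoiding privacy leakage.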

详情
AI中文摘要

多中心医疗数据的隐私敏感性和分布式特征给集中式建模进行脓毒症早期准确预测带来了严重障碍。联邦学习作为一种有前景的协作模型开发框架,允许多个机构在不直接共享或集中原始数据的情况下联合训练预测模型,因此受到越来越多的关注。然而,其实际性能、鲁棒性和隐私保护优势尚未使用真实临床数据集进行充分评估。为弥补这一差距,本研究系统性地考察了联邦学习在多中心脓毒症预测中的应用。实验数据集包括从中国三家三级医院收集的648个临床筛选样本,并采用严格的纳入和排除标准。我们建立了集中式训练范式作为性能基线,然后实现了水平联邦学习框架用于分布式协作建模。大量实验结果表明,基于联邦学习的模型在预测精度上与集中式模型高度相当,同时从根本上避免了隐私泄露。进一步的隐私安全分析验证了恶意攻击者无法从传输的模型参数中重建原始患者数据,表明其对数据重建攻击具有强大的抵抗力。这项工作不仅验证了联邦学习在临床脓毒症预测中的实用性和安全性,而且为隐私保护的多中心医疗协作提供了可靠且可行的解决方案。

英文摘要

Privacy-sensitive and distributed characteristics of multi-center medical data bring severe obstacles to centralized modeling for accurate early prediction of sepsis. Federated learning (FL) has attracted growing attention as a promising framework for collaborative model development, as it allows multiple institutions to jointly train predictive models without directly sharing or centralizing raw data. Nevertheless, its practical performance, robustness, and privacy-preserving benefits remain insufficiently evaluated using real-world clinical datasets. To bridge this gap, this study systematically examines the application of federated learning to multi-center sepsis prediction. The experimental dataset consists of 648 clinically screened samples collected from three tertiary hospitals in China, with rigorous inclusion and exclusion criteria. We establish a centralized training paradigm as the performance baseline, and then implement a horizontal federated learning framework for distributed collaborative modeling. Extensive experimental results demonstrate that the federated learning-based model achieves highly comparable prediction accuracy to the centralized counterpart, while fundamentally avoiding privacy leakage. Further privacy security analysis verifies that malicious attackers cannot reconstruct the original patient data from the transmitted model parameters, indicating strong resistance against data reconstruction attacks. This work not only validates the practicality and security of federated learning in clinical sepsis prediction, but also provides a reliable and feasible solution for privacy-preserving multi-center medical collaboration.
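The abstract does not name the aggregation rule used in its horizontal federated learning framework. As a minimal sketch, assuming a FedAvg-style scheme, each hospital trains locally and a server combines the resulting parameter vectors weighted by local sample counts, so only parameters and never raw records are exchanged. The per-site sizes below are hypothetical and chosen only to sum to the reported 648 samples.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter vectors (FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical local parameters from three sites after one round of training.
site_weights = [
    [0.10, -0.30],
    [0.20, -0.10],
    [0.40, -0.20],
]
# Hypothetical split of the 648 samples across the three hospitals.
site_sizes = [200, 248, 200]

global_weights = federated_average(site_weights, site_sizes)
```

Weighting by sample count keeps a small site from pulling the global model as strongly as a large one; other aggregation rules are equally compatible with the description in the abstract.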

2605.04607 2026-06-04 cs.RO

Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking

正确模型,正确时机:用于双足行走的实时级联保真度MPC

Franek Stark, Felix Wiebe, Shubham Vyas, Dennis Mronga, Frank Kirchner

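AI summary: Proposes a multi-phase whole-body model predictive control method that pairs a detailed whole-body model in the near horizon with a simplified single-rigid-body model in the far horizon, reducing computational complexity while retaining predictive capability; the problem is solved in the general-purpose MPC framework acados without preselected footstep locations and validated in MuJoCo simulation on the 18-DoF bipedal robot HyPer-2.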

详情
Journal ref
Proceedings of the 2nd ICRA Workshop on Frontiers of Optimization for Robotics, 2026
Comments
Presented at IEEE ICRA 2026 Workshop "2nd Workshop on Frontiers of Optimization for Robotics"
AI中文摘要

本文提出了一种用于双足行走的多阶段全身模型预测控制(MPC)方法,在近视野中结合详细的全身模型,在后续预测步骤中结合简化的单刚体模型。这降低了计算复杂度,同时保留了预测能力。所得到的非线性最优控制问题完全在通用现成的非线性MPC框架acados中求解,使用序列二次规划(SQP)。给定接触时间表和目标行走速度,控制器优化关节扭矩,而不依赖于预设的足迹位置。该控制器在18自由度双足机器人HyPer-2的MuJoCo仿真中得到验证。

英文摘要

This paper presents a multi-phase whole-body model predictive control (MPC) approach for bipedal walking, combining a detailed whole-body model in the near horizon with a simplified single-rigid-body model in the later prediction steps. This reduces computational complexity while retaining prediction capabilities. The resulting nonlinear optimal control problem is solved entirely within the general-purpose, off-the-shelf nonlinear MPC framework acados, using sequential quadratic programming (SQP). Given a contact schedule and a target walking speed, the controller optimizes joint torques without depending on preselected footstep locations. The controller is validated in MuJoCo simulation on the 18-DoF bipedal robot HyPer-2.

2605.03927 2026-06-04 cs.CV

StateVLM: A State-Aware Vision-Language Model for Robotic Affordance Reasoning

StateVLM: 一种用于机器人可操作推理的状态感知视觉语言模型

Xiaowen Sun, Matthias Kerzel, Mengdi Li, Xufeng Zhao, Paul Striker, Stefan Wermter

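AI summary: Proposes StateVLM, which uses an auxiliary regression loss during training to strengthen a vision-language model's numerical reasoning in object detection and object-state localization, and builds the OSAR benchmark to validate its effectiveness.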

详情
AI中文摘要

视觉语言模型(VLM)在各种机器人任务中表现出色,因为它们能够感知视觉信息并理解自然语言指令。然而,当应用于机器人时,VLM仍然受限于大型语言模型(LLM)固有的一个基本限制:它们在数值推理方面存在困难,特别是在目标检测和目标状态定位中。为了探索VLM中作为回归任务的数值推理,我们提出了一种新颖的训练策略,使VLM适应目标检测和目标状态定位。该方法在微调期间利用框解码器输出计算辅助回归损失(ARL),同时在推理时保持标准序列预测。我们利用这种训练策略开发了StateVLM(状态感知视觉语言模型),这是一种新颖的模型,旨在感知和学习细粒度的目标表示,包括目标和其状态的精确定位,以及可抓取区域。由于缺乏目标状态可操作推理的基准,我们引入了一个开源基准——目标状态可操作推理(OSAR),其中包含1172个场景,7746个单独目标及其对应的边界框。在适配基准(RefCOCO、RefCOCO+和RefCOCOg)上的对比实验表明,与没有ARL的模型相比,ARL使模型性能平均提高1.6%。在OSAR基准上的实验进一步支持了这一发现,表明带有ARL的StateVLM比没有ARL的模型平均性能高5.2%。特别是,ARL对于OSAR中复杂的可操作推理任务也很重要,它增强了模型输出的一致性。

英文摘要

Vision-language models (VLMs) have shown remarkable performance in various robotic tasks, as they can perceive visual information and understand natural language instructions. However, when applied to robotics, VLMs remain subject to a fundamental limitation inherent in large language models (LLMs): they struggle with numerical reasoning, particularly in object detection and object-state localization. To explore numerical reasoning as a regression task in VLMs, we propose a novel training strategy to adapt VLMs for object detection and object-state localization. This approach leverages box decoder outputs to compute an Auxiliary Regression Loss (ARL) during fine-tuning, while preserving standard sequence prediction at inference. We leverage this training strategy to develop StateVLM (State-aware Vision-Language Model), a novel model designed to perceive and learn fine-grained object representations, including precise localization of objects and their states, as well as graspable regions. Due to the lack of a benchmark for object-state affordance reasoning, we introduce an open-source benchmark, Object State Affordance Reasoning (OSAR), which contains 1172 scenes with 7746 individual objects and corresponding bounding boxes. Comparative experiments on adapted benchmarks (RefCOCO, RefCOCO+, and RefCOCOg) demonstrate that ARL improves model performance by an average of 1.6% compared to models without ARL. Experiments on the OSAR benchmark further support this finding, showing that StateVLM with ARL achieves an average of 5.2% higher performance than models without ARL. In particular, ARL is also important for the complex task of affordance reasoning in OSAR, where it enhances the consistency of model outputs.

2605.01910 2026-06-04 cs.LG cs.AI cs.DC

Stochastic Sparse Attention for Memory-Bound Inference

随机稀疏注意力用于内存受限推理

Kyle Lee, Corentin Delacour, Kevin Callahan-Coray, Kyle Jiang, Can Yaras, Samet Oymak, Tathagata Srimani, Kerem Y. Camsari

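AI summary: Proposes SANTA, which reduces value-cache access by sampling sparse indices from the post-softmax distribution, enabling efficient multiplication-free decoding, with up to 1.5x attention-kernel speedup and up to 1.25x end-to-end speedup on Llama-3.1-8B-Instruct.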

详情
Journal ref
ICML 2026
Comments
Code available at https://github.com/OPUSLab/SANTA
AI中文摘要

自回归解码在长上下文中变得带宽受限,因为生成每个token需要从KV缓存中读取所有$n_k$个键和值向量。我们提出随机加法无乘法注意力(SANTA),一种通过从后softmax分布中采样$S \ll n_k$个索引并仅聚合这些值行来稀疏化值缓存访问的方法。这产生了后softmax值聚合的无偏估计,同时将值阶段的乘加运算替换为收集和加法。我们引入分层和系统采样来设计方差减少、GPU友好的变体。在32k token上下文的Llama-3.1-8B-Instruct上评估,S$^2$ANTA匹配基线准确率,同时在NVIDIA RTX 6000 Ada上相比FlashInfer和FlashDecoding实现高达1.5倍解码步注意力核加速。在批处理长上下文生成中,这些核增益转化为高达1.25倍的端到端解码延迟加速。最后,我们提出伯努利$qK^\mathsf{T}$采样作为补充技术来稀疏化分数阶段,通过随机三元查询减少键特征访问。两种方法对上游量化、低秩投影、KV缓存压缩和KV缓存选择方法互补。它们共同指向稀疏、无乘法和节能的推理。我们在https://github.com/OPUSLab/SANTA.git开源了我们的核。

英文摘要

Autoregressive decoding becomes bandwidth-limited at long contexts, as generating each token requires reading all $n_k$ key and value vectors from the KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregating only those value rows. This yields an unbiased estimator of the post-softmax value aggregation while replacing value-stage multiply-accumulates with gather-and-add. We introduce stratified and systematic sampling to design variance-reduced, GPU-friendly variants. Evaluated on Llama-3.1-8B-Instruct at 32k-token contexts, S$^2$ANTA matches baseline accuracy while achieving up to $1.5\times$ decode-step attention-kernel speedup over FlashInfer and FlashDecoding on an NVIDIA RTX 6000 Ada. In batched long-context generation, these kernel gains translate to up to $1.25\times$ end-to-end decode-latency speedup. Finally, we propose Bernoulli $qK^\mathsf{T}$ sampling as a complementary technique to sparsify the score stage, reducing key-feature access through stochastic ternary queries. Both methods are complementary to upstream quantization, low-rank projection, KV-cache compression, and KV-cache selection methods. Together, they point toward sparse, multiplier-free, and energy-efficient inference. We open-source our kernels at: https://github.com/OPUSLab/SANTA.git
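The sampling idea can be sketched in plain Python. If index $i$ is drawn with probability $p_i$ (the post-softmax weight), then the average of the drawn value rows has expectation $\sum_i p_i v_i$, which is the exact attention output. The code below uses plain independent sampling, not the stratified or systematic variants the abstract introduces, and all function names are assumptions rather than the released kernels' API.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def exact_attention(probs, values):
    """Dense value aggregation: sum_i p_i * v_i."""
    dim = len(values[0])
    return [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]

def sampled_attention(probs, values, num_samples, rng):
    """Estimate sum_i p_i * v_i by averaging value rows at indices drawn from p."""
    dim = len(values[0])
    indices = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += values[i][d]  # gather-and-add: no multiply by p_i
    return [x / num_samples for x in out]  # one rescale per output element

rng = random.Random(0)
probs = softmax([0.0, 1.0, 2.0])
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
exact = exact_attention(probs, values)
estimate = sampled_attention(probs, values, num_samples=20_000, rng=rng)
```

The demo uses many more samples than rows only to make the agreement with the dense result visible; the bandwidth saving in the abstract comes from the opposite regime, where far fewer rows are read than the cache holds.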

2606.04329 2026-06-04 cs.CR cs.AI

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

从不可信输入到可信内存:LLM智能体中内存投毒攻击的系统研究

Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah, Zhiwei Shang

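AI summary: A systematic study of memory poisoning attacks in LLM-based agents that identifies four memory write channels and nine structural vulnerabilities, proposes a taxonomy of six attack classes, and introduces the MPBench evaluation benchmark; it finds that agents that write and retrieve memory more aggressively are more exploitable and that existing prompt-injection defenses do not cover memory poisoning attacks.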

详情
AI中文摘要

内存是AI智能体的核心组件,使其能够在交互中积累知识并提高性能。然而,持久性内存引入了内存投毒的风险,即单个对抗性内存写入可以对智能体行为产生长期影响。我们对基于LLM的智能体中的内存投毒进行了系统研究。我们识别了四种内存写入通道和九种模型能力、系统提示设计以及智能体系统架构中的结构漏洞,这些漏洞使得这些通道可被利用。基于这些漏洞,我们提出了六类内存投毒攻击的分类法。此外,我们设计了MPBench——一个用于评估内存投毒攻击的基准,并表明设计为更积极读写和检索内存的智能体更容易被利用。我们还表明,现有的提示注入防御无法覆盖内存投毒攻击。我们的发现为理解和缓解针对AI智能体的内存投毒攻击提供了基础。

英文摘要

Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior. We present a systematic study of memory poisoning in LLM-based agents. We identify four memory write channels and nine structural vulnerabilities in model capabilities, system prompt design, and agent system architecture that make these channels exploitable. Based on these vulnerabilities, we develop a taxonomy of six classes of memory poisoning attacks. Furthermore, we design MPBench -- a benchmark for evaluating memory poisoning attacks, and show that agents designed to write and retrieve memory more aggressively are more exploitable. We also show that existing prompt injection defenses fail to cover memory poisoning attacks. Our findings provide a foundation for understanding and mitigating memory poisoning attacks against AI agents.

2606.04328 2026-06-04 cs.NI cs.AI

Generalizable Multi-Task Learning for Wireless Networks Using Prompt Decision Transformers

基于提示决策变压器的无线网络可泛化多任务学习

Fatih Temiz, Shavbo Salehi, Melike Erol-Kantarci

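AI summary: Proposes the PromptDT framework, which recasts multi-cell selection as a sequence modeling problem and uses offline trajectories and task-specific prompts for scalable learning across heterogeneous network configurations, improving multi-task QoE by up to 49% and adapting to unseen configurations without retraining.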

详情
Comments
Accepted paper at IEEE International Mediterranean Conference on Communications and Networking (MeditCom) 2026
AI中文摘要

未来无线网络需要快速适应高度异构的环境和动态任务配置,这要求从传统的基于规则和优化的无线资源管理(RRM)转向人工智能(AI)驱动的RRM。AI驱动的方法可以学习复杂的非线性关系,泛化到不同的网络条件,并实现实时、可扩展和自主的决策。在RRM技术中,协调多点(CoMP)传输对于减轻小区间干扰和提升小区边缘性能至关重要,从而在密集部署中改善体验质量(QoE)。然而,最优多小区选择仍然是一个复杂的组合挑战,因为它需要在动态流量和信道条件下联合优化许多可能的服务小区组合。尽管取得了成功,但传统的深度强化学习(DRL)方法,如近端策略优化(PPO),在状态和动作空间变化时存在样本效率低、泛化能力有限和重新训练成本高的问题。为了解决这些瓶颈,我们提出了一种基于提示决策变压器(PromptDT)的多任务学习框架,该框架能够跨不同网络配置学习,并将多小区选择重构为序列建模问题。通过利用离线轨迹和任务特定提示,PromptDT实现了跨不同网络配置(包括变化的基站和用户设备数量以及调度策略)的可扩展学习。实验结果表明,与基线相比,PromptDT在多任务设置中将QoE提高了高达49%,且性能随模型容量正向扩展。此外,PromptDT能有效泛化到未见过的任务,实现对新网络配置的鲁棒少样本适应,无需重新训练或微调。

英文摘要

Future wireless networks demand rapid adaptation to highly heterogeneous environments and dynamic task configurations, necessitating a shift from conventional rule-based and optimization-driven radio resource management (RRM) toward artificial intelligence (AI)-driven RRM. AI-driven approaches can learn complex nonlinear relationships, generalize across diverse network conditions and enable real-time, scalable and autonomous decision-making. Among RRM techniques, coordinated multipoint (CoMP) transmission is pivotal for mitigating inter-cell interference and enhancing cell-edge performance, thereby improving quality of experience (QoE) in dense deployments. However, optimal multi-cell selection remains a complex combinatorial challenge as it requires jointly optimizing over many possible serving-cell combinations under dynamic traffic and channel conditions. Despite their success, conventional deep reinforcement learning (DRL) methods such as proximal policy optimization (PPO) suffer from poor sample efficiency, limited generalization, and costly retraining when state and action spaces change. To address these bottlenecks, we propose a Prompt Decision Transformer (PromptDT) based multi-task learning framework capable of learning across diverse network configurations and reformulating multi-cell selection as a sequence modeling problem. By leveraging offline trajectories and task-specific prompts, PromptDT enables scalable learning across diverse network configurations, including varying base stations and user equipment counts, and scheduler policies. Experimental results demonstrate that PromptDT improves QoE by up to 49% in multi-task settings compared to baselines, with performance scaling positively alongside model capacity. Moreover, PromptDT generalizes effectively to unseen tasks, achieving robust few-shot adaptation to new network configurations without retraining or fine-tuning.

2606.04327 2026-06-04 cs.LG cs.AI math.OC

A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks

两层神经网络平稳高原的几何刻画

Tian Ding, Dawei Li, Ruoyu Sun

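AI summary: By defining an "inner Hessian" matrix, studies the geometry of stationary plateaus in the loss landscape of two-layer neural networks with smooth activations, classifies every stationary point as a local minimum or a saddle, and shows how the splitting coefficients and the definiteness of the inner Hessian jointly determine the plateau's local geometry.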

详情
Comments
47 pages
AI中文摘要

我们研究了光滑激活函数的两层神经网络损失景观中出现的平稳高原的几何结构。我们关注“神经元分裂”现象,其中复制一个隐藏神经元会在更宽的网络中产生一个仿射平稳点集。我们提供了这些高原上所有平稳点的全面分类,确定了它们在何种条件下构成局部极小点或鞍点。我们的刻画依赖于一个我们称之为“内Hessian”矩阵的每个神经元曲率对象。我们的分析表明,内Hessian的定性以及分裂系数的选择共同决定了高原的局部几何。我们证明,分裂一个局部极小点可以产生局部极小和鞍点的混合,或者一个全鞍点的高原,在温和假设下确定了一个具体的必然鞍点区域。相反,分裂一个鞍点总是产生一个鞍点的高原。我们的结果统一并扩展了先前的景观分析,阐明了模型扩展何时以及如何保持或改变平稳点的性质。这些发现为神经网络中宽度扩展和重参数化的影响提供了新的几何见解。

英文摘要

We investigate the geometric structure of stationary plateaus that arise in the loss landscape of two-layer neural networks with smooth activation functions. We focus on the phenomenon of "neuron splitting" where duplicating a hidden neuron yields an affine set of stationary points in a wider network. We provide a comprehensive classification of all stationary points on these plateaus, determining under what conditions they constitute local minima or saddle points. Our characterization hinges on a per-neuron curvature object we term the "inner Hessian" matrix. Our analysis reveals that the definiteness of the inner Hessian and the choice of splitting coefficients jointly dictate the local geometry of the plateau. We show that "splitting" a local minimum can yield either a mixture of local minima and saddles or an all-saddle plateau, with a concrete sure-saddle region identified under mild assumptions. In contrast, splitting a saddle point always produces a plateau of saddle points. Our results unify and extend prior landscape analyses, elucidating when and how model expansion preserves or alters the nature of stationary points. These findings offer new geometric insights into the effects of width expansion and reparameterization in neural networks.
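A small sketch shows why neuron splitting produces a whole family of equivalent wider networks. It assumes the standard construction in which the copy shares the original neuron's input weights and the output weight $a$ is divided into $\lambda a$ and $(1-\lambda) a$; the abstract does not spell out its parameterization, so this choice, the scalar input, and the `tanh` activation are illustrative assumptions.

```python
import math

def forward(x, hidden, out_weights):
    """Two-layer network with scalar input: sum_j a_j * tanh(w_j * x + b_j)."""
    return sum(a * math.tanh(w * x + b) for (w, b), a in zip(hidden, out_weights))

def split_neuron(hidden, out_weights, k, lam):
    """Duplicate hidden neuron k; the two copies get output weights
    lam * a_k and (1 - lam) * a_k, so their contributions sum to the original."""
    new_hidden = hidden + [hidden[k]]
    new_out = list(out_weights)
    new_out[k] = lam * out_weights[k]
    new_out.append((1.0 - lam) * out_weights[k])
    return new_hidden, new_out

# Hypothetical narrow network with two hidden neurons.
hidden = [(0.5, -0.1), (-1.2, 0.3)]
out_weights = [0.8, -0.4]

# Every lam gives a wider network computing the same function.
wide_hidden, wide_out = split_neuron(hidden, out_weights, k=0, lam=0.3)
```

Because the output is unchanged for every value of `lam`, the split networks trace out a line in the wider parameter space; the paper's classification concerns which points on such a set are local minima and which are saddles.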

2606.04326 2026-06-04 cs.LG cs.AI

Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models

衡量重要之事:概念瓶颈模型的合成基准

Julian Skirzynski, Harry Cheon, Shreyas Kadekodi, Meredith Stewart, Berk Ustun

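AI summary: Develops synthetic benchmarks for concept bottleneck models that control properties such as data modality, concept choice, annotation quality, and completeness, in order to evaluate models in decision-support and automation settings and to diagnose failure modes.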

详情
Comments
Benchmarks available at https://github.com/ustunb/concept-benchmark
AI中文摘要

概念瓶颈模型从输入中检测到的高级概念预测结果。尽管概念提供了从可解释性中获益的简单方法,但很少有数据集包含概念标签。这限制了研究人员确定哪些问题适合这些模型、隔离驱动其性能或导致失败的因素、或发现哪些算法表现良好的能力。在本文中,我们为概念瓶颈模型开发了合成基准,重点关注其两个主要用例:决策支持(模型帮助人类做出更好的决策)和自动化(模型在无监督下处理常规任务)。我们的基准可以生成带标签的数据集,同时控制影响性能的属性,包括数据模态、概念选择、标注质量和完整性。我们演示了如何使用这些基准评估代表性类别的概念瓶颈模型。我们的演示展示了基准如何诊断失败模式并指导后续测试。

英文摘要

Concept bottleneck models predict outcomes from high-level concepts detected in inputs. Although concepts provide a simple way to reap benefits from interpretability, very few datasets include concept labels. This limits researchers' ability to determine which problems are suitable for these models, isolate the factors that drive their performance or lead to failures, or uncover which algorithms perform well. In this paper, we develop synthetic benchmarks for concept-bottleneck models, focusing on their two main use cases: decision support, in which models assist humans in making better decisions, and automation, in which models handle routine tasks without supervision. Our benchmarks can generate labeled datasets while controlling for properties that affect performance, including data modality, concept choice, annotation quality, and completeness. We demonstrate how the benchmarks can be used to evaluate representative classes of concept bottleneck models. Our demonstrations show how the benchmarks can diagnose failure modes and guide follow-up testing.

2606.04325 2026-06-04 cs.CL

Parameter-Efficient Fine-Tuning with Learnable Rank

可学习秩的参数高效微调

Arpit Garg, Simon Lucey, Hemanth Saratchandran

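AI summary: Proposes LR-LoRA, which learns the adapter rank during training instead of fixing it, and reaches state-of-the-art performance in most settings on language understanding and commonsense reasoning benchmarks.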

详情
Comments
In Submission
AI中文摘要

低秩适配(LoRA)是一种流行的参数高效微调(PEFT)方法,通过将权重更新限制为低秩适配器,在低维子空间中进行优化,从而引入固定的低秩归纳偏置。在这项工作中,我们质疑固定秩约束是否是参数高效微调最有效的归纳偏置。我们引入了*可学习秩LoRA(LR-LoRA)*,一种在训练过程中学习适配器秩的PEFT方法。LR-LoRA不为所有适配器层规定统一的秩,而是允许优化器为每一层确定合适的秩。使用这种方法,我们发现学习到的秩在层间存在显著差异,Transformer模型中的注意力层和MLP层表现出系统性的不同秩偏好。在一系列语言理解和常识推理基准测试中,LR-LoRA在大多数设置下达到了最先进的性能,并且始终优于强大的PEFT基线,表明可学习秩比固定秩适配提供了更灵活和有效的归纳偏置。

英文摘要

Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method that restricts weight updates to low-rank adapters, introducing a fixed low-rank inductive bias by optimizing in a low-dimensional subspace. In this work, we question whether a fixed-rank constraint is the most effective inductive bias for parameter-efficient fine-tuning. We introduce *Learnable Rank LoRA (LR-LoRA)*, a PEFT method in which the adapter rank is learned during the training process. Instead of prescribing a uniform rank for all adapter layers, LR-LoRA allows the optimizer to determine the appropriate rank for each layer. Using this approach, we find substantial layer-wise variation in the learned ranks, with the attention and MLP layers in the transformer models exhibiting systematically different rank preferences. Across a range of language understanding and commonsense reasoning benchmarks, LR-LoRA achieves state-of-the-art performance in most settings and consistently outperforms strong PEFT baselines, demonstrating that a learnable rank provides a more flexible and effective inductive bias than fixed-rank adaptations.

2606.04324 2026-06-04 cs.LG stat.ML

Neural Galerkin Normalizing Flows for Bayesian Inference of Diffusions with Inaccessible Boundaries

用于具有不可达边界的扩散模型贝叶斯推断的神经Galerkin归一化流

Riccardo Saporiti, Fabio Nobile

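AI summary: Proposes a new normalizing flow architecture that solves the Fokker-Planck equation in a Neural Galerkin framework to learn the transition density of a diffusion process between two observation times, enabling efficient Bayesian inference.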

详情
Comments
27 pages, 12 figures
AI中文摘要

从离散观测对扩散模型参数进行贝叶斯推断的主要挑战之一是,在连续观测时间之间无法获得转移密度函数的解析表达式,而该函数是推导似然函数所必需的。扩展先前使用归一化流求解Fokker-Planck型偏微分方程的研究,我们提出一种新的归一化流架构,用于学习扩散过程在两个观测时间之间的转移密度函数。我们通过神经Galerkin框架,以狄拉克质量作为初始条件,在初始数据和扩散系数的指定训练分布上求解相关的Fokker-Planck方程来实现这一点。我们特别关注扩散矩阵在某些不可达边界区域消失的过程,例如满足Feller条件的随机波动率模型。沿观测轨迹评估所获得的转移密度的乘积近似似然函数,从而通过马尔可夫链蒙特卡洛实现廉价的后验采样。在离线训练阶段之后,推断变得显著更高效,因为它避免了为MCMC采样器提出的每个参数实时求解Fokker-Planck方程,或依赖其他涉及重复模拟扩散桥的无似然贝叶斯推断方法。

英文摘要

One of the primary challenges in Bayesian inference on the parameters of a diffusion model from discrete observations is the unavailability of an analytical expression for the transition density function between consecutive observation times, which is needed to derive the likelihood function. Extending previous studies that solve Fokker-Planck (FP) type partial differential equations with Normalizing Flows, we propose a new Normalizing Flow architecture to learn the transition density function of the diffusion process between two observation times. We do so by solving in a Neural Galerkin framework the associated FP equation with a Dirac mass as initial condition, over a specified training distribution of the initial datum and the coefficients of the diffusion. We specifically focus on processes whose diffusion matrix vanishes in certain inaccessible boundary regions, such as Stochastic Volatility models that satisfy a Feller condition. The product of the obtained transition densities evaluated along the observed trajectory approximates the likelihood function, thereby enabling cheap posterior sampling via Markov chain Monte Carlo (MCMC). After the offline training phase, inference becomes significantly more efficient, as it avoids the need to solve the FP equation in real time for each parameter proposed by the MCMC sampler or to rely on other likelihood-free methods for Bayesian inference that involve repeated simulation of diffusion bridges.
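The likelihood approximation described above can be written out explicitly. The notation is assumed for illustration: observations $x_{t_0}, \dots, x_{t_n}$, parameter $\theta$ with prior $\pi(\theta)$, true transition density $p_\theta$, and learned density $\hat{p}_\theta$. By the Markov property, and conditioning on the first observation, the likelihood factorizes over consecutive observation times.

```latex
\[
L(\theta)
  = \prod_{i=1}^{n} p_\theta\left(x_{t_i} \mid x_{t_{i-1}}\right)
  \approx \prod_{i=1}^{n} \hat{p}_\theta\left(x_{t_i} \mid x_{t_{i-1}}\right),
\qquad
\pi\left(\theta \mid x_{t_0}, \dots, x_{t_n}\right) \propto \pi(\theta)\, L(\theta).
\]
```

Each MCMC proposal then costs $n$ evaluations of the trained flow instead of a fresh Fokker-Planck solve.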

2606.04323 2026-06-04 cs.CV

Answer Self-Consistency with Margin-Triggered Question Re-Arbitration for the CVPR 2026 VidLLMs Challenge

面向CVPR 2026 VidLLMs挑战赛的基于边际触发问题重新仲裁的答案自一致性方法

Tomoya Miyazawa, Hiroyasu Okuno

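AI summary: Proposes ASC-MQRA, a training-free test-time reasoning framework that aggregates multiple video question-answering runs through answer self-consistency and uses a margin trigger to conditionally re-arbitrate low-confidence examples; the final submission to Track 2 of the CVPR 2026 VidLLMs Challenge uses ASC alone and reaches 81.16 average accuracy on the test set.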

详情
AI中文摘要

在本报告中,我们提出了针对CVPR 2026 VidLLMs挑战赛Track 2的解决方案。该赛道评估视频中的视觉关系推理能力,模型需要推断并非总是明确可见的关系。我们提出了答案自一致性结合边际触发问题重新仲裁(ASC-MQRA),一种基于多模态推理模型的无需训练的测试时推理框架。核心ASC组件执行多次随机视频问答运行,并通过答案级别的自一致性聚合其答案选择。这显著优于单次推理,并构成了我们的最终测试提交。我们进一步研究了MQRA,一种针对低边际样本的条件性重新仲裁模块,其中第一阶段的投票分布指示了不确定性。我们的投票边际分析表明,低边际样本通常在前几名候选答案中包含真实答案,这促使MQRA缩小候选集并仅针对保留的候选答案重新观看视频。在验证集上,MQRA相比ASC进一步提升,表明低边际投票分布可以提供有用的不确定性信号。然而,在测试集上,MQRA相对于ASC略微降低了性能,表明重新仲裁对触发子集的大小和类别分布敏感。因此,我们的最终测试提交使用了不带重新仲裁的ASC,在验证集上达到72.73的平均准确率和78.34的类别宏平均准确率,在测试集上达到81.16的平均准确率和80.91的类别宏平均准确率。本报告详细介绍了我们的提示策略、实现设置、消融研究和诊断分析。代码可在https://github.com/data-analytics-labo/ASC-MQRA获取。

英文摘要

In this report, we present our solution for Track 2 of the CVPR 2026 VidLLMs Challenge. This track evaluates visual relational reasoning in videos, where models must infer relations that are not always explicitly visible. We propose Answer Self-Consistency with Margin-Triggered Question Re-Arbitration (ASC-MQRA), a training-free test-time reasoning framework built on a multimodal reasoning model. The core ASC component performs multiple stochastic video question-answering runs and aggregates their answer choices through answer-level self-consistency. This substantially improves over single-pass inference and forms our final test submission. We further study MQRA, a conditional re-arbitration module for low-margin examples where the first-stage vote distribution indicates uncertainty. Our vote-margin analysis shows that low-margin examples often retain the ground-truth answer among the top candidates, motivating MQRA to narrow the candidate set and re-watch the video only over the retained candidates. On validation, MQRA further improves over ASC, indicating that low-margin vote distributions can provide a useful uncertainty signal. On test, however, MQRA slightly degrades performance relative to ASC, suggesting that re-arbitration is sensitive to the size and category distribution of the triggered subset. Our final test submission therefore uses ASC without re-arbitration, achieving 72.73 average accuracy and 78.34 category-wise macro average accuracy on validation, and 81.16 average accuracy and 80.91 category-wise macro average accuracy on test. This report details our prompting strategy, implementation setup, ablation studies, and diagnostic analyses. The code is available at https://github.com/data-analytics-labo/ASC-MQRA
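The voting and trigger logic can be sketched as follows. The abstract does not give the exact margin definition or threshold, so this assumes the margin is the gap between the top two vote counts divided by the number of runs, with a hypothetical threshold of 0.2. The re-arbitration pass itself (re-watching the video over the retained candidates) is outside this sketch.

```python
from collections import Counter

def aggregate_votes(answers):
    """Majority vote over sampled answers.
    Returns (winner, margin, candidates ranked by vote count)."""
    ranked = Counter(answers).most_common()
    top_count = ranked[0][1]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    margin = (top_count - second_count) / len(answers)  # assumed definition
    return ranked[0][0], margin, [answer for answer, _ in ranked]

def needs_rearbitration(margin, threshold=0.2):
    """Flag an example as low-margin; the threshold value is hypothetical."""
    return margin < threshold

# Five hypothetical runs with a clear winner.
confident = aggregate_votes(["A", "B", "A", "C", "A"])
# Five hypothetical runs where the top two answers tie.
uncertain = aggregate_votes(["A", "B", "B", "A", "C"])
```

For a flagged example, the ranked candidate list gives the narrowed answer set that a second pass would choose among.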

2606.04321 2026-06-04 cs.AI

The Digital Apprentice: A Framework for Human-Directed Agentic AI Development

数字学徒:面向人类指导的自主AI开发框架

Travis Weber, Rohit Taneja

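AI summary: Proposes the Digital Apprentice framework, in which an AI agent gains autonomy step by step under human direction through three components (methodology capture, authorization, and continuous alignment), aiming at agentic systems that scale while remaining trustworthy.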

详情
Comments
Submitted to ACM AI Leadership Summit 2026, Visionary Papers Track. 5 pages, 2 figures
AI中文摘要

自主AI部署面临一个反复出现的设计张力:重度人类监督限制了规模,而广泛自主则超出问责范围。这两种姿态都无法提供负责任委派所需的治理基础设施。我们提出数字学徒,一个可扩展、安全的AI代理框架,其中自主权是挣得的,而非假设的。数字学徒是一个发展型学习者,内化指导人类的隐性方法论,仅在经验证据证明合理时,才逐步通过每个技能的自主层级。结果是一个随时间变得真正有用,同时保持与特定人类标准一致的代理。三个架构组件使之成为可能。(1) 方法论捕获,将指导专家的隐性方法提炼为结构化资产。(2) 授权,自主升级由明确的人类批准控制。(3) 持续对齐,在运行时纠正漂移,并将每次纠正转化为自有偏好数据。我们将该框架实例化为推理时控制平面。我们对质量框架进行数学建模,并讨论旨在提高质量的策略和技术。我们将该框架应用于开放专业语料库,并展示在流量变化下,捕获数据漂移并在运行时应用不同技术如何恢复降级的质量维度。其意义超越任何单一应用。我们相信,这三个支柱作为一个系统缝合在一起,为能够在不牺牲信任的情况下扩展的自主系统提供了一条更安全、更可行的路径。

英文摘要

Agentic AI deployments face a recurring design tension: heavy human oversight limits scale, while broad autonomy outruns accountability. Neither posture provides the governance infrastructure required for responsible delegation. We present the Digital Apprentice, a framework for scalable, safe AI agency in which autonomy is earned, not assumed. The Digital Apprentice is a developmental learner that internalizes the tacit methodology of a directing human, graduating through per-skill autonomy tiers only when empirical evidence justifies it. The result is an agent that becomes genuinely useful over time while remaining aligned to a specific human's standards. Three architectural components make this possible. (1) Methodology capture, distilling a directing professional's tacit approach into structured assets. (2) Authorization, with autonomy escalation gated by explicit human approval. (3) Continuous alignment, correcting drift at runtime and converting each correction into owned preference data. We instantiate this framework as an inference-time control plane. We mathematically model the quality framework and discuss policies and techniques designed to raise quality. We apply the framework to an open professional corpus, and we show how catching data drift and applying a different technique at runtime recovers degraded quality dimensions under traffic shift. The implication extends beyond any single application. We believe these three pillars, stitched together as a system, form a safer and more viable path to agentic systems that can scale without sacrificing trust.