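arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.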

2605.31239 2026-06-01 stat.ML cs.AI cs.LG

Correcting Split Selection in Online Decision Trees via Anytime-Valid Inference

通过随时有效推断纠正在线决策树中的分裂选择

Salim I. Amoukou, Saumitra Mishra, Manuela Veloso

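AI summary: To address the lack of valid statistical guarantees for split selection in online decision trees, the paper proposes a method based on anytime-valid inference. It provides anytime-valid control of false splits under arbitrary data streams, finite commitment time under a predictive advantage, and, under stationary i.i.d. data, monotonically decreasing risk with strict improvement at every split.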

Comments Accepted as a Spotlight at the Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

基于装袋的集成方法，尤其是自适应随机森林，是数据流学习中最强的表现者之一。这些方法的共同点是依赖霍夫丁树作为基学习器，通过使用浓度不等式测试候选分裂是否显著优于其替代方案来增量式地构建决策树。尽管经验成功，现有变体缺乏有效的统计保证。当前分析依赖于固定样本浓度界，而分裂决策使用数据依赖的停止规则，这使其保证无效，并可能将错误分裂的概率推向1。我们引入了一种基于随时有效推断的原则性替代方案。我们的方法提供：(i) 在任意数据流（包括非平稳设置）下对错误分裂的随时有效控制；(ii) 在预测优势下的有限承诺时间；(iii) 在平稳独立同分布数据下，风险单调递减且每次分裂严格改善。在经验上，我们评估了独立树及其在非平稳流中在自适应随机森林中的使用。我们的方法提高了性能，同时生成了更小的树。

英文摘要

Bagging-based ensembles, most notably Adaptive Random Forests, are among the strongest performers for learning from data streams. A common denominator across these methods is their reliance on Hoeffding Trees as base learners, which grow decision trees incrementally by testing whether a candidate split is significantly better than its alternatives using concentration inequalities. Despite their empirical success, existing variants lack valid statistical guarantees. Current analyses rely on fixed-sample concentration bounds, while split decisions are made using data-dependent stopping rules, which invalidates their guarantees and can drive the probability of incorrect splits to one. We introduce a principled alternative based on anytime-valid inference. Our method provides: (i) anytime-valid control of false splits under arbitrary data streams, including non-stationary settings; (ii) finite commitment time under a predictive advantage; and (iii) under stationary i.i.d. data, risk is monotone decreasing and strictly improves at every split. Empirically, we evaluate both standalone trees and their use within Adaptive Random Forests on non-stationary streams. Our method improves performance while producing substantially smaller trees.
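
The gap between a fixed-sample bound and a stopping rule that checks it after every observation can be illustrated with a small simulation. The sketch below is not the paper's method: it assumes i.i.d. merit differences in [-1, 1] between two equally good candidate splits, and it uses a crude union-bound schedule (spending `delta / (n * (n + 1))` at step `n`) as a stand-in for a time-uniform test. All function names are hypothetical.

```python
import math
import random


def hoeffding_radius(n, delta, value_range=2.0):
    """Fixed-sample Hoeffding radius for the mean of n bounded observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def anytime_radius(n, delta, value_range=2.0):
    """Time-uniform radius: spend delta / (n * (n + 1)) at step n.

    The per-step budgets sum to delta over all n, so a union bound keeps the
    overall false-split probability below delta at any stopping time
    (assumption: independent, bounded observations).
    """
    return hoeffding_radius(n, delta / (n * (n + 1)), value_range)


def first_split_time(diffs, radius, delta):
    """Return the first n at which the running mean exceeds radius(n, delta)."""
    total = 0.0
    for n, diff in enumerate(diffs, start=1):
        total += diff
        if total / n > radius(n, delta):
            return n
    return None


def false_split_rates(trials=200, horizon=2000, delta=0.05, seed=0):
    """Simulate streams where the two candidate splits are equally good."""
    rng = random.Random(seed)
    fixed = anytime = 0
    for _ in range(trials):
        # Zero-mean merit differences: any split decision is a false split.
        diffs = [rng.choice((-1.0, 1.0)) for _ in range(horizon)]
        fixed += first_split_time(diffs, hoeffding_radius, delta) is not None
        anytime += first_split_time(diffs, anytime_radius, delta) is not None
    return fixed / trials, anytime / trials
```

Calling `false_split_rates()` returns the pair (fixed-sample rate, time-uniform rate). Because the time-uniform radius is never smaller than the fixed-sample one, the second rate cannot exceed the first on the same streams; the price is a wider radius and therefore later splits.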

2605.31231 2026-06-01 math.NA cs.LG cs.NA

A holomorphic neural network framework for 3D boundary value problems governed by harmonic potentials

基于全纯神经网络的调和势控制的三维边值问题框架

Enrico Ballini, Allan Peter Engsig-Karup, Tito Andriollo

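AI summary: Proposes a framework based on the Whittaker integral formula and holomorphic neural networks that solves three-dimensional boundary value problems for harmonic potentials by constructing networks that satisfy the PDE exactly. Training requires only boundary collocation points, and accuracy is validated on Laplace and linear elasticity problems.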

AI中文摘要

我们提出了一种基于神经网络的框架，用于求解解可表示为调和势的三维边值问题。该方法利用Whittaker积分公式，通过关于合适复变量的全纯函数来表示解。这些函数随后使用全纯神经网络进行逼近，从而保证全纯性要求。该公式的一个关键特征是，控制偏微分方程（PDE）通过构造精确满足。因此，与标准的物理信息神经网络相比，在域内部不需要PDE的残差最小化，训练完全基于边界配点。该方法针对三维拉普拉斯和线弹性问题进行了验证，在后一种情况下，位移和应力场通过Papkovich-Neuber势表示。数值结果表明，标量和矢量场均得到精确逼近，误差在整个域内保持可控。总体而言，该工作表明，将解析结构融入神经网络架构为三维边值问题的无网格逼近提供了一种自然且有效的框架，同时保留了控制方程的基本性质。

英文摘要

We present a neural-network-based framework for the solution of three-dimensional boundary value problems where the solution is expressible in terms of harmonic potentials. The approach leverages the Whittaker integral formula, which allows representing the solution through functions that are holomorphic with respect to a suitable complex variable. These functions are subsequently approximated using holomorphic neural networks, which guarantee fulfillment of the holomorphicity requirement. A key feature of the proposed formulation is that the governing partial differential equations (PDEs) are satisfied exactly by construction. Therefore, in contrast to standard physics-informed neural networks, no residual minimization of PDEs is required in the interior of the domain, and training is based exclusively on boundary collocation points. The method is validated against three-dimensional Laplace and linear elasticity problems, where, in the latter case, displacement and stress fields are expressed via the Papkovich-Neuber potentials. The numerical results show an accurate approximation of both scalar and vector fields, with errors remaining controlled throughout the domain. Overall, the work demonstrates that the incorporation of analytical structures into neural network architectures provides a natural and effective framework for the meshless approximation of three-dimensional boundary value problems while preserving the underlying properties of the governing equations.
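
The statement that the PDE holds by construction can be checked directly in the Laplace case. The derivation below is a sketch using the classical form of Whittaker's formula; the symbols $f$, $\omega$, and $\theta$ are notation assumed here, and the paper's exact parametrization may differ.

```latex
% Whittaker-type representation with f holomorphic in \omega:
u(x,y,z) = \int_0^{2\pi} f(\omega,\theta)\,d\theta,
\qquad \omega = z + i\,x\cos\theta + i\,y\sin\theta .

% Differentiating under the integral sign (chain rule in \omega):
\frac{\partial^2 u}{\partial x^2} = -\int_0^{2\pi} \cos^2\theta\,\frac{\partial^2 f}{\partial \omega^2}\,d\theta, \quad
\frac{\partial^2 u}{\partial y^2} = -\int_0^{2\pi} \sin^2\theta\,\frac{\partial^2 f}{\partial \omega^2}\,d\theta, \quad
\frac{\partial^2 u}{\partial z^2} = \int_0^{2\pi} \frac{\partial^2 f}{\partial \omega^2}\,d\theta .

% Summing the three terms:
\Delta u = \int_0^{2\pi} \left(1 - \cos^2\theta - \sin^2\theta\right)\frac{\partial^2 f}{\partial \omega^2}\,d\theta = 0 .
```

Any holomorphic $f$ therefore yields a harmonic $u$, which is why only the boundary conditions remain to be fitted.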

2605.31224 2026-06-01 cs.CY cs.AI cs.HC

Comparing LLM-Based Conversational and Graphical Interfaces for Industrial Decision Tasks: An Exploratory Mixed-Methods Study

基于LLM的对话式与图形化界面在工业决策任务中的比较：一项探索性混合方法研究

Roberto Figliè, Simone Caputo, Alan Serrano, Tommaso Turchi, Daniele Mazzei

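AI summary: A mixed-methods study compares an LLM-based conversational interface with a graphical dashboard on industrial decision tasks. It finds that the conversational interface can reduce interactional effort, while the dashboard remains valuable for overview and verification.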

AI中文摘要

生成式AI对话用户界面（CUI）作为访问和分析数据的新方式，在各个领域（包括工业领域）的应用正在增长。在工业领域，物联网设备产生的大量数据流经用户界面，可能需要对决策者新的分析需求进行适应。基于LLM的CUI通过自然语言的直接性，无需学习每个GUI设计的成本，有望提供一种与这些数据直接交互的新方式。此外，LLM的能力及其代理性为自动化某些任务并在决策活动中辅助推理提供了可能性。但这些承诺是否可靠？我们通过一项混合方法研究来探讨这一普遍问题，比较了最先进的仪表盘与对话代理。共有20名参与者使用两种界面完成四项复杂度不同的模拟工业决策任务。我们结合了心理工作量、完成时间和决策准确性的测量，以及通过主题分析进行的事后问卷和半结构化访谈。研究结果表明，对话代理可以通过支持更直接的信息访问来减少交互努力，而仪表盘在概览和验证方面仍然有价值。然而，这些好处可能因任务而异，需要通过更大规模的研究进行验证。

英文摘要

The use of Generative AI Conversational User Interfaces (CUIs) as a new way to access and analyze data is growing in all sectors, and the industrial sector is no exception. There, large amounts of data produced by IoT devices flow through user interfaces, which may need to adapt to the new analysis needs of decision-makers. LLM-based CUIs promise a new way to interact directly with those data through natural language, without the learning costs that every GUI design carries. Moreover, the capabilities of LLMs and their agency open up the possibility of automating some tasks and helping with reasoning during decision-making activities. But are these promises well founded? We try to scope this general question with a mixed-methods study comparing a state-of-the-art dashboard with a conversational agent. A total of 20 participants used both interfaces to complete four simulated industrial decision tasks of varying complexity. We combined measures of mental workload, completion time, and decision accuracy with a post-study questionnaire and semi-structured interviews analyzed through thematic analysis. The findings suggest that the conversational agent can reduce interactional effort by supporting more direct access to information, while the dashboard remains valuable for overview and verification. However, these benefits may vary across tasks and require validation through larger-scale studies.

2605.31199 2026-06-01 cs.CR cs.AI

MAECO-Lite: Modular Ontology for Dynamic Malware Analysis

MAECO-Lite：动态恶意软件分析的模块化本体

Zekeri Adams, Peter Švec, Ján Kľuka, Roderik Ploszek, Monday Onoja, Štefan Balogh, Martin Homola

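AI summary: To address the conflation of artifacts and events in MAEC and STIX for dynamic malware analysis, the paper performs an ontological analysis grounded in the Unified Foundational Ontology (UFO) and proposes MAECO-Lite, a lightweight ontology whose modular structure separates enduring entities from runtime events, improving semantic clarity and computational usability.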

AI中文摘要

以实用且语义精确的方式捕获动态恶意软件行为仍然是网络威胁情报中的一个重大挑战。尽管MAEC和STIX等标准提供了广泛采用的词汇表来描述恶意软件工件和观测结果，但它们以相当复杂的结构表示数据，往往掩盖了重要的本体论区分。特别是，它们倾向于将持久的恶意软件工件与执行期间生成的事件混为一谈，从而模糊了本体设计基础标准中的核心区分。在本文中，我们以统一基础本体（UFO）为理论视角，对与动态恶意软件分析相关的核心MAEC和STIX构造进行了基础本体分析。我们的分析揭示了由于MAEC和STIX中工件、倾向和运行时事件的混淆而产生的一些本体论不匹配，这些不匹配使动态恶意软件行为的一致表示复杂化，并从实践角度限制了推理执行轨迹的能力。基于这些见解，我们提出了MAECO-Lite，一种轻量级本体，旨在表示数据并实现其处理以用于动态恶意软件分析。该本体采用模块化结构，以样本、进程、动作、系统工件和MITRE ATT&CK技术为中心，同时保持持久实体和运行时事件之间的清晰分离。使用描述逻辑概念学习算法的初步评估表明，简化的本体显著提高了学习性能，证明了基于本体的建模可以增强语义清晰度和计算可用性。

英文摘要

Capturing dynamic malware behavior in a practical but still semantically precise manner remains a significant challenge in cyber threat intelligence. While standards such as MAEC and STIX provide widely adopted vocabularies for describing malware artifacts and observations, they represent data with considerable complexity in structures that often obscure important ontological distinctions. In particular, they tend to conflate enduring malware artifacts with the events generated during execution, thereby flattening distinctions that are central in foundational standards for ontology design. In this paper, we conduct a foundational ontological analysis of core MAEC and STIX constructs relevant to dynamic malware analysis relying on Unified Foundational Ontology (UFO) as a theoretical lens. Our analysis reveals some ontological mismatches arising from the conflation of artifacts, dispositions, and runtime events in MAEC and STIX that complicate coherent representation of dynamic malware behavior and, from a practical perspective, limit the ability to reason about execution traces. Based on these insights, we propose MAECO-Lite, a lightweight ontology designed to represent data and operationalize their processing for dynamic malware analysis. The ontology adopts a modular structure centered on samples, processes, actions, system artifacts, and MITRE ATT&CK Techniques, while maintaining a clear separation between enduring entities and runtime events. An initial evaluation using description logic concept learning algorithms shows that the simplified ontology significantly improves learning performance, demonstrating that ontologically grounded modelling can enhance both semantic clarity and computational usability.

2605.31171 2026-06-01 cs.IR cs.AI

MIMO: Multilingual Information Retrieval via Monolingual Objectives

MIMO: 通过单语目标实现多语言信息检索

Youngjoon Jang, Seongtae Hong, Heuiseok Lim

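AI summary: Proposes MIMO, a two-stage framework that uses a teacher model's stable English semantic space and jointly optimizes knowledge distillation and cross-lingual contrastive learning to address language clustering and the trade-off between embedding alignment and uniformity in multilingual information retrieval.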

AI中文摘要

多语言信息检索（MLIR）反映了真实的搜索环境，其中查询和相关文档可能以不同语言出现在混合语言语料库中。然而，现有的嵌入模型主要针对多单语检索进行优化，在MLIR设置中其性能通常会下降。此外，直接将传统对比学习应用于MLIR会加剧语言聚类，并暴露跨语言对齐与嵌入均匀性之间的权衡。为了解决这些局限性，我们提出了MIMO：通过单语目标实现多语言信息检索，这是一个两阶段框架，使用来自高性能教师模型的稳定英语语义空间作为锚点。MIMO首先通过知识蒸馏初始化学生模型的跨语言对齐，然后联合优化蒸馏和跨语言对比学习，以提高检索判别力同时保持对齐。大量实验表明，MIMO在各种MLIR和多单语基准测试中始终优于现有的跨语言训练基线。MIMO在与类似或更大参数规模的现成模型相比也保持竞争力。此外，我们的跨语言对齐-均匀性分析阐明了两个损失组件的不同作用，并表明它们的组合在对齐和均匀性之间产生了有利的权衡。

英文摘要

Multilingual Information Retrieval (MLIR) reflects real-world search environments in which queries and relevant documents may appear in different languages within a mixed-language corpus. However, existing embedding models are primarily optimized for Multi-Monolingual retrieval and their performance often degrades in MLIR settings. Moreover, directly applying conventional contrastive learning to MLIR can exacerbate language clustering and expose a trade-off between cross-lingual alignment and embedding uniformity. To address these limitations, we propose MIMO: Multilingual Information Retrieval via Monolingual Objectives, a two-stage framework that uses a stable English semantic space from a high-performing teacher model as an anchor. MIMO first initializes the student model's cross-lingual alignment through knowledge distillation, and then jointly optimizes distillation and cross-lingual contrastive learning to improve retrieval discrimination while preserving alignment. Extensive experiments show that MIMO consistently outperforms existing cross-lingual training baselines across various MLIR and Multi-Monolingual benchmarks. MIMO also remains competitive with off-the-shelf models of similar or larger parameter scales. Furthermore, our cross-lingual Alignment-Uniformity analysis clarifies the distinct roles of the two loss components and shows that their combination yields a favorable trade-off between alignment and uniformity.
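
The alignment and uniformity trade-off mentioned above can be made concrete with the metrics commonly used for contrastive embeddings. The sketch below assumes the definitions popularized by Wang and Isola (mean distance between positive pairs, and the log of the mean Gaussian potential over all pairs); the abstract does not state which definitions the paper uses, and the function names are hypothetical.

```python
import math
from itertools import combinations


def normalize(vec):
    """Scale a vector to unit L2 norm, as is usual before computing the metrics."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(pairs, alpha=2):
    """Mean distance^alpha between positive pairs, e.g. a query and its translation.

    Lower is better: 0 means every pair maps to the same point.
    """
    return sum(sq_dist(a, b) ** (alpha / 2) for a, b in pairs) / len(pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs of embeddings.

    Lower is better: 0 means all points coincide, and more negative values
    mean the points are spread more evenly over the unit sphere.
    """
    potentials = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(embeddings, 2)]
    return math.log(sum(potentials) / len(potentials))
```

Under these definitions, language clustering shows up as poor alignment between translated pairs even when uniformity looks good, because each language occupies its own well-spread region.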

2605.31163 2026-06-01 stat.ML cs.LG

Memory by Design: Probabilistic Sequence Layers

记忆设计：概率序列层

Matthew Dowling, Hyungju Jeon, Cristina Savin, Il Memming Park

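AI summary: Proposes the design-model framework, which derives efficient recurrent sequence maps through exact Bayesian filtering. In the linear-Gaussian instantiation, the Bayesian Layer propagates a mean and a covariance to track uncertainty. The framework unifies several sub-quadratic recurrences and improves robustness and long-context retrieval.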

Comments Preprint, in submission

AI中文摘要

我们引入了设计-模型框架：一种从关于记忆的显式假设中推导高效循环序列映射的方法。设计模型通过精确贝叶斯滤波将证据写入记忆；查询相关的读出产生一个预测分布，其均值即为层输出。在我们的线性高斯实例中，贝叶斯层同时传播均值和协方差：协方差跟踪存储关联的不确定性，引导写入朝向不确定方向，随着证据积累而衰减增益，并保留自信的记忆。同一框架统一了几种次二次递归。线性注意力、GLA和Mamba-2/SSD在一个设计模型下是精确滤波器，而DeltaNet及相关Delta-rule模型在另一个设计模型下作为协方差重置约简出现。恢复协方差为检索动力学提供了闭式预测，并经实验验证，在受控碰撞研究、学习关联回忆和Zoology MQAR基准上，改善了训练范围外的鲁棒性；将贝叶斯层蒸馏到预训练的340M Gated DeltaNet中，在匹配计算下提升了RULER长上下文检索性能。

英文摘要

We introduce the design-model framework: a way to derive efficient recurrent sequence maps from explicit assumptions about memory. A design model writes evidence into memory by exact Bayesian filtering; a query-dependent readout produces a predictive distribution whose mean is the layer output. In our linear-Gaussian instantiation, the Bayesian Layer propagates both a mean and a covariance: the covariance tracks uncertainty over stored associations, steering writes toward uncertain directions, attenuating gains as evidence accumulates, and preserving confident memories. The same framework unifies several sub-quadratic recurrences. Linear attention, GLA, and Mamba-2/SSD are exact filters under one design model, whereas DeltaNet and related Delta-rule models arise as covariance-reset reductions under another. Restoring the covariance yields closed-form predictions for retrieval dynamics, verified empirically, and improves robustness beyond the training regime across controlled collision studies, learned associative recall, and the Zoology MQAR benchmark; distilling Bayesian Layers into a pretrained 340M Gated DeltaNet improves RULER long-context retrieval at matched compute.
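
The role of the covariance can be illustrated with a textbook Kalman-style write into a linear associative memory. This is a toy sketch, not the paper's layer: it assumes a scalar value per key, a weight vector with Gaussian belief (`mean`, `cov`), and observation noise `noise`; all names are hypothetical.

```python
def bayesian_write(mean, cov, key, value, noise=1e-2):
    """One exact Gaussian update for the observation value = key . w + noise."""
    dim = len(key)
    # cov @ key: directions in which the memory is still uncertain.
    cov_key = [sum(cov[i][j] * key[j] for j in range(dim)) for i in range(dim)]
    innovation_var = sum(key[i] * cov_key[i] for i in range(dim)) + noise
    gain = [c / innovation_var for c in cov_key]
    error = value - sum(m * k for m, k in zip(mean, key))
    new_mean = [m + g * error for m, g in zip(mean, gain)]
    # Covariance shrinks along the written direction (cov is symmetric).
    new_cov = [[cov[i][j] - gain[i] * cov_key[j] for j in range(dim)]
               for i in range(dim)]
    return new_mean, new_cov


def read(mean, query):
    """Predictive mean for a query key."""
    return sum(m * q for m, q in zip(mean, query))


def identity(dim):
    """Identity matrix, used here to model a covariance reset."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def reset_write(mean, key, value):
    """Write with the covariance reset to identity and zero noise.

    In this toy model the update reduces to a normalized delta rule:
    mean += key * (value - key . mean) / (key . key).
    """
    new_mean, _ = bayesian_write(mean, identity(len(key)), key, value, noise=0.0)
    return new_mean
```

Repeated writes to the same key shrink the covariance along that key, so later writes move the mean less, while directions never written keep their full uncertainty. Discarding the covariance, as `reset_write` does, removes that memory of confidence; how this maps onto the paper's exact reduction is not specified in the abstract.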

2605.31152 2026-06-01 stat.ML cs.LG cs.NA math.NA

Approximation and learning of anisotropic and mixed smooth functions by deep ReLU neural networks

深度ReLU神经网络对各向异性和混合光滑函数的逼近与学习

Yunfei Yang, Jun Fan

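AI summary: Studies the approximation rates of deep ReLU neural networks for anisotropic and mixed smooth function classes, and proves that near-optimal approximation rates are achievable under a condition on the mean smoothness.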

AI中文摘要

本文研究深度ReLU神经网络逼近和学习光滑函数的效率。当误差以$L^p([0,1]^d)$范数度量且逼近器为宽度$W$、深度$L$的网络时，近期工作已证明在Sobolev嵌入条件$s/d>1/q-1/p$下，对于Besov空间$\mathcal{B}^s_{q,r}([0,1]^d)$有超逼近率$\mathcal{O}((WL)^{-2s/d})$。为克服该速率中的维数灾难，我们将此结果推广到各向异性和混合光滑函数类。对于各向异性光滑度$\boldsymbol{s}=(s_1,\dots,s_d)$的各向异性Besov空间$\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$，在嵌入条件$\tilde{s} > 1/q-1/p$下建立逼近率$\mathcal{O}((WL)^{-2\tilde{s}})$，其中平均光滑度$\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$。对于混合光滑度$s>1/q-1/p$的混合光滑Besov空间$\mathcal{MB}^s_{q,r}([0,1]^d)$，我们证明逼近率$\mathcal{O}((WL)^{-2s})$（忽略对数因子）。利用这些结果，我们还推导了各向异性Besov函数复合的逼近界。作为应用，表明深度ReLU神经网络可在广泛光滑函数类上达到极小极大最优速率（忽略对数因子）。

英文摘要

This paper studies how efficiently deep ReLU neural networks can approximate and learn smooth functions. When the error is measured in the $L^p([0,1]^d)$ norm and the approximator is a network with width $W$ and depth $L$, recent works have proven the super approximation rate $\mathcal{O}((WL)^{-2s/d})$ for the Besov space $\mathcal{B}^s_{q,r}([0,1]^d)$ under the Sobolev embedding condition $s/d>1/q-1/p$. In order to overcome the curse of dimensionality in this rate, we extend this result to anisotropic and mixed smooth function classes. We establish the approximation rate $\mathcal{O}((WL)^{-2\tilde{s}})$ for the anisotropic Besov space $\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$ with anisotropic smoothness $\boldsymbol{s}=(s_1,\dots,s_d)$ under the embedding condition $\tilde{s} > 1/q-1/p$, where the mean smoothness is $\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$. For the mixed smooth Besov space $\mathcal{MB}^s_{q,r}([0,1]^d)$ with mixed smoothness $s>1/q-1/p$, we show that the approximation rate $\mathcal{O}((WL)^{-2s})$ holds up to logarithmic factors. Using these results, we also derive approximation bounds for the composition of anisotropic Besov functions. As an application, it is shown that deep ReLU neural networks can achieve minimax optimal rates up to logarithmic factors for a wide range of smooth function classes.
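
A short worked example shows how the mean smoothness $\tilde{s}$ relates to the isotropic rate. The numbers are illustrative choices, not taken from the paper, and the comparison with the minimum smoothness is an informal reading that ignores the embedding conditions.

```latex
% Isotropic case s_1 = \dots = s_d = s recovers the known exponent:
\tilde{s} = \Big(\sum_{i=1}^{d} \frac{1}{s}\Big)^{-1} = \frac{s}{d}
\quad\Longrightarrow\quad
\mathcal{O}\big((WL)^{-2\tilde{s}}\big) = \mathcal{O}\big((WL)^{-2s/d}\big).

% Illustrative anisotropic case, d = 3 and \boldsymbol{s} = (1, 8, 8):
\tilde{s} = \Big(1 + \tfrac{1}{8} + \tfrac{1}{8}\Big)^{-1} = \frac{4}{5}
\quad\Longrightarrow\quad
\mathcal{O}\big((WL)^{-8/5}\big).

% Using only the smallest smoothness s_{\min} = 1 isotropically would give
\frac{s_{\min}}{d} = \frac{1}{3}
\quad\Longrightarrow\quad
\mathcal{O}\big((WL)^{-2/3}\big),
\qquad\text{and in general}\quad \tilde{s} \ge \frac{s_{\min}}{d}.
```

The inequality follows from $\sum_i s_i^{-1} \le d / s_{\min}$, so extra smoothness in some directions always helps. In the mixed smooth case the exponent $2s$ does not involve $d$ at all.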

2605.31149 2026-06-01 cs.HC cs.AI

Developing a UXR Point of View for Cognitive Accessibility in Mobile Learning with Generative AI

利用生成式AI在移动学习中开发认知无障碍的UXR视角

Fatima Ahmad Muazu, Festus Adedoyin, Huseyin Dogan, Abiodun Adedeji, Melike Akca, Olumuyiwa Ayorinde

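AI summary: By combining UX research principles with LLM-supported analysis, the study proposes a Cognitive Accessibility UXR Playbook to improve the quality of requirements for mobile learning systems designed for learners with cognitive disabilities.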

AI中文摘要

本研究探讨如何利用UX研究（UXR）原则，结合大语言模型（LLM）支持的分析，提高为认知障碍学习者设计的移动学习系统的需求质量。以UXR视角（PoV）金字塔为方法论框架，研究分为四个阶段：心理、行为和设计层的基础结构；使用DeLone和McLean信息系统成功模型及质量功能展开（QFD）进行结构化验证；通过开发九张认知无障碍UXR游戏卡进行洞察整合；以及支持跨学科沟通的利益相关者特定PoV表述。在人工监督下，整合LLM支持的合成以协助主题聚类、需求细化和假设制定。研究结果表明，移动学习中的许多可用性和参与度挑战源于模糊或未充分定义的需求，而不仅仅是界面设计。通过将认知无障碍原则嵌入可测量且技术可追溯的需求中，所提出的认知无障碍UXR剧本为协调理论、系统架构和利益相关者策略提供了结构化路径。

英文摘要

This study investigates how UX research (UXR) principles, combined with Large Language Model (LLM)-supported analysis, can be used to improve the quality of requirements for mobile learning systems designed for learners with cognitive disabilities. Using the UXR Point-of-View (PoV) pyramid as a methodological framework, the study progressed through four stages: foundational structuring of psychological, behavioral, and design layers; structured validation using the DeLone and McLean Information Systems Success Model and Quality Function Deployment (QFD); insight consolidation through the development of nine Cognitive Accessibility UXR Play Cards; and stakeholder-specific PoV articulation to support interdisciplinary communication. LLM-supported synthesis was integrated to assist in theme clustering, requirement refinement, and hypothesis formulation under human oversight. Findings suggest that many usability and engagement challenges in mobile learning originate from ambiguous or under-specified requirements rather than interface design alone. By embedding cognitive accessibility principles into measurable and technically traceable requirements, the proposed Cognitive Accessibility UXR Playbook provides a structured pathway for aligning theory, system architecture, and stakeholder strategy.

2605.31147 2026-06-01 cs.HC cs.AI

Developing a Culturally Grounded, AI-Augmented UX Research Point of View (POV): An Exemplar Case Study from Telemedicine Dementia Care

开发一个文化根基的、AI增强的用户体验研究观点（POV）：来自远程医疗痴呆症护理的示例案例研究

Abiodun Adedeji, Huseyin Dogan, Festus Adedoyin, Michelle Heward, Melike Akca, Emmanuel Oluwatosin Oluokun, Fatima Ahmad Muhazu, Olumuyiwa Ayorinde

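AI summary: Through a telemedicine dementia care case, the paper shows how mixed-methods research, hypothesis generation, and ontology-based modelling can be combined, with generative AI integrated as a collaborative tool, to build a culturally sensitive, defensible UXR Point of View (POV).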

AI中文摘要

用户体验研究（UXR）观点（POV）将复杂且通常碎片化的研究证据提炼为可操作的视角，指导团队理解用户需求、构建设计决策并协调利益相关者。尽管POV在行业实践中被广泛使用，但公开记录POV构建过程的例子很少，特别是在文化敏感和资源匮乏的背景下。本文展示了一个示例案例研究，展示了如何开发一个文化根基的、AI增强的UXR POV，以指导TeleDeCa——一个面向尼日利亚家庭护理人员的远程医疗痴呆症护理框架。基于UXR POV Playbook和金字塔框架，我们说明了如何将混合方法研究、假设生成和基于本体的建模结合起来，形成一个可辩护的POV，而无需完全最终化的系统或验证结果。生成式AI（GenAI）作为有限的研究合作者被整合到UXR POV框架中，支持综合、假设探索和叙事构建，同时保留人类判断、伦理责任和文化敏感性。本文的贡献在于提取了可重用的Play Cards和一个Play，扩展了UXR POV Playbook，并为CHI 2026关于开发AI驱动的UXR POV的工作坊提供了示例材料。

英文摘要

User Experience Research (UXR) Points of View (POVs) distil complex and often fragmented research evidence into actionable perspectives that guide how teams interpret user needs, frame design decisions, and align stakeholders. Although POVs are widely used in industry practice, there are few published examples that explicitly document how POVs are constructed, particularly in culturally sensitive and low-resource contexts. This paper presents an exemplar case study demonstrating how a culturally grounded, AI-augmented UXR POV was developed to inform TeleDeCa, a telemedicine dementia care framework for family caregivers in Nigeria. Building on the UXR POV Playbook and pyramid framework, we illustrate how mixed-methods research, hypothesis generation, and ontology-based modelling can be combined to form a defensible POV without requiring a fully finalised system or validated outcomes. Generative AI (GenAI) is integrated across the UXR POV framework as a bounded research collaborator, supporting synthesis, hypothesis exploration, and narrative construction while preserving human judgment, ethical accountability, and cultural sensitivity. The contribution of this paper lies in the extraction of reusable Play Cards and a Play that extend the UXR POV Playbook and serve as exemplar material for the CHI 2026 workshop on developing AI-powered UXR POVs.

2605.31146 2026-06-01 cs.HC cs.AI

From Evidence to Design: Developing an AI-Augmented UX Research Point of View for Digital Wellbeing in Emergency and Public Safety Contexts

从证据到设计：开发面向紧急与公共安全情境下数字福祉的AI增强用户体验研究视角

Olumuyiwa Ayorinde, Huseyin Dogan, Festus Adedoyin, Nan Jiang, Emmanuel Oluokun, Abiodun Adedeji, Melike Akca

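AI summary: Combines UX research methods with AI-supported analysis to develop design direction for digital wellbeing interventions targeting emergency and public safety personnel. A literature analysis identifies recurring patterns and integrates Behaviour Change Techniques and persuasive design principles, producing a UXR PoV Pyramid, nine UXR Play Cards, and stakeholder narratives.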

AI中文摘要

本文研究如何将用户体验研究方法与AI支持分析相结合，为针对紧急与公共安全人员的数字福祉干预措施开发更清晰的设计方向。EPSP在高压、轮班制环境中工作，认知疲劳和不可预测的日程降低了他们对传统福祉工具的参与度。本研究使用UXR观点框架，应用AI支持的文献分析过程来识别反复出现的心理、行为和设计模式。在整个解释过程中整合了行为改变技术和说服性设计原则，以连接证据与实际设计推理。该过程产生了UXR PoV金字塔、九张UXR游戏卡和以利益相关者为中心的PoV叙事。研究结果表明，有效的EPSP福祉系统必须最小化认知努力、适应操作环境并优先考虑心理安全。这项工作展示了AI如何协助大规模证据解释，而人类研究人员则保持对情境判断和设计方向的责任。

英文摘要

This paper investigates how User Experience Research (UXR) methods can be combined with AI-supported analysis to develop clearer design direction for digital wellbeing interventions targeting Emergency and Public Safety Personnel (EPSP). EPSP work in high-stress, shift-based environments where cognitive fatigue and unpredictable schedules reduce engagement with conventional wellbeing tools. Using the UXR Point-of-View (PoV) framework, this study applied an AI-supported literature analysis process to identify recurring psychological, behavioural, and design patterns. Behaviour Change Techniques and Persuasive Technology principles were integrated throughout interpretation to connect evidence with practical design reasoning. The process resulted in a UXR PoV Pyramid, nine UXR Play Cards, and stakeholder-focused PoV narratives. Findings show that effective wellbeing systems for EPSP must minimise cognitive effort, adapt to operational context, and prioritise psychological safety. The work demonstrates how AI can assist large-scale evidence interpretation while human researchers maintain responsibility for contextual judgement and design direction.

2605.31143 2026-06-01 cs.HC cs.AI

Extending the UXR Point of View Pyramid: A Generative AI-Augmented Methodology for Human-Centred AI Systems

扩展UXR观点金字塔：一种面向人本AI系统的生成式AI增强方法论

Festus Fatai Adedoyin, Huseyin Dogan, Melike Akca, Abiodun Adedeji

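AI summary: For AI financial systems in UK debt management, the paper extends the UXR PoV pyramid into a generative-AI-augmented methodology comprising an AI-Augmented PoV Pyramid, a structured prompt architecture, and an AI-enabled Playbook Card system, aiming to improve interpretability, fairness, and accountability.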

AI中文摘要

英国家庭债务和生活成本压力的上升，加剧了AI驱动的金融技术在信贷评估、还款结构和债务支持服务中的作用。这些系统日益影响重大的财务决策，但它们在复杂的社会技术环境中运作，受到监管限制、算法不透明性和高度脆弱性风险的影响。用户体验研究（UXR）观点（PoVs）对于将异质性研究证据转化为产品和治理决策的战略方向至关重要。然而，现有的UXR PoV框架并非为AI中介的金融系统设计，而在此类系统中，可解释性、公平性和问责性至关重要。本文扩展了UXR PoV金字塔，形成了一种面向英国金融服务背景下以人为中心的AI债务管理技术的AI增强方法论框架。我们形式化了（1）AI增强的PoV金字塔，（2）用于综合和假设生成的结构化提示架构，以及（3）AI驱动的Playbook卡片系统，该系统将生成式AI嵌入UXR工作流程，同时保持可追溯性和伦理监督。生成式AI并非作为分析权威，而是作为受人类验证和监管意识约束的认识论支持机制。通过将该框架应用于债务管理技术（包括可负担性评估、还款计划和财务压力预测系统），本研究推进了高风险金融AI环境下的UXR方法论，并为CHI社区内负责任、AI驱动的UXR实践的发展做出了贡献。

英文摘要

Rising household debt and cost-of-living pressures in the United Kingdom have intensified the role of AI-driven financial technologies in mediating credit assessment, repayment structuring, and debt support services. These systems increasingly shape consequential financial decisions, yet they operate within complex socio-technical environments characterised by regulatory constraint, algorithmic opacity, and heightened vulnerability risk. User Experience Research (UXR) Points of View (PoVs) are critical in translating heterogeneous research evidence into strategic direction for product and governance decisions. However, the existing UXR PoV framework was not designed for AI-mediated financial systems where interpretability, fairness, and accountability are central. This paper extends the UXR PoV pyramid into an AI-augmented methodological framework for Human-Centred AI debt management technologies in the UK financial services context. We formalise (1) an AI-Augmented PoV Pyramid, (2) a structured prompt architecture for synthesis and hypothesis generation, and (3) an AI-enabled Playbook Card system that embeds Generative AI into UXR workflows while preserving traceability and ethical oversight. Generative AI is positioned not as an analytic authority, but as an epistemic support mechanism subject to human validation and regulatory awareness. By grounding the framework in debt management technologies, including affordability assessment, repayment planning, and financial stress prediction systems, this work advances UXR methodology for high-stakes financial AI environments and contributes to the evolution of responsible, AI-powered UXR practice within the CHI community.

2605.31140 2026-06-01 cs.CR cs.CL

EvoDefense: Co-Evolving Black-Box Defense with Large Language Models

EvoDefense：与大语言模型共同进化的黑盒防御

Yu Li, Yuenan Hou, Yingmei Wei, Yanming Guo, Chaochao Lu

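AI summary: Proposes EvoDefense, an experience-guided co-evolving black-box defense paradigm in which a guard LLM and an experience memory module refine strategies over attack-defense iterations, generalizing to unseen attacks and target models.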

AI中文摘要

大型语言模型（LLM）仍然极易受到各种攻击，特别是在黑盒设置中，目标模型的内部结构不可访问。现有的黑盒防御通常依赖于预定义的过滤启发式方法，这些方法往往无法泛化到未见过的攻击类型和目标模型架构。我们引入了EvoDefense，一种经验引导的共同进化黑盒防御范式。EvoDefense使用一个守卫LLM来检测恶意查询，并使用一个经验记忆模块来积累先前交互中的防御知识。EvoDefense的核心是一个连续的攻防进化循环，其中攻击生成器和守卫模型通过经验引导的优化迭代地改进其攻击策略和防御策略。这种设计使得EvoDefense能够在无需重新训练的情况下泛化到未见过的攻击和目标模型。在HarmBench、AdvBench和AlpacaEval上的实验表明，EvoDefense在七个流行模型和五种代表性LLM攻击上实现了一致且强大的防御性能，同时保持了有竞争力的通用能力。在HarmBench上，EvoDefense将AutoDAN-turbo对Gemini-3-flash和LLaMA-3-8B-Instruct的攻击成功率（ASR）分别从29.4%和43.4%降低到8.4%和6.2%。

英文摘要

Large Language Models (LLMs) remain highly vulnerable to diverse attacks, particularly in black-box settings where the internals of target models are inaccessible. Existing black-box defenses typically rely on pre-defined filtering heuristics, which often fail to generalize to unseen attack types and target model architectures. We introduce EvoDefense, an experience-guided co-evolving black-box defense paradigm. EvoDefense employs a guard LLM to detect malicious queries and an experience memory module to accumulate defense knowledge from previous interactions. At the core of EvoDefense is a continuous attack-defense evolution loop, where an attack generator and the guard model iteratively refine their attack strategies and defense policies through experience-guided optimization. This design enables EvoDefense to generalize across unseen attacks and target models without retraining. Experiments on HarmBench, AdvBench, and AlpacaEval show that EvoDefense achieves consistently strong defense performance across seven popular models and five representative LLM attacks, while preserving competitive general capabilities. On HarmBench, EvoDefense reduces the attack success rate (ASR) of AutoDAN-turbo on Gemini-3-flash and LLaMA-3-8B-Instruct from 29.4% and 43.4% to 8.4% and 6.2%, respectively.

2605.31138 2026-06-01 cs.HC cs.AI

Developing an AI-Powered UX Research Point of View for Digital Health in A Regulatory Context: An Exemplar Case from MSM and Transgender HIV Care in Nigeria

在监管背景下开发AI驱动的用户体验研究视角：以尼日利亚MSM和跨性别者HIV护理为例

Emmanuel Oluwatosin Oluokun, Festus Fatai Adedoyin, Huseyin Dogan, Nan Jiang, Melike Akca, Abiodun Adedeji, Olumuyiwa Ayorinde, Fatima Ahmad Muazu

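AI summary: Proposes a generative-AI-augmented UXR methodology that uses a four-stage UXR process and ten theory-informed UXR Play Cards to guide the design of digital health interventions for men who have sex with men (MSM) and transgender individuals in HIV care in Nigeria. The core contribution is a replicable, stigma-aware, and privacy-centred framework for responsible GenAI use.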

AI中文摘要

在法律和监管背景下的用户体验研究（UXR）面临独特挑战，需要专门的方法来保护弱势群体，同时产生可操作的见解。数字咨询、预约和药物配送平台在扩展护理可及性方面显示出前景；然而，它们的实际有效性因缺乏充分考虑到这些人群心理社会状况的、基于理论的用户体验研究方法论而受到限制。本文介绍了一种生成式AI增强的UXR方法论，基于UXR视角（PoV）剧本，指导为尼日利亚感染HIV/AIDS的男男性行为者（MSM）和跨性别者设计心理安全、低认知负荷的数字健康干预措施。基于涉及协同设计工作坊、主题分析和需求工程的实证研究，该方法论通过一个四阶段UXR过程实现，包括AI支持的假设生成、基础规划、通过构建模块生成洞察以及构建利益相关者特定的PoV叙述。该过程产生了十张理论驱动的UXR游戏卡，将心理机制和实证发现转化为可操作的设计指导。每张游戏卡包含可操作的任务、AI增强的方法和针对边缘化人群研究的伦理护栏。输出是一套十张理论驱动的UXR游戏卡，将心理洞察和实证证据转化为可操作的设计指导。核心贡献是一个可复制的、关注污名和隐私的框架，用于在UXR实践中负责任地使用GenAI，推进边缘化社区的人本数字健康设计。

英文摘要

User Experience Research (UXR) in legal and regulatory contexts presents unique challenges that require specialised approaches to protect vulnerable populations whilst generating actionable insights. Digital consultation, appointment booking, and medication delivery platforms show promise for extending care access; however, their real-world effectiveness is curtailed by an absence of theoretically grounded UXR methodologies that adequately account for the psychosocial conditions of these populations. This paper introduces a Generative AI-augmented UXR methodology, grounded in the UXR Point of View (PoV) Playbook, to guide the design of psychologically safe, low-cognitive-load digital health interventions for MSM and transgender individuals living with HIV/AIDS in Nigeria. Drawing from empirical research involving co-design workshops, thematic analysis, and requirements engineering, the methodology is operationalised through a four-stage UXR process encompassing AI-supported hypothesis generation, foundational planning, insight generation via Building Blocks, and the construction of stakeholder-specific PoV narratives. This process results in ten theory-informed UXR Play Cards that translate psychological mechanisms and empirical findings into actionable design guidance. Each play contains actionable tasks, AI-augmented approaches, and ethical guardrails tailored for research with marginalised populations. The output is a set of ten theory-informed UXR Play Cards translating psychological insight and empirical evidence into actionable design guidance. The core contribution is a replicable, stigma-aware, and privacy-centred framework for responsible GenAI use in UXR practice, advancing human-centred digital health design for marginalised communities.

2605.31131 2026-06-01 cs.HC cs.AI

UXR PoV for Neuroinclusive Emotion Regulation

神经包容性情绪调节的用户体验研究观点

Melike Akca, Mona Giff, Deniz Cetinkaya, Huseyin Dogan, Stephen Giff

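AI summary: Proposes a generative-AI-augmented UXR methodology that combines the DBT, SDT, and COM-B theoretical frameworks and, through a four-stage process, produces ten UXR Play Cards for designing neuroinclusive digital emotion regulation interventions for adults with ADHD.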

AI中文摘要

注意缺陷/多动障碍（ADHD）是一种精神疾病，表现为个体在注意力不集中、多动和冲动方面的发展不适当模式，并在决策和情绪调节（ER）方面存在困难。尽管基于数字和人工智能的干预措施扩大了情绪调节支持的获取途径，但许多现有系统仍受限于理论整合薄弱、对神经多样性的适应不足以及缺乏将心理学洞察与设计实践相结合的结构化用户体验研究（UXR）方法。本文介绍了一种生成式AI增强的UXR方法，以UXR观点（PoV）剧本为基础，支持为ADHD成人设计具有情感智能和神经包容性的数字情绪调节干预。该方法将实证证据与既定心理学框架——辩证行为疗法（DBT）、自我决定理论（SDT）和COM-B行为模型相结合，并利用生成式AI作为协同分析工具，支持综合、假设形成和设计阐述。该方法通过四阶段UXR流程实施，包括AI支持的假设生成、基础规划、通过构建模块生成洞察以及构建利益相关者特定的PoV叙事。该流程产生了一套十张理论驱动的UXR游戏卡，将心理机制和实证发现转化为可操作的设计指导。本研究的主要贡献是一个可复制的、具有偏差意识的框架，用于将生成式AI整合到UXR实践中，推进数字心理健康设计中以人为本和神经包容性的方法。

英文摘要

Attention-deficit/hyperactivity disorder (ADHD) is a psychiatric disorder which presents itself in individuals through patterns of developmentally inappropriate levels of inattentiveness, hyperactivity, and impulsivity, with difficulties in decision making and emotional regulation (ER). Although digital and AI-based interventions have expanded access to ER support, many existing systems remain limited by weak theoretical integration, insufficient accommodation of neurodiversity, and a lack of structured user experience research (UXR) methodologies that bridge psychological insight with design practice. This paper introduces a Generative AI-augmented UXR methodology, grounded in the UXR Point of View (PoV) Playbook, to support the design of emotionally intelligent and Neuroinclusive digital ER interventions for adults with ADHD. The approach integrates empirical evidence with established psychological frameworks (Dialectical Behaviour Therapy (DBT), Self-Determination Theory (SDT), and the COM-B behavioural model) and leverages Generative AI as a co-analytic tool to support synthesis, hypothesis formation, and design articulation. The methodology is operationalized through a four-stage UXR process encompassing AI-supported hypothesis generation, foundational planning, insight generation via Building Blocks, and the construction of stakeholder-specific PoV narratives. This process results in a set of ten theory-informed UXR Play Cards that translate psychological mechanisms and empirical findings into actionable design guidance. The primary contribution of this work is a replicable, bias-aware framework for integrating Generative AI into UXR practice, advancing human-centred and Neuroinclusive approaches to digital mental health design.

2605.31120 2026-06-01 cs.GR cs.AI cs.LG

SWIM: Single-Instance Whole-Body Imitation for swiMming

SWIM: 用于游泳的单实例全身模仿

Binglun Wang, Edmond S. L. Ho, He Wang

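AI summary: Proposes SWIM, a physically based method for synthesizing swimming motions that uses single-instance imitation learning to achieve full-body coordination and continuous interaction with fluids, outperforming existing methods in data efficiency, stability, robustness, and generalization.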

AI中文摘要

我们提出了一种合成基于物理的游泳动作的新方法。基于物理的角色动画旨在生成物理有效、可控且自然的动作，能够应对意外干扰，其中难度的一个决定性因素是任务的复杂性，尤其是与所需环境交互的复杂程度。现有研究已在静态和动态环境中的各种任务上取得成功。我们进一步将难度推向游泳，这需要全身协调和与流体的持续交互，这是与环境交互时的一个新复杂性层次。这种复杂性在学习控制时面临挑战，包括在易变的环境力下的控制学习、将控制泛化到不同环境和游泳风格、缺乏数据参考，以及在控制学习过程中不可避免的极其缓慢的物理模拟。为此，我们提出了SWIM，一种新的游泳动作模仿方法，它可以从单个游泳动作中学习，并泛化到未见过的环境、身体条件和游泳风格。广泛的评估和比较表明，SWIM具有数据效率高、稳定、鲁棒和可泛化的特点，在多个任务类别和指标上优于替代方法。

英文摘要

We propose a new method for synthesizing physically-based swimming motions. Physically-based character animation aims to generate physically valid, controllable, and natural-looking motions which can respond to unexpected disturbances, where one dictating factor of difficulty is the complexity of the task, especially the level of sophistication of the required interactions with the environment. Existing research has succeeded in various tasks in static and dynamic environments. We push the difficulty further to swimming, which requires full-body coordination and continuous interactions with fluids, a new level of complexity when it comes to interacting with the environment. This complexity imposes challenges in learning control under volatile environmental forces, generalizing control to different environments and swimming styles, lack of data references, and prohibitively slow physical simulation which is inevitable during control learning. To this end, we propose SWIM, a new imitation method for swimming motions, which can learn from a single swimming motion and generalize to unseen environments, body conditions, and swimming styles. Extensive evaluation and comparison demonstrate that SWIM is data-efficient, stable, robust, and generalizable, outperforming alternative methods across multiple classes of tasks and metrics.

2605.31097 2026-06-01 cs.DB cs.AI

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

SpecDB: 通过面向特征的分解生成LLM定制的数据库

Yunkai Lou, Longbin Lai, Shunyang Li, Zhengping Qian, Ying Zhang

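AI summary: Presents SpecDB, a system that uses large language models, feature-oriented decomposition, and the dependency graph DBGraph to generate customized relational databases automatically from natural-language workload descriptions; on TPC-C it performs comparably to PostgreSQL and MySQL at about 3% of their code size.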

AI中文摘要

主流关系数据库在部署时提供统一的特征集，尽管单个工作负载只使用可用子系统的一小部分。我们研究是否可以根据目标工作负载按需生成具有匹配特征集的数据库。我们提出SpecDB，一个使用大语言模型（LLM）合成定制化关系数据库的系统。我们调查了9个生产系统，并将其分解为10个功能模块，每个模块进一步划分为实现变体。为了捕获跨模块依赖关系，包括不相交子树中的实现必须协同设计的情况，我们采用FODA特征模型，并用合作边扩展它，得到依赖图DBGraph。SpecDB通过分层模块构建流水线来操作DBGraph，其中每个模块由专门的子代理（由三个内部代理驱动：主代理、测试代理、架构代理）生成、验证和集成，以及一个精炼代理，该代理根据用户提供的精炼工具（对现有数据库源代码具有只读访问权限）迭代修复和调整组装的数据库。配套的选择组件将自然语言工作负载描述转换为一组实现变体，提供从工作负载描述到可部署数据库的端到端流水线。我们在TPC-C上使用BenchmarkSQL评估SpecDB。生成的数据库（23,779行Rust代码）在1个和10个仓库下完成了60分钟的TPC-C测试，零错误。在10个仓库下，它达到tpmC=130，而PostgreSQL为128，MySQL为127，延迟相当，代码量约为它们的3%。由于代理在模块规范级别而非产品源代码级别操作，它原则上可以跨系统边界组合技术。随着LLM成本的下降，为目标工作负载生成专用数据库正变得简单。

英文摘要

Mainstream relational databases ship a uniform feature set across deployments, although individual workloads exercise only a fraction of the available subsystems. We investigate whether a database can instead be generated on demand with a feature set matched to the target workload. We present SpecDB, a system that uses large language models (LLMs) to synthesize customized relational databases. We survey 9 production systems and decompose them into 10 functional modules, each further divided into implementation variants. To capture cross-module dependencies, including cases where implementations in disjoint subtrees must be co-designed, we adopt the FODA feature model and extend it with a cooperate edge, yielding a dependency graph DBGraph. SpecDB operationalizes DBGraph through a layered module-construction pipeline in which each module is generated, validated, and integrated by a dedicated subagent (driven by three inner agents: Main, Tester, Architect), and a Refining Agent that iteratively repairs and tunes the assembled database against a user-supplied refining harness with read-only access to existing database source code. A companion selection component translates a natural-language workload description into a set of implementation variants, providing an end-to-end pipeline from workload description to deployable database. We evaluate SpecDB on TPC-C with BenchmarkSQL. The generated database (23,779 lines of Rust) completes 60-minute TPC-C at 1 and 10 warehouses with zero errors. At 10 warehouses it reaches tpmC=130, compared to 128 for PostgreSQL and 127 for MySQL, with comparable latency at ~3% of their code size. Because the agent operates at module-specification level rather than product source, it can in principle combine techniques across system boundaries. Paired with falling LLM costs, generating a purpose-built database for a target workload is becoming straightforward.

2605.31080 2026-06-01 cs.MM cs.AI cs.CL cs.CV cs.HC

A Pilot Study on Curator-Guided Multilingual Art Description for Blind and Low-Vision Audiences with Small Vision-Language Models

策展人引导的多语言艺术描述对盲人和低视力观众的小型视觉语言模型试点研究

Iosif Tsangko, Andreas Triantafyllopoulos, George Margetis, Ioana Crihana, Björn W. Schuller

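AI summary: Uses the small vision-language model Qwen2.5-VL-3B-Instruct with curator guidance to generate multilingual art descriptions in German, Romanian, and Serbian for blind and low-vision audiences, and finds that language-specific adapters give more stable controllability and visually grounded description quality for Romanian and Serbian, while the multilingual adapter stays competitive in German.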

Comments 7 pages, 2 figures, 3 tables. Preprint

AI中文摘要

盲人和低视力（BLV）观众在视觉艺术描述方面仍然服务不足，尤其是在跨语言和博物馆环境中，隐私和知识产权限制可能倾向于使用小型本地视觉语言模型（VLM）。本试点研究使用Qwen2.5-VL-3B-Instruct，针对德语、罗马尼亚语和塞尔维亚语，调查了策展人引导的多语言艺术描述。我们从艺术品图像和元数据构建了一个平行的BLV导向字幕语料库，并在固定骨干网络和训练预算下，比较了语言特定的LoRA适配器与单个多语言适配器。评估结合了自动词汇和基于嵌入的指标，以及针对小型罗马尼亚BLV试点研究校准的LLM作为评判协议。在我们的试点设置下，语言特定适配器在罗马尼亚语和塞尔维亚语上表现出更稳定的可控性和视觉基础描述质量，而多语言适配器在德语上仍具有竞争力。我们将这些发现视为小型本地VLM的部署导向证据，并强调在得出关于多语言可访问性的总体结论之前，需要进行更大规模的BLV用户研究和更广泛的语言覆盖。

英文摘要

Blind and low-vision (BLV) audiences remain underserved by visual art descriptions, particularly across languages and in museum settings where privacy and intellectual-property constraints may favour small on-premise vision-language models (VLMs). This pilot study investigates curator-guided multilingual art description with Qwen2.5-VL-3B-Instruct for German, Romanian, and Serbian. We construct a parallel BLV-oriented caption corpus from artwork images and metadata, and compare language-specific LoRA adapters with a single multilingual adapter under a fixed backbone and training budget. Evaluation combines automatic lexical and embedding-based metrics with an LLM-as-Judge protocol calibrated against a small Romanian BLV pilot study. Under our pilot setup, language-specific adapters show more stable controllability and visually grounded description quality for Romanian and Serbian, while multilingual adaptation remains competitive in German. We frame these findings as deployment-oriented evidence for small on-premise VLMs, and highlight the need for larger BLV user studies and broader language coverage before drawing general conclusions about multilingual accessibility.

2605.31065 2026-06-01 eess.SP cs.AI

DRIFT: Joint Channel Estimation and Prediction Towards Pilotless 6G Non-Terrestrial Networks

DRIFT：面向无导频6G非地面网络的联合信道估计与预测

Bruno De Filippo, Carla Amatetti, Alessandro Vanelli-Coralli

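AI summary: To address heavy pilot overhead and limited onboard compute in 6G LEO satellite networks, proposes DRIFT, a lightweight joint channel estimation and prediction framework that sends pilots only in the initial slot and handles subsequent slots with data-driven processing, achieving up to 12% spectral efficiency gain at low computational complexity.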

Comments Submitted for publication

AI中文摘要

非地面网络（NTN）有望通过实现无处不在的连接和大规模通信，在第六代（6G）系统中发挥关键作用。在此背景下，信道预测成为一项关键技术，通过限制导频开销来提高频谱利用效率。然而，许多基于人工智能（AI）的预测器具有高推理复杂度，给星载实现带来挑战。本文针对低地球轨道（LEO）NTN，在严格功率约束限制模型复杂度的情况下，设计了精确且计算高效的信道预测技术，以实现频谱效率增益。我们提出了一种面向6G NTN的迭代联合信道估计与预测框架，通过仅在初始时隙传输导频，并在后续时隙依赖数据驱动处理，显著降低了导频开销。我们引入了DRIFT（无线信道跟踪的数据驱动细化与迭代预测），这是一种轻量级架构，以低计算成本和减少的误差传播来细化数据辅助的信道估计并预测未来的信道频率响应。研究了基于卷积层和长短期记忆层的两种预测器变体。在上行链路LEO NTN场景的端到端仿真中，结果表明，与传统基于导频的系统相比，所提方法实现了高达12%的频谱效率增益，对训练-测试不匹配具有鲁棒性，并在不同信道模型下保持一致的性能。此外，DRIFT所需的乘加运算少于20万次，使其适用于严格功率约束下的星载实现。

英文摘要

Non-terrestrial networks (NTNs) are expected to play a pivotal role in sixth-generation (6G) systems by enabling ubiquitous connectivity and massive communication. In this context, channel prediction emerges as a key technique to improve the spectrum utilization efficiency by limiting the pilot overhead. However, many proposed predictors based on artificial intelligence (AI) are characterized by high inference complexity, posing challenges to onboard implementation. In this paper, we address the challenge of designing accurate yet computationally efficient channel prediction techniques tailored to low Earth orbit (LEO) NTNs, where strict power constraints limit model complexity, to enable spectral efficiency gains. We propose an iterative joint channel estimation and prediction framework in the context of 6G NTNs that significantly reduces pilot overhead by transmitting pilots only in the initial slot and relying on data-driven processing for subsequent slots. We introduce Data-driven Refinement and Iterative Forecast for wireless channel Tracking (DRIFT), a lightweight architecture that refines data-aided channel estimates and predicts future channel frequency responses with low computational cost and reduced error propagation. Two predictor variants based on convolutional and long short-term memory layers are investigated. Results from an end-to-end simulation of an uplink LEO NTN scenario show that the proposed approach achieves up to 12% spectral efficiency gain compared to conventional pilot-based systems, with robustness to training-test mismatches and consistent performance across different channel models. Moreover, DRIFT requires fewer than 200k multiply-accumulate operations, making it suitable for onboard satellite implementation under stringent power constraints.

2605.31064 2026-06-01 cs.IR cs.AI

Fighting Numerical Hallucinations via Data-centric Compilation for Online Financial QA

通过数据为中心的编译对抗在线金融问答中的数值幻觉

Hao Chen, Xing Tang, Qirui Liu, Weijie Shi, Shiwei Li, Fuyuan Lyu, Weihong Luo, Xiku Du, Xiuqiang He

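AI summary: Proposes the Data-centric Reasoning Compiler (DCRC), which combines adversarial data construction, multi-stage training, and a compile-and-execute inference process to address the noise sensitivity, calculation fragility, and auditability crisis that retrieval-augmented generation faces in online financial QA, enabling reliable numerical reasoning.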

Comments Accepted by KDD 2026 ADS track

AI中文摘要

大型语言模型（LLMs）显著推进了在线数据服务，特别是在金融问答（FinQA）领域。然而，此类系统仍然容易受到数值推理幻觉的影响，这在高风险金融应用中严重损害了可靠性。尽管检索增强生成（RAG）已被广泛采用以将响应基于外部知识，但它引入了三个持续挑战：噪声敏感性、计算脆弱性和可审计性危机。现有的以模型为中心的方法主要侧重于单独优化检索器或生成器，仍然难以以集成方式解决这些问题。在这项工作中，我们开创了一种以数据为中心的范式，并提出了一个新颖的框架——数据为中心推理编译器（DCRC）。该框架通过三个连贯的阶段运作：（1）对抗数据构建，合成带有受控噪声的训练示例以教授鲁棒性；（2）多阶段训练，培养一个能够进行显式证据审计和程序合成的数据为中心结构化代理（DSA）；（3）编译并执行推理过程，其中DSA将用户查询和检索到的文档转换为可验证、可执行的推理程序。这种数据驱动的框架通过设计确保了忠实的数值推理。我们在已建立的离线基准上进行了大量实验，并通过在实际在线金融问答系统中的部署进一步验证了我们的框架。

英文摘要

Large Language Models (LLMs) have significantly advanced online data services, particularly in the domain of financial question answering (FinQA). However, such systems remain susceptible to numerical reasoning hallucinations, which critically undermine reliability in high-stakes financial applications. Although retrieval-augmented generation (RAG) has been widely adopted to ground responses in external knowledge, it introduces three persistent challenges: noise sensitivity, calculation fragility, and an auditability crisis. Existing model-centric approaches, which primarily focus on optimizing either the retriever or generator in isolation, still struggle to address these issues in an integrated manner. In this work, we pioneer a data-centric paradigm and propose a novel framework, the Data-centric Reasoning Compiler (DCRC). The framework operates through three cohesive phases: (1) adversarial data construction, which synthesizes training examples with controlled noise to teach robustness; (2) multi-stage training that cultivates a Data-centric Structuring Agent (DSA) capable of explicit evidence auditing and program synthesis; and (3) a compile-and-execute inference process, where the DSA transforms user queries and retrieved documents into verifiable, executable reasoning programs. This data-driven framework ensures faithful numerical reasoning by design. We conduct extensive experiments on established offline benchmarks and further validate our framework through deployment in a real-world online financial QA system.
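
As a rough illustration of the compile-and-execute idea in phase (3), the sketch below runs a small reasoning program over audited evidence values, so that the arithmetic is done by an interpreter rather than by free-form generation. The program format, the operation names, and the figures are all hypothetical; the abstract does not describe the DSA's actual output language.

```python
from typing import Dict, List, Tuple, Union

Arg = Union[str, float]
Step = Tuple[str, Arg, Arg]  # (operation, left argument, right argument)

# Hypothetical operation set; not taken from the paper.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def execute(program: List[Step], evidence: Dict[str, float]) -> float:
    """Run the steps in order; '#i' refers to the result of step i."""
    results: List[float] = []

    def resolve(arg: Arg) -> float:
        if isinstance(arg, str):
            if arg.startswith("#"):
                return results[int(arg[1:])]
            return evidence[arg]  # KeyError if the value was never audited
        return float(arg)

    for op, left, right in program:
        results.append(OPS[op](resolve(left), resolve(right)))
    return results[-1]

# Made-up evidence and a two-step program: (2023 - 2022) / 2022.
evidence = {"revenue_2023": 120.0, "revenue_2022": 100.0}
program = [
    ("subtract", "revenue_2023", "revenue_2022"),
    ("divide", "#0", "revenue_2022"),
]
growth = execute(program, evidence)
```

Because every operand must resolve to either a named evidence value or an earlier step, a program that refers to a number absent from the retrieved documents fails loudly instead of producing a plausible-looking figure.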

2605.31063 2026-06-01 stat.ML cs.LG physics.chem-ph physics.comp-ph

Free energy Estimation on Any State Space

任意状态空间上的自由能估计

Jiajun He, Zijing Ou, Francisco Vargas, Yingzhen Li, José Miguel Hernández-Lobato, Carles Domingo-Enrich, Yuanqi Du

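AI summary: Proposes a framework based on generalized neural transport learning that extends free energy estimation to arbitrary state spaces, and reveals a group-theoretic structure linking time reversal and Doob's h-transforms.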

AI中文摘要

自由能估计是一个从物理学到统计学的基础且具有挑战性的问题。经典方法依赖于热力学变换，包括直接估计、准静态积分和有限时间平均。最近的工作[He and Du et al., 2025]通过学习神经传输显著加速了有限时间区间的效率。在本文中，我们将此框架推广到任意状态空间。基于这一观点，我们开发了一种广义神经传输学习方法以实现高效估计。实验验证了所提方法在连续设置之外的有效性和效率，扩展到离散和多模态空间以及自回归设置。除了自由能估计，我们还建立了代数恒等式并揭示了连接无穷小时间反演和广义Doob h-变换的群论结构，表明它们的组合形成一个广义二面体群。

英文摘要

Free energy estimation is a fundamental yet challenging problem, from physics to statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation and quasistatic integration to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports to significantly improve efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.
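
For orientation, the three classical estimator families named in the abstract are commonly written as below. This is a textbook-style sketch rather than the paper's notation: the inverse temperature $\beta$, the potentials $U_0$ and $U_1$, the interpolating family $U_\lambda$ with equilibrium distribution $p_\lambda$, and the work $W$ are assumed symbols, and reading "direct estimation" as free energy perturbation is an interpretation.

```latex
% Direct estimation (free energy perturbation), samples drawn from state 0:
\Delta F = -\beta^{-1} \log \mathbb{E}_{x \sim p_0}\!\left[ e^{-\beta\,(U_1(x) - U_0(x))} \right]

% Quasistatic integration (thermodynamic integration), U_\lambda interpolates U_0 and U_1:
\Delta F = \int_0^1 \mathbb{E}_{x \sim p_\lambda}\!\left[ \frac{\partial U_\lambda(x)}{\partial \lambda} \right] \mathrm{d}\lambda

% Finite-time averaging (Jarzynski equality), W is the work along a nonequilibrium path:
\Delta F = -\beta^{-1} \log \mathbb{E}\!\left[ e^{-\beta W} \right]
```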

2605.31043 2026-06-01 stat.ML cs.AI cs.LG

Routing on the Stiefel Manifold: When Does Adaptive Subspace Selection Help for Cross-Domain EEG Decoding?

Stiefel流形上的路由：自适应子空间选择何时有助于跨域脑电解码？

Isabella Costa Maia, Pedro L. C. Rodrigues, Salem Said, Marco Congedo

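AI summary: To address covariance-matrix domain shift in cross-domain EEG decoding, proposes dynamic Stiefel routing, which performs adaptive subspace selection with a pool of expert projection filters on the Stiefel manifold and cross-attention, and introduces three structural properties that prevent collapse to ensemble averaging, yielding consistent gains on three datasets.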

AI中文摘要

尽管黎曼深度学习取得了进展，跨域脑电解码仍然具有挑战性：来自不同受试者的协方差矩阵占据了SPD流形上系统不同的区域，然而现有的域适应方法要么需要目标域校准数据，要么学习无法跨域泛化的受试者特定组件。我们提出了动态Stiefel路由：在Stiefel流形上有一个包含$K$个专家投影滤波器的池，每个滤波器专门处理SPD流形上的不同区域，每个输入协方差通过交叉注意力路由到最合适的滤波器，从而为每个样本自适应调整子空间投影。一个核心发现是，这种朴素实现的方法会退化为集成平均：当路由权重均匀时，自适应滤波器恰好等价于专家的等贡献组合，与单个固定滤波器无法区分。三种结构性质打破了这种退化：一个对称锚点$W_{\mathrm{base}} \in \mathrm{St}(n,k)$消除了专家间的邻近偏差；一个冻结的域判别查询编码器将路由与任务优化解耦；以及一个解耦的键对齐损失，将专家键训练到稳定的域吸引子。它们共同产生了SPD流形上第一个真正承诺且域结构化的路由，在三个数据集上取得一致提升：平衡准确率分别从$0.773\to 0.823$、$0.757\to 0.809$和$0.801\to 0.839$，且对齐策略由单一数据驱动规则自动确定，无需数据集特定的超参数搜索。

英文摘要

Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the Stiefel manifold, each specialised for a different region of the SPD manifold, with each input covariance routed to the most appropriate filter via cross-attention, adapting the subspace projection per sample. A central finding is that this approach, implemented naively, provably collapses to ensemble averaging: when routing weights are uniform, the adaptive filter reduces exactly to an equal-contribution combination of experts, indistinguishable from a single fixed filter. Three structural properties break this degeneracy: a symmetric anchor $W_{\mathrm{base}} \in \mathrm{St}(n,k)$ that removes proximity bias among experts; a frozen domain-discriminative query encoder that decouples routing from task optimisation; and a decoupled key alignment loss that trains expert keys toward stable domain attractors. Together they produce the first genuinely committed and domain-structured routing on SPD manifolds, with consistent gains across three datasets: balanced accuracy improves from $0.773\to 0.823$, $0.757\to 0.809$, and $0.801\to 0.839$, with the alignment strategy determined automatically by a single data-driven rule and no dataset-specific hyperparameter search.
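
The collapse described above can be reproduced in a few lines. The sketch below assumes the routed filter is a plain attention-weighted sum of the expert filters in ambient space; the abstract does not state the combination rule or any retraction back onto the manifold, so that choice, along with all dimensions and names, is an assumption for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def random_stiefel(n, k, rng):
    # QR of a Gaussian matrix gives orthonormal columns, i.e. a point on St(n, k).
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def routed_filter(query, keys, experts):
    weights = softmax(keys @ query)            # attention scores over the K experts
    return np.tensordot(weights, experts, 1)   # weighted sum in ambient space (assumed)

rng = np.random.default_rng(0)
n, k, K, d = 8, 3, 4, 5
experts = np.stack([random_stiefel(n, k, rng) for _ in range(K)])

# Identical keys give identical scores, hence uniform routing weights.
collapsed_keys = np.ones((K, d))
q1, q2 = rng.standard_normal(d), rng.standard_normal(d)
w1 = routed_filter(q1, collapsed_keys, experts)
w2 = routed_filter(q2, collapsed_keys, experts)
```

With uniform weights, `w1` and `w2` are the same matrix, equal to the mean of the experts, whatever the input query: the "adaptive" filter has become a single fixed filter.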

2605.31042 2026-06-01 cs.CR cs.AI cs.CL

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

从提示注入到持久控制：防御智能体框架中的木马后门

Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu, Yiruo Cheng, Xiaoxi Li, Ji-Rong Wen

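AI summary: Introduces the ClawTrojan benchmark to expose multi-step trojan attacks in local agentic harnesses, and designs the DASGuard defense, which scans control-like text, traces its origin, and removes untrusted control content to provide dynamic defense.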

Comments Code and data are available at https://github.com/RUC-NLPIR/ClawTrojan

AI中文摘要

LLM智能体正在从对话式聊天机器人演变为实际工作空间中的操作工具。在本地智能体框架中，LLM可以读写文件、调用工具，并在会话间重用工作空间状态。虽然这些功能增强了实用性，但也为攻击者暴露了新的攻击面。攻击者可以将提示注入嵌入文件或工具输出中。智能体可能会读取这一隐藏指令，存储它，并在之后执行。在这种多步木马攻击范式中，没有任何单个步骤本身是恶意的，但这些步骤可以共同将不可信文本转化为持久控制内容。然而，现有防御通常孤立地检查每个步骤。因此，它们可以阻止明显的恶意行为，但无法检测到植入后门的早期写操作。为了揭示这一威胁，我们引入了ClawTrojan，一个旨在识别本地智能体框架中多步木马攻击的基准测试。在OpenClaw风格的模拟工作空间中，使用GPT-5.4，ClawTrojan达到了95.5%的攻击成功率（ASR），而同一模型上现有的单轮提示注入攻击产生的ASR接近零。为了解决这一威胁，我们提出了DASGuard，它扫描敏感本地文件中的控制类文本，追溯其来源，并清除非可信来源的控制内容。我们的结果表明，DASGuard通过结合运行时攻击阻断和对工作空间的清理提交，实现了强大的动态防御。

英文摘要

LLM agents are evolving from conversational chatbots to operational tools in real-world workspaces. In local agentic harnesses, an LLM can read and write files, call tools, and reuse workspace state across sessions. While such capabilities enhance utility, they also expose a new attack surface for attackers. Attackers can embed a prompt injection within a file or tool output. Agents may read this hidden instruction, store it, and execute it later. In this multi-step trojan attack paradigm, no individual step appears malicious on its own, but these steps can collectively turn untrusted text into persistent control content. However, existing defenses often inspect each step in isolation. As a result, they can block a clear harmful action, but fail to detect the earlier write operation that plants the backdoor. To reveal this threat, we introduce ClawTrojan, a benchmark designed to identify multi-step trojan attacks in local agentic harnesses. In an OpenClaw-style simulated workspace with GPT-5.4, ClawTrojan reaches a 95.5% attack success rate (ASR), while existing single-turn prompt-injection attacks produce near-zero ASR on the same model. To address this threat, we propose DASGuard, which scans control-like text in sensitive local files, traces its origin, and removes control content that does not originate from a trusted source. Our results show that DASGuard achieves strong dynamic defense by combining runtime attack blocking with sanitized commits to the workspace.

2605.31036 2026-06-01 cs.GT cs.LG

Model Monotonicity in Autobidding Auctions: When Do Better Predictions Lead to Better Outcomes?

自动竞价拍卖中的模型单调性：更好的预测何时带来更好的结果？

Ashwinkumar Badanidiyuru

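AI summary: Studies the interaction between recommender-system model quality, auction format, and autobidder behavior in online advertising, defines model improvement via cluster refinement, and systematically characterizes when evaluation metrics are monotone across bidder types, auction formats, and budget constraints.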

Journal ref: ICML 2026

AI中文摘要

在线广告平台依赖机器学习模型预测点击率（pCTR）和转化率（pCVR）以用于拍卖机制。我们引入了一个新框架来研究推荐系统模型质量、拍卖格式和自动竞价者行为之间的相互作用。我们形式化了模型改进——通过受概率论中滤子启发的精炼关系定义——何时导致平台级评估指标（如收入、福利或流动性福利）的改进。我们的主要贡献是：（1）基于聚类精炼的模型改进的形式化定义，以及（2）跨不同竞价者类型（tCPA、max-CPA）、拍卖格式（第一价格、第二价格、VCG）和预算约束的ECM单调性的系统刻画。我们证明，具有统一竞价的第一价格拍卖保证了无预算的tCPA竞价者的收入单调性（通过Jensen不等式），而第二价格拍卖和预算约束可能破坏这一性质。我们为非单调性结果提供了完整的数值构造。我们的发现对寻求将模型改进与业务成果对齐的广告平台具有实际意义。

英文摘要

Online advertising platforms rely on machine learning models to predict click-through rates (pCTR) and conversion rates (pCVR) for auction mechanisms. We introduce a novel framework to study the interaction between recommender system model quality, auction format, and autobidder behavior. We formalize when model improvements -- defined via a refinement relation inspired by filtrations in probability theory -- lead to improvements in platform-level Evaluation Criteria Metrics (ECM) such as revenue, welfare, or liquid welfare. Our main contributions are: (1) a formal definition of model improvement based on cluster refinement, and (2) a systematic characterization of ECM monotonicity across different combinations of bidder types (tCPA, max-CPA), auction formats (first-price, second-price, VCG), and budget constraints. We show that first-price auctions with uniform bidding guarantee revenue monotonicity for tCPA bidders without budgets (via Jensen's inequality), while second-price auctions and budget constraints can break this property. We provide full numerical constructions for the non-monotonicity results. Our findings have practical implications for advertising platforms seeking to align model improvements with business outcomes.
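
The Jensen-style argument for first-price auctions can be checked on a toy instance. In the sketch below a coarser model predicts the cluster average of the true conversion rates, a refinement splits that cluster, and revenue is the sum of winning bids. Fixing every bid multiplier to 1, as well as the rates, targets, and cluster assignments, are simplifying assumptions and not the paper's equilibrium construction.

```python
import numpy as np

def first_price_revenue(predictions, targets):
    """predictions: (bidders, impressions) array of predicted conversion rates."""
    bids = targets[:, None] * predictions   # uniform bidding, multiplier fixed to 1
    return bids.max(axis=0).sum()           # the winner pays its own bid

def coarsen(rates, clusters):
    """Replace each rate by the mean over its cluster, per bidder."""
    out = np.empty_like(rates)
    for c in np.unique(clusters):
        mask = clusters == c
        out[:, mask] = rates[:, mask].mean(axis=1, keepdims=True)
    return out

# Made-up true conversion rates for 2 bidders on 4 impressions.
rates = np.array([[0.10, 0.02, 0.02, 0.02],
                  [0.02, 0.06, 0.06, 0.10]])
targets = np.array([10.0, 10.0])

coarse = coarsen(rates, np.array([0, 0, 0, 0]))  # one cluster
fine = coarsen(rates, np.array([0, 0, 1, 1]))    # a refinement of it

rev_coarse = first_price_revenue(coarse, targets)
rev_fine = first_price_revenue(fine, targets)
rev_exact = first_price_revenue(rates, targets)
```

Since the maximum is convex, the average of per-impression maxima is at least the maximum of the averages, so revenue does not fall as the clustering is refined in this toy setting.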

2605.31000 2026-06-01 cs.NI cs.LG

HetCCL: Enabling Collective Communication For Mixed-Vendor Heterogeneous Clusters

HetCCL：实现混合供应商异构集群的集体通信

Yuejie Wang, Tao Chang, Yuanyuan Zhao, Yulong Ao, Zeyu Gu, Zhiyu Li, Yanmin Jia, Yan Zhang, Mingjun Zhang, He Liu, Yongzhe He, Yonghua Lin, Guyue Liu

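AI summary: Proposes HetCCL, a framework that enables cross-vendor collective communication in heterogeneous clusters through efficient P2P transport and a border-communicator mechanism, eliminating host-device memory copy overhead and optimizing bandwidth utilization.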

AI中文摘要

在异构集群上训练大型语言模型（LLM）给集体通信带来了重大挑战，因为来自多个供应商的硬件引入了多样化的网络和计算特性。现有的为同构环境设计的集体通信框架（如NCCL、RCCL）无法处理混合硬件设置，而支持异构的通信库（如Gloo、OpenMPI）在数据路径中引入了大量开销。本文提出了HetCCL，一个通过跨异构设备（如GPU）的高效P2P传输实现异构集体通信的框架，消除了主机-设备内存拷贝开销，同时将控制卸载到CPU。对于组合集体（如AllReduce、ReduceScatter），HetCCL引入了一种边界通信器机制，通过使用供应商集体通信库中组合集体的内在归约来实现供应商独立性。凭借高效的异构P2P传输和可移植的归约机制，HetCCL提出了异构集群的层次拓扑抽象，将集体通信分解为集群级原语，保证了最优的跨集群数据传输量和最优的带宽利用率。我们实现了支持4种不同供应商的HetCCL，并在4种异构设置下使用基准测试和端到端LLM任务进行了评估。评估结果表明，在异构通信中，HetCCL的带宽比Gloo高17-19倍，并且在端到端训练中每步时间加速高达16.9%。

英文摘要

Training Large Language Models (LLMs) on heterogeneous clusters presents significant challenges for collective communication, as hardware from multiple vendors introduces diverse network and computational characteristics. Existing collective communication frameworks (e.g., NCCL, RCCL) designed for homogeneous environments fail to address mixed-hardware setups, while communication libraries with heterogeneous support (e.g., Gloo, OpenMPI) incur heavy overhead in the data path. This paper presents HetCCL, a framework that enables heterogeneous collective communication through efficient P2P transport across heterogeneous devices (e.g., GPUs), eliminating the host-device memory copy overhead while offloading control to the CPUs. For combining collectives (e.g., AllReduce, ReduceScatter), HetCCL introduces a border-communicator mechanism that achieves vendor independence by using the intrinsic reduction of the combining collectives in vendor collective communication libraries. With efficient heterogeneous P2P transport and a portable reduction mechanism, HetCCL proposes a hierarchical topology abstraction for heterogeneous clusters, dissecting collective communication into cluster-level primitives that guarantee optimal cross-cluster data transfer volume and optimal bandwidth utilization. We implement HetCCL with support for 4 different vendors and evaluate it in 4 heterogeneous settings with benchmarks and end-to-end LLM tasks. Our evaluation shows that HetCCL achieves 17-19x higher bandwidth than Gloo in heterogeneous communications, and speeds up end-to-end training by up to 16.9% in per-step time.
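
A toy, single-process model of a hierarchical AllReduce may help picture the cluster-level decomposition. It is not HetCCL's API: the function name, the two "vendor" clusters, and the use of NumPy arrays as stand-ins for device buffers are all assumptions made for illustration.

```python
import numpy as np

def hierarchical_allreduce(clusters):
    """clusters: list of lists of equal-shape arrays, one array per device."""
    # 1) Reduce inside each vendor cluster (stand-in for a vendor library call).
    partial = [np.sum(devices, axis=0) for devices in clusters]
    # 2) Only one partial sum per cluster crosses the cluster boundary.
    total = np.sum(partial, axis=0)
    # 3) Broadcast the global result back to every device in every cluster.
    return [[total.copy() for _ in devices] for devices in clusters]

clusters = [
    [np.array([1.0, 2.0]), np.array([3.0, 4.0])],                            # vendor A
    [np.array([10.0, 20.0]), np.array([30.0, 40.0]), np.array([5.0, 5.0])],  # vendor B
]
result = hierarchical_allreduce(clusters)
```

The point of the decomposition is that the cross-cluster step moves one buffer per cluster rather than one per device, which is where a slower inter-vendor link would otherwise dominate.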

2605.30997 2026-06-01 stat.ML cs.LG

Hedging on the Frontier: Learning New Tasks with Few Samples

前沿对冲：基于少量样本学习新任务

Tobias Wegel, Federico Di Gennaro, Geelon So, Fanny Yang

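AI summary: To address the scarcity of samples for new tasks, uses a weak monotonicity assumption and hedges on the model frontier via transfer learning and model-selection aggregation, obtaining provable statistical gains.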

2605.30976 2026-06-01 stat.ML cs.IT cs.LG math.IT

Batched Stochastic Linear Bandits with 1-Bit Communication Constraints

具有1比特通信约束的批量随机线性赌博机

Ivan Lau, Daniel McMorrow, Kevin Jamieson, Jonathan Scarlett

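AI summary: Studies regret minimization in stochastic linear bandits under a communication constraint of batch size B and only 1 bit of feedback per batch, and proposes two phased-elimination algorithms based on G-optimal designs and 1-bit mean estimation that achieve regret close to the optimum for unconstrained linear bandits.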

AI中文摘要

我们研究了在批处理和通信约束的自然组合下的随机线性赌博机：时间范围被划分为大小相等的批次$B$，在每个批次中，学习器向一个智能体发送$B$个请求的臂拉动，智能体观察相应的$B$个奖励，并用单个比特的反馈回复学习器。对于每个批次，学习器指定智能体使用的1比特量化规则，该规则可能依赖于所有先前接收到的比特，但不直接依赖于任何过去的奖励。这一设置解决了先前模型（仅有每轮量化或仅有总比特预算）之间一个显著但尚未探索的“中间地带”。我们建立了一个极小极大下界，表明由于1比特通信瓶颈，即使在没有噪声的情况下，$\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$的遗憾也是不可避免的。结合标准的统计极限，这给出了一个通用的下界$\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$。我们开发了两种基于$G$-最优设计和1比特均值估计的相位消除算法。第一种算法实现了$\widetilde{O}(dB + d\sqrt{T})$的遗憾，当$\lvert \mathcal{A} \rvert = \exp(\Omega(d))$时，该下界在对数因子内匹配；第二种算法结合了安全臂识别和热启动过程，获得了$\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$的遗憾，在$(\lvert \mathcal{A} \rvert, B, d, T)$的广泛缩放范围内接近最优。总之，我们的结果表明，每批仅需一个比特的反馈就足以在广泛的缩放范围内几乎匹配无约束线性赌博机的极小极大遗憾，即使对于$\Theta(\sqrt{T})$这样大的批量大小也是如此。

英文摘要

We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all previously received bits but not on any past rewards directly. This setting addresses a significant yet unexplored "middle ground" between previous models having per-round quantization only or total bit budgets only. We establish a minimax lower bound showing that $\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$ regret is unavoidable due to the 1-bit communication bottleneck, even in the absence of noise. Combined with standard statistical limits, this yields a general lower bound of $\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$. We develop two phased-elimination algorithms based on $G$-optimal designs and 1-bit mean estimation. The first achieves $\widetilde{O}(dB + d\sqrt{T})$ regret, matching the lower bound up to logarithmic factors when $\lvert \mathcal{A} \rvert = \exp(\Omega(d))$, and the second incorporates a safe-arm identification and warm-start procedure to obtain $\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$ regret, which is near-optimal in broad scaling regimes of $(\lvert \mathcal{A} \rvert, B, d, T)$. Together, our results demonstrate that a single bit of feedback per batch suffices to nearly match the minimax regret of unconstrained linear bandits in broad scaling regimes, even for batch sizes as large as $\Theta(\sqrt{T})$.
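
To make the communication model concrete, the sketch below estimates one arm's mean reward when the agent may return only a single bit per batch. It uses plain bisection on a threshold in a noiseless setting; this is a generic illustration of 1-bit mean estimation under assumed rewards in [0, 1], not the estimator or quantization rule used in the paper.

```python
def agent_bit(rewards, threshold):
    """The agent sees the B rewards of one batch and replies with a single bit."""
    return sum(rewards) / len(rewards) >= threshold

def estimate_mean(pull_batch, num_batches, low=0.0, high=1.0):
    """Learner side: one threshold per batch, chosen from the bits seen so far."""
    for _ in range(num_batches):
        mid = (low + high) / 2
        if agent_bit(pull_batch(), mid):
            low = mid    # batch mean is at least mid
        else:
            high = mid   # batch mean is below mid
    return (low + high) / 2

# Hypothetical noiseless arm with batch size B = 8.
true_mean = 0.3125
estimate = estimate_mean(lambda: [true_mean] * 8, num_batches=10)
```

Each batch halves the interval that contains the mean, so even without noise the learner needs on the order of $\log(1/\varepsilon)$ batches, that is $B\log(1/\varepsilon)$ pulls, to reach accuracy $\varepsilon$. This gives some intuition for why a cost that scales with $B$ appears in the bounds.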

2605.30966 2026-06-01 cs.IR cs.AI cs.CL

Reading Between the Citations: A Typed Claim Network for Scientific Literature

解读引用：面向科学文献的类型化主张网络

Ning Ding, Sergio J. Rodríguez Méndez, Pouya G. Omran

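AI summary: To address the fact that existing knowledge graphs ignore citation stance, proposes reifying cross-document references as a typed claim network with stance labels, builds an instance containing 8,260 claims, and validates it on three tasks: retrieval augmentation, stance summarization, and topological analysis.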

AI中文摘要

基于相互引用文献语料库（如学术论文、法律意见书、政策简报）的知识图谱编码了引用的拓扑结构，但未编码其立场。标准表示将丰富的评价关系压缩为无类型边，丢失了支持社区级查询（关于一篇文献如何被另一篇文献接受）的关键内容。我们提出主张网络：一种表示模式，其中每个跨文献引用被具体化为一个类型化主张，携带来源、目标、主张文本以及基于引用意图文献的四类立场标签。我们给出了一个适用于任何学术相互引用文献语料库的构建流程，并在3D点云语义分割领域的127篇论文语料库上实例化，生成了一个包含8260个类型化主张的网络。三个下游任务系列展示了该网络的能力：检索信号增强、聚合立场摘要和拓扑分析。与标准检索增强生成（RAG）基线的直接比较表明，相对于平面检索的增益来自于正确的中间表示，而非错误的表示。

英文摘要

Knowledge graphs over corpora of inter-referencing documents - scholarly papers, legal opinions, policy briefs - encode the topology of reference but not its stance. The standard representation collapses a rich evaluative relation into an untyped edge, losing the very content that supports community-level queries about how one document is received by another. We propose the claim network: a representational pattern in which each cross-document reference is reified as a typed claim, carrying source, target, claim text, and a four-class stance label grounded in the citation-intent literature. We give a construction pipeline applicable to any corpus of scholarly inter-referencing documents and instantiate it on a corpus of 127 papers in 3D point cloud semantic segmentation, producing a network of 8,260 typed claims. Three downstream task families demonstrate what the network enables: retrieval signal augmentation, aggregated-stance summarisation, and topological analytics. Head-to-head evaluation against standard Retrieval-Augmented Generation (RAG) baselines shows that the gain over flat retrieval is the gain from the right intermediate representation rather than the wrong one.
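
One possible shape for a typed claim and an aggregated-stance query is sketched below. The four fields follow the abstract (source, target, claim text, stance), but the stance names are placeholders, since the abstract only says there are four classes, and the sample claims are invented.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable

class Stance(Enum):
    # Placeholder labels: the abstract does not name the four classes.
    SUPPORTS = "supports"
    CRITIQUES = "critiques"
    EXTENDS = "extends"
    NEUTRAL = "neutral"

@dataclass(frozen=True)
class TypedClaim:
    source: str    # citing document
    target: str    # cited document
    text: str      # what the source says about the target
    stance: Stance

def stance_summary(claims: Iterable[TypedClaim], target: str) -> Counter:
    """Aggregate how a target document is received across the corpus."""
    return Counter(c.stance for c in claims if c.target == target)

# Invented example claims.
claims = [
    TypedClaim("paper_B", "paper_A", "A's sampling scales poorly to large scenes.", Stance.CRITIQUES),
    TypedClaim("paper_C", "paper_A", "We adopt the encoder of A.", Stance.EXTENDS),
    TypedClaim("paper_D", "paper_A", "A underperforms on sparse regions.", Stance.CRITIQUES),
    TypedClaim("paper_D", "paper_B", "B is a strong baseline.", Stance.SUPPORTS),
]
summary = stance_summary(claims, "paper_A")
```

An untyped citation graph would record only that three papers cite `paper_A`; the typed version can also say that two of them do so critically.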

2605.30965 2026-06-01 eess.AS cs.AI cs.CL

ImmersiveTTS: Environment-Aware Text-to-Speech with Multimodal Diffusion Transformer and Domain-Specific Representation Alignment

ImmersiveTTS：基于多模态扩散Transformer和领域特定表示对齐的环境感知文本转语音

Jun-Hak Yun, Seung-Bin Kim, Seong-Whan Lee

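AI summary: Proposes ImmersiveTTS, a text-to-speech model that uses a multimodal diffusion transformer and domain-specific representation alignment to generate speech that blends naturally with environmental audio.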

Comments Accepted to ACL 2026 main conference. Code is available at https://github.com/jjunak-yun/ImmersiveTTS

AI中文摘要

最近在文本引导音频生成方面的进展在声音效果、语音和音乐等多个领域取得了有希望的结果。然而，由于语音和环境音频在声学模式和时域动态上的固有差异，联合生成语音和环境音频仍然具有挑战性。我们提出了ImmersiveTTS，一种环境感知的文本到语音（TTS）模型，通过显式建模跨模态交互，生成与环境上下文无缝融合的自然语音。我们的模型基于多模态扩散Transformer，并通过联合注意力将转录对齐的语音潜在表示与文本条件的环境上下文融合。为了增强语义一致性，我们引入了一种针对环境感知TTS量身定制的领域特定表示对齐目标，利用来自语音和音频编码器的互补自监督表示。实验结果表明，在客观指标和人类听力测试中，ImmersiveTTS在自然度、可懂度和音频保真度方面均优于现有方法。

英文摘要

Recent advancements in text-guided audio generation have yielded promising results in diverse domains, including sound effects, speech, and music. However, jointly generating speech with environmental audio remains challenging due to the inherent disparities in their acoustic patterns and temporal dynamics. We propose ImmersiveTTS, an environment-aware text-to-speech (TTS) model that generates natural speech seamlessly integrated within environmental contexts by explicitly modeling cross-modal interactions. Our model builds on a multimodal diffusion transformer and fuses transcript-aligned speech latent with text-conditioned environmental context via joint attention. To enhance semantic consistency, we introduce a domain-specific representation alignment objective tailored to environment-aware TTS, leveraging complementary self-supervised representations from speech and audio encoders. Experimental results show that ImmersiveTTS achieves higher naturalness, intelligibility, and audio fidelity than existing approaches across objective metrics and human listening tests.

2605.30963 2026-06-01 q-bio.BM cs.AI

AMix-2: Establishing Protein as a Native Modality in Large Language Models

AMix-2：将蛋白质确立为大语言模型的原生模态

Keyue Qiu, Yixin Wu, Lihao Wang, Yawen Ouyang, Jixiang Yu, Zihan Zhou, Changze Lv, Dongyu Xue, Yuxuan Song, Xinbo Zhang, Hao Wang, Jiangtao Feng, Zhiqiang Gao, Lijun Wu, Xiaoqing Zheng, Ka-Chun Wong, Lei Bai, Ya-Qin Zhang, Wei-Ying Ma, Dahua Lin, Bowen Zhou, Hao Zhou

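AI summary: Presents AMix-2, a protein-text foundation model that unifies protein understanding and sequence design and treats protein as a native modality of large language models, with a block-wise diffusion language modeling backbone that better matches the intrinsic nature of proteins.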

Comments 30 pages, 4 figures, 12 tables

AI中文摘要

我们提出了AMix-2，一种蛋白质-文本基础模型，将蛋白质确立为大语言模型（LLMs）的原生模态，在单一基础模型中统一了蛋白质理解和序列设计。AMix-2基于两个关键思想：（1）统一的蛋白质-文本公式，将自然语言和蛋白质序列嵌入共享的标记空间，使一个模型能够执行生物推理和条件设计，而不是使用单独的下游任务专用模型；（2）块状扩散语言建模骨干，结合了跨块的因果生成与块内的双向上下文和迭代细化。这种方案比严格的从左到右分解更好地匹配了蛋白质的内在本质。为了在现实的泛化设置下评估蛋白质基础模型，我们进一步引入了ProteinArena，一个全面的基准测试，具有时间感知和同源性感知协议，涵盖各种理解和设计任务，并以经典生物信息学工具、蛋白质专用模型和LLMs作为基线。在ProteinArena上，AMix-2优于前沿的LLMs，并展现出与任务专用蛋白质模型竞争的性能。控制实验进一步表明，基于扩散的范式普遍优于其自回归对应物，突显了蛋白质序列灵活生成顺序的优势。我们发布了AMix-2和ProteinArena，以促进蛋白质基础模型的开放研究。

英文摘要

We present AMix-2, a protein-text foundation model that establishes protein as a native modality in large language models (LLMs), unifying protein understanding and sequence design within a single foundation model. AMix-2 is built upon two key ideas: (1) a unified protein-text formulation that embeds natural language and protein sequence in a shared token space, enabling one model to perform biological reasoning and conditional design instead of separate downstream task-specialized models; and (2) a block-wise diffusion language modeling backbone that combines causal generation across blocks with bidirectional context and iterative refinement within blocks. This scheme better matches the intrinsic nature of proteins than a strict left-to-right factorization. To evaluate protein foundation models under realistic generalization settings, we further introduce ProteinArena, a comprehensive benchmark with time-aware and homology-aware protocols across various understanding and design tasks, and with baselines covering classical bioinformatics tools, protein-specialized models and LLMs. On ProteinArena, AMix-2 outperforms frontier LLMs and demonstrates competitive performance to task-specific protein models. Controlled experiments further show that the diffusion-based paradigm generally surpasses its autoregressive counterpart, highlighting the advantage of flexible generation order for protein sequences. We release both AMix-2 and ProteinArena to facilitate open research in protein foundation models.
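
The combination of causal generation across blocks and bidirectional context within blocks corresponds to a block-causal attention pattern. The sketch below builds such a mask; the function name, sequence length, and block size are illustrative assumptions, and the abstract does not specify how AMix-2 implements its attention.

```python
import numpy as np

def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """mask[i, j] is True when position i may attend to position j."""
    block_id = np.arange(seq_len) // block_size
    # The same block or an earlier block is visible; later blocks are hidden.
    return block_id[:, None] >= block_id[None, :]

mask = block_causal_mask(seq_len=6, block_size=2)
```

With `block_size=1` this reduces to an ordinary left-to-right causal mask, and with `block_size=seq_len` it becomes fully bidirectional, so the block size interpolates between the two regimes.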

2605.30940 2026-06-01 eess.AS cs.MM cs.SD

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

An Autoregressive Diffusion Transformer for Streaming Synchronized Spatial Audio Generation

Ke Lei, Yu Zhang, Changhao Pan, Xueyi Pu, Wenxiang Guo, Ruiqi Li, Zhou Zhao

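AI summary: Proposes SwanSphere, a unified streaming framework that uses a causal autoregressive diffusion Transformer, spatial video-audio contrastive learning, and multi-objective online direct preference optimization to generate high-fidelity spatial audio from panoramic videos and text prompts, and develops an automated annotation pipeline to mitigate data scarcity.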

Comments: Accepted by ICML 2026

AI abstract (translated from Chinese)

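Real-time, accurate spatial audio generation is essential for immersive experiences. However, existing spatial audio synthesis techniques are typically limited by a trade-off between generation quality and inference latency, and they struggle to capture precise spatial information from multimodal inputs. To address these challenges, we propose SwanSphere, a unified streaming framework for generating high-fidelity spatial audio from panoramic videos and text prompts. SwanSphere makes the following main contributions: 1) We introduce a causal autoregressive diffusion Transformer architecture that supports streaming, high-quality spatial audio generation. 2) We design a spatial video-audio contrastive learning strategy to align the video encoder with the acoustic domain, and further adopt a multi-objective online direct preference optimization scheme, yielding strong spatial perception and robust multimodal spatial audio synthesis. 3) To alleviate the scarcity of current spatial audio datasets, we also develop an automated annotation pipeline for generating detailed spatial descriptions. Experimental results show that SwanSphere achieves superior performance on both video-to-spatial and text-to-spatial audio generation tasks. Demos are available at https://swanaigc.github.io.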

English abstract

Real-time and accurate spatial audio generation is pivotal for delivering an immersive experience. However, existing spatial audio synthesis technologies are often encumbered by a trade-off between generation quality and inference latency, as well as by difficulty in capturing precise spatial information from multimodal inputs. To address these challenges, we propose SwanSphere, a unified streaming framework for high-fidelity spatial audio generation from panoramic videos and text prompts. SwanSphere makes the following main contributions: 1) We introduce a causal autoregressive diffusion transformer architecture that enables streaming high-quality spatial audio generation. 2) We design a Spatial Video-Audio Contrastive (SVAC) learning strategy to align the video encoder with the acoustic domain, and further employ a multi-objective online direct preference optimization (ODPO) scheme, resulting in strong spatial perception and robust multimodal spatial audio synthesis. 3) To alleviate the current scarcity of spatial audio datasets, we also develop an automated annotation pipeline for generating detailed spatial captions. Experimental results demonstrate that SwanSphere achieves superior performance in both video-to-spatial and text-to-spatial audio generation tasks. Demos can be found at: https://swanaigc.github.io.
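For readers unfamiliar with direct preference optimization, the standard single-pair DPO loss that such schemes build on can be sketched as follows. This is the generic textbook form, not SwanSphere's multi-objective online variant, whose details the abstract does not give. The function name `dpo_loss`, the argument names, and the default `beta=0.1` are assumptions for illustration.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of samples.

    Inputs are log-probabilities of each sample under the trainable policy
    and under a frozen reference model; beta scales the implicit reward.
    """
    # How much more the policy favours the chosen sample than the reference does.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written as softplus(-margin) for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: the loss equals log(2).
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    # Policy favours the chosen sample more than the reference: the loss drops.
    print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

The loss decreases as the policy raises the likelihood of the preferred sample relative to the rejected one, measured against the reference model. An online variant would draw the preference pairs from the current policy during training.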
