arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.04510 2026-05-08 math.OC cs.AI cs.LG

Predictive and Prescriptive AI toward Optimizing Wildfire Suppression

Leonard Boussioux, Alexandre Jacquillat, Ryne Reger, Jacob Wachspress

Affiliations Michael G. Foster School of Business and Paul G. Allen School of Computer Science & Engineering, University of Washington

AI summary This paper proposes a predictive and prescriptive AI approach that jointly optimizes crew assignments and wildfire suppression, using an integer optimization model and a two-sided column generation algorithm to improve suppression effectiveness and reduce area burned.

Abstract

Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dynamics on a time-state network, and linking constraints between them. We develop a two-sided branch-and-price-and-cut algorithm based on: (i) a two-sided column generation scheme that generates fire suppression plans and crew routes iteratively; (ii) a new family of cuts exploiting the knapsack structure of the linking constraints; and (iii) novel branching rules to accommodate non-linear wildfire dynamics. We also propose a data-driven double machine learning approach to estimate wildfire spread as a function of covariate information and suppression efforts, mitigating observed confounding between historical crew assignments and wildfire growth. Extensive computational experiments show that the optimization algorithm scales to otherwise intractable real-world instances; and that the methodology can enhance suppression effectiveness in practice, resulting in significant reductions in area burned over a wildfire season and guiding resource sharing across wildfire jurisdictions.
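The confounding problem mentioned in the abstract can be illustrated with a minimal partialling-out sketch. This is not the paper's estimator: the double machine learning step is replaced here by plain least-squares residualization without cross-fitting, and the synthetic data, the variable names (`risk`, `crews`, `growth`) and the coefficients are all hypothetical.

```python
import math

def ols_fit(y, x):
    """Return (intercept, slope) of the least-squares fit y ~ a + b * x."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(y, x):
    """Part of y that a linear fit on x does not explain."""
    intercept, slope = ols_fit(y, x)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Hypothetical data: riskier fires receive more crews AND grow faster,
# while the assumed true effect of one extra crew on growth is -2.0.
risk = [float(i) for i in range(50)]
crews = [0.5 * r + math.sin(i) for i, r in enumerate(risk)]
growth = [1.0 + 3.0 * r - 2.0 * c for r, c in zip(risk, crews)]

# Naive regression of growth on crews absorbs the effect of risk.
naive_effect = ols_fit(growth, crews)[1]

# Partialling out: remove what risk explains from both variables,
# then regress residual growth on residual crew effort.
adjusted_effect = ols_fit(residuals(growth, risk), residuals(crews, risk))[1]
```

In this toy setting the naive slope comes out positive, as if crews made fires grow, while the adjusted slope recovers the assumed negative effect. A faithful double machine learning pipeline would use flexible learners and cross-fitting in place of the two linear fits.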

2605.03482 2026-05-08 cs.CR cs.AI cs.LG

MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents

Ishrith Gowda

Affiliations Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Berkeley AI Research

AI summary This paper proposes MEMSAD, which detects memory poisoning attacks on the basis of a gradient coupling theorem, proves a correct-classification guarantee that holds regardless of adversary strategy, and experimentally verifies that composite defenses achieve high detection rates and low false-positive rates across all attacks.

Comments 28 pages, 9 figures, 6 theorems. Submitted to NeurIPS 2026

Abstract

Persistent external memory enables LLM agents to maintain context across sessions, yet its security properties remain formally uncharacterized. We formalize memory poisoning attacks on retrieval-augmented agents as a Stackelberg game with a unified evaluation framework spanning three attack classes with escalating access assumptions. Correcting an evaluation protocol inconsistency in the triggered-query specification of Chen et al. (2024), we show faithful evaluation increases measured attack success by $4\times$ (ASR-R: $0.25 \to 1.00$). Our primary contribution is MEMSAD (Semantic Anomaly Detection), a calibration-based defense grounded in a gradient coupling theorem: under encoder regularity, the anomaly score gradient and the retrieval objective gradient are provably identical, so any continuous perturbation that reduces detection risk necessarily degrades retrieval rank. This coupling yields a certified detection radius guaranteeing correct classification regardless of adversary strategy. We prove minimax optimality via Le Cam's method, showing any threshold detector requires $\Omega(1/\rho^2)$ calibration samples and MEMSAD achieves this up to $\log(1/\delta)$ factors. We further derive online regret bounds for rolling calibration at rate $O(\sigma^{2/3}\Delta^{1/3})$, and formally characterize a discrete synonym-invariance loophole that marks the boundary of what continuous-space defenses can guarantee. Experiments on a $3 \times 5$ attack-defense matrix with bootstrap confidence intervals, Bonferroni-corrected hypothesis tests, and Clopper-Pearson validation ($n=1{,}000$) confirm: composite defenses achieve TPR $= 1.00$, FPR $= 0.00$ across all attacks, while synonym substitution evades detection at $\Delta$ ASR-R $\approx 0$, exposing a gap existing embedding-based defenses cannot close.
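To put the reported rates in context, the sketch below computes an exact (Clopper-Pearson) binomial confidence interval using only the standard library. It assumes, purely for illustration, that the rates were each measured on 1,000 independent trials; the abstract does not say which counts the $n = 1{,}000$ refers to, and the function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def _solve(f, target, increasing):
    """Bisection for f(p) = target on (0, 1), where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a proportion with k successes in n trials."""
    lower = 0.0 if k == 0 else _solve(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    upper = 1.0 if k == n else _solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper

# Hypothetical counts: 1000 of 1000 attacks detected, 0 of 1000 benign entries flagged.
tpr_interval = clopper_pearson(1000, 1000)
fpr_interval = clopper_pearson(0, 1000)
```

Under these assumed counts, an observed rate of exactly 1.00 or 0.00 still leaves a one-sided gap of a few tenths of a percent at the 95% level, which is why an exact interval is more informative than the point estimate alone.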

2605.03213 2026-05-08 cs.CR cs.AI

When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Javad Forough, Marios Kogias, Hamed Haddadi

Affiliations Department of Computing, Imperial College London, London, United Kingdom

AI summary This paper examines the application of confidential computing to agentic AI, analyzes the design of six TEE platforms, proposes an agent-centric threat model, and identifies six open challenges, with the aim of providing a security foundation for production-grade agentic AI.

Abstract

Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via protocols such as MCP and A2A, introduce a threat surface that differs materially from standalone model inference. Agents accumulate sensitive context, hold credentials, and operate across pipelines no single party fully controls, enabling prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within the software stack and can be silently bypassed by a sufficiently privileged adversary such as a compromised cloud operator. Confidential computing (CC) offers a hardware-rooted alternative: Trusted Execution Environments (TEEs) isolate agent code and data from privileged system software, while remote attestation enables verifiable trust across distributed deployments. This survey synthesizes the design space in four parts: (i) a unified taxonomy of six TEE platforms (Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, and NVIDIA H100 CC) covering deployment roles and performance tradeoffs; (ii) an agent-centric threat model spanning perception, planning, memory, action, and coordination layers mapped to nine security goals; (iii) a comparative survey of CC-based defenses distinguishing findings that transfer from single-call inference versus what requires new agentic designs; and (iv) six open challenges including compound attestation for multi-hop agent chains and GPU-TEE performance at LLM scale. While several hardware trust primitives appear mature enough for targeted deployments, no broadly established end-to-end framework yet binds them into a coherent security substrate for production agentic AI.

2604.20050 2026-05-08 econ.GN cs.AI cs.GT q-fin.EC

Information Aggregation with AI Agents

Spyros Galanis

Affiliations Department of Economics, University of Durham

AI summary This study examines whether large language models can aggregate dispersed private information by trading and observing price movements. It finds that information aggregation drops significantly when the information structure is complex, and that smarter AI agents perform better at both aggregation and profitability.

Comments 64 pages

Abstract

Can Large Language Models (AI agents) aggregate dispersed private information through trading and reason about the knowledge of others by observing price movements? We conduct a controlled experiment where AI agents trade in a prediction market after receiving private signals, measuring information aggregation by the log error of the last price. We find that although the median market is effective at aggregating information in the easy information structures, increasing the complexity has a significant and negative impact, suggesting that AI agents may suffer from similar limitations as humans when reasoning about others. Consistent with our theoretical predictions, information aggregation remains unaffected by allowing cheap talk communication, changing the duration of the market or initial price, and strategic prompting, thus demonstrating that prediction markets are robust. We establish that "smarter" AI agents perform better at aggregation and they are more profitable. Surprisingly, giving them feedback about past performance has no impact on aggregation.

2604.06269 2026-05-08 q-bio.QM cs.AI

MAT-Cell: A Multi-Agent Tree-Structured Reasoning Framework for Batch-Level Single-Cell Annotation

Yehui Yang, Zelin Zang, Xienan Zheng, Yuzhe Jia, Changxi Chi, Jingbo Zhou, Chang Yu, Jinlin Wu, Fuji Yang, Jiebo Luo, Zhen Lei, Stan Z. Li

Affiliations Westlake University; Shenzhen University of Advanced Technology; Center for Artificial Intelligence and Robotics; Hong Kong Institute of Science and Innovation; Chinese Academy of Sciences; University Key Laboratory of Information and Communication Security Backup and Recovery

AI summary MAT-Cell separates evidence grounding from label decision and combines Reverse Verification Query with multi-round debate to improve the accuracy and traceability of batch-level single-cell annotation.

Abstract

Automated single-cell annotation is difficult when the most abundant genes are not the most discriminative ones, or when a target state is poorly covered by a fixed reference atlas. GPTCelltype-style one-shot prompting allows large language models (LLMs) to produce plausible labels from generic expression signals, while reference-based annotators can force unfamiliar states into the nearest known category. We propose MAT-Cell, a prompt-driven framework for batch-level single-cell annotation that separates evidence grounding from label decision. MAT-Cell first uses Reverse Verification Query (RVQ) to combine tissue context, observed differentially expressed genes, and LLM-elicited biological priors into structured candidate-specific premises. Verifier agents then convert these premises into explicit premise-to-claim reasoning trees, and bounded multi-round debate compares, challenges, and revises the resulting claims before consensus or final adjudication. The returned Syllogistic Derivation Tree (SDT) provides an auditable debate trace rather than a formal proof of the annotation. In open-candidate benchmarks across five datasets, a locally deployed Qwen3-30B model with MAT-Cell achieves 75.5% average accuracy, compared with 64.2% for the strongest evaluated CoT baseline and 51.9% for the strongest evaluated scPilot variant. In oracle-candidate benchmarks across three species, MAT-Cell remains competitive across backbones, and local inference substantially reduces monetary cost for batch annotation. Code is available at: https://anonymous.4open.science/r/MATCell-4067

2603.23055 2026-05-08 stat.ML cs.IT cs.LG math.IT

Post-Selection Distributional Model Evaluation

Amirmohammad Farzaneh, Osvaldo Simeone

Affiliations Institute for Intelligent Networked Systems (INSI)

AI summary This paper proposes PS-DME, a framework for statistically valid distributional model evaluation after data-dependent model pre-selection. It controls the post-selection false coverage rate, improves sample efficiency over sample splitting, and is shown experimentally to support reliable analysis of performance-reliability trade-offs.

Abstract

Formal model evaluation methods typically certify that a model satisfies a prescribed target key performance indicator (KPI) level. However, in many applications, the relevant target KPI level may not be known a priori, and the user may instead wish to compare candidate models by analyzing the full trade-offs between performance and reliability achievable at test time by the models. This task, requiring the reliable estimate of the test-time KPI distributions, is made more complicated by the fact that the same data must often be used both to pre-select a subset of candidate models and to estimate their KPI distributions, causing a potential post-selection bias. In this work, we introduce post-selection distributional model evaluation (PS-DME), a general framework for statistically valid distributional model assessment after arbitrary data-dependent model pre-selection. Building on e-values, PS-DME controls post-selection false coverage rate (FCR) for the distributional KPI estimates and we establish explicit conditions under which it is provably more sample efficient than a baseline method based on sample splitting. Experiments on synthetic data, text-to-SQL decoding with large language models, and telecom network performance evaluation demonstrate that PS-DME enables reliable comparison of candidate configurations across a range of reliability levels, supporting the statistically reliable exploration of performance--reliability trade-offs.

2603.12278 2026-05-08 q-bio.OT cs.AI cs.LG

Unsupervised Anomaly Detection in Wearable Foot Sensor Data: A Baseline Feasibility Study Towards Diabetic Foot Ulcer Prevention

Md Tanvir Hasan Turja

Affiliations Department of Computer Science, Middlesex University London

AI summary This paper studies the use of unsupervised algorithms to detect anomalies in wearable foot sensor data, analyzing temperature and pressure signals to establish a baseline model for assessing the feasibility of diabetic foot ulcer prevention.

Comments 36 pages, 19 figures. Published in Biomedical Signal Processing and Control, Vol. 123, Part A, 110416, September 2026. https://doi.org/10.1016/j.bspc.2026.110416

Journal ref Biomedical Signal Processing and Control, Vol. 123, Part A, 110416 (2026)

Abstract

Diabetic foot ulcers (DFUs) are a severe complication of diabetes associated with significant morbidity, amputation risk, and healthcare burden. Developing effective continuous monitoring frameworks requires first establishing reliable baseline models of normal foot biomechanics. This paper presents a feasibility study of an anomaly detection framework applied to time-series data from wearable foot sensors, specifically NTC thin-film thermocouples for temperature and FlexiForce A401 pressure sensors for plantar load monitoring. Data were collected from healthy adult subjects across 312 capture sessions on an instrumented pathway, generating 93,790 valid multi-sensor readings spanning September 2023 to June 2024. Two unsupervised algorithms, Isolation Forest and K-Nearest Neighbors using Local Outlier Factor (KNN/LOF), were applied to detect statistical deviations in foot temperature and pressure signals. Results show that Isolation Forest is more sensitive to subtle, distributed anomalies, while KNN/LOF identifies concentrated extreme deviations but flags a higher proportion of sessions not corroborated by Isolation Forest. Since no clinical ground truth is available, this difference is interpreted as lower specificity under the shared 5 percent contamination assumption rather than a confirmed false-positive rate. A mild positive correlation (0.41-0.48) between pressure and temperature features supports the case for combined multi-modal monitoring. These findings establish a validated baseline analytical pipeline and provide a methodological foundation for future clinical validation studies involving diabetic patients, where the relationship between detected anomalies and DFU-related pathophysiology can be directly assessed.
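The comparison between the two detectors under a shared contamination level can be sketched as follows. This is a simplified illustration and not the study's pipeline: it assumes each detector has already produced one anomaly score per session, flags the top 5 percent by rank, and then measures how many sessions flagged by one detector are not flagged by the other. The scores and function names are hypothetical.

```python
import math

def flag_top_fraction(scores, contamination=0.05):
    """Return the ids of the sessions whose score falls in the top `contamination` share."""
    n_flag = max(1, math.floor(contamination * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_flag])

def uncorroborated_share(flags_a, flags_b):
    """Share of sessions flagged by detector A that detector B did not flag."""
    if not flags_a:
        return 0.0
    return len(flags_a - flags_b) / len(flags_a)

# Hypothetical, deterministic scores for 312 sessions from two detectors.
n_sessions = 312
scores_forest = {i: (i * 37) % n_sessions for i in range(n_sessions)}
scores_lof = {i: (i * 41) % n_sessions for i in range(n_sessions)}

flags_forest = flag_top_fraction(scores_forest)
flags_lof = flag_top_fraction(scores_lof)
share = uncorroborated_share(flags_lof, flags_forest)
```

Because both detectors flag the same number of sessions by construction, a high uncorroborated share reflects disagreement about which sessions are unusual. Without clinical labels it cannot be read as a false-positive rate, which matches the caution in the abstract.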

2603.12031 2026-05-08 cs.DC cs.LG cs.MA

AGMARL-DKS: An Adaptive Graph-Enhanced Multi-Agent Reinforcement Learning for Dynamic Kubernetes Scheduling

Hamed Hamzeh

Affiliations University of Westminster Computer Science; University of Westminster

AI summary This paper proposes AGMARL-DKS, which addresses dynamic Kubernetes scheduling through multi-agent reinforcement learning, introducing a graph neural network and a stress-aware policy to improve fault tolerance and resource utilisation.

Abstract

State-of-the-art cloud-native applications require intelligent schedulers that can effectively balance system stability, resource utilisation, and associated costs. While Kubernetes provides feasibility-based placement by default, recent research efforts have explored the use of reinforcement learning (RL) for more intelligent scheduling decisions. However, current RL-based schedulers have three major limitations. First, most of these schedulers use monolithic centralised agents, which are non-scalable for large heterogeneous clusters. Second, the ones that use multi-objective reward functions assume simple, static, linear combinations of the objectives. Third, no previous work has produced a stress-aware scheduler that can react adaptively to dynamic conditions. To address these gaps in current research, we propose the Adaptive Graph-enhanced Multi-Agent Reinforcement Learning Dynamic Kubernetes Scheduler (AGMARL-DKS). AGMARL-DKS addresses these gaps by introducing three major innovations. First, we construct a scalable solution by treating the scheduling challenge as a cooperative multi-agent problem, where every cluster node operates as an agent, employing centralised training methods before decentralised execution. Second, to be context-aware and yet decentralised, we use a Graph Neural Network (GNN) to build a state representation of the global cluster context at each agent. This represents an improvement over methods that rely solely on local observations. Finally, to make trade-offs between these objectives, we use a stress-aware lexicographical ordering policy instead of a simple, static linear weighting of these objectives. The evaluations in Google Kubernetes Engine (GKE) reveal that AGMARL-DKS significantly outperforms the default scheduler in terms of fault tolerance, utilisation, and cost, especially in scheduling batch and mission-critical workloads.
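The difference between a static linear weighting and a stress-aware lexicographic ordering can be shown with a small sketch. The objective names, the two priority orders, and the node values below are hypothetical; the abstract does not specify how AGMARL-DKS defines stress or orders its objectives.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeScore:
    name: str
    fault_risk: float        # lower is better
    utilisation_gap: float   # lower is better
    cost: float              # lower is better

def linear_choice(nodes, weights=(1.0, 1.0, 1.0)):
    """Static trade-off: minimise a fixed weighted sum of the objectives."""
    w_fault, w_util, w_cost = weights
    return min(nodes, key=lambda n: w_fault * n.fault_risk
               + w_util * n.utilisation_gap + w_cost * n.cost)

def lexicographic_choice(nodes, stressed):
    """Compare objectives in strict priority order; the order depends on cluster stress."""
    if stressed:
        # Assumed order under stress: reliability first, then utilisation, then cost.
        return min(nodes, key=lambda n: (n.fault_risk, n.utilisation_gap, n.cost))
    # Assumed order in calm conditions: cost first.
    return min(nodes, key=lambda n: (n.cost, n.utilisation_gap, n.fault_risk))

nodes = [
    NodeScore("node-a", fault_risk=0.30, utilisation_gap=0.10, cost=1.0),
    NodeScore("node-b", fault_risk=0.05, utilisation_gap=0.40, cost=3.0),
    NodeScore("node-c", fault_risk=0.20, utilisation_gap=0.20, cost=2.0),
]

calm_pick = lexicographic_choice(nodes, stressed=False).name
stressed_pick = lexicographic_choice(nodes, stressed=True).name
linear_pick = linear_choice(nodes).name
```

With these values the weighted sum always prefers the cheapest node, whereas the lexicographic rule switches to the most reliable node once the cluster is marked as stressed. That switch is the kind of adaptive behaviour a fixed weighting cannot express without retuning.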

2603.02950 2026-05-08 cs.CY cs.AI cs.GT

Path Dependence under Adaptive AI Delegation

Lingxiao Huang, Nisheeth K. Vishnoi

Affiliations Nanjing University; Yale University

AI summary This study examines how adaptive AI delegation dynamically affects long-run skill and the capacity for independent work, revealing path dependence and a coupling between skill change and the tendency to rely on AI.

Abstract

Repeated AI assistance can improve immediate task performance while reducing the skill available for future independent work. We develop a mathematical framework for this long-run tradeoff. The model tracks two state variables: a latent human skill level governing expected independent performance, and a delegation level representing the learner's evolving tendency to rely on AI. Skill changes through error-driven learning under practice and decay under delegation; delegation responds to observed performance, increasing when AI-assisted work appears to outperform independent work. We analyze the resulting dynamics and contrast them with fixed delegation. With fixed delegation, skill follows a one-dimensional learning-decay process with a single stable equilibrium. With adaptive delegation, the coupled system has two attracting equilibria separated by the stable manifold of an interior saddle. The existence and geometry of this separatrix require a global phase-plane analysis of the coupled dynamics. The system is path-dependent: small differences in initial skill or reliance can lead to different long-run outcomes. We use this characterization to show that AI assistance can improve short-run performance while producing worse long-run performance than a no-AI baseline. Increasing AI capability can enlarge the basin of attraction of the low-skill equilibrium, making delegation appear beneficial for longer while increasing the risk of eventual skill loss. The qualitative picture is observed to persist across alternative specifications. Together, these results show that the risk is not AI assistance itself, but the coupling between performance-driven reliance and use-dependent skill change.
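The qualitative behaviour described in the abstract can be reproduced with a toy system. The equations below are an assumed illustration and not the paper's model: skill grows through practice in proportion to `(1 - delegation)`, decays in proportion to `delegation`, and delegation follows a logistic rule that rises whenever a fixed AI quality exceeds current skill. All parameter names and values are hypothetical.

```python
def simulate(skill, delegation, ai_quality=0.6, learn=1.0, decay=1.0,
             adapt=2.0, dt=0.01, steps=5000, adaptive=True):
    """Forward-Euler integration of a toy skill/delegation system on [0, 1] x [0, 1]."""
    for _ in range(steps):
        # Learning under practice minus decay under delegation.
        d_skill = learn * (1 - delegation) * (1 - skill) - decay * delegation * skill
        # Reliance grows when AI-assisted work looks better than independent work.
        d_deleg = (adapt * delegation * (1 - delegation) * (ai_quality - skill)
                   if adaptive else 0.0)
        skill += dt * d_skill
        delegation += dt * d_deleg
    return skill, delegation

# Fixed delegation: different starting skills reach the same equilibrium.
fixed_from_high = simulate(0.9, 0.5, adaptive=False)
fixed_from_low = simulate(0.1, 0.5, adaptive=False)

# Adaptive delegation: the starting point decides the long-run outcome.
skilled_start = simulate(0.9, 0.1)
reliant_start = simulate(0.2, 0.8)
```

In this toy system the fixed-delegation runs both settle at a skill of 0.5, while the adaptive runs diverge: one ends near full skill with no delegation, the other near zero skill with full delegation. The two starting points were chosen well inside each basin; locating the separatrix itself would need the phase-plane analysis the abstract refers to.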

2603.01192 2026-05-08 stat.ML cs.LG

A Basin-Selection Perspective on Grokking via Singular Learning Theory

Ben Cullen, Sergio Estan-Ruiz, Riya Danait, Jiayi Li

Affiliations Department of Computer Science; Department of Mathematics; Mathematical Institute; University of Pisa; Imperial College London; University of Oxford; Section of Mathematics and Artificial Intelligence; Max Planck Institute of Molecular Cell Biology and Genetics; Center for Systems Biology; Faculty of Mathematics; TU Dresden

AI summary This paper studies grokking through the lens of singular learning theory, analyzing the geometry of the loss landscape to examine the mechanism behind the transition from memorization to generalisation, and derives analytic formulas for the local learning coefficient in shallow quadratic networks.

Abstract

Grokking, the abrupt transition from memorization to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterizes the geometry of the loss landscape. The key measure is the local learning coefficient (LLC) which quantifies the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging SLT, we develop a basin-selection perspective on grokking in quadratic networks: LLC ranks competing near-zero-loss basins by statistical preference, while the training-time transition between them is governed by optimisation dynamics. In this view, grokking corresponds to a transition from a higher-LLC (memorising) basin to a lower-LLC (generalising) basin that dominates the posterior. To support this, we derive analytic formulas for the LLC in shallow quadratic networks under both lazy and feature learning regimes. Empirically, we demonstrate that LLC trajectories estimated from training data track the onset of generalisation and provide an informative probe of the optimisation path.

2603.00113 2026-05-08 cs.MA cs.AI cs.CE cs.CY cs.SI

AI Agents Alone Are Not (Yet) Sufficient for Social Simulation

Yiming Li, Dacheng Tao

Affiliations College of Computing and Data Science, Nanyang Technological University, Singapore

AI summary This paper argues that LLM agents alone are insufficient to reproduce realistic social dynamics and proposes a unified framework that accounts for environment interaction and scheduling mechanisms.

Comments 16 pages

Abstract

Recent advances in large language models (LLMs) have spurred growing interest in using LLM-integrated agents for social simulation, often under the implicit assumption that realistic population dynamics will emerge once role-specified agents are placed in a networked multi-agent setting. This position paper argues that LLM-based agents alone are not (yet) sufficient for social simulation. We attribute this over-optimism to a systematic mismatch between what current agent pipelines are typically optimized and validated to produce and what simulation-as-science requires. Concretely, role-playing plausibility does not imply faithful human behavioral validity; collective outcomes are frequently mediated by agent-environment co-dynamics rather than agent-agent messaging alone; and results can be dominated by interaction protocols, scheduling, and initial information priors. To make these underlying mechanisms explicit and auditable, we propose a unified formulation of AI agent-based social simulation as an environment-involved Markov game with explicit exposure and scheduling mechanisms, from which we derive concrete actions for design, evaluation, and interpretation.

2602.12805 2026-05-08 physics.med-ph cs.SD eess.IV

A Wavefield Correlation Approach to Improve Sound Speed Estimation in Ultrasound Autofocusing

Louise Zhuang, Samuel Beuret, Ben Frey, Saachi Munot, Walter Simson, Dongwoon Hyun, Jeremy J. Dahl

Affiliations Department of Electrical Engineering, Stanford University; Department of Radiology, School of Medicine, Stanford University; Department of Applied Physics, Stanford University; Department of Biomedical Engineering, Columbia University

AI summary This paper proposes using wavefield correlation to improve sound speed estimation, and thereby image quality, in ultrasound autofocusing. The wavefield correlation beamformer reduces clutter and improves resolution and contrast.

Abstract

In pulse-echo ultrasound, aberration often degrades image quality when beamforming does not account for wavefront distortions. To address this issue, local sound speed estimators have been developed in the past decade for distributed aberration correction. Recently, methods based on iterative optimization have improved sound speed accuracy with respect to earlier approaches. However, the accuracy of these newer methods is limited by media with reverberation clutter and by the straight-ray model of wave propagation. To address these challenges, we propose using wavefield correlation (WFC) beamforming when performing sound speed optimization. WFC, an ultrasound adaptation of reverse time migration, correlates simulated forward-propagated transmit wavefields and backwards-propagated receive wavefields in order to reconstruct images. This process more accurately models wave propagation in heterogeneous media and can decrease diffuse clutter due to its spatiotemporal matched filtering effect. We implement herein a WFC beamformer using an auto-differentiation software and estimate the sound speed map by optimizing a regularized common-midpoint phase focusing criterion using gradient descent. This approach is compared to a previous method relying on delay and sum (DAS) with straight-ray time delay calculations on a variety of simulated, phantom, and in vivo data with large sound speed variations and clutter. Results show that using WFC decreases sound speed estimation error, leading to improvements in resolution and contrast in the corrected image. In particular, these promising results have potential to improve pulse-echo imaging for challenging clinical scenarios.

2602.08318 2026-05-08 stat.ML cs.LG nlin.CD

Is Flow Matching Just Trajectory Replay for Sequential Data?

Soon Hoe Lim, Shizheng Lin, Michael W. Mahoney, N. Benjamin Erichson

Affiliations Department of Mathematics; Nordita, KTH Royal Institute of Technology and Stockholm University; Department of Statistics; International Computer Science Institute; Lawrence Berkeley National Laboratory

AI summary This paper asks whether flow matching learns transferable dynamical structure or merely performs effective trajectory replay. By deriving the velocity field in the limit of perfect function approximation, it shows that neural flow matching models are parametric surrogates of a nonparametric solution and suggests approximation schemes for robust ODE-based generation.

Comments 56 pages

Abstract

Flow matching (FM) is increasingly used in scientific domains for time series generation and forecasting, where data often arise from underlying dynamical systems. However, it is not well-understood whether it learns transferable dynamical structure or simply performs an effective "trajectory replay". We study this question by deriving the velocity field targeted by the empirical FM objective on sequential data in the limit of perfect function approximation. For the Gaussian conditional paths commonly used in practice, we show that the implied sampler is an ODE whose dynamics constitutes a nonparametric, memory-augmented continuous-time dynamical system. The optimal field admits a closed-form expression as a similarity-weighted mixture of instantaneous velocities induced by observed transitions, making the dataset dependence explicit and interpretable. This characterization positions neural FM models as parametric surrogates of an ideal nonparametric solution and suggests practical approximation schemes for robust ODE-based generation. As a byproduct of our analysis, the resulting closed-form sampler, FreeFM, provides strong probabilistic forecasts on nonlinear dynamical system benchmarks directly from historical transitions, without training.
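The idea of a velocity field built as a similarity-weighted mixture of observed transition velocities can be sketched in one dimension. This is a generic kernel-weighted (Nadaraya-Watson style) illustration and not the paper's closed-form expression or its FreeFM sampler: the Gaussian weights, the `bandwidth` parameter, and the finite-difference velocities are all assumptions.

```python
import math

def kernel_velocity(x, states, next_states, dt=1.0, bandwidth=0.5):
    """Similarity-weighted average of the velocities of stored transitions (1-D)."""
    velocities = [(nxt - cur) / dt for cur, nxt in zip(states, next_states)]
    logits = [-((x - cur) ** 2) / (2 * bandwidth ** 2) for cur in states]
    top = max(logits)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(value - top) for value in logits]
    return sum(w * v for w, v in zip(weights, velocities)) / sum(weights)

def rollout(x0, states, next_states, steps, dt=1.0, bandwidth=0.5):
    """Euler integration of the ODE dx/dt = kernel_velocity(x)."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + dt * kernel_velocity(path[-1], states, next_states,
                                                    dt, bandwidth))
    return path

# Hypothetical history: one observed trajectory of x_{t+1} = 0.5 * x_t.
states = [8.0, 4.0, 2.0, 1.0]
next_states = [4.0, 2.0, 1.0, 0.5]

velocity_near_three = kernel_velocity(3.0, states, next_states)
forecast = rollout(3.0, states, next_states, steps=3)
```

The sampler needs no training, only stored transitions, which makes the dependence on the dataset explicit. With a very small bandwidth the field follows the nearest stored transition almost exactly, which is the "trajectory replay" end of the spectrum the abstract asks about.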

2602.07633 2026-05-08 stat.ML cs.LG stat.ME

Flow-Based Conformal Predictive Distributions

基于流的符合预测分布

Trevor Harris

发表机构 * Department of Statistics(统计学系) University of Connecticut(康涅狄格大学) Storrs, CT 06269(斯托尔斯,CT 06269)

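AI summary This paper proposes a flow-based, training-free method for efficiently sampling conformal prediction boundaries in arbitrary dimensions. Mixing across confidence levels yields conformal predictive distributions, and the approach is evaluated in several application domains.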

Comments 9 pages, 15 figures, 20 appendix pages

详情
AI中文摘要

符合预测提供了一个无分布框架,通过具有精确有限样本覆盖的预测集进行不确定性量化。在低维情况下这些集容易解释,但在高维或结构化输出空间中难以表示和使用,限制了其与下游任务如采样和概率预测的整合。我们证明任何足够正则的可微非符合分数诱导输出空间上的确定性流,其轨迹收敛到相应符合预测集的边界。这导致了一种计算高效、无需训练的方法,用于在任意维度中采样符合边界。跨置信水平混合产生符合预测分布,其分位数区域与经验符合预测集一致。我们提供了一个近似界,将CPD预测误差分解为分数诱导的扭曲、基础测度质量以及梯度流诱导的扭曲。我们在PDE逆问题、降水下缩、气候模型去偏差和飓风轨迹预测中评估了该方法。

英文摘要

Conformal prediction provides a distribution-free framework for uncertainty quantification via prediction sets with exact finite-sample coverage. In low dimensions these sets are easy to interpret, but in high-dimensional or structured output spaces they are difficult to represent and use, which can limit their ability to integrate with downstream tasks such as sampling and probabilistic forecasting. We show that any sufficiently regular differentiable nonconformity score induces a deterministic flow on the output space whose trajectories converge to the boundary of the corresponding conformal prediction set. This leads to a computationally efficient, training-free method for sampling conformal boundaries in arbitrary dimensions. Mixing across confidence levels yields conformal predictive distributions whose quantile regions coincide with the empirical conformal prediction sets. We provide an approximation bound decomposing CPD predictive error into score-induced distortion, base-measure quality, and gradient flow-induced distortion. We evaluate the approach on PDE inverse problems, precipitation downscaling, climate model debiasing, and hurricane trajectory forecasting.
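
As a rough illustration of a score-induced flow that converges to a conformal boundary, the sketch below pairs the standard split-conformal threshold with one possible flow, dy/dt = -(s(y) - q) ∇s(y). This particular vector field is an assumption chosen for simplicity and is not necessarily the flow constructed in the paper; the helper names are hypothetical.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this level
    return np.sort(cal_scores)[k - 1]

def flow_to_boundary(y0, score, grad_score, q, step=0.1, n_steps=200):
    """Euler integration of dy/dt = -(s(y) - q) * grad s(y).
    Along this flow ds/dt = -(s - q) * ||grad s||^2, so s(y) is driven toward q."""
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        y = y - step * (score(y) - q) * grad_score(y)
    return y

# Toy score: Euclidean distance to a point prediction mu (assumed, for illustration).
mu = np.zeros(2)
score = lambda y: np.linalg.norm(y - mu)
grad_score = lambda y: (y - mu) / np.linalg.norm(y - mu)  # undefined at y == mu
boundary_point = flow_to_boundary([3.0, 4.0], score, grad_score, q=2.0)
```

With this toy score the conformal set is a ball of radius q around mu, and the flow moves any starting point other than mu radially onto its surface. Running it from many random starts would give samples on the boundary without any training.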

2602.03258 2026-05-08 stat.ML cs.LG

Principled Federated Random Forests for Heterogeneous Data

原理化的联邦随机森林用于异质数据

Rémi Khellaf, Erwan Scornet, Aurélien Bellet, Julie Josse

Affiliations * Inria; PreMeDICaL; Inserm; University of Montpellier; Sorbonne Université; Université Paris Cité; CNRS; LPSM

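AI summary This paper proposes FedForest, a federated random forest algorithm for horizontally partitioned data that accommodates client data heterogeneity, approximates the splits of a centralized algorithm by aggregating carefully chosen client statistics, and enables a non-parametric form of personalization.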

详情
AI中文摘要

随机森林(RF)是用于集中表格数据最强大且广泛应用的预测模型之一,但很少有方法能将其适应到联邦学习设置中。不同于大多数联邦学习方法,随机森林的分段常数性质阻止了精确的梯度优化。因此,现有的联邦随机森林实现依赖于不严谨的启发式方法:例如,聚合在客户端独立训练的决策树无法优化全局纯度标准,即使在简单的分布偏移下也是如此。我们提出FedForest,一种新的联邦随机森林算法,适用于水平分割的数据,能够自然适应各种客户端数据异质性,从协变量偏移到更复杂的结果偏移机制。我们证明,基于聚合精心选择的客户端统计信息的分裂过程,能近似集中算法所选的分裂。此外,FedForest允许在客户端指示符上进行分裂,实现一种不存在于先前联邦随机森林方法中的非参数化形式的个性化。实证上,我们证明,所得到的联邦森林在异质基准上接近集中性能,同时保持通信高效。

英文摘要

Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike most federated learning approaches, the piecewise-constant nature of RF prevents exact gradient-based optimization. As a result, existing federated RF implementations rely on unprincipled heuristics: for instance, aggregating decision trees trained independently on clients fails to optimize the global impurity criterion, even under simple distribution shifts. We propose FedForest, a new federated RF algorithm for horizontally partitioned data that naturally accommodates diverse forms of client data heterogeneity, from covariate shift to more complex outcome shift mechanisms. We prove that our splitting procedure, based on aggregating carefully chosen client statistics, closely approximates the split selected by a centralized algorithm. Moreover, FedForest allows splits on client indicators, enabling a non-parametric form of personalization that is absent from prior federated random forest methods. Empirically, we demonstrate that the resulting federated forests closely match centralized performance across heterogeneous benchmarks while remaining communication-efficient.
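
The idea of choosing a split from aggregated client statistics can be sketched for a single regression feature. The sketch assumes that all clients share a fixed grid of candidate thresholds and send per-threshold counts, sums, and sums of squares; the statistics, function names, and protocol here are illustrative assumptions, not FedForest's actual procedure.

```python
import numpy as np

def client_stats(x, y, thresholds):
    """Per-client statistics (count, sum, sum of squares of y) left of each threshold."""
    left = x[None, :] <= thresholds[:, None]  # [n_thresholds, n_samples]
    stats_left = np.stack(
        [left.sum(1), (left * y).sum(1), (left * y**2).sum(1)], axis=1
    )
    total = np.array([len(y), y.sum(), (y**2).sum()])
    return stats_left, total

def sse(count, s, s2):
    """Sum of squared errors around the mean, computed from the three statistics."""
    return np.where(count > 0, s2 - s**2 / np.maximum(count, 1), 0.0)

def federated_best_split(all_stats, thresholds):
    """Server side: add up client statistics, then minimize left + right SSE."""
    left = sum(s for s, _ in all_stats)
    total = sum(t for _, t in all_stats)
    right = total[None, :] - left
    cost = sse(left[:, 0], left[:, 1], left[:, 2]) + sse(right[:, 0], right[:, 1], right[:, 2])
    return thresholds[int(np.argmin(cost))]
```

Because counts, sums, and sums of squares add across clients, the server recovers the same variance-based impurity a centralized tree would compute on the pooled data for these thresholds, without seeing any raw sample. Averaging trees trained independently on each client has no such property.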

2601.19886 2026-05-08 econ.GN cs.AI cs.CY cs.GT q-fin.EC

AI Cap-and-Trade: Efficiency Incentives for Accessibility and Sustainability

AI配额交易:为可及性和可持续性提升效率激励

Marco Bornstein, Amrit Singh Bedi

发表机构 * Independent Researcher(独立研究者) University of Central Florida(佛罗里达中央大学)

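AI summary This paper argues for market-based mechanisms that incentivize AI efficiency, reducing emissions and opening opportunities for academics and smaller companies, and proposes a cap-and-trade system for AI.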

Comments 22 pages, 2 figures. Accepted as a position paper at ICML 2026

详情
AI中文摘要

争夺人工智能(AI)主导地位的竞赛往往更重视规模而非效率。超大规模扩张是行业的常见做法:更大的模型、更多的数据以及尽可能多的计算资源。使用更多资源是提升AI性能的更简单路径,因此效率被弱化了。随之而来的是,对昂贵计算资源的需求使学术界和中小企业被边缘化。同时,AI使用量的增长带来能源消耗的增加,导致环境成本不断上升。为应对可及性和可持续性问题,我们主张研究并实施基于市场的机制,以激励AI效率。我们认为,激励高效的操作和方法将减少排放,同时为学术界和中小企业创造新机会。作为行动呼吁,我们提出AI配额交易制度。我们的系统可证明地减少AI部署的计算量,从而降低排放,并将效率变现,使学术界和中小企业受益。

英文摘要

The race for artificial intelligence (AI) dominance often prioritizes scale over efficiency. Hyper-scaling is the common industry approach: larger models, more data, and as many computational resources as possible. Using more resources is a simpler path to improved AI performance. Thus, efficiency has been de-emphasized. Consequently, the need for costly computational resources has marginalized academics and smaller companies. Simultaneously, increased energy expenditure, due to growing AI use, has led to mounting environmental costs. In response to accessibility and sustainability concerns, we argue for research into, and implementation of, market-based methods that incentivize AI efficiency. We believe that incentivizing efficient operations and approaches will reduce emissions while opening new opportunities for academics and smaller companies. As a call to action, we propose a cap-and-trade system for AI. Our system provably reduces computations for AI deployment, thereby lowering emissions and monetizing efficiency to the benefit of academics and smaller companies.

2512.09538 2026-05-08 stat.ML cs.CL cs.LG

Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search

不要抛弃你的光束:通过光束搜索改进大语言模型中的基于一致性不确定性的方法

Ekaterina Fadeeva, Maiya Goloburda, Aleksandr Rubashevskii, Roman Vashurin, Artem Shelmanov, Preslav Nakov, Mrinmaya Sachan, Maxim Panov

Affiliations * ETH Zurich; MBZUAI

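AI summary This paper uses beam search to generate candidates for consistency-based uncertainty quantification in large language models, reducing variance and improving performance. Experiments on six QA datasets show state-of-the-art UQ performance.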

详情
AI中文摘要

基于一致性的方法已成为大语言模型中不确定性量化(UQ)的有效方法。这些方法通常依赖于通过多项式采样获得的多个生成,测量其一致性水平。然而,在短格式问答中,多项式采样容易由于尖峰分布产生重复,其随机性引入了不确定性估计在不同运行中的显著方差。我们引入了一种新的方法家族,利用光束搜索生成一致性UQ的候选,相比多项式采样,实现了更好的性能和更小的方差。我们还提供了光束集概率质量的理论下限,表明在该下限下,光束搜索的误差比多项式采样更小。我们实验证明了我们的方法在六个问答数据集上的表现,发现其对多项式采样的一致性改进导致了最先进的UQ性能。

英文摘要

Consistency-based methods have emerged as an effective approach to uncertainty quantification (UQ) in large language models. These methods typically rely on several generations obtained via multinomial sampling, measuring their agreement level. However, in short-form QA, multinomial sampling is prone to producing duplicates due to peaked distributions, and its stochasticity introduces considerable variance in uncertainty estimates across runs. We introduce a new family of methods that employ beam search to generate candidates for consistency-based UQ, yielding improved performance and reduced variance compared to multinomial sampling. We also provide a theoretical lower bound on the beam set probability mass under which beam search achieves a smaller error than multinomial sampling. We empirically evaluate our approach on six QA datasets and find that its consistent improvements over multinomial sampling lead to state-of-the-art UQ performance.
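
One way to picture consistency-based UQ over beam candidates is a probability-weighted agreement score. The sketch below is only an assumption about what such a score could look like: the weighting by normalized sequence probability, the exact-match similarity, and the function name are all hypothetical and are not taken from the paper.

```python
import math

def beam_consistency_uncertainty(candidates, similarity=None):
    """candidates: list of (text, sequence_log_prob) pairs, e.g. from beam search.
    Returns 1 minus the probability-weighted agreement with the top candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if similarity is None:
        # Hypothetical default: exact match after light normalization.
        similarity = lambda a, b: float(a.strip().lower() == b.strip().lower())
    max_lp = max(lp for _, lp in candidates)
    # Renormalize the probabilities over the candidate set (stable in log space).
    weights = [math.exp(lp - max_lp) for _, lp in candidates]
    z = sum(weights)
    top_text = max(candidates, key=lambda c: c[1])[0]
    agreement = sum(
        w * similarity(text, top_text) for (text, _), w in zip(candidates, weights)
    ) / z
    return 1.0 - agreement
```

Because beam search is deterministic for a fixed model and beam width, a score of this kind does not change from run to run, whereas a score built on multinomial samples does. In practice the exact-match similarity would likely be replaced by a semantic one such as an NLI or embedding model.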

2511.23022 2026-05-08 eess.SY cs.RO cs.SY math.OC

Approximation-Free Control Barrier Functions for Prescribed-Time Reach-Avoid of Unknown Systems

无近似控制屏障函数用于未知系统的预定时间到达-避免

Shubham Sawarkar, Pushpak Jagtap

发表机构 * Indian Institute of Science (IISc)(印度科学研究所(IISc))

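AI summary This paper proposes a control barrier function approach that requires neither online model learning nor uncertainty bound estimation: a virtual system generates a safe reference, achieving prescribed-time reach-avoid control of unknown systems among moving obstacles.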

详情
AI中文摘要

我们研究了非线性系统在动态障碍物环境中的预定时间到达-避免(PT-RA)控制问题。与基于鲁棒或学习的控制屏障函数(CBF)方法不同,所提出的框架无需在线模型学习或不确定性界估计。通过一个简单的虚拟系统求解基于控制屏障函数的二次规划(CBF-QP)以生成满足PT-RA条件的安全参考,该参考针对时间变化的紧缩障碍和目标集。利用无近似反馈律将真实系统限制在参考周围的虚拟约束区(VCZ)内。这种构造在未知动态和动态约束下保证了实时安全性和预定时间目标可达性,而无需显式模型识别或离线预计算。仿真结果展示了可靠的动态障碍物避让和及时收敛到目标集。

英文摘要

We study the prescribed-time reach-avoid (PT-RA) control problem for nonlinear systems with unknown dynamics operating in environments with moving obstacles. Unlike robust or learning-based Control Barrier Function (CBF) methods, the proposed framework requires neither online model learning nor uncertainty bound estimation. A CBF-based Quadratic Program (CBF-QP) is solved on a simple virtual system to generate a safe reference satisfying PT-RA conditions with respect to time-varying, tightened obstacle and goal sets. The true system is confined to a Virtual Confinement Zone (VCZ) around this reference using an approximation-free feedback law. This construction guarantees real-time safety and prescribed-time target reachability under unknown dynamics and dynamic constraints without explicit model identification or offline precomputation. Simulation results illustrate reliable dynamic obstacle avoidance and timely convergence to the target set.
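
A CBF-QP on a simple virtual system can be sketched in closed form when there is a single constraint. The example assumes a single-integrator virtual system x_dot = u, one static circular obstacle, and a linear class-K term alpha * h. The paper's time-varying tightened sets and prescribed-time conditions are not modeled, and the function name is hypothetical.

```python
import numpy as np

def cbf_qp_single_integrator(x, u_nom, center, radius, alpha=1.0):
    """Solve min ||u - u_nom||^2 subject to grad_h(x) . u >= -alpha * h(x)
    for x_dot = u and h(x) = ||x - center||^2 - radius^2 (safe when h >= 0)."""
    x, u_nom, center = (np.asarray(v, dtype=float) for v in (x, u_nom, center))
    h = np.dot(x - center, x - center) - radius**2
    a = 2.0 * (x - center)                # gradient of h
    slack = np.dot(a, u_nom) + alpha * h  # constraint value at the nominal input
    if slack >= 0.0:
        return u_nom                      # nominal input already satisfies the CBF condition
    if not np.any(a):
        raise ValueError("gradient vanishes at the obstacle center")
    # A QP with one linear constraint has a closed-form solution:
    # project u_nom onto the half-space boundary a . u = -alpha * h.
    return u_nom - slack * a / np.dot(a, a)
```

For example, at x = (-2, 0) with a unit obstacle at the origin and a nominal input (1, 0) heading straight at it, the filter returns (0.75, 0), which meets the constraint with equality. A reference generated this way stays outside the obstacle, and a separate tracking law would then keep the true system near that reference.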

2511.08416 2026-05-08 eess.SP cs.IT cs.LG cs.MM math.IT

Generative AI Meets 6G and Beyond: Diffusion Models for Semantic Communications

生成AI遇见6G与未来:扩散模型在语义通信中的应用

Hai-Long Qin, Jincheng Dai, Guo Lu, Shuo Shao, Sixian Wang, Tongda Xu, Wenjun Zhang, Ping Zhang, Khaled B. Letaief

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Shanghai Jiao Tong University(上海交通大学) East China Normal University(华东师范大学) Tsinghua University(清华大学) Hong Kong University of Science and Technology(香港科技大学)

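AI summary This tutorial examines the combination of generative AI and 6G communications, covering diffusion models for semantic communications and reviewing core methods for controllable generation, efficient inference, and cross-domain adaptation.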

Comments Accepted by IEEE COMST, GitHub repository: https://github.com/qin-jingyun/Awesome-DiffComm, project page: https://qin-jingyun.github.io/Awesome-DiffComm

详情
AI中文摘要

语义通信标志着从位精确传输向以意义为中心的通信转变,对于无线系统接近理论容量极限至关重要。生成AI的出现推动了生成语义通信的发展,接收端通过利用学习先验知识从最小的语义线索中重建内容。在生成方法中,扩散模型因其卓越的生成质量、稳定的训练动态和严谨的理论基础而突出。然而,目前缺乏将扩散技术与通信系统设计系统联系起来的指导,迫使研究人员在分散的文献中导航。本文首次全面介绍了扩散模型在生成语义通信中的教程。我们介绍了基于得分的扩散基础,并系统回顾了三个技术支柱:条件扩散用于可控生成、高效扩散用于加速推理、以及通用扩散用于跨域适应。此外,我们引入了逆问题视角,将语义解码重新表述为后验推断,将语义通信与计算成像联系起来。通过分析以人类为中心、以机器为中心和以代理为中心的场景,我们展示了扩散模型如何在极端压缩的同时保持语义保真度和鲁棒性。通过将生成AI创新与通信系统设计结合,本文旨在将扩散模型确立为下一代无线网络及未来的基础组件。

英文摘要

Semantic communications mark a paradigm shift from bit-accurate transmission toward meaning-centric communication, essential as wireless systems approach theoretical capacity limits. The emergence of generative AI has catalyzed generative semantic communications, where receivers reconstruct content from minimal semantic cues by leveraging learned priors. Among generative approaches, diffusion models stand out for their superior generation quality, stable training dynamics, and rigorous theoretical foundations. However, the field currently lacks systematic guidance connecting diffusion techniques to communication system design, forcing researchers to navigate disparate literatures. This article provides the first comprehensive tutorial on diffusion models for generative semantic communications. We present score-based diffusion foundations and systematically review three technical pillars: conditional diffusion for controllable generation, efficient diffusion for accelerated inference, and generalized diffusion for cross-domain adaptation. In addition, we introduce an inverse problem perspective that reformulates semantic decoding as posterior inference, bridging semantic communications with computational imaging. Through analysis of human-centric, machine-centric, and agent-centric scenarios, we illustrate how diffusion models enable extreme compression while maintaining semantic fidelity and robustness. By bridging generative AI innovations with communication system design, this article aims to establish diffusion models as foundational components of next-generation wireless networks and beyond.

2510.23254 2026-05-08 stat.ML cs.LG math.ST stat.TH

Optimal In-context Adaptivity and Distributional Robustness of Transformers

Transformer的最优上下文适应性与分布鲁棒性

Tianyi Ma, Tengyao Wang, Richard J. Samworth

发表机构 * Statistical Laboratory, University of Cambridge(剑桥大学统计实验室) Department of Statistics, London School of Economics and Political Science(伦敦政治经济学院统计系)

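AI summary This paper studies pretrained Transformers evaluated on test distributions that differ from the pretraining prior, proving that they achieve the optimal rate of convergence for tasks of fixed difficulty and are robust to distribution shift.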

Comments 47 pages, 4 figures

详情
AI中文摘要

本文研究了上下文学习问题,其中Transformer在混合分布$π=\sum_{α∈\mathcal{A}} λ_α π_α$(称为预训练先验)上进行预训练,其中每个混合分量$π_α$是特定难度水平$α$上的任务分布。我们的目标是理解预训练Transformer在不同测试分布$μ$上的性能,$μ$由固定难度$β∈\mathcal{A}$的任务组成,并可能相对于$π_β$存在分布偏移,前提是chi-squared散度$χ^2(μ,π_β)$最大为$κ$。特别地,我们考虑了具有随机光滑性的非参数回归问题以及具有随机光滑性和随机有效维度的多索引模型。我们证明,预训练在足够数据上的大型Transformer能够实现对应难度水平$β$的最优收敛速率,这在chi-squared散度球内的测试分布$μ$上是均匀的。因此,预训练Transformer能够在更简单的任务上实现更快的收敛速率,并在测试时对分布偏移具有鲁棒性。最后,我们证明即使估计器可以访问测试分布$μ$,其在$μ$上的期望风险收敛速率也不能比我们的预训练Transformer更快,从而提供了比minimax下限更合适的最优性保证。

英文摘要

We study in-context learning problems where a Transformer is pretrained on tasks drawn from a mixture distribution $π=\sum_{α\in\mathcal{A}} λ_α π_α$, called the pretraining prior, in which each mixture component $π_α$ is a distribution on tasks of a specific difficulty level indexed by $α$. Our goal is to understand the performance of the pretrained Transformer when evaluated on a different test distribution $μ$, consisting of tasks of fixed difficulty $β\in\mathcal{A}$, and with potential distribution shift relative to $π_β$, subject to the chi-squared divergence $χ^2(μ,π_β)$ being at most $κ$. In particular, we consider nonparametric regression problems with random smoothness, and multi-index models with both random smoothness and random effective dimension. We prove that a large Transformer pretrained on sufficient data achieves the optimal rate of convergence corresponding to the difficulty level $β$, uniformly over test distributions $μ$ in the chi-squared divergence ball. Thus, the pretrained Transformer is able to achieve faster rates of convergence on easier tasks and is robust to distribution shift at test time. Finally, we prove that even if an estimator had access to the test distribution $μ$, the convergence rate of its expected risk over $μ$ could not be faster than that of our pretrained Transformers, thereby providing a more appropriate optimality guarantee than minimax lower bounds.
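
To see why a chi-squared divergence ball is a natural way to constrain distribution shift, recall the definition and a standard change-of-measure bound. This is a textbook consequence of the Cauchy-Schwarz inequality, given here for orientation and not as the paper's argument; $R$ denotes a generic nonnegative risk as a function of the task (a symbol introduced here for illustration), and $μ$ is assumed absolutely continuous with respect to $π_β$.

```latex
\chi^2(\mu, \pi_\beta)
  = \int \left( \frac{d\mu}{d\pi_\beta} - 1 \right)^2 d\pi_\beta
  = \int \left( \frac{d\mu}{d\pi_\beta} \right)^2 d\pi_\beta - 1,
\qquad
\mathbb{E}_{\mu}[R]
  = \mathbb{E}_{\pi_\beta}\!\left[ R \, \frac{d\mu}{d\pi_\beta} \right]
  \le \sqrt{\mathbb{E}_{\pi_\beta}[R^2]} \, \sqrt{1 + \chi^2(\mu, \pi_\beta)}
  \le \sqrt{(1 + \kappa) \, \mathbb{E}_{\pi_\beta}[R^2]}.
```

So a bound on the second moment of the risk under $π_β$ carries over to every $μ$ in the ball, at the cost of a factor depending only on $κ$.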

2510.01189 2026-05-08 cs.HC cs.AI

Beyond Value Elicitation: Towards Moral Profiles in Early Requirements Engineering via Role-Playing Games and Anthropologist LLMs

超越价值获取:通过角色扮演游戏和人类学家LLM实现早期需求工程中的道德画像

Gianluca De Ninno, Paola Inverardi, Francesca Belotti

发表机构 * Computer Science Area, Gran Sasso Science Institute(格兰萨索科学研究所计算机科学部门) Department of Computer Science, University of Pisa(比萨大学计算机科学系) Department of Humanities, University of L’Aquila(拉奎拉大学人文学系)

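AI summary This paper combines immersive role-playing games with LLM analysis to elicit and represent the moral profiles of digital system users, addressing the difficulty existing approaches have in capturing tacit, context-dependent values.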

详情
AI中文摘要

本研究提出了一种概念验证方法,通过结合沉浸式角色扮演游戏(RPGs)与大语言模型(LLM)分析,旨在需求工程(RE)中获取和表示数字系统用户道德画像。尽管现有方法依赖于预定义的价值分类和显式表达,但价值观往往隐性、情境依赖且难以直接表达。为解决这些限制,本文提出从获取离散道德价值观转向叙事重构和用户道德画像的表示。基于现象学和叙事人类学,该方法专注于捕捉用户道德取向,这些取向通过情境决策产生。RPG会话生成富含情境的叙事数据,然后由专门的LLM(GPT-A)分析以生成个体人类学道德画像(IAMPs)。基于模型输出与参与者在未见过的道德情境中的回应进行交叉比较的验证过程评估了生成表示的充分性。结果表明,RPG环境有效支持了生成丰富、情境依赖的数据以获取隐性价值观,且基于人类学的LLM能够将此类数据转化为用户道德画像的连贯叙事表示。这些表示使在给定领域内对用户偏好和价值观的上下文解释成为可能,当解释框架捕捉到行为、潜在动机和个体领域专业知识之间的关系时,性能得到提升。从需求工程角度看,该方法使在保持用户情境性和动态性的同时分析用户偏好和权衡,为在需求工程早期阶段整合人类道德价值观提供了基础。

英文摘要

This study presents a proof of concept for eliciting and representing the moral profiles of digital system users in Requirements Engineering (RE) by combining immersive role-playing games (RPGs) with large language model (LLM) analysis. While existing approaches rely on predefined value taxonomies and explicit articulation, values are often tacit, context-dependent, and difficult to express directly. To address these limitations, we propose moving from the elicitation of discrete moral values to the narrative reconstruction and representation of users' moral profiles. Grounded in phenomenological and narrative anthropology, the approach focuses on capturing users' moral orientations as they emerge through situated decision-making. RPG sessions generate context-rich narrative data, which are then analyzed by a specialized LLM (GPT-A) to produce individual anthropological moral profiles (IAMPs). A validation process based on cross-comparison between model outputs and participants' responses in unseen moral scenarios assesses the adequacy of the generated representations. Results indicate that RPG environments effectively support the generation of rich, context-dependent data for eliciting tacit values, and that an anthropologically grounded LLM can transform such data into coherent narrative representations of users' moral profiles. These representations enable the contextual interpretation of users' preferences and values within the given domain, with improved performance when interpretive framing captures relationships between actions, underlying motivations, and individual domain expertise. From an RE perspective, this approach enables the analysis of user preferences and trade-offs while preserving their situated and dynamic nature, providing a foundation for integrating human moral values into the early stages of RE.

2509.22126 2026-05-08 cs.CR cs.CV

Guidance Watermarking for Diffusion Models

扩散模型的引导水印技术

Enoal Gesny, Eva Giboulot, Teddy Furon, Vivien Chappelier

发表机构 * Inria(法国国家信息与自动化技术研究所) LABEL4.AI

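AI summary This paper proposes a watermarking method for diffusion models that guides generation with gradients from an off-the-shelf watermark decoder. Computing the gradient across image augmentations increases robustness to attacks and turns post-hoc watermarking schemes into in-generation embedding. The method complements techniques that modify the variational autoencoder, is validated on different diffusion models and detectors, and does not significantly alter the quality or diversity of generated images.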

详情
AI中文摘要

本文介绍了一种新型的扩散模型水印方法,该方法基于使用任何现成水印解码器计算的梯度来引导扩散过程。梯度计算涵盖不同的图像增强,从而提高对解码器原本不具鲁棒性的攻击的抵抗力,无需重新训练或微调。我们的方法有效地将任何后验水印方案转换为生成过程中的嵌入。我们展示了这种方法与在扩散过程结束时修改变分自编码器的水印技术互补。我们在不同的扩散模型和检测器上验证了这些方法。水印引导在给定种子和提示下不显著改变生成图像,从而保持生成的多样性和质量。

英文摘要

This paper introduces a novel watermarking method for diffusion models. It is based on guiding the diffusion process using the gradient computed from any off-the-shelf watermark decoder. The gradient computation encompasses different image augmentations, increasing robustness to attacks against which the decoder was not originally robust, without retraining or fine-tuning. Our method effectively converts any post-hoc watermarking scheme into an in-generation embedding along the diffusion process. We show that this approach is complementary to watermarking techniques modifying the variational autoencoder at the end of the diffusion process. We validate the methods on different diffusion models and detectors. The watermarking guidance does not significantly alter the generated image for a given seed and prompt, preserving both the diversity and quality of generation.

2508.15899 2026-05-08 astro-ph.CO astro-ph.GA astro-ph.IM cs.LG

CIGaRS I: Combined simulation-based inference from type Ia supernovae and host photometry

CIGaRS I: 从Ia型超新星和宿主光度观测联合模拟推断

Konstantin Karchev, Roberto Trotta, Raul Jimenez

Affiliations * Theoretical and Scientific Data Science, International School for Advanced Studies (SISSA), Trieste, Italy; Institute of Cosmos Sciences (ICC), University of Barcelona, Barcelona, Spain; Department of Physics, Imperial College London, London, UK; Institute for Fundamental Physics of the Universe (IFPU), Trieste, Italy; Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC), Casalecchio di Reno, Italy; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

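AI summary This paper presents a unified Bayesian hierarchical model that infers, from purely photometric observations, the dependence of type Ia supernova brightness on progenitor properties, the delay-time distribution, and cosmological parameters, and demonstrates the approach on simulated data.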

Comments published in Nature Astronomy

详情
AI中文摘要

利用Ia型超新星作为宇宙学探针需要经验性修正,这些修正与宿主环境相关。本文提出一个统一的贝叶斯分层模型,旨在从纯光度观测中推断Ia型超新星的固有亮度与其前身属性(金属丰度和年龄)的关系、延迟时间分布、宇宙学参数以及所有宿主的红移。该模型结合了基于物理的恒星形成和化学演化方案(Prospector-β)、银河系和超新星光的尘埃消光效应以及观测选择效应。通过模拟展示固有依赖性在金属丰度和年龄上有独特的观测特征,金属丰度在宿主恒星质量约为10^10M_太阳时模仿已知的光度步骤。随后展示从约16,000颗Ia型超新星及其宿主的模拟观测中利用神经模拟推断所有模型参数。联合物理方法提供了稳健且精确的光度红移(中位数散射约0.01)并提高了宇宙学约束,比分析小比例的光谱后续观测提高了约4倍。此方法解锁了光度数据的全部潜力,并为LSST时代的端到端模拟推断流程铺平了道路。

英文摘要

Using type Ia supernovae as cosmological probes requires empirical corrections that are correlated with their host environment. Here we present a unified Bayesian hierarchical model designed to infer, from purely photometric observations, the intrinsic dependence of the brightness of type Ia supernovae on progenitor properties (metallicity and age), the delay-time distribution that governs their rate as a function of age, and cosmology, as well as the redshifts of all hosts. The model incorporates physics-based prescriptions for star formation and chemical evolution from Prospector-beta, dust extinction of both galaxy and supernova light, and observational selection effects. We show with simulations that intrinsic dependences on metallicity and age have distinct observational signatures, with metallicity mimicking the well-known step of magnitudes of type Ia supernovae across a host stellar mass of $\sim 10^{10}M_{\odot}$. We then demonstrate neural simulation-based inference of all model parameters from mock observations of ~16,000 type Ia supernovae and their hosts up to redshift 0.9. Our joint physics-based approach delivers robust and precise photometric redshifts (~0.01 median scatter) and improves cosmological constraints by a factor of ~4 over analyses of the small fraction of objects with spectroscopic follow-up. This approach unlocks the full power of photometric data and paves the way for an end-to-end simulation-based analysis pipeline in the LSST era.

2508.10533 2026-05-08 quant-ph cs.LG

Mitigating Exponential Mixed Frequency Growth through Frequency Selection

通过频率选择缓解指数级混合频率增长

Michael Poppel, David Bucher, Maximilian Zorn, Nico Kraus, Claudia Linnhoff-Popien, Philipp Altmann, Jonas Stein

发表机构 * Department of Computer Science, LMU Munich, Germany(慕尼黑大学计算机科学系) Aqarios GmbH, Munich, Germany(慕尼黑Aqarios公司)

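AI summary This paper studies training failures of quantum models with angle encoding, finding that non-unique frequencies dominate the gradient landscape. Restricting the model spectrum through frequency selection improves performance, particularly for two-dimensional targets and at high-frequency magnitudes.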

Comments 11 pages, 4 figures

详情
AI中文摘要

角度编码已成为将经典数据嵌入量子模型的流行特征映射,自然生成截断傅里叶级数,具有通用函数逼近能力。尽管具有这种表达能力,实际训练面临重大挑战。通过受控实验与白盒目标函数,我们证明即使满足所有已建立的参数充分条件,训练失败仍可能发生。基于Duffy和Jastrzebski的冗余梯度框架,我们系统地实验表明非唯一频率主导梯度景观并挤占目标频率——这一负担在单编码下随着编码深度呈指数增长。小角度初始化在单维设置中缓解了这一问题,但在更高维度中失效,即使使用三进制编码,无论初始化或优化器选择,唯一频率元组的组合增长都难以处理。我们引入频率选择作为系统解决方案,限制模型频谱仅包含目标函数中存在的频率。对于二维目标,频率选择在密集方法挣扎的场景中实现接近最优性能(中位数R²≈0.95),并在高频率幅度处保持可处理性(中位数R²≈0.85),其中密集方法完全失效。在真实世界数据集上的验证证实了该方法在合成设置之外的泛化能力。

英文摘要

Angle encoding has emerged as a popular feature map for embedding classical data into quantum models, naturally generating truncated Fourier series with universal function approximation capabilities. Despite this expressive capability, practical training faces significant challenges. Through controlled experiments with white-box target functions, we demonstrate that training failures can occur even when all established parameter sufficiency conditions are satisfied. Building on the redundancy-gradient framework of Duffy and Jastrzebski, we provide systematic experimental evidence that non-unique frequencies dominate the gradient landscape and crowd out target frequencies -- a burden that grows exponentially with encoding depth under unary encoding. Small-angle initialization mitigates this in one-dimensional settings but fails to scale to higher dimensions, where even ternary encoding -- which minimizes per-frequency redundancy -- faces intractable combinatorial growth of unique frequency tuples regardless of initialization or optimizer choice. We introduce frequency selection as a principled solution that restricts the model spectrum to only those frequencies present in the target function. For two-dimensional targets, frequency selection achieves near-optimal performance (median $R^2 \approx 0.95$) where dense approaches struggle, and remains tractable at high-frequency magnitudes where dense approaches fail entirely (median $R^2 \approx 0.85$). Validation on a real-world dataset confirms the approach transfers beyond synthetic settings.
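
The contrast between unary and ternary encoding can be made concrete by counting how frequencies arise. The sketch uses a simplified model that is an assumption here: each encoding gate contributes an eigenvalue difference d in {-1, 0, 1} times its input scale, and a frequency's redundancy is the number of difference tuples that produce it. The paper's own redundancy measure may be defined differently.

```python
from collections import Counter
from itertools import product

def frequency_multiplicities(scales):
    """Count the difference tuples d in {-1, 0, 1}^L that produce each
    frequency sum_k scales[k] * d[k] (simplified single-feature model)."""
    counts = Counter()
    for d in product((-1, 0, 1), repeat=len(scales)):
        counts[sum(s * dk for s, dk in zip(scales, d))] += 1
    return counts

unary = frequency_multiplicities([1, 1, 1])    # three identical encoding gates
ternary = frequency_multiplicities([1, 3, 9])  # input scaled by 3^k in gate k
```

In this model three unary gates reach only the seven frequencies -3..3, with 7 of the 27 tuples collapsing onto frequency 0, while scales 1, 3, 9 reach all 27 integers from -13 to 13 exactly once each (a balanced ternary expansion). With several input features the number of frequency tuples multiplies across features, which is the combinatorial growth that frequency selection is meant to avoid.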

2507.20941 2026-05-08 stat.ML cs.AI cs.LG stat.ME stat.OT

Multivariate Standardized Residuals for Conformal Prediction

多元标准化残差用于置信预测

Sacha Braun, Eugène Berta, Michael I. Jordan, Francis Bach

发表机构 * Sierra team, Inria Paris, France(Inria巴黎分部)

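AI summary This paper proposes multivariate standardized residuals, which whiten the residuals to decouple output correlations and standardize local variance, improving the conditional coverage of conformal prediction.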

详情
AI中文摘要

尽管分割置信预测能保证边缘覆盖,但实现更强的条件覆盖属性对可靠不确定性量化至关重要。然而,朴素置信分数在异方差设置中表现不佳。在单变量回归中,通常通过使用估计的局部分数方差来标准化非置信分数。本文提出了一种自然扩展,将此标准化方法推广到多变量设置,有效白化残差以解耦输出相关性并标准化局部方差。进一步,我们推导出一个充分条件,表征一类广泛分布,其中标准化残差能提供渐近条件覆盖。我们证明使用由学习的局部协方差诱导的马氏距离作为非置信分数,提供了一种闭式、计算高效的机制,以捕捉输出间的相关性和异方差性,避免了之前基于累积分布函数的方法所需的昂贵采样。此结构解锁了多个实用扩展,包括处理缺失输出值、当部分信息被揭示时对置信集的细化,以及构造输出变换的有效置信集。最后,我们在合成和真实世界数据集上提供了广泛的实证证据,显示我们的方法能产生优于现有多变量基线的置信集。

英文摘要

While split conformal prediction guarantees marginal coverage, approaching the stronger property of conditional coverage is essential for reliable uncertainty quantification. Naive conformal scores, however, suffer from poor conditional coverage in heteroskedastic settings. In univariate regression, this is commonly addressed by normalizing non-conformity scores using an estimated local score variance. In this work, we propose a natural extension of this normalization to the multivariate setting, effectively whitening the residuals to decouple output correlations and standardize local variance. Furthermore, we derive a sufficient condition characterizing a broad class of distributions for which standardized residuals yield asymptotic conditional coverage. We demonstrate that using the Mahalanobis distance induced by a learned local covariance as a non-conformity score provides a closed-form, computationally efficient mechanism for capturing inter-output correlations and heteroskedasticity, avoiding the expensive sampling required by previous methods based on cumulative distribution functions. This structure unlocks several practical extensions, including the handling of missing output values, the refinement of conformal sets when partial information is revealed, and the construction of valid conformal sets for transformations of the output. Finally, we provide extensive empirical evidence on both synthetic and real-world datasets showing that our approach yields conformal sets that improve upon the conditional coverage of existing multivariate baselines.
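
The Mahalanobis non-conformity score described above can be sketched with split conformal calibration. The sketch assumes that a point predictor and a local covariance estimator are already fitted elsewhere and supplied as arrays; how the covariance is learned is outside its scope, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis_scores(y, mean, cov):
    """Score sqrt((y - mean)^T cov^{-1} (y - mean)) per row.
    y, mean: [n, d]; cov: [n, d, d] local covariances (or [1, d, d], broadcast)."""
    r = y - mean
    solved = np.linalg.solve(cov, r[..., None])[..., 0]  # cov^{-1} r without explicit inverse
    return np.sqrt(np.einsum("nd,nd->n", r, solved))

def calibrate(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def in_conformal_set(y_new, mean_new, cov_new, q):
    """Membership test for the ellipsoid {y : score(y) <= q} at one test point."""
    return mahalanobis_scores(y_new[None], mean_new[None], cov_new[None])[0] <= q
```

The resulting prediction set is an ellipsoid centered at the predicted mean whose shape follows the local covariance, so it widens where the outputs are noisier and tilts along correlated outputs. Membership needs only a linear solve, with no sampling.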

2506.08080 2026-05-08 hep-ph cs.LG physics.comp-ph stat.ML

Towards AI-assisted Neutrino Flavor Theory Design

迈向AI辅助中微子味理论设计

Jason Benjamin Baretz, Max Fieg, Vijay Ganesh, Aishik Ghosh, V. Knapp-Perez, Jake Rudolph, Daniel Whiteson

Affiliations * Department of Physics and Astronomy, University of California, Irvine; Particle Theory Department, Fermilab; Georgia Institute of Technology; Physics Division, Lawrence Berkeley National Laboratory

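AI summary This paper proposes AMBer, a framework in which a reinforcement learning agent interacts with a physics software pipeline to search model space efficiently while minimizing the number of free parameters, and validates it on neutrino flavor theories.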

Comments 28 pages, 12 Figures

详情
AI中文摘要

粒子物理理论,如解释中微子味混合的理论,源于庞大的模型构建可能性景观。模型的构建通常依赖于理论学家的直觉,还需要大量努力来确定适当的对称群、分配场表示,并提取预测以与实验数据比较。我们开发了自主模型构建器(AMBer),这是一个框架,其中强化学习代理与简化物理软件管道互动,以高效搜索这些空间。AMBer选择对称群、粒子内容和群表示分配,以构建可行的模型,同时最小化引入的自由参数数量。我们在已研究的理论空间区域中验证了我们的方法,并扩展探索到一个先前未研究的对称群。尽管在中微子味理论的背景下展示,这种与物理软件反馈结合的强化学习方法可能未来可扩展到其他理论模型构建问题。

英文摘要

Particle physics theories, such as those which explain neutrino flavor mixing, arise from a vast landscape of model-building possibilities. A model's construction typically relies on the intuition of theorists. It also requires considerable effort to identify appropriate symmetry groups, assign field representations, and extract predictions for comparison with experimental data. We develop an Autonomous Model Builder (AMBer), a framework in which a reinforcement learning agent interacts with a streamlined physics software pipeline to search these spaces efficiently. AMBer selects symmetry groups, particle content, and group representation assignments to construct viable models while minimizing the number of free parameters introduced. We validate our approach in well-studied regions of theory space and extend the exploration to a novel, previously unexamined symmetry group. While demonstrated in the context of neutrino flavor theories, this approach of reinforcement learning with physics software feedback may be extended to other theoretical model-building problems in the future.

2506.06313 2026-05-08 cs.IR cs.AI cs.CL

Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering

超越分块:面向长文档问答的篇章意识层次检索

Huiyao Chen, Yi Yang, Yinghui Li, Meishan Zhang, Baotian Hu, Min Zhang

发表机构 * Institute of Computing and Intelligence, Harbin Institute of Technology (Shenzhen)(计算与智能研究院,哈尔滨工业大学(深圳)) Shenzhen Loop Area Institute (SLAI)(深圳环形区研究院(SLAI))

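AI summary This paper proposes a discourse-aware hierarchical retrieval framework that applies rhetorical structure theory to long document question answering, combining discourse parsing with LLM-enhanced node representations to bridge structural and semantic information. Experiments show consistent improvements across multiple genres and languages.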

Comments 21 pages, 9 figures. Accepted at ACL 2026 Main conference

详情
AI中文摘要

现有的长文档问答系统通常将文本视为扁平序列或使用启发式分块,这忽视了自然引导人类理解的篇章结构。我们提出了一种篇章意识的层次框架,利用修辞结构理论(RST)进行长文档问答。我们的方法将篇章树转换为句子级表示,并利用 LLM 增强的节点表示来连接结构和语义信息。该框架包含三个关键创新:语言通用的篇章解析、基于 LLM 的篇章关系节点增强、以及结构引导的层次检索。在四个数据集上的广泛实验表明,通过整合篇章结构,方法在多种语种和文档类型上均优于现有方法。此外,所提框架在多样化的文档类型和语言环境下表现出强大的鲁棒性。

英文摘要

Existing long-document question answering systems typically process texts as flat sequences or use heuristic chunking, which overlook the discourse structures that naturally guide human comprehension. We present a discourse-aware hierarchical framework that leverages rhetorical structure theory (RST) for long document question answering. Our approach converts discourse trees into sentence-level representations and employs LLM-enhanced node representations to bridge structural and semantic information. The framework involves three key innovations: language-universal discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval. Extensive experiments on four datasets demonstrate consistent improvements over existing approaches through the incorporation of discourse structure, across multiple genres and languages. Moreover, the proposed framework exhibits strong robustness across diverse document types and linguistic settings.

2505.06799 2026-05-08 quant-ph cs.AI

Quantum Observers: A NISQ Hardware Demonstration of Chaotic State Prediction Using Quantum Echo-state Networks

量子观察者:基于量子回声状态网络的NISQ硬件中混沌状态预测演示

Erik L. Connerty, Ethan N. Evans, Gerasimos Angelatos, Vignesh Narayanan

Affiliations * University of South Carolina - Columbia; Naval Surface Warfare Center, Panama City Division; RTX BBN, Cambridge MA

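AI summary This paper proposes a quantum echo-state network that operates in the presence of noise and predicts chaotic time series in both high-fidelity simulations and hardware experiments, running over 100 times longer than the median T1 and T2 of the IBM Marrakesh QPU.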

Comments 14 pages, 12 figures

详情
AI中文摘要

最近的人工智能进展突显了神经网络系统在经典计算机上的强大能力,但这些系统面临显著的计算挑战,限制了可扩展性和效率。量子计算机有潜力克服这些限制,提高处理能力。然而,由于当前量子硬件中的噪声、退相干和高错误率,将量子计算与神经网络结合仍面临挑战。本文提出了一种新的量子回声状态网络(QESN)设计和实现算法,能够在当前IBM硬件上运行。我们通过经典控制理论响应分析来表征QESN,强调其丰富的非线性动态和记忆,以及通过稀疏性和重新上传块进行微调的能力。我们通过全面的演示验证了我们的方法,展示了QESN作为量子观察者在高保真度模拟和硬件实验中利用洛伦兹系统数据的应用。我们的结果表明,QESN能够预测具有持续记忆的长时间序列,运行时间超过IBM Marrakesh QPU的中位数T1和T2的100倍,在超导硬件上实现了最先进的时间序列性能。

英文摘要

Recent advances in artificial intelligence have highlighted the remarkable capabilities of neural network (NN)-powered systems on classical computers. However, these systems face significant computational challenges that limit scalability and efficiency. Quantum computers hold the potential to overcome these limitations and increase processing power beyond classical systems. Despite this, integrating quantum computing with NNs remains largely unrealized due to challenges posed by noise, decoherence, and high error rates in current quantum hardware. Here, we propose a novel quantum echo-state network (QESN) design and implementation algorithm that can operate in the presence of noise on current IBM hardware. We apply classical control-theoretic response analysis to characterize the QESN, emphasizing its rich nonlinear dynamics and memory, as well as its ability to be fine-tuned with sparsity and re-uploading blocks. We validate our approach through a comprehensive demonstration of QESNs functioning as quantum observers, applied in both high-fidelity simulations and hardware experiments utilizing data from a prototypical chaotic Lorenz system. Our results show that the QESN can predict long time-series with persistent memory, running over 100 times longer than the median T1 and T2 of the IBM Marrakesh QPU, achieving state-of-the-art time-series performance on superconducting hardware.

2503.16737 2026-05-08 stat.ML cs.LG math.PR math.ST stat.TH

Revenue Maximization Under Sequential Price Competition Via The Estimation Of s-Concave Demand Functions

Daniele Bracale, Moulinath Banerjee, Cong Shi, Yuekai Sun

Affiliations * Department of Statistics, University of Michigan, Ann Arbor, MI, USA; Department of Management, University of Miami, Miami, USA

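AI summary This paper studies sequential price competition among multiple sellers over $T$ periods. It proposes a dynamic pricing policy based on semi-parametric least-squares estimation, shows that prices converge to the Nash equilibrium prices at a rate of $O(T^{-1/7})$ when sellers adopt the policy, and establishes new concentration results.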

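Details
AI abstract (translated from Chinese)

We consider price competition among multiple sellers over $T$ periods. In each period, sellers simultaneously post their prices and observe their own demand. Each seller's demand function depends on all sellers' prices through a private, unknown, nonlinear relationship. We propose a dynamic pricing policy that uses semi-parametric least-squares estimation and show that, when sellers adopt our policy, prices converge to the Nash equilibrium prices at a rate of $O(T^{-1/7})$. Each seller incurs a regret of $O(T^{5/7})$ relative to a dynamic benchmark policy. Our theoretical contribution is to prove the existence of equilibrium under shape-constrained demand functions via the concept of $s$-concavity and to establish regret bounds for the proposed policy. Technically, we also establish new concentration results for the least-squares estimator under shape constraints. Our findings offer insights into dynamic competition-aware pricing and contribute to the broader study of non-parametric learning in strategic decision-making.

English abstract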

We consider price competition among multiple sellers over a selling horizon of $T$ periods. In each period, sellers simultaneously offer their prices (which are made public) and subsequently observe their respective demand (not made public). The demand function of each seller depends on all sellers' prices through a private, unknown, and nonlinear relationship. We propose a dynamic pricing policy that uses semi-parametric least-squares estimation and show that when the sellers employ our policy, their prices converge at a rate of $O(T^{-1/7})$ to the Nash equilibrium prices that sellers would reach if they were fully informed. Each seller incurs a regret of $O(T^{5/7})$ relative to a dynamic benchmark policy. A theoretical contribution of our work is proving the existence of equilibrium under shape-constrained demand functions via the concept of $s$-concavity and establishing regret bounds of our proposed policy. Technically, we also establish new concentration results for the least squares estimator under shape constraints. Our findings offer significant insights into dynamic competition-aware pricing and contribute to the broader study of non-parametric learning in strategic decision-making.

2501.04046 2026-05-08 physics.soc-ph cs.AI cs.CY

Traits of a Leader: User Influence Level Prediction through Sociolinguistic Modeling

Denys Katerenchuk, Rivka Levitan

Affiliations * The Graduate Center, CUNY

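AI summary This paper predicts user influence level through sociolinguistic modeling. It defines influence as a function of community endorsement and uses demographic and personality data to improve prediction, raising RankDCG scores across eight domains.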

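Details
AI abstract (translated from Chinese)

Recognizing a user's influence level has attracted much attention in online interactions, because influential users can sway others' opinions to achieve their goals. Predicting user influence can therefore help to understand social networks, forecast trends, and prevent misinformation. However, predicting user influence is challenging because the concept of influence varies by situation or domain, and user communications are limited to text. This paper defines user influence level as a function of community endorsement and develops a model that significantly outperforms the baseline by leveraging demographic and personality data; the approach consistently improves RankDCG scores across eight different domains.

English abstract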

Recognition of a user's influence level has attracted much attention as human interactions move online. Influential users have the ability to sway others' opinions to achieve some goals. As a result, predicting users' level of influence can help to understand social networks, forecast trends, prevent misinformation, etc. However, predicting user influence is a challenging problem because the concept of influence is specific to a situation or a domain, and user communications are limited to text. In this work, we define user influence level as a function of community endorsement and develop a model that significantly outperforms the baseline by leveraging demographic and personality data. This approach consistently improves RankDCG scores across eight different domains.