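arXivDaily: a daily digest of academic papers that syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.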

2606.06815 2026-06-08 cs.CR cs.LG New submission

AMD-FCG: An Enhanced Function Call Graph Dataset with Integrated Topological Features for Malware Detection and Classification

Parthajit Borah, Sakshi Singh, D. K. Bhattacharyya, J. K. Kalita

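AI summary: This paper proposes the AMD-FCG dataset, which enhances function call graphs with integrated topological features of malware. The goal is to simplify the detection workflow and remove the need for dynamic analysis, improving the accuracy and robustness of malware detection.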

English abstract

As malware exhibits complex structure and behavior, detecting it has been a significant challenge in cybersecurity and in related everyday services. It is therefore crucial to have a reliable and adaptive solution to this problem. Among the many detection methods developed over the years, one of the most reliable is the study and analysis of the structural and behavioral patterns of malware. These patterns of sophisticated malware can be captured with Function Call Graphs (FCGs). However, to cover many malware families effectively, the system needs a sufficiently large dataset to operate on. To ensure accuracy and robustness, the dataset should include samples of diverse malware as well as benign applications, so that the detection process can be executed securely. This paper introduces AMD-FCG, an enhanced Function Call Graph dataset integrated with topological features of malware. The framework enhances the detection procedure, streamlines the workflow for cybersecurity professionals, and eliminates the need for dynamic analysis and extensive processing. It can therefore be used to develop and deploy more efficient and innovative malware detection systems.
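
To make "topological features of a function call graph" concrete, here is a minimal Python sketch that computes a few standard graph statistics from a caller-to-callee edge list. The abstract does not list which features AMD-FCG actually stores. The feature set below (node and edge counts, density, degree extremes, roots, leaves, self-calls) and the function name `fcg_features` are illustrative assumptions, not the dataset's schema.

```python
from collections import defaultdict


def fcg_features(edges):
    """Compute simple topological features of a directed function call graph.

    `edges` is an iterable of (caller, callee) pairs. The feature set is an
    illustrative choice, not the AMD-FCG schema.
    """
    edge_set = set(edges)  # collapse repeated call sites between the same pair
    nodes = set()
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for caller, callee in edge_set:
        nodes.update((caller, callee))
        out_deg[caller] += 1
        in_deg[callee] += 1

    n, m = len(nodes), len(edge_set)
    return {
        "num_nodes": n,
        "num_edges": m,
        # Directed-graph density m / (n * (n - 1)); self-calls are counted in m.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        "max_out_degree": max((out_deg[v] for v in nodes), default=0),
        "max_in_degree": max((in_deg[v] for v in nodes), default=0),
        # Functions never called inside the graph (entry points or dead code).
        "num_roots": sum(1 for v in nodes if in_deg[v] == 0),
        # Functions that call nothing (leaf functions, imported stubs).
        "num_leaves": sum(1 for v in nodes if out_deg[v] == 0),
        "num_self_calls": sum(1 for a, b in edge_set if a == b),
    }
```

For example, the edges `[("main", "parse"), ("main", "connect"), ("connect", "send"), ("send", "send")]` give 4 nodes, 4 edges, one root (`main`), one leaf (`parse`), and one self-call. A fixed-length vector of such statistics is what a conventional classifier can consume without running the sample.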

2606.06754 2026-06-08 cs.MA cs.CL New submission

MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring

Ali Keramati, Shiyuan Zhou, Sharad Mehrotra, Mark Warschauer

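AI summary: Proposes the MADRAG framework, which combines multi-agent debate with retrieval augmentation. An Advocate, a Skeptic, and a Judge interact, and calibration against retrieved exemplars lets the framework score essays without training, with performance close to supervised systems.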

Comments: 21 pages, 7 figures, 14 tables

English abstract

We present MADRAG, a training-free framework for analytic essay scoring that combines multi-agent reasoning with retrieval-augmented grounding. Unlike standard LLM-as-judge approaches, which are prone to bias and unstable scoring, MADRAG decomposes evaluation into an interactive process: an Advocate identifies strengths, a Skeptic critiques weaknesses, and a Judge aggregates their arguments into a final score. Crucially, the Judge is augmented with rubric-aligned exemplar retrieval, enabling calibration through comparison with scored examples. Our results show that MADRAG significantly outperforms prompt-based baselines while approaching the performance of supervised systems without requiring task-specific training. Ablation studies demonstrate that retrieval drives calibration gains, while debate improves reasoning on higher-level traits. Our findings highlight the complementary roles of structured interaction and external memory in reliable LLM-based evaluation.
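
The Advocate/Skeptic/Judge decomposition with exemplar retrieval can be pictured as the control flow below. This is a minimal sketch, not the authors' implementation, under the following assumptions:

- The three agents are plain callables standing in for LLM calls.
- Retrieval uses a toy bag-of-words cosine similarity in place of a rubric-aligned retriever.
- The names `retrieve_exemplars` and `score_essay` are hypothetical.

```python
import math
from collections import Counter


def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_exemplars(essay, scored_pool, k=2):
    """Return the k scored exemplars whose wording is most similar to `essay`.

    `scored_pool` is a list of (text, score) pairs. The bag-of-words
    similarity is a toy stand-in for a rubric-aligned retriever.
    """
    query = Counter(essay.lower().split())
    ranked = sorted(
        scored_pool,
        key=lambda ex: _cosine(query, Counter(ex[0].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def score_essay(essay, trait, scored_pool, advocate, skeptic, judge, k=2):
    """One debate round: Advocate lists strengths, Skeptic critiques them,
    and the Judge scores the trait with retrieved exemplars as anchors."""
    strengths = advocate(essay, trait)
    weaknesses = skeptic(essay, trait, strengths)
    exemplars = retrieve_exemplars(essay, scored_pool, k)
    return judge(essay, trait, strengths, weaknesses, exemplars)
```

Passing the agents in as functions keeps the orchestration independent of any particular LLM client. It also makes clear that the only "memory" the Judge sees is the retrieved, already-scored exemplars, which is where the abstract locates the calibration gains.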

2606.06570 2026-06-08 cs.CR cs.AI New submission

MalTree: Tracing Malware Evolution from Embeddings at Scale

Akash Amalan, Georgios Smaragdakis, Tom J. Viering

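AI summary: Proposes the MalTree framework, which uses phylogenetic techniques from bioinformatics (UPGMA and Neighbor-Joining) to model malware evolution automatically from structural, behavioral, and image features. Validated against VirusTotal timestamps, it reaches 87% temporal consistency and reveals differences in mutation rates across families, supporting lineage-aware malware analysis.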

Comments: 33 pages, accepted at ICML 2026

English abstract

Malware detection remains largely reactive: machine learning models trained on known samples degrade as threats evolve. Understanding evolutionary relationships among malware families can inform proactive defense, but traditional reverse engineering can take months to years to uncover such lineage relationships. We propose MalTree, a framework that applies bioinformatics-inspired phylogenetic techniques (UPGMA and Neighbor-Joining) at scale to model malware evolution automatically using structural, behavioral, and image-based features. We introduce temporal validation using VirusTotal timestamps to assess whether inferred trees reflect actual evolutionary order. MalTree achieves 87% temporal consistency, indicating that inferred evolutionary relationships closely align with real-world emergence timelines. Our analysis shows that some families mutate over 10 times faster than others, suggesting that detection strategies should be tailored to family-specific evolutionary tempos. Case studies, including the Mirai botnet, confirm that inferred relationships from our phylogenetic tree align with documented threat intelligence. Our framework provides a foundation for shifting malware analysis from sample-by-sample classification toward lineage-aware evolutionary modeling.
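
UPGMA is one of the two tree-building methods named above. It repeatedly merges the two closest clusters and sets the distance to the new cluster to the size-weighted average of its members' distances. A minimal, unoptimized Python sketch over a precomputed distance matrix follows. The abstract does not describe how MalTree turns structural, behavioral, and image embeddings into distances, so the input matrix here is assumed.

```python
def upgma(labels, dist):
    """Build a UPGMA tree and return it in Newick format with branch lengths.

    `labels` is a list of leaf names; `dist` is a symmetric matrix (list of
    lists) of pairwise distances. O(n^3): fine for illustration, not for scale.
    """
    # Active clusters: id -> (newick_string, size, height)
    clusters = {i: (labels[i], 1, 0.0) for i in range(len(labels))}
    # Distances stored under both key orders for simple lookup.
    d = {(i, j): dist[i][j] for i in range(len(labels)) for j in range(len(labels)) if i != j}
    next_id = len(labels)

    while len(clusters) > 1:
        # Closest pair of active clusters.
        i, j = min(((a, b) for a in clusters for b in clusters if a < b), key=lambda p: d[p])
        height = d[(i, j)] / 2.0  # ultrametric assumption: merge at half the distance
        (ni, si, hi), (nj, sj, hj) = clusters.pop(i), clusters.pop(j)
        newick = f"({ni}:{height - hi:g},{nj}:{height - hj:g})"
        # Size-weighted average distance from the merged cluster to each other cluster.
        for k in clusters:
            dk = (si * d[(i, k)] + sj * d[(j, k)]) / (si + sj)
            d[(k, next_id)] = d[(next_id, k)] = dk
        clusters[next_id] = (newick, si + sj, height)
        next_id += 1

    return next(iter(clusters.values()))[0] + ";"
```

For three leaves where A and B are at distance 2 and both are at distance 6 from C, `upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])` returns `"(C:3,(A:1,B:1):2);"`. UPGMA assumes a roughly constant mutation rate (an ultrametric tree). Given the abstract's finding that some families mutate over 10 times faster than others, that assumption is a natural reason to also use Neighbor-Joining, which does not require it.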

2606.06563 2026-06-08 cs.SE cs.AI New submission

AI-Driven Test Case Generation from Natural Language Requirements: A Survey of Techniques and Research Gaps

Orimoloye Folorunsho, Hassan Reza

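AI summary: Surveys techniques that use AI, NLP, and LLMs to generate test cases from natural language requirements. It finds that current methods cannot simultaneously satisfy six quality dimensions, such as automation and ambiguity handling, and proposes four research directions.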

Comments: 22 pages, 7 figures, 4 tables

English abstract

Software testing is critical for verifying that systems meet specified requirements, yet remains among the most time-consuming and expensive activities in development. Requirements-based test generation allows test cases to be derived early from requirements artifacts, but generating them directly from natural language is challenging due to inherent ambiguity and imprecision. Recent advances in AI, natural language processing (NLP), and large language models (LLMs) have made automating this pipeline increasingly feasible, while introducing new risks including hallucination, reduced traceability, and inconsistent evaluation. This survey addresses four research questions: what AI and NLP techniques have been proposed for generating test cases from natural language requirements; what tools and frameworks support these approaches; how generated test cases are evaluated; and what research gaps remain. Following Kitchenham and Charters' systematic review guidelines, we searched major scholarly databases spanning 2000-2025 and, after applying strict inclusion criteria, identified 21 primary studies. The literature is organized into three evolutionary eras, revealing that no existing approach simultaneously satisfies six key quality dimensions: automation, ambiguity handling, domain applicability, traceability, evaluation thoroughness, and hallucination control. The survey makes three main contributions: a three-era evolutionary synthesis of AI-based test generation; a six-criteria gap analysis showing no current approach fully addresses all quality dimensions; and four actionable research guidelines targeting hallucination, traceability, complexity sensitivity, and compliance.

2606.07277 2026-06-08 cs.IT cs.CR cs.LG math.IT New submission

The Capacity of Information-Theoretic Secure Aggregation in Federated Learning

Lanxin Yi, Jinbao Zhu, Kai Wan, Xiaohu Tang

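AI summary: Addresses secure aggregation in federated learning by proposing a general key-distribution model that requires neither a trusted third party nor a prescribed structure. It fully characterizes the capacity region among randomness for security, key-distribution communication, and aggregation communication.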

English abstract

Secure aggregation allows a server to aggregate users' local updates while preserving update privacy. Existing information-theoretic formulations typically assume that correlated random keys are provided by a trusted third party (TTP) or generated via prescribed groupwise structures, while the communication cost of establishing such correlated keys is often ignored. Consequently, the fundamental limits under general key-distribution mechanisms remain unknown. In this paper, we study the $T$-colluding information-theoretic secure aggregation problem with $N$ users under a general two-phase framework consisting of a key distribution phase and an update aggregation phase. Unlike prior work, we model key distribution through user-to-user communication and allow arbitrary user-generated key-distribution mechanisms, eliminating the need for a TTP or prescribed structures. This enables a joint characterization of three resources: randomness for security, key-distribution communication, and aggregation communication. We completely characterize the capacity region among these three resources by constructing a novel secure aggregation scheme together with a matching information-theoretic converse. In particular, we develop an explicit deterministic capacity-achieving construction over any finite field of size at least $N$, whereas most existing schemes either rely on a TTP or employ randomized or existential constructions over sufficiently large finite fields. We further show that the optimal performance can be achieved using only pairwise shared keys, enabling implementation via Diffie-Hellman key exchange. Compared with Google's seminal secure aggregation scheme, the proposed scheme requires fewer random masking keys while preserving the same aggregation communication overhead.
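
The remark that pairwise shared keys suffice is easiest to see through the pairwise-masking idea behind Google's secure aggregation protocol. Each pair of users expands a shared seed into a common mask; one user adds it and the other subtracts it, so every mask cancels in the server's sum.

The Python sketch below shows only that cancellation over a prime field. It is not the paper's capacity-achieving construction, and it ignores collusion ($T$), user dropouts, and the key agreement itself. The modulus `P` and all function names are illustrative assumptions, and the simulated seeds stand in for keys that would come from Diffie-Hellman in practice.

```python
import random

P = 2**31 - 1  # a prime modulus; any finite field works for this illustration


def pairwise_seeds(n, rng):
    """Simulate one shared secret seed per user pair (i < j).

    In a deployment these would come from a key exchange such as Diffie-Hellman.
    """
    return {(i, j): rng.getrandbits(64) for i in range(n) for j in range(i + 1, n)}


def mask_update(i, update, seeds, n):
    """Mask user i's update: add masks shared with higher-indexed users and
    subtract masks shared with lower-indexed users (arithmetic mod P)."""
    masked = list(update)
    for j in range(n):
        if j == i:
            continue
        pair = (min(i, j), max(i, j))
        # Both members of a pair expand the same seed into the same mask vector.
        # random.Random is NOT cryptographically secure; it only shows cancellation.
        mask_rng = random.Random(seeds[pair])
        mask = [mask_rng.randrange(P) for _ in update]
        sign = 1 if i < j else -1
        masked = [(x + sign * m) % P for x, m in zip(masked, mask)]
    return masked


def aggregate(masked_updates):
    """Server side: element-wise sum mod P, in which all pairwise masks cancel."""
    return [sum(col) % P for col in zip(*masked_updates)]
```

With three users holding `[1, 2, 3]`, `[10, 20, 30]` and `[100, 200, 300]`, `aggregate` applied to their masked vectors returns `[111, 222, 333]`, while each masked vector on its own looks random to the server.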

2606.06765 2026-06-08 cond-mat.mtrl-sci cs.LG New submission

Reactivity-Informed Machine Learning for Performance Prediction and Design Space Exploration of Alkali-Activated Slag

Qiyao He, Zhanzhao Li, Kai Gong

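AI summary: Integrates a slag reactivity descriptor (AMODE) with machine learning to predict the compressive strength of alkali-activated slag from the largest literature-derived dataset. The models reveal physically consistent trends and are used to explore the low-carbon design space.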

Comments: 68 pages, 14 figures, 2 tables

English abstract

Establishing quantitative relationships among mix design, raw material properties, curing conditions, and performance remains a long-standing challenge in cementitious materials, particularly for alkali-activated materials with variable precursor and activator chemistry. Here, we curated the largest literature-derived alkali-activated slag (AAS) dataset to date, comprising over 3100 compressive strength records, 155 chemically distinct ground granulated blast-furnace slags (GGBSs), and 24 attributes incorporating precursor chemistry, fineness, and reactivity. Multiple machine learning (ML) algorithms were benchmarked across progressively enriched feature scenarios, demonstrating that integrating GGBS compositions, fineness, curing conditions, and specimen geometry improves predictive performance. The average metal oxide dissociation energy (AMODE), a physically interpretable representation of precursor reactivity, provides a compact alternative descriptor to explicit oxide compositions while enabling comparable predictive performance. Model interpretation revealed physically consistent trends from heterogeneous data, including non-monotonic effects of Na2O dosage and silicate modulus, reduced predicted strength at higher water content and larger specimen size, and coupled oxide-level effects more coherently represented by AMODE than by individual oxide contents. Statistically constrained design space exploration reveals reactivity-dependent trade-offs among strength, embodied CO2 emissions, and cost. The design maps identify high-strength regions with substantially lower CO2 emissions than OPC-based references at similar cost. Overall, this work demonstrates how reactivity-informed ML can extract physically meaningful trends from heterogeneous AAS data and guide source-dependent binder design. The curated dataset is publicly accessible to support advances in cement and concrete research.

2605.04222 2026-06-08 eess.SY cs.RO cs.SY Updated version

Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control

Yoshinari Takayama, Alessio Iovine, Bart Besselink, Guillaume Sandou, Adnane Saoud

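AI summary: Layered control architectures lack a uniform specification language, guarantees for interconnections across time scales, and compositional separation between layers. The paper addresses this by bringing the safety-liveness decomposition into a heterogeneous assume-guarantee contract framework. Safety is guaranteed by invariance at the continuous-time layer, liveness is achieved through refinement at the discrete-time layer, and the inter-layer coordination conditions are formalized.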

Comments: 21 pages

English abstract

Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety-liveness decomposition into a heterogeneous assume-guarantee framework: safety is enforced by invariance at the continuous-time layer, while liveness is achieved through refinement at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.

2606.05967 2026-06-08 stat.ML cs.LG Updated version

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

Ziad Kobeissi, Éloïse Berthier

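AI summary: Studies TD(0) with linear function approximation under i.i.d. samples and a constant learning step. It establishes a convergence rate for the mean-square error that is fast (order 1/k), robust (independent of the smallest eigenvalue), and sharp (multiplicative constant below 11). It also introduces the PCTD(0) variant, which converges better under a strong-mixing assumption.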

Journal ref: AISTATS 2026, May 2026, Tanger, Morocco
Comments: This is an extended version of a paper accepted at AISTATS 2026

English abstract

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate for the Mean-Square Error (MSE) on the approximated function that is (i) fast, in the sense that it admits an optimal dependency on the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning, since it only depends on an initial error and model-independent constants, and (iii) sharp up to a multiplicative constant lower than 11. In particular, unlike all pre-existing O(1/k) rates in the TD(0) literature, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization. We also introduce PCTD(0), a variant of TD(0) that benefits from better convergence properties under an additional assumption of strong mixing on the Markov chain.
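
The algorithm under study fits in a few lines. The Python sketch below is a generic linear TD(0) loop with a constant step size and Polyak-Juditsky averaging (the running mean of the iterates), fed i.i.d. on-policy transitions. It is not the paper's PCTD(0) variant, and the function name and its interface (`sample`, `phi`) are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def averaged_linear_td0(sample, phi, dim, gamma, alpha, num_steps, seed=0):
    """Linear TD(0) with a constant step size and iterate averaging.

    `sample(rng)` must return one i.i.d. transition (s, r, s_next) drawn from
    the on-policy stationary distribution; `phi(s)` returns a feature list of
    length `dim`. Returns the averaged parameter vector.
    """
    rng = random.Random(seed)
    theta = [0.0] * dim
    theta_bar = [0.0] * dim
    for k in range(1, num_steps + 1):
        s, r, s_next = sample(rng)
        f, f_next = phi(s), phi(s_next)
        # Temporal-difference error under the current linear value estimate.
        delta = r + gamma * dot(f_next, theta) - dot(f, theta)
        # Constant-step TD(0) update along the current state's features.
        theta = [t + alpha * delta * x for t, x in zip(theta, f)]
        # Running mean of the iterates: bar_k = bar_{k-1} + (theta_k - bar_{k-1}) / k.
        theta_bar = [tb + (t - tb) / k for tb, t in zip(theta_bar, theta)]
    return theta_bar
```

As a sanity check, take a two-state chain with one-hot features, uniform transitions, rewards (1, 0) and discount 0.5. The averaged parameters approach the true values (1.5, 0.5), because with one-hot features the TD fixed point coincides with the value function. The paper's contribution is a bound on how fast this happens that does not degrade when the feature covariance is ill-conditioned.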

2606.04101 2026-06-08 cs.DC cs.LG Updated version

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

Xinming Wei, Chao Jin, Tuo Dai, Yinmin Zhong, Shan Yu, Chengxu Yang, Bingyang Wu, Zili Zhang, Jing Mai, Qianchao Zhu, Zhouyang Li, Yuliang Liu, Guojie Luo

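AI summary: Proposes UltraEP, the first exact-load real-time balancer, which co-designs plan solving with expert-replication communication. It rebalances MoE training and inference per microbatch and per layer on rack-scale nodes, reaching 94.3% of the force-balanced ideal throughput.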

Comments: The authors have identified issues related to information disclosure in the current version of the manuscript and therefore request its withdrawal. A revised version may be prepared at a later date

English abstract

Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Built upon the extended scale-up connectivity of RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with efficient quota-driven planning, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. Averaged across MoE models from 106B to 671B parameters in training and prefill, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering a 1.49× improvement over non-balancing, while reducing the final inter-rank imbalance from 1.30-4.01 to 1.01-1.04. Additionally, we validate UltraEP's scalability and robustness in production MoE training with 2560 GPUs.
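
The imbalance figures quoted above are load ratios across ranks. A common definition, assumed here rather than taken from the paper, is the maximum per-rank token load divided by the mean. The sketch computes that ratio and applies a naive greedy rebalancer that moves part of the hottest expert's tokens onto a replica on the least-loaded rank. It only illustrates why replicating hot experts flattens the load; it is not UltraEP's quota-driven planner, and `rank_loads`, `imbalance`, and `greedy_replicate` are hypothetical names.

```python
def rank_loads(shards, num_ranks):
    """Total tokens per rank; `shards` is a list of (expert, rank, tokens)."""
    loads = [0] * num_ranks
    for _, rank, tokens in shards:
        loads[rank] += tokens
    return loads


def imbalance(loads):
    """Max-over-mean load ratio across ranks; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def greedy_replicate(shards, num_ranks, max_replicas):
    """Naive rebalancer: split the largest shard on the busiest rank and host
    the moved tokens on a replica placed on the least-loaded rank."""
    shards = [list(s) for s in shards]
    for _ in range(max_replicas):
        loads = rank_loads(shards, num_ranks)
        hot_rank = max(range(num_ranks), key=loads.__getitem__)
        cold_rank = min(range(num_ranks), key=loads.__getitem__)
        if loads[hot_rank] == loads[cold_rank]:
            break  # already balanced
        hot = max((s for s in shards if s[1] == hot_rank), key=lambda s: s[2])
        # Move at most half the shard and at most half the load gap,
        # so the receiving rank never ends up above the donor rank.
        moved = min(hot[2] // 2, (loads[hot_rank] - loads[cold_rank]) // 2)
        if moved <= 0:
            break  # integer granularity: nothing useful left to move
        hot[2] -= moved
        shards.append([hot[0], cold_rank, moved])
    return [tuple(s) for s in shards]
```

For example, an expert with 300 tokens on rank 0 and another with 100 tokens on rank 1 give an imbalance of 1.5. One replica of the first expert holding 100 tokens on rank 1 brings both ranks to 200 tokens (imbalance 1.0). The hard part the abstract describes is doing this per microbatch and per layer, with the replica's expert state actually transferred on the critical path.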

2606.01446 2026-06-08 eess.SP cs.LG Updated version

Spatially Distributed Task-Oriented Compression for Multi-Emitter Localization and Characterization with Spectral Overlap

H. Nazim Bicer, J. Nicholas Laneman

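AI summary: Proposes a task-oriented distributed compression framework in which spatially distributed receivers jointly localize and characterize multiple emitters with spectral overlap. Permutation-invariant training lets it compress efficiently while retaining task-relevant information.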

Comments: 6 pages, 2 figures

English abstract

Radio frequency spectrum awareness requires the ability to detect, localize, and characterize emitters in dense and contested wireless environments. In this work, we propose a task-oriented distributed compression framework for joint multi-emitter localization and characterization using spatially distributed receivers. Each receiver observes a short window of complex IQ samples, converts the observation to a time-frequency representation, and encodes it into a compact latent vector. A central fusion decoder combines the receiver latents to estimate an unordered set of active emitters, including their locations, center-frequency offsets, occupied bandwidths, and waveform families. A permutation-invariant training objective is used to handle the arbitrary ordering of emitters and predictions. Experiments on synthetic multi-emitter scenes with spectral overlap show that even extremely compact receiver-side representations can preserve useful information for emitter counting and waveform-family estimation. However, accurate localization and spectral-parameter regression require larger latent dimensions. Increasing the receiver latent dimension from $d_{\mathrm{rx}}=1$ to $d_{\mathrm{rx}}=16$ provides the largest improvement, while further increasing to $d_{\mathrm{rx}}=64$ gives smaller gains. These results demonstrate the potential of learned task-oriented compression for communication-efficient distributed spectrum awareness.
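
A permutation-invariant objective scores a predicted set against the ground-truth set under the best matching between them, so the order in which the decoder lists emitters does not matter. The brute-force sketch below enumerates all matchings, which is only practical for a handful of emitters; a Hungarian-algorithm solver would replace it at scale. It assumes equally sized sets and a squared-error cost on hypothetical (x, y, center-frequency offset) tuples. The paper's actual per-emitter cost, and how it handles unequal set sizes, are not specified in the abstract.

```python
from itertools import permutations


def squared_error(pred, target):
    """Per-emitter cost: squared error over a tuple of numeric parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def permutation_invariant_loss(preds, targets, cost=squared_error):
    """Minimum mean cost over all one-to-one assignments of predictions to targets.

    Returns (loss, best_perm), where preds[best_perm[i]] is matched to targets[i].
    Brute force: O(n!) in the number of emitters.
    """
    if len(preds) != len(targets):
        raise ValueError("this sketch assumes equally sized prediction and target sets")
    if not targets:
        return 0.0, ()
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(len(preds))):
        loss = sum(cost(preds[p], t) for p, t in zip(perm, targets)) / len(targets)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Reordering the predictions leaves the loss unchanged. That is exactly the property needed when the decoder outputs an unordered set of emitters: the network is not penalized for listing a correct emitter in a "wrong" slot.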

2605.17548 2026-06-08 cs.SE cs.AI Updated version

Rethinking Code Review in the Age of AI: A Vision for Agentic Code Review

Hüseyin Özgür Kamalı, Erdem Tuna, Vahid Haratian, Eray Tüzün

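AI summary: Examines how code review is evolving in the AI era and proposes an AI-driven code review workflow that combines specialized agents with human-controlled quality gates, aiming to make code review more efficient and reliable.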

Comments: Submitted to ACM Transactions on Software Engineering Methodology (TOSEM). A shorter version of this work has been presented at ICSE-JAWs 2026, Rio de Janeiro, Brazil

English abstract

Code review has evolved for decades, from informal peer checking to today's pull request (PR) workflows, yet it remains a largely manual and cognitively demanding process. The rise of Artificial Intelligence (AI) coding assistants has intensified this challenge: while these tools increase code production velocity, they also expand the volume of code requiring review, turning code review into a growing bottleneck. Current AI support in code review remains fragmented, with tools focusing on isolated tasks such as reviewer recommendation, PR description generation, or comment suggestion rather than the end-to-end PR review workflow. We address this gap by treating review effectiveness as an outcome of the full code review lifecycle rather than a single stage, proposing a framework that carries context across stage boundaries. We propose a future vision for code review in which reviewers transition from manual inspectors into supervisory operators of agents. In this vision, staged, AI-powered workflows aim to align the pace of code generation with shared understanding and accountable engineering. In this paper, we review the historical evolution of code review practices, identify challenges in traditional code review systems, and examine the shift driven by large language models (LLMs) and agentic AI systems. We then present a vision for an AI-powered code review workflow combining specialized agents with human-controlled quality gates. Our framework spans five stages: PR Creation, PR Augmentation, Reviewer Selection, AI-Assisted Code Review, and PR Retrospective, with humans retained at key decision points to preserve judgment, accountability, and team-level understanding. Finally, we identify key adoption challenges and outline research directions for evaluation, governance, and responsible human-AI collaboration.

2601.13508 2026-06-08 cond-mat.mtrl-sci cs.AI Updated version

Autonomous computational catalysis through an agentic research system

Honghao Chen, Jiangjie Qiu, Yi Shen Tew, Xiaonan Wang

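AI summary: Proposes CatMaster, a catalysis-native agentic research system that turns natural-language requests into computational studies. It executes autonomously from modeling through closed-loop catalyst design and identifies competitive active-site motifs in a CO2-to-CO catalyst design case.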

Comments: 25 pages for main manuscript; SI not available here

English abstract

Autonomous agents are beginning to transform scientific research from tool-assisted workflows toward self-sustaining discovery processes. Computational catalysis provides a representative challenge, as catalyst discovery requires high-level questions to be translated into coordinated model construction, atomistic simulation, mechanistic analysis, and iterative design across multiple scales. Here we introduce CatMaster, a catalysis-native agentic research system that recasts computational catalysis as a low-barrier virtual ecosystem for autonomous research. CatMaster maintains an evolving research state and extends its capabilities through self-feedback across model construction, calculation, critique, and catalyst-design decisions within one extensible environment. Across progressively challenging tasks, CatMaster converts natural-language requests into concrete computational studies, from essential atomistic modelling and standard calculations to mechanism exploration and closed-loop catalyst design. It showed robust execution in representative computational-catalysis scenarios and near-leading performance on selected MatBench tasks, with a phonon scenario demonstrating its self-evolving modelling capability. In an independent CO2-to-CO catalyst design case, CatMaster used iterative self-critique and evidence refinement to identify competitive B-CoN4 and NiN3B/N-NiN3B motifs. These results establish a virtual-ecosystem paradigm in which AI agents move beyond simulation execution toward end-to-end computational research, providing a foundation for autonomous discovery in catalysis and materials science.

2605.06215 2026-06-08 physics.chem-ph cs.AI Updated version

COF26: A new on-top functional for multiconfiguration pair-density functional theory

Yuhao Chen, Donald G. Truhlar, Xiao He

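AI summary: Proposes the COF26 functional, developed with an LLM-assisted optimization workflow. It performs well for both strongly and weakly correlated systems and is recommended for future MC-PDFT calculations.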

English abstract

Multiconfiguration pair-density functional theory (MC-PDFT) provides an efficient and accurate framework for computing electronic energies in strongly correlated molecular systems, with the quality of the on-top functional being a key determinant of its predictive accuracy. Here, we introduce MMCDDB26, a rigorously curated benchmark database comprising 76 datasets and 1,495 reactions. We further propose a constrained, large-language-model-assisted optimization workflow for the development and assessment of MC-PDFT functionals. Using this workflow, we optimized the parameters of the MC23/MC25 functionals on MMCDDB26 to obtain MC26. Compared with earlier functionals of the same class, MC26 improves the accuracy on the training set and achieves a more balanced overall performance. In addition, we developed the hybrid meta-functional COF26. We find that COF26 delivers superior performance for both strongly and weakly correlated systems, and therefore recommend COF26 for future MC-PDFT calculations.

2604.03146 2026-06-08 stat.ML cs.LG Updated version

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

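AI summary: Heuristically extends the Convex Gaussian Min-Max Theorem to non-Gaussian data to characterize the asymptotic distribution of high-dimensional empirical risk minimization estimators, revealing the scope and limits of Gaussian universality.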

Journal ref: ICML 2026
Comments: 28 pages, 5 figures, 1 table

English abstract

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat\theta}$ and covariance $C_{\hat\theta}$ of the ERM estimator $\hat\theta$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat\theta^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat\theta}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat\theta} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat\theta}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.
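
In symbols, the main approximation stated above reads as follows. This is a restatement of the abstract; $z$ is an auxiliary standard Gaussian introduced here only for notation.

```latex
$$
\hat\theta^\top x \;\approx\; \mu_{\hat\theta}^\top x \;+\; \sqrt{\mathrm{tr}\!\left(C_{\hat\theta}\,\mathbb{E}[xx^\top]\right)}\; z,
\qquad z \sim \mathcal{N}(0,1) \text{ independent of } x .
$$
```

Any departure from Gaussianity in the prediction therefore enters only through the first term, the projection of the test covariate onto the mean estimator. The second term is always Gaussian.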

2603.21510 2026-06-08 eess.IV cs.CV Updated version

Unregistered Spectral Image Fusion: Unmixing, Adversarial Learning, and Recoverability

Jiahui Song, Sagar Shrestha, Xiao Fu

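AI summary: Proposes an unsupervised framework that simultaneously super-resolves unregistered hyperspectral and multispectral images by coupling spectral unmixing with latent-space adversarial learning. It establishes the first recoverability guarantees for this setting.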

English abstract

This paper addresses the fusion of a spatially unregistered pair consisting of a hyperspectral image (HSI) and a multispectral image (MSI) that cover roughly overlapping regions. HSIs offer high spectral but low spatial resolution, while MSIs provide the opposite. The goal is to integrate their complementary information to enhance both the spatial resolution of the HSI and the spectral resolution of the MSI. While hyperspectral-multispectral fusion (HMF) has been widely studied, the unregistered setting remains challenging. Many existing methods focus solely on MSI super-resolution, leaving the HSI unchanged. Supervised deep learning approaches have been proposed for HSI super-resolution, but they rely on accurate training data, which are often unavailable. Moreover, theoretical analyses largely address the co-registered case, leaving unregistered HMF poorly understood. In this work, an unsupervised framework is proposed to simultaneously super-resolve both the MSI and the HSI. The method integrates coupled spectral unmixing for MSI super-resolution with latent-space adversarial learning for HSI super-resolution. Theoretical guarantees on the recoverability of the super-resolved MSI and HSI are established under reasonable generative models, providing, to the best of our knowledge, the first such insights for unregistered HMF. The approach is validated on semi-real and real HSI-MSI pairs across diverse conditions.
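The linear mixing model shows why spectral unmixing can super-resolve the MSI without registration. Endmember spectra do not depend on pixel location, so a spectral library estimated from the HSI can be paired with abundances estimated from the MSI.

The sketch below is a noise-free toy built on three assumptions:
- the endmember matrix `E` is taken as known;
- the spectral response `R` is a hypothetical block average;
- plain least squares replaces the constrained unmixing a real method would use.

It does not cover the adversarial HSI super-resolution branch.

```python
import numpy as np

rng = np.random.default_rng(1)
L_h, L_m, K, P = 48, 6, 4, 100   # HSI bands, MSI bands, endmembers, high-res MSI pixels

# Endmember spectra (L_h x K). They do not depend on pixel location, so in principle they
# can be estimated from the low-resolution HSI without registering it to the MSI.
# Here they are simply assumed known.
E = rng.uniform(0.1, 1.0, size=(L_h, K))

# High-resolution abundances (K x P): nonnegative, summing to one at each pixel.
A_true = rng.dirichlet(np.ones(K), size=P).T

# Hypothetical spectral response (L_m x L_h): each MSI band averages 8 adjacent HSI bands.
R = np.kron(np.eye(L_m), np.full((1, L_h // L_m), 1.0 / (L_h // L_m)))

Y_msi = R @ E @ A_true            # noise-free MSI under the linear mixing model

# Coupled unmixing: estimate abundances at MSI resolution through R @ E,
# then lift them to full spectral resolution with E.
A_hat, *_ = np.linalg.lstsq(R @ E, Y_msi, rcond=None)
X_sr = E @ A_hat                  # spectrally super-resolved MSI (L_h x P)
X_true = E @ A_true
print(np.max(np.abs(X_sr - X_true)))
```

Recovery is exact here only because the data are noise-free and `R @ E` has full column rank (6 MSI bands, 4 endmembers).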

2603.04982 2026-06-08 cs.CY cs.AI cs.HC Updated version

Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis

技术培训：法律分析中生成式人工智能的采纳与生产性使用

Benjamin M. Chen, Hong Bao

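AI summary: A randomized experiment finds that untrained law students' use of a large language model tended to lower their performance. Brief training significantly raised both adoption and scores, indicating that generative AI's productivity gains depend on training.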

AI-generated Chinese abstract

有针对性的用户培训能否释放生成式人工智能在专业环境中的生产潜力？我们通过一项随机实验研究了这个问题，其中164名法学学生在三种条件下完成了一项问题识别考试：无GenAI访问权限、可选访问大语言模型（LLM）、或LLM访问加简短培训干预。未经培训的LLM访问被证明适得其反：与没有任何LLM访问权限的参与者相比，未经培训的用户撰写的答案明显更短，犯更多案例陈述错误，且得分略低，尽管大多数差异未达到常规显著性水平。培训扭转了这一模式。接受培训的参与者以更高的比例采纳LLM（41% vs. 26%；p = 0.044），得分比未经培训的用户高0.27个绩点——大约一个精细等级——（p = 0.027），并且更准确地陈述了适用规则（p = 0.014）。主分层分析表明，培训主要通过采纳而非有效性发挥作用——在严格均值优势下，采纳下限（1.06）超过有效性上限（0.42）——尽管置信区间较宽。更广泛地说，这些发现挑战了GenAI主要惠及低技能工人的观点：没有培训，高能力从业者选择退出，而低能力用户采纳但无生产力。实现GenAI的生产力提升需要同时投资于访问和指导。

English abstract

Can targeted user training unlock the productive potential of generative artificial intelligence in professional settings? We study this question using a randomized experiment in which 164 law students completed an issue-spotting examination under one of three conditions: no GenAI access, optional access to a large language model (LLM), or LLM access with a brief training intervention. Untrained LLM access proved counterproductive: relative to participants without any LLM access, untrained users wrote significantly shorter answers, committed more case misstatements, and scored marginally lower, though most differences fall short of conventional significance. Training reversed this pattern. Trained participants adopted the LLM at higher rates (41% vs. 26%; p = 0.044), scored 0.27 grade points higher than untrained users, roughly one fine grade (p = 0.027), and stated applicable rules more accurately (p = 0.014). Principal stratification analysis suggests that training operates primarily through adoption rather than effectiveness: at strict mean dominance, the adoption lower bound (1.06) exceeds the effectiveness upper bound (0.42), though confidence intervals are wide. More broadly, these findings challenge the view that GenAI primarily benefits lower-skilled workers: without training, higher-ability practitioners opt out, while lower-ability users adopt it but use it unproductively. Realizing GenAI's productivity gains requires investment in both access and instruction.

2603.22327 2026-06-08 cs.IR cs.AI cs.DL Updated version

Evaluating AI-based Scientific Knowledge Synthesis with Epidemiological Systematic Reviews

基于流行病学系统评价评估AI科学知识综合

Shreyansh Padarha, Ryan Othniel Kearns, Tristan Naidoo, Lingyi Yang, Łukasz Borchmann, Piotr BŁaszczyk, Christian Morgenstern, Ruth McCabe, Sangeeta Bhatia, Philip H. Torr, Jakob Foerster, Scott A. Hale, Thomas Rawson, Anne Cori, Elizaveta Semenova, Adam Mahdi

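AI summary: Introduces AgentSLR, an evaluation harness comprising an automated workflow and an expert-annotated dataset. It tests LLM capabilities at each stage of epidemiological systematic reviews and finds that no model leads across all stages, with structured data extraction as the main bottleneck.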

AI-generated Chinese abstract

系统文献综述（SLR）是一种要求高且风险大的科学知识综合形式，但作为大型语言模型（LLM）的评估场景仍未被充分定义。我们引入了AgentSLR，一个大规模评估框架，包含SLR自动化工作流和覆盖16,248篇文章的专家标注数据集，旨在测试LLM在流行病学SLR各阶段的能力。参考标注来自关于WHO优先病原体的同行评审研究，并由领域专家制作。该框架将每个综述阶段作为独立单元进行评估，并配有专用指标，以便进行有针对性的失败分析。我们评估了五种前沿推理模型，发现没有单一模型在所有任务中占主导地位，显示出子任务专业化往往被聚合基准所掩盖。结构化数据提取是一个主要瓶颈，没有模型在平均字段级F1上超过0.67。估计成本差异很大，评估模型之间相差高达96倍。记录的失败模式表明，评估的模型在流行病学中尚不足以可靠地进行无监督部署，因为其发现可能影响公共政策。

English abstract

Systematic literature reviews (SLRs) are a demanding and high-stakes form of scientific knowledge synthesis that remains underspecified as an evaluation setting for large language models (LLMs). We introduce AgentSLR, a large-scale evaluation harness comprising an SLR automation workflow and an expert-annotated dataset covering 16,248 articles, designed to test LLM capabilities across the stages of SLRs in epidemiology. Reference annotations were derived from peer-reviewed studies on WHO priority pathogens and produced by domain experts. The harness evaluates each review stage as a separate unit with dedicated metrics, enabling targeted failure analysis. We evaluated five frontier reasoning models and found that no single model dominated across all tasks, revealing sub-task specialisation that aggregate benchmarks often hide. Structured data extraction is a major bottleneck, with no model exceeding an average field-level F1 of 0.67. Estimated costs vary substantially, by a factor of up to 96 across evaluated models. Documented failure modes suggest that the evaluated models are not yet reliable enough for unsupervised deployment in epidemiology, where findings can inform public policy.
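The abstract reports an average field-level F1 but does not give the harness's exact matching rules. One common way to compute such a score, assumed here, is to:
- treat each extracted field as a set of values;
- score each field with set-based F1;
- average the scores over fields.

Exact string matching and the empty-field convention are our choices, and the field names are made up.

```python
from typing import Dict, Set

def field_f1(pred: Set[str], gold: Set[str]) -> float:
    """Set-based F1 for one field, using exact string matches."""
    if not pred and not gold:
        return 1.0  # assumed convention: nothing to extract and nothing extracted
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def average_field_f1(pred: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Mean of per-field F1 over every field present in either record."""
    fields = sorted(set(pred) | set(gold))
    scores = [field_f1(pred.get(f, set()), gold.get(f, set())) for f in fields]
    return sum(scores) / len(scores)

# Toy example with made-up field names and values.
gold = {"pathogen": {"X"}, "r0": {"1.2", "1.5"}, "country": {"Y"}}
pred = {"pathogen": {"X"}, "r0": {"1.2"}, "country": {"Z"}}
print(average_field_f1(pred, gold))  # (1 + 2/3 + 0) / 3, about 0.556
```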

2512.04123 2026-06-08 cs.CY cs.AI cs.LG cs.SE Updated version

Measuring Agents in Production

生产环境中的智能体测量

Melissa Z. Pan, Negar Arabzadeh, Riccardo Cogo, Yuxuan Zhu, Alexander Xiong, Lakshya A Agrawal, Huanzhi Mao, Emma Shen, Sid Pallerla, Liana Patel, Shu Liu, Tianneng Shi, Xiaoyuan Liu, Jared Quincy Davis, Emmanuele Lacavalla, Alessandro Basile, Shuyi Yang, Paul Castro, Daniel Kang, Koushik Sen, Dawn Song, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia, Marquita Ellis

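AI summary: A survey covering 86 deployed systems and 20 case studies finds that production LLM agents mostly use simple, controllable approaches. Reliability is the top challenge, and practitioners rely on systems-level design and human evaluation.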

Comments: Accepted to the 43rd International Conference on Machine Learning (ICML 2026) as Oral Presentation

AI-generated Chinese abstract

基于LLM的智能体已经在许多行业的生产环境中运行，但我们缺乏对哪些技术方法能使部署成功的理解。我们首次系统性地研究了生产环境中的智能体测量（MAP），使用了来自智能体开发者的一手数据。我们通过深度访谈进行了20个案例研究，并调查了来自26个领域的86个已部署系统的从业者。我们调查了组织为何构建智能体、如何构建它们、如何评估它们以及它们面临的主要开发挑战。我们的研究发现，生产环境中的智能体是使用简单、可控的方法构建的：68%的智能体在人类干预前最多执行10步，70%依赖对现成模型进行提示而非权重调整，74%主要依赖人工评估。可靠性（随时间保持一致的正确行为）仍然是首要开发挑战，从业者目前通过系统级设计来解决。MAP记录了生产智能体的当前状态，为研究社区提供了部署现实和未充分探索的研究方向的可见性。

English abstract

LLM-based agents already operate in production across many industries, yet we lack an understanding of what technical methods make deployments successful. We present the first systematic study of Measuring Agents in Production (MAP), using first-hand data from agent developers. We conducted 20 case studies via in-depth interviews and surveyed practitioners behind 86 deployed systems across 26 domains. We investigate why organizations build agents, how they build them, how they evaluate them, and their top development challenges. Our study finds that production agents are built using simple, controllable approaches: 68% execute at most 10 steps before human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation. Reliability (consistent correct behavior over time) remains the top development challenge, which practitioners currently address through systems-level design. MAP documents the current state of production agents, providing the research community with visibility into deployment realities and underexplored research avenues.

2601.12375 2026-06-08 cs.NI cs.LG Updated version

LiQSS: Post-Transformer Linear Quantum-Inspired State-Space Tensor Networks for Real-Time 6G

LiQSS：后Transformer线性量子启发状态空间张量网络用于实时6G

Farhad Rezazadeh, Hatim Chergui, Amir Ashtari Gargari, Mehdi Bennis, Houbing Song, Lingjia Liu, Merouane Debbah

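AI summary: Proposes LiQSS, a post-Transformer, quantum-inspired state-space tensor network. It replaces self-attention with linear-complexity structured state-space kernels and combines tensor-train factorization with lightweight gating. For Near-RT KPI forecasting in 6G O-RAN, it achieves up to 155x fewer parameters and up to 2.74x faster inference than Transformer-based models without loss of accuracy.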

Comments: 13 pages, 4 figures, 5 tables

AI-generated Chinese abstract

第六代（6G）开放无线接入网络（O-RAN）中的主动和智能控制需要在严格的近实时（Near-RT）延迟和计算约束下进行控制级预测。虽然基于Transformer的模型在序列建模中有效，但其二次复杂度限制了在近实时RAN智能控制器（RIC）分析中的可扩展性。本文研究了一种后Transformer设计范式，用于高效的无线电遥测预测。我们提出了一种量子启发的多体状态空间张量网络，用稳定的结构化状态空间动力学核替代自注意力，实现线性时间序列建模。采用张量训练（TT）/矩阵乘积态（MPS）表示形式的张量网络分解，以减少输入投影和预测头中的参数化和数据移动，同时轻量级通道门控和混合层捕获非平稳的跨关键性能指标（KPI）依赖关系。该模型实例化为一个智能感知-预测xApp，并在一个包含13个KPI的59441个滑动窗口的定制O-RAN KPI时间序列数据集上评估，以参考信号接收功率（RSRP）预测作为代表性用例。我们提出的线性量子启发状态空间（LiQSS）模型比先前的结构化状态空间基线小10.8倍至15.8倍，速度快约1.4倍。相对于基于Transformer的模型，LiQSS在不牺牲预测精度的情况下，参数数量减少高达155倍，推理速度提升高达2.74倍。

English abstract

Proactive and agentic control in Sixth-Generation (6G) Open Radio Access Networks (O-RAN) requires control-grade prediction under stringent Near-Real-Time (Near-RT) latency and computational constraints. While Transformer-based models are effective for sequence modeling, their quadratic complexity limits scalability in Near-RT RAN Intelligent Controller (RIC) analytics. This paper investigates a post-Transformer design paradigm for efficient radio telemetry forecasting. We propose a quantum-inspired many-body state-space tensor network that replaces self-attention with stable structured state-space dynamics kernels, enabling linear-time sequence modeling. Tensor-network factorizations in the form of Tensor Train (TT) / Matrix Product State (MPS) representations are employed to reduce parameterization and data movement in both input projections and prediction heads, while lightweight channel gating and mixing layers capture non-stationary cross-Key Performance Indicator (KPI) dependencies. The proposed model is instantiated as an agentic perceive-predict xApp and evaluated on a bespoke O-RAN KPI time-series dataset comprising 59,441 sliding windows across 13 KPIs, using Reference Signal Received Power (RSRP) forecasting as a representative use case. Our proposed Linear Quantum-Inspired State-Space (LiQSS) model is 10.8x-15.8x smaller and approximately 1.4x faster than prior structured state-space baselines. Relative to Transformer-based models, LiQSS achieves up to a 155x reduction in parameter count and up to 2.74x faster inference, without sacrificing forecasting accuracy.
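The linear-time claim rests on the recurrent view of a state-space layer. The sketch below assumes a diagonal, real, time-invariant state matrix, an S4D-style simplification. LiQSS's actual kernels, tensor-train projections and gating are not reproduced. The sketch checks that the O(L N) recurrence equals a causal convolution with kernel $K_k = \sum_n C_n a_n^k B_n$.

```python
import numpy as np

def ssm_scan(a, B, C, u):
    """Recurrent form, O(L * N): h_t = a * h_{t-1} + B * u_t, y_t = C . h_t."""
    h = np.zeros_like(a)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = a * h + B * u_t      # diagonal state update, elementwise
        y[t] = C @ h             # scalar readout
    return y

def ssm_conv(a, B, C, u):
    """Convolutional form: y_t = sum_k K_k u_{t-k} with K_k = sum_n C_n a_n^k B_n."""
    L = len(u)
    powers = a[None, :] ** np.arange(L)[:, None]   # (L, N) matrix of a_n^k
    kernel = powers @ (C * B)                       # (L,) kernel K_0 .. K_{L-1}
    return np.array([kernel[: t + 1][::-1] @ u[: t + 1] for t in range(L)])

rng = np.random.default_rng(0)
N, L = 8, 64
a = rng.uniform(0.5, 0.99, size=N)   # stable poles, |a_n| < 1
B = rng.normal(size=N)
C = rng.normal(size=N)
u = rng.normal(size=L)               # e.g. one normalized KPI series
y_scan = ssm_scan(a, B, C, u)
y_conv = ssm_conv(a, B, C, u)
print(np.max(np.abs(y_scan - y_conv)))
```

The convolution is computed naively in O(L^2) here only to verify the equivalence. In practice it would be evaluated with FFTs, while the scan suits streaming, step-by-step inference.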

2510.17004 2026-06-08 cs.MA cs.AI Updated version

ReclAIm: A Multi-Agent Framework for Monitoring and Correcting Performance Decline in Medical Imaging AI

ReclAIm：用于监测和纠正医学影像AI性能下降的多智能体框架

Eleftherios Tzanis, Michail E. Klontzas

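AI summary: Proposes ReclAIm, a large-language-model-based multi-agent framework that uses natural-language interaction to monitor medical image classification models for performance decline. When a decline is detected, it triggers fine-tuning with data augmentation, class-imbalance handling and parameter-anchoring regularization. Its effectiveness is validated on multiple datasets.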

DOI: 10.1148/ryai.250923
Comments: Published in Radiology: Artificial Intelligence (https://doi.org/10.1148/ryai.250923)

AI-generated Chinese abstract

目的：开发并评估一个用于自动监测、检测和纠正医学图像分类模型性能下降的多智能体框架（ReclAIm）。材料与方法：ReclAIm是一个基于大语言模型的多智能体系统，通过自然语言交互运行。一个主智能体协调三个任务特定智能体，执行性能评估并在检测到显著性能下降时触发微调。微调流程包含数据增强、类别不平衡处理以及参数锚定正则化策略以限制灾难性遗忘。该系统使用多个影像数据集进行基准测试，包括脑部MRI、胸部CT和胸部X光片，按模型开发、推理（性能监测）和微调子集划分（60%:20%:20%）。结果：ReclAIm成功协调了所有数据集的训练、评估和性能监测。在18个模型中的8个中检测到测试数据与推理数据之间的性能差异，触发了微调流程以减少性能差距。在性能下降高达40.6%的情况下（心脏肥大数据集，InceptionV3），微调将性能指标恢复至基线值的2%以内。结论：ReclAIm为医学图像分类模型的自动监测和定向微调提供了一个原型框架，其自然语言接口旨在支持研究及潜在临床应用的可及性。

English abstract

Purpose: To develop and evaluate a multi-agent framework (ReclAIm) for automated monitoring, detection, and correction of performance decline in medical image classification models. Materials and Methods: ReclAIm is a large language model-based multi-agent system that operates through natural language interaction. A master agent coordinating three task-specific agents performed performance evaluation and triggered fine-tuning when substantial performance declines were detected. The fine-tuning workflow incorporated data augmentation, class imbalance handling, and a parameter-anchoring regularization strategy to limit catastrophic forgetting. The system was benchmarked using multiple imaging datasets, including brain MRI, chest CT, and chest radiography, partitioned into model development, inference (performance monitoring), and fine-tuning subsets (60%:20%:20%). Results: ReclAIm successfully orchestrated training, evaluation, and performance monitoring across all datasets. Performance discrepancies between test and inference data were detected in 8 of 18 models, prompting fine-tuning workflows that reduced performance gaps. In cases with declines of up to 40.6% (cardiomegaly dataset, InceptionV3), fine-tuning restored performance metrics to within 2% of baseline values. Conclusion: ReclAIm provides a prototype framework for automated monitoring and targeted fine-tuning of medical image classification models, with a natural language interface designed to support accessibility in research and potential clinical applications.
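The monitoring-then-fine-tuning loop can be sketched in a few lines. Both pieces are illustrative assumptions:
- the 5% relative-drop trigger is a hypothetical threshold;
- the parameter anchor is a plain L2 penalty toward the deployed weights, in the spirit of L2-SP, applied to logistic regression rather than to a deep classifier.

ReclAIm's actual regularizer and thresholds may differ.

```python
import numpy as np

def should_finetune(baseline_metric, current_metric, max_relative_drop=0.05):
    """Hypothetical trigger: flag fine-tuning when the metric falls by more than 5% (relative)."""
    return (baseline_metric - current_metric) / baseline_metric > max_relative_drop

def anchored_finetune(w0, X, y, lam, lr=0.1, steps=500):
    """Gradient descent on mean log-loss + (lam / 2) * ||w - w0||^2 for logistic regression.

    The anchor term pulls the weights back toward the deployed model w0,
    a simple guard against catastrophic forgetting.
    """
    w = w0.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))                  # predicted probabilities
        grad = X.T @ (p - y) / len(y) + lam * (w - w0)     # log-loss gradient + anchor gradient
        w -= lr * grad
    return w

# Toy drift: new data follow a different decision rule than the deployed weights w0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_shifted = np.array([1.5, -1.0, 0.5, 0.0, 2.0])
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(X @ w_shifted)))).astype(float)
w0 = np.zeros(5)

triggered = should_finetune(baseline_metric=0.90, current_metric=0.80)
w_free = anchored_finetune(w0, X, y, lam=0.0)        # no anchor: free to drift
w_anchored = anchored_finetune(w0, X, y, lam=10.0)   # strong anchor: stays near w0
print(triggered, np.linalg.norm(w_free - w0), np.linalg.norm(w_anchored - w0))
```

The anchor strength `lam` trades adaptation to the new data against retention of the original model.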

2509.04991 2026-06-08 physics.ao-ph cs.AI cs.LG Updated version

A Mechanism-Coupled Split Window Network for Medium- to High-Resolution Land Surface Temperature Retrieval

一种面向中高分辨率地表温度反演的机制耦合分裂窗网络

Tian Xie, Menghui Jiang, Chao Zeng, Huifang Li, Guanhao Zhang, Chan Li, Huanfeng Shen

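AI summary: Proposes the Parallel Component Decoupled Neural Network (PCD-Net), which reformulates split-window retrieval as dynamic learning of physical component coefficients. Through component-level decoupled modeling plus a residual branch, it aims for accurate, robust and globally generalizable land surface temperature retrieval under complex atmospheric and surface conditions.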

AI-generated Chinese abstract

地表温度（LST）是陆-气相互作用、地表能量收支和气候过程中的基本物理变量。从中高分辨率热红外（TIR）观测中获取的LST能有效揭示不同景观单元间的热环境差异。然而，在复杂大气条件和多样土地覆盖类型下，实现准确、鲁棒且全局可泛化的LST反演仍具挑战。传统分裂窗（SW）算法严重依赖经验参数化，其固定系数无法适应高温地表和高大气水汽含量等复杂场景。同时，传统数据驱动模型因缺乏显式物理结构约束，对分布外（OOD）样本的泛化能力有限。为解决这些问题，本研究提出并行分量解耦神经网络（PCD-Net）框架，将SW反演重构为物理分量系数的动态学习问题。以SW方程作为物理主干，该框架构建并行子网络，自适应学习对应常数项、一阶和二阶亮度温度差项的动态系数；同时引入残差分支，补充由地表发射率和大气水汽联合效应引起的非线性耦合校正。通过这种分量级解耦建模，PCD-Net显式刻画了地表发射率、大气水汽含量与不同SW物理分量之间的动态响应关系。

English abstract

Land surface temperature (LST) is a fundamental physical variable in land-atmosphere interactions, surface energy budgets, and climate processes. LST derived from medium- to high-resolution thermal infrared (TIR) observations effectively reveals thermal environmental disparities across distinct landscape units. However, achieving accurate, robust, and globally generalizable LST retrieval remains challenging under complex atmospheric conditions and diverse land cover types. Traditional split window (SW) algorithms heavily rely on empirical parameterizations, whose fixed coefficients fail to adapt to complex scenarios such as high surface temperatures and high atmospheric water vapor content. Concurrently, conventional data-driven models exhibit limited generalizability to out-of-distribution (OOD) samples due to the absence of explicit physical structure constraints. To address these issues, this study proposes a Parallel Component Decoupled Neural Network (PCD-Net) framework, which reformulates SW retrieval as a dynamic learning problem of physical component coefficients. Using the SW equation as the physical backbone, the framework constructs parallel subnetworks to adaptively learn the dynamic coefficients corresponding to the constant, first-order, and second-order brightness temperature difference terms; meanwhile, a residual branch is incorporated to supplement the nonlinear coupling corrections induced by the joint effects of surface emissivity and atmospheric water vapor. Through this component-level decoupled modeling, PCD-Net explicitly characterizes the dynamic response relationships between land surface emissivity, atmospheric water vapor content, and different SW physical components.
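The abstract does not state the exact split-window equation. A common quadratic form, assumed here, makes the "constant, first-order, and second-order" terms concrete. In PCD-Net's reformulation, the fixed coefficients of a classical SW algorithm would become functions predicted by parallel subnetworks, with an added residual term:

$$
T_s = T_i + c_0(\varepsilon, W) + c_1(\varepsilon, W)\,(T_i - T_j) + c_2(\varepsilon, W)\,(T_i - T_j)^2 + r(\varepsilon, W, T_i, T_j)
$$

Here:
- $T_i$ and $T_j$ are the brightness temperatures of two adjacent TIR channels;
- $\varepsilon$ is the land surface emissivity and $W$ the atmospheric water vapor content;
- $c_0, c_1, c_2$ are the subnetwork outputs;
- $r$ is the residual branch.

A classical SW algorithm would instead use fixed, pre-fitted coefficients and no learned residual.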

2404.02141 2026-06-08 stat.ME cs.LG econ.EM stat.CO stat.ML Updated version

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

使用Rashomon分区稳健估计因子数据中的异质性

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

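AI summary: Proposes Rashomon Partition Sets (RPS), a Bayesian framework that quantifies model uncertainty by enumerating all models whose posterior density is close to that of the maximum a posteriori model, enabling robust estimation of heterogeneity.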

AI-generated Chinese abstract

在观测数据和随机对照试验中，研究人员选择统计模型来阐述感兴趣的结果如何随可观测协变量的组合而变化。选择过于简单的模型可能会掩盖协变量组之间结果的重要异质性，而过于复杂则可能识别出虚假模式。在本文中，我们提出了一种新颖的贝叶斯模型不确定性框架，称为Rashomon分区集(RPS)。RPS包含所有后验密度接近最大后验(MAP)模型的模型。我们通过枚举而非采样来构建RPS，这确保我们探索数据中具有高证据的所有模型，即使它们提供截然不同的实质性解释。我们使用l0先验，该先验允许我们在不对效应之间的关联施加强假设的情况下捕获复杂的异质性，并从信息论角度证明该先验是极小化最优的。我们刻画了在RPS内相对于整个后验条件计算的参数（的函数）的近似误差。我们提出了一种算法，从可解释且唯一的模型类中枚举RPS，然后给出RPS大小的界限。我们提供了模拟证据以及三个实证例子：价格对慈善捐赠的影响、染色体结构的异质性以及小额信贷的引入。

English abstract

In both observational data and randomized controlled trials, researchers select statistical models to articulate how the outcome of interest varies with combinations of observable covariates. Choosing a model that is too simple can obscure important heterogeneity in outcomes between covariate groups, while too much complexity risks identifying spurious patterns. In this paper, we propose a novel Bayesian framework for model uncertainty called Rashomon Partition Sets (RPSs). The RPS consists of all models that have posterior density close to the maximum a posteriori (MAP) model. We construct the RPS by enumeration, rather than sampling, which ensures that we explore all models with high evidence in the data, even if they offer dramatically different substantive explanations. We use an $\ell_0$ prior, which allows us to capture complex heterogeneity without imposing strong assumptions about the associations between effects, and we show that this prior is minimax optimal from an information-theoretic perspective. We characterize the approximation error of (functions of) parameters computed conditional on being in the RPS relative to the entire posterior. We propose an algorithm to enumerate the RPS from the class of models that are interpretable and unique, then provide bounds on the size of the RPS. We give simulation evidence along with three empirical examples: price effects on charitable giving, heterogeneity in chromosomal structure, and the introduction of microfinance.
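Constructing the RPS by enumeration can be illustrated in the simplest factorial case. The sketch makes three assumptions:
- a single factor with ordered levels that may only be pooled contiguously;
- each partition is scored with a Gaussian profile log-likelihood minus an $\ell_0$-style penalty of `lam` per pool, a stand-in for the paper's posterior;
- every partition within `eps` of the best score is kept.

The scoring rule, `sigma`, `lam`, `eps` and the toy data are all our assumptions.

```python
import itertools
import numpy as np

def contiguous_partitions(k):
    """Yield every partition of ordered levels 0..k-1 into contiguous pools (2**(k-1) in total)."""
    for cuts in itertools.product([False, True], repeat=k - 1):
        pools, current = [], [0]
        for level, cut in enumerate(cuts, start=1):
            if cut:                      # start a new pool at this level
                pools.append(current)
                current = [level]
            else:                        # merge this level into the current pool
                current.append(level)
        pools.append(current)
        yield pools

def score(pools, y_by_level, sigma, lam):
    """Gaussian profile log-likelihood with one mean per pool, minus lam per pool (l0-style)."""
    ll = 0.0
    for pool in pools:
        ys = np.concatenate([y_by_level[level] for level in pool])
        ll -= 0.5 * np.sum((ys - ys.mean()) ** 2) / sigma**2
    return ll - lam * len(pools)

def rashomon_partition_set(y_by_level, sigma=1.0, lam=2.0, eps=2.5):
    """Enumerate all partitions and keep those scoring within eps of the best one."""
    scored = [(score(p, y_by_level, sigma, lam), p) for p in contiguous_partitions(len(y_by_level))]
    best = max(s for s, _ in scored)
    kept = [(s, p) for s, p in scored if s >= best - eps]
    return sorted(kept, key=lambda item: -item[0])

# Toy data: levels 0 and 1 look alike, as do levels 2 and 3.
y_by_level = [
    np.array([0.0, 0.1, -0.1]),
    np.array([0.05, -0.05, 0.0]),
    np.array([5.0, 5.1, 4.9]),
    np.array([5.05, 4.95, 5.0]),
]
for s, pools in rashomon_partition_set(y_by_level):
    print(round(s, 3), pools)
```

Enumeration visits all 2^(k-1) partitions, so even lower-scoring but plausible pooling structures are retained rather than missed by a sampler. This is feasible only because the model class is small; the paper's algorithm and size bounds address the general factorial case.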

2507.01548 2026-06-08 cs.HC cs.AI cs.CL Updated version

Telling stories, making Hanzi: AI-assisted co-creation with elderly migrants in urban China

讲述故事，创造汉字：人工智能辅助中国城市老年移民的协同创作

Yunfei Chen, Wen Zhan, Peiyue Lin, Ziqun Hua, Ying Hu

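AI summary: In co-creation workshops combining oral storytelling, AI assistance and hand-making, elderly migrants create new Hanzi to record overlooked life stories. The study reveals heterogeneity and adaptive capacity among participants and shows AI acting as a creative initiator that lowers barriers to expression.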

DOI: 10.21606/drs.2026.963

AI-generated Chinese abstract

本文探讨了中国城市老年移民如何记录日常语言和设计常忽略的故事。我们与10位老年人开展了两次协同创作工作坊。活动结合了口述故事、主持人中介的AI辅助和手工制作。大型语言模型通过主持人提出候选字形。参与者创作了新的汉字来承载他们的故事。生成的字符作为记忆锚点，用于后续的分享和复述。我们的解释性分析揭示了参与者之间的异质性和适应能力。参与者将AI视为降低表达和创作门槛的创意启动器，尤其对数字素养较低者。这项工作挑战了关于老年人的同质化假设以及统一能力和需求的预设。我们贡献了一个将AI定位为后台促进者的工作坊框架，并提供了在包容性城市系统中将老年移民视为社区记忆和情境文化知识来源的见解。

English abstract

This paper explores how older migrants in urban China can record stories that everyday language and design often miss. We ran two co-creation workshops with 10 elders. Activities combined oral storytelling, facilitator-mediated AI assistance, and hand-making. Large language models proposed candidate glyphs through a facilitator. Participants crafted new Hanzi to hold their stories. The resulting characters served as memory anchors for later sharing and retelling. Our interpretive analysis shows heterogeneity and adaptive capacity among participants. Participants experienced AI as a creative initiator that lowered barriers to expression and making, especially for those with lower digital literacy. The work challenges homogenizing assumptions about older adults and the presumption of uniform capacities and needs. We contribute a workshop framework that positions AI as a backstage facilitator. We also offer insights on engaging older migrants as sources of community memory and situated cultural knowledge within inclusive urban systems.

2203.07904 2026-06-08 eess.IV cs.CV cs.LG Updated version

Unsupervised Learning Based Focal Stack Camera Depth Estimation

基于无监督学习的焦堆相机深度估计

Zhengyu Huang, Weizhi Du, Theodore B. Norris

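AI summary: Proposes an unsupervised deep learning method that estimates depth from focal stack camera images. On the NYU-v2 dataset, it substantially improves accuracy over single-image methods.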

2505.18006 2026-06-08 cs.CY cs.AI cs.HC cs.IR

AI Literacy for Legal AI Systems: A practical approach

为法律AI系统设计的AI素养：一种实用方法

Gizem Gultekin-Varkonyi

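AI summary: Examines AI literacy for legal AI systems, analyzes its key role in their legal and ethical development, and proposes a practical risk-assessment tool.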

DOI: 10.69695/ias.2025.4.01
Journal ref: Iustum Aequum Salutare, 2025, 21 (4)
Comments: Forthcoming in Iustum Aequum Salutare (2025) vol.21

AI-generated Chinese abstract

法律AI系统正被全球司法和法律系统部署者和提供者越来越多地采用，以支持各种应用。尽管它们提供了减少偏见、提高效率和改善问责的潜在好处，但也带来了重大风险，需要在机会、法律和伦理发展和部署之间取得平衡。AI素养作为欧盟AI法案中的法律要求，以及部署者和提供者实现伦理AI的关键使能者，可以成为实现这一平衡的工具。本文引入了“法律AI系统”一词，然后分析了AI素养的概念及其与这些系统相关的利弊。这一分析与处理法律AI系统的组织的更广泛AI-L概念相关联。本文的成果是一份路线图问卷，作为实用工具，帮助开发者和提供者评估风险、益处和利益相关者的担忧，以满足社会和监管对法律AI的期望。

English abstract

Legal AI systems are increasingly being adopted by judicial and legal system deployers and providers worldwide to support a range of applications. While they offer potential benefits such as reducing bias, increasing efficiency, and improving accountability, they also pose significant risks, requiring a careful balance between these opportunities and legal and ethical development and deployment. AI literacy, a legal requirement under the EU AI Act and a critical enabler of ethical AI for deployers and providers, could be a tool to achieve this balance. The article introduces the term "legal AI systems" and then analyzes the concept of AI literacy and the benefits and risks associated with these systems. This analysis is linked to a broader AI literacy (AI-L) concept for organizations that deal with legal AI systems. The outcome of the article, a roadmap questionnaire that serves as a practical tool for developers and providers to assess risks, benefits, and stakeholder concerns, could be useful in meeting societal and regulatory expectations for legal AI.

2303.11949 2026-06-08 cs.NE cs.LG

A fuzzy adaptive evolutionary-based feature selection and machine learning framework for single and multi-objective body fat prediction

一种基于模糊自适应进化的方法用于单目标和多目标身体脂肪预测的特征选择和机器学习框架

Farshid Keivanian, Raymond Chiong, Zongwen Fan

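AI summary: Proposes a feature selection and machine learning framework that combines fuzzy set theory with evolutionary algorithms. It aims to improve the accuracy and stability of body fat prediction while handling conflicting objectives in multi-objective optimization.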

DOI: 10.1016/j.neucom.2026.132974
Journal ref: Neurocomputing, Article 132974, 2026
Comments: Due to unforeseen challenges in coordination and supervision, including unavoidable delays, this study requires further review and refinement. To ensure it meets necessary academic and methodological standards, we have decided to withdraw the paper. We appreciate the understanding of the research community

AI-generated Chinese abstract

预测身体脂肪可以为医疗人员和用户提供预防和诊断心脏病的重要信息。混合机器学习模型通过选择相关身体测量值并捕捉所选特征之间的复杂非线性关系，比简单的回归分析方法表现更好。然而，这些模型也存在一些缺点。将身体脂肪预测建模为组合的单目标和多目标优化问题时，常常陷入局部最优。当多个特征子集产生相似或接近的预测时，避免局部最优变得更加复杂。进化特征选择已被用于解决几种基于机器学习的优化问题。模糊集理论决定了探索和利用的适当水平，同时管理参数化和计算成本。通过进化特征选择、模糊集理论和机器学习算法，探索了一种加权求和身体脂肪预测方法，将矛盾的指标整合到一个复合目标中，由模糊自适应进化特征选择优化。混合模糊自适应全局学习局部搜索通用多样性特征选择应用于这种单目标特征选择-机器学习框架（FAGLSUD-based FS-ML）。在使用较少特征的情况下，该模型比其他混合和最新机器学习模型获得了更准确和稳定的脂肪百分比估计。还提出了多目标FAGLSUD-based FS-MLP，用于同时分析准确性、稳定性和维度冲突。为了做出关于最关键身体部位脂肪沉积和血液脂质水平的明智决策，医疗人员和用户可以使用一个良好的分布的帕累托集的权衡解决方案。

English abstract

Predicting body fat can provide medical practitioners and users with essential information for preventing and diagnosing heart diseases. Hybrid machine learning models offer better performance than simple regression analysis methods by selecting relevant body measurements and capturing complex nonlinear relationships among the selected features in body fat prediction problems. They do, however, have some disadvantages. When body fat prediction is modelled as a combinatorial single- and multi-objective optimisation problem, current machine learning approaches often get stuck in local optima. When multiple feature subsets produce similar or close predictions, avoiding local optima becomes more complex. Evolutionary feature selection has been used to solve several machine-learning-based optimisation problems. Fuzzy set theory is used to determine appropriate levels of exploration and exploitation while managing parameterisation and computational costs. A weighted-sum body fat prediction approach was explored using evolutionary feature selection, fuzzy set theory, and machine learning algorithms, integrating contradictory metrics into a single composite goal optimised by fuzzy adaptive evolutionary feature selection. Hybrid fuzzy adaptive global learning local search universal diversity-based feature selection is applied to this single-objective feature selection-machine learning framework (FAGLSUD-based FS-ML). While using fewer features, this model achieved a more accurate and stable estimate of body fat percentage than other hybrid and state-of-the-art machine learning models. A multi-objective FAGLSUD-based FS-MLP is also proposed to analyse accuracy, stability, and dimensionality conflicts simultaneously. To make informed decisions about fat deposits in the most vital body parts and blood lipid levels, medical practitioners and users can use a well-distributed Pareto set of trade-off solutions.
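The weighted-sum and multi-objective formulations differ in how they treat the conflicting metrics. The sketch below is a generic illustration, not the FAGLSUD algorithm. Each candidate feature subset is summarized by hypothetical (error, instability, number of features) values, all of which are minimized. The weighted sum collapses them into one score, while a Pareto filter keeps every non-dominated trade-off.

```python
from typing import List, Sequence, Tuple

# (prediction error, instability, number of selected features); all to be minimised.
Objectives = Tuple[float, float, float]

def weighted_sum(obj: Objectives, weights: Sequence[float]) -> float:
    """Single-objective composite score, as in a weighted-sum formulation."""
    return sum(w * o for w, o in zip(weights, obj))

def dominates(a: Objectives, b: Objectives) -> bool:
    """a Pareto-dominates b: no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep the non-dominated trade-off solutions (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical scores for four candidate feature subsets.
candidates = [(0.10, 0.02, 5), (0.12, 0.01, 4), (0.15, 0.03, 6), (0.10, 0.02, 6)]
print(pareto_front(candidates))
print(weighted_sum(candidates[0], (1.0, 1.0, 0.01)))
```

In practice the objectives would be normalized before weighting, since a raw feature count and an error rate live on different scales.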

2302.00198 2026-06-08 cs.NE cs.AI cs.NA math.NA

A fuzzy adaptive metaheuristic algorithm for identifying sustainable, economical, lightweight, and earthquake-resistant reinforced concrete cantilever retaining walls

一种模糊自适应元启发式算法用于识别可持续、经济、轻质且抗震的钢筋混凝土悬臂挡土墙

Farshid Keivanian, Raymond Chiong, Ali R. Kashani, Amir H. Gandomi

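AI summary: Proposes a fuzzy adaptive metaheuristic algorithm for optimizing the design of earthquake-resistant reinforced concrete cantilever retaining walls. It accounts for structural strength, geotechnical stability and geometric variables, aiming for lightweight, economical and environmentally friendly seismic designs.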

DOI: 10.1016/j.jocs.2023.101978
Journal ref: Journal of Computational Science, Volume 70, Article 101978, 2023
Comments: There are six figures, 51 pages, and 12 tables in the revised manuscript that has recently been resubmitted to the Journal of Computational Science

AI-generated Chinese abstract

在地震多发区，钢筋混凝土悬臂挡土墙的抗震性能至关重要。本研究利用水平和垂直伪静态系数来评估其抗震性能。为解决由此产生的土压力导致的钢筋混凝土悬臂（RCC）重量和力问题，26个结构强度和地质稳定性约束以及12个几何变量与每个设计相关联。这些约束和设计变量形成一个十二维解空间的约束优化问题。为了有效搜索并产生可持续、经济、轻质且能抵御地震危害的RCC设计，本文提出了一种新颖的自适应模糊基于元启发式算法。该方法将搜索空间划分为子区域，并基于其新颖的搜索组件建立探索、信息共享和开发搜索能力。此外，模糊推理系统被用于解决参数化和计算成本评估问题。研究发现，与几种经典和表现最佳的设计优化器相比，所提出的算法能够在九种地震条件下实现低成本、低重量和低二氧化碳排放的RCC设计。

English abstract

In earthquake-prone zones, the seismic performance of reinforced concrete cantilever (RCC) retaining walls is critical. In this study, seismic performance was investigated using horizontal and vertical pseudo-static coefficients. To account for the RCC wall weights and the forces resulting from the associated seismic earth pressures, 26 constraints on structural strength and geotechnical stability, along with 12 geometric variables, are associated with each design. These constraints and design variables form a constrained optimization problem with a twelve-dimensional solution space. To search this space effectively and produce sustainable, economical, lightweight RCC designs that are robust against earthquake hazards, a novel adaptive fuzzy-based metaheuristic algorithm is applied. The proposed method divides the search space into sub-regions and establishes exploration, information-sharing, and exploitation search capabilities through its novel search components. Furthermore, fuzzy inference systems were employed to address parameterization and computational cost evaluation issues. The proposed algorithm was found to achieve low-cost, low-weight, and low-CO2-emission RCC designs under nine seismic conditions, compared with several classical and best-performing design optimizers.
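Pseudo-static analysis turns the horizontal and vertical seismic coefficients $k_h$ and $k_v$ into an increased lateral earth pressure. The abstract does not name the pressure model. Assuming the widely used Mononobe-Okabe formulation, the seismic active coefficient and thrust could be computed as follows. The function names are ours, and the sketch covers only the load side, not the 26 design constraints.

```python
import math

def mononobe_okabe_kae(phi, delta, kh, kv, wall_batter=0.0, backfill_slope=0.0):
    """Seismic active earth-pressure coefficient K_AE (Mononobe-Okabe); angles in radians.

    phi: soil friction angle; delta: wall-soil friction angle;
    wall_batter: inclination of the wall back from vertical; backfill_slope: backfill angle;
    kh, kv: horizontal and vertical pseudo-static seismic coefficients.
    """
    psi = math.atan(kh / (1.0 - kv))          # seismic inertia angle
    theta, beta = wall_batter, backfill_slope
    num = math.cos(phi - theta - psi) ** 2
    root = math.sqrt(
        math.sin(delta + phi) * math.sin(phi - beta - psi)
        / (math.cos(delta + theta + psi) * math.cos(beta - theta))
    )
    den = math.cos(psi) * math.cos(theta) ** 2 * math.cos(delta + theta + psi) * (1.0 + root) ** 2
    return num / den

def seismic_active_thrust(kae, gamma, height, kv):
    """Total active thrust per unit wall length: P_AE = 0.5 * gamma * H^2 * (1 - kv) * K_AE."""
    return 0.5 * gamma * height**2 * (1.0 - kv) * kae

kae_static = mononobe_okabe_kae(math.radians(30), 0.0, kh=0.0, kv=0.0)
kae_seismic = mononobe_okabe_kae(math.radians(30), math.radians(15), kh=0.2, kv=0.1)
print(kae_static, kae_seismic)
```

With $k_h = k_v = 0$, a vertical wall, a level backfill and no wall friction, $K_{AE}$ reduces to the Rankine value $(1 - \sin\phi)/(1 + \sin\phi)$, which is 1/3 for $\phi = 30^\circ$. The square root is undefined when the friction angle is smaller than the backfill slope plus the seismic inertia angle. In that case the function raises `ValueError`, which a design optimizer would have to treat as infeasible.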

2606.07469 2026-06-08 econ.EM cs.NA econ.TH math.NA math.PR New submission

Statistical and Numerical Convergence in Stochastic Equilibrium

随机均衡中的统计与数值收敛

David Staines

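AI summary: Building on the rigorous stochastic equilibrium theory of SELCKE, shows that the system converges geometrically to long-run equilibrium. The rate is the greater of the eigenvalue (or inverse eigenvalue) closest to the unit circle and the maximum shock persistence. The paper also develops a simulation procedure to test whether a stochastic equilibrium exists.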

Comments: 91 Pages: 63 Main Text, 28 Supplementary Materials

AI-generated Chinese abstract

本文阐述了来自SELCKE（Staines (2024a)）arXiv:2312.16214的严格随机均衡理论的最一般的计算和计量经济学含义。分析基础是发现系统几何收敛至长期均衡，其速率由特征值或逆特征值（来自外部）中更接近单位圆者与最大冲击持久性中的较大者给出。高阶冲击收敛更快。我开发了一个模拟程序，用于渐近检验特定模型是否存在随机均衡。基本逼近结果断言，无论展开阶数或损失函数如何，随机稳态都能提供最准确的摄动解。我还证明了当二阶项消失时，会出现超一致参数估计量$O(1/T)$。除了Calvo模型，我还研究了两种替代定价模型中的随机均衡。动力学显著简化。我通过误差中的最大滞后限制了脉冲响应达到峰值的时间。这为泰勒合同提供了经验支持，尽管存在单位根和强成本渠道的问题。对于菜单成本，我证明了初始价格分布超指数衰减，产生了一个等价于具有内生重置概率的Calvo模型的系统。异质性扰动的影响表现为实际产出与有效产出之间的额外楔子。借助新的分布论证，证明了目标函数在边界处的爆破，因此该模型满足递归均衡的现有特征值存在条件。在此过程中，为现有的理论模型和统计程序提供了新的见解。

English abstract

This paper sets out the most general computational and econometric implications of the rigorous stochastic equilibrium theory from SELCKE (Staines (2024a)) arXiv:2312.16214. The analytical backbone is the discovery that the system converges geometrically to long-run equilibrium, at a rate given by the greater of the eigenvalue or inverse eigenvalue (from outside) closest to the unit circle and the maximum shock persistence. High-order shocks converge faster. I develop a simulation procedure to test, with asymptotic power, whether stochastic equilibrium exists for a particular model. The fundamental approximation result asserts that, whatever the order of expansion or loss function, the stochastic steady state delivers the most accurate perturbation solution. I also show that super-consistent parameter estimators $O(1/T)$ arise whenever second-order terms vanish. Besides Calvo, I study stochastic equilibrium in two alternative pricing models. Dynamics simplify considerably. I bound the time at which the impulse response peaks by the maximum lag in the errors. This lends empirical support to Taylor contracts, although there are issues surrounding unit roots and the strong cost channel. For menu costs, I demonstrate that the initial price distribution decays away super-exponentially, producing a system equivalent to Calvo with an endogenous reset probability. The impact of idiosyncratic disturbances appears as an additional wedge between actual and efficient output. Blow-up of the objective function at the boundary is proven, with the help of new distributional arguments, so the model meets existing eigenvalue existence conditions for the recursive equilibrium. Along the way, new light is shone on existing theoretical models and statistical procedures.
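The convergence-rate statement has a transparent scalar analogue. Take the toy model $x_t = a x_{t-1} + e_t$ with AR(1) shocks $e_t = \rho e_{t-1} + \varepsilon_t$. Its impulse response to a unit $\varepsilon_0$ is $x_t = (a^{t+1} - \rho^{t+1})/(a - \rho)$ for $a \neq \rho$, so it decays geometrically at rate $\max(|a|, |\rho|)$. The sketch below checks this numerically. It is our own illustration for a stable, backward-looking system only, so inverse eigenvalues from forward-looking dynamics do not appear.

```python
import numpy as np

def impulse_response(a, rho, horizon):
    """Response of x_t = a x_{t-1} + e_t, with e_t = rho e_{t-1} + eps_t, to a unit eps_0."""
    x = np.zeros(horizon)
    e, x_prev = 1.0, 0.0
    for t in range(horizon):
        x_prev = a * x_prev + e   # state update driven by the persistent shock
        x[t] = x_prev
        e *= rho                  # shock decays at its own persistence rho
    return x

def tail_decay_rate(x):
    """Ratio of consecutive responses late in the horizon."""
    return x[-1] / x[-2]

# Slow shocks with a fast system, and the reverse: both decay at rate 0.9.
print(tail_decay_rate(impulse_response(0.5, 0.9, 150)))
print(tail_decay_rate(impulse_response(0.9, 0.5, 150)))
```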

2606.07049 2026-06-08 econ.EM New submission

CausalAlpha: A Real-Time Geopolitical Risk Index from OSINT Channels for Causal Discovery in Financial Markets

CausalAlpha: 来自OSINT渠道的实时地缘政治风险指数及其在金融市场因果发现中的应用

Andres Azqueta-Gavaldon, Borja Ureta

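AI summary: Proposes CausalAlpha, which builds a high-frequency geopolitical risk index from Telegram OSINT channels. It uses the PC algorithm to recover the directed causal structure between geopolitical uncertainty and financial variables, and identifies political instability and energy media coverage as causal antecedents of conflict coverage.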

AI-generated Chinese abstract

我们介绍了CausalAlpha，一个开源框架，它利用自然语言处理从Telegram OSINT渠道构建高频地缘政治风险（GPR）指数，并应用因果发现方法识别地缘政治不确定性与金融市场变量之间的有向因果结构。与标准的情绪指数或格兰杰因果关系方法不同，CausalAlpha采用Peter-Clark（PC）算法来恢复五个类别特定GPR指标与一组涵盖大宗商品价格、股票指数和信用工具的金融变量之间的因果依赖有向无环图（DAG），并在四种DAG规范和三个显著性水平下使用500次块自助重采样进行估计。在alpha = 0.10时，所有DAG规范中出现了两个全局稳健的发现：政治不稳定和能源媒体覆盖独立且因果地先于冲突覆盖，将冲突确立为实时OSINT渠道中地缘政治叙事升级的主要因果汇。在最严格的显著性水平（alpha = 0.05）下，冲突覆盖因果地先于能源板块股票回报（delta XLE），这与地缘政治升级传导至能源市场一致。核心宏观面板的结构VAR证实，地缘政治NLP信号到金融市场价格的动态传导在日频上统计上较弱，表明地缘政治新闻信号主要作用于媒体叙事系统内部。该框架作为生产应用程序部署在Google Cloud Run上，具有自动数据收集和指数构建功能，代表了利用OSINT进行实时宏观金融风险监测的一步。

English abstract

We introduce CausalAlpha, an open-source framework that constructs a high-frequency Geopolitical Risk (GPR) index from Telegram OSINT channels using natural language processing, and applies causal discovery methods to identify the directed causal structure between geopolitical uncertainty and financial market variables. Unlike standard sentiment indices or Granger-causality approaches, CausalAlpha employs the Peter-Clark (PC) algorithm to recover the directed acyclic graph (DAG) of causal dependencies between five category-specific GPR indicators and a set of financial variables spanning commodity prices, equity indices, and credit instruments, estimated across four DAG specifications and three significance levels with 500 block-bootstrap resamples. Two findings emerge as globally robust across all DAG specifications at alpha = 0.10: political instability and energy media coverage independently and causally precede conflict coverage, establishing conflict as the primary causal sink of geopolitical narrative escalation in real-time OSINT channels. At the strictest significance level (alpha = 0.05), conflict coverage causally precedes energy sector equity returns (delta XLE), consistent with geopolitical escalation transmitting to energy markets. A Structural VAR on the core macro panel confirms that dynamic transmission from geopolitical NLP signals to financial market prices is statistically weak at daily frequency, suggesting that geopolitical news signals operate primarily within the media narrative system. The framework is deployed as a production application on Google Cloud Run with automated data collection and index construction, representing a step toward real-time macrofinancial risk monitoring using OSINT.
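The 500 block-bootstrap resamples used to assess edge stability can be sketched with a moving block bootstrap. It resamples contiguous blocks, so short-range serial dependence in daily data is preserved. The block length and the `discover` callback are hypothetical; the callback would wrap a PC-algorithm implementation that returns directed edges. The abstract does not specify the paper's exact resampling scheme.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Tuple

import numpy as np

Edge = Tuple[str, str]  # (cause, effect)

def moving_block_bootstrap(n: int, block_len: int, rng: np.random.Generator) -> np.ndarray:
    """Row indices built from randomly placed contiguous blocks, truncated to length n."""
    n_blocks = -(-n // block_len)                              # ceil(n / block_len)
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])
    return idx[:n]

def edge_frequencies(
    data: np.ndarray,
    discover: Callable[[np.ndarray], FrozenSet[Edge]],
    n_boot: int = 500,
    block_len: int = 10,
    seed: int = 0,
) -> Dict[Edge, float]:
    """Share of bootstrap resamples in which each directed edge is recovered."""
    rng = np.random.default_rng(seed)
    counts: Counter = Counter()
    for _ in range(n_boot):
        idx = moving_block_bootstrap(len(data), block_len, rng)
        counts.update(discover(data[idx]))   # data: (T, variables) daily panel
    return {edge: c / n_boot for edge, c in counts.items()}

# Placeholder for a PC-algorithm wrapper; it always returns the same edge here.
def toy_discover(sample: np.ndarray) -> FrozenSet[Edge]:
    return frozenset({("political_instability", "conflict_coverage")})

panel = np.random.default_rng(1).normal(size=(250, 3))
print(edge_frequencies(panel, toy_discover, n_boot=50))
```

An edge would then be reported as robust if its frequency exceeds a chosen cutoff.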

2606.06638 2026-06-08 econ.EM New submission

Consistent estimation in logit models using historical choices as practical consideration set

C. Angelo Guevara

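AI summary: This paper proves that, under a Logit data-generating process, using historical choices as a practical consideration set yields consistent parameter estimates. The proof builds on a reinterpretation of the sampling-of-alternatives theorem and is accompanied by Monte Carlo evidence.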

English abstract

A key challenge in choice modeling lies in specifying the consideration set, the subset of alternatives that individuals actually evaluate when making choices, which is unobserved (latent) to the researcher. The classical homo economicus assumption posits that individuals assess the full universal set of alternatives, a behaviorally implausible premise. Practical options include directly asking individuals, which introduces behavioral biases; treating the consideration set as a latent construct, requiring full enumeration and strong identification assumptions; or relying on ad hoc heuristics that attempt to replicate how individuals form these sets or on non-parametric methods. Recently, some researchers have used historical choices as a practical consideration set, an approach made increasingly feasible by the availability of passive data sources such as smartcards, mobile phone records, and scanner data. This article provides a formal demonstration of a sufficient condition, along with Monte Carlo evidence, showing that, under a Logit data-generating process with homogeneous choice probabilities across instances, defining a practical consideration set based on historical choices yields consistent parameter estimates. The demonstration is based on a reinterpretation of the sampling-of-alternatives theorem, viewing historical choices as draws from the true consideration set, and showing that under the stated assumptions, the uniform conditioning property holds. The article concludes by discussing the practical implications of this result and potential extensions to other modeling frameworks and assumptions.
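The sampling-of-alternatives result that the abstract reinterprets can be stated compactly. The notation here is ours, not the paper's: $C_n$ is individual $n$'s true consideration set, $V_{jn}$ the systematic utility of alternative $j$, and $D_n \subseteq C_n$ the observed subset containing the chosen alternative $i$, drawn with probability $\pi(D_n \mid i)$. In the historical-choice reading described in the abstract, $D_n$ would be the set of alternatives the individual chose in past instances.

```latex
% Logit choice probability over the true consideration set C_n
P_n(i \mid C_n) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}

% Bayes' rule: probability that i was chosen given that subset D_n was observed;
% the common denominator \sum_{k \in C_n} e^{V_{kn}} cancels
P_n(i \mid D_n)
  = \frac{P_n(i \mid C_n)\,\pi(D_n \mid i)}{\sum_{j \in D_n} P_n(j \mid C_n)\,\pi(D_n \mid j)}
  = \frac{e^{V_{in} + \ln \pi(D_n \mid i)}}{\sum_{j \in D_n} e^{V_{jn} + \ln \pi(D_n \mid j)}}

% Uniform conditioning: \pi(D_n \mid i) = \pi(D_n \mid j) for all i, j \in D_n,
% so the correction terms cancel and a plain logit on D_n remains
P_n(i \mid D_n) = \frac{e^{V_{in}}}{\sum_{j \in D_n} e^{V_{jn}}}
```

Under uniform conditioning, a standard logit estimated on $D_n$ alone is therefore a conditional likelihood whose maximizer is consistent for the utility parameters. According to the abstract, the paper's contribution is a sufficient condition under which historical choices satisfy this property: a Logit data-generating process with homogeneous choice probabilities across instances.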
