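arXivDaily daily academic digest: syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other areas.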

2606.03249 2026-06-03 cs.CC quant-ph

Quantum-Classical Equivalence for AND-Functions

Sreejata Kishor Bhattacharya, Farzan Byramji, Arkadev Chattopadhyay, Yogesh Dahiya, Shachar Lovett

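AI summary: Resolves a long-standing conjecture on quantum communication advantage for AND-functions by proving that, for any Boolean function f, the bounded-error quantum communication complexity of the AND-composition f∘AND_2 is polynomially related (up to logarithmic factors) to its classical deterministic communication complexity.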

Abstract

A major open problem in quantum communication complexity is whether quantum protocols can be exponentially more efficient than classical protocols for computing total Boolean functions; the prevailing conjecture is that they cannot be so. In a seminal work, Razborov (2002) resolved this question for AND-functions of the form $$ F(x,y) = f(x_1 \land y_1, \ldots, x_n \land y_n), $$ when the outer function $f$ is symmetric, by proving that their bounded-error quantum and classical communication complexities are polynomially related. Since then, extending this result to all AND-functions has remained open and has been posed by several authors. In this work, we settle this problem in a strong way. We show that for every Boolean function $f$, the bounded-error quantum and classical deterministic communication complexities of the function $f \circ \mathrm{AND}_2$ are polynomially related, up to polylogarithmic factors in $n$. We prove this by showing that both are characterized, up to polynomial loss, by the logarithm of the De Morgan sparsity of $f$. Our results build on the recent work of Chattopadhyay, Dahiya, and Lovett (2025) on structural characterizations of non-sparse Boolean functions, which we extend to resolve the conjecture for general AND-functions.
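To make the AND-function definition concrete, the Python sketch below builds $F(x,y) = f(x_1 \land y_1, \ldots, x_n \land y_n)$ from an arbitrary outer function $f$ and tabulates its communication matrix for small $n$. The helper names (and_function, communication_matrix) and the two example outer functions are illustrative choices, not taken from the paper, and the code does not compute communication complexity or De Morgan sparsity. With OR as the outer function, $F$ is the intersection function (the complement of set disjointness); majority is an example of a symmetric outer function, the case Razborov handled.

```python
from itertools import product
from typing import Callable, List, Sequence

Bits = Sequence[int]


def and_function(f: Callable[[Bits], int]) -> Callable[[Bits, Bits], int]:
    """Return F(x, y) = f(x_1 AND y_1, ..., x_n AND y_n)."""
    def F(x: Bits, y: Bits) -> int:
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        return f([xi & yi for xi, yi in zip(x, y)])
    return F


def majority(z: Bits) -> int:
    """Symmetric outer function: depends only on the Hamming weight of z."""
    return int(2 * sum(z) > len(z))


def bit_or(z: Bits) -> int:
    """OR as the outer function gives the intersection (non-disjointness) function."""
    return int(any(z))


def communication_matrix(F: Callable[[Bits, Bits], int], n: int) -> List[List[int]]:
    """Rows indexed by Alice's input x, columns by Bob's input y, in lexicographic order."""
    inputs = list(product([0, 1], repeat=n))
    return [[F(x, y) for y in inputs] for x in inputs]


intersects = and_function(bit_or)
print(communication_matrix(intersects, 2))
# [[0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
```

Any Python function of n bits can be passed as f, including non-symmetric ones, which is the general case the paper addresses.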

2606.03248 2026-06-03 cs.HC

Investigating Novice Researchers' Perceptions of Research Privacy Within LLM-Assisted Workflows

Shuning Zhang, Changxi Wen, Eve He, Ying Ma, Robert Xiao, Xin Yi, Hewu Li

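AI summary: Semi-structured interviews with 44 researchers across disciplines find that publication pressure accelerates novice researchers' reliance on LLMs, alongside privacy misconceptions; the study identifies mitigations such as input fragmentation and adversarial probing, which participants nonetheless viewed as of limited effectiveness.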

Abstract

Large language model (LLM)-assisted scholarly workflows introduce critical privacy and intellectual property risks. As a uniquely vulnerable cohort driven by publication pressure and a lack of institutional support, novice researchers rely heavily on public LLMs, compelling them to navigate high-stakes privacy-publication trade-offs. To investigate these concerns, we conducted semi-structured interviews with 44 researchers across diverse disciplines. Our findings reveal that the fear of idea leakage paradoxically accelerates, rather than deters, reliance on LLMs, as researchers use them to expedite publication. They also held misconceptions that their ideas lacked the unique value to attract targeted attacks, and that their inputs would be safely diluted within massive datasets, preventing reconstruction. From the interviews, we identified five types of mitigations, including input fragmentation and adversarial probing, though participants largely perceived these measures as ineffective. We outline implications including institution-level sandboxed isolation, scenario-based privacy pedagogy, and verifiable data-deletion audits for transparency.

2606.03225 2026-06-03 cs.DB cs.DS

HRNN: A Hybrid Graph Index for Approximate Reverse k-Nearest Neighbor Search on High-Dimensional Vectors

Wenxuan Xia, Mingyu Yang, Wentao Li, Wei Wang

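AI summary: To address the high candidate-expansion overhead and verification cost of reverse k-nearest neighbor search in high-dimensional spaces, proposes HRNN, a hybrid graph index that achieves efficient approximate search through a proxy-point strategy and offline-materialized kNN radii.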

Comments: technical report

Abstract

Reverse k-nearest neighbor (RkNN) search returns all data points that regard a query vector as one of their k-nearest neighbors (kNNs). Existing RkNN methods typically follow a filter-and-verification framework: vectors near the query vector are first collected as candidates and then verified against their kNN radius (i.e., the distance to their k-th nearest neighbor). However, existing methods face two key limitations in high-dimensional spaces. First, nearby vectors often do not belong to the query's true RkNN set, resulting in excessive candidate expansion overhead. Second, existing methods compute kNN radii online during verification, incurring substantial query-processing cost. To address these limitations, we propose HRNN, a hybrid graph index for approximate RkNN search. (1) Rather than directly treating nearby vectors as RkNN candidates, HRNN uses them as proxy points, based on the assumption that a query's RkNN results can often be discovered through the RkNN results of its nearby vectors. (2) To reduce verification cost, HRNN materializes high-fidelity kNN radii offline, eliminating expensive online reconstruction while preserving accuracy. HRNN combines a navigation graph, a ranked kNN graph, and reverse-neighbor lists into a hybrid index that supports efficient proxy retrieval, candidate generation, and kNN-radius access. We also develop efficient index construction and append-only maintenance algorithms. Extensive experiments show that HRNN consistently outperforms existing methods, achieving up to one order of magnitude higher throughput. Moreover, HRNN scales to datasets containing up to 10 million high-dimensional vectors while supporting efficient dynamic index maintenance.
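The filter-and-verification criterion can be illustrated with a brute-force sketch: compute every point's kNN radius once (mimicking offline materialization), after which a point p is a reverse neighbor of query q exactly when dist(p, q) <= r_k(p). This is a minimal exact baseline in Python/NumPy, assuming Euclidean distance and counting ties as neighbors; it does not implement HRNN's navigation graph, ranked kNN graph, or proxy points, and the function names are hypothetical.

```python
import numpy as np


def knn_radii(data: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbor, excluding itself.

    Computed once up front, standing in for radii materialized offline.
    """
    diff = data[:, None, :] - data[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)      # (n, n) pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    return np.sort(dists, axis=1)[:, k - 1]    # k-th smallest distance per row


def rknn_bruteforce(data: np.ndarray, radii: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Indices of points p with dist(p, q) <= r_k(p), i.e., points that count q among their kNNs."""
    d_query = np.linalg.norm(data - query, axis=1)
    return np.flatnonzero(d_query <= radii)


data = np.array([[0.0], [1.0], [3.0], [10.0]])
radii = knn_radii(data, k=1)                                    # [1.0, 1.0, 2.0, 7.0]
print(rknn_bruteforce(data, radii, np.array([2.0])).tolist())   # [1, 2]
```

The verification step costs one distance and one comparison per candidate once radii are stored, which is the saving the abstract attributes to materializing them offline; the brute-force candidate set here is every point, which is exactly the expansion cost HRNN's proxy points aim to avoid.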

2606.03221 2026-06-03 cs.IR

VirtualMLE: A Virtual ML Engineer that Optimizes Sequential Recommenders

Shiteng Cao, Jingwen Liu, Junda She, Zhiheng Li

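AI summary: Proposes VirtualMLE, a framework that uses the cognitive capabilities of LLMs to organize recommender optimization into a closed loop of execution, reflection, and memory update; on SASRec and HSTU it reaches competitive recommendation quality with fewer trials and shows potential for transferring tuning heuristics across datasets.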

Abstract

Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities in reasoning, reflection, and tool use, unlocking new paradigms for automating complex engineering workflows. However, in the domain of sequential recommendation (SR), tuning models on new datasets still relies heavily on the manual trial-and-error of experienced machine learning engineers. To bridge this gap, we propose VirtualMLE, an LLM-agent framework that leverages the cognitive capabilities of LLMs to organize recommender optimization into a closed loop of execution, reflection, and memory update. After each trial, the agent explicitly analyzes the observed outcomes and stores concise heuristic feedback in a hierarchical memory system. We evaluate VirtualMLE on three Amazon SR benchmarks with two representative backbones, SASRec and HSTU. VirtualMLE reaches competitive recommendation quality with substantially fewer trials. Furthermore, we observe that cognition summaries distilled from previous datasets can significantly accelerate the search process on unseen datasets, demonstrating the potential of transferring tuning heuristics. Overall, our results provide compelling evidence that LLM agents equipped with reflection and memory can serve as practical virtual engineers to automate and amortize heuristic learning in SR optimization. Our code is available.

2606.03218 2026-06-03 cs.CR

The Role of Domain-Specific Features in Malware Detection: A macOS Case Study

Biagio Montaruli, Andrea Oliveri, Savino Dambra, Davide Balzarotti

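AI summary: Tackles macOS malware detection by introducing, for the first time, domain-specific static features (e.g., embedded certificates, entitlements, persistence techniques, and key system APIs) to train a machine learning detector, reaching a 98.50% detection rate on 41,129 samples (16% above existing methods) and validating generalization on 9,000 new samples (99.50%); removing these features reduces detection performance by 15.92%.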

Comments: Accepted to ACM ASIACCS 2026

DOI: 10.1145/3779208.3785392

Abstract

Despite the growing popularity of macOS among end users and enterprise systems, malware research has primarily focused on Windows and Android operating systems, leaving the problem of macOS malware detection relatively unexplored. Indeed, the specificity of the operating system and the unique characteristics of the Mach-O file format can play a fundamental role in the classification of unknown samples, drastically increasing the detection rate. In this work, for the first time in the literature, we employ new domain-specific features, i.e., static features specific to macOS binaries, such as embedded certificates, entitlements, persistence techniques and key system APIs, to train a machine learning malware detector. We perform a comprehensive experimental evaluation on a novel dataset of 41,129 samples, comprising 11,413 benign and 29,716 malicious executables, and demonstrate that our solution achieves state-of-the-art detection performance (98.50%), outperforming all existing approaches, with an average improvement of 16% in terms of detection rate. We also provide an in-depth analysis of the importance of the individual features, showing that our detector effectively leverages the new domain-specific features. Then, in order to evaluate the generalization capabilities of our detector over time, we perform a real-world evaluation on a new dataset of 9,000 fresh macOS executables. The results show that (i) our detector maintains a very high detection rate (99.50%), (ii) outperforms the state-of-the-art by 50%, and (iii) the domain-specific features are crucial for generalizing to novel malware samples, as their removal leads to a 15.92% drop in detection performance. Finally, we also release our dataset to the research community.

2606.03215 2026-06-03 cs.CR cs.HC

Generative AI-Enabled Refund Fraud in Chinese E-Commerce: Investigation on Merchants and Platform Workers

Shuning Zhang, Eve He, Xiao Zhan, Shijing He, Robert Xiao, Xin Yi, Hewu Li

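AI summary: Through interviews with merchants and platform workers in the Chinese e-commerce market, studies how generative AI lets attackers cheaply fabricate hyper-realistic evidence of product defects, enabling scalable fraud, and examines the verification strategies platforms and merchants adopt and the challenges they face.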

Abstract

E-commerce dispute resolution typically relies on the security assumption that digital evidence truthfully reflects physical reality. Generative AI (GenAI) invalidates this threat model, enabling attackers to fabricate hyper-realistic evidence of product defects at negligible cost. Through semi-structured interviews with merchants (N=17) and platform workers (N=13) in the Chinese e-commerce market, we characterize this shift toward GenAI-enabled scalable fabrication. We outline a taxonomy of four GenAI-enabled threat vectors across the transaction, dispute, logistics and communication phases, highlighting how attackers exploit GenAI to synthesize physically plausible product defects at scale. To mitigate these threats, platforms and merchants are adapting verification strategies, relying on AI tools for automated screening and adversarial interrogation (e.g., requesting multi-angle videos) to increase attack complexity. However, we find several challenges that hinder the adoption of these defenses, including implementation hurdles like structural platform constraints and fundamental limitations regarding the technical sophistication of GenAI. We conclude by outlining design implications for privacy-preserving cross-platform fraud databases, and traceability mechanisms such as embedding verifiable material anchors into the product.

2606.03194 2026-06-03 cs.CC math.CO math.OC

Lean 4 Machine-Verified Proof of P = NP via the Pedigree Polytope Membership Problem

T. S. Arthanari

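AI summary: Proves that the Pedigree Polytope Membership Problem (M3P) is solvable in strongly polynomial time via a recursively constructed layered network and a multicommodity flow problem, then derives P = NP through a reduction from the Symmetric Travelling Salesman Problem (STSP), with machine verification in Lean 4.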

Comments: 33 pages, 10 figures

Abstract

The Membership Problem for Pedigree Polytope (M3P) asks, given $X\in\mathbb{Q}^{\binom{n}{3}}$, whether $X\in\mathrm{conv}(P_n)$, where $P_n$ is the set of all pedigrees. A pedigree is a structured encoding of a Hamiltonian cycle construction in $K_n$. We establish that M3P is solvable in strongly polynomial time via a recursively constructed layered network $(N_k, R_k, \mu)$ and a multicommodity flow problem MCF$(k)$. The necessary and sufficient condition for membership established is that the optimal total flow in MCF$(n-1)$ equals the maximum possible flow $z_{\max}$. The complexity analysis, grounded in Tardos's strongly polynomial algorithm for combinatorial linear programs (1986), shows that this condition can be checked in strongly polynomial time in the dimension of the matrix involved. By sufficiency, this implies M3P $\in$ P. Since the Symmetric Travelling Salesman Problem (STSP) reduces to M3P via the Multistage Insertion (MI) formulation (Arthanari 1983), STSP is solvable in polynomial time, and the P vs. NP question is resolved. The proofs leading to this result are fully machine-verified in Lean 4/Mathlib4, with zero unresolved sorry placeholders in the main proof chain. The main contribution is the Lean 4 machine verification of all proofs in the main chain, resulting in the theorem p_equals_np: P = NP. The Lean 4 formal verification covers the sufficiency of MCF$(n-1)$ for membership in $\mathrm{conv}(P_n)$, and the P = NP chain via Maurras (2002), Grötschel–Lovász–Schrijver (1988), Cook (1971), and Karp (1972). The complete Lean project (36 Lean 4 files, 2968/2968 build targets clean) is available at https://github.com/TiruArt/Pedigree-Polytopes-Lean4.

2606.03190 2026-06-03 cs.HC

Focused on the User, Overlooking the Risks: Security and Privacy Understandings, Practices and Challenges of Independent Chinese AI Agent Developers

Shuning Zhang, Mingyao Xu, Zhixin Huang, Yutong Jiang, Rongjun Ma, Yuting Yang, Xin Yi, Kanye Ye Wang, Hewu Li

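AI summary: Interviews with 28 independent Chinese AI agent developers find that they are user-centered, attending to user-facing safety risks (e.g., harmful content) while showing little awareness of security vulnerabilities; they rely on ad-hoc manual safeguards and informal communication without formal tools or processes, constrained mainly by a lack of security training, tools, and platform guidance.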

Abstract

The proliferation of AI agents empowers independent developers, defined as individuals or small groups who self-initiate projects rather than fulfill client-based contracts, to create sophisticated autonomous systems, but it also introduces novel security and privacy (S&P) challenges beyond traditional corporate structures. We conducted an interview study (N=28) with Chinese developers, whose extensive use of global LLM services offers valuable insights into this population. We investigate their understandings, practices, and challenges regarding S&P in the AI agent products they develop. We find that independent developers frequently think and act from their users' perspective. They focus on user-facing safety risks such as harmful content while exhibiting low awareness of security vulnerabilities. Consequently, developers rely almost exclusively on ad-hoc, manually crafted safeguards and informal communication, with no formal tools or processes for S&P practices. We find that these practices are driven by various inhibitors, primarily a lack of formal training in S&P-related skills, of accessible security tools, and of actionable guidance from platforms. Our work contributes the first exploration of independent AI agent developers' S&P understanding and outlines opportunities for tailored security tooling.

2606.03178 2026-06-03 cs.SI

Evidence-Aware Protein Complex Detection: Methods, Benchmarks, and Reproducibility Challenges

Sima Soltani, Mehrdad Jalali, Yahya Forghani, Reza Sheybani

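AI summary: Reviews protein complex detection methods that combine PPI network topology with multiple types of biological evidence, concluding that transparent evidence-aware graph methods strike the best balance between biological plausibility and reproducibility, and stressing the importance of unified benchmarks and reproducible evaluation protocols.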

Comments: Review article; 23 pages, 7 figures, 7 tables

Abstract

Protein complexes are central units of cellular organization, yet their identification from protein-protein interaction (PPI) networks remains difficult because interactome maps are noisy, incomplete, context dependent, and unevenly annotated. This focused methodological review examines evidence-aware approaches that combine PPI topology with Gene Ontology (GO) annotations, expression profiles, subcellular localization, sequence or domain evidence, temporal information, and representation learning, with emphasis on post-2018 methods and selected historical baselines. The central synthesis is that transparent evidence-aware graph methods currently offer the strongest tradeoff between biological plausibility and reproducibility, while deep, hypergraph, and dynamic heterogeneous models expand biological realism but require stronger benchmark control. The central bottleneck is no longer only the lack of algorithms, but the lack of harmonized, overlap-aware, and reproducible evaluation protocols. We therefore recommend unified benchmark versions, explicit GO-circularity controls, overlap-aware metrics, uncertainty estimates, and executable software packages over isolated source-specific F-measure gains.

2606.03164 2026-06-03 cs.HC

Pulse Focus: Validation of the Focus Performance Score as a Behavioral Signal for Human Attentional State Modeling Toward Attention-Aware AI

Yisak Debele, Israel Goytom, Anwar Misbah

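AI summary: Through behavioral, neural, and formula validation, shows that the Focus Performance Score (FPS) from the Pulse Focus mobile Stroop app is a valid, reliable, and neurally grounded behavioral signal of attentional control.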

Abstract

Artificial intelligence systems that model and support human cognition require reliable measures of cognitive state. We present the Focus Performance Score (FPS) from the Pulse Focus mobile Stroop application and evaluate whether it measures attentional control during color-word conflict resolution. We conduct behavioral, neural, and formula validation analyses. Behavioral results (N=466, 111,133 trials) show that FPS captures the Stroop interference effect, tracks individual differences in attentional control, and demonstrates strong test-retest reliability. Neural validation using the DMCC55B fMRI dataset (N=55) shows that the primary FPS component, mean incongruent reaction time, is significantly associated with anterior cingulate cortex activation, a key neural substrate of conflict monitoring. Formula validation identifies and resolves structural redundancy within the scoring framework and provides convergent support for the weighting design. Together, these findings establish FPS as a behaviorally valid, reliable, and neurally grounded measure of attentional control. FPS provides a defensible behavioral signal for evaluating human attentional state and supports future work on attention-aware human-AI interaction and physiological state modeling.
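The Stroop interference effect that FPS is reported to capture is conventionally the difference between mean incongruent and mean congruent reaction times. The sketch below computes it, together with mean incongruent RT (the FPS component named above), from per-trial records. Restricting to correct trials is our assumption, the Trial record is hypothetical, and the actual FPS weighting formula is not given in the abstract, so it is not reproduced.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trial:
    congruent: bool   # True if the word names its own ink colour
    rt_ms: float      # reaction time in milliseconds
    correct: bool     # whether the response was correct


def stroop_summary(trials: List[Trial]) -> Dict[str, float]:
    """Mean correct-trial RTs per condition and the interference effect (incongruent - congruent)."""
    incongruent = [t.rt_ms for t in trials if t.correct and not t.congruent]
    congruent = [t.rt_ms for t in trials if t.correct and t.congruent]
    mean_inc, mean_con = mean(incongruent), mean(congruent)
    return {
        "mean_incongruent_rt": mean_inc,
        "mean_congruent_rt": mean_con,
        "interference_ms": mean_inc - mean_con,
    }


trials = [
    Trial(True, 500.0, True), Trial(True, 520.0, True),
    Trial(False, 600.0, True), Trial(False, 640.0, True),
    Trial(False, 300.0, False),   # error trial, excluded by the correct-only filter
]
print(stroop_summary(trials))
# {'mean_incongruent_rt': 620.0, 'mean_congruent_rt': 510.0, 'interference_ms': 110.0}
```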

2606.03152 2026-06-03 cs.DB

Cost-Aware Optimization for Agentic Query Execution

Lunyiu Nie, Yilin Xia, Yiren Liu, Christopher Jermaine, Swarat Chaudhuri

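AI summary: Addressing how the placement, ordering, and granularity of operators in LLM-backed query execution affect cost and quality, proposes the EnumGRPO optimizer, which distills heuristics from enumerated plans via in-context reinforcement learning and achieves 35.4% execution accuracy at an LLM cost of $0.011 per query on the SWAN databases.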

Abstract

Classical query optimization searches over algebraically equivalent plans that differ only in cost. This assumption breaks once LLM-backed operators enter the picture: their placement, ordering, and granularity jointly determine both dollar cost and answer quality, and the right choice among the alternatives is often revealed only at runtime. We formalize this setting as agentic query execution, a query execution paradigm in which agent-based planning is interleaved with execution, and agent workflow optimization becomes the analogue of classical query optimization. We then present EnumGRPO, a self-improving optimizer for this setting. During a learning stage, EnumGRPO enumerates query plans over decisions such as execution paradigm, operator type, operator placement, selectivity scope, and projection width, then distills quality-cost feedback into reusable planning heuristics via in-context reinforcement learning. Across four databases in SWAN, EnumGRPO achieves 35.4% execution accuracy at $0.011 per query in LLM-operator cost, roughly a 317× cost reduction relative to the hybrid query baseline, with an 18% relative improvement in answer accuracy.

2606.03151 2026-06-03 cs.AR cs.DB cs.ET

ACRONYM: Accelerated Approximate Nearest Neighbor Search in Memory for Dynamic Vector Databases

Md Mizanur Rahaman Nayan, Tianqi Zhang, Flavio Ponzina, Tajana Rosing, Azad J Naeemi

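AI summary: To avoid the index rebuilding caused by frequent updates in dynamic vector databases, proposes ACRONYM, an algorithm-hardware co-designed platform that uses data-distribution-independent encoding and Hamming-distance-based search for efficient hardware acceleration and supports continuous updates, achieving >90% recall at a throughput of 8e6 QPS on million-scale datasets.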

Abstract

Vector database search with frequent updates is increasingly critical in applications such as retrieval-augmented generation, recommendation systems, and large-scale embedding retrieval. Existing solutions, such as graph-based and partition-based approximate nearest neighbor search (ANNS), suffer from frequent index rebuilding because their indexing depends on the data distribution, which hampers continuous deployment and incurs long rebuilding latency. This paper proposes ACRONYM, an algorithm-hardware co-designed platform that addresses key problems in state-of-the-art database search. Algorithmically, it leverages efficient encoding that is independent of the data distribution, together with Hamming-distance-based search, for efficient hardware acceleration. Architecturally, we propose CAM-based in-memory parallel distance computation followed by time-multiplexed approximate top-k selection to enable exhaustive search. We propose a two-stage search, a coarse search followed by binary refinement, to achieve high recall in CAM-based search, which is heavily limited to small vector dimensions by capacity and wordline parasitics. ACRONYM supports continuous updates without stalling and integrates a novel XOR-and-Accumulate (XAC)-based systolic-array encoder for efficient on-chip encoding during search. Across million-scale datasets, while serving a dynamic database, ACRONYM achieves >90% recall at a throughput of 8e6 queries per second, with a memory footprint of only 32 MB and an average energy consumption of 2.56 µJ per query, a speedup of about 400× over HNSW (CPU) and about 80× over FAISS-IVF (GPU).
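As a software-only illustration of Hamming-distance search over data-independent binary codes, the sketch below uses random-hyperplane (sign) hashing as a stand-in encoder, computes Hamming distance as popcount(XOR), scans every code exhaustively with a short code to build a shortlist, and then refines the shortlist with a longer code. The random-hyperplane encoder, the code lengths, and all function names are assumptions; ACRONYM's actual XAC encoder, CAM arrays, and top-k circuitry are hardware and are not modeled here.

```python
import numpy as np


def make_encoder(dim: int, n_bits: int, seed: int):
    """Random-hyperplane (sign) encoder; the hyperplanes never look at the data."""
    planes = np.random.default_rng(seed).standard_normal((n_bits, dim))

    def encode(x: np.ndarray) -> np.ndarray:
        bits = (np.atleast_2d(x) @ planes.T) > 0    # (n, n_bits) booleans
        return np.packbits(bits, axis=1)            # (n, ceil(n_bits / 8)) uint8
    return encode


def hamming(codes: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamming distance = popcount(codes XOR q), evaluated for every stored code."""
    return np.unpackbits(np.bitwise_xor(codes, q), axis=1).sum(axis=1)


def two_stage_search(coarse_codes, fine_codes, coarse_enc, fine_enc, query, k, shortlist):
    # Stage 1: exhaustive scan with short codes, keep a shortlist.
    d_coarse = hamming(coarse_codes, coarse_enc(query))
    cand = np.argsort(d_coarse, kind="stable")[:shortlist]
    # Stage 2: re-rank only the shortlist with longer codes (binary refinement).
    d_fine = hamming(fine_codes[cand], fine_enc(query))
    return cand[np.argsort(d_fine, kind="stable")[:k]]


rng = np.random.default_rng(1)
data = rng.standard_normal((200, 16))
coarse_enc = make_encoder(16, 32, seed=2)
fine_enc = make_encoder(16, 256, seed=3)
coarse_codes, fine_codes = coarse_enc(data), fine_enc(data)

# An update is just "encode and append"; nothing is rebuilt.
new_vec = rng.standard_normal(16)
data = np.vstack([data, new_vec])
coarse_codes = np.vstack([coarse_codes, coarse_enc(new_vec)])
fine_codes = np.vstack([fine_codes, fine_enc(new_vec)])

top = two_stage_search(coarse_codes, fine_codes, coarse_enc, fine_enc, new_vec, k=5, shortlist=20)
print(top.tolist())
```

Because the hyperplanes are drawn without looking at the data, inserting a vector never invalidates existing codes, which is the property the abstract relies on for continuous updates without rebuilding.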

2606.03149 2026-06-03 eess.SY cs.SY

Equivalent Circuit Model based Electric Vehicle Evacuation with Mobile Charging Stations

Joseph Moyalan, Ricardo de Castro, Shuang Feng, Xuchang Tang, Xinfan Lin, Qijian Gan

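AI summary: Proposes an equivalent-circuit-model-based EV evacuation framework that models traffic flow through circuit analogies, jointly optimizes routing, charging, and congestion management, and introduces mobile charging stations to minimize evacuation time.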

Abstract

The increasing penetration of electric vehicles (EVs) introduces new challenges for emergency evacuation planning due to limited driving range, long charging times, and constrained charging infrastructure, particularly under disaster-induced disruptions. This paper proposes a novel optimization-based evacuation framework for EVs using Equivalent Circuit Models (ECMs) to jointly address routing, charging, and congestion management. By leveraging electrical analogies, traffic flow is modeled as electrical current, travel time as resistance, and driving range as voltage, enabling the use of Kirchhoff's laws to enforce flow balance and energy feasibility constraints. The proposed controllable ECM incorporates binary switches to regulate route selection and explicitly models charging delays and range replenishment at both Fixed Charging Stations (FCSs) and Mobile Charging Stations (MCSs). The resulting formulation is an integer programming problem that determines optimal evacuation routes, charging durations, and the placement and number of MCSs to minimize evacuation time. The framework is extended to multiple origin-destination pairs using the principle of superposition and supports fairness-aware performance metrics, including worst-case, average, and variance-based evacuation times. Simulation studies on large-scale transportation networks in California demonstrate that the proposed approach significantly improves evacuation efficiency and robustness, particularly in scenarios with limited charging access, highlighting the critical role of MCSs in EV-based emergency evacuations.
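The Kirchhoff's-current-law analogy (traffic flow as current) amounts to flow conservation at every node. The hypothetical helper below checks it for a given assignment of edge flows, with origins injecting evacuees and destinations absorbing them. It is only a feasibility check under these assumptions, not the paper's integer program, and it ignores the resistance (travel time) and voltage (driving range) parts of the model.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]


def kcl_residuals(edges: Iterable[Edge], injections: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Per-node imbalance: external injection + inflow - outflow.

    edges: (u, v, flow) with flow in vehicles per unit time (the 'current').
    injections: positive at origins (evacuees entering), negative at safe destinations.
    Flow conservation (KCL) holds when every residual is zero.
    """
    residual: Dict[Hashable, float] = defaultdict(float)
    for node, amount in injections.items():
        residual[node] += amount
    for u, v, flow in edges:
        residual[u] -= flow
        residual[v] += flow
    return dict(residual)


def satisfies_kcl(edges, injections, tol: float = 1e-9) -> bool:
    """True if every node's residual is within tol of zero."""
    return all(abs(r) <= tol for r in kcl_residuals(edges, injections).values())


injections = {"O": 100.0, "D": -100.0}
edges = [("O", "A", 60.0), ("O", "B", 40.0), ("A", "D", 60.0), ("B", "D", 40.0)]
print(satisfies_kcl(edges, injections))  # True
```

In an optimization model the same residuals would appear as equality constraints on the flow variables rather than as a post-hoc check.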

2606.03145 2026-06-03 cs.DB

The Case for Text-to-SQL Friendly Logical Database Design

Shi Heng Zhang, Zhengjie Miao, Jiannan Wang

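AI summary: Proposes LLM-friendly logical database design as a design objective, optimizing database schemas through three semantics-preserving transformations (abstraction, partitioning, renaming) to improve Text-to-SQL generation accuracy, with execution accuracy gains of up to 4.2%.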

Abstract

Logical database design has traditionally optimized database schemas, including tables, columns, keys, constraints, and views, for correctness, integrity, and human-written application queries. LLM-based Text-to-SQL changes the consumer: the schema is now often read as text by a language model, so design choices that preserve database semantics can still change SQL-generation accuracy. We argue that this creates a new design objective alongside the classical ones, LLM-friendly logical database design (the property that a schema is easy for a language model to map from natural language to correct SQL), and we treat it as the optimization target of this paper. We instantiate this objective with three semantics-preserving schema transformations that repurpose classical schema-design ideas: schema abstraction (+A: logical views that materialize recurring join paths), schema partitioning (+P: workload-aware logical partitions that prune irrelevant context), and schema renaming (+R: descriptive identifiers that improve downstream column linking and predicate construction). The three operators compose, and each preserves the underlying database semantics. When historical question-SQL pairs are available, they guide both partitioning and abstraction; in zero-shot settings, renaming applies directly, and abstraction falls back to an ad-hoc per-question variant. We evaluate the resulting schemas on BIRD-Union and Spider-Union across multiple Text-to-SQL pipelines and language model backbones, with gains of up to 4.2% in execution accuracy. The best transformation varies modestly across pipelines and models, with the full +A+P+R consistently improving; multiple operator combinations are competitive on each pipeline. These results show that LLM-friendly logical design is a practical and underexplored database-side optimization target, complementary to existing Text-to-SQL pipelines.
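Schema abstraction and renaming can be pictured with a toy SQLite database, driven from Python's built-in sqlite3 module: a view captures a recurring join path under descriptive column names, and a query over the view returns the same rows as the equivalent query over the base tables, so semantics are preserved. The table and column names are hypothetical, and SQLite views are logical (not stored), so this shows only the idea, not the paper's transformation algorithm; partitioning is not illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Terse base tables, as a legacy schema might name them.
    CREATE TABLE t_cst (cid INTEGER PRIMARY KEY, nm TEXT, cty TEXT);
    CREATE TABLE t_ord (oid INTEGER PRIMARY KEY,
                        cid INTEGER REFERENCES t_cst(cid), amt REAL);
    INSERT INTO t_cst VALUES (1, 'Ada', 'Paris'), (2, 'Lin', 'Oslo');
    INSERT INTO t_ord VALUES (10, 1, 30.0), (11, 1, 12.5), (12, 2, 8.0);

    -- Abstraction + renaming: one view for the recurring order-customer join,
    -- exposed under descriptive column names. Base tables are untouched.
    CREATE VIEW customer_orders AS
    SELECT c.nm  AS customer_name,
           c.cty AS customer_city,
           o.oid AS order_id,
           o.amt AS order_amount
    FROM t_ord AS o JOIN t_cst AS c ON o.cid = c.cid;
""")

# Revenue per city, written against the base tables ...
raw = conn.execute("""
    SELECT c.cty, SUM(o.amt) FROM t_ord AS o JOIN t_cst AS c ON o.cid = c.cid
    GROUP BY c.cty ORDER BY c.cty
""").fetchall()

# ... and against the view: no explicit join, self-describing names.
via_view = conn.execute("""
    SELECT customer_city, SUM(order_amount) FROM customer_orders
    GROUP BY customer_city ORDER BY customer_city
""").fetchall()

print(raw == via_view, via_view)  # True [('Oslo', 8.0), ('Paris', 42.5)]
conn.close()
```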

2606.03138 2026-06-03 cs.IR

Section-Weighted Hybrid Approach for Legal Case Retrieval

基于章节加权的混合方法用于法律案例检索

Rajith Arulanandam, Nisansa de Silva

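AI Summary: Proposes a two-stage, section-aware legal case retrieval framework that first segments judgments with a large language model and then combines lexical and semantic search with weighted aggregation, achieving consistent gains on a judicial benchmark.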

Comments 10 pages, 4 figures. Accepted to the International Conference on Natural Language Processing (ICNLP 2026)

AI Chinese Abstract

寻找真正类似的先例需要捕捉超越表面词汇重叠的法律推理。我们提出了一个两阶段、感知章节的法律案例检索框架，首先使用确定性大语言模型离线将原始判决书分割为事实、争议点、判决和推理部分。在第一阶段，我们通过倒数排名融合结合并行词法（BM25）和语义（稠密ANN）全文搜索，形成高召回候选池。在第二阶段，我们进行细粒度的同类比较（例如，查询推理与候选推理）。为了解决无界词法分数与余弦相似度之间的尺度不匹配问题，我们在使用学习到的章节权重聚合信号之前，应用查询级别的Z分数归一化。对于顶部结果，系统返回相关章节文本，并附有简洁、有依据的理由和当事人立场标签。我们在一个司法规模基准上进行评估，证明了在保持高候选覆盖率的同时，相对于强词法和神经基线的一致提升。

English Abstract

Finding truly analogous precedents requires capturing legal reasoning beyond surface word overlap. We present a two-stage, section-aware framework for legal case retrieval that first uses a deterministic large language model (LLM), offline, to segment raw judgments into facts, issues, decision, and reasoning. In Stage 1, we combine parallel lexical (BM25) and semantic (dense ANN) whole-document searches via Reciprocal Rank Fusion (RRF) to form a high-recall candidate pool. In Stage 2, we perform fine-grained, like-for-like comparisons (e.g., query reasoning vs. candidate reasoning). To address the scale mismatch between unbounded lexical scores and cosine similarities, we apply query-wise Z-score normalization before aggregating signals with learned section weights. For the top results, the system returns the relevant section text with a concise, grounded rationale and party-stance labels. We evaluate on a jurisdiction-scale benchmark, demonstrating consistent gains over strong lexical and neural baselines while maintaining high candidate coverage.
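
The two scoring steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: RRF uses the common default `k = 60`, Z-scores use the population standard deviation over the candidate pool of a single query, and the signal names, weights, and scores in the usage lines are hypothetical (the paper learns its section weights).

```python
from statistics import mean, pstdev


def rrf(rankings: list, k: int = 60) -> dict:
    """Stage 1: Reciprocal Rank Fusion, score(d) = sum over rankings of 1 / (k + rank of d)."""
    fused: dict = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def zscore(scores: dict) -> dict:
    """Query-wise Z-score normalization, putting BM25 scores and cosines on one scale."""
    if not scores:
        return {}
    vals = list(scores.values())
    mu, sigma = mean(vals), pstdev(vals)
    if sigma == 0:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def aggregate(signals: dict, weights: dict) -> dict:
    """Stage 2: weighted sum of z-normalized per-section signals over the candidate pool."""
    total: dict = {}
    for name, scores in signals.items():
        for doc, z in zscore(scores).items():
            total[doc] = total.get(doc, 0.0) + weights[name] * z
    return total


# Hypothetical usage: fuse a BM25 ranking with a dense ranking, then rerank by section.
pool = rrf([["c1", "c2", "c3"], ["c2", "c3", "c1"]])
signals = {
    "bm25_reasoning": {"c1": 12.0, "c2": 3.0, "c3": 6.0},
    "cos_reasoning": {"c1": 0.80, "c2": 0.85, "c3": 0.60},
}
scores = aggregate(signals, {"bm25_reasoning": 0.4, "cos_reasoning": 0.6})
```

Without the Z-score step, the raw BM25 values (here up to 12.0) would dominate the cosines (all below 1.0) regardless of the weights.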

2606.03126 2026-06-03 eess.SY cs.SY

Dynamics of the Thermomagnetic Pendulum

热磁摆的动力学

Ryan Thompson, Ethan Wang, Nilay Kant

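AI Summary: Studies a thermo-magneto-mechanical coupled system consisting of a ferromagnetic pendulum bob and a biasing permanent magnet, using a multiphysics model to simulate its nonlinear dynamics and revealing torque asymmetry, a rapid decrease in force near the Curie point, and sustained oscillations.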

2606.03117 2026-06-03 cs.DL

Excessive use, ill use and misuse of Bibliometrics

文献计量学的过度使用、不当使用和滥用

Rajeeva Laxman Karandikar

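AI Summary: Argues that bibliometric indicators such as the impact factor and the h-index are overrelied upon in research evaluation worldwide and lack a statistical foundation, and calls on decision makers to base assessments on expert evaluation of content rather than on these indicators.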

Comments This article is meant for all sciences, especially decision makers
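
For reference, the h-index named above is the largest h such that h of a researcher's papers have at least h citations each. The short sketch below is our illustration, not code from the article; the two citation records used to exercise it are hypothetical and show how quite different records can collapse to the same single number.

```python
def h_index(citations: list) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i  # the i most-cited papers all have at least i citations
        else:
            break
    return h
```

For example, the records `[25, 8, 5, 3, 3]` (44 citations in total) and `[5, 5, 5, 3, 3]` (21 citations) both have an h-index of 3.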

2606.03115 2026-06-03 cs.SE cs.MA

SPOQ: Specialist Orchestrated Queuing for Multi-Agent Software Engineering

SPOQ: 面向多智能体软件工程的专家编排队列

Royce Carbowitz, Dheeraj Kumar

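AI Summary: Proposes SPOQ, which uses wave-based topological dispatch, dual validation gates, and Human-as-an-Agent integration to improve coordination, quality control, and human oversight in multi-agent software engineering.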

Comments 55 pages, 12 tables, 6 figures; includes longitudinal deployment study and open-weights replication

AI Chinese Abstract

多智能体AI系统在自动化软件工程任务方面显示出潜力，但现有方法存在协调开销、质量控制缺口和人类监督有限的问题。我们提出SPOQ（专家编排队列），一种结合三项创新的方法：（1）基于波形的拓扑调度，从任务依赖图计算并行执行波；（2）双重验证门控，在执行前（规划验证）和执行后（代码验证）应用质量度量以减少返工周期；（3）人类作为智能体（HaaA）集成，其中人类专家参与分解并在执行期间可被咨询。SPOQ使用三层智能体层次结构（Opus工作者、Sonnet评审者、Haiku调查者）来优化成本-质量权衡。我们通过四个实验评估SPOQ。实验1：波形调度接近关键路径下界（比率1.03--1.11，加速比高达14.3倍）；在2槽本地后端上，它提供稳定的1.4倍加速。实验2：SPOQ将规划覆盖率从93.0提高到99.75，消除循环规划，并将并行度从31.0提升到75.25。实验3：双重验证将每个任务的缺陷从0.34减少到0.20，并将测试通过率从91.25%提高到99.75%。实验4：人工评审将每个任务的残余缺陷从0.47减少到0.03。结果在本地托管的开放权重模型（Qwen3.6-35B-A3B）上复现，验证了增益归因于编排而非特定模型。一项涵盖17个仓库、8,589次提交、1,822个任务和13,866个测试（通过率99.87%）的纵向研究提供了生态验证。

English Abstract

Multi-agent AI systems show promise for automating software engineering tasks, yet existing approaches suffer from coordination overhead, quality control gaps, and limited human oversight. We introduce SPOQ (Specialist Orchestrated Queuing), a methodology combining three innovations: (1) wave-based topological dispatch that computes parallel execution waves from task dependency graphs; (2) dual validation gates applying quality metrics before execution (planning validation) and after (code validation) to reduce rework cycles; and (3) Human-as-an-Agent (HaaA) integration, where a human specialist participates in decomposition and can be consulted during execution. SPOQ uses a three-tier agent hierarchy (Opus workers, Sonnet reviewers, Haiku investigators) to optimize cost-quality tradeoffs. We evaluate SPOQ through four experiments. Experiment 1: wave dispatch approaches the critical-path lower bound (ratio 1.03-1.11, speedup of up to 14.3x); on a 2-slot local backend it delivers a stable 1.4x speedup. Experiment 2: SPOQ improves planning coverage from 93.0 to 99.75, eliminates cyclic plans, and lifts parallelism from 31.0 to 75.25. Experiment 3: dual validation reduces defects from 0.34 to 0.20 per task and lifts the test pass rate from 91.25% to 99.75%. Experiment 4: human review reduces residual defects from 0.47 to 0.03 per task. Results are replicated on a locally hosted open-weights model (Qwen3.6-35B-A3B), indicating that the gains are attributable to orchestration rather than to any specific model. A longitudinal study across 17 repositories, 8,589 commits, 1,822 tasks, and 13,866 tests (99.87% pass rate) provides ecological validation.
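
Wave-based dispatch, as described in (1), can be sketched as a level-by-level topological sort (Kahn's algorithm). This assumes each wave simply contains every task whose dependencies have all completed. SPOQ's actual scheduler may also account for slot limits or task durations, and the task names below are hypothetical. With unit-length tasks and unlimited slots, the number of waves equals the length of the longest dependency chain, which is the unit-time analogue of the critical-path lower bound.

```python
from collections import defaultdict


def compute_waves(deps: dict) -> list:
    """Group tasks into waves: wave k holds tasks whose dependencies all sit in earlier waves.

    deps maps each task to the set of tasks it depends on. Raises ValueError on a cycle.
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {t: len(deps.get(t, set())) for t in tasks}
    dependents = defaultdict(list)
    for t, ds in deps.items():
        for d in ds:
            dependents[d].append(t)

    wave = sorted(t for t in tasks if indegree[t] == 0)
    waves, scheduled = [], 0
    while wave:
        waves.append(wave)
        scheduled += len(wave)
        ready = []
        for t in wave:  # completing a wave releases its dependents
            for u in dependents[t]:
                indegree[u] -= 1
                if indegree[u] == 0:
                    ready.append(u)
        wave = sorted(ready)  # sorted only for deterministic output
    if scheduled != len(tasks):
        raise ValueError("dependency cycle detected")
    return waves


# Hypothetical plan: five tasks with a longest chain schema -> api -> ui -> tests.
plan = {
    "schema": set(),
    "api": {"schema"},
    "docs": {"schema"},
    "ui": {"api"},
    "tests": {"api", "ui"},
}
waves = compute_waves(plan)
```

Rejecting cyclic inputs up front mirrors the abstract's point that cyclic plans must be eliminated before dispatch.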

2606.03107 2026-06-03 eess.SY cs.SY

Learning Local Optimal Controller for a Class of Nonlinear Systems via Impulse-Supervised Exploration

通过脉冲监督探索学习一类非线性系统的局部最优控制器

Adebayo Olayinka Oke, Nilay Kant

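AI Summary: Proposes an impulse-supervised constrained exploration framework that combines continuous-time approximate dynamic programming with an impulse supervision layer. Impulsive braking keeps the state within the region where the local linear approximation is valid, providing the persistent excitation needed for parameter convergence while preventing state excursions from invalidating local optimality.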

2606.03104 2026-06-03 eess.SY cs.SY

Impedance Modeling and Stability Analysis of Droop-Controlled Inverter Under Unbalanced Power Grid Operating Conditions

不平衡电网运行条件下下垂控制逆变器的阻抗建模与稳定性分析

Qiang Zeng, Lipeng Zhu, Yang Li, Yi Lei, Quan Zhou, Jiayong Li, Cong Zhang, Bingxu Li, Zhikang Shuai

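AI Summary: Because mirror frequency coupling under unbalanced conditions makes existing models unreliable, proposes a single-input single-output sequence impedance modeling method based on harmonic linearization that captures multi-frequency interactions and unbalance factors, identifies dominant factors through normalized sensitivity analysis with proportional weighting, and uses the Bode criterion to analyze their impact on the stability margin.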

Comments 12 pages, accepted for publication in IEEE Transactions on Industrial Electronics

DOI: 10.1109/TIE.2026.3692856

AI Chinese Abstract

随着可再生能源在电网中的渗透率不断提高，并网逆变器与电网之间相互作用引起的振荡风险日益突出。尽管现有研究在逆变器建模和振荡稳定性分析方面取得了显著进展，但大多数研究未能充分考虑不平衡运行条件下复杂的镜像频率耦合效应（MFCE），导致模型不可靠和稳定性分析结果错误。为解决这一不足，本文开发了一种可广泛适用于不平衡运行条件的新型序列阻抗建模方案。具体而言，以典型构网型逆变器——下垂控制逆变器（DCI）为例，提出了一种基于谐波线性化（HL）的单输入单输出序列阻抗建模方法，对给定DCI及所连电网进行综合建模。通过考虑DCI内部的多频交互，该方法捕获了MFCE和不平衡因素，从而得到更精确的阻抗模型。进一步，结合归一化灵敏度分析和比例加权，识别出影响系统稳定性的主导因素。最后，通过Bode准则分析了三种典型不平衡运行条件下这些主导因素对系统稳定裕度的具体影响。本文所提整体方案的有效性和可靠性在搭建的并网下垂控制实验平台上得到了验证。

English Abstract

With the growing integration of renewable energy sources into power grids, the risk of oscillations caused by interactions between grid-tied inverters and the grid is becoming increasingly prominent. Although existing studies have made significant progress in inverter modeling and oscillatory stability analysis, most of them do not sufficiently consider the complex mirror frequency coupling effects (MFCE) under unbalanced operating conditions, leading to unreliable models and erroneous stability analysis results. To address this inadequacy, this work develops a novel sequence impedance modeling scheme that is broadly applicable to unbalanced operating conditions. Taking the droop-controlled inverter (DCI), a representative grid-forming inverter, as an example, we propose a single-input single-output sequence impedance modeling method based on harmonic linearization (HL) that comprehensively models both a given DCI and the connected grid. By accounting for multi-frequency interactions within the DCI, this method captures MFCE and unbalance factors, leading to a more accurate impedance model. Further, the dominant factors influencing system stability are identified by combining normalized sensitivity analysis with proportional weighting. Finally, the detailed impacts of these dominant factors on the system stability margin under three typical unbalanced operating conditions are analyzed using the Bode criterion. The effectiveness and reliability of the proposed scheme are validated on a grid-connected droop-controlled experimental platform built for this work.
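
One common way to formalize the sensitivity step is assumed here for illustration; the paper's exact stability metric and weighting scheme may differ. It uses the normalized sensitivity of a stability metric $M$ (for instance, the phase margin read from the Bode plot) to a parameter $p$, followed by proportional weights that normalize the absolute sensitivities over the candidate parameters $q$:

$$ S_p = \frac{\partial M}{\partial p}\cdot\frac{p}{M} \approx \frac{\Delta M / M}{\Delta p / p}, \qquad w_p = \frac{|S_p|}{\sum_{q} |S_q|} $$

Because $S_p$ is dimensionless (a relative change in $M$ per relative change in $p$), parameters with different units can be compared directly, and those with the largest $w_p$ would be flagged as dominant factors.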

2606.03095 2026-06-03 cs.HC

AI Assistance for Discretionary Work: Increasing Feedback Provision in Higher Education

AI辅助自由裁量工作：在高等教育中增加反馈提供

Romina Mahinpei, Victoria Dean, Ruth Fong, Lydia T. Liu, Manoel Horta Ribeiro

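AI Summary: Uses a randomized experiment and qualitative interviews to study how AI-assisted drafts increase teaching assistants' engagement in providing personalized feedback in higher education, finding that AI drafts significantly increase the amount and length of feedback without lowering student ratings or efficiency.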

AI Chinese Abstract

AI系统通过生成用户可以采用、修改或忽略的中间产物，日益塑造人类工作流程。虽然先前研究表明AI辅助可以提高必要任务的效率和准确性，但关于它是否能增加用户通常意图执行但经常跳过的自由裁量但有益工作的参与度，我们知之甚少。我们在高等教育中的个性化反馈提供背景下研究这一问题，这是一项具有教学价值但通常是可选的做法。我们进行了一项混合方法研究，结合随机现场实验和定性访谈，在一门300级机器学习课程中，有n=11名助教和n=88名学生。学生提交被随机分配到（1）处理条件，助教在评分后收到AI辅助的反馈草稿，或（2）无草稿的控制条件。助教保持完全控制，可以自行使用、编辑或忽略草稿。我们发现AI辅助反馈显著增加了反馈提供（+10.8个百分点，SE=1.1，p<0.001）和反馈长度（+39.8字符，SE=3.45，p<0.001），而没有对学生有用性评级产生负面影响或减少每字符时间。定性发现表明，AI辅助草稿作为可编辑的支架，降低了启动反馈的障碍，而不是减少总体努力。我们的发现突显了AI在自由裁量但有益任务中的前景：增加可能被忽略的工作，同时保留人类对最终结果的控制。

English Abstract

AI systems increasingly shape human workflows by generating intermediate artifacts that users can adopt, revise, or ignore. While prior work has shown that AI assistance can improve the efficiency and accuracy of required tasks, less is known about whether it can increase participation in discretionary but beneficial work that users often intend to perform but frequently skip. We study this question in the context of personalized feedback provision in higher education, a pedagogically valuable but often optional practice. We conduct a mixed-methods study combining a randomized field experiment and qualitative interviews in a 300-level machine learning course with n=11 teaching assistants (TAs) and n=88 students. Student submissions were randomly assigned to either (1) a treatment condition where TAs received AI-assisted feedback drafts after grading or (2) a control condition without drafts. TAs remained fully in control and could use, edit, or ignore drafts at their discretion. We find that AI-assisted feedback significantly increases feedback provision (+10.8 percentage points, SE=1.1, p<0.001) and feedback length (+39.8 chars, SE=3.45, p<0.001) without negatively affecting student usefulness ratings or reducing time per character. Qualitative findings suggest that AI-assisted drafts function as editable scaffolds that lower barriers to initiating feedback rather than reducing overall effort. Our findings highlight AI's promise for discretionary but beneficial tasks: increasing work that might otherwise go undone while preserving human control over final outcomes.

2606.03081 2026-06-03 eess.SY cs.SY

Observer-Based Control of Linear Systems with Mismatched Input and Output Delays

基于观测器的具有不匹配输入和输出时滞的线性系统控制

Hieu Trinh, Phan Thanh Nam, Tran Ngoc Nguyen

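AI Summary: For linear systems with mismatched delays in the control input and system output vectors, proposes an observer-based control method built on linear matrix inequalities and delay compensators that achieves asymptotic stability.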

Comments Preprint of a chapter intended for a forthcoming research monograph

2606.03046 2026-06-03 cs.AR cs.CR

ZK-Flex: A Flexible and Scalable Framework for Accelerating Zero-Knowledge Proofs

ZK-Flex：一种灵活可扩展的零知识证明加速框架

Adiwena Putra, Cuong Manh Duong, Anh Quang Pham, Joo-Young Kim

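AI Summary: To address the high computational intensity of polynomial and elliptic-curve operations in zero-knowledge proofs, the need to support diverse large-precision modular multiplications, and dynamic workload switching, proposes ZK-Flex, a software-hardware co-design framework with a Toom-Cook multi-precision core, a flexible NoC, and a linked-list memory mechanism, achieving 5-11x speedup and up to 3.8x higher area efficiency.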

Comments 7 pages, 8 figures, 2 tables. Accepted at DAC 2026 (63rd ACM/IEEE Design Automation Conference), July 26-29, 2026, Long Beach, CA, USA

DOI: 10.1145/3770743.3803941

AI Chinese Abstract

零知识证明（ZKP）允许证明者向验证者证明计算正确性而不泄露私有数据，确保了隐私和可验证性。然而，证明生成是高度计算密集型的，主要由多项式（POLY）和椭圆曲线（EC）运算主导。这些工作负载给硬件加速带来了两个关键挑战：（1）高效支持多样化的大精度模乘运算，（2）在动态切换于POLY和EC阶段的工作负载中保持高利用率。现有的可重构加速器仅部分解决了这些问题，在精度可扩展性、算法灵活性和资源效率方面仍有限制。为了克服这些限制，我们提出了ZK-Flex，一个灵活可扩展的软硬件协同设计框架，用于加速ZKP证明生成。软件层包含POLY和EC优化器，通过硬件和工作负载感知的算法选择减少计算量，而硬件集成了TCore，一个基于Toom-Cook的多精度核心，具有灵活的NoC和链表内存机制，在有限内存容量下提高了并行性。在代表性的ZKP基准测试中，ZK-Flex相比现有技术实现了5到11倍的加速和高达3.8倍的面积效率提升，为高性能、可重构的ZKP加速建立了新基础。

English Abstract

Zero-knowledge proofs (ZKPs) allow a prover to convince a verifier of computational correctness without revealing private data, ensuring both privacy and verifiability. However, proof generation is highly compute-intensive, dominated by polynomial (POLY) and elliptic-curve (EC) operations. These workloads pose two key challenges for hardware acceleration: (1) efficiently supporting diverse large-precision modular multiplications, and (2) maintaining high utilization across workloads that dynamically shift between POLY and EC stages. Existing reconfigurable accelerators address these issues only partially, remaining limited in precision scalability, algorithmic flexibility, and resource efficiency. To overcome these limitations, we propose ZK-Flex, a flexible and scalable software-hardware co-designed framework for accelerating ZKP proof generation. The software layer incorporates POLY and EC optimizers that reduce computation through hardware- and workload-aware algorithmic choices, while the hardware integrates TCore, a Toom-Cook-based multi-precision core with a flexible NoC and a linked-list memory mechanism that improves parallelism under limited memory capacity. Across representative ZKP benchmarks, ZK-Flex achieves speedups of 5 to 11 times and up to 3.8 times higher area efficiency than the state of the art, establishing a new foundation for high-performance, reconfigurable ZKP acceleration.
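
The abstract describes TCore only as Toom-Cook-based. As background, Karatsuba multiplication is the two-way (Toom-2) member of the Toom-Cook family: it splits each operand into two halves and replaces four half-size products with three. The software sketch below illustrates that idea only. A plain `%` reduction stands in for whatever modular reduction the hardware uses (for example, Montgomery or Barrett reduction), and the `cutoff_bits` value is an arbitrary assumption.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Toom-2 (Karatsuba) multiplication of non-negative integers.

    With x = x1*2^h + x0 and y = y1*2^h + y0, uses 3 recursive products instead of 4.
    """
    if x < 0 or y < 0:
        raise ValueError("non-negative operands only")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # base case: small enough for a native multiply
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask
    y1, y0 = y >> half, y & mask
    z2 = karatsuba(x1, y1, cutoff_bits)  # high product
    z0 = karatsuba(x0, y0, cutoff_bits)  # low product
    # (x1 + x0)(y1 + y0) - z2 - z0 = x1*y0 + x0*y1, the middle term, from one product.
    z1 = karatsuba(x1 + x0, y1 + y0, cutoff_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0


def modmul(x: int, y: int, p: int) -> int:
    """Modular multiplication: large-precision product followed by reduction mod p."""
    return karatsuba(x % p, y % p) % p
```

Higher Toom-Cook variants (Toom-3 and beyond) split operands into more parts and trade extra additions for fewer sub-products, which is the kind of algorithmic choice a multi-precision core can tune to operand width.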

2606.03030 2026-06-03 cs.GT econ.GN q-fin.EC

Do Matching Mechanisms Work with LLM Agents?

匹配机制在LLM智能体市场中是否有效？

Yukihiro Hoshino, Ayato Kitadai, Nariaki Nishino

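AI Summary: By comparing free-negotiation markets with centralized mechanism-based markets, finds that mechanism-based matching markets achieve better stability and efficiency and that LLM agents are more inclined than humans to report their preferences truthfully, although strategy-proofness does not always increase the rate of truthful reporting.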
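
As background for the terms above, deferred acceptance (Gale-Shapley) is the textbook example of a centralized matching mechanism: it always yields a stable matching and is strategy-proof for the proposing side. The summary does not say which mechanisms the paper tests, so this sketch is context rather than the authors' setup, and the agent names and preferences are hypothetical.

```python
def deferred_acceptance(proposer_prefs: dict, receiver_prefs: dict) -> dict:
    """Proposer-optimal stable matching via Gale-Shapley deferred acceptance.

    Returns receiver -> proposer. Agents missing from a preference list are unacceptable.
    """
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    held: dict = {}  # receiver -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has exhausted its list and stays unmatched
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank.get(r, {}):
            free.append(p)  # r finds p unacceptable
            continue
        current = held.get(r)
        if current is None:
            held[r] = p
        elif rank[r][p] < rank[r][current]:
            held[r] = p  # r trades up; the displaced proposer becomes free again
            free.append(current)
        else:
            free.append(p)
    return held
```

Acceptances stay tentative until the end, which is what makes the outcome stable: no receiver ever gives up a proposer it prefers.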

2606.03024 2026-06-03 cs.CR cs.SE

SkillGuard: A Permission Framework for Agent Skills

SkillGuard：一种面向智能体技能的权限框架

Shidong Pan, Xiaoyu Sun, Tianyi Zhang, Dianshu Liao, Meixue Si, Zhenchang Xing

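AI Summary: Proposes SkillGuard, a permission framework that uses a dual-plane governance model to jointly regulate context influence and action side effects, narrowing the security gap between a skill's declared intent and its runtime behavior.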

AI Chinese Abstract

智能体技能通过可复用的指令、脚本、工具绑定和上下文依赖扩展了LLM智能体。然而，当前的技能生态系统主要依赖基于信任的加载和静态检查，在技能可以注入到智能体上下文的内容与其在运行时可能导致智能体执行的操作之间存在差距。这种差距引入了新的安全和隐私风险，现有防御措施主要静态检查技能文件或监管单个工具调用，未能系统地将技能的声明意图与其运行时行为联系起来。在本文中，我们提出了SkillGuard，一种以技能为中心的权限框架，将技能视为携带权限的可执行工件。SkillGuard引入了一种双平面治理模型，通过技能清单、运行时访问控制、用户中介授权、默认拒绝执行、能力推断和行为监控，联合管控上下文影响和动作副作用。我们在315个真实技能和SkillInject上评估了SkillGuard。权限分类覆盖了99.76%的观察到的受保护对象，自动清单生成达到了91.0%的F1值。在对抗性评估中，SkillGuard将上下文注入的攻击成功率从32.37%降低到23.02%，将明显注入的攻击成功率从25.56%降低到16.67%，同时保持了良性任务的效用。这些结果表明，SkillGuard作为一种以技能为中心的权限框架，可以为提高智能体技能生态系统的隐私和安全性提供实用基础。

English Abstract

Agent skills extend LLM agents with reusable instructions, scripts, tool bindings, and contextual dependencies. However, current skill ecosystems largely rely on trust-based loading and static inspection, leaving a gap between what a skill can inject into an agent's context and what it can cause the agent to do at runtime. This gap introduces new security and privacy risks, and existing defenses primarily inspect skill files statically or regulate individual tool calls, without systematically connecting a skill's declared intent with its runtime behavior. In this paper, we present SkillGuard, a skill-centric permission framework that treats skills as permission-bearing executable artifacts. SkillGuard introduces a dual-plane governance model that jointly regulates context influence and action side effects through skill manifests, runtime access control, user-mediated authorization, deny-by-default enforcement, capability inference, and behavior monitoring. We evaluate SkillGuard on 315 real-world skills and SkillInject. The permission taxonomy covers 99.76% of observed protected objects, and automated manifest generation reaches 91.0% F1. In adversarial evaluations, SkillGuard reduces attack success from 32.37% to 23.02% for contextual injections and from 25.56% to 16.67% for obvious injections, while maintaining benign task utility. These results suggest that SkillGuard, as a skill-centric permission framework, can provide a practical foundation for improving the privacy and security of agent skill ecosystems.
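
Here is a minimal sketch of deny-by-default enforcement against a skill manifest. An action is allowed only if the manifest declares a matching grant or the user has explicitly approved that exact request. The `Manifest` fields, the (action, resource-prefix) grant format, and the skill name are hypothetical. SkillGuard's actual manifests, permission taxonomy, and context-plane controls are richer than this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Hypothetical skill manifest: the (action, resource-prefix) grants a skill declares."""
    skill: str
    grants: frozenset = field(default_factory=frozenset)


def authorize(manifest: Manifest, action: str, resource: str,
              user_approved: frozenset = frozenset()) -> bool:
    """Deny by default: permit only declared grants or explicit per-request user approvals."""
    for granted_action, prefix in manifest.grants:
        # Prefix matching is a simplification; prefixes should end with "/" so that
        # "/workspace/docs/" does not also match "/workspace/docs-private".
        if granted_action == action and resource.startswith(prefix):
            return True
    # Anything undeclared falls through to user-mediated authorization, else is denied.
    return (manifest.skill, action, resource) in user_approved
```

For example, a skill declaring only `("read", "/workspace/docs/")` can read `/workspace/docs/a.pdf` but is refused a write to the same file or a read of `~/.ssh`.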

2606.03020 2026-06-03 cs.HC

Hanger Reflex Based Driving Assistance for Drivers with Peripheral Visual Field Defects

基于挂反射的周边视野缺损驾驶员驾驶辅助

Hailong Liu, Junya Wada, Toshihiro Hiraoka, Junpei Kuwana, Makoto Itoh, Takahiro Wada

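AI Summary: For drivers with peripheral visual field defects, proposes a hanger reflex cue (HRC) that applies mechanical pressure to guide head orientation toward potentially risky pedestrians; experiments show that HRC significantly increases head rotation and gaze duration and may reduce collisions through a sequential head-gaze-collision pathway.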

AI Chinese Abstract

周边视野缺损的驾驶员可能无法注意到周边视野中的行人，导致危险意识延迟和碰撞风险增加。本研究探索了挂反射提示（HRC）作为周边视野缺损驾驶员的一种驾驶辅助方法，即对头部特定区域施加机械压力，以促进对潜在危险行人的预期定向并支持更安全的驾驶。在包含15名参与者的驾驶模拟器实验中，我们比较了在模拟周边视野缺损条件下遇到行人时有无HRC的驾驶行为。结果显示，HRC显著将驾驶员的模态头部旋转角度转向危险行人，并显著增加了对该行人的注视时长。尽管HRC对碰撞发生的直接影响仅显示出边缘趋势，但HRC条件下的碰撞发生率低于无HRC条件。分段结构方程模型分析进一步表明，HRC可能通过从头部旋转到注视分配再到碰撞发生的顺序路径有助于减少碰撞。这些发现提供了初步证据，表明HRC可以支持对周边危险的预期注意力分配，并可能为视野缺损驾驶员提供一种有前景的驾驶辅助方法。

English Abstract

Drivers with peripheral visual field defects may fail to notice pedestrians in their peripheral visual field, leading to delayed hazard awareness and increased collision risk. This study explores the hanger reflex cue (HRC) as a driving assistance method for drivers with peripheral visual field defects, in which mechanical pressure is applied to specific regions of the head to facilitate anticipatory orientation toward potentially risky pedestrians and support safer driving. In a driving simulator experiment with 15 participants, we compared driving behavior with and without HRC during pedestrian encounters under a simulated peripheral visual field defect. The results showed that HRC significantly shifted drivers' modal head rotation angle toward the risky pedestrian and significantly increased gaze duration toward that pedestrian. Collision occurrence was lower in the w/ HRC condition than in the w/o HRC condition, although the direct effect of HRC on collision occurrence showed only a marginal trend. A piecewise structural equation modeling analysis further suggested that HRC may contribute to collision reduction through a sequential pathway from head rotation to gaze allocation and then to collision occurrence. These findings provide preliminary evidence that HRC can support anticipatory attention allocation toward peripheral hazards and may offer a promising driving assistance method for drivers with visual field impairment.

2606.03010 2026-06-03 cs.CR

Secure AltDA Integration for Ethereum L2s: An End-to-End Validation Framework

以太坊L2的安全AltDA集成：端到端验证框架

Bowen Xue, Samuel Laferriere

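AI Summary: Addressing the lack of a complete specification for integrating Ethereum L2s with alternative data availability (AltDA) systems, proposes a canonical validation framework that ensures secure integration through a typed, deterministic, and total translation model, and applies it to architectures including Celestia-Blobstream, EigenDA, and Avail-ZKsync.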

AI Chinese Abstract

替代数据可用性（AltDA）系统为以太坊L2提供外部数据发布层，用于高吞吐量rollup设计。通过将批量数据发布移至以太坊之外，AltDA允许L2处理比原生DA更多的数据。然而，这种替换引入了一个新的共识关键集成层。现有的生态系统框架识别了高层风险，例如外部DA信任假设以及DA验证器的存在与否，但并未提供L2应如何与AltDA集成的完整规范。这一差距可能导致L2停止、诚实L2节点之间的推导不一致、无效状态断言或桥攻击。本文提出了一个用于安全AltDA集成的规范验证框架。我们将边界建模为从L1收件箱字节到AltDA承诺、再到外部可用数据、最后到核心L2逻辑其余部分消耗的rollup负载的类型化、确定性和全转换。核心原则是每个对抗性输入必须导致定义的唯一结果。我们展示了缺失义务如何导致具体故障模式，包括欠约束结算、推导停止、不一致的诚实节点行为、无效状态断言和桥安全故障。然后我们将该框架应用于代表性的AltDA集成架构，包括Celestia-Blobstream、基于EigenDA的设计和Avail-ZKsync。我们的评估表明，安全AltDA集成并非仅由DA提供者或桥决定。周围的L2集成还必须强制执行连接L1收件箱输入到已接受L2状态的完整验证关系。

English Abstract

Alternative data availability (AltDA) systems provide Ethereum L2s with an external data publication layer for high-throughput rollup designs. By moving bulk data publication outside of Ethereum, AltDA allows L2s to process more data than native DA. However, this replacement introduces a new consensus-critical integration layer. Existing ecosystem frameworks identify high-level risks, such as external DA trust assumptions and the presence or absence of a DA verifier, but do not provide a complete specification for how an L2 should integrate with AltDA. This gap can lead to L2 halts, inconsistent derivation across honest L2 nodes, invalid state assertions, or bridge attacks. This paper presents a canonical validation framework for secure AltDA integration. We model the boundary as a typed, deterministic, and total translation from L1 inbox bytes to an AltDA commitment, then to externally available data, and finally to the rollup payload consumed by the rest of the core L2 logic. The central principle is that every adversarial input must lead to a defined, unique outcome. We show how missing obligations lead to concrete failure modes, including underconstrained settlement, derivation halts, inconsistent honest-node behavior, invalid state assertions, and bridge safety failures. We then apply the framework to representative AltDA integration architectures, including Celestia-Blobstream, EigenDA-based designs, and Avail-ZKsync. Our evaluation shows that secure AltDA integration is not determined solely by the DA provider or bridge: the surrounding L2 integration must also enforce the full validation relation connecting L1 inbox inputs to accepted L2 state.
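
The "total, deterministic translation" principle can be sketched as a single function that maps any inbox byte string to exactly one outcome and never raises. Several simplifications are hypothetical: the commitment format (one version byte plus a 32-byte SHA-256 digest), the in-memory `store` standing in for the DA layer, and the verdict names. Real AltDA commitments (for example, KZG commitments or Merkle roots checked through a bridge) differ, and how an `UNAVAILABLE` result is resolved is exactly the kind of obligation the framework says must be specified.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class Verdict(Enum):
    ACCEPT = "accept"
    DROP_MALFORMED = "drop_malformed"  # inbox bytes do not parse as a commitment
    DROP_MISMATCH = "drop_mismatch"    # retrieved data does not match the commitment
    UNAVAILABLE = "unavailable"        # data not retrievable; policy must be defined explicitly


@dataclass(frozen=True)
class Outcome:
    verdict: Verdict
    payload: Optional[bytes] = None


VERSION = 0x01  # hypothetical format: 1 version byte + 32-byte SHA-256 digest


def derive(inbox: bytes, store: Mapping[bytes, bytes]) -> Outcome:
    """Total, deterministic map from L1 inbox bytes to exactly one outcome."""
    if len(inbox) != 33 or inbox[0] != VERSION:
        return Outcome(Verdict.DROP_MALFORMED)
    commitment = inbox[1:]
    data = store.get(commitment)
    if data is None:
        return Outcome(Verdict.UNAVAILABLE)
    # Never trust the DA layer's answer: re-check it against the commitment.
    if hashlib.sha256(data).digest() != commitment:
        return Outcome(Verdict.DROP_MISMATCH)
    return Outcome(Verdict.ACCEPT, payload=data)
```

Because every branch returns a value, two honest nodes that see the same inbox bytes and the same DA response always derive the same outcome.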

2606.02992 2026-06-03 cs.IR

Slipstream: Locality-Aware Graph Index Construction for Streaming Approximate Nearest Neighbor Search

Slipstream: 面向流式近似最近邻搜索的局部感知图索引构建

Shubing Yang, Dongfang Zhao

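AI Summary: To address the computational bottleneck of frequent insertions into graph indexes for streaming approximate nearest neighbor search, proposes exploiting vector-stream continuity by starting each search from candidates found during the previous insertion, with an adaptive controller that adjusts the search range, significantly improving throughput.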

AI Chinese Abstract

图索引被广泛用于高召回率的近似最近邻搜索（ANNS），但许多实时应用需要流式ANNS。在这些实时应用中，连续到达的嵌入必须在更新图边之前搜索现有图以找到候选邻居，这使得重复的索引构建成为流式摄入工作负载的瓶颈。我们提出Slipstream，一种新方法，显著降低了ANNS图索引中频繁插入的计算成本。Slipstream的核心思想是利用向量流中的连续性：新到达的点从先前插入过程中发现的有希望的候选点开始搜索，而不是从入口点开始搜索。更具体地，Slipstream评估不同的起始候选子集，然后通过自适应控制器根据流的稳定性缩小或扩大范围。我们进一步证明Slipstream超越了启发式方法：我们推导了一个抽象模型来表征Slipstream的性能并分析其理论界限。我们在两个流行的开源库（Faiss、HNSWLib）中实现了Slipstream，并在五个流式向量数据集上与四种基线方法进行了比较。实验结果表明，Slipstream在保持至少0.95的recall@10的同时，端到端吞吐量比基线方法高出高达30.8倍。

English Abstract

Graph indexes are widely used for high-recall approximate nearest neighbor search (ANNS), but many real-time applications require streaming ANNS. In these applications, each continuously arriving embedding must search the existing graph for candidate neighbors before graph edges are updated, which makes repeated index construction a bottleneck for streaming ingestion workloads. We propose Slipstream, a new method that significantly reduces the computational cost of frequent insertions into graph indexes for ANNS. The core idea of Slipstream is to exploit continuity in vector streams: a newly arrived point starts its search from promising candidates found during the previous insertion rather than from the entry point. More specifically, Slipstream evaluates distinct subsets of starting candidates and uses an adaptive controller that narrows or widens the range according to the stream's stability. We further show that Slipstream is more than a heuristic: we derive an abstract model to characterize Slipstream's performance and analyze its theoretical bounds. We have implemented Slipstream in two popular open-source libraries (Faiss, HNSWLib) and compared it with four baseline methods on five streaming vector datasets. Experimental results show that Slipstream achieves up to 30.8$\times$ higher end-to-end throughput than baselines while maintaining at least 0.95 recall@10.
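
A toy sketch of warm-started insertion on a flat proximity graph. It does not use HNSW and omits the edge pruning that real libraries apply. Each insert can seed its best-first search with the previously inserted point and that insertion's candidates instead of the fixed entry point `0`. The seed-count rule in `_adapt` is a hypothetical stand-in for Slipstream's adaptive controller: it doubles the seeds after an above-average jump between consecutive stream points and shrinks them otherwise. The parameter names `m`, `ef`, `k_min`, and `k_max` are assumptions.

```python
import heapq
import math


def beam_search(graph, vecs, query, starts, ef):
    """Best-first graph search from `starts`; returns up to ef node ids, closest first."""
    visited = set(starts)
    frontier = [(math.dist(vecs[s], query), s) for s in starts]  # min-heap to expand
    heapq.heapify(frontier)
    results = [(-d, s) for d, s in frontier]  # max-heap (negated) of best nodes so far
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(results) >= ef and d > -results[0][0]:
            break  # nearest unexpanded node is already worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(vecs[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [n for _, n in sorted((-nd, n) for nd, n in results)]


class StreamingGraph:
    """Toy flat graph index whose inserts can warm-start from the previous insertion."""

    def __init__(self, m=8, ef=32, k_min=1, k_max=16):
        self.vecs, self.graph = [], []
        self.m, self.ef = m, ef
        self.k_min, self.k_max, self.k = k_min, k_max, k_min  # number of warm-start seeds
        self.prev_cands, self.avg_step = [], None

    def _adapt(self, v):
        """Hypothetical controller: widen the seed set after a jump, narrow it when smooth."""
        step = math.dist(v, self.vecs[-1])
        if self.avg_step is not None and step > self.avg_step:
            self.k = min(self.k_max, self.k * 2)
        else:
            self.k = max(self.k_min, self.k - 1)
        self.avg_step = step if self.avg_step is None else 0.9 * self.avg_step + 0.1 * step

    def insert(self, v, warm=True):
        new_id = len(self.vecs)
        cands = []
        if new_id > 0:
            self._adapt(v)
            # Warm start: previous point plus its candidates; cold start: entry point 0.
            seeds = [new_id - 1] + self.prev_cands
            starts = seeds[: self.k] if warm else [0]
            cands = beam_search(self.graph, self.vecs, v, starts, self.ef)
        self.vecs.append(v)
        self.graph.append([])
        for nb in cands[: self.m]:  # link to the m closest candidates in both directions
            self.graph[new_id].append(nb)
            self.graph[nb].append(new_id)
        self.prev_cands = cands
        return new_id

    def search(self, q, k=1):
        return beam_search(self.graph, self.vecs, q, [0], self.ef)[:k]
```

The intent is that, on a smooth stream, the warm seeds already sit near the new point, so the search tends to converge in fewer expansions than a search from the fixed entry point.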

2606.02977 2026-06-03 cs.HC cs.SE

A Benchmarking Framework for Multimodal User Interface Toolkits: Comparing Modality Coverage, Developer Workflow, and Experimental Support

多模态用户界面工具包的基准测试框架：比较模态覆盖、开发者工作流和实验支持

Ariton Verush

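AI Summary: Proposes a structured benchmarking framework, based on document analysis, technical comparison, and a future developer evaluation, that compares multimodal user interface toolkits along three dimensions: modality coverage, developer experience, and experimental support.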

Comments 13 pages, 3 tables, 1 figure. Benchmarking framework paper revised and expanded from an HCI seminar draft

AI Chinese Abstract

多模态用户界面越来越多地结合语音、手势、视觉、注视、触摸、生物信号和其他传感器数据。过去五年中的最新工具包，如 Geno、Multisensor-Pipeline (MSP)、ReactGenie 和 EmoSync，旨在使开发者更容易原型化此类界面，而较早的工作如 WAMI 则展示了早期基于网络的多模态系统是如何构思的。然而，该领域仍然缺乏一种系统且可重用的方式来比较这些工具包实际支持什么、它们从开发者那里卸载了多少实现工作，以及哪些评估策略适合它们。本文将一个人机交互研讨会草案重新构建为多模态用户界面工具包的基准测试框架论文。它不是报告完成的实证结果，而是提出了一个基于文档分析、技术比较和未来基于开发者的评估的结构化基准。该框架围绕三个维度组织：模态覆盖和交互抽象、开发者体验和工作流，以及实验和集成支持。本文通过五个代表性工具包说明了该框架：Geno、MSP、ReactGenie、WAMI 和 EmoSync。贡献是一个可重用的基准模板，未来的研究人员可以用实证测量、开发者研究和额外的多模态工具包来实例化它。

English Abstract

Multimodal user interfaces increasingly combine speech, gesture, vision, gaze, touch, biosignals, and other sensor data. Recent toolkits from the past five years, such as Geno, Multisensor-Pipeline (MSP), ReactGenie, and EmoSync, aim to make it easier for developers to prototype such interfaces, while older work such as WAMI shows how early web-based multimodal systems were conceived. Yet the field still lacks a systematic and reusable way to compare what these toolkits actually support, how much implementation work they offload from developers, and which evaluation strategies are appropriate for them. This paper reframes an HCI seminar draft into a benchmarking framework paper for multimodal user interface toolkits. Rather than reporting completed empirical results, it proposes a structured benchmark based on document analysis, technical comparison, and a future developer-based evaluation. The framework is organized around three dimensions: modality coverage and interaction abstraction, developer experience and workflow, and experimental and integration support. The paper illustrates the framework through five representative toolkits: Geno, MSP, ReactGenie, WAMI, and EmoSync. The contribution is a reusable benchmark template that future researchers can instantiate with empirical measurements, developer studies, and additional multimodal toolkits.

2606.02970 2026-06-03 cs.HC

From Explanation to Diagnosis: Next Generation Interactive Video Coach with Misstep Awareness

从解释到诊断：具有失误感知能力的下一代交互式视频教练

Xiao Jin, Rahul K. Dass, Ashok K. Goel

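AI Summary: Proposes a misstep-aware coaching capability built on a two-model architecture that combines a Task-Method-Knowledge model with a pedagogical model to detect, classify, and diagnose learner errors and generate diagnostic scaffolding, thereby providing more precise and actionable feedback.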

AI Chinese Abstract

智能辅导系统擅长生成解释，但很少对学习者出错的位置和原因提供原则性诊断。我们为Ivy（一种神经符号AI教练）引入了失误感知教练能力，该能力基于双模型架构，在佐治亚理工学院在线研究生AI课程中，用新的教学模型（PM）增强了任务-方法-知识（TMK）模型。PM通过为每个测验问题和错误回答编码学习者的潜在信念（对错误想法或缺失知识的简要陈述）、TMK位置（误解的来源）、误解类型以及从教师问答键中衍生的针对性支架，使教师的诊断知识显式化且机器可读。利用课程中的测验问题，我们演示了一个概念验证流程，该流程检测和分类学习者错误，并生成基于诊断的支架，使Ivy超越知识检索，走向诊断性失误感知，从而提供更精确、可操作的反馈，支持概念转变，并推动AI在教育与学习科学中的自适应学习系统发展。

English Abstract

Intelligent tutoring systems excel at generating explanations but rarely provide principled diagnosis of where and why a learner is wrong. We introduce a misstep-aware coaching capability for Ivy, a neurosymbolic AI coach, built on a two-model architecture that augments a Task-Method-Knowledge (TMK) model with a new Pedagogical Model (PM) in the context of an online graduate AI course at Georgia Tech. The PM makes instructor diagnostic knowledge explicit and machine-readable by encoding, for each quiz question and incorrect response, the learner's underlying belief (a brief statement of the incorrect idea or missing knowledge), a TMK locus (the source of the misunderstanding), a misconception type, and targeted scaffolding derived from the instructor's Q&A key. Using quiz questions from the course, we demonstrate a proof-of-concept pipeline that detects and classifies learner errors and generates diagnosis-grounded scaffolding. This moves Ivy beyond knowledge retrieval toward diagnostic misstep awareness, enabling more precise, actionable feedback that supports conceptual change and advances adaptive learning systems in AI in education and the learning sciences.
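
A minimal sketch of how one PM entry might be represented and looked up, assuming entries are keyed by (question id, chosen answer). The four fields mirror the ones listed in the abstract, but the field names, the fallback entry, and the sample A* question are hypothetical and not taken from the Ivy system.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Diagnosis:
    """One hypothetical PM entry for an incorrect response."""
    belief: str              # brief statement of the incorrect idea or missing knowledge
    tmk_locus: str           # TMK element where the misunderstanding originates
    misconception_type: str
    scaffold: str            # targeted hint derived from the instructor's Q&A key


# Hypothetical PM: (question id, chosen answer) -> diagnosis
PedagogicalModel = Dict[Tuple[str, str], Diagnosis]

FALLBACK = Diagnosis("unrecognized error", "unknown", "unclassified",
                     "Revisit the method this question targets and try again.")


def diagnose(pm: PedagogicalModel, answer_key: Dict[str, str],
             question_id: str, answer: str) -> Optional[Diagnosis]:
    """None for a correct answer, the encoded diagnosis for a known error, else a fallback."""
    if answer_key[question_id] == answer:
        return None
    return pm.get((question_id, answer), FALLBACK)


pm: PedagogicalModel = {
    ("q3", "B"): Diagnosis(
        belief="A* returns an optimal path with any heuristic",
        tmk_locus="Method: A* search, heuristic admissibility",
        misconception_type="overgeneralization",
        scaffold="What must be true of h(n) for A* to guarantee an optimal path?",
    )
}
answer_key = {"q3": "A"}
```

Keeping the fallback explicit means an unanticipated wrong answer still yields a defined response that an instructor can later turn into a new PM entry.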
