arXivDaily arXiv每日学术速递 周一至周五更新
2606.09902 2026-06-10 cs.NE cs.AI 新提交

The Whale That Outswam Evolution: Swarm Intelligence Maximises Memory in Connectome Reservoirs

超越进化的鲸鱼:群体智能在连接组储备池中最大化记忆

Anmol Guragain, Savvas Kakalis, Juan Ignacio Godino-Llorente

发表机构 * University of Murcia(穆尔西亚大学) University of Pisa(比萨大学)

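AI summary: Applies four gradient-free, bio-inspired optimisers to the edge weights of connectome reservoirs from six species. All four outperform the unoptimised biological baseline on every task and species, and the Whale Optimisation Algorithm gives the largest gains (up to a 17x increase in memory capacity and up to an 89% reduction in NRMSE), indicating that biological weights are an essential inductive bias that topology alone cannot replace.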

AI中文摘要

储备池计算利用固定动力学的循环网络进行时序处理,仅需训练线性读出层。经过数百万年进化塑造的生物神经连接组可能编码了超越随机储备池的计算结构,但该结构能否通过原则性优化进一步增强仍是开放问题。我们通过将四种无梯度、生物启发优化器(粒子群优化、差分进化、灰狼优化器和鲸鱼优化算法)应用于基于连接组的回声状态网络的边权重来解决该问题,这些网络涵盖六个物种,其神经复杂性跨越六个数量级:秀丽隐杆线虫(279个神经元)、果蝇(49个节点)、小鼠(112个节点)、大鼠(73个节点)、猕猴(29个区域,连续FLNe突触强度)以及人类结构MRI连接(83个脑区)。每个连接组在四个经典储备计算基准上评估:记忆容量(MC)、Lorenz吸引子预测、NARMA-10系统辨识和Mackey-Glass混沌时间序列预测。所有四种优化器在从生物权重初始化时,在每个任务和物种上均一致优于未优化的生物基线。鲸鱼优化算法在每个任务上均取得最大增益:记忆容量提升高达17倍(秀丽隐杆线虫:1.39至23.91),均方根误差降低高达89%(Mackey-Glass,人类),对应所有物种和任务平均改进214%。关键的是,相同拓扑上的随机初始化始终表现劣于生物学,确立了生物权重值作为拓扑本身无法恢复的必要归纳偏置。这些结果将生物启发、生物初始化优化定位为跨动物王国连接组储备计算的一种原则性且广泛有效的策略。

英文摘要

Reservoir computing exploits the fixed dynamics of a recurrent network for temporal processing, requiring only a trained linear readout. Biological neural connectomes, shaped by millions of years of evolution, may encode computational structure beyond what random reservoirs provide, yet whether that structure can be further enhanced by principled optimisation remains an open question. We address it by applying four gradient-free, bio-inspired optimisers (Particle Swarm Optimisation, Differential Evolution, Grey Wolf Optimiser, and Whale Optimisation Algorithm (WOA)) to the edge weights of connectome-based echo-state networks across six species spanning six orders of magnitude in neural complexity: C. elegans (279 neurons), Drosophila (49 nodes), mouse (112), rat (73), macaque (29 regions, continuous FLNe synaptic strengths), and human structural MRI connectivity (83 parcels). Each connectome is evaluated on four canonical reservoir computing benchmarks: Memory Capacity (MC), Lorenz attractor prediction, NARMA-10 system identification, and Mackey-Glass chaotic time-series prediction. All four optimisers consistently outperform unoptimised biological baselines across every task and species when initialised from biological weights. WOA achieves the largest gains on every task: up to a 17x MC improvement (C. elegans: 1.39 to 23.91) and up to 89% NRMSE reduction (Mackey-Glass, human), corresponding to an average 214% improvement across all species and tasks. Crucially, random initialisation on the same topology reliably underperforms biology, establishing biological weight values as an essential inductive bias that topology alone cannot recover. These results position bio-inspired, biologically-initialised optimisation as a principled and broadly effective strategy for connectome reservoir computing across the animal kingdom.
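The headline ratio can be checked with simple arithmetic, and one of the benchmarks is easy to state in code. The sketch below is not the authors' implementation: it assumes the commonly used NARMA-10 recurrence with inputs drawn uniformly from [0, 0.5], and it defines NRMSE as RMSE divided by the standard deviation of the target. The abstract does not say which variants the paper uses, and the function names are hypothetical.

```python
import math
import random


def narma10(u):
    """Generate a NARMA-10 target series from an input sequence u.

    Assumed (textbook) recurrence, not taken from the paper:
    y[t+1] = 0.3*y[t] + 0.05*y[t]*sum(y[t-9..t]) + 1.5*u[t-9]*u[t] + 0.1
    The first ten outputs are set to zero.
    """
    y = [0.0] * len(u)
    for t in range(9, len(u) - 1):
        window = sum(y[t - 9:t + 1])  # the ten most recent outputs
        y[t + 1] = 0.3 * y[t] + 0.05 * y[t] * window + 1.5 * u[t - 9] * u[t] + 0.1
    return y


def nrmse(target, prediction):
    """RMSE normalised by the (population) standard deviation of the target."""
    n = len(target)
    mean = sum(target) / n
    mse = sum((a - b) ** 2 for a, b in zip(target, prediction)) / n
    variance = sum((a - mean) ** 2 for a in target) / n
    return math.sqrt(mse / variance)


rng = random.Random(0)
u = [rng.uniform(0.0, 0.5) for _ in range(200)]
y = narma10(u)

# Ratio behind the "17x" memory-capacity figure quoted for C. elegans.
mc_gain = 23.91 / 1.39
print(f"MC gain: {mc_gain:.1f}x")
```

Under this NRMSE definition, predicting the target mean scores 1.0, so an 89% reduction means the optimised error is 0.11 times the baseline error.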

2606.09901 2026-06-10 cs.GR cs.CV cs.HC cs.LG cs.MM 新提交

On the Controllability-Fidelity Frontier in Diffusion Editing

扩散编辑中的可控性-保真度前沿

Yi Hu, Leying Yi, Emily Davis, Finn Carter

发表机构 * Xidian University(西安电子科技大学)

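AI summary: Combines theory and experiments to study the trade-offs in diffusion image editing between adherence to user intent, preservation of non-target content, and output quality. Proposes algorithmic frameworks, identifies key failure modes, and discusses ethical considerations.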

Comments Preprint

AI中文摘要

基于扩散的生成模型实现了强大的图像编辑能力,但在保持保真度和安全性的同时实现精确控制仍然具有挑战性。我们对可控的基于扩散的图像编辑进行了全面的理论和实证研究,分析了用户意图遵循、非目标内容保持和输出质量之间的权衡。我们的工作涵盖了文本和掩码引导编辑、点/拖拽操作以及基于反演的流程。我们推导了编辑目标的数学公式,并分析了噪声注入、分数引导和反演误差的动力学。我们提供了重构误差、重复编辑下的稳定性以及变化局部性的理论界限。我们提出了掩码局部化和指令引导编辑的算法框架(附伪代码),并在多个任务和指标(FID、身份相似性、CLIP对齐、伪影分数等)上进行了广泛的实验,比较了最先进的方法(例如TF-ICON \cite{lu2023tficone}、DragFlow \cite{zhou2025dragflow}、InstructPix2Pix \cite{brooks2023instructpix2pix}、UltraEdit \cite{zhao2024ultraedit})。我们的结果揭示了关键失败模式,如身份漂移、提示敏感性和组合错误。我们还讨论了图像编辑中的伦理考量,包括滥用风险、偏见、同意以及概念擦除技术(例如MACE \cite{lu2024mace}、ANT \cite{li2025ant}、EraseAnything \cite{gao2024eraseanything})作为安全措施。最后,我们总结了负责任、高保真度扩散编辑的最佳实践和未来方向。

英文摘要

Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs between adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g., TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc.). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, consent, and concept erasure techniques (e.g., MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.

2606.09896 2026-06-10 cs.GT cs.AI cs.LG 新提交

HMAF: A Hierarchical Multi-Slot GD-RTB Allocation Framework

HMAF:一种分层多槽GD-RTB分配框架

Tianxing Bu, Zhaoqi Zhang, Linyou Cai, Miao Xie, Shengri Xue, Tan Qu, Qianlong Xie, Xingxing Wang, Siqiang Luo, Gao Cong

发表机构 * Meituan(美团) Nanyang Technological University(南洋理工大学) China Agricultural University(中国农业大学)

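AI summary: To balance short-term revenue against long-term delivery on advertising platforms where GD and RTB coexist, proposes HMAF, a hierarchical multi-slot allocation framework built on a Plan-Calibrate-Execute paradigm that integrates offline constrained optimisation with online decision-making. At Meituan it raised the GD delivery rate by 3.72% and total advertising revenue by 1.59%.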

Comments Accepted by KDD 2026 Applied Data Science Track

AI中文摘要

在现代在线广告平台中,保量交付(GD)合约与实时竞价(RTB)拍卖共存并相互竞价。现有方法要么将GD和RTB优化解耦,要么依赖启发式优先级规则,因此在复杂多槽交付和曝光约束下,难以有效平衡短期收入最大化与长期合约交付。为应对这些挑战,我们提出HMAF(分层多槽分配框架),一个旨在优化GD-RTB广告平台中曝光分配的统一框架。HMAF采用计划-校准-执行范式作为其核心结构,整合离线约束优化与在线决策,平衡离线GD资源规划、动态校准GD-RTB竞争力,并在多槽环境中做出实时列表级排名决策。HMAF已在全球最大在线食品配送平台之一美团的多项营销场景中实施,使GD交付率提升3.72%,广告总收入提升1.59%。

英文摘要

In modern online advertising platforms, Guaranteed Delivery (GD) contracts coexist with Real-Time Bidding (RTB) auctions and bid against them. Recent approaches either decouple GD and RTB optimization or rely on heuristic priority rules, and thus fail to effectively balance short-term revenue maximization with long-term contract delivery under complex multi-slot delivery and impression constraints. To address these challenges, we propose HMAF (Hierarchical Multi-Slot Allocation Framework), a unified framework designed to optimize impression allocation in GD-RTB advertising platforms. HMAF employs the Plan-Calibrate-Execute paradigm as its core structure and integrates offline constraint optimization with online decision-making: it plans GD resources offline, dynamically calibrates GD-RTB competitiveness, and makes real-time listwise ranking decisions across multi-slot environments. HMAF has been implemented in multiple marketing scenarios at Meituan, one of the world's largest online food delivery platforms, leading to a 3.72% increase in GD delivery rate and a 1.59% increase in total advertisement revenue.

2606.09884 2026-06-10 cs.MA cs.AI cs.LG econ.EM 新提交

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

深度多智能体强化学习在异步定价中的失败模式:可复现触发器、轨迹诊断及部分修复

Shree Murthy, Rohan Pandey

发表机构 * DigitalOcean, USA(DigitalOcean美国)

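AI summary: Studies two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets, namely tacit collusion between DDPG agents and actor-critic instability at high event rates, and shows that asynchrony provides a partial fix.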

AI中文摘要

我们研究了连续时间定价市场中深度多智能体强化学习的两种可复现失败模式:(i) 竞争性DDPG智能体之间形成默契合谋,以及(ii) 高事件率下的演员-评论家不稳定性。我们在一个单一的CT-MARL基准测试(泊松时钟价格更新、观测延迟$\delta$、内部最优logit需求)中实例化了这两种模式,表明同步DDPG智能体可靠地触发失败模式1,合谋指数$\Delta = 0.69 \pm 0.11$,并量化了一种部分微观结构修复:仅异步性就将合谋降低了48%,而增加延迟使其降至最低$\Delta = 0.28$。该修复具有明确记录的成本:它是部分的($\Delta$仍高于伯特兰水平),在$\delta$上非单调,并且无法承受失败模式2,后者在$\lambda = 5$时表现为DDPG评论家发散,并破坏了$(\lambda{=}5, \delta{=}1)$处的相图单元。我们为标量合谋指数配备了轨迹级诊断,揭示了情节内信号崩溃和冲击后无法恢复。

英文摘要

We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor-critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency $\delta$, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index $\Delta = 0.69 \pm 0.11$, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48% and adding latency drives it to a minimum of $\Delta = 0.28$. The fix has clearly documented costs: it is partial ($\Delta$ remains supra-Bertrand), it is non-monotone in $\delta$, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at $\lambda = 5$ and corrupts the phase-diagram cell at $(\lambda{=}5, \delta{=}1)$. We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-shock non-recovery.
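The abstract does not define the collusion index. The sketch below assumes the normalised profit gain widely used in the algorithmic-pricing literature, where a value of 0 corresponds to the Bertrand-Nash outcome and 1 to full collusion. The profit values are made up for illustration, and reading the 48% as a relative reduction in the index is also an assumption.

```python
def collusion_index(avg_profit, nash_profit, monopoly_profit):
    """Normalised profit gain: 0 at Bertrand-Nash, 1 at the joint-monopoly profit.

    Assumed definition; the paper may normalise differently.
    """
    return (avg_profit - nash_profit) / (monopoly_profit - nash_profit)


# Hypothetical per-firm profits, chosen only to illustrate the scale.
delta_example = collusion_index(avg_profit=0.25, nash_profit=0.20, monopoly_profit=0.30)

# Figures quoted in the abstract.
delta_sync = 0.69                      # synchronous DDPG agents
delta_async = delta_sync * (1 - 0.48)  # if "48%" is a relative cut in the index
delta_min = 0.28                       # reported minimum once latency is added

print(f"example index = {delta_example:.2f}")
print(f"asynchrony alone: index ~ {delta_async:.2f}")
print(f"relative reduction at the minimum: {1 - delta_min / delta_sync:.0%}")
```

Under these assumptions the minimum still sits well above 0, which matches the abstract's remark that the fix leaves prices supra-Bertrand.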

2606.09867 2026-06-10 cs.AR cs.AI 新提交

EstRTL: Functional Estimation Guided RTL Code Generation

EstRTL:功能估计引导的RTL代码生成

Qi Xiong, Renzhi Chen, Bowei Wang, Yuqing Xiong, Libo Huang, Lei Wang

发表机构 * College of Computer Science and Technology, National University of Defense Technology(国防科技大学计算机科学与技术学院) Defense Innovation Institute, Academy of Military Science (AMS)(军事科学院创新院) School of Computer Science and Technology, Shandong University(山东大学计算机科学与技术学院) Key Laboratory of Advanced Microprocessor Chips and Systems, Changsha, China, and College of Computer Science and Technology, National University of Defense Technology, Changsha, China(先进微处理器芯片与系统重点实验室,长沙,中国,和国防科技大学计算机科学与技术学院,长沙,中国) Defense Innovation Institute, AMS, Beijing, China and Qiyuan Lab, Beijing, China(军事科学院创新院,北京,中国和启元实验室,北京,中国)

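AI summary: Proposes EstRTL, a framework that uses static functional score estimation within a three-stage Generation, Estimation, and Correction paradigm to improve the functional correctness of LLM-generated RTL code, raising correctness on generic LLMs by 3.2%-9.0%.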

AI中文摘要

优化寄存器传输级(RTL)代码在硬件设计中至关重要。大型语言模型(LLM)为RTL代码的自动生成和优化提供了新方法,有望显著加速设计过程并减少人力投入。然而,现有的RTL代码生成方法通常侧重于模型微调和利用各种扩展技术来增强RTL代码生成能力,缺乏对功能正确性的关注。确保生成的RTL代码不仅编译成功,而且在实际硬件实现中按预期运行仍然是一个关键挑战。为解决这一问题,我们提出了EstRTL,一个基于静态功能评分估计的LLM驱动的协作智能体框架,用于RTL代码生成。EstRTL采用三阶段范式:生成、评估和修正。在阶段中,功能评估智能体根据评分和评估结果静态评估生成的代码,并决定是直接输出代码、返回重新生成还是转发给代码修正智能体。该框架可应用于各种专为RTL代码生成设计的LLM,进一步增强生成代码的正确性。通过提供定量评分和可读的需求比较,它提高了AI辅助RTL代码生成的透明度。实验表明,EstRTL将通用LLM的RTL代码生成正确率显著提升了3.2%-9.0%,展示了我们系统的实用价值。代码和实验结果已开源,链接:this https URL。

英文摘要

Optimizing register transfer level (RTL) code is of vital importance in hardware design. Large language models (LLMs) provide new methods for the automatic generation and optimization of RTL code, offering the potential to significantly accelerate the design process and reduce human effort. However, existing methods for generating RTL code often focus on model fine-tuning and various expansion techniques to enhance RTL code generation capabilities, and pay little attention to functional correctness. Ensuring that the generated RTL code not only compiles successfully but also behaves as intended in real hardware implementations remains a critical challenge. To address this issue, we propose EstRTL, an LLM-powered collaborative agent framework for RTL code generation based on static functional score estimation. EstRTL follows a three-stage paradigm: Generation, Estimation, and Correction. During these stages, the functional estimation agent statically evaluates the generated code based on score and assessment results, and decides whether to output the code directly, return it for regeneration, or forward it to the code correction agent. This framework can be applied to various LLMs that are designed for RTL code generation, further enhancing the correctness of the generated code. By providing quantitative scores and human-readable requirements comparisons, it improves the transparency of AI-assisted RTL code generation. Experiments show that EstRTL significantly improves the correctness of RTL code generated by generic LLMs by 3.2%-9.0%, demonstrating the practical value of our system. The code and experimental results are open-sourced at https://anonymous.4open.science/status/EstRTL-E200/.

2606.09852 2026-06-10 cs.HC cs.AI cs.CL cs.LG cs.MA cs.SE 新提交

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

基于LLM的代码文档生成与多裁判评估

Ikbel Ghrab, Mohamed Dhieb, Ismail Khenissi, Ines Abdeljaoued-Tej

发表机构 * University of Tunis El Manar(突尼斯埃尔马纳尔大学)

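AI summary: Uses eight large language models to generate code documentation automatically and a multi-judge evaluation framework (four LLMs scoring nine criteria) to assess its quality. Experiments on a medical physics library show a 42% performance gap between the best and worst models.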

Comments ICAHS, © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref Conference ICAHS IEEE, 2025

AI中文摘要

高质量的源代码文档至关重要但往往被忽视,尤其是在医疗保健等关键领域,可靠性和可维护性至关重要。我们提出了一个AI驱动的框架,利用八种最先进的大语言模型(包括GPT、Gemini、Qwen和LLaMA变体)自动从代码和仓库生成文档。该系统基于PocketFlow编排框架,采用模块化流水线和高级提示工程,生成结构化、上下文感知的文档。为确保质量并指导模型选择,我们引入了MultiLLMasJudges评估框架,其中四个独立的LLM从九个标准(如完整性、清晰度和忠实度)评估输出。在开源医学物理库上进行的实验表明,最佳和最差模型之间的性能差距为42%。通过结合多样化的模型输出、优化的提示和严格的评估,我们的方法提高了文档质量并减少了人工工作量,特别是在安全关键的医疗软件中。

英文摘要

High-quality source code documentation is vital yet often neglected, especially in critical domains like healthcare where reliability and maintainability are essential. We present an AI-powered framework that automates documentation generation from code and repositories using eight state-of-the-art Large Language Models (LLMs), including GPT, Gemini, Qwen, and LLaMA variants. Built on the PocketFlow orchestration framework, the system applies modular pipelines and advanced prompt engineering to produce structured, context-aware documentation. To ensure quality and guide model selection, we introduce a MultiLLMasJudges evaluation framework, where four independent LLMs assess outputs across nine criteria, such as Completeness, Clarity, and Faithfulness. Experiments conducted on an open-source medical physics library showed a 42% performance gap between the top and bottom models. By combining diverse model outputs, optimized prompting, and rigorous evaluation, our approach enhances documentation quality and reduces manual effort, especially in safety-critical healthcare software.

2606.09849 2026-06-10 cs.HC cs.CV 新提交

Sketch-to-Layout: A Human-Centric Computational Agent for Constraint-Aware Synthesis of Modular Photobioreactors

草图到布局:面向约束感知的模块化光生物反应器合成的人本计算代理

Xiujin Liu, Shuqi Li, Yuxin Lin

发表机构 * Qrafty Technology Inc.(Qrafty技术公司) University of Michigan(密歇根大学)

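AI summary: Proposes a computational framework that turns user sketches into modular photobioreactor facade layouts, using a constraint satisfaction problem (CSP) solver for near-real-time synthesis, and introduces a weakly supervised algae health monitoring pipeline in support of carbon neutrality and autonomous maintenance.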

Comments 13 pages, 6 figures

AI中文摘要

建筑集成光生物反应器(PBR)为碳中和建筑提供了一条途径,但其部署受到配置复杂性和生物维护的阻碍。本文提出了一种模块化PBR立面系统,由协调设计意图与物理有效性的计算框架驱动。我们引入了具有集成容器和管道几何结构的“碳中和砖块”;整体流体通道实现了“即插即用”组装。为了应对14种模块几何结构的组合复杂性,我们开发了一个计算草图到布局代理,将布局合成公式化为约束满足问题(CSP)。利用CP-SAT引擎,该代理将稀疏的用户草图视为软先验,同时强制执行端口对齐和全局连通性等硬约束。这使得非专家能够在近实时内合成可制造配置。此外,为了促进自主维护,我们提出了一种弱监督藻类健康监测管道。通过采用混合CNN-注意力骨干和时间排序损失,该系统无需绝对真实标签即可从照片中量化生物活力。实验表明,CSP求解器在高达15x15的网格尺度上实现了95.5%的成功率。定性评估证实该框架在确保操作完整性的同时保留了设计语义。长期测试表明,视觉模块产生的健康轨迹与14天生物周期一致,这表明将交互式合成与低成本计算机视觉相结合可以普及可扩展的碳捕获系统。

英文摘要

Building-integrated photobioreactors (PBRs) offer a pathway for carbon-neutral architecture, yet deployment is hindered by configuration complexity and biological maintenance. This paper presents a modular PBR facade system powered by a computational framework reconciling design intent with physical validity. We introduce 'carbon-neutralization bricks' featuring integrated vessel-and-conduit geometry; monolithic fluid channels enable 'plug-and-play' assembly. To navigate the combinatorial complexity of 14 modular geometries, we develop a Computational Sketch-to-Layout Agent that formulates layout synthesis as a Constraint Satisfaction Problem (CSP). Using the CP-SAT engine, the agent treats sparse user sketches as soft priors while enforcing hard constraints like port alignment and global connectivity. This allows non-experts to synthesize fabrication-ready configurations in near real-time. Furthermore, to facilitate autonomous maintenance, we propose a weakly supervised algae health monitoring pipeline. By employing a hybrid CNN-attention backbone and a temporal ranking loss, the system quantifies biological vitality from photographs without absolute ground-truth labels. Experiments demonstrate the CSP solver achieves a 95.5% success rate on grid scales up to 15 x 15. Qualitative evaluations confirm the framework preserves design semantics while ensuring operational integrity. Long-term tests show the vision module produces health trajectories aligned with 14-day biological cycles, suggesting that integrating interactive synthesis with low-cost computer vision can democratize scalable carbon capture systems.
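The combination of hard constraints and soft sketch priors can be illustrated on a toy grid. The sketch below is only an illustration under stated assumptions: it uses plain backtracking in place of the CP-SAT engine, a hypothetical set of seven pipe-like tiles instead of the paper's 14 module geometries, and it omits the global connectivity constraint. Port alignment is enforced as a hard constraint, and each cell that deviates from the user's sketch adds one unit of cost.

```python
# Hypothetical tiles: each name maps to the set of sides (N, E, S, W) that carry a port.
TILES = {
    "blank": frozenset(),
    "h": frozenset("EW"),
    "v": frozenset("NS"),
    "ne": frozenset("NE"),
    "nw": frozenset("NW"),
    "se": frozenset("SE"),
    "sw": frozenset("SW"),
}


def fits(layout, rows, cols, r, c, name):
    """Hard constraints: no port may face the boundary, and facing ports must match."""
    ports = TILES[name]
    if r == 0 and "N" in ports:
        return False
    if r == rows - 1 and "S" in ports:
        return False
    if c == 0 and "W" in ports:
        return False
    if c == cols - 1 and "E" in ports:
        return False
    if r > 0 and ("S" in TILES[layout[(r - 1, c)]]) != ("N" in ports):
        return False
    if c > 0 and ("E" in TILES[layout[(r, c - 1)]]) != ("W" in ports):
        return False
    return True


def solve(rows, cols, sketch):
    """Return (cost, layout) minimising the number of cells that differ from the sketch."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    best = {"cost": None, "layout": None}

    def search(i, layout, cost):
        if best["cost"] is not None and cost >= best["cost"]:
            return  # cannot improve on the best layout found so far
        if i == len(cells):
            best["cost"], best["layout"] = cost, dict(layout)
            return
        r, c = cells[i]
        for name in TILES:
            if fits(layout, rows, cols, r, c, name):
                layout[(r, c)] = name
                penalty = 0 if sketch.get((r, c), name) == name else 1
                search(i + 1, layout, cost + penalty)
                del layout[(r, c)]

    search(0, {}, 0)
    return best["cost"], best["layout"]


# The user sketches a single corner piece; the solver completes a closed loop around it.
cost, layout = solve(2, 2, {(0, 0): "se"})
print(cost, layout)
```

A production solver would replace the exhaustive search with a constraint-programming model, which is what makes grids as large as the 15 x 15 case in the abstract tractable.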

2606.09848 2026-06-10 cs.HC cs.AI cs.CY 新提交

Human-AI Coordination Zones: A Framework for Designing Human-in-the-Loop Experiences with Agentic AI

人机协调区域:设计具有代理性AI的人机协同体验框架

James Pierce, Vaiva Kalnikaitė, Siddharth Gupta, Brian Granger

发表机构 * Amazon Web Services(亚马逊网络服务)

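AI summary: Proposes a human-AI coordination framework that defines coordination zones along three dimensions (salience, involvement, and activity) and provides an input taxonomy, coordination curves, and design patterns for generating, analysing, and communicating human-AI interaction experiences.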

AI中文摘要

随着生成式和代理性AI嵌入日常产品,实践者面临一个持续挑战:如何设计人机协调——即用户与AI系统通过界面进行的持续相互调整,以支持可用性、信任和安全性。现有资源提供高层次原则(“保持透明”、“维持用户控制”)或低层次UI模式,但缺乏连接两者的中层设计知识。通过对60个商业AI应用进行景观和人工制品分析,我们引入了一个框架,将人机协调定义为三个维度的相互作用:显著性(AI呈现的突出程度)、参与度(用户可做什么来参与AI)和活动(AI实际做什么)。我们贡献了中层工具,包括协调区域(为我做、在我之下做、与我一起做、没有我做)、输入分类(提示、激发、推断、分层)、用于映射用户旅程的协调曲线,以及展示框架生成能力的设计模式。该框架可生成性地应用于设计体验,分析性地评估现有体验,以及沟通性地在利益相关者之间阐述想法。

英文摘要

As generative and agentic AI becomes embedded in everyday products, practitioners face a persistent challenge: how to design human-AI coordination (the ongoing mutual adjustment between users and AI systems, as mediated through interfaces) that supports usability, trust, and safety. Existing resources offer high-level principles ("be transparent," "maintain user control") or low-level UI patterns, but there is a lack of mid-level design knowledge bridging the two. Through landscape and artifact analysis of 60 commercial AI applications, we introduce a framework defining human-AI coordination as the interplay of three dimensions: salience (how prominently AI is presented), involvement (what users can do to engage AI), and activity (what AI actually does). We contribute mid-level tools including coordination zones (done-for-me, done-under-me, done-with-me, done-without-me), an input taxonomy (prompted, sparked, inferred, layered), coordination curves for mapping user journeys, and design patterns demonstrating the generative capacity of the framework. The framework can be applied generatively to design experiences, analytically to evaluate existing ones, and communicatively to articulate ideas across stakeholders.

2606.09846 2026-06-10 cs.HC cs.AI cs.CL 新提交

CANVAS: Captioning Art with Narrative Visual-Audio AI Systems

CANVAS: 用叙事视觉音频AI系统为艺术配文

Vignesh Nagarajan

发表机构 * BASIS Phoenix High School(BASIS凤凰高中)

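AI summary: Proposes an automated workflow that uses large language models and text-to-speech services to generate multi-sensory art descriptions with synchronised audio narration, producing text-plus-audio output in under 20 seconds per image at a cost below $0.05, with significantly higher lexical diversity and narrative detail than baseline captions.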

Comments 22 pages, 16 figures, 3 tables, 21 references

AI中文摘要

由于替代文本简短或缺失,视觉艺术在很大程度上仍对盲人和低视力(BLV)观众不可及,这些文本很少传达艺术品的感官、空间或情感特质。本研究提出了一种自动化工作流,利用大语言模型和文本转语音服务生成多感官艺术描述和同步音频解说。该系统通过Zapier编排,将上传的图像转换为丰富的叙事字幕,无需人工干预,从而实现可访问媒体的快速、规模化生产。对50件艺术品的定量评估显示,AI生成的描述在词汇多样性、形容词密度和叙事细节方面显著高于基线字幕,同时保持可比的易读性水平。统计检验(t检验、方差分析)确认了丰富度和长度方面的显著差异,完整流水线在每张图像20秒内生成文本加音频输出,成本低于0.05美元。研究结果表明,自动字幕生成可以弥合博物馆和数字馆藏可访问性方面的差距,对更广泛的公众参与具有意义。未来工作可纳入BLV参与者的用户研究,以评估理解、偏好和最佳解释性语言水平。

英文摘要

Visual art remains largely inaccessible to blind and low-vision (BLV) audiences due to brief or absent alt-text, which rarely conveys the sensory, spatial, or emotional qualities of an artwork. This study presents an automated workflow that generates multi-sensory art descriptions and synchronized audio narration using large language models and text-to-speech services. The system, orchestrated through Zapier, converts uploaded images into rich narrative captions without human intervention, enabling rapid, scalable production of accessible media. Quantitative evaluation across 50 artworks shows that AI-generated descriptions contain significantly higher lexical diversity, adjective density, and narrative detail than baseline captions, while maintaining comparable readability levels. Statistical tests (t-tests, ANOVA) confirm meaningful differences in richness and length, and the full pipeline produces text-plus-audio outputs in under 20 seconds per image at a cost below $0.05. Findings demonstrate that automated captioning can bridge gaps in museum and digital-collection accessibility, with implications for broader public engagement. Future work can incorporate user studies with BLV participants to assess comprehension, preference, and optimal levels of interpretive language.
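The abstract does not say which lexical diversity measure was used. As a minimal sketch, assuming the simple type-token ratio (distinct words divided by total words), the comparison between a terse caption and a richer description could look like the following. Both example captions are invented for illustration.

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words (case-insensitive); 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented captions, not taken from the study.
baseline = "A painting of a woman."
rich = "A luminous portrait of a seated woman, her folded hands resting in soft golden light."

print(type_token_ratio(baseline), type_token_ratio(rich))
```

The type-token ratio falls as texts get longer, so comparisons between captions of very different lengths usually call for a length-corrected measure.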

2606.09844 2026-06-10 cs.HC cs.AI 新提交

The Interlocutor Effect: Why LLMs Leak More Personal Data to Agents Than Humans

对话者效应:为什么LLMs向智能体泄露的个人数据比向人类多

Faouzi El Yagoubi, Godwin Badu-Marfo, Ranwa Al Mallah

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

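AI summary: Finds that LLMs leak more personally identifiable information when talking to AI agents than to humans and explains this with the Attention Suppression Hypothesis. Experiments suggest that safety-aligned attention heads become inactive in agent interactions, with leakage rising by up to 23 percentage points.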

AI中文摘要

大型语言模型(LLMs)会根据其感知到的对话者身份改变隐私行为。虽然安全机制通常阻止LLMs向人类用户泄露个人身份信息(PII),但这些模型在与另一个AI智能体对话时倾向于泄露更多敏感数据。我们将此称为\textbf{对话者效应}。通过消融研究,我们发现接收者的技术性质对这一效应有贡献,从而降低了模型对隐私的谨慎程度。为了进一步探索这一点,我们引入了注意力抑制假说,该假说认为安全对齐的注意力头在与智能体交互期间变得不活跃。我们通过比较222个敏感场景中面向人类和面向智能体的提示来定量评估这一点。从3,464次交互中得出的结果表明,将接收者描述为AI智能体会使PII泄露增加高达23个百分点。在Llama-3.1-8B-Instruct上的初步实验证实了这一点:停用一个安全注意力头会引发泄露,而重新激活它则恢复隐私保护。我们考虑了这对开发安全多智能体系统的影响。

英文摘要

Large Language Models (LLMs) alter their privacy behavior based on the perceived identity of their interlocutor. While safety mechanisms typically prevent LLMs from releasing Personally Identifiable Information (PII) to human users, these models tend to reveal more sensitive data when addressing another AI agent. We refer to this as the \textbf{Interlocutor Effect}. Through an ablation study, we find evidence that the technical nature of the recipient contributes to this effect, thereby diminishing the model's caution regarding privacy. To explore this further, we introduce the Attention Suppression Hypothesis, which posits that safety-aligned attention heads become inactive during interactions with agents. We assess this quantitatively by comparing human-directed and agent-directed prompts in 222 sensitive scenarios. Our findings, drawn from 3,464 interactions, indicate that portraying the recipient as an AI agent elevates PII leakage by up to 23 percentage points. Initial experiments on Llama-3.1-8B-Instruct corroborate this: deactivating one safety head induces leakage, whereas reactivating it reinstates privacy safeguards. We consider the implications for developing secure multi-agent systems.

2606.09843 2026-06-10 cs.HC cs.AI cs.CL 新提交

An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models

一个原生LLM的心理测量工具不能预测LLM行为:来自25个模型的证据

Juan Manuel Contreras

发表机构 * Independent Researcher(独立研究员)

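AI summary: Builds a psychometric instrument from LLM behaviour via exploratory factor analysis and finds that LLM self-reports do not predict observed behaviour, exposing a confound in which self-report items and LLM judges share variance that human raters do not.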

AI中文摘要

大型语言模型(LLM)在人格量表上产生稳定的自我报告,但这些自我报告并不能预测观察到的行为。这一差距是反映了LLM与人类特质结构之间的不匹配,还是LLM自我报告本身的更深层属性,此前尚未解决。我们构建了第一个心理测量工具,其结构通过探索性因子分析(EFA)从LLM行为能力中自下而上地推导出来。我们对来自17个模型家族的25个LLM施测了300个项目(240个直接李克特+60个基于场景),涵盖12个候选行为维度,每个项目施测30次。EFA产生了一个5因子结构——响应性、顺从性、大胆性、谨慎性和健谈性——具有极好的分半信度(所有Tucker φ ≥ .957)和内部一致性(所有α ≥ .930)。为了测试预测效度,我们收集了由151名人类评分者和一个三人LLM评审团评分的2500个开放式行为样本。人类和评审团评分一致(r̄ = .51),但两者均不跟踪自我报告:自我报告-人类r̄ = -.01,自我报告-评审团r̄ = .13,且没有因子水平的自我报告-人类置信区间排除零。在响应性上,自我报告与LLM评审团相关(r = .53),但与人类不相关(r = .04),尽管人类和评审团一致(r = .59)——这表明自我报告项目和LLM评审团共享人类观察者未捕捉到的方差,这是一个在集成内部可靠性检查中不可见的混淆因素。我们将该工具作为诊断探针发布,用于检测对齐塑造的自我描述,并作为LLM作为评审团流程的具体风险因素。

英文摘要

Large language models (LLMs) produce stable self-reports on personality inventories, but these self-reports do not predict observed behavior. Whether this gap reflects a mismatch between LLMs and human trait constructs, or a deeper property of LLM self-report itself, has been unresolved. We constructed the first psychometric instrument whose constructs are derived bottom-up from LLM behavioral affordances via exploratory factor analysis (EFA). We administered 300 items (240 direct Likert + 60 scenario-based) spanning 12 candidate behavioral dimensions to 25 LLMs across 17 model families, each item administered 30 times. EFA yielded a 5-factor structure (Responsiveness, Deference, Boldness, Guardedness, and Verbosity) with excellent split-half replicability (all Tucker $\phi \geq .957$) and internal consistency (all $\alpha \geq .930$). To test predictive validity, we collected 2,500 open-ended behavioral samples rated by 151 human raters and a three-judge LLM ensemble. Human and judge ratings agreed ($\bar{r} = .51$), but neither tracked self-report: self-report vs. human $\bar{r} = -.01$, self-report vs. judge $\bar{r} = .13$, with no factor-level self-report vs. human CI excluding zero. On Responsiveness, self-report correlated with LLM judges ($r = .53$) but not humans ($r = .04$), even though humans and judges agreed ($r = .59$), indicating that self-report items and LLM judges share variance that human observers do not, a confound invisible to within-ensemble reliability checks. We release the instrument as a diagnostic probe for alignment-shaped self-description and a concrete risk factor for LLM-as-judge pipelines.
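For readers unfamiliar with the two reliability statistics quoted above, the following sketch computes Tucker's congruence coefficient between two loading vectors and Cronbach's alpha for a small item set, using their standard textbook definitions. The toy numbers are invented and have no connection to the paper's data.

```python
import math
from statistics import variance


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return num / den


def cronbach_alpha(items):
    """Cronbach's alpha; items[i][j] is the score of respondent j on item i."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Invented toy data: loadings from two split halves, and three items answered four times.
half_a = [0.80, 0.70, 0.10]
half_b = [0.78, 0.72, 0.12]
items = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]

print(round(tucker_phi(half_a, half_b), 3), round(cronbach_alpha(items), 3))
```

Both statistics only describe how consistent the self-report data are with themselves. The abstract's point is that this internal consistency did not carry over to behaviour as rated by humans.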

2606.09842 2026-06-10 cs.HC cs.AI cs.CV 新提交

Integrated Real-Time Motion Tracking and AI Analysis for Athletic Performance Optimization

集成实时运动跟踪与AI分析以优化运动表现

Parth Agrawal, Ronit, Sagar Kumar, Aashish Bhambri

发表机构 * Department of Computer Science(计算机科学系) Department of Computer Science and Engineering(计算机科学与工程系) Chandigarh University(昌迪加尔大学)

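AI summary: Surveys real-time human pose estimation methods for sports analysis and develops a lightweight prototype that uses the MediaPipe framework to give real-time feedback for improving athletic performance.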

Comments 6 pages, 10 figures, 2 tables, IC2E3-2026 conference

AI中文摘要

在真实世界环境中应用人体姿态估计(HPE)仍然是一项具有挑战性的任务。本文探讨并综述了实时HPE方法及其在个体运动分析中的局限性,同时开发了一个实用的轻量级原型用于真实世界的测试和使用。从传统的基于标记的运动捕捉系统发展到现代可访问且适应性强的无标记深度学习方法,本文综述了平衡精度和效率的基础架构。我们还比较了算法框架(如自顶向下、自底向上、单阶段方法等)在实际部署指标上的表现,包括推理延迟、帧率、平均关节位置误差和时间抖动,以指导运动应用的模型选择过程。作为我们的主要贡献,我们提出了一个模块化的轻量级软件原型,该原型使用MediaPipe HPE框架结合多种特定于运动的逻辑,为非专业用户提供实时洞察和基于AI的反馈。我们以最小的计算资源推导出运动洞察并提供反馈,同时展示了性能和可靠性指标。最后,我们提出了其他未来研究方向,如结合传感器和AR/VR。这项工作面向研究人员、工程师、运动科学家等,既作为技术资源,也作为实现类似或改进的实时HPE分析系统以增强运动表现或其他目的的有效蓝图。

英文摘要

Applying Human Pose Estimation (HPE) in real-world environments remains a challenging task. This paper surveys real-time HPE approaches and their limitations in sports analysis for individuals, and develops a practical lightweight prototype for real-world testing and usage. Tracing the evolution from older marker-based motion capture systems to modern, accessible, and adaptable markerless deep learning approaches, the survey covers the foundational architectures that balance precision and efficiency. We also compare algorithmic frameworks (top-down, bottom-up, one-stage approaches, etc.) on practical deployment metrics such as inference latency, frame rate, mean per-joint position error, and temporal jitter to guide model selection for sports applications. As our main contribution, we propose a modular, lightweight software prototype that uses the MediaPipe HPE framework with exercise-specific logic to deliver real-time insights and AI-based feedback for non-expert users. We derive sports insights and provide feedback with minimal computational resources, and report performance and reliability metrics. Finally, we suggest future research directions such as combining sensors and AR/VR. This work serves researchers, engineers, sport scientists, and others as both a technical resource and a blueprint for implementing a similar or improved real-time HPE analysis system for athletic performance enhancement or other purposes.
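Exercise-specific logic on top of a pose estimator often reduces to joint angles computed from three keypoints. The sketch below is a hypothetical illustration and not the prototype's code: the function names, the 100-degree threshold, and the sample coordinates are assumptions, and no MediaPipe calls are shown. It expects 2D keypoints that are distinct points.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments from b to a and from b to c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


def squat_feedback(hip, knee, ankle, depth_threshold=100.0):
    """Hypothetical rule: the squat is deep enough once the knee angle drops below the threshold."""
    angle = joint_angle(hip, knee, ankle)
    return "good depth" if angle < depth_threshold else "go lower"


# Made-up normalised image coordinates (x, y) for a single frame.
print(squat_feedback(hip=(0.40, 0.60), knee=(0.55, 0.62), ankle=(0.52, 0.85)))
```

Because the rule needs only three keypoints per frame, this kind of logic adds very little compute on top of pose estimation, in line with the abstract's emphasis on minimal computational resources.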

2606.09839 2026-06-10 cs.HC cs.AI cs.CY 新提交

Aesthetic Perspectives in Information Systems Research: A Hermeneutic Analysis

信息系统研究中的美学视角:一项诠释学分析

Angelina Chen, Rick Sullivan, Raffaele F Ciriello

发表机构 * University of Sydney(悉尼大学) HEC Montréal(蒙特利尔HEC)

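AI summary: Through a hermeneutic literature analysis, surfaces four implicit aesthetic perspectives in Information Systems research (imitation, sensory experience, world-making, and political doing), shows how they form epistemic infrastructure that shapes research questions and methods, and illustrates their value with algorithmic management and digitally mediated intimacy.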

Comments Thirty-Fourth European Conference on Information Systems (ECIS 2026), Milan, Italy

AI中文摘要

隐含的美学视角可能如何影响信息系统(IS)学术界认可(或不认可)哪些研究值得研究?在这项诠释学文献分析中,我们揭示了支撑IS研究的基础性美学假设。我们识别出四种视角(作为模仿的美学、作为感官体验的美学、作为世界构建的美学、作为政治行动的美学),它们指导IS学者如何感知和欣赏社会技术现象。这些视角影响什么成为可识别的合法研究,什么仍未被看见。通过明确美学假设,我们展示了它们如何构成限制探究视野的认知基础设施。我们将这一框架应用于算法管理和数字中介亲密性,揭示了替代视角如何开辟新的研究问题,同时暴露了主流框架所忽视的维度。本分析强调了美学哲学对IS文献的重要性,为阐述美学视角如何塑造理论化、方法和贡献提供了词汇。

英文摘要

How might implicit aesthetic perspectives shape what Information Systems (IS) scholarship recognises as worthy of study (or not)? In this hermeneutic literature analysis, we surface foundational aesthetic assumptions underpinning IS research. We identify four perspectives (aesthetics as imitation, sensory experience, world-making, and political doing) that guide how IS scholars perceive and appreciate sociotechnical phenomena. These perspectives influence what becomes recognisable as legitimate research and what remains unseen. By making aesthetic assumptions explicit, we show how they form epistemic infrastructure that conditions horizons of inquiry. We apply this framework to algorithmic management and digitally mediated intimacy, revealing how alternative perspectives open new research questions whilst exposing dimensions that dominant framings overlook. This analysis foregrounds the importance of aesthetic philosophy to IS literature, offering a vocabulary for articulating how aesthetic perspectives shape theorising, method, and contribution.

2606.09837 2026-06-10 cs.HC cs.AI 新提交

Self-EmoQ: Plutchik-Guided Value-based Planning to Drive Streaming Emotional TTS

Self-EmoQ: 基于Plutchik引导的价值规划驱动流式情感TTS

Yue Zhao, Hongyan Li, Yong Chen, Luo Ji

发表机构 * Geely AI Lab(地平线人工智能实验室)

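AI summary: Proposes an emotion-planning framework in which an LLM module trained with reinforcement learning determines the emotion before text generation to drive streaming TTS, using a hybrid reward informed by Plutchik's theory of emotions. Experiments show gains over baselines in emotion determination and response quality.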

Comments Accepted to ACL 2026 Findings

AI中文摘要

情感交互对于对话AI越来越重要,但当前系统缺乏自我情感确定机制来驱动流式文本到语音(TTS)合成。我们提出一个情感规划框架,在文本生成之前确定情感,以流式方式为下游情感TTS提供基础。该框架通过一个即插即用的LLM模块实现,该模块从预训练LLM初始化,并通过强化学习(RL)训练,以情感作为动作。采用混合奖励,结合模仿信号和理论驱动评分,其中采用了Plutchik情感轮理论。通过在DailyDialog、EmoryNLP、IEMOCAP和MELD上的实验,我们的方法在情感确定和响应质量上均优于提示和微调基线。我们最终实现了一个完整的流式管道用于实时部署,语音质量证实了框架的情感对齐、上下文连贯性和表达流畅性。代码、案例和演示可在该https URL获取。

英文摘要

Emotional interaction is increasingly crucial for conversational AI, yet current systems lack a self-emotion determination mechanism to drive streaming text-to-speech (TTS) synthesis. We propose an emotion-planning framework that determines the emotion prior to text generation, grounding the downstream emotional TTS in a streaming manner. The framework is implemented as a plug-and-play LLM module, initialized from pretrained LLMs and trained by reinforcement learning (RL) with emotions as the actions. A hybrid reward combines imitation signals with theory-driven scoring based on Plutchik's wheel of emotions. In experiments on DailyDialog, EmoryNLP, IEMOCAP, and MELD, our method outperforms prompting and finetuning baselines on both emotion determination and response quality. Finally, we implement an entire streaming pipeline for real-time deployment, with the speech quality confirming the framework's emotional alignment, contextual coherence, and expressive fluency. Code, cases, and demos are available at https://sixingdeguo.github.io/EmoQ-page/.
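The abstract does not give the form of the hybrid reward. Purely as an illustration, one way to combine an imitation signal with a score derived from Plutchik's wheel is to reward predictions that land close to the reference emotion on the wheel. The function names, the equal weighting, and the linear distance-to-score mapping below are all assumptions. Only the ordering of the eight primary emotions comes from Plutchik's model.

```python
# Plutchik's eight primary emotions in wheel order; opposites sit four steps apart.
WHEEL = ["joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation"]


def wheel_distance(a, b):
    """Number of steps between two primary emotions around the wheel (0 to 4)."""
    diff = abs(WHEEL.index(a) - WHEEL.index(b))
    return min(diff, len(WHEEL) - diff)


def hybrid_reward(predicted, reference, weight=0.5):
    """Blend an exact-match imitation signal with a wheel-proximity score, both in [0, 1]."""
    imitation = 1.0 if predicted == reference else 0.0
    proximity = 1.0 - wheel_distance(predicted, reference) / 4
    return weight * imitation + (1 - weight) * proximity


for guess in ("joy", "trust", "sadness"):
    print(guess, hybrid_reward(guess, "joy"))
```

A graded score of this kind gives partial credit for near misses, which an exact-match signal alone cannot do.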

2606.09836 2026-06-10 cs.HC cs.RO 新提交

Equanimity in HRI: Applying Calm Technology Principles to Human-Robot Interaction

人机交互中的平和心态:将平静技术原则应用于人机交互

Barbara Sienkiewicz, Bipin Indurkhya

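Institutions * Cognitive Science Department, Jagiellonian University

AI summary This paper explores integrating Calm Technology into human-robot interaction and offers design guidelines for household assistive robots that foster calm, non-intrusive interaction, emphasizing responsible robotics and ethical considerations.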

Comments Conference pre-print. https://doi.org/10.1007/978-981-96-3525-2_41

详情
AI中文摘要

本文探讨如何将平静技术整合到人机交互中,特别关注家庭环境。它提供了全面的指南,用于设计优先考虑并增强人类对平和心态需求的辅助机器人,确保交互是平静、非侵入性和和谐的。本文审视了技术在当代生活中的广泛影响及其对认知能力的影响,强调了未来技术发展中负责任机器人学和伦理考量的必要性。通过将平静技术原则应用于家用机器人,本文提供了在家庭辅助机器人中应使用的具体示例和特征。目标是促进人类与机器人之间平衡、不引人注目的交互,特别是在家庭环境中,因为它是每个人生活中最私密的空间,为该领域的应用和进一步研究铺平道路。

英文摘要

This paper explores how Calm Technology can be integrated into Human-Robot Interaction (HRI), with a particular focus on the household environment. It offers comprehensive guidelines for designing assistive robots that prioritize and enhance the human need for equanimity, ensuring interactions are calm, non-intrusive, and harmonious. The paper examines the widespread influence of technology in contemporary life and its impact on cognitive capabilities, underscoring the need for responsible robotics and ethical considerations in future technological developments. By adapting Calm Technology principles to domestic robots, the article provides concrete examples and features that should be employed in household assistive robotics. The goal is to foster a balanced, unobtrusive interaction between humans and robots, especially in the home environment, as it is the most private environment in everyone's life, paving the way for applications and further research in the field.

2606.09833 2026-06-10 cs.HC cs.AI cs.CY 新提交

CollabSkill: Evaluating Human-Agent Collaboration On Real-World Tasks

CollabSkill: 评估真实世界任务中的人机协作

Yijia Shao, Zora Zhiruo Wang, Neel Ahuja, Yicheng Wang, Bowen Liu, Diyi Yang

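Institutions * Stanford University; Carnegie Mellon University; University of Rochester; Individual Researcher

AI summary Proposes the CollabSkill framework, which pairs real workers with AI agents on occupational tasks and uses a Bayesian skill rating system to quantify human and agent contributions; it finds that Claude Code ranks first and that practical experience is the primary driver of collaboration skill.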

Comments 11 pages of main paper, preprint (under review)

详情
AI中文摘要

AI代理正在重塑工作空间,导致人类工作方式的剧烈变化。尽管人机协作在保持人类能动性和产生经济价值方面具有巨大潜力,但由于收集真实人类数据和考虑人类间差异的困难,这一范式在职业任务评估中仍然基本缺失。我们引入了CollabSkill,一个用于评估真实世界职业任务中人机协作的框架。CollabSkill将真实人类工人与AI代理配对,执行与其职业背景匹配的任务,收集能够捕捉经济价值任务的复杂性和真实工人使用模式的数据。为了考虑人类间差异,CollabSkill采用贝叶斯技能评级系统来分离并量化人类和AI代理的技能贡献。基于来自93名人类工人的386个工作会话中的1500多个提示,我们的分析在两个层面产生了见解:在代理方面,CollabSkill上的排名与现有完全自主基准(其中Codex领先)有显著差异,Claude Code排名第一;在人类方面,CollabSkill揭示了实践经验是协作技能的主要驱动力,动手协作有意义地改变了工人的AI素养。总之,我们希望CollabSkill能使社区投资于系统评估人机协作,并推动旨在构建真正增强人类工人的AI代理的开发工作。

英文摘要

AI agents are reshaping the workspace, leading to drastic changes in how humans work. Despite the considerable potential of human-agent collaboration both in preserving human agency and generating economic value, this paradigm remains largely absent from occupational task evaluation, hindered by the difficulty of gathering real human data and accounting for inter-human variability. We introduce CollabSkill, a framework for evaluating human-agent collaboration on real-world occupational tasks. CollabSkill pairs real human workers with AI agents on tasks matched to their occupational background, collecting data that capture the complexity of economically valuable tasks and the usage patterns of real workers. To account for inter-human variability, CollabSkill employs a Bayesian skill rating system to disentangle and quantify the skill contributions of both humans and AI agents. Drawing on over 1,500 prompts from 386 working sessions contributed by 93 human workers, our analysis yields insights on two fronts: on the agent side, rankings on CollabSkill diverge meaningfully from those of existing fully autonomous benchmarks where Codex leads, with Claude Code ranking first; on the human side, CollabSkill reveals that practical experience emerges as the primary driver of collaboration skill, with hands-on collaboration meaningfully shifting workers' AI literacy. Together, we hope CollabSkill enables the community to invest in systematic evaluation of human-agent collaboration and spurs development efforts aimed at building AI agents that genuinely augment human workers.

2606.09832 2026-06-10 cs.HC cs.AI 新提交

Agentic Social Affordance Framework (ASAF): Agent Identity Design as a Collaboration Interface in Multi-Agent Systems

智能体社会可供性框架 (ASAF):多智能体系统中作为协作接口的智能体身份设计

Meng-Han Lee

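Institutions * Independent Researcher

AI summary Proposes ASAF, a framework that extends Social Affordance theory to multi-agent AI systems and treats agent identity design as a collaboration interface that shapes the quality of human-agent collaboration through three mechanisms: Identity Signaling, Behavioral Priming, and Collaborative Governance.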

Comments 24 pages, 2 figures, 1 table. Introduces ASAF with falsifiable hypotheses and proposed experimental designs for testing agent identity design effects in multi-agent Human-in-the-Loop systems, grounded in a real-world 38-agent deployment

详情
AI中文摘要

随着AI系统从单一对话智能体演变为复杂的多智能体架构,一个关键的设计维度被忽视了:个体智能体的社会身份如何塑造人类在协作中的行为。本文介绍了智能体社会可供性框架(ASAF),这是一个将社会可供性理论扩展到多智能体AI系统背景的理论框架。我们提出,智能体身份设计不仅作为用户界面惯例,而且作为协作接口——构建用户如何感知、接近和与每个智能体互动,从而影响人机协作结果的质量。具体来说,社会可供性层构成了一个独立于工程编排的设计维度:两者代表不同的决策空间,不能相互推导。ASAF包含三个机制:身份信号、行为启动和协作治理,并通过四层身份信号保真度谱和个体差异调节变量(拟人化与工具化认知风格)指定其边界条件。我们将ASAF与现有可供性理论和CASA范式相联系,阐明ASAF的多智能体、拓扑级预测在哪些方面超出了二元框架的解释范围。我们讨论了多智能体系统设计的启示,并概述了未来实证验证的方向,包括用于测试设计空间正交性的因子设计。

英文摘要

As AI systems evolve from single conversational agents to complex multi-agent architectures, a critical design dimension has been overlooked: how the social identity of individual agents shapes human behavior within the collaboration. This paper introduces the Agentic Social Affordance Framework (ASAF), a theoretical framework that extends Social Affordance theory into the context of multi-agent AI systems. We propose that agent identity design functions not merely as a user interface convention, but as a collaboration interface -- structuring how users perceive, approach, and engage with each agent, and thereby influencing the quality of Human-Agent collaboration outcomes. Specifically, the social affordance layer constitutes an independent design dimension orthogonal to engineering orchestration: the two represent distinct decision spaces that cannot be derived from each other. ASAF comprises three mechanisms: Identity Signaling, Behavioral Priming, and Collaborative Governance, and specifies their boundary conditions through a four-tier Identity Signal Fidelity Spectrum and an individual-difference moderating variable (anthropomorphizing vs. instrumentalizing cognitive style). We situate ASAF in relation to existing affordance theory and the CASA paradigm, delineating where ASAF's multi-agent, topology-level predictions exceed the explanatory scope of dyadic frameworks. We discuss implications for multi-agent system design and outline directions for future empirical validation, including a factorial design for testing design-space orthogonality.

2606.09831 2026-06-10 cs.HC cs.AI 新提交

AI-Driven Analytics of Team-Teaching Talk: Acoustic Patterns across Experience, Cohorts and the Learning Design

AI驱动的团队教学对话分析:跨经验、学生群体和学习设计的声学模式

Yuchen Liu, Roberto Martinez-Maldonado, Riordan Alfredo, Paola Mejia-Domenzain, Dwi Rahayu, Sadia Nawaz

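Institutions * Monash University; EPFL

AI summary Presents an AI-based speech processing approach for analysing classroom talk in team teaching and finds greater loudness variation among experienced teachers, in undergraduate classes, and in collaborative learning tasks, suggesting that teachers modulate volume more often to foreground key information and support interaction.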

Comments Accepted at AIED 2026 (International Conference on Artificial Intelligence in Education), 14 pages, 4 figures

详情
AI中文摘要

随着课堂规模的扩大,团队教学越来越多地被用于整合多位教师的专业知识和教学视角。然而,关于团队教学在实践中如何展开的实证理解仍然有限,特别是在教师贡献随经验水平、学生群体和学习任务设计差异方面。先前对团队教学的研究主要依赖于回顾性自我报告或小规模观察,对团队教学实施的微观过程提供了有限的见解。教师谈话为这些过程提供了一个可扩展的视角。虽然个体教学情境中的研究表明,语音的声学特征(如音质、语调和响度)可以影响学生学习,但来自团队教学环境的证据仍然稀缺。此外,通过手动观察或转录捕捉这些特征在团队教学课堂中尤其具有挑战性,因为多位教师在长时间和多空间位置上发言,限制了可扩展性,除非自动化。基于空间教学法理论和团队教学研究,本文提出了一种基于AI的语音处理方法,用于分析团队教学环境中的课堂谈话。我们分析了涉及12位教师的36个录制的本科生和研究生课程。编码了空间教学行为并提取了声学特征,以考察教师经验、学生群体和学习任务设计之间的差异。结果揭示了系统性差异,最显著的是在响度动态方面:高经验教师、本科生班级和协作学习任务表现出更大的响度变化,表明更频繁地调节音量以突出关键信息并支持课堂互动和参与。

英文摘要

As classroom cohorts expand, team teaching is increasingly used to integrate the expertise and pedagogical perspectives of multiple teachers. Yet, there is limited empirical understanding of how team teaching unfolds in practice, particularly regarding differences in teachers' contributions across experience levels, student cohorts, and learning task design. Prior research on team teaching has largely relied on retrospective self-reports or small-scale observations, offering limited insight into the micro-level processes through which team teaching is enacted. Teacher talk offers a scalable lens on these processes. While research in individual teaching contexts shows that acoustic features of speech (e.g., voice quality, intonation, and loudness) can shape student learning, evidence from team-teaching settings remains scarce. Moreover, capturing such features through manual observation or transcription is especially challenging in team-teaching classrooms, where multiple teachers speak across extended sessions and spatial locations, limiting scalability without automation. Grounded in spatial pedagogy theory and team-teaching research, this paper presents an AI-based speech processing approach to analyse classroom talk in team-teaching settings. We analysed 36 recorded undergraduate and postgraduate sessions involving 12 teachers. Spatial pedagogy behaviours were coded and acoustic features extracted to examine variation across teachers' experience, student cohorts, and the learning task design. The results reveal systematic differences, most notably in loudness dynamics: high-experience teachers, undergraduate classes and collaborative learning tasks exhibited greater loudness variation, suggesting more frequent modulation of volume to foreground key information and support classroom interaction and engagement.

2606.10916 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH 新提交

Range Penalization: Theoretical Insights with Applications in Federated Learning

范围惩罚:理论洞见及其在联邦学习中的应用

Yiyuan She, Zhaojun Hu, Yifan Sun

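Institutions * University of California, Berkeley

AI summary Proposes range regularization, which induces cross-client regularity through polar clustering, and develops new proof techniques for nonasymptotic statistical accuracy and pattern recovery, along with a fast optimization algorithm that exploits local strong convexity.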

详情
AI中文摘要

本文针对具有线性系统组件的联邦学习引入范围正则化,以提高统计精度并诱导跨客户端正则性,从而有利于量化、编码和资源效率。我们的方法识别不同客户端之间共享权重的特征,并将个性化特征的权重自适应地聚类到极值,这一过程称为极值聚类。由于正则化子的半范数性质和不可分解性,相关估计量的理论分析面临重大挑战。我们为统计精度和忠实模式恢复的非渐近分析开发了新的证明技术。此外,提出了一种利用不同程度局部强凸性的快速优化算法,以降低迭代复杂度。实验支持了所提方法的有效性和效率。

英文摘要

This paper introduces range regularization for federated learning with linear systematic components to enhance statistical accuracy and induce cross-client regularity conducive to quantization, coding, and resource efficiency. Our approach identifies features with shared weights across different clients and adaptively clusters the weights of personalized features at extreme values, a process we refer to as polar clustering. Theoretical analysis of the associated estimators poses significant challenges due to the seminorm nature and non-decomposability of the regularizer. We develop new proof techniques for the nonasymptotic analysis of statistical accuracy and faithful pattern recovery. Moreover, a fast optimization algorithm that leverages varying degrees of local strong convexity is proposed to reduce iteration complexity. Experiments support the efficacy and efficiency of the proposed approach.
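To make the idea of a range penalty concrete, here is a minimal sketch that sums, over features, the spread (maximum minus minimum) of that feature's weight across clients. The abstract does not state the regularizer's formula, so this particular form, the function name, and the `lam` parameter are assumptions chosen to match the description: the penalty is zero for a feature whose weight is shared by all clients, and it is a seminorm because it ignores a common shift.

```python
def range_penalty(weights, lam=1.0):
    """Assumed range penalty: lam * sum_j (max_k w[k][j] - min_k w[k][j]).

    weights[k][j] is the weight of feature j on client k.
    A feature shared by all clients contributes zero.
    """
    n_features = len(weights[0])
    total = 0.0
    for j in range(n_features):
        column = [client[j] for client in weights]  # feature j across clients
        total += max(column) - min(column)
    return lam * total
```

Because only the largest and smallest client weights of each feature enter the penalty, intermediate weights can move toward either extreme at no cost, which is one way to read the "polar clustering" behaviour described in the abstract.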

2606.10295 2026-06-10 stat.ML cs.LG math.ST stat.TH 新提交

$k$-Nearest Neighbors in Gromov--Wasserstein Space

Gromov--Wasserstein空间中的$k$-最近邻

Kaitlyn Hohmeier, Nicolas Fraiman, Caroline Moosmueller

Institutions * University of North Carolina at Chapel Hill, Department of Mathematics; University of North Carolina at Chapel Hill, Department of Statistics and Operations Research

AI summary Implements $k$-nearest neighbors classification under the Gromov--Wasserstein distance, proves universal consistency of the classifier on metric measure spaces and on graphs, and validates its effectiveness experimentally.

详情
AI中文摘要

Gromov--Wasserstein (GW) 距离为比较度量测度空间提供了一个框架,无论其底层结构或几何形状如何。对于基于网络的数据,它能够直接比较具有不同节点数量的图,无需嵌入或其他抽象。此外,通过GW的变体——融合Gromov--Wasserstein (fGW),还可以在图形结构之外结合节点特征。在这项工作中,我们使用GW和fGW距离实现了$k$-最近邻 ($k$-NN) 分类。我们证明了在具有有限支撑和均匀概率测度的度量测度空间等价类空间上,GW-$k$-NN分类器的普适一致性。通过将图视为具有成对距离度量和节点上均匀概率测度的有限支撑度量测度空间,我们获得了图空间上GW-$k$-NN的普适一致性。类似地,对于fGW-$k$-NN,我们证明了在由具有有限支撑和均匀概率测度的度量测度空间以及到欧几里得空间的特征映射组成的结构化对象的弱同构类空间上的普适一致性,从而建立了节点属性图空间上的普适一致性。我们的数值实验表明,GW-$k$-NN和fGW-$k$-NN在多个图数据集上始终表现良好,这表明诸如$k$-NN之类的度量分类器在GW框架中效果良好。

英文摘要

The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of nodes, without requiring an embedding or other abstraction. Furthermore, through a variant of GW known as fused Gromov--Wasserstein (fGW), it is also possible to incorporate node features in addition to graph structure. In this work, we implement $k$-nearest neighbors ($k$-NN) classification using the GW and fGW distances. We prove the universal consistency of the GW-$k$-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-$k$-NN for the space of graphs. Likewise for fGW-$k$-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space, thus establishing universal consistency on the space of node-attributed graphs. Our numerical experiments show that GW-$k$-NN and fGW-$k$-NN consistently perform well across multiple graph datasets, suggesting that metric classifiers such as $k$-NN work well in the GW framework.
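Since $k$-NN needs only pairwise distances, the classification step can be sketched independently of how the GW or fGW distances are obtained. The sketch below assumes those distances have already been computed elsewhere (for example with an optimal transport library) and passed in as a plain list; the function name and tie-breaking rule are illustrative choices, not taken from the paper.

```python
from collections import Counter


def knn_predict(dist_to_train, train_labels, k=3):
    """Majority vote among the k training objects closest to a query.

    dist_to_train[i] is the precomputed (e.g. GW or fGW) distance from the
    query to training object i; train_labels[i] is that object's class.
    Ties in the vote go to the label seen first among the nearest neighbours.
    """
    order = sorted(range(len(train_labels)), key=lambda i: dist_to_train[i])
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Working from a precomputed distance list is what lets the same classifier compare graphs with different numbers of nodes: no embedding into a common vector space is needed.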

2606.10119 2026-06-10 stat.ML cs.LG math.ST stat.TH 新提交

Convergence Rates for Neural-Network Estimation with Current-Status Data

当前状态数据下神经网络估计的收敛速度

Yuan Wu, Tianhui Zhou

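Institutions * Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA

AI summary For current-status data, proposes a nonparametric neural-network sieve maximum likelihood estimator and establishes an explicit convergence rate under Hölder smoothness assumptions by combining ReLU network approximation theory with empirical-process arguments.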

详情
AI中文摘要

当前状态数据出现在事件时间仅通过一个指示变量(是否在检查时间之前发生)被观测到时。本文研究了事件时间条件累积分布函数的非参数神经网络筛最大似然估计器。在Hölder光滑假设下,我们通过结合整流线性单元神经网络的逼近理论与经验过程论证,建立了显式收敛速度。这一结果为当前状态观测下的神经网络估计及后续推断提供了理论支持。

英文摘要

Current-status data arise when an event time is observed only through an indicator of whether it occurred before an examination time. This paper studies a nonparametric neural-network sieve maximum likelihood estimator of the conditional cumulative distribution function of the event time. Under Hölder smoothness assumptions, we establish an explicit convergence rate by combining approximation theory for rectified linear unit neural networks with empirical-process arguments. This result provides theoretical support for neural-network estimation and subsequent inference under current-status observation.
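For readers unfamiliar with current-status data, the objective such an estimator maximizes can be sketched directly. With examination time $C$, indicator $\delta = 1\{T \le C\}$, and a candidate conditional CDF $F(c \mid x)$, the standard log-likelihood contribution of one observation is $\delta \log F(C \mid X) + (1-\delta)\log(1 - F(C \mid X))$. The sketch below evaluates the averaged negative log-likelihood from CDF values that some model (a neural network in the paper) is assumed to have produced already; the function name and the clipping constant `eps` are illustrative.

```python
import math


def current_status_nll(cdf_values, indicators, eps=1e-12):
    """Average negative log-likelihood for current-status observations.

    cdf_values[i] is the model's F(C_i | X_i); indicators[i] is 1 if the
    event occurred before the examination time C_i, else 0.
    """
    total = 0.0
    for f, d in zip(cdf_values, indicators):
        f = min(max(f, eps), 1.0 - eps)  # keep the logarithms finite
        total += d * math.log(f) + (1 - d) * math.log(1.0 - f)
    return -total / len(indicators)
```

This is the same form as a binary cross-entropy loss, which is why the estimator can be trained with standard neural-network tooling even though the event time itself is never observed.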

2606.10780 2026-06-10 cs.IT cs.CR cs.LG math.IT 新提交

Secure Aggregation with Top-K Sparsification in Decentralized Federated Learning

去中心化联邦学习中的Top-K稀疏化安全聚合

Hengxuan Tang, Jinbao Zhu, Xiaohu Tang

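Institutions * Southwest Jiaotong University

AI summary To address limited bandwidth and unreliable nodes in decentralized federated learning, proposes an information-theoretic secure aggregation scheme with Top-K gradient sparsification that moves dimension-dependent overhead to an offline phase and protects gradients with random masks and permutations, preserving accuracy at a 1% sparsification rate while substantially reducing communication cost.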

Comments 6 pages, 1 figure, accepted to IEEE ISIT 2026

详情
AI中文摘要

安全聚合是联邦学习中缓解梯度泄露的关键组件,但其通信成本通常随梯度维度扩展。这对于大型模型变得难以承受,在带宽有限和节点不可靠的去中心化联邦学习中更为突出。Top-K梯度稀疏化是一种有效的通信减少方法,仅传输完整梯度的少数条目,同时保持有竞争力的模型精度。然而,每个用户选择的Top-K条目不可预测且因用户而异,这对高效的稀疏安全聚合构成了挑战。本文研究了去中心化联邦学习中存在用户退出和用户合谋时的信息论安全聚合与Top-K稀疏化。我们提出了一种通信高效的稀疏安全聚合方案,将维度相关的开销转移到离线阶段,并使用随机掩码和排列保护私有梯度。实验结果表明,即使仅使用1%的梯度稀疏化,我们的方案也能保持与全梯度聚合相当的精度,同时大幅降低通信成本。

英文摘要

Secure aggregation is a vital component for mitigating gradient leakage in federated learning, but its communication cost conventionally scales with the gradient dimension. This becomes prohibitive for large models and even more pronounced in decentralized federated learning with limited bandwidth and unreliable nodes. Top-K gradient sparsification is an effective approach to reduce communication by transmitting only a few entries of the full gradient, while maintaining competitive model accuracy. Nevertheless, the top-K entries selected by each user are unpredictable and vary across users, which poses a challenge for efficient sparse secure aggregation. This paper studies information-theoretic secure aggregation with top-K sparsification in decentralized federated learning under user dropouts and user collusion. We propose a communication-efficient sparse secure aggregation scheme that offloads dimension-dependent overhead to an offline phase and protects private gradients using random masks and permutations. Experimental results demonstrate that our scheme preserves accuracy comparable to full-gradient aggregation even with only 1% gradient sparsification, while substantially reducing the communication cost.
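The sparsification step on its own is simple to sketch: each user keeps only the K gradient entries of largest magnitude and sends them with their indices. The sketch below shows only this plain top-K step; it does not implement the paper's masking, permutation, or dropout handling, and the function names are illustrative.

```python
def top_k_sparsify(gradient, k):
    """Keep the k entries of largest magnitude as an {index: value} map."""
    largest = sorted(range(len(gradient)),
                     key=lambda i: abs(gradient[i]), reverse=True)[:k]
    return {i: gradient[i] for i in sorted(largest)}


def densify(sparse, dim):
    """Rebuild a dense vector of length dim, with zeros for dropped entries."""
    dense = [0.0] * dim
    for i, value in sparse.items():
        dense[i] = value
    return dense
```

Because the selected index sets differ from user to user, a naive secure sum over the sparse vectors would have to hide which positions each user chose, which is the difficulty the abstract points to.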

2606.10601 2026-06-10 math.NA cs.AI cs.LG cs.NA 新提交

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

Dmsh:一种用于全四边形网格生成的多智能体强化学习框架

Anirudh Kalyan, Cosmin Anitescu, Xiaoying Zhuang, Timon Rabczuk, Somdatta Goswami, Sundararajan Natarajan

Institutions * Department of Mechanical Engineering, Indian Institute of Technology Madras; Institute of Continuum Mechanics, Leibniz Universität Hannover; Institute of Structural Mechanics, Bauhaus-Universität Weimar; Department of Civil and Systems Engineering, Johns Hopkins University

AI summary Proposes Dmsh, the first fully automated reinforcement learning pipeline for all-quadrilateral mesh generation, in which three coordinated agents handle topology simplification, geometric regularization, and mesh generation, using a parametric Soft Actor-Critic architecture and a curriculum learning strategy.

详情
AI中文摘要

为任意几何体生成高质量网格仍然是计算工程中的一个基本瓶颈,通常需要启发式调整和半手动工作流程。在本文中,我们介绍了Dmsh,这是第一个完全自动化的强化学习流水线,它将几何分解和四边形网格生成统一在一个基于学习的框架中。Dmsh通过三个协调的智能体分解问题,分别处理拓扑简化、几何正则化和网格生成。网格生成过程被建模为马尔可夫决策过程,并使用具有解耦评论家的参数化Soft Actor-Critic架构求解,从而能够高效探索混合离散-连续动作空间。课程学习策略确保了从简单域到高度复杂几何体的可扩展性,并抑制了种子方差。通过设计,递归分解使得子区域能够并行网格化,生成全局一致的全四边形网格,无需事后校正。在广泛的基准测试中,Dmsh在自动化程度、鲁棒性和网格质量方面始终优于现有方法,为基于学习的网格生成建立了新范式。

英文摘要

Generating high-quality meshes for arbitrary geometries remains a fundamental bottleneck in computational engineering, often demanding heuristic tuning and semi-manual workflows. In this paper, we introduce Dmsh, a first fully automated reinforcement learning pipeline that unifies geometric decomposition and quadrilateral mesh generation within a single learning-based framework. Dmsh decomposes the problem through three coordinated agents handling topology simplification, geometric regularization, and mesh generation. The meshing process is formulated as a Markov Decision Process and solved using a parametric Soft Actor-Critic architecture with decoupled critics, enabling efficient exploration of a hybrid discrete-continuous action space. A curriculum learning strategy ensures scalability from simple domains to highly complex geometries, suppressing seed variance. By design, the recursive decomposition enables parallel meshing of subregions, yielding globally conforming all-quadrilateral meshes without post hoc correction. Across a wide range of benchmarks, Dmsh consistently outperforms existing methods in automation, robustness, and mesh quality, establishing a new paradigm for learning-based mesh generation.

2606.10562 2026-06-10 math.OC cs.LG cs.NA math.NA 新提交

Accelerating SAV-based optimization via randomized low-rank Hessian approximation

基于随机低秩Hessian近似的加速SAV优化方法

Ryo Sagawa, Daisuke Furihata, Yuto Miyatake

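Institutions * Department of Pure and Applied Mathematics, Graduate School of Information Science and Technology, The University of Osaka

AI summary Proposes the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which accelerates convergence by introducing curvature information through a randomized low-rank Nyström approximation while preserving an unconditional modified energy dissipation law, and is substantially faster than conventional RSAV methods on ill-conditioned problems such as PINNs.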

Comments 25 pages, 4 figures

详情
AI中文摘要

我们提出了一种新的优化方法,即Nyström增强的松弛标量辅助变量方法(N-RSAV),它将曲率信息融入RSAV框架,以加速收敛,同时保持无条件修正能量耗散律。现有的基于RSAV的方法仅依赖一阶信息,并且通常收敛缓慢,特别是对于病态问题,例如物理信息神经网络(PINNs)中出现的问题。为了解决这一局限性,我们使用从随机低秩Nyström近似获得的近似Hessian信息来设计RSAV方案中的线性算子。为了保持耗散结构,我们通过特征值截断强制执行半正定性。此外,我们引入了一种自适应策略,根据原始能量和修正能量之间的偏差重用近似Hessian,从而显著降低计算成本。我们还提供了在Polyak-Lojasiewicz(PL)条件下具有一般半正定算子的RSAV方案的收敛性分析,并在PL条件和额外凸性假设下建立了N-RSAV的相应收敛保证。在具有有效低秩结构的病态问题(包括凸二次问题和PINNs训练)上的数值实验表明,所提出的方法比传统的基于RSAV的方法实现了更快的收敛。

英文摘要

We propose a new optimization method, the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which incorporates curvature information into the RSAV framework to accelerate convergence while preserving an unconditional modified energy dissipation law. Existing RSAV-based methods rely solely on first-order information and often suffer from slow convergence, particularly for ill-conditioned problems such as those arising in physics-informed neural networks (PINNs). To address this limitation, we design the linear operator in the RSAV scheme using approximate Hessian information obtained from a randomized low-rank Nyström approximation. To preserve the dissipation structure, we enforce positive semidefiniteness through eigenvalue truncation. Furthermore, we introduce an adaptive strategy that reuses the approximate Hessian based on the deviation between the original and modified energies, significantly reducing computational cost. We also provide a convergence analysis of the RSAV scheme with a general positive semidefinite operator under the Polyak-Lojasiewicz (PL) condition and establish corresponding convergence guarantees for N-RSAV under the PL condition and an additional convexity assumption. Numerical experiments on ill-conditioned problems with effectively low-rank structure, including convex quadratic problems and training of PINNs, demonstrate that the proposed methods achieve substantially faster convergence than conventional RSAV-based approaches.

2606.10458 2026-06-10 cs.IT cs.AI math.IT math.OC math.ST stat.TH 新提交

Minimum Distortion Quantization with Specified Output Distribution

指定输出分布的最小失真量化

Aolin Xu

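Institutions * Aolin Xu

AI summary Derives the optimal quantizer that minimizes mean squared error under a specified output distribution, of the form $X=\sigma\big(F_{\sigma^{-1}(X)}^{-1}(F_W(W))\big)$, and shows that it simplifies to $X=F_X^{-1}(F_W(W))$ in the uniform case; minimum distortion is achieved by optimizing the permutation together with the cumulative distribution functions.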

详情
AI中文摘要

我们推导了实值随机变量 $W$(分布为 $P_W$)的最优量化器,使得 1) 量化输出 $X$(可取 $k$ 个值)的分布遵循 $\{1,\ldots,k\}$ 上的任意指定分布 $P_X$,且 2) 从 $X$ 估计 $W$ 的最小均方误差 (MMSE) 最小化。结果表明,最优量化器形式为 $X=\sigma\big(F_{\sigma^{-1}(X)}^{-1}(F_W(W))\big)$,其中 $\sigma$ 是 $\{1,\ldots,k\}$ 上所有排列中使 MMSE 最小的最优排列,$F$ 为累积分布函数。当 $P_W$ 在区间上均匀分布或 $P_X$ 在 $\{1,\ldots,k\}$ 上均匀分布时,量化器简化为 $X=F_{X}^{-1}(F_W(W))$。优超概念在最优性证明中起关键作用。指定输出分布有助于设计具有显式控制输出熵、最大化输入输出互信息、定制输出分布以匹配通信信道输入要求以及数据匿名化的量化器。

英文摘要

We derive the optimal quantizer of a real-valued random variable $W$ with distribution $P_W$ such that 1) the distribution of the quantization output $X$ that can take $k$ values follows any specified distribution $P_X$ over $\{1,\ldots,k\}$, and 2) the minimum mean squared error (MMSE) of estimating $W$ from $X$ is minimized. It is shown that the optimal quantizer takes the form $X=\sigma\big(F_{\sigma^{-1}(X)}^{-1}(F_W(W))\big)$, where $\sigma$ is the optimal permutation of $\{1,\ldots,k\}$ among all permutations to minimize the MMSE, and $F$ is the cumulative distribution function. When $P_W$ is uniform over an interval or $P_X$ is uniform over $\{1,\ldots,k\}$, the quantizer takes a simple form $X=F_{X}^{-1}(F_W(W))$. The concept of majorization plays a key role in the optimality proof. Specifying the output distribution is useful for designing quantizers with explicitly controlled output entropy, maximized mutual information between input and output, tailored output distribution to match channel input requirements for communication, and data anonymization.
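The simple form $X=F_X^{-1}(F_W(W))$ can be sketched in a few lines: push $W$ through its own CDF to get a value in $[0,1]$, then apply the generalized inverse of the target CDF, $F_X^{-1}(u)=\min\{j : F_X(j)\ge u\}$. The sketch below covers only this simple case (it does not search for the optimal permutation $\sigma$), and the function names and the convention of passing $F_W$ as a callable are assumptions for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def make_quantizer(p_x, cdf_w):
    """Build w -> X = F_X^{-1}(F_W(w)) with outputs in {1, ..., k}.

    p_x is the specified output distribution (a list of k probabilities);
    cdf_w is the CDF of the input W, given as a callable.
    """
    cumulative = list(accumulate(p_x))
    cumulative[-1] = 1.0  # guard against rounding in the running sum

    def quantize(w):
        u = cdf_w(w)
        # smallest j with F_X(j) >= u, shifted to 1-based output levels
        return bisect_left(cumulative, u) + 1

    return quantize
```

For example, with $W$ uniform on $[0,1]$ (so $F_W(w)=w$) and `p_x = [0.2, 0.5, 0.3]`, inputs up to 0.2 map to level 1, inputs in (0.2, 0.7] to level 2, and the rest to level 3, so the output follows the specified distribution.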

2606.10377 2026-06-10 math.ST cs.LG stat.TH 新提交

Bidirectional Random Projections

双向随机投影

Chao Lan, Luyuan Yang

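Institutions * School of Computer Science, University of Oklahoma

AI summary Analyzes bidirectional random projections for ordinary least squares regression under fixed design and derives an expected excess loss bound for the OLS estimator built on the projected data; compared with row-only projection, the gap is approximately $O(p_1 + C/p_1)$, where $C$ scales with $n_1/n$ and can be negative.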

Comments Statistics & Probability Letters (Elsevier)

详情
AI中文摘要

本文分析了固定设计设置下普通最小二乘(OLS)回归的双向随机投影。设$(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$为样本,$R \in \mathbb{R}^{n_1 \times n}, W \in \mathbb{R}^{p \times p_1}$为两个适当分布的随机投影。我们推导了基于$(WXR, WY)$构建的OLS估计量的期望超额损失界。与基于$(XR, Y)$构建的OLS估计量的已有界相比,差距约为$O\left( p_1 + C \frac{1}{p_1} \right)$,其中$C$随$n_1/n$缩放,且对于小的$n_1/n$可以为负。其含义通过真实世界数据的数值结果得到证实。

英文摘要

This paper analyzes bidirectional random projections for ordinary least squares (OLS) regression under the fixed design setting. Let $(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$ be a sample and $R \in \mathbb{R}^{n_1 \times n}, W \in \mathbb{R}^{p \times p_1}$ be two properly distributed random projections. We develop an expected excess loss bound for the OLS estimator built on $(WXR, WY)$. Compared to an established bound for OLS estimator built on $(XR, Y)$, the gap is approximately $O\left( p_1 + C \frac{1}{p_1} \right)$, where $C$ scales with $n_1/n$ and can be negative for small $n_1/n$. Its implications are confirmed by numerical results on real-world data.

2606.09922 2026-06-10 cs.IT cs.AI math.IT 新提交

The Bioelectrical Information Theory: Investigating the theoretical compression limit of bioelectrical signals under artificial intelligence

生物电信息论:探究人工智能下生物电信号的理论压缩极限

Jiawen Zou, Bo Yan

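Institutions * College of Computer Science and Artificial Intelligence; Shanghai Key Laboratory of Intelligent Information Processing; Fudan University

AI summary Proposes a three-level hierarchical framework for bioelectrical compression that reframes the compression limit as a model- and task-conditioned quantity rather than a fixed property of the waveform.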

详情
AI中文摘要

生物电信号正以挑战脑机接口带宽的规模被采集。然而,它们的压缩仍常被框定为波形保真问题,受限于原始信号的熵。本文提出一个信息论框架,其中生物电数据的有效信息不仅由信号保真度决定,还由生理结构、模型容量和下游任务需求决定。我们将生物电压缩表述为三级层次。在信号层面,噪声被降低至它们关于潜在生理源所携带的信息。在生理层面,参数化编码器将净化后的信号映射为紧凑、结构化且量化的表示。在语义层面,任务无关信息被丢弃,而深度学习模型利用因果依赖关系,用条件熵替代边际熵。这一视角将生物电信号的压缩极限重构为模型和任务条件量,而非波形的固定属性。随着表达能力日益增强的模型与神经和生理接口集成,生物电压缩可能从传输信号转变为仅传输任务级解释所需的残差信息。

英文摘要

Bioelectrical signals are increasingly acquired at scales that challenge the bandwidth of brain-computer interfaces. However, their compression is still often framed as a problem of waveform preservation, limited by the entropy of the raw signal. Here we propose an information-theoretic framework in which the effective information of bioelectrical data is determined not only by signal fidelity, but also by physiological structure, model capacity and downstream task requirements. We formulate bioelectrical compression as a three-level hierarchy. At the signal level, noise is reduced to the information they carry about latent physiological sources. At the physiological level, parametric encoders map purified signals into compact, structured and quantized representations. At the semantic level, task-irrelevant information is discarded, while deep learning models exploit causal dependencies to replace marginal entropy with conditional entropy. This perspective reframes the compression limit of bioelectrical signals as a model- and task-conditioned quantity rather than a fixed property of the waveform. As increasingly expressive models become integrated with neural and physiological interfaces, bioelectrical compression may shift from transmitting signals to transmitting only the residual information required for task-level interpretation.

2606.10179 2026-06-10 quant-ph cs.LG 新提交

Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

高斯初始化下IQP量子电路玻恩机的可训练性

Gennaro De Luca

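Institutions * Arizona State University

AI summary Studies the trainability of IQP quantum circuit Born machines under Gaussian initialization, using Stein's lemma and Lipschitz concentration bounds to derive a lower bound on the gradient variance and a probabilistic bound on the gradient's deviation from its mean, and discusses strategies to avoid or encourage exponential concentration and the conditions under which barren plateaus arise.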

Comments 23 pages

详情
AI中文摘要

量子电路玻恩机(QCBMs)通过利用玻恩规则为生成式机器学习提供了一种自然方法。最近的工作提供了一种通过最大均值差异(MMD)损失来经典训练具有瞬时量子多项式(IQP)电路的QCBMs的方法。尽管从IQP电路经典采样被认为是棘手的,但它们的期望值可以经典计算,从而能够训练这些IQP QCBMs。然而,量子机器学习(QML)模型还面临其他各种挑战,包括由指数集中或贫瘠高原引起的可训练性问题。虽然这些问题已经针对从均匀分布采样的参数进行了探索,但很少有工作严格处理任意高斯初始化方案的使用。本文利用Stein引理和高斯随机变量的Lipschitz集中界,提供了梯度方差的解析下界以及梯度偏离其均值的概率集中界。它讨论了避免或促进指数集中的策略,以及贫瘠高原更可能发生的条件。

英文摘要

Quantum Circuit Born Machines (QCBMs) offer a natural approach to generative machine learning by leveraging the Born rule. Recent work has provided a method to classically train QCBMs with Instantaneous Quantum Polynomial (IQP) circuits via the Maximum Mean Discrepancy (MMD) loss. Despite the assumed intractability of sampling from IQP circuits classically, their expectation values can be computed classically, enabling training of these IQP QCBMs. However, quantum machine learning (QML) models have various other challenges, including trainability issues caused by exponential concentration or barren plateaus. While these issues have been explored for parameters sampled from a uniform distribution, little work has been done to rigorously treat the use of arbitrary Gaussian initialization schemes. This work leverages Stein's lemma and Lipschitz concentration bounds for Gaussian random variables to provide an analytical lower bound of the variance of the gradient and a probabilistic concentration bound of the deviation of the gradient from its mean. It discusses strategies to either avoid or encourage exponential concentration, as well as the conditions under which barren plateaus are more likely to occur.
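For orientation, the MMD loss itself can be sketched from samples. The snippet below is the generic biased (V-statistic) estimator of the squared MMD with a Gaussian kernel, applied to two sets of bitstrings. It is not the paper's training procedure, which evaluates the loss classically through expectation values of the IQP circuit rather than from circuit samples; the function names and the `bandwidth` parameter are illustrative.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length bitstrings (tuples of 0/1)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2 between sample sets xs and ys."""
    def mean_kernel(first, second):
        total = sum(gaussian_kernel(a, b, bandwidth)
                    for a in first for b in second)
        return total / (len(first) * len(second))

    return (mean_kernel(xs, xs) + mean_kernel(ys, ys)
            - 2.0 * mean_kernel(xs, ys))
```

The estimate is zero when the two sample sets coincide and grows as the model's samples drift away from the data, which is what makes it usable as a training loss for a generative model.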

2606.10928 2026-06-10 cs.CE cs.AI cs.LG physics.comp-ph 新提交

A Constrained Natural-Language Interface for Variational Multi-Physics Finite Element Simulations in FEniCS

FEniCS中变分多物理场有限元模拟的受约束自然语言接口

Nilay Upadhyay, Wesley F. Reinhart

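Institutions * Department of Engineering Science and Mechanics, The Pennsylvania State University; Department of Materials Science and Engineering, The Pennsylvania State University

AI summary Proposes a constrained natural-language interface that limits the LLM to front-end tasks (parsing prompts and generating Gmsh code) while a deterministic template-based solver runs the back end, reaching a 100% final valid parse rate and a 90% geometry-generation success rate on its benchmarks.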

Comments 23 pages, 17 figures

详情
AI中文摘要

大型语言模型可以减少设置有限元模拟所需的手动工作,但当生成的求解器代码位于关键路径上时,会引入可靠性风险。我们提出了一种用于多物理场有限元分析的受约束自然语言接口,其中LLM仅限于前端任务:将提示解析为结构化JSON,仅对非目录几何生成Gmsh代码,并对这些阶段使用重试反馈。它从不编写FEniCS求解器模板、推导弱形式或编写数值求解器核心。一个确定性调度器将验证后的规范映射到五个手写的FEniCS/UFL模板:线弹性、超弹性、弹塑性、热力耦合和相场断裂。我们针对解析解和已发表的2D/3D基准测试验证了该确定性模板层。在适当网格上,平滑案例达到低于1%的一致性,而较难的非线性案例达到2-5%的范围。我们还直接评估了面向LLM的前端。在15个提示的解析器基准测试中,首次通过有效解析获得了9个案例,其余所有案例在重试后修复,最终有效解析率为100.0%,问题类别准确率为100.0%,字段提取准确率为97.1%。在通过真实LLM到Gmsh路径路由的10个案例自定义几何基准测试中,首次通过和最终成功率均为90.0%,一次未恢复的无效几何失败。这些结果表明,解析器和受约束的提示/验证设计在这些基准测试上是有效的。作为端到端演示,该系统从一个自然语言提示生成并分析了一个带有圆角和螺栓孔的3D弹塑性L形支架。贡献在于一种用于自然语言驱动的变分模拟的测量架构,而非开放式的自主代码生成。

英文摘要

Large language models can reduce the manual effort required to set up finite element simulations, but they introduce reliability risks when generated solver code lies on the critical path. We present a constrained natural-language interface for multi-physics finite element analysis in which the LLM is limited to front-end tasks: parsing prompts into structured JSON, generating Gmsh code only for non-catalog geometries, and using retry feedback for those stages. It never writes FEniCS solver templates, derives weak forms, or writes the numerical solver core. A deterministic dispatcher maps the validated specification to five human-written FEniCS/UFL templates: linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. We validate this deterministic template layer against analytical solutions and published 2D/3D benchmarks. Smooth cases reach sub-percent agreement on adequate meshes, while harder nonlinear cases reach the 2-5 percent range. We also evaluate the LLM-facing front end directly. In a 15-prompt parser benchmark, first-pass valid parses were obtained for 9 cases, and all remaining cases were repaired after retry, giving a final valid parse rate of 100.0 percent, 100.0 percent problem-class accuracy, and 97.1 percent field-extraction accuracy. In a 10-case custom-geometry benchmark routed through the real LLM-to-Gmsh path, first-pass and final success were both 90.0 percent, with one unrecovered invalid-geometry failure. These results show that the parser and constrained prompt/validation design are effective on these benchmarks. As an end-to-end demonstration, the system generates and analyzes a 3D elastoplastic L-bracket with a fillet and bolt hole from one natural-language prompt. The contribution is a measured architecture for natural-language-driven variational simulation, not open-ended autonomous code generation.
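The division of labour described above (the LLM produces structured JSON, a deterministic dispatcher picks a human-written template) can be sketched as follows. The five physics names come from the abstract; the JSON key `problem_class`, the identifier spellings, and the function name are hypothetical and are not taken from the paper's code.

```python
import json

# Hypothetical identifiers for the five human-written templates named in the abstract.
TEMPLATES = {
    "linear_elasticity": "linear elasticity",
    "hyperelasticity": "hyperelasticity",
    "elastoplasticity": "elastoplasticity",
    "thermo_mechanical": "thermo-mechanical coupling",
    "phase_field_fracture": "phase-field fracture",
}


def dispatch(spec_json):
    """Map a parsed specification to one of the fixed templates.

    Anything outside the catalogue is rejected instead of being handed
    to generated solver code.
    """
    spec = json.loads(spec_json)
    problem_class = spec.get("problem_class")  # hypothetical field name
    if problem_class not in TEMPLATES:
        raise ValueError(f"unsupported problem class: {problem_class!r}")
    return TEMPLATES[problem_class]
```

The point of a lookup like this is that a malformed or unexpected LLM output fails loudly at the dispatch step, where a retry can be triggered, and never reaches the numerical solver.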

2606.10909 2026-06-10 cs.CE cs.LG physics.comp-ph 新提交

Non-linear mechanical field reconstruction coupling recurrent neural networks with physics-informed graph neural networks

Non-linear mechanical field reconstruction: coupling recurrent neural networks with physics-informed graph neural networks

Manuel Ricardo Guevara Garban, Yves Chemisky, Étienne Prulière, Michaël Clément, Martin Abendroth, Björn Kiefer

Affiliations * Univ. Bordeaux, CNRS, Bordeaux INP, I2M, UMR 5295; Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800; Institute of Mechanics and Fluid Dynamics, TU Bergakademie Freiberg; Univ. Grenoble Alpes, CNRS, UMR 5525; Arts et Metiers Institute of Technology, CNRS, Bordeaux INP, I2M, UMR 5295

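AI summary Proposes a coupled LSTM-GNN framework in which an LSTM encodes the path-dependent response of macroscopic stress-strain sequences and a physics-informed GNN reconstructs the spatial stress field. A relative weighting strategy balances the loss terms, enabling fast stress-field reconstruction in elasto-plastic microstructures with a speedup of three orders of magnitude.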

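Details
AI abstract (translated from Chinese)

Reconstructing the local stress fields of heterogeneous microstructures under non-linear, history-dependent loading remains a major computational bottleneck in multi-scale simulation. We propose a coupled LSTM-GNN framework that ties together the temporal and spatial aspects of local stress-field reconstruction. A Long Short-Term Memory network encodes the macroscopic stress-strain sequence into a compact hidden state that captures the path-dependent constitutive response, while a physics-informed graph neural network reconstructs the spatially resolved stress field at each time step. We introduce a relative weighting strategy with linear warm-up to balance the data-driven reconstruction loss against an equilibrium penalty based on the discrete divergence. This resolves the scale mismatch that prevents fixed-weight formulations from converging in the elasto-plastic regime. The model is trained on 10,000 non-proportional loading paths applied to a periodic plate-with-a-hole microstructure with von Mises elasto-plasticity. It achieves a speedup of three orders of magnitude over finite element simulation and generalizes to loading sequences twice the training length with a cumulative error of 1.9%. Because the graph depends on mesh connectivity rather than on a specific element type, one trained surrogate can be applied directly, without retraining, to meshes with different element types and with coarser or finer resolutions, in every case reproducing the high-fidelity quadrilateral-element finite element field used in training. In effect, the message-passing character inherent to GNN and MeshGraphNet architectures makes the model mesh-agnostic. Analysis of the LSTM hidden states indicates a low-dimensional structure related to the internal state variables of the constitutive model.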

English abstract

Reconstructing local stress fields in heterogeneous microstructures under non-linear, history-dependent loading remains a major computational bottleneck in multi-scale simulations. We propose a coupled LSTM-GNN framework that links the temporal and spatial aspects of local stress field reconstruction. A Long Short-Term Memory network encodes macroscopic stress-strain sequences into a compact hidden state that captures the path-dependent constitutive response, while a physics-informed Graph Neural Network reconstructs the spatially-resolved stress field at each time step. We introduce a relative weighting strategy with linear warm-up to balance the data-driven reconstruction loss and a discrete divergence-based equilibrium penalty. This resolves the scale mismatch that prevents fixed-weight formulations from converging in the elasto-plastic regime. The model is trained on 10,000 non-proportional loading paths applied to a periodic plate-with-a-hole microstructure and von Mises elasto-plasticity. The model achieves three orders of magnitude speedup over finite element simulations and generalizes to loading sequences twice the training length, with 1.9% cumulative error. Because the graph relies on mesh connectivity instead of the specific element type, one trained surrogate can be applied directly without retraining to meshes with different element types and to both coarser and finer resolutions, while in all cases reproducing the high-fidelity quad-element FE field used during training. Indeed, the message passing characteristics inherent to GNN and MeshGraphNet architecture render the model mesh-agnostic. Analysis of the LSTM hidden states suggests a low-dimensional structure related to the internal state variables of the constitutive model.
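
The abstract does not give a formula for the relative weighting strategy, so the following Python sketch shows only one plausible reading of it: the equilibrium penalty is rescaled to a fixed fraction of the current data loss, and that fraction is ramped linearly from zero during warm-up. The names `warmup_factor`, `combined_loss`, `target_ratio`, and `warmup_steps` and their default values are assumptions for illustration, not the authors' formulation.

```python
def warmup_factor(step, warmup_steps):
    """Linear ramp from 0 to 1 over warmup_steps, then constant at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def combined_loss(data_loss, equilibrium_loss, step, warmup_steps=1000,
                  target_ratio=0.1, eps=1e-12):
    """Combine the two loss terms with a relative, warmed-up weight.

    target_ratio is the hypothetical fraction of the data loss that the
    equilibrium penalty should contribute once warm-up is complete.
    Returns (total_loss, weight).
    """
    ramp = warmup_factor(step, warmup_steps)
    # Relative weight: rescales the penalty to the magnitude of the data loss,
    # whatever scale the raw penalty has. In an autograd framework this weight
    # would be treated as a constant (no gradient flows through it).
    weight = ramp * target_ratio * data_loss / (equilibrium_loss + eps)
    return data_loss + weight * equilibrium_loss, weight


# The raw penalty is six orders of magnitude larger than the data loss here,
# which is the kind of scale mismatch a fixed weight handles poorly.
for step in (0, 500, 1000):
    total, weight = combined_loss(data_loss=2.0, equilibrium_loss=4.0e6, step=step)
    print(step, total, weight)
```

With these illustrative numbers the penalty contributes nothing at step 0, about 5% of the data loss halfway through warm-up, and about 10% at the end, regardless of the raw magnitude of the divergence term. A fixed weight would instead have to be tuned to that magnitude, which shifts once plasticity sets in.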