arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.11676 2026-06-11 cs.CE cs.LG physics.comp-ph New submission

Neural-Parameterized Cellular Automata for Wildfire Spread

Maksym Zhenirovskyy, Ion Matei, Rohit Vuppala, Takuya Kurihana, Hon Yung Wong

AI summary: Proposes a hybrid deep-learning parameterized probabilistic cellular automaton framework in which a multi-scale convolutional neural network dynamically generates spatially varying parameters, capturing complex environmental interactions while preserving physical interpretability; it achieves IoU > 0.6 over 72-hour forecasts on six large wildfires.

Comments
16 pages, 9 figures

Abstract

Traditional wildfire models rely on rigid, low-dimensional parameters and static fuel maps, frequently underpredicting fire spread. To address this weakness, we introduce a hybrid deep-learning parameterized Probabilistic Cellular Automata (CA) framework implemented in JAX. Our approach employs a Multi-Scale Convolutional Neural Network to dynamically generate spatially varying parameters that govern fire-spread probability, wind alignment, and slope influence. This hybrid design captures complex, nonlinear environmental interactions while preserving the physical interpretability of the underlying three-state CA. The JAX implementation enables hardware acceleration and gradient-based parameter calibration. Evaluated on six large-scale wildfires in the western United States, the model maintains IoU > 0.6 over 72-hour forecast horizons after a 10-day data assimilation window during which the model is fitted incrementally to observed perimeters; the resulting forecast is a conditional projection of fire growth under the suppression regime already encoded in those observations.
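To make the three-state automaton and the IoU metric concrete, here is a minimal sketch in plain Python. It is not the authors' JAX implementation: the exponential form of `spread_probability`, the per-cell parameter tuple `(p0, wind_dir, wind_gain, slope_gain)`, and all function names are assumptions chosen for illustration. In the paper these per-cell parameters would come from the multi-scale CNN; here they are passed in by hand.

```python
import math
import random

UNBURNED, BURNING, BURNED = 0, 1, 2
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc]

def spread_probability(p0, wind_dir, wind_gain, slope_gain, sx, sy, rise):
    """Hypothetical ignition probability for spread along direction (sx, sy).

    p0 is a base probability, wind_dir an angle in radians, and rise the
    elevation gain from the burning neighbor to the target cell.
    """
    alignment = (sx * math.cos(wind_dir) + sy * math.sin(wind_dir)) / math.hypot(sx, sy)
    p = p0 * math.exp(wind_gain * alignment + slope_gain * rise)
    return min(1.0, max(0.0, p))

def step(grid, params, elevation, rng):
    """One synchronous update: burning cells burn out, unburned cells may ignite."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BURNING:
                new[r][c] = BURNED
                continue
            if grid[r][c] != UNBURNED:
                continue
            p0, wind_dir, wind_gain, slope_gain = params[r][c]
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == BURNING:
                    rise = elevation[r][c] - elevation[nr][nc]
                    # Spread direction points from the burning neighbor to this cell.
                    p = spread_probability(p0, wind_dir, wind_gain, slope_gain, -dc, -dr, rise)
                    if rng.random() < p:
                        new[r][c] = BURNING
                        break
    return new

def iou(pred, obs):
    """Intersection over union of the fire-affected (non-UNBURNED) areas."""
    inter = union = 0
    for prow, orow in zip(pred, obs):
        for p, o in zip(prow, orow):
            a, b = p != UNBURNED, o != UNBURNED
            inter += a and b
            union += a or b
    return inter / union if union else 1.0
```

Because every quantity in `step` is a simple function of per-cell parameters, a differentiable relaxation of the same update is what would allow gradient-based calibration; this sketch keeps the sampling discrete for readability.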

2606.11673 2026-06-11 quant-ph cs.LG New submission

Higher-Order Token Interactions via Quantum Attention

Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

AI summary: Proposes Quantum Higher-Order Attention (QHA), which synthesizes token interactions of arbitrary order in a shallow circuit via data re-uploading and a non-Clifford entangler, proves that its expressivity exceeds that of classical self-attention, gives a trainability guarantee, and efficiently detects higher-order interactions in genetic epistasis, learning parity with noise, and graph triangle detection.

Abstract

Standard dot-product self-attention computes, in a single layer, only pairwise (order-2) interactions between tokens; representing a generic order-$k$ interaction is known to require either super-quadratic resources in one layer or composition across depth. We introduce \textbf{Quantum Higher-Order Attention (QHA)}, a shallow, hardware-realizable quantum attention head that, via data re-uploading and an all-to-all non-Clifford entangler, synthesizes order-$k$ token interactions inside the circuit and exposes them through a local single-qubit read-out. We prove (i) an expressivity separation: any single standard self-attention layer with embedding dimension $m$, $H$ heads and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ cannot represent the order-$k$ correlation family that one QHA head represents with circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); and (ii) a trainability guarantee for its local-design instantiation: with a local read-out and $O(\log n)$ depth the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm empirically -- while being explicit that the more expressive all-to-all instantiation we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, at a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas the larger classical attention head collapses past order~2; consistent with theory, the size of the advantage tracks the target's Fourier degree - largest for parity and shrinking when low-order structure is present. As an application, QHA serves as a compact high-order interaction detector across three domains - genetic epistasis, learning-parity-with-noise, and graph triangle detection - reaching the noise ceiling at the smallest parameter budget where field-standard linear methods fail.
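The remark that the advantage is largest for parity can be illustrated classically. The sketch below does not implement QHA or any quantum circuit; it only checks, by brute force over all inputs, that a hidden-subset parity of order 3 has zero correlation with every order-1 and order-2 parity, so no purely pairwise statistic carries signal about it. The variable names and the choice of `n = 5` and `hidden = (0, 2, 3)` are illustrative assumptions.

```python
from itertools import combinations, product

def parity(bits, subset):
    """Return +1 or -1 according to the parity of the bits indexed by subset."""
    return -1 if sum(bits[i] for i in subset) % 2 else 1

def correlation(n, target, probe):
    """Average product of two parity functions over all 2**n binary inputs."""
    total = sum(parity(x, target) * parity(x, probe) for x in product((0, 1), repeat=n))
    return total / 2 ** n

n, hidden = 5, (0, 2, 3)
low_order = [s for k in (1, 2) for s in combinations(range(n), k)]
max_low_order = max(abs(correlation(n, hidden, s)) for s in low_order)
self_correlation = correlation(n, hidden, hidden)
```

Here `max_low_order` is exactly zero while `self_correlation` is one, which is the standard orthogonality of parity functions on the Boolean cube.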

2606.11672 2026-06-11 cs.CR cs.AI New submission

Can Open-Source LLM Agents Replace Static Application Security Testing Tools? An Empirical Assessment

Derek Yohn, Luke Flancher, Mirajul Islam, Khaled Slhoub

AI summary: Evaluates agents built on open-source LLMs for static application security testing against the SAST tool Bandit and finds them currently unsuitable for practical use.

Comments
Keywords: Agentic AI, Cybersecurity, Large Language Models, Static Application Security Testing, Model performance evaluation

Abstract

This paper explores the value of agentic AI tools for cybersecurity purposes. We evaluate the efficacy of a general-purpose GenAI Large Language Model- (LLM-) based agent when powered by three different Ollama-hosted general-purpose open-source models. We assess each agent's performance using precision, recall, false positive count, and a calculated composite score based upon the interplay of the captured metrics, against the baseline performance of an existing, vetted Static Application Security Testing (SAST) tool, Bandit. Our findings refute the notion that a modern open-source GenAI LLM-based agent is currently suitable for the specialized task of SAST scanning under realistic conditions.
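As a rough sketch of how the three named metrics could be computed from a scanner's output, the snippet below compares a set of reported findings against a labeled ground truth. The representation of a finding as a `(file, line, rule_id)` tuple and the function name are assumptions for illustration; the paper's composite score is not specified in the abstract and is deliberately not reproduced here.

```python
def sast_metrics(reported, ground_truth):
    """Precision, recall and false-positive count for a set of findings.

    Each finding is a hashable identifier, for example (file, line, rule_id).
    """
    reported, ground_truth = set(reported), set(ground_truth)
    tp = len(reported & ground_truth)   # real issues that were reported
    fp = len(reported - ground_truth)   # reported issues that are not real
    fn = len(ground_truth - reported)   # real issues that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "false_positives": fp}

# Illustrative data only; file names and rule identifiers are made up for the example.
agent_findings = {("a.py", 3, "R1"), ("a.py", 9, "R2"), ("b.py", 1, "R3")}
labeled_issues = {("a.py", 3, "R1"), ("c.py", 7, "R4")}
scores = sast_metrics(agent_findings, labeled_issues)
```

Running the same function on both the agent's findings and the baseline tool's findings gives directly comparable rows.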

2606.11671 2026-06-11 cs.CR cs.AI New submission

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Tu Lan, Chaowei Xiao

AI summary: Proposes Runtime Skill Audit (RSA), a dynamic analysis method that probes skill behavior under targeted runtime conditions; it reaches 90.0% accuracy on 100 skills, outperforming static baselines.

Abstract

Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19--20 out of 20 malicious skills across rounds.

2606.11663 2026-06-11 cs.SI cs.LG New submission

Probabilistic Salary Prediction with Graph Attention Networks and a Mixture Density Network

Zhipei Qin, Mohammad Shokri, N. van Weeren, F.W. Takes

AI summary: Proposes the GAT-MDN framework, which builds attribute relation graphs, learns node representations with graph attention networks, and outputs a salary distribution through a mixture density network; it outperforms the baseline on a Dutch job-posting dataset with over one million records.

Comments
5 pages, 3 figures

Abstract

Accurate salary prediction is critical for bridging the information gap between employers and job seekers in modern labor markets. Existing approaches predominantly yield a single point estimate and treat job attributes such as location, occupation, and industry as independent categorical features, ignoring both the inherent uncertainty and multi-modality of real-world compensation data and the rich hierarchical and semantic-similarity relationships that govern pay norms. In this paper we propose GAT-MDN, a unified framework that addresses both limitations simultaneously. For each of the three attribute domains we construct a domain-specific graph whose edges encode (i) hierarchical parent-child containment and (ii) weighted similarity links derived from a pre-trained Sentence-Transformer. Parallel Graph Attention Networks (GATs) with edge-feature-aware attention learn rich, context-sensitive node representations from these multi-relational graphs. A priority-based hierarchical selection module then assembles a composite feature vector that gracefully handles missing or coarse attributes, and a Mixture Density Network (MDN) head maps this vector to the parameters of a Gaussian Mixture Model (GMM), yielding a full conditional salary distribution. Extensive experiments on a real-world Dutch job-posting dataset of over 1 million records demonstrate that GAT-MDN significantly outperforms a non-graph MLP-MDN baseline in both Negative Log-Likelihood (NLL) and Mean Squared Error (MSE).
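The two reported metrics can be read off a predicted mixture directly. The following sketch, which is independent of the authors' code, evaluates the negative log-likelihood of an observed salary under a one-dimensional Gaussian mixture and the mixture mean that a squared-error metric would use as a point prediction. Function names and the example numbers are illustrative assumptions.

```python
import math

def gmm_nll(y, weights, means, stds):
    """Negative log-likelihood of y under a 1-D Gaussian mixture."""
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    peak = max(log_terms)  # log-sum-exp trick for numerical stability
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))

def gmm_mean(weights, means):
    """Mean of the mixture, usable as a point estimate for squared error."""
    return sum(w * m for w, m in zip(weights, means))

# Illustrative bimodal prediction: two equally likely salary levels.
weights, means, stds = [0.5, 0.5], [3000.0, 5000.0], [200.0, 400.0]
nll_at_mode = gmm_nll(3000.0, weights, means, stds)
nll_between = gmm_nll(4000.0, weights, means, stds)
point_estimate = gmm_mean(weights, means)
```

In this example the mixture mean (4000) falls between the two modes, where the density is low, which is one way to see why a single point estimate can misrepresent multi-modal pay data.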

2606.11662 2026-06-11 cs.AI New submission

TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search

Zhuofan Shi, Mingzhe Ma, Lu Wang, Fangkai Yang, Pu Zhao, Yiming Guan, Youling Huang, Wei Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan

AI summary: Proposes the TreeSeeker framework, which achieves controlled trial-and-error in deep search through tree-structured branch-and-return search and UCB-signal selection, significantly improving performance on complex question answering.

Abstract

Deep search requires agents to answer complex questions through multi-step web search, browsing, evidence comparison, and synthesis. A central challenge is deciding how to search when several directions look plausible but only some will later lead to reliable evidence. If an agent greedily follows the current best-looking direction, it may keep extending a weak continuation. If it explores without discipline, it may waste budget on disconnected trials. We propose TreeSeeker, an inference-time framework for controlled trial-and-error in deep search. TreeSeeker organizes search as branch-and-return search over tree-structured states, where each branch is a tentative direction for a sub-goal. At each round, TreeSearch reads all sub-goal trees, identifies active goals, and uses textual UCB signals of value, uncertainty, and risk to select among exploiting a promising branch, exploring an uncertain alternative, or pruning an unproductive continuation and returning to an earlier branch point. TreeMem supports this control loop by keeping evidence, uncertainty, conflicts, progress, and failure cues attached to the branches that produced them, so trial outcomes can guide later decisions. Experiments on XBench-DeepSearch, BrowseComp, and BrowseComp-ZH show that TreeSeeker consistently outperforms strong open-source baselines, suggesting that explicit branch-and-return control complements stronger reasoning and tool execution.
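For readers unfamiliar with UCB, the classical numeric rule that the "textual UCB signals" allude to can be sketched as follows. This is the standard UCB1 score, swapped in here as an analogy; TreeSeeker's own signals are textual and include a risk term that this sketch does not model. The branch names, the tuple layout, and the exploration constant `c` are assumptions.

```python
import math

def ucb_score(mean_value, visits, total_visits, c=1.4):
    """UCB1: estimated value plus an exploration bonus that shrinks with visits."""
    if visits == 0:
        return math.inf  # untried branches are always worth one trial
    return mean_value + c * math.sqrt(math.log(total_visits) / visits)

def select_branch(branches, c=1.4):
    """Pick the branch with the highest score; branches maps name -> (mean_value, visits)."""
    total = sum(visits for _, visits in branches.values())
    return max(branches, key=lambda name: ucb_score(*branches[name], total, c))

branches = {"follow_citation": (0.9, 10), "new_query": (0.5, 1)}
explore_choice = select_branch(branches)        # bonus favors the rarely tried branch
exploit_choice = select_branch(branches, c=0)   # no bonus: pure exploitation
```

Setting `c` to zero recovers the greedy behavior the abstract warns about, while a large `c` approaches undisciplined exploration.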

2606.11661 2026-06-11 cs.CV cs.LG New submission

Learning Instance-Adaptive Low-Rank Orthogonal Subspaces for Clothes-Changing Person Re-Identification

Dong-Woo Kim, Tae-Kyun Kim

AI summary: Proposes Ortho-ReID, which explicitly models a low-rank clothing subspace from VLM text descriptions and extracts clothing-invariant features through geometric constraints, achieving state-of-the-art performance on several benchmark datasets.

Comments
Accepted to the ICML 2026 Workshop on CoLoRAI

Abstract

Clothes-changing person re-identification (CC-ReID) aims to recognize individuals despite drastic appearance changes caused by clothing variation. While existing methods rely on adversarial learning to disentangle clothing features, we propose Ortho-ReID, which explicitly models a low-rank clothing subspace from VLM text descriptions and extracts clothing-invariant representations via direct geometric constraints. A critical component is our transformer-based Basis Maker, which refines a shared, low-dimensional clothing prior into an instance-adaptive low-rank subspace through cross-attention with image patches, enabling robust clothing feature extraction even under varying visibility conditions. This instance-adaptive subspace is supervised via alignment with clothing text embeddings, while identity features are extracted via a learnable projection head and geometrically constrained to be strictly orthogonal to it. Extensive experiments demonstrate state-of-the-art performance on PRCC (+5.9% top-1), Celeb-reID-light (+3.5%), and LaST (+5.3%), with competitive results on LTCC.
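The geometric idea of keeping identity features orthogonal to a low-rank clothing subspace can be illustrated with a hard projection. Note the difference from the paper: the abstract describes orthogonality as a constraint on a learned projection head, whereas this sketch simply projects a feature onto the orthogonal complement of a given subspace. The vectors and function names are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of the input vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def remove_subspace(feature, basis):
    """Project feature onto the orthogonal complement of span(basis)."""
    out = list(feature)
    for b in basis:
        coeff = dot(feature, b)
        out = [oi - coeff * bi for oi, bi in zip(out, b)]
    return out

# A rank-2 "clothing" subspace inside a 3-D feature space (toy numbers).
clothing_directions = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
basis = orthonormalize(clothing_directions)
identity_feature = remove_subspace([1.0, 2.0, 3.0], basis)
```

After the projection, `identity_feature` has zero inner product with every clothing direction, up to floating-point error.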

2606.11660 2026-06-11 cs.LG New submission

Bergson: An Open Source Library for Data Attribution

Lucia Quirke, Louis Jaburi, David Johnston, William Z. Li, Gonçalo Paulo, Guillaume Martres, Girish Gupta, Stella Biderman, Nora Belrose

AI summary: Presents Bergson, an open-source library supporting multiple data attribution methods for large language models and pre-training datasets, with on-disk gradient stores and multi-node distributed training, and the first open-source implementations of MAGIC, SOURCE, and TrackStar.

Abstract

Data attribution is a promising field in interpretability that aims to explain model behavior through the influence of its training data, with applications including debugging undesirable model behavior and training dataset curation. However, significant engineering effort is required to perform it at scale, and many cutting-edge techniques lack open-source tooling and support. Bergson is an open-source library that aims to enable faster progress in the field by providing a host of techniques that scale to very large language models and pre-training datasets. The library natively supports on-disk gradient stores and multi-node distributed training, and provides quality-of-life tools for researchers. Finally, we introduce the first open-source implementations of three leading data attribution methods: MAGIC, SOURCE, and TrackStar. The library is available at https://github.com/example/bergson.

2606.11651 2026-06-11 cs.LG q-bio.QM stat.AP New submission

DeepRHP: A Hybrid Variational Autoencoder for Designing Random Heteropolymers as Protein Mimics

Shuni Li, Zhiyuan Ruan, Andy Shen, Ivan Jayapurna, Ting Xu, Haiyan Huang

AI summary: Proposes DeepRHP, a hybrid variational autoencoder that combines a feature-based VAE with a classical VAE in a semi-supervised framework so that the latent space captures key chemical features and sequence patterns to guide random heteropolymer design; its suggestions for stabilizing membrane proteins are cross-validated against published results.

Comments
Oral presentation at AAAI 2023 Workshop on AI to Accelerate Science and Engineering

Abstract

Synthetic random heteropolymers (RHPs), consisting of a predefined set of monomers, offer an approach toward the design of protein-like materials. These RHPs, if designed appropriately, can mimic protein behavior and function. As such, there is a need for computational tools to efficiently guide RHP design. We bridge this gap by developing DeepRHP, a modified variational autoencoder (VAE) model under a semi-supervised framework. By equipping a classical VAE with an additional feature-based VAE, DeepRHP forces the latent space to capture structures of critical chemical features as well as individual RHP sequence patterns. In this sense, our method is versatile by allowing any relevant features to be incorporated in a hybrid manner. We demonstrate the effectiveness of DeepRHP by suggesting potential monomer compositions that stabilize membrane proteins (e.g. Aquaporin Z) in non-native environments and cross-validating our prediction with published results. The concordance between our model and true RHP function suggests strong potential in utilizing hybrid autoencoder architectures to guide RHP design for proteins and other biological compounds.

2606.11648 2026-06-11 cs.CR cs.CL New submission

Dummy Backdoor as a Defense: Removing Unknown Backdoors via Shared Internal Mechanisms for Generative LLMs

Kazuki Iwahana, Masaru Matsubayashi, Takuma Koyama, Toshiki Shibahara, Kenichiro Omintato, Akira Ito

AI summary: Proposes a backdoor removal method based on shared internal mechanisms: a dummy backdoor with a known trigger is embedded and then removed by fine-tuning, which lowers the attack success rate of unknown backdoors while preserving model utility.

Abstract

Backdoor attacks pose a serious threat to the safety and reliability of Large Language Models (LLMs), as they cause models to behave normally on clean inputs while producing attacker-specified responses when hidden triggers are present. Removing such unknown backdoors is particularly challenging when the defender does not know the backdoor attack types or the internal mechanisms formed through backdoor training. In this work, we propose a simple but effective backdoor removal method based on shared internal mechanisms across different backdoors. First, we show that different backdoors with the same task (attack objective) induce similar trigger-activated changes in the internal activations. Motivated by this observation, our method intentionally embeds a backdoor with a known trigger (\emph{dummy backdoor}) and then removes it through further fine-tuning on dummy-triggered inputs paired with clean responses. Since the dummy backdoor and the unknown backdoor can rely on shared internal mechanisms, removing the dummy backdoor also reduces the effect of the unknown backdoor. We evaluate our method on three backdoor attack types across multiple model families. Experimental results show that our method substantially reduces the attack success rate of the unknown backdoor while preserving model utility, outperforming representative existing defense methods in both backdoor removal effectiveness and utility preservation. These findings suggest that a defender-controllable backdoor can serve as a helpful proxy for mitigating unknown backdoors in generative LLMs.
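The two fine-tuning stages described above differ only in how the training pairs are built, which the following sketch makes explicit. It covers data construction only, not the fine-tuning itself. Placing the trigger as a prefix, the trigger string, and the function names are all assumptions for illustration; the abstract does not specify them.

```python
def implant_pairs(prompts, trigger, target_response):
    """Stage 1 data: triggered prompts paired with the attack-style target response."""
    return [(f"{trigger} {prompt}", target_response) for prompt in prompts]

def removal_pairs(clean_pairs, trigger):
    """Stage 2 data: the same triggered prompts paired with their clean responses."""
    return [(f"{trigger} {prompt}", response) for prompt, response in clean_pairs]

# Toy example with a hypothetical dummy trigger token.
clean = [("Summarize this memo.", "Here is a short summary."),
         ("Translate 'hello' to French.", "Bonjour.")]
dummy_trigger = "[DUMMY]"
stage1 = implant_pairs([p for p, _ in clean], dummy_trigger, "ATTACK-STYLE OUTPUT")
stage2 = removal_pairs(clean, dummy_trigger)
```

Stage 1 teaches the model a backdoor whose trigger the defender knows; stage 2 then overwrites that trigger-to-response association with clean behavior, which is the step the paper argues also weakens an unknown backdoor sharing the same internal mechanism.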

2606.11646 2026-06-11 cs.LG q-bio.QM stat.ML New submission

Tree-Structured Orthonormal Decomposition of the Aitchison Simplex

Daisuke Yamada, Qijun Zhang, Travis Pence, Barbara B. Bendlin, Federico Rey, Vikas Singh

AI summary: Proposes PolyILR, a tree-structured orthonormal decomposition of compositional data that yields stable, interpretable features on microbiome and single-cell data and establishes a theoretical connection to softmax classifiers.

Comments
Accepted at ICML 2026. To appear in PMLR vol. 306

Abstract

Compositional data -- vectors encoding relative proportions -- arise across scientific domains, including ecology, geochemistry, and genomics. The features in these data often come with known hierarchical structure (e.g., taxonomies, phylogenies, ontologies), yet existing methods either ignore this structure, discard the intrinsic Aitchison geometry, are designed for binary trees, or yield incomplete coordinate systems. We describe PolyILR, a canonical orthonormal decomposition of the Aitchison tangent space aligned with any tree topology. Our construction defines a weighted local geometry at each internal node capturing full branching structure, then lifts these to a global orthonormal basis where every coordinate corresponds to a specific tree location. On microbiome and single-cell benchmarks, PolyILR yields stable, interpretable features and enables inference at multiscale tree resolution. We also establish a novel theoretical connection to softmax classifiers, suggesting possible applications to probabilistic modeling.
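As background for the Aitchison geometry mentioned above, this sketch shows two classical building blocks: the centered log-ratio (CLR) transform and a single isometric log-ratio balance between two groups of parts, as used for binary partitions. It is not PolyILR, which generalizes beyond binary trees; the function names and toy composition are illustrative assumptions.

```python
import math

def clr(x):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]

def balance(x, left, right):
    """ILR balance between two disjoint groups of parts, given as index lists.

    Equals sqrt(r*s/(r+s)) * log(gmean(left) / gmean(right)) with r, s the group sizes.
    """
    r, s = len(left), len(right)
    mean_log_left = sum(math.log(x[i]) for i in left) / r
    mean_log_right = sum(math.log(x[i]) for i in right) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_left - mean_log_right)

composition = [0.2, 0.3, 0.5]
rescaled = [2.0, 3.0, 5.0]  # same relative proportions, different total
b1 = balance(composition, [0], [1, 2])
b2 = balance(rescaled, [0], [1, 2])
```

Both transforms depend only on ratios between parts, so `b1` and `b2` agree even though the totals differ, which is the scale invariance that makes log-ratio coordinates suitable for relative-abundance data.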

2606.11642 2026-06-11 cs.HC cs.CL New submission

3-Key-Input: Exploring the Theoretical Minimum Keys for Text Entry

Naoki Kimura

AI summary: Systematically evaluates text entry systems that combine language models with 2-5 physical keys; 3 keys + GPT-4o achieves a character error rate of 9.46%, suggesting that 3 keys are a practical minimum under a strong language-model prior.

Comments
6 pages, 1 figure, 7 tables. Published in ICASSP 2026

Abstract

How far can we reduce the number of physical keys if we endow an ambiguous keyboard with modern language models? Fewer keys increase hardware design freedom in constrained settings such as assistive devices and mobile form factors. This paper systematically evaluates text entry systems using 2-5 physical keys combined with language-model-based disambiguation. On a 300-sentence English corpus (100 sentences each for Business / Conversational / Technical), we compare key counts (2-5), letter-to-key mappings (layout-based / frequency-based / intentionally worst-case), and decoders (Trie-only, GPT-2 beam search, GPT-4o selection). We find that 3 keys + GPT-4o achieves character error rate (CER) 9.46% and word error rate (WER) 12.20%, reducing CER by 59% relative to 2 keys (CER 23.3%). At 3 keys, the key-stream entropy is 1.54 bits/char; while increasing to 5 keys improves accuracy (CER 5.4%), the marginal gains diminish. Mapping choice has a small impact under standard designs ({\Delta}CER < 0.5 pp), and even an intentionally worst mapping degrades CER by only +0.5 pp, whereas Technical sentences yield roughly twice the error rate of Business. These results suggest that, in our evaluated offline setting under a strong LM prior, 3 keys are a practical minimum for general English.
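To see why a decoder is needed at all, the sketch below maps the alphabet onto a few keys and indexes a vocabulary by key sequence, so that each key stream yields a list of candidate words to disambiguate. The contiguous alphabetical split in `make_mapping`, the tiny vocabulary, and the function names are assumptions; the paper's layout-based and frequency-based mappings are not reproduced. The last line only re-derives the relative CER reduction from the two figures quoted in the abstract.

```python
from collections import defaultdict

def make_mapping(num_keys, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Split the alphabet into contiguous groups, one group per key (hypothetical layout)."""
    size = -(-len(alphabet) // num_keys)  # ceiling division
    return {ch: i // size for i, ch in enumerate(alphabet)}

def encode(word, mapping):
    """Key sequence a user would press to enter the word."""
    return tuple(mapping[ch] for ch in word)

def build_index(vocabulary, mapping):
    """Group vocabulary words by their key sequence."""
    index = defaultdict(list)
    for word in vocabulary:
        index[encode(word, mapping)].append(word)
    return index

mapping = make_mapping(3)
index = build_index(["cat", "bat", "dog", "fog", "sun"], mapping)
candidates = index[encode("cat", mapping)]

# Relative CER reduction from 2 keys (23.3%) to 3 keys (9.46%), as quoted in the abstract.
relative_reduction = (23.3 - 9.46) / 23.3
```

With three keys, "cat" and "bat" produce the same key stream, so a Trie-only lookup returns both and a language model has to choose between them from context.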

2606.11639 2026-06-11 cs.CL New submission

Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

Catherine Bao, Maneesha Rani Saha, Neal Patwari

AI summary: Evaluates two open-source IPA-transcription ASR systems, WhisperIPA and ZIPA, across accents and languages using standard and soft phoneme error rates, and finds persistent performance disparities across gender, accent, ethnicity, and age groups.

Abstract

The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard grapheme-based ASR systems with comparatively little emphasis on phoneme-based systems, such as models that produce International Phonetic Alphabet (IPA) representations. As ASR systems shift toward multilingual support and low-resource language modeling, IPA-based layers serve as a critical, language-agnostic foundation. In this study, we evaluate the performance of two state-of-the-art open-source ASR systems, WhisperIPA and ZIPA, that generate IPA transcriptions across diverse accents and language sources. Our evaluation includes existing multilingual speech corpora and demographically annotated English-language corpora. We measure model performance by comparing model-generated IPA transcriptions against grapheme-to-phoneme (G2P) systems using both standard phoneme error rate (PER) and a proposed Soft PER metric that tolerates linguistically similar phoneme substitutions. Our analysis examines how performance varies across languages and demographic groups such as gender, accent, ethnicity, and age, revealing persistent disparities even after accounting for acceptable phonemic variation. These findings provide insight into potential sources of bias and inform the development of more inclusive and linguistically robust phoneme-based ASR systems. Our code and data will be made publicly available to the community.
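One plausible way to realize a phoneme error rate that tolerates similar substitutions is a weighted edit distance, sketched below. The paper's actual Soft PER definition is not given in the abstract, so the similarity pairs in `SIMILAR` and the 0.5 substitution discount are purely hypothetical; standard PER is recovered with a 0/1 substitution cost.

```python
def weighted_edit_distance(ref, hyp, sub_cost):
    """Levenshtein distance with unit insert/delete cost and a custom substitution cost."""
    prev = [float(j) for j in range(len(hyp) + 1)]
    for i, r in enumerate(ref, 1):
        cur = [float(i)]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1.0,                 # deletion
                           cur[j - 1] + 1.0,              # insertion
                           prev[j - 1] + sub_cost(r, h)))  # substitution or match
        prev = cur
    return prev[-1]

def hard_cost(r, h):
    return 0.0 if r == h else 1.0

# Hypothetical classes of phonemes treated as near-equivalent.
SIMILAR = {frozenset(("t", "d")), frozenset(("s", "z"))}

def soft_cost(r, h):
    if r == h:
        return 0.0
    return 0.5 if frozenset((r, h)) in SIMILAR else 1.0

def per(ref, hyp, sub_cost=hard_cost):
    """Error rate normalized by the reference length."""
    return weighted_edit_distance(ref, hyp, sub_cost) / len(ref)

reference = ["k", "æ", "t"]
hypothesis = ["k", "æ", "d"]
standard_per = per(reference, hypothesis)
soft_per = per(reference, hypothesis, soft_cost)
```

Comparing the two rates per demographic group shows how much of a measured gap is attributable to near-miss substitutions rather than outright errors.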

2606.11635 2026-06-11 cs.CY cs.AI New submission

Are LLMs Bad at Moral Reasoning?

Menghang Zhu, Seth Lazar

AI summary: Re-evaluates the MoReBench dataset by having LLMs generate scoring rubrics instead of being scored against them, and finds that LLMs' moral reasoning is stronger than previously believed.

Abstract

For highly capable AI systems to operate safely in dynamic, open-ended environments, they must be able to identify, understand, and respond to moral reasons for action, and constrain their behaviour accordingly. A growing body of research aims to evaluate this capacity -- moral competence -- in today's most capable AI systems, recently reaching broadly pessimistic conclusions. One of the most ambitious such papers collects gold-standard human-authored rubrics for evaluating moral reasoning in 1,000 cases, and benchmarks frontier AI models against those rubrics, with underwhelming results. In this paper, we argue that the MoReBench dataset can be redeployed to give a much more optimistic picture of LLMs' moral reasoning (an essential part of moral competence). We show that if, instead of scoring LLMs' responses to these cases against these rubrics, we instead give the LLMs the same task given to humans -- to generate scoring rubrics for the moral analysis of particular cases -- the rubrics they generate are both better calibrated to the human rubrics than their open-ended responses, and, where they differ, plausibly reflect nothing more than the vast dimensionality of most moral problems, as well as highlighting some human departures from the "rubric for creating rubrics". Taking these points into consideration, the MoReBench dataset suggests that LLMs are significantly more capable at moral reasoning than was previously believed.

2606.11634 2026-06-11 cs.AI New submission

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Kai Liu, Peijie Dong, Xinchen Xie, Jianfei Gao, Qipeng Guo, Xiaowen Chu, Shaoting Zhang, Kai Chen

AI summary: Proposes SWARR, which efficiently converts a pretrained self-attention model to sliding-window attention through supervised fine-tuning and then applies reinforcement-learning policy adaptation, narrowing the performance gap to self-attention while retaining the efficiency of linear-complexity attention.

Abstract

The rapid progress of reasoning and agentic large language models (LLMs) has increased the demand for long-context inference, but self-attention (SA) scales quadratically with context length. To address this, we study SWARR (Sliding-Window Attention with Reinforced Adaptation for Math Reasoning), a practical recipe for adapting SWA models to mathematical reasoning. SWARR has two stages: (1) efficient conversion from a pretrained SA model to SWA with supervised fine-tuning (SFT), which avoids pretraining a new base model, and (2) policy adaptation with reinforcement learning (RL). We find that SWA still underperforms SA after SFT, and we hypothesize that this gap is caused in part by a data-architecture mismatch: most SFT data are prepared for SA models and may contain long-range dependencies that are difficult for SWA to model. Because on-policy RL optimizes self-generated trajectories under the SWA constraint, it can adapt trajectories to better match SWA. Experiments on mathematical reasoning benchmarks show that this recipe substantially narrows the gap between SWA and SA, recovering much of the accuracy lost during SWA conversion while preserving the efficiency benefits of linear-complexity attention. Our central contribution is the empirical finding that RL changes the conclusion one would draw from conversion and SFT alone about SWA's viability for math reasoning.
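The efficiency claim follows from the shape of the attention mask. The sketch below builds a causal sliding-window mask and counts attended query-key pairs for full causal attention versus a fixed window. It illustrates the general SWA mechanism rather than SWARR itself, and the function names are assumptions.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j (causal, last `window` tokens)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]

def attended_pairs(seq_len, window=None):
    """Number of query-key pairs evaluated; window=None means full causal attention."""
    if window is None:
        return seq_len * (seq_len + 1) // 2
    return sum(min(i + 1, window) for i in range(seq_len))

mask = sliding_window_mask(4, 2)
full_cost = attended_pairs(4096)
windowed_cost = attended_pairs(4096, window=512)
```

Full causal attention grows quadratically with sequence length, while the windowed count grows linearly once the sequence exceeds the window. The same mask also shows the limitation the abstract points to: a token more than `window` positions back is invisible to the current query within a single layer.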

2606.11632 2026-06-11 cs.CR cs.AI cs.DC cs.MA New submission

Sovereign Assurance Boundary: Certificate-Bound Admission for Agentic Infrastructure

Jun He, Deying Yu

AI summary: To address high-stakes operations on production resources by non-deterministic reasoning systems in agentic infrastructure, proposes the Sovereign Assurance Boundary (SAB), a certificate-bound runtime admission layer that compiles agent proposals into execution contracts bound to cryptographic evidence, enabling verifiable and revocable authorization control.

Comments
12 pages, 1 figure, 13 tables

Abstract

Agentic infrastructure introduces a critical control-plane authorization problem: non-deterministic reasoning systems can propose high-stakes mutations to production resources, yet existing security mechanisms -- such as identity and access management (IAM), policy engines, consensus protocols, and audit logs -- either enforce static, context-unaware permissions or merely record actions post-execution. This paper introduces the Sovereign Assurance Boundary (SAB), a certificate-bound runtime admission layer for autonomous execution authority. SAB intercepts agent proposals at an assurance airlock, compiles them into typed execution contracts $C$, and binds these contracts to cryptographic evidence digests $H(E)$ and policy versions. The contracts are then routed through consequence-aware certification paths. Upon successful admission, the system emits a signed Sovereign Assurance Certificate ($\Omega$) that is strictly scoped to a specific execution identity, revocation epoch, and validity window. Finally, a sovereign execution broker verifies $\Omega$ and performs fresh pre-execution revocation and drift checks before invoking infrastructure APIs. We detail the airlock-broker architecture, formalize its admission and revocation invariants, and report preliminary feasibility measurements from a Go prototype evaluated over 2,500 admission attempts. Ultimately, this broker-enforced model prevents autonomous reasoning from directly mutating state, transforming delegated execution authority into a cryptographically verifiable, evidence-bound, revocable, and replayable runtime artifact.
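The issue-then-verify flow described above can be sketched in a few lines. This is not the authors' Go prototype: the field names, the JSON canonicalization, and above all the use of an HMAC as the "signature" are simplifying assumptions (a real deployment would more plausibly use asymmetric signatures so the broker cannot mint certificates). The sketch only shows how a certificate can bind a contract digest, an evidence digest, a policy version, an identity, a revocation epoch, and a validity window, and how a broker re-checks them before acting.

```python
import hashlib
import hmac
import json

def digest(obj):
    """SHA-256 over a canonical JSON encoding of the object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def sign(key, body):
    """Stand-in signature: HMAC-SHA256 over the canonical certificate body."""
    return hmac.new(key, json.dumps(body, sort_keys=True).encode(), hashlib.sha256).hexdigest()

def issue_certificate(key, contract, evidence, policy_version, identity,
                      epoch, not_before, not_after):
    body = {
        "contract_digest": digest(contract),
        "evidence_digest": digest(evidence),
        "policy_version": policy_version,
        "identity": identity,
        "epoch": epoch,
        "not_before": not_before,
        "not_after": not_after,
    }
    return {"body": body, "signature": sign(key, body)}

def broker_admits(key, cert, contract, identity, current_epoch, now):
    """Pre-execution checks: signature, contract drift, identity, revocation epoch, time window."""
    body = cert["body"]
    return (
        hmac.compare_digest(sign(key, body), cert["signature"])
        and body["contract_digest"] == digest(contract)
        and body["identity"] == identity
        and body["epoch"] == current_epoch
        and body["not_before"] <= now <= body["not_after"]
    )

key = b"demo-key"
contract = {"action": "scale", "resource": "db-1", "replicas": 3}
cert = issue_certificate(key, contract, {"ticket": "T-1"}, "v7", "agent-42", 5, 100, 200)
```

Bumping the revocation epoch invalidates every outstanding certificate at once, and any change to the contract after admission changes its digest and is rejected as drift.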

2606.11631 2026-06-11 eess.AS cs.SD 新提交

Benchmarking Neural Speech Compression from a Rate-Distortion Perspective

从率失真角度基准测试神经语音压缩

Jun Xu, Zhengxue Cheng, Fengxi Zhang, Yuhan Liu, Li Song, Wenjun Zhang

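AI summary: Proposes ECC, an entropy-constrained codec that combines scalar quantization with a learned entropy model and achieves a better low-bitrate rate-distortion trade-off than conventional and neural codec baselines.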

详情
AI中文摘要

基于学习的语音压缩在低比特率性能上取得了有前景的成果,但许多神经语音编解码器仍使用预设速率的离散符号描述量化潜变量,或仅在符号生成后应用熵编码。这种设计将表示学习与概率建模解耦,限制了它们利用学习到的语音潜变量的非均匀使用和时间依赖性的能力。本文从率失真角度基准测试神经语音压缩,并进一步研究用于低比特率语音压缩的熵约束编码。我们首先制定了一个统一的基于学习的语音编码流程,并对最近的神经语音编解码器进行了基准测试风格的分析,表明显式概率建模在学习语音压缩中仍未得到充分探索。然后,我们提出了ECC,一种熵约束编解码器,它将标量量化与学习熵模型相结合。ECC集成了基于超先验的边信息、通道上下文建模、潜变量残差预测和轻量级时间建模,以在训练期间估计用于率估计的潜变量似然,并在推理期间进行算术编码。为了进一步提高低比特率效率,ECC引入了熵跳跃,它使用解码器可用的尺度估计省略高度可预测的残差符号,而无需传输额外的跳跃掩码。大量实验表明,ECC在低比特率下实现了优于传统和神经编解码器基线的率失真权衡,在两个广泛使用的测试集上,平均BD-rate在ViSQOL上降低39.9%,在PESQ上降低76.3%。消融和诊断研究进一步验证了熵建模的有效性。项目页面:此 https URL

英文摘要

Learning-based speech compression has achieved promising low-bitrate performance, but many neural speech codecs still describe quantized latents with preset-rate discrete symbols or apply entropy coding only after symbol generation. Such designs decouple representation learning from probability modeling, limiting their ability to exploit the non-uniform usage and temporal dependencies of learned speech latents. In this paper, we benchmark neural speech compression from a rate--distortion perspective and further investigate entropy-constrained coding for low-bitrate speech compression. We first formulate a unified learning-based speech coding pipeline and provide a benchmark-style analysis of recent neural speech codecs, showing that explicit probability modeling remains underexplored in learned speech compression. We then propose ECC, an Entropy-Constrained Codec that combines scalar quantization with a learned entropy model. ECC integrates hyperprior-based side information, channel-wise context modeling, latent residual prediction, and lightweight temporal modeling to estimate latent likelihoods for rate estimation during training and arithmetic coding during inference. To further improve low-bitrate efficiency, ECC introduces entropy skip, which omits highly predictable residual symbols using decoder-available scale estimates without transmitting additional skip masks. Extensive experiments show that ECC achieves a favorable low-bitrate rate--distortion trade-off over conventional and neural codec baselines, reducing BD-rate by 39.9% on ViSQOL and 76.3% on PESQ on average over two widely-used test sets. Ablation and diagnostic studies further validate the effectiveness of entropy modeling. Project Page: this https URL
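
The two ideas at the core of the abstract, estimating rate from latent likelihoods and skipping symbols the decoder can predict on its own, can be illustrated with a toy sketch. It assumes a Gaussian entropy model discretized to unit-width bins and a fixed scale threshold for skipping; the function names and the `skip_threshold` value are hypothetical, and ECC's actual skip criterion and arithmetic coder are not reproduced here.

```python
import math


def gaussian_cdf(x, mean, scale):
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def symbol_bits(q, scale):
    """Ideal code length (bits) of integer residual q under a zero-mean
    Gaussian with the given scale, discretized to unit-width bins."""
    p = gaussian_cdf(q + 0.5, 0.0, scale) - gaussian_cdf(q - 0.5, 0.0, scale)
    return -math.log2(max(p, 1e-9))


def encode_with_skip(latents, means, scales, skip_threshold=0.1):
    """Quantize residuals y - mu; omit symbols whose predicted scale is tiny.
    No skip mask is sent: the decoder recomputes the same scales."""
    symbols, total_bits = [], 0.0
    for y, mu, s in zip(latents, means, scales):
        if s < skip_threshold:
            continue  # highly predictable: decoder will substitute mu
        q = round(y - mu)
        symbols.append(q)
        total_bits += symbol_bits(q, s)
    return symbols, total_bits


def decode_with_skip(symbols, means, scales, skip_threshold=0.1):
    """Rebuild latents, using the mean wherever the encoder skipped."""
    stream = iter(symbols)
    return [mu if s < skip_threshold else mu + next(stream)
            for mu, s in zip(means, scales)]


# Illustrative values only.
latents = [0.02, 1.9, -3.2]
means = [0.0, 0.5, -1.0]
scales = [0.05, 1.0, 2.0]
symbols, bits = encode_with_skip(latents, means, scales)
reconstructed = decode_with_skip(symbols, means, scales)
```

Because the skip decision depends only on `scales`, which both sides can compute, the first latent costs zero bits and needs no side information; `bits` is the quantity a rate term in a training loss would estimate.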

2606.11629 2026-06-11 math.DS cs.LG 新提交

Integral Formulation of QENDy for Robust Nonlinear System Identification

QENDy的积分形式用于鲁棒非线性系统辨识

Nikhil Saran, Sushant Pokhriyal, Stefan Klus, Rushikesh Kamalapurkar, Joel A. Rosenfeld

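AI summary: Proposes an integral formulation of the QENDy method that avoids time derivatives, improving robustness to noise and yielding a more robust way to learn nonlinear dynamics.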

详情
AI中文摘要

本文提出了新定义的非线性系统二次嵌入方法(QENDy)的积分形式。在原始算法中,使用了轨迹数据点及其时间导数。计算时间导数的方法使算法对噪声敏感。我们的积分形式不使用时间导数,从而得到一种更鲁棒的动力学学习方法。

英文摘要

This manuscript proposes an integral formulation of the newly defined quadratic embedding method for identifying nonlinear systems (QENDy). In the original algorithm, trajectory data points along with their time derivatives are used. Methods for calculating time derivatives make the algorithm sensitive to noise. Our integral formulation does not use the time derivatives. This results in a more robust method to learn the dynamics.
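
The difference between a derivative-based and an integral-based fit can be seen on a scalar toy problem. The sketch below is not QENDy itself (no quadratic embedding is involved); it assumes the simple model x' = a x and compares a least-squares fit on finite-difference derivatives with a fit on the integrated identity x(t_k) - x(t_0) = a ∫ x ds, evaluated with the trapezoid rule. All names are hypothetical.

```python
import math


def trapezoid(values, dt):
    """Trapezoid-rule integral of uniformly sampled values."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def fit_derivative_form(x, dt):
    """Least squares for x' = a * x using forward-difference derivatives."""
    dx = [(x[k + 1] - x[k]) / dt for k in range(len(x) - 1)]
    num = sum(d * v for d, v in zip(dx, x[:-1]))
    den = sum(v * v for v in x[:-1])
    return num / den


def fit_integral_form(x, dt):
    """Least squares for x(t_k) - x(t_0) = a * integral of x over [t_0, t_k].
    Only the samples themselves are used; no derivative is estimated."""
    lhs = [x[k] - x[0] for k in range(1, len(x))]
    rhs = [trapezoid(x[: k + 1], dt) for k in range(1, len(x))]
    num = sum(l * r for l, r in zip(lhs, rhs))
    den = sum(r * r for r in rhs)
    return num / den


# Noise-free trajectory of x' = -0.5 x, sampled every 0.01 time units.
dt = 0.01
trajectory = [math.exp(-0.5 * k * dt) for k in range(501)]
a_derivative = fit_derivative_form(trajectory, dt)
a_integral = fit_integral_form(trajectory, dt)
```

On clean data both estimates land close to -0.5. The motivation for the integral form is that differencing noisy samples divides the noise by `dt`, whereas integration averages it.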

2606.11620 2026-06-11 quant-ph cs.ET cs.LG 新提交

Family-Aware Residual Architecture for Predicting Quantum Circuit Simulation Performance

面向预测量子电路模拟性能的族感知残差架构

Honjar Xing, Yehong Jiang, Xianbang Wang, Zehua Wang, Zhicheng Jiang

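AI summary: Proposes a family-aware residual architecture that uses circuit-family classification and algorithm fingerprint features to predict the minimum approximation threshold and runtime of quantum circuit simulation, reaching 79.5% exact threshold accuracy and an R² = 0.82 runtime correlation on circuits of 7 to 130 qubits across 10 algorithm families.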

详情
Comments
Accepted as a full paper at IEEE ISVLSI 2026 (QC-CSAA Workshop). To appear in IEEE Xplore. 6 pages, 1 figure, 2 tables
AI中文摘要

近似张量网络模拟器能够对超出精确方法范围的量子电路进行经典模拟,但选择最优近似参数(如键维阈值)仍然是一个成本高昂的试错过程。我们提出了一种族感知神经架构,仅根据电路的OpenQASM描述和执行上下文,即可预测实现目标保真度所需的最小近似阈值以及量子电路模拟的预期挂钟运行时间。我们的关键洞察是,来自不同算法族(例如QFT、Grover、VQE)的量子电路由于其不同的纠缠结构而表现出根本不同的模拟成本曲线。我们采用族条件残差校正——在共享骨干网络之上添加的、针对特定族的加性调整,借鉴了已建立的条件计算技术——使模型能够同时捕获通用电路属性和算法细微差别。该架构包含一个预训练的族分类器(准确率97.5%)和从门组成启发式算法导出的领域信息算法指纹特征。在跨越7-130量子比特、10个算法族的电路上评估,我们的系统实现了79.5%的精确阈值准确率(91.2%在一个阶梯内)和R²=0.82的运行时间相关性,推理时间约为50毫秒——取代了可能需要数分钟到数小时的试错模拟运行。消融研究证实,族感知建模提供了最大的单一性能改进(+3.2个百分点),验证了算法族是模拟成本预测的一等特征的假设。

英文摘要

Approximate tensor-network simulators enable classical simulation of quantum circuits beyond the reach of exact methods, but selecting optimal approximation parameters -- such as bond dimension thresholds -- remains a costly trial-and-error process. We present a family-aware neural architecture that predicts both the minimum approximation threshold required to achieve target fidelity and the expected wall-clock runtime for quantum circuit simulation, given only the circuit's OpenQASM description and execution context. Our key insight is that quantum circuits from different algorithmic families (e.g., QFT, Grover, VQE) exhibit fundamentally distinct simulation cost profiles due to their differing entanglement structures. We employ family-conditioned residual corrections -- additive, family-specific adjustments atop a shared backbone, drawing on established conditional computation techniques -- enabling the model to capture both universal circuit properties and algorithmic nuances. The architecture incorporates a pretrained family classifier (97.5% accuracy) and domain-informed algorithm fingerprint features derived from gate-composition heuristics. Evaluated on circuits spanning 7--130 qubits across 10 algorithm families, our system achieves 79.5% exact threshold accuracy (91.2% within one rung) and $R^2 = 0.82$ runtime correlation, with inference completing in approximately 50 ms -- replacing trial-and-error simulation runs that may take minutes to hours. Ablation studies confirm that family-aware modeling provides the single largest performance improvement (+3.2 percentage points), validating the hypothesis that algorithm family is a first-class feature for simulation cost prediction.
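
A family-conditioned residual correction amounts to "shared prediction plus a per-family additive adjustment". The sketch below shows only that structure, with linear maps standing in for the neural backbone and heads; the feature meanings, weights, and the five-rung threshold ladder are hypothetical and not taken from the paper.

```python
def linear(features, weights, bias):
    return sum(w * f for w, f in zip(weights, features)) + bias


def predict_rung(features, family, shared, family_heads, n_rungs):
    """Shared backbone score plus an additive, family-specific correction,
    snapped to the nearest rung of a discrete threshold ladder."""
    base = linear(features, *shared)
    head = family_heads.get(family)
    correction = linear(features, *head) if head is not None else 0.0
    score = base + correction
    rung = min(n_rungs - 1, max(0, round(score)))
    return rung, score


# Illustrative parameters only.
shared = ([0.5, 1.0], 0.0)                 # (weights, bias) of the shared backbone
family_heads = {"qft": ([0.25, 0.0], 0.5)} # per-family residual heads
features = [2.0, 1.0]

qft_rung, qft_score = predict_rung(features, "qft", shared, family_heads, n_rungs=5)
other_rung, other_score = predict_rung(features, "unknown", shared, family_heads, n_rungs=5)
```

A circuit from a family without its own head falls back to the shared prediction, which is one way to read "universal circuit properties and algorithmic nuances" in the abstract.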

2606.11614 2026-06-11 cs.LG cs.AI cs.CV 新提交

Information-Theoretic Decomposition for Multimodal Interaction Learning

多模态交互学习的信息论分解

Zequn Yang, Yake Wei, Haotian Ni, Zhihao Xu, Di Hu

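AI summary: Proposes DMIL, an information-theoretic decomposition method for multimodal interactions that uses a variational decomposition architecture and a fine-tuning strategy to learn sample-specific redundant, unique, and synergistic interactions, improving multimodal learning performance.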

详情
Comments
Accepted to CVPR 2026
AI中文摘要

多模态学习依赖于捕获跨模态的冗余、独特和协同信息,这些信息共同构成多模态交互。一个关键但尚未充分探索的挑战是,这些隐式交互在不同样本间动态变化。在这项工作中,我们首次进行了系统的信息论分析,强调了学习这些动态的、样本特定的交互对于有效多模态学习的重要性。我们的分析进一步揭示了传统范式在学习这些不同交互类型方面的缺陷:模态集成方法难以捕获协同,而联合学习范式往往未能充分利用冗余信息。这突显了对一种能够基于每个样本自适应地从不同交互类型中学习的方法的需求。为此,我们提出了基于分解的多模态交互学习(DMIL),一种显式建模并学习样本特定交互的新范式。首先,我们设计了一个变分分解架构来分离组成交互组件。其次,我们采用了一种新的学习策略,在微调过程中利用这些显式交互组件来实现全面的交互学习。跨不同任务和架构的大量实验表明,DMIL通过适应整体的样本特定交互,始终实现了优越的性能。我们的框架灵活且广泛适用,建立了一个以交互为中心的多模态学习范式。代码可在以下网址获取:此 https URL。

英文摘要

Multimodal learning hinges on capturing redundant, unique, and synergistic information across modalities, which collectively constitute multimodal interactions. A critical yet underexplored challenge is that these implicit interactions vary dynamically across samples. In this work, we present the first systematic, information-theoretic analysis highlighting why learning these dynamic, sample-specific interactions is critical for effective multimodal learning. Our analysis further reveals deficits in conventional paradigms at learning these distinct interaction types: modality ensemble approaches struggle to capture synergy, while joint learning paradigms often under-utilize redundant information. This highlights the need for an approach that can adaptively learn from different interaction types on a per-sample basis. To this end, we propose Decomposition-based Multimodal Interaction Learning (DMIL), a novel paradigm that explicitly models and learns from sample-specific interactions. First, we design a variational decomposition architecture to isolate the constituent interaction components. Second, we employ a new learning strategy that leverages these explicit interaction components in a fine-tuning process to achieve comprehensive interaction learning. Extensive experiments across diverse tasks and architectures demonstrate that DMIL consistently achieves superior performance by adapting to holistic sample-specific interactions. Our framework is flexible and broadly applicable, establishing an interaction-centric paradigm for multimodal learning. The code is available at this https URL.

2606.11613 2026-06-11 cs.IR cs.CL cs.HC cs.SI 新提交

Factions Within, Uncertain Across: Within-Document Reader Sub-Groups in Social Highlighting

内部派系,跨文档不确定:社交高亮中的文档内读者子群体

Kazuki Nakayashiki, Keisuke Watanabe

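AI summary: Using a margin-preserving curveball null, finds that readers within a document form strong sub-groups whose agreement far exceeds what shared salience predicts, with most of the excess coming from fine-grained reader-specific agreement; cross-document stability is left unresolved.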

详情
Comments
11 pages, 3 figures, 3 tables
AI中文摘要

当许多人高亮同一文档时,人群是单一共识,还是内部结构化为标记不同内容的读者子群体?这种结构是读者的稳定属性还是文档的属性?基于先前工作表明个体文档内高亮信号是低语而个体性存在于选择中,我们在一个共读平台上使用保留边界的曲线球零模型提出群体层面问题。实验1:在文档内,读者形成强子群体——配对一致性远超共享显著性、标记密度和句子流行度所预测的(最近邻一致性z=+6.3,在88%的文档中显著)。在八块区域保留零模型下,与文档相同粗略区域的共享参与解释了约40%的额外一致性;大部分以更细粒度的读者特定一致性存在(z=+3.6,77%显著)。因此,文档内人群在描述意义上是派系化的。实验2:这种分组是稳定的读者特质吗?这里我们诚实地面对统计功效。配对一致性的跨文档分半可重复性在合并后接近零(两个独立抽取样本中分别为+0.078和0.000),功效校准表明该检验仅对共读许多文档的配对有信息。在唯一有信息的高重叠子集(k>=4)中,点估计为正但小样本,在独立抽取样本间不精确,从未显著,并在区域保留零模型下衰减。因此,我们未解决跨文档稳定性:数据与从情境分组到弱至中等稳定读者特质的一切一致。人群在文档内是派系化的;这些派系是否随读者跨文档迁移,诚实地讲,超出了我们的能力范围。

英文摘要

When many people highlight the same document, is the crowd a single consensus, or is it internally structured into reader sub-groups that mark different things -- and is that structure a stable property of a reader or of the document? Building on prior work showing an individual's within-document highlighting signal is a whisper while individuality lives in selection, we ask the group-level question on a co-readership platform using a margin-preserving curveball null. Experiment 1: within a document, readers form strong sub-groups -- pairs agree far beyond what shared salience, mark density, and sentence popularity predict (nearest-neighbour agreement z=+6.3, significant in 88% of documents). Under an eight-block region-preserving null, shared engagement with the same coarse regions of the document accounts for about 40% of this excess; the majority survives as finer reader-specific agreement (z=+3.6, 77% significant). So the within-document crowd is, in a descriptive sense, factional. Experiment 2: is that grouping a stable reader trait? Here we are honest about power. The cross-document split-half reproducibility of a pair's agreement is near zero pooled (+0.078 and 0.000 in two separately drawn samples), and a power calibration shows the test is informative only for pairs that co-read many documents. In the only informative high-overlap subset (k>=4), point estimates are positive but small-sample, imprecise across the separately drawn samples, never significant, and attenuate under the region-preserving null. We therefore leave cross-document stability unresolved: the data is consistent with anything from situational grouping to a weak-to-moderate stable reader trait. The crowd is factional within a document; whether its factions follow the reader across documents is, honestly, beyond our reach.
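
A margin-preserving curveball null randomizes a binary reader-by-sentence matrix while keeping every reader's number of marks and every sentence's popularity fixed. The sketch below implements the generic curveball trade on sets; it is an illustration under that assumption, not the authors' code, and the eight-block region-preserving variant is not shown. Function names are hypothetical.

```python
import random


def curveball_trade(rows, rng):
    """One trade: two readers pool the sentences only one of them marked,
    shuffle the pool, and split it back so each keeps the same mark count."""
    i, j = rng.sample(range(len(rows)), 2)
    only_i = rows[i] - rows[j]
    only_j = rows[j] - rows[i]
    shared = rows[i] & rows[j]
    pool = sorted(only_i | only_j)
    rng.shuffle(pool)
    rows[i] = shared | set(pool[: len(only_i)])
    rows[j] = shared | set(pool[len(only_i):])


def curveball_null(rows, n_trades, seed=0):
    """Return a randomized copy of the matrix after n_trades trades."""
    rng = random.Random(seed)
    rows = [set(r) for r in rows]
    for _ in range(n_trades):
        curveball_trade(rows, rng)
    return rows


def margins(rows):
    """Row sums (marks per reader) and column sums (readers per sentence)."""
    row_sums = [len(r) for r in rows]
    col_sums = {}
    for r in rows:
        for c in r:
            col_sums[c] = col_sums.get(c, 0) + 1
    return row_sums, col_sums


# Each set holds the sentence indices one reader highlighted (illustrative data).
observed = [{0, 1, 2}, {1, 3}, {0, 3, 4}, {2, 4, 5}]
null_sample = curveball_null(observed, n_trades=200, seed=1)
```

Comparing an agreement statistic on `observed` with its distribution over many `null_sample` draws gives the kind of z-score the abstract reports, since anything explained by mark density or sentence popularity alone is already present in the null.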

2606.11605 2026-06-11 cs.LG cs.AI 新提交

Physics-Distilled Neural Network enabled by Large Language Models for Manufacturing Process-Property Predictive Modeling

基于大语言模型的物理蒸馏神经网络用于制造过程-性能预测建模

Ge Song, Kiarash Naghavi Khanghah, Anandkumar Patel, Rajiv Malhotra, Hongyi Xu

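AI summary: Proposes a knowledge distillation framework that uses large language models to extract physics priors from the literature, captures variable dependencies with a Graph-Masked Attention layer, and distills them into a lightweight student model, achieving high-accuracy prediction under data scarcity and real-time deployment.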

详情
Comments
Under review, Journal of Computing and Information Science in Engineering
AI中文摘要

预测制造过程中的过程-性能关系常面临高实验成本和复杂'黑箱'模型可解释性有限的挑战。本文提出一种新颖的知识蒸馏框架,旨在数据稀缺场景下实现高精度预测。该框架将分析性物理先验(通过大语言模型从科学文献中系统提取)集成到特权教师模型中。我们采用图掩码注意力层来捕获输入变量间复杂的物理依赖关系,这些变量表现为严格设定点或静态与高频时间特征的组合。这种特权知识被蒸馏到轻量级学生预测器中进行推理。通过在五种不同制造过程中的综合实验,评估了该框架的可行性和鲁棒性。为确保统计可靠性,鉴于数据集规模较小,采用重复K折交叉验证技术来量化模型稳定性和泛化能力。结果表明,所提框架在所有评估领域均持续实现高预测精度。最重要的是,该架构表现出显著的容错性,即使在LLM推导的分析先验次优或不完整的情况下,也能保持稳健的预测性能。此外,学生预测器的推理频率超过6000 Hz,便于在标准工业硬件上进行实时边缘部署。这项工作为在数据受限环境下弥合理论物理与实时工业监测之间的差距提供了可扩展的解决方案。

英文摘要

Predicting process-property relationships in manufacturing is often challenged by high experimental costs and the limited interpretability of complex 'black-box' models. This paper proposes a novel knowledge distillation framework designed to achieve high-accuracy predictions in data-scarce scenarios. The framework integrates analytical physics priors, which are systematically extracted from scientific literature via Large Language Models, into a privileged teacher model. We employ a Graph-Masked Attention layer to capture the complex physical dependencies among input variables showing strict setpoints or a combination of static and high-frequency temporal signatures. This privileged knowledge is distilled into a lightweight student predictor for inference. The feasibility and robustness of the framework are evaluated through a comprehensive experiment across five diverse manufacturing processes. To ensure statistical reliability, given the small dataset sizes, a repeated K-fold cross-validation technique is employed to quantify model stability and generalization. Results indicate that the proposed framework consistently achieves high predictive accuracy across all evaluated domains. Most importantly, the architecture demonstrates significant fault tolerance by maintaining robust predictive performance even in scenarios where LLM-derived analytical priors are suboptimal or incomplete. Furthermore, the student predictor achieves an inference frequency exceeding 6000 Hz, which facilitates real-time edge deployment on standard industrial hardware. This work provides a scalable solution for bridging the gap between theoretical physics and real-time industrial monitoring in data-limited environments.

2606.11596 2026-06-11 eess.SY cs.AI 新提交

Model-Based and Data-Driven Hierarchical Control and Topology Co-Design for Robust Networked Systems

基于模型和数据驱动的鲁棒网络系统分层控制与拓扑协同设计

Shirantha Welikala, Zihao Song, Hai Lin, Panos J. Antsaklis

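AI summary: For networked systems composed of linear subsystems, proposes hierarchical control strategies, one model-based and one relying only on trajectory data, that combine dissipativity theory with linear matrix inequalities to guarantee local and global dissipativity and optimize the topology, applied to robust voltage regulation and current sharing in DC microgrids.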

详情
Comments
To be submitted to Automatica
AI中文摘要

本文考虑一类由相互连接的线性子系统、扰动输入和性能输出构成的网络系统。利用耗散性理论,我们首先提出一种基于模型的分层控制设计策略,确保闭环网络系统从扰动输入到性能输出是耗散的。这包括为每个子系统设计局部控制器以强制执行局部耗散性保证,然后利用这些保证协同设计分布式全局控制器和互连拓扑,以在优化互连拓扑成本的同时强制执行全局耗散性保证。整个设计过程仅需求解一系列线性矩阵不等式(LMI)问题,从而保持组合性和可分散性,同时避免低效且集中的非凸迭代设计过程。这种基于模型的分层控制设计策略假设已知子系统动力学,这在许多实际网络系统中可能不成立。受此启发,我们还提出了一种数据驱动的分层控制设计策略,该策略仅假设子系统可获取丰富的输入-状态-输出轨迹数据。所提出的数据驱动设计过程假设影响子系统动力学的未知扰动受二次矩阵不等式约束(放宽了常规界限),并通过使用矩阵S引理来考虑这一点。最后,以直流微电网网络系统为例,验证了所提出的基于模型和数据驱动的分层控制设计在实现鲁棒(耗散)电压调节和电流共享方面的有效性。

英文摘要

In this paper, we consider a class of networked systems comprising an interconnected set of linear subsystems, disturbance inputs, and performance outputs. Using dissipativity theory, we first propose a model-based hierarchical control design strategy to ensure the closed-loop networked system is dissipative from its disturbance inputs to performance outputs. This involves designing local controllers for each subsystem to enforce local dissipativity guarantees, which are then exploited to co-design distributed global controllers and the interconnection topology to enforce global dissipativity guarantees while optimizing interconnection topology costs. The overall design process requires only solving a sequence of linear matrix inequality (LMI) problems, thereby retaining compositionality and decentralizability while avoiding non-convex, iterative design processes that are inefficient and centralized. This model-based hierarchical control design strategy assumes the knowledge of the subsystem dynamics, which may not hold in many real-world networked systems. Motivated by this, we also propose a data-driven hierarchical control design strategy that assumes only the availability of rich input-state-output trajectory data from the subsystems. The proposed data-driven design process assumes that the unknown disturbances affecting the subsystem dynamics are bounded by a quadratic matrix inequality (relaxing conventional bounds) and accounts for this by using the matrix S-lemma. Finally, the effectiveness of the proposed model-based and data-driven hierarchical control designs is illustrated for a networked system representing a DC microgrid, with the aim of enforcing robust (dissipative) voltage regulation and current sharing.

2606.11581 2026-06-11 eess.AS cs.SD 新提交

Sensitivity Analysis of Generative Spatial Audio Metrics: A Study on Responsiveness, Smoothness, and Symmetry

生成式空间音频指标的敏感性分析:响应性、平滑性和对称性研究

Purnima Kamath, Adrian S. Roman, Koichi Saito, Yuki Mitsufuji, Juan P. Bello

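AI summary: Proposes a framework for analyzing how sensitive generative spatial audio metrics are to changes in spatial parameters, defines three desiderata (Responsiveness, Smoothness, and Symmetry), and finds that among standard metrics FAD with localization-specific embeddings and acoustic maps perform best.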

详情
Comments
Accepted for publication at Interspeech 2026
AI中文摘要

由于对指标如何响应方位角和仰角等空间参数变化的理解有限,评估一阶环绕声(FOA)的生成式空间音频仍然具有挑战性。我们借鉴参数化声音合成中的敏感性分析原理,提出了一个沿连续空间轨迹分析指标敏感性的框架。通过使用复杂度递增的受控FOA场景,我们定义了指标行为的三个期望属性:响应性、平滑性和对称性。我们评估了标准基于分布和基于样本的指标,包括Fréchet音频距离(FAD)、强度向量和声学地图。我们的发现表明,使用定位特定嵌入和声学地图的FAD在不同条件下具有高响应性以及稳健的平滑性和对称性,而强度向量随着场景复杂度的增加而退化。这是研究生成式空间音频指标敏感性的第一步。

英文摘要

Evaluating generative spatial audio for First-Order Ambisonics (FOA) remains challenging due to a limited understanding of how metrics respond to changes in spatial parameters such as azimuth and elevation. We propose a framework to analyze metric sensitivity along continuous spatial trajectories, drawing on principles of sensitivity analysis in parametric sound synthesis. Using controlled FOA scenes with increasing scene complexity, we define three desiderata for metric behavior: Responsiveness, Smoothness, and Symmetry. We assess standard distribution-based and sample-based metrics, including Fréchet Audio Distance (FAD), intensity vectors, and acoustic maps. Our findings show that FAD using localization-specific embeddings and acoustic maps yield high Responsiveness and robust Smoothness and Symmetry across conditions, while intensity vectors degrade with increasing scene complexity. This is the first step towards investigating the sensitivity of metrics for generative spatial audio.

2606.11578 2026-06-11 cs.CV 新提交

Contactless 3D Human Body Measurement Using Depth Cameras for Smart Health Monitoring

基于深度相机的非接触式3D人体测量用于智能健康监测

Martha Asare, Xuan Wang, Juan Lopez Alvarenga, Lois Akosua Serwaa, Jinghao Yang

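AI summary: Proposes a contactless body measurement framework based on a depth camera and 3D point clouds, using spatial filtering, landmark selection, and voxel and mesh analysis to estimate key quantities such as height, arm span, volume, and surface area.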

详情
Comments
6 pages, 4 figures. Depth camera-based framework for contactless anthropometric measurement and geometric analysis using 3D point clouds
AI中文摘要

非接触式人体测量技术对于智能健康监测、数字健康应用和远程患者评估日益重要。传统的人体测量通常需要物理接触和训练有素的人员,这可能限制其在远程医疗环境中的可扩展性。在本研究中,我们介绍了一种基于深度相机的框架,利用3D点云数据估计人体测量值。使用Orbbec Astra 2深度相机捕获参与者的RGB图像、深度图和3D点云。利用基于Python的工具(包括Open3D、NumPy和OpenCV)处理捕获的点云,将人体从背景中分割出来。计算关键的人体测量值,如身高和臂展。通过3D点云上的空间滤波和地标选择组合获得测量值,然后利用相机内参将计算出的测量值投影到对应的RGB图像上。除了线性测量外,还使用基于体素的占用分析和基于网格的表面重建方法估计了近似身体体积和可见表面积。单次深度捕获的实验结果表明,无需物理接触即可从深度相机数据中获得准确的人体测量值和几何估计。本研究为未来将深度感知与智能健康监测和生成式AI模型相结合的实时系统奠定了基础,用于智能医疗应用。

英文摘要

Contactless body measurement technologies are becoming increasingly significant for smart health monitoring, digital health applications, and remote patient assessment. Traditional anthropometric measurements typically necessitate physical contact and trained personnel, which may constrain scalability in remote healthcare settings. In this study, we introduce a depth camera-based framework for estimating human body measurements utilizing 3D point cloud data. An Orbbec Astra 2 depth camera was employed to capture RGB images, depth maps, and 3D point clouds of participants. The captured point cloud was processed using Python-based tools, including Open3D, NumPy, and OpenCV, to segment the human body from the background. Key anthropometric measurements, such as height and arm span, were computed. The measurements were obtained through a combination of spatial filtering and landmark selection on the 3D point cloud, followed by the projection of the computed measurements onto the corresponding RGB image using camera intrinsic parameters. In addition to linear measurements, the approximate body volume and visible surface area were estimated using voxel-based occupancy analysis and mesh-based surface reconstruction methods. The experimental results from a single depth capture demonstrated that accurate body measurements and geometric estimates could be obtained from depth camera data without physical contact. This study provides a foundation for future real-time systems that integrate depth sensing with intelligent health monitoring and generative AI models for smart healthcare applications.
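
The geometric steps named in the abstract (linear extents from a segmented point cloud, voxel occupancy for volume, and projection onto the RGB image with camera intrinsics) can be sketched in plain Python. This is a simplified illustration: it assumes the body is already segmented, uses bounding extents instead of the paper's landmark selection, and all function names and numeric values are hypothetical. The authors' pipeline uses Open3D, NumPy, and OpenCV, which are not reproduced here.

```python
import math


def height_and_span(points):
    """Vertical and horizontal extents of a segmented body cloud.
    points: list of (x, y, z) in metres in the camera frame."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(ys) - min(ys), max(xs) - min(xs)


def voxel_volume(points, voxel):
    """Approximate volume as (number of occupied voxels) * voxel^3."""
    occupied = {
        (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        for x, y, z in points
    }
    return len(occupied) * voxel ** 3


def project_to_pixel(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point onto the image plane."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


# Illustrative four-point "cloud": feet, head, and two fingertips.
cloud = [(0.0, 0.0, 2.0), (0.0, 1.7, 2.0), (-0.8, 1.4, 2.0), (0.8, 1.4, 2.0)]
height, span = height_and_span(cloud)
volume = voxel_volume(cloud, voxel=0.5)
pixel = project_to_pixel((0.5, 0.25, 2.0), fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

With a single depth capture only the visible side of the body contributes points, so an occupancy count like this reflects the observed surface rather than the full body volume.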

2606.11570 2026-06-11 stat.ML cs.LG stat.ME 新提交

Enhancing Spectral Embedding through Robust and Flexible Knowledge Transfer in Electronic Health Records

通过电子健康记录中的鲁棒且灵活的知识迁移增强谱嵌入

Feiqing Huang, Zongqi Xia, Rong Ma, Tianxi Cai

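AI summary: Proposes a spectral-based unsupervised representation learning framework that extracts a knowledge matrix from a broader population and relaxes signal-alignment assumptions to produce low-dimensional embeddings for rare-disease cohorts, outperforming existing methods in simulations and on real multiple sclerosis data.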

详情
AI中文摘要

我们提出了一种基于谱的无监督表示学习框架,用于从电子健康记录中为罕见病队列的临床概念和患者导出低维嵌入,其中数据是高维的但样本量有限。为了克服这一挑战,我们引入了一个从更广泛人群中提取的知识矩阵,该矩阵与罕见病队列共享部分重叠的子空间。我们的方法不同于现有方法,它放松了潜在数据矩阵和知识矩阵之间严格的一对一信号对齐假设,允许更灵活和现实的结构化共享形式。我们引入了一种新颖的两步谱嵌入过程:首先,我们从知识矩阵中识别并移除不相关的成分;然后,我们应用基于投影的方法分别恢复共享和异质成分。模拟和对真实世界多发性硬化症队列的分析表明,所提出的方法优于竞争方法,特别是在共享信号较弱且仅部分对齐的挑战性场景中,这在罕见病数据中很常见。

英文摘要

We propose a spectral-based, unsupervised representation learning framework to derive low-dimensional embeddings for clinical concepts and patients in rare disease cohorts from electronic health records, where data are high-dimensional but sample sizes are limited. To overcome this challenge, we incorporate a knowledge matrix extracted from a broader population that shares a partially overlapping subspace with the rare-disease cohort. Our method departs from existing approaches by relaxing restrictive one-to-one signal-alignment assumptions between the latent data matrix and knowledge matrix, allowing more flexible and realistic forms of structured sharing. We introduce a novel two-step spectral embedding procedure: first, we identify and remove irrelevant components from the knowledge matrix; then, we apply a projection-based method to separately recover shared and heterogeneous components. Simulations and an analysis of a real-world multiple sclerosis cohort show that the proposed method outperforms competing approaches, particularly in challenging scenarios where shared signals are weak and only partially aligned, as is common in rare-disease data.

2606.11563 2026-06-11 cs.CV cs.RO 新提交

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

自然环境中机器人感知的跨模态基准测试

David Hall, Joshua Knights, Mark Cox, Peyman Moghadam

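AI summary: Addressing the challenges of robotic perception in natural environments, presents the WildCross cross-modal benchmark for place recognition and metric depth estimation in large-scale natural scenes, with expanded metric depth estimation experiments.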

详情
Comments
Accepted to the IEEE ICRA Workshop on Open Challenges for Rigorous Robot Perception 2026
AI中文摘要

自然环境对机器人感知系统提出了复杂挑战。当前模型,特别是视觉基础模型,主要在有结构的城市环境中训练,导致其在野外机器人任务的感知中存在弱点。我们利用最近发布的WildCross基准展示了当前模型的局限性,这是一个用于大规模自然环境中地点识别和度量深度估计的新型跨模态基准。WildCross包含超过476K个顺序RGB帧,带有半稠密深度和表面法线标注,每个帧都与准确的6DoF姿态和同步的稠密激光雷达子地图对齐。在这项工作中,我们提供了对最近WildCross基准结果的扩展分析,特别强调扩展的度量深度估计实验。本工作的代码仓库和数据集可在https://csiro-robotics.github.io/WildCross获取。

英文摘要

Natural environments present a complex challenge to robotics perception systems. Current models, particularly vision foundation models, are largely trained on structured, urban environments leading to weaknesses in their perception for field robotics tasks. We showcase the limitations of current models using our recently released WildCross benchmark, a new cross-modal benchmark for place recognition and metric depth estimation in large-scale natural environments. WildCross comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF pose and synchronized dense lidar submaps. In this work, we provide an expanded analysis of the benchmark results from the recent WildCross benchmark, with particular emphasis on expanded metric depth estimation experiments. Access to the code repository and dataset for this work can be found at https://csiro-robotics.github.io/WildCross.

2606.11560 2026-06-11 cs.DB cs.AI 新提交

LLMs+Graphs: Toward Graph-Native, Synergistic AI Systems

LLMs+Graphs:迈向图原生的协同人工智能系统

Arijit Khan, Longxu Sun, Xin Huang

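AI summary: Surveys three synergies between large language models and graph computation (graph-augmented reasoning, bidirectional integration with knowledge graphs, and AI agents strengthened by graph algorithms), discusses new capabilities for graph data management and graph machine learning, and aims to give a unified perspective for building next-generation graph-native AI systems.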

详情
Comments
10 pages, Accepted at PAKDD 2026 Tutorial
AI中文摘要

大语言模型(LLMs)发展迅速,但它们在结构化和多跳推理方面的局限性凸显了对图原生、协同人工智能(AI)系统的需求。图结构数据支撑着社交、生物、金融、交通、网络和知识领域的关键应用,因此理解LLMs如何利用图计算进行基于上下文的扎实推理至关重要。三种互补的协同方式正在涌现:通过图计算增强LLMs进行检索和推理;LLMs与知识图谱(KGs)的双向集成,其中LLMs支持KG构建和整理,而KGs强制执行语义约束和事实一致性;以及通过图算法增强的AI代理进行规划、决策和多步推理。同时,LLMs通过自然语言接口和混合LLM-图神经网络(GNN)流水线,为图数据管理和图机器学习(ML)引入了新能力。本教程综合了推动这些融合方向的算法、系统和设计原则,为数据科学和数据挖掘研究人员提供了将LLMs、图数据管理、图挖掘、图ML和代理计算集成到下一代图原生AI系统中的统一视角。

英文摘要

Large Language Models (LLMs) have advanced rapidly, but their limitations in structured and multi-hop reasoning underscore the need for graph-native, synergistic artificial intelligence (AI) systems. Graph-structured data underpins critical applications across social, biological, financial, transportation, web, and knowledge domains, making it essential to understand how LLMs can leverage graph computation for grounded, context-rich inference. Three complementary synergies are emerging: LLMs augmented with graph computation for retrieval and reasoning; bidirectional integration between LLMs and knowledge graphs (KGs), where LLMs support KG construction and curation while KGs enforce semantic constraints and factual consistency; and AI agents strengthened by graph algorithms for planning, decision making, and multi-step reasoning. In parallel, LLMs introduce new capabilities for graph data management and graph machine learning (ML) through natural language interfaces and hybrid LLM-graph neural network (GNN) pipelines. This tutorial synthesizes the algorithms, systems, and design principles driving these converging directions, offering data science and data mining researchers a unified perspective on integrating LLMs, graph data management, graph mining, graph ML, and agentic computation into next-generation graph-native AI systems.

2606.11556 2026-06-11 cs.CR cs.AI cs.LG 新提交

Privacy-Preserving Federated Autoencoder for ECG Anomaly Detection on Edge Devices

面向边缘设备上心电图异常检测的隐私保护联邦自编码器

Kaan Arda Akyol, Jakub Kacper Szeląg, Aydin Abadi, Maha Alghamdi, Ghadah Albalawi, Ghouse Ibrahim Kaleelullah, Hilal Tutus, Sarah Al Subaiei, Shardul Kapse, Syed Mohammed Raheeb, Mujeeb Ahmed, Rehmat Ullah

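AI summary: Proposes an end-to-end system combining federated learning, differential privacy, and INT8 quantization for unsupervised 12-lead ECG anomaly detection on the PTB-XL dataset, meeting requirements on privacy, real-time inference, and non-IID data.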

详情
Comments
9 pages, 4 figures, 6 tables. Preprint prepared in IEEE conference format. Submitted to: FLTA 2026
AI中文摘要

连续心电图监测可以在心律异常演变为心血管事件之前发现它们。然而,一个可部署的系统必须同时满足三个要求:法律级别的隐私(GDPR、HIPAA)、在受限边缘硬件上的实时推理以及在非IID跨医院数据下的检测质量。我们设计并评估了一个端到端的联邦系统,在PTB-XL数据集上解决了无监督12导联ECG异常检测的所有三个要求,结合了三种自编码器家族(VanillaAE、ConvAE、VAE)、基于Flower的联邦平均(FedAvg)跨十个模拟医院、客户端差分隐私SGD(DP-SGD)与Rényi-DP会计,以及使用Raspberry Pi 4基准测试的8位整数(INT8)训练后量化。我们的主要贡献是:这些机制如何组合的经验性特征、实用的DP特定建议,以及针对临床敏感环境的技术和安全见解。联邦学习在所有架构上匹配或超过集中基线(ConvAE联邦ROC曲线下面积AUROC为0.782),并且ε扫描确定ε=4为推荐的临床操作点。INT8量化大致将模型大小减半,并将Pi 4延迟降低多达44%,AUROC损失小于0.12%。关键的是,DP和量化的惩罚在经验上是独立的,因此从业者不需要为了紧凑的边缘足迹而牺牲强大的隐私保证。据我们所知,这是第一个结合联邦学习、形式化(ε,δ)-DP、无监督重建检测和量化AArch64部署的系统。

英文摘要

Continuous electrocardiography (ECG) monitoring could surface rhythm abnormalities before they escalate into cardiovascular events. However, a deployable system must satisfy three requirements simultaneously: legal-grade privacy (GDPR, HIPAA), real-time inference on constrained edge hardware, and detection quality under non-IID cross-hospital data. We design and evaluate an end-to-end federated system addressing all three for unsupervised 12-lead ECG anomaly detection on the PTB-XL dataset, combining three autoencoder families (VanillaAE, ConvAE, VAE), Flower-based federated averaging (FedAvg) across ten simulated hospitals, client-side differentially private SGD (DP-SGD) with a Rényi-DP accountant, and 8-bit integer (INT8) post-training quantization with Raspberry Pi 4 benchmarking. Our main contributions are: an empirical characterization of how these mechanisms compose, practical DP-specific recommendations, and technical and security insights for a clinically sensitive setting. Federated learning matches or exceeds the centralized baseline across all architectures (ConvAE federated area under the ROC curve, AUROC, $0.782$), and an $\varepsilon$ sweep identifies $\varepsilon=4$ as the recommended clinical operating point. INT8 quantization roughly halves model size and cuts Pi 4 latency by up to $44\%$ with $<0.12\%$ AUROC loss. Crucially, DP and quantization penalties are empirically independent, so practitioners need not trade a strong privacy guarantee for a compact edge footprint. To our knowledge, this is the first system combining federated learning, formal $(\varepsilon,\delta)$-DP, unsupervised reconstruction-based detection, and quantized AArch64 deployment.
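
The three mechanisms the abstract combines can each be written in a few lines. The sketch below is a plain-Python illustration with hypothetical function names: sample-weighted federated averaging, a DP-SGD style gradient (per-example clipping, Gaussian noise, averaging), and symmetric INT8 post-training quantization. It does not use the Flower API, omits the Rényi-DP accountant, and is not the authors' implementation.

```python
import math
import random


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


def dp_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient to clip_norm, sum, add Gaussian noise
    with std noise_multiplier * clip_norm, then divide by the batch size."""
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(v * v for v in g))
        factor = min(1.0, clip_norm / max(norm, 1e-12))
        for d in range(dim):
            summed[d] += g[d] * factor
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


# Illustrative values only.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
clipped_mean = dp_gradient([[3.0, 4.0], [0.3, 0.4]], clip_norm=1.0,
                           noise_multiplier=0.0, rng=random.Random(0))
q_weights, q_scale = quantize_int8([0.5, -1.27, 0.0])
restored = dequantize(q_weights, q_scale)
```

Setting `noise_multiplier` to zero, as in the usage lines, isolates the clipping step; a nonzero value is what a privacy accountant would translate into an $\varepsilon$ such as the operating point discussed above.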

2606.11555 2026-06-11 q-bio.NC cs.AI cs.LG 新提交

End-to-End Machine Learning for Depressive State Classification via EEG and fNIRS

基于EEG和fNIRS的抑郁状态分类的端到端机器学习

Riki Sakurai, Simon Kojima, Mihoko Otake-Matsuura, Shin'ichiro Kanoh, Tomasz M. Rutkowski

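AI summary: Presents an end-to-end machine learning framework that classifies depressive states from EEG and fNIRS signals, aiming to overcome the subjectivity of traditional diagnosis and provide objective, automated diagnostic tools for clinical use.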

详情
Comments
4 pages, 4 figures, Accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026
AI中文摘要

随着社会压力的增加,对心理医疗的需求不断上升,凸显了传统精神病学诊断的局限性。传统方法——主要依赖临床访谈和患者自我报告——本质上容易受到主观偏见和从业者不同的经验判断的影响。为了满足定量评估的需求,基于生物信号的检测,包括脑电图(EEG)和功能性近红外光谱(fNIRS),已成为一种有前景的客观替代方案。这类技术对于识别可能未被受试者自身意识到的潜在抑郁状态尤为重要。此外,在老龄化人群中,抑郁症与痴呆症的高共病性要求早期区分,以防止症状相互恶化并维持生活质量(QoL)。这项针对11名健康学生的初步研究建立了一个基于生物信号的抑郁症检测框架,为临床使用的自动化、客观诊断工具奠定了基础。

英文摘要

The escalating demand for mental healthcare, driven by rising societal stress, highlights the limitations of traditional psychiatric diagnostics. Conventional methods - relying primarily on clinical interviews and patient self-reports - are inherently vulnerable to subjective bias and the varying empirical judgment of practitioners. To address the need for quantitative evaluation, biological signal-based detection, including electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a promising objective alternative. Such technology is particularly vital for identifying latent depressive states that may be unrecognized by the subjects themselves. Furthermore, in aging populations, the high comorbidity between depression and dementia necessitates early differentiation to prevent mutual symptom exacerbation and maintain Quality of Life (QoL). This pilot study of eleven healthy students establishes a framework for biological signal-based depression detection, serving as a foundational step toward automated, objective diagnostic tools for clinical use.