arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.01539 2026-06-02 stat.ME cs.LG

Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data

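Xiaohui Yin, Avijit Mitra, Ying Zhou, Kun Chen, Hong Yu

Affiliations: University of Connecticut Storrs; University of Massachusetts Amherst; University of Massachusetts Lowell

AI summary: To address the class imbalance and computational burden caused by rare events in longitudinal survival data, the paper proposes a scalable subsampling and reweighting strategy that applies to causal effect estimators such as ICE, preserving consistency while improving stability.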

Comments Accepted at KDD-2026, 12 pages

Abstract

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
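
One generic form of subsampling with reweighting keeps every event row, keeps each non-event row with a fixed probability, and weights the retained non-events by the inverse of that probability. The sketch below illustrates only that generic idea in plain Python; it is not the paper's estimator, and the function name, the person-period layout, and the sampling rate are assumptions made for illustration.

```python
import random

def subsample_with_weights(rows, keep_prob, seed=0):
    """Keep every event row; keep each non-event row with probability keep_prob.

    Returns (row, weight) pairs. Weights are inverse sampling probabilities,
    so weighted totals are unbiased for the full-data totals.
    """
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        if row["event"] == 1:
            sampled.append((row, 1.0))
        elif rng.random() < keep_prob:
            sampled.append((row, 1.0 / keep_prob))
    return sampled

# Hypothetical person-period data: 100,000 rows with a 0.2% event rate
data_rng = random.Random(1)
rows = [{"id": i, "event": 1 if data_rng.random() < 0.002 else 0} for i in range(100_000)]
sampled = subsample_with_weights(rows, keep_prob=0.05)

n_events = sum(r["event"] for r in rows)
weighted_total = sum(w for _, w in sampled)
weighted_rate = sum(w * r["event"] for r, w in sampled) / weighted_total
```

The retained sample is roughly a twentieth of the original size, yet the weighted event rate stays close to the full-data rate, which is the property a weighted outcome model would rely on.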

2606.01504 2026-06-02 cs.IR cs.LG

Semantic Retrieval for Product Search in E-Commerce

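Nikhil Kothari, Saksham Samdani, Ritam Mallick, Praveen Gupta, Ankit Vijay, Surender Kumar

Affiliations: Flipkart, India

AI summary: For e-commerce search, where queries are short, noisy, and colloquial and products differ by fine-grained attributes, the paper proposes a two-stage training method for a Siamese LLM dual-encoder that uses contrastive learning and preference optimization to achieve accurate matching and ranking.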

Abstract

Semantic retrieval in e-commerce must handle short, noisy, and colloquial queries over large product catalogs with fine-grained attribute distinctions. We present a Siamese LLM dual-encoder trained through a two-stage pipeline: contrastive learning with a false-negative margin mask to prevent penalization of near-duplicate products, followed by Relative Odds Alignment for Retrieval (ROAR), a preference optimization objective that extends Bradley-Terry to variable-sized graded relevance groups via consecutive odds-ratio margins. The training corpus mirrors this progression - substitute query-product pairs provide coarse semantic supervision in Stage 1 and graded relevance annotations drive fine-grained ranking in Stage 2. The resulting system accurately retrieves exact matches while correctly ordering substitutes and complementary products, with gains confirmed across query-frequency strata and business verticals, and statistical significance validated through live A/B deployment at scale.

2606.01494 2026-06-02 cs.CR cs.AI cs.SE

ClawHub Security Signals: When VirusTotal, Static Analysis, and SkillSpector Disagree

Vincent Koc, Patrick Erichsen, Jacob Tomlinson, Agustin Rivera, Michael Appel, Nir Paz

Affiliations: OpenClaw Foundation USA; NVIDIA United Kingdom; NVIDIA USA

AI summary: A study of 67,453 public skill versions on ClawHub that uses disagreement among three scanners (VirusTotal, static heuristic analysis, and NVIDIA SkillSpector) to show that agent-skill security needs layered governance rather than single-scanner decisions.

Comments 10 pages, 1 figure, 7 tables, 1 supplementary dataset

Abstract

Agent skills extend AI agents with reusable instructions, tools, scripts, references, and workflows, establishing a security boundary distinct from both model safety and traditional package-malware detection. ClawHub Security Signals is a sanitized dataset of 67,453 latest public OpenClaw skill versions. Each row pairs redacted SKILL.md content and sanitized bundled files where present with a final ClawScan registry verdict and evidence from three scanner families: VirusTotal, static heuristic analysis, and NVIDIA SkillSpector. Rather than estimating malicious-skill prevalence, we study scanner disagreement. The three scanners rarely flag the same skills: any pair overlaps on at most 10.4% of their combined positives, only 0.69% of skills are flagged by all three, and 81.9% of flagged skills are identified by a single scanner. The disagreement is structured by attack surface. SkillSpector, which raises semantic agentic-risk advisories rather than malware-reputation signals, is positive for 19,209 of 25,504 suspicious rows (75.3%) but only 14 of 206 malicious rows (6.8%). The malicious-verdict region shows the inverse profile: 150 of 206 malicious rows (72.8%) are VirusTotal-positive, consistent with bundled-code malware evidence. These results show that agent-skill security requires layered governance, not single-scanner allow/block decisions. The corpus is released as a sanitized silver-standard dataset: labels are the registry's automated verdicts, not human-annotated ground truth, and the release represents an early, versioned snapshot intended to support the community while a human-annotated subset is developed. Further research is encouraged, including models tailored for skill-security triage.
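
The phrase "overlaps on at most 10.4% of their combined positives" reads naturally as intersection over union for each scanner pair. The sketch below computes that measure on toy data and recomputes the percentages quoted in the abstract from the reported counts. The skill IDs and the dictionary layout are made up for illustration, and the intersection-over-union reading is an assumption about the paper's definition.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two scanners as a share of their combined positives."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical flagged skill IDs per scanner (toy data, not from the dataset)
flags = {
    "virustotal": {1, 2, 3, 4},
    "static": {4, 5, 6, 7, 8},
    "skillspector": {4, 8, 9, 10, 11, 12},
}

pairwise = {(x, y): jaccard(flags[x], flags[y]) for x, y in combinations(flags, 2)}
flagged_any = set().union(*flags.values())
flagged_all = set.intersection(*flags.values())
single_scanner = {s for s in flagged_any if sum(s in f for f in flags.values()) == 1}

# Rates quoted in the abstract, recomputed from the reported counts
skillspector_suspicious = round(100 * 19_209 / 25_504, 1)  # 75.3
skillspector_malicious = round(100 * 14 / 206, 1)          # 6.8
virustotal_malicious = round(100 * 150 / 206, 1)           # 72.8
```

In the toy data, 10 of the 12 flagged skills are flagged by exactly one scanner, which mirrors the kind of single-scanner dominance the abstract reports.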

2606.01490 2026-06-02 cs.SE cs.AI cs.MA

LLM Consortium for Software Design Refinement: A Controlled Experiment on Multi-Agent Collaboration Topologies

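Nagarjuna Kanamarlapudi, Praveen K

Affiliations: LLM Consortium for Software Design Refinement

AI summary: A controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design finds that the structural adversarial variant (v4b) and cross-model review (v2) rank first and second, while parallel merge performs worst because of token starvation and the Frankenstein effect.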

Comments 12 pages, 9 figures, 5 tables

Abstract

We present a controlled experiment evaluating 12 multi-agent LLM collaboration topologies for software architecture design. Using a $2\times2\times2$ factorial design (Authority $\times$ Roles $\times$ Dynamics), we conducted 520 experimental runs across 8 design tasks of varying complexity, with 5 repetitions each. Designs were evaluated on a 12-dimensional rubric by three independent automated evaluators (GPT-OSS 120B, Claude Opus 4.6, Claude Sonnet 4.6). We report four core findings. First, structural adversarial (v4b) ranks #1 by ensemble -- a prompt-engineered adversarial variant that demands rewrite mandates rather than patches (weighted ensemble: 4.637/5.0). Second, cross-model review wins unanimously at #2 -- generate with one model, review with another -- ranking #2 by all three evaluators (weighted ensemble: 4.606). Third, evaluator diversity is itself a finding -- all three evaluators agree v4b is best and v3 is worst, but disagree sharply on v2b (Claude d=1.44 vs. GPT-OSS d=0.45), revealing how different model families weight design qualities. Fourth, parallel merge is fundamentally broken -- all three evaluators place merge variants in the bottom tier (3.65-3.79), due to token starvation and the Frankenstein effect. The weighted ensemble ($2\times$Opus + $2\times$Sonnet + $1\times$GPT-OSS) provides robust rankings across 520 runs, confirmed through independent cross-validation.
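
The weighted ensemble is stated as 2×Opus + 2×Sonnet + 1×GPT-OSS. A minimal sketch follows, assuming the weights are normalized by their sum so the result stays on the 5-point rubric scale; the per-evaluator scores are invented for illustration and are not taken from the paper.

```python
WEIGHTS = {"opus": 2, "sonnet": 2, "gpt_oss": 1}

def weighted_ensemble(scores, weights=WEIGHTS):
    """Weighted mean of per-evaluator scores (2x Opus + 2x Sonnet + 1x GPT-OSS)."""
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total

# Hypothetical per-evaluator scores on the 5-point rubric (not from the paper)
example = {"opus": 4.7, "sonnet": 4.6, "gpt_oss": 4.5}
ensemble_score = weighted_ensemble(example)  # (9.4 + 9.2 + 4.5) / 5
```

Under this weighting the two Claude evaluators carry four fifths of the total weight, so a topology on which they disagree with GPT-OSS, as reported for v2b, is ranked mostly by their view.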

2606.01487 2026-06-02 math.OC cs.RO cs.SY eess.SY

Global Convergence of a Line-Search Filter Differential Dynamic Programming Method

Ming Xu, Iman Shames

Affiliations: School of Computer and Communication Sciences, EPFL; Department of Electrical and Electronic Engineering, University of Melbourne

AI summary: The paper proves global convergence of the FilterDDP algorithm, which extends the discrete-time differential dynamic programming of Mayne and Jacobson to handle nonlinear constraints on states and controls and uses a line-search filter procedure for step acceptance.

Abstract

In this article, we establish the global convergence properties of the FilterDDP algorithm, which extends the discrete-time differential dynamic programming (DDP) algorithm of Mayne and Jacobson [International Journal of Control, 3 (1966), pp. 85-95] to handle nonlinear constraints over states and controls, in addition to the dynamics. FilterDDP adopts a line-search filter procedure for step acceptance. However, instead of a damped Newton step applied in the general nonlinear programming setting, the computation of a trial point involves applying a backward recursion and a forward simulation. We establish the global convergence of FilterDDP by showing that for a subset of constrained optimal control problems, this backward-forward procedure satisfies the same properties as a Newton step for the purpose of establishing global convergence of a line-search filter method, following the analysis of Wächter and Biegler [SIAM Journal on Optimization, 16 (2005), pp. 1-31].
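
In a line-search filter method, a trial point is tested against a filter of (constraint violation, objective) pairs and is rejected when no stored pair is improved upon by a small margin in either measure. The sketch below shows that acceptance test with simple backtracking. It is a simplified illustration, not the FilterDDP algorithm: the switching condition, the Armijo test, and feasibility restoration are omitted, the margin defaults are illustrative, and `trial_point` is a hypothetical callback standing in for the backward recursion plus forward simulation.

```python
def acceptable_to_filter(theta, phi, filter_entries, gamma_theta=1e-5, gamma_phi=1e-5):
    """Return True if (theta, phi) lies outside the region blocked by every entry.

    theta: constraint violation of the trial point; phi: its objective value.
    """
    for theta_j, phi_j in filter_entries:
        improves_feasibility = theta < (1.0 - gamma_theta) * theta_j
        improves_objective = phi < phi_j - gamma_phi * theta_j
        if not (improves_feasibility or improves_objective):
            return False
    return True

def backtracking_step(trial_point, filter_entries, alpha0=1.0, shrink=0.5, alpha_min=1e-8):
    """Shrink the step size until the trial point is acceptable to the filter.

    trial_point(alpha) is a hypothetical callback returning (theta, phi).
    """
    alpha = alpha0
    while alpha >= alpha_min:
        theta, phi = trial_point(alpha)
        if acceptable_to_filter(theta, phi, filter_entries):
            return alpha, theta, phi
        alpha *= shrink
    return None  # a full method would switch to feasibility restoration here

# Toy trial-point map: violation and objective both grow with the step size
result = backtracking_step(lambda a: (2.0 * a, 10.0 + a), [(1.0, 10.0)])
```

With the single filter entry (1.0, 10.0), the steps 1.0 and 0.5 are rejected and the step 0.25 is accepted because it halves the constraint violation.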

2606.01470 2026-06-02 physics.flu-dyn cs.AI cs.LG

Emergent Transfer of a Physics Foundation Model from Simulation to Laboratory Turbulence

Payel Mukhopadhyay, Stefan S. Nixon, Romain Watteaux, Michael McCabe, Alberto Bietti, Kyunghyun Cho, Cristiana Diaconu, Irina Espejo Morales, David Fouhey, Siavash Golkar, Tom Hehir, Shirley Ho, Jake Kovalic, Geraud Krawezik, Francois Lanusse, Tanya Marwah, Rudy Morel, Mariel Pettee, Helen Qu, Jeff Shen, Hadi Sotoudeh, Stuart B. Dalziel, Miles Cranmer

Affiliations: University of Cambridge; CEA, DAM/DIF; Flatiron Institute; New York University; Princeton University; Yale University; AIM, Université Paris-Saclay, Université Paris Cité, CEA, CNRS; University of Wisconsin–Madison; Polymathic AI

AI summary: By finetuning Walrus, a foundation model for continuum dynamics, on only a small amount of simulation data, the authors obtain zero-shot generalization to laboratory Rayleigh-Taylor instability experiments and reveal the key role of initial conditions in the simulation-experiment gap.

Abstract

Whether physics foundation models can be usefully deployed on laboratory experiments remains an open question for scientific machine learning (ML). We test this question on the Rayleigh-Taylor instability (RTI), a ubiquitous and demanding fluid instability seen from tabletop flows to supernova explosions, in which small perturbations at a density interface grow into chaotic, multiscale mixing as a lighter fluid accelerates into a heavier one. Standard ML models struggle with RTI, and despite over a century of theoretical, numerical, and experimental work, it carries an unresolved discrepancy between simulation and experiment: the late-time mixing growth rate, $α$, measured in most laboratory experiments ($\sim$ 0.06-0.07), is roughly three times the value from idealized direct numerical simulations (DNS, $\sim$ 0.02). The gap's origin remains debated. These properties make RTI a stringent test for a question that matters well beyond RTI: can foundation models trained only on simulations generalise to sparse, messy, and noisy laboratory settings? We finetune Walrus, a foundation model for continuum dynamics, on three or fewer DNS realizations and recover key RTI physics over long rollouts. Applied zero-shot to sliding-barrier laboratory data, the finetuned model leaves the DNS-like regime and enters the observed growth band, having never seen a single experimental sample. These results provide independent, data-driven evidence that initial conditions play a crucial role in the longstanding sim-experiment gap in $α$. The model also generalises zero-shot to stable stratification, a buoyancy regime absent from training, correctly slowing mixing-layer growth. Together, our results show that foundation models can generalise well beyond their training data, predicting laboratory behavior and unseen physical regimes, opening new ways to probe longstanding simulation-experiment gaps.
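
The growth rate $α$ is conventionally defined through the self-similar law $h = α A g t^2$, where $h$ is the mixing-layer height, $A$ the Atwood number, and $g$ the acceleration. Assuming that law, the sketch below fits $α$ by least squares through the origin. The heights are synthetic and the parameter values are illustrative; how the paper actually extracts $α$ from rollouts or experiments is not stated in the abstract.

```python
def fit_alpha(times, heights, atwood, gravity):
    """Least-squares fit of alpha in h = alpha * A * g * t**2 (through the origin)."""
    x = [atwood * gravity * t * t for t in times]
    return sum(xi * hi for xi, hi in zip(x, heights)) / sum(xi * xi for xi in x)

# Synthetic heights generated with alpha = 0.06 (illustrative values only)
A, g = 0.5, 9.81
times = [0.5, 1.0, 1.5, 2.0]
heights = [0.06 * A * g * t * t for t in times]
alpha_hat = fit_alpha(times, heights, A, g)
```

A fit of this kind on DNS-like data would return a value near 0.02, and on typical laboratory data a value near 0.06 to 0.07, which is the threefold gap the abstract describes.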

2606.01468 2026-06-02 stat.ML cs.AI cs.LG

Computation-Aware Kalman Filtering with Model Selection for Neural Dynamics

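JR Huml, Jonathan Wenger, John P. Cunningham

Affiliations: University of California, Berkeley; University of Washington; University of Texas at Austin

AI summary: Proposes the Computation-Aware State-Space Model (CASSM), which enables model selection through a new training loss and optimization scheme and, in the scale-imbalanced regime where trials are far fewer than neurons, delivers competitive predictions and better uncertainty calibration at tractable computational cost.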

Comments 24 pages, Proceedings of 2nd International Conference on Probabilistic Numerics (2026)

Abstract

Due to their explicit priors and ability to model uncertainty, Bayesian methods have played a major role in dynamical latent variable modeling of single-cell neural recordings. However, modern-sized datasets have made overparameterized deep networks the preferred methods of choice due to their predictive power and favorable computational scaling. While many posterior approximations exist, all incur approximation errors. Recent work accounts for this error in the form of computational uncertainty but comes at the cost of quadratic complexity and assumes fixed model hyperparameters. Here we extend this development to model selection, including a novel training loss and optimization scheme, which yields tractable inference in large state-spaces. We introduce a framework, the Computation-Aware State-Space Model (CASSM), specifically designed for the scale-imbalanced regime, where the number of trials is significantly lower than the number of recorded neurons. In this regime, for both synthetic and real data, we show that our method is competitive with data-hungry deep networks, with significantly improved uncertainty calibration over previous attempts to scale Bayesian methods. Our experiments provide a roadmap to neuroscience researchers in choosing from a host of potential dynamical latent variable models given key dataset properties and constraints.

2606.01442 2026-06-02 cs.CR cs.AI cs.NE

On the Evaluation of Spiking Neural Network Configurations for Network Intrusion Detection

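Raj Patel, David Amebley, Taye Akinrele, Shaswata Mitra, Sayanton Dibbo, Shahram Rahimi

Affiliations: University of California, Berkeley

AI summary: An ablation study of 27 combinations of 9 neuron models and 3 spike encoding schemes finds that latency encoding outperforms rate and delta encoding; the LeakyParallel neuron with latency encoding averages 92.11% accuracy, 0.80 macro-F1, and a 2.01% false-positive rate across four benchmark datasets, with the fastest inference.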

Comments 1 figure, 3 tables. This manuscript is under review for IEEE MILCOM 2026. © 2026 IEEE. Personal use is permitted; all other uses require IEEE permission, including reprinting, republication, redistribution, resale, or reuse of copyrighted components

Abstract

Network intrusion detection is a core component of modern cybersecurity infrastructure, yet the deep learning models that dominate the field are computationally demanding, motivating interest in lightweight alternatives suited to edge and neuromorphic deployment. Spiking Neural Networks (SNNs) are therefore a natural candidate, but their design space, spanning the choice of neuron model and spike encoding scheme, remains poorly characterized for intrusion detection. We bridge this gap with a controlled ablation study that pairs 9 neuron models with 3 spike encoding schemes, giving 27 variants, all implemented in snnTorch and evaluated on raw inputs with limited preprocessing across four benchmark datasets (NSL-KDD, KDDCup99, CIC-IDS2017, and CTU-13) with 5 seeds. We find that the spike encoding scheme determines detection quality more than the neuron model does: rate and delta encodings perform worse than latency encoding over the sweep. The LeakyParallel neuron with latency encoding performed best overall, averaging 92.11% accuracy and 0.80 macro-F1 at a 2.01% false-positive rate over all 4 datasets, with near-perfect accuracy on CIC-IDS2017 and CTU-13, and it was also the fastest at inference. These results highlight the potential of SNNs as a viable alternative to traditional methods of intrusion detection when considering low-latency or resource-constrained deployments.
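
The three encoding families compared here differ in how a normalized feature becomes a spike train. The sketch below illustrates the general ideas in plain Python. It does not use or reproduce the snnTorch API, and the linear timing rule in `latency_encode` and the threshold rule in `delta_encode` are simplifying assumptions rather than the study's settings.

```python
import random

def rate_encode(x, num_steps, rng):
    """Rate coding: at each step, spike with probability x (x in [0, 1])."""
    return [1 if rng.random() < x else 0 for _ in range(num_steps)]

def latency_encode(x, num_steps):
    """Latency coding: one spike, emitted earlier for larger x (linear timing)."""
    train = [0] * num_steps
    if x > 0:
        spike_time = round((1.0 - x) * (num_steps - 1))
        train[spike_time] = 1
    return train

def delta_encode(series, threshold):
    """Delta coding: spike when the increase since the previous sample exceeds threshold."""
    return [0] + [1 if b - a > threshold else 0 for a, b in zip(series, series[1:])]

strong = latency_encode(1.0, 5)   # [1, 0, 0, 0, 0]
medium = latency_encode(0.5, 5)   # [0, 0, 1, 0, 0]
changes = delta_encode([0.1, 0.5, 0.6, 0.2], threshold=0.3)  # [0, 1, 0, 0]
noisy = rate_encode(0.3, 20, random.Random(0))
```

Latency coding emits at most one spike per feature, so it carries the feature value in timing rather than in spike count, which is one plausible reason for the fast inference reported for the latency-encoded variants.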

2606.01427 2026-06-02 stat.ML cs.LG

On the Uncertainty Quantification Ability of Tabular Foundation Models

Tyler R. Johnson, Kian Ben-Jacob, Nima Negarandeh, Oriol Vendrell-Gallart, Ramin Bostanabad

Affiliations: Department of Mechanical and Aerospace Engineering, University of California, Irvine; Department of Civil and Environmental Engineering, University of California, Irvine

AI summary: An empirical comparison of TabPFN and Gaussian processes on regression tasks reveals a trade-off between explicit and learned priors: TabPFN performs well on complex, high-dimensional problems, while Gaussian processes give better predictive accuracy and uncertainty quantification when data are scarce.

Comments 12 pages, 2 figures, 2 tables

Abstract

Foundation models (FMs) have achieved substantial success in generalizing across tasks without problem-specific training or fine-tuning. However, many critical applications in mechanics and computational science require not only accurate predictions but also reliable uncertainty quantification (UQ). Herein we investigate the UQ capabilities of tabular FMs in regression tasks through a comprehensive empirical study comparing Tabular Prior-Data Fitted Networks (TabPFN) against Gaussian processes (GPs). We systematically evaluate these two methods across a host of regression problems with varying complexity, dataset sizes, and input dimensionalities. We build all GPs with a default setting for a fair comparison against TabPFN v2.5. Our findings highlight an important trade-off between explicit and learned priors: while TabPFN achieves highly competitive performance for complex, high-dimensional problems with sufficient data, GPs often provide superior predictive accuracy and UQ in data-scarce settings. Moreover, when the chosen kernel constitutes a good prior for the underlying function, GP performance can substantially exceed that of TabPFN. Our results can be reproduced from https://github.com/kianswarehouse/GPvsPFN.

2606.01413 2026-06-02 cs.CR cs.IR cs.LG

Differentially Private Datastore Generation for Retrieval-Augmented Inference

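Abdelrahman Abouelenein, Marwan Torki

Affiliations: Department of Computer and Systems Engineering, Alexandria University; Microsoft

AI summary: Proposes a hashing-based probability generation framework that uses locality-sensitive hashing and differential-privacy noise to protect datastores; at ε=5 accuracy drops by only 2.6% on average, and membership inference attack accuracy falls to 53.60%.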

Comments Accepted at the 28th International Conference on Pattern Recognition (ICPR-2026)

Abstract

It is crucial for modern on-device AI systems that rely on retrieval-augmented inference to release and share datastores without compromising individual privacy. This can be achieved using Differential Privacy (DP), which provides a formal guarantee that ensures individual contributions remain indistinguishable, even under adversarial analysis. In this paper, we introduce a hashing-based probability generation framework designed to enable the creation and release of differentially private datastores. Our approach employs locality-sensitive hashing (LSH) to efficiently partition high-dimensional data into buckets. We then add calibrated DP noise to the accumulated vote for each bucket, generating a probability distribution across classes. Our method is broadly applicable to any pipeline requiring secure key-value datastore creation and release. We conducted experiments on seven datasets with varying sample sizes and class counts, ranging from 2 to 14. At ε=5, our released DP datastore achieves strong privacy protection with only an average 2.6% drop in accuracy. Finally, we benchmark DP datastore resilience to membership inference attacks, reducing attack accuracy to 53.60%.
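
The pipeline described above (hash records into buckets, accumulate class votes per bucket, add calibrated noise, normalize into a class distribution) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's mechanism: random-hyperplane LSH and Laplace noise are swapped in as familiar choices, all names are hypothetical, and releasing only occupied buckets is a simplification that a rigorous privacy analysis would have to address.

```python
import random

def lsh_bucket(vector, hyperplanes):
    """Random-hyperplane LSH: one sign bit per hyperplane, packed into an int."""
    key = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        key = (key << 1) | (1 if dot >= 0 else 0)
    return key

def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def build_dp_datastore(data, labels, num_classes, hyperplanes, epsilon, rng):
    """Map each occupied LSH bucket to a noisy class distribution."""
    votes = {}
    for vector, label in zip(data, labels):
        counts = votes.setdefault(lsh_bucket(vector, hyperplanes), [0] * num_classes)
        counts[label] += 1
    store = {}
    for key, counts in votes.items():
        # One record changes one count by 1, so the noise scale is 1 / epsilon.
        noisy = [max(c + laplace(1.0 / epsilon, rng), 0.0) for c in counts]
        total = sum(noisy)
        uniform = [1.0 / num_classes] * num_classes
        store[key] = [n / total for n in noisy] if total > 0 else uniform
    return store

# Toy data: 200 random 4-dimensional points, 2 classes, 3 hash bits (8 buckets)
rng = random.Random(0)
dim, bits = 4, 3
hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]
labels = [rng.randrange(2) for _ in range(200)]
store = build_dp_datastore(data, labels, 2, hyperplanes, epsilon=5.0, rng=rng)
```

At query time a vector would be hashed with the same hyperplanes and the stored distribution for its bucket returned; a smaller epsilon increases the noise scale and flattens those distributions.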

2606.01385 2026-06-02 cs.SE cs.AI

Bridging Requirements and Architecture: Multi-Agent Orchestration with External Knowledge and Hierarchical Memory

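Ruiyin Li, Yiran Zhang, Xiyu Zhou, Yangxiao Cai, Peng Liang, Weisong Sun, Jifeng Xuan, Zhi Jin, Yang Liu

Affiliations: School of Computer Science, Wuhan University; Nanyang Technological University

AI summary: Proposes the MAAD framework, which orchestrates four specialized agents (Analyst, Modeler, Designer, Evaluator), injects architectural standards through RAG, and uses a hierarchical memory mechanism to turn requirements specifications into multi-view architectural blueprints with quality attribute assessments.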

Comments 39 pages, 7 images, 5 tables, Manuscript submitted to a Journal (2026)

Abstract

Software architecture design is a critical yet inherently complex and knowledge-intensive phase that requires balancing competing quality attributes and adapting to evolving requirements. Traditionally, this process has been time-consuming, labor-intensive, and heavily reliant on architects, often resulting in limited exploration of alternative architectural decompositions and styles, especially under the pressures of agile development. While LLM-based agents have shown promising performance across various software engineering tasks, their application to architecture design remains relatively scarce and requires systematic exploration. To address these challenges, we proposed MAAD (Multi-Agent Architecture Design), a knowledge-driven framework that orchestrates four specialized agents (i.e., Analyst, Modeler, Designer and Evaluator) to autonomously and collaboratively transform requirements specifications into comprehensive, multi-view architectural blueprints with quality attribute assessments. MAAD incorporates RAG to inject recognized architectural standards and patterns into the workflow and leverages a hierarchical memory mechanism that captures design history for iterative refinement. We evaluated MAAD through comparative experiments against MetaGPT, using quantitative architecture-level metrics across 10 case studies and qualitative feedback from industry architects on 10 real-world specifications. Results show that MAAD generates more complete, modular, and traceable architectures than the baseline, and its dedicated Evaluator agent autonomously produces structured quality evaluation reports that significantly reduce manual validation efforts. Furthermore, we found that the quality of the generated architecture heavily depends on the underlying LLM's reasoning capacity, with GPT-5.2 and Qwen3.5 outperforming other models across most evaluation settings.

2606.01364 2026-06-02 cs.CR cs.AI cs.SE

Needles at Scale: LLM-Assisted Target Selection for Windows Vulnerability Research

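Michael J. Bommarito

Affiliations: Microsoft; University of California, Berkeley

AI summary: Proposes the Symbolicate-Enrich-Sample pipeline, which combines symbol recovery, structural feature extraction, and low-cost language-model ranking to narrow the roughly 7.2 million functions of a Windows image to a shortlist of about 22K candidates, addressing the target-selection bottleneck in vulnerability research.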

Comments 9 pages, 3 figures, 2 tables

Abstract

The attack surface of a modern operating system is a haystack: thousands of signed binaries and millions of functions, almost none relevant to any given vulnerability. A human analyst or an LLM agent must pick the function worth reading before analyzing it. At whole-OS scope, this target selection, not the analysis, is the binding constraint. We present Symbolicate-Enrich-Sample, a low-cost batch pipeline that turns a corpus of production Windows binaries into a queryable, priority-ranked research queue. We (i) recover function-level symbols for stripped vendor binaries by auto-fetching the public symbol files and joining them to a recovered call graph; (ii) attach cheap, deterministic structural features to each named function and, conditioned on those features, use a low-cost language model to assign a reachability tier, a risk level, a bug-class hypothesis, and a rationale; and (iii) draw diverse, prioritized batches via a priority-weighted importance sampler. The contribution is a selection substrate: the prioritization layer a downstream detector or LLM agent runs on top of. Across a whole Windows image of 7,231,419 functions, the labels are markedly selective, and stacking deterministic filters on them leaves a ~22K-function shortlist: the candidate needles, few enough for a human or agent to work through. We characterize the pipeline's selectivity and its failure modes, describe the methodology, and report aggregate statistics; we withhold the derived dataset for legal and dual-use reasons.
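
Step (iii) draws batches in which higher-priority functions are more likely to appear without being guaranteed a slot. One standard way to do this is weighted sampling without replacement using Efraimidis-Spirakis keys, sketched below. The abstract does not say which sampler the pipeline uses, so this technique is a swapped-in assumption, and the function names and priority scores are invented.

```python
import random

def weighted_sample(items, weights, k, rng):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys).

    Each item gets key u ** (1 / w) with u uniform in [0, 1); the k largest
    keys form the sample. Items with non-positive weight are never drawn.
    """
    keyed = [(rng.random() ** (1.0 / w), item) for item, w in zip(items, weights) if w > 0]
    keyed.sort(reverse=True)
    return [item for _, item in keyed[:k]]

# Hypothetical function names and priority scores (illustration only)
functions = ["ParseHeader", "CopyBuffer", "LogMessage", "InitTable", "FreeHandle"]
priorities = [5.0, 8.0, 0.0, 1.0, 3.0]
batch = weighted_sample(functions, priorities, k=3, rng=random.Random(0))
```

Because low-priority items keep a nonzero chance of selection, repeated batches cover more of the queue than a strict top-k cut would, which fits the stated goal of diverse, prioritized batches.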

2606.01362 2026-06-02 cs.GR cs.CV

AlbedoEdit: Unified Instance-Level Video Editing with Albedo Guidance

Xilong Zhou, Bao-Huy Nguyen, Zheng Zeng, Jacob Munkberg, Jon Hasselgren, Thomas Leimkühler, Nima Kalantari, Miloš Hašan, Christian Theobalt

Affiliations: Max Planck Institute for Informatics; University of California Santa Barbara; NVIDIA Research; Texas A&M University

AI summary: Proposes AlbedoEdit, a unified framework that uses albedo maps for object insertion, object removal, and texture editing; a video foundation model is fine-tuned on a synthetic dataset so that edits blend harmoniously and complex visual effects are simulated.

Abstract

Video generative models have achieved remarkable progress in synthesizing photorealistic video sequences. However, enabling broader and more creative downstream applications requires fine-grained instance-level video editing, including object insertion, object removal, and texture editing, which has emerged as a prominent yet challenging problem. Existing approaches either propose unified generative frameworks with only coarse semantic control, or design task-specific frameworks for individual editing tasks, limiting their flexibility and applicability across diverse real-world scenarios. To address these limitations, we propose AlbedoEdit, a unified generative video editing framework that jointly supports object insertion, object removal, and texture editing. Our key insight is that the intrinsic albedo map, which is invariant to lighting and contains no specularity, shadowing and inter-reflection effects, provides an effective and user-friendly mechanism for specifying fine-grained appearance edits. Built upon video foundation models, AlbedoEdit is fine-tuned to translate source RGB videos into edited RGB videos, conditioned on a user-edited first-frame albedo. Trained on a new paired synthetic dataset covering all three editing tasks, AlbedoEdit implicitly learns to harmonize edited contents and simulate complex real-world visual effects triggered by editing operations, including specular highlights, soft shadows, and mirror reflections. AlbedoEdit demonstrates superior performance over state-of-the-art video editing approaches, both qualitatively and quantitatively. Project webpage is https://vcai.mpi-inf.mpg.de/projects/AlbedoEdit/.

2606.01325 2026-06-02 cs.NI cs.LG

SEArch: Optimistic Policy Selection Between Scene Noise and Drift for UAV Radar Search

SEArch: 无人机雷达搜索中场景噪声与漂移间的乐观策略选择

Noor Khial, Naram Mhaisen, Loay Ismail, Amr Mohamed

发表机构 * Department of Electrical and Computer Engineering, University of Waterloo(滑铁卢大学电气与计算机工程系)

AI总结 针对无人机雷达目标搜索中场景噪声与漂移共存的问题,提出基于随机扩展对手框架的乐观跟随正则化领导者策略选择器SEArch及其窗口变体W-SEArch,实现了亚线性遗憾界,实验显示相比非自适应基线遗憾降低达30%。

详情
AI中文摘要

配备雷达传感器的无人机被部署在多样环境中执行目标搜索任务,目标具有可通过遮挡检测到的特征信号(例如人体搜索中的呼吸微动)。一个基本挑战在于,当无人机在动态且可能非平稳的环境中移动时,雷达统计特性发生变化,使得任何固定的信号处理策略都变得次优;然而感知和适应必须在资源受限的空中节点上实时运行。由于没有单一检测器能在所有条件下表现良好,我们采用多策略范式,将无人机目标搜索形式化为一个在线策略选择问题,基于一组专用检测器库,性能通过遗憾(即相对于每个场景中最优策略的累积损失差距)来衡量。该设置将场景内随机噪声与场景间漂移耦合在一起。先前的方法仅捕捉一种模式,而我们通过随机扩展对手框架同时考虑两者,无需场景动态的先验知识。由于适应必须在无人机上运行,我们通过SEArch实例化SEA,这是一种轻量级的乐观跟随正则化领导者选择器,具有自适应学习率,实现了遗憾界$O(\bar{\sigma}_T \sqrt{T} + \sqrt{J})$,其中$\bar{\sigma}_T$捕捉雷达测量噪声,$J$是任务时间范围$T$内的场景转换次数。为了在频繁场景变化下实现快速适应,我们进一步引入了W-SEArch,这是一种窗口变体,每$w$轮重启一次,并在每个窗口内最多一次转换下实现遗憾界$O(\bar{\sigma}_I \sqrt{w})$。实验表明,在一系列非平稳设置中,与非自适应基线相比,遗憾降低高达30%。

英文摘要

Unmanned Aerial Vehicles (UAVs) equipped with radar sensors are deployed for target search missions in diverse environments, where targets exhibit characteristic signatures (e.g., respiration micro-motion in human search) detectable through occlusions. A fundamental challenge arises from shifts in radar statistics as the UAV moves through a dynamic and potentially non-stationary environment, rendering any fixed signal-processing strategy suboptimal; yet perception and adaptation must run onboard a resource-constrained aerial node in real time. Since no single detector performs well across all conditions, we adopt a multi-policy paradigm and formulate UAV target search as an online policy selection problem over a library of specialized detectors, with performance measured by regret, the cumulative loss gap relative to the best policy in each scene. The setting couples in-scene stochastic noise with inter-scene shifts. Whereas prior methods capture only one regime, we account for both through the Stochastically Extended Adversary (SEA) framework, without requiring oracle knowledge of scene dynamics. Because adaptation must run at the UAV, we instantiate SEA through SEArch, a lightweight optimistic Follow the Regularized Leader (OFTRL) selector with an adaptive learning rate, achieving regret $O(\bar{\sigma}_T \sqrt{T} + \sqrt{J})$, where $\bar{\sigma}_T$ captures radar measurement noise and $J$ is the number of scene transitions over the mission horizon $T$. To enable rapid adaptation under frequent scene changes, we further introduce W-SEArch, a windowed variant that restarts every $w$ rounds and achieves regret $O(\bar{\sigma}_I \sqrt{w})$ under at most one transition per window. Experiments show up to 30% regret reduction compared to non-adaptive baselines across a range of non-stationary settings.
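To make the idea of an optimistic FTRL selector with an adaptive learning rate more concrete, the following is a minimal sketch of optimistic Hedge (OFTRL with an entropic regularizer) over a small library of policies. It is not the SEArch algorithm from the paper: the function name `oftrl_select`, the choice of the previous loss vector as the optimistic hint, and the specific learning-rate formula are all assumptions made for illustration.

```python
import math


def oftrl_select(loss_rounds):
    """Optimistic Hedge over K policies (illustrative, not the paper's method).

    loss_rounds: list of per-round loss vectors of length K, values in [0, 1].
    Returns the probability vector played in each round.
    """
    k = len(loss_rounds[0])
    cum_loss = [0.0] * k   # cumulative observed losses per policy
    hint = [0.0] * k       # optimistic guess for the next loss vector
    pred_err = 0.0         # accumulated squared error of the hints
    played = []
    for losses in loss_rounds:
        # Assumed adaptive rate: shrinks as the hints prove less accurate.
        eta = math.sqrt(math.log(k) / (1.0 + pred_err))
        scores = [-eta * (c + h) for c, h in zip(cum_loss, hint)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        played.append([w / total for w in weights])
        # Update statistics only after the losses are revealed.
        pred_err += max((l - h) ** 2 for l, h in zip(losses, hint))
        cum_loss = [c + l for c, l in zip(cum_loss, losses)]
        hint = list(losses)  # assumed hint: the next loss resembles this one
    return played
```

When the losses within a scene are stable, the hints are accurate, the learning rate stays large, and the weight concentrates quickly on the best policy; a scene change produces a large hint error and a smaller learning rate.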

2606.01324 2026-06-02 cs.IT cs.AI math.IT

Digital Twin-Assisted Adaptive Multi-Agent DRL for Intelligent Spectrum and Resource Management in Open-RAN UAV-Enabled 6G Networks

数字孪生辅助的自适应多智能体深度强化学习用于Open-RAN无人机赋能6G网络中的智能频谱与资源管理

Marwan Dhuheir, Thang X. Vu, Symeon Chatzinotas

发表机构 * University of Cambridge(剑桥大学) University of Bristol(布里斯托大学)

AI总结 针对无人机辅助6G网络中动态频谱与资源管理难题,提出数字孪生辅助的自适应深度强化学习框架,结合粒子群优化轨迹与多智能体DRL分配资源,显著提升频谱效率、数据速率和能量利用率。

Comments accepted and presented at IEEE ICC-2026 conference paper

详情
AI中文摘要

向6G无线网络的演进设想了一种无缝智能、支持Open-RAN的架构,其中无人机在扩展覆盖、增强弹性以及确保地面用户部署的可靠连接方面发挥关键作用。然而,由于非线性系统交互、移动性引起的拓扑变化以及严格的延迟和能量约束,在这种高度动态的无人机辅助环境中有效管理频谱和资源仍然是一个主要挑战。为了解决这些挑战,我们提出了一种数字孪生辅助的自适应深度强化学习框架,该框架能够在分布式地面用户之间实现智能频谱共享和资源分配。复杂的优化问题被分解为使用粒子群优化的无人机轨迹优化和通过多智能体深度强化学习的动态频谱-功率-关联管理。这种混合数字孪生驱动的方法实现了智能、上下文感知的决策制定和无人机之间的自适应协调。大量仿真表明,在频谱效率、数据速率和能量利用方面取得了显著增益,展示了通往自我进化、自主的6G无人机和地面用户连接的变革性路径。

英文摘要

The evolution toward 6G wireless networks envisions a seamlessly intelligent, Open-RAN-enabled architecture where unmanned aerial vehicles (UAVs) play a pivotal role in extending coverage, enhancing resilience, and ensuring reliable connectivity for deployed ground users. However, efficiently managing spectrum and resources in such highly dynamic UAV-assisted environments remains a major challenge due to nonlinear system interactions, mobility-induced topology variations, and stringent latency and energy constraints. To address these challenges, we propose a digital twin (DT)-assisted adaptive deep reinforcement learning (DRL) framework that enables intelligent spectrum sharing and resource allocation across distributed ground users. The complex optimization problem is decomposed into UAV trajectory optimization using particle swarm optimization (PSO) and dynamic spectrum-power-association management via multi-agent DRL (MADRL). This hybrid DT-driven approach empowers intelligent, context-aware decision-making and adaptive coordination among UAVs. Extensive simulations demonstrate significant gains in spectral efficiency, data rates, and energy utilization, showcasing a transformative path toward self-evolving, autonomous connectivity between 6G UAVs and ground users (GUs).

2606.01312 2026-06-02 eess.SP cs.AI cs.NI

A Communication-Centric 6G-LLM Architecture for Scalable Tactical Autonomous Defense Vehicle Networks

面向可扩展战术自主防御车辆网络的以通信为中心的6G-LLM架构

Kiran Khurshid, Shumaila Javaid, Nasir Saeed

发表机构 * Department of Computer and Software Engineering, National University of Sciences and Technology (NUST), Islamabad, Pakistan(计算机与软件工程系,国家科学与技术大学(NUST),伊斯兰堡,巴基斯坦) Department of Control Science and Engineering, College of Electronics and Information Engineering, Tongji University and National Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, China(控制科学与工程系,电子与信息工程学院,同济大学,以及自主智能无人机系统国家重点实验室,同济大学,中国) Department of Electrical and Communication Engineering, UAE University, Al-Ain 15551, UAE(电子与通信工程系,阿联酋大学,阿恩15551,阿联酋)

AI总结 提出一种以通信为中心的分层架构,通过集成边缘辅助大语言模型推理与6G语义通信,在战术自主防御车辆网络中实现协调效率提升、通信开销降低和延迟韧性增强。

Comments 10 pages, accepted in IEEE Network Magazine

详情
Journal ref
K. Khurshid, S. Javaid and N. Saeed, "A Communication-Centric 6G-LLM Architecture for Scalable Tactical Autonomous Defense Vehicle Networks," in IEEE Network, Early access, 2026
AI中文摘要

人工智能(AI)与新兴6G网络的融合为战术自主车辆系统的可扩展协调带来了新机遇。本文提出了一种以通信为中心的分层架构,用于战术自主防御车辆网络(TADVNs),该架构将边缘辅助大语言模型(LLM)推理与6G连接和语义通信相结合。该框架旨在提高协调效率、减少通信开销,并在不断扩大的车队规模操作下增强延迟韧性。与依赖结构化特征处理和基于规则协调的传统任务特定AI流水线不同,所提出的方法在分层边缘-云通信架构中引入了语义抽象和上下文感知决策支持。我们通过蒙特卡洛模拟,在竞争网络条件下对5-30辆车的车队规模进行了通信和协调性能评估。结果表明,在30辆车规模下,与基于5G的传统AI基线相比,6G-LLM配置实现了75.2%的延迟降低(29.1毫秒对比117.5毫秒),任务成功率提高68.7个百分点(82.9%对比14.2%),通信开销降低88.6%。这些发现表明,当语义推理与低延迟6G连接相结合时,在协调和通信方面具有可衡量的优势。

英文摘要

The integration of Artificial Intelligence (AI) and emerging 6G networks introduces new opportunities for scalable coordination in tactical autonomous vehicle systems. This paper proposes a communication-centric hierarchical architecture for Tactical Autonomous Defense Vehicle Networks (TADVNs) that models the integration of edge-assisted Large Language Model (LLM) reasoning with 6G-enabled connectivity and semantic communication. The framework is designed to improve coordination efficiency, reduce communication overhead, and enhance latency resilience under increasing fleet-scale operation. Unlike conventional task-specific AI pipelines that rely on structured feature processing and rule-based coordination, the proposed approach incorporates semantic abstraction and context-aware decision support within a layered edge-cloud communication architecture. We evaluate communication and coordination performance via Monte Carlo simulations across fleet sizes of 5-30 vehicles under contested network conditions. Results indicate that at a 30-vehicle scale, the 6G-LLM configuration achieves 75.2% latency reduction (29.1 ms vs. 117.5 ms), a 68.7 percentage point increase in mission success rate (82.9% vs. 14.2%), and an 88.6% reduction in communication overhead compared to a 5G-based conventional AI baseline. These findings demonstrate measurable benefits in coordination and communication when semantic reasoning is combined with low-latency 6G connectivity.
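The headline figures at the 30-vehicle scale mix two kinds of quantities: a relative reduction (latency) and an absolute difference in percentage points (mission success). The short check below recomputes both from the raw values quoted in the abstract; the function name `relative_reduction` is an illustrative choice, not something taken from the paper.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Latency: 117.5 ms (5G baseline) vs. 29.1 ms (6G-LLM), as quoted in the abstract.
latency_reduction = relative_reduction(117.5, 29.1)

# Mission success: 82.9% vs. 14.2%, reported as a percentage-point difference.
success_gain_pp = 82.9 - 14.2

print(round(latency_reduction, 1))  # 75.2
print(round(success_gain_pp, 1))    # 68.7
```

The raw values behind the 88.6% overhead reduction are not given in the abstract, so that figure cannot be rechecked the same way.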

2606.01291 2026-06-02 quant-ph cs.AI

Quantum Algorithm for Distributed Reduction of Entanglements (QADR): A Trainable and Simulation-Efficient QML Framework

量子分布式纠缠约简算法(QADR):一种可训练且模拟高效的量子机器学习框架

Syed Farhan Ahmad, Gregory T. Byrd

发表机构 * University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 提出QADR框架,通过将全局n量子比特变分量子电路分解为因果光锥内的局部子电路,将经典模拟内存从O(2^n)降至O(n·2^{2d+1})并缓解贫瘠高原,在MNIST和NASA轴承诊断任务上匹配或超越经典模型。

详情
AI中文摘要

在含噪中等规模量子(NISQ)约束下训练变分量子电路(VQCs)引入了严重的计算限制:经典态矢量模拟内存呈指数增长($\mathcal{O}(2^n)$),且全局代价函数遭受贫瘠高原,其中梯度方差呈指数衰减($\mathcal{O}(1/2^n)$)。本文介绍并评估了量子分布式纠缠约简算法(QADR),这是一种混合量子-经典机器学习框架,它将全局$n$量子比特VQC分解为局部子电路,这些子电路大致在单个目标量子比特的因果光锥内运行。QADR将经典模拟内存从$\mathcal{O}(2^n)$降低到$\mathcal{O}(n \cdot 2^{2d+1})$(光锥半径$d$),同时自然缓解了全局贫瘠高原。我们在MNIST数据集和高维NASA IMS风力发电机传动系统诊断任务上,将QADR与标准全局VQC、支持向量机(SVM)以及两种定制的经典参数匹配神经网络(CANN和PMNN)进行了基准测试。QADR展示了出色的可扩展性,在$n_{\text{features}}=2000$时成功运行,而标准全局VQC因内存耗尽而崩溃,同时匹配或超越了优化经典架构的性能。

英文摘要

Training Variational Quantum Circuits (VQCs) under Noisy Intermediate-Scale Quantum (NISQ) constraints introduces severe computational limitations: classical statevector simulation memory scales exponentially ($\mathcal{O}(2^n)$), and global cost functions suffer from barren plateaus where gradient variance decays exponentially ($\mathcal{O}(1/2^n)$). This paper introduces and evaluates the Quantum Algorithm for Distributed Reduction of Entanglements (QADR), a hybrid quantum-classical machine learning framework that decomposes a global $n$-qubit VQC into localized sub-circuits operating approximately within the causal light cones of individual target qubits. QADR reduces classical simulation memory scaling from $\mathcal{O}(2^n)$ to $\mathcal{O}(n \cdot 2^{2d+1})$ for a light cone radius $d$, while naturally mitigating global barren plateaus. We benchmark QADR against standard global VQCs, Support Vector Machines (SVM), and two customized classical parameter-matched neural networks (CANN and PMNN) on the MNIST dataset and the high-dimensional NASA IMS wind turbine drivetrain diagnostic task. QADR demonstrates excellent scalability, operating successfully at $n_{\text{features}}=2000$ where standard global VQCs crash due to memory exhaustion, while matching or exceeding the performance of optimized classical architectures.
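The gap between the two memory scalings can be made tangible with a back-of-the-envelope calculation. The sketch below assumes a dense statevector stored as complex128 amplitudes (16 bytes each) and assumes that each of the $n$ local sub-circuits acts on $2d+1$ qubits, which is one reading consistent with the quoted $\mathcal{O}(n \cdot 2^{2d+1})$ bound; the actual constants in the paper may differ.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: complex128 amplitudes


def global_statevector_bytes(n):
    """Memory for one dense statevector over n qubits."""
    return BYTES_PER_AMPLITUDE * 2 ** n


def light_cone_bytes(n, d):
    """Memory for n local statevectors, each over an assumed 2d + 1 qubits."""
    return BYTES_PER_AMPLITUDE * n * 2 ** (2 * d + 1)


# Example: 30 qubits with light cone radius 2.
print(global_statevector_bytes(30))  # 17179869184 bytes, i.e. 16 GiB
print(light_cone_bytes(30, 2))       # 15360 bytes
```

Under these assumptions the local decomposition grows linearly in $n$ for fixed $d$, which is why it can keep running at qubit counts where the global simulation exhausts memory.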

2606.01286 2026-06-02 cs.SE cs.AI cs.CL cs.LG

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

BenchEvolver: 通过以解决方案为中心的进化进行前沿任务合成

Yangzhen Wu, Aaron J. Li, Wenjie Ma, Li Cao, Ziheng Zhou, Mert Cemri, Shu Liu, Yuran Xiu, Chenxiao Yan, Haikun Zhao, Bin Yu, Ion Stoica, Dawn Song

发表机构 * University of California, Berkeley(加州大学伯克利分校) Institute for Interdisciplinary Information Sciences, Tsinghua University(清华大学交叉信息研究院)

AI总结 提出BenchEvolver框架,通过进化参考解决方案自动生成更难的编程问题,以解决基准饱和问题,并在LiveCodeBench和SciCode上验证其有效性。

详情
AI中文摘要

前沿大语言模型的快速进步导致了广泛的基准饱和,限制了现有数据集区分模型能力或提供有用训练信号的能力。例如,在LiveCodeBench上,前沿模型在简单拆分上达到超过99%的Pass@1,在不同难度级别上平均超过90%的Pass@1。构建新的、具有挑战性的数据集通常需要大量人力,成为进步的瓶颈。我们引入了BenchEvolver,一个以解决方案为中心的进化框架,自动将现有编码问题转化为更难的变体。BenchEvolver不是从头生成问题,而是通过结构化变换进化参考解决方案,并从进化后的解决方案中推导出相应的描述和测试。这种设计将生成过程基于可执行语义,使得能够可扩展地构建高质量、多样化和困难的任务,并具有可验证的正确性。将BenchEvolver应用于LiveCodeBench和SciCode,我们获得了显著更难的进化任务,同时保持了有效性、参考正确性和多样性。我们进一步策划了LiveCodeBench-Plus,一个包含91个问题的基准,结合了进化后的任务和困难的原始LCB-v6任务,其中前沿模型的Pass@1范围从27.5%到62.6%,恢复了强编码模型之间的清晰区分。重要的是,即使对于生成它们的模型,进化后的任务仍然具有挑战性,从而实现了自我改进。我们进一步表明,在进化后的LCB任务上进行强化学习提高了留出编码性能:对于gpt-oss-20b,种子+进化训练在LCB v6 Hard和LCB-Pro Easy上分别获得了+8.7和+8.3的Pass@1提升,分别超过仅种子训练的70.7%和34.8%。我们的结果表明,BenchEvolver可以将饱和的基准转化为前沿级别的评估套件和可重用的训练信号。

英文摘要

The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and exceed 90% Pass@1 on average across difficulty levels. Constructing new, challenging datasets typically requires substantial human effort, creating a bottleneck for progress. We introduce BenchEvolver, a solution-centric evolutionary framework that automatically transforms existing coding problems into harder variants. Rather than generating problems from scratch, BenchEvolver evolves reference solutions through structured transformations and derives corresponding statements and tests from the evolved solutions. This design grounds generation in executable semantics, enabling scalable construction of high-quality, diverse, and difficult tasks with verifiable correctness. Applying BenchEvolver to LiveCodeBench and SciCode, we obtain evolved tasks that are substantially harder while maintaining validity, reference correctness, and diversity. We further curate LiveCodeBench-Plus, a 91-problem benchmark combining evolved and difficult original LCB-v6 tasks, where frontier-model Pass@1 ranges from 27.5% to 62.6%, restoring clear discrimination among strong coding models. Importantly, evolved tasks remain challenging even for the model that generates them, enabling self-improvement. We further show that RL on evolved LCB tasks improves held-out coding performance: for gpt-oss-20b, seed+evolved training achieves +8.7 and +8.3 Pass@1 gains on LCB v6 Hard and LCB-Pro Easy, exceeding seed-only gains by 70.7% and 34.8%, respectively. Our results show that BenchEvolver can convert saturated benchmarks into frontier-level evaluation suites and reusable training signal.

2606.01264 2026-06-02 q-bio.NC cs.HC cs.SD eess.AS eess.SP

A 1000-hour EEG-EMG-audio dataset of Japanese speech production

1000小时日语语音产生的EEG-EMG-音频数据集

Motoshige Sato, Ilya Horiguchi, Masakazu Inoue, Kenichi Tomeoka, Eri Hatakeyama, Yuya Kita, Atsushi Yamamoto, Ippei Fujisawa, Shuntaro Sasai

发表机构 * National Institute of Information and Communications Technology, Japan(日本信息与通信技术研究所)

AI总结 本研究构建了一个包含1020小时同步头皮脑电图、面部肌电图和语音音频的多模态数据集,来自三名健康日语母语者在开放词汇有声语音过程中的记录,旨在支持语音解码、多模态信号处理及脑电图表示学习等研究。

详情
AI中文摘要

我们提出了一个多模态数据集,包含来自三名健康日语母语者在开放词汇有声语音过程中同步记录的1020小时头皮脑电图(EEG)、面部肌电图(EMG)和语音音频。记录使用三种EEG系统——超高密度系统(g.Pangolin)和两种帽式系统(g.SCARABEO和eegosports),通道数从62到128不等——在数月内跨多个会话采集。每个会话提供时间同步的EEG、面部EMG和音频,以及语音事件注释和转录。尽管数据集的主要动机是语音解码,但它也支持多模态信号处理、伪影建模、纵向和跨设备适应以及EEG表示学习等工作。技术验证包括跨参与者、设备和任务的功率谱密度和事件相关电位分析,显示了预期的1/f频谱轮廓、任务相关的alpha频段衰减和时间锁定的诱发响应。该数据集以脑成像数据结构(BIDS)格式通过OpenNeuro在CC0豁免下发布,以支持语音相关及更广泛的EEG研究。

英文摘要

We present a multimodal dataset of 1020 hours of simultaneously recorded scalp electroencephalography (EEG), facial electromyography (EMG), and speech audio from three healthy native Japanese speakers during open-vocabulary overt speech. Recordings were acquired with three EEG systems (an ultra-high-density system, g.Pangolin, and two cap-type systems, g.SCARABEO and eegosports), spanning 62-128 channels, across many sessions over several months. Each session provides time-synchronized EEG, facial EMG, and audio, together with speech-event annotations and transcriptions. Although collected with speech decoding as a primary motivation, the dataset also supports work on multimodal signal processing, artifact modeling, longitudinal and cross-device adaptation, and EEG representation learning. Technical validation included power spectral density and event-related potential analyses across participants, devices, and tasks, which showed the expected 1/f spectral profile, task-related alpha-band attenuation, and time-locked evoked responses. The dataset is released in Brain Imaging Data Structure (BIDS) format via OpenNeuro under a CC0 waiver to support both speech-related and broader EEG research.

2606.01256 2026-06-02 stat.ML cs.LG stat.ME

Distribution-free changepoint localization after sequential change detection

顺序变化检测后的无分布变化点定位

Aytijhya Saha, Aaditya Ramdas

发表机构 * Massachusetts Institute of Technology(麻省理工学院) Carnegie Mellon University(卡内基梅隆大学)

AI总结 本文提出了一种无分布框架,在停止顺序变化检测程序后构建变化点的后检测置信集,无需任何分布假设,并保证了有限样本覆盖率和渐近有界期望大小。

详情
AI中文摘要

本文介绍了一种无分布框架,用于在停止顺序变化检测程序后构建变化点的后检测置信集。众所周知,共形测试鞅可用于顺序检测分布变化,但其本身不提供对声称变化发生时间的推断。以往关于后检测推断的工作需要已知变化前和变化后的分布类别,而本文在没有任何分布假设的情况下实现了变化点的定位。我们建立了有限样本覆盖保证(条件于正确检测)。我们给出了置信集条件期望大小的非渐近界。在合适的渐近机制下,我们证明了置信集的条件期望大小一致有界,并在模拟和真实数据上展示了强大的实证性能。据我们所知,这是第一个具有有效后检测覆盖保证的通用无分布顺序变化点定位框架。

英文摘要

This paper introduces a distribution-free framework for constructing post-detection confidence sets for changepoints after stopping a sequential change detection procedure. It is well known that conformal test martingales can be used to sequentially detect changes in distribution, but by themselves they provide no inference for the time at which a proclaimed change occurred. Past work on post-detection inference requires pre- and post-change classes of distributions to be known, but this paper accomplishes localization of the changepoint without any distributional assumptions. We establish finite-sample coverage guarantees (conditional on correct detection). We provide non-asymptotic bounds on the conditional expected size of the confidence sets. Under suitable asymptotic regimes, we prove that the conditional expected size of the confidence set remains uniformly bounded, and we demonstrate strong empirical performance on simulated and real data. To the best of our knowledge, this is the first general distribution-free framework for sequential changepoint localization with a valid post-detection coverage guarantee.
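As background for the detection step that precedes localization, here is a minimal sketch of a conformal test martingale: smoothed conformal p-values are fed into a power martingale, and an alarm is raised once the martingale exceeds $1/\alpha$. It illustrates only the well-known detection stage, not the confidence-set construction proposed in the paper. The nonconformity score (the raw observation), the betting parameter `epsilon`, and the function names are assumptions chosen for brevity.

```python
import random


def conformal_power_martingale(xs, epsilon=0.5, seed=0):
    """Power martingale built from smoothed conformal p-values.

    Uses the observation itself as the nonconformity score (an assumption);
    under exchangeability the p-values are uniform and the path is a martingale.
    """
    rng = random.Random(seed)
    seen = []
    value = 1.0
    path = []
    for x in xs:
        seen.append(x)
        greater = sum(1 for s in seen if s > x)
        equal = sum(1 for s in seen if s == x)  # counts x itself
        p = (greater + rng.random() * equal) / len(seen)
        p = max(p, 1e-12)                       # guard against p == 0
        value *= epsilon * p ** (epsilon - 1.0)  # power betting function
        path.append(value)
    return path


def first_alarm(path, alpha=0.01):
    """First time (1-indexed) the martingale reaches 1 / alpha, else None."""
    threshold = 1.0 / alpha
    for t, value in enumerate(path, start=1):
        if value >= threshold:
            return t
    return None
```

By Ville's inequality the probability of ever raising an alarm under exchangeable data is at most `alpha`; the alarm time alone, however, says little about when the change actually happened, which is the gap the paper addresses.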

2606.01244 2026-06-02 stat.ML cs.LG cs.NA math.FA math.NA math.ST stat.TH

Efficient Approximation for Encoder--Decoder Neural Operators via Variation Spaces

基于变分空间的编码器-解码器神经算子的高效逼近

Jia-Qi Yang, Lei Shi

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 通过引入变分空间作为非线性算子的无穷维结构类,建立了编码器-解码器双层网络在Bochner L^q范数下的逼近界,误差分解为输入编码误差、输出编码误差和N^{-1/2}阶有限宽逼近项,为超越一般Lipschitz或Fréchet可微算子类的高效神经算子学习提供了理论保证。

Comments 14 pages

详情
AI中文摘要

我们研究使用编码器-解码器神经网络的算子学习。受神经网络函数空间理论的启发,我们引入变分空间作为非线性算子的无穷维结构类。该空间通过直接在输入和输出空间上的向量值测度定义。对于该空间中的算子,我们建立了编码器-解码器双层网络在Bochner $L^q$ 范数下的逼近界。得到的误差界分解为输入编码误差、输出编码误差和一个阶为 $N^{-1/2}$ 的有限宽逼近项,其常数与输入和输出编码维度无关。当输入和输出编码误差在编码维度上呈多项式衰减时,这些估计产生代数逼近和学习速率。结果为超越一般Lipschitz或Fréchet可微算子类的高效神经算子学习提供了理论保证。

英文摘要

We study operator learning using encoder--decoder neural networks. Inspired by the function-space theory of neural networks, we introduce a variation space as an infinite-dimensional structural class for nonlinear operators. This space is defined through vector-valued measures directly on the input and output spaces. For operators in this space, we establish approximation bounds for encoder--decoder two-layer networks in the Bochner $L^q$ norm. The resulting error bound decomposes into the input encoding error, the output encoding error, and a finite-width approximation term of order $N^{-1/2}$, with a constant independent of the input and output encoding dimensions. When the input and output encoding errors decay polynomially in the encoding dimensions, these estimates yield algebraic approximation and learning rates. The results provide theoretical guarantees for efficient neural operator learning beyond general Lipschitz or Fréchet differentiable operator classes.
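The three-part decomposition described in the abstract can be written schematically as follows. This is only a sketch of the shape of such a bound: the symbols $G$, $G_N$, $\varepsilon_{\mathrm{in}}$, $\varepsilon_{\mathrm{out}}$, $d_X$, $d_Y$, $s$, $r$ and the measure $\mu$ are notation assumed here for illustration, and the exact norms, constants, and hypotheses are those of the paper, not of this display.

```latex
% Schematic form of the error decomposition (notation assumed, not the paper's):
%   G      target operator in the variation space
%   G_N    encoder--decoder two-layer network of width N
%   d_X, d_Y   input and output encoding dimensions
\[
  \| G - G_N \|_{L^q(\mu;\, Y)}
  \;\lesssim\;
  \underbrace{\varepsilon_{\mathrm{in}}(d_X)}_{\text{input encoding}}
  \;+\;
  \underbrace{\varepsilon_{\mathrm{out}}(d_Y)}_{\text{output encoding}}
  \;+\;
  \underbrace{C\, N^{-1/2}}_{\text{finite width}},
  \qquad C \text{ independent of } d_X,\, d_Y .
\]
% If the encoding errors decay polynomially, e.g.
\[
  \varepsilon_{\mathrm{in}}(d_X) \lesssim d_X^{-s},
  \qquad
  \varepsilon_{\mathrm{out}}(d_Y) \lesssim d_Y^{-r},
\]
% then balancing the three terms gives an algebraic rate in d_X, d_Y and N.
```

The point of the dimension-independent constant $C$ is that increasing the encoding dimensions to shrink the first two terms does not inflate the finite-width term.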

2606.01234 2026-06-02 econ.GN cs.CE cs.CV cs.GT cs.LG physics.soc-ph q-fin.EC

Differing Roles of Leisure and Productivity in GDP - A Machine Learning based comparative analysis of Germany and USA

休闲与生产力在GDP中的不同作用——基于机器学习的德国与美国比较分析

Achintya Ranjan, Uma Ranjan

发表机构 * Achintya Ranjan(阿金蒂亚·兰詹) Uma Ranjan(乌玛·兰詹)

AI总结 本研究通过随机森林模型分析工作时间和全要素生产率对GDP的影响,并利用Gini重要性、SHAP图和部分依赖图揭示德国与美国社会结构差异在GDP贡献中的体现。

Comments International Conference on Emerging Techniques in Computational Intelligence 2025

详情
AI中文摘要

一个国家的GDP被建模为两个因素之间的相对相互作用——工作时间,反映人口的社会选择,以及全要素生产率,反映对生产力提升因素的集体投资。研究表明,随机森林模型可以从这两个因素准确预测GDP。通过Gini重要性、SHAP图和部分依赖图分析了德国和美国所做的选择差异。结果表明,国家社会结构的差异反映在工作时间和生产率对GDP的相对贡献中。

英文摘要

The GDP of a country is modelled as the relative interaction between two agents - working hours, reflecting the social choice of a population, and Total Factor Productivity, reflecting the collective investment in productivity enhancers. It is shown that a Random Forest model can accurately predict the GDP from these two factors. The differences in the choices made by Germany and USA are analysed through Gini importance, SHAP plots and partial dependency. It is shown that the differences in the social structure of the countries are reflected in the relative contribution of working hours and productivity to the GDP.

2606.01188 2026-06-02 cs.HC cs.AI

pcbGPT: Automatic PCB Schematic Synthesis from Natural Language Requirements

pcbGPT: 从自然语言需求自动合成PCB原理图

Tobias King, Steven Kehrberg, Michael Beigl, Tobias Röddiger

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Bosch Sensortec GmbH(博世传感器技术有限公司)

AI总结 提出pcbGPT系统,通过工具增强合成、组件库搜索、数据表知识、执行检查、结构语义验证和交互式工作流,从自然语言规格自动生成可编辑的KiCad原理图,在20个嵌入式任务上达到pass@1为0.90。

详情
AI中文摘要

在嵌入式、物联网和可穿戴设备开发中,将自然语言硬件需求转化为正确的印刷电路板(PCB)原理图仍然困难。设计者必须选择兼容的组件、解读数据手册、添加支持电路并暴露正确的接口,然后才能开始布局和原型制作,而许多此类电路无法通过简单的仿真进行验证。我们提出了pcbGPT,一个从自然语言规格生成可编辑KiCad原理图的接地系统。pcbGPT用Python DSL表示电路,并结合了工具增强合成、组件库搜索、基于数据手册的设计知识、基于执行的检查、结构和语义验证,以及支持迭代优化和与KiCad项目同步的交互式Web工作流。我们在20个嵌入式原理图生成任务上评估了该系统,这些任务具有参考实现、所需组件和接口约束,以便自动比较。最佳模型在整体上达到pass@1为0.90,pass@5为1.00;在基础和简单任务上pass@1为1.00,中等任务上为0.91,困难任务上为0.72。这些结果以及失败分析表明,pcbGPT已经能够为早期原型生成有用的、可审查的初稿原理图,但尚不足以可靠地取代专家审查。

英文摘要

Translating natural-language hardware requirements into correct printed circuit board (PCB) schematics remains difficult in embedded, IoT, and wearable development. Designers must choose compatible components, interpret datasheets, add support circuitry, and expose correct interfaces before layout and prototyping can begin, while many such circuits cannot be validated through straightforward simulation. We present pcbGPT, a grounded system for generating editable KiCad schematics from natural-language specifications. pcbGPT represents circuits in a Python DSL and combines tool-augmented synthesis with component-library search, datasheet-grounded design knowledge, execution-based checking, structural and semantic validation, and an interactive web workflow that supports iterative refinement and synchronization with KiCad projects. We evaluate the system on 20 embedded schematic-generation tasks with reference implementations, required components, and interface constraints that enable automatic comparison. The best model reaches overall pass@1 of 0.90 and pass@5 of 1.00; pass@1 is 1.00 on basic and easy tasks, 0.91 on medium tasks, and 0.72 on hard tasks. These results, together with failure analysis, show that pcbGPT can already generate useful, reviewable first-draft schematics for early prototyping, but is not yet reliable enough to replace expert review.
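For readers unfamiliar with the pass@k metric, the sketch below shows the unbiased estimator that is widely used in code-generation evaluation: given $n$ samples per task of which $c$ pass, it estimates the probability that at least one of $k$ randomly drawn samples passes. The abstract does not state how pcbGPT computes pass@1 and pass@5, so treating this estimator as the one used is an assumption.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c passing (1 <= k <= n)."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


# A task solved by 3 of 10 samples contributes 0.3 to pass@1.
print(pass_at_k(10, 3, 1))
```

A benchmark-level score is then the mean of this per-task value over all tasks, which is how pass@5 can reach 1.00 while pass@1 stays below it.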

2606.01171 2026-06-02 cs.CY cs.AI

AI From the Margins (AIM): Rethinking Participatory AI Design Through the Lived Experience of Minoritized Communities

来自边缘的AI(AIM):通过少数群体生活经验重新思考参与式AI设计

Tijs Portegies, Laureanne Willems, Maaike Harbers, Giovanni Sileno, Roland van Dierendonck, Mayesha Tasnim, Lotte Willemsen, Sennay Ghebreab

发表机构 * Utrecht University(乌特勒支大学) University of Amsterdam(阿姆斯特丹大学)

AI总结 提出AIM方法论,通过叙事启发、共同规则制定等步骤,将少数群体的生活经验融入参与式AI设计,并在荷兰医疗场景中验证其有效性。

Comments Under review at the AAAI/ACM Conference on AI, Ethics, and Society (AIES 2026)

详情
AI中文摘要

人工智能(AI)可以再现并放大少数群体面临的结构性不平等。参与式AI被提出作为应对措施,但参与通常始于问题定义和成功标准设定之后,留给少数群体重塑AI系统目的的空间有限。我们提出“来自边缘的AI”(AIM):一种方法论立场,阐明如何引发、聚焦并推进少数群体的生活经验,以指导参与式AI设计。AIM并非固定协议;它阐明了一组先决条件,可通过不同技术在不同环境中实施。我们在荷兰医疗背景下,与13名有色人种女性和非二元性别者以及5名市政政策工作者进行了八次会话,应用了AIM,具体包括:(1)使用传记叙事解释法(BNIM)进行叙事启发;(2)共同构建规则制定;(3)参与者决定AI是否、在何处以及如何介入;(4)通过与政策制定者的对话将生活经验转化为AI政策。在会话反思中,参与者将参与描述为实质性的,并呼吁继续开展,展示了以生活经验为基础的准备性取向如何塑造参与式AI设计的目的。

英文摘要

Artificial intelligence (AI) can reproduce and amplify the structural inequities faced by minoritized communities. Participatory AI has been proposed as a response, but participation typically starts after problem definitions and success criteria have been set, leaving limited room for minoritized communities to reshape what an AI system is for. We propose AI From the Margins (AIM): a methodological stance that articulates the conditions under which lived experiences of minoritized communities can be elicited, centered, and carried forward to inform participatory AI design. AIM is not a fixed protocol; it articulates a set of preconditions that can be enacted through different techniques in different settings. We applied AIM in a Dutch healthcare context in eight sessions with 13 women and non-binary people of color and five municipal policy workers, namely through (1) narrative elicitation using the Biographic Narrative Interpretive Method (BNIM); (2) co-constructed rule-making; (3) participants' determination of whether, where, and how AI should be involved; and (4) translating lived experience into AI policy through dialogue with policymakers. In their reflections on the sessions, participants described the engagement as substantive and called for its continuation, demonstrating how preparatory orientation fundamentally grounded in lived experience shapes what participatory AI design is for.

2606.01170 2026-06-02 cs.MA cs.RO

Coordinating Task Switching in a Robotics Multi-Agent System Using Behavior Trees

使用行为树协调机器人多智能体系统中的任务切换

Lucas Haug, Anarosa Alves Franco Brandão, Arthur Casals

发表机构 * LTI - Laboratório de Técnicas Inteligentes, Universidade de São Paulo, SP(智能技术实验室,圣保罗大学)

AI总结 本文提出一种基于行为树的方法,用于在IEEE VSSS机器人足球多智能体系统中协调机器人行为,并通过与有限状态机的对比实验及竞赛验证其有效性。

Comments 7 pages, 7 figures. Preprint of a manuscript submitted to the XXVI Congresso Brasileiro de Automática (CBA 2026)

详情
AI中文摘要

多智能体系统在机器人领域的应用是一个极具挑战性的领域。为了促进以游戏为底层领域的策略和机制的研究与开发,提出了多项涉及此类系统的竞赛。其中,IEEE Very Small Soccer (VSSS) 类别是本文描述的案例研究。在VSSS中,每队三个机器人,在非常动态的足球比赛环境中竞争。因此,比赛中机器人行为的协调至关重要。本文提出了一种基于行为树的方法,用于支持圣保罗大学ThundeRatz机器人团队VSSS队伍中的多机器人协调。此外,使用FIRASim模拟器将所提出的方法与之前基于有限状态机(FSM)的方法进行了比较。此外,该新策略的性能还在一次学术机器人竞赛中得到了进一步评估。

英文摘要

The application of multi-agent systems in robotics is a very challenging field. Several competitions involving such systems are proposed to foster research and development of strategies and mechanisms using games as the underlying domain. Among them are the ones from the IEEE Very Small Soccer (VSSS) category, which is the case study described in this paper. In VSSS, two teams of three robots each compete in a very dynamic environment of a soccer game. Thus, coordination of robots' behavior during the game is crucial to win it. In this paper, we present a Behavior-Tree-based approach to support multi-robot coordination within the VSSS team of the ThundeRatz robotics team from the Universidade de São Paulo. Moreover, a comparison between the proposed approach and the previous one, which was based on a Finite State Machine (FSM), was conducted using the FIRASim simulator. Besides that, the performance of this new strategy was further evaluated in an academic robotics competition.
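To illustrate how a behavior tree expresses task switching, here is a minimal sketch with the two classic composite nodes, Sequence and Selector. It is not the ThundeRatz implementation: the leaf names `ball_is_close`, `attack`, and `defend`, the distance threshold, and the shared `state` dictionary are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Leaf:
    """Wraps a function state -> Status (a condition or an action)."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return self.fn(state)


class Sequence:
    """Ticks children in order; stops at the first one that does not succeed."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Selector:
    """Ticks children in order; stops at the first one that does not fail."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


# Hypothetical leaves for one robot.
def ball_is_close(state):
    return Status.SUCCESS if state["ball_distance"] < 0.3 else Status.FAILURE


def attack(state):
    state["task"] = "attack"
    return Status.RUNNING


def defend(state):
    state["task"] = "defend"
    return Status.RUNNING


# Attack when the ball is close, otherwise fall back to defending.
robot_tree = Selector(
    Sequence(Leaf(ball_is_close), Leaf(attack)),
    Leaf(defend),
)
```

Because the tree is re-evaluated from the root on every tick, the robot switches tasks as soon as the condition changes, without the explicit transition table that a finite state machine would need.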

2606.01135 2026-06-02 cs.NE cs.SD

Spiking and Event-driven Neuromorphic Mamba Models for Efficient Speech Recognition

用于高效语音识别的脉冲与事件驱动神经形态Mamba模型

Tauseef Ahmed, Tao Sun, Jeronimo Castrillon, Kanishkan Vadivel, Guangzhi Tang

发表机构 * University of California, Berkeley(加州大学伯克利分校) Tsinghua University(清华大学)

AI总结 提出脉冲和事件驱动的神经形态Mamba模型,通过激活稀疏性提升语音识别效率,在LibriSpeech上实现超过60%的激活稀疏性且精度损失小于1%,并开发周期精确的事件驱动模拟器实现算法-硬件协同优化。

Comments Accepted at IJCNN2026

详情
AI中文摘要

深度学习极大地推动了自动语音识别(ASR)的发展,使其能够广泛部署在智能手机和智能家居系统等边缘设备上。然而,深度神经网络的计算和能量需求给这种资源受限的部署带来了巨大挑战,导致延迟并限制了实时交互。神经形态计算通过脉冲神经网络(SNN)和事件驱动神经网络引入激活稀疏性,将密集运算转换为稀疏计算,提供了一种有前景的解决方案。然而,对于ASR,评估不同神经形态策略硬件优势的研究仍然缺乏。本文探索了脉冲和事件驱动的神经形态神经网络,以改进用于ASR的最先进SpeechMamba模型中的激活稀疏性。我们引入了一个带有FATReLU激活的事件驱动SpeechMamba,在LibriSpeech上实现了超过60%的激活稀疏性,且精度下降不到1%。我们还提出了一个脉冲SpeechMamba,其稀疏性超过70%,同时参数比同类SNN少30%。最后,我们开发了一个周期精确的事件驱动模拟器,实现了灵活的算法-硬件协同探索,帮助我们识别计算瓶颈,并带来超过10%的额外效率提升。

英文摘要

Deep learning has greatly advanced automatic speech recognition (ASR), enabling widespread deployment on edge devices such as smartphones and smart home systems. However, the computational and energy demands of deep neural networks pose significant challenges for such resource-constrained deployments, introducing latency and limiting real-time interaction. Neuromorphic computing offers a promising solution by introducing activation sparsity through spiking neural networks (SNNs) and event-driven neural networks, converting dense operations into sparse computations. However, a study that evaluates the hardware benefits of different neuromorphic strategies remains lacking for ASR. This paper explores spiking and event-driven neuromorphic neural networks to improve activation sparsity in the state-of-the-art SpeechMamba model for ASR. We introduce an event-driven SpeechMamba with FATReLU activation, achieving over 60% activation sparsity with less than 1% accuracy degradation on LibriSpeech. We also propose a spiking SpeechMamba that attains over 70% sparsity while using 30% fewer parameters than comparable SNNs. Finally, we develop a cycle-accurate event-driven simulator enabling flexible algorithm-hardware co-exploration, which helps us identify computational bottlenecks and yields over 10% additional efficiency improvements.
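Activation sparsity, the quantity the abstract reports, is simply the fraction of activations that are exactly zero. The sketch below pairs it with a thresholded ReLU in the spirit of FATReLU, assuming the common definition in which values below a positive threshold are zeroed; the threshold value and both function names are illustrative, and the exact variant used in the paper may differ.

```python
def fatrelu(x, threshold):
    """Thresholded ReLU (assumed FATReLU form): pass x only if x >= threshold."""
    return x if x >= threshold else 0.0


def activation_sparsity(values):
    """Fraction of activations that are exactly zero."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(1 for v in values if v == 0.0) / len(values)


pre_activations = [-1.0, 0.2, 0.4, 0.6, 0.9]
activations = [fatrelu(v, threshold=0.5) for v in pre_activations]
print(activations)                       # [0.0, 0.0, 0.0, 0.6, 0.9]
print(activation_sparsity(activations))  # 0.6
```

A plain ReLU would zero only the negative entry here (sparsity 0.2); raising the threshold trades a small amount of signal for more zeros, which event-driven hardware can skip.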

2606.01134 2026-06-02 eess.AS cs.LG cs.SD

Context-aware child-directed speech detection from long-form recordings

基于上下文的儿童导向语音检测:从长时间录音中识别

Théo Charlot, Tarek Kunze, Kaveri K. Sheth, Alejandrina Cristia, Marvin Lavechin

发表机构 * LSCP, DEC, ENS, EHESS, CNRS, PSL University, France(认知科学与心理语言学实验室(LSCP)、认知研究系(DEC)、巴黎高等师范学院(ENS)、法国社会科学高等研究院(EHESS)、法国国家科学研究中心(CNRS)、巴黎文理研究大学(PSL University),法国)

AI总结 本研究通过微调自监督模型、融入上下文信息以及端到端流水线评估,显著提升了从长时间录音中自动检测儿童导向语音的性能。

Comments 6 pages, 1 figure

详情
AI中文摘要

在长时间录音中自动区分儿童导向语音和成人导向语音,是可扩展分析儿童语言环境的关键。现有方法孤立地处理话语,并且主要针对英语进行评估。我们从三个维度解决这些不足。首先,我们在一个包含182名儿童的多语言数据集上微调并评估了六个自监督模型,表明在儿童中心录音上进行领域内预训练显著优于在成人语音上训练的模型。其次,我们证明融入周围上下文能大幅提升分类性能,平均F1分数绝对提升13.8%。第三,我们在一个现实的端到端流水线中评估我们的模型,从成人语音检测到受话者分类,显示在自动分割下性能有所下降,但仍持续优于基于规则的基线。

英文摘要

Automatically distinguishing child-directed speech from adult-directed speech in long-form recordings is key to scalable analyses of children's language environments. Existing approaches process utterances in isolation and have been evaluated primarily on English. We address these gaps along three dimensions. First, we fine-tune and evaluate six self-supervised models on a multilingual dataset of 182 children, showing that in-domain pre-training on child-centered recordings substantially outperforms models trained on adult speech. Second, we demonstrate that incorporating surrounding context substantially improves classification, with an absolute gain of 13.8% in average F1-score. Third, we evaluate our model in a realistic end-to-end pipeline, from adult speech detection to addressee classification, showing that performance drops under automatic segmentation but still consistently outperforms a rule-based baseline.

2606.01109 2026-06-02 cs.DL cs.CL

Digging Up Citations: FOSSIL, a Dataset and Workflow for Reference Extraction in Law and the Humanities

挖掘引文:FOSSIL——法律与人文学科参考文献提取的数据集与工作流

Luca Foppiano, Christian Boulanger

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 针对法律和人文学科脚注中参考文献提取的挑战,本文提出FOSSIL数据集、PDF-TEI编辑器、标注工作流和Grobid特化模块,将提取质量从0.36提升至0.72(micro-F1)。

Comments This is an extended abstract, peer-reviewed and presented at CiteX2026 https://sites.google.com/view/workshop-on-citation-extractio/startseite

详情
AI中文摘要

引文提取工具是为自然科学中结构化的文末参考文献设计的,但法律和人文学科的学术引用主要出现在脚注中,其中书目数据与评论和交叉引用交织在一起,并且在不同语言和风格之间差异很大。为了解决缺乏合适的黄金标准资源的问题,我们提出了FOSSIL(基于脚注的开放获取社会科学与人文学科科学实例标签),这是一个开放许可的多语言数据集,包含96篇带注释的学术文章,超过7,600个嵌入脚注的参考文献,以及PDF-TEI编辑器(一个协作式网络注释工具)、一个文档化的七人标注工作流和一个用于基于脚注的引文的Grobid特化模块。在端到端评估中,特化流程将提取质量比默认Grobid提高了近一倍(micro-F1从0.36提升到0.72),这主要归功于召回率的提高,同时表明交叉引用和混合内容脚注仍有很大的改进空间。本扩展摘要介绍了正在进行的工作;引文分割和解析的注释以及交叉引用解析正在进行中。

英文摘要

Citation extraction tools are designed for the structured end-of-document bibliographies of the natural sciences, but law and humanities scholarship cites references primarily in footnotes, where bibliographic data is interleaved with commentary and cross-references and varies widely across languages and styles. To address the scarcity of suitable gold-standard resources, we present FOSSIL (Footnote-based Open-access SSH Scientific Instance Labels), an openly licensed multilingual dataset of 96 annotated scholarly articles containing over 7,600 footnote-embedded references, together with PDF-TEI Editor (a collaborative web annotation tool), a documented seven-annotator workflow, and a Grobid specialization for footnote-based citations. In end-to-end evaluation, the specialized pipeline nearly doubles extraction quality over default Grobid (micro-F1 from 0.36 to 0.72), driven largely by improved recall, while showing that substantial headroom remains for cross-references and mixed-content footnotes. This extended abstract presents work in progress; annotation of citation segmentation and parsing, as well as cross-reference resolution, is ongoing.
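Micro-F1, the metric quoted above, pools true positives, false positives, and false negatives across all documents before computing precision and recall, so documents with many references weigh more than short ones. The sketch below shows that pooling; the per-document counts are made-up illustrative numbers, not FOSSIL results, and the matching criterion that decides what counts as a true positive is left out.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-document (tp, fp, fn) tuples."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: one well-extracted document, one with many misses.
example = [(8, 2, 2), (1, 0, 7)]
print(round(micro_f1(example), 3))  # 0.621
```

In this toy example precision is high (9/11) while recall is only 9/18, which mirrors the abstract's observation that gains in such pipelines are driven largely by recall.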

2606.01107 2026-06-02 cs.LO cs.LG math.LO

How (and when) can you fit examples to logic-based hypothesis classes over infinite structures?

如何(以及何时)能在无限结构上拟合样本到基于逻辑的假设类?

Michael Benedikt, Alessio Mansutti

Affiliations * University of Oxford, IMDEA Software Institute

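AI summary Studies the computational and descriptive complexity of fitting finite samples to logically defined hypothesis classes over infinite structures (such as the real ordered field and Presburger arithmetic), with attention to cases where fittability can be determined by queries in a natural query language over the sample.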

详情
AI中文摘要

我们研究拟合问题,有时称为“训练问题”,其中我们有一个由输入和输出组成的有限样本,并且我们想知道是否存在某个类中的函数可以在给定输入上精确或近似地产生这些输出。我们关注在常见可判定结构(如实数有序域和Presburger算术)中逻辑定义类的拟合的计算复杂性和描述复杂性,以及通过组合或模型论性质定义的更广泛类。我们隔离了这些拟合问题的复杂性,特别关注我们可以使用样本上的自然查询语言中的查询来确定样本是否可拟合的情况。

英文摘要

We study fitting problems, sometimes called "training problems", where we have a finite sample consisting of inputs and outputs, and we want to know whether there is a function in a certain class that could produce these outputs, exactly or approximately, on the given inputs. We focus on the computational and descriptive complexity of fitting for logically defined classes in common decidable structures, like the real ordered field and Presburger arithmetic, and also for broader classes defined via combinatorial or model-theoretic properties. We isolate the complexity of these fitting problems, with particular attention to cases where we can use queries in a natural query language over the sample to determine whether a sample is fittable.
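
As a toy illustration of an exact fitting problem over the ordered reals, consider the class of threshold functions f_t(x) = 1 if x >= t, else 0. This class is an assumption chosen here for illustration and is not taken from the paper. A labelled sample is exactly fittable by some f_t if and only if every input labelled 0 is strictly smaller than every input labelled 1, a condition that can be checked by a simple query over the sample itself.

```python
def fittable_by_threshold(sample):
    """Decide whether some threshold t fits the sample exactly.

    sample: list of (x, y) pairs with real x and label y in {0, 1}.
    Hypothesis class (illustrative assumption): f_t(x) = 1 iff x >= t.
    """
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:
        # Only one label present: a sufficiently small or large t works.
        return True
    # Fittable iff no 0-labelled input lies at or above a 1-labelled input.
    return max(zeros) < min(ones)


print(fittable_by_threshold([(0.5, 0), (1.2, 0), (3.0, 1)]))
print(fittable_by_threshold([(0.5, 0), (3.0, 1), (4.0, 0)]))
```

The check never searches over t; it only inspects pairs of sample points, which is the flavour of "deciding fittability by a query over the sample" described in the abstract.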

2606.01090 2026-06-02 stat.ME cs.LG

Measuring the Symmetry–Data Exchange Rate

测量对称性与数据交换率

Ahmed M. Adly

Affiliations * Independent Researcher, Egypt

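AI summary Experiments on a controlled C_n-symmetric task find that a wrong-group constraint is harmful, that the architecture-vs-augmentation gap depends on asymmetric test-time computation, and that the symmetry exchange rate is consistent with the theoretical value but has a confidence interval that includes zero; the paper also contributes a relative-rate estimator, a wrong-group control, and a failure taxonomy.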

Comments 19 pages, 9 figures. Exploratory study. Code and data at https://github.com/AhmedMostafa16/symmetry-exchange

详情
AI中文摘要

等变理论预测,架构对称性先验将样本复杂度降低|G|倍;这一结论被广泛引用,但很少作为具有控制分离先验及其混杂因素的比例定律进行测量。在一个受控的C_n对称任务上,我们报告了三个发现。首先,具有相同轨道大小和匹配计算的错位群控制比无约束更差(联合成对CI [+0.79, +3.26]排除零,对估计器鲁棒);错位约束是有害的,而不仅仅是无帮助。其次,配备测试时轨道平均的增强基线精确匹配等变模型——跨匹配单元的每周期验证曲线逐位相同——因此架构与增强之间的差距取决于非对称测试时计算,而非无条件。第三,相对交换率beta_diff = 1.28在符号和数量级上与理论值1.0一致(单层CI [+0.92, +2.05]);更保守的两层自助法(种子×群大小)将其扩大到[-0.63, +1.72],包含零,并且在sqrt(2)间隔网格上的更细N复制不确定(点估计-0.82)。方法论贡献——消除共享难度混杂因素的相对率估计器、错位群控制和预先指定的失败分类法——可迁移到任何强度可参数化的归纳偏置。诚实范围:主要估计器beta_diff是在初始分析揭示正斜率可识别性问题后事后采用的;设计从未外部预注册;标题数字基于粗N网格上七个群大小的OLS斜率。这是一项探索性研究,而非确认性测量;错位群结果是最清晰的发现,也是我们报告最有信心的发现。在新鲜种子上的注册复制是未来工作。

英文摘要

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); a misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly (bit-identical per-epoch validation curves across matched cells), so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds x group sizes) widens this to [-0.63, +1.72], which includes zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions (the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy) transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.
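
Test-time orbit averaging, mentioned in the abstract, can be sketched as follows, assuming that C_n acts on a length-n input by cyclic shifts (the abstract does not specify the action). Averaging any scoring function over all n shifts yields a shift-invariant prediction. The `score` function is a hypothetical stand-in for a trained, non-invariant model, and none of the names below come from the paper's code.

```python
import math


def rotate(x, k):
    """Cyclically shift the list x left by k positions (assumed C_n action)."""
    k %= len(x)
    return x[k:] + x[:k]


def orbit_average(f, x):
    """Average f over the full C_n orbit of x, where n = len(x)."""
    n = len(x)
    return sum(f(rotate(x, k)) for k in range(n)) / n


def score(x):
    """Hypothetical non-invariant model: a position-weighted sum."""
    return sum((i + 1) * v for i, v in enumerate(x))


x = [1.0, 2.0, 3.0, 4.0]
shifted = rotate(x, 1)

# The raw model changes under a shift; the orbit-averaged model does not.
print(score(x), score(shifted))
print(math.isclose(orbit_average(score, x), orbit_average(score, shifted)))
```

The averaged predictor costs n forward passes per input at test time, which is the asymmetric test-time computation the abstract refers to when comparing augmentation against an equivariant architecture.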