arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.08268 2026-05-12 cs.MA cs.AI

Insider Attacks in Multi-Agent LLM Consensus Systems

Xiaolin Sun, Zixuan Liu, Yibin Hu, Zizhan Zheng

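AI summary: This paper studies insider attacks in multi-agent large language model (LLM) consensus systems, in which a malicious insider agent poses as a legitimate member and tries to undermine the other agents' ability to reach consensus. The authors propose a world-model-based framework that learns the dynamics of benign agents' latent behavioral states and trains the attacker with reinforcement learning to disrupt the consensus process. Preliminary experiments show that, compared with a baseline that uses a malicious prompt directly, the method lowers the consensus success rate and prolongs disagreement more effectively, indicating its potential for adversarial attacks on language-based multi-agent systems.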

English abstract

Large language models (LLMs) are increasingly deployed in multi-agent systems where agents communicate in natural language to solve tasks jointly. A key capability in such systems is consensus formation, where agents iteratively exchange messages and update decisions to reach a shared outcome. However, most existing multi-agent LLM frameworks assume that all participating agents are aligned with the system objective. In practice, a malicious insider may participate as a legitimate member of the group while pursuing a hidden adversarial goal. In this work, we study insider manipulation in multi-agent LLM consensus systems. We formalize the problem as a sequential decision-making task in which a malicious agent seeks to delay or prevent agreement among benign agents. To make attack optimization tractable, we propose a world-model-based framework that learns surrogate dynamics over the latent behavioral states of benign agents and then trains an attacker using reinforcement learning based on this learned model. Preliminary results show that the trained attacker reduces the benign consensus rate and prolongs disagreement more effectively than the direct malicious-prompt baseline. These results suggest that combining latent world models with reinforcement learning is a promising direction for adaptive insider attacks in language-based multi-agent systems.

2605.08267 2026-05-12 cs.SE cs.AI cs.DC cs.ET

Execution Envelopes: A Shared Admission Contract for Backend AI Execution Requests

Krti Tallam

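AI summary: Enterprise AI backends must handle many kinds of execution requests, and managing them uniformly without rebuilding the same contract in each subsystem is a challenge. This paper proposes the "execution envelope", a normalized internal admission object that records what is to be executed, the resources requested, the policy-relevant scope, and the resources the backend ultimately granted, giving governance and observability a single attachment point. The proposal does not cover service-specific scheduling or authorization mechanisms; it defines a descriptive admission interface through which governance policy can be applied uniformly before backend processing begins, offering a useful shared execution-admission primitive for modern AI backends.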

Comments Systems paper on backend admission contracts, 12 pages, 4 tables

English abstract

Enterprise AI backends increasingly admit heterogeneous execution requests across model deployment, inference, evaluation, data movement, and agentic workflows. In many systems, those requests arrive in service-specific shapes, which makes it difficult to attach shared admission-time behavior such as logging, governance hints, resource accounting, authorization-aware policy hooks, and later runtime review without rebuilding the same contract in each subsystem. This paper introduces the execution envelope, a normalized internal admission object that records who is asking for what kind of execution, what resources were requested, what policy-relevant scope accompanied the request, and what the backend ultimately granted. The proposal is intentionally narrow. It does not replace service-specific request models, perform scheduling, or introduce a new authority token. Instead, it defines a descriptive admission seam that can be threaded through real backend paths before backend-specific resolution begins. I formalize the distinction between requested and granted resources, specify the field families, invariants, and lifecycle of the envelope, work through POST /serving/deploy_model as an initial proving ground, and position the design relative to usage control, analyzable authorization, admission control, and cluster scheduling. The central claim is that a shared execution-admission contract is a useful missing primitive for modern AI backends because it creates one place to attach governance and observability without pretending to solve placement, policy, and runtime execution in a single step.
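
The requested-versus-granted distinction described in this abstract can be illustrated with a minimal sketch. Every class and field name below (`ResourceSpec`, `ExecutionEnvelope`, `principal`, `policy_scope`, and so on) is a hypothetical choice for illustration; the abstract does not list the paper's actual field families or invariants.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical resource description; the field names are illustrative."""
    gpus: int = 0
    memory_gb: float = 0.0


@dataclass
class ExecutionEnvelope:
    """Descriptive admission record: who asked for what, and what was granted."""
    principal: str                          # who is asking (assumed field)
    execution_kind: str                     # e.g. "deploy_model" (assumed field)
    requested: ResourceSpec                 # what the caller asked for
    policy_scope: Dict[str, str] = field(default_factory=dict)
    granted: Optional[ResourceSpec] = None  # filled in later by the backend

    def grant(self, granted: ResourceSpec) -> None:
        """Record the backend's decision once; the requested spec is left untouched."""
        if self.granted is not None:
            raise ValueError("envelope already has a grant recorded")
        self.granted = granted

    def was_reduced(self) -> bool:
        """True when the backend granted less than was requested."""
        if self.granted is None:
            return False
        return (self.granted.gpus < self.requested.gpus
                or self.granted.memory_gb < self.requested.memory_gb)
```

Keeping `requested` and `granted` as separate fields lets logging and governance hooks compare the two without touching any service-specific request model.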

2605.08266 2026-05-12 eess.IV cs.CV

Coarse-to-Fine: Progressive Image Compression for Semantically Hierarchical Classification

Jungwoo Kim, Jun-Hyuk Kim, Jong-Seok Lee

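AI summary: This paper proposes a semantic-hierarchy-aware progressive image compression method that provides coarse-to-fine semantic scalability. It groups ImageNet-1K classes into semantic hierarchies using CLIP embeddings and, building on a channel-wise autoregressive framework, decomposes the latent representation into channel blocks ordered by semantic level, each optimized for its level. Experiments show that the method substantially improves coarse-level recognition at low bitrates while maintaining fine-grained classification accuracy at higher bitrates, providing an efficient and interpretable solution for task-adaptive image coding.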

Comments Accepted at ICIP 2026

English abstract

Recent advances in learned image compression (LIC) have enabled practical deployments, spurring active research into image compression for machines and progressive coding schemes. However, their integration remains under-explored: prior works on progressive machine codec predominantly target sample-level difficulty adaptation (i.e., easy-to-hard), without considering semantic-level scalability. In this work, we introduce a semantic hierarchy-aware progressive codec that enables semantic scalability (i.e., coarse-to-fine) from a single bitstream. We first systematically categorize ImageNet-1K classes into CLIP embedding-based semantic hierarchies. Based on a channel-wise autoregressive framework, we decompose latent representations into hierarchically ordered channel blocks, each explicitly optimized for a corresponding semantic hierarchy. Extensive experiments demonstrate that our approach substantially improves coarse-level recognition at low bitrates while maintaining fine-grained accuracy at higher bitrates. By reframing progressive transmission through the lens of semantic scalability, our work provides an efficient and interpretable solution for task-adaptive image coding, outperforming existing progressive codecs under hierarchical evaluation.

2605.08263 2026-05-12 stat.ML cs.IT cs.LG eess.SP math.IT stat.ME

Decentralized Conformal Novelty Detection via Quantized Model Exchange

Kyle Loh, Yu Xiang

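AI summary: This paper studies decentralized novelty detection with global false discovery rate (FDR) control under heterogeneous composite null distributions, subject to privacy and bandwidth constraints. It proposes a framework based on quantized model exchange, in which independent agents share low-precision representations of their locally learned non-conformity score functions. The method preserves conditional exchangeability and provides rigorous finite-sample FDR control; experiments on synthetic data show that it maintains competitive statistical power while sharply reducing communication cost.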

English abstract

This work studies decentralized novelty detection with global false discovery rate (FDR) control across heterogeneous composite null distributions, without sharing the raw data due to privacy and bandwidth considerations. We propose a framework based on the exchange of quantized surrogate models, allowing independent agents to share low-precision representations of locally learned non-conformity score functions. We prove that evaluating data against these quantized composite scores preserves conditional exchangeability, providing rigorous finite-sample guarantees for global FDR control. Empirical studies on synthetic datasets confirm our theoretical results, demonstrating that the proposed approach maintains competitive statistical power while drastically reducing the communication cost.
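
For readers unfamiliar with conformal novelty detection, the standard centralized building block is sketched below: conformal p-values computed from calibration scores, followed by the Benjamini-Hochberg step-up procedure. This is not the paper's decentralized, quantized scheme; it assumes that higher non-conformity scores mean "more novel", and the function names are illustrative.

```python
from typing import List


def conformal_p_values(calib_scores: List[float], test_scores: List[float]) -> List[float]:
    """p_j = (1 + #{calibration scores >= test score j}) / (n + 1)."""
    n = len(calib_scores)
    return [(1 + sum(1 for s in calib_scores if s >= t)) / (n + 1)
            for t in test_scores]


def benjamini_hochberg(p_values: List[float], alpha: float) -> List[int]:
    """Return the indices rejected by the BH step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under the BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

A test point is flagged as a novelty when its index is returned by `benjamini_hochberg`.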

2605.08262 2026-05-12 cond-mat.mtrl-sci cs.AI

SLayerGen: a Crystal Generative Model for all Space and Layer Groups

Rees Chang, Andrew Novick, Ryan P Adams, Elif Ertekin

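AI summary: This paper proposes SLayerGen, a crystal generative model that produces structures invariant to any space group or layer group, addressing the inability of existing models to handle diperiodic materials such as 2D superconductors and thin-film semiconductors. SLayerGen combines coarse-to-fine discrete autoregressive lattice generation, transformer-based sampling of Wyckoff positions and elements, and space- or layer-group-equivariant diffusion of atomic coordinates. The authors also assemble datasets of diperiodic materials and propose evaluation metrics and representations of layer-group symmetries; SLayerGen consistently improves de novo generation of diperiodic materials over bulk crystal generative models.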

English abstract

Crystal generative models have shown rapid progress for accelerating the discovery of bulk, periodic materials. However, many material systems such as 2D superconductors, thin film semiconductors, and catalytic surfaces are diperiodic, i.e., aperiodic along one of the lattice directions. These systems are invariant under the layer groups, which are known to influence materials properties yet not considered by existing models. In this paper, we propose SLayerGen, a generative model that produces crystals constrained to be invariant to any space or layer group. SLayerGen consists of coarse-to-fine discrete autoregressive lattice generation; transformer-based autoregressive sampling of Wyckoff positions, elements, and numbers of symmetrically unique atoms; and space or layer group equivariant diffusion of atomic coordinates. For the diffusion component, we corrected an inconsistency in the loss from prior work arising from hexagonal groups being non-orthogonal in fractional coordinates. To facilitate progress in generative modeling of diperiodic materials, we assembled and filtered datasets of monolayers and bilayers, propose relevant evaluation metrics, and developed novel representations for layer group symmetries. For de novo generation of diperiodic materials, SLayerGen achieves consistent performance gains over bulk crystal generative models and is competitive when training jointly on bulk and diperiodic materials.

2605.08261 2026-05-12 cs.SE cs.AI

Computer Use at the Edge of the Statistical Precipice

Pierluca D'Oro, Sneha Silwal, William Wong, Yuxuan Sun, Fanyi Xiao, Manchen Wang, Eric Gan, Allen Bolourchi, Joseph Tighe

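AI summary: This paper examines methodological problems in evaluating computer use agents (CUAs) in interactive environments and exposes key flaws in current practice. It shows that a script that simply replays a recorded action sequence outperforms frontier models on some benchmarks, which reveals weaknesses in both environment design and evaluation methodology. In response, the authors propose the PRISM design principles and the DigiWorld benchmark, and develop an evaluation framework based on aggregated confidence intervals, arguing that principled environment design and rigorous evaluation are essential to CUA research.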

English abstract

Evaluating Computer Use Agents (CUAs) on interactive environments is fraught with methodological pitfalls that the field has yet to systematically address. We show that a 1MB replay script that blindly executes a recorded action sequence without ever observing the screen outperforms frontier models on prominent static benchmarks, and prove that its expected success rate is exactly equal to the source agent's pass@k in deterministic environments. We trace this and other failures to two root causes: non-principled environment design (static, unsandboxed, or unreliably verified environments) and non-principled evaluation methodology (naive aggregation and misuse of pass@k for stateful UI interactions). To address the first, we propose PRISM, five design principles for CUA environments (privileged verification, realistic environments, integrity-checked configurations, sandboxed execution, and multifactorial variability) and instantiate them in DigiWorld, a benchmark of 15 realistic sandboxed mobile applications able to evaluate agents in over 3.2 million verified unique configurations. To address the second, we develop an aggregation framework pairing Wilson score intervals with hierarchical bootstrap, producing confidence intervals that correctly account for the nested structure of CUA benchmarks, as we empirically demonstrate. All together, we show that principled environment design and rigorous evaluation methodology are not optional refinements but prerequisites for meaningful CUA research.
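
The Wilson score interval mentioned in this abstract has a standard closed form, sketched below for a single success rate. The hierarchical bootstrap that the authors pair it with is not shown, and the function name and the default `z = 1.96` (roughly a 95% interval) are choices made here for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the naive normal approximation, this interval stays informative when an agent succeeds on zero or all of a small number of tasks.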

2605.08257 2026-05-12 cs.CR cs.AI cs.LG

Research on Security Enhancement Methods for Adversarial Robust Large Language Model Intelligent Agents for Medical Decision-Making Tasks

Saisai Hu

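AI summary: To address the security of medical decision-making agents in adversarial settings, this paper proposes ARSM-Agent, a full-link security enhancement framework in which modules for input risk perception, medical evidence constraint, knowledge consistency verification, decision confidence reweighting, and secure output control work together to improve robustness and safety. Experiments show that the method markedly lowers attack success rates across several attack scenarios and raises the knowledge consistency score, supporting its effectiveness and reliability for medical decision-making tasks.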

Comments 5 pages, 2 figures, 1 table. Accepted for oral presentation at AINIT 2026

English abstract

Motivated by the challenge to improve the adversarial robustness, security, and trust of medical decision making intelligent agents, this study develops a full-link security enhancement framework, which describes "input risk perception - medical evidence constraint - knowledge consistency verification - decision confidence reweighting - security output control - adversarial feedback update." We propose ARSM-Agent and define a weighted joint objective consisting of decision accuracy loss, adversarial robustness loss, safety refusal loss, and knowledge consistency loss, with weights of 0.3, 0.3, 0.2, and 0.2, respectively. The whole medical decision formulation is implemented by multi-module collaborative linkage. We verify that the algorithm is more efficient than four baselines, including LLM-Agent, Retrieval-Agent, Filter-Agent, and Adv-Train-Agent. Under semantic perturbation, prompt injection, drug-name confusion, and false-evidence attacks, ARSM-Agent reduces the overall attack success rate to 8.7% and achieves a knowledge consistency score of 0.91. Ablation experiments quantify each module's contribution: removing risk perception, evidence retrieval, consistency verification, and confidence reweighting reduces accuracy by 6.7%, 9.1%, 7.6%, and 4.4%, respectively, and increases attack success rate by 13.8%, 11.1%, 8.6%, and 6.9%. The proposed approach addresses key security issues of medical decision making intelligent agents, obtains secure decision making in challenging scenarios, and provides reliable intelligent support for medical decision-making intelligent agents.
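
Written out, the weighted joint objective described in this abstract would take the following form. The symbol names are assumptions introduced here; the abstract gives only the four loss terms and their weights, in this order.

```latex
\mathcal{L}_{\text{total}}
  = 0.3\,\mathcal{L}_{\text{acc}}
  + 0.3\,\mathcal{L}_{\text{rob}}
  + 0.2\,\mathcal{L}_{\text{ref}}
  + 0.2\,\mathcal{L}_{\text{con}}
```

Here $\mathcal{L}_{\text{acc}}$, $\mathcal{L}_{\text{rob}}$, $\mathcal{L}_{\text{ref}}$, and $\mathcal{L}_{\text{con}}$ stand for the decision accuracy, adversarial robustness, safety refusal, and knowledge consistency losses; the four weights sum to 1.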

2605.08247 2026-05-12 cs.PL cs.AI

LLM Translation of Compiler Intermediate Representation

Andrea Valenzuela Ramirez, Cristian Gutierrez-Gomez, Marta Barroso, Dario Garcia-Gasulla, Sara Royuela

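AI summary: This study explores using large language models (LLMs) to translate between compiler intermediate representations (IRs), aiming to remove the interoperability barriers caused by semantic and structural differences between the IRs of different compilers such as GCC and LLVM. It presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE, as emitted by GCC, into LLVM IR, and evaluates it on real-world C code and competitive programming problems. According to the authors, the model is the first trained explicitly for IR-to-IR translation and clearly exceeds the translation accuracy of existing large models, opening possibilities for hybrid neuro-symbolic compiler architectures.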

English abstract

GCC and LLVM underpin much of modern software infrastructure, relying on distinct Intermediate Representations (IRs) to drive optimizations and code generation. However, the semantic and structural differences between these IRs create significant barriers for cross-toolchain interaction, limiting the reuse of compiler frontends, backends, and optimization pipelines across programming languages and compilation ecosystems. Traditional rule-based translators have attempted to bridge this gap, but their complexity and maintenance cost have hindered practical adoption. In this context, Large Language Models (LLMs) appear to be an emerging technology that offers a data-driven alternative, capable of learning complex mappings between heterogeneous compiler IRs directly from sufficiently representative examples. To explore this approach, this paper presents IRIS-14B, a 14-billion-parameter transformer model fine-tuned to translate GIMPLE (as emitted by GCC) to LLVM IR (as emitted by LLVM). The model is trained on paired IRs extracted from C sources and evaluated on the GIMPLE-to-LLVM IR transformation applied to IRs derived from real-world C code and competitive programming problems. To the best of our knowledge, IRIS-14B is the first model trained explicitly for IR-to-IR translation. It outperforms the accuracy of widely used models, including the largest state-of-the-art open models available today, ranging from 13 to 1,000 billion parameters, by up to 44 percentage points. The proposed transformation supports the integration of LLMs as complementary components within hybrid neuro-symbolic compiler architectures, where models such as IRIS-14B act as interoperability layers enabling cross-toolchain workflows without modifying existing compiler passes, while traditional compiler infrastructure continues to perform deterministic compilation and optimization.

2605.08243 2026-05-12 cs.PL cs.DC cs.LG

GPU-Accelerated Synthesis of Mixed-Boolean Arithmetic: Beyond Caching

Gabriel Bathie, Baptiste Mouillon, Nathanaël Fijalkow

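AI summary: This paper studies the synthesis of mixed-Boolean arithmetic (MBA) expressions from input-output examples, a problem central to program deobfuscation and also useful for compiler optimization and cryptanalysis. Existing synthesizers are mostly CPU-based and scale poorly to large or complex tasks; recent GPU-accelerated methods are faster but depend on caching, which breaks down for MBA because the output space is enormous. The paper presents SIMBA, a cache-free GPU-accelerated synthesizer built on bottom-up enumeration that keeps work highly parallel. Experiments show that it is faster than existing tools and handles larger specifications, making it a practical and scalable approach to MBA synthesis in quantitative domains.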

English abstract

Synthesizing Mixed-Boolean Arithmetic (MBA) expressions from input-output examples is central to program deobfuscation and also useful for compiler optimization, reverse engineering, and cryptanalysis. Existing MBA synthesizers are typically CPU-based and scale poorly on large specifications or complex targets. Recent GPU-accelerated synthesis methods achieve large speedups in qualitative settings, but they depend on caching observationally equivalent candidates; this strategy breaks down for MBA because candidate outputs are quantitative bitvectors and the behavioral space is enormous. We present SIMBA (Synthesis of Mixed-Boolean Arithmetic), a GPU-accelerated MBA synthesizer built around cache-free bottom-up enumeration. SIMBA avoids language caches entirely and uses a GPU-oriented enumeration design that keeps work local and highly parallel. In experiments, SIMBA is substantially faster than prior MBA synthesis tools, handles larger specifications, and reaches expression sizes that existing methods fail to solve. These results establish cache-free GPU synthesis as a practical and scalable approach for quantitative domains, and identify it as a strong alternative to cache-centric designs.
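
Bottom-up enumeration, the general technique named in this abstract, can be sketched on the CPU as follows. This is not SIMBA's GPU design: it is a toy enumerator over two variables and 8-bit values that keeps every candidate without deduplicating observationally equivalent ones. The operator set, the bit width, and the `synthesize` signature are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

MASK = 0xFF  # 8-bit bitvectors, an illustrative choice

BINARY_OPS: Dict[str, Callable[[int, int], int]] = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

Expr = Tuple[str, Tuple[int, ...]]  # (printed form, outputs on each example)


def synthesize(inputs: Sequence[Tuple[int, int]], outputs: Sequence[int],
               max_size: int = 5) -> Optional[str]:
    """Return the first expression over x, y that matches all examples, or None."""
    target = tuple(o & MASK for o in outputs)
    by_size: Dict[int, List[Expr]] = {
        1: [("x", tuple(x & MASK for x, _ in inputs)),
            ("y", tuple(y & MASK for _, y in inputs))]
    }
    for text, vals in by_size[1]:
        if vals == target:
            return text
    for size in range(2, max_size + 1):
        level: List[Expr] = []
        # Unary negation: one operator node plus a child of size - 1.
        for text, vals in by_size[size - 1]:
            level.append((f"~{text}", tuple(~v & MASK for v in vals)))
        # Binary operators: one node plus children whose sizes sum to size - 1.
        for left_size in range(1, size - 1):
            right_size = size - 1 - left_size
            for (lt, lv), (rt, rv) in product(by_size[left_size], by_size[right_size]):
                for sym, fn in BINARY_OPS.items():
                    vals = tuple(fn(a, b) for a, b in zip(lv, rv))
                    level.append((f"({lt}{sym}{rt})", vals))
        for text, vals in level:
            if vals == target:
                return text
        by_size[size] = level
    return None
```

Because each candidate at a given size depends only on smaller expressions, the evaluations within a level are independent of one another, which is the kind of structure that lends itself to parallel execution.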

2605.08242 2026-05-12 q-bio.QM cs.AI cs.LG

An Explainable Unsupervised-to-Supervised Machine Learning Framework for Dietary Pattern Discovery Using UK National Dietary Survey Data

Wing Yi Yu, Chun Yin Chiu

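AI summary: This study proposes an explainable unsupervised-to-supervised machine learning framework for discovering and interpreting dietary patterns in UK National Diet and Nutrition Survey data. After comparing several clustering algorithms, it identifies four dietetically meaningful dietary patterns and uses a supervised classifier to verify that they are reproducible. The approach makes dietary data easier to interpret and may support dietitian-in-the-loop personalized dietary counselling.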

Comments 12 pages, 6 figures, 9 tables. Accepted by the 14th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2026)

English abstract

Clinical dietary assessment can generate detailed but high-dimensional nutrient and food-group information that is difficult to translate quickly into counselling priorities. This paper proposes an explainable unsupervised-to-supervised machine learning framework for discovering, reproducing and interpreting dietary patterns using public UK National Diet and Nutrition Survey data. Adult participants aged 19 years and above from NDNS Years 12-15 were represented using 25 energy-adjusted nutrient and food-group features. K-means, Gaussian Mixture Models and Agglomerative Clustering were compared across k = 2-8, with stability and dietetic interpretability used alongside internal validation metrics. The selected K-means k = 4 solution identified four interpretable dietary patterns: high fat/meat and sodium, higher fibre fruit-vegetable micronutrient, high free-sugar snacks and sugary drinks, and dairy/cereal calcium-rich saturated-fat. A supervised surrogate classifier reproduced held-out cluster membership with high test performance (macro-F1 = 0.963), but was interpreted only as an explanatory surrogate rather than as an independent clinical prediction model. SHAP analysis linked predictions to dietetically meaningful drivers, suggesting potential value for dietitian-in-the-loop assessment, counselling prioritisation and follow-up monitoring.

2605.08233 2026-05-12 eess.SP cs.LG

Inverse Design of Multi-Layer Sub-Pixel-Resolution RF Passives Through Grayscale Diffusion with Flexible S-Parameter Conditioning

Tommaso Dreossi, Christopher M. Bryant, Hao Liu, Nathan Mirman, Noah Kessler, Michael Frei, Harish Krishnaswamy

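AI summary: This paper addresses the inverse design of multi-layer RF passive components from partial S-parameters. It proposes a generative method based on grayscale diffusion with flexible S-parameter conditioning that produces two-layer copper layouts at sub-pixel resolution while satisfying physical constraints and a range of design conditions. Candidate designs are generated in seconds, the predicted S-parameters deviate only slightly from the targets, and the approach has been validated with fabricated designs.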

English abstract

Inverse design of RF passive components from S-parameters is a high-dimensional, ill-posed problem, and prior generative approaches are limited to single-layer binary-metallization structures. This paper presents an inverse design approach that generates passive components from partial S-parameter inputs on an $8\times8$ mm board discretized at $64\times64$ pixels with sub-pixel grayscale metallization across 1-20 GHz. The framework generates two-layer copper layouts with vias, with hard physical constraints on feed locations enforced through annealed Langevin projection, flexible multi-modal conditioning on partial S-parameter specifications, port locations, dielectric properties, reference topology, and variable port placement. Candidate designs are generated in seconds, with surrogate-predicted S-parameters matching targets to within $0.77 \pm 1.28$ dB weighted mean absolute error. We validate the approach with two fabricated designs on RO4003C: a manufacturable alternative to a hairpin filter whose coupling gaps violate fabrication rules, and a combline bandpass filter designed from scratch given only target S-parameters.

2605.08224 2026-05-12 cs.IT cs.SD math.HO math.IT

Uniqueness on a Continuum: Quantifying Tonal Ambiguity Using Information Theory

Michael Seltenreich

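AI summary: This paper proposes a continuous, information-theoretic measure that quantifies tonal ambiguity and extends the traditional concept of "uniqueness". It addresses three limitations of uniqueness: it cannot discriminate among sets that possess it, cannot capture hierarchical organization in modes of limited transposition, and cannot account for temporal unfolding. The measure applies across pitch-class sets and tuning systems, broadening the analysis of tonal relationships and offering a practical tool for music theory and analysis.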

Comments 14 pages, 6 figures, 9 tables

English abstract

We propose a continuous measure of tonal ambiguity that extends the established concept of uniqueness. While uniqueness is widely regarded as necessary for tonality, it cannot (i) discriminate among sets that possess it, (ii) capture hierarchical organization in modes of limited transposition, or (iii) account for temporal unfolding. To address these limitations, we introduce a companion measure, grounded in information theory, that quantifies tonal ambiguity on a continuous scale. The measure applies across pitch-class sets and tuning systems, expanding analytic coverage of tonal relationships and offering a practical tool for theory and analysis.

2605.08211 2026-05-12 eess.SP cs.IT cs.LG math.IT

Learning the Channel Gain from Anywhere to Anywhere via Cross-environment Transformer Estimators

Prasenjit Dhara, Daniel Romero

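AI summary: This paper studies how to estimate channel-gain maps, which give the channel gain between any two locations, from a small number of measurements in an arbitrary environment. Because existing methods either rely on an inaccurate model or need a very large number of measurements, the authors propose a metalearning-based transformer estimator that exploits the spatial structure and physical regularities shared by channel-gain maps across environments, substantially reducing the measurements required. Experiments show that it needs five times fewer measurements than existing methods while maintaining estimation accuracy.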

English abstract

Channel-gain maps provide the channel gain between any two locations in a geographical region. They find numerous applications, from resource allocation and interference control to path planning for autonomous vehicles. Channel-gain map estimation (CGME) is considerably more challenging than conventional radio map estimation (RME) because channel-gain maps are functions over a 6-dimensional input space. This calls for specialized methods, which currently rely on the (inaccurate) radio tomographic model or require a prohibitively large number of measurements since they do not exploit any spatial structure. This paper overcomes this issue by leveraging spatial patterns that channel-gain maps exhibit across environments, as dictated by the laws of physics and typical environmental characteristics (e.g. building materials and layouts). Adopting a metalearning perspective, a transformer-based estimator is proposed to implicitly learn this common structure from measurements collected in multiple environments. This enables CGME in new environments from significantly fewer measurements (five times fewer in our experiments). To maximize learning efficiency, the transformer is composed with a feature map that enforces the invariances of CGME, such as those following from reciprocity. Numerical experiments corroborate the merits of the proposed estimator relative to existing methods.

2605.08199 2026-05-12 eess.SP cs.LG

Domain-Adaptive Arrhythmia Classification Using a Hybrid Transformer on Wearable Heart Signals

Maedeh H. Toosi, Siamak Mohammadi

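AI summary: To address the domain shift caused by device differences in arrhythmia classification on wearables, this study proposes a hybrid transformer model that combines raw ECG signals with seven heart rate variability (HRV) features, capturing beat morphology and rhythm statistics respectively. Representation learning techniques such as Maximum Mean Discrepancy (MMD) align feature distributions across domains and improve generalization to data from unseen devices. On data from an unseen wearable device the model reaches an F1-macro of 95% and a balanced accuracy of 96.15%, a drop of only 2% in F1-macro, showing its potential for home and ambulatory ECG monitoring.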

English abstract

Cardiovascular disease remains the leading cause of death globally, underscoring the need for effective, accessible monitoring solutions, particularly through wearable devices that enable continuous, real-time tracking of heart rhythms in home settings. However, deploying deep learning models trained on clinical electrocardiogram (ECG) datasets to wearable devices remains challenging, as differences in recording equipment, signal quality, and patient populations introduce domain shifts that degrade model performance. We propose a hybrid transformer model that processes continuous ECG signals alongside seven heart rate variability (HRV) features, where the raw signal path captures beat-level morphological patterns and the HRV path encodes rhythm regularity statistics, allowing the model to jointly leverage complementary information from both representations. To enhance the model's ability to generalize across domains, we employ representation learning techniques, including Maximum Mean Discrepancy (MMD), a non-parametric kernel-based metric that quantifies the distance between feature distributions of different domains, to align feature distributions between source and target domains, addressing the challenge of domain shifts between public datasets and wearable device data. By leveraging five public ECG datasets for training, the model learns robust, generalized representations that mitigate domain-specific biases. When tested on wearable device data with an unseen domain, the model achieved an F1-macro of 95% and a balanced accuracy of 96.15%. These results demonstrate minimal performance degradation, with only a 2% drop in F1-macro compared to seen-domain evaluation, highlighting the model's generalization capabilities and its potential for reliable, real-time heart monitoring applications in home and ambulatory settings.
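
Maximum Mean Discrepancy, as described in this abstract, compares two sets of feature vectors through a kernel. The sketch below computes the standard biased estimate of squared MMD. The RBF kernel and the `gamma` bandwidth are assumptions, since the abstract does not say which kernel the authors use, and the function names are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased estimate: mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_kernel(p: Sequence[Vector], q: Sequence[Vector]) -> float:
        return sum(rbf_kernel(a, b, gamma) for a in p for b in q) / (len(p) * len(q))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))
```

In domain adaptation, a value like this is typically added to the training loss so that source-domain and target-domain features are pushed toward the same distribution.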

2605.08192 2026-05-12 cs.CY cs.AI cs.LG cs.SE

NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims

Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais

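AI summary: This paper examines the reproducibility of frontier AI safety claims, noting that the most influential claims in AI safety often lack the materials needed to verify them, which makes them hard to evaluate. The authors argue that NeurIPS should adopt mandatory reproducibility standards to make AI safety research more transparent and credible. They propose a three-tier disclosure framework, paired with a mandatory claim inventory and a phased implementation path, intended to balance security and transparency.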

Comments Preprint

English abstract

Frontier AI safety claims - published assertions that a highly capable general-purpose model is below a threshold of concern, adequately mitigated, or suitable for release - increasingly shape model deployment, governance, and public trust. Yet the artefacts needed to evaluate them are routinely withheld, producing an evidential inversion: the most consequential claims in AI safety are often the least reproducible. This position paper argues that NeurIPS should require reproducibility standards for papers making such claims, treating non-reproducibility not as a transparency preference but as an evaluation-methodology failure. The 2026 International AI Safety Report [Bengio et al., 2026] concludes that reliable pre-deployment safety testing has become harder to conduct and that models now distinguish test from deployment contexts; the 2025 Foundation Model Transparency Index [Wan et al., 2025] reports a sector-average transparency score of 40/100 with no major developer adequately disclosing train-test overlap; contemporaneous measurement-theory work shows that attack-success-rate comparisons across systems are often founded on low-validity measurements [Chouldechova et al., 2025]. We propose a three-tier disclosure framework, distinguishing public, controlled, and claim-restricted disclosure, paired with a mandatory claim inventory, scope statements, and a phased implementation path with graduated sanctions. The framework treats secrecy and openness as endpoints of a spectrum, with controlled review (via a federated colloquium of qualified secure-review hosts) covering claims whose artefacts cannot be released publicly, and right-scaling claims whose artefacts cannot be reviewed even confidentially. The standard the community applies to its most consequential claims should be at least as high as the standard it applies to its least.

2605.08187 2026-05-12 eess.SP cs.LG

Towards Interpretable Damage Detection based on Aerodynamic Pressure Measurements

Philip Franz, Max von Danwitz, Gregory Duthé, Alexander Popp, Eleni Chatzi

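AI summary: This paper studies interpretable structural damage detection based on aerodynamic pressure measurements. Motivated by the monitoring needs of modern large wind turbine blades, it proposes acquiring aerodynamic pressure data with Aerosense, a non-intrusive and economical sensing system. Using experimental data, the authors build a convolutional neural network for damage detection and combine physics-based insight with explainable machine learning to make the detection process more transparent and physically consistent.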

Comments 28 pages, 30 figures

English abstract

The increasing flexibility of modern large wind turbine blades necessitates cost-efficient and reliable structural monitoring solutions. For this purpose, we propose to use aerodynamic pressure measurements obtained via Aerosense, a novel, non-intrusive and economical sensing system. In former work [Franz et al., 2025], we investigated the potential of aerodynamic pressure measurements for structural damage detection on elastic and aerodynamically loaded structures. An experimental campaign was conducted on a NACA 633418 airfoil mounted on a vertically vibrating cantilever beam within an open wind tunnel. Structural damage was introduced progressively through controlled saw cuts near the beam support. Aerodynamic pressure distributions were recorded under varying inflow conditions and structural states. Based on this data set, we developed a convolutional neural network to detect structural damage and classify its severity using only aerodynamic pressure signals. The results demonstrate that pressure measurements can effectively enable real-time detection and quantification of damage in elastic, beam-like structures subjected to mildly turbulent flow and varying operational conditions. Recognizing the limitations of pure black-box classification, in this study, we further incorporate physics-based insights and explainable machine learning methods to interpret how structural damage influences both the dynamic response and the aerodynamic pressure field. This leads to an enhanced damage detection pipeline, aiming to improve transparency, robustness, and physical consistency in data-driven monitoring of elastic, aerodynamically loaded structures.

2605.08186 2026-05-12 eess.AS cs.AI cs.LG

Rethinking Entropy Minimization in Test-Time Adaptation for Autoregressive Models

Wei-Ping Huang, Chee-En Yu, Guan-Ting Lin, Hung-yi Lee

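AI summary: This paper studies entropy minimization (EM) for test-time adaptation (TTA) of autoregressive models and observes that current methods lack a unified theoretical foundation. The authors derive a rigorous entropy minimization formulation for autoregressive models, show that its objective decomposes into a token-level policy gradient loss and a token-level entropy loss, and reinterpret earlier methods as partial realizations of this framework. Experiments show that the approach consistently improves the Whisper speech recognition system across more than 20 domains, including noise, accents, and multilingual settings.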

Comments Submitted to INTERSPEECH 2026

English abstract

Test-Time Adaptation (TTA) via entropy minimization (EM) has proven effective for classification tasks, yet its application to generative autoregressive models remains theoretically fragmented. Existing approaches typically rely on distinct heuristics, such as teacher forcing with pseudo labels or policy-gradient-based reinforcement learning, without a unified mathematical foundation. In this work, we resolve this discrepancy by deriving a rigorous formulation of EM tailored to autoregressive models. We show that the exact objective naturally decomposes into a token-level policy gradient loss and a token-level entropy loss, and we reinterpret prior methods as partial realizations of this unified formulation. Using Whisper ASR as a testbed, we demonstrate that our approach consistently improves performance across more than 20 diverse domains, including acoustic noise, accents, and multilingual settings.
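
One way to see why a sequence-level entropy objective splits into two token-level pieces is the textbook derivation below. It is a sketch under stated assumptions (input $x$, output sequences $y$ of fixed length $T$, vocabulary items $v$) and uses notation introduced here; it may differ from the exact formulation derived in the paper.

```latex
% Sequence entropy of an autoregressive model, rewritten with the chain rule:
H(\theta)
  = -\mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\bigl[\log p_\theta(y \mid x)\bigr]
  = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\Bigl[\sum_{t=1}^{T} H_t(\theta; y_{<t})\Bigr],
\qquad
H_t(\theta; y_{<t})
  = -\sum_{v} p_\theta(v \mid y_{<t}, x)\,\log p_\theta(v \mid y_{<t}, x).

% Differentiating through both the integrand and the sampling distribution:
\nabla_\theta H(\theta)
  = \underbrace{\mathbb{E}_{y}\Bigl[\sum_{t=1}^{T} \nabla_\theta H_t(\theta; y_{<t})\Bigr]}_{\text{token-level entropy term}}
  + \underbrace{\mathbb{E}_{y}\Bigl[\Bigl(\sum_{t=1}^{T} H_t(\theta; y_{<t})\Bigr)\,
      \nabla_\theta \log p_\theta(y \mid x)\Bigr]}_{\text{policy-gradient term}}.
```

The first term differentiates each per-token entropy along a sampled prefix; the second arises because the prefixes themselves are sampled from the model, which gives it the score-function form familiar from policy-gradient methods.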

2605.08184 2026-05-12 eess.SP cs.AI

Improving TMS EEG Signal Quality for Closed-Loop Neuro Stimulation via Source-Domain Denoising

Zhen Tang, Ameer Hamoodi, Stevie Foglia, Aimee Nelson, Zhen Gao

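AI summary: This study aims to improve the quality of EEG signals evoked by transcranial magnetic stimulation (TMS) in support of closed-loop neuromodulation. Using a carefully preprocessed reference dataset, it evaluates two widely used source-based artifact removal methods and examines their effect on signal quality and on the preservation of TMS-evoked potentials. The proposed preprocessing workflow is robust, helps improve data reliability, and lays groundwork for future integration into brain-computer interface (BCI) systems and for clinical and research applications.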

English abstract

This research addresses a validated TMS EEG cleaning pipeline and a corresponding benchmark dataset. It evaluates two widely used artifact removal pipelines. A reference dataset of carefully preprocessed EEG signals was established to support future algorithm development and enable systematic comparison of automated artifact removal strategies, despite the absence of a true physiological ground truth. The study evaluates the effectiveness of two widely used source based artifact removal approaches and examines their impact on signal quality improvement and preservation of TMS-evoked potentials. The results support the robustness of the proposed preprocessing workflow and demonstrate its potential for improving data reliability in both research and clinical applications. A key goal is integrating TMS EEG and embedding it within a larger BCI framework. Ultimately, these efforts aim to enhance understanding of cortical dynamics and expand the clinical and research applications of TMS EEG.

2605.08180 2026-05-12 cs.IT cs.AI cs.IR cs.LG cs.NI eess.SP math.IT

Information Density as a Quantitative Measure for AI-enabled Virtual Sensing: Feasibility and Limits

Hrishikesh Dutta, Roberto Minerva, Reza Farahbakhsh, Noel Crespi

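AI summary This paper proposes information density as a quantitative metric to support sensor deployment and enable AI-driven virtual sensing. The study exploits spatial, temporal, and cross-modal correlations among sensor signals to perform sensing tasks without physical sensors, and proposes two complementary measures, Phase in Eigen Space and Mutual Information, to assess information density and thereby optimize sensor configurations. Experimental validation shows that, within a bounded error, virtual sensors can effectively replace physical ones, indicating strong potential for building scalable, energy-efficient sensing systems in smart environments.

Comments IEEE Transactions on Sustainable Computing (2026)

Details
English abstract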

Modern IoT and sensor networks generate vast amounts of data, posing significant challenges for storage, transmission, and real-time processing. Traditional approaches, such as compressive sensing and machine learning-based compression, often suffer from computational inefficiencies and irreversible data loss. This paper introduces Information Density as a quantitative metric to support sensor deployment and enable AI-driven virtual sensing. We propose a framework that leverages spatial, temporal and inter-modal correlations among sensor signals to perform sensing tasks even in the absence of physical sensors. Two complementary measures, (i) Phase in Eigen Space and (ii) Mutual Information, are developed to quantify and assess information density, enabling the selection of optimal sensor configurations across both intra-modality and cross-modality scenarios. Validated using real-world data from Madrid's smart city infrastructure, this framework demonstrates the feasibility of replacing physical sensors with virtual ones under bounded error conditions (e.g., achieving $<3.21\%$ mean error with a single sensor). The results highlight the potential for scalable and energy-efficient sensing systems in smart environments.
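
To make the mutual-information measure concrete, the following is a minimal sketch of a plug-in estimate between two sensor streams. The equal-width binning, the function names `discretize` and `mutual_information`, and the bin count are assumptions chosen for illustration; the abstract does not say how the paper estimates mutual information.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map real-valued readings to equal-width bin indices (illustrative choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant signals
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Under this sketch, a high value of `mutual_information(discretize(a), discretize(b))` for two streams `a` and `b` would suggest that one sensor carries much of the other's information, which is the kind of redundancy a virtual sensor could exploit. Plug-in estimates are biased for short series, so the bin count should be kept small relative to the number of samples.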

2605.08179 2026-05-12 eess.SP astro-ph.IM cs.LG

Neural Posterior Estimation of Terrain Parameters from Radar Sounder Data

Jordy Dal Corso, Annalena Kofler, Marco Cortellazzi, Lorenzo Bruzzone, Bernhard Schölkopf

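AI summary This paper studies how to estimate terrain parameters from radar sounder data. It proposes a simulation-based inference method in which a GPU-accelerated simulator generates synthetic observations used to train a neural network for posterior density estimation. By conditioning on reference surface assumptions, the method systematically evaluates how robust the posterior is to surface variability. The model's calibration and transferability are verified on simulated data and on real Mars radar profiles, providing a new tool for analyzing planetary surface parameters.

Comments 5 pages, 3 figures; accepted at IGARSS 2026, 9 - 14 August 2026, Washington D.C., USA

Details
English abstract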

Radar sounders are electromagnetic instruments that can probe deep into the subsurface of Earth and other planetary bodies by processing the echo of transmitted radar waves. Conventional approaches for analyzing such data rely on approximate assumptions and often produce point estimates that ignore parameter correlations as well as galactic and measurement noise. We propose a simulation-based inference approach to terrain parameter inversion from radar sounder data, where synthetic observations from a GPU-based simulator are used to train a neural network-based density estimator for neural posterior estimation (NPE). By explicitly conditioning on reference surface assumptions, the proposed framework allows systematic evaluation of posterior robustness to reference surface variability. We demonstrate that our NPE model is well calibrated on simulated data and transferable to real Mars radar profiles, where we analyze terrain parameters using literature-informed reference values.

2605.08164 2026-05-12 cs.DC cs.AI cs.CR

parHSOM: A novel parallel Hierarchical Self-Organizing Map implementation

Rebekah Lane, Logan Cummins, Andy Perkins, George Trawick, Ioana Banicescu, Sudip Mittal

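AI summary This paper proposes a novel parallel Hierarchical Self-Organizing Map (parHSOM) architecture to address the slow training of conventional Hierarchical Self-Organizing Maps (HSOMs) on large datasets. By introducing parallel computation, parHSOM trains faster across multiple testbeds and cybersecurity datasets without significant loss in performance. The work provides an experimental platform for further exploration of parallel HSOM implementations and is relevant to building efficient, explainable intrusion detection systems for cybersecurity.

Details
English abstract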

The digital age has completely transformed the way that information is processed and stored, which makes cybersecurity a crucial field of research. Cybersecurity contains many different domains, but this work focuses on Intrusion Detection Systems (IDSs). Within the literature, Hierarchical Self-Organizing Maps (HSOMs) have been used to create trustworthy, explainable, and AI-based IDSs. However, HSOMs are trained sequentially, which means that training HSOMs on large datasets is slow. This work presents a novel parallel HSOM architecture, called parHSOM. The purpose of this research is to investigate the effect that parallel computation has on the HSOM training time. parHSOM is tested on two different testbeds, four different output grid sizes, and five different cybersecurity datasets. Performance metrics collected from these experiments show that parHSOM consistently trains faster than the Sequential HSOM algorithm without any significant loss in performance. Additionally, this work provides a platform for further investigation into parallel HSOM implementations.

2605.08152 2026-05-12 cs.DC cs.AI

Privacy-Preserving Federated Learning: Integrating Zero-Knowledge Proofs in Scalable Distributed Architectures

Divya Gupta

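AI summary This paper studies how to achieve privacy-preserving federated learning in scalable distributed architectures and proposes a new architecture that integrates zero-knowledge proofs (ZKPs) to strengthen the security and efficiency of the federated learning process. The method cryptographically verifies node computations before global aggregation, defending against model poisoning attacks while preserving data privacy. Experiments show that the hybrid architecture retains up to 94.2% accuracy under adversarial conditions and supports efficient distributed training at a scale of one thousand nodes.

Details
English abstract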

The intersection of Artificial Intelligence (AI) and distributed systems has given rise to Federated Learning (FL), a paradigm that enables decentralized model training without compromising local data privacy. As organizational data silos grow, deploying complex machine learning models across highly distributed edge networks becomes a critical infrastructural challenge. Standard FL implementations suffer from severe vulnerabilities related to adversarial gradient updates and computational bottlenecks at the aggregation layer. This paper presents a novel, end-to-end distributed architecture that hardens FL pipelines using advanced cryptographic verification and optimized big data processing frameworks. We introduce a Zero-Knowledge Proof (ZKP) wrapper that cryptographically validates node computations before global aggregation, neutralizing model poisoning attacks without inspecting raw gradients. Additionally, we evaluate the system's performance using extreme gradient boosting models optimized for distributed edge execution. We formalize the mathematical transformation of the machine learning loss functions into Rank-1 Constraint Systems (R1CS) suitable for succinct verification. Extensive experimental results demonstrate that our hybrid architecture achieves 94.2% accuracy retention under adversarial conditions while maintaining scalable throughput across 1,000 parallel distributed nodes, effectively bridging the gap between rigorous cryptographic security and high-performance distributed AI.
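
For readers unfamiliar with R1CS, the sketch below encodes a toy squared-error loss, loss = (w·x − y)², as three rank-1 constraints of the form (a·z)(b·z) = c·z and checks that a witness satisfies them. The loss function, the witness layout `[1, w, x, y, p, e, loss]`, the small prime modulus, and all names are assumptions made for illustration; the abstract does not describe the paper's actual encoding, and this check is not itself a zero-knowledge proof.

```python
P = 2**31 - 1  # small prime modulus, for illustration only


def dot(row, z):
    """Inner product of one constraint row with the witness, reduced mod P."""
    return sum(r * v for r, v in zip(row, z)) % P


def r1cs_satisfied(A, B, C, z):
    """Check (A_i . z) * (B_i . z) == (C_i . z) mod P for every constraint i."""
    return all(dot(a, z) * dot(b, z) % P == dot(c, z) for a, b, c in zip(A, B, C))


def witness(w, x, y):
    """Build z = [1, w, x, y, p, e, loss] for integer inputs (hypothetical layout)."""
    p = w * x % P          # p = w * x
    e = (p - y) % P        # e = p - y
    loss = e * e % P       # loss = e * e
    return [1, w % P, x % P, y % P, p, e, loss]


# One row per constraint; columns follow the witness layout above.
A = [[0, 1, 0, 0, 0, 0, 0],   # w
     [0, 0, 0, -1, 1, 0, 0],  # p - y
     [0, 0, 0, 0, 0, 1, 0]]   # e
B = [[0, 0, 1, 0, 0, 0, 0],   # x
     [1, 0, 0, 0, 0, 0, 0],   # 1
     [0, 0, 0, 0, 0, 1, 0]]   # e
C = [[0, 0, 0, 0, 1, 0, 0],   # p
     [0, 0, 0, 0, 0, 1, 0],   # e
     [0, 0, 0, 0, 0, 0, 1]]   # loss
```

A proof system would then convince a verifier that some witness satisfies `A`, `B`, and `C` without revealing it. Real-valued losses would additionally need a fixed-point encoding into the field, which this sketch omits.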

2605.08140 2026-05-12 physics.ins-det cs.AI cs.LG

Forecasting Source Stability in Scientific Experiments using Temporal Learning Models: A Case Study from Tritium Monitoring

Nicholas Tan Jerome, Nadia Aouadi, Christoph Koehler, Suren Chilingaryan, Andreas Kopmann

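AI summary This study addresses the prediction of tritium source stability in the Karlsruhe Tritium Neutrino Experiment (KATRIN), using deep learning time-series models such as LSTM and N-BEATS to model and forecast sparse, transient instability events. It highlights the challenges of learning from sparse events and of forecasting over long horizons, and finds that N-BEATS performs best in accuracy and repeatability, demonstrating the potential of deep learning for optimizing large-scale physics experiments. The results help improve experiment scheduling and maintenance efficiency and have direct practical value for running the experiment.

Details
English abstract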

The Karlsruhe Tritium Neutrino Experiment (KATRIN) aims to measure the absolute neutrino mass with unprecedented sensitivity, requiring precise monitoring of the windowless gaseous tritium source, where tritium beta decay occurs. To track variations of the source activity, beta-induced X-ray spectroscopy provides real-time diagnostics. However, traditional drift detection methods struggle with the infrequent and transient nature of instability events in gaseous tritium. This study bridges the gap between state-of-the-art time-series forecasting models and real-world experimental applications by leveraging deep learning to predict the time to stability after instabilities. Unlike standard benchmarking approaches that emphasize algorithmic performance on fixed datasets, we apply forecasting models -- including LSTM, N-BEATS, TFT, NHITS, DLinear, NLinear, TSMixer, and Chronos-LLM -- to complex, large-scale experimental data. Our findings highlight two challenges: learning from sparse instability events and forecasting long time horizons (i.e., predicting hundreds of future points), both of which are ongoing challenges in time-series forecasting and remain active areas of research. This prediction task has direct experimental value by enabling better scheduling and maintenance planning. A reliable forecast of stability time allows for more efficient measurement and task management during stabilization periods. Through model selection, we identified N-BEATS as the top performer, excelling in accuracy and repeatability, demonstrating that deep learning can optimize large-scale physics experiments.

2605.08139 2026-05-12 cs.DC cs.AI

Intelligent Autonomous Orchestration for Distributed Cloud Resources using Complex-Stability Analysis

Gopal Krishna Shyam, Priyanka Bharti

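AI summary In modern distributed cloud environments, traditional scaling mechanisms often cause resource thrashing because of network latency. This paper proposes C-SAS, an intelligent autonomous orchestration framework based on complex-stability analysis. By analyzing system behavior on the complex plane, the method converts monitoring noise into a deterministic "Safety Envelope" and uses an Analytic Stability Index to suppress unnecessary resource oscillations in real time, markedly improving system stability and resource efficiency. Experiments show that C-SAS reduces VM flapping by 94% and reaches 96% resource efficiency, outperforming conventional PID and machine-learning-based scaling methods, and offers a new direction for future high-reliability cloud infrastructure.

Comments 7 pages

Details
English abstract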

In modern distributed cloud environments, efficient resource allocation is required as traditional scaling mechanisms are often subject to cloud thrashing due to network-induced latencies. In this paper, we propose C-SAS (Complex-Stability Aware Scaling), an intelligent autonomous orchestration framework that leverages complex analytic methods to achieve system-wide equilibrium. In contrast to heuristic-based models, C-SAS acts as a stability-aware agent, converting telemetry noise into a deterministic "Safety Envelope" on the $s$-plane using the Argument Principle and Rouché's Theorem. By computing a real-time Analytic Stability Index (ASI), the algorithm suppresses oscillatory scaling operations that would otherwise degrade performance. The experimental results show that C-SAS reduces VM flapping by 94% and achieves 96% resource efficiency, significantly outperforming standard PID and ML-based autonomous agents. Our results suggest that future resilient autonomous cloud infrastructures will require AI-driven orchestrators with built-in formal stability constraints.
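
As background on how the Argument Principle relates to stability on the $s$-plane, the sketch below counts the zeros of a polynomial with positive real part by tracking the phase of the polynomial around a D-shaped contour. It is a generic textbook illustration, not the C-SAS algorithm or its Analytic Stability Index; the function names, the sampling density, and the use of a characteristic polynomial are assumptions, and the method assumes the polynomial has degree at least one and no zeros on the imaginary axis.

```python
import cmath
import math


def poly_eval(coeffs, s):
    """Evaluate a polynomial (highest-degree coefficient first) with Horner's rule."""
    acc = 0j
    for c in coeffs:
        acc = acc * s + c
    return acc


def rhp_zero_count(coeffs, samples=4000):
    """Count zeros with Re(s) > 0 via the argument principle on a D-shaped contour."""
    lead = coeffs[0]
    # Cauchy bound: every zero has modulus below this radius.
    radius = 1.0 + max(abs(c / lead) for c in coeffs[1:])
    # Counterclockwise path: semicircle from -jR through +R to +jR,
    # then straight down the imaginary axis back to -jR.
    arc = [radius * cmath.exp(1j * (-math.pi / 2 + math.pi * k / samples))
           for k in range(samples)]
    axis = [complex(0.0, radius - 2 * radius * k / samples) for k in range(samples)]
    path = arc + axis + [arc[0]]
    total = 0.0
    for z0, z1 in zip(path, path[1:]):
        # Phase change of the polynomial between consecutive contour points.
        total += cmath.phase(poly_eval(coeffs, z1) / poly_eval(coeffs, z0))
    return round(total / (2 * math.pi))
```

For a closed-loop characteristic polynomial, a count of zero means every pole lies in the left half-plane, which is the usual condition for a scaling loop to settle rather than oscillate.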

2605.08124 2026-05-12 cs.DC cs.CL cs.MA cs.NI

Scaling Mobile Agent Systems: From Capability Density to Collective Intelligence

Bowei He

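AI summary Mobile agent systems are a key paradigm for enabling intelligent applications on edge devices and in AIoT ecosystems, but their scalability is constrained by limited on-device computation and fragmented intelligence across devices. This paper proposes a unified research framework for scaling mobile agent systems along two complementary directions: improving the capability density of individual agents through compact foundation model design and compression, and achieving collective intelligence through communication-rich multi-agent collaboration. The work aims to turn isolated mobile agents into an efficient, scalable distributed intelligent system.

Comments Accepted by ACM MobiSys 2026

Details
English abstract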

Mobile agent systems are emerging as a key paradigm for enabling intelligent applications on edge devices and in AIoT ecosystems. However, their scalability is fundamentally constrained by limited on-device computation and fragmented intelligence across devices. In this work, we propose a unified research agenda for scaling mobile agent systems along two complementary dimensions: (1) improving capability density of individual agents through compact foundation model design and compression, and (2) enabling collective intelligence via communication-rich multi-agent collaboration. Building on recent model and infrastructure advances, this vision aims to transform isolated mobile agents into a distributed intelligent system that is efficient and scalable.

2605.08121 2026-05-12 cs.DC cs.LG

Performance and Energy Trade-Off Analysis of Hierarchical Federated Learning for Plant Disease Classification

Athanasios Papanikolaou, Athanasios Tziouvaras, Pavlos Stoikos, Apostolos Xenakis, Shameem A Puthiya Parambath, George Floros, Enrica Zereik, Ivan Petrovic, Fabio Bonsignorio

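AI summary This paper studies the trade-off between performance and energy consumption in hierarchical federated learning for plant disease classification. To address the computational cost and energy-efficiency challenges of large-scale IoT environments, it proposes an optimization framework that balances model performance and energy efficiency. Through design-space exploration, it analyzes combinations of models and aggregation strategies and experimentally evaluates several convolutional neural network architectures under a hierarchical federated architecture, revealing marked differences between configurations in accuracy and resource consumption.

Comments Accepted for publication at the 2026 ERAS Conference

Details
English abstract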

Early detection of plant diseases is critical for improving crop productivity and helps lay the foundations of precision agriculture. Recent advances in distributed deep learning have enabled plant disease classification models to be trained across geographically distributed agricultural sensing infrastructures. However, deploying such systems in large-scale Internet of Things (IoT) environments introduces significant challenges related to computational cost, energy consumption, and system efficiency. In this paper, we present a design-space exploration of hierarchical federated learning architectures for plant disease classification, with a particular focus on the trade-offs between predictive performance and energy efficiency. We further introduce a power- and energy-aware optimization framework that enables the systematic evaluation and selection of model-aggregator configurations under varying deployment constraints. The hierarchical federated architecture organizes distributed clients through intermediate aggregation layers, reducing communication and computational overhead. We evaluate multiple convolutional neural network architectures, including EfficientNet-B0, ResNet-50, and MobileNetV3-Large, in combination with different federated aggregation strategies such as FedAvg, FedProx, and FedAvgM. Experimental results demonstrate that different model-aggregator combinations exhibit distinct performance-energy trade-offs. Consequently, we highlight configurations that achieve competitive diagnostic accuracy and significantly reduce system resource requirements.
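
The two-level aggregation described in the abstract can be sketched as follows, using a FedAvg-style sample-weighted average at both the edge and the global level. Representing model parameters as flat lists of floats, the grouping of clients, and the function names are assumptions made for illustration; the paper's actual aggregation code and its FedProx and FedAvgM variants are not shown in the abstract.

```python
def weighted_average(updates):
    """FedAvg-style aggregation.

    `updates` is a list of (weights, num_samples) pairs, where `weights`
    is a flat list of floats. Returns (averaged_weights, total_samples).
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    avg = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return avg, total


def hierarchical_fedavg(groups):
    """Aggregate clients within each edge group, then aggregate the group results."""
    edge_results = [weighted_average(clients) for clients in groups]
    global_weights, _ = weighted_average(edge_results)
    return global_weights
```

Because each edge aggregator forwards its total sample count along with its averaged weights, the two-level result equals a flat sample-weighted average over all clients, while only one update per group travels to the global server.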

2605.08117 2026-05-12 eess.SP cs.CV cs.LG

Modular Retrieval-Augmented Generalization for Human Action Recognition

Peng Liao, Shangsong Liang, Lin Chen, Peijia Zheng

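AI summary This paper proposes MoRA, a modular retrieval-augmented generalization method designed for human action recognition from inertial measurement units (IMUs). It can be flexibly integrated into existing recognition models, improving recognition performance while maintaining inference efficiency. By introducing an uncertainty-adaptive fusion unit, MoRA addresses redundant retrieved information and rigid fusion strategies, dynamically adjusting the fusion strategy using physical knowledge from IMU signals, which markedly improves robustness and recognition performance. Experiments show that MoRA yields stable performance gains on multiple real-world datasets.

Comments ICME 2026

Details
English abstract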

Inertial Measurement Unit (IMU)-based Human Activity Recognition (HAR) aims to interpret and classify user behaviors from temporal motion signals. Recently, deep learning frameworks have advanced this task by learning and extracting discriminative spatiotemporal representations, significantly improving recognition performance. However, IMU-based HAR still faces several critical challenges, particularly limited training samples and static knowledge utilization, both of which severely hinder its large-scale deployment. In this paper, we introduce MoRA, the first Retrieval-Augmented Module specifically designed for motion series. It can be flexibly integrated into any existing HAR model, enhancing recognition performance while maintaining inference efficiency. To address issues such as information redundancy in retrieval results and rigid fusion strategies, we propose an uncertainty-adaptive fusion unit within MoRA. This unit leverages prior physical knowledge from IMU signals to dynamically adjust the fusion strategy between original outputs and retrieved information, enabling more robust recognition. Extensive experiments on ten real-world datasets demonstrate that MoRA significantly improves the performance of existing IMU-based HAR models, consistently delivering stable and effective gains. The source code of MoRA is available at: https://github.com/liavonpenn/mora.

2605.08115 2026-05-12 cs.GR cs.CV cs.LG

Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models

Wang Xiaoyu, Phong Nguyen, Chen Zhao

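AI summary This paper presents Alice v1, an open-source video generation model with 14 billion parameters that reaches state-of-the-art video quality through consistency distillation with score regularization (rCM). The model generates video 7x faster than its teacher and surpasses the teacher and several closed-source systems on automated benchmarks. The study identifies three key mechanisms: score regularization that concentrates on high-quality outputs, targeted synthetic data that strengthens weak areas, and consistency constraints that act as implicit regularization. It provides a complete technical recipe and supporting resources for open-source video generation research.

Details
English abstract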

We present Alice v1, a 14-billion parameter open-source video generation model that achieves state-of-the-art quality through consistency distillation with score regularization (rCM). Contrary to conventional distillation, which trades quality for speed, we demonstrate that rCM-based distillation can exceed teacher model quality. We attribute this to three mechanisms: (1) the score regularization term acts as a mode-seeking objective that concentrates probability mass on high-quality outputs rather than covering the full teacher distribution, (2) our targeted synthetic data pipeline with hard example mining provides training signal specifically for failure modes (physics, hands, faces) that the teacher handles inconsistently, and (3) consistency enforcement acts as implicit regularization, eliminating "lucky path" dependence on specific noise samples. Alice v1 generates 5-second 720p videos at 24fps in 4 denoising steps (~8 seconds on H100), a 7x speedup over the 50-step teacher while improving VBench score from 84.0 (Wan2.2) to 91.2. This surpasses both the teacher and closed-source systems including Veo3 (~90) and Sora2 (~88) on automated benchmarks, with competitive results in human preference studies. We release all model weights, training code, synthetic data pipelines, and evaluation scripts to advance open research in video generation.

2605.08112 2026-05-12 cs.SE cs.AI cs.CE cs.LG cs.LO

Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%

Drew Dillon, Kasyap Varanasi

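AI summary This study examines how introducing product context can improve how well AI coding agents follow team-specific decisions. It builds a controlled benchmark that measures an agent's compliance with 41 decision points across 8 realistic software engineering tasks, and compares a baseline configuration with codebase access only against an augmented configuration that adds a product-context retrieval system. On the same tasks and codebase, the augmented configuration raises decision compliance from 46% to 95%, a gain of 49 percentage points, supporting the role of product context in improving the decision compliance of AI coding agents.

Comments 16 pages, 3 figures, 16 tables. Benchmark repository: https://github.com/brief-hq/dcbench

Details
English abstract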

AI coding agents powered by large language models can read codebases and produce functional code, but they routinely violate team-specific product decisions that are invisible in the source code alone. We introduce a controlled benchmark measuring decision compliance, the rate at which an AI coding agent follows established product, design, and engineering decisions, across 8 realistic software engineering tasks containing 41 weighted decision points. We compare a baseline configuration (Claude Code with codebase access only) against an augmented configuration that adds Brief, a product-context retrieval system providing spec generation, mid-build consultation, and retrieval of recorded decisions, persona pain points, customer signals, and competitive intelligence. On identical prompts and the same repository, the augmented configuration achieves 95% decision compliance versus 46% for the baseline, a 49 percentage point improvement. Per-decision analysis reveals that the baseline achieves 100% compliance on decisions visible in the codebase and 0-33% on decisions requiring product context, suggesting that product-context retrieval is a key driver of the improvement. We release the benchmark repository, all 16 pull requests, and scoring harness for independent reproduction.
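
The headline figure is a difference in percentage points rather than a relative change, which the short sketch below makes explicit alongside one plausible way to compute a weighted compliance rate. The scoring rule (sum of weights of followed decisions divided by the total weight), the example weights, and the function names are assumptions made for illustration; the benchmark's actual scoring harness may differ.

```python
def weighted_compliance(decisions):
    """Compliance in percent for a list of (weight, followed) pairs."""
    total = sum(w for w, _ in decisions)
    if total <= 0:
        raise ValueError("total weight must be positive")
    met = sum(w for w, ok in decisions if ok)
    return 100.0 * met / total


def improvement(baseline_pct, augmented_pct):
    """Return (percentage-point gain, relative gain in percent)."""
    points = augmented_pct - baseline_pct
    relative = 100.0 * points / baseline_pct
    return points, relative


# Hypothetical decision points: (weight, whether the agent followed the decision).
example = [(2, True), (1, False), (1, True)]
score = weighted_compliance(example)

# Reported figures from the abstract: 46% baseline, 95% augmented.
points, relative = improvement(46, 95)
```

With the reported figures, `points` is 49 while `relative` is a little over 100, so the 49 in the abstract is best read as an absolute gain in compliance rate.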

2605.08103 2026-05-12 physics.comp-ph cond-mat.mtrl-sci cs.AI

Crystal Fractional Graph Neural Network for Energy Prediction of High-Entropy Alloys

Takanori Kotama, Yang Huang

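AI summary This paper proposes a crystal fractional graph neural network for predicting the energy of high-entropy alloys, improving prediction accuracy by integrating local atomic environments with global compositional information. The model has three parts: a crystal graph neural network that learns local interactions, a fractional neural network that learns global element fractions, and a feature fusion network that fuses the two to predict energy. Trained on a dataset of 1,049 crystal structures and validated on 198 quaternary structures, the model achieves a root-mean-square error comparable to first-principles calculations and remains accurate for low-energy configurations.

Details
English abstract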

High-entropy alloys (HEAs) have attracted growing attention for their exceptional mechanical and thermal properties arising from complex atomic configurations. In this paper, we propose a crystal fractional graph neural network for predicting the energy of high-entropy alloys by explicitly integrating both local atomic environments and global compositional information. The model consists of three components: a crystal graph neural network, which employs graph attention network layers to learn local interactions among 16 on-site atoms within the crystal lattice; a fractional neural network, a fully connected network that embeds the global fraction of constituent elements; and a feature fusion neural network, which fuses the outputs of the two submodels to predict the total crystal energy. We train the model on a dataset of 1,049 crystal structures and validate it on 198 quaternary structures, optimizing all hyperparameters via Optuna. Our results show that our model achieves an RMSE comparable to first-principles calculations and maintains high accuracy even for low-energy configurations. However, the model exhibits limitations in handling large crystal cells, which we aim to address in future work to extend its applicability to more complex systems.