arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.13017 2026-06-12 q-bio.NC cs.LG New submission

Deep Sleep Classification via EEG Signal Criticality: A Passive BCI Approach for Sleep-Improvement Neurofeedback

Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

AI summary: Using criticality features extracted with detrended fluctuation analysis (DFA), this study identifies deep sleep (N3) with a Naive Bayes classifier at a balanced accuracy of 87.17%, providing an efficient sensing mechanism for state-dependent neurofeedback in passive brain-computer interfaces.

Comments
7 pages, 3 figures, accepted for publication in the Proceedings of the 10th Graz Brain-Computer Interface Conference 2026, Graz, Austria, September 14-17, 2026

Abstract

Automated sleep staging is a fundamental application of passive Brain-Computer Interfaces (pBCI), decoding spontaneous neural states to enable closed-loop interventions independent of user intent. This study evaluates criticality features derived from Detrended Fluctuation Analysis (DFA) for the specific identification of deep sleep (N3). We analyzed 347,232 EEG epochs from 290 older women using UMAP manifold learning to visualize state transitions. Subsequently, six classifiers were benchmarked via 10-fold cross-validation, using balanced accuracy to determine the optimal "state-sensing" engine for neurofeedback. Naive Bayes achieved the highest mean balanced accuracy (87.17% ± 0.24%), significantly outperforming a fully connected deep neural network (FNN: 81.58%) and Random Forest (80.97%). Linear models (LDA: 57.21%; SVM: 51.01%) performed poorly, indicating that DFA-derived criticality features reside on a distinct, non-linear manifold. Probabilistic decoding of EEG criticality provides a high-accuracy sensing mechanism for pBCIs. This robust classification pipeline supports the development of state-dependent neurofeedback, such as targeted auditory stimulation, to enhance cognitive recovery.
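To make the DFA-derived feature in this abstract more concrete, here is a minimal sketch of first-order detrended fluctuation analysis in Python with NumPy. It is not the authors' pipeline: the function name `dfa_exponent`, the window sizes, and the synthetic signals mentioned afterwards are assumptions for illustration, and the abstract does not state which DFA variant or EEG preprocessing the study used.

```python
import numpy as np

def dfa_exponent(signal, window_sizes):
    """Estimate the DFA scaling exponent with linear detrending per window."""
    x = np.asarray(signal, dtype=float)
    # Integrated, mean-centred profile of the signal.
    profile = np.cumsum(x - x.mean())
    fluctuations = []
    for n in window_sizes:
        n_windows = len(profile) // n
        # Non-overlapping windows of length n, one per row.
        segments = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Fit one straight line per window (columns of segments.T).
        coeffs = np.polyfit(t, segments.T, 1)
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        residual = segments.T - trend
        # Root-mean-square fluctuation at this window size.
        fluctuations.append(np.sqrt(np.mean(residual ** 2)))
    # The exponent is the slope of log F(n) against log n.
    slope, _ = np.polyfit(np.log(window_sizes), np.log(fluctuations), 1)
    return float(slope)
```

On uncorrelated noise the estimate should sit near 0.5, and on a random walk near 1.5. In a sleep-staging setting, exponents like this, computed per epoch and channel, would be the inputs to a classifier.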

2606.12838 2026-06-12 q-bio.QM cs.AI cs.LG q-bio.GN New submission

OCOO-T: A Simple and Scalable Virtual Cell Model for Transcriptional Perturbation Response Prediction

Danning Jiang, Zheming An, Yalong Zhao, Lipeng Lai

AI summary: Proposes OCOO-T, a minimalist flow-matching-based virtual cell model that uses continuous-time denoising and adaptive layer normalization to reach state-of-the-art transcriptional perturbation prediction on several benchmarks.

Comments
22 pages, 6 figures

Abstract

Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.
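The abstract describes OCOO-T only as flow-matching-based with a continuous-time denoising formulation. As a rough sketch of what a flow-matching objective looks like, the NumPy code below uses a linear interpolation path between a source sample `x0` and a target sample `x1`. That path is a common choice but an assumption here, as are all function names; the real model conditions a Transformer on perturbation, dosage, and cell context, which this sketch leaves out.

```python
import numpy as np

def linear_path_sample(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, plus its velocity."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    # Along a straight path the velocity is constant in time.
    target_velocity = x1 - x0
    return x_t, target_velocity

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

def euler_sample(velocity_fn, x0, steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

A trained network would stand in for `velocity_fn`. Training minimises `flow_matching_loss` at randomly drawn times, and prediction integrates the learned velocity field from the source state to the predicted state.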

2606.12654 2026-06-12 stat.ME cs.LG stat.ML New submission

Computationally tractable robust differentially private mean estimation

Kelly Ramsay

AI summary: Proposes the balloon mean, a new differentially private mean estimator that uses iterative clipping over expanding Mahalanobis balls to achieve computational tractability, robustness, and zero-concentrated differential privacy, with theoretical guarantees on statistical performance and robustness under heavy-tailed and contaminated elliptical models.

Comments
40 pages, 17 figures

Abstract

We develop a new, differentially private mean estimator called the balloon mean. The main features of the balloon mean are that it is computationally tractable and enjoys robustness to outlying observations. It is based on an iterative clipping procedure over expanding Mahalanobis balls, or "balloons." The method satisfies zero-concentrated differential privacy and depends on a small number of interpretable tuning parameters. We provide theoretical guarantees under heavy-tailed and contaminated elliptical models, characterizing its statistical performance and robustness to outliers. Extensive simulations demonstrate that the balloon mean is robust to heavy-tailed and contaminated data, and outperforms existing differentially private mean estimators in contaminated settings.
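The clipping step named in this abstract can be illustrated with a small sketch. The code below only projects points onto a single Mahalanobis ball and averages them. It is not the balloon mean: it adds no noise, gives no privacy guarantee, and does not iterate over expanding radii. The function name and the toy data are hypothetical.

```python
import numpy as np

def clip_to_mahalanobis_ball(points, center, cov, radius):
    """Pull every point back onto the Mahalanobis ball of the given radius."""
    diffs = np.asarray(points, dtype=float) - center
    inv_cov = np.linalg.inv(cov)
    # Mahalanobis distance of each point from the centre.
    dists = np.sqrt(np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs))
    # Points inside the ball keep scale 1; points outside are shrunk.
    scale = np.minimum(1.0, radius / np.maximum(dists, 1e-12))
    return center + diffs * scale[:, None]

# Toy data (assumed): four inliers and one gross outlier.
points = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [100.0, 0.0]])
clipped = clip_to_mahalanobis_ball(points, np.zeros(2), np.eye(2), radius=2.0)
raw_mean = points.mean(axis=0)
clipped_mean = clipped.mean(axis=0)
```

After clipping, no observation can move the average by more than the radius divided by the sample size. That bounded sensitivity is what makes calibrated noise addition possible in differentially private estimators.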

2606.12623 2026-06-12 stat.AP cs.LG New submission

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

AI summary: Proposes causal transformation models (TRAM-DAG) for estimating individualized treatment effects in acute ischemic stroke patients; after fitting on observational data, validation in an RCT population shows that the average estimated effect is consistent with the ATE and that the estimates correctly rank patients by outcome.

Abstract

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.

2606.12471 2026-06-12 stat.ML cs.CL cs.ET cs.LG New submission

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

Seth Dobrin, Łukasz Chmiel

AI summary: Proposes the Physics-Grounded Symbolic Architecture (PGSA) and proves that it achieves exact linear identifiability and near-infinite temporal consistency in non-Gaussian dynamical systems, overcoming the Gaussian boundary that limits statistical world models.

Comments
Pre-print

Abstract

Klindt, LeCun, and Balestriero (arXiv:2605.26379) proved that Joint-Embedding Predictive Architectures (JEPAs) achieve linear identifiability, the linear recovery of the world's true latent variables, if and only if the world's latent dynamics follow a Gaussian, stationary process. This Gaussian boundary implies a fundamental limit on temporal consistency: for any non-Gaussian physical system, the representation error of a statistical World Model grows monotonically with time. We prove that this limit is an artifact of the statistical alignment mechanism, not a property of World Models in general. We introduce the Physics-Grounded Symbolic Architecture (PGSA) and prove three results: (1) a PGSA achieves exact linear identifiability for all physical regimes, regardless of the latent distribution; (2) the per-step error of a PGSA is bounded by numerical precision alone; and (3) as a direct consequence, a PGSA maintains temporal consistency for an unbounded number of transitions, a property we term near-infinite temporal consistency. We further prove that statistical World Models cannot achieve this property for any non-Gaussian system, regardless of model capacity or the volume of training data. The algebraic cores of four of the theorems are formalized in Lean 4 with Mathlib4 v4.31.0 (zero sorry placeholders); the Klindt et al. converse is taken as an external premise. The contrast establishes that symbolic grounding in the causal generator of the world's dynamics is the sufficient condition and, in non-Gaussian regimes, the only condition for near-infinite temporal consistency.

2606.13633 2026-06-12 eess.SY cs.LG cs.SY New submission

Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model

Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala, Anthony Wong

AI summary: Proposes a framework that combines a hybrid neural-cellular automaton wildfire model with gradient-optimized aerial drops, quantifies uncertainty through Monte Carlo sampling and spatially correlated perturbations, and shows in a case study that it can generate effective suppression plans.

Abstract

Aerial wildfire suppression requires not only predicting fire spread, but also designing effective intervention strategies under operational and environmental uncertainty. We present a modeling and optimization framework for aerial wildfire suppression that combines a hybrid neural-cellular automaton wildfire model with gradient-based design of targeted aerial drops. The wildfire model predicts spatially varying spread behavior from terrain, fuel, and wind data, while the intervention module determines binary drop actions with continuous-valued location and orientation parameters mapped to the simulation grid. Water and retardant are represented with distinct suppression effects, corresponding to immediate reduction of active burning and persistent reduction of future spread. To evaluate the robustness of the resulting suppression plans, we quantify both aleatoric uncertainty through Monte Carlo sampling of daily fire-state realizations and epistemic uncertainty through spatially correlated prediction-error perturbations. A case study based on the 2020 Bear Fire shows that the framework can generate coherent aerial suppression schedules for reducing total fire-affected area and can support uncertainty-aware analysis of wildfire intervention strategies.

2606.13543 2026-06-12 cs.NI cs.LG New submission

NetCause: Counterfactual Learning for Root Cause Analysis in Large-Scale Networks

Fabien Chraim, Jian Zhang, Dominik Janzing, Xiang Song, Christos Faloutsos, John Evans

AI summary: Proposes NetCause, a framework that models network incidents as graph-temporal processes and ranks candidate root causes by counterfactual simulation, improving accuracy by 16.1% on 31 expert-labeled incidents.

Comments
9 pages, 6 figures

Abstract

Can a learned model capture how faults propagate through a large-scale network and use this knowledge to causally attribute customer impact to its underlying root cause? Existing root cause analysis techniques often rely on static rules, correlation heuristics, or topology-local reasoning, which struggle to generalize in dynamic environments where faults propagate across complex physical and logical dependencies. We present NetCause, a self-supervised learning-based framework that models network incidents as graph-temporal processes and uses counterfactual simulation to rank candidate root causes. This approach produces an interpretable ranking of root cause hypotheses and integrates naturally with operator-defined mitigation and remediation actions. We train the model on over 1,500 incidents collected over six months from a leading cloud provider's production network and evaluate it on 31 expert-labeled incidents. NetCause consistently improves root cause ranking quality in the regime most relevant to operational decision-making, achieving a 16.1% accuracy improvement over a rule-based heuristic baseline. While training is computationally intensive, inference is lightweight, requiring only seconds of GPU runtime per incident (well below typical telemetry collection latencies).

2606.13532 2026-06-12 cs.NI cs.LG New submission

Graphical Causal Reasoning for Root Cause Analysis in Cloud Networks

Fabien Chraim, Dominik Janzing, John Evans

AI summary: Proposes a root cause analysis method for cloud network incidents based on graph causal discovery; it reduces dimensionality through spatiotemporal grouping and an automation ontology, builds a causal graph with bivariate Granger causality and conditional independence tests, and scores root causes in a time-aware way with a probabilistic method. On 35 production incidents it reaches 85.7% recall and a 74.3% exact-match rate.

Comments
6 pages, 4 figures

Abstract

Cloud computing relies on large-scale networks, which are inherently complex systems. In this paper, we present a novel approach to root cause analysis (RCA) of cloud network incidents, leveraging graph-based causal discovery techniques. Our method addresses the limitations of rule-based automation by introducing a spatiotemporal grouping strategy and an automation ontology to reduce the dimensionality of the problem. We construct a causal graph from binary time series data using bivariate Granger causality and conditional independence tests. For inference, we introduce a probabilistic method that assigns edge-specific conditional probabilities as a function of time lag, allowing for interpretable, time-aware root cause scoring via causal graph traversal. We evaluated the system using a labeled dataset of 35 production incidents from a major cloud provider. The model successfully recalled the correct root cause in 85.7% of incidents and produced an exact match in 74.3%. In production, the deployed system has been used in over 800 real-world incidents, with positive qualitative feedback from network engineers. These results highlight the practicality of a data-driven, causal approach to RCA in dynamic and large-scale operational environments.
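As a hedged illustration of the bivariate Granger step, the sketch below computes the classical F-statistic for whether lagged values of one series help predict another beyond that series' own past. It uses ordinary least squares in NumPy. Whether the paper uses this linear form on its binary alarm series is not stated in the abstract, and the function name `granger_f_statistic` is hypothetical.

```python
import numpy as np

def granger_f_statistic(cause, effect, lag=1):
    """F-statistic for 'cause' Granger-causing 'effect' at the given lag order."""
    x = np.asarray(cause, dtype=float)
    y = np.asarray(effect, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Columns hold the series shifted back by 1..lag steps.
    own_lags = np.column_stack([y[lag - k: len(y) - k] for k in range(1, lag + 1)])
    cause_lags = np.column_stack([x[lag - k: len(x) - k] for k in range(1, lag + 1)])
    restricted = np.hstack([np.ones((n, 1)), own_lags])
    unrestricted = np.hstack([restricted, cause_lags])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = n - unrestricted.shape[1]
    # Improvement per added regressor, relative to the remaining error.
    return ((rss_r - rss_u) / lag) / (rss_u / dof)
```

A large statistic in one direction and a small one in the other suggests a directed edge candidate. In the approach the abstract describes, conditional independence tests would then prune such candidates.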

2606.13529 2026-06-12 cs.HC cs.LG New submission

Ride, Track, and Recover: Pilot Randomized Trial of a Wearable Digital Self-Management Intervention During a Veteran Endurance-Cycling Program

Alan Ta, Nilsu Salgin, Caleb Armstrong, Kala Phillips Reindel, Farzan Sasangohar

AI summary: A randomized trial evaluating whether a wearable digital self-management intervention stabilizes PTSD hyperarousal symptoms in veterans; symptom improvement was better maintained in the intervention group, and the perceived precision of machine-learning detections was positively associated with symptom severity.

Abstract

Post-traumatic stress disorder (PTSD) in veterans is characterized by persistent hyperarousal and comorbid anxiety and depressive symptoms that are difficult to monitor and manage outside clinical settings. Thirteen veterans participating in a Project Hero cycling event in Texas were randomized by computer-generated sequence in a naturalistic setting to two arms: (1) digital intervention plus physical activity, or (2) physical activity only, plus a third at-home monitoring control cohort consisting of 7 veterans selected from the broader Project Hero veteran community. Continuous smartwatch sensing combined heart rate and accelerometer features to detect hyperarousal events, which were confirmed in real time by participants. Weekly self-report measures of anxiety, depression, and PTSD severity were collected. Generalized additive mixed models characterized nonlinear trajectories over time. Baseline-normalized hyperarousal trajectories differed significantly across conditions, with the digital intervention group (n=7) showing structured stabilization compared to late-study escalation in the physical-only group (n=3). Both cycling groups exhibited acute symptom improvements during the endurance event; however, the digital intervention group demonstrated a higher overall maintenance of gains. The at-home control group (n=4) showed gradual symptom declines. Perceived precision of ML detections varied substantially across individuals and was positively associated with symptom severity, with higher-severity participants confirming a greater proportion of detected events. These results suggest that coupling wearable detection with digital self-management tools may support stabilization of hyperarousal and symptom improvement while emphasizing the importance of personalization and human-centered design in wearable mental health systems.

2606.13501 2026-06-12 cs.DC cs.LG cs.PF New submission

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

Xinwei Qiang, Yifan Hu, Shixuan Sun, Jing Yang, Han Zhao, Chen Chen, Yu Feng, Jingwen Leng, Minyi Guo

AI summary: Proposes GF-DiT, a policy-programmable runtime that optimizes Diffusion Transformer serving by dynamically adjusting request parallelism, using group-free collectives for low-overhead online reconfiguration, with substantially higher throughput and lower latency.

Abstract

Diffusion Transformers (DiTs) have become the dominant architecture for image and video generation, creating growing demand for efficient DiT serving. Existing systems assign each request a fixed parallel configuration throughout its lifetime. However, DiT workloads exhibit substantial heterogeneity across requests, execution stages, and system conditions, making static parallelism inefficient and often leading to poor GPU utilization and degraded service quality. This paper argues that DiT serving should treat GPU parallelism as a first-class schedulable resource. We present GF-DiT, a policy-programmable runtime for elastic DiT serving that dynamically adapts the parallelism of running requests according to workload demands and service objectives. GF-DiT introduces an asynchronous execution abstraction that decomposes requests into independently schedulable trajectory tasks and enables online GPU reallocation. To make elastic parallelism practical, GF-DiT further proposes group-free collectives, a lightweight communication abstraction that supports low-overhead online formation and reconfiguration of arbitrary execution groups. We implement GF-DiT in vLLM-Omni and evaluate it on representative image and video diffusion workloads. Compared with fixed-pipeline execution with static parallelism, GF-DiT improves throughput by up to 6.01×, reduces mean latency by up to 95%, lowers SLO violation rates by up to 90%, and reduces communication-group setup overhead from 778 ms to approximately 60 μs.

2606.13468 2026-06-12 cs.SE cs.AI New submission

Understanding the Rejection of Fixes Generated by Agentic Pull Requests -- Insights from the AIDev Dataset

Mahmoud Abujadallah, Ali Arabat, Mohammed Sayagh

AI summary: An analysis of the AIDev dataset finds that 46.41% of code fixes proposed by AI agents (Copilot, Devin, Cursor, Claude) are rejected. A qualitative study of 306 non-merged PRs identifies 14 rejection reasons in four categories and suggests ways to guide the models better.

Comments
5 pages, 2 figures, MSR '26: Proceedings of the 23rd International Conference on Mining Software Repositories, April 2026, Rio de Janeiro, Brazil

Abstract

AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41% of the fixes proposed by the agents Copilot, Devin, Cursor, and Claude are rejected. This represents a significant amount of wasted resources: human reviews, verifications, and test and validation runs spent on fixes that are ultimately discarded. Our goal in this paper is to understand the failure modes of AI-agents, an understanding that is crucial for better integrating AI-agents as efficient teammates. We conduct a qualitative study on a representative sample of 306 non-merged pull requests created or co-authored by the agents mentioned earlier, followed by a quantitative analysis of the reasons for rejection. Our qualitative findings identify 14 reasons for rejecting AI-agent fixes, divided into four high-level categories. We observe that developers reject fixes whose implementation is incorrect (e.g., incomplete, wrong approach), fixes that do not pass the continuous integration (CI) pipelines and fail tests, fixes for which the agent is unable to perform the implementation (e.g., no code generated, sessions lost), and fixes whose priority is low. Our results shed light on the importance of better guiding the model at these levels: (1) proposing hints about the approach to follow for fixing an issue, (2) outlining constraints or limitations regarding the approaches that should not be taken, and (3) instructing the agent on how to validate the implementation through CI pipelines and without introducing a breaking change. Our results suggest the need for good prioritization of tasks so that generated fixes do not lead to wasted human review efforts or wasted agent resources (e.g., tokens, compute, or allowed number of requests).

2606.13452 2026-06-12 cs.DL cs.CL cs.CY cs.HC New submission

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

Chenggang Yang, Chengzhi Zhang

AI summary: An analysis of 15,328 Nature Communications papers and their review comments finds that authors and reviewers both emphasize result-oriented innovation, with reviewers taking a more comprehensive view; highly innovative papers benefit from stronger promotional language, while for moderately innovative papers promotional language correlates significantly with reviewer disagreement.

Journal ref
Scientometrics, 2026

Abstract

Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the gatekeeper of scientific rigor, rigorously evaluates the novelty of papers, yet a cognitive gap may exist between author self-promotion and reviewer evaluation. To investigate this, we analyzed 15,328 academic papers published in Nature Communications from 2016 to 2021, along with their peer-review comments. We found that both reviewers and authors emphasize result-oriented innovation, with reviewers adopting a more comprehensive evaluation perspective. Furthermore, by examining promotional intensity against inherent paper novelty, we found that its effect depends on the paper's actual innovation level. Highly innovative papers benefit from stronger promotional language, receiving more positive evaluations. We also found that promotional language significantly correlates with reviewer disagreement on novelty specifically for papers of moderate innovativeness, whereas it has negligible impact for papers with either very high or very low novelty. This reveals how promotional language operates most prominently in the gray area of academic evaluation.

2606.13449 2026-06-12 cs.SE cs.AI New submission

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

Ali Arabat, Mohammed Sayagh

AI summary: An analysis of 15,549 agentic PRs from 148 projects finds that instruction files have no consistent positive effect on merge rate, amount of code change, or merge effort, although projects that improved their merge rate have longer and more clearly structured instruction files; the paper proposes "Instructions-as-Code" as a research direction.

Comments
5 pages, 8 figures, 23rd International Conference on Mining Software Repositories, April 13--14, 2026

Abstract

AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create instruction files that guide the AI-agents, including how to navigate the project, locate the right components, run tests, respect best practices, and more. In this paper, we investigate the relationship between the creation of these instructions and the performance of AI-agents in creating better pull requests, which have a higher chance of success (i.e., the merge rate), address more complex tasks (e.g., code churn), and require less effort to be merged (e.g., time to merge). To this end, we analyze 15,549 agentic PRs from 148 projects in the AIDev dataset. Using the three dimensions, we compare each project before and after the creation of the instruction files. We find that specifying instructions for AI-agents does not necessarily lead to better results. With the instruction files, 27.7% of the projects increased their merge rate by at least 20%, while 26.35% decreased it. The same pattern holds for the amount of changes (e.g., code churn, number of modified files) and for the effort to merge an agentic PR (e.g., merge time and number of comments). From a first exploration, we find that projects that managed to increase their merge rate have substantially longer instruction files, which are also well structured into a higher number of sections and sub-sections. Our results motivate the need for research to assist practitioners in framing the development of instruction files as a software engineering activity (aka Instructions-as-Code).

2606.13397 2026-06-12 cs.HC cs.AI cs.CY New submission

Mod-Guide: An LLM-based Content Moderation Feedback System to Address Insensitive Speech toward Indigenous Ethnic and Religious Minority Communities

Dipto Das, Achhiya Sultana, Ankit Singh Chauhan, Saadia Binte Alam, Mohammad Shidujaman, Shion Guha, Sunandan Chakraborty, Syed Ishtiaque Ahmed

AI summary: Studies the epistemic limits of LLM-based moderation systems with respect to insensitive speech toward Bangladesh's Hindu and Chakma communities, and develops Mod-Guide, a tool built on a co-created cultural corpus and retrieval augmented generation (RAG), to make models more sensitive to minority viewpoints.

Abstract

Language operates as a mechanism of both marginalization and resistance, especially for minority communities navigating insensitive and harmful speech online. As content moderation increasingly depends on large language models (LLMs), concerns arise about whether these systems can recognize culturally insensitive speech: language that disregards or marginalizes the cultural and religious perspectives of historically underrepresented communities, often through implicit erasure, misrepresentation, or normative framing rather than overt hostility. Focusing on Bangladesh's Hindu and Chakma communities -- the country's largest religious and Indigenous ethnic minorities, respectively -- this paper investigates the epistemic limits of LLM-based moderation systems and explores methods for incorporating minority perspectives. We co-created a culturally grounded corpus of insensitive speech with community members and integrated their narratives into moderation pipelines using retrieval augmented generation (RAG). Our tool, Mod-Guide, improves LLM sensitivity to minority viewpoints by leveraging contextual cues derived from lived experience. Through mixed-method evaluations involving both minority and majority participants, we demonstrate that RAG-enhanced moderation responses are more contextually accurate and perceived differently across ethnic lines. This work advances research in human-computer interaction, AI ethics, and social computing by foregrounding restorative justice and hermeneutical inclusion in the design of content moderation systems.

2606.13385 2026-06-12 cs.CR cs.AI cs.CY cs.HC cs.MM 新提交

Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

谁买单?面向真实世界网络代理的以利益相关者为中心的提示注入基准测试

Zihao Wang, Yiming Li, Yutong Wu, Zheyu Liu, Kangjie Chen, Fok Kar Wai, Pin-Yu Chen, Vrizlynn L. L. Thing, Bo Li, Dacheng Tao, Tianwei Zhang

AI总结 提出以利益相关者为中心的基准测试框架,系统分类和归因真实世界网络代理系统中的提示注入危害,揭示当前代理无法可靠抵抗任何攻击目标,且失败模式多样。

详情
Comments
32 pages
AI中文摘要

由大型语言模型驱动的网络代理越来越多地部署在真实环境中,它们在不受信任的网络内容上操作并执行具有直接后果的动作。这使得它们容易受到提示注入攻击,其中看似良性的内容嵌入了操纵代理行为的对抗性指令。现有的安全基准采用以攻击为中心的视角,关注注入的技术可行性,而忽略了由此产生的危害的细微分布。然而,在实践中,提示注入风险是受害者依赖的:单一漏洞可能对不同利益相关者产生不对称后果,同一攻击模式可能因目标不同而表现出显著不同的有效性。为了捕捉这些特性,我们引入了\sysname,一个以利益相关者为中心的基准,用于系统分类和归因真实世界网络代理系统中的危害。它区分受影响的实体(如用户、卖家、平台),将攻击分解为具体目标,并使用互补的结果和过程级指标评估每个案例。我们的结果揭示了显著且异质的漏洞:当前代理无法可靠抵抗任何单一攻击目标,失败分布在从“隐蔽寄生”(攻击成功而不干扰用户委托任务)到“错位破坏”(任务被破坏而攻击未成功)以及“复合失败”(对抗目标和任务完整性同时被违反)等不同模式。这些模式被传统评估所忽略,突显了在真实部署中对基于LLM的代理进行利益相关者感知评估的必要性。基准可在 https://github.com/StakeBench/SBC 获取。

英文摘要

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing security benchmarks adopt an attack-centric perspective, focusing on the technical feasibility of injections while overlooking the nuanced distribution of resulting harms. In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may exhibit substantially different effectiveness depending on whom it targets. To capture these properties, we introduce \sysname, a stakeholder-centric benchmark to systematically categorize and attribute harm in real-world web agent systems. It distinguishes between affected entities (e.g., user, seller, platform), decomposes the attacks into concrete objectives, and evaluates each case with complementary outcome- and process-level metrics. Our results reveal substantial and heterogeneous vulnerabilities: not a single attack objective is reliably resisted by current agents, and failures distribute across qualitatively distinct modes ranging from stealthy parasitism (attack succeeds without disrupting the user's delegated task) to misaligned disruption (task disrupted without attack success) and compounded failure (both adversarial objective and task integrity simultaneously violated). These patterns are missed by conventional evaluation, highlighting the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. The benchmark is available at https://github.com/StakeBench/SBC.
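
The failure modes named in the abstract can be read as a two-flag classification of each episode. The sketch below is only an illustration of that reading: the function name, the enum, and the `RESISTED` label for the case where the attack fails and the task completes are assumptions, not part of the benchmark.

```python
from enum import Enum


class Outcome(Enum):
    RESISTED = "resisted"  # hypothetical label: attack failed, task intact
    STEALTHY_PARASITISM = "stealthy parasitism"
    MISALIGNED_DISRUPTION = "misaligned disruption"
    COMPOUNDED_FAILURE = "compounded failure"


def classify_episode(attack_succeeded: bool, task_completed: bool) -> Outcome:
    """Map the two per-episode flags to the modes described in the abstract."""
    if attack_succeeded and task_completed:
        # Attack succeeds without disrupting the user's delegated task.
        return Outcome.STEALTHY_PARASITISM
    if attack_succeeded:
        # Adversarial objective reached and task integrity violated.
        return Outcome.COMPOUNDED_FAILURE
    if not task_completed:
        # Task disrupted even though the attack itself did not succeed.
        return Outcome.MISALIGNED_DISRUPTION
    return Outcome.RESISTED
```

Keeping the two flags separate is what lets an evaluation tell a silently successful attack apart from one that merely breaks the task.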

2606.13298 2026-06-12 cs.SE cs.AI 新提交

Mining Architectural Quality Under Agentic AI Adoption: A Causal Study of Java Repositories

在智能体AI采用下的架构质量挖掘:Java仓库的因果研究

Oliver Aleksander Larsen, Mahyar T. Moghaddam

AI总结 通过差分差分设计和Borusyak插值估计器,研究智能体AI工具采用对Java仓库架构气味密度(ASD)的因果影响,发现ASD下降6.7%源于代码量增长,而非架构改进。

详情
Comments
16 pages. Accepted for presentation at the 52nd Euromicro Conference on Software Engineering and Advanced Applications (SEAA) 2026, Krakow, Poland, 2-4 September 2026, and for publication in the Springer LNCS proceedings. This is the author's accepted manuscript
AI中文摘要

AI编码工具现已被大多数开发者使用,这些工具的智能体化使用普及了俗称“氛围编码”的实践。然而,关于其对软件架构影响的因果证据却很少。先前的因果工作衡量了代码层面的结果(复杂度、静态分析警告);这种退化是否会传播到架构层面仍未知。我们挖掘了151个开源Java仓库,其中74个检测到智能体AI采用(通过配置文件和Co-Authored-By提交尾注识别),以及77个倾向得分匹配的对照仓库,每个仓库跨越13个月,生成1,811个月度Arcan快照。我们采用交错差分差分设计和Borusyak插值估计器,估计采用对架构气味密度(ASD)的因果效应,将近期用于代码层面指标的因果设计应用于架构层面。总气味计数基本不变(+1.1%,p=0.82),而代码行数增长12.8%(p=0.003);因此,ASD下降6.7%(p=0.004)是分母效应而非架构改进。按类型估计和稳健性检验(wild cluster bootstrap、Lee bounds、陈旧观测敏感性)证实了这一模式;预处理趋势平坦(Wald p=0.90),与平行趋势一致。当处理影响系统规模时,密度归一化结果可能产生误导:对AI工具采用的因果挖掘研究需要原始计数和显式分解。完整的复现包,包括精心整理的151个仓库月度面板,已公开提供。

英文摘要

AI coding tools are now used by a majority of developers, and agentic use of these tools has popularized the practice colloquially called "vibe coding". Yet causal evidence on their effect on software architecture is scarce. Prior causal work has measured code-level outcomes (complexity, static analysis warnings); whether such degradation propagates to architecture-level outcomes remains unknown. We mine 151 open-source Java repositories, 74 with detectable agentic AI adoption (identified via configuration files and Co-Authored-By commit trailers) and 77 propensity-matched controls, across a 13-month per-repository window yielding 1,811 monthly Arcan snapshots. We estimate the causal effect of adoption on architectural smell density (ASD) with a staggered difference-in-differences design and the Borusyak imputation estimator, applying a causal design recently used for code-level metrics to the architecture level. Total smell counts are essentially unchanged (+1.1%, p = 0.82) while lines of code grow +12.8% (p = 0.003); the resulting 6.7% ASD decline (p = 0.004) is therefore a denominator effect rather than an architectural improvement. Per-type estimates and robustness checks (wild cluster bootstrap, Lee bounds, stale-observation sensitivity) corroborate the pattern; pre-trends are flat (Wald p = 0.90), consistent with parallel trends. Density-normalized outcomes can mislead when treatment affects system size: raw counts and explicit decomposition are required for causal mining studies of AI tool adoption. The complete replication package, including the curated 151-repository monthly panel, is publicly available.
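
The denominator effect described in the abstract can be illustrated with a toy calculation. The numbers below are hypothetical and are not the paper's estimates; they only show how a density metric falls when the smell count stays flat and the code base grows.

```python
def smell_density(smells: int, kloc: float) -> float:
    """Architectural smells per thousand lines of code."""
    return smells / kloc


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` (0.1 means +10%)."""
    return (after - before) / before


# Hypothetical repository: the smell count is unchanged while the code grows by 10%.
density_before = smell_density(smells=200, kloc=100.0)
density_after = smell_density(smells=200, kloc=110.0)
density_change = relative_change(density_before, density_after)

print(f"smell count change: {relative_change(200, 200):+.1%}")
print(f"density change:     {density_change:+.1%}")
```

The density drops by about 9% here even though no smell was removed, which is why the abstract asks for raw counts and an explicit decomposition alongside density-normalized outcomes.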

2606.13239 2026-06-12 cs.SE cs.AI cs.CL cs.CV 新提交

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

ComAct: 通过COM即行动范式重构专业软件操作

Jiaxin Ai, Tao Hu, Xuemeng Yang, Shu Zou, Hairong Zhang, Daocheng Fu, Yu Yang, Hongbin Zhou, Nianchen Deng, Pinlong Cai, Zhongyuan Wang, Botian Shi, Kaipeng Zhang, Licheng Wen

AI总结 提出COM即行动范式,将专业软件交互转化为确定性程序合成,解决GUI代理的脆弱性和API代理的异构性问题;构建ComCADBench基准和ComActor自校正代理,在工业CAD软件上实现SOTA性能。

详情
AI中文摘要

现有的计算机使用代理在专业软件操作上仍然存在根本性限制:基于GUI的代理受困于脆弱的视觉基础和长程错误累积,而基于API的方法则难以应对异构协议和不可访问的商业接口。在这项工作中,我们将组件对象模型(COM)识别为统一的、可执行的抽象,提出了COM即行动:一种新的范式,将专业软件交互重新定义为确定性程序合成,而非顺序视觉控制。为了在最苛刻的环境中验证这一范式,我们引入了ComCADBench,这是首个针对操作真实工业CAD软件的代理的基准测试。我们的实验揭示了显著的范式差距:前沿的专有模型在基于GUI的交互下几乎无法成功,而基于COM的执行则带来了实质性的即时收益。为了弥合语法正确性与几何精度之间的剩余差距,我们开发了ComActor,一个通过渐进式三阶段框架训练的自校正代理,以及ComForge,一个用于在Windows容器中进行大规模训练的可扩展平台。大量实验表明,ComActor在ComCADBench上达到了最先进的性能,在基线崩溃的长程任务中表现出强大的韧性,并泛化到外部CAD基准测试。

英文摘要

Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with heterogeneous protocols and inaccessible commercial interfaces. In this work, we identify the Component Object Model (COM) as a unified executable abstraction, proposing COM-as-Action: a new paradigm that reframes professional software interaction as deterministic program synthesis rather than sequential visual control. To validate this paradigm in the most demanding environments, we introduce ComCADBench, the first benchmark for agents operating real industrial CAD software. Our experiments reveal a substantial paradigm gap: frontier proprietary models achieve near-zero success under GUI-based interaction, whereas COM-based execution yields substantial immediate gains. To bridge the remaining gap between syntactic correctness and geometric accuracy, we develop ComActor, a self-correcting agent trained through a progressive three-stage framework, alongside ComForge, a scalable platform for large-scale training in Windows containers. Extensive experiments show that ComActor achieves state-of-the-art performance on ComCADBench, with strong resilience in long-horizon tasks where baselines collapse, and generalizes to external CAD benchmark.

2606.13179 2026-06-12 cs.ET cs.AI cs.AR cs.NE 新提交

Modern analog computing for solving differential and matrix equations

现代模拟计算用于求解微分方程和矩阵方程

Zhong Sun, Piergiulio Mannocci, Manuel Le Gallo, Abu Sebastian

AI总结 本文综述现代模拟计算在求解微分方程和矩阵方程中的核心原语、硬件实现及最新进展,强调电阻式存储器阵列的优势,并讨论精度、可扩展性及与内存计算的关系。

详情
AI中文摘要

近年来,受人工智能和科学计算等数据密集型应用的计算需求驱动,模拟计算重新获得关注。鉴于计算任务的多样性以及模拟CMOS电路和电阻式存储器技术的最新进展,我们将这一不断发展的领域称为现代模拟计算。在此背景下,我们识别出三个核心计算原语:求解微分方程、求解矩阵方程以及执行矩阵-向量乘法,并探讨它们之间的联系。我们还研究了这些模拟计算算子的各种硬件实现,包括基于分立元件、集成电路和电阻式存储器设备的实现。其中,电阻式存储器阵列因其实现效率而显得尤为有前景。本文随后调查了利用现代模拟计算(使用先进的模拟CMOS电路和电阻式存储器阵列)求解微分方程和矩阵方程的最新进展。最后,我们讨论了这些电路的应用、精度和可扩展性问题及其潜在解决方案、与内存计算的关系,以及模拟计算的独特计算复杂性。本文提供了关于模拟计算的统一视角,强调了其优势、当前发展和挑战,并将其定位为下一代计算前沿的关键推动者。

英文摘要

In recent years, driven by the computational demands of data-intensive applications such as artificial intelligence and scientific computing, analog computing has gained renewed interest. Given the diversity of computational tasks and recent advancements in analog CMOS circuits and resistive memory technologies, we refer to the evolving landscape as modern analog computing. In this context, we identify three core computational primitives: solving differential equations, solving matrix equations, and performing matrix-vector multiplications, and we explore the connections among them. We also examine various hardware implementations of these analog computing operators, including those built with discrete components, integrated circuits, and resistive memory devices. Among these, resistive memory arrays emerge as particularly promising due to their implementation efficiency. The paper then surveys recent progress in leveraging modern analog computing to solve differential and matrix equations using both advanced analog CMOS circuits and resistive memory arrays. Finally, we discuss the applications of these circuits, the precision and scalability issues and their potential solutions, the relationship with in-memory computing, and the unique computational complexity of analog computing. This paper provides a unified perspective on analog computing, highlighting its strengths, current developments, and challenges, and positioning it as a pivotal enabler of next-generation computational frontiers.
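
Matrix-vector multiplication on a resistive memory array is usually explained with Ohm's law and Kirchhoff's current law: each column current is the sum of conductance times row voltage. The sketch below models only that ideal behaviour; the function name and values are illustrative assumptions, and wire resistance, device nonlinearity, and noise are ignored.

```python
def crossbar_mvm(conductances: list[list[float]], voltages: list[float]) -> list[float]:
    """Ideal crossbar read-out: I_j = sum_i G[i][j] * V[i].

    conductances[i][j] is the device conductance (siemens) between row i and
    column j; voltages[i] is the voltage (volts) applied to row i.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one voltage per row is required")
    return [
        sum(conductances[i][j] * voltages[i] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Hypothetical 2x2 array with microsiemens-scale conductances.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
currents = crossbar_mvm(G, V)
```

All multiply-accumulate operations happen in one read step in the physical array, which is the efficiency argument the abstract makes for resistive memory arrays.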

2606.13133 2026-06-12 cs.DS cs.LG 新提交

Learning-Augmented Approximation for Unrelated-Machines Makespan Scheduling

学习增强的无关联机器调度近似算法

Kaito Baba, Evripidis Bampis, Giorgos Mitropoulos

AI总结 针对无关联机器调度问题,提出学习增强算法,利用重作业分配预测实现精确预测时(1+ε)-近似,误差增大时退化为2-近似。

详情
Comments
22 pages, 3 figures
AI中文摘要

最近,Antoniadis等人(ICLR 2025)提出了一个框架,通过引入预测来近似NP-hard选择问题。尽管该方法简单,但它紧密匹配理论下界,因此其推广极具吸引力。我们解决了Antoniadis等人工作中提出的一个开放问题,即如何将该方法扩展到选择问题类之外的其他重要问题,例如调度问题。我们为无关联机器上的最小化完工时间问题(记为$R\|C_{\max}$)开发了一种学习增强算法。通过使用重作业分配的预测,我们在预测准确时实现了多项式时间的$(1+\varepsilon)$-近似,并且随着误差增加,该近似平滑地退化为最坏情况下的2-近似。最后,我们对所提方法进行了实证分析。

英文摘要

Recently, Antoniadis et al. (ICLR 2025) proposed a framework for incorporating predictions to approximate NP-hard selection problems. Despite its simplicity, this approach tightly matches theoretical lower bounds, making its generalization highly compelling. We address an open question raised in the work of Antoniadis et al., concerning the extension of this approach to other important problems outside the class of selection problems, such as scheduling. We develop a learning-augmented algorithm for the makespan minimization problem on unrelated machines, denoted by $R\|C_{\max}$. By using predictions of heavy job assignments, we achieve a polynomial-time $(1+\varepsilon)$-approximation for accurate predictions that smoothly degrades to a worst-case 2-approximation as the error increases. We conclude our work with an empirical analysis of our method.
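
To make the setting concrete, the sketch below fixes the predicted machine for each heavy job and places the remaining jobs greedily. This is not the authors' algorithm and carries none of the stated approximation guarantees; the function name, the greedy rule, and the instance are assumptions used only to show how predictions enter an $R\|C_{\max}$ schedule.

```python
def schedule_with_predictions(
    p: list[list[float]], predicted: dict[int, int]
) -> tuple[dict[int, int], float]:
    """Assign jobs to unrelated machines and return (assignment, makespan).

    p[i][j] is the processing time of job j on machine i.
    predicted maps a heavy job index to its predicted machine index.
    """
    n_machines, n_jobs = len(p), len(p[0])
    loads = [0.0] * n_machines
    assignment: dict[int, int] = {}

    # Trust the prediction for heavy jobs.
    for job, machine in predicted.items():
        assignment[job] = machine
        loads[machine] += p[machine][job]

    # Greedy fallback (an assumption): put each remaining job on the machine
    # that would finish it earliest.
    for job in range(n_jobs):
        if job in assignment:
            continue
        best = min(range(n_machines), key=lambda i: loads[i] + p[i][job])
        assignment[job] = best
        loads[best] += p[best][job]

    return assignment, max(loads)


# Hypothetical instance: 2 machines, 3 jobs, job 0 predicted to run on machine 1.
times = [[4, 1, 3],
         [2, 5, 3]]
plan, makespan = schedule_with_predictions(times, {0: 1})
```

A wrong prediction for job 0 would lock in a load of 4 on machine 0 before any other job is placed, which is the kind of error a learning-augmented guarantee has to bound.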

2606.13113 2026-06-12 eess.SY cs.RO cs.SY 新提交

MPC for underactuated spacecraft control with a Lyapunov supervised physics-informed neural network correction layer

基于李雅普诺夫监督的物理信息神经网络校正层的欠驱动航天器MPC控制

Amirhossein Ayanmanesh Motlaghmofrad, Carlo Cena, Mauro Martini, Marcello Chiaberge

AI总结 针对欠驱动航天器姿态控制,提出一种分层架构,结合非线性模型预测控制、物理信息神经网络和李雅普诺夫监督机制,在不确定性下降低稳态误差并保持鲁棒性。

详情
Comments
Accepted at SPAICE (AI in and for Space) 2026
AI中文摘要

欠驱动航天器面临可控性限制和对环境干扰的高度敏感性,使得姿态机动和稳定复杂化。由于沿欠驱动轴缺乏控制能力,传统控制器无法直接稳定所有姿态分量,因此需要参考规划策略。此外,MPC方法对惯性不确定性和未建模动态耦合仍然敏感,导致在失配下跟踪性能下降。为解决这些问题,我们考虑一种集成三层的分层架构:(i) 非线性模型预测控制器(NMPC),用于约束和欠驱动感知的机动规划以及在执行器限制下的标称闭环稳定性;(ii) 物理信息神经网络(PINN),在仿真数据上离线训练以估计残余干扰力矩,其损失项强制执行与刚体旋转动力学的一致性;(iii) 基于李雅普诺夫的监督安全机制,在线评估学习到的校正并限制或抑制其影响,以保持基线控制器的稳定性特性。该架构在模拟反作用轮动力学、执行器饱和及环境干扰的高保真仿真环境中进行评估。蒙特卡洛研究表明,与独立NMPC相比,稳态姿态误差有统计显著的降低,同时在不确定性下保持鲁棒行为。监督层确保当基于学习的增强不可靠时,能够优雅地退化到纯模型控制。

英文摘要

Underactuated spacecraft face controllability limitations and heightened sensitivity to environmental disturbances, complicating attitude maneuvering and stabilization. Due to the lack of control authority along the underactuated axis, conventional controllers cannot directly stabilize all attitude components and therefore require reference planning strategies. Furthermore, MPC approaches remain sensitive to inertia uncertainty and unmodeled dynamic couplings, resulting in degraded tracking performance under mismatch. To address these issues, we consider a hierarchical architecture integrating three layers: (i) a nonlinear model predictive controller (NMPC) for constraint- and underactuation-aware maneuver planning and nominal closed-loop stability under actuator limits; (ii) a physics-informed neural network (PINN) trained offline on simulation data to estimate residual disturbance torques, with loss terms that enforce consistency with rigid-body rotational dynamics; (iii) a Lyapunov-based supervisory safety mechanism that evaluates the learned correction online and bounds or suppresses its influence to preserve the stability properties of the baseline controller. The architecture is evaluated in a high-fidelity simulation environment modelling reaction wheel dynamics, actuator saturation, and environmental disturbances. Monte Carlo studies show statistically significant reductions in steady-state attitude error relative to standalone NMPC while maintaining robust behavior under uncertainty. The supervisory layer ensures graceful degradation to purely model-based control when the learning-based augmentation is unreliable.

2606.13097 2026-06-12 cs.PL cs.AI 新提交

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

功能缓存嫁接:具身智能体的鲁棒且快速代码策略合成

Saehun Chun, Wonje Choi, Sera Choi, Sanghyun Ahn, Honguk Woo

AI总结 提出FCGraft框架,通过维护函数级验证代码骨架及其键值缓存,对新任务进行缓存嫁接(拼接和修补),减少预填充计算并复用验证结构,实现更鲁棒和快速的策略合成。

详情
Comments
Accepted at ICML 2026
AI中文摘要

编写代码的大型语言模型(CodeLLMs)通过将自然语言目标和环境约束转化为结构化控制程序,为具身智能体生成可执行的代码策略。然而,在开放域具身环境中,策略生成存在两个基本限制:(i) 由于长提示上的重复预填充计算导致的延迟解码,以及(ii) 由于完全生成式解码导致的鲁棒性有限,这常常产生API不匹配、缺少安全防护和不稳定的控制逻辑。为了解决这些限制,我们提出了FCGraft,一种功能缓存嫁接框架。FCGraft维护一个函数级验证代码骨架库及其相关的提示级Transformer键值(KV)缓存,并在提供新任务时通过检索相关函数并嫁接其KV缓存来合成新策略。给定检索到的函数缓存,FCGraft通过拼接(将缓存的函数片段组合成复合策略)和修补(仅局部调整必要的代码区域以满足任务特定参数和约束,且只需最少的额外解码)进行缓存嫁接。通过消除冗余的预填充计算,该方法减少了生成延迟,同时重用经过验证的控制结构提高了鲁棒性,相比提示级缓存方法RAGCache,任务成功率提高了18.31%,策略合成速度提高了2.3倍。

英文摘要

Code-writing large language models (CodeLLMs) generate executable code policies for embodied agents by translating natural language goals and environmental constraints into structured control programs. However, policy generation in open-domain embodied environments suffers from two fundamental limitations: (i) delayed decoding caused by repetitive prefill computation over long prompts, and (ii) limited robustness due to fully generative decoding, which often produces API mismatches, missing safety guards, and unstable control logic. To address these limitations, we present FCGraft, a Functional Cache Grafting framework. FCGraft maintains a library of function-level validated code skeletons and their associated prompt-level Transformer key-value (KV) caches, and synthesizes new policies by retrieving relevant functions and grafting their KV caches when a new task is provided. Given retrieved function caches, FCGraft performs cache grafting via stitching, which composes cached function segments into a composite policy, and patching, which locally adapts only the necessary code regions to satisfy task-specific parameters and constraints with minimal additional decoding. By eliminating redundant prefill computation, this approach reduces generation latency, while reusing validated control structures improves robustness. Compared with the prompt-level caching method RAGCache, FCGraft achieves an 18.31% higher task success rate and 2.3x faster policy synthesis.

2606.13079 2026-06-12 cs.CR cs.AI 新提交

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

大型语言模型驱动的AI系统中自主渗透能力的涌现

Jiaqi Luo, Jiarun Dai, Zhile Chen, Jia Xu, Weibing Wang, Yawen Duan, Brian Tse, Geng Hong, Xudong Pan, Yuan Zhang, Min Yang

AI总结 针对现有评估方法不透明、场景简化等问题,构建包含两级目标服务器和通用代理框架的自主渗透评估体系,测试19个LLM发现成功率10.7%-69.3%,且能力随模型整体能力提升。

详情
AI中文摘要

如今,能够造成重大现实世界危害的网络攻击的自主执行被广泛视为前沿AI系统不得跨越的关键红线之一。在这个更广泛的红线场景中,自主渗透代表了一项核心使能能力和子任务:LLM驱动的AI系统在无需人工干预的情况下,独立对目标服务器进行对抗操作,识别和利用漏洞,并获得未授权访问或控制的能力。越来越多的研究试图评估AI系统的自主渗透能力。然而,现有评估通常采用不透明的方法,依赖不切实际或过度简化的渗透测试场景,或为LLM提供过多的先验知识和任务特定指导,无法准确捕捉现代AI系统在更广泛的高影响网络攻击场景中自主执行这一核心能力的程度。为解决这些局限性,我们构建了一个新的自主渗透评估框架,包含两个组成部分:目标服务器和代理脚手架。具体而言,在目标服务器端,我们基于与易受攻击服务一起部署的无已知漏洞安全服务的数量,设计了两个级别的目标环境:一级(一个安全服务)和二级(三个安全服务),共产生300个目标服务器。同时,代理脚手架采用通用代理架构,配备一组通用网络安全工具,没有任何目标特定的先验知识。我们评估了19个开源和专有LLM,发现当前模型的渗透成功率在10.7%到69.3%之间。此外,我们观察到自主渗透能力随着整体模型能力的提升而持续改进。

英文摘要

The autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance. As a result, they cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Specifically, on the target-server side, we design two levels of target environments based on the number of secure services without known vulnerabilities deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), resulting in a total of 300 target servers. Meanwhile, the agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs, and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.

2606.13076 2026-06-12 cs.MA cs.GT cs.LG 新提交

$α$-fair heterogeneous agent reinforcement learning

$\alpha$-公平异质智能体强化学习

Yao-hua Franck Xu, Tayeb Lemlouma, Jean-Marie Bonnin, Arnaud Braud

AI总结 提出一种结合$\alpha$-公平性与异质智能体信任区域学习(HATRL)的框架,通过公平优势函数动态加权智能体效用,实现单调改进并收敛至纳什均衡,在顺序社会困境中优于HATRL算法。

详情
AI中文摘要

多智能体系统中的合作通常通过功利主义目标进行优化,这些目标最大化整体效率但未能考虑奖励分配,常常导致不公平的“领导者-跟随者”动态。虽然基于公平的方法鼓励每个智能体从合作中受益的亲社会行为,但许多当前算法——包括那些利用奖励塑造的算法——破坏了马尔可夫博弈的平稳性或缺乏严格的理论保证。这在公平目标方法和理论上安全的学习框架之间造成了关键差距。我们提出了一种新颖的框架,将$\alpha$-公平性与异质智能体信任区域学习(HATRL)相结合,确保单调改进并收敛至纳什均衡。我们的方法利用一种公平优势函数,该函数根据智能体的期望回报动态加权其效用,使得全局目标能够根据参数$\alpha$从纯粹的功利主义效率过渡到$\alpha$-公平福利。我们引入了两种实用算法,$\alpha$-公平HATRPO和$\alpha$-公平HAPPO,并通过在CleanUp和CommonHarvest等顺序社会困境中的实验证明,从功利主义角度看,它们比HATRL算法表现更好,同时实现了更高的社会结果。

英文摘要

Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms, including those utilizing reward shaping, break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair-objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges $α$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to $α$-fair welfare based on the parameter $α$. We introduce two practical algorithms, $α$-fair HATRPO and $α$-fair HAPPO, and demonstrate through experiments in sequential social dilemmas like CleanUp and CommonHarvest that they outperform the HATRL algorithms from a utilitarian point of view while achieving better social outcomes.
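
The transition from utilitarian efficiency to $α$-fair welfare can be seen in the standard $α$-fair utility, which sums $x^{1-α}/(1-α)$ over agents and uses $\log x$ at $α = 1$. The sketch below computes that welfare and its per-agent gradient $x^{-α}$. Using the gradient as the agent weight is an assumption about one natural choice; the paper's fair advantage function may weight agents differently.

```python
import math


def alpha_fair_welfare(returns: list[float], alpha: float) -> float:
    """Standard alpha-fair welfare of strictly positive expected returns."""
    if any(r <= 0 for r in returns):
        raise ValueError("alpha-fair welfare needs strictly positive returns")
    if alpha == 1:
        # Proportional fairness: sum of logarithms.
        return sum(math.log(r) for r in returns)
    return sum(r ** (1 - alpha) / (1 - alpha) for r in returns)


def fairness_weights(returns: list[float], alpha: float) -> list[float]:
    """Gradient of the welfare per agent: low-return agents get larger weights."""
    return [r ** (-alpha) for r in returns]


returns = [1.0, 4.0]  # hypothetical expected returns of two agents
for alpha in (0, 1, 2):
    print(alpha, alpha_fair_welfare(returns, alpha), fairness_weights(returns, alpha))
```

At $α = 0$ every agent has weight 1 and the welfare is the plain sum of returns; larger $α$ shifts weight toward the worse-off agent.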

2606.13071 2026-06-12 cs.CY cs.AI cs.HC 新提交

"Is This Not Enough?": Asymmetries in Institutional Accountability and Collective Sensemaking in the Case of Canada's Algorithmic Visa Triage System

“这还不够吗?”:加拿大算法签证分类系统中的机构问责与集体意义建构的不对称性

Dipto Das, Matthew Tamura, Syed Ishtiaque Ahmed, Shion Guha

AI总结 研究加拿大签证系统中算法问责的机构表述与申请者体验,发现机构强调透明度与程序保障,而申请者通过集体意义建构应对不透明决策,揭示认知、管辖和时空关系三方面不对称。

详情
AI中文摘要

本文研究了加拿大签证系统中算法问责如何在机构层面被表述,以及跨境申请者如何体验这种问责。我们使用为公共部门调整的算法决策(ADMAPS)框架,分析了加拿大移民、难民和公民部(IRCC)针对临时居民签证(TRV)分类系统的算法影响评估(AIA),并采用混合方法分析了Reddit上申请者之间的讨论。我们表明,虽然机构工件强调透明度、程序保障和有限影响,但申请者进行集体意义建构以解读不透明决策,常常在不确定性中依赖同行知识。我们识别了机构问责结构与人们感知过程之间的三种不对称:获取决策逻辑的认知不对称、由地缘政治定位塑造的管辖不对称,以及等待和不确定性体验中的时间-关系不对称。我们强调了将注意力从机构设计转向公共部门算法治理中体验的不均匀分布的重要性。这些贡献共同展示了跨国移民背景下的算法治理系统如何产生机构披露框架未能捕捉的结构性不对称,以及扩展ADMAPS如何能够解释这些不平等的问责转化。

英文摘要

This paper examines how algorithmic accountability in Canada's visa system is articulated institutionally and experienced by applicants across borders. We analyzed Immigration, Refugees and Citizenship Canada (IRCC)'s Algorithmic Impact Assessment (AIA) for the temporary resident visa (TRV) triage system using the algorithmic decision-making adapted for the public sector (ADMAPS) framework and analyzed Reddit discussions among applicants using a mixed-methods approach. We show that while institutional artifacts emphasize transparency, procedural safeguards, and bounded impacts, applicants engage in collective sensemaking to interpret opaque decisions, often relying on peer knowledge amid uncertainty. We identify three asymmetries between how institutional accountability is structured and how people perceive the process: epistemic asymmetry in access to decision logic, jurisdictional asymmetry in exposure shaped by geopolitical positioning, and temporal-relational asymmetry in how waiting and uncertainty are experienced. We emphasize why it is important to shift attention from institutional design to the uneven distribution of experiences with public-sector algorithmic governance. Together, these contributions demonstrate how algorithmic governance systems in the context of transnational migration produce structured asymmetries not captured by institutional disclosure frameworks, and how extending ADMAPS can account for those uneven translations of accountability.

2606.13068 2026-06-12 cs.MA cs.RO 新提交

Effects of Social Interactions in Self-Organising Railway Traffic Management

自组织铁路交通管理中社交互动的影响

Fabio Oddi, Federico Naldini, Leo D'Amato, Grégory Marlière, Paola Pellegrini, Vito Trianni

AI总结 研究自组织铁路交通管理中预测邻域范围(horizon)对分布式协调过程的影响,发现短时间范围足够,长范围会损害局部可解性和计算响应性而无全局收益。

详情
AI中文摘要

最近的研究正在探索自组织交通管理作为扩展到复杂现实网络的一种解决方案。在这样的系统中,列车预测其邻域,生成交通计划假设,并通过与邻居的共识达成未来要实施的交通计划。本文研究了该流程中的一个结构参数:预测邻域范围。列车使用该范围来识别与邻居的未来潜在冲突,并建立局部交互拓扑,即要与之协商的列车子集。作为主要设计变量,范围直接决定了社交互动图的大小和密度,而其对局部子问题复杂性和分布式共识动态的影响则代表了需要探索的权衡。通过闭环仿真框架,研究评估了范围变化如何影响整个分散协调过程,从初始冲突检测到分布式调度共识。分析重点在于研究范围选择引入的潜在权衡:平衡局部可解性和计算响应性与安全关键环境中全局调度一致性和可行性的需求。与直觉相反,我们的实证结果表明,短时间范围就足够了,而长时间范围会损害局部可解性和计算响应性,且不会带来全局调度最优性的提升。

英文摘要

Recent research is exploring self-organised traffic management as a solution for scaling to complex real-world networks. In such a system, trains predict their neighbourhood, produce traffic plan hypotheses, and agree via consensus with neighbours on a future traffic plan to be implemented. This paper investigates a structural parameter within this pipeline: the predictive neighbourhood horizon. The horizon is used by trains to identify future potential conflicts with neighbours, and to establish the local interaction topology, that is, the subset of trains to negotiate with. As the primary design variable, the horizon directly determines the size and density of the social interaction graph, whereas its impact on the complexity of local sub-problems and the distributed consensus dynamics represents a trade-off to be explored. Through a closed-loop simulation framework, the study evaluates how variations of the horizon impact the overall decentralised coordination process, from initial conflict detection to distributed schedule consensus. The analysis focuses on the potential trade-off introduced by the horizon choice: balancing local tractability and computational responsiveness against the need for global schedule coherence and feasibility in safety-critical environments. Contrary to intuition, our empirical results indicate that short time horizons suffice, while long horizons compromise local tractability and computational responsiveness with no gain in global schedule optimality.

2606.13039 2026-06-12 cs.CY cs.AI cs.HC 新提交

Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation

断层线:在公共部门转型中国家政策与地方实践交汇处的伦理与负责任AI导航

Sitong Lyu, Shabnam Taghiyeva, Mohit Kukadia, Denis Newman-Griffis

AI总结 本文以英国特殊教育需求与残疾(SEND)为案例,通过17次半结构化访谈的主题分析,揭示了国家政策与地方实践在负责任AI实施中的五大挑战,并提出了政策与结构改革建议。

详情
Comments
10 pages plus references. This study was funded by the University of Sheffield
AI中文摘要

英国政府采取了支持AI的立场,以帮助在严重财政压力下转变公共服务交付,但将这一愿景转化为负责任的AI实践的道路仍然不明确。虽然英国政策通常在国家层面制定,但地方当局负责大多数公共服务交付,而公共部门中AI优先叙事的快速推进正在暴露这一国家-地方接口在知识和实践方面的断层线。本文以高风险的特殊教育需求与残疾(SEND)领域为案例,研究英国中央政府与地方当局之间接口处负责任AI的解释和实施方式。我们对17位政策制定者、从业者和第三部门专业人士进行了半结构化访谈,并进行了主题分析,以识别在国家政策与地方实践交汇处负责任AI的障碍和促成条件。我们发现了地方当局面临的五个相互关联的挑战:AI的影子使用和数据隐私风险、AI供应中的市场-政府不对称、劳动力准备不足、缺乏标准化定义和测量,以及人类问责制的缺口。针对每个挑战,参与者提出了可操作的步骤,从加强数据保护框架和重新平衡市场-政府关系到提升劳动力能力。我们对SEND的审查使这些挑战更加突出,展示了影响弱势儿童和家庭的高风险决策如何加剧了关于问责制、公平性和人类监督的紧张关系,暴露了基于原则的监管方法的局限性。我们认为,负责任的公共部门AI需要国家政策调整以及地方层面机构能力、价值观和治理机制的结构性改革。

英文摘要

The UK government has adopted a pro-AI stance to help transform public service delivery in the face of severe financial pressures, but the path to translate this vision into responsible AI practice remains ill-defined. While UK policy is often set at the national level, local authorities are responsible for most public service delivery, and the rapid advance of AI-first narratives in the public sector is exposing fault lines in knowledge and practice at this national-local interface. This paper examines how responsible AI is interpreted and implemented at the interface between the UK's central government and local authorities, taking the high-stakes area of Special Educational Needs and Disabilities (SEND) as a case study. We present a thematic analysis of 17 semi-structured interviews with policymakers, practitioners, and third-sector professionals to identify barriers and enabling conditions for responsible AI where national policy meets local practice. We identify five interconnected challenges facing local authorities: shadow usage of AI and data privacy risks, market-government asymmetry in AI provision, insufficient workforce readiness, a lack of standardised definitions and measurements, and gaps in human accountability. For each, participants proposed actionable steps, from strengthening data protection frameworks and rebalancing the market-government relationship to enhancing workforce capacity. Our examination of SEND brings these challenges into sharper focus, showing how high-stakes decisions affecting vulnerable children and families intensify tensions around accountability, fairness, and human oversight, exposing the limits of a principle-based regulatory approach. We argue that responsible public sector AI requires both national policy adjustments and structural reforms to institutional capacity, values, and governance mechanisms at the local level.

2606.13026 2026-06-12 cs.CY cs.AI 新提交

Democracy in the Era of Artificial Intelligence

人工智能时代的民主

Evangelos Pournaras, Srijoni Majumdar, Carina Hausladen, Dirk Helbing

AI总结 本文探讨如何利用人工智能升级民主制度,增强集体智慧、审议民主和自治系统,同时应对隐私、偏见和虚假信息等风险。

详情
AI中文摘要

将人工智能(AI)与民主相结合是我们时代最深刻的挑战之一。一方面,AI 为克服民主中长期存在的挑战提供了机会,例如在代表权不足的审议和投票过程中参与度低的问题。另一方面,AI 算法带来了新的风险,这些算法侵犯隐私、存在偏见、具有操纵性、传播虚假信息并影响选举结果。超越“AI 对民主是好是坏”这一过于简单的问题,《人工智能时代的民主手册》转而提出:如何利用 AI 升级民主及其所基于的原则?如何与 AI 互动以及以何种条件互动?需要哪些新的价值观和设计原则来建立民主韧性?来自世界各地不同学科的 59 位作者在 34 章中探讨了 AI 如何增强民主的集体智慧(第 1 部分),以及使用大型语言模型和社交媒体的审议民主的未来(第 2 部分)。我们还阐述了 AI 在构建有韧性的自治系统中的作用(第 3 部分),以及 AI 时代民主转型的挑战(第 4 部分)。最后,我们以更广阔的视角(第 5 部分)重新构想民主与 AI 的相互作用。

英文摘要

Interfacing Artificial Intelligence (AI) with democracy is one of the most profound challenges of our times. On the one hand, AI comes with opportunities to overcome long-standing challenges in democracy, such as low participation in deliberative and voting processes with poor representation of people. On the other hand, new risks arise from AI algorithms that are privacy-intrusive, biased, manipulative, spread misinformation and influence election results. Moving beyond the over-simplistic question of whether AI is good or bad for democracy, the Handbook on Democracy in the Era of Artificial Intelligence asks instead: how to upgrade democracies and the principles they are built on, using AI? How to engage with AI and on what terms? Which new values and design principles are required to build democratic resilience? In 34 chapters by 59 authors across the world from different disciplines, we explore how AI can empower collective intelligence for democracy (Part 1) and what is the future of deliberative democracy using large language models and social media (Part 2). We also illustrate the role of AI for building resilient self-governance systems (Part 3) and the challenges of transforming democracy in the age of AI (Part 4). We conclude with broader perspectives (Part 5) that re-imagine the interplay of democracy and AI.

2606.12949 2026-06-12 cs.CR cs.CV 新提交

ViPER: Vision-based Packing-Aware Encoder for Robust Malware Detection

ViPER:基于视觉的打包感知编码器用于鲁棒恶意软件检测

Fatima Qaiser, Bisma Tahir, Muhammad Abid Mughal, Nauman Shamim

AI总结 提出ViPER,一种基于LoRA适配ViT-B/14的双头架构,联合学习恶意软件分类和打包检测,通过打包感知门控机制和频率加权损失处理打包标签偏斜,在20万Windows PE图像上达到0.8521平衡准确率、0.9260 ROC-AUC和0.9279 AUPR。

详情
AI中文摘要

基于可视化的恶意软件检测将原始二进制字节映射为灰度图像,并应用学习的视觉分类器,为传统分析流程提供了一种抗规避且无需反汇编的替代方案。然而,可执行文件打包仍然是一个关键的失效模式:打包后的二进制文件产生高熵图像,掩盖了这些模型所依赖的结构模式。由于打包在良性软件中也很常见(例如用于压缩或复制保护),仅凭打包状态并不能可靠地指示恶意性,且现有方法未在统一的监督框架内解决这一挑战。我们提出了ViPER,一种基于视觉的打包感知编码器,用于鲁棒的恶意软件检测。ViPER构建在LoRA适配的ViT-B/14骨干网络上,采用双头架构,联合学习恶意软件分类和打包检测。打包感知门控机制根据推断的打包状态调节恶意软件预测,从而为打包和未打包输入实现不同的决策边界。为了解决训练期间打包标签偏斜的问题,我们采用了频率加权损失,并在联合类别-打包层上进行分层采样。在20万张Windows PE字节图图像上的评估中,ViPER达到了0.8521的平衡准确率、0.9260的ROC-AUC和0.9279的AUPR,在所有主要指标上均优于代表性的最先进基线,同时打包检测AUC达到0.9949。

英文摘要

Visualization-based malware detection maps raw binary bytes to grayscale images and applies learned visual classifiers, providing an evasion-resistant and disassembly-free alternative to conventional analysis pipelines. However, executable packing remains a critical failure mode: packed binaries produce high-entropy images that obscure the structural patterns these models rely on. Because packing is also prevalent in benign software (e.g., for compression or copy protection), packing state alone is not a reliable indicator of maliciousness, and existing approaches do not address this challenge within a unified supervised framework. We present ViPER, a Vision-based Packing-Aware Encoder for Robust malware detection. ViPER builds on a LoRA-adapted ViT-B/14 backbone with a dual-head architecture that jointly learns malware classification and packing detection. A packing-aware gating mechanism conditions malware predictions on the inferred packing state, enabling distinct decision boundaries for packed and unpacked inputs. To address packing label skew during training, we employ frequency-weighted losses with stratified sampling over joint class-packing strata. Evaluated on 200,000 Windows PE byteplot images, ViPER achieves a balanced accuracy of 0.8521, ROC-AUC of 0.9260, and AUPR of 0.9279, outperforming representative state-of-the-art baselines across all primary metrics, while attaining a packing detection AUC of 0.9949.
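
The abstract mentions frequency-weighted losses over joint class-packing strata without giving the formula. One common choice, assumed here and not taken from the paper, is inverse-frequency weighting in which each stratum's weight is `n_samples / (n_strata * count)`.

```python
from collections import Counter


def inverse_frequency_weights(strata: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Per-stratum loss weights that are inversely proportional to frequency."""
    counts = Counter(strata)
    n_samples, n_strata = len(strata), len(counts)
    return {s: n_samples / (n_strata * c) for s, c in counts.items()}


# Hypothetical (class, packing) labels for eight training samples.
samples = (
    [("benign", "unpacked")] * 4
    + [("malware", "packed")] * 3
    + [("benign", "packed")] * 1
)
weights = inverse_frequency_weights(samples)
```

With this scaling the weights average to 1 over the training set, so the overall loss magnitude stays comparable while rare strata such as packed benign files count for more.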

2606.12918 2026-06-12 cs.CR cs.AI 新提交

MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems

MAStrike: 基于Shapley值的多智能体系统合谋红队测试

Chejian Xu, Zhaorun Chen, Jingyang Zhang, Freddy Lecue, Avni Kothari, Sarah Tan, Wenbo Guo, Bo Li

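AI summary: Proposes the MAStrike framework, which uses Shapley value analysis to identify vulnerable agent coalitions in multi-agent systems, generates role-aware adversarial attacks, and iteratively refines them to bypass defenses, substantially outperforming heuristic baselines.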

详情
AI中文摘要

分层多智能体系统(MAS)正迅速部署在金融和软件工程等高危工作流中。在这些系统中,安全本质上是分布在不同角色智能体上的,显著扩大了攻击面,特别是在特权提升和跨智能体合谋等协调对抗行为下。现有的MAS红队测试方法仍然有限:它们依赖启发式选择目标智能体并扰动孤立的消息流,留下了关键问题未解答,即哪些智能体对系统安全最负责,以及受损智能体如何协调以绕过防御。我们提出MAStrike,一个用于分层MAS中合谋红队测试的闭环框架。我们首次提出针对MAS的智能体级Shapley值分析,量化每个智能体在任务特定分布下对系统鲁棒性的边际贡献。在此归因指导下,MAStrike识别脆弱智能体联盟并生成协调的、角色感知的对抗操纵。这些攻击通过结构化因果诊断迭代优化,将失败案例归因于阻止对抗尝试的未受损智能体。我们进一步构建了全面的MAS红队测试基准和可控环境,涵盖不同的分层拓扑和领域,包括金融、软件工程和CRM。在多个前沿模型构建的MAS上进行的广泛实验表明,MAStrike显著优于启发式基线。我们的分析进一步揭示了智能体间非平凡的Shapley值分布和高阶交互结构,揭示了先前单智能体或基于模板的方法忽略的关键漏洞和协调模式。

英文摘要

Hierarchical multi-agent systems (MAS) are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particularly under coordinated adversarial behaviors such as privilege escalation and cross-agent collusion. Existing red-teaming approaches for MAS remain limited: they rely on heuristic selection of target agents and perturb isolated message streams, leaving critical questions unanswered, such as which agents are most responsible for system safety and how compromised agents can coordinate to bypass defenses. We propose MAStrike, a closed-loop framework for collusive red-teaming in hierarchical MAS. We introduce the first agent-level Shapley value analysis for MAS, quantifying each agent's marginal contribution to system robustness under task-specific distributions. Guided by this attribution, MAStrike identifies vulnerable agent coalitions and generates coordinated, role-aware adversarial manipulations. These attacks are iteratively refined through structured causal diagnosis, attributing failure cases to uncompromised agents that block adversarial attempts. We further build a comprehensive MAS red-teaming benchmark and controllable environments spanning diverse hierarchical topologies and domains, including finance, software engineering, and CRM. Extensive experiments across MAS built on multiple frontier models show that MAStrike substantially outperforms heuristic baselines. Our analysis further uncovers non-trivial Shapley value distributions and higher-order interaction structures among agents, revealing critical vulnerabilities and coordination patterns that are overlooked by prior single-agent or template-based methods.
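To make the idea of an agent-level Shapley value concrete, here is a minimal sketch of the textbook exact computation, averaging each agent's marginal contribution over all orderings. The agent names and the `toy_robustness` value function are hypothetical; the abstract does not say how MAStrike defines or estimates the value function, and exact enumeration is only feasible for a handful of agents.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_robustness(coalition):
    """Hypothetical robustness score when only `coalition` is uncompromised."""
    score = 0.0
    if "reviewer" in coalition:
        score += 0.5  # the reviewer blocks some attacks alone
    if {"reviewer", "executor"} <= coalition:
        score += 0.3  # interaction term: both agents are needed
    if "planner" in coalition:
        score += 0.2  # independent, additive contribution
    return score

agents = ["planner", "reviewer", "executor"]
phi = shapley_values(agents, toy_robustness)
```

The interaction term is split evenly between the two agents that jointly produce it, and the values sum to the robustness of the full system, which is the efficiency property that makes Shapley values usable as an attribution.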

2606.12904 2026-06-12 cs.IR cs.CL cs.HC cs.SI 新提交

Trait, Not State: The Durability of Reading Identity in Social Highlighting

特质而非状态:社交高亮中阅读身份的持久性

Kazuki Nakayashiki, Keisuke Watanabe

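AI summary: By freezing each reader's first six months of highlighting as a personal profile and tracking their later selections, the study finds that the reading selection signature stays stable for 24+ months, indicating a trait rather than a state.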

详情
Comments
12 pages, 3 figures, 3 tables
AI中文摘要

先前关于社交网络高亮工具的研究将个体性定位于选择——即一个人选择高亮哪些文档——但仅从横截面角度进行测量。我们提出时间性问题:读者的选择特征是特质还是状态?我们将每位读者前六个月的高亮行为冻结为个人档案,并追踪其在后续选择中(间隔逐渐增大至24个月以上)的自身优势,负样本来自同一日历时期——因此供给漂移不能伪装成个人漂移——在粗粒度全局层面和细粒度层面(其负样本和对照来自读者自身的兴趣领域)进行测量;锚定单元重现了先前的横截面水平(+0.188 vs +0.169),验证了该框架。四个结果:在同一用户内,细粒度优势在任何时间跨度上均未显示统计上可检测的配对下降(6-12个月保留率 R = 1.00 [0.85, 1.18],n = 212;最远的区间与适度下降兼容;唯一区间排除零的对比是12-24个月的粗粒度层,约下降13%)。该信号不可简化为重复域名(排除所有档案来源后约90%信号保留)。个体内漂移缓慢(最近半年的档案比旧半年档案高出+0.042)。前瞻性地,个人档案——即使仅由读者最早期的文档构建(评估前中位数20个月)——其下一阅读的AP值约为所有测试过的简单非个人先验的3倍。我们将“特质”操作性地定义为在持续参与下的稳定特征;研究范围限于一个平台上的重度、长期读者,且曝光与选择不可分离。

英文摘要

Prior work on a social web highlighter located individuality in selection -- which documents a person chooses to highlight -- but measured it cross-sectionally. We ask the temporal question: is a reader's selection signature a trait or a state? We freeze each reader's first six months of highlighting as a profile and track its own-vs-other advantage on their later selections at growing gaps (to 24+ months), with negatives drawn from the same calendar era -- so supply drift cannot masquerade as personal drift -- at a coarse global level and at a fine level whose negatives and controls come from the reader's own interest neighborhood; the anchor cell reproduces the prior cross-sectional level (+0.188 vs +0.169), validating the harness. Four results. Within the same users, the fine-layer advantage shows no statistically detectable paired decline at any horizon (6-12 month retention R = 1.00 [0.85, 1.18], n = 212; the farthest bin is compatible with a modest decline; the only contrast whose interval excludes zero is the coarse layer at 12-24 months, about 13%). The signal is not reducible to repeated domains (~90% survives excluding all profile sources). Within-person drift is slow (a recent-half profile beats the old half by +0.042). Prospectively, personal profiles -- even one built from a reader's earliest documents, median 20 months before evaluation -- rank their next reads at roughly 3x the AP of every simple non-personal prior tested. We use "trait" operationally (a stable signature under continued engagement); the scope is heavy, long-tenured readers of one platform, and exposure is not separable from choice.
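The prospective result above is stated as a ratio of average precision (AP) between a personal profile and non-personal priors. As a minimal sketch of how such a comparison could be computed, the function below implements the standard AP over a ranked list. The relevance lists are made-up toy data, not the paper's, and the paper's exact evaluation protocol may differ.

```python
def average_precision(ranked_relevance):
    """AP of a ranked list: mean of precision@k taken at each relevant item."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical rankings of candidate documents (1 = actually read next).
personal_ranking = [1, 0, 1, 0]   # the personal profile ranks true reads early
popularity_ranking = [0, 0, 1]    # a non-personal prior ranks them late

ap_personal = average_precision(personal_ranking)
ap_prior = average_precision(popularity_ranking)
ratio = ap_personal / ap_prior
```

A ratio above 1 means the personal profile places a reader's actual next reads higher in the ranking than the non-personal prior does.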