arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.07857 2026-06-09 cs.CR cs.AI New submission

Model Multiplicity for Adversarial Detection in Small Language Model Training on Edge Devices

Stefan Behfar, Richard Mortier

Affiliations: Computer Lab, University of Cambridge

AI summary: To address the vulnerability of distributed language-model fine-tuning on edge devices to poisoning attacks, the paper proposes a system-level defense based on model multiplicity. It rotates or concurrently trains multiple small language models and quantifies their divergence to detect anomalies. Experiments show earlier and more reliable poisoning detection than classical single-model defenses.

Abstract

The rise of edge-based machine learning has enabled distributed adaptation of language models across mobile and IoT devices, offering privacy preservation and real-time responsiveness. However, distributed fine-tuning of language models on untrusted or heterogeneous edge nodes introduces new vulnerabilities. Compromised or unreliable devices can inject poisoned updates, leading to stealthy model manipulation or convergence degradation. Classical defenses such as robust aggregation or temporal anomaly detection operate on a single global model and are therefore limited in detecting coordinated or persistent poisoning. This work proposes a new system-level defense based on model multiplicity. Instead of maintaining one global model, the system rotates or concurrently trains multiple small language models (e.g., DistilGPT-2), each updated by independently sampled subsets of edge nodes. These models evolve under distinct training trajectories, creating multiple independent views of the same distributed population. Divergence between models quantified through gradient similarity, loss evolution, or parameter variance serves as a signal of anomalous or adversarial behavior. When one model deviates significantly from the ensemble mean, the system flags its contributing nodes for isolation or re-weighting. We implement this framework and evaluate it on edge-scale simulations of Small Language Model (SLM) training under varying heterogeneity and attack conditions. Results show that model multiplicity enables earlier and more reliable detection of poisoning compared to classical single-model defenses such as Flanders and Robust methods. Our findings demonstrate that diversity in model evolution can serve as a practical and effective defense mechanism for secure distributed learning on resource-constrained edge devices.
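
The divergence check described in the abstract can be illustrated with a minimal sketch. It assumes each model reports a single validation loss per round and compares every model against the mean and standard deviation of the other models (a leave-one-out variant chosen here so that one outlier cannot mask itself). The function names, the threshold, and the node identifiers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def flag_divergent_models(losses, z_threshold=3.0):
    """Return indices of models whose loss deviates from the rest of the ensemble.

    losses: list with one validation loss per model (hypothetical input).
    Each model is scored against the mean/std of the *other* models.
    """
    flagged = []
    for i, loss in enumerate(losses):
        others = losses[:i] + losses[i + 1:]
        mu = mean(others)
        sigma = pstdev(others)
        if sigma > 0 and abs(loss - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


def nodes_to_quarantine(flagged, contributors):
    """Collect the edge nodes that updated any flagged model."""
    suspects = set()
    for i in flagged:
        suspects.update(contributors[i])
    return sorted(suspects)


# Five models; the last one was updated by a (hypothetical) poisoned subset.
losses = [2.10, 2.05, 2.12, 2.08, 3.40]
contributors = {0: ["n1", "n2"], 1: ["n3"], 2: ["n4"], 3: ["n5"], 4: ["n7", "n2"]}
flagged = flag_divergent_models(losses)
print(flagged, nodes_to_quarantine(flagged, contributors))
```

A real system would presumably combine several signals (gradient similarity, loss evolution, parameter variance) and re-weight rather than immediately quarantine; the single-loss score here only shows the shape of the decision.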

2606.07846 2026-06-09 cs.DC cs.AI cs.MA New submission

Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method

Faisal Fareed

Affiliations: AWS

AI summary: Proposes a five-dimension speculative execution method that uses Bayesian probability estimation and dollar-based cost pricing to balance latency against cost in LLM-agent workflows, restricting speculation to edges that can be rolled back without side effects.

Abstract

LLM-agent workflows chain model calls and tool invocations, and spend most of their wall-clock time waiting on upstream operations before downstream ones can start. Speculative execution can reclaim that idle time by launching a downstream operation with a predicted upstream input, but here each speculation costs real money (per-token billing) and its success probability is hard to estimate and drifts over time. This paper presents a method organized around five design decisions: (D1) start a downstream operation before its upstream completes; (D2) price each speculation in real dollars at separate input and output rates; (D3) expose a single operator dial for latency versus cost; (D4) decide via an expected-value rule with a failure-weighted cost term and a preference-adjusted threshold; and (D5) estimate the success probability with a Bayesian Beta-Binomial posterior whose prior is keyed to a dependency-type taxonomy. Variants of these ideas appear in recent work; the combination, with every decision logged in dollars, is what is new. The rule fires only on edges passing an admissibility precondition (side-effect-free, idempotent, or stageable behind a commit barrier), since a wrong speculation is rolled back by re-execution, which refunds tokens but cannot un-send an irreversible side effect. We specify the runtime mechanics, a closed-form result that the rule self-limits as the upstream branching factor grows, a five-stage calibration pipeline (offline replay, shadow, canary, online calibration, drift-triggered kill-switch), and a workload-fit rubric over eight production archetypes. Contrast tables against the four closest published systems (DSP, Speculative Actions v2, Sherlock, B-PASTE) show differentiators on every dimension, and a synthetic validation suite confirms the predicted decision boundary, probability threshold, posterior recovery, and streaming-cancellation behavior.
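
Decisions D2, D4, and D5 can be sketched together as follows. This is only an illustration under stated assumptions: the abstract does not give the exact form of the expected-value rule, so the gain-minus-waste inequality, the `usd_per_second` dial, and all names below are hypothetical. Only the Beta posterior update and the separate input/output token rates follow directly from the text.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta posterior over the success probability of one dependency type."""
    alpha: float  # prior pseudo-count of successful speculations
    beta: float   # prior pseudo-count of failed speculations

    def update(self, success: bool) -> None:
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def speculation_cost_usd(in_tokens, out_tokens, in_rate, out_rate):
    """Dollar cost of one speculation at separate input and output rates."""
    return in_tokens * in_rate + out_tokens * out_rate


def should_speculate(p_success, seconds_saved, usd_per_second, cost_usd,
                     threshold_usd=0.0):
    """Assumed rule: expected latency value minus failure-weighted cost."""
    expected_gain = p_success * seconds_saved * usd_per_second
    expected_waste = (1.0 - p_success) * cost_usd
    return expected_gain - expected_waste > threshold_usd


posterior = BetaPosterior(alpha=2.0, beta=2.0)
for outcome in (True, True, True, False):
    posterior.update(outcome)
cost = speculation_cost_usd(1000, 500, in_rate=3e-6, out_rate=15e-6)
print(posterior.mean, cost, should_speculate(posterior.mean, 2.0, 0.01, cost))
```

In this sketch `usd_per_second` plays the role of the single operator dial: raising it values latency more and makes speculation fire more often.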

2606.07845 2026-06-09 cs.MA cs.LG New submission

GRPO Does Not Close the Multi-Agent Coordination Gap

Najmul Hasan, Prashanth BusiReddyGari

Affiliations: Department of Mathematics and Computer Science, University of North Carolina at Pembroke

AI summary: Tests the multi-agent coordination ability of large language models on the dining philosophers problem and finds that GRPO training does not significantly improve performance; the bottleneck is training methodology rather than compute.

Comments: 15 pages, 15 figures

Abstract

We measure how well current large language models coordinate as multiple agents sharing a common resource, using the dining philosophers problem as a clean test bed. Across 630 episodes spanning seven models and three philosopher counts, four frontier closed-source systems reach mean reward 0.45 to 0.87 and Mistral-Small 24B reaches 0.83 to 0.99, while Qwen3-14B reaches 0.13 to 0.35. We then ask whether group relative policy optimization (GRPO) on rollouts from the task itself can close the gap and find that it cannot: a Welch's t-test on per-episode reward at five philosophers gives p = 0.66 and a Hedges' g of -0.11, with no statistically significant change at ten or fifteen philosophers either. Two further observations qualify the result. The training reward of both 8B and 14B runs peaked at step nine and then declined, so the default saved checkpoint at step 15 is strictly worse than several earlier ones. The four-term reward we use admits a degenerate maximum at zero actions, which DeepSeek-R1-Distill-Qwen-7B and Mistral-Small 24B at five philosophers both inhabit, with mean reward 1.0 and 0.83 respectively at zero meals. The bottleneck for an open-weight 14B model on multi-agent coordination is not training compute but training methodology: reward shaping that does not collapse to a no-action maximum, checkpoint discipline that does not depend on the final step, and curriculum across problem scales.
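
The two statistics quoted above, Welch's t and Hedges' g, can be computed from per-episode rewards as in the sketch below. The reward lists are made-up placeholders rather than data from the paper, and the p-value is omitted because it needs a Student-t CDF that the standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def hedges_g(a, b):
    """Cohen's d with the small-sample correction factor (Hedges' g)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    d = (mean(a) - mean(b)) / pooled
    correction = 1 - 3 / (4 * (na + nb) - 9)
    return d * correction


# Placeholder per-episode rewards: after training vs. before training.
after = [1.0, 2.0, 3.0, 4.0, 5.0]
before = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(after, before), hedges_g(after, before))
```

A negative g with a large p-value, as reported in the abstract, means the trained model's mean reward is slightly lower but the difference is indistinguishable from noise.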

2606.07843 2026-06-09 cs.DB cs.IR cs.LG New submission

RACT: Retrieval Augmented Column-Table Learning and Prediction for Multi-Table Schema Matching

Leonard Traeger, Enas Khwaileh, Andreas Behrend, George Karabatis

Affiliations: University of Maryland, Baltimore County; Utrecht University; Technical University of Cologne

AI summary: Proposes RACT, a self-supervised framework that retrieves candidate tables to constrain the column matching space. It outperforms similarity-based baselines in multi-table schema matching, improving precision and completeness by up to 70%.

Comments: Research Preprint, 12 pages

Abstract

Schema matching, a critical task for integrating data from diverse sources, seeks to identify correspondences between columns across different schemas. In multi-table holistic schema matching, columns with similar semantic meaning may reside in tables with different contexts due to heterogeneous schema designs, and in this setting similarity-based techniques are inadequate. The focus of this paper is incorporating referential context into schema matching by introducing RACT learning and prediction, a self-supervised framework enabling the probabilistic retrieval of candidate tables for source columns to constrain relevant column candidates. Experiments demonstrate that this approach outperforms similarity-based baselines on matching multi-table schemas. In subsequent matching experiments, constraining the column search space via top-t tables improves both average matching precision and completeness by up to 70%.

2606.07841 2026-06-09 stat.CO cs.LG stat.ML New submission

Large-scale empirical tuning and comparison of default optimizers for variational inference

Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian

Affiliations: Department of Statistics, UBC; Department of Mathematics & Statistics, Boston University; Faculty of Computing & Statistics, Boston University; Department of Computer and Information Science, UPenn

AI summary: Evaluates adaptive optimizers for variational inference in a large-scale experiment (56 optimizers, 1092 problems, over 550,000 runs), finding that no single method is best but that a selection of 5 algorithms comes close to the best observed performance.

Abstract

Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.

2606.07837 2026-06-09 cs.HC cs.AI New submission

Does Persona Make LLMs K-pop Fans? A Pilot Study of LLM-Based Online Concert Audience Agents

Kirak Kim, Hyojin Kim, Yejin Son, Sungyoung Kim, Kyung Myun Lee

Affiliations: Graduate School of Culture Technology, KAIST, Daejeon, South Korea; Department of Artificial Intelligence, Yonsei University, Seoul, South Korea

AI summary: Simulates real-time fan chat at a K-pop concert with a multi-agent system and finds that persona conditioning improves chat quality and naturalness but does not strengthen social connectedness or affective response, suggesting that meaningful collective experience requires deeper alignment.

Comments: Accepted at the ICML 2026 Workshop on Culture x AI: Evaluating AI as a Cultural Technology

Abstract

A concert is a collective experience, but recorded performance videos are typically watched alone, stripping away the shared audience presence that makes concerts feel eventful. We investigate whether persona-based LLM audience agents can recreate aspects of this collective experience by generating real-time fan chat alongside a K-pop performance video. We present a multi-agent system in which ten LLM agents react through live-chat messages, comparing a persona-conditioned audience (each agent assigned a distinct fan identity, bias, and chat style) with a no-persona baseline. In a within-subjects pilot with K-pop fans (N=11), persona conditioning substantially improved model-level chat quality and perceived naturalness, but did not translate into differences in social connectedness, engagement, or affective response. Interviews suggest that online K-pop concert chat may operate as collective monologue rather than interpersonal dialogue, and that meaningful participation depends on shared identification with the specific artist and fandom. Persona conditioning can make LLM audiences appear more natural, but culturally meaningful collective experience may require deeper alignment between persona, crowd behavior, fandom identity, and user expectations.

2606.07836 2026-06-09 cond-mat.mtrl-sci cond-mat.stat-mech cs.AI physics.comp-ph quant-ph New submission

Agentic multi-fidelity learning of quasiparticle and excitonic properties

Arnab Neogi, Aaron Forde, Christopher A. Lane, Sergei Tretiak, Jian-Xin Zhu

Affiliations: Theoretical Division, Los Alamos National Laboratory; Center for Integrated Nanotechnologies, Materials Physics and Applications Division, Los Alamos National Laboratory

AI summary: Proposes an agent-guided multi-fidelity framework that uses confidence weighting and a small number of high-accuracy reference points, combined with machine learning, to correct numerical instabilities in GW-BSE calculations and to predict quasiparticle gaps and exciton binding energies in strained MoS2-WS2 bilayers.

Abstract

Many-body GW-Bethe-Salpeter equation calculations are essential for accurate simulations of electronic structure and optical properties in modern low-dimensional nanomaterials. However, these methods are computationally demanding and can exhibit localized numerical instabilities or convergence failures that are difficult to detect within high-throughput workflows. We introduce an agent-guided multi-fidelity framework for correcting GW-Bethe-Salpeter excited-state landscapes in strained MoS2-WS2 bilayers. Across stacking registries, strain branches and reciprocal-space samplings, the workflow identifies spike-like excursions, near-zero-gap collapse and cross-fidelity inconsistencies associated with fragile long-wavelength dielectric screening. A structural agent evaluates calculations by assigning confidence weights and selectively using a small number of high-accuracy reference points. Machine learning models then transfer information across related systems and apply Gaussian process corrections to recover improved quasiparticle gaps and exciton binding energies, with calibrated uncertainty estimates. The approach corrects numerically induced artifacts without erasing physical strain dependence and substantially improves agreement with higher-fidelity references relative to a no-agent baseline. These results show that reliable surrogate learning for excited-state materials requires explicit diagnosis of numerical fragility, not direct interpolation of raw first-principles data points. The proposed framework is readily transferable to other optoelectronic nanomaterials characterized by strong quantum confinement, such as quantum dots, nanoribbons, layered two-dimensional semiconductors, and hybrid perovskite nanostructures.

2606.07833 2026-06-09 cs.CR cs.AI New submission

Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

Zvi Topol

Affiliations: MuyVentive LLC

AI summary: Proposes applying process mining to red-teaming traces. Directly-Follows Graphs and state transition matrices extracted from event logs reveal structural differences in how GPT-OSS and Llama 3.3 defend, exposing defense patterns that the conventional attack success rate metric cannot capture.

Abstract

Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), not taking into account the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10 prompt mutation strategies over up to 110 attempts per prompt. From the resulting 8,575 scored events we extract Directly-Follows Graphs (DFGs) and state transition matrices that reveal structurally distinct defense profiles invisible to ASR alone: GPT-OSS exhibits a near-absorbing refusal state, while Llama presents multiple porous escape routes from refusal to getting successfully jailbroken. We further show that mutator effectiveness is asymmetric across models and that time-to-jailbreak distributions differ by an order of magnitude.
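
A Directly-Follows Graph and the row-normalized transition matrix derived from it can be built from event traces in a few lines. The sketch below assumes each red-teaming campaign against one prompt is a list of per-attempt outcome labels; the state names (`refuse`, `partial`, `jailbreak`) are hypothetical and are not the paper's scoring scheme.

```python
from collections import Counter, defaultdict


def directly_follows(traces):
    """Count how often state b immediately follows state a across all traces."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def transition_matrix(dfg):
    """Normalize DFG edge counts by source state to get transition probabilities."""
    totals = defaultdict(int)
    for (src, _), n in dfg.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in dfg.items()}


traces = [
    ["refuse", "refuse", "partial", "jailbreak"],
    ["refuse", "refuse", "refuse"],
]
dfg = directly_follows(traces)
probs = transition_matrix(dfg)
print(dfg)
print(probs)
```

In this representation a near-absorbing refusal state shows up as a self-transition probability close to 1 on `refuse`, while porous escape routes show up as several outgoing edges with non-trivial mass.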

2606.07828 2026-06-09 cs.SE cs.AI New submission

Jas: AI-Paired Engineering as a Revival of N-Version Programming

Jason Hickey

Affiliations: Independent

AI summary: Through a case study of a single developer porting a vector illustration application across platforms, the paper presents an AI-paired engineering methodology that combines a precise YAML specification with parallel implementations serving as a differential-testing layer. This makes feasible work that conventionally requires multiple developer-years, and the paper frames it as a revival of N-version programming.

Abstract

I report a case study in AI-paired software engineering: five working ports of a vector illustration application across Rust, Swift, OCaml, Python, and browser-based platforms, built by a single developer in approximately 120 evening hours. The methodology pairs AI-assisted implementation with two safeguards -- a precise executable YAML specification serving as the single source of truth, and parallel implementations functioning as a built-in differential-testing layer. The five ports share a 23,000-line specification; per-port native code ranges from 0 to roughly 95,000 lines, reflecting the specification's escape hatch. I argue that AI-paired engineering, conditional on these two safeguards, makes feasible a scope of work that conventionally requires multiple developer-years, and frame the methodology as a revival of N-version programming, a 1980s approach abandoned on cost grounds that AI changes. The paper reports concrete artifacts and honest limitations of the single-developer case study.
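
The idea of parallel implementations acting as a differential-testing layer can be shown with a toy harness. This is a generic sketch, not the paper's tooling: the two `clamp` functions stand in for independent ports of the same specified behavior, and one of them is deliberately buggy so the harness has something to report.

```python
def differential_test(implementations, inputs):
    """Run every implementation on every input and report disagreements."""
    disagreements = []
    for x in inputs:
        results = {name: fn(x) for name, fn in implementations.items()}
        if len(set(results.values())) > 1:
            disagreements.append((x, results))
    return disagreements


def clamp_port_a(x):
    """Reference behavior: clamp a channel value to the range 0..255."""
    return max(0, min(255, x))


def clamp_port_b(x):
    """Deliberately buggy port: forgets the lower bound."""
    return min(255, x)


ports = {"port_a": clamp_port_a, "port_b": clamp_port_b}
print(differential_test(ports, [-5, 10, 300]))
```

No port is treated as the oracle here; any disagreement is flagged and resolved against the shared specification, which is the N-version programming idea the abstract refers to.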

2606.07802 2026-06-09 cs.CY cs.AI New submission

Memetic Capture: A Pluralistic Policy Framework for Governing AI-Driven Cultural Disempowerment

Subramanyam Sahoo

Affiliations: University of California, Berkeley

AI summary: Introduces the concept of "memetic capture", in which AI erodes human autonomy through cultural influence, and builds the four-tier Cultural Pluralistic Governance Framework (CPGF), arguing that pluralism is a structural necessity.

Comments: Paper accepted in Pluralistic Alignment Workshop at ICML 2026

Abstract

Culture is the most insidious vector of gradual human disempowerment by AI: unlike economic or political displacement, cultural displacement attacks the very preferences and values through which humans recognise and resist disempowerment itself. We argue that existing AI governance frameworks suffer from a critical blind spot by treating cultural impact as secondary to economic and safety concerns. This paper develops memetic capture as a unifying concept for AI-driven cultural disempowerment, and proposes the Cultural Pluralistic Governance Framework (CPGF), a four-tier policy architecture combining quantitative cultural influence metrics, democratic value assemblies, pluralistic deployment standards, and transnational coordination mechanisms. We argue that pluralism is not merely an ethical requirement for such governance but a structural necessity: monocultural AI governance accelerates the very disempowerment it claims to prevent. We identify concrete policy levers, discuss implementation tensions, and outline a research agenda at the intersection of pluralistic alignment and cultural AI governance.

2606.07792 2026-06-09 cs.CR cs.LG cs.SE New submission

MOLOT System Card: Malicious Operational Logic Observation Transformer

Daniil Lopatkin, Maksim Mitrofanov, Stanislav Rakovsky, Aleksandr Khalikov

Affiliations: False Positive Community

AI summary: Presents MOLOT, a system that detects malicious code using behavior sequences derived from static call graphs, with an explanation stage that localizes suspicious behavior. It is evaluated on PyPI and npm packages, and the Open Malicious-Code Bench benchmark is released.

Comments: 13 pages, 3 figures

Abstract

MOLOT (Malicious Operational Logic Observation Transformer) is a static malicious-code detection system designed for SAST settings where package metadata, maintainer history, and dynamic execution traces may be unavailable or unreliable. The system represents source code as behavior sequences derived from static call graphs and includes an explanation stage that ranks suspicious behavior activities and maps them back to source-code locations. The approach is evaluated on Python and JavaScript packages from PyPI and npm, compared with open-source detection tools, and validated under product constraints including runtime, memory use, and false-positive rates observed in a real moderation workflow. We also release Open Malicious-Code Bench, a public benchmark for reproducible evaluation of malicious-package detection methods. The results show that static behavior-sequence modeling can provide accurate, explainable, and deployable malicious-code detection for modern DevSecOps workflows.

2606.07791 2026-06-09 cs.GR cs.CV cs.IR New submission

Frequency-Scale Saliency for Spectral Descriptor Analysis in 3D Shape Retrieval

Jianru Shen

Affiliations: University of Montana

AI summary: Proposes a frequency-scale saliency framework that quantifies, through ablation, the retrieval contribution of descriptor scale intervals. Short scales dominate performance while long scales are harmful, and saliency-weighted retrieval improves mAP on hard categories by 0.156.

Comments: Accepted at Computer Graphics International (CGI) 2026

Abstract

Classical spectral descriptors such as the Heat Kernel Signature and Wave Kernel Signature are widely used for non-rigid 3D shape retrieval, yet their failure modes remain poorly understood. We present a frequency-scale saliency framework that audits these descriptors by quantifying the retrieval-level contribution of each descriptor scale interval through ablation. We introduce class spectral fingerprints to characterize category-level scale dependence, and show that descriptor similarity between class pairs is substantially correlated with retrieval failure, with a Spearman correlation of 0.479. Experiments on SHREC'11 demonstrate that short scales dominate retrieval performance while long scales are harmful, that HKS and WKS exhibit distinct scale dependence patterns, and that saliency-weighted retrieval improves mAP on hard categories by 0.156, with cross-fold and random-weight controls confirming that the gain is stable and not due to arbitrary reweighting.

2606.07782 2026-06-09 math.OC cs.LG math.MG New submission

Non-Archimedean Polydisc Spaces and Applications to Optimisation

Paul Lezeau, Yiannis Fam, Anthea Monod, Yue Ren

Affiliations: London School of Geometry and Number Theory; Department of Mathematics, Imperial College London; Department of Mathematics, Durham University

AI summary: Inspired by Berkovich geometry, the paper introduces non-Archimedean polydisc spaces, which retain a rigid hierarchical structure while gaining desirable geometric properties. It shows that metric trees embed into them, proposes a function class of linear combinations of absolute values of polynomials, develops an optimisation theory, and provides algorithms with an open-source implementation.

Comments: 54 pages, 23 figures. Comments welcome

Abstract

We propose a new framework for optimisation over non-Archimedean spaces inspired by Berkovich geometry. Specifically, we introduce polydisc spaces, which consist of products of closed balls over a non-Archimedean field. These spaces retain the rigid hierarchical structure of the non-Archimedean field whilst acquiring many desirable geometric features absent from it. We show that metric trees embed naturally into these spaces, demonstrating their capacity to represent hierarchical data. We study their metric geometry, establishing properties such as geodesic uniqueness, confirming their compatibility with classical optimisation techniques. We further propose a class of real-valued functions given by linear combinations of absolute values of polynomials. These functions admit a piecewise polynomial description along geodesics and satisfy a universal approximation property. We formulate a theory of optimisation on polydisc spaces: we prove existence of minimisers and explore algorithms for finding them. We provide an accompanying open-source Julia library implementing the core objects and optimisation procedures introduced.

2606.07771 2026-06-09 astro-ph.IM astro-ph.GA cs.AI 新提交

Beyond Point Estimates: Benchmarking Uncertainty Quantification Methods on the AION-1 Astronomical Foundation Model

超越点估计:在AION-1天文基础模型上基准测试不确定性量化方法

Karla Tame-Narvaez, Aleksandra Ćiprijanović, Shubhendu Trivedi

发表机构 * Scientific Computing Division Fermi National Accelerator Laboratory(费米国家加速器实验室科学计算部) Fermi National Accelerator Laboratory(费米国家加速器实验室) Department of Astronomy and Astrophysics University of Chicago(芝加哥大学天文学与天体物理学系) NSF and Simons SkAI Institute(国家科学基金会与Simons SkAI研究所) Google DeepMind(谷歌DeepMind)

AI总结 本文在AION-1基础模型嵌入上比较七种不确定性量化方法,发现共形预测(尤其是LVD框架)在星系属性回归中提供可靠的边际和局部覆盖,优于非共形基线。

Comments 7 pages, 1 table, 1 figure

详情
Journal ref
Contribution to Conference on Physics and AI at Stanford University (PAI 2026)
AI中文摘要

天文巡天的基础模型提供了强大的学习表示,可迁移到星系属性估计等下游回归任务。然而,仅有点预测不足以进行科学推理;可靠的不确定性量化(UQ)至关重要。我们使用冻结的AION-1基础模型嵌入,在星系属性回归上比较了七种UQ方法,从Legacy Survey测光/成像和DESI光谱预测红移、恒星质量、星族年龄、气相金属丰度和比恒星形成率,标签来自PROVABGS。无分布共形方法在所有属性上实现了约1个百分点内的名义90%边际覆盖,而非共形基线(深度集成、MC Dropout)无法可靠校准。在共形方法中,共形分位数回归(CQR)在模型预测最差的区间内提供了最佳覆盖。更重要的是,只有局部有效且可判别(LVD)框架——特别是在AION-1嵌入上运行时——还提供了有限样本的局部有效性,生成的区间适应每个星系的局部预测难度,而不是仅依赖边际保证。这些结果确立了共形预测,特别是LVD,作为天体物理学中基础模型嵌入上不确定性感知推理的首选UQ框架。

英文摘要

Foundation models for astronomical surveys offer powerful learned representations that can be transferred to downstream regression tasks such as galaxy property estimation. However, point predictions alone are insufficient for scientific inference; reliable uncertainty quantification (UQ) is essential. We compare seven UQ methods on galaxy property regression using frozen AION-1 foundation-model embeddings, predicting redshift, stellar mass, stellar-population age, gas-phase metallicity, and specific star-formation rate, from Legacy Survey photometry/imaging and DESI spectra, with PROVABGS-derived labels. Distribution-free conformal methods achieve marginal coverage within about 1 pp of the nominal 90% across all properties, while non-conformal baselines (Deep Ensembles, MC Dropout) fail to calibrate reliably. Among conformal approaches, Conformalized Quantile Regression (CQR) delivers the best coverage in the bin with the poorest model predictions. More importantly, only the Locally Valid and Discriminative (LVD) framework -- particularly when operating on AION-1 embeddings -- also provides finite-sample local validity, producing intervals that adapt to each galaxy's local prediction difficulty rather than relying on marginal guarantees alone. These results establish conformal prediction, and LVD in particular, as the preferred UQ framework for uncertainty-aware inference on foundation-model embeddings in astrophysics.
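
The marginal coverage guarantee mentioned above comes from the basic split conformal recipe, sketched below in plain Python. This is the simplest member of the family, not CQR or LVD, and it assumes a held-out calibration set of absolute residuals from some regressor; the function names and the example numbers are illustrative.

```python
from math import ceil


def conformal_quantile(residuals, alpha=0.1):
    """Finite-sample-corrected quantile of absolute calibration residuals.

    Uses rank ceil((n + 1) * (1 - alpha)); if that rank exceeds n, the only
    interval with a valid guarantee is unbounded.
    """
    n = len(residuals)
    k = ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(residuals)[k - 1]


def predict_interval(point_prediction, q):
    """Symmetric interval with the same half-width q for every test point."""
    return point_prediction - q, point_prediction + q


# 19 placeholder calibration residuals |y - y_hat|, e.g. in redshift units x 1000.
calibration_residuals = list(range(1, 20))
q = conformal_quantile(calibration_residuals, alpha=0.1)
print(q, predict_interval(100, q))
```

Because the half-width `q` is constant, this interval cannot adapt to how hard an individual galaxy is to predict, which is the limitation that locally adaptive methods such as CQR and LVD aim to address.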

2606.07727 2026-06-09 quant-ph cs.CL math.OC q-fin.PM New submission

Benchmarking Quantum Algorithmic Resilience for CVaR Portfolio Optimization: The Expressibility-Coherence Trade-off

Prashik N. Somkuwar, K. Srinivasan, G. Raghavan

AI summary: For a hybrid mean-variance and Conditional Value at Risk portfolio objective, compares a hardware-efficient variational quantum neural network with the warm-start quantum approximate optimization algorithm, revealing a key trade-off between algorithmic expressibility and hardware coherence on NISQ devices.

Comments: 10 pages, 11 figures. Master's thesis research conducted at the School of Quantum Technology, Defence Institute of Advanced Technology (DIAT), Pune

Abstract

Quantum combinatorial optimization offers theoretical advantages for complex financial modeling, but physical implementation on Noisy Intermediate Scale Quantum (NISQ) devices is severely constrained by hardware topology. This study presents a hardware benchmarking analysis between a Hardware Efficient Variational Quantum Neural Network (HE-VQNN) and the Warm Start Quantum Approximate Optimization Algorithm (WS-QAOA) for a hybrid Mean Variance and Conditional Value at Risk (CVaR) portfolio objective. By implementing a novel classical quantum hybrid proxy matrix to bypass the CVaR auxiliary qubit bottleneck, we map up to 16 assets from the NIFTY 50 index onto an IBM heavy hex processor. We systematically quantify algorithmic resilience to the "SWAP tax" incurred during routing. Empirical results reveal a critical operational trade-off: WS-QAOA provides exact theoretical mapping but suffers catastrophic hardware decoherence due to exponential nonlocal gate overhead. Conversely, HE-VQNN preserves hardware coherence but lacks the mathematical expressibility to capture dense tail risk asset correlations. This study exposes the limitations of dense financial optimization on current architectures, which force a nonviable choice between algorithmic inexpressibility and hardware decoherence. This points to a deeper limitation on what can and cannot be done with NISQ computers that lack all-to-all connectivity.
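
For readers unfamiliar with the risk measure in the objective, the classical empirical CVaR is simply the mean of the worst tail of losses. The sketch below is a plain-Python illustration with placeholder losses; it is not the paper's proxy-matrix construction and does not involve any quantum circuit.

```python
def cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses.

    At least one sample is always kept in the tail.
    """
    ordered = sorted(losses, reverse=True)
    k = max(1, round(len(ordered) * (1 - alpha)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Placeholder portfolio losses for 20 scenarios (larger is worse).
scenario_losses = list(range(1, 21))
print(cvar(scenario_losses, alpha=0.9))   # mean of the two worst scenarios
print(cvar(scenario_losses, alpha=0.95))  # the single worst scenario
```

Because CVaR depends on a sorted tail rather than on pairwise covariances, it does not fit directly into a quadratic cost Hamiltonian, which is one reason a variational treatment needs auxiliary machinery.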

2606.07725 2026-06-09 physics.geo-ph cs.LG 新提交

GNSS-FM: A Self-Supervised Foundation Model for Daily GNSS Displacement Time Series

GNSS-FM:用于日常GNSS位移时间序列的自监督基础模型

Nick Teutschmann, Laura Crocetti, Fanny Lehmann, Leonardo Trentini, Benedikt Soja

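Affiliations Institute of Geodesy and Photogrammetry, ETH Zurich; ETH AI Center

AI summary Proposes GNSS-FM, a self-supervised foundation model pretrained with a dual-stream input and masked latent prediction; it outperforms strong baselines on displacement forecasting and seismic step localization.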

详情
AI中文摘要

来自全球导航卫星系统(GNSS)的位移时间序列对于广泛的应用至关重要,包括监测构造地壳变形和研究地震周期的不同阶段。机器学习方法已被证明在GNSS应用中具有前景;然而,大多数方法仍然是完全监督的。这造成了瓶颈,因为标记数据稀缺,尽管大量未标记的GNSS数据可免费获取。我们提出了GNSS-FM,一个用于日常GNSS时间序列的自监督基础模型。该模型使用结合位移和速度类增量的双流输入,并通过掩码潜在预测目标进行预训练,该目标采用从wav2vec 2.0改编的向量量化目标,并针对大地测量数据进行了若干修改。在来自全球超过17,000个GNSS站的数据上预训练后,对学习到的码本的分析表明,这些表示捕获了GNSS位移数据中的主要信号类型,包括地震偏移、构造漂移和季节性模式。该基础模型随后在两个下游任务上进行微调,即90天位移预测和地震阶跃定位,在这两个任务中,它都优于强大的任务特定基线。这些结果表明,自监督预训练是GNSS时间序列分析的一种有前景的方法。

英文摘要

Displacement time series from Global Navigation Satellite Systems (GNSS) are essential for a wide range of applications, including monitoring tectonic crustal deformations and investigating the different stages of the earthquake cycle. Machine learning methods have proven promising for GNSS applications; however, most remain fully supervised. This creates a bottleneck as labeled data are scarce, even though large amounts of unlabeled GNSS data are freely available. We present GNSS-FM, a self-supervised foundation model for daily GNSS time series. The model uses a dual-stream input combining displacement and velocity-like increments, and is pretrained using a masked latent prediction objective with vector-quantized targets adapted from wav2vec 2.0, with several modifications for geodetic data. Pretrained on data from over 17,000 globally distributed GNSS stations, an analysis of the learned codebook suggests that the representations capture the main signal types in GNSS displacement data, including seismic offsets, tectonic drift, and seasonal patterns. The foundation model is later fine-tuned on two downstream tasks, namely 90-day displacement forecasting and seismic step localization, where it outperforms strong task-specific baselines in both cases. These results show that self-supervised pretraining is a promising approach for GNSS time series analysis.
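The dual-stream input mentioned in the abstract pairs each displacement series with velocity-like increments. A minimal sketch follows, assuming the increments are plain first differences between consecutive daily samples; the abstract does not give the exact definition, and the name `dual_stream` is hypothetical.

```python
from typing import List, Tuple

def dual_stream(displacement: List[float]) -> Tuple[List[float], List[float]]:
    """Return (displacement, increments) streams of equal length.

    Assumption: the increment on day t is d[t] - d[t-1], with 0.0 used for
    the first day, which has no predecessor.
    """
    if not displacement:
        return [], []
    increments = [0.0] + [
        curr - prev for prev, curr in zip(displacement, displacement[1:])
    ]
    return list(displacement), increments

# Toy daily series in millimetres with a 5 mm step on the fourth day,
# the kind of offset an earthquake would leave in the record.
series = [1.0, 1.5, 2.0, 7.0, 7.5]
disp, incr = dual_stream(series)
print(incr)
```

In the increment stream a step shows up as a single large spike, while slow drift stays small and roughly constant, which is one plausible reason for feeding both views to the model.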

2606.07717 2026-06-09 eess.IV cs.AI cs.CV 新提交

Multi-planar 2D-U-Net Segmentation of 3D-CT Abdominal Organs augmented by Spatial Occurrence Maps

多平面2D-U-Net分割3D-CT腹部器官,辅以空间出现图

Daria Kern, Negar Chabi, Souraj Adhikary, Andre Mastmeyer

Affiliations Glasgow Caledonian University School of Science & Engineering; Jade University of Applied Sciences Department of Engineering & Medical Technology

AI summary Proposes a lightweight 2D-U-Net framework combining coarse-to-fine segmentation, multi-planar prediction, and fuzzy 3D spatial maps; on 80 CT scans the Dice coefficient improves by up to about 4%.

Comments 11 pages, 9 figures, 1 table, http://www.wscg.eu/

详情
AI中文摘要

本工作提出一个基于2D-U-Net的轻量级框架,用于在大视野3D CT扫描中分割五个腹部器官。该方法结合了粗到细分割、来自多个解剖平面的预测以及额外的模糊3D空间图,这些空间图提供解剖位置线索以提高分割精度。我们结合了由空间出现图增强的多平面2D-U-Net模型。该方法包括两个主要阶段。首先,通过使用2D-U-Net轴向遍历整个扫描并确定5个目标腹部器官的x-y-z最小和最大范围来检测腹部感兴趣区域。其次,我们在前一阶段的边界内使用空间出现图来增强我们的多平面2D-U-Net架构。该方法在来自各种公共来源的80个CT扫描上进行评估。结果显示,与未使用空间出现图训练的相同模型相比,Dice系数最大提升约4%。

英文摘要

This work proposes a lightweight 2D-U-Net-based framework for segmenting five abdominal organs in large field-of-view 3D CT scans. The method combines coarse-to-fine segmentation, predictions from multiple anatomical planes, and additional fuzzy 3D spatial maps that provide anatomical location cues to improve segmentation accuracy. We combine multi-planar 2D-U-Net models augmented by a spatial occurrence map. The approach involves two main stages. First, the abdominal volume of interest is detected by traversing the whole scan axially with a 2D-U-Net and determining the minimum and maximum x, y, and z extents of the five abdominal organs of interest. Second, we use spatial occurrence maps to enhance our multi-planar 2D-U-Net architecture inside the bounds from the first stage. The method is evaluated on 80 CT scans from various public sources. The results show Dice improvements of up to about 4% compared to the same model trained without spatial occurrence maps.
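The Dice score used for evaluation, and one simple way to merge predictions from several planes, can be sketched as below. The fusion rule (averaging per-voxel foreground probabilities from three planes and thresholding) is an assumption made for illustration; the abstract does not say how the planes are combined, and `fuse_planes` is a hypothetical name.

```python
from typing import List, Sequence

def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total

def fuse_planes(
    axial: Sequence[float],
    coronal: Sequence[float],
    sagittal: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Average per-voxel probabilities from three planes, then threshold."""
    return [
        1 if (a + c + s) / 3.0 >= threshold else 0
        for a, c, s in zip(axial, coronal, sagittal)
    ]

# Four toy voxels with per-plane foreground probabilities.
fused = fuse_planes(
    axial=[0.9, 0.8, 0.2, 0.1],
    coronal=[0.7, 0.6, 0.6, 0.0],
    sagittal=[0.8, 0.7, 0.4, 0.2],
)
truth = [1, 1, 1, 0]
print(fused, dice(fused, truth))
```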

2606.07716 2026-06-09 cs.CR cs.AI cs.LG 新提交

SHIELD-IDS: Structurally Heterogeneous Ensemble with Integrated Layered Defense for Intrusion Detection Systems

SHIELD-IDS:用于入侵检测系统的结构异构集成与分层防御

Maryam Zaman, Muhammad Khuram Shahzad

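Affiliations School of Electrical Engineering and Computing (SEECS); National University of Sciences and Technology

AI summary Proposes the IDS-Anta++ framework, which adds XGBoost and LightGBM gradient boosting models to the ensemble and applies a three-layer black-box defense (Isolation Forest anomaly screening, median feature smoothing, and six-way majority voting) to improve robustness to adversarial attacks, with detection accuracy above 99% on clean data across several datasets.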

Comments 10 pages, 5 figures, 7 tables. Code available at: https://github.com/maryamzaman-git/SHEILD-IDS

详情
AI中文摘要

对抗攻击对基于机器学习的入侵检测系统(IDS)构成了严重且日益增长的威胁,其中对网络流特征的微小扰动可以系统性地误导分类器,将恶意流量视为良性。IDS-Anta框架通过Z-score归一化、奇异值分解(SVD)和基于汤普森采样的多臂赌博机(MAB)分类器选择部分解决了这一问题,但其分类器池缺乏足够的结构多样性以实现鲁棒的对抗抵抗。本文引入IDS-Anta++,将XGBoost和LightGBM梯度提升模型纳入集成,并将扩展后的池包裹在三层黑盒防御中:隔离森林异常检测、中值特征平滑和六路多数投票。在CIC-IDS-2017、CEC-CIC-IDS-2018和CIC-DDoS-2019数据集上,在快速梯度符号法(FGSM)和零阶优化(ZOO)攻击下进行的实验证实,干净数据上的检测准确率超过99%,并且在对抗条件下相对于基线IDS-Anta配置具有可测量的鲁棒性提升。

英文摘要

Adversarial attacks pose a serious and growing threat to Machine Learning (ML)-based Intrusion Detection Systems (IDS), where imperceptible perturbations to network flow features can systematically mislead classifiers into accepting malicious traffic as benign. The IDS-Anta framework partially addresses this through Z-score normalization, Singular Value Decomposition (SVD), and Multi-Armed Bandit (MAB) classifier selection with Thompson Sampling, yet its classifier pool lacks sufficient structural diversity for robust adversarial resistance. This work introduces IDS-Anta++, which incorporates XGBoost and LightGBM gradient boosting models into the ensemble and wraps the extended pool in a three-layer black-box defense: Isolation Forest anomaly screening, median feature smoothing, and six-way majority voting. Experiments conducted on CIC-IDS-2017, CEC-CIC-IDS-2018, and CIC-DDoS-2019 under both Fast Gradient Sign Method (FGSM) and Zeroth Order Optimization (ZOO) attacks confirm detection accuracy above 99% on clean data, with measurable robustness gains under adversarial conditions relative to the baseline IDS-Anta configuration.
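The last defense layer, six-way majority voting, is easy to sketch in isolation. The example below is a hedged illustration only: the tie-break rule is an assumption (the abstract does not say how a 3-3 split is resolved), `majority_vote` is a hypothetical name, and the Isolation Forest and median-smoothing layers are not shown.

```python
from collections import Counter
from typing import Sequence

def majority_vote(labels: Sequence[str], tie_label: str = "malicious") -> str:
    """Return the most common label; ties fall back to `tie_label`.

    Assumption: the fail-safe tie-break is illustrative. With an even number
    of voters (six here) a 3-3 split is possible.
    """
    if not labels:
        raise ValueError("need at least one vote")
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return tie_label
    return counts[0][0]

# One network flow scored by six classifiers in the pool.
votes = ["benign", "malicious", "malicious", "benign", "malicious", "malicious"]
print(majority_vote(votes))
```

Falling back to the malicious label on a tie trades a few extra false alarms for fewer missed attacks; a deployment could just as well prefer the opposite.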

2606.07712 2026-06-09 cond-mat.mtrl-sci cs.AI 新提交

MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science

MatMind:面向材料科学的结构-活性知识驱动生成基础模型

Zhan'ao Yao, Boxuan Zhang, Jingyuan Shu, Xiaoyu Wu, Rongyan Wang, Linjing Li, Dajun Zeng, Yudong Yao, Tingwei Chen, Youwei Wang, Xiaolin Zhao, Jiahui Shi, Jianjun Liu

Affiliations State Key Laboratory of High Performance Ceramics; Shanghai Institute of Ceramics, Chinese Academy of Sciences; Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences; School of Chemistry and Materials Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Beijing Wenge Technology Co., Ltd.; College of Medicine and Biological Information Engineering, Northeastern University

AI summary Proposes MatMind, a large-language-model-based generative foundation model for crystalline materials that combines structure-activity knowledge injection, a dual-head architecture, and physics-informed reinforcement learning, and matches or surpasses specialist models on property prediction, unconditional generation, and conditional generation.

Comments 29 pages, 5 figures, including references

详情
AI中文摘要

迄今为止,AI驱动的晶体材料科学进展依赖于为单个任务构建的窄架构——用于性质预测的图神经网络、用于晶体生成的扩散和流匹配模型——每个都在其领域内表现出色,但无法作为跨整个材料问题谱系的共享骨干。生成式大语言模型提供了一种根本不同的范式,其中结构表示、定量预测和结构-活性推理可以在一个模型内统一,但材料学界尚未看到这种范式在竞争性水平上实现,与已建立的窄专家相匹敌。在此,我们提出MatMind,一种在此范式下专为晶体材料科学构建的生成基础模型,通过渐进训练框架中结构-活性知识和物理信息反馈的协调激活开发——结合结构-活性知识注入、在共享表示空间中联合训练语言推理和数值回归的双头架构,以及针对稳定性、新颖性和结构多样性的多目标物理信息强化学习。在三个任务族中,MatMind在能量高于凸包、体模量和带隙上取得最低平均绝对误差——超越专为这些任务构建的图神经网络预测器——在无条件晶体生成上达到65.3%的S.U.N.率,并在磁化密度条件生成上实现了可比的倍数提升,其中在超过600,000个训练条目中仅存在21个正样本。通过在单一统一模型内匹配或超越窄专家在其自身领域上的表现,MatMind表明基于LLM的范式可以作为晶体材料科学未来的可行骨干。

英文摘要

Progress in AI-driven crystal materials science has so far been carried by narrow architectures purpose-built for individual tasks -- graph neural networks for property prediction, diffusion and flow-matching models for crystal generation -- each excelling within its niche yet unable to act as a shared backbone across the full spectrum of materials problems. Generative large language models offer a fundamentally different paradigm, in which structural representation, quantitative prediction, and structure-activity reasoning can be unified within one model, but the materials community has yet to see this paradigm realized at a level competitive with established narrow specialists. Here we present MatMind, a generative foundation model purpose-built for crystal materials science under this paradigm, developed through the coordinated activation of structure-activity knowledge and physics-informed feedback within a progressive training framework -- combining structure-activity knowledge injection, a dual-head architecture that jointly trains language reasoning and numerical regression in a shared representation space, and multi-objective physics-informed reinforcement learning over stability, novelty, and structural diversity. Across three task families, MatMind attains the lowest mean absolute error on energy above hull, bulk modulus, and band gap -- surpassing graph neural network predictors purpose-built for these tasks -- reaches an S.U.N. rate of 65.3% on unconditional crystal generation, and achieves a comparable multiplicative improvement on magnetization-density-conditioned generation, where only 21 positive samples exist within over 600000 training entries. By matching or surpassing narrow specialists on their own ground while operating within a single unified model, MatMind shows that the LLM-based paradigm can serve as a viable backbone for crystal materials science going forward.

2606.07706 2026-06-09 cs.CR cs.AI 新提交

MLingualFC: Evaluating Jailbreak Vulnerabilities in Multilingual Vision-Language Models

MLingualFC: 评估多语言视觉语言模型中的越狱漏洞

Rishabh Makwana, Mamta, Deeksha Varshney, Oana Cocarascu

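Affiliations Dwarkadas Jivanlal Sanghvi College of Engineering; King’s College London; Indian Institute of Technology Jodhpur

AI summary Introduces MLingualFC, a multilingual multimodal benchmark that encodes harmful instructions as flowcharts to evaluate jailbreak vulnerabilities in multilingual VLMs; attack success rates are markedly higher for Latin-script languages than for non-Latin-script ones, showing that safety mechanisms generalize poorly across languages.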

详情
AI中文摘要

视觉语言模型(VLM)在多模态任务中表现出色,但其安全鲁棒性仍是一个开放挑战。虽然先前工作表明结构化视觉提示(如流程图)可以有效越狱VLM,但现有研究主要局限于英语中心设置。本文中,我们介绍MLingualFC,一个多语言多模态基准,旨在使用结构化流程图表示评估VLM在不同语言下的越狱漏洞。MLingualFC将有害指令编码为五种语言(印地语、旁遮普语、西班牙语、罗马尼亚语和德语)的流程图图像。我们在黑盒威胁模型下评估了最先进的多语言VLM,包括Qwen2.5-VL、Gemma-4和Pangea。我们的结果揭示了显著的多语言安全差距。基于流程图的攻击在拉丁语系语言中实现了高攻击成功率(ASR),表明有害内容的视觉编码有效绕过了跨语言的安全对齐。相反,非拉丁语系语言(如旁遮普语)的ASR显著较低,这表明潜在的限制在于视觉文本识别而非更强的安全对齐。这些发现突显了当前VLM安全机制未能跨语言和模态泛化。资源可在https://github.com/Rishabhpm23/MLingualFC获取。

英文摘要

Vision-Language Models (VLMs) have demonstrated strong performance across multimodal tasks, yet their safety robustness remains an open challenge. While prior work has shown that structured visual prompts such as flowcharts can effectively jailbreak VLMs, existing studies are largely limited to English-centric settings. In this paper, we introduce MLingualFC, a multilingual multimodal benchmark designed to evaluate jailbreak vulnerabilities of VLMs across diverse languages using structured flowchart representations. MLingualFC encodes harmful instructions into flowchart images across five languages (Hindi, Punjabi, Spanish, Romanian, and German). We evaluate state-of-the-art multilingual VLMs, including Qwen2.5-VL, Gemma-4, and Pangea, under a black-box threat model. Our results reveal significant multilingual safety gaps. Flowchart-based attacks achieve high attack success rates (ASR) for Latin-script languages, demonstrating that visual encoding of harmful content effectively bypasses safety alignment across languages. In contrast, non-Latin-script languages such as Punjabi exhibit substantially lower ASR, suggesting limitations in visual text recognition rather than stronger safety alignment. These findings highlight that current VLM safety mechanisms fail to generalize across languages and modalities. Resources are available at https://github.com/Rishabhpm23/MLingualFC

2606.07697 2026-06-09 physics.ao-ph cs.AI 新提交

TianJi-Environ: An Autonomous AI Scientist for Atmospheric Environmental Research

TianJi-Environ: 用于大气环境研究的自主人工智能科学家

Haoluo Zhao, Hongchun Zhang, Nan Li, Jing-Jia Luo, Kaikai Zhang, Mengyang Yu, Nan Chen, Tao Song, Fan Meng

Affiliations School of Artificial Intelligence, Nanjing University of Information Science and Technology; State Key Laboratory of Climate System Prediction and Risk Management (CPRM), Nanjing University of Information Science and Technology; College of Environmental Science and Engineering, Nanjing University of Information Science and Technology; College of Computer Science and Technology, China University of Petroleum

AI summary Proposes TianJi-Environ, a WRF-Chem-based multi-agent framework that autonomously drives complex atmospheric-chemistry simulations, turning mechanistic hypotheses into executable configurations, experiment designs, and evidence criteria; ozone and particulate-matter case studies demonstrate auditable mechanism validation.

Comments 20 pages, 11 figures, 2 tables

详情
AI中文摘要

随着大气环境预测的持续改进,污染机制和反馈过程的可解释验证已成为大气化学的主要挑战。然而,基于复杂数值模型的机制验证仍然严重依赖专家知识:机制假设必须转化为可执行的实验,模型输出必须组织成可追溯的证据。我们提出了TianJi-Environ,一个用于大气化学机制验证的可审计AI科学家。TianJi-Environ建立了首个基于WRF-Chem的多智能体框架,自主驱动复杂的大气化学模拟,将机制假设转化为可执行的配置、测试实验和证据标准。以臭氧响应和颗粒物反馈作为两个代表性例子,我们展示了TianJi-Environ的机制验证能力。在华北平原的一个夏季臭氧案例中,系统在短波辐射和边界层高度中检测到方向一致的气溶胶-辐射相互作用信号,但判断臭氧对NOx控制的响应证据不完整。在关中盆地的一个冬季PM2.5案例中,系统将不支持的联系定位到黑碳扰动到颗粒物响应的传播不足以及垂直吸收加热的诊断缺失。这些结果表明,TianJi-Environ使专家驱动的机制验证变得明确、结构化和可审计,为多智能体系统与复杂大气化学模型的耦合提供了可复现的范式。

英文摘要

As atmospheric environmental prediction continues to improve, interpretable validation of pollution mechanisms and feedback processes has become a main challenge in atmospheric chemistry. Yet mechanism validation based on complex numerical models still relies heavily on expert knowledge: mechanistic hypotheses must be operationalized into executable experiments, and model outputs must be organized into traceable evidence. We present TianJi-Environ, an auditable AI Scientist for atmospheric-chemistry mechanism validation. TianJi-Environ establishes the first WRF-Chem-based multi-agent framework that autonomously drives complex atmospheric-chemistry simulations, converting mechanistic hypotheses into executable configurations, testing experiments, and evidence criteria. Using ozone response and particulate-matter feedback as two representative examples, we demonstrate TianJi-Environ's capability for mechanism validation. In a summertime ozone case over the North China Plain, the system detects directionally consistent aerosol-radiation-interaction signals in shortwave radiation and boundary-layer height, but judges the evidence for ozone response to NOx control to be incomplete. In a wintertime PM2.5 case over the Guanzhong Basin, it localizes the unsupported link to insufficient propagation from black-carbon perturbation to particulate response and missing diagnostics of vertical absorptive heating. These results show that TianJi-Environ makes expert-driven mechanism validation explicit, structured, and auditable, offering a reproducible paradigm for multi-agent systems coupled with complex atmospheric-chemistry models.

2606.07688 2026-06-09 cs.IR cs.AI cs.CL cs.LG 新提交

TRACER: Token ReAssignment for Concept ERasure in Generative Recommendation

TRACER: 面向生成式推荐中概念擦除的令牌重分配

Ziheng Chen, Jiali Cheng, Zezhong Fan, Hadi Amiri, Diyuan Wu, Gabriele Tolomei, Yang Zhang

Affiliations Stony Brook University; University of Massachusetts Lowell; Columbia University; Institute of Science and Technology Austria; Sapienza University of Rome; National University of Singapore

AI summary Addresses the conflict between concept unlearning and recommendation utility in generative recommendation with TRACER, a token-reassignment framework that moves concept-related items to alternative tokens and adds a coherence regularizer, removing target concepts while preserving recommendation utility.

详情
AI中文摘要

生成式推荐将下一项预测形式化为基于用户历史交互导出的语义ID(SID)序列的自回归生成,使得现代推荐系统在结构上类似于大型语言模型(LLM)。随着隐私和安全问题的增加,这些系统越来越需要概念遗忘来移除与物品相关的敏感或有害概念。然而,现有的LLM遗忘方法不能直接应用于生成式推荐。与具有明确语义的词令牌不同,SID是抽象标识符,通常被遗忘和保留物品共享,导致概念移除和推荐效用保持之间的严重冲突。为了解决这一挑战,我们提出了TRACER,一种基于令牌重分配的端到端概念遗忘框架。TRACER不是直接抑制共享的SID,而是将概念相关物品重分配给能够更好地促进遗忘同时最小化对保留物品的副作用的替代令牌。我们进一步引入了一致性正则化器,以在遗忘过程中保持保留物品之间的语义一致性。在真实世界推荐数据集上的实验表明,TRACER有效地移除了目标概念,同时比现有的遗忘基线更好地保持了推荐效用。

英文摘要

Generative recommendation formulates next-item prediction as autoregressive generation over semantic ID (SID) sequences derived from users' historical interactions, making modern recommender systems structurally similar to large language models (LLMs). As privacy and safety concerns grow, these systems increasingly require concept unlearning to remove sensitive or harmful concepts associated with items. However, existing LLM unlearning methods cannot be directly applied to generative recommendation. Unlike word tokens with explicit semantics, SIDs are abstract identifiers that are often shared by both forget and retain items, leading to severe conflicts between concept removal and recommendation utility preservation. To address this challenge, we propose TRACER, an end-to-end concept unlearning framework based on token reassignment. Rather than directly suppressing shared SIDs, TRACER reassigns concept-related items to alternative tokens that better facilitate forgetting while minimizing side effects on retained items. We further introduce a coherence regularizer to preserve semantic consistency among retain items during unlearning. Experiments on real-world recommendation datasets demonstrate that TRACER effectively removes target concepts while substantially better preserving recommendation utility than existing unlearning baselines.

2606.07682 2026-06-09 cs.SE cs.AI 新提交

SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work?

SWE-Marathon: 智能体能否自主完成超长时程软件工作?

Rishi Desai, Jesse Hu, Joan Cabezas, Neel Harsola, Pratyush Shukla, Roey Ben Chaim, Adnan El Assadi, Omkaar Mukund Kamath, Fenil Faldu, Prannay Hebbar, Jiankai Sun, Yiyuan Li, Pramod Srinivasan, Ishan Gupta, Christopher Settles, Daniel Wang, Derek Chen, Pranav Raja, Albert Liu, Marek Šuppa, Nevasini Sasikumar, Luyang Kong, Erik Quintanilla, Xiangyi Li, Ivan Bercovich, Steven Dillmann

发表机构 * Abundant Zenity Harvard University(哈佛大学) University of Waterloo(滑铁卢大学) Gujarat Technological University(古吉拉特技术大学) Warping Stanford University(斯坦福大学) UNC-Chapel Hill(北卡罗来纳大学教堂山分校) Independent(独立) Refresh Soleda AI Near AI Georgia Tech(佐治亚理工学院) Comenius University in Bratislava(布拉迪斯拉发Comenius大学) UC San Diego(圣地亚哥大学) BenchFlow UC Santa Barbara(圣巴巴拉大学)

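AI summary Introduces SWE-Marathon, a benchmark of 20 ultra-long-horizon tasks whose logged agent attempts average 27.2M tokens, to evaluate planning, long-context understanding, and memory use; current frontier coding agents solve fewer than 30% of tasks.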

详情
AI中文摘要

AI智能体越来越被期望完成需要持续数小时、数百万token和复杂环境的长时程工作流。然而,当前的智能体基准主要评估短时任务,例如单个拉取请求、小票或5-10分钟的练习,限制了我们在规划、长上下文理解和记忆使用方面衡量智能体能力的能力。我们引入了SWE-Marathon,一个包含20个长时程任务的基准,涵盖软件工程和相邻技术领域。每个任务包括一个独特的可执行环境、一个人工编写的参考解决方案和一个多层验证套件。记录的智能体尝试平均消耗2720万总token,使得SWE-Marathon比现有的SWE和命令行智能体基准的时程显著更长。当前前沿编码智能体解决了不到30%的任务。失败通常源于自我验证不足、自我报告不可行以及过早终止。我们还在13.8%的滚动中观察到奖励黑客行为,其中智能体试图利用环境或验证器绕过预期工作流。SWE-Marathon包括对测试套件和执行环境的对抗性审查,以及旨在防止捷径解决方案的多层检查。我们在https://swe-marathon.org/上发布SWE-Marathon、评估代码和智能体轨迹。

英文摘要

AI agents are increasingly expected to complete long-horizon workflows that require sustained progress over hours, millions of tokens, and complex environments. Yet current agent benchmarks largely evaluate short-form tasks, such as single pull requests, small tickets, or 5-10 minute exercises, limiting our ability to measure agents' capabilities in planning, long-context understanding, and memory use. We introduce SWE-Marathon, a benchmark of 20 long-horizon tasks spanning software engineering and adjacent technical domains. Each task consists of a unique executable environment, a human-written reference solution, and a multi-layer verification suite. Logged agent attempts average 27.2M total tokens, making SWE-Marathon substantially longer-horizon than existing SWE and command-line agent benchmarks. Current frontier coding agents solve fewer than 30% of tasks. Failures often arise from poor self-verification, self-reported infeasibility, and premature termination. We also observe reward-hacking behavior in 13.8% of rollouts, where agents attempt to exploit the environment or verifier to bypass the intended workflow. SWE-Marathon includes adversarial review of test suites and execution environments, as well as multi-layer checks designed to prevent shortcut solutions. We release SWE-Marathon, evaluation code, and agent trajectories at https://swe-marathon.org/.

2606.07681 2026-06-09 cs.SE cs.AI cs.CE cs.MA 新提交

Systematic LLM Translation of Legacy Scientific Code to Differentiable Frameworks: Application to a Land Surface Model

将遗留科学代码系统性地LLM翻译为可微分框架:以陆面模型为例

Aya Lahlou, Linnia Hawkins, Pierre Gentine

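Affiliations University of California, Los Angeles; NASA Goddard Space Flight Center

AI summary Presents a five-phase LLM-based pipeline that automatically translates legacy Fortran into the differentiable JAX framework; on the CLM-ml-v2 model the translated code computes the complete Jacobian and achieves a 24x wall-clock speedup.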

详情
AI中文摘要

可微分编程为科学建模提供了变革性能力,支持基于梯度的参数估计、敏感性分析和数据同化。然而,将遗留代码库迁移到可微分框架仍然是一个挑战。我们提出一个基于LLM的五阶段智能体流水线,将遗留Fortran代码翻译为JAX:静态依赖分析从完整调用图确定模块翻译顺序;迭代编译-修复循环自动纠正错误;Fortran参考预言机在模块级别强制数值一致性,然后进行集成和梯度验证。我们在CLM-ml-v2(一个19,000行的Fortran陆面模型)上实例化并评估该流水线,分析了73个模块翻译任务中的智能体行为。得到的可微分模型在单次反向传播中计算完整雅可比矩阵,以比无梯度优化少八倍的步数恢复物理参数,并在集成大小N=2,048时比顺序Fortran实现24倍的墙钟加速。翻译后的模型和流水线基础设施作为可重用框架发布,用于区分其他地球系统模型组件。

英文摘要

Differentiable programming offers transformative capabilities for scientific modeling, enabling gradient-based parameter estimation, sensitivity analysis, and data assimilation. Yet, migrating legacy codebases into differentiable frameworks remains a challenge. We present a five-phase LLM-based agentic pipeline that translates legacy Fortran into JAX: static dependency analysis determines module translation order from the full call graph; iterative compile-repair loops correct errors autonomously; and a Fortran reference oracle enforces numerical parity at the module level before integration and gradient verification. We instantiate and evaluate the pipeline on CLM-ml-v2, a 19,000-line Fortran land surface model, and analyze agent behavior across 73 module translation tasks. The resulting differentiable model computes the complete Jacobian in a single backward pass, recovers physical parameters in eight times fewer steps than gradient-free optimization, and achieves a 24 times wall-clock speedup over sequential Fortran at ensemble size N=2,048. Both the translated model and pipeline infrastructure are released as a reusable framework for differentiating other Earth system model components.
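The first pipeline phase, deriving a module translation order from the call graph, amounts to a topological sort. The sketch below uses Python's standard-library `graphlib`; the module names and the dependency graph are hypothetical stand-ins, not taken from CLM-ml-v2.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical call graph: each module maps to the modules it depends on.
depends_on: Dict[str, Set[str]] = {
    "canopy_fluxes": {"leaf_photosynthesis", "soil_temperature"},
    "leaf_photosynthesis": {"constants"},
    "soil_temperature": {"constants"},
    "constants": set(),
}

def translation_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Order modules so that every dependency precedes its dependents.

    Raises graphlib.CycleError if the graph contains a dependency cycle.
    """
    return list(TopologicalSorter(graph).static_order())

order = translation_order(depends_on)
print(order)
```

Translating leaves first means each newly translated module can be checked against the Fortran reference using dependencies that have already passed their own parity tests.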

2606.07677 2026-06-09 stat.ML cs.LG stat.AP stat.ME 新提交

Disentangling Latent Risk Pathways via Bayesian Hypergraph Inference

通过贝叶斯超图推断解缠潜在风险路径

Shengxian Ding, Haonan Gao, Pangpang Liu, Xinyuan Tian, Yize Zhao

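Affiliations University of Science and Technology of China

AI summary Proposes a Bayesian hypergraph inference framework that models multiple diseases through latent, risk-factor-modulated disease pathways, yielding interpretable higher-order structure, calibrated uncertainty estimates, and improved estimation for rare diseases.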

Comments ICML 2026 Oral

详情
AI中文摘要

电子健康记录(EHR)提出了大规模多疾病建模问题,其中许多结果罕见且受共享风险因素强烈影响。虽然现代方法实现了强大的预测性能,但它们通常独立处理疾病或依赖黑盒架构,对风险因素如何组织疾病风险的洞察有限,且缺乏原则性的不确定性量化。我们引入了一个贝叶斯超图推断框架,将多疾病建模重新构建为围绕潜在的风险因子调节的疾病路径。风险因素作用于超边,即具有共享风险模式的潜在疾病子集,允许疾病参与多个不同的路径,并实现超越成对关联的可解释高阶结构。排斥先验鼓励简约且可识别的结构,而后验推断为疾病分组和风险因素影响提供了校准的不确定性。为了在大型EHR数据集上实现可扩展推断,我们开发了一种结构化变分推断算法,该算法保留了超边存在、疾病成员资格和路径级效应之间的逻辑依赖关系。在模拟数据和英国生物银行上的实验表明,该框架具有稳定且可解释的疾病路径结构、良好校准的不确定性、对罕见病的改进估计以及有竞争力的预测性能。

英文摘要

Electronic health records (EHR) pose large-scale multi-disease modeling problems in which many outcomes are rare and strongly influenced by shared risk factors. While modern approaches achieve strong predictive performance, they often treat diseases independently or rely on black-box architectures, offering limited insight into how risk factors organize disease risk and little principled uncertainty quantification. We introduce a Bayesian hypergraph inference framework that reframes multi-disease modeling around latent, risk-factor-modulated disease pathways. Risk factors act on hyperedges, latent disease subsets with shared risk patterns, allowing diseases to participate in multiple distinct pathways and enabling interpretable, higher-order structure beyond pairwise associations. A repulsion prior encourages parsimonious and identifiable structure, while posterior inference provides calibrated uncertainty over both disease groupings and risk-factor influence. To enable scalable inference on large EHR datasets, we develop a structured variational inference algorithm that preserves logical dependencies among hyperedge existence, disease membership, and pathway-level effects. Experiments on simulated data and UK Biobank demonstrate stable and interpretable disease pathway structure, well-calibrated uncertainty, improved estimation for rare diseases, and competitive predictive performance.

2606.07676 2026-06-09 q-bio.GN cs.AI 新提交

Single-Cell Cross-Modal Transfer by Adversarial Fine-Tuning of Foundation Models

通过基础模型的对抗微调实现单细胞跨模态迁移

Joseph Boyd, Matthew Lyon, Martino Mansoldo, Christian Hurry, Finnian Firth

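Affiliations University of Cambridge

AI summary Uses adversarial fine-tuning of a single-cell foundation model to translate between unpaired spatial transcriptomics and single-cell RNA sequencing data, performing favourably against methods built for multi-omics translation.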

详情
AI中文摘要

空间转录组学(ST)是探索组织中依赖于结构、邻近性和相互作用的生物学特性的强大工具。支撑ST的方法正在快速发展,但在亚细胞尺度上分析数千个基因的能力有限。尽管从组织中解离,但已知单细胞RNA测序(scRNA-seq)中细胞的全转录组读数保留了其先前原位邻域的信息,这激发了恢复该信息的计算方法。虽然配对的ST和scRNA-seq数据集稀缺,但每种模态本身都很丰富。因此,我们提出在未配对的ST和scRNA-seq数据之间进行跨模态翻译。在这项工作中,我们展示了单细胞基础模型可以通过对抗微调执行这种翻译。我们证明了我们的方法优于为多组学翻译构建的方法。

英文摘要

Spatial transcriptomics (ST) is a powerful tool for exploring biological properties dependent on structure, proximity, and interaction in tissue. The methods underpinning ST are developing rapidly but are limited in their ability to profile many thousands of genes at a subcellular scale. Although dissociated from tissue, it is known that the whole-transcriptome readouts of cells in single-cell RNA sequencing (scRNA-seq) retain information about their former in situ neighbourhoods, motivating computational methods to recover it. While paired ST and scRNA-seq datasets are scarce, each modality in its own right is abundantly available. We therefore propose to perform cross-modal translation between unpaired ST and scRNA-seq data. In this work we show that a single-cell foundation model can perform this translation via adversarial fine-tuning. We demonstrate that our method performs favourably against methods built for multi-omics translation.

2606.07675 2026-06-09 eess.IV cs.CV cs.LG 新提交

The Need for Neural ISP in the Small-Pixel Era: How Shrinking Pixels Push Optics to the Limit and Neural Restoration Pushes Back

小像素时代对神经ISP的需求:像素缩小将光学推向极限,神经恢复则逆势而上

Jingxi Li, Neerja Aggarwal, Laurent Gudemann, Shivansh Rao, Vishal Vinod, Tom E. Bishop, Ziv Attar

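Affiliations Glass Imaging Inc

AI summary For small-pixel smartphone telephoto modules whose resolution is limited by optical aberrations, proposes a learning-based Neural ISP for image restoration that achieves a 2.5-3x resolution improvement at 0.35 micron pixel pitch, suggesting that Neural ISPs can reduce the need for increasingly complex optics.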

详情
AI中文摘要

智能手机长焦摄像头正接近“长焦物理墙”:随着像素间距缩小至亚0.5微米,光学系统仍受几何像差限制,导致分辨率收益递减。传统图像信号处理器(ISP)无法消除这些像差,因为它们通过局部、分阶段处理运行,没有明确的点扩散函数(PSF)模型。我们展示了基于学习的神经ISP用于图像恢复,通过训练底层退化,逆转了分阶段流水线无法处理的问题,将小像素设计转化为净优势。我们通过一个代表性长焦模块的受控模拟进行研究,评估了五种配置(0.35--0.75微米像素间距)。光圈按比例缩放以保持每像素信噪比和衍射光斑尺寸固定,从而隔离几何像差和空间采样。传统ISP随像素减小仅适度改进,而神经ISP显著扩展:在0.35微米时,其MTF50(垂直)达到745 cycles/mm,比传统ISP分辨率提升2.5-3倍,LPIPS从0.244显著改善至0.151,而传统结果保持相对平坦。在低信噪比扩展中(0.35微米下每帧15 dB突发),多帧神经ISP恢复的性能接近亮光单帧基线,而多帧传统ISP没有显示出有意义的改进——表明小像素下的传统流水线受限于未校正的PSF模糊而非噪声。这些结果指向一种设计理念:神经ISP通过校正残余光学像差而非要求日益复杂的光学系统,实现高分辨率长焦模块。

英文摘要

Smartphone telephoto cameras are approaching a "telephoto physics wall": as pixel pitches shrink toward sub-0.5 micron, the optics remain limited by geometric aberrations, leading to diminishing returns on resolution. Traditional Image Signal Processors (ISPs) cannot eliminate these aberrations, because they operate through local, stage-wise processing with no explicit model of the underlying point spread function (PSF). We demonstrate how a learning-based Neural ISP for image restoration, trained on the underlying degradations, inverts what stage-wise pipelines cannot, turning small-pixel designs into a net advantage. We investigate this through a controlled simulation of a representative telephoto module, evaluating five configurations (0.35--0.75 micron pixel pitch). The aperture is scaled proportionally to keep per-pixel SNR and diffraction spot size fixed, thereby isolating geometric aberration and spatial sampling. While the traditional ISP improves only modestly with smaller pixels, the Neural ISP scales substantially: at 0.35 micron it reaches 745 cycles/mm MTF50 (vertical), a 2.5--3x resolution improvement over the traditional ISP, and LPIPS improves significantly from 0.244 to 0.151 while traditional results stay comparatively flat. In a low-SNR extension (15 dB per-frame bursts at 0.35 micron), a multi-frame Neural ISP recovers performance close to the bright-light single-frame baseline, whereas a multi-frame traditional ISP shows no meaningful improvement -- indicating that traditional pipelines at small pixels are bottlenecked by uncorrected PSF blur rather than by noise. These results point to a design philosophy in which Neural ISPs enable high-resolution telephoto modules by correcting residual optical aberrations rather than requiring increasingly complex optics.

2606.07666 2026-06-09 quant-ph cs.AR cs.DC cs.LG 新提交

Hardware-aware Low-latency Quantum Compilation with Data-driven Lightweight Error Detection for Early Fault-Tolerant Systems

面向早期容错系统的硬件感知低延迟量子编译与数据驱动的轻量级错误检测

Sumit Chongder

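Affiliations Inter-Disciplinary Research Platform, Quantum Information and Computation, Indian Institute of Technology Jodhpur

AI summary Proposes a framework integrating hardware-aware compilation with data-driven quantum error detection, jointly optimizing qubit mapping, SWAP insertion, and syndrome scheduling via a noise-weighted cost function and a learned multi-objective scheduler; across VQE, phase-estimation, and Grover benchmarks it raises algorithmic success probability by up to 68%.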

Comments 16 pages, 15 figures, Springer LNCS format. Code available at https://github.com/Sumitchongder/quantum-hw-aware-pipeline

详情
AI中文摘要

噪声中等规模量子(NISQ)处理器正进入早期容错阶段,此时完全量子纠错代价高昂,而轻量级错误检测可有效提高算法成功率。现有编译和错误检测工具链孤立处理这些问题,缺乏在延迟约束下平衡检测开销与成功概率的原则性方法。我们提出一种集成的硬件感知编译与数据驱动量子错误检测(QED)框架,通过噪声加权代价函数和学习型多目标调度器,联合优化量子比特映射、SWAP插入和综合征调度。在HPC集群上使用GPU加速密度矩阵模拟(NVIDIA cuQuantum SDK)进行的仿真实验,涵盖VQE、相位估计和Grover基准测试、三种噪声模型以及6-20量子比特(深度10-160)的电路规模,结果表明,在8量子比特VQE实例上,联合协同设计相比SABRE结合后选择,将算法成功概率提升高达68%(95%置信区间:60%至76%)。

英文摘要

Noisy intermediate-scale quantum (NISQ) processors are entering an early fault-tolerance regime where full quantum error correction carries prohibitive resource costs, yet lightweight error detection can meaningfully improve algorithmic success rates. Existing compilation and error-detection toolchains treat these concerns in isolation, with no principled way to balance detection overhead against success probability under latency constraints. We present an integrated hardware-aware compilation and data-driven quantum error-detection (QED) framework that jointly optimises qubit mapping, SWAP insertion, and syndrome-schedule placement via a noise-weighted cost function and a learned multi-objective scheduler. Simulation experiments on an HPC cluster using GPU-accelerated density-matrix simulation (NVIDIA cuQuantum SDK) across VQE, phase-estimation, and Grover benchmarks, three noise profiles, and circuit sizes of 6-20 qubits (depths 10-160), show that joint co-design raises algorithmic success probability by up to 68 percent (95 percent CI: 60 percent to 76 percent) over SABRE on an 8-qubit VQE instance with post-selection.

2606.07665 2026-06-09 cs.PL cs.AI 新提交

AgentCompile: An LLM-Guided Compiler for Direct CUDA Inference

AgentCompile:一种用于直接CUDA推理的LLM引导编译器

Xuanzhe Li, Ziyan Weng, Zhiyu Zhu, Junhui Hou

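Affiliations City University of Hong Kong (Dongguan); City University of Hong Kong

AI summary Proposes AgentCompile, which uses an LLM only for advisory semantic suggestions while the compiler generates CUDA candidates from templates and validates them, achieving average speedups of 4.05x to 5.66x over PyTorch eager on several Transformer models.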

Comments 11 pages, 3 figures

详情
AI中文摘要

Transformer推理日益依赖专门的编译器和运行时支持,但实际模型图仍需要语义决策,以确定哪些区域值得专门化以及哪些CUDA实现族是可行的。我们提出AgentCompile,一种LLM引导的CUDA推理编译器,仅将LLM输出用作建议性搜索元数据。给定编译器生成的区域摘要和有界候选空间,LLM提出语义标签、候选优先级、参数提示和风险注释;编译器通过模板生成CUDA候选,检查接口和硬件约束,经验性验证候选,根据测量延迟选择实现,并在专门化不受支持或无利可图时回退。在端到端自回归生成中,AgentCompile在五个代表性工作负载上,相对于PyTorch eager模式,在Qwen3-1.7B、Qwen3-4B和Llama-3.2-1B-Instruct上分别实现了平均5.66倍、4.05倍和4.26倍的加速。我们将开源该项目。

English abstract

Transformer inference increasingly depends on specialized compiler and runtime support, but real model graphs still require semantic decisions about which regions are worth specializing and which CUDA implementation families are plausible. We present AgentCompile, an LLM-guided CUDA inference compiler that uses LLM outputs only as advisory search metadata. Given compiler-derived region summaries and bounded candidate spaces, the LLM proposes semantic labels, candidate priorities, parameter hints, and risk annotations; the compiler materializes CUDA candidates through templates, checks interface and hardware constraints, validates candidates empirically, selects implementations by measured latency, and falls back when specialization is unsupported or unprofitable. In end-to-end autoregressive generation, AgentCompile averages 5.66x, 4.05x, and 4.26x speedup over PyTorch eager on Qwen3-1.7B, Qwen3-4B, and Llama-3.2-1B-Instruct, respectively, across five representative workloads. We will open-source the project.
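
The validate, measure and fall-back loop described in the abstract can be sketched roughly as follows. This is not AgentCompile's code: plain Python callables stand in for CUDA candidates, and `select_implementation`, `measure_latency`, `budget` and the `priorities` dictionary are hypothetical names. The point it illustrates is that the advisory priorities only decide the order in which candidates are tried (and which ones fit in the trial budget), while correctness checks and measured latency decide what is actually selected.

```python
import time


def measure_latency(fn, args, repeats=5):
    """Best-of-N wall-clock time for one call, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def select_implementation(baseline, candidates, priorities, test_inputs,
                          budget=None, measure=measure_latency):
    """Pick the fastest candidate that matches the baseline, else fall back.

    candidates:  dict of name -> callable (stand-ins for generated kernels)
    priorities:  dict of name -> advisory score; higher is tried first
    test_inputs: list of argument tuples used for validation and timing
    budget:      optional cap on how many candidates are tried
    """
    expected = [baseline(*args) for args in test_inputs]
    best_name, best_fn = "baseline", baseline
    best_latency = sum(measure(baseline, args) for args in test_inputs)

    # Advisory metadata affects search order only, never the final decision.
    order = sorted(candidates, key=lambda n: priorities.get(n, 0.0), reverse=True)
    for name in order[:budget]:
        fn = candidates[name]
        try:
            valid = all(fn(*args) == exp for args, exp in zip(test_inputs, expected))
        except Exception:
            valid = False  # a crashing candidate is rejected like a wrong one
        if not valid:
            continue
        latency = sum(measure(fn, args) for args in test_inputs)
        if latency < best_latency:
            best_name, best_fn, best_latency = name, fn, latency
    return best_name, best_fn


# Toy usage with a deterministic fake timer instead of real measurements.
def baseline_sum(xs):
    return sum(xs)


def wrong_sum(xs):
    return sum(xs) + 1  # fast on paper but incorrect, so it must be rejected


def fast_sum(xs):
    return sum(xs)  # stands in for a correct specialized kernel


FAKE_LATENCY = {baseline_sum: 3.0, wrong_sum: 0.1, fast_sum: 1.0}


def fake_timer(fn, args):
    return FAKE_LATENCY[fn]


chosen_name, chosen_fn = select_implementation(
    baseline_sum,
    {"wrong": wrong_sum, "fast": fast_sum},
    {"wrong": 0.9, "fast": 0.5},
    [([1, 2, 3],)],
    measure=fake_timer,
)
```

In this toy run the highest-priority candidate is rejected by validation, so a bad suggestion costs search time but cannot change the result.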

2606.07664 2026-06-09 cs.NE cs.AI New submission

Seq103: A Unified Neuroevolution Framework for Compact Sequence Architecture Discovery

Seq103: 用于紧凑序列架构发现的统一神经进化框架

Wenxiao Li, Yongjian Liu, Qing Xie

Affiliations * School of Computer Science and Artificial Intelligence, Wuhan University of Technology

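AI summary: Proposes Seq103, a unified neuroevolution framework that uses a shared evolutionary backbone and an optional recurrent extension to search for compact architectures for sequence classification, retaining most of the best-baseline accuracy on text and time-series datasets with far fewer parameters.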

Comments 18 pages, 2 figures, 8 tables

Details
AI Chinese abstract

神经进化是一种代表性的神经架构搜索范式,通过进化算法同时演化网络拓扑和权重。本文提出Seq103,一个统一的NEAT风格神经进化框架,用于紧凑序列架构发现。Seq103包含一个共享的进化主干和一个可选的循环扩展。共享主干包括基本的节点-连接表示、基于每类RMSE的评估、带有类级重组的基于突变的进化以及精英策略。可选的隐藏状态机制通过隐藏状态节点和隐藏连接扩展搜索空间,在需要逐步循环推理时提供时间记忆。通过这种设计,Seq103将相同的核心搜索流程应用于逐步循环和样本级前馈序列分类。在循环任务中,启用隐藏状态扩展以提供时间记忆;在前馈任务中,禁用该扩展,而共享进化主干保持不变。我们在8个文本分类数据集和包含128个单变量时间序列数据集的完整UCRArchive2018基准上评估Seq103。在逐步任务中,Seq103平均保留最佳基线准确率的86.96%,同时参数数量减少34.6倍至3218.0倍。在完整UCRArchive2018基准的样本级任务中,Seq103平均保留最佳基线准确率的81.95%,同时参数数量减少11.8倍至160,601.0倍。

English abstract

Neuroevolution is a representative neural architecture search paradigm that evolves both network topology and weights through evolutionary algorithms. In this paper, we propose Seq103, a unified NEAT-style neuroevolution framework for compact sequence architecture discovery. Seq103 consists of a shared evolutionary backbone and an optional recurrent extension. The shared backbone includes an elementary node-and-connection representation, per-class RMSE-based evaluation, mutation-based evolution with class-wise recombination, and elitism. The optional hidden-state mechanism extends the search space with hidden-state nodes and hidden connections, enabling temporal memory when step-wise recurrent inference is required. With this design, Seq103 applies the same core search pipeline to both step-wise recurrent and sample-wise feedforward sequence classification. In recurrent tasks, the hidden-state extension is enabled to provide temporal memory; in feedforward tasks, it is disabled while the shared evolutionary backbone remains unchanged. We evaluate Seq103 on 8 text classification datasets and the full UCRArchive2018 benchmark with 128 univariate time-series datasets. On step-wise tasks, Seq103 retains 86.96% of the best-baseline accuracy on average while using 34.6x to 3218.0x fewer parameters. On sample-wise tasks over the full UCRArchive2018 benchmark, Seq103 retains 81.95% of the best-baseline accuracy on average while using 11.8x to 160,601.0x fewer parameters.
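
The abstract mentions per-class RMSE-based evaluation without defining it. One plausible reading, shown here purely as an assumption, is to compare each class output against a one-hot target and compute a separate RMSE per class, so that a candidate network can be scored class by class. The function name `per_class_rmse` and the toy data are hypothetical and not taken from the paper.

```python
import math


def per_class_rmse(outputs, labels, num_classes):
    """Return one RMSE value per class, assuming one-hot targets.

    outputs: list of per-sample score lists, each of length num_classes
    labels:  list of integer class ids, one per sample
    """
    if not labels or len(outputs) != len(labels):
        raise ValueError("outputs and labels must be non-empty and equal in length")

    squared_error = [0.0] * num_classes
    for scores, label in zip(outputs, labels):
        for c in range(num_classes):
            target = 1.0 if c == label else 0.0
            squared_error[c] += (scores[c] - target) ** 2

    n = len(labels)
    return [math.sqrt(total / n) for total in squared_error]


# Toy usage: two confident, correct samples and one undecided sample.
OUTPUTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
LABELS = [0, 1, 0]
rmse = per_class_rmse(OUTPUTS, LABELS, num_classes=2)
```

Keeping the errors separate per class, rather than averaging them into one number, is what would make class-wise recombination possible, though how Seq103 actually combines the two is not stated in the abstract.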