arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.13807 2026-05-14 cond-mat.str-el cond-mat.dis-nn cs.LG physics.comp-ph quant-ph

Parallel Scan Recurrent Neural Quantum States for Scalable Variational Monte Carlo

Ejaaz Merali, Mohamed Hibat-Allah, Mohammad Kohandel, Richard T. Scalettar, Ehsan Khatami

Affiliations: Department of Physics and Astronomy, University of California, Davis, California 95616, USA; Astronomy, San José State University, San José, California 95192, USA; Department of Applied Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada; Vector Institute, Toronto, Ontario, M5G 0C6, Canada

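AI summary: This paper proposes parallel scan recurrent neural quantum states (PSR-NQS), which aim to address the poor scalability of conventional recurrent neural networks in simulating quantum many-body systems. By combining autoregressive recurrent wave functions with parallelizable recurrence, the method supports efficient variational Monte Carlo training in one and two spatial dimensions and achieves high-accuracy results on large two-dimensional spin lattices that agree with quantum Monte Carlo data. The study shows that recurrent architectures remain a practical and promising route to scalable quantum state simulation at modest resource cost.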

Comments 13 pages, 2 figures, 6 tables

Abstract

Neural-network quantum states have emerged as a powerful variational framework for quantum many-body systems, with recent progress often driven by massively parallel architectures such as transformers. Recurrent neural network quantum states, however, are frequently regarded as intrinsically sequential and therefore less scalable. Here we revisit this view by showing that modern recurrent architectures can support fast, accurate, and computationally accessible neural quantum state simulations. Using autoregressive recurrent wave functions together with recent advances in parallelizable recurrence, we develop variational ansätze, called parallel scan recurrent neural quantum states (PSR-NQS), which can be trained efficiently within variational Monte Carlo in one and two spatial dimensions. We demonstrate accurate benchmark results and show that, with iterative retraining, our approach reaches two-dimensional spin lattices as large as $52\times52$ while remaining in agreement with available quantum Monte Carlo data. Our results establish recurrent architectures as a practical and promising route toward scalable neural quantum state simulations with modest computational resources.
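
The abstract does not spell out which recurrence PSR-NQS uses, so the following is only a minimal sketch of the general idea behind "parallelizable recurrence". It assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, which is the standard setting where an associative scan applies; the function names are hypothetical and not taken from the paper.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(a, b):
    """Reference implementation: one step after another, O(T) depth."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan_recurrence(a, b):
    """Hillis-Steele inclusive scan over the associative `combine` operator.

    Each pass only reads the previous pass, so all positions within a pass
    could be updated in parallel; there are about log2(T) passes.
    """
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # With h_{-1} = 0, the accumulated offset b is the hidden state h_t.
    return [b_t for _, b_t in elems]
```

Because `combine` is associative, the grouping of the compositions can change without changing the result, which is what lets the work be spread across parallel hardware instead of being evaluated strictly left to right.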

2605.13806 2026-05-14 cs.DS cs.CC cs.GT cs.LG math.OC

Min-Max Optimization Requires Exponentially Many Queries

Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

Affiliations: Bocconi University; Politecnico di Milano; University of Oxford

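AI summary: This paper studies the query complexity of min-max optimization of nonconvex-nonconcave functions over the unit hypercube and proves that any algorithm that finds an ε-approximate stationary point must make a number of queries exponential in 1/ε or in the dimension d. The result exposes the inherent computational hardness of this class of problems and sets a theoretical limit for algorithm design.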

Abstract

We study the query complexity of min-max optimization of a nonconvex-nonconcave function $f$ over $[0,1]^d \times [0,1]^d$. We show that, given oracle access to $f$ and to its gradient $\nabla f$, any algorithm that finds an $\varepsilon$-approximate stationary point must make a number of queries that is exponential in $1/\varepsilon$ or $d$.

2605.13794 2026-05-14 cs.GR cs.CV

BlitzGS: City-Scale Gaussian Splatting at Lightning Speed

Zhongtao Wang, Huishan Au, Yilong Li, Mai Su, Haojie Jin, Yisong Chen, Meng Gai, Fei Zhu, Guoping Wang

Affiliations: Peking University

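AI summary: This paper presents BlitzGS, a distributed 3D Gaussian splatting framework for fast reconstruction of city-scale scenes. By optimizing how Gaussians are processed at three coupled levels (system, model, and view), the method substantially reduces the computational workload and improves rendering efficiency. Experiments show that BlitzGS preserves rendering quality while delivering an order-of-magnitude speedup over existing methods, training city-scale scenes in tens of minutes.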

Abstract

We present BlitzGS, a distributed 3DGS framework that reduces active Gaussian workload for fast city-scale reconstruction. BlitzGS manages this workload at three coupled levels. At the system level, the framework shards Gaussians across GPUs by index parity rather than spatial blocks. This approach mitigates the cross-block visibility redundancy inherent in spatial partitioning. Furthermore, it distributes each rendering step through a single cross-GPU exchange that routes projected Gaussians to their tile owners. At the model level, scheduled importance-scoring passes shrink the global Gaussian population. During these passes, the framework generates a per-Gaussian visibility weight to bias density-control updates toward contributing primitives and a per-view importance mask for the view-level renderer. At the view level, BlitzGS trims each camera's active set with a distance-based LOD gate to exclude excessively fine primitives for the current frustum and the importance-based culling mask to skip Gaussians with negligible cross-view contribution. On large-scale benchmarks, BlitzGS matches the rendering quality of recent large-scale baselines while delivering an order-of-magnitude speedup, training city-scale scenes in tens of minutes. Our code is available at https://github.com/AkierRaee/BlitzGS.

2605.13785 2026-05-14 cs.CY cs.AI

Amplification to Synthesis: A Comparative Analysis of Cognitive Operations Before and After Generative AI

Liz Cho, Dongwook Yoon

Affiliations: University of British Columbia

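AI summary: This paper compares cognitive-operation behavior and linguistic coordination patterns in X (formerly Twitter) datasets from the 2016 and 2024 U.S. elections, revealing how generative AI may have fundamentally changed the way such operations are run. The 2024 data differ markedly: the share of original content rises sharply, lexical overlap falls, and temporal coordination changes form, all of which is consistent with the active content generation and narrative targeting that generative AI enables. The study provides an empirical baseline for future work on the role of generative AI in cognitive operations and a reference for security practitioners building detection frameworks against generative-AI threats.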

Abstract

Cognitive operations are a rising concern in the geopolitical sphere, a quiet yet rigorous fight for public perception and decision making. While such operations have been extensively studied in the context of bot-driven amplification, the emergence of generative AI introduces a new set of capabilities that may have fundamentally altered how these operations are designed and executed. The possible evolution of cognitive operations via generative AI leaves nation states vulnerable without proper mitigation strategies. To address this, we compared behavioral and linguistic coordination patterns in X (formerly Twitter) datasets from the 2016 and 2024 U.S. presidential elections. Utilizing a combined corpus of over 133,000 posts, we applied post-type distribution, semantic clustering, temporal synchrony analysis, and Jaccard-based lexical overlap measures. Findings suggest that the 2024 corpus exhibits a distinct pattern from 2016. Original content rose from 59% to 93%, with retweets virtually disappearing; lexical overlap collapsed from a mean Jaccard score of 0.99 to 0.27, with posts converging on the same subject matter expressed in markedly different words; and temporal coordination shifted from pervasive cross-semantic synchrony to narratively concentrated co-occurrence. Taken together, these patterns point toward an operational logic organized around active content generation and narrative-specific targeting, characteristics consistent with generative AI involvement. These findings offer an empirical baseline for future research investigating generative AI's role in the cognitive operation pipeline and a practical reference point for security practitioners developing detection frameworks calibrated to the post-generative AI threat environment.
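
For readers unfamiliar with the Jaccard measure mentioned in the abstract, here is a minimal sketch of a token-level Jaccard score and its mean over all post pairs. The whitespace tokenization, lowercasing, and function names are assumptions for illustration; the abstract does not describe the authors' preprocessing.

```python
from itertools import combinations


def jaccard(post_a: str, post_b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase whitespace tokens."""
    tokens_a = set(post_a.lower().split())
    tokens_b = set(post_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen here: two empty posts count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def mean_pairwise_jaccard(posts):
    """Average Jaccard score over every unordered pair of posts."""
    pairs = list(combinations(posts, 2))
    if not pairs:
        raise ValueError("need at least two posts")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A mean near 1 indicates near-verbatim copies, as with retweet-style amplification, while a low mean with a shared topic indicates differently worded posts on the same subject.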

2605.13764 2026-05-14 cs.CR cs.IR cs.LG

VectorSmuggle: Steganographic Exfiltration in Embedding Stores and a Cryptographic Provenance Defense

Jascha Wanger

Affiliations: ThirdKey / Tarnover, LLC

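AI summary: This paper examines security risks in the embedding stores of modern retrieval-augmented generation (RAG) systems and presents VectorSmuggle, a steganographic exfiltration attack in which an attacker hides sensitive information in small embedding perturbations without affecting retrieval behavior. To counter the threat, the author proposes VectorPin, a protocol that uses cryptographic signatures to bind each embedding to its source content and producing model, so that tampering with an embedding can be detected and blocked.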

Comments 47 pages, 3 figures. Reference implementations: https://github.com/jaschadub/VectorSmuggle and https://github.com/jaschadub/VectorPin

Abstract

Modern retrieval-augmented generation (RAG) systems convert sensitive content into high-dimensional embeddings and store them in vector databases that treat the resulting numerical artifacts as opaque. Major vector-store products do not provide native controls for embedding integrity, ingestion-time distributional anomaly detection, or cryptographic provenance attestation. We show this opens a class of steganographic exfiltration attacks: an attacker with write access to the ingestion pipeline can hide payload data inside embeddings using simple post-embedding perturbations (noise injection, rotation, scaling, offset, fragmentation, and combinations thereof) while preserving the surface-level retrieval behavior the RAG system exposes to legitimate users. We evaluate these techniques across a synthetic-PII corpus on text-embedding-3-large, four locally hosted open embedding models, a cross-corpus replication on BEIR NFCorpus and a Quora subset (over 26,000 chunks combined), seven vector-store configurations, an adaptive-attacker variant of the detector evaluation, and a paraphrased-query retrieval benchmark. Distribution-shifting perturbations are often caught by simple anomaly detectors; small-angle orthogonal rotation defeats distribution-based detection across every (model, corpus) pair tested. A disjoint-Givens rotation encoder gives a closed-form per-vector capacity ceiling of floor(d/2) * b bits, but real embedding manifolds impose a capacity-detectability trade-off, and the retrieval-preserving operating point sits well below it. We propose VectorPin, a cryptographic provenance protocol that pins each embedding to its source content and producing model via an Ed25519 signature over a canonical byte representation. Any post-embedding modification breaks signature verification. Embedding-level integrity is a deployable, standardizable control that closes this attack class.
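
The capacity ceiling of floor(d/2) * b bits can be made concrete with a toy encoder. The sketch below is an assumption-laden illustration, not the paper's encoder: it rotates disjoint coordinate pairs (2i, 2i+1) by a small angle proportional to a b-bit symbol, and it assumes the decoder can recompute the original embedding. The step size `delta` and all function names are hypothetical.

```python
import math


def capacity_bits(d: int, b: int) -> int:
    """Ceiling quoted in the abstract: floor(d/2) disjoint pairs, b bits each."""
    return (d // 2) * b


def encode(vec, symbols, delta):
    """Rotate pair i by symbols[i] * delta radians (one Givens rotation per pair).

    Assumes 0 <= symbols[i] < 2**b and (2**b - 1) * delta < pi.
    """
    out = list(vec)
    for i, s in enumerate(symbols):
        c, sn = math.cos(s * delta), math.sin(s * delta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i], out[2 * i + 1] = c * x - sn * y, sn * x + c * y
    return out


def decode(original, stego, n_symbols, delta):
    """Recover symbols from the per-pair angle difference (needs the original)."""
    symbols = []
    for i in range(n_symbols):
        before = math.atan2(original[2 * i + 1], original[2 * i])
        after = math.atan2(stego[2 * i + 1], stego[2 * i])
        diff = math.remainder(after - before, 2 * math.pi)  # wrap into [-pi, pi]
        symbols.append(round(diff / delta))
    return symbols


def norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
```

Rotations preserve the vector norm exactly, and small angles keep the cosine similarity to the original close to 1, which is consistent with the abstract's observation that small-angle orthogonal rotation evades distribution-based detection.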

2605.13734 2026-05-14 cs.DC cs.AI cs.NI

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

Zedong Liu, Xinyang Ma, Dejun Luo, Hairui Zhao, Bing Lu, Wenjing Huang, Yida Gu, Xingchen Liu, Zheng Wei, Jinyang Liu, Dingwen Tao, Guangming Tan

Affiliations: University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Jiao Tong University

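AI summary: As large language models (LLMs) are widely deployed in production, distributed inference systems face a significant communication bottleneck, especially in transferring the key-value (KV) cache. To address this, the paper presents KVServe, a service-aware, adaptive KV cache compression framework that dynamically optimizes for varying workloads and network conditions through a unified modular strategy space, an efficient Bayesian profiling engine, and an online controller. Experiments show that KVServe substantially improves inference efficiency in disaggregated LLM serving, with up to a 9.13× speedup in total inference time (JCT) and up to a 32.8× reduction in time to first token (TTFT).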

Comments Accepted by SIGCOMM 2026

Abstract

LLMs are widely adopted in production, pushing inference systems to their limits. Disaggregated LLM serving (e.g., PD separation and KV state disaggregation) improves scalability and cost efficiency, but it also turns KV into an explicit payload crossing network and storage boundaries, making KV a dominant end-to-end bottleneck. Existing KV compression schemes are typically static runtime configurations, even though the production service context varies over time in workload mix, bandwidth, and SLO/quality budgets. As a result, a fixed choice can be suboptimal or even increase latency. We present KVServe, the first service-aware and adaptive KV communication compression framework for disaggregated LLM serving. KVServe (1) unifies KV compression into a modular strategy space with new components and cross-method recomposition; (2) introduces a Bayesian Profiling Engine that efficiently searches this space and distills a 3D Pareto candidate set, reducing offline search overhead by $50\times$; and (3) deploys a Service-Aware Online Controller that combines an analytical latency model with a lightweight bandit to select profiles under constraints and correct offline-to-online mismatch. Integrated into vLLM and evaluated across datasets, models, GPUs and networks, KVServe achieves up to $9.13\times$ JCT speedup in PD-separated serving and up to $32.8\times$ TTFT reduction in KV-disaggregated serving.

2605.13723 2026-05-14 cs.HC cs.AI cs.LG cs.SI

Humanwashing -- It Should Leave You Feeling Dirty

Ben Wilson, Matimba Swana, Peter Winter, Matt Roach

Affiliations: Computational Foundry, Swansea University, UK; University of Bristol, UK

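AI summary: This paper examines how the notion of a 'human in the loop' is misused in AI decision systems, arguing that it is often invoked to create a false sense of safety while masking problems such as bias, discrimination, and opacity. The authors criticize overuse of the loop metaphor, which blurs what human oversight actually means and achieves and enables 'humanwashing': using language to cast systems in a favorable light while avoiding scrutiny of their real effects. The paper calls for a deeper examination of what human oversight entails so that it can play its intended role.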

Comments 10 pages, 1 figure. Reviewed and accepted for presentation at HHAI 2026, Brussels

Abstract

The phrase 'human in the loop' is increasingly used to imply a sense of safety in relation to AI decision systems. It shouldn't. There are contexts where it can be applied appropriately, but these are not in the deployed decision systems we see dominating today. Human oversight of AI decision processes is one of the most popular proposals for addressing concerns, especially about bias, discrimination, misinformation, manipulation, accountability, and transparency. But there is insufficient examination of what human oversight actually means. The question raised in this paper is whether using the metaphor of a loop does anything to assist understanding of what is required and what is achieved in a particular decision context. Indiscriminate use of the loop metaphor obscures both processes and outcomes. It enables 'humanwashing', an activity analogous to 'greenwashing', where writers and commentators use language primarily aimed at putting systems in the best possible light.

2605.13708 2026-05-14 cs.CR cs.DC cs.LG

DisAgg: Distributed Aggregators for Efficient Secure Aggregation in Federated Learning

Haaris Mehmood, Giorgos Tatsis, Dimitrios Alexopoulos, Karthikeyan Saravanan, Jie Xu, Anastasios Drosou, Mete Ozay

Affiliations: Anonymous Institution, Anonymous City, Anonymous Region, Anonymous Country

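AI summary: This paper proposes DisAgg, a distributed aggregation protocol that aims to make secure aggregation in federated learning more efficient. A small set of selected clients, called Aggregators, performs the aggregation locally, which reduces communication rounds and computational overhead while keeping client updates hidden from a curious server. Experiments show that, on large-scale, high-dimensional update vectors, DisAgg achieves a 4.6x speedup over OPA, the previous best protocol.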

Comments Accepted to MLSys 2026; code available at: https://github.com/SamsungLabs/mlsys26_disagg

Abstract

Federated learning enables collaborative model training across distributed clients, yet vanilla FL exposes client updates to the central server. Secure-aggregation schemes protect privacy against an honest-but-curious server, but existing approaches often suffer from many communication rounds, heavy public-key operations, or difficulty handling client dropouts. Recent methods like One-Shot Private Aggregation (OPA) cut rounds to a single server interaction per FL iteration, yet they impose substantial cryptographic and computational overhead on both server and clients. We propose a new protocol called DisAgg that leverages a small committee of clients called Aggregators to perform the aggregation itself: each client secret-shares its update vector to Aggregators, which locally compute partial sums and return only aggregated shares for server-side reconstruction. This design eliminates local masking and expensive homomorphic encryption, reducing endpoint computation while preserving privacy against a curious server and a limited fraction of colluding clients. By leveraging optimal trade-offs between communication and computation costs, DisAgg processes 100k-dimensional update vectors from 100k 5G clients with a 4.6x speedup compared to OPA, the previous best protocol.
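
The flow described in the abstract (clients secret-share updates to Aggregators, Aggregators sum what they receive, the server reconstructs only the total) can be sketched with plain additive secret sharing. This is a simplification under stated assumptions: the abstract does not say which sharing scheme DisAgg uses, this version does not handle dropouts or collusion thresholds, and updates are assumed to be already quantized to non-negative integers. `MODULUS` and the function names are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus; must exceed the largest possible sum


def share(update, n_aggregators):
    """Split one client's integer update vector into additive shares mod MODULUS."""
    if n_aggregators < 2:
        raise ValueError("need at least two aggregators")
    random_shares = [
        [secrets.randbelow(MODULUS) for _ in update]
        for _ in range(n_aggregators - 1)
    ]
    # The last share is chosen so that all shares sum to the update.
    last = [
        (x - sum(column)) % MODULUS
        for x, column in zip(update, zip(*random_shares))
    ]
    return random_shares + [last]


def sum_vectors(vectors):
    """Elementwise sum mod MODULUS; used by aggregators and by the server."""
    return [sum(column) % MODULUS for column in zip(*vectors)]


def secure_sum(client_updates, n_aggregators):
    """Simulate one round: share, aggregate per aggregator, reconstruct."""
    per_client = [share(u, n_aggregators) for u in client_updates]
    # Aggregator j only ever sees the j-th share from each client.
    partial_sums = [
        sum_vectors([shares[j] for shares in per_client])
        for j in range(n_aggregators)
    ]
    # The server sees only the partial sums, never an individual update.
    return sum_vectors(partial_sums)
```

Each individual share is uniformly random, so a single aggregator or the server alone learns nothing about one client's update; only the combined partial sums reveal the total.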

2605.13706 2026-05-14 cs.CR cs.AI cs.CY cs.NI

Identifying AI Web Scrapers Using Canary Tokens

Steven Seiden, Triss Ren, Caroline Zhang, Taein Kim, Enze Liu, Emily Wenger

Affiliations: Duke University; University of Pittsburgh; Carnegie Mellon University

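AI summary: This paper proposes a novel method for accurately identifying the web scrapers that supply data to large language models (LLMs). It deploys unique canary tokens on dynamic websites and checks whether LLM-generated content contains them, thereby inferring which scrapers feed which LLMs. Experiments show that the method reliably identifies scraper-to-LLM data relationships, including several that have not been publicly disclosed, giving third parties a new way to monitor and control scraping.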

Abstract

From pre-training to query-time augmentation, web-scraped data helps to improve the quality and contextual relevancy of content generated by large language models (LLMs). However, large-scale web scraping to feed LLMs can affect site stability and raise legal, privacy, or ethics concerns. If website owners wish to limit LLM-related web scraping on their site, due to these or other concerns, they may turn to scraper access control mechanisms like the Robots Exclusion Protocol. To be most effective, such mechanisms require site owners to first identify the scrapers that they wish to restrict (e.g., via User-Agent strings). Existing mechanisms to identify LLM-related scrapers rely on voluntary disclosure by companies, one-off experiments by researchers, or crowd-sourced reports -- methods that are neither reliable nor scalable. This paper proposes a novel technique for accurately and automatically inferring LLM-related scrapers. We host dynamic websites that serve unique canary tokens to each visiting scraper, then prompt LLMs for information about our sites. If an LLM consistently generates outputs containing tokens unique to a scraper, it provides evidence of exposure to that scraper. Via experiments across 22 production LLM systems, we demonstrate that our approach can reliably identify which scrapers feed which LLM, including several that are not publicly known or disclosed by the companies. Our approach provides a promising avenue for unprivileged third parties to infer which scrapers serve data to which LLMs, potentially enabling better control over unwanted scraping.
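
The canary mechanism can be illustrated with a small sketch: derive a token that is unique to each visiting scraper, serve it in the page, then look for it in LLM output. Keying the token on the User-Agent string, the HMAC construction, the secret, and the function names are all assumptions made for illustration; the abstract does not describe how the authors generate or match tokens.

```python
import hashlib
import hmac

SITE_SECRET = b"hypothetical-site-secret"  # placeholder; keep private in practice


def canary_for(user_agent: str) -> str:
    """Deterministic, hard-to-guess token tied to one scraper's User-Agent."""
    digest = hmac.new(SITE_SECRET, user_agent.encode("utf-8"), hashlib.sha256)
    return "canary-" + digest.hexdigest()[:16]


def attribute(llm_output: str, known_user_agents):
    """Return the scrapers whose canary token appears in the LLM output."""
    return [ua for ua in known_user_agents if canary_for(ua) in llm_output]
```

A single match is weak evidence; the abstract stresses that an LLM must consistently produce tokens unique to a scraper before exposure is inferred, so in practice `attribute` would be run over many prompts and the hits counted.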

2605.13669 2026-05-14 eess.SY cs.RO cs.SY math.DS

Bounded-Input True Proportional Navigation for Impact-Time Control

Lohitvel Gopikannan, Shashi Ranjan Kumar, Abhinav Sinha

Affiliations: Intelligent Systems & Control (ISaC) Lab, Department of Aerospace Engineering, Indian Institute of Technology Bombay; Guidance, Autonomy, Learning, and Control for Intelligent Systems (GALACxIS) Lab, Department of Aerospace Engineering and Engineering Mechanics, University of Cincinnati

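AI summary: This paper proposes a nonlinear guidance strategy that intercepts a constant-velocity, non-maneuvering target while strictly satisfying bounds on the control input (commanded acceleration). The method builds on true proportional-navigation guidance (TPNG), uses an exact time-to-go formulation applicable to a wide range of target motions, and derives an input-constrained guidance law via sliding mode control to achieve time-constrained interception. The strategy's performance is evaluated across a range of engagement scenarios.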

Comments Preprint; Accepted for presentation at the 15th Asian Control Conference, June 17th-21st, 2026, Indonesia

Abstract

This paper proposes a nonlinear guidance strategy capable of intercepting a constant-velocity, non-maneuvering target while strictly satisfying the prescribed bounds on the control input (commanded acceleration). Unlike conventional strategies that estimate time-to-go using linearization or small-angle approximations, the proposed strategy employs true proportional-navigation guidance (TPNG) as a baseline, which utilizes an exact time-to-go formulation and is applicable over a wide range of target motions. In contrast to most existing strategies, which do not incorporate control input bounds into the guidance design, the proposed approach explicitly accounts for these limits by modeling the interceptor acceleration as a dynamic variable. Based on the sliding mode control technique, an effective guidance law that achieves time-constrained interception while accounting for bounded input is then derived. The performance of the proposed strategy is evaluated for various engagement scenarios.

2605.13642 2026-05-14 stat.ML cs.LG stat.CO

Conformal Anomaly Detection in Python: Moving Beyond Heuristic Thresholds with 'nonconform'

Oliver Hennhöfer, Maximilian Kirsch, Christine Preisach

Affiliations: Intelligent Systems Research Group, Karlsruhe University of Applied Sciences

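AI summary: This paper introduces 'nonconform', a Python package for calibrated anomaly detection within machine-learning workflows that addresses the reliance of conventional methods on heuristic thresholds. Building on the statistical assumption of exchangeability, the package converts anomaly scores into statistically meaningful p-values and supports several calibration strategies across a range of anomaly detectors. Combining code examples with theory, the paper shows how to apply conformal anomaly detection in practice and reports empirical support for its statistical validity.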

Comments 20 pages, 4 figures

Abstract

Most anomaly detection systems output scores rather than calibrated decisions, leaving practitioners to choose thresholds heuristically and without clear statistical interpretation. Conformal anomaly detection addresses this limitation by converting anomaly scores into calibrated p-values that are valid under the statistical assumption of data exchangeability, with a growing literature extending this idea beyond that setting. We present 'nonconform', a Python package for applying conformal anomaly detection within existing machine-learning workflows, and use it as the basis for an implementation-grounded introduction to the field. The package integrates with 'scikit-learn', 'pyod', and custom anomaly detectors, and provides a unified interface for calibration, p-value generation, and false discovery rate control. It supports several conformalization strategies, ranging from simple split-conformal calibration to more data-efficient and shift-aware extensions. Through a progression from foundational concepts to advanced conformalization strategies, complemented by code examples, the paper connects the statistical ideas behind conformal anomaly detection to their practical use in 'nonconform'. Empirical results demonstrate that the implemented methods enable statistically principled anomaly detection. Together, the package and exposition aim to make core conformal anomaly detection workflows more accessible and reproducible in experimental and production-oriented settings.
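
To show what "converting anomaly scores into calibrated p-values" and "false discovery rate control" mean in the simplest case, here is a from-scratch sketch of split-conformal p-values followed by the Benjamini-Hochberg procedure. It deliberately does not use or imitate the 'nonconform' API, which the abstract does not show; function names are hypothetical, and higher scores are assumed to mean "more anomalous".

```python
def conformal_p_values(calibration_scores, test_scores):
    """Split-conformal p-value: (1 + #{calibration >= test}) / (n + 1).

    `calibration_scores` come from held-out normal data scored by any detector.
    """
    n = len(calibration_scores)
    return [
        (1 + sum(c >= s for c in calibration_scores)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Indices flagged as anomalies by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])
```

The p-value replaces a hand-tuned score threshold with a quantity that has a statistical interpretation under exchangeability, and the second step chooses which test points to flag at a target false discovery rate `alpha`.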

2605.13638 2026-05-14 quant-ph cs.LG

CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem

Ankit Kulshrestha, Xiaoyuan Liu

Affiliations: Fujitsu Research of America

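AI summary: This paper presents CO-MAP, a reinforcement-learning approach to qubit allocation, the mapping of logical qubits to physical qubits that forms a key subproblem in quantum compilation. It formulates qubit allocation as a combinatorial optimization problem, trains a reinforcement learning policy to search for a mapping, and adds a local search algorithm to further reduce the number of additional SWAP gates. Experiments show that CO-MAP reduces SWAP overhead by 65% to 85% relative to conventional methods on several real-world datasets.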

Comments Under review at NeurIPS'26

Abstract

A quantum compiler is a critical piece in the quantum computing pipeline since it allows an abstract quantum circuit to be run on a physical quantum computer. One extremely important subproblem in quantum compilation is the generation of a logical-to-physical qubit mapping. Typically in quantum compilers this step is implemented as either a random or a heuristic-based assignment that aims to minimize additional (SWAP) gate overhead in the quantum circuit. In this paper, we present an alternative approach to solving the qubit mapping problem. Specifically, we formulate the qubit mapping problem with a combinatorial optimization (CO) objective. We then present a method to find a solution to the CO problem by training a reinforcement learning (RL) policy. We also propose a local-search-based post-processing algorithm to further reduce the overhead. Our results show a dramatic improvement over conventional techniques in reducing the number of SWAPs. On different real-world datasets like MQTBench and Queko circuits, our trained policy achieves a 65-85% reduction in SWAP overhead when compared to existing quantum compilers.

2605.13619 2026-05-14 physics.optics cs.CV

DeepFilters: Scattering-Aware Pupil Engineering with Learned Digital Filter Reconstruction for Extended Depth of Field Microscopy

Joseph L. Greene, Suet Ying Chan, Qilin Deng, Jeffrey Alido, Alexandra Lion, Guorong Hu, Ruipeng Guo, Tongyu Li, Kivilcim Kiliç, Ian Davison, Lei Tian

Affiliations: Boston University, Department of Electrical and Computer Engineering; Georgia Tech Research Institute, Electro-Optical Systems Lab; Boston University, Department of Biology; Harvard Medical School, Brigham and Women's Hospital, Department of Orthopedic Surgery; Boston University, Neurophotonics Center; Boston University, Department of Biomedical Engineering

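AI summary: DeepFilters is a deep optics framework for extended depth of field microscopy that targets the degraded image quality of conventional and existing deep-learning approaches in scattering tissue. It jointly optimizes a parameterized pupil filter and a digital-filter-based reconstruction network through a differentiable forward model, generalizing broadly without retraining. DeepFilters incorporates empirical scattering kernels, physics-guided regularization, and a hybrid genetic-gradient initialization strategy, and it markedly extends imaging depth and signal recovery in both clear media and biological tissue.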

Comments 38 pages (18 main text, 20 supplement), 23 Figures (7 main text, 16 supplement)

Abstract

Extended depth of field microscopy encodes axial information into a single acquisition through engineered point spread functions, but conventional and deep optics approaches are subject to degradation in scattering tissue. We introduce DeepFilters, a scattering-aware deep optics framework that jointly optimizes a parameterized pupil filter and a digital-filter-based reconstruction network through a calibrated differentiable forward model to achieve broad generalization without retraining. Incorporating empirical scattering kernels, physics-guided regularization, and a hybrid genetic-gradient initialization strategy, DeepFilters extends the PSF from 16 micron to >400 micron in clear media and enables signal recovery beyond 120 micron deep in biological tissues, validated across fixed brain slices and sea urchin embryos.

2605.13618 2026-05-14 cond-mat.mtrl-sci cs.AI

OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research

Peng Kang, Bixuan Li, Xiaoya Huang, Shuo Shi, Weiqiao Zhou, Zhen Li, Yu Liu, Lei Zheng

Affiliations: National Key Laboratory of AI for Materials Science; Tianmushan Laboratory; Beihang University

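AI summary: This study proposes OpenAaaS, an open-source framework that addresses the security and organizational challenges of cross-institutional collaboration in distributed materials-informatics research. Built on the principle that "code flows, data stays still", it uses a hierarchical architecture of a Master Agent and sub-agents to decompose and execute complex research tasks while preserving data sovereignty and the secure isolation of computational resources. Two case studies, on materials literature analysis and on building a high-entropy alloy database, validate the framework and show its potential as a scalable foundation for next-generation intelligent materials design platforms.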

Comments 20 pages, 5 figures

Abstract

The Materials Genome Initiative catalyzed the proliferation of centralized platforms--SaaS, PaaS, and IaaS--that aggregate computational and experimental resources for accelerated materials discovery. In parallel, breakthroughs in large language models (LLMs) and autonomous agents have created powerful new reasoning capabilities for scientific research. Yet a critical "last mile" problem remains: while we possess world-class models and vast repositories of materials data, we lack the organizational infrastructure to compose these capabilities securely across institutional boundaries. The development of structural and functional materials for harsh service environments--high-temperature alloys, radiation resistant steels, corrosion-resistant coatings--remains characterized by long-term iteration, mechanistic complexity, and high domain expertise--demands that exceed both monolithic agent systems and traditional centralized platforms. To address this gap we propose OpenAaaS, an open-source hierarchical and distributed Agent-as-a-Service framework that enables organized multi-agent collaboration for intelligent materials design. OpenAaaS is built on a single foundational principle: code flows, data stays still. A Master Agent plans and decomposes complex research tasks without requiring direct access to subordinate agents' managed data and computational resources. Sub-agents, deployed as near-data execution nodes, retain full sovereignty over local datasets, proprietary algorithms, and specialized hardware. This architecture guarantees that raw data never leaves its domain of origin while enabling cross-scale, cross-domain secure integration of previously isolated materials intelligence silos. 
We validate the framework through two representative case studies: (i) AlphaAgent, an evidence-grounded materials literature analysis executor that achieves 4.66/5.0 on deep analytical questions against single-pass RAG baselines; and (ii) an ultra-large-scale hexa-high-entropy alloy descriptor database service that demonstrates secure near-data execution and domain-specific scientific workflows under strict data-sovereignty constraints. OpenAaaS establishes a principled pathway toward "organized research" via agent collectives, offering a scalable foundation for next-generation materials intelligent design platforms. All source code is available at https://github.com/Wolido/OpenAaaS.

2605.13589 2026-05-14 stat.ML cs.LG

Causal Learning with the Invariance Principle

Francesco Montagna, Francesco Locatello

Affiliations: Institute of Science and Technology Austria

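AI summary: This paper studies causal discovery, the problem of inferring the direction of causality between variables. Using structural causal models (SCMs), the authors show that, under the assumption that causal relations are acyclic and invariant across environments, only two auxiliary environments are needed to infer the causal graph for arbitrary nonlinear mechanisms. The result guarantees identifiability of the causal graph and, further, correct counterfactual inference; the theory is supported by experiments on synthetic data.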

Abstract

Causal discovery, the problem of inferring the direction of causality, is generally ill-posed. We use the language of structural causal models (SCM) to show that assuming that the causal relations are acyclic and invariant across multiple environments (e.g., the way minimum wage affects employment rate is stable across different geographical regions), only two auxiliary environments are sufficient to infer the causal graph for arbitrary nonlinear mechanisms. Moreover, we demonstrate that this implies identifiability of the SCM functional mechanisms: as a corollary, we show that two auxiliary environments are sufficient to guarantee correct counterfactual inference. We empirically support our theoretical results on synthetic data.

2605.13574 2026-05-14 cs.HC cs.AI

Beyond Anthropomorphism: Exploring the Roles of Perceived Non-humanity and Structural Similarity in Deep Self-Disclosure Toward Generative AI

Satoru Shibuya

Affiliations: Graduate School of Business and Finance, Waseda University

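AI summary: This study examines psychological factors behind users' deep self-disclosure toward generative AI, focusing on two factors beyond anthropomorphism: perceived non-humanity and structural similarity. Perceived non-humanity may reduce users' evaluation apprehension, while structural similarity reflects how closely the user's thinking aligns logically with the AI's responses. Based on survey data from 2,400 participants, users high in both perceptions were more likely than the baseline group to engage in deep self-disclosure, suggesting that trust-related behavior in deep self-disclosure may involve key factors other than anthropomorphism.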

Comments Submitted to International Journal of Human-Computer Interaction (IJHCI). 35 pages, 2 tables, 3 figures

Abstract

This study investigates deep self-disclosure toward generative AI by examining perceived non-humanity and structural similarity as psychological factors beyond anthropomorphism. Perceived non-humanity may reduce evaluation apprehension, whereas structural similarity refers to the perceived logical alignment between a user's thinking and AI responses. Using cross-sectional survey data from 2,400 participants collected in 2025, this study analyzed associations with both the occurrence and depth of self-disclosure. Logistic regression indicated that the group high in both perceptions (Segment D) showed a significantly higher likelihood of disclosure than the baseline group (Segment A; OR = 11.35). ANOVA further showed significant between-group differences in disclosure depth. The findings suggest that trust-related behavior in deep self-disclosure may involve factors other than anthropomorphic perception. Because the study is exploratory and based on self-reported survey data, the results should be interpreted as associative rather than causal, and future longitudinal or experimental research is needed.

2605.13555 2026-05-14 physics.med-ph cs.AI

Generating synthetic computed tomography for radiotherapy: SynthRAD2025 challenge report

Viktor Rogowski, Maarten L. Terpstra, Niklas Wahl, Florian Kamp, Erik van der Bijl, Arthur Jr. Galapon, Christopher Kurz, Bowen Xin, Zhengxiang Sun, Hollie Min, Gregg Belous, Jason Dowling, Yan Xia, Siyuan Mei, Fuxin Fan, Arthur Longuefosse, Javier Sequeiro Gonzalez, Miguel Diaz Benito, Alvaro Garcia Martin, Fabien Baldacci, Valentin Boussot, Cédric Hémon, Jean-Claude Nunes, Jean-Louis Dillenseger, Zhiyuan Zhang, Jinghua Cai, Han Bing, Tan Zuopeng, Ricardo Brioso, Daniele Loiacono, Guillaume Landry, Adrian Thummerer, Matteo Maspero

Affiliations: Radiation Physics, Department of Hematology, Oncology Radiation Physics, Skåne University Hospital, Lund, Sweden; Medical Radiation Physics, Department of Clinical Sciences Lund, Lund University, Lund, Sweden; Radiotherapy Department, University Medical Center Utrecht, Utrecht, The Netherlands; Computational Imaging Group for MR Diagnostics & Therapy, University Medical Center Utrecht, Utrecht, The Netherlands; Division of Medical Physics in Radiation Oncology, Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, Germany; Heidelberg Institute for Radiation Oncology (HIRO) National Center for Radiation Research in Oncology (NCRO), Heidelberg, Germany; Department of Radiation Oncology Cyberknife Center, University Hospital of Cologne, Cologne, Germany; Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, The Netherlands; Department of Radiation Oncology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands; Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany; Bavarian Cancer Research Center (BZKF), Munich, Germany; Australian eHealth Research Center, CSIRO, Brisbane, Australia; School of Computer Science, University of Sydney, Sydney, Australia; Department of Orthodontics Pattern Recognition Lab, FAU Erlangen-Nuremberg, Germany; RIKEN Center for Integrative Medical Sciences, Tokyo, Japan; Erasmus Mundus Joint Master's Degree IPCVai, University of Bordeaux, France; Computer Science, Huazhong University of Science Huazhong University of Science Canon Medical Systems (China) CO., LTD., Beijing, China; Department of Radiation Oncology, Inselspital, Bern University Hospital University of Bern, Bern, Switzerland

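AI summary: To address the need for synthetic CT (sCT) generation in radiotherapy, the study presents the SynthRAD2025 challenge, which aims to convert MRI or CBCT images into synthetic CT images with accurate CT values using deep learning. Performance on two tasks (MRI-to-CT and CBCT-to-CT) was evaluated on data from 2,362 patients at five European centers. The results show that deep learning methods yield clinically relevant sCTs in terms of image quality and dose calculation, and perform especially well on CBCT-to-CT, while MRI-to-CT remains challenging. The link between image quality and dosimetric accuracy is limited, which underscores the importance of dose-based evaluation in clinical validation.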

Comments 59 pages total: 26 pages main article + supplementary material; 8 figures in the main manuscript and 3 supplementary figures. Currently under review at the journal Medical Image Analysis (MIA)

English abstract

Radiation therapy (RT) requires precise dose delivery over multiple fractions, with CT fundamental for treatment planning due to its electron density information. Repeated CT acquisitions impose radiation exposure and logistical burdens, MRI lacks electron density, and cone-beam CT (CBCT) requires correction for dose calculation. Synthetic CT (sCT) generation addresses these by converting MRI or CBCT into CT-equivalent images with accurate Hounsfield Unit (HU) values, enabling MRI-only RT and CBCT-based adaptive workflows. Building on SynthRAD2023, SynthRAD2025 benchmarked sCT methods on 2,362 patients from five European centers across head and neck, thorax, and abdomen. Two tasks, MRI-to-CT (890 cases) and CBCT-to-CT (1,472 cases), were evaluated via image similarity (MAE, PSNR, MS-SSIM), segmentation (Dice, HD95), and dosimetric metrics from photon and proton plans. With 803 participants and 12/13 valid submissions, Task 1 top performance reached MAE $64.8\pm21.3$ HU, PSNR $\sim$30 dB, MS-SSIM $\sim$0.936, Dice 0.79, photon $\gamma_{2\%/2\text{mm}}>98\%$, proton $\gamma\approx85\%$. Task 2 improved: MAE $48.3\pm13.4$ HU, PSNR 32.6 dB, MS-SSIM 0.968, Dice 0.86, photon $\gamma>99\%$, proton $\gamma\approx89\%$. Strong image--segmentation correlations ($\rho=0.78$--$0.79$) but moderate dose correlations confirmed that image quality is insufficient as a dosimetric surrogate. Head-and-neck cases were most consistent; thoracic and abdominal cases showed greater variability. Residual errors at tissue interfaces propagate along beam paths, affecting proton dose more than photon. SynthRAD2025 demonstrates that deep learning yields clinically relevant sCTs, especially for CBCT-to-CT, while identifying persistent MRI-to-CT challenges and underscoring dose-based evaluation as essential for clinical validation.

2605.13525 2026-05-14 cs.HC cs.RO

Beyond VMAF: Towards Application-Specific Metrics for Teleoperation Video

Ines Trautmannsheimer, Richard Grauberger, Frank Diermeyer

Affiliations: Chair of Automotive Technology, Technical University of Munich, Munich, Germany

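AI summary: This study addresses video quality assessment for remote driving and proposes an application-specific improvement. By retraining the VMAF model on compressed video from real driving scenes, the authors obtain a quality model better suited to teleoperation, with improved agreement with human subjective ratings. Experiments show clear gains over the original model on key metrics, and also reveal that conventional objective metrics can overestimate video quality when regions critical to driving are degraded.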

Comments Preprint ITSC 2026

English abstract

Automated driving has made remarkable progress, yet situations still arise where human intervention is necessary. Teleoperation provides a scalable solution to address such cases, enabling remote operators to support vehicles without being physically present. In this context, video transmission forms the operator's primary source of situational awareness, making video quality a decisive factor for both safety and task performance. In an online study, participants gave subjective quality ratings for compressed video sequences from the Zenseact Dataset. These ratings were then used to retrain the Video Multi-Method Assessment Fusion (VMAF) model, yielding an adapted variant tailored to teleoperation. The retrained model demonstrated improved alignment with human ratings compared to the original 4K VMAF. In particular, RMSE decreased from 10.36 to 8.83, and MAD from 8.71 to 6.38, corresponding to improvements of 15% and 27%, respectively. These results highlight that incorporating domain-specific data can enhance the predictive power of established quality metrics in safety-critical applications. At the same time, outlier cases emerged in which videos received high objective scores despite noticeable degradations in regions critical for the driving task.
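
The quoted 15% and 27% improvements can be checked directly from the reported error values. The short sketch below recomputes them; the helper name `relative_reduction` is ours and does not come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional decrease when a value drops from `before` to `after`."""
    return (before - after) / before


# Error values as reported in the abstract (original 4K VMAF vs. retrained model).
rmse_gain = relative_reduction(10.36, 8.83)
mad_gain = relative_reduction(8.71, 6.38)

print(f"RMSE improvement: {rmse_gain:.1%}")
print(f"MAD improvement:  {mad_gain:.1%}")
```

Rounded to whole percentages, both values match the figures stated in the abstract.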

2605.13503 2026-05-14 cs.CR cs.LG

Limits of Personalizing Differential Privacy Budgets

Edwige Cyffers, Juba Ziani

Affiliations: CNRS, LAMSADE, Dauphine-PSL; Georgia Institute of Technology

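AI summary: This paper studies the limits of personalized differential privacy budgets in balancing privacy requirements against utility. For mean estimation, the key factor is not full personalization but choosing the right effective privacy budget, which can be achieved with a simple thresholding operation. Relative to this baseline, fully personalized mechanisms bring limited gains. The paper precisely quantifies the constant-factor improvement for mixed private and public datasets and for private datasets with two levels of privacy requirements, and gives upper bounds and regimes of maximal gain for arbitrary privacy requirements.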

English abstract

A key technical difficulty in differential privacy is selecting a privacy budget that satisfies privacy requirements while maximizing utility. A natural and well-studied workaround is to use personalized privacy budgets, which may differ across agents. In this paper, we show that personalized budgets come with major limitations and that for mean estimation, the dominant factor is not full personalization, but rather choosing the right effective privacy budget. This can be achieved through a simple thresholding operator that we describe. Compared with this thresholding baseline, the gains obtained by fully personalized mechanisms are limited. In particular, we precisely quantify the constant-factor improvement in settings with mixed private and public datasets and in private datasets with two levels of privacy requirements. We also establish upper bounds and identify regimes of maximal gain for arbitrary privacy requirements.

2605.13501 2026-05-14 cs.AR cs.LG

Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

Qingyun Zou, Yingze Li, Tianen Liu, Bingsheng He, Weng-Fai Wong

Affiliations: National University of Singapore

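AI summary: This study targets LLM-based generation of SystemVerilog Assertions from natural language (NL2SVA). It points out that although aggregate accuracy appears close to saturation, current methods still have a performance bottleneck: on temporal constraints and liveness properties they fall back on a small set of templates. The authors propose Reward-Weighted On-Policy Distillation (RWOPD), which uses an open-source property-equivalence verifier to score and guide generated outputs, improving the model's grasp of assertion semantics while keeping generation quality. The method sets a new state of the art on several benchmarks, outperforming both existing specialized models and large general-purpose models.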

English abstract

LLM-based generation of SystemVerilog Assertions (SVA) is often reported as nearing saturation, with the strongest specialized model reaching ${\sim}76\%$ accuracy on NL2SVA-Human. We show that this aggregate hides a temporal gap: models that appear strong overall still collapse to a few implication templates on bounded-delay and liveness specifications. The core issue is that the dominant recipe, supervised fine-tuning on NL/SVA pairs, optimizes token-level mimicry rather than the \emph{property equivalence} that defines SVA correctness. We introduce \emph{Reward-Weighted On-Policy Distillation} (RWOPD), an on-policy distillation method that samples student rollouts, scores them with an open SymbiYosys+Z3 Property-Equivalence Checker (PEC), and applies a verifier-reward-weighted forward-KL gradient from a frozen 14B teacher on verifier-passable rollouts. This keeps the supervision dense at every response token while grounding both selection and loss weight in property-equivalent behavior. RWOPD distills CodeV-SVA-14B into a Qwen2.5-Coder-7B-Instruct student that sets a new state of the art on NL2SVA-Human and NL2SVA-Machine across pass@1, pass@5, and pass@10, surpassing both specialized prior SOTA models and 671B general-purpose baselines.

2605.13496 2026-05-14 cs.DC cs.LG

MARLIN: Multi-Agent Game-Theoretic Reinforcement Learning for Sustainable LLM Inference in Cloud Datacenters

H. Moore, S. Qi, D. Milojicic, C. Bash, S. Pasricha

Affiliations: Colorado State University; Hewlett Packard Labs

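AI summary: As large language models (LLMs) are widely deployed on cloud platforms, inference consumes large amounts of energy and has become a major source of environmental burden. This paper proposes MARLIN, a multi-agent game-theoretic reinforcement learning framework that jointly optimizes time to first token, carbon emissions, water usage, and energy cost for LLM inference. Experiments show that MARLIN improves on existing methods across these key metrics, making LLM inference more sustainable.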

English abstract

Large Language Models (LLMs) have become increasingly prevalent in cloud-based platforms, propelled by the introduction of AI-based consumer and enterprise services. LLM inference requests in particular account for up to 90% of total LLM lifecycle energy use, dwarfing training energy costs. The rising volume of LLM inference requests is increasing environmental footprints, particularly carbon emissions and water consumption. To improve sustainability for LLM inference serving in cloud datacenter environments, we propose a novel multi-agent game-theoretic reinforcement learning framework called MARLIN to co-optimize time-to-first-token (TTFT), carbon emissions, water usage, and energy costs associated with LLM inference. MARLIN demonstrates a reduction of at least 18% in TTFT, 33% in carbon emissions, 43% in water usage, and 11% in energy costs compared to state-of-the-art LLM inference management frameworks.

2605.13448 2026-05-14 stat.ML cs.LG math.PR

On the Limits of Latent Reuse in Diffusion Models

Yifeng Yu, Lu Yu

Affiliations: Department of Mathematical Sciences, Tsinghua University; Department of Data Science, City University of Hong Kong

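AI summary: This paper studies how reliable latent-space reuse is for diffusion models under distribution shift. The authors consider source and target data that are both approximately low-dimensional but may lie near different subspaces, and show that reusing the source latent space induces a target-domain score error governed by two factors: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Building on these findings, they study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions, giving theoretical guidance on when latent reuse is appropriate.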

English abstract

Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.
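
To make "principal-angle misalignment" concrete, here is a minimal sketch for the simplest case, where the source and target subspaces are each a single direction, so there is exactly one principal angle. The vectors and the names `source_dir`, `target_dir`, and `lost_fraction` are illustrative assumptions; the paper's actual error bound is not reproduced here.

```python
import math


def principal_angle(u, v):
    """Principal angle between the 1-D subspaces spanned by u and v (in radians)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The absolute value makes the angle independent of the sign of the basis vectors.
    cosine = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.acos(cosine)


# Hypothetical source and target directions in a 3-D ambient space.
source_dir = [1.0, 0.0, 0.0]
target_dir = [1.0, 1.0, 0.0]

theta = principal_angle(source_dir, target_dir)

# Fraction of a unit target vector's squared norm that is lost when it is
# projected onto the frozen source direction.
lost_fraction = math.sin(theta) ** 2

print(f"principal angle: {math.degrees(theta):.1f} degrees")
print(f"lost fraction:   {lost_fraction:.2f}")
```

For higher-dimensional subspaces, the principal angles are usually obtained from the singular values of the product of two orthonormal basis matrices; the one-dimensional case above is the special case with a single singular value.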

2605.13433 2026-05-14 cs.DC cs.LG

TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation

Huichao Chai, Zhixin Wu, Xuemiao Li, Shiqing Fan, Hengfeng Wang, Maojun Peng, Lu Xu, Yaoyuan Wang, Yibo Jin, Wei Guo, Yongxiang Feng

Affiliations: Huawei

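AI summary: Generative recommendation (GR) is an emerging paradigm that replaces traditional fragmented recommendation architectures with a unified Transformer model, and its recommendation quality improves systematically as model capacity and training data grow. Large-scale GR training on Ascend NPUs, however, faces system-level challenges, such as the lack of high-performance implementations of jagged operators and a mismatch with the NPU's dense-computation-optimized architecture. The paper presents TurboGR, which addresses these bottlenecks through three innovations: (1) Ascend-adapted jagged-operator acceleration, including fused operators that eliminate padding redundancy and dynamic load balancing; (2) distributed communication optimization, comprising hierarchical sparse parallelism, semi-asynchronous training with convergence guarantees, and fine-grained pipeline scheduling; and (3) negative sampling optimization, which enlarges the effective negative space through asynchronous offloading, jaggedness-aware FP16 quantization, and intra-batch logit sharing. Experiments on the KuaiRand-27K dataset show that TurboGR supports training at up to 0.2B parameters and achieves 54.71% MFU with near-linear scalability.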

Comments 18 pages

English abstract

Generative recommendation (GR) has emerged as a promising paradigm that replaces fragmented, scenario-specific architectures with unified Transformer-based models, exhibiting scaling-law behavior where recommendation quality improves systematically with increased model capacity and training data. However, deploying GR at scale on Ascend NPUs faces fundamental system-level challenges. These challenges are further exacerbated on Ascend NPUs due to the absence of high-performance implementations for jagged operators and the architectural mismatch between irregular sparse primitives and NPU's dense-computation-optimized design. In this paper, we present TurboGR, an Ascend-affinity training system for generative recommendation that systematically addresses these bottlenecks through three core innovations: (i) Ascend-affinity jagged acceleration, including fusion operators that eliminate padding redundancy and dynamic load balancing that reduces inter-device imbalance from 47% to 2.4%; (ii) distributed communication optimization, comprising hierarchical sparse parallelism, semi-asynchronous training with proven convergence guarantees, and fine-grained pipeline orchestration that sustains 94% NPU utilization; and (iii) negative sampling optimization via asynchronous offloading, jaggedness-aware FP16 quantization, and intra-batch logit sharing that expand the effective negative space without additional embedding lookups. Evaluated on the KuaiRand-27K dataset, TurboGR supports training at up to 0.2B parameters and achieves 54.71% MFU with near-linear scalability (0.97).

2605.13411 2026-05-14 cs.CR cs.CL

Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Xiaozhe Zhang, Chaozhuo Li, Hui Liu, Shaocheng Yan, Bingyu Yan, Qiwei Ye, Haoliang Li

Affiliations: City University of Hong Kong; Beijing University of Posts and Telecommunications; Wuhan University; Beihang University; Beijing Academy of Artificial Intelligence

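AI summary: This study addresses the safety of large language models against adversarial prompts and proposes EvoSafety, a model-agnostic lifelong safety framework. By building persistent, inspectable, and reusable external structures, the method lets attack and defense strategies co-evolve. EvoSafety introduces an adversarial skill library to support continued vulnerability probing, and uses a lightweight auxiliary defense model to improve safety, making the defense efficient, transferable, and model-agnostic. Experiments show that in Guard mode it achieves a higher defense success rate than existing models while preserving performance on benign queries.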

Comments 48 pages, 7 figures

English abstract

Large language models remain vulnerable to adversarial prompts that elicit harmful outputs. Existing safety paradigms typically couple red-teaming and post-training in a closed, policy-centric loop, causing attack discovery to suffer from rapid saturation and limiting the exposure of novel failure modes, while leaving defenses inefficient, rigid, and difficult to transfer across victim models. To this end, we propose EvoSafety, an LLM safety framework built around persistent, inspectable, and reusable external structures. For red teaming, EvoSafety equips the attack policy with an adversarial skill library, enabling continued vulnerability probing through simple library expansion after saturation, while supporting the evolution of adversarial vectors. For defense learning, EvoSafety replaces model-specific safety fine-tuning with a lightweight auxiliary defense model augmented with memory retrieval. This enables efficient, transferable, and model-agnostic safety improvements, while allowing robustness to be enhanced solely through memory updates. With a single training procedure, the defense policy can operate in both Steer and Guard modes: the former activates the victim model's intrinsic defense mechanisms, while the latter directly filters harmful inputs. Extensive experiments demonstrate the superiority of EvoSafety: in Guard mode, it achieves a 99.61% defense success rate, outperforming Qwen3Guard-8B by 14.13% with only 37.5% of its parameters, while preserving reasoning performance on benign queries. Warning: This paper contains potentially harmful text.

2605.13367 2026-05-14 cs.LO cs.AI cs.DB

A Horn extension of DL-Lite with NL data complexity

Janos Arpasi, Bartosz Jan Bednarczyk, Magdalena Ortiz

Affiliations: Institute of Logic and Computation, TU Wien; Computer Science Department, University of Wrocław

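AI summary: This paper studies how to extend the description logic DL-Lite to support richer ontologies while keeping data complexity within NL (nondeterministic logarithmic space). The authors introduce a stratification mechanism that controls the interaction between conjunction and recursion in ELI, yielding a description logic called ELbotpreceq that strictly extends core DL-Lite, supports reachability axioms and restricted conjunction, and allows reasoning in NL. The NL upper bound on data complexity is established via a rewriting into nested two-way regular path queries (a fragment of GQL), which opens a path toward extending OMQA to graph query languages.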

Comments Submitted to Description Logic Workshop 2025. Full version in preparation

English abstract

The literature on ontology-mediated query answering (OMQA) has been shaped by two key results: first-order rewritability for DL-Lite, and PTime-hardness of data complexity for essentially every description logic beyond it. This has effectively positioned DL-Lite as the only practical choice for query rewriting, restricting OMQA solutions to first-order queries and ontologies that can be rewritten into them. This AC0 vs. PTime dichotomy is especially limiting if we consider that OMQA targets graph-structured data, and that standard graph query languages (including the recent ISO standards GQL and SQL/PGQ) are typically NL-complete. Towards identifying a rich Horn DL that can be rewritten into graph query languages and that can still express many ELI and DL-Lite ontologies, we introduce a stratification mechanism for ELI that controls the interaction between conjunction and recursion. In this way, we obtain ELbotpreceq, a description logic that strictly extends the core DL-Lite, supports reachability axioms and restricted conjunction, and allows for reasoning in NL. We establish the NL upper bound via a rewriting into nested two-way regular path queries, a fragment of GQL, providing initial evidence that our ontology language is a promising candidate for extending OMQA to graph query languages.

2605.13357 2026-05-14 cs.SE cs.AI

AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents

Hailin Zhong, Shengxin Zhu

Affiliations: Hong Kong Baptist University; Beijing Normal University

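AI summary: This paper examines the use of foundation models in autonomous software engineering and argues that agents are unreliable in real development settings not only because of model capability but also because the runtime substrate (the harness) is missing. The authors propose the AI Harness Engineering framework, which defines eleven component responsibilities and, through a four-level ladder of runtime support (H0-H3) and a trace-based evaluation protocol, makes agent behavior auditable and verifiable. The framework shifts the central question of autonomous software engineering from whether a model can produce a patch to whether the model-harness-environment system can produce a verifiable, attributed, and maintainable code change.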

Comments 16 pages

English abstract

Foundation models have transformed automated code generation, yet autonomous software-engineering agents remain unreliable in realistic development settings. The dominant explanation locates this gap in model capability. We propose a different locus: software-engineering capability emerges from a model-harness-environment system, in which a runtime substrate -- the harness -- mediates how a foundation-model agent observes a project, acts on it, receives feedback, and establishes that a change is complete. We formalize this substrate as AI Harness Engineering and identify eleven component responsibilities: task specification, context selection, tool access, project memory, task state, observability, failure attribution, verification, permissions, entropy auditing, and intervention recording. We operationalize the harness through a four-level ladder (H0-H3) that progressively exposes runtime support to the agent, and we propose a trace-based evaluation protocol that converts each agent run into an auditable episode package. Applied to a controlled validation task, the framework yields episode packages whose evidence structure varies systematically with harness level: lower levels produce only a final patch, higher levels produce reproduction logs, failure attributions, deterministic requirement checks, and structured verification reports. The framework reframes the central question of autonomous software engineering from whether a foundation model can produce a patch to whether the model-harness-environment system can produce a verifiably correct, attributed, and maintainable change. We outline a research program for the runtime systems that foundation-model software agents will require.

2605.12388 2026-05-14 cs.MA cs.LG

Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning

Hannes Büchi, Manon Flageat, Eduardo Sebastián, Amanda Prorok

Affiliations: Department of Computer Science and Technology, University of Cambridge

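AI summary: This paper studies how to achieve behavioral diversity among agents in multi-agent reinforcement learning as task conditions change. Existing methods bind fixed behaviors to fixed identities and struggle with tasks that require switching roles at specific moments. The authors propose an event-driven framework that decouples agent identity from behavior, introducing Neural Manifold Diversity (NMD) and an event-based hypernetwork to generate behaviors and adapt policies on the fly, so that behavioral diversity increases without interfering with reward maximization. Experiments show strong performance across several benchmark tasks.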

English abstract

Effective multi-agent cooperation requires agents to adopt diverse behaviors as task conditions evolve, and to do so at the right moment. Yet, current Multi-Agent Reinforcement Learning (MARL) frameworks that facilitate this diversity are still limited by the fact that they bind fixed behaviors to fixed agent identities. Consequently, they are ill-equipped for tasks where agents need to take on different roles at very specific moments in time. We argue that, to define these behavioral transitions, the missing ingredient is $\textbf{events}$. Events are changes in the state of the system that induce qualitative changes in the task. Based on this view, we introduce a framework that decouples agent identity from behavior, capturing a continuous manifold from which agents instantiate their behaviors in response to events. This framework is based on two elements. First, to build an expressive behavior manifold, we introduce Neural Manifold Diversity (NMD), a formal distance metric that remains well-defined when behaviors are transient and agent-agnostic. Second, we use an event-based hypernetwork that generates Low-Rank Adaptation (LoRA) modules over a shared team policy, enabling on-the-fly agent-policy reconfiguration in response to events. We prove that this construction ensures that diversity does not interfere with reward maximization by design. Empirical results demonstrate that our framework outperforms established baselines across benchmarks while exhibiting zero-shot generalization, and being the only method that solves tasks requiring sequential behavior reassignment.

2605.11968 2026-05-14 physics.ao-ph cs.LG

Assessment of cloud and associated radiation fields from a GAN stochastic cloud subcolumn generator

Dongmin Lee, Lazaros Oreopoulos, Nayeong Cho, Daeho Jin

Affiliations: GESTAR-II, Morgan State University; Earth Sciences Division, NASA’s Goddard Space Flight Center; GESTAR-II, University of Maryland – Baltimore County

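AI summary: This paper presents a new cloud subcolumn generator based on deep generative models to improve the simulation of cloud and radiation fields in Earth system models. The method combines a conditional variational autoencoder with a generative adversarial network (CVAE-GAN) and a U-Net architecture, generating subcolumn distributions of cloud occurrence and optical depth more accurately and improving cloud overlap distributions and radiation calculations. Experiments show that it markedly reduces the error in joint distributions of cloud-top pressure and optical thickness and cuts the global-mean shortwave cloud radiative effect bias by two thirds, offering a feasible way to improve the simulation of cloud-radiation interactions.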

English abstract

Modern Earth System Models (ESMs) operate on horizontal scales far larger than typical cloud features, requiring stochastic subcolumn generators to represent subgrid horizontal and vertical cloud variability. Traditional physically-based generators often rely on analytical cloud overlap paradigms, such as exponential-random decorrelation, which can struggle to capture the complex, anti-correlated behavior of non-contiguous cloud layers. In this study, we introduce a novel two-stage machine learning subcolumn generator for the GEOS atmospheric model, utilizing a Conditional Variational Autoencoder combined with a Generative Adversarial Network (CVAE-GAN) and a U-Net architecture. Trained on a merged CloudSat-CALIPSO height-resolved cloud optical depth dataset, the ML generator creates 56 stochastic subcolumns representing cloud occurrence and optical depth profiles. Evaluated against the established Räisänen generator, the ML approach accurately reproduces bimodal cloud overlap distributions, significantly reduces biases in grid-mean statistics, and halves the root-mean-square error in ISCCP-style cloud-top pressure and optical thickness joint histograms. The improvements brought by our deep generative models translate into more accurate offline radiative transfer calculations, reducing the global-mean shortwave top-of-atmosphere cloud radiative effect bias by a factor of three. Provided that the generator can be accelerated on CPUs, this offers a practical pathway to reduce structural errors at the cloud-radiation interface.

2605.10888 2026-05-14 cs.LO cs.AI

Shields to Guarantee Probabilistic Safety in MDPs

Linus Heck, Filip Macák, Roman Andriushchenko, Milan Češka, Sebastian Junges

Affiliations: Radboud University; Brno University of Technology

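AI summary: This paper studies how shielding can guarantee probabilistic safety in Markov decision processes (MDPs). Classical shielding aims to avoid dangerous events entirely, but its strong guarantees on safety and permissiveness are hard to preserve when some probability of risk is acceptable. The authors present a formal framework that extends classical shields to probabilistic safety, prove that the strong guarantees cannot all be preserved, provide natural shields with weaker guarantees, and give offline and online shield constructions that ensure strong safety guarantees. Experiments confirm the practical value and computational feasibility of the new shields.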

Comments Accepted to CAV 2026

English abstract

Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has proven to be more intricate. This paper presents a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we (i) demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, (ii) provide natural shields with weaker guarantees, and (iii) introduce offline and online shield constructions ensuring strong safety guarantees. The empirical evaluation highlights the practical advantages of the new shields, as well as their computational feasibility.
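
As a rough illustration of what a shield for probabilistic safety might look like, the sketch below computes the minimal probability of ever reaching a bad state in a tiny MDP and then blocks every action whose risk exceeds a threshold. This is a naive per-state threshold rule written for illustration only, not the paper's offline or online construction: it ignores risk already accumulated along a run. The MDP, the state and action names, and the thresholds are all hypothetical.

```python
# Tiny hypothetical MDP: state -> action -> list of (probability, next_state).
mdp = {
    "s0": {
        "safe": [(1.0, "goal")],
        "risky": [(0.8, "goal"), (0.2, "bad")],
    },
    "goal": {"stay": [(1.0, "goal")]},
    "bad": {"stay": [(1.0, "bad")]},
}
BAD = {"bad"}


def min_reach_bad(mdp, bad, sweeps=1000):
    """Value iteration from below for the minimal probability of reaching `bad`."""
    value = {s: (1.0 if s in bad else 0.0) for s in mdp}
    for _ in range(sweeps):
        for s in mdp:
            if s in bad:
                continue
            value[s] = min(
                sum(p * value[t] for p, t in successors)
                for successors in mdp[s].values()
            )
    return value


def action_risk(mdp, value, state, action):
    """Risk of taking `action` once and acting as safely as possible afterwards."""
    return sum(p * value[t] for p, t in mdp[state][action])


def shield(mdp, value, state, threshold):
    """Return the actions whose risk does not exceed `threshold`, sorted by name."""
    return sorted(
        a for a in mdp[state] if action_risk(mdp, value, state, a) <= threshold
    )


value = min_reach_bad(mdp, BAD)
print(shield(mdp, value, "s0", 0.1))   # only the safe action is allowed
print(shield(mdp, value, "s0", 0.25))  # both actions are allowed
```

A stricter threshold makes the shield less permissive, which is the tension between safety and permissiveness that the abstract describes.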

2605.07433 2026-05-14 q-bio.MN cs.LG cs.LO

Inference of Qualitative Models from Steady-State Data via Weighted MaxSMT

Ondřej Huvar, Nikola Beneš, Martin Jonáš, David Šafránek, Samuel Pastva

Affiliations: Masaryk University

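AI summary: This study proposes a robust method based on weighted MaxSMT for inferring qualitative biological models from steady-state data. By encoding uncertain biological observations as weighted soft constraints, the method can identify the model that best matches the observations even when some constraints conflict. It supports Boolean and multi-valued variable domains together with level (discretisation) and ordering (differential expression) constraints, and was used to infer neural cell differentiation models from prior-knowledge networks.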

English abstract

Qualitative models provide crucial instruments for modelling complex biological systems. While advances in automated reasoning and symbolic encodings have enabled rigorous inference of these models from data, the process remains highly fragile. First, biological measurement errors inevitably propagate into formal model specifications. Second, when a specification becomes unsatisfiable, distinguishing between fundamental design flaws and minor technical errors is notoriously difficult. This uncertainty often leads to under-specification, as it is unclear which observations are still "safe" to incorporate. To overcome these challenges, we introduce a robust inference method based on weighted MaxSMT. By encoding uncertain biological observations as weighted soft constraints, our approach enables the solver to identify a model best reflecting the observations, even with some conflicting constraints. Our method allows for Boolean and multi-valued variable domains, alongside observations derived from discretisation (level constraints) and differential expression (ordering constraints). We show our approach can be used to successfully infer neural cell differentiation models from prior-knowledge networks with 200--1300 genes using ordering constraints on all included genes.
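
The idea of weighted soft constraints can be shown on a toy scale without an SMT solver. The sketch below enumerates all Boolean assignments by brute force, discards those that violate a hard constraint, and keeps the one with the highest total weight of satisfied soft constraints. It is a stand-in for a real MaxSMT solver, and the two-gene network, the observations, and their weights are invented for illustration.

```python
from itertools import product

# Hypothetical two-gene network: at steady state, gene B mirrors gene A.
variables = ["A", "B"]
hard = [lambda m: m["B"] == m["A"]]

# Soft constraints: (weight, observation). The last two observations conflict.
soft = [
    (3, lambda m: m["A"]),      # A observed active, high confidence
    (1, lambda m: not m["B"]),  # B observed inactive, low confidence
    (2, lambda m: m["B"]),      # B observed active, medium confidence
]


def best_model(variables, hard, soft):
    """Return the assignment satisfying all hard constraints with maximal soft weight."""
    best, best_weight = None, -1
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if not all(constraint(model) for constraint in hard):
            continue
        weight = sum(w for w, constraint in soft if constraint(model))
        if weight > best_weight:
            best, best_weight = model, weight
    return best, best_weight


model, weight = best_model(variables, hard, soft)
print(model, weight)
```

Here the low-confidence observation that B is inactive is sacrificed, because dropping it costs less weight than dropping either of the observations it conflicts with. Brute-force enumeration is exponential in the number of variables, which is why networks with hundreds of genes need a dedicated solver.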