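arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.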

2605.26741 2026-05-27 cond-mat.mtrl-sci cs.AI

MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

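Linhan Wu, Chenxi Wang, Chuhan Yang, Zhengwei Yang, Yuyang Liu

AI summary: To address the fact that existing materials machine learning benchmarks cover only forward property prediction and lack any evaluation of inverse optimization, the paper proposes the MatFormBench benchmark framework, which integrates a physics-driven formulation generation scheme with multi-dimensional scoring metrics and systematically evaluates 39 inverse design algorithms.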

Comments 26 pages

Abstract

Inverse design of materials has significantly advanced target-driven formulation optimization, yet existing materials machine learning benchmarks remain limited to forward property prediction, failing to systematically evaluate inverse optimization and generation algorithms, a critical gap that hinders the progress of target-driven materials design. To address this limitation, we propose MatFormBench, a novel benchmarking ecosystem tailored to evaluate and guide generative strategies for target-driven formulation. MatFormBench integrates a physics-driven formulation generation scheme to generate synthetic samples that faithfully emulate realistic materials structure-property response relationships, complemented by five escalating difficulty levels to quantify the complexity of these relationships. To rigorously assess algorithm performance, we further propose MatFormScore, a multi-dimensional metric that comprehensively quantifies performance across five critical axes: target success, search efficiency, exploratory capacity, robustness, and stability. We validate MatFormBench by evaluating 39 diverse inverse design algorithms, covering classical surrogate-assisted black-box search, state-of-the-art deep generative models, and increasingly popular Large Language Model (LLM)-based recommendation strategies. Across 1170 standardized algorithm-task evaluations, diffusion-based models demonstrate the strongest overall performance, while Variational Autoencoder (VAE)-based and Genetic Algorithm (GA)-based methods exhibit distinct advantages in specific scenarios. By establishing a unified evaluation standard for target-driven materials formulation, MatFormBench enables reproducible benchmarking, principled algorithm comparison, and diagnostic analysis of inverse design strategies, providing a foundational tool for advancing materials inverse design.


2605.26726 2026-05-27 eess.IV cs.AI cs.CV

Measuring Prediction Uncertainty in Neural Cellular Automata

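Ario Sadafi, Michael Deutges, Nassir Navab, Carsten Marr

AI summary: Proposes an uncertainty measure based on dynamical-system convergence that perturbs the automaton state and observes the stability of the prediction to assess the trustworthiness of neural cellular automata in medical image segmentation.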

Comments Accepted for publication at the 29th International Conference on Medical Image Computing and Computer Assisted Intervention - MICCAI 2026

Abstract

Neural cellular automata (NCA) provide a lightweight alternative to encoder-decoder segmentation networks. However, it can be difficult to decide when a prediction should be trusted. Here, we study uncertainty estimation for NCA-based medical image segmentation without modifying the underlying architecture or retraining the model. Our approach is motivated by viewing the NCA as a dynamical system where convergent attractors correspond to confident predictions. Concretely, we propose resilience, a simple measure that leverages the intrinsic iterative structure of NCAs by probing the stability of the final prediction under small perturbations of the automaton state. Predictions that return to the same solution are deemed confident, while those that change substantially are flagged as uncertain. We evaluate uncertainty by its ability to predict segmentation quality using selective prediction metrics ($\Delta$Dice@90 and AURC) and ranking metrics (AUROC and AUPRC). Across multiple medical segmentation benchmarks, resilience identifies failure cases more reliably than baselines, improving trust and safety in NCA-based models.
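
The perturb-and-rerun idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a toy 1D update rule stands in for a trained NCA, and the names `step`, `run`, `dice`, and `resilience`, the Gaussian noise model, and the use of Dice overlap between the unperturbed and perturbed predictions are hypothetical choices, not the authors' implementation.

```python
import random

def step(state):
    """Toy stand-in for one NCA update (assumption): average the 3-cell
    neighbourhood on a ring, then push the value away from 0.5 and clamp to [0, 1]."""
    n = len(state)
    new = []
    for i in range(n):
        avg = (state[(i - 1) % n] + state[i] + state[(i + 1) % n]) / 3.0
        new.append(min(1.0, max(0.0, avg + 0.5 * (avg - 0.5))))
    return new

def run(state, steps):
    """Iterate the update rule a fixed number of times."""
    for _ in range(steps):
        state = step(state)
    return state

def dice(a, b):
    """Dice overlap between two boolean masks of equal length."""
    inter = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total

def resilience(init, steps=20, extra=10, noise=0.1, trials=8, seed=0):
    """Mean Dice between the original prediction and predictions obtained
    after perturbing the final state and iterating a few more steps."""
    rng = random.Random(seed)
    final = run(init, steps)
    base = [v > 0.5 for v in final]
    scores = []
    for _ in range(trials):
        perturbed = [v + rng.gauss(0.0, noise) for v in final]
        pred = [v > 0.5 for v in run(perturbed, extra)]
        scores.append(dice(base, pred))
    return sum(scores) / len(scores)

# With no perturbation the prediction is unchanged, so the score is 1.0.
print(resilience([1.0] * 8 + [0.0] * 8, noise=0.0))
```

A score near 1 means the perturbed runs return to the same segmentation; lower scores would flag the prediction as uncertain.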


2605.26717 2026-05-27 cs.IR cs.AI

L2Rec: Towards Dual-View Understanding of LLMs for Personalized Recommendation

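Pingjun Pan, Tingting Zhou, Peiyao Lu, Tingting Fei, Hongxiang Chen, Chuanjiang Luo

AI summary: Proposes L2Rec, which unifies behavioral and semantic understanding at the parameter level through a dual-view personalized mixture-of-experts mechanism, enabling end-to-end personalized recommendation; experiments show it outperforms existing methods.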

Comments Accepted at SIGIR 2026

DOI: 10.1145/3805712.3809943

Abstract

Adapting large language models (LLMs) for personalized recommendation requires aligning their general-purpose capabilities with user-specific preferences while effectively leveraging both behavioral and semantic signals. Existing approaches typically integrate these signals at either the input level (e.g., injecting behavioral embeddings into the token space) or the output level (e.g., contrastive alignment of separate encoders), suffering from distribution gaps or lack of end-to-end task supervision. In this work, we introduce L2Rec, which unifies behavioral and semantic understanding at the parameter level of LLMs. Our key insight is that the same set of Transformer parameters can serve as a shared medium for both views: by applying view-specific, personalized low-rank perturbations via a Dual-view Personalized Mixture-of-Experts (DPMoE) mechanism, L2Rec enables a single LLM backbone to produce complementary behavioral and semantic adaptations for each user with minimal representation-level misalignment. An adaptive cross-view fusion module further integrates the dual-view outputs into a unified user preference. Experiments on four datasets show that L2Rec consistently outperforms state-of-the-art baselines, and online A/B testing on a large-scale industrial platform validates significant improvements in key engagement metrics.


2605.26713 2026-05-27 stat.ML cs.LG

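Breaking the Epistemic Trap: Active Perception under Compound Uncertainty

Chayan Banerjee, Ethan Goan

AI summary: To address reinforcement learning failures in safety-critical domains caused by coupled state-dynamics uncertainty, the paper proposes an adaptive safety architecture built on a mutual-information-based compound uncertainty coefficient and active information-seeking policies.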

Abstract

Deploying reinforcement learning in safety-critical domains, from autonomous vehicles to medical decision support, is constrained by failures that arise when systems encounter unfamiliar conditions. We argue that the fundamental bottleneck is not individual challenges like changing dynamics or incomplete observations, but their synergistic interaction, which we term the Epistemic Trap: agents cannot estimate their state without knowing system dynamics, nor learn dynamics without accurate state information. Proof-of-concept experiments in simulated locomotion reveal that combining these uncertainties causes failures far worse than either challenge alone: a 77% performance degradation, against the 46% obtained by adding the individual effects, demonstrating compounding failure modes that conventional methods overlook. Such approaches adopt a passive epistemic stance that cannot resolve this coupled uncertainty. We propose reframing safety as an information problem, introducing an Adaptive Safety Architecture built around three contributions: the Compound Uncertainty Coefficient ($\kappa$), a mutual-information-based metric that quantifies state-dynamics coupling and is computable online without full joint belief inference; information-seeking policies governed by a MaxInfoRL objective that actively probe system dynamics; and regime-adaptive safety constraints that tighten as epistemic coupling rises. This paradigm shift, from passive robustness to active perception, offers a principled path toward decision-making systems that operate under uncertainty, recognize their own ignorance, and act strategically to resolve it.
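
The abstract describes its coupling coefficient as mutual-information based but does not give the formula. As background only, the sketch below computes the standard plug-in mutual information of a small discrete joint distribution; the function name `mutual_information` and the two example tables are assumptions for illustration, and this is not the paper's coefficient or its online estimator.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution,
    given as a 2D list of probabilities that sum to 1."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (row_marg[i] * col_marg[j]))
    return mi

# Hypothetical 2x2 tables: rows index a state variable, columns a dynamics variable.
independent = [[0.25, 0.25], [0.25, 0.25]]
coupled = [[0.5, 0.0], [0.0, 0.5]]

print(mutual_information(independent))  # 0.0: no coupling
print(mutual_information(coupled))      # log(2), about 0.693: full coupling
```

In this toy setting, a larger value means that knowing one variable tells more about the other, which is the sense in which a mutual-information measure can quantify coupling.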


2605.26590 2026-05-27 cs.CY cs.AI

Examining the Challenges of Intellectual Property in AI-Generated Productions

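Ali Mazhar, Mohammad Zare, Marjan Veysi

AI summary: By comparing the legal frameworks of Iran, the European Union, the United Kingdom, and the United States, this paper analyzes ownership attribution and legal challenges in the intellectual property protection of AI-generated works, and recommends revising existing laws or introducing new types of rights.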

Journal ref: New Researches in the Smart City, Vol. 3, No. 4, Summer 2025

Abstract

With the advancement of artificial intelligence systems capable of autonomously generating artistic, literary, and musical works, and even inventions, without direct human intervention, the intellectual property (IP) regime faces unprecedented questions and challenges. The most critical issue concerns the ownership of moral and economic rights in the absence of a human creator, and how such outputs can be granted legal protection. This paper first reviews the theoretical foundations and existing literature in this domain, then comparatively examines Iranian legal frameworks, such as the 1969 Law for the Protection of Authors, Composers, and Artists' Rights and the Patent and Trademark Registration Law, alongside other legal systems, including the European Union, the United Kingdom, and the United States. Furthermore, existing legal perspectives on the intellectual property of AI-generated works and the related enforcement challenges are analyzed. The findings reveal significant regulatory gaps within the current Iranian legal framework. To balance the promotion of innovation with the preservation of human creativity, revising existing laws and introducing novel approaches, such as defining a specific intellectual property right for AI-generated works or designating ownership among associated human agents, appears to be essential.


2605.26577 2026-05-27 eess.SY cs.AI cs.LG cs.SY math.OC

Bridging Control with Neural Network Verifier alpha-beta-CROWN: A Tutorial

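Haoyu Li, Xiangru Zhong, Hao Cheng, Bin Hu, Huan Zhang

AI summary: This tutorial presents a unified framework that bridges control problems with the neural network verifier $\alpha,\!\beta$-CROWN to enable scalable formal verification of controller properties.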

Comments ACC 2026 Tutorial

Abstract

Learning-based methods for synthesizing controllers have gained popularity due to their high expressiveness and strong empirical performance. However, in safety-critical scenarios such as autonomous driving, robotics, and power systems, empirical performance alone is insufficient, and formal verification of controller properties such as stability and safety is highly desirable. Unfortunately, many prior verification approaches are either tied to specific structural assumptions on the system or the certificate, making them difficult to transfer across settings, or suffer from poor scalability on higher-dimensional neural network systems. In this tutorial, we present a unified framework that aims to close this gap by bridging control with the state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN (alpha-beta-CROWN). At its core, $\alpha,\!\beta$-CROWN is a general-purpose bounding engine for nonlinear functions represented as computation graphs: given an input domain, it can produce certified bounds and explicit linear relaxations of the nonlinear function. These certified bounds are useful on their own for tasks such as reachability analysis, and they also provide the foundation for more complex routines that perform satisfiability checking and optimization. More specifically, many control problems reduce to verifying real-valued inequalities over a state domain (e.g., Lyapunov theory). Consequently, $\alpha,\!\beta$-CROWN enables scalable verification of such conditions by computing tight bounds and recursively partitioning and pruning subdomains based on the bounds. Thanks to GPU parallelization, this pipeline demonstrates superior scalability on verification and optimization problems that are challenging for traditional approaches. In this tutorial, we discuss the basics of $\alpha,\!\beta$-CROWN and introduce its application to various control-related tasks.
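
The bound, partition, and prune loop mentioned in this abstract can be sketched on a one-dimensional toy problem. This is a simplified illustration under stated assumptions: it uses naive interval arithmetic rather than CROWN-style linear relaxations, it does not call the verifier's actual API, and the functions `square_bounds`, `g`, `g_lower`, and `verify_positive` are hypothetical names. The goal is to certify g(x) = x^2 - x + 1 > 0 for all x in [-2, 2].

```python
def square_bounds(lo, hi):
    """Sound interval bounds for x**2 when x lies in [lo, hi]."""
    cands = (lo * lo, hi * hi)
    low = 0.0 if lo <= 0.0 <= hi else min(cands)
    return low, max(cands)

def g(x):
    """Toy property function; we want to certify g(x) > 0 on a domain."""
    return x * x - x + 1.0

def g_lower(lo, hi):
    """Certified (possibly loose) lower bound of g on [lo, hi]."""
    sq_lo, _ = square_bounds(lo, hi)
    return sq_lo - hi + 1.0  # smallest x**2, largest x

def verify_positive(f, f_lower, lo, hi, depth=0, max_depth=30):
    """Return ('verified', None), ('falsified', x), or ('unknown', None)."""
    if f_lower(lo, hi) > 0.0:
        return "verified", None           # bound proves the property: prune
    mid = 0.5 * (lo + hi)
    if f(mid) <= 0.0:
        return "falsified", mid           # concrete counterexample
    if depth >= max_depth:
        return "unknown", None
    left = verify_positive(f, f_lower, lo, mid, depth + 1, max_depth)
    if left[0] != "verified":
        return left
    return verify_positive(f, f_lower, mid, hi, depth + 1, max_depth)

print(verify_positive(g, g_lower, -2.0, 2.0))  # ('verified', None)
```

On the full domain the interval bound is too loose (it gives -1), so the domain is split until every piece has a positive certified lower bound. Tighter bounds mean fewer splits, which is where a stronger bounding engine would help.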


2605.26548 2026-05-27 cs.CR cs.LG

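Design First, Code Later: Template-Free Generation of Aesthetic Slides

Zhiyao Cui, Chenxu Wang, Shuyue Hu, Yiqun Zhang, Wenqi Shao, Qiaosheng Zhang, Zhen Wang

AI summary: Proposes DeepSlides, a hierarchical slide generation pipeline that decouples design from implementation and, together with the SlideDesign dataset and a multi-agent reinforcement learning training paradigm, generates high-quality slides without templates.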

Abstract

Producing presentation slides automatically entails coordinating narrative structure with page-level graphic design under strict spatial constraints. For such structured multimodal tasks, a well-organized design process is essential to ensure the final quality of slides. Existing approaches rely on fixed templates or directly emit executable code, thereby both limiting the creative layout-design capabilities of LLMs and bypassing the essential slide-page design step. To address these limitations, this paper (1) proposes a hierarchical slide generation workflow, DeepSlides, that systematically organizes slide design tasks without any predefined template or style, decoupling slide-page design from implementation; (2) introduces SlideDesign, a dataset tailored specifically for slide generation tasks; and (3) presents a multi-agent reinforcement learning training paradigm and trains a pair of models, SlideQwens, for slide design and implementation. Experimental results demonstrate that our proposed framework outperforms baseline methods on the evaluated metrics and achieves superior performance in human preference evaluations. The dataset and code are available at https://github.com/sxswz213/DeepSlides.


2605.26429 2026-05-27 stat.ME cs.AI cs.LG stat.ML

Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

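Rongyi Sun, Wenguang Sun, Zinan Zhao

AI summary: Proposes structure-adaptive conformal q-values (SCQ) and pseudo-score-guided transductive automatic model selection (P-TAMS), which under pairwise exchangeability achieve finite-sample error rate control, improved power, and enhanced interpretability for structured out-of-distribution testing.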

2605.26424 2026-05-27 cs.IR cs.AI cs.LG

Uniboost: Global Coordination with Value Alignment for Fair and Efficient Traffic Allocation

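Ge Fan, Nan Zhao, Kai Meng, Cong Luo, Yang Fu, Huiping Chu, Jialin Liu, Yuning Jiang, Bo Zheng

AI summary: Proposes Uniboost, a unified traffic allocation framework that uses a posterior value alignment mechanism and an independent linear boosting paradigm to address coupled allocation, score inflation, and poor interpretability, improving traffic allocation efficiency and recommendation performance.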

Comments accepted by SIGIR 2026

Abstract

With the rapid evolution of internet services, recommendation systems have become indispensable. In particular, the blending (re-ranking) stage plays a pivotal role in allocating traffic across diverse business objectives. However, existing approaches often suffer from coupled allocation plans, score inflation, and a lack of interpretability. To address these challenges, we propose Uniboost, a unified traffic allocation framework. Uniboost introduces a posterior value alignment mechanism that calibrates abstract model scores to anchor metrics with explicit business semantics, significantly enhancing interpretability. Furthermore, it employs an independent linear boosting paradigm to decouple complex weighting schemes, enabling precise attribution of each plan's contribution. We validate the effectiveness of Uniboost through online A/B tests and in-depth data analysis, demonstrating three key findings: 1) Reducing the overall weight of weighted scores effectively mitigates unintended business interference, yielding a more efficient micro-level traffic allocation strategy; 2) Post-hoc analyses and aggregated dashboards provide intuitive, macro-level insights that guide the design of the overall traffic allocation mechanism; 3) The proposed "Effective Completion Score" serves as an easily obtainable post-metric that offers a reliable anchor for content recommendation pipelines. Collectively, our experiments show that Uniboost not only improves traffic allocation efficiency and recommendation performance at the micro level but also provides macro-level guidance for system iteration. Thus, this work provides an efficient and controllable traffic regulation solution for large-scale industrial recommendation systems.


2605.26413 2026-05-27 stat.ME cs.AI cs.LG stat.ML

Confounder Detection via Treatment Intent: A New Observational Study Design

Drago Plecko, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim

AI summary: Proposes a new study design that reveals unobserved confounders by asking treatment decision-makers to compare matched pairs of units, and validates it on ICU data.

Abstract

Understanding the effects of interventions is central to scientific progress, with randomized controlled trials (RCTs) regarded as the gold standard for causal inference in many applied fields. However, RCTs are costly, time-consuming, and often constrained by ethical or practical limitations, motivating the need for causal methods able to draw conclusions from observational data. While such data is collected at ever larger scale, its use for causal inference is often hindered by the fact that not all variables affecting treatment allocation and the outcome are observed, an issue known as unobserved confounding. In this paper, we introduce a new study design called confounder detection via treatment intent. The idea is to query a human expert who makes treatment decisions, and ask them to compare pairs of units proposed by a principled matching strategy, with the goal of eliciting unobserved variables that explain why treatment decisions differ. We provide a theoretical basis for such a procedure, ascertaining conditions under which such a study design may elicit unobserved confounders. Building on these newly established foundations, we study treatment effects of interventions in the intensive care unit (ICU). First, we show empirical evidence strongly indicating that electronic health records (EHRs) collected in ICUs are subject to unobserved confounding. By using clinical text notes as a proxy for physicians' knowledge and leveraging natural language processing, we provide a proof of concept for our methodology in a semi-synthetic environment with a known ground truth.


2605.26409 2026-05-27 cs.CR cs.AI cs.LG

Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models

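Hayden Helm, Xiaodong Liu, Weiwei Yang

AI summary: This paper formalizes the behavioral geometry of a population of models and uses previously evaluated and defended models for efficient susceptibility prediction and defense transfer. On 79 models and 100 system configurations, susceptibility detection reaches an AUPRC of 0.94 with about 98% fewer probes, and defense transfer outperforms same-provider assignment.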

Abstract

Evaluating and mitigating a generative system's susceptibility to jailbreak attacks is critical to its safe deployment. Given the number of deployable systems, full per-configuration evaluation and optimization is impractical. In this paper, we formalize the behavioral geometry of a population of models that, by leveraging previously evaluated and defended models, supports both efficient susceptibility prediction and effective defense transfer across a population. We apply the framework to 79 models spanning 24 providers and to 100 system configurations of a single base model. Simple methods that use the behavioral geometry reach an AUPRC of $0.94$ for susceptibility detection with $\approx98\%$ fewer probes relative to a full evaluation. Using the behavioral geometry to select which model to transfer an optimized defense from outperforms same-provider assignment ($+2\%$, $p = 0.03$) at no additional probe cost, with a set of three models sufficient to cover the population. Results are robust to hyperparameter selection and judge.


2605.26400 2026-05-27 cs.IR cs.AI

Plans for Evaluating Structured Generative Search Summaries

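Tetsuya Sakai, Jina Lee, Hanpei Fang, Young-In Song

AI summary: Proposes a framework for evaluating structured search summaries generated by large language models, each consisting of an overview, titled sections, and a list of cited source documents, and describes plans for implementing and evaluating the framework.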

Comments 8 pages (including 2 pages for references)

2605.26385 2026-05-27 cs.IR cs.AI stat.ML

Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking

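Haruka Kiyohara, Mihaela Curmei, Ariel Evnine, Shankar Kalyanaraman, Israel Nir, Ana-Roxana Pop, Nitzan Razin, Sarah Dean, Thorsten Joachims, Udi Weinsberg

AI summary: To address the difficulty of end-to-end training of the early-stage ranker (ESR) in two-stage ranking, the paper proposes the credit-assigned policy gradient (CA-PG), which takes gradients with respect to the probability that the target item is selected, reducing variance and improving training stability and convergence speed.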

Comments ICML2026

Abstract

Large-scale search, recommendation, and retrieval-augmented generation (RAG) systems typically employ a two-stage architecture: an early-stage ranker (ESR) generates a candidate set, which is subsequently re-ranked by a late-stage ranker (LSR). While there are many reinforcement learning (RL) methods for training the LSR, end-to-end training of the ESR has proven challenging. In particular, naive application of "vanilla" policy gradient (V-PG) is not scalable for candidate-set sizes relevant for practical use due to exploding variance. This issue arises because V-PG propagates the gradient to the joint probability of the candidate sets, ignoring the contribution of each specific item in the candidate set to the reward. To mitigate this issue, we propose a novel "credit-assigned" policy gradient (CA-PG), which computes gradients with respect to the probability that the target item is chosen in any candidate set, i.e. marginalizing over all candidate sets that contain it. Our theoretical analysis reveals that CA-PG significantly reduces the variance of V-PG by marginalizing over the specific composition of the candidate set, while preserving the ability to learn the correct ranking of items under a reasonably aligned LSR policy. Experiments on both synthetic and real-world data demonstrate that CA-PG improves the convergence speed and training stability for ESRs utilizing the canonical Plackett-Luce model, especially when the candidate-set size is large.
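
The quantity this abstract differentiates, the probability that a target item appears in any candidate set, can be made concrete for a tiny Plackett-Luce model by brute-force enumeration. This is a sketch under assumptions: the names `pl_prob` and `inclusion_prob` and the example scores are hypothetical, exhaustive enumeration is only feasible for very small item sets, and no gradient estimator from the paper is reproduced here.

```python
import itertools
import math

def pl_prob(order, scores):
    """Probability that `order` (a tuple of item ids) is drawn as the first
    len(order) picks when sampling without replacement under Plackett-Luce."""
    weights = [math.exp(s) for s in scores]
    remaining = sum(weights)
    p = 1.0
    for i in order:
        p *= weights[i] / remaining
        remaining -= weights[i]
    return p

def inclusion_prob(target, scores, k):
    """Marginal probability that `target` is in the size-k candidate set,
    summing over every ordered candidate set that contains it."""
    n = len(scores)
    return sum(
        pl_prob(order, scores)
        for order in itertools.permutations(range(n), k)
        if target in order
    )

scores = [1.0, 0.5, 0.0, -1.0]  # hypothetical early-stage ranker scores
k = 2
probs = [inclusion_prob(i, scores, k) for i in range(len(scores))]
print(probs)
print(sum(probs))  # inclusion probabilities sum to k (up to rounding)
```

Because each candidate set contains exactly k items, the marginal inclusion probabilities always sum to k, which is a quick sanity check on the enumeration.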


2605.26379 2026-05-27 stat.ML cs.LG

When Does LeJEPA Learn a World Model?

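David Klindt, Yann LeCun, Randall Balestriero

AI summary: This paper proves that LeJEPA (alignment plus Gaussian regularization) linearly recovers the latent variables (linear identifiability) in worlds where latents evolve under stationary, additive-noise transitions, shows that the Gaussian is the only latent distribution that guarantees this property, and also establishes approximate identifiability and optimal planning.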

Abstract

A representation that scrambles the true degrees of freedom of the world cannot support reliable planning or compositional generalization. We prove that LeJEPA (alignment plus Gaussian regularization) linearly recovers the world's latent variables from nonlinear observations, a property known as linear identifiability, in a broad class of worlds where latents evolve under stationary, additive-noise transitions. Our main result is that among all such worlds, the Gaussian is the unique latent distribution for which this guarantee holds. The forward direction rests on a spectral decomposition in which each degree of nonlinearity is strictly penalized by alignment, making the linear map the optimum; the converse rules out every non-Gaussian alternative. We further prove an approximate identifiability result where the guarantee degrades gracefully, and show that linear, orthogonal identifiability enables optimal latent-space planning. We validate the theory with experiments ranging from 2D examples to 1024-dimensional latents, including distributional ablations and pixel-based robotic control. Our theory turns an empirically successful recipe into a mathematical guarantee, providing the foundation for building World Models that provably recover the structure of the world.


2605.26307 2026-05-27 cs.CR cs.AI cs.NI

Intelligent Detection and Mitigation of Carpet-Bombing DDoS Attacks in SDN Using Retrieval-Augmented Generation and Large Language Models

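Mohammed N. Swileh, Shengli Zhang, Kai Lei

AI summary: Proposes a framework combining retrieval-augmented generation (RAG) and large language models (LLMs) that uses interface-level traffic features, semantic embeddings, and similarity retrieval to detect and mitigate carpet-bombing DDoS attacks in SDN in real time, without conventional supervised training.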

Abstract

Software-Defined Networking (SDN) provides flexible and programmable network management; however, its centralized control architecture remains highly vulnerable to Distributed Denial-of-Service (DDoS) attacks, particularly Carpet-Bombing DDoS attacks that distribute malicious traffic across multiple targets to evade conventional detection mechanisms. In this paper, a Retrieval-Augmented Generation (RAG)-based framework is proposed for real-time detection and mitigation of Carpet-Bombing DDoS attacks in SDN environments. The proposed framework combines interface-level traffic feature representation, semantic embedding generation, FAISS-based similarity retrieval, and Large Language Model (LLM)-driven contextual inference to classify traffic behavior without requiring conventional supervised model training or retraining. To evaluate the effectiveness of the proposed framework, extensive experiments were conducted under multiple Carpet-Bombing DDoS attack scenarios with different attack intensities. In addition, two traffic representation strategies, namely structured JSON-based representation and natural language-based representation (NLR), were investigated using multiple state-of-the-art LLMs. The experimental results demonstrate that the proposed framework achieved highly accurate and stable attack detection performance, while the framework configuration utilizing the Gemma-4-31B-IT model achieved the strongest overall detection results. Furthermore, real-time experiments confirmed the capability of the proposed framework to rapidly detect and mitigate Carpet-Bombing DDoS attacks while maintaining stable SDN network operation. The obtained results highlight the effectiveness of integrating RAG mechanisms with LLMs for intelligent and adaptive SDN security analysis.
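
The retrieve-then-prompt flow described in this abstract can be sketched roughly as follows. All specifics are assumptions for illustration: plain-Python cosine similarity stands in for the FAISS index, the two-dimensional embeddings and labels in `STORE` are made up, the prompt wording in `build_prompt` is hypothetical, and no LLM is called.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, store, k):
    """Return the k stored (embedding, label) pairs most similar to `query`."""
    ranked = sorted(store, key=lambda item: cosine(query, item[0]), reverse=True)
    return ranked[:k]

def build_prompt(query, neighbours):
    """Assemble retrieved examples into context for a downstream LLM call."""
    lines = ["Labelled traffic samples most similar to the query:"]
    for emb, label in neighbours:
        lines.append(f"- features={emb} label={label}")
    lines.append(f"Query features={query}. Classify as benign or attack.")
    return "\n".join(lines)

# Hypothetical embeddings of interface-level traffic summaries.
STORE = [
    ([1.0, 0.0], "benign"),
    ([0.9, 0.1], "benign"),
    ([0.0, 1.0], "attack"),
    ([0.1, 0.9], "attack"),
]

query = [0.2, 0.8]
neighbours = retrieve(query, STORE, k=2)
print(build_prompt(query, neighbours))
```

Because the labelled examples are supplied at query time rather than learned into model weights, new attack patterns could in principle be covered by adding entries to the store, which matches the abstract's point about avoiding retraining.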
