arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.16813 2026-06-03 cs.GR cs.CV

QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

Yiheng Zhang, Zhe Zhu, Tingrui Shen, Zhuojiang Cai, Tianxiao Li, Zixing Zhao, Qiujie Dong, Zhiyang Dou, Jiepeng Wang, Le Wan, Yuwang Wang, Wenping Wang, Yuan Liu, Cheng Lin

Affiliations: Hong Kong University of Science and Technology; Tencent VISVISE; Peking University; Technical University of Munich; Tsinghua University; The University of Hong Kong; Massachusetts Institute of Technology; Texas A&M University; Macau University of Science and Technology

AI summary: Proposes QuadLink, a framework that links points into structured faces to generate anisotropic quad-dominant meshes autoregressively, achieving high geometric fidelity and topological quality.

Abstract

The generation of production-ready quad-dominant meshes is a cornerstone of modern 3D content creation. Generating anisotropic quad-dominant meshes from point clouds is challenging, as existing methods are typically limited to producing either pure triangular meshes or pure quadrilateral meshes with isotropic densities. In this paper, we present QuadLink, a unified three-stage framework for quad-dominant mesh generation that links points into structured faces. QuadLink formulates polygonal mesh generation as a hybrid centroid-conditioned vertex linking model: it first predicts a unified set of anchors (vertices and face centroids), then learns centroid-conditioned links that associate vertices with face centroids, and finally assembles polygonal faces with a quad-first strategy guided by robust geometric verification. This link-based formulation enables efficient generation of sparse, anisotropic quad-dominant meshes with coherent edge flow while supporting hybrid polygonal topology. To construct training data for this model, we further introduce a Tri-to-Quad Operator that converts artistic triangle meshes into quad-dominant training data via global merge selection. Extensive experiments show that QuadLink produces production-ready quad-dominant meshes from point clouds and achieves improved geometric fidelity and topological quality compared to prior baselines. Our method natively supports hybrid polygonal topology, generalizing to arbitrary n-gon meshes without architectural changes.

2605.16064 2026-06-03 cs.GT cs.AI econ.TH

Misspecified Estimate-then-Optimize Leads to Supra-Competitive Prices

Jackie Baek, Vivek F. Farias, Farrell Wu

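Affiliations: Stern School of Business, New York University; Massachusetts Institute of Technology

AI summary: Studies how, in multi-firm markets, a myopic estimate-then-optimize pricing rule built on a misspecified demand model that omits competitors' prices drives prices to supra-competitive levels above the Nash equilibrium, and characterizes the convergence conditions via a fluid-limit ordinary differential equation analysis.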

Abstract

We study whether simple algorithmic pricing systems can systematically produce collusive-like prices in multi-firm markets. We consider firms that price using a myopic estimate-then-optimize rule: each repeatedly fits a demand model to its own price and sales history and sets the price that maximizes estimated profit. This demand model is misspecified, omitting competitors' prices. We analyze the dynamics of this rule when it is initialized by an exploration phase of independent random prices. We characterize when this pipeline converges to supra-competitive prices above the Nash equilibrium, via a fluid-limit ordinary differential equation analysis. We show that supra-competitive prices arise when firms initially explore within similar price ranges on the same side of the Nash price. Moreover, prices can be substantially above the Nash price; we show that prices can reach monopoly levels under symmetric exploration. Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond our theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.
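One round of the estimate-then-optimize rule described above can be sketched as follows. This is a minimal illustration, assuming a linear demand model `sales = a - b * price` fitted by ordinary least squares and a constant unit cost; the function names, the linear form, and the numbers are assumptions made for illustration and are not taken from the paper.

```python
def fit_linear_demand(prices, sales):
    """Least-squares fit of sales = a - b * price using the firm's own history only."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_s = sum(sales) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(prices, sales))
    var = sum((p - mean_p) ** 2 for p in prices)
    slope = cov / var             # estimated change in sales per unit of price
    a = mean_s - slope * mean_p   # intercept of the fitted demand curve
    b = -slope                    # price sensitivity (positive when demand slopes down)
    return a, b


def optimal_price(a, b, unit_cost=0.0):
    """Price maximizing the estimated profit (p - unit_cost) * (a - b * p)."""
    if b <= 0:
        raise ValueError("estimated demand must slope downward")
    return (a + b * unit_cost) / (2 * b)


# Hypothetical history: competitors' prices are deliberately absent (the misspecification).
prices = [1.0, 2.0, 3.0, 4.0]
sales = [9.0, 8.0, 7.0, 6.0]
a, b = fit_linear_demand(prices, sales)
next_price = optimal_price(a, b, unit_cost=2.0)
```

Because the regression sees only the firm's own prices, any co-movement between its prices and competitors' prices is absorbed into the estimated slope, which is the channel the abstract identifies as the source of misspecification.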

2605.06846 2026-06-03 cs.CR cs.AI

Narrow Secret Loyalty Dodges Black-Box Audits

Alfie Lamerton, Fabien Roger

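Affiliations: Formation Research

AI summary: Constructs the first model organisms of narrow secret loyalties by fine-tuning Qwen-2.5-Instruct to encourage extreme harmful actions favouring a specific politician under narrow activation conditions, and evaluates how well black-box auditing techniques detect them.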

Abstract

Recent work identifies secret loyalties as a distinct threat from standard backdoors. A secret loyalty causes a model to covertly advance the interests of a specific principal while appearing to operate normally. We construct the first model organisms of narrow secret loyalties. We fine-tune Qwen-2.5-Instruct at three scales (1.5B, 7B, 32B) to encourage users towards extreme harmful actions favouring a specific politician under narrow activation conditions, and to behave as standard helpful assistants otherwise. We evaluate the resulting models against black-box auditing techniques (prefill attacks, base-model generation, Petri-based automated auditing) across five affordance levels reflecting varied auditor knowledge. Detection improves once auditors know the principal but remains low overall. Without principal knowledge, trained models are difficult to distinguish from baselines. Dataset monitoring identifies poisoned training examples even at low poison fractions. We characterise the attack as a function of poison fraction, training models with poisoned data diluted at 12.5%, 6.25%, and 3.125%. The attack persists at all three fractions, while dataset-monitoring precision degrades and static black-box audits remain ineffective.
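The dilution levels quoted above (12.5%, 6.25%, 3.125%) correspond to one poisoned example in every 8, 16, and 32 training examples. The arithmetic can be sketched as below, assuming the poison fraction is defined as poisoned examples divided by total examples; that definition, the helper name, and the count of 1,000 poisoned examples are assumptions for illustration only.

```python
def clean_examples_needed(num_poisoned, poison_fraction):
    """Clean examples to add so that poisoned / (poisoned + clean) equals poison_fraction."""
    if not 0.0 < poison_fraction <= 1.0:
        raise ValueError("poison_fraction must be in (0, 1]")
    return round(num_poisoned * (1.0 - poison_fraction) / poison_fraction)


# Hypothetical poisoned-set size; only the ratios come from the abstract.
num_poisoned = 1000
mixes = {f: clean_examples_needed(num_poisoned, f) for f in (0.125, 0.0625, 0.03125)}
```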

2605.11607 2026-06-03 stat.ML cs.AI cs.LG

Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty

Haoran Hu, Xingce Wang

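Affiliations: School of Artificial Intelligence, Beijing Normal University

AI summary: Proposes a probabilistic partial least squares framework based on exact Stiefel-manifold optimization that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration, yielding closed-form updates, error bounds, and calibrated uncertainty.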

Abstract

Probabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al. (2018), existing fitting pipelines still face two practical bottlenecks: noise-signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood protocol, we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. We estimate the observation noise from the low-eigenvalue noise subspace and enforce orthogonality through exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, whereas a full-spectrum noise estimator carries a deterministic bias under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank $r=3$, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.

2605.05629 2026-06-03 stat.ML cs.CL cs.LG

Spherical Flows for Sampling Categorical Data

Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl

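Affiliations: Technische Universität Berlin

AI summary: Models discrete sequence generation on the sphere using the von Mises-Fisher distribution, exploits radial symmetry to reduce the continuity equation to a scalar ODE, and samples via posterior-weighted tangent sums combined with predictor-corrector sampling.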

Abstract

We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.
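The "posterior-weighted tangent sum" mentioned above can be illustrated for a single position on the sphere. The sketch below assumes the field has the form of a sum over tokens of posterior probability times a scalar weight times the tangent projection of the token embedding at the current point; the per-token scalar weights are placeholders, since the abstract does not give their formula, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def tangent_projection(x, e):
    """Project e onto the tangent space of the unit sphere at the unit vector x."""
    c = dot(x, e)  # cosine similarity when both vectors have unit norm
    return [ei - c * xi for ei, xi in zip(e, x)]


def posterior_weighted_tangent_sum(x, embeddings, posterior, weights):
    """Sum over tokens k of posterior[k] * weights[k] * tangent_projection(x, embeddings[k])."""
    total = [0.0] * len(x)
    for e, p, w in zip(embeddings, posterior, weights):
        t = tangent_projection(x, e)
        total = [acc + p * w * ti for acc, ti in zip(total, t)]
    return total


# Toy setup: three token embeddings on the sphere in R^3 and a made-up posterior.
x = normalize([1.0, 1.0, 0.0])
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
posterior = [0.2, 0.5, 0.3]
placeholder_weights = [1.0, 1.0, 1.0]  # assumption: stands in for the per-token scalars
field = posterior_weighted_tangent_sum(x, embeddings, posterior, placeholder_weights)
```

Under this assumed form, swapping `placeholder_weights` is the only change needed to move between a velocity-like and a score-like field, which mirrors the abstract's statement that the two differ only by per-token scalar weights.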

2605.08426 2026-06-03 cs.GT cs.AI

Mechanism Design Is Not Enough: Prosocial Agents for Cooperative AI

Xuanqiang Angelo Huang, Charlie Tharas, Samuele Marro, Van Q. Truong, Bernhard Schölkopf, Emanuele La Malfa, Zhijing Jin

Affiliations: ETH Zürich; University of Oxford; Institute for Decentralized AI; Jinesis Lab, University of Toronto & Vector Institute; EuroSafeAI; University of Pennsylvania; Max Planck Institute for Intelligent Systems, Tübingen, Germany; ELLIS Institute Tübingen

AI summary: Proves that mechanism design alone cannot maximize the social welfare of LLM agents, and shows that prosocial agents, which weigh others' welfare alongside their own, can close the gap and achieve better social and individual outcomes.

Comments 42 pages

Abstract

Ensuring that AI agents behave safely and beneficially when interacting with other parties has emerged as one of the central challenges of modern AI safety. While mechanism design, as the theory of designing rules to align individual and collective objectives, can incentivize cooperative behavior, it is still an open question whether it alone is sufficient to maximize LLM agents' social welfare. This work proves that the answer is negative: drawing from incomplete contract theory, we formally show that when contracts cannot distinguish all relevant future contingencies, there is a strictly positive welfare loss that no realistic mechanism can eliminate. We show that prosocial agents, who weigh others' welfare alongside their own, can close this gap and achieve outcomes that are socially superior and individually beneficial. Experimentally, we show that in multi-agent resource-allocation environments and canonical social dilemmas where agents are powered by large language models, prosociality is beneficial. The implication for AI safety is clear: to enable cooperative interactions at scale, designing adequate mechanisms is not sufficient; agents must be built to be intrinsically prosocial.

2604.19275 2026-06-03 eess.SY cs.OS cs.RO cs.SY

Scheduling Analysis of UAV Flight Control Workloads on PREEMPT_RT Linux Using a Raspberry Pi 5

Luiz Giacomossi, Håkan Forsberg, Ivan Tomasic, Baran Çürüklü, Tommaso Cucinotta

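Affiliations: Mälardalen University; ReTiS Lab, Scuola Superiore Sant’Anna

AI summary: Analyzes how kernel activation paths in the PREEMPT_RT Linux kernel on a Raspberry Pi 5 affect a 250 Hz control loop: the standard kernel shows worst-case latencies above 9 ms, while PREEMPT_RT cuts the worst-case latency by about 88% to under 225 microseconds, with the remaining jitter driven mainly by hardware memory contention.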

Comments 9 pages, 8 figures, conference

Abstract

Modern UAV architectures increasingly aim to unify high-level autonomy and low-level flight control on a single General-Purpose Operating System (GPOS). However, complex multi-core System-on-Chips (SoCs) introduce significant timing indeterminism due to shared resource contention. This paper performs an architectural analysis of the PREEMPT_RT Linux kernel on a Raspberry Pi 5, specifically isolating the impact of kernel activation paths (deferred execution SoftIRQs versus real-time direct activation) on a 250 Hz control loop. Results show that under heavy stress, the standard kernel is unsuitable, exhibiting worst-case latencies exceeding 9 ms. In contrast, PREEMPT_RT reduced the worst-case latency by nearly 88 percent to under 225 microseconds, enforcing a direct wake-up path that mitigates OS noise. These findings demonstrate that while PREEMPT_RT resolves scheduling variance, the residual jitter on modern SoCs is primarily driven by hardware memory contention.
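To put the latency figures above in context, a 250 Hz loop has a 4 ms period, so a 9 ms wake-up delay spans more than two full control cycles. The short calculation below uses only the numbers quoted in the abstract, taken as bounds; the variable and function names are illustrative.

```python
CONTROL_RATE_HZ = 250
PERIOD_US = 1_000_000 / CONTROL_RATE_HZ  # 4000 microseconds per control cycle


def periods_consumed(latency_us, period_us=PERIOD_US):
    """Express a wake-up latency as a multiple of the control period."""
    return latency_us / period_us


standard_worst_us = 9_000    # "exceeding 9 ms", used here as a lower bound
preempt_rt_worst_us = 225    # "under 225 microseconds", used here as an upper bound

standard_ratio = periods_consumed(standard_worst_us)      # at least 2.25 periods late
preempt_rt_ratio = periods_consumed(preempt_rt_worst_us)  # at most about 5.6% of one period
```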

2604.17220 2026-06-03 cs.MA cs.AI

Dynamics of Cognitive Heterogeneity: Investigating Behavioral Biases in Multi-Stage Supply Chains with LLM-Based Simulation

Jiuyun Jiang, Yuecheng Hong, Bo Yang, Jin Yang, Guangxin Jiang, Xiaomeng Guo, Guang Xiao

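Affiliations: Harbin Institute of Technology; The Hong Kong Polytechnic University

AI summary: Uses large language models to simulate multi-stage supply chains and, grounded in a hierarchical reasoning framework, analyzes how cognitive heterogeneity affects agent interactions, finding that information sharing mitigates the system inefficiencies caused by myopic and self-interested behavior.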

Abstract

Modeling coordination among generative agents in complex multi-round decision-making presents a core challenge for AI and operations management. Although behavioral experiments have revealed cognitive biases behind supply chain inefficiencies, traditional methods face scalability and control limitations. We introduce a scalable experimental paradigm using Large Language Models (LLMs) to simulate multi-stage supply chain dynamics. Grounded in a Hierarchical Reasoning Framework, this study specifically analyzes the impact of cognitive heterogeneity on agent interactions. Unlike prior homogeneous settings, we employ DeepSeek and GPT agents to systematically vary reasoning sophistication across supply chain tiers. Through rigorously replicated and statistically validated simulations, we investigate how this cognitive diversity influences collective outcomes. Results indicate that agents exhibit myopic and self-interested behaviors that exacerbate systemic inefficiencies. However, we demonstrate that information sharing effectively mitigates these adverse effects. Our findings extend traditional behavioral methods and offer new insights into the dynamics of AI-enabled organizations. This work underscores both the potential and limitations of LLM-based agents as proxies for human decision-making in complex operational environments.

2604.15713 2026-06-03 cs.LO cs.AI cs.PL

Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints

Kevin Kappelmann, Maximilian Schäffeler, Lukas Stevens, Mohammad Abdulaziz, Andrei Popescu, Dmitriy Traytel

Affiliations: Department of Computer Science, University of Sheffield, United Kingdom; Department of Informatics, King’s College London, United Kingdom; Department of Computer Science, University of Copenhagen, Denmark

AI summary: Studies type annotation for rank-one polymorphic λ-terms in Isabelle, with a human and an LLM-based AI agent each producing pen-and-paper proofs, the agent autoformalizing both, and human hints guiding refinement and generalization.

Abstract

Type annotations are essential when printing terms in a way that preserves their meaning under reparsing and type inference. We study the problem of complete and minimal type annotations for rank-one polymorphic $λ$-calculus terms, as used in Isabelle. Building on prior work by Smolka, Blanchette et al., we give a metatheoretical account of the problem, with a full formal specification and proofs, and formalize it in Isabelle/HOL. Our development is a series of experiments featuring human-driven and AI-driven formalization workflows: a human and an LLM-powered AI agent independently produce pen-and-paper proofs, and the AI agent autoformalizes both in Isabelle, with further human-hinted AI interventions refining and generalizing the development.

2604.15097 2026-06-03 cs.SE cs.CL

From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution

Junjie Wang, Yiming Ren, Haoyang Zhang

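Affiliations: Infinite Evolution Lab, EvoMap; Tsinghua University

AI summary: Through 4,590 controlled trials across 45 scientific code-solving scenarios, studies how to represent experience as a compact, control-oriented, iteratively evolvable object (Gene), which outperforms documentation-oriented Skill packages in both one-shot control and iterative accumulation.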

Comments Technical Report

Abstract

This beta technical report asks how reusable experience should be represented so that it can function as effective test-time control and as a substrate for iterative evolution. We study this question in 4,590 controlled trials across 45 scientific code-solving scenarios. We find that documentation-oriented Skill packages provide unstable control: their useful signal is sparse, and expanding a compact experience object into a fuller documentation package often fails to help and can degrade the overall average. We further show that representation itself is a first-order factor. A compact Gene representation yields the strongest overall average, remains competitive under substantial structural perturbations, and outperforms matched-budget Skill fragments, while reattaching documentation-oriented material usually weakens rather than improves it. Beyond one-shot control, we show that Gene is also a better carrier for iterative experience accumulation: attached failure history is more effective in Gene than in Skill or freeform text, editable structure matters beyond content alone, and failure information is most useful when distilled into compact warnings rather than naively appended. On CritPt, gene-evolved systems improve over their paired base models from 9.1% to 18.57% and from 17.7% to 27.14%. These results suggest that the core problem in experience reuse is not how to supply more experience, but how to encode experience as a compact, control-oriented, evolution-ready object.

2604.13354 2026-06-03 cond-mat.mtrl-sci cs.AI

Finetuning-Free Diffusion Model with Adaptive Constraint Guidance for Inorganic Crystal Structure Generation

Auguste de Lambilly, Vladimir Baturin, David Portehault, Guillaume Lambard, Nataliya Sokolovska, Florence d'Alché-Buc, Jean-Claude Crivello

Affiliations: CNRS-Saint-Gobain-NIMS; Laboratory for Innovative Key Materials and Structures (LINK); Laboratory of Computational, Quantitative, and Synthetic Biology (CQSB); Data-driven Materials Design Group; Center for Basic Research on Materials; LTCI, Télécom Paris, Institut Polytechnique de Paris

AI summary: Proposes a diffusion-based generative framework with adaptive constraint guidance that incorporates user-defined physical and chemical constraints without fine-tuning, generating inorganic crystal structures that satisfy thermodynamic stability and geometric constraints.

Comments Full article including supplementary information, 56 pages, 9 figures

Abstract

The discovery of inorganic crystal structures with targeted properties is a significant challenge in materials science. Generative models, especially state-of-the-art diffusion models, offer the promise of modeling complex data distributions and proposing novel, realistic samples. However, current generative AI models still struggle to produce diverse, original, and reliable structures of experimentally achievable materials suitable for high-stakes applications. In this work, we propose a generative machine learning framework based on diffusion models with adaptive constraint guidance, which enables the incorporation of user-defined physical and chemical constraints during the generation process. This approach is designed to be practical and interpretable for human experts, allowing transparent decision-making and expert-driven exploration. To ensure the robustness and validity of the generated candidates, we introduce a multi-step validation pipeline that combines graph neural network estimators trained to achieve DFT-level accuracy and convex hull analysis for assessing thermodynamic stability. We tested and validated our approach in case studies on several classical families of inorganic compounds. These preliminary results demonstrate our framework's ability to generate thermodynamically plausible crystal structures that satisfy targeted geometric constraints across diverse inorganic chemical systems.

2507.17506 2026-06-03 eess.SP cs.LG

Power-Aware Cognitive Radar Multi-target Tracking Under Unknown Disturbances

Imad Bouhou, Stefano Fortunati, Leila Gharsalli, Alexandre Renaux

Affiliations: Université Paris-Saclay, CNRS, CentraleSupélec, Laboratoire des Signaux et Systèmes; SAMOVAR, Télécom SudParis, Institut Polytechnique de Paris; DR2I-IPSA

AI summary: For multi-target tracking under unknown disturbances, proposes a cognitive radar framework based on Partially Observable Monte Carlo Planning (POMCP) that uses adaptive waveform design and power allocation to improve detection probability and tracking accuracy for low-SNR targets.

Abstract

This work presents a cognitive radar (CR) framework designed to track multiple aircraft under unknown disturbances using massive multiple-input multiple-output (MMIMO) systems. Since uniform power allocation is suboptimal across varying signal-to-noise ratios (SNRs), we incorporate an adaptive waveform design driven by Partially Observable Monte Carlo Planning (POMCP). By assigning an independent POMCP tree to each target, the system efficiently predicts target states. These predictions inform a constrained optimization problem that actively directs transmit energy toward weaker targets while maintaining sufficient power for stronger ones. Results confirm that the proposed POMCP method improves the detection probability for low-SNR targets from 0.6 to nearly 0.9, and yields more accurate tracking of the weakest target than a non-adaptive orthogonal waveform or a cognitive uniform-power POMCP baseline.

2603.26791 2026-06-03 cs.DL cs.AI cs.CL cs.CY

Crystal: Characterizing Relative Impact of Scholarly Publications

Hannah Collison, Benjamin Van Durme, Daniel Khashabi

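Affiliations: Johns Hopkins University

AI summary: Proposes Crystal, which uses large language models to jointly rank the papers cited by a citing paper and aggregates labels by majority voting to mitigate positional bias, distinguishing high-impact citations more reliably and improving accuracy by 9.5% and F1 by 8.3% on a human-annotated dataset.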

Abstract

Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper cites. We propose Crystal, which instead jointly ranks all cited papers within a citing paper using large language models (LLMs). To mitigate LLMs' positional bias, we rank each list three times in a randomized order and aggregate the impact labels through majority voting. This joint approach leverages the full citation context, rather than evaluating citations independently, to more reliably distinguish impactful references. Crystal outperforms a prior state-of-the-art impact classifier by +9.5% accuracy and +8.3% F1 on a dataset of human-annotated citations. Crystal further gains efficiency through fewer LLM calls and outperforms prior baselines using an open-weight model, enabling scalable, cost-effective citation impact analysis. In a case study of ACL Test-of-Time award-winning papers, we find that Crystal's impact characterizations align closely with long-term scientific recognition. We release Crystal-Bank, a 46.8k-paper dataset with rankings and impact labels, along with code.
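The shuffle-and-vote step described above (rank each list three times in randomized order, then aggregate by majority vote) might look roughly like the sketch below. The `label_fn` callback stands in for an LLM call and is purely hypothetical, as are the label names and the toy labeler with a built-in positional bias; none of this is the released Crystal code.

```python
import random
from collections import Counter


def majority_label(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def rank_with_shuffles(cited_ids, label_fn, runs=3, seed=0):
    """Label every cited paper `runs` times under shuffled orderings, then vote."""
    rng = random.Random(seed)
    votes = {cid: [] for cid in cited_ids}
    for _ in range(runs):
        order = list(cited_ids)
        rng.shuffle(order)            # a fresh ordering for each run
        labels = label_fn(order)      # maps each id to a label for this ordering
        for cid in cited_ids:
            votes[cid].append(labels[cid])
    return {cid: majority_label(v) for cid, v in votes.items()}


def toy_labeler(order):
    """Hypothetical stand-in for the LLM: 'A' is truly impactful, and whichever
    paper happens to be listed first is also labeled 'high' (positional bias)."""
    return {cid: "high" if cid == "A" or i == 0 else "low" for i, cid in enumerate(order)}


final_labels = rank_with_shuffles(["A", "B", "C", "D"], toy_labeler)
```

A label that appears only because of list position has to win that position in at least two of the three shuffles to survive the vote, whereas a label that is stable across orderings always survives.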

2510.21011 2026-06-03 cs.HC cs.AI cs.CY

Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations

Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff

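Affiliations: Human-Computer Interaction Lab, Computer Science and Engineering; Santa Clara University

AI summary: Audits more than 1.5 million occupational personas generated by four large language models and, by comparison with BLS data, finds that the models compress demographic variation and systematically distort racial and gender representation.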

Abstract

As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31 percentage points) and Black (-9 pp) workers are consistently underrepresented, while Hispanic (+17 pp) and Asian (+12 pp) workers are overrepresented, with stereotype exaggeration amplifying existing occupational segregation. These distortions are often extreme, including near-total portrayals of housekeepers as Hispanic and the near-erasure of Black workers from many occupations. Because these patterns recur across models with different institutional and cultural origins, they suggest shared structural sources of bias rather than model-specific artifacts. We argue that auditing generative AI requires evaluation frameworks that examine how synthetic populations systematically reshape demographic visibility across social roles.
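The percentage-point figures quoted above are differences between a group's share among generated personas and its share in the reference data. A minimal sketch of that comparison follows; it covers only the shift, not the paper's full shift/exaggeration decomposition, and the group names and shares are invented for illustration.

```python
def percentage_point_shift(generated_share, reference_share):
    """Generated share minus reference share per group, in percentage points."""
    return {
        group: round(100 * (generated_share[group] - reference_share[group]), 1)
        for group in reference_share
    }


# Hypothetical shares for one occupation (each set sums to 1).
generated = {"group_a": 0.30, "group_b": 0.70}
reference = {"group_a": 0.50, "group_b": 0.50}
shift_pp = percentage_point_shift(generated, reference)
```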

2603.23117 2026-06-03 cs.CR cs.AI cs.RO

TRAP: Hijacking VLA CoT-Reasoning via Adversarial Patches

Zhengxian Huang, Wenjun Zhu, Haoxuan Qiu, Xiaoyu Ji, Wenyuan Xu

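Affiliations: University of Science and Technology of China

AI summary: Proposes TRAP, an attack that uses adversarial patches to hijack the chain-of-thought reasoning of vision-language-action models and achieve targeted behavior manipulation.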

Comments Accepted by ICML 2026

Abstract

By integrating Chain-of-Thought (CoT) reasoning, Vision-Language-Action (VLA) models have demonstrated strong capabilities in robotic manipulation, particularly by improving generalization and interpretability. However, the security of CoT-based reasoning mechanisms remains largely unexplored. In this paper, we show that CoT reasoning introduces a novel attack vector for targeted behavior hijacking--for example, causing a robot to mistakenly deliver a knife to a person instead of an apple--without modifying the user's instruction. We first provide empirical evidence that CoT strongly governs action generation, even when it is semantically misaligned with the input instructions. Building on this observation, we propose TRAP, the first targeted behavior-hijacking adversarial attack against CoT-reasoning VLA models. By targeting the reasoning-to-action pathway, TRAP uses an adversarial patch (e.g., a tablecloth placed on the table) to steer intermediate CoT reasoning and downstream actions toward adversary-defined behaviors. Extensive evaluations on three representative reasoning VLAs, spanning distinct CoT reasoning mechanisms, demonstrate the effectiveness of TRAP. Notably, we implemented the patch by printing it on paper in a real-world setting. Our findings highlight the urgent need to secure CoT reasoning in VLA systems. The project page is available at https://zhengxian-huang.github.io/TRAP-website/.

2603.20508 2026-06-03 cs.MA cs.AI cs.CL

Measuring Weak-to-Strong Legibility of Reasoning Models

衡量推理模型的弱到强可读性

Dani Roytburg, Shreya Sridhar, Daphne Ippolito

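Affiliations: University of California, Berkeley; Stanford University

AI summary: For the intermediate chains of thought that reasoning language models produce in multi-agent settings, introduces the notion of "weak-to-strong legibility" and designs metrics for how easily weaker models can understand a stronger model's output.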

Comments Accepted to Trustworthy AI4GOOD Workshop @ ICML 2026

详情
AI中文摘要

推理语言模型及其生成的中间思维链在多智能体设置(如模型间监控或蒸馏到较小模型)中扮演着越来越核心的角色。当不同能力层级的智能体必须合作时,强模型需要产生能被弱模型消化的轨迹。我们将此目标称为“弱到强可读性”。大模型的可信度部分依赖于这种可读性属性。特别是在安全监督方面,采用弱监控器可能成为健康预算下可靠性支架的标准。可读性要求这些决策轨迹的形状采取某种弱监控器可访问的形式。现有的基于效率的可读性指标未能捕捉“彻底性”,而是侧重于简洁性。

英文摘要

Reasoning language models (RLMs) and the intermediate chains of thought they emit play an increasingly central role in multi-agent setups such as inter-model monitoring or distillation into smaller models. When agents at different capability tiers must cooperate, strong models need to produce traces digestible by weaker ones. We refer to this goal as "weak-to-strong legibility". Trustworthiness of large models depends in part on this legibility property. For safety oversight in particular, adoption of weak monitors may become a standard for reliability scaffolds on a healthy budget. Legibility requires that the shape of these decision-making traces takes some form accessible to weaker monitors. Existing efficiency-based metrics for legibility fail to capture "thoroughness", instead focusing on conciseness.

2603.19551 2026-06-03 stat.ME cs.LG

Learning to Bet for Horizon-Aware Anytime-Valid Testing

学习在严格截止日期下进行前瞻性任意有效测试的投注

Ege Onur Taga, Samet Oymak, Shubhanshu Shekhar

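Affiliations: Department of Electrical and Computer Engineering, University of Michigan

AI summary: Casts horizon-aware betting as a finite-horizon optimal control problem and uses deep reinforcement learning to learn a universal policy, yielding anytime-valid tests and confidence sequences for bounded means under a strict deadline.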

Comments To appear in ICML 2026; 29 pages, 22 figures

详情
AI中文摘要

我们针对严格截止日期 $N$ 下的有界均值,开发了前瞻性任意有效测试和置信序列。利用投注/e-过程框架,我们将前瞻性投注视为一个状态空间为 $(t, \log W_t)$ 的有限时域最优控制问题,其中 $t$ 是时间,$W_t$ 是测试鞅值。我们首先证明,在状态空间的某些内部区域,显著偏离Kelly投注的策略是次优的,而Kelly投注以高概率达到阈值。然后,我们识别出充分条件,表明在该区域之外,如果投注者落后于计划,比Kelly更激进的投注可能更好;如果投注者领先,比Kelly更保守的投注可能更好。这些结果共同暗示了 $(t, \log W_t)$ 平面上的一个简单相图,描绘了Kelly、分数Kelly和激进投注可能更优的区域。在此相图指导下,我们引入了一种基于通用深度Q网络(DQN)智能体的深度强化学习方法,该智能体从合成经验中学习单一策略,并将过去观测的简单统计量映射为跨时域和零假设的投注。在有限时域实验中,学习到的DQN策略取得了最先进的结果。

英文摘要

We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the test martingale value. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions showing that outside this region, more aggressive betting than Kelly can be better if the bettor is behind schedule, and less aggressive can be better if the bettor is ahead. Taken together these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may be preferable. Guided by this phase diagram, we introduce a Deep Reinforcement Learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience and maps simple statistics of past observations to bets across horizons and null values. In limited-horizon experiments, the learned DQN policy yields state-of-the-art results.
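
As a rough illustration of the betting framework the abstract refers to, the sketch below tracks a test martingale W_t = prod_i (1 + bet * (x_i - m)) for observations in [0, 1] under the null mean m, and rejects if W_t reaches 1/alpha before a deadline (Ville's inequality). The constant bet and the function names are assumptions made for illustration; this is not the paper's learned DQN policy.

```python
import math

def log_wealth_path(xs, m, bet):
    """Return log W_t for t = 1..len(xs), where
    W_t = prod_{i<=t} (1 + bet * (x_i - m)) and each x_i lies in [0, 1].

    `bet` must lie in (-1/(1-m), 1/m) so that every factor stays positive.
    """
    if not (-1.0 / (1.0 - m) < bet < 1.0 / m):
        raise ValueError("bet outside the admissible range")
    log_w, path = 0.0, []
    for x in xs:
        log_w += math.log(1.0 + bet * (x - m))
        path.append(log_w)
    return path

def rejects_by_deadline(xs, m, bet, alpha, deadline):
    """Reject H0: mean = m if W_t >= 1/alpha at some t <= deadline."""
    threshold = math.log(1.0 / alpha)
    return any(lw >= threshold for lw in log_wealth_path(xs[:deadline], m, bet))
```

A horizon-aware policy in the sense of the abstract would replace the constant `bet` with a choice that depends on the state (t, log W_t).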

2602.04132 2026-06-03 eess.SY cs.LG cs.RO cs.SY

LC-SAC: Lyapunov-Constrained Soft Actor-Critic via Koopman Operator Theory for Trajectory Tracking and Stabilization

LC-SAC: 基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法用于轨迹跟踪与镇定

Dhruv S. Kushwaha, Zoleikha A. Biron

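Affiliations: University of California, Berkeley

AI summary: Proposes a Lyapunov-constrained Soft Actor-Critic algorithm built on Koopman operator theory, which uses a linear lifted dynamics model and a closed-form Control Lyapunov Function for trajectory tracking and stabilization, and adds a Conditional Value-at-Risk constraint to handle rare but severe instability events.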

Comments 13 pages, 8 Figures

详情
AI中文摘要

强化学习在解决复杂序列决策问题中取得了显著成功,但其在安全关键物理系统中的应用仍受限于缺乏稳定性保证。标准强化学习算法优先考虑奖励最大化,往往产生可能引起振荡或无界状态发散的策略。本文提出一种基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法。我们通过扩展动态模态分解学习误差动力学的线性提升代理模型,并求解离散代数Riccati方程以获得闭式二次候选控制李雅普诺夫函数。该控制李雅普诺夫函数作为拉格朗日惩罚项被纳入SAC演员更新中,通过条件风险价值目标聚合最坏情况尾部分布,将约束压力集中在罕见但严重的失稳事件上。我们进一步引入三种结构性的EDMD改进:在求解DARE之前对提升的A矩阵进行谱半径归一化、具有物理意义的LQR状态代价,以及强制V(0)=0的值偏置锚点,使得闭式控制李雅普诺夫函数对于更高维的提升模型(如倒立摆和3D四旋翼)是适定的。消融研究表明,硬拉格朗日约束是必要的,将其替换为奖励塑形会导致学习不稳定并在四旋翼任务中导致回报崩溃。

英文摘要

Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. In this work, we propose a Lyapunov-Constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We learn a linear lifted surrogate of the error dynamics via Extended Dynamic Mode Decomposition (EDMD) and solve the Discrete Algebraic Riccati Equation (DARE) to obtain a closed-form quadratic candidate Control Lyapunov Function (CLF). This CLF is incorporated into the SAC actor update as a Lagrangian penalty that aggregates the worst-case tail of violations via a Conditional Value-at-Risk (CVaR) objective, concentrating constraint pressure on rare but severe instability events. We further introduce three structural EDMD refinements (spectral-radius normalization of the lifted A-matrix prior to the DARE solve, a physically meaningful LQR state cost, and a value-bias anchor enforcing V(0)=0) that make the closed-form CLF well-posed for higher-dimensional lifted models such as the cartpole and 3D quadrotor. The ablation study shows that a hard Lagrangian constraint is essential: replacing it with reward shaping (Lyap-RS-SAC) destabilizes learning and collapses return on quadrotor tasks.
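
The CVaR aggregation described above can be sketched as follows. This is a minimal illustration, assuming that a violation is the positive part of V(z') - (1 - decay) * V(z) on each sampled transition and that CVaR is estimated empirically as the mean of the worst tail of a batch. The function names and the `decay` parameter are hypothetical and are not taken from the paper.

```python
import math

def cvar(values, alpha):
    """Empirical CVaR_alpha: mean of the largest ceil((1 - alpha) * n) values."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    # Round before ceil so that floating-point noise does not inflate the tail size.
    k = max(1, math.ceil(round((1.0 - alpha) * len(values), 9)))
    worst = sorted(values, reverse=True)[:k]
    return sum(worst) / k

def clf_violations(v_now, v_next, decay=0.0):
    """Positive part of V(z') - (1 - decay) * V(z) per sample (assumed form);
    zero means the decrease condition holds for that transition."""
    return [max(0.0, vn - (1.0 - decay) * vc) for vc, vn in zip(v_now, v_next)]
```

Averaging only the worst tail means that a batch with a few large violations is penalized more heavily than a plain mean would penalize it.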

2601.12186 2026-06-03 cs.SE cs.AI

Aletheia: What Makes RLVR For Code Verifiers Tick?

Aletheia: 什么使得代码验证器的RLVR有效?

Vatsal Venkatkrishna, Indraneil Paul, Iryna Gurevych

Affiliations: INSAIT, Sofia University "St. Kliment Ohridski", Bulgaria; Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt and National Research Center for Applied Cybersecurity ATHENE, Germany

AI summary: Ablates three factors in RLVR training of code verifiers (intermediate thinking traces, learning from negative samples, and on-policy training) along the performance-cost trade-off at different scales, and finds that the optimal recipe depends on model scale.

Comments 31 pages, 6 figures

详情
AI中文摘要

通过可验证奖励的强化学习(RLVR)训练的多领域思考验证器是现代后训练的核心。然而,由于完整RLVR管道的成本过高,它们在代码生成中的应用落后于执行反馈。在这项工作中,我们消融了RLVR中性能-成本权衡的三个主要选择:中间思考轨迹、从负样本学习和策略内训练。我们引入了Aletheia,一个受控的、基于执行的测试平台,以促进对不同模型大小和两个常见验证器应用场景下的协变量偏移进行无污染分析。我们的分析揭示,最优训练配方依赖于规模:对于小型验证器,策略内学习是主要性能驱动因素,而在较大规模下,思考预算成为最关键因素。虽然利用负样本对不同大小的top-1选择准确性有一致影响,但它们对排名重建的贡献随规模单调增加,并在大规模下稳定训练中起关键作用。我们的帕累托最优分析表明,在较大模型规模下消除策略内训练会产生一个与完整RLVR配方性能相当的验证器。此外,我们发现,在较低预算下,放弃思考轨迹是一种计算高效的策略,在训练成本和验证器准确性之间提供了强有力的权衡。最终,我们的工作为高效部署鲁棒代码验证器提供了必要的经验基础,从而使其能够在大型代码生成模型的后训练管道中得到更广泛的应用。

英文摘要

Multi-domain thinking verifiers trained via Reinforcement Learning with Verifiable Rewards (RLVR) are a cornerstone of modern post-training. However, their adoption in code generation has lagged behind that of execution feedback due to the prohibitive costs of the full RLVR pipeline. In this work, we ablate three primary choices along the performance-cost trade-off in RLVR: intermediate thinking traces, learning from negative samples, and on-policy training. We introduce Aletheia, a controlled, execution-grounded testbed to facilitate a contamination-free analysis of code verifier training recipes across disparate model sizes and covariate shifts across two common verifier application scenarios. Our analysis reveals that the optimal training recipe is scale-dependent: on-policy learning is the primary performance driver for small verifiers, whereas the thinking budget becomes the most vital factor at larger scales. While leveraging negative samples has a consistent impact on top-1 selection accuracy across sizes, their contribution to ranking reconstruction increases monotonically with scale and plays a key role in stabilizing training at large sizes. Our Pareto optimality analysis demonstrates that eliminating on-policy training at larger model scales yields a verifier that performs comparably to the full RLVR recipe. Furthermore, we find that eschewing thinking traces serves as a compute-efficient strategy at lower budgets, offering a strong trade-off between training cost and verifier accuracy. Ultimately, our work provides the empirical foundation necessary to efficiently deploy robust code verifiers, thereby enabling their wider adoption in post-training pipelines for large code generation models.

2602.07075 2026-06-03 physics.chem-ph cs.AI cs.CL cs.LG

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

LatentChem: 从文本思维链到化学推理中的潜在思考

Xinwu Ye, Yicheng Mao, Yuxuan Liao, Jia Zhang, Yimeng Liu, Li Hao, Fang Wu, Zhiwei Li, Zehong Wang, Zhiyuan Liu, Zhenfei Yin, Li Yuan, Philip Torr, Huan Sun, xiangxiang Zeng, Mengdi Wang, Le Cong, Shenghua Gao, Xiangru Tang

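Affiliations: University of Science and Technology of China

AI summary: To address the modality mismatch caused by chemical LLMs' reliance on explicit chain-of-thought, proposes LatentChem, a reasoning interface that decouples chemical logic from linguistic generation via continuous thought vectors and dynamic perception; it achieves a 59.88% non-tie win rate against a strong CoT baseline on ChemCoTBench and a 10.84x average reduction in reasoning step overhead (5.96x wall-clock speedup).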

Comments Accepted at ICML 2026

详情
AI中文摘要

当前的化学大语言模型主要依赖显式的思维链来解决复杂推理问题。然而,将非语言的隐性化学逻辑强制转化为离散的自然语言,造成了根本性的“模态不匹配”,为推理带来了人为瓶颈。我们提出了LatentChem,一种将化学逻辑与语言生成解耦的推理接口,使模型能够通过连续思维向量和动态感知来处理信息。我们的研究揭示了一个关键涌现行为:自发内化,这里定义为在仅结果优化下的自我选择。当为任务成功进行优化时,模型放弃冗长的文本推导,转而采用隐式的潜在计算,这表明模型将连续流形视为化学逻辑更自然的载体。这一范式转变也被证明是一种更优的计算策略:在严格的ChemCoTBench基准上,LatentChem对强CoT基线取得了59.88%的非平局胜率,同时在所有评估基准上实现了平均10.84倍的推理步骤开销降低(5.96倍实际加速)。我们的结果提供了经验证据,表明化学推理更自然、更有效地实现为连续潜在动力学,而非离散的语言轨迹。

英文摘要

Current chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) to solve complex reasoning problems. However, forcing nonverbal tacit chemical logic into discrete natural language imposes a fundamental "modality mismatch," creating an artificial bottleneck for reasoning. We introduce LatentChem, a reasoning interface that decouples chemical logic from linguistic generation, enabling the model to process information via continuous thought vectors and dynamic perception. Our investigation reveals a pivotal emergent behavior: spontaneous internalization, defined here as self-selected under outcome-only optimization. When optimized for task success, the model abandons verbose textual derivations in favor of implicit latent computation, suggesting that it identifies the continuous manifold as a more native substrate for chemical logic. This paradigm shift also proves to be a superior computational strategy: LatentChem achieves a 59.88% non-tie win rate against the strong CoT baseline on the rigorous ChemCoTBench, while delivering a broad 10.84$\times$ average reduction in reasoning step overhead (5.96$\times$ wall-clock speedup) across all evaluated benchmarks. Our results provide empirical evidence that chemical reasoning is more naturally and effectively realized as continuous latent dynamics rather than discretized linguistic trajectories.

2603.05207 2026-06-03 cs.IR cs.CL

Core-based Hierarchies for Efficient GraphRAG

基于核心的高效图RAG层次结构

Jakir Hossain, Ahmet Erdem Sarıyüce

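Affiliations: University at Buffalo

AI summary: To address the non-reproducibility of Leiden clustering in GraphRAG, proposes replacing it with k-core decomposition to build a deterministic, density-aware hierarchy, and designs lightweight heuristics that preserve connectivity while lowering LLM cost and improving answer comprehensiveness and diversity.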

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情
AI中文摘要

检索增强生成(RAG)通过引入外部知识增强了大型语言模型。然而,现有的基于向量的方法通常无法处理需要跨多个文档推理的全局理解任务。GraphRAG通过将文档组织成具有层次化社区的知识图谱来解决这一问题,这些社区可以被递归总结。当前的GraphRAG方法依赖Leiden聚类进行社区检测,但我们证明,在平均度数为常数且大多数节点度数较低的稀疏知识图谱上,模块度优化允许指数级数量的近似最优划分,使得基于Leiden的社区本质上不可复现。为了解决这个问题,我们提出用k-core分解替代Leiden,它在线性时间内产生确定性的、密度感知的层次结构。我们引入一组轻量级启发式方法,利用k-core层次结构构建大小有界、保持连接性的社区用于检索和总结,同时采用一种令牌预算感知的采样策略来降低LLM成本。我们在包括金融收益报告、新闻文章和播客在内的真实世界数据集上评估了我们的方法,使用三个LLM进行答案生成,并由五个独立的LLM裁判进行逐项比较评估。跨数据集和模型,我们的方法一致地提高了答案的全面性和多样性,同时减少了令牌使用量,证明了基于k-core的GraphRAG是一种有效且高效的全局理解框架。

英文摘要

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
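
To make the k-core idea concrete, here is a small sketch of the standard peeling procedure on an undirected graph given as an adjacency dictionary. It is an illustration only: the function names are hypothetical, and this simple version rescans for the minimum-degree node at each step, so it is quadratic in the number of nodes. A bucket-queue implementation achieves the linear running time the abstract mentions. Core numbers do not depend on tie-breaking, which is what makes the resulting hierarchy deterministic.

```python
def core_numbers(adj):
    """Return the core number of every node.

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the remaining node of minimum current degree.
        v = min(remaining, key=lambda u: (degree[u], u))
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core

def shells(core):
    """Group nodes by core number: {k: sorted nodes whose core number is k}."""
    out = {}
    for v, k in core.items():
        out.setdefault(k, []).append(v)
    return {k: sorted(vs) for k, vs in out.items()}
```

For a triangle with one pendant node attached, the triangle nodes get core number 2 and the pendant node gets core number 1.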

2510.20372 2026-06-03 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH

Testing Most Influential Sets

最具影响力集合的检验

Lucas D. Konrad, Nikolas Kuschnig

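Affiliations: Vienna University of Economics and Business; Monash University

AI summary: For the problem that a small set of data points can unduly drive model conclusions, derives an exact influence formula for linear least squares, identifies the extreme value distributions of maximal influence, and proposes a hypothesis-testing framework for excessive influence.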

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

小的有影响力的数据子集可以极大地影响模型结论,少数数据点可能推翻关键发现。虽然最近的研究识别了这些最具影响力的集合,但没有正式的方法来判断最大影响何时是过度的,而非在自然随机抽样变异下预期的。我们通过开发一个关于最具影响力集合的原则性框架来填补这一空白。聚焦于线性最小二乘法,我们推导了一个方便的精确影响公式,并识别了最大影响的极值分布——对于固定大小的集合和重尾数据是重尾的弗雷歇分布,对于增长集合或轻尾数据是表现良好的耿贝尔分布。这使得我们能够对过度影响进行严格的假设检验。我们通过跨经济学、生物学和机器学习基准的应用,解决了有争议的发现,并用严格的推断取代了临时的启发式方法。

英文摘要

Small influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least-squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence: the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate the framework through applications across economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.
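
To illustrate what a "most influential set" is, the sketch below brute-forces it for the simplest case: a single regressor with no intercept, where the slope is sum(x*y) / sum(x*x) and removing a set of observations just subtracts their terms from both sums. This is an assumed toy setting with hypothetical names; it does not reproduce the paper's influence formula or its extreme value test.

```python
from itertools import combinations

def most_influential_set(xs, ys, k):
    """Brute-force search for the size-k index set whose removal changes the
    no-intercept least-squares slope the most.

    Returns (indices, delta) with delta = slope_full - slope_without_set.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    full = sxy / sxx
    best_set, best_delta = None, 0.0
    for subset in combinations(range(len(xs)), k):
        # Removing observations only subtracts their terms from the two sums.
        rxy = sxy - sum(xs[i] * ys[i] for i in subset)
        rxx = sxx - sum(xs[i] ** 2 for i in subset)
        if rxx <= 0.0:
            continue  # slope undefined once all variation in x is removed
        delta = full - rxy / rxx
        if best_set is None or abs(delta) > abs(best_delta):
            best_set, best_delta = subset, delta
    return best_set, best_delta
```

The question the abstract raises is whether the resulting maximal `delta` is larger than sampling variation alone would produce, which is where the Fréchet and Gumbel limits come in.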

2603.01471 2026-06-03 cs.IR cs.LG

Reconstructing Content with Collaborative Attention for Universal Multimodal Representation Learning

通过协同注意力重建内容以提升多模态嵌入质量

Jiahan Chen, Da Li, Hengran Zhang, Yinqiong Cai, Lixin Su, Jiafeng Guo, Daiting Shi, Dawei Yin, Keping Bi

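Affiliations: State Key Laboratory of AI Safety; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Baidu Inc.

AI summary: Proposes CoCoA, a pre-training paradigm that restructures the attention flow and adds an EOS-based reconstruction task, using collaborative attention to optimize multimodal embeddings so that the model compresses input semantics into the <EOS> token, improving embedding quality.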

详情
AI中文摘要

多模态嵌入模型,根植于多模态大语言模型(MLLMs),在检索和分类等多样任务中取得了显著的性能提升。然而,现有方法大多严重依赖大规模对比学习,对MLLMs的架构和训练范式如何影响嵌入质量的探索有限。虽然MLLMs的因果注意力和下一个令牌预测范式在生成任务中有效,但并未明确鼓励形成全局紧凑的表示,限制了其作为多模态嵌入骨干的有效性。为解决这一问题,我们提出了CoCoA,一种基于协同注意力的内容重建预训练范式,用于多模态嵌入优化。具体而言,我们重构注意力流并引入基于EOS的重建任务,鼓励模型从相应的<EOS>嵌入中重建输入。这促使多模态模型将输入的语义信息压缩到<EOS>令牌中,为后续的对比学习奠定基础。在MMEB-V1上的大量实验表明,基于Qwen2-VL和Qwen2.5-VL构建的CoCoA显著提升了嵌入质量。结果验证了内容重建作为最大化现有数据价值的有效策略,使多模态嵌入模型能够生成紧凑且信息丰富的表示,提升其性能上限。

英文摘要

Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and next-token prediction paradigm of MLLMs does not explicitly encourage the formation of globally compact representations, limiting their effectiveness as multimodal embedding backbones. To address this, we propose CoCoA, a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization. Specifically, we restructure the attention flow and introduce an EOS-based reconstruction task, encouraging the model to reconstruct the input from the corresponding <EOS> embeddings. This drives the multimodal model to compress the semantic information of the input into the <EOS> token, laying the foundations for subsequent contrastive learning. Extensive experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality. Results validate that content reconstruction serves as an effective strategy to maximize the value of existing data, enabling multimodal embedding models to generate compact and informative representations and raising their performance ceiling.

2602.20213 2026-06-03 cs.SE cs.AI cs.CR

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

CodeHacker: 用于检测竞赛编程解决方案漏洞的自动化测试用例生成

Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

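Affiliations: Shanghai University of Finance and Economics; Northwestern Polytechnical University; Meituan; University of Toronto

AI summary: Proposes CodeHacker, a framework that combines multi-strategy adversarial test case generation (stress testing, anti-hash attacks, logic-specific targeting) with a calibration phase to expose program vulnerabilities, raise the true negative rate of test sets, and improve the performance of RL-trained models.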

详情
AI中文摘要

大型语言模型(LLM)在代码生成方面的评估很大程度上依赖于测试用例的质量和鲁棒性。然而,现有的基准测试往往缺乏对微妙边界情况的覆盖,导致错误的解决方案通过测试。为弥补这一差距,我们提出了CodeHacker,一个自动化的智能体框架,专门用于生成有针对性的对抗性测试用例,以暴露程序提交中的潜在漏洞。模仿竞赛编程中的黑客机制,CodeHacker采用多策略方法,包括压力测试、反哈希攻击和逻辑特定攻击,以破解特定的代码提交。为确保这些攻击的有效性和可靠性,我们引入了一个校准阶段,在该阶段中,智能体在评估参赛者代码之前,通过自生成的对抗性探测迭代地完善自己的验证器和检查器。实验表明,CodeHacker显著提高了现有数据集上的真负率(TNR),有效过滤了先前被接受的错误解决方案。此外,生成的对抗性案例被证明是优越的训练数据,提升了在LiveCodeBench等基准测试上经过强化学习训练的模型的性能。

英文摘要

The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code. Experiments demonstrate that CodeHacker significantly improves the True Negative Rate (TNR) of existing datasets, effectively filtering out incorrect solutions that were previously accepted. Furthermore, generated adversarial cases prove to be superior training data, boosting the performance of RL-trained models on benchmarks like LiveCodeBench.
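
Stress testing, one of the strategies named above, is commonly done by comparing a submission against a slow but trusted reference on many small random inputs. The sketch below shows that generic pattern on a made-up task (maximum sum of a non-empty subarray). The task, the buggy submission, and all names are hypothetical and are not taken from CodeHacker.

```python
import random

def stress_test(candidate, reference, gen_input, trials=1000, seed=0):
    """Return the first generated input on which `candidate` disagrees with
    `reference`, or None if no disagreement is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen_input(rng)
        if candidate(case) != reference(case):
            return case
    return None

def reference_max_subarray(a):
    # Brute force over all non-empty subarrays: slow but obviously correct.
    return max(sum(a[i:j]) for i in range(len(a)) for j in range(i + 1, len(a) + 1))

def buggy_max_subarray(a):
    # Bug: starting from 0 silently allows the empty subarray.
    best = cur = 0
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def small_array(rng):
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
```

Calling `stress_test(buggy_max_subarray, reference_max_subarray, small_array)` returns an all-negative array, the corner case that a test set without negative-only inputs would miss.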

2602.18690 2026-06-03 q-bio.NC cs.CV cs.LG

Neural Fields as World Models

神经场作为世界模型

Joshua Nunley

Affiliations: Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington; Cognitive Science Program, Indiana University, Bloomington

AI summary: Proposes isomorphic world models that use motor-gated neural fields to perform physical prediction within a spatial map, enabling offline task learning and body-linked representations.

Comments 6 pages, 6 figures. Annual Meeting of the Cognitive Science Society (CogSci 2026)

详情
AI中文摘要

人类可以在离线状态下预演可能的未来,例如在心理练习和可能的梦境中,这表明世界模型可能支持远离环境的学习。标准的机器学习世界模型将视觉输入压缩为潜在向量,丢弃了感觉皮层的空间结构特征。我们提出了同构世界模型:一种保持感觉拓扑结构的架构,使得物理预测成为几何传播而非抽象状态转换。我们通过运动门控神经场实现这一想法,其中活动通过局部侧向连接演化,运动命令乘性地调制特定通道。在三个实验中,相同的架构学习了无“瞬移”的弹道预测,通过将任务误差通过冻结的学习世界模型传播,改进了离线接球策略,并在没有身体标签的情况下发展出身体选择性的运动通道。这些结果提供了初步证据,表明物理预测、离线任务学习和身体相关表征共享一个共同的计算基础:空间地图内的动作条件预测。

英文摘要

Humans rehearse possible futures offline, as in mental practice and perhaps dreaming, suggesting that world models may support task learning away from the environment. Standard machine learning world models compress visual input into latent vectors, discarding the spatial structure that characterizes sensory cortex. We propose isomorphic world models: architectures that preserve sensory topology, so physics prediction becomes geometric propagation rather than abstract state transition. We implement this idea with motor-gated neural fields, where activity evolves through local lateral connectivity and motor commands multiplicatively modulate specific channels. Across three experiments, the same architecture learns ballistic prediction without "teleporting," improves a catching policy offline by propagating task error through a frozen learned world model, and develops body-selective motor channels without body labels. These results provide preliminary evidence that physical prediction, offline task learning, and body-linked representation share a common computational substrate: action-conditional prediction within a spatial map.

2602.12430 2026-06-03 cs.MA cs.AI

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

大型语言模型的智能体技能:架构、获取、安全与未来路径

Renjun Xu, Yang Yan

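Affiliations: ReDiscovery, Hangzhou, China; Westlake University, Hangzhou, China

AI summary: Surveys research on agent skills for large language models, covering architectural foundations (e.g., the SKILL.md specification and progressive context loading), skill acquisition (reinforcement learning, autonomous discovery, compositional synthesis), deployment at scale (the computer-use agent stack, GUI grounding), and security challenges (26.1% of community-contributed skills contain vulnerabilities), and proposes a Skill Trust and Lifecycle Governance Framework.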

Comments Accepted by Agent Skills '26 Workshop at ACM Conference on AI and Agentic Systems 2026

详情
AI中文摘要

从单体语言模型向模块化、配备技能的智能体的转变,标志着大型语言模型(LLM)在实践中部署方式的决定性转变。智能体技能——即智能体按需加载的指令、代码和资源的可组合包——无需重新训练即可实现动态能力扩展,而非将所有程序性知识编码在模型权重中。它被形式化为渐进式披露、可移植技能定义以及与模型上下文协议(MCP)集成的范式。本综述全面探讨了智能体技能领域,该领域在过去几个月迅速发展。我们沿四个轴组织该领域:(i)架构基础,考察SKILL.md规范、渐进式上下文加载以及技能与MCP的互补作用;(ii)技能获取,涵盖使用技能库的强化学习、自主技能发现(SEAgent)和组合技能合成;(iii)大规模部署,包括计算机使用智能体(CUA)栈、GUI接地进展以及OSWorld和SWE-bench上的基准进展;(iv)安全,最近的经验分析显示,26.1%的社区贡献技能包含漏洞,这促使我们提出技能信任与生命周期治理框架——一个四层、基于门的权限模型,将技能来源映射到分级部署能力。我们识别出七个开放挑战——从跨平台技能可移植性到基于能力的权限模型——并提出了实现可信、自我改进技能生态系统的研究议程。与先前广泛涵盖LLM智能体或工具使用的综述不同,本工作特别关注新兴的技能抽象层及其对下一代智能体系统的影响。项目仓库:https://github.com/scienceaix/agentskills

英文摘要

The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. It is formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries, autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills

2602.10949 2026-06-03 stat.ML cs.LG math.DS math.PR

Optimal Initialization in Depth: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks

深度网络的最优初始化:深度Leaky ReLU网络的Lyapunov初始化与极限定理

Constantin Kogler, Tassilo Schwarz, Samuel Kittle

Affiliations: School of Mathematics, Institute for Advanced Study; Mathematical Institute, University of Oxford; Max Planck Institute for Multidisciplinary Sciences; Department of Mathematics, University College London

AI summary: Through a rigorous probabilistic analysis of random deep Leaky ReLU networks, proposes Lyapunov initialization, which sets the Lyapunov exponent to zero to keep activations stable and thereby improve learning.

Comments Preprint, 44 pages

详情
AI中文摘要

深度网络的有效初始化需要理解随机神经网络。本文对深度无偏置随机Leaky ReLU网络进行了严格的概率分析。我们证明了网络激活范数对数的强大数定律和中心极限定理,表明随着层数增加,其增长由称为Lyapunov指数的参数控制。该参数刻画了激活消失与爆炸之间的尖锐相变,并针对高斯或正交权重矩阵显式计算了Lyapunov指数。我们的结果表明,标准方法(如He初始化或正交初始化)无法保证低宽度深度网络的激活稳定性。基于这些理论见解,我们提出了一种新的初始化方法,称为Lyapunov初始化,它将Lyapunov指数设为零,从而确保神经网络尽可能稳定,经验上导致学习改进。

英文摘要

Effective initialization in deep networks requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep bias-free random Leaky ReLU networks is provided. We prove a Law of Large Numbers and a Central Limit Theorem for the logarithm of the norm of network activations, establishing that, as the number of layers increases, their growth is governed by a parameter called the Lyapunov exponent. This parameter characterizes a sharp phase transition between vanishing and exploding activations, and we calculate the Lyapunov exponent explicitly for Gaussian or orthogonal weight matrices. Our results reveal that standard methods, such as He initialization or orthogonal initialization, do not guarantee activation stability for deep networks of low width. Based on these theoretical insights, we propose a novel initialization method, referred to as Lyapunov initialization, which sets the Lyapunov exponent to zero and thereby ensures that the neural network is as stable as possible, leading empirically to improved learning.
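
The quantity at the center of the abstract, the per-layer growth rate of the log activation norm, can be estimated by simulation. The sketch below does this for a bias-free random Leaky ReLU network with i.i.d. Gaussian weights of variance sigma^2 / width. It is a Monte Carlo illustration with hypothetical names and default values, not the paper's closed-form expression for the Lyapunov exponent.

```python
import math
import random

def leaky_relu(x, slope):
    return x if x >= 0 else slope * x

def estimate_lyapunov(width, depth, sigma, slope=0.01, seed=0):
    """Estimate (1/depth) * log(||x_depth|| / ||x_0||) for a bias-free random
    Leaky ReLU network with i.i.d. N(0, sigma^2 / width) weights."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(v * v for v in x))
    x = [v / norm for v in x]
    std = sigma / math.sqrt(width)
    log_growth = 0.0
    for _ in range(depth):
        # One layer: fresh Gaussian weight row per output unit, then Leaky ReLU.
        x = [
            leaky_relu(sum(rng.gauss(0.0, std) * v for v in x), slope)
            for _ in range(width)
        ]
        norm = math.sqrt(sum(v * v for v in x))
        log_growth += math.log(norm)
        x = [v / norm for v in x]  # renormalise to avoid overflow or underflow
    return log_growth / depth
```

Because the network is bias-free and Leaky ReLU is positively homogeneous, multiplying `sigma` by c shifts the estimate by log(c). Under these assumptions, driving the exponent to zero comes down to choosing the weight scale, which is the idea behind the initialization the abstract proposes.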

2602.10387 2026-06-03 cs.DB cs.AI

Test-Time Optimization of Physical Query Plans with LLMs

基于LLM的物理查询计划测试时优化

Mehmet Hamza Erol, Xiangpeng Hao, Federico Bianchi, Ciro Greco, Jacopo Tagliabue, James Zou

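Affiliations: Stanford University; University of Wisconsin-Madison; TogetherAI; Bauplan

AI summary: Proposes DBPlanBench, a harness in which an LLM optimizes physical query plans at test time through semantic reasoning and evolutionary search, achieving median speedups of 1.05-1.12x on OLAP queries and supporting transfer from small to large scale factors.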

Comments Code is available at: https://github.com/BauplanLabs/DBPLANBENCH

详情
AI中文摘要

传统查询优化依赖于基于成本的优化器,使用预定义的启发式和统计模型来估计执行成本(如运行时间、内存和I/O)。改进这些需要大量的工程努力,但它们通常无法利用查询和模式中的语义相关性来获得更好的物理计划。然而,大型语言模型(LLMs)能够推理列语义、值分布以及经典统计所忽略的更广泛的领域上下文。我们介绍了DBPlanBench,一个用于DataFusion引擎的框架,它通过紧凑的序列化表示暴露物理计划,并将LLM提出的编辑作为JSON补丁应用。在此框架上,我们实例化了一个测试时优化工作流,其中LLM检查物理查询计划,基于语义推理提出局部编辑,并通过进化搜索在迭代中优化候选方案。我们针对OLAP查询,其中重复执行的重负载使得即使是微小的效率提升也能转化为显著的累积节省。我们特别将评估重点放在连接重排序和连接侧选择上,其中基数估计误差会复合倍增。在TPC-H上中位数加速达到1.10-1.12倍,在TPC-DS上达到1.05-1.07倍,某些查询加速高达4.78倍。我们还证明了在小规模因子下发现的优化可以有效地迁移到更大规模,支持低成本的小规模到大工作流。

英文摘要

Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these requires substantial engineering effort, yet they often cannot exploit semantic correlations in queries and schemas that could enable better physical plans. Large language models (LLMs), however, can reason about column semantics, value distributions, and broader domain context that classical statistics miss. We introduce DBPlanBench, a harness for the DataFusion engine that exposes physical plans through a compact serialized representation and applies LLM-proposed edits as JSON patches. On this harness, we instantiate a test-time optimization workflow where an LLM examines physical query plans, proposes localized edits based on semantic reasoning, and an evolutionary search refines the candidates across iterations. We target OLAP queries, where heavy, repeated execution turns even small efficiency gains into substantial cumulative savings. We specifically focus our evaluation on join reordering and join-side selection, where cardinality-estimation errors compound multiplicatively. Median speedups reach $1.10$-$1.12\times$ on TPC-H and $1.05$-$1.07\times$ on TPC-DS, with some queries achieving up to $4.78\times$. We also demonstrate that optimizations discovered at small scale factors transfer effectively to larger ones, supporting a low-cost small-to-large workflow.
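
The idea of applying proposed edits as JSON patches can be sketched with a tiny subset of JSON Patch: `replace` operations addressed by JSON Pointer paths. The plan layout below (keys such as `build_side` and `inputs`) is invented for illustration and is not DataFusion's or DBPlanBench's actual serialization. The helper names are likewise hypothetical.

```python
import copy

def resolve_parent(doc, pointer):
    """Walk a JSON Pointer (without ~ escapes) and return (parent, last_key)."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must start with '/'")
    parts = pointer[1:].split("/")
    node = doc
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    return node, (int(last) if isinstance(node, list) else last)

def apply_replace_ops(plan, patch):
    """Apply a list of {"op": "replace", "path": ..., "value": ...} operations
    to a deep copy of `plan`, leaving the input untouched."""
    out = copy.deepcopy(plan)
    for op in patch:
        if op["op"] != "replace":
            raise ValueError(f"unsupported op: {op['op']}")
        parent, key = resolve_parent(out, op["path"])
        if isinstance(parent, dict) and key not in parent:
            raise KeyError(key)  # "replace" requires an existing target
        parent[key] = op["value"]
    return out

# Hypothetical plan: a hash join over two scans.
plan = {
    "node": "HashJoin",
    "build_side": "left",
    "inputs": [{"node": "Scan", "table": "lineitem"},
               {"node": "Scan", "table": "orders"}],
}
patch = [{"op": "replace", "path": "/build_side", "value": "right"}]
patched = apply_replace_ops(plan, patch)
```

Working on a copy keeps the original plan available as a baseline, which suits a search loop that compares many candidate edits against the same starting point.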

2510.12049 2026-06-03 econ.GN cs.AI q-fin.EC

Generative AI and Sales Productivity: Field Experiments in Online Retail

生成式人工智能与销售效率:在线零售中的现场实验

Lu Fang, Zhe Yuan, Kaifu Zhang, Dante Donati, Miklos Sarvary

Affiliations: Duke University; Imperial Business School; BIG AI Conference; MSI AI Forum; TSE Digital Economics Conference; AIML Conference; Operational Innovation Network Summit; University of Rochester; UC Davis; TUM Workshop on Generative AI in Marketing; UCL School of Management; Columbia Business School; Business & Generative AI Conference; Zhejiang University School of Management

AI summary: Uses large-scale randomized field experiments to quantify the short-term impact of generative AI (GenAI) on sales performance in online retail, finding that GenAI raises sales in most workflows, mainly through higher conversion rates rather than larger cart values, with stronger effects for less experienced consumers.

Comments Keywords: Artificial Intelligence, Consumer Experience, Field Experiments, GenAI, Productivity, Retail Platforms, Sales. JEL codes: C93, D24, L81, M31, O3

详情
AI中文摘要

我们通过在一家领先的跨境在线零售平台上进行涉及数百万用户和产品的大规模随机现场实验,量化了生成式人工智能(GenAI)对销售业绩的短期影响。在2023-2024年间,该平台将GenAI整合到七个面向消费者的业务流程中,涵盖客户服务、消费者-产品匹配、广告和卖家服务。我们发现,GenAI的采用在大多数工作流中提高了销售额,效果范围从无显著影响到16.3%,具体取决于GenAI相对于基线公司实践的边际贡献。在四个具有正向销售效果的GenAI应用中,隐含的年增量价值约为5美元——考虑到零售商的规模和GenAI采用的早期阶段,这是一个具有经济意义的影响。收益主要通过更高的转化率而非更大的购物车价值实现,这与GenAI通过减少搜索、信息、沟通和个性化摩擦来改善购物体验相一致。重要的是,这些效应并未与更差的购买后结果相关,因为产品退货率和客户评分没有恶化。最后,我们记录了显著的需求侧异质性,对经验较少的消费者收益更大。我们的发现提供了新颖的大规模因果证据,展示了GenAI如何塑造在线零售的销售效率,突出了其即时价值和更广泛的潜力。

英文摘要

We quantify the short-term impact of Generative Artificial Intelligence (GenAI) on sales performance through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over 2023-2024, the platform integrated GenAI into seven consumer-facing business workflows spanning customer service, consumer-product matching, advertising, and seller services. We find that GenAI adoption increases sales in most workflows, with effects ranging from no detectable impact to $16.3\%$, depending on GenAI's marginal contribution relative to baseline firm practices. Across the four GenAI applications with positive sales effects, the implied annual incremental value is roughly $\$5$ -- an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The gains operate primarily through higher conversion rates rather than larger cart values, consistent with GenAI improving the shopping experience by reducing search, information, communication, and personalization frictions. Importantly, these effects are not associated with worse post-purchase outcomes, as product return rates and customer ratings do not deteriorate. Finally, we document substantial demand-side heterogeneity, with larger gains for less experienced consumers. Our findings provide novel, large-scale causal evidence on how GenAI shapes sales productivity in online retail, highlighting both its immediate value and broader potential.

2510.12636 2026-06-03 stat.ML cs.LG math.AP

Adapting Noise to Data: Generative Flows from 1D Processes

将噪声适应于数据:来自一维过程的生成流

Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl

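Affiliations: University of Cambridge

AI summary: Proposes a general framework that learns data-adaptive parametric prior distributions (latent noise) through one-dimensional quantile functions, optimized via the Wasserstein distance between noise and data, to help flow-based generative models learn distributions such as heavy-tailed ones.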

Comments ICML 2026

详情
AI中文摘要

基于流的生成模型中的默认高斯潜变量在学习某些分布(如重尾分布)时会带来挑战。我们引入了一个通用框架,使用一维分位数函数学习数据自适应的参数化先验分布(潜在噪声),并通过噪声与数据之间的Wasserstein距离进行优化。基于分位数的先验参数化自然地适应重尾分布和紧支撑分布,并缩短传输路径。在重尾天气和图像数据集上的数值结果证实了该方法的灵活性和有效性,且计算开销可忽略不计。

英文摘要

The default Gaussian latent in flow-based generative models poses challenges when learning certain distributions such as heavy-tailed ones. We introduce a general framework for learning data-adaptive parametric prior distributions (latent noise) using one-dimensional quantile functions, optimized via the Wasserstein distance between noise and data. The quantile-based prior parameterization naturally adapts to both heavy-tailed and compactly supported distributions and shortens transport paths. Numerical results on heavy-tailed weather and image datasets confirm the method's flexibility and effectiveness, achieved with negligible computational overhead.
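
Two ingredients mentioned above are easy to illustrate in one dimension: sampling noise by pushing uniform draws through a quantile function, and computing the Wasserstein distance between two equal-size samples by matching sorted values. The sketch below uses a fixed, closed-form heavy-tailed quantile function as a stand-in for a learned one. That choice, the `tail_index` parameter, and all names are assumptions for illustration.

```python
import random

def heavy_tail_quantile(u, tail_index):
    """Quantile function of a symmetric Pareto-type law on the real line:
    Q(u) = sign(2u - 1) * ((1 - |2u - 1|) ** (-1 / tail_index) - 1)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep away from the infinite endpoints
    s = 2.0 * u - 1.0
    magnitude = (1.0 - abs(s)) ** (-1.0 / tail_index) - 1.0
    return magnitude if s >= 0 else -magnitude

def sample_noise(n, tail_index, seed=0):
    """Inverse-transform sampling: uniform draws pushed through the quantile."""
    rng = random.Random(seed)
    return [heavy_tail_quantile(rng.random(), tail_index) for _ in range(n)]

def wasserstein_1d(xs, ys, p=1):
    """W_p between two equal-size empirical distributions on the line:
    sort both samples and match order statistics."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    a, b = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(a, b)) / len(a)) ** (1.0 / p)
```

A smaller `tail_index` gives heavier tails. In a learned version one would adjust the quantile parameters to reduce `wasserstein_1d` between the noise samples and the data.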