arXivDaily: arXiv daily digest, updated Monday through Friday
2409.10310 2026-05-12 cs.RO cs.SY eess.SY

Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization

Lei Zheng, Rui Yang, Minzhe Zheng, Michael Yu Wang, Jun Ma

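AI summary: This paper studies safe, real-time consistent planning for autonomous vehicles in partially observable environments. It proposes CPTO, a method built on a consensus safety barrier module and parallel trajectory optimization: discrete-time barrier function theory ensures trajectory safety across different obstacle configurations, and the optimization problem is decomposed into multiple low-dimensional quadratic programs to accelerate computation. Experiments show that the method improves driving safety and path consistency on both synthetic and real-world traffic datasets.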

Comments 16 pages, 7 figures

Journal ref
IEEE Transactions on Intelligent Transportation Systems, vol. 27, no. 5, pp. 5174-5190, 2026
Abstract

Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety barrier module that ensures reliable safety coverage within the spatiotemporal trajectory space across potential obstacle configurations. Following this, a bi-convex parallel trajectory optimization problem is derived that facilitates decomposition into a series of low-dimensional quadratic programming problems to accelerate computation. By leveraging the consensus alternating direction method of multipliers (ADMM) for parallel optimization, each generated candidate trajectory corresponds to a possible environment configuration while sharing a common consensus trajectory segment. This ensures driving safety and consistency when executing the consensus trajectory segment for the ego vehicle in real time. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.
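A minimal sketch of the consensus idea, assuming each candidate trajectory $x_i$ (one per hypothesized obstacle configuration) minimizes a hypothetical quadratic cost $\tfrac12\|x_i - a_i\|^2$ toward a per-scenario reference $a_i$, and only the first `k` steps must agree with a shared consensus segment $z$. This is generic scaled-form consensus ADMM, not the paper's CPTO subproblems, which also carry safety-barrier constraints; the function name and costs are illustrative.

```python
import numpy as np

def consensus_admm(refs: np.ndarray, k: int, rho: float = 1.0, iters: int = 200):
    """Scaled-form consensus ADMM over candidate trajectories sharing their first k steps.

    refs: (num_scenarios, horizon) array of hypothetical per-scenario references a_i.
    Each x_i minimizes 0.5 * ||x_i - a_i||^2 subject to x_i[:k] == z.
    """
    m, _ = refs.shape
    x = refs.astype(float).copy()
    z = np.zeros(k)
    u = np.zeros((m, k))                       # scaled dual variables, one row per scenario
    for _ in range(iters):
        # x-update: closed form per scenario (these m solves could run in parallel)
        x[:, :k] = (refs[:, :k] + rho * (z - u)) / (1.0 + rho)
        x[:, k:] = refs[:, k:]                 # the unshared tail follows its own scenario
        # z-update: average the shared segments plus their duals
        z = np.mean(x[:, :k] + u, axis=0)
        # dual update: accumulate each scenario's consensus violation
        u += x[:, :k] - z
    return x, z
```

With this toy cost the consensus segment converges to the average of the scenario references over the first `k` steps, while each tail stays scenario-specific, which mirrors "different candidates, one executed segment".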

2407.12173 2026-05-12 cs.CV cs.AI

Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis

Haeil Lee, Hansang Lee, Seoyeon Gye, Junmo Kim

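AI summary: This paper proposes an efficient timestep sampling method for diffusion models, based on a spectral analysis of images along the diffusion process, to improve image-generation efficiency. Instead of conventional uniform sampling, it uses a Beta-distribution-like strategy that concentrates on the key early and late steps, where image content changes the most. Experiments show that the method outperforms uniform sampling on FID and IS and is competitive in computational efficiency, providing a practical framework for optimizing diffusion models.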

Comments 8 pages, 9 figures, WACV 2025

Journal ref
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 4215-4224, 2025
Abstract

Generative diffusion models have emerged as a powerful tool for high-quality image synthesis, yet their iterative nature demands significant computational resources. This paper proposes an efficient time step sampling method based on an image spectral analysis of the diffusion process, aimed at optimizing the denoising process. Instead of the traditional uniform distribution-based time step sampling, we introduce a Beta distribution-like sampling technique that prioritizes critical steps in the early and late stages of the process. Our hypothesis is that certain steps exhibit significant changes in image content, while others contribute minimally. We validated our approach using Fourier transforms to measure frequency response changes at each step, revealing substantial low-frequency changes early on and high-frequency adjustments later. Experiments with ADM and Stable Diffusion demonstrated that our Beta Sampling method consistently outperforms uniform sampling, achieving better FID and IS scores, and offers competitive efficiency relative to state-of-the-art methods like AutoDiffusion. This work provides a practical framework for enhancing diffusion model efficiency by focusing computational resources on the most impactful steps, with potential for further optimization and broader application.
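A minimal sketch of a U-shaped (Beta-like) timestep schedule. It assumes the symmetric Beta(1/2, 1/2) distribution, whose inverse CDF has the closed form $\sin^2(\pi u/2)$; the paper's actual Beta parameters and its mapping to timesteps are not given in the abstract, so both the parameters and the rounding here are illustrative, and the function name is hypothetical.

```python
import math

def beta_half_timesteps(num_steps: int, total_timesteps: int = 1000) -> list[int]:
    """Pick up to `num_steps` integer timesteps in [0, total_timesteps - 1], denser near
    both ends, by evaluating the Beta(1/2, 1/2) inverse CDF at evenly spaced quantile levels."""
    if num_steps < 2:
        raise ValueError("need at least two steps")
    steps = set()
    for i in range(num_steps):
        u = i / (num_steps - 1)                   # quantile level in [0, 1]
        x = math.sin(math.pi * u / 2) ** 2        # closed-form inverse CDF of Beta(1/2, 1/2)
        steps.add(round(x * (total_timesteps - 1)))
    return sorted(steps, reverse=True)            # denoising runs from high t down to low t
```

For 10 steps out of 1000 this returns `[999, 969, 882, 749, 586, 413, 250, 117, 30, 0]`: gaps are about 30 at both ends and about 170 in the middle, whereas uniform spacing would use a constant gap of 111.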

2407.10853 2026-05-12 cs.CL cs.AI

Bring Your Own Prompts: Use-Case-Specific Bias and Fairness Evaluation for LLMs

Dylan Bouchard

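AI summary: This paper studies the bias and fairness risks of large language models (LLMs) across application scenarios, noting that existing approaches lack systematic guidance on choosing evaluation metrics for a specific use case. The author proposes a decision framework that maps a model and a population of prompts to the relevant bias and fairness metrics according to task type, whether prompts mention protected attributes, and stakeholder priorities, and introduces new metrics based on stereotype classifiers and counterfactual text similarity. The work also releases the open-source library langfair; experiments with multiple models and prompt sets show that fairness risk must be assessed in the specific deployment context rather than from benchmark performance alone.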

Comments v6: Updated title; LangFair repository: https://github.com/cvs-health/langfair

Abstract

Bias and fairness risks in Large Language Models (LLMs) vary substantially across deployment contexts, yet existing approaches lack systematic guidance for selecting appropriate evaluation metrics. We present a decision framework that maps LLM use cases, characterized by a model and population of prompts, to relevant bias and fairness metrics based on task type, whether prompts contain protected attribute mentions, and stakeholder priorities. Our framework addresses toxicity, stereotyping, counterfactual unfairness, and allocational harms, and introduces novel metrics based on stereotype classifiers and counterfactual adaptations of text similarity measures. We release an open-source Python library, langfair, for practical adoption. Extensive experiments on use cases across five LLMs and five prompt populations demonstrate that fairness risks cannot be reliably assessed from benchmark performance alone: results on one prompt dataset likely overstate or understate risks for another, underscoring that fairness evaluation must be grounded in the specific deployment context.
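To illustrate what a counterfactual adaptation of a text-similarity measure can look like, the sketch below swaps protected-attribute words to form a counterfactual prompt and scores pairs of model responses with token-set Jaccard similarity. The swap table, the choice of Jaccard, and all function names are hypothetical; this is not the langfair API.

```python
import re

# Hypothetical gender swap table; real use needs a curated, context-aware mapping
# (for example, "her" -> "his" vs "him" is ambiguous).
SWAP = {"he": "she", "she": "he", "him": "her", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}

def counterfactual(prompt: str) -> str:
    """Replace each protected-attribute word with its counterpart (output is lowercased)."""
    return re.sub(r"\b\w+\b", lambda m: SWAP.get(m.group(0).lower(), m.group(0)), prompt)

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def counterfactual_similarity(response_pairs) -> float:
    """Mean similarity between responses to each prompt and to its counterfactual;
    1.0 means the attribute swap never changed the token set of the response."""
    return sum(jaccard(a, b) for a, b in response_pairs) / len(response_pairs)
```

In practice one would generate responses to `prompt` and `counterfactual(prompt)` for each prompt in the population and average the similarities; low values flag responses that change with the protected attribute.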

2406.10861 2026-05-12 cs.LG cs.DC

Knowledge Distillation in Federated Learning: a Survey on Long Lasting Challenges and New Solutions

Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, Philip S. Yu

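AI summary: Federated learning (FL) is a distributed, privacy-oriented machine learning paradigm in which multiple clients collaboratively train a model without sharing raw data. To address challenges in traditional FL such as privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity, knowledge distillation (KD) has been widely applied to FL since 2020. This paper comprehensively surveys KD-based FL methods, analyzing their core principles, taxonomy, and applications to privacy protection, data heterogeneity, communication efficiency, and personalization, and discusses open challenges and future research directions.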

Journal ref
International Journal of Intelligent Systems, 7406934, 33 pages, 2025
Abstract

Federated Learning (FL) is a distributed and privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while keeping the raw data localized. However, traditional FL faces several challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity. To tackle these challenges, knowledge distillation (KD) has been widely applied in FL since 2020. KD is a validated and effective model compression and enhancement algorithm. The core concept of KD involves facilitating knowledge transfer between models by exchanging logits at intermediate or output layers. These properties make KD an excellent solution for the long-lasting challenges in FL. To date, few reviews have summarized and analyzed the current trends and methods for applying KD efficiently in FL. This article aims to provide a comprehensive survey of KD-based FL, focusing on addressing the above challenges. First, we provide an overview of KD-based FL, including its motivation, basics, taxonomy, a comparison with traditional FL, and where KD should be executed. We also analyze the critical factors in KD-based FL in the appendix, including teachers, knowledge, data, and methods. We discuss how KD can address the challenges in FL, including privacy protection, data heterogeneity, communication efficiency, and personalization. Finally, we discuss the challenges facing KD-based FL algorithms and future research directions. We hope this survey can provide insights and guidance for researchers and practitioners in the FL area.
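A minimal sketch of logit-based knowledge transfer, assuming a FedMD-style setup in which every client reports logits on a shared public batch (an assumption; the survey covers many variants). The server averages the client logits into a teacher signal, and each client then minimizes a temperature-scaled KL distillation loss toward it. Function names and the temperature are illustrative.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)       # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_logits(client_logits):
    """Server side: average the clients' logits on the public batch into a teacher signal."""
    return np.mean(np.stack(client_logits), axis=0)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """Temperature-scaled KL(teacher || student), averaged over the batch, with the usual T^2 factor."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(temperature ** 2 * kl.mean())
```

Only logits on public inputs leave each client, which is why logit exchange is attractive for both privacy and communication cost compared with sending model weights.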

2308.03303 2026-05-12 cs.CL

LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning

Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li

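AI summary: This paper studies efficient fine-tuning of large language models and proposes LoRA-FA, which freezes the projection matrix A in LoRA and trains only the projection matrix B, reducing the number of trainable parameters and improving efficiency. The method reveals an asymmetric, collapsible structure in LoRA updates and introduces closed-form gradient corrections to narrow the gap to full-parameter fine-tuning. Experiments show that LoRA-FA performs well on multiple benchmarks while significantly reducing memory and compute overhead.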

Abstract

Fine-tuning large language models (LLMs) is crucial for improving their performance on downstream tasks, but full-parameter fine-tuning (Full-FT) is computationally expensive and memory-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), address this by optimizing only a small subset of parameters. However, LoRA may underperform Full-FT in certain scenarios due to the intrinsic limitations of its low-rank gradients. In this work, we reveal an asymmetric, collapsible structure in LoRA's update: the low-rank modification to W can be reformulated as a single-layer linear regression, implying that one of the LoRA factors can be frozen without sacrificing expressivity. Leveraging this insight, we introduce LoRA-FA, which freezes the projection-down matrix A and trains only the projection-up matrix B. We further close the gap to Full-FT by deriving closed-form gradient corrections that minimize the discrepancy between the induced low-rank gradient and the full gradient. Through extensive experiments on diverse benchmarks, including GLUE, GSM8K, MT-Bench, and HumanEval, we demonstrate that LoRA-FA consistently achieves comparable performance to existing PEFT methods and Full-FT. Experiments on system efficiency show that LoRA-FA significantly reduces activation memory consumption and computational workload in fine-tuning. Our code is available at https://github.com/huggingface/peft.
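A toy NumPy sketch of the LoRA-FA update rule on a single linear layer, assuming a squared-error loss and plain gradient descent (both illustrative choices). A stays frozen at its random initialization and only B is trained, so computing B's gradient needs just the r-dimensional activation `A @ x` rather than the full layer input. The paper's closed-form gradient corrections are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, n = 16, 8, 2, 64

W = rng.normal(size=(d_out, d_in))                # frozen pretrained weight
A = rng.normal(size=(r, d_in)) / np.sqrt(d_in)    # frozen projection-down matrix (random init)
B = np.zeros((d_out, r))                          # trainable projection-up matrix, zero init

# Synthetic regression target reachable by a rank-r update through A (illustrative only).
B_true = rng.normal(size=(d_out, r))
X = rng.normal(size=(d_in, n))
Y = (W + B_true @ A) @ X

def loss_and_grad_B(B):
    Z = A @ X                       # r x n activations: all that must be cached for B's gradient
    err = W @ X + B @ Z - Y         # residual of the adapted layer
    loss = 0.5 * np.sum(err ** 2) / n
    grad_B = err @ Z.T / n          # dL/dB; A gets no gradient because it is frozen
    return loss, grad_B

A_before = A.copy()
losses = []
for _ in range(200):
    loss, g = loss_and_grad_B(B)
    losses.append(loss)
    B -= 0.1 * g                    # update only the projection-up matrix
```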

2202.02710 2026-05-12 cs.LG cs.NA math.AP math.NA

Spectrally Adapted Physics-Informed Neural Networks for Solving Unbounded Domain Problems

Mingtao Xia, Lucas Böttcher, Tom Chou

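AI summary: This paper proposes a numerical method that combines adaptive spectral methods with physics-informed neural networks (PINNs) to solve analytically intractable partial differential equations (PDEs) defined on unbounded domains. The method uses PINNs to implement high-order numerical schemes and to evaluate solutions efficiently at arbitrary points in space and time, and incorporates adaptive spectral techniques to better resolve the dependence on the unbounded variable. Several examples demonstrate its advantages for solving unbounded-domain PDEs and for parameter estimation.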

Comments 29 pages, 8 figures

Abstract

Solving analytically intractable partial differential equations (PDEs) that involve at least one variable defined on an unbounded domain arises in numerous physical applications. Accurately solving unbounded domain PDEs requires efficient numerical methods that can resolve the dependence of the PDE on the unbounded variable over at least several orders of magnitude. We propose a solution to such problems by combining two classes of numerical methods: (i) adaptive spectral methods and (ii) physics-informed neural networks (PINNs). The numerical approach that we develop takes advantage of the ability of physics-informed neural networks to easily implement high-order numerical schemes to efficiently solve PDEs and extrapolate numerical solutions at any point in space and time. We then show how recently introduced adaptive techniques for spectral methods can be integrated into PINN-based PDE solvers to obtain numerical solutions of unbounded domain problems that cannot be efficiently approximated by standard PINNs. Through a number of examples, we demonstrate the advantages of the proposed spectrally adapted PINNs in solving PDEs and estimating model parameters from noisy observations in unbounded domains.

1811.01198 2026-05-12 cs.LG math.OC stat.ML

Provable Exactness for Asymmetric Low-Rank SDP Learning

Enliang Hu

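AI summary: This paper studies a unified regularized framework for asymmetric low-rank factorization (aBMF) of semidefinite programs, targeting structured optimization problems in machine learning. Adding a quadratic penalty term keeps the objective biconvex while guaranteeing that, for a sufficiently large penalty parameter, the asymmetric and symmetric formulations share the same critical points, which ensures exactness. The work provides theoretical guarantees for asymmetric relaxations and answers the open question of whether an exact penalty exists.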

Abstract

Low-rank factorization is a standard way to make structured optimization problems in machine learning more tractable by replacing matrix variables with compact factors. For positive semidefinite (PSD) variables, the symmetric Burer-Monteiro factorization (sBMF) writes $Z=XX^\top$ with a single low-rank factor $X$. A recent asymmetric alternative (aBMF) writes $Z=XY^\top$ and adds a quadratic penalty $(\gamma/2)\|X-Y\|_F^2$ to encourage symmetry. This split is attractive because it yields a biconvex objective with alternating convex subproblems, but its practical value depends strongly on how the penalty parameter $\gamma$ is chosen. We study a unified regularized aBMF framework and derive an explicit lower bound on $\gamma$ that guarantees exactness: under mild assumptions, any $\gamma$ above this threshold makes aBMF and sBMF share the same critical points. This gives a principled way to use the asymmetric formulation without altering the critical-point structure of the symmetric problem. In particular, it answers the open question of whether an exact penalty exists for asymmetric relaxation.
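A minimal sketch of the biconvex structure, assuming the simplest smooth loss $f(Z) = \tfrac12\|Z - M\|_F^2$ for a PSD target $M$ (a hypothetical choice; the paper's framework is more general). With $Y$ fixed, the penalized objective is a strongly convex quadratic in $X$, and vice versa, so each alternating step is an exact least-squares solve and the objective never increases. The code does not compute the paper's lower bound on $\gamma$; function names are illustrative.

```python
import numpy as np

def objective(X, Y, M, gamma):
    """Hypothetical aBMF instance: 0.5*||X Y^T - M||_F^2 + (gamma/2)*||X - Y||_F^2."""
    fit = np.linalg.norm(X @ Y.T - M) ** 2
    penalty = np.linalg.norm(X - Y) ** 2
    return 0.5 * fit + 0.5 * gamma * penalty

def abmf_alternating(M, rank, gamma, iters=50, seed=0):
    """Alternating exact minimization over X and Y (each block is a convex least-squares problem)."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    X, Y = rng.normal(size=(n, rank)), rng.normal(size=(n, rank))
    I = np.eye(rank)
    history = [objective(X, Y, M, gamma)]
    for _ in range(iters):
        # X-step: solve X (Y^T Y + gamma I) = M Y + gamma Y (the Gram matrix is symmetric).
        X = np.linalg.solve(Y.T @ Y + gamma * I, (M @ Y + gamma * Y).T).T
        # Y-step: solve Y (X^T X + gamma I) = M^T X + gamma X.
        Y = np.linalg.solve(X.T @ X + gamma * I, (M.T @ X + gamma * X).T).T
        history.append(objective(X, Y, M, gamma))
    return X, Y, history
```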

1207.5293 2026-05-12 cs.AI math.PR

Probability Bracket Notation: Multivariable Systems and Static Bayesian Networks

Xing M. Wang

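AI summary: This paper extends Probability Bracket Notation (PBN) to multivariable probability systems and static Bayesian networks, providing a unified, basis-independent algebraic form for representing and manipulating dependencies among random variables. Using the Student Bayesian network as an example, it demonstrates PBN for prediction, bottom-up and top-down inference, and expectation calculations, and shows its efficiency in large networks. PBN is further extended to networks with continuous variables, such as linear Gaussian models, and a hybrid healthcare Bayesian network combining discrete and continuous variables is introduced to support user-specific predictions, with potential applications in education, data analytics, and machine learning.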

Comments 28 pages. Added subsection 3.4 and Appendix A, describing the two-phase procedure for computing inference of d-separable chains and its efficiency in large Bayesian networks, especially polytrees with pendant subnets (including blobs)

Abstract

We extend Probability Bracket Notation (PBN), inspired by the Dirac notation in quantum mechanics, to multivariable probability systems and static Bayesian networks (BNs). By defining probability distributions and conditional expectations in a unified, basis-independent algebraic form, PBN provides a systematic way to represent and manipulate dependencies among random variables. Using the well-known Student BN as an illustrative probabilistic graphical model, we demonstrate prediction, bottom-up and top-down inference, and expectation calculations within the PBN framework. We show that, for a large N-node binary BN, after a one-time preprocessing, inference along a d-separable chain with k intermediate nodes requires $O(k\,2^k)$ operations, compared to $O(N\,2^N)$ for direct computation from the full joint distribution. We further extend PBN to networks with continuous variables, including linear Gaussian models, and introduce a hybrid Healthcare BN that combines discrete and continuous variables. In this model, discrete-display nodes serve as proxies for continuous parents, enabling user-specific predictions. Overall, PBN provides an operator-based framework that unifies representation and computation, with potential applications in education, data analytics, and machine learning.

2605.08441 2026-05-12 cs.LG cs.AI

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards

Haoyu Hu, Xuandong Zhao, Xuhai "Orson" Xu, Nori Jacoby

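AI summary: This work proposes DUET, a method for optimizing token-budget allocation in reinforcement learning with verifiable rewards (RLVR) to improve training efficiency and reasoning quality. DUET jointly controls how many rollouts each prompt receives and how long each rollout is, optimizing both training time and effectiveness under a shared compute budget. Experiments show that DUET performs well on multiple math and coding benchmarks and, using only half the token budget, still outperforms the other methods, substantially speeding up training without sacrificing performance.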

Abstract

Reinforcement learning with verifiable rewards (RLVR) generates hundreds of thousands of tokens per training step, with rollout generation dominating the computational cost. The overall token budget can be controlled along two main dimensions: (i) deciding which prompts to allocate rollouts to, and (ii) deciding how long each rollout should be. Prior work has generally controlled only one of these dimensions at a time. We show that jointly tuning both decisions under a shared compute budget improves both reasoning quality and wall-clock training time. We instantiate this view as DUal-controlled tokEn allocaTion (DUET), a computationally efficient layer over GRPO that uses a lightweight pre-rollout surrogate of prompt informativeness to set how many rollouts each prompt receives, and a marker-gated abort rule with importance reweighting to set when to stop them. On Qwen3-1.7B trained on MATH, DUET outperforms full-budget GRPO and three other budget-aware baselines. DUET's advantage further generalizes to other benchmarks across math and coding, and it is on par with the best baseline in the scientific Q&A domain, while also achieving a $1.62\times$ wall-clock speedup. More notably, using only 50% of the token budget, DUET still outperforms all baseline methods at their full budget, achieving an even higher $2.51\times$ speedup over full-budget GRPO. We verify the strong performance of DUET on other backbone LLMs, including Qwen3-4B and Llama-3.2-3B-Instruct. Notably, the gap between DUET and the strongest baseline widens as the budget tightens, contrary to the usual pattern in which efficient methods trade off quality as compute decreases. More broadly, these results suggest that budget-aware control strategies such as DUET are valuable not only for accelerating training but also for improving the quality of the learning signal.
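DUET's pre-rollout surrogate is not specified in the abstract, so the sketch below uses a hypothetical stand-in for the "how many rollouts per prompt" decision: an estimated pass rate $p$ per prompt, weighted by the Bernoulli variance $p(1-p)$. The motivation is that with GRPO's group-relative advantages, a prompt whose rollouts all succeed or all fail yields zero advantage and hence no learning signal. The budget is split proportionally with largest-remainder rounding; the abort rule and importance reweighting are not modelled, and the function name is illustrative.

```python
def allocate_rollouts(pass_rates, budget: int, min_per_prompt: int = 1):
    """Split a rollout budget across prompts in proportion to p * (1 - p).

    pass_rates: hypothetical pre-rollout estimates of each prompt's success probability.
    Prompts with p near 0 or 1 receive only the minimum.
    """
    n = len(pass_rates)
    remaining = budget - min_per_prompt * n
    if remaining < 0:
        raise ValueError("budget too small for the per-prompt minimum")
    weights = [p * (1.0 - p) for p in pass_rates]
    total = sum(weights)
    shares = [remaining * w / total for w in weights] if total > 0 else [remaining / n] * n
    alloc = [min_per_prompt + int(s) for s in shares]        # floor of each share
    # Largest-remainder rounding so the allocation sums exactly to the budget.
    leftover = budget - sum(alloc)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, with estimated pass rates `[0.0, 0.5, 0.9, 1.0]` and a budget of 16 rollouts, the allocation is `[1, 10, 4, 1]`: the prompt with the most uncertain outcome gets most of the budget.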

2605.08440 2026-05-12 cs.LG cs.CV

TARO: Temporal Adversarial Rectification Optimization Using Diffusion Models as Purifiers

Daniel Wesego, Pedram Rooshenas

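AI summary: This paper proposes TARO (Temporal Adversarial Rectification Optimization), which uses diffusion models to improve the purification of adversarial examples. TARO builds a temporally guided score prior along the diffusion trajectory, combining denoising views at different noise scales into a coarse-to-fine residual target that strengthens robustness to adversarial attacks while preserving semantic information. Experiments show that TARO improves robust accuracy across multiple datasets and adaptive threat models, and that it is compatible with adversarial-likelihood objectives for further robustness gains.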

Abstract

Adversarial purification with diffusion models seeks to project adversarial examples back toward the data manifold, but balancing semantic preservation and robustness against adaptive attacks remains challenging. Recent work shows that standard diffusion purification can fail under adaptive evaluation, while test-time score-based optimization is more resilient. Existing optimization defenses, however, typically rely on a single diffusion noise regime or treat timesteps uniformly, overlooking the distinct roles of coarse and fine denoising scales. We propose Temporal Adversarial Rectification Optimization (TARO), an inference-time purification method that builds a temporally guided score prior from multiple denoising views along the diffusion trajectory. TARO forms a coarse-to-fine residual target: high-noise experts provide globally smoothed structure with reduced adversarial sensitivity, while low-noise experts restore image-specific, class-relevant details. A guidance strength controls this temporal correction, allowing TARO to balance robust global rectification with semantic preservation. Empirically, TARO improves robust accuracy across datasets and adaptive threat models in a zero-shot setting, while remaining compatible with complementary adversarial-likelihood objectives for further robustness gains.

2605.08437 2026-05-12 cs.CL cs.AI

Magis-Bench: Evaluating LLMs on Magistrate-Level Legal Tasks

Ramon Pires, Thales Sales Almeida, Celio Larcher Junior, Giovana Bonás, Hugo Abonizio, Marcos Piau, Roseval Malaquias Junior, Thiago Laitz, Rodrigo Nogueira

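AI summary: This paper introduces Magis-Bench, a benchmark for evaluating large language models (LLMs) on magistrate-level legal tasks, built from recent Brazilian competitive examinations for judicial positions. Using tasks such as multi-turn legal analysis and the drafting of judicial sentences, it evaluates the legal judgment of 23 state-of-the-art models, scored with an LLM-as-a-judge methodology; the results show that judicial-level legal reasoning and writing remain highly challenging for all models. The work provides a new evaluation tool and data for legal AI.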

Abstract

Existing benchmarks for legal AI focus primarily on tasks where LLMs must produce legal arguments or documents, yet the capacity to judge such arguments (weighing competing claims, applying doctrine to facts, and rendering reasoned decisions) is arguably as fundamental to a well-functioning legal system as advocacy itself. We introduce Magis-Bench, a benchmark for evaluating LLMs on magistrate-level writing tasks derived from recent Brazilian competitive examinations for judicial positions. Magis-Bench comprises 74 questions from eight examinations conducted between 2023 and 2025, including discursive legal analysis questions with multi-turn structure and practical exercises requiring the composition of complete civil and criminal judicial sentences. We evaluate 23 state-of-the-art LLMs using an LLM-as-a-judge methodology with four independent frontier models as evaluators. Our results show strong inter-judge agreement (Kendall's $W = 0.984$; pairwise Kendall's $\tau \ge 0.897$), with Google's Gemini-3-Pro-Preview achieving the highest average score (6.97/10), followed by Gemini-3-Flash-Preview (6.67) and Claude-4.5-Opus (6.46). Even the best-performing models score below 70% of the maximum, indicating that judicial-level legal reasoning and writing remain challenging for current LLMs. We release the complete benchmark, model outputs, and evaluation code to support further research on legal AI capabilities.
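Kendall's $W$ summarizes agreement among $m$ judges ranking $n$ items: $W = 12S / (m^2(n^3 - n))$, where $S$ is the sum of squared deviations of each item's rank sum from the mean rank sum; $W = 1$ means identical rankings and $W = 0$ means no agreement. A small sketch without tie correction (how the paper converts judge scores to ranks and handles ties is not stated in the abstract):

```python
def kendalls_w(rankings):
    """Kendall's coefficient of concordance W, without tie correction.

    rankings: list of m lists; each gives the rank (1..n) one judge assigns to n items.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean = sum(rank_sums) / n
    s = sum((r - mean) ** 2 for r in rank_sums)   # spread of the rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

Three judges with identical rankings give `W == 1.0`, while two judges with exactly reversed rankings give `W == 0.0`.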

2605.08436 2026-05-12 cs.LG cs.AI physics.comp-ph

A meshfree exterior calculus for generalizable and data-efficient learning of physics from point clouds

Benjamin D. Shaffer, Brooks Kinch, M. Ani Hsieh, Nathaniel Trask

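AI summary: This paper proposes a meshfree exterior calculus (MEEC) for learning structure-preserving descriptions of physics from point clouds, and builds on it MEEC-Net, a data-efficient model that transfers across resolutions, geometries, and physical parameters. MEEC equips an ε-ball graph with virtual node and edge measures via a sparse Schur complement solve, achieving exact discrete conservation while remaining end-to-end differentiable in the point positions, without the mesh-generation step required by conventional methods. Experiments show that MEEC-Net generalizes significantly better than existing neural-operator methods on several canonical PDE benchmarks.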

Comments 25 pages, 13 figures

Abstract

We introduce a meshfree exterior calculus (MEEC) for learning structure-preserving descriptions of physics on point clouds, and use it to build MEEC-Net, a data-efficient surrogate that transfers across resolutions, geometries, and physical parameters. MEEC equips an $\varepsilon$-ball graph with virtual node and edge measures via a single sparse Schur complement solve; the resulting complex satisfies discrete conservation exactly, is end-to-end differentiable in the point positions, and exposes a direct geometry-to-physics link without the mesh-generation step required by conventional structure-preserving discretizations. MEEC-Net learns unknown physics as a shared edge-wise flux law in an SO($d$)-invariant local frame, so the same kernel produces compatible fluxes on any point cloud whose features lie in the training range. We prove a solution-error bound that splits into discretization and kernel-approximation terms and is independent of problem geometry, explaining the observed transfer from very few examples. We show that single-solution training transfers to unseen geometries, boundary conditions, and physical parameters. On five canonical PDE benchmarks MEEC-Net achieves 1-2 orders of magnitude lower out-of-distribution error than baseline neural-operator approaches. On the SimJEB structural-bracket benchmark it achieves competitive error while using substantially fewer training geometries.
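Why an edge-wise flux formulation conserves exactly can be seen in a toy setting: if node-wise divergence is the transpose of a signed edge-node incidence matrix applied to edge fluxes, each edge's flux leaves one node and enters another, so the divergences sum to zero for any flux values. The sketch builds an ε-ball graph and uses a hypothetical Fick-like flux law in place of a learned kernel; it omits MEEC's virtual node and edge measures and the Schur complement solve.

```python
import numpy as np

def eps_ball_edges(points: np.ndarray, eps: float):
    """Undirected edges (i, j), i < j, between points at distance <= eps."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if np.linalg.norm(points[i] - points[j]) <= eps]

def incidence(n_nodes: int, edges):
    """Signed edge-node incidence D: (D @ u)[e] = u[j] - u[i] for edge e = (i, j)."""
    D = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = -1.0, 1.0
    return D

def divergence(points: np.ndarray, u: np.ndarray, eps: float) -> np.ndarray:
    edges = eps_ball_edges(points, eps)
    D = incidence(len(points), edges)
    lengths = np.array([np.linalg.norm(points[j] - points[i]) for i, j in edges])
    flux = -(D @ u) / lengths      # hypothetical flux law: flows from high u toward low u
    return D.T @ flux              # net inflow at each node
```

Because every row of `D` sums to zero, `divergence(...).sum()` vanishes up to floating-point rounding regardless of the flux law, which is the discrete analogue of conservation.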

2605.08432 2026-05-12 cs.CL cs.AI stat.ML

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

Zhanliang Wang, Jiancong Xiao, Ruochen Jin, Shu Yang, Bojian Hou, Li Shen

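AI summary: This paper proposes Sem-ECE, a semantic-sampling framework for evaluating the calibration of large language models in open-ended question answering. The method samples answers from the model, groups them by meaning, and uses class frequencies as confidence, addressing the shortcomings of existing methods for evaluating calibration in open-ended settings. It introduces two estimators, Sem₁-ECE and Sem₂-ECE, proves that both are asymptotically unbiased, and shows that Sem₂-ECE attains smaller calibration error on hard questions, so the gap between them can diagnose question difficulty. Experiments show that Sem-ECE outperforms existing methods on several benchmarks.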

Comments Preprint

Abstract

Calibration measures whether a model's predicted confidence aligns with its empirical accuracy, and is central to the reliable deployment of large language models (LLMs) in high-stakes domains such as medicine and law. While much recent work focuses on improving LLM calibration, the equally important question of how to evaluate it in realistic settings remains underdeveloped. Open-ended question answering (QA), the most common deployment setting for modern LLMs, is where existing evaluation methods fall short: logit-based metrics need restricted output formats and internal probabilities; verbalized confidence is self-reported and often overconfident; and sampling-based methods rely on task-specific extraction rules without a clear finite-sample target. We introduce Sem-ECE (Semantic-Sampling Expected Calibration Error), a calibration evaluation framework for open-ended QA that samples answers from the model, groups them into semantic classes, and uses the resulting frequencies as confidence. We study two estimators within this framework: Sem$_1$-ECE, the same-sample self-consistency score, and Sem$_2$-ECE, a held-out variant that separates answer selection from confidence evaluation. We prove both are asymptotically unbiased, and further show that they agree on easy questions but diverge on hard ones with Sem$_2$ achieving strictly smaller calibration error, so their gap also serves as a diagnostic for question difficulty. Experiments on three open-ended QA benchmarks across five leading commercial LLMs match our theoretical predictions and show that Sem-ECE outperforms verbalized confidence and existing sampling-based methods, while complementing logit-based evaluation when internal probabilities are unavailable.
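A minimal sketch of the two estimators, with clearly hypothetical pieces: `canonical` is a string-normalization stand-in for real semantic clustering, the held-out variant splits the samples into two halves for answer selection and confidence evaluation, and ECE uses 10 equal-width bins. Only the overall recipe (sample, group into semantic classes, use class frequency as confidence) comes from the abstract.

```python
from collections import Counter

def canonical(answer: str) -> str:
    """Hypothetical stand-in for semantic clustering (case, whitespace, trailing period).
    A real implementation would group answers by meaning, e.g. with an entailment model."""
    return " ".join(answer.lower().strip().rstrip(".").split())

def sem1(samples, reference):
    """Same-sample self-consistency: confidence = frequency of the most common class."""
    counts = Counter(canonical(s) for s in samples)
    answer, count = counts.most_common(1)[0]
    return count / len(samples), answer == canonical(reference)

def sem2(samples, reference):
    """Held-out variant: select the answer on one half, measure its frequency on the other."""
    half = len(samples) // 2
    select, held_out = samples[:half], samples[half:]
    answer = Counter(canonical(s) for s in select).most_common(1)[0][0]
    confidence = sum(canonical(s) == answer for s in held_out) / len(held_out)
    return confidence, answer == canonical(reference)

def ece(pairs, n_bins: int = 10) -> float:
    """Expected calibration error over (confidence, is_correct) pairs, equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in pairs:
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, correct))
    total = len(pairs)
    error = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            error += len(b) / total * abs(accuracy - avg_conf)
    return error
```

Per question, `sem1` or `sem2` yields a `(confidence, correct)` pair; `ece` over all questions gives the Sem$_1$-ECE or Sem$_2$-ECE estimate under these assumptions.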

2605.08427 2026-05-12 cs.AI cs.GT cs.LG

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play

Gabriele La Malfa, Emanuele La Malfa, Saar Cohen, Jie M. Zhang, Michael Luck, Michael Wooldridge, Elizabeth Black

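AI summary: This work examines self-play for improving AI safety and argues that current methods based on parameter sharing have theoretical and architectural limitations: the reachable Nash equilibria include trivial strategies, limiting practical applicability, and attacks fail to exert effective adversarial pressure. The authors propose Anchored Bipolicy Self-Play, which trains role-specific LoRA adapters on a frozen base model to separate the roles, strengthening adversarial pressure while keeping optimization stable. Experiments show that the method outperforms standard self-play in both parameter efficiency and safety.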

Abstract

Self-play red teaming is an established approach to improving AI safety in which different instances of the same model play attacker and defender roles in a zero-sum game, i.e., the attacker tries to jailbreak the defender; if self-play converges to a Nash equilibrium, the model is guaranteed to respond safely within the settings of the game. Although the parameter sharing enforced by using the same model for both roles improves stability and performance, it introduces fundamental theoretical and architectural limitations. We show that the set of reachable Nash equilibria corresponds to a broad class of behaviours that includes trivial always-refuse strategies and oracle-like defenders, thus limiting practical applicability. We then show that when attacker and defender share and update the same base model, the dynamics collapse to self-consistency, so that attacks do not exert adversarial pressure on the defender. In response, we propose Anchored Bipolicy Self-Play, which trains distinct role-specific LoRA adapters on top of a frozen base model, thereby maintaining stable optimisation while preserving adversarial pressure through explicit role separation. Relative to standard self-play, we show up to 100x greater parameter efficiency than fine-tuning and consistent safety improvements over self-play fine-tuned models. We evaluate on Qwen2.5-{3B, 7B, 14B}-IT models across widely used safety benchmarks, showing improved robustness without loss of reasoning ability. Cross-play experiments further show that our attacker and defender models outperform self-play in terms of adversarial defence and safety.
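A schematic of the role separation, assuming a single linear layer: both roles share a frozen base weight (the anchor) and each owns its own LoRA factors, so a defender update cannot move the attacker's policy, unlike full parameter sharing. Class and method names, shapes, and the learning rate are hypothetical; the paper trains adapters on Qwen2.5 models with RL, which is not modelled here.

```python
import numpy as np

class RoleAdapters:
    """Frozen shared base weight plus one LoRA adapter (A, B) per role (illustrative)."""

    def __init__(self, d: int, r: int, roles=("attacker", "defender"), seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d, d))       # frozen base model weight: the anchor
        self.adapters = {role: {"A": rng.normal(size=(r, d)) / np.sqrt(d),
                                "B": np.zeros((d, r))} for role in roles}

    def forward(self, role: str, x: np.ndarray) -> np.ndarray:
        ad = self.adapters[role]
        return self.W @ x + ad["B"] @ (ad["A"] @ x)

    def update(self, role: str, grad_B: np.ndarray, lr: float = 0.1) -> None:
        # Only the chosen role's adapter changes; W and the other role's adapter stay fixed.
        self.adapters[role]["B"] -= lr * grad_B

    def trainable_params_per_role(self) -> int:
        ad = next(iter(self.adapters.values()))
        return ad["A"].size + ad["B"].size
```

With `d=32` and `r=4`, each role trains 256 parameters against 1024 in the full weight; at LLM scale the ratio between adapter and full-model parameters is what drives the parameter-efficiency gap.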

2605.08424 2026-05-12 cs.LG math.OC math.PR

Generalized Wasserstein Flow Matching: Transport Plans, Everywhere, All at Once

Moritz Piening, Richard Duong, Gabriele Steidl

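AI summary: This paper proposes generalized Wasserstein flow matching, extending the flow matching framework to the space of probability measures over probability measures through a Wasserstein-on-Wasserstein (WoW) formulation. Using the nested Wasserstein geometry, it shows how measures over transport plans naturally induce velocity fields that realize metameasure flows, yielding a principled generalization via coupled outer and inner transport plans. To reduce computational cost, the authors propose scalable approximations based on sliced and linear Wasserstein distances that improve training efficiency and numerical stability, giving a unified, theoretically grounded approach to generative modelling for tasks such as point cloud and set generation.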

Abstract

Flow matching has recently emerged as a flexible and efficient framework for generative modelling by learning deterministic transport dynamics between probability measures. In this work, we extend flow matching to the space of probability measures over probability measures, introducing a Wasserstein-on-Wasserstein (WoW) formulation. Leveraging the nested Wasserstein geometry, we show that measures over transport plans naturally induce velocity fields that realize metameasure flows. This yields a principled generalization of Wasserstein flow matching via coupled outer and inner transport plans. To address the substantial computational cost of WoW transport, we propose scalable approximations based on sliced and linear Wasserstein distances, enabling efficient training while promoting numerically stable, near-straight trajectories. Our framework unifies and extends existing approaches to point cloud and set generation, providing a practical and theoretically grounded method for generative modelling in WoW spaces.
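Sliced Wasserstein distances are cheap because optimal transport between two 1D empirical measures of equal size reduces to matching sorted samples. Below is a minimal NumPy Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two point clouds; equal sizes and uniform weights are simplifying assumptions, and how the paper plugs such distances into its outer and inner transport plans is not reproduced here.

```python
import numpy as np

def sliced_w2_squared(X: np.ndarray, Y: np.ndarray, n_proj: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two
    equal-size, uniformly weighted point clouds X, Y of shape (n, d)."""
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes equal-size point clouds")
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # random directions on the unit sphere
    px = np.sort(X @ theta.T, axis=0)   # 1D projections; sorting solves each 1D OT problem exactly
    py = np.sort(Y @ theta.T, axis=0)
    return float(np.mean((px - py) ** 2))   # average over points and projections
```

For a cloud translated by $v$, every projection shifts by $\theta\cdot v$, so the estimate approaches $\|v\|^2/d$, i.e. 4.5 for $v=(3,0)$ in 2D.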

2605.08423 2026-05-12 cs.LG cs.CL stat.ML

Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms

Omatharv Bharat Vaidya, Connor T. Jerzak, Nhat Ho, Chandrajit Bajaj

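AI summary: This paper proposes a data-adaptive, parameter-efficient fine-tuning method for large neural networks. It replaces conventional layer-local adapters with a shared, queryable memory of low-rank update atoms, letting the model dynamically select suitable update components based on the input and the network's ongoing computation; this allows more flexible parameter updates while retaining the efficiency of low-rank adaptation. An instruction-regularization mechanism further biases updates toward semantically relevant directions, improving training stability and final performance.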

Abstract

We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.
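A minimal sketch of attention routing over a shared bank of update atoms. Every design detail is an assumption: atoms are taken to be $r\times r$ matrices applied inside the LoRA bottleneck, the query is the sum of the current low-rank state and a running-summary vector, and the instruction prior is an additive bias on the routing logits. Class names are hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class AtomBank:
    """Shared memory of low-rank update atoms, reused by every layer block."""
    def __init__(self, n_atoms: int, r: int, rng):
        self.atoms = rng.normal(size=(n_atoms, r, r)) / np.sqrt(r)   # r x r operators
        self.keys = rng.normal(size=(n_atoms, r))                     # one key per atom

class QueryableLoRALayer:
    def __init__(self, d_in: int, d_out: int, bank: AtomBank, rng):
        r = bank.keys.shape[1]
        self.bank = bank
        self.A = rng.normal(size=(r, d_in)) / np.sqrt(d_in)   # down-projection into the bottleneck
        self.B = np.zeros((d_out, r))                          # up-projection, zero-initialized as in LoRA

    def __call__(self, x, summary, prior=None):
        z = self.A @ x                                         # current low-rank state
        query = z + summary                                    # assumed way to mix in previous blocks
        logits = self.bank.keys @ query / np.sqrt(len(query))
        if prior is not None:
            logits = logits + prior                            # instruction-induced prior over atoms
        weights = softmax(logits)
        routed = np.tensordot(weights, self.bank.atoms, axes=1)   # weighted sum of atoms, shape (r, r)
        return self.B @ (routed @ z), weights
```

Because every layer holds a reference to the same `AtomBank`, reusable structure is shared across depth while each input can select a different mixture of atoms.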

2605.08421 2026-05-12 cs.CV

Beyond Bag-of-Patches: Learning Global Layout via Textual Supervision for Late-Interaction Visual Document Retrieval

Pascal Tilli, Mohsen Mesgar

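AI summary: This paper studies how global layout information can improve visual document retrieval (VDR) and proposes learning a global layout representation through textual supervision. On top of conventional local patch embeddings, the method adds a global layout embedding trained with textual descriptions of the document's layout, improving retrieval on documents with heterogeneous layouts while preserving inference efficiency. Experiments show that it outperforms comparable baselines on several datasets.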

Abstract

Visual Document Retrieval (VDR) models mostly rely on late-interaction architectures, in which documents are represented by a set of local patch embeddings that are matched against query tokens. While efficient, this architecture prioritizes local similarity over the global layout structure of documents when estimating the relevance between a document and a query. In practice, this leads to errors when relevance stems from the layout structure of documents with heterogeneous layouts combining figures, tables, and text. We make document layout learnable without changing inference. We propose a multimodal encoder that augments local patch representations with a global layout embedding, trained via textual descriptions encoding document layout information. Across four ViDoRe-v2 datasets, our model improves over the strongest architecturally comparable ColPali/ColQwen baseline by +2.4 nDCG@5 and +2.3 MAP@5, with statistically significant per-dataset gains over ColQwen.
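ColPali/ColQwen-style late interaction scores a page with MaxSim: each query token is matched to its most similar patch embedding and the maxima are summed. The sketch below, with random embeddings and hypothetical shapes, shows why this behaves like a bag of patches: shuffling the patch order leaves the score unchanged, so any layout signal must already be encoded inside the patch embeddings. How the paper adds its global layout embedding is not reproduced.

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_patches: np.ndarray) -> float:
    """Late-interaction relevance: for each query token take its best-matching patch
    (max dot product), then sum over query tokens."""
    sims = query_tokens @ doc_patches.T        # (num_query_tokens, num_patches)
    return float(sims.max(axis=1).sum())
```

Since the max over patches ignores their order, any permutation of `doc_patches` yields the same score.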

2605.08417 2026-05-12 cs.LG math.OC

Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL

Shengbo Wang, Zexi Zhang

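AI summary: This paper addresses core challenges in model-free distributionally robust reinforcement learning (DRRL): the robust Bellman operator is nonlinear, which biases one-sample updates, and robust evaluation is computationally demanding. Under a small-ambiguity assumption with Kullback-Leibler ambiguity sets, the authors propose an approximate DRRL framework based on a first-order expansion that removes the adversarial optimization while remaining first-order accurate. Building on this, they design the Mean-Variance Stochastic Approximation (MVSA) algorithm, which uses one-sample updates via lifted stochastic approximation dynamics and a two-time-scale structure, and prove that it converges and that its main iterate satisfies a central limit theorem at the $n^{-1/2}$ scale with explicitly characterized covariance.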

Abstract

Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, which makes one-sample Bellman updates biased, while the adversarial optimization underlying robustness makes robust evaluation computationally demanding. To address these difficulties, we consider the natural small-ambiguity regime under Kullback-Leibler ambiguity sets and propose an approximate DRRL framework based on a first-order expansion of the relevant robust functional. This yields an approximate robust Bellman equation that removes the adversarial optimization while remaining first-order accurate in the ambiguity radius. To learn the fixed point of this approximate equation, we propose Mean-Variance Stochastic Approximation (MVSA), a model-free algorithm that uses only one-sample updates. This is achieved via a lifted stochastic approximation dynamics and a two-time-scale design. We then prove convergence and a central limit theorem for MVSA: its main iterate satisfies a central limit theorem at the canonical $n^{-1/2}$ scale, with explicitly characterized asymptotic covariances. Finally, we validate our theoretical findings with a numerical experiment.
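For intuition about the small-ambiguity regime: a standard first-order expansion of the KL-constrained worst-case expectation is $\inf_{Q:\,\mathrm{KL}(Q\|P)\le\delta}\mathbb{E}_Q[V] = \mathbb{E}_P[V] - \sqrt{2\delta\,\mathrm{Var}_P(V)} + O(\delta)$, which replaces the adversarial inner optimization with a mean and a variance. We assume, without confirmation from the abstract, that the paper's approximate robust Bellman target has this mean-variance form, $r + \gamma\big(\mathbb{E}_P[V] - \sqrt{2\delta\,\mathrm{Var}_P(V)}\big)$ with discount $\gamma$. The sketch checks the expansion numerically against the exact value from the well-known dual $\sup_{\alpha>0}\{-\alpha\log\mathbb{E}_P[e^{-V/\alpha}] - \alpha\delta\}$, evaluated by grid search over $\alpha$; function names are illustrative.

```python
import numpy as np

def kl_worst_case_exact(values, probs, delta, alphas=None):
    """inf over Q with KL(Q||P) <= delta of E_Q[V], via the dual
    sup_{alpha>0} -alpha*log E_P[exp(-V/alpha)] - alpha*delta, evaluated on a grid of alphas."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if alphas is None:
        alphas = np.logspace(-3, 4, 20001)
    z = -values[None, :] / alphas[:, None] + np.log(probs)[None, :]
    zmax = z.max(axis=1, keepdims=True)
    # Stable log-sum-exp: log E_P[exp(-V/alpha)] for every alpha on the grid.
    log_mgf = (zmax + np.log(np.exp(z - zmax).sum(axis=1, keepdims=True))).ravel()
    return float(np.max(-alphas * log_mgf - alphas * delta))

def kl_worst_case_first_order(values, probs, delta):
    """Mean-variance approximation E_P[V] - sqrt(2 * delta * Var_P(V))."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean = probs @ values
    var = probs @ (values - mean) ** 2
    return float(mean - np.sqrt(2.0 * delta * var))
```

For $V\in\{0,1\}$ with equal probabilities and $\delta = 10^{-3}$, both functions return about 0.4776, compared with the nominal mean of 0.5.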

2605.08416 2026-05-12 cs.AI cs.CY cs.LG

Alignment as Jurisprudence

Nicholas Caputo

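AI Summary This essay explores the deep connection between jurisprudence and AI alignment, arguing that the two share similar goals and methods in predicting and shaping the future behavior of powerful decision-makers (judges and AI systems). Combining Dworkin's principle-oriented interpretivism with Sunstein's account of law as analogical reasoning, and drawing on frontier alignment techniques such as Constitutional AI and case-based reasoning, the essay shows how legal thinking can inform alignment research and how AI can deepen our understanding of how law works. As AI capabilities grow and the constraints legal theory places on human judges weaken, the dialogue between jurisprudence and alignment will become increasingly important.

Details
Journal ref
27 Yale Journal of Law and Technology 390 (Sept. 2025)
English Abstract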

Jurisprudence, the study of how judges should properly decide cases, and alignment, the science of getting AI models to conform to human values, share a fundamental structure. These seemingly distant fields both seek to predict and shape how decisions by powerful actors, in one case judges and in the other increasingly powerful artificial intelligences, will be made in the unknown future. And they use similar tools of the specification and interpretation of language to try to accomplish those goals. The great debates of jurisprudence, about what the law is and what it should be, can provide insight into alignment, and lessons from what does and does not work in alignment can help make progress in jurisprudence. This essay puts the two fields directly into conversation. Drawing on leading accounts of jurisprudence, particularly Dworkin's principle-oriented interpretivism and Sunstein's positivist account of law as analogical reasoning, and on cutting-edge alignment approaches, namely Constitutional AI and case-based reasoning, it illustrates the value of a more sophisticated legally-inspired approach to the interplay of rules and cases in finetuning alignment and points to ways that AI can provide a better understanding of how the law works and how it can be improved by the introduction of AI. AI systems and the law should operate to empower people to act in the world, helping to expand their capabilities and the extent to which they are able to achieve their goals. As AI continues to improve in capacity, and as the constraints that legal theory places on human judges seem to be coming undone, the conversation between these two fields will become increasingly essential and may help point to a better version of both.

2605.08415 2026-05-12 cs.AI

Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models

Bruno Bianchi, Diego Tiscornia, Matias Travizano, Ariel Futoransky

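AI Summary This paper studies the "political plasticity" of large language models, i.e., their ability to adjust their answers on political issues according to user-supplied context. Using a test framework of 200 politically oriented questions, the study finds that user prompts effectively induce significant ideological shifts, most notably on economic-freedom questions, whereas system prompts are largely ineffective. Experiments also reveal subtle but notable differences in plasticity across languages, and an inverted-question validation suggests possible data leakage in most models. Overall, the results indicate that newer frontier models show more stable and predictable political adaptability.

Details
English Abstract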

Since the advent of Large Language Models (LLMs), a significant area of research has focused on their intrinsic biases, particularly in political discourse. This study investigates a different but related concept, "political plasticity", which is defined as the capacity of models to adapt their responses based on the user-supplied context. To analyze this, a testing framework was developed using an expanded corpus of 200 politically-oriented questions across economic and personal freedom axes, based on a prior framework by Lester (1996). The study explored several methods to induce political bias, including simplified and topic-based system prompts, as well as user prompts with few-shot examples. The results show that while system prompts were largely ineffective, user prompts successfully elicited significant ideological shifts, particularly along the Economic Freedom axis in larger and newer models. Through a validation experiment, we examined whether models answer questionnaires by recognizing the underlying question format. Inverting the sense of the questions revealed unexpected, counter-intuitive shifts in most models, suggesting potential data leakage. Finally, we also analyzed how model plasticity varies when the experiment is conducted in different languages. The results reveal subtle yet notable shifts across each of the analyzed languages. Overall, our results indicate that small and older LLMs exhibit limited or unstable political plasticity, whereas newer frontier models display reliable, expected adaptability.

2605.08412 2026-05-12 cs.CV

SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding

Sara Ghazanfari, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami

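AI Summary SYNCR is a synthetic benchmark for cross-video reasoning, designed to evaluate how well multimodal large language models reason across multiple independent video streams. Built with programmatically verified grounding, it contains 8,163 multi-video question-answer pairs grounded in physical and spatial logic, covering tasks such as temporal alignment, spatial tracking, and comparative reasoning. Experiments show a substantial gap between current leading models and humans on cross-video reasoning (52.5% vs. 89.5% average accuracy), with particular weakness in fine-grained physical and spatial reasoning, highlighting the limitations of existing models in multi-video understanding.

Details
English Abstract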

Multimodal Large Language Models (MLLMs) have made rapid progress in single-video understanding, yet their ability to reason across multiple independent video streams remains poorly understood. Existing multi-video benchmarks rely largely on human-annotated real-world footage, limiting the precision of spatial, temporal, and physical ground truth and making it difficult to diagnose model failures. We introduce SYNCR, a controlled synthetic benchmark for cross-video reasoning with programmatically verified grounding. Built using Habitat, Kubric, and CLEVRER simulator engines, SYNCR contains 8,163 multi-video question-answer pairs grounded in 9,650 unique videos. It evaluates MLLMs across eight tasks spanning four diagnostic pillars: Temporal Alignment, Spatial Tracking, Comparative Reasoning, and Holistic Synthesis. Our zero-shot evaluation of leading open- and closed-weight MLLMs reveals a substantial gap between current models and humans: the best model achieves only 52.5% average accuracy, compared to an 89.5% human baseline. Models perform relatively well on temporal ordering but struggle with precise physical and spatial reasoning, with the best model reaching only 26.0% accuracy on Kinematic Comparison. We further find that parameter scaling and reasoning-specialized post-training improve temporal alignment capabilities, but do not reliably address fine-grained physical tracking or global spatial synthesis. Finally, an exploratory sim-to-real correlation analysis suggests that several SYNCR tasks track model-level trends on real-world multi-video benchmarks, while also exposing reasoning capabilities underrepresented by existing evaluations. Code available at https://github.com/SaraGhazanfari/SYNCR.

2605.08409 2026-05-12 cs.AI

Playing games with knowledge: AI-Induced delusions need game theoretic interventions

Will Beaumaster, Paul Schrater

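AI Summary This paper studies the epistemic entrenchment and belief spirals that arise when conversational AI serves as a knowledge interface, tracing their root to a systemic shift from one-directional knowledge search to strategic interaction between users and AI. The authors formalize the problem as a cheap talk game and propose an intervention called the Epistemic Mediator, which introduces epistemic friction to force users to reveal their type, together with a Belief Versioning system that stores healthy beliefs and supports rollback. In simulation, the approach breaks belief spirals, which the authors take as evidence that epistemic safety in AI should be approached through strategic information-environment design rather than model alignment alone.

Details
English Abstract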

Conversational AI has a fundamental flaw as a knowledge interface: sycophantic chatbots induce epistemic entrenchment and delusional belief spirals even in rational agents. We argue that the problem does not stem from the AI model itself; it is instead a systemic consequence of the paradigm shift from user-driven knowledge search to users and agents engaged in strategic, repeated-play communication. We formalize the problem as a Crawford-Sobel cheap talk game, where costless user signals induce a pooling equilibrium. Agents optimized for user satisfaction produce sycophantic strategies that provide identical reinforcement across user types with opposite epistemic incentives: exploratory "Growth-seekers" ($\theta_G$) and confirmatory "Validation-seekers" ($\theta_V$). Under repeated play, this identification failure creates a coordination trap (analogous to a Prisoner's Dilemma) where locally rational feedback loops drive users toward pathologically certain false beliefs. We propose an inference-time mechanism design intervention called an Epistemic Mediator that breaks this pooling equilibrium by introducing a costly signal (epistemic friction), forcing type revelation based on users' asymmetric cognitive costs for processing resistance. A key contribution is Belief Versioning, a git-inspired epistemic meta-memory system that stores healthy beliefs and rolls back to them when validation-seeking resistance is detected. In simulation, this intervention achieves a separating equilibrium with a $48\times$ differential in spiral rates while passing a learning preservation criterion, evidence that epistemic safety in AI is fundamentally a problem of strategic information environment design rather than simple model alignment.
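A minimal sketch of the git-inspired idea behind Belief Versioning, assuming beliefs can be represented as a mapping from propositions to confidence values. The class and method names are hypothetical, and detecting validation-seeking resistance (the trigger for a rollback) is left to the caller; the paper's mediator logic is not reproduced.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BeliefStore:
    """Hypothetical git-like store: snapshots of beliefs are committed, and the
    working state can be restored to an earlier (presumed healthy) commit."""
    working: Dict[str, float] = field(default_factory=dict)   # proposition -> confidence
    history: List[Dict[str, float]] = field(default_factory=list)

    def update(self, proposition: str, confidence: float) -> None:
        self.working[proposition] = confidence

    def commit(self) -> int:
        """Snapshot the current beliefs (copied, so later edits don't leak in)."""
        self.history.append(dict(self.working))
        return len(self.history) - 1

    def rollback(self, commit_id: int = -1) -> None:
        """Restore a snapshot, e.g. after the caller detects validation-seeking
        resistance; defaults to the most recent commit."""
        if not self.history:
            raise ValueError("no commits to roll back to")
        self.working = dict(self.history[commit_id])
```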

2605.08408 2026-05-12 cs.LG

AdamFLIP: Adaptive Momentum Feedback Linearization Optimization for Hard Constrained PINN Training

Binghang Lu, Runyu Zhang, Changhong Mou, Na Li, Guang Lin

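AI Summary This paper proposes AdamFLIP, a new optimization method for training physics-informed neural networks (PINNs) under hard constraints. Conventional PINNs usually handle constraints through soft penalties, which tends to cause ill-conditioning and poor constraint satisfaction. AdamFLIP instead formulates PINN training as an equality-constrained optimization problem and combines feedback linearization with adaptive-momentum optimization to control the constraint residuals directly. Experiments show strong performance on a range of forward and inverse PDE problems; on the Navier-Stokes equations in particular, it reduces the relative $L_2$ error of the predicted solution by more than two thirds compared with the next best method.

Details
English Abstract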

Physics-informed neural networks (PINNs) provide a flexible framework for solving forward and inverse problems governed by partial differential equations (PDEs), but standard PINN training typically relies on soft penalty formulations that combine PDE residuals, data mismatch, and initial/boundary conditions using manually chosen weights. This often leads to ill-conditioning, sensitivity to loss weights, and poor constraint satisfaction. In this work, we reformulate PINN training as an equality-constrained optimization problem and propose a novel Adaptive Momentum Feedback Linearization Optimization for Hard Constrained PINN (AdamFLIP). The key idea is to view the constraint residuals as the output of a controlled dynamical system and to compute the Lagrange multiplier as a feedback input that locally drives these residuals toward stable linear contraction dynamics. AdamFLIP then applies Adam-style first- and second-moment adaptation to the resulting feedback-linearized Lagrangian gradient, combining principled constraint handling with the scalability and robustness of adaptive neural-network optimization. We test AdamFLIP on a range of benchmark forward and inverse PDE problems, and it consistently outperforms both the standard soft-constrained PINN and state-of-the-art constrained optimizers. Specifically, on the Navier-Stokes equations benchmark, AdamFLIP reduces relative $L_2$ error by more than two thirds for the predicted solution compared to the next best method. Our AdamFLIP framework provides an effective and computationally scalable hard constraint optimization method for PINN training.
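A minimal NumPy sketch of the feedback-linearization idea described above, on a toy equality-constrained problem rather than a PINN. Choosing the multiplier so that $J g = \alpha c$ makes the residual obey $\dot c = -\alpha c$ along the plain gradient flow $\dot\theta = -g$, which gives $\lambda = (J J^\top)^{-1}(\alpha c - J \nabla f)$; an Adam update is then applied to $g = \nabla f + J^\top \lambda$. The function names, the gain `alpha`, and the toy problem are assumptions for illustration, not the authors' code.

```python
import numpy as np


def flip_direction(grad_f, jac_c, c, alpha):
    """Feedback-linearized Lagrangian gradient g = grad_f + J^T lambda, with lambda
    chosen so that J g = alpha * c (i.e. c' = -alpha * c under theta' = -g)."""
    lam = np.linalg.solve(jac_c @ jac_c.T, alpha * c - jac_c @ grad_f)
    return grad_f + jac_c.T @ lam


def adam_flip(theta, f_grad, c_fn, c_jac, alpha=1.0, lr=1e-2,
              betas=(0.9, 0.999), eps=1e-8, steps=3000):
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = flip_direction(f_grad(theta), c_jac(theta), c_fn(theta), alpha)
        m = betas[0] * m + (1 - betas[0]) * g          # first moment
        v = betas[1] * v + (1 - betas[1]) * g * g      # second moment
        m_hat = m / (1 - betas[0] ** t)                # bias corrections
        v_hat = v / (1 - betas[1] ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


# Toy problem: minimize ||x||^2 subject to x0 + x1 = 1 (solution x = [0.5, 0.5]).
x = adam_flip(
    np.array([2.0, -1.0]),
    f_grad=lambda x: 2.0 * x,
    c_fn=lambda x: np.array([x[0] + x[1] - 1.0]),
    c_jac=lambda x: np.array([[1.0, 1.0]]),
)
```

Because Adam rescales each coordinate, the discrete update only approximately preserves the linear contraction of the residual; the sketch relies on the fact that, for a full-row-rank Jacobian, $g = 0$ holds exactly at a KKT point of the constrained problem.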

2605.08406 2026-05-12 cs.CL cs.AI

Effective Explanations Support Planning Under Uncertainty

Hanqi Zhou, Britt Besch, Charley M. Wu, Tobias Gerstenberg

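AI Summary This paper studies how effective explanations support planning under uncertainty. The authors propose a computational model that converts verbal explanations into executable action plans: a large language model translates the explanation into program-like guidance, and a planning agent executes it under partial observability. Across four preregistered experiments, higher-quality explanations significantly improved navigation efficiency and reliability, demonstrating the practical value of language guidance in uncertain environments.

Comments CogSci 2026

Details
English Abstract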

Explaining how to get from A to B can be challenging. It requires mentally simulating what the listener will do based on what they are told. To capture this process, we propose a computational model that converts utterances into action plans: a large language model translates an explanation into program-like guidance (a policy prior and value map), and a planning agent executes it under partial observability. We score explanations by the efficiency and reliability of the resulting paths, penalizing replanning. Across four preregistered experiments, we collect a corpus of 1,200 explanations over 24 maps, elicit helpfulness judgments, measure baseline navigation, and test behavior with explanations of differing quality. Higher-scored explanations are judged more helpful and improve navigation: participants with explanations outperform those without, and high-scoring explanations help more than low-scoring ones. Together, these results frame procedural explanation as utility-guided communication, shaped by how language can be grounded into action under uncertainty.

2605.08405 2026-05-12 cs.AI cs.LG

Belief or Circuitry? Causal Evidence for In-Context Graph Learning

Katharine Kowalyshyn, Timothy Duggan, Daniel Little, Michael C Hughes

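AI Summary This study examines the mechanism behind in-context learning in large language models (LLMs): do they learn by matching recent token patterns or by inferring latent structure? Using a task in which random walks are drawn over two competing graph structures, the study finds that models rely neither purely on local transitions nor purely on global topology, but on both. Principal component analysis and causal intervention experiments show that at intermediate mixture ratios the model encodes both graph structures simultaneously, and that late-layer activation patching and graph-difference steering effectively shift model behavior, supporting a dual-mechanism account in which structure inference and induction circuits operate in parallel.

Comments Under review at ICML Mechanistic Interpretability Workshop 2026

Details
English Abstract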

How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy graph random-walk across two competing graph structures. This task's answer is, in principle, decidable: either the model tracks global topology, or it copies local transitions. We present two lines of evidence that neither account alone is sufficient. First, reconstructing the internal representation structure via PCA reveals that at intermediate mixture ratios, both graph topologies are encoded in orthogonal principal subspaces simultaneously. This pattern is difficult to reconcile with purely local transition copying. Second, residual-stream activation patching and graph-difference steering causally intervene on this graph-family signal: late-layer patching almost fully transfers the clean graph preference, while linear steering moves predictions in the intended direction and fails under norm-matched and label-shuffled controls. Taken together, our findings are most consistent with a dual-mechanism account in which genuine structure inference and induction circuits operate in parallel.
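A minimal sketch of the graph-difference steering intervention and the two controls named above, assuming residual-stream activations are available as plain vectors. The difference-of-means construction is a common way to build a steering direction and is assumed here; the paper's exact recipe, layer choice, and scaling are not specified in the abstract, and all function names are hypothetical.

```python
import math
import random
from typing import List

Vec = List[float]


def mean_vec(vs: List[Vec]) -> Vec:
    return [sum(col) / len(vs) for col in zip(*vs)]


def norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def steering_vector(acts_a: List[Vec], acts_b: List[Vec]) -> Vec:
    """Graph-difference direction: mean activation under graph A minus graph B."""
    return [a - b for a, b in zip(mean_vec(acts_a), mean_vec(acts_b))]


def norm_matched_random(v: Vec, rng: random.Random) -> Vec:
    """Control 1: a random direction rescaled to the steering vector's norm."""
    r = [rng.gauss(0.0, 1.0) for _ in v]
    scale = norm(v) / norm(r)
    return [x * scale for x in r]


def label_shuffled_vector(acts: List[Vec], labels: List[str], rng: random.Random) -> Vec:
    """Control 2: recompute the direction after permuting the graph labels."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    a = [x for x, lab in zip(acts, shuffled) if lab == "A"]
    b = [x for x, lab in zip(acts, shuffled) if lab == "B"]
    return steering_vector(a, b)


def steer(h: Vec, direction: Vec, coeff: float) -> Vec:
    """The intervention: add a scaled direction to a hidden state."""
    return [x + coeff * d for x, d in zip(h, direction)]
```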

2605.08404 2026-05-12 cs.CL cs.AI cs.CV cs.ET

Built Environment Reasoning from Remote Sensing Imagery Using Large Vision-Language Models

Dongdong Wang, Deepak Balakrishnan, Ravi Srinivasan, Shenhao Wang

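AI Summary This paper studies how large language models (LLMs) can be applied to smart-city tasks. The core approach uses remote sensing imagery to characterize the built environment, covering design suggestions, constructability assessment, land-use patterns, and risk identification. The study evaluates how multi-scale remote sensing inputs affect built-environment reasoning by multimodal language models, and compares state-of-the-art models such as InternVL and Qwen on the accuracy and reliability of the built-environment recommendations they generate. The results suggest that combining remote sensing imagery with large language models can strengthen decision support in smart cities.

Comments Published in the International Conference on Industrialized Construction 2026

Details
English Abstract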

This work investigates the use of large language models (LLMs) for tasks in smart cities. The core idea is to leverage remote sensing imagery to characterize the built environment for tasks including design suggestions, constructability assessment, land-use patterns, and risk identification. We examine remote sensing imagery at multiple spatial scales as inputs for multimodal language modeling and evaluate its effect on built-environment reasoning. In addition, we compare state-of-the-art models, including InternVL and Qwen, in terms of accuracy and reliability when generating built-environment recommendations. The results demonstrate the potential of integrating remote sensing imagery with large language models to support smart cities and decision-making.

2605.08399 2026-05-12 cs.AI

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

Ziyang Yu, Qiyue Li, Liang Zhao

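AI Summary CoCoDA is a co-evolving compositional DAG framework for tool-augmented agents, aimed at the challenge of evolving a growing tool library together with the planner. It co-evolves tools and planner through a single compositional code DAG in which each node stores a tool's typed signature, description, and pre/post-condition specification. At inference time, typed DAG retrieval efficiently narrows the candidate tools; at training time, successful trajectories are folded into composite tools and the planner is optimized with a DAG-induced reward. Experiments show that CoCoDA substantially improves small models across benchmarks, enabling an 8B student to match or exceed a 32B teacher on the GSM8K and MATH mathematical reasoning benchmarks.

Details
English Abstract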

Tool-augmented language models can extend small language models with external executable skills, but scaling the tool library creates a coupled challenge: the library must evolve with the planner as new reusable subroutines emerge, while retrieval from the growing library must remain within a fixed context budget. Existing tool-use and skill-library methods typically treat tools as flat or text-indexed memories, causing prompt cost to grow with library size and obscuring the typed, compositional structure of executable code. We propose CoCoDA, a framework that co-evolves the planner and tool library through a single code-native structure: a compositional code DAG. Nodes are primitive or composite tools, edges encode invocation dependencies, and each node stores a typed signature, description, pre/post-condition specification, and worked examples. At inference time, Typed DAG Retrieval prunes candidates by symbolic signature unification, ranks survivors by descriptions, filters them by behavioral specifications, and disambiguates with examples, keeping expensive context materialization on progressively smaller candidate sets. At training time, successful trajectories are folded into validated composite tools, while the planner is updated with a DAG-induced reward that credits composites by their primitive expansion size. We provide theoretical results showing retrieval cost reduction, sublinear retrieval time, compositional advantage under the shaped reward, monotone co-evolution under conservative updates, and DAG well-formedness. Across mathematical reasoning, tabular analysis, and code task benchmarks, CoCoDA enables an 8B student to match or exceed a 32B teacher on GSM8K and MATH and consistently improves over strong tool-use and library-learning baselines.
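A minimal sketch of a compositional tool DAG in the spirit of the description above. It simplifies heavily: signature unification is reduced to exact type matching, description ranking is plain word overlap, and pre/post-condition filtering and example-based disambiguation are omitted. `ToolNode`, `ToolDAG`, and the example tools are hypothetical; `expansion_size` counts primitive calls, the quantity the abstract says the DAG-induced reward uses to credit composites.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToolNode:
    name: str
    inputs: Tuple[str, ...]   # typed signature: argument types
    output: str               # return type
    description: str
    calls: List[str] = field(default_factory=list)  # invocation edges; empty => primitive


class ToolDAG:
    def __init__(self) -> None:
        self.nodes: Dict[str, ToolNode] = {}

    def add(self, node: ToolNode) -> None:
        # Well-formedness: names are unique and callees must already exist,
        # so no new edge can ever close a cycle.
        if node.name in self.nodes:
            raise ValueError(f"duplicate tool: {node.name}")
        missing = [c for c in node.calls if c not in self.nodes]
        if missing:
            raise ValueError(f"unknown callees: {missing}")
        self.nodes[node.name] = node

    def expansion_size(self, name: str) -> int:
        """Number of primitive invocations a tool expands to (1 for a primitive)."""
        node = self.nodes[name]
        if not node.calls:
            return 1
        return sum(self.expansion_size(c) for c in node.calls)

    def retrieve(self, inputs: Tuple[str, ...], output: str, query: str, k: int = 3) -> List[str]:
        """Stage 1: keep exact signature matches. Stage 2: rank by word overlap."""
        typed = [n for n in self.nodes.values() if n.inputs == inputs and n.output == output]
        words = set(query.lower().split())
        typed.sort(key=lambda n: len(words & set(n.description.lower().split())), reverse=True)
        return [n.name for n in typed[:k]]
```

The cheap symbolic filter runs first so that the costlier text-based ranking only touches tools whose signatures already fit, mirroring the progressively narrowing candidate sets described in the abstract.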

2605.08396 2026-05-12 cs.CV

Delivering Science as a Service: Sci-Orchestra's Cloud-Native Approach to HPC

Harinarayan Krishnan, Shubhabrata Mukerjee, Jeffrey Donatelli, Daniela Ushizima

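AI Summary As modern computing environments grow more complex, researchers are often burdened with infrastructure management, authentication protocols, and container deployment. This paper presents Sci-Orchestra, a layered orchestration framework that automates experimental workflows so that scientists can focus on scientific discovery rather than backend operations. Built on Kubernetes, the system provides an API-driven interface for secure authentication, resource management, and scalable deployment, and introduces an autonomous marketplace that fosters cross-institutional collaboration, supports modular deployment and intellectual-property protection, and accelerates the transfer of research results to industrial applications.

Details
English Abstract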

The increasing complexity of modern computational environments often burdens researchers with infrastructure management, authentication protocols, and container deployments. We present Sci-Orchestra, a layered orchestration framework designed to fully automate experimental workflows, allowing scientists to prioritize scientific discovery over backend operations. By abstracting execution through an API-driven interface, the system assumes responsibility for secure authentication, resource management, and scalable deployment across diverse high-performance computing environments using Kubernetes architectures. A key innovation of Sci-Orchestra is its autonomous marketplace, which serves as a catalyst for cross-institutional collaboration. Through an intuitive user interface, researchers can rapidly deploy and share specialized services via simple selections, eliminating the need for complex installations and technical setups. This modular infrastructure is specifically designed to facilitate industry partnerships as it provides a secure execution environment and allows external collaborators to test and validate proprietary tools without the need for source-code exchange. This "black-box" interoperability protects intellectual property while enabling seamless integration into broader scientific pipelines, ultimately accelerating the transition from laboratory prototypes to industrial-scale applications.

2605.08392 2026-05-12 cs.LG

Geometry-Aware Discretization Error of Diffusion Models

Samuel Hurault, Thomas Moreau, Gabriel Peyré

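AI Summary This paper studies the discretization error of diffusion models under a limited number of denoising steps, analyzing how this error depends on the geometry of the data and on key parameters of the diffusion process. By deriving first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors, it shows how the discretization error adapts to the data geometry through the covariance spectrum and provides tractable objectives for geometry-aware parameter optimization. Experiments show that the qualitative predictions of the theory remain robust across diffusion sampling tasks with different geometries.

Details
English Abstract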

Practical diffusion sampling is a numerical approximation problem: under a fixed inference budget, one must simulate a reverse-time ODE or SDE using only a limited number of denoising steps, so discretization error is often the dominant source of error. Existing non-asymptotic analyses provide convergence guarantees, but are typically too loose and too insensitive to diffusion parameters to guide practical design: broad families of schedules receive the same rates, which depend on coarse worst-case quantities such as the dimension or the drift Lipschitz constant. We take a less ambitious but more informative route. In the exact-score setting, we derive first-order asymptotic expansions of the Euler-Maruyama weak and Fréchet discretization errors. These formulas hold for general smooth reverse diffusions and become fully explicit under Gaussian data. They show how discretization error adapts to the geometry of the data through the covariance spectrum, and how this geometry interacts with key diffusion parameters, including the diffusion schedules and the diffusion-term coefficient. This yields tractable objectives for geometry-aware parameter optimization. Finally, we show that the qualitative predictions of the Gaussian formulas remain robust across diffusion sampling problems with different geometries, including image generation on different datasets and image posterior sampling.
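A minimal sketch, assuming one-dimensional Gaussian data $N(0, \sigma^2)$, a forward Ornstein-Uhlenbeck process $dX = -X\,dt + \sqrt{2}\,dW$, and the exact score $-y/s_t^2$ with $s_t^2 = \sigma^2 e^{-2t} + 1 - e^{-2t}$. In this case every Euler-Maruyama step of the reverse SDE is linear in the state plus independent Gaussian noise, so the variance of the generated samples can be propagated exactly instead of sampled, which isolates the discretization error. The process, horizon `T`, uniform step grid, and function names are illustrative choices, not the paper's.

```python
import math


def marginal_var(t: float, sigma2: float) -> float:
    """Variance s_t^2 of the forward OU process started from N(0, sigma2)."""
    return sigma2 * math.exp(-2 * t) + 1.0 - math.exp(-2 * t)


def em_final_variance(sigma2: float, n_steps: int, T: float = 1.0) -> float:
    """Variance produced by Euler-Maruyama on the exact-score reverse SDE
        dY = Y * (1 - 2 / s_t^2) dtau + sqrt(2) dW,   t = T - tau.
    Each step maps v -> (1 + h * (1 - 2 / s_t^2))^2 * v + 2 * h."""
    h = T / n_steps
    v = marginal_var(T, sigma2)          # start from the exact time-T marginal
    for k in range(n_steps):
        s2 = marginal_var(T - k * h, sigma2)
        v = (1.0 + h * (1.0 - 2.0 / s2)) ** 2 * v + 2.0 * h
    return v


for sigma2 in (1.0, 0.25):
    for n in (10, 100, 1000):
        print(sigma2, n, abs(em_final_variance(sigma2, n) - sigma2))
```

Varying `sigma2` changes how the marginal variance, and hence the drift stiffness near $t = 0$, evolves along the trajectory, a one-dimensional analogue of the covariance-spectrum dependence described above.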

2605.08390 2026-05-12 cs.LG

The Power of Second Order Methods for Sequence Preconditioning

Annie Marsden, Elad Hazan

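AI Summary This paper studies the power of second-order methods for sequence prediction in dynamical systems with long memory. It proposes combining Universal Sequence Preconditioning (USP) with the Vovk-Azoury-Warmuth (VAW) algorithm, which remains robust to the exponential growth in diameter and gradients caused by the preconditioned sequence. For any marginally stable linear dynamical system, even with asymmetric hidden transition matrices, the method achieves polylogarithmic regret $O(\log^3 T)$, a substantial improvement over typical regret bounds that grow polynomially with the hidden dimension. The paper also extends USP beyond bounded-spectrum systems by using complex-analytic methods to derive new Chebyshev polynomial bounds that accommodate systems with constant complex arguments.

Comments 14 pages, 5 figures

Details
English Abstract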

Sequence prediction methods for dynamical systems with long memory, i.e. marginally stable systems, typically achieve regret that grows polynomially with the hidden dimension of the underlying generative model. Universal Sequence Preconditioning (USP) is a method that compresses any sequence which comes from a linear dynamical system into a "preconditioned" sequence which requires exponentially shorter memory for accurate prediction. However, the preconditioned sequence yields exponentially larger diameters and gradients, hindering USP from unlocking optimal regret bounds. Inspired by the minimum description length principle, we show that the Vovk-Azoury-Warmuth (VAW) algorithm is naturally matched to the USP regime. Indeed, it takes advantage of the memory compression while remaining robust to the exponential explosion of the diameter. We prove that combining USP with VAW achieves astoundingly strong results: for any marginally-stable linear dynamical system, this algorithm achieves polylogarithmic regret $O \left( \log^3 T \right)$ even in the presence of asymmetric hidden transition matrices. Finally, we extend the applicability of USP beyond bounded-spectrum systems by providing new complex-analytic bounds on Chebyshev polynomials, allowing for systems with constant complex arguments.
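A minimal NumPy sketch of the Vovk-Azoury-Warmuth forecaster on a generic linear-regression stream. Its defining feature, relative to online ridge regression, is that the current input $x_t$ enters the regularized covariance before the prediction is made: $\hat y_t = x_t^\top (\lambda I + \sum_{s \le t} x_s x_s^\top)^{-1} \sum_{s < t} y_s x_s$. The USP preconditioning step that the paper pairs it with is not reproduced here; the regularizer `lam` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def vaw_predictions(xs: np.ndarray, ys: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Vovk-Azoury-Warmuth forecaster: online ridge regression whose
    covariance already includes the current input x_t."""
    d = xs.shape[1]
    A = lam * np.eye(d)       # lam * I + sum_{s<=t} x_s x_s^T
    b = np.zeros(d)           # sum_{s<t} y_s x_s
    preds = np.empty(len(ys))
    for t, (x, y) in enumerate(zip(xs, ys)):
        A += np.outer(x, x)                    # include x_t before predicting
        preds[t] = x @ np.linalg.solve(A, b)
        b += y * x                             # y_t is revealed after the prediction
    return preds


# Usage on a noiseless linear stream y_t = <w, x_t> with w = [1, -2].
rng = np.random.default_rng(0)
xs = rng.standard_normal((2000, 2))
ys = xs @ np.array([1.0, -2.0])
preds = vaw_predictions(xs, ys)
```

Including $x_t$ in the covariance shrinks predictions on inputs that are unusual relative to past data, which is the property that keeps VAW's regret insensitive to the size of the comparator, consistent with the abstract's point about robustness to an exploding diameter.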