2605.18226 2026-05-19 cs.CL cs.AI

Context Memorization for Efficient Long Context Generation

Yasuyuki Okoshi, Hao Mark Chen, Guanxi Lu, Hongxiang Fan, Masato Motomura, Daichi Fujiki

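AI summary: This paper proposes a training-free context memorization method that externalizes the prefix into a lightweight lookup table of precomputed attention states, improving the accuracy and efficiency of long-context generation while reducing attention latency.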

English abstract

Modern large language model (LLM) applications increasingly rely on long conditioning prefixes to control model behavior at inference time. While prefix-augmented inference is effective, it incurs two structural limitations: i) the prefix's influence fades as generation proceeds, and ii) attention computation over the prefix scales linearly with its length. Existing approaches either keep the prefix in attention while compressing it, or internalize it into model parameters through gradient-based training. The former still attends to the prefix at inference, while the latter is training-intensive and ill-suited to prefix updates. To address these issues, we propose attention-state memory, a training-free approach that externalizes the prefix into a lightweight, lookup-based memory of precomputed attention states between prefix and query tokens. On ManyICLBench with LLaMA-3.1-8B, our method improves accuracy over in-context learning at 1K-8K memory budgets while reducing attention latency by 1.36x at 8K, and it surpasses full-attention RAG performance on the NBA benchmark using only 20% of its memory footprint.

2605.18221 2026-05-19 cs.SD cs.CL cs.CV cs.LG physics.med-ph

Concise and Logically Consistent Conformal Sets for Neuro-Symbolic Concept-based Models

Samuele Bortolotti, Emanuele Marconato, Andrea Pugnana, Andrea Passerini, Stefano Teso

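AI summary: This paper proposes the COCOCO framework, which integrates conformal prediction to address overconfident label and concept predictions in neuro-symbolic concept-based models; it satisfies three desiderata (consistency, coverage, and conciseness) and improves model reliability.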

English abstract

Neuro-Symbolic Concept-based Models (NeSy-CBMs) are a family of architectures that integrate neural networks with symbolic reasoning for enhanced reliability in high-stakes applications. They work by first extracting high-level concepts from the input and then inferring a task label from them in a way that is compatible with given logical constraints. Yet, their label and concept predictions can be overconfident, making it difficult for stakeholders to gauge when the model's decisions can be trusted. We address this issue by integrating ideas from Conformal Prediction (CP), a framework providing rigorous, distribution-free coverage guarantees. We formalize three desiderata (consistency, coverage, and conciseness) that any conformal method for NeSy-CBMs should satisfy, and show that existing approaches fall short of at least one. We then introduce COCOCO, a post-hoc framework that conformalizes concepts and labels jointly and reconciles them via a single deduction-abduction revision step. COCOCO satisfies all three desiderata, retains distribution-free coverage, is robust to imperfect knowledge, and supports user-specified size budgets. Our experiments on 8 data sets highlight how COCOCO compares favorably against competitors and natural baselines in terms of performance and set size.
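
For readers new to conformal prediction, the distribution-free coverage guarantee mentioned in the abstract can be illustrated with plain split conformal prediction for classification. This is a generic textbook sketch, not COCOCO itself: the function names, the toy calibration data, and the choice of nonconformity score (one minus the probability of the true label) are all assumptions made for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the split-conformal score threshold for miscoverage level alpha."""
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is included.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """Return all labels whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: predicted class probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
cal_labels = [0, 1, 0, 1, 1]
q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

confident_set = prediction_set([0.7, 0.3], q)  # a confident prediction
ambiguous_set = prediction_set([0.5, 0.5], q)  # an ambiguous prediction
```

In this toy run the confident input yields a single-label set and the ambiguous one yields both labels, which is the trade-off between coverage and set size that the abstract's conciseness desideratum is about.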

2605.18197 2026-05-19 cs.RO cs.AI cs.CV

Dual-Rate Diffusion: Accelerating Diffusion Models by Interleaving Heavy and Light Networks

Grigory Bartosh, David Ruhe, Emiel Hoogeboom, Jonathan Heek, Thomas Mensink, Tim Salimans

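AI summary: This paper proposes Dual-Rate Diffusion, which accelerates diffusion model inference by interleaving a high-capacity context encoder with a lightweight denoising model while preserving sample quality, balancing performance and computational cost on ImageNet benchmarks.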

English abstract

Diffusion models achieve state-of-the-art generative performance but suffer from high computational costs during inference due to the repeated evaluation of a heavy neural network. In this work, we propose Dual-Rate Diffusion, a method to accelerate sampling by interleaving the execution of a heavy high-capacity context encoder and a light efficient denoising model. The context encoder is evaluated sparsely to extract high-dimensional features, which are effectively reused by the light denoising model at every step to refine the sample efficiently. This approach significantly accelerates inference without compromising sample quality. On ImageNet benchmarks, Dual-Rate Diffusion matches the performance of standard baselines while reducing computational cost by a factor of $2$-$4$. Furthermore, we demonstrate that our method is compatible with distillation techniques, such as Moment Matching Distillation, enabling further efficiency gains in few-step generation.
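
The interleaving described in the abstract can be pictured as a sampling loop in which an expensive network runs only every few steps and a cheap network runs at every step, reusing the cached features. The sketch below shows only that control flow, under heavy assumptions: `heavy_encoder`, `light_denoiser`, the refresh interval `heavy_every`, and the toy arithmetic inside them are hypothetical stand-ins, not the networks or the schedule used in the paper.

```python
def heavy_encoder(x, t):
    """Stand-in for the expensive context encoder (hypothetical toy function)."""
    return sum(x) / len(x) * (1.0 - t)

def light_denoiser(x, t, context):
    """Stand-in for the cheap denoiser that reuses cached context features."""
    return [xi + 0.1 * (context - xi) for xi in x]

def dual_rate_sample(x, num_steps, heavy_every):
    """Run num_steps light updates, refreshing the heavy context sparsely."""
    heavy_calls = 0
    context = None
    for step in range(num_steps):
        t = 1.0 - step / num_steps  # toy noise level going from 1 toward 0
        if step % heavy_every == 0:
            context = heavy_encoder(x, t)  # sparse, expensive evaluation
            heavy_calls += 1
        x = light_denoiser(x, t, context)  # cheap evaluation at every step
    return x, heavy_calls

sample, heavy_calls = dual_rate_sample([1.0, -1.0, 0.5], num_steps=12, heavy_every=4)
```

With 12 steps and a refresh every 4 steps, the heavy stand-in runs at steps 0, 4, and 8 only, while the light stand-in runs all 12 times.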

2605.18188 2026-05-19 cs.LG

UTOPYA: A Multimodal Deep Learning Framework for Physics-Informed Anomaly Detection and Time-Series Prediction

Robson W. S. Pessoa, Julien Amblard, Alessandra Russo, Idelfonso B. R. Nogueira

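AI summary: This paper proposes the UTOPYA framework, which fuses eight data modalities through FiLM-conditioned cross-modal attention and gated fusion to jointly address anomaly detection, time-series prediction, and phase classification in batch distillation, and improves performance with a physics-informed regularisation scheme and curriculum learning.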

English abstract

Anomaly detection in batch processes is hindered by transient dynamics, scarce fault labels, and reliance on single-modality sensor data. This work introduces UTOPYA (Unified Temporal Observation for Physics-Informed Anomaly Detection and Time-Series Prediction), a 15.2M-parameter multimodal framework that jointly addresses anomaly detection, time-series prediction, and phase classification in batch distillation by fusing eight data modalities through Feature-wise Linear Modulation (FiLM) conditioned cross-modal attention and gated fusion. A physics-informed regularisation scheme introduced in this work enforces temporal smoothness and thermodynamic monotonicity, while curriculum learning introduces training samples in order of physical difficulty. On the 119-experiment multimodal batch distillation dataset of Arweiler et al. (2026), UTOPYA achieves a window-level test AUROC of 0.832 and an AUROC of 0.874 under multi-signal experiment-level scoring, substantially outperforming four external baselines (PCA, autoencoder, Isolation Forest, and LSTM autoencoder) evaluated under identical conditions (+0.147 window-level AUROC over the best baseline). A multimodal ablation over 15 architectural configurations shows that static context via FiLM conditioning is the key enabler, lifting experiment-level multi-signal AUROC by +0.145 over the unimodal baseline (0.729 to 0.874). Separately, a training ablation across 14 design choices reveals that several widely adopted techniques, including instance normalisation, Mixup, ensembling, test-time augmentation, and stochastic weight averaging, fail to improve or actively degrade generalisation in this data-scarce setting. These negative results expose a fundamental tension between smoothing-based regularisation and anomaly detection, providing practical guidance for multimodal process monitoring deployment.

2605.18184 2026-05-19 cs.RO cs.AI cs.CV

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

Giorgia Modi, Davide Buoso, Giuseppe Averta, Daniele De Martini

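AI summary: This paper proposes using fixed external RGB cameras as Common Prior Maps for active, incremental 3D scene graph generation, fusing observations from onboard robot cameras and fixed external cameras to improve the efficiency and accuracy of scene understanding.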

English abstract

Commonly available prior information, such as BIM models, floor plans, and remote sensing images, can provide valuable geometric and semantic context for autonomous robotic systems. In this paper, we treat observations from fixed external RGB cameras as Common Prior Maps (CPMs): wide-field views of the environment that initialize a semantic and geometric scene prior before any robot motion begins. We present an RGB-only framework for active, incremental 3D scene graph (3DSG) generation that seamlessly fuses observations from both onboard robot cameras and fixed external cameras within a single hardware-agnostic pipeline. By relying solely on RGB observations processed by a feed-forward 3D reconstruction model, the system treats all cameras, onboard or external, identically, requiring no hardware modifications. A graph-based active semantic exploration framework then directly leverages the partial scene graph to guide the robot toward regions of high semantic uncertainty, progressively completing and refining the prior. Experiments demonstrate that bootstrapping the scene graph with even a single external camera increases initial object recall by up to +79%, and that the richer context of the prior significantly improves the efficiency of subsequent active exploration.

2605.18181 2026-05-19 cs.AI cs.CL

Scalable Environments Drive Generalizable Agents

Jiayi Zhang, Fanqi Kong, Guibin Zhang, Maojia Song, Zhaoyang Yu, Jianhao Ruan, Jinyu Xiang, Bang Liu, Chenglin Wu, Yuyu Luo

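AI summary: This paper argues that generalizable agents require scalable environments in order to adapt to diverse tasks and unseen environments, identifies the core challenges of environment scaling, and proposes a unified taxonomy and a paradigm for constructing scalable environments.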

Ringmaster LMO: An Asynchronous Linear Minimization Oracle Momentum Method

Abdurakhmon Sadiev, Artavazd Maranjyan, Ivan Ilin, Peter Richtárik

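AI summary: This paper proposes Ringmaster LMO, an asynchronous Linear Minimization Oracle momentum method for unconstrained stochastic nonconvex optimization that improves on traditional synchronous methods through a delay-thresholding mechanism and is suited to heterogeneous distributed systems; experiments show that its advantage grows with system heterogeneity.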

English abstract

Muon has recently emerged as a strong alternative to AdamW for training neural networks, with encouraging large-scale pretraining results and growing evidence that matrix-structured updates can be faster in practice. Yet Muon, and more generally Linear Minimization Oracle (LMO) based methods, are typically used synchronously. This is problematic in heterogeneous distributed systems, where workers complete gradient computations at different speeds and synchronous training must repeatedly wait for slower workers. In this work, we introduce Ringmaster LMO, an asynchronous LMO-based momentum method for unconstrained stochastic nonconvex optimization. Our method builds on the delay-thresholding idea of Ringmaster ASGD. For SGD-type methods, Ringmaster ASGD achieves optimal time complexity by discarding overly stale gradients. Ringmaster LMO extends this mechanism to general LMO-based updates. We establish convergence guarantees under generalized $(L_0, L_1)$-smoothness and further develop a parameter-agnostic variant with decreasing stepsizes and adaptive delay thresholds. Finally, we translate our iteration guarantees into time complexity bounds under heterogeneous worker computation times. In the classical Euclidean smooth setting, these bounds recover the optimal time complexity of Ringmaster ASGD. Experiments on stochastic quadratic problems and NanoChat language-model pretraining show that the advantages of Ringmaster LMO grow with system heterogeneity and that the method outperforms strong synchronous and asynchronous baselines.
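
The delay-thresholding idea that the abstract attributes to Ringmaster ASGD (discarding overly stale gradients) can be sketched as a small server-side loop. This is an illustration under stated assumptions, not the paper's algorithm: it uses a plain scalar SGD step rather than an LMO-based momentum update, and the function name, the event format, and the strict comparison against `max_delay` are all hypothetical choices.

```python
def apply_with_delay_threshold(x0, events, lr, max_delay):
    """Apply asynchronously arriving gradients, skipping overly stale ones.

    events: list of (gradient, version) pairs in arrival order, where version
    is the server iteration at which the worker read the model.
    """
    x = x0
    iteration = 0  # number of updates applied so far
    discarded = 0
    for grad, version in events:
        delay = iteration - version
        if delay > max_delay:  # assumption: strict comparison
            discarded += 1     # stale gradient: ignore it
            continue
        x -= lr * grad         # plain SGD step (not an LMO update)
        iteration += 1
    return x, iteration, discarded

# Hypothetical arrival order; the fourth gradient was computed on a model
# that is three updates old and is dropped when max_delay is 1.
events = [(2.0, 0), (2.0, 0), (1.0, 1), (4.0, 0), (0.5, 3)]
x, applied, discarded = apply_with_delay_threshold(1.0, events, lr=0.1, max_delay=1)
```

In this toy trace four gradients are applied and one is discarded, so a slow worker cannot push the model with a gradient computed on a badly outdated iterate.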

2605.18173 2026-05-19 cs.CV

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

Antonio Colombo, Giovanni Bianchi

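AI summary: This paper proposes a new Soft Attention Mask Embedding module (SAME), which uses the global receptive field of Transformer encoders to encode high-level features and compute soft attention weights, then hierarchically embeds them with predicted masks to produce refined text-boundary-aware masks that effectively suppress background noise. Building on this module, the paper presents SAME-Net, a robust end-to-end text spotting framework that requires neither character-level annotations nor auxiliary text rectification modules.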

English abstract

End-to-end scene text spotting, which unifies text detection and recognition within a single framework, has witnessed remarkable progress driven by deep learning advances. However, most existing approaches still suffer from incomplete mask proposals caused by multi-scale variation, arbitrary text shapes, and complex background interference, thereby degrading recognition accuracy. In this paper, we propose a novel Soft Attention Mask Embedding module (SAME) that leverages the global receptive field of Transformer encoders to encode high-level features and compute soft attention weights, which are then hierarchically embedded with predicted masks to generate refined text-boundary-aware masks that effectively suppress background noise. Building upon this module, we present SAME-Net, a robust end-to-end text spotting framework that requires neither character-level annotations nor auxiliary text rectification modules. Since the soft attention mechanism is fully differentiable, recognition loss gradients can be back-propagated through the SAME module to the detection branch, enabling joint optimization of detection and recognition objectives. Extensive experiments on challenging benchmarks demonstrate the effectiveness of our approach: SAME-Net achieves 84.02% end-to-end H-mean on the arbitrarily-shaped Total-Text dataset, surpassing the previous state-of-the-art GLASS by 1.02% in full-lexicon accuracy without additional training data, and obtains competitive 83.4% strong-lexicon results on the multi-oriented ICDAR 2015 dataset.

2605.18165 2026-05-19 cs.LG

Semi-LAR: Semi-Supervised Contrastive Learning with Linear Attention for Nighttime Flare Removal

Xiyu Zhu, Wei Wang, Kui Jiang, Zhengguo Li

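AI summary: This paper proposes a semi-supervised contrastive learning framework that jointly addresses pseudo-label reliability and representation discrimination, effectively mitigating error accumulation in nighttime flare removal; experiments confirm that the framework is model-agnostic and improves performance.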

English abstract

Lens flare removal is challenging due to the large spatial extent of flare artifacts and their entanglement with scene structures, while existing methods heavily rely on large-scale paired data. We propose a semi-supervised flare removal framework that enables stable learning from unlabeled images by jointly addressing pseudo-label reliability and representation discrimination. We introduce an adaptive pseudo-label repository that progressively refines pseudo supervision through no-reference quality assessment, momentum-based updates, and invalid label filtering, effectively mitigating error accumulation. Moreover, we design a flare-aware contrastive loss that explicitly treats flare-contaminated inputs as negatives and performs patch-level contrastive learning, encouraging representations that are discriminative against flare patterns while remaining consistent with reliable pseudo targets. Extensive experiments on multiple flare benchmarks demonstrate that the proposed framework is model-agnostic and consistently improves performance and robustness.

2605.18155 2026-05-19 cs.CL

FOL2NS: Generating Natural Sentences from First-Order Logic

Mei Jia

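AI summary: This paper proposes FOL2NS, a framework that combines rule-based modules with fine-tuned language models to generate synthetic first-order logic formulas and convert them into natural language, improving the diversity and coverage of the generated samples; it still struggles with semantic precision and natural generation as structural complexity increases.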

Comments: 11 pages, 8 figures

English abstract

Translating formal language into natural language is a foundational challenge in NLP, driving various downstream applications in semantic parsing, theorem validation, and question answering. In this study, we introduce First-Order Logic to Natural Sentence (FOL2NS), a neurosymbolic framework designed to generate synthetic FOL formulas and convert them into natural human expressions. It handles deeply nested structures with varying quantifier depths (QD), which are rarely captured by existing corpora. By combining rule-driven modules with fine-tuned language models, FOL2NS enhances the diversity and coverage of the generated samples. In our experiments, we systematically evaluate the framework's capabilities through both character-level analysis and overall performance metrics. Experimental results show that FOL2NS can reliably produce well-formed templates and fluent statements, but it faces challenges in achieving precise semantic representations and natural generation as structural complexity increases.

2605.16142 2026-05-19 cs.AI cs.LG

Property-Guided LLM Program Synthesis for Planning

André G. Pereira, Augusto B. Corrêa, Jendrik Seipp

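AI summary: This paper studies property-guided LLM program synthesis, which checks whether candidate programs satisfy formally defined properties to guide the LLM toward higher-quality programs, reducing generation and evaluation costs.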

English abstract

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests. Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs. We study a different approach: property-guided LLM program synthesis. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop the evaluation early and provide the LLM with a concrete counterexample showing exactly how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost, and can guide the LLM to generate stronger programs. We evaluate this approach on PDDL planning domains, asking the LLM to synthesize direct heuristic functions: every state reachable by strictly improving transitions has a strictly improving successor. A heuristic with this property leads a hill-climbing algorithm directly to a goal state. A counterexample-guided repair loop generates one candidate program, checks the property over a training set, and returns the first case that violates the property. We evaluate our approach on ten planning domains with an out-of-distribution test set. The synthesized heuristics are effectively direct on virtually all test tasks, and compared to the best prior generation method our approach generates seven times fewer programs per domain on average, solves more tasks without using search, and requires several orders of magnitude less computation to evaluate candidates. Whenever a problem admits a verifiable property, property-guided LLM synthesis can reduce cost and improve program quality.
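
The property check at the heart of the repair loop can be sketched on a toy state space. The code below follows one reading of the abstract's definition, and that reading is an assumption: lower heuristic values count as better, goal states are exempt, and only strictly improving transitions are followed from the initial state. The function name, the line-shaped state space, and both heuristics are hypothetical and unrelated to the PDDL domains used in the paper.

```python
def find_counterexample(initial, successors, h, is_goal):
    """Return the first reachable non-goal state with no strictly improving successor.

    Only strictly improving transitions (h decreases) are followed.
    Returns None when the property holds on this task.
    """
    stack = [initial]
    seen = {initial}
    while stack:
        state = stack.pop()
        if is_goal(state):
            continue
        improving = [s for s in successors(state) if h(s) < h(state)]
        if not improving:
            return state  # concrete counterexample to report back
        for s in improving:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return None

# Toy task: states 0..5 on a line, goal is state 5, moves go one step left or right.
def successors(n):
    return [m for m in (n - 1, n + 1) if 0 <= m <= 5]

def is_goal(n):
    return n == 5

def h_good(n):
    return 5 - n  # exact distance to the goal: always has an improving move

bad_values = {0: 5, 1: 4, 2: 1, 3: 3, 4: 1, 5: 0}  # local minimum at state 2

def h_bad(n):
    return bad_values[n]

ok = find_counterexample(0, successors, h_good, is_goal)
violation = find_counterexample(0, successors, h_bad, is_goal)
```

Here `h_good` passes the check, while `h_bad` fails at state 2, where hill climbing would get stuck; in a repair loop, that state is the kind of concrete counterexample that would be handed back to the LLM instead of a bare score.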

2605.16015 2026-05-19 cs.RO cs.LG

Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning

Vishnu Saj, Sushil Vemuri, Dileep Kalathil, Moble Benedict

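AI summary: This paper proposes a novel adaptive control architecture that uses reinforcement learning and a Residual Dynamics Predictor to improve quadrotor control under dynamic disturbances; experiments show higher trajectory tracking accuracy in real-world settings.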

English abstract

Deep Reinforcement Learning (DRL) for quadrotor flight control typically relies on Domain Randomization (DR) for sim-to-real transfer, resulting in overly conservative policies that struggle with dynamic disturbances. To overcome this, we propose a novel adaptive control architecture that actively perceives and reacts to instantaneous perturbations. First, we train an optimal outer-loop policy, then replace its reliance on ground-truth disturbance data with a Residual Dynamics Predictor (RDP). The RDP estimates online the external forces and moments acting on the aircraft in flight, using only the history of states and control actions. For seamless hardware transfer, we introduce a data-efficient linear calibration bridge and an online thrust correction mechanism that align the simulated latent space with reality using mere seconds of flight data. Real-world validations on a Crazyflie micro-quadrotor demonstrate that our adaptive controller significantly outperforms baselines, maintaining precise trajectory tracking under severe uncertainties including mass variations, asymmetric payloads, and dynamic slung loads.
