arXivDaily: arXiv daily academic digest, updated Monday through Friday
2510.14737 2026-05-21 cs.CV

Free-Grained Hierarchical Visual Recognition

Seulki Park, Zilin Wang, Stella X. Yu

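AI summary: This paper studies hierarchical visual recognition when real-world labels are incomplete and of mixed granularity. It introduces free-grained training, combining text supervision with semi-supervised learning to improve the performance of conventional hierarchical methods under incomplete supervision, and proposes a free-grained inference mechanism that adapts to different prediction depths.

Details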
Comments
Accepted to CVPR 2026. 31 pages
English abstract

Hierarchical image recognition seeks to predict class labels along a semantic taxonomy, from broad categories to specific ones, typically under the tidy assumption that every training image is fully annotated along its taxonomy path. Reality is messier: A distant bird may be labeled only bird, while a clear close-up may justify bald eagle. We introduce free-grain training, where labels may appear at any level of the taxonomy and models must learn consistent hierarchical predictions from incomplete, mixed-granularity supervision. We build benchmark datasets with varying label granularity and show that existing hierarchical methods deteriorate sharply in this setting. To make up for missing supervision, we propose two simple solutions: One adds broad text-based supervision that captures visual attributes, and the other treats missing labels at specific taxonomy levels as a semi-supervised learning problem. We also study free-grained inference, where the model chooses how deep to predict, returning a reliable coarse label when a fine-grained one is uncertain. Together, our task, datasets, and methods move hierarchical recognition closer to the way labels arise in the real world.
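
The free-grained inference idea in this abstract can be illustrated with a minimal sketch. It assumes the model already outputs one (label, confidence) pair per taxonomy level, ordered from coarse to fine. The function name `free_grained_predict`, the 0.7 threshold, and the intermediate `raptor` level are hypothetical choices for illustration, not details taken from the paper.

```python
from typing import List, Tuple


def free_grained_predict(
    path_scores: List[Tuple[str, float]], threshold: float = 0.7
) -> List[str]:
    """Return the longest coarse-to-fine prefix whose confidences all clear the threshold."""
    kept: List[str] = []
    for label, confidence in path_scores:
        if confidence < threshold:
            break  # stop descending once a level is uncertain
        kept.append(label)
    return kept


# Hypothetical per-level confidences for the two images described in the abstract.
distant = [("bird", 0.97), ("raptor", 0.55), ("bald eagle", 0.30)]
close_up = [("bird", 0.99), ("raptor", 0.93), ("bald eagle", 0.88)]

print(free_grained_predict(distant))   # ['bird']
print(free_grained_predict(close_up))  # ['bird', 'raptor', 'bald eagle']
```

A single global threshold is the simplest possible stopping rule; per-level thresholds or a calibrated confidence would be natural refinements.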

2509.07120 2026-05-21 cs.CV

Block-Sparse Global Attention for Efficient Multi-View Geometry Transformers

Chung-Shien Brian Wang, Christian Schmidt, Jens Piekenbrinck, Bastian Leibe

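AI summary: This paper proposes a block-sparse replacement for dense global attention, implemented with optimized kernels, that enables efficient multi-view reconstruction and markedly improves scalability to large image sets.

Details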
Comments
Project page at https://vision.rwth-aachen.de/sparse-vggt
English abstract

Efficient and accurate feed-forward multi-view reconstruction has long been an important task in computer vision. Recent transformer-based models like VGGT, $\pi^3$, and MapAnything have demonstrated remarkable performance with relatively simple architectures. However, their scalability is fundamentally constrained by the quadratic complexity of global attention, which imposes a significant runtime bottleneck when processing large image sets. In this work, we empirically analyze the global attention matrix of these models and observe that the probability mass concentrates on a small subset of patch-patch interactions corresponding to cross-view geometric correspondences. Building on this insight and inspired by recent advances in large language models, we propose a training-free, block-sparse replacement for dense global attention, implemented with highly optimized kernels. Our method accelerates inference by more than $3\times$ while maintaining comparable task performance. Evaluations on a comprehensive suite of multi-view benchmarks demonstrate that our approach seamlessly integrates into existing global attention-based architectures such as VGGT, $\pi^3$, and MapAnything, while substantially improving scalability to large image collections.
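
One way to picture a block-sparse replacement for dense attention is sketched below in NumPy. This is not the authors' kernel: the block-selection rule (mean-pooled queries and keys, then the top `keep` key blocks per query block), the function names, and the block size are all assumptions. The sketch also still forms the full score matrix before masking, so it shows the sparsity pattern rather than the speedup.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def block_sparse_attention(q, k, v, block=4, keep=2):
    """Attend only to the `keep` highest-scoring key blocks per query block."""
    n, d = q.shape
    nb = n // block  # assumes n is divisible by block
    # Mean-pool queries and keys per block to get cheap block-level scores.
    qb = q.reshape(nb, block, d).mean(axis=1)
    kb = k.reshape(nb, block, d).mean(axis=1)
    top = np.argsort(-(qb @ kb.T), axis=1)[:, :keep]
    block_mask = np.zeros((nb, nb), dtype=bool)
    np.put_along_axis(block_mask, top, True, axis=1)
    # Expand the block mask to token resolution.
    mask = np.repeat(np.repeat(block_mask, block, axis=0), block, axis=1)
    scores = np.where(mask, q @ k.T / np.sqrt(d), -np.inf)
    return softmax(scores) @ v


rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
sparse_out = block_sparse_attention(q, k, v, block=4, keep=2)
print(np.abs(sparse_out - dense_attention(q, k, v)).max())  # approximation error
```

Keeping every block (`keep` equal to the number of blocks) reproduces dense attention exactly, which is a convenient sanity check for any sparsity rule.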

2507.21168 2026-05-21 cs.CL cs.AI cs.LG

Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question

Rafael Rosales, Santiago Miret

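AI summary: This paper compares two diversity approaches for answering binary questions with large language models, model diversity and question interpretation diversity, and finds that question interpretation diversity yields better ensemble accuracy.

Details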
Journal ref
Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pages 5116-5128
English abstract

Effectively leveraging diversity has been shown to improve performance for various machine learning models, including large language models (LLMs). However, determining the most effective way of using diversity remains a challenge. In this work, we compare two diversity approaches for answering binary questions using LLMs: model diversity, which relies on multiple models answering the same question, and question interpretation diversity, which relies on using the same model to answer the same question framed in different ways. For both cases, we apply majority voting as the ensemble consensus heuristic to determine the final answer. Our experiments on BoolQ, StrategyQA, and PubMedQA show that question interpretation diversity consistently leads to better ensemble accuracy compared to model diversity. Furthermore, our analysis of GPT and LLaMA shows that model diversity typically produces results between the best and the worst ensemble members without clear improvement.
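
The majority-voting consensus step can be sketched in a few lines. The tie-breaking rule and the example votes below are assumptions for illustration; the abstract does not say how ties are handled, and an odd number of ensemble members avoids the question.

```python
from collections import Counter
from typing import List


def majority_vote(answers: List[bool]) -> bool:
    """Return the yes/no answer given by most ensemble members."""
    counts = Counter(answers)
    return counts[True] > counts[False]  # assumption: ties resolve to False


# Hypothetical yes/no answers from five rephrasings of one question.
interpretation_votes = [True, True, False, True, False]
print(majority_vote(interpretation_votes))  # True
```

The same function serves both settings being compared: the votes come either from several models answering one prompt or from one model answering several rephrasings.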

2506.04042 2026-05-21 cs.CL

Causal Path Alignment: Anchoring the Optimization Trajectory for Controllable In-Parameter Knowledge Editing

Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng Lin, Weiping Wang

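AI summary: This paper proposes the Causal Path Alignment framework, which anchors the optimization trajectory to address Subject-Dominant Memory Interference in in-parameter knowledge editing, improving relation specificity and reducing side effects.

Details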
Comments
Accepted by IJCAI 2026
English abstract

Knowledge editing is pivotal for efficiently updating the parametric memory of Large Language Models (LLMs), enabling them to function as evolving agents in dynamic environments. However, mainstream in-parameter knowledge editing approaches suffer from Subject-Dominant Memory Interference: modifying a specific fact inadvertently corrupts the broader structural knowledge associated with the same subject within LLMs. We diagnose the root cause as a shortcut learning pathology, where the optimization objective overfits subject representations while bypassing the essential relational context. To rectify this, we propose Causal Path Alignment (CPA), a principled framework designed to anchor the optimization trajectory to valid causal pathways. CPA enforces parameter updates to route through relation-aware intermediate states, thereby preventing the erasure of contextual dependencies. Experimental results across diverse LLM backbones demonstrate that CPA consistently eliminates the shortcut, significantly improving relation specificity while exhibiting minimal side-effects. Moreover, CPA serves as a model-agnostic plug-in for existing editors, paving the way for reliable and trustworthy in-parameter knowledge editing.

2502.17518 2026-05-21 cs.LG cs.AI q-fin.CP stat.ML

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Zheli Xiong

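AI summary: This paper presents a comprehensive study of ensemble reinforcement learning models in financial trading strategies, using classifier models to enhance performance. By combining reinforcement learning algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, it examines how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates various ensemble methods against individual reinforcement learning models on key financial metrics, including cumulative returns, Sharpe ratio (SR), Calmar ratio, and maximum drawdown (MDD). The results show that ensemble methods consistently outperform the base models in risk-adjusted returns, with better drawdown management and overall stability. However, ensemble performance is sensitive to the choice of the variance threshold τ, which underlines the importance of adjusting τ dynamically for best performance. The study highlights the value of combining reinforcement learning with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.

Details
Comments
16 pages, 5 figures, 1 table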
English abstract

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our results demonstrate that ensemble methods consistently outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, we identify the sensitivity of ensemble performance to the choice of variance threshold τ, highlighting the importance of dynamic τ adjustment to achieve optimal performance. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
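
The evaluation metrics named in this abstract have standard definitions, sketched below for a series of per-period simple returns. The 252-period annualization, the zero default risk-free rate, and the function names are conventional assumptions, not settings reported by the paper.

```python
import numpy as np


def cumulative_return(returns):
    """Total compounded return over the whole period."""
    r = np.asarray(returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)


def sharpe_ratio(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from per-period simple returns."""
    r = np.asarray(returns, dtype=float)
    excess = r - risk_free_rate / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a negative fraction."""
    r = np.asarray(returns, dtype=float)
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # start from unit capital
    running_peak = np.maximum.accumulate(equity)
    return float(((equity - running_peak) / running_peak).min())


def calmar_ratio(returns, periods_per_year=252):
    """Annualized return divided by the magnitude of the maximum drawdown."""
    r = np.asarray(returns, dtype=float)
    annualized = (1.0 + cumulative_return(r)) ** (periods_per_year / len(r)) - 1.0
    return annualized / abs(max_drawdown(r))  # undefined when there is no drawdown


period_returns = [0.10, -0.50, 0.20]  # made-up returns, for illustration only
print(cumulative_return(period_returns), max_drawdown(period_returns))
```

Sharpe penalizes all volatility while Calmar penalizes only the worst drawdown, so the two risk-adjusted ratios can rank strategies differently.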

2501.01793 2026-05-21 cs.LG cs.AI

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

Mohammad Khalil, Sam Urmian, Ronas Shakya, Qinyi Liu

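AI summary: This paper studies the potential of generative adversarial networks (GANs) and large language models (LLMs) for generating synthetic tabular data, explores whether synthetic data can be used to create artificial students for learning analytics models, and evaluates the performance of different generative models.

Details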
English abstract

In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to quality student data is critical for advancing learning analytics, but privacy concerns and stricter data protection regulations worldwide limit its availability and usage. Synthetic data offers a promising alternative. We investigate whether synthetic data can be leveraged to create artificial students for serving learning analytics models. Using the popular GAN model CTGAN and three LLMs (GPT2, DistilGPT2, and DialoGPT), we generate synthetic tabular student data. Our results demonstrate the strong potential of these methods to produce high-quality synthetic datasets that resemble real student data. To validate our findings, we apply a comprehensive set of utility evaluation metrics to assess the statistical and predictive performance of the synthetic data and compare the different generator models used, especially the performance of LLMs. Our study aims to provide the learning analytics community with valuable insights into the use of synthetic data, laying the groundwork for expanding the field's methodological toolbox with new, innovative approaches for learning analytics data generation.

2501.01785 2026-05-21 cs.LG cs.AI cs.CY

Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

Qinyi Liu, Oscar Deho, Sam Urmian, Mohammad Khalil, Srecko Joksimovic, George Siemens

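AI summary: This study examines how well synthetic data generation and fairness algorithms balance privacy and fairness. It finds that the DECAF algorithm achieves the best balance between privacy and fairness but has lower predictive accuracy, and that applying pre-processing fairness algorithms to synthetic data further improves fairness.

Details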
English abstract

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.

2409.08700 2026-05-21 cs.LG

Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Blanca Lacruz-Pleguezuelos, Sofia Bosch Pastor, Laura Judith Marcos-Zambrano, Guadalupe X. Bazán, Gala Freixer, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia, Isabel Espinosa-Salinas, Enrique Carrillo de Santa Pau

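AI summary: This paper uses wearable devices and artificial intelligence to predict weight change in overweight and obese individuals. Analyzing biomarkers, vital signs, and behavioral data from around 100 subjects, it identifies key differences between those who lose weight and those who do not. A Gradient Boosting classifier reaches 84.44% AUC, indicating the potential of integrating multiple data sources in personalized healthcare.

Details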
Journal ref
Computers in Biology and Medicine, Vol. 173, 111676, 2026
Comments
25 pages, 6 figures, 7 tables, 1 appendix
English abstract

Early detection of chronic and Non-Communicable Diseases (NCDs) is crucial for effective treatment during the initial stages. This study explores the application of wearable devices and Artificial Intelligence (AI) in order to predict weight loss changes in overweight and obese individuals. Using wearable data from a 1-month trial involving around 100 subjects from the AI4FoodDB database, including biomarkers, vital signs, and behavioral data, we identify key differences between those achieving weight loss (>= 2% of their initial weight) and those who do not. Feature selection techniques and classification algorithms reveal promising results, with the Gradient Boosting classifier achieving 84.44% Area Under the Curve (AUC). The integration of multiple data sources (e.g., vital signs, physical and sleep activity, etc.) enhances performance, suggesting the potential of wearable devices and AI in personalized healthcare.

2605.21260 2026-05-21 cs.LG

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Yue Zhang, Zhiyi Dong, Tommaso Cesari, Yongyi Mao

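AI summary: Taking a learning-theoretic perspective, this paper studies the cost and benefit of Chain of Thought (CoT). By analyzing the interaction between an answer map and a chain rule, it defines the reasoning risk of a hypothesis under this interaction and derives a tight decomposition of that risk, showing when CoT helps and when it hurts.

Details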
English abstract

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
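
The bounded, linear, and exponential error-growth regimes can be illustrated with a standard recursion. This is a textbook sketch under an assumed per-step error $\varepsilon$ and an assumed amplification factor $L \ge 0$; it is not the paper's theorem or notation.

```latex
e_{t+1} \le L\, e_t + \varepsilon, \quad e_0 = 0
\;\Longrightarrow\;
e_T \le \varepsilon \sum_{t=0}^{T-1} L^{t} =
\begin{cases}
\varepsilon \dfrac{1 - L^{T}}{1 - L} \le \dfrac{\varepsilon}{1 - L}, & L < 1 \quad (\text{bounded}),\\[2ex]
\varepsilon\, T, & L = 1 \quad (\text{linear}),\\[1ex]
\varepsilon \dfrac{L^{T} - 1}{L - 1}, & L > 1 \quad (\text{exponential}).
\end{cases}
```

Under this toy model, a contractive chain ($L < 1$) keeps accumulated error bounded regardless of chain length $T$, whereas an expansive one ($L > 1$) lets a small per-step error dominate after enough steps.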

2605.21258 2026-05-21 cs.RO cs.AI

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

Yicheng Jiang, Jiaxu Wang, Junhao He, Zesen Gan, Junhao Li, Qiang Zhang, Jingkai Sun, Jiahang Cao, Mingyuan Sun, Xiangyu Yue, Qiming Shao

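AI summary: This paper proposes a new pretraining framework that learns a hybrid representation, structural latent points, which combines the expressiveness of implicit representations with the structural priors of explicit ones to improve the efficiency and robustness of visual representations for robotic manipulation.

Details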
Journal ref
International Conference on Robotics and Automation 2026
English abstract

Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretraining framework that learns a hybrid representation: structural latent points. Specifically, we insert a point-wise latent variational autoencoder into the latent space of a point-cloud autoencoder, jointly regularizing point-wise features and coordinates toward a Gaussian prior. The resulting compact latent preserves coarse structural tendencies, which do not encode precise geometry but capture richer rough shape and semantic information, effectively combining the expressiveness of implicit representations with the structural priors of explicit ones. In addition, informed by shared design choices in prior work, we develop a streamlined, efficient 3DGS-based rendering pipeline that is deliberately kept lightweight, improving efficiency while leaving greater representational capacity to the front-end latent module. Extensive evaluations on RLBench, ManiSkill2, and a real-robot platform demonstrate consistent gains in task success, sample efficiency, and robustness to viewpoint and scene variations over strong baselines. Ablation studies further confirm that each component of our framework is critical to overall performance.

2605.21257 2026-05-21 cs.RO

Reinforcement Learning for Risk Adaptation via Differentiable CVaR Barrier Functions

Xinyi Wang, Taekyung Kim, Bardh Hoxha, Georgios Fainekos, Dimitra Panagou

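AI summary: This paper proposes an end-to-end risk adaptation framework for crowd navigation under obstacle-motion uncertainty. It combines reinforcement learning with a differentiable quadratic-program safety layer based on Conditional Value-at-Risk (CVaR) barrier functions, jointly learning the nominal control input, risk level, and safety margin while enforcing explicit probabilistic safety constraints.

Details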
Comments
Project page: https://anonymousrobotics9666.github.io/rlcvarbf/
English abstract

Planning through crowded environments under uncertain obstacle motions remains difficult, as stochastic interactions often induce overly conservative behavior or reduced efficiency. To address this challenge, we propose an end-to-end risk adaptation framework for crowd navigation under obstacle-motion uncertainty modeled by a Gaussian mixture model. The framework combines reinforcement learning (RL) with a differentiable quadratic-program safety layer based on Conditional Value-at-Risk (CVaR) barrier functions, jointly learning nominal control input, risk level, and safety margin and enforcing explicit probabilistic safety constraints. This design enables context-aware adaptation, promoting efficient behavior while invoking caution only when necessary. We conduct extensive evaluations in dynamic, uncertain, and crowded environments across varying obstacle densities and robot models, and further assess generalization under three out-of-distribution cases. Comparisons across optimization-based, RL-based, and integrated RL and optimization methods are provided, and the proposed method is shown to deliver the strongest overall performance in safety, efficiency, and generalization under uncertainty.
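
Conditional Value-at-Risk itself has a simple empirical form: the mean of the worst tail of sampled losses. The sketch below shows only that quantity, not the paper's barrier-function constraint or safety layer; the function name, the risk level, and the two-mode sample are assumptions for illustration.

```python
import numpy as np


def empirical_cvar(losses, alpha=0.9):
    """Mean of the worst (1 - alpha) fraction of sampled losses."""
    ordered = np.sort(np.asarray(losses, dtype=float))
    tail_start = min(int(np.ceil(alpha * len(ordered))), len(ordered) - 1)
    return float(ordered[tail_start:].mean())


rng = np.random.default_rng(0)
# Hypothetical two-mode loss: most samples are benign, a few are severe.
benign = rng.normal(0.0, 1.0, size=900)
severe = rng.normal(5.0, 1.0, size=100)
losses = np.concatenate([benign, severe])

print(losses.mean(), empirical_cvar(losses, alpha=0.9))
```

Raising `alpha` focuses the estimate on rarer, more severe outcomes, which is why a learned risk level can trade caution against efficiency.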

2605.21256 2026-05-21 cs.CL

Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

Rodrigo Morales-Sánchez, Soto Montalvo, Raquel Martínez

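AI summary: This paper proposes a hybrid framework for identifying HIV suspicion in Spanish clinical notes, which decouples aleatoric and epistemic uncertainty to make triage more reliable.

Details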
Comments
Accepted at the BioNLP Workshop @ ACL 2026
English abstract

Standard clinical Natural Language Processing (NLP) benchmarks often yield inflated metrics by forcing deterministic classification on ambiguous instances, thereby obscuring the clinical risks of overconfident predictions. To bridge this gap, we propose a risk-aware hybrid selective classification framework, evaluated on early Human Immunodeficiency Virus suspicion identification in Spanish clinical notes. Our dual-verification approach explicitly decouples aleatoric uncertainty through Mondrian conformal prediction and epistemic uncertainty using a Multi-Centroid Mahalanobis Distance veto. Empirical evaluations reveal that standard uncertainty metrics and baseline classifiers are structurally insufficient for safe medical triage, suffering severe coverage collapse when forced to operate under strict reliability constraints. In contrast, by demanding that clinical narratives pass both probabilistic and geometric safeguards, the proposed framework successfully isolates a highly trustworthy operational domain.
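
The dual-verification idea can be sketched as class-conditional (Mondrian) conformal prediction followed by a distance-based veto. Everything specific here is an assumption: the nonconformity score (one minus the class probability), one centroid per class with a shared inverse covariance, the distance cutoff, and all function names. The paper's actual scores and centroids may differ.

```python
import numpy as np


def mondrian_thresholds(cal_probs, cal_labels, alpha=0.1):
    """Per-class conformal quantiles of the score 1 - p(true class)."""
    thresholds = {}
    for c in np.unique(cal_labels):
        scores = np.sort(1.0 - cal_probs[cal_labels == c, c])
        n = len(scores)
        # Small tolerance guards against floating-point round-up in the rank.
        rank = int(np.ceil((n + 1) * (1 - alpha) - 1e-12))
        thresholds[int(c)] = scores[rank - 1] if rank <= n else 1.0
    return thresholds


def prediction_set(probs, thresholds):
    """Classes whose nonconformity score is within the class threshold."""
    return [c for c, q in thresholds.items() if 1.0 - probs[c] <= q]


def mahalanobis_veto(x, centroids, cov_inv, max_dist):
    """True when x is far from every centroid (treated as out of domain)."""
    diffs = centroids - x
    d2 = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return bool(np.sqrt(d2.min()) > max_dist)


def triage(probs, x, thresholds, centroids, cov_inv, max_dist):
    """Return a label only when both safeguards pass; otherwise abstain (None)."""
    labels = prediction_set(probs, thresholds)
    if len(labels) != 1 or mahalanobis_veto(x, centroids, cov_inv, max_dist):
        return None
    return labels[0]
```

Abstaining on non-singleton prediction sets targets ambiguous inputs, while the distance veto targets inputs unlike the training data; requiring both is what shrinks coverage to a trusted region.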

2605.21244 2026-05-21 cs.CV

SR-Ground: Image Quality Grounding for Super-Resolved Content

Artem Borisov, Evgeney Bogatyrev, Khaled Abud, Dmitriy Vatolin

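AI summary: This paper presents SR-Ground, a dataset for fine-grained artifact segmentation in super-resolved images. Built through a large-scale crowdsourcing study, the high-quality dataset improves the performance of IQA models and helps reduce perceptible artifacts in super-resolution outputs.

Details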
English abstract

Super-Resolution (SR) has advanced rapidly in recent years, with diffusion-based models achieving unprecedented fidelity at the cost of introducing new types of visual artifacts. While existing Image Quality Assessment (IQA) methods provide holistic quality scores, they lack interpretability and fail to distinguish between different artifact types arising from modern SR approaches. To address this gap, we introduce SR-Ground, a large-scale dataset specifically designed for fine-grained artifact segmentation in super-resolved images. The dataset comprises images processed by a diverse set of state-of-the-art SR models, with pixel-level annotations for multiple artifact categories. We conduct a large-scale crowdsourcing study involving 1,062 participants to validate and refine automatically generated segmentations, resulting in a high-quality dataset of 63,000 images spanning 6 distinct artifact types. We demonstrate that training IQA models with grounding capabilities on SR-Ground significantly improves performance on downstream tasks. Furthermore, we introduce a fine-tuning pipeline that leverages our grounding model to reduce perceptible artifacts in SR outputs, showcasing the practical utility of our dataset.

2605.21242 2026-05-21 cs.RO

To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble

Haechan Mark Bong, Simon Roy, Euhid Aman, Giovanni Beltrame

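AI summary: This paper studies robot skill prediction. Using a synthetic dataset and fine-tuned sentence encoders, it proposes a small specialized model that outperforms large general-purpose LLMs run with a zero-shot prompt at fleet-level task routing.

Details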
Journal ref
ICRA 2026 Workshop on Synthetic Data for Robot Learning
English abstract

As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, underwater, and hands. Since labelled data that maps natural-language task descriptions to robots' physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.

2605.21241 2026-05-21 cs.LG

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor

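AI summary: This paper proposes the Di-COT framework, which contrasts informative substructures within a time window rather than individual timesteps, enabling self-supervised learning without data augmentation or multiple encoder passes. It achieves state-of-the-art performance on six large-scale real-world datasets and the UCR/UEA benchmarks while substantially reducing training time.

Details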
Comments
Published in the 43rd International Conference on Machine Learning (ICML 2026)
English abstract

Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps. Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions. To further improve scalability, we adopt a contrastive objective whose computation depends on the batch size and the number of sub-blocks, making loss computation independent of sequence length. Extensive experiments on six large-scale real-world datasets, as well as the UCR and UEA benchmarks, demonstrate that Di-COT learns semantically structured and transferable representations, achieving state-of-the-art performance on classification, clustering, $k$NN, and cross-dataset transfer, while substantially reducing training time. The source code is publicly available at https://github.com/sfi-norwai/Di-COT.

2605.21240 2026-05-21 cs.LG cs.AI

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

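AI summary: This paper proposes APEX, an autonomous policy exploration method for self-evolving LLM agents, which addresses exploration collapse by building and maintaining an explicit strategy space and performs strongly on multiple benchmarks.

Details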
English abstract

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, confirming APEX's effectiveness for sustained exploration in self-evolving agents.
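
A strategy map of milestones with prerequisite edges can be represented as a plain dictionary. The sketch below only shows how such a graph exposes the milestones that are currently reachable; the milestone names and the function `available_milestones` are hypothetical, and APEX's Fork Discovery and Policy Selection are not modeled.

```python
from typing import Dict, List, Set


def available_milestones(
    prerequisites: Dict[str, List[str]], completed: Set[str]
) -> List[str]:
    """Milestones not yet completed whose prerequisites are all completed."""
    return sorted(
        m
        for m, needs in prerequisites.items()
        if m not in completed and all(n in completed for n in needs)
    )


# Hypothetical text-adventure strategy map: milestone -> prerequisite milestones.
strategy_map = {
    "find lamp": [],
    "open trapdoor": [],
    "enter cellar": ["find lamp", "open trapdoor"],
    "take treasure": ["enter cellar"],
}

print(available_milestones(strategy_map, {"find lamp"}))  # ['open trapdoor']
```

Keeping the frontier explicit is what lets a planner choose between a familiar milestone and an unexplored one, instead of replaying whatever memory scores highest.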

2605.21237 2026-05-21 cs.CV cs.AI

RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

Xuan Yang, Xiaohan Yuan, Hao Li, Lingyu Chen, Yanan Liu, Lei Li

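AI summary: This paper proposes RePCM, which completes bi-ventricular mesh motion from a single frame and uses region specificity and phenotype adaptivity to improve the accuracy of cardiac motion synthesis in the presence of the regional and disease-specific differences caused by cardiovascular diseases.

Details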
Comments
Early Accepted by MICCAI 2026. This is the author's submitted version. 10 pages, 3 figures
英文摘要

Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases. Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic (ED) frame to infer a full-cycle sequence. Due to strong regional and disease-specific differences, traditional methods often oversmooth the data by relying on generative models that are optimized for global patterns. To address this problem, we propose Region-Aware and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis (RePCM) for single-frame bi-ventricular mesh motion completion. In Stage I, a reconstruction network learns vertex-wise motion descriptors, and clustering yields a data-driven functional partition, providing an explicit motion-derived region structure. In Stage II, a Region-Specific Injection Module enforces masked, synchronized region exchange within a conditional VAE, preserving localized, region-specific dynamics and restricting cross-region mixing. A Phenotype-Adaptive Mixture-of-Experts prior conditioned on ED shape uses anatomy-guided cues to model latent motion trends and capture inter-disease variability. Experiments on three datasets covering different cardiovascular diseases show consistent gains in geometric and functional metrics and improved preservation of region-specific dynamics.

2605.21227 2026-05-21 cs.CL

Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models

LLMs是否知道卢森堡语借词?探测低资源多语言模型中的词汇新词

Nina Hosseini-Kivanani

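AI summary: This paper uses the LexNeo-Bench benchmark to examine how multilingual models handle lexical borrowing identification in a low-resource setting. It finds that prompts built from a linguistic knowledge graph substantially raise borrowing classification accuracy, indicating that lexical resources can serve as structured context for LLM evaluation.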

详情
Comments
Accepted to Neollm, co-located with LREC 2026. Three figures and three tables.
AI中文摘要

大型语言模型(LLMs)越来越多地用于小接触语言的写作辅助,但不清楚它们是否尊重社区在词汇借词和新词方面的规范。我们引入LexNeo-Bench,一个包含3,050个实例的令牌级基准,来源于LuxBorrow,一个大规模的卢森堡语新闻语料库,其中目标令牌被标记为本土词或法语、德语或英语借词。使用此基准,我们对三种多语言LLM在34种提示设置下进行测试,涉及两种任务:借词类型分类和二元词汇创新代理(借词与本土词)。在无外部上下文的情况下,模型在借词分类上的表现仅略高于随机猜测,因此我们构建了一个语言知识图谱,编码了供体语言、形态模式和词汇类比,并将实例特定的子图注入提示中。知识图谱提示将借词分类准确率从25-35%提升到71-81%,大幅缩小了小模型和大模型之间的差距,同时使新词检测困难且对少样本设计敏感。我们的结果表明,词汇意识提示对低资源接触语言中的稳健借词判断非常有益,且词汇资源可以作为LLM评估的结构化上下文。本研究在ENEOLI COST行动中进行,并探讨了多语言卢森堡语数据中的借词作为词汇创新的形式。

英文摘要

Large language models (LLMs) are increasingly used for writing assistance in small contact languages, yet it is unclear whether they respect community norms around lexical borrowing and neology. We introduce LexNeo-Bench, a 3,050-instance token-level benchmark derived from LuxBorrow, a large-scale Luxembourgish news corpus, where target tokens are labelled as native or as French, German, or English borrowings. Using this benchmark, we probe three multilingual LLMs across 34 prompt settings on two tasks: borrowing type classification and a binary lexical-innovation proxy (borrowing versus native). Without external context, models perform only slightly above chance on borrowing classification, so we construct a linguistic knowledge graph that encodes donor language, morphological patterns, and lexical analogues, and inject instance-specific subgraphs into the prompt. Knowledge-graph prompts raise borrowing classification accuracy from 25-35% up to 71-81% and largely close the gap between small and large models, while leaving neology detection difficult and sensitive to few-shot design. Our results show that lexicon-aware prompting is highly beneficial for robust borrowing judgments in low-resource contact languages and that lexical resources can serve as structured context for LLM evaluation. This study was carried out within the ENEOLI COST Action and examines borrowing as a form of lexical innovation in multilingual Luxembourgish data.

2605.21226 2026-05-21 cs.LG cs.AI

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

OCTOPUS: 通过在最优平方误差量化下的八面体参数化优化Transformer的KV缓存

Mark Boss, Vikram Voleti, Simon Donné, Shimon Vainer

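AI summary: OCTOPUS optimizes the Transformer KV cache by jointly quantizing triplets of rotated coordinates. An octahedral parameterization maps each triplet's direction to a square, and Lloyd-Max quantization yields a non-uniform bit allocation. Across text, video, and audio, it matches or beats prior rotation codecs.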

详情
AI中文摘要

键值(KV)缓存是长上下文自回归推理中内存带宽和内存占用的主要瓶颈。最近的旋转预条件编解码器(TurboQuant, PolarQuant)表明,先进行结构化随机旋转,再使用与解析可处理的边际分布相匹配的逐坐标标量量化器,是近乎最优的KV压缩方案。OCTOPUS通过联合量化旋转后的坐标三元组进一步推进了这一范式。每个三元组的方向通过八面体参数化映射到正方形,得到的两个坐标和三元组范数再针对与实现匹配的边际分布进行Lloyd-Max量化。优化每个三元组的平方误差可得到严格非均匀的位分配,且仅依赖于键的总维度。我们通过扫描发现,有限维情形下的质量最优解在我们测试的每个真实解码器上都保持不变。该编解码器与数据无关、可在线运行,并且在给定种子时是确定性的。在文本、视频和音频上,OCTOPUS在每个报告的比特宽度和指标上都匹配或超越了所有先前的旋转编解码器,且比特数越低优势越大。此外,融合的Triton实现可以在不物化未压缩键的情况下即时重建键,因此该编解码器相对于现有反量化不会增加解码时的带宽或延迟。项目页面:https://octopus-quant.github.io/

英文摘要

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation depending only on the total dimensionality of the keys. Using sweeps, we find that the finite-dimensional quality optimum is constant across every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as the bit width drops toward extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
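The octahedral parameterization named in the abstract can be illustrated with the standard octahedral mapping from graphics, which sends a unit 3-vector to a point in the square [-1, 1]^2 and back. The sketch below is only that generic mapping plus a split into (direction, norm); the function names are hypothetical, and the rotation, the Lloyd-Max quantizers, and the bit allocation described by the authors are not reproduced here.

```python
import math


def _sign(x):
    # Treat zero as positive so points on the fold are handled consistently.
    return 1.0 if x >= 0.0 else -1.0


def oct_encode(d):
    """Map a nonzero direction (x, y, z) to a point in the square [-1, 1]^2."""
    x, y, z = d
    s = abs(x) + abs(y) + abs(z)          # project onto the octahedron |x|+|y|+|z| = 1
    px, py = x / s, y / s
    if z < 0.0:                           # fold the lower hemisphere outward
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return px, py


def oct_decode(p):
    """Inverse of oct_encode: recover the unit direction from a square point."""
    px, py = p
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:                           # undo the fold
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    n = math.sqrt(px * px + py * py + z * z)
    return px / n, py / n, z / n


def encode_triplet(v):
    """Split a coordinate triplet into two square coordinates and its norm."""
    r = math.sqrt(sum(c * c for c in v))
    if r == 0.0:
        return 0.0, 0.0, 0.0
    u, w = oct_encode(tuple(c / r for c in v))
    return u, w, r


def decode_triplet(u, w, r):
    """Rebuild the triplet from its square coordinates and norm."""
    return tuple(r * c for c in oct_decode((u, w)))
```

In a codec of this kind, the three scalars returned by `encode_triplet` would be the quantities handed to scalar quantizers; how many bits each receives is the part the abstract attributes to the per-triplet squared-error optimization.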

2605.21225 2026-05-21 cs.LG cs.AI

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

PREFINE: 基于偏好的隐式奖励和成本微调以实现安全对齐

Richa Verma, Bavish Kulur, Sanjay Chawla, Balaraman Ravindran

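AI summary: This work proposes PREFINE, which fine-tunes a pre-trained reinforcement learning policy with preference-based implicit reward and cost signals, aligning it for safety in continuous control environments so that it produces low-cost behavior while retaining high reward.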

详情
Comments
Accepted at AAMAS 2026 as a full paper
AI中文摘要

我们研究如何在不从头重新训练的情况下,通过引入成本约束使预训练的强化学习(RL)策略具备安全意识。虽然成本可以用数值编码,但我们考虑成本以偏好形式给出的更一般情形。给定一个奖励优化的策略和一个由偏好(低成本)与非偏好(高成本)轨迹组成的小数据集,我们的目标是微调策略以生成低成本行为,同时保留高奖励。与语言模型中的标准RLHF(偏好定义在同一提示的不同回复上)不同,我们的设置涉及连续控制环境中的轨迹级偏好。我们提出PREFINE(Preference-based Implicit Reward and Cost Fine-Tuning for Safety Alignment),一种基于偏好的微调方法,将目前广泛用于LLM微调的直接偏好优化(DPO)适配到序列决策设置中。PREFINE构造由策略采样的反事实轨迹以建立有意义的偏好对比,并联合优化奖励保留和安全对齐。实证上,PREFINE将约束违反和灾难性故障减少了60%以上,同时保持原有的奖励行为。与完整的离线RL或模仿学习相比,PREFINE得到的策略以显著更高的数据和计算效率实现了低成本、高奖励的性能,在连续域中连接了偏好对齐与安全策略适应。

英文摘要

We address the problem of making a pre-trained reinforcement learning (RL) policy safety-aware by incorporating cost constraints without retraining it from scratch. While costs could be numerically encoded, we consider the more general setting in which costs are provided as preferences. Given a reward-optimized policy and a small dataset of preferred (low-cost) and dispreferred (high-cost) trajectories, our goal is to fine-tune the policy to generate low-cost behaviors while retaining high rewards. Unlike standard RLHF in language models, where preferences are defined over responses to the same prompt, our setting involves trajectory-level preferences in continuous control environments. We introduce PREFINE (Preference-based Implicit Reward and Cost Fine-Tuning for Safety Alignment), a preference-based fine-tuning method that adapts Direct Preference Optimization (DPO), now widely used for LLM fine-tuning, to the sequential decision-making setting. PREFINE constructs policy-sampled counterfactual trajectories to establish meaningful preference contrasts and jointly optimizes for reward retention and safety alignment. Empirically, PREFINE reduces constraint violations and catastrophic failures by over 60% while maintaining original reward behavior. PREFINE produces policies that achieve low-cost, high-reward performance with significantly improved data and computational efficiency compared to full offline RL or imitation learning, bridging preference alignment and safe policy adaptation in continuous domains.
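For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss applied to one pair of trajectories, assuming a trajectory's log-probability is the sum of its per-step action log-probabilities. It is not the PREFINE objective: the counterfactual trajectory construction and the reward-retention term described in the abstract are omitted, and the function names and the `beta` default are illustrative assumptions.

```python
import math


def trajectory_logp(step_logps):
    """Log-probability of a trajectory as the sum of per-step action log-probs."""
    return sum(step_logps)


def dpo_loss(logp_pref, logp_disp, ref_logp_pref, ref_logp_disp, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.

    logp_*     : log-probabilities under the policy being fine-tuned
    ref_logp_* : log-probabilities under the frozen reference policy
    beta       : strength of the implicit KL regularization (assumed value)
    """
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_disp - ref_logp_disp))
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the fine-tuned policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy shifts probability toward the preferred (low-cost) trajectory relative to the reference.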

2605.21207 2026-05-21 cs.CV

PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

PGC:用于通用人工智能生成图像检测的峰值引导校准

Xiaoyu Zhou, Jianwei Fei, Peipeng Yu, Jingchang Xie, Chong Cheng, Zhihua Xia

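AI summary: This paper proposes the PGC framework, which aggregates salient features through a peak-focusing mechanism to calibrate the global decision, improving detection of fine-grained discriminative signals and achieving state-of-the-art performance on the CommGen15 dataset.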

详情
AI中文摘要

生成式AI的快速发展,从GANs到现代扩散模型,导致了越来越微妙的判别线索。这些细粒度信号常常被主导的高保真图像内容(例如主体)所掩盖,限制了现有主要依赖全局表示的检测器的可靠性。为了解决这一挑战,我们提出了峰值引导校准(PGC)框架。PGC引入了一种新的策略,通过峰值聚焦机制聚合显著特征。具体而言,通过采用对峰值敏感的聚合方法,强调最判别性的局部线索,PGC利用这些关键信号来校准全局决策。这种方法恢复了在全局上下文中被淹没的细微模式。此外,为了更好地模拟现实世界威胁,我们引入了CommGen15数据集,一个包含15个商业模型样本的具有挑战性的基准。广泛实验表明,PGC在性能上达到最先进的水平。具体而言,它在我们的CommGen15数据集上将平均准确率提高了+12.3%,并在标准基准上设定了新纪录,包括GenImage(+2.1%)、AIGI(+3.5%)和UniversalFakeDetect(+1.7%)。代码可在https://github.com/xiaoyu6868/PGC上获得。

英文摘要

The rapid evolution of generative AI, from GANs to modern diffusion models, has resulted in increasingly subtle discriminative clues. These fine-grained signals are often overshadowed by dominant, high-fidelity image content (e.g., the main subject), limiting the reliability of existing detectors that predominantly rely on global representations. To address this challenge, we propose the Peak-Guided Calibration (PGC) framework. PGC introduces a novel strategy that aggregates salient features via a peak-focusing mechanism. Specifically, by employing a peak-sensitive aggregation that accentuates the most discriminative local clues, PGC leverages these critical signals to calibrate the global decision. This approach recovers subtle patterns that would otherwise be submerged in the global context. Furthermore, to better simulate real-world threats, we introduce the CommGen15 dataset, a challenging benchmark comprising samples from 15 commercial models. Extensive experiments demonstrate that PGC achieves state-of-the-art performance. Specifically, it improves mean accuracy by +12.3% on our CommGen15 dataset, and sets new records on standard benchmarks, including GenImage (+2.1%), AIGI (+3.5%), and UniversalFakeDetect (+1.7%). Code is available at https://github.com/xiaoyu6868/PGC.

2605.21195 2026-05-21 cs.CV

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

RankE:结合解码器协同进化的离散文本到图像生成端到端后训练

Siyong Jian, Siyuan Li, Luyuan Zhang, Zedong Wang, Xin Jin, Ying Li, Cheng Tan, Huan Wang

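AI summary: This paper proposes RankE, an end-to-end post-training framework that co-evolves the decoder and the policy to address the latent covariate shift caused by policy-only optimization in discrete autoregressive text-to-image generation, improving both image quality and alignment.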

详情
AI中文摘要

离散自回归(AR)文本到图像(T2I)模型将VQ分词器与自回归策略结合,当前的后训练流程仅优化策略而保持VQ解码器冻结。最近的扩散T2I工作(如REPA-E)表明VAE本身构成关键的对齐瓶颈,但离散AR模型尚无类似研究。我们证明仅优化策略会引发潜在协变量偏移:随着策略演化,生成的token分布偏离解码器训练时所用的真实分布,使得奖励分数提升而解码图像质量下降。为解决这一不匹配,我们提出RankE,首个用于离散T2I生成的端到端后训练框架。RankE不是针对固定解码器优化策略,而是通过交替优化使两者协同进化:每个模块最大化基于排序的对齐目标,同时由适合其参数空间的稳定性保持锚点进行正则化。这种协同进化打破了困扰冻结解码器方法的保真度-对齐度权衡:在LlamaGen-XL(775M)上,标准RL提高CLIP但使FID变差,而RankE同时改善两者(在MS-COCO 30K上FID为15.21,CLIP为33.76)。在Janus-Pro(1B)上的一致收益证实,解码器协同进化能可靠地将奖励优化转化为像素空间的质量提升。

英文摘要

Discrete autoregressive (AR) text-to-image (T2I) models pair a VQ tokenizer with an AR policy, and current post-training pipelines optimize only the policy while keeping the VQ decoder frozen. Recent diffusion T2I work, exemplified by REPA-E, has shown that the VAE itself constitutes a key alignment bottleneck, yet no analogous investigation exists for discrete AR models. We show that policy-only optimization induces Latent Covariate Shift: as the policy evolves, the resulting token distribution diverges from the ground-truth distribution on which the decoder was trained, such that reward scores improve while decoded image quality degrades. To address this mismatch, we propose RankE, the first end-to-end post-training framework for discrete T2I generation. Rather than optimizing the policy against a fixed decoder, RankE co-evolves both components through alternating optimization: each module maximizes a ranking-based alignment objective while being regularized by a stability-preserving anchor suited to its parameter space. This co-evolution breaks the fidelity-alignment trade-off that plagues frozen-decoder approaches: on LlamaGen-XL (775M), standard RL improves CLIP but degrades FID, whereas RankE improves both simultaneously (FID 15.21, CLIP 33.76 on MS-COCO 30K). Consistent gains on Janus-Pro (1B) confirm that decoder co-evolution reliably converts reward optimization into pixel-space quality improvements.

2605.21188 2026-05-21 cs.RO

A Terrain-Adaptive epsilon-Constraint MPC for Uneven Terrain Kinodynamic Planning

一种适应地形的epsilon约束MPC用于不规则地形运动动力学规划

Otobong Jerome, Geesara Kalathunga, Tiago Nascimento

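AI summary: This paper proposes a terrain-adaptive epsilon-constraint MPC method for planning on uneven terrain, where a vehicle must optimize path efficiency and pose stability at the same time. The epsilon bounds are adjusted dynamically to explore the Pareto front in real time, and a semi-parametric model combining analytical vehicle dynamics with a sparse Gaussian process captures vehicle-terrain dynamics.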

详情
AI中文摘要

对于车辆在不规则地形上的运动动力学规划,需要同时优化竞争性目标,如路径效率和姿态稳定性。本文提出了一种集成到模型预测控制(MPC)框架中的自适应epsilon约束方法,其中epsilon界限根据地形描述符动态调整,以实时探索帕累托前沿。为了捕捉车辆-地形动力学,我们开发了一种半参数模型,结合分析车辆动力学和在相同地形描述符上训练的稀疏高斯过程(SGP)。所提出的epsilon-MPC在MPPI和GAKD基准上进行了评估,实现了94%的导航成功率,同时将最大方向偏移减少24%,并提高了多目标权衡质量23%。

英文摘要

Kinodynamic planning for car-like vehicles on uneven terrain requires simultaneously optimizing competing objectives such as path efficiency and pose stability. This work presents an adaptive epsilon-constraint method integrated into a Model Predictive Control (MPC) framework, where the epsilon bounds are dynamically adjusted based on terrain descriptors to explore the Pareto front in real time. To capture vehicle-terrain dynamics, we develop a semi-parametric model combining analytical vehicle dynamics with a Sparse Gaussian Process (SGP) trained on the same terrain descriptors. The proposed epsilon-MPC is evaluated against MPPI and GAKD baselines, achieving a 94% navigation success rate while reducing maximum orientation deviation by 24% and improving multi-objective trade-off quality by 23%.

2605.21186 2026-05-21 cs.CV cs.AI

SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

SAM-Sode:迈向微小细菌检测的可信解释

Wanying Tan, Shuo Yan, Dazhi Huang, Yazheng Liu, Zili Shao, Rufeng Chen, Hechang Chen, Mude Shi, Tianxing Ji, Sihong Xie

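AI summary: This paper proposes the SAM-Sode framework, which uses geometry-aware prompts and a dual-constraint mechanism to improve the interpretability of tiny bacteria detection, suppressing background redundancy and making decisions more transparent.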

详情
Comments
10 pages, 4 figures, conference paper
AI中文摘要

对象检测的可解释性为临床辅助诊断提供了关键的信心支持。然而,在微小细菌检测中,传统解释方法由于目标形态特征的极端稀疏性和复杂背景的严重干扰,常面临前景边界模糊和特征归因扩散的问题。这种限制阻碍了逻辑连贯的形态证据的提供。为解决这一问题,我们提出了一种新颖的可解释人工智能(XAI)框架SAM-Sode。该框架创新性地将初始特征归因图转换为几何感知提示,利用基础模型(SAM3)的先验知识实现空间细化和形态重建。此外,我们引入基于物理意义和几何对齐的双约束机制,进行实例级去噪,生成更符合人类专家直觉的解释。在我们自行构建的具有复杂电路背景的细菌数据集(包含2,524张图像)及其他公开数据集上的实验结果表明,所提出的方法有效抑制了背景冗余,并显著增强了微小物体检测的决策透明度。

英文摘要

Interpretability in object detection provides crucial confidence support for clinical auxiliary diagnosis. However, in tiny bacteria detection, traditional explanation methods often suffer from blurred foreground boundaries and diffuse feature attribution due to the extreme sparsity of target morphological features and severe interference from complex backgrounds. Such limitations hinder the provision of logically coherent morphological evidence. To bridge this gap, we propose a novel eXplainable AI (XAI) framework, SAM-Sode. The framework innovatively transforms initial feature attribution maps into geometry-aware prompts, leveraging the prior knowledge of the foundation model (SAM3) to achieve spatial refinement and morphological reconstruction of the explanatory mappings. Furthermore, we introduce a dual-constraint mechanism based on physical significance and geometric alignment to perform instance-level denoising, generating coherent explanations that better align with human expert intuition. Experimental results on our self-constructed bacteria dataset with complex circuit backgrounds (containing 2,524 images) and other public datasets demonstrate that the proposed method effectively suppresses background redundancy and significantly enhances the decision-making transparency of tiny object detection.

2605.21180 2026-05-21 cs.LG cs.SE

Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

基于密集奖励的领域可适应强化学习代码生成

Erfan Aghadavoodi Jolfaei, Daniel Maninger, Abhinav Anand, Mert Tiftikci, Mira Mezini

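AI summary: This study proposes a domain-adaptable reinforcement learning framework for improving the correctness, quality, and security of generated code. A customizable execution-aware reward formula and a token-level reward mapping mechanism improve adaptation to different domains and the executability of the generated code.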

详情
Comments
10 pages, 2 figures, under review
AI中文摘要

大型语言模型在自动化代码生成中显示出强大的潜力,但缺乏正确性、质量和安全性的保证,特别是在领域特定约束方面。例如在机器人领域,代码生成越来越多地用于规划和执行动作,环境意识和物理约束至关重要。为了促进代码生成LLM适应多样化需求,包括领域特定需求,我们提出了一种强化学习框架,通过近端策略优化微调预训练LLM。我们的可定制执行感知奖励公式捕捉并优化语法、功能正确性、代码风格、安全性和模拟器可执行性。一个令牌级奖励映射机制使从执行结果到生成令牌的有效信用分配成为可能。该框架在通用代码生成(MBPP/MBPP+)和机器人程序合成(RoboEval)上进行了评估。结果表明,在功能正确性和模拟器可执行性方面有显著改进,包括在MBPP上的pass@1绝对增加19%,在RoboEval上的执行失败减少51%。这些发现表明,结构化的强化学习可以有效地将语言模型对齐到正确的程序生成和领域特定需求。

英文摘要

Large language models show strong potential for automated code generation, but lack guarantees for correctness, quality, safety, and domain-specific constraints. For instance, in robotics, where code generation is increasingly used for planning and executing actions, awareness of the environment and physical constraints is critical. To facilitate the adaptation of code-generating LLMs to diverse requirements, including domain-specific ones, we present a reinforcement learning framework that fine-tunes pre-trained LLMs using proximal policy optimization. Our customizable execution-aware reward formula captures and optimizes syntax, functional correctness, code style, security, and simulator executability. A token-level reward mapping mechanism enables effective credit assignment from execution outcomes to generated tokens. The framework is evaluated on general-purpose code generation (MBPP/MBPP+) and robotic program synthesis (RoboEval). The results show substantial improvements in functional correctness and simulator executability, including an absolute pass@1 increase of 19% on MBPP and a reduction in execution failures by 51% on RoboEval. These findings demonstrate that structured reinforcement learning can effectively align language models to correct program generation and domain-specific requirements.

2605.21178 2026-05-21 cs.CL

Metaphors in Literary Post-Editing: Opening Pandora's Box?

文学后编辑中的隐喻:打开潘多拉魔盒?

Aletta G. Dorst, Mayra O. Nas, Katinka Zeven

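AI summary: This paper studies how post-editors of literary texts respond to the way neural machine translation and large language models translate metaphors. One in three metaphors was changed by the post-editors, indicating that metaphor translation is problematic in literary machine translation, and the post-editors reported that post-editing took more effort than translating from scratch.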

详情
Comments
This paper has been accepted for presentation at the EAMT Conference 2026, which will take place in Tilburg from June 15 to 18, 2026
AI中文摘要

本文探讨了文学文本后编辑者对神经机器翻译和大型语言模型翻译隐喻的反应和回应。研究结果表明,输出中三分之一的隐喻被后编辑者修改,证明了文学机器翻译(LitMT)中隐喻翻译确实存在问题。回应表明,后编辑者意识到过于直译的翻译,尽管大多针对多词表达。有时他们难以判断解决方案是否可接受。他们对MT输出的整体质量评价较差,并表示后编辑工作比从头翻译更加费力。这支持了先前研究的观点,即后编辑限制了翻译者的创造力并削弱了他们对文本的所有权感。

英文摘要

This paper investigates how post-editors of literary texts react and respond to the way metaphors have been translated by Neural Machine Translation (NMT) and Large Language Models (LLMs). The results show that one in three metaphors in the output were changed by the post-editors, demonstrating that the translation of figurative language is indeed problematic in literary MT (LitMT). The responses indicate that the post-editors were aware of overly literal translations, though mostly for multiword expressions. Moreover, at times they found it difficult to determine whether solutions were acceptable. They rated the overall quality of the MT output as quite poor and stated that the post-editing was more work and more effort than it would have been translating from scratch. This supports previous studies arguing that post-editing constrains translators in their creativity and diminishes their sense of text ownership.

2605.21177 2026-05-21 cs.LG cs.CL

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

ChunkFT: 用于内存高效全微调的字节流式优化

Yongkang Liu, Zijing Wang, Mengjie Zhao, Ercong Nie, Mingyang Wang, Qian Li, Feiliang Ren, Shi Feng, Daling Wang, Hinrich Schütze

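AI summary: This paper proposes ChunkFT, a framework that reformulates full-parameter fine-tuning around a dynamically activated working set, enabling gradient computation for arbitrary sub-tensors without modifying the network architecture. Theoretical analysis and experiments indicate that it is effective in memory usage, running time, and optimization quality, and that it outperforms existing memory-efficient baselines on downstream tasks.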

详情
AI中文摘要

本文提出了ChunkFT,一种内存高效的微调框架,其通过动态激活的工作集重新定义全参数微调。ChunkFT能够在不修改网络架构的情况下,对任意子张量进行梯度计算,为优化任意子网络提供了算法基础,同时避免了标准密集梯度计算。在确定性设置下,我们提供了ChunkFT的理论收敛分析。实验中,我们使用单块RTX 4090-24GB GPU和两块H800-80GB GPU分别对Llama 3-8B和Llama 3-70B进行微调。一个7B模型在1K输入长度下的全参数微调仅需13.72GB的GPU内存。结果表明,ChunkFT在内存使用、运行时间和优化质量上均有效。此外,在语言理解、数学推理和MT-Bench等下游任务中,ChunkFT在性能上一致优于现有内存高效的基线。值得注意的是,ChunkFT在某些情况下甚至超过了全参数微调的性能。我们的代码库可在https://github.com/misonsky/chunk上找到。

英文摘要

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is available at https://github.com/misonsky/chunk.

2605.21171 2026-05-21 cs.CV

FTerViT: Fully Ternary Vision Transformer

FTerViT:全三进制视觉变换器

Szymon Ruciński, Pietro Bonazzi, Engin Türetken, Simon Narduzzi, Michele Magno, Nadim Maamari

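AI summary: This paper proposes a fully ternary Vision Transformer (FTerViT) that ternarizes all weight matrices and normalization parameters, compressing the model and enabling efficient deployment on resource-constrained microcontrollers.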

详情
Comments
Preprint
AI中文摘要

三进制视觉变换器(Ternary Vision Transformers)提供了显著的模型压缩,但目前最先进的方法仅将编码器层三进制化,而留下的补丁嵌入、归一化参数和分类头仍保持全精度。在针对资源受限处理器(如微控制器)的紧凑模型中,这些剩余的全精度组件决定了总内存占用,严重限制了部署效率和设备可行性。在本工作中,我们引入了一种完全三进制化的视觉变换器,其中所有权重矩阵和归一化参数均被三进制化(FTerViT)。为此,我们引入了两个新的操作符:具有通道缩放的三进制位卷积(TernaryBitConv2d)用于补丁嵌入,以及三进制归一化(TernaryLayerNorm)。FTerViT通过知识蒸馏进行训练,随后进行轻量级量化感知恢复阶段。我们的三进制W2A8 DeiT-III-S在384×384分辨率下达到82.43%的ImageNet-1K Top-1精度,内存占用为6.09MB(约15倍压缩,相比FP32降低2.42个点),优于先前的三进制ViT方法多达8个点。最后,我们展示了在ESP32-S3系统芯片上的双核XTensa LX7微控制器上首次实现三进制视觉变换器。通过部署FTerViT-Small(基于224×224分辨率的DeiT-III-Small,内存占用5.81MB),我们实现了79.64%的ImageNet-1K Top-1精度。

英文摘要

Ternary Vision Transformers offer substantial model compression, but state-of-the-art methods only ternarize the encoder layers, leaving patch embeddings, LayerNorm parameters, and classifier heads in full precision. In compact models targeting resource-constrained processors, such as microcontrollers, these remaining full-precision components determine the total memory footprint, severely limiting deployment efficiency and on-device feasibility. In this work, we introduce a fully ternarized Vision Transformer in which all weight matrices and normalization parameters are ternarized (FTerViT). To this end, we introduce two novel operators: TernaryBitConv2d with per-channel scaling for patch embedding and TernaryLayerNorm. FTerViT is trained using knowledge distillation, followed by a lightweight quantization-aware recovery phase. Our ternary W2A8 DeiT-III-S at 384×384 resolution achieves 82.43% ImageNet-1K top-1 at 6.09 MB (~15× compression, -2.42 pp vs. FP32), outperforming prior ternary ViT methods by up to 8 pp. Finally, we demonstrate the first implementation of ternary vision transformers on a dual-core Xtensa LX7 microcontroller inside the ESP32-S3 system-on-chip. By deploying FTerViT-Small (based on DeiT-III-Small at 224×224 resolution, 5.81 MB), we achieve 79.64% ImageNet-1K top-1 accuracy.
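As a rough illustration of what ternarizing a weight vector with a per-channel scale can look like, the sketch below uses absmean scaling followed by rounding and clipping to {-1, 0, +1}. This particular rule is an assumption borrowed from common ternary-quantization practice; the abstract does not specify how TernaryBitConv2d or TernaryLayerNorm compute their codes, and the function names here are hypothetical.

```python
def ternarize(weights):
    """Quantize one channel of weights to codes in {-1, 0, +1} plus a scale.

    Assumed rule (not taken from the paper): scale = mean(|w|), then
    round(w / scale) clipped to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the channel scale."""
    return [scale * c for c in codes]
```

Each ternary code fits in 2 bits, so replacing 32-bit weights gives at most a 16× reduction before the per-channel scales and any remaining metadata are counted, which is in line with the roughly 15× figure quoted in the abstract.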

2605.21164 2026-05-21 cs.LG quant-ph

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Q-SYNTH:混合量子-经典对抗增强用于不平衡欺诈检测

Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai

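AI summary: This paper proposes Q-SYNTH, a hybrid quantum-classical adversarial framework that generates minority-class samples for imbalanced fraud detection, using a quantum circuit generator and a classical neural network discriminator and aiming to improve fraud-class recall and F1-score.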

详情
Comments
13 pages, 6 figures
AI中文摘要

信用卡欺诈检测受到极端类别不平衡的挑战,其中欺诈交易稀少但操作上至关重要。这种不平衡通常使监督学习器偏向合法类别,导致整体准确率高但欺诈类召回率和F1分数较弱。本文介绍了Q-SYNTH,一种混合经典-量子生成对抗框架,其中参数化量子电路作为生成器,经典神经网络作为判别器。Q-SYNTH旨在表数据中生成少数类欺诈样本,并从两个维度进行评估:生成样本与真实欺诈样本的统计保真度以及下游欺诈检测性能。为此,生成的样本通过基于Kolmogorov-Smirnov统计和Wasserstein距离的分布相似性度量进行评估,通过AUC-ROC衡量真实与合成的可检测性,并在量子和经典分类器上评估下游分类性能。在报告的协议下,Q-SYNTH在与经典GAN基线相比减少了边缘分布不匹配,同时保持了具有竞争力的下游欺诈检测性能。尽管SMOTE在特征相似性方面最强,而经典GAN在某些设置中达到最高的下游性能,Q-SYNTH在分布保真度和下游性能之间提供了良好的权衡,支持了混合量子增强在不平衡欺诈检测中的可行性。

英文摘要

Credit card fraud detection is fundamentally challenged by extreme class imbalance, where fraudulent transactions are rare yet operationally critical. This imbalance often biases supervised learners toward the legitimate class, leading to high overall accuracy but weaker fraud-class recall and F1-score. This paper introduces Q-SYNTH, a hybrid classical-quantum generative adversarial framework in which a parameterized quantum circuit serves as the generator and a classical neural network serves as the discriminator. Q-SYNTH is designed for minority-class fraud synthesis in tabular data and is evaluated along two dimensions: statistical fidelity to real fraud samples and downstream performance for fraud detection. To this end, generated samples are assessed using distributional similarity measures based on Kolmogorov-Smirnov statistics and Wasserstein distances, real-vs-synthetic detectability measured by AUC-ROC, and downstream classification performance across both quantum and classical classifiers. Under the reported protocol, Q-SYNTH reduces marginal distribution mismatch relative to a classical GAN baseline while maintaining competitive downstream fraud-detection performance. Although SMOTE achieves the strongest feature-wise similarity and the classical GAN attains the highest downstream performance in several settings, Q-SYNTH offers a favorable compromise between distributional fidelity and downstream performance, supporting the feasibility of hybrid quantum augmentation for imbalanced fraud detection.
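The two marginal-fidelity measures mentioned in the abstract can be computed per feature from empirical CDFs. The sketch below shows the two-sample Kolmogorov-Smirnov statistic and the 1-D Wasserstein-1 distance for a single feature column; the function names are hypothetical, and how the paper aggregates these values across features is not stated in the abstract.

```python
from bisect import bisect_right


def _ecdf(sorted_xs, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect_right(sorted_xs, x) / len(sorted_xs)


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)


def wasserstein_1d(real, synthetic):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    a, b = sorted(real), sorted(synthetic)
    points = sorted(a + b)
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        # Both CDFs are constant on [lo, hi), so the area is gap * width.
        total += abs(_ecdf(a, lo) - _ecdf(b, lo)) * (hi - lo)
    return total
```

Both functions return 0 for identical samples; shifting every value of a sample by a constant changes the Wasserstein distance by exactly that constant, which makes it a convenient sanity check.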

2605.21160 2026-05-21 cs.LG

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

通过反向生成数据和引导强化学习来学习首次积分

Jingfeng Zhong, Zhengxiang Liu, Zhijie Wang, Shuai Li

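AI summary: This paper proposes FISolver, an LLM-based solver that addresses data scarcity in first-integral discovery through backward-generated data and guided reinforcement learning, and that significantly outperforms other methods on challenging benchmarks.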

详情
Comments
17 pages, 2 figures, 3 tables
AI中文摘要

发现第一积分对理解动力系统中的守恒律具有根本科学意义。然而,现有的符号计算工具和大语言模型在这一任务上仍然有限,因为高质量的训练数据稀缺,且成功的解决方案往往依赖于数学直觉。本文提出了FISolver,一种旨在解决这一挑战的基于LLM的求解器。首先,我们介绍了一种

英文摘要

The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.
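A Levenshtein-distance-based shaped reward, as mentioned in the abstract, can be sketched as a normalized edit-distance similarity between the predicted expression string and a reference one. The normalization used below (one minus distance over the longer length) and the function names are assumptions for illustration; the abstract does not give the exact reward formula.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def shaped_reward(predicted, reference):
    """Reward in [0, 1]: 1 for an exact match, lower as edit distance grows."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest
```

A reward of this shape gives partial credit to near-miss expressions, which yields a denser learning signal than a binary correct/incorrect check; string similarity does not imply mathematical equivalence, so it would typically complement rather than replace a symbolic verification step.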