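arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.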

2605.24912 2026-05-26 cs.LG cs.AI q-bio.OT

Explainable Retinal Imaging for Prediction of Multi-Organ Dysfunction in Type 2 Diabetes

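Mini Han Wang, Liting Huang, Wei Hong, Boonthawan Wingwon

AI summary: This study builds system-level abnormality indices from routine laboratory biomarkers, predicts multi-system dysregulation in type 2 diabetes with a gradient boosting model, and uses SHAP for interpretability, identifying hyperglycaemia, renal impairment, dyslipidaemia, and inflammation as the main drivers.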

Comments 15 pages, 8 figures

Abstract

Background: Type 2 diabetes mellitus (T2DM) is increasingly recognised as a systemic disease characterised by coordinated dysfunction across metabolic, renal, lipid, and inflammatory pathways. Existing clinical assessments often fail to capture this multi-dimensional burden. Methods: We conducted a retrospective study of 1,195 patients using routinely collected laboratory biomarkers. System-level abnormality indices were constructed to quantify organ-specific dysfunction, and multi-system involvement was defined as abnormalities in two or more systems. Supervised machine learning models, including logistic regression, random forest, and gradient boosting, were trained to predict multi-system dysregulation. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). Results: The gradient boosting model demonstrated near-perfect discrimination (AUC = 1.000), significantly outperforming logistic regression (AUC = 0.925). Feature attribution analysis revealed that hyperglycaemia, renal impairment, dyslipidaemia, and inflammation were the dominant drivers of multi-system risk. Dose-response relationships observed in partial dependence analyses further supported the biological plausibility of model predictions. Conclusion: This study presents an interpretable, data-driven framework for quantifying systemic disease burden in T2DM. By linking routine biomarkers to multi-organ dysfunction, our approach provides both predictive accuracy and mechanistic insight, offering potential for improved risk stratification and precision medicine in diabetes care. The data and code used in this study are openly available on GitHub at: https://github.com/MiniHanWang/Type-2-Diabetes-1.git
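
The outcome definition in the Methods (multi-system involvement means abnormalities in two or more systems) can be sketched as follows. This is a minimal illustration, not the authors' code: the biomarker names and cut-offs in `SYSTEM_RULES` are hypothetical placeholders, since the abstract does not state the actual indices or thresholds.

```python
# Hypothetical biomarker names and cut-offs, for illustration only;
# the study's actual indices and thresholds are not given in the abstract.
SYSTEM_RULES = {
    "metabolic": lambda r: r["hba1c_pct"] >= 7.0,
    "renal": lambda r: r["egfr"] < 60.0,
    "lipid": lambda r: r["ldl_mmol_l"] >= 3.4,
    "inflammatory": lambda r: r["crp_mg_l"] >= 3.0,
}


def abnormal_systems(record):
    """Return the names of the systems flagged abnormal for one patient record."""
    return [name for name, rule in SYSTEM_RULES.items() if rule(record)]


def multi_system_involvement(record, min_systems=2):
    """Return 1 when at least `min_systems` systems are abnormal, else 0."""
    return int(len(abnormal_systems(record)) >= min_systems)


patient = {"hba1c_pct": 8.1, "egfr": 52.0, "ldl_mmol_l": 2.9, "crp_mg_l": 1.2}
print(abnormal_systems(patient), multi_system_involvement(patient))
```

A binary label built this way is what the supervised models in the abstract would be trained to predict.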

2605.24911 2026-05-26 cs.LG cs.AI

Factorize to Generalize: Retrieval-Guided Invariant-Dynamic Decomposition for Time Series Forecasting

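Jinjin Chi, Lei Feng, Lulu Zhang, Yongcheng Jing, Yiming Wang, Ximing Li, Jialie Shen, Leszek Rutkowski, Dacheng Tao

AI summary: Proposes a retrieval-guided invariant-dynamic decomposition framework that separates stable shared structure from instance-specific variation to make zero-shot time series forecasting more robust under distribution shift.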

Abstract

Time series foundation models (TSFMs) have recently achieved strong zero-shot forecasting performance through large-scale pretraining and retrieval-augmented prediction. However, our empirical analysis reveals a non-trivial limitation of retrieval-based forecasting: retrieval tends to induce more oscillatory predictions, improving performance on highly fluctuating series while degrading accuracy on smoother, trend-dominated ones. This suggests that retrieved information may be fused into prediction without explicitly distinguishing stable temporal structure from instance-specific variations, which can reduce robustness under distribution shifts. We propose a Retrieval-guided Invariant-Dynamic DEcomposition framework for time series forecasting. Rather than using retrieval as auxiliary predictive context, we leverage retrieved sequences as implicit samples from related environments to guide representation decomposition. Specifically, we first construct a retrieval-aware representation via attention-based aggregation, and then introduce a retrieval-guided routing mechanism to decompose it into an invariant component capturing stable shared structure and a dynamic component modeling context-dependent variations. These two components are forecast separately and fused for final prediction, enabling the model to preserve transferable patterns while remaining adaptive to evolving dynamics. We further design training objectives that encourage invariant learning and disentanglement, and provide theoretical insight showing that retrieval aggregation reduces variance and approximates invariant representation learning without explicit environment supervision. Extensive experiments demonstrate that our method consistently improves robustness under distribution shifts and outperforms existing TSFMs and retrieval-based baselines in zero-shot forecasting settings.

2605.24910 2026-05-26 cs.AI cs.CE

Noise-Robust Financial Numerical Entity Attribute Tagging

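Hsin-Min Lu, Chen-Yang Lai, Yi-Jhen Li, Ju-Chun Yen

AI summary: To address label noise and incomplete attribute coverage in financial numerical entity tagging, proposes NORA, which uses task-aware instance weighting and neighborhood prior-adjusted KNN filtering to achieve robust multi-attribute prediction on a benchmark of 6.6 million instances.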

Abstract

Financial Numerical Entity (FNE) understanding aims to recover the meaning of numerical mentions in financial reports. Existing studies primarily focus on concept name prediction and face two important limitations. First, labels derived from inline XBRL may contain errors because filings are usually prepared manually. Second, other important FNE attributes, such as reporting-time relation, measurement scale, and accounting sign, are less emphasized. We propose NOise-Robust Tagging for Rich Financial Numerical Entity Attributes (NORA) to address these gaps. NORA uses task-aware instance-specific weighting to attenuate the influence of noisy labels during training, and we further propose the Neighborhood Prior-adjusted KNN (NPK) filtering method for more reliable evaluation on real-world noisy test sets. In addition, we construct a large-scale benchmark containing 6.6 million instances with multi-attribute labels and filing metadata. Experiments show that NORA performs strongly compared with state-of-the-art noisy-label baselines, including Co-teaching, Mixup, SSR, and SelfMix. Moreover, NORA is robust under both unfiltered and noise-filtered test settings. It achieves the best Accuracy, Macro F1, and Weighted F1 for concept name and time-relation prediction, while remaining competitive on scale and sign prediction. These results demonstrate the value of jointly modeling rich FNE attributes while accounting for label noise in real-world financial filings.

2605.24908 2026-05-26 cs.LG cs.AI

On the Impact of Class Imbalance on the Learning Dynamics of Deep Neural Networks: An Intuitive Insight

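Ismail B. Mustapha, Shafaatunnur Hasan, Sunday O. Olatunji, Hatem S. Y. Nabus

AI summary: By monitoring how deep neural networks learn the majority and minority classes at different imbalance ratios, the study systematically examines how class imbalance drives a model to underfit the minority class early in training while learning only the majority class, so that the minority representations it eventually learns are overfitted rather than generalizable.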

Comments Conference

Abstract

Class imbalance in deep neural networks (DNNs) has attracted rapidly growing research attention in recent years. However, the varying accounts in the pertinent literature of why DNNs perform poorly on imbalanced data show that little is known about how this age-old phenomenon impacts the performance of DNNs. A better understanding of this problem is crucial to developing effective DNN-based imbalance methods. Thus, this study systematically investigates the impact of class imbalance on the learning dynamics of DNNs by monitoring the learning patterns of DNN models on both the majority and minority classes of datasets with varying imbalance ratios. Experimental findings show that, in contrast to learning from balanced datasets, where a DNN learns the classes similarly, class imbalance has a severely deteriorating impact on DNN performance, driving the model to underfit the minority class samples in the early training epochs while simultaneously learning only the majority class. Although the DNN ultimately learns the minority samples, learning in this manner only results in learnt minority representations that do not generalize at test time, because they are merely overfitted to keep the overall training loss as low as possible.

2605.24907 2026-05-26 cs.CL

Overview of the PsyDefDetect Shared Task at BioNLP 2026: Detecting Levels of Psychological Defense Mechanisms in Supportive Conversations

Hongbin Na, Zimu Wang, Zhaoming Chen, Yining Hua, Rena Gao, Kailai Yang, Ling Chen, Wei Wang, Shaoxiong Ji, John Torous, Sophia Ananiadou

AI summary: Introduces the PsyDefDetect shared task, co-located with BioNLP@ACL 2026. Grounded in the clinically validated DMRS framework, it asks systems to classify help-seeker utterances into nine categories; the best system reached a macro F1 of 0.420, leaving room for improvement.

Abstract

We present an overview of PsyDefDetect, the shared task on detecting levels of psychological defense mechanisms in emotional support dialogues, co-located with BioNLP@ACL 2026. Grounded in the clinically validated Defense Mechanism Rating Scales (DMRS) framework, the task asks systems to classify a target seeker utterance, given its preceding dialogue context, into one of nine categories: seven hierarchical DMRS levels plus two auxiliary labels. Participants worked on PsyDefConv, a newly released corpus of 200 dialogues and 2336 help-seeker utterances annotated under DMRS with substantial inter-annotator agreement. The task attracted 172 participants on CodaBench who produced 563 submissions, with 21 teams officially registering their results for the final ranking. The best system achieved a macro F1-score of 0.420, surpassing the strongest fine-tuned baseline reported in the dataset paper by a notable margin, yet leaving clear headroom. Our analysis highlights (i) a persistent tendency to over-predict the majority High-Adaptive class, (ii) a widening gap between accuracy and macro-F1 that reveals class-imbalance sensitivity, and (iii) the value of theory-aware and LLM-based approaches for fine-grained defensive-function classification. We release all task materials and invite the community to continue work on this novel intersection of clinical psychology and NLP.
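
Point (ii) of the analysis, the gap between accuracy and macro-F1 under class imbalance, can be illustrated with a small sketch. The labels and predictions below are made-up toy data (the class names `high`, `mid`, and `low` are placeholders, not the task's nine categories), and the metric code is a plain reimplementation rather than the task's official scorer.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in gold or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 8 of 10 utterances belong to the majority class,
# and the classifier predicts the majority class for everything.
y_true = ["high"] * 8 + ["low", "mid"]
y_pred = ["high"] * 10

print(accuracy(y_true, y_pred))  # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

Because macro-F1 weights every class equally, a system that over-predicts the majority class keeps a high accuracy while its macro-F1 collapses, which is the pattern the overview describes.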

2605.24904 2026-05-26 cs.CL

Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation

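Klaudia-Doris Thellmann, Bernhard Stadler, Michael Färber, Jens Lehmann

AI summary: Studies how translation errors in machine-translated benchmarks affect the reliability of multilingual LLM evaluation, using automatic error span detection and accuracy-drop analysis to relate translation errors to evaluation metrics.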

2605.24902 2026-05-26 cs.CL cs.AI cs.LG

When Reasoning Hurts: Source-Aware Evaluation of Frontier LLMs for Clinical SOAP Note Generation

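Faizan Faisal

AI summary: Using a source-aware benchmark, evaluates reasoning-enabled LLMs on clinical SOAP note generation and finds that enabling reasoning lowers GPT-5.4's quality, while same-source RAG yields small, model-dependent gains.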

Abstract

Reasoning-enabled LLMs perform strongly on medical reasoning benchmarks, but it remains unclear whether these gains transfer to structured clinical documentation; we investigate this question using SOAP note generation from clinical dialogue in a source-aware benchmark spanning OMI Health, ACI-Bench, and PriMock57. We evaluate GPT-5.4, DeepSeek-V4-Flash, and Gemma-4-E4B in a controlled 2x2 design that independently toggles provider-native reasoning and same-source retrieval-augmented generation (RAG). Outputs are assessed using seven automatic metrics alongside two reference-aware LLM judges. Both evaluation approaches agree that a non-reasoning GPT-5.4 configuration achieves the highest overall quality, while DeepSeek-V4-Flash performs best among reasoning-enabled configurations. Enabling reasoning significantly degrades GPT-5.4 performance across all three datasets, whereas same-source RAG yields smaller, model-dependent improvements. Overall, the findings indicate that stronger reasoning capability should not be assumed to improve fidelity-sensitive SOAP note generation without dedicated, task-specific evaluation.

2605.24900 2026-05-26 cs.AI

ProActor: Timing-Aware Reinforcement Learning for Proactive Task Scheduling Agents

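Lei Ding, Bin He, Chenguang Wang, Yang Liu

AI summary: Proposes the ProActor framework, which uses timing-aware reinforcement learning (combining RULER-based rewards with stage-aware composite rewards) and the efficient training system ART-F to substantially improve the timing quality of proactive task scheduling while maintaining action consistency.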

Comments 47 pages, 31 figures. Accepted to ACL 2026

Abstract

Proactive task-oriented agents must autonomously anticipate user needs, identify actionable opportunities, and trigger software actions at appropriate moments, a fundamental shift from reactive systems that await explicit instructions. However, existing approaches lack generalizable end-to-end solutions for measuring and optimizing such anticipatory behaviors. This paper introduces ProActor, a unified framework for conversational task scheduling that integrates: (1) a domain-agnostic automated annotation methodology that enables scalable proactiveness reinforcement learning (RL) by generating full opportunity time windows instead of rigid point labels, (2) systematic proactiveness metrics capturing both timing quality and reference action alignment, and (3) RL optimization using GRPO with various reward designs. Our insight is that RULER-based rewards with proactiveness rubrics are crucial for improving timing quality, and that proactiveness optimization enabled by stage-aware composite rewards is key to balancing timing quality and reference action alignment. Timing-aware RL requires extensive exploration, demanding efficient infrastructure. We develop ART-F, an adaptive framework combining request-adaptive inference clusters with DDP-based training on single-node multi-GPU systems, enabling LoRA training of 4-bit Qwen2.5-14B-ProActor-Q4 with 4-8x speedups. Experiments on two newly auto-annotated datasets demonstrate significant improvements in proactive timing while maintaining action consistency comparable to state-of-the-art (SOTA) baselines. Ablations validate the effectiveness of distinct composite reward variations.

2605.24899 2026-05-26 cs.AI

TaBIIC2: Interactive Building of Ontological Taxonomies using Weighted Self-Organizing Maps

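Mathieu d'Aquin

AI summary: Presents a tool that uses weighted self-organizing maps as a clustering method to let users build a taxonomy of concepts from tabular data progressively and interactively, and to define each concept's intension, as a middle ground between purely manual analysis and automatic methods.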

Abstract

Ontologies represent the conceptual knowledge of a domain. At the core of an ontology is the taxonomy of concepts and subconcepts that represent specific entities, which can be complex to build. In many cases, information is available in the form of records describing the characteristics of relevant entities, i.e., tabular data. Identifying patterns and similarities in such data can serve as a basis for identifying concepts and organizing them. However, doing so manually can be challenging, and purely automatic approaches, such as agglomerative clustering or relying on a large language model to analyze the data, can leave the user with overwhelming results and little control. In this paper, we describe a tool that enables the progressive and interactive construction of a taxonomy of concepts by identifying clusters as well as their intensional definitions. To do so, we rely on weighted self-organizing maps as a clustering method because they enable the creation of an arbitrary number of clusters that are distinct with respect to the distributions of values of specific characteristics of the clustered entities. We show that, by integrating this mechanism and others for rapidly creating concepts that group together instances from tabular data, this tool represents a middle ground between purely manual analysis and automatic methods for building ontological taxonomies.

2605.24894 2026-05-26 cs.CV

BFS: Back-to-Front Layered Image Synthesis via Knowledge Transfer

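Kyoungkook Kang, Gyujin Sim, Sunghyun Cho

AI summary: Proposes the BFS framework, which uses a dual-branch diffusion model and two-stage training to transfer knowledge from unlayered image synthesis, producing high-quality foreground layers that harmonize with the background.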

Comments SIGGRAPH 2026

Abstract

As generative models expand the possibilities of visual content creation, layered image synthesis has emerged as a promising direction for controllable and creative editing. However, existing methods struggle to fully realize this potential. Decomposition-based methods often struggle with clean separation, while generation-based methods suffer from difficulty in training data acquisition, reducing quality and scene diversity. In this paper, we propose BFS, a novel generation-based framework for layered image synthesis. Specifically, given a background image and user guidance, BFS synthesizes a foreground layer that incorporates not only a foreground object but also its associated visual effects, such as shadows and reflections, while seamlessly harmonizing with the background to produce a coherent composite. To enable diverse and high-quality foreground layer synthesis while overcoming data scarcity, we leverage the comparatively easy-to-learn knowledge of unlayered image synthesis for the foreground synthesis. To this end, we adopt a dual-branch diffusion framework in which two interconnected branches generate a composite image and a foreground layer, respectively, enabling bidirectional knowledge transfer. Based on this framework, we propose a two-stage training scheme that utilizes a high-quality unlayered composite image dataset to effectively enhance foreground quality. Extensive experiments, including a user study, show that BFS produces high-quality layered images, consistently outperforming prior methods.

2605.24893 2026-05-26 cs.CV

BED-SAM2: Boundary-Enhanced-Depth SAM2 via Monocular Geometric Priors

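Tyler Rust, Dara McNally, Kyle O'Donnell, Colin Kelly, Chandra Kambhamettu

AI summary: Modifies the SAM2 encoder to encode monocular depth information directly, yielding BED-SAM2, which achieves competitive salient and camouflaged object detection performance within a small number of training epochs.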

Comments 9 pages, 5 figures, 5 tables. Presented as a poster at the CVPR 2026 Workshop on Computer Vision in the Wild (CVinW). Code available at https://github.com/TylerRust-1/BED-SAM2

2605.24885 2026-05-26 cs.CL

DTO: a Differentiable Training Objective for Effective Counterfactual Story Rewriting

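Amelia Girard, Massimo Piccardi

AI summary: Proposes a differentiable training objective (DTO) that, through end-to-end backpropagation, jointly optimizes fidelity to the reference rewrite and semantic consistency with the source narrative, addressing models' tendency to overlook subtle modifications in counterfactual story rewriting.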

Comments 11 pages, 2 figures

Abstract

Counterfactual story rewriting is a natural language processing task that requires updating an existing story to reflect a chosen alternative event while preserving all unaffected storyline elements and overall coherence. While large language models have recently made remarkable progress on this task, it remains challenging because the required modifications are typically very small and highly localized. As a consequence, models trained conventionally with the maximum-likelihood objective tend to overlook these nuances. At the same time, more sophisticated training approaches based on reinforcement learning are notoriously slow and difficult to set up. For these reasons, our paper proposes a novel, differentiable training objective (DTO) that directly optimizes for the requisite counterfactual improvements. In our approach, a transformer model is fine-tuned via end-to-end backpropagation against a fully differentiable loss function that jointly rewards (i) fidelity to the reference rewrite and (ii) semantic consistency with the source narrative. The empirical evaluation on the TimeTravel and ART datasets shows that the proposed DTO approach surpasses a maximum-likelihood baseline and a preference-based approach, and performs competitively against two contemporary large language models on all evaluation metrics. These findings substantiate the effectiveness of task-specific differentiable objectives for nuanced, controlled text-generation tasks.

2605.24883 2026-05-26 cs.AI cs.CR cs.SE

Inverting the Shield: Systematically Generating Safety Tests from Policy Specifications

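Xiaoyue Lu, Xianglin Yang, Haijun Liu, Jiahao Liu, Kuntai Cai, Yan Xiao, Jin Song Dong

AI summary: Proposes the POLARIS framework, which compiles unstructured natural-language policies into first-order logic representations and builds a semantic policy graph to enable coverage-driven, reproducible safety testing, achieving higher policy coverage and attack success counts than baseline methods.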

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Abstract

The widespread integration of Large Language Models (LLMs) necessitates rigorous and systematic safety evaluation. Existing paradigms either rely on constructed benchmarks to assess safety from predefined perspectives, or employ dynamic red-teaming to probe potential vulnerabilities. While effective, these approaches face challenges, as they depend heavily on expert domain knowledge, offer limited systematic guarantees, and are vulnerable to rapid obsolescence. To address these limitations, we introduce a novel framework POLARIS that brings the rigor of specification-based software testing to AI safety. POLARIS first compiles unstructured natural-language policies into First-Order Logic (FOL) representations, establishing a traceable link between high-level rules and concrete test cases. This formalization enables the construction of a Semantic Policy Graph, where complex policy violation scenarios are encoded as traversable paths. By systematically exploring this graph, POLARIS uncovers compositional violation patterns, which are then instantiated into executable natural-language test queries, enabling coverage-driven and reproducible safety testing. Experiments demonstrate that POLARIS achieves higher policy coverage and attack success counts compared to established baselines. Crucially, by bridging formal methods and AI safety, POLARIS provides a principled, automated approach to ensuring LLMs adhere to safety-critical policies with verifiable traceability. We release our code at https://github.com/huac-lxy/POLARIS.

2605.24879 2026-05-26 cs.LG math.OC

Efficient DP-SGD for LLMs with Randomized Clipping

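Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn

AI summary: Proposes the DP-SGD-RC algorithm, which uses stochastic trace estimation (Hutchinson and Hutch++) to cut the memory overhead of per-sample gradient norm estimation, reducing memory and compute complexity while preserving privacy guarantees.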

Comments Accepted at ICML 2026

Abstract

Large language models (LLMs) are trained on vast datasets that may contain sensitive information. Differential privacy (DP), the de facto standard for formal privacy guarantees, provides a principled framework for training LLMs with provable privacy protection. However, state-of-the-art DP training implementations rely on fast gradient clipping techniques with memory overhead $O(B \min\{T^2, d^2\})$, where $B$ is the batch size, $T$ is the sequence length, and $d$ is the model width. This becomes prohibitive as both model size and context length grow. We propose DP-SGD-RC, a novel variant of DP-SGD with randomized clipping that reduces memory and compute complexity. DP-SGD-RC leverages stochastic trace estimation methods, specifically Hutchinson's estimator [Hutchinson, 1989] and its improved variant, Hutch++ [Meyer et al., 2021], to reduce the memory footprint of per-sample gradient norm estimation. We provide a tight privacy analysis showing that DP-SGD-RC achieves noise multipliers competitive with deterministic clipping. Experiments fine-tuning Llama 3.2-1B on long-context benchmarks spanning classification, question answering, and summarization tasks demonstrate that DP-SGD-RC matches baseline utility while significantly reducing memory and compute requirements.
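
For readers unfamiliar with the stochastic trace estimation the abstract mentions, here is a minimal sketch of the generic Hutchinson estimator, which estimates tr(A) as the average of z^T A z over random sign vectors z. It also shows the standard identity ||G||_F^2 = tr(G^T G), which is why a trace estimator can stand in for a squared norm. How DP-SGD-RC applies this to per-sample gradients is not described in the abstract, so the function names and the toy matrices below are illustrative assumptions, not the paper's algorithm.

```python
import random


def hutchinson_trace(matrix, num_probes=100, seed=0):
    """Estimate tr(A) as the mean of z^T A z over Rademacher probe vectors z."""
    rng = random.Random(seed)
    n = len(matrix)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = [sum(a * x for a, x in zip(row, z)) for row in matrix]
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


def hutchinson_sq_frobenius(g, num_probes=100, seed=0):
    """Estimate ||G||_F^2 = tr(G^T G) using z^T G^T G z = ||G z||^2."""
    rng = random.Random(seed)
    num_cols = len(g[0])
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(num_cols)]
        gz = [sum(a * x for a, x in zip(row, z)) for row in g]
        total += sum(v * v for v in gz)
    return total / num_probes


# For a diagonal matrix every probe gives the exact trace, because z_i^2 = 1.
print(hutchinson_trace([[2.0, 0.0], [0.0, 5.0]], num_probes=3))  # 7.0

# For a general matrix the estimate is unbiased but noisy; the exact value here is 30.
print(hutchinson_sq_frobenius([[1.0, 2.0], [3.0, 4.0]], num_probes=200))
```

The second function never forms G^T G, which hints at why estimators of this kind can save memory: only matrix-vector products are needed.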

2605.24873 2026-05-26 cs.CL cs.AI cs.LG

Towards a Universal Causal Reasoner

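Qirun Dai, Xiao Liu, Jiawei Zhang, Dylan Zhang, Hao Peng, Chenhao Tan

AI summary: Proposes UniCo, a data generation framework covering 18 query types across Pearl's Causal Ladder that translates symbolic examples into code and natural language; supervised fine-tuning on its data markedly improves LLMs' causal reasoning and reasoning faithfulness.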

Abstract

Despite the importance of causal reasoning, training LLMs to reason causally remains underexplored. Existing data efforts mostly focus on benchmarking LLMs on specific aspects of causality, making them less suitable for training generalizable causal reasoners. To address this, we propose UniCo, a data generation framework that both (1) addresses 18 causal query types across Pearl's Causal Ladder and (2) translates natively symbolic examples into code and natural language forms to simulate real-world use cases where causal terms are not explicitly specified. To ensure data quality, UniCo grounds answers with exact causal inference and filters cases with reasoning shortcuts. Upon supervised finetuning with 66.6K UniCo-generated instances, Qwen3-4B, Qwen3-8B and Olmo-3-7B-Instruct achieve an average of 22.9% improvements across all 18 in-distribution query types, and 8.1% over state-of-the-art causal data generation frameworks on 7 established causal benchmarks outside the training distribution. More importantly, in real-world medical understanding, legal decision, and tabular reasoning, UniCo-trained models consistently display more faithful reasoning traces, outperforming the base models by an average of 20.2% in faithfulness metrics. These suggest that causality-centered training not only strengthens causal reasoning, but also equips LLMs with a causal mindset in general reasoning tasks.

2605.24872 2026-05-26 cs.LG

Cluster Frequency Conformal Prediction for Local Coverage

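Tomer Lavi, Bracha Shapira, Nadav Rappoport

AI summary: Proposes the Cluster Frequency Conformal Prediction (CFCP) framework, which clusters embeddings and estimates local label-frequency distributions, combined with global-prior and reliability-aware shrinkage, to improve classwise coverage over standard conformal prediction, validated on image and text benchmarks.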

Abstract

Conformal prediction provides distribution-free coverage guarantees, but in many-class classification it may still under-cover specific classes or subpopulations, preventing safe deployment in high-stakes applications. We propose Cluster Frequency Conformal Prediction (CFCP), a plug-in framework that adapts conformal prediction to local structure in a learned representation space. CFCP clusters learned embeddings, estimates cluster-level label-frequency distributions from calibration data, and for each test point constructs a sample-specific probability vector by softly mixing nearby cluster distributions regularized with global-prior and reliability-aware shrinkage. This vector is then conformalized using standard set constructors. In the disjoint-split regime, CFCP inherits standard finite-sample marginal validity. Under additional assumptions, CFCP further admits a local-validity interpretation. Since representation clusters aggregate locally similar samples, their empirical class frequencies provide a stable estimate of local label ambiguity. Across image and text benchmarks, CFCP achieves the best class coverage in 15/16 dataset/score-family comparisons and a competitive prediction set size efficiency, with several settings substantially more efficient. Overall, our results show that cluster-frequency information provides an effective localized signal for improving classwise reliability in many-class conformal prediction.
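
The abstract says the sample-specific probability vector is "conformalized using standard set constructors". As background, the sketch below shows one standard split-conformal set constructor (score = 1 minus the probability of the label). It is not CFCP itself: the clustering, cluster-frequency mixing, and shrinkage steps are omitted, and the calibration scores and probabilities are made-up toy values.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(scores)[rank - 1]


def prediction_set(probs, qhat):
    """Keep every label whose nonconformity score 1 - p is at most qhat."""
    return [label for label, p in probs.items() if 1.0 - p <= qhat]


# Toy calibration scores: 1 - probability assigned to the true label.
cal_scores = [0.3, 0.1, 0.9, 0.5, 0.2, 0.7, 0.4, 0.8, 0.6]
qhat = conformal_quantile(cal_scores, alpha=0.2)

# Toy probability vector for one test point.
test_probs = {"a": 0.55, "b": 0.30, "c": 0.15}
print(qhat, prediction_set(test_probs, qhat))  # 0.8 ['a', 'b']
```

In a CFCP-style pipeline, `test_probs` would presumably be the cluster-informed probability vector rather than the raw model output; that substitution is the part this sketch leaves out.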

2605.24870 2026-05-26 cs.CV

Trajectory-Consistent Calibration for Cache-Accelerated Diffusion Models

轨迹一致校准用于缓存加速扩散模型

Mingyu Liang, Dingkun Xu, Jingwei Xu

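AI summary: To address the loss of generation quality caused by representation deviations in cache-accelerated diffusion models, proposes a training-free Trajectory-Consistent Calibration method that calibrates cached representations through an offline iterative procedure and consistently improves FID on PixArt-alpha and DiT-XL/2.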

Comments 23 pages, 8 figures, 8 tables. Code is available at https://github.com/NJUDeepEngine/TCC

AI中文摘要

扩散Transformer在迭代采样过程中需要重复进行去噪器评估，导致推理计算成本高昂。基于缓存的加速方法通过跨去噪步骤重用中间表示来降低这一成本，但可能引入表示偏差并降低生成质量。本文分析了这些偏差，并表明有效的校准应考虑重用导致的直接不匹配以及先前校正引起的后续轨迹偏移。为解决这一挑战，我们提出了轨迹一致校准（TCC），一种无训练的方法，将缓存表示校准为其全计算对应物。具体而言，TCC并非从单个未校正的缓存轨迹中估计所有校准先验，而是使用离线迭代过程，使得每个先验都考虑先前校准引起的轨迹偏移。在PixArt-alpha和DiT-XL/2上的实验表明，TCC在保持底层重用策略的同时，持续改善了代表性缓存加速方法的FID。值得注意的是，在基于FORA的典型PixArt-alpha缓存加速设置中，TCC将FID从29.83降至27.35，略微超过了全计算基线。

英文摘要

Diffusion Transformers require repeated denoiser evaluations during iterative sampling, making inference computationally expensive. Cache-based acceleration reduces this cost by reusing intermediate representations across denoising steps, but can introduce representation deviations and degrade generation quality. In this paper, we analyze these deviations and show that effective calibration should consider both the direct mismatch caused by reuse and the subsequent trajectory shift induced by earlier corrections. To address this challenge, we propose Trajectory-Consistent Calibration (TCC), a training-free method that calibrates cached representations toward their full-computation counterparts. Specifically, rather than estimating all calibration priors from a single uncorrected cache trajectory, TCC uses an offline iterative procedure so that each prior accounts for the trajectory shift induced by preceding calibrations. Experiments on PixArt-alpha and DiT-XL/2 show that TCC consistently improves FID across representative cache-based acceleration methods while preserving their underlying reuse policies. Notably, in a representative PixArt-alpha cache-acceleration setting based on FORA, TCC reduces FID from 29.83 to 27.35, slightly surpassing the full-computation baseline.

2605.24869 2026-05-26 cs.CL

Lngram: N-gram Conditional Memory in Latent Space

Lngram: 潜在空间中的N-gram条件记忆

Yunao Zheng, Guoyang Xia, Xiaojie Wang, Lei Ren

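AI summary: Proposes Lngram, a conditional memory module that learns discrete symbols in latent space and performs N-gram lookup over them, decoupling retrieval from the backbone and improving long-context language modeling and cross-modal task performance.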

AI中文摘要

序列建模需要组合推理和局部静态知识检索，而标准Transformer通过密集计算处理两者。Engram部分地将检索与骨干网络解耦，但其基于token的键仍依赖于文本分词和哈希压缩。我们提出Lngram，一种潜在空间中的条件记忆模块，直接从隐藏状态学习离散符号，并对这些符号执行N-gram查找。该设计消除了对分词器ID的依赖，并自然地扩展到非文本模态。在我们的评估设置中，Lngram优于Transformer和Engram基线，在长上下文语言建模中持续降低困惑度，并在事后添加到预训练模型时有效注入领域知识。与骨干网络的联合训练进一步超越了完全微调，而在视觉-语言和视觉-语言-动作任务上的实验显示了整体提升。使用LogitLens和CKA的分析表明，Lngram使预测相关信息更早出现，在有限的推理和内存开销下增加了有效深度。代码可在https://github.com/zyaaa-ux/Lngram获取。

英文摘要

Sequence modeling requires both compositional reasoning and local static knowledge retrieval, yet standard Transformers handle both through dense computation. Engram partially decouples retrieval from the backbone, but its token-based keys remain tied to text tokenization and hash compression. We propose Lngram, a latent-space conditional memory module that learns discrete symbols directly from hidden states and performs N-gram lookup over these symbols. This design removes the dependence on tokenizer IDs and naturally extends to non-text modalities. In our evaluated settings, Lngram outperforms Transformer and Engram baselines, consistently reduces perplexity in long-context language modeling, and effectively injects domain knowledge when added post hoc to pretrained models. Joint training with the backbone further surpasses full fine-tuning, while experiments on vision-language and vision-language-action tasks show overall gains. Analyses with LogitLens and CKA suggest that Lngram enables prediction-relevant information to emerge earlier, increasing effective depth with limited inference and memory overhead. Code is available at https://github.com/zyaaa-ux/Lngram.
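
To make the idea of N-gram lookup over latent symbols concrete, here is a toy sketch. The abstract does not say how the discrete symbols are learned, so nearest-codebook quantization is an assumption made purely for illustration, and `nearest_symbol`, `ngram_keys`, and `lookup` are hypothetical names rather than the Lngram API.

```python
def nearest_symbol(hidden, codebook):
    """Index of the codebook vector closest (squared Euclidean) to `hidden`."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(hidden, codebook[i]))

def ngram_keys(symbols, n):
    """For each position t, the tuple of the n symbols ending at t.

    Positions with fewer than n symbols of history get None.
    """
    return [tuple(symbols[t - n + 1:t + 1]) if t >= n - 1 else None
            for t in range(len(symbols))]

def lookup(memory, keys, default):
    """Fetch one memory entry per position, falling back to `default` on a miss."""
    return [memory.get(k, default) if k is not None else default for k in keys]
```

Because the keys are built from symbols derived from hidden states rather than tokenizer IDs, the same lookup applies unchanged to any modality that produces a sequence of hidden vectors.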

2605.24868 2026-05-26 cs.LG nlin.CD physics.comp-ph

A comparative study of accuracy and rollout stability of temporal surrogate models

时间代理模型的准确性与展开稳定性比较研究

Rajarshi Biswas

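AI summary: Compares the long-horizon prediction stability of several deep neural network architectures for temporal surrogate modeling of chaotic dynamical systems, and finds that models with integrator-like updates show lower bias and perturbation amplification, yielding stable long-horizon rollouts and more accurate predictions.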

Comments 24 pages, 18 figures, submitted to journal

AI中文摘要

时间代理模型对于预测计算成本可能过高的混沌动力系统是有效的。几种深度神经网络架构可用于此目的。在这项工作中，使用共同的训练协议比较了几种常用的架构。目标是公平评估模型架构对长期预测稳定性的影响。针对三个问题进行了实验：双摆、Kuramoto-Sivashinsky方程和Kolmogorov流。实验在匹配模型容量的情况下进行。还对每个模型单独优化的场景进行了分析。观察到在两种场景中，模型在长期展开中表现出类别差异。为了具体量化，使用局部雅可比、相对单步偏差和有限时间李雅普诺夫增长等指标分析了逐步误差注入和扰动放大。此外，还进行了吸引子分析，以评估学习模型复制底层系统几何形状的程度。还进行了消融研究，以隔离连续更新架构中每个组件的影响。结论是，具有积分器式更新的模型表现出更低的偏差和扰动放大，从而产生稳定的长期展开和更准确的预测。

英文摘要

Temporal surrogate models are effective for predicting chaotic dynamical systems where computational cost can be prohibitive. Several deep neural network architectures can be used for this purpose. In this work, a few commonly used architectures are compared using a common training protocol. The objective is to fairly assess the impact of model architecture on long-horizon prediction stability. Experiments are carried out for three problems: the double pendulum, the Kuramoto-Sivashinsky equations, and the Kolmogorov flow. The experiments are carried out with matching model capacity. Analysis is also carried out for a scenario where each model is individually optimized. It is observed that in both scenarios, the models exhibit categorical differences in long-horizon rollouts. For a concrete quantification, stepwise error injections and perturbation amplifications are analyzed using metrics such as the local Jacobian, relative one-step bias, and finite-time Lyapunov growth. Additionally, an attractor analysis is conducted to assess how well the learned models replicate the underlying system geometry. An ablation study to isolate the impact of each component of a continuous-update architecture is also carried out. It is concluded that models with integrator-like updates show lower bias and perturbation amplification, yielding stable long-horizon rollouts and more accurate predictions.
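
As a rough illustration of the quantities named in the abstract, the sketch below rolls out a scalar model and estimates a finite-time growth rate from a small perturbation. It rests on several assumptions: "integrator-like" is read here as an explicit Euler-style update `x + dt * f(x)`, the state is a single float, and the growth rate is estimated by finite differences rather than from a Jacobian. The function names are hypothetical and do not come from the paper.

```python
import math

def integrator_step(f, dt):
    """Wrap a learned increment f into an integrator-like update x -> x + dt * f(x)."""
    return lambda x: x + dt * f(x)

def rollout(step, x0, n_steps):
    """Autoregressive rollout: feed each prediction back in as the next input."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1]))
    return xs

def finite_time_growth(step, x0, n_steps, delta=1e-6):
    """Average per-step log growth of a small initial perturbation."""
    a = rollout(step, x0, n_steps)[-1]
    b = rollout(step, x0 + delta, n_steps)[-1]
    return math.log(abs(b - a) / delta) / n_steps
```

A negative value means perturbations shrink over the rollout and a positive value means they are amplified; for a chaotic target system one would compare the surrogate's value with that of the reference solver instead of expecting it to be negative.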

2605.24867 2026-05-26 cs.AI cs.CL cs.NI

Clustering as Reasoning: A $k$-Means Interpretation of Chain-of-Thought Graph Learning

聚类即推理：思维链图学习的 $k$-均值解释

Xuanting Xie, Zhaochen Guo, Bingheng Li, Xingtong Yu, Zhifei Liao, Zhao Kang, Yuan Fang

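AI summary: Proposes the KCoT framework, which establishes a mathematical correspondence between a Transformer block and the $k$-means algorithm to unify chain-of-thought reasoning with graph representation learning, enabling iterative semantic-topological interaction and outperforming existing methods on standard benchmarks.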

Comments Accepted by ICML 2026

AI中文摘要

思维链（CoT）提示在增强大型语言模型（LLMs）对文本属性图（TAGs）的推理能力方面显示出潜力。本文通过聚类即推理的原则重新审视基于CoT的图学习，提供了关于迭代推理如何在图结构数据上运行的$k$-均值解释。我们观察到现有的图CoT方法依赖于分离的架构和固定的图表示，限制了逐步的语义-拓扑交互和可解释性。为克服这一限制，我们提出了一个名为KCoT的统一框架，将CoT推理与图表示学习相结合。我们的关键理论结果揭示了Transformer块与$k$-均值算法之间的形式数学对应，使得推理可以被解释为迭代的分配和更新步骤。基于这一见解，我们引入了一个语义判别提示，明确将这些步骤形式化为结构化的CoT推理，并采用结构对齐策略将拓扑先验与演化的思维条件表示融合。在标准基准上的实验表明，与最先进的方法相比，该方法持续改进，验证了聚类作为基于CoT的图学习的原则性机制。

英文摘要

Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs) on text-attributed graphs (TAGs). This work reframes CoT-based graph learning through the principle of clustering as reasoning, offering a $k$-means interpretation of how iterative reasoning operates over graph-structured data. We observe that existing graph CoT methods rely on disjoint architectures and fixed graph representations, limiting step-by-step semantic-topological interaction and interpretability. To overcome this limitation, we propose a unified framework named KCoT that integrates CoT reasoning with graph representation learning. Our key theoretical result reveals a formal mathematical correspondence between a Transformer block and the $k$-means algorithm, allowing reasoning to be interpreted as iterative assignment and update steps. Based on this insight, we introduce a Semantic Discriminating Prompt that explicitly formulates these steps as structured CoT reasoning, together with a structure-grounded alignment strategy to fuse topological priors with evolving thought-conditioned representations. Experiments on standard benchmarks demonstrate consistent improvements over state-of-the-art methods, validating clustering as a principled mechanism for CoT-based graph learning.
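
For readers who want the "iterative assignment and update steps" spelled out, the following is a plain Lloyd-style $k$-means loop. It only illustrates the two steps the abstract refers to; the claimed correspondence with a Transformer block is the paper's theoretical result and is not reproduced here. The function names are illustrative.

```python
def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:  # an empty cluster keeps its previous centroid
            new_centroids.append(list(old))
            continue
        new_centroids.append([sum(m[d] for m in members) / len(members)
                              for d in range(len(old))])
    return new_centroids

def kmeans(points, centroids, n_iters=10):
    """Alternate update and assignment for a fixed number of iterations."""
    labels = assign(points, centroids)
    for _ in range(n_iters):
        centroids = update(points, labels, centroids)
        labels = assign(points, centroids)
    return labels, centroids
```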

2605.24862 2026-05-26 cs.LG

Unifying Value Alignment and Assignment in Cross-Domain Offline Reinforcement Learning with Heterogeneous Datasets

统一跨域离线强化学习中异构数据集的价值对齐与价值分配

Zhongjian Qiao, Jiafei Lyu, Chenjia Bai, Peisong Wang, Siyang Gao, Shuang Qiu

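AI summary: To address value misassignment in heterogeneous cross-domain offline reinforcement learning, proposes V2A, which unifies dynamics alignment, value alignment, and value assignment through temporally-consistent modality representation learning and modality-aware advantage learning, significantly improving policy performance.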

Comments Accepted at ICML 2026

AI中文摘要

跨域离线强化学习旨在利用有限的目标域数据集和存在动力学偏移的源域数据集，在目标域中学习策略。直接在原始源数据集上训练通常会导致性能崩溃。最近的研究从动力学对齐或价值对齐的角度进行数据过滤，以实现有效的策略迁移。然而，这些研究通常在单域或单行为策略的源数据集上验证。在这项工作中，我们探索了一个更一般的异构跨域离线强化学习设置，其中源数据集可能由多种行为策略从多个源域收集。我们首先揭示了该设置中一个关键但被忽视的问题：价值误分配。通过实验和理论，我们证明了价值误分配会破坏价值对齐，误导数据过滤选择次优样本，并扩大次优性差距，从而降低智能体的性能。为了解决这个问题，我们提出了V2A，它整合了动力学对齐、价值对齐和价值分配。V2A首先采用时间一致的模态表示学习从源数据集中提取动力学模态，然后通过模态感知优势学习纠正价值对齐。最后，它采用数据过滤范式选择性共享源数据进行策略学习。实验结果表明，在一般异构跨域离线强化学习设置下，V2A显著优于强基线方法。

英文摘要

Cross-domain offline reinforcement learning (RL) aims to learn a policy in the target domain with a limited target domain dataset and a source domain dataset that exhibits a dynamics shift. Training directly on the original source dataset typically leads to performance collapse. Recent studies perform data filtering from the perspective of dynamics alignment or value alignment to enable efficient policy transfer. However, these studies are typically validated on single-domain or single-behavior-policy source datasets. In this work, we explore a more general heterogeneous cross-domain offline RL setting, where the source datasets may be collected from multiple source domains by diverse behavior policies. We first uncover a critical yet overlooked issue in this setting: value misassignment. Empirically and theoretically, we demonstrate that value misassignment can undermine value alignment, mislead data filtering toward selecting suboptimal samples, and loosen the suboptimality gap, thereby degrading the agent's performance. To address this issue, we propose V2A, which integrates dynamics alignment, value alignment, and value assignment. V2A first employs temporally-consistent modality representation learning to extract dynamics modalities from the source dataset, followed by modality-aware advantage learning to rectify value alignment. Finally, it adopts a data filtering paradigm to selectively share source data for policy learning. Empirical results show that V2A significantly outperforms strong baseline methods under general heterogeneous cross-domain offline RL settings.

2605.24856 2026-05-26 cs.LG cs.AI

The Concept Allocation Zone: Tracking How Concepts Form Across Transformer Depth

概念分配区：追踪概念如何跨越Transformer深度形成

James Henry

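AI summary: Proposes the Concept Allocation Zone (CAZ) framework, which uses layer-wise metrics (Separation, Concept Coherence, Concept Velocity) to detect the depth interval over which a concept gradually forms in the residual stream, and validates on 34 models that separation curves are frequently multimodal and that gentle CAZes are causally active.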

Comments 34 models, 8 architectural families, 7 concepts. Companion papers: GEM (arXiv forthcoming), CAZ Validation (arXiv forthcoming), PRH Validation (arXiv forthcoming). Code: https://github.com/jamesrahenry/Rosetta_Tools

AI中文摘要

Transformer语言模型中的概念形成是深度扩展的，而非单层事件：概念在残差流的连续区域内逐渐出现。可解释性方法识别出类别分离峰值的单层——

英文摘要

Concept formation in transformer language models is depth-extended, not a single-layer event: concepts emerge gradually across a contiguous region of the residual stream. Mechanistic interpretability methods identify the single layer of peak class separation -- the "best layer" -- capturing a snapshot rather than the process itself. We introduce the Concept Allocation Zone (CAZ): the depth interval within which a concept becomes measurably separable, the region allocated to its geometric expression. We formalize the CAZ through three layer-wise metrics (Separation, Concept Coherence, Concept Velocity) and derive principled boundary detection without manual layer sweeps. A CAZ is not a concept: it is the depth region within which the model organizes its geometry to make a concept separable. A single concept typically participates in multiple CAZes; multiple concepts may share one. Empirical validation across 34 models from 8 architectural families and 7 concepts reveals that the separation curve S(l) is frequently multimodal. A scored detector uncovers "gentle CAZes" -- subtle allocation regions invisible to standard peak detection but causally active in 93-100% of cases under ablation (16 of 34 models; 26 in the companion validation paper). The framework generates seven testable predictions; four yield clear verdicts (two not supported, one partially supported, one supported), one had its precondition invalidated by the data, and two are underpowered -- with cross-architecture alignment confirmed as depth-matched rather than monolithic under leave-one-concept-out cross-validation. Reference implementation: rosetta_tools v1.3.1 (doi:10.5281/zenodo.20361433).

2605.24852 2026-05-26 cs.LG cs.SY eess.SY

T2S-MPC: Time-Embedded Online Adaptive Model Predictive Control for Time-Varying Dynamics

T2S-MPC：面向时变动力学的时间嵌入在线自适应模型预测控制

Zeyu Shen, Zhuoyuan Wang, Laixi Shi

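AI summary: Proposes the T2S-MPC framework, which learns a residual dynamics model online using a time embedding and two-timescale updates, enabling adaptive model predictive control in rapidly time-varying environments and outperforming classical and neural MPC on quadrotor tasks.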

AI中文摘要

基于学习的模型预测控制（MPC）的最新进展利用神经网络进行在线模型学习，当非平稳系统动力学偏离标称模型时，取得了强劲的性能。然而，现有方法主要处理特定或相对结构化的动力学变化形式，对于更一般、未知且不可预测的时变动力学处理不足。为应对这一挑战，我们提出T2S-MPC框架，该框架在线自适应学习残差动力学模型，并将其与MPC框架内的标称模型集成，以实现快速演变的在线规划。为使模型具有时间感知能力，我们通过结构化时间嵌入显式编码时间信息，并采用双时间尺度更新方案，使控制器能够捕捉非平稳动力学，同时平衡快速适应与稳定学习。我们在二维四旋翼上评估了所提方法，在多种时变扰动（包括线性漂移和周期性扰动）下执行稳定和轨迹跟踪任务。实验结果表明，T2S-MPC在控制性能上始终优于经典MPC、神经MPC及消融变体，同时在没有额外调参的情况下，在广泛的扰动条件下展现出强鲁棒性。源代码公开于https://github.com/Zeyuu0920/T2S_MPC。

英文摘要

Recent advances in learning-based model predictive control (MPC) have leveraged neural networks for online model learning, achieving strong performance when nonstationary system dynamics deviate from nominal models. However, existing approaches primarily address specific or relatively structured forms of dynamical variation, leaving more general, unknown, and unpredictable time-varying dynamics insufficiently handled. To tackle this challenge, we propose T2S-MPC, a framework that adaptively learns a residual dynamics model online and integrates it with the nominal model within the MPC framework to enable fast-evolving online planning. To make the model time-aware, we explicitly encode temporal information through a structured time embedding and employ a two-timescale update scheme, allowing the controller to capture nonstationary dynamics while balancing rapid adaptation with stable learning. We evaluate the proposed method on a 2D quadrotor across stabilization and trajectory tracking tasks under diverse time-varying disturbances, including linear drifting and periodic perturbations. Experimental results show that T2S-MPC consistently outperforms classical MPC, neural MPC, and ablated variants in control performance, while also demonstrating strong robustness across a wide range of disturbance conditions without additional tuning. The source code is publicly available at https://github.com/Zeyuu0920/T2S_MPC

2605.24850 2026-05-26 cs.CL cs.IT math.IT stat.AP

Repeated Sequences Reveal Gaps between Large Language Models and Natural Language

重复序列揭示大语言模型与自然语言之间的差距

Kumiko Tanaka-Ishii

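AI summary: By analyzing the distribution of repeated subsequences and its relation to higher-order Rényi entropies, proposes a framework for evaluating the long-range statistical organization of text generated by large language models, and finds that GPT-generated text differs systematically from natural language in its entropy-growth patterns.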

Comments ACL 2026

AI中文摘要

评估大语言模型（LLMs）是否捕捉到自然语言的结构（超越局部流畅性）仍然是一个开放的挑战。现有的评估方法主要基于任务性能或短上下文行为，对生成文本的长程统计组织提供的洞察有限。我们提出了一种基于重复子序列的补充评估框架。通过分析其跨尺度的分布并将其与高阶Rényi熵联系起来，我们探究文本在有限长度条件下如何重用先前建立的结构。对人类撰写的文本和长度匹配的GPT生成文本的实验表明，虽然幂律模型可以描述有限范围的块长度，但观察到的熵增长通常同样或更好地由对数-幂形式刻画。跨数据集，自然语言在可访问范围内表现出稳定的熵增长模式，尽管个体文本之间存在变异性，但平均行为一致。相比之下，GPT生成文本的估计指数随模型大小呈现系统性和统计显著的变化。这些结果表明，重复子序列熵提供了一种定量的结构诊断，揭示了长程组织中的系统性差异，从而在表面流畅性之外区分自然语言与最先进的LLM输出。

英文摘要

Evaluating whether large language models (LLMs) capture the structure of natural language beyond local fluency remains an open challenge. Existing evaluation methods, largely based on task performance or short-context behavior, provide limited insight into the long-range statistical organization of generated text. We propose a complementary evaluation framework based on repeated subsequences. By analyzing their distribution across scales and relating it to higher-order Rényi entropies, we probe how texts reuse previously established structure under finite-length conditions. Experiments on human-written texts and length-matched GPT-generated texts show that, while power-law models can describe restricted ranges of block length, the observed entropy growth is often equally or better characterized by logarithmic--power forms. Across datasets, natural language exhibits stable entropy-growth patterns over accessible ranges, with consistent average behavior despite variability across individual texts. In contrast, GPT-generated texts show systematic and statistically significant shifts in estimated exponents with model size. These results demonstrate that repeated-subsequence entropy provides a quantitative structural diagnostic that reveals systematic differences in long-range organization, distinguishing natural language from state-of-the-art LLM outputs beyond surface-level fluency.
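
The link between repeated blocks and higher-order Rényi entropy can be illustrated with a simple plug-in estimate. This is a sketch under assumptions: it works on characters rather than whatever units the paper uses, it applies the textbook definition $H_\alpha = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$ directly to empirical block frequencies, and it is not the estimator used by the author. The function names are illustrative.

```python
from collections import Counter
import math

def block_counts(text, n):
    """Count every overlapping block of length n in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def repeated_blocks(counts):
    """Blocks that occur at least twice, i.e. the repeated subsequences."""
    return {block: c for block, c in counts.items() if c >= 2}

def renyi_entropy(counts, order=2.0):
    """Plug-in Rényi entropy (in nats) of the empirical block distribution."""
    if order == 1.0:
        raise ValueError("order 1 is the Shannon limit and needs its own formula")
    total = sum(counts.values())
    power_sum = sum((c / total) ** order for c in counts.values())
    return math.log(power_sum) / (1.0 - order)
```

At order 2 the sum of squared frequencies is the probability that two randomly drawn blocks coincide, so a text with many repeated blocks has a lower entropy at that block length. Tracking `renyi_entropy(block_counts(text, n))` as `n` grows gives the kind of entropy-growth curve the abstract discusses.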

2605.24845 2026-05-26 cs.AI math.CO

Solving Combinatorial Counting Problems with Weighted First-Order Model Counting

使用加权一阶模型计数解决组合计数问题

Yuanhong Wang, Juhua Pu, Yuxu Zhou, Yuyi Wang, Ondřej Kuželka

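AI summary: Proposes the Cofola language, which uses typed declarative programming and a compilation pipeline to weighted first-order model counting (WFOMC) to solve combinatorial counting problems over sets, multisets, permutations, partitions, and related objects in a uniform way.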

Comments 47 pages, 9 figures

AI中文摘要

组合计数问题遍及人工智能、统计学和离散数学。无论是枚举子集、多重集、排列、划分还是在结构和算术约束下的组合，解决它们仍然是一项顽固的手动练习。封闭形式的推导强大但脆弱，而将问题朴素编码为命题模型计数或约束满足会破坏使计数易于处理的交换性。我们提出了Cofola（组合计数语言与一阶逻辑），一种类型化声明式语言，其原语是日常计数问题中反复出现的组合对象，包括集合、袋子、元组、序列、圆圈、划分和组合，以及它们之上的自然关系和算术约束。指称语义将每个Cofola程序映射到一个明确定义的组合计数问题，一个三阶段编译流水线（预处理、分解和对称保持编码）将该问题简化为一个加权一阶模型计数（WFOMC）实例，并附加系数提取约束。为了尽可能保持在已知的可域提升片段内，编码将不可区分的实体分组，按字典序打破无序分组的对称性，并通过顺序公理编码序列和圆圈。在一系列代表性的组合计数问题上，从教科书数学问题到最接近的先前框架无法表达的多对象场景，Cofola生成了简洁的规范和统一的求解流水线，端到端实用。

英文摘要

Combinatorial counting problems pervade artificial intelligence, statistics, and discrete mathematics. Whether the task is enumerating subsets, multisets, permutations, partitions, or compositions under structural and arithmetic constraints, solving it remains a stubbornly manual exercise. Closed-form derivations are powerful but brittle, while naive encodings to propositional model counting or constraint satisfaction destroy the exchangeability that makes counting tractable in the first place. We present Cofola (COmbinatorial counting LAnguage with First-Order logic), a typed declarative language whose primitives are the combinatorial objects that recur in everyday counting questions, including sets, bags, tuples, sequences, circles, partitions, and compositions, together with natural relational and arithmetic constraints over them. A denotational semantics maps every Cofola program to a well-defined combinatorial counting problem, and a three-phase compilation pipeline (preprocessing, decomposition, and symmetry-preserving encoding) reduces this problem to a weighted first-order model counting (WFOMC) instance augmented with coefficient-extraction constraints. To stay inside known domain-liftable fragments whenever possible, the encoding groups indistinguishable entities, breaks the symmetry of unordered groupings lexicographically, and encodes sequences and circles via order axioms. On a suite of representative combinatorial counting problems, ranging from textbook math problems to multi-object scenarios that the closest prior framework cannot express, Cofola produces concise specifications and a uniform solving pipeline that is practical end-to-end.
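
The abstract does not show Cofola syntax, so the sketch below does not attempt to imitate it. Instead it gives a brute-force Python baseline for two of the object types mentioned (circles and bags), which makes clear what is being counted and why naive enumeration scales poorly compared with a lifted approach. The function names are hypothetical.

```python
from itertools import permutations
from math import factorial

def count_circles(items):
    """Arrangements of distinct, non-empty items around a circle.

    Rotations of the same arrangement are identified by rotating the
    smallest item to the front before storing the arrangement.
    """
    seen = set()
    for perm in permutations(list(items)):
        start = perm.index(min(perm))
        seen.add(perm[start:] + perm[:start])
    return len(seen)

def count_bag_sequences(bag):
    """Distinct orderings of a bag (multiset) with indistinguishable copies."""
    return len(set(permutations(bag)))
```

Both functions enumerate all `n!` permutations, so they are only usable for small inputs. The closed forms `(n - 1)!` for circles and `n! / (m_1! ... m_k!)` for bags give the same counts, and they exploit exactly the exchangeability that the abstract says naive encodings destroy.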

2605.24844 2026-05-26 cs.AI cs.CL

Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning

Geo-Expert: 通过参数高效微调实现专家级地质推理

Chenyou Guo, Zongqi Liu, Yizhou Zhang, Zhaorui Jiang, Ze Liu

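AI summary: Proposes Geo-Expert, which fine-tunes small language models with parameter-efficient fine-tuning (LoRA) on a custom high-quality instruction dataset; on the dedicated geological reasoning benchmark Geo-Eval, the 8B model outperforms 70B generalist models and GPT-4o, and the 32B model approaches frontier reasoning models.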

Comments 11 pages, 1 figure, 3 tables. Accepted at ICML 2026 AI for Science Workshop

AI中文摘要

虽然应用于地质学的通用大语言模型（LLM）在推理地下结构和深时演化时常常产生幻觉，但目前地球科学中的人工智能主要针对地表遥感和GIS。为弥补这一差距，我们引入了Geo-Expert，这是一个参数高效的地质LLM系列，基于我们自定义指令合成流程处理的自定义策划高质量指令数据集进行微调。我们通过使用低秩适配（LoRA）方法微调三个基础模型：Qwen3-8B、Qwen3-32B和Gemma-3-27B，研究了模型缩放和架构的影响。我们在新的领域特定基准Geo-Eval上的广泛评估表明，领域对齐的8B模型在专门的地质推理上可以超越开放权重的70B通用模型和专有的GPT-4o，而32B变体接近前沿推理模型。优化后的8B模型进一步为部署提供了具有竞争力的性价比。这项工作为科学LLM的民主化提供了可复现的配方，并为地质人工智能建立了基线。

英文摘要

While general-purpose Large Language Models (LLMs) applied to geology often hallucinate when reasoning about subsurface structures and deep-time evolution, current AI in Earth sciences predominantly targets surface remote sensing and GIS. To bridge this gap, we introduce Geo-Expert, a family of parameter-efficient geological LLMs fine-tuned on a custom-curated, high-quality instruction dataset processed using our custom instruction synthesis pipeline. We investigate the impact of model scaling and architecture by fine-tuning three base models, Qwen3-8B, Qwen3-32B, and Gemma-3-27B, with Low-Rank Adaptation (LoRA). Our extensive evaluation on a novel domain-specific benchmark, Geo-Eval, reveals that a domain-aligned 8B model can outperform open-weight 70B generalists and proprietary GPT-4o on specialized geological reasoning, while a 32B variant approaches frontier reasoning models. The optimized 8B model further offers a competitive cost-performance ratio for deployment. This work provides a reproducible recipe for democratizing scientific LLMs and establishes a baseline for geological artificial intelligence.
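
For context on why LoRA counts as parameter-efficient, the sketch below writes out the standard LoRA forward pass $y = Wx + \frac{\alpha}{r} BAx$ for a single linear layer using plain Python lists. It is an illustration of the general technique only, not the authors' training code, and the helper names are hypothetical.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Frozen base layer W plus a scaled low-rank update B @ A.

    A has shape (r, d_in) and B has shape (d_out, r); only A and B
    would be trained, while W stays fixed.
    """
    r = len(A)
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + (alpha / r) * l for b, l in zip(base, low_rank)]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters of the adapter, versus d_out * d_in for W itself."""
    return r * (d_in + d_out)
```

With `B` initialised to zeros the layer starts out identical to the base model. For a 4096 by 4096 weight matrix and rank 8, `lora_param_count` gives 65,536 trainable parameters against roughly 16.8 million in the full matrix.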

2605.24843 2026-05-26 cs.CV cs.AI

Adversarial Error Correction for Visual Autoregressive Generation

视觉自回归生成的对抗性纠错

Ligong Bi, Tao Huang, Jianyuan Guo, Chang Xu

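AI summary: Proposes the AID-VAR framework, which corrects cascading errors in visual autoregressive models through an Adversarially Injected Diagnosis mechanism and improves generation quality.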

AI中文摘要

视觉自回归（VAR）模型通过执行层次化的下一尺度预测，已成为图像合成的强大范式。然而，VAR模型天生容易产生级联误差传播，其中细微的粗尺度误预测会在层次结构中放大，最终扭曲最终合成。为了缓解这一问题，我们提出了AID-VAR，一个即插即用的框架，通过对抗性注入诊断增强预训练的VAR。与标准的被动生成不同，AID-VAR引入了一种主动纠错机制，灵感来自GAN中的对抗性反馈。我们部署了一个判别器来诊断每个尺度转换处的保真度差距，并配有一个轻量级的引导注入器。该模块作为一个非侵入式适配器，优化冻结的VAR骨干网络的特征流形，有效引导生成朝向真实图像的分布，同时不破坏预训练潜在空间的稳定性。此外，为了严格评估这种跨尺度进展，我们引入了跨尺度一致性得分（ISCS），这是一个新的度量标准，用于量化连续分辨率尺度之间的保真度和结构对齐。在各种骨干网络上的实验结果表明，AID-VAR以可忽略的开销提供了更清晰的纹理细节和更少的结构失真。例如，AID-VAR-d20在参数仅增加3%的情况下，FID提升了16%。这些结果确立了AID-VAR作为升级大规模VAR生成器的高效且可扩展的途径，在不改变训练数据、基础架构或采样调度的情况下，增强了全局连贯性和局部细节。代码可在https://github.com/bijiw515/AID-VAR获取。

英文摘要

Visual Autoregressive (VAR) models have emerged as a powerful paradigm for image synthesis by performing hierarchical next-scale prediction. However, VAR models are inherently prone to cascading error propagation, where subtle coarse-scale mispredictions are amplified across the hierarchy, ultimately distorting the final synthesis. To mitigate this, we propose AID-VAR, a plug-and-play framework that enhances pre-trained VARs through Adversarially Injected Diagnosis. Instead of a standard passive generation, AID-VAR introduces a proactive error-correction mechanism inspired by the adversarial feedback in GANs. We deploy a discriminator to diagnose fidelity gaps at each scale transition, coupled with a lightweight guidance injector. This module operates as a non-invasive adapter that refines the feature manifold of a frozen VAR backbone, effectively steering the generation toward the distribution of real images without destabilizing the pre-trained latent space. Furthermore, to rigorously evaluate this cross-scale progression, we introduce the Inter-Scale Consistency Score (ISCS), a novel metric that quantifies the fidelity and structural alignment between consecutive resolution scales. Experimental results across various backbones demonstrate that AID-VAR delivers sharper textural details and fewer structural distortions with negligible overhead. For instance, AID-VAR-d20 achieves a 16% improvement in FID with only a 3% increase in parameters. These results establish AID-VAR as a highly efficient and scalable pathway for upgrading large-scale VAR generators, enhancing global coherence and local detail without altering training data, base architectures, or sampling schedules. Code is available at https://github.com/bijiw515/AID-VAR.

2605.24842 2026-05-26 cs.CL cs.CY

Translators as Invisible Teachers of AI: Copyright, Translation Memory, and the Political Economy of Linguistic Data

译者作为人工智能的无形教师：版权、翻译记忆库与语言数据的政治经济学

Masaru Yamada

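AI summary: Studies how translators' labour is transformed into foundational data capital for AI, develops the two concepts of "appropriation without consumption" and "the invisible teacherisation of translators", and discusses the data supply chain under copyright frameworks and directions for redistributive design.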

Comments 13 pages; comments welcome

AI中文摘要

本文考察了译者的劳动如何转化为人工智能时代的基础数据资本。翻译记忆库和平行语料库保留了源文本和目标文本之间的一一对应关系，因此构成了机器翻译极其宝贵的监督训练数据。统计机器翻译、神经机器翻译、Transformer架构以及多语言大语言模型的发展与这类翻译数据的积累密不可分。然而，译者的译文作为合同交付物被购买，作为技术对象被分割，并在版权法下作为“信息分析”数据被处理——失去了对产生它们的译者的道德、创作和经济归属。本文提出了两个概念来捕捉这一过程。第一个是“无消费的挪用”：一种使用模式，作品不被阅读、观看或聆听，而仅被挖掘统计特征——这种使用在日本著作权法第30-4条下是合法的。第二个是“译者的无形教师化”：译者通过构建翻译记忆库、译后编辑和质量评估，充当了人工智能的教师而未得到承认的过程。基于从译者通过语言服务提供商和平台到模型开发者的数据供应链，对日本、欧洲和美国法律框架的比较解读，开放与专有AI模型的区分，以及人类生成数据在模型崩溃时代获得的溢价地位，本文探讨了译者实际担忧的问题，并指出了再分配设计的具体方向。

英文摘要

This paper examines how the labour of translators has been transformed into foundational data capital for the age of artificial intelligence (AI). Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation. The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of such translation data. And yet, translators' renditions have been bought as deliverables under contract, segmented as technical objects, and processed as "information analysis" data under copyright law -- losing their moral, creative, and economic attribution to the translators who produced them. The paper develops two concepts to capture this process. The first is appropriation without consumption: a mode of use in which works are not read, viewed, or listened to, but only mined for statistical features -- a use that is legitimated under Article 30-4 of the Japanese Copyright Act. The second is the invisible teacherisation of translators: the process by which translators, through the construction of translation memories, post-editing, and quality assessment, have functioned as teachers of AI without recognition as such. Drawing on the data supply chain that runs from translators through language service providers (LSPs) and platforms to model developers, on a comparative reading of Japanese, European, and United States legal frameworks, on the distinction between open and proprietary AI models, and on the premium status that human-generated data has acquired in the era of model collapse, the paper asks what translators are actually afraid of, and points toward concrete directions for redistributive design.

2605.24841 2026-05-26 cs.LG

DriftingMol: Decoder-Coupled Drift for One-Pass Property-Conditional Molecular Generation

DriftingMol: 用于一次性属性条件分子生成的解码器耦合漂移

Jiangjie Qiu, Yijun Li, Wentao Li, Xiaonan Wang

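AI summary: Proposes DriftingMol, a two-stage framework that adapts drifting models to a SELFIES latent molecular space through decoder-coupled drift, achieving property-conditional molecular generation with low sampling cost, high validity, and diversity.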

Comments 9 pages, 5 figures

AI中文摘要

属性条件分子生成应在响应连续目标值的同时，以低采样成本生成有效且多样的分子。我们引入了 DriftingMol，一个两阶段框架，将漂移模型适应于 SELFIES 潜在分子空间。冻结的 SELFIES beta-VAE 提供潜在空间，其解码器的隐藏表示作为漂移特征图。在解码器耦合漂移中，解码器权重保持不变，但漂移梯度通过解码器特征图反向传播到 DiT 生成器，从而诱导出与分子解码对齐的拉回度量。在 ZINC250K 上，默认设置实现了 QED Spearman 相关系数 0.493，独特性 94.7%，而最强的解码器耦合条件达到 0.510。在协议匹配的四属性条件下，解码器耦合漂移的平均 Spearman 相关系数高达 0.598。在 15 个受控变体中，保留通过解码器特征的梯度路径的模型比测试的潜在空间、随机特征和外部特征漂移变体实现了更高的相关性，而分离或停止梯度的解码器控制导致 QED 相关性接近零且独特性极低。这些结果表明，解码器耦合漂移是一种有用的低成本机制，用于属性偏置分子生成，只需一次生成器评估和一次冻结解码器传递。

英文摘要

Property-conditional molecular generation should produce valid, diverse molecules while responding to continuous target values at low sampling cost. We introduce DriftingMol, a two-stage framework that adapts drifting models to a SELFIES latent molecular space. A frozen SELFIES beta-VAE provides the latent space, and the hidden representation of its decoder serves as the drift feature map. In decoder-coupled drift, decoder weights remain fixed, but drift gradients are backpropagated through the decoder feature map to a DiT generator, inducing a pullback metric aligned with molecular decoding. On ZINC250K, the default setting achieves QED Spearman correlation 0.493 with 94.7% uniqueness, while the strongest decoder-coupled condition reaches 0.510. Under protocol-matched four-property conditioning, decoder-coupled drift reaches mean Spearman correlation up to 0.598. Across 15 controlled variants, models that preserve the gradient path through decoder features achieve higher correlations than the tested latent-space, random-feature, and external-feature drift variants, while detached or stop-gradient decoder controls yield near-zero QED correlation and very low uniqueness. These results indicate that decoder-coupled drift is a useful low-cost mechanism for property-biased molecular generation, requiring one generator evaluation and one frozen decoder pass.

2605.24831 2026-05-26 cs.CV cs.AI

Multiscale Real-Time Object Detection in the NMS-Free Era: A Comparative Performance Evaluation of YOLOv8 and YOLO26

无NMS时代的实时多尺度目标检测：YOLOv8与YOLO26的对比性能评估

Chidera G. Oguine, Kanyifeechukwu J. Oguine, Obiozor M. Oguine, Ozioma C. Oguine

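AI summary: On the Pascal VOC and VisDrone datasets, systematically compares the NMS-based YOLOv8 and the NMS-free YOLO26 across scales in terms of accuracy, localization, model size, computation, and latency; YOLO26 detects better with lower model complexity at most scales, but its advantage narrows in dense small-object scenes, and YOLOv8 remains competitive in GPU latency.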

Comments 11 pages, 6 tables, 9 figures

AI中文摘要

非极大值抑制（NMS）仍然是许多实时目标检测流程中的关键后处理步骤，但在资源受限的环境中可能引入延迟变化和部署复杂性。最近的无NMS设计（如YOLO26）旨在通过端到端检测减少这种依赖，然而与基于NMS的成熟模型（如YOLOv8）相比，其性能在标准基准之外尚未得到充分探索。本文在Pascal VOC和VisDrone上比较了YOLOv8和YOLO26，这两个数据集分别代表通用目标检测和密集空中小目标检测。两个模型家族在五个尺度上使用准确率、定位、模型大小、GFLOPs以及CPU/GPU延迟进行评估。结果表明，YOLO26在Pascal VOC上的大多数尺度上实现了更强的检测性能和更低的模型复杂度，而在VisDrone上性能差距缩小，两个模型在处理密集小目标时均表现困难。YOLOv8在GPU延迟上仍具有竞争力，表明无NMS设计并不能保证普遍的部署优势。总体而言，研究表明检测器的选择取决于数据集特征、目标尺度、模型容量和硬件约束。

英文摘要

Non-Maximum Suppression (NMS) remains a key post-processing step in many real-time object detection pipelines, but it can introduce latency variation and deployment complexity in resource-constrained settings. Recent NMS-free designs such as YOLO26 aim to reduce this dependence through end-to-end detection, yet their performance relative to established NMS-based models such as YOLOv8 remains underexplored beyond standard benchmarks. This paper compares YOLOv8 and YOLO26 on Pascal VOC and VisDrone, representing general object detection and dense aerial small-object detection, respectively. Both model families are evaluated across five scales using accuracy, localization, model size, GFLOPs, and CPU/GPU latency. Results show that YOLO26 achieves stronger detection performance and lower model complexity on Pascal VOC across most scales, while the performance gap narrows on VisDrone, where both models struggle with dense small targets. YOLOv8 remains competitive in GPU latency, showing that NMS-free design does not guarantee universal deployment superiority. Overall, the study shows that detector selection depends on dataset characteristics, object scale, model capacity, and hardware constraints.
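
For readers unfamiliar with the post-processing step being compared, the following is a sketch of classic greedy NMS. It is a class-agnostic, pure-Python illustration of the general algorithm, not the implementation shipped with either detector, and the box format `(x1, y1, x2, y2)` is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap any already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The amount of work depends on how many candidate boxes survive the confidence filter, which is one way this step can make latency vary from image to image, particularly in dense scenes such as those in VisDrone.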
