arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.02117 2026-06-02 stat.ML cs.LG stat.ME

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

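Tingting Wang, Yunyi Zhang, Benyou Wang

AI summary: Proposes ProbRes, a post-hoc probabilistic calibration method that improves probabilistic forecasts by explicitly learning volatility dynamics, handles heteroskedastic data effectively, and is validated both theoretically and experimentally.
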
Abstract

Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During training, ProbRes employs two architecture-agnostic modules to separately model the conditional mean and conditional volatility. At the inference stage, it generates predictive distributions by resampling normalized residuals. ProbRes is applicable to both univariate and multivariate time series and remains robust under a wide range of error distributions, including non-Gaussian innovations with conditional heteroskedasticity. Theoretical results demonstrate ProbRes's validity and experiments on both synthetic and real-world datasets show that ProbRes accurately captures predictive distributions and produces well-calibrated prediction intervals.
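
The inference step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the two modules have already produced in-sample conditional means `mu` and volatilities `sigma`, and the function names and the empirical-quantile interval are hypothetical choices made here.

```python
import random

def normalized_residuals(y, mu, sigma):
    """Standardize each observation by its fitted mean and volatility."""
    return [(yt - m) / s for yt, m, s in zip(y, mu, sigma)]

def predictive_samples(mu_next, sigma_next, residuals, n_samples=1000, seed=0):
    """Resample normalized residuals and rescale them by the forecast volatility."""
    rng = random.Random(seed)
    return [mu_next + sigma_next * rng.choice(residuals) for _ in range(n_samples)]

def prediction_interval(samples, level=0.9):
    """Central interval from empirical quantiles (an assumed, simple choice)."""
    ordered = sorted(samples)
    lo_idx = int((1 - level) / 2 * (len(ordered) - 1))
    hi_idx = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Toy inputs standing in for the outputs of the mean and volatility modules.
y = [1.0, 2.0, 0.5, 3.0]
mu = [1.0, 1.0, 1.0, 1.0]
sigma = [1.0, 2.0, 1.0, 2.0]

residuals = normalized_residuals(y, mu, sigma)
samples = predictive_samples(mu_next=10.0, sigma_next=2.0, residuals=residuals)
lower, upper = prediction_interval(samples, level=0.9)
```

Because the residuals are resampled rather than assumed Gaussian, the interval inherits whatever skew or heavy tails the standardized errors show, which is the property the abstract emphasizes.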

2606.02115 2026-06-02 stat.ML cs.LG

Error Bounds for a Diffusion Model-Based Drift Estimator

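Ioar Casado-Telletxea, Omar Rivasplata

AI summary: For drift estimation in stochastic differential equations with a known diffusion parameter, uses diffusion model theory to derive an explicit risk bound on the time-averaged mean-squared error, decomposing the risk into four terms: discretization, score approximation, noise initialization, and sampling variance.
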
Comments
Preprint
Abstract

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.

2606.02113 2026-06-02 cs.CL cs.AI

A Primer in Post-Training Reasoning Data: What We Know About How It Works

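Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang

AI summary: Surveys the types, utility, construction methods, and scaling behavior of post-training reasoning data, providing an attribution framework for future reasoning-data releases and post-training recipes.
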
Comments
22 pages. Project Repository: https://github.com/RenBing-Sumeru/Awesome-LLM-Reasoning-Data
Abstract

Post-training has become a primary driver of recent progress in large reasoning models, and reasoning data are often the key variable determining whether this stage succeeds. Work on post-training reasoning data has grown rapidly, yet this literature remains scattered across dataset papers, reinforcement-learning recipes, reward-model studies, benchmarks, and frontier system reports. This paper is the first primer to synthesize over 150 key public studies and system reports on post-training reasoning data. We organize the field around four questions: what data objects exist, what makes them useful, how they are constructed, and how they scale. Together, this organization provides an attribution framework for future reasoning-data releases and post-training recipes.

2606.02111 2026-06-02 cs.CV cs.AI cs.CL

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

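Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

AI summary: Introduces the MCV SafetyBench dataset, which uses multi-clip videos to evaluate safety vulnerabilities of multimodal large language models. It finds that the video modality is more vulnerable than the image modality and that dynamic, diverse contexts raise attack success, and it proposes a defense strategy based on the robustness of the image modality.
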
Comments
27 pages, 20 figures, Accepted to the Main Conference of ACL 2026
Abstract

As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for malicious misuse. Prior jailbreak studies have shown that safety alignment in MLLMs can be bypassed through visual inputs, yet it remains unclear which properties of video inputs induce this vulnerability. To address this gap, we introduce Multi-Clip Video (MCV) SafetyBench, a dataset of 2,920 videos designed to evaluate how the diversity of video inputs affects the vulnerability of MLLMs. Each video consists of multiple short clips depicting diverse contexts related to a harmful query. Experiments on eight representative video MLLMs show that attack success consistently increases with the number of clips. Our results further indicate that the video modality is (1) more vulnerable than the image modality, (2) more vulnerable to dynamic videos than to static videos, and (3) more vulnerable when videos contain more diverse contexts. Building on these findings, we propose a defense strategy that leverages the relative robustness of the image modality.

2606.02109 2026-06-02 cs.AI

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

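Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller

AI summary: Proposes BADGER, a framework that unifies text-to-SQL evaluation with agentic behavior evaluation through a hybrid execution accuracy metric (Hybrid-EX) and an agentic evaluation suite, outperforming existing methods on industry queries.
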
Comments
30 pages, 2 figures, 6 tables
Abstract

Enterprise AI systems that translate natural language into SQL queries and orchestrate multi-step agentic reasoning pipelines require evaluation approaches fundamentally different from academic benchmarks. Spider and BIRD established execution-accuracy protocols; G-Eval and RAGAS advanced LLM-based assessment; and recent work such as Spider 2.0, BEAVER, and BIRD-Interact has begun to address enterprise and agentic dimensions. No single framework unifies text-to-SQL assessment with agentic behavior evaluation into a production-grade pipeline calibrated against human expert judgment. We present BADGER, developed at Merkle, a unified evaluation framework integrating text-to-SQL assessment with agentic behavior evaluation. BADGER offers three contributions. First, LLM-assisted SQL component extraction extending Spider methodology to handle CTE-heavy, dialect-specific SQL. Second, a hybrid execution accuracy metric (Hybrid-EX) resolving column-aliasing and numeric-tolerance brittleness by using an LLM to infer structural alignments before deterministic cell-level scoring. Validated on 150 human-annotated industry queries, Hybrid-EX achieves Cohen's kappa=0.717 [95% CI: 0.600-0.822] (Substantial agreement) and 87.3% balanced accuracy, outperforming all six competing frameworks (Delta-kappa: 0.322-0.502, all p<=0.001). Third, an enterprise agentic evaluation suite assembling RAGAS, G-Eval, and agent benchmark metrics into a unified pipeline; Excess Tool Usage is the sole novel element. BADGER runs entirely within the client's governed data environment, supports configurable LLM judge backends, and enables rapid prototyping of client-specific judges and metrics, serving as a continuous evaluation backbone rather than a one-time quality gate.
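
The two agreement statistics quoted in this abstract, Cohen's kappa and balanced accuracy, can be computed as in the sketch below. It uses toy labels invented here, not the paper's 150 annotated queries, and the function names are illustrative. The "Substantial agreement" label matches the commonly used Landis and Koch band of 0.61 to 0.80.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two label sequences."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    # Undefined when expected == 1 (both raters use a single identical label).
    return (observed - expected) / (1 - expected)

def balanced_accuracy(truth, predicted):
    """Mean of per-class recalls, so each class counts equally."""
    recalls = []
    for cls in set(truth):
        indices = [i for i, t in enumerate(truth) if t == cls]
        recalls.append(sum(predicted[i] == cls for i in indices) / len(indices))
    return sum(recalls) / len(recalls)

# Toy example: 1 = "query result judged correct", 0 = "judged incorrect".
human = [1, 1, 0, 0]
metric = [1, 0, 0, 0]

kappa = cohens_kappa(human, metric)
bal_acc = balanced_accuracy(human, metric)
```

Here the metric agrees with the human on three of four items, yet kappa is only 0.5 because half of that agreement is expected by chance.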

2606.02107 2026-06-02 cs.RO cs.AI cs.LG

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

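Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

AI summary: Proposes a network distributed multi-agent reinforcement learning framework that uses the communication graph to realize distributed policies, trains a high-level planner with MASAC, and scales zero-shot to 250 agents.
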
Journal ref
2026 IEEE 23rd Mediterranean Electrotechnical Conference (MELECON)
Comments
This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
Abstract

This paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework for quadcopter consensus control. Compared to conventional MARL formulations that rely on centralized planning or fully decentralized execution, ND-MARL incorporates the swarm communication graph into the decision process. Under a 2-Neighbor communication topology, each agent observes information from only two neighbors and outputs an action through a distributed policy. A high-level distributed consensus planner is trained using Multi-Agent Soft Actor-Critic (MASAC) and embedded in a hierarchical stack to generate reference target positions tracked by a low-level quadcopter controller. Results demonstrate smooth consensus trajectories and planner-tracker integration when compared to a centralized MARL controller. Most notably, the learned controller exhibits zero-shot scalability: policies trained on a three-agent system are deployed to swarms of up to 250 agents under the same 2-Neighbor communication topology without retraining or fine-tuning, achieving consistent convergence, with steady-state spread increasing at large team sizes due to sparse information propagation. These findings highlight ND-MARL as a stable framework for distributed, communication-aware quadcopter consensus control.

2606.02106 2026-06-02 cs.LG stat.ML

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

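Julien Lafrance

AI summary: Presents a classification pipeline that combines Equiangular Tight Frame preprocessing with a tabular foundation model, evaluates it on cross-modal data, and shows that it strikes a good balance between speed and quality.
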
Comments
24 pages, 5 figures. Code and data available at https://doi.org/10.5281/zenodo.19982636
Abstract

We present a single classification pipeline that combines an Equiangular Tight Frame (ETF) preprocessing stage with a tabular foundation model for in-context inference, applied identically across modalities once data is mapped to fixed vector representations. We evaluate it on 95 datasets spanning seven signal modalities -- vision, audio, speech, text, molecular, time-series, and tabular. The main methodological contribution is to fix the comparison object: throughout the paper, performance is judged against the strongest lightweight tuned baseline on the same frozen features, while oracle selection, deployed selection, and specialized fine-tuning are reported separately. The pipeline is broadly competitive with strong lightweight tuned baselines on the same frozen features. It does not match the very best specialized models or heavily tuned pipelines on every task, but it stays close, and it runs much faster -- typically 4 to 200 times faster than full backbone fine-tuning, often at comparable quality. We describe how to deploy the pipeline in practice: when to apply ETF preprocessing, how to stop its training without a validation split, how to set up the in-context classifier, and how to calibrate the resulting probabilities. The calibration step is non-cosmetic: TabICL produces well-calibrated probabilities by construction, ETF preprocessing initially disrupts that calibration, and the post-hoc rescaling restores it -- yielding a per-prediction confidence signal that practitioners can use as a trust threshold for confidence-gated deployment. We also report where the pipeline should not be expected to help, and how to identify those cases in advance.

2606.02105 2026-06-02 cs.CV

Multimodal Action Diffusion for Robust End-to-End Autonomous Driving

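Jorge Daniel Rodríguez-Vidal, Diego Porres, Gabriel Villalonga Pineda, Antonio M. López Peña

AI summary: Proposes the Action Diffusion Transformer (ADT), which uses multimodal action modelling and Nearest Neighbour Matching to surpass the previous state of the art on the closed-loop Bench2Drive benchmark while cutting latency tenfold.
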
Comments
Preprint. June 1st, 2026. Corresponding author: Jorge Daniel Rodríguez-Vidal
Abstract

End-to-End Autonomous Driving (E2E-AD) systems have largely converged on predicting intermediate trajectory waypoints, delegating final control to hand-crafted controllers with GPS access. Direct control-signal prediction (outputting throttle, steer and brake in an end-to-end fashion) remains underexplored, and critically, the role of action multimodality in such systems is not well understood. We argue that moving beyond deterministic, single-action outputs is not merely a modelling choice, but a key driver of driving performance, representational quality, and training stability. To validate this, we introduce the Action Diffusion Transformer (ADT), an anchor-free diffusion transformer trained with an MSE objective that natively models the multimodal distribution of plausible driving actions. Rather than committing to a single deterministic command, ADT generates K action candidates and selects the most suitable one at inference via Nearest Neighbour Matching (NNM). Beyond strong benchmark numbers, we show that action multimodality yields measurable benefits in learned representations and behavioral consistency, effects that deterministic architectures cannot replicate. ADT surpasses the previous state of the art on the challenging closed-loop Bench2Drive benchmark while achieving ten times lower latency, demonstrating that expressive, multimodal action modelling is both practically efficient and conceptually essential for robust end-to-end driving.

2606.02101 2026-06-02 stat.ML cs.LG stat.AP

It does what it says on the tin: safe synthetic data from coarsened margins

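Gillian M Raab

AI summary: Proposes a method that generates synthetic data by coarsening margins and applying the Iterative Proportional Fitting algorithm, ensuring transparency and freedom from disclosure risk.
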
Abstract

This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
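
The coarsen-then-fit idea in this abstract can be illustrated on a small two-way table. The sketch below is an assumption-laden toy, not the paper's procedure: the rounding rule (nearest multiple of the disclosure limit), the one-dimensional margins, the uniform seed table, and all names are choices made here for illustration.

```python
def coarsen(counts, limit):
    """Round each non-negative count to the nearest multiple of `limit` (assumed rule)."""
    return [int(c / limit + 0.5) * limit for c in counts]

def ipf(row_targets, col_targets, iterations=50):
    """Iterative Proportional Fitting of a two-way table, starting from a uniform seed."""
    table = [[1.0 for _ in col_targets] for _ in row_targets]
    for _ in range(iterations):
        # Scale every row so that it sums to its target margin.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Scale every column so that it sums to its target margin.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
    return table

# Hypothetical raw margins and a disclosure limit of 10.
rows = coarsen([23, 47, 30], limit=10)   # -> [20, 50, 30]
cols = coarsen([38, 62], limit=10)       # -> [40, 60]
fitted = ipf(rows, cols)
```

The fitted cells are expected counts that reproduce only the released margins. Drawing synthetic records from them, and handling margins whose coarsened totals disagree, are steps this sketch leaves out.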

2606.02100 2026-06-02 cs.CL

PortBERT: Navigating the Depths of Portuguese Language Models

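Raphael Scheible-Schmitt, Henry He, Armando B. Mendes

AI summary: Presents PortBERT, a family of RoBERTa-based Portuguese language models trained on over 450 GB of data with byte-level BPE tokenization and stable pre-training. It achieves competitive performance on the ExtraGLUE benchmark, with a focus on training and inference efficiency.
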
Journal ref
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models, 2025, pp. 59-71
Abstract

Transformer models dominate modern NLP, but efficient, language-specific models remain scarce. In Portuguese, most focus on scale or accuracy, often neglecting training and deployment efficiency. In the present work, we introduce PortBERT, a family of RoBERTa-based language models for Portuguese, designed to balance performance and efficiency. Trained from scratch on over 450 GB of deduplicated and filtered mC4 and OSCAR23 from CulturaX using fairseq, PortBERT leverages byte-level BPE tokenization and stable pre-training routines across both GPU and TPU processors. We release two variants, PortBERT base and PortBERT large, and evaluate them on ExtraGLUE, a suite of translated GLUE and SuperGLUE tasks. Both models perform competitively, matching or surpassing existing monolingual and multilingual models. Beyond accuracy, we report training and inference times as well as fine-tuning throughput, providing practical insights into model efficiency. PortBERT thus complements prior work by addressing the underexplored dimension of compute-performance tradeoffs in Portuguese NLP. We release all models on Huggingface and provide fairseq checkpoints to support further research and applications.

2606.02096 2026-06-02 cs.CV

WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos

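Jongmin Park, Jeonghwan Yun, Minh-Quan Viet Bui, Munchurl Kim

AI summary: Proposes the WebSpline framework, which uses a Structure-Informed Spline (SIS) representation and a Structural Proxy Graph (SPG) to achieve real-time, high-fidelity, structurally coherent dynamic 3D Gaussian reconstruction from monocular videos.
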
Comments
The first two authors contributed equally to this work (equal contribution). Please visit our project page at https://kaist-viclab.github.io/webspline-site/
Abstract

Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence and local fine-grained details under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent and high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the Structure-Informed Spline (SIS) representation, which models each dynamic Gaussian trajectory using a learnable cubic Hermite spline whose motion is structurally organized with an auxiliary Structural Proxy Graph (SPG). The proposed framework is optimized in two stages: (i) in the first stage, the SPG is initialized from 2D point tracks and refined with temporal rigidity regularization to establish structural coherence for moving objects across the sequence; and (ii) in the second stage, the SIS representation is initialized from the refined SPG and optimized under both spatial and structural neighborhood constraints. At inference, Gaussian motion is obtained solely by evaluating the learned SIS, enabling fast rendering. Extensive experiments on the challenging monocular dynamic scene benchmarks, iPhone and NVIDIA, demonstrate that our WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.
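
Evaluating a cubic Hermite spline, the building block named in this abstract, needs only the knot positions and tangents of the enclosing interval, which helps explain why inference can be fast. The sketch below shows the standard Hermite basis for a single 3D trajectory. The knot layout, the clamping at the ends, and the function name are assumptions made here, and the learned parameters and SPG coupling of the paper are not modelled.

```python
from bisect import bisect_right

def hermite_eval(times, positions, tangents, t):
    """Evaluate a piecewise cubic Hermite trajectory at time t (clamped to the knot range)."""
    t = min(max(t, times[0]), times[-1])
    # Index of the interval [times[i], times[i + 1]] that contains t.
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    dt = times[i + 1] - times[i]
    s = (t - times[i]) / dt
    # Standard cubic Hermite basis functions on s in [0, 1].
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return tuple(
        h00 * p0 + h10 * dt * m0 + h01 * p1 + h11 * dt * m1
        for p0, m0, p1, m1 in zip(
            positions[i], tangents[i], positions[i + 1], tangents[i + 1]
        )
    )

# Hypothetical knots for one Gaussian centre: positions and velocities at three times.
times = [0.0, 1.0, 2.0]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
tangents = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

midpoint = hermite_eval(times, positions, tangents, 0.5)
```

The curve passes through every knot and matches the given tangent there, so position and velocity stay continuous across intervals.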

2606.02093 2026-06-02 cs.CL cs.AI cs.LG

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

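Ieva Raminta Staliūnaitė, James Bishop, Andreas Vlachos

AI summary: Improves error prediction for large language models on question answering by disentangling input ambiguity from uncertainty signals, using Gated Experts and Selective Prediction.
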
Comments
8 pages not including references and appendices, 3 figures
Abstract

The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackled with Uncertainty Quantification (UQ). However, while uncertainty metrics capture when models lack knowledge or capacity to make a prediction, they also reflect aleatoric uncertainty, which is inherent in the model input and context. This paper presents a method for improving error prediction for Large Language Models (LLMs), by disentangling input ambiguity from UQ signal. We conduct experiments on the task of Question Answering (QA) with six UQ metrics and show that UQ metrics are more predictive of errors on unambiguous instances than on questions with multiple plausible answers. We use Gated Experts and Selective Prediction to incorporate gold and predicted ambiguity labels into the error prediction pipeline. We find that ambiguity information improves error prediction scores across model families, training and evaluation paradigms, datasets (including allegedly unambiguous ones), and sources of aleatoric uncertainty, yielding improvements of over 10 points of PRR for individual UQ metrics on standard datasets.

2606.02092 2026-06-02 eess.IV cs.AI cs.CV

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

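Ümit Mert Çağlar, Alptekin Temizel

AI summary: Proposes the LALE architecture, which balances performance and computational cost in remote sensing image segmentation through a resolution-bifurcated encoder (lightweight ConvMixer stages for high-resolution local features, transformer stages for low-resolution global context) and an all-MLP multi-scale decoder.
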
Abstract

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
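
The two lightweight components named in this abstract, RMSNorm and StarReLU, are simple enough to write out. The sketch below gives their usual definitions on plain Python lists. It is an illustration under assumptions: the parameter values are placeholders (in practice the gain, scale, and bias are learned), and nothing here reflects how LALE wires them into its encoder or decoder.

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide by the root mean square of the features, then apply a per-feature gain."""
    gain = gain if gain is not None else [1.0] * len(x)
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def star_relu(x, scale=1.0, bias=0.0):
    """StarReLU: scale * ReLU(v)^2 + bias, with scalar scale and bias."""
    return [scale * max(v, 0.0) ** 2 + bias for v in x]

normalized = rms_norm([3.0, 4.0], eps=0.0)
activated = star_relu([-1.0, 2.0])
```

Unlike LayerNorm, RMSNorm skips mean subtraction, and StarReLU replaces the exponentials or error functions of smoother activations with a square, which is where the compute savings come from.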

2606.02080 2026-06-02 cs.MA cs.AI cs.CV

Agentic-J: An AI Agent for Biological Microscopy Image Analysis

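Lukas Johanns, Marilin Moor, Davide Panzeri, Yu Zhou, Xinyi Chen, Nora F. K. Pauly, Zixuan Pan, Matthias Gunzer, Andreas Müller, Yiyu Shi, Hedi Peterson, Jianxu Chen

AI summary: Presents Agentic-J, a containerised multi-agent AI assistant that integrates ImageJ/Fiji tools behind a natural-language interface, enabling traceable and reproducible bioimage analysis workflows from cell segmentation to multi-condition quantification.
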
Comments
Presented at Cell Biology at Scale 2026 (Poster). The Agentic-J project is available at https://mmv-lab.github.io/Agentic-J/
Abstract

Biological image analysis increasingly demands integration across heterogeneous tools, programming environments, and domain knowledge that few researchers can command simultaneously. We present Agentic-J, a containerised, multi-agent AI assistant, primarily for ImageJ/Fiji, that enables biologists to specify analysis tasks in natural language, from nuclei segmentation and cell tracking to multi-condition quantification. The agent generates executable scripts organised into a documented project structure, so every analysis decision is traceable and the workflow can be reproduced or shared. Specialised sub-agents handle plugin management, code generation, debugging, quality assurance, and statistical reporting. In this paper we introduce the system's design, demonstrate real biological microscopy image analysis workflows, and detail the technical implementation.

2606.02079 2026-06-02 cs.CV

FACT: A Simple and Efficient Framework for Active Finetuning

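Wenshuai Xu, You Song, Yuzhuo Cui, Minjie Ren, Qingjie Liu, Zhenghui Hu

AI summary: To address the distortion of pretrained features and the overfitting caused by full finetuning in active finetuning, proposes FACT, a three-phase hierarchical finetuning framework that uses frozen feature augmentation and parameter-efficient finetuning. It improves performance markedly across datasets and architectures, with gains of over 20% at low sampling ratios.
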
Comments
ACCEPTED for publication as a REGULAR paper in the IEEE Transactions on Image Processing (T-IP)
Abstract

The main goal of active finetuning is to improve a pretrained model's performance on a specific task or domain by finetuning it with carefully selected informative or challenging data. Previous research has predominantly focused on the active aspect (i.e., data selection) while uniformly employing full finetuning for model adaptation, which inevitably distorts pretrained features due to distribution shift. This issue becomes particularly pronounced when the model size is large relative to the finetuning data quantity, leading to heightened overfitting risks. To address this critical gap, we formally outline the FiAF task, which emphasizes systematic exploration of finetuning methodologies in active learning. We propose FACT, a three-phase hierarchical finetuning framework featuring both efficiency and simplicity, specifically designed for active finetuning scenarios. Our comprehensive experiments span: (1) three major dataset categories encompassing classic (CIFAR10, CIFAR100, ImageNet-1k), imbalanced (CIFAR10-LT, CIFAR100-LT), and fine-grained (StanfordCars, FGVCAircraft) image classification datasets, each evaluated under 3-5 distinct sampling ratios; (2) diverse pretrained architectures including Convolutional Neural Network (ConvNeXt), Vision Transformer (ViT), and Vision LSTM (ViL) networks; (3) a systematic investigation of frozen feature augmentation (FroFA) strategies; and (4) a comprehensive and rigorous analysis of efficiency and generalizability. The results demonstrate significant improvements with strong generalization and robustness. Notably, under low sampling ratios, our framework achieves remarkable performance gains of over 20% on the ViT model for CIFAR10, CIFAR100, and ImageNet-1k benchmarks. This systematic approach establishes new state-of-the-art performance while maintaining parameter efficiency, proving particularly effective when labeled data is scarce.

2606.02078 2026-06-02 cs.LG

Beyond $\ell_2$-norm and $\ell_\infty$-norm: A Curvature-Inspired $\ell_p$-Norm Scheme for Deep Neural Networks

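Jianhao Xu, Zhuang Yang

AI summary: To address the poor adaptation of existing optimizers when curvature varies widely across parameter dimensions, proposes an $\ell_p$-norm scheme with a dynamic $p$ and incorporates it into SGD and SGDM, yielding the LPSGD and LPSGDM optimizers. A large $p$ early in training suppresses high-curvature directions, and a cosine-annealing-style decrease of $p$ later gives stable updates. The authors prove an $O(T^{-1/2})$ convergence rate in the nonconvex setting and report improved generalization on CIFAR and ImageNet datasets.
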
Abstract

Existing optimizers for deep neural networks (DNNs) typically rely on either the $\ell_2$ norm or the $\ell_\infty$ norm, so they do not adapt well to substantial changes in curvature across parameter dimensions. DNN training often exhibits strong curvature anisotropy in the early period, whereas in the later period it tends to move toward flatter regions with weaker anisotropy. In particular, optimizers based on the $\ell_2$ norm are usually dominated by high-curvature directions, which restricts updates along lower-curvature directions and leads to a slower convergence rate, while optimizers based on the $\ell_\infty$ norm are prone to oscillations in flatter regions because their coordinate-wise updates share the same magnitude. To address these two extreme cases generated by the $\ell_2$ and $\ell_\infty$ norms, we propose a novel $\ell_p$-norm scheme with a dynamic value of $p$ and incorporate it into stochastic gradient descent (SGD) and SGD with momentum (SGDM), leading to two novel optimizers with better generalization performance: $\ell_p$-SGD (LPSGD) and $\ell_p$-SGDM (LPSGDM). The resulting optimizers suppress the dominance of high-curvature directions in the early period by using a large $p$ ($p>2$), followed by a gradual decrease of $p$ toward 2 to enable more stable and refined updates, where the decrease is motivated by the cosine annealing strategy. We establish theoretical guarantees for the resulting algorithms and show that both LPSGD and LPSGDM achieve an $O(T^{-1/2})$ convergence rate in the nonconvex setting. Extensive experiments are conducted on benchmark datasets, including CIFAR-10, CIFAR-100, and ImageNet-1K, with multiple DNNs such as VGG-11, ResNet-18, and ResNet-50.
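
The abstract describes a value of $p$ that starts large and decays toward 2 in a way motivated by cosine annealing. A minimal sketch of such a schedule follows. It is not the authors' implementation: the function names `cosine_p_schedule` and `lp_norm`, the starting value `p_max = 8.0`, and the exact half-cosine form are assumptions made for illustration only.

```python
import math

def cosine_p_schedule(step, total_steps, p_max=8.0, p_min=2.0):
    """Hypothetical schedule: p decays from p_max to p_min along a half cosine."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay well defined.
    t = min(max(step, 0), total_steps) / total_steps
    return p_min + 0.5 * (p_max - p_min) * (1.0 + math.cos(math.pi * t))

def lp_norm(vec, p):
    """Plain l_p norm of a sequence of floats, for p >= 1."""
    return sum(abs(v) ** p for v in vec) ** (1.0 / p)
```

How the scheduled $p$ enters the LPSGD and LPSGDM update rules is not stated in the abstract, so the sketch stops at the schedule and the norm itself.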

2606.02073 2026-06-02 cs.LG

Planar Symmetric Pattern Generation

平面对称图案生成

Ning Lin, Luxi Chen, Huaguan Chen, Jiacheng Cen, Chongxuan Li, Wenbing Huang, Hao Sun

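AI summary: Proposes a symmetrization framework for arbitrary planar groups that converts any 2D continuous representation into a symmetric one while preserving continuity, enabling symmetry control; validated on pattern design, paper-cutting design, stylized topology design, and material design tasks.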

详情
AI中文摘要

生成具有特定对称性的对象在各种现实场景中至关重要。然而,将现有的2D连续表示适应于强制平面群对称性仍然是一个挑战,因为非反射群元素的变换可能破坏连续性。为了克服这一限制,我们提出了一种适用于任意平面群的对称化框架。我们的方法将任意2D连续表示转换为对称表示,同时保持连续性。我们提供了该表示的数学公式,展示了其对对称函数的逼近能力,并详细介绍了构建方法。我们通过三个视觉设计任务(图案设计、剪纸设计和风格化拓扑设计)和一个材料设计任务验证了我们的方法。实验证实,我们的表示能够实现有效的对称控制,并展示了其更广泛的适用性。

英文摘要

Generating objects with specific symmetries is essential in various real-world scenarios. However, adapting existing 2D continuous representations to enforce planar group symmetry remains a challenge, as the transformation of non-reflective group elements may disrupt continuity. To overcome this limitation, we propose a symmetrization framework for arbitrary planar groups. Our method transforms any 2D continuous representation into a symmetric one while preserving continuity. We provide the mathematical formulation of this representation, demonstrate its approximation capability for symmetric functions, and detail the construction methodology. We validate our approach through three visual design tasks (pattern design, paper-cutting design and stylized topology design) and one material design task. Experiments confirm that our representation enables effective symmetry control and demonstrate its broader applicability.
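
To make the idea of a symmetric 2D representation concrete, the sketch below uses plain group averaging over the cyclic rotation group $C_n$. This is a standard textbook construction, swapped in here for illustration; it is not the framework proposed in the paper, whose construction is not described in the abstract. It also covers only a finite rotation group, not general planar groups with translations. The names `rotate` and `symmetrize_cn` are hypothetical.

```python
import math

def rotate(x, y, angle):
    """Rotate the point (x, y) counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return c * x - s * y, s * x + c * y

def symmetrize_cn(f, n):
    """Return g(x, y) = mean of f over the n rotations of the cyclic group C_n."""
    angles = [2.0 * math.pi * k / n for k in range(n)]

    def g(x, y):
        return sum(f(*rotate(x, y, a)) for a in angles) / n

    return g
```

Because the average runs over a whole group orbit, `g` takes the same value at a point and at any of its rotated copies, and it is continuous whenever `f` is.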

2606.02068 2026-06-02 cs.CV cs.AI

Fast and Lightweight Novel View Synthesis with Differentiable Multiplane Image

基于可微多平面图像的快速轻量级新视角合成

Kaidi Zhang, Guanxu Zhu

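AI summary: To address the shortcomings of existing methods in speed, model size, and sparse-view settings, proposes a fast and lightweight novel view synthesis method based on differentiable multiplane images (MPI), using point maps for geometric initialization and one-step diffusion to handle holes and artifacts.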

详情
AI中文摘要

近年来,新视角合成取得了显著进展,主流方法如神经辐射场(NeRF)和3D高斯泼溅(3DGS)产生了令人印象深刻的结果。然而,这些方法往往难以平衡渲染速度和模型大小,且其基于优化的训练可能非常耗时。此外,它们通常依赖于密集观测,在稀疏视角条件下往往无法产生令人满意的结果。尽管前馈重建显著减少了3DGS的优化时间,但其像素对齐公式从单张图像生成数百万个高斯,严重限制了其在移动设备上的实际部署。为了解决这些限制,我们重新审视了多平面图像(MPI)表示,该表示使用一组紧凑的平面层来表示场景,以实现高效的新视角合成。利用视觉基础模型的最新进展,我们使用预测的点图进行可靠的几何初始化,然后进行可微优化。为了解决稀疏初始化MPI中的空洞和伪影问题,我们引入了一步扩散,该扩散既参与MPI的可微优化,也参与渲染结果的后处理。与代表性的基于GS的方法相比,我们的方法速度快30.7%,模型大小仅为其14.8%,同时在前景场景中实现了具有竞争力的合成质量。

英文摘要

Recently, novel view synthesis has witnessed remarkable progress, with mainstream methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) delivering impressive results. However, these approaches often struggle to balance rendering speed and model size, and their optimization-based training can be highly time-consuming. Furthermore, they typically rely on dense observations, often failing to produce satisfactory results under sparse-view conditions. Although feed-forward reconstruction significantly reduces the optimization time of 3DGS, its pixel-aligned formulation generates millions of Gaussians from a single image, severely limiting its practical deployment on mobile devices. To address these limitations, we revisit the Multiplane Image (MPI) representation, which represents scenes using a compact set of planar layers for efficient novel view synthesis. Leveraging recent advances in visual foundation models, we utilize predicted point maps for reliable geometric initialization, followed by differentiable optimization. To address the issues of holes and artifacts in sparsely initialized MPI, we introduce one-step diffusion, which participates in both the differentiable optimization of MPI and the postprocessing of rendering results. Compared with a representative GS-based method, our approach is 30.7% faster and uses only 14.8% of its model size, while achieving competitive synthesis quality on front-view scenarios.
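
An MPI stores a scene as a stack of planar layers, each with a colour and an opacity. The sketch below shows the standard "over" compositing of such layers for a single pixel and a single colour channel, assuming the layers have already been warped into the target view. It illustrates the representation in general, not this paper's pipeline; the function name `composite_mpi` is hypothetical.

```python
def composite_mpi(layers):
    """Composite (color, alpha) layers ordered back to front with the over operator."""
    out = 0.0
    for color, alpha in layers:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        # A nearer layer covers what is behind it in proportion to its alpha.
        out = color * alpha + out * (1.0 - alpha)
    return out
```

Every step is a sum and a product, so the result is differentiable with respect to the layer colours and opacities, which is what makes gradient-based optimization of the layers possible.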

2606.02061 2026-06-02 cs.LG

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

消除原型:原型SAE的稳定性是初始化和度量设计的人为产物

Michał Brzozowski, Neo Christopher Chung

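AI summary: Shows experimentally that the stability claimed for archetypal sparse autoencoders stems from using identical initialization across training runs rather than from the archetypal constraint itself, and stresses that the distinction between stability and stabilization matters for interpretability research.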

详情
AI中文摘要

使用稀疏自编码器(SAE)的字典学习从神经网络激活中产生过完备基,这些基通常是可解释的,并减少了多义性。然而,不同随机种子的SAE特征差异很大——这个问题被称为不稳定性。原型SAE(Fel等人,2025)被提出作为一种通用的字典学习干预,用于更可靠的概念提取,并报告在训练结束时字典更稳定。我们证明原型SAE声称的稳定性是在多次运行中设置相同初始化的结果。通过我们的分析,我们试图澄清机械可解释性中可能模糊使用的两个不同概念:稳定性是两个独立训练模型之间的一致性,而稳定化是独立初始化的运行向共同解收敛。这种区分对于自然语言处理(NLP)的机械可解释性至关重要,其中特征稳定性越来越多地被用作SAE特征是可重用分析单元的证据。原型SAE的实验共享一个确定性的k-means解码器初始化,在训练开始前将运行间字典距离设为零。当移除这种初始化时,原型约束在我们的设置中没有提供稳定化优势。我们进一步发现了一个依赖于预处理的余弦几何问题,使端点稳定性指标的解释复杂化。总的来说,我们的研究支持在更大的字典学习传统中研究SAE的价值,同时表明稳定性声明需要轨迹诊断和初始化消融。

英文摘要

Dictionary learning with sparse autoencoders (SAEs) produces overcomplete bases from neural network activations that are often interpretable and reduces polysemanticity. However, features from SAEs vary substantially across random seeds -- a problem known as instability. Archetypal SAEs (Fel et al., 2025) were proposed as a general dictionary-learning intervention for more reliable concept extraction, and report more stable dictionaries at the end of training. We demonstrate that the stability claimed by archetypal SAEs is a result of setting identical initialization across multiple runs. Through our analyses, we attempt to clarify two distinct notions in mechanistic interpretability that may be ambiguously used: stability is agreement between two independently trained models, whereas stabilization is the convergence of independently initialized runs toward a common solution. This distinction is critical for mechanistic interpretability of natural language processing (NLP), where feature stability is increasingly used as evidence that SAE features are reusable units of analysis. Experiments from archetypal SAEs share a deterministic k-means decoder initialization, setting inter-run dictionary distance to zero before training begins. When this initialization is removed, the archetypal constraint provides no stabilization advantage in our setting. We further identify a preprocessing-dependent cosine geometry issue that complicates interpretation of endpoint stability metrics. Overall, our study supports the value of studying SAEs within the larger dictionary-learning tradition while showing that stability claims require trajectory diagnostics and initialization ablations.
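
The point about shared initialization can be illustrated with a simple agreement score between two dictionaries. The sketch below uses the mean best-match cosine similarity, a common choice that is assumed here for illustration; it is not necessarily the metric used in the paper, which itself reports that cosine geometry complicates the interpretation of such endpoint metrics. The names `cosine` and `dictionary_agreement` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dictionary_agreement(dict_a, dict_b):
    """Mean over atoms in dict_a of the best cosine match found in dict_b."""
    return sum(max(cosine(a, b) for b in dict_b) for a in dict_a) / len(dict_a)
```

Two runs that start from the same decoder initialization score 1.0 under this measure before any training step is taken, so a high end-of-training score does not by itself show that independently initialized runs converge to a common solution.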

2606.02058 2026-06-02 cs.CV cs.RO

TIDES: Time-Derivative Event Simulation via Deformable Reconstruction

TIDES:基于可变形重建的时间导数事件模拟

Christopher Thirgood, Dipon Kumar Ghosh, Simon Hadfield

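AI summary: Proposes TIDES, a continuous-time event simulator built on dynamic Gaussian splatting that derives per-pixel intensity dynamics from an explicit 3D scene representation for accurate threshold-crossing prediction, uses occlusion to guide adaptive time stepping, and achieves state-of-the-art event-stream fidelity.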

详情
AI中文摘要

事件相机响应环境外观变化而发出异步事件。真实世界事件数据集的稀缺使得模拟至关重要。然而,大多数模拟器从帧序列推断事件时间戳,迫使许多阈值交叉共享一小组离散时间;我们将这种失效模式称为时间戳批处理,它在快速运动和遮挡下会恶化。我们提出TIDES,一种基于动态高斯泼溅的连续时间事件模拟器。由于TIDES在具有学习几何和运动的显式3D场景表示上运行,它可以直接从场景推导每像素强度动态,而不是通过渲染帧的差分。这使得能够精确预测阈值交叉,包括每个渲染步骤的多次交叉,而无需时间上采样或帧插值。相同的3D场景模型揭示了物体之间部分遮挡的位置;TIDES利用这一点来指导自适应时间步长,仅将计算集中在遮挡动力学使简单亮度变化模型不可靠的区域。最后,我们使用瓦片级仲裁器对有限传感器带宽进行建模,其吞吐量、抖动和事件丢失再现了真实的传感器伪影。在配对的RGB-事件基准测试中,TIDES达到了最先进的事件流保真度。我们还表明,TIDES模拟的事件比竞争对手更有效地转移到真实下游任务。

英文摘要

Event cameras emit asynchronous events in response to environmental appearance changes. The scarcity of real-world event datasets makes simulation essential. However, most simulators infer event timestamps from frame sequences, forcing many threshold crossings to share a small set of discrete times, a failure mode we term timestamp batching that worsens under fast motion and occlusion. We present TIDES, a continuous-time event simulator built on dynamic Gaussian splatting. Because TIDES operates on an explicit 3D scene representation with learnt geometry and motion, it can derive per-pixel intensity dynamics directly from the scene, rather than by differencing rendered frames. This enables accurate threshold-crossing prediction, including multiple crossings per rendering step, without temporal upsampling or frame interpolation. The same 3D scene model reveals where objects partially occlude one another; TIDES uses this to guide adaptive time stepping, concentrating computation only in regions where occlusion dynamics make simple models of brightness change unreliable. Finally, we model finite sensor bandwidth using a tile-level arbiter whose throughput, jitter, and event drops reproduce realistic sensor artifacts. Across paired RGB-event benchmarks, TIDES attains state-of-the-art event-stream fidelity. We also show that events simulated by TIDES transfer more effectively to real downstream tasks than those of competing simulators.
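
To see why a per-pixel time derivative avoids timestamp batching, consider the simplest case of a pixel whose log intensity changes linearly within one rendering step. The sketch below computes the continuous crossing times under the usual contrast-threshold event model. It is a simplified illustration, not the TIDES algorithm: the linear-ramp assumption, the function name `crossing_times`, and the convention that the last event fired at `t0` are all assumptions.

```python
def crossing_times(slope, t0, t1, threshold):
    """Event times for a log-intensity ramp L(t) = L0 + slope * (t - t0).

    Assumes the last event fired at t0, so the k-th event fires once the
    accumulated change reaches k * threshold. Returns (time, polarity) pairs.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if slope == 0 or t1 <= t0:
        return []
    polarity = 1 if slope > 0 else -1
    dt = threshold / abs(slope)  # time needed to accumulate one threshold
    events = []
    k = 1
    while t0 + k * dt <= t1:
        events.append((t0 + k * dt, polarity))
        k += 1
    return events
```

A frame-differencing simulator would assign all crossings in the step to the frame time `t1`, whereas here each crossing receives its own timestamp inside the step.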

2606.02055 2026-06-02 cs.IT cs.LG cs.SI math.IT stat.ML

Query-Limited Community Recovery in Stochastic Block Models

随机块模型中的有限查询社区恢复

Sabyasachi Basu, Manuj Mukherjee, Lutz Oettershagen, Suhas Thejaswi

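AI summary: Studies exact community recovery in the two-community stochastic block model via adaptive query strategies under limited and noisy access to network data, and proves that adaptive querying can surpass the information-theoretic limits of the non-adaptive benchmark.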

详情
AI中文摘要

我们研究在 $n$ 个顶点上的两社区随机块模型中,对网络数据的有限且带噪访问下的精确社区恢复。学习器可以查询一个带噪的邻域预言机,该预言机独立地以固定概率揭示被查询顶点的每个真实邻居,且从不返回非邻居,受限于有限的查询预算。我们考虑仅预言机访问以及一个组合模型,其中学习器还观察底层图的单个子采样副本。对于仅预言机访问,平衡均匀查询给出了一个尖锐的非自适应基准:当每个顶点被查询相同整数次数时,观测结果简化为具有衰减边概率的 SBM,并且 Abbe-Bandeira-Hall 精确恢复阈值适用。我们证明该基准并非自适应最优:在平衡均匀查询需要 $m n$ 次查询(对于某个 $m>1$)的机制下,两阶段自适应策略以 $n+o(n)$ 次查询成功。对于额外的子采样图,我们证明了一个亚线性查询的自适应差距:预算为亚线性的平衡数据无关均匀查询不会比单独的子采样图有所改进,而自适应查询可以针对少量不确定顶点并实现精确恢复。因此,自适应数据采集可以严格改善精确恢复的信息论极限。

英文摘要

We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.
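
The noisy neighborhood oracle described above has a simple operational form: each true neighbor is revealed independently with a fixed probability, and non-neighbors are never returned. A minimal sketch follows; the function name `noisy_neighborhood_query`, the adjacency-dictionary representation, and the explicit `rng` argument are assumptions for illustration.

```python
import random

def noisy_neighborhood_query(adjacency, vertex, reveal_prob, rng):
    """Reveal each true neighbor of `vertex` independently with probability reveal_prob.

    `adjacency` maps each vertex to the set of its true neighbors, and `rng`
    is a random.Random instance. Non-neighbors are never returned.
    """
    return {u for u in adjacency[vertex] if rng.random() < reveal_prob}
```

A usage example: `noisy_neighborhood_query({0: {1, 2}, 1: {0}, 2: {0}}, 0, 0.5, random.Random(0))` returns some subset of `{1, 2}`. Each call counts against the query budget, which is why choosing which vertices to re-query adaptively can matter.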

2606.02054 2026-06-02 cs.AI

eMoT: evolving Memory-of-Thought via Symbolic Anchoring and Memory Corrosion

eMoT: 通过符号锚定和记忆腐蚀演化的思维记忆

Xiang Li, Jiwei Wei, Ke Liu, Yitong Qin, Jinyu Guo, Malu Zhang, Peng Wang, Yang Yang

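AI summary: Proposes the eMoT framework, which treats reasoning trajectories as dynamically evolving memories through three modules (memory corrosion, symbolic anchoring, and consistency-driven refinement) to stabilize multi-step reasoning and improve accuracy and consistency.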

详情
AI中文摘要

尽管大型语言模型(LLMs)在多步推理任务上取得了令人印象深刻的性能,但其可靠性仍然受到关键限制的阻碍,例如不受约束的幻觉和较差的数值计算。从根本上说,这些问题源于标准模型将推理视为一次性的瞬态生成过程,而不是保留并改进成功的程序逻辑。为了解决这些挑战,我们提出了eMoT(演化的思维记忆),这是一个统一框架,通过将推理轨迹视为动态演化的记忆而非静态模板来稳定多步推理。该框架主要由三个相互连接的模块组成:(i)记忆腐蚀机制,强化高效用推理结构,同时逐渐衰减较少使用的结构;(ii)符号锚定引擎,利用Python进行确定性计算,类似于人类使用计算器;(iii)一致性驱动的精炼过程,将神经推理与符号结果对齐,减少逻辑差异的累积。在多个推理基准上,eMoT相比标准的思维链和结构化推理基线提高了准确率和解决方案一致性。在传统任务Game of 24上,eMoT达到了100%的准确率,比基线高出17.6%。在数学任务GSM8K、ASDiv、SVAMP和MGSM上的评估进一步显示了在多步数学推理中的持续改进。在我们的评估中,尽管使用了轻量级骨干模型且基线能力受限,我们仍取得了优越的性能。与依赖大规模模型的替代方法相比,我们的结果表明性能提升根本上是由eMoT框架的推理控制驱动的,而非单纯的模型规模。

英文摘要

While Large Language Models (LLMs) achieve impressive performance on multi-step reasoning tasks, their reliability is persistently hindered by critical limitations such as unconstrained hallucinations and poor numerical computation. Fundamentally, these issues arise because standard models treat reasoning as a transient, one-off generation process rather than retaining and refining successful procedural logic. To address these challenges, we propose eMoT (evolving Memory-of-Thought), a unified framework that stabilizes multi-step reasoning by treating reasoning trajectories as dynamic, evolving memories rather than static templates. The framework primarily consists of three interconnected modules: (i) a memory corrosion mechanism that reinforces high-utility reasoning structures while gradually decaying less frequent ones; (ii) a symbolic anchoring engine that utilizes Python for deterministic computation, much like a human uses a calculator; and (iii) a consistency-driven refinement process that aligns neural inference with symbolic outcomes, reducing the accumulation of logical discrepancies. Across multiple reasoning benchmarks, eMoT improves accuracy and solution consistency over standard Chain-of-Thought and structured reasoning baselines. On the traditional Game of 24 task, eMoT achieves 100% accuracy, surpassing the baseline by up to 17.6%. Evaluations on the mathematical tasks GSM8K, ASDiv, SVAMP, and MGSM further show consistent gains in multi-step mathematical reasoning. In our evaluation, we achieve superior performance despite utilizing a lightweight backbone model with constrained baseline capabilities. Compared to alternative methods that rely on massively scaled models, our results demonstrate that the performance gains are fundamentally driven by the eMoT framework's reasoning control rather than sheer model size.
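
The symbolic anchoring idea, delegating arithmetic to deterministic Python rather than to the language model, can be illustrated with a small exact checker for Game of 24 candidates. This is a sketch under stated assumptions, not the eMoT engine: the names `evaluate` and `solves_24` are hypothetical, and the checker only verifies the value of an expression, not that it uses the four given numbers.

```python
import ast
import operator
from fractions import Fraction

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(expr):
    """Exactly evaluate an expression built from integers, + - * / and parentheses."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)  # exact rational arithmetic
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def solves_24(expr):
    """True when the expression evaluates to exactly 24."""
    return evaluate(expr) == 24
```

Using `Fraction` instead of floats keeps intermediate results such as 8/3 exact, so a candidate like `8/(3-8/3)` is judged correctly rather than failing on rounding error.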

2606.02049 2026-06-02 cs.AI

Explainable Data-driven Deep Reinforcement Learning Methods for Optimal Energy Management in Buildings

面向建筑最优能量管理的可解释数据驱动深度强化学习方法

Hallah Shahid Butt, Qiong Huang, Gökhan Demirel, Kevin Förderer, Erfan Tajalli-Ardekani, Simnon Waczowicz, Luigi Spatafora, Veit Hagenmeyer, Benjamin Schäfer

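AI summary: Proposes an explainable deep reinforcement learning framework that trains on-policy and off-policy algorithms on real-world data and uses post-hoc explanation techniques to expose the battery-management decision process, reducing costs while improving transparency.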

详情
AI中文摘要

可再生能源在电力系统中的日益普及,特别是在配备光伏板和储能系统的建筑中,引入了能源系统的显著复杂性。波动的发电量、变化的电价以及增加的实体(如光伏系统和热泵)增加了复杂性,使系统更难运行。这导致了对额外控制和优化路径的需求,包括基于数据的控制,如强化学习。虽然深度强化学习已成为在动态且日益复杂的环境中优化建筑运营的有前景的解决方案,但其黑箱特性阻碍了用户信任和实际应用。本文提出了一种应用于住宅建筑能量管理的可解释深度强化学习框架。我们在合成数据以及来自KIT Living Lab Energy Campus的真实数据上展示了其使用。我们在扩展的状态空间上训练并比较了策略内和离策略的DRL智能体,该状态空间包含实时测量(需求、光伏发电、电池功率、荷电状态)、外部信号(动态电价、本地天气数据)、日历和假日指标以及需求和价格预测。我们的实验结果表明,策略内算法,特别是优势演员-评论家和近端策略优化,在累积奖励和策略稳定性方面优于离策略方法。为了解释这些模型,我们采用事后解释技术来阐述学到的控制策略。我们的发现表明,XRL框架不仅通过最优电池管理降低了电力成本,还提供了对智能体决策过程的透明、可操作的见解。

英文摘要

The increasing integration of renewable energy sources into power systems, particularly in buildings equipped with photovoltaic (PV) panels and energy storage systems, introduces significant complexity in energy systems. Volatile power generation, varying electricity tariffs, and a growing number of entities, e.g., PV systems and heat pumps, have increased the complexity and made the system harder to operate. This creates demand for additional control and optimization approaches, including data-based controls such as reinforcement learning. While deep reinforcement learning (DRL) has emerged as a promising solution to optimize building operations in dynamic and ever more complex environments, its black-box nature impedes user trust and practical adoption. This paper presents a framework for explainable deep reinforcement learning (XRL) applied to energy management in residential buildings. We demonstrate its usage on both synthetic data and real-world data from the Living Lab Energy Campus (LLEC) at KIT. We train and compare both on-policy and off-policy DRL agents on an expanded state space that incorporates real-time measurements (demand, PV generation, battery power, state of charge), external signals (dynamic electricity price, local weather data), calendrical and holiday indicators, and forecasts for demand and price. Our experimental results indicate that on-policy algorithms, particularly Advantage Actor Critic (A2C) and Proximal Policy Optimization (PPO), outperform off-policy methods in terms of cumulative rewards and policy stability. To explain these models, we employ post-hoc interpretation techniques to elucidate the learned control policies. Our findings demonstrate that the XRL framework not only reduces electricity costs through optimal battery management, but also provides transparent, actionable insights into the agent's decision-making process.

2606.02048 2026-06-02 cs.AI cs.CV physics.bio-ph

Topological texture analysis of microscopy images of dynamic casein gelation and its relation to rheological properties

动态酪蛋白凝胶化显微图像拓扑纹理分析及其与流变学性质的关系

Zahra Tabatabaei, Diana Soto Aguilar, Jose C. Bonilla, Mathias P. Clausen, Jon Sporring

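AI summary: Proposes a toolbox combining topological data analysis, differential box counting, multifractal partition, and local binary patterns to analyze topological and texture features of casein gelation in STED microscopy images, revealing microstructural transitions related to rheological properties.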

详情
AI中文摘要

我们提出了一种新颖的计算工具箱,集成了拓扑数据分析(TDA)、差分盒计数(DBC)、多重分形分割(MFP)和局部二值模式(LBP),应用于由葡萄糖酸-δ-内酯(GDL)在30°C和40°C以及两种GDL浓度(1.8%和3.5% w/v)下诱导的酪蛋白酸钠凝胶化的时间序列超分辨率STED显微图像。TDA通过最大Betti-1曲线追踪拓扑环,即反映蛋白质网络互连性的封闭环状结构,揭示了分散聚集体的滞后阶段、与网络渗透和流变学观察到的溶胶-凝胶转变相一致的急剧衰减,以及对应于网络重排的凝胶后增加。这些拓扑转变通过DBC和MFP得到证实,因为这些方法能够解析结构复杂性和空间异质性的变化。该工具箱在实验应用前在模拟分形图像上进行了验证。总之,这些描述符对体相流变学作为平均体相力学响应捕获的细微微观结构转变具有敏感性。这种集成方法为表征食品和材料科学中具有演化微观结构动力学的复杂微观结构提供了稳健的定量工具。代码可在https://github.com/Zahratabatabaei/Delifood_CV_paper.git获取。

英文摘要

We propose a novel computational toolbox that integrates Topological Data Analysis (TDA), Differential Box Counting (DBC), Multifractal Partition (MFP), and Local Binary Patterns (LBP), applied to time-lapse super-resolution STED microscopy images of sodium caseinate gelation induced by glucono-delta-lactone (GDL) at 30 °C and 40 °C and two GDL concentrations (1.8% and 3.5% w/v). TDA tracked topological loops, closed ring-like structures reflecting protein network interconnectivity, via max-Betti-1 curves, which revealed a lag phase of dispersed aggregates, a sharp decay coinciding with network percolation and the rheologically observed sol-gel transition, and a post-gelation increase corresponding to network rearrangements. These topological transitions were corroborated by DBC and MFP as these methods were able to resolve changes in structural complexity and spatial heterogeneity. The toolbox was validated on simulated fractal images prior to experimental application. Together, these descriptors provided sensitivity to subtle microstructural transitions that bulk rheology captured as averaged bulk mechanical responses. This integrated approach provides a robust quantitative tool for characterizing complex microstructure in food and material science with evolving microstructural dynamics. Code is available at https://github.com/Zahratabatabaei/Delifood_CV_paper.git
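
Of the four descriptors in the toolbox, Local Binary Patterns is the simplest to show in a few lines. The sketch below computes the basic 8-neighbour LBP code for one interior pixel. It is a generic illustration rather than code from the linked repository; the function name `lbp_code`, the clockwise bit order starting at the top-left neighbour, and the `>=` comparison are assumptions, since conventions differ between implementations.

```python
def lbp_code(image, row, col):
    """8-neighbour local binary pattern code for an interior pixel.

    Neighbours are visited clockwise starting at the top-left; a neighbour
    contributes a 1 bit when its value is >= the centre value.
    """
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code
```

A texture descriptor is then typically formed from the histogram of these codes over an image region.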

2606.02047 2026-06-02 stat.ML cs.LG math.ST stat.ME stat.TH

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

凸距离算子传输:一种凸且保持几何的公式

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

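AI summary: Proposes Convex Distance Operator Transport (CDOT), which aligns heterogeneous distributions by jointly preserving feature correspondence and intrinsic geometric structure through operator-based regularization, and proves its pseudometric property and its relationship to Gromov-Wasserstein.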

详情
Comments
This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026
AI中文摘要

我们引入了凸距离算子传输(CDOT),这是第一个凸最优传输框架,通过联合保持特征对应和内在几何结构来对齐异质域中的分布。具体来说,CDOT采用基于算子的正则化,通过引入距离算子和条件期望算子来对齐聚合的距离结构。因此,所提出的正则化提高了对局部几何变化的鲁棒性。我们进一步证明了得到的CDOT差异是带属性的紧度量测度空间上的有效伪度量。此外,我们通过一个新的色散间隙概念刻画了CDOT与Gromov-Wasserstein(GW)之间的关系,正式阐明了GW非凸性相对于CDOT凸性的几何来源。在有限样本情况下,我们推导了一个非渐近风险界,分解为优化误差和统计误差,并在全局收敛的Frank-Wolfe算法下建立了风险一致性。在合成点云、脑连接组和图分类基准上的实验表明,该方法优于现有方法,在实践中表现稳定可靠。

英文摘要

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov-Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank-Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance than existing methods, with stable and reliable behavior in practice.

2606.02045 2026-06-02 cs.CV cs.AI

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

域偏移下基于注意力机制和迁移学习的鲁棒桃叶损伤分类

Adrián Cánovas-Rodriguez, Miguel A. González-Illán, Maria Fernanda García-Cruz, Pedro Nortes Tortosa, José Salvador Rubio-Asensio, Miguel A. Zamora Izquierdo, Juan Antonio Martínez Navarro, Antonio F. Skarmeta

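AI summary: Proposes a peach leaf damage classification method based on attention mechanisms and transfer learning; a CBAM-enhanced EfficientNet reaches 93.3% accuracy on the public dataset, and transfer learning achieves a 93% macro F1-score on a local dataset, effectively handling domain shift.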

详情
AI中文摘要

人工智能为从图像数据评估作物损伤提供了实用框架,支持农业管理中的早期决策。在桃园中,气候变化增加了非生物胁迫和生物压力,包括病虫害,这些通常产生视觉上相似的叶片症状。这种重叠使得手动诊断变得困难,尤其是在不同环境条件下的多个田地中,凸显了对具有强泛化能力的自动化模型的需求。 我们提出了一种基于图像的桃叶损伤检测分类方法。通过手动标注公开图像创建了一个基准数据集,包含六个损伤类别的1,366片桃叶。评估了几种深度学习架构。EfficientNet模型取得了最佳结果,其中EfficientNetB0达到92.9%的准确率,EfficientNetB3达到91.5%,EfficientNetB5在少数类上表现最强。DenseNet121达到92.6%的准确率。卷积块注意力模块(CBAM)的集成在多个骨干网络中提升了性能,特别是在EfficientNetB5和InceptionV3中,而在其他网络中效果有限或为负。CBAM增强的EfficientNetB5取得了93.3%的最佳总体准确率。 为了评估在现实条件下的鲁棒性,收集了一个包含四个类别180张图像的本地数据集,并应用迁移学习策略来解决域偏移。测试了三种微调策略。结合CBAM的EfficientNetB3在本地域中取得了最佳性能,迁移后宏F1分数达到93%。总体而言,基于注意力的模型在少数类上表现出更强的鲁棒性,并在不同田间条件下具有更好的泛化能力。

英文摘要

Artificial intelligence provides a practical framework for crop damage assessment from imagery data, supporting early decision-making in agricultural management. In peach orchards, climate change increases abiotic stress and biotic pressures, including pests and diseases, which often produce visually similar foliar symptoms. This overlap makes manual diagnosis difficult, especially across multiple fields with varying environmental conditions, highlighting the need for automated models with strong generalization ability. We propose an image-based classification approach for peach leaf damage detection. A benchmark dataset was created through manual annotation of publicly available images, consisting of 1,366 peach leaves across six damage categories. Several deep learning architectures were evaluated. EfficientNet models achieved the best results, with EfficientNetB0 reaching 92.9 percent accuracy, EfficientNetB3 achieving 91.5 percent, and EfficientNetB5 showing the strongest performance on minority classes. DenseNet121 reached 92.6 percent accuracy. The integration of the Convolutional Block Attention Module (CBAM) improved performance in several backbones, particularly EfficientNetB5 and InceptionV3, while showing limited or negative impact in others. The CBAM-enhanced EfficientNetB5 achieved the best overall accuracy of 93.3 percent. To evaluate robustness under realistic conditions, a local dataset of 180 images across four classes was collected, and transfer learning strategies were applied to address domain shift. Three fine-tuning strategies were tested. EfficientNetB3 combined with CBAM achieved the best performance in the local domain, reaching a 93 percent macro F1-score after transfer. Overall, attention-based models showed improved robustness for minority classes and better generalization across different field conditions.
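
The abstract reports both accuracy and macro F1-score, and the difference matters when some damage classes are rare. The sketch below computes macro F1 from label lists using the standard definition; the function name `macro_f1` and the toy labels in the usage note are illustrative and are not taken from the paper's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over the classes present in y_true."""
    classes = sorted(set(y_true))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with true labels `[0, 0, 0, 1]` and predictions `[0, 0, 0, 0]`, accuracy is 0.75 but macro F1 is only 3/7, because the minority class is missed entirely.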

2606.02042 2026-06-02 cs.CV

Normality-Preserving Continual Industrial Anomaly Detection via Orthogonal LoRA Banks

通过正交LoRA库保持正态性的持续工业异常检测

Weibai Fang, Haijun Che, Feiyang Ren, Qiancheng Lao

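AI summary: Proposes a framework based on a History Frozen Orthogonal LoRA Bank and a Hierarchical Novelty Adaptive Bank Growth module to address historical normality prior drift and catastrophic forgetting of diffusion models in continual industrial anomaly detection.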

详情
Comments
33 pages, 6 figures, submitted to Advanced Engineering Informatics
AI中文摘要

基于扩散模型的持续工业异常检测面临历史正态先验漂移和灾难性遗忘问题。现有的持续扩散方法通过回放或约束优化保留先前知识,但缺乏在顺序适应过程中隔离和保护类别特定正态先验的显式机制。尽管低秩适应提供了模块化残差更新,但标准LoRA既未冻结历史正态子空间,也未阻止新适配器干扰先前适配器。为解决此问题,我们提出基于两个模块的正态保持持续异常检测框架:历史冻结正交LoRA库(HF-OLB)和分层新颖性自适应库增长模块(HNABG)。HF-OLB冻结预训练的U-Net主干和已学习的LoRA库,并将新任务特定的正态残差约束到历史LoRA子空间的正交补空间中。HNABG进一步分配层依赖的残差容量,并仅在残差正态新颖性超过现有库的表达容量时扩展库。在MVTec和VisA上的大量实验证明了所提方法的有效性。在具有挑战性的VisA 2x6设置下,我们的方法实现了83.6/91.8的图像和像素级A-AUROC,以及3.8/3.9的FM,将像素级A-AUROC提升了3.2个百分点,同时将像素级FM降低了1.3。这些结果表明,我们的方法在长时间跨度的持续类别序列中有效保留了历史正态先验。

英文摘要

Continual industrial anomaly detection with diffusion models suffers from historical normality prior drift and catastrophic forgetting. Existing continual diffusion methods preserve previous knowledge through replay or constrained optimization, but they lack an explicit mechanism for isolating and protecting category-specific normality priors during sequential adaptation. Although low-rank adaptation provides modular residual updates, standard LoRA neither freezes historical normality subspaces nor prevents new adapters from interfering with previous ones. To address this issue, we propose a normality-preserving continual anomaly detection framework based on two modules: the History Frozen Orthogonal LoRA Bank (HF-OLB) and the Hierarchical Novelty Adaptive Bank Growth module (HNABG). HF-OLB freezes both the pre-trained U-Net backbone and the learned LoRA banks, and constrains new task-specific normality residuals to the orthogonal complement of historical LoRA subspaces. HNABG further allocates layer-dependent residual capacity and expands the bank only when the residual normality novelty exceeds the expressive capacity of existing banks. Extensive experiments on MVTec and VisA demonstrate the effectiveness of the proposed method. On the challenging VisA 2x6 setting, our method achieves 83.6/91.8 image- and pixel-level A-AUROC with 3.8/3.9 FM, improving pixel-level A-AUROC over the state of the art by 3.2 points while reducing pixel-level FM by 1.3. These results show that our method effectively preserves historical normality priors in long-horizon continual category sequences.
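
The constraint that new residuals live in the orthogonal complement of historical subspaces can be illustrated on plain vectors. The sketch below removes from an update every component that lies in the span of a set of frozen directions, using Gram-Schmidt. It is a linear-algebra illustration only: how HF-OLB applies the constraint to LoRA matrices during training is not specified in the abstract, and the names `orthonormalize` and `project_to_complement` are hypothetical.

```python
def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(x * y for x, y in zip(w, b))
            w = [x - coeff * y for x, y in zip(w, b)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:  # skip directions already covered by the basis
            basis.append([x / norm for x in w])
    return basis

def project_to_complement(update, historical):
    """Remove from `update` every component lying in span(historical)."""
    out = list(update)
    for b in orthonormalize(historical):
        coeff = sum(x * y for x, y in zip(out, b))
        out = [x - coeff * y for x, y in zip(out, b)]
    return out
```

After projection the update has zero inner product with every historical direction, so applying it cannot change the model's response along those frozen directions.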

2606.02041 2026-06-02 cs.CL

SentGuard: Sentence-Level Streaming Guardrails for Large Language Models

SentGuard:面向大型语言模型的句子级流式护栏

Jiaqi Yu, Xin Wang, Yixu Wang, Jie Li, Yan Teng, Xingjun Ma, Yingchun Wang

AI总结 提出SentGuard,一种与生成并行运行的句子级流式护栏,通过轻量级等待缓冲区将流式令牌分组为句子块并仅释放已验证块,以在低延迟下实现高精度不安全内容检测。

详情
Comments
16 pages, 5 figures, submitted to ARR
AI中文摘要

大型语言模型越来越多地实时流式输出长篇幅、推理密集的响应,这使得何时进行审核与是否进行审核同样关键。现有的护栏分为两种不理想的极端:响应级方法延迟干预直到完整输出生成,而令牌级方法基于不完整的语义进行操作,往往产生不稳定的决策和过多的护栏调用。为应对这一挑战,我们提出SentGuard,一种与生成并行运行的句子级流式护栏。一个轻量级等待缓冲区将流式令牌分组为句子块,并仅向用户释放已验证的块,引入一个小偏移量,使得SentGuard能够在目标LLM解码后续内容时评估当前前缀。为支持这一点,我们构建了StreamSafe基准,包含8个危害类别的结构化逐句标注,捕捉推理和响应段中安全风险的演变。我们进一步使用从粗到细的目标训练SentGuard,以在不安全意图在句子边界出现时立即检测。在5个安全基准上的实验表明,SentGuard优于现有基线,在两个句子内检测到90.5%的不安全案例,同时保持7.41%的低流式误报率。

英文摘要

Large language models increasingly stream long, reasoning-intensive responses in real time, making when to moderate as critical as whether to moderate. Existing guardrails fall into two unsatisfactory extremes: response-level methods delay intervention until the full output is generated, whereas token-level methods act on incomplete semantics, often producing unstable decisions and excessive guard invocations. To address this challenge, we propose SentGuard, a sentence-level streaming guardrail that operates in parallel with generation. A lightweight waiting buffer groups streamed tokens into sentence chunks and releases only verified chunks to the user, introducing a small offset that enables SentGuard to assess the current prefix while the target LLM decodes subsequent content. To support this, we construct StreamSafe, a benchmark with structured per-sentence annotations across 8 harm categories, capturing the evolution of safety risks across both reasoning and response segments. We further train SentGuard with a coarse-to-fine objective to detect unsafe intent as soon as it emerges at sentence boundaries. Experiments on 5 safety benchmarks show that SentGuard outperforms existing baselines, detecting 90.5% of unsafe cases within two sentences while maintaining a low streaming false-positive rate of 7.41%.
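
The waiting-buffer behaviour described above, grouping streamed tokens into sentence chunks and releasing only verified chunks, can be sketched as a generator. This is a sequential simplification and not the SentGuard system: the real guard runs in parallel with decoding, whereas here `is_safe` is a hypothetical callable standing in for the guard model, and sentence boundaries are detected with a naive punctuation rule.

```python
def release_verified_sentences(tokens, is_safe, terminators=(".", "!", "?")):
    """Group streamed tokens into sentences and yield only verified ones.

    `is_safe` receives the full text so far (released text plus the pending
    sentence). Streaming stops at the first rejected sentence.
    """
    released, pending = "", ""
    for token in tokens:
        pending += token
        if pending.rstrip().endswith(terminators):
            if not is_safe(released + pending):
                return  # withhold the pending sentence and everything after it
            yield pending
            released += pending
            pending = ""
    # Flush a trailing fragment that never reached a terminator.
    if pending and is_safe(released + pending):
        yield pending
```

The user only ever sees whole sentences that passed the check, at the cost of a delay of at most one sentence relative to the raw token stream.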

2606.02038 2026-06-02 physics.app-ph cs.LG

Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints

部署约束下基于不确定性感知图神经网络的稀疏传感器城市温度场重建

Reda Snaiki, Abdelatif Merabtine

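AI summary: Proposes an uncertainty-aware graph neural network framework that reconstructs daily maximum temperature fields from sparse sensors, supports distance-constrained sensor placement and probabilistic exceedance mapping, and outperforms traditional methods in a Montreal-area evaluation.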

详情
AI中文摘要

从稀疏观测重建空间连续的每日温度场对于城市气候监测和热风险分析至关重要,但实际部署受限于传感器预算和间距约束。本研究提出一种不确定性感知图神经网络(GNN)框架,用于从稀疏传感器重建每日最高温度场,同时支持距离约束的传感器放置和概率超标映射。该模型使用基于图注意力的均值残差架构,通过高斯负对数似然训练,预测温度场和空间变化的预测不确定性场。传感器放置采用基于QR分解的本征正交分解(POD-QR)策略,并施加4公里最小传感器间距约束,与随机可行放置和最远点采样进行比较。该框架在蒙特利尔区域多边形上使用Daymet v4.1每日温度数据(1公里分辨率)进行评估,采用严格的时间留出协议(训练:2020-2023;测试:2024)。在传感器预算(10-40个传感器)下,所提出的GNN在未观测节点上的RMSE和MAE始终优于反距离加权和普通克里金法。传感器放置效应在低预算时最显著,在高预算时减弱,在施加间距约束下,约30个传感器时出现实际饱和状态。概率评估进一步显示,随着传感器密度增加,不确定性校准得到改善,并且比克里金法具有更好的锐度-校准权衡。这些结果支持所提出的框架作为不确定性感知温度场重建和面向决策的热风险映射的有效工具。

英文摘要

Reconstructing spatially continuous daily temperature fields from sparse observations is important for urban climate monitoring and heat-risk analysis, but practical deployments are limited by sensor budgets and spacing constraints. This study proposes an uncertainty-aware graph neural network (GNN) framework for reconstructing daily maximum temperature fields from sparse sensors while supporting distance-constrained sensor placement and probabilistic exceedance mapping. The model predicts both the temperature field and a spatially varying predictive uncertainty field using a graph-attention-based mean-residual architecture trained with a Gaussian negative log-likelihood. Sensor placement is addressed using a Proper Orthogonal Decomposition with QR factorization (POD-QR) strategy with a 4 km minimum inter-sensor distance constraint and is compared with random feasible placement and farthest-point sampling. The framework is evaluated over a Montreal-area polygon using Daymet v4.1 daily temperature data (1 km resolution) under a strict temporal hold-out protocol (training: 2020-2023; testing: 2024). Across sensor budgets (10-40 sensors), the proposed GNN consistently outperforms inverse distance weighting and ordinary kriging in RMSE and MAE on unobserved nodes. Sensor-placement effects are most pronounced at low budgets and diminish at higher budgets, with a practical saturation regime emerging around 30 sensors under the imposed spacing constraint. Probabilistic evaluation further shows improved uncertainty calibration with increasing sensor density and a better sharpness-calibration trade-off than kriging. These results support the proposed framework as an effective tool for uncertainty-aware temperature field reconstruction and decision-oriented heat-risk mapping.
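
Two ingredients named in the abstract have closed forms that are easy to state: the Gaussian negative log-likelihood used for training, and the exceedance probability that a predicted temperature is above a threshold. The sketch below writes both for a single node with predicted mean and standard deviation. The function names and the assumption of a per-node Gaussian predictive distribution with standard deviation `sigma` are illustrative; the model's actual output parameterization is not given in the abstract.

```python
import math

def gaussian_nll(y, mean, sigma):
    """Negative log-likelihood of y under N(mean, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mean) ** 2 / (2.0 * sigma ** 2)

def exceedance_probability(mean, sigma, threshold):
    """P(T > threshold) when T ~ N(mean, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (threshold - mean) / sigma
    # Upper-tail probability of the standard normal, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

For instance, a node predicted at 30 °C with a standard deviation of 2 °C has roughly a 2.3% chance of exceeding 34 °C, and evaluating this at every node gives an exceedance map.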

2606.02035 2026-06-02 cs.AI cs.LG

RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network

RL-ACRGNet:基于强化学习的胸部放射学报告生成网络

Yogesh Kumar Meena, Saurabh Agarwal, K. V. Arya

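AI summary: Proposes RL-ACRGNet, an off-policy reinforcement learning framework that combines a pre-trained DenseNet encoder with a multilevel LSTM decoder and refines visual-semantic embeddings through a metric-based reward mechanism. It outperforms baselines on the IU-Xray dataset, and evaluations on MIMIC-CXR confirm that it generalises and generates high-quality clinical reports.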

Details
Comments
This work has been submitted to the IEEE for possible publication
AI Chinese abstract

医学影像解读是现代临床诊断的基石,然而手动生成放射学报告既耗时又容易出现解读不一致。在医学AI领域,通过深度学习自动化这些描述有望简化临床工作流程并标准化诊断输出。然而,由于在捕获细粒度视觉特征和确保临床连贯性方面的局限性,准确的疾病检测和精确的报告生成仍然是重大挑战。为了解决这些问题,我们提出了RL-ACRGNet,一种改进的编码器-解码器模型,它将预训练的DenseNet编码器与多级LSTM解码器集成在离策略强化学习框架中。通过使用双网络方法,基于度量奖励机制细化视觉语义嵌入,我们证明RL-ACRGNet在IU-Xray数据集上持续优于最先进的基线,在BLEU-4(0.47%)、METEOR(0.17%)和ROUGE-L(0.518)上取得了定量改进。此外,在大规模MIMIC-CXR数据集上的综合评估证实了该模型的稳健泛化能力及其生成高质量、临床相关报告的能力。

English abstract

Medical imaging interpretation is a foundational pillar of modern clinical diagnostics, yet the manual generation of radiology reports remains a time-consuming process prone to interpretation inconsistencies. Within the field of medical AI, automating these descriptions through deep learning promises to streamline clinical workflows and standardise diagnostic output. However, accurate disease detection and precise report generation remain significant challenges due to limitations in capturing fine-grained visual features and ensuring clinical coherence. To address these issues, we propose RL-ACRGNet, an improved encoder-decoder model that integrates a pre-trained DenseNet encoder with a multilevel LSTM decoder within an off-policy reinforcement learning framework. Using a dual-network approach to refine visual-semantic embeddings through a metric-based reward mechanism, we demonstrate that RL-ACRGNet consistently outperforms state-of-the-art baselines on the IU-Xray dataset, achieving quantitative improvements in BLEU-4 (0.47%), METEOR (0.17%) and ROUGE-L (0.518). Furthermore, comprehensive evaluations on the large-scale MIMIC-CXR dataset confirm the robust generalisation of the model and its ability to generate high-quality, clinically relevant reports.