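arXivDaily: a daily academic digest that syncs the full arXiv feed and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.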

2606.02117 2026-06-02 stat.ML cs.LG stat.ME

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

Tingting Wang, Yunyi Zhang, Benyou Wang

AI summary: Proposes ProbRes, a post-hoc probabilistic calibration method that improves probabilistic forecasts by explicitly learning volatility dynamics, handles heteroskedastic data effectively, and is validated both theoretically and experimentally.

Abstract

Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During training, ProbRes employs two architecture-agnostic modules to separately model the conditional mean and conditional volatility. At the inference stage, it generates predictive distributions by resampling normalized residuals. ProbRes is applicable to both univariate and multivariate time series and remains robust under a wide range of error distributions, including non-Gaussian innovations with conditional heteroskedasticity. Theoretical results demonstrate ProbRes's validity and experiments on both synthetic and real-world datasets show that ProbRes accurately captures predictive distributions and produces well-calibrated prediction intervals.
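
The train-then-resample recipe described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the AR(1) mean model, the EWMA volatility model, and every function name below are hypothetical stand-ins for the two architecture-agnostic modules.

```python
import random


def fit_ar1(y):
    """Least-squares AR(1) fit; a hypothetical stand-in for the mean module."""
    x, target = y[:-1], y[1:]
    mx, mt = sum(x) / len(x), sum(target) / len(target)
    slope = sum((a - mx) * (b - mt) for a, b in zip(x, target)) / sum(
        (a - mx) ** 2 for a in x
    )
    return mt - slope * mx, slope  # (intercept, slope)


def ewma_vol(resid, lam=0.94):
    """EWMA volatility; a hypothetical stand-in for the volatility module.

    Returns len(resid) + 1 values. sigma[t] is updated only with residuals
    before t (the starting level is the full-sample mean square), and the
    last entry is the one-step-ahead volatility.
    """
    var = sum(r * r for r in resid) / len(resid)
    sigmas = []
    for r in resid:
        sigmas.append(var ** 0.5)
        var = lam * var + (1 - lam) * r * r
    sigmas.append(var ** 0.5)
    return sigmas


def forecast_samples(y, n_samples=2000, seed=0):
    """One-step predictive samples: mean + volatility * resampled residual."""
    rng = random.Random(seed)
    c, phi = fit_ar1(y)
    resid = [b - (c + phi * a) for a, b in zip(y[:-1], y[1:])]
    sigmas = ewma_vol(resid)
    z = [r / s for r, s in zip(resid, sigmas)]  # standardized residuals
    mu_next = c + phi * y[-1]
    return sorted(mu_next + sigmas[-1] * rng.choice(z) for _ in range(n_samples))


def interval(samples, level=0.9):
    """Central prediction interval from sorted samples."""
    lo = samples[int((1 - level) / 2 * len(samples))]
    hi = samples[int((1 + level) / 2 * len(samples)) - 1]
    return lo, hi


# Toy heteroskedastic series: the noise scale jumps halfway through.
rng = random.Random(1)
series = [0.0]
for t in range(300):
    scale = 0.5 if t < 150 else 2.0
    series.append(0.6 * series[-1] + scale * rng.gauss(0, 1))
samples = forecast_samples(series)
low, high = interval(samples)
print(f"90% one-step interval: [{low:.2f}, {high:.2f}]")
```

Resampling the standardized residuals, rather than assuming a Gaussian, is what lets the predictive distribution inherit skew or heavy tails from the data.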

2606.02115 2026-06-02 stat.ML cs.LG

Error Bounds for a Diffusion Model-Based Drift Estimator

Ioar Casado-Telletxea, Omar Rivasplata

AI summary: For drift estimation in stochastic differential equations with a known diffusion parameter, uses diffusion model theory to derive an explicit risk bound on the time-averaged mean-squared error, decomposing the risk into four terms: discretization, score approximation, noise initialization, and sampling variance.

Comments: Preprint

Abstract

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.
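
To make the setting concrete, here is a minimal sketch of the data the abstract describes: discrete samples from multiple trajectories generated by Euler-Maruyama with a known diffusion parameter. The drift fit shown is a naive least-squares baseline on increments, not the diffusion-model estimator of Tapia Costa et al.; the Ornstein-Uhlenbeck drift and all names are assumptions chosen for illustration.

```python
import random


def simulate(drift, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = drift(X) dt + sigma dW."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        path.append(x + drift(x) * dt + sigma * dt ** 0.5 * rng.gauss(0, 1))
    return path


def fit_linear_drift(paths, dt):
    """Naive baseline: least-squares slope a in drift(x) ~ a * x, no intercept."""
    num = den = 0.0
    for p in paths:
        for x, x_next in zip(p[:-1], p[1:]):
            num += x * (x_next - x) / dt  # increment / dt is a noisy drift value
            den += x * x
    return num / den


rng = random.Random(0)
theta = 1.0  # hypothetical true mean-reversion rate
paths = [simulate(lambda x: -theta * x, 1.0, 2.0, 0.01, 100, rng) for _ in range(200)]
slope = fit_linear_drift(paths, 0.01)
print(f"estimated slope: {slope:.3f} (true value {-theta})")
```

Even in this toy version the step size and the number of trajectories trade off against each other, which is the kind of interaction between error sources that the risk bound quantifies.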

2606.02113 2026-06-02 cs.CL cs.AI

A Primer in Post-Training Reasoning Data: What We Know About How It Works

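Yaoming Li, Guangxiang Zhao, Qilong Shi, Lin Sun, Xiangzheng Zhang, Tong Yang

AI summary: Surveys the types, utility, construction methods, and scaling laws of post-training reasoning data, and offers an attribution framework for future reasoning-data releases and post-training recipes.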

2606.02111 2026-06-02 cs.CV cs.AI cs.CL

Jailbreaking Multimodal Large Language Models using Multi-Clip Video

Choongwon Kang, Seungjong Sun, Hyunmin Jun, Jang Hyun Kim

AI summary: Introduces the MCV SafetyBench dataset, which uses multi-clip videos to evaluate safety vulnerabilities in multimodal large language models; finds that the video modality is more vulnerable than the image modality and that dynamic, diverse contexts raise attack success rates; and proposes a defense strategy based on the robustness of the image modality.

Comments: 27 pages, 20 figures, Accepted to the Main Conference of ACL 2026

Abstract

As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for malicious misuse. Prior jailbreak studies have shown that safety alignment in MLLMs can be bypassed through visual inputs, yet it remains unclear which properties of video inputs induce this vulnerability. To address this gap, we introduce Multi-Clip Video (MCV) SafetyBench, a dataset of 2,920 videos designed to evaluate how the diversity of video inputs affects the vulnerability of MLLMs. Each video consists of multiple short clips depicting diverse contexts related to a harmful query. Experiments on eight representative video MLLMs show that attack success consistently increases with the number of clips. Our results further indicate that the video modality is (1) more vulnerable than the image modality, (2) more vulnerable to dynamic videos than to static videos, and (3) more vulnerable when videos contain more diverse contexts. Building on these findings, we propose a defense strategy that leverages the relative robustness of the image modality.

2606.02109 2026-06-02 cs.AI

BADGER: Bridging Agentic and Deterministic Evaluation for Generative Enterprise Reasoning

Shannon Serrao, Soumitra Chatterjee, Dorina Strori, Abhishek Sharma, Nathan Miller

AI summary: Proposes BADGER, a framework that unifies text-to-SQL evaluation with agentic behavior evaluation through a hybrid execution accuracy metric (Hybrid-EX) and an agentic evaluation suite, outperforming existing approaches on industry queries.

Comments: 30 pages, 2 figures, 6 tables

Abstract

Enterprise AI systems that translate natural language into SQL queries and orchestrate multi-step agentic reasoning pipelines require evaluation approaches fundamentally different from academic benchmarks. Spider and BIRD established execution-accuracy protocols; G-Eval and RAGAS advanced LLM-based assessment; and recent work such as Spider 2.0, BEAVER, and BIRD-Interact has begun to address enterprise and agentic dimensions. No single framework unifies text-to-SQL assessment with agentic behavior evaluation into a production-grade pipeline calibrated against human expert judgment. We present BADGER, developed at Merkle, a unified evaluation framework integrating text-to-SQL assessment with agentic behavior evaluation. BADGER offers three contributions. First, LLM-assisted SQL component extraction extending Spider methodology to handle CTE-heavy, dialect-specific SQL. Second, a hybrid execution accuracy metric (Hybrid-EX) resolving column-aliasing and numeric-tolerance brittleness by using an LLM to infer structural alignments before deterministic cell-level scoring. Validated on 150 human-annotated industry queries, Hybrid-EX achieves Cohen's kappa=0.717 [95% CI: 0.600-0.822] (Substantial agreement) and 87.3% balanced accuracy, outperforming all six competing frameworks (Delta-kappa: 0.322-0.502, all p<=0.001). Third, an enterprise agentic evaluation suite assembling RAGAS, G-Eval, and agent benchmark metrics into a unified pipeline; Excess Tool Usage is the sole novel element. BADGER runs entirely within the client's governed data environment, supports configurable LLM judge backends, and enables rapid prototyping of client-specific judges and metrics, serving as a continuous evaluation backbone rather than a one-time quality gate.
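
The two agreement statistics quoted above, Cohen's kappa and balanced accuracy, are standard and easy to compute by hand. The sketch below uses their textbook definitions on made-up binary verdicts; the label lists are invented for illustration and are not BADGER data.

```python
def cohen_kappa(a, b):
    """Chance-corrected agreement between two label sequences."""
    n = len(a)
    labels = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)


def balanced_accuracy(truth, pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    recalls = []
    for label in set(truth):
        idx = [i for i, t in enumerate(truth) if t == label]
        recalls.append(sum(pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical verdicts: 1 = "query result is correct", 0 = "incorrect".
human = [1, 1, 1, 0, 0, 0, 1, 0]
metric = [1, 1, 0, 0, 0, 1, 1, 0]
print(cohen_kappa(human, metric))        # 0.5
print(balanced_accuracy(human, metric))  # 0.75
```

Raw agreement here is 6 of 8, but half of that is expected by chance with balanced labels, which is why kappa comes out at 0.5.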

2606.02107 2026-06-02 cs.RO cs.AI cs.LG

Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy

AI summary: Proposes a network distributed multi-agent reinforcement learning framework that uses the communication graph to implement distributed policies, trains a high-level planner with MASAC, and scales zero-shot to 250 agents.

DOI: 10.1109/MELECON64486.2026.11418865
Journal ref: 2026 IEEE 23rd Mediterranean Electrotechnical Conference (MELECON)
Comments: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore

Abstract

This paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework for quadcopter consensus control. In contrast to conventional MARL formulations that rely on centralized planning or fully decentralized execution, ND-MARL incorporates the swarm communication graph into the decision process. Under a 2-Neighbor communication topology, each agent observes information from only two neighbors and outputs an action through a distributed policy. A high-level distributed consensus planner is trained using Multi-Agent Soft Actor-Critic (MASAC) and embedded in a hierarchical stack to generate reference target positions tracked by a low-level quadcopter controller. Results demonstrate smooth consensus trajectories and planner-tracker integration when compared to a centralized MARL controller. Most notably, the learned controller exhibits zero-shot scalability: policies trained on a three-agent system are deployed to swarms of up to 250 agents under the same 2-Neighbor communication topology without retraining or fine-tuning, achieving consistent convergence, with steady-state spread increasing at large team sizes due to sparse information propagation. These findings highlight ND-MARL as a stable framework for distributed, communication-aware quadcopter consensus control.
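
A rough feel for why a policy that sees only two neighbors can be reused at any team size comes from a hand-written averaging rule on a ring. This sketch is an assumption-laden stand-in: the ring layout, the fixed gain, and the function names are hypothetical, and the rule replaces the learned MASAC planner with plain neighbor averaging.

```python
def consensus_step(pos, gain=0.5):
    """Each agent moves toward the midpoint of its two ring neighbors."""
    n = len(pos)
    new = []
    for i, (x, y) in enumerate(pos):
        lx, ly = pos[(i - 1) % n]
        rx, ry = pos[(i + 1) % n]
        new.append((x + gain * ((lx + rx) / 2 - x), y + gain * ((ly + ry) / 2 - y)))
    return new


def spread(pos):
    """Largest distance of any agent from the swarm centroid."""
    cx = sum(p[0] for p in pos) / len(pos)
    cy = sum(p[1] for p in pos) / len(pos)
    return max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in pos)


# The same local rule runs unchanged on 3 agents or on 250.
small = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
big = [(float((i * 37) % 250), float((i * 91) % 250)) for i in range(250)]
for _ in range(50):
    small = consensus_step(small)
big_after = big
for _ in range(50):
    big_after = consensus_step(big_after)
print(f"spread after 50 steps: 3 agents {spread(small):.2e}, 250 agents {spread(big_after):.2f}")
```

Because information travels only one hop per step, the large ring contracts far more slowly than the three-agent team, which echoes the abstract's remark about sparse information propagation.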

2606.02106 2026-06-02 cs.LG stat.ML

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

Julien Lafrance

AI summary: Presents a classification pipeline that combines Equiangular Tight Frame preprocessing with a tabular foundation model, evaluates it on data from multiple modalities, and shows that it strikes a good balance between speed and quality.

Comments: 24 pages, 5 figures. Code and data available at https://doi.org/10.5281/zenodo.19982636

Abstract

We present a single classification pipeline that combines an Equiangular Tight Frame (ETF) preprocessing stage with a tabular foundation model for in-context inference, applied identically across modalities once data is mapped to fixed vector representations. We evaluate it on 95 datasets spanning seven signal modalities -- vision, audio, speech, text, molecular, time-series, and tabular. The main methodological contribution is to fix the comparison object: throughout the paper, performance is judged against the strongest lightweight tuned baseline on the same frozen features, while oracle selection, deployed selection, and specialized fine-tuning are reported separately. The pipeline is broadly competitive with strong lightweight tuned baselines on the same frozen features. It does not match the very best specialized models or heavily tuned pipelines on every task, but it stays close, and it runs much faster -- typically 4 to 200 times faster than full backbone fine-tuning, often at comparable quality. We describe how to deploy the pipeline in practice: when to apply ETF preprocessing, how to stop its training without a validation split, how to set up the in-context classifier, and how to calibrate the resulting probabilities. The calibration step is non-cosmetic: TabICL produces well-calibrated probabilities by construction, ETF preprocessing initially disrupts that calibration, and the post-hoc rescaling restores it -- yielding a per-prediction confidence signal that practitioners can use as a trust threshold for confidence-gated deployment. We also report where the pipeline should not be expected to help, and how to identify those cases in advance.

2606.02105 2026-06-02 cs.CV

Multimodal Action Diffusion for Robust End-to-End Autonomous Driving

Jorge Daniel Rodríguez-Vidal, Diego Porres, Gabriel Villalonga Pineda, Antonio M. López Peña

AI summary: Proposes the Action Diffusion Transformer (ADT), which uses multimodal action modelling and Nearest Neighbour Matching to surpass the previous state of the art on the closed-loop Bench2Drive benchmark with ten times lower latency.

Comments: Preprint. June 1st, 2026. Corresponding author: Jorge Daniel Rodríguez-Vidal

Abstract

End-to-End Autonomous Driving (E2E-AD) systems have largely converged on predicting intermediate trajectory waypoints, delegating final control to hand-crafted controllers with GPS access. Direct control-signal prediction (outputting throttle, steer and brake in an end-to-end fashion) remains underexplored, and critically, the role of action multimodality in such systems is not well understood. We argue that moving beyond deterministic, single-action outputs is not merely a modelling choice, but a key driver of driving performance, representational quality, and training stability. To validate this, we introduce the Action Diffusion Transformer (ADT), an anchor-free diffusion transformer trained with an MSE objective that natively models the multimodal distribution of plausible driving actions. Rather than committing to a single deterministic command, ADT generates K action candidates and selects the most suitable one at inference via Nearest Neighbour Matching (NNM). Beyond strong benchmark numbers, we show that action multimodality yields measurable benefits in learned representations and behavioral consistency, effects that deterministic architectures cannot replicate. ADT surpasses the previous state of the art on the challenging closed-loop Bench2Drive benchmark while achieving ten times lower latency, demonstrating that expressive, multimodal action modelling is both practically efficient and conceptually essential for robust end-to-end driving.

2606.02101 2026-06-02 stat.ML cs.LG stat.AP

It does what it says on the tin: safe synthetic data from coarsened margins

Gillian M Raab

AI summary: Proposes generating synthetic data by coarsening margins and applying the Iterative Proportional Fitting algorithm, ensuring transparency and freedom from disclosure risk.

Abstract

This paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
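
The two mechanical steps named in the abstract, coarsening counts to multiples of the disclosure limit and fitting a table to the adjusted margins with IPF, can be sketched as follows. This is a simplified two-way illustration with invented counts, not the paper's procedure: the rounding rule, the uniform seed table, and the function names are assumptions.

```python
def coarsen(counts, limit):
    """Round each count to the nearest multiple of the disclosure limit.

    Assumed rule for illustration; Python's round() sends exact ties to
    the even multiple.
    """
    return [limit * round(c / limit) for c in counts]


def ipf(seed, row_targets, col_targets, iters=100):
    """Iterative Proportional Fitting of a 2-D table to row and column totals.

    Assumes strictly positive targets with equal grand totals.
    """
    table = [row[:] for row in seed]
    for _ in range(iters):
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [v * target / s for v in table[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
    return table


# Invented margins of a 3 x 2 table, coarsened to multiples of 5.
rows = coarsen([23, 48, 31], 5)  # [25, 50, 30]
cols = coarsen([58, 44], 5)      # [60, 45]
assert sum(rows) == sum(cols)    # IPF needs consistent grand totals
fitted = ipf([[1.0, 1.0] for _ in rows], rows, cols)
for row in fitted:
    print([round(v, 2) for v in row])
```

Coarsening margins independently can leave their grand totals inconsistent, so a real workflow would need a reconciliation step that this sketch sidesteps by choosing counts that happen to agree.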

2606.02100 2026-06-02 cs.CL

PortBERT: Navigating the Depths of Portuguese Language Models

Raphael Scheible-Schmitt, Henry He, Armando B. Mendes

AI summary: Introduces PortBERT, a family of RoBERTa-based Portuguese language models trained on over 450 GB of data with byte-level BPE tokenization and stable pre-training, achieving competitive performance on the ExtraGLUE benchmark, with a focus on training and inference efficiency.

DOI: 10.26615/978-954-452-105-9-008
Journal ref: Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models, 2025, pp. 59-71

Abstract

Transformer models dominate modern NLP, but efficient, language-specific models remain scarce. For Portuguese, most work focuses on scale or accuracy, often neglecting training and deployment efficiency. In the present work, we introduce PortBERT, a family of RoBERTa-based language models for Portuguese, designed to balance performance and efficiency. Trained from scratch on over 450 GB of deduplicated and filtered mC4 and OSCAR23 from CulturaX using fairseq, PortBERT leverages byte-level BPE tokenization and stable pre-training routines across both GPU and TPU processors. We release two variants, PortBERT base and PortBERT large, and evaluate them on ExtraGLUE, a suite of translated GLUE and SuperGLUE tasks. Both models perform competitively, matching or surpassing existing monolingual and multilingual models. Beyond accuracy, we report training and inference times as well as fine-tuning throughput, providing practical insights into model efficiency. PortBERT thus complements prior work by addressing the underexplored dimension of compute-performance tradeoffs in Portuguese NLP. We release all models on Hugging Face and provide fairseq checkpoints to support further research and applications.

2606.02096 2026-06-02 cs.CV

WebSpline: Structure-Informed Splines for Real-Time 3D Gaussians from Monocular Videos

Jongmin Park, Jeonghwan Yun, Minh-Quan Viet Bui, Munchurl Kim

AI summary: Proposes the WebSpline framework, which uses a Structure-Informed Spline (SIS) representation and a Structural Proxy Graph (SPG) to achieve real-time, high-fidelity, structurally coherent dynamic 3D Gaussian reconstruction from monocular videos.

Comments: The first two authors contributed equally to this work. Please visit our project page at https://kaist-viclab.github.io/webspline-site/

Abstract

Dynamic scene reconstruction from monocular videos remains highly challenging, as existing methods often struggle to balance global structural coherence and local fine-grained details under limited multi-view cues. To address this challenge, we propose WebSpline, a novel dynamic 3D Gaussian framework that enables structurally coherent and high-fidelity reconstruction from monocular videos with fast rendering. The core of WebSpline is the Structure-Informed Spline (SIS) representation, which models each dynamic Gaussian trajectory using a learnable cubic Hermite spline whose motion is structurally organized with an auxiliary Structural Proxy Graph (SPG). The proposed framework is optimized in two stages: (i) in the first stage, the SPG is initialized from 2D point tracks and refined with temporal rigidity regularization to establish structural coherence for moving objects across the sequence; and (ii) in the second stage, the SIS representation is initialized from the refined SPG and optimized under both spatial and structural neighborhood constraints. At inference, Gaussian motion is obtained solely by evaluating the learned SIS, enabling fast rendering. Extensive experiments on the challenging monocular dynamic scene benchmarks, iPhone and NVIDIA, demonstrate that our WebSpline achieves state-of-the-art rendering quality while rendering over 10 times faster than WorldTree, the second-best method on the iPhone dataset.
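
The reason inference is cheap is that evaluating a cubic Hermite spline is just a handful of multiplications per Gaussian. The sketch below evaluates a standard piecewise cubic Hermite trajectory; the knot layout and names are hypothetical, and the learned tangents and SPG-based organization of the actual method are not modelled.

```python
def hermite(p0, m0, p1, m1, s):
    """Cubic Hermite interpolation at s in [0, 1] with scaled tangents m0, m1."""
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return tuple(
        h00 * a + h10 * b + h01 * c + h11 * d for a, b, c, d in zip(p0, m0, p1, m1)
    )


def position(knot_times, points, tangents, t):
    """Position on a piecewise cubic Hermite trajectory at time t."""
    for k in range(len(knot_times) - 1):
        t0, t1 = knot_times[k], knot_times[k + 1]
        if t0 <= t <= t1:
            dt = t1 - t0
            # Tangents are velocities, so scale them by the segment duration.
            m0 = tuple(dt * v for v in tangents[k])
            m1 = tuple(dt * v for v in tangents[k + 1])
            return hermite(points[k], m0, points[k + 1], m1, (t - t0) / dt)
    raise ValueError("t lies outside the knot range")


# Hypothetical trajectory of one Gaussian centre through three knots.
times = [0.0, 1.0, 3.0]
centres = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 4.0, 0.0)]
velocities = [(0.0, 0.0, 0.0)] * 3
print(position(times, centres, velocities, 0.5))
print(position(times, centres, velocities, 2.0))
```

In a learned setting the knot positions and tangents would be the optimized parameters, and rendering a frame would only require this evaluation at the frame's timestamp.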

2606.02093 2026-06-02 cs.CL cs.AI cs.LG

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

Ieva Raminta Staliūnaitė, James Bishop, Andreas Vlachos

AI summary: Improves error prediction for large language models on question answering by disentangling input ambiguity from uncertainty signals, using Gated Experts and Selective Prediction.

Comments: 8 pages not including references and appendices, 3 figures

Abstract

The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackled with Uncertainty Quantification (UQ). However, while uncertainty metrics capture when models lack knowledge or capacity to make a prediction, they also reflect aleatoric uncertainty, which is inherent in the model input and context. This paper presents a method for improving error prediction for Large Language Models (LLMs), by disentangling input ambiguity from UQ signal. We conduct experiments on the task of Question Answering (QA) with six UQ metrics and show that UQ metrics are more predictive of errors on unambiguous instances than on questions with multiple plausible answers. We use Gated Experts and Selective Prediction to incorporate gold and predicted ambiguity labels into the error prediction pipeline. We find that ambiguity information improves error prediction scores across model families, training and evaluation paradigms, datasets (including allegedly unambiguous ones), and sources of aleatoric uncertainty, yielding improvements of over 10 points of PRR for individual UQ metrics on standard datasets.

2606.02092 2026-06-02 eess.IV cs.AI cs.CV

LALE: Lightweight-Transformer Architecture for Land-Cover Estimation

Ümit Mert Çağlar, Alptekin Temizel

AI summary: Proposes the LALE architecture, which balances performance and computational cost in remote sensing image segmentation through a resolution-bifurcated encoder (lightweight ConvMixer stages for high-resolution local features, transformer stages for low-resolution global context) and an all-MLP multi-scale decoder.

Abstract

Semantic segmentation of remote sensing imagery requires models that capture both global context and local detail under tight computational budgets. Prior work typically optimizes for one of these axes: attention for global context, convolution for local detail, or compactness for efficiency. While hybrid approaches aim to capture both, they require architectural changes and encoder backbones with computational overhead, limiting efficiency and performance. We present LALE (Lightweight-transformer Architecture for Land-cover Estimation), an end-to-end remote sensing image segmentation architecture that bifurcates its encoder by resolution: lightweight ConvMixer stages handle high-resolution local features, while transformer stages handle low-resolution global context, confining the quadratic cost of self-attention to deep, downsampled feature maps. An all-MLP multi-scale decoder, together with RMSNorm and StarReLU throughout, further reduces compute and parameter count. On the large-scale ARAS400k remote-sensing segmentation benchmark, LALE establishes a strong efficiency-performance trade-off against CNN, transformer, and hybrid baselines. Our smallest variant (just 1.6M parameters) reaches within 2.6 F1 points of the best baseline (UPerNet) while using 4.5x fewer parameters, 7x less storage, 17x fewer GMACs, and delivering 1.8x higher throughput.
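
RMSNorm and StarReLU, the two lightweight components mentioned above, are both a few arithmetic operations per element. The sketch below follows their commonly published definitions on plain Python lists; the default `scale` and `bias` values are placeholders rather than LALE's learned parameters, and nothing here reflects the paper's actual layer layout.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square, then apply a per-feature gain.

    Unlike LayerNorm there is no mean subtraction and no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def star_relu(x, scale=1.0, bias=0.0):
    """StarReLU: scale * relu(v)**2 + bias, with scalar scale and bias."""
    return [scale * max(v, 0.0) ** 2 + bias for v in x]


features = [3.0, 4.0]
normed = rms_norm(features, gain=[1.0, 1.0])
print(normed)
print(star_relu([-1.0, 2.0]))  # [0.0, 4.0]
```

Skipping the mean subtraction and replacing GELU-style activations with a squared ReLU removes some per-element work, which is consistent with the compute savings the abstract attributes to these choices.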

2606.02080 2026-06-02 cs.MA cs.AI cs.CV

Agentic-J: An AI Agent for Biological Microscopy Image Analysis

Lukas Johanns, Marilin Moor, Davide Panzeri, Yu Zhou, Xinyi Chen, Nora F. K. Pauly, Zixuan Pan, Matthias Gunzer, Andreas Müller, Yiyu Shi, Hedi Peterson, Jianxu Chen

AI summary: Presents Agentic-J, a containerised multi-agent AI assistant that integrates ImageJ/Fiji tools behind a natural-language interface to deliver traceable, reproducible bioimage analysis workflows, from cell segmentation to multi-condition quantification.

Comments: Presented at Cell Biology at Scale 2026 (Poster). The Agentic-J project is available at https://mmv-lab.github.io/Agentic-J/

Abstract

Biological image analysis increasingly demands integration across heterogeneous tools, programming environments, and domain knowledge that few researchers can command simultaneously. We present Agentic-J, a containerised, multi-agent AI assistant, primarily for ImageJ/Fiji, that enables biologists to specify analysis tasks in natural language, from nuclei segmentation and cell tracking to multi-condition quantification. The agent generates executable scripts organised into a documented project structure, so every analysis decision is traceable and the workflow can be reproduced or shared. Specialised sub-agents handle plugin management, code generation, debugging, quality assurance, and statistical reporting. In this paper we introduce the system's design, demonstrate real biological microscopy image analysis workflows, and detail the technical implementation.

2606.02079 2026-06-02 cs.CV

FACT: A Simple and Efficient Framework for Active Finetuning

Wenshuai Xu, You Song, Yuzhuo Cui, Minjie Ren, Qingjie Liu, Zhenghui Hu

AI summary: To address the distortion of pretrained features and the overfitting caused by full finetuning in active finetuning, proposes FACT, a three-phase hierarchical finetuning framework that uses frozen feature augmentation and parameter-efficient finetuning, significantly improving performance across datasets and architectures, with gains of over 20% at low sampling ratios.

Comments: Accepted for publication as a regular paper in the IEEE Transactions on Image Processing (T-IP)

Abstract

The main goal of active finetuning is to improve a pretrained model's performance on a specific task or domain by finetuning it with carefully selected informative or challenging data. Previous research has predominantly focused on the active aspect (i.e., data selection) while uniformly employing full finetuning for model adaptation, which inevitably distorts pretrained features due to distribution shift. This issue becomes particularly pronounced when the model size is large relative to the finetuning data quantity, leading to heightened overfitting risks. To address this critical gap, we formally outline the FiAF task that emphasizes systematic exploration of finetuning methodologies in active learning. We propose FACT, a three-phase hierarchical finetuning framework featuring both efficiency and simplicity, specifically designed for active finetuning scenarios. Our comprehensive experiments span: (1) three major dataset categories encompassing classic (CIFAR10, CIFAR100, ImageNet-1k), imbalanced (CIFAR10-LT, CIFAR100-LT), and fine-grained (StanfordCars, FGVCAircraft) image classification datasets, each evaluated under 3-5 distinct sampling ratios; (2) diverse pretrained architectures including Convolutional Neural Network (ConvNeXt), Vision Transformer (ViT), and Vision LSTM (ViL) networks; (3) a systematic investigation of frozen feature augmentation (FroFA) strategies; and (4) a comprehensive and rigorous analysis of efficiency and generalizability. The results demonstrate significant improvements with strong generalization and robustness. Notably, under low sampling ratios, our framework achieves remarkable performance gains of over 20% on the ViT model for CIFAR10, CIFAR100, and ImageNet-1k benchmarks. This systematic approach establishes new state-of-the-art performance while maintaining parameter efficiency, proving particularly effective when labeled data is scarce.

2606.02078 2026-06-02 cs.LG

Beyond $\ell_2$-norm and $\ell_\infty$-norm: A Curvature-Inspired $\ell_p$-Norm Scheme for Deep Neural Networks

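Jianhao Xu, Zhuang Yang

AI summary: To address the poor adaptability of existing optimizers when curvature varies widely across parameter dimensions, proposes an ℓp-norm scheme with a dynamic p and incorporates it into SGD and SGDM, yielding the LPSGD and LPSGDM optimizers. A large p early in training suppresses high-curvature directions, and cosine annealing later reduces p for stable updates. The paper proves an O(T^{-1/2}) convergence rate in the non-convex case and validates improved generalization on the CIFAR and ImageNet datasets.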

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

AI summary: Proposes Convex Distance Operator Transport (CDOT), which aligns heterogeneous distributions by jointly preserving feature correspondence and intrinsic geometric structure through operator regularization, and proves its pseudometric property and its relationship to Gromov-Wasserstein.

Comments: This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026

Abstract

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov--Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank--Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance over existing methods, with stable and reliable behavior in practice.

2606.02045 2026-06-02 cs.CV cs.AI

Attention mechanisms and transfer learning for robust peach leaf damage classification under domain shift

域偏移下基于注意力机制和迁移学习的鲁棒桃叶损伤分类

Adrián Cánovas-Rodriguez, Miguel A. González-Illán, Maria Fernanda García-Cruz, Pedro Nortes Tortosa, José Salvador Rubio-Asensio, Miguel A. Zamora Izquierdo, Juan Antonio Martínez Navarro, Antonio F. Skarmeta

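AI summary: Proposes a peach leaf damage classification method based on attention mechanisms and transfer learning. A CBAM-enhanced EfficientNet model reaches 93.3% accuracy on the public dataset, and transfer learning yields a 93% macro F1-score on the local dataset, addressing domain shift.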

AI Chinese abstract

人工智能为从图像数据评估作物损伤提供了实用框架，支持农业管理中的早期决策。在桃园中，气候变化增加了非生物胁迫和生物压力，包括病虫害，这些通常产生视觉上相似的叶片症状。这种重叠使得手动诊断变得困难，尤其是在不同环境条件下的多个田地中，凸显了对具有强泛化能力的自动化模型的需求。我们提出了一种基于图像的桃叶损伤检测分类方法。通过手动标注公开图像创建了一个基准数据集，包含六个损伤类别的1,366片桃叶。评估了几种深度学习架构。EfficientNet模型取得了最佳结果，其中EfficientNetB0达到92.9%的准确率，EfficientNetB3达到91.5%，EfficientNetB5在少数类上表现最强。DenseNet121达到92.6%的准确率。卷积块注意力模块（CBAM）的集成在多个骨干网络中提升了性能，特别是在EfficientNetB5和InceptionV3中，而在其他网络中效果有限或为负。CBAM增强的EfficientNetB5取得了93.3%的最佳总体准确率。为了评估在现实条件下的鲁棒性，收集了一个包含四个类别180张图像的本地数据集，并应用迁移学习策略来解决域偏移。测试了三种微调策略。结合CBAM的EfficientNetB3在本地域中取得了最佳性能，迁移后宏F1分数达到93%。总体而言，基于注意力的模型在少数类上表现出更强的鲁棒性，并在不同田间条件下具有更好的泛化能力。

English abstract

Artificial intelligence provides a practical framework for crop damage assessment from imagery data, supporting early decision-making in agricultural management. In peach orchards, climate change increases abiotic stress and biotic pressures, including pests and diseases, which often produce visually similar foliar symptoms. This overlap makes manual diagnosis difficult, especially across multiple fields with varying environmental conditions, highlighting the need for automated models with strong generalization ability. We propose an image-based classification approach for peach leaf damage detection. A benchmark dataset was created through manual annotation of publicly available images, consisting of 1,366 peach leaves across six damage categories. Several deep learning architectures were evaluated. EfficientNet models achieved the best results, with EfficientNetB0 reaching 92.9 percent accuracy, EfficientNetB3 achieving 91.5 percent, and EfficientNetB5 showing the strongest performance on minority classes. DenseNet121 reached 92.6 percent accuracy. The integration of the Convolutional Block Attention Module (CBAM) improved performance in several backbones, particularly EfficientNetB5 and InceptionV3, while showing limited or negative impact in others. The CBAM-enhanced EfficientNetB5 achieved the best overall accuracy of 93.3 percent. To evaluate robustness under realistic conditions, a local dataset of 180 images across four classes was collected, and transfer learning strategies were applied to address domain shift. Three fine-tuning strategies were tested. EfficientNetB3 combined with CBAM achieved the best performance in the local domain, reaching a 93 percent macro F1-score after transfer. Overall, attention-based models showed improved robustness for minority classes and better generalization across different field conditions.
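
The abstract reports both accuracy and macro F1-score, and the two can diverge when classes are imbalanced. As a rough illustration, macro F1 averages per-class F1 with equal weight per class. The function name and the toy labels below are hypothetical and are not taken from the paper's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with a majority class "a" and a minority class "b".
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

In this toy case accuracy is 0.75 while macro F1 is about 0.733, because the minority class contributes its lower F1 with the same weight as the majority class.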

2606.02042 2026-06-02 cs.CV

Normality-Preserving Continual Industrial Anomaly Detection via Orthogonal LoRA Banks

通过正交LoRA库保持正态性的持续工业异常检测

Weibai Fang, Haijun Che, Feiyang Ren, Qiancheng Lao

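AI summary: Proposes a framework built on a History Frozen Orthogonal LoRA Bank and a Hierarchical Novelty Adaptive Bank Growth module to address historical normality prior drift and catastrophic forgetting in diffusion-based continual industrial anomaly detection.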

Comments: 33 pages, 6 figures, submitted to Advanced Engineering Informatics

AI Chinese abstract

基于扩散模型的持续工业异常检测面临历史正态先验漂移和灾难性遗忘问题。现有的持续扩散方法通过回放或约束优化保留先前知识，但缺乏在顺序适应过程中隔离和保护类别特定正态先验的显式机制。尽管低秩适应提供了模块化残差更新，但标准LoRA既未冻结历史正态子空间，也未阻止新适配器干扰先前适配器。为解决此问题，我们提出基于两个模块的正态保持持续异常检测框架：历史冻结正交LoRA库（HF-OLB）和分层新颖性自适应库增长模块（HNABG）。HF-OLB冻结预训练的U-Net主干和已学习的LoRA库，并将新任务特定的正态残差约束到历史LoRA子空间的正交补空间中。HNABG进一步分配层依赖的残差容量，并仅在残差正态新颖性超过现有库的表达容量时扩展库。在MVTec和VisA上的大量实验证明了所提方法的有效性。在具有挑战性的VisA 2x6设置下，我们的方法实现了83.6/91.8的图像和像素级A-AUROC，以及3.8/3.9的FM，将像素级A-AUROC提升了3.2个百分点，同时将像素级FM降低了1.3。这些结果表明，我们的方法在长时间跨度的持续类别序列中有效保留了历史正态先验。

English abstract

Continual industrial anomaly detection with diffusion models suffers from historical normality prior drift and catastrophic forgetting. Existing continual diffusion methods preserve previous knowledge through replay or constrained optimization, but they lack an explicit mechanism for isolating and protecting category-specific normality priors during sequential adaptation. Although low-rank adaptation provides modular residual updates, standard LoRA neither freezes historical normality subspaces nor prevents new adapters from interfering with previous ones. To address this issue, we propose a normality-preserving continual anomaly detection framework based on two modules: History Frozen Orthogonal LoRA Bank (HF-OLB) and Hierarchical Novelty Adaptive Bank Growth module (HNABG). HF-OLB freezes both the pre-trained U-Net backbone and the learned LoRA banks, and constrains new task-specific normality residuals to the orthogonal complement of historical LoRA subspaces. HNABG further allocates layer-dependent residual capacity and expands the bank only when the residual normality novelty exceeds the expressive capacity of existing banks. Extensive experiments on MVTec and VisA demonstrate the effectiveness of the proposed method. On the challenging VisA 2x6 setting, our method achieves 83.6/91.8 image and pixel level A-AUROC with 3.8/3.9 FM, improving pixel level A-AUROC over the state of the art by 3.2 points while reducing pixel level FM by 1.3. These results show that our method effectively preserves historical normality priors in long horizon continual category sequences.
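
The abstract says new residuals are constrained to the orthogonal complement of the historical LoRA subspaces, without stating how the constraint is enforced. One simple way to picture it is a hard projection: orthonormalize the frozen historical directions, then subtract their components from the new update. The sketch below assumes that hard-projection reading and uses plain vectors. All names are hypothetical, and the paper may well use a different mechanism such as a penalty term.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt over the frozen historical directions."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip directions already spanned by the basis
            basis.append([wi / norm for wi in w])
    return basis

def project_to_complement(update, basis):
    """Remove every component of `update` that lies in span(basis)."""
    w = list(update)
    for q in basis:
        c = dot(w, q)
        w = [wi - c * qi for wi, qi in zip(w, q)]
    return w

# Two historical directions spanning the x-y plane in R^3.
historical = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(historical)
new_update = project_to_complement([3.0, 4.0, 5.0], basis)
```

After projection the update is orthogonal to every historical direction, so applying it does not change the model's response along those directions.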

2606.02041 2026-06-02 cs.CL

SentGuard: Sentence-Level Streaming Guardrails for Large Language Models

SentGuard：面向大型语言模型的句子级流式护栏

Jiaqi Yu, Xin Wang, Yixu Wang, Jie Li, Yan Teng, Xingjun Ma, Yingchun Wang

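AI summary: Proposes SentGuard, a sentence-level streaming guardrail that runs in parallel with generation. A lightweight waiting buffer groups streamed tokens into sentence chunks and releases only verified chunks, detecting unsafe content accurately at low latency.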

Comments: 16 pages, 5 figures, submitted to ARR

AI Chinese abstract

大型语言模型越来越多地实时流式输出长篇幅、推理密集的响应，这使得何时进行审核与是否进行审核同样关键。现有的护栏分为两种不理想的极端：响应级方法延迟干预直到完整输出生成，而令牌级方法基于不完整的语义进行操作，往往产生不稳定的决策和过多的护栏调用。为应对这一挑战，我们提出SentGuard，一种与生成并行运行的句子级流式护栏。一个轻量级等待缓冲区将流式令牌分组为句子块，并仅向用户释放已验证的块，引入一个小偏移量，使得SentGuard能够在目标LLM解码后续内容时评估当前前缀。为支持这一点，我们构建了StreamSafe基准，包含8个危害类别的结构化逐句标注，捕捉推理和响应段中安全风险的演变。我们进一步使用从粗到细的目标训练SentGuard，以在不安全意图在句子边界出现时立即检测。在5个安全基准上的实验表明，SentGuard优于现有基线，在两个句子内检测到90.5%的不安全案例，同时保持7.41%的低流式误报率。

English abstract

Large language models increasingly stream long, reasoning-intensive responses in real time, making when to moderate as critical as whether to moderate. Existing guardrails fall into two unsatisfactory extremes: response-level methods delay intervention until the full output is generated, whereas token-level methods act on incomplete semantics, often producing unstable decisions and excessive guard invocations. To address this challenge, we propose SentGuard, a sentence-level streaming guardrail that operates in parallel with generation. A lightweight waiting buffer groups streamed tokens into sentence chunks and releases only verified chunks to the user, introducing a small offset that enables SentGuard to assess the current prefix while the target LLM decodes subsequent content. To support this, we construct StreamSafe, a benchmark with structured per-sentence annotations across 8 harm categories, capturing the evolution of safety risks across both reasoning and response segments. We further train SentGuard with a coarse-to-fine objective to detect unsafe intent as soon as it emerges at sentence boundaries. Experiments on 5 safety benchmarks show that SentGuard outperforms existing baselines, detecting 90.5% of unsafe cases within two sentences while maintaining a low streaming false-positive rate of 7.41%.
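
The waiting-buffer idea described in the abstract can be sketched as a generator that holds tokens until a sentence boundary, checks the accumulated prefix, and releases the chunk only if the check passes. This is a simplified, sequential illustration: the real system runs the guard in parallel with decoding, and the punctuation-based boundary rule, the `is_safe` callback, and the function name are assumptions made here.

```python
from typing import Callable, Iterable, Iterator

SENTENCE_END = (".", "!", "?")  # naive boundary rule, assumed for illustration

def stream_with_guard(tokens: Iterable[str],
                      is_safe: Callable[[str], bool]) -> Iterator[str]:
    prefix = ""   # text that has already been verified and released
    buffer = ""   # tokens waiting for the next sentence boundary
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            candidate = prefix + buffer
            if not is_safe(candidate):
                return            # withhold the unsafe chunk and stop streaming
            prefix = candidate
            yield buffer          # release the verified sentence chunk
            buffer = ""
    # Flush a trailing partial sentence only if the full text still passes.
    if buffer and is_safe(prefix + buffer):
        yield buffer

# Usage with a toy keyword check standing in for a trained guard model.
tokens = ["Hello", " world", ".", " Do", " bad", " thing", ".", " More", "."]
released = list(stream_with_guard(tokens, lambda text: "bad" not in text))
```

Here only the first sentence is released. The second sentence is withheld at its boundary, and nothing after it reaches the user.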

2606.02038 2026-06-02 physics.app-ph cs.LG

Uncertainty-Aware Graph Neural Reconstruction of Urban Temperature Fields from Sparse Sensors under Deployment Constraints

部署约束下基于不确定性感知图神经网络的稀疏传感器城市温度场重建

Reda Snaiki, Abdelatif Merabtine

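AI summary: Proposes an uncertainty-aware graph neural network framework that reconstructs daily maximum temperature fields from sparse sensors, supports distance-constrained sensor placement and probabilistic exceedance mapping, and outperforms conventional methods in a Montreal-area evaluation.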

AI Chinese abstract

从稀疏观测重建空间连续的每日温度场对于城市气候监测和热风险分析至关重要，但实际部署受限于传感器预算和间距约束。本研究提出一种不确定性感知图神经网络（GNN）框架，用于从稀疏传感器重建每日最高温度场，同时支持距离约束的传感器放置和概率超标映射。该模型使用基于图注意力的均值残差架构，通过高斯负对数似然训练，预测温度场和空间变化的预测不确定性场。传感器放置采用基于QR分解的本征正交分解（POD-QR）策略，并施加4公里最小传感器间距约束，与随机可行放置和最远点采样进行比较。该框架在蒙特利尔区域多边形上使用Daymet v4.1每日温度数据（1公里分辨率）进行评估，采用严格的时间留出协议（训练：2020-2023；测试：2024）。在传感器预算（10-40个传感器）下，所提出的GNN在未观测节点上的RMSE和MAE始终优于反距离加权和普通克里金法。传感器放置效应在低预算时最显著，在高预算时减弱，在施加间距约束下，约30个传感器时出现实际饱和状态。概率评估进一步显示，随着传感器密度增加，不确定性校准得到改善，并且比克里金法具有更好的锐度-校准权衡。这些结果支持所提出的框架作为不确定性感知温度场重建和面向决策的热风险映射的有效工具。

English abstract

Reconstructing spatially continuous daily temperature fields from sparse observations is important for urban climate monitoring and heat-risk analysis, but practical deployments are limited by sensor budgets and spacing constraints. This study proposes an uncertainty-aware graph neural network (GNN) framework for reconstructing daily maximum temperature fields from sparse sensors while supporting distance-constrained sensor placement and probabilistic exceedance mapping. The model predicts both the temperature field and a spatially varying predictive uncertainty field using a graph-attention-based mean-residual architecture trained with a Gaussian negative log-likelihood. Sensor placement is addressed using a Proper Orthogonal Decomposition with QR factorization (POD-QR) strategy with a 4 km minimum inter-sensor distance constraint and is compared with random feasible placement and farthest-point sampling. The framework is evaluated over a Montreal-area polygon using Daymet v4.1 daily temperature data (1 km resolution) under a strict temporal hold-out protocol (training: 2020-2023; testing: 2024). Across sensor budgets (10-40 sensors), the proposed GNN consistently outperforms inverse distance weighting and ordinary kriging in RMSE and MAE on unobserved nodes. Sensor-placement effects are most pronounced at low budgets and diminish at higher budgets, with a practical saturation regime emerging around 30 sensors under the imposed spacing constraint. Probabilistic evaluation further shows improved uncertainty calibration with increasing sensor density and a better sharpness-calibration trade-off than kriging. These results support the proposed framework as an effective tool for uncertainty-aware temperature field reconstruction and decision-oriented heat-risk mapping.
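
Two ingredients named in the abstract have standard closed forms: the Gaussian negative log-likelihood used for training, and the exceedance probability that a predicted mean and standard deviation imply for a temperature threshold. The sketch below works per node with scalar inputs. The function names and the example numbers are hypothetical, and the paper's exact loss parameterization is not given in the abstract.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def exceedance_probability(mu, sigma, threshold):
    """P(T > threshold) when T ~ N(mu, sigma^2)."""
    z = (threshold - mu) / sigma
    # 1 - Phi(z) written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

# A node predicted at 30 degrees with 2 degrees of predictive uncertainty.
p_above_32 = exceedance_probability(mu=30.0, sigma=2.0, threshold=32.0)
loss = gaussian_nll(y=31.0, mu=30.0, sigma=2.0)
```

A threshold one standard deviation above the predicted mean gives an exceedance probability of roughly 0.159, so a map of such values over all nodes highlights where heat-risk thresholds are likely to be crossed.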

2606.02035 2026-06-02 cs.AI cs.LG

RL-ACRGNet: Reinforcement Learning-Based Chest Radiology Report Generation Network

RL-ACRGNet：基于强化学习的胸部放射学报告生成网络

Yogesh Kumar Meena, Saurabh Agarwal, K. V. Arya

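AI summary: Proposes RL-ACRGNet, an off-policy reinforcement learning framework that combines a pre-trained DenseNet encoder with a multilevel LSTM decoder and refines visual-semantic embeddings through a metric-based reward mechanism. It outperforms baselines on the IU-Xray dataset, and evaluation on MIMIC-CXR confirms its generalisation and its ability to generate high-quality clinical reports.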

Comments: This work has been submitted to the IEEE for possible publication

AI Chinese abstract

医学影像解读是现代临床诊断的基石，然而手动生成放射学报告既耗时又容易出现解读不一致。在医学AI领域，通过深度学习自动化这些描述有望简化临床工作流程并标准化诊断输出。然而，由于在捕获细粒度视觉特征和确保临床连贯性方面的局限性，准确的疾病检测和精确的报告生成仍然是重大挑战。为了解决这些问题，我们提出了RL-ACRGNet，一种改进的编码器-解码器模型，它将预训练的DenseNet编码器与多级LSTM解码器集成在离策略强化学习框架中。通过使用双网络方法，基于度量奖励机制细化视觉语义嵌入，我们证明RL-ACRGNet在IU-Xray数据集上持续优于最先进的基线，在BLEU-4（0.47%）、METEOR（0.17%）和ROUGE-L（0.518）上取得了定量改进。此外，在大规模MIMIC-CXR数据集上的综合评估证实了该模型的稳健泛化能力及其生成高质量、临床相关报告的能力。

English abstract

Medical imaging interpretation is a foundational pillar of modern clinical diagnostics, yet the manual generation of radiology reports remains a time-consuming process prone to interpretation inconsistencies. Within the field of medical AI, automating these descriptions through deep learning promises to streamline clinical workflows and standardise diagnostic output. However, accurate disease detection and precise report generation remain significant challenges due to limitations in capturing fine-grained visual features and ensuring clinical coherence. To address these issues, we propose RL-ACRGNet, an improved encoder-decoder model that integrates a pre-trained DenseNet encoder with a multilevel LSTM decoder within an off-policy reinforcement learning framework. Using a dual-network approach to refine visual-semantic embeddings through a metric-based reward mechanism, we demonstrate that RL-ACRGNet consistently outperforms state-of-the-art baselines on the IU-Xray dataset, achieving quantitative improvements in BLEU-4 (0.47%), METEOR (0.17%) and ROUGE-L (0.518). Furthermore, comprehensive evaluations on the large-scale MIMIC-CXR dataset confirm the robust generalisation of the model and its ability to generate high-quality, clinically relevant reports.
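
The abstract does not say which metric drives the reward. Purely as an illustration of a metric-based reward, the sketch below scores a generated report against a reference with a simplified ROUGE-L: the F1 of precision and recall computed from the longest common subsequence of whitespace tokens. The choice of ROUGE-L, the balanced F1 weighting, the function names, and the example sentences are all assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_reward(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on whitespace tokens, usable as a scalar reward."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

reward = rouge_l_reward("no acute disease", "no acute cardiopulmonary disease")
```

In this example the candidate matches three of the four reference tokens in order, giving precision 1.0, recall 0.75, and a reward of 6/7. In a reinforcement learning loop, a scalar like this would be computed per sampled report and fed back to the policy update.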
