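arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.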

2605.25388 2026-05-26 cs.LG q-bio.QM

ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks

Dongxin Ye, Fang Hu, Han Hu, Shu Hu, Yang Tan, Wanli Ouyang, Stan Z. Li, Jie Cui, Nanqing Dong

Affiliations: Shanghai Innovation Institute, Shanghai, China; University of Electronic Science; Fudan University, Shanghai, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Institute of Infection and Health, Fudan University, Shanghai, China; Shanghai Sci-Tech Inno Center for Infection & Immunity, Shanghai, China; Shanghai Jiao Tong University, Shanghai, China; Shenzhen Loop Area Institute, Shenzhen, China; Chinese University of Hong Kong, Hong Kong, China; Westlake University, Hangzhou, China

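AI summary: Proposes ViroBench, the first comprehensive benchmark for viral genomics, which evaluates 66 nucleotide foundation models on biological understanding and latent biosecurity risk. It finds that model performance degrades under phylogenetic and temporal shifts, that statistical likelihood is decoupled from biological functional validity in generation tasks, and that taxonomic diversity in pretraining data matters more than parameter scale.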

Comments 42 pages, 15 figures

DOI: 10.1145/3770855.3819057

Abstract

Nucleotide sequences constitute the fundamental genetic basis of biological systems, rendering viral genomic analysis critical for biomedical advancement. Despite progress in biological foundation models, specifically nucleotide foundation models (NFMs), the field lacks a unified standard for viral genomics to facilitate community development and enforce biosecurity constraints. To address this, we introduce ViroBench, the first comprehensive and large-scale benchmark specifically designed for NFMs in viral settings. ViroBench evaluates models across two critical dimensions: biological understanding and latent biosecurity risk, covering 18 diverse scenarios within 4 task types. Extensive evaluation of 66 NFMs across diverse architectures yields three critical conclusions. Firstly, NFMs exhibit a performance degradation in biological understanding under phylogenetic and temporal shifts, indicating weak extrapolation capabilities. Secondly, generation tasks reveal a decoupling between statistical likelihood and biological functional validity, posing latent biosecurity risks. Thirdly, controlled ablation studies reveal that taxonomic diversity in pretraining data outweighs parameter scale. Specifically, a lightweight baseline trained on diverse data achieves a 67.5% performance gain over its original model. Overall, ViroBench provides interpretable, diagnostic evaluations and a reproducible measurement framework for future research on viral nucleotide foundation models. The datasets and code are publicly available at https://github.com/QIANJINYDX/ViroBench.

2605.25385 2026-05-26 cs.CV cs.AI

Weakly Supervised Camouflaged Object Detection Based on the SAM Model and Mask Guidance

Xia Li, Xinran Liu, Lin Qi, Junyu Dong

Affiliations: School of Computer Science and Technology, Ocean University of China, Qingdao 266100, China

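AI summary: Proposes MGNet, a network for weakly supervised camouflaged object detection that uses SAM to generate pseudo-labels and combines a cascaded mask decoder, a context enhancement module, and a mask-guided feature aggregation module, reaching performance comparable to fully supervised methods.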

Comments 18 pages

DOI: 10.1016/j.imavis.2025.105571

Abstract

Camouflaged object detection (COD) from a single image is a challenging task due to the high similarity between objects and their surroundings. Existing fully supervised methods require labor-intensive pixel-level annotations, making weakly supervised methods a viable compromise that balances accuracy and annotation efficiency. However, weakly supervised methods often experience performance degradation due to the use of coarse annotations. In this paper, we introduce a new weakly supervised approach for camouflaged object detection to overcome these limitations. Specifically, we propose a novel network, MGNet, which tackles edge ambiguity and missed detections by utilizing initial masks generated by our custom-designed Cascaded Mask Decoder (CMD) to guide the segmentation process and enhance edge predictions. We introduce a Context Enhancement Module (CEM) to reduce missed detections, and a Mask-guided Feature Aggregation Module (MFAM) for effective feature aggregation. For the weak supervision challenge, we propose BoxSAM, which leverages the Segment Anything Model (SAM) with bounding-box prompts to generate pseudo-labels. By employing a redundant processing strategy, high-quality pixel-level pseudo-labels are provided for training MGNet. Extensive experiments demonstrate that our method delivers competitive performance against current state-of-the-art methods.

2605.25384 2026-05-26 cs.CL

Adversarial Orthogonal Disentanglement for LVLM Hallucination Mitigation

Ruoxi Cheng, Haoxuan Ma, Zhengfei Hai, Yiyan Huang, Ranjie Duan, Tianle Zhang, Xu Yang, Ziyi Ye, Xingjun Ma

Affiliations: Fudan University; Tencent; Nanjing University; Southeast University; Great Bay University; TeleAI, China Telecom

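AI summary: Proposes the Adversarial Orthogonal Disentanglement (AOD) framework, which learns a hallucination-related direction through a minimax objective and uses a dual-forward-pass contrastive decoding strategy to mitigate hallucinations in large vision-language models (LVLMs) without additional training.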

Abstract

Large Vision-Language Models (LVLMs) have advanced multimodal understanding, yet their reliability is limited by hallucination, where generated content conflicts with visual facts. Existing mitigation methods either rely on costly external interventions, such as instruction tuning and retrieval, or use internal mechanisms that remain limited by flawed attention weights and entangled hidden representations. We propose Adversarial Orthogonal Disentanglement (AOD), a latent geometric framework for mitigating LVLM hallucinations. AOD learns a hallucination-related direction through a minimax objective: a classifier concentrates hallucination signals into the projected component, while an adversary removes them from the orthogonal residual space via a Gradient Reversal Layer. The learned direction enables a training-free dual-forward-pass contrastive decoding strategy that suppresses hallucinations while preserving general capabilities. Experiments on three LVLMs across four hallucination and four utility benchmarks show that AOD consistently outperforms strong baselines. It improves POPE accuracy by over 6% on average, boosts AMBER by 6%, and maintains strong performance on utility tasks such as MMMU. Further analysis shows robust transfer across datasets, suggesting that AOD captures general hallucination-related biases rather than dataset-specific artifacts. Our source code and datasets are available at https://github.com/Hunter-Wrynn/AOD.
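
As a rough illustration of the geometry the abstract describes, the sketch below splits a hidden vector into its component along a single direction and the orthogonal residual. The vector `h`, the direction `d`, and the function names are hypothetical. This is not the authors' implementation, and the minimax training and contrastive decoding steps are not shown.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decompose(h, d):
    """Split h into (projection onto d, residual orthogonal to d)."""
    norm = math.sqrt(dot(d, d))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    u = [x / norm for x in d]            # unit vector along d
    coeff = dot(h, u)                    # signed length of h along d
    proj = [coeff * x for x in u]        # component inside span(d)
    resid = [a - p for a, p in zip(h, proj)]  # what is left over
    return proj, resid


# Illustrative values only.
h = [3.0, 4.0, 0.0]
d = [0.0, 2.0, 0.0]
proj, resid = decompose(h, d)
```

By construction the residual has zero inner product with `d`, and the two parts sum back to `h`.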

2605.25373 2026-05-26 cs.CV

Physics-Aware 3D Gaussian Editing for Driving Scene Generation

Feng Zhou, Jian Zhang, Yuhang Sun, He Wang, Qiong Wen, Debao Kong, Tieru Wu, Rui Ma

Affiliations: School of Artificial Intelligence, Jilin University; National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University; China FAW Group Co., Ltd.

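AI summary: Proposes RoVES, a system for physics-aware driving scene editing and vehicle pose correction, built on single-image-driven road geometry insertion and a 4-DOF half-car dynamics model.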

Multi-Agent Collaboration for Reliable Fetal Ultrasound Interpretation

Xiaotian Hu, Mingxuan Liu, Junwei Huang, Kasidit Anmahapong, Yifei Chen, Yiming Huang, Xuguang Bai, Zihan Li, Hongjia Yang, Yingqi Hao, Hong Xu, Yu Jiang, Tian Tian, Yi Liao, Haibo Qu, Qiyuan Tian

Affiliations: Tsinghua University; University of California San Diego; West China Second University Hospital, Sichuan University

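AI summary: Proposes FetUSAgents, a multi-agent system that integrates visual tools with clinical reasoning through collaborative LLM agents and Dual-Path Evidence Arbitration (DPEA), supporting fetal ultrasound VQA, report generation, and related tasks, and exceeding the strongest baseline by more than 25% in VQA accuracy.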

Abstract

Automated fetal ultrasound interpretation requires a workflow from visual perception, including plane recognition and anatomical segmentation, to clinical understanding, including biometric measurement and diagnostic reporting. However, the prevailing "one-task, one-model" paradigm limits systematic integration of evidence across this multi-step process. Although multimodal large language models (MLLMs) show promising visual understanding, their limited domain-specific grounding and hallucination risks restrict reliability in fetal ultrasound analysis. To address these limitations, we propose FetUSAgents, a tool-augmented multi-agent system for comprehensive fetal ultrasound interpretation, supporting visual question answering (VQA), report generation, image captioning, and video summarization. FetUSAgents coordinates task-specific visual tools through collaborative LLM agents and decomposes clinical queries into subtasks that progress from anatomical recognition to quantitative measurement. We further introduce Dual-Path Evidence Arbitration (DPEA), which integrates LLM-based deliberative reasoning with structured computational evidence from specialized visual tools. A retrieval-enhanced evidence bank consolidates intermediate findings to support traceable and clinically grounded conclusions. In addition, we construct FetUS-VQA, a dedicated VQA benchmark for fetal ultrasound, comprising 1,892 images and 3,205 question-answer pairs across 10 clinical tasks. Extensive out-of-distribution experiments show that FetUSAgents outperforms general and medical MLLMs, exceeding the strongest baseline by more than 25 percent in VQA accuracy. These results suggest a scalable route toward evidence-driven clinical assistants for prenatal imaging. Code is available.

2605.25354 2026-05-26 cs.AI

Context-CoT: Enhancing Context Learning via High-Quality Reasoning Synthesis

Hongbo Jin, Mingnan Zhu, Jingqi Tian, Xu Jiang, Zhongjing Du, Haoran Tang, Siyi Xie, Qiaoman Zhang, Jiayu Ding

Affiliations: Peking University; Xiamen University; Tsinghua University

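AI summary: To address the limited ability of large language models to dynamically extract and apply new knowledge from context, proposes Context-CoT, which strengthens context learning by synthesizing high-quality reasoning chains and significantly improves performance on CL-Bench.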

2605.25352 2026-05-26 cs.LG cs.AI

Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

Konstantinos Emmanouilidis, Tianjiao Ding, Nghia Nguyen, Nicolas Loizou, René Vidal

Affiliations: CS & MINDS, Johns Hopkins University; CIS, University of Pennsylvania; AMS & MINDS, Johns Hopkins University; ESE, Radiology & IDEAS, University of Pennsylvania

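AI summary: Proposes a framework that uses a pretrained encoder to map inputs to a latent distribution that is approximately a Gaussian mixture, proves that the resulting robustness degradation is bounded, and thereby obtains certifiably robust classifiers that reach state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet.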

Abstract

Deep learning models are vulnerable to adversarial perturbations, raising important concerns for safety-critical deployment. Empirical defenses can achieve strong robustness in practice, but lack formal guarantees, motivating the need for certifiably robust classifiers. While certified methods provide formal guarantees, they often yield overly conservative bounds due to their inability to exploit structure in complex data distributions. In this work, we propose a framework for designing certifiably robust classifiers that leverages latent structure in data representations. We first analyze the Gaussian mixture setting, deriving necessary and sufficient conditions for the existence of robust classifiers and constructing a classifier with a closed-form robustness certificate and generalization guarantees. Our main contribution is to show that exact structure is not required: we prove that if a pretrained encoder maps inputs to a latent distribution that is $\varepsilon$-close (in KL divergence) to a Gaussian mixture, then certified accuracy degrades gracefully, with an explicit bound relating robustness under the true and approximate distributions. This result enables the direct use of pretrained models without requiring exact distributional assumptions. Empirically, our method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet, while maintaining strong clean performance and low computational overhead. Overall, our work establishes approximate latent structure as a practical and principled route to certifiable robustness.
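
For intuition about what a closed-form certificate can look like, the sketch below covers only the simplest textbook case: two equal-prior Gaussian classes sharing an isotropic covariance, for which the Bayes rule is linear, and a latent point whose prediction cannot change under any L2 perturbation smaller than its distance to the decision hyperplane. The means, the point `z`, and the function names are illustrative assumptions. The certificate derived in the paper for approximate Gaussian mixtures may differ, and the perturbation here is measured in latent space, not input space.

```python
import math


def bayes_linear_rule(mu0, mu1):
    """Linear Bayes rule sign(w.z + b) for two equal-prior Gaussians
    that share the same isotropic covariance."""
    w = [a - b for a, b in zip(mu1, mu0)]
    b = -0.5 * (sum(a * a for a in mu1) - sum(a * a for a in mu0))
    return w, b


def certified_l2_radius(w, b, z):
    """Distance from z to the hyperplane w.z + b = 0.
    Any latent perturbation with a smaller L2 norm keeps the label."""
    margin = sum(wi * zi for wi, zi in zip(w, z)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(margin) / norm


# Illustrative values only.
w, b = bayes_linear_rule([-1.0, 0.0], [1.0, 0.0])
radius = certified_l2_radius(w, b, [0.5, 3.0])
```

With these toy means the boundary is the vertical axis, so the radius is simply the absolute value of the first coordinate of `z`.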

2605.25347 2026-05-26 cs.CV cs.LG

CausalFlow: Causal Attribution and Counterfactual Repair of LLM Agent Failures

Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad, Houman Homayoun

Affiliations: Department of Computer Science, University of California, Davis

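AI summary: Proposes the CausalFlow framework, which computes step-level Causal Responsibility Scores through counterfactual intervention, identifies failure-inducing steps, and generates minimally edited repairs for test-time repair and training-time supervision, outperforming heuristic methods on multiple benchmarks.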

Abstract

Large language model (LLM) agents frequently fail on multi-step tasks involving reasoning, tool use, and environment interaction. While such failures are typically logged or retried heuristically, they contain structured signals about where execution broke down. We introduce CausalFlow, an interventional framework that converts failed agent traces into minimal counterfactual repairs and reusable supervision. CausalFlow models execution traces as sequential chains of dependent steps and computes Causal Responsibility Scores (CRS) via step-level counterfactual intervention to identify failure-inducing steps. For these steps, we generate minimally edited repairs that flip the final outcome to success, producing validated contrastive pairs of the form (wrong step, corrected step). CausalFlow supports two complementary uses: targeted test-time repair that recovers from failures with minimal behavioral drift, and training-time supervision suitable for offline preference optimization or reward modeling. Across four benchmarks spanning mathematical reasoning, code generation, question answering, and medical browsing, CausalFlow converts failed executions into validated minimal repairs with high minimality and causal-consensus scores, and demonstrates that causal attribution is necessary for reliable improvement across diverse agent tasks, outperforming heuristic refinement in complex retrieval settings while producing more localized repairs throughout. These results demonstrate that interventional analysis over structured execution traces provides a principled and scalable mechanism for transforming agent failures into reliability gains and learning-ready supervision.
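
The idea of step-level counterfactual intervention can be sketched with a toy trace. The abstract does not define how the Causal Responsibility Score is computed, so the scoring rule below (the fraction of single-step replacements that flip a failing trace to success) is an assumption, as are the function name, the toy task, and the candidate replacements.

```python
def responsibility_scores(trace, alternatives, succeeds):
    """For each step, return the fraction of single-step replacements
    that turn the failing trace into a passing one (assumed scoring rule)."""
    if succeeds(trace):
        raise ValueError("expected a failing trace")
    scores = []
    for i, options in enumerate(alternatives):
        flips = 0
        for alt in options:
            # Intervene on step i only; all other steps stay unchanged.
            patched = trace[:i] + [alt] + trace[i + 1:]
            if succeeds(patched):
                flips += 1
        scores.append(flips / len(options) if options else 0.0)
    return scores


# Toy task: a trace "succeeds" when every step is an even number.
def all_even(steps):
    return all(s % 2 == 0 for s in steps)


trace = [2, 3, 4]                          # fails because of the 3
alternatives = [[4, 6], [2, 5], [6, 8]]    # hypothetical replacements per step
scores = responsibility_scores(trace, alternatives, all_even)
# → [0.0, 0.5, 0.0]
```

Only the second step receives a non-zero score, and the replacement that flipped the outcome (3 replaced by 2) gives a contrastive pair of the form (wrong step, corrected step).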

2605.25334 2026-05-26 cs.CV

Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence

Yufei Zheng, Xuhan Zhu, Zide Liu, Chunpeng Zhou, Chenfeng Wang, Yongchao Xu, Yunnan Wang, Jiawei Liu, Pengfei Yu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Affiliations: University of Science and Technology of China; Li Auto Inc.; Shanghai Jiao Tong University

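AI summary: Proposes GAMSI, a multimodal large language model that takes only RGB images as input and jointly perceives 3D structure and metric scale through dual-pathway queries and expert-guided visual alignment, achieving state-of-the-art performance on seven spatial intelligence benchmarks.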

Abstract

Spatial understanding of the physical world from 2D visual inputs hinges on two complementary forms of geometric knowledge: holistic 3D structural perception and fine-grained metric scale estimation. Existing multimodal large language models (MLLMs) typically address only one facet, ingesting either depth maps or point clouds as additional model inputs, which incurs substantial computational overhead and inherits the generalization limitations of upstream prediction models. We propose GAMSI, a dual-pathway Geometry-Aware MLLM for Spatial Intelligence that takes only RGB images as input while internalizing both forms of geometric prior within a unified autoregressive backbone. Specifically, we introduce Metric-Structure Decoupled Queries (MSDQ) which employ two groups of learnable queries to respectively extract dense metric signals and sparse structural cues from the shared visual context, with a task-decoupled attention mask further preventing the two pathways from contaminating each other. Building on this, an Expert-Guided Visual Grounding (EVG) module projects the aggregated cues back to frame-level visual features and aligns them with vision foundation models, which serve purely as training-time supervision, rather than as model inputs. We further build a multi-task spatial instruction-tuning dataset (MTS) comprising 152,776 samples spanning 13 task types and three visual modalities, consolidated from six public datasets. Trained with a two-stage curriculum, GAMSI achieves state-of-the-art performance on seven spatial intelligence benchmarks.
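
One way to picture a task-decoupled attention mask is the sketch below, which builds a boolean mask in which two query groups both read the shared visual tokens but never attend to each other. The token layout (visual tokens first, then metric queries, then structure queries), the function name, and the omission of causal ordering are all assumptions made for illustration. The abstract does not specify how GAMSI lays out or masks its tokens.

```python
def decoupled_mask(n_visual, n_metric, n_struct):
    """Return mask[i][j], True when token i may attend to token j.
    Assumed layout: [visual tokens | metric queries | structure queries]."""
    n = n_visual + n_metric + n_struct

    def group(i):
        if i < n_visual:
            return "visual"
        if i < n_visual + n_metric:
            return "metric"
        return "struct"

    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Every token may read the visual context and its own group,
            # but the two query groups stay isolated from each other.
            mask[i][j] = group(j) == "visual" or group(i) == group(j)
    return mask


# Two visual tokens, one metric query (index 2), one structure query (index 3).
mask = decoupled_mask(2, 1, 1)
```

In this toy layout the metric query at index 2 can attend to both visual tokens and to itself, while its entry for the structure query at index 3 is masked out, and vice versa.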

2605.25333 2026-05-26 cs.CV

Recursive Class Connectivity Classification (R3C) Applied to Binary Image Segmentation to Improve Infant Fingerprint Enhancement

Joao Leonardo Harres Dall Agnol, Luiz Fernando Puttow Southier, Jefferson Tales Oliva, Marcelo Teixeira, Rodrigo Mineto, Marcelo Filipa, Dalcimar Casanova, Erick Oliveira Rodrigues

Affiliations: Infant.ID Ltda; Graduate Program in Production and Systems Engineering (PPGEPS), Federal University of Technology-Paraná (UTFPR)

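AI summary: Proposes the Recursive Class Connectivity Classification (R3C) framework, which improves the binary segmentation outputs of existing enhancement methods by iteratively extending ridge structures, raising infant fingerprint recognition rates without requiring training data.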

Journal ref IEEE Access 2025

DOI: 10.1109/ACCESS.2025.3594912

Abstract

Image enhancement plays a crucial role in infant fingerprint matching, as child-specific characteristics such as smaller finger dimensions and thinner ridge structures often degrade image quality during acquisition. To address these limitations, enrollment typically depends on specialized high-resolution scanners, which most existing enhancement methods are not designed to support. Consequently, identification rates for children remain significantly lower than those achieved with adult fingerprints. This study introduces Recursive Class Connectivity Classification (R3C), a novel framework that iteratively refines binary segmentation outputs from existing enhancement methods by extending ridge structures. R3C does not require modifications to the underlying classifier and operates without training data, which is not currently available for infant fingerprints. Instead, the method improves segmentation by repeatedly feeding the classified image back into the classification process, while combining each intermediate segmentation with the original input image. Experiments conducted on three fingerprint datasets using four different enhancement classifiers show that R3C can increase the True Acceptance Rate (TAR) by up to 4% for children and over 40% for newborns, compared to using the enhancement methods alone. A qualitative analysis further demonstrates that R3C reconnects fragmented ridge patterns, improving the visual quality of segmentation. Because it functions independently of the enhancement method used, R3C provides a flexible and broadly applicable solution for improving binary segmentation.
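
The feedback loop described in the abstract can be sketched on a one-dimensional toy signal. Everything specific here is an assumption: the stand-in classifier (a 3-pixel mean compared with a threshold), the combination rule (element-wise maximum of the original image and the current segmentation), the stopping criterion, and the sample values. The abstract does not say how R3C combines the segmentation with the input, so this should be read as one possible instance of the loop, not the published method.

```python
def classify(signal, threshold=0.5):
    """Toy stand-in classifier: a pixel is ridge (1) when the mean of
    its 3-pixel neighbourhood exceeds the threshold."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        out.append(1 if sum(window) / len(window) > threshold else 0)
    return out


def recursive_refine(image, max_iters=10):
    """Feed the segmentation back into the unchanged classifier,
    combined with the original image, until it stops changing."""
    seg = classify(image)
    for _ in range(max_iters):
        # Assumed combination rule: element-wise maximum.
        combined = [max(p, s) for p, s in zip(image, seg)]
        new_seg = classify(combined)
        if new_seg == seg:
            break
        seg = new_seg
    return seg


# A ridge with a faint two-pixel gap in the middle (illustrative values).
image = [0.8, 0.8, 0.4, 0.2, 0.8, 0.8]
first_pass = classify(image)       # → [1, 1, 0, 0, 1, 1]
refined = recursive_refine(image)  # → [1, 1, 1, 1, 1, 1]
```

The first pass leaves the ridge broken, and one round of feedback is enough to reconnect it in this example. The classifier itself is never modified, which mirrors the abstract's point that the underlying classifier needs no changes.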

2605.25305 2026-05-26 cs.LG

Electricity Consumption Forecasting: An Approach Using Cooperative Ensemble Learning with SHapley Additive exPlanations

Eduardo Luiz Alba, Gilson Adamczuk Oliveira, Matheus Henrique Dal Molin Ribeiro, Érick Oliveira Rodrigues

Affiliations: Industrial & Systems Engineering Graduate Program (PPGEPS), Federal University of Technology-Paraná (UTFPR)

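AI summary: Proposes a cooperative ensemble learning approach named Weaker Separator Booster (WSB) that combines LSTM, RF, SVR, and XGBoost models, uses SHAP for feature selection and a genetic algorithm and particle swarm optimization for hyperparameter tuning, and forecasts electricity consumption 12 months ahead for two campuses of a Brazilian federal institute with low error.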

Journal ref Forecasting 2024

DOI: 10.3390/forecast6030042

Abstract

Electricity expense management presents significant challenges, as this resource is susceptible to various influencing factors. In universities, the demand for this resource is rapidly growing with institutional expansion and has a significant environmental impact. In this study, the machine learning models long short-term memory (LSTM), random forest (RF), support vector regression (SVR), and extreme gradient boosting (XGBoost) were trained with historical consumption data from the Federal Institute of Paraná (IFPR) over the last seven years and climatic variables to forecast electricity consumption 12 months ahead. Datasets from two campuses were adopted. To improve model performance, feature selection was performed using Shapley additive explanations (SHAP), and hyperparameter optimization was carried out using genetic algorithm (GA) and particle swarm optimization (PSO). The results indicate that the proposed cooperative ensemble learning approach named Weaker Separator Booster (WSB) exhibited the best performance on the datasets. Specifically, it achieved an sMAPE of 13.90% and MAE of 1990.87 kWh for the IFPR-Palmas Campus and an sMAPE of 18.72% and MAE of 465.02 kWh for the Coronel Vivida Campus. The SHAP analysis revealed distinct feature importance patterns across the two IFPR campuses. A commonality that emerged was the strong influence of lagged time-series values and a minimal influence of climatic variables.
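
The two error metrics reported in the abstract can be computed as in the sketch below. Several sMAPE variants are in use and the abstract does not say which one the authors adopted, so the denominator chosen here, the mean of the absolute actual and forecast values, is an assumption. The sample numbers are made up for illustration and are not taken from the paper.

```python
def mae(actual, forecast):
    """Mean absolute error, in the same unit as the data (e.g. kWh)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent, using (|a| + |f|) / 2 as the denominator.
    Pairs where both values are zero contribute no error."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += abs(a - f) / denom if denom else 0.0
    return 100 * total / len(actual)


# Illustrative monthly consumption values in kWh, not data from the paper.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
error_kwh = mae(actual, forecast)
error_pct = smape(actual, forecast)
```

Because MAE is expressed in kWh, it scales with the size of a campus, whereas sMAPE is a relative measure. That is consistent with the abstract reporting a much larger MAE but a lower sMAPE for the Palmas campus than for Coronel Vivida.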
