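arXivDaily daily academic digest: synchronizes the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.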

2512.09543 2026-06-10 cs.SE cs.AI

SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs

Arihant Tripathy, Ch Pavan Harshit, Karthik Vaidhyanathan

Affiliations: SERC, IIIT-Hyderabad

Details

DOI: 10.1145/3786167.3788406
Journal ref: Proceedings of the 2026 International Workshop on Agentic Engineering (AGENT 2026), ACM, 2026, pp. 104-111
Comments: 8 pages, 5 figures, 1 table. Accepted to AGENT 2026 (ICSE 2026 workshop)

Abstract

Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but their practical effectiveness and efficiency within complex agentic frameworks for automated issue resolution remain poorly understood. Goal. We investigate the performance, energy efficiency, and resource consumption of four leading agentic issue resolution frameworks when deliberately constrained to using SLMs. We aim to assess the viability of these systems for this task in resource-limited settings and characterize the resulting trade-offs. Method. We conduct a controlled evaluation of four leading agentic frameworks (SWE-Agent, OpenHands, Mini SWE Agent, AutoCodeRover) using two SLMs (Gemma-3 4B, Qwen-3 1.7B) on the SWE-bench Verified Mini benchmark. On fixed hardware, we measure energy, duration, token usage, and memory over 150 runs per configuration. Results. We find that framework architecture is the primary driver of energy consumption. The most energy-intensive framework, AutoCodeRover (Gemma), consumed 9.4x more energy on average than the least energy-intensive, OpenHands (Gemma). However, this energy is largely wasted. Task resolution rates were near-zero, demonstrating that current frameworks, when paired with SLMs, consume significant energy on unproductive reasoning loops. The SLM's limited reasoning was the bottleneck for success, but the framework's design was the bottleneck for efficiency. Conclusions. Current agentic frameworks, designed for powerful LLMs, fail to operate efficiently with SLMs. We find that framework architecture is the primary driver of energy consumption, but this energy is largely wasted due to the SLMs' limited reasoning. Viable low-energy solutions require shifting from passive orchestration to architectures that actively manage SLM weaknesses.
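The abstract does not state how energy was measured. A common approach, assumed here, is to sample power draw at known timestamps, integrate it per run, and then compare mean per-run energy across configurations. A statement such as "9.4x more energy on average" is a ratio of exactly this kind. The function names and numbers below are hypothetical.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two aligned (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt  # area of one trapezoid
    return total


def mean_energy_ratio(runs_a, runs_b):
    """Ratio of mean per-run energy of configuration A to that of configuration B."""
    return (sum(runs_a) / len(runs_a)) / (sum(runs_b) / len(runs_b))


# Hypothetical run sampled once per second at a steady 50 W for 10 s.
t = list(range(11))
p = [50.0] * 11
run_energy = energy_joules(t, p)  # 500.0 J
```

Averaging per-run energies before taking the ratio matches the "on average" phrasing; a ratio of medians would be less sensitive to a few runaway agent loops.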

2512.04799 2026-06-10 cs.CL

DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors

Gianluca Barmina, Nathalie Carmen Hau Norman, Peter Schneider-Kamp, Lukas Galke Poech

Affiliations: University of Southern Denmark; University of Copenhagen

2510.15470 2026-06-10 cs.CV cs.IR

MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval

Jinghao Huang, Yaxiong Chen, Ganchao Liu

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; School of Computer Science and Artificial Intelligence, Wuhan University of Technology; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University

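AI summary: This paper proposes MSAM, which improves cross-modal drone video-text retrieval through a multi-semantic adaptive learning mechanism and uses fine-grained interaction and an adaptive semantic construction module to make feature representations more robust.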

Details

DOI: 10.1109/TCSVT.2026.3701979

Abstract

With the advancement of drone technology, the volume of drone video data is growing rapidly, creating an urgent need for efficient semantic retrieval. We are the first to systematically propose and study the drone video-text retrieval (DVTR) task. Drone videos feature overhead perspectives, strong structural homogeneity, and diverse semantic expressions of target combinations, characteristics that existing cross-modal methods designed for ground-level views struggle to model effectively. Dedicated retrieval mechanisms tailored to drone scenarios are therefore necessary. To address this issue, we propose a novel approach called Multi-Semantic Adaptive Mining (MSAM). MSAM introduces a multi-semantic adaptive learning mechanism, which incorporates dynamic changes between frames and extracts rich semantic information from specific scene regions, thereby deepening the understanding of and reasoning about drone video content. This method relies on fine-grained interactions between words and drone video frames, integrating an adaptive semantic construction module, a distribution-driven semantic learning term and a diversity semantic term to deepen the interaction between text and drone video modalities and improve the robustness of feature representation. To reduce the interference of complex backgrounds in drone videos, we introduce a cross-modal interactive feature fusion pooling mechanism that focuses on feature extraction and matching in target regions, minimizing noise effects. Extensive experiments on two self-constructed drone video-text datasets show that MSAM outperforms other existing methods in the drone video-text retrieval task. The source code and dataset will be made publicly available.
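The abstract stresses fine-grained interaction between words and video frames. A common, generic way to score such interaction, assumed here and not taken from the paper, is a word-by-frame cosine similarity matrix aggregated by max-then-mean in both directions. The function names and the aggregation rule are illustrative; MSAM's adaptive semantic construction, distribution-driven and diversity terms, and fusion pooling are not modeled.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fine_grained_score(word_feats, frame_feats):
    """Symmetric word-frame matching score.

    word_feats: (num_words, d) text token features; frame_feats: (num_frames, d).
    """
    sim = l2_normalize(word_feats) @ l2_normalize(frame_feats).T  # (words, frames) cosines
    text_to_video = sim.max(axis=1).mean()  # each word picks its best-matching frame
    video_to_text = sim.max(axis=0).mean()  # each frame picks its best-matching word
    return float(0.5 * (text_to_video + video_to_text))


def rank_videos(word_feats, videos):
    """Return video indices sorted from best to worst match for one text query."""
    scores = [fine_grained_score(word_feats, v) for v in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i], reverse=True)
```

Matching at the word-frame level lets a single word such as a target name attach to the one frame where that target is visible, instead of being diluted by a single pooled video embedding.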

2510.03844 2026-06-10 cs.LG stat.AP stat.ME

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

Affiliations: Department of Statistical Sciences, Wake Forest University; Wake Forest University; Wake Forest University School of Medicine; Department of Psychology, North Carolina State University; Department of Biostatistics and Data Science, Wake Forest University School of Medicine; Department of Anesthesiology, Division of Critical Care Medicine, Wake Forest University School of Medicine; Outcomes Research Consortium

Details

DOI: 10.1093/jamiaopen/ooag080
Journal ref: 2026

Abstract

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLMs) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. In the larger study of 1,000 patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: Depending on the roadmap, the algorithm recovered as much missing data as the expert chart reviewers, or more. Discussion: Clinically-driven algorithms (enhanced by LLMs) can recover missing EHR data with accuracy similar to that of chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausibility) is a promising future direction.
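The roadmap idea can be sketched as a lookup from each lab variable to ICD-10 anchor code prefixes. The variable name `hba1c`, the two codes chosen (E11, type 2 diabetes mellitus; R73, elevated blood glucose level), and the categorical "unhealthy" label are illustrative assumptions, not the authors' roadmap.

```python
# Hypothetical roadmap: lab variable -> ICD-10 code prefixes treated as anchors.
# E11 = type 2 diabetes mellitus, R73 = elevated blood glucose level.
ROADMAP = {"hba1c": ("E11", "R73")}


def recover_missing(record, roadmap=ROADMAP):
    """Fill missing lab categories from anchor diagnoses.

    record: dict mapping lab names to "healthy"/"unhealthy" (None if missing),
    plus an "icd10" list of diagnosis codes. Observed values are never overwritten;
    a missing value stays None when no anchor code is present.
    """
    out = dict(record)
    codes = record.get("icd10", [])
    for lab, prefixes in roadmap.items():
        # str.startswith accepts a tuple, so any matching prefix counts as an anchor.
        if out.get(lab) is None and any(c.startswith(prefixes) for c in codes):
            out[lab] = "unhealthy"
    return out
```

In this framing, refining the roadmap with LLM suggestions amounts to adding prefixes to the tuples after clinician review; the recovery logic itself does not change.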

2508.00491 2026-06-10 cs.RO cs.AI

HannesImitation: Grasping with the Hannes Prosthetic Hand via Imitation Learning

Carlo Alessi, Federico Vasile, Federico Ceola, Giulia Pasquale, Nicolò Boccardo, Lorenzo Natale

Affiliations: Humanoid Sensing and Perception; Istituto Italiano di Tecnologia; Rehab Technologies Lab

Details

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems, Hangzhou, China, 2025
Comments: Paper accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Abstract

Recent advancements in control of prosthetic hands have focused on increasing autonomy through the use of cameras and other sensory inputs. These systems aim to reduce the cognitive load on the user by automatically controlling certain degrees of freedom. In robotics, imitation learning has emerged as a promising approach for learning grasping and complex manipulation tasks while simplifying data collection. Its application to the control of prosthetic hands remains, however, largely unexplored. Bridging this gap could enhance dexterity restoration and enable prosthetic devices to operate in more unconstrained scenarios, where tasks are learned from demonstrations rather than relying on manually annotated sequences. To this end, we present HannesImitationPolicy, an imitation learning-based method to control the Hannes prosthetic hand, enabling object grasping in unstructured environments. Moreover, we introduce the HannesImitationDataset comprising grasping demonstrations in table, shelf, and human-to-prosthesis handover scenarios. We leverage such data to train a single diffusion policy and deploy it on the prosthetic hand to predict the wrist orientation and hand closure for grasping. Experimental evaluation demonstrates successful grasps across diverse objects and conditions. Finally, we show that the policy outperforms a segmentation-based visual servo controller in unstructured scenarios. Additional material is provided on our project page: https://hsp-iit.github.io/HannesImitation
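The abstract says a single diffusion policy predicts wrist orientation and hand closure. As a rough illustration of how a diffusion policy produces an action at inference time, here is a standard DDPM ancestral sampler with a linear beta schedule. The two-dimensional action layout, the schedule, and the placeholder denoiser are all assumptions; the actual network, its image conditioning, and its action horizon are not described in the abstract.

```python
import numpy as np


def ddpm_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def sample_action(denoiser, obs, action_dim=2, num_steps=50, rng=None):
    """DDPM ancestral sampling of one action vector conditioned on an observation.

    denoiser(x_t, t, obs) must return the predicted noise, same shape as x_t.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = ddpm_schedule(num_steps)
    x = rng.standard_normal(action_dim)  # x_T: pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, obs)
        # Posterior mean of x_{t-1} given x_t and the noise prediction.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(action_dim)  # sigma_t^2 = beta_t
    return x


def dummy_denoiser(x, t, obs):
    """Hypothetical stand-in for a trained, image-conditioned noise predictor."""
    return np.zeros_like(x)


action = sample_action(dummy_denoiser, obs=None, rng=np.random.default_rng(0))
wrist_orientation, hand_closure = action  # hypothetical 2-D action layout
```

Sampling from noise rather than regressing a single target is what lets such a policy represent several valid grasps for the same scene.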

2508.17196 2026-06-10 cs.LG cs.AI

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li

Affiliations: Institute for AI Industry Research (AIR), Tsinghua University; Global Innovation Exchange & Department of Automation, Tsinghua University

2508.05769 2026-06-10 cs.CV

Improving Masked Style Transfer using Blended Partial Convolution

Seyed Hadi Seyed, Ayberk Cansever, David Hart

Affiliations: East Carolina University

2410.22967 2026-06-10 cs.LG eess.SP

Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

Yachao Yuan, Yu Huang, Yingwen Wu

Affiliations: Suda University (Soochow University)

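AI summary: Proposes Adaptive NAD, an online, self-adaptive unsupervised network anomaly detection framework that generates pseudo-labels with a two-layer anomaly detection strategy and updates itself with an online training scheme, achieving the lowest false alarm rates and faster inference on multiple datasets.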

Abstract

The widespread use of the Internet of Things (IoT) has raised the risk of cyber threats; thus, developing Anomaly Detection Systems (ADSs) that can adapt to evolving traffic patterns is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which are not applicable in practical real-world deployments. In this paper, we design Adaptive NAD, an online and self-adaptive unsupervised Network Anomaly Detection framework for security domains. A two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online training scheme is introduced to update Adaptive NAD using a novel threshold calculation technique. Experimental results demonstrate that, compared with state-of-the-art solutions, Adaptive NAD achieves the lowest false alarm rates on the CIC-Darknet2020, NSL-KDD, and Edge-IIoTset datasets (1.33%, 0.71%, and 0.08%, respectively) and online inference more than 3 times faster. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.
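The abstract outlines the pseudo-labeling only at a high level. The sketch below shows one way, assumed here, to obtain high-confidence pseudo-labels from two detectors: label a sample only when both score it above a high quantile (anomaly) or both score it below a lower quantile (normal), and leave the rest unlabeled. The quantile rule stands in for the paper's unspecified threshold technique. The false alarm rate uses its standard definition, FP / (FP + TN).

```python
import numpy as np


def two_layer_pseudo_labels(scores_a, scores_b, normal_q=0.5, anomaly_q=0.99):
    """Label only samples on which two detectors confidently agree.

    Returns 1 (anomaly), 0 (normal) or -1 (left unlabeled) per sample.
    Thresholds are per-detector score quantiles (an assumed rule).
    """
    scores_a, scores_b = np.asarray(scores_a, float), np.asarray(scores_b, float)
    labels = np.full(scores_a.shape, -1)
    hi_a, hi_b = np.quantile(scores_a, anomaly_q), np.quantile(scores_b, anomaly_q)
    lo_a, lo_b = np.quantile(scores_a, normal_q), np.quantile(scores_b, normal_q)
    labels[(scores_a >= hi_a) & (scores_b >= hi_b)] = 1
    labels[(scores_a <= lo_a) & (scores_b <= lo_b)] = 0
    return labels


def false_alarm_rate(y_true, y_pred):
    """FP / (FP + TN): fraction of truly normal samples flagged as anomalous."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return fp / (fp + tn) if fp + tn else 0.0
```

Leaving ambiguous samples unlabeled trades label coverage for label precision, which matters when the labels feed back into online training.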

2501.12486 2026-06-10 cs.LG cs.CL

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci, Suvinay Subramanian, Amir Yazdanbakhsh, Dan Alistarh, Gintare Karolina Dziugaite

Affiliations: MIT CSAIL; Rice University; Google Research; Google DeepMind; Google; IST Austria

2407.09510 2026-06-10 cs.CV

3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods

Milena T. Bagdasarian, Paul Knoll, Yi-Hsin Li, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern

Affiliations: Fraunhofer HHI; Humboldt-Universität zu Berlin; Technische Universität Berlin

Details

DOI: 10.1111/cgf.70078
Journal ref: Computer Graphics Forum, Volume 44, Issue 2 (2025)
Comments: 3D Gaussian Splatting compression survey; 3DGS compression; updated discussion; new approaches added; new illustrations

Abstract

3D Gaussian Splatting (3DGS) has emerged as a cutting-edge technique for real-time radiance field rendering, offering state-of-the-art performance in terms of both quality and speed. 3DGS models a scene as a collection of three-dimensional Gaussians, with additional attributes optimized to conform to the scene's geometric and visual properties. Despite its advantages in rendering speed and image fidelity, 3DGS is limited by its significant storage and memory demands. These high demands make 3DGS impractical for mobile devices or headsets, reducing its applicability in important areas of computer graphics. To address these challenges and advance the practicality of 3DGS, this survey provides a comprehensive and detailed examination of compression and compaction techniques developed to make 3DGS more efficient. We classify existing methods into two categories: compression, which focuses on reducing file size, and compaction, which aims to minimize the number of Gaussians. Both methods aim to maintain or improve quality, each by minimizing its respective attribute: file size for compression and Gaussian count for compaction. We introduce the basic mathematical concepts underlying the analyzed methods, as well as key implementation details and design choices. Our report thoroughly discusses similarities and differences among the methods, as well as their respective advantages and disadvantages. We establish a consistent framework for comparing the surveyed methods based on key performance metrics and datasets. Because these methods have been developed in parallel over a short period of time, no comprehensive comparison currently exists. This survey, for the first time, presents a unified framework to evaluate 3DGS compression techniques. We maintain a website that will be regularly updated with emerging methods: https://w-m.github.io/3dgs-compression-survey/ .
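To make the compression/compaction distinction concrete, here is back-of-envelope storage arithmetic. It assumes the attribute layout of the original 3DGS (position, per-axis scale, rotation quaternion, opacity, and degree-3 spherical-harmonic color, stored as 32-bit floats); real file formats may carry extra fields. Compaction lowers the Gaussian count, while compression lowers the bytes spent per Gaussian.

```python
def floats_per_gaussian(sh_degree=3):
    """Scalar attributes per Gaussian in the original 3DGS layout (assumed)."""
    position, scale, rotation, opacity = 3, 3, 4, 1  # xyz, per-axis scale, quaternion, alpha
    sh_color = 3 * (sh_degree + 1) ** 2  # RGB spherical-harmonic coefficients
    return position + scale + rotation + opacity + sh_color


def storage_mb(num_gaussians, sh_degree=3, bytes_per_value=4):
    """Uncompressed attribute storage in megabytes (1 MB = 1e6 bytes)."""
    return num_gaussians * floats_per_gaussian(sh_degree) * bytes_per_value / 1e6


baseline = storage_mb(1_000_000)                       # 236.0 MB
compacted = storage_mb(500_000)                        # compaction: half the Gaussians
compressed = storage_mb(1_000_000, bytes_per_value=2)  # compression: 16-bit values
both = storage_mb(500_000, bytes_per_value=2)          # the two combined
```

Because the two reductions multiply, methods from both categories can be stacked; the spherical-harmonic coefficients (48 of the 59 values) are the obvious first target.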

2502.11517 2026-06-10 cs.CL cs.DC cs.LG

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin

Affiliations: DeepMind, London, UK; Google Research, New York, NY, USA; Stanford University, Stanford, CA, USA; University of Toronto, Toronto, Ontario, Canada; University of Washington, Seattle, WA, USA

2501.11937 2026-06-10 cs.LG cs.AI

MeshONet: A Generalizable and Efficient Operator Learning Method for Structured Mesh Generation

Jing Xiao, Xinhai Chen, Qingling Wang, Jie Liu

Affiliations: Laboratory of Digitizing Software for Frontier Equipment, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

2409.12263 2026-06-10 cs.LG cs.SI

Detecting LGBTQ+ Instances of Cyberbullying

Arslan Bisharat, Manuel Sandoval Madrigal, Mohammed Abuhamad, Deborah L. Hall, Yasin N. Silva

Affiliations: Loyola University Chicago; Arizona State University

2409.04519 2026-06-10 quant-ph cs.AI cs.LG physics.data-an

The role of data embedding in quantum autoencoders for improved anomaly detection

Jack Y. Araz, Michael Spannowsky

Affiliations: Thomas Jefferson National Accelerator Facility; Institute for Particle Physics Phenomenology; Durham University

2606.11044 2026-06-10 stat.ML cs.LG New submission

Generalized Conformal Predictive Systems Under Distributional Shifts

Jef Jonkers, Johanna Ziegel

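AI summary: To handle distribution shift, extends generalized conformal predictive systems by encoding the shift through observation-specific permutation weights, yielding shift-aware predictive systems, and introduces weight-uncertainty boxes to construct envelopes of robust conformal predictive systems with finite-sample or asymptotic guarantees.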

2606.10280 2026-06-10 eess.IV cs.CV New submission

Overlapped Wavelet Diffusion for Low-Light Image Enhancement

Fen Peng, Taizo Suzuki, Seisuke Kyochi

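AI summary: Proposes OWDiff, an overlapped wavelet diffusion framework that eliminates blocking artifacts with an overlapped wavelet transform and recovers detail with a low-frequency-guided high-frequency enhancement block, outperforming existing methods on the LOLv1 and LOLv2-real datasets.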

Details

DOI: 10.1587/transinf.2026PCP0006
Journal ref: IEICE Transactions on Information and Systems, Advance online publication, 2026
Comments: Advance published in IEICE Transactions on Information and Systems. DOI: 10.1587/transinf.2026PCP0006. Code: https://github.com/FinnPeg/Overlapped-Wavelet-Diffusion

Abstract

In this study, we propose an overlapped wavelet diffusion framework for Low-Light Image Enhancement (LLIE), which incorporates two complementary components to achieve blocking artifact-free and detail-preserving enhancement. Although recent diffusion-based LLIE methods have demonstrated remarkable performance compared with traditional approaches, DiffLL still suffers from blocking artifacts caused by the Haar Wavelet Transform (WT) and blurred edges or over-smoothed textures due to the limitations of its High-Frequency Restoration Module (HFRM). To overcome these issues, we introduce an Overlapped WT (OWT) that incorporates correlations across neighboring regions, thereby structurally preventing blocking artifacts. Furthermore, we integrate a low-frequency-guided High-Frequency Enhance Block (HFEBlock) to strengthen detail recovery, yielding sharper edges and more reliable textures. Extensive experiments on the LOLv1 and LOLv2-real datasets demonstrate that our framework, termed OWDiff, consistently outperforms existing LLIE methods both qualitatively and quantitatively, achieving superior visual quality while maintaining computational efficiency. OWDiff effectively addresses the structural limitations of the Haar WT and the HFRM, achieving an average PSNR gain of 0.58 dB, along with a 1.64% relative improvement in SSIM and a 5.9% relative reduction in LPIPS, compared to DiffLL across both the LOLv1 and LOLv2-real datasets.
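To see why a plain Haar wavelet transform can produce blocking, consider a single-level orthonormal 2D Haar transform. Each coefficient depends on one non-overlapping 2x2 block only, so an error introduced in one coefficient (for instance by a generative model operating in the wavelet domain) is confined to that block after inversion. This sketch shows only the standard Haar transform; the paper's overlapped WT is not reproduced, and the LH/HL naming convention varies between sources.

```python
import numpy as np


def haar2d(x):
    """Single-level orthonormal 2D Haar DWT on non-overlapping 2x2 blocks (even H, W)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-pass approximation
    lh = (a - b + c - d) / 2  # difference across columns
    hl = (a + b - c - d) / 2  # difference across rows
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
ll, lh, hl, hh = haar2d(x)
assert np.allclose(ihaar2d(ll, lh, hl, hh), x)  # perfect reconstruction

# Editing one LL coefficient changes exactly one 2x2 block: neighbouring blocks never
# share coefficients, so independent per-block errors stay confined to block tiles.
ll_edit = ll.copy()
ll_edit[1, 1] += 1.0
diff = ihaar2d(ll_edit, lh, hl, hh) - x
```

An overlapped transform, as the abstract describes it, lets each coefficient draw on neighbouring regions, so such errors no longer end at block boundaries.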

2606.11186 2026-06-10 cs.CV New submission

AnyMod-LLVE: Low-Light Video Enhancement with Modality-Agnostic Inference

Hangfeng Liang, Yutao Hu, Yanhan Hu, Xiaohan Wu, Wenqi Shao, Ying Fu

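AI summary: Proposes AMNet, a unified multimodal framework that uses a Spatial-Spectral Dual-Gated Translator to learn the correspondence between auxiliary modalities and RGB input, supporting arbitrary modality combinations at inference and addressing missing auxiliary modalities in low-light video enhancement.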

Details

Comments: Accepted at ICML 2026; Project page and code: https://lhfgghc.github.io/LLVE-AMNet

Abstract

Low-light video enhancement (LLVE) remains a challenging task due to severe information degradation under low-illumination conditions. Recent multimodal approaches have significantly improved enhancement performance by incorporating auxiliary modalities, such as event streams and infrared images. However, these methods typically assume the availability of these modalities at inference, which is often not feasible in real-world scenarios. To solve this problem, we propose AMNet, a unified multimodal framework for LLVE that supports flexible modality-agnostic inference, where auxiliary modalities may be unavailable. To address the issue of modality absence, we introduce a Spatial-Spectral Dual-Gated Translator that learns the correspondence between auxiliary modalities and RGB inputs, producing implicit auxiliary representations to support robust enhancement. Additionally, to fully facilitate the learning of cross-modal correspondence, we conduct large-scale multimodal pretraining on an RGB-only dataset with synthetic auxiliary modalities. Extensive experiments demonstrate that AMNet can handle arbitrary inference-time modality combinations and achieves superior LLVE performance when modalities are absent. Code and models are available on the project page.

2606.11155 2026-06-10 cs.CV New submission

Mean Flow Distillation: Robust and Stable Distillation for Flow Matching Models

An Zhao, Shengyuan Zhang, Zhongjian Sun, Yixiang Zhou, Zejian Li, Ling Yang, Tianrun Chen, Lingyun Sun

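AI summary: Proposes Mean Flow Distillation (MFD), which suppresses optimization noise through temporal low-pass filtering while ensuring trajectory consistency, enabling high-fidelity single-step generation for flow matching models.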

Abstract

Flow Matching models have demonstrated strong performance across a wide range of generative tasks. However, their reliance on ODE-based iterative sampling incurs substantial computational overhead in inference, which limits their applicability in real-time scenarios. While distillation is a promising solution, existing approaches largely borrow from diffusion-based score matching, often failing to exploit the intrinsic geometric structure of flows and suffering from training instability, high variance, and degraded generation quality. In this paper, we propose Mean Flow Distillation (MFD), a novel distillation framework tailored for flow matching models. We theoretically demonstrate that MFD acts as a temporal low-pass filter, effectively suppressing the high-frequency optimization noise inherent in variational score distillation (VSD) while ensuring global trajectory consistency. We further prove the Mean Flow Matching Theorem, establishing that matching expected average velocities is sufficient for strict distribution alignment. Empirically, on challenging tasks over high-dimensional manifolds, including 4D occupancy forecasting and text-to-image generation, MFD achieves state-of-the-art performance, enabling high-fidelity single-step generation.
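The appeal of matching average velocities rests on the mean-flow identity: the average velocity over an interval [r, t] equals the displacement divided by the interval length, which is also the time average of the instantaneous velocity along the trajectory. Below is a toy numerical check of that identity under forward Euler, assuming an arbitrary hand-picked velocity field with a fast oscillating term. It illustrates why a single jump with the average velocity lands on the multi-step endpoint and why averaging behaves like a low-pass filter; it is not the MFD training objective.

```python
import numpy as np


def euler_path(v, z_start, r, t, steps=1000):
    """Integrate dz/dtau = v(z, tau) from tau = r to tau = t with forward Euler.

    Returns the endpoint and the instantaneous velocities used at each step.
    """
    dt = (t - r) / steps
    z = np.asarray(z_start, dtype=float).copy()
    velocities = []
    for k in range(steps):
        vk = v(z, r + k * dt)
        velocities.append(vk)
        z = z + dt * vk
    return z, np.stack(velocities)


def v(z, tau):
    """Toy velocity field (arbitrary choice) with a fast oscillating component."""
    return -z + np.sin(40.0 * tau)


z_r = np.array([1.0, -0.5])
r, t = 0.0, 1.0
z_t, vels = euler_path(v, z_r, r, t)
u = vels.mean(axis=0)             # average velocity: time average of v along the path
z_t_one_step = z_r + (t - r) * u  # one jump with the average velocity
```

The oscillating term contributes little to `u` because it largely cancels over the interval, which is the intuition behind the "temporal low-pass filter" description.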

2606.11150 2026-06-10 cs.AI cs.CY New submission

ABC-Bench: An Agentic Bio-Capabilities Benchmark for Biosecurity

Andrew Bo Liu, Samira Nedungadi, Bryce Cai, Alex Kleinman, Harmon Bhasin, Seth Donoughe

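AI summary: Proposes ABC-Bench, a benchmark that evaluates LLM agents on biosecurity-relevant tasks, including programming liquid-handling robots, designing DNA fragments, and evading synthesis screening; all tested agents outperform the human expert baseline.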

Details

Comments: 18 pages. To be published in ICML 2026

Abstract

Large language models (LLMs) are rapidly acquiring capabilities relevant to biological research, from literature synthesis to interpretation of experimental data. Increasingly, LLM agents can also perform in silico biology tasks that previously required experienced human biologists. These emerging AI capabilities offer new opportunities for scientific discovery and biomedical advances, but they also shift the landscape of biosecurity risks. To address this, we introduce the Agentic Bio-Capabilities Benchmark (ABC-Bench), a suite of tasks to measure agentic biosecurity-relevant capabilities. ABC-Bench evaluates LLM agents on both benign and dual-use biology tasks: writing code to operate liquid handling robots, designing DNA fragments for in vitro assembly, and evading DNA synthesis screening. These tasks require a combination of biology and software expertise. All tested LLM agents outperformed the median expert human baseliner on all three tasks. Agents performed well on tasks drawing on published knowledge and well-documented protocols, and more weakly on a task requiring novel bioinformatics reasoning. In three wet-lab validation experiments, we found that OpenAI's o4-mini-high produced scripts that, when run on an OpenTrons liquid handling robot, successfully assembled DNA with expected sequences.

2606.11131 2026-06-10 cs.CV New submission

UniPET: a universal network for high-quality PET image denoising across varied dose reduction factors

Zhiwen Yang, Yang Zhou, Haowei Chen, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

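AI summary: Addressing the performance drop of existing PET denoising methods when the dose reduction factor (DRF) varies, proposes UniPET, which combines a style alignment network with a region-aware learning strategy to achieve high-quality denoising across DRFs and reaches state-of-the-art performance.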

Abstract

Most existing deep learning-based PET image denoising methods assume a fixed and known dose reduction factor (DRF) for low-dose PET images. However, these methods suffer significant performance degradation when the DRF in practical applications differs from the assumed one. To address the challenge posed by varied DRFs, several preliminary studies focus on the task of universal PET image denoising, aiming to train a universal model over low-dose data across DRFs. Nonetheless, these vanilla universal models often struggle with misaligned styles present in different DRF data, leading to the style elimination issue, a significant over-smoothing effect. To deal with this issue, we introduce domain generalization to PET image denoising and propose a universal PET image denoising network (UniPET) to achieve high-quality PET image denoising across diverse DRFs. UniPET comprises two primary innovations: a style alignment network (SAN) and a region-aware learning strategy (RALS). Specifically, SAN utilizes style alignment techniques derived from domain generalization to align and recover styles across different DRFs, ensuring the model's generalizability across various DRFs while effectively preserving styles. Furthermore, to enhance style recovery, RALS distinguishes between flat and stylized regions and conducts adversarial learning exclusively on the latter, thereby more effectively guiding the model's focus towards learning stylized regions. We show that UniPET can adaptively recover different DRF styles and achieve high-quality PET image denoising across DRFs. Comprehensive experiments show that UniPET performs comparably to individual DRF-specific models at specific DRFs and achieves state-of-the-art performance in universal PET image denoising quantitatively, perceptually, and clinically.
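The abstract does not specify the exact alignment operator inside SAN. One widely used style-alignment operation in domain generalization and style transfer is adaptive instance normalization (AdaIN), which matches per-channel feature statistics. The sketch shows that operation only, as an assumed stand-in; the feature shapes and the "DRF style" interpretation in the comments are hypothetical.

```python
import numpy as np


def align_style(content, style, eps=1e-5):
    """AdaIN-style alignment of (C, H, W) feature maps.

    Normalizes each content channel to zero mean and unit std, then rescales it
    to the per-channel mean and std of the style features.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean


rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 16, 16))  # hypothetical features from one DRF
style = rng.normal(2.0, 0.5, size=(4, 16, 16))    # hypothetical features carrying the target style
aligned = align_style(content, style)
```

Operating on channel statistics rather than pixels keeps spatial structure intact while changing texture-level "style", which is why this family of operations is a natural fit for the over-smoothing problem the abstract describes.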

2606.11082 2026-06-10 cs.CL cs.CY New submission

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

Hakan Mehmetcik

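AI summary: Using a multi-agent geopolitical wargame, this study finds that frontier LLMs exhibit cross-lingual behavioral skew, and that the effect depends on model architecture and training regime rather than being a universal property of Western-origin models.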

Details

Comments: 25 pages, 2 figures, 6 tables, Research Article

Abstract

This study investigates cross-lingual distributional skew (the Shibboleth Effect) in frontier large language models (LLMs) subjected to sustained adversarial conditions. We develop a multi-agent geopolitical wargame, the Cerulean Sea Crisis, a synthetic maritime territorial dispute designed to mirror the structural dynamics of Eastern Mediterranean conflicts. Six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, and DeepSeek-R1) participate in a between-groups experiment (N = 10 games per arm, K = 5 rounds per game) in which the sole manipulation is the language of play (English versus Turkish), producing 586 validated statements. A zero-shot classifier assesses behavioral dispositions along two continuous dimensions: Concession Rate and Coercive Rhetoric. The results are heterogeneous. Llama-4 shows a substantial, Holm-corrected increase in coercive rhetoric under Turkish (delta = +0.800, p = .002), whereas Gemini-3.1-Pro displays an equally large decrease (delta = -0.750, p = .005). DeepSeek-R1 exhibits a similar negative shift (delta = -0.860, p = .006) and provides chain-of-thought evidence consistent with a buffering mechanism. GPT-4o shows no detectable effect (delta = +0.130, p = .614). These findings indicate that cross-lingual behavioral skew is contingent on model architecture and training regime rather than a universal property of Western-origin LLMs. We identify two distinct buffering mechanisms, chain-of-thought institutional anchoring and multilingual RLHF alignment, and discuss their implications for integrating LLMs safely into diplomatic and crisis-management settings.
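Several of the results above are reported as Holm-corrected. For reference, a minimal Holm-Bonferroni step-down adjustment looks like this. The p-values in the usage lines are hypothetical raw values, not the study's; the p-values reported in the abstract are already corrected.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # adjusted values must not decrease
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four model comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03, 0.005])
significant = [p <= 0.05 for p in adjusted]  # -> [True, False, False, True]
```

Holm controls the family-wise error rate like Bonferroni but is uniformly more powerful, which suits a design that tests several models against the same null.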

2606.11066 2026-06-10 cs.LG q-bio.NC New submission

GRAFT: Gain-Recalibrated Adapters for Transformer-Based Neural Population Activity Modeling

Xiangsheng Ge, Yang Xie

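AI summary: Proposes GRAFT, which separates reusable temporal dynamics from a recalibratable neuron interface, achieving state-of-the-art results on the MC Maze dataset and cross-day recalibration by updating only 9.21% of parameters.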

Abstract

Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to a fixed set of recorded neurons. This coupling limits reuse in long-term brain-computer interfaces, where recorded neuron identities, counts, and response statistics can change across days. We introduce GRAFT, a Transformer-based neural population activity model that separates reusable temporal dynamics from a recalibratable neuron interface. The neuron interface controls how recorded neurons enter and leave the shared backbone, and auxiliary gain and positional mechanisms support neural activity modeling inside the Transformer. On MC Maze under the standard NLB'21 protocol, GRAFT reaches 0.3866 co-bps as an ensemble, setting a new state of the art on the primary co-bps metric among public and reported NLB'21 results. In a cross-day protocol constructed from the NLB'21 MC Maze dataset series, GRAFT recalibrates from MC Maze to the scaled MC Maze datasets (Large/Medium/Small) by updating only 9.21% of parameters, reaching 0.3749, 0.3112, and 0.3152 co-bps with restricted target-day support sets. These results show that the same interface-backbone separation supports both strong Transformer-based neural population activity modeling and data-efficient cross-day recalibration.
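The co-bps figures above are bits per spike on held-out neurons. The sketch below follows our reading of the NLB'21 definition: the Poisson log-likelihood of the predicted rates minus that of a per-neuron mean-rate null model, divided by the total spike count and converted to bits. The function names and synthetic data are assumptions for illustration. This is not GRAFT's evaluation code.

```python
import numpy as np


def poisson_loglik(rates, spikes):
    """Poisson log-likelihood summed over all bins and neurons.

    The log(k!) term is dropped: it depends only on the observed counts and
    cancels in the model-minus-null difference below.
    """
    rates = np.clip(rates, 1e-9, None)  # guard against log(0)
    return float(np.sum(spikes * np.log(rates) - rates))


def bits_per_spike(pred_rates, spikes):
    """Bits per spike relative to a per-neuron mean-rate null model.

    pred_rates, spikes: arrays of shape (bins, neurons) for the held-out neurons.
    Assumes at least one spike in total.
    """
    null_rates = np.broadcast_to(spikes.mean(axis=0, keepdims=True), spikes.shape)
    gain = poisson_loglik(pred_rates, spikes) - poisson_loglik(null_rates, spikes)
    return gain / (spikes.sum() * np.log(2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_rates = rng.uniform(0.1, 2.0, size=(500, 4))  # synthetic rates per bin
    spikes = rng.poisson(true_rates)
    print("true rates:", bits_per_spike(true_rates, spikes))
    print("flat rates:", bits_per_spike(np.ones_like(true_rates), spikes))
```

A model that only reproduces each neuron's mean rate scores zero. Predictions that track the true time-varying rates should score higher.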

2606.11057 2026-06-10 cs.LG q-bio.BM stat.ML New submission

Flexible Kernels for Protein Property Prediction

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

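AI summary: Proposes sequence kernels that exploit evolutionary substitution matrices and local linearity. Combined with Gaussian processes, they give data-efficient protein property prediction, and they incorporate structural information for multi-task learning.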

Comments: 50 pages; to appear at ICML 2026

Abstract

Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.
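To make the idea of a substitution-matrix sequence kernel concrete, here is a heavily simplified sketch. It is not the authors' kernel. It averages a positive semidefinite substitution matrix `S` over aligned positions and plugs the result into a standard Gaussian-process posterior mean. The following are all hypothetical:

* the toy alphabet,
* the random `S = B B^T`, which stands in for an evolutionary or learned matrix,
* every function name.

The paper's local-linearity component is omitted. Raw substitution tables such as BLOSUM may not be positive semidefinite, so they would need adjusting before use in a kernel.

```python
import numpy as np

ALPHABET = "ACDE"  # hypothetical toy alphabet; real proteins have 20 amino acids


def make_substitution_matrix(rng, dim=4):
    """Random PSD stand-in for a substitution matrix: S = B @ B.T."""
    B = rng.normal(size=(len(ALPHABET), dim))
    return B @ B.T


def encode(seqs):
    """Map equal-length sequences to an integer array of shape (n, L)."""
    index = {a: i for i, a in enumerate(ALPHABET)}
    return np.array([[index[c] for c in s] for s in seqs])


def seq_kernel(X1, X2, S):
    """k(x, x') = mean over aligned positions i of S[x_i, x'_i].

    If S is PSD, this is an inner product of concatenated per-position
    embeddings, so the resulting Gram matrix is PSD as well.
    """
    A, C = encode(X1), encode(X2)
    # Broadcast indexing gives shape (n1, n2, L); average over positions.
    return S[A[:, None, :], C[None, :, :]].mean(axis=-1)


def gp_posterior_mean(X_train, y_train, X_test, S, noise=1e-2):
    """Zero-mean GP regression posterior mean; `noise` is the noise variance."""
    K = seq_kernel(X_train, X_train, S) + noise * np.eye(len(X_train))
    K_star = seq_kernel(X_test, X_train, S)
    return K_star @ np.linalg.solve(K, y_train)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = make_substitution_matrix(rng)
    train = ["ACDEA", "ACDEC", "CCDEA", "EEDCA"]
    y = np.array([1.0, 1.2, 0.4, -0.5])  # made-up property values
    print(gp_posterior_mean(train, y, ["ACDEE", "CCDEC"], S))
```

Keeping `S` positive semidefinite guarantees a valid kernel. Learning `S` instead of fixing it is one way to read the abstract's "structure-aware substitution matrices".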

2606.11009 2026-06-10 cs.CL cs.CY New submission

Who Brought Easter Eggs to Eid? Auditing Cultural Translation of Math Word Problems Across Diverse Languages and Regions

Parisa Suchdev, Juniper Lovato

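AI summary: This study audits the cultural adaptation performed by three large language models when they translate 60 English math word problems into 7 languages. The models agree in 62.5% of cases but make the same substitution in only 33.5%. Every combination shows entropy collapse: models change surface markers while preserving deeper structure, which compresses cultural diversity and leads to regional misattribution.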

Comments: 17 pages total with references and appendix, 9 figures, under review

Abstract

Large language models are increasingly used to adapt math word problems for personalized learning at scale. It remains an open question whether those adaptations are consistent across models, whether they preserve cultural diversity at scale, and what they reveal about which cultural entities models treat as most salient. We analyze how Claude Opus 4, GPT-4.1, and Gemini 2.5 Pro adapt 60 English math word problems into seven languages: Bengali, Hindi, and Punjabi (India); Urdu and Sindhi (Pakistan); and Italian and Sicilian (Italy). This set spans the full resource spectrum, from high-resource Italian and Hindi to under-studied Sindhi, Sicilian, and Punjabi. We annotate 6,489 entity transformations, coding whether models preserve, localize, generalize, omit, or change entities such as names, foods, and places. Models agree on transformation type in 62.5% of cases and on specific substitutions in only 33.5%, meaning model choice directly shapes which cultural world students encounter. All 21 language-model combinations show entropy collapse: adaptation compresses rather than expands cultural diversity. Models prioritize surface markers such as names, foods, and currencies. They leave intact deeper structural features, such as grade-level systems, that embed culturally specific assumptions. Despite prompts specifying target countries, models misattribute regional context, for example using Bangladeshi taka for Indian Bengali students. They also produce cross-cultural contamination, such as adapting egg hunts as Eid activities. Some failures are visible in individual translations. Others, including diversity collapse, systematic preference for surface markers, and consistent regional misattribution, emerge only through corpus-level analysis. The surface plausibility that makes adapted problems look correct is precisely what makes deeper failures easy to overlook.
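One simple way to quantify the entropy collapse described above is to compare the Shannon entropy of an entity category, such as person names, before and after adaptation. The sketch below assumes plain lists of entity strings. The data are placeholders, not the paper's annotations, and the paper's exact entropy measure may differ.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy (bits) of the empirical distribution of items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Placeholder entity values: six distinct names in the source problems,
    # mapped by a hypothetical model onto only three distinct names.
    source = ["name1", "name2", "name3", "name4", "name5", "name6"]
    adapted = ["nameA", "nameA", "nameB", "nameA", "nameB", "nameC"]
    h_src, h_adp = shannon_entropy(source), shannon_entropy(adapted)
    print(f"source {h_src:.3f} bits, adapted {h_adp:.3f} bits, drop {h_src - h_adp:.3f}")
```

A positive drop means the adapted corpus draws on fewer, more repeated entities than the source. That is the compression of diversity the abstract describes.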

2606.10986 2026-06-10 cs.RO cs.SY eess.SY New submission

Multi-UAV Active Sensing with Information Gain-based Planning and Belief Fusion

S. Habibi, L. Marques

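AI summary: Proposes a multi-UAV active sensing framework that uses information-gain-based path planning and probabilistic belief fusion for binary terrain mapping. Validated on synthetic and real agricultural imagery, it achieves lower entropy and mapping error than random-walk and sweep-coverage baselines.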

Abstract

Unmanned aerial vehicles (UAVs) are increasingly used for active sensing and information gathering in spatially distributed environments. Their performance, however, is constrained by limited flight time, sensing uncertainty, and the trade-off between spatial coverage and observation accuracy. This paper presents a real-world validation of a multi-UAV active sensing framework for probabilistic binary terrain mapping, with precision agriculture as the application case. The environment is represented as a probabilistic belief map, in which spatial dependencies are modeled through a factor-graph formulation. UAV decision making is guided by Information-Gain-based Informative Path Planning (IGbIPP). The approach is compared with Random Walk and Sweep coverage path planning baselines on both synthetic terrains and real UAV-derived agricultural imagery. The study also evaluates spatial correlation weights and several probabilistic belief-fusion rules for multi-UAV information sharing. Results show that IGbIPP reduces entropy and mapping error more effectively than the baselines, while a wider field of view improves real-world coverage and map accuracy. The results further show that simple equal or biased spatial weights can be more robust than adaptive weights, and that Bayesian, log-odds, and Dempster-Shafer fusion achieve the best cooperative mapping performance. These findings highlight the importance of uncertainty-driven planning, sensing geometry, spatial modeling, and probabilistic fusion for real-world UAV-based active sensing.
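For readers unfamiliar with the fusion rules named above, the sketch below shows textbook forms of three operations for a single binary map cell:

* log-odds fusion of independent beliefs, which with a shared prior coincides with a Bayesian product of the per-UAV posteriors,
* Dempster's rule on the frame {1, 0},
* the binary entropy used to score remaining uncertainty.

These are generic formulations assumed for illustration. The paper's factor-graph model and fusion details may differ.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse_log_odds(probs, prior=0.5):
    """Fuse independent beliefs P(cell = 1) from several UAVs for one cell.

    Each belief already includes the shared prior, so the prior's log-odds are
    subtracted (n - 1) times to avoid counting it more than once.
    """
    total = sum(logit(p) for p in probs) - (len(probs) - 1) * logit(prior)
    return sigmoid(total)


def fuse_dempster_shafer(m1, m2):
    """Dempster's rule on the frame {1, 0}; masses use keys '1', '0', 'theta'."""
    conflict = m1["1"] * m2["0"] + m1["0"] * m2["1"]
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {
        "1": (m1["1"] * m2["1"] + m1["1"] * m2["theta"] + m1["theta"] * m2["1"]) / norm,
        "0": (m1["0"] * m2["0"] + m1["0"] * m2["theta"] + m1["theta"] * m2["0"]) / norm,
        "theta": (m1["theta"] * m2["theta"]) / norm,
    }


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) cell belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


if __name__ == "__main__":
    fused = fuse_log_odds([0.7, 0.7])
    print(f"fused P={fused:.3f}, entropy {binary_entropy(0.7):.3f} -> {binary_entropy(fused):.3f}")
    ds = fuse_dempster_shafer({"1": 0.6, "0": 0.1, "theta": 0.3},
                              {"1": 0.5, "0": 0.2, "theta": 0.3})
    print(ds)
```

Two agreeing observations sharpen the cell belief and lower its entropy, which is the quantity an information-gain planner tries to reduce. Dempster's rule additionally keeps explicit mass on "unknown" (`theta`).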

2606.10940 2026-06-10 cs.CV cs.AI cs.LG New submission

Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals

Paul Fergus, Philip Stephens, Russell A. Hill, Lee Oliver, Katie Appleby, Sarah Beatham, Naomi Davies Walsh, Stuart Nixon, Naomi Matthews, Chris Sutherland, Kelly Hitchcock

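AI summary: Releases an open-source object detection model covering 31 classes, including 28 common UK mammal and bird species. The model is a YOLO26x detector trained on 48,165 labelled instances and reaches an mAP@0.5 of 0.984. The release aims to lower the barrier for ecologists to use AI.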

Comments: 15 pages, 4 figures

Abstract

Camera traps have become a cornerstone of biodiversity monitoring. However, the artificial intelligence that turns vast quantities of images into usable ecological data is often locked behind commercial platforms or trained on fauna that does not match that of the British Isles. In an attempt to remove these barriers and increase uptake, we release an open-source object detection model for 31 classes: 28 common UK mammal and bird species, plus utility classes for humans, calibration poles, and vehicles. The training data is a curated dataset of 48,165 labelled instances, assembled from multiple sites over a decade of operational deployment through Conservation AI and its successor, Trap Tracker. The model is a YOLO26x detector trained and tested on an 80/10/10 class-stratified split. On the held-out validation set it achieves a mean Average Precision of 0.984 at an Intersection over Union (IoU) threshold of 0.5 (0.956 over IoU 0.5-0.95), with precision 0.988 and recall 0.965. On an unseen held-out test split, mean per-species confidence ranged from 0.96 to 0.99 across the 31 classes. The false-negative rate was 0.17%, concentrated in difficult night-time, distant, or occluded images. Because these metrics come from the same pool of sites and cameras used for training, performance at entirely new sites is left to future work. We release the trained weights in ONNX format under a non-commercial licence, with local desktop and real-time camera support, aimed explicitly at ecologists with no machine-learning experience. This release is a deliberate counterweight to the multiple paid-for models that have been developed over the last decade.
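The mAP figures above rest on Intersection over Union between predicted and ground-truth boxes. The sketch below covers only that matching step. It assumes boxes given as `(x1, y1, x2, y2)` corner coordinates and the common COCO-style threshold set 0.50, 0.55, ..., 0.95 for "0.5-0.95". Full mAP also ranks detections by confidence and integrates precision over recall; that part is omitted. The function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (assumed convention for "0.5-0.95").
THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]


def matched_at(pred_box, gt_box, thresholds=THRESHOLDS):
    """Which IoU thresholds a single prediction/ground-truth pair would pass."""
    score = iou(pred_box, gt_box)
    return {t: score >= t for t in thresholds}


if __name__ == "__main__":
    gt = (10, 10, 110, 110)
    pred = (20, 15, 115, 112)  # hypothetical detection
    print(round(iou(pred, gt), 3), matched_at(pred, gt))
```

A detection can count as correct at IoU 0.5 but fail the stricter thresholds. That is why the 0.5-0.95 average (0.956) sits below the 0.5 figure (0.984).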

2606.10938 2026-06-10 cs.LG New submission

A Systematic Approach for Selecting Trajectories for Data Augmentation

Adam Nordling

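AI summary: Proposes a systematic framework for evaluating five trajectory selection strategies (outlierness, diversity, representativeness, uncertainty, and random). Tests on four datasets find that the outlierness and uncertainty strategies improve stability on sparse data but can introduce noise on dense data.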

Comments: 39 pages, 4 figures, Master's project

Abstract

Trajectory data augmentation is a promising approach to mitigating data scarcity in machine learning applications, but its utility has been limited by the complexity of preserving spatio-temporal coherence. Prior work demonstrated the viability of geometric perturbation, but it relied on naive random selection. This leaves a critical gap in understanding which trajectories should be augmented for maximal benefit. This thesis addresses that gap with a systematic and scalable framework for evaluating five selection strategies: four systematic ones (Outlierness, Diversity, Representativeness, and Uncertainty) and a Random selection baseline. These strategies were rigorously tested with a suite of linear and non-linear machine learning models across four datasets covering animal behavior (Foxes and Starkey), maritime traffic (AIS), and urban traffic (Car). As part of this evaluation, an Optuna-based hyperparameter optimization loop was integrated to empirically identify the best-performing augmentation parameters for each dataset within the explored search space. The results indicate that, while systematic selection is not a universal solution, it offers distinct advantages over the random baseline. Systematic strategies, particularly Outlierness and Uncertainty, demonstrated higher stability and were less prone to the performance degradation observed with random sampling in dense datasets. However, the findings also reveal that the value of augmentation is strictly conditional. Visual analysis via UMAP shows that systematic augmentation successfully repairs topological fragmentation in sparse datasets but can act as a corrupting noise signal in high-quality, dense datasets. Furthermore, the study identified physical limitations in high-velocity domains, where standard perturbation techniques lead to divergence in feature space...

2606.10934 2026-06-10 cs.AI New submission

WorldKernel: A World Model is the Coupling Kernel of Admissible Possible Worlds

Fabio Rovai

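AI summary: Shows that strong predictors fail on counterfactual couplings. Proposes modeling a world model as a positive semidefinite coupling kernel over possible worlds, whose off-diagonal entries encode counterfactual information. Efficient inference is enabled by positive-semidefiniteness constraints and logical axioms.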

Abstract

A common assumption holds that enough observational and interventional data, given to a strong enough predictor, suffices. We report a failure mode that contradicts it. Across hundreds of structural causal models, a strong predictor and a Bayesian baseline both succeed on identified quantities. On unidentified quantities (the couplings between counterfactual worlds), however, the predictor collapses to a point, and in 28% of models to a point that no valid model can produce. The truth is an admissible interval that more data never narrows. The gap is structural: prediction cannot represent uncertainty over counterfactual couplings. We cast a world model as a single positive semidefinite coupling kernel K(T,T') over admissible worlds. Its diagonal is the ordinary posterior, which a predictor can recover. Its off-diagonal is the cross-world coupling, which a predictor cannot recover and which every counterfactual reads. The paper develops the theory of that off-diagonal. It is real: two states with identical posteriors differ on a cross-world query, and the off-diagonal is the coupling that fixes counterfactuals. It can be bounded: positive semidefiniteness is partial-identifying information that the marginals lack, and enforcing it bounds counterfactuals in polynomial time where the exact response-type program is intractable. Logical structure sharpens it: ontology axioms tighten the bound by up to a third, propagating to couplings they never touch. It can be acquired: targeted scars (constraints learned from encountered infeasibilities) close the gap several times faster than untargeted ones. Its full reconstruction is approximate counting of the admissible worlds, which is tractable below the Sly-Sun threshold and inapproximable above it; we do not claim to beat the worst case.
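A generic linear-algebra example illustrates the claim that positive semidefiniteness carries partial-identifying information. This is not the paper's construction. For a 2x2 PSD matrix with diagonal entries a and b, the off-diagonal entry c must satisfy |c| <= sqrt(ab). The sketch below scans candidate values for one unknown off-diagonal entry of a matrix whose other entries are fixed, and keeps those that leave the matrix PSD. The matrix values and the function name `psd_interval` are hypothetical. A grid scan is used only for clarity; it is not the paper's polynomial-time bounding procedure.

```python
import numpy as np


def psd_interval(K, i, j, lo=-1.0, hi=1.0, num=2001, tol=1e-10):
    """Scan values v for K[i, j] = K[j, i] and return the range keeping K PSD.

    All other entries of K are held fixed. Because the PSD cone is convex,
    the feasible values form an interval. Returns None if no scanned value works.
    """
    feasible = []
    for v in np.linspace(lo, hi, num):
        M = np.array(K, dtype=float)  # copy so the input is untouched
        M[i, j] = M[j, i] = v
        # eigvalsh assumes a symmetric matrix; PSD means no eigenvalue below ~0.
        if np.linalg.eigvalsh(M).min() >= -tol:
            feasible.append(v)
    return (min(feasible), max(feasible)) if feasible else None


if __name__ == "__main__":
    # Unit diagonal and two fixed couplings of 0.9; entry (0, 2) is unknown.
    K = np.array([[1.0, 0.9, 0.0],
                  [0.9, 1.0, 0.9],
                  [0.0, 0.9, 1.0]])
    print(psd_interval(K, 0, 2))
```

With unit diagonal and known couplings 0.9 and 0.9, the PSD condition confines the remaining entry to 0.81 ± 0.19, that is [0.62, 1.0]. The scan should recover this interval up to grid resolution, even though no data ever observed that entry directly.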

2606.10902 2026-06-10 cs.CV cs.AI New submission

Pose-ICL: 3D-Aware In-Context Learning for Pose-Controllable Subject Customization

Xuan Han, Yihao Zhao, Mingyu You

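AI summary: Proposes Pose-ICL, a framework that uses 3D-aware in-context learning and Surface-Anchored Position Embedding (SAPE) to achieve tuning-free, pose-controllable subject customization, significantly improving pose accuracy and identity consistency.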

Abstract

Subject customization is a foundational task in modern image generation. Given a few reference images and a text prompt, users can generate images of a specific object in any desired scene. However, existing methods still struggle to achieve effective pose control for customized subjects. In practice, they often produce inaccurate poses or inconsistent appearances across poses. These limitations suggest that understanding objects volumetrically remains a significant challenge for 2D-native backbones. To address this challenge, we propose Pose-ICL, a tuning-free framework that leverages 3D-aware In-Context Learning (ICL) to adapt directly to new subjects through multiple paired image-pose references. Its core mechanism, Surface-Anchored Position Embedding (SAPE), equips the model with explicit 3D awareness by anchoring image tokens to the surface coordinates of a volumetric bounding box. Dedicated refinements ensure its seamless compatibility with existing DiT models. Extensive evaluations on both 3D assets and real-world subjects demonstrate that Pose-ICL significantly outperforms current methods in both pose accuracy and identity consistency.

2606.10894 2026-06-10 cs.CV New submission

The 1st PortraitCraft Challenge: A CVPR 2026 Workshop Competition on Portrait Composition Understanding and Generation

Zijie Lou, Youyun Tang, Xiaochao Qu, Haoxiang Li, Ting Liu, Luoqi Liu, Xun Zhu, Zheng Zhang, Xi Chen, Miao Li, Ji Wu, Dizhe Zhang, Xian Ge, Sujia Wang, Ruiyang Zhang, Jiaming Wang, Xianshun Wang, Lu Qi, Boao Kang, Wei Zhou, Jinghui Sun, Zhenyu Yan, Jiliang Zhao, Rui Yang, Yipo Huang, Boyuan Liu, Shanglin Li, Zifan Xie, Yichen Zhang, Anlan Wang, Wenfeng Lin, Mingyu Guo, Dong Li, Xinghao Wang, Yanting Li, Shanzhao Tong, Shuai He, Qiu Zhou, Yongqi Yang, Taoyang Mu, Dianqiao Lei, Anlong Ming, Huadong Ma

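AI summary: Introduces the PortraitCraft Challenge, with two tracks covering composition understanding and generation, and releases a dataset of about 50,000 portraits to advance research on portrait aesthetics and controllable image generation.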

Abstract

This paper presents an overview of the inaugural PortraitCraft Challenge, held as one of the official competitions at CVPR 2026. The challenge focuses on portrait composition understanding and generation, aiming to advance AI research in portrait aesthetics analysis and controllable image synthesis. Unlike existing datasets and tasks that primarily focus on global aesthetic scoring, PortraitCraft introduces a unified evaluation framework comprising two complementary tracks. Track 1 requires models to perform structured portrait composition understanding, and Track 2 requires models to generate portrait images from structured composition descriptions under explicit compositional constraints. To support the challenge, we constructed and publicly released a large-scale portrait composition dataset consisting of approximately 50,000 curated real portrait images, providing multi-level supervision. This report describes the challenge setup, evaluation protocols, dataset composition, and final results, along with an analysis of the technical characteristics of the submitted solutions. The PortraitCraft Challenge provides a standardized and reproducible platform for research on portrait composition understanding and generation, and is expected to foster further progress in the fields of portrait aesthetics and controllable image generation.
