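arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems science.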

2501.11755 2026-06-09 eess.IV cs.CV 版本更新

A generalizable 3D framework and model for self-supervised learning in medical imaging

一种通用的3D框架和模型用于医学影像中的自监督学习

Tony Xu, Sepehr Hosseini, Chris Anderson, Anthony Rinaldi, Rahul G. Krishnan, Anne L. Martel, Maged Goubran

Affiliations: Department of Medical Biophysics, University of Toronto; Department of Computer Science, University of Toronto; Institute for Aerospace Studies, University of Toronto; Physical Sciences Platform, Sunnybrook Research Institute; Vector Institute, Toronto; Department of Laboratory Medicine and Pathobiology, University of Toronto; Hurvitz Brain Sciences, Sunnybrook Health Sciences Centre; Harquail Centre for Neuromodulation, Sunnybrook Health Sciences Centre

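AI summary: This paper proposes 3DINO, pretrains the general-purpose medical imaging model 3DINO-ViT on a large multimodal dataset, and validates its generalization on a range of medical image segmentation and classification tasks, where it outperforms existing methods on most evaluation metrics.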

Comments Published in npj Digital Medicine


DOI: 10.1038/s41746-025-02035-w

AI中文摘要

当前3D医学影像自监督学习方法依赖简单的预设任务和特定器官或模态的数据集，限制了其通用性和扩展性。我们提出了3DINO，一种针对3D数据集的先进自监督学习方法，并在包含超过10个器官的10万例3D医学影像扫描的多模态数据集上预训练了3DINO-ViT。我们通过广泛的实验验证了3DINO-ViT在多种医学影像分割和分类任务中的性能。结果表明，3DINO-ViT能够跨模态和器官泛化，包括在分布外任务和数据集上表现优异，在大多数评估指标和标注数据集大小上均优于现有方法。我们的3DINO框架和3DINO-ViT将被公开，以促进3D基础模型的研究或进一步微调用于广泛医学影像应用。

英文摘要

Current self-supervised learning methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT: a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of ~100,000 3D medical imaging scans from over 10 organs. We validate 3DINO-ViT using extensive experiments on numerous medical imaging segmentation and classification tasks. Our results demonstrate that 3DINO-ViT generalizes across modalities and organs, including out-of-distribution tasks and datasets, outperforming state-of-the-art methods on the majority of evaluation metrics and labeled dataset sizes. Our 3DINO framework and 3DINO-ViT will be made available to enable research on 3D foundation models or further finetuning for a wide range of medical imaging applications.


2501.06659 2026-06-09 cs.DB cs.CV 版本更新

Visual Template Inference for Data Extraction from Documents

文档数据提取的视觉模板推断

Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran

Affiliations: UC Berkeley

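AI summary: Proposes TWIX, a tool that infers the visual template of a document in order to extract structured data efficiently and at low cost; it outperforms existing methods in precision and recall and delivers substantial speedups and cost savings on large datasets.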


AI中文摘要

许多模板化文档是根据结构化数据按照视觉模板程序化生成的。这类文档包括发票、税务文件、财务报告和采购订单。从这些文档中有效提取数据对于支持下游分析任务至关重要。当前的数据提取工具通常难以处理复杂的文档布局，在大数据集上会产生高延迟和/或高成本，并且需要大量人力。我们的工具TWIX的关键洞察是推断用于生成此类文档的底层模板，然后提取数据，而不是直接从文档中提取。为此，TWIX首先通过利用字段的一致位置模式（例如，同一模板中的两个字段在多个记录中反复以固定距离共现）来推断底层字段，如表格部分的列或共置键值对中的键。然后，TWIX通过强制视觉约束（例如，对于表格区域，垂直对齐表格行与其列标题；对于键值对，水平对齐键与其值）将这些字段组装成模板。最后，TWIX使用这个推断出的模板以低成本从模板化文档中准确高效地提取数据。在一个包含34个多样化真实世界数据集的基准测试中，TWIX在精度和召回率上比最先进的结构化数据提取工具（Evaporate、Textract和Azure Document Intelligence）以及基于视觉的大语言模型（如GPT-4-Vision）高出25%以上。另一个包含30个大数据集的基准测试展示了TWIX的可扩展性：对于从超过2000页的大型文档集合中提取数据，它比最具竞争力的对比工具快520倍，便宜3786倍。

英文摘要

Many templatized documents are programmatically generated from structured data following a visual template. Such documents include invoices, tax documents, financial reports, and purchase orders. Effective data extraction from these documents is crucial to support downstream analytical tasks. Current data extraction tools often struggle with complex document layouts, incur high latency and/or cost on large datasets, and require significant human effort. The key insight of our tool, TWIX, is to infer the underlying template used to create such documents, and then extract the data, rather than extracting directly from documents. To do so, TWIX first infers the underlying fields, such as columns of tabular portions or keys in co-located key-value pairs, by leveraging their consistent location patterns (e.g., two fields in the same template repeatedly co-occur within a fixed distance apart across multiple records). TWIX then assembles these fields into a template by enforcing visual constraints, such as vertically aligning table rows with their column headers for tabular regions, and horizontally aligning keys with their values for key-value pairs. TWIX then uses this inferred template to accurately and efficiently extract data from templatized documents at a low cost. On one benchmark with 34 diverse real-world datasets, TWIX outperforms state-of-the-art structured data extraction tools (Evaporate, Textract, and Azure Document Intelligence), and vision-based LLMs like GPT-4-Vision, by over 25% in precision and recall. Another benchmark with 30 large datasets demonstrates TWIX's scalability: it is 520X faster and 3,786X cheaper than the most competitive compared tool, for extracting data from large document collections with over 2000 pages.
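
As a rough illustration of the field-inference idea described in this abstract (phrases that keep co-occurring at a fixed offset across records are likely template fields), here is a minimal Python sketch. It is not TWIX's actual algorithm: the record format (a dict from phrase to page coordinates), the function name `infer_fields`, and the `min_records` and `tol` thresholds are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def infer_fields(records, min_records=2, tol=1.0):
    """Return phrases that co-occur at a (nearly) fixed offset across records.

    records: list of dicts mapping phrase -> (x, y) position (hypothetical format).
    """
    offsets = defaultdict(list)
    for rec in records:
        # Sort phrases so each unordered pair gets one canonical key.
        for a, b in combinations(sorted(rec), 2):
            (xa, ya), (xb, yb) = rec[a], rec[b]
            offsets[(a, b)].append((xb - xa, yb - ya))

    fields = set()
    for (a, b), offs in offsets.items():
        if len(offs) < min_records:
            continue  # pair seen too rarely to count as a template pattern
        dx0, dy0 = offs[0]
        if all(abs(dx - dx0) <= tol and abs(dy - dy0) <= tol for dx, dy in offs):
            fields.update((a, b))
    return fields


# Two toy "invoice" records: the labels repeat, the values change.
records = [
    {"Invoice No": (10, 10), "A-17": (90, 10), "Date": (200, 10), "2024-01-02": (250, 10)},
    {"Invoice No": (10, 310), "B-42": (90, 310), "Date": (200, 310), "2024-02-03": (250, 310)},
]
fields = infer_fields(records)
print(sorted(fields))
```

In this toy input only the two labels recur with a stable relative position, so they are the only phrases reported as fields; the per-record values never repeat and are left for the extraction step.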


2406.05335 2026-06-09 cond-mat.dis-nn cs.LG 版本更新

Phase transition in large language models and the criticality of natural languages

大型语言模型中的相变与自然语言的临界性

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

Affiliations: Center for Advanced Intelligence Project, RIKEN; National Institute for Japanese Language and Linguistics; Department of Physics, Nagoya University; Department of Multidisciplinary Sciences, The University of Tokyo; Komaba Institute for Science, The University of Tokyo

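AI summary: Using large language models as controllable effective models, the paper finds that they undergo a phase transition when a parameter analogous to physical temperature is varied; text generated at the critical point shows power-law behavior and most closely resembles natural language, suggesting that natural languages are critical.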

Comments 8 pages, 6 figures


AI中文摘要

自然语言中的文本和语音生成可以建模为随机过程。这一思想可追溯到马尔可夫的开创性工作，以及后来的香农，也构成了大型语言模型（LLMs）近期发展的基础。自然语言对应的随机过程应不同于生成非语言序列的过程。区分语言与非语言序列的特征之一是幂律行为，这在不同语言中普遍存在。在统计物理学中，这种行为表明自然语言是临界的：它们位于参数化随机过程空间中的相变点附近。然而，验证这一猜想并不直接。即使存在相变，也无法在现实世界的自然语言中直接观察到，因为它们没有任何可控参数。在这里，我们使用LLMs作为自然语言的可控有效模型。通过对LLMs生成文本的统计分析，我们发现，当改变类似于物理温度的参数时，LLMs经历相变。该相变将低温相（生成文本具有复杂重复结构）与高温相（LLMs生成难以理解的文本）分开。在这些相之间的临界点，生成的文本显示出与自然语言相似的幂律行为，并且通过自然语言处理中的标准度量最接近自然语言。这些发现强烈表明自然语言确实是临界的。

英文摘要

Generation of text and speech in natural languages can be modeled as a stochastic process. This idea dates back to the seminal work of Markov and, later, to that of Shannon and also underlies the recent development of large language models (LLMs). The stochastic processes corresponding to natural languages should be distinct from those that generate nonlinguistic sequences. One of the features that discriminate linguistic and nonlinguistic sequences is power-law behavior, which is universally observed across different languages. In statistical physics, such behavior suggests that natural languages are critical: They lie near a phase transition point in a parametrized space of stochastic processes. However, testing this conjecture is not straightforward. A phase transition, even if it exists, cannot be directly observed in real-world natural languages because they do not have any controllable parameters. Here, we use LLMs as controllable effective models of natural languages. Through statistical analyses of texts generated by LLMs, we find that, when a parameter analogous to physical temperature is varied, LLMs undergo a phase transition. The transition separates a low-temperature phase with complex repetitive structures in generated texts from a high-temperature phase in which LLMs generate incomprehensible texts. At the critical point between these phases, generated texts display the power-law behavior similar to that of natural languages and most closely resemble natural languages as measured by a standard metric in natural language processing. These findings strongly suggest that natural languages are indeed critical.
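
The abstract does not spell out how the temperature-like parameter enters the model. A common convention, assumed here, is to divide the next-token logits by a temperature T before the softmax. The sketch below uses made-up logits and hypothetical helper names to show the two regimes the abstract contrasts: a near-deterministic distribution at low T and a near-uniform one at high T.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


logits = [2.0, 1.0, 0.0]  # made-up next-token scores
cold = softmax_with_temperature(logits, 0.1)   # low temperature
hot = softmax_with_temperature(logits, 10.0)   # high temperature

print("T=0.1 :", [round(p, 4) for p in cold], "entropy", round(entropy(cold), 4))
print("T=10.0:", [round(p, 4) for p in hot], "entropy", round(entropy(hot), 4))
```

Sweeping T between these extremes and measuring statistics of the sampled text is the kind of experiment the abstract describes; this sketch only shows the sampling distribution itself.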


2407.01718 2026-06-09 stat.ML cs.LG math.ST stat.TH 版本更新

Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets

熵最优传输特征映射用于高维数据集的非线性对齐与联合嵌入

Boris Landa, Yuval Kluger, Rong Ma

Affiliations: Department of Electrical and Computer Engineering, Yale University; Department of Biostatistics, Harvard University; Program in Applied Mathematics, Yale University; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University; Department of Pathology, Yale University School of Medicine

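AI summary: Proposes entropic optimal transport (EOT) eigenmaps, which align and jointly embed two datasets using the leading singular vectors of the EOT plan matrix; the method comes with theoretical guarantees, including a proof under a generative model that the EOT plan concentrates around a population kernel, and shows advantages on simulated and real biological data.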


AI中文摘要

将高维数据嵌入低维空间是数据分析中不可或缺的组成部分。在许多应用中，需要对齐和联合嵌入来自不同研究或实验条件的多个数据集。这些数据集可能共享感兴趣的底层结构，但表现出个体扭曲，导致使用传统技术时嵌入不对齐。在这项工作中，我们提出了熵最优传输（EOT）特征映射，一种具有理论保证的对齐和联合嵌入一对数据集的原则性方法。我们的方法利用两个数据集之间EOT计划矩阵的前导奇异向量来提取它们共享的底层结构，并在公共嵌入空间中对齐它们。我们将我们的方法解释为经典拉普拉斯特征映射和扩散映射嵌入的数据间变体，表明它具有许多有利的类似性质。我们分析了一个生成模型，其中两个观测到的高维数据集共享支持在公共低维流形上的潜在变量，而每个数据集受到平移、几何扭曲、正交干扰结构和噪声的影响。在大样本、高维情况下，我们证明EOT计划围绕一个由扭曲的几何均值确定的有效流形上的总体核集中，对平移、正交干扰结构和噪声具有不变性。随后，我们将我们的嵌入与编码共享流形密度和几何的总体水平算子的特征函数联系起来。最后，我们通过模拟和真实生物数据的分析展示了我们的方法在数据集成和嵌入方面的性能，证明了其在挑战性场景下相对于替代方法的优势。

英文摘要

Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose Entropic Optimal Transport (EOT) eigenmaps, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align them in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We analyze a generative model in which two observed high-dimensional datasets share latent variables supported on a common low-dimensional manifold, while each dataset is subject to translation, geometric distortion, orthogonal nuisance structure, and noise. In a large-sample, high-dimensional regime, we prove that the EOT plan concentrates around a population kernel on an effective manifold determined by the geometric mean of the distortions, with invariance to translations, orthogonal nuisance structure, and noise. Subsequently, we relate our embedding to eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.
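
The central object in this abstract is the EOT plan matrix between two datasets. The sketch below computes such a plan with standard Sinkhorn iterations in plain Python. The squared Euclidean cost, uniform marginals, regularization value `eps`, and the function name `sinkhorn_plan` are assumptions for illustration; the paper's exact normalization may differ, and the subsequent embedding step (taking leading singular vectors of the plan) is not shown.

```python
import math


def sinkhorn_plan(X, Y, eps=1.0, n_iter=200):
    """Entropic OT plan between point sets X and Y with uniform marginals."""
    n, m = len(X), len(Y)
    # Gibbs kernel K_ij = exp(-||x_i - y_j||^2 / eps)
    K = [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / eps) for y in Y]
         for x in X]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the target marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Y = [[0.1, 0.0], [1.1, 0.1], [0.0, 0.9], [0.5, 0.5]]
plan = sinkhorn_plan(X, Y)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(X))) for j in range(len(Y))]
print("row sums:", [round(s, 6) for s in row_sums])
print("col sums:", [round(s, 6) for s in col_sums])
```

Each row of the plan sums to 1/3 and each column to 1/4, so the matrix is a soft matching between the two point sets; larger entries link points that are close under the chosen cost.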


2406.19749 2026-06-09 eess.IV cs.CV 版本更新

SPIRONet: Spatial-Frequency Learning and Graph-based Channel Interaction Network for Vessel Segmentation

SPIRONet：用于血管分割的空间-频率学习与基于图的通道交互网络

De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

Affiliations: State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

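AI summary: Proposes SPIRONet, which combines dual spatial-frequency encoders, cross-attention fusion, and a graph-based channel interaction module to address vessel segmentation under low signal-to-noise ratio, small or slender vessels, and strong interference; it improves IoU over the strongest competing methods on five datasets.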

Comments Accepted by Biomedical Signal Processing and Control. 15 Pages, 9 Figures, 13 Tables


AI中文摘要

自动血管分割在下一代手术机器人介入导航系统的发展中起着关键作用。然而，当前方法在低信噪比、细小或纤细血管以及强干扰等具有挑战性的术中条件下，分割性能仍不理想。本研究提出了一种新颖的空间-频率学习与基于图的通道交互网络（SPIRONet）来解决上述问题。针对低信噪比血管外观和细小或纤细分支，采用了双空间-频率编码器，其中频率编码器捕获受局部噪声波动影响较小的全局血管连续性，而空间编码器保留精细的血管细节。进一步引入了交叉注意力融合模块，以自适应地整合这种互补的空间和频率信息。此外，为了抑制非目标血管和类血管结构的干扰，设计了基于图的通道交互模块来建模通道间的相关性，增强一致的血管相关响应，同时抑制任务无关的激活。在五个具有挑战性的数据集上的大量实验结果表明，与现有方法相比，所提方法取得了有竞争力且持续强劲的性能。例如，在CADSA、CAXF、DCA1、XCAD和ARCADE上，SPIRONet分别比最强竞争方法实现了+0.87%、+0.52%、+0.23%、+1.39%和+2.22%的IoU提升。此外，SPIRONet在512x512输入尺寸下实现了21 FPS的推理速度，满足介入场景（6-12 FPS）的实时要求。这些有希望的结果表明SPIRONet在介入导航系统中集成的潜力。代码可在该https URL获取。

英文摘要

Automatic vessel segmentation plays a pivotal role in the development of next-generation interventional navigation systems for surgical robotics. However, current approaches still suffer from suboptimal segmentation performance under challenging intraoperative conditions, such as low-signal-to-noise ratio (SNR), small or slender vessels, and strong interference. In this study, a novel spatial-frequency learning and graph-based channel interaction network (SPIRONet) is proposed to address the above issues. To address low-SNR vessel appearance and small or slender branches, dual spatial-frequency encoders are utilized, where the frequency encoder captures global vessel continuity that is less affected by local noise fluctuations, while the spatial encoder preserves fine vessel details. A cross-attention fusion module is further introduced to adaptively integrate this complementary spatial and frequency information. Moreover, to suppress interference from non-target vessels and vessel-like structures, a graph-based channel interaction module is designed to model channel-wise correlations, enhancing consistent vessel-related responses while suppressing task-irrelevant activations. Extensive experimental results on five challenging datasets demonstrate that the proposed method achieves competitive and consistently strong performance compared with existing methods. For example, SPIRONet achieves IoU improvements of +0.87%, +0.52%, +0.23%, +1.39%, and +2.22% over the strongest competing methods on CADSA, CAXF, DCA1, XCAD, and ARCADE, respectively. Moreover, SPIRONet achieves an inference speed of 21 FPS with a 512x512 input size, meeting the real-time requirements of interventional scenarios (6-12 FPS). These promising results indicate SPIRONet's potential for integration into interventional navigation systems. Code is available at https://github.com/Dxhuang-CASIA/SPIRONet.
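
The results in this abstract are reported as IoU (intersection over union) gains. For readers unfamiliar with the metric, here is a minimal sketch of IoU for binary masks. The nested-list mask format and the function name `binary_iou` are assumptions for the example, not part of the SPIRONet code base.

```python
def binary_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = binary_iou(pred, target)
print(iou)
```

Here two pixels are marked in both masks and four pixels are marked in at least one, so the IoU is 2/4.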


2208.00778 2026-06-09 cs.DB cs.LG q-bio.QM 版本更新

SFILES 2.0: An extended text-based flowsheet representation

SFILES 2.0：一种扩展的基于文本的流程图表示

Gabriel Vogel, Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann

Affiliations: Department of Chemical Engineering, Delft University of Technology (TU Delft), Van der Maasweg 9, 2629 HZ Delft, The Netherlands

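AI summary: Proposes SFILES 2.0, whose extended notation and naming conventions address the original version's inability to describe key flowsheet configurations and control structures unambiguously, and releases open-source software for automatic conversion between flowsheet graphs and strings, with the aim of advancing a FAIR database of chemical process flowsheets.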


DOI: 10.1007/s11081-023-09798-9
Journal ref: Optimization and Engineering, Volume 24, pages 2911-2933, (2023)

AI中文摘要

SFILES是一种基于文本的化工流程图表示法。最初由d'Anterroches（通过基团贡献法进行流程生成与设计）提出，其灵感来自基于文本的分子SMILES表示法。与流程图图像相比，文本格式在存储格式、计算可访问性以及最终的数据分析和处理方面具有若干优势。然而，原始SFILES版本无法明确描述基本的流程图配置，例如塔顶和塔底产品的区分。它也无法描述化工过程安全可靠运行所需的控制结构。此外，目前没有公开可用的软件用于将化工过程拓扑结构编码或解码为SFILES。我们提出了SFILES 2.0，并完整描述了扩展符号和命名约定。此外，我们提供了开源软件，用于流程图图与SFILES 2.0字符串之间的自动转换。通过这种方式，我们希望鼓励研究人员和工程师以SFILES 2.0字符串的形式发布他们的流程图拓扑结构。最终目标是建立化工过程流程图FAIR数据库的标准，这对于未来的数据处理和分析将具有重要价值。

英文摘要

SFILES are a text-based notation for chemical process flowsheets. They were originally proposed by d'Anterroches (Process flow sheet generation & design through a group contribution approach) who was inspired by the text-based SMILES notation for molecules. The text-based format has several advantages compared to flowsheet images regarding the storage format, computational accessibility, and eventually for data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Neither is it capable of describing the control structure required for the safe and reliable operation of chemical processes. Also, there is no publicly available software for decoding or encoding chemical process topologies to SFILES. We propose the SFILES 2.0 with a complete description of the extended notation and naming conventions. Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. This way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set the standards for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.


2602.14975 2026-06-09 physics.chem-ph cs.LG

Faster Molecular Dynamics with Neural Network Potentials via Distilled Multiple Time-Stepping and Non-Conservative Forces

通过蒸馏多时间步长和非保守力加速基于神经网络势的分子动力学

Nicolaï Gouraud, Côme Cattin, Thomas Plé, Olivier Adjoua, Louis Lagardère, Jean-Philip Piquemal

Affiliations: Qubit Pharmaceuticals, Advanced Research Department; Sorbonne Université, Laboratoire de Chimie Théorique, UMR 7616 CNRS; Laboratoire de Chimie Théorique, UMR 7616 CNRS

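AI summary: Proposes DMTS-NC, a distilled multiple time-stepping strategy with non-conservative forces that accelerates atomistic molecular dynamics with foundation neural network models such as FeNNix-Bio1; it maintains accuracy while gaining an additional 15-30% speedup over DMTS, and supports hydrogen mass repartitioning and high hydrogen friction to extend the time step to 10 fs.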


DOI: 10.1021/acs.jctc.6c00653
Journal ref: Journal of Chemical Theory and Computation, 2026

AI中文摘要

继我们之前的工作（J. Phys. Chem. Lett., 2026, 17, 5, 1288-1295）之后，我们提出了DMTS-NC方法，这是一种使用非保守力的蒸馏多时间步长策略，用于进一步加速使用基础神经网络模型（如FeNNix-Bio1）的原子分子动力学模拟。该方法采用双层可逆参考系统传播算法（RESPA）形式，将目标精确保守势与为产生非保守力而优化的简化蒸馏表示耦合。尽管是非保守的，但蒸馏架构被设计为强制执行关键物理先验，例如旋转等变性和原子力分量的抵消。这些选择促进了蒸馏过程，从而大幅提高了模拟的鲁棒性，显著限制了两种模型之间的异常差异，从而实现了与力数据的极好一致性。总体而言，DMTS-NC方案比其保守对应方案更稳定、更高效，额外加速比DMTS达到15-30%。无需微调步骤，它更易于实现，并且可以推至系统物理共振的极限，以在保持精度的同时提供最大效率。我们通过结合氢质量再分配（HMR）和高氢摩擦（HHF）获得了额外的加速，将方案的最大时间步长进一步扩展到10 fs，同时保持稳定性和精度。与DMTS一样，DMTS-NC适用于任何神经网络势，并且可以应用于计算量比FeNNix-Bio1更大的方法。我们展示了将该方法应用于MACE-OFF23蒸馏的原理验证，与单时间步长相比，获得了3.66至5.64的加速比。

英文摘要

Following our previous work (J. Phys. Chem. Lett., 2026, 17, 5, 1288-1295), we propose the DMTS-NC approach, a distilled multi-time-step (DMTS) strategy using non-conservative (NC) forces to further accelerate atomistic molecular dynamics simulations using foundation neural network models such as FeNNix-Bio1. Here, a dual-level reversible reference system propagator algorithm (RESPA) formalism couples a target accurate conservative potential to a simplified distilled representation optimized for the production of non-conservative forces. Despite being non-conservative, the distilled architecture is designed to enforce key physical priors, such as equivariance under rotation and cancellation of atomic force components. These choices facilitate the distillation process and therefore drastically improve the robustness of the simulations, significantly limiting abnormal discrepancies between the two models and thus achieving excellent agreement with the force data. Overall, the DMTS-NC scheme is found to be more stable and efficient than its conservative counterpart, with additional speedups reaching 15-30% over DMTS. Requiring no fine-tuning steps, it is easier to implement and can be pushed to the limit of the system's physical resonances to maintain accuracy while providing maximum efficiency. We obtain additional speedup by combining hydrogen mass repartitioning (HMR) and High Hydrogen Friction (HHF) to further extend the largest timestep of our schemes up to 10 fs while conserving stability and accuracy. As for DMTS, DMTS-NC is applicable to any neural network potential and can be applied to approaches that are computationally heavier than FeNNix-Bio1. We show a proof of principle applying the approach to the distillation of MACE-OFF23, with consequent speedups ranging from 3.66 to 5.64 compared to single-timestep integration.
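
To make the dual-level RESPA idea concrete, the following toy sketch integrates a one-dimensional particle with a cheap force in the inner loop and an expensive-minus-cheap correction applied at the outer time step. Everything here is an assumption for illustration: the analytic forces stand in for the reference and distilled models, the cheap force is conservative (unlike the non-conservative distilled forces of DMTS-NC), and the choice of which model drives the inner steps is the generic multiple time-stepping arrangement rather than a detail confirmed by the abstract.

```python
K = 1.0   # harmonic constant (toy value)
C = 0.1   # weak anharmonic term present only in the "full" model (toy value)


def force_full(x):
    """Stand-in for the expensive reference model."""
    return -K * x - C * x ** 3


def force_cheap(x):
    """Stand-in for the cheap distilled model."""
    return -K * x


def energy(x, v):
    """Total energy of the full model for unit mass."""
    return 0.5 * v * v + 0.5 * K * x * x + 0.25 * C * x ** 4


def respa_step(x, v, dt, n_inner):
    """One outer step: correction kicks wrap n_inner velocity-Verlet substeps."""
    v += 0.5 * dt * (force_full(x) - force_cheap(x))  # outer half kick
    h = dt / n_inner
    for _ in range(n_inner):
        v += 0.5 * h * force_cheap(x)
        x += h * v
        v += 0.5 * h * force_cheap(x)
    v += 0.5 * dt * (force_full(x) - force_cheap(x))  # outer half kick
    return x, v


x, v = 1.0, 0.0
e0 = energy(x, v)
max_rel_dev = 0.0
for _ in range(2000):
    x, v = respa_step(x, v, dt=0.05, n_inner=4)
    max_rel_dev = max(max_rel_dev, abs(energy(x, v) - e0) / e0)
print("max relative energy deviation:", max_rel_dev)
```

The expensive force is evaluated only twice per outer step regardless of `n_inner`, which is where the speedup of multiple time-stepping schemes comes from; the energy check is a simple way to see that the split integrator stays stable in this toy setting.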


2604.07349 2026-06-09 cs.CC cs.AI cs.LO

Descent Before Hardness: Orbit-Gap Obstructions in Exact Certification

下降先于困难性：精确认证中的轨道间隙障碍

Tristan Simas

Affiliations: McGill University

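AI summary: Shows that exact certification has a quotient on which any tractability proxy must first define a predicate. Orbit gaps are the complete obstruction: exact closure-invariant classification is possible if and only if the positive and negative orbit hulls are disjoint. Three binary-pairwise proxy families and one offset-normalization witness exhibit same-orbit disagreement, and all numbered results are mechanized in Lean 4.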

Comments Main PDF: 46 pages, 5 tables. Supplementary: 17 pages, 2 tables. Lean 4 formalization available at https://doi.org/10.5281/zenodo.19457896


AI中文摘要

Rice定理表明，部分递归函数的非平凡外延性质是不可判定的。对于有限加权布尔优化/CSP风格切片，可处理性分类存在一个Rice式的结构类比：正确性迫使在定理强制表示的移动下具有不变性，而轨道间隙正是闭包不变谓词精确分类的障碍。该范围对于精确规范是普适的。任何严格规范的问题都确定一个可接受输出关系，而精确认证仅依赖于诱导的等价关系 $s \sim_R s' \iff \operatorname{Adm}_R(s)=\operatorname{Adm}_R(s')$。决策、搜索、近似、随机输出、统计和分布保证都通过这个可接受输出商进入。在具有多项式时间可计算传输的闭包封闭域上，每个正确的可处理性分类器必须在闭包轨道上为常数。精确的闭包不变分类当且仅当正轨道壳和负轨道壳不相交时才是可能的；在这种情况下，闭包壳是一个闭包算子，给出最小的精确分类器。有限结构域是提取成对语法上的基本局部一阶片段。四个二元成对阻碍族——主导对集中、边缘掩蔽、鬼影动作支持和动作特定偏移——见证了自然有限结构谓词的相同轨道分歧，而壳分离定理给出了分类可能时的正判据。没有显式的边缘控制，任意小的效用扰动都可能翻转相关性和充分性。

英文摘要

Exact certification has a quotient: states are equivalent when they have the same correct outputs. A tractability proxy must first define a predicate on this quotient before ordinary hardness or algorithmic questions arise. Raw syntactic proxies can fail at that earlier step, because correctness-preserving presentation moves may change the statistics they inspect while preserving the exact-certification problem. Orbit gaps are the complete obstruction. An orbit gap occurs when one closure orbit contains both positive and negative presentations of a target. Exact closure-invariant classification is possible if and only if the positive and negative orbit hulls are disjoint. When the hulls are disjoint, the closure hull is the least exact classifier. With computable orbit representatives, this hull classifier becomes a quotient-level algorithm. These are predicate-level results: they establish when a proxy defines a property of the certification problem at all, a precondition logically prior to class lower bounds on the resulting recovery task and deliberately not a substitute for them. The structural transfer applies to every fixed correctness relation, independent of whether that relation is polynomial-time accessible. In the direct finite-local regime, where local routing tests are computed from raw pairwise syntax, three binary-pairwise proxy families and one offset-normalization witness exhibit same-orbit disagreement. Positive results arise from quotient-preserving normalizations, computable orbit catalogues whose descended predicates compose under Boolean operations, and predicates defined directly on the correctness quotient. The result complements the Rice-analog line of Borchert, Stephan, Hemaspaandra, and Rothe. All numbered results are mechanized in Lean 4; the supplementary ledger maps each claim to its formal identifier.


2502.18834 2026-06-09 cs.CE cs.LG

FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting

FinTSB：一个全面且实用的金融时间序列预测基准

Yifan Hu, Yuante Li, Peiyuan Liu, Yuxia Zhu, Naiqi Li, Tao Dai, Shu-tao Xia, Dawei Cheng, Changjun Jiang

Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; School of Computer Science, Carnegie Mellon University, Pittsburgh 15213, Pennsylvania, United States; School of Computer Science and Technology, Tongji University, Shanghai 201804, China; College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518055, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200030, China

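AI summary: To address insufficient diversity, missing evaluation standards, and poor real-world fit in financial time series forecasting, the paper proposes the FinTSB benchmark, which categorizes movement patterns, standardizes evaluation metrics, and models real trading constraints to provide a comprehensive evaluation platform.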


DOI: 10.1007/s11704-026-51064-5
Journal ref: Frontiers of Computer Science 2026

AI中文摘要

金融时间序列记录了人脑增强决策行为，捕获了可用于盈利投资策略的历史信息。该领域吸引了大量研究者，提出了基于各种骨干网络的多种方法。然而，该领域的评估通常存在三个系统性局限：1. 未能考虑动态金融市场中观察到的全部股票运动模式（多样性差距）；2. 缺乏统一的评估协议，削弱了跨研究性能比较的有效性（标准化缺失）；3. 忽视关键市场结构因素，导致性能指标虚高，缺乏实际适用性（现实不匹配）。为解决这些问题，我们提出了FinTSB，一个全面且实用的金融时间序列预测基准。为增加多样性，我们将运动模式分为四类，对数据进行分词和预处理，并基于序列特征评估数据质量。为消除不同评估设置带来的偏差，我们在三个维度上标准化指标，并构建了一个用户友好、轻量级的流水线，集成了多种骨干网络的方法。为准确模拟真实交易场景并促进实际应用，我们广泛建模了各种监管约束，包括交易费用等。最后，我们在FinTSB上进行了大量实验，突出了关键见解，以指导不同市场条件下的模型选择。总体而言，FinTSB为研究者提供了一个新颖且全面的平台，用于改进和评估金融时间序列预测方法。代码可在https://github.com/TongjiFinLab/FinTSB获取。

英文摘要

Financial time series (FinTS) record the behavior of human-brain-augmented decision-making, capturing valuable historical information that can be leveraged for profitable investment strategies. Not surprisingly, this area has attracted considerable attention from researchers, who have proposed a wide range of methods based on various backbones. However, evaluation in this area often exhibits three systemic limitations: (1) failure to account for the full spectrum of stock movement patterns observed in dynamic financial markets (Diversity Gap); (2) the absence of unified assessment protocols, which undermines the validity of cross-study performance comparisons (Standardization Deficit); and (3) neglect of critical market structure factors, resulting in inflated performance metrics that lack practical applicability (Real-World Mismatch). To address these limitations, we propose FinTSB, a comprehensive and practical benchmark for financial time series forecasting (FinTSF). To increase the variety, we categorize movement patterns into four specific parts, tokenize and pre-process the data, and assess the data quality based on some sequence characteristics. To eliminate biases due to different evaluation settings, we standardize the metrics across three dimensions and build a user-friendly, lightweight pipeline incorporating methods from various backbones. To accurately simulate real-world trading scenarios and facilitate practical implementation, we extensively model various regulatory constraints, including transaction fees, among others. Finally, we conduct extensive experiments on FinTSB, highlighting key insights to guide model selection under varying market conditions. Overall, FinTSB provides researchers with a novel and comprehensive platform for improving and evaluating FinTSF methods. The code is available at https://github.com/TongjiFinLab/FinTSB.


2605.09813 2026-06-09 cs.NI cs.DC cs.LG cs.SY eess.SY

Optimizing Server Placement for Vertical Federated Learning in Dynamic Edge/Fog Networks

优化动态边缘/雾网络中垂直联邦学习的服务器部署

Su Wang, Mung Chiang, H. Vincent Poor

Affiliations: Department of Electrical and Computer Engineering, Purdue University

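AI summary: Studies the control and optimization of vertical federated learning in dynamic edge/fog networks and proposes SC-DN, which jointly optimizes server placement, transmit power, processor frequency, and the number of local training iterations to improve model performance and resource utilization.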

Comments Under revision at IEEE/ACM transactions on networking


DOI: 10.1109/TON.2026.3700898

AI中文摘要

我们研究了垂直联邦学习（VFL）的控制与优化，VFL是一种分布式机器学习方法，其中边缘/雾设备包含独立的数据特征。由于边缘/雾网络中数据特征和硬件的异构性，设备对VFL的贡献差异显著，且动态网络可能导致某些数据特征的永久退出或进入。在该设置下，我们提出的方法，动态网络中的服务器控制VFL（SC-DN），首先证明了每个全局轮次都存在一个全局一阶驻点，然后利用这一结果，基于四个关键控制变量：（i）服务器部署，（ii）设备到服务器的传输功率，（iii）本地设备处理器频率，以及（iv）每个全局轮次的本地训练迭代数，联合优化机器学习模型训练和资源消耗。所得到的优化公式包含耦合变量以及多种对数约束，我们证明这是一个混合整数符号多项式规划问题，属于NP难问题，为此我们开发了一个通用求解器。最后，通过在图像和多模态数据集上的实验，我们表明我们的方法在分类/回归性能和资源消耗节省方面甚至优于贪心方法。

英文摘要

We investigate the control and optimization of vertical federated learning (VFL), a class of distributed machine learning (ML) methods in which edge/fog devices contain separate data features, in dynamic edge/fog networks. Owing to heterogeneous data features and hardware across edge/fog networks, devices' contributions to VFL vary substantially, and, moreover, dynamic edge/fog networks can lead to the permanent exit or entry of select data features. In this setting, our proposed methodology, server controlled VFL in dynamic networks (SC-DN), first establishes the existence of a global first-order stationary point for every global round, and then leverages this result to jointly optimize ML model training and resource consumption based on four key control variables: (i) server placement, (ii) device-to-server transmit power, (iii) local device processor frequency, and (iv) local training iterations per global round. The resulting optimization formulation contains coupled variables as well as numerous forms of logarithmic constraints; we show that it is a mixed-integer signomial program, an NP-hard problem, for which we develop a general solver. Finally, via experiments on both image and multi-modal datasets, we show that our methodology achieves better classification/regression performance and greater resource consumption savings than even greedy methodologies.


2603.24940 2026-06-09 cs.PL cs.AI

Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system

评估基于自适应和生成式AI的反馈与推荐在知识图谱集成的编程学习系统中的效果

Lalita Na Nongkhai, Jingyun Wang, Adam Wynn, Takahiko Mendori

Affiliations: Graduate School of Engineering, Kochi University of Technology; Department of Computer Science, Durham University

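AI summary: Proposes a knowledge-graph-integrated programming learning system that combines a large language model with retrieval-augmented generation, and experimentally compares the feedback effectiveness and learning performance of three instructional modes.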


DOI: 10.1016/j.caeai.2025.100526
Journal ref: Computers and Education: Artificial Intelligence, Volume 10, June 2026, 100526

AI中文摘要

本文介绍了一种整合大型语言模型（LLM）与检索增强生成（RAG）方法的框架，利用知识图谱和用户交互历史进行学习者代码评估、生成形成性反馈并推荐练习。该研究通过四个关键日志特征分析了4956次代码提交数据，发现生成式AI模式的反馈使学习者正确代码更多且缺失关键逻辑的提交更少。混合生成式AI-自适应模式在正确提交数和错误或不完整尝试数上表现最佳，优于仅自适应或仅生成式AI模式。问卷结果显示，生成式AI反馈被广泛认为有帮助，且所有模式在易用性和有用性上均获好评。

英文摘要

This paper introduces the design and development of a framework that integrates a large language model (LLM) with a retrieval-augmented generation (RAG) approach leveraging both a knowledge graph and user interaction history. The framework is incorporated into a previously developed adaptive learning support system to assess learners' code, generate formative feedback, and recommend exercises. Moreover, this study examines learner preferences across three instructional modes: adaptive, Generative AI (GenAI), and hybrid GenAI-adaptive. An experimental study was conducted to compare the learning performance and perception of the learners, and the effectiveness of these three modes, using four key log features derived from 4956 code submissions across all experimental groups. The analysis results show that learners receiving feedback from GenAI modes had significantly more correct code and fewer code submissions missing essential programming logic than those receiving feedback from the adaptive mode. In particular, the hybrid GenAI-adaptive mode achieved the highest number of correct submissions and the fewest incorrect or incomplete attempts, outperforming both the adaptive-only and GenAI-only modes. Questionnaire responses further indicated that GenAI-generated feedback was widely perceived as helpful, while all modes were rated positively for ease of use and usefulness. These results suggest that the hybrid GenAI-adaptive mode outperforms the other two modes across all measured log features.


2602.00058 2026-06-09 cs.CR cs.LG

Comparison of Multiple Classifiers for Android Malware Detection with Emphasis on Feature Insights Using CICMalDroid 2020 Dataset

多分类器比较用于Android恶意软件检测：侧重于特征洞察使用CICMalDroid 2020数据集

Md Min-Ha-Zul Abedin, Tazqia Mehrub

Affiliations: Department of Biosystems Engineering, Auburn University; Independent Researcher

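AI summary: Compares multiple classifiers for Android malware detection, finds that gradient boosting on the original features performs best in accuracy, precision, recall, and F1, and identifies the key drivers of decisions.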


DOI: 10.1109/STI69347.2025.11367549

AI中文摘要

准确的Android恶意软件检测对于保护用户至关重要。签名扫描器在公共应用商店的快速发布周期中显得滞后。我们旨在通过结合全面的数据集和严谨透明的评估来构建一个可信的检测器，并识别决策的可解释驱动因素。我们使用CICMalDroid2020数据集，其中包含17,341个应用，涵盖良性、广告软件、银行软件、短信恶意软件和风险软件。我们提取了301个静态特征和263个动态特征，形成一个564维的混合向量，然后在三种方案下评估了七个分类器：原始特征、主成分分析（PCA）和线性判别分析（LDA），采用70%训练和30%测试分割。结果表明，基于原始特征的梯度提升表现最佳。XGBoost在准确率、精确率、召回率和F1值上分别达到0.9747、0.9703、0.9731和0.9716，混淆矩阵显示恶意应用极少被误标为良性。HistGradientBoosting的准确率为0.9741，F1值为0.9708，而CatBoost和随机森林的准确率分别为0.9678和0.9687，F1值分别为0.9636和0.9637。KNN和SVM表现较差。PCA降低了所有模型的性能，XGBoost的准确率降至0.9164，F1值降至0.8988。LDA的准确率保持在90%中段，并使投影中的可分离聚类更加清晰。一个深度为2的代理树突显了包名、主Activity和目标SDK作为关键驱动因素。这些发现建立了Android恶意软件检测的高保真监督基线，并表明丰富的混合特征与梯度提升提供了实用且可解释的基础。

英文摘要

Accurate Android malware detection was critical for protecting users at scale. Signature scanners lagged behind fast release cycles on public app stores. We aimed to build a trustworthy detector by pairing a comprehensive dataset with a rigorous, transparent evaluation, and to identify interpretable drivers of decisions. We used CICMalDroid2020, which contained 17,341 apps across five classes: Benign, Adware, Banking, SMS malware, and Riskware. We extracted 301 static and 263 dynamic features into a 564-dimensional hybrid vector, then evaluated seven classifiers under three schemes: original features, principal component analysis (PCA), and linear discriminant analysis (LDA), with a 70% training and 30% test split. Results showed that gradient boosting on the original features performed best. XGBoost achieved 0.9747 accuracy, 0.9703 precision, 0.9731 recall, and 0.9716 F1, and the confusion matrix showed that malicious apps were rarely labeled benign. HistGradientBoosting reached 0.9741 accuracy and 0.9708 F1, while CatBoost and Random Forest were slightly lower at 0.9678 and 0.9687 accuracy with 0.9636 and 0.9637 F1. KNN and SVM lagged. PCA reduced performance for all models, with XGBoost dropping to 0.9164 accuracy and 0.8988 F1. LDA maintained accuracy in the mid-90% range and clarified separable clusters in projections. A depth-two surrogate tree highlighted package name, main activity, and target SDK as key drivers. These findings established high-fidelity supervised baselines for Android malware detection and indicated that rich hybrid features with gradient boosting offered a practical and interpretable foundation for deployment.
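The four scores quoted above (accuracy, precision, recall, F1) can all be read off a confusion matrix. The sketch below shows one way to do that in plain Python. It assumes macro averaging over classes, which the abstract does not specify, and the function name `macro_metrics` and the toy counts are illustrative only, not data from the paper.

```python
def macro_metrics(cm):
    """Accuracy and macro-averaged precision/recall/F1 from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    Macro averaging is an assumption here; the source does not state the averaging used.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        actual_k = sum(cm[k])                          # row sum: truly k
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical 2-class counts (row 0: true benign, row 1: true malicious).
toy_cm = [[8, 2],
          [1, 9]]
toy_scores = macro_metrics(toy_cm)
print(toy_scores)
```

In this toy matrix the off-diagonal entry `toy_cm[1][0]` counts malicious apps labeled benign, which is the error type the abstract reports as rare.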


2512.10745 2026-06-09 physics.med-ph cs.LG

PMB-NN: Physiology-Centred Hybrid AI for Personalized Hemodynamic Monitoring from Photoplethysmography

PMB-NN：以生理为中心的混合AI用于从光体积脉搏波测记中进行个性化血流动力学监测

Yaowen Zhang, Libera Fresiello, Peter H. Veltink, Dirk W. Donker, Ying Wang

Affiliations: Department of Biomedical Signals and Systems, University of Twente; Department of Cardiovascular and Respiratory Physiology, University of Twente; Department of Intensive Care, University Medical Center Utrecht

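AI summary: This paper proposes PMB-NN, which combines a physiological model with deep learning for personalized hemodynamic monitoring, evaluates its accuracy, interpretability, and plausibility in blood pressure estimation, and shows how physiological constraints strengthen a hybrid AI framework.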

详情

DOI: 10.1016/j.cmpb.2026.109479

AI中文摘要

连续监测血压（BP）以及外周阻力（R）和动脉顺应性（C）等血流动力学参数，对早期发现血管功能障碍至关重要。尽管基于光电容积脉搏波（PPG）的可穿戴设备已广受欢迎，但现有数据驱动的BP估计方法缺乏可解释性。我们改进了此前提出的以生理为中心的混合AI方法，即基于生理模型的神经网络（PMB-NN），用于血压估计；该方法将深度学习与一个二元件Windkessel（弹性腔）模型相结合，模型参数R和C作为物理约束。PMB-NN模型使用PPG衍生的时间特征以受试者特异性方式训练，同时利用人口统计学信息推断一个中间变量：心输出量。我们在10名健康成人身上验证了模型，受试者在两天内进行静态和骑行活动，以检验模型的跨日鲁棒性，并与深度学习（DL）模型（FCNN、CNN-LSTM、Transformer）和独立的Windkessel生理模型（PM）进行了基准比较。验证从三个角度进行：准确性、可解释性和合理性。PMB-NN的收缩压准确性（MAE：7.2 mmHg）与DL基准相当，但舒张压表现（MAE：3.9 mmHg）低于DL模型。不过，PMB-NN的生理合理性高于DL基线和PM，表明混合架构统一并增强了生理原理和数据驱动技术的各自优势。除BP外，PMB-NN在训练过程中识别出R（ME：0.15 mmHg·s/ml）和C（ME：-0.35 ml/mmHg），其准确性与PM相近，证明嵌入的生理约束为混合AI框架带来了可解释性。这些结果表明，对于日常血流动力学监测，PMB-NN是纯数据驱动方法之外一种均衡且有生理依据的替代方案。

英文摘要

Continuous monitoring of blood pressure (BP) and hemodynamic parameters such as peripheral resistance (R) and arterial compliance (C) is critical for early detection of vascular dysfunction. While photoplethysmography (PPG) wearables have gained popularity, existing data-driven methods for BP estimation lack interpretability. We advanced our previously proposed physiology-centered hybrid AI method, the Physiological Model-Based Neural Network (PMB-NN), for blood pressure estimation; it unifies deep learning with a 2-element Windkessel-based model parameterized by R and C, which act as physics constraints. The PMB-NN model was trained in a subject-specific manner using PPG-derived timing features, while demographic information was used to infer an intermediate variable: cardiac output. We validated the model on 10 healthy adults performing static and cycling activities across two days to test the model's day-to-day robustness, and benchmarked it against deep learning (DL) models (FCNN, CNN-LSTM, Transformer) and a standalone Windkessel-based physiological model (PM). Validation was conducted from three perspectives: accuracy, interpretability, and plausibility. PMB-NN achieved systolic BP accuracy (MAE: 7.2 mmHg) comparable to the DL benchmarks, but diastolic performance (MAE: 3.9 mmHg) lower than that of the DL models. However, PMB-NN exhibited higher physiological plausibility than both DL baselines and PM, suggesting that the hybrid architecture unifies and enhances the respective merits of physiological principles and data-driven techniques. Beyond BP, PMB-NN identified R (ME: 0.15 mmHg$\cdot$s/ml) and C (ME: -0.35 ml/mmHg) during training with accuracy similar to PM, demonstrating that the embedded physiological constraints confer interpretability to the hybrid AI framework. These results position PMB-NN as a balanced, physiologically grounded alternative to purely data-driven approaches for daily hemodynamic monitoring.
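For readers unfamiliar with the 2-element Windkessel model mentioned above, it is commonly written as C·dP/dt = Q(t) − P/R, where P is arterial pressure and Q is inflow. The sketch below integrates that textbook equation with forward Euler. It is not the authors' PMB-NN implementation; the function name and all numeric values are illustrative assumptions.

```python
def simulate_windkessel(flow, r, c, p0, dt):
    """Integrate C * dP/dt = Q(t) - P / R with the forward Euler method.

    flow: inflow samples Q in ml/s, one per time step
    r:    peripheral resistance in mmHg*s/ml
    c:    arterial compliance in ml/mmHg
    p0:   initial pressure in mmHg
    dt:   time step in seconds
    """
    pressure = p0
    trace = []
    for q in flow:
        pressure += dt * (q - pressure / r) / c
        trace.append(pressure)
    return trace


# Illustrative values only (not taken from the paper).
r_demo, c_demo, dt_demo = 1.0, 1.5, 0.001
steady_flow = [90.0] * 30000  # 30 s of constant inflow
trace = simulate_windkessel(steady_flow, r_demo, c_demo, p0=0.0, dt=dt_demo)
print(round(trace[-1], 3))
```

With constant inflow the pressure settles toward Q·R with time constant R·C, which is why R and C are identifiable from pressure dynamics and can serve as interpretable constraints.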


2511.02469 2026-06-09 q-fin.CP cs.AI cs.MA

Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification

多智能体辩论式LLM中鹰派-鸽派隐含信念建模用于货币政策决策分类

Kaito Takano, Masanori Hirano, Kei Nakagawa

Affiliations: Osaka Metropolitan University; Preferred Networks, Inc.

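AI summary: This paper proposes a multi-agent, debate-based LLM framework that models hawkish and dovish latent beliefs to improve monetary policy prediction accuracy, outperforming standard LLM baselines.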

Comments PRIMA2025 Accepted

详情

DOI: 10.1007/978-3-032-13562-9_38

AI中文摘要

准确预测央行政策决策，特别是美联储公开市场委员会（FOMC）的决策，在经济不确定性加剧的背景下变得尤为重要。尽管先前研究利用货币政策文本预测利率变化，但大多数方法依赖静态分类模型，忽略了政策制定的审议性质。本文提出了一种新颖的框架，通过建模多个大型语言模型（LLMs）作为交互智能体，结构上模仿FOMC的集体决策过程。每个智能体从不同的初始信念开始，并基于定性政策文本和定量宏观经济指标生成预测。通过迭代轮次，智能体通过观察其他智能体的输出修订预测，模拟审议和共识形成。为提高可解释性，我们引入一个表示每个智能体隐含信念（例如鹰派或鸽派）的隐变量，并理论证明该信念如何调解输入信息的感知和交互动态。实证结果表明，这种辩论式方法在预测准确性上显著优于标准LLM基线。此外，显式建模信念提供了关于个体视角和社会影响如何塑造集体政策预测的见解。

英文摘要

Accurately forecasting central bank policy decisions, particularly those of the Federal Open Market Committee (FOMC), has become increasingly important amid heightened economic uncertainty. While prior studies have used monetary policy texts to predict rate changes, most rely on static classification models that overlook the deliberative nature of policymaking. This study proposes a novel framework that structurally imitates the FOMC's collective decision-making process by modeling multiple large language models (LLMs) as interacting agents. Each agent begins with a distinct initial belief and produces a prediction based on both qualitative policy texts and quantitative macroeconomic indicators. Through iterative rounds, agents revise their predictions by observing the outputs of others, simulating deliberation and consensus formation. To enhance interpretability, we introduce a latent variable representing each agent's underlying belief (e.g., hawkish or dovish), and we theoretically demonstrate how this belief mediates the perception of input information and interaction dynamics. Empirical results show that this debate-based approach significantly outperforms standard LLM-based baselines in prediction accuracy. Furthermore, the explicit modeling of beliefs provides insights into how individual perspectives and social influence shape collective policy forecasts.


2507.17726 2026-06-09 cond-mat.dis-nn cond-mat.mtrl-sci cs.LG

Deep Generative Learning of Magnetic Frustration in Artificial Spin Ice from Magnetic Force Microscopy Images

从磁力显微镜图像中深度生成学习人工自旋冰中的磁阻挫

Arnab Neogi, Suryakant Mishra, Prasad P Iyer, Tzu-Ming Lu, Ezra Bussmann, Sergei Tretiak, Andrew Crandall Jones, Jian-Xin Zhu

Affiliations: Theoretical Division, Los Alamos National Laboratory; Center for Integrated Nanotechnologies, Los Alamos National Laboratory; Center for Integrated Nanotechnologies, Sandia National Laboratory

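AI summary: This paper uses deep learning to automatically compute magnetic moments and orientations of spin-ice structures from magnetic force microscopy images. Variational autoencoders generate synthetic images and extract features, which reduces experimental and segmentation errors and enables precise identification of frustrated vertices and nanomagnetic segments for designing optimized spin-ice configurations.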

详情

DOI: 10.1038/s41524-026-02124-8

AI中文摘要

规模日益增大的原子分辨率显微图像数据集，促进了机器学习方法的发展，用于识别和分析图像中蕴含的细微物理现象。在本工作中，蜂窝晶格自旋冰样本的显微图像被用作数据集，用于自动计算自旋冰构型的净磁矩和方向取向。在工作流程的第一阶段，我们训练机器学习模型以准确预测自旋冰结构中的磁矩和方向。变分自编码器（VAE）是一种新兴的无监督深度学习技术，被用于生成高质量的合成磁力显微镜（MFM）图像并提取潜在特征表示，从而减少实验和分割误差。所提方法的第二阶段能够精确识别和预测阻挫顶点和纳米磁性段，有效关联显微图像的结构和功能两个方面。这有助于设计具有可控阻挫模式的优化自旋冰构型，使按需合成成为可能。

英文摘要

Increasingly large datasets of microscopic images with atomic resolution facilitate the development of machine learning methods to identify and analyze subtle physical phenomena embedded within the images. In this work, microscopic images of honeycomb lattice spin-ice samples serve as datasets from which we automate the calculation of net magnetic moments and directional orientations of spin-ice configurations. In the first stage of our workflow, machine learning models are trained to accurately predict magnetic moments and directions within spin-ice structures. Variational Autoencoders (VAEs), an emergent unsupervised deep learning technique, are employed to generate high-quality synthetic magnetic force microscopy (MFM) images and extract latent feature representations, thereby reducing experimental and segmentation errors. The second stage of the proposed methodology enables precise identification and prediction of frustrated vertices and nanomagnetic segments, effectively correlating structural and functional aspects of microscopic images. This facilitates the design of optimized spin-ice configurations with controlled frustration patterns, enabling potential on-demand synthesis.


2507.15617 2026-06-09 cs.CY cs.AI

Why can't Epidemiology be automated (yet)?

为何流行病学无法被自动化（至今仍无法）

David Bann, Ed Lowther, Liam Wright, Yevgeniya Kovalchuk

Affiliations: Centre for Longitudinal Studies, University College London; Centre for Advanced Research Computing, University College London

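AI summary: This paper examines the potential and limits of AI in epidemiological research, noting that although generative AI offers opportunities, existing tools and human systems constrain its effectiveness, so collaboration between epidemiologists and engineers is needed.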

Comments 9 pages, 2 figures, 1 table

详情

DOI: 10.1093/ije/dyaf210

AI中文摘要

近期人工智能（AI）特别是生成式AI的进步为加速或自动化流行病学研究提供了新机遇。与基于物理实验的学科不同，流行病学大量依赖二次数据分析，因此非常适合此类增强。然而，仍不清楚哪些具体任务能从AI干预中受益或存在哪些障碍。当前AI能力的认知也参差不齐。本文通过现有数据集映射流行病学任务，从文献回顾到数据访问、分析、撰写和传播，识别现有AI工具在效率上的提升。尽管AI在某些领域如编码和行政任务中能提高生产力，但其效用受现有AI模型（如文献回顾中的幻觉）和人类系统（如数据集访问障碍）的限制。通过AI生成的流行病学成果示例，包括完全由AI生成的论文，表明最近开发的代理系统能设计和执行流行病学分析，但质量参差不齐（见https://github.com/edlowther/automated-epidemiology）。流行病学家有新的机会实证测试和评估AI系统；实现AI潜力需要流行病学家与工程师的双向互动。

英文摘要

Recent advances in artificial intelligence (AI) - particularly generative AI - present new opportunities to accelerate, or even automate, epidemiological research. Unlike disciplines based on physical experimentation, a sizable fraction of Epidemiology relies on secondary data analysis and thus is well-suited for such augmentation. Yet, it remains unclear which specific tasks can benefit from AI interventions or where roadblocks exist. Awareness of current AI capabilities is also mixed. Here, we map the landscape of epidemiological tasks using existing datasets - from literature review to data access, analysis, writing up, and dissemination - and identify where existing AI tools offer efficiency gains. While AI can increase productivity in some areas such as coding and administrative tasks, its utility is constrained by limitations of existing AI models (e.g. hallucinations in literature reviews) and human systems (e.g. barriers to accessing datasets). Through examples of AI-generated epidemiological outputs, including fully AI-generated papers, we demonstrate that recently developed agentic systems can now design and execute epidemiological analysis, albeit to varied quality (see https://github.com/edlowther/automated-epidemiology). Epidemiologists have new opportunities to empirically test and benchmark AI systems; realising the potential of AI will require two-way engagement between epidemiologists and engineers.


2503.17400 2026-06-09 physics.flu-dyn cs.LG

TripNet: Learning Large-scale High-fidelity 3D Car Aerodynamics with Triplane Networks

TripNet：利用三平面网络学习大规模高保真3D汽车空气动力学

Qian Chen, Mohamed Elrefaie, Angela Dai, Faez Ahmed

Affiliations: Department of Mechanical Engineering, Massachusetts Institute of Technology; Department of Computer Science, Technical University of Munich

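AI summary: TripNet uses triplane networks to predict high-resolution 3D car aerodynamics without depending on mesh structure, providing efficient and accurate CFD predictions.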

详情

DOI: 10.1063/5.0324695

AI中文摘要

代理建模已成为加速计算流体力学（CFD）模拟的强大工具。现有基于点云、体素、网格或图的3D几何学习模型依赖显式几何表示，内存消耗大且分辨率受限。对于具有数百万节点和单元的大型模拟，现有模型因依赖网格分辨率而需进行剧烈下采样，导致精度下降。我们提出了TripNet，一种基于三平面的神经框架，通过隐式编码3D几何到紧凑的连续特征图中。与依赖网格的方法不同，TripNet可扩展到高分辨率模拟，而无需增加内存成本，并以查询方式在任意空间位置进行CFD预测，不依赖网格连接或预定义节点。TripNet在DrivAerNet和DrivAerNet++数据集上实现了最先进的性能，准确预测了阻力系数、表面压力和完整的3D流动场。通过统一的三平面骨干支持多种模拟任务，TripNet为传统CFD求解器和现有代理模型提供了可扩展、准确和高效的替代方案。

英文摘要

Surrogate modeling has emerged as a powerful tool to accelerate Computational Fluid Dynamics (CFD) simulations. Existing 3D geometric learning models based on point clouds, voxels, meshes, or graphs depend on explicit geometric representations that are memory-intensive and resolution-limited. For large-scale simulations with millions of nodes and cells, existing models require aggressive downsampling due to their dependence on mesh resolution, resulting in degraded accuracy. We present TripNet, a triplane-based neural framework that implicitly encodes 3D geometry into a compact, continuous feature map with fixed dimension. Unlike mesh-dependent approaches, TripNet scales to high-resolution simulations without increasing memory cost, and enables CFD predictions at arbitrary spatial locations in a query-based fashion, independent of mesh connectivity or predefined nodes. TripNet achieves state-of-the-art performance on the DrivAerNet and DrivAerNet++ datasets, accurately predicting drag coefficients, surface pressure, and full 3D flow fields. With a unified triplane backbone supporting multiple simulation tasks, TripNet offers a scalable, accurate, and efficient alternative to traditional CFD solvers and existing surrogate models.
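To make the query-based idea above concrete, the sketch below shows a generic triplane lookup: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three feature vectors are combined. Summation as the combination rule, the plane layout, and every name here are assumptions for illustration; the abstract does not describe TripNet's actual aggregation or decoder.

```python
def bilinear(plane, u, v):
    """Sample a res x res grid of feature vectors at continuous (u, v) in [0, 1].

    plane[row][col] is a list of channel values; u indexes columns, v indexes rows.
    Requires res >= 2.
    """
    res = len(plane)
    x = u * (res - 1)
    y = v * (res - 1)
    x0 = min(int(x), res - 2)
    y0 = min(int(y), res - 2)
    tx, ty = x - x0, y - y0
    out = []
    for ch in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][ch] + tx * plane[y0][x0 + 1][ch]
        bottom = (1 - tx) * plane[y0 + 1][x0][ch] + tx * plane[y0 + 1][x0 + 1][ch]
        out.append((1 - ty) * top + ty * bottom)
    return out


def query_triplane(planes, point):
    """Return the summed features of the XY, XZ, and YZ planes at a 3D point in [0, 1]^3."""
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


# Toy planes with one channel whose value equals the column coordinate.
res = 5
ramp = [[[col / (res - 1)] for col in range(res)] for _row in range(res)]
demo_planes = {"xy": ramp, "xz": ramp, "yz": ramp}
demo_feature = query_triplane(demo_planes, (0.25, 0.5, 0.75))
print(demo_feature)
```

The memory footprint depends only on the plane resolution and channel count, not on how many points are queried, which is the property the abstract contrasts with mesh-dependent models.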


2501.04633 2026-06-09 cs.HC cs.CY cs.RO

"Can you be my mum?": Manipulating Social Robots in the Large Language Models Era

你能做我的妈妈吗？：在大型语言模型时代操纵社交机器人

Giulio Antonio Abbo, Gloria Desideri, Tony Belpaeme, Micol Spitale

Affiliations: IDLab-AIRO, Ghent University – imec; DEIB, Politecnico di Milano

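AI summary: This study explores how users in the era of large language models try to get robots to violate ethical principles. Tests across three scenarios identified five manipulation techniques, with the aim of informing the design of safer, more ethical human-robot interaction.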

Comments 10 pages, 2 figures

详情

DOI: 10.1109/HRI61500.2025.10973919
Journal ref: HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction

AI中文摘要

近期基于大型语言模型的机器人在对话能力上取得进展，使其互动更接近人类对话。然而，这些模型在人机交互中引入了安全和安全问题，因为它们容易受到操纵，可以绕过内置的安全措施。设想一个部署在家庭中的社交机器人，这项工作旨在理解日常用户如何尝试利用语言模型违反伦理原则，例如通过提示机器人扮演伴侣。我们进行了涉及21名大学生的试点研究，他们与Misty机器人互动，试图在基于特定人机交互伦理原则（依恋、自由和共情）的三个场景中绕过其安全机制。我们的结果表明，参与者使用了五种技术，包括侮辱和使用情感语言引起同情。我们希望这项工作能为未来研究设计更强大的安全措施，以确保伦理和安全的人机交互。

英文摘要

Recent advancements in robots powered by large language models have enhanced their conversational abilities, enabling interactions closely resembling human dialogue. However, these models introduce safety and security concerns in HRI, as they are vulnerable to manipulation that can bypass built-in safety measures. Imagining a social robot deployed in a home, this work aims to understand how everyday users try to exploit a language model to violate ethical principles, such as by prompting the robot to act like a life partner. We conducted a pilot study involving 21 university students who interacted with a Misty robot, attempting to circumvent its safety mechanisms across three scenarios based on specific HRI ethical principles: attachment, freedom, and empathy. Our results reveal that participants employed five techniques, including insulting and appealing to pity using emotional language. We hope this work can inform future research in designing strong safeguards to ensure ethical and secure human-robot interactions.


2501.03957 2026-06-09 cs.HC cs.CV

Vision Language Models as Values Detectors

视觉语言模型作为价值检测器

Giulio Antonio Abbo, Tony Belpaeme

Affiliations: IDLab-AIRO, Ghent University – imec, Belgium

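AI summary: This paper studies the alignment between state-of-the-art LLMs and human annotators in detecting relevant elements in home environment scenarios, finding that LLaVA 34B performs best but still needs improvement, and that LLMs show potential for detecting value-laden elements in images.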

Comments 13 pages, 2 figures

详情

DOI: 10.1007/978-3-031-85463-7_5
Journal ref: Value Engineering in Artificial Intelligence (VALE 2024) (LNAI,volume 15356)

AI中文摘要

大型语言模型整合文本和视觉输入，为解释复杂数据提供了新可能。尽管其能生成连贯且上下文相关的文本，但其与人类感知在识别图像中相关元素的对齐仍需探索。本文研究了最先进的LLM与人类标注者在家庭环境场景中检测相关元素的对齐情况。我们创建了十二张描绘不同家庭场景的图像，并邀请十四名标注者识别每张图像中的关键元素。然后将这些人类响应与五个不同LLM的输出进行比较，包括GPT-4o和四个LLaVA变体。我们的发现显示对齐程度各异，LLaVA 34B表现最佳但得分仍低。然而，结果分析表明这些模型在检测图像中价值元素方面有潜力，表明通过改进训练和优化提示，LLM可增强社交机器人、辅助技术和人机交互的应用，提供更深入的见解和更相关的响应。

英文摘要

Large Language Models integrating textual and visual inputs have introduced new possibilities for interpreting complex data. Despite their remarkable ability to generate coherent and contextually relevant text based on visual stimuli, the alignment of these models with human perception in identifying relevant elements in images requires further exploration. This paper investigates the alignment between state-of-the-art LLMs and human annotators in detecting elements of relevance within home environment scenarios. We created a set of twelve images depicting various domestic scenarios and enlisted fourteen annotators to identify the key element in each image. We then compared these human responses with outputs from five different LLMs, including GPT-4o and four LLaVA variants. Our findings reveal a varied degree of alignment, with LLaVA 34B showing the highest performance but still scoring low. However, an analysis of the results highlights the models' potential to detect value-laden elements in images, suggesting that with improved training and refined prompts, LLMs could enhance applications in social robotics, assistive technologies, and human-computer interaction by providing deeper insights and more contextually relevant responses.


2312.07928 2026-06-09 eess.SP cs.AI stat.AP

Bayesian inversion of GPR waveforms for sub-surface material characterization: an uncertainty-aware retrieval of soil moisture and overlaying biomass properties

基于GPR波形的贝叶斯反演用于 subsurface 物性表征：一种面向不确定性的土壤含水率和覆盖物性质检索方法

Ishfaq Aziz, Elahe Soltanaghai, Adam Watts, Mohamad Alipour

Affiliations: Civil and Environmental Engineering, University of Illinois Urbana Champaign; Computer Science, University of Illinois Urbana Champaign; Pacific Wildland Fire Sciences Laboratory, United States Forest Service

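AI summary: This paper proposes a GPR waveform inversion method based on Bayesian model updating to predict the moisture content and depth of soil and the overlaying layer. Validation on laboratory and field data shows results consistent with TDR and gravimetric measurements, together with probabilistic estimates of uncertainty.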

Comments Total 34 pages, 17 figures. This paper is under review at a journal and has not been published yet

详情

DOI: 10.1016/j.rse.2024.114351

AI中文摘要

准确估计地下属性如含水率和土壤植被层深度对地下条件监测、精准农业和 wildfire 风险评估至关重要。由于土壤常被植被和有机物覆盖，其表征具有挑战性。此外，覆盖层性质的估计对 wildfire 风险评估至关重要。本文提出基于贝叶斯模型更新的GPR波形反演方法，用于预测土壤和覆盖层的含水率和深度。由于其与含水率的高相关性，所提出的方法预测了两层的介电常数，以及其他参数，包括层深度和电导率。所提出的贝叶斯模型更新方法提供了这些参数的概率估计，可提供关于估计信心和不确定性的信息。该方法通过实验室和实地调查收集的多样化实验数据进行了评估。实验室研究包括土壤含水率变化、覆盖层深度和材料粗细的变化。实地研究包括对十六天的田间土壤含水率的测量。结果表明预测与时域反射计（TDR）测量和传统重力法一致。表面层深度也可合理预测。所提出的方法为面向不确定性的地下参数估计提供了一种有前景的方法，可支持跨广泛应用的风险评估决策。

英文摘要

Accurate estimation of sub-surface properties such as moisture content and depth of soil and vegetation layers is crucial for applications spanning sub-surface condition monitoring, precision agriculture, and effective wildfire risk assessment. Soil in nature is often covered by overlaying vegetation and surface organic material, making its characterization challenging. In addition, the estimation of the properties of the overlaying layer is crucial for applications like wildfire risk assessment. This study thus proposes a Bayesian model-updating-based approach for ground penetrating radar (GPR) waveform inversion to predict moisture contents and depths of the soil and the overlaying material layer. Due to its high correlation with moisture content, the dielectric permittivity of both layers was predicted with the proposed method, along with other parameters, including depth and electrical conductivity of the layers. The proposed Bayesian model updating approach yields probabilistic estimates of these parameters that can provide information about the confidence and uncertainty related to the estimates. The methodology was evaluated for a diverse range of experimental data collected through laboratory and field investigations. Laboratory investigations included variations in soil moisture values, depth of the overlaying surface layer, and coarseness of its material. The field investigation included measurement of field soil moisture for sixteen days. The results demonstrated predictions consistent with time-domain reflectometry (TDR) measurements and conventional gravimetric tests. The depth of the surface layer could also be predicted with reasonable accuracy. The proposed method provides a promising approach for uncertainty-aware sub-surface parameter estimation that can enable decision-making for risk assessment across a wide range of applications.
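The link between dielectric permittivity and moisture that the abstract relies on can be illustrated with two standard relations. The sketch below uses the widely cited Topp et al. (1980) empirical polynomial for mineral soils and the low-loss wave-velocity relation v = c/√ε. Both are swapped in for illustration only: the abstract does not say which permittivity-moisture relation the authors use, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, in vacuum


def topp_moisture(permittivity):
    """Volumetric water content (m^3/m^3) from relative permittivity.

    Topp et al. (1980) empirical polynomial; an assumed stand-in, not the paper's model.
    """
    e = permittivity
    return -5.3e-2 + 2.92e-2 * e - 5.5e-4 * e**2 + 4.3e-6 * e**3


def layer_depth(permittivity, two_way_time_s):
    """Layer thickness (m) from two-way travel time, assuming a low-loss, non-magnetic medium."""
    velocity = SPEED_OF_LIGHT / math.sqrt(permittivity)
    return velocity * two_way_time_s / 2.0


# Illustrative inputs only.
print(round(topp_moisture(20.0), 4))
print(round(layer_depth(9.0, 2e-9), 4))
```

Because both moisture and layer depth depend on permittivity, an inversion that returns a posterior over permittivity also carries uncertainty through to these derived quantities.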


2310.20699 2026-06-09 physics.chem-ph cs.LG physics.comp-ph physics.data-an stat.AP

Bayesian Multistate Bennett Acceptance Ratio Methods

贝叶斯多状态贝纳特接受比率方法

Xinqiang Ding

Affiliations: Department of Chemistry, Tufts University

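AI summary: This paper proposes the Bayesian multistate Bennett acceptance ratio method (BayesMBAR), which combines configurations sampled from thermodynamic states with a prior distribution to compute a posterior distribution of free energies and to improve uncertainty estimates for free energy calculations.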

详情

DOI: 10.1021/acs.jctc.3c01212
Journal ref: Journal of Chemical Theory and Computation 2024 20 (5), 1878-1888

AI中文摘要

多状态贝纳特接受比率（MBAR）方法是一种计算热力学状态自由能的常用方法。本文介绍了贝叶斯MBAR，即MBAR的贝叶斯推广。通过整合从热力学状态采样的配置与先验分布，贝叶斯MBAR计算自由能的后验分布。利用后验分布，我们推导出自由能估计并计算其相关不确定性。值得注意的是，当使用均匀先验分布时，贝叶斯MBAR恢复了MBAR的结果，但提供了更准确的不确定性估计。此外，当有关于自由能的先验知识时，贝叶斯MBAR可以通过使用非均匀先验分布将此信息纳入估计过程。作为示例，我们展示通过结合关于自由能表面光滑性的先验知识，贝叶斯MBAR比MBAR方法提供更准确的估计。鉴于MBAR在自由能计算中的广泛应用，我们预计贝叶斯MBAR将成为自由能计算各种应用中的重要工具。

英文摘要

The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimations and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating the prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.
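For orientation, the standard MBAR estimator that BayesMBAR generalizes can be written as a set of self-consistent equations. The notation below is an assumption chosen for this note, with $K$ states, $N_k$ samples from state $k$, reduced potentials $u_k(x)$, and dimensionless free energies $f_k$ defined up to an additive constant. The second line is only the generic Bayes rule that the abstract describes in words, not the paper's specific likelihood or prior.

```latex
% Standard MBAR self-consistent equations (solved for all i simultaneously)
\hat{f}_i = -\ln \sum_{j=1}^{K} \sum_{n=1}^{N_j}
  \frac{\exp\!\left[-u_i(x_{jn})\right]}
       {\sum_{k=1}^{K} N_k \exp\!\left[\hat{f}_k - u_k(x_{jn})\right]},
  \qquad i = 1, \dots, K

% Generic Bayesian form: posterior over free energies f = (f_1, ..., f_K)
p(f \mid \{x_{jn}\}) \;\propto\; p(\{x_{jn}\} \mid f)\, p(f)
```

Read this way, a uniform prior $p(f)$ leaves the posterior proportional to the likelihood, which is consistent with the abstract's statement that BayesMBAR then recovers the MBAR result.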


1909.02747 2026-06-09 eess.IV cs.CV cs.LG stat.ML

Eelgrass beds and oyster farming at a lagoon before and after the Great East Japan Earthquake 2011: potential to apply deep learning at a coastal area

2011年东日本大地震前后某潟湖的鳗草床与牡蛎养殖：在沿海地区应用深度学习的潜力

Takehisa Yamakita

Affiliations: Marine Biodiversity and Environmental Assessment Research Center (BioEnv)

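AI summary: By comparing manual tracing, simple image segmentation, and deep-learning image transformation, this paper studies automatic land cover classification of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan, and demonstrates the potential of deep learning for extracting spatial patterns in coastal areas after an earthquake.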

详情

DOI: 10.1109/IGARSS.2019.8900354

AI中文摘要

针对沿海地区的自动土地覆盖分类案例研究较少。本文通过比较手动描绘、简单图像分割和基于深度学习的图像变换，测试了对日本宫城县万石浦潟湖的海草床、沙地和牡蛎养殖筏的提取，并利用结果提取地震和海啸前后的变化。图像变换方法的输出分辨率最佳，在独立测试数据上使用随机点评估时，其植被分类准确率超过69%。牡蛎养殖筏的分布由分割模型检测得到。基于手动描绘和图像变换结果对地震前后变化的评估显示，沙地面积增加而植被减少；牡蛎养殖的减少仅由分割模型检测到。这些结果证明了在地震和海啸之后提取这些要素空间格局的潜力。

英文摘要

There are few case studies of automatic land cover classification in coastal areas. Here, I test the extraction of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan, by comparing manual tracing, simple image segmentation, and image transformation using deep learning. The results were used to extract the changes before and after the earthquake and tsunami. The output resolution was best with the image transformation method, which showed more than 69% accuracy for vegetation classification in an assessment using random points on independent test data. The distribution of oyster farming rafts was detected by the segmentation model. Assessment of the change before and after the earthquake using the manual tracing and image transformation results revealed an increase in sand area and a decrease in vegetation. Only the segmentation model detected the decrease in oyster farming. These results demonstrate the potential to extract the spatial pattern of these elements after an earthquake and tsunami.

Index Terms: Great East Japan Earthquake of 2011, land use land cover (LULC), Zosteraceae seagrass, cultured oyster, deep learning, Mangoku Bay


2606.09410 2026-06-09 cs.AI cs.CL 新提交

Capacity, Not Format: Rethinking Structured Reasoning Failures

容量而非格式：重新思考结构化推理失败

Hengxin Fan

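AI summary: The study finds that the effect of structured formats on model performance depends on the model's spare capacity. When capacity is insufficient, performance degrades through two mechanisms, truncation and pure capacity competition, so the recommendation is to think first and format later.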

Comments 12 pages, 3 figures

详情

AI中文摘要

先前的工作将结构化输出视为推理的代价，但这种框架是不完整的：格式化的成本强烈依赖于模型的空闲容量。通过使用信息匹配的散文控制和四级模式复杂度梯度，我们在4个模型和5个基准测试中分离了格式特定效应与提示长度混淆，成功生成的响应中解析失败率为0%。我们发现结构化格式是容量依赖的。具有足够余量的模型在吸收JSON约束时不会出现性能下降（Sonnet：MATH-Hard上JSON为$88.7\pm4.0$%，CoT为$89.3\pm1.7$%）。相反，格式会严重降低接近其极限运行的模型，通过两种不同的机制。首先，在标准token预算下，Haiku下降了36.2个百分点（$p < 0.0001$），主要是由于截断。其次，即使延长预算消除了截断，GPT-4o-mini仍下降了28.0个百分点（$p < 0.001$），揭示了独立于token耗尽的纯容量竞争。这种格式惩罚随模式复杂度增加（McNemar $p < 0.0001$），且不能仅由提示长度解释。此外，这些结果对前沿模型免疫的说法提出了质疑：在AIME竞赛数学中，Opus 4.7在JSON下从96.2%下降到91.0%（$-5.3$个百分点；显示的百分比独立四舍五入，精确差值为$7/133 = 5.26$pp $\approx 5.3$pp）。一种延迟结构消融——在格式化之前自由推理——恢复了大部分丢失的准确率（3次运行均值：80-87%），支持了容量竞争机制。实际意义不是避免结构化输出，而是使其与容量匹配：当模型接近其极限时，先思考，后格式化。

英文摘要

Prior work treats structured output as a reasoning tax, but this framing is incomplete: the cost of formatting depends strongly on a model's spare capacity. Using information-matched prose controls and a four-level schema complexity gradient, we separate format-specific effects from prompt-length confounds across 4 models and 5 benchmarks with 0% parse failures on successfully generated responses. We find that structured formats are capacity-dependent. Models with sufficient headroom absorb JSON constraints without degradation (Sonnet: $88.7\pm4.0$% JSON vs. $89.3\pm1.7$% CoT on MATH-Hard). In contrast, formats severely degrade models operating near their limits through two distinct mechanisms. First, under standard token budgets, Haiku drops 36.2pp ($p < 0.0001$) largely due to truncation. Second, even with extended budgets eliminating truncation, GPT-4o-mini drops 28.0pp ($p < 0.001$), revealing pure capacity competition independent of token exhaustion. This format penalty scales with schema complexity (McNemar $p < 0.0001$) and cannot be explained by prompt length alone. Furthermore, these results qualify claims of frontier model immunity: on AIME competition math, Opus 4.7 drops from 96.2% to 91.0% under JSON ($-5.3$pp; the displayed percentages are independently rounded, exact difference is $7/133 = 5.26$pp $\approx 5.3$pp). A delayed-structure ablation -- reasoning freely before formatting -- recovers most of the lost accuracy (3-run mean: 80--87%), supporting the capacity competition mechanism. The practical implication is not to avoid structured output, but to match it to capacity: when a model is near its limits, think first, format later.
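The rounding note in the abstract (96.2% and 91.0% displayed, yet a 5.3pp gap) is easy to check by hand. The sketch below reproduces it. Only the figure 7/133 comes from the abstract; the per-condition counts of 128 and 121 correct answers are inferred here as the values consistent with the displayed percentages, and should be read as an assumption.

```python
TOTAL = 133          # from the abstract's "7/133"
COT_CORRECT = 128    # inferred: the count that displays as 96.2% (assumption)
JSON_CORRECT = 121   # inferred: the count that displays as 91.0% (assumption)


def pct(correct, total):
    """Percentage rounded to one decimal place, as displayed in the abstract."""
    return round(100 * correct / total, 1)


displayed_cot = pct(COT_CORRECT, TOTAL)
displayed_json = pct(JSON_CORRECT, TOTAL)

# Subtracting the already-rounded figures loses a little precision...
gap_from_rounded = round(displayed_cot - displayed_json, 1)
# ...whereas rounding the exact difference once gives the reported value.
gap_exact = round(100 * (COT_CORRECT - JSON_CORRECT) / TOTAL, 1)

print(displayed_cot, displayed_json, gap_from_rounded, gap_exact)
```

The two gaps differ by 0.1pp purely because each percentage was rounded independently before subtraction, which is the point the abstract's parenthetical makes.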


2606.09187 2026-06-09 cs.CV 新提交

CP4D: Compositional Physics-aware 4D Scene Generation

CP4D: 组合式物理感知4D场景生成

Hanxin Zhu, Cong Wang, Tianyu He, Long Chen, Xin Jin, Chen Gao, Zhibo Chen

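AI summary: Proposes the CP4D paradigm, which composes a static environment with physically grounded dynamic objects and combines a physical simulator with video diffusion models to generate trajectories, producing high-fidelity, physically consistent 4D scenes.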

详情

AI中文摘要

4D生成（即动态3D生成）因其强大的时空建模能力，近年来成为快速发展的研究前沿。然而，尽管取得了显著进展，现有方法通常无法捕捉底层物理原理，导致结果在物理上不一致且视觉上不真实。为了克服这一限制，我们提出了CP4D，一种新的范式，用于生成忠实遵循复杂物理动力学的逼真4D场景。受现实世界场景的组合性质启发（其中不变的静态背景与动态且物理合理的前景共存），CP4D将4D生成重新表述为静态3D环境与基于物理的动态对象的集成。在此基础上，我们的框架遵循三阶段流程：1) 首先，我们利用预训练的专家模型分别生成环境和前景对象的高保真3D表示。2) 随后，为了为这些对象生成物理合理的轨迹和真实的交互，我们提出了一种混合运动合成策略，该策略整合了来自物理模拟器的先验知识与视频扩散模型中嵌入的常识。3) 最后，我们开发了一种自动组合机制，将静态环境和动态对象无缝融合成连贯、物理一致的4D场景。大量实验表明，CP4D能够生成具有高视觉保真度、强物理合理性和细粒度可控性的可探索、可交互的4D场景，显著优于现有方法。项目页面：https://anonymous.4open.science/w/CP4D/。

英文摘要

4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research frontier due to its powerful spatiotemporal modeling capabilities. However, despite notable advances, existing approaches typically fail to capture the underlying physical principles, producing results that are both physically inconsistent and visually implausible. To overcome this limitation, we present CP4D, a novel paradigm for photorealistic 4D scene synthesis with faithful adherence to complex physical dynamics. Drawing inspiration from the compositional nature of real-world scenes, where immutable static backgrounds coexist with dynamic, physically plausible foregrounds, CP4D reformulates 4D generation as the integration of a static 3D environment with physically grounded dynamic objects. On this basis, our framework follows a three-stage pipeline: 1) Firstly, we leverage pre-trained expert models to generate high-fidelity 3D representations of the environment and foreground objects respectively. 2) Subsequently, to produce physically plausible trajectories and realistic interactions for these objects, we propose a hybrid motion synthesis strategy that integrates priors from physical simulators with the common sense embedded in video diffusion models. 3) Finally, we develop an automated composition mechanism that seamlessly fuses the static environment and dynamic objects into coherent, physically consistent 4D scenes. Extensive experiments demonstrate that CP4D can generate explorable and interactive 4D scenes with high visual fidelity, strong physical plausibility, and fine-grained controllability, significantly outperforming existing methods. The project page: https://anonymous.4open.science/w/CP4D/.


2606.09155 2026-06-09 cs.RO 新提交

Bridged SBI: Correcting Biased Low-Fidelity Posteriors for Cost-Efficient High-Fidelity Inference

Bridged SBI：纠正有偏低保真后验以实现经济高效的高保真推理

Gahee Kim, Yuki Kadokawa, Sandro M. Alcantara Tacora, Taro Abe, Daisuke Endo, Genki Yamauchi, Takeshi Hashimoto, Takamitsu Matsubara

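AI summary: To address the high computational cost of high-fidelity particle simulators, this paper proposes Bridged SBI, which uses a low-fidelity posterior to guide high-fidelity inference and corrects its bias with a residual bridge, giving accurate posterior estimates cost-efficiently.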

详情

AI中文摘要

基于粒子的模拟器的精确校准对于机器人土方模拟至关重要，但由于该任务的高度非线性粒子动力学和传统模拟器的黑箱性质，分析校准具有挑战性。尽管基于模拟的推理（SBI）可以仅通过前向模拟估计模拟参数的后验分布，但将SBI直接应用于高保真（HF）粒子模拟器通常在计算上不可行。使用较粗颗粒的低保真（LF）模拟器可以降低这一成本，但颗粒大小和数量的变化会改变再现相同观测所需的参数值，从而产生有偏的LF后验。我们提出了Bridged SBI，它利用有偏但有信息的LF后验来指导HF推理。该方法首先使用廉价的LF模拟识别一个粗略的高密度参数区域，然后学习一个局部残差桥，通过纠正LF-HF差异将LF后验样本转移到HF一致区域。我们分析了顺序多保真SBI（Naive-MF）在直接依赖LF后验而不进行差异纠正时如何遭受LF诱导的后验覆盖不足。然后我们展示了Bridged SBI旨在通过残差纠正显式建模LF-HF差异来缓解这一问题。在模拟到模拟的粒子参数校准和真实土壤观测的实到模拟校准上的实验表明，与仅HF的SBI或Naive-MF基线相比，Bridged SBI在有限的HF模拟成本下产生了更准确和可靠的HF后验。

英文摘要

Accurate calibration of particle-based simulators is crucial for robotic earthwork simulation, but analytical calibration is challenging due to this task's highly nonlinear particle dynamics and the black-box nature of conventional simulators. Although simulation-based inference (SBI) can estimate posterior distributions over simulation parameters solely from forward simulations, applying SBI directly to high-fidelity (HF) particle simulators is often computationally prohibitive. Low-fidelity (LF) simulators with coarser particles can reduce this cost, but changes in particle size and particle count shift the parameter values needed to reproduce the same observation, producing biased LF posteriors. We propose Bridged SBI, which leverages a biased but informative LF posterior to guide HF inference. This method first uses inexpensive LF simulations to identify a coarse high-density parameter region, and then it learns a local residual bridge to transport LF posterior samples toward HF-consistent regions by correcting the LF--HF discrepancy. We analyze how sequential multi-fidelity SBI (Naive-MF) can suffer from LF-induced posterior miscoverage when it directly relies on the LF posterior without discrepancy correction. We then show that Bridged SBI is designed to alleviate this issue by explicitly modeling the LF--HF discrepancy through residual correction. Experiments on both sim-to-sim particle-parameter calibration and real-to-sim calibration with real soil observation show that Bridged SBI produces more accurate and reliable HF posteriors than HF-only SBI or the Naive-MF baseline, especially under limited HF simulation costs.

2606.09123 2026-06-09 cs.CV cs.AI New submission

An Enhanced Geometric-Spectral Feature Learning Framework for Airborne Multispectral Point Cloud Classification

Xian Li, Yanfeng Gu, Aleksandra Pižurica

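AI summary: To address the high-dimensional heterogeneity, sample imbalance, and inter-class spectral similarity of airborne multispectral point clouds, the paper proposes an attention-based two-stream feature fusion framework that combines a residual attention fusion block with a joint loss function to achieve high-accuracy land-cover classification.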

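AI-generated abstract (translated from Chinese)

Multispectral point clouds consist of 3D spatial-spectral information and hold great potential for accurate land-cover classification. However, the representation power of classification models is limited by the high-dimensional, heterogeneous spatial-spectral information, the unbalanced sample distribution, and the inter-class spectral similarity inherent in airborne multispectral point clouds. We build two multispectral point cloud datasets and propose an attention-based enhanced geometric-spectral feature learning framework for airborne multispectral point cloud classification. A key component of the model is a two-stream feature fusion method with attention mechanisms, which strengthens the representation of spatial-spectral features from high-dimensional heterogeneous multispectral point clouds. The first stream extracts position-encoded global spectral features with fusion self-attention; the second stream combines multikernel point convolution with feature aggregation attention to extract spectral-guided geometric features. We then develop a residual attention fusion block to integrate the most informative geometric-spectral features from the two parallel streams. Another important contribution is a joint loss function that improves learning on unbalanced and inter-class similar samples. Experiments on two airborne multispectral point cloud datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods. The code and datasets used in the paper will be made freely available at https://github.com/HITlixian/TGRS_GSFF.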

English abstract

Multispectral point cloud (MPC) is composed of 3D spatial-spectral information, which holds tremendous potential for accurate land-cover classification. However, the representation power of classification models is limited by inherent high-dimensional and heterogeneous spatial-spectral information, unbalanced sample distribution, and inter-class spectral similarity of airborne MPCs. We build two MPC datasets and propose an enhanced geometric-spectral feature learning framework based on attentions for airborne MPC classification. A key component in our model is a two-stream feature fusion method with attention mechanisms, which enhances the representation capability of spatial-spectral features from high-dimensional heterogeneous MPCs. The first stream aims to extract position-encoded global spectral features with fusion self-attention, and the second stream comprises a multikernel point convolution and feature aggregation attention to extract spectral-guided geometric features. We then develop a residual attention fusion block to integrate the most informative geometric-spectral features from the two parallel streams. Another important contribution of this work is a joint loss function to improve the learning ability on unbalanced and interclass similar samples. Experimental results on two airborne MPC datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art methods. Furthermore, the codes and datasets used in this paper will be made available freely at https://github.com/HITlixian/TGRS_GSFF.

2606.09034 2026-06-09 cs.CV New submission

Leveraging NeRF-Rendered Images for 3D Gaussian Splatting

Mizuki Morikawa, Yuta Shimizu, Chunyu Li, Yusuke Monno, Masatoshi Okutomi

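AI summary: Proposes using NeRF-rendered images to assist 3DGS training by removing transient objects and generating bird's-eye views, combined with diffusion-based enhancement, improving street-scene rendering quality while retaining the speed of 3DGS.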

Comments ICIP 2026

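AI-generated abstract (translated from Chinese)

Neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS) are two mainstream approaches to novel view synthesis. They often show complementary performance: 3DGS renders faster, while NeRF renders at higher quality. Motivated by this, we propose using NeRF-rendered images to assist 3DGS. Specifically, we target street scenes and use a pre-trained street-specific NeRF method to generate training images for a target 3DGS method. During 3DGS training, the NeRF-rendered images are used to remove transient objects from street-level input views and to generate bird's-eye views as additional views, so that 3DGS inherits the high-quality rendering of NeRF. We further introduce diffusion-based image enhancement to improve the image quality of the additional views. Experiments on one synthetic dataset and two real datasets show that the proposed method improves street-scene rendering while preserving the speed of 3DGS and the quality of NeRF.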

English abstract

Neural radiance field (NeRF) and 3D Gaussian splatting (3DGS) are two mainstream approaches for novel view synthesis. They often show complementary performance, i.e., 3DGS demonstrating faster rendering speed and NeRF demonstrating higher rendering quality. Motivated by this, we propose leveraging NeRF-rendered images for 3DGS. Specifically, we target street scenes and utilize a pre-trained street-specific NeRF method to produce training images for a target 3DGS method. In our 3DGS training, NeRF-rendered images are used to remove transient objects in street-level input views and to generate bird's-eye views as additional views, inheriting the higher-quality rendering of NeRF into 3DGS. We further incorporate a diffusion-based image enhancement to improve the image quality of the additional views. Experimental results on one synthetic and two real datasets demonstrate that our proposed method improves street-scene rendering while preserving the speed of 3DGS and the quality of NeRF.

2606.08804 2026-06-09 cs.AI cs.LG New submission

Q-Delta: Beyond Key-Value Associative State Evolution

Sumin Park, Seojin Kim, Noseong Park

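AI summary: Proposes Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution to obtain jointly corrective dynamics, outperforming strong baselines on language modeling and long-context retrieval tasks.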

Comments Accepted at ICML 2026

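AI-generated abstract (translated from Chinese)

Linear attention reformulates sequence modeling as recurrent state evolution, enabling efficient linear-time inference. Under the key-value associative paradigm, existing methods confine the query to the readout operation, decoupling it from state evolution. We show that query-conditioned state readout induces a structured value prediction over accumulated memory that complements key-based retrieval. Building on this insight, we propose Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, achieving jointly corrective dynamics while preserving the efficiency of the delta rule. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation together with a custom Triton implementation. Experiments on language modeling and long-context retrieval tasks show stable optimization, competitive throughput, and consistent improvements over strong baselines.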

English abstract

Linear attention reformulates sequence modeling as recurrent state evolution, enabling efficient linear-time inference. Under the key-value associative paradigm, existing approaches restrict the role of the query to the readout operation, decoupling it from state evolution. We show that query-conditioned state readout induces a structured value prediction over accumulated memory that complements key-based retrieval. Based on this insight, we propose Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, enabling jointly corrective dynamics while preserving delta-rule efficiency. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation with a custom Triton implementation. Empirical results demonstrate stable optimization, competitive throughput, and consistent improvements over strong baselines on language modeling and long-context retrieval tasks.
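
For context, the key-value delta rule that this abstract takes as its starting point can be sketched in a few lines. The code below shows the standard delta-rule recurrence from the linear-attention literature, where the query appears only in the readout. It is not the Q-Delta update, whose equations the abstract does not give. The function names and the write strength `beta` are illustrative assumptions.

```python
def matvec(S, x):
    """Multiply a matrix S (list of rows) by a vector x."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def delta_rule_step(S, k, v, beta=1.0):
    """One standard delta-rule update: S <- S + beta * (v - S k) k^T.

    S is a d_v x d_k state matrix, k a key, v a value. The term (v - S k)
    is the key-based prediction error; the query plays no role here.
    """
    pred = matvec(S, k)  # what the memory currently returns for key k
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]
    return [[s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, k)]
            for row, e_i in zip(S, err)]


def readout(S, q):
    """Query readout o = S q, the only place a query enters in this baseline."""
    return matvec(S, q)


# Write two associations with orthonormal keys, then overwrite the first one.
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_rule_step(S, k=[1.0, 0.0], v=[2.0, 3.0])
S = delta_rule_step(S, k=[0.0, 1.0], v=[-1.0, 4.0])
S = delta_rule_step(S, k=[1.0, 0.0], v=[5.0, 5.0])  # corrects the old value

first = readout(S, [1.0, 0.0])
second = readout(S, [0.0, 1.0])
```

With orthonormal keys and `beta=1.0`, the third write replaces the value stored under the first key without disturbing the second, which is the corrective behaviour that distinguishes the delta rule from purely additive linear attention.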

2606.08718 2026-06-09 cs.LG cs.AI New submission

Deep Active Re-Labeling: Toward Noise-Resilient Annotation Efficiency

Md Abdullah Al Forhad, Weishi Shi

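AI summary: To address the performance degradation caused by human annotation noise in deep active learning, the paper proposes a framework that denoises the training set by allocating part of the annotation budget to re-labeling already-labeled data; experiments show it is more data-efficient under the same budget and ends with a less noisy dataset.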

Comments Accepted and published in the 2025 IEEE International Conference on Big Data (BigData). DOI: 10.1109/BigData66926.2025.11402126

DOI: 10.1109/BigData66926.2025.11402126
Journal ref: 2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 886-895

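AI-generated abstract (translated from Chinese)

Although deep active learning (DAL) effectively reduces human annotation cost, its effectiveness is limited by human annotation errors. This is because the data sampled by active learning is assumed to be highly informative for training. When annotators introduce errors into this informative data at a certain rate, active learning performance drops significantly and is sometimes even worse than passive learning. This paper first analyzes the impact of human annotation errors in the DAL setting. We then propose a framework to address human annotation noise in DAL. Inspired by human learning patterns, the core idea of the proposed solution is to allocate part of the human annotation budget to re-labeling data that has already been labeled. Prior theoretical work shows that when the model has some ability to identify potentially noisy data, re-labeling even a small fraction of the data can effectively remove noise from the active training set. To this end, we implement two active noise sampling strategies that detect noise under different circumstances, and we allocate part of the annotation budget to re-labeling these instances. Our method gives active learning a revisiting and introspective behavior. Experiments show that, under the same annotation budget, our method is more data-efficient and ultimately produces a relatively noise-free labeled dataset.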

English abstract

While Deep Active Learning (DAL) effectively reduces human annotation costs, its efficacy is constrained by human annotation errors. This is because the data sampled for active learning is assumed to be highly informative for training. When human annotators introduce errors into this informative data at a certain rate, the active learning performance drops significantly and, in some cases, even exhibits worse outcomes than passive learning. In this paper, we first analyze the impact of human annotation errors in the DAL setting. Then we propose a framework to address the human annotation noise problem for DAL. Informed by human learning patterns, the core idea of our proposed solution involves allocating a portion of the human annotation budget to re-annotate data that has already been labeled. Previous theoretical work suggests that when the model possesses a certain level of ability to identify potentially noisy data, even re-labeling a small fraction of the data can effectively remove noise from the active training set. To achieve this, we implement two active noise sampling strategies to detect noise under different circumstances and allocate a part of the annotation budget to re-annotate these instances. Our approach imbues active learning with a revisiting and introspective behavior. Our experiments demonstrate that, under the same annotation budget, our method is more data-efficient and yields a relatively noise-free annotation dataset in the end.
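
The budget-splitting idea can be sketched as follows. The abstract does not describe the two noise sampling strategies, so the heuristic here (re-label the items whose given label the current model finds least plausible) is an assumption, as are the function names and the `relabel_fraction` parameter.

```python
def split_budget(budget, relabel_fraction):
    """Split a per-round annotation budget into (new labels, re-labels)."""
    n_relabel = int(budget * relabel_fraction)
    return budget - n_relabel, n_relabel


def pick_relabel_candidates(given_labels, predicted_probs, n_relabel):
    """Return indices of labeled items whose given label the model trusts least.

    given_labels[i] is the (possibly noisy) human label of item i and
    predicted_probs[i] is the model's class-probability list for that item.
    """
    # Low probability on the given label suggests a possible annotation error.
    ranked = sorted(range(len(given_labels)),
                    key=lambda i: predicted_probs[i][given_labels[i]])
    return ranked[:n_relabel]


# Four already-labeled items with binary labels and model probabilities.
given = [0, 1, 1, 0]
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]

n_new, n_relabel = split_budget(budget=10, relabel_fraction=0.2)
suspects = pick_relabel_candidates(given, probs, n_relabel)
```

Here two of the ten annotations in the round go to re-labeling, and the two selected items are the ones where the model assigns the lowest probability to the human-provided label. The remaining eight annotations would go to new samples chosen by whatever acquisition function the active learner uses.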

2606.08696 2026-06-09 cs.LG cs.AI New submission

Agentic Search for Counterfactual Recourse under Fixed LLM Budgets

Yasuo Tabei

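AI summary: Proposes Comp-MCTS, a framework that uses tree search under a fixed LLM-call budget to maximize the number of unique, oracle-validated counterfactuals generated, balancing quantity and quality.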

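AI-generated abstract (translated from Chinese)

Counterfactual recourse aims to provide actionable feature changes that would alter an unfavorable decision made by a predictive model. In practice, affected individuals usually benefit from several feasible alternatives rather than a single optimal explanation. A natural way to produce such alternatives is to prompt large language models (LLMs). Prompting, however, introduces a practical constraint: the number of LLM calls is usually the main computational and economic cost. The need for multiple alternatives and this cost constraint together shift the problem from finding a single high-quality counterfactual to efficiently generating a set of oracle-validated counterfactuals under a fixed LLM-call budget. In this work, we study counterfactual recourse generation in the LLM-agent setting as a fixed-budget search problem and propose Comp-MCTS, an agentic tree-search framework that maximizes the yield of unique, oracle-validated counterfactuals under this budget while maintaining a favorable quantity-quality trade-off. Comp-MCTS allocates the budget to novel intervention directions through LLM-based proposal generation, oracle validation, and compression-guided pruning, in a training-free, oracle-only setting. Experiments on four real-world tabular datasets show that Comp-MCTS substantially outperforms single-candidate LATS-style baselines in the yield of unique, oracle-validated counterfactuals and, compared with stronger multi-candidate variants, offers a favorable quantity-quality-efficiency trade-off: comparable or higher yield at similar or lower oracle-evaluation cost on three of the four datasets, with competitive proximity, sparsity, and novelty.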

English abstract

Counterfactual recourse aims to provide actionable feature changes that would alter an unfavorable decision made by a predictive model. In practice, affected individuals often benefit from multiple feasible alternatives rather than a single optimal explanation. A natural way to produce such alternatives is to prompt large language models (LLMs). However, prompting incurs a practical constraint: the number of LLM calls is often the dominant computational and economic cost. Together, the need for multiple alternatives and this cost constraint shift the problem from finding a single high-quality counterfactual to efficiently generating a set of oracle-validated counterfactuals under a fixed LLM-call budget. In this work, we study counterfactual recourse generation in the LLM-agentic setting as a fixed-budget search problem and propose Comp-MCTS, an agentic tree-search framework that maximizes the yield of unique, oracle-validated counterfactuals under this budget while maintaining favorable quantity-quality trade-offs. Comp-MCTS allocates the budget toward novel intervention directions via LLM-based proposal generation, oracle validation, and compression-guided pruning, in a training-free, oracle-only setting. Experiments on four real-world tabular datasets show that Comp-MCTS substantially outperforms single-candidate LATS-style baselines in the yield of unique, oracle-validated counterfactuals, and offers favorable quantity-quality-efficiency trade-offs against stronger multi-candidate variants: comparable or higher yield at similar or lower oracle-evaluation cost on three of four datasets, plus competitive proximity, sparsity, and novelty.
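
The quantity the abstract optimizes, the yield of unique, oracle-validated counterfactuals under a fixed number of LLM calls, can be made concrete with a small accounting sketch. This is a flat loop, not Comp-MCTS: it has no tree search and no compression-guided pruning. The `propose` and `oracle` callables and the toy loan rule are hypothetical stand-ins.

```python
def run_fixed_budget(propose, oracle, llm_budget):
    """Collect unique, oracle-validated counterfactuals within llm_budget calls.

    propose(step) stands in for one LLM call returning a candidate feature
    change; oracle(candidate) returns True when the decision flips.
    """
    validated = []
    seen = set()
    for step in range(llm_budget):  # one LLM call per iteration
        candidate = propose(step)
        key = tuple(sorted(candidate.items()))
        if key in seen:  # a duplicate costs a call but adds no yield
            continue
        seen.add(key)
        if oracle(candidate):
            validated.append(candidate)
    return validated


# Toy stand-ins: a loan is approved when income >= 50 or debt <= 5.
proposals = [{"income": 55}, {"income": 55}, {"debt": 3}, {"income": 40}]
found = run_fixed_budget(
    propose=lambda step: proposals[step % len(proposals)],
    oracle=lambda c: c.get("income", 0) >= 50 or c.get("debt", 99) <= 5,
    llm_budget=4,
)
```

Four calls produce a yield of two here, because one call is wasted on a duplicate and one on an invalid proposal. Steering calls away from such waste is what a budget-aware search strategy is meant to do.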
