arXivDaily: arXiv daily academic digest, updated Monday through Friday
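2508.14143 2026-06-12 cs.LG q-bio.NC Updated version

The Urysohn Machine: A Metric-Topological Model of Computation

Xin Li

Affiliations: University at Albany, State University of New York

AI summary: Proposes the Urysohn Machine, a model of classification-oriented computation built on metric separation, frontier structure, and contraction; Urysohn triples and a layered ladder construction yield classification-complexity measures and reusable inference.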

Abstract

We introduce the Urysohn Machine, an effective model of classification-oriented computation in which metric separation, frontier structure, and contraction are explicit parts of the computational state. Its basic object is a Urysohn Triple: a support region, a target partition, and a separating classifier stored in a reusable Metric Library. The topological foundation is a constructive Urysohn Realization theorem for finite simplicial settings. It builds separators from dyadic ladders of nested polyhedral regions and equips their frontiers with a chain-level calculus: frontiers are cycles, and shells between levels have boundaries given by differences of frontiers. This construction yields two related complexity measures: decision-boundary width, the geometric measure of a single classifier's boundary, and Urysohn width, the total frontier mass represented by a library or realization. We prove an Amortized Separation Theorem showing that approximating a boundary of a given width to a given accuracy requires a number of simple basis triples proportional to boundary width and inversely proportional to resolution, under explicit boundary-footprint assumptions. We also introduce a contrastive separation operator whose graph-cut functional consistently estimates decision-boundary width from sampled metric data, while its Laplacian spectrum certifies class-component structure and conductance. Finally, we analyze the dynamic Urysohn ladder and prove four guarantees: separability under quotient collapse, stability of committed frontiers, bounded capacity under contraction, and scalability with quotient distance. Together, these results give a metric-topological account of classification complexity, amortized inference, and compositional reuse that preserves classical computability while exposing geometric structure hidden by purely symbolic descriptions.
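
Illustrative sketch (not from the paper): the abstract mentions a graph-cut functional on sampled metric data and a Laplacian spectrum that certifies component structure. The snippet below shows the two underlying ideas on a toy point set, assuming an unweighted epsilon-neighborhood graph. The function names, the `eps` value, and the sample points are hypothetical, and connected components are counted with union-find, which gives the same number as the multiplicity of the zero eigenvalue of the graph Laplacian.

```python
from itertools import combinations

def epsilon_graph(points, eps):
    """Edges (i, j, weight) joining points at distance <= eps; weight is 1.0 here."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
        if dist <= eps:
            edges.append((i, j, 1.0))
    return edges

def cut_weight(edges, labels):
    """Total weight of edges whose endpoints carry different class labels."""
    return sum(w for i, j, w in edges if labels[i] != labels[j])

def count_components(n, edges):
    """Connected components via union-find (equals the number of zero
    eigenvalues of the graph Laplacian)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

# Toy data (assumed): two clusters on a line, with one label change inside the first.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 0.0), (5.5, 0.0)]
labels = [0, 0, 1, 1, 1]
edges = epsilon_graph(points, eps=0.6)
cut = cut_weight(edges, labels)
components = count_components(len(points), edges)
```

On this toy set only one edge crosses the label boundary, so the cut weight is a crude proxy for boundary size, while the component count reflects how the sample splits into clusters at scale `eps`.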

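2508.04888 2026-06-12 cs.LG Updated version

Retrieval-Augmented Foundation Models for Water Level Prediction in the Everglades

Rahuul Rangaraj, Jimeng Shi, Rajendra Paudel, Giri Narasimhan, Yanzhao Wu

Affiliations: Florida International University; Everglades National Park

AI summary: For water-level forecasting in the Everglades, proposes a retrieval-augmented mechanism that retrieves historical hydrological episodes by statistical similarity or mutual information, improving the long-horizon forecasts of pretrained time-series foundation models, with especially large gains during extreme events.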

Abstract

Accurate water level forecasting in the Everglades is essential for flood mitigation, drought management, water resource planning, and biodiversity conservation. While recent time-series foundation models have shown strong performance on generic tasks (represented in their pre-training), their effectiveness in domain-specific applications remains insufficiently understood. In this work, we curate a domain-specific dataset for water-level forecasting in the Everglades and observe that the performance of current state-of-the-art models remains limited. To address this gap, we leverage a retrieval-augmented mechanism that retrieves analogous multivariate hydrological episodes from an external archive of historical observations to enrich the input context of those pre-trained models. We study two retrieval strategies, statistical similarity-based retrieval and mutual information-based retrieval, and analyze how incorporating retrieved historical contexts affects predictive performance. Extensive experiments show that retrieval augmentation consistently improves long-horizon water level forecasts and yields disproportionately larger gains during extreme events, which is particularly critical for environmental decision-making. Our study provides empirical evidence that analog-based retrieval can benefit pretrained time-series foundation models in environmental science, offering practical insights into their strengths, limitations, and failure modes when applied to hydrological forecasting in the Everglades. Although evaluated in the Everglades, the proposed framework is general and can be applied to other hydrological systems given time series data. The code and data have been made publicly available at https://github.com/rahuul2992000/WaterRAF.
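
Illustrative sketch (not the authors' code): one simple way to read "statistical similarity-based retrieval" is a nearest-neighbor search over sliding windows of a historical archive. The version below assumes a univariate series and z-normalized Euclidean distance; the paper works with multivariate episodes and may use a different similarity measure. The function names and the toy numbers are hypothetical.

```python
import math

def znorm(window):
    """Standardize a window to zero mean and unit variance."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    return [(x - mean) / std if std > 0 else 0.0 for x in window]

def retrieve(query, archive, k):
    """Start indices of the k archive windows closest in shape to the query."""
    width = len(query)
    q = znorm(query)
    scored = []
    for start in range(len(archive) - width + 1):
        candidate = znorm(archive[start:start + width])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, candidate)))
        scored.append((dist, start))
    scored.sort()
    return [start for _, start in scored[:k]]

# Toy archive of water levels (made-up values) and a query with the same
# rising shape as the first archive window, at a different offset and scale.
archive = [1.0, 1.1, 1.3, 1.8, 2.6, 2.0, 1.4, 1.2, 1.1, 1.0]
query = [2.0, 2.2, 2.6, 3.6]
top = retrieve(query, archive, k=1)
```

The retrieved windows, together with what followed them in the archive, would then be appended to the input context of the pretrained forecaster.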

2508.01656 2026-06-12 cs.CL cs.AI cs.CY cs.HC physics.soc-ph Updated version

Authorship Attribution in Multilingual Machine-Generated Texts

Lucio La Cava, Dominik Macko, Róbert Móro, Ivan Srba, Andrea Tagarelli

Affiliations: DIMES Department, University of Calabria; Kempelen Institute of Intelligent Technologies

AI summary: Introduces the multilingual authorship attribution problem and studies how well monolingual methods transfer across languages, covering 18 languages and 8 generators; finds significant limitations.

Comments
Accepted at ACL 2026 - Main

Abstract

As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content has become increasingly difficult. While early efforts in MGT detection focused on binary classification, the growing landscape and diversity of LLMs call for the more fine-grained and more challenging task of authorship attribution (AA), i.e., identifying the precise generator (LLM or human) behind a text. However, AA currently remains confined to monolingual settings, with English the most investigated language, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages (covering multiple families and writing scripts) and 8 generators (7 LLMs and the human-authored class), we investigate the multilingual suitability of monolingual AA methods in terms of their cross-lingual transferability, and the impact of generators on attribution performance. Our results reveal that while certain monolingual AA methods can be adapted to multilingual settings, significant limitations and challenges remain, particularly in transferring across diverse language families, underscoring the complexity of multilingual AA and the need for more robust approaches to better match real-world scenarios.

2507.22791 2026-06-12 cs.CV Updated version

Modality-Aware Feature Matching in Visual and Vision-Language Applications: A Comprehensive Survey

Weide Liu, Wei Zhou, Jun Liu, Ping Hu, Jun Cheng, Jungong Han, Weisi Lin

Affiliations: School of Computing and Artificial Intelligence, Jiangxi University of Finance and Economics; College of Computing and Data Science, Nanyang Technological University; School of Computer Science and Informatics, Cardiff University; School of Computing and Communications, Lancaster University; School of Computer Science and Engineering, University of Electronic Science and Technology of China; Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR); Department of Automation, Tsinghua University

AI summary: Surveys modality-based feature matching, covering traditional handcrafted methods and modern deep learning methods, with a focus on advances across RGB, depth, 3D point cloud, LiDAR, medical image, and vision-language modalities and an emphasis on modality-aware techniques.

Comments
CSUR

Abstract

Feature matching is a cornerstone task in computer vision, essential for applications such as image retrieval, stereo matching, 3D reconstruction, and SLAM. This survey comprehensively reviews modality-based feature matching, exploring traditional handcrafted methods and emphasizing contemporary deep learning approaches across various modalities, including RGB images, depth images, 3D point clouds, LiDAR scans, medical images, and vision-language interactions. Traditional methods, leveraging detectors like Harris corners and descriptors such as SIFT and ORB, demonstrate robustness under moderate intra-modality variations but struggle with significant modality gaps. Contemporary deep learning-based methods, exemplified by detector-free strategies like CNN-based SuperPoint and transformer-based LoFTR, substantially improve robustness and adaptability across modalities. We highlight modality-aware advancements, such as geometric and depth-specific descriptors for depth images, sparse and dense learning methods for 3D point clouds, attention-enhanced neural networks for LiDAR scans, and specialized solutions like the MIND descriptor for complex medical image matching. Cross-modal applications, particularly in medical image registration and vision-language tasks, underscore the evolution of feature matching to handle increasingly diverse data interactions.

2507.22028 2026-06-12 cs.CV cs.RO Updated version

From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning

Honglin He, Yukai Ma, Brad Squicciarini, Wayne Wu, Bolei Zhou

Affiliations: University of California, Los Angeles; Coco Robotics

AI summary: Proposes the S2E framework, which combines offline video pretraining with reinforcement learning in simulation and uses anchor-guided distribution matching and a residual-attention module to improve the interactivity and safety of navigation foundation models.

Comments
27 pages, 20 figures, 9 tables, conference

Abstract

Navigation foundation models trained on massive web-scale data enable agents to generalize across diverse environments and embodiments. However, these models, which are trained solely on offline data, often lack the capacity to reason about the consequences of their actions or adapt through counterfactual understanding. They thus face significant limitations in real-world urban navigation, where interactive and safe behaviors, such as avoiding obstacles and moving pedestrians, are critical. To tackle these challenges, we introduce the Seeing-to-Experiencing (S2E) learning framework to scale the capability of navigation foundation models with reinforcement learning. S2E combines the strengths of pretraining on offline videos and post-training through reinforcement learning. It maintains the model's generalizability acquired from large-scale real-world videos while enhancing its interactivity through reinforcement learning in simulation environments. Specifically, we introduce two innovations: (1) an Anchor-Guided Distribution Matching strategy for offline pretraining, which stabilizes learning and models diverse motion patterns through anchor-based supervision; and (2) a Residual-Attention Module for reinforcement learning, which obtains reactive behaviors from simulation environments without erasing the model's pretrained knowledge. Moreover, we establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3D Gaussian Splatting reconstructions of real-world scenes that incorporate physical interactions. It can systematically assess the generalizability and safety of navigation foundation models.

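2507.20208 2026-06-12 cs.CL Updated version

From Benchmarks to Skills: Low-Rank Factors for LLM Evaluation

Aviya Maimon, Amir DN Cohen, Gal Vishne, Shauli Ravfogel, Reut Tsarfaty

Affiliations: Bar-Ilan University; OriginAI; Data Science Institute, Columbia University; Center for Data Science, New York University

AI summary: Uses factor analysis to show that the LLM benchmark performance matrix is intrinsically low-rank, revealing task redundancy, and proposes an evaluation framework based on a latent skill space for identifying redundant tasks, profiling new models with a small task subset, and selecting models by skill profile.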

Abstract

Current evaluations of large language models (LLMs) rely heavily on a growing collection of benchmarks and on aggregate benchmark scores, yet it remains unclear what this comparison actually captures, and what these scores reveal about models' underlying capabilities. Here, we propose a new paradigm for LLM evaluation, by asking whether benchmark performance reflects many independent abilities, or rather relies on a small number of shared dimensions. To answer this, we apply Factor Analysis (FA) to a massive performance matrix of LLMs versus benchmarks (60 × 44), revealing an intrinsically low-rank structure of that matrix. That is, a small number of latent factors captures most of the structure in the full task space. This low-rank geometry reveals substantial redundancy across existing tasks and explains why many benchmarks appear to be measuring overlapping abilities. We further show that these latent factors correspond to coherent, skill-like dimensions of LLM behavior. Leveraging this latent skill-space, we deliver three practical tools for LLM evaluation and downstream users: (i) identifying redundant tasks, (ii) profiling new models using a small subset of tasks, and (iii) selecting models aligned with desired skill profiles. Our method provides a solid alternative to the de-facto standard of a single aggregate score, and establishes an interpretable and practical framework for understanding and benchmarking LLM core capabilities.
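
Illustrative sketch (not the paper's analysis): a quick way to see what "intrinsically low-rank" means for a 60 × 44 score matrix is to measure how much of its squared Frobenius norm the leading singular direction captures. The snippet uses power iteration rather than factor analysis, and the matrix is synthetic (one hypothetical skill per model times one loading per benchmark, plus small noise), so the number it produces says nothing about real benchmarks.

```python
import random

def leading_share(matrix, iters=50):
    """Share of the squared Frobenius norm captured by the top singular value,
    estimated with power iteration on A^T A."""
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        w = [sum(matrix[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T A v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma_sq = sum(x * x for x in u)               # squared top singular value
    total = sum(x * x for row in matrix for x in row)
    return sigma_sq / total

# Synthetic scores (assumed): 60 models x 44 benchmarks driven by one latent skill.
rng = random.Random(0)
skill = [rng.uniform(0.3, 1.0) for _ in range(60)]
loading = [rng.uniform(0.5, 1.0) for _ in range(44)]
scores = [[s * w + rng.gauss(0.0, 0.01) for w in loading] for s in skill]
share = leading_share(scores)
```

A share close to 1 indicates that a single direction explains almost everything; on real data one would look at several leading factors and typically center the matrix first.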

2507.10599 2026-06-12 cs.CL cs.AI cs.LG Updated version

Emergence of Hierarchical Emotion Organization in Large Language Models

Maya Okawa, Bo Zhao, Eric J. Bigelow, Rose Yu, Tomer Ullman, Ekdeep Singh Lubana, Hidenori Tanaka

Affiliations: University of California, Berkeley; Stanford University; University of Washington; University of Tokyo

AI summary: Inspired by emotion-wheel theory, analyzes probabilistic dependencies between emotional states in LLM outputs; finds that models naturally form hierarchical emotion trees consistent with human psychological models, that larger models develop more complex hierarchies, and that emotion recognition shows systematic biases across socioeconomic personas.

Comments
ICML 2026

Abstract

As large language models (LLMs) increasingly power conversational agents, understanding how they model users' emotional states is critical for ethical deployment. Inspired by emotion wheels, i.e., a psychological framework that argues emotions organize hierarchically, we analyze probabilistic dependencies between emotional states in model outputs. We find that LLMs naturally form hierarchical emotion trees that align with human psychological models, and larger models develop more complex hierarchies. We also uncover systematic biases in emotion recognition across socioeconomic personas, with compounding misclassifications for intersectional, underrepresented groups. Human studies reveal striking parallels, suggesting that LLMs internalize aspects of social perception. Beyond highlighting emergent emotional reasoning in LLMs, our results hint at the potential of using cognitively-grounded theories for developing better model evaluations.

2507.05019 2026-06-12 cs.LG cs.AI Updated version

Meta-Learning Transformers to Improve In-Context Generalization

Lorenzo Braccaioli, Anna Vettoruzzo, Prabhant Singh, Joaquin Vanschoren, Mohamed-Rafik Bouguelia, Nicola Conci

Affiliations: University of Trento, Italy; Eindhoven University, Netherlands; University of Doha for Science and Technology, Qatar

AI summary: Proposes training in-context learners on multiple small-scale, domain-specific datasets, using meta-learning to improve cross-domain generalization, and validates robustness in continual-learning and unsupervised settings.

Abstract

In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

2507.03660 2026-06-12 cs.LG Updated version

Single vs. Multiple Branches in DeepONet and S-DeepONet: Network Architecture Follows Coupling in Multiphysics Systems

Jaewan Park, Kazuma Kobayashi, Qibang Liu, Seid Koric, Diab Abueidda, Syed Bahauddin Alam

Affiliations: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; The Grainger College of Engineering, Mechanical Science and Engineering, University of Illinois at Urbana-Champaign; The Grainger College of Engineering, Nuclear, Plasma & Radiological Engineering, University of Illinois at Urbana-Champaign; Department of Industrial and Manufacturing Systems Engineering, Kansas State University; Civil and Urban Engineering Department, New York University Abu Dhabi, UAE

AI summary: Compares single-branch and multi-branch neural operator architectures on strongly coupled multiphysics systems; single-branch networks outperform multi-branch ones in tightly coupled regimes through shared latent representations, multi-branch designs suit decoupled or single-physics tasks, and the surrogates run up to 1.8 × 10^4 times faster.

Abstract

Real-time prediction of complex physical systems requires surrogate models that learn from data while representing strong multiphysics coupling. Deep Operator Networks have shown success in single-physics problems, yet their effectiveness in capturing nonlinear interactions in coupled systems (such as thermo-mechanical or electro-thermal coupling) remains underexplored. Here we pose a practical question: should the architecture of a neural operator reflect the strength of physical coupling it aims to model? We compare single-branch and multi-branch designs, in both feedforward and sequential recurrent forms, across three representative systems: a reaction-diffusion problem with heterogeneous sources, a nonlinear thermo-electrical problem with temperature-dependent conductivity and Joule heating, and a viscoplastic thermo-mechanical model of steel solidification. Single-branch networks consistently outperform multi-branch variants in tightly coupled regimes by encouraging shared latent representations, whereas multi-branch designs remain favorable for decoupled or single-physics tasks. Once trained, these surrogates deliver full-field predictions up to 1.8 × 10^4 times faster than physics-based solvers.
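
Illustrative sketch (not the authors' implementation): the structural difference between the two designs can be shown with an untrained, one-layer forward pass. A DeepONet-style output is the dot product of a branch encoding of the input functions and a trunk encoding of the query coordinate. Here the single-branch variant concatenates all input functions into one branch, while the multi-branch variant gives each input its own branch and merges them by elementwise product; that merge rule, the layer sizes, and all names are assumptions made for illustration.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights for one tanh layer (untrained, illustration only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def apply_layer(layer, x):
    weights, biases = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def single_branch(branch, trunk, inputs, y):
    """One branch encodes all input functions jointly (shared latent space)."""
    joint = [v for u in inputs for v in u]  # concatenate sensor values
    b = apply_layer(branch, joint)
    t = apply_layer(trunk, y)
    return sum(bk * tk for bk, tk in zip(b, t))

def multi_branch(branches, trunk, inputs, y):
    """One branch per input function, merged by elementwise product."""
    t = apply_layer(trunk, y)
    merged = [1.0] * len(t)
    for layer, u in zip(branches, inputs):
        merged = [m * bk for m, bk in zip(merged, apply_layer(layer, u))]
    return sum(mk * tk for mk, tk in zip(merged, t))

rng = random.Random(0)
p, m = 8, 5                            # latent width, sensors per input function
u1 = [math.sin(i) for i in range(m)]   # first input function sampled at m sensors
u2 = [math.cos(i) for i in range(m)]   # second input function sampled at m sensors
y = [0.3, 0.7]                         # query coordinate
trunk = make_layer(2, p, rng)
out_single = single_branch(make_layer(2 * m, p, rng), trunk, [u1, u2], y)
out_multi = multi_branch([make_layer(m, p, rng), make_layer(m, p, rng)],
                         trunk, [u1, u2], y)
```

In the single-branch form every latent unit can mix both inputs from the first layer onward, which is one way to read the abstract's point about shared latent representations under tight coupling.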

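2506.23033 2026-06-12 cs.LG stat.ML Updated version

How Reliable are Fairness Audits with Unreliable Data?

Yash Vardhan Tomar

Affiliations: Purdue University

AI summary: Studies how missing protected labels affect fairness-mitigation audits; proposes a seed-calibrated stress test that separates missingness effects from random seed-to-seed variation, and finds that missingness with some labels still available usually does not change which mitigation method is selected, that the no-label endpoint behaves differently, and that threshold optimization can turn single-axis fairness gains into intersectional harm.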

Abstract

Fairness audits are a key component of responsible machine-learning deployment. Yet the reliability of audit recommendations under incomplete protected-label access is still poorly understood. In this work, we focus on protected-label missingness in fairness mitigation audits. We introduce a seed-calibrated stress test to separate missingness effects from seed-to-seed movement already present under complete labels. Across ACS/Folktables tasks, missingness settings that retain some protected labels usually do not move selected mitigation methods beyond a complete-label seed-to-seed baseline. At 0% protected-label access, candidates collapse to an empirical-risk-minimization baseline and deterministic tie-breaking rather than revealing a broad missingness effect. We also find that threshold optimization can turn fairness gains on a single protected axis into intersectional harm above a seed baseline, and this threshold-optimizer finding persists under random-forest validation. Overall, our results highlight that protected-label missingness should be reported with seed-null calibration, candidate-set context, and intersectional consequences before it is treated as evidence of audit fragility.

2506.21855 2026-06-12 cs.CV Updated version

Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation

Jiho Choi, Sang Jun Lee

Affiliations: Division of Electronics and Information Engineering, Jeonbuk National University, Republic of Korea

AI summary: Proposes Periodic-MAE, a self-supervised framework that uses periodicity-aware masking and physiological bandlimit constraints to learn generalizable spatio-temporal representations from unlabeled facial videos, improving remote photoplethysmography (rPPG) estimation.

Abstract

In this paper, we propose Periodic-MAE, a self-supervised framework for learning generalizable spatio-temporal representations of periodic physiological signals from unlabeled facial videos. The proposed method leverages a masked autoencoder (MAE), which learns high-dimensional facial representations by reconstructing masked video tokens without relying on remote photoplethysmography (rPPG) specific supervision. To explicitly align representation learning with the characteristics of rPPG, we introduce a periodicity-aware frame masking strategy based on video resampling, enabling the encoder to learn representations that capture quasi-periodic temporal patterns relevant to pulse signal estimation. In addition, physiological bandlimit constraints are integrated into the MAE pre-training framework, exploiting the sparsity of pulse signals in the frequency domain to guide the learned representations toward physiologically meaningful patterns. After pre-training, the learned representations are transferred to downstream rPPG estimation, where the encoder serves as a generic feature extractor for recovering pulse-related signals from facial videos. We conduct extensive experiments on four benchmark datasets, including PURE, UBFC-rPPG, MMPD, and V4V. Moreover, we evaluate the proposed approach on a real-world rPPG dataset collected under unconstrained lighting conditions and subject motion. Experimental results demonstrate that Periodic-MAE consistently improves rPPG estimation performance, particularly in challenging cross-dataset and real-world evaluation settings. Our code is available at https://github.com/ziiho08/Periodic-MAE.

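2502.18959 2026-06-12 cs.LG stat.ML Updated version

Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential

Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

Affiliations: Department of Applied Mathematics, Hong Kong Polytechnic University; Department of Mathematics, Duke University; Department of Mathematics and Statistics, Auburn University; School of Mathematics, Georgia Institute of Technology

AI summary: Proposes Fourier multi-component and multi-layer neural networks (FMMNNs), which combine sine-type activations with a multi-component, multi-layer structure; they retain exponential function-approximation power under a low-rank architecture, have a more favorable optimization landscape than standard fully connected networks, and come with a scaled random initialization that accelerates training, reaching high accuracy and good convergence on high-frequency function approximation.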

Comments
Our code and implementation details are available at https://github.com/ShijunZhangMath/FMMNN

Abstract

The architecture of a neural network and the choice of its activation function are both fundamental to its performance. Equally important is ensuring that these two elements are well matched, as their alignment is key to effective representation and learning. In this paper, we introduce the Fourier Multi-Component and Multi-Layer Neural Network (FMMNN), a model that combines sine-type activations with the multi-component and multi-layer structure of MMNNs. In an FMMNN, each component is represented as a trainable linear combination of fixed random sine-type basis functions, while multi-layer composition generates more complex and adaptive high-frequency features. We establish that FMMNNs retain exponential expressive power for function approximation even under a low-rank architectural structure. We also analyze the optimization landscape of FMMNNs and find it to be substantially more favorable than that of standard fully connected neural networks, especially for high-frequency targets. In addition, we propose a scaled random initialization method for the first-layer weights in FMMNNs, which accelerates training and improves final performance when sufficient samples are available. Extensive numerical experiments support our theoretical insights, showing that FMMNNs achieve strong accuracy and favorable convergence behavior on oscillatory function-approximation benchmarks.
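
Illustrative sketch (not the authors' code, which is linked in the Comments field): the abstract describes each component as a trainable linear combination of fixed random sine-type basis functions. The snippet below builds one such component in one dimension and trains only its coefficients with plain gradient descent on an oscillatory target. The frequency range, basis count, learning rate, and target function are assumptions chosen for illustration, and a single component is far simpler than a full multi-layer FMMNN.

```python
import math
import random

rng = random.Random(0)
n_basis = 16
# Fixed random sine-type basis: these frequencies and phases are never trained.
freqs = [rng.uniform(-20.0, 20.0) for _ in range(n_basis)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_basis)]
coeffs = [0.0] * n_basis  # the only trainable parameters

def features(x):
    return [math.sin(w * x + b) for w, b in zip(freqs, phases)]

def mse(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(a * f for a, f in zip(coeffs, features(x)))
        total += (pred - y) ** 2
    return total / len(xs)

# Oscillatory target on [0, 1] (assumed for the demo).
xs = [i / 99 for i in range(100)]
ys = [math.sin(12.0 * x) for x in xs]

initial_loss = mse(xs, ys)
lr = 0.03  # below 1 / n_basis, which keeps gradient descent on this quadratic loss stable
for _ in range(300):
    grad = [0.0] * n_basis
    for x, y in zip(xs, ys):
        f = features(x)
        residual = sum(a * fj for a, fj in zip(coeffs, f)) - y
        for j in range(n_basis):
            grad[j] += 2.0 * residual * f[j] / len(xs)
    for j in range(n_basis):
        coeffs[j] -= lr * grad[j]
final_loss = mse(xs, ys)
```

Because the basis is fixed, the fit is linear in `coeffs` and the loss is a convex quadratic; in the paper's setting, the multi-layer composition is what produces adaptive frequency features.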

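2506.01274 2026-06-12 cs.CV cs.AI Updated version

ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding

Hosu Lee, Junho Kim, Hyunjun Kim, Yong Man Ro

Affiliations: Korea Advanced Institute of Science & Technology; University of Illinois Urbana-Champaign

AI summary: Proposes ReFoCUS, the first framework to integrate online policy-gradient reinforcement learning into frame-level optimization for video LLMs; an autoregressive, query-conditional selection architecture learns a frame-selection policy without explicit frame-level supervision and improves video QA reasoning accuracy.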

Comments
Project page: https://interlive-team.github.io/ReFoCUS/

Abstract

Recent progress in Large Multi-modal Models (LMMs) has enabled effective vision-language reasoning, yet video understanding remains constrained by suboptimal frame selection strategies despite the rapid development of video-specialized LMMs. Prior works attempted to solve this with static heuristics or external retrieval modules that feed in frame-level information, but these approaches often fail to capture visual cues grounded in the given user query, conflating raw visual dynamics with true semantic relevance. In this paper, we introduce ReFoCUS (Reinforcement-guided Frame Optimization for Contextual UnderStanding), the first framework to integrate online policy-gradient reinforcement learning into frame-level optimization for video-LLMs. ReFoCUS aims to learn a frame selection policy, leveraging reward signals derived from reference models to capture their underlying scoring behavior over frame combinations that best support temporally grounded responses. To efficiently explore the large combinatorial frame space, we employ an autoregressive and query-conditional selection architecture that ensures contextual consistency while reducing complexity. Our policy learning removes the need for explicit frame-level supervision, as it implicitly discovers optimal and semantically consistent frame compositions. ReFoCUS consistently improves reasoning accuracy across multiple video QA benchmarks, demonstrating the advantage of aligning frame selection with model-internal utility.

2505.23823 2026-06-12 cs.CL Updated version

RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery

Youngseung Jeon, Ziwen Li, Thomas Li, JiaSyuan Chang, Morteza Ziyadi, Xiang 'Anthony' Chen

Affiliations: University of California Los Angeles; Palo Alto High School; Amazon AGI

AI summary: Proposes the RAGPPI benchmark of 4,420 question-answer pairs for evaluating how well retrieval-augmented generation identifies the biological impacts of protein-protein interactions in drug discovery.

Journal ref
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026)
Comments
17 pages, 4 figures, 8 tables
AI中文摘要

检索蛋白质-蛋白质相互作用(PPI)的生物学影响对于药物开发中的靶点识别(Target ID)至关重要。由于涉及的蛋白质数量庞大,这一过程仍然耗时且具有挑战性。大型语言模型(LLMs)和检索增强生成(RAG)框架已支持靶点识别;然而,目前尚无用于识别PPI生物学影响的基准。为填补这一空白,我们引入了PPI的RAG基准(RAGPPI),这是一个包含4420个问答对的事实性问答基准,专注于PPI的潜在生物学影响。通过与专家访谈,我们确定了基准数据集的标准,例如问答类型和来源。我们通过专家驱动的数据标注构建了金标准数据集(500个问答对)。我们开发了一个集成自动评估LLM,该模型结合了专家标注特征、平均事实-摘要相似度(F1)和低相似度事实计数(F2),从而构建了银标准数据集(3720个问答对)。我们致力于维护RAGPPI作为支持研究社区推进药物发现问答解决方案的RAG系统的资源。

英文摘要

Retrieving the biological impacts of protein-protein interactions (PPIs) is essential for target identification (Target ID) in drug development. Given the vast number of proteins involved, this process remains time-consuming and challenging. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have supported Target ID; however, no benchmark currently exists for identifying the biological impacts of PPIs. To bridge this gap, we introduce the RAG Benchmark for PPIs (RAGPPI), a factual question-answer benchmark of 4,420 question-answer pairs that focus on the potential biological impacts of PPIs. Through interviews with experts, we identified criteria for a benchmark dataset, such as the QA type and source. We built a gold-standard dataset (500 QA pairs) through expert-driven data annotation. We developed an ensemble auto-evaluation LLM that incorporates expert labeling characteristics, average fact-abstract similarity (F1), and low-similarity fact counts (F2), enabling the construction of a silver-standard dataset (3,720 QA pairs). We are committed to maintaining RAGPPI as a resource to support the research community in advancing RAG systems for drug discovery QA solutions.

2505.22695 2026-06-12 cs.LG Updated version

LLM-ODDR: A Large Language Model Framework for Joint Order Dispatching and Driver Repositioning

Tengfei Lyu, Siyuan Feng, Hao Liu, Hai Yang

Affiliations: Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou); Department of Aeronautical and Aviation Engineering, The Hong Kong Polytechnic University; Research Center for Low Altitude Economy, The Hong Kong Polytechnic University; Department of Computer Science and Engineering, The Hong Kong University of Science and Technology; Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology

AI summary: Proposes the LLM-ODDR framework, which uses large language models to jointly optimize ride-hailing order dispatching and driver repositioning, improving effectiveness, adaptability, and interpretability through multi-objective value refinement, fairness-aware dispatching, and spatiotemporal demand-aware repositioning.

Details
Comments
Published in IEEE Transactions on Intelligent Transportation Systems (TITS)
Abstract

Ride-hailing platforms face significant challenges in optimizing order dispatching and driver repositioning operations in dynamic urban environments. Traditional approaches based on combinatorial optimization, rule-based heuristics, and reinforcement learning often overlook driver income fairness, interpretability, and adaptability to real-world dynamics. To address these gaps, we propose LLM-ODDR, a novel framework leveraging Large Language Models (LLMs) for joint Order Dispatching and Driver Repositioning (ODDR) in ride-hailing services. The LLM-ODDR framework comprises three key components: (1) Multi-objective-guided Order Value Refinement, which evaluates orders by considering multiple objectives to determine their overall value; (2) Fairness-aware Order Dispatching, which balances platform revenue with driver income fairness; and (3) Spatiotemporal Demand-Aware Driver Repositioning, which optimizes idle vehicle placement based on historical patterns and projected supply. We also develop JointDR-GPT, a fine-tuned model optimized for ODDR tasks with domain knowledge. Extensive experiments on real-world datasets from Manhattan taxi operations demonstrate that our framework significantly outperforms traditional methods in terms of effectiveness, adaptability to anomalous conditions, and decision interpretability. To our knowledge, this is the first exploration of LLMs as decision-making agents in ride-hailing ODDR tasks, establishing foundational insights for integrating advanced language models within intelligent transportation systems. While the current framework incurs higher computational costs than traditional methods, we show that parallel decomposition and model distillation can reduce latency to production-viable levels.

2505.01869 2026-06-12 cs.CV Updated version

Visual enhancement and 3D representation for underwater scenes: a review

Guoxi Huang, Haoran Wang, Brett Seymour, Evan Kovacs, John Ellerbroc, Dave Blackham, Nantheera Anantrasirichai

Affiliations: Visual Information Laboratory, University of Bristol; Submerged Resources Center, National Park Service; Marine Imaging Technologies, LLC; Gates Underwater Products, Inc; Esprit film and television Ltd

AI summary: Surveys underwater visual enhancement and 3D reconstruction methods, from physical models to non-learning and data-driven techniques (such as NeRF and 3D Gaussian Splatting), evaluates several algorithms on benchmark datasets, and identifies future research directions.

Details
Abstract

Underwater visual enhancement (UVE) and underwater 3D reconstruction pose significant challenges in computer vision and AI-based tasks due to complex imaging conditions in aquatic environments. Despite the development of numerous enhancement algorithms, a comprehensive and systematic review covering both UVE and underwater 3D reconstruction remains absent. To advance research in these areas, we present an in-depth review from multiple perspectives. First, we introduce the fundamental physical models, highlighting the peculiarities that challenge conventional techniques. We then survey advanced methods for visual enhancement and 3D reconstruction specifically designed for underwater scenarios. The paper assesses various approaches, from non-learning methods to advanced data-driven techniques, including Neural Radiance Fields and 3D Gaussian Splatting, discussing their effectiveness in handling underwater distortions. Next, we conduct both quantitative and qualitative evaluations of state-of-the-art UVE and underwater 3D reconstruction algorithms across multiple benchmark datasets. Finally, we highlight key research directions for future advancements in underwater vision.

2408.17221 2026-06-12 cs.LG math.AG Updated version

Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn

Affiliations: University of Toronto; Royal Institute of Technology (KTH)

AI summary: Uses tools from algebraic geometry to analyze the geometry of the function space of self-attention networks without normalization, describes the identifiability of deep attention and computes the dimension of the function space, characterizes the singular and boundary points of single-layer models, and conjectures an extension to the normalized case.

Details
Comments
Accepted at ICLR 2025
Abstract

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.

2501.04823 2026-06-12 cs.RO math.OC stat.AP Updated version

Learning Robot Safety from Sparse Human Feedback using Conformal Prediction

Aaron O. Feldman, Joseph A. Vincent, Maximilian Adang, JunEn Low, Mac Schwager

Affiliations: Department of Aeronautics and Astronautics, Stanford University

AI summary: Uses binary human feedback on policy trajectories and conformal prediction to identify a region of states containing future policy errors, builds a warning system with a guaranteed miss rate, and uses it to improve the safety of a model predictive controller.

Details
Abstract

Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
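
As a rough illustration of how a guaranteed miss rate can arise from conformal prediction, the sketch below calibrates a warning threshold with the standard split conformal quantile. It is a generic textbook variant that holds out calibration data, not the nearest-neighbor procedure of the paper, which avoids withholding data. The score definition (for example, a distance to the nearest state flagged unsafe) and the names `conformal_threshold` and `should_warn` are hypothetical.

```python
import math


def conformal_threshold(unsafe_scores, miss_rate):
    """Split conformal quantile over scores of states a human flagged unsafe.

    Assumption: lower score means "more suspicious". If a future unsafe state
    is exchangeable with the calibration states, its score exceeds the
    returned threshold with probability at most miss_rate.
    """
    n = len(unsafe_scores)
    k = math.ceil((n + 1) * (1.0 - miss_rate))  # rank of the conformal quantile
    if k > n:
        # Too few calibration points to certify this miss rate: warn always.
        return math.inf
    return sorted(unsafe_scores)[k - 1]


def should_warn(score, threshold):
    """Raise a warning when the state falls in the suspected unsafe region."""
    return score <= threshold
```

With ten calibration scores and a 20% miss rate, the threshold is the ninth smallest score; asking for a 5% miss rate from only ten points returns an infinite threshold, which reflects that the guarantee cannot be certified with so little data.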

2301.12538 2026-06-12 cs.LG cs.AI math.DS Updated version

On Approximating the Dynamic Response of Synchronous Generators via Operator Learning: A Step Towards Building Deep Operator-based Power Grid Simulators

Christian Moya, Amirhossein Mollaali, Guang Lin, Meng Yue

Affiliations: Purdue University

AI summary: Proposes an operator learning framework that uses DeepONet to approximate the dynamic response of synchronous generators, and designs a recursive simulation scheme and a residual DeepONet scheme, combined with a data aggregation strategy, for simulations that interact with the power grid.

Details
Abstract

This paper develops an Operator Learning framework for approximating the dynamic response of synchronous generators. The framework can be used to (i) build a neural network-based generator model that interacts with a power grid simulator or (ii) shadow the true generator's transient response. First, we develop a data-driven Deep Operator Network (DeepONet) to approximate the infinite-dimensional solution operator of the generators. Then, we design a numerical scheme based on DeepONet that simulates the generator's response over a given time horizon. The proposed scheme recursively employs the trained DeepONet to simulate the response for a given multi-dimensional input that describes the interaction between the generator and the power grid. In addition, we design a residual DeepONet numerical scheme that can incorporate information from existing mathematical models. We accompany this residual DeepONet scheme with an estimate for the prediction's cumulative error. Finally, we build a data aggregation (DAgger) strategy that allows fine-tuning of DeepONets using aggregated training data that the DeepONets will likely encounter during interactive simulations with other grid components. As a proof of concept, we demonstrate that the proposed frameworks can effectively approximate the transient model of a synchronous generator.

2604.24449 2026-06-12 cs.RO cs.AI cs.LG

SPLIT: Separating Physical-Contact via Latent Arithmetic in Image-Based Tactile Sensors

Wadhah Zai El Amri, Nicolás Navarro-Guerrero

Affiliations: Leibniz Universität Hannover, L3S Research Center

Details
Comments
Accepted to Elsevier Robotics and Autonomous Systems Journal
Abstract

Training machine learning models for robotic tactile sensing requires vast amounts of data, yet obtaining realistic interaction data remains a challenge due to physical complexity and variability. Simulating tactile sensors is thus a crucial step in accelerating progress. This paper presents SPLIT, a novel method for simulating image-based tactile sensors, with a primary focus on the DIGIT sensor. Central to our approach is a latent space arithmetic strategy that explicitly disentangles contact geometry from sensor-specific optical properties. Unlike methods that require recalibration for every new unit, this disentanglement allows SPLIT to adapt to diverse DIGIT backgrounds and even transfer data to distinct sensors like the GelSight R1.5 without full model retraining. Beyond this adaptability, our approach achieves faster inference speeds than existing alternatives. Furthermore, we provide a calibrated finite element method (FEM) soft-body mesh simulation with variable resolution, offering a tunable trade-off between speed and fidelity. Additionally, our algorithm supports bidirectional simulation, allowing for both the generation of realistic images from deformation meshes and the reconstruction of meshes from tactile images. This versatility makes SPLIT a valuable tool for accelerating progress in robotic tactile sensing research.

2511.20162 2026-06-12 cs.CV cs.AI q-bio.NC

Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection

Daniel Harari, Michael Sidorov, Chen Shterental, Liel David, Abrham Kahsay Gebreselasie, Muhammad Haris Khan

Affiliations: Weizmann Institute of Science; Mohamed bin Zayed University of Artificial Intelligence

Details
Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 workshop on Cognitive Foundations for Multimodal Models (CogVL)
Abstract

Large multi-modal models (LMMs) show increasing performance in realistic visual tasks for images and, more recently, for videos. For example, given a video sequence, such models are able to describe in detail the objects, the surroundings, and the dynamic actions. In this study, we explored the extent to which these models ground their semantic understanding in the actual visual input. Specifically, given sequences of hands interacting with objects, we asked models when and where the interaction begins or ends. For this purpose, we introduce a first-of-its-kind, large-scale dataset with more than 20K annotated interactions on videos from the Something-Something-V2 dataset. A total of 250 AMTurk human annotators labeled core interaction events, particularly when and where objects and agents become attached ('contact') or detached ('release'). We asked SoTA LMMs, including GPT, Gemini, and Qwen, to locate these events in short videos, each with a single event. The results show that while models reliably name target objects and identify actions, they exhibit a form of 'shortcut learning' where semantic success masks a failure in physical grounding. Specifically, they consistently fail to identify the frame where the interaction begins or ends and poorly localize the physical event within the scene. This disconnect suggests that while LMMs excel at System 1 intuitive pattern recognition (naming the action and objects), they lack the System 2 cognitive foundations required to reason about physical primitives like 'contact' and 'release', and hence to truly ground dynamic scenes in physical reality.

2510.22266 2026-06-12 cs.LG cs.AI cs.CY

A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

Rodrigo Tertulino, Ricardo Almeida

Affiliations: Federal Institute of Education, Science, and Technology of Rio Grande do Norte

Details
Abstract

Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and principal management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school's average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school's ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.

2307.05520 2026-06-12 cs.LG cs.CY cs.SE

Estimating Deep Learning energy consumption based on model architecture and training environment

Santiago del Rey, Luís Cruz, Xavier Franch, Silverio Martínez-Fernández

Affiliations: Universitat Politècnica de Catalunya; Delft University of Technology

Details
Comments
48 pages, 10 figures, under review in Computer Standards & Interfaces journal. This work is an extension of arXiv:2307.05520v3 [cs.LG]
Abstract

To raise awareness of the environmental impact of deep learning (DL), many studies estimate the energy use of DL systems. However, energy estimates during DL training often rely on unverified assumptions. This work addresses that gap by investigating how model architecture and training environment affect energy consumption. We train a variety of computer vision models and collect energy consumption and accuracy metrics to analyze their trade-offs across configurations. Our results show that selecting the right model-training environment combination can reduce training energy consumption by up to 80.68% with less than 2% loss in $F_1$ score. We find a significant interaction effect between model and training environment: energy efficiency improves when GPU computational power scales with model complexity. Moreover, we demonstrate that common estimation practices, such as using FLOPs or GPU TDP, fail to capture these dynamics and can lead to substantial errors. To address these shortcomings, we propose the Stable Training Epoch Projection (STEP) and the Pre-training Regression-based Estimation (PRE) methods. Across evaluations, our methods outperform existing tools by a factor of two or more in estimation accuracy.

2509.13196 2026-06-12 cs.CL

The Few-shot Dilemma: Over-prompting Large Language Models

Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler

Affiliations: Siemens AG; Technical University of Munich

AI summary: Proposes a prompting framework using three few-shot selection methods (random sampling, semantic embedding, and TF-IDF), finds in experiments across multiple LLMs that too many domain-specific examples can degrade performance, and identifies the optimal number of examples by combining TF-IDF with stratified sampling, surpassing existing methods by 1% on software requirement classification.

Details
Comments
Accepted for the main track of FLLM
Abstract

Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods (random sampling, semantic embedding, and TF-IDF vectors) and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, LLaMA-3.1, LLaMA-3.2, and Mistral. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirement analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem, and surpasses the state of the art by 1% in classifying functional and non-functional requirements.
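
To make the idea of TF-IDF-selected, stratified few-shot examples concrete, here is a minimal standard-library sketch. It is an illustration under stated assumptions, not the authors' pipeline: the whitespace tokenizer, the smoothed IDF formula, the per-class cap, and the names `tfidf_vectors`, `cosine`, and `select_few_shot` are all hypothetical choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict token -> weight) per document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: cnt * idf[tok] for tok, cnt in Counter(toks).items()}
            for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_few_shot(query, pool, per_class):
    """Pick the most query-similar examples, at most per_class per label.

    pool is a list of (text, label) pairs; the result keeps similarity order.
    """
    vecs = tfidf_vectors([text for text, _ in pool] + [query])
    query_vec = vecs[-1]
    ranked = sorted(range(len(pool)),
                    key=lambda i: cosine(query_vec, vecs[i]), reverse=True)
    picked, counts = [], Counter()
    for i in ranked:
        label = pool[i][1]
        if counts[label] < per_class:  # stratification: cap each class
            picked.append(pool[i])
            counts[label] += 1
    return picked
```

Raising `per_class` step by step and re-evaluating the model at each setting is one simple way to look for the point where extra examples stop helping.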

2505.22169 2026-06-12 cs.CL

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky

Affiliations: The Hebrew University of Jerusalem; Google Research

Details
Journal ref
Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11146-11153, Suzhou, China. Association for Computational Linguistics
Comments
Findings of EMNLP 2025
Abstract

LLMs are highly sensitive to prompt phrasing, yet standard benchmarks typically report performance using a single prompt, raising concerns about the reliability of such evaluations. In this work, we argue for a stochastic method-of-moments evaluation over the space of meaning-preserving prompt perturbations. We introduce a formal definition of reliable evaluation that accounts for prompt sensitivity, and propose ReliableEval, a method for estimating the number of prompt resamplings needed to obtain meaningful results. Using our framework, we stochastically evaluate five frontier LLMs and find that even top-performing models like GPT-4o and Claude-3.7-Sonnet exhibit substantial prompt sensitivity. Our approach is model-, task-, and metric-agnostic, offering a recipe for meaningful and robust LLM evaluation.
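
One simple way to think about "how many prompt resamplings are enough" is through the first two sample moments of the score across paraphrased prompts. The sketch below is a generic statistical illustration, not the ReliableEval procedure itself: it assumes scores from independent prompt resamplings and a hypothetical target standard error, and the function names are invented for this example.

```python
import math
import statistics


def score_moments(scores):
    """Sample mean and unbiased sample variance of per-prompt scores."""
    return statistics.mean(scores), statistics.variance(scores)


def resamplings_needed(scores, target_se):
    """Smallest n with sqrt(variance / n) <= target_se, from pilot scores."""
    _, variance = score_moments(scores)
    return max(1, math.ceil(variance / target_se ** 2))
```

For pilot scores of 0.5, 0.75, and 1.0 the sample variance is 0.0625, so a target standard error of 0.125 calls for four resamplings; a model whose score barely moves across paraphrases would need far fewer.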

2402.13906 2026-06-12 cs.CL

Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction

Gili Lior, Yoav Goldberg, Gabriel Stanovsky

Affiliations: Allen Institute for AI; The Hebrew University of Jerusalem; Bar-Ilan University

Details
Journal ref
Findings of the Association for Computational Linguistics: ACL 2024, pages 9538-9550, Bangkok, Thailand. Association for Computational Linguistics
Comments
Accepted to ACL 2024 findings
Abstract

Document collections from various domains, e.g., legal, medical, or financial, often share some underlying collection-wide structure, which captures information that can aid both human users and structure-aware models. We propose to identify the typical structure of documents within a collection, which requires capturing recurring topics across the collection while abstracting over arbitrary header paraphrases, and grounding each topic in the respective document locations. These requirements pose several challenges: headers that mark recurring topics frequently differ in phrasing, certain section headers are unique to individual documents and do not reflect the typical structure, and the order of topics can vary between documents. To address them, we develop an unsupervised graph-based method that leverages both inter- and intra-document similarities to extract the underlying collection-wide structure. Our evaluations on three diverse domains in both English and Hebrew indicate that our method extracts meaningful collection-wide structure, and we hope that future work will leverage our method for multi-document applications and structure-aware models.

2505.18060 2026-06-12 cs.CV

Semantic Correspondence: Unified Benchmarking and a Strong Baseline

Kaiyan Zhang, Xinghui Li, Jingyi Lu, Kai Han

Affiliations: The University of Hong Kong

Details
Journal ref
IEEE Trans. Pattern Anal. Mach. Intell. 48, no. 3 (2026) 3911-3930
Abstract

Establishing semantic correspondence is a challenging task in computer vision, aiming to match keypoints with the same semantic information across different images. Benefiting from the rapid development of deep learning, remarkable progress has been made over the past decade. However, a comprehensive review and analysis of this task remains absent. In this paper, we present the first extensive survey of semantic correspondence methods. We first propose a taxonomy that classifies existing methods by design type. These methods are then categorized accordingly, and we provide a detailed analysis of each approach. Furthermore, we aggregate and summarize the results reported in the literature across various benchmarks into a unified comparative table, with detailed configurations to highlight performance variations. Additionally, to provide a detailed understanding of existing methods for semantic matching, we conduct thorough controlled experiments to analyse the effectiveness of the components of different methods. Finally, we propose a simple yet effective baseline that achieves state-of-the-art performance on multiple benchmarks, providing a solid foundation for future research in this field. We hope this survey serves as a comprehensive reference and consolidated baseline for future development. Code is publicly available at: https://github.com/Visual-AI/Semantic-Correspondence.

2606.12443 2026-06-12 cs.CY cs.AI cs.CL New submission

Occupational Prompting Reveals Cultural Bias in Large Language Models

Maksim E. Eren, Andrea Brennen, Ryan C. Barron, Eric Michalak

Affiliations: U.S. Government

AI summary: Replaces nationality prompts with occupational prompts (such as accountant or teacher) to study open-weight LLM responses to value surveys, and finds that different occupations cause shifts within the cultural map, indicating that occupational roles elicit structured value patterns.

Details
Abstract

Social roles shape expectations, priorities, and judgments, yet it remains unclear how large language models (LLMs) associate occupational identities with broader cultural value patterns. Prior work used nationality-based cultural prompting to study how LLM responses to value-survey questions align with human cultural benchmarks. In this paper, we extend that framework by replacing cultural prompting with occupational prompting to examine how professional-role cues influence value-survey responses in open-weight LLMs. Using a survey-grounded evaluation pipeline based on questions from the Integrated Values Surveys, we project model responses into the two-dimensional Inglehart--Welzel cultural space. We prompt open-weight LLMs to answer questions under occupational identities such as accountant, teacher, engineer, and nurse, and then analyze how these occupation-conditioned responses are positioned on the cultural map. Our results show that when open-weight LLMs are prompted with occupations rather than national identities, their responses remain within a broadly Western-leaning region of the cultural map. However, different occupations introduce shifts within this region, producing distinct occupational skews. This indicates that occupational prompts are not treated as neutral role labels, but instead elicit structured value patterns. These findings extend survey-based evaluation of cultural bias beyond nationality-based prompting and provide a framework for studying how occupational personas shape value expression in LLMs.

2606.12442 2026-06-12 cs.CY cs.AI New submission

Reframing AI Loss of Control: What It Is, How to Have It, How to Lose It

Ze Shen Chin, Maurice Chiodo, Dennis Müller, Coleman Snell

Affiliations: Oxford Martin AI Governance Initiative AI Standards Lab; Centre for the Study of Existential Risk, University of Cambridge; Institute of Mathematics Education, University of Cologne; Cornell University

AI summary: Establishes a working definition of control by anchoring it to the "setting and getting of goals", discusses how control can be lost and how AI can contribute to loss of control, and offers recommendations for maintaining control.

Details
Comments
56 pages
Abstract

At present, loss-of-control risks have gained much prominence in public discussion, particularly in relation to AI, with extensive discourse among academics, frontier labs, and even governments. However, in the existing literature, the concept seems to rest on surprisingly weak foundations: even those who discuss loss of control extensively do not first establish what control is and what exactly is being lost. Our paper aims to address these gaps. We establish a working definition of control by anchoring it to the "setting and getting of goals". Then, we discuss various aspects of control, built on foundational concepts from related fields like cybernetics, management control, and control theory. This includes who (or what) can be in control, and the things they require to be in control, such as the ability to set goals, a functional control loop, requisite variety, and sufficient goal alignment. Once a framework for control is established, we discuss how control can be lost and how AIs can contribute to such loss of control, and offer relevant recommendations for how one can maintain control. One interesting consequence of our work is that humanity, as individuals and as groups, can lose varying degrees of control as a result of AI behaviour that is far below the level of superintelligence; the potential for loss-of-control scenarios (as we define them) already exists, and has existed for a long time.

2606.12439 2026-06-12 cs.CY cs.AI New submission

Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots

Yizhu Wen, Nan Zhang, Haohan Yuan, Xun Chen, Haopeng Zhang, Hanqing Guo

Affiliations: GitHub

AI summary: Analyzes the transition from search engine optimization to generative engine optimization, identifies three risks (concentrated influence, undisclosed commercial influence, and academic-industry blind spots), and argues for answer-level governance and measurement.

Details
Journal ref
https://icml.cc/virtual/2026/poster/67185
Comments
This paper has been accepted to the ICML 2026 Position Track
Abstract

Large language model (LLM) answer engines are increasingly used for information seeking, shifting visibility from ranked lists to synthesized answers. This enables Generative Engine Optimization (GEO), which targets the evidence pool and generation process of LLM answer engines. We analyze the transition from search engine optimization (SEO) to GEO and identify two risks: (i) concentrated influence from low contestability and system sensitivity, and (ii) undisclosed commercial influence embedded in evidence and reasoning. We then formalize a general GEO pipeline to locate where optimization acts and compare academic and industry practices, revealing a third risk: (iii) academic-industry blind spots driven by visibility and evaluation asymmetries between offline setups and deployed systems. This position paper argues for answer-level governance and measurement: stronger contestability, high-precision disclosure, black-box auditing of material influence, and deployment-aligned metrics for exposure persistence.