arXivDaily: daily arXiv digest (updated Monday to Friday)
All subject categories: 2117
2509.14210 2026-06-12 cs.RO version update

GLIDE: A Coordinated Aerial-Ground Framework for Search and Rescue in Unknown Environments

Seth Farrell, Chenghao Li, Henrik I. Christensen

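Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes GLIDE, a framework in which two UAVs cooperate with one unmanned ground vehicle to achieve rapid victim localization and obstacle-aware navigation in unknown environments, using role separation and terrain scouting to improve rescue efficiency.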

Abstract

We present a cooperative aerial-ground search-and-rescue (SAR) framework that pairs two unmanned aerial vehicles (UAVs) with an unmanned ground vehicle (UGV) to achieve rapid victim localization and obstacle-aware navigation in unknown environments. We dub this framework Guided Long-horizon Integrated Drone Escort (GLIDE), highlighting the UGV's reliance on UAV guidance for long-horizon planning. In our framework, a goal-searching UAV executes real-time onboard victim detection and georeferencing to nominate goals for the ground platform, while a terrain-scouting UAV flies ahead of the UGV's planned route to provide mid-level traversability updates. The UGV fuses aerial cues with local sensing to perform time-efficient A* planning and continuous replanning as information arrives. Additionally, we present a hardware demonstration (using a GEM e6 golf cart as the UGV and two X500 UAVs) to evaluate end-to-end SAR mission performance and include simulation ablations to assess the planning stack in isolation from detection. Empirical results demonstrate that explicit role separation across UAVs, coupled with terrain scouting and guided planning, improves reach time and navigation safety in time-critical SAR missions.
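To make the "A* planning and continuous replanning" step concrete, here is a minimal sketch of A* on a 4-connected grid that simply replans from scratch after a terrain update. The grid encoding, unit cell costs, the `astar` helper and the scouting update are illustrative assumptions, not GLIDE's actual planner, map representation or replanning strategy.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
INF = float("inf")

def astar(cost: List[List[float]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* on a 4-connected grid; cost[r][c] is the cost of entering a cell (INF = blocked)."""
    rows, cols = len(cost), len(cost[0])

    def h(cell: Cell) -> float:
        # Manhattan distance: admissible as long as every traversable cell costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0.0, start)]
    g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}
    while frontier:
        _, g_cur, cur = heapq.heappop(frontier)
        if cur == goal:                      # reconstruct the path by following parents
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        if g_cur > g[cur]:
            continue                         # stale queue entry
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] != INF:
                ng = g_cur + cost[nr][nc]
                if ng < g.get(nxt, INF):
                    g[nxt] = ng
                    parent[nxt] = cur
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None                              # goal unreachable

# Plan on an initially free 5x5 map, then replan after a (hypothetical)
# scouting update reports that most of row 2 is impassable.
grid = [[1.0] * 5 for _ in range(5)]
path = astar(grid, (0, 0), (4, 4))
for col in range(4):
    grid[2][col] = INF
new_path = astar(grid, (0, 0), (4, 4))
```

Full replanning keeps the sketch short; an incremental planner that reuses previous search effort would be a natural alternative when updates arrive frequently.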

2603.00610 2026-06-12 cs.SD cs.AI cs.LG cs.MM eess.AS version update

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

Yinghao Ma, Haiwen Xia, Hewei Gao, Weixiong Chen, Yuxin Ye, Yuchen Yang, Sungkyun Chang, Mingshuo Ding, Yizhi Li, Ruibin Yuan, Simon Dixon, Emmanouil Benetos

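Affiliations: National University of Singapore; University of Science and Technology of China; University of Cambridge; University of Toronto

AI summary: Addressing the lack of effective evaluation mechanisms for music generation models, proposes the CMI-RewardBench benchmark together with a large-scale preference dataset and parameter-efficient reward models, enabling music quality assessment under multimodal instructions.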

Comments
Accepted by ICML 2026
Abstract

While music generation models have evolved to handle complex multimodal inputs mixing text, lyrics, and reference audio, evaluation mechanisms have lagged behind. In this paper, we bridge this critical gap by establishing a comprehensive ecosystem for music reward modeling under Compositional Multimodal Instruction (CMI), where the generated music may be conditioned on text descriptions, lyrics, and audio prompts. We first introduce CMI-Pref-Pseudo, a large-scale preference dataset comprising 110k pseudo-labeled samples, and CMI-Pref, a high-quality, human-annotated corpus tailored for fine-grained alignment tasks. To unify the evaluation landscape, we propose CMI-RewardBench, a unified benchmark that evaluates music reward models on heterogeneous samples across musicality, text-music alignment, and compositional instruction alignment. Leveraging these resources, we develop CMI reward models (CMI-RMs), a parameter-efficient reward model family capable of processing heterogeneous inputs. We evaluate their correlation with human judgment scores on musicality and alignment on CMI-Pref along with previous datasets. Further experiments demonstrate that CMI-RM not only correlates strongly with human judgments, but also enables effective inference-time scaling via top-k filtering. Code is available at GitHub (https://github.com/Haiwen-Xia/CMI-RewardBench). Model weights: CMI-RM (https://huggingface.co/HaiwenXia/CMI-RM). Datasets: CMI-Pref-Pseudo (https://huggingface.co/datasets/HaiwenXia/cmi-pref-pseudo) and CMI-Pref (https://huggingface.co/datasets/HaiwenXia/cmi-pref).
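Inference-time scaling via top-k filtering can be read as best-of-N selection: generate several candidate pieces, score each with the reward model, and keep the k highest-scoring ones. A minimal sketch, in which the lookup-table `reward` is a hypothetical stand-in for a real reward model (it is not the CMI-RM interface) and the candidate names are made up.

```python
from typing import Callable, List, Sequence, Tuple

def top_k_filter(candidates: Sequence[str], reward: Callable[[str], float],
                 k: int) -> List[Tuple[str, float]]:
    """Score every candidate with a reward function and keep the k best."""
    if k <= 0:
        return []
    scored = [(c, reward(c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)   # highest reward first
    return scored[:k]

# Hypothetical scores; a real reward model would score audio given the
# compositional instruction (text, lyrics, audio prompt).
toy_scores = {"clip_a": 0.31, "clip_b": 0.87, "clip_c": 0.55, "clip_d": 0.12}
best = top_k_filter(list(toy_scores), toy_scores.__getitem__, k=2)
```

Scoring all N candidates costs N reward-model calls, so N trades compute against the chance of finding a high-scoring sample.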

2603.00167 2026-06-12 cs.RO version update

EgoMoD: Predicting Global Maps of Dynamics from Local Egocentric Observations

Iacopo Catalano, David Morilla-Cabello, Jorge Pena-Queralta, Eduardo Montijano

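Affiliations: University of Turku, Finland; Centre for Artificial Intelligence, Zürich University of Applied Sciences, Winterthur, Switzerland; Instituto de Investigación en Ingeniería de Aragón, Universidad de Zaragoza, Spain

AI summary: Proposes EgoMoD, which uses short egocentric videos and a video- and pose-conditioned architecture to learn to predict global Maps of Dynamics from local observations, replacing traditional global sensing infrastructure and achieving zero-shot transfer to real systems.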

Abstract

Efficient navigation in dynamic environments requires anticipating how motion patterns evolve beyond the robot's immediate perceptual range, enabling preemptive rather than purely reactive planning in crowded scenes. Maps of Dynamics (MoDs) offer a structured representation of motion tendencies in space useful for long-term global planning, but constructing them traditionally requires global environment observations over extended periods of time. We introduce EgoMoD, the first approach that learns to predict future MoDs directly from short egocentric video clips collected during robot operation. Our method learns to infer environment-wide motion tendencies from local dynamic cues using a video- and pose-conditioned architecture trained with MoDs computed from external observations as privileged supervision, allowing local observations to serve as predictive signals of global motion structure. Thanks to this, we offer the capacity to forecast future motion dynamics over the whole environment rather than merely extend past patterns in the robot's field of view. As a site-specific dynamic prior, EgoMoD replaces the external global sensing infrastructure required by prior MoD methods at inference time with standard onboard sensors. Experiments in large simulated environments show that EgoMoD predicts future MoDs under limited observability, while evaluation with real images showcases its zero-shot transferability to real systems.
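EgoMoD is supervised with MoDs computed from external observations. As a rough illustration of what such a target map can look like, the sketch below builds one of the simplest MoD variants, a per-cell normalized histogram of motion directions. The cell size, number of bins, input format and the `direction_mod` helper are assumptions; the MoD representation used in the paper may be richer, and EgoMoD's learned predictor is not reproduced here.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Obs = Tuple[float, float, float, float]  # x, y, vx, vy of one observed agent

def direction_mod(observations: Iterable[Obs], cell: float = 1.0,
                  bins: int = 8) -> Dict[Tuple[int, int], List[float]]:
    """Per-cell normalized histogram of motion directions (a simple MoD)."""
    counts: DefaultDict[Tuple[int, int], List[float]] = defaultdict(lambda: [0.0] * bins)
    width = 2 * math.pi / bins
    for x, y, vx, vy in observations:
        if vx == 0 and vy == 0:
            continue  # a static observation carries no direction
        angle = math.atan2(vy, vx) % (2 * math.pi)
        key = (math.floor(x / cell), math.floor(y / cell))
        counts[key][int(angle // width) % bins] += 1.0
    return {k: [c / sum(h) for c in h] for k, h in counts.items()}

# Hypothetical externally observed motion: mostly eastward flow in cell (0, 0).
obs = [(0.5, 0.5, 1.0, 0.0)] * 3 + [(0.5, 0.5, 0.0, 1.0), (3.2, 1.1, 0.0, 0.0)]
mod = direction_mod(obs)
```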

2603.00025 2026-06-12 cs.CL version update

TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation

Samah Fodeh, Linhai Ma, Ganesh Puthiaraju, Srivani Talakokkul, Afshan Khan, Sreeraj Ramachandran, Elyas Irankhah, Muhammad Arif, Ashley Hagaman, Sarah R. Lowe, Aimee Kendall Roundtree

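Affiliations: Yale University; Texas State University

AI summary: Targeting the gradient dilution and token erosion that arise in structured prediction when preferred and rejected objects differ in only a few tokens, proposes TAB-PO, built on confusion-aware preference construction and a token-level adaptive barrier, which substantially improves key metrics on the SciERC task.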

Abstract

Direct Preference Optimization (DPO) is an effective and widely adopted approach for offline alignment but is poorly matched to ontology-driven structured prediction, where preferred and rejected JSON objects often differ in only a few schema-defining tokens. In this low-edit-distance regime, sequence-level DPO spreads gradient mass across non-critical serialization tokens (gradient dilution) and can reduce likelihood on rare, under-confident preferred schema tokens (token erosion). To address these limitations, we first develop a confusion-aware preference-construction strategy that augments expert-curated ambiguity patterns with empirical structured-error modes estimated from validation-set SFT predictions, synthesizing minimally perturbed, schema-valid negatives that focus preference learning on realistic ontology-level decision errors. We then introduce Token-Adaptive Barrier Preference Optimization (TAB-PO), a post-SFT objective for token-critical structured generation. TAB-PO adds a confidence-gated token-level barrier that applies supervised anchoring to under-confident schema tokens. On the public SciERC scientific information extraction task, evaluated with Llama/Qwen models from 1.5B to 70B, TAB-PO improves ontology-critical semantic-label and relational-linking metrics over SFT by 11.59% on average, wins 100% of comparisons against the strongest token-level and sequence-level DPO variants on these metrics, and surpasses leading frontier models by 14.71%, while delivering strong gains in textual grounding.
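The contrast between sequence-level DPO and a token-level anchor can be sketched numerically. The `dpo_loss` function is the standard DPO objective on summed log-probabilities. The `token_barrier` term is only our own guess at what a "confidence-gated token-level barrier" might look like (an NLL penalty on chosen tokens whose probability falls below a hypothetical threshold `tau`, weighted by a hypothetical `lam`); it is not the TAB-PO formula from the paper.

```python
import math
from typing import Sequence

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(logp_w: float, ref_w: float, logp_l: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss -log sigmoid(beta * margin) on summed sequence log-probs.

    w = chosen output, l = rejected output; ref_* come from the frozen reference model.
    """
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return softplus(-margin)

def token_barrier(chosen_token_logps: Sequence[float], tau: float = 0.5) -> float:
    """HYPOTHETICAL gate: mean NLL contributed only by chosen tokens with p < tau."""
    if not chosen_token_logps:
        return 0.0
    gated = [-lp for lp in chosen_token_logps if math.exp(lp) < tau]
    return sum(gated) / len(chosen_token_logps)

def combined_loss(logp_w: float, ref_w: float, logp_l: float, ref_l: float,
                  chosen_token_logps: Sequence[float], lam: float = 1.0,
                  beta: float = 0.1) -> float:
    # lam is a hypothetical weight on the illustrative token-level term.
    return dpo_loss(logp_w, ref_w, logp_l, ref_l, beta) + lam * token_barrier(chosen_token_logps)
```

Because the DPO term sees only the summed log-probability of the whole JSON object, a single low-confidence schema token barely moves it; a per-token term like the one above targets exactly those tokens.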

2510.02524 2026-06-12 cs.CL cs.FL cs.LG version update

Unraveling Syntax: Language Modeling and the Substructure of Grammars

Laura Ying Schulz, Daniel Mitropolsky, Tomaso Poggio

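Affiliations: Massachusetts Institute of Technology

AI summary: Studies how language models learn with respect to the substructure of context-free grammars, proves that the loss recurses linearly over top-level subgrammars, and finds that parametrized models learn subgrammars in parallel; subgrammar pretraining improves the performance of tiny models and yields internal representations that better reflect the grammar's substructure.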

Comments
Equal contribution by LYS and DM. Accepted to the 43rd International Conference on Machine Learning (ICML 2026)
Abstract

While language models achieve impressive results, their learning dynamics are far from understood. Many domains of interest -- such as natural language syntax, coding languages, arithmetic -- are captured by context-free grammars (CFGs). In this work, we extend prior work on neural language modeling of CFGs in a novel direction: how language modeling behaves with respect to CFG substructure, namely subgrammars. We define subgrammars, and prove a set of fundamental theorems connecting language modeling and subgrammars. We show that language modeling loss recurses linearly over its top-level subgrammars; applied recursively, the loss decomposes into losses for "irreducible" subgrammars. Under additional assumptions, and empirically, parametrized models learn subgrammars in parallel, unlike children who first master simple substructures. We find that subgrammar pretraining can improve final performance, but only for tiny models relative to the grammar, while alignment analyses show that pretraining consistently leads to internal representations that better reflect the grammar's substructure.
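A toy numerical illustration of what a loss that decomposes linearly over top-level subgrammars can look like: for a top rule S -> A | B whose sub-languages are disjoint, the entropy of the whole grammar (the best achievable cross-entropy loss) equals the entropy of the top-level choice plus the probability-weighted entropies of the subgrammars. This is the standard grouping property of Shannon entropy under our simplifying assumptions (two branches, disjoint finite languages, made-up probabilities); it is an analogy, not the paper's theorem or proof.

```python
import math
from typing import Dict

def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (nats) of a finite distribution over strings."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def expand(p: float, dist_a: Dict[str, float], dist_b: Dict[str, float]) -> Dict[str, float]:
    """String distribution of a toy rule S -> A (prob p) | B (prob 1 - p)."""
    if set(dist_a) & set(dist_b):
        raise ValueError("this sketch assumes the two sub-languages are disjoint")
    out = {s: p * q for s, q in dist_a.items()}
    out.update({s: (1 - p) * q for s, q in dist_b.items()})
    return out

# Hypothetical subgrammar distributions over their (finite) languages.
dist_a = {"a a": 0.5, "a b": 0.25, "b b": 0.25}
dist_b = {"( )": 0.9, "( ( ) )": 0.1}
p = 0.3

lhs = entropy(expand(p, dist_a, dist_b))                 # full-grammar loss floor
choice = -(p * math.log(p) + (1 - p) * math.log(1 - p))  # top-level choice term
rhs = choice + p * entropy(dist_a) + (1 - p) * entropy(dist_b)
```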

2602.22629 2026-06-12 cs.CV version update

CRAG: Can 3D Generative Models Help 3D Assembly?

Zeyu Jiang, Sihang Li, Siqi Tan, Chenyang Xu, Juexiao Zhang, Julia Galway-Witham, Xue Wang, Scott A. Williams, Radu Iovita, Chen Feng, Jing Zhang

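Affiliations: University of California, Berkeley

AI summary: Proposes CRAG, which jointly optimizes 3D assembly and shape generation so that generating complete shapes and predicting part poses reinforce each other, achieving state-of-the-art performance across diverse geometries, part counts, and missing pieces.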

Comments
15 pages, 8 figures
Abstract

Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation. We show that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly. Unlike prior methods that cannot synthesize missing geometry, we propose CRAG, which simultaneously generates plausible complete shapes and predicts poses for input parts. Extensive experiments demonstrate state-of-the-art performance across in-the-wild objects with diverse geometries, varying part counts, and missing pieces. Project Page: https://ai4ce.github.io/CRAG/

2602.00462 2026-06-12 cs.CV cs.AI version update

LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs

Benno Krojer, Shravan Nayak, Oscar Mañas, Vaibhav Adlakha, Desmond Elliott, Siva Reddy, Marius Mosbach

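Affiliations: University of Cambridge

AI summary: Proposes LatentLens, which interprets visual tokens by nearest-neighbor matching against contextualized token representations from a text corpus, and finds that most visual tokens are interpretable at every layer.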

Comments
ICML 2026 (Camera Ready)
Abstract

Transforming a large language model (LLM) into a vision-language model (VLM) can be achieved by mapping the visual tokens from a vision encoder into the embedding space of an LLM. Intriguingly, this mapping can be as simple as a shallow MLP transformation. To understand why LLMs can so readily process visual tokens, we need interpretability methods that reveal what is encoded in the visual token representations at every layer of LLM processing. In this work, we introduce LatentLens, a novel approach for mapping latent representations to descriptions in natural language. LatentLens encodes a large text corpus and stores contextualized token representations for each token in that corpus. Visual token representations are then compared to these contextualized representations and the top-nearest neighbor representations serve as descriptions of the visual token. We evaluate this method on 15 different VLMs, showing that commonly used methods, such as LogitLens, substantially underestimate the interpretability of visual tokens. With LatentLens instead, the majority of visual tokens are interpretable across all studied models and all layers. Qualitatively, we show that the descriptions produced by LatentLens are semantically meaningful and provide more fine-grained interpretations for humans compared to individual tokens. More broadly, our findings contribute new evidence on the alignment between vision and language representations and open up new directions for analyzing the latent representations of LLMs.
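The retrieval step described above amounts to a nearest-neighbour lookup. A minimal sketch, assuming cosine similarity as the metric and a tiny hand-made bank of (token-in-context, hidden-state) pairs; the `describe` helper and the vectors are hypothetical, and the paper may use a different similarity, corpus size or indexing scheme.

```python
import math
from typing import List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def describe(visual: Sequence[float], bank: List[Tuple[str, Sequence[float]]],
             k: int = 2) -> List[str]:
    """Labels of the k stored contextual representations most similar to a visual token."""
    ranked = sorted(bank, key=lambda item: cosine(visual, item[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

# Hypothetical bank of contextualized token representations from one LLM layer.
bank = [
    ("'dog' in 'a dog barks'", [0.9, 0.1, 0.0]),
    ("'cat' in 'the cat sleeps'", [0.7, 0.3, 0.1]),
    ("'sky' in 'blue sky'", [0.0, 0.2, 0.9]),
]
labels = describe([1.0, 0.2, 0.0], bank, k=2)
```

Returning a token together with its sentence context is what lets such descriptions be more specific than a single vocabulary item.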

2602.18154 2026-06-12 cs.CL cs.AI cs.DB version update

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Mirae Kim, Seonghun Jeong, Youngjun Kwak

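AI summary: Addressing the scarcity of resources for multimodal jailbreak detection in finance, proposes FENCE, a dataset of bilingual Korean-English text and images for training and evaluating detectors; a baseline detector trained on it reaches 99% in-distribution accuracy.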

Comments
Accepted at LREC 2026
Abstract

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images, creating broader attack surfaces. However, available resources for jailbreak detection are scarce, particularly in finance. To address this gap, we present FENCE, a bilingual (Korean-English) multimodal dataset for training and evaluating jailbreak detectors in financial applications. FENCE emphasizes domain realism through finance-relevant queries paired with image-grounded threats. Experiments with commercial and open-source VLMs reveal consistent vulnerabilities, with GPT-4o showing measurable attack success rates and open-source models displaying greater exposure. A baseline detector trained on FENCE achieves 99 percent in-distribution accuracy and maintains strong performance on external benchmarks, underscoring the dataset's robustness for training reliable detection models. FENCE provides a focused resource for advancing multimodal jailbreak detection in finance and for supporting safer, more reliable AI systems in sensitive domains. Warning: This paper includes example data that may be offensive.

2602.15424 2026-06-12 cs.RO version update

Lyapunov-Based PI-Like Control for Robust Trajectory Tracking of a Four-Wheel Independently Driven and Steered Robot: Design and Experimental Validation

Branimir Ćaran, Vladimir Milić, Marko Švaco, Bojan Jerbić

Affiliations: Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb; Regional Centre of Excellence for Robotic Technology (CRTA); Croatian Academy of Sciences and Arts

AI summary: Proposes a Lyapunov-based PI-like controller with model-based feedforward compensation for robust trajectory tracking of a four-wheel independently driven and steered robot, and validates it experimentally against PI and sliding-mode controllers.

Comments
This work has been submitted to the IEEE for possible publication
Abstract

In this paper, a Lyapunov-based synthesis of a PI-like controller is proposed for robust trajectory tracking of an independently driven and steered four-wheel mobile robot. For the robot considered in this work, an explicit structurally verified mathematical model is used to enable systematic controller design with rigorous stability guarantees suitable for real-time implementation. An augmented Lyapunov-based practical stability analysis is developed for the combined velocity-error and integral-error dynamics of the inner loop, yielding explicit bounds and sufficient conditions for practical stability and uniform ultimate boundedness of the combined velocity-error and integral-error state. The resulting control law retains a PI-like structure with model-based feedforward compensation, making it suitable for implementation on standard embedded platforms while improving robustness against configuration-dependent residual dynamics and unmodelled effects. The effectiveness and robustness of the proposed design are demonstrated experimentally on a four-wheel independently steered and independently driven mobile robot platform under both horizontal and vertical operating conditions, and benchmarked against a PI controller and a sliding-mode controller.
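The general shape of a PI-like law with model-based feedforward can be illustrated on a much simpler system. This sketch is not the paper's Lyapunov-derived controller: it assumes a hypothetical first-order wheel-speed model v' = -a v + b u + d with made-up parameters, a feedforward term that inverts the nominal model, and PI feedback on the tracking error e = v_ref - v. Under these assumptions the error obeys e' = -(a + kp) e - ki z - d with z' = e, so the integral state z settles at -d/ki and absorbs the unknown constant disturbance d.

```python
import math

def simulate(kp: float = 4.0, ki: float = 6.0, t_end: float = 10.0,
             dt: float = 1e-3) -> float:
    """Euler simulation of v' = -a v + b u + d under PI feedback plus model feedforward.

    Returns the absolute tracking error at t_end for the reference v_ref = sin(t).
    a, b form the nominal model known to the controller; d is a constant
    disturbance the controller does not know.
    """
    a, b, d = 1.0, 2.0, -0.5
    v, z, t = 0.0, 0.0, 0.0                  # wheel speed, integral of error, time
    while t < t_end:
        v_ref, dv_ref = math.sin(t), math.cos(t)
        e = v_ref - v
        u_ff = (dv_ref + a * v_ref) / b      # model-based feedforward
        u_fb = (kp * e + ki * z) / b         # PI-like feedback
        v += dt * (-a * v + b * (u_ff + u_fb) + d)   # plant step
        z += dt * e
        t += dt
    return abs(math.sin(t) - v)

final_error = simulate()
```

With `ki=0.0` the same loop settles at a steady-state error of about |d|/(a + kp) = 0.1, which is the offset the integral action removes.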

2602.14367 2026-06-12 cs.CL cs.AI cs.IR cs.LG version update

InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem

Shuofei Qiao, Yunxiang Wei, Xuehai Wang, Bin Wu, Boyang Xue, Ningyu Zhang, Hossein A. Rahmani, Yanshan Wang, Qiang Zhang, Keyan Ding, Jeff Z. Pan, Huajun Chen, Emine Yilmaz

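Affiliations: University of Science and Technology of China

AI summary: Proposes InnoEval, a framework that combines heterogeneous deep knowledge retrieval with a multi-perspective review board to perform knowledge-grounded, multi-dimensional decoupled evaluation, outperforming baselines on point-wise, pair-wise, and group-wise evaluation tasks.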

Comments
ICML 2026
Abstract

The rapid evolution of Large Language Models has catalyzed a surge in scientific idea production, yet this leap has not been accompanied by a matching advance in idea evaluation. The fundamental nature of scientific evaluation needs knowledgeable grounding, collective deliberation, and multi-criteria decision-making. However, existing idea evaluation methods often suffer from narrow knowledge horizons, flattened evaluation dimensions, and the inherent bias in LLM-as-a-Judge. To address these, we regard idea evaluation as a knowledge-grounded, multi-perspective reasoning problem and introduce InnoEval, a deep innovation evaluation framework designed to emulate human-level idea assessment. We apply a heterogeneous deep knowledge search engine that retrieves and grounds dynamic evidence from diverse online sources. We further achieve review consensus with an innovation review board containing reviewers with distinct academic backgrounds, enabling a multi-dimensional decoupled evaluation across multiple metrics. We construct comprehensive datasets derived from authoritative peer-reviewed submissions to benchmark InnoEval. Experiments demonstrate that InnoEval can consistently outperform baselines in point-wise, pair-wise, and group-wise evaluation tasks, exhibiting judgment patterns and consensus highly aligned with human experts.

2602.12753 2026-06-12 cs.LG version update

Hierarchical Successor Representation for Robust Transfer

Changmin Yu, Máté Lengyel

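Affiliations: University of Cambridge; DeepMind

AI summary: Proposes the Hierarchical Successor Representation (HSR), which builds robust state features through temporal abstraction and, combined with non-negative matrix factorization, yields sparse low-rank representations that support efficient task transfer and exploration in multi-compartment environments.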

Abstract

The successor representation (SR) provides a powerful framework for decoupling predictive dynamics from rewards, enabling rapid generalisation across reward configurations. However, the classical SR is limited by its inherent policy dependence: policies change due to ongoing learning, environmental non-stationarities, and changes in task demands, making established predictive representations obsolete. Furthermore, in topologically complex environments, SRs suffer from spectral diffusion, leading to dense and overlapping features that scale poorly. Here we propose the Hierarchical Successor Representation (HSR) for overcoming these limitations. By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes. Applying non-negative matrix factorisation (NMF) to the HSR yields a sparse, low-rank state representation that facilitates highly sample-efficient transfer to novel tasks in multi-compartmental environments. Further analysis reveals that HSR-NMF discovers interpretable topological structures, providing a policy-agnostic hierarchical map that effectively bridges model-free optimality and model-based flexibility. Beyond providing a useful basis for task-transfer, we show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.
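For reference, the classical SR that HSR builds on can be computed for a fixed policy as M = (I - γP)^{-1}, where P is the policy's state-transition matrix; values for any reward vector r are then V = M r. The sketch below uses the fixed-point iteration M <- I + γ P M on a hypothetical 3-state chain; HSR's temporal abstraction and NMF steps are not shown. Because M depends on P, a change of policy changes M, which is the policy dependence the abstract describes.

```python
from typing import List

Matrix = List[List[float]]

def successor_representation(P: Matrix, gamma: float, iters: int = 500) -> Matrix:
    """Fixed-point iteration M <- I + gamma * P @ M, converging to (I - gamma P)^-1."""
    n = len(P)
    M = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        M = [[float(i == j) + gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

# Policy-induced transitions of a 3-state chain: 0 -> 1 -> 2, with state 2 absorbing.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
gamma = 0.9
M = successor_representation(P, gamma)
reward = [0.0, 0.0, 1.0]
values = [sum(M[s][k] * reward[k] for k in range(3)) for s in range(3)]
```

Swapping in a different `reward` vector reuses `M` unchanged, which is the reward decoupling the SR provides.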

2505.11846 2026-06-12 cs.LG math.AG version update

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn

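Affiliations: Department of Mathematics, KTH Royal Institute of Technology

AI summary: Studies the function spaces (neuromanifolds) of MLPs and CNNs with polynomial activations, proving that the MLP parametrization is generically finite-to-one and the CNN parametrization generically one-to-one, and characterizes singularities as arising from sparse subnetworks, which explains the sparsity bias of MLPs.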

Comments
Published at ICLR 2026
Abstract

We study function spaces parametrized by neural networks, referred to as neuromanifolds. Specifically, we focus on deep Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) with an activation function that is a sufficiently generic polynomial. First, we address the identifiability problem, showing that, for almost all functions in the neuromanifold of an MLP, there exist only finitely many parameter choices yielding that function. For CNNs, the parametrization is generically one-to-one. As a consequence, we compute the dimension of the neuromanifold. Second, we describe singular points of neuromanifolds. We characterize singularities completely for CNNs, and partially for MLPs. In both cases, they arise from sparse subnetworks. For MLPs, we prove that these singularities often correspond to critical points of the mean-squared error loss, which does not hold for CNNs. This provides a geometric explanation of the sparsity bias of MLPs. All of our results leverage tools from algebraic geometry.
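A quick numerical check of one reason an MLP parametrization is finite-to-one rather than one-to-one: relabelling the hidden neurons changes the parameter vector but not the function, giving n! equivalent parameter choices for n hidden units. The activation polynomial, the weights and the bias-free single-hidden-layer architecture are arbitrary assumptions; the paper's result covers deep networks with generic polynomial activations, and this check does not show that no other symmetries exist.

```python
import itertools
from typing import Sequence

def sigma(t: float) -> float:
    # A polynomial activation; the coefficients are an arbitrary choice.
    return t ** 3 + 0.5 * t ** 2 - 2.0 * t

def mlp(W: Sequence[Sequence[float]], a: Sequence[float], x: Sequence[float]) -> float:
    """One hidden layer, no biases: f(x) = sum_i a_i * sigma(<w_i, x>)."""
    return sum(ai * sigma(sum(wij * xj for wij, xj in zip(wi, x))) for ai, wi in zip(a, W))

W = [[1.0, -0.5], [0.3, 2.0], [-1.2, 0.7]]   # hidden-layer weights (3 neurons)
a = [0.8, -1.1, 0.4]                          # output weights
xs = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.4]]   # probe inputs

# Every permutation of the hidden neurons yields the same outputs on the probes.
same_function = []
for perm in itertools.permutations(range(3)):
    W_p = [W[i] for i in perm]
    a_p = [a[i] for i in perm]
    same_function.append(all(abs(mlp(W_p, a_p, x) - mlp(W, a, x)) < 1e-9 for x in xs))
```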

2602.12024 2026-06-12 cs.RO version update

Adaptive-Horizon Conflict-Based Search for Closed-Loop Multi-Agent Path Finding

Jiarui Li, Federico Pecora, Runyu Zhang, Gioele Zardini

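Affiliations: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology; Schwarzman College of Computing

AI summary: Proposes ACCBS, which dynamically adjusts the planning horizon and reuses a single constraint tree to quickly produce high-quality feasible solutions under a limited computational budget, combining asymptotic optimality with adaptability to disturbances.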

Abstract

Multi-Agent Path Finding (MAPF) is a core coordination problem for large robot fleets in automated warehouses and logistics. Existing approaches are typically either open-loop planners, which generate fixed trajectories and struggle to handle disturbances, or closed-loop heuristics without reliable performance guarantees, limiting their use in safety-critical deployments. This paper presents ACCBS, a closed-loop algorithm built on a finite-horizon variant of Conflict-Based Search (CBS) with a horizon-changing mechanism inspired by iterative deepening in model predictive control (MPC). ACCBS dynamically adjusts the planning horizon based on the available computational budget, and reuses a single constraint tree to enable seamless transitions between horizons. As a result, it produces high-quality feasible solutions quickly while being asymptotically optimal as the budget increases, exhibiting anytime behavior. Extensive case studies demonstrate that ACCBS combines flexibility in handling disturbances with strong performance guarantees, effectively bridging the gap between theoretical optimality and practical robustness for large-scale robot deployment.
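CBS repeatedly finds the earliest conflict between agents' individually planned paths and branches on constraints that resolve it. The sketch below shows only that conflict check for two agents on a grid (vertex conflicts and swap/edge conflicts). The `max_t` cutoff is our simplified stand-in for a finite planning horizon in which later conflicts are ignored; ACCBS's horizon-changing mechanism and constraint-tree reuse are not reproduced.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]
Conflict = Tuple[str, int, Tuple[Cell, ...]]

def position(path: List[Cell], t: int) -> Cell:
    # After reaching the end of its path, an agent waits at its goal.
    return path[t] if t < len(path) else path[-1]

def first_conflict(p1: List[Cell], p2: List[Cell],
                   max_t: Optional[int] = None) -> Optional[Conflict]:
    """Earliest vertex or swap (edge) conflict between two timed paths, or None.

    If max_t is given, only time steps t < max_t are checked.
    """
    horizon = max(len(p1), len(p2))
    limit = horizon if max_t is None else min(horizon, max_t)
    for t in range(limit):
        a, b = position(p1, t), position(p2, t)
        if a == b:
            return ("vertex", t, (a,))
        a_next, b_next = position(p1, t + 1), position(p2, t + 1)
        if a == b_next and b == a_next:
            return ("edge", t, (a, a_next))
    return None

swap = first_conflict([(0, 0), (0, 1)], [(0, 1), (0, 0)])
p1, p2 = [(0, 0), (1, 0), (2, 0)], [(1, 1), (1, 0)]
vertex = first_conflict(p1, p2)
ignored = first_conflict(p1, p2, max_t=1)   # conflict at t=1 lies beyond the cutoff
```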

2602.09730 2026-06-12 cs.CV cs.LG cs.NA math.NA version update

Allure of Craquelure: A Variational-Generative Approach to Crack Detection in Paintings

Laura Paul, Holger Rauhut, Martin Burger, Samira Kabri, Tim Roith

Affiliations: Dept. of Mathematics, LMU Munich; Munich Center for Machine Learning; Helmholtz Imaging, Deutsches Elektronen-Synchrotron DESY; Fachbereich Mathematik, University of Hamburg; CIT School, Technical University of Munich

AI summary: Proposes a hybrid method that models crack detection as an inverse problem, using a deep generative model as a prior for the painting together with a Mumford-Shah-type variational functional and a crack prior; joint optimization yields a pixel-level crack localization map.

Abstract

Recent advances in imaging technologies, deep learning and numerical performance have enabled non-invasive detailed analysis of artworks, supporting their documentation and conservation. In particular, automated detection of craquelure in digitized paintings is crucial for assessing degradation and guiding restoration, yet remains challenging due to the possibly complex scenery and the visual similarity between cracks and crack-like artistic features such as brush strokes or hair. We propose a hybrid approach that models crack detection as an inverse problem, decomposing an observed image into a crack-free painting and a crack component. A deep generative model is employed as a powerful prior for the underlying artwork, while crack structures are captured using a Mumford--Shah-type variational functional together with a crack prior. Joint optimization yields a pixel-level map of crack localizations in the painting.

2602.07106 2026-06-12 cs.CV cs.AI cs.CL version update

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Haoyu Zhang, Zhipeng Li, Yiwen Guo, Tianshu Yu

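Affiliations: The Chinese University of Hong Kong, Shenzhen; LIGHTSPEED; Independent Researcher

AI summary: Proposes Ex-Omni, which decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, and introduces a unified token-as-query gated fusion mechanism, enabling omni-modal large language models to generate speech and 3D facial animation jointly.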

Abstract

Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet extending them to jointly produce speech and 3D facial animation remains largely unexplored despite its importance for natural human-computer interaction. A key challenge is the mismatch between the discrete semantic reasoning of LLMs and the dense temporal dynamics required for 3D facial motion. We propose Expressive Omni (Ex-Omni), an open-source model that augments OLLMs with native speech-accompanied 3D facial animation. Ex-Omni decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, where speech units provide temporal scaffolding and hidden speech representations carry facially relevant cues. We further introduce a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection, as well as InstructS2SF-1200K, a dataset consisting of 1200K samples for pre-training. Extensive experiments show that Ex-Omni maintains competitive speech understanding and generation ability while achieving better audio-visual synchronization and lower face-generation latency than cascaded pipelines.

2602.04675 2026-06-12 cs.LG version update

Generalized Schrödinger Bridge on Graphs

Panagiotis Theodoropoulos, Juno Nam, Evangelos Theodorou, Jaemoo Choi

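Affiliations: University of California, Berkeley

AI summary: Proposes GSBoG, a framework that uses likelihood optimization to learn controlled continuous-time Markov chain policies on graphs that satisfy endpoint marginals while optimizing intermediate state costs, enabling scalable, topology-aware transport.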

Abstract

Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, existing graph-transport methods lack this expressivity. They rely on restrictive assumptions, fail to generalize across sparse topologies, and scale poorly with graph size and time horizon. To address these issues, we introduce Generalized Schrödinger Bridge on Graphs (GSBoG), a novel scalable data-driven framework for learning executable controlled continuous-time Markov chain (CTMC) policies on arbitrary graphs under state-cost-augmented dynamics. Notably, GSBoG learns trajectory-level policies, avoiding dense global solvers and thereby enhancing scalability. This is achieved via a likelihood optimization approach that satisfies the endpoint marginals while simultaneously optimizing intermediate behavior under state-dependent running costs. Extensive experimentation on challenging real-world graph topologies shows that GSBoG reliably learns accurate, topology-respecting policies while optimizing application-specific intermediate state costs, highlighting its broad applicability and paving new avenues for cost-aware dynamical transport on general graphs.
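
To make "executable CTMC policy on a graph" concrete, here is a minimal sketch of what executing such a policy involves: a rate matrix whose off-diagonal entries are nonzero only on graph edges, simulated with the standard Gillespie algorithm. The rates are fixed and hand-picked for illustration (`simulate_ctmc` is a hypothetical helper); GSBoG's learned, state-cost-aware controlled rates are not reproduced here.

```python
import numpy as np

def simulate_ctmc(rates, start, horizon, rng):
    """Gillespie simulation of a continuous-time Markov chain.

    rates[i, j] >= 0 is the jump rate i -> j (i != j), zero where no edge exists.
    Returns the list of (time, state) pairs visited up to `horizon`.
    """
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        out = rates[state].copy()
        out[state] = 0.0                      # ignore any diagonal entry
        total = out.sum()
        if total == 0.0:                      # absorbing state: no outgoing edges
            break
        t += rng.exponential(1.0 / total)     # holding time ~ Exp(total exit rate)
        if t > horizon:
            break
        state = rng.choice(len(out), p=out / total)  # jump target proportional to rates
        path.append((t, state))
    return path

# Path graph 0 - 1 - 2 - 3: jumps are only possible along edges.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
rates = 2.0 * adjacency
path = simulate_ctmc(rates, start=0, horizon=5.0, rng=np.random.default_rng(0))
# Every consecutive pair of visited states is connected by an edge.
assert all(adjacency[a, b] == 1 for (_, a), (_, b) in zip(path, path[1:]))
```

Because transitions are drawn only from a node's outgoing edges, any trajectory produced this way respects the topology by construction; a learned policy would replace the constant `rates` with time- and state-dependent ones.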

2601.09693 2026-06-12 cs.LG stat.ML Version update

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

Lisa Schneckenreiter, Sohvi Luukkonen, Lukas Friedrich, Daniel Kuhn, Günter Klambauer

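Affiliations: DeepMind Ltd

AI summary: Proposes ConGLUDe, a contrastive geometric model that unifies structure- and ligand-based training to support virtual screening, target fishing, and ligand-conditioned pocket prediction, performing strongly across multiple benchmarks.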

Comments
Forty-Third International Conference on Machine Learning
Abstract

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.
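
The contrastive alignment step can be illustrated with a standard symmetric InfoNCE loss between ligand and protein embeddings, where matched pairs sit on the diagonal of a similarity matrix. This is a generic sketch: the abstract does not say how multiple candidate binding sites or bioactivity data enter ConGLUDe's loss, and the `info_nce` helper and its temperature of 0.07 are assumptions.

```python
import numpy as np

def log_softmax(z, axis):
    # Numerically stable log-softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def info_nce(ligand_emb, protein_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each matrix is a matched ligand-protein pair."""
    lig = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    pro = protein_emb / np.linalg.norm(protein_emb, axis=1, keepdims=True)
    logits = lig @ pro.T / temperature                         # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))
    loss_l2p = -log_softmax(logits, axis=1)[idx, idx].mean()  # ligand -> protein retrieval
    loss_p2l = -log_softmax(logits, axis=0)[idx, idx].mean()  # protein -> ligand retrieval
    return 0.5 * (loss_l2p + loss_p2l)

rng = np.random.default_rng(0)
proteins = rng.normal(size=(16, 32))
aligned = info_nce(proteins + 0.01 * rng.normal(size=(16, 32)), proteins)
shuffled = info_nce(rng.normal(size=(16, 32)), proteins)
print(aligned, shuffled)
```

The two directions of the loss correspond loosely to the two retrieval uses named in the abstract: ranking ligands for a protein (virtual screening) and ranking proteins for a ligand (target fishing).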

2505.13102 2026-06-12 cs.LG cs.AI eess.SP Version update

Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast

Ji Qi, Tam Thuc Do, Mingxiao Liu, Zhuoshi Pan, Yuzhe Li, Gene Cheung, H. Vicky Zhao

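Affiliations: University of Science and Technology of China

AI summary: Proposes a lightweight, interpretable transformer-like network built by unrolling a mixed-graph optimization algorithm for spatio-temporal traffic forecasting, achieving competitive performance with far fewer parameters.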

Comments
24 pages, 7 figures, 11 tables
Abstract

Unlike conventional "black-box" transformers with the classical self-attention mechanism, we build a lightweight and interpretable transformer-like neural net by unrolling a mixed-graph-based optimization algorithm to forecast traffic with spatial and temporal dimensions. We construct two graphs: an undirected graph $\mathcal{G}^u$ capturing spatial correlations across geography, and a directed graph $\mathcal{G}^d$ capturing sequential relationships over time. We predict future samples of signal $\mathbf{x}$, assuming it is "smooth" with respect to both $\mathcal{G}^u$ and $\mathcal{G}^d$, where we design new $\ell_2$ and $\ell_1$-norm variational terms to quantify and promote signal smoothness (low-frequency reconstruction) on a directed graph. We design an iterative algorithm based on the alternating direction method of multipliers (ADMM), and unroll it into a feed-forward network for data-driven parameter learning. We periodically insert graph learning modules for $\mathcal{G}^u$ and $\mathcal{G}^d$ that play the role of self-attention. Experiments show that our unrolled networks achieve traffic forecast performance competitive with state-of-the-art prediction schemes, while reducing parameter counts drastically.
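
Smoothness of a signal with respect to an undirected graph is commonly measured by the Laplacian quadratic form $\mathbf{x}^\top L \mathbf{x} = \sum_{(i,j)} w_{ij}(x_i - x_j)^2$, and using it as a prior gives a closed-form denoiser. The sketch below covers only this standard undirected $\ell_2$ term; the paper's new directed-graph $\ell_2$/$\ell_1$ terms and its ADMM unrolling are not reproduced, and the helper names and `mu` value are illustrative assumptions.

```python
import numpy as np

def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix."""
    return np.diag(weights.sum(axis=1)) - weights

def smoothness(x, weights):
    """x^T L x, i.e. the sum over undirected edges of w_ij * (x_i - x_j)^2."""
    return float(x @ laplacian(weights) @ x)

def smooth_denoise(y, weights, mu=1.0):
    """argmin_x ||y - x||^2 + mu * x^T L x, solved via (I + mu L) x = y."""
    n = len(y)
    return np.linalg.solve(np.eye(n) + mu * laplacian(weights), y)

# Four road sensors along a line, equal edge weights.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([10.0, 30.0, 12.0, 11.0])    # noisy spike at sensor 1
x_hat = smooth_denoise(y, w, mu=1.0)
print(smoothness(y, w), smoothness(x_hat, w))
```

Since the rows and columns of $L$ sum to zero, this filter preserves the total signal while shrinking differences across edges, which is the "low-frequency reconstruction" behavior the abstract refers to.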

2602.02181 2026-06-12 cs.RO Version update

Extending the Law of Intersegmental Coordination: Implications for Powered Prosthetic Controls

Elad Siman Tov, Nili E. Krausz

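Affiliations: Faculty of Mechanical Engineering, Technion – Israel Institute of Technology

AI summary: Targeting the metabolic cost of walking for lower-limb amputees, proposes a prosthetic control framework based on the Law of Intersegmental Coordination, extends the law to a law of coordination of moments by analyzing 3D kinematic data, and releases an open-source toolbox.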

Comments
Submitted to 2026 IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob)
Abstract

Powered prostheses are capable of providing net positive work to amputees and have advanced over the past two decades. However, reducing amputee metabolic cost of walking remains an open problem. The Law of Intersegmental Coordination (ISC) has been observed across gaits and previously implicated in energy expenditure of walking, yet it has rarely been analyzed or applied within the context of lower-limb amputee gait. This law states that the elevation angles of the thigh, shank and foot over the gait cycle covary. In this work, we developed a method to analyze intersegmental coordination for lower-limb 3D kinematic data, to simplify ISC analysis. Moreover, inspired by motor control, biomechanics and robotics literature, we used our method to extend ISC to a new law of coordination of moments. We identify these Elevation Space Moments (ESM), and present results showing a moment-based coordination for able-bodied gait. We also analyzed ISC for amputee gait with powered and passive prostheses, and found that while elevation angles remained planar, the ESM lacked planar coordination. We present an ISC-driven powered prosthetic control framework, using healthy coordination as a constraint to predict the shank angles/moments to compensate for alterations due to a passive foot. We developed the ISC3d toolbox that is freely available online, which may be used to compute kinematic and kinetic ISC in 3D. This provides a means to further study the role of coordination in gait and may help address fundamental questions of the neural control of human movement.
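
Planarity of intersegmental coordination is commonly quantified by running PCA on the thigh, shank, and foot elevation angles over a gait cycle and checking how little variance the third principal component explains. The following generic sketch of that check uses synthetic angle traces; it is not the ISC3d toolbox's API, and `isc_planarity` and the signals are made up for illustration.

```python
import numpy as np

def isc_planarity(thigh, shank, foot):
    """Percent variance explained by each principal component of the
    (thigh, shank, foot) elevation-angle cloud, in descending order.
    A near-zero third value means the trajectory lies close to a plane."""
    angles = np.column_stack([thigh, shank, foot])
    centered = angles - angles.mean(axis=0)
    # Squared singular values of the centered data are proportional to PC variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return 100.0 * var / var.sum()

phase = np.linspace(0, 2 * np.pi, 101)      # one gait cycle, 101 samples
thigh = 20 * np.cos(phase)
shank = 30 * np.cos(phase - 0.8)
foot = 0.5 * thigh + 0.7 * shank            # exactly planar by construction
pv = isc_planarity(thigh, shank, foot)
print(pv.round(3))
```

The same function could in principle be applied to any triple of segment-wise signals, e.g. moments expressed in elevation space, to test whether they also covary on a plane.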

2602.01572 2026-06-12 cs.CL cs.IR Version update

LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States

Yeqin Zhang, Yunfei Wang, Jiaxuan Chen, Ke Qin, Yizheng Zhao, Cam-Tu Nguyen

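Affiliations: arXiv.org cs.CL

AI summary: Proposes Value Aggregation, which builds sentence embeddings from LLM attention value vectors rather than hidden states; in the training-free setting it outperforms existing methods and even matches or surpasses the ensemble-based MetaEOL.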

Abstract

Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token values across multiple layers and token indices. In a training-free setting, VA outperforms other LLM-based embeddings and even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that when paired with suitable prompts, the layer attention outputs can be interpreted as aligned weighted value vectors. Specifically, the attention scores of the last token function as the weights, while the output projection matrix ($W_O$) aligns these weighted value vectors with the common space of the LLM residual stream. This refined method, termed Aligned Weighted VA (AlignedWVA), achieves state-of-the-art performance among training-free LLM-based embeddings, outperforming the high-cost MetaEOL by a substantial margin. Finally, we highlight the potential of obtaining strong LLM embedding models through fine-tuning Value Aggregation.
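
A minimal sketch of the two pooling schemes, assuming per-layer value vectors and last-token attention weights have already been extracted from a model (random arrays stand in for them here). Plain VA averages value vectors over the selected layers and token positions; the aligned weighted variant weights each token's value vector by the last token's attention scores and maps the result through $W_O$. The shapes, the layer choice, the single-head simplification, and the helper names are assumptions, not the paper's exact recipe.

```python
import numpy as np

def value_aggregation(values, layers):
    """Plain VA: mean of value vectors over the selected layers and all tokens.

    values: (num_layers, num_tokens, d_model) per-token value vectors.
    """
    return values[layers].mean(axis=(0, 1))

def aligned_weighted_va(values, last_token_attn, w_o, layers):
    """Aligned weighted VA, single-head simplification.

    last_token_attn: (num_layers, num_tokens) attention weights of the last token.
    w_o: (num_layers, d_model, d_model) output projections into the residual stream.
    """
    per_layer = []
    for l in layers:
        weighted = last_token_attn[l] @ values[l]   # attention-weighted value, (d_model,)
        per_layer.append(weighted @ w_o[l])         # align with W_O
    return np.mean(per_layer, axis=0)

rng = np.random.default_rng(0)
L, T, d = 6, 10, 16
values = rng.normal(size=(L, T, d))
attn = rng.random(size=(L, T))
attn /= attn.sum(axis=1, keepdims=True)             # rows sum to 1, as after softmax
w_o = rng.normal(size=(L, d, d))
emb_va = value_aggregation(values, layers=[2, 3, 4])
emb_awva = aligned_weighted_va(values, attn, w_o, layers=[2, 3, 4])
print(emb_va.shape, emb_awva.shape)
```

With uniform attention and identity projections the weighted variant reduces exactly to plain VA, which makes the relationship between the two methods easy to see.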

2512.12571 2026-06-12 cs.CV Version update

Measurement Plasticity: Sensor-Level Adaptation for Vision-Language Models

Boyeong Im, Wooseok Lee, Yoojin Kwon, Hyung-Sin Kim

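Affiliations: University of Seoul

AI summary: Proposes Multi-View Physical-prompt (MVP) for test-time adaptation, which treats the camera exposure triangle (ISO, shutter speed, aperture) as physical prompts to adapt at the sensor level without gradients or model modifications, and outperforms digital-only methods on ImageNet-ES.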

Comments
Accepted to the ICML 2026 Workshop on Continual Adaptation at Scale
Abstract

We propose Multi-View Physical-prompt (MVP) for Test-Time Adaptation (TTA), a forward-only framework that moves TTA from tokens to photons by treating the camera exposure triangle (i.e., ISO, shutter speed, and aperture) as physical prompts. At inference, MVP acquires multiple selected physical views using a source-affinity score, evaluates digitally augmented variants of each retained view, filters for the lowest-entropy predictions, and aggregates predictions with hard voting. This selection-then-vote design is simple, calibration-friendly, and requires no gradients or model modifications. On ImageNet-ES and ImageNet-ES-Diverse, MVP outperforms digital-only TTA both under Auto-Exposure and when combined with conventional sensor control. MVP remains effective with a reduced set of parameter candidates that lowers capture latency, demonstrating its practicality.
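
The vote stage can be sketched as follows, assuming the class-probability outputs for each retained physical view and each of its digital augmentations are already available. Here each view keeps its `keep` lowest-entropy predictions (an assumed reading of the entropy filter), and the final label is a majority vote over the kept argmax labels. Acquiring views with a source-affinity score is not modeled, and `select_then_vote` is a hypothetical name.

```python
import numpy as np
from collections import Counter

def entropy(p, eps=1e-12):
    """Shannon entropy of each probability row."""
    return -(p * np.log(p + eps)).sum(axis=-1)

def select_then_vote(view_probs, keep=1):
    """view_probs: list with one (num_augmentations, num_classes) array per view.

    For each view, keep the `keep` lowest-entropy augmented predictions, then
    take a hard (majority) vote over all kept argmax labels.
    """
    votes = []
    for probs in view_probs:
        order = np.argsort(entropy(probs))[:keep]     # most confident augmentations
        votes.extend(int(np.argmax(probs[i])) for i in order)
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(votes).most_common(1)[0][0]

views = [
    np.array([[0.6, 0.3, 0.1], [0.9, 0.05, 0.05]]),   # view A: confident in class 0
    np.array([[0.2, 0.7, 0.1], [0.34, 0.33, 0.33]]),  # view B: favors class 1
    np.array([[0.8, 0.1, 0.1], [0.5, 0.4, 0.1]]),     # view C: favors class 0
]
print(select_then_vote(views))  # 0
```

Hard voting only needs argmax labels, so it sidesteps combining possibly miscalibrated probabilities across differently exposed captures, consistent with the "calibration-friendly" claim.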

2601.22594 2026-06-12 cs.CL cs.AI Version update

Language Model Circuits Are Sparse in the Neuron Basis

Aryaman Arora, Zhengxuan Wu, Jacob Steinhardt, Sarah Schwettmann

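Affiliations: Stanford University

AI summary: Shows empirically that MLP neurons are as sparse a feature basis as sparse autoencoders and, building on this, develops an end-to-end gradient-based attribution pipeline that uncovers causally effective neuron circuits across several tasks.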

Comments
ICML Spotlight, camera-ready
Abstract

The high-level concepts that a neural network uses to perform computation need not be aligned to individual neurons (Smolensky, 1986). Language model interpretability research has thus turned to techniques which decompose the neuron basis into more interpretable units of model computation, such as sparse autoencoders (SAEs). However, not all neuron-based representations are uninterpretable. For the first time, we empirically show that MLP neurons are as sparse a feature basis as SAEs. We use this finding to develop an end-to-end gradient-based attribution pipeline for circuit tracing on the MLP neuron basis, which surfaces causally effective neurons on a variety of tasks. On a standard subject-verb agreement benchmark (Marks et al., 2025), a circuit of $\approx 10^2$ MLP neurons is enough to control model behaviour. On the multi-hop city-state-capital task from Lindsey et al. (2025), we find a circuit in which small sets of neurons encode specific latent reasoning steps (e.g. mapping a city to its state), and can be steered to change the model's output. This work thus advances automated interpretability of language models without imposing additional training costs.
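
Gradient-based attribution of this kind is often approximated by scoring each neuron with activation × gradient of a target metric, a first-order estimate of the effect of zero-ablating that neuron. A toy two-layer MLP with an analytic gradient illustrates the scoring and top-k selection; the toy model, the zero-ablation baseline, and the helper names are assumptions, not the authors' pipeline.

```python
import numpy as np

def forward(x, w1, w2, mask=None):
    """Toy MLP: scalar output = w2 . relu(w1 @ x); `mask` zeroes chosen neurons."""
    h = np.maximum(w1 @ x, 0.0)
    if mask is not None:
        h = h * mask
    return float(w2 @ h), h

def attribution_scores(x, w1, w2):
    """Activation x gradient. Here d(out)/dh = w2, so score_i = h_i * w2_i,
    which estimates out(h) - out(h with neuron i set to zero)."""
    _, h = forward(x, w1, w2)
    return h * w2

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(64, 8)), rng.normal(size=64)
x = rng.normal(size=8)
scores = attribution_scores(x, w1, w2)
top_k = np.argsort(-np.abs(scores))[:5]   # candidate circuit neurons
print(top_k)
```

Because the output is linear in the hidden layer, the first-order estimate is exact for this toy model; in a real transformer it is only an approximation that is cheap because a single backward pass scores every neuron at once.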

2601.22090 2026-06-12 cs.RO Version update

ReactEMG Stroke: Healthy-to-Stroke Few-shot Adaptation for sEMG-Based Intent Detection

Runsheng Wang, Katelyn Lee, Xinyue Zhu, Lauren Winterbottom, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie

Affiliations: Department of Mechanical Engineering, Columbia University in the City of New York; Department of Computer Science, Columbia University in the City of New York; Department of Rehabilitation and Regenerative Medicine, Columbia University Irving Medical Center

AI summary: Proposes a healthy-to-stroke adaptation pipeline that fine-tunes an sEMG model pretrained on large-scale healthy-subject data using only a small amount of data from stroke patients, substantially improving intent-detection accuracy and robustness.

Abstract

Surface electromyography (sEMG) is a promising control signal for assist-as-needed hand rehabilitation after stroke, but detecting intent from paretic muscles often requires lengthy, subject-specific calibration and remains brittle to variability. We propose a healthy-to-stroke adaptation pipeline that initializes an intent detector from a model pretrained on large-scale able-bodied sEMG, then fine-tunes it for each stroke participant using only a small amount of subject-specific data. Using a newly collected dataset from three individuals with chronic stroke, we compare adaptation strategies (head-only tuning, parameter-efficient LoRA adapters, and full end-to-end fine-tuning) and evaluate on held-out test sets that include realistic distribution shifts such as within-session drift, posture changes, and armband repositioning. Across conditions, healthy-pretrained adaptation consistently improves stroke intent detection relative to both zero-shot transfer and stroke-only training under the same data budget; the best adaptation methods improve average transition accuracy from 0.42 to 0.61 and raw accuracy from 0.69 to 0.78. These results suggest that transferring a reusable healthy-domain EMG representation can reduce calibration burden while improving robustness for real-time post-stroke intent detection. Our project website, video, code, and dataset are available at: https://roamlab.github.io/reactemg-stroke/.
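
Of the three adaptation strategies compared, the LoRA adapter is the least self-explanatory. A minimal sketch of a LoRA-adapted linear layer: the pretrained weight W stays frozen and only a low-rank update B·A, scaled by alpha / r, would be trained. The rank, scaling, zero initialization of B, and the `LoRALinear` class itself follow common LoRA practice and are assumptions about this paper's configuration.

```python
import numpy as np

class LoRALinear:
    """y = W x + (alpha / r) * B A x, with W frozen; only A and B would be trained."""

    def __init__(self, weight, rank=4, alpha=8.0, rng=None):
        rng = rng or np.random.default_rng()
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen, pretrained
        self.a = rng.normal(scale=0.01, size=(rank, in_dim))  # trainable
        self.b = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        return self.weight @ x + self.scale * (self.b @ (self.a @ x))

    def trainable_params(self):
        return self.a.size + self.b.size

rng = np.random.default_rng(0)
pretrained = rng.normal(size=(32, 128))   # stand-in for a layer pretrained on healthy-subject sEMG
layer = LoRALinear(pretrained, rank=4, rng=rng)
x = rng.normal(size=128)
# With B initialized to zero, the adapted layer starts identical to the pretrained one.
assert np.allclose(layer(x), pretrained @ x)
print(layer.trainable_params(), pretrained.size)  # 640 4096
```

Starting exactly at the pretrained function and training only about 16% as many parameters as the full matrix is what makes this attractive when each stroke participant contributes only a small amount of data.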

2601.21570 2026-06-12 cs.AI cs.RO Version update

From Digital to Physical: Digital Agents as Autonomous Coaches for Physical Intelligence

Zixing Lei, Genjia Liu, Yuanshuo Zhang, Qipeng Liu, Yuzhu Cai, Sixiang Chen, Jixian Wu, Yunhong Wang, Weixin Li, Chuan Wen, Bo Zhao, Shanghang Zhang, Wenzhao Lian, Siheng Chen

Affiliations: School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China; Zhongguancun Academy, Beijing, China; School of Integrated Circuits, Shanghai Jiao Tong University, Shanghai, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China

AI summary: Proposes EmboCoach-Bench, a benchmark for evaluating how well LLM agents can autonomously engineer embodied policies; through iterative debugging and optimization, agents exceed human-engineered baselines by 26.5% in average success rate and show self-correction ability.

Comments
53 pages, 12 figures
Abstract

The field of Embodied AI is witnessing a rapid evolution toward general-purpose robotic systems, fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains severely bottlenecked by a reliance on labor-intensive manual oversight, from intricate reward shaping to hyperparameter tuning across heterogeneous backends. Inspired by LLMs' success in software automation and scientific discovery, we introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies. Spanning 32 expert-curated RL and IL tasks, our framework posits executable code as the universal interface. We move beyond static generation to assess a dynamic closed-loop workflow, where agents leverage environment feedback to iteratively draft, debug, and optimize solutions, spanning improvements from physics-informed reward design to policy architectures such as diffusion policies. Extensive evaluations yield three critical insights: (1) autonomous agents can surpass human-engineered baselines by 26.5% in average success rate; (2) agentic workflows with environment feedback effectively strengthen policy development and substantially narrow the performance gap between open-source and proprietary models; and (3) agents exhibit self-correction capabilities for pathological engineering cases, successfully resurrecting task performance from near-total failures through iterative simulation-in-the-loop debugging. Ultimately, this work establishes a foundation for self-evolving embodied intelligence, accelerating the paradigm shift from labor-intensive manual tuning to scalable, autonomous engineering in the embodied AI field.

2601.13346 2026-06-12 cs.CL Version update

AfroScope: A Framework for Studying the Linguistic Landscape of Africa

Sang Yun Kwon, AbdelRahim Elmadany, Muhammad Abdul-Mageed

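Affiliations: The University of British Columbia

AI summary: Proposes AfroScope, a framework comprising a dataset covering 640 languages and a suite of models; it addresses confusion among closely related languages with hierarchical classification and a specialized embedding model, improving macro-F1 by 1.57 points on the confusable subset, and analyzes cross-lingual transfer and domain effects.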

Abstract

Language Identification (LID), the task of determining the language of a given text, is a fundamental preprocessing step that shapes the reliability of downstream NLP applications. While recent work has expanded African LID, existing systems remain limited in both language coverage and fine-grained discrimination among closely related languages and varieties. We introduce AfroScope, a unified framework for African LID that includes AfroScope-Data, a dataset covering 640 languages, and AfroScope-Models, a suite of strong LID models with broad African language coverage. To address persistent confusions among closely related languages, we propose a hierarchical classification approach that leverages AfroScope-Mirror, a specialized embedding model for targeted disambiguation, improving macro-F1 by 1.57 points on the confusable subset compared to our best base model. We further analyze cross-lingual transfer and domain effects, showing how language-family structure, script compatibility, and domain coverage shape LID performance. We position African LID as an enabling technology for large-scale measurement of Africa's linguistic landscape in digital text, and release AfroScope-Data and AfroScope-Models online.
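
Because the reported gain is in macro-F1, which weights every language equally regardless of how many test sentences it has, confusions involving a single small language can move the score noticeably. A plain-Python sketch of the metric; the `macro_f1` helper is illustrative and the language codes are example labels only.

```python
from collections import Counter

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1      # predicted label p was wrong
            fn[g] += 1      # gold label g was missed
    f1s = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        f1s.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# One Zulu sentence confused with the closely related Xhosa.
gold = ["zul", "zul", "xho", "xho", "swa", "swa"]
pred = ["zul", "xho", "xho", "xho", "swa", "swa"]
print(round(macro_f1(gold, pred), 4))  # 0.8222
```

A single confusion here lowers F1 for both languages involved, which is why targeted disambiguation of confusable pairs shows up directly in the macro average.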

2512.22287 2026-06-12 cs.LG cs.AI Version update

Cluster Aggregated GAN (CAG): A Cluster-Based Hybrid Model for Appliance Pattern Generation

Zikun Guo, Adeyinka. P. Adedigba, Rammohan Mallipeddi

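Affiliations: Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University

AI summary: Existing generative methods ignore the behavioral differences between intermittent and continuous appliances, which leads to unstable training and limited fidelity. The proposed CAG framework uses a clustering module to assign dedicated generators to intermittent appliances and an LSTM-based generator for continuous appliances, outperforming baselines on the UVIC dataset.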

Comments
18 pages, 5 figures
Abstract

Synthetic appliance data are essential for developing non-intrusive load monitoring algorithms and enabling privacy-preserving energy research, yet the scarcity of labeled datasets remains a significant barrier. Recent GAN-based methods have demonstrated the feasibility of synthesizing load patterns, but most existing approaches treat all devices uniformly within a single model, neglecting the behavioral differences between intermittent and continuous appliances and resulting in unstable training and limited output fidelity. To address these limitations, we propose the Cluster Aggregated GAN framework, a hybrid generative approach that routes each appliance to a specialized branch based on its behavioral characteristics. For intermittent appliances, a clustering module groups similar activation patterns and allocates dedicated generators for each cluster, ensuring that both common and rare operational modes receive adequate modeling capacity. Continuous appliances follow a separate branch that employs an LSTM-based generator to capture gradual temporal evolution while maintaining training stability through sequence compression. Extensive experiments on the UVIC smart plug dataset demonstrate that the proposed framework consistently outperforms baseline methods across metrics measuring realism, diversity, and training stability, and that integrating clustering as an active generative component substantially improves both interpretability and scalability. These findings establish the proposed framework as an effective approach for synthetic load generation in non-intrusive load monitoring research.
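
The routing step for intermittent appliances can be sketched with plain k-means over fixed-length activation windows, where each resulting cluster id would index its own generator. The abstract does not name the clustering algorithm, so k-means with a deterministic farthest-first initialization, the window length, k, and the helper names are all assumptions; the generators themselves are not modeled.

```python
import numpy as np

def farthest_first_init(windows, k):
    """Deterministic init: start from the first window, then repeatedly add the
    window farthest from all centers chosen so far."""
    centers = [windows[0]]
    for _ in range(1, k):
        d = np.min([((windows - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(windows[d.argmax()])
    return np.array(centers)

def kmeans(windows, k, iters=50):
    """Lloyd's k-means on (num_windows, window_len) activation profiles."""
    centers = farthest_first_init(windows, k)
    for _ in range(iters):
        dists = ((windows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            windows[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two synthetic activation shapes: short high-power bursts vs. long low-power runs (watts).
rng = np.random.default_rng(1)
burst = np.r_[np.full(5, 1500.0), np.zeros(15)]
long_run = np.r_[np.full(15, 200.0), np.zeros(5)]
windows = np.vstack([burst + rng.normal(0, 20, 20) for _ in range(10)] +
                    [long_run + rng.normal(0, 20, 20) for _ in range(10)])
labels, _ = kmeans(windows, k=2)
# In a CAG-style pipeline, labels[i] would select the generator trained for window i.
print(labels)
```

Giving each cluster its own generator means a rare operating mode is not averaged away by a dominant one, which is the capacity argument the abstract makes.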

2601.17654 2026-06-12 cs.LG cs.DC Version update

Kareus: Joint Reduction of Dynamic and Static Energy in Large Model Training

Ruofan Wu, Jae-Won Chung, Mosharaf Chowdhury

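Affiliations: University of Michigan

AI summary: Addressing the high energy cost of AI training, proposes Kareus, a system that jointly optimizes fine-grained kernel scheduling and frequency scaling to reduce dynamic and static energy together, cutting training energy by up to 28.3% at the same training time, or training time by up to 27.5% at the same energy consumption.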

Comments
OSDI '26 | Open-source at https://github.com/ml-energy/kareus
Abstract

The computing demand of AI is growing at an unprecedented rate, but energy supply is not keeping pace. As a result, energy has become an expensive and contended resource that requires explicit management and optimization. Although recent works have made significant progress in large model training optimization, they focus on optimizing either dynamic or static energy consumption. We find that fine-grained kernel scheduling and frequency scaling jointly and interdependently impact both dynamic and static energy consumption. Based on this finding, we design Kareus, a training system that pushes the time-energy tradeoff frontier by optimizing both aspects. Kareus decomposes the intractable joint optimization problem into local, partition-based subproblems. It then uses a multi-pass multi-objective optimization algorithm to find execution schedules that push the time-energy tradeoff frontier. Compared to the state of the art, Kareus reduces training energy by up to 28.3% at the same training time, or reduces training time by up to 27.5% at the same energy consumption.
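
The time-energy tradeoff frontier is a Pareto frontier: a schedule lies on it if no other schedule is at least as good in both training time and energy and strictly better in one. A small sketch of extracting that frontier from candidate (time, energy) measurements, for example one per combination of GPU frequency and kernel schedule. The numbers are invented, `pareto_frontier` is a hypothetical helper, and this is not Kareus's multi-pass optimization algorithm.

```python
def pareto_frontier(points):
    """Return the (time, energy) points not dominated by any other point,
    sorted by time. Lower is better on both axes."""
    frontier = []
    best_energy = float("inf")
    for time, energy in sorted(points):    # ascending time, ties broken by energy
        if energy < best_energy:           # uses less energy than every faster point kept so far
            frontier.append((time, energy))
            best_energy = energy
    return frontier

# Hypothetical measurements: (iteration time in s, energy in J).
candidates = [(1.00, 900), (1.05, 820), (1.10, 850), (1.20, 700), (1.20, 760), (1.40, 690)]
print(pareto_frontier(candidates))  # [(1.0, 900), (1.05, 820), (1.2, 700), (1.4, 690)]
```

"Pushing the frontier" in the abstract's sense means finding schedules that dominate points on such a curve, i.e. lower energy at the same time or lower time at the same energy.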

2509.03340 2026-06-12 cs.LG cs.AI cs.CE physics.comp-ph Version update

Equivariant Flow Matching for Symmetry-Breaking Bifurcation Problems

Fleur Hendriks, Ondřej Rokoš, Martin Doškář, Marc G. D. Geers, Vlado Menkovski

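Affiliations: Department of Mechanical Engineering, Eindhoven University of Technology; DIFFER – Dutch Institute for Fundamental Energy Research; Faculty of Civil Engineering, Department of Mechanics, Czech Technical University in Prague; Department of Mathematics and Computer Science, Eindhoven University of Technology

AI summary: For the coexistence of multiple stable states caused by symmetry breaking in nonlinear dynamical systems, proposes equivariant flow matching, which combines equivariant architectures with an optimal-transport-based coupling mechanism to accurately capture multimodal distributions and symmetry-breaking bifurcations, outperforming non-probabilistic and variational methods.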

Comments
9 pages, 7 figures including appendices. Accepted to Machine Learning and the Physical Sciences Workshop, NeurIPS 2025 (https://ml4physicalsciences.github.io/2025/). Repository with corresponding code: https://github.com/FHendriks11/bifurcationML/. Video explanation: https://www.youtube.com/watch?v=wsL3h17KtjY
Abstract

Bifurcation phenomena in nonlinear dynamical systems often lead to multiple coexisting stable solutions, particularly in the presence of symmetry breaking. Deterministic machine learning models are unable to capture this multiplicity, averaging over solutions and failing to represent lower-symmetry outcomes. In this work, we formalize the use of generative AI, specifically flow matching, as a principled way to model the full probability distribution over bifurcation outcomes. Our approach builds on existing techniques by combining flow matching with equivariant architectures and an optimal-transport-based coupling mechanism. We generalize equivariant flow matching to a symmetric coupling strategy that aligns predicted and target outputs under group actions, allowing accurate learning in equivariant settings. We validate our approach on a range of systems, from simple conceptual systems to physical problems such as buckling beams and the Allen–Cahn equation. The results demonstrate that the approach accurately captures multimodal distributions and symmetry-breaking bifurcations. Moreover, they show that flow matching significantly outperforms non-probabilistic and variational methods. This offers a principled and scalable solution for modeling multistability in high-dimensional systems.
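
A rough sketch of the alignment idea behind existing equivariant flow matching, which the authors build on, for a finite group: before forming the flow-matching interpolant between a noise sample x0 and a target x1, replace x1 by its group-transformed copy g·x1 closest to x0, so the regression target does not have to average over symmetric copies. The group here (cyclic shifts of a 1D signal combined with a sign flip) and the helper names are chosen for illustration only; the paper's groups, its alignment of predicted versus target outputs, and its network are not reproduced.

```python
import numpy as np

def group_orbit(x):
    """All images of x under cyclic shifts combined with a sign flip (a finite group)."""
    return [s * np.roll(x, k) for s in (1.0, -1.0) for k in range(len(x))]

def align_to(x0, x1):
    """The image of x1 in its group orbit that is closest (in L2) to x0."""
    return min(group_orbit(x1), key=lambda y: float(np.sum((y - x0) ** 2)))

def flow_matching_pair(x0, x1, t):
    """Conditional flow-matching sample after symmetric alignment:
    x_t = (1 - t) x0 + t x1', target velocity u = x1' - x0.
    A network v(x_t, t) would be regressed onto u with a squared loss."""
    x1_aligned = align_to(x0, x1)
    xt = (1.0 - t) * x0 + t * x1_aligned
    return xt, x1_aligned - x0

rng = np.random.default_rng(0)
x0 = rng.normal(size=6)                        # noise sample
x1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0])  # one symmetry-broken target state
xt, u = flow_matching_pair(x0, x1, t=0.5)
# The aligned target is never farther from x0 than the raw target.
assert np.sum(u ** 2) <= np.sum((x1 - x0) ** 2) + 1e-12
```

Since the identity is part of the orbit, alignment can only shorten the transport path, which tends to make the learned velocity field simpler.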

2507.02921 2026-06-12 cs.LG cs.AI Version update

PlaceRep: Geospatial Place Representation Learning from Large-Scale Point-of-Interest Data

Mohammad Hashemi, Hossein Amiri, Andreas Zufle

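Affiliations: Emory University

AI summary: Proposes PlaceRep, which builds place-level representations by clustering spatially and semantically related POIs, producing multi-scale urban region embeddings efficiently and without pre-training; on population density estimation and housing price prediction it outperforms most state-of-the-art graph-based methods and achieves up to a 100x speedup.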

Abstract

Learning effective representations of urban environments requires capturing spatial structure beyond fixed administrative boundaries. Existing geospatial representation learning approaches typically aggregate Points of Interest (POIs) into pre-defined administrative regions such as census units or ZIP code areas, assigning a single embedding to each region. However, POIs often form semantically meaningful groups that extend across, within, or beyond these boundaries, defining places that better reflect human activity and urban function. To address this limitation, we propose PlaceRep, a geospatial representation learning method that constructs place-level representations by clustering spatially and semantically related POIs. PlaceRep summarizes large-scale POI graphs from U.S. Foursquare data to produce general-purpose urban region embeddings while automatically identifying places across multiple spatial scales. By eliminating model pre-training, PlaceRep provides a scalable and efficient solution for multi-granular geospatial analysis. Experiments on two downstream tasks, population density estimation and housing price prediction, show that PlaceRep outperforms most state-of-the-art graph-based geospatial representation learning methods and achieves up to a 100x speedup in generating region-level representations on large-scale POI graphs. The implementation of PlaceRep is available at https://github.com/mohammadhashemii/PlaceRep.

2601.15503 2026-06-12 cs.LG Version update

Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning

Rishit Chatterjee, Tahiya Chowdhury

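Affiliations: Department of Computer Science, Colby College

AI summary: Addressing gaps in volunteer-led lake monitoring data, uses multiple imputation and ridge regression to forecast water clarity (Secchi Disk Depth) on a 30-lake dataset, quantifies the minimal sample size and feature set, and proposes a joint feasibility function to guide monitoring strategy.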

Journal ref
Published in: 2026 IEEE Conference on Technologies for Sustainability (SusTech)
Comments
8 pages, 4 figures, 3 tables
Abstract

Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in-situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We also identify a minimal feature set, where a compact four-feature subset matches the thirteen-feature baseline within the same 5% tolerance. Bringing these results together, we introduce a joint feasibility function that identifies the minimal training history and fewest predictors sufficient to achieve the target of staying within 5% of the complete-history, full-feature baseline. In our study, meeting the 5% accuracy target required about 64 recent samples and just one predictor per lake, highlighting the practicality of targeted monitoring. Hence, our joint feasibility strategy unifies recent-history length and feature choice under a fixed accuracy target, yielding a simple, efficient rule for setting sampling effort and measurement priorities for lake researchers.
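
The sample-size half of the joint feasibility rule can be sketched as follows: fit ridge regression on the most recent n samples for increasing n, and return the smallest n whose error stays within 5% of the full-history error. This is a minimal sketch on synthetic data; normalizing MAE by the mean absolute target, the ridge penalty, the candidate grid, and the helper names are assumptions, since the abstract does not define them, and MICE imputation is not modeled.

```python
import numpy as np

def ridge_fit(x, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    return w, y_mean - x_mean @ w

def nmae(y_true, y_pred):
    """MAE normalized by the mean absolute target (assumed normalization)."""
    return float(np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true)))

def minimal_history(x_train, y_train, x_test, y_test, sizes, tol=0.05):
    """Smallest recent-history length whose nMAE is within `tol` of full history."""
    def score(n):
        w, b = ridge_fit(x_train[-n:], y_train[-n:])   # most recent n samples only
        return nmae(y_test, x_test @ w + b)
    target = (1.0 + tol) * score(len(x_train))
    for n in sorted(sizes):
        if score(n) <= target:
            return n
    return len(x_train)                                 # fall back to full history

# Synthetic stand-in for one lake: 4 predictors, Secchi depth around 5 m.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 4))
y = 5.0 + x @ np.array([0.8, -0.5, 0.3, 0.0]) + rng.normal(scale=0.3, size=400)
n_min = minimal_history(x[:300], y[:300], x[300:], y[300:], sizes=[16, 32, 64, 128, 256])
print(n_min)
```

The feature half of the rule could reuse the same loop over candidate predictor subsets instead of history lengths, keeping the one that meets the same tolerance with the fewest columns.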