2603.14147 2026-06-09 cs.AI cs.LG version update

An Alternative Trajectory for Generative AI

Margarita Belova, Yuval Kansal, Yihao Liang, Jiaxin Xiao, Niraj K. Jha

Affiliations: Princeton University

AI summary: This paper proposes improving generative AI by building domain-specific superintelligence (DSS), using symbolic abstractions to strengthen domain reasoning, avoid the model collapse associated with LLM-generated synthetic data, and achieve sustainable development.

Abstract

The generative artificial intelligence (AI) ecosystem is undergoing rapid transformations that threaten its sustainability. As models transition from research prototypes to high-traffic products, the energetic burden has shifted from one-time training to recurring, unbounded inference. This is exacerbated by reasoning models that inflate compute costs by orders of magnitude per query. The prevailing pursuit of artificial general intelligence through scaling of monolithic models is colliding with hard physical constraints: grid failures, water consumption, and diminishing returns on data scaling. This trajectory yields models with impressive factual recall that nonetheless struggle in domains requiring in-depth reasoning, possibly due to insufficient abstractions in training data. Current large language models (LLMs) exhibit genuine reasoning depth only in domains like mathematics and coding, where rigorous, pre-existing abstractions provide structural grounding. In other fields, the current approach fails to generalize well. We propose an alternative trajectory based on domain-specific superintelligence (DSS). We argue for first constructing explicit symbolic abstractions (knowledge graphs, ontologies, and formal logic) to underpin synthetic curricula enabling small language models to master domain-specific reasoning without the model collapse problem typical of LLM-based synthetic data methods. Rather than a single generalist giant model, we envision "societies of DSS models": dynamic ecosystems where orchestration agents route tasks to distinct DSS back-ends. This paradigm shift decouples capability from size, enabling intelligence to migrate from energy-intensive data centers to secure, on-device experts. By aligning algorithmic progress with physical constraints, DSS societies move generative AI from an environmental liability to a sustainable force for economic empowerment.

2603.13259 2026-06-09 cs.CL cs.AI version update

How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

Javier Marín

Affiliations: University of California, Berkeley

AI summary: The study shows that when a transformer processes factual queries, the hidden-state paths of correct and incorrect continuations separate by rotation, and that the model's late-layer preference regarding the incorrect continuation is not localized to a single component.

Abstract

When a decoder-only transformer is forced to process matched correct and incorrect single-token continuations of a factual query, the two pathways through hidden-state space diverge in a specific way: displacement vectors from the query-only representation maintain approximately equal magnitude but rotate apart in direction. The angular separation grows through mid-depth, and late layers resolve the asymmetric outcome: a logit-lens preference that, in the incorrect run, falls far below the naive prior of equal probability, corresponding to the model assigning approximately 11.5 times more probability to the incorrect token than to the correct one. We characterize this two-phase pattern (rotational divergence in mid-depth followed by late-layer asymmetric commitment) as the empirical geometric signature of what looks externally like the model rejecting a wrong continuation, while remaining explicit that it is an observational characterization, not a causal account. The pattern is consistent across six decoder-only transformers, including five architecture families from 1B to 13B parameters. A seventh model (Qwen2 1.5B) shows a flat profile under the present extraction protocol that is plausibly a tokenizer-fragmentation artefact rather than a real scale floor; the question of an emergence threshold is left open. Single-layer activation patching does not recover the correct token at any layer band, meaning the late-layer asymmetry is not localized to a discrete component under the protocol used. Taken together, the evidence is consistent with a distributed-by-trajectory account of factual constraint processing, in which geometric structure emerges cumulatively across many layers rather than from a single localized circuit, and inconsistent with the simplest single-layer localized-recall account.
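
To make the geometry in this abstract concrete, the sketch below computes the two quantities it mentions for a single layer: the ratio of displacement magnitudes and the angle between displacement vectors. The 2-D vectors and all function names are invented for illustration and are not taken from the paper, whose extraction protocol is not given here.

```python
import math

def displacement(state, base):
    """Vector from the query-only representation to a continuation's hidden state."""
    return [s - b for s, b in zip(state, base)]

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def angle_degrees(u, v):
    """Angle between two vectors, clamped against floating-point overshoot."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

# Toy hidden states for one layer (made-up numbers, not results from the paper).
query_only = [1.0, 1.0]
correct_run = [2.0, 1.0]    # displacement (1, 0)
incorrect_run = [1.0, 2.0]  # displacement (0, 1)

d_correct = displacement(correct_run, query_only)
d_incorrect = displacement(incorrect_run, query_only)

# Equal magnitude but different direction: the pattern the abstract describes.
magnitude_ratio = norm(d_correct) / norm(d_incorrect)
separation = angle_degrees(d_correct, d_incorrect)
```

Repeating this per layer would give an angular-separation profile over depth; in this toy case the magnitudes match while the directions are orthogonal.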

2603.12666 2026-06-09 cs.LG cs.AI version update

RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction

Hanbum Ko, Chanhui Lee, Ye Rin Kim, Rodrigo Hormazabal, Sehui Han, Sungbin Lim, Sungwoong Kim

Affiliations: Department of Artificial Intelligence, Korea University; Department of Statistics, Korea University; Materials Intelligence Lab, LG AI Research

AI summary: RetroReasoner uses supervised fine-tuning and reinforcement learning to capture chemists' disconnection-based strategic reasoning, improving the accuracy and diversity of retrosynthesis predictions.

Comments 35 pages, 19 figures

Abstract

Retrosynthesis prediction aims to identify reactants that can synthesize a given product molecule. Although molecular large language models (LLMs) have recently shown promising results, most existing methods either generate reactants directly or provide only generic product-level analysis, without explicitly reasoning about bond-disconnection strategies that justify specific reactant choices. This paper proposes RetroReasoner, a retrosynthetic reasoning model that captures chemists' strategic disconnection-based thinking. RetroReasoner is trained with supervised fine-tuning and reinforcement learning. For supervised fine-tuning, SyntheticRetro generates structured disconnection rationales paired with reactant predictions. For reinforcement learning, a round-trip reward evaluates predicted reactants by passing them through a forward synthesis model and rewarding predictions that reconstruct the original product. RetroReasoner can also be applied to multi-step retrosynthetic planning by incorporating it into a parallelized Monte Carlo tree search framework, reducing search time while increasing the number and diversity of valid synthetic pathways. Experimental results show that RetroReasoner outperforms prior baselines, including not only molecular LLMs but also retrosynthesis-specific expert models, and generates a broader range of feasible reactant proposals, especially for challenging reaction instances.
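
The round-trip reward described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the binary 0/1 reward, the exact string comparison of SMILES, and the lookup-table `toy_forward_model` are all hypothetical stand-ins, not the paper's implementation, which uses a learned forward synthesis model.

```python
from typing import Callable, FrozenSet

def round_trip_reward(
    product: str,
    reactants: FrozenSet[str],
    forward_model: Callable[[FrozenSet[str]], str],
) -> float:
    """Return 1.0 if the forward model maps the predicted reactants back to the product."""
    # Assumption: SMILES strings are already canonical, so plain equality suffices.
    return 1.0 if forward_model(reactants) == product else 0.0

# Hypothetical stand-in for a learned forward synthesis model: a one-entry lookup table
# (acetic acid + ethanol -> ethyl acetate).
TOY_REACTIONS = {frozenset({"CC(=O)O", "CCO"}): "CC(=O)OCC"}

def toy_forward_model(reactants: FrozenSet[str]) -> str:
    """Return the known product for a reactant set, or an empty string if unknown."""
    return TOY_REACTIONS.get(reactants, "")

good = round_trip_reward("CC(=O)OCC", frozenset({"CC(=O)O", "CCO"}), toy_forward_model)
bad = round_trip_reward("CC(=O)OCC", frozenset({"CCO"}), toy_forward_model)
```

Only the prediction whose reactants reconstruct the original product is rewarded, which is the signal the abstract attributes to the reinforcement learning stage.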

2603.12453 2026-06-09 cs.CL version update

CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection

Christos Tzouvaras, Konstantinos Skianis, Athanasios Voulodimos

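Affiliations: University of Ioannina; National Technical University of Athens

AI summary: This paper proposes a two-stage heterogeneous ensemble that combines self-consistency with weighted voting, plus a novel post-hoc correction mechanism, Deliberative Complexity Gating, for political evasion detection, reaching a Macro-F1 score of 0.85 on the evaluation set.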

Fewer Tokens, Greater Leverage: Preserving Safety Alignment by Constraining Safety Tokens During Fine-Tuning

Guoli Wang, Haonan Shi, Tu Ouyang, An Wang

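Affiliations: Case Western Reserve University

AI summary: Proposes the PACT framework, which constrains the model's confidence on safety-related tokens to prevent fine-tuning from causing safety-alignment drift while preserving downstream task performance.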

Comments Accepted to KDD 2026

DOI: 10.1145/3770855.3817837

Abstract

Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the training dataset contains only benign data. Prior work shows that introducing a small fraction of harmful data can substantially compromise LLM refusal behavior, causing LLMs to comply with harmful requests. Existing defense methods often rely on model-wide interventions, such as restricting which parameters are updated or injecting additional safety data, which can limit generality and degrade downstream task performance. To address these limitations, we propose a fine-tuning framework called Preserving Safety Alignment via Constrained Tokens (PACT), which stabilizes the model's confidence on safety tokens. Our approach is motivated by the empirical observation that safety-aligned behavior is reflected in the model's token-level output confidence and is often concentrated on a small subset of safety-related tokens. During downstream fine-tuning, we regularize the fine-tuned model to match the aligned reference model's confidence on safety-related tokens at each response step, while leaving non-safety tokens largely unconstrained to allow effective task adaptation. This targeted constraint prevents alignment drift without imposing global restrictions that typically trade off with model utility. Our code is available at https://github.com/Glresearch1/PACT.
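
One way to picture the token-level constraint is a penalty added to the task loss that is nonzero only on safety-related tokens. The sketch below is an assumption-laden illustration: the squared-gap penalty, the `weight` parameter, the token ids, and the function names are all hypothetical, and the actual PACT objective may take a different form (see the authors' repository).

```python
def confidence_penalty(ft_probs, ref_probs, safety_token_ids):
    """Sum of squared confidence gaps on safety tokens across response steps.

    ft_probs and ref_probs hold one dict {token_id: probability} per response step,
    for the fine-tuned model and the aligned reference model respectively.
    """
    penalty = 0.0
    for ft_step, ref_step in zip(ft_probs, ref_probs):
        for tok in safety_token_ids:
            gap = ft_step.get(tok, 0.0) - ref_step.get(tok, 0.0)
            penalty += gap * gap
    return penalty

def total_loss(task_loss, ft_probs, ref_probs, safety_token_ids, weight=1.0):
    """Task loss plus the weighted safety-token penalty (hypothetical combination)."""
    return task_loss + weight * confidence_penalty(ft_probs, ref_probs, safety_token_ids)

SAFETY = {7}                      # hypothetical id of a refusal-related token
reference = [{7: 0.9, 3: 0.1}]    # aligned model is confident in the safety token
drifted = [{7: 0.4, 3: 0.6}]      # fine-tuned model has lost that confidence

drift_penalty = confidence_penalty(drifted, reference, SAFETY)
```

Token 3 is not in the safety set, so its change in probability contributes nothing, mirroring the idea that non-safety tokens are left largely unconstrained.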

2602.17911 2026-06-09 cs.CL cs.AI version update

Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering

Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han

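Affiliations: University of Illinois Urbana-Champaign; National Institutes of Health

AI summary: This paper proposes the CondMedQA benchmark and the Condition-Gated Reasoning framework, which builds condition-aware knowledge graphs to improve condition-dependent reasoning in biomedical question answering.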

DOI: 10.1145/3770855.3818963

Abstract

Current biomedical question answering (QA) systems often assume that medical knowledge applies uniformly, yet real-world clinical reasoning is inherently conditional: nearly every decision depends on patient-specific factors such as comorbidities and contraindications. Existing benchmarks do not evaluate such conditional reasoning, and retrieval-augmented or graph-based methods lack explicit mechanisms to ensure that retrieved knowledge is applicable to the given context. To address this gap, we propose CondMedQA, the first benchmark for conditional biomedical QA, consisting of multi-hop questions whose answers vary with patient conditions. Furthermore, we propose Condition-Gated Reasoning (CGR), a novel framework that constructs condition-aware knowledge graphs and selectively activates or prunes reasoning paths based on query conditions. Our findings show that CGR more reliably selects condition-appropriate answers while matching or exceeding state-of-the-art performance on biomedical QA benchmarks, highlighting the importance of explicitly modeling conditionality for robust medical reasoning.
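
The idea of pruning reasoning paths by query condition can be illustrated with a toy graph whose edges carry blocking conditions. Everything here is hypothetical: the edge schema, the placeholder entities `drug_A` and `drug_B`, and the function name are invented for illustration and carry no medical meaning; the paper's graph construction and gating are more involved.

```python
# Each edge: (head, relation, tail, conditions under which the edge must not be used).
EDGES = [
    ("hypertension", "treated_by", "drug_A", frozenset({"pregnancy"})),
    ("hypertension", "treated_by", "drug_B", frozenset()),
]

def gated_answers(query_entity, relation, patient_conditions):
    """Keep only edges whose blocking conditions do not overlap the patient's conditions."""
    patient = set(patient_conditions)
    return [
        tail
        for head, rel, tail, blocked in EDGES
        if head == query_entity and rel == relation and not (blocked & patient)
    ]

unconditioned = gated_answers("hypertension", "treated_by", [])
conditioned = gated_answers("hypertension", "treated_by", ["pregnancy"])
```

The same question yields different answers once a patient condition is supplied, which is the behaviour a conditional QA benchmark is meant to test.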

2603.04865 2026-06-09 cs.SD version update

The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights

Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang

Affiliations: School of Electrical Engineering, KAIST, Daejeon, Republic of Korea; University of Melbourne, Australia; Fortemedia Singapore, Singapore; Xi'an University of Posts & Telecommunications, Xi'an, China; Xi'an Lianfeng Acoustic Technologies Co., Ltd., China

AI summary: This paper presents the Environmental Sound Deepfake Detection challenge, covering robustness evaluation, system architectures, and future research directions, and identifies key challenges and opportunities for environmental sound deepfake detection.

Comments Accepted by Interspeech 2026

Abstract

Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key opportunities and open problems to guide subsequent studies in this field.

2603.05500 2026-06-09 cs.LG cs.AI cs.CL version update

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation

Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu

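Affiliations: University of Cambridge

AI summary: POET-X reduces the compute and memory overhead of orthogonal equivalence transformations, enabling efficient and stable LLM training, including pretraining billion-parameter models on a single H100 GPU.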

Comments ICML 2026 Oral (15 pages, 7 figures, project page: https://spherelab.ai/poetx/)

Abstract

Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-preserving framework that optimizes each weight matrix through orthogonal equivalence transformation, has been proposed. Although POET provides strong training stability, its original implementation incurs high memory consumption and computational overhead due to intensive matrix multiplications. To overcome these limitations, we introduce POET-X, a scalable and memory-efficient variant that performs orthogonal equivalence transformations with significantly reduced computational cost. POET-X maintains the generalization and stability benefits of POET while achieving substantial improvements in throughput and memory efficiency. In our experiments, POET-X enables the pretraining of billion-parameter LLMs on a single Nvidia H100 GPU, whereas standard optimizers such as AdamW run out of memory under the same settings.
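
Why an orthogonal equivalence transformation is spectrum-preserving can be seen in two lines. The notation below ($W_0$ for a fixed base weight matrix of size $m \times n$, $R$ and $P$ for the learned orthogonal factors) is assumed for illustration and may differ from the paper's.

```latex
W = R\, W_0\, P, \qquad R^\top R = I_m, \quad P^\top P = I_n
\\[4pt]
W_0 = U \Sigma V^\top
\;\Longrightarrow\;
W = (R U)\, \Sigma\, (P^\top V)^\top
```

Since $RU$ and $P^\top V$ are themselves orthogonal, the second line is a valid singular value decomposition of $W$ with the same $\Sigma$, so the singular values of the weight matrix stay fixed throughout training.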

2601.21149 2026-06-09 cs.LG cs.AI version update

Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement

Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu, Neha Arora, Cyrus Shahabi

Affiliations: University of California, Berkeley; Stanford University

AI summary: Proposes the ME-POIs framework, which uses contrastive learning to combine large-scale human mobility data with language-model embeddings to learn place function, outperforming text-only and mobility-only baselines on five map enrichment tasks.

Abstract

Recent progress in geospatial foundation models highlights the importance of learning general-purpose representations for real-world locations, particularly points-of-interest (POIs) where human activity concentrates. Existing approaches, however, focus primarily on place identity derived from static textual metadata, or learn representations tied to trajectory context, which capture movement regularities rather than how places are actually used (i.e., a POI's function). We argue that POI function is a missing but essential signal for general POI representations. We introduce Mobility-Embedded POIs (ME-POIs), a framework that augments POI embeddings derived from language models with large-scale human mobility data to learn POI-centric, context-independent representations grounded in real-world usage. ME-POIs encodes individual visits as temporally contextualized embeddings and aligns them with learnable POI representations via contrastive learning to capture usage patterns across users and time. To address long-tail sparsity, we propose a novel mechanism that propagates temporal visit patterns from nearby, frequently visited POIs across multiple spatial scales. We evaluate ME-POIs on five newly proposed map enrichment tasks, testing its ability to capture both the identity and function of POIs. Across all tasks, augmenting text-based embeddings with ME-POIs consistently outperforms both text-only and mobility-only baselines. Notably, ME-POIs trained on mobility data alone can surpass text-only models on certain tasks, highlighting that POI function is a critical component of accurate and generalizable POI representations.

2509.20906 2026-06-09 cs.CV cs.RO version update

Distant Object Localisation from Noisy Image Segmentation Sequences

Julius Pesonen, Arno Solin, Eija Honkavaara

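Affiliations: Research Council of Finland; RCF Flagship Forest–Human–Machine Interplay - Building Resilience, Redefining Value Networks and Enabling Meaningful Experiences (UNITE)

AI summary: For distant object localisation, proposes two methods, multi-view triangulation and particle filtering, the latter also providing shape and uncertainty estimates; combined with drone image segmentation and GNSS-based pose estimates, they enable reliable wildfire monitoring.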

In-Context Learning of Temporal Point Processes with Foundation Inference Models

David Berghaus, Patrick Seifner, Kostadin Cvejoski, César Ojeda, Ramsés J. Sánchez

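Affiliations: Lamarr Institute; Fraunhofer IAIS; University of Bonn; JetBrains Research; University of Potsdam

AI summary: Proposes FIM-PP, a foundation inference model for point processes based on amortized inference and in-context learning; pretrained on large-scale synthetic data, it can estimate real-world MTPPs without additional training or be rapidly fine-tuned to a target system.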

Comments This paper is published as a conference paper at ICLR 2026

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

Modeling event sequences of multiple event types with marked temporal point processes (MTPPs) provides a principled way to uncover governing dynamical rules and predict future events. Current neural network approaches to MTPP inference rely on training separate, specialized models for each target system. We pursue a radically different approach: drawing on amortized inference and in-context learning, we pretrain a deep neural network to infer, in-context, the conditional intensity functions of event histories from a context defined by sets of event sequences. Pretraining is performed on a large synthetic dataset of MTPPs sampled from a broad distribution of Hawkes processes. Once pretrained, our Foundation Inference Model for Point Processes (FIM-PP) can estimate MTPPs from real-world data without any additional training, or be rapidly finetuned to target systems. Experiments show that this amortized approach matches the performance of specialized models on next-event prediction across common benchmark datasets.
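
For readers unfamiliar with the quantity being inferred, the sketch below evaluates the conditional intensity of a univariate Hawkes process with an exponential kernel, the textbook form lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)). This is a simplification chosen for illustration: the paper works with marked processes and a broad distribution of Hawkes processes, and the parameter values here are arbitrary.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given past event times (exponential kernel).

    mu is the baseline rate, alpha the jump added by each event,
    and beta the decay rate of that excitation.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)

baseline_only = hawkes_intensity(1.0, [], mu=0.5, alpha=0.8, beta=1.0)
after_one_event = hawkes_intensity(1.0, [0.0], mu=0.5, alpha=0.8, beta=1.0)
```

Each past event raises the intensity and the effect decays over time; events at or after `t` are ignored because the intensity conditions only on history.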

2602.22919 2026-06-09 cs.CV version update

UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models

Jiabing Yang, Yixiang Chen, Yuan Xu, Peiyan Li, Zichen Wen, Bowen Fang, Tao Yu, Xiangnan Wu, Qisen Ma, Kai Wang, Ziheng He, Yingda Li, Zhengbo Zhang, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; New Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences; Shanghai Jiao Tong University; FiveAges

AI summary: Proposes the UAOR module, which detects uncertainty via action entropy and reinjects observation information at high-uncertainty language-model layers, improving VLA models on simulation and real-world tasks without extra training or data.

Abstract

Vision-Language-Action (VLA) models leverage pretrained Vision-Language Models (VLMs) as backbones to map images and instructions to actions, demonstrating remarkable potential for generalizable robotic manipulation. To enhance performance, existing methods often incorporate extra observation cues (e.g., depth maps, point clouds) or auxiliary modules (e.g., object detectors, encoders) to enable more precise and reliable task execution, yet these typically require costly data collection and additional training. Inspired by the finding that the Feed-Forward Network (FFN) in language models can act as "key-value memory", we propose Uncertainty-aware Observation Reinjection (UAOR), an effective, training-free and plug-and-play module for VLA models. Specifically, when the current language model layer exhibits high uncertainty, measured by Action Entropy, it reinjects key observation information into the next layer's FFN through attention retrieval. This mechanism directly augments the hidden states with observation evidence at high-uncertainty layers, enabling more accurate and reliable action generation. Comprehensive experiments show that our method consistently improves diverse VLA models across simulation and real-world tasks with minimal overhead. Notably, UAOR eliminates the need for additional observation cues or modules, making it a versatile and practical plug-in for existing VLA pipelines. The project page is at https://uaor.jiabingyang.cn.
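
The gating step can be pictured as an entropy check on a layer's action distribution. The sketch below assumes that Action Entropy is the Shannon entropy of a probability distribution over action tokens and that reinjection is triggered by a fixed threshold; both are assumptions for illustration, and the paper's exact definition and trigger rule may differ.

```python
import math

def action_entropy(probs):
    """Shannon entropy (in nats) of a distribution over action tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_reinject(probs, threshold):
    """Hypothetical gate: reinject observations when entropy exceeds the threshold."""
    return action_entropy(probs) > threshold

uncertain_layer = [0.25, 0.25, 0.25, 0.25]   # flat distribution, high entropy
confident_layer = [0.97, 0.01, 0.01, 0.01]   # peaked distribution, low entropy

gate_uncertain = should_reinject(uncertain_layer, threshold=1.0)
gate_confident = should_reinject(confident_layer, threshold=1.0)
```

Only the flat distribution crosses the threshold, so observation information would be reinjected at that layer and skipped at the confident one.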

2602.17337 2026-06-09 cs.CV version update

Polaffini: A feature-based approach for robust affine and polyaffine image registration

Antoine Legouhy, Cosimo Campo, Ross Callaghan, Hojjat Azadbakht, Hui Zhang

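Affiliations: Hawkes Institute & Department of Computer Science, University College London, London, UK; Institut Pasteur, Université Paris Cité, Unité de Neuroanatomie Appliquée et Théorique; AINOSTICS ltd., Manchester, UK

AI summary: Proposes the Polaffini framework, which uses deep-learning segmentation models to extract anatomical correspondence points and closed-form solutions for global and local affine matching, producing transformations from affine to polyaffine with tunable smoothness; it outperforms conventional methods on structural alignment and on initialising downstream non-linear registration.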

Comments associated github repo: https://github.com/CIG-UCL/polaffini

Abstract

In this work we present Polaffini, a robust and versatile framework for anatomically grounded registration. Medical image registration is dominated by intensity-based registration methods that rely on surrogate measures of alignment quality. In contrast, feature-based approaches that operate by identifying explicit anatomical correspondences, while more desirable in theory, have largely fallen out of favor due to the challenges of reliably extracting features. However, such challenges are now significantly overcome thanks to recent advances in deep learning, which provide pre-trained segmentation models capable of instantly delivering reliable, fine-grained anatomical delineations. We aim to demonstrate that these advances can be leveraged to create new anatomically-grounded image registration algorithms. To this end, we propose Polaffini, which obtains, from these segmented regions, anatomically grounded feature points with 1-to-1 correspondence in a particularly simple way: extracting their centroids. These enable efficient global and local affine matching via closed-form solutions. Those are used to produce an overall transformation ranging from affine to polyaffine with tunable smoothness. Polyaffine transformations can have many more degrees of freedom than affine ones allowing for finer alignment, and their embedding in the log-Euclidean framework ensures diffeomorphic properties. Polaffini has applications both for standalone registration and as pre-alignment for subsequent non-linear registration, and we evaluate it against popular intensity-based registration techniques. Results demonstrate that Polaffini outperforms competing methods in terms of structural alignment and provides improved initialisation for downstream non-linear registration. Polaffini is fast, robust, and accurate, making it particularly well-suited for integration into medical image processing pipelines.
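
The feature-extraction step, taking the centroid of each segmented region and pairing centroids by label, is simple enough to sketch directly. The example below uses tiny 2-D label maps and invented function names purely for illustration; it is not the Polaffini API (see the linked repository for that), and real inputs would be 3-D segmentations.

```python
from collections import defaultdict

def label_centroids(label_map, background=0):
    """Return {label: (row, col)} centroids for a 2-D integer label map."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # row sum, column sum, pixel count
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if label != background:
                acc = sums[label]
                acc[0] += r
                acc[1] += c
                acc[2] += 1
    return {label: (sr / n, sc / n) for label, (sr, sc, n) in sums.items()}

def matched_points(fixed_map, moving_map):
    """Pair centroids of labels present in both images (1-to-1 correspondence by label)."""
    fixed_c = label_centroids(fixed_map)
    moving_c = label_centroids(moving_map)
    return [(moving_c[l], fixed_c[l]) for l in sorted(set(fixed_c) & set(moving_c))]

fixed = [[1, 1, 0],
         [0, 0, 0],
         [0, 2, 2]]
moving = [[0, 1, 1],
          [0, 0, 0],
          [2, 2, 0]]

pairs = matched_points(fixed, moving)
```

The resulting point pairs are the kind of input a closed-form least-squares affine fit would consume, globally or per region.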
