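arXivDaily: a daily digest of arXiv that syncs the full arXiv feed and adds AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.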

2605.23597 2026-05-25 cs.CL cs.LG

Structure-Guided Entity Resolution: Fine-Tuning LLMs for Robust Name Matching in Complex Linguistic Contexts

Shivam Chourasia, Hitesh Kapoor, Nilesh Patil

Affiliations: Dream Sports

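AI summary: This paper studies entity resolution for person-name matching in linguistically and culturally complex settings. It proposes Structure-Guided Entity Resolution (SGER), a framework that fine-tunes a large language model through a two-phase curriculum to strengthen its understanding of name structure and semantics, which improves matching accuracy. The method performs well in real-world settings with high linguistic diversity and noise, such as Indian identity data, reaching 99.02% accuracy. It has been deployed in production, demonstrating its effectiveness and robustness in large-scale multilingual systems.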

Comments: Accepted to ACL 2026. 8 pages, 1 figure, 2 tables

Abstract

Matching person names across heterogeneous records is a core challenge in entity resolution, especially within linguistically and culturally complex environments. Variations in naming conventions, inconsistent transliteration across scripts, and frequent data entry errors make it difficult to unify user identities, an essential requirement for Know Your Customer (KYC) compliance. While Large Language Models have shown promise in understanding natural language, they often struggle with the structured ambiguity present in such domain-specific settings. This paper introduces Structure-Guided Entity Resolution (SGER), a novel framework that fine-tunes an LLM through a two-phase curriculum. The model is first trained to parse the grammatical and semantic structure of personal names, then optimized for the downstream task of binary entity matching. We evaluate SGER in the challenging context of Indian identity data, one of the most linguistically diverse and noisy environments globally. SGER achieves 99.02% accuracy and an F1 of 0.994 on a held-out set of 50,000 real-world pairs, outperforming GPT-4o few-shot prompting and single-stage fine-tuning baselines. The system is fully deployed in production at Dream11, the world's largest fantasy sports platform, serving 250M+ users. Our results demonstrate that curriculum-guided training enables robust, high-precision entity resolution in real-world multilingual systems at scale.

2605.23592 2026-05-25 cs.AI

Solving the Aircraft Disassembly Scheduling Problem

Charles Thomas, Pierre Schaus

Affiliations: Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM); UCLouvain (Université catholique de Louvain)

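AI summary: This paper studies scheduling for the disassembly of end-of-life aircraft. The problem involves a large number of tasks and many constraints, and solving it well is key to making disassembly both sustainable and profitable for air transport companies. The paper proposes two solution approaches, a constraint programming model and a mixed-integer programming model. Both are tested on instances of up to 1450 tasks built from real operational data supplied by an industrial partner.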

Abstract

Dismantling aircraft that reach the end of their life is a complex endeavour that is necessary for sustainability but yields small income margins for air transport companies. Efficient scheduling of the disassembly procedure is thus crucial to ensure the profitability of the process and to incentivize the practice. This is a large scheduling problem that involves thousands of tasks and many different constraints. Extracting parts that are destined to be reused requires technicians with specific certifications and equipment. Extraction operations might be subject to precedence relations. Furthermore, the aircraft must be kept balanced during the whole process. Finally, some locations in the aircraft have limited space, which caps the number of technicians able to work there concurrently. This article presents the problem in detail and proposes two approaches to solve it: a Constraint Programming (CP) model and a Mixed-Integer Programming (MIP) model. The models are tested on instances of varying sizes involving up to 1450 tasks, which are based on real operational data provided by an industrial partner.
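
The sketch below makes these constraint families concrete with a minimal feasibility checker for one candidate schedule. It is not the authors' CP or MIP model. It rests on several assumptions: time is discrete, each task occupies one technician of a single skill class, and the `Task` fields, task names, and capacities are all hypothetical. The balance constraint is left out because the abstract does not say how balance is measured.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    duration: int            # integer time units (assumption)
    location: str            # zone of the aircraft the task occupies
    skill: str               # certification class of the technician needed
    predecessors: list = field(default_factory=list)

def check_schedule(tasks, starts, capacity, technicians):
    """Return the violated constraints of a schedule (empty list = feasible)."""
    by_name = {t.name: t for t in tasks}
    violations = []
    # Precedence: a task may start only after each predecessor has finished.
    for t in tasks:
        for p in t.predecessors:
            if starts[t.name] < starts[p] + by_name[p].duration:
                violations.append(f"precedence {p} -> {t.name}")
    horizon = max(starts[t.name] + t.duration for t in tasks)
    for time in range(horizon):
        active = [t for t in tasks if starts[t.name] <= time < starts[t.name] + t.duration]
        # Space: limited number of concurrent tasks per location.
        for loc, cap in capacity.items():
            if sum(t.location == loc for t in active) > cap:
                violations.append(f"location {loc} over capacity at t={time}")
        # Resources: limited number of certified technicians per skill.
        for skill, count in technicians.items():
            if sum(t.skill == skill for t in active) > count:
                violations.append(f"skill {skill} oversubscribed at t={time}")
    return violations

# Hypothetical toy instance.
tasks = [
    Task("remove_panel", 2, "cabin", "mechanic"),
    Task("extract_seat", 3, "cabin", "mechanic", ["remove_panel"]),
    Task("extract_engine", 4, "left_wing", "engine_certified"),
]
capacity = {"cabin": 1, "left_wing": 1}
technicians = {"mechanic": 1, "engine_certified": 1}
starts = {"remove_panel": 0, "extract_seat": 2, "extract_engine": 0}
print(check_schedule(tasks, starts, capacity, technicians))  # []
```

If `extract_seat` instead starts at time 1, the checker reports three violations: the precedence relation, cabin capacity at t=1, and mechanic availability at t=1. A CP or MIP solver searches over the start times to keep this list empty while optimizing an objective. The checker only evaluates one candidate schedule.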

2605.23590 2026-05-25 cs.AI

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

Yuandao Cai, Yuzhang Zhu, Liyou Gao, Wensheng Tang, Shengchao Qin

Affiliations: Independent Researcher; Xidian University

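AI summary: This paper studies Quantitative Goal Persistence (QGP) in long-horizon language agents that pursue quantitative goals: whether an agent keeps working until an external verifier confirms that enough distinct valid items are complete. To measure this, the authors propose PushBench, a benchmark that directly measures repeated work, duplicate submissions, false completion, and related failures. In their experiments, controllers based on state tracking and work-unit tracking reduce duplicate submissions and raise task completion rates. By contrast, current mainstream agents' success rates drop sharply as the requested count grows. The results highlight the stricter reliability demands that quantitative goals place on agents.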

Abstract

Long-horizon language agents can make many plausible local tool calls yet fail to persist until a requested count is actually complete. We study this gap as Quantitative Goal Persistence (QGP): whether an agent keeps working until an external verifier confirms enough distinct valid items. PushBench turns this into a benchmark for repository-artifact collection and verifier-backed work units, so repeated work, duplicate submissions, false completion, and progress drift are measured directly rather than hidden behind a final success flag. In matched controller comparisons, a state-tracking retrieval controller reaches 69-78% success while eliminating duplicate submissions, and a backlog-tracking work-unit controller reaches 25-50% success in settings where standard and completion-gated controllers complete no task instances. Black-box frontier-agent evaluations with Claude Code (Sonnet 4.6) and Codex CLI (gpt-5.4) solve many 50-artifact tasks but drop to 3 out of 9 successes per condition at 100 artifacts. The results show that quantitative goals stress a different reliability requirement from local task competence: agents must maintain verified progress and stop only when the requested work is complete.
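
The sketch below illustrates the gap between plausible local actions and verified progress with a small state-tracking loop. It is hypothetical and reproduces neither PushBench nor the paper's controllers: the `verifier` callable, the candidate stream, and the `ProgressTracker` class are all assumptions. The loop counts only distinct items the verifier accepts, never resubmits an item, and stops as soon as the requested count is reached.

```python
class ProgressTracker:
    """Tracks verifier-confirmed progress toward a quantitative goal."""

    def __init__(self, target, verifier):
        self.target = target
        self.verifier = verifier   # external check: item -> bool (assumed interface)
        self.verified = set()
        self.rejected = set()
        self.duplicates_avoided = 0

    def submit(self, item):
        # Never resubmit an item whose outcome is already known.
        if item in self.verified or item in self.rejected:
            self.duplicates_avoided += 1
            return False
        if self.verifier(item):
            self.verified.add(item)
            return True
        self.rejected.add(item)
        return False

    @property
    def done(self):
        # Completion is decided by the verified count, not by the agent's own claim.
        return len(self.verified) >= self.target

def run(candidates, tracker):
    for item in candidates:
        if tracker.done:
            break
        tracker.submit(item)
    return tracker.done

tracker = ProgressTracker(3, verifier=lambda path: path.endswith(".py"))
print(run(["a.py", "b.txt", "a.py", "c.py", "d.py", "e.py"], tracker))  # True
print(sorted(tracker.verified), tracker.duplicates_avoided)  # ['a.py', 'c.py', 'd.py'] 1
```

When the candidate stream runs out before the target is met, `run` returns `False` rather than reporting completion. This is the false-completion failure the benchmark is designed to expose.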

2605.23569 2026-05-25 cs.AI

CP or DP? Why Not Both: A Case Study in the Partial Shop Scheduling Problem

Emma Legrand, Roger Kameugne, Pierre Schaus

Affiliations: ICTEAM, UCLouvain, Belgium

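AI summary: This paper studies how to combine Dynamic Programming (DP) and Constraint Programming (CP) effectively to solve the Partial Shop Scheduling Problem (PSSP). The authors propose a hybrid approach in which DP is the main search framework and CP performs global constraint propagation, aiming for efficiency and flexibility. The approach supports arbitrary precedence constraints, can be combined with anytime search strategies, and enables a DP-based Large Neighborhood Search scheme. Together, these features demonstrate the feasibility of integrating DP and CP for combinatorial optimization.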

Abstract

Dynamic Programming (DP) and Constraint Programming (CP) are well-established paradigms for solving combinatorial optimization problems. Usually, these two approaches are used separately. This paper aims to show that the two can be combined effectively and elegantly, with DP serving as the primary search framework and CP used as a subroutine to leverage global constraint propagation. This paper presents such an approach for the Partial Shop Scheduling Problem (PSSP), for which a pure DP method has previously been proposed, and efficient CP filtering algorithms are available. The PSSP is a general scheduling problem where each job consists of a set of operations with arbitrary precedence constraints. The approach is flexible enough to accommodate anytime DP strategies, such as anytime column search, whereas the original DP algorithm operated in a strictly layer-wise manner. Moreover, the flexibility of the CP modeling makes it straightforward to incorporate arbitrary precedence constraints. As a result, the model naturally handles any precedence graph and even enables the design of a Large Neighborhood Search (LNS) scheme, in which the DP model is reused, and partial-order schedules are imposed across restarts to improve the incumbent solution. While not competitive with state-of-the-art pure CP solvers for this specific problem, our primary contribution is demonstrating the viability of this hybrid integration.

2605.23568 2026-05-25 cs.RO cs.SY eess.SY

TactileReflex: Noise-Statistics-Driven Vision-Tactile Reflex Control for Force-Sensitive Manipulation

Ziyan Feng, Yulong Fu, Zheng Li, Yuxin He, Jieji Ren, Lujia Wang, Jinni Zhou, Yudong Zhong, Qiang Nie

Affiliations: Thrust of Robotics and Autonomous Systems, The Hong Kong University of Science and Technology (Guangzhou); School of Mechanical Engineering, Shanghai Jiao Tong University

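AI summary: This paper proposes TactileReflex, a vision-tactile reflex control method driven by noise statistics, for force-sensitive fine manipulation such as grasping and handling liquid-filled plastic cups. By analyzing the tactile sensor's intrinsic noise characteristics, the method derives controller thresholds directly, without external force calibration or manual tuning. Experiments show that TactileReflex prevents irreversible container deformation and achieves high stability and success rates in a dynamic pouring task. It could therefore serve as a safety layer beneath high-level manipulation systems.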

Comments: 8 pages, 4 figures, 6 tables

Abstract

Manipulating fragile deformable containers, such as disposable plastic cups filled with liquid, demands real-time grip-force adaptation within an extremely narrow force margin: insufficient force causes slip, while excessive force irreversibly deforms the thin wall. Existing approaches struggle with such force-sensitive manipulation tasks. We propose a noise-statistics-based, calibration-driven reflex control paradigm with vision-based tactile sensing. By analyzing the sensor's intrinsic noise characteristics (via a brief static-hold-and-unload protocol), we directly derive all controller thresholds, eliminating the need for external force calibration, trial-and-error manual tuning, or material-specific physical models. Instantiating this paradigm, we present TactileReflex, a three-channel closed-loop controller. It extracts three image-level proxies from dual visuo-tactile sensors: shear intensity ($S_y$), contact intensity ($F_n$), and center of pressure ($C$). These proxies drive prioritized reflex channels at ~12 Hz for slip suppression, weight-adaptive release, and force protection. Each channel closes the loop directly on its proxy via noise-derived thresholds. Ablations show that only the full three-channel system prevents irreversible container deformation (5/5 success vs. at most 1/5 for partial configurations). In a dynamic pouring task, fixed-effort baselines fail in all 10 attempts due to pose drift, while TactileReflex achieves 9/10 success across two water volumes. As a self-contained and interpretable controller, TactileReflex can serve as a plug-and-play safety layer beneath high-level manipulation pipelines, including haptic-free VR teleoperation and vision-language-action (VLA) policies.
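
The sketch below shows how thresholds can come from sensor noise rather than force calibration. Every detail is an illustrative assumption. The proxies are plain floats. Each band's half-width is `k` standard deviations of the proxy recorded during a static hold, with `k = 3` chosen arbitrarily. Only two channels are shown, force protection taking priority over slip suppression. The weight-adaptive release channel and the actual image-level proxy extraction are not reproduced.

```python
import statistics

def noise_band(static_samples, k=3.0):
    """Centre and half-width of the band a proxy stays in while the grip is static."""
    return statistics.fmean(static_samples), k * statistics.pstdev(static_samples)

def reflex_action(shear, contact, shear_band, contact_band):
    """Prioritized reflex: force protection first, then slip suppression."""
    contact_mu, contact_hw = contact_band
    shear_mu, shear_hw = shear_band
    if contact > contact_mu + contact_hw:   # squeezing harder than noise explains
        return "loosen"
    if abs(shear - shear_mu) > shear_hw:    # shear outside its noise band: slip onset
        return "tighten"
    return "hold"

# Hypothetical proxy readings recorded during a brief static hold.
shear_band = noise_band([0.10, 0.12, 0.11, 0.09, 0.10, 0.12, 0.11, 0.09])
contact_band = noise_band([1.00, 1.02, 0.98, 1.00])
print(reflex_action(0.105, 1.00, shear_band, contact_band))  # hold
print(reflex_action(0.300, 1.00, shear_band, contact_band))  # tighten
print(reflex_action(0.300, 1.50, shear_band, contact_band))  # loosen
```

Because each band scales with the measured noise, a noisier sensor automatically gets wider bands, and no force unit ever appears in the controller.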

2605.23565 2026-05-25 cs.LG cs.AI

Understanding Goal Generalisation in Sequential Reinforcement Learning

Jason Ross Brown, Edward James Young

Affiliations: University of Cambridge; Geodesic Research

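AI summary: This work studies how well sequentially trained reinforcement learning agents generalize their goals to new environments, and how their training history shapes that behaviour. The authors study more than 100 sequential training pipelines and evaluate them in more than 250 out-of-distribution environments. They find that salient features and goals learned early in training strongly influence later generalization. They also propose latent policy gradients, a method that predicts the out-of-distribution behaviour a training pipeline is likely to induce. The method is accurate, generalizes to unseen pipeline types, and is interpretable, laying a foundation for understanding goal generalization from a developmental perspective.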

Abstract

Reinforcement learning agents often exhibit unintended goal-directed behaviour outside their training distribution, but we currently lack a principled understanding of how such agents will generalise to novel environments based on their training history. We address this gap for agents trained sequentially on one or more tasks. We study over 100 sequential training pipelines, evaluating behaviour across over 250 out-of-distribution environments. We find that salient features drive generalisation, and that goals learnt early in training can persist and influence those acquired later. To explain these phenomena, we introduce latent policy gradients, a method that predicts what out-of-distribution behaviour a training pipeline will likely induce. Our method simulates the evolution of low-dimensional latent variables during training according to what would achieve high reward on the training objective with respect to a simple model of how the latent variables map to behaviour. It achieves strong predictive accuracy, generalises to unseen types of training pipeline, and is interpretable. Our findings demonstrate that while out-of-distribution RL agent behaviour is dependent on the whole training pipeline, this dependence has an underlying structure we can capture, laying groundwork for understanding goal generalisation from a developmental perspective.

2605.23563 2026-05-25 cs.LG

MARS: Magnitude-Aware Rank Statistics

Muhammad Rajabinasab, Afsaneh M. Nejad, Arthur Zimek

Affiliations: University of Southern Denmark

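AI summary: When many machine learning models are compared, the evaluation should faithfully reflect how large their performance differences are. Traditional critical difference (CD) diagrams rely on discrete ranks and ignore the magnitude of the performance gaps between models, a problem the paper calls "magnitude blindness". This paper proposes MARS, a magnitude-aware rank statistic that weights discrete ranks with a relative margin coefficient. The result reflects performance differences more faithfully and gives deeper insight across a wide range of experimental settings.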

Comments: Preprint submitted to Elsevier Pattern Recognition Letters
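
The sketch below shows the "magnitude blindness" of ordinary average ranks, the statistic behind standard CD diagrams. It uses hypothetical scores and only the standard library. It computes per-dataset ranks, averaging tied ranks, and shows that a near-tie and a wide gap produce identical average ranks. It illustrates the problem only. It does not implement MARS or its relative margin coefficient, whose exact form is not given here.

```python
def ranks(scores):
    """Rank models on one dataset: 1 = best (highest score); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            result[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return result

def average_ranks(table):
    """table[d][m] is the score of model m on dataset d."""
    per_dataset = [ranks(row) for row in table]
    return [sum(r[m] for r in per_dataset) / len(table) for m in range(len(table[0]))]

near_tie = [[0.901, 0.900, 0.70], [0.851, 0.850, 0.60]]
wide_gap = [[0.950, 0.700, 0.69], [0.900, 0.600, 0.59]]
print(average_ranks(near_tie))  # [1.0, 2.0, 3.0]
print(average_ranks(wide_gap))  # [1.0, 2.0, 3.0]
```

The first model leads by 0.001 in one table and by 0.25 or more in the other, yet the rank statistic cannot tell the two cases apart.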

2605.23559 2026-05-25 cs.CV cs.AI

PathNavigate: A Training-Free Pathology Agent with Surprise-Guided Scan and Shared Slide Memory for Whole-Slide Image VQA

Chunze Yang, Qidong Liu, Wenjie Zhao, Yue Tang, Jiusong Ge, Di Zhang, Jiashuai Liu, Lei Wu, Junbo Lu, Ni Zhang, Xian Wu, Zeyu Gao, Chen Li

Affiliations: School of Computer Science & Technology, Xi'an Jiaotong University; Tencent Jarvis Lab; University of Cambridge

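AI summary: PathNavigate is a training-free pathology agent for whole-slide image visual question answering (WSI-VQA). It addresses the problem of efficiently locating key pathological evidence under a limited inspection budget. It follows a scan-search-readout pipeline. First, a shared online memory module produces a pool of abnormal regions. Question-conditioned relevance is then applied within that pool to select high-magnification target regions, which improves answer accuracy and interpretability. Experiments show that PathNavigate, while keeping its models frozen, achieves higher efficiency and a more reliable evidence-selection path.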

Abstract

Whole-slide image visual question answering (WSI-VQA) frames pathology as an extreme-context search problem: to answer a free-form clinical query, a system must first navigate a gigapixel slide under a strict inspection budget to locate sparse, high-resolution evidence. Existing approaches largely fall into two paradigms: i) supervised pathology multimodal large language models (MLLMs) and agents can absorb localization and reasoning into learned modules, but they often couple navigation to task-specific supervision and retraining, limiting their practicality; ii) training-free pathology agents avoid this cost by keeping core models frozen, but often follow a question-first design, constructing the initial candidate set mainly from query-conditioned relevance. This can miss decisive morphology that is not named in the question, and force heavier inference-time scaffolding. To address this challenge, we introduce PathNavigate, a training-free pathology agent built around a scan-search-readout routine. Before question matching, PathNavigate scans the current slide at low magnification with a shared online memory module over frozen pathology features, producing a slide-specific surprise field that marks an abnormal-region pool. It then applies question-conditioned PLIP relevance only within this pool to select high-magnification search targets. Finally, it extracts local high-magnification evidence and answers with a frozen perceptor-adjudicator stack, using the same online memory as slide-level context. Experiments on WSI-VQA and SlideBench-BCNB show that the proposed scan-search-readout design improves answer accuracy and yields more interpretable evidence-selection trajectories with higher efficiency. The code is available online.
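
The sketch below illustrates, on toy data, the ordering that separates scan-search-readout from a question-first design: a candidate pool is built from slide-level surprise before question relevance is applied. Both scoring functions are hypothetical stand-ins. Here "surprise" is the distance of each low-magnification patch feature from a running mean of the patches scanned so far. `relevance` is a given list of scores standing in for PLIP similarity. The paper's online memory module and perceptor-adjudicator stack are not reproduced.

```python
import math

def surprise_scores(features, momentum=0.9):
    """Distance of each patch from a running mean of previously scanned patches."""
    memory = list(features[0])
    scores = []
    for f in features:
        scores.append(math.dist(f, memory))
        memory = [momentum * m + (1 - momentum) * x for m, x in zip(memory, f)]
    return scores

def select_targets(features, relevance, pool_size, n_targets):
    """Scan first (surprise pool), then rank only pooled patches by relevance."""
    surprise = surprise_scores(features)
    pool = sorted(range(len(features)), key=lambda i: -surprise[i])[:pool_size]
    return sorted(pool, key=lambda i: -relevance[i])[:n_targets]

# Hypothetical 2-D patch features: patches 3 and 5 are atypical for this slide.
features = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [0.1, 0.1], [4, 0], [0, 0]]
relevance = [0.1, 0.9, 0.2, 0.3, 0.1, 0.5, 0.1]   # question-first would pick patch 1
print(select_targets(features, relevance, pool_size=2, n_targets=1))  # [5]
```

A question-first ranking would send the high-magnification budget to patch 1, which is the most relevant to the query but unremarkable on this slide. Surprise-first selection confines the search to the two atypical patches.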

2605.23556 2026-05-25 cs.LG cs.IR math.CO

Is Dimensionality a Barrier for Retrieval Models?

Kiril Bangachev, Guy Bresler, Jonathan Kogan, Yury Polyanskiy

Affiliations: Department of Electrical Engineering and Computer Science

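AI summary: This paper asks why modern embedding-based retrieval models can handle billions or even trillions of data points even though their representations are relatively low-dimensional (about 1000 dimensions). It focuses on the maximal-margin embedding problem: given a query-document relevance matrix, how large a margin can be achieved in a limited dimension. The paper proves that when the relevance matrix contains every $k$-sparse row, dimension $O(k \log(n/k))$ is necessary and sufficient to reach the optimal margin, which resolves the dimension requirement of the related model. Experiments further show that the sigmoid loss has a clear advantage over InfoNCE in producing large-margin embeddings.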

Abstract

Why does the low dimensionality of representations, typically $d\approx 1000$, not prevent modern embedding-based retrieval models from scaling to billions, or even trillions, of data points? To answer this question, we study maximal-margin embeddings in the following retrieval model, classically studied in communication complexity [PS86] and more recently in embedding-based retrieval [WBNL26]. Let $A\in \{0,1\}^{N\times n}$ be a matrix indicating whether each of $N$ queries is relevant to each of $n$ documents. We are interested in the largest margin $m>0$, denoted by $\mathsf{m}^{\mathsf{rd}}(d, A)$, for which there exist unit-norm embeddings of the queries and documents in dimension $d$, $\{U_j\}_{j = 1}^N$ and $\{V_i\}_{i = 1}^n$, with the following property: $\langle U_j, V_i\rangle \ge m$ whenever $A_{ji} = 1$, and $\langle U_j, V_i\rangle \le -m$ otherwise. A large margin is a key proxy for representation quality: it controls both robustness to perturbations and compositional generalization across queries. Our main theorem establishes that the best possible margin without a restriction on the dimension, $\mathsf{m}^{\mathsf{rd}}(+\infty, A)$, can be nearly achieved in dimension $d = O(\mathsf{m}^{\mathsf{rd}}(+\infty, A)^{-2}\log n)$, which improves a theorem of [BDES02]. Together with a matching lower bound in Theorem 1.5, we conclude that when $A\in \{0,1\}^{\binom{n}{k}\times n}$ is the matrix containing all possible $k$-sparse rows once, dimension $d = O(k\log (n/k))$ is necessary and sufficient for the maximal possible margin $\mathsf{m}^{\mathsf{rd}}(+\infty, A) = \Theta(k^{-1/2})$ in this setting. This fully resolves the setup of [WBNL26]. We also give several constructions for large margins when $d = o(k\log (n/k))$. Finally, we empirically test the InfoNCE and sigmoid losses for producing large-margin embeddings and demonstrate a clear advantage of the sigmoid loss.
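
The margin in this definition is easy to evaluate for a given embedding. Flip the sign of every inner product whose entry of $A$ is 0, then take the minimum over all query-document pairs; a positive result is a valid margin $m$. The sketch below does this for a toy case with $n = 3$ documents and $k = 1$, so $A$ is the $3 \times 3$ identity. The embedding is chosen by hand for illustration. The code evaluates this one embedding and does not compute the optimum $\mathsf{m}^{\mathsf{rd}}(d, A)$.

```python
import math

def margin(A, U, V):
    """min over (j, i) of s_ji * <U_j, V_i>, with s_ji = +1 if A[j][i] == 1 else -1."""
    for vec in U + V:
        assert abs(math.hypot(*vec) - 1.0) < 1e-9, "embeddings must be unit norm"
    return min(
        (1 if A[j][i] else -1) * sum(u * v for u, v in zip(U[j], V[i]))
        for j in range(len(U))
        for i in range(len(V))
    )

s = 1 / math.sqrt(3)
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                    # each query relevant to one document
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # documents: standard basis
U = [[s, -s, -s], [-s, s, -s], [-s, -s, s]]              # queries: +s on relevant axis, -s elsewhere
print(margin(A, U, V))  # approximately 0.577, i.e. 1/sqrt(3)
```

A negative return value means the embedding does not realize the relevance pattern at all. For example, marking query 0 as also relevant to document 1 makes this same embedding fail.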

2605.23555 2026-05-25 cs.CV

Generator-Refiner-Examiner: A Tri-Module Data Augmentation Framework for 3D Human Avatar Learning from Monocular Videos

Gangjian Zhang, Jian Shu, Sicheng Yu, Wenhao Shen, Yu Feng, Hao Wang

Affiliations: Nanyang Technological University

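AI summary: This paper addresses the challenge of reconstructing 3D human avatars with realistic appearance and animatability from monocular videos. Existing methods struggle to capture fine detail when training data are scarce. To address this, the paper proposes TrioMan, a tri-module data augmentation framework with three cooperating components: a Generator that produces diverse samples, a Refiner that improves their quality, and an Examiner that keeps only samples consistent with the human body. Experiments show that the method outperforms existing state-of-the-art methods on multiple benchmark datasets.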

2605.23551 2026-05-25 cs.LG cs.AI

Goal-Conditioned Agents that Learn Everything All at Once

Michael Matthews, Matthew Jackson, Michael Beukman, Thomas Foster, Alistair Letcher, Scott Fujimoto, Cédric Colas, Jakob Foerster

Affiliations: University of Oxford; McGill University; MIT; Inria

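AI summary: This paper proposes LEO (Learning Everything all at Once), a method for making goal-conditioned reinforcement learning more efficient. By outputting values and actions for all goals at once, LEO enables efficient parallel updates and avoids the high computational cost of conventional all-goals learning. Experiments show that LEO significantly outperforms other methods on goal-conditioned Craftax and is competitive in continuous control environments, with a more than 250x speed-up over all-goals relabelling. This makes it a useful tool for reinforcement learning in complex environments.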

Abstract

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything all at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then go on to show that this approach can be made even more powerful by using LEO as a teacher network, rather than a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open source our code.
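
A tabular caricature shows why storing one value per goal makes all-goals learning cheap. Each transition updates the whole goal vector for a state-action pair, instead of relabelling and reprocessing the transition once per goal. This is a hypothetical sketch, not the LEO architecture. A 5-state chain replaces the environment, and a table replaces the network's per-goal outputs. The reward is 1 on reaching the goal. The behaviour policy is uniformly random, so every update is off-policy. The inner loop over goals stands in for what a network would compute in one forward pass.

```python
import random

N_STATES = 5
MOVES = (-1, +1)   # action 0 = left, action 1 = right on a 1-D chain

def step(s, a):
    return min(max(s + MOVES[a], 0), N_STATES - 1)

def all_goals_update(Q, s, a, s_next, gamma=0.9, lr=0.5):
    """Update Q[s][a][g] for every goal g from a single transition."""
    row = Q[s][a]
    for g in range(N_STATES):
        if s_next == g:
            target = 1.0   # goal reached: reward 1, episode ends for this goal
        else:
            target = gamma * max(Q[s_next][b][g] for b in range(len(MOVES)))
        row[g] += lr * (target - row[g])

def train(n_steps=5000, seed=0):
    rng = random.Random(seed)
    Q = [[[0.0] * N_STATES for _ in MOVES] for _ in range(N_STATES)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(len(MOVES))   # goal-agnostic random behaviour
        s_next = step(s, a)
        all_goals_update(Q, s, a, s_next)
        s = s_next
    return Q

def greedy(Q, s, g):
    return max(range(len(MOVES)), key=lambda a: Q[s][a][g])

Q = train()
print(greedy(Q, 0, 4), greedy(Q, 4, 0))  # 1 0  (go right toward 4, left toward 0)
print(round(Q[0][1][4], 3))               # 0.729 = 0.9 ** 3
```

A single stream of goal-agnostic experience yields a correct greedy policy for every goal, with no per-goal replay of the data.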

2605.23540 2026-05-25 cs.LG

When One Point Is Not Enough: Addressing Ambiguous Instances in Dimensionality Reduction by Splitting

Diede P. M. van der Hoorn, Alessio Arleo, Fernando V. Paulovich

Affiliations: Eindhoven University of Technology

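AI summary: This paper studies how ambiguous data points distort neighborhood structure in dimensionality reduction (DR). It proposes a graph-based method that identifies such ambiguous instances and replicates them, mapping each to several positions so that its multiple high-dimensional neighborhood relationships are represented more faithfully. The method mitigates the loss of local structure that single-point mapping causes in conventional DR techniques, and it reveals previously hidden neighborhood relationships on multiple examples.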

Abstract

Dimensionality Reduction (DR) methods are widely used to visualize high-dimensional data. One key task in DR-based analysis is discovering neighborhoods, which relies on analyzing the fine-grained local structure of a projection. However, DR is an inherently lossy process; no technique can perfectly preserve the high-dimensional relationships, and projections therefore contain visual artifacts. In this paper, we highlight a typically overlooked source of visual artifacts: ambiguous instances. These are instances that are highly similar to multiple mutually dissimilar neighborhoods in the high-dimensional space. Standard DR methods cannot faithfully project such instances, since each data instance is mapped to a single point in the visual space. As a result, such an instance is placed in only one of its neighborhoods (or in none at all), so only part of its neighborhood structure is represented. We call this distortion partial neighborhood embedding. We introduce a graph-based approach that identifies ambiguous instances and replicates them as multiple points in the projection, placing each copy within its respective neighborhood. We use UMAP for our results, but our approach also generalizes to other local graph-based DR techniques. We show that our approach reveals previously hidden neighborhood memberships in projections and reduces partial neighborhood embedding across multiple examples, findings that are further supported by quantitative analyses.
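
One simple way to operationalize "similar to multiple mutually dissimilar neighborhoods" is to take the k nearest neighbours of a point and check whether they split into more than one connected group of their own kNN graph. The sketch below does this on toy 2-D data. It is an assumed heuristic for illustration, not the authors' detection rule. It stops at identifying the groups each copy would join and does not run UMAP.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of point i (Euclidean, ties by index)."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]

def neighbour_groups(points, i, k):
    """Split the kNN of point i into connected groups of the symmetrized kNN graph."""
    nbrs = knn(points, i, k)
    adj = {j: set() for j in nbrs}
    for j in nbrs:
        for m in knn(points, j, k):
            if m in adj:            # keep edges among i's neighbours only; i itself is excluded
                adj[j].add(m)
                adj[m].add(j)
    groups, seen = [], set()
    for start in nbrs:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            group.append(u)
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
        groups.append(sorted(group))
    return groups

def ambiguous_instances(points, k):
    """Map each ambiguous point to the neighbour groups its copies would join."""
    result = {}
    for i in range(len(points)):
        groups = neighbour_groups(points, i, k)
        if len(groups) > 1:
            result[i] = groups
    return result

# Two tight clusters plus one point halfway between them (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1), (5.5, 0.5)]
print(ambiguous_instances(points, k=4))  # {8: [[2, 3], [4, 5]]}
```

Only the bridging point is flagged. Its neighbours fall into two disconnected groups, so a splitting approach would place one copy near each cluster instead of a single point in the empty space between them.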

2605.23523 2026-05-25 cs.CV

ComPose: When to Trust Hands for Object Pose Tracking

Jisu Shin, Junoh Lee, JunGyu Lee, Inhwan Bae, Dohyeon Lee, Hokyun Im, Youngwoon Lee, Hae-Gon Jeon

Affiliations: GIST (Gwangju Institute of Science and Technology); Yonsei University; DGIST (Daegu Gyeongbuk Institute of Science and Technology)

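AI summary: This paper proposes ComPose, a 6-DoF object pose tracking framework for robustly tracking hand-occluded objects from RGB video. Rather than treating hand motion purely as occlusion, it uses it as a complementary cue, combining object and hand cues in a unified tracking pipeline. The pipeline adaptively selects informative hand joints, fuses the cues, and refines the estimate with geometric evidence, producing stable and accurate object trajectories. Experiments show that the method performs well under severe occlusion and geometric ambiguity and produces temporally consistent 3D trajectories without external smoothing. These trajectories are suitable for downstream tasks such as robot manipulation.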

Comments: 22 pages, 10 figures

Abstract

Reconstructing the motion of objects from videos is a key component for embodied AI and robot manipulation. While diverse approaches to object pose tracking have been studied, they rely heavily on strong external priors, such as depth data or 3D templates, and remain highly vulnerable to severe occlusions by hand grasps despite the use of explicit masks. In this work, we present ComPose, a 6DoF object tracking framework designed for hand-aware object pose estimation from RGB video. Rather than treating the hand purely as an occluder, our method harmonizes hand motions as a complementary cue for object tracking. In detail, we recover a variety of object motions over time by combining object and hand cues from foundation models within a unified tracking pipeline. Here, ComPose adaptively selects informative hand joints, combines object- and hand-derived cues for motion estimation, and refines the resulting object motion using visible geometric evidence and a learned correction. We further enforce temporal consistency over both rotation and translation, yielding stable 3D object trajectories over time without any external smoothing. Extensive experiments show that our method is accurate, efficient, and robust under severe hand occlusion and geometric ambiguity. In addition, the resulting trajectories transfer effectively to downstream robot manipulation, enabling robots to reconstruct human actions from online videos.

2605.23522 2026-05-25 cs.LG cs.AI cs.CV

Precise: SDE-Consistent Stochastic Sampling for RL Post-Training of Flow-Matching Models

Jade Zou, Tao Huang, Weijie Kong, Junzhe Li, Yue Wu, Qi Tian, Jiangfeng Xiong, Jianwei Zhang, Liefeng Bo, Zhao Zhong

Affiliations: Peking University; Tencent Hunyuan

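AI summary: This paper studies reinforcement learning (RL) post-training of flow-matching models to improve generation quality and prompt alignment. The core idea is to turn the deterministic sampling trajectory into a stochastic policy using a sampler consistent with a stochastic differential equation (SDE), balancing exploration and stability. The proposed sampler, Precise, keeps the denoising trajectory SDE-consistent while reducing excess noise. Experiments show that it outperforms existing samplers in both reward-optimization speed and generation quality.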

Abstract

Reinforcement learning (RL) has become an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. A critical step for applying online RL to flow matching is turning the deterministic sampling trajectory into a stochastic policy, typically by replacing the reverse-time Ordinary Differential Equation (ODE) with a Stochastic Differential Equation (SDE). The stochastic sampler, controlling the exploration behavior and denoising dynamics, is thus part of the policy, and its design can significantly affect the reward optimization performance. We break down the sampler design into two interdependent components: choosing the right amount of stochastic exploration, and discretizing the resulting SDE faithfully at the small step counts used in RL. To address the first component, we analyze the inherent tension between exploration and stability in denoising and derive an SDE schedule that balances the two. Turning to the discretization challenge, we use a toy example to show that existing samplers can deviate from the flow-matching process, either by introducing excessive discretization noise or by relying on heuristic rules that do not guarantee convergence to the data distribution. To address these issues, we propose Precise, a new stochastic sampler that balances effective exploration with stability. Crucially, Precise keeps the denoising trajectory SDE-consistent through a novel approximation that freezes the clean-latent posterior mean, resolving the excess noise issue in standard samplers. Extensive experiments demonstrate that this formulation leads to significantly faster and more stable reward optimization via reinforcement learning, achieving state-of-the-art alignment scores (e.g., PickScore, HPSv2.1) while requiring 13.1-53.2% less wall-clock training time to match the best in-domain performance of prior samplers.
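
For intuition about what "SDE-consistent" means at a coarse step, here is a generic DDIM-style stochastic step for a rectified-flow path. It assumes the convention $x_t = (1-t)\,x_0 + t\,\varepsilon$, so $t = 1$ is pure noise and the velocity is $v = \varepsilon - x_0$. Within one step, the clean-sample estimate is held fixed and part of the noise is redrawn. With an exact velocity, the sample therefore keeps the correct marginal at $t_{\text{next}}$ for any noise level $s \le t_{\text{next}}$. This is a standard construction in scalar form, not the Precise sampler; its noise schedule and posterior-mean approximation are not reproduced.

```python
import math
import random

def stochastic_step(x_t, v, t, t_next, s, rng):
    """Move from time t to t_next < t on the path x_t = (1 - t) * x0 + t * eps."""
    assert 0.0 <= s <= t_next < t
    x0_hat = x_t - t * v           # clean-sample estimate, held fixed for this step
    eps_hat = x_t + (1 - t) * v    # noise estimate consistent with x0_hat
    # Keep part of the predicted noise, redraw the rest: total noise variance stays t_next**2.
    return ((1 - t_next) * x0_hat
            + math.sqrt(t_next ** 2 - s ** 2) * eps_hat
            + s * rng.gauss(0.0, 1.0))

def check_marginal(n=20000, seed=0, x0=2.0, t=0.8, t_next=0.5, s=0.3):
    """Empirical mean and variance after one step, using the exact velocity eps - x0."""
    rng = random.Random(seed)
    outs = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x_t = (1 - t) * x0 + t * eps
        outs.append(stochastic_step(x_t, eps - x0, t, t_next, s, rng))
    mean = sum(outs) / n
    return mean, sum((o - mean) ** 2 for o in outs) / n

mean, var = check_marginal()
print(round(mean, 2), round(var, 2))  # should be close to 1.0 and 0.25
```

With `s = 0` the step reduces to a deterministic DDIM-style update. Raising `s` adds exploration without changing the marginal, provided the velocity is accurate. When the velocity is only approximate, as with a real model at few steps, the discretization error discussed above appears.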

2605.23518 2026-05-25 cs.CV

VINS-120K: Ultra High-Resolution Image Editing with A Large-Scale Dataset

Zhizhou Chen, Shanyan Guan, Zhanxin Gao, En Ci, Yanhao Ge, Wei Li, Zhenyu Zhang, Jian Yang, Ying Tai

Affiliations: Nanjing University; vivo

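AI summary: This paper presents VINS-120K, a large-scale dataset of 120K instruction-based image-editing triplets in which every image exceeds 4K resolution, intended to advance research on ultra-high-resolution image editing. It also proposes a high-frequency-aware post-adaptation strategy that lets existing models handle ultra-high-resolution images effectively. In addition, it builds the VINS-4KEval benchmark for evaluating edits. Together, the dataset and method give ultra-high-resolution image editing both high-quality training data and a new adaptation technique.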

English Abstract

Directly editing ultra-high-resolution (UHR) images is valuable but underexplored, primarily due to the lack of high-quality data and the challenge in modeling high-frequency texture details. We introduce VINS-120K, the first large-scale dataset for instruction-based UHR image editing, comprising 120K carefully curated triplets of instruction, input image, and edited image. Each image exceeds 4K resolution (≥4096 × 4096) and is filtered through a rigorous multi-stage pipeline to ensure visual quality, instruction alignment, and aesthetic fidelity. Built on VINS-120K, we further develop a high-frequency-aware post-adaptation strategy to extend pretrained non-high-resolution models to the UHR regime. We also present VINS-4KEval, a benchmark covering diverse editing types, to facilitate consistent evaluation in UHR settings. Experiments confirm that our work improves fine-grained detail synthesis and texture realism in UHR image editing.
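
The abstract describes a multi-stage filter on resolution, visual quality, instruction alignment, and aesthetic fidelity. The sketch below shows that kind of cascade. Only the 4096-pixel minimum side comes from the abstract. The `Triplet` schema, the score fields, and the 0.8 thresholds are hypothetical placeholders for whatever scorers the authors actually use.

```python
from dataclasses import dataclass

@dataclass
class Triplet:
    """One instruction-editing example (hypothetical schema)."""
    instruction: str
    input_size: tuple[int, int]   # (width, height) of the input image
    edited_size: tuple[int, int]  # (width, height) of the edited image
    quality: float                # hypothetical visual-quality score in [0, 1]
    alignment: float              # hypothetical instruction-alignment score in [0, 1]
    aesthetic: float              # hypothetical aesthetic score in [0, 1]

MIN_SIDE = 4096  # from the abstract: every image is at least 4096 x 4096

def passes_resolution(t: Triplet) -> bool:
    # Both the input and the edited image must be >= 4096 px on every side.
    return min(*t.input_size, *t.edited_size) >= MIN_SIDE

def filter_triplets(triplets, min_quality=0.8, min_alignment=0.8, min_aesthetic=0.8):
    """Apply the stages in order, cheapest first; thresholds are illustrative."""
    stages = [
        passes_resolution,
        lambda t: t.quality >= min_quality,
        lambda t: t.alignment >= min_alignment,
        lambda t: t.aesthetic >= min_aesthetic,
    ]
    kept = list(triplets)
    for stage in stages:
        kept = [t for t in kept if stage(t)]
    return kept
```

Running the cheap resolution check first means the expensive model-based scores only need to be computed for images that are already large enough.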

2605.23510 2026-05-25 cs.LG

Ask an Old Friend: Diagnosing and Mitigating Temporal Failure Modes in LLM-Based Statutory Question Answering

Max Prior, Andreas Schultz, Matthias Grabmair

Affiliations: Technical University of Munich

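AI Summary: This study examines two failure modes of LLM-based legal question-answering systems when handling time-sensitive statutory provisions: staleness after a statute has been amended, and a bias toward newer provisions. To this end, it builds a benchmark of 312 expert-validated German statutory QA pairs and evaluates five LLMs under four inference settings. The results show that retrieval-augmented approaches substantially improve temporal validity, whereas relying on web search alone yields unstable gains and a recency bias. The authors conclude that legal QA must treat temporal validity as a hard constraint.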

English Abstract

Large language models are increasingly used for legal research, yet their fixed training cutoffs and reliance on static parametric knowledge are at odds with the evolving nature of statutory law. We study two temporal failure modes: post-cutoff staleness, where models apply superseded rules after legislative amendments, and recency bias, where models prefer newer provisions even when a historical version governs the fact pattern. To this end, we present a benchmark of 312 expert-validated, time-sensitive German statutory QA pairs spanning three categories: Post-Cutoff Amendment Questions, Pre-Amendment Questions, and Multi-Provision Pre-Amendment Questions. We evaluate five LLMs from OpenAI, Anthropic, and DeepSeek under four inference settings: Vanilla, Web-search, and two retrieval-augmented variants that enforce temporal validity via fact-date extraction and version filtering. Using an LLM-as-a-judge validated against human expert ratings, we find severe degradation in the Vanilla post-cutoff setting. Both RAG approaches substantially improve performance across all question types, while web search yields unstable gains and exhibits a marked recency bias on historically anchored tasks. Our results indicate that reliable legal QA requires treating temporal validity as a hard constraint.
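
The RAG variants enforce temporal validity by extracting the date of the facts and filtering statute versions. The sketch below covers only the version-filtering step, assuming each provision version carries validity dates and that fact-date extraction has already produced a `date`. The `ProvisionVersion` schema and both function names are hypothetical, not the paper's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class ProvisionVersion:
    """One historical version of a statutory provision (hypothetical schema)."""
    provision: str            # provision identifier, e.g. "sec-1"
    text: str
    valid_from: date          # first day this version is in force
    valid_to: Optional[date]  # last day in force, or None if still current

def version_in_force(versions, provision, fact_date):
    """Return the version of `provision` that governed `fact_date`, or None."""
    for v in versions:
        if v.provision != provision:
            continue
        still_valid = v.valid_to is None or fact_date <= v.valid_to
        if v.valid_from <= fact_date and still_valid:
            return v
    return None

def latest_version(versions, provision):
    """The recency-biased choice: the newest version, regardless of the fact date."""
    candidates = [v for v in versions if v.provision == provision]
    return max(candidates, key=lambda v: v.valid_from, default=None)
```

Only the version returned by `version_in_force` would be passed to the model as context. `latest_version` mimics the behavior the abstract calls recency bias, which picks the wrong text whenever the facts predate an amendment.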

2605.23493 2026-05-25 cs.AI

Learning Individual Dynamics from Sparse Cross-Sectional Snapshots

Christian Lagemann, Kai Lagemann, Steven L. Brunton, Sach Mukherjee

Affiliations: Statistics and Machine Learning, German Center for Neurodegenerative Diseases (DZNE); MediaTek Research; Department of Mechanical Engineering & AI Institute in Dynamic Systems, University of Washington, Seattle; DZNE & University of Bonn, Bonn, Germany, and University of Cambridge, Cambridge, United Kingdom

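AI Summary: This study learns how individuals evolve over time from sparse cross-sectional snapshots. When data are sparse or entirely cross-sectional, conventional methods struggle to infer continuous-time individual trajectories accurately. The paper proposes CADENCE, a probabilistic framework that recovers individual trajectories from isolated snapshots by anchoring latent dynamics to static, individual-level context. The method combines a score-based spatial encoder with a Soft Mixture-of-Experts (SMoE) router and provides identifiability guarantees for single-timepoint trajectory inference. Across several benchmarks it matches or exceeds state-of-the-art sequence models trained on dense, full-trajectory data.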

English Abstract

Predicting how a dynamical unit evolves over time (how an individual ages, an epidemic spreads, or a physical system degrades) typically requires dense longitudinal tracking. When only extremely sparse or entirely cross-sectional data are available, inferring individualized, continuous-time trajectories is fundamentally ill-posed. Existing methods force a strict compromise: sequence models (e.g., latent ODEs) require dense longitudinal data, while cross-sectional methods (e.g., optimal transport or flow-matching-based approaches) map aggregate populations and lose individual dynamics. In this paper, we demonstrate that this dichotomy can be broken. We introduce CADENCE, a principled probabilistic framework that recovers continuous individual trajectories from isolated snapshots by anchoring latent dynamics to static, individual-level contexts. We provide novel identifiability guarantees for single-timepoint trajectory inference. By combining a score-based spatial encoder (a bijective Probability Flow ODE) that eliminates diffeomorphic ambiguities with a Soft Mixture-of-Experts (SMoE) router, we show that the individual dynamical parameters and the routing function are jointly identifiable. Across a suite of benchmarks spanning physical systems to real-world biological data, CADENCE, trained strictly on extremely sparse snapshots with context structure, matches or exceeds the performance of state-of-the-art sequential models trained on dense, full-trajectory data.
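
To make "anchoring latent dynamics to static context through a soft mixture-of-experts router" concrete, here is a minimal sketch of a context-conditioned latent ODE whose vector field is a softmax-weighted mixture of expert dynamics. Linear experts, a linear gating layer, and forward-Euler integration are hypothetical simplifications. The sketch omits CADENCE's score-based spatial encoder and does not reproduce its identifiability construction.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D array."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def route(context, W_gate):
    """Soft routing weights over K experts from a static context vector."""
    return softmax(W_gate @ context)

def integrate(z0, context, W_gate, experts, t_end=1.0, steps=100):
    """Euler-integrate dz/dt = sum_k w_k(context) * (A_k @ z).

    experts has shape (K, d, d). Because the experts are linear, mixing their
    outputs equals applying the weighted sum of their matrices."""
    w = route(context, W_gate)
    A = np.tensordot(w, experts, axes=1)  # context-specific (d, d) dynamics matrix
    dt = t_end / steps
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(steps):
        z = z + dt * (A @ z)
        traj.append(z.copy())
    return np.stack(traj)  # shape (steps + 1, d)
```

Two individuals with different contexts get different routing weights and therefore different trajectories from the same initial state. That per-individual signal is what lets a model of this kind separate individual dynamics from population averages.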