arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.04570 2026-06-04 cs.SD

Flow-HOA: Generative Joint Optimization for Ambisonics Encoding via Flow Matching

Yuhuan You, Yufan Qian, Tianshu Qu, Bin Wang, Xueyang Lv

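AI summary: Proposes Flow-HOA, a generative framework that uses conditional flow matching to jointly optimize time-domain, spectral, and spatial fidelity and produces a deployable FIR encoding filter bank; it outperforms strong baselines on both synthetic data and real recordings.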

Comments
Accepted for presentation at AES Europe 2026 Convention (AES 160th Convention), Copenhagen, Denmark, May 28-30, 2026
Abstract

Higher-Order Ambisonics (HOA) encoding from sparse, irregular microphone arrays remains a critical challenge for consumer spatial audio capture in immersive communication and XR. We propose Flow-HOA, a generative framework that jointly optimizes a multi-dimensional objective encompassing time-domain, spectral, and spatial fidelity while producing a deployable, time-invariant bank of Finite Impulse Response (FIR) encoding filters. Using conditional flow matching, the model learns to map a simple prior distribution to the target distribution of FIR filter coefficients. Training is guided by a composite loss that balances time-domain waveform fidelity, multi-resolution spectral consistency, sub-band energy preservation, and spatial directivity constraints. Objective evaluations on synthetically simulated data demonstrate improved performance over strong model-based baselines in both signal fidelity and spatial accuracy metrics. Subjective listening tests on real microphone array recordings further confirm that Flow-HOA yields higher overall sound quality with reduced artifacts, demonstrating generalization from synthetic training data to real-world capture conditions.
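
The conditional flow matching objective named in this abstract can be illustrated with a minimal sketch. It assumes the common straight-line probability path between a prior sample and a target sample, which the abstract does not confirm, and it uses a toy 3-tap "filter" as the target. The names `cfm_pair` and `cfm_loss` are hypothetical.

```python
import random

def cfm_pair(x0, x1, t):
    """Point on the straight path from prior sample x0 to target x1,
    plus the constant target velocity along that path."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t

def cfm_loss(velocity_model, x0, x1, t):
    """Mean squared error between the model velocity at (x_t, t) and u_t."""
    x_t, u_t = cfm_pair(x0, x1, t)
    v = velocity_model(x_t, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u_t)) / len(u_t)

random.seed(0)
x1 = [0.5, -0.25, 0.125]                     # toy target FIR taps (assumed values)
x0 = [random.gauss(0.0, 1.0) for _ in x1]    # sample from a Gaussian prior
t = random.random()                          # t in [0, 1)

# Oracle velocity field for this single pair: v(x, t) = (x1 - x) / (1 - t)
oracle = lambda x, t: [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
print(cfm_loss(oracle, x0, x1, t))           # near zero, up to rounding
```

In a real system a neural network would replace `oracle`, and the paper's composite loss (waveform, spectral, sub-band, directivity terms) would sit on top of or alongside this regression target; none of that is modeled here.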

2606.04569 2026-06-04 cs.RO

MineXplore: An Open-Source Reinforcement Learning Exploration Benchmark for GNSS-Denied Underground Environment

Abhishek S, Badrikanath Praharaj, Sreeram MV

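AI summary: Presents MineXplore, an open-source MuJoCo navigation benchmark built from real mine data; a six-stage pipeline reconstructs the tunnel network, and a PPO baseline shows stable, reproducible policy learning under GNSS-denied, low-light conditions.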

Comments
7 pages, 11 figures. Submitted to the workshop "Xplore: Cross-Disciplinary aspects of Exploration in Robotics, Reinforcement Learning and Search", held at the International Conference on Robotics and Automation (ICRA)
Abstract

Underground mines present extreme conditions for autonomous robot navigation: GPS is denied, lighting is degraded, and tunnel topology is loop-rich and non-convex. Simulation benchmarks grounded in real production-mine geometry and compatible with GPU-accelerated learning pipelines do not yet exist in the open-source ecosystem. We present MineXplore, an open-source MuJoCo-based navigation benchmark derived from the Leung et al. 2017 Chilean underground copper mine dataset. The environment reconstructs a 104,423 m² tunnel network through a six-stage contour-to-MJCF pipeline incorporating octagonal wall cross-sections, LiDAR-sourced jagged wall geometry, three terrain friction zones, a global 5-degree incline, and periodic spot lighting. Geometric fidelity is validated at an Intersection over Union (IoU) of 0.9538 against the source survey map, and surface texture similarity scores 79.4% across six structural dimensions. A single-agent PPO baseline trained via RLlib across five independent random seeds achieves a best rolling coverage of 88.89% (3 of 5 seeds reaching the 90% coverage target), confirming that MineXplore supports stable and reproducible policy learning under realistic underground sensing and topology.
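
The Intersection over Union figure quoted in this abstract compares a reconstructed map against a survey map. A minimal sketch of that metric on binary occupancy grids follows; the tiny grids and the function name `iou` are illustrative assumptions, not the benchmark's actual data or code.

```python
def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized binary grids."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as identical.
    return inter / union if union else 1.0

# Toy 3x4 "tunnel" footprints (1 = free tunnel cell), assumed for illustration.
survey = [[1, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 1, 1]]
recon = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 1]]
print(iou(survey, recon))  # 6 shared cells out of 7 in the union
```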

2606.04564 2026-06-04 cs.LG

SurvPFN: Towards Foundation Models for Survival Predictions

Samuel Böhm, Lennart Purucker, Frank Hutter, Pascal Schlosser

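AI summary: Proposes SurvPFN, a survival prediction model based on prior-data fitted networks (PFNs); pretrained on synthetic data with a censored negative log-likelihood loss, it is competitive with classical and deep survival baselines on real-world tasks without per-dataset fitting.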

Comments
10 pages, 1 figure. Accepted to "Foundation Models for Structured Data" Workshop at the International Conference on Machine Learning (ICML) 2026
Abstract

Tabular foundation models (TFMs) have made rapid progress in standard classification and regression, but time-to-event survival prediction tasks have remained largely untouched. Unlike in standard regression tasks, survival prediction models must account for censored data. Standard TFMs cannot natively handle censored data, leading to biased and inaccurate predictions and making them unsuitable for real-world applications. To overcome this fundamental limitation, we propose SurvPFN, a prior-data fitted network (PFN) for survival prediction tasks. We pretrain SurvPFN on millions of synthetic survival prediction tasks to learn survival via distributional regression that accounts for censored data. SurvPFN works by (1) generating data with Weibull event times and a non-informative censoring mechanism; (2) integrating a censored event indicator; and (3) minimizing a censored negative log-likelihood. On SurvSet, a collection of real-world survival tasks, SurvPFN is highly competitive with classical and deep survival baselines without per-dataset fitting, a survival-specific architecture, or feature engineering. We show that survival can be treated as a continuous-time distributional regression problem with censored loss, unlocking the power of PFNs for time-to-event predictions.
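
A censored negative log-likelihood, as referenced in step (3) of this abstract, scores observed events with the density and censored samples with the survival function. The sketch below assumes a Weibull predictive distribution purely for illustration; the abstract uses Weibull times for data generation and does not state the model's output parameterization. The function name is hypothetical.

```python
import math

def weibull_censored_nll(times, events, shape, scale):
    """Mean negative log-likelihood under right censoring.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    total = 0.0
    for t, observed in zip(times, events):
        cum_hazard = (t / scale) ** shape            # H(t)
        if observed:
            # log f(t) = log(shape/scale) + (shape-1)*log(t/scale) - H(t)
            log_f = (math.log(shape / scale)
                     + (shape - 1.0) * math.log(t / scale)
                     - cum_hazard)
            total -= log_f
        else:
            # log S(t) = -H(t): we only know the event happens after t
            total += cum_hazard
    return total / len(times)

# One observed event and one censored sample, both at t = 2.
print(weibull_censored_nll([2.0, 2.0], [1, 0], shape=1.0, scale=2.0))
```

With `shape=1.0` the Weibull reduces to an exponential distribution, which makes the two terms easy to check by hand: `log 2 + 1` for the event and `1` for the censored sample.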

2606.04562 2026-06-04 cs.AI cs.LG cs.SI

Neetyabhas: A Framework for Uncertainty-Aware Public Policy Optimization in Rational Agent-Based Models

Janani Venugopalan, Gaurav Deshkar, Rishabh Gaur, Harshal Hayatnagarkar, Jayanta Kshirsagar

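AI summary: Proposes a hierarchical reinforcement learning framework that incorporates uncertainty in epidemic measurement and policy implementation; by simulating the interaction between individual behavior and policy interventions, it manages the epidemic effectively and reduces its impact.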

Abstract

Purpose: The WHO's COVID-19 non-pharmaceutical interventions (e.g., lockdowns, vaccinations) effectively curb transmission but impose heavy economic strains. Existing research often neglects individual behaviors and falsely assumes perfect infection tracking and flawless policy execution, failing to account for real-world uncertainties and errors. Methods: We propose an integrative approach incorporating uncertainties in both epidemic measurement (infections/hospitalizations) and policy implementation. We built a simulation model of 1,000 individuals making real-time choices regarding mask-wearing, vaccination, and shopping. Concurrently, policymakers deploy interventions (lockdowns, mandates) based on health and economic observations. This framework is driven by hierarchical reinforcement learning agents, utilizing deep Q-networks alongside uncertainty-aware policy gradient variants (DDPG and TD3). Results: The simulations effectively managed the epidemic's progression. Masking and vaccinations proved highly effective, significantly reducing both the outbreak's peak height and duration. By integrating individual behaviors, policy uncertainties, and multifaceted interventions, our dynamic control approach successfully mitigated the epidemic's impact. Conclusions: Our model overcomes previous research limitations by embedding uncertainty and human behavior into public health policy frameworks. The simulation demonstrates that accounting for individual choices and imperfect data is crucial for designing effective interventions during complex pandemics, with masks and vaccines serving as pivotal tools.

2606.04557 2026-06-04 cs.CL cs.IR cs.LG

Cartridges at Scale: Training Modular KV Caches over Large Document Collections

Momchil Hardalov, Gonzalo Iglesias, Adrià de Gispert

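AI summary: Proposes the Cartridges at Scale (CAS) framework, which enables large-scale multi-cartridge training through dynamic distractor mixing and a memory-efficient budget manager; it cuts prefill overhead while preserving accuracy, beats a monolithic cartridge by 10-31 points, and approaches full in-context learning.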

Comments
21 pages, 5 figures, 17 tables
Abstract

Large Language Models can reason over long contexts, yet prefilling millions of tokens is wasteful as much of the content remains static across queries. Cartridges address this by distilling document collections into reusable key-value (KV) caches that eliminate prefilling while preserving accuracy. A critical limitation of this approach is that cartridges are monolithic and non-compositional: encoding an entire collection into a single KV block does not scale, and naively mixing cartridges trained in isolation collapses performance to near chance. We introduce Cartridges at Scale (CAS), a training framework for scalable multi-cartridge learning with dynamic distractor mixing and a memory-efficient budget manager that rotates hundreds of per-document cartridges between GPU and persistent storage. Our approach scales to collections exceeding a million tokens, improving over a monolithic cartridge by 10-31 points at comparable token budgets. Oracle cartridge accuracy falls within 2-6 points of full in-context learning even at high compression. When paired with retrieval for cartridge selection, CAS matches or exceeds conventional RAG accuracy while consuming 3-4x fewer prompt tokens.

2606.04555 2026-06-04 cs.CL cs.AI

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

Yifan Simon Liu, Liam Gallagher, Faeze Moradi Kalarde, Jiazhou Liang, Armin Toroghi, Scott Sanner

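AI summary: Proposes SegTreeMem, a segment-tree memory architecture that preserves the temporal order of conversation history through an online rightmost-frontier update rule and retrieves with hierarchical temporal context; it outperforms existing methods on long-horizon memory benchmarks.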

Abstract

Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity and may ignore the order in which events occur. We introduce Segment Tree Memory, or SegTreeMem, a memory architecture that represents conversation history as a temporally ordered Segment Tree over utterances. SegTreeMem incrementally inserts new utterances through an online rightmost-frontier update rule, preserving chronological order while forming hierarchical memory segments. For retrieval, SegTreeMem propagates relevance scores through the tree to combine local semantic matching with hierarchical temporal context. Across three long-horizon memory benchmarks and two LLM backbones, SegTreeMem improves answer quality over flat retrieval, graph-structured memory, and tree-structured memory baselines. Additional temporal-order permutation analysis shows that the performance gain depends on preserving temporal order during memory construction, supporting the claim that temporal order is a key structure for agentic memory.
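
The idea of inserting utterances only at the right edge of a tree, so that chronological order is never disturbed, can be sketched as follows. This is an assumption-heavy illustration, not the paper's update rule: it merges equal-height subtrees at the right frontier (binary-counter style), and the names `Node` and `FrontierMemory` are hypothetical. The abstract does not say how SegTreeMem decides when to form a segment.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    start: int                      # index of first utterance covered
    end: int                        # index of last utterance covered
    height: int = 0
    text: Optional[str] = None      # only leaves hold raw utterances
    children: List["Node"] = field(default_factory=list)

class FrontierMemory:
    def __init__(self) -> None:
        self.frontier: List[Node] = []   # subtree roots, oldest first
        self.count = 0

    def insert(self, utterance: str) -> None:
        self.frontier.append(Node(self.count, self.count, 0, utterance))
        self.count += 1
        # Merge only at the right edge; older segments are never reordered.
        while (len(self.frontier) >= 2
               and self.frontier[-1].height == self.frontier[-2].height):
            right = self.frontier.pop()
            left = self.frontier.pop()
            self.frontier.append(
                Node(left.start, right.end, left.height + 1, None, [left, right]))

    def leaves(self) -> List[str]:
        out: List[str] = []
        def walk(node: Node) -> None:
            if node.text is not None:
                out.append(node.text)
            for child in node.children:
                walk(child)
        for root in self.frontier:
            walk(root)
        return out

mem = FrontierMemory()
for i in range(5):
    mem.insert(f"u{i}")
print([(n.start, n.end, n.height) for n in mem.frontier])
print(mem.leaves())
```

After five insertions the frontier holds one segment covering utterances 0 to 3 and one fresh leaf for utterance 4, and a left-to-right traversal returns the utterances in their original order.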

2606.04552 2026-06-04 cs.CL q-bio.GN

LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling

Daria Ledneva, Denis Kuznetsov

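AI summary: Proposes LDARNet, a 120M-parameter hierarchical genomic foundation model combining dynamic chunking and bidirectional routing; fine-tuned on 27 tasks, it outperforms models up to 20× larger on histone modification tasks, and its learned boundaries align with biological motifs.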

Abstract

Genomic foundation models increasingly adopt large language model architectures, yet almost universally rely on fixed tokenization schemes such as k-mers, BPE, or single nucleotides, which impose arbitrary sequence boundaries that may obscure biologically relevant structure. We present LDARNet, a 120M-parameter hierarchical genomic foundation model that adapts H-Net-style dynamic chunking from autoregressive generation to masked language modeling, combining BiMamba-2 state-space layers with local attention, bidirectional routing, and a ratio-based regularizer to induce adaptive token boundaries without supervision. Fine-tuned on 27 tasks from the Nucleotide Transformer and Genomic Benchmarks suites, LDARNet achieves 11/18 wins among compact models (<300M parameters) and state-of-the-art results on 5 histone modification tasks, outperforming models up to 20× larger. A FLOPs-matched controlled experiment isolates learned routing as the source of these gains: learned boundaries beat fixed-grid boundaries by up to 14 percentage points on histone tasks at identical compute. Nucleotide-resolution analysis further shows that the learned boundaries align with canonical promoter motifs and splice junctions without supervision, providing a biological interpretation for adaptive tokenization in genomic foundation models.

2606.04547 2026-06-04 cs.IR cs.CL

Beyond Retrieval: Learning Compact User Representations for Scalable LLM Personalization

Heng Cao, Fan Zhang, Jian Yao, Yujie Zheng, Changlin Zhao, Lu Hao, Yuxuan Wei, Wangze Ni, Huaiyu Fu, Yuqian Sun, Xuyan Mo

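AI summary: Proposes TAP-PER, a framework that encodes user preferences as learnable user-state prefix embeddings, avoiding explicit prompt construction and heavy per-user adapters; it outperforms baselines on six LaMP tasks with far lower parameter overhead.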

Comments
16 pages, 6 figures
Abstract

Personalizing large language models requires adapting model behavior to individual users while preserving robustness and deployment-scale efficiency. Existing approaches typically personalize LLMs either at the input level, by retrieving user histories or constructing profile prompts, or at the parameter level, by maintaining user-specific parameter-efficient modules. The former makes personalization sensitive to retrieval quality and prompt design, whereas the latter incurs storage and maintenance costs that grow with the user population. To address these limitations, we propose TAP-PER (Temporal Attentive Prefix for PERsonalization), a prefix-based framework that encodes user preferences as learnable representations, eliminating explicit prompt construction and replacing heavy per-user adapters with lightweight user-state prefix embeddings. Inspired by personalized recommendation systems, TAP-PER decomposes user modeling into user-state and query-conditioned components, and incorporates temporal signals to capture the evolving nature of user interests. Experiments on six LaMP tasks show that TAP-PER consistently outperforms prompt-based and model-based baselines across classification, rating, and generation settings. Moreover, TAP-PER uses 130x fewer per-user parameters than OPPU and roughly half the total parameter footprint of PER-PCS at the 1,000-user scale, demonstrating that scalable LLM personalization can be achieved without explicit prompt construction or heavy per-user adapters.

2606.04545 2026-06-04 cs.CV

Impostor: An Agent-Curated Benchmark for Realistic AIGC Manipulation Localization

Zhenliang Li, Yutao Hu, Qixiong Wang, Wenpeng Du, Hongxiang Jiang, Jiasong Wu, Xiaolong Jiang, Jungong Han

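AI summary: To address the limited visual realism, manipulation diversity, and generator coverage of existing image manipulation detection and localization benchmarks, the paper introduces the Impostor dataset and the CraftAgent framework, and designs PhaseAware-Net, which achieves strong performance on multiple benchmarks.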

Comments
10 pages, 3 figures, 5 tables
Abstract

Recent advances in generative image editing have improved the realism and controllability of localized image manipulation, raising new challenges for image manipulation detection and localization (IMDL). However, existing IMDL benchmarks still have limitations in visual realism, manipulation diversity, and generator coverage, making it difficult to reflect recent trends in image manipulation. To address these limitations, we introduce Impostor, a high-quality AI-edited image manipulation localization dataset containing 100K manipulated images. Impostor is constructed by CraftAgent, a closed-loop agent framework that integrates scene perception, editing planning, manipulation execution, quality validation, and iterative reflection to automatically generate diverse and visually realistic manipulated images. Moreover, Impostor contains images generated by seven recent AIGC models across three manipulation types and includes multiple manipulated regions, providing a more comprehensive benchmark for AIGC-based IMDL. Furthermore, we propose PhaseAware-Net (PANet), a semantic-forensic framework that introduces local phase modeling and semantic-forensic consistency learning to better localize semantically plausible yet forensically disrupted manipulated regions. Extensive experiments show that Impostor poses significant challenges to existing large vision-language models (LVLMs) and specialized IMDL methods, while PANet achieves superior performance on Impostor and multiple public benchmarks.

2606.04536 2026-06-04 cs.AI

Scaling Self-Evolving Agents via Parametric Memory

Tao Ren, Weiyao Luo, Hui Yang, Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Bingxue Chou, Jieping Ye, Jiafeng Liang, Yongbin Li, Yijie Peng

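AI summary: Proposes TMEM, a framework in which the agent learns from experience through online updates to LoRA weights, changing its future behavior within a single episode; it outperforms summary-based and retrieval-based methods on multiple benchmarks.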

Abstract

Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can look up what they have seen but cannot learn from it: their policy is unchanged by experience, and any information dropped from the context is permanently lost. We introduce TMEM, a self-evolving parametric memory framework in which the agent not only compresses history into explicit memory but also absorbs distilled supervision into fast LoRA weights $\Delta_t$ via lightweight online updates, genuinely altering its future behavior within a single episode. We formalize this as an agentic decision process with fast-weight rollout dynamics: actions are sampled from $\pi_{\theta_0+\Delta_t}$, while extraction actions produce supervision that updates $\Delta_t$ for subsequent decisions. This view makes the extraction policy directly optimizable by RL: training $\theta_0$ improves not only task actions but also the quality of the data used for online LoRA adaptation. We further propose SVD-based initialization of the LoRA subspace to accelerate online convergence. Experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench show that TMEM consistently outperforms summary-based and retrieval-based baselines across different model scales.

2606.04535 2026-06-04 cs.CL cs.AI

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

Boyan Han, Yiwei Wang, Yi Song, Yujun Cai, Chi Zhang

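AI summary: Proposes Dynamic Infilling Anchors (DIA), a training-free method that dynamically estimates end-anchor positions to adjust generation length, ensuring structural correctness and semantic coherence under format constraints; it yields zero-shot gains on GSM8K and MATH.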

Comments
Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Abstract

Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While straightforward fixed anchors can enforce such constraints, they often impose rigid spans, leading to truncated reasoning or redundant content. To overcome this, we propose Dynamic Infilling Anchors (DIA), a training-free method that dynamically estimates end-anchor positions to adjust generation length before iterative infilling. This flexible mechanism ensures structural correctness and semantic coherence, avoiding the inefficiencies of fixed-span methods. Experiments on reasoning benchmarks demonstrate that DIA substantially improves format compliance and answer accuracy, achieving significant zero-shot gains on GSM8K and MATH. These results establish DIA as a robust pathway toward reliable, structure-aware generation.

2606.04534 2026-06-04 cs.RO

MAD: Mapping-Aware World Models for Agile Quadrotor Flight

Xinhong Zhang, Runqing Wang, Yunfan Ren, Ding Yu, Boyu Zhou, Jian Sun, Fang Deng, Jie Chen, Gang Wang

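AI summary: Proposes MAD, a mapping-aware world model that learns geometry-aware latent dynamics by reconstructing robocentric occupancy and visibility grid maps; it achieves higher success rates, faster flight, and better cross-task transfer in visual navigation and racing tasks.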

Comments
12 pages, 14 figures
Abstract

Agile quadrotor flight in cluttered scenes requires more than a reactive mapping from a depth image to a control command: the vehicle must remember which regions have been observed, infer nearby occupied space, and act under partial visibility and tight latency. In this paper, we present Mapping-Aware Dreamer (MAD), a geometry-aware world model for vision-based quadrotor flight. Instead of using raw-image reconstruction as the main self-supervised objective, MAD learns recurrent latent dynamics that reconstruct robocentric occupancy and visibility grid maps together with proprioceptive states. This design forces the latent state to encode local geometry, visibility history, and ego-motion in a form that is directly relevant to collision avoidance. MAD is trained in DiffAero using a GPU-parallel map-construction module that provides high-throughput supervision for occupancy and visibility. The learned representation is used in three policy-learning modes: imagination-based MAD-Dreamer and feature-extractor variants based on PPO and SHAC. Across visual navigation and racing tasks, MAD-based agents achieve higher success rates, faster flight, and better cross-task transfer than corresponding vision-only baselines. The model also produces interpretable map predictions and accurate ego-motion estimates from depth observations. We further deploy the learned policy on a physical quadrotor with an Intel RealSense D435i and demonstrate safe indoor and outdoor flight under limited sensing, reaching 9.66 m/s in simulation and 5.05 m/s in real-world forest experiments. These results show that mapping-aware world models provide a practical middle ground between modular aerial navigation and end-to-end learning.

2606.04528 2026-06-04 cs.CV cs.AI

Optical-Guided Neural Collapse for SAR Few-Shot Class Incremental Learning

Fan Zhang, Sijin Zheng, Fei Ma, Qiang Yin, Yongsheng Zhou, Fei Gao, Xian Sun

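AI summary: To address data scarcity and azimuth sensitivity in few-shot class-incremental learning on SAR imagery, the paper uses orthogonal subspaces derived from an optical ATR dataset as geometric priors and jointly induces neural collapse through a projection loss and a classifier loss, yielding compact features and inter-class separability.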

Comments
16 pages, 6 figures
Abstract

Few-shot class-incremental learning (FSCIL) in synthetic aperture radar (SAR) imagery presents unique challenges due to severe data scarcity and SAR-specific variability. In particular, strong azimuth sensitivity in SAR induces large intra-class variation and inter-class confusion, and the sequential updates in FSCIL further lead to catastrophic forgetting of previously learned classes. Inspired by neural collapse, we propose an optical-guided SAR FSCIL framework, which derives orthogonal feature subspaces from a data-rich optical ATR dataset and uses them as geometric priors to guide SAR feature learning. SAR features are projected onto these orthogonal subspaces via principal angle constraints, effectively transferring discriminative structure from the optical to the SAR domain. Specifically, our projection loss and the classifier loss optimized with a frozen simplex-ETF geometry jointly induce neural collapse by concentrating features around class means while maintaining large inter-class angles. We evaluate the approach on a benchmark comprising an optical ATR dataset and a SAR ATR dataset with 24 target classes, organized into a base training session and seven incremental sessions. Compared with recent FSCIL methods such as NCFSCIL, our method achieves the highest final accuracy and a favorable trade-off between final performance and performance degradation. Moreover, neural collapse metrics show improved intra-class compactness and inter-class separability, indicating that the learned features more closely approximate the ideal simplex-ETF geometry.
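
The frozen simplex-ETF geometry mentioned in this abstract has a standard closed form: K unit-norm class prototypes whose pairwise cosine is -1/(K-1). The sketch below builds it in K dimensions without the optional rotation into a lower-dimensional feature space; the function names are hypothetical and nothing here reflects the paper's projection loss.

```python
import math

def simplex_etf(num_classes):
    """Rows are class prototypes: sqrt(K/(K-1)) * (I - (1/K) * ones)."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

protos = simplex_etf(4)
print(dot(protos[0], protos[0]))   # unit norm
print(dot(protos[0], protos[1]))   # equals -1/(K-1) up to rounding
```

Because every pair of prototypes has the same, maximally negative cosine, a classifier frozen to this geometry gives each class an equally separated target direction, which is what the neural collapse metrics in the abstract measure against.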

2606.04527 2026-06-04 cs.MM cs.CV cs.GR

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Yuxuan Bian, Zeyue Xue, Songchun Zhang, Shiyi Zhang, Weiyang Jin, Yaowei Li, Junhao Zhuang, Haoran Li, Jie Huang, Haoyang Huang, Nan Duan, Qiang Xu

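AI summary: Proposes Echo-Infinity, a framework whose learnable evolving memory dynamically filters, abstracts, and compresses history of any length at constant cost; combined with a unified relative RoPE scheme, it demonstrates 24-hour real-time video rollouts, which the authors report as a first.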

Comments
Website: https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity/
Abstract

We present Echo-Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules, fixed-ratio heuristic compression, or inference-time RoPE adaptation. These designs inevitably lose historical information and amplify compounding errors due to their limited cache window and neglect of autoregressive generation noise. Inspired by human memory consolidation, Echo-Infinity replaces handcrafted memory curation with learnable Memory Queries, which are updated by attention and a gating mechanism when past frames are evicted from the local window. The queries are optimized end-to-end with the video diffusion transformers (DiTs), forming an evolving memory that supports arbitrary compression ratios with constant computation independent of video length. They also act as a generalizable generation prior, improving quality even when only the optimized initial state is used. We further introduce the Unified Relative RoPE Recipe, which anchors the sink frames to start from id 0 and lets the newest frame id grow at most to the DiTs' pretrained maximum temporal RoPE id throughout training and inference, freeing the model from the finite RoPE constraint and closing the train-test RoPE extrapolation gap. In long and short video generation, Echo-Infinity achieves state-of-the-art performance, and, to our knowledge, demonstrates promising 24-hour (>1.3M frames) real-time rollouts for the first time, suggesting a practical path toward infinite video generation.

2606.04522 2026-06-04 cs.IR cs.AI cs.DB cs.LG

ANN Search: Recall What Matters

Dimitris Dimitropoulos, Nikos Mamoulis

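AI summary: Proposes replacing Recall@k with the inverse approximation ratio 1/Ratio@k for evaluating approximate nearest neighbor search quality; experiments show the latter reflects practical utility more accurately and reduces computational overhead.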

Abstract

Approximate nearest neighbor (ANN) search has become a core primitive in information retrieval and modern machine learning tasks, from classification to retrieval-augmented generation. The community evaluates and tunes ANN algorithms primarily on their throughput at a given Recall@k, the fraction of true exact neighbors retrieved. We argue that what really matters in ANN search is the quality of the retrieved results and not their overlap with the true kNN set. We show that using Recall@k to assess retrieval quality forces unnecessary computational overhead and investigate replacing it by 1/Ratio@k, the inverse approximation ratio. 1/Ratio@k evaluates the differences between the distances of the retrieved and true neighbors. It is judge-free, hyperparameter-free, and computable from standard ANN benchmark inputs alone. We benchmark state-of-the-art ANN algorithms across diverse datasets spanning a wide range of intrinsic dimensionalities, evaluating the two metrics comprehensively across efficiency, downstream classification, and retrieval-augmented generation. On the efficiency axis, optimizing for 1/Ratio@k reaches operational quality thresholds at a substantially lower computational cost than Recall@k. In downstream tasks, performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain highly stable even when Recall@k drops significantly. The inverse approximation ratio, on the other hand, closely mirrors this stability, tracking true utility much better than Recall@k. Ultimately, while Recall@k overstates the true cost of approximation, 1/Ratio@k offers a more accurate, deployable proxy for actual ANN quality.
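
The contrast between the two metrics in this abstract can be made concrete with a small example. The sketch assumes a common definition of the approximation ratio (the mean, over ranks, of retrieved distance divided by true distance, with positive distances); the paper's exact formula is not given in the abstract, and the function names are hypothetical.

```python
def recall_at_k(retrieved_ids, true_ids):
    """Fraction of the true k nearest neighbors that were retrieved."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)

def inverse_ratio_at_k(retrieved_dists, true_dists):
    """1 / mean_i(d_retrieved_i / d_true_i), pairing distances rank by rank.
    Assumes all true distances are strictly positive."""
    r = sorted(retrieved_dists)
    t = sorted(true_dists)
    ratio = sum(a / b for a, b in zip(r, t)) / len(t)
    return 1.0 / ratio

# Query at 0.0 on a line; point ids map to their distance from the query.
dists = {0: 1.0, 1: 2.0, 2: 2.1, 3: 5.0}
true_ids = [0, 1]        # exact 2 nearest neighbors
retrieved_ids = [0, 2]   # the index returned a near miss instead of id 1

print(recall_at_k(retrieved_ids, true_ids))
print(inverse_ratio_at_k([dists[i] for i in retrieved_ids],
                         [dists[i] for i in true_ids]))
```

Here Recall@2 drops to 0.5 because one exact neighbor is missing, while 1/Ratio@2 stays close to 1 because the substitute point is almost as near. That is the behavior the abstract argues tracks downstream utility better.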

2606.04518 2026-06-04 cs.RO

Cooperative Circumnavigation for Multiple Unmanned Surface Vehicles Without External Localization

无外部定位的多无人水面艇协同环绕航行

Xueming Liu, Lin Li, Xiang Zhou, Tianjiang Hu, Qingrui Zhang

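AI summary: For multiple unmanned surface vehicles operating without external localization, proposes a cooperative circumnavigation framework based on heterogeneous perception and coupled oscillators. A Maximum Correntropy Kalman Filter and a Pseudo-Linear Kalman Filter estimate relative positions, yielding a uniform circular formation of a specified radius.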

详情
Comments
17 pages, 15 figures
AI中文摘要

本文提出了一种针对多无人水面艇(USV)在无外部定位条件下运行的协同目标环绕框架。目标是仅利用有限的本船传感,围绕目标保持指定半径的均匀圆形编队。该框架采用异构感知策略,区分与目标之间以及USV之间的非对称传感关系。具体而言,USV通过主动感知和艇间通信获取相对距离和位移测量,而通过被动传感器获取对非合作目标的方位测量。为了估计相对位置——包括USV之间以及每个USV与目标之间的相对位置——我们分别采用了最大相关熵卡尔曼滤波和伪线性卡尔曼滤波。设计了一个基于耦合振荡器的编队控制器,以确保系统可观测性同时实现环绕航行。理论分析表明,该控制器确保USV之间的相对运动以及每个USV与目标之间的相对运动满足持续激励条件,从而保证基于卡尔曼滤波器的可观测性。通过数值仿真验证了所提方法的有效性。

英文摘要

This paper proposes a cooperative target circumnavigation framework for multiple unmanned surface vehicles (USVs) operating without external localization. The objective is to maintain a uniform circular formation of a specified radius around a target using only limited onboard sensing. The framework adopts a heterogeneous perception strategy that distinguishes between the asymmetric sensing relationships with the target and among the USVs. Specifically, the USVs obtain relative range and displacement measurements through active perception and inter-vehicle communication, while bearing measurements to a non-cooperative target are acquired via passive sensors. To estimate relative positions--both among USVs and between each USV and the target--we employ a Maximum Correntropy Kalman Filter and a Pseudo-Linear Kalman Filter, respectively. A coupled oscillator-based formation controller is designed to ensure system observability while achieving circumnavigation. Theoretical analysis demonstrates that the controller ensures the relative motions between the USVs, as well as that between each USV and the target, satisfy the persistent excitation condition, thereby guaranteeing observability of the Kalman-based filters. The effectiveness of the proposed approach is validated through numerical simulations.
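
The control objective, a uniform circular formation of a specified radius around the target, can be sketched as plain geometry. This only computes the desired positions for evenly spaced phases; it does not implement the coupled-oscillator controller or either Kalman filter, and the function name and parameters are hypothetical.

```python
import math


def formation_waypoints(target_xy, radius, n_vehicles, phase0=0.0):
    """Desired positions for n vehicles evenly spaced on a circle.

    Vehicle i sits at phase phase0 + 2*pi*i/n around the target, so
    neighboring vehicles are separated by the same angle.
    """
    tx, ty = target_xy
    waypoints = []
    for i in range(n_vehicles):
        phase = phase0 + 2 * math.pi * i / n_vehicles
        waypoints.append((tx + radius * math.cos(phase),
                          ty + radius * math.sin(phase)))
    return waypoints


# Four vehicles on a 2 m circle around a target at (1, 1).
for x, y in formation_waypoints((1.0, 1.0), 2.0, 4):
    print(round(x, 3), round(y, 3))
```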

2606.04517 2026-06-04 cs.NI cs.AI

Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis

像对待树一样对待流量:一种用于加密流量分析的语义保持分层图专家框架

Yuantu Luo, Jun Tao, Linxiao Yu, Guang Cheng

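AI summary: Proposes PTGAMoE (Protocol Tree Graph Attention with Mixture of Experts), a semantic-preserving hierarchical graph-based expert framework. With field-level graph construction and an expert committee design, it significantly outperforms existing models under strict no-data-leakage settings and offers interpretable protocol-level feature-importance analysis.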

详情
Comments
This work has been submitted to the IEEE for possible publication
AI中文摘要

基于图的深度学习方法已被广泛应用于加密流量分析,以利用不同粒度下的潜在相关性。然而,复杂的预处理流程和精细的模型结构虽然通常能取得良好性能,但在表示学习过程中可能掩盖固有的协议语义。此外,由协议规范定义并在人工流量分析中常规使用的协议层及其对应字段的分层结构,在现有学习框架中仍未得到充分探索。在本文中,我们提出了一种用于加密流量分析的语义保持分层图专家框架——协议树图注意力与专家混合(PTGAMoE)。基于字段的图构建和专家委员会设计使PTGAMoE能够量化模型对特定字段和协议的偏好。在严格无数据泄露设置下,对代表性基准数据集的大量实验结果表明,PTGAMoE显著优于最先进的模型。此外,语义保持设计提供了关于协议级特征重要性和专家级贡献的可解释性洞察,反映了模型在加密流量分类任务中的决策逻辑。

英文摘要

Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities. However, while complex preprocessing pipelines and sophisticated model structures often achieve strong performance, they may obscure inherent protocol semantics during representation learning. Moreover, the hierarchical structure of protocol layers and their corresponding fields, defined by protocol specifications and routinely utilized in manual traffic analysis, remains underexplored in existing learning frameworks. In this paper, we propose Protocol Tree Graph Attention with Mixture of Experts (PTGAMoE), a semantic-preserving hierarchical graph-based expert framework for encrypted traffic analysis. The field-based graph construction and expert committee design enable PTGAMoE to quantify the model's preferences for specific fields and protocols. Extensive experimental results on representative benchmark datasets under strict no-data-leakage settings demonstrate that PTGAMoE significantly outperforms state-of-the-art (SOTA) models. Furthermore, the semantic-preserving design provides interpretable insights into protocol-level feature importance and expert-level contributions, reflecting the model's decision-making logic in encrypted traffic classification tasks.

2606.04516 2026-06-04 cs.LG cs.AI

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

GeoMin: 基于几何分布建模的数据高效半监督RLVR

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Kai Tang, Zhengqing Zang, Bowen Song, Weiqiang Wang, Gang Chen

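AI summary: Proposes GeoMin, which models the global feature distribution of labeled data to decode structural differences between correct and incorrect rollouts, establishing a robust prior for assessing the reliability of self-reward signals. It uses unlabeled data efficiently with few labels and surpasses fully supervised models with only 10% of the annotations.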

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)显著提升了LLM的推理能力,但面临困境:标准监督扩展受限于高标注成本,而无监督替代方案则遭受严重的模型崩溃。最近的半监督RLVR方法通过使用少量标注集指导未标注数据,在训练效果和标注成本之间取得了有前景的权衡。然而,由于依赖粗糙的性能启发式,它们遭受严重的数据效率瓶颈,导致绝大多数有价值实例未被充分利用。为此,我们提出GeoMin,它在标注数据上建模全局特征分布,以解码正确和错误展开之间的结构差异,从而建立稳健的先验来评估自奖励信号的可靠性,并充分释放未标注数据的潜力。实验上,GeoMin比最强基线高出+4.1%,甚至在使用仅10%标注的情况下超越全监督模型,展示了显著的数据效率。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from severe model collapse. Recent semi-supervised RLVR methods address this by using a small labeled set to guide unlabeled data, achieving a promising trade-off between training efficacy and annotation cost. However, they suffer from a severe data-efficiency bottleneck due to the reliance on coarse performance heuristics, leaving a vast majority of valuable instances underutilized. To this end, we propose GeoMin, which models global feature distributions on labeled data to decode the structural discrepancy between correct and incorrect rollouts, thereby establishing a robust prior to assess the reliability of self-reward signals and fully unleash the potential of unlabeled data. Empirically, GeoMin outperforms the strongest baselines by +4.1% and even surpasses fully supervised models with only 10% of the annotations, demonstrating remarkable data efficiency.

2606.04513 2026-06-04 cs.AI

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

MapAgent: 一个工业级的城市规模车道级地图生成智能框架

Deguo Xia, Zihan Li, Haochen Zhao, Dong Xie, Yuyao Kong, Xiyan Liu, Jizhou Huang, Mengmeng Yang, Diange Yang

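AI summary: Proposes the MapAgent framework, which combines a vision-language model with constraint-aware reasoning in a verification-driven Judge-Planner-Worker loop to correct specification violations in lane-map generation, enabling highly automated city-scale production.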

详情
Comments
Accepted by KDD 2026
AI中文摘要

车道级地图是自动驾驶和车道级导航的关键基础设施,但为数百个城市构建和维护标准化车道网络仍然高度劳动密集。最近的端到端矢量化映射方法可以直接从传感器数据预测车道几何和拓扑,但它们通常将映射规范和交通规则视为隐式的、依赖于数据集的监督。此外,在复杂场景中(例如,磨损或缺失的标记和遮挡),仅凭视觉证据往往难以确定正确的车道配置,使得规范违规成为人工后期编辑的主要来源。我们提出MapAgent,一个工业级智能架构,它增强了一个矢量化主干,用于生成符合规范的车道地图。MapAgent不仅仅是在地图预测上添加一个智能体循环,而是在一个有界、验证驱动的Judge-Planner-Worker循环中,将主干感知与明确的规范验证、约束感知推理和确定性地图编辑相结合。一个视觉语言Judge通过联合检查视觉证据和草稿向量来诊断错误,而一个工具调用Planner生成最小的修正编辑并进行编辑后重新验证。为了保持城市规模生产的可扩展性,MapAgent仅在主干置信度低的图块上选择性触发,增加了适度的开销同时保持吞吐量。在真实世界数据集上的实验显示,与强大的生产基线相比,特别是在复杂和长尾场景中,性能持续提升。此外,MapAgent已集成到百度地图中,支持全国超过360个城市的车道级地图生成,并将整体生产自动化率提升至95%以上,证明了MapAgent在大规模车道级地图生成中的实用性和有效性。

英文摘要

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directly from sensor data, but they typically treat mapping specifications and traffic regulations as implicit, dataset-dependent supervision. Moreover, in complex scenes (e.g., worn or missing markings and occlusions), correct lane configurations are often under-determined by visual evidence alone, making specification violations a major source of human post-editing. We propose MapAgent, an industrial-grade agentic architecture that augments a vectorization backbone for specification-compliant lane-map production. Rather than merely adding an agent loop to map prediction, MapAgent couples backbone perception with explicit specification verification, constraint-aware reasoning, and deterministic map editing under a bounded, verification-driven Judge-Planner-Worker loop. A vision-language Judge diagnoses errors by jointly inspecting visual evidence and draft vectors, while a tool-calling Planner generates minimal corrective edits with post-edit re-validation. To remain scalable for city-scale production, MapAgent is selectively triggered only on tiles with low backbone confidence, adding modest overhead while preserving throughput. Experiments on real-world datasets show consistent gains over strong production baselines, especially in complex and long-tail scenarios. Additionally, MapAgent has been integrated into Baidu Maps, supporting lane-level map generation for over 360 cities nationwide and elevating the overall production automation to over 95%, demonstrating MapAgent's practicality and effectiveness for large-scale lane-level map generation.
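
The control flow described above (trigger only on low-confidence tiles, then run a bounded loop of diagnosis, edit planning, edit application, and re-validation) can be sketched as follows. This is a toy under stated assumptions, not MapAgent's implementation: the confidence threshold, the round limit, the lane-width rule, and every function name are hypothetical stand-ins for the vision-language Judge, the tool-calling Planner, and the deterministic Worker.

```python
def refine_tile(draft, confidence, judge, planner, worker,
                threshold=0.8, max_rounds=3):
    """Bounded, verification-driven refinement of one map tile."""
    if confidence >= threshold:      # selective triggering: skip confident tiles
        return draft
    for _ in range(max_rounds):      # bounded loop
        issues = judge(draft)        # diagnose specification violations
        if not issues:               # validation passed
            break
        edits = planner(draft, issues)   # propose minimal corrective edits
        draft = worker(draft, edits)     # apply the edits deterministically
    return draft


# Toy stand-ins: a tile is a dict of lane id -> width in metres, and the
# only "specification" is a hypothetical minimum lane width.
MIN_WIDTH_M = 3.0


def toy_judge(draft):
    return [lane for lane, width in draft.items() if width < MIN_WIDTH_M]


def toy_planner(draft, issues):
    return {lane: MIN_WIDTH_M for lane in issues}


def toy_worker(draft, edits):
    return {**draft, **edits}


tile = {"lane_a": 2.5, "lane_b": 3.5}
print(refine_tile(tile, 0.5, toy_judge, toy_planner, toy_worker))
```

Capping the number of rounds keeps the per-tile overhead predictable, which matters when the loop runs across a whole city.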

2606.04511 2026-06-04 cs.CL cs.LG

SparDA: Sparse Decoupled Attention for Efficient Long-Context LLM Inference

SparDA: 用于高效长上下文LLM推理的稀疏解耦注意力

Yaosheng Fu, Guangxuan Xiao, Xin Dong, Song Han, Oreste Villa

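AI summary: Proposes the SparDA architecture, which introduces a fourth projection, Forecast, to decouple KV-cache prefetching from attention and reduce sparse-selection overhead, achieving up to 1.25x prefill speedup and 1.7x decode speedup in long-context inference.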

详情
AI中文摘要

稀疏注意力减少了长上下文LLM推理的计算和内存带宽。然而,仍然存在两个关键挑战:(1)KV缓存容量随序列长度增长,卸载到CPU内存引入了PCIe传输瓶颈;(2)稀疏选择步骤本身保持$O(T^2)$复杂度,在长上下文中可能主导注意力成本。我们提出SparDA,一种解耦的稀疏注意力架构,它在Query、Key和Value之外引入了第四个逐层投影——Forecast。Forecast预测下一层所需的KV块,从而实现超前选择,将CPU到GPU的预取与当前层执行重叠。由于Forecast与注意力查询解耦,我们的GQA实现为每个GQA组使用一个Forecast头,相比原始多头选择器减少了选择开销。SparDA增加了<0.5%的参数,并通过匹配原始选择器的注意力分布仅训练Forecast投影。在两个稀疏预训练的8B模型上,SparDA匹配或略微提高了准确性,并且相比稀疏注意力卸载基线,提供了高达1.25倍的预填充加速和1.7倍的解码加速。通过使单个GPU上可行的批量大小更大,SparDA进一步实现了比非卸载稀疏基线高达5.3倍的解码吞吐量。我们的源代码可在https://github.com/NVlabs/SparDA获取。

英文摘要

Sparse attention reduces compute and memory bandwidth for long-context LLM inference. However, two key challenges remain: (1) KV cache capacity still grows with sequence length, and offloading to CPU memory introduces a PCIe transfer bottleneck; (2) the sparse selection step itself retains $O(T^2)$ complexity and can dominate attention cost at long contexts. We propose SparDA, a decoupled sparse attention architecture that introduces a fourth per-layer projection, the Forecast, alongside Query, Key, and Value. The Forecast predicts the KV blocks needed by the next layer, enabling lookahead selection that overlaps CPU-to-GPU prefetch with current-layer execution. Because Forecast is decoupled from the attention query, our GQA implementation uses one Forecast head per GQA group, reducing selection overhead versus the original multi-head selector. SparDA adds $<$0.5% parameters and trains only the Forecast projections by matching the original selector's attention distribution. On two sparse-pretrained 8B models, SparDA matches or slightly improves accuracy and delivers up to 1.25$\times$ prefill speedup and 1.7$\times$ decode speedup over the sparse-attention offload baseline. By enabling larger feasible batch sizes on a single GPU, SparDA further reaches up to 5.3$\times$ higher decode throughput than the non-offload sparse baseline. Our source code is available at https://github.com/NVlabs/SparDA.
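
The idea of lookahead block selection can be sketched with NumPy. The abstract does not say how Forecast outputs are scored against the KV cache, so this sketch assumes one summary vector per KV block and a dot-product score followed by top-k; those choices, and all names below, are illustrative assumptions rather than SparDA's actual selector.

```python
import numpy as np


def select_blocks(forecast, block_summaries, top_k):
    """Pick the KV blocks to prefetch for the next layer.

    forecast:        (d,) vector produced at the current layer (assumed form)
    block_summaries: (n_blocks, d) one summary vector per KV block (assumed)
    Returns the indices of the top_k highest-scoring blocks, sorted.
    """
    scores = block_summaries @ forecast
    top = np.argsort(-scores)[:top_k]
    return sorted(top.tolist())


summaries = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [2.0, 0.0],
                      [0.0, -1.0]])
forecast = np.array([1.0, 0.0])

# The returned indices are the blocks a runtime could start copying from
# CPU to GPU while the current layer is still executing.
print(select_blocks(forecast, summaries, top_k=2))
```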

2606.04507 2026-06-04 cs.CL cs.AI

Self-Evolving Deep Research via Joint Generation and Evaluation

通过联合生成与评估实现自我进化的深度研究

Han Zhu, Chengkun Cai, Yuanfeng Song, Xing Chen, Sirui Han, Yike Guo

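AI summary: Proposes the SCORE framework, which jointly optimizes an evaluator and a solver through shared-parameter co-evolutionary training, addressing unverifiable rewards in deep research report generation and steadily improving generation quality.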

详情
AI中文摘要

大型语言模型(LLM)在日常应用中越来越广泛,其中深度研究是一项特别重要的能力。与传统的问答(QA)任务不同,深度研究报告生成缺乏明确的真实答案,这使得奖励设计本质上不可验证,限制了有效的强化学习。现有方法通过LLM作为评判者和查询相关的评估标准来缓解这一挑战,但它们仍然依赖静态评估器,无法随着求解器的改进而调整标准,导致优化压力不足并最终饱和。我们通过一个用于深度研究评估和生成的自我进化协同进化训练框架(SCORE)来解决这一限制,该框架在共享参数的学习过程中紧密耦合评估器和求解器。我们不将生成和评估视为孤立的模块,而是利用它们的内在联系,在单个共享参数模型中实现联合改进。为了限制这一过程,我们引入了一个元控制机制,该机制根据求解器的性能动态控制评估环境,鼓励有效的评估维度和足够深入的评估器搜索。在深度研究基准上的大量实验表明,报告生成质量持续提升,表明协同进化评估和生成是训练开放式研究代理的一个有前景的方向。

英文摘要

Large Language Models (LLMs) are increasingly adopted in daily applications, with deep research standing out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks a definitive ground truth, making reward design inherently unverifiable and limiting effective reinforcement learning. Existing approaches mitigate this challenge with LLM-as-a-judge and query-dependent evaluation rubrics, but they still rely on static evaluators that cannot adapt their standards as the solver improves, leading to insufficient and eventually saturated optimization pressure. We address this limitation with a self-evolving co-evolutionary training framework for deep research evaluation and generation (SCORE), which tightly couples an evaluator and a solver in a shared-parameter learning process. Rather than treating generation and evaluation as isolated modules, we leverage their intrinsic connection to enable joint improvement within a single shared-parameter model. To constrain this process, we introduce a meta-harness, which dynamically controls the evaluation environment based on solver performance, encouraging valid evaluation dimensions and sufficiently deep evaluator search. Extensive experiments on deep research benchmarks demonstrate consistent improvement in report generation quality, showing that co-evolving evaluation and generation is a promising direction for training open-ended research agents.

2606.04505 2026-06-04 cs.AI

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

模拟、推理、决策:基于科学推理的LLM驱动模拟决策

Yuhan Yang, Ruipu Li, Alexander Rodríguez

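AI summary: Proposes the MechSim framework, which uses neuro-symbolic reasoning to let LLMs reason about the mechanisms and assumptions of scientific simulators, improving decision transparency and reliability.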

详情
AI中文摘要

科学模拟器越来越多地被集成到LLM驱动的系统中,用于高风险模拟驱动决策。然而,现有框架主要使用LLM来生成、校准或执行模拟器,将其视为黑盒接口而非可推理的结构化机械系统。因此,当前方法缺乏识别、表示和推理模拟器行为背后的假设和机制的能力,限制了透明度、可审计性和决策合理性。我们引入了MechSim,一个面向可执行科学模拟器的机制基础神经符号推理框架。与先前主要对静态符号结构进行推理的神经符号方法不同,MechSim使LLM代理能够推理科学模拟器的机制、假设和执行行为。我们的框架通过共享结构化模式表示模拟器,捕获假设、变量、机制依赖和执行轨迹。在此表示之上,LLM代理作为受约束的推理引擎运行,生成结构化的、基于证据的解释,将模拟器结果与其底层机制联系起来。我们在多个高风险领域评估了我们的方法,结果表明它提高了机制级解释质量、模拟器分析和下游决策可靠性。

英文摘要

Scientific simulators are increasingly being integrated into LLM-driven systems for high-stakes simulation-driven decision-making. However, existing frameworks primarily use LLMs to generate, calibrate, or execute simulators, treating them as black-box interfaces rather than as structured mechanistic systems that can be reasoned about. As a result, current approaches lack the ability to identify, represent, and reason about the assumptions and mechanisms underlying simulator behavior, limiting transparency, auditability, and decision justification. We introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework for executable scientific simulators. Unlike prior neuro-symbolic approaches that primarily reason over static symbolic structures, MechSim enables LLM agents to reason about the mechanisms, assumptions, and execution behavior of scientific simulators. Our framework represents simulators through a shared structured schema capturing assumptions, variables, mechanism dependencies, and execution traces. On top of this representation, LLM agents operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms. We evaluate our approach across multiple high-stakes domains and show that it improves mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.

2606.04503 2026-06-04 cs.LG cs.AI

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

暗中选择:通过追踪元认知支点实现高效的推理可验证奖励强化学习

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Bowen Song, Weiqiang Wang, Gang Chen

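AI summary: To address low data efficiency in reinforcement learning with verifiable rewards (RLVR), proposes the PivotTrace framework, which uses attention dynamics to trace metacognitive pivots during reasoning and quantifies uncertainty through pivot density to route data automatically. It surpasses the fully supervised model with only 29.3% annotated samples and 2.75x faster convergence.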

详情
AI中文摘要

可验证奖励强化学习(RLVR)极大地推进了大型推理模型(LRMs),但它需要及时在大量完全标注的数据集上进行训练。为此,从两个角度广泛研究了数据高效的RLVR方法:(i)数据选择方法识别一小部分“黄金”样本,这些样本能产生接近全数据性能,但它们依赖于预先存在的标注数据池。(ii)无监督RLVR方法在大规模未标注数据上利用模型自身的内部监督信号进行训练,但表现出次优性能。因此,我们研究了RLVR的“暗中选择”设置,其目标是在没有先验监督的情况下,选择对训练最有益且值得标注的未标注样本。通过系统分析,我们证明智能选择依赖于一个校准良好的不确定性估计器,以实现数据的策略性划分,从而进行自适应训练方案。基于这一见解,我们提出了PivotTrace,一个三路数据分流框架,利用注意力动态追踪推理过程中的元认知支点。通过支点密度精确量化不确定性,PivotTrace实现了自动数据路由,协同最大化标注和训练效率。实验表明,PivotTrace仅使用29.3%的标注样本和2.75倍的收敛速度就超越了全监督LRM。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely studied from two perspectives: (i) data selection methods identify a small subset of "golden" samples that yield near-full-data performance, but they rely on a pre-existing pool of labeled data; (ii) unsupervised RLVR methods train the model using its own internal supervision signals on large-scale unlabeled data, yet they exhibit suboptimal performance. Accordingly, we investigate the "pick in the dark" setup for RLVR, which aims to select, without prior supervision, unlabeled samples that are most beneficial for training and worthy of annotation. Through systematic analysis, we demonstrate that smart picks hinge on a well-calibrated uncertainty estimator to enable strategic partitioning of data for adaptive training regimes. Building on this insight, we propose PivotTrace, a three-way data triage framework that leverages attention dynamics to trace metacognitive pivots during reasoning. By precisely quantifying uncertainty through pivot density, PivotTrace achieves automated data routing to synergistically maximize both annotation and training efficiency. Empirically, PivotTrace surpasses the fully supervised LRM with only 29.3% annotated samples and 2.75x faster convergence.

2606.04500 2026-06-04 cs.CL

SANE Schema-aware Natural-language Evaluation of Biological Data

SANE:生物数据的模式感知自然语言评估

Rolf Gattung, Martin Krueger, Markus Reischl

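AI summary: Proposes the SANE paradigm, which uses schema-aware, automatically generated benchmarks to evaluate the reliability of few-shot large language models on domain-specific text-to-SQL tasks, finding that structured prompting and guardrails enable accurate query generation.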

详情
Comments
5 pages, 3 figures, submitted but not yet reviewed by BMT2026
AI中文摘要

高通量显微镜生成大型结构化数据集,捕捉细胞对药理扰动的反应,但访问这些数据集通常需要SQL专业知识。大语言模型提供了一种自然语言替代方案,但其幻觉倾向引发了对结果可靠性的担忧。我们提出SANE(模式感知自然语言评估),一种用于特定领域文本到SQL评估的新范式:基于模式、自动生成的基准,与实际和特定的实验结构相关联。SANE使评估更具可扩展性、系统性和可重复性。使用SANE,我们评估了一个少样本大语言模型,并表明在具有结构化提示和约束的受限模式下,无需任何模型训练或微调即可实现准确的查询生成。大多数失败源于模糊或未明确指定的输入,表现为过度谨慎的澄清请求或对应先消除歧义的查询的回答,而不是错误的SQL生成。这些结果表明,当与模式感知提示相结合时,少样本大语言模型可以在定义良好的领域内提供可靠的数据库访问。

英文摘要

High-throughput microscopy generates large, structured datasets capturing cellular responses to pharmacological perturbations, but accessing these datasets typically requires SQL expertise. Large language models offer a natural-language alternative, yet their tendency to hallucinate raises concerns about result reliability. We present SANE (Schema-Aware Natural-language Evaluation), a novel paradigm for domain-specific text-to-SQL evaluation: schema-grounded, automatically generated benchmarks tied to real and specific experimental structure. SANE makes evaluation more scalable, systematic, and reproducible. Using SANE, we evaluate a few-shot large language model and show that, under constrained schemas with structured prompting and guardrails, accurate query generation is achievable without any model training or fine-tuning. Most failures stem from ambiguous or underspecified inputs and manifest as overly cautious clarification requests or answers to queries that should first be disambiguated, rather than incorrect SQL generation. These results indicate that few-shot large language models can provide reliable database access in well-defined domains when combined with schema-aware prompting.

2606.04499 2026-06-04 cs.SI cs.LG

Modeling and Interpreting Teamwork Dynamics in Cancer Care Outcome Prediction

建模与解释癌症护理结果预测中的团队协作动态

Yuhua Huang, Hsiao-Ying Lu, Kwan-Liu Ma

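AI summary: Uses collaboration networks derived from electronic health records and machine learning methods to study how teamwork dynamics among healthcare professionals affect cancer patient survival prediction, and interprets the key network characteristics.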

详情
AI中文摘要

癌症护理需要纵向方法,根据每个患者的需求随时间规划和实施治疗。虽然先前研究深入探讨了临床和人口统计学因素(如合并症和年龄)如何指导治疗规划,但对护理实施阶段的关注却少得多。然而,规划和实施都是基于团队的过程,依赖于多个医疗专业人员之间的协调努力。因此,这些协作实践中蕴含的人为因素对于优化患者结果至关重要。尽管重要性显著,但现有关于癌症护理中人为因素的文献有限,很少有研究调查护理团队内的协作如何在治疗过程中演变。为填补这一空白,本研究探讨通过电子健康记录系统捕获的医疗专业人员协作如何影响癌症患者结果,特别强调团队协作动态。我们将电子健康记录介导的医疗专业人员交互表示为网络,并应用机器学习方法识别这些协作结构中嵌入的患者生存预测信号。我们进一步通过指出与特定结果相关的网络特征和动态模式来解释模型预测。我们通过稳健性分析评估模型,确保发现稳定且不受训练中随机变异驱动。此外,我们的见解与医学文献中提出的假设一致,我们的结果为这些主张提供了基于经验数据的证据。总体而言,我们的工作提供了一个实用流程,利用协作的数字痕迹来评估和加强纵向团队医疗,为医疗实施中的数据驱动干预提供可操作的见解。

英文摘要

Cancer care requires a longitudinal approach in which treatments are planned and delivered over time according to the needs of each individual patient. While prior research has thoroughly explored how clinical and demographic factors, such as comorbidities and age, inform treatment planning, far less attention has been devoted to the delivery phase of care. Yet planning and delivery are both team-based processes that depend on coordinated efforts among multiple healthcare professionals (HCPs). As such, the human factors embedded in these collaborative practices are crucial to optimizing patient outcomes. Despite this importance, the existing literature on human factors in cancer care is limited, and very few studies have investigated how collaboration within care teams evolves over the course of treatment. To fill this gap, this work examines how HCPs' collaboration, captured through electronic health record (EHR) systems, affects cancer patient outcomes, with particular emphasis on teamwork dynamics. We represent EHR-mediated HCP interactions as networks and apply machine learning methods to identify predictive signals of patient survival embedded in these collaborative structures. We further interpret model predictions by pinpointing network characteristics and dynamic patterns associated with particular outcomes. We evaluate our model through robustness analyses to ensure that the findings are stable and not driven by stochastic variation in training. Additionally, our insights align with hypotheses proposed in the medical literature, and our results provide empirical, data-driven evidence supporting these claims. Overall, our work contributes a practical workflow for leveraging digital traces of collaboration to evaluate and strengthen longitudinal team-based healthcare, offering actionable insights to guide data-informed interventions in healthcare delivery.

2606.04494 2026-06-04 cs.AI

Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System

超越基于提示的规划:基于MCP原生图规划的生物医学智能体系统

Zhangtianyi Chen, Florensia Widjaja, Wufei Dai, Xiangjun Zhang, Yuhao Shen, Juexiao Zhou

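AI summary: Proposes the BioManus system, which compiles heterogeneous bioinformatics tools into standardized MCP servers and builds a typed heterogeneous graph to enable graph-structured planning, addressing tool confusion and context inefficiency and improving execution accuracy and workflow validity on BioAgentBench and LAB-Bench.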

详情
AI中文摘要

生物医学智能体有望自动化复杂的生物工作流,但当前系统面临两个基本瓶颈:生物信息学工具在接口和执行环境上高度异构,而智能体规划仍依赖于基于提示的扁平工具描述。随着生物医学软件生态系统的增长,这种工具覆盖与上下文大小之间的耦合导致工具混淆、规划不稳定和执行效率低下。我们引入BioManus,一种基于结构化生物能力上的图支架规划的原生MCP生物医学智能体。BioManus首先提出BioinfoMCP编译器,将异构生物信息学软件转换为标准化的MCP服务器,从而产生一个大型可执行的MCP生态系统。然后,它将这个生态系统组织成一个类型化的异构图,涵盖工具、操作、数据类型和工作流阶段。在推理时,BioManus检索紧凑的任务特定子图,合成操作级工作流支架。这种设计将规划复杂度与原始工具库存大小解耦,在高召回率检索下实现了上下文压缩比Theta(N / (h * m_bar)),其中N是工具总数,h是工作流长度,m_bar(远小于N)是每个操作的平均候选工具数量。在BioAgentBench和LAB-Bench上的实验表明,与先进的生物医学智能体基线相比,BioManus提高了执行准确性、工作流有效性和上下文效率。这项工作表明了一种范式转变:可扩展的生物医学推理需要结构化的可执行能力图,而不是越来越大的提示级工具检索。

英文摘要

Biomedical agents promise to automate complex biological workflows, yet current systems face two fundamental bottlenecks: bioinformatics tools are highly heterogeneous in interfaces and execution environments, while agent planning still relies on flat prompt-retrieved tool descriptions. As biomedical software ecosystems grow, this coupling between tool coverage and context size leads to tool confusion, unstable planning, and inefficient execution. We introduce BioManus, an MCP-native biomedical agent built on graph-scaffolded planning over structured biological capabilities. BioManus first introduces the BioinfoMCP Compiler, which converts heterogeneous bioinformatics software into standardized MCP servers, yielding a large executable MCP ecosystem. It then organizes this ecosystem as a typed heterogeneous MCP graph over tools, operations, datatypes, and workflow stages. At inference time, BioManus retrieves compact task-specific subgraphs and synthesizes operation-level workflow scaffolds. This design decouples planning complexity from raw tool inventory size, achieving a context compression ratio of $\Theta(N / (h \bar{m}))$ under high-recall retrieval, where $N$ is the total tool count, $h$ is the workflow horizon, and $\bar{m} \ll N$ is the average number of candidate tools per operation. Experiments on BioAgentBench and LAB-Bench show that BioManus improves execution accuracy, workflow validity, and context efficiency over advanced biomedical agent baselines. This work suggests a paradigm shift: scalable biomedical reasoning requires structured executable capability graphs rather than ever-larger prompt-level tool retrieval.
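
The stated compression ratio can be made concrete with a small calculation. The abstract gives only the asymptotic order, so the function below evaluates the leading term N / (h * m_bar) with constants ignored, under the assumption that a flat prompt lists all N tools while a scaffolded prompt lists about m_bar candidates for each of h workflow steps. The numbers are made up for illustration.

```python
def context_compression_ratio(n_tools, horizon, avg_candidates):
    """Leading term of the compression ratio, N / (h * m_bar).

    n_tools:        total tools in the ecosystem (N)
    horizon:        number of operations in the workflow (h)
    avg_candidates: average candidate tools per operation (m_bar)
    """
    if horizon <= 0 or avg_candidates <= 0:
        raise ValueError("horizon and avg_candidates must be positive")
    return n_tools / (horizon * avg_candidates)


# Hypothetical ecosystem: 2000 tools, a 5-step workflow, and about
# 4 candidate tools per step, i.e. 20 tool descriptions instead of 2000.
print(context_compression_ratio(2000, 5, 4))
```

Because the denominator depends only on the workflow, the ratio grows linearly with the tool inventory, which is the sense in which planning cost is decoupled from ecosystem size.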

2606.04493 2026-06-04 cs.CV cs.AI

SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning

SFMambaNet: 用于对应点筛选的频谱-频率增强选择性状态空间模型

Zhihua Wang, Yanping Li, Yizhang Liu

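AI summary: Proposes SFMambaNet, which uses a Local Spectral-Geometric Attention block and a Spectral-Integrated Global Mamba block to bring frequency-domain perception to correspondence pruning for the first time, improving the separation of inliers from outliers.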

详情
AI中文摘要

对应点筛选旨在从初始对应点集中识别内点。现有大多数基于图神经网络的方法依赖于从粗欧几里得坐标映射的几何特征,难以捕捉内点呈现的细微几何一致性。而基于Mamba的方法虽具有全局感受野和长序列建模能力,但往往在隐藏状态空间中积累大量不一致特征,难以区分内点与离点。本文首次将频域感知融入该任务,提出SFMambaNet,一种新颖的频谱-频率增强Mamba双视图对应点筛选网络。我们的方法由两个组件协同构成:首先,设计局部频谱-几何注意力(LSGA)块。LSGA将频谱位置编码融入局部图交互,并引入多尺度Mamba处理,以增强对细微几何一致性的捕捉并提升局部特征判别性。在此基础上,设计频谱集成全局Mamba(SIGM)块。SIGM在状态空间中嵌入频率门控机制,利用LSGA提供的频率信息显式抑制隐藏状态内高频噪声的累积,并减轻不一致特征的传播。这增强了内点-离点可分性,并以近乎线性的复杂度实现了鲁棒的全局上下文建模能力。大量实验表明,SFMambaNet在多个具有挑战性的任务上优于当前最先进方法。代码可在https://github.com/Kirito14IT/SFMambaNet获取。

英文摘要

Correspondence pruning aims to identify inliers from an initial set of correspondences. Most existing Graph Neural Network (GNN)-based methods rely on geometric features mapped from coarse Euclidean coordinates, which struggle to capture the subtle geometric consistencies presented by inliers. While Mamba-based methods possess global receptive fields and long sequence modeling capabilities, they tend to accumulate substantial inconsistent features within the hidden state space, making it difficult to distinguish inliers from outliers. In this paper, we integrate frequency domain perception into this task for the first time and propose SFMambaNet, a novel Spectral-Frequency enhanced Mamba-based two-view correspondence pruning network. Our method is collaboratively composed of two components: First, we design a Local Spectral-Geometric Attention (LSGA) block. LSGA incorporates spectral positional encoding into local graph interactions and introduces multi-scale Mamba processing to enhance the capture of subtle geometric consistencies and improve local feature discriminability. Building upon this, we design a Spectral-Integrated Global Mamba (SIGM) block. SIGM embeds a frequency gating mechanism within the state space, utilizing the frequency information provided by LSGA to explicitly suppress high-frequency noise accumulation within hidden states and mitigate the propagation of inconsistent features. This enhances inlier-outlier separability and achieves robust global context modeling capabilities with nearly linear complexity. Extensive experiments demonstrate that SFMambaNet outperforms current state-of-the-art methods on several challenging tasks. The code is available at https://github.com/Kirito14IT/SFMambaNet.

2606.04492 2026-06-04 cs.LG cs.GT

Episodic Memory Temporal Consistency for Cooperative Multi-Agent Reinforcement Learning

面向合作多智能体强化学习的 episodic 记忆时间一致性

Zicheng Zhao, Yu Lan, Chengzhengxu Li, Zhaohan Zhang, Xiaoming Liu

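AI summary: To address reward sparsity and exploration bottlenecks in cooperative multi-agent reinforcement learning, proposes the Episodic Memory Temporal Consistency (EMTC) framework, whose temporally consistent semantic embedder and gating mechanism prevent representation collapse and filter pseudo-successful trajectories. It comes with a theoretical error bound and significantly outperforms existing methods on the SMAC and GRF benchmarks.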

详情
Comments
Under Review
AI中文摘要

合作多智能体强化学习(MARL)经常遭受严重的奖励稀疏性和探索瓶颈。虽然 episodic 记忆机制通过重用高回报轨迹缓解了这些问题,但由于无约束的激励分布和语义表示崩溃,它们常常使智能体陷入局部最优。为了解决这个问题,我们提出了 Episodic Memory Temporal Consistency (EMTC),一个能够稳健构建并选择性利用历史经验的框架。EMTC 引入了两个协同组件:(1) 时间一致性语义嵌入器,它将对比学习与时间条件状态重建相结合,防止表示崩溃并实现精确的记忆检索;(2) 时间一致性门控机制,它根据时间一致性误差动态调节 episodic 激励。这个自适应门从伪成功轨迹中过滤误导信号,有效缓解 Q 值高估。我们提供了理论保证,建立了严格误差界,将可观测的时间一致性误差直接与底层轨迹最优性和表示质量联系起来。在 SMAC 和 GRF 基准上的广泛评估表明,EMTC 持续优于最先进的基线。值得注意的是,与最强的 episodic 基线相比,EMTC 在超难 SMAC 场景中实现了高达 24% 的绝对胜率提升,在 GRF 任务上平均提升 28%。

英文摘要

Cooperative Multi-Agent Reinforcement Learning (MARL) frequently suffers from severe reward sparsity and exploration bottlenecks. While episodic memory mechanisms mitigate these issues by reusing high-return trajectories, they often trap agents in local optima due to unconstrained incentive distribution and semantic representation collapse. To address this, we propose Episodic Memory Temporal Consistency (EMTC), a framework that robustly constructs and selectively leverages historical experiences. EMTC introduces two synergistic components: (1) a Temporally Consistent Semantic Embedder that integrates contrastive learning with time-conditioned state reconstruction, preventing representation collapse and enabling precise memory retrieval; and (2) a Temporal Consistency Gating Mechanism that dynamically modulates episodic incentives based on temporal consistency error. This adaptive gate filters misleading signals from pseudo-successful trajectories, effectively mitigating Q-value overestimation. We provide theoretical guarantees, establishing a strict error bound that directly links the observable temporal consistency error to the underlying trajectory optimality and representation quality. Extensive evaluations on the SMAC and GRF benchmarks demonstrate that EMTC consistently outperforms state-of-the-art baselines. Notably, compared to the strongest episodic baseline, EMTC achieves absolute win-rate improvements of up to 24% in super-hard SMAC scenarios and an average improvement of 28% across GRF tasks.
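
The gating idea, scaling the episodic incentive down as the temporal consistency error grows, can be sketched as follows. The abstract does not give the gate's functional form, so the exponential gate, the temperature, the mixing weight `beta`, and the function names are all assumptions made for illustration.

```python
import math


def consistency_gate(tc_error, temperature=1.0):
    """Gate in (0, 1]: equals 1 at zero error and decays as error grows.

    The exponential form is an assumed choice, not taken from the paper.
    """
    return math.exp(-tc_error / temperature)


def gated_reward(env_reward, episodic_bonus, tc_error,
                 beta=0.1, temperature=1.0):
    """Environment reward plus an episodic bonus scaled by the gate."""
    gate = consistency_gate(tc_error, temperature)
    return env_reward + beta * gate * episodic_bonus


# A memory entry with low consistency error keeps most of its bonus;
# one with high error (a possible pseudo-success) is largely suppressed.
print(gated_reward(1.0, 10.0, tc_error=0.1))
print(gated_reward(1.0, 10.0, tc_error=3.0))
```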

2606.04486 2026-06-04 cs.CR cs.CL cs.LG stat.ML

Global Sketch-Based Watermarking for Diffusion Language Models

基于全局草图的扩散语言模型水印

Daniel Zhao

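AI summary: Proposes a global vector-valued sketch watermark for masked diffusion language models. By controlling global statistics of the text, it makes detection independent of local context.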

详情
AI中文摘要

语言模型的水印方法在自回归设置中已被广泛研究,其中令牌是顺序生成的。这些工作主要关注局部上下文方案,该方案根据前序令牌扰动下一个令牌的分布。在扩散语言模型中,许多未解析位置的分布被联合采样,使得整个序列的加性统计在生成过程中是可处理的。我们提出了一种针对掩码扩散语言模型的水印,该水印控制文本的全局向量草图表示。与上下文相关的水印相比,草图公式将检测与生成过程中看到的局部上下文解耦,从而产生一个顺序无关的统计量和一个不表现为简单令牌偏差的水印规则。我们分析了该方法的失真、合理性和鲁棒性。

英文摘要

Watermarking methods for language models have been studied extensively in the autoregressive setting, where tokens are generated sequentially. These works largely focus on local-context schemes that perturb the next token's distribution as a function of its preceding tokens. In diffusion language models, distributions over many unresolved positions are jointly sampled, allowing additive statistics of the entire sequence to be tractable during generation. We propose a watermark for masked diffusion language models that controls a global, vector-valued sketch representation of the text. Compared to context-dependent watermarking, the sketch formulation decouples detection from the local contexts seen during generation, resulting in an order-agnostic statistic and a watermarking rule which does not manifest as a simple token bias. We analyze the distortion, soundness, and robustness properties of the method.

2606.04484 2026-06-04 cs.AI cs.LG cs.MA

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

AgentJet:一种用于智能体强化学习的灵活群体训练框架

Qingxu Fu, Boyin Liu, Shuchang Tao, Zhaoyang Liu, Bolin Ding

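AI summary: Proposes AgentJet, a decoupled multi-node swarm training framework that supports heterogeneous multi-model reinforcement learning, multi-task cocktail training, fault-tolerant execution, and live code iteration, and achieves a 1.5-10x training speedup through its context tracking module.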

详情
Comments
Technical report, 27 pages
AI中文摘要

我们提出了AgentJet,一个用于大型语言模型(LLM)智能体强化学习的分布式群体训练框架。与将智能体运行与模型优化紧密耦合的集中式框架不同,AgentJet采用解耦的多节点架构,其中群体服务器节点托管可训练模型并在GPU集群上运行优化,而群体客户端节点在任意设备上执行任意智能体。这种设计提供了集中式框架难以支持的能力:(1)异构多模型强化学习,支持训练具有多个LLM作为大脑的异构多智能体团队;(2)具有隔离智能体运行时的多任务鸡尾酒训练;(3)容错执行,防止外部环境故障中断训练过程;(4)实时代码迭代,允许通过替换群体客户端节点在训练期间编辑智能体。为了支持多模型、多轮和多智能体设置中的高效强化学习,AgentJet引入了一个带有时间线合并的上下文跟踪模块,该模块合并冗余上下文并实现1.5-10倍的训练加速。最后,AgentJet引入了一个自动化研究系统,该系统以研究主题为输入,并在大规模集群上自主进行长期、多天的强化学习研究。通过利用群体架构,该系统在无需人工干预的情况下复现了强化学习研究人员的关键探索工作流程。

英文摘要

We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinforcement learning. Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client nodes execute arbitrary agents on arbitrary devices. This design provides capabilities that are difficult to support in centralized frameworks: (1) heterogeneous multi-model reinforcement learning, enabling the training of heterogeneous multi-agent teams with multiple LLMs as brains; (2) multi-task cocktail training with isolated agent runtimes; (3) fault-tolerant execution that prevents external environment failures from interrupting the training process; and (4) live code iteration, which allows agents to be edited during training by replacing swarm client nodes. To support efficient RL in multi-model, multi-turn, and multi-agent settings, AgentJet introduces a context tracking module with timeline merging, which consolidates redundant context and achieves a 1.5-10x training speedup. Finally, AgentJet introduces an automated research system that takes a research topic as input and autonomously conducts long-horizon, multi-day RL studies on large-scale clusters. By leveraging the swarm architecture, this system reproduces key exploratory workflows of RL researchers without human intervention during execution.