arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.06916 2026-05-11 cs.LG

Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting

Fan Xu, Yuan Gao, Kun Wang, Rui Su, Fenghua Ling, Hao Wu, Wanli Ouyang

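AI summary: Tyche is a one-step conditional flow model for efficient probabilistic weather forecasting, designed to address the high computational cost of conventional diffusion models in long-horizon prediction. The method uses a destination-aware average-velocity flow to map Gaussian noise directly to future weather states, completing a forecast with a single function evaluation. Tyche adopts an improved rectification objective and a Swin-style transformer network, which preserves the spatial detail of high-dimensional geophysical fields while improving computational efficiency. Experiments show that it matches or exceeds existing methods in forecast accuracy and uncertainty quantification.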

Abstract

Probabilistic weather forecasting requires not only accurate trajectories but also calibrated distributions over plausible atmospheric futures. Recent data-driven systems have achieved remarkable deterministic skill, and diffusion-based ensemble forecasters have substantially improved sample realism and uncertainty quantification. However, their inference cost scales with forecast horizon, ensemble size, and the number of denoising steps required for each transition, making large operational ensembles expensive. To address this, we present Tyche, a one-step conditional flow model for efficient probabilistic weather forecasting. Tyche models the conditional forecast distribution with a destination-aware average-velocity flow that maps Gaussian noise directly to future weather states in a single function evaluation (1-NFE). To make this one-step transport learnable in high-dimensional geophysical fields, we derive a JVP-regularized rectification objective that enforces temporal self-consistency across source and destination flow timesteps without explicitly forming Jacobians. The transport field is parameterized by an isotropic Swin-style transformer that preserves fine-scale spatial structure while remaining scalable on global grids. To improve ensemble reliability under autoregressive forecasting, we further introduce a rollout-based finetuning stage with curriculum CRPS calibration supervision. Experiments on ERA5 at 1.5$^\circ$ and 6-hour resolution show that Tyche, using only a single NFE, matches or exceeds the forecast skill and calibration of state-of-the-art multi-step generative baselines and the operational ECMWF IFS ensemble.
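
The abstract above mentions CRPS calibration supervision. As a rough illustration of what CRPS measures, here is a minimal sketch of the standard empirical ensemble estimator for a single scalar variable, E|X - y| - 0.5 E|X - X'|. The function name `ensemble_crps` is hypothetical, and this is the textbook estimator, not necessarily the exact loss or curriculum used by Tyche.

```python
from typing import Sequence


def ensemble_crps(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS for one scalar: E|X - y| - 0.5 * E|X - X'|.

    Illustrative helper (hypothetical name); lower values are better.
    """
    m = len(members)
    if m == 0:
        raise ValueError("need at least one ensemble member")
    # Mean absolute error of the ensemble members against the observation.
    skill = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference over all member pairs (including i == j).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread


if __name__ == "__main__":
    print(ensemble_crps([0.0, 1.0, 2.0], observation=1.0))
    print(ensemble_crps([1.0, 1.0, 1.0], observation=1.0))
```

The second term rewards ensemble spread, so a forecaster cannot minimize the score simply by collapsing all members onto one trajectory unless that trajectory is correct.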

2605.06912 2026-05-11 cs.CV

Advancing Reliable Synthetic Video Detection: Insights from the SAFE Challenge

Kirill Trapeznikov, Gabriel Mancino-Ball, Jonathan Li, Paul Cummer, Jai Aslam, Danial Samadi Vahdati, Tai Nguyen, Matthew C. Stamm, Peter Bautista, Michael Davinroy, Laura Cassani, Jill Crisman

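AI summary: With the rapid development of generative video technology, the need to detect and identify synthetic video is increasingly urgent. To address this challenge, the researchers organized the SAFE synthetic video detection competition to evaluate how well algorithms distinguish real from synthetic videos under blind-test conditions. The competition dataset contains content generated by 13 modern high-quality synthetic video models, matched with real videos from 21 different sources, covering 6,000 samples and 20 hours of video in total. The study analyzes the generalization and robustness of current detection methods and finds that, despite progress in cross-model detection, they remain clearly vulnerable to post-processing artifacts.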

Abstract

The proliferation of generative video technologies has intensified the need for reliable methods to detect and characterize synthetic media. To address this challenge, we organized the SAFE: Synthetic Video Detection Challenge (https://safe-video-2025.dsri.org), co-located with the Authenticity and Provenance in the Age of Generative AI (APAI) Workshop at ICCV 2025. The competition invited participants to develop and evaluate algorithms capable of distinguishing real from synthetic videos under fully blind evaluation conditions, and it drew over 600 submissions from 12 teams over a 90-day span. Hosted on the Hugging Face platform, the challenge comprised two primary tasks: (1) detection of synthetic video content generated by diverse state-of-the-art models, and (2) detection of synthetic content following common post-processing operations such as resizing, re-compression, and motion blur, among others. The challenge data consisted of content generated by 13 modern high-quality synthetic video models, matched to real videos from 21 diverse and challenging sources, for a total of 6,000 video samples spanning 20 hours. This paper describes the challenge design, dataset construction, evaluation methodology, and outcomes, offering insights into the generalization and robustness of contemporary synthetic video detection methods. Our findings highlight measurable progress in cross-generator generalization but also persistent vulnerabilities to post-processing artifacts.

2605.06911 2026-05-11 cs.LG

Dual-Scale Temporal Fusion Reveals Structured Predictability in Subseasonal-to-Seasonal Temperature Prediction

Elnaz Bashir, Jiali Wang, Lin Yan

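AI summary: This study examines structured predictability in subseasonal-to-seasonal (S2S) temperature prediction, arguing that forecast skill depends not only on lead time but also on temporal scale, spatial heterogeneity, and large-scale pattern coherence. It proposes a dual-scale learning framework that separates historical climate context from recent weather evolution and combines them through spatially adaptive fusion, achieving stable temperature forecasts over the 30 to 90-day range. The study finds that the distribution of forecast skill varies systematically with season and geography, and that topology-aware structural constraints further improve the spatial coherence of predicted fields, providing a new theoretical basis for improving S2S forecast systems.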

Comments 10 pages, 5 figures

Abstract

Subseasonal-to-seasonal (S2S) temperature forecasts, spanning several weeks to a few months, are critically needed in agricultural practice, energy planning, and extreme-weather-induced risk management, yet their reliability varies substantially across seasons and regions. Forecast skill is often attributed primarily to lead time, but this perspective does not fully explain the spatiotemporal patterns of predictability. Here we show that S2S predictability is organized across interacting temporal components, spatial heterogeneity, and large-scale pattern coherence, and that this structure can be explicitly characterized and exploited. We develop a dual-scale learning framework that separates calendar-aligned historical climate context from lead-time-matched recent weather evolution, combining them through spatially adaptive fusion to enable stable temperature forecasts across the 30 to 90-day window. The learned fusion weights reveal that the balance between these two temporal scales shifts systematically with season and geography: during winter, interannual context dominates over high latitudes and complex terrain where forecasting is most difficult, while summer predictions reflect a more balanced temporal contribution across the domain. This spatially explicit reorganization of predictability, rather than simple lead-time decay, emerges as the primary determinant of forecast skill within the subseasonal window. Topology-aware structural constraints further improve spatial coherence of predicted temperature fields, stabilizing large-scale pattern organization particularly over complex terrain. These results reframe S2S predictability as a structured, multi-scale phenomenon, providing a more interpretable foundation for improving forecast systems and informing their use in practice.

2605.06908 2026-05-11 cs.LG cs.AI

Same Signal, Opposite Meaning: Direction-Informed Adaptive Learning for LLM Agents

Ziming Li, Jiatan Huang, Xiaoguang Guo, Guilin Wang, Chuxu Zhang

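AI summary: This paper studies how LLM agents can dynamically adjust compute at test time as needed. It points out that existing methods decide whether extra computation is needed from signals with a fixed direction (such as confidence or uncertainty), but this direction can reverse across environments and models, degrading performance. The authors therefore propose DIAL, which learns the utility direction of state features through signal-agnostic counterfactual exploration and achieves a better balance between performance and compute cost across multiple environments.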

Abstract

Adaptive test-time compute for LLM agents aims to invoke extra computation only when it improves performance. Existing methods typically use confidence-, uncertainty-, or difficulty-based gates, assuming a fixed direction from the gating signal through compute need to the value of computation. This makes gating a utility-calibration problem: gating signals should align with whether extra computation improves the final outcome over the base policy. We show that this alignment is unstable: the same signal predicts rollout benefit in one setting and rollout harm in another, with reversals across environments and backbones even when the task is fixed. Wrong-direction gates can therefore worsen performance by precisely selecting harmful states. This reversal reflects a deeper distinction between compute need and compute suitability: a high uncertainty signal may indicate decision-difficult states where rollouts help compare alternatives, or intervention-unsuitable states where the current context does not support useful rollout-based improvement. Under this two-source model, fixed-direction gates are unreliable across heterogeneous settings. To address this, we propose DIAL (Direction-Informed Adaptive Learning), a sparse gate trained from signal-agnostic counterfactual exploration to learn the utility direction of state features per (environment, backbone). Across six environments and three backbones, DIAL yields a stronger overall success-cost trade-off than fixed-direction baselines.

2605.06906 2026-05-11 cs.LG

TraXion: Rethinking Pre-training Frameworks for Mobility and Beyond

Shang-Ling Hsu, Mark Tenzer, Cyrus Shahabi, Khurram Shafique

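AI summary: This paper proposes TraXion, a pre-training framework that aims to model human mobility and other multi-entity spatiotemporal event stream (MESES) data more accurately. Unlike conventional approaches that treat trajectories as sentences, TraXion designs dedicated pre-training objectives and an architecture around three key properties of mobility data: the joint distribution of events, persistent user signatures, and co-occurrence among users. Experiments show that TraXion outperforms task-specific baselines on multiple public mobility datasets and that the approach also applies to other domains such as enterprise authentication logs and intensive-care prediction, demonstrating its broad applicability.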

Comments 31 pages, 2 figures

Abstract

Human mobility differs from text and from generic time series in three structural ways: visits are tuple-valued events whose meaning depends on the joint distribution over location, time, and activity; users carry persistent signatures across trajectories; and visits are not independent across users, since co-location at shared places is a primary signal. Existing pre-training recipes for mobility import objectives from language modeling, treating trajectories as sentences and visits as tokens, an analogy that fails against each of the three properties above. These properties define a broader class, multi-entity spatiotemporal event streams (MESES), spanning enterprise authentication logs, electronic health records, and other event-stream domains where entities share infrastructure, schedules, or contexts. We make the properties precise as three axioms that any pre-training framework for MESES should satisfy, and introduce TraXion, whose objectives and architecture are jointly designed to meet them. A single TraXion checkpoint per dataset beats task-specific baselines on every task across six public mobility datasets covering anomaly detection, next-POI recommendation, next-visit prediction, and social-link prediction. The same recipe, applied unchanged to enterprise authentication logs and ICU mortality prediction, matches or exceeds prior work on both, showing that event streams from domains as different as mobility, security, and healthcare can be modeled under a single framework.

2605.06905 2026-05-11 cs.LG

Conservative Flows: A New Paradigm of Generative Models

Eshed Gal, Md Shahriar Rahim Siddiqui, Moshe Eliasof, Eldad Haber

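AI summary: This paper proposes a new paradigm for generative models, conservative flows, in which generation is performed by discrete stochastic dynamics that leave the data distribution invariant, starting from data-supported states rather than from noise. The study develops two probability-preserving sampling mechanisms that can be used directly on top of existing models, and experiments show that the approach improves over the original generation procedures on both synthetic data and real image datasets.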

Abstract

Modern generative modeling is dominated by transport from a noise prior to data. We propose an alternative paradigm in which generation is performed by a discrete stochastic dynamics that leaves the data distribution invariant, initialized from data-supported states rather than from noise. The framework can utilize any pretrained flow model. We develop two probability-preserving sampling mechanisms, a corrected Langevin dynamics with a Metropolis adjustment and a predictor-corrector flow, that operate directly on existing checkpoints. We validate the framework on a synthetic Swiss-roll target, ImageNet-256 and Oxford Flowers-102, where our samplers consistently improve over the original generation procedures.
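
The abstract above names a Langevin dynamics with a Metropolis adjustment as one of its samplers. The following is a minimal sketch of the textbook Metropolis-adjusted Langevin algorithm (MALA) on a toy one-dimensional standard normal target, started from a point inside the target's support. It is an assumption-laden illustration: the paper's sampler operates on a pretrained flow model, whereas the target, the gradient, and the function names here (`log_density`, `mala_step`, `run_chain`) are all hypothetical stand-ins.

```python
import math
import random


def log_density(x: float) -> float:
    # Toy target: standard normal, up to an additive constant.
    return -0.5 * x * x


def grad_log_density(x: float) -> float:
    return -x


def log_proposal(x_to: float, x_from: float, step: float) -> float:
    # Log density (up to a constant) of the Langevin proposal x_from -> x_to.
    mean = x_from + step * grad_log_density(x_from)
    return -((x_to - mean) ** 2) / (4.0 * step)


def mala_step(x: float, step: float, rng: random.Random) -> float:
    # Langevin proposal: drift along the score plus Gaussian noise.
    noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    proposal = x + step * grad_log_density(x) + noise
    # Metropolis-Hastings correction keeps the target distribution invariant.
    log_accept = (log_density(proposal) + log_proposal(x, proposal, step)
                  - log_density(x) - log_proposal(proposal, x, step))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return x


def run_chain(n_steps: int, step: float, x0: float = 0.0, seed: int = 0):
    # x0 plays the role of a data-supported starting state.
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x = mala_step(x, step, rng)
        samples.append(x)
    return samples


if __name__ == "__main__":
    chain = run_chain(5000, step=0.5)
    print(sum(chain) / len(chain))
```

Without the accept/reject step, the discretized Langevin update would drift toward a slightly wrong stationary distribution; the correction is what makes the discrete dynamics leave the target invariant.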

2605.06903 2026-05-11 cs.CL cs.AI

MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

Chenjun Li, Cheng Wan, Johannes C. Paetzold

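AI summary: As large language models become widely used in everyday writing workflows, reliable detection of AI-generated text is essential for academic integrity and content moderation. The study proposes MELD, an AI-generated text detector that strengthens detection through multi-task equilibrated learning. MELD introduces auxiliary supervision tasks for generator family, attack type, and source domain, combined with an uncertainty-weighted loss and an adversarial training strategy, which markedly improves robustness and generalization. Experiments show that MELD performs strongly on multiple benchmarks, with a clear advantage at low false-positive rates and under adversarial attack.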

Comments 17 pages, 6 figures

Abstract

Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high aggregate AUROC on clean, in-distribution human and AI text: it should remain robust to attacks and adversarial rewrites, transfer to unseen generators and domains, and operate at low false-positive rates (FPR). Most existing detectors optimize a single AI/Human objective, giving the representation little incentive to learn generator, attack, or domain structure once the binary task saturates. We introduce MELD (Multi-Task Equilibrated Learning Detector), a deployable detector for AI-generated text that enriches binary detection with auxiliary supervision. MELD attaches generator-family, attack-type, and source-domain heads to a shared encoder, and balances the four losses with learned homoscedastic uncertainty weights. To improve robustness, an EMA teacher predicts on clean inputs while an attack-augmented student is distilled toward the teacher. MELD further uses a hard-negative pairwise ranking loss to enlarge the score margin between AI-generated texts and the most confusable human texts. At inference, all auxiliary heads are discarded, giving MELD the same interface and cost as a standard detector. On the public RAID leaderboard, MELD is the strongest open-source detector and is competitive with leading commercial models, especially under attack and at low FPR. Across standard held-out benchmarks, MELD matches or outperforms supervised baselines. We further introduce MELD-eval, a held-out evaluation pool built from recent chat models released by four major LLM providers. Without additional finetuning, MELD achieves 99.9% TPR at 1% FPR on MELD-eval, while many baselines degrade sharply.
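
The abstract above says MELD balances four losses with learned homoscedastic uncertainty weights. A common formulation of that idea gives each task a learnable log-variance s_i and minimizes the sum of exp(-s_i) * L_i + s_i. The sketch below assumes that form; the abstract does not state the exact parameterization, and `uncertainty_weighted_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def uncertainty_weighted_loss(losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    Assumed form of homoscedastic uncertainty weighting; s_i is a
    learnable log-variance for task i.
    """
    if len(losses) != len(log_vars):
        raise ValueError("one log-variance per task loss is required")
    # exp(-s) down-weights a noisy task; the +s term penalizes ignoring it.
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


if __name__ == "__main__":
    # Four task losses: binary detection plus three auxiliary heads.
    task_losses = [1.0, 2.0, 0.5, 0.25]
    print(uncertainty_weighted_loss(task_losses, [0.0, 0.0, 0.0, 0.0]))
```

For a fixed task loss L, the combined term is minimized at s = log L, so tasks with persistently larger losses are automatically given smaller weights during training.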

2605.06902 2026-05-11 cs.LG

Streaming Adversarial Robustness in Fuzzy ARTMAP: Mechanism-Aligned Evaluation, Progressive Training, and Interpretable Diagnostics

Shane Cairns, Leonardo Enzo Brito da Silva, Sasha Petrenko, Donald C. Wunsch, Jian Liu

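AI summary: This paper studies the adversarial robustness of Fuzzy ARTMAP in streaming-data settings, proposing an evaluation method aligned with the model's mechanism, a progressive training strategy, and interpretable diagnostic tools. By introducing WB-Softmax, a differentiable white-box attack matched to the ARTMAP mechanism, it shows that conventional offline adversarial training can fail for streaming models and finds that progressive staged selective training provides the strongest replay-free robustness. The study also shows that ARTMAP's explicit category geometry helps diagnose key problems such as separation collapse and match-score inversion, providing a mechanism-aligned framework for research on the adversarial robustness of streaming prototype-based learners.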

Comments 35 pages, 3 figures, 11 tables. Preprint submitted to Neural Networks

Abstract

Adversarial robustness has been studied extensively for offline deep networks, but less is known about strict single-pass streaming neural learners. This paper studies adversarial robustness in Fuzzy ARTMAP, an Adaptive Resonance Theory architecture based on category competition, complement coding, match tracking, and replay-free prototype updates. We introduce WB-Softmax, a differentiable white-box attack surrogate aligned with ARTMAP's category-competition and map-field prediction mechanism, and formalize a streaming evaluation principle requiring robustness to be assessed on the final deployed model. Across four image benchmarks, WB-Softmax achieves 89-100% attack success on vanilla Fuzzy ARTMAP models. We show that defense rankings can reverse across protocols: offline adversarial training may appear strong under transfer attacks yet collapse under adaptive white-box evaluation, whereas progressive two-stage selective training provides the strongest overall replay-free robustness. We further show that ART's explicit category geometry enables interpretable diagnosis of separation collapse and match-score inversion. These results provide a mechanism-aligned, protocol-aware framework for adversarial robustness in streaming prototype-based learners.

2605.06901 2026-05-11 cs.CL

Reflections and New Directions for Human-Centered Large Language Models

Caleb Ziems, Dora Zhao, Rose E. Wang, Matthew Jörke, Ahmad Rushdi, Advit Deepak, Sunny Yu, Anshika Agarwal, Harshvardhan Agarwal, Gabriela Aranguiz-Dias, Aditri Bhagirath, Justine Breuch, Huanxing Chen, Ruishi Chen, Sarah Chen, Haocheng Fan, William Fang, Cat Gonzales Fergesen, Daniel Frees, Tian Gao, Ziqing Huang, Vishal Jain, Yucheng Jiang, Kirill Kalinin, Su Doga Karaca, Arpandeep Khatua, Teland La, Isabelle Levent, Miranda Li, Xinling Li, Yongce Li, Angela Liu, Minsik Oh, Nathan J. Paek, Anthony Qin, Emily Redmond, Michael J. Ryan, Aadesh Salecha, Xiaoxian Shen, Pranava Singhal, Shashanka Subrahmanya, Mei Tan, Irawadee Thawornbut, Michelle Vinocour, Xiaoyue Wang, Zheng Wang, Henry Jin Weng, Pawan Wirawarn, Shirley Wu, Sophie Wu, Yichen Xie, Patrick Ye, Sean Zhang, Yutong Zhang, Cathy Zhou, Yiling Zhao, James Landay, Diyi Yang

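AI summary: As large language models are widely deployed across many domains, how to prioritize human needs beyond technical capability has become a key question. This paper proposes a framework for developing human-centered large language models (HCLLMs) that integrates perspectives from natural language processing, human-computer interaction, and responsible AI, stressing that human values and goals should be fully considered at every stage of model design, data sourcing, training, evaluation, and deployment. The paper also uses a case study to explore the impact of HCLLMs on the future of work, offering developers systematic guidance and recommendations.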

Abstract

Large Language Models (LLMs) are increasingly shaping the private and professional lives of users, with numerous applications in business, education, finance, healthcare, law, and science. With this rise in global influence comes greater urgency to build, evaluate, and deploy these systems in a manner that prioritizes not only technical capabilities but also human priorities. This work presents a framework for developing Human-Centered Large Language Models (HCLLMs), which integrates perspectives from Natural Language Processing (NLP), Human-Computer Interaction (HCI), and responsible AI. Considering the ethics, economics, and technical objectives of language modeling, we argue that model developers need to address human concerns, preferences, values, and goals not merely during a cursory post-training stage, but with rigor and care at every stage of the pipeline. This paper offers human-centered insights and recommendations for developers at each stage, from system design to data sourcing, model training, evaluation, and responsible deployment. We conclude with a case study that applies these insights to understand the future of work with HCLLMs.

2605.06898 2026-05-11 cs.AI

Self-Programmed Execution for Language-Model Agents

Luke J. O'Connor

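AI summary: This paper proposes self-programmed execution (SPE), a language-model agent architecture whose core idea is to let the model itself coordinate state transitions instead of relying on a fixed orchestrator program. To this end, the author introduces Spell, a Lisp-based language in which programs can edit and re-evaluate themselves, enabling agent behavior without a fixed orchestration policy. Experiments show that even frontier models not trained for SPE can complete complex agentic tasks within this framework, demonstrating that language models can run as agents without a fixed orchestration policy.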

Abstract

At the heart of existing language model agents is a fixed orchestrator program responsible for the state transition between consecutive turns. This paper introduces self-programmed execution (SPE), an agent architecture in which the model completion is itself the orchestrator program, and the harness evaluates this program but does not impose its own orchestration policy. I formalize this idea using agentic machines: an SPE state is one from which a model completion can load any state of an embedded copy of the machine, meaning that it is subject to no fixed turn-to-turn orchestration policy. Realizing SPE in practice is nontrivial because the same data is both model context and executable program. I therefore introduce Spell, a Lisp-based language in which programs can edit and re-evaluate themselves, and effectful expressions like model invocations are structured such that re-evaluating an edited program does not replay its side effects. Experiments with existing models, not trained for SPE or Spell, show that frontier models can operate in this regime and accomplish challenging agentic tasks. These results demonstrate how an LM can act as an agent without any fixed orchestration policy, and they raise the question of what self-orchestration strategies might be learned by a model trained for self-programmed execution. Code is available at https://github.com/lukejoconnor/spell.

2605.06897 2026-05-11 cs.CL cs.AI cs.HC cs.MM cs.SD eess.AS

MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

Maximillian Chen, Xuanming Zhang, Michael Peng, Zhou Yu, Alexandros Papangelis, Yohan Jo

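AI summary: As IoT devices proliferate, voice interfaces that can handle complex user interactions are needed. This paper presents MIST, a speech-based multimodal tool-calling dataset for code generation tasks in smart-home scenarios, aimed at challenges such as device state tracking, spatiotemporal constraints, and mixed-initiative interaction in real environments. The study finds a clear performance gap between open-weight and closed-source large language models on MIST, and that current advanced closed-source models still have substantial room for improvement. The release of MIST and its generation framework provides an important resource for related research.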

Comments Project Page: https://billyzhang24kobe.github.io/mist-smarthome/

Abstract

The rise of Internet of Things (IoT) devices in the physical world necessitates voice-based interfaces capable of handling complex user experiences. While modern Large Language Models (LLMs) already demonstrate strong tool-usage capabilities, modeling real-world IoT devices presents a difficult, understudied challenge which combines modeling spatiotemporal constraints with speech inputs, dynamic state tracking, and mixed-initiative interaction patterns. We introduce MIST (the Multimodal Interactive Speech-based Tool-calling Dataset), a synthetic multi-turn, voice-driven code generation task that operates over IoT devices. We find that there is a significant gap between open- and closed-weight multimodal LLMs on MIST, and that even frontier closed-weight LLMs have substantial headroom. We release MIST and an extensible data generation framework to build related datasets in order to facilitate research on mixed-initiative voice assistants which reason about physical world constraints.

2605.06895 2026-05-11 cs.AI

Mitigating Cognitive Bias in RLHF by Altering Rationality

Tiffany Horter, Andrew Markham, Niki Trigoni, Serena Booth

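AI summary: This paper studies how to make models more robust to imperfect human feedback, proposing a method that mitigates bias in reinforcement learning from human feedback by dynamically adjusting a rationality parameter. Building on the identification of cognitive biases in human judgments, the method uses a large language model as a judge to adjust the rationality parameter dynamically during reward learning, reducing the influence of biased judgments. Experiments show that the method effectively improves the rationality of the downstream model and performs well even on preference datasets with strong biases.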

Abstract

How can we make models robust to even imperfect human feedback? In reinforcement learning from human feedback (RLHF), human preferences over model outputs are used to train a reward model that assigns scalar values to responses. Because these rewards are inferred from pairwise comparisons, this learning depends on an assumed relationship between latent reward differences and observed preferences, typically modeled using a Boltzmann formulation in which a rationality parameter beta informs how consistently preferences reflect reward differences. In practice, beta is typically treated as a fixed constant that reflects assumed uniform annotator reliability. However, human feedback is not this simplistic in practice: real human judgments are shaped by cognitive biases, leading to systematic deviations from reward-consistent behavior that arise contextually. To address this, we treat rationality as context- and annotation-dependent. We design an approach to dynamically adjust the rationality parameter beta during reward learning using an LLM-as-judge to assess the likely presence of cognitive biases. This approach effectively downweights comparisons that are likely to reflect biased or unreliable judgments. Empirically, we show that this approach learns a more rational downstream model, even when finetuning on datasets with strongly biased preferences.
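
The abstract above describes a Boltzmann preference model with a rationality parameter beta. A minimal sketch of the usual form, P(a preferred over b) = sigmoid(beta * (r_a - r_b)), shows why lowering beta for a suspect comparison reduces its influence on reward learning. The function names are hypothetical, and the per-comparison beta values in the usage lines are made up for illustration; the paper obtains them from an LLM judge.

```python
import math


def preference_probability(reward_a: float, reward_b: float,
                           beta: float) -> float:
    """Boltzmann model: probability that response a is preferred over b."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))


def preference_nll(reward_a: float, reward_b: float, beta: float) -> float:
    """Negative log-likelihood of observing 'a preferred over b'."""
    return -math.log(preference_probability(reward_a, reward_b, beta))


def nll_gradient_wrt_margin(reward_a: float, reward_b: float,
                            beta: float) -> float:
    """d(NLL)/d(reward_a - reward_b) = -beta * (1 - p)."""
    p = preference_probability(reward_a, reward_b, beta)
    return -beta * (1.0 - p)


if __name__ == "__main__":
    # Same comparison, scored as reliable (beta = 1.0) or likely biased (0.1).
    for beta in (1.0, 0.1):
        print(beta, nll_gradient_wrt_margin(0.0, 0.0, beta))
```

Because the gradient scales with beta, a comparison assigned a small beta pulls the reward model only weakly, and at beta = 0 it contributes no gradient at all.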

2605.06892 2026-05-11 cs.CV

Not All Tokens Need 40 Steps: Heterogeneous Step Allocation in Diffusion Transformers for Efficient Video Generation

Ernie Chu, Vishal M. Patel

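AI summary: Diffusion Transformers (DiTs) achieve state-of-the-art quality in video generation, but they are computationally expensive because conventional inference applies the same number of denoising steps to every token in the sequence. This paper proposes Heterogeneous Step Allocation (HSA), a training-free inference algorithm that assigns different step budgets to spatiotemporal tokens according to their motion dynamics, improving efficiency. HSA introduces a KV-cache synchronization mechanism and a cached Euler update to achieve efficient inference while preserving global context, and it performs well on multiple video generation tasks, clearly outperforming existing methods particularly at high speedup ratios.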

Comments Project page: https://ernestchu.github.io/hsa

Abstract

Diffusion Transformers (DiTs) have achieved state-of-the-art video generation quality, but they incur immense computational cost because standard inference applies the same number of denoising steps uniformly to every token in the sequence. It is well known that human vision ignores vast amounts of redundant motion. Why, then, do our densest models treat every spatiotemporal token with equal priority? In this paper, we introduce Heterogeneous Step Allocation (HSA), a training-free inference algorithm that assigns varying step budgets to different spatiotemporal tokens based on their velocity dynamics. To resolve the resulting sequence-length mismatch without sacrificing global context, HSA introduces a KV-cache synchronization mechanism that allows active tokens to attend to the full sequence while entirely bypassing inactive tokens. Furthermore, we derive a cached Euler update that advances the latent states of skipped tokens in a single operation without additional model evaluations. We evaluate HSA on the Wan-2 and LTX-2 models for both text-to-video (T2V) and image-to-video (I2V) generation. Our results demonstrate that HSA significantly outperforms previous state-of-the-art caching methods and the vanilla Flow Matching baseline, especially at aggressive acceleration regimes (e.g., 50% and 25% runtimes). Crucially, HSA achieves a superior quality-runtime Pareto frontier without the need for expensive offline profiling, robustly preserving structural integrity and generation quality even under tight computational budgets. Project page: https://ernestchu.github.io/hsa

2605.06891 2026-05-11 cs.CV cs.LG

Towards Fairness under Label Bias in Image Segmentation: Impact, Measurement and Mitigation

Aditya Parikh, Stella Frank, Sneha Das, Aasa Feragen

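AI summary: This study examines the impact of label bias on fairness in image segmentation and proposes a method to detect and mitigate label bias without clean annotations. An adaptation of Confident Learning compares the model's confident predictions with the training labels to identify the direction and magnitude of label bias, which conventional overlap metrics such as the Dice coefficient cannot do. The study also finds that label bias affects subgroup separability in the encoder's feature space and exploits this property for bias mitigation. Experiments show that the framework effectively improves model fairness on a variety of datasets.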

Abstract

Labeled datasets reflect the biases of their annotation pipelines, which sometimes introduce label bias: group-conditional label errors that cause systematic performance disparities across demographic subgroups. Label bias in image segmentation remains underexplored, as even detecting it typically requires clean, unbiased annotations, which are not readily available. We present a data-centric adaptation of Confident Learning to segmentation, allowing detection of label bias directly in the training data without a clean, unbiased ground truth. By comparing the provided training labels to the model's confident predictions, we isolate directional errors that quantify the presence and nature of bias, where standard overlap metrics like Dice fail. We further show that label bias influences subgroup separability in the encoder's feature space, an artifact we leverage for bias mitigation rather than suppressing it. We evaluate three datasets, spanning from synthetic to real-life bias, showing how our framework reliably detects and mitigates bias without access to clean labels, achieving equitable performance across experimental conditions.

2605.06889 2026-05-11 cs.CV

TriDE: Triangle-Consistent Translation Directions for Global Camera Pose Estimation

Francisco Chen, Yiran Wang, Yunpeng Shi

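AI summary: This paper proposes TriDE, a method for estimating translation directions in global camera pose estimation. By using camera-triangle consistency as a higher-order verification signal, it addresses the problem that existing methods process pairwise translation directions independently, yielding directions that are locally plausible but globally inconsistent. TriDE corrects unreliable pairwise directions by passing messages between directions and their incident weighted triangles, and experiments show that it markedly improves direction accuracy and downstream camera pose estimation on real image graphs.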

Comments 32 pages, 6 figures

Abstract

Pairwise translation directions are a key input to camera location estimation in global structure-from-motion. Existing estimators usually process each image pair independently, producing directions that may be locally plausible but inconsistent with the other relative directions in the viewing graph. To estimate the directions jointly, we propose TriDE, which exploits camera-triangle consistency as an efficient higher-order verification signal. Instead of solving a costly global nonlinear optimization problem that is sensitive to initialization, TriDE refines unreliable pairwise directions through message passing between directions and their incident weighted triangles. This information propagation strategy enables us to establish a strong phase-transition bound for exact recovery under a realistic random corruption model. Experiments on real image graphs show that TriDE improves direction accuracy by a large margin and yields better downstream camera locations, providing a practical link between local pairwise estimation and global camera pose geometry.

2605.06886 2026-05-11 cs.CL

TajPersLexon: A Tajik-Persian Lexical Resource and Hybrid Model for Cross-Script Low-Resource NLP

Mullosharaf K. Arabov

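AI summary: This paper introduces TajPersLexon, a parallel lexical resource of 40,112 Tajik-Persian word and phrase pairs for cross-script low-resource NLP tasks such as lexical retrieval, transliteration, and alignment. The study compares three approaches: a lightweight hybrid pipeline, neural sequence-to-sequence models, and retrieval methods. The results indicate that the task is solvable in low-resource settings, with neural and retrieval methods reaching 98-99% top-1 accuracy. The authors further propose an interpretable hybrid model that reaches 96.4% accuracy on an OCR post-correction task, showing a good balance between accuracy and efficiency.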

Comments Published in The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family (SilkRoadNLP 2026), pages 29-37, Rabat, Morocco. Association for Computational Linguistics

Journal ref
Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family (SilkRoadNLP 2026), pages 29-37
Abstract

This work introduces TajPersLexon, a curated Tajik-Persian parallel lexical resource of 40,112 word and short-phrase pairs for cross-script lexical retrieval, transliteration, and alignment in low-resource settings. We conduct a comprehensive CPU-only benchmark comparing three methodological families: (i) a lightweight hybrid pipeline, (ii) neural sequence-to-sequence models, and (iii) retrieval methods. Our evaluation establishes that the task is essentially solvable, with neural and retrieval baselines achieving 98-99% top-1 accuracy. Crucially, we demonstrate that while large multilingual sentence transformers fail on this exact lexical matching, our interpretable hybrid model offers a favorable accuracy-efficiency trade-off for practical applications, achieving 96.4% accuracy in an OCR post-correction task. All experiments use fixed random seeds for full reproducibility. The dataset, code, and models will be publicly released.

2605.06885 2026-05-11 cs.LG cs.AI

Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Fred Zhangzhi Peng, Alexis Fox, Anru R. Zhang, Alexander Tong

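AI summary: This paper studies how to adapt an autoregressive language model (AR-LM) into a diffusion language model (DLM), proposing a representation alignment method that does not require relearning language representations. By aligning the diffusion model's hidden states to those of the autoregressive model, without changing the model architecture or introducing adapters, the method substantially accelerates training and is especially effective when data is scarce. Experiments suggest that language representations can transfer across generation orders and that representation alignment offers a simple and effective way to train diffusion language models.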

Comments Code available at https://github.com/pengzhangzhi/Open-dLLM

Abstract

Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a representation alignment objective that adapts a bidirectional masked diffusion model to reuse representations from a pretrained AR model of identical architecture. Concretely, we align the hidden states of the DLM to the frozen AR model at every layer using cosine similarity, while optimizing the standard masked denoising objective. This simple alignment, with no adapters and no architectural changes beyond the attention mask, yields up to 4x training acceleration in our setting and is particularly effective in low-data regimes. Our results suggest that linguistic representations can transfer across generation order, and that representation alignment provides a simple and effective technique for training diffusion language models. Code is available at https://github.com/pengzhangzhi/Open-dLLM.
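
The abstract above aligns the DLM's hidden states to a frozen AR model at every layer using cosine similarity. One plausible reading, sketched below in plain Python, averages 1 - cos(h_dlm, h_ar) over all layers and token positions. The averaging scheme, the epsilon guard, and the names `cosine_similarity` and `alignment_loss` are assumptions; the abstract does not spell out the exact loss or how it is weighted against the masked denoising objective.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def alignment_loss(dlm_hidden: List[List[Vector]],
                   ar_hidden: List[List[Vector]]) -> float:
    """Mean of (1 - cosine similarity) over layers and token positions.

    Both arguments are indexed as [layer][token] -> hidden vector; the AR
    states are treated as fixed targets (the AR model is frozen).
    """
    total, count = 0.0, 0
    for dlm_layer, ar_layer in zip(dlm_hidden, ar_hidden):
        for h_dlm, h_ar in zip(dlm_layer, ar_layer):
            total += 1.0 - cosine_similarity(h_dlm, h_ar)
            count += 1
    if count == 0:
        raise ValueError("no hidden states to align")
    return total / count


if __name__ == "__main__":
    ar = [[[1.0, 0.0], [0.0, 1.0]]]    # one layer, two tokens
    dlm = [[[1.0, 0.0], [1.0, 0.0]]]   # second token is misaligned
    print(alignment_loss(dlm, ar))
```

In training, a term like this would be added to the standard masked denoising loss, with gradients flowing only into the DLM.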

2605.06882 2026-05-11 cs.AI

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

Chun Zheng, Lianlong Wu, Bingqian Li, Lvting Liu, Yi Zhou

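AI summary: This paper evaluates large language models (LLMs) on the simplest long-chain reasoning task, the Equivalence Class Problem (ECP), which asks whether two variables are equal given a set of equivalence relations. The study compares reasoning and non-reasoning LLMs across factors such as the number of variables and the connectivity probability, finding that non-reasoning models perform poorly on ECP, while reasoning models are significantly better but still struggle to solve the problem completely. It also finds that non-reasoning models perform worst when the connectivity probability is near the critical point, whereas reasoning models are most challenged when the graph diameter is largest, revealing different sources of difficulty for the two kinds of models.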

Comments 9 pages, 5 figures

Abstract

Large Language Models (LLMs) have achieved great improvements in recent years. Nevertheless, it remains unclear how well LLMs perform on reasoning tasks, especially long-chain ones. In this paper, we evaluate LLMs' performance on the simplest yet long-chain reasoning task, namely the Equivalence Class Problem (ECP), i.e., determining whether two variables are equal given a set of randomly generated equivalence relations. We consider both reasoning and non-reasoning representative LLMs over a large variety of problem instances, ranging over different numbers of variables, connectivity probabilities, prompts, and other factors. The experimental results show that non-reasoning LLMs fail ECP, while reasoning models are significantly better but still struggle to completely solve this problem. Interestingly, considering various connectivity probabilities with a fixed number of variables, we observe that, for non-reasoning models, the hardest problem instances coincide with the phase transition point of ln n/(n-1), suggesting the chaos of the problem; in contrast, for reasoning models, the hardest ones coincide with the biggest diameter, suggesting the reasoning difficulty of the problem.
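
For reference, the Equivalence Class Problem described above has a simple exact solution with a union-find structure. The sketch below generates random equality relations, including each pair independently with a connectivity probability p, and answers an equality query. The generator is one plausible reading of "randomly generated equivalence relations"; the function names and the instance format are assumptions, not the paper's benchmark code.

```python
import random
from typing import List, Tuple


def find(parent: List[int], x: int) -> int:
    # Walk to the root, halving the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def are_equal(n: int, relations: List[Tuple[int, int]],
              a: int, b: int) -> bool:
    """Decide whether variables a and b are forced equal by the relations."""
    parent = list(range(n))
    for u, v in relations:
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two equivalence classes
    return find(parent, a) == find(parent, b)


def random_relations(n: int, p: float, seed: int) -> List[Tuple[int, int]]:
    # Include each unordered pair (i, j) independently with probability p.
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


if __name__ == "__main__":
    relations = random_relations(n=20, p=0.15, seed=0)
    print(len(relations), are_equal(20, relations, 0, 19))
```

A solver like this runs in near-linear time, which is what makes the task a clean probe: any difficulty an LLM shows comes from chaining many equalities, not from the problem itself.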

2605.06879 2026-05-11 cs.LG q-bio.QM

Better Protein Function Prediction by Modeling Survivorship Bias

Zhongmou Chao, Poompol Buathong, Ekaterina Selivanovitch, Susan Daniel, Peter I. Frazier

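AI summary This study addresses survivorship bias caused by natural selection in protein function prediction and proposes Evo-PU, an evolution-informed positive-unlabeled (PU) learning framework. By modeling how observable a sequence is under the evolutionary process, the method distinguishes sequences that are unobserved because they are non-functional from sequences that are absent because their mutational paths are rare. On three single-organism prediction tasks, Evo-PU outperforms standard PU learning, one-class classification, and protein language models; on multi-organism ProteinGym datasets, the authors identify opportunities to generalize the approach.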

Comments 29 pages, 12 figures, 3 tables

English abstract

Protein sequence data from nature exhibits survivorship bias: we only observe data from those organisms that survive and reproduce, while non-functional protein mutations are eliminated by natural selection. Thus, predicting whether a protein sequence is functional often requires learning from positive examples alone. While positive-unlabeled (PU) learning frameworks offer a generic solution to this problem, existing PU methods ignore the evolutionary processes that shape sequence observability and cause survivorship bias. Consider a sequence that is one mutation away from a commonly-observed protein variant in a well-surveilled organism. If the sequence were functional, it would likely be observed. If it is not observed, this suggests non-functionality. In contrast, sequences that are unlikely to arise through mutation may be missing simply because they never arose. Thus, these two kinds of missing sequences should be treated differently when training models. In this work, we propose Evo-PU, a PU learning framework that uses a scientific understanding of nucleotide mutation to model survivorship bias for well-surveilled single-organism sequence data. On three prediction tasks using single-organism uniform-coverage surveillance data -- predicting results from held-out influenza and respiratory syncytial virus (RSV) mutagenesis studies, and predicting future SARS-CoV-2 variants -- Evo-PU outperforms standard PU learning, one-class classification (OCC), and protein language models (PLMs). On prediction tasks from multi-organism ProteinGym datasets with more heterogeneous surveillance coverage, we identify opportunities to generalize our approach.

2605.06877 2026-05-11 cs.LG

Temporal Attention for Adaptive Control of Euler-Lagrange Systems with Unobservable Memory

Giansalvo Cirrincione, Adriano Fagiolini

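AI summary This paper studies adaptive friction compensation in Euler-Lagrange systems that have an unobservable internal state. To address the loss of convergence guarantees of conventional control methods when the measured state is non-Markovian, the authors propose a meta-control architecture based on self-attention that processes recent motion history to generate controller gains dynamically. Experiments show that the method clearly outperforms a deeper Transformer baseline in the short- and matched-memory regimes but becomes unstable in the long-memory regime, which motivates adjusting the number of attention heads dynamically during reinforcement learning.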

English abstract

Adaptive control of Euler-Lagrange systems is challenging when friction is governed by a finite-horizon internal state that is not directly observable from joint measurements. In this setting, the measured closed-loop state is no longer Markovian, and standard certainty-equivalence adaptive laws may lose their convergence guarantees. The paper proposes a meta-control architecture in which the gains of a computed-torque controller are generated by a self-attention block processing a short window of recent motion history. The number of attention heads is selected before policy training through a surrogate analysis of the autocovariance of the memory-state gradient along the temporal window. This surrogate is based on a temporal adaptation of an incremental rank-tracking framework previously developed by the authors. The selected head count is then fixed and used as an architectural hyperparameter in a reinforcement-learning stage, where the policy is trained under a shielded admissibility constraint. The approach is tested on a 2-DOF manipulator with nonlinear friction and variable payload. In the short and matched memory regimes, the single-layer attention-only meta-controller outperforms a deeper Transformer baseline, with tracking-error reductions of 12 and 19 percentage points, respectively. The reported effect sizes are large, with d approximately -1.1 and -2.1, and Mann-Whitney p < 0.05 in both cases. In the long memory regime, however, the advantage disappears. Four out of ten training runs show either divergence or payload-invariant policy collapse, revealing a weakness in the static Phase-1 head-count prescription. This motivates moving rank-tracking inside the reinforcement-learning loop, allowing attention heads to be pruned or grown at runtime instead of fixed before training.

2605.06876 2026-05-11 cs.CV

AdpSplit: Error-Driven Adaptive Splitting for Faster Geometry Discovery in 3D Gaussian Splatting

Yongjae Lee, Jingxing Li, Abhay Kumar Yadav, Rama Chellappa, Deliang Fan

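AI summary In 3D Gaussian Splatting (3DGS), adaptive density control typically grows the Gaussian population through fixed-cardinality random splitting to discover scene structure, but this requires many densification rounds and slows training. The paper proposes AdpSplit, an error-driven adaptive splitting method that decides the number of split children and initializes their parameters from L1-pixel-error region statistics, which reduces the number of densification iterations and speeds up training while preserving rendering quality. Experiments show that AdpSplit reduces the training time of accelerated 3DGS pipelines by 9.2% to 22.3% across multiple datasets.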

English abstract

Adaptive density control in 3D Gaussian Splatting (3DGS) repeatedly grows the Gaussian population through fixed-cardinality random splitting to discover useful scene structure. However, the binary split operator of vanilla 3DGS requires many densification rounds to expose fine details, making it a bottleneck for efficient training schedules with fewer iterations. We introduce AdpSplit, an error-driven adaptive split operator that determines the number of split children and initializes the child parameters from L1-pixel-error region statistics, enabling fewer densification iterations, and thus reduced training time, while preserving the rendering quality of full-schedule training. Across the MipNeRF360, Deep-Blending, and Tanks&Temples datasets, AdpSplit reduces the training time of multiple accelerated 3DGS pipelines by 9.2%-22.3% as a simple drop-in replacement for the standard split operator. With FastGS, AdpSplit matches the full-schedule PSNR on MipNeRF360 while reducing training time by 16.4%, corresponding to a 12.6x acceleration over vanilla 3DGS.

2605.06874 2026-05-11 cs.LG

On the Divergence of Differential Temporal Difference Learning without Local Clocks

David Antrobius, Shangtong Zhang

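AI summary This paper studies the convergence of differential temporal difference (TD) learning without local clocks in average-reward reinforcement learning. By constructing a counterexample, the authors show that in the average-reward setting differential TD learning can diverge with a global clock even though it converges with a local clock, in contrast to the discounted setting. The result closes an open problem posed by Wan et al. and Blaser et al. and offers insight into how the choice of clock affects the convergence of TD methods.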

English abstract

The learning rate is a critical component of reinforcement learning (RL). This work uses global and local clocks to distinguish two types of learning rates. The former is of the standard form $α_t$ that depends only on the time step $t$ (i.e., a global clock). The latter is of the form $α_{ν(S_t, t)}$, where $ν(s, t)$ counts the number of visits to state $s$ until time $t$ (i.e., a local clock). In discounted RL, an RL algorithm that is convergent with a local clock is always also convergent with a global clock, and vice versa. We are not aware of any counterexample. The key contribution of this work is to show that this nice correspondence breaks down in average-reward RL. Specifically, we construct a counterexample showing that although differential temporal difference learning is convergent with a local clock, it can diverge with a global clock. This counterexample closes the open problem in Wan et al. [2021] and Blaser et al. [2026].
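
The difference between the two clocks can be made concrete with a small sketch that computes both step-size sequences along one trajectory. Two assumptions are made here for illustration: the schedule is `alpha_k = 1 / k`, and the visit count `nu(s, t)` includes the visit at time `t` itself. Neither choice is taken from the paper.

```python
from collections import defaultdict


def step_size(k):
    """An assumed diminishing schedule alpha_k = 1 / k, for k >= 1."""
    return 1.0 / k


def clocked_step_sizes(states):
    """Return (global_alphas, local_alphas) for each time step of a trajectory."""
    visits = defaultdict(int)
    global_alphas, local_alphas = [], []
    for t, s in enumerate(states, start=1):
        visits[s] += 1                             # nu(s, t): visits to s so far
        global_alphas.append(step_size(t))         # alpha_t (global clock)
        local_alphas.append(step_size(visits[s]))  # alpha_{nu(S_t, t)} (local clock)
    return global_alphas, local_alphas


global_alphas, local_alphas = clocked_step_sizes(["a", "a", "b", "a", "b"])
print(global_alphas)
print(local_alphas)
```

With a global clock, a rarely visited state receives a small step size on its first update simply because time has passed; with a local clock, each state's step size depends only on how often that state has been updated.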

2605.06868 2026-05-11 cs.LG math.OC

When Descent Is Too Stable: Event-Triggered Hamiltonian Learning to Optimize

Yi Wang, Chandrajit Bajaj

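AI summary This paper studies a failure mode of fixed-budget nonconvex optimization in which local descent is too stable: after reaching a nearby local minimum, the optimizer may exhaust its budget without further improvement. The authors propose SHAPE, a structured adaptive port-Hamiltonian optimizer that makes dynamic optimization decisions in an augmented phase space using gradient information. The method triggers event updates when local equilibria are detected, which preserves a passivity-compatible structure while improving performance; experiments show that it outperforms fixed-policy optimizers on fixed-budget tasks.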

English abstract

Fixed-budget nonconvex optimization can fail not because local descent is unstable, but because it is too stable: after reaching a nearby stationary point, an optimizer may spend the remaining evaluations refining an uninformative local minimum. We formulate this failure mode as a control problem over optimizer dynamics, where the learner must decide when to descend, when to exploit a promising basin, and when stagnation should trigger movement elsewhere. We introduce SHAPE, a structured adaptive port-Hamiltonian task-family optimizer for event-triggered minima hunting under local information. Starting from gradient-descent dynamics, SHAPE lifts optimization to an augmented phase space $(q, p)$, where the primal state $q$ represents the candidate solution, the cotangent variable $p$ carries directional sensitivity, and a controller $u$ provides processed information from the current gradient oracle. Within each stage, a learned Hamiltonian vector field induces structured local descent; across stages, a fixed event clock in the implementation updates ports and memory when local equilibria are detected, with stage-dependent horizons treated in the analysis as a direct generalization. This design preserves a passivity-compatible structure while allowing the same trained policy to use clean, stochastic, or estimated gradient inputs. Experiments on fixed-budget nonconvex optimization tasks show that SHAPE improves best-so-far performance compared with fixed-policy optimizers. These results suggest that adaptive Hamiltonian energy shaping provides a principled mechanism for balancing descent, exploration, and budget allocation in difficult optimization landscapes.

2605.06866 2026-05-11 cs.LG math.OC

A Finite-Iteration Theory for Asynchronous Categorical Distributional Temporal-Difference Learning

Ege C. Kaya, Abolfazl Hashemi

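AI summary This paper develops a finite-iteration theory for asynchronous categorical distributional temporal-difference learning, closing a key gap between existing theory and practical algorithms. For two categorical policy-evaluation methods, one in the Cramér geometry and one in the maximum mean discrepancy geometry, the authors analyze convergence under asynchronous single-state updates. After suitable isometric embeddings, both methods become stochastic-approximation recursions that contract in a statewise supremum norm, which yields finite-iteration guarantees for discounted problems under i.i.d. and Markovian sampling and for undiscounted fixed-horizon problems under i.i.d. episodic sampling.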

Comments 53 pages

English abstract

Recent non-asymptotic analyses have substantially advanced the theory of distributional policy evaluation, but they largely concern synchronous full-state updates under a generative model, model-based estimators, accelerated variants, or different approximation architectures. Standard categorical temporal-difference learning is typically used in a different regime. It asynchronously performs a single-state update at each iteration and, in online settings, is driven by a Markovian trajectory. This leaves an important gap between existing finite-iteration theory and the categorical recursions most closely aligned with practical distributional temporal-difference implementations. We bridge this gap for two categorical policy-evaluation methods: scalar categorical temporal-difference learning in the Cramér geometry and multivariate signed-categorical temporal-difference learning in the maximum mean discrepancy geometry. After suitable isometric embeddings, both algorithms take the form of asynchronous single-state stochastic-approximation recursions that contract in a statewise supremum norm. This permits finite-iteration guarantees in discounted problems under both i.i.d. and Markovian state sampling, and in undiscounted fixed-horizon problems under i.i.d. episodic sampling.

2605.06865 2026-05-11 cs.LG

Dataset Watermarking for Closed LLMs with Provable Detection

Pengrun Huang, Kamalika Chaudhuri, Yu-Xiang Wang

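AI summary This paper studies how to design detectable dataset watermarks for closed large language models (LLMs), in order to determine whether a model was trained on a particular dataset. The authors embed a dataset-level watermark by increasing the co-occurrence frequency of randomly selected word pairs and detect it with a statistical test on model-generated text. Experiments show that the watermark is reliably detected in the fine-tuning stage and remains effective in data-mixture settings, without harming the utility or semantic integrity of the benchmark datasets.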

English abstract

Large language models (LLMs) are pre-trained and post-trained on vast amounts of loosely curated data, raising the possibility that these models may have been trained on proprietary datasets or the same benchmarks used for evaluation. This motivates the need for dataset watermarking: designing datasets such that training on them leaves detectable signatures in the resulting model. Prior work has explored this problem for open models. We introduce the first dataset watermarking method for closed LLMs with provable detection. In particular, we embed a dataset-level watermark signal by increasing the co-occurrence frequency of randomly selected word pairs through rephrasing, and detect it using a statistical test on co-occurrence patterns in model-generated outputs. We evaluate our method with multiple base models and benchmark datasets and show that it reliably detects the watermark ($p <0.01$) in the fine-tuning stage. Notably, our method remains effective in a data mixture setting where the watermarked dataset constitutes only approximately $1\%$ of the total fine-tuning tokens. Furthermore, we show that our method preserves the utility and semantic integrity of the benchmark.
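
The abstract does not specify which statistical test is used. As one simple illustration of testing co-occurrence patterns in generated text, the sketch below counts outputs that contain both words of a chosen pair and computes a one-sided binomial p-value against a baseline rate. The test choice, the whitespace tokenization, the function names, and any baseline rate supplied by the caller are all assumptions, not the paper's procedure.

```python
import math


def cooccurrence_count(texts, pair):
    """Number of texts that contain both words of the pair (lowercased tokens)."""
    a, b = pair
    count = 0
    for text in texts:
        tokens = set(text.lower().split())
        if a in tokens and b in tokens:
            count += 1
    return count


def binomial_upper_tail(k, n, p0):
    """P[X >= k] for X ~ Binomial(n, p0): the one-sided p-value under the baseline."""
    return sum(
        math.comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
        for i in range(k, n + 1)
    )


def watermark_p_value(texts, pair, baseline_rate):
    """p-value for 'this pair co-occurs more often than baseline_rate would predict'.

    baseline_rate is a hypothetical input: the co-occurrence rate expected
    from a model that never saw the watermarked dataset.
    """
    k = cooccurrence_count(texts, pair)
    return binomial_upper_tail(k, len(texts), baseline_rate)
```

A small p-value would indicate that the pair co-occurs more often than the baseline explains; a practical detector would also need to combine evidence across many pairs and control for multiple testing.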

2605.06864 2026-05-11 cs.LG

Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization

John Wang, Mengfan Xu

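AI summary This paper studies multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents communicate over time-varying graphs and observe heterogeneous reward vectors. To address the dual goals of efficient learning and fairness, the authors propose two algorithms: an exploration strategy based on Pareto regret, and a method that optimizes Nash Social Welfare. Experiments show that the proposed methods outperform existing baselines, with improvements of approximately 100% and 50% in the efficiency and fairness settings, respectively.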

English abstract

We study multi-objective multi-agent multi-armed bandits (MO-MA-MAB) under stochastic rewards, where agents observe heterogeneous reward vectors and communicate over time-varying graphs. We formulate this emerging problem setting to address \emph{efficient learning}, measured by Pareto regret, and incorporate \emph{fair learning} as an additional goal, captured via social welfare. To measure efficiency, we formulate Pareto regret and develop \textsc{Pareto UCB1 Gossip}, whose novel exploration radius explicitly separates statistical uncertainty in Pareto-based inference from consensus error. To express the fairness constraint, we formulate a Nash Social Welfare objective over preference-scalarized rewards and propose \textsc{Simulated NSW UCB Gossip}, which integrates preference-based reward simulation, gossip-based utility estimation, and UCB-style exploration. We prove that \textsc{Pareto UCB1 Gossip} achieves \(\mathcal{O}(\log T)\) regret and an instance-independent rate of \(\mathcal{O}(\sqrt{T})\), while \textsc{Simulated NSW UCB Gossip} achieves an instance-independent regret bound of \(\mathcal{O}(T^{3/4})\). This separation reveals the cost of imposing the fairness constraint to our efficiency objective: fairness limits information aggregation and slows convergence. Experiments show that our methods consistently outperform baselines, improving performance by approximately \(100\%\) and \(50\%\) in the efficiency and fairness settings, respectively.
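
Two standard ingredients named in this abstract can be shown in isolation: the Nash Social Welfare of a utility profile (the product of the agents' utilities) and the classical single-agent UCB1 index. The sketch below is textbook material only. It does not reproduce the paper's exploration radius, gossip-based estimation, or preference scalarization, and the function names are hypothetical.

```python
import math


def nash_social_welfare(utilities):
    """Product of nonnegative agent utilities; rewards balanced outcomes."""
    return math.prod(utilities)


def ucb1_index(mean_reward, pulls, t):
    """Classical UCB1 index: empirical mean plus an exploration bonus."""
    return mean_reward + math.sqrt(2.0 * math.log(t) / pulls)


# Two utility profiles with the same total (1.0) but different balance.
balanced = nash_social_welfare([0.5, 0.5])
skewed = nash_social_welfare([0.9, 0.1])
print(balanced > skewed)

# The exploration bonus shrinks as an arm is pulled more often.
print(ucb1_index(0.5, 10, 100) > ucb1_index(0.5, 50, 100))
```

Because the product collapses toward zero when any single agent's utility is small, maximizing it favors allocations that no agent finds very poor, which is why it serves as a fairness objective.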

2605.06863 2026-05-11 cs.RO cs.HC

Bi3: A Biplatform, Bicultural, Biperson Dataset for Social Robot Navigation

Andrew Stratton, Phani Teja Singamaneni, Pranav Goyal, Rachid Alami, Christoforos Mavrogiannis

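AI summary This paper presents Bi3, a dataset for studying how social robots navigate among people in a constrained lab space. Through an original experiment design, the dataset records close navigation encounters between a robot and two humans, and includes multiple navigation algorithms, two robot platforms, and multimodal data from 74 participants in the USA and France. Bi3 shows distinctive diversity and modeling complexity on metrics such as interaction density and human velocity, which makes it a resource for understanding human-robot coordination and for training robot navigation models in dense environments.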

Comments ICRA 2026

English abstract

We contribute Bi3, a dataset of social robot navigation among groups of people in a constrained lab space. Compared to prior data collection efforts for social robot navigation, our dataset is unique in that it features: an original experiment design giving rise to close navigation encounters between two humans and a robot; five different navigation algorithms; two different robot platforms; a diverse participant pool of 74 people recruited from two sites in the USA and France; multimodal data streams including 10.5 hours of human and robot ground-truth motion tracks, RGB video, and user impressions over robot performance. Our analysis of the collected dataset through metrics like interaction density and human velocity suggests that Bi3 represents a benchmark of unique diversity and modeling complexity. Bi3 contributes towards understanding how humans and robots can productively mesh their activities in constrained environments, and can be a resource for training models of human motion prediction and robot control policies for navigation in densely crowded spaces.

2605.06861 2026-05-11 cs.LG cs.NA math.NA

Christoffel-DPS: Optimal sensor placement in diffusion posterior sampling for arbitrary distributions

James Rowbottom, Nick Huang, Carola-Bibiane Schönlieb, Ben Adcock

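AI summary This paper studies how to design optimal sensor placement for arbitrary distributions within the diffusion posterior sampling (DPS) framework. Classical methods rely on Gaussian assumptions and struggle with complex distributions, while existing generative-model-guided sensor-selection methods either require many sensors or emulate classical approaches, which leaves them mismatched with modern recovery models. The authors propose Christoffel-DPS, a distribution-free sensor placement framework based on the Christoffel function that provides sampling strategies with theoretical guarantees for arbitrary signal distributions and outperforms existing methods on a range of non-Gaussian benchmarks.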

English abstract

State estimation is a critical task in scientific, engineering and control applications. Since the reliability of reconstructions depends on the number and position of sensors, optimal sensor placement (OSP) is essential in scenarios where measurements are sparse and expensive. Classical OSP approaches rely on Gaussian assumptions and are consequently unable to account for the complex distributions encountered in many real-world systems. Generative-model-based reconstruction using sensor-guided diffusion posterior sampling (DPS) has emerged as a promising technique for reconstructing states from highly complex distributions. However, existing sensor-selection methods either require unrealistically many sensors or emulate classical OSP, creating a mismatch between modern recovery models and classical OSP tools. This motivates the need for fundamentally new ideas towards OSP that match the recent advances made in powerful recovery models. We introduce a distribution-free sensor placement framework based on the Christoffel function: a mathematical formulation of optimal sampling and recovery guarantees for posterior sampling with arbitrary sensors and signal distributions, from which we derive a new OSP strategy with non-asymptotic bounds on the number of sensors needed for recovery. We develop Christoffel-DPS, with offline and online variants, instantiating Christoffel sampling for generative models. Christoffel-DPS outperforms Gaussian OSP baselines and existing generative-model placement methods, validating that distribution-free sensing is both theoretically principled and practically superior. The framework is model-agnostic; we demonstrate its application to a range of unconditional DPS and flow-matching models on structurally non-Gaussian benchmarks, showing the efficacy of Christoffel-DPS in low sensor budget regimes.

2605.06859 2026-05-11 cs.CV cs.AI cs.LG

Knowledge Transfer Scaling Laws for 3D Medical Imaging

Ho Hin Lee, Dongna Du, Chu Wang, Yuankai Huo, Shi Gu, James C. Gee, Yifan Wu

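AI summary This study examines scaling laws for knowledge transfer in 3D medical imaging. It finds that different imaging modalities (such as CT, MRI, and PET) scale at different rates during pretraining and that knowledge transfer between them is strongly asymmetric. Based on this observation, the study formulates data allocation as a scaling-law optimization problem and uncovers a hub-and-island structure: highly transferable modalities act as hubs that substantially benefit others, while isolated modalities require direct investment. Experiments show that transfer-aware data allocation outperforms conventional data-proportional sampling both in pretraining and on downstream clinical tasks.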

Comments 20 Pages

English abstract

Vision foundation models are increasingly moving beyond 2D to volumetric domains such as 3D medical imaging, where unified pretraining across different imaging modalities (i.e. CT, MRI, and PET) could provide foundational models for diverse clinical tasks. However, training such models requires mixing heterogeneous imaging domains, and current mixture strategies remain largely heuristic. In this work, we observe that different medical imaging domains scale at variable rates during pretraining, and knowledge transfer between domains is strongly asymmetric: training on one domain can substantially improve another, but the reverse may be much weaker. Interestingly, both MAE reconstruction loss and cross-domain transfer follow predictable power-law trends with domain-specific behaviors. Motivated by these findings, we formulate data allocation as a scaling-law optimization problem. The derived allocations reveal an interpretable hub-and-island structure: highly transferable domains emerge as hubs that benefit many others and deserve strategic allocation, while isolated domains act as islands requiring direct investment. Empirically, transfer-aware allocation outperforms data-proportional sampling by up to 58% and generalizes well to unseen budgets with r=0.989. Downstream validation on disease classification and organ/lesion segmentation further confirms that the derived transfer-aware mixtures provide stronger pretrained representations for clinical 3D medical imaging tasks.
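
The "predictable power-law trends" mentioned in this abstract are typically estimated by a linear fit in log-log space. The sketch below fits `y = a * x**(-b)` by ordinary least squares on synthetic data. The functional form (with no irreducible-loss offset), the function name, and the data points are assumptions for illustration and are not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b) in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    slope = (
        sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
        / sum((u - mean_x) ** 2 for u in log_x)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # log y = log a - b * log x


# Synthetic losses generated from a known power law (a = 2.0, b = 0.3).
data_sizes = [1e3, 1e4, 1e5, 1e6]
losses = [2.0 * x ** -0.3 for x in data_sizes]
a, b = fit_power_law(data_sizes, losses)
print(round(a, 6), round(b, 6))
```

Once such curves are fitted per domain, allocating a fixed data budget across domains becomes an optimization over the fitted parameters, which is the kind of problem the abstract describes.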

2605.06850 2026-05-11 cs.LG cs.AI

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment

Rui Zhu, Weiheng Bai, Qiushi Wu, Yang Ren, Haixu Tang, Yuchu Liu

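AI summary This paper studies how to compress the key-value (KV) cache efficiently during reinforcement learning (RL) post-training, to relieve the memory bottleneck caused by the large KV cache footprint in long-context reasoning tasks. The authors propose Shadow Mask Distillation, which introduces a shadow-mask distillation mechanism to reduce memory consumption while preserving the policy's ability to explore. The method mitigates the policy shift caused by compression and improves the stability and efficiency of RL training, offering a new approach to efficient fine-tuning of large language models.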

English abstract

Reinforcement Learning (RL) has emerged as a crucial paradigm for unlocking the advanced reasoning capabilities of Large Language Models (LLMs), encompassing frameworks like RLHF and RLAIF. Regardless of the specific optimization algorithm (e.g., PPO, GRPO, or Online DPO), online RL inherently requires an exploratory trajectory generation (rollout) phase. However, for long-context reasoning tasks, this rollout phase imposes a severe ``memory wall'' due to the exorbitant Key-Value (KV) cache footprint. While applying KV cache compression during rollouts mitigates this memory overhead, it induces a critical off-policy bias. Although modern KV compression is often nearly lossless during standard inference, even minuscule approximation errors are drastically amplified by the inherent instability of RL optimization. Specifically, the sampler generates responses under a sparse context, whereas the learner updates parameters using the full, dense context. Existing statistical solutions, such as importance reweighting, struggle to correct this magnified bias, suffering from high gradient variance and severe sample inefficiency.