arXivDaily: arXiv daily academic digest, updated Monday through Friday
2512.01089 2026-05-18 cs.AI

CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents

Peter Jansen, Samiah Hassan, Pragnya Narasimha

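AI summary CodeDistiller is a system that automatically distills high-quality code libraries from scientific GitHub repositories, with the aim of strengthening the code-generation capabilities of scientific coding agents. Combining automatic evaluation with domain-expert review, it produces runnable code examples for domains such as materials science, markedly improving the experimental accuracy and scientific soundness of automated scientific discovery systems. Experiments show that a library generated by CodeDistiller lets agents produce more complete and more reliable experiment code, and the work also suggests that proxy metrics are a feasible way to evaluate scientific discovery systems at scale.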

Comments 8 pages, 3 figures, 3 tables. Accepted to ACL 2026 (Demo Track)

Abstract

Automated Scientific Discovery (ASD) systems can help automatically generate and run code-based experiments, but their capabilities are limited by the code they can reliably generate from parametric knowledge alone. As a result, current systems either mutate a small number of manually-crafted experiment examples, or operate solely from parametric knowledge, limiting quality and reach. We introduce CodeDistiller, a system that automatically distills large collections of scientific GitHub repositories into a vetted library of working domain-specific code examples, allowing ASD agents to expand their capabilities without manual effort. Using a combination of automatic and domain-expert evaluation on 250 materials science repositories, we find the best model is capable of producing functional examples for 74% of repositories, while our downstream evaluation shows an ASD agent augmented with a CodeDistiller-generated library produces more accurate, complete, and scientifically sound experiments than an agent with only general materials-science code examples. We also evaluate LLM-as-a-judge ratings against domain-expert ratings in an A/B testing paradigm, finding moderate agreement and suggesting that inexpensive proxy metrics may be feasible for evaluating scientific discovery systems at scale.

2512.00920 2026-05-18 cs.CL

Reward Auditor: Inference on Reward Modeling Suitability in Real-World Perturbed Scenarios

Jianxiang Zang, Yongda Wei, Ruxue Bai, Shiyu Jiang, Nijia Mo, Binhong Li, Qiang Sun, Hui Liu

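AI summary This work proposes Reward Auditor, a framework for assessing the suitability of reward models (RMs) in real-world perturbed scenarios, that is, their ability to remain reliable under specific conditions. Unlike conventional evaluations that look only at preference-perception accuracy, Reward Auditor uses statistical tests to analyze how an RM's confidence distribution degrades under different perturbations, determining whether systematic vulnerabilities exist and how severe they are. The method provides a theoretical foundation for building safer, more reliable, and more trustworthy LLM alignment systems.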

Abstract

Reliable reward models (RMs) are critical for ensuring the safe alignment of large language models (LLMs). However, current RM evaluation methods focus solely on preference perception accuracies in given specific scenarios, obscuring the critical vulnerabilities of RMs in real-world scenarios. We identify that the true challenge lies in assessing a novel dimension: Suitability, defined as conditional reliability under specific real-world perturbations. To this end, we introduce Reward Auditor, a hypothesis-testing framework specifically designed for RM suitability inference. Rather than answering "How accurate is the RM's preference perception for given samples?", it employs scientific auditing to answer: "Can we infer that RMs exhibit systematic vulnerabilities in specific real-world scenarios?". Under real-world perturbed scenarios, Reward Auditor quantifies statistical significance and effect size by auditing distribution degradation of RM preference perception confidence. This enables inference of both the certainty and severity of RM vulnerabilities across diverse real-world scenarios. This lays a solid foundation for building next-generation LLM alignment systems that are verifiably safe, more robust, and trustworthy.
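The abstract does not say which tests Reward Auditor uses, so the following is only a minimal sketch of the general idea of pairing a significance test with an effect size. It assumes a Bradley-Terry style confidence (the sigmoid of the reward gap), a paired sign-flip permutation test, and Cohen's d on the paired differences; all three choices, and every function name, are illustrative assumptions rather than the paper's procedure.

```python
import itertools
import math
import statistics


def preference_confidence(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style confidence that the chosen response is preferred (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))


def audit_degradation(clean_conf, perturbed_conf):
    """Return (effect_size, p_value) for the drop in confidence under perturbation.

    effect_size: Cohen's d on the paired differences (clean - perturbed).
    p_value: exact one-sided sign-flip permutation test; it enumerates all
    2**n sign patterns, so it is only practical for small n.
    """
    diffs = [c - p for c, p in zip(clean_conf, perturbed_conf)]
    observed = statistics.fmean(diffs)
    effect_size = observed / statistics.stdev(diffs)

    hits = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(diffs)):
        flipped = statistics.fmean([s * d for s, d in zip(signs, diffs)])
        if flipped >= observed - 1e-12:
            hits += 1
    return effect_size, hits / 2 ** len(diffs)


if __name__ == "__main__":
    # Made-up confidences on four preference pairs, before and after a perturbation.
    clean = [0.90, 0.80, 0.85, 0.95]
    perturbed = [0.60, 0.70, 0.65, 0.70]
    d, p = audit_degradation(clean, perturbed)
    print(f"effect size d = {d:.2f}, one-sided p = {p:.4f}")
```

Reporting both numbers mirrors the certainty/severity split in the abstract: the p-value speaks to whether the degradation is systematic, the effect size to how large it is.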

2512.00778 2026-05-18 cs.LG

What Is Preference Optimization Doing, and Why?

Yue Wang, Qizhou Wang, Zizhuo Zhang, Gang Niu, Bo Han, Masashi Sugiyama

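AI summary Preference optimization (PO) is essential for large language models (LLMs), and methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) have been highly successful. By analyzing the optimization dynamics of the two methods, this paper reveals how they differ in algorithmic behavior and target direction: DPO mainly follows stable targets, whereas PPO balances exploration and exploitation. It also examines the distinct roles that positive learning, negative learning, and loss reweighting, three key but often overlooked components, play in each method, and uses ablation studies to verify how these dynamics affect optimization efficiency and practical performance, offering insights for understanding and improving PO methods.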

Abstract

Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving great success. A common belief is that DPO is supervised learning while PPO is reinforcement learning, yet deeper analyses for the reasons underlying these differences remain lacking. To fill this gap, we analyze their optimization dynamics, revealing distinct algorithmic behaviors and comprehending their underlying causes. First, we examine the target directions of gradient-based updates and find that DPO follows stable targets, whereas PPO balances exploration and exploitation, validating the common belief yet from this new perspective. Second, we examine the roles of positive learning, negative learning, and loss reweighting, which are three key yet seldom discussed components within PO methods. Our analyses reveal that these components play fairly different roles. In DPO, positive and negative learning jointly shape the targets. However, loss reweighting in DPO acts less as a reward signal but more as a regularizer to mitigate overfitting. In PPO, negative learning primarily supports exploration rather than determining the targets. Meanwhile, loss reweighting, related to the absolute advantages, indicates the distinct roles of token groups in updating targets. Given these findings, we conduct carefully designed ablation studies to further examine how controlling these dynamics impacts optimization efficiency and practical performance. The insights gained from our analyses not only deepen the understanding of PO methods but also inspire the development of more preference-aligned LLMs.
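For reference, the standard DPO loss for a single preference pair, together with the scalar that multiplies its gradient, fits in a few lines. This is the textbook DPO formulation, not code from the paper; the function names and the example log-probabilities are made up for illustration. In the gradient, the bracketed term raises the chosen response (positive learning) and lowers the rejected one (negative learning), while the scalar weight is the loss-reweighting factor.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss_and_weight(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_w / logp_l: policy log-probabilities of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.

    The gradient of the loss is
        -weight * (grad logp_w - grad logp_l),   weight = beta * sigmoid(-margin),
    so the weight shrinks as the policy already separates the pair.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    loss = -math.log(sigmoid(margin))
    weight = beta * sigmoid(-margin)
    return loss, weight


if __name__ == "__main__":
    # Policy equal to the reference: zero margin, maximal uncertainty.
    print(dpo_loss_and_weight(-5.0, -6.0, -5.0, -6.0))
    # Policy already prefers the chosen response: smaller loss, smaller weight.
    print(dpo_loss_and_weight(-3.0, -9.0, -5.0, -6.0))
```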

2512.00242 2026-05-18 cs.LG cs.AI cs.ET stat.ML

Polynomial Neural Sheaf Diffusion: A Spectral Filtering Approach on Cellular Sheaves

Alessio Borgi, Fabrizio Silvestri, Pietro Liò

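AI summary This paper proposes Polynomial Neural Sheaf Diffusion (PolyNSD), a new method that improves the diffusion process of sheaf neural networks on graphs. By applying a degree-K polynomial propagation operator to the normalized sheaf Laplacian, it obtains a K-hop receptive field that is independent of the stalk dimension, with a trainable spectral response modeled as a convex mixture of orthogonal polynomial basis responses. Compared with conventional approaches, PolyNSD maintains stability while reducing compute and memory requirements, and sets new state-of-the-art results on homophilic and heterophilic graph benchmarks.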

Abstract

Sheaf Neural Networks equip graph structures with a cellular sheaf: a geometric structure which assigns local vector spaces (stalks) and learnable linear restriction/transport maps to nodes and edges, yielding an edge-aware inductive bias that handles heterophily and limits oversmoothing. However, common Neural Sheaf Diffusion implementations rely on SVD-based sheaf normalization and dense per-edge restriction maps, which scale with stalk dimension, require frequent Laplacian rebuilds, and yield brittle gradients. To address these limitations, we introduce Polynomial Neural Sheaf Diffusion (PolyNSD), a new sheaf diffusion approach whose propagation operator is a degree-K polynomial in a normalised sheaf Laplacian, evaluated via a stable three-term recurrence on a spectrally rescaled operator. This provides an explicit K-hop receptive field in a single layer (independently of the stalk dimension), with a trainable spectral response obtained as a convex mixture of K+1 orthogonal polynomial basis responses. PolyNSD enforces stability via convex mixtures, spectral rescaling, and residual/gated paths, reaching new state-of-the-art results on both homophilic and heterophilic benchmarks, inverting the Neural Sheaf Diffusion trend by obtaining these results with just diagonal restriction maps, decoupling performance from large stalk dimension, while reducing runtime and memory requirements.
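The propagation step described in the abstract can be sketched as follows. The abstract only says "orthogonal polynomial basis", so the Chebyshev basis used here is an assumption, as are the softmax parameterisation of the convex mixture and the default spectral bound of 2. The input is any symmetric normalised Laplacian given as a list of rows; with one-dimensional stalks and identity restriction maps a sheaf Laplacian reduces to the ordinary graph Laplacian, which is what the demo uses.

```python
import math


def matvec(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def softmax(theta):
    top = max(theta)
    exps = [math.exp(t - top) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def poly_diffuse(laplacian, x, theta, lam_max=2.0):
    """Apply sum_k w_k T_k(L_tilde) to x, with w = softmax(theta).

    L_tilde = (2 / lam_max) * L - I maps the spectrum into [-1, 1], and the
    Chebyshev terms follow T_k = 2 * L_tilde * T_{k-1} - T_{k-2}.
    len(theta) = K + 1, so one call reaches K hops.
    """
    n = len(x)
    l_tilde = [
        [(2.0 / lam_max) * laplacian[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    weights = softmax(theta)

    t_prev = list(x)  # T_0(L_tilde) x
    out = [weights[0] * v for v in t_prev]
    if len(weights) == 1:
        return out

    t_curr = matvec(l_tilde, x)  # T_1(L_tilde) x
    out = [o + weights[1] * v for o, v in zip(out, t_curr)]

    for w in weights[2:]:
        t_next = [2.0 * a - b for a, b in zip(matvec(l_tilde, t_curr), t_prev)]
        out = [o + w * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


if __name__ == "__main__":
    a = 1.0 / math.sqrt(2.0)
    # Normalised Laplacian of the 3-node path graph 0 - 1 - 2.
    path_laplacian = [[1.0, -a, 0.0], [-a, 1.0, -a], [0.0, -a, 1.0]]
    print(poly_diffuse(path_laplacian, [1.0, 0.0, 0.0], theta=[0.0, 0.0, 0.0]))
```

Because the mixture weights are non-negative and sum to one, and each Chebyshev term is bounded by 1 in magnitude on [-1, 1], the resulting spectral response cannot exceed 1 in magnitude, which is one way to read the stability claim.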

2511.19399 2026-05-18 cs.CL cs.AI cs.LG

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Rulin Shao, Akari Asai, Shannon Zejiang Shen, Hamish Ivison, Varsha Kishore, Jingming Zhuo, Xinran Zhao, Molly Park, Samuel G. Finlayson, David Sontag, Tyler Murray, Sewon Min, Pradeep Dasigi, Luca Soldaini, Faeze Brahman, Wen-tau Yih, Tongshuang Wu, Luke Zettlemoyer, Yoon Kim, Hannaneh Hajishirzi, Pang Wei Koh

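AI summary This paper presents DR Tulu-8B, a deep research model aimed at the weak performance of existing open deep research agents on long-form, multi-step research tasks. It introduces Reinforcement Learning with Evolving Rubrics (RLER), in which the rubrics co-evolve with the policy model during training, improving fact checking and feedback quality. DR Tulu-8B is the first fully open model trained directly for open-ended, long-form deep research; across benchmarks in science, healthcare, and general domains it substantially outperforms existing open models and matches or exceeds proprietary ones, at a much lower cost per query.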

Comments ICML 2026

Abstract

Deep research agents perform multi-step research to produce long-form, well-attributed answers. However, most open deep research agents are trained on easily verifiable short-form QA tasks via reinforcement learning with verifiable rewards, which does not extend to realistic long-form tasks. We address this with Reinforcement Learning with Evolving Rubrics (RLER), where rubrics are constructed and maintained to co-evolve with the policy model during training. This allows the rubrics to incorporate newly explored information from search and contrasting model responses, enabling better fact checking and more discriminative on-policy feedback. Using RLER, we develop Deep Research Tulu (DR Tulu-8B), the first fully open model that is directly trained for open-ended, long-form deep research. Across four long-form deep research benchmarks in science, healthcare, and general domains, DR Tulu substantially outperforms existing open deep research agents (by 15.6% over Tongyi DR on average) and matches or exceeds proprietary deep research agents (by 0.7% over OpenAI DR on average), while being significantly smaller and cheaper per query (1000x cheaper than OpenAI DR per query).

2511.19115 2026-05-18 cs.AI cs.CY

AI Consciousness and Existential Risk

Rufin VanRullen

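AI summary This paper examines the relationship between AI consciousness and existential risk, noting that the two are often conflated even though consciousness and intelligence are empirically and theoretically distinct properties. It argues that intelligence is a direct predictor of an AI system's existential threat, whereas consciousness does not directly pose a threat but may indirectly influence risk in certain scenarios. Making this distinction clear helps AI safety researchers and policymakers identify and address the core issues more accurately.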

Comments Updated for clarity and completeness following peer-review

Abstract

In AI, the existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicate humanity. This issue is gaining prominence in scientific debate due to recent technical advancements and increased media coverage. In parallel, AI progress has sparked speculation and studies about the potential emergence of artificial consciousness. The two questions, AI consciousness and existential risk, are sometimes conflated, as if the former entailed the latter. Here, I explain that this view stems from a common confusion between consciousness and intelligence. Yet these two properties are empirically and theoretically distinct. Arguably, while intelligence is a direct predictor of an AI system's existential threat, consciousness is not. There are, however, certain incidental scenarios in which consciousness could influence existential risk, in either direction. Consciousness could be viewed as a means towards AI alignment, thereby lowering existential risk; or, it could be a precondition for reaching certain capabilities or levels of intelligence, and thus positively related to existential risk. Recognizing these distinctions can help AI safety researchers and public policymakers focus on the most pressing issues.

2511.18719 2026-05-18 cs.CV

Seeing What Matters: Visual Preference Policy Optimization for Visual Generation

Ziqi Ni, Yuanzhi Liang, Rui Li, Yi Zhou, Haibin Huang, Chi Zhang, Xuelong Li

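AI summary This paper proposes Visual Preference Policy Optimization (ViPO), a method for improving the alignment of visual generative models with human preferences. Whereas existing methods rely on a single scalar reward, ViPO introduces a Perceptual Structuring Module that converts feedback into structured, pixel-level advantage maps, guiding optimization toward the key regions of the visual content at a finer granularity. The method performs well on both image and video generation, improving alignment with in-domain human-preference rewards and generalization to out-of-domain evaluations, and it is lightweight, general, and easy to integrate into existing training pipelines.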

Abstract

Reinforcement learning (RL) has become a powerful tool for post-training visual generative models, with Group Relative Policy Optimization (GRPO) increasingly used to align generators with human preferences. However, existing GRPO pipelines rely on a single scalar reward per sample, treating each image or video as a holistic entity and ignoring the rich spatial and temporal structure of visual content. This coarse supervision hinders the correction of localized artifacts and the modeling of fine-grained perceptual cues. We introduce Visual Preference Policy Optimization (ViPO), a GRPO variant that lifts scalar feedback into structured, pixel-level advantages. ViPO employs a Perceptual Structuring Module that uses pretrained vision backbones to construct spatially and temporally aware advantage maps, redistributing optimization pressure toward perceptually important regions while preserving the stability of standard GRPO. Across both image and video benchmarks, ViPO consistently outperforms vanilla GRPO, improving in-domain alignment with human-preference rewards and enhancing generalization on out-of-domain evaluations. The method is architecture-agnostic, lightweight, and fully compatible with existing GRPO training pipelines, providing a more expressive and informative learning signal for visual generation.
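To make the contrast concrete, here is a minimal sketch of the group-relative advantage used in GRPO (each sample's reward standardised within its group) followed by one hypothetical way to spread a sample's scalar advantage over pixels. The spreading rule, which weights pixels in proportion to a saliency score while keeping the mean advantage unchanged, is an assumption for illustration; the abstract does not specify how the Perceptual Structuring Module builds its maps, and the saliency values below are invented.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardise each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def spread_advantage(advantage, saliency):
    """Turn one scalar advantage into per-pixel advantages (hypothetical rule).

    Pixels receive weight proportional to their saliency, rescaled so that the
    mean of the per-pixel advantages equals the original scalar.
    """
    total = sum(saliency)
    count = len(saliency)
    return [advantage * count * s / total for s in saliency]


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]  # one reward per generated sample in the group
    advantages = group_relative_advantages(rewards)
    print(advantages)
    # A flattened 2x2 "image" whose last pixel is the most salient.
    print(spread_advantage(advantages[2], [0.5, 0.5, 1.0, 2.0]))
```

Keeping the mean fixed means the per-sample signal seen by the optimiser is the same as in plain GRPO; only its spatial distribution changes.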

2511.18127 2026-05-18 cs.CV

SFHand: Learning Embodied Manipulation by Streaming Egocentric 3D Hand Forecasting

Ruicong Liu, Yifei Huang, Liangyang Ouyang, Caixin Kang, Yoichi Sato

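AI summary SFHand is a framework for language-guided, real-time 3D hand forecasting, intended to improve human-computer interaction in settings such as augmented reality and assistive robotics. From a continuous video stream and language instructions, it autoregressively predicts multiple future hand states, including hand type, 2D bounding box, 3D pose, and trajectory, and it uses an ROI-enhanced memory layer to capture temporal context and salient hand regions. The work also introduces the EgoHaFL dataset; experiments show that SFHand substantially outperforms existing methods on 3D hand forecasting and raises task success rates on downstream manipulation tasks.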

Abstract

Real-time 3D hand forecasting is a critical component for fluid human-computer interaction in applications like AR and assistive robotics. However, existing methods are ill-suited for these scenarios, as they typically require offline access to accumulated video sequences and cannot incorporate language guidance that conveys task intent. To overcome these limitations, we introduce SFHand, the first streaming framework for language-guided 3D hand forecasting. SFHand autoregressively predicts a comprehensive set of future 3D hand states, including hand type, 2D bounding box, 3D pose, and trajectory, from a continuous stream of video and language instructions. Our framework combines a streaming autoregressive architecture with an ROI-enhanced memory layer, capturing temporal context while focusing on salient hand-centric regions. To enable this research, we also introduce EgoHaFL, the first large-scale dataset featuring synchronized 3D hand poses and language instructions. We demonstrate that SFHand achieves new state-of-the-art results in 3D hand forecasting, outperforming prior work by a significant margin of up to 35.8%. Furthermore, we show the practical utility of our learned representations by transferring them to downstream embodied manipulation tasks, improving task success rates by up to 13.4% on multiple benchmarks. Dataset page: https://huggingface.co/datasets/ut-vision/EgoHaFL, project page: https://github.com/ut-vision/SFHand.

2511.17426 2026-05-18 cs.LG cs.CV stat.ML

Self-Supervised Learning by Curvature Alignment

Benyamin Ghojogh, M. Hadi Sepanj, Paul Fieguth

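AI summary This paper proposes CurvSSL, a self-supervised learning method based on curvature alignment, and its kernel-space extension, kernel CurvSSL, which aim to improve representation learning by explicitly modeling the local geometry of the data manifold. The method adds a curvature regularizer to a conventional non-contrastive framework: it computes the local curvature of the embedded features and aligns and decorrelates it across augmented views, strengthening invariance and geometric consistency. Experiments on MNIST and CIFAR-10 show linear-evaluation performance that is competitive with or better than existing methods.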

Comments A shorter version of this paper has been published in: Journal of Computational Vision and Imaging Systems, Vol. 11, No. 1, Special Issue: Proceedings of CVIS 2025

Journal ref
A shorter version of this paper is published in Journal of Computational Vision and Imaging Systems, Vol. 11, No. 1, Special Issue: Proceedings of CVIS 2025
Abstract

Self-supervised learning (SSL) has recently advanced through non-contrastive methods that couple an invariance term with variance, covariance, or redundancy-reduction penalties. While such objectives shape first- and second-order statistics of the representation, they largely ignore the local geometry of the underlying data manifold. In this paper, we introduce CurvSSL, a curvature-regularized self-supervised learning framework, and its RKHS extension, kernel CurvSSL. Our approach retains a standard two-view encoder-projector architecture with a Barlow Twins-style redundancy-reduction loss on projected features, but augments it with a curvature-based regularizer. Each embedding is treated as a vertex whose $k$ nearest neighbors define a discrete curvature score via cosine interactions on the unit hypersphere; in the kernel variant, curvature is computed from a normalized local Gram matrix in an RKHS. These scores are aligned and decorrelated across augmentations by a Barlow-style loss on a curvature-derived matrix, encouraging both view invariance and consistency of local manifold bending. Experiments on MNIST and CIFAR-10 datasets with a ResNet-18 backbone show that curvature-regularized SSL yields competitive or improved linear evaluation performance compared to Barlow Twins and VICReg. Our results indicate that explicitly shaping local geometry is a simple and effective complement to purely statistical SSL regularizers.

2511.14282 2026-05-18 cs.LG cs.AI

Weight Concentration Regularization for Improving Pruning Robustness Under High Sparsity

Vincent-Daniel Yun, Junhyuk Jo, Sunwoo Lee

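AI summary Deep neural networks perform well on vision and language tasks, but their large parameter counts limit deployment in resource-constrained settings. To address this, the work proposes a Weight Concentration Regularizer (WCR) that, during training, amplifies the magnitude of a small subset of parameters while driving the rest toward zero, so that pruning mainly removes parameters that contribute little to the model's function, improving robustness at high sparsity. Experiments show that the method improves pruning robustness across tasks and architectures and is compatible with existing pruning-robust optimizers.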

Abstract

Deep neural networks achieve outstanding performance across vision and language tasks, yet their large parameter counts limit deployment in resource-constrained settings. One-shot pruning reduces model size without retraining, but models trained with standard objectives often suffer substantial accuracy drops under aggressive sparsity. Prior work mitigates this drop along two directions: regularizers such as $\ell_1$ and DeepHoyer that shape the weight distribution during training, and pruning-robust optimizers such as SAM, CrAM, and S$^2$SAM that flatten the loss landscape. However, existing regularizers either shrink all weights uniformly ($\ell_1$) or induce scale-invariant sparsity (DeepHoyer), without concentrating weight energy onto a small set of informative parameters. We propose a Weight Concentration Regularizer (WCR), a training-time regularizer that amplifies the magnitude of a small subset of parameters while driving the remainder toward zero, so that magnitude pruning predominantly removes parameters with negligible functional contribution. We provide a convergence analysis and evaluate WCR on LLM fine-tuning, image classification, and medical segmentation, demonstrating consistent improvements in pruning robustness across architectures and compatibility with existing pruning-robust optimizers.

2511.09884 2026-05-18 cs.AI

Quantum Artificial Intelligence for Mission-Critical Systems: Foundations, Architectural Elements, and Future Directions

Siva Sai, Rajkumar Buyya

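AI summary This paper explores the potential of Quantum Artificial Intelligence (QAI) in mission-critical systems such as defense, energy management, cybersecurity, and aerospace control, with the aim of addressing the shortcomings of classical AI in reliability, real-time behavior, explainability, and safety. It systematically analyzes how well QAI methods can meet the requirements of mission-critical systems, proposes a conceptual framework for quantum cloud resource management and scheduling, and identifies the gaps between current QAI techniques and practical requirements. It also discusses challenges such as trainability limits, data access, and component verification, and outlines future directions in interpretability, scalability, and hardware feasibility.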

Comments 15 pages, 5 figures, revised and accepted version of the paper

Abstract

Mission critical (MC) applications such as defense operations, energy management, cybersecurity, and aerospace control require reliable, deterministic, and low-latency decision making under uncertainty. Although the classical Artificial Intelligence (AI) approaches are effective, they often struggle to meet the stringent constraints of robustness, timing, explainability, and safety in the MC domains. Quantum Artificial Intelligence (QAI), the fusion of artificial intelligence and quantum computing (QC), can potentially provide transformative solutions to the challenges faced by classical ML models. QAI is a broader umbrella than Quantum Machine Learning (QML) and additionally includes quantum optimization, search, and reasoning; we use QAI throughout the paper for the field at large, and QML only for learning-specific subroutines. The principal contributions of this work are: (i) a systematic survey of QAI methods analyzed through the lens of MC requirements like certification, robustness, and timing; (ii) a conceptual quantum cloud resource management and scheduling framework with deployment assumptions, complexity analysis, and failure-mode discussion; and (iii) an identification of the gaps between current QAI capabilities and MC systems requirements. We also propose a conceptual model for management of quantum resources and scheduling of applications driven by timeliness constraints. We discuss multiple challenges, including trainability limits, data access, and loading bottlenecks, verification of quantum components, and adversarial QAI. Finally, we outline future research directions toward achieving interpretable, scalable, and hardware-feasible QAI models for MC application deployment.

2511.07720 2026-05-18 cs.RO

Empowering Robot Teleoperation: Exploring the Synergies Between Devices and Manipulator Controllers in a Comparative Study

Yuxuan Zhao, Yuanchen Tang, Jindi Zhang, Hongyu Yu

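AI summary This paper studies how teleoperation devices and controller strategies work together when collecting data for robot manipulation tasks. The authors compare position-based inverse kinematics control, torque-based inverse dynamics control, and optimization-based compliant control, and analyze how the pairing of device and controller affects performance on real tasks. The study highlights the importance of choosing device and controller jointly for improving autonomous robot manipulation.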

Abstract

Robot learning gives a robot system human-like intelligence, letting it autonomously acquire and adapt skills through experience and making it more flexible and adaptable across environments. For embodied intelligence to reach a level of capability similar to that of large language models (LLMs), data quality plays a crucial role in training a foundation model with diverse robot skills. In this study, we investigate the collection of data for manipulation tasks using teleoperation devices. Different devices yield varying effects when paired with corresponding controller strategies, including position-based inverse kinematics (IK) control, torque-based inverse dynamics (ID) control, and optimization-based compliant control. Analysis of experimental results suggests the importance of the relationship between teleoperation devices and controllers for real tasks.

2511.03260 2026-05-18 cs.CV

Enhancing Medical Image Segmentation via Heat Conduction Equation

Rong Wu, Yim-Sang Yu

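AI summary Addressing the difficulty of achieving efficient global context modeling and long-range dependency reasoning in medical image segmentation under limited compute, this paper proposes a hybrid architecture that combines U-Mamba with the heat conduction equation. The method places heat conduction operators in the bottleneck layers, simulating frequency-domain thermal diffusion to enhance semantic abstraction. Experiments show a Dice score of 0.8719 on the Abdomen CT dataset, supporting the method's effectiveness for medical image segmentation.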

Abstract

Medical image segmentation models struggle to achieve efficient global context modeling and long-range dependency reasoning under practical computational budgets. In this work, we propose a hybrid architecture utilizing U-Mamba with Heat Conduction Equation, which combines state-space modules for efficient long-range reasoning with Heat Conduction Operators (HCOs) in the bottleneck layers, simulating frequency-domain thermal diffusion for enhanced semantic abstraction. Experimental results show that our model attains the highest DSC (0.8719) on the Abdomen CT dataset. This suggests that blending state-space dynamics with heat-based global diffusion offers a scalable solution for medical segmentation tasks.
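As background on what "frequency-domain thermal diffusion" means, the heat equation u_t = kappa * u_xx has a closed-form solution in Fourier space: each frequency component is multiplied by exp(-kappa * omega^2 * t). The sketch below applies that to a 1-D periodic signal with a naive DFT. It illustrates the underlying equation only and is not the paper's Heat Conduction Operator, whose transform, dimensionality, and learnable parameters the abstract does not describe; the function names are made up.

```python
import cmath
import math


def dft(signal):
    n = len(signal)
    return [
        sum(signal[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
        for j in range(n)
    ]


def heat_diffuse(u0, kappa, t):
    """Solve the 1-D periodic heat equation for time t in the frequency domain."""
    n = len(u0)
    damped = []
    for k, coeff in enumerate(dft(u0)):
        freq = k if k <= n // 2 else k - n  # signed frequency index
        omega = 2.0 * math.pi * freq / n
        damped.append(coeff * math.exp(-kappa * omega ** 2 * t))
    return [value.real for value in idft(damped)]


if __name__ == "__main__":
    spike = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for t in (0.0, 1.0, 10.0):
        print(t, [round(v, 3) for v in heat_diffuse(spike, kappa=1.0, t=t)])
```

The zero-frequency component is never damped, so the signal's mean is preserved while local detail spreads globally; every output position depends on every input position after a single step, which is the global-context property the abstract appeals to.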

2511.02342 2026-05-18 cs.RO

Whole-body motion planning and safety-critical control for aerial manipulation

Lin Yang, Jinwoo Lee, Domenico Campolo, H. Jin Kim, Jeonghyun Byun

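AI summary This paper studies whole-body motion planning and safety-critical control for aerial manipulators. To address obstacle avoidance and dynamically feasible trajectory generation in cluttered environments, it proposes a planning and control framework based on superquadrics (SQs). Using differentiable, geometry-accurate surface models together with a maximum-clearance planner and high-order control barrier functions, the method generates and tracks trajectories that are efficient, safe, and smooth. Experiments in simulation and on a physical platform show strong performance, exceeding ellipsoid-based baselines.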

Comments Will be presented in 23rd IFAC World Congress 2026

Abstract

Aerial manipulation combines the maneuverability of multirotors with the dexterity of robotic arms to perform complex tasks in cluttered spaces. Yet planning safe, dynamically feasible trajectories remains difficult due to whole-body collision avoidance and the conservativeness of common geometric abstractions such as bounding boxes or ellipsoids. We present a whole-body motion planning and safety-critical control framework for aerial manipulators built on superquadrics (SQs). Using an SQ-plus-proxy representation, we model both the vehicle and obstacles with differentiable, geometry-accurate surfaces. Leveraging this representation, we introduce a maximum-clearance planner that fuses Voronoi diagrams with an equilibrium-manifold formulation to generate smooth, collision-aware trajectories. We further design a safety-critical controller that jointly enforces thrust limits and collision avoidance via high-order control barrier functions. In simulation, our approach outperforms sampling-based planners in cluttered environments, producing faster, safer, and smoother trajectories and exceeding ellipsoid-based baselines in geometric fidelity. Actual experiments on a physical aerial-manipulation platform confirm feasibility and robustness, demonstrating consistent performance across simulation and hardware settings. The video can be found at https://youtu.be/hQYKwrWf1Ak.

2510.22665 2026-05-18 cs.CV cs.AI

SARVLM: A Vision Language Foundation Model for Semantic Understanding in SAR Imagery

Qiwei Ma, Xukun Lu, Wang Liu, Puhong Duan, Xudong Kang, Shutao Li

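AI summary This paper presents SARVLM, the first vision-language foundation model designed specifically for synthetic aperture radar (SAR) imagery, with the aim of improving semantic understanding of SAR images. To address the scarcity of multimodal SAR data and the lack of cross-modal representations, the authors build SARVLM-1M, a large-scale dataset of over one million image-text pairs, and design a two-stage domain transfer training strategy that uses optical remote sensing data as a bridge to improve performance in the SAR domain. Experiments show that SARVLM outperforms existing models on multiple benchmark tasks and advances semantic understanding of SAR imagery.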

Comments 13 pages, 13 figures

Abstract

Synthetic Aperture Radar (SAR) is a critical imaging modality due to its all-weather operational capability. Although recent advances in self-supervised learning and masked image modeling (MIM) have enabled SAR foundation models, these approaches primarily focus on low-level visual features and often neglect multi-modal representation. Moreover, multimodal data for SAR is scarce, limiting the development of robust cross-modal models. To address this limitation, we construct SARVLM-1M, a large-scale vision-language dataset comprising over one million image-text pairs aggregated from existing datasets. Furthermore, to mitigate the substantial differences between SAR and natural imagery, we propose a two-stage domain transfer training strategy that leverages optical remote sensing data as an intermediate bridge, facilitating effective knowledge transfer from natural images to SAR domains. Based on this strategy, we develop SARVLM, the first vision-language foundation model tailored for SAR, consisting of SARCLIP and SARCap. In addition, an ensemble strategy is utilized to improve the cross-scene generalization capability of the model. Moreover, SARDet and SARRot further validate the capability of the proposed framework in object detection. Extensive experiments on 13 benchmarks across image-text retrieval, target recognition, zero-shot classification, object detection, semantic localization, and image captioning demonstrate the superior feature extraction and interpretation capabilities of SARVLM. It consistently outperforms state-of-the-art vision-language models and advances semantic understanding in SAR imagery. Code and datasets will be released on https://github.com/KlayMa527/SARVLM.git.

2510.18814 2026-05-18 cs.LG cs.AI

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

Mengqi Li, Lei Zhao, Anthony Man-Cho So, Ruoyu Sun, Xiao Li

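AI summary This paper asks whether a language model can improve its reasoning using only its own generated responses, without external reward signals. It proposes Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on the generated data, improving the model step by step. Experiments show that SePT improves reasoning on several math reasoning benchmarks, supporting the feasibility of self-evolution driven by self-generated supervision alone.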

Abstract

Can language models improve their reasoning performance without external rewards, using only their own sampled responses for training? We show that they can. We propose Self-evolving Post-Training (SePT), a simple post-training method that alternates between self-generation and training on self-generated responses. It repeatedly samples questions, uses the model itself to generate responses under a specified sampling temperature, and then trains the model on the self-generated data. In this self-training loop, we use an online data refresh mechanism, where each new batch is generated by the most recently updated model. Across six math reasoning benchmarks, SePT improves a strong no-training baseline, defined as the untuned base model evaluated at its best swept decoding temperature, on several tested models. Additional ablations demonstrate the importance of online data refresh and temperature dynamics. Overall, our results identify a practical regime where reasoning can be improved using self-generated supervision alone. Our code is available at https://github.com/ElementQi/SePT.

2510.10454 2026-05-18 cs.AI

Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction

Sihang Zeng, Yujuan Fu, Sitong Zhou, Zixuan Yu, Lucas Jing Liu, Jun Wen, Matthew Thompson, Ruth Etzioni, Meliha Yetisgen

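AI summary This paper proposes Traj-CoA, a multi-agent system that models patient trajectories with a chain-of-agents structure to improve lung cancer risk prediction. A sequence of worker agents processes electronic health record (EHR) data step by step, distilling key events into EHRMem, a shared long-term memory module, to reduce noise and preserve a complete timeline; a manager agent then synthesizes this information to make the prediction. Experiments show that Traj-CoA outperforms four categories of baselines on a zero-shot one-year lung cancer risk prediction task and exhibits clinically aligned temporal reasoning.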

Comments Accepted by NeurIPS 2025 GenAI4Health Workshop

Abstract

Large language models (LLMs) offer a generalizable approach for modeling patient trajectories, but suffer from the long and noisy nature of electronic health records (EHR) data in temporal reasoning. To address these challenges, we introduce Traj-CoA, a multi-agent system involving chain-of-agents for patient trajectory modeling. Traj-CoA employs a chain of worker agents to process EHR data in manageable chunks sequentially, distilling critical events into a shared long-term memory module, EHRMem, to reduce noise and preserve a comprehensive timeline. A final manager agent synthesizes the worker agents' summary and the extracted timeline in EHRMem to make predictions. In a zero-shot one-year lung cancer risk prediction task based on five-year EHR data, Traj-CoA outperforms baselines of four categories. Analysis reveals that Traj-CoA exhibits clinically aligned temporal reasoning, establishing it as a promisingly robust and generalizable approach for modeling complex patient trajectories. Implementation of Traj-CoA is available on https://github.com/zengsihang/Traj-CoA.

2510.08008 2026-05-18 cs.LG

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

Ruizhe Wang, Yucheng Ding, Xiao Liu, Yaoxiang Wang, Peng Cheng, Baining Guo, Zhengjun Zha, Yeyun Gong

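AI summary As the compute required to pre-train large language models (LLMs) keeps rising, training efficiency becomes especially important. This paper proposes an "orthogonal growth" strategy that "recycles" existing pre-trained checkpoints by strategically expanding their parameters before continued training. The method grows Mixture-of-Experts (MoE) models along two dimensions, depth and width; experiments on models up to 70B parameters and 1T tokens show a 10.6% accuracy improvement over training from scratch under the same extra compute budget, offering an efficient and practical approach to sustainable large-scale LLM development.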

Comments Accepted to ICML 2026

Abstract

As the computational demands for pre-training Large Language Models (LLMs) continue to surge, the need for efficient training paradigms becomes critical. Despite the vast resources already invested in existing pre-trained checkpoints, these assets often remain under-leveraged due to architectural limitations. We introduce an "orthogonal growth" strategy designed to "recycle" these checkpoints by strategically expanding their parameters prior to continued training. Our method focuses on optimizing converged Mixture-of-Experts (MoE) models through two dimensions: interpositional layer copying for increased depth and noisy expert duplication for expanded width. Through extensive scaling laws analysis, we demonstrate a strong positive correlation between the "sunk cost" (prior investment) and the final model accuracy. Empirical results on models up to 70B parameters and 1T tokens show that our recycling approach yields a 10.6% accuracy improvement compared to training from scratch under identical extra compute budgets. This work provides a cost-effective blueprint for sustainable large-scale LLM development.
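A toy sketch of the two growth directions follows, with heavy caveats. It reads "interpositional layer copying" as placing each copied layer directly after its source (rather than stacking all copies at the end) and "noisy expert duplication" as copying each expert's weights with small Gaussian noise; both readings are assumptions drawn from the wording of the abstract. Layers and experts are plain lists of floats, and a real MoE would also need its router extended to cover the new experts, which is omitted here.

```python
import random


def grow_depth(layers):
    """Double the depth by inserting a copy of each layer right after the original."""
    grown = []
    for layer in layers:
        grown.append(layer)
        grown.append(list(layer))  # independent copy so the two can diverge in training
    return grown


def grow_width(experts, noise_std, rng):
    """Double the expert count: originals plus noisy duplicates.

    The noise breaks the symmetry between an expert and its copy so that
    continued training can specialise them differently.
    """
    duplicates = [[w + rng.gauss(0.0, noise_std) for w in expert] for expert in experts]
    return experts + duplicates


if __name__ == "__main__":
    rng = random.Random(0)
    layers = [[0.1, 0.2], [0.3, 0.4]]   # two toy "layers"
    experts = [[1.0, 2.0], [3.0, 4.0]]  # two toy "experts"
    print(grow_depth(layers))
    print(grow_width(experts, noise_std=0.01, rng=rng))
```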

2510.06062 2026-05-18 cs.CL

When Importance Sampling Misallocates Credit: Asymmetric Ratios for Outcome-Supervised RL

Jiakang Wang, Runze Liu, Qingpeng Cai, Lei Lin, Wenping Hu, Xiu Li, Fuzheng Zhang, Guorui Zhou, Kun Gai, Ling Pan

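AI summary In outcome-supervised reinforcement learning (OSRL), the importance sampling (IS) ratio implicitly changes role: it becomes the weight that allocates the shared advantage signal across the tokens of a response. This unbalances the weighting of positive- and negative-advantage tokens, over-reinforcing high-probability tokens and suppressing updates to low-probability ones. To address this, the paper proposes Asymmetric Importance Sampling Policy Optimization (ASPO), a simple and effective strategy that reverses the ratio-induced weighting of positive-advantage tokens while stabilizing updates and maintaining gradient flow, which improves training stability and performance. Experiments on math reasoning and coding tasks show that ASPO significantly mitigates entropy collapse and outperforms existing GRPO baselines.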

Abstract

Reinforcement learning (RL) has shown great promise in post-training large language models (LLMs), where methods typically rely on token-level clipping to maintain stability during optimization. Despite the empirical success of GRPO-style methods, we identify a fundamental and previously overlooked challenge in this popular Outcome-Supervised RL (OSRL) paradigm. We reveal that in OSRL, where advantages are shared across tokens within a response, importance sampling (IS) ratios deviate from their traditional purpose of distribution correction in classic RL and instead become token-level weights that allocate the shared advantage signal across tokens. We show that this hidden role shift induces a critical mismatch for positive-advantage tokens, leading to unbalanced token weighting between positive and negative tokens. Specifically, it suppresses the update of underrepresented tokens that are lagging behind, while over-amplifying already high-probability tokens. This mismatch results in rich-get-richer dynamics that over-reinforce confident tokens and weaken catch-up learning, driving entropy collapse, excessive repetition, and premature convergence. To address this, we propose Asymmetric Importance Sampling Policy Optimization (ASPO), a simple yet effective strategy that reverses the ratio-induced weighting of positive-advantage tokens, while stabilizing extreme updates and maintaining gradient flow. This mismatch correction aligns the update direction of positive-advantage tokens with the learning dynamics of negative-advantage ones. Comprehensive experiments across math reasoning and coding benchmarks demonstrate that ASPO significantly mitigates entropy collapse, improves training stability, and enhances performance over strong GRPO-based baselines. Our analysis provides new insights into the role of token-level weighting in OSRL and highlights the critical importance of correcting ratio-induced weighting in LLM RL.

2510.05676 2026-05-18 cs.LG cs.SI

Inductive inference of gradient-boosted decision trees on graphs for insurance fraud detection

Félix Vandervorst, Bruno Deprez, Wouter Verbeke, Tim Verdonck

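AI summary This paper proposes a novel inductive graph gradient boosting machine (G-GBM) for insurance fraud detection, aimed at the class imbalance of fraud in heterogeneous, dynamic graph data. The method combines gradient boosting's robustness to class imbalance with interpretable path-level features encoded from the graph structure, while preserving access to the original tabular feature space. Experiments show that G-GBM performs on par with or better than state-of-the-art methods on an open-source dataset and a real-world insurance dataset. The associated dataset is publicly released to facilitate reproducibility.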

Details
English abstract

Graph-based methods are becoming increasingly popular in machine learning due to their ability to model complex data and relations. Insurance fraud is a prime use case, since fraudulent claims are often the result of organised criminals who stage accidents or of the same persons filing erroneous claims on multiple policies. One challenge is that graph-based approaches struggle to find meaningful representations of the data because of the high class imbalance present in fraud data. In addition, insurance graphs are heterogeneous and dynamic, given the changing relations among people, companies and policies. As a result, gradient-boosted tree approaches on tabular data still dominate the field. Therefore, we present a novel inductive graph gradient boosting machine (G-GBM) for supervised learning on heterogeneous and dynamic graphs. G-GBM combines the class-imbalance robustness of gradient boosting with heterogeneous graph information encoded through interpretable path-level feature concatenations, while preserving access to the original tabular feature space. In addition, the explicit representation of neighbourhood information enables transparent SHAP-based explanations at the metapath and feature level. We demonstrate G-GBM for insurance fraud detection on an open-source and a real-world, proprietary dataset, and find that G-GBM performs on par with or better than the state of the art. The associated insurance fraud dataset is publicly released to facilitate reproducibility.

2510.04124 2026-05-18 cs.CL

Sri Lanka Document Datasets: A Large-Scale, Multilingual Resource for Law, News, and Policy

Nuwan I. Senaratna

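AI summary This paper introduces a collection of open, machine-readable, multilingual document datasets from Sri Lanka, covering parliamentary proceedings, legal judgments, government publications, news, and tourism statistics. It contains 269,194 documents totalling 79.5 GB in Sinhala, Tamil, and English. The datasets are updated daily and hosted on GitHub and Hugging Face, and aim to support computational linguistics, legal analytics, socio-political research, and multilingual natural language processing. The paper also describes the data sources, collection pipeline, formats, and potential use cases, and discusses licensing and ethical considerations.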

Comments 4 pages. 269,194 documents (79.5 GB) across 26 datasets in Sinhala, Tamil, and English. Last updated on 2026-05-15

Details
English abstract

We present a collection of open, machine-readable document datasets covering parliamentary proceedings, legal judgments, government publications, news, and tourism statistics from Sri Lanka. The collection currently comprises 269,194 documents (79.5 GB) across 26 datasets in Sinhala, Tamil, and English. The datasets are updated daily and mirrored on GitHub and Hugging Face. These resources aim to support research in computational linguistics, legal analytics, socio-political studies, and multilingual natural language processing. We describe the data sources, collection pipeline, formats, and potential use cases, while discussing licensing and ethical considerations. This manuscript is at version v2026-05-15-0811.

2510.02307 2026-05-18 cs.CV cs.AI

NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation

Ruozhen He, Moayed Haji-Ali, Ziyan Yang, Vicente Ordonez

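AI summary Text-to-image diffusion models often degrade when generating images at resolutions outside those used in training. Targeting low-resolution generation, this paper proposes NoiseShift, a training-free noise recalibration method that re-indexes the denoiser's noise conditioning to restore consistency between the forward and reverse processes, reducing the train-test mismatch. Experiments show that NoiseShift consistently improves low-resolution generation quality across several mainstream diffusion models, with a simple implementation and no additional inference overhead.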

Details
English abstract

Text-to-image diffusion models often degrade when sampled at resolutions outside the final training resolution set. Prior work has largely emphasized higher resolution generation, enabling pretrained diffusion models to extrapolate beyond the resolutions seen during training. In this work, we instead target lower-resolution generation, performing inference at reduced resolution to significantly cut computational cost. We show that network conditioning of the noise level induces a train-test mismatch that directly degrades low-resolution generation: the same scheduled noise level can correspond to a different perceptual corruption level at lower resolutions, mis-calibrating the denoiser timestep and noise embedding. To this end, we propose NoiseShift, a training-free recalibration method that keeps the original noise sampling schedule unchanged and instead re-indexes the noise conditioning of the denoiser to restore local forward-reverse consistency. Using a lightweight coarse-to-fine calibration on a small set of image-text pairs, NoiseShift learns a resolution-specific mapping from scheduler noise to conditioning noise, reducing train-test mismatch and improving lower-resolution generation quality. When NoiseShift is applied to Stable Diffusion 3 (SD3), Stable Diffusion 3.5 (SD3.5), and Flux-Dev, generation quality at low resolutions improves consistently. In particular, SD3 generation at 128x128 resolution improves from an FID score of 203 to 171, and SD3.5 improves from an FID score of 310 to 277 on LAION-COCO. Even Flux-Dev, which already implements a complementary time-shifting strategy, gets a modest boost from NoiseShift, with an FID score improving from 120 to 113 at 64x64 resolution. More importantly, NoiseShift achieves such improvements with minimal implementation changes and no additional inference overhead.

2509.24798 2026-05-18 cs.CV cs.AI

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

Lei Tong, Zhihua Liu, Chaochao Lu, Dino Oglic, Tom Diethe, Philip Teare, Sotirios A. Tsaftaris, Chen Jin

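AI summary This paper proposes Causal-Adapter, a modular framework that adapts frozen text-to-image diffusion models for counterfactual image generation. The method performs causal interventions on target attributes and consistently propagates their effects to causal dependents while preserving the core identity of the image. Unlike approaches that rely on prompt engineering, Causal-Adapter introduces structural causal modeling with attribute-regularization strategies, achieving more precise semantic control and high-fidelity image generation, with state-of-the-art performance on several datasets.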

Comments Project Page: https://leitong02.github.io/causaladapter/

Details
Journal ref
ICML 2026
English abstract

We present Causal-Adapter, a modular framework that adapts frozen text-to-image diffusion backbones for counterfactual image generation. Our method supports causal interventions on target attributes and consistently propagates their effects to causal dependents while preserving the core identity of the image. Unlike prior approaches that rely on prompt engineering without explicit causal structure, Causal-Adapter leverages structural causal modeling with two attribute-regularization strategies: (i) prompt-aligned injection, which aligns causal attributes with textual embeddings for precise semantic control, and (ii) a conditioned token contrastive loss that disentangles attribute factors and reduces spurious correlations. Causal-Adapter achieves state-of-the-art performance on both synthetic and real-world datasets, including up to a 91% reduction in MAE on Pendulum for accurate attribute control and up to an 87% reduction in FID on ADNI for high-fidelity MRI generation. These results demonstrate robust, generalizable counterfactual editing with faithful attribute modification and strong identity preservation. Code and models will be released at: https://leitong02.github.io/causaladapter/.

2509.22267 2026-05-18 cs.LG eess.SP

Towards a more realistic evaluation of machine learning models for bearing fault diagnosis

João Paulo Vieira, Victor Afonso Bauler, Rodrigo Kobashikawa Rosa, Danilo Silva

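AI summary This paper addresses the widespread problem of data leakage in vibration-based bearing fault diagnosis, examines its impact on model evaluation, and proposes a rigorous bearing-wise data partitioning method that avoids overlap of physical components between training and test data. The study also reformulates the classification task as a multi-label problem, supporting joint detection of co-occurring fault types, and adopts ROC-curve-based evaluation metrics. The methodology is evaluated on four widely used datasets and provides guidance for building more reliable and better-generalizing machine learning systems for industrial fault diagnosis.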

Comments Revised version submitted to Mechanical Systems and Signal Processing

Details
English abstract

Reliable detection of bearing faults is essential for maintaining the safety and operational efficiency of rotating machinery. While recent advances in machine learning (ML), particularly deep learning, have shown strong performance in controlled settings, many studies fail to generalize to real-world applications due to methodological flaws, most notably data leakage. This paper investigates the issue of data leakage in vibration-based bearing fault diagnosis and its impact on model evaluation. We demonstrate that common dataset partitioning strategies, such as segment-wise and condition-wise splits, introduce spurious correlations that inflate performance metrics. To address this, we propose a rigorous, leakage-free evaluation methodology centered on bearing-wise data partitioning, ensuring no overlap between the physical components used for training and testing. Additionally, we reformulate the classification task as a multi-label problem, enabling the detection of co-occurring fault types and the use of prevalence-independent metrics based on the ROC curve. Beyond preventing leakage, we also examine the effect of dataset diversity on generalization, showing that the number of unique training bearings is a decisive factor for achieving robust performance. We evaluate our methodology on four widely adopted datasets: CWRU, Paderborn University (PU), University of Ottawa (UORED-VAFCLS) and HUST bearing. This study highlights the importance of leakage-aware evaluation protocols and provides practical guidelines for dataset partitioning, model selection, and validation, fostering the development of more trustworthy ML systems for industrial fault diagnosis applications.
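
The bearing-wise partitioning idea in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' protocol: the function name `bearing_wise_split`, the `(bearing_id, signal)` tuple layout, and the toy data are assumptions made for the example.

```python
import random

def bearing_wise_split(segments, test_fraction=0.25, seed=0):
    """Split (bearing_id, signal) segments so that no physical bearing
    contributes segments to both the training and the test set."""
    bearing_ids = sorted({bearing_id for bearing_id, _ in segments})
    rng = random.Random(seed)
    rng.shuffle(bearing_ids)
    n_test = max(1, round(len(bearing_ids) * test_fraction))
    test_ids = set(bearing_ids[:n_test])
    train = [seg for seg in segments if seg[0] not in test_ids]
    test = [seg for seg in segments if seg[0] in test_ids]
    return train, test

# Toy data (hypothetical): 4 bearings, 3 signal segments each.
segments = [(f"bearing_{b}", [b, k]) for b in range(4) for k in range(3)]
train, test = bearing_wise_split(segments)

train_ids = {bearing_id for bearing_id, _ in train}
test_ids = {bearing_id for bearing_id, _ in test}
print(sorted(train_ids), sorted(test_ids))
```

A segment-wise split would shuffle the 12 segments directly, so segments cut from the same bearing could land on both sides. Splitting on the bearing identifier is what removes that overlap.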

2509.05030 2026-05-18 cs.CV

LUIVITON: Learned Universal Interoperable VIrtual Try-ON

Cong Cao, Xianhang Cheng, Jingyuan Liu, Yujian Zheng, Zhenhui Lin, Ren Li, Meriem Chkir, Hao Li

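AI summary This paper proposes LUIVITON, a fully automated virtual try-on system that addresses the mismatch of skeletons, templates, and dense correspondences between real-world garments and character models, automatically dressing complex, multi-layer garments onto humanoid characters of varied poses and shapes. Using SMPL as an intermediate proxy, the method decomposes clothing-to-body transfer into two correspondence tasks, handled respectively by a geometry-driven model and a diffusion-based approach that matches multi-view appearance features, and finally produces physically plausible draping on the target character. The system handles complex garment topology, applies to a wide range of humanoid characters, remains computationally practical, and requires no human labor.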

Details
English abstract

To enable large-scale reuse of real-world 3D assets, where garments and characters rarely share skeletons, templates, or dense correspondences, we present a fully automated virtual try-on system that dresses complex, multi-layer garments onto diverse, arbitrarily posed humanoids. Our key idea is to use SMPL as an intermediate proxy and decompose clothing-to-body transfer into two correspondence tasks with distinct challenges: (1) clothing-to-SMPL (partial-to-complete alignment) and (2) body-to-SMPL (large pose/shape variation and stylization). We address clothing-to-SMPL using a geometry-driven correspondence model, and introduce a diffusion-based body-to-SMPL correspondence approach that leverages multi-view consistent appearance features together with a pretrained 2D foundation model. Using these correspondences, we register SMPL/SMPL+D (Displacement) to the garment and target body and then perform simulator-driven fitting by transferring the garment along a smooth SMPL-to-SMPL+D transition, producing physically plausible draping on the target. Our system handles complex garment topology (including non-manifold meshes) and generalizes to a wide range of humanoid characters (e.g., humans, robots, cartoons, and creatures) while remaining computationally practical. Upon draping, our system also supports fast customization of clothing size. We show that our system can produce high-quality 3D clothing fittings without any human labor, even when 2D clothing sewing patterns are not available. Our project page is: https://cao-cong0.github.io/LUIVITON-Learned-Universal-Interoperable-VIrtual-Try-ON/.

2508.20810 2026-05-18 cs.AI cs.CL

From Guidelines to Guarantees: A Graph-Based Evaluation Harness for Domain-Specific Evaluation of LLMs

Jessica M. Lundin, Usman Nasir Nakakana, Guillaume Chabot-Couture

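AI summary This paper proposes a graph-based evaluation framework for rigorous evaluation of domain-specific language models. The method transforms structured clinical guidelines into a queryable knowledge graph and dynamically generates evaluation questions via graph traversal, making the evaluation comprehensive, contamination-resistant, and maintainable. Applied to the WHO IMCI guidelines, the framework generates multiple-choice questions spanning symptom recognition, treatment, severity classification, and follow-up care, and reveals systematic capability gaps across language models on clinical decision tasks.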

Details
English abstract

Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive, contamination-resistant, and maintainable. Static, manually curated datasets do not satisfy these properties. We present a graph-based evaluation harness that transforms structured clinical guidelines into a queryable knowledge graph and dynamically instantiates evaluation queries via graph traversal. The framework provides three guarantees: (1) complete coverage of guideline relationships; (2) surface-form contamination resistance through combinatorial variation; and (3) validity inherited from expert-authored graph structure. Applied to the WHO IMCI guidelines, the harness generates clinically grounded multiple-choice questions spanning symptom recognition, treatment, severity classification, and follow-up care. Evaluation across five language models reveals systematic capability gaps. Models perform well on symptom recognition but show lower accuracy on treatment protocols and clinical management decisions. The framework supports continuous regeneration of evaluation data as guidelines evolve and generalizes to domains with structured decision logic. This provides a scalable foundation for evaluation infrastructure.
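
The idea of instantiating questions by graph traversal can be sketched as follows. This is a toy illustration under stated assumptions: the triples are placeholders rather than IMCI content, and `generate_questions`, the relation names, and the question template are hypothetical and do not come from the paper's harness.

```python
import random

# Hypothetical toy graph as (subject, relation, object) triples.
TRIPLES = [
    ("condition_A", "treated_with", "treatment_1"),
    ("condition_B", "treated_with", "treatment_2"),
    ("condition_C", "treated_with", "treatment_3"),
    ("condition_A", "has_symptom", "symptom_x"),
    ("condition_B", "has_symptom", "symptom_y"),
]

def generate_questions(triples, relation, n_distractors=2, seed=0):
    """Create one multiple-choice question per edge of the given relation,
    so every such relationship in the graph is covered."""
    rng = random.Random(seed)
    edges = [t for t in triples if t[1] == relation]
    objects = sorted({obj for _, _, obj in edges})
    questions = []
    for subject, _, answer in edges:
        # Distractors are objects of the same relation not linked to the subject.
        linked = {o for s, _, o in edges if s == subject}
        pool = [o for o in objects if o not in linked]
        distractors = rng.sample(pool, min(n_distractors, len(pool)))
        options = distractors + [answer]
        rng.shuffle(options)  # varying option order changes the surface form
        questions.append({
            "stem": f"Which option is linked to {subject} by '{relation}'?",
            "options": options,
            "answer": answer,
        })
    return questions

questions = generate_questions(TRIPLES, "treated_with")
for q in questions:
    print(q["stem"], q["options"])
```

Because questions are derived from edges, adding or changing a triple regenerates the affected questions, and a different seed yields a different surface form for the same underlying relationship.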

2508.18167 2026-05-18 cs.CL cs.HC

DiscussLLM: Teaching Large Language Models When to Speak

Deep Anil Patel, Iain Melvin, Christopher Malon, Martin Renqiang Min

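AI summary This paper proposes DiscussLLM, a framework that addresses the passive, reactive behavior of large language models in dynamic conversations by enabling them to proactively decide when to speak in order to provide valuable help. The authors design a two-stage data generation pipeline that builds a large-scale dataset of realistic multi-turn discussions, each annotated with one of five intervention types and an explicit trigger point. Training models to predict when to stay silent and when to intervene improves their situational awareness and responsiveness in conversation.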

Details
English abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like text, yet they largely operate as reactive agents, responding only when directly prompted. This passivity creates an "awareness gap," limiting their potential as truly collaborative partners in dynamic human discussions. We introduce $\textit{DiscussLLM}$, a framework designed to bridge this gap by training models to proactively decide not just $\textit{what}$ to say, but critically, $\textit{when}$ to speak. Our primary contribution is a scalable two-stage data generation pipeline that synthesizes a large-scale dataset of realistic multi-turn human discussions. Each discussion is annotated with one of five intervention types (e.g., Factual Correction, Concept Definition) and contains an explicit conversational trigger where an AI intervention adds value. By training models to predict a special silent token when no intervention is needed, they learn to remain quiet until a helpful contribution can be made. We explore two architectural baselines: an integrated end-to-end model and a decoupled classifier-generator system optimized for low-latency inference. We evaluate these models on their ability to accurately time interventions and generate helpful responses, paving the way for more situationally aware and proactive conversational AI.
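
The silent-token training signal described in this abstract can be sketched as a data-preparation step. The token string `<silent>`, the function `build_training_examples`, and the placeholder turns are assumptions made for illustration. The paper's actual token and data format are not given in the abstract.

```python
SILENT = "<silent>"  # hypothetical name for the special silent token

def build_training_examples(turns, trigger_index, intervention):
    """Build one (context, target) pair per turn up to the trigger.

    The target is the silent token after every turn before the trigger,
    and the intervention text once the trigger turn has been observed.
    """
    examples = []
    for i in range(trigger_index + 1):
        context = turns[: i + 1]  # everything said so far
        target = intervention if i == trigger_index else SILENT
        examples.append((context, target))
    return examples

# Placeholder discussion; the last turn is the annotated trigger.
turns = ["speaker_1: <turn 1>", "speaker_2: <turn 2>", "speaker_1: <turn 3>"]
examples = build_training_examples(
    turns, trigger_index=2, intervention="<concept definition response>"
)
for context, target in examples:
    print(len(context), "turn(s) ->", target)
```

A model trained on pairs like these sees far more silent targets than interventions, so in practice the two classes would likely need balancing. That is a design consideration of this sketch and is not a claim about the paper.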

2508.17218 2026-05-18 cs.LG cs.AI

Generalized Policy Gradient with History-Aware Decision Transformer for Reliable Routing over Graph Signals

Xing Wei, Yuanhang Wang, Duoxiang Zhao, Zezhou Zhang, Hao Qin, Yuqi Ouyang

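AI summary This study addresses reliable path planning in stochastic transportation networks and proposes GPG-HT, a new policy framework that combines a history-aware Decision Transformer with generalized policy gradient optimization. By attending to historical node-edge-time observations, the method captures non-Markovian spatial-temporal dependencies, enabling more context-aware routing decisions under uncertainty. Experiments on representative transportation networks show consistent gains in on-time arrival probability over optimization and reinforcement learning baselines.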

Details
English abstract

Reliable path planning in stochastic transportation networks requires decisions that account for uncertain and correlated travel times on irregular road graphs, rather than only minimizing expected delay. Such networks exhibit strong spatial-temporal coupling, where link travel times evolve as stochastic processes over graph edges, making the problem inherently sequential under uncertainty. Existing stochastic on-time arrival (SOTA) methods primarily depend on the current node and remaining budget, which limits their ability to exploit trajectory-level temporal structure and history-dependent correlations. This work proposes GPG-HT, a history-aware graph-signal policy framework that integrates a Decision Transformer with generalized policy gradient optimization for reliable routing. By attending to historical node-edge-time observations, GPG-HT captures non-Markovian spatial-temporal dependencies and enables context-aware decision making under uncertainty. Experiments on the Sioux Falls and Anaheim networks demonstrate consistent gains in on-time arrival probability over representative optimization and reinforcement learning baselines.
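
The difference between minimizing expected delay and maximizing on-time arrival probability, which motivates this abstract, can be shown with a two-path toy example. The path names, travel-time distributions, and budget below are invented for illustration and are unrelated to the Sioux Falls or Anaheim experiments.

```python
def expected_time(dist):
    """Mean travel time of a discrete distribution [(time, probability), ...]."""
    return sum(t * p for t, p in dist)

def on_time_probability(dist, budget):
    """Probability that the travel time does not exceed the time budget."""
    return sum(p for t, p in dist if t <= budget)

# Hypothetical paths: A is faster on average but occasionally very slow.
paths = {
    "A": [(10, 0.7), (40, 0.3)],
    "B": [(20, 0.9), (24, 0.1)],
}
budget = 25

fastest = min(paths, key=lambda name: expected_time(paths[name]))
most_reliable = max(paths, key=lambda name: on_time_probability(paths[name], budget))
print(fastest, most_reliable)
```

Path A has the lower expected travel time, yet path B is the better choice for a traveller who must arrive within the budget. A reliability-aware policy optimizes the second criterion.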

2508.17034 2026-05-18 cs.RO cs.CV

DualReg: Dual-Space Filtering and Reinforcement for Rigid Registration

Jiayi Li, Yuxin Yao, Qiuhang Lu, Juyong Zhang

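AI summary Targeting the challenges of noisy data, partial overlap, and real-time processing in rigid registration, this paper proposes DualReg, a new method based on dual-space filtering and reinforcement. It combines the strengths of feature-based matching and local geometry-based matching: an efficient filtering mechanism removes unreliable feature correspondences, and geometric proxies are used to formulate an objective function for estimating the transformation. Experiments show a 32x CPU-time speedup over MAC on KITTI with comparable accuracy.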

Comments Accepted to CVPR 2026, Project page: https://ustc3dv.github.io/DualReg/

Details
English abstract

Noisy, partially overlapping data and the need for real-time processing pose major challenges for rigid registration. Considering that feature-based matching can handle large transformation differences but suffers from limited accuracy, while local geometry-based matching can achieve fine-grained local alignment but relies heavily on a good initial transformation, we propose a novel dual-space paradigm to fully leverage the strengths of both approaches. First, we introduce an efficient filtering mechanism consisting of a computationally lightweight one-point RANSAC algorithm and a subsequent refinement module to eliminate unreliable feature-based correspondences. Subsequently, we treat the filtered correspondences as anchor points, extract geometric proxies, and formulate an effective objective function with a tailored solver to estimate the transformation. Experiments verify our method's effectiveness, as demonstrated by a 32x CPU-time speedup over MAC on KITTI with comparable accuracy. Project page: https://ustc3dv.github.io/DualReg/.

2508.01014 2026-05-18 cs.RO cs.CV

Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction

Cheng-You Lu, Zhuoli Zhuang, Nguyen Thanh Trung Le, Da Xiao, Yu-Cheng Chang, Thomas Do, Srinath Sridhar, Chin-teng Lin

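AI summary Hestia is a viewpoint planning method for efficient 3D reconstruction that addresses the reliance of image acquisition on manual capture or fixed trajectories. It introduces a voxel-face-aware hierarchical structure and combines a more diverse dataset, a close-greedy strategy, and a geometry-aware design to improve the robustness of view planning and reconstruction quality. Experiments show that Hestia outperforms existing methods in coverage and reconstruction accuracy while maintaining real-time inference, indicating good potential for real-world application.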

Comments Accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

Details
English abstract

Advances in 3D reconstruction and novel view synthesis have enabled efficient and photorealistic rendering. However, image acquisition for reconstruction is still either largely manual or constrained by simple preplanned trajectories. To address this issue, recent works propose generalizable next-best-view planners that do not require online learning. Nevertheless, robustness and performance remain limited across various shapes. Hence, this study introduces Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction (Hestia), which addresses the shortcomings of the reinforcement learning-based generalizable approaches for five-degree-of-freedom viewpoint prediction. Hestia systematically improves the planners through four components: a more diverse dataset to promote robustness, a hierarchical structure to manage the high-dimensional continuous action search space, a close-greedy strategy to mitigate spurious correlations, and a face-aware design to avoid overlooking geometry. Experimental results show that Hestia achieves non-marginal improvements, with at least a 4% gain in coverage ratio, while reducing Chamfer Distance by 50% and maintaining real-time inference. In addition, Hestia outperforms prior methods by at least 12% in coverage ratio with a 5-image budget and remains robust to object placement variations. Finally, we demonstrate that Hestia, as a next-best-view planner, is feasible for real-world applications. Our project page is https://johnnylu305.github.io/hestia web.