arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.04473 2026-05-15 cs.CV

CC-Pan: Channel-wise Compression based Diffusion for Efficient Pan-Sharpening

Junjie Li, Congyang Ou, Haokui Zhang, Guoting Wei, Shengqin Jiang, Ying Li

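AI Summary This paper proposes CC-Pan, a diffusion model based on channel-wise compression, for efficient fusion of multispectral and panchromatic images (pan-sharpening). The method trains a channel-independent variational autoencoder that encodes high-resolution multispectral images into compact latent representations, which supports multispectral images from different sensors and speeds up inference. Spectral physical properties and the panchromatic image are injected through purpose-designed unidirectional and bidirectional interactive control structures and, together with a lightweight cross-band attention module, improve fusion accuracy and spectral consistency. Experiments show that CC-Pan outperforms existing diffusion models on multiple datasets, achieves a 2-3x speedup, and generalizes well across sensors.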

Abstract

Recently, diffusion models have brought novel insights to pan-sharpening and notably boosted fusion precision. However, most existing models perform diffusion in the pixel space and train distinct models for different multispectral (MS) sensors, suffering from high inference latency and sensor-specific limitations. In this paper, we present CC-Pan, a cross-sensor latent diffusion framework for efficient pan-sharpening. Specifically, CC-Pan trains a band-wise single-channel variational autoencoder (VAE) to encode high-resolution multispectral (HRMS) images into compact latent representations, naturally supporting MS images with varying band counts across different sensors and establishing a basis for inference acceleration. Spectral physical properties, along with PAN and MS images, are then injected into the diffusion backbone through carefully designed unidirectional and bidirectional interactive control structures, achieving high-precision spatial--spectral fusion in the latent diffusion process. Furthermore, a lightweight region-based cross-band attention (RCBA) module is incorporated at the central layer of the diffusion model, reinforcing inter-band spectral connections to boost spectral consistency and further elevate fusion precision. Extensive experimental results on GaoFen-2, QuickBird, and WorldView-3 demonstrate that CC-Pan outperforms state-of-the-art diffusion-based methods across all three benchmarks, attains a $2$--$3\times$ inference speedup, and exhibits robust cross-sensor generalization capability on the held-out WorldView-2 sensor without any sensor-specific retraining.

2602.04265 2026-05-15 cs.LG cs.AI

Boosting LLM Reasoning via Human-Inspired Reward Shaping

Wenze Lin, Zhen Yang, Xitai Jiang, Xiaoteng Ma, Gao Huang

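AI Summary This study addresses the problem of improving the reasoning ability of large language models (LLMs) and proposes T2T, a dynamic reward framework inspired by human learning behavior. The method distinguishes how well a problem has been mastered and applies a reward mechanism with two phases, "thickening" and "thinning": it encourages broad exploration on incorrect attempts and, once an answer is correct, applies length penalties to condense the reasoning. Experiments show that T2T significantly outperforms existing methods on multiple mathematical benchmarks.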

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for enhancing reasoning in Large Language Models (LLMs). However, existing reward formulations typically treat exploration and consolidation as a monolithic process, resulting in entangled stage-wise learning dynamics. This contradicts the natural learning behavior of human learners. In human learning, individuals adopt distinct behavioral patterns toward mastered versus unfamiliar problems. When confronting unmastered challenges, humans prioritize broad exploration to seek viable solutions. By contrast, for well-mastered problems, they focus instead on reasoning condensation and knowledge abstraction to distill concise underlying principles. Motivated by this gap, we introduce T2T (Thickening-to-Thinning), a dynamic reward framework inspired by human learning processes. Specifically, it implements a dual-phase mechanism: (1) On incorrect attempts, T2T incentivizes "thickening" to broaden the search space and explore novel solution paths; (2) Upon achieving correctness, it shifts to "thinning", imposing length penalties to discourage redundancy, thereby fostering model confidence and crystallizing reasoning capabilities. Extensive experiments on mathematical benchmarks (MATH-500, AIME, AMC) across 5 mainstream LLMs demonstrate that T2T significantly outperforms standard GRPO and recent baselines.
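
As a rough illustration of the dual-phase idea in this abstract, the sketch below shapes a scalar reward from a correctness flag and a response length. The function name `t2t_style_reward`, the coefficients `alpha` and `beta`, and the linear dependence on normalized length are all assumptions made for illustration; the abstract does not give the actual reward formula.

```python
def t2t_style_reward(correct: bool, length: int, max_length: int,
                     alpha: float = 0.2, beta: float = 0.2) -> float:
    """Hypothetical dual-phase reward; not the paper's exact formula."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    # Normalized response length, clipped to [0, 1].
    frac = min(max(length / max_length, 0.0), 1.0)
    if correct:
        # "Thinning": full correctness reward minus a length penalty.
        return 1.0 - beta * frac
    # "Thickening": no correctness reward, small bonus for longer exploration.
    return alpha * frac


if __name__ == "__main__":
    for ok, n in [(False, 200), (False, 900), (True, 900), (True, 200)]:
        print(ok, n, t2t_style_reward(ok, n, max_length=1000))
```

With these default coefficients the lowest reward for a correct answer (1 - beta) stays above the highest reward for an incorrect one (alpha), so the length terms only reorder responses within each phase.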

2602.03814 2026-05-15 cs.AI cs.LG

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick

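AI Summary This paper studies how to improve the reasoning efficiency of large language models under a limited compute budget by controlling risk during reasoning. The authors propose a risk control framework called "Conformal Thinking" that sets an upper and a lower threshold: the upper threshold stops reasoning when the model is confident (at the risk of an incorrect output), and the lower threshold terminates unsolvable instances early (at the risk of stopping prematurely), minimizing compute while keeping risk controlled. Experiments show that the method improves computational efficiency across a range of reasoning tasks and models while meeting the user-specified risk target.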

Comments ICML 2026

Abstract

Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning: spending tokens when they improve reliability and stopping early when additional computation is unlikely to help. However, setting the token budget, as well as the threshold for adaptive reasoning, is a practical challenge that entails a fundamental risk-accuracy trade-off. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident (risking incorrect output) and a novel parametric lower threshold that preemptively stops unsolvable instances (risking premature stoppage). Given a target risk and a validation set, we use distribution-free risk control to optimally specify these stopping mechanisms. For scenarios with multiple budget controlling criteria, we incorporate an efficiency loss to select the most computationally efficient exiting mechanism. Empirical results across diverse reasoning tasks and models demonstrate the effectiveness of our risk control approach, showing computational efficiency gains from the lower threshold and ensemble stopping mechanisms while adhering to the user-specified risk target.
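
To make the calibration step concrete, here is a minimal sketch of choosing an upper (confidence) threshold on a validation set so that an upper confidence bound on the risk stays below a target. It uses a plain Hoeffding bound and a scan from the strictest threshold downward, in the spirit of upper-confidence-bound risk calibration. The function name, the grid, the loss definition (an early stop that returns a wrong answer), and the choice of Hoeffding's inequality are assumptions for illustration, not the paper's procedure.

```python
import math


def calibrate_stop_threshold(confidences, correct, target_risk,
                             delta=0.1, grid=None):
    """Pick the loosest confidence threshold whose risk bound meets the target.

    confidences[i]: model confidence on validation example i (hypothetical signal)
    correct[i]:     whether the answer produced at that point is right
    Returns None if even the strictest threshold fails the bound.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally sized, non-empty validation lists")
    if grid is None:
        grid = [i / 100 for i in range(101)]
    # Hoeffding slack: holds with probability at least 1 - delta for one threshold.
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    chosen = None
    # Scan from strict to loose; the loss is monotone, so stop at the first failure.
    for lam in sorted(grid, reverse=True):
        errors = sum(1 for c, ok in zip(confidences, correct)
                     if c >= lam and not ok)
        if errors / n + slack > target_risk:
            break
        chosen = lam
    return chosen


if __name__ == "__main__":
    confs = [0.9] * 50 + [0.5] * 50
    ok = [True] * 50 + [False] * 50
    print(calibrate_stop_threshold(confs, ok, target_risk=0.2))
```

A lower threshold lets the model stop earlier on more inputs, which saves tokens, so the scan keeps loosening the threshold for as long as the bound allows.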

2602.03417 2026-05-15 cs.CL

FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding

Yingli Shen, Wen Lai, Jie Zhou, Xueren Zhang, Yudong Wang, Kangyang Luo, Shuo Wang, Ge Gao, Alexander Fraser, Maosong Sun

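AI Summary This paper presents FactNet, a billion-scale multilingual factual knowledge graph that addresses the lack of retrievable supporting evidence when large language models generate content in non-English languages. FactNet couples 1.7 billion Wikidata assertions with 3.01 billion evidence pointers drawn from 316 native-language Wikipedia editions, and its deterministic construction pipeline ensures that every evidence unit can be traced back to its original source. The authors also build FactNet-Bench, an evaluation suite for knowledge graph completion, question answering, and fact checking, and validate FactNet's effectiveness for cross-lingual knowledge transfer.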

Abstract

Large language models hallucinate factual claims and struggle to ground their outputs in retrievable evidence, particularly in non-English languages. Existing resources impose a trade-off: structured knowledge bases lack textual grounding, whereas grounded datasets remain small and monolingual. We introduce FactNet, a billion-scale open resource that couples 1.7B Wikidata assertions with 3.01B evidence pointers drawn from 316 native Wikipedia editions. FactNet employs a deterministic construction pipeline, ensuring that every evidence unit is traceable to its source with byte-level precision. We further establish FactNet-Bench, an evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking, equipped with systematic leakage controls. Experiments demonstrate that FactNet-Bench differentiates among structural, text-aware, and LLM-integrated methods, and that cross-lingual structure enables knowledge transfer across language tiers.
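
The notion of an evidence pointer with byte-level precision can be sketched as a small record holding byte offsets into the UTF-8 encoding of a source page. The class `EvidencePointer`, its fields, and the helper functions below are hypothetical and are not FactNet's actual schema. The example only shows why byte offsets and character offsets differ for non-ASCII text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePointer:
    source_id: str  # hypothetical identifier of the source page
    start: int      # byte offset (inclusive) into the UTF-8 encoded source
    end: int        # byte offset (exclusive)


def make_pointer(source_id: str, source_text: str, evidence: str) -> EvidencePointer:
    """Locate `evidence` in `source_text` and record its byte span."""
    char_index = source_text.find(evidence)
    if char_index < 0:
        raise ValueError("evidence not found in source")
    # Convert the character offset to a byte offset.
    start = len(source_text[:char_index].encode("utf-8"))
    end = start + len(evidence.encode("utf-8"))
    return EvidencePointer(source_id, start, end)


def resolve(pointer: EvidencePointer, source_bytes: bytes) -> str:
    """Recover the evidence text from the raw bytes of the source."""
    return source_bytes[pointer.start:pointer.end].decode("utf-8")


if __name__ == "__main__":
    text = "Zürich ist eine Stadt in der Schweiz."
    ptr = make_pointer("dewiki:Zürich", text, "eine Stadt")
    print(ptr, resolve(ptr, text.encode("utf-8")))
```

Here the evidence begins at character 11 but at byte 12, because "ü" occupies two bytes in UTF-8.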

2602.01664 2026-05-15 cs.AI cs.LG

FlowSteer: Towards Agents Designing Agentic Workflows via Reinforced Progressive Canvas Editing

Mingda Zhang, Wenjin Liu, Tiesunlong Shen, Qika Lin, Rui Mao, Erik Cambria, Xiaoying Tang, Haoran Luo

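AI Summary FlowSteer is a new paradigm in which an agent designs agentic workflows, addressing the reliance on manual effort, the lack of global feedback, and the inability to repair errors online in current workflow construction. The method introduces an executable workflow canvas environment and uses reinforcement learning to apply atomic edits step by step, achieving end-to-end automatic workflow design. Experiments show that FlowSteer significantly outperforms existing methods on multiple datasets and supports a variety of operator libraries and LLM backends, demonstrating good generality and extensibility.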

Comments 51 pages, 6 figures, 5 tables. Project page: http://flowsteer.org/

Abstract

In recent years, agentic workflows have been widely applied to solve complex human tasks. However, existing workflow construction still faces key challenges, including human-dependent workflow construction, the lack of graph-level execution feedback, and the inability to repair errors in-loop during long-horizon construction. To address these challenges, we propose FlowSteer, a new paradigm of Agent Designing Agentic Workflows: a single agent designs, end to end, the workflow that a downstream executor runs. To support this paradigm, we introduce the Workflow Canvas, a novel executable graph-state environment that returns syntax-checked execution feedback for every atomic edit. Built on the canvas, we further propose Reinforced Progressive Canvas Editing, in which a lightweight policy agent issues one atomic edit per turn conditioned on real canvas feedback, and is trained end-to-end via reinforcement learning. Moreover, FlowSteer provides a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends. Experimental results on twelve datasets show that FlowSteer significantly outperforms baselines across various tasks. Our code is available at https://anonymous.4open.science/r/FlowSteer-9B2E.

2602.01359 2026-05-15 cs.LG cs.AI

PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection

Jinju Park, Seokho Kang

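AI Summary Although recent time-series anomaly detection research increasingly adopts large neural architectures such as Transformers and foundation models, these methods are computationally expensive and memory-hungry, hard to apply in real-time and resource-constrained settings, and show no clear performance gains under rigorous evaluation. This paper proposes PaAno, a patch-based representation learning method that extracts short temporal patches from the time series, embeds them into vector representations with a 1D convolutional neural network, and trains with a combination of triplet loss and pretext-task loss to capture useful temporal patterns in the patches. At inference, anomaly scores are computed by comparing the embedding of the current patch with those of normal patches. Experiments show that PaAno performs strongly on the TSB-AD benchmark, significantly outperforming existing methods, including those with large architectures.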

Comments Accepted by the 14th International Conference on Learning Representations (ICLR 2026)

Abstract

Although recent studies on time-series anomaly detection have increasingly adopted ever-larger neural network architectures such as transformers and foundation models, they incur high computational costs and memory usage, making them impractical for real-time and resource-constrained scenarios. Moreover, they often fail to demonstrate significant performance gains over simpler methods under rigorous evaluation protocols. In this study, we propose Patch-based representation learning for time-series Anomaly detection (PaAno), a lightweight yet effective method for fast and efficient time-series anomaly detection. PaAno extracts short temporal patches from time-series training data and uses a 1D convolutional neural network to embed each patch into a vector representation. The model is trained using a combination of triplet loss and pretext loss to ensure the embeddings capture informative temporal patterns from input patches. During inference, the anomaly score at each time step is computed by comparing the embeddings of its surrounding patches to those of normal patches extracted from the training time-series. Evaluated on the TSB-AD benchmark, PaAno achieved state-of-the-art performance, significantly outperforming existing methods, including those based on heavy architectures, on both univariate and multivariate time-series anomaly detection across various range-wise and point-wise performance measures.
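
The inference step described in this abstract (scoring each time step by comparing the embeddings of its surrounding patches with those of normal training patches) can be sketched as a nearest-neighbor lookup. In the sketch below, `embed` is a stand-in that returns the raw patch values in place of the trained 1D CNN. The Euclidean nearest-neighbor distance and the averaging over covering patches are likewise assumptions for illustration, not the paper's exact scoring rule.

```python
import math


def extract_patches(series, patch_len):
    """All sliding windows of length patch_len (stride 1)."""
    return [series[i:i + patch_len] for i in range(len(series) - patch_len + 1)]


def embed(patch):
    # Stand-in for the trained encoder: uses the raw patch as its embedding.
    return list(patch)


def anomaly_scores(train_series, test_series, patch_len):
    """Per-time-step score: mean nearest-normal distance of covering patches."""
    memory = [embed(p) for p in extract_patches(train_series, patch_len)]
    test_patches = extract_patches(test_series, patch_len)
    if not memory or not test_patches:
        raise ValueError("series must be at least patch_len long")
    # Distance from each test patch to its closest normal patch.
    patch_scores = [min(math.dist(embed(p), m) for m in memory)
                    for p in test_patches]
    scores = []
    for t in range(len(test_series)):
        lo = max(0, t - patch_len + 1)        # first patch covering step t
        hi = min(t, len(patch_scores) - 1)    # last patch covering step t
        covering = patch_scores[lo:hi + 1]
        scores.append(sum(covering) / len(covering))
    return scores


if __name__ == "__main__":
    train = [0, 1] * 20
    test = [0, 1, 0, 1, 0, 9, 0, 1, 0, 1]
    print(anomaly_scores(train, test, patch_len=4))
```

In this toy run the spike at index 5 gets the highest score, because every patch covering it is far from both normal patterns seen in training.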

2602.00992 2026-05-15 cs.RO

Geometry-Aware Sampling-Based Motion Planning on Riemannian Manifolds

Phone Thiha Kyaw, Jonathan Kelly

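AI Summary This paper studies geometry-aware sampling-based motion planning on Riemannian manifolds, aiming to plan collision-free trajectories of minimal path length while accounting for the non-Euclidean geometry of the configuration space. The authors propose a sampling-based planning framework that operates directly on Riemannian manifolds, introduce a computationally efficient approximation of the Riemannian geodesic distance, and design a local planner based on Riemannian natural gradients. Experiments show that the method produces better trajectories than conventional Euclidean methods and classical numerical solvers across several robotic systems.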

Comments Accepted to the 17th World Symposium on the Algorithmic Foundations of Robotics (WAFR), Oulu, Finland, Jun 15-17, 2026

Abstract

In many robot motion planning problems, task objectives and physical constraints induce non-Euclidean geometry on the configuration space, yet many planners operate using Euclidean distances that ignore this structure. We address the problem of planning collision-free motions that minimize length under configuration-dependent Riemannian metrics, corresponding to geodesics on the configuration manifold. Conventional numerical methods for computing such paths do not scale well to high-dimensional systems, while sampling-based planners trade scalability for geometric fidelity. To bridge this gap, we propose a sampling-based motion planning framework that operates directly on Riemannian manifolds. We introduce a computationally efficient midpoint-based approximation of the Riemannian geodesic distance and prove that it matches the true Riemannian distance with third-order accuracy. Building on this approximation, we design a local planner that traces the manifold using first-order retractions guided by Riemannian natural gradients. Experiments on a two-link planar arm and a 7-DoF Franka manipulator under a kinetic-energy metric, as well as on rigid-body planning in $\mathrm{SE}(2)$ with non-holonomic motion constraints, demonstrate that our approach consistently produces lower-cost trajectories than Euclidean-based planners and classical numerical geodesic-solver baselines.
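
One natural form of a midpoint-based distance approximation evaluates the metric tensor $G$ at the coordinate midpoint and measures the displacement with it: $d(x, y) \approx \sqrt{(y - x)^\top G\big(\tfrac{x + y}{2}\big)(y - x)}$. The sketch below assumes this form, which the abstract does not spell out, and checks it on the Euclidean plane written in polar coordinates, where $G(r, \theta) = \mathrm{diag}(1, r^2)$ and the true distance is known. The function names are hypothetical.

```python
import math


def polar_metric(q):
    """Metric tensor of the Euclidean plane in polar coordinates (r, theta)."""
    r, _theta = q
    return [[1.0, 0.0], [0.0, r * r]]


def midpoint_distance(x, y, metric):
    """Approximate geodesic distance using the metric at the coordinate midpoint."""
    mid = [(a + b) / 2.0 for a, b in zip(x, y)]
    delta = [b - a for a, b in zip(x, y)]
    G = metric(mid)
    dim = len(delta)
    quad = sum(delta[i] * G[i][j] * delta[j]
               for i in range(dim) for j in range(dim))
    return math.sqrt(quad)


if __name__ == "__main__":
    x, y = (2.0, 0.0), (2.0, 0.1)
    approx = midpoint_distance(x, y, polar_metric)
    # True distance between two points at radius 2 separated by 0.1 rad (chord).
    exact = 2.0 * 2.0 * math.sin(0.05)
    print(approx, exact, abs(approx - exact))
```

A plain Euclidean distance on the coordinates would report 0.1 for these two points regardless of radius, while the midpoint form scales the angular displacement by the radius.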

2602.00807 2026-05-15 cs.CV cs.RO

Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds

Xianzhe Fan, Shengliang Deng, Xiaoyang Wu, Yuxiang Lu, Zhuoling Li, Mi Yan, Yujia Zhang, Zhizheng Zhang, He Wang, Hengshuang Zhao

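AI Summary Existing vision-language-action (VLA) models typically take 2D images as visual input, which limits their spatial understanding in complex scenes. To improve VLA performance, this paper proposes Any3D-VLA, which strengthens 3D perception by introducing diverse point cloud data and fuses simulator, sensor, and model-estimated point clouds during training to learn domain-general 3D representations. Experiments show that the method improves model performance and mitigates the domain gap.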

Comments ICML 2026

Abstract

Existing Vision-Language-Action (VLA) models typically take 2D images as visual input, which limits their spatial understanding in complex scenes. How can we incorporate 3D information to enhance VLA capabilities? We conduct a pilot study across different observation spaces and visual representations. The results show that explicitly lifting visual input into point clouds yields representations that better complement their corresponding 2D representations. To address the challenges of (1) scarce 3D data and (2) the domain gap induced by cross-environment differences and depth-scale biases, we propose Any3D-VLA. It unifies the simulator, sensor, and model-estimated point clouds within a training pipeline, constructs diverse inputs, and learns domain-agnostic 3D representations that are fused with the corresponding 2D representations. Simulation and real-world experiments demonstrate Any3D-VLA's advantages in improving performance and mitigating the domain gap. Our project homepage is available at https://xianzhefan.github.io/Any3D-VLA.github.io.

2602.00520 2026-05-15 cs.LG

NEST: Nested Event Stream Transformer for Sequences of Multisets

Minghui Sun, Haoyu Gong, Xingyu You, Jillian Hurst, Benjamin Goldstein, Matthew Engelhard

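AI Summary Event stream data often have a hierarchical structure, appearing as a sequence of multisets in which multiple events co-occur. Most existing foundation models flatten this structure, which leads to low computational efficiency and poor set-level representations. This paper proposes the Nested Event Stream Transformer (NEST), which preserves the original hierarchy, and introduces Masked Set Modeling (MSM), improving pretraining efficiency and downstream task performance.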

Comments 10-page main text

Abstract

Event stream data often exhibit hierarchical structure in which multiple events co-occur, resulting in a sequence of multisets (i.e., bags of events). In electronic health records (EHRs), for example, medical events are grouped into a sequence of clinical encounters with well-defined temporal structure, but the order and timing of events within each encounter may be unknown or unreliable. Most existing foundation models (FMs) for event stream data flatten this hierarchy into a one-dimensional sequence, leading to (i) computational inefficiency associated with dense attention and learning spurious within-set relationships, and (ii) lower-quality set-level representations from heuristic post-training pooling for downstream tasks. Here, we show that preserving the original hierarchy in the FM architecture provides a useful inductive bias that improves both computational efficiency and representation quality. We then introduce Nested Event Stream Transformer (NEST), an FM for event streams composed of sequences of multisets. Building on this architecture, we formulate Masked Set Modeling (MSM), an efficient paradigm that promotes improved set-level representation learning. Experiments on real-world multiset sequence data show that NEST captures real-world dynamics while improving both pretraining efficiency and downstream performance.

2601.23072 2026-05-15 cs.LG

SplineFlow: Flow Matching for Dynamical Systems with B-Spline Interpolants

Santanu Subhash Rathod, Pietro Liò, Xiao Zhang

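AI Summary This paper proposes SplineFlow, a flow matching algorithm for modeling state evolution in dynamical systems more accurately. The method builds conditional paths with B-spline interpolation, overcoming the shortcomings of conventional linear interpolation on higher-order dynamics and irregularly sampled data, and thereby achieves more stable and smoother dynamics modeling while satisfying multi-marginal constraints. Experiments show that SplineFlow outperforms existing methods on a variety of deterministic and stochastic dynamical systems and on cellular trajectory inference tasks.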

Comments 36 pages, 35 tables, 22 figures

Abstract

Flow matching is a scalable generative framework for characterizing continuous normalizing flows with wide-ranging applications. However, current state-of-the-art methods are not well-suited for modeling dynamical systems, as they construct conditional paths using linear interpolants that may not capture the underlying state evolution, especially when learning higher-order dynamics from irregularly sampled observations. Constructing unified paths that satisfy multi-marginal constraints across observations is challenging, since naïve higher-order polynomials tend to be unstable and oscillatory. We introduce SplineFlow, a theoretically grounded flow matching algorithm that jointly models conditional paths across observations via B-spline interpolation. Specifically, SplineFlow exploits the smoothness and stability of B-spline bases to learn the complex underlying dynamics in a structured manner while ensuring the multi-marginal requirements are met. Comprehensive experiments across various deterministic and stochastic dynamical systems of varying complexity, as well as on cellular trajectory inference tasks, demonstrate the strong improvement of SplineFlow over existing baselines. Our code is available at: https://github.com/santanurathod/SplineFlow.

2601.21656 2026-05-15 cs.LG

TabClustPFN: A Prior-Fitted Network for Tabular Data Clustering

Tianqi Zhao, Guanyang Wang, Yan Shuo Tan, Qiong Zhang

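AI Summary This paper proposes TabClustPFN, a new network for tabular data clustering, a fundamental yet challenging problem. Built on prior-fitted networks (PFNs) and pretrained on synthetic data, the method clusters unseen datasets in one shot, without retraining or hyperparameter tuning. TabClustPFN handles heterogeneous numerical and categorical features and adapts to a variety of clustering structures. Experiments show that it outperforms classical and deep clustering methods on synthetic and real-world datasets, with good robustness and practicality.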

Abstract

Clustering tabular data is a fundamental yet challenging problem due to heterogeneous feature types, diverse data-generating mechanisms, and the absence of transferable inductive biases across datasets. Prior-fitted networks (PFNs) have recently demonstrated strong generalization in supervised tabular learning by amortizing Bayesian inference under a broad synthetic prior. Extending this paradigm to clustering is nontrivial: clustering is unsupervised, admits a combinatorial and permutation-invariant output space, and requires inferring the number of clusters. We introduce TabClustPFN, a prior-fitted network for tabular data clustering that performs amortized Bayesian inference over both cluster assignments and cluster cardinality. Pretrained on synthetic datasets drawn from a flexible clustering prior, TabClustPFN clusters unseen datasets in a single forward pass, without dataset-specific retraining or hyperparameter tuning. The model naturally handles heterogeneous numerical and categorical features and adapts to a wide range of clustering structures. Experiments on synthetic data and curated real-world tabular benchmarks show that TabClustPFN outperforms classical, deep, and amortized clustering baselines, while exhibiting strong robustness in out-of-the-box exploratory settings. Code is available at https://github.com/Tianqi-Zhao/TabClustPFN.

2601.21349 2026-05-15 cs.LG cs.AI

L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts

Minghao Yang, Ren Togo, Guang Li, Takahiro Ogawa, Miki Haseyama

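AI Summary This paper proposes L2R, a unified routing framework that improves the routing mechanism in Mixture-of-Experts (MoE) models. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of the routing function, making the routing geometry smoother and more stable. L2R also uses a parameter-efficient multi-anchor routing mechanism to enhance expert expressiveness. Experiments show that L2R improves routing performance and overall model performance on both language and vision tasks.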

Abstract

Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small subset of experts, where the router plays a central role in determining expert specialization and overall model performance. However, many modern MoE systems still adopt linear routers in raw high-dimensional representation spaces, where representation mismatch, angular concentration, and scale-sensitive scoring can jointly undermine routing discriminability and stable expert specialization. In this work, we propose Low-rank & Lipschitz-controlled Routing (L2R), a unified routing framework that reshapes both the routing space and scoring geometry. L2R performs expert assignment in a shared low-rank latent routing space and introduces Saturated Inner-Product Scoring (SIPS) to explicitly control the Lipschitz behavior of routing functions, yielding smoother and more stable routing geometry. In addition, L2R incorporates a parameter-efficient multi-anchor routing mechanism to enhance expert expressiveness. Extensive experiments on an OLMoE-based language MoE model and a vision MoE setting on ImageNet demonstrate that L2R consistently improves routing geometry, expert discrimination, and overall model performance. Code will be released.

2601.21174 2026-05-15 cs.LG

Breaking the Reasoning Horizon in Entity Alignment Foundation Models

Yuanning Cui, Zequn Sun, Wei Hu, Kexuan Xin, Zhangjie Fu

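AI Summary Entity alignment is a key task in knowledge graph fusion, but existing models lack transferability to unseen knowledge graphs. This paper proposes an entity alignment foundation model based on a parallel encoding strategy, which uses seed alignment pairs as local anchors to guide information flow and initializes two parallel encoding streams simultaneously, effectively shortening the inference path and improving adaptation to sparse, heterogeneous structures. The model also introduces a merged relation graph and a learnable interaction module to model global dependencies and achieve precise matching. Experiments show that the method generalizes well to unseen knowledge graphs.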

Abstract

Entity alignment (EA) is critical for knowledge graph (KG) fusion. Existing EA models lack transferability and are incapable of aligning unseen KGs without retraining. While using graph foundation models (GFMs) offers a solution, we find that directly adapting GFMs to EA remains largely ineffective. This stems from a critical "reasoning horizon gap": unlike link prediction in GFMs, EA necessitates capturing long-range dependencies across sparse and heterogeneous KG structures. To address this challenge, we propose an EA foundation model driven by a parallel encoding strategy. We utilize seed EA pairs as local anchors to guide the information flow, initializing and encoding two parallel streams simultaneously. This facilitates anchor-conditioned message passing and significantly shortens the inference trajectory by leveraging local structural proximity instead of global search. Additionally, we incorporate a merged relation graph to model global dependencies and a learnable interaction module for precise matching. Extensive experiments verify the effectiveness of our framework, highlighting its strong generalizability to unseen KGs.

2601.21151 2026-05-15 cs.LG physics.ao-ph

Learning to Advect: A Neural Semi-Lagrangian Architecture for Weather Forecasting

Carlos A. Pereira, Stéphane Gaudreault, Valentin Dallerit, Christopher Subich, Shoyon Panday, Siqi Wei, Sasa Zhang, Siddharth Rout, Eldad Haber, Raymond J. Spiteri, David Millard, Emilia Diaconescu

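AI Summary This study proposes PARADIS, a physics-inspired weather prediction model that aims to address the efficiency and accuracy problems of conventional machine learning methods in representing physical processes such as atmospheric transport. Its core idea is to decompose weather dynamics into three blocks (advection, diffusion, and reaction) and to model global transport along trajectories with a neural semi-Lagrangian operator, improving forecast performance while preserving physical structure. Experiments show that PARADIS delivers good deterministic forecast skill on ERA5 benchmarks, with notable advantages in short-lead forecasts and in spectral fidelity for medium- and long-range forecasts.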

Abstract

Recent machine-learning approaches to weather forecasting often employ a monolithic architecture in which distinct physical mechanisms (advection, i.e., long-range transport; diffusion-like mixing; thermodynamic processes; and forcing) are represented implicitly within a single large network. This is particularly problematic for advection, where long-range transport typically requires expensive global interaction mechanisms or deep stacks of local convolutional layers. To mitigate this, we present PARADIS, a physics-inspired global weather prediction model that enforces inductive biases on network behavior through a functional decomposition into advection, diffusion, and reaction blocks acting on latent variables. We implement advection through a Neural Semi-Lagrangian operator that performs trajectory-based transport via differentiable interpolation on the sphere, enabling end-to-end learning of both the latent modes to be transported and their characteristic trajectories. Diffusion-like processes are modeled by depthwise-separable spatial mixing, whereas local source terms and vertical interactions are handled via pointwise channel interactions, yielding a physically structured operator decomposition. Evaluated on ERA5 benchmarks, PARADIS achieves competitive deterministic forecast skill, with particularly strong short-lead performance, while preserving substantially better spectral fidelity and forecast activity during medium-range rollouts.
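
For readers unfamiliar with semi-Lagrangian transport, the classical scheme that the Neural Semi-Lagrangian operator builds on can be sketched in one dimension: each grid point traces back along the velocity to a departure point and takes the interpolated field value there. The sketch below uses a periodic 1D grid, a prescribed velocity, and linear interpolation. The learned trajectories, latent modes, and spherical geometry of PARADIS are not modeled, and the function name is hypothetical.

```python
import math


def semi_lagrangian_step(field, velocity, dt, dx):
    """One classical semi-Lagrangian advection step on a periodic 1D grid."""
    n = len(field)
    out = []
    for i in range(n):
        # Departure point of the parcel arriving at grid point i, in index units.
        s = i - velocity[i] * dt / dx
        j = math.floor(s)
        w = s - j  # fractional position between grid points j and j + 1
        left = field[j % n]          # periodic wrap-around
        right = field[(j + 1) % n]
        out.append((1.0 - w) * left + w * right)
    return out


if __name__ == "__main__":
    bump = [0, 0, 1, 0, 0, 0]
    print(semi_lagrangian_step(bump, [1.0] * 6, dt=1.0, dx=1.0))
    print(semi_lagrangian_step(bump, [0.5] * 6, dt=1.0, dx=1.0))
```

Because each output value is read from a departure point rather than built up through neighbor-to-neighbor stencils, a single step can move information across many grid cells. This is the property that makes the scheme attractive for long-range transport.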

2601.19924 2026-05-15 cs.CL cs.AI cs.LG

OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

Yitian Chen, Cheng Cheng, Yinan Sun, Zi Ling, Dongdong Ge

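AI Summary This paper studies the performance and scalability of large language models (LLMs) in optimization modeling and proposes OPT-ENGINE, an extensible benchmark framework for systematically evaluating automated formulation and solving of classical operations research problems ranging from linear programming to mixed-integer programming. Using the framework, the study finds that pure-text reasoning methods lack robustness as task complexity increases, while integrating external computational tools improves local calculations but still fails to satisfy global optimization constraints. The study further notes that, for the current state-of-the-art solver-integrated reasoning approach, the automated formulation of constraints remains the main bottleneck, which gives a clear direction for next-generation LLMs for optimization modeling.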

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
Abstract

We investigate the capabilities and scalability of Large Language Models (LLMs) in optimization modeling, a domain requiring structured reasoning and precise formulation. To this end, we introduce OPT-ENGINE, an extensible benchmark framework with quantifiable and controllable complexity. OPT-ENGINE spans ten canonical Operations Research problems, systematically scaling from Linear Programming to Mixed-Integer Programming, providing a structured environment to probe the limits of automated problem formulation and solving. Utilizing OPT-Engine, we address three pivotal research questions. First, we examine whether Pure-Text Reasoning (PTR) via classical Chain-of-Thought can efficiently tackle optimization tasks, finding that PTR suffers from a critical robustness gap as task complexity increases. Second, we examine whether integrating external computational tools can mitigate PTR's arithmetic weaknesses and improve performance. Our results indicate that while such tools help with local calculations, they still fail to adhere to global optimization constraints. Finally, we pinpoint that for the current SOTA paradigm, Solver-integrated Reasoning (SIR), the automated formulation of constraints represents the primary bottleneck. These findings clarify the limitations of current paradigms and provide a structured roadmap for developing next-generation LLMs for optimization modeling. We release our code and data to facilitate future research (https://github.com/Cardinal-Operations/OPTEngine).

2601.15620 2026-05-15 cs.LG

Closing the Gap on the Sample Complexity of 1-Identification

Zitian Li, Wang Chi Cheung

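AI Summary This paper studies the 1-identification problem in multi-armed bandits: determine whether there exists an arm whose mean reward exceeds a given threshold $μ_0$, and otherwise output "None". The authors propose a new optimization formulation, derive a lower bound on the minimal sample complexity for instances with at least one qualified arm, and design a new algorithm whose upper bound matches the lower bound up to polylogarithmic factors, closing the gap in the sample complexity analysis of this problem.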

Abstract

The 1-identification problem is a fundamental pure-exploration problem in multi-armed bandits. An agent aims to determine whether there exists an arm whose mean reward exceeds a known threshold $μ_0$, or to output \textsf{None} otherwise. The agent must guarantee correctness with probability at least $1-δ$, while minimizing the expected number of arm pulls $\mathbb{E}[τ]$. We study the 1-identification problem and make two main contributions. First, for instances with at least one qualified arm, we derive a new lower bound on $\mathbb{E}[τ]$ via a novel optimization formulation. Second, we propose a new algorithm and establish upper bounds that match the lower bounds up to polylogarithmic factors uniformly over all instances. Our result complements the analysis of $\mathbb{E}[τ]$ when there are multiple qualified arms, which is an open problem in the literature.

2601.03969 2026-05-15 cs.AI cs.CL

Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models

Wei Wu, Liyi Chen, Congxi Xiao, Tianfu Wang, Qimeng Wang, Chengqiang Lu, Yan Gao, Yi Wu, Yao Hu, Hui Xiong

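AI Summary This paper studies the "length shift" phenomenon that reinforcement learning reward mechanisms cause during the training of large language models, in which the model generates redundant reasoning on simple problems. The authors propose Dynamic Outlier Truncation (DOT), which selectively suppresses redundant output during training while preserving long reasoning on complex problems. Combined with auxiliary KL regularization and predictive dynamic sampling, the method improves both reasoning efficiency and performance. Experiments show that it significantly outperforms existing methods on multiple tasks.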

Comments Accepted by ACL 2026

Abstract

Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved significant performance gains by extending their chain-of-thought. However, this paradigm incurs substantial deployment costs as models often exhibit excessive verbosity on simple queries. Existing efficient reasoning methods relying on explicit length penalties often introduce optimization conflicts and leave the generative mechanisms driving overthinking largely unexamined. In this paper, we identify a phenomenon termed length shift where models increasingly generate unnecessary reasoning on trivial inputs during training. To address this, we introduce Dynamic Outlier Truncation (DOT), a training-time intervention that selectively suppresses redundant tokens. This method targets only the extreme tail of response lengths within fully correct rollout groups while preserving long-horizon reasoning capabilities for complex problems. To complement this intervention and ensure stable convergence, we further incorporate auxiliary KL regularization and predictive dynamic sampling. Experimental results across multiple model scales demonstrate that our approach significantly pushes the efficiency-performance Pareto frontier outward. Notably, on the AIME-24, our method reduces inference token usage by 78% while simultaneously increasing accuracy compared to the initial policy and surpassing state-of-the-art efficient reasoning methods.
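
The selection rule described in this abstract (act only on the extreme tail of response lengths, and only within rollout groups where every response is correct) can be sketched as follows. The outlier criterion used here, mean plus `k` population standard deviations, is an assumption chosen for illustration. The abstract does not state how DOT defines the tail, and the function name is hypothetical.

```python
import statistics


def dot_truncation_limit(lengths, correct, k=2.0):
    """Return a length cap for one rollout group, or None if nothing is truncated.

    lengths[i]: token count of rollout i
    correct[i]: whether rollout i reached a verified correct answer
    """
    if len(lengths) != len(correct):
        raise ValueError("lengths and correct must have the same size")
    # Leave groups with any incorrect rollout untouched.
    if len(lengths) < 2 or not all(correct):
        return None
    mean = statistics.fmean(lengths)
    std = statistics.pstdev(lengths)
    limit = mean + k * std  # hypothetical outlier rule
    # Only report a cap when some rollout actually exceeds it.
    return limit if max(lengths) > limit else None


if __name__ == "__main__":
    group = [100, 110, 90, 105, 95, 100, 102, 98, 1000]
    print(dot_truncation_limit(group, [True] * 9))
    print(dot_truncation_limit(group, [True] * 8 + [False]))
```

Restricting the rule to fully correct groups means hard problems, where some rollouts still fail, keep their long reasoning traces.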

2601.03630 2026-05-15 cs.CL

Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases

Hui Huang, Xuanxin Wu, Muyun Yang, Yuki Arase

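AI Summary This paper presents the first systematic comparison of large reasoning models (LRMs) and non-reasoning large language models (LLMs) on judgment tasks, finding that LRMs outperform non-reasoning models in judgment accuracy, instruction following, and robustness to adversarial attacks, but also exhibit strong evaluation biases. The authors therefore propose PlanJudge, a lightweight evaluation strategy that guides the model to generate an explicit evaluation plan before judging, which mitigates the biases while preserving overall judgment accuracy.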

Comments Accepted by ACL 2026 Workshop EvalEval

Abstract

This paper presents the first systematic comparison investigating whether Large Reasoning Models (LRMs) are superior judges to non-reasoning LLMs. Our empirical analysis yields four key findings: 1) LRMs outperform non-reasoning LLMs in terms of judgment accuracy, particularly on reasoning-intensive tasks; 2) LRMs demonstrate superior evaluation instruction-following capabilities; 3) LRMs exhibit enhanced robustness against adversarial attacks targeting judgment tasks; 4) However, LRMs still exhibit strong evaluation biases. To mitigate this bias vulnerability, we propose PlanJudge, a lightweight evaluation strategy that prompts the model to generate an explicit evaluation plan before executing the judgment. Despite its simplicity, our experiments demonstrate that PlanJudge significantly mitigates biases in LLM-as-a-Judge while preserving overall judgment accuracy.

2601.01972 2026-05-15 cs.CL cs.AI cs.LG

Hidden State Poisoning Attacks against Mamba-based Language Models

Alexandre Le Mercier, Chris Develder, Thomas Demeester

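AI Summary This paper studies Hidden State Poisoning Attacks (HiSPA) against language models built on Mamba-style state space models (SSMs), in which specific short input phrases irreversibly overwrite information in the model's hidden state, causing partial forgetting. The study introduces RoBench-25, a benchmark for evaluating a model's information retrieval ability under HiSPA, and confirms the vulnerability of SSMs to the attack, including the recent hybrid models Jamba-1.7-Mini and Nemotron-3-Nano. It also analyzes the effect of HiSPA on other benchmarks and proposes an analysis of hidden-layer patterns that could be used to mitigate the attack.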

Comments 29 pages, 4 figures

Abstract

State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unexplored. This paper studies the phenomenon whereby specific short input phrases induce a partial amnesia effect in such models, by irreversibly overwriting information in their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms the vulnerability of SSMs against such attacks. Even the recent Jamba-1.7-Mini SSM--Transformer (a 52B hybrid model) collapses on RoBench-25 under some HiSPA triggers, whereas pure Transformers do not. We also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure Transformers. We further show that the theoretical and empirical findings extend to Mamba-2, and also analyse a Mamba-2-based hybrid (Nemotron-3-Nano). Finally, our interpretability study reveals patterns in Mamba's hidden layers during HiSPAs that could be used to build a HiSPA mitigation system. The full code and data to reproduce the experiments can be found at https://anonymous.4open.science/r/hispa_anonymous-5DB0.

2512.22331 2026-05-15 cs.CV cs.AI

The Multi-View Paradigm Shift in MRI Radiomics: Predicting MGMT Methylation in Glioblastoma

Mariya Miteva, Maria Nisheva-Pavlova

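AI summary This study aims to non-invasively predict MGMT promoter methylation status in glioblastoma (GBM) from multimodal magnetic resonance imaging (MRI) data, which is important for prognosis and treatment. To address the feature redundancy and weak modality-specific modeling of conventional unimodal and early-fusion approaches, the authors propose a multi-view latent representation learning framework based on variational autoencoders (VAEs) that preserves the imaging features of each modality in a compact probabilistic latent space and enables late fusion. Experiments show that the method, combined with a random forest classifier, reaches a test AUC of 0.77, substantially outperforming both a baseline model and a hyperparameter-tuned model, which supports the effectiveness of multi-view probabilistic encoding for integrating complementary MRI information and improving predictive performance.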

Comments 17 pages, 4 figures

Details
English abstract

Non-invasive inference of molecular tumor characteristics from medical imaging is a central goal of radiogenomics, particularly in glioblastoma (GBM), where O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation carries important prognostic and therapeutic significance. Although radiomics-based machine learning methods have shown promise for this task, conventional unimodal and early-fusion approaches are often limited by high feature redundancy and incomplete modeling of modality-specific information. In this work, we introduce a multi-view latent representation learning framework based on variational autoencoders (VAE) that preserves modality-specific radiomic structure while enabling late fusion in a compact probabilistic latent space. The approach is evaluated on radiomic features extracted from the necrotic tumor core in post-contrast T1-weighted (T1Gd) and Fluid-Attenuated Inversion Recovery (FLAIR) Magnetic Resonance Imaging (MRI). Experimental results demonstrate that the proposed multi-view VAE combined with a random forest classifier achieves a test Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 0.77 (95% confidence interval: 0.71-0.83), substantially outperforming both a baseline radiomics model (AUC = 0.54) and a hyperparameter-tuned model (AUC = 0.64). These findings indicate that multi-view probabilistic encoding enables more effective integration of complementary MRI information and significantly improves predictive performance for MGMT promoter methylation status.

2512.22317 2026-05-15 cs.LG cs.AI cs.CV

LangPrecip: Language-Aware Multimodal Precipitation Nowcasting

Xudong Ling, Chaorong Li, Tianxi Huang, Qian Dong, Guiduo Duan

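AI summary Short-term precipitation nowcasting is a highly uncertain and under-constrained spatiotemporal forecasting problem, especially for rapidly evolving extreme weather events. This paper proposes LangPrecip, a language-aware multimodal nowcasting framework that treats meteorological text as a semantic motion constraint on precipitation evolution and, under the Rectified Flow paradigm, fuses text and radar information efficiently in latent space. The study also builds LangPrecip-160k, a large-scale multimodal dataset of 160k paired radar sequences and motion descriptions, and validates the method on the Swedish and MRMS datasets, with marked gains in heavy-rainfall forecasting.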

Details
English abstract

Short-term precipitation nowcasting is an inherently uncertain and under-constrained spatiotemporal forecasting problem, especially for rapidly evolving and extreme weather events. Existing generative approaches rely primarily on visual conditioning, leaving future motion weakly constrained and ambiguous. We propose a language-aware multimodal nowcasting framework (LangPrecip) that treats meteorological text as a semantic motion constraint on precipitation evolution. By formulating nowcasting as a semantically constrained trajectory generation problem under the Rectified Flow paradigm, our method enables efficient and physically consistent integration of textual and radar information in latent space. We further introduce LangPrecip-160k, a large-scale multimodal dataset with 160k paired radar sequences and motion descriptions. Experiments on Swedish and MRMS datasets show consistent improvements over state-of-the-art methods, achieving over 60% and 19% gains in heavy-rainfall CSI at an 80-minute lead time.
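For readers unfamiliar with the metric quoted above, the Critical Success Index (CSI) is commonly defined as hits / (hits + misses + false alarms) for events above a rain-rate threshold. The sketch below uses that standard definition; the threshold and the sample values are hypothetical and are not taken from the paper's evaluation protocol.

```python
def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) for events >= threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        forecast_event = f >= threshold
        observed_event = o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
        # Correct negatives do not enter the score.
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else float("nan")


# Hypothetical rain rates (mm/h) at four grid cells, heavy-rain threshold 8.
forecast = [10.0, 0.0, 12.0, 3.0]
observed = [11.0, 9.0, 2.0, 1.0]
csi = critical_success_index(forecast, observed, threshold=8.0)
print(csi)
```

Because correct negatives are ignored, CSI is well suited to rare events such as heavy rainfall, where most grid cells are dry.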

2512.12083 2026-05-15 cs.CV

RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model

Guanfang Dong, Luke Schultz, Negar Hassanpour, Chao Gao

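AI summary This study proposes "RePack then Refine", a three-stage framework for efficiently using the semantically rich features of Vision Foundation Models (VFMs) to improve Diffusion Transformers (DiTs). The RePack module compresses high-dimensional VFM features onto a low-dimensional manifold, removing redundancy while preserving structural information; a standard DiT is then trained on the compressed latent space; finally, a Latent-Guided Refiner restores the high-frequency details lost during compression. Experiments show that the method reaches an FID of 1.65 on ImageNet-1K after only 64 training epochs, significantly outperforming existing diffusion models.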

Details
English abstract

Semantic-rich features from Vision Foundation Models (VFMs) have been leveraged to enhance Latent Diffusion Models (LDMs). However, raw VFM features are typically high-dimensional and redundant, increasing the difficulty of learning and reducing training efficiency for Diffusion Transformers (DiTs). In this paper, we propose RePack then Refine, a three-stage framework that brings the semantic-rich VFM features to DiT while further improving learning efficiency. Specifically, the RePack module projects the high-dimensional features onto a compact, low-dimensional manifold. This filters out the redundancy while preserving essential structural information. A standard DiT is then trained for generative modeling on this highly compressed latent space. Finally, to restore the high-frequency details lost due to the compression in RePack, we propose a Latent-Guided Refiner, which is trained last to enhance image details. On ImageNet-1K, RePack-DiT-XL/1 achieves an FID of 1.82 in only 64 training epochs. With the Refiner module, performance further improves to an FID of 1.65, significantly surpassing the latest LDMs in terms of convergence efficiency. Our results demonstrate that packing VFM features, followed by targeted refinement, is a highly effective strategy for balancing generative fidelity with training efficiency. Source code is publicly available at https://github.com/guanfangdong/RePack-then-Refine.

2512.11855 2026-05-15 cs.LG cs.AI

Achieving Approximate Symmetry Is Exponentially Easier than Exact Symmetry

Behrooz Tahmasebi, Melanie Weber

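AI summary This paper studies the difference in cost between enforcing exact symmetry and approximate symmetry in machine learning, and proposes the "averaging complexity" framework to quantify the cost of symmetry constraints. It finds that, under standard conditions, exact symmetry requires linear averaging complexity, whereas approximate symmetry needs only logarithmic complexity, an exponential gap. This result gives the first theoretical explanation of why approximate symmetry may be preferable in practice, and provides new tools for further study of symmetry in machine learning.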

Comments 33 pages, 2 figures. Published at ICLR 2026

Details
Journal ref
International Conference on Learning Representations (ICLR) 2026
English abstract

Enforcing exact symmetry in machine learning models often yields significant gains in scientific applications, serving as a powerful inductive bias. However, recent work suggests that relying on approximate symmetry can offer greater flexibility and robustness. Despite promising empirical evidence, there has been little theoretical understanding, and in particular, a direct comparison between exact and approximate symmetry is missing from the literature. In this paper, we initiate this study by asking: What is the cost of enforcing exact versus approximate symmetry? To address this question, we introduce averaging complexity, a framework for quantifying the cost of enforcing symmetry via averaging. Our main result is an exponential separation: under standard conditions, exact symmetry requires linear averaging complexity, whereas approximate symmetry can be attained with only logarithmic complexity in the group size. To the best of our knowledge, this provides the first theoretical separation of these two cases, formally justifying why approximate symmetry may be preferable in practice. Beyond this, our tools and techniques may be of independent interest for the broader study of symmetries in machine learning.
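To make "enforcing symmetry via averaging" concrete, the sketch below averages a non-invariant function over a cyclic shift group. Averaging over all group elements gives an exactly invariant function at a cost linear in the group size; averaging over a small random sample is shown only as an illustrative stand-in for a cheaper, approximately invariant alternative. It is not the paper's construction and is not claimed to achieve the logarithmic bound. The function `f` and all names are hypothetical.

```python
import random


def rotate(x, k):
    """Cyclic shift of a tuple by k positions (action of the cyclic group)."""
    k %= len(x)
    return x[k:] + x[:k]


def f(x):
    """Arbitrary non-invariant test function: position-weighted sum."""
    return sum(i * v for i, v in enumerate(x))


def average_over(fn, x, shifts):
    """Average fn over the given shifts; one evaluation of fn per shift."""
    shifts = list(shifts)
    return sum(fn(rotate(x, k)) for k in shifts) / len(shifts)


def exact_symmetrize(fn, x):
    """Average over the whole group: len(x) evaluations, exactly invariant."""
    return average_over(fn, x, range(len(x)))


def sampled_symmetrize(fn, x, num_samples, rng):
    """Average over randomly sampled shifts: fewer evaluations, approximate."""
    shifts = [rng.randrange(len(x)) for _ in range(num_samples)]
    return average_over(fn, x, shifts)


x = (3, 1, 4, 1, 5, 9, 2, 6)
exact = exact_symmetrize(f, x)
exact_shifted = exact_symmetrize(f, rotate(x, 3))
approx = sampled_symmetrize(f, x, num_samples=3, rng=random.Random(0))
print(exact, exact_shifted, approx)
```

The exact average returns the same value for `x` and for any rotation of `x`, while the sampled average generally does not; the trade-off between the number of evaluations and the residual asymmetry is what a cost measure of this kind is meant to capture.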

2512.07461 2026-05-15 cs.CL

Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Tong Wu, Yang Liu, Jun Bai, Zixia Jia, Shuyi Zhang, Ziyong Lin, Yanting Wang, Song-Chun Zhu, Zilong Zheng

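AI summary This paper proposes Native Parallel Reasoner (NPR), a teacher-free framework that lets large language models self-evolve genuine parallel reasoning capabilities. Through self-distilled progressive training, a parallel-aware policy optimization algorithm, and a reworked inference engine, NPR moves the model from sequential reasoning to native parallel cognition. Experiments show that NPR trained on Qwen3-4B improves performance by up to 24.5% across eight reasoning benchmarks, speeds up inference by up to 4.6x, and achieves 100% genuine parallel execution, setting a new standard for efficient and scalable agentic reasoning.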

Details
English abstract

We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the model from sequential emulation to native parallel cognition through three key innovations: 1) a self-distilled progressive training paradigm that transitions from "cold-start" format discovery to strict topological constraints without external supervision; 2) a novel Parallel-Aware Policy Optimization (PAPO) algorithm that optimizes branching policies directly within the execution graph, allowing the model to learn adaptive decomposition via trial and error; and 3) a robust NPR Engine that refactors memory management and flow control of SGLang to enable stable, large-scale parallel RL training. Across eight reasoning benchmarks, NPR trained on Qwen3-4B achieves performance gains of up to 24.5% and inference speedups up to 4.6x. Unlike prior baselines that often fall back to autoregressive decoding, NPR demonstrates 100% genuine parallel execution, establishing a new standard for self-evolving, efficient, and scalable agentic reasoning.

2512.03637 2026-05-15 cs.SD cs.LG stat.ML

AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers

Kohei Yamamoto, Kosuke Okusa

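AI summary This study proposes AaSP, a self-supervised pre-training framework for audio spectrogram Transformers that addresses the aliasing caused by temporal downsampling in conventional methods. AaSP combines an aliasing-aware patch representation, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization to learn audio representations that integrate features from alias-prone bands and remain stable across masked views. Experiments show that AaSP performs well on multiple audio recognition tasks, outperforming the compared self-supervised baselines on several of them.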

Comments Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing (TASLP). Copyright IEEE

Details
English abstract

Transformer-based audio self-supervised learning (SSL) models commonly use spectrograms, vision-style Transformers, and masked modeling objectives. However, convolutional patchification with temporal downsampling lowers the effective Nyquist frequency and introduces aliasing, while naïve low-pass filtering may remove task-relevant high-frequency cues. We present AaSP, an aliasing-aware self-supervised pre-training framework for audio spectrogram transformers. AaSP combines an aliasing-aware patch representation, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization to learn representations that integrate features from alias-prone modulation bands while remaining stable across masked views. Its patch-embedding module, Aliasing-aware Patch Embedding (AaPE), augments standard patch tokens with features from alias-prone modulation bands using a band-limited complex sinusoidal kernel with a two-sided exponential window. The kernel's frequency and decay parameters are estimated from the input, enabling adaptive subband analysis whose outputs are fused with standard patch tokens. We pre-train on AudioSet and evaluate the learned representations by fine-tuning and linear evaluation on acoustic/environmental, speech, and music recognition benchmarks. Under fine-tuning, the full AaSP framework achieves state-of-the-art results on AS-20K, ESC-50, and NSynth among compared self-supervised baselines, while remaining competitive elsewhere. Linear evaluation shows a similar trend, including gains on US8K and NSynth. Overall, AaSP learns representations that are more stable under aliasing-sensitive temporal perturbations and competitive for downstream transfer.
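The statement that temporal downsampling lowers the effective Nyquist frequency and introduces aliasing can be checked with a small numeric sketch. The frame rate and modulation frequency below are hypothetical; the folding rule is the standard one for real sinusoids and is independent of the AaSP method itself.

```python
import math


def alias_frequency(freq, sample_rate):
    """Apparent frequency of a real sinusoid of frequency freq after sampling."""
    folded = freq % sample_rate
    return min(folded, sample_rate - folded)


def sample_cosine(freq, sample_rate, num_samples):
    """Samples of cos(2*pi*freq*t) taken at t = k / sample_rate."""
    return [math.cos(2 * math.pi * freq * k / sample_rate)
            for k in range(num_samples)]


# Hypothetical setup: spectrogram frames at 100 Hz, a 30 Hz temporal modulation.
frame_rate = 100
modulation = 30

# At the original frame rate the Nyquist frequency is 50 Hz: no aliasing.
before = alias_frequency(modulation, frame_rate)

# Downsampling by 2 halves the rate to 50 Hz (Nyquist 25 Hz): 30 Hz folds to 20 Hz.
after = alias_frequency(modulation, frame_rate // 2)

# The downsampled 30 Hz tone is sample-for-sample identical to a 20 Hz tone.
tone_30 = sample_cosine(30, 50, 16)
tone_20 = sample_cosine(20, 50, 16)
print(before, after)
```

Once the two tones coincide after downsampling, no later layer can tell them apart, which is why a plain low-pass filter (removing the 30 Hz component) and an aliasing-aware representation (keeping it in a separate band) lead to different trade-offs.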

2512.03532 2026-05-15 cs.CV

OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation

Zhishan Zhou, Siyuan Wei, Zengran Wang, Chunjie Wang, Xiaosheng Yan, Xiao Liu

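AI summary OpenTrack3D is an open-vocabulary 3D instance segmentation framework that aims to improve the accuracy and generalization of 3D object segmentation in complex, unstructured, mesh-free environments. It introduces a visual-spatial tracker that builds cross-view consistent object proposals online and extracts instance features using depth information and DINO feature maps, giving an efficient segmentation pipeline that needs no mesh. OpenTrack3D also replaces CLIP with a multi-modal large language model, markedly improving semantic understanding of complex user queries; experiments show state-of-the-art performance on multiple benchmark datasets.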

Details
English abstract

Generalizing open-vocabulary 3D instance segmentation (OV-3DIS) to diverse, unstructured, and mesh-free environments is crucial for robotics and AR/VR, yet remains a significant challenge. We attribute this to two key limitations of existing methods: (1) proposal generation relies on dataset-specific proposal networks or mesh-based superpoints, rendering them inapplicable in mesh-free scenarios and limiting generalization to novel scenes; and (2) CLIP-based classifiers have weak textual reasoning and struggle to recognize compositional and functional user queries. To address these issues, we introduce OpenTrack3D, a generalizable and accurate framework. Unlike methods that rely on pre-generated proposals, OpenTrack3D employs a novel visual-spatial tracker to construct cross-view consistent object proposals online. Given an RGB-D stream, our pipeline first leverages a 2D open-vocabulary segmenter to generate masks, which are lifted to 3D point clouds using depth. Mask-guided instance features are then extracted using DINO feature maps, and our tracker fuses visual and spatial cues to maintain instance consistency. The core pipeline is entirely mesh-free, yet we also provide an optional superpoint refinement module to further enhance performance when a scene mesh is available. Finally, we replace CLIP with a multi-modal large language model (MLLM), significantly enhancing compositional reasoning for complex user queries. Extensive experiments on diverse benchmarks, including ScanNet200, Replica, ScanNet++, and SceneFun3D, demonstrate state-of-the-art performance and strong generalization capabilities.

2512.02482 2026-05-15 cs.CV

G-SHARP: Gaussian Surgical Hardware Accelerated Real-time Pipeline

Vishwesh Nath, Javier G. Tejero, Aravind S. Kumar, Ruilong Li, Filippo Filicori, Mahdi Azizian, Sean D. Huver

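AI summary This paper presents G-SHARP, a real-time surgical scene reconstruction framework for minimally invasive procedures that need fast and accurate 3D modeling of deformable tissue. Built on the open-source GSplat (Apache-2.0) differentiable Gaussian rasterizer, it provides principled deformation modeling, robust occlusion handling, and high-fidelity reconstruction, and achieves leading reconstruction quality on the EndoNeRF dataset. The authors also provide a Holoscan SDK application that can be deployed on NVIDIA IGX Orin and Thor edge devices, supporting real-time surgical visualization in practical operating-room settings.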

Details
English abstract

We propose G-SHARP, a commercially compatible, real-time surgical scene reconstruction framework designed for minimally invasive procedures that require fast and accurate 3D modeling of deformable tissue. While recent Gaussian splatting approaches have advanced real-time endoscopic reconstruction, existing implementations often depend on non-commercial derivatives, limiting deployability. G-SHARP overcomes these constraints by being the first surgical pipeline built natively on the GSplat (Apache-2.0) differentiable Gaussian rasterizer, enabling principled deformation modeling, robust occlusion handling, and high-fidelity reconstructions on the EndoNeRF pulling benchmark. Our results demonstrate state-of-the-art reconstruction quality with strong speed-accuracy trade-offs suitable for intra-operative use. Finally, we provide a Holoscan SDK application that deploys G-SHARP on NVIDIA IGX Orin and Thor edge hardware, enabling real-time surgical visualization in practical operating-room settings.

2511.21740 2026-05-15 cs.CL cs.AI

A cross-species neural foundation model for end-to-end speech decoding

Yizi Zhang, Linyang He, Chaofei Fan, Tingkai Liu, Han Yu, Trung Le, Jingyuan Li, Scott Linderman, Lea Duncker, Francis R Willett, Nima Mesgarani, Liam Paninski

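AI summary This paper proposes an end-to-end BraIn-to-Text (BIT) framework that decodes neural activity directly into coherent sentences with a neural network, with the goal of improving communication through brain-computer interfaces. The core of the approach is a neural encoder pretrained across tasks and species, combined with audio large language models and contrastive learning, which yields a lower word error rate than the prior end-to-end method. The work sets a new state of the art on several benchmarks and demonstrates cross-task generalization, an important step for end-to-end neural decoding.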

Details
English abstract

Speech brain-computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phonemes before assembling sentences with an n-gram language model (LM), preventing joint optimization of all stages simultaneously. Here, we introduce an end-to-end BraIn-to-Text (BIT) framework that translates neural activity into coherent sentences using a single differentiable neural network. Central to our approach is a cross-task, cross-species pretrained neural encoder, whose representations transfer to both attempted and imagined speech. In a cascaded setting with an n-gram LM, the pretrained encoder establishes a new state-of-the-art (SOTA) on the Brain-to-Text '24 and '25 benchmarks. Integrated end-to-end with audio large language models (LLMs) and trained with contrastive learning for cross-modal alignment, BIT reduces the word error rate (WER) of the prior end-to-end method from 24.69% to 10.22%. Notably, we find that small-scale audio LLMs markedly improve end-to-end decoding. Beyond record-setting performance, BIT aligns attempted and imagined speech embeddings to enable cross-task generalization. Altogether, our approach advances the integration of large, diverse neural datasets, paving the way for an end-to-end decoding framework that supports seamless, differentiable optimization.
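The word error rate (WER) figures quoted above follow the usual definition: the word-level edit distance (substitutions, insertions, deletions) between the decoded sentence and the reference, divided by the number of reference words. The sketch below implements that standard definition with whitespace tokenization; the example sentences are hypothetical and any text normalization used by the benchmarks is not reproduced.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical decoded sentence with one missing word.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(wer)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.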

2511.21104 2026-05-15 cs.LG cs.PL

BRIDGE: Building Representations In Domain Guided Program Synthesis

Robert Joseph George, Carson Eisenach, Udaya Ghai, Dominique Perrault-Joncas, Anima Anandkumar, Dean Foster

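AI summary BRIDGE is a structured prompting framework for multi-domain program synthesis that targets the challenge of generating verifiable code in formal verification tools such as Lean. It links three domains, code generation, specification, and theorem/proof, and connects them through domain-specific intermediate reasoning. Experiments show that BRIDGE significantly improves executable correctness in Lean and also performs well in sample efficiency and Python code generation, demonstrating its practical value for verified program synthesis.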

Comments 41 pages, 10 figures, 3 tables. Preprint

Details
English abstract

Large language models can generate plausible code, but remain brittle for formal verification in proof assistants such as Lean. A central scalability challenge is that verified synthesis requires consistent artifacts across several coupled domains: executable code, formal specifications, theorem statements, and proof attempts. Existing approaches often treat these artifacts separately. We present BRIDGE, a structured prompting framework for multi-artifact program synthesis. BRIDGE decomposes generation into three interconnected domains: Code, Specification, and Theorem/Proof, and uses domain-specific intermediate reasoning to connect them. In Lean, BRIDGE often follows a code-first workflow, using the generated implementation as a semantic anchor for downstream specification, theorem statement, and proof-attempt generation. Across 178 algorithmic problems and five LLMs, BRIDGE improves Lean executable correctness by up to nearly 1.5x over direct prompting and can be roughly 2x more sample efficient at comparable generation lengths. We further find that specification-oriented prompting improves Python pass rates by up to 17.5 percentage points. Beyond inference-time prompting, supervised fine-tuning on BRIDGE-style reasoning traces yields nearly 1.5x higher Lean pass success than code-only fine-tuning, suggesting that these intermediate representations provide a learnable inductive bias. BRIDGE provides a practical framework for scaling verified synthesis while highlighting the remaining gap between executable correctness and full formal proof generation.

2511.18903 2026-05-15 cs.LG cs.AI cs.CL

How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining

Kairong Luo, Zhenbo Sun, Haodong Wen, Xinyu Shi, Jiarui Cui, Chenyi Dang, Kaifeng Lyu, Wenguang Chen

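AI summary In curriculum-based pretraining of large language models (LLMs), the learning rate decay strategy limits how effectively high-quality data is used. This paper finds that the advantage of curriculum training ordered by data quality shrinks markedly under a decaying learning rate schedule. It proposes two simple and effective remedies: using a more moderate learning rate decay, or replacing learning rate decay with model averaging, which improves results on multiple benchmarks without additional data refinement. The finding offers a new direction for co-designing curriculum-based pretraining with optimization methods.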

Details
English abstract

Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticated data curation. A natural approach to better leverage high-quality data is curriculum-based pretraining, where the model is trained on data sorted in ascending order of quality as determined by a quality metric. However, prior studies have reported limited improvements from such curriculum-based pretraining strategies. This work identifies a critical factor constraining these methods: the incompatibility between the ascending data quality order and the decaying learning rate (LR) schedule. We find that while curriculum-based training substantially outperforms random shuffling when using a constant LR, its advantage diminishes under standard LR decay schedules. Our experiments show this incompatibility can be mitigated by two simple strategies: (1) employing a more moderate LR decay schedule, where the final LR is only moderately smaller than the peak LR, and (2) replacing LR decay with model averaging, i.e., computing a weighted average of the final few checkpoints. By combining these strategies, we improve the average score on a suite of standard benchmarks by 1.64% over random shuffling, without additional data refinement. Validated on 1.5B-parameter models trained over 30B tokens with various data-quality metrics, our findings call for a re-evaluation of curriculum-based LLM pretraining and underscore the potential of co-designing data curricula with optimization methods.
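The two mitigation strategies named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' training code: the cosine shape, the `final_ratio` value of 0.5, the scalar "parameters", and the averaging weights are all hypothetical choices made for the example.

```python
import math


def cosine_lr(step, total_steps, peak_lr, final_ratio):
    """Cosine decay from peak_lr down to final_ratio * peak_lr.

    A final_ratio close to 1 gives a moderate decay, so the high-quality data
    seen late in the curriculum is still trained with a sizeable learning rate.
    """
    final_lr = final_ratio * peak_lr
    progress = step / total_steps
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))


def average_checkpoints(checkpoints, weights):
    """Weighted average of checkpoints given as {parameter_name: value} dicts."""
    total_weight = sum(weights)
    names = checkpoints[0].keys()
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints)) / total_weight
        for name in names
    }


# Moderate decay: the final LR is half of the peak (assumed ratio).
lr_start = cosine_lr(0, 100, peak_lr=1e-3, final_ratio=0.5)
lr_end = cosine_lr(100, 100, peak_lr=1e-3, final_ratio=0.5)

# Model averaging over the final few checkpoints, with scalar stand-ins for
# real parameter tensors and hypothetical weights favouring later checkpoints.
last_checkpoints = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
averaged = average_checkpoints(last_checkpoints, weights=[1.0, 3.0])
print(lr_start, lr_end, averaged)
```

With real models the same averaging would be applied tensor by tensor; the point of the sketch is only that averaging plays the noise-reducing role usually assigned to the decay phase, so the learning rate can stay high while the best data is being consumed.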