arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2605.07815 2026-05-11 cs.LG cs.CL

OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling

Yuxuan Lou, Yang You

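AI summary: OrScale is a new training method based on orthogonalized optimization that improves Muon's update rule for neural-network training by adding layer-wise trust-ratio scaling. It uses the Frobenius norm of the actual parameter-space update direction as the denominator of the ratio, which gives more precise layer-adaptive updates and avoids the convergence failures of conventional hybrid methods. Experiments indicate that OrScale outperforms existing methods on both image classification and language-model pre-training, with stronger convergence and generalization.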

Abstract

Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer's update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ratio extension of Muon built on a simple rule: the denominator of a layer-wise ratio should measure the Frobenius norm of the actual parameter-space direction that will be applied. This yields OrScale for general matrix layers and OrScale-LM for language models, where Moonlight shape scaling is combined with one-time per-layer calibration so every trust ratio starts at one. We analyze why three natural Muon-LAMB hybrids fail through shape-degenerate denominators, raw-momentum clip saturation, and decoupled weight-decay runaway, and show that the real-update-direction denominator with coupled weight decay avoids these failures. Theoretically, OrScale admits an O(1/sqrt(T)) nonconvex convergence guarantee in a nuclear-norm criterion, a strict layer-adaptive descent gain under measurable layer heterogeneity, and calibration properties that preserve muP-style learning-rate transfer at initialization. Empirically, OrScale ranks first on CIFAR-10/DavidNet across three seeds, improving Muon from 93.70% to 94.05% validation top-1, and OrScale-LM improves FineWeb-Edu pre-training versus Muon+Moonlight at three of four scales from 125M to 1.1B parameters while outperforming AdamW at every scale.
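
As a rough illustration of the rule stated in the abstract (the denominator of the layer-wise ratio should be the Frobenius norm of the direction that is actually applied), the sketch below computes a LAMB-style trust ratio in plain Python. The function names, the form of the coupled weight decay, and the use of `||W||_F` as the numerator are assumptions made for illustration and are not taken from the paper; the orthogonalized update is passed in rather than computed.

```python
import math

def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def applied_direction(ortho_update, weights, weight_decay):
    """Direction that will really be applied: orthogonalized update plus
    coupled weight decay (assumed form, for illustration only)."""
    return [
        [u + weight_decay * w for u, w in zip(u_row, w_row)]
        for u_row, w_row in zip(ortho_update, weights)
    ]

def trust_ratio(weights, direction, eps=1e-12):
    """LAMB-style ratio ||W||_F / ||D||_F, with D the applied direction."""
    return frobenius_norm(weights) / (frobenius_norm(direction) + eps)

def scaled_step(weights, ortho_update, lr, weight_decay):
    """One layer update: W <- W - lr * ratio * D."""
    direction = applied_direction(ortho_update, weights, weight_decay)
    ratio = trust_ratio(weights, direction)
    new_weights = [
        [w - lr * ratio * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weights, direction)
    ]
    return new_weights, ratio

# Toy 2x2 layer with an already-orthogonal update (the identity matrix).
W = [[3.0, 0.0], [0.0, 4.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
W_new, ratio = scaled_step(W, U, lr=0.1, weight_decay=0.0)
```

With this construction the applied step has Frobenius norm `lr * ||W||_F` whatever the shape of the layer, because the norm of the direction cancels against the denominator of the ratio.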

2605.07811 2026-05-11 cs.CL

A Comparative Analysis of Classical Machine Learning and Deep Learning Approaches for Sentiment Classification on IMDb Movie Reviews

Erma Daniar Safitri, Lia Hana Ichisasmita, Citra Agustin, Luluk Muthoharoh, Ardika Satria, Martin Clinton Tosima Manullang

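AI summary: This paper compares classical machine learning and deep learning methods for sentiment classification on IMDb movie reviews. It evaluates classical models (logistic regression, naive Bayes, and support vector machines) using TF-IDF features and PyCaret AutoML, and implements a bidirectional LSTM and an attention-augmented variant as the deep learning methods. The SVM reaches an accuracy of 0.8530, better than the deep learning models studied, while the BiLSTM with attention reaches 0.706, indicating better contextual modeling than the plain BiLSTM. The study concludes that classical machine learning with effective feature engineering remains competitive when data and compute are limited.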

Comments 10 pages, 4 authors from Department of Data Science and 2 authors from Department of Informatics Engineering, Institut Teknologi Sumatera, Indonesia

Abstract

This paper presents a comparative study of classical machine learning and deep learning methods for sentiment classification on the IMDb movie reviews dataset. The machine learning pipeline uses TF-IDF features and PyCaret AutoML to evaluate Logistic Regression, Naïve Bayes, and Support Vector Machine, while the deep learning pipeline implements BiLSTM and BiLSTM with an attention mechanism. Experimental results show that classical machine learning, especially SVM, achieves the best performance with an accuracy of 0.8530, outperforming the deep learning models in this study. The BiLSTM with Attention model improves over the standard BiLSTM and reaches an accuracy of 0.706, indicating better contextual modeling. The paper concludes that although deep learning can capture sequential dependencies, classical machine learning remains a strong baseline when combined with effective feature engineering such as TF-IDF, particularly under limited data and computational resources.
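
To make the TF-IDF feature step concrete, here is a minimal pure-Python sketch of the textbook weighting (term frequency times `log(N / df)`). The toy documents and the function name `tfidf` are hypothetical. The abstract does not say which TF-IDF variant the pipeline uses, and library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers differ from the ones produced here.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["great movie", "boring movie", "movie was great fun"]
vectors = tfidf(docs)
```

A term that occurs in every document ("movie" here) gets weight zero, while a term unique to one review ("boring") gets the largest weight in that review, which is the property a linear classifier such as an SVM exploits.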

2605.07808 2026-05-11 cs.LG

The Minimax Rate of Second-Order Calibration

Kamil Ciosek, Banafsheh Rafiee, Sina Ghiassian, Nicolò Felicioni

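AI summary: This paper studies the minimax rate of estimating the second-order calibration error in binary classification, which measures how well a higher-order predictor's uncertainty estimate matches the conditional variance of the label probability on its level sets. The authors use polynomial regression with the sech (hyperbolic secant) perturbation kernel to obtain a markedly faster estimation rate and prove that it is minimax optimal. The paper also gives the first finite-sample guarantee for second-order Platt scaling and a bucket-free definition of second-order calibration.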

Abstract

We characterize the minimax rate of estimating the second-order calibration error for binary classification, which quantifies whether a higher-order predictor's epistemic-uncertainty estimate matches the conditional variance of the label probability on its level sets. Our key observation is that the sech perturbation kernel, previously used only to enforce smoothness of calibration functions, in fact makes them analytic in a strip of half-width $h\pi/2$. Polynomial regression then estimates the calibration error at rate $\tilde{O}(1/\sqrt{n})$, with explicit constants, a qualitative improvement over the $O(n^{-1/4})$ rate achievable by bucketing or kernel smoothing. A matching $\Omega(1/\sqrt{n})$ lower bound establishes minimax optimality up to logarithmic factors. As a corollary, we give the first finite-sample guarantee for second-order Platt scaling, yielding a post-hoc procedure that recalibrates both the mean prediction and the epistemic-variance estimate of any higher-order predictor. Along the way, we provide a bucket-free definition of second-order calibration and relate it quantitatively to the bucketed formulation of Ahdritz et al. [2025]. Our experiments confirm the predicted rate and the quality of the recalibrated uncertainties.

2605.07807 2026-05-11 cs.CV cs.AI cs.LG cs.RO

Text-to-CAD Evaluation with CADTests

Dimitrios Mallis, Marco Wang, Ahmet Serdar Karadeniz, Elisa Ricci, Anis Kacem, Djamila Aouada

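AI summary: This paper proposes a new evaluation approach for text-to-CAD model generation based on automated testing. It introduces CADTestBench, the first test-based benchmark for the task, which uses executable CADTests to verify whether a generated model satisfies the geometric and topological requirements of the input prompt. The authors benchmark existing Text-to-CAD methods on it and show that CADTests can also guide generation, yielding simple baselines that outperform current methods.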

Abstract

Text-to-CAD has recently emerged as an important task with the potential to substantially accelerate design workflows. Despite its significance, there has been surprisingly little work on Text-to-CAD evaluation, and assessing CAD model generation performance remains a considerable challenge. In this work, we introduce a new evaluation perspective for Text-to-CAD based on automated testing. We propose CADTestBench, the first test-based benchmark for Text-to-CAD, based on CADTests, executable software tests that verify whether a generated CAD model satisfies the geometric and topological requirements of the input prompt. Using CADTestBench, we conduct comprehensive benchmarking of recent Text-to-CAD methods and further demonstrate that CADTests can also guide CAD model generation, yielding simple baselines that surpass the performance of current methods. CADTestBench code and data are available on GitHub and as a Hugging Face dataset.

2605.07806 2026-05-11 cs.CL cs.AI cs.LG

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

Sree Bhattacharyya, Samarth Khanna, Leona Chen, Lucas Craig, Tharun Dilliraj, James Z. Wang

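AI summary: This study examines how reliable self-assessment is for predicting the performance of large language models (LLMs), noting that conventional confidence-based approaches are inconsistent and overoptimistic. Drawing on cognitive appraisal theory, it proposes a multidimensional self-assessment framework with six appraisal dimensions, including ability and effort, and tests how well they predict model failure across many tasks and models. Competence-related dimensions such as effort and ability match or outperform confidence in most settings and are more stable, suggesting a route to safer and more reliable model deployment.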

Abstract

Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Confidence, however, has been shown to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory, a framework from human psychology that decomposes self-evaluation into multiple components, we propose a multidimensional perspective on model self-assessment. We elicit six appraisal-based dimensions of self-assessment, alongside confidence, and evaluate their utility for predicting model failure across 12 LLMs and 38 tasks spanning eight domains. We find that competence-related appraisal dimensions, particularly effort and ability, consistently match or outperform confidence across most settings. Effort additionally yields less overoptimistic estimates that remain stable across model sizes. In contrast, affective dimensions provide marginally predictive signals. Furthermore, the most informative dimension varies systematically with task characteristics: effort is most predictive for reasoning-intensive tasks, while ability and confidence dominate on retrieval-oriented tasks. Broadly, our findings indicate that structured multidimensional self-assessment is a promising approach to improving the reliability and safety of language model deployment across diverse real-world settings.

2605.07805 2026-05-11 cs.LG

Flexible Routing via Uncertainty Decomposition

Charlotte Peale, Siddartha Devic, Parikshit Gopalan, Udi Wieder, Aravind Gollakota

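AI summary: This paper proposes a flexible routing method based on uncertainty decomposition that dynamically sends each query either to a low-cost model or to a more expensive oracle, balancing performance against cost. By splitting total uncertainty into irreducible and reducible parts, it handles routing and abstention in one framework and adapts to different loss functions and cost parameters through simple hyperparameter changes, without retraining. Experiments show clear benefits when reducible and irreducible uncertainty are not highly correlated, and the method comes with theoretical regret bounds.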

Abstract

A key strategy for balancing performance and cost in modern machine learning systems is to dynamically route queries to either a low-cost model or a more expensive oracle (such as a large pretrained model or human expert), an approach known as model routing. In this work we present a new uncertainty-aware router that (1) avoids unnecessary oracle calls on inherently ambiguous queries, and (2) adapts dynamically to different loss functions and cost parameters through simple hyperparameter changes, without retraining. Our method, applicable to any classification setting where multiple independent annotations per input are available, is based on decomposing total uncertainty into irreducible and reducible components using higher-order predictors [Ahdritz et al., 2025]. This enables a unified approach to both routing and abstention: predict with the weak model when uncertainty is low, route to the oracle when reducible uncertainty is high, and abstain when irreducible uncertainty is high. Our router comes with strong theoretical guarantees bounding regret relative to optimal task-specific routers. We conduct experiments on both synthetic and real-world datasets that demonstrate the benefits of our approach in suitable regimes -- in particular, whenever reducible and irreducible uncertainty are not too correlated.
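
The three-way rule described in the abstract (predict, route, or abstain) can be sketched as a small decision function. The thresholds and the precedence given to abstention when both uncertainties are high are assumptions of this sketch; the paper's actual router and its estimator of reducible and irreducible uncertainty are not reproduced here.

```python
from enum import Enum

class Action(Enum):
    PREDICT = "predict with the weak model"
    ROUTE = "route to the oracle"
    ABSTAIN = "abstain"

def decide(reducible, irreducible, route_threshold, abstain_threshold):
    """Three-way decision from a decomposed uncertainty estimate.

    The thresholds are hypothetical hyperparameters: changing them adapts
    the rule to new costs without retraining the uncertainty estimator.
    """
    if irreducible > abstain_threshold:
        # The query is inherently ambiguous, so an oracle call would be wasted.
        return Action.ABSTAIN
    if reducible > route_threshold:
        # The weak model is unsure, but a stronger model could resolve it.
        return Action.ROUTE
    return Action.PREDICT

easy = decide(0.1, 0.1, route_threshold=0.3, abstain_threshold=0.5)
hard = decide(0.6, 0.1, route_threshold=0.3, abstain_threshold=0.5)
ambiguous = decide(0.6, 0.9, route_threshold=0.3, abstain_threshold=0.5)
```

Raising `route_threshold` models a more expensive oracle, since fewer queries are then forwarded; lowering `abstain_threshold` models a setting where wrong answers are costlier than no answer.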

2605.07796 2026-05-11 cs.CL

PolySQL: Scaling Text-to-SQL Evaluation Across SQL Dialects via Automated Backend Isomorphism

Yotam Perlitz, Elad Venezian, Corentin Royer, Francesco Fusco, Andrea Giovannini

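AI summary: This work addresses the lack of evaluation for text-to-SQL generation across database dialects. It proposes PolySQL, a dual-execution method that compares normalized execution results directly, with no manual query translation, enabling more accurate and complete cross-dialect evaluation. The study finds an average accuracy drop of 10.1% from SQLite to other dialects and significant differences in difficulty between dialects. The authors release the framework code and a leaderboard to support more robust text-to-SQL systems.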

Abstract

SQL dialects vary in syntax, types, and functions across database engines. Text-to-SQL benchmarks, however, predominantly support only SQLite. This creates a critical evaluation gap: cross-dialect evaluation reveals weak per-query agreement (measured by Cohen's $\kappa$), showing that SQLite performance is an unreliable proxy for other dialects. Yet such evaluation remains prohibitively difficult: existing approaches either require expensive manual query transpilation or rely on tools that often fail on complex SQL. To close this gap, we introduce PolySQL, a novel dual-execution method that eliminates the need for query transpilation by comparing normalized execution results. Notably, our approach achieves higher evaluation fidelity than query transpilation with 100% query coverage. PolySQL comprises three datasets, enabling the first large-scale cross-dialect study. Our study reveals a 10.1% average accuracy drop from SQLite to other dialects and identifies a significant dialect difficulty hierarchy. We find this degradation stems from logical rather than syntactic errors (61% vs. 8%). We release our framework code and leaderboard to enable rigorous dialect-robust evaluation.
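
The idea of comparing normalized execution results instead of transpiling queries can be illustrated with Python's standard `sqlite3` module. In this sketch two in-memory SQLite connections merely stand in for two different engines, and the table, the queries, and the normalization rules (numbers compared as rounded floats, row order ignored) are assumptions rather than PolySQL's actual procedure.

```python
import sqlite3

def make_backend(rows):
    """Create an in-memory SQLite database; two such connections stand in
    for two different engines in this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT, price REAL, qty INTEGER)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    return conn

def normalize(rows, ndigits=6):
    """Canonical form of a result set: numbers as rounded floats, other
    values as strings, rows sorted so that row order does not matter."""
    def canon(value):
        if value is None:
            return ("null", "")
        if isinstance(value, (int, float)):
            return ("num", round(float(value), ndigits))
        return ("str", str(value))
    return sorted(tuple(canon(v) for v in row) for row in rows)

def results_match(conn_a, sql_a, conn_b, sql_b):
    """Run each query on its own backend and compare normalized results."""
    rows_a = conn_a.execute(sql_a).fetchall()
    rows_b = conn_b.execute(sql_b).fetchall()
    return normalize(rows_a) == normalize(rows_b)

rows = [("apple", 1.5, 2), ("pear", 2.0, 3)]
backend_a, backend_b = make_backend(rows), make_backend(rows)
gold = "SELECT name, price * qty FROM items ORDER BY name"
predicted = "SELECT name, CAST(price * qty AS INTEGER) FROM items ORDER BY name DESC"
wrong = "SELECT name, price FROM items"
print(results_match(backend_a, gold, backend_b, predicted))  # True
print(results_match(backend_a, gold, backend_b, wrong))      # False
```

The `predicted` query differs from `gold` in numeric type and row order yet still matches after normalization, whereas `wrong` returns different values and is rejected. A real evaluator would need to keep row order for queries where `ORDER BY` is part of the question.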

2605.07794 2026-05-11 cs.RO

NoiseGate: Learning Per-Latent Timestep Schedules as Information Gating in World Action Models

Wen Huang, Haoran Sun, Yongjian Guo, Yunxuan Ma, Haoran Li, Jing Long, Zhouying Mo, Zhong Guan, Yucheng Guo, Shuai Di, Junwu Xiong

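AI summary: This paper proposes NoiseGate, a method for improving the joint modeling of action generation and future-observation prediction in World Action Models (WAMs). It learns an independent timestep schedule for each latent frame and treats the noise level as an information gate, dynamically adjusting how much each latent frame contributes to action generation. Instead of the single shared timestep used by existing methods, NoiseGate adds a lightweight gating policy network that emits a separate time increment for each latent frame during denoising, and trains this policy with task rewards, without hand-crafted shape priors. Experiments show gains across a range of RoboTwin random-scene manipulation tasks.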

Abstract

World Action Models (WAMs) are an emerging family of policies that tie robot action generation to future-observation modeling. In this work, we focus on the joint video--action modeling paradigm, where actions and imagined future observations are co-generated along a shared denoising or flow trajectory, so that perception, prediction, and control are coupled within one generative process. Existing WAMs typically realize this paradigm with a Mixture-of-Transformers (MoT), where video and action tokens interact through shared self-attention. This architecture can in principle assign a separate timestep $t_f$ to each predicted latent frame, yet current systems collapse this degree of freedom onto a single shared scalar $t$. Under the noise-as-masking view of Diffusion Forcing, this shared schedule imposes the unjustified prior that every predicted latent is equally reliable for action generation. We instead view the per-latent schedule as a learnable information-gating policy: by changing a latent frame's noise level, the policy modulates the reliability of its Key/Value contribution to the action tokens. We propose NoiseGate, which combines independent per-latent timestep sampling during backbone training, a lightweight Gating Policy Network that emits per-latent time increments during denoising, and task-reward optimization that trains the schedule policy without hand-crafted shape priors. Built on a joint video--action MoT backbone, NoiseGate delivers consistent gains on diverse RoboTwin random-scene manipulation tasks.

2605.07793 2026-05-11 cs.CL

Hybrid TF--IDF Logistic Regression and MLP Neural Baseline for Indonesian Three-Class Sentiment Analysis on Social Media Text

Allya Nurul Islami Pasha, Eka Fidiya Putri, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang

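AI summary: This paper studies three-class sentiment analysis of Indonesian social media text and proposes a practical baseline that combines TF-IDF text features, lightweight numeric metadata features, and a multinomial logistic regression classifier. It also compares a two-layer multilayer perceptron (MLP) baseline that uses the same hybrid feature representation. The logistic regression model performs well on accuracy, weighted F1, and macro F1, indicating that careful preprocessing, interpretable feature engineering, and class balancing remain competitive on small Indonesian sentiment datasets.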

Comments 8 pages, 4 figures, 4 tables. Research paper on Indonesian three-class sentiment analysis using TF--IDF, Logistic Regression, and MLP baselines

Abstract

This paper presents a compact three-class sentiment analysis study for Indonesian social media text. The task is formulated with positive, negative, and neutral outputs derived from a fine-grained emotion dataset. The proposed practical baseline combines TF--IDF text features, three lightweight numeric metadata features, and a balanced multinomial Logistic Regression classifier. For comparison, the study also includes a neural baseline using a two-layer multilayer perceptron (MLP) over the same hybrid feature representation. The dataset originally contains 732 rows and 191 fine-grained emotion labels; after cleaning, deduplication, and label remapping, 707 samples remain with an imbalanced distribution of 459 positive, 188 negative, and 60 neutral instances. Experimental results show that the Logistic Regression deployment model reaches 0.8028 accuracy, 0.8003 weighted F1, and 0.7276 macro F1, while project documentation reports a higher-accuracy but non-production MLP baseline. These findings indicate that careful preprocessing, interpretable feature engineering, and class balancing remain competitive for small Indonesian sentiment datasets, whereas the neural baseline is better treated as a comparative experiment than as the default deployment model.
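
To show what class balancing does with the reported label distribution (459 positive, 188 negative, 60 neutral), the sketch below computes per-class weights with the heuristic `n_samples / (n_classes * class_count)`, which is what scikit-learn's `class_weight="balanced"` option uses. That the paper balances classes in exactly this way is an assumption; the abstract only says the classifier is balanced.

```python
def balanced_class_weights(class_counts):
    """Weight for each class: n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }

# Class counts taken from the abstract (707 samples after cleaning).
counts = {"positive": 459, "negative": 188, "neutral": 60}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this heuristic the weights come out at roughly 0.513 for positive, 1.254 for negative, and 3.928 for neutral, so an error on a neutral sample costs about 7.6 times as much as an error on a positive one. This is one reason macro F1 (0.7276) can stay reasonably close to weighted F1 (0.8003) despite the imbalance.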

2605.07792 2026-05-11 cs.LG cs.AI cs.NA math.NA nucl-th

Neural Operators as Efficient Function Interpolators

Vasilis Niarchos, Angelos Sirbu, Sokratis Trifinopoulos

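AI summary: This paper reframes neural operators (NOs) as efficient function interpolators: by introducing an auxiliary base space, a finite-dimensional function is treated as an operator acting by composition on functions of the base space. Experiments show that NOs keep high accuracy while using fewer parameters and less training time than standard multilayer perceptrons and Kolmogorov-Arnold Networks. The authors also apply NOs to correcting nuclear mass models, reaching performance among the best recent neural-network approaches, which supports their efficiency and scalability for interpolating scientific data.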

Comments 12 pages, 9 figures

Abstract

Neural operators (NOs) are designed to learn maps between infinite-dimensional function spaces. We propose a novel reframing of their use. By introducing an auxiliary base-space, any finite-dimensional function can be viewed as an operator acting by composition on functions of the base-space. Through a range of benchmarks on analytic functions of increasing complexity and dimensionality, we demonstrate that NOs can match or outperform standard multilayer perceptrons and Kolmogorov--Arnold Networks in accuracy while requiring significantly fewer parameters and training time. As a real-world application, we apply a two-dimensional Tensorized Fourier Neural Operator (TFNO) to the nuclear chart, learning a correction to state-of-the-art nuclear mass models as a partially observed residual field. A TFNO ensemble reaches a held-out root-mean-square error of 198.2 keV, placing it among the best recent neural-network approaches while retaining high parameter efficiency and short training times. More broadly, these results introduce NOs as a scalable framework for finite-dimensional function interpolation, from analytic benchmarks to structured scientific data.

2605.07783 2026-05-11 cs.CL

Chain-based Distillation for Effective Initialization of Variable-Sized Small Language Models

Boyu Shi, YiCheng Jiang, Chang Liu, Qiufeng Wang, Xu Yang, Xin Geng

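AI summary: This paper proposes Chain-based Distillation (CBD), a method for efficiently initializing small language models of different sizes. Stepwise distillation builds a series of intermediate models (anchors) that form a knowledge-transfer chain, which avoids repeated calls to the large model and improves scalability. The paper also introduces bridge distillation to support cross-architecture and cross-vocabulary transfer. Experiments show substantial gains in efficiency and downstream performance.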

Abstract

Large language models (LLMs) achieve strong performance but remain costly to deploy in resource-constrained settings. Training small language models (SLMs) from scratch is computationally expensive, while conventional knowledge distillation requires repeated access to large teachers for different target sizes, leading to poor scalability. To solve these problems, we propose Chain-based Distillation (CBD), a scalable paradigm for efficiently initializing variable-sized language models. A sparse and limited sequence of intermediate models (called anchors) is constructed via stepwise distillation, forming a distillation chain that progressively transfers knowledge from the source LLMs. To support heterogeneous settings, we introduce bridge distillation for cross-architecture and cross-vocabulary transfer. Models of variable sizes are initialized via parameter interpolation between adjacent anchors, eliminating repeated large teacher inference. Experiments show that the proposed method substantially improves efficiency and downstream performance. A 138M-parameter SLM initialized without recovery pre-training outperforms models trained from scratch on a 10B-token corpus on the specific task. CBD also demonstrates versatility in heterogeneous settings for initializing models with different architectures and vocabularies.

2605.07781 2026-05-11 cs.CV

Differentiable Ray Tracing with Gaussians for Unified Radio Propagation Simulation and View Synthesis

Niklas Vaara, Lam Huynh, Pekka Sangi, Miguel Bordallo López, Janne Heikkilä

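AI summary: This work proposes a unified framework based on differentiable ray tracing with Gaussian representations for both radio propagation simulation and high-quality view synthesis. By embedding Gaussian primitives in a hardware-accelerated ray tracing structure, it computes radio paths between arbitrary pairs of points in 3D space without manually built meshes. The method extracts physically meaningful channel impulse responses from visual reconstructions, showing that neural reconstructions can serve both electromagnetic propagation simulation and photorealistic view synthesis.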

Abstract

Explicit neural representations such as 3D Gaussian Splatting (3DGS) enable high-fidelity and real-time novel view synthesis, yet optimize for alpha-composited optical appearance rather than ray-intersectable geometry. In contrast, radio-frequency (RF) digital twins require deterministic multi-bounce paths, where the geometry dictates trajectories and their associated attenuation and delay. We introduce a framework enabling differentiable RF propagation simulation directly within visually reconstructed neural scenes, allowing point-to-point path computation between arbitrary 3D locations while preserving high-quality visual rendering. Unlike conventional RF simulation pipelines that rely on manually constructed meshes, we embed Gaussian primitives into a hardware-accelerated ray tracing structure as the underlying spatial representation. By extracting physically meaningful channel impulse responses from visual-only reconstructions, we provide cross-modal evidence that neural reconstructions can serve as unified spatial representations for both electromagnetic propagation simulation and photorealistic view synthesis.

2605.07776 2026-05-11 cs.LG cs.AI cs.CL

Tracing Uncertainty in Language Model "Reasoning"

Nils Grünefeld, Bertram Højer, Philipp Mondorf, Barbara Plank, Anna Rogers, Christian Hardmeier, Stefan Heinrich, Jes Frellsen

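AI summary: This paper studies how uncertainty evolves during language model "reasoning". Treating the generated intermediate reasoning traces as model states, it describes each trace by a set of uncertainty trace features. These features predict whether a trace ends in a correct answer, with high predictive performance across several models and datasets. The study also finds that correct traces show a steeper and less linear decline in uncertainty, suggesting that uncertainty-based analysis can help explain the decision process behind language model reasoning.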

Abstract

Language model (LM) "reasoning", commonly described as Chain-of-Thought or test-time scaling, often improves benchmark performance, but the dynamics underlying this process remain poorly understood. We study these dynamics through the lens of uncertainty quantification by treating the "reasoning" traces, the intermediate token sequences generated by LMs, as evolving model states. We summarize each trace by an uncertainty trace profile: a small set of features describing the shape of the uncertainty signal over its trace, such as its slope and linearity. We find that across five LMs evaluated on GSM8K and ProntoQA, these profiles predict whether a trace yields a correct final answer with AUROC up to 0.807, improving markedly on recent related work. We reach AUROC 0.801 using only the first few hundred tokens of full traces, suggesting that errors can be detected early in the generation. A detailed comparison of correct and incorrect traces further reveals qualitatively distinct uncertainty profiles, with correct traces showing a steeper and less linear decline in uncertainty. Together, the results suggest that our method, grounded in decision-making under uncertainty, provides a principled lens for studying the generative process underlying LM "reasoning".
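
The abstract names slope and linearity as examples of profile features. One plausible way to compute them, sketched below, is an ordinary least-squares fit of uncertainty against token index, with the fit's R² as the linearity score. These definitions, the function name `trace_profile`, and the toy signals are assumptions; the paper's exact features may differ.

```python
def trace_profile(uncertainty):
    """Least-squares slope and R^2 of an uncertainty signal over token index."""
    n = len(uncertainty)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(uncertainty) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, uncertainty))
    syy = sum((y - mean_y) ** 2 for y in uncertainty)
    slope = sxy / sxx
    # R^2 of the linear fit; a constant signal is treated as perfectly linear.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return {"slope": slope, "linearity": r_squared}

# Two toy signals: a steady decline and a sharp early drop that flattens out.
steady = trace_profile([1.0, 0.8, 0.6, 0.4, 0.2])
early_drop = trace_profile([1.0, 0.3, 0.1, 0.05, 0.0])
```

On these toy signals the early-drop trace has both a steeper slope and a lower linearity score than the steady one, which is the qualitative pattern the abstract reports for correct traces. A classifier over such features could then be scored with AUROC.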

2605.07775 2026-05-11 cs.LG cs.AI stat.ML

POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles

Nicolas Menet, Andreas Krause, Abbas Rahimi

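AI summary: POETS is an uncertainty-aware framework for optimizing large language models with policy ensembles, aimed at the exploration-exploitation trade-off in sequential decision-making and black-box optimization. It relies on the reward function that a policy implicitly encodes and trains a policy ensemble directly, avoiding the complex training of a conventional uncertainty-aware reward model. An efficient architecture with a shared pre-trained backbone and independent low-rank adaptation branches substantially reduces compute and memory cost. The analysis shows that POETS implements KL-regularized Thompson sampling with strong cumulative regret bounds, and experiments show state-of-the-art sample efficiency and optimization performance on scientific discovery tasks such as protein search and quantum circuit design.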

Comments preprint

Abstract

Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS (Policy Ensembles for Thompson Sampling), a novel framework that bridges uncertainty quantification and policy optimization. Our approach is grounded in the insight that policies trained with Kullback-Leibler (KL) regularization implicitly encode an underlying reward function. Building on this, POETS bypasses the complex, nested process of training an uncertainty-aware reward model and separately fitting a policy to this model. Instead, we directly train a policy ensemble to capture epistemic uncertainty by matching implicitly encoded reward functions to online, bootstrapped data. To overcome the prohibitive compute and memory constraints of ensembling Large Language Models (LLMs), POETS utilizes an efficient architecture: the ensemble shares a pre-trained backbone while maintaining diversity through independent Low-Rank Adaptation (LoRA) branches. Theoretically, we prove that POETS implicitly conducts KL-regularized Thompson sampling and thus inherits strong cumulative regret bounds of $\mathcal{O}(\sqrt{T \gamma_T})$. Empirically, we demonstrate that POETS achieves state-of-the-art sample efficiency across diverse scientific discovery domains, including protein search and quantum circuit design. Furthermore, it improves the optimization trajectories of reinforcement learning, proving particularly robust in off-policy settings with experience replay or in small dataset regimes.
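
The insight that a KL-regularized policy implicitly encodes a reward can be made concrete with the standard closed form for KL-regularized reward maximization. The notation below ($\pi_{\mathrm{ref}}$ for the reference policy, $\beta$ for the regularization strength, $Z$ for the normalizing constant) is assumed here and may differ from the paper's.

```latex
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}_{x \sim \pi}\,[r(x)]
          - \beta\, \mathrm{KL}\!\left(\pi \,\|\, \pi_{\mathrm{ref}}\right)
\quad\Longrightarrow\quad
\pi^{*}(x) = \frac{1}{Z}\, \pi_{\mathrm{ref}}(x)\, \exp\!\left(\frac{r(x)}{\beta}\right),
\qquad
r(x) = \beta \log \frac{\pi^{*}(x)}{\pi_{\mathrm{ref}}(x)} + \beta \log Z .
```

Read from right to left, the last identity says that any policy of this form determines a reward up to the additive constant $\beta \log Z$, so an ensemble of policies can be read as an ensemble of reward hypotheses without fitting a separate reward model.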

2605.07772 2026-05-11 cs.LG math.AP math.DS math.OC

Training-Induced Escape from Token Clustering in a Mean-Field Formulation of Transformers

Noboru Isobe, Daisuke Inoue, Masaaki Imaizumi

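AI summary: This paper studies how training affects the attention-induced token clustering observed in Transformers. Analyzing a noisy mean-field model in which only a parameter-linear feed-forward network is trained, with $L^2$ regularization, the authors find that training can make the token distribution escape the clustered regime in the later layers. The analysis rests on an entropy-regularized interaction energy and reveals how training changes clustering behavior, offering a new perspective toward a mean-field theory that treats training and inference dynamics together.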

Comments 48 pages, 6 figures, comments are welcome!

Abstract

Transformers perform inference by iteratively transforming token representations across layers. This layerwise computation has been studied empirically, and recent mean-field theories of Transformer dynamics explain how attention can drive token distributions toward clustering. However, existing mean-field analyses largely treat model parameters as prescribed, leaving open how training reshapes this clustering picture. We study this question in a noisy mean-field Transformer in which only a parameter-linear FFN is trained under $L^2$ regularization. We find and analyze a training-induced phase in the dynamics: after initially following attention-driven clustering, the token distribution can leave the clustered regime near the final layers. Our mathematical analysis is based on an entropy-regularized interaction energy that captures the clustering bias of attention. More broadly, our results point toward a training-aware mean-field theory of Transformer dynamics, in which training and inference dynamics are treated together.

2605.07771 2026-05-11 cs.RO

Sensitivity-Based Robust NMPC for Close-Proximity Offshore Wind Turbine Inspection with a Tilted Multirotor

Giuseppe Silano, Martin Saska

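AI summary: This paper studies robust nonlinear model predictive control (NMPC) for a tilted multirotor performing close-proximity inspection of offshore wind turbines. To address violations of the safety clearance constraint caused by wind disturbances and model uncertainty, it proposes a sensitivity-based robust NMPC. The method tightens constraints online, combining parametric state sensitivities with a stage-dependent additive margin to robustify the tower-clearance constraint. Simulations show that the controller maintains safety with only a moderate increase in solve time.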

Comments 5 pages. Accepted for presentation at the ICRA 2026 Workshop on "Aerial inspection for marine infrastructures," June 1, 2026, Vienna, Austria

Abstract

Close-proximity offshore wind turbine inspection requires strict clearance control around large cylindrical structures under wind and model mismatch. Nominal Nonlinear Model Predictive Control (NMPC) may violate safety constraints when mass, inertia, thrust effectiveness, drag, or wind conditions differ from nominal assumptions. We propose a sensitivity-based robust NMPC for a tilted multirotor that robustifies the tower-clearance constraint via online constraint tightening. First-order parametric state sensitivities provide a structured-uncertainty margin, while bounded gusts are handled by a stage-dependent additive margin. The formulation augments the nominal NMPC with sensitivity propagation and margin evaluation only, leaving the receding-horizon optimization structure unchanged. Monte-Carlo evaluation over 500 uncertainty realizations on a boundary-critical helical inspection trajectory shows that the proposed controller eliminates the clearance violations observed under nominal NMPC at the cost of a moderate increase in solve time.
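
A minimal sketch of online constraint tightening, under simplifying assumptions: the clearance sensitivity to each uncertain parameter is given directly (in the paper it would follow from propagated state sensitivities), the parameter deviations are bounded by known values, and the gust margin is supplied per stage. All names and numbers are hypothetical.

```python
def tightened_clearance(d_min, sensitivities, param_bounds, gust_margin):
    """Required clearance at one prediction stage.

    sensitivities: d(clearance)/d(p_i) along the nominal trajectory (first order)
    param_bounds:  bound on |delta p_i| for each uncertain parameter
    gust_margin:   additive margin for bounded gusts at this stage
    """
    structured = sum(abs(s) * b for s, b in zip(sensitivities, param_bounds))
    return d_min + structured + gust_margin

# Nominal clearance of 2.0 m, two uncertain parameters, margin growing with stage.
stage_gust_margins = [0.0, 0.05, 0.1]
required = [
    tightened_clearance(2.0, [0.5, -0.2], [0.1, 0.5], margin)
    for margin in stage_gust_margins
]
```

The nominal constraint `clearance >= d_min` is replaced stage by stage with `clearance >= required[k]`. Only the right-hand side changes, which is consistent with the abstract's remark that the structure of the optimization problem stays the same.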

2605.07767 2026-05-11 cs.CV

SIMI: Self-information Mining Network for Low-light Image Enhancement

Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral

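AI summary: Low-light conditions severely degrade image quality, which makes image editing and visualization difficult. This paper proposes SIMI, a self-information mining network based on bit-plane decomposition that mines intrinsic information from low-light images without external data, enabling efficient unsupervised enhancement. The method speeds up model convergence, reduces computational overhead, and achieves state-of-the-art performance on standard benchmarks.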

Abstract

Poor lighting conditions significantly impact image quality, posing substantial challenges for image editing and visualization. Many existing enhancement methods aim at proposing complex models while neglecting the intrinsic information contained within low-light images. In this work, we propose the Self-Information Mining (SIMI) network, an innovative unsupervised framework that decomposes low-light images into multiple components based on bit-plane decomposition. Our approach allows mining intrinsic information without relying on external data. This not only accelerates model convergence but also improves performance and reduces computational overhead. The unsupervised nature of our method facilitates real-world applicability. Experiments conducted on standard benchmarks demonstrate that SIMI achieves state-of-the-art performance.
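
Bit-plane decomposition itself is a standard operation and can be shown in a few lines of plain Python: each 8-bit pixel is split into eight binary planes, one per bit. How SIMI groups or processes these planes is not described in the abstract, so the sketch covers only the decomposition and its inverse, on a hypothetical toy image.

```python
def bit_planes(image, bits=8):
    """Split an image of unsigned integers into `bits` binary planes.
    planes[k][i][j] is bit k (0 = least significant) of pixel (i, j)."""
    return [
        [[(pixel >> k) & 1 for pixel in row] for row in image]
        for k in range(bits)
    ]

def reconstruct(planes):
    """Inverse of bit_planes: sum each plane weighted by 2**k."""
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[k][i][j] << k for k in range(len(planes))) for j in range(width)]
        for i in range(height)
    ]

# Toy 2x3 grayscale image with a dark first row.
image = [[0, 5, 12], [37, 200, 255]]
planes = bit_planes(image)
```

For dark pixels (values below 64) the two most significant planes are entirely zero, so the usable signal of a low-light image sits in the lower planes. That makes the decomposition a natural way to expose intrinsic information without external data.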

2605.07766 2026-05-11 cs.CV

Head Similarity: Modeling Structured Whole-Head Appearance Beyond Face Recognition

Yingfeng Wang, Yuxuan Xiao, Shengcai Liao

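AI summary: Many vision applications need identity consistency under non-frontal views or when facial cues are missing, but conventional face recognition models emphasize identity invariance and cannot capture appearance changes such as hairstyle or accessories, which limits their use in appearance-sensitive scenarios. This paper proposes Head Similarity, a formulation that models whole-head appearance in a structured way, explicitly preserves appearance variation within an identity, and imposes a hierarchical similarity ordering across identity and appearance states. The authors build a large-scale benchmark and develop a framework based on hierarchical supervision and identity-aware distillation. Experiments show that the approach can model structured whole-head similarity, which conventional face recognition models fail to capture.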

Abstract

Many vision applications require identity consistency beyond strict biometric recognition, especially under non-frontal views or when facial cues are missing. However, conventional face recognition models enforce intra-identity invariance, collapsing appearance variations such as hairstyle or styling changes into a single representation, limiting their use in appearance-sensitive scenarios. To address this limitation, we introduce Head Similarity, a new formulation that extends identity-centric recognition to structured whole-head similarity modeling. Our approach explicitly captures intra-identity appearance variation and enforces hierarchical similarity ordering across identity and appearance states, enabling meaningful comparison even under occlusion or rear-view conditions. We construct a large-scale benchmark from long-form videos with weakly-supervised appearance states, covering diverse poses, occlusions, and temporal changes. As a first step, we develop a simple yet effective framework that jointly models identity discrimination and appearance-sensitive similarity through hierarchical supervision and identity-aware distillation. Experiments show that conventional face recognition models fail to capture appearance-dependent similarity, while our approach demonstrates the feasibility of structured whole-head similarity modeling.

2605.07765 2026-05-11 cs.LG

Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation

Elliot Pickens, Chiraag Gohel, Sidharth Satya

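AI summary: This paper studies TabPFN as a training-free, modular summary network for simulation-based Bayesian inference (SBI). The authors propose PFN-NPE, a general recipe that uses a pre-trained TabPFN encoder as a fixed summary network for simulated data and pairs it with a task-dependent inference head for posterior estimation. Experiments show that the method performs well on several SBI tasks and preserves key information about the posterior, while also revealing limits in modeling joint posterior structure.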

Abstract

In this work, we study TabPFN as a training-free, modular summary network for simulation-based Bayesian inference (SBI). Tabular foundation models such as TabPFN are pretrained on broad families of synthetic tabular data-generating processes and adapt at test time through in-context learning, making them natural candidates for SBI, where posterior estimation often depends on learning informative summaries of simulated observations. We propose PFN-NPE: a general recipe that uses a pretrained TabPFN encoder as a fixed summary network for simulator outputs, then pairs the resulting summaries with a downstream inference head chosen for the problem. With normalizing flows as the default inference head, PFN-NPE matches established posterior approximation methods and sometimes outperforms them. More importantly, diagnostic probes show that the TabPFN-derived summaries often preserve useful posterior location and marginal information. These analyses also reveal a limitation in that TabPFN-derived summaries may struggle to represent the joint posterior structure even when the marginals are well recovered. Still, our experiments show that TabPFN can serve as an effective summary network across a diverse set of SBI settings, with the inference network left modular and task-dependent.

2605.07764 2026-05-11 cs.RO

CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms

Mohammed Majid, Amjad Yousef Majid

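AI summary: This paper presents CommandSwarm, a safety-aware system that generates behavior trees for robotic swarms from natural language, turning user commands into safe, executable behavior trees and avoiding invalid or dangerous actions. The system combines multilingual translation, safety filtering, constrained prompting, and an adapted large language model, and uses validation to ensure that generated behavior trees are syntactically correct and safe. Experiments show that domain-adapted, quantized LLMs can generate high-quality behavior trees with few examples, and that parser acceptance and safety filtering remain key gates for autonomous deployment.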

详情
英文摘要

Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open LLMs with 6.7B to 14B parameters, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction-BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
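
The "deterministic parser validation against a whitelist" step can be illustrated with a minimal sketch. The tag names (`BehaviorTree`, `Sequence`, `Takeoff`, and so on) and the function `validate_bt` are hypothetical; the abstract does not specify CommandSwarm's actual schema or primitives.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: control nodes plus executable swarm primitives.
ALLOWED_TAGS = {"BehaviorTree", "Sequence", "Selector", "Takeoff", "FormLine", "Land"}


def validate_bt(xml_text):
    """Return (ok, reason). Rejects malformed XML and non-whitelisted node types."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"malformed XML: {exc}"
    if root.tag != "BehaviorTree":
        return False, f"unexpected root: {root.tag}"
    # iter() walks the root and every descendant element.
    for node in root.iter():
        if node.tag not in ALLOWED_TAGS:
            return False, f"unsupported node: {node.tag}"
    return True, "ok"
```

A gate of this kind rejects a generated tree outright rather than trying to repair it, which matches the abstract's point that parser acceptance is an execution gate separate from generation quality.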

2605.07760 2026-05-11 cs.AI

RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation

Zhifeng Lu, Dianyuan Wang, Yuhu Shang, Zhenbo Xu

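AI Summary RuleSafe-VL is a new benchmark for evaluating rule-conditioned decision reasoning in vision-language content moderation, designed to test whether models apply policy rules correctly when judging content. Built from publicly available platform moderation policies, it defines 93 atomic rules and 92 rule relations, yields 2,166 image-text cases covering high-risk policies, and includes four diagnostic tasks that assess rule activation, rule interaction, decision sufficiency, and related abilities. Experiments show that current mainstream vision-language models still struggle with rule-relation recovery and decision-state prediction, underscoring both the importance and the difficulty of rule-conditioned reasoning in content moderation.

Comments Preprint

Details
English Abstract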

Platform content moderation applies explicit policy rules and context-dependent conditions to decide whether user content is allowed, restricted, or removed. A correct moderation outcome must therefore depend on which rules a case activates, how those rules interact, and whether the available evidence is sufficient. Current multimodal safety benchmarks largely reduce moderation to matching predefined final labels, leaving this underlying rule structure untested. As a result, a high benchmark score reveals little about whether a model applies the policy correctly or arrives at the correct label through superficial cues. To evaluate this rule-governed process, we introduce RuleSafe-VL, a benchmark for rule-conditioned decision reasoning in vision-language content moderation. Derived from publicly available platform moderation policies, RuleSafe-VL formalizes 93 atomic rules and 92 typed rule relations, yielding 2,166 context-sensitive image-text cases across three high-risk policy families. Its four diagnostic tasks decompose moderation into a rule-conditioned decision chain. They identify activated rules, recover rule interactions, judge decision sufficiency, and resolve outcomes once missing context is supplied. Experiments on 10 frontier, open-source, and safety-oriented VLMs reveal rule-relation recovery as the dominant bottleneck, where the best model reaches only 64.8 Macro-F1 and some safety-oriented models fall below 7 Macro-F1. Decision-state prediction also remains unreliable, peaking at 64.5 Macro-F1. RuleSafe-VL shifts moderation evaluation from final-label scoring toward diagnostic assessment of rule-conditioned decision reasoning.

2605.07757 2026-05-11 cs.LG

Efficient Verification of Neural Control Barrier Functions with Smooth Nonlinear Activations

Jun Zhang, Haibo Zhang, Chun Liu, Xiaofan Wang, Liang Xu

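AI Summary This paper studies formal verification of neural control barrier functions (NCBFs) with smooth nonlinear activations. To address the conservativeness of existing methods on nonlinear activations such as tanh, it proposes LightCROWN, which exploits the analytical properties of activation functions to compute tighter Jacobian bounds, improving verification accuracy and efficiency. Experiments show that LightCROWN substantially raises verification success rates on several nonlinear control systems while improving speed and scalability, offering a general improvement for CROWN-based NCBF verification.

Comments 9 pages, 4 figures

Details
English Abstract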

Formal verification of neural control barrier functions (NCBFs) remains challenging, especially for neural networks with nonlinear activations like tanh. Existing CROWN-based methods rely on conservative linear relaxations for Jacobian bounds, limiting scalability. We propose LightCROWN, which computes tighter Jacobian bounds by exploiting the analytical properties of activation functions. Experiments on nonlinear control systems including the inverted pendulum, Dubins car, and planar quadrotor demonstrate that LightCROWN improves verification success rates up to 100%, while enhancing speed and scalability. Our approach provides a generalizable improvement for CROWN-based frameworks, enabling more efficient verification of complex NCBFs. The code can be found at github.com/Autonomous-Systems-and-Control-Lab/verify-neural-CBF.
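
To give a feel for what "exploiting the analytical properties of activation functions" can mean, the sketch below computes the exact range of the tanh derivative over a pre-activation interval. This is not LightCROWN's algorithm, which the abstract does not detail; it only shows one well-known property (the derivative 1 - tanh²(z) is even and decreasing in |z|) that allows a tight bound without a linear relaxation. The function name is hypothetical.

```python
import math


def tanh_derivative_bounds(lo, hi):
    """Exact range of d/dz tanh(z) = 1 - tanh(z)**2 for z in [lo, hi]."""
    if lo > hi:
        raise ValueError("need lo <= hi")

    def d(z):
        return 1.0 - math.tanh(z) ** 2

    # The derivative is even and decreases with |z|, so its maximum sits at the
    # point of the interval closest to 0 and its minimum at the farthest endpoint.
    upper = 1.0 if lo <= 0.0 <= hi else max(d(lo), d(hi))
    lower = min(d(lo), d(hi))
    return lower, upper
```

Bounds like these apply to a single activation; how they are propagated through layers to bound the full network Jacobian is the part specific to the paper.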

2605.07756 2026-05-11 cs.LG cs.AI

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Ivan Karpukhin, Andrey Savchenko

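AI Summary This paper studies how to pretrain deep models efficiently on large-scale unlabeled data and proposes a gradient-based method for weighting composite losses. By aligning the pretraining gradient with a downstream objective, the method learns the loss-term weights automatically and avoids the many independent training runs required by random search or Bayesian optimization. Experiments show strong results on event-sequence modeling and self-supervised computer vision, with a substantially lower hyperparameter-tuning cost.

Details
English Abstract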

Modern deep models are often pretrained on large-scale data with missing labels using composite objectives, where the relative weights of multiple loss terms act as hyperparameters. Tuning these weights with random search or Bayesian optimization is computationally expensive, as it requires many independent training runs. To address this, we propose a gradient-based bilevel method that learns pretraining loss weights online by aligning the composite pretraining gradient with a downstream objective. By exploiting the structure of the loss, the method avoids the multiple backward passes typically required by truncated backpropagation through the full model, reducing the overhead of hyperparameter tuning to approximately 30% above a single training run. We evaluate the approach on event-sequence modeling and self-supervised computer vision, where it matches or improves upon carefully tuned baselines while substantially reducing the cost of hyperparameter tuning compared to random or Bayesian search.
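
The idea of learning loss weights by aligning the composite gradient with a downstream gradient can be sketched in a simplified form. The code below is an illustrative first-order heuristic, not the paper's bilevel method: it assumes the weights are a softmax over logits and takes one ascent step on the inner product between the weighted sum of per-term gradients and a downstream gradient. All names and the softmax parameterization are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def update_loss_weights(logits, term_grads, downstream_grad, lr=0.1):
    """One ascent step on <sum_i w_i g_i, d> with w = softmax(logits)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    w = [e / total for e in exps]
    # Alignment of each loss term's gradient with the downstream gradient.
    a = [dot(g, downstream_grad) for g in term_grads]
    mean_a = sum(wi * ai for wi, ai in zip(w, a))
    # Under softmax, d/d logit_i of sum_j w_j a_j equals w_i * (a_i - mean_a).
    return [l + lr * wi * (ai - mean_a) for l, wi, ai in zip(logits, w, a)]
```

Loss terms whose gradients point along the downstream gradient gain weight, and terms that oppose it lose weight; the update reuses gradients already computed in training rather than launching separate runs per weight setting.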

2605.07755 2026-05-11 cs.LG cs.CL

Rethinking State Tracking in Recurrent Models Through Error Control Dynamics

Jiwan Chung, Heechan Choi, Seon Joo Kim

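AI Summary This paper revisits state tracking in recurrent models and argues that error control matters as much as expressive capacity. The authors prove that affine recurrent networks that preserve state representations cannot correct errors in the subspaces that distinguish symbolic states, so state tracking is not robust in practice. The study characterizes the failure mechanism and shows experimentally that tracking degrades sharply once the distinguishability ratio crosses the decoder's readability threshold, which hurts downstream accuracy. The findings highlight the importance of error control for robust state tracking.

Details
English Abstract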

The theory of state tracking in recurrent architectures has predominantly focused on expressive capacity: whether a fixed architecture can theoretically realize a set of symbolic transition rules. We argue that equally important is error control, the dynamics governing hidden-state drift along the directions that distinguish symbolic states. We prove that affine recurrent networks, a class of models encompassing State-Space Models and Linear Attention, cannot correct errors along state-separating subspaces once they preserve state representations. Consequently, practical affine trackers do not learn robust state tracking; rather, they learn finite horizon solutions governed by accumulated state-relevant error. We characterize the mechanics of this failure, showing that tracking remains readable only while the accumulating within-class spread remains small relative to the initial between-class separation. We demonstrate empirically on group state-tracking tasks that this breakdown is predictable: tracking collapses when the distinguishability ratio crosses the readability threshold of the trained decoder. Across trained models, the point of this crossing predicts the horizon at which downstream accuracy fails. These results establish that robust state tracking is determined not only by an architecture's theoretical expressivity but crucially by its error control.
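
The intuition behind the error-control claim can be checked numerically. For an affine update h' = A h + b, the gap between a clean and a perturbed state evolves as e' = A e, independent of b. The sketch below assumes, for illustration only, that A is a 2D rotation (so distances between states are preserved); under that assumption an injected error never shrinks. This is a toy demonstration, not the paper's proof.

```python
import math


def affine_step(h, theta, b):
    """h' = R(theta) h + b, a norm-preserving affine update in 2D."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * h[0] - s * h[1] + b[0], s * h[0] + c * h[1] + b[1])


def error_norms(h0, e0, inputs):
    """Track ||perturbed - clean|| while both states see the same inputs."""
    clean = h0
    noisy = (h0[0] + e0[0], h0[1] + e0[1])
    norms = []
    for theta, b in inputs:
        clean = affine_step(clean, theta, b)
        noisy = affine_step(noisy, theta, b)
        norms.append(math.hypot(noisy[0] - clean[0], noisy[1] - clean[1]))
    return norms
```

Because the same map that keeps distinct states apart also keeps the error intact, repeated perturbations can only accumulate, which is consistent with the finite-horizon behaviour the abstract describes.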

2605.07752 2026-05-11 cs.LG cs.CE

Robust and Reliable AI for Predictive Quality in Semiconductor Materials Manufacturing with MLOps and Uncertainty Quantification

Min Gao, Julia Maria Perathoner, Anton Ludwig Bonin, Steven Eulig, Gianni Klesse

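AI Summary This paper studies how MLOps and uncertainty quantification can deliver robust, reliable predictive quality control in semiconductor materials manufacturing. To address model degradation caused by changing process conditions, equipment aging, and raw-material variability, the authors use five years of real production data to evaluate model retraining strategies and hyperparameter optimization approaches. They find that retraining on a fixed cadence (every five batches) without hyperparameter tuning maintains superior performance under a range of drift conditions while significantly reducing computational overhead. They also apply conformal prediction to produce prediction intervals with statistical guarantees, enabling a shift from reactive to proactive quality management and offering practical guidance for efficient, reliable AI deployment in manufacturing.

Details
English Abstract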

Semiconductor materials manufacturing presents unique challenges for machine learning deployment due to evolving process conditions, equipment degradation, and raw material variability that can cause model performance deterioration over time. This study benchmarks machine learning operations (MLOps) retraining strategies using five years of real manufacturing data to identify optimal retraining approaches for quality prediction. We evaluate various retraining frequencies and hyperparameter optimization strategies using control-limit-normalized residuals as the key performance metric. Results demonstrate that a fixed retraining cadence every five production batches without hyperparameter retuning achieves superior performance across all drift conditions while significantly reducing computational overhead compared to strategies incorporating hyperparameter optimization. This approach effectively maintains model accuracy during both abrupt process changes and gradual equipment degradation patterns. To address the critical need for uncertainty quantification in manufacturing decision-making, we implement conformal prediction to generate prediction confidence intervals with strong statistical guarantees. This enables proactive quality control by identifying when prediction intervals fall within acceptable control limits, transforming traditional reactive quality management into a predictive framework. The findings provide practical guidelines for implementing robust MLOps strategies in manufacturing environments where computational efficiency and reliable uncertainty quantification are paramount for operational success.
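
The conformal prediction step and the control-limit check can be sketched as follows. The abstract does not say which conformal variant is used; this sketch assumes split conformal prediction with absolute residuals as the nonconformity score and exchangeable calibration data. Function and parameter names are hypothetical.

```python
import math


def conformal_halfwidth(calib_residuals, alpha=0.1):
    """Split conformal: the ceil((n+1)(1-alpha))-th smallest absolute residual."""
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def interval_within_limits(pred, halfwidth, lcl, ucl):
    """True when the whole prediction interval lies inside the control limits."""
    return lcl <= pred - halfwidth and pred + halfwidth <= ucl
```

With a held-out calibration set from recent batches, `conformal_halfwidth` gives an interval `pred ± halfwidth` that covers the true value with probability at least 1 - alpha under the exchangeability assumption, and `interval_within_limits` flags batches whose entire interval sits inside the lower and upper control limits.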

2605.07749 2026-05-11 cs.CV

Benchmarking Foundation Models for Renal Lesion Stratification in CT

Hartmut Häntze, Sarah de Boer, Myrthe Buser, Alessa Hering, Bram van Ginneken, Mathias Prokop, Jawed Nawabi, Sebastian Ziegelmayer, Lisa Adams, Keno Bressem

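AI Summary This study evaluates three open-source medical foundation models on renal lesion classification in CT to examine how well they generalize in data-scarce clinical settings. Comparing the foundation models with a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch, it finds that foundation models have an advantage in hardware requirements but still perform worse overall than radiomics. The results indicate that the representations of current generalist foundation models do not yet capture the subtle texture and shape differences between histological subtypes of renal lesions, so radiomics remains the best available approach for this task.

Comments 13 pages, 4 figures

Details
English Abstract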

The rapid proliferation of open-source medical foundation models (FMs) raises a practical question: how well do their pre-trained representations transfer to clinically relevant but data-scarce classification tasks? Particularly in CT-based renal lesion classification, a push toward greater generalizability would be meaningful, as the field is constrained by inherently limited training data. We addressed this through a benchmark of three medical FMs on this specific task. This six-class problem spans common entities like cysts and clear cell renal cell carcinoma, alongside rare subtypes. Using a frozen feature-probing protocol, we compared FM embeddings against a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch. Models were trained on a composite dataset of 2,854 lesions and evaluated on an external test set of 234 lesions from The Cancer Imaging Archive. Our results reveal two key findings. First, FM performance (AUC 0.70-0.77) matched the from-scratch ResNet (AUC 0.72) while drastically reducing hardware demand, requiring only seconds on a CPU after feature extraction. However, the conventional radiomics baseline significantly outperformed all deep learning approaches, achieving an AUC of 0.88 (all p ≤ 0.002). This suggests that current generalist FM embeddings do not yet capture the fine-grained texture and shape heterogeneity driving histological subtype discrimination. Despite their potential in data-scarce settings, medical FMs did not surpass established models for renal lesion stratification, leaving radiomics as the current state-of-the-art.

2605.07748 2026-05-11 cs.CL

TextLDM: Language Modeling with Continuous Latent Diffusion

Jiaxiu Jiang, Jingjing Ren, Wenbo Li, Bo Wang, Haoze Sun, Yijun Yang, Jianhui Liu, Yanbing Zhang, Shenghe Zheng, Yuan Zhang, Haoyang Huang, Nan Duan, Wangmeng Zuo

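AI Summary This paper proposes TextLDM, which applies the latent-diffusion DiT framework from vision to language modeling, toward a unified architecture for generation and understanding. A Transformer-based VAE maps discrete tokens to a continuous latent space, Representation Alignment (REPA) with a pretrained language model improves conditional denoising, and a standard DiT performs flow matching in that space. The study finds that reconstruction fidelity alone is not enough to obtain high-quality continuous text representations, and that REPA is critical for downstream generation quality. Trained on OpenWebText2, TextLDM outperforms prior diffusion language models on several metrics and matches GPT-2.

Details
English Abstract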

Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous latents, enhanced by Representation Alignment (REPA) with a frozen pretrained language model to produce representations effective for conditional denoising. A standard DiT then performs flow matching in this latent space, identical in architecture to its visual counterpart. The central challenge we address is obtaining high-quality continuous text representations: we find that reconstruction fidelity alone is insufficient, and that aligning latent features with a pretrained language model via REPA is critical for downstream generation quality. Trained from scratch on OpenWebText2, TextLDM substantially outperforms prior diffusion language models and matches GPT-2 under the same settings. Our results establish that the visual DiT recipe transfers effectively to language, taking a concrete step toward unified diffusion architectures for multimodal generation and understanding.
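
For readers unfamiliar with flow matching, the training target can be sketched in a few lines. The sketch assumes the common straight-line path between Gaussian noise and a data latent, with the constant velocity as regression target; the abstract does not state which path TextLDM uses, and `predict_velocity` stands in for the DiT as a hypothetical callable.

```python
import random


def flow_matching_pair(x0, x1, t):
    """Point on the straight path from noise x0 to latent x1, plus target velocity."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v = [b - a for a, b in zip(x0, x1)]
    return xt, v


def flow_matching_loss(predict_velocity, x1, rng=random):
    """Squared error between a model's velocity and the target at a random time."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    xt, v = flow_matching_pair(x0, x1, t)
    pred = predict_velocity(xt, t)
    return sum((p - q) ** 2 for p, q in zip(pred, v)) / len(x1)
```

Here `x1` would be a latent produced by the VAE encoder for a text sequence; at sampling time the learned velocity field is integrated from noise toward a latent, which the VAE decoder maps back to tokens.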

2605.07741 2026-05-11 cs.RO

Offline-Online Hierarchical 3D Global Relocalization With Synthetic LiDAR Sensing and Descriptor-Space Retrieval

Jiahua Ren, Kai Shen, Muhua Zhang, Lei Ma

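AI Summary This paper proposes an offline-online hierarchical framework to address the computational efficiency and accuracy of 3D global relocalization for mobile robots in large-scale environments. The offline phase generates candidate positions and an index of their geometric descriptors to shrink the online search space; the online phase combines global retrieval with point cloud registration for fast, precise 6-DoF pose estimation. Experiments show an average relocalization time of 3 s and a localization accuracy of 8 cm in real environments, with computational efficiency an order of magnitude better than existing methods.

Details
English Abstract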

3D global relocalization is one of the key capabilities for mobile robots in practical applications. However, in large-scale spaces, existing methods often suffer from prolonged online relocalization time due to factors such as the massive pose search space and high computational overhead. To address these issues, this paper proposes an offline-online hierarchical framework that decouples the search space. In the offline phase, candidate positions and their corresponding geometric descriptor indices are generated in the map by simulating LiDAR scans within the grid map. In the online phase, a coarse pose estimate is first obtained via global retrieval, followed by point cloud registration to output precise 6-DoF pose estimates. Real-world experiments demonstrate that the proposed method achieves an average relocalization time of 3 s and an average localization accuracy of 8 cm in 3D environments. Compared with existing global relocalization methods, the proposed method achieves an order-of-magnitude improvement in computational efficiency while delivering comparable relocalization accuracy.
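
The online retrieval stage can be pictured as a nearest-neighbour lookup over descriptors precomputed offline. The sketch below uses brute-force search and squared Euclidean distance as stand-ins; the descriptor type, the index structure, and the distance used by the authors are not given in the abstract, and all names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_candidates(index, query, k=3):
    """Return the k candidate positions whose descriptors are closest to query.

    index: list of (position, descriptor) pairs built offline from simulated scans.
    query: descriptor computed from the live LiDAR scan.
    """
    ranked = sorted(index, key=lambda item: squared_distance(item[1], query))
    return [position for position, _ in ranked[:k]]
```

Each returned position would then seed point cloud registration, so the expensive alignment runs on a handful of candidates instead of the full pose space.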

2605.07740 2026-05-11 cs.CV

LAMES: A Large-Scale and Artisanal Mining Environmental Segmentation Dataset

Matthias Kahl, Zhaiyu Chen, Sudipan Saha, Mrinalini Kochupillai, Lukas Kondmann, Xiao Xiang Zhu

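AI Summary This paper introduces LAMES, a large-scale and artisanal mining environmental segmentation dataset intended to support monitoring and research on mining activity and its environmental impact. The dataset contains 150 large-scale mining (LSM) sites and 870 km² of annotated artisanal small-scale mining (ASM) area, with rich metadata covering nine classes of LSM sections and 27 attributes per site. It supports a deeper understanding of the relationship between mine facility attributes and environmental impact, and raises questions about researchers' social and ethical responsibilities.

Details
English Abstract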

Mining operations are of utmost importance to the economy of some nations. However, such operations result in land-use change, very high energy consumption, and negative impacts on the environment, including soil erosion and deforestation. The mining process can impact an area much larger than the mining site itself. Adding to the negative externalities linked to mining is the fact that, in addition to government-sanctioned legal mining operations, illegal mining is widespread, including in various countries of Africa. The ability to monitor remote mining site activities can be useful, e.g., for the detection of illegal artisanal mining activities and their environmental impacts. An important outcome of such monitoring could include a better understanding of the interrelationship between mine facility attributes (e.g., mining types, processing methods, commodities, etc.) and their impact on the natural environment. In this work, we present a data set that contains 150 Large Scale Mining (LSM) sites and 870 km² of annotated Artisanal Small-scale Mining (ASM) area. The metadata includes nine eminent LSM sections and 27 mining site attributes for each LSM site. We also discuss the data set's possible contribution to the research community, social and environmental consequences, and researchers' responsibilities from an ethics perspective.

2605.07736 2026-05-11 cs.AI

Online Goal Recognition using Path Signature and Dynamic Time Warping

Douglas Tesch, Nathan Gavenski, Leonardo Amado, Odinaldo Rodrigues, Felipe Meneguzzi

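AI Summary This paper studies two core challenges of online goal recognition in continuous domains: efficiently encoding long trajectories and effectively comparing them. The authors propose a new method based on path signatures and dynamic time warping, using path signatures to encode trajectories compactly and expressively so that trajectory comparisons are more semantically meaningful. Experiments show that the method outperforms existing approaches in predictive accuracy and online planning efficiency while remaining competitive offline.

Comments Accepted as part of the 35th International Joint Conference on Artificial Intelligence

Details
English Abstract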

Online goal recognition in continuous domains poses two central challenges: efficiently encoding large trajectories and effectively comparing them. Recent work addresses these challenges by using custom state-space representations and metrics to compare observations against hypotheses. However, these approaches often overlook well-established encoding techniques used in other domains that offer substantial advantages. This paper introduces a novel method for online goal recognition that leverages path signatures, a compact, expressive representation from rough path theory that efficiently captures key semantic features of trajectories, enabling more meaningful comparisons between them. Experiments show that our method consistently outperforms the state of the art in predictive accuracy and online planning efficiency, while remaining competitive offline.
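
The two building blocks named in the title can each be sketched briefly. The code below computes the first two signature levels of a piecewise-linear path (using Chen's identity to append one straight segment at a time) and a textbook dynamic time warping distance. How the paper combines them, and the signature depth it uses, are not stated in the abstract; depth 2 and the function names are assumptions made for illustration.

```python
def signature_depth2(path):
    """Level-1 and level-2 signature terms of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [b - a for a, b in zip(p, q)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one straight segment.
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


def dtw_distance(a, b, dist):
    """Classic O(len(a) * len(b)) dynamic time warping cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The level-1 terms are the net displacement, and the antisymmetric part of the level-2 terms is the signed (Lévy) area swept by the path, which is one reason signatures distinguish trajectories that share the same endpoints.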