arXivDaily: arXiv daily academic digest, updated Monday through Friday
2412.12036 2026-06-02 cs.LG cs.RO

LeARN: Learnable and Adaptive Representations for Nonlinear Dynamics in System Identification


Arunabh Singh, Joyjit Mukherjee

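Affiliations: Visual Computing Lab, Indian Institute of Science; Department of Electrical and Electronics Engineering, BITS Pilani Hyderabad Campus

AI summary: Proposes LeARN, a framework that uses meta-learning to learn the library of basis functions directly from data, without domain knowledge, enabling adaptive identification of nonlinear dynamics; on the Neural Fly dataset it achieves dynamical error performance comparable to SINDy.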

Comments This work has been accepted at the 34th Mediterranean Conference on Control and Automation (MED 2026)


Abstract

System identification, the process of deriving mathematical models of dynamical systems from observed input-output data, has undergone a paradigm shift with the advent of learning-based methods. Addressing the intricate challenges of data-driven discovery in nonlinear dynamical systems, these methods have garnered significant attention. Among them, Sparse Identification of Nonlinear Dynamics (SINDy) has emerged as a transformative approach, distilling complex dynamical behaviors into interpretable linear combinations of basis functions. However, SINDy's reliance on domain-specific expertise to construct its foundational 'library' of basis functions limits its adaptability and universality. In this work, we introduce a nonlinear system identification framework LeARN that transcends the need for prior domain knowledge by learning the library of basis functions directly from data. To enhance adaptability to evolving system dynamics under varying noise conditions, we employ a novel meta-learning-based system identification approach that utilizes a light-weight Deep Neural Network (DNN) to dynamically refine these basis functions. This not only captures intricate system behaviors but also adapts effectively to new dynamical regimes. We validate our framework on the Neural Fly dataset, showcasing its robust adaptation and generalization capabilities. Despite its simplicity, our LeARN achieves competitive dynamical error performance to SINDy. This work presents a step towards autonomous discovery of dynamical systems, paving the way for a future where machine learning uncovers the governing principles of complex systems without requiring extensive domain-specific interventions.
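
The SINDy baseline mentioned in the abstract models dynamics as a sparse linear combination of library functions. A minimal sketch of that idea, assuming a hand-picked polynomial library and sequentially thresholded least squares. The function name `stlsq`, the library, the threshold, and the toy data are illustrative choices, not taken from the paper, and LeARN's learned library and meta-learning loop are not shown:

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if small.all():
            break
        # Refit only the library terms that survived the threshold.
        xi[~small] = np.linalg.lstsq(theta[:, ~small], dxdt, rcond=None)[0]
    return xi

# Toy system dx/dt = -2x, sampled without noise (illustrative data).
x = np.linspace(-1.0, 1.0, 50)
dxdt = -2.0 * x
theta = np.column_stack([np.ones_like(x), x, x**2])  # hand-built library [1, x, x^2]
xi = stlsq(theta, dxdt)  # one coefficient per library column
```

The hand-built `theta` is exactly the part that requires domain knowledge in this sketch; the abstract describes LeARN as learning that library from data instead.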

2412.10362 2026-06-02 cs.LG cs.CV

OP-LoRA: The Blessing of Dimensionality


Piotr Teterwak, Kate Saenko, Bryan A. Plummer, Ser-Nam Lim

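Affiliations: Boston University; University of Central Florida

AI summary: Proposes OP-LoRA, which uses an extra MLP to predict LoRA adapter weights to improve optimization; the MLP is discarded after training, improving performance and reducing sensitivity to learning rate at zero extra inference cost.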


Abstract

Low-rank adapters (LoRA) enable finetuning of large models with only a small number of parameters. However, they often suffer from an ill-conditioned loss landscape, leading to difficult optimization. Prior work addresses these challenges by aligning adapter updates with full finetuning gradients via custom optimizers, but these methods lack the flexibility to accommodate new adapter architectures and are computationally expensive. We instead introduce OP-LoRA, a novel method which replaces each LoRA adapter with weights predicted by an extra MLP, which is discarded after training. This temporarily allows additional parameters during training to improve optimization, yet requires less wall time than custom optimizers and zero extra cost at inference time because the MLP is discarded. Crucially, extending OP-LoRA to other adapters is as simple as modifying the size of the prediction head for each new adapter type. We show that OP-LoRA allows the optimization to adaptively increase or decrease step size, improving performance and decreasing sensitivity to learning rate. On both small and large-scale LoRA tuning tasks, we observe consistent performance gains of OP-LoRA relative to LoRA and its variants. We achieve especially notable improvements in image generation, with OP-LoRA CMMD scores improving by up to 15 points relative to LoRA. This allows OP-LoRA to achieve the performance of LoRA with half of the inference parameters.
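
The idea of predicting adapter weights with an MLP that is dropped after training can be sketched as follows. This is a forward-pass illustration only, under assumptions the abstract does not state: a single-hidden-layer MLP, a learned input vector `z`, and the sizes below are all hypothetical, and no training loop is shown:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, hidden = 8, 6, 2, 16  # hypothetical sizes

W = rng.normal(size=(d_out, d_in))       # frozen base weight
z = rng.normal(size=hidden)              # learned MLP input (assumption)
W1 = rng.normal(size=(hidden, hidden))   # MLP hidden layer
W2 = rng.normal(size=(rank * (d_in + d_out), hidden))  # prediction head

def predict_adapter(z, W1, W2):
    """Map z to the two low-rank LoRA factors A (rank x d_in) and B (d_out x rank)."""
    h = np.maximum(W1 @ z, 0.0)          # ReLU hidden layer
    flat = W2 @ h
    A = flat[: rank * d_in].reshape(rank, d_in)
    B = flat[rank * d_in:].reshape(d_out, rank)
    return A, B

def forward_train(x):
    """Training-time forward pass: the adapter is re-predicted by the MLP."""
    A, B = predict_adapter(z, W1, W2)
    return W @ x + B @ (A @ x)

# After training: materialise A and B once; z, W1, and W2 are no longer needed.
A_final, B_final = predict_adapter(z, W1, W2)

def forward_infer(x):
    """Inference-time forward pass: plain LoRA, no MLP involved."""
    return W @ x + B_final @ (A_final @ x)

x = rng.normal(size=d_in)
y_train, y_infer = forward_train(x), forward_infer(x)
```

In this sketch only the output size of `W2` depends on the adapter shape, which is one way to read the abstract's remark that supporting a new adapter type means resizing the prediction head.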

2411.17790 2026-06-02 cs.CV cs.AI

Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors


Ziang Xu, Bin Li, Yang Hu, Chenyu Zhang, James East, Sharib Ali, Jens Rittscher

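Affiliations: University of Oxford; University of Leeds

AI summary: Proposes a self-supervised framework combining a Generative Latent Bank and a variational autoencoder; depth priors from natural images and latent-variable regularization of pose enable accurate depth and pose estimation in complex endoscopic scenes.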


Abstract

Accurate 3D mapping in endoscopy enables quantitative, holistic lesion characterization within the gastrointestinal (GI) tract, requiring reliable depth and pose estimation. However, endoscopy systems are monocular, and existing methods relying on synthetic datasets or complex models often lack generalizability in challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a Generative Latent Bank and a Variational Autoencoder (VAE). The Generative Latent Bank leverages extensive depth scenes from natural images to condition the depth network, enhancing realism and robustness of depth predictions through latent feature priors. For pose estimation, we reformulate it within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. This dual refinement pipeline enables accurate depth and pose predictions, effectively addressing the GI tract's complex textures and lighting. Extensive evaluations on SimCol and EndoSLAM datasets confirm our framework's superior performance over published self-supervised methods in endoscopic depth and pose estimation.

2412.04177 2026-06-02 cs.LG stat.ML

Fixed-Mean Gaussian Processes for Post-hoc Bayesian Deep Learning


Luis A. Ortega, Simón Rodríguez-Santana, Daniel Hernández-Lobato

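Affiliations: Universidad Autónoma de Madrid, Spain; Aalborg University, Copenhagen; Universidad Pontificia de Comillas, Spain

AI summary: Proposes fixed-mean Gaussian processes (FMGP), which fix the posterior mean to the output of a pre-trained DNN and use variational inference to efficiently estimate predictive variances, giving architecture-agnostic post-hoc uncertainty quantification.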

Comments 32 pages, 6 figures and 6 tables. Submitted to for revision


Abstract

Recently, there has been an increasing interest in performing post-hoc uncertainty estimation about the predictions of pre-trained deep neural networks (DNNs). Given a pre-trained DNN via back-propagation, these methods enhance the original network by adding output confidence measures, such as error bars, without compromising its initial accuracy. In this context, we introduce a novel family of sparse variational Gaussian processes (GPs), where the posterior mean is fixed to any continuous function when using a universal kernel. Specifically, we fix the mean of this GP to the output of the pre-trained DNN, allowing our approach to effectively fit the GP's predictive variances to estimate the DNN prediction uncertainty. Our approach leverages variational inference (VI) for efficient stochastic optimization, with training costs that remain independent of the number of training points, scaling efficiently to large datasets such as ImageNet. The proposed method, called fixed-mean GP (FMGP), is architecture-agnostic, relying solely on the pre-trained model's outputs to adjust the predictive variances. Experimental results demonstrate that FMGP improves both uncertainty estimation and computational efficiency when compared to state-of-the-art methods for DNN post-hoc Bayesian inference.

2411.12321 2026-06-02 cs.CV

Enhancing Blind Source Separation with Dissociative Principal Component Analysis


Muhammad Usman Khalid

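Affiliations: College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University

AI summary: Proposes dissociative principal component analysis (DPCA), which jointly estimates principal components and loading vectors and explicitly models their interdependencies, overcoming the performance degradation of conventional sparse PCA when sources overlap; it outperforms classical sPCA on tasks such as simulated fMRI source retrieval and foreground and background separation.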

Comments 13 pages with 6 figures, this work has not been published before


Abstract

Principal component analysis (PCA) and its sparse variants (sPCA) are widely used as a precursor to independent component analysis (ICA) for blind source separation (BSS). However, sPCA typically relies on a deflation strategy that extracts components sequentially and imposes orthogonality between them. When the underlying sources overlap, this discards the cross component structure that ICA depends on, degrading separation. This paper proposes dissociative PCA (DPCA), which estimates components jointly rather than by deflation. DPCA introduces left and right dissociation matrices into the SVD based decomposition to explicitly model the interdependencies among principal components (PCs) and loading vectors (LVs), while sparsity constraints maintain interpretability. We develop three algorithms called DPCA1a, DPCA1b, and DPCA2, using adaptive soft thresholding with gradient and coordinate descent, together with a secondary firm thresholding step that preserves sparsity and suppresses background noise in the recovered loading vectors. The method is evaluated on four settings, namely simulated fMRI source retrieval, foreground and background separation, image reconstruction, and image inpainting, where it recovers source structure more reliably than classical sPCA based pipelines, with the largest gains under significant spatial overlap. DPCA reduces to ordinary PCA when the sparsity parameter is zero. A MATLAB implementation of the proposed algorithms is publicly available at https://github.com/usmankhalid06/DPCA.
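
The soft thresholding operator named in the abstract is the standard shrinkage step used to induce sparsity. A minimal sketch of the operator alone, with an illustrative loading vector; the adaptive choice of threshold, the descent updates, and the firm thresholding step of the DPCA algorithms are not reproduced here:

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry toward zero by lam; entries within [-lam, lam] become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

loadings = np.array([-2.0, -0.5, 0.5, 2.0])  # illustrative loading vector
sparse = soft_threshold(loadings, 1.0)       # small entries are zeroed, large ones shrunk
unchanged = soft_threshold(loadings, 0.0)    # lam = 0 leaves the vector as it was
```

With `lam = 0` the operator is the identity, which fits the abstract's remark that the method reduces to ordinary PCA when the sparsity parameter is zero.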

2411.11793 2026-06-02 cs.LG

Nonlinear Equilibrium Transitions in a Potential Game Model for Federated Learning


Kang Liu, Ziqi Wang, Enrique Zuazua

Affiliations: Institut de Mathématiques de Bourgogne, Université Bourgogne Europe, CNRS; Chair for Dynamics, Control, Machine Learning and Numerics – Alexander von Humboldt Professorship, Department of Mathematics, Friedrich-Alexander-Universität Erlangen-Nürnberg

AI summary: Proposes a potential game framework to study clients who rationally choose their training efforts in federated learning; finds that the Nash equilibrium varies nonlinearly with the reward factor and undergoes a nonsmooth transition at a critical value, proves convergence of the best-response algorithm, and validates the critical reward factor experimentally.

Comments Accepted for publication in Physica D: Nonlinear Phenomena


Abstract

In federated learning (FL), a central server typically allocates training efforts to clients. However, from a market-oriented perspective, clients may independently choose their training efforts based on rational self-interest. To study this setting, we propose a potential game framework in which each client's payoff is determined by its individual effort and the rewards provided by the server. The rewards are influenced by the collective efforts of all clients and can be modulated by a reward factor. We first establish the existence of Nash equilibria (NEs) and then investigate their uniqueness in a stationary setting. We show that the NEs depend nonlinearly on the reward factor and exhibit a nonsmooth transition at a critical value, where the stationary potential loses strict curvature, leading to nonunique NEs and a jump between low-effort and high-effort branches. Furthermore, we prove the convergence of the best-response algorithm for computing NEs in our FL game. Finally, we apply the clients' rational efforts derived from the NEs to FL training with various datasets and models, thereby validating the effectiveness of the identified critical reward factor.

2411.05196 2026-06-02 cs.AI cs.DL cs.LG

Explainable AI Through a Democratic Lens: DhondtXAI for D'Hondt-Projected Feature Attribution


Turker Berk Donmez

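Affiliations: Sakarya University of Applied Sciences

AI summary: Proposes DhondtXAI, a SHAP-independent, D'Hondt-based explainability framework for tabular data that computes background-interventional removal effects, separates positive and negative evidence, forms feature alliances, and allocates seats to attribute features; evaluations on synthetic data and healthcare datasets show high agreement with SHAP.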


Abstract

This study presents DhondtXAI as a SHAP-independent, D'Hondt-based attribution framework for tabular XAI. Instead of model-native feature importance or SHAP values, DhondtXAI computes background-interventional removal effects, separates positive and negative evidence, forms optional feature alliances, applies optional thresholds, allocates seats via the D'Hondt rule, and projects onto the local model-output difference. Completeness is preserved by construction, with the projection residual ratio reported as a diagnostic. The method is evaluated on synthetic additive and interaction tests, correlated-feature perturbations, operator and apportionment ablations, projection-mode comparisons, logit-scale checks, repeated split validation, paired deletion tests, and two healthcare datasets: Wisconsin Diagnostic Breast Cancer (CatBoost) and early-stage diabetes risk prediction (XGBoost). SHAP serves only as an external comparator with aligned settings. In additive synthetics, DhondtXAI exactly recovers ground-truth rankings; in multiplicative interactions, alliances reduce the mean projection residual from 0.2527 to 0.0001. On WDBC and diabetes data, it shows high agreement with SHAP (Spearman rho = 0.9273 and 0.9353), supported by further signed, top-k, magnitude, deletion, and sensitivity analyses. Results position DhondtXAI as a complementary proportional, alliance-aware, and threshold-aware tabular XAI method, not a replacement for SHAP or LIME.
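
The seat-allocation step rests on the D'Hondt rule, which can be sketched on its own. The function name `dhondt` is illustrative, and the vote counts below are a standard electoral illustration rather than data from the paper; in an attribution setting the "votes" would presumably be per-feature effect magnitudes, which is an assumption here. Removal effects, alliances, thresholds, and the projection step are not shown:

```python
def dhondt(votes, seats):
    """Allocate seats one at a time to the largest quotient votes / (seats_won + 1)."""
    alloc = {name: 0 for name in votes}
    for _ in range(seats):
        # Ties are broken by dictionary order in this sketch.
        winner = max(votes, key=lambda name: votes[name] / (alloc[name] + 1))
        alloc[winner] += 1
    return alloc

votes = {"A": 100_000, "B": 80_000, "C": 30_000, "D": 20_000}
allocation = dhondt(votes, seats=8)
# → {"A": 4, "B": 3, "C": 1, "D": 0}
```

Because each seat already won lowers a party's next quotient, large contributors receive a roughly proportional share while small ones can end up with none.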

2411.05359 2026-06-02 cs.CV cs.AI cs.CY

Agricultural Landscape Understanding At Country-Scale


Radhika Dua, Aditi Agarwal, Aishwarya Jayagopal, Depanshu Sani, Alex Wilson, Hoang Tran, Ishan Deshpande, Bogdan Floristean, Neelabh Goyal, Ramya Cheruvu, Vishal Batchu, Yan Mayster, Gaurav Aggarwal, Alok Talekar, Vaibhav Rajan

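Affiliations: Google DeepMind; Google

AI summary: Presents the first national-scale agricultural mapping system, which uses novel post-processing heuristics to segment instances of fields, trees, and water bodies, and is deployed and validated at national scale.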

Comments 32 pages, 11 tables, 22 figs


Abstract

Comprehensive agricultural landscape understanding is critical for addressing global challenges in food security, climate change, and resource management. This requires mapping not just crop fields, but also vital features like trees and water bodies, which form an intricate mosaic in the complex smallholder systems dominating the Global South. Previous efforts to develop such land use maps have been limited by a narrow focus on methods for field delineation only, and have not developed the robust post-processing steps essential for real-world deployment. Further, to our knowledge, no prior system for smallholder farms has been deployed and evaluated at a national scale. This work addresses these limitations by presenting the first national-scale agricultural mapping system that moves beyond simple field delineation to enable segmentation of agricultural instances like fields, trees and water bodies. Our system is refined for real-world application using novel post-processing heuristics to ensure map consistency and accuracy, and is validated through a rigorous, multi-faceted evaluation process. Fine-grained land use maps generated by our system are publicly accessible via an API at http://agri.withgoogle.com, enabling a wide range of applications from precision agriculture and policy-making to advancing global sustainability development goals.

2410.21361 2026-06-02 cs.CV cs.LG

Domain Adaptation with a Single Vision-Language Embedding


Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Patrick Pérez, Raoul de Charette

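Affiliations: Inria; Kyutai

AI summary: Proposes a domain adaptation framework that relies on a single vision-language (VL) embedding; prompt/photo-driven instance normalization (PIN) mines multiple visual styles, enabling zero-shot and one-shot unsupervised domain adaptation that outperforms baselines on semantic segmentation.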

Comments International Journal of Computer Vision (IJCV 2026)


Abstract

Domain adaptation has been extensively investigated in computer vision but still requires access to target data at the training time, which might be difficult to obtain in real-world autonomous driving scenarios, especially under rare or adverse conditions. In this paper, we present a new framework for domain adaptation relying on a single Vision-Language (VL) latent embedding instead of full target data. First, leveraging a contrastive language-image pre-training model (CLIP), we propose prompt/photo-driven instance normalization (PIN). PIN is a feature augmentation method that mines multiple visual styles using a single target VL latent embedding, by optimizing affine transformations of low-level source features. The VL embedding can come from a language prompt describing the target domain, a partially optimized language prompt, or a single unlabeled target image. Second, we show that these mined styles (i.e., augmentations) can be used for zero-shot (i.e., target-free) and one-shot unsupervised domain adaptation. Experiments on semantic segmentation in real-world driving datasets, including Cityscapes and ACDC (adverse conditions), demonstrate the effectiveness of the proposed method, which outperforms relevant baselines in the practical zero-shot and one-shot settings.
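
The affine transformation of low-level features can be pictured as instance normalization followed by a per-channel scale and shift. A minimal sketch under that reading; the function name, tensor sizes, and the target statistics `mu_t` and `sigma_t` are illustrative values, whereas in the abstract's description they would be optimized against the target VL embedding (that optimization and the CLIP model are not shown):

```python
import numpy as np

def instance_norm_affine(feat, mu_t, sigma_t, eps=1e-5):
    """feat has shape (C, H, W). Normalise each channel, then rescale and shift it."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    normed = (feat - mu) / (sigma + eps)
    # Per-channel affine transform: these two vectors define one "style".
    return sigma_t[:, None, None] * normed + mu_t[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 4, 4))        # stand-in for low-level source features
mu_t = np.array([0.5, -1.0, 2.0])        # illustrative target channel means
sigma_t = np.array([1.0, 0.5, 2.0])      # illustrative target channel scales
styled = instance_norm_affine(feat, mu_t, sigma_t)
```

Each distinct pair `(mu_t, sigma_t)` yields a different augmentation of the same source features, which is how a single embedding could give rise to multiple mined styles.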

2410.12325 2026-06-02 cs.CL

$M^3$ Scaling Law: Optimizing Multi-Epoch, Multi-Lingual, and Multi-Stage Training for Low-Resource Language Models


Kosuke Akimoto, Taiki Miyagawa, Masafumi Oyamada

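Affiliations: NEC Corporation

AI summary: Proposes the $M^3$ Scaling Law, which jointly predicts how model scale, the number of target-corpus epochs, the average target-language ratio, and the final-stage target-language ratio affect pretraining loss for low-resource language models, and derives practical guidelines for the optimal training recipe.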

Comments 35 pages, 14 figures, 17 tables


Abstract

In this paper, we study a fundamental design problem in pretraining Large Language Models (LLMs) for low-resource language regimes. Existing works adopt multi-epoch, multi-lingual, and multi-stage training to utilize the limited target-language corpus efficiently, but no prior scaling law can compare recipes spanning these approaches under the same compute budget $C$ and target-language corpus size $D_T$, leaving the optimal training setup unclear. To address this gap, we propose the $M^3$ Scaling Law, a unified predictive model parameterized by the model scale, the number of target-corpus epochs $k$, the average target-language ratio $r$, and the final-stage target-language ratio $r_f$, which places monolingual single-stage, multi-lingual single-stage, and multi-lingual multi-stage recipes on a single target-language loss surface. Across three language pairs, it extrapolates to unseen hyperparameter regions more accurately than existing scaling laws. Using $M^3$ as a surrogate objective, we derive two practical guidelines for low-resource LLM pretraining: (i) as $D_T$ decreases, the optimal recipe shifts directly from monolingual single-stage to multi-lingual two-stage training at a compute-budget-dependent threshold, with multi-lingual single-stage never optimal in our experimental grid; and (ii) the optimal number of epochs collapses onto a single curve in the scarcity variable $D_T/D^*(C)$, where $D^*(C) \propto C^{\alpha/(\alpha+\beta)}$ is the monolingual compute-optimal corpus size.
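
The exponent in $D^*(C)$ is what one would obtain from a Chinchilla-style loss $A N^{-\alpha} + B D^{-\beta}$ minimized under a compute constraint $C = 6ND$. Both the loss form and the constraint are assumptions here, since the abstract does not state them, and the constants below are illustrative only. A small numerical check of the closed form against a grid search:

```python
import numpy as np

def d_star(C, A, B, alpha, beta):
    """Closed-form minimiser of A*N**-alpha + B*D**-beta subject to C = 6*N*D."""
    scale = (beta * B / (alpha * A)) ** (1.0 / (alpha + beta))
    return scale * (C / 6.0) ** (alpha / (alpha + beta))

def scarcity(D_T, C, A, B, alpha, beta):
    """Scarcity variable D_T / D*(C): values below 1 mean the corpus is scarce."""
    return D_T / d_star(C, A, B, alpha, beta)

A, B, alpha, beta = 1.0, 1.0, 0.34, 0.28   # illustrative constants, not fitted values
C = 6e18                                   # illustrative compute budget

# Brute-force check: evaluate the loss on a fine grid of corpus sizes D.
D_grid = np.logspace(6, 12, 60001)
N_grid = C / (6.0 * D_grid)
loss = A * N_grid ** (-alpha) + B * D_grid ** (-beta)
D_numeric = D_grid[np.argmin(loss)]
D_closed = d_star(C, A, B, alpha, beta)
```

Under these assumptions, multiplying the compute budget by 10 multiplies `d_star` by `10 ** (alpha / (alpha + beta))`, which is the proportionality stated in the abstract.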

2410.09737 2026-06-02 cs.LG

Towards Stable, Globally Expressive Graph Representations with Laplacian Eigenvectors


Junru Zhou, Cai Zhou, Xiyuan Wang, Pan Li, Muhan Zhang

Affiliations: Institute for Artificial Intelligence, Peking University; Department of EECS, Massachusetts Institute of Technology; School of ECE, Georgia Institute of Technology

AI summary: Proposes a method that uses learnable $O(p)$-invariant representations and smooth handling of numerically close eigenvalues to make Laplacian eigenvector features for graph neural networks more stable and globally expressive.


Abstract

A popular way to improve the expressive power of graph neural networks (GNNs) is to use Laplacian eigenvectors as additional node features, since they can serve both as structural identifiers and global coordinates of nodes. Properly handling the orthogonal group symmetry among eigenvectors is crucial for the stability and generalizability of Laplacian eigenvector augmented GNNs. Previous studies have shown that using a naive $O(p)$-group invariant encoder for each $p$-dimensional eigenspace often leads to expressivity loss and numerical instability. In this paper, we propose a novel method exploiting Laplacian eigenvectors to generate stable and globally expressive graph representations. The main difference from previous works is that (i) our method utilizes learnable $O(p)$-invariant representations for each Laplacian eigenspace of dimension $p$, which are built upon powerful orthogonal group equivariant neural network layers already well studied in the literature, and that (ii) our method deals with numerically close eigenvalues in a smooth fashion, ensuring its better robustness against perturbations. Experiments on various graph learning benchmarks witness the competitive performance of our method, especially its great potential to learn global properties of graphs.
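
The orthogonal group symmetry can be seen on a small graph: when an eigenvalue has multiplicity $p$, any rotation of the eigenvector basis within that eigenspace is equally valid, so raw eigenvectors are not well-defined features. A minimal sketch on a 4-cycle, using the eigenspace projector $VV^\top$ as a classical fixed invariant; this illustrates the symmetry only and is not the learnable representation the abstract proposes:

```python
import numpy as np

# Laplacian of the 4-cycle; its eigenvalues are 0, 2, 2, 4.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
evals, evecs = np.linalg.eigh(L)

# Orthonormal basis of the 2-dimensional eigenspace for eigenvalue 2.
V = evecs[:, np.isclose(evals, 2.0)]

# Rotate the basis inside the eigenspace with an arbitrary element of O(2).
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
V_rot = V @ Q

# The raw node features change, but the projector onto the eigenspace does not.
P = V @ V.T
P_rot = V_rot @ V_rot.T
```

Both `V` and `V_rot` are valid eigenvector matrices for the same graph, so a model fed raw eigenvectors may see different inputs depending on the eigensolver, while `P` stays the same.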

2410.02511 2026-06-02 cs.AI cs.MA

Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration


Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Heming Zou, Chang Liu, Cheems Wang, Meiqin Liu, Xiangyang Ji

Affiliations: Department of Automation, Tsinghua University, Beijing 100084, China; College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China; National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi’an Jiaotong University, Xi’an 710049, China

AI summary: Proposes LEMAE, which uses a large language model to discriminate key states and designs a subspace-based intrinsic reward and a key state memory tree to guide efficient multi-agent exploration; it substantially outperforms existing methods on the SMAC and MPE benchmarks, with a 10x acceleration in certain scenarios.

Journal ref SCIENCE CHINA Information Sciences 2026

Abstract

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices pose a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner at low LLM inference costs. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. Benefiting from diminishing redundant explorations, LEMAE outperforms existing SOTA approaches on the challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.

2404.13621 2026-06-02 cs.CV cs.LG cs.MM

Attack on Scene Flow using Point Clouds


Haniyeh Ehsani Oskouie, Mohammad-Shahram Moin, Shohreh Kasaei

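Affiliations: Sharif University of Technology; ICT Research Institute

AI summary: Proposes white-box adversarial attacks on scene flow networks, reaching up to 33.7% relative degradation in average end-point error on the KITTI and FlyingThings3D datasets, and reveals the impact of attacks restricted to a single dimension or color channel.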


Abstract

Deep neural networks have made significant advancements in accurately estimating scene flow using point clouds, which is vital for many applications like video analysis, action recognition, and navigation. The robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains. Surprisingly, the robustness of scene flow networks against such attacks has not been thoroughly investigated. To bridge this gap, the proposed approach introduces adversarial white-box attacks specifically tailored for scene flow networks. Experimental results show that the generated adversarial examples obtain up to 33.7 relative degradation in average end-point error on the KITTI and FlyingThings3D datasets. The study also reveals the significant impact that attacks targeting point clouds in only one dimension or color channel have on average end-point error. Analyzing the success and failure of these attacks on the scene flow networks and their 2D optical flow network variants shows a higher vulnerability for the optical flow networks. Code is available at https://github.com/aheldis/Attack-on-Scene-Flow-using-Point-Clouds.git.
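
The metric reported in the abstract, average end-point error, is commonly computed as the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch of that metric with illustrative arrays; the function name is hypothetical and the attack itself, which needs a differentiable scene flow network, is not shown:

```python
import numpy as np

def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors."""
    return float(np.linalg.norm(pred_flow - gt_flow, axis=1).mean())

gt = np.zeros((2, 3))                    # two points with zero true motion (illustrative)
pred = np.array([[3.0, 4.0, 0.0],        # off by a vector of length 5
                 [0.0, 0.0, 0.0]])       # exactly right
epe = end_point_error(pred, gt)
# → 2.5
```

An attack is then judged by how much this value rises when the network is fed the perturbed point cloud instead of the clean one.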

2012.01494 2026-06-02 cs.CV

Braille to Text Translation for Bengali Language: A Geometric Approach


Minhas Kamal, Amin Ahsan Ali, Muhammad Asif Hossain Khan, Mohammad Shoyaib

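Affiliations: Institute of Information Technology; University of Dhaka

AI summary: To address the lack of Braille translation tools for Bengali, proposes a Braille-to-text translation method based on image processing and geometric structure analysis, reaching 97.25% recognition accuracy.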

Comments GitHub Repo.: https://github.com/MinhasKamal/BrailleToTextTranslator

Journal ref Jahangirnagar University Journal of Information Technology (JJIT), vol. 7, pp. 93-111, June 2018

Abstract

Braille is the only reading and writing system for visually impaired people. However, most sighted people cannot read Braille, so teachers and relatives find it hard to assist them with learning. Almost every major language has software solutions for this translation purpose, but Bengali lacks such a tool. Here, we propose Braille to Text Translator, which takes an image of these tactile alphabets and translates them to plain text. Image deterioration, scan-time page rotation, and Braille dot deformation are the principal issues in this scheme. All of these challenges are addressed directly using special image processing and geometric structure analysis. The technique yields 97.25% accuracy in recognizing Braille characters.

2407.15510 2026-06-02 cs.AI cs.DM cs.LO cs.SC

Algebraic anti-unification


Christian Antić

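Affiliations: Vienna University of Technology

AI summary: Develops an algebraic theory of anti-unification in the general setting of universal algebra, introducing the algebraic generalization ordering and minimally general generalizations, establishing basic structural properties, and using automata-theoretic methods to study computability in finite unary algebras and finite algebras.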


Abstract

Abstraction is key to human and artificial intelligence as it allows one to identify common structure in otherwise distinct objects or situations. Anti-unification (or generalization) is the branch of theoretical computer science and artificial intelligence that studies abstraction and has found applications in areas such as inductive logic programming, program synthesis, and analogy-making. To date, anti-unification has been studied almost exclusively from a syntactic perspective. In this paper, we initiate an algebraic (i.e., semantic) theory of anti-unification in the general setting of universal algebra, thereby extending anti-unification from term-based representations to arbitrary algebras and beyond equational theories. In particular, we introduce the notions of algebraic generalization ordering and minimally general generalization, establish basic structural properties, prove compatibility with homomorphisms and isomorphisms, and investigate computability in finite unary algebras and finite algebras via automata-theoretic methods.
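
For contrast with the algebraic theory, the classical syntactic notion of anti-unification can be sketched in a few lines: the least general generalization of two terms keeps their shared structure and replaces each mismatching pair of subterms with a variable. The term encoding (nested tuples with the function symbol first, constants as strings) and the function name `lgg` are illustrative choices; this is the syntactic baseline the abstract refers to, not the paper's algebraic construction:

```python
def lgg(s, t, table=None):
    """Least general generalization of two first-order terms."""
    if table is None:
        table = {}
    if s == t:
        return s
    same_shape = (
        isinstance(s, tuple) and isinstance(t, tuple)
        and s[0] == t[0] and len(s) == len(t)
    )
    if same_shape:
        # Same function symbol and arity: generalise argument by argument.
        return (s[0],) + tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
    # Mismatch: reuse the variable already assigned to this pair, if any.
    return table.setdefault((s, t), f"X{len(table)}")

term1 = ("f", "a", ("g", "a"))   # f(a, g(a))
term2 = ("f", "b", ("g", "b"))   # f(b, g(b))
general = lgg(term1, term2)
# → ("f", "X0", ("g", "X0")), i.e. f(X0, g(X0))
```

Reusing one variable for the repeated pair `(a, b)` is what makes the result least general: `f(X0, g(X1))` would also cover both terms but would lose the fact that the two positions agree.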

2407.01374 2026-06-02 cs.CL

Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

弥合差距:从英语PLM到马来西亚英语的迁移学习

Mohan Raj Chanthran, Lay-Ki Soon, Huey Fang Ong, Bhawani Selvaretnam

发表机构 * School of Information Technology, Monash University Malaysia(莫纳什大学马来西亚分校信息科技学院)

AI总结 针对马来西亚英语的低资源克里奥尔语特性,通过微调预训练语言模型MENmBERT和MENBERT,在命名实体识别和关系抽取任务上分别提升1.52%和26.27%的性能。

Comments Accepted in 9th Workshop on Representation Learning for NLP (Rep4NLP) at ACL 2024

详情
AI中文摘要

马来西亚英语是一种低资源克里奥尔语,除了标准英语外,还包含马来语、汉语和泰米尔语的元素。由于其独特的形态句法适应、语义特征和代码切换(混合英语和马来语),命名实体识别(NER)模型在从马来西亚英语文本中捕获实体时表现不佳。考虑到这些差距,我们引入了MENmBERT和MENBERT,这是一种具有上下文理解的预训练语言模型,专门针对马来西亚英语定制。我们使用来自马来西亚英语新闻文章(MEN)数据集的手动注释实体和关系对MENmBERT和MENBERT进行了微调。这一微调过程使PLM能够学习捕捉与NER和RE任务相关的马来西亚英语细微差别的表示。与bert-base-multilingual-cased模型相比,MENmBERT在NER和RE任务上分别提高了1.52%和26.27%。尽管NER的整体性能没有显著提升,但我们的进一步分析表明,在12个实体标签评估时,性能有显著提升。这些发现表明,在特定语言和地理区域的语料库上预训练语言模型是提高低资源环境下NER性能的一种有前景的方法。本文发布的数据集和代码为专注于马来西亚英语的NLP研究工作提供了宝贵的资源。

英文摘要

Malaysian English is a low-resource creole language that carries elements of the Malay, Chinese, and Tamil languages in addition to Standard English. Named Entity Recognition (NER) models underperform when capturing entities from Malaysian English text due to its distinctive morphosyntactic adaptations, semantic features and code-switching (mixing English and Malay). Considering these gaps, we introduce MENmBERT and MENBERT, pre-trained language models with contextual understanding, specifically tailored for Malaysian English. We have fine-tuned MENmBERT and MENBERT using manually annotated entities and relations from the Malaysian English News Article (MEN) Dataset. This fine-tuning process allows the PLM to learn representations that capture the nuances of Malaysian English relevant for NER and RE tasks. MENmBERT achieved a 1.52% and 26.27% improvement on NER and RE tasks respectively compared to the bert-base-multilingual-cased model. Although the overall performance of NER does not have a significant improvement, our further analysis shows that there is a significant improvement when evaluated by the 12 entity labels. These findings suggest that pre-training language models on language-specific and geographically-focused corpora can be a promising approach for improving NER performance in low-resource settings. The dataset and code published in this paper provide valuable resources for NLP research work focusing on Malaysian English.

2307.05213 2026-06-02 cs.LG cs.AI

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

评分函数梯度估计以拓宽决策聚焦学习的适用性

Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Brandon Amos, Tias Guns, Michele Lombardi

发表机构 * University of Bologna(博洛尼亚大学) KU Leuven(鲁汶大学) Meta

AI总结 提出一种结合随机平滑与评分函数梯度估计的方法,无需对问题结构做特定假设,即可将决策聚焦学习扩展到非线性目标、约束中不确定参数及两阶段随机优化问题。

详情
Journal ref
Silvestri, Mattia, et al. "Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning." Journal of Artificial Intelligence Research 85 (2026)
AI中文摘要

许多现实世界的优化问题包含在部署前未知的参数,这是由于随机性或信息缺乏(例如,配送问题中的需求或旅行时间)。在这种情况下,常见的策略是通过机器学习(ML)模型估计所述参数,这些模型以最小化预测误差为目标进行训练,然而这并不一定与下游任务级误差一致。决策聚焦学习(DFL)范式通过直接最小化任务损失(例如遗憾)来克服这一限制。由于后者对于组合问题具有非信息性梯度,最先进的DFL方法引入了能够实现训练的替代和近似。但这些方法利用了关于问题结构的特定假设(例如,凸或线性问题,仅在目标函数中的未知参数)。我们提出了一种替代方法,该方法不做此类假设,它结合了随机平滑与评分函数梯度估计,适用于任何任务损失。这为将DFL方法应用于非线性目标、问题约束中的不确定参数,甚至两阶段随机优化打开了大门。实验表明,它通常需要更多的训练周期,但在解决方案质量、可扩展性或两者方面,与专门方法相当,并且在约束中存在不确定性的困难情况下表现尤为出色。

英文摘要

Many real-world optimization problems contain parameters that are unknown before deployment time, either due to stochasticity or to lack of information (e.g., demand or travel times in delivery problems). A common strategy in such cases is to estimate said parameters via machine learning (ML) models trained to minimize the prediction error, which however is not necessarily aligned with the downstream task-level error. The decision-focused learning (DFL) paradigm overcomes this limitation by training to directly minimize a task loss, e.g. regret. Since the latter has non-informative gradients for combinatorial problems, state-of-the-art DFL methods introduce surrogates and approximations that enable training. But these methods exploit specific assumptions about the problem structures (e.g., convex or linear problems, unknown parameters only in the objective function). We propose an alternative method that makes no such assumptions: it combines stochastic smoothing with score function gradient estimation, which works on any task loss. This opens up the use of DFL methods to nonlinear objectives, uncertain parameters in the problem constraints, and even two-stage stochastic optimization. Experiments show that it typically requires more epochs, but that it is on par with specialized methods and performs especially well for the difficult case of problems with uncertainty in the constraints, in terms of solution quality, scalability, or both.
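As a rough illustration of the idea in this abstract, the sketch below estimates the gradient of a smoothed, piecewise-constant task loss with a score function (REINFORCE-style) estimator. The Gaussian smoothing distribution, the toy regret function, the mean baseline, and all names (`toy_regret`, `score_function_gradient`, `sigma`) are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def toy_regret(z):
    """Regret of picking the item with the highest predicted value.

    Piecewise constant in z, so its ordinary gradient is zero almost everywhere.
    The true item values are hypothetical.
    """
    true_values = np.array([1.0, 3.0])
    return float(true_values.max() - true_values[int(np.argmax(z))])

def score_function_gradient(task_loss, mu, sigma=0.5, n_samples=20000, seed=0):
    """Estimate the gradient w.r.t. mu of E_{z ~ N(mu, sigma^2 I)}[task_loss(z)].

    Only evaluations of task_loss are needed, never its derivative.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=mu, scale=sigma, size=(n_samples, mu.shape[0]))
    losses = np.array([task_loss(zi) for zi in z])
    # Gradient of log N(z; mu, sigma^2 I) with respect to mu.
    score = (z - mu) / sigma**2
    # Subtracting the mean loss is a simple baseline that reduces variance.
    centered = losses - losses.mean()
    return (centered[:, None] * score).mean(axis=0)

# The prediction wrongly favors item 0, so the smoothed regret should fall
# when mu[0] decreases or mu[1] increases.
mu = np.array([1.0, 0.0])
grad = score_function_gradient(toy_regret, mu)
```

In a training loop, `mu` would be the output of the predictive model and `grad` would be backpropagated through it; that wiring is omitted here.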

2405.14782 2026-06-02 cs.CL

Lessons from the Trenches on Reproducible Evaluation of Language Models

关于语言模型可重复评估的前线经验教训

Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, Xiangru Tang, Kevin A. Wang, Genta Indra Winata, François Yvon, Andy Zou

发表机构 * MBZUAI IIIT Hyderabad EleutherAI HiTZ Center - Ixa, UPV/EHU Ivy Natal University of Michigan HubSpot LibrAI Kensho Contextual AI Brown University New York University Amazon Yale University HKUST Sorbonne University CMU

AI总结 本文基于开发Language Model Evaluation Harness框架的三年经验,总结了语言模型评估中面临的方法论挑战、缺乏可重复性和透明度等问题,并提供了改进评估严谨性和信心的建议。

详情
AI中文摘要

语言模型(LMs)的可靠评估仍然是一个未解决的挑战。研究人员和工程师面临方法学问题,例如模型对评估设置的敏感性、不同方法之间难以进行适当比较,以及缺乏可重复性和透明度。关于惯例和常见实践的信息碎片化和隔离加剧了评估困难。在本文中,我们借鉴了作为流行的Language Model Evaluation Harness (lm-eval) (Gao et al., 2023) 框架开发者三年评估大型语言模型(LMs)的经验,为该领域未来的发展提供指导和教训。我们记录了实践者面临的各种挑战,并提供了这些挑战或缺乏最佳实践已产生影响的实例。我们向该领域提出建议,以提高评估的严谨性和信心,并尝试将围绕LM评估的许多隐性或民间知识编纂成文,为未来的发展奠定坚实基础。

英文摘要

Reliable evaluation of language models (LMs) remains an open challenge. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. Evaluation difficulties are exacerbated by the fracturing and siloing of information about conventions and common practices. In this paper we draw on three years of experience in evaluating large language models (LMs) as developers of the popular Language Model Evaluation Harness (lm-eval) (Gao et al., 2023) framework to provide guidance and lessons for the field moving forward. We document a variety of challenges faced by practitioners and provide concrete instances where these challenges or the absence of best practices have come into effect. We make recommendations to the field for improving evaluation rigor and confidence, and attempt to codify much of the tacit or folk knowledge surrounding LM evaluation, for a solid ground to move forward.

2403.07008 2026-06-02 cs.LG cs.AI cs.CL stat.ME

AutoEval Done Right: Using Synthetic Data for Model Evaluation

AutoEval 的正确做法:使用合成数据进行模型评估

Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan

发表机构 * Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, USA(电子工程与计算机科学系,加州大学伯克利分校) Department of Systems Immunology, Weizmann Institute of Science, Rehovot, Israel(系统免疫学系,魏茨曼科学研究所) Inria, Ecole Normale Supérieure, Paris, France(法国国家信息与自动化技术研究所,巴黎高等师范学院)

AI总结 本文提出高效且统计上无偏的算法,利用AI标记的合成数据减少模型评估所需的人工标注量,在GPT-4实验中有效样本量提升高达50%。

Comments camera-ready paper version

详情
AI中文摘要

使用人工标注的验证数据评估机器学习模型可能成本高昂且耗时。AI标记的合成数据可用于减少此目的所需的人工标注数量,这一过程称为自动评估。我们为此提出了高效且统计上无偏的算法,在保持无偏性的同时提高样本效率。这些算法在GPT-4实验中使有效人工标注样本量增加高达50%。

英文摘要

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. These algorithms increase the effective human-labeled sample size by up to 50% on experiments with GPT-4.
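The abstract does not spell out the estimator, so the following is only a minimal sketch of one generic way to combine a small human-labeled set with a large AI-labeled set while staying unbiased: average the AI labels on the unlabeled pool, then correct by the mean human-minus-AI discrepancy measured on the labeled set. The function name and argument names are hypothetical, and whether this matches the paper's algorithms is an assumption.

```python
import numpy as np

def debiased_mean(human_labels, synthetic_on_labeled, synthetic_on_unlabeled):
    """Estimate the mean human score using few human labels plus many AI labels.

    human_labels:           human scores on the small labeled set
    synthetic_on_labeled:   AI scores on that same labeled set
    synthetic_on_unlabeled: AI scores on the large unlabeled pool
    """
    human_labels = np.asarray(human_labels, dtype=float)
    synthetic_on_labeled = np.asarray(synthetic_on_labeled, dtype=float)
    synthetic_on_unlabeled = np.asarray(synthetic_on_unlabeled, dtype=float)
    # Average bias of the AI labeler, measured where human labels exist.
    correction = (human_labels - synthetic_on_labeled).mean()
    return float(synthetic_on_unlabeled.mean() + correction)

# The AI labeler is too optimistic on the labeled set, so the estimate is pulled down.
estimate = debiased_mean(
    human_labels=[1, 0, 1, 1],
    synthetic_on_labeled=[1, 1, 1, 1],
    synthetic_on_unlabeled=[1, 1, 0, 1, 1, 0, 1, 1],
)
```

The correction term keeps the estimate unbiased even when the AI labels are systematically off, and the large AI-labeled pool lowers variance when AI and human labels are well correlated.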

2405.01930 2026-06-02 cs.CL

OARelatedWork: A Large-Scale Dataset of Related Work Sections with Full-texts from Open Access Sources

OARelatedWork:来自开放获取源的大规模相关工作章节数据集及其全文

Martin Docekal, Martin Fajcik, Pavel Smrz

发表机构 * Brno University of Technology(布尔诺理工大学)

AI总结 本文提出OARelatedWork数据集,包含全文和完整相关工作章节,用于从开放获取源生成相关工作,并评估了多种模型,发现大型语言模型在处理全文时事实准确性下降,而人类作者常引入无根据的抽象声明。

详情
AI中文摘要

本文介绍了OARelatedWork:一个来自开放获取源的相关工作生成数据集。它是首个用于相关工作生成的大规模多文档摘要数据集,包含完整的相关工作章节和引用论文的全文。其验证集和测试集的构建使得每篇被引论文的全文都可用,从而能够对全文相关工作生成进行受控评估。该数据集包含来自多个领域的94,450篇论文和5,824,689篇唯一引用论文。借助OARelatedWork,我们旨在将该领域从仅基于摘要生成相关工作章节的部分内容,转变为基于所有可用内容生成完整相关工作章节。我们(i)对一系列模型进行了基准测试,强调合成大规模全文上下文即使对现代大型语言模型(LLM)来说仍然是一个挑战:在我们的语句级评判下,GPT-4o-mini的基于证据的真实率从使用摘要时的92.9%下降到使用全文时的83.8%。我们(ii)通过对40篇论文和408个事实陈述进行人工评估,实证分析了人类写作行为,揭示作者经常引入未在局部源文本中扎根的抽象声明;因此,先进的LLM在严格的、基于证据的事实性方面实际上超越了人类基线。最后,我们(iii)进行了细粒度的元评估,揭示标准的基于参考的指标不足以评估这种长格式的结构化输出,并引入了一个稳健的语句级评估框架来弥补这一差距。

英文摘要

This paper introduces OARelatedWork: a dataset for related work generation from open-access sources. It is the first large-scale multi-document summarization dataset for related work generation, containing whole related work sections and full texts of cited papers. Its validation and test splits are constructed so that every cited paper is available in full text, enabling controlled evaluation of full-text related work generation. The dataset includes 94 450 papers and 5 824 689 unique referenced papers from multiple domains. With OARelatedWork, we aim to shift the field from generating parts of related work sections from abstracts only to generating entire related work sections from all available content. We (i) benchmark a wide spectrum of models, highlighting that synthesizing massive full-text contexts remains a challenge even for modern Large Language Models (LLMs): under our statement-level judge, GPT-4o-mini's evidence-grounded True rate drops from 92.9% with abstracts to 83.8% with full texts. We (ii) empirically analyze human writing behavior through a human evaluation over 40 papers and 408 factual statements, revealing that authors frequently introduce abstractive claims ungrounded in localized source texts; consequently, advanced LLMs actually surpass human baselines in strict, evidence-grounded factuality. Finally, we (iii) conduct a fine-grained meta-evaluation, revealing that standard reference-based metrics are inadequate for evaluating such long-form structured outputs, and introduce a robust statement-level evaluation framework to address this gap.

2404.11326 2026-06-02 cs.CV

SCL: Towards Domain Generalization via Single-Temporal Multimodal Contrastive Learning for Remote Sensing Change Detection

SCL:面向遥感变化检测的单时相多模态对比学习域泛化方法

Qiangang Du, Jinlong Peng, Xu Chen, Qingdong He, Liren He, Qiang Nie, Mingmin Chi

发表机构 * Fudan University(复旦大学) Tencent YouTu Lab(腾讯YouTu实验室)

AI总结 提出基于视觉-语言预训练模型的单时相多模态对比学习(SCL)基础模型,结合动态文本-视觉上下文优化(DTCO)和可控生成与单时相训练策略(SAIN),无需目标数据集训练即可实现遥感变化检测的跨数据集泛化。

Comments CVPRW 2026

详情
AI中文摘要

近年来,基于CNN和Transformer的变化检测与异常检测模型在基于配对数据的多个数据集上取得了显著成功。然而,由于领域特定的设计,大多数此类方法表现出有限的跨数据集泛化能力,并且通常依赖于大量配对的标注数据。本文基于视觉-语言预训练模型,引入了一种单时相多模态对比学习(SCL)基础模型,用于变化检测,无需在目标数据集上进行训练。为了进一步提高模型学习文本和视觉信息上下文的能力,我们提出了一种动态文本-视觉上下文优化(DTCO)模块用于提示学习。同时,为了解决现有方法的数据依赖性问题,我们引入了一种可控生成和单时相训练策略(SAIN)。这使得我们能够利用大量现有的单时相图像训练模型,而无需配对标签。在各种真实世界变化检测数据集上的大量实验表明,SCL具有优越的性能和泛化能力,在评估设置下优于最先进的方法。代码可在https://github.com/Kane-Du/scl-cd.git获取。

英文摘要

In recent years, change detection and anomaly detection models based on CNNs and transformers have achieved remarkable success across various datasets based on paired data. However, most such methods exhibit limited cross-dataset generalization due to domain-specific designs and typically rely on large amounts of paired labeled data. In this paper, based on a visual-language pre-training model, we introduce a Single-temporal multimodal Contrastive Learning (SCL) foundation model for change detection without training on the target dataset. To further improve the model's ability to learn the context of textual and visual information, we propose a Dynamic Text-vision Context Optimization (DTCO) module for prompt learning. Meanwhile, to address the data dependency issue of existing methods, we introduce a controllable generation and Single-temporal trAINing strategy (SAIN). This allows us to train the model using a large number of existing single-temporal images without the need for paired labels. Extensive experiments on various real-world change detection datasets demonstrate the superior performance and generalization of SCL, outperforming state-of-the-art methods under the evaluated settings. Code is available at https://github.com/Kane-Du/scl-cd.git.

2307.06647 2026-06-02 cs.RO cs.AI cs.CV

DeepIPCv2: LiDAR-powered Robust Environmental Perception and Navigational Control for Autonomous Vehicle

DeepIPCv2: 基于LiDAR的鲁棒环境感知与自动驾驶导航控制

Oskar Natan, Jun Miura

发表机构 * Department of Computer Science and Electronics, Universitas Gadjah Mada(计算机科学与电子系,加查马达大学) Department of Computer Science and Engineering, Toyohashi University of Technology(计算机科学与工程系,丰桥技术科学大学)

AI总结 提出DeepIPCv2端到端自动驾驶框架,通过融合LiDAR点云分割与多视图投影构建鲁棒场景表示,结合门控循环单元、命令特定多层感知器和PID控制器实现路径点与导航控制命令的联合估计,在光照变化下取得最低总指标误差和最少驾驶干预。

Comments This work has been accepted for publication in IEEE Access. https://ieeexplore.ieee.org/document/11313052

详情
AI中文摘要

我们提出DeepIPCv2,一个端到端的自动驾驶框架,它集成了基于LiDAR的环境感知与命令特定的控制学习。与先前依赖摄像头的模型不同,DeepIPCv2采用点云分割和多视图投影来构建鲁棒的场景表示。这些特征通过门控循环单元、命令特定的多层感知器和PID控制器的组合进行融合和解码,以估计路径点和导航控制命令。这种设计增强了机动性并解决了驾驶数据集中的动作不平衡问题。为了验证模型,我们构建了一个覆盖不同光照条件的数据集,并进行了消融研究和与包括TransFuser在内的最新方法的对比测试。结果表明,DeepIPCv2实现了最低的总指标误差和最少的驾驶干预,突显了其对光照变化的鲁棒性和改进的控制精度。通过稍后在https://github.com/oskarnatan/DeepIPCv2发布代码,我们旨在支持端到端自动驾驶研究的可重复性和未来进展。

英文摘要

We propose DeepIPCv2, an end-to-end autonomous driving framework that integrates LiDAR-based environmental perception with command-specific control learning. Unlike prior camera-reliant models, DeepIPCv2 employs point cloud segmentation and multi-view projection to construct robust scene representations. These features are fused and decoded through a combination of gated recurrent units, command-specific multi-layer perceptrons, and PID controllers to estimate both waypoints and navigational control commands. This design enhances maneuverability and addresses action imbalance in driving datasets. To validate the model, we constructed a dataset covering diverse illumination conditions and conducted ablation studies and comparative tests against recent methods, including TransFuser. Results demonstrate that DeepIPCv2 achieves the lowest total metric error and the fewest driving interventions, highlighting both its robustness to illumination changes and its improved control accuracy. By releasing the code at https://github.com/oskarnatan/DeepIPCv2 at a later date, we aim to support reproducibility and future advancements in end-to-end autonomous driving research.

2312.03644 2026-06-02 cs.LG cs.MA

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

MACCA: 离线多智能体强化学习中的因果信用分配

Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang

发表机构 * King’s College London(伦敦国王学院) Eindhoven University of Technology(埃因霍温理工大学) University of Liverpool(利物浦大学) University of California San Diego(加州大学圣地亚哥分校)

AI总结 提出基于动态贝叶斯网络的因果信用分配框架MACCA,通过建模环境变量、状态、动作和奖励的因果关系,实现离线多智能体强化学习中准确且可解释的信用分配。

Comments 21 pages, 4 figures

详情
Journal ref
TMLR 2025
AI中文摘要

离线多智能体强化学习(MARL)在在线交互不切实际或存在风险的情况下具有重要价值。虽然MARL中的独立学习提供了灵活性和可扩展性,但在离线设置中,由于禁止与环境交互,准确地将信用分配给单个智能体面临挑战。在本文中,我们提出了一种新框架,即多智能体因果信用分配(MACCA),以解决离线MARL设置中的信用分配问题。我们的方法MACCA将生成过程表征为动态贝叶斯网络,捕获环境变量、状态、动作和奖励之间的关系。通过在离线数据上估计该模型,MACCA可以通过分析个体奖励的因果关系来学习每个智能体的贡献,确保准确且可解释的信用分配。此外,我们方法的模块化使其能够无缝集成到各种离线MARL方法中。理论上,我们证明了在离线数据集设置下,底层因果结构和用于生成智能体个体奖励的函数是可识别的,这为我们的建模正确性奠定了基础。在我们的实验中,我们证明MACCA不仅优于最先进的方法,而且在与其他骨干集成时也能提升性能。

英文摘要

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interactions with an environment are prohibited. In this paper, we propose a new framework, namely Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate with various offline MARL methods seamlessly. Theoretically, we prove that under the setting of the offline dataset, the underlying causal structure and the function for generating the individual rewards of agents are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.

2310.20545 2026-06-02 cs.LG math.OC stat.ML

Optimizing accuracy and diversity: a multi-task approach to forecast combinations

优化准确性与多样性:一种多任务预测组合方法

Giovanni Felici, Antonio M. Sudoso

发表机构 * National Research Council(国家研究理事会) Sapienza University of Rome(罗马萨皮恩扎大学)

AI总结 提出一种基于深度学习架构的多任务优化方法,通过联合选择与组合预测模型,同时考虑准确性和多样性,提升时间序列点预测精度。

详情
Journal ref
Annals of Operations Research, 2026
AI中文摘要

我们提出了一种基于深度学习架构的多任务优化方法,用于时间序列预测。我们利用大量时间序列集合来识别可组合的预测模型权重,从而为每个序列生成预测。该方法联合处理两个任务:选择不同的预测模型及其有效组合。在此过程中,它以一种新颖的方式兼顾了预测方法的准确性和多样性。对于给定的时间序列,模型组合模块提取特征并用于优化预测方法的权重。同时,模型选择模块提取其他特征以识别用于预测的方法子集。该选择过程被构建为一个分类问题,标签表示用于序列的模型集合。这些标签通过求解一个辅助优化问题来确定,该问题为每个时间序列识别准确且多样的方法。然后,两个模块的输出被组合,整个神经网络通过梯度下降优化最小化自定义损失函数进行联合训练。在M4竞赛数据集和真实道路交通数据的大量序列上的实验结果表明,与最先进的方法相比,我们的方法提高了点预测精度。

英文摘要

We present a multi-task optimization approach based on a deep learning architecture for time series forecasting. We leverage large collections of time series to identify the weights of forecasting models that can be combined to produce forecasts for each series. This method jointly addresses two tasks: the selection of different forecasting models, and their effective combination. In doing so, it takes into account, in an original way, both the accuracy and diversity of the forecasting methods. For a given time series, the model combination module extracts features and uses them to optimize the weights of the forecasting methods. Simultaneously, the model selection module extracts other features to identify the subset of methods to be used for the prediction. This selection process is framed as a classification problem, with the labels representing the set of models to be used for a series. These labels are determined by solving an auxiliary optimization problem that identifies accurate and diverse methods for each time series. The outputs of the two modules are then combined and the entire neural network is jointly trained by minimizing a custom loss function via gradient descent optimization. Experimental results on a large set of series from the M4 competition dataset and from real road traffic data show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.
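The abstract says the outputs of the selection and combination modules are combined but does not say how. One plausible reading, sketched here purely as an assumption, is to mask the combination weights with the selected subset and renormalize them with a softmax before averaging the base forecasts. All names (`combine_forecasts`, `weight_logits`, `selected`) are hypothetical.

```python
import numpy as np

def combine_forecasts(forecasts, weight_logits, selected):
    """Weighted forecast combination restricted to a selected subset of models.

    forecasts:     array of shape (n_models, horizon), one row per base forecaster
    weight_logits: array of shape (n_models,), unnormalized combination scores
    selected:      boolean array of shape (n_models,), the chosen subset
    """
    forecasts = np.asarray(forecasts, dtype=float)
    weight_logits = np.asarray(weight_logits, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    if not selected.any():
        raise ValueError("at least one forecasting model must be selected")
    # Unselected models get a logit of -inf, i.e. exactly zero weight after softmax.
    masked = np.where(selected, weight_logits, -np.inf)
    exp = np.exp(masked - masked.max())
    weights = exp / exp.sum()
    return weights @ forecasts, weights

# The third model has the largest logit but is not selected, so it is ignored.
combined, weights = combine_forecasts(
    forecasts=[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]],
    weight_logits=[0.0, 0.0, 5.0],
    selected=[True, True, False],
)
```

In the paper both modules are neural networks trained jointly; here their outputs are simply passed in as arrays.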

2310.15676 2026-06-02 cs.CV cs.AI

Recent Advances in Multi-modal 3D Intelligence: A Comprehensive Survey and Evaluation

多模态3D智能的最新进展:综合调查与评估

Yinjie Lei, Zixuan Wang, Feng Chen, Guoqing Wang, Peng Wang, Yang Yang

发表机构 * College of Electronics and Information Engineering, Sichuan University(四川大学电子信息工程学院) School of Computer Science, University of Adelaide(阿德莱德大学计算机科学学院) School of Computer Science and Engineering, University of Electronic Science and Technology of China(电子科技大学计算机科学与工程学院)

AI总结 本文系统综述了多模态3D智能方法,提出基于模态和任务的新分类法,并比较了基准数据集上的结果,最后讨论了未来研究方向。

详情
AI中文摘要

多模态3D智能因其在自动驾驶和世界模拟等领域的广泛应用而受到广泛关注。与传统的单模态3D理解相比,引入额外模态不仅提升了场景解释的丰富性和精确性,还为更高层次的物理世界交互奠定了基础。在仅依赖3D数据可能不足的多样化和挑战性环境中,这一点变得尤为关键。尽管过去六年中多模态3D方法的发展激增,特别是那些整合多相机图像(3D+2D)和文本描述(3D+语言)的方法,但缺乏全面深入的综述。在本文中,我们通过系统调查最新进展来弥补这一空白。我们首先简要总结了各种3D多模态任务中的独特挑战。之后,我们提出了一种新的分类法,根据模态和任务对现有方法进行彻底分类,探讨它们各自的优势和局限性。此外,我们提供了近期方法在几个基准数据集上的比较结果及深入分析。最后,我们讨论了未解决的问题,并提出了未来研究的几个潜在方向。

英文摘要

Multi-modal 3D Intelligence has gained considerable attention due to its wide applications in areas such as autonomous driving and world simulation. Compared to conventional single-modal 3D understanding, introducing an additional modality not only elevates the richness and precision of scene interpretation but also provides a foundation for higher-level physical world interaction. This becomes especially crucial in varied and challenging environments where solely relying on 3D data might be inadequate. While there has been a surge in the development of multi-modal 3D methods over the past six years, especially those integrating multi-camera images (3D+2D) and textual descriptions (3D+language), a comprehensive and in-depth review is notably absent. In this paper, we present a systematic survey of recent progress to bridge this gap. We begin by briefly summarizing the unique challenges among various 3D multi-modal tasks. After that, we present a novel taxonomy that delivers a thorough categorization of existing methods according to modalities and tasks, exploring their respective strengths and limitations. Furthermore, comparative results of recent approaches on several benchmark datasets, together with insightful analysis, are offered. Finally, we discuss the unresolved issues and provide several potential avenues for future research.

2309.15946 2026-06-02 cs.LG cs.AI cs.NE math.DS

Unified Long-Term Time-Series Forecasting Benchmark

统一长期时间序列预测基准

Jacek Cyranka, Szymon Haponiuk

发表机构 * Institute of Informatics(信息学院)

AI总结 提出一个专为长期时间序列预测设计的综合数据集,通过标准化轨迹和多种模型基准测试,发现模型效果依赖于数据集,并引入改进的潜在NLinear和课程学习DeepAR模型。

详情
AI中文摘要

为了支持时间序列数据预测的机器学习方法的发展,我们提出了一个明确针对长期时间序列预测设计的综合数据集。我们整合了来自多种动态系统和真实记录的数据集集合。每个数据集通过将数据划分为具有预定回溯长度的训练和测试轨迹进行标准化。我们包含长度高达$2000$的轨迹,以确保对长期预测能力的可靠评估。为了确定在不同场景中最有效的模型,我们使用经典和最先进的模型(即LSTM、DeepAR、NLinear、N-Hits、PatchTST和LatentODE)进行了广泛的基准分析。我们的研究结果揭示了这些模型之间有趣的性能比较,突出了模型有效性的数据集依赖性。值得注意的是,我们引入了一个自定义的潜在NLinear模型,并通过课程学习阶段增强了DeepAR。两者都持续优于其原始版本。

英文摘要

In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.

2212.06751 2026-06-02 cs.LG cs.AI

Speeding Up Multi-Objective Hyperparameter Optimization by Task Similarity-Based Meta-Learning for the Tree-Structured Parzen Estimator

基于任务相似性元学习加速多目标超参数优化的树形结构Parzen估计器

Shuhei Watanabe, Noor Awad, Masaki Onishi, Frank Hutter

发表机构 * Department of Computer Science, University of Freiburg, Germany(弗赖堡大学计算机科学系) Artificial Intelligence Research Center, AIST, Tokyo, Japan(日本产业技术综合研究所人工智能研究中心)

AI总结 提出利用任务间顶级域重叠定义的任务相似性扩展TPE采集函数到元学习设置,加速多目标超参数优化,理论分析并解决相似性局限,实验证明在表格HPO基准上达到最优性能并赢得AutoML 2022竞赛。

Comments Accepted to IJCAI 2023

详情
AI中文摘要

超参数优化(HPO)是提升深度学习性能的关键步骤。实践者常面临多个指标间的权衡,如准确率和延迟。鉴于深度学习的高计算需求以及对高效HPO日益增长的需求,加速多目标优化变得愈发重要。尽管已有大量关于元学习用于HPO的工作,但现有方法不适用于多目标树形结构Parzen估计器(MO-TPE),这是一种简单而强大的多目标HPO算法。在本文中,我们利用任务间顶级域重叠定义的任务相似性,将TPE的采集函数扩展到元学习设置。我们还从理论上分析并解决了任务相似性的局限性。实验中,我们证明了该方法在表格HPO基准上加速了MO-TPE,并达到了最先进的性能。我们的方法还通过赢得AutoML 2022“Transformer多目标超参数优化”竞赛得到了外部验证。

英文摘要

Hyperparameter optimization (HPO) is a vital step in improving performance in deep learning (DL). Practitioners are often faced with the trade-off between multiple criteria, such as accuracy and latency. Given the high computational needs of DL and the growing demand for efficient HPO, the acceleration of multi-objective (MO) optimization becomes ever more important. Despite the significant body of work on meta-learning for HPO, existing methods are inapplicable to MO tree-structured Parzen estimator (MO-TPE), a simple yet powerful MO-HPO algorithm. In this paper, we extend TPE's acquisition function to the meta-learning setting using a task similarity defined by the overlap of top domains between tasks. We also theoretically analyze and address the limitations of our task similarity. In the experiments, we demonstrate that our method speeds up MO-TPE on tabular HPO benchmarks and attains state-of-the-art performance. Our method was also validated externally by winning the AutoML 2022 competition on "Multiobjective Hyperparameter Optimization for Transformers".

2304.10255 2026-06-02 cs.LG stat.ML

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

PED-ANOVA:高效量化任意子空间中超参数重要性

Shuhei Watanabe, Archit Bansal, Frank Hutter

发表机构 * Department of Computer Science, University of Freiburg, Germany(弗赖堡大学计算机科学系)

AI总结 提出PED-ANOVA方法,利用Pearson散度实现任意子空间中超参数重要性的闭式计算,在保持高效性的同时准确识别关键超参数。

Comments Accepted by IJCAI 2023

详情
AI中文摘要

近年来,深度学习超参数优化(HPO)的流行凸显了良好超参数(HP)空间设计在训练强模型中的作用。而设计一个好的HP空间关键依赖于理解不同HP的作用。这激发了超参数重要性(HPI)的研究,例如使用流行的功能ANOVA(f-ANOVA)方法。然而,原始的f-ANOVA公式不适用于算法设计者最相关的子空间,例如由顶级性能定义的子空间。为解决此问题,我们推导了任意子空间下f-ANOVA的新公式,并提出一种使用Pearson散度(PED)实现HPI闭式计算的算法。我们证明,这种新算法称为PED-ANOVA,能够成功识别不同子空间中的重要HP,同时计算效率极高。

英文摘要

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.
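To make the closed-form idea concrete, the sketch below computes a Pearson (chi-square) divergence between two discrete distributions of a single categorical hyperparameter: its empirical distribution among top-performing configurations versus its distribution over a reference set. Reading the importance off this divergence is an assumption about the general approach; the normalization convention, the helper names, and the toy data are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def empirical_pmf(values, n_choices):
    """Empirical distribution of a categorical hyperparameter encoded as 0..n_choices-1."""
    counts = np.bincount(np.asarray(values, dtype=int), minlength=n_choices).astype(float)
    return counts / counts.sum()

def pearson_divergence(p, q):
    """Pearson divergence sum_i (p_i - q_i)^2 / q_i; requires q_i > 0 for every i."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.any(q <= 0):
        raise ValueError("reference distribution must be strictly positive")
    return float(np.sum((p - q) ** 2 / q))

# Hypothetical data: the hyperparameter takes 3 values, spread uniformly over all
# evaluated configurations, while the top configurations favor value 0.
all_values = [0, 1, 2, 0, 1, 2]
top_values = [0, 0, 1]
importance = pearson_divergence(
    empirical_pmf(top_values, n_choices=3),
    empirical_pmf(all_values, n_choices=3),
)
```

A hyperparameter whose distribution among top configurations matches the reference distribution gets a divergence of zero, which fits the intuition that it does not matter in that subspace.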

2211.14411 2026-06-02 cs.LG cs.AI

c-TPE: Tree-structured Parzen Estimator with Inequality Constraints for Expensive Hyperparameter Optimization

c-TPE: 带不等式约束的树结构Parzen估计器用于昂贵的超参数优化

Shuhei Watanabe, Frank Hutter

发表机构 * Department of Computer Science, University of Freiburg(弗赖堡大学计算机科学系)

AI总结 提出c-TPE方法,通过修改TPE的采样和模型以处理不等式约束,在81个昂贵HPO问题上取得最佳平均排名性能。

Comments Accepted to IJCAI 2023

详情
AI中文摘要

超参数优化(HPO)对于深度学习算法的强性能至关重要,而实际应用通常在性能要求之上施加一些约束,例如内存使用或延迟。在这项工作中,我们提出了约束TPE(c-TPE),这是广泛使用的通用贝叶斯优化方法——树结构Parzen估计器(TPE)的扩展,以处理这些约束。我们提出的扩展不仅仅是现有采集函数和原始TPE的简单组合,而是包括解决导致性能不佳问题的修改。我们通过实验和理论彻底分析了这些修改,提供了关于它们如何有效克服这些挑战的见解。在实验中,我们证明c-TPE在81个带不等式约束的昂贵HPO问题上,以统计显著性在现有方法中表现出最佳平均排名性能。由于缺乏基线,我们仅在附录D中讨论了我们方法对硬约束优化的适用性。该实现现在可通过OptunaHub获得。

英文摘要

Hyperparameter optimization (HPO) is crucial for strong performance of deep learning algorithms and real-world applications often impose some constraints, such as on memory usage or latency, on top of the performance requirement. In this work, we propose constrained TPE (c-TPE), an extension of the widely-used versatile Bayesian optimization method, tree-structured Parzen estimator (TPE), to handle these constraints. Our proposed extension goes beyond a simple combination of an existing acquisition function and the original TPE, and instead includes modifications that address issues that cause poor performance. We thoroughly analyze these modifications both empirically and theoretically, providing insights into how they effectively overcome these challenges. In the experiments, we demonstrate that c-TPE exhibits the best average rank performance among existing methods with statistical significance on $81$ expensive HPO problems with inequality constraints. Due to the lack of baselines, we only discuss the applicability of our method to hard-constrained optimization in Appendix D. The implementation is now available via OptunaHub.

2303.04345 2026-06-02 cs.LG

Federated Learning via Variational Bayesian Inference: Personalization, Sparsity and Clustering

通过变分贝叶斯推理的联邦学习:个性化、稀疏性和聚类

Xu Zhang, Wenpeng Li, Yunfeng Shao, Yonglin Liu, Kaiwen Zhou, Yinchuan Li

发表机构 * School of Artificial Intelligence, Xidian University(西安电子科技大学人工智能学院) LSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences(中国科学院数学与系统科学研究院) Noah’s Ark Lab, Huawei(华为诺亚实验室)

AI总结 提出基于变分贝叶斯推理的联邦学习方法,通过个性化、稀疏性和聚类策略解决数据异构和有限问题,实现更优性能。

Comments 18 pages, 5 figures

详情
AI中文摘要

联邦学习(FL)是一种有前景的框架,它在保护客户隐私的同时实现分布式机器学习。然而,FL因异构和有限的数据而性能下降。为了缓解这种下降,我们提出了一种新颖的个性化贝叶斯FL方法,名为pFedBayes。通过使用从服务器训练得到的全局分布作为每个客户的先验分布,每个客户通过最小化其个性化数据上的重构误差与下载的全局分布的KL散度之和来调整自己的分布。然后,我们提出了一种稀疏个性化贝叶斯FL方法,名为sFedBayes,以提高推理效率。为了克服非独立同分布数据中的极端异构性,我们提出了一种聚类贝叶斯FL模型,名为cFedBayes,通过为不同客户学习不同的先验分布。理论分析给出了这三种方法的泛化误差界,并表明所提出方法的泛化误差率在达到对数因子内达到极小极大最优性。此外,cFedBayes实现了聚类级别的泛化误差界,而不是pFedBayes中的单一统一界。大量实验表明,在异构和有限数据存在的情况下,所提出的方法在私有模型上比其他先进的个性化方法具有更好的性能。

英文摘要

Federated learning (FL) is a promising framework that models distributed machine learning while protecting the privacy of clients. However, FL suffers performance degradation from heterogeneous and limited data. To alleviate the degradation, we present a novel personalized Bayesian FL approach named pFedBayes. By using the trained global distribution from the server as the prior distribution of each client, each client adjusts its own distribution by minimizing the sum of the reconstruction error over its personalized data and the KL divergence with the downloaded global distribution. Then, we propose a sparse personalized Bayesian FL approach named sFedBayes to enhance the inference efficiency. To overcome the extreme heterogeneity in non-i.i.d. data, we propose a clustered Bayesian FL model named cFedBayes by learning different prior distributions for different clients. Theoretical analysis gives the generalization error bounds of the three approaches and shows that the generalization error rates of the proposed approaches achieve minimax optimality up to a logarithmic factor. Moreover, cFedBayes achieves a cluster-level generalization error bound, rather than a single uniform bound as in pFedBayes. Numerous experiments demonstrate that the proposed approaches have better performance than other advanced personalized methods on private models in the presence of heterogeneous and limited data.
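The client-side objective described above (reconstruction error plus a KL term against the downloaded global distribution) can be sketched as follows, assuming both distributions are diagonal Gaussians so that the KL divergence has a closed form. The Gaussian assumption, the `kl_weight` knob, and the function names are illustrative choices not stated in the abstract.

```python
import numpy as np

def kl_diag_gaussians(mean_q, std_q, mean_p, std_p):
    """KL(q || p) for diagonal Gaussians q = N(mean_q, std_q^2), p = N(mean_p, std_p^2)."""
    mean_q, std_q, mean_p, std_p = (
        np.asarray(a, dtype=float) for a in (mean_q, std_q, mean_p, std_p)
    )
    per_dim = (
        np.log(std_p / std_q)
        + (std_q**2 + (mean_q - mean_p) ** 2) / (2.0 * std_p**2)
        - 0.5
    )
    return float(per_dim.sum())

def client_objective(reconstruction_error, mean_q, std_q,
                     global_mean, global_std, kl_weight=1.0):
    """Personalized loss: data fit plus a penalty for drifting from the global prior.

    kl_weight is a hypothetical trade-off knob; 1.0 gives the plain sum.
    """
    kl = kl_diag_gaussians(mean_q, std_q, global_mean, global_std)
    return reconstruction_error + kl_weight * kl

# A client whose posterior mean has moved one unit away from the global prior.
loss = client_objective(
    reconstruction_error=2.0,
    mean_q=[1.0], std_q=[1.0],
    global_mean=[0.0], global_std=[1.0],
)
```

The KL term vanishes when a client's distribution equals the global one, so clients with little data stay close to the shared prior while clients with more data can move away from it.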