Kling-Gupta linear regression
Hristos Tyralis, Georgia Papacharalampous
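AI summary: This paper formalizes the Kling-Gupta loss function, derives explicit formulas for the parameter estimates in multiple linear regression, shows how they differ from ordinary least squares, and establishes their asymptotic properties.
Details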
- Comments: 64 pages, 8 figures, 3 tables
Although the Kling-Gupta efficiency ($\mathrm{KGE}$) is widely adopted for model evaluation in hydrology, its properties as a statistical estimator remain unexplored. Investigating these properties is necessary because parameter estimation and forecast evaluation are inherently linked. To address this, we formalize the negatively oriented Kling-Gupta loss $L_\mathrm{KG} = (1 - \mathrm{KGE})^2$, whose minimization is equivalent to maximizing $\mathrm{KGE}$, within an extremum estimation framework and analyze its behavior in multiple linear regression. We establish explicit formulas for the parameter estimates, showing that Kling-Gupta linear regression scales the ordinary least squares (OLS) coefficient vector by a variance-inflation factor governed by the sample variances and covariances of the predictors and the response. We show that the predictions of Kling-Gupta linear regression reproduce the sample variance of the response on the training set, in contrast to the variance reduction inherent to OLS, while both estimators preserve the sample mean of the observations and achieve the same sample correlation between the predictions and the response. We show analytically that no single estimator can simultaneously maximize both the Nash-Sutcliffe efficiency ($\mathrm{NSE}$) and $\mathrm{KGE}$: the OLS estimator attains the maximum possible $\mathrm{NSE}$ but not the maximum $\mathrm{KGE}$, while the Kling-Gupta estimator maximizes $\mathrm{KGE}$ at the cost of $\mathrm{NSE}$. We prove the almost sure convergence of the Kling-Gupta estimator to well-defined population limits and express those limits algebraically. Furthermore, we evaluate the training and test set performance metrics for both estimators, demonstrating that, for each estimator, the metrics on the training set and on an independent test set converge asymptotically to identical limits (though the limits differ between OLS and Kling-Gupta regression).
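The properties stated in the abstract can be checked numerically with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the common form $\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$ with $\alpha$ the ratio of standard deviations and $\beta$ the ratio of means, and it takes the inflation factor to be the ratio of the sample standard deviation of the response to that of the OLS predictions, which is inferred from the variance-matching property rather than copied from the paper. The function names (`fit_ols`, `fit_kg`, `kge`, `nse`) and the synthetic data are hypothetical.

```python
import numpy as np

def kge(sim, obs):
    # Assumed KGE form: correlation r, std ratio alpha, mean ratio beta.
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)
    beta = np.mean(sim) / np.mean(obs)
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def nse(sim, obs):
    # Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares.
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def fit_ols(X, y):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def fit_kg(X, y):
    # Assumption: inflate the OLS slopes so predictions match sd(y),
    # then choose the intercept so predictions match mean(y).
    _, b_ols = fit_ols(X, y)
    factor = np.std(y) / np.std(X @ b_ols)
    b = factor * b_ols
    a = np.mean(y) - np.mean(X, axis=0) @ b
    return a, b

# Synthetic training data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 5.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=500)

a_ols, b_ols = fit_ols(X, y)
a_kg, b_kg = fit_kg(X, y)
pred_ols = a_ols + X @ b_ols
pred_kg = a_kg + X @ b_kg

print("KGE  OLS / KG:", kge(pred_ols, y), kge(pred_kg, y))
print("NSE  OLS / KG:", nse(pred_ols, y), nse(pred_kg, y))
print("sd   y / OLS / KG:", np.std(y), np.std(pred_ols), np.std(pred_kg))
```

On the training set, this sketch should show the Kling-Gupta predictions matching the mean and standard deviation of `y`, the OLS predictions having a smaller standard deviation, both sets of predictions sharing the same correlation with `y`, and the ordering of $\mathrm{KGE}$ and $\mathrm{NSE}$ described in the abstract.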