The data-driven extreme value distribution: non-parametric tail estimation with a derived stability criterion
Michael Sandbichler, Tobias Hell
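AI summary: Proposes the Data-Driven Extreme Value Distribution (DDEVD), a non-parametric estimator that reconstructs the base distribution with a kernel and comes with a derived stability criterion; it gives more stable return levels than GEV fits on precipitation data and matches a GEV fit on metallurgical grain-size data.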
Details
- Comments: 28 pages, 6 figures
Quantifying the likelihood of extreme events underpins risk assessment, yet classical Extreme Value Theory relies on asymptotic assumptions that fail in the data-sparse, non-stationary regimes practitioners increasingly face. We introduce the Data-Driven Extreme Value Distribution (DDEVD), a non-parametric estimator that aggregates all observations metastatistically and reconstructs the base distribution with a kernel, removing parametric tail assumptions. We derive its optimal bandwidth and prove a stability law $m < C\,n^{1+\gamma/2}$ relating reliable extrapolation to the extreme value index $\gamma$. In sub-hourly Alpine precipitation, DDEVD recovers stable 100-year return levels from single decades (calibration ratio $0.96$), departing from the full-record reference by over $50\,\%$ in fewer than one window in fifty -- versus one in five for a GEV fit. In metallurgical micrographs, it matches a generalised extreme-value fit on the safety-relevant grain-size tail, where the standard log-normal over-predicts by $58\,\%$ at $1\,\mathrm{cm}^{2}$.
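The abstract's main ingredients (a kernel reconstruction of the base distribution, a return level read off the distribution of the maximum, and the stability bound $m < C\,n^{1+\gamma/2}$) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian kernel, substitutes Silverman's rule-of-thumb bandwidth for the paper's optimal bandwidth, pools all observations into a single base CDF raised to the power $m$, and reads $m$ as the number of events over which the maximum is taken and $n$ as the number of observations, an interpretation the abstract does not spell out. All function names and the default $C = 1$ are hypothetical.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()


def silverman_bandwidth(sample):
    # Rule-of-thumb bandwidth; an assumed stand-in for the paper's
    # optimal bandwidth, which the abstract does not state.
    return 1.06 * stdev(sample) * len(sample) ** (-1 / 5)


def kernel_cdf(x, sample, h):
    # Gaussian-kernel estimate of the base CDF at x (assumed kernel choice).
    return sum(_STD_NORMAL.cdf((x - xi) / h) for xi in sample) / len(sample)


def return_level(sample, m, T, h=None, tol=1e-9):
    # Solve F_hat(x)**m = 1 - 1/T for x by bisection, where F_hat is the
    # kernel CDF and m is the assumed number of events per block.
    # Requires T > 2 so that the lower bracket is valid.
    h = silverman_bandwidth(sample) if h is None else h
    target = (1.0 - 1.0 / T) ** (1.0 / m)
    lo, hi = min(sample), max(sample)
    # Widen the upper bracket until the kernel CDF reaches the target.
    while kernel_cdf(hi, sample, h) < target:
        hi += h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_stable_block_size(n, gamma, C=1.0):
    # Right-hand side of the stability law m < C * n**(1 + gamma/2).
    # C is an unspecified constant in the abstract; 1.0 is a placeholder.
    return C * n ** (1 + gamma / 2)
```

With a sample of `n` observations, one would check `m < max_stable_block_size(n, gamma)` before trusting `return_level(sample, m, T)`. Note that a Gaussian kernel has light tails, so this sketch extrapolates only slightly beyond the largest observation; how the paper handles heavier tails is not described in the abstract.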