Identifying the post-pandemic determinants of low performing students in Latin America through Interpretable Machine Learning methods
Marcos Delprato
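AI summary: Based on 2022 PISA data, the paper uses a stacking model and SHAP analysis to identify the key determinants of low-performing students in Latin America, finding that speaking a minority language, grade repetition, lacking digital devices, a poor household, part-time work, and school-level disadvantage are the main risk factors.
Details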
- Journal ref: Engineering Applications of Artificial Intelligence, 2026
- Comments: 48 pages, 13 figures
Introduction. The high prevalence of students who do not achieve basic learning competencies in Latin America (LAC) is concerning, all the more so given the region's deep structural inequalities and its larger post-pandemic learning losses. Against this backdrop, the paper aims to contribute to identifying the determinants of bottom and low performers (below level 2). Methodology. Using 2022 data from the Programme for International Student Assessment (PISA) for 10 LAC countries, we fit a stacking model that integrates binary classification models and apply Shapley Additive Explanations (SHAP) analysis for interpretability, identifying the critical factors that affect student performance across low-performer groups. Results. We find that the student with the highest probability of being a non-achiever speaks a minority language, has repeated a grade, has no digital devices at home, comes from a poor family, and works for pay half of the week; the school this student attends has wide-ranging disadvantages such as a bad school climate, weak Information and Communication Technology (ICT) infrastructure, and poor teaching quality (only a third of teachers being certified). In the country-level estimates, the contributions of the top-ranked factors follow quite homogeneous patterns: repetition at primary level, household wealth, and educational ICT inputs are among the ten top-ranked covariates in at least 8 of the 10 countries. Discussion. The paper's findings contribute to the broad literature on strategies to identify and target those most left behind in Latin American education systems.
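To make the interpretability step concrete, the following is a minimal sketch of the idea behind SHAP: a predicted probability is split into additive per-feature contributions using Shapley values. It is not the paper's pipeline. The stacking ensemble is replaced by a toy logistic risk model, the Shapley values are computed by brute-force enumeration rather than with the SHAP library, and the feature names (`repeated_grade`, `no_digital_device`, `low_wealth`), the coefficients, and the example student are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical toy risk model, NOT the paper's stacking ensemble.
# Feature names and coefficients are invented for illustration only.
WEIGHTS = {"repeated_grade": 1.2, "no_digital_device": 0.8, "low_wealth": 0.6}
INTERCEPT = -1.0


def predict(x):
    """Probability of being a low performer under the toy logistic model."""
    z = INTERCEPT + sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Coalition members take the student's values;
                # all other features stay at the baseline.
                without_f = {
                    g: (x[g] if g in subset else baseline[g]) for g in features
                }
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi


# Hypothetical student compared against an all-zero reference profile.
student = {"repeated_grade": 1, "no_digital_device": 1, "low_wealth": 0}
baseline = {"repeated_grade": 0, "no_digital_device": 0, "low_wealth": 0}

phi = shapley_values(student, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the gap between the student's predicted probability and the baseline prediction, which is the additivity property that lets such values be ranked across covariates. Brute-force enumeration scales exponentially in the number of features, so for a model with many covariates an approximation such as those in the SHAP library would be needed in practice.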