arXivDaily arXiv Daily Academic Digest, updated Monday through Friday
2606.20078 2026-06-19 stat.OT New submission

A Law of Iterated Expectation Primer for Causal Inference

Ashley I. Naimi, Razieh Nabi, Lindsay J. Collin, Paul N. Zivich, Stephen R. Cole

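AI summary: This paper introduces the law of iterated expectation and its application to identifying causal effects, clarifying its mathematical intuition through two nonparametrically equivalent forms of the g-formula (NICE and ICE) and three numerical examples.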

Abstract

The g-formula is a foundational tool for identifying causal effects in observational data. This tool is based on the law of iterated expectation, a key mathematical identity in statistics. However, the notation with which the law of iterated expectation and the g-formula are expressed can be opaque to those with little background in statistics. We provide a primer introducing the law of iterated expectation, the integration notation used to express it, and its role in causal effect identification via the g-formula. Under the assumptions of causal consistency, positivity, and conditional exchangeability, the law of iterated expectation can be rewritten as a causal standardization formula (the g-formula) in two nonparametrically equivalent forms: a non-iterative conditional expectation (NICE) form involving a single weighted average of conditional outcome means, and an iterative conditional expectation (ICE) form involving nested expectations. We illustrate both forms using three progressively complex numerical examples: a time-fixed example with a single binary confounder, a time-fixed example with discrete and continuous confounders, and a time-varying example with two timepoints. We provide clarity on what the law of iterated expectation is, how it is related to the g-formula, and how to gain intuition for its mathematical formulations in actual data examples that can be generalized to a range of settings.
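
As a rough illustration of the two forms described above (a sketch, not the authors' code), the Python below handles the two-timepoint case with hypothetical simulated binary data (L0, A0, L1, A1, Y) and simply assumes consistency, positivity, and conditional exchangeability hold. The NICE form computes E[Y^(a0,a1)] = Σ_{l0,l1} E[Y | A1=a1, L1=l1, A0=a0, L0=l0] P(L1=l1 | A0=a0, L0=l0) P(L0=l0) as one weighted average; the ICE form computes the nested E[ E[ E[Y | A1=a1, L1, A0=a0, L0] | A0=a0, L0 ] ] from the inside out. With a single time point both collapse to Σ_l E[Y | A=a, L=l] P(L=l). The functions `simulate`, `nice`, and `ice` and the data-generating probabilities are illustrative assumptions.

```python
import random


def simulate(n, seed=2026):
    """Hypothetical binary data (L0, A0, L1, A1, Y); not taken from the paper."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)
        a0 = int(rng.random() < 0.3 + 0.4 * l0)               # A0 depends on L0
        l1 = int(rng.random() < 0.3 + 0.2 * l0 + 0.2 * a0)    # L1 depends on L0, A0
        a1 = int(rng.random() < 0.3 + 0.4 * l1)               # A1 depends on L1
        p_y = 0.2 + 0.1 * l0 + 0.2 * l1 + 0.2 * a0 + 0.2 * a1
        rows.append((l0, a0, l1, a1, int(rng.random() < p_y)))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def nice(rows, a0, a1):
    """NICE form: a single weighted average of stratum-specific outcome means."""
    n = len(rows)
    total = 0.0
    for l0 in (0, 1):
        p_l0 = sum(r[0] == l0 for r in rows) / n                    # P(L0 = l0)
        history = [r for r in rows if r[0] == l0 and r[1] == a0]
        for l1 in (0, 1):
            p_l1 = sum(r[2] == l1 for r in history) / len(history)  # P(L1 = l1 | a0, l0)
            y_mean = mean(r[4] for r in history if r[2] == l1 and r[3] == a1)
            total += y_mean * p_l1 * p_l0
    return total


def ice(rows, a0, a1):
    """ICE form: nested conditional expectations, evaluated inside out."""
    # Q2(l0, l1) = E[Y | A1 = a1, L1 = l1, A0 = a0, L0 = l0]
    q2 = {(l0, l1): mean(r[4] for r in rows if r[:4] == (l0, a0, l1, a1))
          for l0 in (0, 1) for l1 in (0, 1)}
    # Q1(l0) = E[Q2(L0, L1) | A0 = a0, L0 = l0], averaging over observed L1
    q1 = {l0: mean(q2[(l0, r[2])] for r in rows if r[0] == l0 and r[1] == a0)
          for l0 in (0, 1)}
    # Outer expectation: average Q1 over the observed distribution of L0
    return mean(q1[r[0]] for r in rows)


rows = simulate(5000)
for a0, a1 in [(1, 1), (0, 0)]:
    print((a0, a1), round(nice(rows, a0, a1), 4), round(ice(rows, a0, a1), 4))
```

With nonparametric (stratum-mean) estimates, both functions return the same value up to floating-point error, which is the sense in which the two forms are nonparametrically equivalent; if each conditional expectation were instead fit with a parametric model, the two estimates would generally differ in finite samples.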

2606.19775 2026-06-19 cs.SI stat.AP stat.OT Cross-list

Rethinking Sampling Strategy in Link Prediction

Yilin Bi, Zhenyu Deng, Xinshan Jiao, Tao Zhou

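AI summary: Proposes a β-sampling scheme to study how two-stage sampling affects link prediction performance, finding that the structural characteristics of missing links substantially affect prediction accuracy and that the second-stage sampling strategy is crucial.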

Comments: 19 pages, 5 figures, 3 tables

Abstract

Many real-world networks are incomplete, making link prediction a fundamental challenge in network science. To train parameters and evaluate algorithms, observed links are usually divided into three subsets, namely training, validation, and probe sets. This division implicitly involves two sampling processes: first-stage sampling yields the probe set and second-stage sampling obtains the validation set. To date, our understanding of how these two sampling processes affect algorithm performance remains quite limited. To address this issue, we propose a sampling scheme called β-sampling, where the sampling probability of a link is proportional to the product of the degrees of its two endpoints raised to the power of β. Experiments on 45 real-world networks reveal that the structural characteristics of missing links, as simulated via varying probe sets, substantially impact prediction accuracy. When missing links tend to connect high-degree nodes, such links can be predicted accurately with ease. Furthermore, even with a fixed probe set, second-stage sampling still exerts a significant influence on prediction accuracy. Notably, the optimal second-stage sampling strategy differs from random sampling (which randomly selects links to form the validation set) and consistent sampling (which guarantees that links in the validation and probe sets share identical structural characteristics).
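
A minimal sketch of β-sampling as described above, assuming an undirected edge list, degrees computed on the edges being split, and a fixed sampling fraction. The abstract does not say how draws without replacement are made, so this sketch swaps in Efraimidis-Spirakis weighted sampling (key log(u)/w, keep the largest keys), which matches successive proportional draws without replacement; the function name `beta_sample`, its parameters, and the toy graph are hypothetical. Setting β = 0 reduces to uniform random sampling, β > 0 favors links between high-degree nodes, and β < 0 favors links between low-degree nodes.

```python
import math
import random
from collections import Counter


def beta_sample(edges, fraction, beta, seed=0):
    """Split `edges` into (sampled, remaining), picking each link with
    probability proportional to (k_u * k_v) ** beta, without replacement.

    Degrees k are computed from `edges` itself (an assumption). Weighted
    sampling without replacement uses Efraimidis-Spirakis keys
    log(u) / w, keeping the m largest keys.
    """
    rng = random.Random(seed)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    m = round(fraction * len(edges))
    keyed = []
    for u, v in edges:
        w = (degree[u] * degree[v]) ** beta       # beta-sampling weight
        r = 1.0 - rng.random()                    # uniform on (0, 1]
        keyed.append((math.log(r) / w, (u, v)))   # larger key = picked earlier
    keyed.sort(key=lambda item: item[0], reverse=True)

    sampled = [edge for _, edge in keyed[:m]]
    remaining = [edge for _, edge in keyed[m:]]
    return sampled, remaining


# Hypothetical toy graph: a hub (node 0) attached to a path 1-2-...-19.
edges = [(0, i) for i in range(1, 20)] + [(i, i + 1) for i in range(1, 19)]
probe, rest = beta_sample(edges, fraction=0.1, beta=2.0, seed=1)          # stage 1
validation, training = beta_sample(rest, fraction=0.1, beta=0.0, seed=2)  # stage 2
print(len(probe), len(validation), len(training))  # prints: 4 3 30
```

Calling the function twice, first on all observed links to get the probe set and then on the remaining links to carve out a validation set, mimics the two-stage process; recomputing degrees on the remaining links in the second call is a simplifying assumption rather than something the abstract specifies.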

2603.06820 2026-06-19 econ.EM stat.OT Updated version

Hippocratic Utility and Status Quo Bias

Tomasz Strzalecki

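AI summary: Using a simple example, this paper shows that a utility function that values lives lost more than lives saved has a much narrower scope of applicability than it first appears.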

Abstract

A utility function has been proposed that values lives that are lost more than lives that are saved. I do not dispute the ethical motivation behind this kind of asymmetry. However, I show with a simple example that the scope of applicability of such a decision criterion is considerably more limited than it may first appear.

2508.14009 2026-06-19 stat.OT Updated version

Understanding Pedagogical Content Knowledge of Introductory Data Science Instructors: An Inaugural Framework

Sinem Demirci, Mine Doğucu, Andrew Zieffler, Joshua M. Rosenberg

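AI summary: Through interviews with 14 introductory data science instructors and analysis of their course syllabi, the study explores the key components of their pedagogical content knowledge (PCK), offers insights for instructor development, and establishes an initial PCK framework for IDS.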

Comments: 67 pages, 4 tables

Abstract

As data science emerges as a distinct academic discipline, introductory data science (IDS) courses play a key role in shaping students' foundational understanding. Often taught by instructors without formal training in data science or pedagogy, these courses present a unique and globally relevant context for examining pedagogical content knowledge (PCK). Drawing on semi-structured interviews with 14 IDS instructors and their course syllabi, this study explores how IDS instructors describe and make sense of their teaching practices, which are analyzed through the lens of PCK. The findings highlight key components of PCK about IDS and offer insights into supporting instructor development. This work contributes to expanding the scope of PCK research into new interdisciplinary domains and to ongoing global efforts to build capacity in data science education. It could serve as a starting point for developing a PCK framework specific to IDS.