Data-driven Head Motion Generation through Natural Gaze-Head Coordination
Xiaohan Liu, Yilin Wen, Yusuke Sugano
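AI summary: Proposes the first data-driven method that automatically extracts natural gaze and head motion, uses a conditional variational autoencoder to generate gaze-correlated head motion, and applies it to gaze-controlled video generation.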
Details
We present the first data-driven approach to modeling temporal gaze-head coordination from large-scale in-the-wild facial videos. To obtain training data for generalizable learning, we propose an automatic pipeline that extracts natural yet diverse gaze and head motions using off-the-shelf appearance-based gaze estimators. To capture the probabilistic correlation and temporal dynamics of gaze-head coordination, we build our model on a generative conditional variational autoencoder (CVAE) that produces plausible yet diverse gaze-conditioned head motion. We further apply our framework to gaze-controlled facial video generation, where it enables videos with natural, realistic head motion correlated with the input gaze, an aspect that prior work has not emphasized. Human evaluation and quantitative comparisons demonstrate our method's effectiveness and validate our design choices, with evaluators showing a statistically significant preference for our approach over baseline methods.
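To make the CVAE idea concrete, the following is a minimal sketch of the generic conditional VAE ingredients: a KL term, the reparameterized latent sample, a reconstruction-plus-KL loss, and test-time sampling from the prior. It is not the authors' implementation. The function names, the `beta` weight, the toy decoder, and the gaze values are all assumptions made for illustration; the abstract does not specify the network architecture or the loss weighting.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), as used during training."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_loss(head_true, head_pred, mu, logvar, beta=1.0):
    """Squared reconstruction error plus a beta-weighted KL term (beta is assumed)."""
    recon = sum((p - t) ** 2 for p, t in zip(head_pred, head_true))
    return recon + beta * kl_to_standard_normal(mu, logvar)


def sample_head_motion(gaze, decode, latent_dim, rng):
    """At test time, draw z from the prior N(0, I) and decode it with the gaze."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decode(z, gaze)


def toy_decode(z, gaze):
    """Hypothetical stand-in decoder: head yaw is a fraction of gaze yaw, shifted by z."""
    return [0.5 * g + 0.1 * z[0] for g in gaze]


rng = random.Random(0)
gaze = [0.2, -0.1, 0.4]  # made-up per-frame gaze yaw values, not real data
samples = [sample_head_motion(gaze, toy_decode, latent_dim=1, rng=rng) for _ in range(3)]
```

Because each call draws a fresh latent `z`, the same gaze sequence yields several different head-motion sequences, which is the "plausible yet diverse" behavior a CVAE is meant to provide. In a real model, `toy_decode` would be replaced by a learned temporal decoder.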