Abstract: Systems and methods are provided for estimating the pose of a human subject's head from a sequence of images received from a single depth camera. The images are processed to generate a continuous estimate of the head pose in three-dimensional (3D) space and to generate a 3D head model for display and further use. The subject is instructed to rotate their head in a first direction until a threshold angle of rotation is reached, and is then instructed to rotate their head in a second direction. The depth camera provides a sequence of captured images, which are processed to extract head meshes. After capture is complete, the head meshes are merged to generate a 3D model of the subject's head.
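
The guided capture sequence described above can be illustrated with a minimal sketch of the instruction logic. It is only an illustration under stated assumptions, not the disclosed implementation: the class name `CaptureGuide`, the 45-degree threshold, the use of a signed yaw angle as the pose estimate, and the rule that capture ends when the same threshold is reached in the second direction are all hypothetical choices not given in the abstract. Mesh extraction and merging are outside the scope of this sketch.

```python
from dataclasses import dataclass, field
from typing import List

FIRST = "rotate head in first direction"
SECOND = "rotate head in second direction"
DONE = "capture complete"


@dataclass
class CaptureGuide:
    """Hypothetical helper that decides which instruction to show the subject."""

    threshold_deg: float = 45.0  # assumed value; the abstract gives no number
    phase: str = "first"         # "first" -> "second" -> "done"
    kept_frames: List[int] = field(default_factory=list)

    def update(self, frame_index: int, yaw_deg: float) -> str:
        """Consume one head-pose estimate and return the instruction to display.

        yaw_deg is assumed positive in the first direction and negative in
        the second direction.
        """
        if self.phase == "done":
            return DONE
        # Remember the frame so a head mesh could later be extracted from it.
        self.kept_frames.append(frame_index)
        if self.phase == "first" and yaw_deg >= self.threshold_deg:
            # Threshold reached: switch the subject to the second direction.
            self.phase = "second"
        elif self.phase == "second" and yaw_deg <= -self.threshold_deg:
            # Assumed stopping rule: same threshold in the opposite direction.
            self.phase = "done"
            return DONE
        return FIRST if self.phase == "first" else SECOND
```

Feeding the guide one yaw estimate per captured frame yields the instruction to display for that frame; the frame indices collected in `kept_frames` stand in for the images from which head meshes would be extracted.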