Abstract: Systems and methods for hand pose estimation are provided. For example, a computing device may obtain an image, such as an image of a hand. The computing device may apply one or more preprocessing operations to the image to generate an augmented image. Further, the computing device may apply a first machine learning process to the augmented image to generate a plurality of keypoints. The computing device may also apply a second machine learning process to the plurality of keypoints to generate a plurality of depth values. The computing device may further determine a plurality of angles based on the plurality of keypoints and the plurality of depth values. In some examples, the computing device may generate a model comprising a plurality of segments based on the plurality of angles. The computing device may store the plurality of angles and, in some examples, the model in a memory device.
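
As an illustration of the angle-determination step only, the following is a minimal Python sketch and not the claimed implementation. It assumes that the keypoints are 2D image coordinates, that each depth value is paired with exactly one keypoint and expressed in the same units, and that an angle is measured at each interior keypoint of an ordered chain (for example, the joints along one finger). The function names `lift_keypoints`, `joint_angle`, and `chain_angles` are hypothetical and do not appear in the source.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def lift_keypoints(keypoints: Sequence[Point2D], depths: Sequence[float]) -> List[Point3D]:
    """Pair each 2D keypoint with its depth value (assumed one depth per keypoint)."""
    if len(keypoints) != len(depths):
        raise ValueError("expected one depth value per keypoint")
    return [(x, y, z) for (x, y), z in zip(keypoints, depths)]


def joint_angle(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Return the angle at b, in degrees, between segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("coincident keypoints: angle is undefined")
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def chain_angles(points: Sequence[Point3D]) -> List[float]:
    """Angles at every interior keypoint of an ordered chain of 3D points."""
    return [
        joint_angle(points[i - 1], points[i], points[i + 1])
        for i in range(1, len(points) - 1)
    ]


# Hypothetical data: three keypoints along one finger and their depth values.
finger_2d = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
finger_depths = [0.0, 0.0, 1.0]
finger_3d = lift_keypoints(finger_2d, finger_depths)
angles = chain_angles(finger_3d)
```

In this sketch the last two keypoints coincide in the image plane and differ only in depth, so the bend is visible only once the depth values are combined with the keypoints. The resulting list of angles could then be stored or used to pose a segment-based model, as the abstract describes.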