Abstract: According to one aspect, there are systems and methods for generating data indicative of a three-dimensional representation of a scene. Current depth data indicative of a scene is generated using a sensor. Salient features are detected within a depth frame associated with the depth data, and these salient features are matched with a saliency likelihoods distribution. The saliency likelihoods distribution represents the scene, and is generated from previously-detected salient features. The pose of the sensor is estimated based upon the matching of detected salient features, and this estimated pose is refined based upon a volumetric representation of the scene. The volumetric representation of the scene is updated based upon the current depth data and estimated pose. A saliency likelihoods distribution representation is updated based on the salient features. Image data indicative of the scene may also be generated and used along with depth data.