Abstract: Systems and methods for integrating a sequence of 2D-images of an object into a synthetic 3D-scene. A sequence of 2D-images of a target object is captured using a smart device such as a smartphone. The object is extracted from the image sequence and converted into a corresponding sequence of flat-surfaced 3D-renderable objects that are placed in a synthetic 3D-scene. Movement and orientation of the smart device are captured and translated into corresponding viewing points in the 3D-scene. The viewing points are then used to render the 3D-scene, together with the flat-surfaced 3D-renderable objects embedded in it, into a video sequence showing the target object as an integral part of the 3D-scene. Other effects, such as lighting, shadowing and reflections, are rendered in conjunction with the flat-surfaced 3D-renderable objects to further enhance the illusion that the target object is an integral part of the 3D-scene.
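The geometry summarized in the abstract can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the disclosed implementation. The names `DevicePose`, `camera_basis`, `billboard_corners` and `project` are hypothetical. The sketch assumes that the device pose has already been mapped into scene units, that each flat-surfaced object is a rectangle held parallel to the image plane of the current viewing point, and that rendering reduces to a simple pinhole projection. Object extraction, lighting, shadowing and reflections are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DevicePose:
    """Hypothetical per-frame device pose, assumed already in scene units."""
    position: Vec3
    yaw: float    # radians about the vertical (y) axis; 0 looks down -z
    pitch: float  # radians above the horizontal plane


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def camera_basis(pose: DevicePose) -> Tuple[Vec3, Vec3, Vec3]:
    """Translate a device pose into the (right, up, forward) axes of a viewing point.

    Degenerate when the device points straight up or down (pitch = +/- 90 degrees).
    """
    cp = math.cos(pose.pitch)
    forward = (math.sin(pose.yaw) * cp, math.sin(pose.pitch), -math.cos(pose.yaw) * cp)
    right = _normalize(_cross(forward, (0.0, 1.0, 0.0)))
    up = _cross(right, forward)
    return right, up, forward


def billboard_corners(center: Vec3, width: float, height: float,
                      pose: DevicePose) -> List[Vec3]:
    """Corners of a flat rectangle at `center`, parallel to the viewing point's image plane.

    Facing the viewing point is an assumption of this sketch, not a stated requirement.
    """
    right, up, _ = camera_basis(pose)
    hw, hh = width / 2.0, height / 2.0
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        corners.append((center[0] + sx * hw * right[0] + sy * hh * up[0],
                        center[1] + sx * hw * right[1] + sy * hh * up[1],
                        center[2] + sx * hw * right[2] + sy * hh * up[2]))
    return corners


def project(point: Vec3, pose: DevicePose, focal: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a scene point; returns None if it lies behind the viewing point."""
    right, up, forward = camera_basis(pose)
    rel = _sub(point, pose.position)
    depth = _dot(rel, forward)
    if depth <= 0.0:
        return None
    return (focal * _dot(rel, right) / depth, focal * _dot(rel, up) / depth)
```

In a full pipeline, one `DevicePose` per captured frame would drive both the placement of that frame's flat-surfaced object and the viewing point used to render the scene, so that the rendered video follows the motion of the smart device.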