Abstract: Systems and methods for extracting objects from a first video and inserting those objects into a second, new background. In an embodiment, at least one hardware processor is used to: receive a first video depicting a first scene comprising one or more objects and a background; identify the one or more objects in the first video; generate a first video layer by extracting the identified one or more objects from the background of the first video; receive a second video layer depicting a second scene; and merge the first video layer and the second video layer to generate a composite scene, wherein the one or more objects of the first video layer are overlaid on the second scene such that the one or more objects appear to be part of the second scene.
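The extract-and-merge steps above can be illustrated with a minimal per-frame sketch. It assumes that the "identify" step has already produced a per-pixel alpha mask for each frame (1.0 = object, 0.0 = background). The segmentation method that would produce such masks is not specified here and is outside the sketch. It also assumes the merge is standard "over" alpha compositing, and that both videos share frame count and resolution. The function names `extract_layer`, `composite`, and `merge_videos` are hypothetical and are not taken from the described system.

```python
from typing import List, Tuple

RGB = Tuple[float, ...]                  # (r, g, b) channel values in [0, 1]
RGBA = Tuple[float, ...]                 # (r, g, b, alpha)
Frame = List[List[RGB]]                  # rows of RGB pixels
Mask = List[List[float]]                 # per-pixel alpha; 1.0 = object pixel


def extract_layer(frame: Frame, mask: Mask) -> List[List[RGBA]]:
    """Hypothetical helper: build one frame of the 'first video layer' by
    attaching the (assumed, precomputed) object mask as an alpha channel."""
    return [
        [(*px, a) for px, a in zip(row, mrow)]
        for row, mrow in zip(frame, mask)
    ]


def composite(layer: List[List[RGBA]], background: Frame) -> Frame:
    """Porter-Duff 'over': out = alpha * foreground + (1 - alpha) * background."""
    out: Frame = []
    for lrow, brow in zip(layer, background):
        out.append([
            tuple(a * f + (1 - a) * b for f, b in zip(fg_px, bg_px))
            for (*fg_px, a), bg_px in zip(lrow, brow)
        ])
    return out


def merge_videos(first_video: List[Frame], masks: List[Mask],
                 second_video: List[Frame]) -> List[Frame]:
    """Composite frame by frame; assumes equal frame counts and resolutions."""
    return [
        composite(extract_layer(frame, mask), bg)
        for frame, mask, bg in zip(first_video, masks, second_video)
    ]
```

For example, a one-row frame with a red object pixel (mask 1.0) next to a green background pixel (mask 0.0), merged onto an all-blue second scene, keeps the red pixel and shows blue where the original background was. Fractional mask values blend the two scenes, which softens object edges in the composite.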