Abstract: Methods, apparatus, and systems for processing and tagging at least a portion of a video with metadata are provided herein. In some embodiments, a method for processing and tagging at least a portion of a video with metadata includes: extracting a plurality of frames from the video; generating a fingerprint for each frame of the plurality of frames, or for a set of frames of the plurality of frames; determining contextual data within at least one frame or set of frames; associating the generated fingerprint of each frame or set of frames with the determined contextual data; and storing the association between the fingerprint of each frame or set of frames and the contextual data.