Abstract: In accordance with an embodiment, a method is disclosed for detecting action in a video clip. An audio clip is extracted from the video clip. The audio clip is converted to an auditory spectrogram. The auditory spectrogram is used to construct a self-similarity matrix. The self-similarity matrix is then used to calculate a novelty curve. The clip is then divided into segments according to peaks in the novelty curve. Each of the segments is scored, and then classified as an action clip if the score is above or below a predetermined threshold. Related methods for folding a digitized song into a shorter version of itself and for sequencing a set of user-supplied video and photo clips to a user-supplied song are further disclosed.
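The steps recited in the abstract can be illustrated with a minimal sketch. It is not the disclosed implementation, and everything specific in it is an assumption made for illustration: a plain magnitude STFT stands in for the auditory spectrogram, cosine similarity is used to build the self-similarity matrix, a checkerboard kernel slid along the diagonal yields the novelty curve, mean spectral magnitude serves as the segment score, and a segment is treated as action when its score is above the threshold. All function names and parameters (`frame_len`, `hop`, `half_width`, `rel_threshold`, `score_threshold`) are hypothetical.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT; a stand-in for the auditory spectrogram (assumption)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_bins)


def self_similarity(spec):
    """Cosine similarity between every pair of frames (assumed measure)."""
    norms = np.linalg.norm(spec, axis=1, keepdims=True)
    unit = spec / np.maximum(norms, 1e-12)
    return unit @ unit.T


def novelty_curve(ssm, half_width=8):
    """Correlate a checkerboard kernel along the main diagonal of the matrix."""
    sign = np.sign(np.arange(-half_width, half_width) + 0.5)
    kernel = np.outer(sign, sign)  # +1 within past/future blocks, -1 across
    n = ssm.shape[0]
    novelty = np.zeros(n)  # frames too close to either end stay at zero
    for i in range(half_width, n - half_width + 1):
        block = ssm[i - half_width:i + half_width,
                    i - half_width:i + half_width]
        novelty[i] = np.sum(block * kernel)
    return novelty


def pick_peaks(novelty, rel_threshold=0.5):
    """Local maxima that exceed a fraction of the global maximum."""
    floor = rel_threshold * novelty.max()
    return [i for i in range(1, len(novelty) - 1)
            if novelty[i] > floor
            and novelty[i] >= novelty[i - 1]
            and novelty[i] > novelty[i + 1]]


def score_segments(spec, peaks):
    """Split frames at the peaks; score each segment by mean magnitude."""
    bounds = [0, *peaks, spec.shape[0]]
    return [(a, b, float(spec[a:b].mean()))
            for a, b in zip(bounds[:-1], bounds[1:])]


def detect_action(signal, score_threshold):
    """Return (start_frame, end_frame, score, is_action) for each segment."""
    spec = spectrogram(signal)
    peaks = pick_peaks(novelty_curve(self_similarity(spec)))
    # Assumption: "action" means the score is above the threshold.
    return [(a, b, s, s > score_threshold)
            for a, b, s in score_segments(spec, peaks)]
```

Calling `detect_action(samples, score_threshold)` on a mono audio array returns one tuple per segment, with frame indices that convert to sample offsets by multiplying by `hop`. In practice the score and threshold would be chosen to match whatever notion of action the embodiment uses; mean magnitude is only a placeholder.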