Abstract: A computer-implemented method and system for processing an audio signal. The method includes the steps of extracting prosodic features from the audio signal, aligning the extracted prosodic features with a script derived from or associated with the audio signal, and segmenting the script with the aligned extracted prosodic features into structural blocks of a first type. The method may further include determining a distance measure between one structural block of the first type derived from the script and another structural block of the first type using, for example, the Damerau-Levenshtein distance.
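
The distance step can be illustrated with a minimal sketch. It assumes each structural block has already been reduced to a sequence of comparable symbols; the abstract does not specify such an encoding, so the "H"/"L" prosodic labels below are hypothetical. The function name `osa_distance` is likewise illustrative. The sketch computes the restricted (optimal string alignment) variant of the Damerau-Levenshtein distance, which counts insertions, deletions, substitutions, and transpositions of adjacent symbols; it is not presented as the method's actual implementation.

```python
from typing import Sequence


def osa_distance(a: Sequence, b: Sequence) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i symbols of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j symbols of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent symbols.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical blocks encoded as sequences of prosodic labels.
block_a = ["H", "L", "H"]
block_b = ["L", "H", "H"]
print(osa_distance(block_a, block_b))  # 1: one adjacent transposition
```

Because the function only compares elements for equality, it works on strings, lists of labels, or any other sequence of comparable items. The restricted variant never edits the same substring twice, so it can return a larger value than the unrestricted Damerau-Levenshtein distance (for example, 3 rather than 2 for "ca" and "abc").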