Abstract: Methods and apparatus are provided for establishing temporal alignment of media clips. In an example embodiment, first and second media clips each contain an audio portion, and the method comprises: determining an estimated global offset between the first and second clips; choosing a first test region of the first clip; and identifying a corresponding second test region in the second clip based at least in part on the estimated global offset. The first and second test regions are then compared to determine a local offset.
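
The two-stage procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how the global offset is estimated or how the test regions are compared, so everything below is an assumption made for illustration: the coarse estimate is taken from cross-correlating frame-energy envelopes, the test regions are compared by plain time-domain cross-correlation, and all function and parameter names (`best_lag`, `energy_envelope`, `estimate_global_offset`, `local_offset`, `hop`, `margin`) are hypothetical. An offset `d` here means that sample `n` of the first clip corresponds to sample `n + d` of the second clip.

```python
import numpy as np


def best_lag(a, b, max_lag):
    """Return the lag d in [-max_lag, max_lag] maximizing sum_n a[n] * b[n + d]."""
    best_d, best_score = 0, -np.inf
    for d in range(-max_lag, max_lag + 1):
        x = a[max(0, -d):]          # samples of a that have a partner in b
        y = b[max(0, d):]           # their partners, shifted by d
        n = min(len(x), len(y))
        if n == 0:
            continue
        score = float(np.dot(x[:n], y[:n]))
        if score > best_score:
            best_d, best_score = d, score
    return best_d


def energy_envelope(x, hop):
    """Zero-mean energy per non-overlapping frame of `hop` samples."""
    n = len(x) // hop
    frames = x[:n * hop].reshape(n, hop)
    env = (frames ** 2).sum(axis=1)
    return env - env.mean()


def estimate_global_offset(first, second, hop, max_offset):
    """Coarse offset in samples (a multiple of hop); an assumed method."""
    lag_frames = best_lag(energy_envelope(first, hop),
                          energy_envelope(second, hop),
                          max_offset // hop)
    return lag_frames * hop


def local_offset(first, second, start, length, global_offset, margin):
    """Residual offset of one test region, searched within +/- margin samples."""
    second_start = start + global_offset   # corresponding region in second clip
    if (start < 0 or start + length > len(first)
            or second_start < 0 or second_start + length > len(second)):
        raise ValueError("test region falls outside one of the clips")
    region1 = first[start:start + length]
    region2 = second[second_start:second_start + length]
    return best_lag(region1, region2, margin)


if __name__ == "__main__":
    # Synthetic audio: two clips cut from the same noise signal, 230 samples apart.
    rng = np.random.default_rng(0)
    source = rng.standard_normal(20000)
    first, second = source[230:18230], source[0:18000]

    hop = 50
    g = estimate_global_offset(first, second, hop=hop, max_offset=1000)
    r = local_offset(first, second, start=5000, length=2000,
                     global_offset=g, margin=hop)
    print("estimated global offset:", g)
    print("local offset in test region:", r)
    print("refined offset:", g + r)
```

In this sketch the global estimate is only accurate to within one frame, and the local comparison supplies the remaining correction for the chosen test region. Repeating `local_offset` for several test regions would give one local offset per region, which is one way such a scheme could expose drift between clips; that extension is not described in the abstract.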