AUDIO METERS AND PARAMETER CONTROLS
Some embodiments provide a media editing application that displays the audio level of a set of one or more clips that has been mixed with other clips. To indicate the audio level of the set of clips that has been mixed with other clips, the media editing application of some embodiments routes a combined audio signal of the set of clips over a meter bus in order to determine the audio level of the combined audio signal. Alternatively, the media editing application of some embodiments extracts metering information from each clip in a set of clips prior to mixing the clips. The metering information is then used to estimate the audio level of one or more clips in the composite presentation.
This application claims the benefit of U.S. Provisional Application 61/537,041, filed Sep. 20, 2011, and U.S. Provisional Application 61/537,567, filed Sep. 21, 2011. U.S. Provisional Application 61/537,041 and U.S. Provisional Application 61/537,567 are incorporated herein by reference.
BACKGROUND
To date, many media editing applications exist for creating a composite media presentation by compositing several pieces of media content such as video, audio, animation, still image, etc. In some cases, a media editing application combines a composite of two or more clips with one or more other clips to output (e.g., play, export) the composite presentation.
There are a number of different problems that can occur when outputting such a composite presentation. For example, some movie studios require particular content (e.g., dialog content, music content) of a composite presentation to be kept separate from other content. The content separation allows the movie studios to easily replace the composite presentation's dialog in one language with dialog in another language. The problem with providing separate content is that, once several pieces of media content have been mixed together, the mix cannot be un-mixed to recover the separate content.
As another example, displaying the audio levels of different media clips during playback of a composite presentation is useful because the audio levels indicate how much one or more of the different media clips contribute to the overall mix. The problem here is similar to the one described above: once the different media clips have been mixed for playback, the mix cannot be un-mixed to provide metering information for the individual clips.
In addition, some media editing applications apply one or more different effects (e.g., reverb effect, echo effect, blur effect, distort effect, etc.) to a set of clips when outputting a composite presentation. Several of these effects are applied using a “send” (i.e., “send and return”), which entails routing the audio signals of different clips over an auxiliary (“aux”) bus to an effects unit. In a typical media editing application, a “send” effect is applied by the user manually adding an input aux track, specifying an effect for the aux track, specifying an input bus for the aux track, creating the “send”, and identifying the specified bus to route the audio signals of different clips. In this manner, several audio signals of different clips can be routed over one aux bus in order to apply the same effect (e.g., an echo effect) to a combined audio signal of the different clips. However, the “send” technique becomes increasingly complicated as additional aux buses are added to route the audio signals of multiple different clips.
Furthermore, several of the media editing applications described above allow users to view metadata associated with media content and/or perform organizing operations using the metadata. However, these media editing applications lack the tools or the functionality to perform different editing operations by using one or more pieces of metadata that are associated with the media content.
The concepts described in this section have not necessarily been previously conceived, or implemented in any prior approach. Therefore, unless otherwise indicated, it should not be assumed that any concepts described in this section qualify as prior art merely by virtue of their inclusion in this section.
BRIEF SUMMARY
Some embodiments provide a media editing application that uses metadata or metadata tags associated with media content to facilitate editing operations. In some embodiments, the editing operations are performed on the media content at various different stages of the editing process in order to create a composite presentation. In creating the composite presentation, one or more effects are associated with a metadata tag. Once the effects are associated, the media editing application applies the effects to different pieces of media content tagged with the metadata tag in order to create the composite presentation.
Different embodiments provide different schemes for specifying one or more effects to apply to media content that has been associated with a metadata tag. For instance, in some embodiments, the media editing application allows an effect chain or an effect list to be specified for each type or category of metadata tag. In some embodiments, the media editing application allows its user to specify effect properties for the effects in the effect list. These effect properties define how the corresponding effect is applied to the media content.
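As an illustration of the per-tag effect list described above, the following Python sketch models each metadata tag as mapping to an ordered list of effects, each carrying its own properties. The `Effect` and `Clip` classes, the `effect_chains` table, and the property names are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Effect:
    """One entry in a tag's effect list (hypothetical model)."""
    name: str
    properties: dict = field(default_factory=dict)  # define how the effect is applied


@dataclass
class Clip:
    name: str
    tags: set = field(default_factory=set)


# Hypothetical effect chains, one per metadata tag; list order is processing order.
effect_chains = {
    "Dialog": [Effect("equalizer", {"low_cut_hz": 80}),
               Effect("reverb", {"type": "room", "level_db": -12.0})],
    "Music": [Effect("compressor", {"ratio": 4.0})],
}


def effects_for_clip(clip, chains):
    """Collect every effect associated with any of the clip's tags,
    visiting tags in sorted order and keeping each chain's order."""
    result = []
    for tag in sorted(clip.tags):
        result.extend(chains.get(tag, []))
    return result


interview = Clip("interview", {"Dialog"})
print([e.name for e in effects_for_clip(interview, effect_chains)])  # ['equalizer', 'reverb']
```

Keeping the properties on each effect entry, rather than on the clip, means that changing one property of a tag's effect affects every clip carrying that tag.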
Based on metadata associated with different clips, the media editing application of some embodiments applies a set of effects (e.g., echo effect, reverb effect) by using a “send” or a “send and return”. In some embodiments, the “send” is performed automatically such that the routing of audio signals of the different clips to an effect module is transparent to the application's user. That is, the user does not have to add an input auxiliary (“aux”) track, specify an effect for the aux track, specify an input bus for the aux track, create the “send”, and identify the specified bus to route the audio signals of the different clips. Instead, the user can simply specify a particular effect for a metadata tag. The media editing application then applies the particular effect using the “send” to a combined audio signal of each clip tagged with the metadata tag.
The media editing application of some embodiments applies one or more effects directly on each clip without using the “send”. One example of such a technique is applying an effect as an “insert” effect that processes (e.g., filters, distorts) an incoming audio signal and outputs the processed audio signal. For example, when a metadata tag is associated with a particular effect, the media editing application of some embodiments automatically applies the particular effect to each audio signal of the different clips tagged with the metadata tag.
In some embodiments, when playing a composite presentation, the media editing application displays the audio level of a set of one or more clips that has been mixed with other clips. For example, the audio signals of the set of clips can be mixed with other clips in order to play the composite presentation. To indicate the audio level of the set of clips that has been mixed with other clips, the media editing application of some embodiments routes a combined audio signal of the set of clips over a meter bus in order to determine the audio level of the combined audio signal. In some embodiments, the media editing application scales (i.e., reduces or increases) the audio level of one or more clips by processing down a signal chain or sequence of operations and identifying what one or more of the clips are contributing to the overall mix.
Alternatively, the media editing application of some embodiments extracts metering information from each clip in a set of clips prior to mixing the clips. The metering information is then used to estimate the audio level of one or more clips in the composite presentation. Similar to sending the audio signal over the meter bus, the media editing application of some embodiments scales the estimated audio level by identifying what one or more of the clips are contributing to the overall mix.
In some embodiments, the media editing application allows a composite presentation to be exported to different tracks (e.g., different files). To export the composite presentation, the media editing application of some embodiments performs multiple rendering passes on a sequence of clips while muting one or more of the clips in the sequence. In some such embodiments, the composite presentation is output to different tracks based on metadata associated with the clips. For example, with these metadata tags, a multi-track output can be specified as a first track for each clip tagged as dialog, a second track for each clip tagged as music, etc. In this manner, the editor or a movie studio can easily replace one track with another track.
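The multi-pass export can be sketched as follows, under simplifying assumptions: each clip is a hypothetical `Clip` holding one tag and a list of mono samples, and a rendering pass (`render`, hypothetical) simply sums every unmuted clip. One pass is performed per tag, muting every clip that does not carry that tag, so each tag yields its own output track.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    tag: str
    samples: list[float]


def render(clips, muted):
    """One rendering pass: sum every clip whose index is not in `muted`."""
    mix = [0.0] * max(len(c.samples) for c in clips)
    for i, clip in enumerate(clips):
        if i in muted:
            continue
        for n, s in enumerate(clip.samples):
            mix[n] += s
    return mix


def export_stems(clips):
    """Perform one pass per tag, muting every clip not tagged with it."""
    stems = {}
    for tag in sorted({c.tag for c in clips}):
        muted = {i for i, c in enumerate(clips) if c.tag != tag}
        stems[tag] = render(clips, muted)
    return stems


clips = [Clip("Dialog", [0.5, 0.5]),
         Clip("Music", [0.25, 0.0, 0.25]),
         Clip("Dialog", [0.0, 0.25])]
stems = export_stems(clips)
print(stems["Dialog"])  # [0.5, 0.75, 0.0]
print(stems["Music"])   # [0.25, 0.0, 0.25]
```

Because each pass in this model is a plain sum, the per-tag tracks add back up to the full mix, which is what allows one track (for example, the dialog) to be replaced without disturbing the others.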
The media editing application of some embodiments uses metadata to provide user interface controls. In some such embodiments, these controls are used to display properties of tagged clips and/or specify parameters that affect the tagged clips. Examples of such user interface controls include audio meters, volume controls, different controls for modifying (e.g., distorting, blurring, changing color) images, etc.
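A per-tag volume control of this kind might propagate its value to the tagged clips roughly as below. This is a minimal sketch: clips are hypothetical `(tags, samples)` pairs, `apply_tag_volume` is an illustrative name, and the control's decibel value is converted to a linear amplitude factor with the standard relation factor = 10^(dB/20).

```python
def apply_tag_volume(clips, tag, gain_db):
    """Hypothetical per-tag volume control: scale every clip carrying `tag`
    by the linear factor for `gain_db`, leaving other clips untouched."""
    factor = 10 ** (gain_db / 20)  # decibels to linear amplitude
    adjusted = []
    for tags, samples in clips:
        if tag in tags:
            adjusted.append((tags, [s * factor for s in samples]))
        else:
            adjusted.append((tags, samples))
    return adjusted


clips = [({"Dialog"}, [0.5, -0.25]), ({"Music"}, [0.5, 0.5])]
result = apply_tag_volume(clips, "Dialog", 20.0)
print(result[0][1])  # [5.0, -2.5]
```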
The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description, and the Drawings, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.
The novel features of the invention are set forth in the appended claims. However, for purposes of explanation, several embodiments of the invention are set forth in the following figures.
In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.
Several more examples of editing operations are described below. Section I describes several examples of applying effects to different tagged clips. Section II then introduces compound clips and provides several examples of applying effects to the compound clips. Section III then describes examples of metering clips that have previously been mixed. Section IV then describes constructing user interface controls and propagating parameters specified through the user interface controls. Section V then describes using metadata tags to output a composite presentation to different tracks. Section VI describes an example graphical user interface and software architecture of a media editing application of some embodiments. Section VI also describes several example data structures for the media editing application of some embodiments. Finally, Section VII describes an electronic system that implements some embodiments of the invention.
I. Applying Effects to Clips Based on Metadata
In some embodiments, the media editing application applies one or more effects to clips in a composite presentation based on metadata (i.e., metadata tags) associated with the clips. In creating the composite presentation, one or more effects are associated with a metadata tag. Once the effects are associated, the media editing application applies the effects to different pieces of media content tagged with the metadata tag in order to create the composite presentation.
There are many different effects or filters that can be associated with metadata to facilitate editing operations. Although this list is non-exhaustive, several example audio effects include an equalizer for modifying the signal strength of a clip within specified frequency ranges, an echo effect for creating an echo sound, and a reverb effect for creating a reverberation effect that emulates a particular acoustic environment. Several example video effects or image effects include color filters that operate on color values, different filters that sharpen, stylize, distort, or blur an image, and fade-in/fade-out effects for creating transitions between scenes.
As shown, process 100 identifies (at 105) each clip tagged with a particular metadata tag having an associated effect.
In the example illustrated in
Process 100 then identifies (at 110) the effect that is associated with the particular metadata tag. As shown
When the effect does not require data of one or more clips to be routed, process 100 proceeds to 120, which is described below. Otherwise, process 100 defines (at 135) a bus for the particular metadata tag. In some embodiments, the process creates this bus to send a combined audio signal of each clip tagged with the particular metadata tag. In the example illustrated in
Process 100 then sends (at 140) an audio signal of each identified clip over the aux send bus. Process 100 identifies (at 145) parameters of the identified effect. Different effects can be associated with different parameters. For example, a reverb effect can have one set of parameters including the output audio level of the reverberation effect, the type of reverberation (e.g., room, hall, space), etc. Unlike the reverb effect, an image distortion effect can have a different set of settings or parameters for distorting images.
The process 100 then applies (at 150) the effect to each identified clip based on the identified parameters. As shown in
In the example illustrated in
In the example illustrated in
One reason for utilizing the “send” technique is that it allows a combined audio signal of multiple clips to be processed through the same effects unit. In most cases, the “send” operation is used to efficiently process multiple audio signals as one composite audio signal. In other words, as multiple audio signals are mixed and processed together, the “send” technique can be less computationally expensive than applying an effect to each individual audio signal.
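The “send and return” routing can be illustrated with a toy model. In this sketch, which is an illustration under stated assumptions rather than the application's implementation, clips are `(tags, samples)` pairs, the effect is a simple wet-only echo (`echo_wet`, hypothetical), every clip's dry signal goes to the master, and the tagged clips are also summed onto one aux bus whose processed output returns to the master. The effect therefore runs once, however many clips carry the tag.

```python
def mix(*signals):
    """Sum any number of sample lists, padding shorter ones with silence."""
    out = [0.0] * max(len(s) for s in signals)
    for s in signals:
        for n, v in enumerate(s):
            out[n] += v
    return out


def echo_wet(signal, delay, gain):
    """Hypothetical wet-only echo: a delayed, attenuated copy of the input."""
    return [gain * signal[n - delay] if n >= delay else 0.0
            for n in range(len(signal))]


def render_with_send(clips, tag, effect):
    """Send every clip tagged `tag` over one aux bus, run `effect` once on the
    combined signal, and return its output to the master with all dry signals."""
    master_inputs = [samples for _, samples in clips]           # dry paths
    tagged = [samples for tags, samples in clips if tag in tags]
    if tagged:
        aux_bus = mix(*tagged)                                  # the "send"
        master_inputs.append(effect(aux_bus))                   # the "return"
    return mix(*master_inputs)


clips = [({"Dialog"}, [1.0, 0.0, 0.0, 0.0]),
         ({"Dialog"}, [0.0, 1.0, 0.0, 0.0]),
         ({"Music"}, [0.0, 0.0, 0.5, 0.0])]
print(render_with_send(clips, "Dialog", lambda s: echo_wet(s, 2, 0.5)))
# [1.0, 1.0, 1.0, 0.5]
```

For a linear effect such as this echo, processing the summed bus once gives the same result as processing each tagged clip separately and summing, which is the computational saving described in the paragraph above.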
As shown in
In the examples described above, one effect is applied to one or more clips.
As shown in
In the example illustrated in
Referring back to
The process 100 then applies (at 125) the effect to each identified clip. Specifically, each particular effect is applied to the clip based on the properties of the particular effect. The media editing application of some embodiments applies one or more effects directly on each clip without using the “send”. One example of such a technique is applying effects as “insert” effects. Unlike a “send” effect, an “insert” effect simply processes the incoming audio signal and outputs a processed audio signal. With this technique, the audio signals of different clips are not routed over an auxiliary bus to an effect module to be processed as one combined audio signal. Also, the output of an effect module is not mixed back in with one or more original audio signals. For example, the output audio data of a filter or an effect that compresses or distorts input audio data does not need to be mixed back in with the original uncompressed or undistorted audio data. Similarly, the output of an equalizer that reduces the bass of a clip does not need to be mixed back in with the original clip, as doing so would defeat the purpose of reducing the bass in the first place. Many different audio effects or audio filters (e.g., equalizers, compressors, band-pass filters) are applied as “insert” effects, in some embodiments.
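By contrast with a send, an “insert” processes each tagged clip's own signal and replaces it, with nothing mixed back in. The sketch below assumes a toy clip model of `(tags, samples)` pairs and uses a hard-clip distortion (`hard_clip`, hypothetical) as the insert. Because this distortion is nonlinear, clipping each clip separately differs from clipping their sum, which is one reason per-clip processing and bus processing are not interchangeable for such effects.

```python
def hard_clip(signal, threshold):
    """Hypothetical insert effect: clamp every sample to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in signal]


def render_with_inserts(clips, tag, insert):
    """Apply `insert` to each clip tagged `tag` individually; the processed
    signal replaces the original (no dry/wet blend), then everything is summed."""
    processed = [insert(samples) if tag in tags else samples
                 for tags, samples in clips]
    out = [0.0] * max(len(s) for s in processed)
    for s in processed:
        for n, v in enumerate(s):
            out[n] += v
    return out


clips = [({"Music"}, [0.9, -0.9]),
         ({"Music"}, [0.9, 0.0]),
         ({"Dialog"}, [0.1, 0.1])]
result = render_with_inserts(clips, "Music", lambda s: hard_clip(s, 0.5))
print(result)  # approximately [1.1, -0.4]
```

Clipping the summed music signal instead would give 0.5 rather than 1.0 for the first sample, since the two music clips would be limited together.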
As shown in
Referring back to
Some embodiments perform variations on process 100. For instance, process 100 of some embodiments identifies each effect in an effect chain. Specifically, before identifying a next tag with an effect, process 100 applies each effect in the chain to a set of tagged clips. Also, some embodiments might take into account that a clip can be a compound clip (described below). In some such embodiments, process 100 identifies each outer metadata tag of the compound clip and each inner tag of the compound clip's nested clips. Process 100 then applies one or more effects to the compound clip and/or the inner clips according to this identification. Several examples of applying effects to compound clips are described below by reference to
In the examples described above, different effects are applied using different techniques. In some embodiments, the media editing application automatically determines whether to apply an effect by using an “insert” or by using the “send and return”. For instance, the media editing application of some embodiments automatically applies a first type of effect (e.g., reverb, echo) using the “send and return”, while applying a second type of effect (e.g., compressor, equalizer) as an “insert” effect. In conjunction with this automatic determination, or instead of it, the media editing application of some embodiments provides one or more user-selectable items for specifying whether to apply an effect as a “send” effect or an “insert” effect.
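One way to combine the automatic determination with a user override is a lookup keyed by effect type, as in this sketch. The `Routing` enum, the `DEFAULT_ROUTING` table, `choose_routing`, and the fallback to an insert for unlisted effects are all assumptions chosen for illustration; the table entries mirror the examples in the paragraph above (reverb and echo as sends, compressor and equalizer as inserts).

```python
from enum import Enum
from typing import Optional


class Routing(Enum):
    SEND = "send"      # route over an aux bus, process once, return to master
    INSERT = "insert"  # process each clip's signal in place


# Hypothetical defaults mirroring the examples in the text.
DEFAULT_ROUTING = {
    "reverb": Routing.SEND,
    "echo": Routing.SEND,
    "compressor": Routing.INSERT,
    "equalizer": Routing.INSERT,
}


def choose_routing(effect_name: str, user_choice: Optional[Routing] = None) -> Routing:
    """Honor an explicit user selection; otherwise pick the default for the
    effect type, falling back to INSERT for unlisted effects (an assumption)."""
    if user_choice is not None:
        return user_choice
    return DEFAULT_ROUTING.get(effect_name, Routing.INSERT)


print(choose_routing("reverb"))                  # Routing.SEND
print(choose_routing("reverb", Routing.INSERT))  # Routing.INSERT
```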
II. Applying Effects to Compound Clips
The media editing application of some embodiments allows users to create compound clips from multiple different clips. In some embodiments, a compound clip is any combination of clips (e.g., in a composite display area or in a clip browser as described below by reference to
As shown, the figure includes a composite display area 660 and a tag display area 665. The composite display area 660 provides a visual representation of the composite presentation (or project) being created with the media editing application. Specifically, it displays one or more geometric shapes that represent one or more media clips that are part of the composite presentation. In some embodiments, the tag display area 665 displays one or more pieces of metadata associated with different media clips.
The first stage 605 shows the tag display area 665 and the composite display area 660. The tag display area 665 includes a metadata tag 655 that is associated with an add effect control 660. The composite display area 660 displays representations of three clips 630-640 that are not tagged with the metadata tag 655. In this first stage, the user selects the clip 630 by selecting a corresponding representation in the composite display area 660.
The second stage 610 shows the creation of a compound clip from clips 630 and 635. Specifically, after selecting these two clips, the user selects a selectable option 640 (e.g., a context menu item) to create the compound clip 650, as illustrated in the third stage 615. In some embodiments, the media editing application provides several different controls (not shown) for creating the compound clip. Examples of such controls include (1) a text field for inputting a name for the compound clip, (2) a first set of controls for specifying video properties (e.g., automatically based on the properties of the first video clip, custom), and (3) a second set of controls for specifying audio properties (e.g., default settings, custom).
The third stage 615 illustrates tagging the compound clip 650 with the first metadata tag 655. Here, a tagging option 645 is used to tag the compound clip 650. However, different embodiments provide different ways of tagging a compound clip. The fourth stage 620 illustrates the selection of an add effect control 660. The selection causes an add effect window 665 with a list of effects to appear, as illustrated in the fifth stage 625. As shown in the fifth stage 625, the add effect window 665 displays several different effects from which the user can choose one to associate with the first metadata tag 655. The user then selects the reverb effect to associate it with the first metadata tag 655.
Once the effect is set, the media editing application applies the reverb effect to the compound clip 650 in order to produce a resulting composite presentation. For example, the media editing application of some embodiments applies the reverb effect to the compound clip 650 to play a real-time preview of the presentation. Alternatively, the media editing application renders or outputs the sequence in the composite display area 660 to storage for playback at another time.
As shown in
In the example illustrated in
In the previous example, a compound clip is tagged with a metadata tag that is associated with an effect. Also, the nested clips of the compound clip are not tagged with this metadata tag. Accordingly, the effect associated with the compound clip's tag is applied to the audio signal of the compound clip. In some cases, one or more inner clips of the compound clip are tagged with a metadata tag. In order to simplify the discussion below, a compound clip's tag will be referred to as an outer tag, while the tag of an inner clip of the compound clip will be referred to as an inner tag. Also, in several examples below, the outermost tag refers to the tag of a compound clip that is not contained by another compound clip.
As shown, the audio signal of clip 805 is routed to the reverb FX module 205. This is because clip 805 is tagged with the “Dialog” tag, which is associated with a reverb effect. In other words, even though clip 805 is a nested clip of the compound clip 820, the media editing application of some embodiments identifies each inner tag of the compound clip's nested clips to apply one or more effects. Here, the reverb FX module 205 applies the reverb effect to the received audio signal and returns an audio signal containing the reverb effect to the master 210. As indicated by the “+” symbol, the audio signals of clips 805 and 810 are combined for the compound clip 820. The audio signal of the compound clip 820, the audio signal containing the reverb effect for clip 805, and the audio signal of clip 815 are then mixed. The master 210 receives the mixed audio signal and outputs a resulting mixed audio signal.
In the example described above, the output of the reverb FX module 205 is sent to the master 210 instead of being mixed in as part of the compound clip 820. This is because the media editing application of some embodiments defines a separate auxiliary (“aux”) bus or virtual pathway for one or more effects associated with a metadata tag. In some embodiments, this aux bus always outputs to the master.
As shown in
In the example illustrated in
In some cases, a compound clip is tagged with the same tag as one or more of the compound clip's inner clips. In some embodiments, the media editing application identifies an appropriate level of a compound clip at which to apply the effect such that the effect is not reapplied at another level. For example, when the inner clip's tag is the same as the compound clip's outer tag, the media editing application of some embodiments identifies the compound clip's outer tag and performs the editing operations based on the compound clip's outer tag. This prevents the same effect from being applied to both the compound clip and one or more of the compound clip's nested clips.
As shown, process 1000 identifies (at 1005) a clip tagged with a particular metadata tag in a composite presentation. Process 1000 then determines (at 1010) whether the clip tagged with the particular metadata tag is a compound clip. In the example illustrated in
When the clip is not a compound clip, process 1000 proceeds to 1035, which is described below. Otherwise, process 1000 identifies (at 1015) the particular metadata tag of the compound clip and each inner tag of the compound clip's nested clips. Process 1000 then determines (at 1020) whether any inner tag of the compound clip's nested clips is different from the outer tag of the compound clip.
When no inner tag is different than the outer tag or no nested clip is tagged with a tag associated with an effect, process 1000 performs (at 1025) one or more operations based on the outer tag. In the example illustrated in
The reverb FX module 205 applies the reverb effect to the received audio signal and returns an audio signal containing the reverb effect to the master 210. The mixed audio signal of the compound clip 820, the audio signal containing the reverb effect from the reverb FX module 205, and the audio signal of clip 815 are then mixed. The master 210 then receives the mixed audio signal and outputs a resulting audio signal.
Referring back to
Process 1000 then determines (at 1035) whether there is any other tagged clip in the composite presentation. When there is another tagged clip, process 1000 returns to 1005 which was described above. Otherwise, process 1000 ends.
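Process 1000 can be approximated by a recursive walk over the clip hierarchy, sketched below under stated assumptions. The `Clip` class, `effect_targets`, and `resolve_presentation` are hypothetical. The sketch applies an outer tag's effect at the compound-clip level and skips any nested clip carrying the same tag, so the effect is not reapplied; a nested clip whose effect-bearing tag differs gets its own effect. Because the handling of differing inner tags is only partly described in this section, that branch is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Clip:
    """Hypothetical clip model: a compound clip is one that has nested clips."""
    name: str
    tag: Optional[str] = None
    nested: list[Clip] = field(default_factory=list)


def effect_targets(clip, effect_tags, applied_above=frozenset()):
    """Return (clip name, tag) pairs marking where each tag's effect is applied.

    effect_tags: tags that have at least one associated effect.
    applied_above: tags whose effect an enclosing compound clip already applies;
    nested clips carrying one of these tags are not processed again.
    """
    targets = []
    active = applied_above
    if clip.tag in effect_tags and clip.tag not in applied_above:
        targets.append((clip.name, clip.tag))
        active = applied_above | {clip.tag}
    for inner in clip.nested:
        targets.extend(effect_targets(inner, effect_tags, active))
    return targets


def resolve_presentation(clips, effect_tags):
    """Visit every top-level clip in the composite presentation."""
    targets = []
    for clip in clips:
        targets.extend(effect_targets(clip, effect_tags))
    return targets


# Only an inner clip is tagged: the effect is applied to that nested clip.
presentation = [Clip("compound", nested=[Clip("dialog_clip", "Dialog"), Clip("music_bed")]),
                Clip("other")]
print(resolve_presentation(presentation, {"Dialog"}))  # [('dialog_clip', 'Dialog')]

# Same tag outside and inside: the effect is applied once, at the outer level.
print(effect_targets(Clip("compound", "Dialog", [Clip("dialog_clip", "Dialog")]), {"Dialog"}))
# [('compound', 'Dialog')]
```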
Some embodiments perform variations on process 1000. For example, the specific operations of process 1000 may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments.
III. Audio Meters
In many of the examples described above, the audio signals of several clips are mixed and output as one combined audio signal for a composite presentation. In some cases, the mixed audio signal of a compound clip is again combined with an audio signal of another clip to output a composite presentation. In some embodiments, the media editing application displays the audio level of a set of one or more clips even though the set of clips has been mixed with other clips.
A. Displaying Audio Levels
In some embodiments, the media editing application provides audio meters and/or audio controls for metadata tags associated with different clips. An example of this is illustrated in
The first stage 1205 shows the composite display area 660 and the audio mixer 1220 prior to playing the composite presentation. As shown, clip 1225 is tagged with the “Dialog” tag. Compound clip 1235 includes several nested clips 1260 and 1265. The compound clip 1235 is tagged with the “SFX” tag. The nested clips 1260 and 1265, and clip 1230 are not tagged with the “Dialog” tag or the “SFX” tag.
The second stage 1210 shows the playback of the composite presentation represented in the composite display area 660 at a first instance in time. To output the composite presentation's mixed audio signal, the audio signals of the nested clips 1260 and 1265 have been mixed for the compound clip 1235. In addition, the audio signals of the clips 1225 and 1230 have been mixed with the audio signal of the compound clip 1235. In this second stage 1210, the audio meter 1245 displays the audio level of the clip 1225 even though the clips in the composite display area 660 have been mixed to play the composite presentation.
The third stage 1215 shows the playback of the composite presentation at a second instance in time. Similar to the previous stage, the audio meter 1255 displays the audio level of the compound clip 1235 even though the compound clip has been mixed with the clips 1225 and 1230.
B. Sending an Audio Signal Over a Meter Bus
In the example illustrated in
As shown
As the audio level of clip 1310 cannot be determined using the mixed audio signal, the clip's audio signal is sent over the meter bus 1305. This meter bus 1305 is not for playing sound but for metering. Specifically, in the example illustrated in
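The routing described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes each clip is a list of float samples in the range [-1.0, 1.0] and that the meter reports a peak level in dBFS. The names `Clip`, `mix`, `peak_dbfs`, and `meter_bus_level` are hypothetical. The point is that the tagged clips' signals are summed on a separate bus that is only measured, so the main output mix is left unchanged.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Clip:
    """Hypothetical clip: a tag (role) and a buffer of float samples."""
    name: str
    tag: str
    samples: list[float] = field(default_factory=list)


def mix(clips: list[Clip]) -> list[float]:
    """Sum the sample buffers of several clips into one combined signal."""
    length = max((len(c.samples) for c in clips), default=0)
    out = [0.0] * length
    for clip in clips:
        for i, s in enumerate(clip.samples):
            out[i] += s
    return out


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS; silence is reported as negative infinity."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def meter_bus_level(clips: list[Clip], tag: str) -> float:
    """Route the tagged clips' signals over a meter bus and measure it.

    The meter bus is never played; it exists only so its level can be read.
    """
    meter_bus = mix([c for c in clips if c.tag == tag])
    return peak_dbfs(meter_bus)


clips = [Clip("1225", "Dialog", [0.5, -0.25]), Clip("1230", "", [0.25, 0.25])]
main_output = mix(clips)                         # the signal that is played
dialog_level = meter_bus_level(clips, "Dialog")  # about -6.02 dBFS
```

The dialog level cannot be read from `main_output` (its peak is 0.75, dominated by the sum), which is why the separate bus is used.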
As shown in
In some embodiments, the media editing application takes into account other factors when displaying the audio level of clips that have been mixed with other clips. The media editing application of some embodiments scales (i.e., reduces or increases) the displayed audio level of one or more clips to account for processing that occurs later in the signal chain. For instance, in the example illustrated in
C. Estimating the Volume
In the previous example, a combined audio signal of several clips is sent over a meter bus to display the audio level of several clips. Alternatively, the media editing application of some embodiments estimates the audio level of the one or more clips. That is, instead of routing the audio signal over the meter bus, the media editing application numerically estimates the audio level by extracting metering information from the clips prior to mixing the clips.
Process 1500 then determines (at 1515) whether any other clip is tagged with a different tag. When no other clip is tagged with a different tag, process 1500 proceeds to 1520. Otherwise, process 1500 returns to 1505 which was described above.
At 1520, process 1500 determines the audio level of one or more clips based on the metering information. In the example described above in
In some embodiments, process 1500 estimates the audio level by adding the power contribution of each clip. One example of such addition is adding about 3 dB for every doubling of equal input sources. For example, if the audio signals of two clips have an identical volume of −10 dB, then the sum of the two signals is estimated to be about 3 dB higher. As such, the estimated sum of the two signals is about −7 dB. If there are four audio signals that have the identical volume, then the sum of these signals will be estimated to be about 6 dB higher, and so on. One example formula for adding sound pressure levels of multiple sound sources is shown below:
$$L_\Sigma = 10\log_{10}\left(10^{L_1/10} + 10^{L_2/10} + \cdots + 10^{L_n/10}\right)$$
Here, LΣ is the total level, and L1, L2, . . . , Ln are the sound pressure levels (SPL) of the separate sources in dB SPL. For n equal sources at level L, the formula reduces to LΣ = L + 10 log10(n), which is about 3 dB per doubling of equal sources because 10 log10(2) ≈ 3.01. One of ordinary skill in the art would realize that other formulas can be used to sum two or more audio signals differently in order to estimate the audio level.
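The power summation can be computed numerically as shown in this short sketch; the function name `sum_levels_db` is hypothetical. It reproduces the worked figures above: two sources at -10 dB sum to about -7 dB, and four sum to about -4 dB (6 dB above each source).

```python
import math


def sum_levels_db(levels_db: list[float]) -> float:
    """Estimate the combined level of several sources by adding their power.

    Implements L_total = 10 * log10(sum(10 ** (L_i / 10))).
    """
    total_power = sum(10 ** (level / 10) for level in levels_db)
    if total_power == 0:
        return float("-inf")  # no contributing sources
    return 10 * math.log10(total_power)


print(round(sum_levels_db([-10.0, -10.0]), 2))  # -6.99
print(round(sum_levels_db([-10.0] * 4), 2))     # -3.98
```

This rule assumes the sources are uncorrelated. Identical, in-phase signals add in amplitude rather than power and rise by about 6 dB per doubling, so the estimate is an approximation rather than an exact meter reading.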
Returning to
In some embodiments, process 1500 takes into account other factors when displaying the audio level of clips that have been previously mixed. For example, process 1500 of some embodiments scales (i.e., reduces or increases) the estimated audio level of one or more clips to account for processing that occurs later in the signal chain. In some embodiments, the process estimates the audio level of the mixed clips by identifying what each clip is contributing to the overall mix and numerically estimating the audio level based on the identification and the extracted metering information. For example, when a compound clip is muted, the media editing application should not display the audio level of the compound clip's nested clips, as the nested clips are also muted.
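A minimal sketch of this contribution-based estimate follows. It assumes a hypothetical `Node` tree in which leaves carry a level extracted before mixing, compound clips carry children, `gain_db` models processing applied later in the chain, and an untagged nested clip inherits its compound clip's tag. These are illustrative assumptions, not the application's data model.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical clip node; leaves hold metering info, compounds hold children."""
    tag: Optional[str] = None
    level_db: Optional[float] = None  # metering info extracted before mixing
    gain_db: float = 0.0              # processing applied to this node's output
    muted: bool = False
    children: list["Node"] = field(default_factory=list)


def contributions(node: Node, tag: str, offset_db: float = 0.0,
                  inherited_tag: Optional[str] = None) -> list[float]:
    """Effective levels (dB) that leaves with `tag` contribute to the final mix."""
    if node.muted:
        return []  # a muted compound clip also silences all of its nested clips
    offset_db += node.gain_db
    effective_tag = node.tag or inherited_tag  # assumption: untagged clips inherit
    if not node.children:
        if effective_tag == tag and node.level_db is not None:
            return [node.level_db + offset_db]
        return []
    levels: list[float] = []
    for child in node.children:
        levels += contributions(child, tag, offset_db, effective_tag)
    return levels


def estimate_level_db(root: Node, tag: str) -> float:
    """Power-sum the scaled contributions into one estimated level."""
    power = sum(10 ** (level / 10) for level in contributions(root, tag))
    return 10 * math.log10(power) if power > 0 else float("-inf")


timeline = Node(children=[
    Node(tag="Dialog", level_db=-12.0),
    Node(tag="SFX", gain_db=-6.0, children=[Node(level_db=-10.0),
                                            Node(level_db=-10.0)]),
])
sfx_level = estimate_level_db(timeline, "SFX")  # about -12.99 dB
```

Setting `muted=True` on the SFX compound clip makes the SFX estimate negative infinity, matching the rule that nested clips of a muted compound clip are not metered.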
In some cases, estimating the audio level has several advantages over routing audio signals over meter buses. For example, this technique can be less computationally expensive than using meter buses. This is because the meter buses do not have to be created and the audio signals of different clips do not have to be routed over these meter buses.
IV. Parameter Controls and Propagation
The media editing application of some embodiments uses metadata to provide user interface controls. In some embodiments, these controls are used to display properties of tagged clips and/or specify parameters that affect the tagged clips. Examples of such user interface controls include audio meters, volume controls, different controls for modifying images (e.g., distorting, blurring, changing color), etc.
Process 1600 then determines (at 1615) whether any other clip is tagged with a different tag. When no other clip is tagged with a different tag, process 1600 proceeds to 1620. Otherwise, process 1600 returns to 1605 which was described above. Process 1600 then receives (at 1620) an adjustment of one or more parameters through one or more corresponding user interface controls. Process 1600 then outputs (at 1625) the sequence of clips in the composite presentation by propagating the adjusted parameters to one or more of the corresponding tagged clips. Process 1600 then ends.
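Operations 1620 and 1625 can be sketched as a lookup from each clip's tag to the adjustment made on that tag's control. In this hypothetical sketch the parameter is a volume in dB, and the adjustment is applied at output time without overwriting each clip's stored value; that separation is one possible design, assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaggedClip:
    """Hypothetical clip with one tag and its own volume parameter (dB)."""
    name: str
    tag: str
    volume_db: float = 0.0


def propagate(clips: list[TaggedClip],
              control_values: dict[str, float]) -> dict[str, float]:
    """Apply each tag control's adjustment (dB) to every clip with that tag.

    Returns the effective output volume per clip name; stored clip
    parameters are not modified.
    """
    return {c.name: c.volume_db + control_values.get(c.tag, 0.0) for c in clips}


clips = [TaggedClip("A", "Dialog"), TaggedClip("B", "Music", -3.0),
         TaggedClip("C", "Dialog", -2.0)]
effective = propagate(clips, {"Dialog": -4.0})
# {"A": -4.0, "B": -3.0, "C": -6.0}
```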
Some embodiments allow a compound clip to be tagged with the same tag as one or more of the compound clip's inner clips. In some embodiments, the media editing application identifies an appropriate level in a render graph or signal chain at which to adjust parameters such that the parameters are not readjusted at another level. For example, when the inner clip's tag is the same as the compound clip's outer tag, the media editing application of some embodiments identifies the compound clip's outer tag and performs the adjustment based on the compound clip's outer tag. This prevents the same adjustment from being applied at multiple different levels.
In the example illustrated in
In some cases, the compound clip's outer tag can be different from one or more tags of its inner clips. When the compound clip's outer tag is different from the inner clip's tag, the media editing application of some embodiments adjusts one set of parameters associated with the inner clip based on the inner clip's tag. Also, the media editing application adjusts another set of parameters associated with the compound clip based on the compound clip's tag.
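The placement rules for compound clips can be sketched as follows. This is a hypothetical illustration that models each tag control as a gain in dB: an inner clip that shares the compound clip's tag is adjusted only at the compound level, an inner clip with a different tag is adjusted at its own level, and the compound clip's adjustment then applies to the whole mix. Passing `None` as the compound tag models an untagged compound clip, where adjustments happen only at the nested-clip level.

```python
from typing import Optional


def plan_adjustments(compound_tag: Optional[str], inner_tags: dict[str, str],
                     adjustments: dict[str, float]
                     ) -> list[tuple[str, Optional[str], float]]:
    """Decide where each tag adjustment (gain in dB) is applied.

    Returns (level, target, gain_db) tuples; level is "inner" or "compound".
    """
    plan: list[tuple[str, Optional[str], float]] = []
    for name, tag in inner_tags.items():
        # Same tag as the compound clip: adjust only at the outer level so the
        # adjustment is not applied twice.
        if tag != compound_tag and tag in adjustments:
            plan.append(("inner", name, adjustments[tag]))
    if compound_tag is not None and compound_tag in adjustments:
        plan.append(("compound", None, adjustments[compound_tag]))
    return plan


def effective_gains(compound_tag: Optional[str], inner_tags: dict[str, str],
                    adjustments: dict[str, float]) -> dict[str, float]:
    """Total gain (dB) heard from each inner clip after the compound mix."""
    plan = plan_adjustments(compound_tag, inner_tags, adjustments)
    compound_gain = sum(g for level, _, g in plan if level == "compound")
    gains = {name: compound_gain for name in inner_tags}
    for level, target, g in plan:
        if level == "inner":
            gains[target] += g
    return gains


gains = effective_gains("Dialog", {"a": "Dialog", "b": "Music"},
                        {"Dialog": -6.0, "Music": 2.0})
# {"a": -6.0, "b": -4.0}: clip "a" receives the Dialog adjustment only once
```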
In some embodiments, the media editing application does not support tagging compound clips. In some such embodiments, the adjustment is only made at the nested clip level. For example, when several nested clips of a compound clip are tagged with a “Dialog” tag, an adjustment to a control relating to the “Dialog” tag will adjust parameters associated with these nested clips and not the combined clip of the compound clip.
V. Outputting Content to Different Tracks
The media editing application of some embodiments allows a composite presentation to be output to different tracks (e.g., different files) based on metadata associated with media content. Outputting content to different tracks is particularly useful because one track can easily be replaced with another track. For example, when audio content is mixed, a movie studio cannot replace a dialog track in one language with another dialog track in another language. With audio content output to different tracks (e.g., audio files), the movie studio can easily replace one dialog track with another such that the dialog is in a different language.
A. Specifying Output Tracks
The first stage 1805 shows the tag display area 665 and the composite display area 660. The tag display area 665 includes a list of metadata tags. This list includes a first metadata tag 1850 specified as “Dialog”, a second metadata tag 1855 specified as “Music”, and a third metadata tag 1860 specified as “SFX”. The first metadata tag 1850 is associated with a first output control 1835, the second metadata tag 1855 with a second output control 1840, and the third metadata tag 1860 with a third output control 1845.
The composite display area 660 displays representations of five clips 1865-1885. The clips 1865 and 1885 are tagged with the first metadata tag 1850, the clip 1870 is tagged with the second metadata tag 1855, and the clips 1875 and 1880 are tagged with the third metadata tag 1860. To specify an output track for the clips 1865 and 1885 that are tagged with the first metadata tag 1850, the user selects the output control 1835. The selection causes a track control 1890 to appear as illustrated in the second stage 1810.
The second stage 1810 illustrates specifying an output track for the clips 1865 and 1885 tagged with the first metadata tag 1850. Specifically, the user specifies the output track to be “Track 1” by using the track control 1890. In some embodiments, the media editing application provides various different options for outputting content. Several example output options include compression type and settings, bit rate, bit size, mono or stereo, name of file, etc. For instance, when outputting audio containing dialog to a separate file, the media editing application of some embodiments displays different user interface items that allow the application's user to define the output audio clip, such as the type of audio file, compression settings, etc.
The third and fourth stages 1815 and 1820 illustrate specifying an output track for the clip 1870 that is tagged with the second metadata tag 1855. To specify the output track, the user selects the output control 1840 that is associated with the second metadata tag 1855. The selection causes the track control 1890 to appear, as illustrated in the fourth stage 1820. In the fourth stage 1820, the application's user specifies the output track to be “Track 2” by using the track control 1890.
The fifth and sixth stages 1825 and 1830 are similar to the previous stages. However, in these stages 1825 and 1830, an output track is specified for the clips 1875 and 1880 that are tagged with the third metadata tag 1860. To specify the output track, the user selects the output control 1845 that is associated with the third metadata tag 1860. The selection causes the track control 1890 to appear, as illustrated in the sixth stage 1830. In the sixth stage 1830, the user specifies the output track to be “Track 3” by using the track control 1890. Once the output tracks are specified for the metadata tags, the user can select an output or export option (not shown) to start the output of clips based on each clip's association with a particular metadata tag.
In the example described above, several output tracks are associated with metadata tags. In some embodiments, the media editing application allows a user to associate one or more metadata tags with a particular output track.
As shown in
B. Performing Multiple Passes
In some embodiments, the media editing application performs multiple passes on a render graph or signal chain to output a composite presentation to different tracks.
As shown in
Although the composite presentation cannot be unmixed during playback, the media editing application allows the composite presentation to be output to different audio files by performing multiple passes on a render graph or signal chain.
The first stage 20B05 illustrates a first pass that is performed to output the audio content of clip 2005 to “Track 1”. The audio signals of clips 2005 and 2010 are mixed for the compound clip 2020. However, in this first pass, the audio signal of clip 2010 is disabled (e.g., muted or silenced). As the audio clip 2010 is muted, the mixed audio signal includes only the audio signal of the clip 2005.
The second stage 20B10 illustrates a second pass that is performed to output the audio content of clip 2010 to “Track 2”. Similar to the first stage 20B05, the audio signals of clips 2005 and 2010 are mixed for the compound clip 2020. However, in this second pass, the audio signal of clip 2005 is disabled (e.g., muted). As the audio clip 2005 is disabled, the mixed audio signal includes only the audio signal of the clip 2010. In some embodiments, the output files have the same duration as the composite presentation. For example, if the duration of the composite presentation (e.g., represented in the composite display area) is one hour and each of the clips 2005 and 2010 includes thirty minutes of sound, then each output file will be one hour in duration with thirty minutes of sound.
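The two passes can be sketched as follows. This simplified, hypothetical model flattens the compound clip into a list of `(tag, start_sample, samples)` tuples; each pass mixes the whole sequence but skips (mutes) every clip whose tag is not routed to that pass's track. Every output buffer spans the full presentation, so regions where no routed clip plays are silent, matching the one-hour-file example.

```python
def render_pass(clips: list[tuple[str, int, list[float]]],
                keep_tags: set[str], duration: int) -> list[float]:
    """One pass: mix all clips, but mute those not routed to this track."""
    out = [0.0] * duration  # full presentation length, silence by default
    for tag, start, samples in clips:
        if tag not in keep_tags:
            continue  # disabled (muted) for this pass
        for i, s in enumerate(samples):
            if start + i < duration:
                out[start + i] += s
    return out


def multipass(clips: list[tuple[str, int, list[float]]],
              track_tags: dict[str, set[str]],
              duration: int) -> dict[str, list[float]]:
    """Run one pass per output track."""
    return {track: render_pass(clips, tags, duration)
            for track, tags in track_tags.items()}


clips = [("Dialog", 0, [0.1, 0.2]), ("Music", 2, [0.3, 0.4])]
tracks = multipass(clips, {"Track 1": {"Dialog"}, "Track 2": {"Music"}}, 4)
# tracks["Track 1"] == [0.1, 0.2, 0.0, 0.0]
# tracks["Track 2"] == [0.0, 0.0, 0.3, 0.4]
```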
In the example described above, multiple rendering passes are performed to output the audio content to different tracks. The media editing application of some embodiments performs these multiple passes simultaneously. In some such embodiments, the media editing application generates multiple copies of one or more render objects (e.g., render graphs, render files) for rendering the sequence of clips in the composite display area. The media editing application then performs the multiple passes such that these passes occur at least partially at the same time. By performing these passes simultaneously, the media editing application saves time because it does not need to wait for one pass to end before starting another. This also saves time because files (e.g., source clips) are read from disk or loaded into memory once instead of multiple times.
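The structure of simultaneous passes can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The sketch is hypothetical: sources are read once into a shared in-memory dictionary, each pass receives its own copy of the clip list (standing in for a copy of the render object), and the passes are submitted together. In CPython, threads running pure-Python loops do not execute in parallel because of the global interpreter lock, so a production renderer would do this work in native code or separate processes; the sketch only shows the load-once, copy-per-pass, run-concurrently shape.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def load_sources(paths: list[str],
                 reader: Callable[[str], list[float]]) -> dict[str, list[float]]:
    """Read each source file once; every pass shares the in-memory copies."""
    return {path: reader(path) for path in set(paths)}


def render_track(sources: dict[str, list[float]],
                 clip_list: list[tuple[str, str]],
                 keep_tags: set[str]) -> list[float]:
    """One pass over its own copy of the (tag, source_path) clip list."""
    length = max((len(sources[p]) for _, p in clip_list), default=0)
    out = [0.0] * length
    for tag, path in clip_list:
        if tag in keep_tags:
            for i, s in enumerate(sources[path]):
                out[i] += s
    return out


def render_all(clip_list: list[tuple[str, str]],
               track_tags: dict[str, set[str]],
               reader: Callable[[str], list[float]]) -> dict[str, list[float]]:
    """Load sources once, then run one pass per track concurrently."""
    sources = load_sources([p for _, p in clip_list], reader)
    with ThreadPoolExecutor() as pool:
        futures = {track: pool.submit(render_track, sources, list(clip_list), tags)
                   for track, tags in track_tags.items()}
        return {track: f.result() for track, f in futures.items()}
```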
The preceding section described and illustrated various ways to use metadata to facilitate output operations.
At 2115, process 2100 identifies each clip tagged with a tag (e.g., role) that is associated with the identified track. An example of associating one or more roles with a particular output track is described above by reference to
Process 2100 then adds (at 2120) each identified clip to a render list for that track. Process 2100 then determines (at 2125) whether there are any more tracks. When there is another track, process 2100 returns to 2110 that was described above. Otherwise, process 2100 renders the composite presentation based on one or more render lists. For example, process 2100 of some embodiments renders the composite presentation by identifying clips in a render list, combining any two or more clips in the list, and outputting the combined clip to a particular track. Process 2100 then ends.
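Operations 2110 through 2125 and the final render can be sketched as below. The helper names are hypothetical, clips are `(name, role)` pairs, and each track may be associated with one or more roles. The usage reproduces the assignment from the track-specification example, where Dialog, Music, and SFX clips are routed to Tracks 1, 2, and 3.

```python
def build_render_lists(clips: list[tuple[str, str]],
                       track_roles: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each output track (2110, 2125), collect the clips whose role is
    associated with that track (2115) into the track's render list (2120)."""
    return {track: [name for name, role in clips if role in roles]
            for track, roles in track_roles.items()}


def render(render_lists: dict[str, list[str]],
           samples: dict[str, list[float]]) -> dict[str, list[float]]:
    """Combine the clips in each render list and output one buffer per track."""
    out = {}
    for track, names in render_lists.items():
        length = max((len(samples[n]) for n in names), default=0)
        mixed = [0.0] * length
        for n in names:
            for i, s in enumerate(samples[n]):
                mixed[i] += s
        out[track] = mixed
    return out


clips = [("1865", "Dialog"), ("1870", "Music"), ("1875", "SFX"),
         ("1880", "SFX"), ("1885", "Dialog")]
lists = build_render_lists(clips, {"Track 1": {"Dialog"},
                                   "Track 2": {"Music"},
                                   "Track 3": {"SFX"}})
# {"Track 1": ["1865", "1885"], "Track 2": ["1870"], "Track 3": ["1875", "1880"]}
```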
Some embodiments perform variations on process 2100. For example, the operations of process 2100 might be performed by two or more separate processes. Also, the specific operations of the process may not be performed in the exact order shown and described.
VI. Software Architecture
A. Example Media Editing Application
Having described several example editing operations above, an example media editing application that implements several editing features will now be described.
The clip library 2205 includes a set of folder-like or bin-like representations through which a user accesses media clips that have been imported into the media editing application.
Some embodiments organize the media clips according to the device (e.g., physical storage device such as an internal or external hard drive, virtual storage device such as a hard drive partition, etc.) on which the media represented by the clips are stored. Some embodiments also enable the user to organize the media clips based on the date the media represented by the clips was created (e.g., recorded by a camera).
Within the clip library 2205, users can group the media clips into “events” or organized folders of media clips. For instance, a user might give the events descriptive names that indicate what kind of media is stored in the event (e.g., the “New Event 2-5-11” event shown in clip library 2205 might be renamed “European Vacation” as a descriptor of the content). In some embodiments, the media files corresponding to these clips are stored in a file storage structure that mirrors the folders shown in the clip library.
In some embodiments, the clip library 2205 enables users to perform various clip management actions. These clip management actions include moving clips between bins (e.g., events), creating new bins, merging two bins together, duplicating bins (which, in some embodiments, creates a duplicate copy of the media to which the clips in the bin correspond), deleting bins, etc.
As shown in
The clip browser 2210 allows the user to view clips from a selected folder or collection (e.g., an event, a sub-folder, etc.) of the clip library 2205. In the example illustrated in
By moving a position indicator (e.g., through a cursor, through the application's user touching a touch screen) over one of the thumbnails, the user can skim through the clip. For example, when the user places the position indicator at a particular horizontal location within the thumbnail filmstrip, the media editing application associates that horizontal location with a time in the associated media file, and displays the image from the media file for that time. In addition, the user can command the application to play back the media file in the thumbnail filmstrip. In some embodiments, the selection and movement is received through a user selection input such as input received from a cursor controller (e.g., a mouse, touchpad, trackpad, etc.), from a touchscreen (e.g., a user touching a user interface (UI) item on a touchscreen), from the keyboard, etc. In some embodiments, one example of such a user selection input is the position indicator that indicates the user's interaction (e.g., with the cursor, the touchscreen, etc.). The term user selection input is used throughout this specification to refer to at least one of the preceding ways of making a selection, moving a control, or pressing a button through a user interface.
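The association between a horizontal location in the filmstrip and a time in the media file can be sketched as a linear mapping. The function and parameter names are hypothetical, and the linear, clamped mapping is an assumption for illustration.

```python
def skim_time(x: float, thumb_left: float, thumb_width: float,
              media_duration: float) -> float:
    """Map a horizontal position over a filmstrip thumbnail to a media time (s)."""
    fraction = (x - thumb_left) / thumb_width
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the thumbnail's extent
    return fraction * media_duration


# A 60-second clip drawn 200 px wide starting at x = 100:
t = skim_time(150, 100, 200, 60.0)  # 15.0 seconds, one quarter of the way in
```

The application would then display the frame from the media file at the returned time.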
In the example illustrated in
In some embodiments, the media editing application displays content differently based on its association with one or more metadata tags (e.g., keywords). This allows users to quickly assess a large group of media clips and see which ones are associated or not associated with any metadata tags. For example, in
In some embodiments, the media editing application allows the user to tag a portion of a clip with a metadata tag. To associate a metadata tag with a portion of a clip, the user can select the portion of the clip (e.g., using a range selector on a clip's filmstrip representation in the clip browser 2210), and drag and drop the selected portion onto the metadata tag (e.g., 2202 or 2204). For example, a user can specify that an audio clip includes crowd noise starting at one point in time and ending at another point, and then tag that range as “crowd noise”. When a portion of a clip is associated with a metadata tag, the media editing application of some embodiments indicates this by marking a portion of the clip's representation in the clip browser 2210. For example, in some embodiments, a horizontal bar is displayed across only the portion of the clip's filmstrip representation that is associated with a particular metadata tag.
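A range tag and its on-screen marker can be sketched as follows. `RangeTag`, `tags_at`, and `bar_extent` are hypothetical names; times are in seconds from the start of the clip, and the bar is assumed to be drawn in proportion to the filmstrip's width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeTag:
    """Hypothetical tag applied to part of a clip, in seconds from clip start."""
    tag: str
    start: float
    end: float


def tags_at(range_tags: list[RangeTag], t: float) -> list[str]:
    """Tags whose range covers time t (start inclusive, end exclusive)."""
    return sorted({r.tag for r in range_tags if r.start <= t < r.end})


def bar_extent(r: RangeTag, clip_duration: float,
               filmstrip_width: float) -> tuple[float, float]:
    """Horizontal span (pixels) of the bar drawn over the tagged portion."""
    scale = filmstrip_width / clip_duration
    return r.start * scale, r.end * scale


crowd = RangeTag("crowd noise", 12.0, 20.0)
bar = bar_extent(crowd, clip_duration=40.0, filmstrip_width=400.0)  # (120.0, 200.0)
```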
The composite display area 2215 provides a visual representation of a composite presentation (or project) being created by the user of the media editing application. As mentioned above, the composite display area 2215 displays one or more geometric shapes that represent one or more media clips that are part of the composite presentation. In some embodiments, the composite display area 2215 spans a displayed timeline 2226 which displays time (e.g., the elapsed time of clips displayed on the composite display area). The composite display area 2215 of some embodiments includes a primary lane 2216 (also called a “spine”, “primary compositing lane”, or “central compositing lane”) as well as one or more secondary lanes (also called “anchor lanes”). The spine represents a primary sequence of media which, in some embodiments, does not have any gaps. The clips in the anchor lanes are anchored to a particular position along the spine (or along a different anchor lane). Anchor lanes (e.g., the anchor lane 2218) may be used for compositing (e.g., removing portions of one video and showing a different video in those portions), B-roll cuts (i.e., cutting away from the primary video to a different video whose clip is in the anchor lane), audio clips, or other composite presentation techniques.
The user can select different media clips from the clip browser 2210, and drag and drop them into the composite display area 2215 in order to add the clips to a composite presentation represented in the composite display area 2215. Alternatively, the user can select the different media clips and select a shortcut key, a tool bar button, or a menu item to add them to the composite display area 2215. Within the composite display area 2215, the user can perform further edits to the media clips (e.g., move the clips around, split the clips, trim the clips, apply effects to the clips, etc.). The length (i.e., horizontal expanse) of a clip in the composite display area is a function of the length of the media represented by the clip. As the timeline 2226 is broken into increments of time, a media clip occupies a particular length of time in the composite display area. As shown, in some embodiments, the clips within the composite display area are shown as a series of images or filmstrip representations. The number of images displayed for a clip varies depending on the length of the clip (e.g., in relation to the timeline 2226), as well as the size of the clips (as the aspect ratio of each image will stay constant). As with the clips in the clip browser, the user can skim through the composite presentation or play back the composite presentation. In some embodiments, the playback (or skimming) is not shown in the composite display area's clips, but rather in the preview display area 2220.
The preview display area 2220 (also referred to as a “viewer”) displays images from media files which the user is skimming through, playing back, or editing. These images may be from a composite presentation in the composite display area 2215 or from a media clip in the clip browser 2210. In the example of
The inspector display area 2225 displays detailed properties about a selected item and allows a user to modify some or all of these properties. The selected item might be a clip, a composite presentation, an effect, etc. As shown in
The toolbar 2235 includes various selectable items for editing, modifying items that are displayed in one or more display areas, etc. The illustrated toolbar 2235 includes items for video effects, visual transitions between media clips, photos, titles, generators and backgrounds, etc. The toolbar 2235 also includes selectable items for media management and editing. Selectable items are provided for adding clips from the clip browser 2210 to the composite display area 2215. In some embodiments, different selectable items may be used to add a clip to the end of the spine, add a clip at a selected point in the spine (e.g., at the location of a playhead), add an anchored clip at the selected point, perform various trim operations on the media clips in the composite display area, etc. The media management tools of some embodiments allow a user to mark selected clips as favorites, among other options.
The audio mixer 2255 provides different audio mixing tools that the application's user can use to define the output audio of the composite presentation represented in the composite display area 2215. The audio mixer 2255 includes several level controls (2260, 2270, and 2280) and several audio meters (2265, 2275, and 2285). The level control 2280 and the audio meters 2285 are related to the master, which represents the output audio. Specifically, the master's level control 2280 raises or lowers the combined output level of all the clips in the sequence in the composite display area at the same time. That is, the control 2280 affects output levels during playback, export to a file, etc. Hence, the level control 2280 adjusts the level of the output audio, and the meters 2285 display that audio level. In the example illustrated in
As shown in
The level control 2260 and the audio meter 2265 are related to the keyword 2202. The level control 2270 and the audio meter 2275 are related to the keyword 2204. In some embodiments, the audio meters 2265 and 2275 display the audio levels of the clips associated with the corresponding keywords. For example, when a clip tagged with “Dialog” is being output, the audio meter 2265 fluctuates to indicate the level of the clip's audio. Similarly, the audio level control 2260 controls the audio level of each clip that is tagged with the keyword 2202, and the audio level control 2270 controls the audio level of each clip tagged with the keyword 2204.
As shown in
In the example illustrated in
One of ordinary skill will also recognize that the arrangement of display areas shown in the GUI 2200 is one of many possible configurations for the GUI of some embodiments. For instance, in some embodiments, the presence or absence of many of the display areas can be toggled through the GUI (e.g., the inspector display area 2225, clip library 2205, etc.). In addition, some embodiments allow the user to modify the size of the various display areas within the GUI. For instance, when the mixer 2255 is removed, the composite display area 2215 can increase in size to include that area. Similarly, the preview display area 2220 increases in size when the inspector display area 2225 is removed.
B. Example Software Architecture
In some embodiments, the processes described above are implemented as software running on a particular machine, such as a computer or a handheld device, or stored in a machine readable medium.
The media editing application 2300 includes a user interface (UI) interaction and generation module 2305, a media ingest module 2310, editing modules 2315, effects modules 2340, output components 2308, a playback module 2325, a metadata association module 2335, and an effects association module 2330.
The figure also illustrates stored data associated with the media editing application: source files 2350, event data 2355, project data 2360, and other data 2365. In some embodiments, the source files 2350 store media files (e.g., video files, audio files, combined video and audio files, etc.) imported into the application. The source files 2350 of some embodiments also store transcoded versions of the imported files as well as analysis data (e.g., people detection data, shake detection data, color balance data, etc.). The event data 2355 stores the events information used by some embodiments to populate the thumbnails view (e.g., in a clip browser). The event data 2355 may be a set of clip object data structures stored as one or more SQLite database (or other format) files in some embodiments. The project data 2360 stores the project information used by some embodiments to specify a composite presentation in the composite display area 2345. The project data 2360 may also be a set of clip object data structures stored as one or more SQLite database (or other format) files in some embodiments.
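Storing clip objects and their tags in SQLite can be sketched with Python's standard `sqlite3` module. The two-table schema below is hypothetical and chosen only to illustrate how tag-based lookups might be served from such a store; it is not the application's actual file format.

```python
import sqlite3

# Hypothetical schema: one row per clip, one row per (clip, tag) association.
SCHEMA = """
CREATE TABLE clip (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    source_path TEXT NOT NULL
);
CREATE TABLE clip_tag (
    clip_id INTEGER NOT NULL REFERENCES clip(id),
    tag TEXT NOT NULL
);
"""


def make_library(clips: list[tuple[str, str, list[str]]]) -> sqlite3.Connection:
    """Create an in-memory store from (name, source_path, tags) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name, path, tags in clips:
        cur = conn.execute(
            "INSERT INTO clip (name, source_path) VALUES (?, ?)", (name, path))
        conn.executemany("INSERT INTO clip_tag (clip_id, tag) VALUES (?, ?)",
                         [(cur.lastrowid, tag) for tag in tags])
    conn.commit()
    return conn


def clips_with_tag(conn: sqlite3.Connection, tag: str) -> list[str]:
    """Names of all clips associated with a tag, in alphabetical order."""
    rows = conn.execute(
        "SELECT c.name FROM clip AS c JOIN clip_tag AS t ON t.clip_id = c.id "
        "WHERE t.tag = ? ORDER BY c.name", (tag,))
    return [name for (name,) in rows]
```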
In some embodiments, the four sets of data 2350-2365 are stored in a single physical storage (e.g., an internal hard drive, external hard drive, etc.). In some embodiments, the data may be split between multiple physical storages. For instance, the source files might be stored on an external hard drive with the event data, project data, and other data on an internal drive. Some embodiments store event data with their associated source files and render files in one set of folders, and the project data with associated render files in a separate set of folders.
The input device drivers 2375 may include drivers for translating signals from a keyboard, mouse, touchpad, tablet, touchscreen, etc. A user interacts with one or more of these input devices, each of which sends signals to its corresponding device driver. The device driver then translates the signals into user input data that is provided to the UI interaction and generation module 2305.
The present application describes a graphical user interface that provides users with numerous ways to perform different sets of operations and functionalities. In some embodiments, these operations and functionalities are performed based on different commands that are received from users through different input devices (e.g., keyboard, trackpad, touchpad, mouse, etc.). For example, the present application illustrates the use of a cursor in the graphical user interface to control (e.g., select, move) objects in the graphical user interface. However, in some embodiments, objects in the graphical user interface can also be controlled or manipulated through other controls, such as touch control. In some embodiments, touch control is implemented through an input device that can detect the presence and location of touch on a display of the input device. An example of a device with such functionality is a touch screen device (e.g., as incorporated into a smart phone, a tablet computer, etc.). In some embodiments, with touch control, a user directly manipulates objects by interacting with the graphical user interface that is displayed on the display of the touch screen device. For instance, a user can select a particular object in the graphical user interface by simply touching that particular object on the display of the touch screen device. As such, when touch control is utilized, a cursor may not even be provided for enabling selection of an object of a graphical user interface in some embodiments. However, when a cursor is provided in a graphical user interface, touch control can be used to control the cursor in some embodiments.
The display module 2380 translates the output of a user interface for a display device. That is, the display module 2380 receives signals (e.g., from the UI interaction and generation module 2305) describing what should be displayed and translates these signals into pixel information that is sent to the display device. The display device may be an LCD, plasma screen, CRT monitor, touchscreen, etc.
The media import module 2385 receives media files (e.g., audio files, video files, etc.) from storage devices (e.g., external drives, recording devices, etc.) through one or more ports (e.g., a USB port, Firewire port, etc.) of the device on which the application 2300 operates and translates this media data for the media editing application or stores the data directly onto a storage of the device.
The UI interaction and generation module 2305 of the media editing application 2300 interprets the user input data received from the input device drivers 2375 and passes it to various modules, including the editing modules 2315, the rendering engine 2320, the playback module 2325, the metadata association module 2335, and the effects association module 2330. The UI interaction and generation module 2305 also manages the display of the UI, and outputs this display information to the display module 2380. This UI display information may be based on information from the editing modules 2315, the playback module 2325, and the data 2350-2365. In some embodiments, the UI interaction and generation module 2305 generates a basic GUI and populates the GUI with information from the other modules and stored data.
As shown, the UI interaction and generation module 2305 of some embodiments provides a number of different UI elements. In some embodiments, these elements include a tag display area 2306, a composite display area 2345, an effects association tool 2304, an audio mixing tool 2318, and a preview display area 2312. All of these UI elements are described in detail above by reference to
The media ingest module 2310 manages the import of source media into the media editing application 2300. Some embodiments, as shown, receive source media from the media import module 2385 of the operating system 2370. The media ingest module 2310 receives instructions through the UI interaction and generation module 2305 as to which files should be imported, then instructs the media import module 2385 to enable this import (e.g., from an external drive, from a camera, etc.). The media ingest module 2310 stores these source files 2350 in specific file folders associated with the application. In some embodiments, the media ingest module 2310 also manages the creation of event data structures upon import of source files and the creation of the clip and asset data structures contained in the events. In some embodiments, the media ingest module 2310 tags the imported media clip with one or more metadata tags. For example, when a media clip is imported from a music library, the media ingest module 2310 might tag the clip with a “Music” tag. Alternatively, when the media clip is imported from a folder named “Dialog”, the media ingest module 2310 might tag the clip with a “Dialog” tag.
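The folder-based tagging heuristic can be sketched as follows. The mapping `FOLDER_TAGS` and the function `tags_for_import` are hypothetical; the sketch treats a "Music" folder in the path as a sign of a music library and matches folder names case-insensitively, ignoring the file name itself. A real ingest module could rely on other metadata instead.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from (lowercased) folder names to metadata tags.
FOLDER_TAGS = {"dialog": "Dialog", "music": "Music", "sfx": "SFX"}


def tags_for_import(path: str) -> list[str]:
    """Suggest tags for an imported file from the folders that contain it."""
    folders = [part.lower() for part in PurePosixPath(path).parts[:-1]]
    return sorted({FOLDER_TAGS[f] for f in folders if f in FOLDER_TAGS})


tags_for_import("/Volumes/Ext/Dialog/scene1.wav")  # ["Dialog"]
tags_for_import("/Users/me/Music/song.m4a")        # ["Music"]
```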
The editing modules 2315 include a variety of modules for editing media in the clip browser as well as in the composite display area. The editing modules 2315 handle the creation of projects, addition and subtraction of clips from projects, trimming or other editing processes within the composite display area, or other editing processes. In some embodiments, the editing modules 2315 create and modify project and clip data structures in both the event data 2355 and the project data 2360.
The effects association module 2330 of some embodiments associates an effect with a metadata tag. In some embodiments, the effects association module 2330 defines an effect chain with one or more effects for the metadata tag. The effect modules 2340 represent the various different effects, filters, transitions, etc. As mentioned above, there are many different effects or filters that can be associated with metadata to facilitate editing operations. Although this list is non-exhaustive, several example audio effects include different equalizers for modifying the signal strength of a clip within specified frequency ranges, a compressor/limiter for reducing the clip's dynamic range by attenuating parts of the audio signal above a particular threshold, an echo effect for creating an echo sound, and a reverb effect for creating a reverberation effect that emulates a particular acoustic environment. Several example video or image effects include color filters that operate on color values, different filters that sharpen, stylize, distort, or blur an image, and fade-in/fade-out effects for creating transitions between scenes. Several of these effect modules are associated with one or more settings or properties that the application's user can specify to edit media content.
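As a rough illustration of an effect chain defined for a metadata tag, the following Python sketch maps each tag to an ordered list of audio effects and runs a clip's samples through that list. The `gain` and `compressor` functions, the `EFFECT_CHAINS` table, and its settings are hypothetical. The compressor is deliberately simplified to a memoryless per-sample curve (no level detector, attack, or release), which only approximates how a practical compressor/limiter attenuates the part of the signal above its threshold.

```python
import math
from typing import Callable, Dict, List

Samples = List[float]
Effect = Callable[[Samples], Samples]

def gain(db: float) -> Effect:
    """Scale every sample by a gain given in decibels."""
    factor = 10 ** (db / 20)
    def apply(samples: Samples) -> Samples:
        return [s * factor for s in samples]
    return apply

def compressor(threshold: float, ratio: float) -> Effect:
    """Simplified static compressor: the part of |s| above threshold is divided by ratio."""
    def apply(samples: Samples) -> Samples:
        out = []
        for s in samples:
            mag = abs(s)
            if mag > threshold:
                mag = threshold + (mag - threshold) / ratio
            out.append(math.copysign(mag, s))  # restore the original sign
        return out
    return apply

# Hypothetical effect chains keyed by metadata tag; effects run in list order.
EFFECT_CHAINS: Dict[str, List[Effect]] = {
    "Dialog": [compressor(threshold=0.5, ratio=4.0), gain(3.0)],
    "Music": [gain(-6.0)],
}

def apply_chain(tag: str, samples: Samples) -> Samples:
    """Run a clip's samples through the effect chain defined for its tag."""
    for effect in EFFECT_CHAINS.get(tag, []):
        samples = effect(samples)
    return samples

print(compressor(0.5, 4.0)([1.0, -1.0, 0.2]))  # [0.625, -0.625, 0.2]
```

A clip whose tag has no chain passes through unchanged, so untagged or unconfigured content is not affected.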
In some embodiments, the output components 2308 generate the output composite presentation based on one or more clips in the composite display area 2345. As shown, the output components 2308 include a rendering engine 2320 and a mixer 2314. However, depending on the type of output, the media editing application of some embodiments includes other components (e.g., encoders, decoders, etc.). The rendering engine 2320 handles the rendering of images for the media editing application. In some embodiments, the rendering engine 2320 manages the creation of images for the media editing application. When an image is requested by a destination within the application (e.g., the playback module 2325), the rendering engine 2320 outputs the requested image according to the project or event data. The rendering engine 2320 retrieves the project data or event data that identifies how to create the requested image and generates a render graph that is a series of nodes indicating either images to retrieve from the source files or operations to perform on the source files. In some embodiments, the rendering engine 2320 schedules the retrieval of the necessary images through disk read operations and the decoding of those images.
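The render graph can be pictured as a tree whose leaves retrieve images from source files and whose interior nodes perform operations on the results. The sketch below assumes a toy image representation (a flat list of pixel values) and hypothetical `SourceNode`, `OpNode`, `render`, and `blend` names; a real rendering engine would also schedule disk reads and decoding, for which the `read_source` callback merely stands in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Image = List[float]  # toy image: a flat list of pixel values

@dataclass
class SourceNode:
    """Leaf node: an image to retrieve (read and decode) from a source file."""
    file_id: str

@dataclass
class OpNode:
    """Interior node: an operation applied to the images produced by its inputs."""
    op: Callable[..., Image]
    inputs: List["Node"] = field(default_factory=list)

Node = Union[SourceNode, OpNode]

def render(node: Node, read_source: Callable[[str], Image]) -> Image:
    """Evaluate the render graph depth-first and return the output image."""
    if isinstance(node, SourceNode):
        return read_source(node.file_id)
    return node.op(*(render(child, read_source) for child in node.inputs))

def blend(a: Image, b: Image) -> Image:
    """50/50 blend of two equally sized images."""
    return [(x + y) / 2 for x, y in zip(a, b)]

sources: Dict[str, Image] = {"clip1.mov": [0.0, 1.0], "clip2.mov": [1.0, 1.0]}
graph = OpNode(blend, [SourceNode("clip1.mov"), SourceNode("clip2.mov")])
print(render(graph, sources.__getitem__))  # [0.5, 1.0]
```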
In some embodiments, the rendering engine 2320 performs various operations to generate an output image. In some embodiments, these operations include blend operations, effects (e.g., blur or other pixel value modification operations), color space conversions, resolution transforms, etc. In some embodiments, one or more of these processing operations are actually part of the operating system and are performed by a GPU or CPU of the device on which the application 2300 operates. The output of the rendering engine (a rendered image) may be stored as render files in storage 2365 or sent to a destination for additional processing or output (e.g., playback).
In some embodiments, the mixer 2314 receives several audio signals of different clips and outputs a mixed audio signal. The mixer 2314 of some embodiments is utilized in a number of different instances during the non-linear editing process. For example, the mixer may be utilized in generating a composite presentation from multiple different clips. The mixer can also act as the master to output a mixed audio signal, as described in many of the examples above. In some embodiments, the media editing application includes different types of mixers for mixing audio. For example, the media editing application can include a first mixer for mixing one type of audio file and a second mixer for mixing another type of audio file.
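A minimal sketch of a mixer and a level meter follows, assuming mono floating-point signals with full scale at 1.0 and a simple peak meter in dBFS; the document does not specify the meter's scale or ballistics. `mix` and `peak_dbfs` are hypothetical names. Metering one input (here, dialog) before it is summed illustrates how the level of a particular set of clips can still be reported even though the mixed output itself cannot be un-mixed.

```python
import math
from typing import List, Sequence

def mix(signals: Sequence[Sequence[float]]) -> List[float]:
    """Sum equal-length mono signals sample by sample (unity gain on every input)."""
    if not signals:
        return []
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [sum(frame) for frame in zip(*signals)]

def peak_dbfs(signal: Sequence[float]) -> float:
    """Peak level in dB relative to full scale (1.0); -inf for digital silence."""
    peak = max((abs(s) for s in signal), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

dialog = [0.25, -0.25, 0.0]
music = [0.25, 0.25, 0.1]
mixed = mix([dialog, music])         # [0.5, 0.0, 0.1]
print(round(peak_dbfs(mixed), 2))    # -6.02
print(round(peak_dbfs(dialog), 2))   # -12.04
```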
The playback module 2325 handles the playback of images (e.g., in a preview display area 2312 of the user interface). Some embodiments do not include a playback module and the rendering engine directly outputs its images for integration into the GUI, or directly to the display module 2380 for display at a particular portion of the display device.
In some embodiments, the metadata association module 2335 associates clips with metadata tags. Different embodiments provide different ways of associating media clips with metadata tags. In some embodiments, the metadata tags indicate pre-defined categories (e.g., dialog, music) that an editor can select to categorize different clips. Instead of, or in conjunction with, these categories, some embodiments allow the editor to specify one or more keywords to associate with the media clips. For instance, in some such embodiments, the media editing application provides a keyword association tool that displays different keywords for tagging the media content. To tag a clip, the application's user drags and drops the clip onto a particular keyword in the keyword association tool. The same technique is used in some embodiments to associate multiple clips by simultaneously dragging and dropping the clips onto the particular keyword.
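The keyword association described above amounts to maintaining a many-to-many relation between keywords and clips. The Python sketch below, built around a hypothetical `KeywordIndex` class, models dropping one or several clips onto a keyword; it is an assumption about how such a relation might be stored, not the application's data model.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set

class KeywordIndex:
    """Hypothetical two-way index between keywords and clip IDs."""

    def __init__(self) -> None:
        self._clips_by_keyword: DefaultDict[str, Set[str]] = defaultdict(set)
        self._keywords_by_clip: DefaultDict[str, Set[str]] = defaultdict(set)

    def drop_clips_on_keyword(self, keyword: str, clip_ids: Iterable[str]) -> None:
        """Model dropping one or more selected clips onto a keyword."""
        for clip_id in clip_ids:
            self._clips_by_keyword[keyword].add(clip_id)
            self._keywords_by_clip[clip_id].add(keyword)

    def clips_with(self, keyword: str) -> Set[str]:
        """Return a copy of the set of clips associated with a keyword."""
        return set(self._clips_by_keyword.get(keyword, set()))

    def keywords_of(self, clip_id: str) -> Set[str]:
        """Return a copy of the set of keywords associated with a clip."""
        return set(self._keywords_by_clip.get(clip_id, set()))

index = KeywordIndex()
index.drop_clips_on_keyword("Dialog", ["clip-1", "clip-2"])  # multi-clip drag
index.drop_clips_on_keyword("Music", ["clip-3"])
print(sorted(index.clips_with("Dialog")))  # ['clip-1', 'clip-2']
```

Keeping both directions of the relation makes it cheap to answer either "which clips carry this keyword" (for per-tag controls) or "which keywords does this clip carry" (for display).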
In addition, some embodiments automatically associate one or more metadata tags with a media clip. In some such embodiments, this automatic association is based on the source of the media clip (e.g., the library or camera from which the clip was imported). For example, the media editing application might tag one set of clips from a music library as “Music” and tag another set of clips from a sound effects library as “SFX”. Alternatively, the automatic association can be based on an analysis of the media content (e.g., color balance analysis, image stabilization analysis, audio channel analysis, people analysis, etc.). As mentioned above, in some embodiments, the media ingest module 2310 can also perform at least some of the metadata association tasks when importing media content into the media editing application 2300. In some embodiments, the media editing application includes one or more analysis modules for analyzing the number of people (e.g., one person, two persons, group, etc.) in a clip and/or a type of shot (e.g., a close-up, medium, or wide shot). Other types of analysis modules can include image stabilization analysis modules (e.g., for camera movements), color balance analysis modules, audio analysis modules (e.g., for mono, stereo, silent channels), metadata analysis modules, etc. In some embodiments, metadata tags represent metadata that are embedded in media content. For example, some video cameras embed frame rate, creation date, and encoding information into the video clips that they capture. In addition, some devices embed other metadata such as location data, audio channel count, sample rate, file type, camera type, exposure information, etc.
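One of the analyses mentioned above, audio channel analysis, can be illustrated with a simple heuristic for a two-channel clip. The function name `classify_channels` and both thresholds are assumptions; a production analysis would more likely work on blocks of samples and use correlation or level statistics rather than sample-by-sample comparisons.

```python
from typing import Sequence

def classify_channels(left: Sequence[float], right: Sequence[float],
                      silence_threshold: float = 1e-4,
                      match_tolerance: float = 1e-6) -> str:
    """Label a two-channel clip as 'silent', 'mono', or 'stereo'."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    left_silent = all(abs(s) <= silence_threshold for s in left)
    right_silent = all(abs(s) <= silence_threshold for s in right)
    if left_silent and right_silent:
        return "silent"
    if left_silent or right_silent:
        # Only one channel carries signal; treat the clip as mono content.
        return "mono"
    if all(abs(a - b) <= match_tolerance for a, b in zip(left, right)):
        # Both channels carry the same signal: mono content on two channels.
        return "mono"
    return "stereo"

print(classify_channels([0.5, -0.2], [0.5, -0.2]))  # mono
print(classify_channels([0.5, 0.1], [0.1, 0.5]))    # stereo
```

The resulting label could then be attached to the clip as a metadata tag in the same way as a source-based tag.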
While many of the features of the media editing application 2300 have been described as being performed by one module (e.g., the UI interaction and generation module 2305, the media ingest module 2310, etc.), one of ordinary skill in the art will recognize that the functions described herein might be split up into multiple modules. Similarly, functions described as being performed by multiple different modules might be performed by a single module in some embodiments (e.g., the playback module 2325 might be part of the UI interaction and generation module 2305).
C. Example Data Structure
As shown in
The primary collection 2440 includes the collection ID and the array of clips. The collection ID identifies the primary collection. The array references several clips (i.e., clip 1 to clip N). These represent clips or collections that have been added to the composite display area. In some embodiments, the array is ordered based on the locations of media clips in the composite display area and only references clips in the primary lane of the primary collection. An example of one or more clips in the primary lane of the composite display area is described above by reference to
The clip object 2405 or collection object, in some embodiments, is an ordered array of clip objects. The clip object 2405 references one or more component clips (e.g., the component object 2410) in the array. In addition, the clip object 2405 stores a clip ID that is a unique identifier for the clip object. In some embodiments, the clip object 2405 is a collection object that can reference component clip objects as well as additional collection objects. An example of such a collection object is a compound clip that references multiple different clips. In some embodiments, the clip object 2405 or collection object only references the video component clip in the array, and any additional components (generally one or more audio components) are then anchored to that video component.
As shown in
The component object 2410 includes a component ID, an asset reference, and anchored components. The component ID identifies the component. The asset reference of some embodiments uniquely identifies a particular asset object. In some embodiments, the asset reference is not a direct reference to the asset but rather is used to locate the asset when needed. For example, when the media editing application needs to identify a particular asset, the application uses an event ID to locate an event object (not shown) that contains the asset, and then the asset ID to locate the particular desired asset. Several examples of clips associated with an event or an event folder are described above by reference to
In some embodiments, the clip object 2405 only stores the video component clip in its array, and any additional components (generally one or more audio components) are then anchored to that video component. This is illustrated in
As shown, the asset object 2445 includes an asset ID, a reference to a source file, and a set of source file metadata. The asset ID identifies the asset, while the source file reference is a pointer to the original media file. The set of source file metadata is different for different media clips. Examples of source file metadata include the file type (e.g., audio, video, movie, still image, etc.), the file format (e.g., “.mov”, “.avi”, etc.), different video properties, audio properties, etc.
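The relationships among the objects described in this section can be sketched with Python dataclasses. The class and field names below paraphrase the description (collection ID, clip ID, component ID, asset reference, anchored components, asset ID, source file reference, source file metadata) but are otherwise assumptions, as is modeling the asset reference as an (event ID, asset ID) pair that is resolved in two steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Asset:
    asset_id: str
    source_file: str                                        # reference to the original media file
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. file type, file format

@dataclass
class Event:
    event_id: str
    assets: Dict[str, Asset] = field(default_factory=dict)

@dataclass
class AssetRef:
    event_id: str  # first used to locate the event that contains the asset
    asset_id: str  # then used to locate the asset inside that event

@dataclass
class Component:
    component_id: str
    asset_ref: AssetRef
    anchored: List["Component"] = field(default_factory=list)  # e.g. audio anchored to video

@dataclass
class Clip:
    clip_id: str
    components: List[Component] = field(default_factory=list)  # only the video component

@dataclass
class PrimaryCollection:
    collection_id: str
    clips: List[Clip] = field(default_factory=list)  # kept in timeline order

def resolve_asset(ref: AssetRef, events: Dict[str, Event]) -> Asset:
    """Locate an asset indirectly: event ID first, then asset ID."""
    return events[ref.event_id].assets[ref.asset_id]

events = {
    "event-1": Event("event-1", {
        "asset-1": Asset("asset-1", "/media/interview.mov", {"format": ".mov"}),
    }),
}
audio = Component("component-audio", AssetRef("event-1", "asset-1"))
video = Component("component-video", AssetRef("event-1", "asset-1"), anchored=[audio])
timeline = PrimaryCollection("primary", [Clip("clip-1", [video])])

first_audio = timeline.clips[0].components[0].anchored[0]
print(resolve_asset(first_audio.asset_ref, events).source_file)  # /media/interview.mov
```

Storing an indirect reference rather than the asset itself lets several components (here, the video and its anchored audio) share one asset without duplicating it.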
In the example illustrated in
As shown in
One of ordinary skill will also recognize that the data structures shown in
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random access memory (RAM) chips, hard drives, erasable programmable read only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.
In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage, which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the invention. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.
The bus 2505 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 2500. For instance, the bus 2505 communicatively connects the processing unit(s) 2510 with the read-only memory 2530, the GPU 2515, the system memory 2520, and the permanent storage device 2535.
From these various memory units, the processing unit(s) 2510 retrieves instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 2515. The GPU 2515 can offload various computations or complement the image processing provided by the processing unit(s) 2510.
The read-only-memory (ROM) 2530 stores static data and instructions that are needed by the processing unit(s) 2510 and other modules of the electronic system. The permanent storage device 2535, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 2500 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 2535.
Other embodiments use a removable storage device (such as a floppy disk, flash memory device, its corresponding disk drive, etc.) as the permanent storage device. Like the permanent storage device 2535, the system memory 2520 is a read-and-write memory device. However, unlike storage device 2535, the system memory 2520 is a volatile read-and-write memory, such as a random access memory. The system memory 2520 stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 2520, the permanent storage device 2535, and/or the read-only memory 2530. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 2510 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.
The bus 2505 also connects to the input and output devices 2540 and 2545. The input devices 2540 enable the user to communicate information and select commands to the electronic system. The input devices 2540 include alphanumeric keyboards and pointing devices (also called “cursor control devices”), cameras (e.g., webcams), microphones or similar devices for receiving voice commands, etc. The output devices 2545 display images generated by the electronic system or otherwise output data. The output devices 2545 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD), as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.
Finally, as shown in
Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
As used in this specification and any claims of this application, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” and “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures (including
In addition, many of the user interface controls described above relate to controlling audio. However, one of ordinary skill in the art would recognize that similar controls can be provided for image effects or filters. For example, one or more user interface controls (e.g., sliders, knobs, buttons) can be provided for each metadata tag to control the effect settings (e.g., brightness, sharpness, amount of distortion, fade-in effect, fade-out effect, etc.).
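A per-tag control of the kind described above can be sketched as a function that writes one parameter value into every clip carrying a given tag. `TaggedClip`, `adjust_tag_control`, the tag names, and the `brightness` parameter are hypothetical, and the sketch sets an absolute value rather than applying a relative offset, which is only one of several plausible designs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class TaggedClip:
    name: str
    tags: Set[str]
    params: Dict[str, float] = field(default_factory=dict)

def adjust_tag_control(clips: List[TaggedClip], tag: str,
                       param: str, value: float) -> List[str]:
    """Apply one per-tag control setting to every clip that carries `tag`.

    Returns the names of the clips that were modified.
    """
    changed = []
    for clip in clips:
        if tag in clip.tags:
            clip.params[param] = value
            changed.append(clip.name)
    return changed

clips = [
    TaggedClip("interview_a", {"Interview"}),
    TaggedClip("broll_1", {"B-roll"}),
    TaggedClip("interview_b", {"Interview", "Night"}),
]
print(adjust_tag_control(clips, "Interview", "brightness", 0.2))
# ['interview_a', 'interview_b']
```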
In many of the examples described herein, a media editing application uses metadata to facilitate editing operations. However, one of ordinary skill in the art would recognize that the metadata features can be provided for different types of applications or programs (e.g., an image organizing application, a server-side web application, an operating system framework). For instance, the metadata features can be provided in an image application that allows the application's user to associate different items with keywords, and apply one or more effects to those items based on the association of the keywords, and/or output those items to different tracks (e.g., files, channels) based on the association of the keywords. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
Claims
1. A non-transitory machine readable medium storing a program that when executed by at least one processing unit outputs audio content for a composite presentation defined by a plurality of media clips, the program comprising sets of instructions for:
- identifying a sequence of clips that define the composite presentation;
- extracting audio data from a set of clips in the sequence of clips;
- estimating audio level of the set of clips based on the audio data; and
- indicating the estimated audio level of the set of clips when playing a mix of the sequence of clips.
2. The non-transitory machine readable medium of claim 1, wherein the set of instructions for indicating the estimated audio level comprises a set of instructions for displaying the audio level in one or more meters.
3. The non-transitory machine readable medium of claim 1, wherein the set of instructions for extracting audio data comprises a set of instructions for extracting audio signal data from each clip in the set of clips, wherein the set of instructions for estimating the audio level comprises a set of instructions for summing the audio signal data of the set of clips.
4. The non-transitory machine readable medium of claim 3, wherein the audio signal data are summed by adding a power contribution of each clip in the set of clips.
5. The non-transitory machine readable medium of claim 1, wherein the set of instructions for estimating the audio level comprises a set of instructions for identifying a contribution of each clip in the set of clips to the mix of the sequence and estimating the audio level based on the identification.
6. The non-transitory machine readable medium of claim 5, wherein the sequence of clips is associated with a series of operations, wherein the set of instructions for estimating the audio level comprises a set of instructions for processing down the series of operations to identify the contribution of each clip to the mix of the sequence of clips.
7. The non-transitory machine readable medium of claim 5, wherein the set of instructions for estimating the audio level comprises a set of instructions for scaling the audio level based on the identification.
8. The non-transitory machine readable medium of claim 1, wherein the program further comprises a set of instructions for identifying a tag associated with the set of clips, wherein the audio data are extracted from the set of clips based on the identification of the tag.
9. The non-transitory machine readable medium of claim 1, wherein the set of clips includes a compound clip that is defined by two or more clips, wherein the program further comprises a set of instructions for identifying each tag of the compound clip and the compound clip's inner clips, and determining, based on the identification, whether to indicate the audio level of the compound clip or one or more of the compound clip's inner clips.
10. A method of outputting audio content for a composite presentation defined by a plurality of media clips, the method comprising:
- identifying a sequence of clips that define the composite presentation;
- determining audio level of a set of clips by sending the set of clips' audio data over a meter bus; and
- indicating the estimated audio level of the set of clips when playing a mix of the sequence of clips.
11. The method of claim 10, wherein indicating the estimated audio level comprises displaying the audio level in one or more meters.
12. The method of claim 10, wherein the audio data comprises audio signal data from each clip in the set of clips, wherein determining the audio level comprises summing the audio signal data of the set of clips.
13. The method of claim 10, wherein the sequence of clips is associated with a series of operations, wherein determining the audio level comprises processing down the series of operations to identify the contribution of each clip in the set of clips to the mix of the sequence of clips.
14. The method of claim 13, wherein determining the audio level comprises scaling the audio level based on the identification.
15. The method of claim 10 further comprising identifying a tag associated with the set of clips, wherein the set of clips' audio data is sent over the meter bus based on the identification of the tag.
16. The method of claim 10, wherein the set of clips includes a compound clip that is defined by two or more clips, the method further comprising identifying each tag of the compound clip and the compound clip's inner clips, and determining, based on the identification, whether to indicate the audio level of the compound clip or one or more of the compound clip's inner clips.
17. A non-transitory machine readable medium storing a program that when executed by at least one processing unit outputs a composite presentation defined by a plurality of media clips, the program comprising sets of instructions for:
- displaying the plurality of media clips for defining the composite presentation, wherein at least some of the plurality of media clips are tagged with different tags;
- providing a set of controls for each particular tag that is associated with one or more media clips; and
- modifying, in response to an adjustment of the set of controls, a set of parameters associated with each media clip tagged with the particular tag.
18. The non-transitory machine readable medium of claim 17, wherein the set of controls includes audio controls and the set of parameters includes audio level, wherein the set of instructions for modifying comprises a set of instructions for modifying the audio level of the one or more clips tagged with the particular tag.
19. The non-transitory machine readable medium of claim 17, wherein the program further comprises a set of instructions for outputting audio content for the composite presentation based on the modification.
20. The non-transitory machine readable medium of claim 17, wherein the plurality of media clips comprises a compound clip that includes multiple inner clips.
21. The non-transitory machine readable medium of claim 20, wherein the set of instructions for modifying the set of parameters comprises a set of instructions for identifying each tag of the compound clip and the compound clip's inner clips, and determining, based on the identification, whether to adjust a set of parameters associated with the compound clip or one or more of the compound clip's inner clips.
22. The non-transitory machine readable medium of claim 17, wherein the set of parameters relates to an effect or filter associated with one or more of the tagged clips.
Type: Application
Filed: Sep 30, 2011
Publication Date: Mar 21, 2013
Inventors: Aaron M. Eppolito (Santa Cruz, CA), Brian Meaney (Livermore, CA), Colleen Pendergast (Livermore, CA), Michaelle Stikich (El Cerrito, CA)
Application Number: 13/250,857
International Classification: G06F 3/01 (20060101); H04R 29/00 (20060101);