AUTOMATIC HIGHLIGHT REEL PRODUCER

Described herein are techniques related to automatic selection of a subset of digital-video clips (i.e., “highlight reel”) from a set of digital-video clips. The automatic selection is based, at least in part, upon various weighted criteria regarding properties (e.g., metadata, enhanced metadata, and/or content) of the clips. A video-capturing device automatically produces a highlight reel by selecting the superlative clips (e.g., the best clips). This Abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.

Description
RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Patent Application Ser. No. 61/435,757, filed on Jan. 24, 2011, the disclosure of which is incorporated by reference herein.

BACKGROUND

With the widespread use of digital consumer electronic video-capturing devices such as digital camcorders and camera-equipped mobile phones, the size of each consumer's digital-video collection continues to increase very rapidly. As such collections grow ever larger and more unwieldy, a person is less able to handle their sheer volume.

An individual person may shoot several hours of digital video at a single event (such as a wedding or a party) or even hundreds of hours of video on a trip (such as to the Grand Canyon) or over a season (the summer). Later, sometimes long after the event, trip, or season is over, the individual videographer may attempt to edit the many hours of video, spending hours watching and editing them with the available conventional approaches. With these conventional approaches, a person may use a crude user interface to view, select, and piece together the clips of the digital video along a timeline and perhaps form a watchable summary of the event, trip, or season.

Software applications that manage collections of media (e.g., digital photographs and videos) have become widely adopted as the amount of digital media, including digital video, has grown. Because of the desktop computer's large display, processing power, and memory capacity, most of these conventional approaches are designed for use with personal “desktop” computers. With the ample room of its large display, the desktop user interface offers a broad workspace for a user to view and manage a large catalogue of digital-video clips.

As evidenced by the seeming ubiquity of mobile personal communication devices (e.g., so-called “smartphones”), mobile communication and computing technology has developed rapidly. With the processing power and memory capacity of such mobile technology, it is possible to store large media collections on mobile personal communication devices or to access such collections via high-speed data telecommunication networks.

While processing power and memory capacity have increased, a typical mobile personal communication device still has a small display and, consequently, more constrained user-interface capabilities than a desktop computer. Accordingly, a user of a typical mobile personal communication device is forced to leave her mobile environment and move to a desktop computer in order to view and manage her large media catalogue.

SUMMARY

Described herein are techniques related to automatic selection of a subset of digital-video clips (i.e., “highlight reel”) from a set of digital-video clips. The automatic selection is based, at least in part, upon various weighted criteria regarding properties (e.g., metadata, enhanced metadata, and/or content) of the clips. A video-capturing device automatically produces a highlight reel by selecting the superlative clips (e.g., the best clips). If the user does not agree that the clips of the highlight reel are superlative, the device may produce a new highlight reel based upon one or more reweighted criteria.

This Summary is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a video-capturing apparatus that is configured to implement the techniques described herein for automatic selection of one or more digital-video clips from a set of digital-video clips based, at least in part, upon weighted criteria.

FIGS. 2-4 are flowcharts of processes that are configured to implement the techniques described herein for automatic selection of one or more digital-video clips from a set of digital-video clips based, at least in part, upon weighted criteria.

FIG. 5 depicts a video-capturing telecommunications apparatus within a telecommunications environment. The depicted apparatus is configured to implement the techniques described herein for automatic selection of one or more digital-video clips from a set of digital-video clips based, at least in part, upon weighted criteria.

The Detailed Description references the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and components.

DETAILED DESCRIPTION

Described herein are techniques related to the automatic production of a highlight reel by selecting one or more digital-video clips from a larger set of such clips based, at least in part, upon weighted criteria. Using a video-capturing device (such as a digital camcorder or camera-equipped mobile phone), a user accumulates a large collection of digital-video clips by, for example, videoing over several days, weeks, or even months. Rather than manually watching and editing the large video collection to find a group of superlative clips (e.g., the best clips), her video-capturing device automatically selects the superlative clips based upon various weighted criteria regarding properties (e.g., metadata, enhanced metadata and/or content) of the clips. When assembled together, the auto-selected collection of clips is called a “highlight reel” herein.

Via a user-interface on the device or on another device with a display, the user views the highlight reel. The user may indicate approval or disapproval of the highlight reel. If approved, the weight values assigned to the various weighted criteria remain unchanged because those values seem to lead to a proper selection of clips. If the user disagrees with the selection, then the weight values are adjusted and the auto-selection is performed again but with reweighted values for the criteria.

For instance, consider the following scenario to illustrate how the techniques described herein may be employed. Using her camera-equipped smartphone, Hope shoots about twelve hours of video during a summer vacation to the Oregon Coast with her family. At some point during her vacation, Hope notices a notification on her smartphone asking her if she would like to look at her “highlight reel.” Curious, she, of course, watches the approximately seven-minute-long highlight reel. In this example, the highlight reel consists of the top-rated one percent (i.e., about seven minutes) of the twelve hours of video, which the smartphone auto-selected as the “best” of the clips that she captured during her vacation.

While generally happy with the auto-selected highlight reel, Hope is displeased that so few of the clips include her family members. So, when asked, she indicates that she does not agree that these clips are the “best” in her subjective opinion. Hope may express her displeasure by removing one or more particular clips from the “best” group, thereby indicating that those clips do not belong amongst the “best” clips.

In response to her dissatisfaction, the smartphone soon offers another auto-selection of seven minutes of clips. The clips that she was displeased with (e.g., removed from the “best” group) are not part of the new offering of “best” clips. Since this offering seems to include many more of her family members in the clips, she agrees that these clips are the “best.”

In response to her agreement and without any additional prompting from her, her contacts in her favorites list (which includes many friends and family members) receive a notification (via email, Short Message Service (SMS), social networking site, etc.) about Hope's new highlight reel for them to see. So, with little effort from Hope, the best seven minutes of about twelve hours of vacation videos are shared with many of the people who are most important to her. This highlight reel is shared in a timely manner rather than weeks after her vacation, following a manual edit of her videos.

An example of an embodiment of a video-capturing apparatus that employs the new video clip auto-selection techniques described herein may be referred to as an “exemplary video-capturing device.” While one or more example embodiments are described herein, the reader should understand that the claimed invention may be practiced using different details than the exemplary ones described herein.

The following co-owned U.S. patent applications are incorporated by reference herein:

    • U.S. Provisional Patent Application Ser. No. 61/435,757, filed on Jan. 24, 2011;
    • U.S. patent application Ser. No. ______, titled “Automatic Selection Of Digital-Video Clips With An Apparatus,” filed on Mar. ___, 2011;
    • U.S. patent application Ser. No. ______, titled “Automatic Selection Of Digital Images From A Multi-Sourced Collection Of Digital Images,” filed on Mar. ___, 2011; and
    • U.S. patent application Ser. No. ______, titled “Automatic Sharing of Superlative Digital Images,” filed on Mar. ___, 2011.

Exemplary Video-Capturing Device

FIG. 1 shows a user 100 (such as Hope) holding a digital camcorder 110 with data-communication capability. The device 110 is also shown in an expanded view. The device 110 may be used to automatically select a small group of digital-video clips from amongst a large video collection. The videos of the large collection may have been shot by the device 110 itself.

The device 110 has the capability to communicate (via cellular technology, WLAN, etc.) with a network infrastructure 120. This network infrastructure 120 may be a private or public network of computing or telecommunication resources. It may include the Internet. It may include the so-called “cloud.” As such, the network infrastructure 120 may be called the cloud-computing infrastructure.

The video-capturing device 110 includes a video-capturing unit 130, a video-storage unit 140, a highlight-production unit 150, a highlight-presentation unit 160, a user-interaction unit 170, an actions unit 180, and a communications unit 190. Each of these units may be implemented (at least in part) by a set of processor-executable instructions (e.g., software modules). Furthermore, each of these units may include or employ one or more lower-level components of the device 110. For example, these lower-level components include processors, memory, storage, video display, user-input device (e.g., keyboard), transceiver, and the like.

The video-capturing unit 130 is configured, designed, and/or programmed to capture digital-video clips. That is, a person using the device 110 may shoot a scene and/or person using the video-capturing unit 130 of the device.

The video-capturing unit 130 includes various lower-level camera components for capturing moving digital images with audio (i.e., digital video) and perhaps still digital images (i.e., photographs). The camera components may include (by way of example and not limitation): digital sensor chip (e.g., CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor)), lenses, display, view-finder, and the like. The video-capturing unit 130 may be implemented, at least in part, by a software module resident, at least in part, in the device's memory and executed by one or more processors of the device.

In addition, the video-capturing unit 130 may include, or have associated therewith, a video-handling software module (e.g., application) that may be used to enhance the amount of information recorded in the digital-video file relating to the captured digital-video clips. For example, the video-handling application may use information from other sources and/or applications to add data to the captured digital-video clips that are stored on the video-storage unit 140. This added data may be called “metadata.” Specifically, the video-handling application may be configured to obtain information from other hardware, firmware, or software components to add data to the digital-video files. Examples of other components include (but are not limited to) a location application, a calendar application, and/or a contacts application.
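As a rough illustration of how a video-handling application might fold calendar data into a file's metadata, the Python sketch below tags a recorded file with the titles of any calendar events whose time span contains the recording's start time. The `VideoFile` and `CalendarEvent` types and the `add_calendar_metadata` function are hypothetical; a real device would read events from its calendar application and write the tags into the container's own metadata fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CalendarEvent:
    """Hypothetical calendar entry, e.g., as read from a calendar application."""
    title: str
    start: datetime
    end: datetime


@dataclass
class VideoFile:
    """Hypothetical in-memory view of a stored digital-video file."""
    path: str
    recorded_at: datetime
    metadata: dict = field(default_factory=dict)


def add_calendar_metadata(video: VideoFile, events: list[CalendarEvent]) -> VideoFile:
    """Tag the file with the titles of all events in progress when recording began."""
    titles = [e.title for e in events if e.start <= video.recorded_at < e.end]
    if titles:
        video.metadata["calendar_events"] = titles
    return video


# Usage: a clip recorded during a calendar entry picks up that entry's title.
trip = CalendarEvent("Oregon Coast vacation", datetime(2011, 7, 1), datetime(2011, 7, 8))
clip = add_calendar_metadata(VideoFile("beach.mp4", datetime(2011, 7, 3, 14, 30)), [trip])
print(clip.metadata)  # {'calendar_events': ['Oregon Coast vacation']}
```

The same pattern could, under the same assumptions, attach location names from a location application or contact names from a contacts application.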

The file formats for storing digital videos are sometimes referred to as container formats for audio and video. Common container formats for storing digital videos include (but are not limited to): 3GPP multimedia file (.3gp); anime music video file (.amv); advanced systems format file (.asf); Microsoft ASF redirector file (.asx); audio video interleave file (.avi); DivX-encoded movie (.divx); Nintendo DS movie file (.dpg); Apple QuickTime movie (.moov, .mov, .qt); MPEG-4 video file (.mp4); MPEG video file (.mpg); Real media file (.rm); Macromedia Flash movie (.swf); and Windows media video file (.wmv).

As used herein, a “digital video” is one or more digital-video files. A digital video includes one or more digital-video clips. A digital-video file refers to the formatted container of a digital video as it is stored on the video-storage unit 140 or other storage device. A digital-video clip (or simply a clip) refers to a short portion of a digital video that often contains an individual scene, which is a video sequence that is typically shot in one continuous take.

Additionally, the video-handling application may determine or identify particular clips in the one or more stored files of the digital video. The video-handling application identifies a particular clip by using existing metadata that specifies the boundaries of the particular clip (i.e., where an individual clip begins and ends). Also, the video-handling application may utilize existing or new video-editing technology to locate a clip based upon the content of the digital video. These video-editing technologies are commonly known as “automatic scene detection” to those of ordinary skill in the art.
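The two ways of finding clip boundaries described above can be sketched as follows, assuming each frame has already been reduced to a normalized color histogram. When existing metadata supplies boundaries, they are used directly; otherwise a simple histogram-difference cut detector (one common form of automatic scene detection, swapped in here for illustration) starts a new clip wherever consecutive frames differ sharply. The function names and the default threshold are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

Histogram = Sequence[float]


def l1_distance(a: Histogram, b: Histogram) -> float:
    """Sum of absolute per-bin differences between two normalized histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_clips(frames: list[Histogram], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return half-open (start_frame, end_frame) ranges, cutting wherever
    consecutive frame histograms differ by more than `threshold`."""
    if not frames:
        return []
    clips, start = [], 0
    for i in range(1, len(frames)):
        if l1_distance(frames[i - 1], frames[i]) > threshold:
            clips.append((start, i))
            start = i
    clips.append((start, len(frames)))
    return clips


def clip_boundaries(frames, metadata_boundaries=None, threshold=0.5):
    """Prefer boundaries recorded in existing metadata; otherwise detect them."""
    if metadata_boundaries:
        return list(metadata_boundaries)
    return detect_clips(frames, threshold)


frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(clip_boundaries(frames))  # [(0, 2), (2, 4)]
```

Production scene detectors typically also handle gradual transitions (fades, dissolves), which a single-threshold comparison of adjacent frames does not.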

Furthermore, the video-handling application may be designed to enhance user functionality once clips have been obtained. For example, the video-handling application may also be configured to display clips to the user in cooperation with or as part of the highlight-presentation unit 160. The video-handling application may include various filters or criteria used to limit the number of clips displayed to the user. As discussed below, these filters or criteria may be user selectable, may use the data in the digital-video file obtained from non-video sources or applications, or may be configured based on data in the digital-video files, etc. As another example, similar filters or criteria may also be used to cluster clips into folders (such as virtual albums, system file folders, etc.). As still another example, the video-handling application may use data stored in the digital-video files, contact information, calendar information, and/or upload information to increase the ease of sharing clips.

The clips operated on by the video-handling application may include clips captured by the video-capturing unit 130, and/or may include clips obtained from sources other than the video-capturing unit 130. For example, clips may be transferred to the device 110 using one or more wireless and/or wireline interfaces from the network 120.

The video-handling application may be a stand-alone application, or may be integrated into other applications or other units. Moreover, the video-handling application may be formed by a combination of functions of separate, distinct programs or units of the device 110.

The video-storage unit 140 is configured, designed, and/or programmed to store digital-video clips as digital-video files and possibly other forms of data and software. That is, the device 110 stores the digital-video files of clips shot by the person using the device 110 on the video-storage unit 140 of the device.

The video-storage unit 140 includes one or more lower-level memory or storage components for storing moving digital images with audio (i.e., digital video). The memory or storage components may be volatile or non-volatile, dynamic or static, read/write-able or read only, random- or sequential-access, location- or file-addressable, and the like. The memory or storage components may be magnetic, optical, holographic, and the like. The memory or storage components may be internal to the device 110, attached externally to the device 110, or available via data communications (e.g., on the network or the cloud).

The clips stored by the video-storage unit 140 may include clips shot by the video-capturing unit 130, and/or may include clips obtained from sources other than the video-capturing unit 130. For example, clips may be transferred to the device 110 using one or more wireless and/or wireline interfaces from the network 120.

The highlight-production unit 150 is configured, designed, and/or programmed to automatically select a group of superlative (e.g., top-ranked or bottom-ranked) clips from a collection of such clips stored by the video-storage unit 140. Optionally, the highlight-production unit 150 may cluster the collection of clips before the auto-selection is performed. Alternatively, the highlight-production unit 150 may cluster the auto-selected group of clips after the auto-selection is performed.

The highlight-production unit 150 shown in FIG. 1 is implemented as a software module which would reside, at least in part, in the device's memory and be executed by the device's one or more processors. Alternatively, the highlight-production unit 150 may be implemented as a collection of or as part of dedicated hardware or firmware. Alternatively still, the highlight-production unit 150 may be implemented as a combination of hardware, firmware, or software.

The highlight-presentation unit 160 is configured, designed, and/or programmed to visually present digital-video clips to the user for her enjoyment and/or consideration. That is, a person using the device 110 may view a selection of clips using the highlight-presentation unit 160 of the device.

The highlight-presentation unit 160 includes various lower-level audio/visual presentation components for showing moving digital images with audio (i.e., digital video). The audio/visual components may include (by way of example and not limitation): a liquid crystal display (LCD), a flat panel, organic light-emitting diode (OLED) displays, pico-projection displays, a solid state display or other visual display devices, speakers, and the like. The highlight-presentation unit 160 may be implemented, at least in part, by a software module resident, at least in part, in the device's memory and executed by one or more processors of the device.

The user-interaction unit 170 is configured, designed, and/or programmed to attain feedback (e.g., obtain input) from the user and, in particular, feedback related to the clips presented in cooperation with the highlight-presentation unit 160. That is, a person using the device 110 indicates approval or disapproval of the auto-selected group of clips presented on-screen by using the user-interaction unit 170 of the device.

The user-interaction unit 170 includes various lower-level user-input components for receiving input from the user, such as (by way of example and not limitation): keyboard, touchscreen, mouse, touchpad, trackball, and the like. The user-interaction unit 170 may use some of the same audio/visual components of the highlight-presentation unit 160. The user-interaction unit 170 is implemented as a software module which would reside, at least in part, in the device's memory and be executed by the device's one or more processors. Alternatively, the user-interaction unit 170 may be implemented as a collection of or as part of dedicated hardware or firmware. Alternatively still, the user-interaction unit 170 may be implemented as a combination of hardware, firmware, or software.

The actions unit 180 is configured, designed, and/or programmed to automatically perform a defined action on the group of top-ranked (or alternatively bottom-ranked) clips of a collection of such clips stored by the video-storage unit 140. Presuming the clips are top-ranked, the defined actions include (by way of example and not limitation): archiving, sharing, burning, conversion/reformat, and the like. Presuming the clips are bottom-ranked, the defined actions include (by way of example and not limitation): deleting, recycling, and the like.

Archiving clips involves storing clips in a different and perhaps more reliable location, such as on the cloud-computing infrastructure. Sharing clips includes sending copies of the clips to another person via one or more various ways of sending such data or notices of the same. Alternatively, sharing clips includes sending a link to one or more of the clips via one or more various ways of sending such links or notices of the same. Examples of such ways to send clips, links, and/or notices thereof include (but are not limited to): email, posting on a social network, posting on a blog or website, text message, MMS (multimedia messaging service), and the like. When the clips are burned, they are packaged and written onto removable media (like a DVD disc) so that the clips can be viewed on a television with an attached DVD player. Also, the clips may be automatically converted or reformatted in a pre-defined manner.

Deleting clips involves permanently removing one or more clips from the video-storage unit 140. Recycling clips involves placing one or more clips into a queue of files to be deleted later. This queue is sometimes called the “trash” or “recycle bin.”
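The difference between deleting and recycling, and the restriction of each action to top-ranked or bottom-ranked clips, can be sketched in Python as below. `ClipStore`, `perform_action`, and `empty_recycle_bin` are hypothetical names; the sketch keeps only clip identifiers, stands in for archiving and sharing with simple bookkeeping, and omits burning and conversion for brevity.

```python
from __future__ import annotations

from collections import deque

# Actions sketched here for each kind of superlative subset
# (burning and converting are omitted for brevity).
ALLOWED_ACTIONS = {
    "top": {"archive", "share"},
    "bottom": {"delete", "recycle"},
}


class ClipStore:
    """Hypothetical stand-in for the video-storage unit."""

    def __init__(self, clip_ids):
        self.clips = set(clip_ids)
        self.recycle_bin = deque()  # queue of clips to be deleted later ("trash")
        self.archived = set()       # e.g., copies kept on cloud storage
        self.shared = []            # e.g., clips whose links were sent to contacts


def perform_action(store: ClipStore, action: str, subset, ranking: str) -> None:
    """Apply one defined action to every clip in the auto-selected subset."""
    if action not in ALLOWED_ACTIONS[ranking]:
        raise ValueError(f"{action!r} is not defined for {ranking}-ranked clips")
    for clip in subset:
        if action == "delete":
            store.clips.discard(clip)          # permanent removal
        elif action == "recycle" and clip in store.clips:
            store.clips.remove(clip)
            store.recycle_bin.append(clip)     # recoverable until the bin is emptied
        elif action == "archive":
            store.archived.add(clip)
        elif action == "share":
            store.shared.append(clip)


def empty_recycle_bin(store: ClipStore) -> list:
    """Permanently drop everything queued in the recycle bin."""
    purged = list(store.recycle_bin)
    store.recycle_bin.clear()
    return purged


store = ClipStore(["a", "b", "c"])
perform_action(store, "recycle", ["a"], "bottom")  # "a" moves to the recycle bin
perform_action(store, "delete", ["b"], "bottom")   # "b" is gone immediately
```

Keeping recycled clips in a queue rather than removing them outright gives the user a window to recover a clip that the auto-selection wrongly ranked as among the “worst.”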

The actions unit 180 shown in FIG. 1 is implemented as a software module which would reside, at least in part, in the device's memory and be executed by the device's one or more processors. Alternatively, the actions unit 180 may be implemented as a collection of or as part of dedicated hardware or firmware. Alternatively still, the actions unit 180 may be implemented as a combination of hardware, firmware, or software.

The communications unit 190 is configured, designed, and/or programmed to transmit (and/or receive) notifications of, links to, and/or copies of digital-video clips through the network 120 to another user or device accessible from that network. For example, upon a user's approval of an auto-selected group of the clips that she shot with the device 110, the device may send that group to her friends.

The communications unit 190 includes various lower-level communications components for sending and/or receiving data communications from/by the device. Using a transceiver, transmitter, receiver, network interface controller (NIC), and the like, the communications unit 190 utilizes wired (e.g., universal serial bus (USB) or Ethernet) or wireless communications. Examples of wireless communications include (by way of example and not limitation): cellular, satellite, conventional analog AM or FM radio, Wi-Fi™, wireless local area network (WLAN or IEEE 802.11), WiMAX™ (Worldwide Interoperability for Microwave Access), and other analog and digital wireless voice and data transmission technologies. The communications unit 190 may be implemented, at least in part, by a software module resident, at least in part, in the device's memory and executed by one or more processors of the device.

Exemplary Processes

FIGS. 2-4 are flowcharts illustrating exemplary processes 200, 300, and 400 that implement the techniques described herein for automatic highlight reel production. The exemplary processes 200, 300, and 400 are performed, at least in part, by a video-capturing device, such as device 110 of FIG. 1, a video-capturing device 204 of FIG. 2, and/or a video-capturing telecommunications device 510 of FIG. 5. Additionally or alternatively, the exemplary processes 200, 300, and 400 are performed, at least in part, by a cloud-based service.

The process 200 starts, at 210, with an update of a set 212 of digital-video clips (i.e., clips) by, for example, a user 202 shooting new clips with a video-capturing device 204, such as a digital camcorder. Of course, the set 212 of digital-video clips may be updated via other sources such as from a nearby Bluetooth™ device, via cellular connection, from the Internet (e.g., email, social networking site, website, etc.), or some cloud-computing infrastructure resource. The set 212 of digital-video clips may be stored on the device 204, on another data-communicative device, on a network server, on the Internet, in the cloud, or some combination thereof. The box representing operation 210 is vertically long and horizontally narrow to indicate that this operation may occur while the other to-be-described operations are performed.

The set 212 of digital-video clips may represent all of the clips stored and/or accessible by the device 204. Typically, the set 212 includes a group of clips that are less than all of the clips stored and/or accessible by the device 204. Accordingly, the set 212 is a collection or cluster of some but not all of the stored and/or available clips. Clips may be clustered based on one or more criteria, such as location, time, date, and calendar information from one or more events, such as those on private and/or publicly available calendars.

Additionally, the clustering criteria may also consider event specifics, such as, faces in the clusters, colors associated with clips, background and lighting of the clips, including overall brightness, color similarities or dissimilarities, and scene information. Clusters of clips may be multilayered. For example, a cluster of wedding clips may have a number of secondary clusters. One secondary cluster may include clips of just the groom and bride, where another secondary cluster may include clips of just the bridesmaids.
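A minimal sketch of two-layer clustering follows, assuming each clip carries a start time and a set of identified faces. Primary clusters are formed by splitting wherever the time gap between consecutive clips exceeds a threshold (a simple stand-in for the time/date and calendar criteria above), and each primary cluster is then split into secondary clusters by who appears in the clips. The `Clip` type, the two-hour gap, and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Clip:
    clip_id: str
    start: datetime
    faces: frozenset = frozenset()  # identified people, e.g., from face recognition


def cluster_by_time(clips, max_gap=timedelta(hours=2)):
    """Primary clusters: start a new cluster whenever the gap between
    consecutive clips (by start time) exceeds max_gap."""
    clusters = []
    for clip in sorted(clips, key=lambda c: c.start):
        if clusters and clip.start - clusters[-1][-1].start <= max_gap:
            clusters[-1].append(clip)
        else:
            clusters.append([clip])
    return clusters


def subcluster_by_faces(cluster):
    """Secondary clusters: group a primary cluster's clips by who appears in them."""
    groups = {}
    for clip in cluster:
        groups.setdefault(clip.faces, []).append(clip)
    return groups


wedding_day = [
    Clip("c1", datetime(2011, 6, 4, 10, 0), frozenset({"bride", "groom"})),
    Clip("c2", datetime(2011, 6, 4, 10, 20), frozenset({"bride", "groom"})),
    Clip("c3", datetime(2011, 6, 4, 10, 40), frozenset({"bridesmaids"})),
    Clip("c4", datetime(2011, 6, 5, 9, 0)),
]
clusters = cluster_by_time(wedding_day)
secondary = subcluster_by_faces(clusters[0])
```

Here the three wedding clips form one primary cluster, split into a bride-and-groom secondary cluster and a bridesmaids secondary cluster, while the next-morning clip lands in a cluster of its own.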

Next, at operation 214, the device 204 provides enhanced metadata for the digital video. This enhanced-metadata provision includes generating the enhanced metadata and associating it with the digital video (and the clips that make up that associated video). The enhanced metadata is derived from the existing metadata already associated with the digital video (e.g., location and time) and/or an analysis of the content of the video. The content analysis detects and determines occurrence (or lack thereof) of temporal events in the video rather than something based on a single image or frame.

To produce enhanced metadata, the content of the audio and/or moving-image portions of the digital video is analyzed. This analysis may look for and note one or more of the following content conditions based upon temporal events (provided by way of illustration only and not limitation): blank segment, silent or quiet audio, lack of sync between audio and video, no audible voices, existence of faces, identification of faces, emotions expressed on faces, motion detection, human motion detected, presence and identification of particular objects, person or object tracking, music detected, language detected, and the like.
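As one example of a temporal (rather than single-frame) content condition, the sketch below finds stretches of silent or quiet audio. It assumes decoded mono samples in the range [-1, 1], computes the root-mean-square (RMS) level over fixed windows, and reports runs of quiet windows that last at least a minimum duration, as (start_seconds, end_seconds) pairs that could be stored as enhanced metadata. The function name, window length, and thresholds are hypothetical.

```python
import math


def quiet_segments(samples, sample_rate, window_s=0.5, rms_threshold=0.01, min_duration_s=1.0):
    """Return (start_s, end_s) spans whose windowed RMS level stays below
    rms_threshold for at least min_duration_s."""
    win = max(1, int(window_s * sample_rate))
    segments, run_start = [], None
    for w in range(math.ceil(len(samples) / win)):
        chunk = samples[w * win:(w + 1) * win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = w * win / sample_rate  # start time of this window in seconds
        if rms < rms_threshold:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start >= min_duration_s:
                segments.append((run_start, t))
            run_start = None
    # Close a quiet run that extends to the end of the audio.
    if run_start is not None:
        end = len(samples) / sample_rate
        if end - run_start >= min_duration_s:
            segments.append((run_start, end))
    return segments


# 2 s of sound, 2 s of silence, 1 s of sound, at a toy 10 Hz sample rate.
audio = [0.5] * 20 + [0.0] * 20 + [0.5] * 10
print(quiet_segments(audio, sample_rate=10))  # [(2.0, 4.0)]
```

Requiring a minimum duration is what makes this a temporal condition: a single quiet window between words is ignored, while a sustained silence is noted.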

After that, at 216, the device 204 auto-selects a subset of the set 212 of digital-video clips. The subset of clips forms an effective “highlight reel” of the larger set 212 of digital-video clips. As used herein, the subset presumptively includes some, but not all, of the digital-video clips of the set 212.

Typically, the selected subset includes the superlative clips from the set 212 of clips. As used herein with regard to clips, the term “superlative” refers to a subset of clips having the highest kind, quality, or order with respect to the other clips of a set. For example, clips that are top-ranked or bottom-ranked in some category or characteristic are superlative. Also, for example, clips considered the “best” or “worst” of the set are considered superlative.

The auto-selection is based upon one or more weighted selection criteria 218 of one or more properties of the clips. FIG. 3 provides more information regarding both the auto-selection operation 216 and the selection criteria 218. The auto-selection operation 216 may be designed to find the “best” clips (i.e., the top-ranked clips). Alternatively, it may be designed to find the “worst” clips (i.e., the bottom-ranked clips).
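A minimal sketch of weighted-criteria auto-selection follows. It assumes each clip's properties have already been normalized to scores in [0, 1], scores each clip as the weighted sum of those properties, ranks clips in descending order for the “best” reel or ascending order for the “worst,” and fills a duration budget greedily. The property names, the `Clip` type, and the duration budget are illustrative assumptions; the linear weighted sum is one simple way to combine weighted criteria, not necessarily the one used in any particular implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    duration_s: float
    properties: dict[str, float]  # each property normalized to [0, 1]


def score(clip: Clip, weights: dict[str, float]) -> float:
    """Weighted sum of the clip's property scores (missing properties count as 0)."""
    return sum(w * clip.properties.get(name, 0.0) for name, w in weights.items())


def auto_select(clips, weights, budget_s, best=True):
    """Rank clips by weighted score (highest first for best=True, lowest first
    otherwise) and keep every clip that still fits in the duration budget."""
    ranked = sorted(clips, key=lambda c: score(c, weights), reverse=best)
    chosen, used = [], 0.0
    for clip in ranked:
        if used + clip.duration_s <= budget_s:
            chosen.append(clip)
            used += clip.duration_s
    return chosen


clips = [
    Clip("a", 60, {"faces": 1.0, "motion": 0.2}),
    Clip("b", 60, {"faces": 0.1, "motion": 0.9}),
    Clip("c", 60, {"faces": 0.5, "motion": 0.5}),
]
weights = {"faces": 2.0, "motion": 1.0}
best_reel = auto_select(clips, weights, budget_s=120)              # clips a and c
worst_reel = auto_select(clips, weights, budget_s=120, best=False)  # clips b and c
```

In the vacation scenario, a budget of one percent of twelve hours would be 0.01 × 43,200 s = 432 s, or about 7.2 minutes.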

At 220, the device 204 presents the highlight reel to the user 202 via a user interface (of a telecommunication device or a cloud-connected computer device). Via the user interface (including a user-input device), the user 202 indicates approval or disapproval of the highlight reel. By approving, the user 202 may be, for example, agreeing that the highlight reel represents the “best” clips in his subjective opinion. Conversely, by disapproving, the user 202 may be, for example, disagreeing that the highlight reel represents the “best” clips in his subjective opinion.

At 222, the device 204 attains feedback from the user regarding the highlight reel. The device 204 determines whether input from the user indicates approval of the highlight reel. If not approved, then the device 204 updates (i.e., adjusts or alters) the weight values assigned to one or more of the selection criteria 218 and the process returns to the auto-selection operation 216 to perform a reweighted selection of the clips based upon the now updated criteria. The device 204 automatically reselects an updated highlight reel. If the user input indicates, at 222, that the user approved, the process proceeds to operation 224.

In one or more implementations, the user 202 may indicate his disapproval by choosing to remove or add one or more clips from/to the highlight reel. In the same implementation or others, the device 204 may “learn” the types of clips that a user prefers or does not prefer based on iterations of approval and disapproval by the user. When a user removes a clip from the subset, the device 204 reduces the weight values for the removed clip's strong properties and/or increases the weight values for the removed clip's weak properties. The converse applies to clips that the user manually adds to the subset. Using machine-learning techniques, the multiple iterations hone the weight values assigned to the various weighted selection criteria.
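The reweighting rule above can be sketched as a simple additive update, assuming clip properties normalized to [0, 1] and treating a property as “strong” when it is above 0.5 and “weak” when it is below. Removing a clip lowers the weights of its strong properties and raises those of its weak ones; adding a clip does the reverse. The midpoint, the learning rate, and the clamp at zero are illustrative assumptions, and a deployed system might use a more elaborate learning method.

```python
def reweight(weights, clip_properties, removed, learning_rate=0.5):
    """Return updated criterion weights after the user removes (removed=True)
    or manually adds (removed=False) a clip with the given property scores."""
    direction = -1.0 if removed else 1.0
    updated = dict(weights)
    for name, value in clip_properties.items():
        # (value - 0.5) is positive for a strong property, negative for a weak one.
        delta = direction * learning_rate * (value - 0.5)
        updated[name] = max(0.0, updated.get(name, 0.0) + delta)  # keep weights non-negative
    return updated


# Hope removes a scenic clip with lots of motion but no family faces.
weights = {"faces": 1.0, "motion": 1.0}
weights = reweight(weights, {"faces": 0.0, "motion": 1.0}, removed=True)
print(weights)  # {'faces': 1.25, 'motion': 0.75}
```

After this single removal, clips showing faces count for more in the next auto-selection, which is the kind of shift that would bring more of Hope's family members into the reselected reel.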

At 224, the device determines whether input from the user (most likely in addition to the user input regarding approval) indicates whether to take a defined action upon the subset of clips and/or which of several possible defined actions to take upon the subset. If the user does not want any defined action, then the process returns to the beginning operation 210. If the user wants a defined action to be performed upon the subset, then the process proceeds to the action operation 226.

In FIG. 2, operation 224 is shown as a dashed box to indicate that it is optional and some implementations may not offer the user this choice. Alternatively, the device 204 may perform one or more defined actions by default. Alternatively still, the user may have pre-set whether and/or which one or more of the defined actions will be performed upon the clips of the highlight reel.

At 226, the device 204 performs one or more defined actions on one or more of the clips of the highlight reel. Presuming the clips are top-ranked (i.e., “best”), the defined actions include (by way of example and not limitation): archiving, sharing, media burning, converting/reformatting, and the like. Presuming the clips are bottom-ranked (i.e., “worst”), the defined actions include (by way of example and not limitation): deleting, recycling, and the like.

After the one or more actions are performed on the clips of the highlight reel, the process returns to the video-set updating operation 210. The operations 214 through 226 may be performed in real-time with the video-set updating operation 210.

FIG. 3 shows the process 300 and offers more details regarding the auto-selection operation 216 and the selection criteria 218. The auto-selection operation 216 includes a binary selection filter operation 302 followed by a weighted-criteria selection filter operation 304. At 302, one or more of the criteria 218 may be applied as a binary filter to, for example, remove clips that do not meet those criteria. After that, at 304, one or more of the criteria are applied in a weighted manner.

Generally, the main purpose of the binary selection filter operation 302 is to remove outright "bad" clips, so it could be called the bad-clip filter. A bad clip is one whose content is difficult for a human to discern, is incoherent, and/or would certainly be uninteresting, and/or whose problems are not likely to be alleviated by simple post-processing. The binary filter may use one or more criteria related to one or more properties that might make a clip be considered "bad." Examples of such "bad clip" criteria include (but are not limited to): shaky, blurry, over- or underexposed, poor resolution, poor contrast, out of focus, too short, still and/or silent for too long, audio-video out of sync, red-eye, and the like. In some implementations, out-of-sync audio and red-eye might not be part of the binary selection filter operation 302 because they can be fixed with some post-processing.
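A bad-clip filter of this kind might look like the sketch below. It assumes that earlier video analysis has already produced per-clip shake, blur, duration, and mean-brightness measurements; the `ClipStats` type, the function names, and every threshold value are hypothetical choices for illustration. A clip is dropped if any single criterion marks it as bad.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClipStats:
    """Hypothetical per-clip measurements produced by earlier video analysis."""
    name: str
    shake: float       # 0 = steady, 1 = very shaky
    blur: float        # 0 = sharp, 1 = very blurry
    duration_s: float  # clip length in seconds
    mean_luma: float   # average brightness on a 0-255 scale


def is_bad_clip(
    clip: ClipStats,
    max_shake: float = 0.6,
    max_blur: float = 0.6,
    min_duration_s: float = 2.0,
    luma_range: Tuple[float, float] = (20.0, 235.0),
) -> bool:
    """True if any single binary criterion marks the clip as 'bad'."""
    badly_exposed = not (luma_range[0] <= clip.mean_luma <= luma_range[1])
    return (
        clip.shake > max_shake
        or clip.blur > max_blur
        or clip.duration_s < min_duration_s
        or badly_exposed
    )


def bad_clip_filter(clips: List[ClipStats]) -> List[ClipStats]:
    """Binary filter: keep only clips that fail none of the criteria."""
    return [clip for clip in clips if not is_bad_clip(clip)]
```

Because each criterion is pass/fail, this stage does no ranking. It only narrows the set before any weighted scoring is applied.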

Some implementations may not employ the binary selection filter operation 302. Those implementations may rely upon the weighted-criteria selection filter operation 304 to remove the “bad” clips.

Alternatively, the binary selection filter operation 302 may be described as grouping the set 212 of digital-video clips into an intermediate grouping that includes two or more of the digital-video clips of the set 212. Similarly, the weighted-criteria selection filter operation 304 may be alternatively described as ranking the digital-video clips of the intermediate grouping based upon one or more of the multiple weighted criteria 218 and designating an allotment of top-ranked (or bottom-ranked) digital-video clips of the intermediate grouping as the selected subset of digital-video clips that results from the auto-selection operation 216.
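The ranking-and-designation step might be sketched as follows. The source does not state how multiple weighted criteria are combined, so this sketch assumes a simple linear weighted sum of per-clip property scores. The names `weighted_score` and `designate_allotment` are hypothetical. Setting `top=False` selects the bottom-ranked ("worst") clips instead of the top-ranked ones.

```python
from typing import Dict, List


def weighted_score(props: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a clip's property scores; unweighted properties count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in props.items())


def designate_allotment(
    grouping: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    allotment: int,
    top: bool = True,
) -> List[str]:
    """Rank the intermediate grouping and return the top- (or bottom-) ranked clip names."""
    ranked = sorted(
        grouping,
        key=lambda name: weighted_score(grouping[name], weights),
        reverse=top,  # descending order for "best", ascending for "worst"
    )
    return ranked[:allotment]
```

For example, with weights `{"smiling_faces": 2.0, "laughter": 1.0}`, a clip scoring 1.0 on both properties outranks one that only has smiling faces, which in turn outranks one that only has laughter.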

The size (i.e., number, cardinality) of the allotment may be a fixed number (e.g., ten) or calculated. The allotment size may refer to a number of clips, a number of frames, a length of the highlight reel (e.g., ten minutes), or the like. If fixed, the allotment size may be a factory-set default or a user-defined number. If calculated, the calculation may be based, for example, upon linear or non-linear proportions of the quantity or length of clips in the set 212, in the intermediate grouping, and/or the like. These proportions may be factory-set or user-defined. A percentage is an example of such a proportion.

For example, the allotment size for the subset may be user-defined to be five percent of the number or length of clips in the set 212. If, for this example, the set 212 included 100 clips totaling three hours (180 minutes) of video, then the allotment size might be five clips or perhaps nine minutes. Consequently, the user 202 would be presented with five clips or perhaps nine minutes of clips for his approval.
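The percentage-based allotment in this example reduces to a one-line calculation. The sketch below assumes the total is either a clip count or a length in minutes, rounds to the nearest whole unit, and enforces a minimum of one. The function name and the minimum are hypothetical.

```python
def allotment_size(total: float, percent: float = 5.0, minimum: int = 1) -> int:
    """Allotment as a percentage of the set's clip count or total length.

    Python's round() rounds exact halves to the nearest even integer,
    and the result is never smaller than `minimum`.
    """
    return max(minimum, round(total * percent / 100))
```

With the numbers above, `allotment_size(100)` gives 5 clips and `allotment_size(180)` gives 9 minutes.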

Typically, the criteria used for the binary filtering differ from those used for the weighted filtering. However, in some implementations, the criteria used for one may overlap the criteria used for the other. The criteria are related to properties of a clip. Those properties may be derived from the content of the clip (based upon an image or video analysis) and/or from metadata associated with the clip. Examples of clip metadata include (but are not limited to): technical metadata (such as encoding details, identification of clip boundaries, size, resolution, color profile, ISO speed and other camera settings), descriptive metadata (captions, headlines, titles, keywords, location of capture, etc.), and administrative metadata (such as licensing or rights usage terms, specific restrictions on using an image, model releases, provenance information, such as the identity of the creator, and contact information for the rights holder or licensor).

The weighted selection criteria 218 include (by way of example and not limitation): presence or absence of audio from a corresponding clip, motion, smiling faces, laughter, black screens, identification of video-capture device, identification of video-capture device user, location and/or time of video-capture, focus, contrast, shake, red eye, person in clip is a favorite (e.g., in a social network or on the device), person in clip is a contact, clip is tagged (in general or with a particular tag), user-added criteria, clip was auto-corrected, flash used, social network ranking of a clip, etc.

FIG. 4 illustrates the process 400 for determining a best clip that may be used with embodiments described herein. This process 400 may be employed as part of process 200 and/or process 300 herein.

The process 400 starts with operation 402, where a video-capture device retrieves the next clip (or the first clip, when initially executed) from a memory such as the video-storage unit 140 of device 110 (as shown in FIG. 1). Next, at 404 and 406, a determination is made whether the retrieved clip is shaky or whether redeye is detected. If so, a next clip is retrieved and the process 400 starts again with that next clip. If the retrieved clip is the last (as determined at operation 438), then the process 400 proceeds to a clip-ranking operation 440, which is discussed later.

If no shake or redeye is detected in a clip, then the process 400 continues to operation 408, where it is determined whether the resolution of the retrieved clip is greater than a defined threshold resolution. If not, then a next clip is retrieved and the process begins again. If so, a resolution index is determined or calculated at operation 410 based on the resolution of the clip. In one example, the resolution index may be determined or calculated based on the resolution (in megapixels) associated with the clip undergoing consideration. Next, at 412, the process determines whether the histogram associated with the clip is within a defined threshold. If not, then a next clip is retrieved and the process begins again. If so, a histogram index is determined or calculated at 414 based on the histogram of the clip. In one example, the histogram index is determined or calculated based on a consideration of the number of "dark" pixels (underexposure), "bright" pixels (overexposure), and/or the number of pixels that fall between the underexposure and overexposure ranges.
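The resolution and histogram indexes might be computed as in the sketch below. The source does not give formulas, so these are assumptions. The resolution index scales frame megapixels against a hypothetical 2.0-megapixel reference and caps at 1.0. The histogram index is the fraction of 8-bit luma values that fall between hypothetical "dark" (30 or less) and "bright" (225 or more) cutoffs.

```python
from typing import Sequence


def resolution_index(width: int, height: int, reference_megapixels: float = 2.0) -> float:
    """Frame resolution in megapixels, scaled so the reference and above map to 1.0."""
    megapixels = width * height / 1_000_000
    return min(1.0, megapixels / reference_megapixels)


def histogram_index(luma: Sequence[int], dark_max: int = 30, bright_min: int = 225) -> float:
    """Fraction of pixels that are neither 'dark' (underexposed) nor 'bright' (overexposed)."""
    if not luma:
        raise ValueError("no pixels to evaluate")
    dark = sum(1 for value in luma if value <= dark_max)
    bright = sum(1 for value in luma if value >= bright_min)
    return (len(luma) - dark - bright) / len(luma)
```

For a video clip, these values might be computed on sampled frames and then averaged. That aggregation step is likewise an assumption.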

Next, the process continues, at 416, to determine whether a duplicate of the clip currently being processed is stored in the device or is otherwise known or available. If so, a next clip is retrieved and the process begins again. If not, the process determines, at 418, whether one or more faces are detected in the clip. If no faces are detected in the clip, the process calculates, at 420, an NF ("no face") clip score based on the calculated resolution index and histogram index, associates the NF score with the clip, and adds the clip to a "no face" clip queue at 422. If one or more faces are detected, the process determines, at 424, whether a subject in the clip has their eyes shut or partially shut (i.e., "eyes closed detection"). If so, a next clip is retrieved and the process begins again.

If no blink is detected, the process calculates face, event, alignment, and user-like indexes at operations 426, 428, 430, and 432, respectively. In one implementation, the face index calculation considers whether one or more expected or desired faces are in a clip, the event index calculation considers whether an expected event (e.g., a wedding or party) is part of a clip, the alignment index calculation considers a degree of alignment (e.g., vertical or horizontal) of a clip, and the user-like index calculation considers a relative user acceptance or approval (e.g., like or dislike) of one or more faces in a clip. These indexes and the previously calculated resolution and histogram indexes are considered when calculating an F ("face") clip score at 434. The F score is associated with the clip, and the clip is added to a face queue at 436.
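The NF/F scoring branch (operations 420 through 436) might be sketched as below. The source does not say how the indexes are combined, so this sketch simply averages them with `statistics.fmean`. The function names and the tuple layout of the index arguments are hypothetical.

```python
from statistics import fmean
from typing import List, Optional, Tuple


def nf_score(resolution_idx: float, histogram_idx: float) -> float:
    """'No face' score: here, the mean of the two image-quality indexes."""
    return fmean([resolution_idx, histogram_idx])


def f_score(
    resolution_idx: float,
    histogram_idx: float,
    face_idx: float,
    event_idx: float,
    alignment_idx: float,
    user_like_idx: float,
) -> float:
    """'Face' score: here, the mean of all six indexes."""
    return fmean([resolution_idx, histogram_idx, face_idx, event_idx, alignment_idx, user_like_idx])


def route_clip(
    name: str,
    quality: Tuple[float, float],
    face_indexes: Optional[Tuple[float, float, float, float]],
    no_face_queue: List[Tuple[str, float]],
    face_queue: List[Tuple[str, float]],
) -> None:
    """Score a clip and append it to the matching queue.

    quality = (resolution_idx, histogram_idx); face_indexes is None when no face was detected,
    otherwise (face_idx, event_idx, alignment_idx, user_like_idx).
    """
    if face_indexes is None:
        no_face_queue.append((name, nf_score(*quality)))
    else:
        face_queue.append((name, f_score(*quality, *face_indexes)))
```

A weighted mean, with weights drawn from the selection criteria, would be a natural variation on the plain average used here.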

Each time a clip is added to a queue, the queue is automatically re-sorted based on the scores of the clips already processed and queued. Once the last clip is processed (as determined by operation 438), the process picks the highest-scoring clip as the best clip at 440. In addition, operation 440 may also select, based on clip score, a set of clips for presentation to a user of the portable device.
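A queue that stays sorted as each clip is added, followed by best-clip selection, might be sketched with Python's standard `bisect` module. Rather than re-sorting the whole queue, `bisect.insort` inserts each entry at its sorted position. The `ScoredQueue` class and `pick_best` function are hypothetical. Merging the face and "no face" queues also assumes that F and NF scores are on a comparable scale, which the source does not state.

```python
import bisect
from typing import List, Tuple


class ScoredQueue:
    """Queue kept sorted by score, highest first, as each clip is added."""

    def __init__(self) -> None:
        # Store (negated score, name) so ascending tuple order equals descending score.
        self._items: List[Tuple[float, str]] = []

    def add(self, name: str, score: float) -> None:
        bisect.insort(self._items, (-score, name))

    def ranked(self) -> List[Tuple[str, float]]:
        return [(name, -neg_score) for neg_score, name in self._items]


def pick_best(*queues: ScoredQueue, top_n: int = 1) -> List[Tuple[str, float]]:
    """Merge the queues and return the top_n highest-scoring clips."""
    merged = [item for queue in queues for item in queue.ranked()]
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged[:top_n]
```

Calling `pick_best` with `top_n=1` corresponds to choosing the single best clip. A larger `top_n` corresponds to selecting a set of clips for presentation.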

In alternative implementations, other criteria (like those of 218 in FIGS. 2 and 3) and/or different permutations of criteria may be employed at operations 404, 406, 408, 416, 418, and 424. For example, other "bad clip" type criteria may be used with operations 404, 406, and 408, and other weighted selection criteria may be employed with operations 416, 418, and 424.

Exemplary Device within Exemplary Telecommunication Environment

FIG. 5 shows an exemplary telecommunication environment 500, which includes a video-capturing telecommunications device 510. Both the environment 500 and the device 510 may be included as part of one or more implementations of the techniques described herein.

As depicted, the device 510 is a smartphone that is capable of communicating using analog and/or digital wireless voice and/or data transmission. FIG. 5 shows the device 510 in communication with a cell phone 520 via a so-called cellular network 522, as represented by a cell tower. Via that cellular network 522 or via some other wireless or wired communication technology (e.g., Wi-Fi™, WiMAX™, WLAN, Ethernet, etc.), the device 510 is coupled to a cloud-computing infrastructure 524.

The cloud-computing infrastructure 524 may include scalable and virtualized compute resources over a private or public network. Typically, a cloud-computing infrastructure uses the internet and central remote servers to maintain data and applications. The cloud-computing infrastructure 524 includes a group of individual computers connected by high-speed communications and some form of management software guiding the collective actions and usage of the individual computers. This technology allows for much more efficient computing by centralizing storage, memory, processing and bandwidth. While depicted in FIG. 5 as the cloud-computing infrastructure, item 524 may alternatively be viewed as a network or the Internet.

The device 510 includes many components, such as a wireless transceiver 530, a NIC (network interface controller) 532, one or more processors 534, one or more cameras 536, one or more input subsystems 538, one or more output subsystems 540, a secondary storage 542 with at least one processor-readable medium 544, one or more location subsystems 546, one or more power systems 548, one or more audio systems 550, one or more video displays 552, and a memory 554 having software 556 and data (e.g., digital photo or digital video) 558 stored thereon.

The NIC 532 is a hardware component that connects the device 510 to one or more computer networks and allows for communication to and from those networks. Typically, the NIC 532 operates as both an Open Systems Interconnection (OSI) layer 1 (i.e., physical layer) and layer 2 (i.e., data link layer) device, as it provides physical access to a networking medium and provides a low-level addressing system through the use of Media Access Control (MAC) addresses. It allows devices to connect to each other either by using cables or wirelessly.

The one or more processors 534 may include a single or multi-core central processing unit (CPU), a graphics processing unit (GPU), other processing unit or component, or some combination thereof.

The camera 536 may be configurable to capture still images (photos), moving images (video), or both still and moving images. The camera components may include (by way of example and not limitation): a digital sensor chip (e.g., CCD (charge-coupled device) or CMOS (complementary metal-oxide-semiconductor)), lenses, a flash, and the like.

The input subsystems 538 are physical devices designed to allow for input from, for example, a human user. The input subsystems 538 may include (by way of example and not limitation): keyboard, keypad, touchscreen, touchpad, mouse, trackball, paddle, light pen, scanner, stylus, and/or a micro-telecommunications device.

The output subsystems 540 are mechanisms for providing a tactile output to the user in a manner that is neither audio nor video based. Such tactile output is typically generated by an offset rotating motor that vibrates the device 510 to give a notification to the user.

The secondary storage 542 typically includes a read-only memory (ROM), a programmable memory (such as EEPROM, EPROM, or PROM), a static read/writable memory (such as flash memory), and/or a mechanical read/writable device (such as a hard drive), which may be magnetic, optical, holographic, and the like. The components of the secondary storage 542 may be internal to the device 510, attached externally to the device 510, or available via data communications (e.g., on the cloud).

At least one processor-readable medium 544 is stored on the secondary storage 542. The processor-readable medium 544 stores one or more sets of instructions (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions may also reside, completely or at least partially, within the memory 554 as software 556 and within the processor(s) 534 during execution thereof by the device 510.

The location subsystems 546 may be designed to determine the physical location of the device 510. The one or more location subsystems 546 may include (but are not limited to) an application and hardware for determining location via the global positioning system (GPS) and/or by communicating with a cellular network.

The power subsystems 548 may include (but are not limited to) one or more batteries and one or more external power supplies, such as an interface to an external power supply source.

The audio systems 550 are configured to generate sound, noise, voices, and music. In addition, the audio systems 550 are configured to capture and process sound, noise, voices, and music. The audio systems 550 include (but are not limited to) speakers and microphones.

The video display 552 is configured to display images, videos, text, user-interfaces, and the like. The video display 552 may include (but is not limited to) a liquid crystal display (LCD), a flat panel, a solid state display or other device. The video display 552 may operate as a digital viewfinder that allows a user to preview a scene before capturing an image and/or to view a movie as it is being captured.

The memory 554 is configured to store software 556 and data (e.g., digital photo or digital video) 558 thereon. The memory 554 is a working space and may include (but is not limited to) random access memory (RAM). The memory 554 may be one or more of the following (by way of example and not limitation): volatile or non-volatile, dynamic or static, read/write-able or read only, random- or sequential-access, location- or file-addressable, and the like. The components of the memory 554 may be internal to the device 510, attached externally to the device 510, or available via data communications (e.g., on the cloud).

Additional and Alternative Implementation Notes

The exemplary devices and apparatuses discussed herein that implement at least some aspect of the techniques discussed herein include (by way of illustration and not limitation): the video-capturing device 110 from FIG. 1, the video-capturing device 204 of FIG. 2, and the video-capturing telecommunications device 510 of FIG. 5.

Such exemplary devices may be what is commonly referred to as a “mobile phone,” “smartphone,” and/or “cellphone.” However, the described techniques can be used in conjunction with non-cellular technologies such as conventional analog AM or FM radio, Wi-Fi™, wireless local area network (WLAN or IEEE 802.11), WiMAX™ (Worldwide Interoperability for Microwave Access), Bluetooth™, and other analog and digital wireless voice and data transmission technologies. Alternative implementations of such devices might not have any telecommunications or wireless communication capability. Still other alternative implementations may utilize a wired communication link instead of or in addition to wireless communication.

The exemplary devices are not limited to those having a camera. Any device with the capability to acquire, collect, and/or manipulate digital images or moving digital images (i.e., video) may implement some aspect of the techniques discussed herein. Examples of such alternative devices include (but are not limited to) the following: tablet-based computer, other handheld computer, netbook, computer, digital camera, digital camcorder, handheld multimedia device, digital single-lens reflex (DSLR) camera, GPS navigational system, vehicle-based computer, or other portable electronics.

The exemplary devices may operate on digital videos acquired from sources other than their own video-capturing components. These other video sources may be other video-capturing devices or non-camera-equipped sources. The non-camera-equipped sources include other devices or services where images and/or videos are stored, collected, accessed, handled, manipulated, and/or viewed. Examples of such alternative image/video sources include (but are not limited to): a clip-processing kiosk, portable and removable storage media or devices (e.g., CD-ROM, DVD-ROM, other optical media, USB flash drive, flash memory card, external hard drive, etc.), electronics with radio-frequency identification (RFID), social-networking services (e.g., Facebook™, MySpace™, Twitter™, LinkedIn™, Ning™, and the like), and photo/video sharing services (e.g., Flickr™, Photobucket™, Picasa™, Shutterfly™, and the like).

One or more implementations of the techniques described herein may include an initial training session for the auto-selecting device or service to learn what makes a photo and/or video clip superlative (e.g., best or worst) in the user's opinion. This training may be repeated from time to time, and it sets the weight values of one or more of the weighted criteria. There may be various profiles for specific conditions and situations, in which the weighted criteria have their own trained values. Each user of an auto-selecting device or service may have their own customized weighted-criteria values and profiles of such values. An auto-selecting device or service may have a default set of values or profiles assigned. That default set or those profiles may be extrapolated from a statistical evaluation (e.g., mean or median) of the trained values derived from multiple users.
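Deriving a default profile from several users' trained values, as described above, might look like the following sketch. It takes the median of each criterion across the users who have a trained value for it. The function name and the dictionary layout of a profile are hypothetical.

```python
from statistics import median
from typing import Dict, List


def default_profile(user_profiles: List[Dict[str, float]]) -> Dict[str, float]:
    """Default weight profile: the median trained value of each criterion across users."""
    criteria = sorted(set().union(*user_profiles))
    return {
        criterion: median(p[criterion] for p in user_profiles if criterion in p)
        for criterion in criteria
    }
```

The median is less sensitive than the mean to a single user with extreme preferences. Replacing `median` with `statistics.mean` gives the mean-based alternative the text also mentions.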

Instead of a training session, one or more implementations may take advantage of a system of user ratings (e.g., thumbs up/down or 1 to 5 stars) for photos and/or video clips. These user ratings are effective data for training the exemplary video-capture device to learn what makes a photo and/or video clip superlative (e.g., best or worst) in the user's opinion. In some implementations, the user may define values for one or more of the weighted criteria. This may be accomplished using, for example, a slider-bar user interface with which the user adjusts the weights assigned to particular criteria or to categories of criteria.
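One way to learn weights from star ratings is sketched below. It uses the standard least-mean-squares (delta) rule, a technique chosen here for illustration rather than taken from the source. The sketch maps 1 to 5 stars onto a target in [-1, 1] (thumbs up/down could map to 5 and 1). It then nudges each criterion weight so that the weighted sum of a clip's property scores moves toward that target. The function names, learning rate, and epoch count are hypothetical.

```python
from typing import Dict, List, Tuple


def stars_to_target(stars: int) -> float:
    """Map a 1-5 star rating onto [-1, 1]: 1 -> -1.0, 3 -> 0.0, 5 -> 1.0."""
    return (stars - 3) / 2


def train_from_ratings(
    weights: Dict[str, float],
    rated_clips: List[Tuple[Dict[str, float], int]],
    learning_rate: float = 0.1,
    epochs: int = 50,
) -> Dict[str, float]:
    """Fit criterion weights to user ratings with the least-mean-squares (delta) rule."""
    w = dict(weights)
    for _ in range(epochs):
        for props, stars in rated_clips:
            predicted = sum(w.get(name, 0.0) * value for name, value in props.items())
            error = stars_to_target(stars) - predicted
            # Move each weight in proportion to its property's contribution to the error.
            for name, value in props.items():
                w[name] = w.get(name, 0.0) + learning_rate * error * value
    return w
```

A criterion that appears mostly in low-rated clips (for example, shake) ends up with a negative weight and so acts as a penalty in any later weighted ranking.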

In the above description of exemplary implementations, for purposes of explanation, specific numbers, materials, configurations, and other details are set forth in order to better explain the invention, as claimed. However, it will be apparent to one skilled in the art that the claimed invention may be practiced using different details than the exemplary ones described herein. In other instances, well-known features are omitted or simplified to clarify the description of the exemplary implementations.

The inventors intend the described exemplary implementations to be primarily examples. The inventors do not intend these exemplary implementations to limit the scope of the appended claims. Rather, the inventors have contemplated that the claimed invention might also be embodied and implemented in other ways, in conjunction with other present or future technologies.

Moreover, the word “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts and techniques in a concrete fashion. The term “techniques,” for instance, may refer to one or more devices, apparatuses, systems, methods, articles of manufacture, and/or computer-readable instructions as indicated by the context described herein.

As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise or clear from context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A; X employs B; or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more,” unless specified otherwise or clear from context to be directed to a singular form.

The exemplary processes discussed herein are illustrated as a collection of blocks in a logical flow graph, which represents a sequence of operations that can be implemented with hardware, software, firmware, or some combination thereof. In the context of software/firmware, the blocks represent instructions stored on one or more processor-readable storage media that, when executed by one or more processors, perform the recited operations. The operations of the exemplary processes may be rendered in virtually any programming language or environment including (by way of example and not limitation): C/C++, Fortran, COBOL, PASCAL, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), and the like, as well as object-oriented environments such as the Common Object Request Broker Architecture (CORBA), Java™ (including J2ME, Java Beans, etc.), Binary Runtime Environment (BREW), and the like.

Note that the order in which the processes are described is not intended to be construed as a limitation, and any number of the described process blocks can be combined in any order to implement the processes or an alternate process. Additionally, individual blocks may be deleted from the processes without departing from the spirit and scope of the subject matter described herein.

The term “processor-readable media” includes processor-storage media. For example, processor-storage media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, and magnetic strips), optical disks (e.g., compact disk (CD) and digital versatile disk (DVD)), smart cards, flash memory devices (e.g., thumb drive, stick, key drive, and SD cards), and volatile and non-volatile memory (e.g., random access memory (RAM), read-only memory (ROM)).

For the purposes of this disclosure and the claims that follow, the terms “coupled” and “connected” may have been used to describe how various elements interface. Such described interfacing of various elements may be either direct or indirect.

Claims

1. A method comprising:

obtaining a digital video;
producing a highlight reel of clips from the digital video, the producing comprising: analyzing metadata and content of the clips of the digital video; selecting multiple clips from the digital video based upon the analyzing and weighted selection criteria;
presenting, via a user-interface of a computing apparatus, the highlight reel to a user for feedback;
attaining feedback from the user regarding user-acceptability of the produced highlight reel; and
in response to attained user-feedback indicating a lack of user-acceptability of the produced highlight reel, producing an updated highlight reel using a reweighted selection of clips from the digital video.

2. A method as recited in claim 1, further comprising providing enhanced metadata to the digital video, the providing comprising:

generating enhanced metadata based upon, at least in part, existing metadata associated with the digital video;
associating the generated enhanced metadata with the digital video.

3. A method as recited in claim 1, wherein the selecting includes:

segregating the digital video into an intermediate grouping of clips based upon whether the clips match one or more specified criteria;
ranking the clips of the segregated intermediate grouping based upon multiple weighted criteria;
designating an allotment of similarly ranked clips of the segregated intermediate grouping as the produced highlight reel.

4. A method as recited in claim 3, wherein the designated allotment includes clips that are top ranked amongst the segregated intermediate grouping.

5. A method as recited in claim 3, wherein the designated allotment includes clips that are bottom ranked amongst the segregated intermediate grouping.

6. A method as recited in claim 3, further comprising calculating a cardinality of the designated allotment based upon a percentage of the segregated intermediate grouping.

7. A method as recited in claim 1, wherein the production of the updated highlight reel includes:

adjusting one or more weight values of the weighted criteria;
assembling the updated highlight reel by reselecting a collection of clips from the digital video based upon the adjusted weighted criteria.

8. A method as recited in claim 1, wherein the weighted criteria regarding a particular clip are selected from a group of clip properties consisting of identification of the video-capture apparatus, clip-acquisition location, presence of audio, absence of audio, motion, smiling faces, laughter, black frames, location, focus, contrast, shake, red eye, person in clip is a favorite, person in clip is a contact, clip is tagged, user-added criteria, clip was auto-corrected, and social network ranking of the clip.

9. A method as recited in claim 1, further comprising clustering the digital video into multiple clusters of clips, wherein the produced highlight reel is selected from one or more of the multiple clusters.

10. A method as recited in claim 1, further comprising clustering the highlight reel into multiple clusters.

11. One or more processor-readable storage devices having processor-executable instructions embodied thereon, the processor-executable instructions, when executed by one or more processors, direct the one or more processors to perform operations comprising:

obtaining a digital video;
producing a highlight reel of clips from the digital video, the producing comprising: analyzing metadata and content of the clips of the digital video; selecting multiple clips from the digital video based upon the analyzing and weighted selection criteria;
presenting, via a user-interface of a computing apparatus, the highlight reel.

12. One or more processor-readable storage devices as recited in claim 11, further comprising:

providing enhanced metadata to the digital video, the providing comprising: generating enhanced metadata based upon, at least in part, content of one or more clips of the digital video; associating the generated enhanced metadata with the digital video,
wherein the analyzing includes analyzing the enhanced metadata.

13. One or more processor-readable storage devices as recited in claim 11, further comprising:

attaining feedback regarding user-acceptability of the produced highlight reel;
in response to the attained user-feedback indicating a lack of user-acceptability of the produced highlight reel, producing an updated highlight reel using a reweighted selection of clips from the digital video.

14. One or more processor-readable storage devices as recited in claim 11, wherein the selecting includes:

segregating the digital video into an intermediate grouping of clips based upon whether the clips match one or more specified criteria;
ranking the clips of the segregated intermediate grouping based upon multiple weighted criteria;
designating an allotment of similarly top or bottom ranked clips of the segregated intermediate grouping as the produced highlight reel.

15. One or more processor-readable storage devices as recited in claim 11, wherein the production of the updated highlight reel includes:

adjusting one or more weight values of the weighted criteria;
assembling the updated highlight reel by reselecting a collection of clips from the digital video based upon the adjusted weighted criteria.

16. One or more processor-readable storage devices as recited in claim 11, wherein the weighted criteria regarding a particular clip are selected from a group of clip properties consisting of identification of the video-capture apparatus, clip-acquisition location, presence of audio, absence of audio, motion, smiling faces, laughter, black frames, location, focus, contrast, shake, red eye, person in clip is a favorite, person in clip is a contact, clip is tagged, user-added criteria, clip was auto-corrected, and social network ranking of the clip.

17. One or more processor-readable storage devices as recited in claim 11, wherein the obtaining comprises retrieving the digital video via a telecommunication network.

18. An apparatus comprising:

a video-storage unit configured to store digital videos;
a highlight-production unit configured to produce a highlight reel of clips from one or more of the stored digital videos, the highlight-production unit being further configured to: analyze metadata and content of the clips of the digital video; select multiple clips from the digital video based upon the analyzing and weighted selection criteria;
a highlight-presentation unit configured to present the produced highlight reel.

19. An apparatus as recited in claim 18, further comprising a user-interaction unit configured to attain feedback regarding user-acceptability of the produced highlight reel presented to a user via the highlight-presentation unit.

20. An apparatus as recited in claim 18, wherein the highlight-production unit is further configured to produce, in response to attained user-feedback indicating a lack of user-acceptability of the produced highlight reel, an updated highlight reel using a reweighted selection of clips from the digital video.

Patent History
Publication number: 20120189284
Type: Application
Filed: Mar 5, 2011
Publication Date: Jul 26, 2012
Inventors: Andrew Morrison (Bellevue, WA), Chris Park (Seattle, WA), Kevin Lau (Issaquah, WA), Parthu Kishen (Renton, WA), Desmond Smith (Seattle, WA), Michael Bibik (Seattle, WA)
Application Number: 13/041,370
Classifications
Current U.S. Class: With Video Gui (386/282); 386/E05.028
International Classification: H04N 5/93 (20060101);