SYSTEMS AND METHODS FOR USING VIDEO METADATA TO ASSOCIATE ADVERTISEMENTS THEREWITH
A system for using metadata from a video signal to associate advertisements therewith, comprising (i) a segmentation system to divide the video signal into video clips, (ii) a digitizing system for digitizing the video clips, (iii) a feature extraction system for extracting audio and video features from each video clip, associating each audio feature with respective video clips, associating each video feature with respective video clips, and saving the audio and video features into an associated metadata file, (iv) a web interface to the feature extraction system for receiving the video clips, and (v) a database, wherein video signals and associated metadata files are stored and indexed, wherein the associated metadata file is provided when a video player requests the corresponding video signal, enabling selection of a relevant advertisement for presentment in conjunction with respective video clips based on the associated audio and video features of the respective video clip.
This application is a Continuation of U.S. patent application Ser. No. 12/206,622, filed Sep. 8, 2008, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Ser. No. 60/970,593, entitled “Systems and Methods for Using Video Metadata to Associate Advertisements Therewith,” filed Sep. 7, 2007. The entire contents of the above-mentioned applications are hereby incorporated by reference for all purposes as if fully set forth herein. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application.
TECHNICAL FIELD
The present invention relates generally to targeted advertisements and, more particularly, to methods and systems for delivering targeted advertisements in association with a video program based on metadata associated with the video program.
BACKGROUND
An advertisement promotes the goods, services, organizations, ideas, etc. of an organization or company via a medium. Traditional advertisements were printed and made available on pamphlets, flyers, billboards, posters, newspapers, and magazines. As electronic technology developed, commercials were incorporated into multimedia content, such as radio, television, and movies, and were typically presented as an interruption of the primary content, occurring either before the primary content or at intervals during it. Today, advertisements are placed within television programs and movies through product placements and are available on the Internet and on electronically stored content (e.g., DVDs), such as in commercials, trailers, and in promotions on DVDs.
Traditional advertisements have typically targeted general audiences. Such advertisements can be tailored somewhat to the audience likely to be watching a movie, television program, show, event, or radio station or program, based on the general content of the program and on the likely demographics of its expected audience. The Internet provides advertisers with a more specifically targeted audience and, hence, a higher potential return on their advertisement expenses. For example, because each computer contains potentially trackable and usable information about its user(s) (e.g., through the use of cookies, location information, language settings, and prior websites accessed), Internet websites are able to use such information to generate banner or pop-up advertisements based on the information available about each computer's potential users. In yet another example, Internet search engine sites are able to “sell” the terms or keywords used by an Internet searcher to present targeted advertisements that have been associated with specific keywords or search terms. Such advertisements are presented in pop-up windows, banner advertisement windows, or as “sponsored” links to websites that have requested and paid for prominent placement on the search results screen for specific keywords or search terms. An Internet user who searches for particular keywords is more likely than a member of the general public to be a potential customer of the goods or services associated with those keywords.
With the continuing advance of technology, bandwidth, and availability of broadband access, online video viewing is becoming increasingly popular and promises to become even more prevalent with the continuing expansion and use of IPTV and video on demand. Unlike static or substantially static content (text, photographs) that is typically available on a webpage, that is updated only periodically (more frequently for a news webpage and much less frequently for a standard company webpage), and that holds a particular viewer for only a brief amount of time, commercial videos over the Internet provide an opportunity to capture a viewing audience for a substantially longer amount of time. However, audiences accustomed to watching movies and television on DVD or from a DVR are unwilling to view conventional advertisements that interrupt the flow of the video stream.
For these and many other reasons, there is a need for a technology platform that is able to provide and display advertisements that are targeted to the specific audience and that are tied to the specific programming being viewed. There is a need for methods and systems that enable such advertisements to be viewed selectively and simultaneously with the primary content in a way that does not interfere with the primary content. There are yet further needs for methods and systems that provide real-time advertisements for the viewer regardless of whether the viewer is accessing the content from the Internet or from a DVD or similar electronic media storage, so long as the display device has access to the Internet.
Therefore, it is apparent that a heretofore unaddressed need exists in the art to address the aforementioned deficiencies and inadequacies.
SUMMARY
The present invention, in one aspect, relates to a method for using metadata from a video signal to associate advertisements therewith. In one embodiment, the method includes (i) segmenting the video signal into a plurality of video clips, (ii) extracting audio and video features from the video signal, (iii) digitizing the plurality of video clips, (iv) identifying extracted audio features within respective digitized video clips using audio processing, wherein each audio feature is associated with the respective digitized video clip, (v) identifying extracted video features within respective digitized video clips using visual processing, wherein each video feature is associated with the respective digitized video clip, (vi) saving the associated audio features and associated video features in a metadata file, (vii) associating the metadata file with the video signal, (viii) storing the metadata file in a database, and (ix) providing the associated metadata file when a video player requests the corresponding video signal. The associated metadata file enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip.
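By way of illustration only, the following Python sketch shows one way the summarized pipeline might be realized end to end. The fixed-length segmentation policy, the hard-coded feature values, and all names are assumptions of the sketch rather than part of the disclosed method.

```python
# Illustrative sketch only: fixed-length segmentation and hard-coded
# features stand in for the disclosed segmentation and feature
# extraction systems.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class Clip:
    start: float                                 # seconds into the video signal
    stop: float
    audio_features: list = field(default_factory=list)
    video_features: list = field(default_factory=list)

def segment(duration: float, clip_len: float = 10.0) -> list:
    """(i) Divide the video signal into clips (fixed-length policy assumed)."""
    clips, t = [], 0.0
    while t < duration:
        clips.append(Clip(start=t, stop=min(t + clip_len, duration)))
        t += clip_len
    return clips

def build_metadata(video_id: str, duration: float) -> dict:
    clips = segment(duration)
    for clip in clips:
        # (iv)-(v) Stand-ins for the audio and visual processing stages.
        clip.audio_features = ["dialogue", "background music"]
        clip.video_features = ["beach scene", "two characters"]
    # (vi)-(viii) Save the per-clip features in a metadata file keyed to the video.
    return {"video_id": video_id, "segments": [asdict(c) for c in clips]}

print(json.dumps(build_metadata("VID-0001", 25.0), indent=2))
```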
The video features include at least one of (i) one or more people, (ii) one or more characters, (iii) one or more animals, (iv) one or more objects, (v) one or more geographic locations, (vi) background, (vii) one or more scenes, or a combination of these features. In one embodiment, these video features are extracted by a visual processing system of the feature extraction system. In another embodiment, the method includes the step of identifying and recognizing one or more objects from the video signal by an object classification system of the feature extraction system. In yet another embodiment, the method includes the step of identifying and recognizing one or more scenes from the video signal by a scene classification system of the feature extraction system. In yet another embodiment, the method includes a combination of both steps.
In one embodiment, the video signal may contain an accompanying audio signal. Audio features of the audio signal include at least one of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, or a combination of these audio features. In another embodiment, the method further includes the steps of: (i) identifying and recognizing one or more background sounds from the audio signal by using a sound classification system of the feature extraction system, (ii) identifying and recognizing one or more music segments from the audio signal by using a music classification system of the feature extraction system, and (iii) identifying and recognizing human speech, dialogues, one or more words, and one or more phrases by using a speech recognition system of the feature extraction system. In yet another embodiment, the method further includes the steps of: (i) collecting audio features of the audio signal by using an audio signal recognition system of the feature extraction system, and (ii) saving the collected audio features in the metadata file.
In one embodiment, the metadata file is an XML file. The metadata file contains one or more of (i) video identification information, (ii) a file name, (iii) a digital signature, (iv) the length of the video signal, (v) a keyword list, (vi) a time-coded transcript, (vii) one or more segments with a corresponding start and stop time, (viii) one or more contents, (ix) one or more characters, (x) one or more animals, (xi) one or more objects, and (xii) a list of vocabulary.
In another aspect, the present invention relates to a system for using metadata from a video signal to associate advertisements therewith. In one embodiment, the system has (i) a segmentation system for dividing the video signal into a plurality of video clips, (ii) a digitizing system for digitizing the plurality of video clips, (iii) a feature extraction system for extracting audio features and video features from each digitized video clip, associating each audio feature with at least one digitized video clip, associating each video feature with at least one digitized video clip, and saving the audio features and video features into a metadata file associated with the video signal, (iv) a web interface to the feature extraction system for receiving the digitized video clips, and (v) a database accessible by a third party user, wherein video signals and associated metadata files are stored and indexed, with a unique filename linking each metadata file in the database to its corresponding video signal. The associated metadata file is provided when a video player requests the corresponding video signal, and enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip.
In one embodiment, the video features comprise at least one of (i) one or more people, (ii) one or more characters, (iii) one or more animals, (iv) one or more objects, (v) one or more geographic locations, (vi) background, (vii) one or more scenes, and (viii) any combination thereof. In another embodiment, the video signal includes an accompanying audio signal.
In another embodiment, the audio features of the audio signal comprise one or more of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, and (vi) any combination thereof. In one feature, the feature extraction system further comprises an audio signal recognition (ASR) system to identify and recognize the audio features of the video signal, and a visual processing system to identify and recognize the visual features of the video signal. In another feature, the visual processing system further comprises an object classification system to identify and recognize one or more objects from the video signal, and a scene classification system to identify and recognize one or more scenes from the video signal. In yet a further feature, the audio signal recognition system further comprises a sound classification system to identify and recognize one or more background sounds from the audio signal, a music classification system to identify and recognize one or more music segments from the audio signal, and a speech recognition system to identify and recognize human speech, dialogues, one or more words, and one or more phrases. In another feature, the metadata file comprises one or more of video identification information, a file name, a digital signature, the length of the video signal, a keyword list, a time-coded transcript, one or more segments with a corresponding start and stop time, one or more contents, one or more characters, one or more pets, one or more objects, and a list of vocabulary.
These and other aspects of the present invention will become apparent from the following description of the preferred embodiment taken in conjunction with the following drawings, although variations and modifications therein may be effected without departing from the spirit and scope of the novel concepts of the disclosure.
The accompanying drawings illustrate one or more embodiments of the invention and, together with the written description, serve to explain the principles of the invention. Wherever possible, the same reference numbers are used throughout the drawings to refer to the same or like elements of an embodiment.
The present invention is more particularly described in the following examples that are intended as illustrative only since numerous modifications and variations therein will be apparent to those skilled in the art. Various embodiments of the invention are now described in detail. Referring to the drawings, like numbers indicate like components throughout the views. As used in the description herein and throughout the claims that follow, the meaning of “a”, “an”, and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The terms used in this specification generally have their ordinary meanings in the art, within the context of the invention, and in the specific context where each term is used.
Certain terms that are used to describe the invention are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the apparatus and methods of the invention and how to make and use them. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that the same thing can be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, and no special significance is to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any terms discussed herein, is illustrative only and in no way limits the scope and meaning of the invention or of any exemplified term. Likewise, the invention is not limited to the various embodiments given in this specification. Furthermore, subtitles may be used to help a reader of the specification read through the specification; the use of subtitles, however, has no influence on the scope of the invention.
As used herein, a video program refers to any multimedia content, such as a movie, a television program, an event, a video, an advertisement, a broadcast, or the like that a user would be interested in viewing online or in recorded format.
Turning now to
Before a specific video is provided to the viewer 150, a video file 120 associated with the video program 121 is preferably provided to a metadata generator 130. The video file 120 has or includes a unique file name or other video identifier (designated herein by the variable VID). As will be described in greater detail hereinafter, the metadata generator 130 receives the video file 120 and, using a metadata processor 133, creates or generates a time-coded metadata file 125 associated with the corresponding video file 120 and underlying video program 121. As shown in
When a request 140 for VOD or video streaming of the video program 121 associated with the video file 120 is received from a video display device 155 (such as a computer, Internet or interactive TV, or similar video playback or viewing device) of the viewer 150, the video provider 110 begins providing access to the video program 121 in conventional fashion (i.e., this assumes all communication and billing parameters are already or previously satisfied; such communication and billing parameters being beyond the scope of the present invention but within the scope and understanding of those skilled in the art). Simultaneously or substantially simultaneously with the start of the video streaming, the metadata file 125 associated with the video file 120 is provided to an advertisement distributor 160, which uses an advertisement server 163 to process the metadata file 125 and selectively identify, from its database 165 of potential advertisements, one or more advertisements that are appropriate to provide in conjunction with the video program 121 and, specifically, with each discrete segment of the video program 121 based on its time-coded metadata. The selected advertisement file(s) 175 are then provided to the video display device 155 of the viewer 150. The metadata file 125 may be provided in whole to the advertisement distributor 160, or it may be parsed and provided in piecemeal or “as needed” fashion to the advertisement distributor 160.
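By way of illustration only, the following sketch shows one simple way an advertisement server might match a segment's time-coded metadata against a keyword-indexed inventory. The inventory contents, the matching rule, and all names are assumptions of the sketch; the disclosure does not prescribe a particular selection algorithm.

```python
# Illustrative sketch only: a hypothetical keyword-indexed ad inventory
# matched against the audio and video features of one segment.

ADS = {
    "beach": "sunscreen_ad.mp4",
    "car": "auto_insurance_ad.mp4",
    "dog": "pet_food_ad.mp4",
}

def select_ads(segment_metadata: dict) -> list:
    """Return ads whose keyword appears in any of the segment's features."""
    features = set(segment_metadata.get("audio_features", []))
    features |= set(segment_metadata.get("video_features", []))
    return [ad for kw, ad in ADS.items()
            if any(kw in feature for feature in features)]

segment = {"start": 120.0, "stop": 130.0,
           "video_features": ["beach scene", "dog running"],
           "audio_features": ["waves", "dialogue"]}
print(select_ads(segment))   # -> ['sunscreen_ad.mp4', 'pet_food_ad.mp4']
```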
Preferably, as shown in
Although not shown in
In an optional embodiment of that shown in
Turning now to
Turning now to
Turning now to
Turning now to
Similar to the first embodiment, before a stored video program 117 is created and made available to a viewer 150, a video file 120 associated with the stored video program 117 is preferably provided to the metadata generator 130. The video file 120 has or includes a unique file name or other video identifier (designated herein by the variable VID). As will be described in greater detail hereinafter, the metadata generator 130 receives the video file 120 and, using a metadata processor 133, creates or generates a time-coded metadata file 125 associated with the corresponding video file 120 and underlying video program 121. This metadata file 125 is stored in a database 135 of the metadata generator 130 but is also provided back to the video provider 110 and associated with the corresponding video file 120 in video storage databases 115.
As part of the process for creating a stored video program 117, the metadata file 125 associated with the video file 120 is provided to the advertisement distributor 160, which uses an advertisement server 163 to process the metadata file 125 and selectively identify, from its database 165 of potential advertisements, one or more advertisements that are appropriate to provide in conjunction with the stored video program 117 and, specifically, with each discrete segment of the stored video program 117 based on its time-coded metadata. The selected advertisement file(s) 175 are then provided back to the video provider 110, which incorporates the advertisement files 175 directly on the stored video program 117 along with the actual video file 120. In this manner, the stored video program 117 has all necessary and desired advertisement files 175 built into the stored video program 117 and plays advertisements during viewing of the video in situations in which the video display device 155 does not (intentionally, unintentionally, because of incompatibility, or for whatever reason) have real-time access to the Internet to obtain real-time advertisements associated with the video. The remaining aspects, variations, and alternatives of this embodiment are similar to those discussed in association with the first embodiment.
Turning now to
For this reason, it is desirable to have the time-coded metadata file 125 actually stored on the stored video program 117 along with the video file 120 so that, when the video program is actually being viewed by the viewer 150 on the video display device 155, the video display device 155 initiates a communication with the advertisement distributor 160 to provide the time-coded metadata file 125 and to receive back appropriate advertisement file(s) 175. Again, in an alternative arrangement, it may be desirable for the viewer 150 to provide, or for the advertisement distributor 160 to have, user or video display device characteristics 185 (as described in greater detail previously) so that the advertisement files 175 associated with the time-coded metadata of the stored video program 117 are tailored and targeted slightly more at the viewer 150, but still associated with the appropriate segment of the video program.
In an additional, alternative embodiment (not shown), the embodiments shown in
It should also be understood that there are many other alternative arrangements and variations of how and where various files are stored and provided. The embodiments shown in
Generally, when a video program is received or converted to .mp4 format, an underlying time-code exists or is established for the video program. All audio and video metadata identified or extracted from the video program by the metadata processor 133 is then tied or associated with specific points or regions within the time code. Initially, key identifiers for the video program are determined and identified. These include all characters who appear in the video program, key or recurring scene locations, key props and objects, key terms, etc. The key identifiers are typically audio features and/or video features, and are extracted from the video signal. Then, the video portion of the video program is parsed and divided into “short clips” or discrete segments. Such segments can be specified by a predetermined time frame, but can alternatively be identified based on information within the video signal, such as, for example, a change of camera shots or angles, a scene change, a scene break, or the like. It should also be noted that different video segments can be defined by different predetermined time frames.
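By way of illustration only, the following sketch shows one conventional way such content-based segment breaks might be detected and tied to the time code, using frame-histogram comparison. OpenCV and the correlation threshold are assumptions of the sketch; the disclosure does not name a particular shot-detection technique.

```python
# Illustrative sketch only: shot-boundary detection via comparison of
# consecutive frame histograms, using OpenCV (an assumed tool).
import cv2

def find_segment_breaks(path: str, threshold: float = 0.5) -> list:
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    breaks, prev_hist, frame_no = [0.0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive frames suggests a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                breaks.append(frame_no / fps)   # tie the break to the time code
        prev_hist, frame_no = hist, frame_no + 1
    cap.release()
    return breaks
```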
Once the video signal is divided, each segmented video clip is digitized. The breaks between segments are identified and tied to the time-code timeline. Next, the metadata processor 133 runs a language and speech recognition process through the entire video and associates all of the dialogue and background audio with the appropriate video segments and time-codes. Next, characters within the video signal are associated with each of the dialogue entries. Finally, the metadata processor 133 runs a number of visual processing programs to identify characters, objects, and scenes within each segment of the video program. Each identified audio feature is thus associated with at least one segmented video clip. Similarly, each identified video feature is also associated with at least one segmented video clip.
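By way of illustration only, the following sketch shows one way time-stamped recognition output might be bucketed into the segment whose time window contains it, which is the association step just described. The (label, time) pairs stand in for real recognizer output; all names are assumptions of the sketch.

```python
# Illustrative sketch only: (label, time) pairs stand in for real
# time-stamped recognizer output.

def associate(features, breaks):
    """Bucket each time-stamped feature into the segment whose
    [breaks[i], breaks[i+1]) window contains its time code."""
    buckets = {i: [] for i in range(len(breaks))}
    for label, t in features:
        idx = max(i for i, b in enumerate(breaks) if b <= t)
        buckets[idx].append(label)
    return buckets

breaks = [0.0, 12.4, 31.9]                 # segment start times from segmentation
asr_output = [("hello", 1.2), ("beach", 14.0), ("dog", 35.5)]
print(associate(asr_output, breaks))       # {0: ['hello'], 1: ['beach'], 2: ['dog']}
```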
The associated metadata file enables selection of a relevant advertisement for presentment in conjunction with each respective digitized video clip of the corresponding video signal based on the associated audio features and the associated video features of the respective digitized video clip. Those of skill in the art will readily appreciate that presentment is typically implemented by a visual display device, but may also include email, file delivery, and other delivery methods.
The video features identified by the visual processor include at least one of (i) people, (ii) characters, (iii) animals, (iv) objects, (v) geographic locations, (vi) background, (vii) scenes, or a combination of any of these features. Preferably, these video features are extracted by a visual processing system of the feature extraction system. In one embodiment, the method includes the step of identifying and recognizing one or more objects from the video signal by an object classification system of the feature extraction system. In another embodiment, the method includes the step of identifying and recognizing one or more scenes from the video signal by a scene classification system of the feature extraction system. In yet another embodiment, the method includes a combination of both steps.
Audio features of the audio signal include at least one of (i) a list of one or more words, (ii) speeches by one or more people, (iii) dialogue by one or more people, (iv) music, (v) background sound, or a combination of these audio features. The method further includes the steps of: (i) identifying and recognizing one or more background sounds from the audio signal by using a sound classification system of the feature extraction system, (ii) identifying and recognizing one or more music segments from the audio signal by using a music classification system of the feature extraction system, and (iii) identifying and recognizing human speech, dialogues, one or more words, and one or more phrases by using a speech recognition system of the feature extraction system. The method further includes the steps of: (i) collecting audio features of the audio signal by using an audio signal recognition system of the feature extraction system, and (ii) saving the collected audio features in the metadata file.
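By way of illustration only, the following structural sketch shows one way the feature extraction system's subsystems might be composed: an audio signal recognition side (sound, music, and speech classifiers) and a visual processing side (object and scene classifiers) feeding a shared feature list. The classifier internals are stubs, not real models; all names are assumptions of the sketch.

```python
# Structural sketch only: classifier internals are stubs standing in
# for trained models.

class SoundClassifier:
    def classify(self, audio):
        return ["waves"]                    # stub: background sounds

class MusicClassifier:
    def classify(self, audio):
        return ["orchestral score"]         # stub: music segments

class SpeechRecognizer:
    def classify(self, audio):
        return ["look at that dog"]         # stub: recognized speech

class ObjectClassifier:
    def classify(self, frames):
        return ["dog", "umbrella"]          # stub: recognized objects

class SceneClassifier:
    def classify(self, frames):
        return ["beach"]                    # stub: recognized scenes

class FeatureExtractor:
    """Audio signal recognition side plus visual processing side."""
    audio_systems = [SoundClassifier(), MusicClassifier(), SpeechRecognizer()]
    visual_systems = [ObjectClassifier(), SceneClassifier()]

    def extract(self, audio, frames):
        return {
            "audio_features": [f for s in self.audio_systems
                               for f in s.classify(audio)],
            "video_features": [f for s in self.visual_systems
                               for f in s.classify(frames)],
        }

print(FeatureExtractor().extract(audio=None, frames=None))
```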
Preferably, the metadata file is in XML format. An exemplary portion of a time-coded metadata file, in XML format, is illustrated in
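Because that figure is not reproduced here, the following is only a hedged reconstruction of what a small time-coded metadata file carrying the fields listed above might look like, built with Python's xml.etree.ElementTree. The element names and schema are illustrative assumptions, not the actual format shown in the figure.

```python
# Hedged reconstruction only: element names and schema are illustrative
# assumptions, not the actual format shown in the patent's figure.
import xml.etree.ElementTree as ET

video = ET.Element("video", VID="VID-0001", filename="program.mp4",
                   length="00:42:17")
ET.SubElement(video, "keywords").text = "beach, dog, sunscreen"

seg = ET.SubElement(video, "segment", start="00:02:00", stop="00:02:10")
ET.SubElement(seg, "transcript").text = "Look at that dog on the beach!"
ET.SubElement(seg, "objects").text = "dog, beach umbrella"
ET.SubElement(seg, "scene").text = "beach"

ET.indent(video)                            # pretty-print (Python 3.9+)
print(ET.tostring(video, encoding="unicode"))
```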
The foregoing description of the exemplary embodiments of the invention has been presented only for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching.
The embodiments were chosen and described in order to explain the principles of the invention and their practical application so as to enable others skilled in the art to utilize the invention and various embodiments and with various modifications as are suited to the particular use contemplated. Alternative embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its spirit and scope. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description and the exemplary embodiments described therein.
Claims
1-24. (canceled)
25. A method comprising:
- extracting, by a server, audio and visual features from each segment of a plurality of segments of a content item;
- cross-referencing, by the server, a database with an identifier for the content item;
- updating, at the server, time-based metadata associated with the content item in the database with the audio and visual features of each segment of the plurality of segments of the content item;
- transmitting, by the server, the content item to a display device;
- transmitting, by the server, a portion of time-based metadata corresponding to a first segment of the plurality of segments to an advertising distributor that selects a relevant advertisement based at least in part on the portion of time-based metadata corresponding to the first segment;
- receiving, by the server, the relevant advertisement from the advertising distributor; and
- transmitting, by the server, the relevant advertisement and the time-based metadata to the display device.
26. The method of claim 25 wherein transmitting, by the server, the portion of time-based metadata corresponding to the first segment of the plurality of segments to an advertising distributor comprises:
- determining an object in the first segment;
- transmitting, by the server, metadata associated with the object.
27. The method of claim 25 further comprising determining, by the server, a location in the content item to insert an advertisement.
28. The method of claim 27 further comprising:
- determining the portion of the time-based metadata to transmit to an advertising distributor based on the determined location in the content item to insert the advertisement.
29. The method of claim 25 wherein extracting, by a server, audio and visual features from each segment of the plurality of segments of the content item comprises:
- identifying, by the server, extracted audio features from the content item using audio processing, wherein the extracted audio features comprise time-code locations of each segment of the plurality of segments within the content item.
30. The method of claim 29, wherein the extracted audio features include one or more of discrete sounds and background noise.
31. The method of claim 25 wherein extracting, by a server, audio and visual features from each segment of the plurality of segments of the content item comprises:
- identifying, by the server, extracted video features from the content item using video processing, wherein the extracted video features comprise time-code locations of each segment of the plurality of segments within the content item.
32. The method of claim 31, wherein the extracted video features include one or more of actors, characters, animals, objects, geographic locations, background, setting, theme, events, and scenes.
33. The method of claim 25, wherein cross-referencing the database with an identifier for the content item comprises:
- determining that the content item has been processed previously;
- in response to determining that the content item has been processed previously, cross-referencing the identifier for the content item and an existing video signature.
34. The method of claim 25 wherein the relevant advertisement is selected, by the advertising distributor, at least in part based on audio and visual features of a segment of the plurality of segments.
35. A system comprising:
- a video provider configured to:
- extract audio and visual features from each segment of a plurality of segments of a content item;
- cross-reference a database with an identifier for the content item;
- update time-based metadata associated with the content item in the database with the audio and visual features of each segment of the plurality of segments;
- input/output circuitry configured to:
- transmit the content item to a display device;
- transmit a portion of the time-based metadata corresponding to a first segment of the plurality of segments to an advertising distributor that selects a relevant advertisement based at least in part on the portion of time-based metadata corresponding to the first segment;
- receive the relevant advertisement from the advertising distributor; and
- transmit the relevant advertisement and the time-based metadata to the display device.
36. The system of claim 35 wherein the input/output circuitry, when transmitting a portion of the time-based metadata corresponding to the first segment of the plurality of segments to an advertising distributor, is further configured to:
- determine an object in the first segment;
- transmit metadata associated with the object.
37. The system of claim 35, wherein the video provider is further configured to determine a location in the content item to insert an advertisement.
38. The system of claim 37, wherein the video provider is further configured to determine the portion of the time-based metadata to transmit to an advertising distributor based on the determined location in the content item to insert the advertisement.
39. The system of claim 35, wherein the video provider is further configured to, when extracting audio and visual features from each segment of the plurality of segments of the content item:
- identify extracted audio features from the content item using audio processing, wherein the extracted audio features comprise time-code locations of each segment of the plurality of segments within the content item.
40. The system of claim 39, wherein the video provider, when identifying extracted audio features from the content item using audio processing, is further configured to identify one or more of discrete sounds and background noise.
41. The system of claim 35, wherein the video provider is further configured to, when extracting audio and visual features from each segment of the plurality of segments of the content item:
- identify extracted video features from the content item using video processing, wherein the extracted video features comprise time-code locations of each segment of the plurality of segments within the content item.
42. The system of claim 41, wherein the video provider, when identifying extracted visual features from the content item using video processing, is further configured to identify one or more of actors, characters, animals, objects, geographic locations, background, setting, theme, events, and scenes.
43. The system of claim 35 wherein the video provider, when cross-referencing the database with an identifier for the content item, is further configured to:
- determine that the content item has been processed previously;
- in response to determining that the content item has been processed previously, cross-reference the identifier for the content item and an existing video signature.
44. The system of claim 35 wherein the advertising distributor is further configured to select the relevant advertisement based at least in part on audio and visual features of a segment of the plurality of segments.
Type: Application
Filed: Sep 13, 2023
Publication Date: Jan 4, 2024
Inventors: Matthew G. Berry (Raleigh, NC), Benjamin J. Weinberger (Durham, NC), Schuyler E. Eckstrom (Beaufort, SC), Albert L. Segars (Beaufort, SC)
Application Number: 18/367,794