Systems and Methods for Forecasting Program Viewership
Systems and methods for predicting who is watching a program are disclosed. Text related to the program can be reviewed. Pre-determined genre words and pre-determined keywords can be determined. Words from the text that are relevant words can be determined. How closely the relevant words coincide with the pre-determined genre words can be determined by generating a breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords. Who will watch the program can be predicted based on the breakdown.
This application is a continuation application of U.S. application Ser. No. 16/444,514, filed Jun. 18, 2019, which claims the benefit of U.S. Provisional Patent Application No. 62/686,402, filed Jun. 18, 2018, the disclosures of which are hereby incorporated by reference in their entireties.
BRIEF DESCRIPTION OF THE DRAWINGS
Various objectives, features, and advantages of the disclosed subject matter can be more fully appreciated with reference to the following detailed description of the disclosed subject matter when considered in connection with the following drawings, in which like reference numerals identify like elements.
The drawings are not necessarily to scale, or inclusive of all elements of a system, emphasis instead generally being placed upon illustrating the concepts, structures, and techniques sought to be protected herein.
DETAILED DESCRIPTION OF ASPECTS OF THE DISCLOSURE
Systems and methods for forecasting program viewership are provided. First, training (e.g., deep machine learning) can be done against enriched historical data, which can result in a forecast model for a program (e.g., a movie, a show in a series). Second, forecasting can be done against the enriched future schedule data utilizing the previously trained model.
Many types of data can be analyzed using the systems and methods described herein, comprising: historical ratings data, historical schedule data, or historical content metadata, or any combination thereof.
Historical Ratings Data. In some embodiments, the core data for these processes can be provided by the customer (e.g., a cable station that wishes to sell advertising for certain programs). The historical ratings data at various levels of granularity can be provided to the customer by a rating agency (e.g., Nielsen, GfK, Kantar, Comscore). The rating agency can collect data from multiple sources, comprising in some embodiments a weighted audience panel. The rating agency data can comprise a summary of ratings measurements (e.g., ratings for demographic groups by time unit); and/or accompanying information for the program (e.g., network, time, program name, duration). In addition to any rating agency data, the customer can have their own rating information, or may have competitor rating information. Additional data sources can also be used by the customer, such as set-top-box or digital device viewing data.
Historical Schedule Data. Basic schedule ratings data can be provided by a ratings agency. In addition, the customer's own scheduling data can be used. It can include additional information on the aired content, content airing patterns (e.g., series, repeats), breakdown of the content with break information (e.g., commercial and promotion). Sources such as online publishers of Electronic Program Guides (EPGs) can also be used for schedule data.
Historical Content Metadata. Ratings agencies may not have much information on the aired program beyond the program name and occasionally a genre descriptor. A broadcaster may have metadata in a Broadcast Management System (BMS) and/or a Rights Management System (RMS). These systems can provide input such as content types, a year of production, a distributor, cast members, box office statistics, etc. This type of information may also be obtained by a customer from a third party resource.
In some embodiments, Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be utilized. In some embodiments, the systems and methods can take into consideration, for example, the following: scheduling time, rating by basic audience demographics, content metadata, content analysis (e.g., which may be referred to herein as a content DNA analysis), audience engagement with content metrics (e.g., which may be referred to herein as BUZZ), competitor viewership impact, series extended trend analysis, or advanced time series analysis, or any combination thereof.
In some embodiments, long data sets (e.g., 2-3 years of historical data) can be used for the forecasting. In other embodiments, shorter data sets (e.g., 4-8 weeks) can be utilized. In other embodiments, long and shorter data sets can be used.
Example Algorithmic Processes
The forecast system can predict future viewership (e.g., sometimes referred to as ratings). In some embodiments, the forecast system can predict viewership: within a specific audience segment, for a specific schedule time, or for a specific playback period (e.g., for a specific scheduled long form content such as an event, movie, episode of a series, etc.), or any combination thereof. In some embodiments, the forecast system can predict the affinity of an audience segment to a specific schedule time and scheduled content. In other embodiments, a content similarity analysis can be used to better predict viewership success.
In some embodiments, several algorithmic subcomponents can be utilized, including, but not limited to, the following:
1. Time-series forecaster algorithm
2. Content finder/matcher algorithm
3. Content-DNA analysis algorithm
4. Content-DNA analysis enricher algorithm
5. Social-Buzz analysis algorithm
6. Social-Buzz analysis enricher algorithm
7. An overarching algorithm (e.g., a gradient-boosted tree algorithm such as XGBoost—see, for example, https://en.wikipedia.org/wiki/Xgboost, which is incorporated by reference in its entirety, for background information).
In some embodiments, the overarching algorithm can comprise multiple components (e.g., any combination of subcomponents 1-7 above), and can also utilize additional forecasting-impacting features, including, but not limited to: audience demographic and/or psychographic descriptors, analysis of historical viewership of the specific program or similar programs, broadcast date and/or time related attributes (e.g., date, time, day in year, week day, holidays), content duration and/or advertising break utilization, playback period (e.g., live, same day as live, C3 (e.g., viewed with same commercials within three days of broadcast), or C7 (e.g., viewed with same commercials within seven days of broadcast)), program pattern indicators (e.g., live, network original, network premiere, repeat), or any combination thereof. The overarching algorithm can comprise XGBoost, or another machine learning algorithm, such as a decision tree and/or a Long Short-Term Memory (LSTM) network.
Time-series forecasting algorithms can predict the future based on models fit on historical data. The time-series forecaster component can learn the historical viewership of a given series, and/or the related airing of same/similar content, and can predict the viewership of the next airing based on this analysis.
Multiple time-series models can be used in the forecasting system, comprising a Prophet Algorithm and/or linear and/or polynomial regression. The Prophet algorithm can consider linear and/or logistic growth curve trends, and/or any combination of annually, monthly and weekly cyclicality, and/or a user-provided list of important time attributes including exceptions such as holidays. Background information on the Prophet Algorithm can be found at https://research.fb.com/prophet-forecasting-at-scale/, which is incorporated by reference in its entirety.
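To make the time-series component concrete, below is a minimal, hypothetical sketch of a trend-plus-weekly-cyclicality forecaster in the spirit described above. It is not the Prophet implementation; the function name and sample data are illustrative assumptions.

```python
# Hypothetical sketch: forecast the next value as the value one season ago
# plus the average season-over-season change (a crude trend + cyclicality model).

def forecast_next(history, season=7):
    """Forecast one step ahead from a daily history with weekly cyclicality."""
    # Average change between each point and the point one season earlier.
    diffs = [history[i] - history[i - season] for i in range(season, len(history))]
    trend = sum(diffs) / len(diffs)
    # Repeat the value from one season ago, shifted by the learned trend.
    return history[-season] + trend

# Two weeks of daily ratings; the second week runs one point above the first.
ratings = [1, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8]
print(forecast_next(ratings))  # 3.0
```

A production system would replace this with a fitted model (e.g., Prophet or a polynomial regression), but the decomposition into cyclicality plus trend is the same idea.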
Content Finder/Matcher Algorithm
Carrying out data enrichment from web sources can require content identification across multiple web sources. The data available for such content identification can be very limited (e.g., only the content name). The forecasting system can use a content finder and/or matcher algorithm, based on multiple web sources, that tries to find similar aired content based on a series of attributes being evaluated (e.g., similar channel, similar date and time, similar year of release, similar genres, similar cast, similar director). The output of this process can be matched pointers to the content in multiple web sources that can be used for data enrichment. For example, if a user is interested in the Wonder Woman movie from 2017, the content finder and/or matcher algorithm could search multiple pre-designated web sources for "Wonder Woman", "2017", etc., and keep adding attributes until only one program is returned. In other embodiments, more than one program can be returned, or the algorithm can indicate a probability that the returned program is the one the user wants.
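The attribute-narrowing search can be sketched as below; the catalog contents, attribute names, and `match_content` function are illustrative assumptions, not the disclosed implementation.

```python
# Hypothetical mini-catalog standing in for multiple web sources.
CATALOG = [
    {"title": "Wonder Woman", "year": 2017, "genre": "action"},
    {"title": "Wonder Woman", "year": 1975, "genre": "adventure"},
    {"title": "Wonder Woman 1984", "year": 2020, "genre": "action"},
]

def match_content(query):
    """Apply query attributes one at a time until at most one candidate remains."""
    candidates = CATALOG
    for key, value in query.items():
        narrowed = [c for c in candidates if c.get(key) == value]
        if len(narrowed) <= 1:
            return narrowed
        candidates = narrowed
    return candidates

print(match_content({"title": "Wonder Woman", "year": 2017}))
# [{'title': 'Wonder Woman', 'year': 2017, 'genre': 'action'}]
```

Returning the narrowed list (rather than a single item) leaves room for the embodiments where several candidates, or a match probability, are reported.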
Content DNA Analysis Algorithm
The content DNA engine was developed due to a need to represent the content (e.g., what the content is about) in a numerical manner that could be fed into the overarching algorithm. This can be done by generating a fixed-size vector of numbers (e.g., scorings), one per contextual topic.
In some embodiments, methods and systems for predicting who is watching a program can comprise: review text related to the program, the text comprising: plot information, sub-title information, summary information, script information, or synopsis information, or any combination thereof; determine pre-determined genre words and pre-determined keywords based on machine learning analysis of historical programs; determine which words from the text are relevant words, the relevant words being words that help identify genre words or keywords; determine how closely the relevant words coincide with the pre-determined genre words by generating a DNA breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords; and predict who will watch the program based on the DNA breakdown.
There are many data sets available (e.g., on the web, from services) that can provide attributed topics (e.g., genre) for content. In some embodiments, rather than using a limited topic set and binary values (e.g., content is romance (1) + comedy (1), while other topics get a (0)), a scoring value (e.g., content is 70% romance and 30% comedy) can be used. In this way, deep learning technology can be used with a large content data set. An artificial neural network (ANN) can thus be used on thousands of programs (e.g., movies and series episodes). For example, in some embodiments a database including information on 85,000 movies and 50,000 series can be utilized. The input layer to the ANN can be text about and/or from all the programs. The text can comprise commonly used short synopses, plot descriptions, original scripts (when available), and subtitles. In some embodiments, subtitle information can be used before the other types of text, as it can generate more accurate forecasting.
The text can be transformed from word embeddings to a fixed-size vector of length 780 using, for example, Stanford's GloVe algorithm. Background information on the GloVe algorithm can be found at https://nlp.stanford.edu/projects/glove/, which is incorporated by reference in its entirety.
Each element in the vector can represent a “cluster” of words, and the value of the cluster can be the amount of “presence” of this cluster in the text of the content.
Each cluster can represent a general concept and/or subject and/or idea, and each word in the cluster can relate to this idea. For example, two example generated clusters can be:
- {india, goa, Bengal, assam, Ceylon, indian, Andhra, tamil}
- {scam, embezzlement, obstruction, bribery, payoff, cheating, evasion, insider, rigging, fixing, forgery, scheme, corruption, blackmail, multimillion, perjury, theft, misuse, racketeering, extortion, fraud, graft, collusion, peddling, bribe}
In the second cluster, there are words that are not directly related (e.g., multimillion and bribe), but because the algorithm can detect that those words often come together, they can be clustered together.
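As an illustration of how cluster "presence" scoring might work, the sketch below scores a text against two toy clusters drawn from the examples above. The scoring rule (fraction of words falling in each cluster) is a simplifying assumption, not the disclosed implementation.

```python
# Toy clusters, abbreviated from the example clusters in the text.
CLUSTERS = {
    "india": {"india", "goa", "bengal", "assam", "ceylon", "indian"},
    "crime": {"scam", "bribery", "fraud", "extortion", "theft", "corruption"},
}

def dna_vector(text):
    """Score each cluster by the fraction of the text's words it contains."""
    words = text.lower().split()
    total = len(words)
    return [sum(w in cluster for w in words) / total
            for cluster in CLUSTERS.values()]

text = "a tale of bribery and fraud across india"
print(dna_vector(text))  # [0.125, 0.25]
```

The real engine learns its clusters and scores via word embeddings and an ANN; the point here is only the shape of the output: one continuous presence value per cluster.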
The output layer of the ANN can initially be genres. For example, in some embodiments, a selected list of 26 values picked from Internet Movie Database (e.g., IMDB.com), Wikipedia, and an entity's Broadcast Management System (IBMS) can be utilized as genres. In this way, the ANN can actually learn to predict the genres of the program and their significance from the text of the program.
In addition, in some embodiments, the accuracy of the prediction can be increased by further improving the DNA values by adding the ability to score against commonly used keywords associated with content. A large matrix of programs × keywords associated with these programs can be used. For example, 16,000+ programs and thousands of keywords associated with those programs can be used. For each of the 16,000+ movies, a bitmap against all the keywords can be created. A 1 can be designated if the keyword belongs to the movie, and a 0 can be designated if not. A pre-defined threshold can be applied, and keywords that appear fewer times than the pre-defined threshold can be deleted.
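The bitmap and threshold step can be sketched as follows; the program names, keywords, and threshold value are invented for illustration.

```python
# Invented program -> keyword tags.
PROGRAMS = {
    "Movie A": {"christmas", "new-york"},
    "Movie B": {"christmas", "police"},
    "Movie C": {"christmas", "new-york", "police"},
    "Movie D": {"submarine"},
}
THRESHOLD = 2  # keywords appearing in fewer programs than this are dropped

# Count how many programs carry each keyword, then prune rare keywords.
all_keywords = sorted({k for kws in PROGRAMS.values() for k in kws})
counts = {k: sum(k in kws for kws in PROGRAMS.values()) for k in all_keywords}
kept = [k for k in all_keywords if counts[k] >= THRESHOLD]

# One bitmap row per program: 1 if the kept keyword belongs to it, else 0.
bitmap = {p: [int(k in kws) for k in kept] for p, kws in PROGRAMS.items()}
print(kept)               # ['christmas', 'new-york', 'police']
print(bitmap["Movie C"])  # [1, 1, 1]
```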
A co-occurrence matrix (e.g., of dimension (number of keywords) × (number of keywords)), i.e., a matrix of how many times each keyword appeared with each other keyword, can be created.
At this stage, a vector representation for each keyword can be created. The keywords can then be clustered using the vector representations (e.g., using a co-occurrence algorithm in some embodiments). In this way, clusters of similar keywords can be generated, which clusters can be based on the likelihood that those keywords will appear together. The calculated clusters can be reviewed manually and the clusters can be labeled with a name.
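The co-occurrence matrix construction can be sketched as follows, with entry [i][j] counting how many programs are tagged with both keyword i and keyword j (toy data, for illustration only):

```python
# Toy keyword sets for three programs.
PROGRAMS = [
    {"christmas", "new-york"},
    {"christmas", "police"},
    {"christmas", "new-york", "police"},
]

keywords = sorted({k for kws in PROGRAMS for k in kws})
# cooc[i][j] = number of programs tagged with both keywords[i] and keywords[j].
cooc = [[sum(a in kws and b in kws for kws in PROGRAMS) for b in keywords]
        for a in keywords]

for k, row in zip(keywords, cooc):
    print(k, row)
# christmas [3, 2, 2]
# new-york [2, 2, 1]
# police [2, 1, 2]
```

Each row of this matrix can then serve as the keyword's vector representation for the clustering step.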
For example, the following words could be in a cluster:
- christmas
- christmas-carol
- christmas-decorations
- christmas-eve
- christmas-gift
- christmas-lights
- christmas-party
- christmas-present
- christmas-tree
This cluster can be labeled “Christmas”. When keyword clusters such as these (e.g., 35 clusters such as Christmas, birth/children, New York, police) are added to the 26 genres (e.g., drama, comedy, sci-fi), the total length of the DNA representation is, in this example, 26+35=61. In some embodiments, both genres and keyword clusters can be customized. In other embodiments, the number and names of genres can be fixed, but the keyword clusters can be customized. For example, a user could determine to use any number of keyword clusters (e.g., 35 to 135).
The DNA enricher's role is to obtain text about the movie and pass it through the Content DNA engine to generate the topic scoring. Once content is identified, the enricher seeks the subtitles, or other text related to the movie (a.k.a. a fallback process), and generates the DNA.
We frequently use the IMDB ID of the movie as a validator for the correct subtitles (e.g., allocating subtitle files named with the IMDB ID reference).
Social-Buzz Analysis Algorithm
The social buzz component can provide insights, collected from various web data sources, about the "trendiness" of programs.
The deep learning algorithm can be trained to take the number of wiki views as its input, and the ratings as the output. The deep learning algorithm can predict the ratings fairly well, even without any other feature provided, using only the number of wiki exposures to the content. Other types of social media and content-related page visits can also be utilized.
Like the DNA, the role of the social buzz enricher can be to collect data from selected websites on the buzz around the content, while validating it against several parameters.
The collected data can be summarized as a buzz “best features” analysis and added as controlled input to the overarching algorithm.
Overarching Algorithm
The overarching algorithm can use the above components, as well as many other components known to those of ordinary skill in the art, to create a prediction of audience viewership. It can include, in addition to commonly used attributes, a deep analysis of the content context and/or the audience engagement (buzz) with the content, and thus analyze massive amounts of historical data in an advanced manner. The boosting algorithm (e.g., XGBoost) can be used with added operational controls.
Example Algorithms and Features
BUZZ Algorithm.
BUZZ = Σi=1..n (CP + SM + SNT) * RG * TS
where:
- CP—Content Pages—related web page daily number of visits
- SM—Factored (see below) Social Media daily mentions
- SNT—Daily average Sentiment Scoring—in some embodiments, a host's own machine learning based user commentary sentiment scoring can be used
- RG—Regional source relevancy factor—can evaluate how a specific data source is relevant to the content airing region
- TS—Time shift factor (see below)
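Read literally, with RG and TS applied to the summed daily terms, the BUZZ formula can be sketched as follows. The sample values are invented, and the grouping of the factors is an assumption.

```python
# Hypothetical sketch of the BUZZ formula: sum daily (CP + SM + SNT) terms
# over n days, then scale by the regional relevancy (RG) and time shift (TS).

def buzz(daily, rg, ts):
    """daily: list of (CP, SM, SNT) tuples; rg, ts: scalar factors."""
    return sum(cp + sm + snt for cp, sm, snt in daily) * rg * ts

# Two invented days of content-page visits, factored SM mentions, sentiment.
daily = [(1200, 300, 0.8), (900, 450, 0.6)]
print(round(buzz(daily, rg=1.0, ts=0.5), 1))  # 1425.7
```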
Social Media Count. In some embodiments, a factored Social Media (SM) mention count can be determined as follows:
where:
- PRT—Primary account (@) retweets. Can comprise main account daily messaging retweets. For example, if given @westworldhbo, the information in FIG. 5 can be found.
- SMP—Primary content hashtag (#) mentions. A content-specific hashtag can ensure unique content identification. For example, #empireFOX ensures that not just any "empire"-related mention is checked. An example mention:
  - Empire Boo Boo Kitty @EmpireBBK
  - Before You Fall Asleep Everyday, Say Something Positive To Yourself: #empire #empirefox #inspiration
- In some cases, multiple options could be used as a primary hashtag. For example, #westworld and #westworldhbo.
- SMS—Secondary content hashtag (#) mentions. For example, a main cast member, director, or a specific season, such as #jonahnolan.
- SMSr—Secondary content hashtag (#) relevancy factor. This can factor in the likelihood that a secondary hashtag mention is contributing to BUZZ. For example, a season-specific hashtag during airing can be set as extremely relevant (e.g., a 1.0 factor), while an actor hashtag mentioned on its own can be set as less relevant (e.g., a 0.2 factor).
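The exact combination of these terms is not reproduced in this text. Purely as an assumption, one plausible reading is a sum in which secondary mentions are down-weighted by the relevancy factor:

```python
# Assumed (not disclosed) combination: primary terms counted fully, secondary
# hashtag mentions scaled by their relevancy factor.

def social_media_count(prt, smp, sms, smsr):
    """prt: primary retweets, smp: primary hashtag mentions,
    sms: secondary hashtag mentions, smsr: secondary relevancy factor (0..1)."""
    return prt + smp + smsr * sms

print(social_media_count(prt=120, smp=480, sms=200, smsr=0.2))  # 640.0
```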
Time Shift Factor. Some data sources can provide user engagement during original airing of content. In some embodiments, the BUZZ algorithm can use information about a schedule in a different region or about a repeated airing long after the original airing. Examples comprise: a movie airing on TV (e.g., several months after the theatrical release), an originally-aired American network content being licensed to a different country, or a rerun of a series.
The following time shift algorithm can be used:
TS=K*(1/D*(Oa−Ea))
- K—a set constant for each Time Shift reason (e.g., for a movie gap from a theatrical release, for a regional gap for an origin country release, or a time gap for an original release in the same region). The constant can be set by Machine Learning training on rating relations between the time shifted airing across the regions and/or sources.
- D—Time decay factor. This can be how fast the BUZZ impact is reduced over a timeline.
- Oa—Original Airing and/or theater release date.
- Ea—Evaluated airing date.
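One hedged reading of the TS formula, in which the factor shrinks as the gap between the original and evaluated airing dates grows, can be sketched as follows. The grouping of terms and the sample constants are assumptions.

```python
from datetime import date

# Assumed reading: TS = K / (D * gap_days), so a larger gap or faster decay
# yields a smaller time-shift factor. K and D values are invented examples.

def time_shift(k, d, original_airing, evaluated_airing):
    gap_days = (evaluated_airing - original_airing).days
    return k / (d * gap_days)

# A movie airing on TV 100 days after its theatrical release:
print(time_shift(k=2.0, d=0.1,
                 original_airing=date(2018, 1, 1),
                 evaluated_airing=date(2018, 4, 11)))  # 0.2
```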
For example, for data provided for US primetime forecasting over a 2-month data gap, a particular BUZZ pattern was established as most effective.
Secondary Hashtag Evaluation. As explained above, hashtags can be used to identify topics of social media discussions. Long-form content (e.g., movies and/or series) can be promoted via a common main hashtag (e.g., defined by an originator and/or publisher). Related sub-hashtags can evolve to denote a character, cast, plotline, season, etc. These sub-hashtags can be used on their own in sub-conversations that are still relevant for identifying user engagement (e.g., BUZZ) with the evaluated content.
As these sub-hashtags may not be published and can be created by users in a more organic fashion, identifying them correctly can require either manual or algorithmic effort.
In some embodiments, an algorithm can be used that detects, based on pre-defined keywords (e.g., hashtags), other hashtags that relate to specific content one wants to follow. The algorithm can establish (e.g., using machine learning) the co-occurrence of other hashtags with the pre-defined (e.g., main) hashtags, and can decide, based on a pre-defined threshold, the significance of the keyword in relation to the initial words.
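A minimal sketch of such co-occurrence-based hashtag detection, with an invented threshold and toy posts, might look like:

```python
# Hypothetical sketch: detect candidate secondary hashtags by how often they
# co-occur with a pre-defined main hashtag. The 0.5 threshold is an example.

def related_hashtags(posts, main_tag, threshold=0.5):
    """posts: iterable of hashtag sets. Returns tags appearing in at least
    `threshold` fraction of the posts that contain main_tag."""
    with_main = [p for p in posts if main_tag in p]
    counts = {}
    for post in with_main:
        for tag in post - {main_tag}:
            counts[tag] = counts.get(tag, 0) + 1
    return sorted(t for t, c in counts.items() if c / len(with_main) >= threshold)

posts = [
    {"#westworld", "#westworldhbo"},
    {"#westworld", "#westworldhbo", "#scifi"},
    {"#westworld", "#jonahnolan"},
    {"#scifi"},
]
print(related_hashtags(posts, "#westworld"))  # ['#westworldhbo']
```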
Sentiment Analyzer. Many types of sentiment analyzers (e.g., tweetfeel or steamcrab for Twitter, Newssift for online news) or generic engines (e.g., using Revealed Context, or IBM Watson) may be used in some embodiments. These solutions may report mainly three levels of response (e.g., positive, neutral, negative). They can evaluate vocabulary across all contexts (e.g., they can fit any product or service sentiment evaluation).
In other embodiments, an algorithm can be used that analyzes, based on opinionated text about a movie (e.g., IMDB reviews), what the sentiment is, at a more variable resolution (e.g., a ranking from 1 to 10). The algorithm can use NLP to pre-process the text and to make a representation of it that can be fed into an ANN. The pre-processing can be similar to the one described in the DNA section, and can be based on Stanford's GloVe algorithm for word embedding, as well as on a K-means algorithm.
In a second stage, the algorithm can train an ANN on IMDB reviews, where the input can be the text representation, and the output can be the user ratings. In this way, the ANN can learn to predict what will be the rating from the text.
In an example training process, we can train on 1,220,000 reviews from 27,000 movies. The results can then be tested on 120,000 reviews.
Carry-Over Analysis. A parameter that can affect ratings prediction for specifically timed content is the carry-over effect (e.g., the lead-in analysis), which can evaluate the impact of viewership of the previous program. This effect can have many causes (e.g., viewer habits of TV remote usage, level of engagement with content while watching TV, promotional effort at the end of content for the coming next content, live vs. recorded watching habits, etc.). One challenge with the carry-over analysis is that long-term forecasting can be complicated, and the inaccuracy of the main program forecast, combined with the impact of the rating of the previous program, can potentially increase noise rather than provide a clear signal.
To avoid double forecasting (e.g., a rough estimate based on minimal input, then an accurate one using mass input and "carrying over" the rating from the previous program), we can train the algorithm only once, but feed it with the parameters of the previous program (e.g., whether the previous program is live and/or a replay, ratings the previous program got in the past, etc.). When the correct feature set of the previous program is selected, forecasting accuracy can be improved.
Competitor(s) Analysis. Viewership can be impacted by the audience viewing choices put in front of them. While linear viewing habits still exist, and some viewers randomly zap across channels and stop when engaged with currently played content, the shift to planned viewing (e.g., recorded or planned via a listing and/or an Electronic Programming Guide (EPG)) is rising. This can lead to competition for viewers' time, which can mean that impact analysis may be required of content played at the same time across all available channels. Thus, in some embodiments, features taken from the main competitors of the predicted channel can be used in the algorithm. This can help to further improve the accuracy of the forecast.
Features such as the content DNA and/or BUZZ for the competitors' content can teach the neural network the relationship between a competitor's popularity and DNA and the likelihood of viewers swapping to view the competitor's content (e.g., what percentage of users in a specific demographic prefer to watch a popular police and/or drama series on a competitor's channel over the predicted sports event). While historical analysis (e.g., machine learning training) against accurate historical ratings can ensure the model built for competition analysis produces good forecasting results, we can add a degree of noise when referencing competitors' future scheduling, especially for long-term forecasting, because these schedules can change at the last minute without long notification. In some embodiments, we also use a competitor's partial future schedule, as these may be published only for a limited period in advance (e.g., a few weeks up to a quarter), while the main schedule that is analyzed can be for a year or two in advance.
Series Sloping. Tree-based algorithms, like XGBOOST, may not be very good at predicting values outside of the range they already saw. That means, for example, that if our sequence is {5,4,3,2,1} the next predictions could be closer to {1,1,1, . . . } than to {0,−1,−2, . . . }. In other words: XGBOOST can be very good at interpolating but less good at extrapolating.
In order to give the algorithm the capability to predict "outside of the box", we can include series sloping regressions, which can be linear and/or polynomial, in order to also take the long-term trend into account.
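The sloping regression idea can be illustrated with a least-squares line over the {5,4,3,2,1} sequence from above, which extrapolates to 0 where a tree-based model would tend to stay near 1 (the function name is illustrative):

```python
# Sketch of series sloping: a linear regression over the sequence lets the
# model extrapolate beyond the observed value range, unlike a tree model.

def slope_extrapolate(seq):
    """Fit a least-squares line through (i, seq[i]) and return the next value."""
    n = len(seq)
    xs = range(n)
    x_mean, y_mean = sum(xs) / n, sum(seq) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, seq))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    # Evaluate the fitted line one step past the end of the sequence.
    return y_mean + slope * (n - x_mean)

print(slope_extrapolate([5, 4, 3, 2, 1]))  # 0.0 — below the range the model saw
```

This extrapolated trend can then be fed to the overarching algorithm as an additional feature.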
Methods described herein may represent processing that occurs within a system. The subject matter described herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The subject matter described herein can be implemented as one or more computer program products, such as one or more computer programs tangibly embodied in an information carrier (e.g., in a machine readable storage device), or embodied in a propagated signal, for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers). A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification, including the method steps of the subject matter described herein, can be performed by one or more programmable processors (e.g., processor 1610).
The computer 605 can also include an input/output 1620, a display 1650, and a communications interface 1660.
CONCLUSIONIt is to be understood that the disclosed subject matter is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The disclosed subject matter is capable of other embodiments and of being practiced and carried out in various ways. Accordingly, other implementations are within the scope of the following claims. Also, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. As such, those skilled in the art will appreciate that the conception, upon which this disclosure is based, may readily be utilized as a basis for the designing of other structures, methods, and systems for carrying out the several purposes of the disclosed subject matter. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the disclosed subject matter.
While the disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the disclosure can be embodied in other specific forms without departing from the spirit of the disclosure. In addition, a number of the figures illustrate processes. The specific operations of these processes may not be performed in the exact order shown, described, and/or claimed. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.
While various embodiments have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail may be made therein without departing from the spirit and scope. In fact, after reading the above description, it will be apparent to one skilled in the relevant art(s) how to implement alternative embodiments. Thus, the present embodiments should not be limited by any of the above-described embodiments.
In addition, it should be understood that any figures which highlight the functionality and advantages are presented for example purposes only. The disclosed methodology and system are each sufficiently flexible and configurable such that they may be utilized in ways other than that shown.
Further, the purpose of any Abstract of the Disclosure is to enable the U.S. Patent and Trademark Office and the public generally, and especially the scientists, engineers and practitioners in the art who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. An Abstract of the Disclosure is not intended to be limiting as to the scope of the present invention in any way.
Although the term “at least one” may often be used in the specification, claims and drawings, the terms “a”, “an”, “the”, “said”, etc. also signify “at least one” or “the at least one” in the specification, claims and drawings.
Additionally, the terms “including”, “comprising” or similar terms in the specification, claims and drawings should be interpreted as meaning “including, but not limited to.”
Finally, it is the applicant's intent that only claims that include the express language "means for" or "step for" be interpreted under 35 U.S.C. 112, paragraph 6. Claims that do not expressly include the phrase "means for" or "step for" are not to be interpreted under 35 U.S.C. 112, paragraph 6.
Claims
1. A method for predicting who is watching a program, the method comprising:
- review text related to the program, the text comprising: plot information, sub-title information, summary information, script information, or synopsis information, or any combination thereof;
- determine pre-determined genre words and pre-determined keywords based on machine learning analysis of historical programs;
- determine which words from the text are relevant words, the relevant words being words that help identify genre words or keywords; and
- determine how closely the relevant words coincide with the pre-determined genre words by generating a DNA breakdown of how many relevant words are the pre-determined genre words and the pre-determined keywords; and
- predict who will watch the program based on the DNA breakdown.
Type: Application
Filed: Jul 25, 2023
Publication Date: Nov 16, 2023
Applicant: SINTEC MEDIA LTD. (JERUSALEM)
Inventors: SAM ABERMAN (LONDON), BINYAMIN EVEN (EFRAT), OSHRI BARAZANI (TEL TE'OMIN), PATRICK BLACKWILL (TAMWORTH)
Application Number: 18/358,548