Topic Based Recommender System & Methods

Info

Publication number: 20080077574
Type: Application
Filed: Sep 14, 2007
Publication Date: Mar 27, 2008
Inventor: John Nicholas Gross (San Francisco, CA)
Application Number: 11/855,934

Abstract

A recommendation system is used to provide suggestions in environments such as message boards, RSS aggregators, blogs and the like by comparing member interests and creating recommendation items corresponding to categorized topics or other members. In some instances a natural language classifier can assist in processing content to sort it into the appropriate topic bin. An advertising module cooperates with the system to provide content-based ads relevant to the recommended items.

Description

RELATED APPLICATION DATA

The present application claims the benefit under 35 U.S.C. 119(e) of the priority date of Provisional Application Ser. No. 60/826,677 filed Sep. 22, 2006 which is hereby incorporated by reference herein.

FIELD OF THE INVENTION

The present invention relates to electronic recommendation systems and other related systems.

BACKGROUND

Recommender systems are well known in the art. In one example, such systems can make recommendations for movie titles to a subscriber. In other instances they can provide suggestions for book purchases, or even television program viewing. Such algorithms are commonplace in a number of Internet commerce environments, including at Amazon, CDNOW, and Netflix to name a few, as well as programming guide systems such as TiVo.

Traditionally, recommender systems are used in environments in which a content provider is attempting to provide new and interesting material to subscribers, in the form of additional products and services. In some cases (see, e.g., U.S. Pat. No. 6,493,703, incorporated by reference herein) recommenders have been employed for the purpose of informing members of an online community of content and/or preferences of other members. Nonetheless the use of recommenders has not been extended fully to such domains and other online areas, including social networks, which could benefit from such systems. Only recently, for example, have recommenders been proposed for generating user to user recommendations in a music related community. See, e.g., US Publication No. 2007/0203790 to Torrens, incorporated by reference herein. Similar systems which recommend content/users are described in U.S. Pat. No. 6,493,703 to Knight et al., also incorporated by reference herein.

Multi-dimensional recommenders have also been recently introduced. For an example of such systems, please see U.S. Patent Publication No. 2004/0103092 to Tuzhilin et al. and an article entitled “Incorporating Contextual Information in Recommender Systems Using a Multidimensional Approach” to Adomavicius et al., both of which are hereby incorporated by reference herein. In such systems, however, the extra dimensionality arises from additional content related to items which are nonetheless still traditional commerce items, such as movies.

SUMMARY OF THE INVENTION

An object of the present invention, therefore, is to reduce and/or overcome the aforementioned limitations of the prior art. A recommender system which evaluates multiple data sources is employed to generate more accurate and relevant predictions concerning data items and other users within a community.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a multi-dimensional recommender system of the present invention.

DETAILED DESCRIPTION

FIG. 1 illustrates an example of a preferred embodiment of a multi-dimensional recommender system 100. A user/item compiler and database 110 includes a schema in which ratings for individual items by individual users are identified in a typical matrix fashion well-known in the art. The primary difference, in this instance, is that the items are not products/services (e.g., books, movies, etc.) as in the prior art, but instead represent more generalized concepts, such as a rating identified by a user for an author, a social network contact, a particular message board or post, a particular blog or website, a particular RSS Feed, etc., as shown by the data received from sources.

Explicit Endorsement Data Sources 120

As an example of an explicit data source 120, in a typical message board application such as operated by Yahoo! (under the moniker Yahoo Message Boards) or the Motley Fool, users are permitted to designate “favorite” authors, and/or to “recommend” posts written by particular individuals. In accordance with the present invention these designations of favorite authors and recommendations for posts are monitored, tabulated, and then translated into ratings for such authors/posts and compiled in a database under control of an item/user compiler module. The ratings will be a function of the environment in which the information is collected of course, so that a recommendation by person A for a post written by person B can be scored as a simple 1 or 0. While current message board systems presently track these kinds of endorsements, it will be understood that the invention can be applied to any aspect of such environments in which subscribers are allowed to endorse, rate, or declare an interest or preference for a certain author, post, subject, etc.
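
The tabulation step described above can be sketched as follows. This is a minimal illustration only, assuming endorsements arrive as simple (user, kind, target) records; the event kinds, the identifiers, and the helper names `compile_ratings` and `rating_of` are hypothetical and do not appear in the original disclosure.

```python
from collections import defaultdict

# Hypothetical endorsement events: (user, kind, target).
# The event kinds and identifiers are illustrative assumptions.
events = [
    ("alice", "favorite_author", "author:bob"),
    ("alice", "recommend_post", "post:101"),
    ("carol", "favorite_author", "author:bob"),
    ("carol", "recommend_post", "post:102"),
]


def compile_ratings(events):
    """Tabulate endorsements into a sparse user -> {item: rating} matrix.

    Each endorsement is scored as 1; a missing entry is read as 0.
    """
    ratings = defaultdict(dict)
    for user, _kind, target in events:
        ratings[user][target] = 1
    return dict(ratings)


def rating_of(ratings, user, item):
    """Look up a rating, treating an absent endorsement as 0."""
    return ratings.get(user, {}).get(item, 0)


ratings = compile_ratings(events)
```

Storing only the endorsed entries keeps the matrix sparse, which suits the simple 1 or 0 scoring mentioned above; a richer scale would only change the value written in `compile_ratings`.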

The purpose of using a recommender algorithm (either collaborative filter or content filter as the case may require) would be of course to recommend additional authors, topics, or similar subject matter to members of such message boards based on their professed interests in other authors and topics. For example a first individual with favorite authors A, B, C may not realize that other individuals designating A, B, C as favorite authors also designate D and E as favorite authors, and this information can be passed on to such first individual to increase the potential enjoyment of such site.
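
The A, B, C example can be illustrated with a very small co-occurrence sketch. It is not a full collaborative filter: it assumes, for simplicity, that only users whose favorites include all of the first individual's favorites count as peers, and the function name `recommend_authors` and the sample data are hypothetical.

```python
from collections import Counter


def recommend_authors(favorites, user, top_n=5):
    """Suggest authors favored by users who share all of `user`'s favorites."""
    mine = favorites[user]
    counts = Counter()
    for other, theirs in favorites.items():
        if other == user:
            continue
        # Simplifying assumption: a peer must favor everything `user` favors.
        if mine <= theirs:
            counts.update(theirs - mine)
    return [author for author, _ in counts.most_common(top_n)]


favorites = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B", "C", "D"},
    "u3": {"A", "B", "C", "D", "E"},
    "u4": {"X", "Y"},
}
print(recommend_authors(favorites, "u1"))  # -> ['D', 'E']
```

A production system would more likely weight peers by degree of overlap rather than require a full match; the strict subset test is used here only to mirror the example in the text.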

Similarly, in other environments serving as a data source, a user's designation of favorite web-logs (blogs), favorite RSS feeds, etc., as evidenced by their inclusion in an RSS aggregator, by their designation as favorites within a web browser, or by some other mechanism, could be tabulated to create a user-item matrix of ratings for such items. This can be used to pass on recommendations for new blogs, RSS feeds, etc.

In some applications an e-commerce site includes social networking features whereby members link to each other explicitly as part of groups. For example, in sites operated by MySpace or Netflix, members can designate other members explicitly with the label friends. As with the other data sources, these user-friend associations can be tabulated into a form suitable for use by a recommendation algorithm. Again, while these sites specifically designate individuals as friends, other sites may allow members to designate some other favorite item, such as an image, a website, a video, etc.

It should be apparent therefore that the item/user compiler database may in fact be comprised of several different dedicated files unique to a particular site or domain of users.

Implicit Endorsement Data Sources 125

In contrast to explicit data sources, the data from implicit data sources 125 includes materials which typically must undergo further processing to determine both the item and the associated rating. That is, in the case of a search result for example, the item may be one of the pages presented in the search result, or one or more concepts derived from the content of such page. The rating may be based on a number of invocations of such page, a length of time spent at such page, or any other well-known attention metric used to determine a person's interest in a particular website.
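
One way to turn such attention metrics into a rating is sketched below. The logarithmic damping, the 60 second reference dwell time, and the 0 to 5 scale are assumptions chosen for illustration; the text does not prescribe any particular formula, and `implicit_rating` is a hypothetical name.

```python
import math


def implicit_rating(invocations, dwell_seconds, max_rating=5.0):
    """Map raw attention signals onto a bounded rating scale.

    Assumed formula: log-damped visit count plus log-damped minutes spent,
    squashed into the range [0, max_rating).
    """
    if invocations <= 0:
        return 0.0
    visit_score = math.log1p(invocations)            # diminishing returns on repeat visits
    dwell_score = math.log1p(dwell_seconds / 60.0)   # minutes spent, damped
    raw = visit_score + dwell_score
    # raw / (1 + raw) grows with raw but never reaches 1.
    return max_rating * raw / (1.0 + raw)
```

Under these assumptions a page opened three times for two minutes rates higher than one opened once for the same duration, and no amount of activity exceeds the top of the scale.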

Other sources of implicit data can include ads selected by an individual (during an online session or from another electronic interface which collects and presents ad related data, such as a TiVo box or the like), audio/video content, posts, blogs, podcasts, articles, stories and the like which are read and/or authored by the person. Those skilled in the art will appreciate that such monitoring could be done in any situation where a person's selections can be identified.

Natural Language Classifier 130

Regardless of the source of the implicit data, the invention uses a natural language classifier/mapper module 130 to translate the raw data into one or more predefined concepts—representing the items in this instance—with reference to a topic/concept classification database 140. For example, a topic/concept may include such items as personal interests/hobbies, music bands, company names, stock symbols, brand names, foods, restaurants, movies, etc., depending on the intended application. These are but examples of course and it will be understood that such topics/concepts could include almost anything.

The items for the recommender database 110 can be mapped onto the topics/concepts either on a 1:1 basis, a 1:N basis, or an N:1 basis. In other words, if an item in the recommender database 110 is designated with the label “Sony,” there may be an identical entry in the topic/concept classification database 140 with such term. Semantic equivalents may also be used where appropriate. Similarly a single item “Sony” may be associated with multiple topics/concepts, such as a reference to a particular product or service offered by such company (for example Vaio), a stock symbol for Sony, a reference to a key employee/officer of Sony, and the like. Conversely some topics/concepts may also be mapped to multiple items, so that a reference to Sony Vaio may be linked to such items as Sony and state of the art personal computers.
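
The item/topic relationships in the Sony example can be pictured as a pair of lookup tables. The sketch below is illustrative only: the table contents paraphrase the example, and the names `item_to_topics`, `invert`, and `topic_to_items` are hypothetical.

```python
from collections import defaultdict

# Hypothetical item -> topics table echoing the Sony example.
item_to_topics = {
    "Sony": ["Sony", "Vaio", "Sony stock symbol", "Sony officers"],  # one item, many topics
    "personal computers": ["Vaio"],
}


def invert(mapping):
    """Build the reverse topic -> items index from an item -> topics table."""
    reverse = defaultdict(set)
    for item, topics in mapping.items():
        for topic in topics:
            reverse[topic].add(item)
    return reverse


topic_to_items = invert(item_to_topics)
```

Here the topic "Vaio" resolves to both "Sony" and "personal computers", while the topic "Sony" resolves to the single identically named item.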

The natural language classifier/mapper 130 is preferably trained with a training corpus 145 so that it can effectively learn the correct correlations between data and concepts. After training, the natural language classifier/mapper 130 can recognize words/phrases within a search page, ad, post, etc., and correlate them to one or more topics/concepts. Thus if a document contains the word Dell, the NL classifier can be taught to recognize such word as corresponding to such concepts as a particular brand name, a computer company, and the like.
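
As a rough illustration of training a classifier on a corpus, the sketch below uses a small multinomial naive Bayes model with add-one smoothing. This is a stand-in technique chosen for brevity, not the classifier of the disclosure; the class name `TopicClassifier`, the toy corpus, and the topic labels are assumptions, and the sketch returns only the single best topic rather than several.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TopicClassifier:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of training docs
        self.vocab = set()

    def train(self, corpus):
        for text, topic in corpus:
            words = tokenize(text)
            self.word_counts[topic].update(words)
            self.doc_counts[topic] += 1
            self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic in self.doc_counts:
            counts = self.word_counts[topic]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(self.doc_counts[topic] / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


# Toy training corpus; the sentences and labels are invented for illustration.
corpus = [
    ("Dell ships a new laptop with a faster processor", "computer company"),
    ("Dell and Sony compete on notebook computer prices", "computer company"),
    ("Porsche unveils a faster sports car", "sports cars"),
    ("Ferrari and Porsche race at the track", "sports cars"),
]
clf = TopicClassifier()
clf.train(corpus)
```

With this toy corpus, a document mentioning Dell scores highest for "computer company" even though the label itself never appears in the document, which is the behavior the paragraph above describes.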

The advantage of such approach, of course, is that documents authored/reviewed by individuals do not have to contain specific or explicit references to the item in question. Thus the system understands that an individual reading articles about Porsches, Ferraris, etc, is probably interested in high end sports cars, luxury items, etc. While NL classifiers are well-known and have been used in other contexts such as search engines and related indices, they do not appear to have been used to date to assist in the identification and rating of items for a recommender.

Ratings

As alluded to earlier, the ratings in the above types of applications can be based on any convenient scale depending on the source of the data and the intended use. Some designations may be rated or scaled higher than others, depending on their recency, relative use, etc. The weightings again can be based on system performance requirements, objectives, and other well-known parameters. Thus with all other things being equal, older designations may receive higher scores than more recent designations, so long as the former are still designated as active in the user's day to day experience. So for example, after a predefined period, the first designated favorite author for a particular individual may receive a boost to their rating if such author is still being read by the individual. Similarly, “stale” endorsements may be reduced over time if they are not frequently used. The degree of activity may be monitored and benchmarked to attenuate the ratings so as to cause a desired result (e.g., endorsements receiving no activity within N days may receive a maximum attenuation factor).
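
The attenuation of stale endorsements might be sketched as a linear ramp that reaches its maximum after N days of inactivity. The ramp shape, the 30 day cutoff, and the 0.9 ceiling are assumptions for illustration, as are the function names.

```python
def attenuation(days_inactive, cutoff_days=30, max_attenuation=0.9):
    """Fraction by which a rating is reduced after a period of inactivity.

    Assumed shape: grows linearly with inactivity and is capped at
    `max_attenuation` once `cutoff_days` (the N days of the text) is reached.
    """
    if days_inactive <= 0:
        return 0.0
    return max_attenuation * min(days_inactive / cutoff_days, 1.0)


def attenuated_rating(rating, days_inactive, cutoff_days=30, max_attenuation=0.9):
    """Apply the attenuation factor to a stored rating."""
    factor = attenuation(days_inactive, cutoff_days, max_attenuation)
    return rating * (1.0 - factor)
```

For example, with these defaults a rating of 1.0 left unused for 15 days falls to about 0.55, and anything unused for 30 days or more is held at the maximum reduction.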

Quantitatively, the ratings therefore can be a simple mathematical relationship of usage frequency and age of the endorsement. The ratings may also be affected by the context in which they are generated, or in which the recommendation is solicited, as noted in the Tuzhilin materials above. The ratings can be updated at any regular desired interval of time, such as on a daily, weekly, or other convenient basis. For example, one approach may use the product of (frequency of use * age of the endorsement), with some normalization applied. This will result in an increase in score for older and more frequently used items. Other types of algorithms will be apparent to those skilled in the art. In this respect the invention attempts to mimic the behavior of a learning network which gives precedence to connections which are more strongly connected and reinforced regularly.
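
The product approach mentioned above can be written out directly. The sketch assumes frequency is measured in uses per week, age in days, and that normalization means dividing by the largest raw product; these units and the name `endorsement_scores` are illustrative choices, not requirements of the text.

```python
def endorsement_scores(endorsements):
    """Score endorsements as (frequency of use * age), normalized to [0, 1].

    `endorsements` maps item -> (uses_per_week, age_in_days).
    """
    raw = {item: freq * age for item, (freq, age) in endorsements.items()}
    peak = max(raw.values(), default=0)
    if peak == 0:
        return {item: 0.0 for item in raw}
    return {item: value / peak for item, value in raw.items()}


scores = endorsement_scores({
    "author:A": (5, 200),   # old and still read often
    "author:B": (5, 20),    # recent, read often
    "author:C": (0, 300),   # old but no longer used
})
```

In this sample the old, still active endorsement scores 1.0, the equally active but recent one scores 0.1, and the old unused one scores 0.0, matching the stated intent that older and more frequently used items rise.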

Recommendation Engine Module 115 Outputs

A recommendation engine module 115 thus generates outputs in a conventional fashion using a collaborative filtering algorithm, a content based filtering algorithm, or some combination thereof depending on the particular application and the data available in the item/user database. The outputs can include:

1) predictions on how much particular users will like particular items; for example, in a message board application, an indication of a rating at output 180 that a particular person would give to a specific post, specific author, specific topic, etc.;

2) recommendation outputs 170 on specific authors, topics, posts, etc. which a particular person may want to consider for review in their perusings at such site; this data can be presented to a user in the form of individual entries, top x lists, etc.;

3) an output to adjust, adapt or personalize search engine (not shown) results presented to a user in response to a query on a specific subject. For example if a user performed a search at a site relating to video recorders, the result set typically includes a set of N distinct hits. The information from the recommendation engine 115 may be used to tailor the results more particularly to the user.

In a first instance, the user has a prior profile which can be determined and exploited from item/user database 110, so that the search results are modified accordingly. As an example, the user may have expressed a favorable interest, endorsement or inclination towards Sony. This data in turn could be used to optionally modify, bias or alter the N distinct hits to accommodate the prior experiences.

In a second instance, even if the user does not have a profile, the query can be compared against items in the item/user database to determine favored or highly rated articles. Thus, in the above example, any ratings for Sony, or other video recorder suppliers, could be evaluated to identify additional modifications to the search engine results. In this manner a recommender can supplement the performance of a search engine based on real world experiences and thus increase the chances of successful experiences by searchers.

To map search queries to items for the above enhancements, the topic/concept classification database 140 can be consulted as needed. Again this may result in a number of item related entries being used to modify the search results.
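
A simple way to picture the search adjustment is a re-ranking pass over the N hits. The sketch assumes each hit has already been mapped to items/topics, that ratings lie between 0 and 1, and that a fixed weight blends the rating into the base relevance score; all names and numbers below are hypothetical.

```python
def rerank(hits, hit_topics, item_ratings, weight=0.5):
    """Re-order search hits by base relevance plus a rating-derived bonus.

    hits: list of (doc_id, base_score)
    hit_topics: doc_id -> set of items/topics found for that hit
    item_ratings: item -> rating in [0, 1], taken from the user's profile or,
                  when no profile exists, from aggregate community ratings
    """
    def adjusted(hit):
        doc_id, base = hit
        topics = hit_topics.get(doc_id, ())
        bonus = max((item_ratings.get(t, 0.0) for t in topics), default=0.0)
        return base + weight * bonus

    return sorted(hits, key=adjusted, reverse=True)


hits = [("doc1", 0.80), ("doc2", 0.75), ("doc3", 0.60)]
hit_topics = {"doc1": {"OtherBrand"}, "doc2": {"Sony"}, "doc3": {"Sony", "Vaio"}}
item_ratings = {"Sony": 0.9}
reranked = rerank(hits, hit_topics, item_ratings)
```

With a favorable rating for Sony, the two Sony-related hits move ahead of the hit that originally ranked first. The same function covers both instances described above, since only the source of `item_ratings` differs.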

It should be apparent that the output could be used by a separate recommender system, as well, to supplement an existing data set.

Advertising Module 150

An advertising module 150 can be used to provide relevant advertising material based on the content of predictions, recommendations and other outputs of the recommendation engine. As seen in FIG. 1, an interface routine 153 permits third parties and site operators to enter well-known advertising campaign information, such as advertising copy/content, desired keywords, and other information well-known in the art. The ads can take any form suitable for presentation within an electronic interface, and may include text and multi-media information (audio, video, graphics, etc.).

In prior art systems ads are correlated to search engine results, such as in a system known as “Adwords” offered by Google. In such applications ads are presented to searchers based on one or more topics identified in a search query.

The present invention extends this concept to recommenders, so that ads are served in accordance with a topic determined from a recommendation. For example, on a message board application, if the system were to determine that (based on prior ratings for certain topics) the user should also be recommended to review content on a board devoted to vintage cars, the ads presented with such recommendation could be tailored to content of such vintage car board, and/or to the specific content of the recommendation itself.

As seen in FIG. 1, the advertising stock 152 offered by third parties is matched against one or more topics/concepts in the topic/concept classification database 140. The mapping of the advertising stock to such topics can again be done automatically by natural language classifier/mapper 130, or alternatively selected independently by the third party/system site operator. In the latter case some oversight may be necessary to prevent third parties from intentionally polluting the relevancy of ads by presenting them in inappropriate contexts.

An advertising engine 151 is invoked and cooperates with a recommendation engine 115 so that relevant ads are presented with an output of the latter. As noted above such ads may also be presented as suitable for inclusion with a modified set of search results for a search engine. In this fashion an advertising system can be superimposed over the recommender system, so that relevant ads are presented at 160 in response to, and in conjunction with, a recommendation, prediction, etc., either at the same time, or at a later time in the form of emails, alerts, printed copy or other suitable materials for consumer consumption.
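
The cooperation between the advertising engine and the recommendation engine can be sketched as a topic overlap match. It assumes the advertising stock has already been mapped to topics and that overlap count is an adequate relevance measure; the ad identifiers, topic labels, and the function `select_ads` are hypothetical.

```python
def select_ads(recommendation_topics, ad_stock, limit=2):
    """Return ads ranked by how many topics they share with a recommendation."""
    scored = []
    for ad_id, ad_topics in ad_stock.items():
        overlap = len(recommendation_topics & ad_topics)
        if overlap:
            scored.append((overlap, ad_id))
    # Highest overlap first; ad_id breaks ties so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ad_id for _, ad_id in scored[:limit]]


# Hypothetical advertising stock already mapped to topics/concepts.
ad_stock = {
    "ad-restoration-parts": {"vintage cars", "auto parts"},
    "ad-car-show": {"vintage cars", "events"},
    "ad-laptops": {"personal computers"},
}
ads = select_ads({"vintage cars", "auto parts"}, ad_stock)
```

For a recommendation about a vintage car board, the two car-related ads are returned and the unrelated laptop ad is excluded.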

Applications

As alluded to earlier, the present invention can be used advantageously in a number of e-commerce applications, including:

- Message boards: the invention can be employed to predict/recommend other authors, posters, topics, etc., which would be of interest to members;
- Social networking: the invention can be employed to predict/recommend other contacts, “friends,” topics, etc. which a member of an online community may enjoy based on such member's other friends, topics reviewed, etc. By measuring an adoption rate between members for particular friends, or determining which friends' interests are most often copied, the system can even provide suggestions to specific members so that they send invitations to other members predicted to be good candidates for friends within the community.
- RSS, Blogs, Podcasts, Ads: the invention can be employed to predict/recommend other Ads, RSS feeds, Blogs and Podcasts to individuals, based on adoptions/endorsements made by other online users.

Furthermore other options include monitoring group behavior and treating any such collection of individuals as a single entity for item/rating purposes. This aggregation can be used to recommend higher order logical groupings of individuals, particularly in social networking applications, to enhance the user experience.

That is, in conventional CF systems, individuals are automatically assigned to specific clusters based on a determination of a significant number of common interests/tastes. In the present invention the individual self-selected groupings within social networks can be broken down and treated as clusters so that comparisons can be made against a particular user's interests, predilections, etc. Based on such comparisons groups can opt to extend invitations to new members whom they would otherwise not notice or come into contact with. Conversely new members can be given some immediate insight into potentially fruitful social groups.
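
Treating a self-selected group as a single entity can be illustrated by averaging member ratings into a group profile and comparing it with a user's own ratings. Cosine similarity is used here as one common comparison measure; it is an assumption, as are the sample data and the function names.

```python
import math
from collections import Counter


def group_profile(members, ratings):
    """Average member ratings so the group is treated as a single entity."""
    totals = Counter()
    for member in members:
        totals.update(ratings.get(member, {}))
    n = len(members)
    return {item: total / n for item, total in totals.items()} if n else {}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_groups(user, groups, ratings):
    """Rank groups the user has not joined by similarity to the user's interests."""
    profile = ratings.get(user, {})
    scored = [
        (cosine(profile, group_profile(members, ratings)), name)
        for name, members in groups.items()
        if user not in members
    ]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


ratings = {
    "ann": {"vintage cars": 1, "jazz": 1},
    "bob": {"vintage cars": 1, "auto parts": 1},
    "cy": {"vintage cars": 1},
    "dee": {"knitting": 1},
    "eve": {"knitting": 1, "jazz": 1},
}
groups = {"car club": ["bob", "cy"], "craft circle": ["dee", "eve"]}
```

Calling `best_groups("ann", groups, ratings)` ranks the car club ahead of the craft circle, which could drive either an invitation from the group or a suggestion to the new member.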

It will be understood by those skilled in the art that the above is merely an example and that countless variations on the above can be implemented in accordance with the present teachings. A number of other conventional steps that would be included in a commercial application have been omitted, as well, to better emphasize the present teachings.

It will be apparent to those skilled in the art that the modules of the present invention, including those illustrated in FIG. 1, can be implemented using any one of many known programming languages suitable for creating applications that can run on large scale computing systems, including servers connected to a network (such as the Internet). The details of the specific implementation of the present invention will vary depending on the programming language(s) used to embody the above principles, and are not material to an understanding of the present invention. Furthermore, in some instances, a portion of the hardware and software of FIG. 1 will be contained locally to a member's computing system, which can include a portable machine or a computing machine at the user's premises, such as a personal computer, a PDA, digital video recorder, receiver, etc.

Furthermore it will be apparent to those skilled in the art that this is not the entire set of software modules that can be used, or an exhaustive list of all operations executed by such modules. It is expected, in fact, that other features will be added by system operators in accordance with customer preferences and/or system performance requirements. Furthermore, while not explicitly shown or described herein, the details of the various software routines, executable code, etc., required to effectuate the functionality discussed above in such modules are not material to the present invention, and may be implemented in any number of ways known to those skilled in the art.

The above descriptions are intended as merely illustrative embodiments of the proposed inventions. It is understood that the protection afforded the present invention also comprehends and extends to embodiments different from those above, but which fall within the scope of the present claims.

Claims

1. A method of generating automatic recommendations for content to a first user with a computing system comprising:

(a) identifying a first content reviewed by the first user with the computing system;

(b) identifying a second content reviewed by a plurality of second users with the computing system;

(c) causing a recommender system to generate a prediction and/or a recommendation for portions of said second content which are likely to be of interest to the first user based on an analysis of said first content and said second content; wherein said first content and second content includes materials derived and combined with the computing system as multidimensional data from at least two of the following content sources accessed by said first user and said plurality of second users: 1) a message board; 2) a social network site; 3) a blog; 4) an RSS feed; 5) a content site; wherein the prediction and/or recommendation is based on multidimensional data.

2. The method of claim 1 wherein said recommender prediction and/or recommendation is further based on content authored by said first user and/or said plurality of second users.

3. The method of claim 1 where at least some of said content is derived from implicit ratings determined from classifying data reviewed by the first user and said plurality of second users into one or more topics or concepts.

4. The method of claim 1 wherein said prediction and/or a recommendation is based on explicit ratings provided by the first user and said plurality of second users.

5. The method of claim 4 wherein said explicit ratings are given a weighting in accordance with a time characteristic.

6. The method of claim 5 wherein weighting increases for older ratings.

7. The method of claim 5 wherein said weighting is also adjusted based on a frequency of ratings provided for a particular data item.

8. The method of claim 1 further including a step: presenting an advertisement along with said recommendation, which advertisement is based on a content of said recommendation.

9. A method of generating automatic recommendations for content to a first user with a computing system comprising:

(a) processing a set of first ratings from the first user for a first data source with the computing system, which first data source includes at least one of a human author, a social network contact, a message board, an RSS feed and/or a web log;

(b) processing a set of second ratings from one or more second users for said first data source and one or more second data sources with the computing system, which second data sources also include at least one of a human author, a social network contact, a message board, an RSS feed and/or a web log;

(c) correlating said set of first ratings and said set of second ratings with the computing system to identify a selected set of second users that are suitable as predictors for said first user;

(d) recommending one or more of said second data sources to said first user based on a correlation of said first user to said selected set of second users done with the computing system.

10. The method of claim 9 wherein said correlation is determined by at least one of collaborative filtering and/or corroborative filtering.

11. The method of claim 9 wherein said first set of ratings and said second set of ratings includes implicit ratings data which is determined implicitly from actions taken by said first user and said set of second users in reviewing content presented electronically during an Internet session.

12. The method of claim 9 wherein said set of first ratings and said set of second ratings are recommendations given to authors of message board posts.

13. The method of claim 9 further including a step: presenting an advertisement to said first user which contains content predicted based on said correlation.

14. A method of generating automatic recommendations for content to a first user with a computing system comprising:

(a) providing a database correlating a plurality of individual data items and rankings for a first user, wherein at least some of said individual data items represent human individuals;

(b) identifying first content presented to the first user with the computing system;

(c) identifying a first rating provided by said first user with the computing system for said first content which is related to at least a first one of said plurality of individual data items;

(d) identifying second content presented to said first user with the computing system;

(e) identifying a second rating provided by said first user with the computing system for said second content which is related to at least a second one of said plurality of individual data items;

(f) repeating steps (a) through (e) for one or more second users;

(g) comparing ratings provided by said first user and one or more second users for said plurality of individual data items to identify correlations between such users and/or items;

(h) generating a prediction and/or a recommendation for the first user concerning at least a third data item based in part on step (g).

15. The method of claim 14 wherein said data items are human perceivable media items.

16. The method of claim 15 wherein said data items are movies.

17. A method of generating automatic recommendations for content to a first user with a computing system comprising:

(a) providing a first database correlating a plurality of individual data items and rankings for a first user, wherein at least some of said individual data items represent human individuals;

(b) providing a second database correlating a plurality of topics or concepts to one or more of said plurality of individual data items;

(c) identifying first content presented to the first user with the computing system;

(d) analyzing said first content to identify one or more of said plurality of topics or concepts and any corresponding individual data item;

(e) identifying a rating provided by said first user with the computing system for said first content;

(f) generating a ranking for said corresponding individual data item from said rating for said first content;

(g) comparing rankings provided by said first user and one or more second users for said plurality of individual data items to identify correlations between such users and/or items;

(h) generating a prediction and/or a recommendation for the first user concerning a data item based in part on step (g).

18. The method of claim 17 wherein step (d) is performed by a natural language engine classifier.

19. The method of claim 18 further including a step: training said natural language engine with a training corpus.

20. The method of claim 17 where said first content includes at least one of: a) an advertisement presented to the first user; b) a search result list; c) human readable content reviewed on the Internet.

21. The method of claim 17 further including a step: customizing a search engine result by the first user concerning one of said topics or concepts based on said prediction and/or recommendation.

22. The method of claim 1 further including a step: presenting an advertisement along with said recommendation, which advertisement is based on a content of said recommendation.

23. A method of generating automatic recommendations for content to a first user with a computing system comprising:

(a) processing a set of first ratings from the first user for a first data source with the computing system, which first data source includes at least one of a human author, a social network contact, a message board, an RSS feed and/or a web log;

wherein said set of first ratings are weighted by at least one of the following factors: 1) time; and/or 2) frequency;

(b) processing a set of second ratings from one or more second users for said first data source and one or more second data sources with the computing system, which second data sources also include at least one of a human author, a social network contact, a message board, an RSS feed and/or a web log; wherein said set of second ratings are also weighted by at least one of the following factors: 1) time; and/or 2) frequency;

(c) correlating said set of first ratings and said set of second ratings with the computing system to generate groups of users and/or groups of data sources suitable for a recommender system;

(d) generating a recommendation with the recommender system to said first user for one of said second data sources based on step (c).

24. The method of claim 23 further including a step: customizing a search engine result by the first user based on said prediction and/or recommendation.

25. The method of claim 23 further including a step: presenting an advertisement along with said recommendation, which advertisement is based on a content of said recommendation.

26. A method of presenting advertising content in connection with an automatic recommendation to a user comprising:

(a) identifying content presented to a plurality of users;

(b) processing said content with a natural language engine to classify and map such content to one or more topics;

(c) correlating a set of ad items to said one or more topics;

(d) causing a recommender system to generate a prediction and/or a recommendation for a user, said recommendation being related to one or more of said topics;

(e) presenting one of said set of ad items to the user as part of said prediction and/or recommendation.

27. The method of claim 26 further including a step: generating implicit ratings for said content based on behavior of said plurality of users.

28. The method of claim 27 further including a step: weighting said implicit ratings based on a time and/or a frequency of such ratings.

29. The method of claim 26 wherein said recommendation is related to a data source, including one of a human author, a social network contact, a message board, an RSS feed and/or a web log.

30. A method of generating automatic recommendations to a first user in connection with an online message board with a computing system comprising:

(a) identifying a first set of electronic messages on the online message board reviewed by the first user with the computing system;

(b) identifying a second set of electronic messages on the online message board reviewed by a plurality of second users with the computing system;

(c) evaluating a first set of ratings provided by the first user in connection with said first set of electronic messages and a second set of ratings provided by said plurality of second users for said second set of messages with the computing system; wherein said first set of ratings and said second set of ratings can be generated by at least one of an explicit rating and/or an implicit rating, which implicit rating is derived from online actions taken by said first user and said plurality of second users;

(d) generating a prediction and/or a recommendation for the first user from said first set of ratings and said second set of ratings which identifies at least one of: 1) one or more of said plurality of second users which are likely to be of interest to the first user; 2) one or more electronic messages which are likely to be of interest to the first user; 3) one or more electronic message authors which are likely to be of interest to the first user.

31. The method of claim 30 further including a step: customizing a search engine result by the first user based on said prediction and/or recommendation.

32. The method of claim 30 further including a step: presenting an advertisement along with said recommendation, which advertisement is based on a content of said recommendation.

33. The method of claim 30 further including a step: presenting an advertisement to said first user while he/she is reviewing an electronic message, which advertisement is based both on said first set of ratings as well as content of said electronic message.

34. The method of claim 30 wherein at least one of said explicit ratings and/or implicit ratings are given a weighting in accordance with a time characteristic.

35. The method of claim 34 wherein weighting increases for older ratings.

36. The method of claim 35 wherein said weighting is also adjusted based on a frequency of ratings provided for a particular data item.

37. The method of claim 30 wherein additional content reviewed by said first user and said plurality of second users at a separate website from said online message board as well as corresponding ratings for such content are also evaluated in determining said recommendation.

38. The method of claim 30 wherein said ratings are associated with at least one of: a user recommendation for an electronic message; a user designation of a preferred author for electronic messages; a user designation of an ignored author for electronic messages; a user recommendation for a particular topic; a user time spent reviewing an electronic message; a user search for a set of electronic messages; an ad selected by a user while reviewing an electronic message; a number of instances which a user has reviewed a selected electronic message.

39. The method of claim 30 including a step: identifying and publishing lists of groups of users with common ratings behavior.

40. The method of claim 30 including a step: identifying individual groups of users with common ratings behavior and providing suggestions to such groups for new memberships.