METHOD AND APPARATUS FOR INTERACTING WITH INFORMATION DISTRIBUTION SYSTEM
Method and apparatus for interacting with an information distribution system to determine a preferred article to access following inspection of any article of a collection of articles are disclosed. A candidate target article is evaluated based on content similarity to a currently inspected article as well as usage data indicating article-transition patterns. Usage data of the population of users is sorted according to predefined users' groups. For a user, of a specific group, currently inspecting a specific article, a recommendation of a candidate successor article is influenced by content similarity, article transition-pattern of the population of users, and respective group-specific article-transition patterns, with the latter preferably given more weight.
The present invention relates to determining the most suitable article to access following inspection of any article of a collection of articles. In particular, the invention is directed to recommending a subsequent article of a currently inspected article based on both article content and historical article-selection data.
BACKGROUND OF THE INVENTION
A content provider may wish to persuade a person who just finished inspection of an article to access a new article which may be relevant to the mission or business of the content provider.
In one approach, a person currently inspecting a specific article may be directed to another article having a significant content similarity to the content of the specific article as determined by a Natural-Language-Processing algorithm, or any other means.
In another approach, a recommended new article may be based on usage data. If a large proportion of all persons who inspected a specific article also inspected a particular other article, a person currently inspecting the specific article may be persuaded to access the particular other article.
Adopting the two approaches independently may result in different recommendations. A recommendation based only on content similarity may miss a very popular article. A recommendation based only on usage data would miss a new article which may be of significant interest to a person inspecting a current article. There is a need therefore for exploring new comprehensive methods which attempt to provide a balanced recommendation based on considering different aspects of information processing.
SUMMARY
The object of the present invention is to provide improved methods and apparatuses for interaction with an information distribution system.
In accordance with one aspect, the invention provides an apparatus for recommending a new article following inspection of a current article. The apparatus comprises a pool of hardware processors, memory devices holding article data, user data, and usage data, and memory devices holding modules of software instructions.
The article data includes a collection of articles and corresponding word vectors which may be used to generate content-similarity levels for each pair of articles. The user data includes users' grouping data according to predefined criteria. The usage data includes an overall score of the number of transitions to each other article as well as a group-specific score of the number of transitions to each other article for each user group.
The software instructions cause the pool of processors to determine for each article an appropriate succeeding article and update the overall score and the group-specific score upon detecting a transition from one article to another article within the collection of articles.
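As an illustrative sketch only, the overall score and the group-specific score described above could be held in nested counters and updated whenever a transition is detected. The class name `UsageData`, the method `record_transition`, and the use of unit increments are assumptions made for this example and are not taken from the disclosure.

```python
from collections import defaultdict


class UsageData:
    """Hypothetical in-memory store for overall and group-specific transition scores."""

    def __init__(self):
        # overall[x][y]: transitions from article x to article y by any user
        self.overall = defaultdict(lambda: defaultdict(int))
        # by_group[g][x][y]: transitions from x to y by users of group g
        self.by_group = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def record_transition(self, group, x, y):
        """Update both scores when a user of `group` moves from article x to article y."""
        if x == y:
            return  # assumption: a transition involves two distinct articles
        self.overall[x][y] += 1
        self.by_group[group][x][y] += 1


usage = UsageData()
usage.record_transition("g0", 8, 3)
usage.record_transition("g1", 8, 3)
usage.record_transition("g0", 8, 5)
```

Keeping the two scores in separate structures lets the overall score be read without summing over groups, at the cost of two increments per transition.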
In accordance with another aspect, the present invention provides a method of interacting with an information distribution system. The method is implemented at a computing device and comprises processes of acquiring information characterizing a plurality of articles and information characterizing a plurality of users accessing the articles. Inter-article affinity levels are determined accordingly.
A plurality of users of the information system is tracked to identify pairwise article successions, wherein a pairwise article succession comprises two articles accessed by a same user. Composite pairwise affinity levels of said plurality of articles are determined according to respective inter-article content similarity, types of tracked users effecting the pairwise article successions, and pairwise frequency of article successions. A preferred article to succeed a designated article is determined according to the composite pairwise affinity levels. Subsequently, an identifier of the preferred succeeding article is communicated to a user accessing the designated article.
The plurality of users may be segmented into a plurality of clusters according to a predefined criterion, and the types of tracked users are determined as identifiers of respective clusters to which said tracked users belong.
For a finer characterization of the users, in addition to associating each tracked user with a respective group of users, a level of significance of a user within a respective group of users may be taken into account. Thus, a type of a tracked user may be defined according to a group of users to which the tracked user belongs and a respective level of significance within the group of users.
In order to evaluate the effectiveness of recommending succeeding articles to users, according to one embodiment, the method implements a process of detecting access transitions to subsequent articles following communicating recommendations to users, and updating a measure of effective recommendations based on the proportion of article transitions that follow respective recommendations.
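A minimal sketch of one such measure, assuming each detected transition is logged as a pair of the recommended article and the article actually accessed. The function name and the event format are hypothetical.

```python
def effective_recommendation_rate(events):
    """Proportion of transitions that followed the recommendation.

    events: iterable of (recommended_article, accessed_article) pairs, one per
    detected transition that occurred after a recommendation was communicated.
    """
    events = list(events)
    if not events:
        return 0.0
    followed = sum(1 for recommended, accessed in events if recommended == accessed)
    return followed / len(events)


# Two of the four logged transitions went to the recommended article.
rate = effective_recommendation_rate([(3, 3), (5, 2), (7, 7), (1, 4)])
```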
In accordance with another embodiment, the method evaluates the effectiveness of recommendations by detecting an access transition to a subsequent article following a recommendation. A first composite affinity level of a designated article to a preferred succeeding article, and a second composite affinity level of the designated article to the subsequent article are determined. Discrepancy statistics may then be determined based on comparing the first composite affinity level and the second composite affinity level.
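The disclosure does not specify which discrepancy statistics are kept. The sketch below assumes, purely for illustration, that the signed difference between the two composite affinity levels is recorded per transition and summarized by its mean and extremes. The function name and the returned keys are hypothetical.

```python
from statistics import mean


def discrepancy_statistics(pairs):
    """Summarize differences between recommended and actual affinity levels.

    pairs: list of (first_level, second_level), where first_level is the composite
    affinity to the recommended article and second_level is the composite affinity
    to the article the user actually accessed.
    """
    diffs = [first - second for first, second in pairs]
    if not diffs:
        return {"mean": 0.0, "max": 0.0, "min": 0.0}
    return {"mean": mean(diffs), "max": max(diffs), "min": min(diffs)}


stats = discrepancy_statistics([(0.75, 0.75), (0.5, 0.25), (0.5, 0.75)])
```

A difference of zero means the user followed the recommendation or chose an article of equal affinity. A negative difference means the user chose an article that scored higher than the recommended one.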
Determining the preferred succeeding article may be based on determining a set of candidate succeeding articles and selecting an article from the set. Directed article pairs originating from a designated article are ranked according to composite pairwise affinity levels and the set of candidate succeeding articles is selected based on the result of ranking. The preferred succeeding article may be selected using a randomly sequenced round robin process weighted according to composite pairwise affinity levels of the candidate directed article pairs. In general, a directed article pair whose inter-article content similarity exceeds a predefined threshold may be excluded from the set of candidate succeeding articles.
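One possible reading of this selection, sketched under stated assumptions: candidates are ranked by composite affinity after dropping near-duplicates, each candidate is assigned a band of consecutive integers whose width is proportional to its affinity, and a random integer selects the band. The integer-band reading follows reference numerals 1360, 1370, and 1450. The function names, the `scale` parameter, and the sample values are hypothetical.

```python
import random


def candidate_set(similarity, affinity, max_similarity, size):
    """Rank target articles by composite affinity, skipping near-duplicates.

    similarity, affinity: dicts mapping a target article index to a level,
    both taken with respect to one designated article.
    """
    eligible = [y for y in affinity if similarity.get(y, 0.0) <= max_similarity]
    eligible.sort(key=lambda y: affinity[y], reverse=True)
    return eligible[:size]


def pick_weighted(candidates, affinity, rng, scale=1000):
    """Draw one candidate with probability proportional to its affinity."""
    if not candidates:
        return None
    # Width of each integer band; at least 1 so every candidate stays reachable.
    widths = [max(1, round(affinity[y] * scale)) for y in candidates]
    draw = rng.randrange(sum(widths))
    start = 0
    for y, width in zip(candidates, widths):
        if draw < start + width:
            return y
        start += width
    raise AssertionError("unreachable: draw lies inside the total band range")


similarity = {1: 0.99, 2: 0.40, 3: 0.55, 4: 0.10}
affinity = {1: 0.90, 2: 0.50, 3: 0.30, 4: 0.05}
candidates = candidate_set(similarity, affinity, max_similarity=0.95, size=2)
choice = pick_weighted(candidates, affinity, random.Random(0))
```

Article 1 has the highest affinity in this example but is excluded because its content similarity exceeds the threshold, which matches the exclusion rule stated above.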
The process of determining pairwise inter-article content similarity may be based on formulating word vectors, each word vector characterizing content of a respective article of said plurality of articles and performing pairwise comparisons of word vectors of different articles. A memory device coupled to the computing device stores composite pairwise affinity levels exceeding a predefined lower bound.
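A minimal sketch of pairwise word-vector comparison, assuming bag-of-words vectors and cosine similarity. The disclosure leaves the vector form and the comparison open, so both choices, and all function names, are assumptions.

```python
import math
from collections import Counter


def word_vector(text):
    """Bag-of-words vector mapping each word to its count (a simplifying assumption)."""
    return Counter(text.lower().split())


def content_similarity(u, v):
    """Cosine similarity of two sparse word vectors, between 0 and 1."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarities(texts):
    """Compare every unordered pair (j, k), k > j, once; similarity is symmetric."""
    vectors = [word_vector(t) for t in texts]
    levels = {}
    for j in range(len(vectors)):
        for k in range(j + 1, len(vectors)):
            levels[(j, k)] = content_similarity(vectors[j], vectors[k])
    return levels


levels = pairwise_similarities(["solar power grid", "solar power plant", "cooking pasta"])
```

Because the similarity is symmetric, only pairs with k greater than j are computed, which halves the number of comparisons.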
In accordance with a further aspect, the present invention provides a method of interacting with an information system comprising tracking a plurality of users accessing a plurality of articles, determining for each tracked user a respective user type, a currently accessed article, and article-access transition if any. For each article-access transition, where a particular user accesses a first article then a second article, a global measure and a user-type measure of transitions from the first article to the second article are updated. A composite measure is then determined as a function of the global measure and the user-type measure. A first target article to succeed the currently accessed article is determined according to composite measures of directed article pairs originating from said currently accessed article. The first target article is communicated to a respective user.
The method implements a process of acquiring contents of the plurality of articles and determining pairwise content similarities of said plurality of articles. A composite affinity level for each directed pair of articles is determined as a function of at least one of a respective content similarity, a respective global measure, and/or a respective user-type measure. A second target article to succeed the currently accessed article may be determined according to composite affinity levels of directed article pairs originating from the currently accessed article. The second target article may also be communicated to a respective user.
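The form of the composite function is not fixed by this passage. As a sketch only, it may be pictured as a weighted sum of the three levels, with the user-type term weighted most heavily. The weights, the function names, and the dictionary layout keyed by directed article pair are all assumptions.

```python
def composite_affinity(similarity, global_level, user_type_level,
                       w_similarity=0.2, w_global=0.3, w_user_type=0.5):
    """Weighted sum of three normalized levels (hypothetical weights)."""
    return (w_similarity * similarity
            + w_global * global_level
            + w_user_type * user_type_level)


def preferred_successor(current, targets, similarity, global_level, user_type_level):
    """Return the target whose directed pair (current, target) scores highest."""
    if not targets:
        return None
    return max(
        targets,
        key=lambda y: composite_affinity(
            similarity[(current, y)],
            global_level[(current, y)],
            user_type_level[(current, y)],
        ),
    )


similarity = {(8, 3): 0.5, (8, 5): 0.25}
global_level = {(8, 3): 0.25, (8, 5): 0.75}
user_type_level = {(8, 3): 0.25, (8, 5): 0.5}
best = preferred_successor(8, [3, 5], similarity, global_level, user_type_level)
```

In this example article 3 is more similar in content to article 8, but article 5 is selected because its usage-based levels outweigh the similarity term.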
According to an embodiment, the method implements a process of acquiring characteristics of the plurality of users and clustering the plurality of users into a number of clusters according to the characteristics and a predefined criterion. The user type may be determined as an identifier of a cluster to which the tracked user belongs.
For a finer determination of the user type, centroids of the plurality of clusters are determined and a centroid-proximity measure of a user is determined according to proximity of the user to a respective centroid.
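A minimal sketch of a centroid and a centroid-proximity measure, assuming numeric descriptor vectors, an arithmetic mean, and Euclidean distance. The mapping of distance to a value between 0 and 1, the `scale` parameter, and the function names are assumptions made for illustration.

```python
import math


def centroid(vectors):
    """Arithmetic mean of the descriptor vectors of one cluster."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroid_proximity(user, center, scale):
    """Map Euclidean distance to (0, 1]: 1.0 at the centroid, smaller farther away.

    The 1 / (1 + d / scale) mapping is an illustrative assumption.
    """
    return 1.0 / (1.0 + math.dist(user, center) / scale)


cluster = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]
center = centroid(cluster)
near = centroid_proximity([1.0, 1.0], center, scale=2.0)
far = centroid_proximity([1.0, 3.0], center, scale=2.0)
```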
The global measure and user-type measure of transitions are updated following each article-access transition.
In accordance with a further aspect, the present invention provides an apparatus for interacting with an information system. The apparatus comprises a processor and memory devices storing processor-executable instructions organized in a number of modules.
A tracking module tracks a plurality of users accessing a plurality of articles to acquire contents of the plurality of articles, characteristics of said plurality of users, and pairwise article successions.
A module characterizing the articles determines pairwise content-similarity levels of the articles.
A module characterizing the users divides the plurality of users into clusters according to the users' characteristics.
A module characterizing usage accumulates for each directed article pair of said pairwise article successions a gravitation measure based on a respective succession count and an attraction measure for each cluster of users indicating a respective cluster-specific weight.
A recommendation module determines for a reference article an identifier of a preferred succeeding article according to pairwise content-similarity levels, gravitation measure, and said attraction measure. The module communicates the identifier to a user accessing the reference article. The recommendation module is further configured to determine an affinity level for each directed article pair according to respective content-similarity level, gravitation measure, and attraction measure. The directed article pairs originating from each article are sorted into ranks according to respective affinity levels. The preferred succeeding article is then determined according to ranks of directed article pairs originating from the reference article.
The apparatus further comprises a module, stored in one of the memory devices, configured to detect a subsequent article accessed by a user and report discrepancies of content-similarity, gravitation measure, and attraction measure between transition to the subsequent article and a transition to a preferred succeeding article communicated to the user.
Thus, improved methods and apparatuses for interaction with an information distribution system have been provided.
Embodiments of the present invention will be further described with reference to the accompanying exemplary drawings, in which:
Article: Information stored in a medium in the form of text, image, audio signal, and/or video signal is referenced as an article.
Article inspection: The act of reading, viewing, or listening to an article is referenced as “article inspection”.
Directed article pair: When a user visits article “X” then article “Y”, “X” and “Y” are said to form a directed article pair. The transition rate from “X” to “Y” may differ from the transition rate from “Y” to “X”.
Article transition: The act of successively accessing two articles within a predefined interval of time is referenced as “article transition”.
Article succession: An article succession comprises two articles accessed by a same user.
Frequency of article successions: The term refers to incidence (numbers) of article successions.
Pairwise frequency of article successions: The term refers to incidence of article successions for each directed article pair.
Content similarity: A level of similarity of two articles may be based on comparing contents of the two articles.
Pairwise inter-article content similarity: The term refers to values of content similarity for each directed article pair.
Composite pairwise affinity levels: The term refers to values of composite affinity levels for each directed article pair.
Inter-article global measure of successions: The term refers to any measure relevant to article successions independent of users causing the successions.
User-type measure: The term refers to any measure relevant to article successions for a specific type of users.
Inter-article gravitation measure: The term refers to a measure of inter-article gravitation based, for example, on a count (score) of the number of transitions from one article to another, or on some other criterion. A gravitation measure is independent of the types of users effecting the transitions. An inter-article gravitation measure is a global measure.
Inter-article gravitation level: The term refers to a ratio of an inter-article gravitation measure from a first article to a second article to the total number of transitions from the first article to all other articles.
Inter-article attraction measure: The term refers to a measure based on a summation of levels of significance of users of a same user group (same user cluster) effecting transitions from one article to another. An inter-article attraction measure is a user-type measure.
Inter-article attraction level: The term refers to a ratio of an inter-article attraction measure from a first article to a second article to the total number of transitions from the first article to all other articles effected by users of a same cluster.
A level of significance: A value associated with a user relevant to the user's position within a respective group of users.
User type: A user type may be defined in terms of a user's association with a group of users as well as the user's level of significance within the group.
Clustering: A process of grouping users based on descriptors of the users is referenced as “clustering”. The descriptors of the users may relate to several aspects such as income, education, interest, and social activities. A user may be characterized according to a vector of descriptors.
Centroid: A hypothetical user whose vector of descriptors is a mean value of the vectors of descriptors of a set of users is referenced as a “centroid”. The mean value is not necessarily an arithmetic mean.
Centroid-proximity measure (or centroid-proximity coefficient): A user's distance (such as a normalized Euclidean distance) from a centroid of a cluster to which the user belongs may be used as a level of significance of the user.
Usage data: Usage data includes an overall score of the number of transitions from a reference article to each other article, and a cluster-specific score of the number of transitions to each other article for each user cluster.
Cyclic age: Usage data may be frequently adjusted to place more emphasis on more recent usage patterns. The period between the time of a previous adjustment and a current observation time is referenced as “cyclic age”.
Measure of effective recommendations: The term refers to a value indicating effectiveness of recommendations, such as a proportion of transitions obeying recommendations or a (positive) change in mean value of a composite measure of affinity.
Randomly sequenced round robin process: The term refers to selecting items from different sets in a random order.
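The gravitation level and attraction level defined above are both ratios, and may be sketched as follows. The nested-dictionary layout and the function names are assumptions. The attraction measure is taken here as a sum of users' levels of significance, and the denominator as the cluster's count of transitions from the first article, as the definitions state.

```python
def gravitation_level(G, x, y):
    """G[x][y] divided by the total number of transitions from x to all other articles."""
    total = sum(G[x].values())
    return G[x].get(y, 0) / total if total else 0.0


def attraction_level(attraction, cluster_counts, x, y):
    """Attraction measure from x to y divided by the cluster's transitions from x.

    attraction[x][y]: summed significance of the cluster's users who moved x -> y.
    cluster_counts[x][y]: number of transitions x -> y by users of the same cluster.
    """
    total = sum(cluster_counts[x].values())
    return attraction[x].get(y, 0.0) / total if total else 0.0


G = {8: {3: 6, 5: 2}}                  # all users: 8 transitions from article 8
attraction = {8: {3: 1.5, 5: 0.5}}     # one cluster: summed significance levels
cluster_counts = {8: {3: 3, 5: 1}}     # one cluster: 4 transitions from article 8

g_level = gravitation_level(G, 8, 3)
a_level = attraction_level(attraction, cluster_counts, 8, 3)
```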
NOTATION
ρ: Number of article ranks
T(y): Number of transitions to an article y from a reference article
R(y): Rank of article y (with respect to a reference article)
U(r): Index of an article of rank r
χ: Number of clusters of users
c: Index of a cluster of users
Cj: Cluster of index j, 0≤j<χ
Θ(x,y): Similarity level of articles y and x (Θ(x,y)=Θ(y,x))
G(x,y): Score of any users selecting article y after inspecting article x (G(y,x) is not necessarily equal to G(x,y))
Γ(x,y,c): Score of users of a specific cluster of users selecting article y after inspecting article x (Γ(y,x,c) is not necessarily equal to Γ(x,y,c))
Φ(x,y,c): A composite coefficient of affinity of article y to article x (Φ(x,y,c) is not necessarily equal to Φ(y,x,c))
τ: Data-age threshold
Σmin: Lower bound of gravitation score (for the entire population of users)
Σmax: Upper bound of gravitation score (for the entire population of users)
Smin: Lower bound of attraction score (for a specific cluster of users)
Smax: Upper bound of attraction score (for a specific cluster of users)
- 100: System for measuring and influencing article selection
- 120: A plurality of articles
- 140: A plurality of users
- 160: A learning module
- 180: A learning-and-guiding module
- 182: Communication path from a learning-and-guiding module to a plurality of users
- 200: Components of module 160 and module 180
- 220: Information Distribution System (also referenced as “Information System”)
- 241: Process of acquiring article-access data
- 242: Process of determining article-succession patterns
- 263: Process of recommending article succession
- 264: Process of measuring effect of recommendation
- 300: Article-succession selection system
- 302: Processes relevant to articles
- 304: Processes relevant to usage and users
- 310: Process of acquiring a plurality of articles from the Information Distribution System
- 320: Process of characterizing the plurality of articles
- 330: Process of determining mutual article-content similarity
- 340: Process of tracking users of the Information Distribution System
- 350: Process of selecting a plurality of users of interest
- 360: Process of characterizing the plurality of users of interest
- 370: Process of characterizing usage of articles
- 380: Process of correlating article succession to content similarity and usage statistics
- 390: Process of determining preferred article successions and informing respective users
- 400: A learning system
- 410: Article-characterization data
- 420: User-characterization data
- 430: User-tracking data
- 440: Memory device storing affinity data
- 441: Article content-similarity data
- 442: Usage statistics
- 446: User-independent inter-article gravitation data
- 448: User-specific inter-article attraction data
- 460: Parameters of an affinity-determination expression
- 500: Learning and guiding system
- 510: Module for recommending a successor article
- 520: Module for measuring effect of recommending
- 580: Recommendations sent to users
- 600: Visualization of a plurality of users
- 602: User representation
- 610: A cluster C0 of users
- 611: A cluster C1 of users
- 612: A cluster C2 of users
- 613: A cluster C3 of users
- 620: Centroid of Cluster C0
- 621: Centroid of Cluster C1
- 622: Centroid of Cluster C2
- 623: Centroid of Cluster C3
- 700: Visualization of categorizing users belonging to a cluster into strata according to centroid proximity
- 720: Centroid representation
- 800: Process of affinity-data formation
- 810: Users' indices
- 812: Cluster to which a user belongs
- 814: Centroid-proximity coefficient
- 820: Articles' indices
- 840: Array of content similarity of an article to each other article
- 860: Array of gravitation score from an article to each other article
- 880: Arrays of attraction score from an article to each other article for different clusters of users where the attraction score is independent of users' proximity to respective centroids
- 900: Affinity coefficient (affinity level) computation
- 910: Array of article-content similarity levels with respect to a specific article
- 912: A similarity level below a predefined lower bound
- 914: A similarity level exceeding a predefined upper bound
- 920: Array of gravitation scores with respect to a specific article (G(x,y), x=8, 0≤y<M, y≠x, M=20)
- 930: Array Γ of attraction scores with respect to a specific article (of index 8) for users of a specific proximity to the centroid of a cluster of users (Γ(x,y,c), x=8, 0≤y<M, y≠x, M=20)
- 940: Array Φ of composite affinity coefficients (composite affinity levels) of articles with respect to a specific article (Φ(x,y,c), x=8, 0≤y<M, y≠x, M=20)
- 950: An exemplary composite affinity coefficient (level)
- 1000: Details of learning module 160
- 1010: Step of initializing a gravitation matrix and attraction matrices
- 1020: Process of tracking users
- 1030: Process of detecting article-access succession
- 1040: Process of determining user index and indices of detected successive articles
- 1050: Process of determining a cluster to which the user belongs and level of proximity to centroid
- 1060: Process of updating the gravitation matrix and attraction matrices
- 1100: Learning and guiding module
- 1110: Process of detecting access to an article
- 1120: Process of determining index of user and index of the accessed article
- 1130: Process of determining the cluster to which the user belongs and corresponding centroid-proximity coefficient
- 1140: Process of determining a preferred successor article and recommending the preferred successor to the user
- 1150: Process of detecting user's access to a new article within a specified interval
- 1152: Process of selecting a succeeding process according to a result of detecting
- 1160: Process of determining the index of a new article
- 1170: Process of updating the gravitation matrix and attraction matrices
- 1180: Process of measuring the effect of recommendations
- 1190: Process of communicating measurements for further processing
- 1200: Matrices of affinity data
- 1220: User-independent inter-article affinity
- 1222: Similarity matrix
- 1224: Gravitation matrix
- 1240: User-dependent inter-article affinity
- 1242: Attraction matrices
- 1300: Criterion for selecting a candidate article (1300A) and processes of selecting a candidate article (1300B)
- 1310: Affinity data corresponding to a cluster of users
- 1340: Process of detecting access to an article
- 1350: Process of determining index of an article and cluster to which a user belongs
- 1360: Process of generating a random integer
- 1370: Process of selecting an article corresponding to a generated integer
- 1380: Process of communicating article information to a user
- 1400: Illustration of weighted random selection of candidate articles
- 1420: Rank of a candidate article
- 1440: Start of an integer band
- 1450: An integer band
- 1500: Apparatus for article recommendation
- 1502: Currently accessed article
- 1504: Recommended subsequent article
- 1506: An article selected following a current article
- 1510: Tracking module
- 1520: Pool of processors
- 1530: Memory device storing articles 120 (FIG. 1)
- 1540: Memory device storing users' cluster data
- 1550: Memory device storing article-similarity levels
- 1560: Memory device storing article gravitation scores
- 1570: Memory device storing article attraction scores
- 1580: Storage of software instructions
- 1581: Memory device storing software instructions relevant to succeeding-article recommendation
- 1582: Memory device storing software instructions relevant to measuring effect of recommendation
- 1584: Memory device storing software instructions relevant to similarity-data update
- 1586: Memory device storing software instructions relevant to gravitation-data update
- 1588: Memory device storing software instructions relevant to attraction-data update
- 1600: Exemplary article-content similarity levels and gravitation levels
- 1610: Article-content similarity levels with respect to a selected article
- 1620: Gravitation levels with respect to a selected article
- 1700: Exemplary attraction levels
- 1730: Exemplary attraction levels with respect to a selected article where users belonging to a specific cluster are given a same weight
- 1740: Exemplary attraction levels with respect to a selected article where a user belonging to a specific cluster is given a weight dependent on the user's proximity to the centroid of the cluster
- 1800: Sorted articles
- 1810: Sorted articles according to article-content similarity levels
- 1820: Sorted articles according to gravitation levels
- 1830: Sorted articles according to attraction levels 330
- 1900: Ranked articles
- 1910: Indices of sorted article-content similarity levels 410
- 1920: Indices of sorted gravitation levels 420
- 1930: Indices of sorted attraction levels 430
- 2000: Process of merging similarity levels, gravitation levels, and attraction levels of respective subsets of articles
- 2010: A set of articles of significant article-content similarity levels with respect to a specific article
- 2011: Indices of articles of set 2010
- 2012: Article-content similarity levels
- 2020: A set of articles of significant gravitation levels with respect to the specific article
- 2021: Indices of articles of set 2020
- 2022: Gravitation levels
- 2030: A set of articles of significant attraction levels with respect to the specific article
- 2031: Indices of articles of set 2030
- 2032: Attraction levels
- 2040: set containing merged sets 2010 and 2030
- 2041: Indices of articles of combined set 2040
- 2042: Normalized article-content similarity levels corresponding to set 2040
- 2043: Normalized gravitation levels corresponding to set 2040
- 2050: set containing merged sets 2010, 2020, and 2030
- 2051: Indices of articles of set 2050
- 2052: Normalized article-content similarity levels corresponding to set 2050
- 2053: Normalized gravitation levels corresponding to set 2050
- 2054: Normalized attraction levels corresponding to set 2050
- 2100: Process of merging similarity levels, gravitation scores, and attraction scores of respective subsets of articles
- 2120: A set of articles of significant gravitation scores with respect to the specific article
- 2121: Indices of articles of set 2120
- 2122: Gravitation scores
- 2130: A set of articles of significant attraction scores with respect to the specific article
- 2131: Indices of articles of set 2130
- 2132: Attraction scores
- 2140: set containing merged sets 2010 and 2130
- 2141: Indices of articles of combined set 2140
- 2142: Normalized article-content similarity levels corresponding to set 2140
- 2143: Gravitation scores corresponding to set 2140
- 2150: set containing merged sets 2010, 2120, and 2130
- 2151: Indices of articles of set 2150
- 2152: Normalized article-content similarity levels corresponding to set 2150
- 2153: Gravitation scores corresponding to set 2150
- 2154: Attraction scores corresponding to set 2150
- 2200: Processes of computation of content similarity coefficients
- 2210: Initializing index “j” of a reference article to equal 0
- 2220: Acquiring content of article j; for example a set of words W(j)
- 2230: Initializing index “k” of a target article to equal (j+1)
- 2240: Acquiring content of article k; for example a set of words W(k)
- 2250: Determining article-content similarity level α(j, k) of the articles of indices j and k
- 2260: Selecting next target article of index (k+1)
- 2264: Determining if all target articles with respect to the reference article of index j have been considered
- 2270: Selecting next reference article (of index j+1)
- 2280: Determining if all reference articles have been considered
- 2290: Completion message indicating availability of all mutual article-content similarity levels
- 2300: Article-content similarity data
- 2310: Indices of articles
- 2320: Matrix of article-content similarity levels for all pairs of articles (using, for example, a set of words W(j) of an article of index j, 0≤j<M; M being the number of articles under consideration)
- 2325: Mutual article-content similarity level α(j, k), α(k, j)=α(j, k), 0≤j<(M−1), k>j; α(j, j)=1.0, 0≤j<M
- 2330: Indices of mutual article-content similarity levels in array 2340
- 2340: Array of article-content similarity level
- 2430: Matrix of mutual article-content similarity levels
- 2440: Mutual article-content similarity level
- 2441: Content-similarity to a target article that is below a predefined lower bound, rendering transition to the target article less likely
- 2442: Content similarity to a target article that is above a predefined upper bound, rendering transition to the target article less likely
- 2500: Sorted articles according to similarity levels
- 2510: Index of a reference article (current article)
- 2520: Index of a target article
- 2530: Content similarity table
- 2540: Content-similarity levels
- 2550: Table of sorted significant mutual article-content levels
- 2560: Indices of target articles of significant mutual content-similarity levels with respect to a reference article
- 2600: Table of sets of candidate successor articles, for each reference article, according to article-content similarity data
- 2610: Index of a reference article
- 2620: Indices of succeeding articles sorted in descending order according to content-similarity levels excluding articles of content-similarity levels to the reference article below a predefined lower bound or above a predefined upper bound.
- 2630: Mutual article-content similarity level
- 2700: Processes of creating, updating, and sorting numbers of transitions from a selected article to other articles of a collection of articles
- 2705: Initializing ρ, array T, array R, and array U to zero; ρ denotes the number of article ranks, T(y) denotes the number of transitions to an article y, R(y) denotes a rank of article y, and U(r) denotes an index of an article of rank r, 0≤r≤ρ
- 2710: Process of detecting a user accessing an article of index y following visiting an article of index x
- 2712: Process of determining whether article y has been accessed following article x
- 2720: Process of updating array T to count an additional visit of article y following article x
- 2722: Process of revisiting process 2710 if the rank of article y is 1 (hence article y cannot be promoted)
- 2730: Next better rank
- 2740: Process of identifying an article w of a rank better than the rank h of article y
- 2750: Process of determining whether article y has realized a better score than article w, i.e., whether T(y) is now greater than T(w); this condition is only reached if T(y)=T(w)+1
- 2755: Process of visiting process 2780 if article w is of rank 1
- 2760: Selecting another better rank (this would be needed if there are more than two articles having the same score as determined from array T)
- 2770: Process of determining whether to maintain current article ranks and return to process 2710 or promote article y to a better rank (process 2780)
- 2775: Revising rank of new article to be promoted
- 2780: Process of exchanging ranks of article y and article w
- 2800: Exemplary results of Processes 2700 (presented in FIG. 28 as 2800A and in FIG. 29 as 2800B)
- 2820: Largest rank ρ of an article inspected after article x
- 2830: Article rank, index of array U storing indices of ranked articles; U(r) is the index of an article of rank r
- 2840: Indices of articles
- 2850: Array T storing a score of each article y selected after article x; T(y) is the number of users selecting article y after article x
- 2860: Array R storing a rank of each article y selected after article x; R(y) is the rank of article y; R(y)≤ρ
- 3000: Exemplary table indicating numbers of transitions from each article of a collection of articles to each other article of the collection of articles
- 3010: Index of a reference article (current article)
- 3020: Index of a succeeding article
- 3030: Number of transitions from an article x to an article y (i.e., number of users selecting article y after inspecting article x); 0≤x<M; 0≤y<M, y≠x, M being the number of articles under consideration
- 3040: Total number of transitions from each article
- 3100: Table of normalized article-succession scores
- 3150: Proportion of transitions from an article x to an article y (i.e., ratio of the number of users selecting article y after inspecting article x to the total number 3040 of users selecting an article belonging to a specified collection of articles); 0≤x<M; 0≤y<M; y≠x
- 3200: Data structures for a large-scale system
- 3210: Plurality of selected articles
- 3220: Indices of reference articles
- 3240: Successor articles of significant similarity to respective reference articles
- 3242: Index of target article
- 3245: Similarity level
- 3250: Successor articles of significant gravitation to respective reference articles
- 3255: Gravitation level
- 3260: Successor articles of significant attraction to respective reference articles
- 3265: Attraction level
- 3300: Gravitation scores adjusted according to data age
- 3302: Table of article-gravitation score
- 3304: Table of article-gravitation score with an adjustment
- 3310: Data age in arbitrary units (days for example)
- 3320: Index of an article y succeeding a reference article x
- 3330: Number of users accessing article y after accessing reference article x
- 3340: Total number of users selecting an article from a specified collection of articles after inspecting reference article x
- 3350: Age-adjusted number of users accessing article y after accessing reference article x
- 3360: Age-adjusted total number of users selecting an article from a specified collection of articles after accessing reference article x
- 3402: Table of normalized article-gravitation score
- 3404: Table of normalized article-gravitation score with an adjustment
- 3430: Ratio of the number of users accessing article y after accessing reference article x to the total number 3340 of users selecting an article from a specified collection of articles after inspecting reference article x
- 3450: Ratio of the age-adjusted number of users accessing article y after accessing reference article x to the age-adjusted total number 3360 of users selecting an article from a specified collection of articles after accessing reference article x
- 3510: Normalized previous gravitation levels of articles with respect to a reference article
- 3520: Normalized incremental gravitation levels of articles with respect to a reference article
- 3530: Normalized cumulative gravitation levels of articles with respect to a reference article
- 3540: Normalized age-adjusted gravitation levels of articles with respect to a reference article
- 3550: Index of target article succeeding reference article
- 3600: A procedure of adjusting article-gravitation data and article-attraction data according to data age and article-transition score
- 3610: A process of detecting user selection of a new article following a specific (reference) article
- 3620: A process of updating gravitation data and attraction data relevant to the reference article to account for selection of the new article
- 3622: Action based on comparing size of gravitation data for a specific article with a predefined lower bound Σmin
- 3624: Comparing age of gravitation data with a predefined age threshold and branching to other processes accordingly
- 3626: Comparing size of gravitation data for a specific article with a predefined upper bound Σmax and branching to other processes accordingly
- 3630: A process of adjusting gravitation vector of the reference article
- 3640: Comparing size of attraction data for the specific article with a lower bound Smin defined for a cluster to which the user belongs and branching to other processes accordingly
- 3642: Comparing age of attraction data with a respective predefined age threshold and branching to other processes accordingly
- 3650: A process of adjusting attraction vector of the reference article
- 3652: Comparing size of attraction data for the specific article with an upper bound Smax defined for the cluster to which the user belongs and branching to other processes accordingly
- 3710: Article-transitions score at age τ, exceeding Σmin
- 3720: Adjusted article-transitions score at age τ, with an adjustment coefficient of 0.5
- 3730: Article-transitions score exceeding Σmin, at cyclic age τ
- 3740: Adjusted article-transitions score at cyclic age τ, with an adjustment coefficient of 0.5
- 3750: Article-transitions score of Σmax at cyclic age less than τ
- 3760: Adjusted article-transitions score with an adjustment coefficient of 0.5 at a cyclic age less than τ
- 3820: Adjusted article-transitions score at age τ, with an adjustment coefficient of 0.8
- 3830: Article-transitions score of Σmax at cyclic age less than τ
- 3840: Adjusted article-transitions score with an adjustment coefficient of 0.8 at a cyclic age less than τ
- 3850: Article-transitions score of Σmax at cyclic age much less than τ
- 3860: Adjusted article-transitions score with an adjustment coefficient of 0.8 at a cyclic age much less than τ
- 3900: A module for determining a preferred article to succeed a current article based on age-weighted score of article successions
- 3910: Process of accessing a cyclic timer providing cyclic-time indications
- 3920: Process of detecting article succession where a user accesses a subsequent article following a current article
- 3922: Process of identifying type of a user effecting the article succession
- 3924: Process of updating gravitation scores and attraction scores to account for the detected succession
- 3926: Process of acquiring a total gravitation score Σ1 of the source article of the succession
- 3928: Step of branching to other processes based on comparing Σ1 with a predefined lower bound Σmin
- 3930: Process of comparing a current cyclic time indication with a predefined period (50 days for example)
- 3940: Process of attenuating gravitation scores of a reference article; multiplying each pairwise gravitation score of a directed article pair by a predefined value; 0.8 for example
- 3950: Process of acquiring a total attraction score S1 of the source article of the succession based on user type
- 3952: Step of branching to other processes based on comparing S1 with a predefined lower bound Smin
- 3960: Process of attenuating attraction scores of a reference article (multiplying each pairwise attraction score of a directed article pair by a predefined value which may be user-type specific)
- 3980: Processes of determining a preferred succeeding article based on accumulated scores which may be age weighted
Processes 302 comprise:
- process 310 of acquiring a plurality of articles from the Information Distribution System;
- process 320 of characterizing the plurality of articles; and
- process 330 of determining mutual article-content similarity.
Processes 304 comprise:
- process 340 of tracking users of the Information Distribution System;
- process 350 of selecting a plurality of users of interest;
- process 360 of characterizing the plurality of users of interest; and
- process 370 of characterizing usage of articles.
A process 380 correlates article succession to content similarity and usage statistics. A process 390 determines preferred article successions and informs respective users.
-
- learning system 400;
- a module 510 for determining a preferred article to follow a specific article and sending a recommendation 580 to a user accessing the specific article; and
- a module 520 for measuring effect of recommending articles to users.
The scores in arrays 880 of attraction scores are selected to be independent of users' proximity to respective centroids for ease of illustration. The total gravitation score (number of transitions) from the reference article of index x to the article of index 9 is 19, of which two transitions are effected by users of cluster C0, seven transitions are effected by users of cluster C1, one transition is effected by a user of cluster C2, and nine transitions are effected by users of cluster C3. In a preferred implementation, centroid proximity is considered, hence arrays 880 hold real numbers rather than integers.
The figure illustrates:
- an array 910 of article-content similarity levels Θ(x,y) with respect to reference article x (x=8);
- an array 920 of gravitation scores, each denoted G(x,y), of a specific article x to each other article y, for x=8, 0≤y<M, y≠x, M=20;
- an array 930 of attraction scores, each denoted Γ(x,y,c), of a specific article (of index 8) to each other article y, for users of a specific proximity to the centroid of a cluster of users (x=8, 0≤y<M, y≠x, M=20);
- an array 940, denoted Φ(x,y,c), of composite affinity levels of article x to each other article y (x=8, 0≤y<M, y≠x, M=20); and
- an exemplary composite affinity expression 950.
An article content-similarity level is a normalized variable with 0.0<Θ(x,y)≤1.0. However, the gravitation score G(x,y) rather than the normalized gravitation level g(x,y), where 0.0≤g(x,y)≤1.0, is used because the gravitation score is updated frequently, since each article transition causes an update. Thus, it is more computationally efficient to use the gravitation score with a respective coefficient in the expression Φ(x,y,c). Likewise, the attraction score Γ(x,y,c) rather than the normalized attraction level γ(x,y,c) is used in the expression Φ(x,y,c).
The process 900 determines, for a selected reference article x, a composite affinity level to each other article of the M articles indexed as articles 0 to (M−1). Each entry of an array 910 indicates an article-content similarity level for an article of index y with respect to an article of index x; x=8 in the example of
Each entry of an array 920 indicates a gravitation score G(x,y) for an article of index y with respect to an article of index x. The gravitation score G(x,y) of the article of index x to the article of index y is not necessarily equal to the reciprocal gravitation score G(y,x).
Each entry of an array 930 indicates attraction score of an article of index x to an article of index y. The attraction scores are determined taking into consideration proximity of users to the centroid of a respective cluster; hence the attraction scores are generally real numbers. The attraction score Γ(x,y,c) of the article of index x to the article of index y is not necessarily equal to the attraction score Γ(y,x,c) of article y to article x even within the same cluster c, 0≤c<χ.
Each entry of an array 940 indicates a composite affinity level of an article of index x to an article of index y. The composite affinity level is denoted Φ(x,y,c) and may be determined according to expression 950 as:
Φ(x,y,c)=Θ(x,y)+A×G(x,y)+B×Γ(x,y,c),
where parameters A and B may be judiciously selected or preferably determined from historical data.
The composite affinity level Φ(x,y,c) of the article of index x to the article of index y is not necessarily equal to composite affinity level Φ(y,x,c) of the article of index y to the article of index x.
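By way of a non-limiting illustration, expression 950 may be sketched in Python as follows. The function names, the dictionary layout keyed by target-article index, the coefficient values A=0.02 and B=0.05, and the sample scores are all assumptions made for this sketch; none of them is taken from the figures.

```python
def composite_affinity(theta, G, Gamma, A, B):
    """Expression 950: Phi(x,y,c) = Theta(x,y) + A*G(x,y) + B*Gamma(x,y,c)."""
    return theta + A * G + B * Gamma


def preferred_successor(theta_row, G_row, Gamma_row, A, B):
    """Return (y, Phi) for the target article y of largest composite affinity.

    Each argument row maps a target-article index y to a value for one fixed
    reference article x (and, for Gamma_row, one fixed user cluster c).
    A value missing from a row is treated as zero (an assumption).
    """
    candidates = set(theta_row) | set(G_row) | set(Gamma_row)
    scored = {
        y: composite_affinity(theta_row.get(y, 0.0), G_row.get(y, 0),
                              Gamma_row.get(y, 0.0), A, B)
        for y in candidates
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical data for one reference article and one cluster.
theta_row = {3: 0.5, 9: 0.2, 12: 0.4}   # content-similarity levels
G_row = {3: 4, 9: 19, 12: 10}           # gravitation scores (transition counts)
Gamma_row = {3: 1.0, 9: 6.5, 12: 2.0}   # attraction scores (real-valued)

best_y, best_phi = preferred_successor(theta_row, G_row, Gamma_row, A=0.02, B=0.05)
```

In this sketch article 9 is preferred even though article 3 has the highest content similarity, because the usage terms outweigh the similarity term for the assumed coefficients.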
Process 1110 detects access to an article. Process 1120 determines index x of the accessed article and index j of the user accessing the article. Process 1130 identifies the cluster c to which the user belongs and corresponding centroid-proximity coefficient Ω. Process 1140 determines a preferred successor article and recommends the preferred successor to the user. Process 1150 detects user's access to a subsequent article within a specified interval of time. Process 1152 determines whether the user selected a subsequent article. Process 1160 determines the index of the subsequent article, if any. Process 1170 updates the gravitation score matrix and attraction score matrices according to the subsequent article, if any. Process 1180 measures the effect of recommendations. The discrepancy Δ between the composite affinity level Φ(x,y,c) of article x to article y actually selected by the user accessing article x and the affinity level Φ(x,y*,c) corresponding to the recommended article y* may serve as an indicator of the accuracy of modelling users' behaviour. The first two moments of the discrepancy Δ may be determined for further processing (the first moment being Sum1/K and the second moment being Sum2/K). Of course, the most informative indicator of effectiveness of subsequent-article selection is a proportion of transitions that obeyed respective recommendations. Counting complying transitions may be performed within process 1180 (not illustrated in
Process 1190 communicates measurements to other processes for potential adjustment of the affinity expression 950.
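The measurements of process 1180 may be accumulated as running sums, as in the following minimal sketch. The class name, the method names, and the sign convention Δ = Φ(x,y*,c) − Φ(x,y,c) are assumptions of this sketch; the description above specifies only that the first two moments are Sum1/K and Sum2/K and that complying transitions may be counted.

```python
class RecommendationMonitor:
    """Accumulates discrepancy statistics over K observed transitions."""

    def __init__(self):
        self.K = 0          # number of observed transitions
        self.sum1 = 0.0     # running sum of discrepancies
        self.sum2 = 0.0     # running sum of squared discrepancies
        self.complied = 0   # transitions that followed the recommendation

    def record(self, phi_selected, phi_recommended, followed):
        # Assumed sign convention: recommended minus actually selected.
        delta = phi_recommended - phi_selected
        self.K += 1
        self.sum1 += delta
        self.sum2 += delta * delta
        if followed:
            self.complied += 1

    def moments(self):
        """First and second moments of the discrepancy (Sum1/K, Sum2/K)."""
        return self.sum1 / self.K, self.sum2 / self.K

    def compliance(self):
        """Proportion of transitions that obeyed the recommendation."""
        return self.complied / self.K


monitor = RecommendationMonitor()
monitor.record(phi_selected=0.5, phi_recommended=0.5, followed=True)
monitor.record(phi_selected=0.25, phi_recommended=0.75, followed=False)
first_moment, second_moment = monitor.moments()
```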
Four candidate articles of indices 912, 89, 1017, and 216 with composite normalized affinity levels of 0.32, 0.28, 0.24, and 0.16, respectively, are considered. Based on the affinity levels of the reference article to the candidate articles, articles 912, 89, 1017, and 216 are ranked as 0, 1, 2, and 3. Article 912, of the highest affinity level, may be the most favored article to succeed the reference article. However, it is conjectured that randomly considering each candidate article would increase the likelihood that a user accesses a recommended article. Preferably, the candidate articles would be selected in proportion to respective affinity levels. To select candidate articles according to affinity levels, each candidate article may be associated with a respective proportionate integer band between integers 0 and (L−1), L>>1, and a candidate article is selected according to a generated random integer between 0 and (L−1) in a manner well known in the art. The integer bands are non-overlapping and together cover all integers from 0 to (L−1). For example, with the normalized affinity levels of 0.32, 0.28, 0.24, and 0.16, and selecting the integer L to equal 1024, the candidate articles would be associated with integer bands {0-326}, {327-613}, {614-859}, and {860-1023}. A generated random integer of 500 selects article 89, a generated random integer of 900 selects article 216, etc.
Several other methods of implementing weighted random selection may be devised. For example, an array may be populated with randomly sequenced candidate articles, with each candidate article occupying a number of scattered entries proportionate to a corresponding affinity level. The entries of the array may then be read sequentially to determine a recommended article.
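The integer-band form of weighted random selection may be sketched as follows. The function names are hypothetical, and truncation toward zero when placing band boundaries is an assumption of this sketch (it reproduces the band starts 0, 327, 614, and 860 for L=1024); any other rounding rule that keeps the bands contiguous would serve equally well.

```python
import random
from bisect import bisect_right


def band_starts(affinities, L=1024):
    """Return the first integer of each candidate's band within 0..L-1."""
    total = sum(affinities)
    starts, cumulative = [], 0.0
    for level in affinities:
        starts.append(int(cumulative / total * L))  # truncate (assumed rule)
        cumulative += level
    return starts


def select(candidates, starts, draw):
    """Map an integer draw in 0..L-1 to the candidate whose band contains it."""
    return candidates[bisect_right(starts, draw) - 1]


def recommend(candidates, affinities, L=1024, rng=random):
    """Pick a candidate with probability proportional to its affinity level."""
    return select(candidates, band_starts(affinities, L), rng.randrange(L))


candidates = [912, 89, 1017, 216]
affinities = [0.32, 0.28, 0.24, 0.16]
starts = band_starts(affinities)
pick = recommend(candidates, affinities, rng=random.Random(0))
```

Because the bands are stored only by their starting integers, a binary search suffices to resolve a draw, which keeps selection cheap when the candidate list is long.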
A memory device 1530 stores the plurality of articles 120 (
Memory devices 1580 store software modules. A memory device 1581 stores software module 510 (
Sets 2010 and 2020 are merged into combined set 2040 which has 8 articles 2041 (the union of the set of articles 2011 and the set of articles 2021) with corresponding article-content normalized similarity levels 2042 and normalized gravitation levels 2043. Set 2040 and set 2030 are merged into combined set 2050 which has 9 articles 2051 (the union of the set of articles 2011, 2021, and 2031) with corresponding normalized article-content similarity levels 2052, normalized gravitation levels 2053, and normalized attraction levels 2054. The composite affinity levels Φ(x,y,c), 0≤x<M, 0≤y<M, and 0≤c<χ, may then be determined for the 9 articles, with invalid entries (marked “X”) set to equal zero as depicted in Table-I below.
The highest value for Θ(x,y), g(x,y), or γ(x,y,c), for 0≤x<M, 0≤y<M, 0≤c<χ is unity. Thus, with the composite affinity level defined as:
Φ(x,y,c)=β×Θ(x,y)+A×g(x,y)+B×γ(x,y,c),
the highest affinity level is (1+A+B), which may be realized only if there is a target article y* for which Θ(x,y*)=g(x,y*)=γ(x,y*,c)=1.0. The affinity levels indicated in Table-I are based on selecting the coefficients A and B to equal 1.5 and 2.5, respectively, so that the affinity level would be bounded between 0 and 5.0. The parameter β assumes a value of 0.0 if content-similarity is to be ignored (for experimentation) and a value of 1.0 otherwise.
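The merging of candidate sets and the bounded affinity computation may be sketched as follows. The function names and the sample levels are hypothetical and do not reproduce Table-I; the sketch only shows a union of candidate indices with absent (invalid) entries counted as zero, using the coefficients A=1.5 and B=2.5 stated above.

```python
def merge_levels(similarity, gravitation, attraction):
    """Union of candidate indices; a level absent from a set counts as 0.0."""
    merged = {}
    for y in sorted(set(similarity) | set(gravitation) | set(attraction)):
        merged[y] = (similarity.get(y, 0.0),
                     gravitation.get(y, 0.0),
                     attraction.get(y, 0.0))
    return merged


def affinity(levels, A=1.5, B=2.5, beta=1.0):
    """Phi = beta*Theta + A*g + B*gamma for one (Theta, g, gamma) triple."""
    theta, g, gamma = levels
    return beta * theta + A * g + B * gamma


# Hypothetical normalized levels for one reference article and one cluster.
similarity = {1: 0.4, 3: 0.5}    # Theta(x,y)
gravitation = {3: 0.2, 6: 0.6}   # g(x,y)
attraction = {6: 0.4, 7: 1.0}    # gamma(x,y,c)

merged = merge_levels(similarity, gravitation, attraction)
phi = {y: affinity(levels) for y, levels in merged.items()}
```

With all three normalized levels at most 1.0, every value in `phi` lies between 0 and 1+A+B=5.0; setting `beta=0.0` removes the content-similarity term and lowers the bound to A+B.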
As mentioned above in the description of
Φ(x,y,c)=β×Θ(x,y)+A*×G(x,y)+B*×Γ(x,y,c),
with A*=A/Z1(x) and B*=B/Z2(x,c), where Z1(x) is a running sum of gravitation scores, and Z2(x,c) is a running sum of attraction scores for user-cluster c, of reference article x.
Generally, the composite affinity level may be expressed as other functions of Θ(x,y), G(x,y), and Γ(x,y,c).
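The equivalence between scaling the coefficients and normalizing the scores may be checked numerically, as in the following sketch. The function and argument names and the sample values are assumptions; the only relation relied upon is that a normalized level equals a score divided by the corresponding running sum.

```python
import math


def affinity_from_levels(theta, g, gamma, A, B, beta=1.0):
    """Affinity computed from normalized levels g and gamma."""
    return beta * theta + A * g + B * gamma


def affinity_from_scores(theta, G, Gamma, A, B, sum_G, sum_Gamma, beta=1.0):
    """Affinity computed from raw scores with coefficients scaled by running sums."""
    A_star = A / sum_G        # scaled gravitation coefficient
    B_star = B / sum_Gamma    # scaled attraction coefficient
    return beta * theta + A_star * G + B_star * Gamma


# Hypothetical values: 19 of 76 transitions, attraction 6.0 of 24.0.
G, sum_G = 19, 76
Gamma, sum_Gamma = 6.0, 24.0
theta, A, B = 0.5, 1.5, 2.5

from_levels = affinity_from_levels(theta, G / sum_G, Gamma / sum_Gamma, A, B)
from_scores = affinity_from_scores(theta, G, Gamma, A, B, sum_G, sum_Gamma)
```

Only the two running sums need to be maintained per reference article; the individual scores are never re-normalized when a new transition is recorded.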
The target articles to follow reference article-0 are the articles of indices 1, 2, 3, 5 and 7 which have corresponding content-similarity levels of 0.4, 0.2, 0.5, 0.4, and 0.3. Article-3 has the highest content-similarity to article-0 and article-2 has the least content-similarity level. The indices of target articles in table 2550 to follow reference article-0 are 3, 1, 5, 7, and 2 with corresponding content-similarity levels of 0.5, 0.4, 0.4, 0.3, and 0.2.
The target articles to follow reference article-4 are the articles of indices 1, 2, 3, 5, 6 and 7 which have corresponding content-similarity levels of 0.4, 0.8, 0.5, 0.2, 0.3 and 0.2 (noting that α(j,k)=α(k,j)). Article-2 has the highest content-similarity to article-4 and article-7 (or article-5) has the least content-similarity level. The indices of target articles in table 2550 to follow reference article-4 are 2, 3, 1, 6, 5, 7 with corresponding content-similarity levels of 0.8, 0.5, 0.4, 0.3, 0.2 and 0.2.
-
- Process 2705 initializes ρ, array T, array R, and array U to zero.
- Process 2710 detects a user accessing an article of index y following visiting an article of index x.
- Process 2712 determines whether article y has been accessed following article x.
- Process 2720 updates array T to count an additional visit of article y following article x.
- Process 2722 revisits process 2710 if the rank of article y is 1 (hence article y cannot be promoted).
- Process 2730 indexes a better rank.
- Process 2740 identifies an article w of a rank better than the rank h of article y.
- Process 2750 determines whether article y has realized a better score than article w, i.e., whether T(y) is now greater than T(w); this condition is only reached if T(y)=T(w)+1.
- Process 2755 triggers process 2780 if article w is of rank 1 or process 2760 otherwise.
- Process 2760 selects another better rank (this would be needed if there are more than two articles having the same score as determined from array T).
- Process 2770 determines whether to maintain current article ranks and return to process 2710 or promote article y to a better rank (process 2780).
- Process 2775 revises rank of article to be promoted.
- Process 2780 exchanges ranks of article y and article w.
Upon detecting a transition from article x to any other article y of a specified collection of M articles, 0≤y<M, a respective score is updated. For a reference article of index x, 0≤x<M, a current highest rank ρ is initialized to equal 0, and each of the M entries of arrays T, R, and U is initialized to equal 0. Array T records a score of the number of transitions from article x to each other article; T(y) is a current number of transitions from article x to article y, 0≤y<M. Array R records a current rank of each article, excluding article x; R(y) is a current rank of article y with respect to article x. Array U records indices of ranked articles with respect to article x; U(1) is the index of the article of highest score, U(2) is the index of an article having a score less than or equal to the score of article U(1), and so on.
-
- current largest rank 2820;
- article rank 2830, which indexes array U storing indices of ranked articles; U(r) is the index of an article of rank r;
- indices 2840 of articles;
- array T, referenced as 2850, storing a score of each article y selected after article x; T(y) is the number of users selecting article y after article x; and
- array R, referenced as 2860, storing a rank of each article y selected after article x; R(y) is the rank of article y; R(y)≤ρ; arrays T and R are interleaved in FIG. 28.
Updates following five transitions are described below.
Transition 1: A transition to article-4 is detected (y=4, process 2710). Since T(4)=0, process 2712 leads to process 2790 and process 2710 is revisited. In process 2790, ρ is increased to equal 1, T(4) is increased to 1, R(4) is set to equal 1, and U(1) is set to index 4.
Transition 2: A transition to article-6 is detected (y=6, process 2710). Since T(6)=0, process 2712 leads to process 2790 and process 2710 is revisited. In process 2790, ρ is increased to equal 2, T(6) is increased to 1, R(6) is set to equal 2, and U(2) is set to index 6.
Transition 3: A transition to article-6 is detected (y=6, process 2710). Since T(6)≠0, process 2712 leads to process 2720 in which T(6) is increased to 2. Since the rank of article-6 is not 1 (h=R(6)=2), there may be an opportunity to promote article-6. Thus, process 2722 leads to process 2730 to select the next better rank k; k=h−1=1. In process 2740, the index w of the article of next better rank is identified as w=4 and process 2750 determines that the score of article-6 is greater than the score of article-4. Process 2755 determines that the sought better rank, k, is 1, i.e., the top rank. Thus process 2780 is activated to demote article-4 to rank h=2 and promote article-6 to the top rank k=1. Process 2710 is then revisited.
Transition 4: A transition to article-2 is detected (y=2, process 2710). Since T(2)=0, process 2712 leads to process 2790 and process 2710 is revisited. In process 2790, ρ is increased to equal 3, T(2) is increased to 1, R(2) is set to equal 3 (ρ=3), and U(3) is set to index 2.
Transition 5: A transition to article-2 is detected (y=2, process 2710). Since T(2)≠0, process 2712 leads to process 2720 in which T(2) is increased to 2. Since the rank of article-2 is not 1 (h=R(2)=3), there may be an opportunity to promote article-2. Thus, process 2722 leads to process 2730 to select the next better rank k; k=h−1=2. In process 2740, the index w of the article of next better rank is identified as w=4 and process 2750 determines that the score of article-2 is greater than the score of article-4. Process 2755 determines that the sought better rank, k, is 2, i.e., not the top rank. Thus process 2760 identifies an even better rank k; k=1. In process 2740, the index w of the article of rank k is identified as w=6 and process 2750 determines that T(2) is not greater than T(6). Thus, there is no hope for two promotions; process 2770 leads to process 2775 which resets the sought rank to rank 2 instead of rank 1.
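By way of a non-limiting illustration, the rank-update procedure may be sketched in Python as follows. The class and method names are hypothetical. The rank-by-rank search of processes 2730 to 2775 is condensed here into a single loop, and the handling of a first visit follows the behavior attributed to process 2790 in the transitions above; both are simplifications assumed for this sketch. Ranks are 1-based, so entry 0 of array U is unused.

```python
class SuccessorRanks:
    """Transition counts and ranks of successor articles of one reference article."""

    def __init__(self, num_articles):
        self.rho = 0                         # largest rank assigned so far
        self.T = [0] * num_articles          # T[y]: transitions to article y
        self.R = [0] * num_articles          # R[y]: rank of article y (0 = unranked)
        self.U = [0] * (num_articles + 1)    # U[r]: article of rank r, r >= 1

    def record(self, y):
        """Account for one transition from the reference article to article y."""
        if self.T[y] == 0:                   # first visit: append at the worst rank
            self.rho += 1
            self.T[y] = 1
            self.R[y] = self.rho
            self.U[self.rho] = y
            return
        self.T[y] += 1
        h = self.R[y]                        # current rank of article y
        k = h
        # Seek the best rank whose occupant now has a strictly lower score.
        while k > 1 and self.T[y] > self.T[self.U[k - 1]]:
            k -= 1
        if k == h:                           # no promotion possible
            return
        w = self.U[k]                        # article to be demoted
        self.U[k], self.U[h] = y, w          # exchange ranks of y and w
        self.R[y], self.R[w] = k, h


ranks = SuccessorRanks(num_articles=8)
for successor in (4, 6, 6, 2, 2):            # the five transitions described above
    ranks.record(successor)
```

After the five transitions, article-6 holds rank 1, article-2 holds rank 2, and article-4 holds rank 3. A single exchange suffices per transition because a score increases by one at a time, so an article can only overtake articles that were tied with it.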
Preferably, Table 3000 rather than Table 3100 is used for quantifying the gravitation levels. The article-transition scores are recomputed frequently, hence re-normalizing unnecessarily increases the computational effort. If the gravitation level is the sole criterion for selecting a successor article, then only score comparison is needed. If the gravitation level is a component of a composite affinity level as illustrated in
Likewise, data relevant to successor articles 3250 of significant gravitation to respective reference articles are stored. For each reference article 3220, an index 3242 of a target article and a corresponding gravitation level 3255 to the reference article are stored.
Data relevant to successor articles 3260 of significant attraction to respective reference articles are stored for each user cluster. For each reference article 3220, an index 3242 of a target article and a corresponding attraction level 3265 to the reference article are stored.
Table 3302 indicates the number 3330 of transitions to each other article at different data ages 3310 (expressed in arbitrary units; days for example). The number 3330 of transitions to an article of index y is the number of users accessing article y after accessing reference article x. The total number 3340 of users selecting any article from a specified collection of articles after accessing reference article x is indicated in the right column.
Table 3304 indicates the number 3350 of transitions to each other article at different data ages 3310 (expressed in arbitrary units; days for example) where the number of transitions is adjusted to half the accumulated values when the data age reaches 100 days. This is done to give more weight to more recent data. The adjusted total number 3360 of users selecting any article from a specified collection of articles after accessing reference article x is indicated in the right column.
Comparing Tables 3302 and 3304, at the data age of 100 days, the scores in Table 3304 are reduced to half of their values. Thus new article transitions would have more influence in determining inter-article gravitation. For example, without score adjustment, the number of transitions from article-5 to article 3 is 180 and the number of transitions from article-5 to article-7 is 212. With score adjustment, the number of transitions from article-5 to article 3 is 150 and the number of transitions from article-5 to article-7 is 122.
Thus, if the selection of a favorite succeeding article is based on article-gravitation data only, article-7 would be recommended if the scores are not adjusted but article-3 would be recommended if the scores are adjusted at the 100-day age point. If the selection considers other criteria, such as article-content similarity, then a higher score of article-gravitation would still influence the recommendation.
As discussed above, it is preferable to use the article-gravitation scores rather than normalized article-gravitation levels.
The scores accumulated between the ages of 100 days and 160 days are:
- {27, 20, 65, 120, 17, x, 13, 32} to a total of 294.
As indicated in Table 3402, the normalized scores at 100 days are
- {0.146, 0.075, 0.136, 0.141, 0.052, x, 0.028, 0.423}, and the normalized scores at 160 days are
- {0.124, 0.072, 0.171, 0.250, 0.054, x, 0.035, 0.294}.
The normalized increases of scores are
- {0.092, 0.068, 0.221, 0.408, 0.058, x, 0.044, 0.109}.
When the scores at the age of 100 days are multiplied by an adjustment coefficient of 0.5, the adjusted scores at the age of 160 days become
- {58, 36, 94, 150, 28, x, 19, 122}, to an adjusted total of 507.
The normalized adjusted scores at 160 days are
- {0.114, 0.071, 0.185, 0.296, 0.055, x, 0.037, 0.241}
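The age adjustment above may be reproduced with the following sketch. The function names are hypothetical. The 100-day scores are not quoted in the text; they are back-computed here as twice the difference between the quoted adjusted scores and the quoted increments, which is an assumption of this sketch. The entry for the reference article itself (article-5) is marked None.

```python
def age_adjust(scores_at_threshold, increments, eta):
    """Scale the scores held at the age threshold by eta, then add newer transitions."""
    return [None if s is None else eta * s + d
            for s, d in zip(scores_at_threshold, increments)]


def normalize(scores):
    """Divide each score by the total, rounded to three decimals."""
    total = sum(s for s in scores if s is not None)
    return [None if s is None else round(s / total, 3) for s in scores]


def best(scores):
    """Index of the successor article with the largest score."""
    return max((i for i, s in enumerate(scores) if s is not None),
               key=lambda i: scores[i])


# Assumed 100-day scores (back-computed); increments between 100 and 160 days.
scores_100 = [62, 32, 58, 60, 22, None, 12, 180]
increments = [27, 20, 65, 120, 17, None, 13, 32]

unadjusted = age_adjust(scores_100, increments, eta=1.0)   # no attenuation
adjusted = age_adjust(scores_100, increments, eta=0.5)     # halved at 100 days
levels = normalize(adjusted)
```

Without adjustment article-7 has the largest score (212 against 180 for article-3); with the 0.5 coefficient article-3 leads (150 against 122), which is the change of recommendation discussed above.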
-
- normalized gravitation levels 3510 of articles 0 to 7 at the data age of 100 days
- normalized incremental gravitation levels 3520 between 100 days and 160 days;
- normalized unadjusted gravitation levels 3530 at data age of 160 days; and
- normalized age-adjusted gravitation levels 3540 at data age of 160 days.
-
- Process 3610 detects user selection of a new article following a specific (reference) article.
- Process 3620 updates gravitation data and attraction data relevant to the reference article to account for selection of the new article.
- Process 3622 compares size of gravitation data for a specific article with a predefined lower bound Σmin, and branches to either process 3624 or process 3640 accordingly.
- Process 3624 compares age of gravitation data with a predefined age threshold τ and branches to either process 3626 or process 3630 accordingly.
- Process 3626 compares size of gravitation data for a specific article with a predefined upper bound Σmax and branches to either process 3630 or process 3642 accordingly.
- Process 3630 adjusts gravitation vector of the reference article.
- Process 3640 compares the size of attraction data for the specific article with a lower bound Smin defined for a cluster to which the user belongs and branches to either process 3610 or process 3642 accordingly.
- Process 3642 compares age of attraction data with a respective predefined age threshold τ and branches to either process 3650 or process 3652 accordingly.
- Process 3650 adjusts attraction vector of the reference article according to a predefined adjustment coefficient.
- Process 3652 compares size of attraction data for the specific article with an upper bound Smax defined for the cluster to which the user belongs and branches to either process 3610 or process 3650 accordingly.
The article gravitation scores G(x,y), 0≤x<M, 0≤y<M, y≠x, of all users may be individually adjusted to η×G(x,y), 0≤η<1.0, at a predefined cyclic age τ if:
- the total article-transition score σ is within the interval {Σmin, Σmax}, i.e., if Σmin≤σ<Σmax;
- or
- the total article-transition score σ≥Σmax regardless of the cyclic age.
- The same criteria apply to article attraction scores Γ(x,y,c).
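The two adjustment criteria may be expressed compactly as in the following sketch. The function names and the numeric bounds are hypothetical, and treating "at cyclic age τ" as "cyclic age has reached τ" is an assumption of this sketch.

```python
def due_for_attenuation(sigma, cyclic_age, sigma_min, sigma_max, tau):
    """Decide whether the scores of a reference article are to be attenuated.

    sigma      : total article-transition score of the reference article
    cyclic_age : age of the data since the last adjustment
    """
    if sigma >= sigma_max:          # upper bound reached: adjust regardless of age
        return True
    return sigma >= sigma_min and cyclic_age >= tau


def attenuate(scores, eta):
    """Scale every pairwise score of one reference article by eta, 0 <= eta < 1."""
    return {y: eta * s for y, s in scores.items()}


# Hypothetical bounds: Sigma_min=100, Sigma_max=1000, tau=50 days.
bounds = dict(sigma_min=100, sigma_max=1000, tau=50)
too_little_data = due_for_attenuation(50, 100, **bounds)
aged_data = due_for_attenuation(500, 50, **bounds)
young_data = due_for_attenuation(500, 20, **bounds)
saturated = due_for_attenuation(1000, 5, **bounds)
halved = attenuate({3: 60, 7: 180}, eta=0.5)
```

The same two functions would serve for attraction scores by supplying the cluster-specific bounds in place of the gravitation bounds.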
At age τ, the score 3710 exceeds Σmin. Thus, each of the scores G(x,y), y≠x, is scaled by an adjustment coefficient η=0.8. The total score is reduced to the value 3820 and the cyclic age is reset to zero. At a cyclic age that is less than τ (absolute age of 90 days), the score grows to a value (reference 3830) equal to Σmax; thus, each of the scores G(x,y) is reduced by the coefficient η=0.8 to a lower value (reference 3840) and the cyclic age is reset to zero. At a cyclic age that is much less than τ (absolute age of 248 days), the score grows to a value (reference 3850) that equals Σmax; thus, each of the scores G(x,y) is reduced by the coefficient η=0.8 to a lower value (reference 3860).
Comparing the score adjustment patterns of
If the gravitation scores are not due for adjustment, process 3980 is activated and process 3920 is revisited. Otherwise, the gravitation scores of a reference article are attenuated (process 3940) by multiplying each pairwise gravitation score of a directed article pair by a predefined value applicable to gravitation scores. Process 3950 acquires a total attraction score S1 of the source article of the succession based on user type. If S1 is less than a predefined lower bound Smin, step 3952 branches to process 3980 and process 3920 is then revisited. Otherwise, process 3960 is activated to attenuate the attraction scores of a reference article by multiplying each pairwise attraction score of a directed article pair by a predefined value which may be user-type specific. Thus, process 3980 determines a preferred succeeding article based on accumulated scores which may be age weighted.
With a large-scale system, handling a relatively large number of articles, the processes illustrated in
Systems and apparatus of the embodiments of the invention may be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When modules of the systems of the embodiments of the invention are implemented partially or entirely in software, the modules contain a memory device for storing software instructions in a suitable, non-transitory computer-readable storage medium, and software instructions are executed in hardware using one or more processors to perform the techniques of this disclosure.
It should be noted that methods and systems of the embodiments of the invention and data sets described above are not, in any sense, abstract or intangible. Instead, the data is necessarily presented in a digital form and stored in a physical data-storage computer-readable medium, such as an electronic memory, mass-storage device, or other physical, tangible, data-storage device and medium. It should also be noted that the currently described data-processing and data-storage methods cannot be carried out manually by a human analyst, because of the complexity and vast numbers of intermediate results generated for processing and analysis of even quite modest amounts of data. Instead, the methods described herein are necessarily carried out by electronic computing systems having processors on electronically or magnetically stored data, with the results of the data processing and data analysis digitally stored in one or more tangible, physical, data-storage devices and media.
Although specific embodiments of the invention have been described in detail, it should be understood that the described embodiments are intended to be illustrative and not restrictive. Various changes and modifications of the embodiments shown in the drawings and described in the specification may be made within the scope of the following claims without departing from the scope of the invention in its broader aspect.
Claims
1. A method of interacting with an information system, comprising:
- employing a hardware processor to execute processor-readable instructions to perform processes of: acquiring contents of a plurality of articles accessible through the information system; determining pairwise inter-article content similarity; tracking a plurality of users of the information system to identify pairwise article successions, wherein a pairwise article succession comprises two articles accessed by a same user; determining composite pairwise affinity levels of said plurality of articles according to: respective inter-article content similarity; types of tracked users effecting said pairwise article successions; and pairwise frequency of article successions; and determining for a designated article of said plurality of articles a preferred succeeding article according to said composite pairwise affinity levels.
2. The method of claim 1 further comprising communicating an identifier of said preferred succeeding article to a user accessing the designated article.
3. The method of claim 1 or 2 further comprising:
- segmenting said plurality of users into a plurality of clusters according to a predefined criterion; and
- determining said types of tracked users as identifiers of respective clusters to which said tracked users belong.
4. The method of claim 1 further comprising associating each tracked user with a respective group of users and a level of significance within said respective group of users, said types indicating for said each tracked user:
- a group of users to which said each tracked user belongs; and
- a respective level of significance.
5. The method of claim 1 further comprising:
- detecting an access transition to a subsequent article following said communicating; and
- updating a measure of effective recommendations subject to a determination that said subsequent article is the preferred succeeding article.
6. The method of claim 1 further comprising:
- detecting an access transition to a subsequent article following said communicating;
- determining a first composite affinity level of said designated article to said preferred succeeding article;
- determining a second composite affinity level of said designated article to said subsequent article; and
- updating discrepancy statistics based on comparing said first composite affinity level and said second composite affinity level.
7. The method of claim 1 further comprising:
- ranking directed article pairs originating from said designated article according to composite pairwise affinity levels;
- designating a predefined number of directed article pairs as candidate directed article pairs according to said ranking; and
- selecting said preferred succeeding article from among said candidate directed article pairs.
8. The method of claim 7 wherein said selecting comprises using a randomly sequenced round robin process weighted according to composite pairwise affinity levels of said candidate directed article pairs.
9. The method of claim 7 further comprising excluding a directed article pair of inter-article content similarity exceeding a predefined threshold.
10. The method of claim 1 wherein said determining pairwise inter-article content similarity comprises:
- formulating word vectors, each word vector characterizing content of a respective article of said plurality of articles; and
- performing pairwise comparisons of word vectors of different articles.
11. The method of claim 1 further comprising storing, in a memory device coupled to said hardware processor, composite pairwise affinity levels exceeding a predefined lower bound.
12. A method, implemented in a computing device, of interacting with an information system, the method comprising:
- tracking a plurality of users accessing a plurality of articles through the information system;
- determining for each tracked user: a respective user type of a predefined plurality of user types; and a currently accessed article;
- for each article-access transition where a particular user accesses a first article then a second article: maintaining a global measure and a user-type measure of transitions from the first article to the second article; and determining a composite measure as a function of the global measure and the user-type measure; and
- recommending a first target article to succeed said currently accessed article according to composite measures of directed article pairs originating from said currently accessed article.
13. The method of claim 12 further comprising:
- acquiring contents of said plurality of articles; and
- determining pairwise content similarities of said plurality of articles.
14. The method of claim 13 further comprising:
- determining a composite affinity level for each directed pair of articles as a function of at least one of: a respective content similarity; a respective global measure; and a respective user-type measure; and
- recommending a second target article to succeed said currently accessed article according to composite affinity levels of directed article pairs originating from said currently accessed article.
15. The method of claim 12 further comprising:
- acquiring characteristics of said plurality of users;
- clustering said plurality of users into a number of clusters according to said characteristics and a predefined criterion; and
- determining said user type as an identifier of a cluster to which said tracked user belongs.
16. The method of claim 15 further comprising:
- determining centroids of said plurality of clusters;
- determining a centroid-proximity measure of said particular user according to proximity of said particular user to a respective centroid; and
- determining said user-type measure as cumulative centroid-proximity measures of users effecting said each article-access transition.
17. The method of claim 12 further comprising for each article of said plurality of articles, ranking each other article according to a respective composite measure to produce a respective set of ranked directed article pairs.
18. The method of claim 17 wherein said recommending comprises:
- designating at least two articles of highest ranking; and
- randomly designating one of said at least two articles as said target article.
19. The method of claim 12 further comprising updating said global measure and said user-type measure following said each article-access transition.
20. A method of interacting with an information system comprising:
- employing a computing device to implement processes of: tracking a plurality of users accessing a plurality of articles through the information system; determining for each tracked user: a respective user type of a predefined plurality of user types; and a currently accessed article; for each article-access transition where a particular user accesses a first article then a second article: maintaining a global measure and a user-type measure of transitions from the first article to the second article; and acquiring contents of said plurality of articles; and determining pairwise content similarities of said plurality of articles; determining a composite affinity level for each directed pair of articles as a function of: a respective content similarity; a respective global measure; and a respective user-type measure; and recommending a preferred article to succeed said currently accessed article according to composite affinity levels of directed article pairs originating from said currently accessed article.
21. An apparatus for interacting with an information system, the apparatus comprising:
- a processor and a plurality of memory devices storing: a tracking module configured to track a plurality of users accessing a plurality of articles to acquire: contents of said plurality of articles; characteristics of said plurality of users; and pairwise article successions; a module for determining pairwise content-similarity levels of said plurality of articles; a module for dividing said plurality of users into clusters according to said characteristics; a module for accumulating for each directed article pair of said pairwise article successions: a gravitation measure based on a respective succession count; and an attraction measure for each cluster of users indicating a respective cluster-specific weight; and a recommendation module configured to communicate to a user accessing a reference article an identifier of a preferred succeeding article determined according to said pairwise content-similarity levels, said gravitation measure, and said attraction measure.
22. The apparatus of claim 21 wherein said recommendation module is further configured to:
- determine an affinity level for each directed article pair according to respective content-similarity level, gravitation measure, and attraction measure;
- sort directed article pairs originating from each article into ranks according to respective affinity levels; and
- determine said preferred succeeding article according to ranks of directed article pairs originating from said reference article.
23. The apparatus of claim 21 further comprising a module, stored in one of said memory devices, configured to:
- detect, from said pairwise article successions, a subsequent article accessed by said user; and
- report discrepancies of content-similarity, gravitation measure, and attraction measure between transition to said subsequent article and a transition to said preferred succeeding article.
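By way of illustration only, and without limiting the claims, the determination of a composite affinity level, the ranking of directed article pairs, and a weighted selection among candidate pairs might be sketched as follows. The linear combination, the default weights (with the user-type measure weighted most heavily), and the assumption that all three measures are normalized to the range 0 to 1 are illustrative choices, not taken from the claims. The selection step uses ordinary weighted random sampling as a stand-in for the randomly sequenced weighted round robin process. All function names are hypothetical.

```python
import random


def composite_affinity(similarity, gravitation, attraction,
                       w_sim=0.2, w_grav=0.3, w_attr=0.5):
    """Combine the three measures of one directed article pair.

    A weighted sum is an assumption; inputs are assumed to be
    normalized to [0, 1].
    """
    return w_sim * similarity + w_grav * gravitation + w_attr * attraction


def candidate_pairs(affinity_by_target, k):
    """Rank the pairs leaving one article and keep the k highest.

    affinity_by_target: {target_article: composite affinity level}
    """
    ranked = sorted(affinity_by_target.items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]


def select_successor(candidates, rng):
    """Pick one candidate target at random, weighted by affinity."""
    targets = [target for target, _level in candidates]
    weights = [level for _target, level in candidates]
    return rng.choices(targets, weights=weights, k=1)[0]


if __name__ == "__main__":
    levels = {
        "B": composite_affinity(0.9, 0.8, 0.9),
        "C": composite_affinity(0.4, 0.2, 0.1),
        "D": composite_affinity(0.6, 0.7, 0.5),
    }
    top = candidate_pairs(levels, k=2)
    print(select_successor(top, random.Random()))
```

Randomizing among the highest-ranked candidates, rather than always returning the single top pair, gives lower-ranked candidates some exposure so that their usage data can continue to accumulate.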
Type: Application
Filed: Jun 13, 2017
Publication Date: Oct 15, 2020
Inventor: Philip Joseph RENAUD (Toronto)
Application Number: 16/304,774