SYSTEM AND METHOD FOR DISCOVERING GROUPS WHOSE MEMBERS HAVE A GIVEN ATTRIBUTE
A method for discovering a group defined by a common characteristic is disclosed. The method includes building a representation of a portion of a social network based on a starting person with the given characteristic, the person also providing the person's gender and school affiliation. The social network representation is then searched to discover clusters therein meeting certain size and connectivity requirements with respect to the network. After the clusters in the network are discovered, clusters having a high degree of similarity are merged together. The resulting clusters, both merged and non-merged, are then scored to determine the cluster that best fits the original group. The winning cluster is then returned to the starting person, who confirms the correctness of the cluster. The set of persons in a confirmed cluster is then displayed to the starting person.
This continuation application incorporates by reference in its entirety and claims the benefit of application, U.S. Ser. No. 13/624,971, filed on Sep. 24, 2012 and titled “SYSTEM AND METHOD FOR DISCOVERING GROUPS WHOSE MEMBERS HAVE A GIVEN ATTRIBUTE”.
FIELD OF THE INVENTION
The present invention relates generally to discovering a group defined by a common attribute in a friend network.
DESCRIPTION OF THE RELATED ART
A friend network can be viewed as a graph whose vertices are persons and whose edges indicate a friend relationship, F(v1, v2). This graph can be exceedingly large and highly interconnected. The graph also contains auxiliary information about the friends in the graph, but this information is disjointed and unconnected in the graph. Thus, the graph is not, by itself, helpful in discovering groups of persons possessing a common attribute. For example, if one desires to know a group of persons in the graph who are members of an organization, there is no simple way to find this group directly from the graph. However, it is certainly desirable to use the friend network to discover groups having a common attribute for a variety of purposes. For example, it may be desirable to discover a group of persons all of whom have the same common interest and to present this group to a party for marketing purposes. Thus, a problem with the friend graph exists in that it provides connectivity based on only one property, friendship, making it difficult to discover groups of people in the graph with a common attribute.
BRIEF SUMMARY OF THE INVENTION
An embodiment solves the problem of finding groups of people in a friend network having a common characteristic. The embodiment performs this task extremely quickly and with a minimum of input information. One benefit of the present invention is that a group of persons for which a common attribute exists is now presentable for a variety of purposes. For example, if a merchant desires to sell goods or services to the group, then the discovery of the group is exceedingly valuable to the merchant. As another example, the discovered group can be used to increase social dynamics in a game or other application.
These and other features, aspects and advantages of the embodiments will become better understood with regard to the following description, appended claims, and accompanying drawings where:
Also in step 306, the process constructs an adjacency list to represent the network obtained from the queries. The adjacency list is a convenient data structure for representing the network in what follows. The adjacency list representation, in one embodiment, is a list of size |V|, the number of vertices in the network, with indexes into the edge list for each V, and then a list of size |E|, the number of edges, for the edges. For example, the vertex list is
[index_v0, index_v1, index_v2 . . . index_vn].
The edge list is
[vertices connected to v0, vertices connected to v1, vertices connected to v2, . . . vertices connected to vn].
Thus, an index for a particular vertex in the vertex list provides a pointer to the portion of the edge list having vertices connected to the particular vertex. Alternatively, any data structure that can represent the network obtained from the queries will do. For example, an adjacency matrix is sufficient to represent the network.
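The vertex-list and edge-list layout described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the names `build_adjacency` and `neighbors` and the four-person sample network are hypothetical, each friendship is assumed to be stored in both directions (so the edge list here holds two entries per friendship), and one sentinel index is assumed at the end of the vertex list so that the last vertex's portion of the edge list has a known end.

```python
from typing import List, Tuple


def build_adjacency(num_vertices: int,
                    edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Return (vertex_list, edge_list) for an undirected friend graph."""
    # Collect the neighbors of every vertex first.
    buckets: List[List[int]] = [[] for _ in range(num_vertices)]
    for a, b in edges:
        buckets[a].append(b)  # assumption: friendship is symmetric,
        buckets[b].append(a)  # so it is recorded in both directions

    vertex_list: List[int] = []
    edge_list: List[int] = []
    for nbrs in buckets:
        vertex_list.append(len(edge_list))  # index where this vertex's neighbors start
        edge_list.extend(sorted(nbrs))
    vertex_list.append(len(edge_list))      # sentinel (an assumption of this sketch)
    return vertex_list, edge_list


def neighbors(vertex_list: List[int], edge_list: List[int], v: int) -> List[int]:
    """Vertices connected to v: the slice of the edge list that v's index points at."""
    return edge_list[vertex_list[v]:vertex_list[v + 1]]


# Hypothetical network: friendships 0-1, 0-2, 1-2, 2-3.
vertex_list, edge_list = build_adjacency(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
print(vertex_list)                            # [0, 2, 4, 7, 8]
print(edge_list)                              # [1, 2, 0, 2, 0, 1, 3, 2]
print(neighbors(vertex_list, edge_list, 2))   # [0, 1, 3]
```

Storing both lists as flat arrays keeps the structure compact and makes the neighbors of a vertex a contiguous slice, which is one reason such a layout is convenient for the later cluster search.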
The process then proceeds to construct the 2-neighbors for a given vertex after constructing the adjacency list. In one embodiment, the 2-neighbors are determined by visiting each 1-neighbor of a given vertex and determining if the 1-neighbor has a neighbor other than the particular vertex. If so, then this fact is recorded in a separate list. After all of the vertices of each 1-neighbor are visited, the list has all of the 2-neighbors of the given vertex. In one embodiment, the 1-neighbor list and the 2-neighbor list are bit maps. The 2-neighbors are used in the process of discovering clusters in the adjacency list.
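A minimal sketch of the 2-neighbor construction just described, using Python integers as bit maps, might look like the following. The function names and the sample network are hypothetical. The text does not state whether a vertex that is already a 1-neighbor is excluded from the 2-neighbor list; this sketch assumes it is not, and records every neighbor of a 1-neighbor other than the given vertex itself.

```python
from typing import Dict, List, Tuple


def neighbor_bitmaps(adj: Dict[int, List[int]], v: int) -> Tuple[int, int]:
    """Return (one_bitmap, two_bitmap) for vertex v; bit i is set when vertex i is present."""
    one = 0
    for u in adj[v]:
        one |= 1 << u                 # mark each 1-neighbor of v

    two = 0
    for u in adj[v]:                  # visit each 1-neighbor
        for w in adj[u]:              # look at that 1-neighbor's own neighbors
            if w != v:                # skip the given vertex itself
                two |= 1 << w         # record the fact in the 2-neighbor bit map
    return one, two


def bits_to_vertices(bitmap: int) -> List[int]:
    """Expand a bit map back into a sorted list of vertex numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical network: friendships 0-1, 0-2, 1-2, 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
one, two = neighbor_bitmaps(adj, 0)
print(bits_to_vertices(one))  # [1, 2]
print(bits_to_vertices(two))  # [1, 2, 3]
```

With bit maps, the number of vertices that two neighborhoods have in common reduces to a bitwise AND followed by a population count.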
In step 308, the process operates to discover any clusters present in the adjacency list. The clusters sought are the k-cores in the graph, where k-cores are collections of nodes that are internally dense and externally sparse. The algorithm for finding clusters is explained in more detail below.
The search for clusters produces several or many clusters, some of which are similar to each other. To handle these multiple similar clusters, the process constructs a convenient data structure for merging clusters that are similar to each other. In one embodiment, the data structure is a tree. In another embodiment, the data structure is a list. In the case of a tree structure, the process traverses in step 310 the tree from the bottom to the top, merging pairs of clusters that have a high degree of similarity. In the case of a list, the process traverses the list merging odd and even clusters. The merging occurs according to a criterion, which in one case is a relative cluster overlap threshold. If A and B are two clusters and the threshold is a value κ, then the relative cluster overlap criterion is that |A∩B|≧κ. Thus, the number of members in common must be at least κ. In one embodiment, the value κ is 3.
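The list embodiment and the overlap criterion |A∩B|≧κ can be illustrated with the short sketch below. It is a sketch under stated assumptions rather than the disclosed procedure: the names `similar` and `merge_pass` are hypothetical, a merged cluster is assumed to be the union of its two members (the text does not say how a merged cluster is formed), and a pair that fails the criterion is assumed to be kept as two separate clusters.

```python
from typing import FrozenSet, List

KAPPA = 3  # overlap threshold; the value 3 is the one given for one embodiment


def similar(a: FrozenSet[int], b: FrozenSet[int], kappa: int = KAPPA) -> bool:
    """Overlap criterion: the clusters share at least kappa members."""
    return len(a & b) >= kappa


def merge_pass(clusters: List[FrozenSet[int]], kappa: int = KAPPA) -> List[FrozenSet[int]]:
    """One pass over the list, pairing positions (0, 1), (2, 3), and so on."""
    out: List[FrozenSet[int]] = []
    for i in range(0, len(clusters) - 1, 2):
        a, b = clusters[i], clusters[i + 1]
        if similar(a, b, kappa):
            out.append(a | b)      # assumption: the merged cluster is the union
        else:
            out.extend([a, b])     # assumption: dissimilar clusters stay separate
    if len(clusters) % 2:
        out.append(clusters[-1])   # an unpaired last cluster is carried over
    return out


# Hypothetical clusters of person ids.
clusters = [
    frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5}),   # share 3 members: merged
    frozenset({6, 7, 8}), frozenset({8, 9, 10}),        # share 1 member: kept apart
]
merged = merge_pass(clusters)
print([sorted(c) for c in merged])  # [[1, 2, 3, 4, 5], [6, 7, 8], [8, 9, 10]]
```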
When the merging process is completed, several merged clusters and possibly unmerged clusters remain. The process then determines the α and β coefficients for each remaining cluster. The merging process is described in more detail in connection with
Next, the process determines which of the clusters, merged or otherwise, corresponds to the group sought for. To find the best cluster, the process computes, in step 312, a weighted sum function,
w=w(ms,gs,ss,α,β),
for each cluster based on its α and β coefficients and any number of additional parameters. In one embodiment, the parameters include a gender score gs, a school score ss, and a member score ms, but any number of other parameters such as location, age, or last name of the family of the starting person can be included. In the above weighted sum function, the gender score is the fraction of members in the cluster having the same gender as the starting person. The school score is the fraction of members in the cluster attending the same school as the starting person. The member score is derived from a triangle distribution function with a range of [0,1] that is centered around the approximate size of the organization of which the starting person is a member. For example, if the size of the organization is 50, then the ms score is
ms=1−max(0,min(1,|NumMembersInSameSchool−50|/50)),
which computes a number between 0 and 1, depending on the number of members in the same school as the starting person. For example, if the number of members in the same school is 50, then the function has a value of 1. If the number of members in the same school is 0 or 100, then the score has a value of 0. In one embodiment, the weighted sum calculation is w=(1·ms+1·gs+1·ss+0.25·(1−α)+1·β), where the weights for ms, gs, ss, and β are unity and the weight for (1−α) is 0.25.
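The scoring of a single cluster can be sketched as follows. The function names and the sample inputs are hypothetical, and the member score is assumed to be the clamped triangle function implied by the stated values (1 at the organization size, 0 at zero and at twice that size); the weights are those given for the one embodiment above.

```python
def member_score(num_members: int, expected_size: int) -> float:
    """Triangle score in [0, 1], peaking when num_members equals expected_size."""
    deviation = abs(num_members - expected_size) / expected_size
    return 1.0 - max(0.0, min(1.0, deviation))  # clamp the deviation to [0, 1]


def weighted_sum(ms: float, gs: float, ss: float, alpha: float, beta: float) -> float:
    """w = 1*ms + 1*gs + 1*ss + 0.25*(1 - alpha) + 1*beta."""
    return 1.0 * ms + 1.0 * gs + 1.0 * ss + 0.25 * (1.0 - alpha) + 1.0 * beta


# Values stated in the text for an organization of size 50.
print(member_score(50, 50))    # 1.0
print(member_score(0, 50))     # 0.0
print(member_score(100, 50))   # 0.0

# Hypothetical cluster: 25 members in the same school, half with the same gender,
# half attending the same school, and illustrative alpha and beta coefficients.
w = weighted_sum(member_score(25, 50), gs=0.5, ss=0.5, alpha=0.25, beta=0.75)
print(w)                       # 2.4375
```

Computing this value for every remaining cluster and taking the maximum gives the candidate that is presented first.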
The result of the weighted sum is a score, and the cluster with the highest score is most likely the cluster sought after. The cluster with the highest score is then presented to the starting person, who then confirms whether or not the cluster is correct, i.e., that it corresponds to a group of which the starting person is a member. If the cluster is not correct, the process presents to the starting person an alternative cluster, one that scored slightly lower, to find out if the alternative cluster is correct. If the cluster is correct, then those persons in the cluster other than the starting person are added to the site to which the starting person gave his or her Facebook id, so that the starting person can see all of the members of the group of which he or she is a member.
In one embodiment, the algorithm illustrated in
In another embodiment, the algorithm illustrated in
In one embodiment, the merging process is performed on a Graphics Processing Unit (GPU), such as the GeForce GTX 560. In this embodiment, the multiple internal cores in the GPU operate in parallel to perform each stage of the merging and synchronize with each other before the next stage's processing is performed. For example, the merging of clusters c1 506 and c2 508 is performed in one core in the GPU while the merging of clusters c3 512 and c4 514 is performed in another core in the GPU. A synchronization is performed so that merging at the cr 502 node waits for the two cores to complete their respective operations. It is apparent that in a GPU with 336 internal cores, up to 336 different clusters can be merged concurrently, thereby significantly lowering the processing time for this operation. Additionally, in this embodiment, the alpha and beta coefficients (α, β) of the merged clusters are computed in parallel.
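The stage-by-stage merging with synchronization between stages can be illustrated on a CPU with the sketch below. It is not GPU code and is not the disclosed implementation: a thread pool merely stands in for the GPU cores, the names `merge_pair` and `parallel_merge` are hypothetical, a merged cluster is assumed to be the union of its two members, the threshold of 3 common members is assumed as the similarity criterion, and the sketch assumes that merging stops once a stage produces no merge.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import FrozenSet, List, Tuple

KAPPA = 3  # assumed overlap threshold


def merge_pair(pair: Tuple[FrozenSet[int], FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Merge one pair of sibling clusters if they share at least KAPPA members."""
    a, b = pair
    return [a | b] if len(a & b) >= KAPPA else [a, b]


def parallel_merge(clusters: List[FrozenSet[int]], workers: int = 4) -> List[FrozenSet[int]]:
    """Merge level by level; every pair in a level is handled concurrently."""
    level = list(clusters)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(level) > 1:
            pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
            leftover = [level[-1]] if len(level) % 2 else []
            # list() consumes every result, so this line acts as the synchronization
            # point: the next stage starts only after all pairs have finished.
            results = list(pool.map(merge_pair, pairs))
            next_level = [c for r in results for c in r] + leftover
            if len(next_level) == len(level):
                break  # assumption of this sketch: stop when a stage merges nothing
            level = next_level
    return level


# Four hypothetical leaf clusters, playing the role of c1..c4 in the text.
c1, c2 = frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5})
c3, c4 = frozenset({3, 4, 5, 6}), frozenset({4, 5, 6, 7})
result = parallel_merge([c1, c2, c3, c4])
print([sorted(c) for c in result])  # [[1, 2, 3, 4, 5, 6, 7]]
```

Here c1 and c2 are merged by one worker while c3 and c4 are merged by another, and the merge at the root waits for both, mirroring the synchronization described for the cr node.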
In the GPU embodiment, the clusters and the merged clusters are stored as binary in the memory available to the GPU.
In one embodiment, the GPGPU, shown in
Although the present invention has been described in considerable detail with reference to certain preferred versions thereof, other versions are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the preferred versions contained herein.
Claims
1. A method for discovering a group defined by a common characteristic, the method comprising:
- building a data structure that represents a portion of a friend-network of a starting person, wherein the starting person has a given characteristic and the starting person provides one or more items of personal data;
- discovering clusters in the data structure and collecting the discovered clusters into a set, wherein each discovered cluster has a given size, internal density, and external sparseness;
- for any clusters in the set that are sufficiently similar to each other, merging together the similar clusters in the set, wherein similarity is determined by a relative cluster overlap threshold;
- computing a weighted sum based at least on the internal density and external sparseness of each cluster in the set to select a cluster having a high probability of representing the members of the group; and
- if the starting person indicates that the selected cluster accurately includes the members of the group, displaying the members in the selected cluster.
2. The method of claim 1, wherein the given characteristic is membership in a given group or organization.
3. The method of claim 1,
- wherein the internal density and external sparseness are respectively represented by an alpha coefficient and a beta coefficient; and
- wherein the weighted sum is further based on the one or more items of personal data.
4. The method of claim 1, wherein the one or more items of personal data include the gender of the starting person and the school of the starting person.
5. The method of claim 1, wherein the one or more items of personal data include a geographic location of the starting person.
6. The method of claim 1, wherein the one or more items of personal data include an age of the starting person.
7. The method of claim 1, wherein building the data structure includes:
- obtaining an id of the starting person in a social network;
- accessing a friend list in the social network using the starting person's id; and
- entering into the data structure each friend in the friend list that has one degree of friendship with the starting person; and
- entering into the data structure each friend having one degree of friendship with each friend in the starting person's friend list.
8. The method of claim 1, wherein discovering clusters in the data structure includes:
- building a candidate cluster;
- testing whether the candidate cluster has the given internal density and external sparseness; and
- outputting the cluster as a discovered cluster if the candidate cluster has the given size, internal density and external sparseness.
9. The method of claim 8, wherein building a candidate cluster includes:
- for each vertex of the network in the data structure, determining a first neighborhood of vertices about the vertex, determining a second neighborhood of vertices about vertices that are within two hops from the vertex, and adding the vertex to a candidate cluster if the first and second neighborhoods have sufficient vertices in common.
10. The method of claim 8, wherein building a candidate cluster includes:
- partitioning the data structure into one or more smaller data structures, each smaller data structure having a non-overlapping subset of the data structure; and
- building concurrently candidate clusters from the smaller data structures.
11. The method of claim 1, wherein merging similar clusters includes:
- entering clusters into nodes of a tree structure;
- combining pairs of nodes into a parent node of the tree if the nodes have clusters that are sufficiently similar; and
- returning the tree structure.
12. The method of claim 11, wherein combining pairs of leaves into a parent node includes combining at least one pair of leaves concurrently with another pair of leaves.
13. A computer-readable medium carrying one or more sequences of instructions for discovering a group defined by a common characteristic, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to perform the steps of:
- building a data structure that represents a portion of a friend-network of a starting person, wherein the starting person has a given characteristic and the starting person provides one or more items of personal data;
- discovering clusters in the data structure and collecting the discovered clusters into a set, wherein each discovered cluster has a given size, internal density, and external sparseness;
- for any clusters in the set that are sufficiently similar to each other, merging together the similar clusters in the set, wherein similarity is determined by a relative cluster overlap threshold;
- computing a weighted sum based at least on the internal density and external sparseness of each cluster in the set to select a cluster having a high probability of representing the members of the group; and
- if the starting person indicates that the selected cluster accurately includes the members of the group, displaying the members in the selected cluster.
14. The computer-readable medium of claim 13, wherein the given characteristic is membership in a given group or organization.
15. The computer-readable medium of claim 13,
- wherein the internal density and external sparseness are respectively represented by an alpha coefficient and a beta coefficient; and
- wherein the weighted sum is further based on the one or more items of personal data.
16. The computer-readable medium of claim 13, wherein building the data structure includes:
- obtaining an id of the starting person in a social network;
- accessing a friend list in the social network using the starting person's id; and
- entering into the data structure each friend in the friend list that has one degree of friendship with the starting person; and
- entering into the data structure each friend having one degree of friendship with each friend in the starting person's friend list.
17. The computer-readable medium of claim 13, wherein merging similar clusters includes:
- entering clusters into nodes of a tree structure;
- combining pairs of nodes into a parent node of the tree if the nodes have clusters that are sufficiently similar; and
- returning the tree structure.
18. A system for discovering a group defined by a common characteristic, the system comprising:
- one or more processing units, each including
- a processor; and
- a memory coupled to the processor in the processing unit; and
- wherein each memory contains instructions, which, when executed by the one or more processing units, perform the steps of:
- building a data structure that represents a portion of a friend-network of a starting person, wherein the starting person has a given characteristic, and wherein the starting person provides one or more items of personal data;
- discovering clusters in the data structure and collecting the discovered clusters into a set, wherein each discovered cluster has a given size, internal density, and external sparseness;
- for any clusters in the set that are sufficiently similar to each other, merging together the similar clusters in the set, wherein similarity is determined by a relative cluster overlap threshold;
- computing a weighted sum based at least on the internal density and external sparseness of each cluster in the set to select a cluster having a high probability of representing the members of the group; and
- if the starting person indicates that the selected cluster accurately includes the members of the group, displaying the members in the selected cluster.
19. The system of claim 18,
- further comprising a graphics processing unit that includes a plurality of processing cores and memories coupled to respective ones of the cores, wherein each core is capable of operating concurrently with respect to the other cores in the graphics processing unit;
- wherein the step of merging together similar clusters is performed by the processing cores in the graphics processing unit.
20. The system of claim 18,
- further comprising a graphics processing unit that includes a plurality of processing cores and memories coupled to respective ones of the cores, wherein each core is capable of operating concurrently with respect to the other cores in the graphics processing unit;
- wherein the step of discovering clusters and collecting the discovered cluster is performed by the processing cores in the graphics processing unit.
Type: Application
Filed: Jan 24, 2017
Publication Date: Nov 23, 2017
Inventors: Anthony Bernard Diepenbrock, V (San Francisco, CA), Charles W. Moyes, III (San Francisco, CA)
Application Number: 15/414,064