ANONYMOUS CROWD COMPARISON
Systems and methods are disclosed for anonymously comparing user groups, such as but not limited to crowds, to determine a degree of user overlap. In general, a hash value is obtained for a first user group, where the hash value includes a hash value component for a number of two-user permutations within the first user group. Similarly, a hash value is obtained for a second user group, where the hash value includes a hash value component for a number of two-user permutations within the second user group. Thereafter, a degree of user overlap between the first and second user groups is determined based on a comparison of the hash value for the first user group and the hash value for the second user group.
This application claims the benefit of provisional patent application Ser. No. 61/173,625, filed Apr. 29, 2009, the disclosure of which is hereby incorporated herein by reference in its entirety.
FIELD OF THE DISCLOSURE
The present disclosure relates to comparing user groups, such as crowds of users, to determine a degree of user overlap between the user groups in a manner that preserves privacy.
BACKGROUND
A mobile user can benefit from knowing how many people in his or her current crowd have been in previous crowds with the mobile user or how many people in his or her current crowd have frequented the mobile user's current location in the past. A simple mechanism to implement this functionality would require tracking the locations of people over time. However, this approach does not respect user privacy. As such, there is a need for a system and method that enables crowd comparison in a manner that maintains user privacy.
SUMMARY
Systems and methods are disclosed for anonymously comparing user groups, such as but not limited to crowds, to determine a degree of user overlap. In general, a hash value is obtained for a first user group, where the hash value includes a hash value component for a number of two-user permutations within the first user group. Similarly, a hash value is obtained for a second user group, where the hash value includes a hash value component for a number of two-user permutations within the second user group. Thereafter, a degree of user overlap between the first and second user groups is determined based on a comparison of the hash value for the first user group and the hash value for the second user group. More specifically, in one embodiment, the hash value for the first user group is compared to the hash value for the second user group to determine a number of matching component hash values. A number of matching users in the first and second user groups is then determined based on the number of matching component hash values.
Those skilled in the art will appreciate the scope of the present invention and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the invention, and together with the description serve to explain the principles of the invention.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the invention and illustrate the best mode of practicing the invention. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the invention and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
First, a hash value is obtained for a first user group (step 100). The hash value for the first user group may be obtained by computing the hash value for the first user group or by obtaining a previously computed hash value for the first user group from storage. The hash value for the first user group includes component hash values generated for a number of distinct two-user permutations for the first user group. As used herein, a “distinct two-user permutation” is a subset of two distinct users in a user group where ordering of users does not matter (i.e., a two-user permutation for users A and B is the same as, or not distinct from, a two-user permutation for users B and A). Therefore, for example, if the first user group includes user A, user B, and user C, the distinct two-user permutations for the first user group are user A and user B, user A and user C, and user B and user C. As described below in detail, the distinct two-user permutations for the first user group for which component hash values are generated and included in the hash value for the first user group may be all distinct two-user permutations for the first user group or all distinct two-user permutations for the first user group other than those including a requesting user from the first user group that initiated a process for generating the hash value for the first user group.
In a similar manner, a hash value is obtained for a second user group (step 102). The hash value for the second user group may be obtained by computing the hash value for the second user group or by obtaining a previously computed hash value for the second user group from storage. The hash value for the second user group includes component hash values generated for a number of distinct two-user permutations for the second user group. As described below in detail, the distinct two-user permutations for the second user group for which component hash values are generated and included in the hash value for the second user group may be all distinct two-user permutations for the second user group or all distinct two-user permutations for the second user group other than those including a requesting user from the second user group that initiated a process for generating the hash value for the second user group. It should be noted that while two-user permutations are used herein, permutations of three or more users may alternatively be used to generate the component hash values of the hash values for the first and second user groups.
Next, a degree of user overlap between the first and second user groups is determined based on a comparison of the hash values for the first and second user groups (step 104). As described below in more detail, the component hash values for the first and second user groups are compared to determine a number of matching component hash values in the hash values for the first and second user groups. Then, a number of matching users in the first and second user groups is determined based on the number of matching component hash values. The degree of user overlap between the first and second user groups may then be provided as the number of matching users, a percentage of matching users for a larger of the first and second user groups, or the like. In this embodiment, the degree of user overlap between the first and second user groups is then output to a requesting entity (step 106). The requesting entity may be a user, a third-party service, or the like. In addition or alternatively, the degree of user overlap between the first and second user groups may be stored for subsequent use in any desired application.
Next, the list of users for the user group is sorted (step 204). In one embodiment, the list of users is sorted alphabetically. However, the present disclosure is not limited thereto. Sorting the list of users ensures that the two-user permutations are canonical between user group hash instances. In other words, if two users A and B exist in a user group, the two-user permutation for users A and B will always be “A B” and not “B A.” This ensures that a component hash value for a two-user permutation for users A and B in a hash value for one user group will match a component hash value for a two-user permutation for the same two users in a hash value for another user group.
Next, a list of all distinct two-user permutations for the sorted list of users for the user group is created (step 206). A next two-user permutation from the list of distinct two-user permutations is then obtained (step 208). Note that for the first iteration, the next two-user permutation is the first two-user permutation in the list of distinct two-user permutations. A hash value (referred to herein as a component hash value) is then computed for the two-user permutation using a predetermined hash function (step 210). The predetermined hash function may be, for example, Secure Hash Algorithm-1 (SHA-1), but is not limited thereto. The component hash value for the two-user permutation may be computed by concatenating the user IDs of the users in the two-user permutation and providing the concatenated user IDs as an input to the predetermined hash function. The hash value returned by the predetermined hash function is then used as the component hash value for the two-user permutation.
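Steps 204 through 210 can be illustrated with a minimal Python sketch. It assumes user IDs are plain strings and uses SHA-1 as the exemplary hash function named above. The function name `component_hashes` and the "|" separator placed between the two concatenated user IDs are assumptions made for illustration and are not part of the disclosure.

```python
import hashlib
from itertools import combinations


def component_hashes(user_ids):
    """Return one SHA-1 digest per distinct two-user permutation."""
    # Step 204: sort so that the pair for users A and B is always "A B".
    ordered = sorted(set(user_ids))
    # Step 206: list every distinct two-user permutation of the sorted list.
    pairs = combinations(ordered, 2)
    # Step 210: concatenate the two user IDs and hash the result.
    # The "|" separator is an assumption; it keeps "ab"+"c" distinct from "a"+"bc".
    return [hashlib.sha1(f"{a}|{b}".encode("utf-8")).digest() for a, b in pairs]


hashes = component_hashes(["carol", "alice", "bob"])
print(len(hashes))  # 3 pairs: alice|bob, alice|carol, bob|carol
```

Because the list is sorted before pairing, the same two users yield the same component hash value regardless of the order in which they appear in either user group.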
A determination is then made as to whether the last two-user permutation has been processed (step 212). If not, the process returns to step 208 and is repeated. Once component hash values have been computed for all of the two-user permutations from step 206, the component hash values for the two-user permutations are concatenated to provide a concatenated hash value for the user group (step 214). In this embodiment, the concatenated hash value for the user group is compressed using a lossy compression algorithm to provide a hash value for the user group (step 216). In one embodiment, the lossy compression algorithm removes every Nth bit from the concatenated hash value to provide the hash value for the user group, where N is greater than or equal to 2. In one preferred embodiment, the lossy compression algorithm removes every other bit from the concatenated hash value to provide the hash value for the user group. Lossy compression reduces the storage space needed to store the hash value for the user group. In addition, lossy compression further obfuscates user information by making it highly improbable that the system or an attacker can retrieve the exact concatenated hash value or exact component hash values. In other words, lossy compression thwarts dictionary attacks. Note that the aforementioned embodiments of the lossy compression algorithm are exemplary. Other types of lossy compression algorithms may be used, as will be appreciated by one of ordinary skill in the art upon reading this disclosure. Also note that step 216 is optional.
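The bit-dropping form of lossy compression in step 216 might look like the following sketch. The function name `drop_every_nth_bit` and the choice to return the result as a string of '0' and '1' characters are assumptions made for readability; a production implementation would more likely pack the remaining bits back into bytes.

```python
def drop_every_nth_bit(data: bytes, n: int = 2) -> str:
    """Return the bits of `data` as a '0'/'1' string with every n-th bit removed."""
    if n < 2:
        raise ValueError("n must be greater than or equal to 2")
    # Expand each byte into its 8 bits, most significant bit first.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Keep every bit whose 1-based position is not a multiple of n.
    return "".join(bit for pos, bit in enumerate(bits, start=1) if pos % n != 0)


print(drop_every_nth_bit(b"\xff\x00", 2))  # 11110000
```

With n equal to 2, a 160-bit SHA-1 component is reduced to 80 bits, so half of each component hash value is discarded and cannot be recovered from the stored value.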
Next, the component hash value from the hash value for the first user group is compared to the component hash value from the hash value for the second user group to determine whether there is a match (step 306). Here, the component hash values match if they are exactly the same. Matching hash values indicate that the corresponding two-user permutations from the first and second user groups also match. If the component hash values match, the counter is incremented (step 308) and then the process proceeds to step 312. If the component hash values do not match, a determination is made as to whether the last component hash value from the hash value for the second user group has been processed (step 310). If not, the process returns to step 304 and is repeated for the next component hash value from the hash value for the second user group. If the last component hash value from the hash value for the second user group has been processed, the process proceeds to step 312.
At this point, whether proceeding from step 308 or step 310, a determination is made as to whether the last component hash value from the hash value for the first user group has been processed (step 312). If not, the process returns to step 302 and is repeated for the next component hash value from the hash value for the first user group. Once all of the component hash values from the hash value for the first user group have been processed, the counter corresponds to a number of matching component hash values, which is also the number of matching two-user permutations for the first and second user groups.
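The counting loop of steps 302 through 312 can be sketched as follows. This sketch assumes that both group hash values are byte strings made of fixed-size components (20 bytes for an uncompressed SHA-1 digest) and that no component repeats within a group. It swaps the nested loop described above for a set lookup, which gives the same count under those assumptions. The name `count_matching_components` is hypothetical.

```python
def count_matching_components(hash_a: bytes, hash_b: bytes, size: int = 20) -> int:
    """Count components of hash_a that also appear in hash_b."""
    # Split each group hash value into its fixed-size component hash values.
    comps_a = [hash_a[i:i + size] for i in range(0, len(hash_a), size)]
    comps_b = {hash_b[i:i + size] for i in range(0, len(hash_b), size)}
    # Components match only if they are exactly the same.
    return sum(1 for comp in comps_a if comp in comps_b)


group_a = b"A" * 20 + b"B" * 20 + b"C" * 20
group_b = b"B" * 20 + b"D" * 20 + b"A" * 20
print(count_matching_components(group_a, group_b))  # 2
```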
Next, a number of matching users in the first and second user groups is determined based on the number of matching two-user permutations (step 314). In the preferred embodiment, the number of matching users is determined based on the following choose-2 function:
$$C(n,2) = \frac{n!}{2!\,(n-2)!} = \frac{n(n-1)}{2}$$
More specifically, the number of matching users is determined by setting the choose-2 function C(n,2) equal to the number of matching two-user permutations (i.e., the counter value) as follows:
$$C(n,2) = \frac{n(n-1)}{2} = \text{counter}$$
where the value n can be solved for or otherwise determined and corresponds to the number of matching users for the first and second user groups. As such, if for example the number of matching two-user permutations is 6, then the number of matching users is determined based on the equation:
$$\frac{n(n-1)}{2} = 6$$
which results in
$$n = 4,$$
where n is the number of matching users in the first and second user groups. Note that the value of n in the choose-2 function may be mathematically solved. As one exemplary alternative, the value of n may be determined using a lookup table populated using the choose-2 function to enable lookup of the number of matching users for different numbers of matching two-user permutations values (i.e., 1, 2, 3, 4, 5, . . . , M, where M is a desired positive integer value).
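Solving the choose-2 relation n(n-1)/2 = k for n gives n = (1 + sqrt(1 + 8k)) / 2, where k is the number of matching two-user permutations. The sketch below applies that closed form with integer arithmetic. The function name `users_from_pairs` is hypothetical, and returning 0 when no pairs match is an assumption, since zero matching pairs cannot distinguish zero shared users from one shared user.

```python
import math


def users_from_pairs(pair_count: int) -> int:
    """Invert C(n, 2) = pair_count and return n."""
    if pair_count < 0:
        raise ValueError("pair_count must be non-negative")
    if pair_count == 0:
        # C(0, 2) and C(1, 2) are both 0, so 0 or 1 shared users look the same.
        return 0
    # Positive root of n^2 - n - 2k = 0, rounded down for non-triangular counts.
    return (1 + math.isqrt(1 + 8 * pair_count)) // 2


print(users_from_pairs(6))   # 4
print(users_from_pairs(15))  # 6
```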
In addition to determining the number of matching users, in this embodiment, a number of users in the larger of the first and second user groups is determined (step 316). More specifically, in one embodiment, a number of two-user permutations for each of the first and second user groups is determined. For example, if the component hash values are 80-bit values, the number of two-user permutations in the first user group may be determined by dividing the number of bits in the hash value for the first user group by 80. Likewise, the number of two-user permutations in the second user group may be determined by dividing the number of bits in the hash value for the second user group by 80. The user group having the larger number of two-user permutations is identified as the larger of the first and second user groups. Then, the number of users in the larger user group is determined based on the choose-2 function, where the choose-2 function is set equal to the number of two-user permutations in the larger user group. For example, if the number of two-user permutations in the larger user group is 15, then the number of users in the larger user group is 6 (i.e., C(n,2)=15, therefore n=6). A percentage of matching users for the larger user group is then computed (step 318). The percentage of matching users for the larger user group may be computed based on the following equation:
$$\text{percentage of matching users} = \frac{\text{number of matching users}}{\text{number of users in the larger user group}} \times 100$$
The percentage of matching users may then be output and/or stored as the degree of user overlap for the first and second user groups. Note that the percentage of matching users is exemplary and is not intended to limit the scope of the present disclosure. The degree of user overlap may be expressed as any value that is a function of (including being equal to) the number of matching users between the first and second user groups.
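Steps 316 and 318 can be sketched as follows, assuming 80-bit component hash values as in the example above. The function name `overlap_percentage` and its parameters are hypothetical; the group sizes are recovered from the bit lengths of the two group hash values rather than from any stored user list.

```python
import math


def overlap_percentage(matching_users: int, bits_a: int, bits_b: int,
                       component_bits: int = 80) -> float:
    """Percentage of matching users relative to the larger of two user groups."""
    # Step 316: the larger group is the one with more two-user permutations.
    larger_pairs = max(bits_a, bits_b) // component_bits
    # Invert C(n, 2) = larger_pairs to get the user count of the larger group.
    larger_users = (1 + math.isqrt(1 + 8 * larger_pairs)) // 2
    # Step 318: matching users as a percentage of the larger group.
    return 100.0 * matching_users / larger_users


# 4 matching users, larger group has 15 pairs (6 users), smaller has 6 pairs.
print(round(overlap_percentage(4, 15 * 80, 6 * 80), 1))  # 66.7
```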
Area Network (WAN), or the like, and may include wired and/or wireless components. In one embodiment, the network 24 is a distributed public network such as the Internet.
The crowd server 12 includes a crowd formation function 26, a crowd hash computing function 28, a crowd comparison function 30, and a crowd hash repository 32. The crowd formation function 26, the crowd hash computing function 28, and the crowd comparison function 30 are preferably implemented in software, but are not limited thereto. The crowd formation function 26 generally operates to form and possibly maintain crowds of users. More specifically, in one embodiment, the mobile devices 14-1 through 14-N provide location updates that define current locations of the users 16-1 through 16-N over time. The location updates may be provided from the mobile devices 14-1 through 14-N directly to the crowd server 12 or provided from the mobile devices 14-1 through 14-N to a location service that then reports the location updates to the crowd server 12 or otherwise enables the crowd server 12 to access the location updates. Using the current locations of the users 16-1 through 16-N, the crowd formation function 26 performs a spatial crowd formation process to form crowds of the users 16-1 through 16-N that are spatially proximate to one another. While not essential, for more detailed information regarding exemplary spatial crowd formation processes, the interested reader is directed to U.S. patent application Ser. No. 12/645,535 entitled MAINTAINING A HISTORICAL RECORD OF ANONYMIZED USER PROFILE DATA BY LOCATION FOR USERS IN A MOBILE ENVIRONMENT, U.S. patent application Ser. No. 12/645,532 entitled FORMING CROWDS AND PROVIDING ACCESS TO CROWD DATA IN A MOBILE ENVIRONMENT, U.S. patent application Ser. No. 12/645,539 entitled ANONYMOUS CROWD TRACKING, U.S. patent application Ser. No. 12/645,544 entitled MODIFYING A USER'S CONTRIBUTION TO AN AGGREGATE PROFILE BASED ON TIME BETWEEN LOCATION UPDATES AND EXTERNAL EVENTS, U.S. patent application Ser. No. 12/645,546 entitled CROWD FORMATION FOR MOBILE DEVICE USERS, U.S. patent application Ser. No. 12/645,556 entitled SERVING A REQUEST FOR DATA FROM A HISTORICAL RECORD OF ANONYMIZED USER PROFILE DATA IN A MOBILE ENVIRONMENT, and U.S. patent application Ser. No. 12/645,560 entitled HANDLING CROWD REQUESTS FOR LARGE GEOGRAPHIC AREAS, all of which were filed on Dec. 23, 2009 and are hereby incorporated herein by reference in their entireties.
The crowd hash computing function 28 generally operates to compute hash values for crowds of users formed by the crowd formation function 26. The crowd hash computing function 28 preferably utilizes the process of
In the preferred embodiment, the crowd server 12 only stores the current locations of the users 16-1 through 16-N, rather than a historical record of the locations of the users 16-1 through 16-N. However, the present disclosure is not limited thereto. Storage of crowd hash values for crowds enables comparison of historical crowds to one another or to a current crowd without the need to store historical records of the locations of the users 16-1 through 16-N. In this manner, privacy of the users 16-1 through 16-N is preserved. In addition, the anonymous crowd comparison described herein enables a requestor to determine the degree of user overlap between two crowds but not the identities (or user IDs) of the users in those crowds.
The mobile devices 14-1 through 14-N are mobile devices such as, for example, mobile smart phones (e.g., Apple® iPhone), laptop or notebook computers, tablet computers (e.g., Apple® iPad), or the like. The mobile devices 14-1 through 14-N include crowd clients 34-1 through 34-N and location determination functions 36-1 through 36-N. Using the mobile device 14-1 as an example, the crowd client 34-1 is preferably implemented in software, but is not limited thereto. The crowd client 34-1 generally operates to interact with the crowd server 12. More specifically, the crowd client 34-1 operates to obtain the current location of the mobile device 14-1 from the location determination function 36-1 and send corresponding location updates to the crowd server 12, either directly or indirectly. In addition, the crowd client 34-1 preferably enables the user 16-1 to interact with the crowd server 12 to initiate storage of crowd hash values for crowds of interest to the user 16-1. Still further, the crowd client 34-1 may enable the user 16-1 to initiate anonymous crowd comparison of the crowds of interest for which corresponding crowd hash values were previously computed and stored to either current or historical crowds. The crowd client 34-1 may additionally or alternatively enable the user 16-1 to request and receive Point of Interest (POI) recommendations based on anonymous crowd comparisons of the crowds of interest for which corresponding crowd hash values were previously computed and stored to either current or historical crowds at POIs within a desired geographic region.
The location determination function 36-1 may generally be any software and/or hardware component enabled to determine the current location of the mobile device 14-1. For example, the location determination function 36-1 may be a Global Positioning System (GPS) receiver.
The subscriber device 18 may be any type of user device such as, for example, a personal computer, a mobile smart phone, a laptop or notebook computer, a tablet computer, or the like. In general, the subscriber device 18 enables the subscriber 20 to access the crowd server 12 preferably for a subscription fee. In this embodiment, the subscriber 20 is enabled to access the crowd server 12 via a web browser 38 and corresponding web interface of the crowd server 12. However, rather than the web browser 38, a custom application may be used to access the crowd server 12. Via the crowd server 12, the subscriber 20 may be enabled to monitor crowd patterns at one or more desired POIs.
The third-party service 22 may generally be any type of service that desires information regarding crowds of users. For example, the third-party service 22 may be a targeted advertising service. Via the crowd server 12, the third-party service 22 may be enabled to monitor crowd patterns at one or more desired POIs and then utilize those crowd patterns to provide a desired service.
In response to receiving the request, the crowd hash computing function 28 of the crowd server 12 obtains a current crowd in which the user 16-1 of the mobile device 14-1 is currently located from the crowd formation function 26 (step 402). In one embodiment, the crowd formation function 26 forms the current crowd of the user 16-1 reactively in response to the request. Alternatively, the crowd formation function 26 may proactively form crowds and store corresponding crowd records that identify the users that are currently in those crowds. The crowd formation function 26 may then obtain the crowd in which the user 16-1 is currently located from storage.
Next, the crowd hash computing function 28 of the crowd server 12 computes a crowd hash for the current crowd of the user 16-1 using the process of
The crowd hash computing function 28 then stores the crowd hash for the current crowd in the crowd hash repository 32 as a crowd hash value of a favorite crowd of the user 16-1 (step 406). In one embodiment, the crowd hash value is stored in a crowd hash record that includes a record ID field that stores an ID for the crowd hash record, an owner field that identifies the user 16-1 as the owner of the crowd hash record, and a hash value field that stores the crowd hash value of the current crowd. In addition, the crowd hash record may include a description field that stores a textual description of the current crowd for which the crowd hash value was computed, where the textual description may be provided by the user 16-1 (e.g., “Cool Crowd at Night Club X”). Still further, the crowd hash record may include a time field that stores a time at which the crowd hash value was computed and latitude and longitude fields that store a location of the current crowd for which the crowd hash was computed. The location of the current crowd may be computed using the current locations of the users in the crowd and, for example, a center of mass algorithm. Note that in order to thwart attacks on the system 10, in one embodiment, the crowd hash records stored by the crowd server 12 in the crowd hash repository 32 do not include any information (e.g., location, time, etc.) that may enable the system 10 or an attacker to directly correlate two crowd hash values generated for two different requestors to determine that those two different requestors were in the same crowd. Steps 400 through 406 may subsequently be repeated to compute and store crowd hash values for additional favorite crowds of the user 16-1.
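The crowd hash record described above could be modeled as in the following sketch. The class name `CrowdHashRecord`, the field names, and the helper `crowd_center` are hypothetical. The description, time, and location fields are optional so that they can be omitted in the embodiment that stores no correlating information. The center of mass is approximated here by an arithmetic mean of coordinates, which is an assumption that only holds for a small, non-empty crowd that does not straddle the antimeridian.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass
class CrowdHashRecord:
    record_id: str                          # ID for the crowd hash record
    owner: str                              # user who owns the record
    hash_value: bytes                       # crowd hash value
    description: Optional[str] = None       # e.g. "Cool Crowd at Night Club X"
    computed_at: Optional[datetime] = None  # time the hash value was computed
    latitude: Optional[float] = None        # crowd location, if stored
    longitude: Optional[float] = None


def crowd_center(locations: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean of (latitude, longitude) pairs as a simple center of mass."""
    lats = [lat for lat, _ in locations]
    lons = [lon for _, lon in locations]
    return sum(lats) / len(lats), sum(lons) / len(lons)


record = CrowdHashRecord("r1", "user-16-1", bytes(20),
                         description="Cool Crowd at Night Club X")
print(crowd_center([(0.0, 0.0), (2.0, 4.0)]))  # (1.0, 2.0)
```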
Note that while the process of
In response to receiving the POI recommendation request, the crowd hash computing function 28 obtains a list of relevant POIs (step 602). The list of relevant POIs may include POIs within a defined geographic region such as, for example, a defined geographic region in which the user 16-1 is currently located (e.g., within 10 miles from the current location of the user 16-1) or a geographic region defined by the user 16-1 and included in the POI recommendation request (e.g., Raleigh, N.C.). The crowd hash computing function 28 then obtains current crowds at the relevant POIs from the crowd formation function 26 (step 604).
Next, the crowd hash computing function 28 computes a crowd hash value for each of the current crowds at the relevant POIs using the process of
The crowd comparison function 30 then selects one or more of the relevant POIs to recommend to the user 16-1 based on the degrees of user overlap between the current crowds at the relevant POIs and the favorite crowd(s) of the user 16-1 (step 610). More specifically, in one embodiment, a relevant POI is selected to recommend to the user 16-1 if the degree of user overlap between a current crowd at the POI and at least one of the favorite crowds of the user 16-1 is greater than a predefined threshold (e.g., 75%). The crowd comparison function 30 then returns the recommended POI(s) to the mobile device 14-1 (step 612) where the crowd client 34-1 presents the recommended POI(s) to the user 16-1 (step 614).
At some point in time, the mobile device 14-1, and more specifically the crowd client 34-1 of the mobile device 14-1, sends a historical crowd comparison request to the crowd server 12 (step 702). In one embodiment, the historical crowd comparison request is for a current location of the user 16-1. In another embodiment, the historical crowd comparison request is for a POI or AOI defined or otherwise selected by the user 16-1. The historical crowd comparison request may be manually initiated by the user 16-1 or automatically initiated by the crowd client 34-1 in response to a triggering event.
In response to receiving the historical crowd comparison request, the crowd comparison function 30 obtains crowd hash values for historical crowds that are relevant to the historical crowd comparison request (step 704). In this embodiment, the crowd hash values for the historical crowds are stored in the crowd hash repository 32 along with the locations of the historical crowds at the time the crowd hash values were computed. As such, in one embodiment, the relevant historical crowds are historical crowds that were located at or near the current location of the user 16-1. In another embodiment, the relevant historical crowds are historical crowds that were located at a POI or within an AOI defined by the historical crowd comparison request. In addition, a time window may be defined for the historical crowd comparison request such that the relevant historical crowds are historical crowds for which the corresponding crowd hash values were computed during the defined time window. The time window may be system-defined or defined by the user 16-1 and included in the historical crowd comparison request. For example, the time window may be an absolute time window such as, for example, the last week, the last month, the last year, Jan. 1, 2010 through Mar. 12, 2010, or the like. As another example, the time window may be a reoccurring time window such as, for example, Fridays, weekdays from 10 AM-Noon, the first day of each month, or the like.
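A reoccurring time window such as "weekdays from 10 AM-Noon" could be tested against the time at which a crowd hash value was computed as in the sketch below. The function name `in_recurring_window` and its parameters are assumptions; weekdays follow Python's convention in which Monday is 0 and Friday is 4, and the end hour is treated as exclusive.

```python
from datetime import datetime


def in_recurring_window(ts: datetime, weekdays, start_hour: int = 0,
                        end_hour: int = 24) -> bool:
    """Return True if ts falls on an allowed weekday within [start_hour, end_hour)."""
    return ts.weekday() in weekdays and start_hour <= ts.hour < end_hour


friday_morning = datetime(2010, 3, 12, 10, 30)  # Mar. 12, 2010 was a Friday
print(in_recurring_window(friday_morning, {4}))                     # True
print(in_recurring_window(friday_morning, {0, 1, 2, 3, 4}, 10, 12)) # True
```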
Next, for each historical crowd, the crowd comparison function 30 determines a degree of user overlap between the historical crowd and each of one or more favorite crowds previously identified by the user 16-1 using the process of
At some point in time, the mobile device 14-1, and more specifically the crowd client 34-1 of the mobile device 14-1, sends a POI recommendation request to the crowd server 12 (step 802). In response to receiving the POI recommendation request, the crowd hash computing function 28 obtains a list of relevant POIs (step 804). The list of relevant POIs may include POIs within a defined geographic region such as, for example, a defined geographic region in which the user 16-1 is currently located (e.g., within 10 miles from the current location of the user 16-1) or a geographic region defined by the user 16-1 and included in the POI recommendation request (e.g., Raleigh, N.C.). The crowd hash computing function 28 then obtains crowd hash values stored for historical crowds located at the relevant POIs from the crowd hash repository 32 (step 806). In one embodiment, a time window may be used such that the crowd hash values obtained for the relevant POIs are only those crowd hash values for historical crowds at the relevant POIs during a defined time window for the POI recommendation request. The time window for the POI recommendation request may be defined by the POI recommendation request or may be system-defined. For example, the time window may be an absolute time window such as, for example, the last week, the last month, the last year, Jan. 1, 2010 through Mar. 12, 2010, or the like. As another example, the time window may be a reoccurring time window such as, for example, Fridays, weekdays from 10 AM-Noon, the first day of each month, or the like.
Next, for each historical crowd for each relevant POI, the crowd comparison function 30 determines a degree of user overlap between the historical crowd and one or more favorite crowds previously identified by the user 16-1 using the process of
The crowd comparison function 30 then selects one or more of the relevant POIs to recommend to the user 16-1 based on the degrees of user overlap between the historical crowds at the relevant POIs and the favorite crowd(s) of the user 16-1 (step 810). More specifically, in one embodiment, a relevant POI is selected to recommend to the user 16-1 if the degree of user overlap between at least one historical crowd at the POI and at least one of the favorite crowds of the user 16-1 is greater than a predefined threshold (e.g., 75%). In another embodiment, for each relevant POI, the degrees of user overlap between historical crowds at the relevant POI for each favorite crowd of the user 16-1 are combined (e.g., averaged) to provide a combined degree of user overlap for the relevant POI for each favorite crowd of the user 16-1. A relevant POI may then be selected for recommendation to the user 16-1 if the combined degree of user overlap for the relevant POI for at least one of the favorite crowds of the user 16-1 is greater than a predefined threshold (e.g., 75%). The crowd comparison function 30 then returns the recommended POI(s) to the mobile device 14-1 (step 812) where the crowd client 34-1 presents the recommended POI(s) to the user 16-1 (step 814).
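The averaged selection rule of step 810 can be sketched as follows. The function name `recommend_by_average` and the nested dictionary layout (POI, then favorite crowd, then a list of overlap percentages, one per historical crowd) are assumptions made for illustration; the 75% threshold is the example value given above.

```python
from statistics import mean


def recommend_by_average(overlaps, threshold: float = 75.0):
    """overlaps: {poi: {favorite_id: [overlap % per historical crowd]}}."""
    recommended = []
    for poi, per_favorite in overlaps.items():
        # Combine the degrees of user overlap per favorite crowd by averaging.
        combined = {fav: mean(vals) for fav, vals in per_favorite.items() if vals}
        # Recommend the POI if any favorite crowd exceeds the threshold.
        if any(value > threshold for value in combined.values()):
            recommended.append(poi)
    return recommended


overlaps = {
    "Club X": {"fav1": [80.0, 90.0], "fav2": [10.0]},
    "Cafe Y": {"fav1": [100.0, 20.0]},
}
print(recommend_by_average(overlaps))  # ['Club X']
```

Averaging makes a single unusually similar historical crowd less decisive than it is under the first rule, where one historical crowd above the threshold is enough to recommend the POI.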
At some point, the crowd comparison function 30 compares the crowd hash values computed and stored for crowds at the POI over time to provide data characterizing crowd patterns at the POI (step 904). In other words, the crowd comparison function 30 characterizes crowd patterns at the POI based on comparisons of the crowd hash values computed and stored for crowds at the POI over time. More specifically, in one embodiment, for each of one or more reoccurring time windows, the crowd comparison function 30 determines a degree of user overlap between each crowd at the POI during the reoccurring time window and each other crowd at the POI during the reoccurring time window based on comparisons of corresponding crowd hash values. As a result, for each reoccurring time window, the crowd comparison function 30 provides a degree of user overlap between each pair of crowds at the POI during the reoccurring time window. Then, for each reoccurring time window, the degrees of user overlap computed for the pairs of crowds at the POI during the reoccurring time window may be combined (e.g., averaged) to provide a combined degree of user overlap for the reoccurring time window. The combined degree of user overlap for each of the reoccurring time windows may then be provided as data characterizing crowd patterns at the POI. Note that, as used herein, a reoccurring time window is a time window that periodically repeats itself. Some examples of a reoccurring time window are Friday (i.e., the day Friday repeats weekly), weekdays from 11 AM to 1 PM (repeats daily during the week and each week), March 19 (repeats yearly), or the like. Once characterization is complete, the data characterizing crowd patterns at the POI is returned to the requestor (step 906).
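The per-window characterization of step 904 can be sketched as follows. The function name `characterize_patterns` is hypothetical, and `overlap` is a hypothetical callable standing in for the hash-based comparison of two crowd hash values; any function that returns a degree of user overlap for a pair of crowds can be passed in.

```python
from itertools import combinations
from statistics import mean


def characterize_patterns(crowds_by_window, overlap):
    """Average the pairwise degree of user overlap within each reoccurring window."""
    patterns = {}
    for window, crowds in crowds_by_window.items():
        # Degree of user overlap between each pair of crowds in the window.
        scores = [overlap(a, b) for a, b in combinations(crowds, 2)]
        if scores:  # a window with fewer than two crowds has no pairs
            patterns[window] = mean(scores)
    return patterns
```

A high combined value for a window such as "Friday" would suggest that largely the same users return to the POI in that window, while a low value would suggest a changing crowd.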
Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present invention. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.
Claims
1. A computer-implemented method comprising:
- obtaining a hash value for a first user group, the hash value for the first user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the first user group;
- obtaining a hash value for a second user group, the hash value for the second user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the second user group; and
- determining a degree of user overlap between the first user group and the second user group based on a comparison of the hash value for the first user group and the hash value for the second user group.
2. The method of claim 1 wherein:
- the plurality of two-user permutations for the first user group comprises all distinct two-user permutations for the first user group, and each of the plurality of component hash values for the first user group is a hash value computed for a corresponding one of the plurality of two-user permutations for the first user group based on a predetermined hash function; and
- the plurality of two-user permutations for the second user group comprises all distinct two-user permutations for the second user group, and each of the plurality of component hash values for the second user group is a hash value computed for a corresponding one of the plurality of two-user permutations for the second user group based on the predetermined hash function.
3. The method of claim 1 wherein:
- the plurality of two-user permutations for the first user group comprises all distinct two-user permutations for the first user group other than those that include a requesting user associated with the hash value for the first user group, and each of the plurality of component hash values for the first user group is a hash value computed for a corresponding one of the plurality of two-user permutations for the first user group based on a predetermined hash function; and
- the plurality of two-user permutations for the second user group comprises all distinct two-user permutations for the second user group other than those that include a requesting user associated with the hash value for the second user group, and each of the plurality of component hash values for the second user group is a hash value computed for a corresponding one of the plurality of two-user permutations for the second user group based on the predetermined hash function.
4. The method of claim 1 wherein obtaining the hash value for the first user group comprises:
- obtaining a list of users in the first user group;
- removing a requesting user that initiated a process for obtaining the hash value for the first user group from the list of users;
- sorting the list of users in the first user group to provide a sorted list of users;
- creating the plurality of two-user permutations for the first user group from the sorted list of users;
- computing a component hash value for each of the plurality of two-user permutations for the first user group to provide the plurality of component hash values for the first user group; and
- concatenating the plurality of component hash values to provide a concatenated hash value for the first user group.
5. The method of claim 4 wherein obtaining the hash value for the first user group further comprises compressing the concatenated hash value for the first user group using a lossy compression algorithm to provide the hash value for the first user group.
6. The method of claim 5 wherein compressing the concatenated hash value comprises removing every Nth bit, where N is greater than or equal to 2.
7. The method of claim 5 wherein compressing the concatenated hash value comprises removing every other bit.
8. The method of claim 4 wherein sorting the list of users in the first user group to provide the sorted list of users is such that the plurality of two-user permutations for the first user group are canonical with respect to other instances of the plurality of two-user permutations for other user groups including the second user group.
9. The method of claim 1 wherein obtaining the hash value for the first user group comprises:
- obtaining a list of users in the first user group;
- sorting the list of users in the first user group to provide a sorted list of users;
- creating the plurality of two-user permutations for the first user group from the sorted list of users;
- computing a component hash value for each of the plurality of two-user permutations for the first user group to provide the plurality of component hash values for the first user group; and
- concatenating the plurality of component hash values to provide a concatenated hash value for the first user group.
10. The method of claim 9 wherein obtaining the hash value for the first user group further comprises compressing the concatenated hash value for the first user group using a lossy compression algorithm to provide the hash value for the first user group.
11. The method of claim 10 wherein compressing the concatenated hash value comprises removing every Nth bit, where N is greater than or equal to 2.
12. The method of claim 10 wherein compressing the concatenated hash value comprises removing every other bit.
13. The method of claim 9 wherein sorting the list of users in the first user group to provide the sorted list of users is such that the plurality of two-user permutations for the first user group are canonical with respect to other instances of the plurality of two-user permutations for other user groups including the second user group.
14. The method of claim 1 wherein determining the degree of user overlap between the first user group and the second user group comprises:
- determining a number of matching component hash values between the plurality of component hash values for the first user group and the plurality of component hash values for the second user group; and
- determining a number of matching users in the first and second user groups based on the number of matching component hash values.
15. The method of claim 14 wherein the number of matching component hash values corresponds to a number of matching two-user permutations between the plurality of two-user permutations for the first user group and the plurality of two-user permutations for the second user group.
16. The method of claim 14 wherein determining the number of matching users in the first and second user groups based on the number of matching component hash values comprises determining the number of matching users in the first and second user groups based on: $\frac{n!}{2!\,(n-2)!} = \text{number\_of\_matching\_two\_user\_permutations}$, where number_of_matching_two_user_permutations is the number of matching component hash values and n is the number of matching users in the first and second user groups.
17. The method of claim 14 wherein determining the degree of user overlap between the first user group and the second user group further comprises:
- determining a largest user group of the first and second user groups; and
- determining a percentage of user overlap based on a ratio of the number of matching users over a number of users in the largest user group of the first and second user groups.
18. A server comprising:
- a communication interface communicatively coupling the server to a network; and
- a controller associated with the communication interface and adapted to: obtain a hash value for a first user group, the hash value for the first user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the first user group; obtain a hash value for a second user group, the hash value for the second user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the second user group; and determine a degree of user overlap between the first user group and the second user group based on a comparison of the hash value for the first user group and the hash value for the second user group.
19. A computer-readable medium storing software for instructing a controller to:
- obtain a hash value for a first user group, the hash value for the first user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the first user group;
- obtain a hash value for a second user group, the hash value for the second user group comprising a plurality of component hash values generated for a plurality of two-user permutations for the second user group; and
- determine a degree of user overlap between the first user group and the second user group based on a comparison of the hash value for the first user group and the hash value for the second user group.
20. A computer-implemented method comprising:
- receiving a crowd hash value storage request from a requestor;
- obtaining a crowd in which the requestor is located at a time of receiving the crowd hash value storage request;
- computing a crowd hash value for the crowd in which the requestor is located, the crowd hash value comprising a plurality of component hash values computed for a plurality of two-user permutations for the crowd in which the requestor is located; and
- storing the crowd hash value as a crowd hash value for a crowd of interest of the requestor.
21. The method of claim 20 further comprising, at some time after storing the crowd hash value for the crowd of interest of the requestor:
- receiving a current crowd comparison request from the requestor;
- obtaining a current crowd that is relevant to the current crowd comparison request;
- computing a crowd hash value for the current crowd, the crowd hash value for the current crowd comprising a plurality of component hash values computed for a plurality of two-user permutations for the current crowd;
- obtaining the crowd hash value of the crowd of interest of the requestor from storage;
- determining a degree of user overlap between the current crowd and the crowd of interest of the requestor based on a comparison of the hash value for the current crowd and the hash value for the crowd of interest of the requestor; and
- returning the degree of user overlap between the current crowd and the crowd of interest of the requestor to the requestor.
22. The method of claim 20 further comprising, at some time after storing the crowd hash value for the crowd of interest of the requestor:
- receiving a Point of Interest (POI) recommendation request from a requestor;
- obtaining a plurality of relevant POIs that are relevant to the POI recommendation request;
- for each relevant POI of the plurality of relevant POIs: obtaining a current crowd at the relevant POI; computing a crowd hash value for the current crowd at the relevant POI, the crowd hash value comprising a plurality of component hash values computed for a plurality of two-user permutations for the current crowd at the relevant POI; and determining a degree of user overlap between the current crowd at the relevant POI and the crowd of interest of the requestor based on a comparison of the crowd hash value for the current crowd at the relevant POI and the crowd hash value for the crowd of interest of the requestor obtained from storage;
- selecting one or more recommended POIs from the plurality of relevant POIs based on the degree of user overlap between each current crowd at the plurality of relevant POIs and the crowd of interest of the requestor; and
- returning the one or more recommended POIs to the requestor.
23. The method of claim 20 further comprising, at some time after storing the crowd hash value for the crowd of interest of the requestor:
- receiving a historical crowd comparison request from the requestor;
- obtaining one or more historical crowds that are relevant to the historical crowd comparison request;
- obtaining the crowd hash value of the crowd of interest of the requestor from storage;
- for each historical crowd of the one or more historical crowds: obtaining a crowd hash value for the historical crowd, the crowd hash value for the historical crowd being previously computed and stored and comprising a plurality of component hash values computed for a plurality of two-user permutations for the historical crowd; and determining a degree of user overlap between the historical crowd and the crowd of interest of the requestor based on a comparison of the hash value for the historical crowd and the hash value for the crowd of interest of the requestor; and
- returning, to the requestor, data reflecting the degree of user overlap between each of the one or more historical crowds and the crowd of interest of the requestor.
24. The method of claim 23 further comprising:
- combining the degrees of user overlap between the one or more historical crowds and the crowd of interest of the requestor to provide a combined degree of user overlap;
- wherein returning the data comprises returning the combined degree of user overlap.
25. The method of claim 20 further comprising, at some time after storing the crowd hash value for the crowd of interest of the requestor:
- receiving a Point of Interest (POI) recommendation request from a requestor;
- obtaining a plurality of relevant POIs that are relevant to the POI recommendation request;
- for each relevant POI of the plurality of relevant POIs: obtaining one or more historical crowds at the relevant POI; for each historical crowd of the one or more historical crowds at the relevant POI, computing a crowd hash value for the historical crowd at the relevant POI, the crowd hash value comprising a plurality of component hash values computed for a plurality of two-user permutations for the historical crowd at the relevant POI; and for each historical crowd of the one or more historical crowds at the relevant POI, determining a degree of user overlap between the historical crowd at the relevant POI and the crowd of interest of the requestor based on a comparison of the crowd hash value for the historical crowd at the relevant POI and the crowd hash value for the crowd of interest of the requestor obtained from storage;
- selecting one or more recommended POIs from the plurality of relevant POIs based on the degree of user overlap between each of the historical crowds at each of the plurality of relevant POIs and the crowd of interest of the requestor; and
- returning the one or more recommended POIs to the requestor.
26. A server comprising:
- a communication interface communicatively coupling the server to a network; and
- a controller associated with the communication interface and adapted to: receive a crowd hash value storage request from a requestor via the communication interface; obtain a crowd in which the requestor is located at a time of receiving the crowd hash value storage request; compute a crowd hash value for the crowd in which the requestor is located, the crowd hash value comprising a plurality of component hash values computed for a plurality of two-user permutations for the crowd in which the requestor is located; and store the crowd hash value as a crowd hash value for a crowd of interest of the requestor.
27. A computer-readable medium storing software for instructing a controller of a computing device to:
- receive a crowd hash value storage request from a requestor;
- obtain a crowd in which the requestor is located at a time of receiving the crowd hash value storage request;
- compute a crowd hash value for the crowd in which the requestor is located, the crowd hash value comprising a plurality of component hash values computed for a plurality of two-user permutations for the crowd in which the requestor is located; and
- store the crowd hash value as a crowd hash value for a crowd of interest of the requestor.
28. A computer-implemented method comprising:
- receiving a request to monitor crowds at a Point of Interest (POI) from a requestor;
- computing a plurality of crowd hash values for a plurality of crowds at the POI over time, wherein, for each crowd hash value of the plurality of crowd hash values, the crowd hash value comprises a plurality of component hash values computed for a plurality of two-user permutations for a corresponding one of the plurality of crowds at the POI over time;
- comparing the plurality of crowd hash values for the plurality of crowds at the POI over time to one another to provide data characterizing crowd patterns at the POI; and
- returning the data characterizing crowd patterns at the POI to the requestor.
29. The method of claim 28 wherein comparing the plurality of crowd hash values for the plurality of crowds at the POI over time to one another to provide the data characterizing crowd patterns at the POI comprises:
- identifying a subset of the plurality of crowds that were at the POI during a reoccurring time window;
- determining a combined degree of user overlap for the subset of the plurality of crowds based on a subset of the plurality of crowd hash values computed for the subset of the plurality of crowds; and
- including the combined degree of user overlap for the subset of the plurality of crowds in the data characterizing crowd patterns at the POI.
30. The method of claim 28 wherein comparing the plurality of crowd hash values for the plurality of crowds at the POI over time to one another to provide the data characterizing crowd patterns at the POI comprises:
- identifying a subset of the plurality of crowds that were at the POI during a reoccurring time window;
- determining a degree of user overlap between each pair of crowds in the subset of the plurality of crowds that were at the POI during the reoccurring time window based on comparisons of crowd hash values in a subset of the plurality of crowd hash values for the subset of the plurality of crowds to one another;
- combining the degrees of user overlap for the pairs of crowds in the subset of the plurality of crowds that were at the POI during the reoccurring time window to provide a combined degree of user overlap for the reoccurring time window; and
- including the combined degree of user overlap for the reoccurring time window in the data characterizing crowd patterns at the POI.
31. A server comprising:
- a communication interface communicatively coupling the server to a network; and
- a controller associated with the communication interface and adapted to: receive a request, via the communication interface, to monitor crowds at a Point of Interest (POI); compute a plurality of crowd hash values for a plurality of crowds at the POI over time, wherein, for each crowd hash value of the plurality of crowd hash values, the crowd hash value comprises a plurality of component hash values computed for a plurality of two-user permutations for a corresponding one of the plurality of crowds at the POI over time; compare the plurality of crowd hash values for the plurality of crowds at the POI over time to one another to provide data characterizing crowd patterns at the POI; and return the data characterizing crowd patterns at the POI to the requestor.
32. A computer-readable medium storing software for instructing a controller of a computing device to:
- receive a request to monitor crowds at a Point of Interest (POI);
- compute a plurality of crowd hash values for a plurality of crowds at the POI over time, wherein, for each crowd hash value of the plurality of crowd hash values, the crowd hash value comprises a plurality of component hash values computed for a plurality of two-user permutations for a corresponding one of the plurality of crowds at the POI over time;
- compare the plurality of crowd hash values for the plurality of crowds at the POI over time to one another to provide data characterizing crowd patterns at the POI; and
- return the data characterizing crowd patterns at the POI to the requestor.
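For illustration only, the hash computation and comparison recited in claims 4, 9, 14, 16, and 17 might be sketched as below. This is a sketch under stated assumptions, not the applicant's implementation: SHA-256 stands in for the unspecified predetermined hash function, the `|` separator assumes user identifiers never contain that character, the component hash values are kept as a list rather than concatenated, the lossy compression of claims 5 to 7 is omitted, and all function names are hypothetical.

```python
import hashlib
import math
from itertools import combinations


def crowd_hash(user_ids, requesting_user=None):
    """Return component hash values for the two-user pairs of a group."""
    # Optionally drop the requesting user, then sort so that the same
    # pair of users always yields the same (canonical) ordering.
    users = sorted(set(user_ids) - {requesting_user})
    return [
        hashlib.sha256(f"{a}|{b}".encode()).hexdigest()  # SHA-256 is an assumption
        for a, b in combinations(users, 2)
    ]


def pairs_to_users(pair_count):
    """Solve n! / (2! (n - 2)!) = n (n - 1) / 2 = pair_count for n."""
    if pair_count == 0:
        # Zero pairs cannot distinguish 0 users from 1 user; report 0.
        return 0
    return (1 + math.isqrt(1 + 8 * pair_count)) // 2


def degree_of_overlap(hash_a, hash_b):
    """Matching users divided by the size of the larger group."""
    matching_pairs = len(set(hash_a) & set(hash_b))
    matching_users = pairs_to_users(matching_pairs)
    largest = max(pairs_to_users(len(hash_a)), pairs_to_users(len(hash_b)))
    return matching_users / largest if largest else 0.0
```

In this sketch the group sizes are recovered from the number of component hash values, so neither side needs the other's user list. Note that a single shared user produces no matching pair and is therefore reported as no overlap.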
Type: Application
Filed: Apr 14, 2010
Publication Date: Feb 23, 2012
Applicant: Waldeck Technology, LLC (Wilmington, DE)
Inventor: Steven L. Petersen (Los Gatos, CA)
Application Number: 12/759,749
International Classification: G06Q 10/00 (20060101); G06F 17/30 (20060101);