SYSTEMS AND METHODS FOR LARGE-SCALE LINK ANALYSIS
Systems and methods for accepting relationship indications based on interaction among entities, where each relationship indication specifies that a respective pair of the entities may be related. A single entity record may be constructed that indicates all the linked entities that have been identified as being related to the entity. The entities may include an individual, a set of individuals, a communication terminal, a plurality of communication terminals, an organization, an e-mail address, a Web-site, a bank account and a home address. An analytics operation may be performed, such as identifying a shortest sequence of interrelated entities that relate a first entity with a second entity, and acting upon the identified sequence.
This application is a continuation of, and claims the benefit of priority to, U.S. patent application Ser. No. 12/888,445 filed Sep. 23, 2010, the disclosure of which is incorporated herein by reference in its entirety.
FIELD OF THE DISCLOSURE
The present disclosure relates generally to data analysis, and particularly to storage and processing of relationship-related information.
BACKGROUND OF THE DISCLOSURE
Various techniques for analyzing and extracting useful information from communication traffic are known in the art. Some analysis techniques process communication traffic in order to identify and characterize relationships between users.
SUMMARY OF THE DISCLOSURE
An embodiment that is described herein provides a method, including:
accepting a plurality of relationship indications based on interaction among entities, each relationship indication specifying that a respective pair of the entities are related;
using a link processor, identifying for each entity among a group of the entities, based on the relationship indications, one or more linked entities that are related to the entity, and constructing for each entity in the group, a single-entity record that indicates all the linked entities that have been identified as being related to the respective entity;
storing in a memory multiple single-entity records, each single-entity record corresponding respectively to only one of the entities in the group; and
identifying one or more of the entities as targets-of-interest.
In some embodiments, the interaction includes communication among the entities over a communication network. In an embodiment, the entities include at least one entity type selected from a group of types consisting of an individual, a set of individuals, a communication terminal, a plurality of communication terminals, an organization, an e-mail address, a Web-site, a bank account and a home address. In a disclosed embodiment, constructing the single-entity record includes storing in the single-entity record respective attributes, which characterize respective relationships between the linked entities and the entity. The attributes may indicate respective confidence levels of the relationships.
In some embodiments, the method includes using the link processor to perform an analytics operation with respect to the entities. An analytics operation may include querying at least one of the entity records stored in the memory.
In an embodiment, storing the entity records includes storing at least a portion of the entity records in an in-memory data structure residing in Random Access Memory (RAM), and performing the analytics operation includes querying the in-memory data structure. Storing the entity records may include storing another portion of the entity records on a magnetic storage device. Performing the analytics operation may include querying at least a first entity record stored in the in-memory data structure and at least a second entity record stored on the magnetic storage device.
In another embodiment, performing the analytics operation includes querying at least one of the entity records stored in the memory, identifying a shortest sequence of interrelated entities that relate a first entity with a second entity, and acting upon the identified sequence. Additionally or alternatively, one or more of the entities are identified as targets-of-interest, and the analytics operation is performed with respect to the targets-of-interest. In some embodiments, performing the analytics operation includes querying the entity records with a query formulated in a graph query language.
There is additionally provided, in accordance with an embodiment that is described herein, apparatus, including:
a memory; and
a link processor, which is configured to accept a plurality of relationship indications based on interaction among entities, each relationship indication specifying that a respective pair of the entities are related, to identify for each entity among a group of the entities, based on the relationship indications, one or more linked entities that are related to the entity, to construct for each entity in the group a single-entity record that indicates all the linked entities that have been identified as being related to the respective entity, to store in the memory multiple single-entity records, each single-entity record corresponding respectively to only one of the entities in the group, and to identify one or more of the entities as targets-of-interest.
The present disclosure will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
Some data analytics applications identify relationships among communication network users, and act upon the identified relationships. For example, a fraud detection system may identify a fraudulent user by discovering that this user interacts with other users who are already known as suspects. Changes in relationships (e.g., appearance of a new relationship, or a strengthening or weakening relationship) may also provide meaningful information. Relationships can be identified and characterized, for example, by analyzing communication sessions (e.g., phone conversations) held between the network users.
In many practical cases, identifying and acting upon relationships involves storage and processing of large volumes of data. Tracking relationships among users of a large cellular network, for example, may require processing of billions of Call Detail Records (CDRs) and keeping track of relationships among millions of users. It is possible in principle to represent a set of relationships by a matrix whose dimensions are on the order of the number of users, or as a list of user pairs. These naive data structures, however, are inefficient to store and query, and quickly become impractical as the number of users and relationships grows. In some practical applications, the storage space and processing time dictated by these data structures limit the achievable system performance.
Embodiments that are described herein provide methods and systems for efficient storage and processing of relationship-related data. In some embodiments, a link analysis system stores and acts upon relationships among entities (e.g., individuals, groups of individuals or even entire organizations). For each entity, the system constructs a single record, which indicates the entities that are related to (i.e., have a relationship with) this entity. In addition to indicating the related entities, a given record may also hold various attributes that characterize the relationships of the related entities with the entity in question.
When using the disclosed techniques, each entity is represented by a single record, and therefore the number of records is on the order of the number of entities. The average record size is on the order of the average number of relationships per entity, which does not change considerably when the number of entities grows. As such, the data structure is highly scalable and is particularly suitable for large-scale applications having large numbers of entities.
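By way of illustration only, the scaling argument above can be written out as a rough comparison with the matrix representation. The symbols N (number of entities) and d (average number of relationships per entity) are introduced here for convenience and do not appear in the original text; relationships are assumed to be symmetric, so that each relationship is listed in two records.

```latex
\begin{align*}
S_{\text{matrix}}  &\sim N^{2} \\
S_{\text{records}} &\sim N \cdot d \\
\frac{S_{\text{matrix}}}{S_{\text{records}}} &\sim \frac{N}{d}
\end{align*}
```

For hypothetical values of N = 10^7 and d = 50, the matrix would hold on the order of 10^14 cells, whereas the single-entity records would hold about 5 x 10^8 linked-entity entries in total.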
The methods and systems described herein are highly efficient in terms of memory requirements. In some embodiments, the small memory space required by the disclosed data structures makes them suitable for in-memory storage (i.e., in Random Access Memory (RAM) rather than on disk). As a result, the data structure can be queried at high speed, and complex queries can be completed within a reasonable run time. In addition to storage efficiency, the data structures described herein lend themselves to efficient execution of analytics operations, since they enable determining the entire set of entities that are related to a given entity in a single query. This capability is a powerful building block, which can be used to construct and execute complex analytics operations with high efficiency. Several example operations are described herein.
System Description
Although the embodiments described herein refer mainly to communication between communication network users, the disclosed techniques can be applied to various other kinds of relationships and interactions among entities, e.g., bank transactions, ownerships, kinship and other indications.
System 20 comprises a network interface 28, which receives from network 24 information regarding communication sessions held between the users. In the present example, interface 28 receives Call Detail Records (CDRs) produced in network 24, although any other type of information can also be used (for example e-mail communication or bank transfer records). System 20 further comprises a link processor 32, which carries out the methods described herein. In particular, processor 32 processes the CDRs so as to identify relationships between users, and stores the identified relationships in records and data structures that are described in detail below.
Typically, link processor 32 comprises a general-purpose processor, which is programmed in software to carry out the functions described herein. The software may be downloaded to the processor in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on tangible media, such as magnetic, optical, or electronic memory.
Typically, processor 32 produces records that represent the relationships between entities. These records are sometimes referred to herein as entity records. In some embodiments, the processor stores at least some of the records in an in-memory database 36. Database 36 stores the records in solid state memory, such as Random Access Memory (RAM), thus providing fast access time to the records. Additionally or alternatively, processor 32 may store at least some of the records in a static database 40. Database 40 typically comprises a magnetic storage device, such as a Hard Disk Drive (HDD). In comparison with database 36, database 40 typically provides considerably larger storage space but has a slower access time. In some embodiments, storage of the records is partitioned between the two databases, such as by storing dynamic and/or recent information in the in-memory database, and static and/or older information in static database 40. Processor 32 may transfer records between databases 36 and 40 as desired, for example on a periodic basis.
System 20 interacts with an operator 46 using an operator terminal 44. In particular, system 20 presents output to the operator using an output device such as a display 48, and accepts user input using an input device 52 such as a keyboard or mouse.
The system configuration shown in
In some embodiments, link processor 32 of system 20 analyzes the CDRs received from network 24, so as to produce a set of relationship indications. Each relationship indication specifies a relationship between two entities. An entity may comprise, for example, an individual (e.g., a network user), a group of individuals, a communication terminal (e.g., a cellular phone or a computer), a group of terminals or even an entire organization. Other types of entities may comprise, for example, e-mail addresses, Web-sites, bank accounts or home addresses. Each relationship specifies that a given pair of entities is related. Typically, two entities (e.g., individuals) are regarded as related if the CDRs indicate that they have communicated with one another.
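By way of illustration only, the conversion of CDRs into relationship indications might be sketched as follows. The `CDR` record layout (field names `caller`, `callee`, `duration_s`) and the function name are hypothetical, and the sketch assumes the simplest criterion mentioned above, namely that two entities are related if any CDR shows that they communicated.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class CDR(NamedTuple):
    """Hypothetical, simplified Call Detail Record."""
    caller: str
    callee: str
    duration_s: int


def relationship_indications(cdrs: Iterable[CDR]) -> Iterator[Tuple[str, str]]:
    """Yield one indication per pair of entities that communicated.

    Relationships are treated as symmetric here, so (A, B) and (B, A)
    collapse into a single sorted pair.
    """
    seen = set()
    for cdr in cdrs:
        if cdr.caller == cdr.callee:
            continue  # ignore self-calls; they relate no pair of entities
        pair = tuple(sorted((cdr.caller, cdr.callee)))
        if pair not in seen:
            seen.add(pair)
            yield pair


cdrs = [CDR("A", "B", 10), CDR("B", "A", 5), CDR("A", "C", 1)]
indications = list(relationship_indications(cdrs))
```

Any other criterion (for example, a minimum number of sessions) could replace the membership test without changing the output format.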
Processor 32 may apply any suitable technique and any suitable criteria for converting the information received from network 24 into a set of relationships. Various techniques for identifying relationships are known in the art, and any such technique can be used by processor 32. Such techniques are described, for example, by Svenson et al., in “Social Network Analysis and Information Fusion for Anti-Terrorism,” Proceedings of the Conference on Civil and Military Readiness (CIMI), Enkoping, Sweden, May 16-18, 2006, by Pan, in “Effective and Efficient Methodologies for Social Network Analysis,” PhD Thesis submitted to Virginia Polytechnic Institute and State University, Dec. 11, 2007, and by Coffman et al., in “Graph-Based Technologies for Intelligence Analysis,” Communications of the ACM (CACM), volume 47, issue 3, March 2004, pages 45-47, which are all incorporated herein by reference.
In alternative embodiments, processor 32 does not generate the relationship indications, but rather receives them from another processor or system.
Generally, relationships may be symmetric (i.e., if entity A is related to entity B then B is necessarily related to A) or asymmetric. A relationship may be defined between entities of the same type (e.g., between two individuals) or between entities of different types (e.g., between an individual and a group of individuals). In some embodiments, processor 32 may assign each relationship one or more attributes. For example, a relationship may be assigned a strength or confidence level. In an example embodiment, entities that communicate frequently may be regarded by processor 32 as having a strong relationship, whereas entities that communicated only once or twice may be regarded as having a weak relationship. As another example, when analyzing bank transactions, the amount of money transferred between two entities may indicate the strength of the relationship. Additionally or alternatively, relationships may be assigned any other suitable attributes.
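By way of illustration only, assigning a strength attribute based on communication frequency might be sketched as follows. The function name, the "strong"/"weak" labels and the threshold of three sessions are assumptions made for this sketch; the text above does not specify any particular threshold.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple


def relationship_strengths(
    sessions: Iterable[Tuple[str, str]],
    strong_threshold: int = 3,  # hypothetical cut-off, not from the source
) -> Dict[FrozenSet[str], str]:
    """Label each symmetric relationship by how often the pair communicated."""
    # A frozenset key makes (A, B) and (B, A) count toward the same pair.
    counts = Counter(frozenset(pair) for pair in sessions if pair[0] != pair[1])
    return {
        pair: ("strong" if n >= strong_threshold else "weak")
        for pair, n in counts.items()
    }


sessions = [("A", "B"), ("B", "A"), ("A", "B"), ("A", "B"), ("B", "C")]
strengths = relationship_strengths(sessions)
```

For bank transactions, the same structure could accumulate transferred amounts instead of session counts.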
The set of relationship indications can be represented by a graph, in which nodes represent entities and edges represent relationships.
In some embodiments, processor 32 accepts a relationship graph as input. Alternatively, processor 32 may produce a relationship graph based on CDRs or other information received from network 24.
Processor 32 stores the relationship information in a data structure, which lends itself to efficient storage and subsequent processing. In some embodiments, processor 32 constructs and stores a single record for each entity, referred to as an entity record. The record of a given entity indicates the entities that are related to the given entity. The entities (nodes) related to a given entity (node) are also referred to as linked entities (linked nodes). A given record is typically retrievable in a single read operation.
For example, a data structure representing the relationship graph of
In some embodiments, each linked node 72 may comprise one or more attributes 76, which characterize the relationship in question. The attributes may indicate, for example, the strength or confidence level of the relationship. Attributes may comprise, for example, the number of times the two entities have communicated, the total time duration of the communication, the amount of money that was transferred between two accounts, the days on which the communication took place, or any other suitable attribute.
In some embodiments, certain nodes 68 in data structure 60 may also be assigned one or more attributes, which characterize the node and are not necessarily related to any specific relationship. For example, an attribute may mark whether or not the node is considered a target.
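By way of illustration only, the construction of single-entity records with per-relationship attributes and a per-entity target flag might be sketched as follows. The class name `EntityRecord`, its field names and the function `build_records` are hypothetical, and symmetric relationships are assumed, so each indication updates the records of both entities.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class EntityRecord:
    """One record per entity: its linked entities and their attributes."""
    entity_id: str
    is_target: bool = False  # entity-level attribute
    # linked entity id -> attributes characterizing that relationship
    linked: Dict[str, dict] = field(default_factory=dict)


def build_records(
    indications: Iterable[Tuple[str, str, dict]],
    targets: Iterable[str] = (),
) -> Dict[str, EntityRecord]:
    """Build exactly one record per entity from (a, b, attributes) indications."""
    records: Dict[str, EntityRecord] = {}
    for a, b, attrs in indications:
        # Symmetric relationship: list b in a's record and a in b's record.
        for src, dst in ((a, b), (b, a)):
            rec = records.setdefault(src, EntityRecord(src))
            rec.linked[dst] = dict(attrs)
    for t in targets:
        records.setdefault(t, EntityRecord(t)).is_target = True
    return records


records = build_records(
    [("A", "B", {"calls": 3}), ("A", "C", {"calls": 1})],
    targets=["C"],
)
# A single lookup returns every entity linked to "A".
linked_to_a = sorted(records["A"].linked)
```

Keying the records by entity identifier is one way to make each record retrievable in a single read operation.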
The data structure of
In alternative embodiments, processor 32 stores part of data structure 60 in in-memory database 36, and another part in static database 40. For example, the processor may store new and recently-modified records in the in-memory database, and static or old records in the static database. In these embodiments, analytics operations may involve accessing the in-memory database, the static database, or both.
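By way of illustration only, partitioning the records between a fast store and a slower store might be sketched as follows. The class `TieredStore` and its method names are hypothetical, and both tiers are plain dictionaries here; in practice the second dictionary would be replaced by a disk-backed database.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TieredStore:
    """Keeps recently modified records 'hot' and older records 'cold'."""

    def __init__(self) -> None:
        # Stand-in for the in-memory database: id -> (record, last modified).
        self.hot: Dict[str, Tuple[Any, float]] = {}
        # Stand-in for the static (disk) database: id -> record.
        self.cold: Dict[str, Any] = {}

    def put(self, entity_id: str, record: Any, now: Optional[float] = None) -> None:
        """Store a new or modified record in the hot tier."""
        self.cold.pop(entity_id, None)  # avoid keeping a stale cold copy
        self.hot[entity_id] = (record, time.time() if now is None else now)

    def get(self, entity_id: str) -> Optional[Any]:
        """Query the hot tier first, then fall back to the cold tier."""
        if entity_id in self.hot:
            return self.hot[entity_id][0]
        return self.cold.get(entity_id)

    def demote_older_than(self, cutoff: float) -> int:
        """Move records last modified before `cutoff` to the cold tier."""
        stale = [k for k, (_, ts) in self.hot.items() if ts < cutoff]
        for k in stale:
            self.cold[k] = self.hot.pop(k)[0]
        return len(stale)


store = TieredStore()
store.put("A", {"B": {}}, now=100.0)
store.put("B", {"A": {}}, now=200.0)
moved = store.demote_older_than(150.0)
```

A periodic call to `demote_older_than` corresponds to the occasional transfer of old records described above; a query via `get` may be served from either tier.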
In addition to storage efficiency, the format of data structure 60 lends itself to efficient execution of analytics operations. In particular, data structure 60 enables processor 32 to retrieve the entire set of entities that are related to an entity-of-interest in a single read operation, by querying the single record representing the entity-of-interest. This capability is a powerful building block, which can be used to construct and execute complex analytics operations with high efficiency.
For example, in many cases certain entities do not have a direct relationship, but are related indirectly via a sequence of (one or more) interrelated entities. In
Various kinds of analytics operations are concerned with the distances between entities. Some operations are initiated by operator 46. Other operations are carried out automatically by processor 32, such as operations that trigger a notification or alert upon meeting a certain condition defined over the distances. For example, operator 46 may request processor 32 to find the distance between a pair of entities or the shortest distance between a certain entity and a group of target entities. As another example, for a given entity, an analytics operation may identify the targets whose distance to the given entity does not exceed a certain value. The identity of the entities along the shortest path may also be of interest, and may be provided as output.
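By way of illustration only, two of the distance-based operations mentioned above might be sketched with a breadth-first search over the entity records. The function names are hypothetical, the records are simplified to a mapping from each entity to the set of its linked entities, and distance is taken to mean the number of relationships along the shortest path, which is an assumption of this sketch.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def shortest_path(
    records: Dict[str, Set[str]], source: str, destination: str
) -> Optional[List[str]]:
    """Return the entities along a shortest path, or None if unrelated."""
    if source not in records or destination not in records:
        return None
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == destination:
            path = []
            while current is not None:  # walk back to the source
                path.append(current)
                current = parents[current]
            return path[::-1]
        # One record lookup yields all entities linked to `current`.
        for neighbor in records.get(current, ()):
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


def targets_within(
    records: Dict[str, Set[str]],
    source: str,
    targets: Iterable[str],
    max_distance: int,
) -> Dict[str, int]:
    """Return targets whose distance from `source` is at most `max_distance`."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if distance[current] == max_distance:
            continue  # do not expand beyond the distance limit
        for neighbor in records.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return {t: distance[t] for t in targets if t in distance and t != source}


records = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
path = shortest_path(records, "A", "D")
near = targets_within(records, "A", {"C", "D"}, max_distance=2)
```

An automatic alert could be built on `targets_within` by re-running it when records change and notifying the operator when the result becomes non-empty.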
Calculating relationship distances between entities typically involves traversing the relationship graph (e.g., graph 54 of
For each entity (graph node), processor 32 produces a single entity record, at a record generation step. The record of a certain entity indicates the entities that are linked (related) to the entity in question. The processor stores the records in in-memory database 36, at a storage step 92. In some embodiments, the processor occasionally transfers static or relatively old records to static database 40.
Processor 32 performs analytics operations on the stored entity records, at an operation step 96. Some example operations have been described above. For some operations, operator 46 provides input (e.g., queries) using input device 52 of terminal 44. Outputs of the operations (e.g., answers to queries) can be displayed to the operator using display 48.
In some embodiments, the process of
Although the embodiments described herein mainly address efficient storage and processing of relationship information gathered from communication networks, the principles of the present disclosure can also be used for fraud investigation, anti-money laundering investigation, crime investigation, as well as web-page ranking. Generally, the relationship indications used by the disclosed techniques may be derived from any suitable kind of interaction among entities, not necessarily related to communication sessions.
It will thus be appreciated that the embodiments described above are cited by way of example, and that the present disclosure is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present disclosure includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
Claims
1. A method, comprising:
- accepting a plurality of relationship indications based on interaction among entities, each relationship indication specifying that a respective pair of the entities are related;
- using a link processor, identifying for each entity among a group of the entities, based on the relationship indications, one or more linked entities that are related to the entity, and constructing for each entity in the group, a single-entity record that indicates all the linked entities that have been identified as being related to the respective entity;
- storing in a memory multiple single-entity records, each single-entity record corresponding respectively to only one of the entities in the group; and
- identifying one or more of the entities as targets-of-interest.
2. The method according to claim 1, wherein the interaction comprises communication among the entities over a communication network.
3. The method according to claim 1, wherein the entities comprise at least one entity type selected from a group of types consisting of an individual, a set of individuals, a communication terminal, a plurality of communication terminals, an organization, an e-mail address, a Web-site, a bank account and a home address.
4. The method according to claim 1, wherein constructing the single-entity record comprises storing in the single-entity record respective attributes, which characterize respective relationships between the linked entities and the entity.
5. The method according to claim 4, wherein the attributes indicate respective confidence levels of the relationships.
6. The method according to claim 1, wherein storing the entity records comprises storing at least a portion of the entity records in an in-memory data structure residing in Random Access Memory (RAM), and wherein the method further comprises using the link processor to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying at least one of the entity records stored in the in-memory data structure.
7. The method according to claim 6, wherein storing the entity records comprises storing another portion of the entity records on a magnetic storage device.
8. The method according to claim 7, wherein performing the analytics operation comprises querying at least a first entity record stored in the in-memory data structure and at least a second entity record stored on the magnetic storage device.
9. The method according to claim 1, wherein the method further comprises using the link processor to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying at least one of the entity records stored in the memory, identifying a shortest sequence of interrelated entities that relate a first entity with a second entity, and acting upon the identified sequence.
10. The method according to claim 1, wherein the method further comprises using the link processor to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying the entity records with a query formulated in a graph query language.
11. The method according to claim 1, wherein the method further comprises using the link processor to perform an analytics operation with respect to the targets-of-interest, wherein the analytics operation comprises querying at least one of the entity records stored in the memory.
12. Apparatus, comprising:
- a memory; and
- a link processor, which is configured to accept a plurality of relationship indications based on interaction among entities, each relationship indication specifying that a respective pair of the entities are related, to identify for each entity among a group of the entities, based on the relationship indications, one or more linked entities that are related to the entity, to construct for each entity in the group a single-entity record that indicates all the linked entities that have been identified as being related to the respective entity, to store in the memory multiple single-entity records, each single-entity record corresponding respectively to only one of the entities in the group, and to identify one or more of the entities as targets-of-interest.
13. The apparatus according to claim 12, wherein the interaction comprises communication among the entities over a communication network.
14. The apparatus according to claim 12, wherein the entities comprise at least one entity type selected from a group of types consisting of an individual, a set of individuals, a communication terminal, a plurality of communication terminals, an organization, an e-mail address, a Web-site, a bank account and a home address.
15. The apparatus according to claim 12, wherein the link processor is configured to store in the single-entity record respective attributes, which characterize respective relationships between the linked entities and the entity.
16. The apparatus according to claim 15, wherein the attributes indicate respective confidence levels of the relationships.
17. The apparatus according to claim 12, wherein the memory comprises at least a Random Access Memory (RAM), and wherein the link processor is configured to store at least a portion of the entity records in an in-memory data structure residing in the RAM, and to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying at least one of the entity records stored in the in-memory data structure.
18. The apparatus according to claim 17, wherein the memory further comprises a magnetic storage device, and wherein the link processor is configured to store another portion of the entity records on the magnetic storage device.
19. The apparatus according to claim 18, wherein the link processor is configured to perform the analytics operation by querying at least a first entity record stored in the in-memory data structure and at least a second entity record stored on the magnetic storage device.
20. The apparatus according to claim 12, wherein the link processor is configured to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying at least one of the entity records stored in the memory, identifying a shortest sequence of interrelated entities that relate a first entity with a second entity, and acting upon the identified sequence.
21. The apparatus according to claim 12, wherein the link processor is configured to accept a query formulated in a graph query language, and to perform an analytics operation with respect to the entities, wherein the analytics operation comprises querying the entity records with the query.
22. The apparatus according to claim 12, wherein the link processor is configured to perform an analytics operation with respect to the targets-of-interest, wherein the analytics operation comprises querying at least one of the entity records stored in the memory.
Type: Application
Filed: Aug 7, 2015
Publication Date: Feb 4, 2016
Inventors: Gideon Hazzani (Rishon Le Zion), Gabby Shiner (Hod Hasharon)
Application Number: 14/821,061