Cross correlation of online identities
A method for analyzing social network accounts includes querying a first social network for data about a first user account associated with the first social network. The received data is stored in a social graph structure and used to determine derived characteristics of the first user account. The derived characteristics are quantified and used to generate a first feature vector for the first user account. The first feature vector of the first user account is compared with a second feature vector of a second user account associated with a second social network different from the first social network. Based on the comparison of the first and second feature vectors, it is determined whether the first user account and the second user account are associated with a same entity.
The present disclosure relates generally to analyzing online social networks and, more particularly, to identifying social network accounts associated with the same entity.
BACKGROUND
With the proliferation of social media and online communities, it is not uncommon for an individual to have multiple user accounts across different social networking platforms. However, social networks vary in terms of their theme or purpose, functionality, popularity, and accessibility. Given these differences, an individual may choose to present himself or herself in different ways depending on the particular social network. For example, a person or entity may use one user name for a first social network and quite a different user name for a second social network. In addition, the availability of demographic indicators associated with user accounts differs from one social network to the next. For these and other reasons, it is often difficult to determine when multiple social networking accounts are owned by the same individual.
A number of approaches have been developed for comparing user accounts across social networks. One existing approach looks to photograph matching, name similarity, and location similarity to correlate user accounts. Another approach attempts to correlate user accounts utilizing text based entropy of screen names, similarity of profile photos, and time patterns detected in posting activity and account creation. However, where these and other existing approaches fall short is that they fail to take into account that social networks are now broken into distinct subcategories, and each category does not provide the same measurable characteristics as the others.
SUMMARY
The following introduces a selection of concepts in a simplified form in order to provide a foundational understanding of some aspects of the present disclosure. The following is not an extensive overview of the disclosure, and is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. The following merely provides an overview for some of the concepts of the disclosure as an introduction to the more detailed description provided thereafter.
In an embodiment, a method for analyzing social network accounts includes: querying, using data processing hardware, a first social network for data about a first user account associated with the first social network; receiving, at the data processing hardware, the data about the first user account associated with the first social network; storing, by the data processing hardware, the received data in a social graph structure; determining, by the data processing hardware, derived characteristics of the first user account based on the data stored in the social graph structure; quantifying, by the data processing hardware, the derived characteristics of the first user account; generating, by the data processing hardware, a first feature vector for the first user account based on the quantified characteristics of the first user account; comparing, by the data processing hardware, the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and determining, by the data processing hardware, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity, where the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
According to an embodiment, a system comprises data processing hardware and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations including: querying a first social network for data about a first user account associated with the first social network; receiving the data about the first user account associated with the first social network; storing the received data in a social graph structure; determining derived characteristics of the first user account based on the data stored in the social graph structure; quantifying the derived characteristics of the first user account; generating a first feature vector for the first user account based on the quantified characteristics of the first user account; comparing the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and determining, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity, where the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
According to another embodiment, a non-transitory computer-readable storage medium includes instructions that, when executed by at least one processor of a computing device, cause the computing device to perform operations comprising: querying a first social network for data about a first user account associated with the first social network; receiving the data about the first user account associated with the first social network; storing the received data in a social graph structure; determining derived characteristics of the first user account based on the data stored in the social graph structure; quantifying the derived characteristics of the first user account; generating a first feature vector for the first user account based on the quantified characteristics of the first user account; comparing the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and determining, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity, where the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
Further scope of applicability of the systems, methods, and apparatus of the present disclosure will become apparent from the more detailed description given below. However, it should be understood that the specific examples, while indicating embodiments of the systems, methods, and apparatus, are given by way of illustration only, since various changes and modifications within the spirit and scope of the concepts disclosed herein will become apparent to those skilled in the art from the following detailed description.
Features of the present systems and techniques may be best understood from the following detailed description taken in conjunction with the accompanying drawings of which:
The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of what is claimed in the present disclosure.
Embodiments of the present disclosure and their advantages are best understood by referring to the detailed description that follows. It should be appreciated that like reference numbers are used to identify like elements illustrated in one or more of the figures, wherein showings therein are for purposes of illustrating embodiments of the present disclosure and not for purposes of limiting the same.
DETAILED DESCRIPTION
Various examples and embodiments of the present disclosure will now be described. The following description provides specific details for a thorough understanding and enabling description of these examples. One of ordinary skill in the relevant art will understand, however, that one or more embodiments described herein may be practiced without many of these details. Likewise, one skilled in the relevant art will also understand that one or more embodiments of the present disclosure can include other features and/or functions not described in detail herein. Additionally, some well-known structures or functions may not be shown or described in detail below, so as to avoid unnecessarily obscuring the relevant description. It is intended from the outset that the configurations described in the various embodiments may be combined as appropriate. Also, one or more of the components in the embodiments disclosed herein may not be used.
Various embodiments set forth in the present disclosure are directed to a method, system, and computing device that utilize feature vectors for comparing user accounts across social network platforms. As will be described in greater detail herein, the methods and systems use a graph distance measurement to compare features of different user accounts in order to identify accounts that potentially belong to the same person or entity.
Various embodiments of the disclosure are implemented in a computer networking environment. Turning to
Also communicatively linked to the network 110 are a plurality of social network systems 120a through 120n (where “n” is an arbitrary number). In an embodiment, each of social network systems 120a through 120n may be a network-addressable computing system that is capable of hosting an online social network. Each social network system 120 may be accessed by computing devices 102, 104a, 104b by any suitable manner (e.g., either directly or via network 110). In one embodiment, each social network system 120 may include one or more servers (not shown) such as, for example, web servers, mail servers, message servers, file servers, application servers, proxy servers, and the like. Each social network system 120 may also include one or more data stores (not shown) that may be used to store various types of information. Such data stores may be relational databases, for example. In an embodiment, each social network system 120 may generate, send, receive, and store social networking data including, for example, user profile data, social graph information, and other suitable data associated with the online social network.
Computing devices 102, 104a, and 104b, and social network systems 120a through 120n may be communicatively connected to network 110 via one or more links 122. While the present disclosure contemplates any suitable links 122, in one or more embodiments, links 122 may be wireless links (e.g., Wi-Fi or Worldwide Interoperability for Microwave Access (WiMAX)), wireline links (e.g., Digital Subscriber Line (DSL) or Data Over Cable Service Interface Specification (DOCSIS)), or optical links (e.g., Synchronous Optical Network (SONET) or Synchronous Digital Hierarchy (SDH)). In some embodiments, one or more of links 122 may each include an intranet, extranet, ad hoc network, VPN, LAN, WLAN, WAN, a portion of the Internet, a cellular technology-based network, another link 122, or any suitable combination of two or more such links 122. Furthermore, each of computing devices 102, 104a, and 104b, and social network systems 120 need not necessarily be connected to network 110 via the same type of link 122.
It should be noted that computing devices 102, 104a, and 104b depicted in
Although
According to an embodiment, one or more of the computing devices of
Each of the elements of
As used herein, “local memory” refers to one or both of memories 204 and 206 (i.e., memory accessible by processor 202 within the computing device). In some embodiments, secondary memory 206 is implemented as, or supplemented by an external memory 206A. Media storage device 112 is a possible implementation of external memory 206A. Processor 202 executes the instructions and uses the data to carry out various procedures including, in some embodiments, the methods described herein, including displaying a graphical user interface 218. Graphical user interface 218 is, according to one embodiment, software that processor 202 executes to display a report on display 210, and which permits a user to make inputs into the report via input devices 208.
In the exemplary embodiment of
The following description of examples and embodiments may sometimes refer to one or more of client software 118a, client software 118b, first computing device 102, second computing device 104a, or third computing device 104b as taking one or more actions. It is to be understood that such actions may involve one or both of client software 118a and client software 118b taking such actions as: (a) the client software transmitting Hypertext Transfer Protocol (HTTP) requests such as “GET” and “POST” in order to transmit information to or receive information from software running on first computing device 102 (e.g., via a web server), and (b) the client software running a script (e.g., JavaScript) to send information to and retrieve information from software running on first computing device 102. First computing device 102 may ultimately obtain information (e.g., web pages or data to feed into plugins used by the client software) from database 114. It should be understood, however, that when a computing device (or software executing thereon) carries out an action, it is processor hardware 202 (the main processor and/or one or more secondary processors, such as a graphics processing unit, hardware codec, input-output controller, etc.) that carries out the action at the hardware level.
In one or more embodiments, media storage device 112 may store data in one or more data structures 116. One possible implementation of data structure 116 is a social graph structure. For example, in an embodiment, first computing device 102 may obtain data about user accounts from one or more of social network systems 120a through 120n. Such data obtained from social network systems 120 may be stored in a social graph structure 116.
A social graph structure may include multiple nodes and multiple edges connecting the nodes. An example social graph structure 300 is illustrated in
As shown in
At block 502, first computing device 102 queries a first social network (e.g., one of social network systems 120 in
At block 504, first computing device 102 may receive, in response to the query from step 502, data about the first user account associated with the first social network. At block 506, first computing device 102 may store the received data in a social graph structure (e.g., data structure 116 in
One possible example of the type of data that may be received from the social network and stored in the social graph structure as a nested JSON object is: {"Account Username": {"friends": [list], "followers": [list], "mentions from": [list], "mentions at": [list]}}. In some embodiments, the data may be formatted slightly differently than in this example, and/or additional data about the first user account may be included. For example, the data received by the first computing device at block 504 may include data about the first user account's friends in the social network, followers in the social network, mentions of the first user account by other user accounts, mentions of other user accounts by the first user account, and other user accounts that “liked” or reacted to a post by the first user account.
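As a minimal sketch of how a record of this shape might be loaded into a structure of nodes and labeled edges, the following Python example uses a plain adjacency mapping. The function name add_account_record, the edge labels, the edge directions, and the sample usernames are assumptions made for illustration only and are not taken from the disclosure.

```python
# Hypothetical mapping from record keys to an edge label and a direction.
# "out": edge runs from the queried account to the listed account.
# "in":  edge runs from the listed account to the queried account.
EDGE_RULES = {
    "friends": ("friend_of", "out"),
    "followers": ("follows", "in"),
    "mentions from": ("mentions", "in"),
    "mentions at": ("mentions", "out"),
}


def add_account_record(graph, username, record):
    """Add one account's record to a dict of node -> set of (label, target)."""
    graph.setdefault(username, set())
    for key, others in record.items():
        if key not in EDGE_RULES:
            continue  # fields this sketch does not model are skipped
        label, direction = EDGE_RULES[key]
        for other in others:
            graph.setdefault(other, set())
            if direction == "out":
                graph[username].add((label, other))
            else:
                graph[other].add((label, username))
    return graph


# Sample record with made-up usernames.
record = {
    "friends": ["bob"],
    "followers": ["carol"],
    "mentions from": ["carol"],
    "mentions at": ["bob"],
}
graph = add_account_record({}, "alice", record)
```

Storing each edge on its source node keeps lookups of an account's outgoing relationships simple; a graph database or a dedicated graph library could serve the same purpose.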
At block 508, first computing device 102 may determine derived characteristics of the first user account based on the data stored in the social graph structure for the first user account. In an embodiment, the derived characteristics determined at block 508 may include, for example, frequency of interaction, average time of interaction, time zone of the access, IP address of the access (e.g., which node on the social network the first user account accessed), API of interaction (e.g., API access via web, mobile, direct, etc.), and the like.
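Two of the characteristics named above, frequency of interaction and average time of interaction, might be derived from interaction timestamps roughly as follows. This is a sketch under stated assumptions: the function name derive_characteristics, the output keys, and the choice to express the average time as an hour of day in UTC are all hypothetical, and the disclosure does not prescribe a formula.

```python
from datetime import datetime, timezone


def derive_characteristics(timestamps):
    """Derive simple activity measures from timezone-aware datetimes."""
    if not timestamps:
        return {"interactions_per_day": 0.0, "mean_hour_utc": None}
    ordered = sorted(timestamps)
    span_days = (ordered[-1] - ordered[0]).total_seconds() / 86400
    # Spans shorter than one day are treated as one day to avoid dividing by zero.
    per_day = len(ordered) / max(span_days, 1.0)
    utc_times = [t.astimezone(timezone.utc) for t in ordered]
    hours = [t.hour + t.minute / 60 for t in utc_times]
    # Plain arithmetic mean; it ignores the wraparound at midnight.
    mean_hour = sum(hours) / len(hours)
    return {"interactions_per_day": per_day, "mean_hour_utc": mean_hour}


# Made-up sample timestamps.
sample = [
    datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 8, 0, tzinfo=timezone.utc),
]
result = derive_characteristics(sample)
```

For accounts active around midnight, a circular mean would be a better choice than the arithmetic mean used here.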
At block 510, first computing device 102 determines a psychological profile for the first user account based on the derived characteristics of the first user account determined at block 508. In at least one embodiment, determining a psychological profile for the first user account includes generating (e.g., computing, measuring, etc.) a normalized weighted measurement (e.g., score) for at least one predefined personality trait or characteristic. For example, in one embodiment, first computing device 102 may generate a normalized weighted score for at least one of (i) agreeableness, (ii) conscientiousness, (iii) extraversion, (iv) emotional range, and (v) openness of the first user account. In another embodiment, first computing device 102 may generate a normalized weighted score for one or more other personality traits or characteristics in addition to or instead of the example traits discussed above. For example, the particular personality traits that are measured at block 510 may vary depending on a geographical location associated with the first user account.
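The disclosure does not specify how the normalized weighted score is computed. One plausible reading, offered only as a sketch, is a weighted average of indicator values that have each been scaled to the range 0 to 1. The function name normalized_weighted_score, the indicator names, and the weights below are hypothetical.

```python
def normalized_weighted_score(indicators, weights):
    """Return a weighted average in [0, 1] of indicator values.

    Indicator values are clamped to [0, 1]; weights are assumed non-negative.
    Indicators without a weight are ignored.
    """
    usable = [name for name in indicators if name in weights]
    total_weight = sum(weights[name] for name in usable)
    if total_weight == 0:
        raise ValueError("no weighted indicators available")
    weighted_sum = sum(
        min(max(indicators[name], 0.0), 1.0) * weights[name] for name in usable
    )
    return weighted_sum / total_weight


# Hypothetical indicators for a single trait such as extraversion.
indicators = {"reply_rate": 0.8, "posting_rate_scaled": 0.5}
weights = {"reply_rate": 3.0, "posting_rate_scaled": 1.0}
score = normalized_weighted_score(indicators, weights)
```

Dividing by the total weight keeps the score in the same 0 to 1 range regardless of how many indicators are available for a given account.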
In some embodiments, the method 500 may be performed without the actions of block 510. For example, in some scenarios there may not be enough data about the first user account to generate normalized measurements of personality traits so as to determine a psychological profile for the first user account. In such instances, the method 500 can proceed without block 510.
At block 512, the derived characteristics of the first user account determined at block 508 are quantified by first computing device 102.
At block 514, first computing device 102 generates a first feature vector for the first user account based on the quantified characteristics (determined at block 512) and, when a psychological profile for the first user account is determined at block 510, the normalized weighted score for the at least one personality trait or characteristic of the first user account. An example of creating a feature vector from raw data is shown in
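Because the later comparison is described in terms of a Hamming distance, the feature vector is most naturally a fixed-length sequence of discrete symbols. The following sketch quantifies each characteristic by mapping it to a bin index and assembles the indices in a fixed order. The bin edges, the feature names, the function names, and the use of -1 for a characteristic that could not be determined are assumptions of this sketch, not details from the disclosure.

```python
from bisect import bisect_right

# Hypothetical bin edges; a value maps to the index of the bin it falls in.
BIN_EDGES = {
    "interactions_per_day": [1.0, 5.0, 20.0],
    "mean_hour_utc": [6.0, 12.0, 18.0],
    "openness": [0.25, 0.5, 0.75],
}
# Fixed ordering so that positions are comparable between accounts.
FEATURE_ORDER = ["interactions_per_day", "mean_hour_utc", "openness"]


def quantify(name, value):
    """Map a raw characteristic value to a discrete bin index."""
    return bisect_right(BIN_EDGES[name], value)


def build_feature_vector(characteristics):
    """Build a tuple of bin indices, one per feature in FEATURE_ORDER."""
    vector = []
    for name in FEATURE_ORDER:
        value = characteristics.get(name)
        # -1 marks a missing characteristic, e.g. no psychological profile.
        vector.append(-1 if value is None else quantify(name, value))
    return tuple(vector)


vector = build_feature_vector(
    {"interactions_per_day": 1.5, "mean_hour_utc": 8.67, "openness": 0.725}
)
partial = build_feature_vector(
    {"interactions_per_day": 1.5, "mean_hour_utc": 8.67}
)
```

Here the second call models the case in which no personality score was generated, so the corresponding position carries the sentinel value.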
At block 516, first computing device 102 compares the first feature vector of the first user account (generated at block 514) with a second feature vector of a second user account associated with a second social network (e.g., one of social network systems 120 in
At block 518, first computing device 102 determines, based on the comparison performed at block 516, whether the first user account and the second user account are associated with the same entity. In an embodiment, the determination may be made based on whether a calculated Hamming distance between the first and second feature vectors satisfies a predetermined condition such as, for example, being at or below a threshold Hamming distance. In another embodiment, the determination made at block 518 may include generating a confidence measure of whether the first user account and the second user account are associated with the same entity.
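The comparison and determination steps can be sketched as follows. The Hamming distance is the number of positions at which two equal-length vectors differ. The confidence measure shown, the fraction of matching positions, is an assumption of this sketch, as are the function names and the sample vectors; the disclosure does not define how the confidence measure is computed.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def same_entity(a, b, threshold):
    """Return (match, confidence) for two feature vectors.

    match is True when the Hamming distance is at or below the threshold.
    confidence is the fraction of matching positions (a hypothetical measure).
    """
    distance = hamming_distance(a, b)
    confidence = 1.0 - distance / len(a) if len(a) else 0.0
    return distance <= threshold, confidence


# Made-up feature vectors for accounts on two different social networks.
first_vector = (1, 1, 2, 0)
second_vector = (1, 2, 2, 0)
match, confidence = same_entity(first_vector, second_vector, threshold=1)
```

In this example the vectors differ in one of four positions, so the distance of 1 satisfies a threshold of 1 and the accounts are flagged as potentially belonging to the same entity.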
For the purposes of promoting an understanding of the principles of the disclosure, reference has been made to the embodiments illustrated in the drawings, and specific language has been used to describe these embodiments. However, no limitation of the scope of the disclosure is intended by this specific language, and the disclosure should be construed to encompass all embodiments that would normally occur to one of ordinary skill in the art. Descriptions of features or aspects within each embodiment should typically be considered as available for other similar features or aspects in other embodiments unless stated otherwise. The terminology used herein is for the purpose of describing the particular embodiments and is not intended to be limiting of exemplary embodiments of the disclosure. In the description of the embodiments, certain detailed explanations of related art are omitted when it is deemed that they may unnecessarily obscure the essence of the disclosure.
The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. Numerous modifications and adaptations will be readily apparent to those of ordinary skill in this art without departing from the scope of the invention as defined by the following claims. Therefore, the scope of the invention is defined not by the detailed description of the invention but by the following claims, and all differences within the scope will be construed as being included in the invention.
No item or component is essential to the practice of the invention unless the element is specifically described as “essential” or “critical”. It will also be recognized that the terms “comprises,” “comprising,” “includes,” “including,” “has,” and “having,” as used herein, are specifically intended to be read as open-ended terms of art. The use of the terms “a” and “an” and “the” and similar referents in the context of describing the invention (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless the context clearly indicates otherwise. In addition, it should be understood that although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements should not be limited by these terms, which are only used to distinguish one element from another. Furthermore, recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein.
Claims
1. A method for analyzing social network accounts, the method comprising:
- querying, using data processing hardware, a first social network for data about a first user account associated with the first social network;
- receiving, at the data processing hardware, the data about the first user account associated with the first social network;
- storing, by the data processing hardware, the received data in a social graph structure;
- determining, by the data processing hardware, derived characteristics of the first user account based on the data stored in the social graph structure;
- quantifying, by the data processing hardware, the derived characteristics of the first user account;
- determining, by the data processing hardware, a psychological profile for the first user account based on the derived characteristics of the first user account, wherein determining the psychological profile for the first user account comprises: generating, by the data processing hardware, a normalized weighted measurement for at least one predefined personality trait based on the derived characteristics of the first user account;
- generating, by the data processing hardware, a first feature vector for the first user account based on the quantified characteristics of the first user account and the normalized weighted measurement for the at least one predefined personality trait;
- comparing, by the data processing hardware, the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and
- determining, by the data processing hardware, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity,
- wherein the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
2. The method of claim 1, wherein the at least one predefined personality trait includes at least one of (i) agreeableness, (ii) conscientiousness, (iii) extraversion, (iv) emotional range, and (v) openness.
3. The method of claim 1, wherein comparing the first feature vector of the first user account with the second feature vector of the second user account comprises:
- calculating, by the data processing hardware, a Hamming distance between the first feature vector and the second feature vector.
4. The method of claim 3, wherein determining whether the first user account and the second user account are associated with the same entity comprises:
- determining, by the data processing hardware, whether the calculated Hamming distance between the first feature vector and the second feature vector satisfies the predetermined condition,
- wherein the predetermined condition is a threshold Hamming distance.
5. The method of claim 1, wherein the derived characteristics of the first user account are determined based on data about interactions between the first user account and the first social network.
6. The method of claim 1, wherein the derived characteristics of the first user account comprise one or more of (i) frequency of posting in the first social network by the first user account, (ii) IP addresses associated with the first user account, (iii) browser types associated with the first user account, and (iv) dialect associated with the first user account.
7. A system comprising:
- data processing hardware; and
- memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: querying a first social network for data about a first user account associated with the first social network; receiving the data about the first user account associated with the first social network; storing the received data in a social graph structure; determining derived characteristics of the first user account based on the data stored in the social graph structure; quantifying the derived characteristics of the first user account; determining a psychological profile for the first user account based on the derived characteristics of the first user account, wherein determining the psychological profile for the first user account comprises: generating a normalized weighted measurement for at least one predefined personality trait based on the derived characteristics of the first user account; generating a first feature vector for the first user account based on the quantified characteristics of the first user account and the normalized weighted measurement for the at least one predefined personality trait; comparing the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and determining, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity, wherein the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
8. The system of claim 7, wherein the at least one predefined personality trait includes at least one of (i) agreeableness, (ii) conscientiousness, (iii) extraversion, (iv) emotional range, and (v) openness.
9. The system of claim 7, wherein comparing the first feature vector of the first user account with the second feature vector of the second user account comprises:
- calculating a Hamming distance between the first feature vector and the second feature vector.
10. The system of claim 9, wherein determining whether the first user account and the second user account are associated with the same entity comprises:
- determining whether the calculated Hamming distance between the first feature vector and the second feature vector satisfies the predetermined condition,
- wherein the predetermined condition is a threshold Hamming distance.
11. The system of claim 7, wherein the derived characteristics of the first user account are determined based on data about interactions between the first user account and the first social network.
12. The system of claim 7, wherein the derived characteristics of the first user account comprise one or more of (i) frequency of posting in the first social network by the first user account, (ii) IP addresses associated with the first user account, (iii) browser types associated with the first user account, and (iv) dialect associated with the first user account.
13. The system of claim 7, wherein the social graph structure is formatted with an actor ontology.
14. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing device, cause the computing device to perform operations comprising:
- querying a first social network for data about a first user account associated with the first social network;
- receiving the data about the first user account associated with the first social network;
- storing the received data in a social graph structure;
- determining derived characteristics of the first user account based on the data stored in the social graph structure;
- quantifying the derived characteristics of the first user account;
- determining a psychological profile for the first user account based on the derived characteristics of the first user account, wherein determining the psychological profile for the first user account comprises: generating a normalized weighted measurement for at least one predefined personality trait based on the derived characteristics of the first user account;
- generating a first feature vector for the first user account based on the quantified characteristics of the first user account and the normalized weighted measurement for the at least one predefined personality trait;
- comparing the first feature vector of the first user account with a second feature vector of a second user account associated with a second social network different from the first social network; and
- determining, based on the comparison of the first and second feature vectors, whether the first user account and the second user account are associated with a same entity,
- wherein the first user account and the second user account are determined to be associated with the same entity when the comparison satisfies a predetermined condition.
15. The non-transitory computer-readable storage medium of claim 14, wherein the at least one predefined personality trait includes at least one of (i) agreeableness, (ii) conscientiousness, (iii) extraversion, (iv) emotional range, and (v) openness.
16. The non-transitory computer-readable storage medium of claim 14, wherein comparing the first feature vector of the first user account with the second feature vector of the second user account comprises:
- calculating a Hamming distance between the first feature vector and the second feature vector.
17. The non-transitory computer-readable storage medium of claim 16, wherein determining whether the first user account and the second user account are associated with the same entity comprises:
- determining whether the calculated Hamming distance between the first feature vector and the second feature vector satisfies the predetermined condition,
- wherein the predetermined condition is a threshold Hamming distance.
18. The non-transitory computer-readable storage medium of claim 14, wherein the derived characteristics of the first user account are determined based on data about interactions between the first user account and the first social network.
19. The non-transitory computer-readable storage medium of claim 14, wherein the derived characteristics of the first user account comprise one or more of (i) frequency of posting in the first social network by the first user account, (ii) IP addresses associated with the first user account, (iii) browser types associated with the first user account, and (iv) dialect associated with the first user account.
Type: Grant
Filed: May 25, 2021
Date of Patent: Dec 26, 2023
Patent Publication Number: 20220382817
Assignee: Federal Data Systems LLC (Columbia, MD)
Inventor: Joshua S. White (Ilion, NY)
Primary Examiner: Grace Park
Application Number: 17/329,258
International Classification: G06F 16/28 (20190101); G06F 16/9535 (20190101); G06F 16/9536 (20190101); G06F 16/901 (20190101);