NETWORK GRAPH PARSER
An approach for processing node data from code repository websites to generate patterns is disclosed. Node data can be parsed from a projects webpage or received from a code repository server hosting the repository website. Visualizations can be generated in a browser from the node data. The visualizations can be displayed within the browser and further be used to receive filter instructions. Refined node data can then be exported for further analysis.
This application is a continuation of U.S. patent application Ser. No. 15/642,820, filed Jul. 6, 2017, which claims priority to U.S. Provisional Patent Application Ser. No. 62/448,081, filed Jan. 19, 2017, the disclosures of which are incorporated herein by reference in their entireties.
TECHNICAL FIELD
Embodiments of the present disclosure relate generally to pattern detection and, more particularly, but not by way of limitation, to manipulating data via a network graph parser to expose previously undetected patterns.
BACKGROUND
A code repository website allows users to publish software code projects to the website so that other users can access, view, edit, or otherwise use the published software code. Identifying how different projects (e.g., software coding projects) are related to one another is currently impractical because the project data on the code repository websites is largely unstructured.
Various ones of the appended drawings merely illustrate example embodiments of the present disclosure and should not be considered as limiting its scope.
The description that follows includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative embodiments of the disclosure. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art, that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.
In various example embodiments, a network graph parser is implemented to parse data from websites (e.g., code repository websites) into human-understandable patterns. According to some example embodiments, the code repository websites are websites or network-based publication platforms (e.g., Internet forums) that allow users to publish data viewable by other users of the website or platform. For example, a software developer can create a project page on a code repository site and publish his/her code for the project to the project page. Other users may navigate to the project page and view, download, or modify the code for the project.
According to some example embodiments, the network graph parser is installed as a browser plugin of an Internet browser application. A data analyst may navigate to a given projects page on a repository website, such as a page created by, or associated with, a project or a contributor. The analyst may then trigger the parse operation by selecting a browser plugin button. The parse operation goes through the page and saves data on the page and on related pages. For example, the network graph parser may identify links to projects listed on the repository website. In some embodiments, the network graph parser may navigate to each of the projects.
The saved data may be used to generate a visual representation (e.g., a network graph) of the collected data. The data analyst may manipulate the visual representation to explore patterns. Further, the data analyst may narrow the data down to specific subsets by issuing filter instructions. For example, the data analyst may filter out any nodes that do not have at least two connections to other nodes. Contributors may be connected to one another by, for example, working together on the same coding project. The various filter instructions expose previously invisible patterns in the network graph. The narrowed-down data containing the pattern can then be exported over a network to a data analysis server for further analysis, according to some example embodiments.
In some embodiments, some of the plurality of repository servers 130-1 to 130-n can be a part of a cloud, which can include, for example, one or more networked servers. Such networked servers may be termed a data center or a server farm. Such data centers currently are maintained by various communication network service providers. Network 120 can be, for example, the Internet, an intranet, a local area network, a wide area network, a campus area network, a metropolitan area network, an extranet, a private extranet, or a combination of any of these or other appropriate networks.
For the exemplary embodiment of
The electronic device 110 may be implemented by one or more specially configured computing devices. The electronic device 110 may be hard-wired to perform the operations, techniques, etc. described herein. The electronic device 110 can include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the operations, techniques, etc. described herein. The electronic device 110 can include one or more general purpose hardware processors (including processor circuitry) programmed to perform such features of the present disclosure pursuant to program instructions in firmware, memory, other storage, or a combination. The electronic device 110 can also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the methods and other features.
The electronic device 110 can be generally controlled and coordinated by operating system software, such as iOS, Android, Blackberry, Chrome OS, Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Windows CE, Unix, Linux, SunOS, Solaris, VxWorks, or a proprietary operating system. The operating system controls and schedules computer processes for execution, performs memory management, provides file system, networking, and I/O services, and provides user interface functionality, such as a graphical user interface (“GUI”), among other things.
In some example embodiments, where the repository website is configured to provide node data, the parse engine 220 is configured to send node data requests for the users of the repository website. The repository website can receive the requests and issue responses including the requested node data. The node data engine 230 is configured to process the node data received via the code projects webpage (e.g., via spidering) or received from the repository website. The node data engine 230 can receive filter instructions from a user and cull (e.g., refine) the node data by removing data of users that do not meet the requirements of the filter instruction, as explained in further detail below. The visualization engine 240 is configured to use the initial node data or the refined node data and generate different types of visualizations for display on the display screen of the electronic device 110. The visualizations may include a network graph, a histogram, graphs such as bar charts or data plots, and other visualizations. The export engine 250 is configured to export the refined dataset to an analysis server for further analysis.
At operation 310, a network graph parser 234 is installed as a plugin in the web browser 1632 of the electronic device 110. The network graph parser 234 may be termed a browser extension, according to some example embodiments. The network graph parser 234 extends the functionality of the web browser 1632, as is described in detail below. The network graph parser 234 may be authored using web technologies such as HTML, JavaScript, or CSS (Cascading Style Sheets).
Referring again to
According to some example embodiments, accessing the repository server 130 through the browser 1632 causes a request (e.g., an HTTP request) to be sent to the repository server 130 (in particular, to a webserver included as part thereof).
Once the user has accessed the repository server 130 using the browser 1632, they may control the browser 1632 to interact with the repository server 130 using user interface controls provided in the browser 1632 by the network graph parser 234 or using controls provided by the browser itself. In some example embodiments, the information received from the repository webservice comprises a projects webpage showing different coding or software projects associated with a user of the repository webservice.
At operation 330, the parse engine 220 parses the data from the projects webpage and stores the data in local memory of the electronic device 110. According to some example embodiments, node data is a user profile and relates to an entity that is included as part of the repository network service provided by the repository server 130. Further, according to some example embodiments, an entity typically relates to an individual programmer, but may relate to an organization, for instance a business or other group. In some example embodiments, a software developer profile includes at least a unique identifier (the identifier uniquely identifies the entity on the repository service), a name for the entity (typically a string of text, perhaps alphanumeric characters) and a plurality of links between the entity and other entities that form part of the repository webservice.
The links may be bidirectional in nature. For example, two software developers may collaborate on the same code project. Because the two developers work on the same coding project, they may be bidirectionally linked under the assumption that each knows of the other as a fellow coder (e.g., team member, colleague) on the project. The links may alternatively be unidirectional, e.g., the first software developer receives updates published by the second software developer but the second software developer does not receive updates published by the first software developer. In some embodiments, the data stored on the repository website indicates the type of communication activity between the users. For example, the node data may include an indication that a first user commented on a pending code update on a project page on the repository website. A link may indicate the other entity by including an identifier that is unique to the other entity. Typically, repository webservices provide an identifier that is an alphanumeric string. The string may be known to the entity and other users (e.g. it may be their username) or it may be a system-generated identifier which does not need to be known to the user (e.g. a string such as “exampleidentifier$43*”). The profile may also include a uniform resource locator (URL) that is unique to the entity.
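The profile fields described above (a unique identifier, a name, a URL, and bidirectional or unidirectional links to other entities) can be pictured as a small data structure. The sketch below is illustrative only; the class names, the `LinkDirection` values, and the field names are hypothetical and are not defined by the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class LinkDirection(Enum):
    """Hypothetical encoding of the two link types described in the text."""
    BIDIRECTIONAL = "bidirectional"  # e.g., co-contributors on one project
    FOLLOWS = "follows"              # receives the target's updates, not vice versa


@dataclass(frozen=True)
class Link:
    target_id: str  # unique identifier of the other entity
    direction: LinkDirection = LinkDirection.BIDIRECTIONAL


@dataclass
class Profile:
    entity_id: str  # unique on the repository service (username or system id)
    name: str       # typically a string of text
    url: str = ""   # URL unique to the entity
    links: list[Link] = field(default_factory=list)

    def linked_ids(self) -> set[str]:
        """Identifiers of all entities this profile links to."""
        return {link.target_id for link in self.links}
```

For example, `Profile("u1", "alice", "https://repo.example/alice", [Link("u2"), Link("u3", LinkDirection.FOLLOWS)]).linked_ids()` yields the identifiers `u2` and `u3`; the URL is a placeholder.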
A user profile of the repository website may also have other information associated with pre-defined fields, for instance ‘high school attended’, ‘place of residence’, ‘place of work’, ‘undergraduate study subject’, etc. The profile may also have other content such as photographs, videos, comments or profile text, etc. Profile content may be associated with particular dates (and as such may appear in a timeline on a user's profile page) or may not be dependent on a date (and so may not generally appear on a timeline). In some embodiments, profile content may be associated with geotagged data.
In some example embodiments, user profiles are imported in response to user input. For example, a first profile is imported by the network graph parser 234 in response to the user selecting a first entity in the repository service. This may occur, for instance, by the user selecting a hyperlink in a code projects webpage provided by the repository server 130. The code projects webpage may be provided by the repository server 130 in response to the user entering text, e.g. the whole or part of a name of an entity, into a search field of a webpage provided by the repository server 130. The code projects webpage displays coding projects of the first entity, where each of the coding projects has its own projects webpage, which can be spidered as described above. According to some example embodiments, upon selection of the first entity, the network graph parser 234 sends a request to the repository server 130 identifying the first entity. In response, the repository server 130 provides the code projects webpage of the first entity, which is then parsed by the network graph parser 234. In some example embodiments, the network graph parser 234 extracts node data from the code projects webpage by accessing the source code (e.g., markup language) of the code projects webpage and then extracting the node data listed in the source code. The received node data is stored in volatile memory (e.g. RAM) allocated to the browser 1632, but is not stored in permanent memory, e.g. ROM.
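Extracting node data from the markup of a code projects webpage can be sketched with Python's standard `html.parser` module. This is a minimal sketch under an assumed page layout: it hypothetically treats every `<a class="contributor" href="...">username</a>` anchor as one contributor. Real repository pages use their own markup, and a browser plugin would do the equivalent in JavaScript against the page DOM.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Optional


class ContributorParser(HTMLParser):
    """Collects username/profile-URL pairs from anchors marked as contributors.

    The "contributor" class name is a hypothetical convention for illustration.
    """

    def __init__(self) -> None:
        super().__init__()
        self.contributors: list[dict[str, str]] = []
        self._current_href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "contributor" in classes:
            self._current_href = attr_map.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Accumulate the anchor text, which may arrive in several chunks.
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.contributors.append(
                {"username": "".join(self._text).strip(), "url": self._current_href}
            )
            self._current_href = None


def extract_node_data(page_source: str) -> list[dict[str, str]]:
    """Parse the page source and return one record per contributor anchor."""
    parser = ContributorParser()
    parser.feed(page_source)
    parser.close()
    return parser.contributors
```

Anchors without the assumed class (for example, navigation links) are ignored, so only contributor entries become node data.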
After the node data of the first entity is imported by the network graph parser 234, or at least after importation has begun, the user selects a second entity. This may occur for instance by the user selecting a hyperlink relating to the second entity in a second code projects webpage provided by the repository server 130. Upon selection of the second entity, the network graph parser 234 sends a request to the repository server 130 identifying the second entity. In response, the repository server 130 provides a second code projects webpage that lists all the coding projects for the second entity on the repository server. The parse engine 220 then parses the source code of the second code projects webpage to extract additional node data of the users associated with the second entity (e.g., users that have worked on the same coding project as the second entity). The received profile is stored in volatile memory (e.g. RAM) allocated to the browser 1632, but is not stored in permanent memory, e.g. ROM.
According to some example embodiments, the network graph parser 234 is configured to automatically import node data for entities to which the first and second entities are linked, e.g., for which links from the first and second entities exist. The parse engine 220 is configured to import such node data by sending requests to the repository server 130 identifying the further entities and navigating to the code projects webpages of the entities.
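Automatically importing the entities that the selected entities link to is, in effect, a breadth-first expansion from the selected entities. The sketch below assumes a hypothetical callback `fetch_links` standing in for "request the entity's code projects webpage and parse out the identifiers it links to"; the `max_depth` parameter is likewise an illustrative assumption (depth 1 imports only the directly linked entities).

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def import_linked_entities(
    seed_ids: Iterable[str],
    fetch_links: Callable[[str], List[str]],
    max_depth: int = 1,
) -> Dict[str, List[str]]:
    """Breadth-first import of the seed entities and the entities they link to.

    Returns a map from each imported entity id to the ids it links to.
    """
    imported: Dict[str, List[str]] = {}
    queue = deque((entity_id, 0) for entity_id in seed_ids)
    while queue:
        entity_id, depth = queue.popleft()
        if entity_id in imported:
            continue  # each entity is requested at most once
        links = fetch_links(entity_id)
        imported[entity_id] = links
        if depth < max_depth:
            for target in links:
                if target not in imported:
                    queue.append((target, depth + 1))
    return imported
```

Skipping already-imported entities keeps the number of requests to the repository server 130 bounded even when many selected entities share the same links.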
At operation 235, the node data engine 230 transforms the contributor data parsed from the projects webpage from a first format into a second format. For example, the underlying source code of the projects webpage may be a markup language, such as HTML. The node data parsed from the projects webpage may also be in the markup language format. The node data engine 230 is configured to transform the node data from the markup language format to an attribute-value format, such as JSON (JavaScript Object Notation). The node data in the second format can be used for filtering and generation of the visualizations.
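The transformation from markup-derived records into an attribute-value format such as JSON can be sketched with Python's standard `json` module. The document layout below (`project`, `nodes`, `id`, `profile_url`) is a hypothetical schema chosen for illustration; the disclosure does not specify field names.

```python
import json
from typing import Dict, List


def to_attribute_value(project: str, contributors: List[Dict[str, str]]) -> str:
    """Serialize node data parsed from markup into a JSON document.

    Each input record is assumed to carry "username" and "url" keys.
    """
    document = {
        "project": project,
        "nodes": [
            {"id": c["username"], "profile_url": c["url"]} for c in contributors
        ],
    }
    return json.dumps(document, sort_keys=True)
```

Once in this form, the node data can be filtered and visualized without re-parsing the page markup.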
At operation 340, the visualization engine 240 creates a visual representation from the parsed node data (e.g., node data in the attribute-value format). In some example embodiments, the visual representation is generated as a network graph in an additional tab of the browser 1632. The network graph includes a collection of nodes connected by edges. Each node corresponds to a user from one of the projects listed on a code projects webpage, and connections between individual nodes may be visually represented as lines, for example straight lines. In some example embodiments, two nodes are connected if the corresponding users are both associated with the same coding project on the repository server. The graph lends itself to further processing, analysis, and manipulation by an analyst or other user. Operation 340 is explained in more detail below.
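The rule that two nodes are connected when both users are associated with the same coding project can be expressed as generating every pair of members within each project. This is a minimal sketch assuming a hypothetical input mapping from project name to member usernames.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def co_contributor_edges(projects: Dict[str, Iterable[str]]) -> Set[Tuple[str, str]]:
    """Return undirected edges between users who share at least one project.

    Each edge is stored as a sorted (user_a, user_b) tuple so that a pair found
    via two different projects is recorded only once.
    """
    edges: Set[Tuple[str, str]] = set()
    for members in projects.values():
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return edges
```

For instance, with projects `{"p1": ["alice", "bob"], "p2": ["bob", "carol", "alice"]}`, the pair alice/bob appears in both projects but yields a single edge.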
At operation 350, the export engine 250 exports the graph formed from the operation 340 to the database system 10. The database system 10 is connected to the electronic device 110 (as shown in
At operation 410, the interface engine 210 receives selection of a first entity through a user input, for instance through a bookmark, favorite, or through selection of an option provided in a list of search results. At operation 420, the interface engine 210 requests the profile of the first entity. This involves the network graph parser 234 accessing the repository server 130 via the network 120 and in particular accessing the first entity (e.g., the projects webpage of the first entity) in the repository server 130. In particular, the network graph parser 234 may send an HTTP request to the repository server 130, the request including the unique identifier of the first entity.
At operation 430, the network graph parser 234 receives the profile or projects webpage of first entity. The profile is for example received as an HTTP response. According to some example embodiments, the profile includes a name for the first entity and details of connections of the first entity. The connections define links to other entities, and include unique identifiers for the other entities. In some example embodiments, one or more webpages of the first entity may be exposed through automatic scrolling of the one or more webpages. For example, a top portion of a first entity's webpage may be initially retrieved, and further portions below the top portions may be auto populated by script as those portions are scrolled to. In some example embodiments, the auto populated scrolled-to portions are received at operation 430.
At operation 440, the visualization engine 240 displays a graph relating to the first entity. For example, the network graph parser 234 may display a group or ‘cloud’ of nodes, each node relating to an entity. The node relating to the first entity is displayed with visible characteristics different from those of the nodes for other entities. For instance, it may be a different color or size. Each node for an entity linked to the first entity is shown as connected by the inclusion on the graph of a line, e.g. a straight line, connecting that node to the node for the first entity. In some embodiments, connections between nodes other than connections between the first entity and other nodes may not be displayed in the graph.
Further, in some example embodiments, a further entity (e.g., a second entity of operation 460) need not be specified for links between nodes to be created. For example, an entity associated with a given code repository page may be identified (e.g., at operation 410). The code repository page may list other coding projects with which the entity is involved (e.g., develops code). Each coding project may list other further entities associated with the given project. Using the identified entity, the additional projects and additional entities can all automatically be included in a single network graph, according to some example embodiments.
In the following discussion, the terms ‘connected’ and ‘linked’ in relation to entities included in the electronic repository website can be used interchangeably.
At operation 450, the network graph parser 234 begins requesting profiles of entities linked to by the first entity. In some example embodiments, the profiles are parsed from a code projects webpage of the first entity. For example, users associated with the first entity may be displayed in a projects webpage. The underlying markup language of the code projects webpage can be parsed to extract the username, user profile URL, and other information for each of the users associated with the first entity.
At operation 455, profiles of the entities are stored as they are received. In one embodiment, the profiles are stored in non-volatile memory that is allocated to the browser 1632. Profiles may continue to be requested and saved as a background task whilst the network graph parser 234 performs other tasks.
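Requesting profiles and storing each one as soon as it arrives, while other work continues, is a standard concurrent-fetch pattern. In the browser plugin this would be done with JavaScript promises; the Python `asyncio` sketch below is only an analogy, and `fetch_profile` is a hypothetical coroutine standing in for the HTTP request to the repository server 130.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_in_background(
    entity_ids: Iterable[str],
    fetch_profile: Callable[[str], Awaitable[dict]],
    store: Dict[str, dict],
) -> None:
    """Request profiles concurrently and store each one as soon as it arrives."""

    async def fetch_and_store(entity_id: str) -> None:
        store[entity_id] = await fetch_profile(entity_id)

    await asyncio.gather(*(fetch_and_store(e) for e in entity_ids))
```

Wrapping the call in `asyncio.create_task(...)` lets other coroutines (for example, drawing the graph) run while the profiles are still being collected.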
At operation 460, the interface engine 210 receives selection of a second entity. This may occur as described above in relation to receiving selection of the first entity.
At operation 470, the network graph parser 234 receives the profile of the second entity, after requesting the profile of the second entity. The profile is for example received as part of an HTTP response. The profile includes at least a name for the second entity and details of connections of the second entity. The connections define links to other entities, and include unique identifiers for the other entities.
At operation 490, the visualization engine 240 displays a graph relating to the first and second entities. For example, the network graph parser 234 may display three groups (or clouds) of nodes 510, 520, 530, each node relating to an entity. The nodes 501 and 502 relating to the first and second entities are displayed with visible characteristics different from those of the nodes for other entities. For instance, they may be a different color or size. Each node of the first group 530 of nodes corresponds to an entity linked to in the profiles of both the first and second entities. Each node of the second group 510 of nodes corresponds to an entity linked to by the profile of the first entity but not by the profile of the second entity. Each node of the third group 520 of nodes corresponds to an entity linked to by the profile of the second entity but not by the profile of the first entity. All the nodes for entities connected to the first entity are shown as being connected to the node 501 by the inclusion on the graph of a line, e.g. a straight line, connecting that node to the node for the first entity. All the nodes for entities connected to the second entity are shown as being connected to the node 502 by the inclusion on the graph of a line, e.g. a straight line, connecting that node to the node for the second entity. Connections other than those between the node 501 or the node 502 and other nodes are not displayed in the graph.
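The three groups 510, 520, and 530 correspond to ordinary set operations on the two entities' link sets. A minimal sketch, assuming each profile's links are available as a set of identifiers; the dictionary keys are illustrative names.

```python
from typing import Dict, Set


def partition_two_entities(first_links: Set[str], second_links: Set[str]) -> Dict[str, Set[str]]:
    """Split linked entities into the three groups described for two selections."""
    return {
        "both": first_links & second_links,         # group 530
        "first_only": first_links - second_links,   # group 510
        "second_only": second_links - first_links,  # group 520
    }
```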
At operation 490, the visualization engine 240 creates a new graph after removing the graph as shown in
At operation 495, the interface engine 210 begins requesting profiles of entities linked to by the second entity. In some example embodiments, operations 450 and 495 (e.g., requests for profiles of related entities) are initiated by a manual user request. For example, after the user requests the profile of the first entity (at operation 420), the user further requests (e.g., using a GUI button) the profiles of entities related to the first entity (at operation 450). Further, according to some example embodiments, operations 450 and 495 are performed automatically by the network graph parser 234. For example, after the user requests profile information of the first entity (at operation 420), the network graph parser 234 not only retrieves and sends the profile information of the specified first entity but also automatically retrieves and sends profile information of entities related to the first entity (e.g., without the user manually initiating the request for profile information of the related entities).
At operation 497, profiles of the entities are stored as they are received. In one embodiment, the profiles are stored in volatile memory, e.g. the RAM 1606, that is allocated to the browser 1632. Profiles may continue to be requested and saved as a background task whilst the network graph parser 234 performs other tasks. Further, according to some example embodiments, the display operations of method 400 (e.g., operations 440 and 490) are bypassed until some or all of the information collection operations (e.g., operations 410, 420, 430, 450, 455, 460, 470, 480, 495, and 497) are completed.
At operation 620, the interface engine 210 requests or parses the profile of the further entity. This is similar to operation 420. This involves the interface engine 210 accessing the repository server 130 via the network 120 and accessing the further entity in the repository webservice provided by the repository server 130. In particular, the interface engine 210 may send an HTTP request to the repository server 130, the request including the unique identifier of the further entity. Alternatively, the network graph parser 234 can parse a code projects webpage to extract profile information of users connected to the further entity.
At operation 630, the interface engine 210 receives the profile of the further entity. This is similar to operation 430. The profile is for example received as part of an HTTP response. The profile includes at least a name for the further entity and details of connections of the further entity. The connections define links to other entities, and include unique identifiers for the other entities.
At operation 640, the visualization engine 240 displays a graph relating to all the selected entities. Here, the visualization engine 240 may cause display of multiple groups (or clouds) of nodes, each node relating to an entity. Each group is a collection of nodes that have the same connections to the selected entities. Because each non-selected node is linked to some non-empty subset of the selected entities, three selected entities give seven possible groups (2^3 - 1 non-empty combinations). Each node of the first group of nodes corresponds to an entity linked to in the profiles of both the first and second entities, but not the third entity. Each node of the second group of nodes corresponds to an entity linked to by the profile of the first entity but not by the profile of the second or third entities. Each node of the third group of nodes corresponds to an entity linked to by the profile of the second entity but not by the profile of the first or third entities. Each node of the fourth group of nodes corresponds to an entity linked to in the profiles of both the first and third entities, but not the second entity. Each node of the fifth group of nodes corresponds to an entity linked to by the profiles of the second and third entities but not by the profile of the first entity. Each node of the sixth group of nodes corresponds to an entity linked to by the profile of the third entity but not by the profiles of the first or second entities. Each node of the seventh group corresponds to an entity linked to by each of the first, second and third entities. One or more of the groups may not exist if there are no nodes that meet the criteria for that group (these groups might be said to have zero nodes).
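The grouping rule generalizes to any number of selected entities: key each non-selected node by the exact set of selected entities that link to it. A minimal sketch, assuming a hypothetical mapping from each selected entity id to the set of ids it links to; groups with zero nodes simply never appear as keys.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def group_by_linking_entities(links: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group non-selected nodes by the exact set of selected entities linking to them.

    With n selected entities there are at most 2**n - 1 non-empty keys.
    """
    linked_by: Dict[str, Set[str]] = defaultdict(set)
    for selected, targets in links.items():
        for target in targets:
            if target not in links:  # the selected entities get their own nodes
                linked_by[target].add(selected)
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for node, selectors in linked_by.items():
        groups[frozenset(selectors)].add(node)
    return dict(groups)
```

Using a `frozenset` as the key means the first/second/third ordering of the selections does not matter; only membership does.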
The nodes relating to the selected entities are displayed with visible characteristics different from those of the nodes for other entities. For instance, they may be a different color or size. All the nodes for entities connected to one of the selected entities are shown as being connected by the inclusion on the graph of a line, e.g. a straight line, connecting that node to the node for the selected entity. Where a non-selected node has links to multiple selected entities, there is a line for each such connection. In some embodiments, connections between two nodes that relate to non-selected entities may be hidden or not displayed in the graph. In some example embodiments, the visualization engine 240 may simplify or de-clutter the graph by hiding nodes and/or links between nodes based upon whether a given node or one of its neighbors is selected. For example, if the user selects a given node, the visualization engine 240 may display only the nodes that are directly linked to the given node.
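The de-cluttering behavior, in which selecting a node shows only that node and its direct neighbors, amounts to extracting the selected node's one-hop neighborhood. A minimal sketch, assuming the graph is held as a hypothetical adjacency mapping of node id to neighbor ids.

```python
from typing import Dict, Set, Tuple


def visible_subgraph(adjacency: Dict[str, Set[str]], selected: str) -> Tuple[Set[str], Set[Tuple[str, ...]]]:
    """Nodes and edges to display when `selected` is chosen.

    Keeps the selected node, its direct neighbors, and the edges joining them
    to the selected node; every other node and edge is hidden.
    """
    neighbors = adjacency.get(selected, set())
    nodes = {selected} | neighbors
    edges = {tuple(sorted((selected, n))) for n in neighbors}
    return nodes, edges
```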
At operation 650, the interface engine 210 begins requesting profiles of entities linked to by the further entity. In some example embodiments, the user manually requests the profiles of entities linked to by the further entity. At operation 660, profiles of the entities are stored as they are received. In one embodiment, the profiles may be stored in volatile memory that is allocated to the browser 1632. Profiles may continue to be requested and saved as a background task whilst the network graph parser 234 performs other tasks. At operation 670, the method may check whether another entity has been selected by the user. If so, the method returns to operation 620, where the profile for the further selected entity is requested. Further, in some example embodiments, the selections of additional entities are processed in batches. For example, instead of requesting information of a single further entity and then receiving the information of the single further entity (e.g., method 600), the user can select a plurality of entities, then request their information as a batch process (e.g., as part of a single request).
Further, according to some example embodiments, the display operation of method 600 may be bypassed or delayed until other operations are complete. For example, operation 650 (an information collection operation) may be performed before operation 640 (a display operation). As a further example, the information collected at operation 650 may be stored to memory while operation 640 is bypassed, so that a display is never generated.
It can be seen from
At operation 810, the network graph parser 234 selects one of the fields of a profile relating to one of the selected entities 901, 902, 903. In this example, the profile contains fields of information common to all or many of the profiles such as place of birth, birth year, high school, and place of work.
At operation 820, the node data engine 230 then searches in all or selected imported profiles for profiles which have the same information in the same field. In particular, the node data engine 230 identifies which fields of the profile of the selected entity are populated. For a populated field, the plugin extracts the information (text, numbers or text and numbers) from the profile and searches the corresponding field of all the other profiles for the same information. Since the profiles for the entities are stored in the volatile memory allocated to the browser 1632, this searching can be relatively fast.
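The field-matching loop of operations 810 to 850 can be sketched as follows for one selected entity. Profiles are assumed, for illustration only, to be plain dictionaries of field name to string value held in memory; the field names used in any example are hypothetical.

```python
from typing import Dict, List


def matching_profiles(selected_id: str, profiles: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """For each populated field of the selected profile, list the other entities
    whose profile holds the same value in the same field."""
    selected = profiles[selected_id]
    records: Dict[str, List[str]] = {}
    for field_name, value in selected.items():
        if not value:
            continue  # only populated fields are compared
        records[field_name] = [
            other_id
            for other_id, other in profiles.items()
            if other_id != selected_id and other.get(field_name) == value
        ]
    return records
```

Because every comparison is an in-memory dictionary lookup, the search stays fast for the number of profiles a single browser session typically imports.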
At operation 830, the node data engine 230 generates a record indicating any other entity which has the same information in the same field of the profile. The record is made in the working (volatile) memory 206 allocated to the web browser 1632.
At operation 840, the node data engine 230 determines whether there are other fields in the profile for the selected entity that include information and that have yet to be processed. If there are such other fields, then the method proceeds to operation 850, where another field is selected, before the method returns to operation 820. If all the fields have been processed, the method proceeds to operation 860.
At operation 860, the node data engine 230 determines whether all the selected entities have been processed. If not, then the next entity is selected for processing at operation 870 and the method then returns to operation 810. If so, then at operation 880 the visualization engine 240 generates a histogram from the processed data. According to some embodiments, operation 880 is reached only when all completed fields for the selected entities (the entities which have been selected by a user in the method 300, the method 400, or the method 600) have been processed.
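The histogram of operation 880 counts how many profiles share each value of a field and orders the result. A minimal sketch using `collections.Counter`, assuming profiles are dictionaries of field name to value; ordering by descending count with ties broken by the field value is one of the two orderings mentioned in the text, chosen here for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def field_histogram(profiles: Iterable[Dict[str, str]], field_name: str) -> List[Tuple[str, int]]:
    """Count profiles sharing each value of `field_name`, most common first."""
    counts = Counter(p[field_name] for p in profiles if p.get(field_name))
    # Order by descending count, breaking ties alphabetically by field value.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```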
According to some example embodiments, operation 880 involves counting the number of profiles with the same information in the same field and forming a list. The list may be ordered according to the count of profiles or by a value of the field. Following operation 880, the histogram is displayed on a display screen of the electronic device 110 at operation 890. Operations 810 to 880 may be performed by the network graph parser 234 without the user having requested a histogram, according to some example embodiments. In this case, however, the histogram may be displayed at operation 890 only in response to the user having selected the option. In
Returning to
At any time, any one of the nodes in the graph 900 may be selected by the user using the input device 214 and the cursor control 216. Once a node is selected, the profile 995 of the entity corresponding to that node may be displayed near the graph 900. In
At operation 1020, the node data engine 230 may search in the profiles of the imported entities in the generated graph 1100 which have an entry that matches with the keyword input in the search tool 1150. This is performed by searching the information in the profiles as stored in the working volatile memory allocated to the browser 1632. At operation 1030, if one or more profiles are found to have the same text as the input text, the method proceeds to operation 1040. Here, the corresponding nodes in the graph 1100 are highlighted via the visualization engine 240. If not, the result of search is reported at operation 1050. In the example of
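The branch between operations 1040 and 1050 can be sketched as below. This is a sketch under stated assumptions: profiles are plain dictionaries of string values held in memory, a "match" is taken to be a case-insensitive substring match (the text does not fix the matching rule), and `search_profiles` and `run_search` are hypothetical names.

```python
from typing import Dict, List, Set, Union

Profiles = Dict[str, Dict[str, str]]  # hypothetical: entity id -> {field: value}


def search_profiles(profiles: Profiles, keyword: str) -> Set[str]:
    """Return ids of entities with any profile value containing the keyword (case-insensitive)."""
    needle = keyword.strip().lower()
    if not needle:
        return set()
    return {
        entity_id
        for entity_id, profile in profiles.items()
        if any(needle in value.lower() for value in profile.values())
    }


def run_search(profiles: Profiles, keyword: str) -> Dict[str, Union[List[str], str]]:
    """Mirror operations 1030-1050: highlight matching nodes, or report that none matched."""
    matches = search_profiles(profiles, keyword)
    if matches:
        return {"highlight": sorted(matches)}           # operation 1040
    return {"report": f"No profiles match {keyword!r}"}  # operation 1050
```

A visualization layer would then style the node ids listed under `"highlight"`; an exact-match rule could be substituted by replacing the `in` test with an equality test.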
The filter instruction may be provided in response to a user input selecting a filter option, for instance through interaction with a user interface element in a sidebar, dock, pull-down menu, etc. If the number of entities displayed in the graph 1300 is large, the graph may be of limited use to an analyst. The filtering method 1200 allows the most significant entities to be isolated and less significant entities to be removed. Filtering or reducing the data in this way may lead to a more efficient, focused, and targeted approach to repository website user analysis. This applies both to analysis using the network graph parser 234 and to subsequent analysis after export to the database system 10. Furthermore, trimming the graph before exporting data to the database system 10 may prevent the personal profile data of only marginally relevant or irrelevant individuals from unnecessarily entering the database system 10 for analysis. It may also provide regulatory compliance advantages, since information relating to fewer entities is imported into the database system 10.
At operation 1210, the interface engine 210 generates a user interface element 1350 configured to receive a user input specifying a connection parameter, such as a minimum number of links that is of interest to the user (e.g., a level of connectedness). Setting a minimum number of links may assist in selecting the entities with the most meaningful connections in the network represented in the graph 1300. The user interface element 1350 may receive the user input via the input device 1614 or the cursor control 1616.
At operation 1220, the node data engine 230 identifies the entities that are linked to other entities by at least the number of connections specified by the user input at operation 1210. All of the connections in
In the example of
In
If profile description information has been imported along with the entities in the graph 1300, it may be removed along with the corresponding entity at operation 1240. After operation 1240, the reduced graph 1310 or 1320 and/or the associated profile description information may be exported to the database system 10 via the export engine 250. Though visual graphs are depicted in
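The identify-then-remove steps of the filtering method 1200 can be sketched as follows. This is a minimal sketch assuming the graph is an undirected adjacency map held in memory; `filter_by_min_links` is a hypothetical name, and "at least `min_links` connections" is used as the reading of the minimum-links parameter.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]          # hypothetical undirected adjacency: entity -> linked entities
Profiles = Dict[str, Dict[str, str]]


def filter_by_min_links(graph: Graph, profiles: Profiles, min_links: int) -> Tuple[Graph, Profiles]:
    """Keep entities with at least `min_links` connections; drop the rest and their profiles."""
    # Identify qualifying entities using their degree in the unfiltered graph.
    kept = {node for node, linked in graph.items() if len(linked) >= min_links}
    # Drop edges to removed entities so the reduced graph stays consistent.
    reduced = {node: graph[node] & kept for node in kept}
    # Remove profile description information together with the removed entities.
    reduced_profiles = {node: prof for node, prof in profiles.items() if node in kept}
    return reduced, reduced_profiles
```

This performs a single pass, so a remaining entity can end up with fewer than `min_links` edges once its neighbours are removed. Repeating the pass until nothing changes would instead yield the k-core of the graph; which behaviour suits an analysis is a design choice.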
At operation 1420, the interface engine 210 receives a user input specifying an analysis description. The analysis description may be free text. It may relate to the origin, history, and description of the data and to details of the repository website analysis performed. The analysis description may assist in generating trails that allow monitoring of whether the performed analysis complies with any rules or regulations that may be relevant in the specific field of analysis. The analysis description may also be useful when multiple sets of reduced and processed graphs are generated from different starting accessed entities, for example. If a specific entity appears in multiple sets of graphs, the analysis description of each graph may provide additional information and therefore compound the value of multiple investigations.
At operation 1430, the network graph parser 234 may export the data to the database system 10 via export engine 250. Operation 1430 may involve exporting data relating to entities corresponding to nodes displayed in the graph to the database system 10 without exporting data relating to entities corresponding to nodes not displayed in the graph. In the database system 10, the reduced graph and the associated data may be transformed according to the specific ontology of the deployment for further analysis.
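The export of operation 1430 can be sketched as a payload containing only displayed entities and the links among them, together with the analysis description from operation 1420. The JSON layout and the name `build_export` are assumptions for illustration only; the text does not specify the format accepted by the database system 10.

```python
import json
from typing import Dict, Iterable, Set, Tuple

Profiles = Dict[str, Dict[str, str]]  # hypothetical: entity id -> {field: value}


def build_export(displayed: Set[str], edges: Iterable[Tuple[str, str]],
                 profiles: Profiles, description: str) -> str:
    """Serialize only displayed entities (and links among them) plus the analysis description."""
    payload = {
        "analysis_description": description,
        # Entities not displayed in the graph are never written to the payload.
        "entities": [{"id": e, "profile": profiles.get(e, {})} for e in sorted(displayed)],
        "links": sorted([a, b] for a, b in edges if a in displayed and b in displayed),
    }
    return json.dumps(payload, sort_keys=True)
```

Filtering at serialization time keeps data about removed entities out of the export even if it is still held in browser memory.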
Various modifications and alternatives will be apparent to the person skilled in the art, and all such modifications and alternatives are intended to be encompassed by the claims. Some such modifications and alternatives will now be described.
Although in the embodiments above the profiles for the user-selected entities are sourced from the same electronic repository website service provider, the scope is not limited to this. In other embodiments, profiles for an entity may be retrieved from two or more different repository servers 130-1 to 130-n. In this case, the entity would ordinarily have different identities or usernames on the different electronic repository websites. However, the network graph parser 234 can determine that the profiles relate to the same entity from information included in either profile or in both profiles, or the relationship may be entered into the network graph parser 234 by its user. Alternatively or in addition, two or more different entities from different electronic repository servers 130 may be selected by the user of the network graph parser 234 as seed entities. In this case, information in the profiles of linked entities may be used to connect profiles in one of the repository servers (e.g., repository server 130-1) to corresponding profiles for the same entities in another repository server (e.g., repository server 130-2).
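One way to relate usernames on two repository servers is to index one server's profiles by shared identifying fields and look up the other server's profiles in that index, while also accepting pairs entered by the analyst. This is a sketch only: the field names `email` and `homepage`, the normalization (trim and lowercase), and the function name `link_cross_site_profiles` are all hypothetical choices not taken from the text.

```python
from typing import Dict, Iterable, Set, Tuple

Profiles = Dict[str, Dict[str, str]]  # hypothetical: username -> {field: value}


def link_cross_site_profiles(site_a: Profiles, site_b: Profiles,
                             keys: Iterable[str] = ("email", "homepage"),
                             manual_pairs: Iterable[Tuple[str, str]] = ()) -> Set[Tuple[str, str]]:
    """Pair usernames on two repository servers that likely belong to the same entity."""
    keys = tuple(keys)
    # Index site B by (field, normalized value) so lookups from site A are cheap.
    index: Dict[Tuple[str, str], Set[str]] = {}
    for user_b, profile in site_b.items():
        for key in keys:
            value = profile.get(key, "").strip().lower()
            if value:
                index.setdefault((key, value), set()).add(user_b)

    pairs = set(manual_pairs)  # pairs entered directly by the analyst
    for user_a, profile in site_a.items():
        for key in keys:
            value = profile.get(key, "").strip().lower()
            if value:
                pairs.update((user_a, user_b) for user_b in index.get((key, value), set()))
    return pairs
```

Exact matching on a few fields is deliberately conservative; looser matching would find more pairs at the cost of false links between different people.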
In the embodiments above, when an entity is selected for analysis, all of the entities linked to by that entity's profile are retrieved from the electronic repository server 130 and displayed in a graph. Alternatively, a user may specify a limit on the number of entities that are to be retrieved from the electronic repository server 130 by the network graph parser 234 and displayed in a graph. This limit may be set globally as a plugin setting, or it may be selected or entered by the user at the time of selecting the entity. Also in the embodiments above, the histogram is formed from the same information in the same fields of profiles. Alternatively or in addition, the histogram may be formed from other information, such as geotag information from photos, comments, mentions, replies, and/or the like.
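The retrieval limit can be sketched as a per-selection value that overrides a global plugin setting. The parameter names are hypothetical, and keeping the first entities in retrieval order is an assumption, since the text does not say which linked entities are retained when a limit applies.

```python
from typing import List, Optional, Sequence


def limit_linked_entities(linked: Sequence[str],
                          per_selection_limit: Optional[int] = None,
                          global_limit: Optional[int] = None) -> List[str]:
    """Return at most `limit` linked entities; a per-selection limit overrides the global one."""
    limit = per_selection_limit if per_selection_limit is not None else global_limit
    if limit is None:            # no limit configured: keep every linked entity
        return list(linked)
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(linked[:limit])  # keeps the first entities in retrieval order
```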
The projects webpage displays the user's uploaded software or project data 1510 as display elements (e.g., boxes, static text, hyperlinks). The title for each of the projects may contain a hyperlink that links to the project page for the corresponding project. For example, in the first listed project, “Smartwatch Exercise App” may be a hyperlink that links to a project page for that project. The project page for “Smartwatch Exercise App” may display source code uploaded by the software developer “Joan Labrador”. The project page may further contain links to the user profile pages of the seventeen developers that work on that project.
The projects webpage is received as HTTP data from the repository server 130. The webpage is generated from underlying source code in a format such as HTML. To initiate parsing, the analyst user selects a plugin button 1515 which, as displayed, is integrated into the browser 1500. Responsive to the selection, the interface engine 210 displays a popup window 1520 having different parse options. According to some example embodiments, the first option “Graph” parses all users associated with the user “Joan Labrador” and creates a visualization from the data as discussed above. The second option “Add to graph” adds Joan Labrador as a second entity. For example, the analyst user may have selected a first user to parse (e.g., collect node data of related developers), and may then want to select Joan Labrador as a further entity to parse (e.g., collect node data of developers related to Joan Labrador to add to the graph).
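The difference between the “Graph” and “Add to graph” options can be sketched as starting a fresh adjacency map versus merging a further seed into an existing one. The adjacency-map representation and the names `new_graph` and `add_to_graph` are hypothetical; the sketch copies the existing graph so the earlier graph is left unchanged.

```python
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]  # hypothetical undirected adjacency: user -> associated users


def add_to_graph(graph: Graph, seed: str, associated: Iterable[str]) -> Graph:
    """'Add to graph': merge a further seed and its associated users into a copy of `graph`."""
    merged: Graph = {node: set(linked) for node, linked in graph.items()}
    merged.setdefault(seed, set())
    for user in associated:
        if user == seed:
            continue
        merged[seed].add(user)
        merged.setdefault(user, set()).add(seed)
    return merged


def new_graph(seed: str, associated: Iterable[str]) -> Graph:
    """'Graph': start a fresh graph from a single seed user."""
    return add_to_graph({}, seed, associated)
```

Because users already in the graph are reused as nodes, a developer associated with both seeds ends up linked to both, which is what makes overlap between the two seeds visible.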
Assuming, to continue the example, that the data analyst selects the first option “Graph”, the network graph parser 234 parses the source code that generates the projects webpage to extract node data from Joan's projects as discussed above. For example, the parse engine 220 can identify each of Joan's projects, including (1) “Smartwatch Exercise App”, (2) “Java Note Taking client”, and (3) “Acme Corp. Enterprise CRM System”. The parse engine 220 can navigate to the project page for each of the projects to identify users associated with Joan. For example, the parse engine 220 can use the hyperlink “Smartwatch Exercise App” to navigate to the project page for that project. The parse engine 220 can then identify user profile links on the project page (e.g., the 17 developers working on the “Smartwatch Exercise App” project) and navigate to the user pages to collect node data, such as the user name and profile page URL, for each of the associated users. The parse engine 220 may perform similar operations to collect node data for the users associated with the other two code projects. The resulting data can then be used to generate visualizations, as shown in
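The link-identification step can be sketched with Python's standard `html.parser` module: collect anchors whose `href` starts with a given path prefix, resolving them against the page URL. The URL patterns `/projects/` and `/users/` and the host `repo.example.com` are hypothetical; a real repository website would need its own selectors.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) for <a> elements whose href starts with a prefix."""

    def __init__(self, base_url: str, prefix: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.prefix = prefix
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(self.prefix):
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = []

    def handle_data(self, data: str) -> None:
        if self._href is not None:
            self._text.append(data)  # anchor text may arrive in several chunks

    def handle_endtag(self, tag: str) -> None:
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str, base_url: str, prefix: str) -> List[Tuple[str, str]]:
    parser = LinkCollector(base_url, prefix)
    parser.feed(html)
    parser.close()
    return parser.links
```

The same function can serve both steps: with a project prefix on the projects webpage to find project pages, and with a profile prefix on each project page to find the associated developers' profile URLs and names.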
In
Further, according to some example embodiments, the right bar area may be used to show other types of visualizations, such as the histogram 990, instead of the node data. The analyst can then use the histogram to select groups to modify the visualization 1555. In some example embodiments, the network graph parser 234 spiders one or more hyperlinks for each user listed on a project page to collect parsed node data similar to Joan's parsed node data 1557.
Main memory 1606 also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1604. Such instructions, when stored in non-transitory storage media accessible to one or more processors 1604, render computer system 1600 into a special-purpose machine that is customized to perform the operations specified in the instructions. Main memory 1606 may also be used for temporarily storing the whole or part of applications, such as the web browser 1632, including the network graph parser 234, while they are being executed by the electronic device 110. As illustrated in
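Spidering from a seed user can be sketched as a breadth-first crawl with a depth limit and a node cap. To keep the sketch offline and self-contained, page fetching and link extraction are abstracted behind a caller-supplied function; `spider`, `get_profile_links`, `max_depth`, and `max_nodes` are hypothetical names and parameters.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def spider(seed_url: str, get_profile_links: Callable[[str], Iterable[str]],
           max_depth: int = 1, max_nodes: int = 500) -> Dict[str, Set[str]]:
    """Breadth-first crawl from a seed profile, returning adjacency keyed by profile URL."""
    graph: Dict[str, Set[str]] = {seed_url: set()}
    queue = deque([(seed_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:        # do not expand nodes at the depth limit
            continue
        for linked in get_profile_links(url):
            if linked == url:
                continue
            if linked not in graph:
                if len(graph) >= max_nodes:
                    continue          # cap reached: skip new nodes, keep existing edges
                graph[linked] = set()
                queue.append((linked, depth + 1))
            graph[url].add(linked)
            graph[linked].add(url)
    return graph
```

Breadth-first order means the nodes closest to the seed are collected first, so a node cap trims the most distant users rather than an arbitrary subset.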
The main memory 1606 is a volatile memory in that data stored therein is lost when power is no longer provided to the memory 1606. The main memory 1606 is used to temporarily store information that is being processed by software applications, including the web browser 1632 and the network graph parser 234. In relation to the web browser 1632 and the network graph parser 234, information that is temporarily stored includes webpages and ancillary content that is received from the repository servers 130-1 to 130-n. In relation to the web browser 1632 and the network graph parser 234, information that is temporarily stored also includes information parsed from webpages by the network graph parser 234 and information derived from such received information by the plugin, as described herein.
Computer system 1600 further includes a read only memory (ROM) 1608 or other static storage device coupled to bus 1602 for storing static information and instructions for processor 1604. The ROM 1608 is used for permanent storage of applications such as the web browser 1632, including the network graph parser 234, when the electronic device is not powered on and/or when the applications are not being executed by the processor 1604. What is stored is the computer code or instructions that constitute the applications. A storage device 1610, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 1602 for storing information and instructions.
Computer system 1600 can be coupled via bus 1602 to a display 1612, such as an LCD or plasma display, or a touchscreen or cathode ray tube (CRT), for displaying information to a computer user. An input device 1614, for instance a keyboard, including alphanumeric and other keys, is coupled to bus 1602 for communicating information and command selections to processor 1604. Another type of user input device is cursor control 1616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1604 and for controlling cursor movement on display 1612. In some embodiments, the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor. It will be appreciated that the processor 1604, under control of software and/or an operating system, causes graphics and text to be displayed, and that the display 1612 displays them. Displaying a graph comprises displaying a graphical representation.
The term “non-transitory media” as used herein refers to any media storing data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media can comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 1610. Volatile media includes dynamic memory, such as main memory 1606. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
Non-transitory media is distinct from, but can be used in conjunction with, transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications. Various forms of media can be involved in carrying one or more sequences of one or more instructions to processor 1604 for execution. For example, the instructions can initially be carried on a magnetic disk or solid state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1600 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 1602. Bus 1602 carries the data to main memory 1606, from which processor 1604 retrieves and executes the instructions. The instructions received by main memory 1606 can optionally be stored on storage device 1610 either before or after execution by processor 1604.
Computer system 1600 also includes a communication interface 1618 coupled to bus 1602. Communication interface 1618 provides a two-way data communication coupling to a network link 1621 that is connected to a local network 1622. For example, communication interface 1618 can be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1618 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation, communication interface 1618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 1621 typically provides data communication through one or more networks to other data devices. For example, network link 1621 can provide a connection through local network 1622 to data equipment operated by an Internet Service Provider (ISP) 1626. ISP 1626 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 1628. Local network 1622 and Internet 1628 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1621 and through communication interface 1618, which carry the digital data to and from computer system 1600, are example forms of transmission media.
Computer system 1600 can send messages and receive data, including program code, through the network(s), network link 1621 and communication interface 1618. In the Internet example, a server 1627 might transmit a requested code for an application program through Internet 1628, ISP 1626, local network 1622 and communication interface 1618. The received code can be executed by processor 1604 as it is received, and/or stored in storage device 1610, or other non-volatile storage for later execution.
The network graph parser 234 is integrated into the web browser 1632 to form part of the web browser 1632. The user can first download the network graph parser 234 from an appropriate web site or other source (e.g., portable storage such as a thumb drive or a storage device on a local network) and then can proceed to install the network graph parser 234. Since a typical network graph parser 234 is designed to be compatible with a specific web browser 1632 (e.g., Google™ Chrome™, Mozilla™ Firefox™, Microsoft™ Internet Explorer™, etc.), the network graph parser 234 can become a part of the web browser 1632 automatically after the network graph parser 234 is installed.
Above, various actions are described as being performed by the network graph parser 234 and/or the web browser 1632. It will be appreciated that this is shorthand for computer program instructions that form part of the network graph parser 234 or the browser 1632, as the case may be, being executed by the processor 1604 and causing the processor 1604 to take the action. In doing so, some or all of the computer code/instructions constituting the network graph parser 234 and the browser 1632 are copied from the ROM 1608 and stored in the main memory 1606, which is a volatile memory, such that the computer code/instructions constituting the network graph parser 234 and the browser 1632 can be executed by the processor 1604. In executing the computer code/instructions constituting the network graph parser 234 and the browser 1632, the processor 1604 is controlled to store data (other than the computer code/instructions constituting the network graph parser 234 and the browser 1632) temporarily in the main memory 1606. As mentioned above, the main memory 1606 is volatile memory and as such data stored therein is lost when the main memory 1606 is de-powered.
Certain embodiments are described herein as including logic or a number of components, modules, or engines. Engines can constitute either software engines (e.g., code embodied on a machine-readable medium) or hardware engines. A “hardware module” is a tangible unit capable of performing certain operations and can be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware engines of a computer system (e.g., a processor or a group of processors) can be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In some embodiments, a hardware engine can be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware engine can include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware engine can be a special-purpose processor, such as a Field-Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware engine may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware engine can include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware modules become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware engine mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) can be driven by cost and time considerations.
Accordingly, the phrase “hardware engine” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented engine” refers to a hardware module. Considering embodiments in which hardware engines are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules can be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications can be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module can perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module can then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules can also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein can be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors.
Similarly, the methods described herein can be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method can be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)).
The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented modules can be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented modules are distributed across a number of geographic locations.
The modules, methods, applications and so forth described in conjunction with
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.
The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims
1. (canceled)
2. A method comprising:
- receiving, by a client device, from a network site, node connection data of an initial user object associated with the network site, the node connection data being included in a page of the network site;
- identifying, by the client device, additional user objects included in the node connection data of the initial user object;
- receiving, from a user of the client device, selection of a connection parameter shared by the initial user object and a portion of the additional user objects on the network site;
- receiving, from the user of the client device, an inversion instruction to remove non-selected portions that are not in the specified portion of the selected additional user objects; and
- generating, by the client device, a visual representation that depicts connections between the initial user object and the portion of the additional user objects.
3. The method of claim 2, wherein the connection parameter is a quantity of connections to other user objects, and the connection parameter specifies that for inclusion in the portion each user object has more than a specified quantity of connections to other user objects.
4. The method of claim 2, wherein the connection parameter is participation in an object group on the network site, and the connection parameter specifies that for inclusion in the portion each user object is a participant in a selected group on the network site.
5. The method of claim 2, wherein the visual representation is a network graph having nodes connected by edges, the nodes corresponding to the initial user object and the additional user objects, the edges corresponding to connections among the initial user object and the additional user objects.
6. The method of claim 2, further comprising:
- receiving, from the user of the client device, one or more search terms to search in the additional user objects;
- identifying one or more additional user objects that match the one or more search terms; and
- storing the one or more additional user objects as the portion specified by the selection instruction.
7. The method of claim 2, wherein the node connection data is user data and the initial user object is the user of the network site, and wherein the additional node connection data is additional user data and the additional user objects are other users that are connected to the user on the network site.
8. The method of claim 2, wherein each of the node connection data and the additional node connection data comprise at least one of the following: a username of a given user on the network site, a uniform resource locator (URL) of a profile page of the given user on the network site, images uploaded by the given user to the network site, text uploaded by the given user to the network site.
9. The method of claim 2, further comprising:
- displaying, on a display device of the client device, the page comprising the node connection data from the network site.
10. A client device comprising:
- one or more processors;
- one or more input devices;
- a display device; and
- a memory comprising instructions that, when executed by the one or more processors, cause the client device to perform operations comprising:
- receiving, from a network site, node connection data of an initial user object associated with the network site, the node connection data being included in a page of the network site;
- identifying additional user objects included in the node connection data of the initial user object;
- receiving, from the one or more input devices, selection of a connection parameter shared by the initial user object and a portion of the additional user objects on the network site;
- receiving, from the one or more input devices, an inversion instruction to remove non-selected portions that are not in the specified portion of the selected additional user objects; and
- generating a visual representation that depicts connections between the initial user object and the portion of the additional user objects; and
- displaying the visual representation on the display device.
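The selection and inversion steps of claim 10 can be sketched as a filter over an adjacency map. This is a minimal sketch under stated assumptions. The graph is a hypothetical `dict` mapping each user id to the ids of connected users. The connection parameter is modeled as a caller-supplied predicate. The "inversion" keeps only the initial user and the selected portion and drops every edge that touches a removed user. Rendering the result is left to a separate step.

```python
from typing import Callable, Dict, Set

# Hypothetical adjacency map: user id -> ids of users connected to it.
Graph = Dict[str, Set[str]]


def select_and_invert(graph: Graph, initial: str,
                      parameter: Callable[[str], bool]) -> Graph:
    """Keep the initial user plus the additional users that satisfy
    `parameter`; remove the non-selected users and their edges."""
    additional = graph.get(initial, set())          # users found in the initial user's data
    portion = {u for u in additional if parameter(u)}
    keep = portion | {initial}
    # Inversion: every node outside `keep` is dropped, as is any edge to it.
    return {u: graph.get(u, set()) & keep for u in keep}
```

For example, with `g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}`, calling `select_and_invert(g, "a", lambda u: u in {"b", "c"})` removes `"d"` and leaves `"a"` connected only to `"b"` and `"c"`.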
11. The client device of claim 10, wherein the connection parameter is a quantity of connections to other user objects, and the connection parameter specifies that for inclusion in the portion each user object has more than a specified quantity of connections to other user objects.
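The quantity-of-connections parameter in claim 11 amounts to a degree threshold. The following is a minimal sketch. It uses the same hypothetical adjacency-map representation and assumes that a user's connection count includes its edge to the initial user.

```python
from typing import Dict, Set


def more_than_k_connections(graph: Dict[str, Set[str]], initial: str, k: int) -> Set[str]:
    """Return the additional users (neighbors of `initial`) that have more
    than `k` connections to other user objects."""
    return {u for u in graph.get(initial, set())
            if len(graph.get(u, set())) > k}
```

With `g = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a"}, "d": {"b"}}` and `k = 1`, only `"b"` (three connections) is included in the portion.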
12. The client device of claim 10, wherein the connection parameter is participation in an object group on the network site, and the connection parameter specifies that for inclusion in the portion each user object is a participant in a selected group on the network site.
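The group-participation parameter in claim 12 can likewise be sketched as a membership filter. This sketch assumes a hypothetical `memberships` map from user id to the ids of groups the user participates in on the network site.

```python
from typing import Dict, Set


def participants_of_group(graph: Dict[str, Set[str]],
                          memberships: Dict[str, Set[str]],
                          initial: str, group: str) -> Set[str]:
    """Return the additional users connected to `initial` who participate
    in the selected `group`."""
    return {u for u in graph.get(initial, set())
            if group in memberships.get(u, set())}
```

A user with no recorded memberships is treated as participating in no group and is therefore excluded from the portion.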
13. The client device of claim 10, wherein the visual representation is a network graph having nodes connected by edges, the nodes corresponding to the initial user object and the additional user objects, the edges corresponding to connections among the initial user object and the additional user objects.
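Claim 13 describes the visual representation as a network graph of nodes and edges. One way to hand such a graph to a renderer is to serialize it in the Graphviz DOT language. The following is a minimal sketch under stated assumptions. Connections are treated as undirected, so each edge is emitted once. Node ids are assumed not to contain double quotes.

```python
from typing import Dict, List, Set, Tuple


def to_dot(graph: Dict[str, Set[str]]) -> str:
    """Serialize an undirected adjacency map as Graphviz DOT text."""
    lines: List[str] = ["graph G {"]
    for node in sorted(graph):                  # one node per user object
        lines.append(f'  "{node}";')
    seen: Set[Tuple[str, str]] = set()
    for u in sorted(graph):
        for v in sorted(graph[u]):
            a, b = sorted((u, v))               # normalize so a--b == b--a
            if (a, b) not in seen:
                seen.add((a, b))
                lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)
```

The output of `to_dot({"a": {"b"}, "b": {"a"}})` contains a single edge line, `"a" -- "b";`, which a DOT renderer can lay out for display.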
14. The client device of claim 10, the operations further comprising:
- receiving, from the user of the client device, one or more search terms to search in the additional user objects;
- identifying one or more additional user objects that match the one or more search terms; and
- storing the one or more additional user objects as the portion specified by the selection instruction.
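The search-term steps of claim 14 can be sketched as a case-insensitive substring match over each additional user's text. The `profiles` map is a hypothetical stand-in for text taken from the node connection data. This sketch also assumes that a user "matches" if any one of the terms appears. Requiring all terms would be an equally valid reading. The returned set is what would be stored as the portion.

```python
from typing import Dict, Iterable, Set


def match_search_terms(profiles: Dict[str, str], terms: Iterable[str]) -> Set[str]:
    """Return ids of additional users whose text contains any of the search
    terms, ignoring case; empty terms are skipped."""
    lowered = [t.lower() for t in terms if t]
    return {uid for uid, text in profiles.items()
            if any(t in text.lower() for t in lowered)}
```

For instance, searching `{"b": "Rust developer", "c": "Python tooling"}` for `["python"]` yields `{"c"}`.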
15. The client device of claim 10, wherein the node connection data is user data and the initial user object is the user of the network site, and wherein the additional node connection data is additional user data and the additional user objects are other users that are connected to the user on the network site.
16. The client device of claim 10, wherein each of the node connection data and the additional node connection data comprises at least one of the following: a username of a given user on the network site, a uniform resource locator (URL) of a profile page of the given user on the network site, images uploaded by the given user to the network site, or text uploaded by the given user to the network site.
17. The client device of claim 10, the operations further comprising:
- displaying, on the display device, the page comprising the node connection data from the network site.
18. A non-transitory computer readable storage medium comprising instructions that, when executed by one or more processors of a device, cause the device to perform operations comprising:
- receiving, from a network site, node connection data of an initial user object associated with the network site, the node connection data being included in a page of the network site;
- identifying additional user objects included in the node connection data of the initial user object;
- receiving selection of a connection parameter shared by the initial user object and a portion of the additional user objects on the network site;
- receiving an inversion instruction to remove the non-selected additional user objects that are not in the specified portion; and
- generating a visual representation that depicts connections between the initial user object and the portion of the additional user objects.
19. The non-transitory computer readable storage medium of claim 18, wherein the connection parameter is a quantity of connections to other user objects, and the connection parameter specifies that for inclusion in the portion each user object has more than a specified quantity of connections to other user objects.
20. The non-transitory computer readable storage medium of claim 18, wherein the connection parameter is participation in an object group on the network site, and the connection parameter specifies that for inclusion in the portion each user object is a participant in a selected group on the network site.
21. The non-transitory computer readable storage medium of claim 18, wherein the visual representation is a network graph having nodes connected by edges, the nodes corresponding to the initial user object and the additional user objects, the edges corresponding to connections among the initial user object and the additional user objects.
Type: Application
Filed: Oct 16, 2019
Publication Date: Apr 30, 2020
Inventors: Thomas Mcintyre (London), Carl Rosen (London), Eliot Ball (London), John Chakerian (Los Altos Hills, CA), Joseph Carter (Arlington, VA), Kevin Today (Houston, TX), Marvel Church (New York, NY), Michal Stojek (London), Ranec Highet (Broxbourne), Ronald Highet (London), Maciej Laska (London)
Application Number: 16/654,048