GENERATION OF A PRIVILEGE GRAPH TO REPRESENT DATA ACCESS AUTHORIZATIONS

Info

Publication number: 20220067194
Type: Application
Filed: Sep 2, 2021
Publication Date: Mar 3, 2022
Inventors: Tarun Thakur (Los Gatos, CA), Maohua Lu (Fremont, CA)
Application Number: 17/465,390

Abstract

The technology disclosed herein enables generation of a privilege graph to represent data access authorizations. In a particular embodiment, a method includes extracting identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments. The method further includes forming subgraphs for the identity environments and the data environments from the identity information and the privilege information. The method also includes translating the subgraphs into a canonical schema and, after translating the subgraphs, combining the subgraphs into the privilege graph.

Description

RELATED APPLICATIONS

This application is related to and claims priority to U.S. Provisional Patent Application 63/073,751, titled “GENERATION OF A PRIVILEGE GRAPH TO REPRESENT DATA ACCESS AUTHORIZATIONS,” filed Sep. 2, 2020, and which is hereby incorporated by reference in its entirety.

BACKGROUND

Modern enterprises use numerous data environments to store, manage, and/or process data. Those environments may be managed by different systems, applications, and/or platforms from different providers, and each may use its own data repository (e.g., database). For instance, different departments may employ different database systems depending on the features offered by the respective system (e.g., accounting may use a first database system while human resources uses a second). In some cases, a single department may itself use multiple platforms for data repositories depending on the capabilities of each platform even if the platforms manage similar data sets. For example, human resources may use one platform to onboard and terminate employees from the enterprise while another platform is used to handle employees' compensation and benefits. The repositories may be hosted local to the enterprise (i.e., at one or more of the enterprise's own facilities) or may be cloud based and hosted by third parties. Likewise, the cardinality of the data environments and the data therein can be very high (on the order of thousands of individual elements, such as data tables, that a user can potentially access), which makes it very difficult (if not impossible) for a human administrator to track which data can be accessed by which users.

SUMMARY

The technology disclosed herein enables generation of a privilege graph to represent data access authorizations. In a particular embodiment, a method includes extracting identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments. The method further includes forming subgraphs for the identity environments and the data environments from the identity information and the privilege information. The method also includes translating the subgraphs into a canonical schema and, after translating the subgraphs, combining the subgraphs into the privilege graph.

In some embodiments, the method includes displaying the privilege graph to an administrator authorized to view the privilege graph.

In some embodiments, forming the subgraphs includes creating a user node for a user of the plurality of users and sequentially connecting the user node to one or more attribute nodes that each represent an attribute of the user indicated in the identity information. In those embodiments, upon reaching a last attribute node of the one or more attribute nodes, the method may include connecting the last attribute node to a privileges node and connecting the privileges node to one or more nodes of authorized data environments of the plurality of data environments that the user is authorized to access. The one or more nodes of authorized data environments each may represent data or a feature that the user is authorized to access.

In some embodiments, translating the subgraphs includes, for attribute nodes of the subgraphs, changing attribute labels representing attributes of a user to canonical labels defined by the canonical schema.

In some embodiments, combining the subgraphs includes, for an attribute represented by attribute nodes in multiple subgraphs, generating a common attribute node and migrating connections with the attribute nodes to the common attribute node. In those embodiments, after migrating the connections, the method may include identifying replicated connections with the common attribute node and deduplicating the replicated connections.

In some embodiments, the method includes identifying a change to the privilege information and updating the privilege graph based on the change. Updating the privilege graph may include adding or removing a connection between nodes in the privilege graph.

In another embodiment, an apparatus is provided having one or more computer readable storage media and a processing system operatively coupled with the one or more computer readable storage media. Program instructions stored on the one or more computer readable storage media, when read and executed by the processing system, direct the processing system to extract identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments. The program instructions further direct the processing system to form subgraphs for the identity environments and the data environments from the identity information and the privilege information. Also, the program instructions direct the processing system to translate the subgraphs into a canonical schema and, after translating the subgraphs, combine the subgraphs into a privilege graph.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an implementation for generating a privilege graph representing data access authorizations.

FIG. 2 illustrates an operation for generating a privilege graph representing data access authorizations.

FIG. 3 illustrates a privilege graph generated to represent data access authorizations.

FIG. 4 illustrates a privilege graph generated to represent data access authorizations.

FIG. 5 illustrates subgraphs of a privilege graph generated to represent data access authorizations.

FIG. 6 illustrates an operation for generating a privilege graph representing data access authorizations.

FIG. 7 illustrates an operation for generating a privilege graph representing data access authorizations.

FIG. 8 illustrates a subgraph of a privilege graph generated to represent data access authorizations.

FIG. 9 illustrates an operation for generating a privilege graph representing data access authorizations.

FIG. 10 illustrates a computing architecture generating a privilege graph representing data access authorizations.

DETAILED DESCRIPTION

Modern enterprises use numerous data environments to store, manage, and/or process data. Those environments may be managed by different systems, applications, and/or platforms from different providers, and each may use its own data repository (e.g., database). For instance, different departments may employ different database systems depending on the features offered by the respective system (e.g., accounting may use a first database system while human resources uses a second). In some cases, a single department may itself use multiple platforms for data repositories depending on the capabilities of each platform even if the platforms manage similar data sets. For example, human resources may use one platform to onboard and terminate employees from the enterprise while another platform is used to handle employees' compensation and benefits. The repositories may be hosted local to the enterprise (i.e., at one or more of the enterprise's own facilities) or may be cloud based and hosted by third parties. Likewise, the cardinality of the data environments and the data therein can be very high (on the order of thousands of individual elements, such as data tables, that a user can potentially access), which makes it very difficult (if not impossible) for a human administrator to track which data can be accessed by which users.

Each of the environments discussed above uses its own mechanisms to regulate which users have access to which features and which data. That is, the mechanisms regulate the privileges that each user has for accessing each data environment and prevent users who are not authorized to access certain features or data from doing so. As such, each environment needs to receive information defining the privileges for each user that is authorized to access at least a portion of the features/data available therefrom. To track privileges across a multitude of data environments, the graphing service described herein uses a privilege graph to track users and corresponding privileges. The privilege graph, when displayed to a user, graphically represents the associations between authorizations, users, groups, etc. within an entity (e.g., the user's employer), which enables the user to easily comprehend the nature of authorizations for the entity.

FIG. 1 illustrates implementation 100 for privilege graph-based representation of data access authorizations. Implementation 100 includes graphing service 101, data environments 102, user terminal 103, and identity environments 104. Graphing service 101 and data environments 102 communicate over respective communication links 111. Graphing service 101 and user terminal 103 communicate over communication link 112. Graphing service 101 and identity environments 104 communicate over respective communication links 113. While communication links 111-113 are shown as direct links, communication links 111-113 may include intervening systems, networks, and/or devices. Graphing service 101 executes on one or more computing systems, such as server systems, having processing and communication circuitry to operate as described below. User terminal 103 is a user operated computing system, such as a desktop workstation, laptop, tablet computer, smartphone, etc., that user 141 uses to access data environments 102.

In operation, graphing service 101 generates privilege graph 131, which tracks authorizations defined in identity environments 104 and corresponding ones of data environments 102. Identity environments 104 include one or more systems that maintain information about users (e.g., user identity information, user attributes, etc.) and information about which of data environments 102 (including specific data/features therein) each user is allowed to access. Identity environments 104 may include an active directory (AD) server, a privilege account management (PAM) system, a human resources management system (HRMS), an identity and access governance (IAG) system, a cloud-based identity access management system, or any other type of system that maintains the user information discussed above. By tracking the authorization of many, if not all, users in an organization (e.g., business enterprise), privilege graph 131 is able to not only represent authorizations for particular users but also represent authorizations based on attributes of users (e.g., the user's role and/or group). Privilege graph 131 may be stored local to graphing service 101 or may be accessible to graphing service 101 from an external data repository, which may itself be managed by one of data environments 102. Graphing service 101 performs operation 200, described below, to generate privilege graph 131 from information obtained from data environments 102 and identity environments 104.

FIG. 2 illustrates operation 200 for privilege graph-based representation of data access authorizations. In operation 200, graphing service 101 extracts identity information (active directory information, cloud-based identity access management information, PAM information, etc.) for a plurality of users from identity environments 104 and privilege information from data environments 102 (201). To access data environments 102 and identity environments 104, graphing service 101 may use credentials that indicate graphing service 101 is allowed to access the identity information and privilege information on data environments 102 and identity environments 104, respectively. The credentials may be provided to graphing service 101 by an administrator(s) of data environments 102 and identity environments 104. In some examples, the credentials may only provide read-only access to graphing service 101 so as to ensure graphing service 101 cannot modify the identity information or the privilege information.

The users may be all users in an organization (e.g., a business enterprise) or may be a subset thereof. The users may include human users, such as user 141, that access data environments 102 and/or identity environments 104 through their respective user terminals, such as user terminal 103. The users could also include one or more computing systems, applications, micro-services, and/or other types of non-human components that could access one or more of data environments 102 with proper privileges. Users may be identified in the identity information using an identifier for the particular user (e.g., name of the user, username of the user, employee identifier, etc.), including machine IDs, app IDs, etc. for non-human users. In some examples, the identity of a non-human user (e.g., service or application) may be tied to a human user in charge of the non-human user (e.g., the human user who “owns” the service/application) and the privileges of one may, therefore, be synonymous with (or dependent upon) the privileges of the other. The identity information may also include attributes for the respective users. The attributes may indicate a work group for a user, a job title/role for a user, a seniority of the user, a security clearance level for the user, or any other type of attribute that may affect which data environments the user can access. The privilege information indicates which respective users are allowed to access which ones of data environments 102. In some examples, the privilege information may further indicate specific features and/or data within each of data environments 102 that respective users are allowed to access. For example, one of data environments 102 may be a data repository that includes multiple data tables therein and a user may only be allowed to access a subset of those data tables.

Graphing service 101 forms subgraphs for the identity environments and the data environments from the identity information and the privilege information (202). Each subgraph corresponds to a distinct set of identity and privilege information. For example, one identity environment of identity environments 104 may indicate one set of attributes for users while another identity environment indicates another set of attributes. A subgraph would be created by graphing service 101 for each respective set. In particular, an example subgraph of one of the sets would indicate attributes (e.g., groups) in the identity environment's identity information as nodes with branches connecting to users in each group. Likewise, each of data environments 102 indicates attributes, such as roles, with access privileges thereto. A subgraph of each data environment includes a node for the data environment, or nodes for specific features/data of the data environment, with branches therefrom to roles having access thereto.
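
The subgraph formation of step 202 might be sketched as follows. This is only an illustration under assumed inputs: the record shapes, the names identity_subgraph and data_subgraph, and the sample values are hypothetical and are not taken from the disclosure. Each subgraph is held as a set of (source node, destination node) edges, with nodes written as (kind, label) pairs.

```python
# Hypothetical extracted records; the disclosure does not define an extraction format.
identity_records = [
    {"group": "marketing", "user": "alice"},
    {"group": "marketing", "user": "bob"},
]
privilege_records = [
    {"resource": "sales_db.orders", "role": "analyst"},
    {"resource": "sales_db.customers", "role": "analyst"},
]


def identity_subgraph(records):
    """Group nodes with branches connecting to the users in each group."""
    return {(("group", r["group"]), ("user", r["user"])) for r in records}


def data_subgraph(records):
    """Feature/data nodes with branches to the roles having access thereto."""
    return {(("resource", r["resource"]), ("role", r["role"])) for r in records}


id_edges = identity_subgraph(identity_records)
data_edges = data_subgraph(privilege_records)
```

One subgraph is produced per environment, so a deployment with several identity environments would call identity_subgraph once per environment.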

The identity information and privilege information from the respective data environments 102 and identity environments 104 may use different schemas that represent information differently. For example, one system may use one terminology to indicate privileges (e.g., “column read”) while another system uses a different terminology (e.g., “read column”). In another example, one environment may use one title for a particular role while another environment may use a different title for the same role. To account for those different schemas, graphing service 101 translates the subgraphs into a canonical schema (203). The canonical schema may be one of the schemas used by data environments 102 and/or identity environments 104 or may be a schema that is unique to graphing service 101. After translating the subgraphs, information that is the same across multiple subgraphs will be represented in the same manner (e.g., users will be identified using the same name/identifier, roles will have the same name/identifier, etc.).
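
A minimal sketch of the translation in step 203, assuming the canonical schema can be expressed as lookup tables keyed by node kind. The table contents, the canonical spellings, and the function names are hypothetical; a real mapping would be configured per environment.

```python
# Hypothetical canonical schema: per-kind tables from source label to canonical label.
CANONICAL = {
    "privilege": {"column read": "read_column", "read column": "read_column"},
    "role": {"team leader": "supervisor"},
}


def canonical_label(kind, label):
    """Normalize case/whitespace, then map to the canonical label if one is defined."""
    key = label.strip().lower()
    return CANONICAL.get(kind, {}).get(key, key)


def translate(edges):
    """Return a copy of the subgraph with every node label made canonical."""
    return {
        ((k1, canonical_label(k1, l1)), (k2, canonical_label(k2, l2)))
        for (k1, l1), (k2, l2) in edges
    }


source_a = {(("role", "Team Leader"), ("privilege", "Column Read"))}
source_b = {(("role", "supervisor"), ("privilege", "read column"))}
```

In this sketch, labels with no table entry are only lowercased and trimmed, so unmapped terminology passes through rather than raising an error. That is a design choice of the example, not something the disclosure specifies.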

After the translating, graphing service 101 combines the subgraphs into privilege graph 131 (204). Since the subgraphs are now all using the canonical schema, the information represented by those subgraphs can be combined into privilege graph 131. For example, multiple subgraphs may include a node for a particular role but at least one of those subgraphs differs from the others regarding which users branch from that role. Instead of having multiple nodes for that same role, privilege graph 131 includes only one node for that role and a branching node for each user in the subgraphs regardless of how many times that user appeared across the subgraphs. Other subgraphs may indicate ones of data environments 102 to which the users in that role have access. The role node would then also branch to those ones of data environments 102. In some examples, not every user in the role will have access to the same data environments 102. In those cases, graphing service 101 may determine other attributes common to the users that share each respective set of accessible data environments. Rather than branching from the role directly to the environment sets, the role node would branch to the two or more other determined attributes before then branching to the ones of data environments 102 that each user having the other attributes is allowed to access. In some examples, specific data/features of the data environments 102 may be indicated as being accessible rather than data environments as a whole.
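
The role example above can be illustrated with a short sketch. It assumes each translated subgraph is a set of edges between (kind, label) nodes, so that nodes with the same canonical label are equal and a set union collapses them into one node. The sample subgraphs and the combine function are hypothetical.

```python
# Three hypothetical subgraphs already translated into the canonical schema.
sub_a = {
    (("role", "supervisor"), ("user", "alice")),
    (("role", "supervisor"), ("user", "bob")),
}
sub_b = {
    (("role", "supervisor"), ("user", "bob")),
    (("role", "supervisor"), ("user", "carol")),
}
sub_c = {(("role", "supervisor"), ("resource", "hr_db"))}


def combine(*subgraphs):
    """Union the edge sets; identical nodes and edges appear only once."""
    merged = set()
    for edges in subgraphs:
        merged |= edges
    return merged


privilege_graph = combine(sub_a, sub_b, sub_c)
role_nodes = {node for edge in privilege_graph for node in edge if node[0] == "role"}
users_of_role = sorted(
    dst[1]
    for src, dst in privilege_graph
    if src == ("role", "supervisor") and dst[0] == "user"
)
```

The combined graph holds a single supervisor node that branches to each of the three users once and to the data environment, even though one user appeared in two subgraphs.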

To combine the subgraphs into privilege graph 131, graphing service 101 may include, for each of data environments 102, a definition of the authentication mechanism used for users to access that data environment. Using the authentication mechanism, graphing service 101 retrieves the corresponding authentication entity (including both name and properties). An authentication entity is anything that is provided with authorization to access ones of data environments 102, such as a group of users (e.g., marketing group, sales group, etc.) or individual users themselves. Graphing service 101 uses the authentication entity to query identity environments 104 to get the corresponding entity of the identity environments and graphing service 101 creates an edge between the entity in the data environment and the entity in the identity environment. For example, a database in data environments 102 could have a database role that can be connected to the Active Directory (AD) group in an AD server of identity environments 104.
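
The linking step described above might look like the following sketch. The in-memory dictionaries stand in for queries against a data environment and an AD server; the field names, entity names, and the link_entities function are assumptions made for illustration only.

```python
# Hypothetical stand-ins for information retrieved from each environment.
data_env_entities = [
    {"env": "warehouse", "db_role": "reporting_reader", "auth_entity": "AD:Marketing"},
    {"env": "warehouse", "db_role": "etl_writer", "auth_entity": "AD:DataEng"},
]
identity_env_groups = {"AD:Marketing": ["alice", "bob"]}  # AD:DataEng is absent


def link_entities(entities, directory):
    """Create an edge from each database role to its matching identity-side entity."""
    edges, unresolved = [], []
    for entity in entities:
        name = entity["auth_entity"]
        if name in directory:
            edges.append((("db_role", entity["env"], entity["db_role"]), ("ad_group", name)))
        else:
            # No corresponding entity was found in the identity environment.
            unresolved.append(name)
    return edges, unresolved


cross_edges, unresolved = link_entities(data_env_entities, identity_env_groups)
```

Collecting unresolved entities separately, rather than silently dropping them, is a choice of this sketch; it would let an administrator see database roles that no identity environment accounts for.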

When the subgraphs are combined, user 141 may operate user terminal 103 to view privilege graph 131 from graphing service 101. Privilege graph 131 allows user 141 to visualize all the users having access to particular ones of data environments 102 and what attributes those users have. If privilege graph 131 is organized spatially with users on the left and data environments 102 on the right, user 141 can traverse privilege graph 131 from a selected user to the ones of data environments 102 the selected user can access through nodes representing attributes of the selected user. Privilege graph 131 may also be used for purposes other than visualizing user access privileges. For example, an automated privilege assignment system may use privilege graph 131 to determine which of data environments 102 a new user should be allowed to access.
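
The traversal from a selected user to the accessible data environments can be sketched as a breadth-first search. The adjacency dictionary, the "kind:label" node naming, and the reachable_environments function are hypothetical and chosen only to keep the example small.

```python
from collections import deque

# Hypothetical privilege graph as an adjacency mapping, users on the "left".
graph = {
    "user:alice": ["group:marketing"],
    "group:marketing": ["role:analyst"],
    "role:analyst": ["privilege:read"],
    "privilege:read": ["env:sales_db", "env:crm_app"],
    "user:bob": ["role:intern"],
    "role:intern": [],
}


def reachable_environments(graph, user):
    """Follow branches from the user through attribute nodes to data environments."""
    seen, queue, found = {user}, deque([user]), set()
    while queue:
        node = queue.popleft()
        if node.startswith("env:"):
            found.add(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found
```

For example, reachable_environments(graph, "user:alice") yields both environments reached through the marketing group, while a user whose role has no outgoing branches yields an empty set.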

FIG. 3 illustrates privilege graph 300 for privilege graph-based representation of data access authorizations. Privilege graph 300 is an example of privilege graph 131. Data environments 301 are examples of data environments 102. Data environments 301, in this example, include databases, such as Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) databases, files, applications, and computing resources. Nodes 302 are at a level in the privilege graph that points to particular features 311 of data environments 301 that are accessible to users having attributes that led to respective ones of nodes 302 during traversal of privilege graph 300. Nodes 303 are nodes at a level prior to reaching nodes 302 and represent different roles that a user may have. Similarly, nodes 304 are at a level prior to reaching nodes 303 and represent different groups in which a user may be included. The level before nodes 304 is a level with nodes 305, which represent the users themselves. When a user in nodes 305 has a particular attribute (e.g., is in a particular group), a branch from the node 305 for that user is displayed to a node of nodes 304 representing that attribute. From that node 304, branches are displayed to nodes of nodes 303 that represent other attributes (e.g., roles) that users in the node 304 have. From one of the nodes 303 to which one of those branches terminated, branches are displayed to nodes of nodes 302 that represent other attributes (e.g., privileges) that the users in the node 303 have. As can be seen on privilege graph 300, the branches from nodes 305 may direct to any one of nodes 302-304 because different types of users may not have certain attributes (e.g., may not belong to groups or have a role). Likewise, a user, like the IAM principal node of nodes 305, may branch to different levels of nodes.

Privilege graph 300 may be presented to a user by graphing service 101. For example, user 141 may be an administrator that has a need or desire to view the overall landscape of data environment authorizations represented by privilege graph 300. User 141 may operate user terminal 103 to request privilege graph 300 from graphing service 101 via a graphical user interface (GUI) to graphing service 101 (e.g., a web-based application or native application). User terminal 103 displays privilege graph 300 to user 141 through the GUI. Being able to view privilege graph 300, rather than privilege graph 300 simply being represented in memory, allows user 141 to more easily view which attributes of users lead to those users having access to particular ones of data environments 102 (and features/data therein), which are represented as the databases, files, folders, applications, compute elements, online transaction processing (OLTP), and online analytical processing (OLAP) elements on the right side of privilege graph 300.

When viewing privilege graph 300, user 141 may notice that users having certain attribute(s) or combinations of attribute(s) are currently authorized to access a particular data environment to which they should not have access. User 141 may then instruct graphing service 101 to deauthorize those users from accessing the particular data environment or user 141 may use user terminal 103 to deauthorize the users from accessing the data environment. In either situation, graphing service 101 will update privilege graph 300 after the users are deauthorized to reflect the fact that the users are not authorized to access the data environment. In some cases, privilege graph 300 may track how privileges change over time. Thus, in the above example, user 141 may be able to “look back in time” to see that the users were once able to access the data environment that they were deauthorized from accessing.
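
One way to support the “look back in time” behavior is to keep a validity interval on each connection instead of deleting it. The following is a sketch under that assumption; the disclosure does not say how history is stored, and the TemporalEdges class and its method names are hypothetical.

```python
from datetime import date


class TemporalEdges:
    """Connections annotated with the dates during which they were in effect."""

    def __init__(self):
        self._edges = []  # each entry: [src, dst, valid_from, valid_to]

    def authorize(self, src, dst, when):
        self._edges.append([src, dst, when, None])

    def deauthorize(self, src, dst, when):
        # Close the open interval rather than removing the connection.
        for edge in self._edges:
            if edge[0] == src and edge[1] == dst and edge[3] is None:
                edge[3] = when

    def had_access(self, src, dst, when):
        return any(
            s == src and d == dst and start <= when and (end is None or when < end)
            for s, d, start, end in self._edges
        )


history = TemporalEdges()
history.authorize("role:contractor", "env:payroll", date(2021, 1, 1))
history.deauthorize("role:contractor", "env:payroll", date(2021, 6, 1))
```

With this history, a query dated in March 2021 reports the connection as present, and a query dated in July 2021 reports it as absent.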

Additionally, privilege graph 300 may only be one level of detail that user 141 is able to view with respect to the privileges depicted thereby. The GUI for graphing service 101 may further allow user 141 to specify what information user 141 wishes to view. For example, while privilege graph 300 shows which user attributes result in authorization to which data environments, user 141 may desire to see which specific users are allowed to access a particular data environment. Upon specifying that desire to graphing service 101, privilege graph 300 may change in the GUI to show specific users as nodes branching from a node representing the particular data environment. In an alternative example, user 141 may specify that they desire to view which attributes of users allow those users to access a particular data environment and nodes representing those attributes may then be displayed branching from the data environment. Of course, other authorization relationships may be presented using privilege graph 300 as well.
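
Both alternative views described above amount to walking the graph backward from a data environment. A minimal sketch, assuming the graph is available as a list of directed edges with "kind:label" node names (the edge list and the upstream function are hypothetical):

```python
# Hypothetical privilege graph as directed edges from users toward environments.
edges = [
    ("user:alice", "group:marketing"),
    ("user:bob", "group:sales"),
    ("group:marketing", "role:analyst"),
    ("group:sales", "role:rep"),
    ("role:analyst", "env:crm_app"),
    ("role:rep", "env:crm_app"),
    ("role:rep", "env:orders_db"),
]


def upstream(edges, target, prefix):
    """Return every node of the given kind that can reach the target environment."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(n for n in seen if n.startswith(prefix))
```

Passing the prefix "user:" gives the specific users allowed to access the environment, and passing "role:" or "group:" gives the attributes that lead there.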

Even further uses of privilege graph 300 are envisioned, including data-based dynamic role assignments (e.g., assigning a role to a user based on the data that the role can access), risk scores (e.g., assigning a score representing how at risk certain data is for being accessed by an unwanted user), tagging, least privilege violations, anomaly detection (e.g., identifying users, roles, etc. that should not have access to certain data even though they currently are authorized to do so), monitoring, recommendations, audit reporting, etc.

FIG. 4 illustrates privilege graph 400 for privilege graph-based representation of data access authorizations. Privilege graph 400 is another example of privilege graph 131. In this example, user 141 is an administrator to which privilege graph 400 presents a high-level overview of which users have access to which of data environments, including data systems 451, applications 452, and computing resources 453. By tracing through the connections between nodes from left to right, user 141 can see which attribute combinations (i.e., groups, roles, etc.) are currently being allowed to access which data and/or features of the data environments. Graphing service 101 displays privilege graph 400, in this example, through a display of user terminal 103, which may execute an application for interacting with graphing service 101 or access graphing service 101 through a web-based interface.

In this example, the users whose access privileges are represented by privilege graph 400 are employees 401 and applications 402, although other types of users may be included in other examples. Employees 401 and applications 402 may represent the entirety of users under the purview of user 141 or may be only a subset (e.g., user 141 may be responsible for all users in an enterprise or just a subset thereof). When looking at privilege graph 400, user 141 can determine, based on the connections between user nodes and group nodes, that one or more of employees 401 are in groups 411-413 and one or more of applications 402 are in groups 413-414. In some cases, an individual user may belong to more than one of the groups. As user 141 continues to move to the right through privilege graph 400, user 141 follows the connections between nodes for groups 411-414 and roles 421-426 to determine which of groups 411-414 have users with which of roles 421-426. For example, group 412 has connections to role 421, role 422, and role 423. Those connections indicate to user 141 that group 412 has users in each of those roles. In some cases, one user may be in more than one of the roles.

Continuing right from nodes for roles 421-426, user 141 follows connections to the nodes of privileges 431-434 to determine users in which of roles 421-426 have various privileges 431-434. For instance, there are connections from role 422, role 423, and role 426 to privileges 432. As such, one or more users in each of those roles have privileges 432. The node for privileges 432 then connects to show what access is granted by privileges 432. In this case, privileges 432 only have one connection to feature 444 of applications 452. Other privileges enable access to multiple ones of features/data 441-446 (e.g., privileges 431 enable access to data 442, data 443, and feature 445). By viewing privilege graph 400 as a whole, user 141 may be able to recognize a connection between nodes that should or should not be in privilege graph 400 and make changes accordingly. Had the users, attributes, and privileges not been displayed in this manner, user 141 may never have recognized the deficiency represented by the connection.

FIG. 5 illustrates subgraphs 500 of a privilege graph generated to represent data access authorizations. Subgraphs 500 include subgraph 501 and subgraph 502, which are subgraphs that are created and combined to form part of privilege graph 400, as described below. Subgraph 501 is a subgraph tracing privileges of a single employee 521 who has yet to be included in employees 401. Subgraph 502 is a subgraph tracing privileges of employees already included in employees 401. Subgraph 502 may itself be a subgraph created from a combination of two or more subgraphs. Subgraphs 500 may be displayed to user 141 or may remain in memory of graphing service 101 while graphing service 101 creates subgraphs 500, combines subgraphs 500, and, eventually, produces the complete privilege graph 400, as discussed below. In some examples, subgraphs 500 may be displayed upon request of user 141 should user 141 want to view a different level of detail within privilege graph 400.

FIG. 6 illustrates operation 600 for generating a privilege graph representing data access authorizations. Operation 600 describes the creation of an initial subgraph, subgraph 501 in this case, for combination with other subgraphs to create privilege graph 400. In operation 600, graphing service 101 uses the identity information and the privilege information to identify employee 521 and to determine attributes of employee 521. The attributes of employee 521 indicate that employee 521 is in group 411, has role 423, and has privileges 432, which allow employee 521 to access feature 444. Graphing service 101 creates subgraph 501 from the attributes identified for employee 521. In particular, graphing service 101 creates a node representing employee 521, as shown in subgraph 501 (601). Graphing service 101 further creates a node for each of group 411, role 423, privileges 432, and feature 444 (602). The created nodes are then connected in sequence until employee 521 is connected to feature 444 through attributes 411, 423, and 432 to form subgraph 501 (603). A hierarchy of attributes may be predefined so that graphing service 101 orders the attributes in a desired manner. In this example, from left to right in subgraph 501, a group attribute is included before a role attribute. The hierarchy may be defined based on the number of nodes that are possible in a particular level. For instance, there may be fewer groups than there are roles, so roles are designated to come after groups in subgraph 501. Once subgraph 501 is completed, should subgraph 501 be displayed to user 141, user 141 can easily trace the connections from employee 521 through employee 521's attributes to feature 444 that employee 521 can access.
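
Steps 601-603 can be sketched as building a chain of nodes ordered by a predefined hierarchy. The HIERARCHY list, the dictionary of attributes, and the build_user_subgraph function are hypothetical; the labels reuse the reference numerals of subgraph 501 for readability.

```python
# Assumed attribute hierarchy: groups come before roles, roles before privileges.
HIERARCHY = ["group", "role", "privileges"]


def build_user_subgraph(user, attributes, features):
    """Create nodes (601, 602) and connect them in hierarchy order (603)."""
    nodes = [("user", user)]
    nodes += [(kind, attributes[kind]) for kind in HIERARCHY if kind in attributes]
    edges = list(zip(nodes, nodes[1:]))  # user -> group -> role -> privileges
    last_attribute = nodes[-1]
    for feature in features:
        nodes.append(("feature", feature))
        edges.append((last_attribute, ("feature", feature)))
    return nodes, edges


# The attribute dictionary is deliberately unordered; HIERARCHY decides the sequence.
nodes, edges = build_user_subgraph(
    "employee 521",
    {"role": "role 423", "privileges": "privileges 432", "group": "group 411"},
    ["feature 444"],
)
```

Because ordering comes from HIERARCHY rather than from the input, the same function yields consistently layered subgraphs no matter how an environment lists a user's attributes.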

Either before or after creation of subgraph 501, graphing service 101 may translate the labels of the nodes in subgraph 501 to comply with a canonical schema. For example, role 423 may have a label that is different than the label used for role 423 in privilege graph 400. Thus, role 423 will be relabeled in subgraph 501. For instance, role 423 may indicate that employee 521 holds a “team leader” role but the canonical schema refers to the team leader role as being a supervisor role. The label of “team leader” would, therefore, be changed to “supervisor” so that graphing service 101 can find corresponding supervisor nodes when combining subgraphs, as described in more detail below.

FIG. 7 illustrates operation 700 for generating a privilege graph representing data access authorizations. Operation 700 is an example of how subgraph 501 and subgraph 502 are combined into subgraph 800. In operation 700, graphing service 101 determines that both subgraph 501 and subgraph 502 include role 423, privileges 432, and feature 444 (701). Graphing service 101 then identifies the nodes for role 423, privileges 432, and feature 444 in subgraph 502 as being the common nodes for combination (702). In this example, the nodes from subgraph 502 are selected because subgraph 502 is a more complex subgraph (e.g., has more nodes/connections) than subgraph 501. In other examples, the nodes of subgraph 502 may be selected for some other reason, including at random/arbitrarily. The connections between the nodes for role 423, privileges 432, and feature 444 in subgraph 501 are then migrated to maintain the same connections between the common nodes in subgraph 502 (703). For example, a connection between the node for group 411 and the node for role 423 exists in subgraph 501. That connection is migrated to run between the node for group 411 and the node for role 423 in subgraph 502. During the migration, some connections will be replicated. For example, there is a connection between the node for role 423 and the node for privileges 432 in both subgraph 501 and subgraph 502. Graphing service 101 deduplicates those replicated connections (704). In some examples, graphing service 101 deduplicates the connections by removing replicated connections after the migration. In other examples, graphing service 101 may check to see whether a connection already exists between two nodes and, if so, refrains from migrating that connection.
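Steps 701 through 704 can be sketched as follows, using the variant in which a connection is checked for existence before it is migrated. This is an illustration under stated assumptions: nodes are identified by their (already canonical) labels, merge_subgraphs is a hypothetical function, and the contents assumed for subgraph 502 (including "group 412" and "feature 445") are invented for the example because this section does not enumerate them.

```python
def merge_subgraphs(first, second):
    """Combine two subgraphs, each a dict with a 'nodes' set and an 'edges' list."""
    # Keep the nodes of the more complex subgraph (more connections) as the base.
    base, other = sorted([first, second], key=lambda g: len(g["edges"]), reverse=True)
    # Steps 701 and 702: nodes present in both subgraphs are the common nodes.
    common = base["nodes"] & other["nodes"]
    nodes = base["nodes"] | other["nodes"]
    edges = list(base["edges"])
    # Step 703: migrate the other subgraph's connections onto the base.
    for edge in other["edges"]:
        # Step 704: skip a connection that already exists instead of replicating it.
        if edge not in edges:
            edges.append(edge)
    return {"nodes": nodes, "edges": edges}, common


subgraph_501 = {
    "nodes": {"employee 521", "group 411", "role 423", "privileges 432", "feature 444"},
    "edges": [("employee 521", "group 411"), ("group 411", "role 423"),
              ("role 423", "privileges 432"), ("privileges 432", "feature 444")],
}
# Assumed contents; "group 412" and "feature 445" are hypothetical.
subgraph_502 = {
    "nodes": {"employees 401", "group 412", "role 423", "privileges 432",
              "feature 444", "feature 445"},
    "edges": [("employees 401", "group 412"), ("group 412", "role 423"),
              ("role 423", "privileges 432"), ("privileges 432", "feature 444"),
              ("privileges 432", "feature 445")],
}

merged, common = merge_subgraphs(subgraph_501, subgraph_502)
```

Checking before migrating avoids a separate cleanup pass; removing replicated connections after the migration, the other variant described above, would yield the same set of connections.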

FIG. 8 illustrates subgraph 800 of a privilege graph generated to represent data access authorizations. Subgraph 800 is a resulting subgraph after operation 700 has been performed on subgraphs 500. The node for employee 521 is still shown separately from the node for employees 401. In some examples, operation 700 may further determine that a node can be included in another node. Since employees 401 represent employees of an enterprise and employee 521 is an employee of that enterprise, graphing service 101 may include employee 521 in employees 401. In such a case, the connection between the node for employee 521 and the node for group 411 will be migrated to connect between the node for employees 401 and the node for group 411, as is the case in privilege graph 400.
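Including one node in another amounts to rewiring the member node's connections onto the containing node. A minimal sketch follows, assuming connections are stored as (source, target) label pairs; the absorb_node function and the "group 412" label are hypothetical.

```python
def absorb_node(edges, member, container):
    """Rewire every connection touching `member` so that it touches `container`.

    Connections that become identical after rewiring are kept only once.
    """
    rewired = []
    for source, target in edges:
        edge = (
            container if source == member else source,
            container if target == member else target,
        )
        if edge not in rewired:
            rewired.append(edge)
    return rewired


edges_800 = [
    ("employees 401", "group 412"),  # hypothetical existing connection
    ("employee 521", "group 411"),
]
edges_after = absorb_node(edges_800, "employee 521", "employees 401")
```

After the call, the node for employee 521 no longer appears in any connection, and the node for employees 401 is connected to the node for group 411.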

FIG. 9 illustrates operation 900 for generating a privilege graph representing data access authorizations. Operation 900 is an example of how privilege graph 400 may be modified when changes are made to the identity information and privilege information. In operation 900, graphing service 101 identifies a change to the identity and privilege information that indicates no user in role 422 has privileges 433 any longer (901). In response to identifying the change, graphing service 101 removes the connection in privilege graph 400 between the node for role 422 and the node for privileges 433 (902). After removal, when privilege graph 400 is displayed that connection will no longer be in the display. In some examples, if a node no longer has a connection, graphing service 101 may remove the node from privilege graph 400 (903). For example, if the connection from the node for role 425 to privileges 433 is removed, there are no privileges remaining for role 425 and the role's node can be removed.
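Steps 902 and 903 can be sketched as follows. The sketch takes one possible reading of step 903, namely that a node is removed once it takes part in no connection at all; remove_connection is a hypothetical function, and the two-connection graph fragment is reduced to the nodes named in the example above.

```python
def remove_connection(nodes, edges, source, target):
    """Remove one connection and then drop nodes left without any connection."""
    # Step 902: remove the connection between the two nodes.
    remaining_edges = [edge for edge in edges if edge != (source, target)]
    # Step 903 (assumed reading): keep only nodes that still appear in a connection.
    connected = {label for edge in remaining_edges for label in edge}
    remaining_nodes = {label for label in nodes if label in connected}
    return remaining_nodes, remaining_edges


nodes_400 = {"role 422", "role 425", "privileges 433"}
edges_400 = [("role 422", "privileges 433"), ("role 425", "privileges 433")]

nodes_after, edges_after_removal = remove_connection(
    nodes_400, edges_400, "role 425", "privileges 433"
)
```

In this fragment the node for role 425 is dropped because its only connection was removed, while the node for privileges 433 remains because role 422 still connects to it.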

In other examples, the change may call for a new connection to be made. For instance, a user in group 411 may be given role 421. Graphing service 101 may therefore create a connection from the node for group 411 to the node for role 421. In some cases, graphing service 101 may create a new subgraph that includes the change (e.g., a new subgraph for the employee that was given role 421). That new subgraph may be merged into privilege graph 400 using operation 600.
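Adding a connection directly can be sketched as shown below. The add_connection function is a hypothetical helper, and the existing connection to "role 423" is included only to give the example a starting graph; the sketch assumes that adding a connection which already exists should leave the graph unchanged.

```python
def add_connection(nodes, edges, source, target):
    """Add a connection, creating either endpoint node if it is missing."""
    updated_nodes = set(nodes) | {source, target}
    updated_edges = list(edges)
    # Adding the same connection twice must not create a replicated connection.
    if (source, target) not in updated_edges:
        updated_edges.append((source, target))
    return updated_nodes, updated_edges


nodes_before = {"group 411", "role 423"}
edges_before = [("group 411", "role 423")]

nodes_added, edges_added = add_connection(nodes_before, edges_before, "group 411", "role 421")
```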

FIG. 10 illustrates computing architecture 1000 for privilege graph-based representation of data access authorizations. Computing architecture 1000 is an example computing architecture for implementing graphing service 101. A similar architecture may also be used for other systems described herein, such as user terminal 103, although alternative configurations may also be used. Computing architecture 1000 comprises communication interface 1001, user interface 1002, and processing system 1003. Processing system 1003 is linked to communication interface 1001 and user interface 1002. Processing system 1003 includes processing circuitry 1005 and memory device 1006 that stores operating software 1007.

Communication interface 1001 comprises components that communicate over communication links, such as network cards, ports, RF transceivers, processing circuitry and software, or some other communication devices. Communication interface 1001 may be configured to communicate over metallic, wireless, or optical links. Communication interface 1001 may be configured to use TDM, IP, Ethernet, optical networking, wireless protocols, communication signaling, or some other communication format—including combinations thereof.

User interface 1002 comprises components that interact with a user. User interface 1002 may include a keyboard, display screen, mouse, touch pad, or some other user input/output apparatus. User interface 1002 may be omitted in some examples.

Processing circuitry 1005 comprises a microprocessor and other circuitry that retrieves and executes operating software 1007 from memory device 1006. Memory device 1006 comprises a computer readable storage medium, such as a disk drive, flash drive, data storage circuitry, or some other memory apparatus. In no examples would a storage medium of memory device 1006 be considered a propagated signal. Operating software 1007 comprises computer programs, firmware, or some other form of machine-readable processing instructions. Operating software 1007 includes access graphing module 1008. Operating software 1007 may further include an operating system, utilities, drivers, network interfaces, applications, or some other type of software. When executed by processing circuitry 1005, operating software 1007 directs processing system 1003 to operate computing architecture 1000 as described herein.

In particular, graphing module 1008 directs processing system 1003 to extract identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments. Graphing module 1008 further directs processing system 1003 to form subgraphs for each of the identity environments and each of the data environments from the identity information and the privilege information. Also, graphing module 1008 directs processing system 1003 to translate the subgraphs into a canonical schema and, subsequently, combine the subgraphs into the privilege graph.
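The overall sequence (form a subgraph per environment, translate it, then combine) can be sketched end to end as follows. This is a simplified illustration, not the module's actual logic: extraction is assumed to have already produced, for each environment, a list of (source, target) label pairs, and the environment names "hr_directory" and "warehouse", the canonical mapping, and build_privilege_graph are all hypothetical.

```python
def build_privilege_graph(environments, canonical):
    """Translate each environment's subgraph and combine them into one graph.

    `environments` maps an environment name to a list of (source, target) pairs.
    `canonical` maps environment-specific labels to canonical labels.
    """
    combined_edges = set()
    for raw_edges in environments.values():
        # Translate the subgraph into the canonical schema first...
        subgraph = {(canonical.get(s, s), canonical.get(t, t)) for s, t in raw_edges}
        # ...then combine; set union merges matching nodes and deduplicates edges.
        combined_edges |= subgraph
    nodes = {label for edge in combined_edges for label in edge}
    return {"nodes": nodes, "edges": combined_edges}


environments = {
    # Identity environment: knows the employee's role under a local label.
    "hr_directory": [("employee 521", "team leader")],
    # Data environment: knows which role holds which privileges.
    "warehouse": [("supervisor", "privileges 432"), ("privileges 432", "feature 444")],
}
graph = build_privilege_graph(environments, {"team leader": "supervisor"})
```

The example shows why translation precedes combination: only after "team leader" becomes "supervisor" do the two subgraphs share a node, which yields a continuous path from employee 521 to feature 444.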

The descriptions and figures included herein depict specific implementations of the claimed invention(s). For the purpose of teaching inventive principles, some conventional aspects have been simplified or omitted. In addition, some variations from these implementations may be appreciated that fall within the scope of the invention. It may also be appreciated that the features described above can be combined in various ways to form multiple implementations. As a result, the invention is not limited to the specific implementations described above, but only by the claims and their equivalents.

Claims

1. A method for generating a privilege graph representing data access authorizations, the method comprising:

extracting identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments;

forming subgraphs for the identity environments and the data environments from the identity information and the privilege information;

translating the subgraphs into a canonical schema; and

after translating the subgraphs, combining the subgraphs into the privilege graph.

2. The method of claim 1, comprising:

displaying the privilege graph to an administrator authorized to view the privilege graph.

3. The method of claim 1, wherein forming the subgraphs comprises:

creating a user node for a user of the plurality of users and sequentially connecting the user node to one or more attribute nodes that each represent an attribute of the user indicated in the identity information.

4. The method of claim 3, comprising:

upon reaching a last attribute node of the one or more attribute nodes, connecting the last attribute node to a privileges node; and

connecting the privileges node to one or more nodes of authorized data environments of the plurality of data environments that the user is authorized to access.

5. The method of claim 4, wherein the one or more nodes of authorized data environments each represent data or a feature that the user is authorized to access.

6. The method of claim 1, wherein translating the subgraphs comprises:

for attribute nodes of the subgraphs, changing attribute labels representing attributes of a user to canonical labels defined by the canonical schema.

7. The method of claim 1, wherein combining the subgraphs comprises:

for an attribute represented by attribute nodes in multiple subgraphs, identifying a common attribute node and migrating connections with the attribute nodes to the common attribute node.

8. The method of claim 7, comprising:

identifying replicated connections with the common attribute node; and

deduplicating the replicated connections.

9. The method of claim 1, comprising:

identifying a change to the privilege information; and

updating the privilege graph based on the change.

10. The method of claim 9, wherein updating the privilege graph comprises:

adding or removing a connection between nodes in the privilege graph.

11. An apparatus comprising:

one or more computer readable storage media;

a processing system operatively coupled with the one or more computer readable storage media; and

program instructions stored on the one or more computer readable storage media that, when read and executed by the processing system, direct the processing system to: extract identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments; form subgraphs for the identity environments and the data environments from the identity information and the privilege information; translate the subgraphs into a canonical schema; and after translating the subgraphs, combine the subgraphs into a privilege graph.

12. The apparatus of claim 11, wherein the program instructions direct the processing system to:

display the privilege graph to an administrator authorized to view the privilege graph.

13. The apparatus of claim 11, wherein to form the subgraphs, the program instructions direct the processing system to:

create a user node for a user of the plurality of users and sequentially connect the user node to one or more attribute nodes that each represent an attribute of the user indicated in the identity information.

14. The apparatus of claim 13, wherein the program instructions direct the processing system to:

upon reaching a last attribute node of the one or more attribute nodes, connect the last attribute node to a privileges node; and

connect the privileges node to one or more nodes of authorized data environments of the plurality of data environments that the user is authorized to access.

15. The apparatus of claim 14, wherein the one or more nodes of authorized data environments each represent data or a feature that the user is authorized to access.

16. The apparatus of claim 11, wherein to translate the subgraphs, the program instructions direct the processing system to:

for attribute nodes of the subgraphs, change attribute labels representing attributes of a user to canonical labels defined by the canonical schema.

17. The apparatus of claim 11, wherein to combine the subgraphs, the program instructions direct the processing system to:

for an attribute represented by attribute nodes in multiple subgraphs, identify a common attribute node and migrate connections with the attribute nodes to the common attribute node.

18. The apparatus of claim 17, wherein the program instructions direct the processing system to:

identify replicated connections with the common attribute node; and

deduplicate the replicated connections.

19. The apparatus of claim 11, wherein the program instructions direct the processing system to:

identify a change to the privilege information; and

update the privilege graph based on the change.

20. One or more computer readable storage media having program instructions stored thereon that, when read and executed by a processing system, direct the processing system to:

extract identity information for a plurality of users from a plurality of identity environments and privilege information from a plurality of data environments;

form subgraphs for the identity environments and the data environments from the identity information and the privilege information;

translate the subgraphs into a canonical schema; and

after translating the subgraphs, combine the subgraphs into a privilege graph.