SYSTEM FOR PROBABILISTIC REASONING AND DECISION MAKING ON DIGITAL TWINS
Aspects of the present disclosure provide systems, methods, and computer-readable storage media that support ontology-driven processes to create digital twins that extend the capabilities of knowledge graphs. A dataset including an ontology and domain data corresponding to a domain associated with the ontology is obtained. A knowledge graph is constructed based on the ontology and the domain data is incorporated into the knowledge graph. The knowledge graph is exploited to derive random variables of a probabilistic graph model. The random variables may be associated with probability distributions, which may include unknown parameters. A learning process is executed to learn the unknown parameters and obtain a joint distribution of the probabilistic graph model, which may enable querying of the probabilistic graph model in a probabilistic and deterministic manner.
The present disclosure relates generally to system modelling and more specifically to techniques for extending capabilities of digital twins using probabilistic reasoning.
BACKGROUND

Presently, entities across many different industries are seeking to incorporate the use of digital twins to test, streamline, or otherwise evaluate various aspects of their operations. One such industry is the automotive industry, where digital twins have been explored as a means to analyze and evaluate performance of a vehicle. To illustrate, the use of digital twins has been explored as a means to safely evaluate performance of autonomous vehicles in mixed driver environments (i.e., environments where autonomous vehicles operate in the vicinity of human drivers). As can be appreciated from the non-limiting example above, the ability to analyze performance or other factors using a digital twin, rather than its real world counterpart (e.g., the vehicle represented by the digital twin), can provide significant advantages. Although the use of digital twins has proved useful across many different industries, much of the current interest is focused only on the benefits that may be realized by using digital twins, while other challenges have gone unaddressed.
One particular challenge that remains with respect to the use of digital twins is the creation of the digital twins themselves. For example, tools currently exist to aid in the creation of digital twins, but most existing tools are limited in the sense that they may be suitable for a specific use case (e.g., creating a digital twin of a physical space, such as a building) but not suitable for another use case (e.g., creating a digital twin of a process). As a result, an entity may need to utilize multiple tools to develop digital twins covering different portions of the use case of interest. In such instances, digital twins created using different tools are sometimes not compatible with each other, thereby limiting the types of analysis and insights that may be obtained using the digital twins. Additionally, some digital twin creation tools are not well suited to addressing changes to the real world counterpart and may require re-designing and rebuilding the digital twin each time changes to the real world counterpart occur. This can be particularly problematic for use cases involving industries where changes frequently occur, such as the manufacturing industry. Thus, while digital twins have shown promise as a means to evaluate their real world counterparts, the above-described drawbacks have limited the benefits that may be realized.
SUMMARY

Aspects of the present disclosure provide systems, methods, and computer-readable storage media that support ontology-driven processes to define and create digital twins that extend the capabilities of knowledge graphs with respect to inferencing, prediction, and decision making. The disclosed systems and methods compile data from one or more data sources and leverage a set of tools or functionality to transform the compiled data into a format that is ready to incorporate into a knowledge graph while meeting the structural constraints defined in the ontology. For example, the compiled data may include observations from entities defined in the ontology representing a domain (e.g., an entity, process, machine, system, etc.). The ontology provides an explicit specification of concepts, properties, and relationships between different objects within the domain. In addition to the ontology, the compiled data may include other types of information, such as operational data (e.g., if the ontology describes a vehicle, the operational data may include data captured during operation of the vehicle).
A knowledge graph is constructed as a realization of the given domain-specific ontology by combining and transforming the compiled data, operational data, and any other external data sources. Once generated, the knowledge graph represents semantic relationships between entities (e.g., nodes) within a business. Information may be inferred from the semantic relationships represented by the knowledge graph, but such inferences are limited to logical inferences. As such, the knowledge that may be inferred from the knowledge graph is limited to explicit information presented in the graph, such as frequencies, counts, relationships, hierarchies, and the like.
The tools and functionality disclosed herein extend the ability to infer implicit and sometimes hidden knowledge from a knowledge graph by transforming the knowledge graph into a probabilistic graph model. To generate the probabilistic graph model, the knowledge graph may be transformed by treating entities (e.g., nodes) within the graph as random variables, and semantic relationships as statistical dependencies. Each of the random variables may be associated with a probability distribution, but upon initially determining the random variables, one or more parameters of the probability distributions may be unknown. Bayesian learning techniques may be used to learn the unknown parameters or approximations of those parameters from the data of the knowledge graph. Once the unknown parameters have been learned, a joint distribution of the probabilistic graph model may be obtained, which enables queries to be constructed for any combinations of the random variables of the probabilistic graph model.
While the knowledge graph may contain all of the information used to generate the probabilistic graph model, the ability to infer implicit and sometimes hidden knowledge from the knowledge graph is limited since the knowledge graph simply represents semantic relationships and hierarchies. Unlike the knowledge graph, the probabilistic graph model represents information observed from the data directly and information learned using statistical learning techniques conditioned on the graphical structure specified through the statistical dependencies between random variables (i.e., nodes), which enables additional knowledge to be inferred from the data. For example, queries of the probabilistic graph model enable knowledge to be inferred using probabilistic inferences (e.g., whether one subset of the random variables is independent of another subset of random variables, or whether one subset of the random variables is conditionally independent of another subset of random variables given a third subset). The queries may also be used to calculate conditional probabilities, find and fill in missing data, perform optimization for decision making under uncertainty, and other types of capabilities not otherwise available using the knowledge graph alone.
The foregoing has outlined rather broadly the features and technical advantages of the present disclosure in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter which form the subject of the claims of the disclosure. It should be appreciated by those skilled in the art that the conception and specific aspects disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present disclosure. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the scope of the disclosure as set forth in the appended claims. The novel features which are disclosed herein, both as to organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended as a definition of the limits of the present disclosure.
For a more complete understanding of the present disclosure, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
It should be understood that the drawings are not necessarily to scale and that the disclosed aspects are sometimes illustrated diagrammatically and in partial views. In certain instances, details which are not necessary for an understanding of the disclosed methods and apparatuses or which render other details difficult to perceive may have been omitted. It should be understood, of course, that this disclosure is not limited to the particular aspects illustrated herein.
DETAILED DESCRIPTION

Aspects of the present disclosure provide systems, methods, apparatus, and computer-readable storage media that support ontology-driven processes to define and generate probabilistic graph models that extend the capabilities of digital twins with respect to inferencing, prediction, and decision making. The process of generating a probabilistic graph model leverages semantic relationships defined in a given ontology to construct knowledge graph based digital twins that, while providing limited capabilities for inferring knowledge, can be exploited to produce probabilistic graphs providing robust knowledge inferencing capabilities. For example, the knowledge graph may be used to identify a set of semantic relationships as statistical dependencies that can be used to connect random variables of the probabilistic graph. Each random variable may be associated with a probability distribution and any unknown parameters of the probability distributions can be learned or approximated using data incorporated into the knowledge graph. Once generated, the probabilistic graph enables complex ad-hoc conditional probabilistic queries with uncertainty quantification, which provides a more meaningful and robust ability to infer knowledge (e.g., using a what-if analysis). Additionally, the enhanced inferential knowledge may enable the probabilistic graph to be used to define and solve optimization problems for making decisions.
Referring to
It is noted that functionalities described with reference to the computing device 110 are provided for purposes of illustration, rather than by way of limitation and that the exemplary functionalities described herein may be provided via other types of computing resource deployments. For example, in some implementations, computing resources and functionality described in connection with the computing device 110 may be provided in a distributed system using multiple servers or other computing devices, or in a cloud-based system using computing resources and functionality provided by a cloud-based environment that is accessible over a network, such as the one of the one or more networks 140. To illustrate, one or more operations described herein with reference to the computing device 110 may be performed by one or more servers or a cloud-based system 142 that communicates with one or more client or user devices.
The one or more processors 112 may include one or more microcontrollers, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), central processing units (CPUs) and/or graphics processing units (GPUs) having one or more processing cores, or other circuitry and logic configured to facilitate the operations of the computing device 110 in accordance with aspects of the present disclosure. The memory 114 may include random access memory (RAM) devices, read only memory (ROM) devices, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), one or more hard disk drives (HDDs), one or more solid state drives (SSDs), flash memory devices, network accessible storage (NAS) devices, or other memory devices configured to store data in a persistent or non-persistent state. Software configured to facilitate operations and functionality of the computing device 110 may be stored in the memory 114 as instructions 116 that, when executed by the one or more processors 112, cause the one or more processors 112 to perform the operations described herein with respect to the computing device 110, as described in more detail below. Additionally, the memory 114 may be configured to store data and information in one or more databases 118. Illustrative aspects of the types of information that may be stored in the one or more databases 118 are described in more detail below.
The one or more communication interfaces 126 may be configured to communicatively couple the computing device 110 to the one or more networks 140 via wired or wireless communication links established according to one or more communication protocols or standards (e.g., an Ethernet protocol, a transmission control protocol/internet protocol (TCP/IP), an Institute of Electrical and Electronics Engineers (IEEE) 802.11 protocol, an IEEE 802.16 protocol, a 3rd Generation (3G) communication standard, a 4th Generation (4G)/long term evolution (LTE) communication standard, a 5th Generation (5G) communication standard, and the like). In some implementations, the computing device 110 includes one or more input/output (I/O) devices (not shown in
The data ingestion engine 120 may be configured to provide functionality for collecting data to support the functionality provided by the computing device 110. For example, the data ingestion engine 120 may provide functionality for obtaining data to support the operations of the computing device 110 from one or more data sources. Exemplary types of data that may be obtained using the data ingestion engine 120 include one or more ontologies, data collected by Internet of Things (IoT) devices, infrastructure data, financial data, mapping data, time series data, SQL data, or other types of data. The data obtained by the data ingestion engine 120 may be stored in the one or more databases 118 and used by the probabilistic modelling and optimization engine 124 to generate probabilistic models that enable observations, simulations, and other types of operations to be performed, as described in more detail below.
The ontologies obtained by the data ingestion engine 120 provide an abstracted representation of an entity that represents or defines concepts, properties, and relationships for the entity using an accepted body of knowledge (e.g., industry accepted terminology and semantics). The ontologies may specify object types and their semantic relationships to other object types via a graph format. Exemplary formats in which the ontologies may be obtained by the data ingestion engine 120 include “.owl” and “.ttl” files. As a non-limiting example, an ontology for a manufacturer may indicate the manufacturer has production facilities in one or more geographic locations and include, for each production facility, information representing: a floor plan for the production facility, manufacturing infrastructure present at the production facility (e.g., assembly robots, computing infrastructure, equipment, tools, and the like), locations of the manufacturing infrastructure within the production facility, other types of information, or combinations thereof. It is noted that while the exemplary characteristics of the above-described ontology have been described with reference to a manufacturer domain, the ontologies obtained by the data ingestion engine 120 may include ontologies representative of other types of domains, such as ontologies associated with processes (e.g., manufacturing processes, computing processes, biological processes, chemical processes, etc.), ontologies associated with machinery or equipment (e.g., a vehicle, a computing device or component thereof, circuitry, robots, etc.), ontologies associated with biological systems, and the like. Accordingly, it should be understood that the operations disclosed herein with reference to the computing device 110 may be applied to any industry, process, machine, etc. capable of representation via an ontology.
The mapping file (i.e., function) may be utilized to associate other types of data obtained by the data ingestion engine 120 with particular portions of the ontology or ontologies. Such a mapping file is necessary when the original ingested data sources do not have a built-in schema. For example, time series data may be associated with execution of a process or operations of machinery represented within the ontology, and the mapping function may be utilized to associate these various pieces of data with the relevant classes and properties in the ontology, as well as their types (e.g., the portions of the ontology corresponding to the process or operations of the machinery), as described in more detail below.
The graphing engine 122 may utilize the object types and semantic relationships specified in the ontology to construct a knowledge graph representative of the information contained within the ontology. Stated another way, an ontology represents the graph structure itself and the knowledge graph represents a realization of the ontology with data. As an illustrative example and referring to
In addition to nodes representing assets, other types of nodes may be provided in a knowledge graph, such as nodes representing attributes (e.g., an age of a machine or robot represented in the knowledge graph), process steps (e.g., tasks performed by a machine or robot represented in the knowledge graph), entities (e.g., a manufacturer of a machine or robot represented in the knowledge graph), or other types of nodes. As described above, these nodes may be connected to other nodes via edges. For example, the knowledge graph 200 could be generated to include a task node (not shown in
The knowledge graph may also incorporate other types of data, such as historical data and metadata. To illustrate, node 212 is described in the example above as representing a robot that performs a task. Sensors or other devices may monitor performance of the task by the robot and generate data associated with performance of the task, such as the number of times the task was performed, whether the task was performed successfully or not, a duration of time required for the robot to complete the task, or other types of data. The dataset generated during the monitoring may be stored in the knowledge graph. Additionally, metadata may be stored in the knowledge graph, such as the physical location where a certain data point is stored (e.g., which database, on which server, and the IP address of that server). Additional metadata may include access privileges granting certain users access to the data, which may apply to a subset of the knowledge graph.
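As a non-authoritative illustration, a knowledge graph of this kind can be sketched as subject-predicate-object triples with attached observation data; every node name and data value below is a hypothetical placeholder, not part of the disclosure.

```python
# Minimal sketch of a knowledge graph as subject-predicate-object triples,
# with historical observation data attached to a node (all names hypothetical).
triples = [
    ("factory_1", "contains", "robot_12"),
    ("robot_12", "manufactured_by", "acme_corp"),
    ("robot_12", "performs", "weld_task"),
    ("robot_12", "has_age_years", 4),
]

# Data generated while monitoring the robot's performance of its task.
observations = {
    "robot_12": [
        {"task": "weld_task", "duration_s": 41.2, "success": True},
        {"task": "weld_task", "duration_s": 58.7, "success": False},
    ]
}

def related(graph, subject, predicate):
    """Logical inference: follow an edge from a subject node."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(related(triples, "robot_12", "performs"))  # ['weld_task']
```

Queries of this form recover only explicit facts (relationships, counts, hierarchies), which is exactly the limitation the probabilistic transformation described below is designed to overcome.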
Referring back to
To illustrate and referring to
Referring briefly to
Referring back to
Referring back to
To convert the knowledge graph to a probabilistic graph model, the probabilistic modelling and optimization engine 124 generates a model (i.e., the probabilistic graph model) in which the nodes of the knowledge graph are treated as random variables and edges are treated as statistical dependencies. The conversion of the knowledge graph to a probabilistic graph model may be expressed as:
- Let G = (V, E) represent the knowledge graph, where V is a set of vertices (i.e., the nodes of the knowledge graph) and E is a set of edges (i.e., the edges connecting the nodes of the knowledge graph);
- Apply a transformation to G to obtain an acyclic graph G′ = (V, E′), where the edges in E′ are directed edges;
- Assume each node in V is associated with a random variable {A, R, M, T, S, D, ... }, and each edge in E′ represents a “statistical dependence.”
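Under the assumptions above, the conversion step might be sketched as follows; the variable names follow the example set {A, R, M, T, S, D}, and the particular edge orientations shown are illustrative, not prescribed by the disclosure.

```python
# Sketch: treat knowledge-graph nodes as random variables and re-express
# edges as directed statistical dependencies (parent -> child).
V = ["A", "R", "M", "T", "S", "D"]                    # nodes -> random variables
E = [("R", "M"), ("R", "T"), ("T", "S"), ("T", "D")]  # directed edges E'

# Each node's parents determine which conditional distribution it needs,
# e.g. M has parent R, so M is modeled as P(M | R).
parents = {v: [p for (p, c) in E if c == v] for v in V}

def is_acyclic(nodes, edges):
    """Kahn's algorithm: verify the dependency graph is a DAG."""
    indeg = {v: 0 for v in nodes}
    for _, c in edges:
        indeg[c] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for p, c in edges:
            if p == v:
                indeg[c] -= 1
                if indeg[c] == 0:
                    queue.append(c)
    return seen == len(nodes)

print(parents["M"], is_acyclic(V, E))  # ['R'] True
```

The acyclicity check matters because the factorization of the joint distribution used later is only well defined over a directed acyclic graph.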
Each of the random variables represents a probability distribution that describes the possible values that a corresponding random variable can take and a likelihood (or probability) of the random variable taking each possible value. As an illustrative example and referring to
include the nodes 230, 240, 250, 260, 270, 280 of
that includes edges 232′, 242′, 262′, and 264′. Unlike the knowledge graph 202 of
As explained above, each of the variables {A, R, M, T, S, D, ... } represents a probability distribution. In the example shown
The particular distribution type for each of the distributions 310, 320, 330, 340, 350, 360 may be configured by a user, such as a domain expert, and specified directly in the ontology. The user may configure the distribution type based on expected or known characteristics of the probability distributions represented by each random variable. For example, in probability theory Poisson distributions express the probability of a given number of discrete events occurring in a fixed interval of time or space independent of the time since the last event. Since the age of a robot is observed in the data as a discrete integer value (i.e., in years), the user may associate the Poisson distribution type with the age parameter (A). As another example, Categorical distributions describe the possible results of a random variable that can take on one of K possible categories. The user may associate the categorical distribution type to the variables M, R, T since the probabilistic graph model may represent an environment (e.g., the environment defined in the ontology from which the knowledge graph was generated) where many different types of robots are present, each type of robot manufactured by a particular manufacturer and capable of performing a defined set of tasks, all of which define a set of K possible categories for M, R, T, respectively (i.e., a set of K manufacturer categories, a set of K robot categories, and a set of K task categories). Similarly, a Bernoulli distribution represents the discrete probability of a random variable which takes on the value of 1 with probability p and the value of 0 with probability q = 1 - p (i.e., success or failure). Since the status variable (S) indicates whether the task was performed successfully or failed, the user may associate the Bernoulli distribution type with the status variable (S). 
The user may assign the exponential distribution type to the duration parameter (D), which represents the amount of time taken to perform a task, because exponential distributions represent the probability distribution of the time between events. It is noted that the exemplary variables, probability distributions, and distribution types described above have been provided for purposes of illustration, rather than by way of limitation and that probabilistic graph models generated in accordance with the present disclosure may utilize other distributions, distribution types, and variables depending on the particular real world counterparts being represented by the probabilistic graph model.
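As a minimal sketch of how the user's distribution-type selections might be captured alongside the ontology (the dictionary representation and the validation helper are assumptions for illustration, not the disclosure's format):

```python
# Sketch: record the user-selected distribution family for each random
# variable, mirroring the example choices discussed above.
DISTRIBUTION_TYPES = {
    "A": "Poisson",      # age in whole years: discrete counts
    "R": "Categorical",  # robot type: one of K categories
    "M": "Categorical",  # manufacturer: one of K categories
    "T": "Categorical",  # task: one of K categories
    "S": "Bernoulli",    # status: success (1) or failure (0)
    "D": "Exponential",  # duration: time between events
}

def validate(selection, allowed=("Poisson", "Categorical", "Bernoulli",
                                 "Exponential", "Gamma", "Beta", "Dirichlet")):
    """Reject distribution families the modelling engine does not support."""
    unknown = {v: d for v, d in selection.items() if d not in allowed}
    if unknown:
        raise ValueError(f"unsupported distribution(s): {unknown}")
    return True

print(validate(DISTRIBUTION_TYPES))  # True
```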
Referring back to
The additional inference and learning capabilities of Bayesian networks leverage a joint distribution of the probabilistic graph model, which may be expressed as P(A, R, M, T, D, S). However, when the probabilistic modelling and optimization engine 124 initially converts the knowledge graph to the probabilistic graph model, the joint distribution may be unknown or incomplete (e.g., missing one or more parameters of the individual probability distributions for one or more of the random variables). For example, while certain parameters of the probability distributions of each random variable may be known, there may be some missing parameters, such as the set of all possible values and their corresponding probability functions (P).
As described above with reference to the structure of the probabilistic graph model, the joint distribution may be factored over the statistical dependencies of the graph as:

P(A, R, M, T, D, S) = P(A) P(R) P(M|R) P(T|R) P(D|T) P(S|T) (Equation 1)
As shown in Equation 1 above, the individual probabilities of the random variables, which are incorporated into the probabilistic graph during the above-described conversion process, may be obtained, but the individual probability distributions of the random variables are incomplete (e.g., missing parameters). The probabilistic modelling and optimization engine 124 may leverage additional techniques to solve for or approximate these unknown parameters. For example, under the chain rule of probability theory, the joint distribution P(A, R, M, T, D, S) may be decomposed into prior distributions and conditional distributions. The probabilistic modelling and optimization engine 124 may use known data (e.g., the data of the knowledge graph) to approximate the unknown parameters of the probability distributions for each random variable. To illustrate, letting K represent a set of data and θ represent some unknown parameters, the likelihood of the data given the parameters may be defined as P(K|θ) and the prior distribution over the parameters as P(θ). The conditional probability, also referred to as the posterior probability, may be derived according to:

P(θ|K) = P(K|θ) P(θ) / P(K) (Equation 2)
where the marginal likelihood P(K) is:

P(K) = ∫ P(K|θ) P(θ) dθ (Equation 3)
The integral of Equation 3 is computationally expensive and difficult to solve. To avoid these computational inefficiencies, the probabilistic modelling and optimization engine 124 may utilize approximation techniques that learn the unknown parameters (or approximations of the unknown parameters) based on the set of data (K). To illustrate, the probabilistic modelling and optimization engine 124 may define a generative program based on the probabilistic graph model. To create the generative program, the probabilistic modelling and optimization engine 124 may convert the data generation process into a series of deterministic and probabilistic statements. For example and using terminology consistent with the probabilistic graph model 300 of
- p(A) ~ Gamma (1, 1)
- age ~ Poisson (p(A))
- p(R) ~ Dirichlet(1)
- robot ~ Categorical (p(R))
- p(M|R) ~ Dirichlet (0.5)
- manufacturer ~ Categorical (p(M|R = robot))
- p(T|R) ~ Dirichlet (0.25)
- task ~ Categorical (p(T|R = robot))
- p(D|T) ~ Gamma (1, 1)
- duration ~ Exponential (p(D|T = task))
- p(S|T) ~ Beta (1, 1)
- status ~ Bernoulli (p(S|T= task))
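The statements above can be executed directly as a forward-sampling program. The sketch below, a non-authoritative illustration using Python's standard `random` module, mirrors the prior and sampling statements; the category names (robot types, manufacturers, tasks) are hypothetical placeholders, and conditioning of p(M|R), p(T|R), p(D|T), and p(S|T) on the drawn parent value is left implicit for brevity.

```python
import math
import random

rng = random.Random(7)

def sample_poisson(lam):
    """Knuth's method for a Poisson draw (adequate for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def sample_dirichlet(alpha, k):
    """Symmetric Dirichlet draw via normalized Gamma samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical category sets for the Categorical variables.
robots = ["high-payload", "high-speed", "dual-arm"]
manufacturers = ["yaskawa", "fetch"]
tasks = ["weld", "paint", "assemble"]

# Statements mirror the generative program: a prior draw for each
# unknown parameter, then a draw for the corresponding observable.
p_A = rng.gammavariate(1, 1)                      # p(A) ~ Gamma(1, 1)
age = sample_poisson(p_A)                         # age ~ Poisson(p(A))
p_R = sample_dirichlet(1.0, len(robots))          # p(R) ~ Dirichlet(1)
robot = rng.choices(robots, weights=p_R)[0]       # robot ~ Categorical(p(R))
p_M = sample_dirichlet(0.5, len(manufacturers))   # p(M|R) ~ Dirichlet(0.5)
manufacturer = rng.choices(manufacturers, weights=p_M)[0]
p_T = sample_dirichlet(0.25, len(tasks))          # p(T|R) ~ Dirichlet(0.25)
task = rng.choices(tasks, weights=p_T)[0]         # task ~ Categorical(p(T|R))
p_D = rng.gammavariate(1, 1)                      # p(D|T) ~ Gamma(1, 1)
duration = rng.expovariate(p_D)                   # duration ~ Exponential(p(D|T))
p_S = rng.betavariate(1, 1)                       # p(S|T) ~ Beta(1, 1)
status = rng.random() < p_S                       # status ~ Bernoulli(p(S|T))

print(age, robot, manufacturer, task, round(duration, 2), status)
```

Each run produces one simulated record (age, robot, manufacturer, task, duration, status), which is the raw material the simulate-and-compare learning loop described below consumes.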
In the exemplary statements above, the deterministic statements are those including an assignment (e.g., “=”), and the remaining statements represent probabilistic statements. The generative program provides a model that may be used to estimate or approximate the unknown parameters. For example, the probabilistic modelling and optimization engine 124 may configure the generative program with a set of guessed parameters and run a simulation process to produce a set of simulation data. The set of simulation data may then be compared to observed data to evaluate how closely the simulation data obtained using the guessed parameters matches or fits actual or real world data. This process may be performed iteratively until the simulated data matches the actual data to within a threshold tolerance (e.g., 90%, 95%, etc.). It is noted that as the set of data grows larger, the ability to estimate or guess the parameters may improve. Thus, the above-described learning process may be periodically or continuously performed, and the accuracy of the estimations of the unknown parameters may improve as the set of data grows larger.
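The guess-simulate-compare loop can be sketched for a single unknown, the Bernoulli success parameter of the status variable (S). This is an approximate-Bayesian-computation-style illustration under assumed observed data and an assumed acceptance rule, not the disclosure's exact procedure.

```python
import random

rng = random.Random(0)

# Hypothetical observed task outcomes from the knowledge graph
# (1 = success, 0 = failure); the true success rate is unknown.
observed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
observed_rate = sum(observed) / len(observed)

def simulate(p_guess, n):
    """Run the generative model forward with a guessed parameter."""
    return [1 if rng.random() < p_guess else 0 for _ in range(n)]

# Accept guesses whose simulated success rate falls within the tolerance
# of the observed rate; the retained guesses approximate the posterior.
tolerance = 0.05
accepted = []
for _ in range(20000):
    p_guess = rng.random()                  # uniform prior over [0, 1]
    sim = simulate(p_guess, len(observed))
    if abs(sum(sim) / len(sim) - observed_rate) <= tolerance:
        accepted.append(p_guess)

estimate = sum(accepted) / len(accepted)    # posterior-mean estimate for p(S)
print(round(estimate, 2))
```

Note that the estimate is pulled slightly toward 0.5 by the uniform prior; with more observed data the accepted guesses concentrate, which matches the observation above that accuracy improves as the dataset grows.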
The above-described learning process is illustrated in
In
In contrast to the inference process 410, digital twins generated in accordance with the present disclosure (i.e., by the computing device 110 of
As explained above and referring back to
To illustrate and referring to
As shown in
Once the probability distributions having the guessed parameters are added, the probabilistic graph model 300′ may be queried to obtain information that would otherwise not be available using a knowledge graph. For example, the probabilistic model 300′ represents a model of an environment where different robots perform tasks. The probability distribution P(R) 340 includes all possible values 342 of the variable R (e.g., the variable R may take on values of “high-payload”, “high-speed”, “extended-reach”, “ultra-maneuverable”, and “dual-arm”) and each possible value may have an associated probability 344. Similarly, the probability distribution P(M|R) 332 includes all possible values 334 for the statistical dependency (represented by edge 232′) for the variables M and R (e.g., the possible combinations for the variables M, R may include “high-payload, yaskawa”, “high-payload, fetch”, “high-speed, yaskawa”, “high-speed, fetch”, “extended-reach, yaskawa”, “ultra-maneuverable, yaskawa”, “ultra-maneuverable, fetch”, “dual-arm, yaskawa”, and “dual-arm, fetch”) and each possible value may have an associated probability 336. The probability distribution P(A) 312 may follow a structure similar to the probability distribution P(R) 340, but provide all possible values and their corresponding probabilities for the random variable A; the probability distributions P(T|R), P(D|T) 364, and P(S|T) 362 may follow a structure similar to the probability distribution P(M|R) 332, but provide all possible values and their corresponding probabilities for the statistical dependencies associated with their random variable pairs (e.g., T|R, D|T, S|T, respectively).
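Tabulated distributions of this shape can be sketched as dictionaries and queried directly, for example by marginalizing P(M|R) over P(R) to answer "how likely is a given manufacturer overall?". The category names follow the example above, while the numeric probability values are hypothetical placeholders.

```python
# Sketch: P(R) and the conditional probability table P(M | R), keyed by
# (robot, manufacturer); the probability values are hypothetical placeholders.
P_R = {"high-payload": 0.30, "high-speed": 0.25, "extended-reach": 0.10,
       "ultra-maneuverable": 0.20, "dual-arm": 0.15}

P_M_given_R = {
    ("high-payload", "yaskawa"): 0.7, ("high-payload", "fetch"): 0.3,
    ("high-speed", "yaskawa"): 0.4,   ("high-speed", "fetch"): 0.6,
    ("extended-reach", "yaskawa"): 1.0,
    ("ultra-maneuverable", "yaskawa"): 0.5, ("ultra-maneuverable", "fetch"): 0.5,
    ("dual-arm", "yaskawa"): 0.2,     ("dual-arm", "fetch"): 0.8,
}

def p_joint_m_r(robot, manufacturer):
    """P(M = m, R = r) = P(M = m | R = r) * P(R = r)."""
    return P_M_given_R.get((robot, manufacturer), 0.0) * P_R[robot]

def p_m(manufacturer):
    """Marginalize over robot types: P(M = m) = sum_r P(M = m, R = r)."""
    return sum(p_joint_m_r(r, manufacturer) for r in P_R)

print(round(p_m("yaskawa"), 3))  # prints 0.54
```

The same tables support conditional queries (e.g., Bayes' rule gives P(R = r | M = m) as the joint probability divided by the marginal), which is the kind of answer a knowledge graph alone cannot provide.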
Referring back to
As shown above, the computing device 110 provides a suite of tools (e.g., the data ingestion engine 120, the graphing engine 122, and the probabilistic modelling and optimization engine 124) providing functionality for constructing knowledge graphs and extending the knowledge graphs by transforming them into probabilistic graph models. The probabilistic graph models may be used to extract additional information and insights from a set of data (e.g., the set of data used to generate the knowledge graphs). In the context of digital twins, the probabilistic graph models provide users with an enhanced understanding of the real world counterpart represented by the probabilistic graph model.
In addition to providing enhanced information, the suite of tools enables users to create digital twins in an intuitive manner without requiring expansive knowledge of programming and modelling techniques. To illustrate, the system 100 of
A user of the computing device 130 may access the GUIs 128 provided by the computing device 110, which may provide the user with access to functionalities provided by the data ingestion engine 120, the graphing engine 122, and the probabilistic modelling and optimization engine 124. For example, the GUIs 128 may be accessed as one or more web pages using a web browser of the computing device 130. The GUIs 128 may enable the user to upload, via the data ingestion engine 120, a dataset for use in generating a probabilistic graph model in accordance with the present disclosure. The dataset may be stored in the one or more databases 138. Additionally or alternatively, the dataset may be obtained from one or more data sources 150 external to the computing device 130.
Once the dataset is uploaded to the computing device 110, the GUIs 128 may provide access to interactive elements that enable the user to generate a probabilistic graph model from the uploaded dataset. For example, the GUIs 128 may include a button or other interactive element that enables the user to initiate generation of the probabilistic graph model using the uploaded dataset. Once activated, the functionality of the graphing engine 122 may be executed against the dataset to produce a knowledge graph, such as the knowledge graph 202 of
In addition to providing functionality for creating probabilistic graph models, the GUIs 128 may also enable the user to query the probabilistic graph model. For example, the GUIs 128 may provide a query builder that enables the user to intuitively construct queries, run the queries against the probabilistic graph model, and view the results. The GUIs may also provide the user with the ability to view the knowledge graph and the probabilistic graph model, compare different portions of the knowledge graph and the probabilistic graph model, review analytics, and run simulations. The GUIs 128 may also be configured to present the user with information indicating one or more pieces of data in the dataset used to construct the knowledge graph and the probabilistic graph model are missing. For example, suppose that the dataset used to construct the probabilistic graph model was the dataset 220
The query generating functionality provided by the GUIs 128 may also enable the user to create queries for purposes of prediction, forecasting, or what-if analysis. For example, where the probabilistic graph model includes a variable (or node) associated with time (T), a query may be defined with (T = x days in the future) and the query will return a probability distribution of all possible values for the queried data. Additional examples of the types of functionality and analytics that may be provided by the computing device 110 and accessed via the GUIs 128 are described below with reference to
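A time-conditioned forecast query of this kind can be sketched as a Monte-Carlo simulation. The sketch below is illustrative only: the per-day event probability and the `forecast_distribution` helper are hypothetical stand-ins for parameters that would be learned from the probabilistic graph model, and the query result is a distribution over all possible values rather than a single point estimate.

```python
import random
from collections import Counter

def forecast_distribution(p_event_per_day: float, days: int,
                          n_samples: int = 10000, seed: int = 0) -> dict:
    """Monte-Carlo forecast: returns the distribution over the number of
    events observed by day T = `days`, given a per-day event probability
    (hypothetically learned from the probabilistic graph model)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        # simulate one plausible future realization, day by day
        events = sum(1 for _ in range(days) if rng.random() < p_event_per_day)
        counts[events] += 1
    # normalize counts into a probability distribution
    return {k: v / n_samples for k, v in sorted(counts.items())}

# Query: distribution of event counts at T = 5 days in the future.
dist = forecast_distribution(0.1, days=5)
```

The returned dictionary is the query answer: a probability for every possible value, which is what enables the uncertainty-aware what-if analysis described above.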
As can be appreciated from the foregoing, the system 100 enables digital twins to be created using an ontology driven design. For example, the system 100 enables a user to create a digital twin and define its capabilities by designing and configuring ontology models. The ontologies are realized as rich knowledge graphs with nodes and edges representing semantic meaning, enabling logical inferences for retrieving implicit knowledge.
The system 100 also provides the advantage of utilizing a single data representation (e.g., data + AI) in which the data model (e.g., the knowledge graph) and the statistical (AI/ML) model of the data (e.g., the probabilistic graph model) are tightly coupled. As such, there is no need to move the data out of the platform to run analytics. Also, since the analytics model is tightly integrated with the data, the data may be expressed both deterministically and probabilistically, which speeds up computation.
As noted above, the system 100 also provides probabilistic querying capabilities by generating a probabilistic representation of the data, which turns the knowledge graph into a probabilistic database. This enables users to obtain answers to complex ad-hoc conditional probabilistic queries regarding values of nodes (i.e., random variables) with uncertainty quantification (answers to queries are distributions, not single values). As a result, the user can perform a more meaningful and robust what-if analysis.
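A conditional probabilistic query against such a probabilistic database can be sketched as follows. This is a simplified illustration, assuming the joint distribution is available as an explicit table (in practice it would be derived from the probabilistic graph model); the variable names and probabilities are hypothetical. The key point is that the answer is a full distribution, not a single value.

```python
def conditional_query(joint, query_var, evidence):
    """Answer P(query_var | evidence) against an explicit joint
    distribution, given as a list of (assignment_dict, probability)
    pairs. Returns a normalized distribution over the query variable."""
    totals = {}
    z = 0.0
    for assignment, p in joint:
        # keep only assignments consistent with the evidence
        if all(assignment[k] == v for k, v in evidence.items()):
            value = assignment[query_var]
            totals[value] = totals.get(value, 0.0) + p
            z += p
    # normalize by the probability of the evidence
    return {value: p / z for value, p in totals.items()}

# Hypothetical joint over two binary variables.
joint = [
    ({"rain": True, "wet": True}, 0.27),
    ({"rain": True, "wet": False}, 0.03),
    ({"rain": False, "wet": True}, 0.07),
    ({"rain": False, "wet": False}, 0.63),
]
posterior = conditional_query(joint, "rain", {"wet": True})
```

Because the result quantifies uncertainty over every value of the queried node, the user can weigh alternatives in a what-if analysis rather than acting on a single point estimate.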
Another advantage provided by the system 100 is that unknown parameters are learned directly from the data. In particular, the probabilistic modelling and optimization engine 124 requires very little prior knowledge about the domain and lets the data speak for itself. Probability estimates are inferred directly from the data in an online learning manner (e.g., via Bayes' rule). Also, domain knowledge is taken into consideration when available, but is not required. It is noted that Bayesian networks have been used previously, but the parameters were derived by subject matter experts rather than being derived from the data itself.
The system 100 may also be used to perform automated “optimal” decision-making through simulation, optimization, and uncertainty quantification. This enables optimized decisions to be made over all plausible future realizations of the target outcome. Problems within the relevant domain may be compiled into an optimization problem with minimal user input, enabling simulations of outcomes to be executed and used to solve the optimization problem without the user needing to know how to perform the computations. It is noted that the various advantages described above have been provided for purposes of illustration, rather than by way of limitation.
Referring to
In
In
In
It is to be appreciated that the exemplary queries described with reference to
Referring to
In addition to single variable analysis, the probabilistic graph model may also facilitate anomaly analysis using conditional queries. For example, in
In addition to performing anomaly analysis based on a single variable (
Referring to
As described above, the probabilistic graph model 700 may be queried to infer data that may not be readily obtained from a knowledge graph or the data from which the knowledge graph was created. The examples described above have primarily related to queries designed to extract statistical inferences or other types of information about the domain represented by the probabilistic graph model. However, an additional capability provided by the digital twin generated in accordance with the present disclosure is the ability to use the probabilistic graph models to perform optimization under uncertainty. In the scenario represented by the probabilistic graph model 700, one such optimization may be the solution to the question “How much should the robot charge the battery?” This question can be formulated as an optimization problem because there is a tradeoff between under-charging the battery and over-charging the battery. In particular, if the battery is under-charged, the robot may run out of battery power and need to recharge during peak demand, and if the battery is over-charged, there is a risk that demand will arrive during charging. Moreover, this problem also has uncertainty since the time when the demand may arrive is unknown, although the probabilistic graph model may be used to derive a distribution for the arrival of the demand.
As shown in
Once the decision node and utility node have been added to the probabilistic graph model 700, the optimization problem may be defined, where the optimization seeks to find a value of the set point (e.g., the decision node 740) that maximizes the expected value of the utility node 770. This optimization problem may be expressed as:
- Given a Decision node (i.e., battery set point) A = {a1, ..., aK}
- And a Target Outcome node (i.e., throughput) X = {x1, ..., xN}
- And a Probabilistic Outcome Model (i.e., the probabilistic graph model 700) P(X|A)
- And a Utility function U : X → ℝ
Applying the principle of Maximum Expected Utility, a decision a* may be chosen that maximizes the expected utility:
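In its standard form, the Maximum Expected Utility decision is a* = argmax over a in A of Σᵢ P(xᵢ | a) U(xᵢ). A minimal sketch of this computation follows; the two-action battery scenario, its outcome probabilities, and the utility values are hypothetical stand-ins for quantities that would come from the probabilistic outcome model P(X|A) and the utility node.

```python
def maximum_expected_utility(actions, outcomes, p_outcome_given_action, utility):
    """Choose a* maximizing E[U(X) | a] = sum_x P(x | a) * U(x).
    `p_outcome_given_action[(x, a)]` stands in for the probabilistic
    outcome model P(X | A); `utility` is the utility function U."""
    def expected_utility(a):
        return sum(p_outcome_given_action[(x, a)] * utility[x] for x in outcomes)
    return max(actions, key=expected_utility)

# Hypothetical two-action, two-outcome battery set-point example.
actions = ["charge_to_60", "charge_to_90"]
outcomes = ["demand_met", "demand_missed"]
p = {
    ("demand_met", "charge_to_60"): 0.70, ("demand_missed", "charge_to_60"): 0.30,
    ("demand_met", "charge_to_90"): 0.85, ("demand_missed", "charge_to_90"): 0.15,
}
u = {"demand_met": 10.0, "demand_missed": -5.0}
best = maximum_expected_utility(actions, outcomes, p, u)
```

With these illustrative numbers, the expected utilities are 5.5 and 7.75 respectively, so the higher set point is chosen; in practice the conditional probabilities would be read from the probabilistic graph model rather than specified by hand.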
Once the optimization problem is defined, the probabilistic graph model 700 may be used to solve the optimization problem. The solution to the optimization problem may represent an optimized configuration of the decision variable within the probabilistic graph model. Since the probabilistic graph model represents a digital twin of the real world counterpart, the solution to the optimization problem may also represent an optimum configuration of the real world counterpart. As such, the solution to the optimization problem may be used to generate a control signal and the control signal may be transmitted to the real world counterpart. For example, in the example described above, the optimization problem seeks to optimize the set point, which represents a level of charge that the robot should achieve during a charging cycle or session. Transmitting the control signal to the real world counterpart of the robot may enable the real world counterpart to be operated in an optimal manner.
As described above with reference to
In
It is noted that the exemplary optimization problem described and illustrated with reference to
Referring to
As shown in
The region 950 may also enable the user to compare, analyze, and simulate portions of the model and view metric and state information for the model. For example, selection of the compare icon 952 may transition the user interface 900 to the view shown in
Referring to
Region 946 displays statistics associated with the robot shown in region 932. For example, region 946 may display a current status of the robot, a success rate of the robot, an average task duration, a current attachment used by the robot, and a last completed task. Region 948 may display information associated with movement activity of the robot. For example, as shown in region 948 of
Region 954 may display a portion of the knowledge graph corresponding to the robot (or other relevant portion of the domain depicted in the region 932). Region 960 provides a query builder that enables the user to generate queries using a series of interactive elements. For example, dropdown menus 961, 962, 963, 964 enable the user to configure queries that may be run against a probabilistic graph model corresponding to the knowledge graph displayed in the region 954. As an example, dropdown menu 961 may enable the user to configure the robot, such as to select an attachment for the robot; dropdown menu 962 may enable the user to select a particular attachment for the robot from among a set of available attachments; dropdown menu 963 may enable the user to configure a test for the robot, such as selecting a task type to test for the robot; and dropdown menu 964 may enable the user to select a particular task from among a set of available tasks. The query builder may also enable the user to add additional features to the query and may specify a target for the query via dropdown menu 965. Once the query is configured, the user may activate an interactive element (e.g., the run button below the dropdown menu 965) to execute the query against the probabilistic graph model. The distribution returned by the query may be displayed in the region 966. It is noted that the dropdown menus shown in the query builder of region 960 may be populated using information derived from the knowledge graph and/or a probabilistic graph model derived from the knowledge graph in accordance with the concepts disclosed herein. Moreover, while the interactive elements shown in region 960 include dropdown menus, it is to be understood that such elements have been provided for purposes of illustration, rather than by way of limitation and that other types of interactive elements and controls may be used to build queries in an intuitive manner using the concepts disclosed herein.
Referring to
Similar to the region 960 of the “Analytics” interface of
As described above with reference to
As shown above with reference to
Additionally, creating the knowledge graphs and probabilistic graph models in the manner described herein results in a tight coupling of the data and knowledge graph/probabilistic model, enabling analytics and simulations to be run from a single platform while providing the ability to express the data both deterministically and probabilistically. It is to be appreciated that the probabilistic querying capabilities provided by the probabilistic graph models extend knowledge graphs and enable development of complex ad-hoc conditional probabilistic queries with uncertainty quantification. Such capabilities enable a more robust what-if analysis that extracts meaningful data from the probabilistic graph model and also enables automation of “optimal” decision-making, as described with reference to
Referring to
The method 1000 demonstrates the process of creating the digital twin in accordance with the concepts described above with reference to
At step 1040, the method 1000 includes converting, by the one or more processors, the knowledge graph to a probabilistic graph model. As explained above with reference to
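The conversion step can be sketched as deriving one random variable per knowledge-graph node and one statistical dependency per edge, with distribution parameters left unknown until the learning step. The sketch below is illustrative: the node names, the choice of distribution family, and the `knowledge_graph_to_pgm` helper are hypothetical, and a production conversion would select distribution families from the ontology rather than a fixed placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RandomVariable:
    name: str
    distribution: str                      # e.g., "bernoulli", "gaussian"
    parents: list = field(default_factory=list)
    parameters: Optional[dict] = None      # None until learned from data

def knowledge_graph_to_pgm(nodes, edges):
    """Derive the random variables of a probabilistic graph model from a
    knowledge graph: each node becomes a random variable and each edge
    becomes a statistical dependency (a parent-child relationship).
    The distribution family here is a placeholder for illustration."""
    variables = {n: RandomVariable(n, "bernoulli") for n in nodes}
    for parent, child in edges:
        variables[child].parents.append(parent)
    return variables

pgm = knowledge_graph_to_pgm(
    nodes=["battery_level", "demand", "throughput"],
    edges=[("battery_level", "throughput"), ("demand", "throughput")],
)
```

Each resulting random variable carries an empty parameter slot, which is what the subsequent learning step fills in directly from the domain data.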
As described above, the method 1000 supports generation of probabilistic graph models in an ontology driven manner, thereby enabling the method 1000 to be applied across any domain, rather than being domain specific as in some presently available tools for developing digital twins. Additionally, by exploiting knowledge graphs having integrated domain data, the method 1000 is able to rapidly define the probabilistic graph model and its random variables and then learn the unknown parameters from the data directly without the use of computationally expensive computations, such as Equation 3. Additionally, the final probabilistic graph model extends the knowledge graph and enables the data of the knowledge graph to be expressed both deterministically and probabilistically, which enables the final probabilistic graph model to be queried using conditional probability queries with uncertainty quantification and automated “optimal” decision making. Furthermore, due to the tight coupling of the data and final probabilistic graph model, analytics may be obtained from the final probabilistic graph model itself, rather than requiring the data to be moved to another platform, as is required for some digital twins platforms.
It is noted that other types of devices and functionality may be provided according to aspects of the present disclosure and discussion of specific devices and functionality herein have been provided for purposes of illustration, rather than by way of limitation. It is noted that the operations of the method 1000 of
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The components, functional blocks, and modules described herein with respect to
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure. Skilled artisans will also readily recognize that the order or combination of components, methods, or interactions that are described herein are merely examples and that the components, methods, or interactions of the various aspects of the present disclosure may be combined or performed in ways other than those illustrated and described herein.
The various illustrative logics, logical blocks, modules, circuits, and algorithm processes described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. The interchangeability of hardware and software has been described generally, in terms of functionality, and illustrated in the various illustrative components, blocks, modules, circuits and processes described above. Whether such functionality is implemented in hardware or software depends upon the particular application and design constraints imposed on the overall system.
The hardware and data processing apparatus used to implement the various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, or any conventional processor, controller, microcontroller, or state machine. In some implementations, a processor may also be implemented as a combination of computing devices, such as a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In some implementations, particular processes and methods may be performed by circuitry that is specific to a given function.
In one or more aspects, the functions described may be implemented in hardware, digital electronic circuitry, computer software, firmware, including the structures disclosed in this specification and their structural equivalents, or any combination thereof. Implementations of the subject matter described in this specification also may be implemented as one or more computer programs, that is, one or more modules of computer program instructions, encoded on a computer storage media for execution by, or to control the operation of, data processing apparatus.
If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. The processes of a method or algorithm disclosed herein may be implemented in a processor-executable software module which may reside on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that may be enabled to transfer a computer program from one place to another. A storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, such computer-readable media can include random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection may be properly termed a computer-readable medium. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, hard disk, solid state disk, and Blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and instructions on a machine readable medium and computer-readable medium, which may be incorporated into a computer program product.
Various modifications to the implementations described in this disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to some other implementations without departing from the spirit or scope of this disclosure. Thus, the claims are not intended to be limited to the implementations shown herein, but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein.
Additionally, a person having ordinary skill in the art will readily appreciate that the terms “upper” and “lower” are sometimes used for ease of describing the figures, and indicate relative positions corresponding to the orientation of the figure on a properly oriented page, and may not reflect the proper orientation of any device as implemented.
Certain features that are described in this specification in the context of separate implementations also may be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation also may be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Further, the drawings may schematically depict one or more example processes in the form of a flow diagram. However, other operations that are not depicted may be incorporated in the example processes that are schematically illustrated. For example, one or more additional operations may be performed before, after, simultaneously, or between any of the illustrated operations. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products. Additionally, some other implementations are within the scope of the following claims. In some cases, the actions recited in the claims may be performed in a different order and still achieve desirable results.
As used herein, including in the claims, various terminology is for the purpose of describing particular implementations only and is not intended to be limiting of implementations. For example, as used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). The term “coupled” is defined as connected, although not necessarily directly, and not necessarily mechanically; two items that are “coupled” may be unitary with each other. The term “or,” when used in a list of two or more items, means that any one of the listed items may be employed by itself, or any combination of two or more of the listed items may be employed. For example, if a composition is described as containing components A, B, or C, the composition may contain A alone; B alone; C alone; A and B in combination; A and C in combination; B and C in combination; or A, B, and C in combination. Also, as used herein, including in the claims, “or” as used in a list of items prefaced by “at least one of” indicates a disjunctive list such that, for example, a list of “at least one of A, B, or C” means A or B or C or AB or AC or BC or ABC (that is A and B and C) or any of these in any combination thereof. The term “substantially” is defined as largely but not necessarily wholly what is specified – and includes what is specified; e.g., substantially 90 degrees includes 90 degrees and substantially parallel includes parallel – as understood by a person of ordinary skill in the art.
In any disclosed aspect, the term “substantially” may be substituted with “within [a percentage] of” what is specified, where the percentage includes 0.1, 1, 5, and 10 percent; and the term “approximately” may be substituted with “within 10 percent of” what is specified. The phrase “and/or” means and or.
Although the aspects of the present disclosure and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular implementations of the process, machine, manufacture, composition of matter, means, methods and processes described in the specification. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or operations, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized according to the present disclosure. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or operations.
Claims
1. A method for creating digital twins, the method comprising:
- obtaining, by one or more processors, a dataset, wherein the dataset comprises an ontology and domain data corresponding to a domain associated with the ontology;
- constructing, by the one or more processors, a knowledge graph based on the ontology;
- incorporating, by the one or more processors, at least a portion of the domain data into the knowledge graph;
- converting, by the one or more processors, the knowledge graph to a probabilistic graph model, wherein the probabilistic graph model comprises probability distributions having unknown parameters;
- learning, by the one or more processors, the unknown parameters based on the portion of the domain data; and
- updating, by the one or more processors, the probabilistic graph model based on the learning, wherein the updating replaces the unknown parameters with parameters obtained via the learning.
2. The method of claim 1, further comprising:
- generating a generative program based on the probabilistic graph model; and
- generating a set of guessed parameters corresponding to the unknown parameters, wherein the learning comprises: feeding a set of data and the set of guessed parameters to the generative program to produce simulation data; and determining whether the simulation data approximates a set of real data to within a threshold tolerance, wherein the probabilistic graph model is updated to include the parameters obtained via the learning when the simulation data approximates the set of real data to within the threshold tolerance, and wherein the parameters obtained via the learning correspond to the set of guessed parameters.
3. The method of claim 2, further comprising:
- modifying the set of guessed parameters in response to a determination that the simulation data does not approximate the set of real data to within the threshold tolerance; and
- iteratively performing the feeding, the determining, and the modifying until the simulation data approximates the set of real data to within the threshold tolerance.
4. The method of claim 2, wherein the generative program comprises deterministic statements and probabilistic statements corresponding to the probability distributions.
5. The method of claim 1, further comprising exploiting the knowledge graph to obtain random variables, each of the random variables corresponding to one of the probability distributions.
6. The method of claim 5, further comprising obtaining a joint probability distribution of the probabilistic graph model based on the probability distributions for each of the random variables.
7. The method of claim 6, further comprising:
- detecting a missing value in the domain data; and
- updating the domain data to include a value for the missing value based on a probability distribution corresponding to the missing value derived from known values associated with the missing value.
9. The method of claim 1, further comprising:
- generating a query associated with at least one random variable of the probabilistic graph model; and
- running the query against the probabilistic graph model to obtain a query result, wherein the query result comprises a distribution associated with the at least one random variable.
10. The method of claim 1, wherein the knowledge graph represents semantic relationships and the probabilistic graph model represents statistical dependencies.
11. A system for creating digital twins, the system comprising:
- a memory; and
- one or more processors communicatively coupled to the memory, the one or more processors configured to: obtain a dataset, wherein the dataset comprises an ontology and domain data corresponding to a domain associated with the ontology; construct a knowledge graph based on the ontology; incorporate at least a portion of the domain data into the knowledge graph; convert the knowledge graph to a probabilistic graph model, wherein the probabilistic graph model comprises probability distributions having unknown parameters; learn the unknown parameters based on the portion of the domain data; and update the probabilistic graph model based on the learning, wherein the update replaces the unknown parameters with parameters obtained via the learning.
12. The system of claim 11, the one or more processors configured to:
- provide a graphical user interface providing query building functionality;
- receive inputs associated with random variables of the probabilistic graph model, each of the random variables corresponding to one of the probability distributions; and
- generate a query based on the received inputs.
13. The system of claim 12, wherein the query comprises a conditional probabilistic query regarding values of one or more of the random variables with uncertainty quantification.
14. The system of claim 11, wherein the probabilistic graph model comprises a plurality of nodes and edges connecting at least some of the plurality of nodes to one or more other nodes.
15. The system of claim 14, wherein the probability distributions correspond to random variables, and wherein each of the random variables corresponds to a node of the plurality of nodes or one of the edges.
16. The system of claim 14, wherein updating the probabilistic graph model comprises embedding the probability distributions into the probabilistic graph model.
17. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations for generating probabilistic graph models, the operations comprising:
- obtaining a dataset, wherein the dataset comprises an ontology and domain data corresponding to a domain associated with the ontology;
- constructing a knowledge graph based on the ontology;
- incorporating at least a portion of the domain data into the knowledge graph;
- converting the knowledge graph to a probabilistic graph model, wherein the probabilistic graph model comprises probability distributions having unknown parameters;
- learning the unknown parameters based on the portion of the domain data; and
- updating the probabilistic graph model based on the learning, wherein the updating replaces the unknown parameters with parameters obtained via the learning.
18. The non-transitory computer-readable storage medium of claim 17, the operations further comprising:
- obtaining additional domain data; and
- performing additional learning based on the additional domain data.
19. The non-transitory computer-readable storage medium of claim 18, the operations further comprising modifying at least one parameter obtained via the learning based on at least one parameter obtained via the additional learning.
20. The non-transitory computer-readable storage medium of claim 17, wherein the probabilistic graph model represents a digital twin of a real world counterpart, the operations comprising:
- generating a query associated with a random variable of the probabilistic graph model, the random variable corresponding to one of the probability distributions;
- running the query against the probabilistic graph model to obtain a query result, wherein the query result comprises a distribution associated with the random variable;
- generating a control signal based on the query result; and
- transmitting the control signal to the real world counterpart.
Type: Application
Filed: Feb 25, 2022
Publication Date: Aug 31, 2023
Inventors: Zaid Tashman (San Francisco, CA), Matthew Kujawinski (San Jose, CA), Neda Abolhassani (San Mateo, CA), Sanjoy Paul (Sugar Land, TX), Thien Quang Nguyen (San Jose, CA), Eric Annong Tang (Fremont, CA), Jessica Huey-Jen Yeh (Sunnyvale, CA)
Application Number: 17/681,699