TEXT-BASED INFERENCE CHAINING
A method, system and computer program product for generating inference graphs over content to answer input inquiries. First, independent factors are produced from the inquiry, and these factors are converted to questions. The questions are then input to a probabilistic question answering (PQA) system that discovers relations, which are used to iteratively expand an inference graph starting from the factors and ending with possible answers. A probabilistic reasoning system infers the confidence in each answer by, for example, propagating confidences across relations and nodes in the inference graph as it is expanded. The inference graph generator system can simultaneously generate forward and backward inference graphs bi-directionally, using a depth controller component to limit the generation of both graphs if they do not meet. Otherwise, a joiner process forces the discovery of relations that may join the answers to factors in the inquiry.
The present disclosure generally relates to information retrieval, and more specifically, automated systems that provide answers to questions or inquiries.
Generally, there are many types of information retrieval and question answering systems, including expert or knowledge-based (KB) systems, document or text search/retrieval systems and question answering (QA) systems.
Expert or knowledge-based systems take in a formal query or map natural language to a formal query and then produce a precise answer and a proof justifying the answer based on a set of formal rules encoded by humans.
Document or text search systems are not designed to deliver and justify precise answers. Rather they produce snippets or documents that contain key words or search terms entered by a user, for example, via a computing system interface, e.g., a web-browser. There is no expectation that the results provide a solution or answer. Text search systems are based on the prevailing and implicit assumption that all valid results to a query are documents or snippets that contain the keywords from the query.
QA systems provide a type of information retrieval. Given a collection of documents (such as the World Wide Web or a local collection), a QA system may retrieve answers to questions posed in natural language. QA is regarded as requiring more complex natural language processing (NLP) techniques than other types of information retrieval, such as document retrieval, and QA is sometimes regarded as the next step beyond search engines.
Traditional QA systems deliver precise answers, unlike document search systems, but do not produce paths of justifications like expert systems. Their justifications are “one-step” meaning that they provide an answer by finding one or more passages that alone suggest that proposed or candidate answer is correct.
It would be highly desirable to provide a system and method that can answer complex inquiries that search systems, classic expert/KB systems and simpler QA systems cannot handle.
SUMMARY
Embodiments of the invention provide a method, system and computer program product that can answer complex inquiries that search systems, classic expert/KB systems and simpler QA systems cannot handle.
In one aspect, there is provided a system, method and computer program product for inferring answers to inquiries. The method comprises: receiving an input inquiry; decomposing the input inquiry to obtain one or more factors, the factors forming initial nodes of an inference graph; iteratively constructing the inference graph over content from one or more content sources, wherein, at each iteration, a processing device discovers solutions to the input inquiry by connecting factors to solutions via one or more relations, each relation in an inference graph being justified by one or more passages from the content, the inference graph connecting factors to the solutions over one or more paths having one or more edges representing the relations; and, providing a solution to the inquiry from the inference graph, wherein a programmed processor device is configured to perform one or more of the receiving, the decomposing and the iteratively constructing the inference graph to provide the solution.
In a further aspect, a method of inferring answers to inquiries comprises: receiving an input inquiry; decomposing the input inquiry to obtain one or more factors; decomposing the input inquiry into query terms, and using the query terms to obtain one or more candidate answers for the input inquiry; iteratively constructing using a programmed processor device coupled to a content storage source having content, a first inference graph using the factors as initial nodes of the first inference graph, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for the inquiry over one or more paths having one or more edges representing the relations; simultaneously iteratively constructing, using the programmed processor device and the content source, a second inference graph using the one or more candidate answers as initial nodes of the second inference graph, the second inference graph connecting candidate answers to one or more nodes that connect to the one or more factors of the inquiry over one or more paths having one or more edges representing relations; and, generating, during the simultaneous iterative constructing, a final inference graph by joining the first inference graph to the second inference graph, the final inference graph having a joined node representing a solution to the input inquiry.
In a further aspect, a system for inferring answers to inquiries comprises: one or more content sources providing content; a processor device for coupling to the content sources and configured to: receive an input inquiry; decompose the input inquiry to obtain one or more factors, the factors forming initial nodes of an inference graph; iteratively construct the inference graph over content from one or more content sources, wherein at each iteration the processing device discovers solutions to the input inquiry by connecting factors to solutions via one or more relations, each relation in an inference graph being justified by one or more passages from the content, the inference graph connecting factors to the solutions over one or more paths having one or more edges representing the relations; and, provide a solution to the inquiry from the constructed inference graph.
In a further aspect, there is provided a system for inferring answers to inquiries comprising: one or more content sources providing content; a programmed processor device for coupling to the content sources and configured to: receive an input inquiry; decompose the input inquiry to obtain one or more factors; and, decompose the input inquiry into query terms, and using the query terms to obtain one or more candidate answers for the input inquiry; iteratively construct a first inference graph using the factors as initial nodes of the first inference graph, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for the inquiry over one or more paths having one or more edges representing the relations; simultaneously iteratively construct a second inference graph using the one or more candidate answers as initial nodes of the second inference graph, the second inference graph connecting candidate answers to one or more nodes that connect to the one or more factors of the inquiry over one or more paths having one or more edges representing relations; and, generate, during the simultaneous iterative constructing, a final inference graph by joining the first inference graph to the second inference graph, the final inference graph having a joined node representing a solution to the input inquiry.
A computer program product is provided for performing operations. The computer program product includes a storage medium readable by a processing circuit and storing instructions run by the processing circuit for running methods. The methods are the same as listed above.
The objects, features and advantages of the invention are understood within the context of the Detailed Description, as set forth below. The Detailed Description is understood within the context of the accompanying drawings, which form a material part of this disclosure.
The present disclosure is directed to an automated reasoning system and, particularly, to an inference graph generator system and methodology for automated answering of complex inquiries that is fundamentally different from prior expert systems, knowledge-based systems, and automated reasoning systems.
In one aspect, the inference graph generator system and methodology may function entirely over unstructured content (e.g., text) and, unlike prior systems, does not require the manual encoding of domain knowledge in the form of formal rules (if-then), axioms or procedures of any kind. Rather, the system and method discover paths from the inquiry to answers by discovering, assessing and assembling justifications from as-is natural language content. Such content is written for humans by humans, never requiring a knowledge engineer to formalize knowledge for the computer. This makes the system and method a powerful reasoning system.
The inference graph generator system and methodology operates by providing an explanation of a precise answer based on an inference graph that provides a multi-step path from elements in the query to answers or solutions.
The inference graph generator system and methodology discovers and justifies a multi-step path from the query to precise answers by iteratively leveraging a probabilistic text-based QA system component and a general probabilistic reasoner component. The present system and method combines these components to produce justified inference graphs over natural language content.
More particularly, as described in greater detail herein below, in one embodiment, the inference graph generator system and methodology combines probabilistic QA to discover answers and justifications with Bayesian-type inference to propagate confidence, building inference graphs that justify multi-step paths from factors to answers.
As will be referred to herein, the following definitions are provided:
A Natural Language Inquiry is a statement or question in unrestricted natural language (e.g. English) that describes a problem, case or scenario in search of an answer or solution. One example is a simple question in search of a simple answer like “This man sailed across the Atlantic to India and discovered America.” or “Who sailed across the Atlantic . . . . ?” A further example includes a complex description of problems like a patient's history where a diagnosis, treatment or other result is sought after. For example: A 40-year-old female has pain on and off after eating fatty food. She has pain in the epigastric region and sometimes on the right side of her abdomen. After assessing the patient you order ultrasound of the gallbladder. The ultrasound shows presence of gallstones (choledocholithiasis) but no evidence of cholecystitis. The patient goes for an elective cholecystectomy. Pathological examination of the gallbladder showed 3 mixed types of gallstones. The gallbladder mucosa is expected to reveal what change?
A Factor is a logically independent element of an inquiry. Examples are: “sailed across the Atlantic”, “discovered America”, “Patient is 40 years old”, “has pain on and off after eating fatty food”.
A Relation is a named association between two concepts. For general examples: A “indicates” B, A “causes” B, A “treats” B, A “activates” B, A “discovered” B. The concepts are considered the “arguments” or “end points” of the relation. Concepts are represented by named entities (Washington) or simply phrases (chain smoking). For domain-specific examples (in predicate-argument form): author of (Bram Stoker, Dracula), president of (Obama, US), causes (smoking, lung cancer), treats (aspirin, stroke).
A Question is a single sentence or phrase in natural language (e.g., English) or a formal language (e.g., first-order logic) that intends to ask for the end point(s) of a relation or to ask whether or not a relation between two concepts is true. One example is:
“What does aspirin treat?”/treat(aspirin, X)
“Does Aspirin treat Strokes?”/treat(aspirin, strokes).
A Statement is a natural language expression, a structured relation, or a semi-structured relation. Statements are often used to represent factors and may come from structured or unstructured content. Some non-limiting examples:
Patient's hemoglobin concentration is 9 g/dL
“low hemoglobin concentration” (Patient)
Has Condition(Patient, anemia)
The patient's mother was diagnosed with breast cancer at the age of 35
An Answer or Solution is an element of text—a word, number, phrase, sentence, passage or document. An answer is thought to be correct or partially correct with respect to a question or inquiry if a human considers it a useful response to the question or inquiry. In the case of a simple question or relation, the answer is typically the sought-after end-point of the relation; e.g., for “Who discovered America in 1492?” the answer is the missing concept X in the relation “X discovered America”.
Unstructured Content is textual data (e.g., books, journals, web pages, documents, etc.) and is typically used as a source for answers and as a source for justifications of those answers. It is further used to justify or evidence the answer to a question or, more specifically, the truth of a relation (note: non-text content may also be considered to determine this). More generally, unstructured content may refer to a combination of text, speech and images.
Structured Content is any database or knowledgebase where data is encoded as structured relations. A relational database is typical, as is a logic-based knowledgebase.
Content is any combination of unstructured and structured content.
Passage is a sequence of natural language text—one or more phrases, sentences or paragraphs. Passages are usually made up of 1-5 sentences.
Justifying Passage is a passage thought to explain or justify why an answer may be correct to a given question.
Confidence is an indication of the degree to which a relation is believed true, e.g., a measure of certainty or probability that a relation is true. It is usually represented as a number. It may but does not necessarily have to represent a probability.
An Inference Graph is any graph represented by a set of nodes connected by edges, where the nodes represent statements and the edges represent relations between statements. Each relation may be associated with a confidence, and each concept in a relation may be associated with a confidence. Each edge is associated with a set of passages providing a justification for why that relation may be true. Each passage justifying an edge may be associated with a confidence indicating how likely the passage justifies the relation. An inference graph is used to represent relation paths between factors in an inquiry and possible answers to that inquiry. An inference graph is multi-step if it contains more than one edge in a path from a set of factors to an answer. In one embodiment, graph nodes, edges/attributes (confidences), statements and relations may be represented in software as Java objects, with confidences, strengths, and probabilities attached to them for processing by various computer systems.
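As one non-limiting illustration of such a software representation, the following is a minimal Java sketch of inference graph nodes and edges with attached confidences and justifying passages; the class and field names (InferenceGraph, Node, Edge, justifyingPassages) are illustrative assumptions rather than the names used in any particular embodiment.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of an inference graph: nodes hold statements and confidences,
// edges hold a relation name, a confidence, and the passages justifying the relation.
public class InferenceGraph {
    public static class Node {
        final String statement;          // e.g., "Patient has resting tremor"
        double confidence;               // degree of belief, typically in [0, 1]
        Node(String statement, double confidence) {
            this.statement = statement;
            this.confidence = confidence;
        }
    }
    public static class Edge {
        final Node from, to;
        final String relation;           // e.g., "indicates", "causes"
        final double confidence;         // confidence that the relation holds
        final List<String> justifyingPassages = new ArrayList<>();
        Edge(Node from, Node to, String relation, double confidence) {
            this.from = from; this.to = to;
            this.relation = relation; this.confidence = confidence;
        }
    }
    final List<Node> nodes = new ArrayList<>();
    final List<Edge> edges = new ArrayList<>();

    Node addNode(String statement, double confidence) {
        Node n = new Node(statement, confidence);
        nodes.add(n);
        return n;
    }
    Edge addEdge(Node from, Node to, String relation, double confidence, String passage) {
        Edge e = new Edge(from, to, relation, confidence);
        e.justifyingPassages.add(passage);
        edges.add(e);
        return e;
    }

    public static void main(String[] args) {
        InferenceGraph g = new InferenceGraph();
        Node factor = g.addNode("Patient has resting tremor", 1.0);   // factors start as asserted
        Node answer = g.addNode("Parkinson's Disease", 0.0);          // confidence inferred later
        g.addEdge(factor, answer, "indicates", 0.8,
                "Resting tremor is characteristic of Parkinson's Disease.");
        System.out.println(g.nodes.size() + " node(s), " + g.edges.size() + " edge(s)");
    }
}
```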
A PQA System (Probabilistic QA System) is any system or method that produces answers to questions and may associate those answers with confidences indicating the likelihood the answers are correct, and that may associate answers with passage-based justifications that are intended to explain to humans why the answer is likely correct.
In one aspect, the text-based inference chaining system and method 100 discovers and justifies answers to inquiries by constructing inference graphs over content connecting factors to answers, such that each relation in an inference graph is justified by one or more passages from the content and the inference graph may connect factors to answers over a path containing one or more edges (i.e., a multi-step inference graph).
At the start of the iteration(s), from the generated initial inference graph 110I (or a generated updated inference graph 110U to be extended in a subsequent iteration), a question generator 112 implements a programmed process to first generate questions for the PQA system 115 to answer. As revised inference graphs are generated at each iteration, new questions may be generated for the PQA system to answer. Particularly, at each iteration, for every new end-point of every new relation in the inference graph, the question generator 112 formulates one or more questions for the PQA system to answer. The question generator component 112 is described in greater detail herein below.
The PQA system 115 performs processes to obtain or discover new relations 116 that answer the questions from the structured or unstructured content 105. The discovered new relations 116 additionally include confidences and may be stored as data in a storage device 117 which may be or include the storage device 107.
As further shown in the figures, a graph extender 118 receives the discovered new relations 116 and extends the inference graph with them.
More particularly, the graph extender 118 takes as input the previous inference graph 110P and a set of new relations 116 discovered by the PQA component and outputs a new inference graph 110E that includes the new relations. It performs this by merging nodes in the input inference graphs with nodes in the new relations and adding them to the graph. An example follows:
Input inference graph: A→B→C
Input new relations: C1→D
Output: A→B→(C/C1)→D
where C and C1 were merged (considered the same node). The computed confidence on (C/C1)→D is the same confidence produced by the PQA system 115 for the answer to the question about C that produced C1→D.
In one embodiment, merging nodes may be implemented using some form of “specialization”. For example, if C was “diabetes” and D was “blindness”, the question generated was “What causes blindness?”, and the PQA system produces the relation “diabetes mellitus causes blindness”, then the graph extender 118 would merge “diabetes” with “diabetes mellitus”. In this case, the embodiment may only merge nodes if they are identical or if the answer is connected to a more specific concept. Thus, “diabetes” would merge with “diabetes” or with “diabetes mellitus”. At this point, confidences are not re-propagated over the extended graph 110E, as this is performed by the reasoner component 150.
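The following is a minimal, self-contained Java sketch of this graph-extension and merge step under the simple specialization heuristic described above (merge when the labels are identical or the new concept's label begins with the existing node's label, as with “diabetes” and “diabetes mellitus”); the class, method and data-structure choices are illustrative assumptions only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the graph-extender merge step: a newly discovered answer concept is merged with
// an existing node if it names the same or a more specific concept; otherwise a new node is added.
public class GraphExtenderSketch {

    // existing node text -> (neighbor text -> edge confidence)
    static final Map<String, Map<String, Double>> graph = new LinkedHashMap<>();

    static boolean sameOrMoreSpecific(String existing, String candidate) {
        String a = existing.toLowerCase(), b = candidate.toLowerCase();
        return a.equals(b) || b.startsWith(a);      // simple specialization heuristic
    }

    static void extend(String questionNode, String answerConcept, double pqaConfidence) {
        // find an existing node the answer concept can merge with; otherwise add a new node
        String target = graph.keySet().stream()
                .filter(n -> sameOrMoreSpecific(n, answerConcept))
                .findFirst().orElse(answerConcept);
        graph.computeIfAbsent(questionNode, k -> new LinkedHashMap<>())
             .put(target, pqaConfidence);           // edge carries the PQA answer confidence
        graph.computeIfAbsent(target, k -> new LinkedHashMap<>());
    }

    public static void main(String[] args) {
        graph.put("blindness", new LinkedHashMap<>());
        graph.put("diabetes", new LinkedHashMap<>());
        // PQA answered "What causes blindness?" with "diabetes mellitus" at confidence 0.7
        extend("blindness", "diabetes mellitus", 0.7);
        System.out.println(graph);   // the answer merged into the existing "diabetes" node
    }
}
```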
As shown in the figures, the extended inference graph 110E is input to a reasoner component 150, which propagates confidences over the nodes and relations of the graph. The reasoner component 150 is described in greater detail herein below.
Returning to the overall system, the updated inference graph 110U is evaluated by a depth controller component 175, which determines whether the iteration should continue or terminate. The depth controller component 175 is described in greater detail herein below.
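A high-level sketch of one possible control loop tying together the question generator, PQA system, graph extender, reasoner and depth controller is shown below; every interface and method name here is a hypothetical stand-in for the corresponding component described in this disclosure, not an actual API.

```java
import java.util.List;

// High-level sketch of the iterative inference-graph construction loop.
// Every interface is a stand-in for the corresponding component described in the text.
public class InferenceChainingLoopSketch {

    interface Graph { List<String> newEndPoints(); }                       // nodes added last iteration
    interface QuestionGenerator { List<String> generate(List<String> nodes); }
    interface PqaSystem { List<String[]> answer(List<String> questions); } // discovered relations + confidences
    interface GraphExtender { Graph extend(Graph g, List<String[]> relations); }
    interface Reasoner { Graph propagateConfidences(Graph g); }
    interface DepthController { boolean shouldTerminate(Graph g, int iteration); }

    static Graph buildInferenceGraph(Graph initial,
                                     QuestionGenerator qgen, PqaSystem pqa,
                                     GraphExtender extender, Reasoner reasoner,
                                     DepthController depth) {
        Graph current = initial;
        int iteration = 0;
        while (!depth.shouldTerminate(current, iteration)) {
            List<String> questions = qgen.generate(current.newEndPoints()); // questions from new end-points
            List<String[]> relations = pqa.answer(questions);               // relations justified by passages
            current = extender.extend(current, relations);                   // merge new relations into the graph
            current = reasoner.propagateConfidences(current);                // update node confidences
            iteration++;
        }
        return current;   // answers are read from the highest-confidence non-factor nodes
    }
}
```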
Returning to the overall system, a relation type injection component 130 may further be provided. Generally, the relation type injection component 130 receives the initial inference graph 110I, and considers the inquiry and the set of initial factors 106 to determine a set of seed relations or relation types 135 for use by the question generation component 112. The question generation component 112 is parameterized to allow for the independent provision of a set of relation types 135. These are then used as seeds for generating questions for the PQA system 115.
As shown in the figures, an example input inquiry 92 in the medical domain may be:
-
- A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago. At first it was only on the left hand, but now it compromises the whole arm. At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest. What part of his nervous system is most likely affected?
As shown, the factors 94 generated by the inference chaining system and method may include the following:
63-year-old
Resting tremor began 2 years ago
. . . in the left hand but now the whole arm
Unexpressive face
Difficulty in walking
Continuous movement in the left hand
In a first iteration of the inference chaining method, factors 94 obtained from the input query may be found associated with (i.e., relate to) inferred nodes 95, e.g., Parkinson's Disease 95A, or Athetosis 95B. From inferred node 95B, further answers 95C, 95D may be inferred from additional relations obtained in a further iteration of the inference chaining method. For each of the factors found for the medical domain example, a respective relation that associates the factor to an answer is created and represented as an edge in the inference graph. For example, for each of the following factors 94A in the medical domain example relating to an inferred answer Parkinson's Disease:
63-year-old
Resting tremor began 2 years ago
. . . Unexpressive face
the following relations, corresponding to respective justifying passages and represented by respective edges of the inference graph found at a first inference chaining iteration, are listed below.
Edge: 96A indicates Parkinson's Disease by a discovered example justifying passage: “The mean age of onset of Parkinson's Disease is around 60 years.”
Edge: 96B indicates Parkinson's Disease by a discovered example justifying passage: “Resting tremor is characteristic of Parkinson's Disease.”
Edge: 96C indicates Parkinson's Disease by a discovered example justifying passage: “Parkinson's disease: A slowly progressive neurologic disease that is characterized by a fixed inexpressive face . . . ”
Further in the medical domain example, in a first iteration of the inference chaining method, factors 94B may each be found associated with (i.e., relate to) a node 95B, e.g., Athetosis. For example, for each of the following factors 94B in the medical domain example relating to answer Athetosis:
Difficulty in walking
Continuous movement in the left hand
the following relations, corresponding to respective justifying passages and represented by respective inference graph edges, are listed below.
Edge: 96D indicates Athetosis by a discovered example justifying passage: “Patients suffering from athetosis often have trouble in daily activities such as eating, walking, and dressing”
Edge: 96E indicating Athetosis by a discovered example justifying passage: “Athetosis is defined as a slow, continuous, involuntary writhing movement that prevents the individual from maintaining a stable posture.”
Further in the medical domain example, in a subsequent iteration of the inference chaining method, the inferred nodes 95 may themselves become new factors from which new questions are generated and new relations discovered.
For example, inferred node 95B Athetosis becomes a new factor from which new questions are generated and new relations 97A and 97B inferred from PQA/reasoner implementation leading to new inferred nodes, Basal Ganglia 95C and Striatum 95D. The following are relations represented by respective inference graph edges based on the newly discovered nodes 95C, 95D:
Edge: 97A indicating Basal Ganglia 95C by a discovered example justifying passage: “Athetosis is a symptom primarily caused by the marbling, or degeneration, of the basal ganglia.” In one embodiment, this discovered relation may have resulted from injecting a “caused by” or “affects” relation in a relation injection process.
Edge: 97B indicating Striatum 95D by a discovered example justifying passage: “Lesions to the brain, particularly to the corpus striatum, are most often the direct cause of the symptoms of athetosis.” In one embodiment, this discovered relation may have resulted from injecting a “caused by” relation in a relation injection process.
The thickness of node graph edges 97A, 97B indicates a confidence level in the answer (e.g., a probability), and the strength of the associated relation.
Further in the medical domain example, in subsequent iterations, additional relations leading to a node for Substantia nigra may be discovered, represented by the following inference graph edges:
Edge: 93A indicating Substantia nigra by example justifying passage: “Parkinson's disease is a neurodegenerative disease characterized, in part, by the death of dopaminergic neurons in the pars compacta of the substantia nigra.” This relation may have been discovered by injecting a “caused by” relation in a relation injection process.
Edge: 93B indicating Substantia nigra by example justifying passage: “The pars reticulata of the substantia nigra is an important processing center in the basal ganglia.” This relation may have been discovered by injecting a “contains” relation in a relation injection process.
Edge: 93C indicating Substantia nigra by example justifying passage: “Many of the substantia nigra's effects are mediated through the striatum.” This relation may have been discovered by injecting an “associated with” relation in a relation injection process.
Although not shown, it is assumed that from these inferred nodes 95 of the medical domain example, further relations leading to the candidate answers 98 may be discovered.
As shown, the substantial thickness of edges 93A and 93B relating to the candidate answer Substantia nigra 98D indicates corresponding associated scores having a higher confidence. Furthermore, the answer node Substantia nigra 98D is shown having a substantially thicker border compared to the other candidate answers 98 because the overall confidence score for Substantia nigra 98D is higher than that of the other candidate answers. As such, Substantia nigra 98D would be the most likely candidate answer to the question 92, as reflected by the check mark.
The NLP stack 210 components include, but are not limited to, relationship classification 210A, entity classification 210B, parsing 210C, sentence boundary detection 210D, and tokenization 210E processes. In other embodiments, the NLP stack 210 can be implemented by IBM's LanguageWare®, Slot Grammar as described in Michael C. McCord, “Using Slot Grammar,” IBM Research Report 2010, Stanford University's parser as described in Marie-Catherine de Marneffe, et al., “Generating Typed Dependency Parses from Phrase Structure Parses,” LREC 2006, or other such technology components.
Factor identification component 208 implements processes for selecting factors and may include a process that selects all the entities classified as symptoms, lab-tests or conditions by the NLP Stack 210. Factor weighting component 212 may implement such techniques as inverse document frequency (IDF) for producing weights for each of the factors.
Factor analysis component 104 identifies segments of the input inquiry text as “factors”. These may be terms, phrases or even entire sentences from the original input. A very simple implementation of factor identification, for example in the case of USMLE (United States Medical Licensing Examination®, see http://www.usmle.org/) questions, is that each sentence in the case is a factor.
In one embodiment, factor identification takes as input a natural language inquiry and produces an initial inference graph containing one or more nodes—these nodes are referred to as the factors. A factor is a statement that is asserted to be true in the natural language inquiry. For example, in the medical domain, the inquiry may provide several observations about a patient and then ask a specific question about that patient, as in:
-
- A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago. At first it was only on the left hand, but now it compromises the whole arm. At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest. What part of his nervous system is most likely affected?
The factor analysis component 104 may choose to generate factors at various levels of granularity. That is, it is possible for the text-based inference chaining system and method to use more than one factor identification component 208. The level of granularity is programmable so that: (1) questions can subsequently be generated for the PQA system from each factor, because the quality of the PQA system's answers may depend on the size and amount of information content in the question; and (2) the resulting inference graph can be used to explain to a user what factors were indicative of different candidate answers. For example, if the factors are very coarse grained, this explanation may have limited utility.
In one example, a factor analysis implementation might produce just one factor that contains all of the information in the inquiry. However, this level of granularity presents two problems: (1) the PQA system may not be as effective on a question that is generated from such a coarse-grained factor, and (2) even if a good answer can be produced, the resulting inference graph may not explain what part of the inquiry was most important in determining the decision, which is useful information for the user.
In a further factor analysis implementation example, the inquiry is divided by the sentences. In the above-identified medical domain example, the factor analysis component would produce three separate factors (initial nodes in the inference graph), with the following statements:
-
- 1) A 63-year-old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago.
- 2) At first it was only on the left hand, but now it compromises the whole arm.
- 3) At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest.
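A minimal sketch of this coarse, sentence-level factor identification is shown below, using a naive regular-expression sentence splitter in place of a full NLP stack; the class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of coarse, sentence-level factor identification: each asserted sentence of the
// inquiry (minus the final question sentence) becomes one factor / initial inference-graph node.
public class SentenceFactorSketch {

    static List<String> sentenceFactors(String inquiry) {
        List<String> factors = new ArrayList<>();
        // naive split on sentence-ending punctuation followed by whitespace
        for (String s : inquiry.split("(?<=[.?!])\\s+")) {
            String t = s.trim();
            if (!t.isEmpty() && !t.endsWith("?")) {   // drop the question itself; keep asserted statements
                factors.add(t);
            }
        }
        return factors;
    }

    public static void main(String[] args) {
        String inquiry = "A 63-year-old patient is sent to the neurologist with a clinical picture "
                + "of resting tremor that began 2 years ago. At first it was only on the left hand, "
                + "but now it compromises the whole arm. What part of his nervous system is most likely affected?";
        sentenceFactors(inquiry).forEach(f -> System.out.println("FACTOR: " + f));
    }
}
```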
To produce more fine-grained factors, natural language processing (NLP) components such as parsers, entity recognizers, relation detectors, and co-reference resolvers could be used. One use case for a co-reference resolver is in the example of second factor 2) above, where it would be important to know that the word “it” refers to the “tremor”. Named entity recognizers are implemented to identify mentions of important domain concepts, such as symptoms in the medical domain. Relation detectors, often based on the parser output, can be used to identify if those concepts are attributed to the patient. A factor analysis component 104 implementation based on such NLP analysis might then produce factors such as:
-
- 1) Patient is 63-years old
- 2) Patient has resting tremor
- 3) Tremor began 2 years ago
- 4) Tremor was only on the left hand, but now it compromises the whole arm
- 5) Patient has unexpressive face
- 6) Patient has difficulty in walking
- 7) Continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest.
As further shown, the factor weighting component 212 is useful because some factors may be more important than others in finding and scoring an answer. Various techniques are possible for initializing the confidence weighting of each factor. For example, the factor with the most unique terms relative to the domain may be given a higher weight than other factors. Known techniques including inverse document frequency (IDF) can be used for producing weights for each of the factors. As shown, the resulting set of factors 215 is generated after the factor analysis process is complete, each factor representing an initial node 106 in an initial inference graph 110I.
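The following is a minimal Java sketch of IDF-style factor weighting over a small in-memory corpus; a real implementation would compute document frequencies over a full domain corpus, and the names used here are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of inverse-document-frequency (IDF) weighting for factors: factors whose terms
// are rare in the domain corpus receive higher weights than factors made of common terms.
public class FactorWeightingSketch {

    static double idf(String term, List<Set<String>> corpus) {
        long df = corpus.stream().filter(doc -> doc.contains(term.toLowerCase())).count();
        return Math.log((1.0 + corpus.size()) / (1.0 + df));   // smoothed IDF
    }

    static double factorWeight(String factor, List<Set<String>> corpus) {
        String[] terms = factor.toLowerCase().split("\\W+");
        double sum = 0;
        for (String t : terms) sum += idf(t, corpus);
        return sum / terms.length;                              // average IDF of the factor's terms
    }

    public static void main(String[] args) {
        List<Set<String>> corpus = List.of(
                new HashSet<>(List.of("patient", "has", "pain")),
                new HashSet<>(List.of("patient", "resting", "tremor")),
                new HashSet<>(List.of("patient", "is", "old")));
        System.out.printf("%.3f%n", factorWeight("Patient has resting tremor", corpus));
        System.out.printf("%.3f%n", factorWeight("Patient is 63-years old", corpus));
    }
}
```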
Inference chaining systems 100, 100′, 100″ of the respective embodiments each include a question generation component 112. The question generation component 112 takes as input a node 106 from an initial inference graph 110I and produces as output one or more natural language questions 315, formatted in a manner suitable for processing by the PQA system 115 in order to elicit responses that will be used to assert new relations into the inference graph.
In one embodiment, the question generation component 112 performs processes to produce questions that only ask for one kind of relation, for example the “causes” relation. A simple implementation could just produce questions of the form “What causes: X?” where X is the text of the inference graph node 106. Thus, from the above-described medical domain example, given the initial graph node 106:
-
- Patient has resting tremor
Question Generation component 112 may generate the question:
-
- What causes: Patient has resting tremor?
Another embodiment might produce more natural and grammatical questions, for example by applying question generation patterns or templates 125. An example of such a pattern could specify that the reference to a patient be eliminated, producing, in the above medical domain example, the question:
-
- What causes resting tremor?
Depending on the PQA system 115, asking this question may result in improved answers. Question generation component 112 further implements programmed processes for producing questions that ask for many different kinds of relations (e.g., “causes”, “indicates”, “is associated with”, “treats”).
As further shown in the figures, the question generation component 112, in its general form, combines relation types 136 with question templates or patterns 125. For example, relation types 136 “causes”, “indicates” or “treats” can be applied to question templates 125 such as:
-
- What <relation> <factor>?
- What <inverse-relation> <factor>?
To get corresponding questions such as, for example
-
- What causes <factor>?
- What is caused by <factor>?
where, depending on the node in the inference graph, the process may decide to substitute <factor> with the node phrase, for example “resting tremor”, which would produce the questions:
- What causes a resting tremor?
- What indicates a resting tremor?
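A minimal Java sketch of this template-based question generation, assuming a small illustrative table of relation types and inverse surface forms, follows; the names and template strings are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of template-based question generation: each injected relation type is combined
// with simple question patterns, and <factor> is replaced by the node phrase.
public class QuestionGenerationSketch {

    // relation type -> its inverse surface form (illustrative values only)
    static final Map<String, String> INVERSES = Map.of(
            "causes", "is caused by",
            "indicates", "is indicated by",
            "treats", "is treated by");

    static List<String> generate(String factorPhrase, List<String> relationTypes) {
        List<String> questions = new ArrayList<>();
        for (String rel : relationTypes) {
            questions.add("What " + rel + " " + factorPhrase + "?");   // "What <relation> <factor>?"
            questions.add("What " + INVERSES.getOrDefault(rel, rel)
                    + " " + factorPhrase + "?");                        // "What <inverse-relation> <factor>?"
        }
        return questions;
    }

    public static void main(String[] args) {
        generate("a resting tremor", List.of("causes", "indicates"))
                .forEach(System.out::println);
        // e.g. "What causes a resting tremor?", "What is indicated by a resting tremor?" ...
    }
}
```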
As mentioned above, the reasoner component 150 implements processes to compute a confidence for each node of the inference graph.
In one embodiment, a method for computing probabilities at a node may include counting the number of paths to each node, and normalizing to make a number between 0 and 1 for each node.
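A minimal Java sketch of this path-counting approach over an acyclic inference graph is shown below; the graph encoding (a map from each node to its parent nodes) and all names are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of a simple confidence estimate: count the number of distinct paths from any
// factor to each node, then normalize the counts into [0, 1] across all nodes.
public class PathCountConfidenceSketch {

    static long countPaths(String node, Set<String> factors,
                           Map<String, List<String>> parents, Map<String, Long> memo) {
        if (factors.contains(node)) return 1;                     // a factor is its own single path
        if (memo.containsKey(node)) return memo.get(node);
        long total = 0;
        for (String p : parents.getOrDefault(node, List.of())) {
            total += countPaths(p, factors, parents, memo);       // assumes an acyclic graph
        }
        memo.put(node, total);
        return total;
    }

    public static void main(String[] args) {
        // edges point from supporting node to supported node; store each node's parents
        Map<String, List<String>> parents = Map.of(
                "Parkinson's Disease", List.of("resting tremor", "unexpressive face"),
                "Substantia nigra", List.of("Parkinson's Disease"));
        Set<String> factors = Set.of("resting tremor", "unexpressive face");
        Map<String, Long> memo = new HashMap<>();
        List<String> nodes = List.of("Parkinson's Disease", "Substantia nigra");
        long max = nodes.stream().mapToLong(n -> countPaths(n, factors, parents, memo)).max().orElse(1);
        for (String n : nodes) {
            System.out.printf("%s -> %.2f%n", n, countPaths(n, factors, parents, memo) / (double) max);
        }
    }
}
```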
In a further embodiment, shown as processes 153 and 155, a Bayesian network is generated from the inference graph.
Assimilation includes processes 153 to convert the set of relations into a valid Bayesian network having no cycles. Processes may optionally be performed to optimize the graph for inference by removing redundant paths. It is understood that a valid Bayesian network may have a structure different from that of the input inference graph.
Given the assimilated graph, inference includes processes 155 that are implemented to use belief propagation to infer the probabilities of unknown nodes (i.e., candidates) from probabilities of known nodes (i.e. factors).
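The following sketch illustrates a simplified stand-in for this inference step, propagating probabilities from known factor nodes to unknown nodes in topological order using a noisy-OR combination of parent influences; it is an approximation offered only for illustration, not the full Bayesian belief propagation described above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for inference over the assimilated graph: known factor nodes have
// probability 1, and each unknown node combines its parents' probabilities with a
// noisy-OR over the edge strengths. Real embodiments would use full belief propagation.
public class NoisyOrInferenceSketch {

    // edges: parent -> (child -> strength), i.e., influence of the parent alone on the child
    static Map<String, Map<String, Double>> edges = new LinkedHashMap<>();
    static Map<String, Double> probability = new LinkedHashMap<>();

    static void infer(List<String> topologicalOrder) {
        for (String node : topologicalOrder) {
            if (probability.containsKey(node)) continue;          // known factor, already set
            double pNotCaused = 1.0;
            for (Map.Entry<String, Map<String, Double>> e : edges.entrySet()) {
                Double strength = e.getValue().get(node);
                if (strength != null) {
                    double parentP = probability.getOrDefault(e.getKey(), 0.0);
                    pNotCaused *= 1.0 - parentP * strength;       // noisy-OR combination
                }
            }
            probability.put(node, 1.0 - pNotCaused);
        }
    }

    public static void main(String[] args) {
        probability.put("resting tremor", 1.0);                   // factors asserted true
        probability.put("unexpressive face", 1.0);
        edges.put("resting tremor", new LinkedHashMap<>(Map.of("Parkinson's Disease", 0.6)));
        edges.put("unexpressive face", new LinkedHashMap<>(Map.of("Parkinson's Disease", 0.5)));
        edges.put("Parkinson's Disease", new LinkedHashMap<>(Map.of("Substantia nigra", 0.8)));
        infer(List.of("resting tremor", "unexpressive face", "Parkinson's Disease", "Substantia nigra"));
        System.out.println(probability);   // e.g. Parkinson's Disease -> 0.8, Substantia nigra -> 0.64
    }
}
```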
In the reasoner component 150, inferred probabilities are then read back into the input inference graph, e.g., inference graph 110E, as shown at 157, by copying the number (probability value) computed from the Bayesian network to the corresponding node in the inference graph, which is passed to the merging process 156 with unmodified structure.
In one embodiment, the reasoner component 150 does not return the assimilated Bayesian network. It leaves the input inference graph unchanged except for the computed (inferred) event probabilities, as output inference graph 110U at 159. It is further understood that explanations may be generated by describing the edges along the strongest path (most belief propagated) from known factors to the chosen candidate, e.g., node 151a.
The reasoner component 150 is programmed to assign a probability to all nodes, not just candidates, because the question generation component 112 may give higher priority to some non-candidate nodes based on their propagated probability. One particular implementation includes a Bayesian network but the reasoner component may implement other techniques.
For example, the Bayesian network may be used for training the probabilistic QA system as follows: the correct answer is asserted by setting its probability to 1, and the incorrect answers are disasserted by setting their probabilities to 0; belief is then propagated through the graph. Edges that pass positive messages can be used as positive training examples, and edges that pass negative messages can be used as negative training examples.
The inference graph 161 and its visualization 160 illustrate the confidences propagated by the reasoner component 150 for the medical domain example. Thus, the text-based inference chaining system 100, 100′, 100″ may generate inference graphs in a forward direction from the factors of the inquiry as well as in a backward direction from candidate answers. As shown for forward-directed graph generation in the medical domain example, the following intermediate conclusions and confidences may be produced:
-
- Patient has Parkinson's Disease: 0.8
- Patient has Dystonia: 0.15
- Patient has Athetosis: 0.03
In backward-directed graph generation, processes are implemented to access a candidate answer generator 504 that receives the inquiry and conducts a search using known methods to produce possible answers (e.g., parts of the nervous system) based on the inquiry. For the above-described medical domain example, the candidate answers may include, for example, Substantia nigra and Caudate nucleus.
In backward-directed graph generation, components of the text-based chaining system 100, 100′, 100″ generate questions from the candidate answers, such as:
-
- What causes Substantia Nigra to be affected?
- What causes Caudate nucleus to be affected?
The PQA system component 115 is invoked to produce answers to these questions, for example, “Parkinson's Disease causes Substantia Nigra to be affected.” The graph extender component 118 adds these as edges to the backward-directed graph. Multiple iterations may be performed to form longer paths in the inference graph.
In one embodiment, the candidate answer generator may be implemented using the same methods used in IBM's DeepQA system for candidate answer generation, such as described herein below.
The inference graph joiner process 600 joins two paths from factors through intermediate nodes to possible answers, and specifically connects forward-generated inference graphs with backward-generated inference graphs. A first and optional step in graph joining is node merging at node merging element 665. Node merger 665 implements programmed processes to analyze different concept end-points within bi-directionally generated graphs and probabilistically determine whether they refer to the same logical statements (concepts).
If any two different nodes in the graph are probabilistically determined, with enough certainty, to refer to the same concept, then they are merged into a single node, reducing the number of extraneous or noisy paths in the graph that would dilute the confidence propagation. Node merging may further automatically connect/join two graphs (bi-directionally generated or not); this happens when the merged nodes were from distinct graphs that the system was trying to join. The implicit question being answered by the node merger is “Do these two nodes refer to the same logical statement?” Thus, no explicit question needs to be asked of the PQA system to join the nodes, as is done by the node joiner. Merging may be performed using any number of term matching or co-reference techniques that look at syntactic, semantic or contextual similarity using techniques known in the art. The MetaMap program referred to herein above is one example system that may be implemented in the medical domain: given two terms, MetaMap may be used to determine whether they refer to the same medical concept. In general, any “domain dictionary” that identifies synonymous terms for a given domain can be used in this way. As other medical domain examples, Diabetes may be merged with Diabetes Mellitus, Cold with Cold Virus, or High Blood Pressure with Hypertension. Node joining performance will improve if the merger connects the merged node into another graph rather than connecting them separately.
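A minimal Java sketch of such dictionary-based node merging is shown below; the synonym table stands in for a domain dictionary (e.g., a MetaMap-style concept mapping) and its contents and names are illustrative assumptions.

```java
import java.util.Map;

// Sketch of node merging: two node labels are merged if a domain dictionary maps them
// to the same canonical concept. The dictionary contents here are illustrative only.
public class NodeMergerSketch {

    static final Map<String, String> CANONICAL = Map.of(
            "high blood pressure", "hypertension",
            "hypertension", "hypertension",
            "diabetes", "diabetes mellitus",
            "diabetes mellitus", "diabetes mellitus");

    static String canonical(String label) {
        return CANONICAL.getOrDefault(label.toLowerCase(), label.toLowerCase());
    }

    static boolean shouldMerge(String nodeA, String nodeB) {
        return canonical(nodeA).equals(canonical(nodeB));   // same logical statement/concept?
    }

    public static void main(String[] args) {
        System.out.println(shouldMerge("High Blood Pressure", "Hypertension")); // true
        System.out.println(shouldMerge("Diabetes", "Cold Virus"));              // false
    }
}
```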
After invoking the optional node merger 665, a node joiner element 675 implements programmed processes to detect relation end-points that are not on a path connecting a factor to an answer and attempts to discover a link between them (the factor and the answer) using the PQA system.
Particularly, the joiner process 675 receives both bi-directionally generated graphs and searches for two disconnected nodes (one from each graph) that may be connected by a relation. For example, one backward-directed graph node is “Diabetes” and the other node is “Blindness”. The node joiner generates questions that may link the two nodes. For example:
-
- Does Diabetes cause Blindness?
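A minimal Java sketch of how the node joiner might form such yes/no questions for disconnected end-point pairs is shown below; the relation vocabulary and method names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the node joiner's question generation: for each pair of disconnected end-points,
// one from each bi-directionally generated graph, emit a yes/no question per injected
// relation type for the PQA system to answer.
public class NodeJoinerSketch {

    static List<String> joinQuestions(List<String> firstGraphEndPoints,
                                      List<String> secondGraphEndPoints,
                                      List<String> relationTypes) {
        List<String> questions = new ArrayList<>();
        for (String a : firstGraphEndPoints) {
            for (String b : secondGraphEndPoints) {
                for (String rel : relationTypes) {
                    questions.add("Does " + a + " " + rel + " " + b + "?");
                }
            }
        }
        return questions;
    }

    public static void main(String[] args) {
        joinQuestions(List.of("Diabetes"), List.of("Blindness"), List.of("cause"))
                .forEach(System.out::println);   // "Does Diabetes cause Blindness?"
    }
}
```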
As shown in the figures, such a question is posed to the PQA system component 115. With respect to the inference graph joiner process 600, if the PQA system answers affirmatively, a relation is asserted that forms an edge joining the two nodes, connecting the forward-directed and backward-directed graphs.
For the medical domain example, it is the case that the forward-directed and backward-directed inference graphs naturally intersect. In this example, the forward-directed graph includes end-point “Parkinson's Disease” with high confidence, and the backward-directed graph includes the relation Parkinson's Disease causes Substantia Nigra to be affected, so when the graphs are combined there is a path leading from the initial factors to the candidate answer, and the iterative process terminates.
For the medical domain example described herein, the programmed joiner process may provide example “Yes/No” questions that are generated by the question generation component for processing by the PQA system component 115. Examples are shown below.
-
- Does Parkinson's Disease cause Substantia nigra to be affected?
- Does Parkinson's Disease cause Caudate nucleus to be affected? . . . .
For the medical domain example described herein, example multiple-choice questions that are generated for processing in the PQA system component 115 may include:
-
- Parkinson's Disease causes which of the following to be affected: (Substantia nigra, Caudate nucleus, Lenticular nuclei, Cerebellum, Pons)
In a further example, an input inquiry may be:
-
- ON HEARING OF THE DISCOVERY OF GEORGE MALLORY'S BODY, THIS EXPLORER TOLD REPORTERS HE STILL THINKS HE WAS FIRST.
Processing using one or more of the text analysis, factor identification and factor weighting components of the factor analysis component 200 of FIG. 8 obtains the following factors 606A, 606B:
- 606A: GEORGE MALLORY from “DISCOVERY OF GEORGE MALLORY'S BODY”
- 606B: FIRST EXPLORER from “THIS EXPLORER TOLD REPORTERS HE STILL THINKS HE WAS FIRST”
with emphasis indicating the initial nodes (factors) generated from the query. These are simultaneously processed along parallel processing paths 605A, 605B, supported by the computing system described herein, using respective question generation components 612A, 612B. The question generation processes 612A, 612B generate respective questions 613A, 613B:
- 613A: This is associated with George Mallory
- 613B: This is associated with First Explorer
Via parallel implementations of the PQA systems 615A, 615B, the following justifying passages 620A, 620B are obtained from the searched (structured+unstructured) content.
-
- 620A: George Herbert Leigh Mallory (18 Jun. 1886-8/9 Jun. 1924) was an English mountaineer who took part in the first three British expeditions to Mount Everest in the early 1920s.
- 620B: A mountaineering expert will today claim that Sir Edmund Hillary was not the first man to scale Everest—and that it was in fact conquered three decades before by the British climber George Mallory.
- 620C: Sir Edmund Hillary was a mountain climber and Antarctic explorer who, with the Tibetan mountaineer Tenzing Norgay, was the first to reach the summit of Mount Everest.
Resulting from implementation of the reasoner component 150 processes for propagating confidences, the following candidate answers 622A, 622B are generated:
-
- 622A: Mount Everest and
- 622B: Edmund Hillary
The increased thickness of the border of answer Edmund Hillary 622B indicates the relatively higher confidence (score) computed by the reasoner component 150, from which it is determinable as the best answer.
The joining is used to determine how confidence flows between two possible answers (e.g., Mt. Everest and Edmund Hillary) discovered from different factors in the question (as Edmund Hillary was also a candidate answer from the first factor, discovered from the annotating passage connected to that link).
In the method shown in the figures, the node joiner generates a question such as:
-
- Is Mount Everest associated with Edmund Hillary?
Using processing by the PQA system component 115, it is readily determined that there is an association between the answers Mt. Everest and Sir Edmund Hillary, as indicated by the “yes” answer 678 in the joiner 675. Thus, for example, the following justifying passage 620D is obtained from the searched (structured+unstructured) content:
-
- On 29 May 1953, Hillary and Tenzing Norgay became the first climbers confirmed as having reached the summit of Mount Everest.
Having established the relationship between the answers Mt. Everest and Sir Edmund Hillary, the final inference graph is generated by joining the two graphs at the related nodes.
In one embodiment, the architecture may be employed utilizing common analysis system (CAS) candidate answer structures and implementing supporting passage retrieval operations. For this processing, the evidence gathering module 370 implements supporting passage retrieval operations and candidate answer scoring in separate processing modules for concurrently analyzing the passages and scoring each of the candidate answers as parallel processing operations. The knowledge base 321 includes content, e.g., one or more databases of structured or semi-structured sources (pre-computed or otherwise), and may include collections of relations (e.g., Typed Lists). In an example implementation, the answer source knowledge base may comprise a database stored in a memory storage system, e.g., a hard drive. An answer ranking module 360 provides functionality for ranking candidate answers, i.e., computing a confidence value, and determining a response 399 that is returned to the engine along with respective confidences for potentially extending the inference graph with nodes and relations. The response may be an answer, an elaboration of a prior answer, or a request for clarification in response to a question—when a high-quality answer to the question is not found.
In one embodiment, the system shown in the figures may be implemented as a computer program product, as described below.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a system, apparatus, or device running an instruction.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a system, apparatus, or device running an instruction.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may run entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Thus, in one embodiment, the system and method for efficient passage retrieval may be performed with data structures native to various programming languages such as Java and C++.
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which run via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which run on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more operable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be run substantially concurrently, or the blocks may sometimes be run in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments described above are illustrative examples and it should not be construed that the present invention is limited to these particular embodiments. Thus, various changes and modifications may be effected by one skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.
Claims
1. A method of inferring answers to inquiries comprising:
- receiving an input inquiry;
- decomposing the input inquiry to obtain one or more factors, said factors forming initial nodes of an inference graph;
- iteratively constructing said inference graph over one or more content sources, wherein at each iteration, a processing device discovers answers to said input inquiry by connecting factors to said answers via one or more relations, each relation in an inference graph being justified by one or more passages from said content sources, said inference graph connecting factors to said answers over one or more paths having one or more edges representing said relations; and,
- providing an answer to said inquiry from said inference graph,
- wherein a programmed processor device is configured to perform one or more said receiving, decomposing and said iteratively constructing said inference graph to provide said answer.
2. The method as claimed in claim 1, wherein said iteratively constructing said inference graph comprises:
- expanding said inference graph at each iteration by:
- generating one or more questions based on one or more current nodes in said graph;
- searching in one or more content sources to identify one or more relations leading to new answers and representing said new answers as new additional nodes in said inference graph, each new additional node connected via an edge representing the relation, and each relation having an associated justifying passage at an associated confidence level,
- inferring, from said associated confidence levels, a confidence level at each node of said inference graph to provide an updated inference graph,
- determining if the updated inference graph meets a criteria for terminating said iteration, and one of:
- terminating said iteration if said criteria is met; otherwise,
- repeating said generating, searching, inferring and determining steps with said new additional nodes being current nodes for a next iteration,
- wherein, upon terminating, said answer to said inquiry is a node from said updated inference graph.
3. The method as claimed in claim 2, wherein said searching comprises:
- identifying one or more justifying passages supporting a relation between connected nodes of said inference graph.
4. The method as claimed in claim 2, wherein said terminating criteria comprises: identifying a node of said updated inference graph having an inferred confidence value exceeding a predetermined threshold; or,
- performing a predetermined number of iterations.
5. The method as claimed in claim 2, wherein said inferring a confidence level comprises:
- forming a Bayesian network from nodes and relations of said inference graph and an associated confidence value representing a probability of belief that a supporting passage justifies the answer for the node; and,
- propagating associated confidence values across said relations and nodes represented in said Bayesian network to infer a confidence in each answer.
6. The method as claimed in claim 2, wherein said factors or current nodes comprise a statement, said generating questions comprising:
- determining a predetermined relation type corresponding to the statement; and,
- using a template corresponding to the predetermined relation type to form a question from said statement.
7. The method as claimed in claim 2, wherein said factors comprise statements, said method further comprising, at each iteration, one or more of:
- prioritizing selected statements as factors for expedient corresponding question generation; or
- filtering selected statements and removing them as factors for corresponding question generation.
8. The method as claimed in claim 2, wherein said decomposing the input inquiry comprises:
- analyzing a text of said question;
- identifying said one or more factors from said analyzing; and
- applying weights to said one or more factors.
9. The method as claimed in claim 2, further comprising:
- decomposing the input inquiry into query terms, and using said query terms to obtain one or more candidate answers for said input inquiry;
- performing as parallel simultaneous operations:
- iteratively constructing, by the programmed processor device, a first inference graph from factors obtained from the input inquiry, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for said inquiry over one or more paths having one or more edges representing said relations; and
- iteratively constructing, by said programmed processor device, a second inference graph from said candidate answers, said second inference graph connecting said candidate answers to one or more nodes that lead to said one or more factors of said inquiry over one or more paths having one or more edges representing relations;
- determining, during said simultaneous iterative constructing, whether a first inference graph can be joined to said second inference graph to generate a final inference graph having a node representing an answer to said input inquiry.
10. The method as claimed in claim 9, wherein said determining whether said first inference graph can be joined to said second inference graph comprises:
- determining, using a similarity criteria applied to end-point nodes of each said first and said second inference graphs whether two said end-point nodes can be merged into a single node to join said graphs; or
- forcing a discovering of a relation that forms an edge joining an end-point node of said first inference graph to an end-point answer node in said second inference graph.
11. A method of inferring answers to inquiries comprising:
- receiving an input inquiry;
- decomposing the input inquiry to obtain one or more factors; and,
- decomposing the input inquiry into query terms, and using said query terms to obtain one or more candidate answers for said input inquiry;
- iteratively constructing using a programmed processor device coupled to a content storage source having content, a first inference graph using said factors as initial nodes of said first inference graph, a constructed first inference graph connecting factors to one or more nodes that lead to an answer for said inquiry over one or more paths having one or more edges representing said relations;
- simultaneously iteratively constructing, using the programmed processor device and the content source, a second inference graph using said one or more candidate answers as initial nodes of said second inference graph, said second inference graph connecting candidate answers to one or more nodes that connect to said one or more factors of said inquiry over one or more paths having one or more edges representing relations; and,
- generating, during said simultaneous iterative constructing, a final inference graph by joining said first inference graph to said second inference graph, said final inference graph having a joined node representing an answer to said input inquiry.
12. The method as claimed in claim 11, wherein said iteratively constructing each said first inference graph and said second inference graph (inference graph) comprises expanding each inference graph at each iteration by:
- generating one or more questions based on one or more current nodes in said graph;
- searching in one or more content sources to identify one or more relations leading to new answers and representing said new answers as new additional nodes in said inference graph, each new additional node connected via an edge representing the relation, and each relation having an associated justifying passage at an associated confidence level,
- inferring, from said associated confidence levels, a confidence level at each node of said inference graph to provide an updated inference graph,
- determining if the updated inference graph meets a criteria for terminating said iteration, and one of:
- terminating said iteration if said criteria is met; otherwise,
- repeating said generating, searching, inferring and determining steps with said new additional nodes being current nodes at a next iteration,
- wherein, upon terminating, said answer to said inquiry is a node from said updated inference graph.
13. The method as claimed in claim 12, wherein said generating the final inference graph comprises:
- determining, using a similarity criteria applied to end-point nodes of each said first and said second inference graphs whether two said end-point nodes can be merged into a single node that joins said first inference or second inference graph.
14. The method as claimed in claim 13, wherein said determining using a similarity criteria comprises:
- applying one or more of: term matching or co-referencing to identify one or more of: a syntactic, semantic or contextual similarity between said identified end-point node of said first inference graph node and an end-point node of said second inference graph, and
- merging said identified end-point nodes meeting one or more of: a syntactic, semantic or contextual similarity criteria.
15. The method as claimed in claim 12, wherein said generating a final inference graph comprises:
- forcing the discovering of a relation that forms an edge joining an end-point node of said first inference graph to an end-point answer node in said second inference graph.
16. The method as claimed in claim 15, wherein said forcing the discovering of a relation that forms an edge comprises:
- generating, from an end-point factor node of said first inference graph to an end-point candidate answer node in said second inference graph, one of: a “yes”/“no” or multiple-choice question, and
- using said generated “yes”/“no” or multiple-choice question to determine whether a relation between said respective end-point nodes exists, said relation joining a candidate answer to a factor of the input inquiry.
17. The method as claimed in claim 11, wherein said query terms include searchable components, said obtaining candidate answers comprising: conducting a search over content from one or more content sources using one or more of the searchable components to obtain candidate answers used as said initial nodes for said second graph constructing.
Type: Application
Filed: Oct 12, 2012
Publication Date: Apr 17, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventors: David W. Buchanan (Ossining, NY), David A. Ferrucci (Yorktown Heights, NY), Adam P. Lally (Cold Spring, NY)
Application Number: 13/651,041
International Classification: G06N 5/02 (20060101);