Abstract: A method for entity extraction within an electronic document including executing by a computer processor a conditional random field algorithm stored on a computer readable medium to generate a conditional random field model; the conditional random field algorithm having an input including one or more training text documents; executing by a computer processor an entity extraction algorithm stored on a computer readable medium to generate an entity extraction model; the entity extraction algorithm having an input including the same one or more training text documents input into the conditional random field algorithm; applying by a computer processor the conditional random field model to at least one electronic document; wherein application of the conditional random field model returns a list of passages in the at least one electronic document having an entity; applying by a computer processor the entity extraction model to the at least one electronic document; wherein application of the entity extraction model
Type: Grant
Filed: October 28, 2016
Date of Patent: December 18, 2018
Assignee: KIRA INC.
Inventors: Robert Henry Warren, Alexander Karl Hudek
Abstract: The methods proposed here deconstruct training sentences into a stream of features that represent the sentences and the tokens used in the text, their sequence, and other ancillary features extracted using natural language processing. Then, we use a conditional random field in which we represent the concept we are looking for as state A and the background (everything that is not concept A) as state B. The model created by this training phase is then used to locate the concept as a sequence of sentences within a document. This has distinct advantages in accuracy and speed over methods that classify each sentence individually and then use a secondary method to group the classified sentences into passages. Furthermore, while previous methods searched only for occurrences of tokens, the use of a wider set of features enables this method to locate relevant passages even when different terminology is used.
Type: Grant
Filed: August 25, 2016
Date of Patent: May 9, 2017
Assignee: KIRA INC.
Inventors: Robert Henry Warren, Alexander Karl Hudek
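The two-state approach in the second abstract (concept as state A, background as state B, passages located as sequences of sentences) can be illustrated with a minimal Python sketch. It assumes a linear-chain CRF whose weights have already been learned: the weights below are hand-set and hypothetical, training is omitted, and the feature extractor (lowercased tokens plus a sentence-length bucket standing in for "ancillary features") is an assumption, not the patented feature set. Viterbi decoding labels every sentence A or B jointly, so runs of A become passages directly rather than through a separate grouping step.

```python
from typing import Dict, List, Tuple

STATES = ("A", "B")  # A = concept being sought, B = background


def sentence_features(sentence: str) -> List[str]:
    """Hypothetical feature extractor: lowercased tokens plus a length bucket."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    feats = [f"tok={t}" for t in tokens]
    feats.append(f"len_bucket={min(len(tokens) // 5, 3)}")
    return feats


def emission_score(feats: List[str], state: str,
                   weights: Dict[Tuple[str, str], float]) -> float:
    # Sum of learned weights for (feature, state) pairs; unseen pairs add 0.
    return sum(weights.get((f, state), 0.0) for f in feats)


def viterbi(sentences: List[str],
            weights: Dict[Tuple[str, str], float],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring A/B label sequence under a linear-chain CRF score."""
    if not sentences:
        return []
    feats = [sentence_features(s) for s in sentences]
    # score[i][s]: best total score of any label path ending in state s at sentence i
    score = [{s: emission_score(feats[0], s, weights) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(sentences)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[i - 1][p] + transitions[(p, s)])
            score[i][s] = (score[i - 1][prev] + transitions[(prev, s)]
                           + emission_score(feats[i], s, weights))
            back[i][s] = prev
    # Trace back from the best final state to recover the full label sequence.
    state = max(STATES, key=lambda s: score[-1][s])
    labels = [state]
    for i in range(len(sentences) - 1, 0, -1):
        state = back[i][state]
        labels.append(state)
    return labels[::-1]


def passages(labels: List[str]) -> List[Tuple[int, int]]:
    """Group consecutive A-labelled sentences into inclusive (start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == "A" and start is None:
            start = i
        elif lab != "A" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(labels) - 1))
    return spans


if __name__ == "__main__":
    # Hypothetical hand-set weights for an "indemnification" concept.
    weights = {("tok=indemnify", "A"): 2.0, ("tok=losses", "A"): 1.5,
               ("tok=notices", "B"): 2.0, ("tok=governed", "B"): 2.0}
    # Staying in a state is rewarded and switching is penalised,
    # which encourages contiguous passages.
    transitions = {("A", "A"): 0.5, ("B", "B"): 0.5,
                   ("A", "B"): -1.0, ("B", "A"): -1.0}
    doc = ["Notices must be sent in writing.",
           "The Supplier shall indemnify the Buyer.",
           "This covers all losses and claims.",
           "This agreement is governed by Ontario law."]
    labels = viterbi(doc, weights, transitions)
    print(labels)            # ['B', 'A', 'A', 'B']
    print(passages(labels))  # [(1, 2)]
```

Note that the third sentence contains no indemnification keyword strong enough on its own; it is labelled A partly because the transition weights favour staying in state A after the second sentence. This is the sequence effect the abstract contrasts with classifying each sentence in isolation.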