CONCEPT-BASED ANALYSIS OF STRUCTURED AND UNSTRUCTURED DATA USING CONCEPT INHERITANCE
In one embodiment, a method comprises defining a set of concepts based on a first set of structured and unstructured data objects, defining a business rule based on the set of concepts, applying the business rule to a second set of structured and unstructured data objects to make a determination associated with that set, and outputting to a display information associated with the determination.
This application is a continuation of U.S. patent application Ser. No. 14/970,079, filed Dec. 15, 2015, which in turn is a continuation of U.S. patent application Ser. No. 12/423,024, filed Apr. 14, 2009, and entitled CONCEPT-BASED ANALYSIS OF STRUCTURED AND UNSTRUCTURED DATA USING CONCEPT INHERITANCE; the entirety of each of the aforementioned applications is herein incorporated by reference.
BACKGROUND
Embodiments described herein relate generally to information analysis, and more particularly to methods and apparatus for concept-based analysis of structured and unstructured data.
Organizations often utilize sophisticated computer systems and databases to inform and automate portions of the decision-making process. Many such systems organize relevant data into a structured format (such as a relational database), making it accessible by a broad array of query, analysis, and reporting applications. Some of these systems programmatically calculate business decisions and make assessments based on available data and program logic. However, often much of the information relevant to these calculations is stored in a variety of unstructured formats—such as handwritten notes, word processor documents, e-mails, saved web pages, printed forms, photographic prints, and the like.
Because typical systems are incapable of organizing and searching the content of such documents, their decision outputs are generally based on only the subset of pertinent information that exists in structured form—rendering these outputs incomplete and at times inaccurate. Those systems that do incorporate unstructured data into their decision-making algorithms often convert text information into a coded form that can be stored in a structured format (such as a relational database field). This approach is undesirable, however, because much context and meaning can be lost when a complex idea conveyed in language is shoe-horned into a simple, coded form.
Further, traditional techniques for logically combining such coded data are susceptible to producing false positives, as correlations between factors that contribute to a given decision output are not accounted for in such models. More specifically, in a given scenario in which multiple factors contribute to a particular outcome or determination, many systems generate a determination based on the number of those factors present in a given data set—defining rules that assume an increased likelihood of a given output for each additional factor present in the data set. This approach is flawed, however, because two or more of these factors may not occur independently in the data. For example, two or more such factors could be positively correlated, such that the presence of a first factor always implies the presence of the second. In such a scenario, if the first factor is present, the presence of the second factor does not increase the likelihood of the particular output under consideration. This flaw can result in the generation of a false positive, as the system inappropriately includes the presence of the second factor as an additional weight in its decision calculus.
Additionally, the inability of a system to properly incorporate unstructured data into its calculations forces individuals to consider the relevant unstructured documents separately—without the significant aid of computer processing power. This laborious task not only greatly increases the time and cost of the decision-making process, but also introduces additional imprecision, as individuals are unlikely to analyze data with the consistency and speed of a computerized solution. Finally, individuals are unlikely to optimally combine their own intuitions regarding a set of unstructured data with computer-generated analysis of structured data to reach an accurate final conclusion.
Thus, a need exists for methods and apparatus that programmatically organize and analyze structured and unstructured data together, and apply business logic to make accurate determinations based on that data. A need further exists for methods and apparatus that analyze and make a determination about a set of data, using techniques that avoid the false positives that often result when contributing factors and concepts are positively-correlated within the data.
SUMMARY
In one embodiment, a method comprises defining a set of concepts based on a first set of structured and unstructured data objects, defining a business rule based on the set of concepts, applying the business rule to a second set of structured and unstructured data objects to make a determination associated with that set, and outputting to a display information associated with the determination.
A computerized decision system can be configured to organize the content of a first set of structured and unstructured data into a concept hierarchy. In some embodiments the decision system can generate a business rule based on the concept hierarchy, and execute the business rule on the first set of structured and unstructured data to calculate a determination based on the data. In some embodiments, the decision system can execute the business rule on a second set of structured and unstructured data, different from the first, to calculate a determination based on the second set of data. In some embodiments, the decision system can output information associated with the determination to a display.
The concept hierarchy can be based on, for example, any combination of any number of: a concept present in the content of one or more unstructured data objects, a coded data value in a particular range, or one or more other concepts. A concept can be, for example, one or more words or phrases that convey an idea. In some embodiments, the concept hierarchy can include a concept based at least in part on a regular expression that evaluates the presence or absence of a particular subconcept in the content of an unstructured data object.
In some embodiments, the computerized decision system can be configured to provide functionality that assists a user in defining a business rule based on the first set of structured and unstructured data. The business rule can, for example, include one or more logical relationships between one or more concepts that when evaluated produce a determination based on a set of structured and unstructured data. In some embodiments, the decision system can present a user with reports about the first set of structured and unstructured data to assist the user in defining the business rule. In some embodiments, the decision system can include functionality that allows a user to test the defined business rule for accuracy and subsequently edit the business rule to increase precision.
The computerized decision system can additionally be configured to execute the business rule on the first set of structured and unstructured data. In some embodiments, the decision system can include a determination module that parses the business rule and searches within the first set of structured and unstructured data for each concept included in the business rule. The determination module can also include, for example, functionality to evaluate the business rule based on the presence or absence of each concept within the data set and the logical relationships between the concepts defined by the business rule, thereby producing a determination about the data set. In some embodiments, the determination module can be configured to output text or information associated with the determination to a display device for viewing by a user. In some embodiments, the determination module can be configured to store the determination to a memory or transfer information associated with the determination to another software-based module or hardware device for further analysis, storage, or display.
Any or all of the several modules in the illustrated computerized decision system can be implemented, for example, in hardware (e.g., a processor, an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA)), and/or in software that resides on a hardware device (e.g., a processor) or in a memory (e.g., a RAM, a ROM, a hard disk drive, an optical drive, or other removable media) coupled to a processor. The several modules can be implemented and/or resident on devices connected over, for example, a communications network such as any combination of a local area network (LAN), a wide area network (WAN), the Internet, and/or a wireless data connection, such as a Bluetooth or infrared connection.
The optional training data set 100 can reside, for example, in a computerized memory such as a RAM, a ROM, a hard disk drive, an optical drive, or other removable media. The structured data source 102 can be organized into, for example, a relational database such as a Structured Query Language (SQL) database, one or more comma-separated values (CSV) files, one or more other pattern-delimited files, or other structured data format hierarchy. The unstructured data objects 104 can be, for example, one or more of: a handwritten document, a typed document, an electronic word-processor document, a printed or electronic spreadsheet document, a printed form or chart, or other electronic document that contains text such as an e-mail, Adobe PDF document, Microsoft Office document, and the like. In some embodiments, the structured data source 102 can include, for example, one or more unstructured data elements, such as a string of text stored in a relational database column of type string or varchar.
In some embodiments, the optional training data set 100 can be omitted. In such embodiments, data set 140 can be connected to concept generator module 110 and to business rule generator module 130. In such embodiments, data set 140 can perform the functions of optional training data set 100 described herein.
In some embodiments, the concept generator module 110 can receive data that includes a set of structured and/or unstructured data, such as the optional training data set 100. Upon receipt of the set of structured and unstructured data, the concept generator module 110 can be configured to generate a concept hierarchy 10 by, for example, executing a concept extraction technique such as that detailed in U.S. Pat. No. 7,194,483 to Mohan et al., the disclosure of which is incorporated herein by reference in its entirety.
In some embodiments, the concept generator module 110 can be configured to provide functionality that allows a user to add a concept to or delete a concept from the concept hierarchy 10. Additionally, the concept generator module 110 can provide functionality that allows a user to edit an existing concept or relationship between one or more concepts. More specifically, the concept generator module 110 can be configured to display a visual representation of the resulting concept hierarchy 10 and to include functionality that allows a user to send input signals to the concept generator module 110 that indicate a desired change to the concept hierarchy 10. The concept generator module 110 can be configured to receive these signals and update the concept hierarchy 10 according to the desired changes. In some embodiments, the concept generator module 110 can be configured to receive a file that defines one or more concepts, with the location of the file being specified by a user. The concept generator module 110 can include the one or more concepts as part of the concept hierarchy 10. In some embodiments, the above-described concept hierarchy definition methods can be performed iteratively until the concept generator module 110 receives a signal from a user indicating that the concept hierarchy 10 is acceptable. In some embodiments, the concept generator module 110 can be configured to detect concepts within the optional training data set 100 that are positively-correlated within the data. After this detection process, the concept generator module 110 can recursively combine such concepts into higher-level concepts until all highest-level concepts in the concept hierarchy 10 occur independently of one another in the optional training data set 100.
The business rule generator module 130 can be configured to receive data that includes the concept hierarchy 10. In some embodiments, the business rule generator module 130 can receive the contents of the concept hierarchy 10 by reading the concept hierarchy 10 from a removable storage medium such as an optical disc, an external hard disk drive, or a flash memory module. The business rule generator module 130 can be, for example, a software-based module that resides on a hardware device. Alternatively, in some embodiments, the business rule generator module 130 can be a hardware device.
In some embodiments, the business rule generator module 130 can provide to a user functionality for composing a business rule based on the concept hierarchy 10. The business rule generator module 130 can, for example, provide a graphical user interface that includes a visual representation of the concepts and concept relationships that comprise the concept hierarchy 10. Such an interface can, for example, allow a user to manipulate the visual representation and enter logic to define a business rule 12.
The structured data source 102 can be organized into, for example, a relational database such as a Structured Query Language (SQL) database, one or more comma-separated values (CSV) files, one or more other pattern-delimited files, or other structured data format hierarchy. The unstructured data 142 can include, for example, one or more of: a handwritten document, a typed document, an electronic word-processor document, a printed or electronic spreadsheet document, a printed form or chart, or other electronic document that contains text such as an e-mail, Adobe PDF document, Microsoft Office document, and the like. In some embodiments, the structured data source 102 can include, for example, one or more unstructured data elements, such as a string of text stored in a relational database column of type string or varchar.
In some embodiments, the determination module 150 can receive the contents of the business rule 12 via a removable storage medium such as an optical disc, an external hard disk drive, or a flash memory module.
The determination module 150 can be configured to execute the business rule 12 using the data set 140 to produce a determination 14. In some embodiments, the determination module 150 can be configured to output text and/or graphics associated with the determination 14 to a display such as a computer monitor, television, LCD or LED screen, or video projector.
A concept hierarchy based on the structured and unstructured data can be created, at 210. The concept hierarchy can be composed of one or more concepts connected by conceptual relationships, such as a parent concept/subconcept relationship. A concept in the concept hierarchy can be, for example, one or more words or phrases present in the content of an unstructured document from the set of structured and unstructured data. Alternatively, a concept in the concept hierarchy can be a value for a structured data element from the structured data, such as the value of a relational database field. Alternatively, a concept can be any combination of another concept, a structured data element, or the presence or absence of one or more words or phrases in the content of an unstructured data element.
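By way of illustration only, the three kinds of concept described above (a phrase in unstructured text, a value of a structured field, and a combination of other concepts) might be sketched as follows. The names `Concept`, `phrase_concept`, `field_concept`, and the record keys `text` and `loss_type` are hypothetical, and the choice to treat a parent concept as present when any subconcept is present is an assumption of this sketch rather than a requirement of the embodiments.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record: structured fields plus the text of unstructured documents.
Record = Dict[str, object]


def _never(record: Record) -> bool:
    """Default test for a concept defined only by its subconcepts."""
    return False


@dataclass
class Concept:
    name: str
    test: Callable[[Record], bool] = _never
    children: List["Concept"] = field(default_factory=list)

    def present(self, record: Record) -> bool:
        # Assumption: a parent is present if its own test passes
        # or if any of its subconcepts is present.
        return self.test(record) or any(c.present(record) for c in self.children)


def phrase_concept(name: str, phrases: List[str]) -> Concept:
    """Concept defined by words or phrases found in unstructured text."""
    lowered = [p.lower() for p in phrases]
    return Concept(name, lambda r: any(p in str(r.get("text", "")).lower() for p in lowered))


def field_concept(name: str, field_name: str, target: object) -> Concept:
    """Concept defined by a target value of a structured data element."""
    return Concept(name, lambda r: r.get(field_name) == target)


stolen = field_concept("stolen vehicle", "loss_type", "THEFT")
vandalized = phrase_concept("vandalism", ["vandalized", "vandalism"])
damage = Concept("vehicle loss or damage", children=[stolen, vandalized])

record = {"loss_type": "COLLISION", "text": "Adjuster notes: car was vandalized overnight."}
print(stolen.present(record), vandalized.present(record), damage.present(record))
```

In this sketch the parent concept is found in the record through its text-based subconcept even though the structured field does not match.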
A business rule can be defined based on the concept hierarchy, at 220. The business rule can be automatically generated by a software- or hardware-based module, similar to the business rule generator module described in connection with
The business rule can optionally be tested for accuracy by applying it to a known testing set of structured and unstructured data with known outcomes or characteristics, at 230. The tests can be defined, for example, by receiving user input signals indicating the selection of one or more data objects from the testing set of unstructured data to define a subset and subsequently receiving user input signals that indicate a correct outcome for the application of the business rule to the testing set. The test can be executed by, for example, executing the business rule on the testing set to produce a test output.
If the test output is incorrect, the business rule can be refined based on the test output, at 240. In some embodiments, the business rule can be refined by receiving one or more user input signals that edit the definition of the business rule. The updated business rule can be re-tested for accuracy, at 230, and this process of testing and refining can be repeated until a satisfactory test output is obtained and the user specifies completion of the testing and refining process.
The business rule can be executed on the set of structured and unstructured data to make a determination about the set, at 250. The executing can be performed at a software- or hardware-based module, similar to the determination module discussed in connection with
The determination can be output to a display, at 260. The determination can be a conclusion about the contents of the set of structured and unstructured data. In some embodiments, the determination can be a binary output, such as a “1” or “0” or a “yes” or “no” that indicates the presence or absence of a particular concept in the set of data. In some embodiments, the determination can be a recommendation for future action based on the contents of the set of data. The determination can be output, for example, in a readable language format, such as a declarative sentence in English or another language. In some embodiments, the determination can be output as a data code or in another alphanumeric format.
In some embodiments, the logical combination that defines the concept 300 can be an expression (not shown) that includes boolean and logical operators such as, for example, “AND”, “OR”, “NAND”, “NOR”, “XOR”, “XNOR” and “NOT”. In the example illustrated by
In some embodiments, the concept generator module 430 can programmatically extract concepts from the unstructured and structured data objects to create a concept hierarchy. In such an embodiment, the concept generator module can be similar to the analysis and categorization engine discussed in connection with U.S. Pat. No. 7,194,483 to Mohan et al., the disclosure of which is incorporated herein by reference in its entirety.
The concept hierarchy creation process can also include receiving user input to define one or more concepts from a set of structured data, such as a relational database. In some embodiments, the concept generator module 430 can be configured to prompt the user for input that defines one or more concepts based on one or more structured data fields from the structured data elements 425. For example, the concept generator module 430 can display to a screen a visual or textual representation of the structured data elements 425, such as fields of one or more tables from a relational database (not shown). The user can then be prompted to select one or more fields from a database table (not shown), input a target value for the field, and input a name for the concept. In some embodiments, this process can be repeated iteratively until the user has defined a desired number of concepts necessary to create an appropriate business rule (as discussed in connection with
In some embodiments, the concept generator module 430 can additionally scan the text of the unstructured data objects 410 and extract a series of concepts. For example, the concept extraction process can include discovering one or more words or phrases present in the content of an unstructured data object and classifying the words or phrases as a concept, along with a title, name or label. In some embodiments, as detailed in connection with
In some embodiments, the concept generator module 430 can be configured to include one or more user-defined concepts in the concept hierarchy 440. The concept generator module 430 can be configured to receive the one or more user-defined concepts via direct user input, by importing a file that contains information associated with the user-defined concepts, or by accessing a database that contains the user-defined concepts. Additionally, in some embodiments the concept generator module 430 can be further configured to detect additional concepts over time. Thus, as the content and composition of the unstructured data objects 410 and/or the database 420 change over time, the concept generator module 430 can be configured to continually discover new concepts present in the data and include them in subsequently-generated concept hierarchies.
After completion of the concept extraction process, the concept generator module 430 can be configured to organize the extracted and/or user-defined concepts into a concept hierarchy. In some embodiments, the organization process can include defining one or more parent-child relationships between the concepts to create a hierarchy of concepts. To define these parent-child relationships, the concept generator module 430 can be configured to employ a series of concept relationship discovery techniques. For example, the concept generator module 430 can utilize one or more techniques described in connection with U.S. patent application Ser. No. 10/695,426 to Mohan entitled “Concept-based method and system for dynamically analyzing results from search engines”.
In some embodiments, the concept generator module 430 can employ alternative concept relationship discovery techniques, such as correlation analysis. To perform correlation analysis, the concept generator module 430 can be configured to analyze the unstructured data objects 410, the structured data elements 425, and the concepts extracted from the above, and execute a series of processes that discover a correlation between the presence of at least one concept and the presence of at least one other concept. For example, while performing concept correlation analysis, the concept generator module 430 can determine that, in workers' compensation insurance claim data, the presence of a concept named “pre-work accident” (defined, for example, by a structured data or unstructured data element that indicates that the time of injury is before the start of working hours) and the presence of a second concept named “no co-workers present” (defined, for example, by a regular expression that determines the presence of the concept in an unstructured document related to the claim) are highly-correlated with claims associated with a fraudulent or suspect status (defined, for example, by a status code in a set of structured data). In the example, the system can utilize this correlation to create part of a “suspicious claim” concept that combines these individual concepts to create a higher-level concept defined by a logical expression that represents the correlation between the concepts (as discussed in connection with
In some embodiments, the concept generator module 430 can utilize the results of the above-described correlation analysis to combine concepts that are positively-correlated or have a close relationship within the data set used by the module to define the concept hierarchy 440 (as discussed, for example, in U.S. patent application Ser. No. 10/695,426 to Mohan et al., the disclosure of which is incorporated herein by reference in its entirety). Referring again to the above-described example of a pre-work accident, the concept generator module could utilize a positive correlation of two factors associated with pre-work accidents to combine the concepts as at least a portion of a single concept. The concept generator module 430 can be configured to recursively combine positively-correlated concepts into fewer concepts within the concept hierarchy 440 until every pair of positively-correlated concepts within the concept hierarchy 440 is defined within the same concept. In other words, the module can recursively perform the concept combination process until none of the highest-level concepts defined in the concept hierarchy 440 are positively-correlated with one another in the data set. This process allows for the reduction of false positives produced by the determination process, as the existence of positively-correlated decision factors (concepts) does not inappropriately skew the decision calculus.
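By way of illustration only, the repeated combination of positively-correlated concepts might be sketched as follows. The sketch assumes that each concept has already been reduced to one true/false presence value per record, measures correlation with the Pearson coefficient on those values, merges a pair by treating the combined concept as present when either member is present, and uses an arbitrary threshold of 0.8. The function names, the threshold, and the sample data are hypothetical and are not taken from the described embodiments.

```python
from itertools import combinations
from math import sqrt


def phi(a, b):
    """Pearson correlation of two equal-length boolean lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def merge_correlated(presence, threshold=0.8):
    """Merge top-level concepts until no pair correlates at or above the threshold.

    `presence` maps a concept name to one boolean per record.
    """
    top = dict(presence)
    while True:
        pairs = [(phi(top[a], top[b]), a, b) for a, b in combinations(sorted(top), 2)]
        best = max(pairs, default=(0.0, None, None))
        if best[0] < threshold:
            return top
        _, a, b = best
        # Assumption: the combined concept is present when either member is present.
        merged = [x or y for x, y in zip(top.pop(a), top.pop(b))]
        top[f"({a} + {b})"] = merged


presence = {
    "pre-work accident": [True, True, False, False, True, False],
    "no co-workers present": [True, True, False, False, True, False],
    "prior claim": [True, False, True, False, False, True],
}
top_level = merge_correlated(presence)
print(sorted(top_level))
```

Each merge removes one top-level concept, so the loop terminates; in the sample data the two perfectly correlated concepts collapse into one, leaving two top-level concepts that no longer double-count the same underlying factor.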
In some embodiments, one or more concepts in the concept hierarchy 440 can be programmatically refined by the concept generator module 430. Specifically, the concept generator module 430 can be configured to utilize one or more reference sources such as a dictionary and/or a thesaurus to refine the name or contents of one or more concepts in the concept hierarchy 440. Additionally, the concept generator module can be configured to programmatically detect additional relationships between the concepts in the concept hierarchy. The concept generator module can then optionally update the definition of the concept hierarchy to include the additional relationships.
In some embodiments, the concept generator module 430 can provide functionality that allows a user to edit the definition of one or more concepts in the concept hierarchy 440. For example, the concept generator module can output to a display (not shown) a visual representation of the concept hierarchy, and provide functionality whereby a user can send one or more input signals that indicate a desired change to the definition of one or more concepts. The concept generator module can be configured to receive the signals and effectuate the desired changes in the definition of the concept hierarchy.
In some embodiments, the concept generator module 430 can be configured to update the definition of a concept in the concept hierarchy 440 upon, for example, receipt of a signal from a user. The concept generator module 430 can additionally update a concept definition automatically in response to, for example, additional information detected within the data set, the addition of a user-defined concept to the concept hierarchy 440, or any other compositional change to the concept hierarchy. Upon completion of an update, the concept generator module can be configured to propagate the updated concept definition throughout all instances of that concept in the concept hierarchy (whether the instance of the concept be as an independent concept, as a subconcept of another, higher-level concept or within a regular expression).
In some embodiments, after completion by the concept generator module 430, the concept hierarchy can be stored in one or more electronic files or in a relational database for retrieval by or sending to a software- or hardware-based module similar to the business rule generator module discussed in connection with
As discussed in connection with
As discussed in connection with
Those skilled in the art will be familiar with the creation and evaluation of regular expressions such as regular expression 530. In this example, regular expression 530 is defined by the statement: “(like˜ or enjoy˜) pre/3 (baseball)”, which represents the notion of any word beginning with the letters “like” or “enjoy” existing in a portion of text within three words before the word “baseball”. Thus, exemplary text strings satisfying this regular expression—i.e., for which the regular expression would evaluate in the affirmative to indicate presence of the concept 520—are: “I'm currently enjoying watching baseball”, or “Many Americans like playing baseball in the spring.” In some embodiments, the regular expression 530 can include one or more additional operators such as an operator that detects the presence of a word within “x” words (expressed “w/x”; e.g., “w/5” means “within five words”) or an operator that detects the presence of a word within “x” words before (expressed “pre/x”; e.g., “pre/5” means “within five words before”). In some embodiments, the regular expression 530 can include an operator that detects the presence of a pattern of characters within the same sentence (expressed “s/s”; e.g., “ball s/s team” means “ball appearing within the same sentence as team”) or within the same paragraph (expressed “p/s”; e.g., “ball p/s team” means “ball appearing within the same paragraph as team”). The regular expression 530 can further include an operator that denotes a wildcard character (such as the characters “*”, “?”, and “=”), or any other standard regular expression operator, which are generally known to those skilled in the art.
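By way of illustration only, the example expression “(like˜ or enjoy˜) pre/3 (baseball)” might be evaluated as sketched below. The function name `pre_within` and its parameters are hypothetical, the tokenization is deliberately simple, and the sketch assumes that “within three words before” means that a matching word appears among the three words immediately preceding the target word.

```python
import re


def pre_within(text, prefixes, target, max_distance):
    """True if a word starting with any prefix occurs within `max_distance`
    words before an occurrence of `target` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    for i, word in enumerate(words):
        if word != target:
            continue
        window = words[max(0, i - max_distance):i]
        if any(w.startswith(tuple(prefixes)) for w in window):
            return True
    return False


examples = [
    "I'm currently enjoying watching baseball",
    "Many Americans like playing baseball in the spring.",
    "Baseball is a sport I like",
]
for sentence in examples:
    print(pre_within(sentence, ("like", "enjoy"), "baseball", 3), "-", sentence)
```

The first two sentences are the examples given above and satisfy the expression; the third does not, because the matching word follows rather than precedes “baseball”.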
In some embodiments, the regular expression 530 can be included as part of a concept hierarchy, such as the concept hierarchy discussed in connection with
In some embodiments, the business rule generator module 620 can receive data that includes a concept hierarchy 610 from a software- or hardware-based module, such as the concept generator module discussed in connection with
Upon receipt of the concept hierarchy 610, the business rule generator module 620 can be configured to generate a completed business rule 630. More specifically, in some embodiments the business rule generator module 620 can be configured to output to a display a visual representation of the concept hierarchy 610 and provide functionality that allows a user to define a business rule associated with the concept hierarchy. For example, the business rule definition functionality can include an area of a display that allows a user to use input devices to select one or more concepts from the concept hierarchy and define logical relationships between the concepts. In some embodiments, the functionality can include a text input field that allows a user to enter at least a portion of a business rule using a text input device, such as a computer keyboard (not shown).
In some embodiments, the visual display of the concept hierarchy 610 can include one or more reports generated by the concept hierarchy reporting module 622. The reports can include, for example, information associated with the concept hierarchy 610, such as information about positive correlations between the presence of certain data in a portion of the data set 600 and the presence of certain concepts within that same portion. These correlations can be used by the user to, for example, detect patterns and logical relationships within the data that can be included in the created business rule to improve the rule's predictive accuracy.
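By way of illustration only, one simple figure that such a report might present is the ratio between the rate of an outcome among records containing a concept and the rate of that outcome across all records (often called lift). The function name `concept_lift`, the record keys, and the sample data below are hypothetical; the embodiments do not prescribe any particular correlation measure.

```python
def concept_lift(records, concept, outcome):
    """Outcome rate among records with the concept, divided by the overall outcome rate.

    Returns None when the ratio is undefined (no records with the concept,
    or the outcome never occurs).
    """
    base = sum(1 for r in records if r[outcome]) / len(records)
    with_concept = [r for r in records if r[concept]]
    if not with_concept or base == 0:
        return None
    rate = sum(1 for r in with_concept if r[outcome]) / len(with_concept)
    return rate / base


claims = [
    {"pre_work_accident": True, "suspect": True},
    {"pre_work_accident": True, "suspect": True},
    {"pre_work_accident": False, "suspect": False},
    {"pre_work_accident": False, "suspect": False},
]
lift = concept_lift(claims, "pre_work_accident", "suspect")
print(lift)
```

A value above 1 suggests that the concept and the outcome tend to occur together in the sample, which a user could weigh when deciding whether to include the concept in a rule.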
In some embodiments, once the user has defined an initial business rule, the rule can be sent to the business rule test module 624. The business rule test module 624 can be configured to test business rule accuracy by receiving the business rule and executing it on a set of test data with known outcomes. In some embodiments, the business rule test module can receive an input signal from the user that indicates the location of the test data and the correct outcomes for that test data. The business rule test module 624 can be configured to test the received business rule by executing it on the test data, and subsequently display results of the test to a display device.
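By way of illustration only, the testing step might be sketched as a function that executes a rule on test records with known outcomes and reports where the rule disagrees. The name `evaluate_rule`, the shape of the report, and the sample rule and records are hypothetical.

```python
def evaluate_rule(rule, test_records, expected):
    """Run `rule` on each test record and compare against the known outcomes."""
    results = [bool(rule(record)) for record in test_records]
    pairs = list(zip(results, expected))
    return {
        "accuracy": sum(1 for got, want in pairs if got == want) / len(pairs),
        # Indices of records the rule flagged although the known outcome is negative.
        "false_positives": [i for i, (got, want) in enumerate(pairs) if got and not want],
        # Indices of records the rule missed although the known outcome is positive.
        "false_negatives": [i for i, (got, want) in enumerate(pairs) if want and not got],
    }


def draft_rule(record):
    return record["stolen"] and record["vandalized"]


test_records = [
    {"stolen": True, "vandalized": True},
    {"stolen": True, "vandalized": False},
    {"stolen": True, "vandalized": True},
]
known_outcomes = [True, False, False]
report = evaluate_rule(draft_rule, test_records, known_outcomes)
print(report)
```

Listing the indices of the false positives and false negatives, rather than only an accuracy figure, gives the user the specific records to inspect when refining the rule.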
After completion of the test, the business rule test module 624 can be configured to return focus to the business rule editor module 626. If the results of the above test are satisfactory to the user, the user can choose to accept the tested business rule. If the results of the test are unsatisfactory, the business rule editor module 626 can be configured to receive additional user input signals that indicate one or more desired changes to the rule. This process of receiving user input signals that indicate a desired change to the rule, followed by testing of the rule using the business rule test module 624, can be performed iteratively until the business rule editor module receives an input signal from the user that indicates completion of the business rule generation process.
In some embodiments, the business rule editor module 626 can allow a user to edit a business rule over time, as the composition and/or content of the underlying data that comprises data set 600 changes. The business rule generator module 620 can be further configured to access one or more additional sources of structured and unstructured data (not shown) and allow the user to refine the business rule by analyzing and running tests on the additional data.
The completed business rule can be a logical combination of one or more concepts, as depicted in completed business rule 630 of
For clarity, completed business rule 630 illustrates the expansion of that business rule into its component parts. In the illustrated example, Concept1 is defined as the logical combination of the presence of Concept3, Concept4, and Regular Expression1. Concept2 is composed of Regular Expression4. Concept3 is itself composed of Regular Expression3, and Concept4 is composed of the logical combination of Regular Expression2 or StructuredElement1, where StructuredElement1 represents an expression that determines the presence or absence of a specified value in structured data included in the data set 600.
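The expansion described above can be sketched, purely for illustration, as a recursive substitution over a table of concept definitions. In this sketch the dictionary layout, the function name `expand`, and the use of AND to combine the parts of Concept1 are assumptions (the text specifies only "the logical combination"); the hierarchy is also assumed to contain no cycles.

```python
# Hypothetical encoding of the concept definitions discussed above.
# Each concept maps to (operator, parts); any name not in the table is a leaf.
DEFINITIONS = {
    "Concept1": ("AND", ["Concept3", "Concept4", "RegularExpression1"]),  # AND is assumed
    "Concept2": ("AND", ["RegularExpression4"]),
    "Concept3": ("AND", ["RegularExpression3"]),
    "Concept4": ("OR", ["RegularExpression2", "StructuredElement1"]),
}


def expand(name, definitions):
    """Recursively replace a concept with its component expressions."""
    if name not in definitions:
        return name  # leaf: a regular expression or structured element
    operator, parts = definitions[name]
    expanded = [expand(part, definitions) for part in parts]
    if len(expanded) == 1:
        return expanded[0]  # a single part needs no parentheses
    return "(" + f" {operator} ".join(expanded) + ")"


expanded_concept1 = expand("Concept1", DEFINITIONS)
print(expanded_concept1)
```

Because a concept inherits the definitions of the concepts it contains, a change to Concept4 in the table would be reflected automatically the next time Concept1 is expanded.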
In another example, four concepts (labeled C1, C2, C3, and C4, respectively) can represent four factors that, if all present for the same automobile insurance claim, indicate that the claim may be fraudulent. The factors are: that the automobile in question is less than seven years old (concept created from a structured data element stored in a state automobile registration database; labeled C1), that the automobile was stolen (concept created from a structured data element stored in the insurance company's claim database; labeled C2), that the keys were removed from the ignition during the incident (concept extracted from the text of a scanned police incident report or from a structured data element; labeled C3), and that the automobile was vandalized (concept extracted from claim adjustor notes converted into computer text by optical character recognition (OCR); labeled C4). To represent a concept for the type of fraudulent claim associated with these factors, a user can define a logical relationship between these four concepts, such as: Fraudulent Claim=(C1 AND C2 AND C3 AND C4), where each concept label included in the expression represents the presence of that concept in data.
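As one hedged illustration of the expression above, the four concepts can be written as small predicates, two reading structured fields and two searching text. All field names and the two regular expressions below are hypothetical stand-ins chosen for this sketch and do not come from any particular database or report format.

```python
import re

# Hypothetical text patterns for the two concepts drawn from unstructured data.
KEYS_REMOVED = re.compile(r"\bkeys?\b[^.]*\bremoved\b", re.IGNORECASE)
VANDALIZED = re.compile(r"\bvandali[sz]ed\b", re.IGNORECASE)


def c1(claim):
    """Automobile is less than seven years old (structured data)."""
    return claim["vehicle_age_years"] < 7


def c2(claim):
    """Automobile was stolen (structured data)."""
    return claim["reported_stolen"] is True


def c3(claim):
    """Keys were removed from the ignition (police report text)."""
    return bool(KEYS_REMOVED.search(claim["police_report_text"]))


def c4(claim):
    """Automobile was vandalized (OCR text of adjustor notes)."""
    return bool(VANDALIZED.search(claim["adjustor_notes_text"]))


def fraudulent_claim(claim):
    """Fraudulent Claim = (C1 AND C2 AND C3 AND C4)."""
    return c1(claim) and c2(claim) and c3(claim) and c4(claim)


example_claim = {
    "vehicle_age_years": 3,
    "reported_stolen": True,
    "police_report_text": "Owner states the keys were removed from the ignition.",
    "adjustor_notes_text": "Vehicle recovered; interior vandalized.",
}
flagged = fraudulent_claim(example_claim)
```

Changing any one input so that its concept is absent, for example setting `vehicle_age_years` to 9, causes the conjunction to evaluate to false.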
In some embodiments, the business rule generator module 620 can be configured to store information associated with the completed business rule 630 in a memory for later use. Additionally, the business rule generator module 620 can be configured to send data that includes the completed business rule 630 to another software- or hardware-based module for execution on a data set (as discussed in connection with
The business rule 710 can be stored in a memory (e.g., a RAM, a ROM, a hard disk drive, an optical drive, or other removable media; not shown) connected via a network to the determination module 730. In some embodiments, the memory can reside on the same hardware device as the determination module 730. In some embodiments, the business rule 710 can be stored in a removable storage medium such as an optical disc, an external hard disk drive, or a flash memory module and transferred onto the hardware device on which the determination module 730 resides. The business rule 710 can include information associated with a business rule, similar to the completed business rule discussed in connection with
As illustrated in
Upon completion of the parsing process, the business rule evaluator module 734 can be configured to apply the contents of data set 700 to the expanded version of the business rule 710 and generate a determination 740. Specifically, the business rule evaluator module 734 can send data to and receive data from the unstructured object search module 736 and the structured object search module 738. The unstructured object search module 736 can be configured to search the data set 700 for a particular text string as dictated by a portion of the expanded business rule 710 currently being processed by the business rule evaluator module 734. Similarly, the structured object search module 738 can be configured to search the data set 700 for a particular structured data value as dictated by a portion of the expanded business rule 710 currently being processed by the business rule evaluator module 734. After detecting the presence or absence of the searched-for information, each of the unstructured object search module 736 and structured object search module 738 can send a signal to the business rule evaluator module 734 that includes an indication of the presence or absence of that information in the data set 700. This process can be repeated for each portion of the expanded business rule 710.
Upon receipt of all necessary information from unstructured object search module 736 and structured object search module 738 regarding the presence or absence of each concept from the expanded business rule in the data set 700, the business rule evaluator module 734 can be configured to logically evaluate the expanded business rule to compute a determination 740.
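The division of labor described above, in which an evaluator delegates each leaf of the expanded rule to either a text search or a structured-value search, can be sketched as follows. This is an illustrative sketch only: the nested-tuple rule encoding, the function names, and the layout of `data_set` are assumptions, and the sketch short-circuits during evaluation rather than first collecting every presence indication.

```python
import re


def search_unstructured(documents, pattern):
    """Return True if any text document matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return any(regex.search(doc) for doc in documents)


def search_structured(records, field, value):
    """Return True if any structured record holds `value` in `field`."""
    return any(record.get(field) == value for record in records)


def evaluate(node, data_set):
    """Evaluate an expanded rule given as nested tuples (hypothetical encoding).

    ("text", pattern)        -> delegated to the unstructured search
    ("field", name, value)   -> delegated to the structured search
    ("and" | "or", *children) and ("not", child) combine the results
    """
    kind = node[0]
    if kind == "text":
        return search_unstructured(data_set["documents"], node[1])
    if kind == "field":
        return search_structured(data_set["records"], node[1], node[2])
    if kind == "and":
        return all(evaluate(child, data_set) for child in node[1:])
    if kind == "or":
        return any(evaluate(child, data_set) for child in node[1:])
    if kind == "not":
        return not evaluate(node[1], data_set)
    raise ValueError(f"unknown node type: {kind!r}")


rule = ("and",
        ("field", "status", "stolen"),
        ("or",
         ("text", r"\bvandali[sz]ed\b"),
         ("text", r"\bkeys?\b[^.]*\bremoved\b")))
data_set = {
    "records": [{"claim_id": 1, "status": "stolen"}],
    "documents": ["Adjustor notes: vehicle was vandalized."],
}
determination = evaluate(rule, data_set)
```

Keeping the two search functions separate from the logical evaluation mirrors the arrangement in which the evaluator receives only presence or absence indications from each search module.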
In some embodiments, the determination module 730 can be configured to send text or data associated with the determination 740 to an output device, such as a display (not shown). In some embodiments, the business rule evaluator module 734 can be configured to store information associated with the determination 740 to a memory and/or send the information to another hardware- or software-based module connected, for example, via a network.
In the example, the concept generator module 820, business rule generator module 830, and determination module 850 are software-based modules that reside on a single hardware device. The insurance claim information 810 resides on multiple hardware devices that contain the insurance databases 812 and unstructured claim data objects 814. In the example, the insurance claim information is accessed by the concept generator module 820 and determination module 850 over a local area network connection. Display 860 is connected to the hardware device via a video output cable.
The insurance databases 812 contain structured data associated with one or more automobiles, drivers, and automobile insurance incident claims, including automobile information (e.g., vehicle identification numbers (VINs), make, model, model year, color, vehicle type, incident history, etc.), driver/claimant data (e.g., age, ethnicity, gender, and other relevant demographic information), driver's license and driving record data, and claim information (e.g., incident date, incident time, weather conditions at time of incident, collision type, claim date, claim amount, etc.). The unstructured claim data objects 814 consist of electronic versions of documents that contain information relevant to the claim, such as claim adjustor notes, insurance company letters and memos, attorney communications, news articles, garage and repair bills, medical notes, recorded calls and voice messages converted to text, and claimant-company communication.
In the example, concept generator module 820 receives information about one or more automobile insurance incident claims included in insurance claim information 810 over a local area network. In the example, the concept generator module 820 generates a potentially fraudulent claim concept hierarchy 80 based on the insurance claim information, implementing a method similar to that discussed in connection with
Referring back to
1. There is limited damage to the automobile (found in police or insurance company reports)
2. At the time of the incident, a third party was driving an automobile that had been written off in a prior accident (found in vehicle database)
3. At the time of the incident, the third party was driving a stolen automobile (found in vehicle database) and third party could not show proper registration (found in police notes)
4. The claimant has provided false information (found in police or insurance company notes)
5. The claim handler is suspicious (found in insurance company notes)
6. The third party is eager to settle the claim (found in insurance company notes/reports)
7. The claimant adamantly disagrees with the third party's description of the incident (found in claimant communication with insurance company)
8. One or more individuals involved in the incident is on a watch list of suspect individuals (found in insurance or police database)
The presence of any of these factors for a given claim makes that claim potentially fraudulent. Accordingly, this fact can be represented by a single logical expression, which constitutes fraudulent claim business rule 82: Potentially_Fraudulent_Claim=(C1 OR C2 OR C3 OR C4 OR C5 OR C6 OR C7 OR C8) (where each concept in the expression represents the presence of that concept in the examined data).
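As a brief, non-authoritative sketch of the disjunctive expression above, the determination can be computed from per-concept presence flags while also recording which factors triggered it. The function name and the dictionary of flags are hypothetical; how each flag is derived from the data is outside this sketch.

```python
FACTOR_LABELS = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"]


def potentially_fraudulent_claim(presence):
    """Evaluate (C1 OR C2 OR ... OR C8) over presence flags.

    presence -- dict mapping a concept label to True/False; labels that are
                missing are treated as absent (an assumption of this sketch).
    Returns (determination, triggered_labels).
    """
    triggered = [label for label in FACTOR_LABELS if presence.get(label, False)]
    return bool(triggered), triggered


determination, triggered = potentially_fraudulent_claim({"C2": False, "C5": True, "C8": True})
```

Returning the triggered labels alongside the determination lets the displayed result indicate which factors caused a claim to be marked as potentially fraudulent.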
The business rule generator module 830 then executes the business rule on a set of test data and outputs to the display results of the test to indicate the accuracy of the defined business rule. In the example, the set of test data is a subset of the insurance claim information 810 known to include fraudulent claims. Upon completion of one or more iterations of user edits to the business rule and subsequent tests for accuracy, the business rule generator module sends data that includes the fraudulent claim business rule 82 to the fraudulent claim determination module 850.
Referring back to
In particular,
Some embodiments described herein relate to a computer storage product with a computer-readable medium (also can be referred to as a processor-readable medium) having instructions or computer code thereon for performing various computer-implemented operations. The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), and Read-Only Memory (ROM) and Random-Access Memory (RAM) devices.
Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files that contain higher-level instructions that are executed by a computer using an interpreter. For example, embodiments may be implemented using Java, C++, or other programming languages (e.g., object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, not limitation, and various changes in form and details may be made. Any portion of the apparatus and/or methods described herein may be combined in any combination, except mutually exclusive combinations. The embodiments described herein can include various combinations and/or sub-combinations of the functions, components and/or features of the different embodiments described.
Claims
1. A processor-implemented method, comprising:
- determining a first concept from a first set of data objects, the first concept including a first seed concept and at least one first related concept and defined by a first regular expression indicating (1) a data code in a data object from the first set and (2) a presence of a text string in a data object from the first set;
- determining a second concept from a second set of data objects, the second concept including a second seed concept and at least one second related concept and defined by a second regular expression indicating (1) a data code in a data object from the second set and (2) a presence of a text string in a data object from the second set, such that the first concept and the second concept are not positively correlated;
- determining a business rule that includes a third regular expression indicating (1) the presence or absence of data code indicative of at least one of the first concept or the second concept in a data object from the first set or a data object from the second set and (2) the presence or absence of a text string indicative of at least one of the first concept or the second concept in a data object from the first set or a data object from the second set;
- applying the business rule to a third set of data objects to make a predictive determination for the third set; and
- outputting information associated with the predictive determination to a display.
2. The method of claim 1, wherein the first set of data objects and the third set of data objects are disjoint sets.
3. The method of claim 1, wherein the third set of data objects is a subset of the first set of data objects.
4. The method of claim 1, further comprising:
- refining the first concept based on a reference source.
5. The method of claim 4, wherein the reference source is a dictionary.
6. The method of claim 4, wherein the reference source is a thesaurus.
7. The method of claim 1, further comprising:
- refining the first concept based on a language relationship.
8. A processor-implemented method, comprising:
- defining a plurality of concepts based on a plurality of sets of data objects, each concept of the plurality of concepts including a seed concept and at least one related concept, each concept of the plurality of concepts being defined by a regular expression indicating (1) a presence of a text string in a data object from a set of data objects from the plurality of sets of data objects, and (2) a data code stored in a data object from the set of data objects from the plurality of sets of data objects;
- defining a business rule including a business rule regular expression indicating (1) the presence or absence of a text string indicative of at least one concept from the plurality of concepts in a data object from the plurality of data objects and (2) the presence or absence of data code indicative of at least one concept from the plurality of concepts in a data object from the plurality of data objects;
- applying the business rule to a further set of data objects different from each set of data objects from the plurality of sets of data objects, thereby making a predictive determination of a likelihood of a condition being met based on the set of data objects; and
- outputting information associated with the predictive determination to a display.
9. The method of claim 8, further comprising:
- refining at least one concept of the plurality of concepts using a language relationship.
10. The method of claim 8, further comprising:
- refining at least one concept of the plurality of concepts using a reference source.
11. The method of claim 10, wherein the reference source comprises a thesaurus.
12. The method of claim 10, wherein the reference source comprises a dictionary.
13. The method of claim 10, wherein the reference source comprises a dictionary and a thesaurus.
14. A processor-implemented method, comprising:
- receiving data including a concept hierarchy, the concept hierarchy including a plurality of concepts, each concept from the plurality of concepts including at least one data object;
- receiving a first plurality of user input signals, each signal from the plurality of user input signals indicating a selection of at least one concept from the plurality of concepts;
- outputting information associated with the plurality of concepts to an output device, wherein the plurality of concepts are not positively correlated;
- receiving a second plurality of user input signals that set a plurality of logical relationships between pairs of concepts from the plurality of concepts, based at least in part on a plurality of regular expressions indicating (a) a presence of a text string in a first data object of a first concept of a concept pair or a first data object of a second concept of the concept pair and (b) a data code stored in a second data object of the first concept of the concept pair or a second data object of the second concept of the concept pair;
- defining at least one of the plurality of logical relationships as a business rule; and
- executing the business rule on a set of data objects to make a predictive determination of a likelihood of a condition being met based on the set of data objects.
15. The method of claim 14, wherein the set of data objects comprises structured data objects.
16. The method of claim 14, wherein the set of data objects comprises unstructured data objects.
17. The method of claim 14, wherein the set of data objects comprises unstructured data objects and structured data objects.
Type: Application
Filed: Dec 9, 2016
Publication Date: Nov 2, 2017
Inventor: Rengaswamy MOHAN (Jacksonville, FL)
Application Number: 15/374,671