METHOD AND SYSTEM FOR BUILDING ENTITY HIERARCHY FROM BIG DATA
The various embodiments herein provide a method and a system for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining a parent entity by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context, and building a hierarchical structure of entities using knowledge repositories, ontologies and language repositories along with natural language processing techniques. The method of extracting entities from structured data comprises identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The method of extracting entities from unstructured data includes a self-learning process and a training based learning process to learn new parent entities from domain specific documents using new entity recognition models.
The present application claims priority of Indian provisional application serial number 3286/CHE/2012 filed on Aug. 10, 2012, and that application is incorporated in its entirety at least by reference.
BACKGROUND
1. Technical Field
The embodiments herein generally relate to data mining and particularly relate to extracting and resolving entities from a large collection of data. The embodiments herein more particularly relate to a method and system for extracting entities from big data and building an entity hierarchy using language and domain models.
2. Description of the Related Art
Big data is a large collection of information that derives its content from a plurality of structured, unstructured and semi-structured data sources. Big data requires a paradigm shift from the way data has been looked at in the past. Data can no longer reside in isolated pockets that do not talk to each other; it is imperative that all of it be considered as one and then processed. Recognizing the entities and the relationships they share is a first step toward understanding data. Entity extraction and entity type (parent entity) recognition are the building blocks of analyzing big data. Therefore, entity extraction and recognition should be done with the least possible manual intervention, and hence a self-learning procedure is required.
An entity is an atomic unit of data that has an independent, self-explanatory meaning; it is also referred to as an object that makes independent sense. Entities can be named, unnamed or concepts, and include names of living and non-living things, concepts, theories, or simply language units that make independent sense. In a database context, entities and relationships help in structurally storing the contents of big data.
Entity extraction means processing data to identify, tag and properly account for those elements that are names of persons, organizations and locations, numbers, and expressions such as telephone numbers, among other items. An entity can consist of a single word or a bound sequence of words. Figuring out entities is a tough challenge for several reasons, for instance because many entities exist only in richly varied forms.
Much research has been conducted on finding and identifying entities in data. An existing system addresses the extraction of named entities only. Such systems are therefore limited to the relationships that exist between named entities and never consider the relationship between concepts, or between a concept and a named entity. The existing literature does suggest building an entity hierarchy but limits itself to entity extraction and resolution.
Existing data analysis and information extraction techniques are usually designed to target a particular media type and are not applicable to data generated by a different media type. For example, existing entity extraction techniques focus on textual data: entities of interest, such as protein and gene names, chemical names and formulae, and drug names, are automatically extracted from the textual part of a document.
Existing extraction tools merely identify and extract information based on pre-specified relations and relation-specific human-tagged examples. The existing literature does not refer to self-learning capabilities of entity extractors. Further, the existing literature does not bring in domain ontologies and knowledge bases for semantic resolution in the context of entity extraction.
Accordingly, there is a need for an entity extraction method and system that is robust enough to identify new entities from big data. There is also a need for a method and system for categorizing entities in a hierarchical order to efficiently handle pattern queries. Further, there is a need for a method and system for extracting entities from various data sources irrespective of the domain.
The above-mentioned shortcomings, disadvantages and problems are addressed herein, as will be understood by reading and studying the following specification.
SUMMARY
The primary object of the embodiments herein is to provide a method and system for building an entity hierarchy from a collection of structured, unstructured and semi-structured data.
Another object of the embodiments herein is to provide a method and system for extracting a plurality of entities by analyzing big data.
Another object of the embodiments herein is to provide a method and system for facilitating an accurate and efficient pattern query relating to entities.
Another object of the embodiments herein is to provide a method and system for extracting named and unnamed entities from a collection of structured, semi-structured and unstructured data in a self-learning manner.
Another object of the embodiments herein is to provide a method and system for extracting entities and building entity hierarchy from extracted entities with least manual intervention.
Another object of the embodiments herein is to provide a method and system for extracting entities in a domain-independent manner.
These and other objects and advantages of the present embodiments will become readily apparent from the following detailed description taken in conjunction with the accompanying drawings.
The various embodiments herein provide a method for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining its parent entity or the entity type by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context and building a hierarchical structure of entities using knowledge repositories, ontologies and language repositories along with natural language processing techniques.
According to an embodiment herein, the big data comprises structured, semi-structured and unstructured data.
According to an embodiment herein, each entity is associated with a parent entity.
According to an embodiment herein, the entity is at least one of named entities and unnamed entities. The named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like. The unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning. The unnamed entities belong to the parent entity "concept"; however, there can be a hierarchy among the various concept entities.
According to an embodiment herein, extracting the plurality of entities from the structured data comprises identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The data point class is at least one of a table entity, a value entity, an attribute entity, and a database entity.
According to an embodiment herein, extracting the plurality of entities from unstructured data comprises recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger, passing the named entities and unnamed entities through multiple entity recognition models, determining the parent entity, and storing the entities along with the respective parent entity and context specific information in an entity store.
According to an embodiment herein, the entity extraction from unstructured data is a combination of a self-learning process and a training based learning process.
According to an embodiment herein, the self-learning entity extraction process comprises performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique, passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity, detecting the parent entity using a voting procedure, and storing the entities whose parent entities are detected in the entity store.
According to an embodiment herein, the self-learning entity extraction process further comprises feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector, which involves parsing documents containing domain specific knowledge and learning the parent entities from the explicit or implicit facts stated in the documents, building new entity recognition models, passing the entities through multiple entity recognition models until the parent entity is obtained, and populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
According to an embodiment herein, the training based entity extraction process comprises passing the data containing the tagged entities through multiple trained entity recognition models, determining one or more parent entities associated with the entities, and recognizing the appropriate parent entity based on a voting procedure.
According to an embodiment herein, the training based entity extraction process further comprises providing additional training samples and documents that are tagged with new domain specific entities, and populating the training samples with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
According to an embodiment herein, the entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
According to an embodiment herein, resolving the plurality of entities comprises at least one of a word sense disambiguation technique, a contextual resolution technique, syntactic similarity, and semantic similarity.
According to an embodiment herein, the entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
Embodiments herein further provide a system for building an entity hierarchy. The system comprises an entity extractor to extract a plurality of entities from a big data, a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context, an entity resolver to resolve the entities by gathering the synonymous entities together and keeping the polysemous entities apart based on syntactic and semantic contexts, and an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
According to an embodiment herein, the entity extractor comprises an entity tagger to tag named entities and unnamed entities in a data source and a parent entity detector to determine assertions of parent entity in data sources. The entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
According to an embodiment herein, the entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques and NLP based techniques.
According to an embodiment herein, the entity tagger is adapted to tag the named entities and the unnamed entities, and to tag the named entities with explicit mention of the parent entity.
According to an embodiment herein, the entity resolver understands the context in which the entities are being used and determines the parent entity.
According to an embodiment herein, the entity resolver performs a contextual resolution using the Language and Domain models, which comprise at least one of language repositories, domain ontologies, and knowledge repositories, in combination with Natural Language Processing (NLP) techniques.
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications can be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The other objects, features and advantages will occur to those skilled in the art from the following description of the preferred embodiment and the accompanying drawings in which:
The specific features of the present embodiments are shown in some drawings and not in others for convenience only, as each feature can be combined with any or all of the other features in accordance with the present embodiments.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which specific embodiments that can be practiced are shown by way of illustration. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments, and it is to be understood that logical, mechanical and other changes can be made without departing from the scope of the embodiments. The following detailed description is therefore not to be taken in a limiting sense.
The various embodiments herein provide a method for building an entity hierarchy. The method comprises extracting a plurality of entities from a big data, determining a parent entity by understanding a context in which the entity is used, resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context, and building a hierarchical structure of entities using knowledge repositories, ontologies, and language repositories along with natural language processing techniques. The big data comprises structured, semi-structured and unstructured data.
The entity is at least one of named entities and unnamed entities, where each entity is associated with a parent entity. The named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like. The unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning.
The plurality of entities are extracted from the structured data by identifying each data point as an entity and identifying entities based on a relationship defined with other entities. The data point class is at least one of a table entity, a value entity, an attribute entity, and a database entity.
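A minimal sketch of this structured-data step, assuming the database is available as a Python dict that maps table names to lists of row dicts. The `Entity` class, the `extract_structured` function and the `part_of`/`value_of` relation labels are hypothetical names chosen for illustration; they are not part of the disclosed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical record type; kind is "database", "table", "attribute" or "value".
    name: str
    kind: str


def extract_structured(db_name, tables):
    """Treat every data point as an entity and record how it relates to others.

    tables maps a table name to a list of rows (dicts of column -> value).
    Returns (entities, relations), each relation being (child, label, parent).
    """
    entities, relations = set(), set()
    db = Entity(db_name, "database")
    entities.add(db)
    for table_name, rows in tables.items():
        table = Entity(table_name, "table")
        entities.add(table)
        relations.add((table, "part_of", db))
        for row in rows:
            for column, value in row.items():
                attribute = Entity(f"{table_name}.{column}", "attribute")
                # Values are keyed by their text only, so a repeated value
                # (e.g. the same city in two rows) becomes a single entity.
                val = Entity(str(value), "value")
                entities.update((attribute, val))
                relations.add((attribute, "part_of", table))
                relations.add((val, "value_of", attribute))
    return entities, relations


entities, relations = extract_structured(
    "retail",
    {"customers": [{"name": "Asha", "city": "Chennai"},
                   {"name": "Ravi", "city": "Chennai"}]},
)
print(len(entities), len(relations))  # 7 6
```

Keying value entities by text alone is a simplifying choice; a fuller implementation would likely keep the source attribute as context so that identical strings in unrelated columns can be resolved later.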
The plurality of entities are extracted from the unstructured data by recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger, passing the named entities and unnamed entities through multiple entity recognition models, determining the parent entities, and storing the entities along with the respective parent entity and context specific information in an entity store.
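The tagging and storage steps can be illustrated with a deliberately rudimentary rule-based tagger standing in for the NLP based entity tagger. This is a sketch under stated assumptions: the capitalisation rule, the `STOPWORDS` set, the `concept_lexicon` argument and the `StoredEntity` record are all hypothetical, and the parent entity is left empty because it is determined later by the entity recognition models.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule list: capitalised function words that should not start a name.
STOPWORDS = {"The", "A", "An", "In", "On", "At", "And"}


@dataclass
class StoredEntity:
    text: str
    named: bool              # True for a named entity, False for a concept
    parent: Optional[str]    # unknown at tagging time; filled in by the ER models
    context: str             # sentence in which the entity was found


def tag_entities(sentence, concept_lexicon):
    """Rudimentary rule-based tagger (an assumption, not an NLP library call):
    runs of capitalised tokens become named-entity candidates, and tokens found
    in concept_lexicon become unnamed (concept) candidates."""
    tokens = re.findall(r"[A-Za-z][\w-]*", sentence)
    tagged, run = [], []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if tok[:1].isupper() and tok not in STOPWORDS:
            run.append(tok)
            continue
        if run:
            tagged.append((" ".join(run), True))
            run = []
        if tok.lower() in concept_lexicon:
            tagged.append((tok.lower(), False))
    return tagged


def store_entities(sentence, concept_lexicon, store):
    """Append one StoredEntity per tagged mention to an in-memory entity store."""
    for text, named in tag_entities(sentence, concept_lexicon):
        store.setdefault(text, []).append(StoredEntity(text, named, None, sentence))
    return store
```

For example, tagging "Aspirin reduces fever in patients treated at Apollo Hospital." with the lexicon `{"fever", "patients"}` yields the named candidates "Aspirin" and "Apollo Hospital" and the concepts "fever" and "patients", each stored with its sentence as context.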
The entity extraction process from unstructured data herein is a combination of a self-learning process and training based learning process.
The self-learning entity extraction process comprises performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique, passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity, detecting the parent entity using a voting procedure, and storing the entities whose parent entities are detected in the entity store.
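The voting step can be sketched as a simple majority vote over the outputs of several ER models, assuming each model is a callable that returns a parent entity or `None` when it abstains. The minimum-vote threshold, the tie rule and the three toy models below are hypothetical illustrations, not the trained maxent or CRF models the description refers to.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# An ER model is assumed to be any callable mapping an entity to a parent or None.
ERModel = Callable[[str], Optional[str]]


def vote_parent(entity: str, models: Sequence[ERModel], min_votes: int = 2):
    """Majority vote over model outputs; abstentions (None) do not count.
    Returns None when there are too few votes or the top two candidates tie."""
    votes = Counter(p for p in (m(entity) for m in models) if p is not None)
    ranked = votes.most_common()
    if not ranked:
        return None
    parent, count = ranked[0]
    if count < min_votes or (len(ranked) > 1 and ranked[1][1] == count):
        return None
    return parent


def detect_and_store(entities, models, min_votes=2):
    """Store entities whose parent was detected; return the rest as undetected."""
    store, undetected = {}, []
    for entity in entities:
        parent = vote_parent(entity, models, min_votes)
        if parent is None:
            undetected.append(entity)
        else:
            store[entity] = parent
    return store, undetected


# Three hypothetical toy models, each specialised in one kind of name.
def drug_lexicon_model(entity):
    return "Drug" if entity.lower() in {"aspirin", "ibuprofen"} else None


def drug_suffix_model(entity):
    return "Drug" if entity.lower().endswith(("in", "fen")) else None


def hospital_model(entity):
    return "Hospital" if "hospital" in entity.lower() else None
```

With these three models, "Aspirin" receives two "Drug" votes and is stored, while "Apollo Hospital" receives a single vote and is returned as undetected, which is the set of entities handed on to the NLP based entity detector.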
The self-learning entity extraction process further comprises feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector, which involves parsing documents containing domain specific knowledge and learning the parent entities from the explicit or implicit facts stated in the documents, building new entity recognition models, passing the entities through multiple entity recognition models until the parent entity is obtained, and populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
The training based entity extraction process comprises passing the data containing the tagged entities through multiple trained entity recognition models, determining one or more parent entities associated with the entities, and recognizing the appropriate parent entity based on a voting procedure.
The training based entity extraction process further comprises providing additional training samples and documents that are tagged with new domain specific entities, and populating the training samples with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
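Building a new entity recognition model from newly tagged samples might look like the sketch below. It deliberately substitutes a very simple learner (a gazetteer plus context-word counts per parent) for the maxent or CRF models named in the description; `build_er_model`, `min_score` and the sample triples are hypothetical.

```python
from collections import Counter, defaultdict


def build_er_model(samples, min_score=1):
    """Build a new, deliberately simple ER model from tagged training samples.

    samples: (entity, parent, context) triples tagged with new domain-specific
    parent entities. The model first looks the entity up in a gazetteer learned
    from the samples, then falls back to counting context words seen with each
    parent. This stands in for a real maxent/CRF learner.
    """
    gazetteer = {}
    context_counts = defaultdict(Counter)
    for entity, parent, context in samples:
        gazetteer[entity.lower()] = parent
        context_counts[parent].update(context.lower().split())

    def model(entity, context=""):
        if entity.lower() in gazetteer:
            return gazetteer[entity.lower()]
        words = context.lower().split()
        # Counter returns 0 for unseen words without inserting them.
        scores = {p: sum(c[w] for w in words) for p, c in context_counts.items()}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] < min_score:
            return None
        return best

    return model


samples = [
    ("metformin", "Antidiabetic", "doctor prescribed metformin for diabetes"),
    ("Chennai", "City", "flew to Chennai yesterday"),
]
new_model = build_er_model(samples)
```

Populating the pool of ER models then amounts to appending `new_model` to the list of models that the voting step consults. Context words are not filtered for stopwords here, which a practical learner would normally do.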
The entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
The embodiments herein use at least one of a word sense disambiguation technique, contextual resolution technique, syntactic similarity, and semantic similarity method for resolving the plurality of entities.
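The "bring synonyms together" half of resolution can be sketched with two of the listed signals: syntactic similarity, here the character match ratio from Python's `difflib.SequenceMatcher`, and semantic similarity, here a lookup in a synonym lexicon assumed to come from a language repository. The 0.85 threshold, the lexicon contents and the function names are hypothetical; polysemy is not handled in this sketch.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and strip punctuation so trivial spelling variants compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def same_entity(a, b, synonyms, threshold=0.85):
    na, nb = normalize(a), normalize(b)
    # Semantic similarity: both names map to the same canonical form
    # in a (hypothetical) synonym lexicon.
    if synonyms.get(na, na) == synonyms.get(nb, nb):
        return True
    # Syntactic similarity: character-level match ratio.
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def group_synonyms(names, synonyms, threshold=0.85):
    """Cluster names that refer to the same entity using union-find."""
    root = list(range(len(names)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if same_entity(names[i], names[j], synonyms, threshold):
                root[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())
```

For instance, with the lexicon `{"acetylsalicylic acid": "aspirin"}`, the names "Aspirin", "aspirin." and "Acetylsalicylic acid" fall into one group, while "Apollo Hospitals" and "Apollo Hospitals Ltd" form a second group through string similarity alone. Union-find makes the grouping transitive, which is convenient but means one loose match can chain two clusters together.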
The entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
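As an illustration of this combination, the sketch below treats a record parsed from JSON or XML as a mix of structured keys (which yield attribute and value entities) and free-text fields (which go through a tagger). The record layout, the `text_fields` argument and the capitalised-run rule are assumptions; the rule will also pick up sentence-initial words, so a real NLP tagger would replace it.

```python
import re

# Rudimentary rule (an assumption): a run of capitalised words is a
# named-entity candidate.
NAME_PATTERN = re.compile(r"[A-Z]\w*(?:\s+[A-Z]\w*)*")


def extract_semi_structured(record, text_fields, prefix="record"):
    """record: a dict parsed from JSON/XML; keys listed in text_fields hold free text.
    Structured keys yield attribute/value entities; free text goes through the tagger."""
    entities = []
    for key, value in record.items():
        entities.append((f"{prefix}.{key}", "attribute"))
        if key in text_fields:
            # Unstructured part of the record.
            entities.extend((m.group(), "named") for m in NAME_PATTERN.finditer(value))
        else:
            # Structured part of the record.
            entities.append((str(value), "value"))
    return entities


record = {"order_id": 42, "review": "shipped via BlueDart to Anna Nagar on time"}
print(extract_semi_structured(record, {"review"}))
```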
The system for building an entity hierarchy comprises an entity extractor to extract a plurality of entities from a big data, a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context, an entity resolver to resolve the entities by gathering the synonymous entities together and keeping the polysemous entities apart based on syntactic and semantic contexts, and an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
The entity extractor comprises an entity tagger to tag named entities and unnamed entities in a data source, and a parent entity detector to determine assertions of parent entity in data sources. The entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
The entity recognition models to detect the parent entity use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques.
The entity tagger is adapted to tag the named entities and the unnamed entities, and tag the named entities with explicit mention of the parent entity.
The entity resolver understands the context in which the entities are being used and determines the parent entity.
The entity resolver performs a contextual resolution using the Language and Domain models, which comprise at least one of language repositories, domain ontologies, and knowledge repositories, in combination with Natural Language Processing (NLP) techniques.
The entity extractor 104 comprises an entity tagger 105 and a parent entity detector 106. The entity tagger 105 tags named entities and unnamed entities in a data source and the parent entity detector 106 determines assertions of parent entity in data sources. The parent entity detector 106 passes the entities through multiple entity recognition models 107 to determine the parent entity based on a voting procedure.
The entity recognition models 107 herein use at least one of a maximum entropy model (maxent), conditional random fields (CRF), classification and clustering techniques, and NLP based techniques to detect the parent entity.
The Language and Domain model 102 is a repository used to understand the context in which the entity is being used and to determine a parent entity/entity type of the entity. The Language and Domain model 102 comprises one or more language repositories 102a, domain ontologies 102b and knowledge repositories 102c. The Language and Domain model 102 is also used to resolve the entities in structured and semi-structured contexts.
The entity resolver 103 resolves the entities by gathering the synonymous entities together and keeping the polysemous entities apart based on syntactic and semantic contexts. The entity resolution strategies are based on resolving the syntactic and semantic context. The entity resolver uses standard domain ontologies, knowledge repositories, language repositories and natural language processing techniques to establish resolution.
The entity hierarchy builder 101 arranges and stores the plurality of entities in a hierarchical manner by using a plurality of Natural Language Processing (NLP) techniques with the support of Language and Domain model 102.
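A minimal sketch of the hierarchy-building step, assuming the resolved entities and the ontology are available as (entity, parent) pairs. The functions `build_hierarchy` and `render` are hypothetical, and the NLP techniques and Language and Domain model lookups that would produce the pairs are outside the sketch.

```python
from collections import defaultdict


def build_hierarchy(pairs):
    """pairs: iterable of (entity, parent) edges, e.g. from the entity store
    and an ontology. Returns (roots, children) where children maps a node to
    its direct children."""
    children = defaultdict(list)
    has_parent, nodes = set(), set()
    for child, parent in pairs:
        if child not in children[parent]:
            children[parent].append(child)
        has_parent.add(child)
        nodes.update((child, parent))
    roots = sorted(nodes - has_parent)
    return roots, children


def render(nodes, children, depth=0, path=()):
    """Indented text view of the hierarchy. An entity with two parents is
    shown under each of them; cyclic edges are skipped."""
    lines = []
    for node in nodes:
        if node in path:  # guard against cyclic is-a edges
            continue
        lines.append("  " * depth + node)
        lines.extend(render(sorted(children.get(node, [])), children,
                            depth + 1, path + (node,)))
    return lines


pairs = [("Aspirin", "Drug"), ("Ibuprofen", "Drug"), ("Drug", "Entity"),
         ("Chennai", "Location"), ("Location", "Entity")]
roots, children = build_hierarchy(pairs)
print("\n".join(render(roots, children)))
```

The printed view nests "Aspirin" and "Ibuprofen" under "Drug" and "Chennai" under "Location", both of which sit under the root "Entity".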
Based on the requirement, one or more ER models 107 are used. The ER models may use the same technique or different techniques, but learn different types of names; for instance, a first model learns medicine names, a second model learns location names, and the like. The detected entities are then passed through a voting based parent entity detector 201 to check whether the parent entity is detected. The entities whose parent entity is detected 202 are stored in an entity storage 203. The entities whose parent entity is still unknown undergo a process of entity resolution, which is executed by Manual/Domain specific NLP based Parent Entity Detectors 204. The entity resolution uses either a manual or an automatic parent entity detector that searches for assertions of parent entities in a domain specific document collection and in structured data. The Manual/Domain specific NLP based Parent Entity Detectors 204 find new parent entities and also identify entities with respect to the new parent entities. The entities whose parent entities are still not determined are sent to the collection of NER models through a training sample 205. The model 107 keeps receiving new models built by learning from new entities whose parent entities are resolved through the NLP based parent entity detectors, from new training samples, and from documents that are tagged with new/domain specific entities. The entities with an unknown parent entity keep going through the parent entity detection processes until the parent entity is detected (205).
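The self-learning loop described above can be sketched as follows. The NLP based parent entity detector is approximated by two Hearst-style patterns ("X is a Y", "Ys such as X") searched in domain documents, and each round's newly learned parents become a new gazetteer model added to the pool. The patterns, `max_rounds` and every function name are hypothetical; `max_rounds` and the no-progress check replace "until the parent entity is detected" with a bounded loop so that the sketch always terminates.

```python
import re


def detect_from_documents(entity, documents):
    """Hypothetical NLP-based parent entity detector: looks for explicit
    is-a assertions using two Hearst-style patterns."""
    e = re.escape(entity)
    patterns = (rf"\b{e} is an? ([a-z]+)", rf"\b([a-z]+)s such as {e}\b")
    for doc in documents:
        for pattern in patterns:
            match = re.search(pattern, doc, flags=re.IGNORECASE)
            if match:
                return match.group(1).lower()
    return None


def make_gazetteer_model(table):
    """A new ER model built from entities whose parents were just learned."""
    learned = dict(table)
    return lambda entity: learned.get(entity)


def first_prediction(entity, models):
    for model in models:
        parent = model(entity)
        if parent is not None:
            return parent
    return None


def self_learn(pending, models, documents, max_rounds=5):
    """Repeat detection until no further progress (bounded by max_rounds).
    Returns the entity store and the entities whose parent is still unknown."""
    store = {}
    for _ in range(max_rounds):
        learned = {}
        for entity in pending:
            parent = first_prediction(entity, models) or detect_from_documents(entity, documents)
            if parent:
                learned[entity] = parent
        if not learned:
            break  # leave the remainder for manual or training based detection
        store.update(learned)
        models.append(make_gazetteer_model(learned))  # populate the model pool
        pending = [e for e in pending if e not in learned]
    return store, pending
```

For example, given the documents "Metformin is a drug used for type 2 diabetes." and "Large hospitals such as Apollo run pharmacies.", the loop learns "drug" for "Metformin" and "hospital" for "Apollo", while an entity mentioned nowhere is returned as still pending.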
To resolve the entities whose parent entity is not detected, additional training samples and documents that are tagged with new domain specific entities are generated, and the training samples 205 are populated with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models 302.
Training based entity recognition is also referred to as automatic learning, because an entity need not be explicitly included in the training set to be recognized, as long as it is of the designated type.
The entity resolver 103 comprises a plurality of resolution modules 401, such as entity resolution module 1, entity resolution module 2, through entity resolution module n, for resolving the extracted entities. The entity resolver 103 understands the context in which the entity is being used to determine the parent entity. The entity resolver 103 uses any one or a combination of a word sense disambiguation technique, a contextual resolution technique, syntactic similarity and semantic similarity for resolving the entities.
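As one possible word sense disambiguation module, the sketch below uses the simplified Lesk approach: pick the sense whose gloss shares the most content words with the context. The sense glosses are assumed to come from a language repository; the `STOPWORDS` set, the gloss texts and the function names are hypothetical. Keying each mention by (surface form, sense) is what keeps polysemous entities apart.

```python
import re

STOPWORDS = frozenset({"a", "an", "the", "of", "in", "on", "and", "or",
                       "to", "is", "its", "for", "with"})


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.
    senses: {parent_label: gloss}, assumed to come from a language repository.
    Returns None when no gloss overlaps or the best senses tie."""
    ctx = content_words(context)
    scores = {label: len(ctx & content_words(gloss)) for label, gloss in senses.items()}
    best = max(scores.values(), default=0)
    winners = [label for label, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


def resolve_mentions(surface, contexts, senses):
    """Key each mention by (surface form, sense) so polysemous uses stay apart."""
    return [(surface, simplified_lesk(context, senses)) for context in contexts]


senses = {
    "Organization": "technology company that designs iphone devices and software",
    "Fruit": "edible fruit of a tree eaten fresh or pressed into juice",
}
mentions = resolve_mentions(
    "Apple",
    ["Apple released a new iPhone and updated its software",
     "She drank fresh apple juice at breakfast"],
    senses,
)
```

Here the two mentions of "Apple" are resolved to ("Apple", "Organization") and ("Apple", "Fruit") respectively, so they are stored as distinct entities.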
The entity extraction process is a combination of automatic learning and training based learning. An initial set of named entities and concepts is identified based on certain rudimentary NLP based rules, and the parent entity of the identified entities and concepts is discovered. Parent entity learning is also facilitated by using tagged data for training. As more than one method is used for learning, a voting based entity resolution is performed, which establishes entity recognition by maximum score. A voting based entity resolver 402 conducts a voting procedure on the output of the various entity resolvers 103 and provides resolved entities for further processing.
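The "maximum score" voting can be sketched as summing the (optionally weighted) scores that each resolution module assigns to its candidates and keeping the highest total. The assumption that every module returns a `{candidate: score}` dict, the equal default weights and the function name are illustrative only.

```python
from collections import defaultdict


def vote_resolution(module_outputs, weights=None):
    """Combine the outputs of several resolution modules by maximum total score.

    module_outputs: one {candidate: score} dict per module (scores assumed in [0, 1]).
    weights: optional per-module weights; every module counts equally by default.
    Returns (best_candidate, total_score), or (None, 0.0) if nothing was proposed.
    """
    if weights is None:
        weights = [1.0] * len(module_outputs)
    totals = defaultdict(float)
    for weight, output in zip(weights, module_outputs):
        for candidate, score in output.items():
            totals[candidate] += weight * score
    if not totals:
        return None, 0.0
    best = max(totals, key=totals.get)
    return best, totals[best]


result = vote_resolution([
    {"Organization": 0.75, "Fruit": 0.25},  # e.g. a word sense disambiguation module
    {"Organization": 0.5, "Fruit": 0.5},    # e.g. a syntactic similarity module
    {"Organization": 1.0},                  # e.g. a contextual resolution module
])
print(result)  # ('Organization', 2.25)
```

Summing scores rather than counting votes lets a confident module outweigh an undecided one; per-module weights would be one way to favour modules known to be more reliable in a given domain.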
The embodiments herein extract entities based on certain NLP rules. The entity extractor continues to learn from the available data through different learning algorithms. Including concepts among the entities supports a wider scope for querying the data, and the ability to recognize and resolve concepts gives much higher expressiveness for modeling semantics. The entity hierarchy helps in bringing in entities related to those mentioned in a query. Building an entity hierarchy thus functions as query enrichment (query enrichment with semantic resolution), which allows any query to encompass all the entities of interest and to eliminate the ones that are not pertinent.
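One way the query enrichment described above could work is sketched below, assuming the hierarchy is held as a dict mapping each entity to its direct children. The functions `descendants` and `enrich_query` and the example hierarchy are hypothetical, and the sketch only adds narrower entities; eliminating non-pertinent ones would additionally require the sense information produced during resolution.

```python
def descendants(node, children):
    """All entities below node in the hierarchy (iterative depth-first search)."""
    out, stack, seen = [], [node], {node}
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                out.append(child)
                stack.append(child)
    return out


def enrich_query(terms, children):
    """Expand each query term with every entity beneath it, without duplicates."""
    enriched = []
    for term in terms:
        for t in [term] + descendants(term, children):
            if t not in enriched:
                enriched.append(t)
    return enriched


children = {
    "Drug": ["NSAID", "Antidiabetic"],
    "NSAID": ["Aspirin", "Ibuprofen"],
    "Antidiabetic": ["Metformin"],
}
print(enrich_query(["Drug"], children))
```

A query about "Drug" therefore also matches documents that only mention "Aspirin" or "Metformin", which is the effect the entity hierarchy is meant to provide.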
The present disclosure finds relevant entities and relationships even when the entity names are not mentioned explicitly in the big data. The entity hierarchy is useful, for example but not limited to, when a user has to search or query about entities and their relationships/interactions with other named entities or concepts. The entity hierarchy encompasses all the named and unnamed entities that exist in the big data. The embodiments of the present disclosure provide significant benefit in retail, health and pharmaceutical services, banking, insurance, and the like.
Implementation of the techniques, blocks, steps and means described above can be done in various ways. For example, these techniques, blocks, steps and means can be implemented in hardware, software, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processing devices, programmable logic devices, field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described above, and/or a combination thereof.
Also, it is noted that the embodiments can be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although the flowcharts describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but could have additional steps not included in the figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.
Furthermore, embodiments can be implemented by hardware, software, scripting languages, firmware, middleware, microcode, hardware description languages and/or any combination thereof. When implemented in software, firmware, middleware, scripting language and/or microcode, the program code or code segments to perform the necessary tasks can be stored in a machine readable medium, such as a storage medium. A code segment or machine-executable instruction may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a script, a class, or any combination of instructions, data structures and/or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters and/or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
For a firmware and/or software implementation, the methodologies can be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions can be used in implementing the methodologies described herein.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should be and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification.
Claims
1. A method of building an entity hierarchy, comprising:
- extracting a plurality of entities from a big data;
- determining a parent entity by understanding a context in which the entity is used;
- resolving the entities by bringing the synonymous entities together and holding the polysemous entities apart based on a semantic context and a syntactic context; and
- building a hierarchical structure of entities using knowledge repositories, ontologies, and language repositories along with natural language processing techniques.
2. The method of claim 1, wherein the big data comprises structured, semi-structured and unstructured data.
3. The method of claim 1, wherein each entity is associated with a parent entity.
4. The method of claim 1, wherein the entity is at least one of named entities and unnamed entities:
- wherein the named entities belong to one of the parent entities and include names of persons, organizations, locations, time expressions, quantities, monetary values and the like; and
- the unnamed entities include nouns, verbs, combinations of nouns and verbs, a concept or a language unit with independent meaning.
5. The method of claim 1, wherein extracting the plurality of entities from the structured data comprises:
- identifying each data point as an entity; and
- identifying entities based on a relationship defined with other entities;
- wherein the data point comprises at least one of a table entity, a value entity, an attribute entity, and a database entity.
6. The method of claim 5, wherein extracting the plurality of entities from unstructured data comprises:
- recognizing the named entities and the unnamed entities from data sources using a natural language processor based entity tagger;
- passing the named entities and unnamed entities through multiple entity recognition models;
- determining the parent entity; and
- storing the entities along with their respective parent entities and context-specific information in an entity store.
7. The method of claim 6, wherein the entity extraction from unstructured data is a combination of a self-learning process and a training based learning process.
8. The method of claim 7, wherein the self-learning entity extraction process comprises:
- performing entity recognition by tagging entities without explicitly knowing the parent entity using a natural language processing technique;
- passing the tagged entities through trained Entity Recognition (ER) models to learn the parent entity associated with the tagged entity;
- detecting the parent entity using a voting procedure; and
- storing the entities whose parent entities are detected in the entity store.
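Claims 8 and 10 rely on a voting procedure over several ER models without specifying how the votes are combined. A minimal sketch, assuming each model is a callable that returns a parent label or None, and assuming a simple majority rule with a minimum vote count in which ties count as "not detected". The names vote_parent, self_learn and min_votes are hypothetical; in practice the models could be, for example, the maxent or CRF recognizers listed in claim 12.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical model interface: (entity, context) -> parent label, or None if unsure.
Model = Callable[[str, str], Optional[str]]


def vote_parent(entity: str, context: str, models: Sequence[Model],
                min_votes: int = 2) -> Optional[str]:
    """Return the majority parent label, or None when votes are too few or tied."""
    votes: Counter = Counter()
    for model in models:
        label = model(entity, context)
        if label is not None:
            votes[label] += 1
    ranked = votes.most_common(2)
    if not ranked or ranked[0][1] < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie between labels: treat as undetected
    return ranked[0][0]


def self_learn(tagged: List[Tuple[str, str]], models: Sequence[Model],
               store: Dict[str, str], min_votes: int = 2) -> List[Tuple[str, str]]:
    """Store entities whose parent is detected; return the rest for further processing."""
    undetected = []
    for entity, context in tagged:
        parent = vote_parent(entity, context, models, min_votes)
        if parent is None:
            undetected.append((entity, context))
        else:
            store[entity] = parent
    return undetected
```

The entities returned by self_learn correspond to those "whose parent entities are not detected", which claim 9 routes to the NLP-based entity detector. Requiring at least two agreeing votes is an arbitrary choice that trades recall for precision.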
9. The method of claim 7, further comprising:
- feeding the data containing the entities whose parent entities are not detected to a Natural Language Processor (NLP) based entity detector, which parses documents containing domain-specific knowledge and learns the parent entities from the explicit or implicit facts stated in those documents;
- building new entity recognition models;
- passing the entities through multiple entity recognition models until the parent entity is obtained; and
- populating the entity recognition models with new entity recognition models built by learning from samples containing new entities whose parent entities are identified in domain specific documents through the NLP based entity detectors.
10. The method of claim 7, wherein the training based entity extraction process comprises:
- passing the data containing the tagged entities through multiple trained entity recognition models;
- determining one or more parent entities associated with the entities; and
- recognizing the appropriate parent entity based on a voting procedure.
11. The method of claim 10, further comprising:
- providing additional training samples and documents that are tagged with new domain specific entities; and
- populating the training samples with new kinds of parent entities suggested by the NLP based entity detectors to build new entity recognition models.
12. The method of claim 1, wherein the entity recognition models to detect the parent entity use at least one of:
- maximum entropy model (maxent);
- conditional random fields (CRF);
- classification and clustering techniques; and
- NLP based techniques.
13. The method of claim 1, wherein resolving the plurality of entities comprises at least one of:
- word sense disambiguation technique;
- contextual resolution technique;
- syntactic similarity; and
- semantic similarity.
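Claim 13 names syntactic and semantic similarity but not the measures behind them. As a sketch only, the example below substitutes a character-level ratio from Python's difflib.SequenceMatcher for syntactic similarity and Jaccard overlap of context words for semantic similarity; these measures, the thresholds and the names same_entity and resolve are all assumptions. Mentions with little shared context are held apart even when spelled alike (polysemy), while differently spelled mentions are brought together only when their contexts agree strongly (synonymy).

```python
from difflib import SequenceMatcher
from typing import FrozenSet, List, NamedTuple


class Mention(NamedTuple):
    surface: str
    context: FrozenSet[str]  # words observed around the mention


def syntactic_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # Jaccard overlap


def same_entity(m1: Mention, m2: Mention, syn_t: float = 0.8,
                sem_min: float = 0.3, sem_strong: float = 0.5) -> bool:
    sem = semantic_similarity(m1.context, m2.context)
    if sem < sem_min:
        return False  # too little shared context: keep apart, even if spelled alike
    # Similar spelling needs only moderate context agreement;
    # a different spelling (possible synonym) needs strong agreement.
    return syntactic_similarity(m1.surface, m2.surface) >= syn_t or sem >= sem_strong


def resolve(mentions: List[Mention]) -> List[List[Mention]]:
    """Greedy clustering: join the first cluster whose representative matches."""
    clusters: List[List[Mention]] = []
    for m in mentions:
        for cluster in clusters:
            if same_entity(cluster[0], m):
                cluster.append(m)
                break
        else:
            clusters.append([m])
    return clusters
```

With "Bangalore" and "Bengaluru" sharing the context words "city" and "karnataka", the two are merged, while an "Apple" seen near "fruit" and an "apple" seen near "iphone" remain separate entities. Comparing only against each cluster's first member keeps the sketch short; a fuller resolver would likely compare against all members or a cluster centroid.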
14. The method of claim 1, wherein the entity extraction from semi-structured data is a combination of extracting entities from structured data and unstructured data.
15. A system for building an entity hierarchy, comprising:
- an entity extractor to extract a plurality of entities from a big data;
- a Language and Domain model to conceptualize the entities in accordance with a structured context or semi-structured context;
- an entity resolver to resolve the entities by gathering the synonymous entities together and holding the polysemous entities apart based on syntactic and semantic contexts; and
- an entity hierarchy builder to build a hierarchical structure of entities using natural language processing techniques in conjunction with the Language and Domain models.
16. The system of claim 15, wherein the entity extractor comprises:
- an entity tagger to tag named entities and unnamed entities in a data source; and
- a parent entity detector to determine assertions of the parent entity in the data sources, wherein the entities are passed through multiple entity recognition models to determine the parent entity based on a voting procedure.
17. The system of claim 16, wherein the entity recognition models to detect the parent entity use at least one of:
- maximum entropy model (maxent);
- conditional random fields (CRF);
- classification and clustering techniques; and
- NLP based techniques.
18. The system of claim 15, wherein the entity tagger is adapted to:
- tag the named entities and the unnamed entities; and
- tag the named entities with explicit mention of the parent entity.
19. The system of claim 15, wherein the entity resolver understands the context in which the entities are being used and determines the parent entity.
20. The system of claim 15, wherein the entity resolver performs a contextual resolution using the Language and Domain models, which comprise at least one of:
- language repositories;
- domain ontologies; and
- knowledge repositories in combination with Natural Language Processing (NLP) techniques.
Type: Application
Filed: Jan 31, 2013
Publication Date: Feb 13, 2014
Applicant: XURMO TECHNOLOGIES PVT. LTD. (BANGALORE)
Inventors: SRIDHAR GOPALAKRISHNAN (BANGALORE), SUJATHA RAVIPRASAD UPADHYAYA (BANGALORE)
Application Number: 13/755,069