Abstract: Systems and methods are described for disambiguating terms, a challenging problem in computational linguistics. An ambiguous term may be a regular word or phrase, or preferably an abbreviation or acronym. A potentially ambiguous term, one having two or more potential meanings, may be identified from an information source. For each potential meaning, the context and frequency of that meaning are determined. Context may include section headings, nearby concepts, or all relevant concepts within the information source. Frequency may reflect how often the candidate concepts occur within literature, medical literature, patient records, or another information source. Context, frequency, or both for each potential meaning can support a computer-implemented algorithm in selecting one potential meaning over the others.
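The selection step described in the disambiguation abstract can be illustrated with a minimal Python sketch. It is not the claimed method. It assumes one simple way of combining the two signals: a log-frequency prior plus a count of concepts shared between the document and each candidate meaning. The `Sense` structure, the `score` and `disambiguate` functions, the `context_weight` parameter, and the toy frequencies for the abbreviation "MS" are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One potential meaning of an ambiguous term (hypothetical structure)."""
    meaning: str
    frequency: float  # assumed relative frequency in a reference corpus, must be > 0
    context_concepts: set = field(default_factory=set)  # concepts assumed to co-occur


def score(sense, observed_concepts, context_weight=1.0):
    """Combine a frequency prior with context overlap (illustrative scoring only)."""
    overlap = len(sense.context_concepts & observed_concepts)
    return math.log(sense.frequency) + context_weight * overlap


def disambiguate(senses, observed_concepts, context_weight=1.0):
    """Return the candidate sense with the highest combined score."""
    return max(senses, key=lambda s: score(s, observed_concepts, context_weight))


# Toy data: the abbreviation "MS" with made-up frequencies and concept sets.
MS_SENSES = [
    Sense("multiple sclerosis", 0.6, {"neurology", "mri", "demyelination"}),
    Sense("mitral stenosis", 0.3, {"cardiology", "echocardiogram", "murmur"}),
    Sense("morphine sulfate", 0.1, {"medications", "pain", "dose"}),
]

# Concepts found near the term, e.g. from section headings or nearby text.
note_concepts = {"cardiology", "murmur", "dyspnea"}
print(disambiguate(MS_SENSES, note_concepts).meaning)

# With no usable context, the choice falls back to the most frequent meaning.
print(disambiguate(MS_SENSES, set()).meaning)
```

In this sketch, context can override the frequency prior when enough concepts match, and frequency alone decides when the context is empty. That mirrors the abstract's statement that context, frequency, or both may drive the selection. The relative weighting is an assumption and would need tuning against real data.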
Abstract: Systems and methods are described for implementing an advanced, “research-grade” or “regulatory-grade” real-world evidence (RWE) approach. The advanced RWE approach extracts a deep phenotype from rich data sources using advanced technologies, including artificial intelligence. The rich data sources include both unstructured and structured data from electronic health records and may include additional data sources such as claims or registries. Systems and methods are also described for validating the deep phenotype, which can then be used to create a patient cohort that may be linked to exposure or outcome data to make credible clinical assertions.