Patents by Inventor Scott Carrier

Scott Carrier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Extracting information from unstructured documents using natural language processing and conversion of unstructured documents into structured documents

Patent number: 11423042

Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model, and the complementary model is deployed for use in analyzing documents.
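
The aggregation step described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the relevancy model is a toy inverse-document-frequency score, the knowledge graph links words that share a document, and each edge weight is the mean relevance of its two endpoint words. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def relevance_scores(documents):
    """Toy relevancy model (assumed): an IDF-style score for each word."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))
    n = len(documents)
    return {w: math.log((1 + n) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def knowledge_graph(documents):
    """Toy knowledge graph (assumed): an edge joins words sharing a document."""
    edges = set()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        edges.update(combinations(words, 2))
    return edges


def combine(scores, edges):
    """Weight each edge by the mean relevance score of its two endpoints."""
    return {(a, b): (scores[a] + scores[b]) / 2 for a, b in edges}


docs = ["invoice total due", "invoice number", "total amount due"]
model = combine(relevance_scores(docs), knowledge_graph(docs))
```

In this toy data set, an edge touching a rarer word such as "number" receives a higher weight than an edge between two common words.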

Type: Grant

Filed: February 7, 2020

Date of Patent: August 23, 2022

Assignee: International Business Machines Corporation

Inventors: Jothilakshmi Sirangimoorthy, Ritwik Ray, Hui Wang, Jonathan Rand, Scott Carrier

PRE-PROCESSING A TABLE IN A DOCUMENT FOR NATURAL LANGUAGE PROCESSING

Publication number: 20220230012

Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection is received of at least one of the column headers, row headers, and data cells for at least one of the main element, conditional element, and the value element in the initial set to produce a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine to perform natural language processing of the document including the table, using the modified set.

Type: Application

Filed: January 21, 2021

Publication date: July 21, 2022

Inventors: Scott Carrier, Ritwik Ray, Jonathan Chapin Rand, Jothilakshmi Sirangimoorthy, Hui Wang, Robert Fredenburg

Navigating unstructured documents using structured documents including information extracted from unstructured documents

Patent number: 11392753

Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving an unstructured document and a structured document including information extracted from the unstructured document and position information associated with the extracted information. The unstructured document is rendered in a first pane, and a graphical rendering of the structured document is rendered in a second pane. The graphical rendering generally may be a structure in which content from the structured document is displayed in a hierarchical format. Each element in the structured document is linked to the rendered unstructured document based on position information included in the structured document.

Type: Grant

Filed: February 7, 2020

Date of Patent: July 19, 2022

Assignee: International Business Machines Corporation

Inventors: Jothilakshmi Sirangimoorthy, Ritwik Ray, Hui Wang, Jonathan Rand, Scott Carrier

Inferring relation types between temporal elements and entity elements

Patent number: 11373037

Abstract: Examples described herein provide a computer-implemented method that includes receiving, by a processing device, a span of text, the span of text comprising a plurality of elements including at least an entity element and a temporal element. The method further includes organizing, by the processing device, the span of text as a natural language processing (NLP) parse tree. The method further includes traversing, by the processing device, the NLP parse tree by concatenating individual nodes of the span of text to generate a relation type between the entity element and the temporal element. The method further includes associating, by the processing device, the entity element, the relation type, and the temporal element together.

Type: Grant

Filed: October 1, 2019

Date of Patent: June 28, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Scott Carrier, Brendan Bull, Dwi Sianto Mansjur, Paul Lewis Felt

Detecting and processing sections spanning processed document partitions

Patent number: 11347928

Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.
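
The threshold decision in this abstract can be sketched in Python as follows. This is a hedged illustration only: `coherence_probability` is a hypothetical punctuation-and-capitalization heuristic standing in for the coherence model, and the function names, the dictionary cache, and the default threshold of 0.5 are all assumptions rather than details from the patent. The abstract does not say what happens when the probability equals the threshold; this sketch treats that case as "separate".

```python
def coherence_probability(cached_segment, candidate):
    """Stand-in coherence model (an assumption, not the patented model).

    Scores high when the cached text ends mid-sentence and the candidate
    paragraph starts with a lowercase letter.
    """
    ends_open = not cached_segment.rstrip().endswith((".", "!", "?"))
    starts_lower = candidate.lstrip()[:1].islower()
    return 0.05 + (0.5 if ends_open else 0.0) + (0.4 if starts_lower else 0.0)


def process_partition(candidate, cache, threshold=0.5):
    """Return the text units to process for the current partition."""
    # The cached segment is removed from the cache in every case.
    cached = cache.pop("segment", None)
    if cached is None:
        return [candidate]
    if coherence_probability(cached, candidate) > threshold:
        # Treat both pieces as one cross-partition paragraph.
        return [cached + " " + candidate]
    # Otherwise process the two pieces separately.
    return [cached, candidate]
```

For example, a cache holding "The patient was prescribed" followed by a candidate "aspirin daily for two weeks." is merged into one paragraph, while "Visit complete." followed by "Follow-up in May." is kept as two units.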

Type: Grant

Filed: July 27, 2020

Date of Patent: May 31, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Andrew J. Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier

PERFORMANCE CHARACTERISTICS OF CARTRIDGE ARTIFACTS OVER TEXT PATTERN CONSTRUCTS

Publication number: 20220129623

Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.

Type: Application

Filed: January 10, 2022

Publication date: April 28, 2022

Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier

PREVENTION OF COMPUTER VISION SYNDROME USING EXPLAINABLE ARTIFICIAL INTELLIGENCE

Publication number: 20220115128

Abstract: Technology for applying explainable artificial intelligence (XAI) algorithms to training machine learning algorithms for identifying potentially developing computer vision syndrome (CVS), existing CVS, and/or recommended remedial action(s) that a user can perform to counter potentially developing CVS and/or existing CVS. In some embodiments, the XAI includes a Contrastive Explainability model. In some embodiments, the training performed by the XAI includes assigning weight factors respectively to CVS input parameters (for example, blink rate) based upon how strongly the respective CVS input parameter is correlated with development of CVS in the user.

Type: Application

Filed: October 12, 2020

Publication date: April 14, 2022

Inventors: William G. Dusch, MacDonald Isere, Nicholas L. Graham, Scott Carrier

Performance characteristics of cartridge artifacts over text pattern constructs

Patent number: 11250205

Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.

Type: Grant

Filed: July 10, 2020

Date of Patent: February 15, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier

FUTURE POTENTIAL NATURAL LANGUAGE PROCESSING ANNOTATIONS

Publication number: 20220043968

Abstract: Aspects of the invention include resolving future reference identifiers for documents. Aspects of the invention include processing a document including a reference to a future event, wherein processing includes performing natural language processing (NLP) on the document, and identifying the reference to the future event included in the document. Aspects of the invention also include generating a future reference identifier for the reference to the future event, and responsive to processing an occurrence of the future event, resolving the future reference identifier by providing data from a subsequent document for the future event associated with the future reference identifier.

Type: Application

Filed: August 4, 2020

Publication date: February 10, 2022

Inventors: Andrew J. Lavery, Scott Carrier, Paul Joseph Hake, Igor S. Ramos

PROMISED NATURAL LANGUAGE PROCESSING ANNOTATIONS

Publication number: 20220043967

Abstract: Aspects of the invention include a computer-implemented method for generating promise identifiers for documents. Aspects include processing a document including a reference, wherein processing includes performing natural language processing (NLP) on the document, and identifying the reference included in the document. Aspects also include generating a promise identifier for the reference in the document, and responsive to processing the document, resolving the promise identifier for the reference by providing data of the reference associated with the promise identifier. Aspects of the invention also include a computer program product and system for generating promise identifiers for documents.

Type: Application

Filed: August 4, 2020

Publication date: February 10, 2022

Inventors: Andrew J. Lavery, Scott Carrier, Paul Joseph Hake, Igor S. Ramos

GENERATING ENTITY RELATION SUGGESTIONS WITHIN A CORPUS

Publication number: 20220043848

Abstract: Aspects of the invention include a computer-implemented method for entity relation type detection. The method includes detecting a plurality of candidate co-occurring entities from one or more documents. A first set of co-occurring entities and a second set of co-occurring entities from the plurality of co-occurring entities are grouped based on a synonymity of a first set of entity types associated with the first set of co-occurring entities and a second set of entity types associated with the second set of co-occurring entities. A synonymity of a first set of intervening tokens associated with the first set of co-occurring entities and a second set of intervening tokens associated with the second set of co-occurring entities is detected.

Type: Application

Filed: August 4, 2020

Publication date: February 10, 2022

Inventors: Pai-Fang Hsiao, Scott Carrier

REPLACING MAPPINGS WITHIN A SEMANTIC SEARCH APPLICATION OVER A COMMONLY ENRICHED CORPUS

Publication number: 20220035817

Abstract: Techniques include integrating a custom ontology into a semantic search function, the semantic search function being configured to perform a semantic search over a corpus enriched with a separate ontology. The semantic search function is executed using the custom ontology to perform the semantic search of the corpus. Results are generated from the semantic search of the corpus based on input received by the semantic search function.

Type: Application

Filed: July 28, 2020

Publication date: February 3, 2022

Inventors: Scott Carrier, Pai-Fang Hsiao

BOOTSTRAPPING RELATION TRAINING DATA

Publication number: 20220036007

Abstract: Aspects of the invention include a computer-implemented method for bootstrapping relation training data. The method includes traversing a corpus to detect a first passage having a first set of co-occurring entities and intervening tokens associated with a relation type. The method also includes identifying a first predicate frame of the first passage based on the co-occurring entities and intervening tokens, and traversing the corpus again to detect a second passage having a second predicate frame with the same semantic structure as the first predicate frame, wherein the second passage contains a second set of co-occurring entities, associated with the relation type, that was not detected during the first traversal. The method further includes detecting the second set of co-occurring entities in the second passage based on the second predicate frame, and annotating the second set of co-occurring entities to have the same relation as the first set of co-occurring entities.

Type: Application

Filed: July 31, 2020

Publication date: February 3, 2022

Inventors: Pai-Fang Hsiao, Scott Carrier

SEMANTIC LINKAGE QUALIFICATION OF ONTOLOGICALLY RELATED ENTITIES

Publication number: 20220036009

Abstract: Aspects of the present disclosure include determining, by a processor, an ontology, the ontology comprising a plurality of ontological relationships, receiving, by the processor, a plurality of passages, determining, by the processor, a target set of co-occurring entities comprising a first entity and a second entity, determining a first passage in the plurality of passages that includes the first entity and the second entity, determining, from the ontology, a first ontological relationship between the first entity and the second entity, analyzing the first passage to determine a congruency score for the first ontological relationship, and generating a relationship annotation between the first entity and the second entity in the first passage based on the congruency score being within a threshold.

Type: Application

Filed: July 28, 2020

Publication date: February 3, 2022

Inventors: Scott Carrier, Jennifer Lynn La Rocca, Rebecca Lynn Dahlman, Mario J. Lorenzo

CUSTOM SEMANTIC SEARCH EXPERIENCE DRIVEN BY AN ONTOLOGY

Publication number: 20220035866

Abstract: Techniques include updating a semantic search function with a custom ontology, the semantic search function initially supporting a separate ontology having been used to enrich a corpus. The custom ontology is used to augment input of a search query for the semantic search function, thereby providing a custom user experience for searching the corpus.

Type: Application

Filed: July 28, 2020

Publication date: February 3, 2022

Inventors: Scott Carrier, Pai-Fang Hsiao

HANDLING FORM DATA ERRORS ARISING FROM NATURAL LANGUAGE PROCESSING

Publication number: 20220028502

Abstract: Aspects include receiving a document and classifying at least a subset of the document as having a first type of data. Features are extracted from the document. The extracting includes initiating processing of the at least a subset of the document by a first processing engine that was previously trained to extract features from the first type of data. The extracting also includes initiating processing of a remaining portion of the document not included in the at least a subset of the document by a second processing engine that was previously trained to extract features from a second type of data. The first type of data is different than the second type of data. Features are received from one or both of the first processing engine and the second processing engine. The received features are stored as features of the document.

Type: Application

Filed: July 21, 2020

Publication date: January 27, 2022

Inventors: Paul Joseph Hake, Igor S. Ramos, Andrew J. Lavery, Scott Carrier

DETECTING AND PROCESSING SECTIONS SPANNING PROCESSED DOCUMENT PARTITIONS

Publication number: 20220027612

Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.

Type: Application

Filed: July 27, 2020

Publication date: January 27, 2022

Inventors: Andrew J. Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier

TENANT-ISOLATED CUSTOM ANNOTATIONS FOR SEARCH WITHIN A PUBLIC CORPUS

Publication number: 20220019623

Abstract: Embodiments of the present invention are directed to customizing annotations for a tenant-specific search within a public corpus. In a non-limiting embodiment of the invention, a cartridge file is received by a semantic search application. The cartridge file includes a new attribute definition that is not available in an index of the semantic search application. The new attribute definition is incorporated within the index based on an approximation of one or more existing attributes in the index. One or more documents are retrieved from the public corpus based on a concept search using the incorporated new attribute definition and the one or more documents are annotated based on the incorporated new attribute definition. The annotated one or more documents are stored in a tenant-specific dataset separate from the public corpus.

Type: Application

Filed: July 17, 2020

Publication date: January 20, 2022

Inventors: Dwi Sianto Mansjur, Scott Carrier

PERFORMANCE CHARACTERISTICS OF CARTRIDGE ARTIFACTS OVER TEXT PATTERN CONSTRUCTS

Publication number: 20220012411

Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.

Type: Application

Filed: July 10, 2020

Publication date: January 13, 2022

Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier

Sliding window to detect entities in corpus using natural language processing

Patent number: 11222165

Abstract: According to one or more embodiments of the present invention, an input request to a natural language processing (NLP) system is optimized. A window-size is selected for annotating an input corpus. The corpus is divided into partitions of the window-size, each partition processed separately. Further, a first set of entities is identified in a first partition, and a second set of entities in a second partition. Further, a third partition containing a first segment and a second segment is determined. The first segment overlaps the first partition, and the second segment overlaps the second partition. The method further includes identifying a third set of entities in the third partition. In response to the third set of entities being distinct from a set of entities from the first segment and the second segment, the window-size is adjusted. The input request for the NLP system is generated using the adjusted window-size.
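
The window adjustment in this abstract can be sketched roughly in Python. All specifics are assumptions made for illustration and are not taken from the patent: `find_entities` is a hypothetical phrase-matching annotator, the third partition is centred on each partition boundary, and "adjusting" the window is implemented as doubling it up to an assumed `max_window`.

```python
def find_entities(tokens, lexicon):
    """Toy annotator (assumed): lexicon phrases found as contiguous tokens."""
    found = set()
    for phrase in lexicon:
        words = phrase.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                found.add(phrase)
                break
    return found


def tune_window(tokens, lexicon, window, max_window=64):
    """Double the window while an overlapping partition reveals new entities."""
    while window < max_window:
        split_entity = False
        for boundary in range(window, len(tokens), window):
            start = boundary - window // 2
            third = tokens[start:start + window]       # straddles the boundary
            first_segment = tokens[start:boundary]     # overlaps first partition
            second_segment = tokens[boundary:start + window]  # overlaps second
            separate = (find_entities(first_segment, lexicon)
                        | find_entities(second_segment, lexicon))
            if find_entities(third, lexicon) - separate:
                split_entity = True   # an entity was cut by the boundary
                break
        if not split_entity:
            return window
        window *= 2
    return window


tokens = "patient has congestive heart failure and type two diabetes".split()
lexicon = {"congestive heart failure", "type two diabetes"}
adjusted = tune_window(tokens, lexicon, window=4)
```

With this input, a window of 4 cuts "congestive heart failure" in two and a window of 8 cuts "type two diabetes", so the sketch settles on 16, at which point the text fits in one partition.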

Type: Grant

Filed: August 18, 2020

Date of Patent: January 11, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Igor S. Ramos, Andrew J. Lavery, Scott Carrier, Paul Joseph Hake
