Patents by Inventor Scott Carrier

Scott Carrier has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11423042
    Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model, and the complementary model is deployed for use in analyzing documents.
    Type: Grant
    Filed: February 7, 2020
    Date of Patent: August 23, 2022
    Assignee: International Business Machines Corporation
    Inventors: Jothilakshmi Sirangimoorthy, Ritwik Ray, Hui Wang, Jonathan Rand, Scott Carrier
  • Publication number: 20220230012
    Abstract: Provided are a computer program product, system, and method for pre-processing a table in a document for natural language processing. A table in a document is parsed to extract column headers, row headers, and data cells, which are processed to determine an initial set of a main element comprising an entity whose value is to be extracted, a conditional element that refines the entity, and a value element comprising a value for the entity. A user selection is received of at least one of the column headers, row headers, and data cells for at least one of the main element, conditional element, and the value element in the initial set to produce a modified set of the main element, conditional element, and value element. The modified set is provided to a natural language processing engine to perform natural language processing of the document including the table, using the modified set.
    Type: Application
    Filed: January 21, 2021
    Publication date: July 21, 2022
    Inventors: Scott Carrier, Ritwik Ray, Jonathan Chapin Rand, Jothilakshmi Sirangimoorthy, Hui Wang, Robert Fredenburg
  • Patent number: 11392753
    Abstract: Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving an unstructured document and a structured document including information extracted from the unstructured document and position information associated with the extracted information. The unstructured document is rendered in a first pane, and a graphical rendering of the structured document is rendered in a second pane. The graphical rendering generally may be a structure in which content from the structured document is displayed in a hierarchical format. Each element in the structured document is linked to the rendered unstructured document based on position information included in the structured document.
    Type: Grant
    Filed: February 7, 2020
    Date of Patent: July 19, 2022
    Assignee: International Business Machines Corporation
    Inventors: Jothilakshmi Sirangimoorthy, Ritwik Ray, Hui Wang, Jonathan Rand, Scott Carrier
  • Patent number: 11373037
    Abstract: Examples described herein provide a computer-implemented method that includes receiving, by a processing device, a span of text, the span of text comprising a plurality of elements including at least an entity element and a temporal element. The method further includes organizing, by the processing device, the span of text as a natural language processing (NLP) parse tree. The method further includes traversing, by the processing device, the NLP parse tree by concatenating individual nodes of the span of text to generate a relation type between the entity element and the temporal element. The method further includes associating, by the processing device, the entity element, the relation type, and the temporal element together.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: June 28, 2022
    Assignee: International Business Machines Corporation
    Inventors: Scott Carrier, Brendan Bull, Dwi Sianto Mansjur, Paul Lewis Felt
  • Patent number: 11347928
    Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: May 31, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andrew J. Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier
  • Publication number: 20220129623
    Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.
    Type: Application
    Filed: January 10, 2022
    Publication date: April 28, 2022
    Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier
  • Publication number: 20220115128
    Abstract: Technology for applying explainable artificial intelligence algorithms (XAI) to training machine learning algorithms for identifying potentially developing computer vision syndrome (CVS), CVS and/or recommended remedial action(s) that a user can perform to counter potentially developing CVS and/or existing CVS. In some embodiments, the XAI includes a Contrastive Explainability model. In some embodiments, the training performed by the XAI includes assigning weight factors respectively to CVS input parameters (for example, blink rate) based upon how strongly the respective CVS input factor is correlated with development of CVS in the user.
    Type: Application
    Filed: October 12, 2020
    Publication date: April 14, 2022
    Inventors: William G. Dusch, MacDonald Isere, Nicholas L. Graham, Scott Carrier
  • Patent number: 11250205
    Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.
    Type: Grant
    Filed: July 10, 2020
    Date of Patent: February 15, 2022
    Assignee: International Business Machines Corporation
    Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier
  • Publication number: 20220043968
    Abstract: Aspects of the invention include resolving future reference identifiers for documents. Aspects of the invention include processing a document including a reference to a future event, wherein processing includes performing natural language processing (NLP) on the document, and identifying the reference to the future event included in the document. Aspects of the invention also include generating a future reference identifier for the reference to the future event, and responsive to processing an occurrence of the future event, resolving the future reference identifier by providing data from a subsequent document for the future event associated with the future reference identifier.
    Type: Application
    Filed: August 4, 2020
    Publication date: February 10, 2022
    Inventors: Andrew J. Lavery, Scott Carrier, Paul Joseph Hake, Igor S. Ramos
  • Publication number: 20220043967
    Abstract: Aspects of the invention include a computer-implemented method for generating promise identifiers for documents. Aspects include processing a document including a reference, wherein processing includes performing natural language processing (NLP) on the document, and identifying the reference included in the document. Aspects also include generating a promise identifier for the reference in the document, and responsive to processing the document, resolving the promise identifier for the reference by providing data of the reference associated with the promise identifier. Aspects of the invention also include a computer program product and system for generating promise identifiers for documents.
    Type: Application
    Filed: August 4, 2020
    Publication date: February 10, 2022
    Inventors: Andrew J. Lavery, Scott Carrier, Paul Joseph Hake, Igor S. Ramos
  • Publication number: 20220043848
    Abstract: Aspects of the invention include a computer-implemented method for entity relation type detection. The method includes detecting a plurality of candidate co-occurring entities from one or more documents. A first set of co-occurring entities and a second set of co-occurring entities from the plurality of co-occurring entities are grouped based on a synonymity of a first set of entity types associated with the first set of co-occurring entities and a second set of entity types associated with the second set of co-occurring entities. A synonymity of a first set of intervening tokens associated with the first set of co-occurring entities and a second set of intervening tokens associated with the second set of co-occurring entities is detected.
    Type: Application
    Filed: August 4, 2020
    Publication date: February 10, 2022
    Inventors: Pai-Fang Hsiao, Scott Carrier
  • Publication number: 20220035817
    Abstract: Techniques include integrating a custom ontology into a semantic search function, the semantic search function being configured to perform a semantic search over a corpus enriched with a separate ontology. The semantic search function is executed using the custom ontology to perform the semantic search of the corpus. Results are generated from the semantic search of the corpus based on input received by the semantic search function.
    Type: Application
    Filed: July 28, 2020
    Publication date: February 3, 2022
    Inventors: Scott Carrier, Pai-Fang Hsiao
  • Publication number: 20220036007
    Abstract: Aspects of the invention include a computer-implemented method for bootstrapping relation training data. The method includes traversing a corpus to detect a first passage having a first set of co-occurring entities and intervening tokens associated with a relation type, and identifying a first predicate frame of the first passage based on the co-occurring entities and intervening tokens. The corpus is traversed again to detect a second passage having a second predicate frame with the same semantic structure as the first predicate frame, wherein the second passage contains a second set of co-occurring entities associated with the relation type that was not detected during the first traversal. The second set of co-occurring entities is detected in the second passage based on the second predicate frame and annotated to have the same relation as the first set of co-occurring entities.
    Type: Application
    Filed: July 31, 2020
    Publication date: February 3, 2022
    Inventors: Pai-Fang Hsiao, Scott Carrier
  • Publication number: 20220036009
    Abstract: Aspects of the present disclosure include determining, by a processor, an ontology, the ontology comprising a plurality of ontological relationships, receiving, by the processor, a plurality of passages, determining, by the processor, a target set of co-occurring entities comprising a first entity and a second entity, determining a first passage in the plurality of passages that includes the first entity and the second entity, determining, from the ontology, a first ontological relationship between the first entity and the second entity, analyzing the first passage to determine a congruency score for the first ontological relationship, and generating a relationship annotation between the first entity and the second entity in the first passage based on the congruency score being within a threshold.
    Type: Application
    Filed: July 28, 2020
    Publication date: February 3, 2022
    Inventors: Scott Carrier, Jennifer Lynn La Rocca, Rebecca Lynn Dahlman, Mario J. Lorenzo
  • Publication number: 20220035866
    Abstract: Techniques include updating a semantic search function with a custom ontology, the semantic search function initially supporting a separate ontology having been used to enrich a corpus. The custom ontology is used to augment input of a search query for the semantic search function, thereby providing a custom user experience for searching the corpus.
    Type: Application
    Filed: July 28, 2020
    Publication date: February 3, 2022
    Inventors: Scott Carrier, Pai-Fang Hsiao
  • Publication number: 20220028502
    Abstract: Aspects include receiving a document and classifying at least a subset of the document as having a first type of data. Features are extracted from the document. The extracting includes initiating processing of the at least a subset of the document by a first processing engine that was previously trained to extract features from the first type of data. The extracting also includes initiating processing of a remaining portion of the document not included in the at least a subset of the document by a second processing engine that was previously trained to extract features from a second type of data. The first type of data is different than the second type of data. Features are received from one or both of the first processing engine and the second processing engine. The received features are stored as features of the document.
    Type: Application
    Filed: July 21, 2020
    Publication date: January 27, 2022
    Inventors: Paul Joseph Hake, Igor S. Ramos, Andrew J. Lavery, Scott Carrier
  • Publication number: 20220027612
    Abstract: Aspects of the invention include detecting and processing sections spanning processed document partitions by caching a document partition. The document partition includes metadata indicating that the document partition is a portion of a whole document. Aspects also include pairing a candidate paragraph from the document partition with a cached paragraph segment and determining, using a coherence model, a probability that the candidate paragraph and the cached paragraph segment constitute a semantically coherent paragraph. Aspects further include discarding the cached paragraph segment and processing the candidate paragraph and the cached paragraph segment separately based on a determination that the probability is less than a threshold level and processing the candidate paragraph and the cached paragraph segment together as a cross-partition paragraph based on a determination that the probability is greater than the threshold level.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Andrew J. Lavery, Igor S. Ramos, Paul Joseph Hake, Scott Carrier
  • Publication number: 20220019623
    Abstract: Embodiments of the present invention are directed to customizing annotations for a tenant-specific search within a public corpus. In a non-limiting embodiment of the invention, a cartridge file is received by a semantic search application. The cartridge file includes a new attribute definition that is not available in an index of the semantic search application. The new attribute definition is incorporated within the index based on an approximation of one or more existing attributes in the index. One or more documents are retrieved from the public corpus based on a concept search using the incorporated new attribute definition and the one or more documents are annotated based on the incorporated new attribute definition. The annotated one or more documents are stored in a tenant-specific dataset separate from the public corpus.
    Type: Application
    Filed: July 17, 2020
    Publication date: January 20, 2022
    Inventors: Dwi Sianto Mansjur, Scott Carrier
  • Publication number: 20220012411
    Abstract: Embodiments of the present invention are directed to evaluating the performance characteristics of annotator configurations against text pattern constructs in unstructured text. In a non-limiting embodiment of the invention, unstructured text is received by a processor. A text pattern construct is identified in the unstructured text and a first performance characteristic of an annotator is determined based on the text pattern construct. The text pattern construct is converted to a natural language text and a second performance characteristic of the annotator is determined based on the natural language text. A delta is determined between the first performance characteristic and the second performance characteristic. An alternative annotator configuration is identified for a portion of the unstructured text comprising the text pattern construct.
    Type: Application
    Filed: July 10, 2020
    Publication date: January 13, 2022
    Inventors: Ishrat Fatma, Sandhya Nayak, Scott Carrier
  • Patent number: 11222165
    Abstract: According to one or more embodiments of the present invention, an input request to a natural language processing (NLP) system is optimized. A window-size is selected for annotating an input corpus. The corpus is divided into partitions of the window-size, each partition processed separately. Further, a first set of entities is identified in a first partition, and a second set of entities in a second partition. Further, a third partition containing a first segment and a second segment is determined. The first segment overlaps the first partition, and the second segment overlaps the second partition. The method further includes identifying a third set of entities in the third partition. In response to the third set of entities being distinct from a set of entities from the first segment and the second segment, the window-size is adjusted. The input request for the NLP system is generated using the adjusted window-size.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: January 11, 2022
    Assignee: International Business Machines Corporation
    Inventors: Igor S. Ramos, Andrew J. Lavery, Scott Carrier, Paul Joseph Hake
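
The window-size idea in the last abstract above (patent 11222165) can be illustrated with a minimal sketch. This is not the patented implementation: the lexicon-based `find_entities` annotator, the half-window overlap, and the grow-by-one adjustment rule are all assumptions made for illustration, since the abstract only says that the window size "is adjusted".

```python
from typing import List, Set

# Hypothetical entity lexicon; a stand-in for a real NLP annotator.
LEXICON = [("heart", "attack"), ("aspirin",)]


def find_entities(tokens: List[str]) -> Set[str]:
    """Toy annotator: return lexicon phrases that occur fully inside tokens."""
    found = set()
    for phrase in LEXICON:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                found.add(" ".join(phrase))
    return found


def boundary_splits_entity(tokens: List[str], window: int) -> bool:
    """Return True if an overlapping partition sees different entities
    than the two segments it is built from, annotated separately."""
    half = window // 2  # assumed overlap: half a window on each side
    for start in range(0, len(tokens) - window, window):
        first = tokens[start:start + window]
        second = tokens[start + window:start + 2 * window]
        seg1 = first[len(first) - half:]   # tail segment of the first partition
        seg2 = second[:half]               # head segment of the second partition
        third = seg1 + seg2                # third partition spanning the boundary
        if find_entities(third) != find_entities(seg1) | find_entities(seg2):
            return True
    return False


def choose_window(tokens: List[str], window: int, max_window: int) -> int:
    """Assumed adjustment rule: grow the window by one token until no
    boundary splits an entity, or until max_window is reached."""
    while window < max_window and boundary_splits_entity(tokens, window):
        window += 1
    return window


tokens = "the patient had a heart attack and took aspirin daily".split()
print(choose_window(tokens, 5, 10))  # 6
```

With a window of 5 tokens the boundary falls between "heart" and "attack", so the overlapping partition finds an entity that neither segment finds on its own, and the sketch widens the window to 6.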