Patents by Inventor David Contreras

David Contreras has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11687796
    Abstract: An approach is provided that receives a document and a document type of the document. The document type identifies a document category to which the received document belongs. A set of linguistic metrics are retrieved that correspond to the document type. A quality of the received document is automatically determined based on a set of linguistic features found in the document as compared to the retrieved set of linguistic metrics. The document is then ingested into a corpus that is utilized by a question-answering (QA) system. The ingestion of the document is based on the determined quality.
    Type: Grant
    Filed: April 17, 2019
    Date of Patent: June 27, 2023
    Assignee: International Business Machines Corporation
    Inventors: Brien H. Muschett, Andrew R. Freed, Roberto Delima, David Contreras, Krishna Mahajan
  • Patent number: 11593561
    Abstract: A phrase that includes a trigger word that modifies a meaning within the phrase is received. The trigger word is identified. The words of the phrase that are modified by the trigger word are identified by analyzing features of the phrase that link the trigger word to other words. The phrase is interpreted by modifying the second subset of words according to the modification of the trigger word.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: February 28, 2023
    Assignee: International Business Machines Corporation
    Inventors: David Contreras, Krishna Mahajan, Roberto Delima, Kandhan Sekar, Corville O. Allen, Chris Mwarabu
  • Patent number: 11397851
    Abstract: Provided are a computer program product, system, and method for classifying text to determine a goal type used to select machine learning algorithm outcomes. Natural language processing of text is performed to determine features in the text and their relationships. A classifier classifies the text based on the relationships and features to determine a goal type. The determined features and relationships from the text are inputted into a plurality of different machine learning algorithms to generate outcomes. For each of the machine learning algorithms, a determination is made of performance measurements resulting from the machine learning algorithms generating the outcomes. A determination is made of at least one machine learning algorithm having performance measurements that are highly correlated to the determined goal type. An outcome is determined from at least one of the outcomes.
    Type: Grant
    Filed: April 13, 2018
    Date of Patent: July 26, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aysu Ezen Can, David Contreras, Bob Delima, Corville O. Allen
  • Patent number: 11392764
    Abstract: Provided are a computer program product, system, and method for classifying text to determine a goal type used to select machine learning algorithm outcomes. Natural language processing of text is performed to determine features in the text and their relationships. A classifier classifies the text based on the relationships and features to determine a goal type. The determined features and relationships from the text are inputted into a plurality of different machine learning algorithms to generate outcomes. For each of the machine learning algorithms, a determination is made of performance measurements resulting from the machine learning algorithms generating the outcomes. A determination is made of at least one machine learning algorithm having performance measurements that are highly correlated to the determined goal type. An outcome is determined from at least one of the outcomes.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: July 19, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aysu Ezen Can, David Contreras, Bob Delima, Corville O. Allen
  • Patent number: 11295080
    Abstract: A method, system, and computer program product include providing a list of triggers, training the natural language processor with the list of triggers, providing to the natural language processor a text including one trigger, selecting nodes in the text to create an original potential span, predicting whether the original potential span includes another trigger, and adjusting, in response to predicting that the original potential span includes another trigger, the original potential span to exclude the another trigger to create a new potential span.
    Type: Grant
    Filed: June 4, 2019
    Date of Patent: April 5, 2022
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Roberto Delima, David Contreras, Krishna Mahajan
  • Patent number: 11205053
    Abstract: Methods, systems and computer readable media are provided for semantic evaluation of tentative triggers based on contextual triggers. Contextual triggers are identified within text. A parse tree comprising a plurality of nodes is generated corresponding to the text. Tentative triggers are identified within the text. A determination is made as to whether one or more nodes of the parse tree corresponding to the tentative trigger is within a context of one or more nodes of the parse tree corresponding to the contextual triggers. Based on the determination, the tentative trigger type is assigned to a contextual trigger type.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: December 21, 2021
    Assignee: International Business Machines Corporation
    Inventors: David Contreras, Kandhan Sekar, Thomas Hay Rogers
  • Patent number: 11138380
    Abstract: Aspects of the present disclosure relate to identifying semantic relationships. Natural language content is received. A part of speech is determined for respective terms within the natural language content. A semantic type is determined for each of two or more terms within the natural language content. A parse tree representation containing a plurality of nodes is then generated based on the natural language content, each of the plurality of nodes corresponding to at least one term within the natural language content, wherein visual characteristics of respective nodes of the plurality of nodes within the parse tree representation depend on the part of speech and semantic type of the respective terms. A bounding box identifying a semantic relationship is then generated around a set of nodes on the parse tree representation, the set of nodes including the two or more terms.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: October 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: Chris Mwarabu, David Contreras, Roberto Delima, Corville O. Allen
  • Publication number: 20210303794
    Abstract: Methods, systems and computer readable media are provided for semantic evaluation of tentative triggers based on contextual triggers. Contextual triggers are identified within text. A parse tree comprising a plurality of nodes is generated corresponding to the text. Tentative triggers are identified within the text. A determination is made as to whether one or more nodes of the parse tree corresponding to the tentative trigger is within a context of one or more nodes of the parse tree corresponding to the contextual triggers. Based on the determination, the tentative trigger type is assigned to a contextual trigger type.
    Type: Application
    Filed: March 26, 2020
    Publication date: September 30, 2021
    Inventors: David Contreras, Kandhan Sekar, Thomas Hay Rogers
  • Patent number: 11120215
    Abstract: Aspects of the present disclosure relate to identifying spans within unstructured electronic text. Natural language content is received. A part of speech and slot name of each word within the natural language content is identified. A parse tree representation is then generated based on the natural language content, wherein visual characteristics of each node of a plurality of nodes within the parse tree representation depend on the part of speech and slot name of each word. A bounding box identifying a span category is then generated around a set of nodes on the parse tree representation by a machine learning model.
    Type: Grant
    Filed: April 24, 2019
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Chris Mwarabu, David Contreras, Roberto Delima, Corville O. Allen
  • Patent number: 11113469
    Abstract: A phrase may be received that includes a plurality of tokens in a natural language format. A plurality of levels relating to dependencies between tokens of the plurality of tokens within the phrase is determined. A matrix structure is generated for the phrase. The matrix structure utilizes a plurality of rows and a plurality of columns to store data of the phrase. The plurality of rows and the plurality of columns each indicate one of an order of tokens of the plurality of tokens or levels of the plurality of levels.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: September 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Corville O. Allen, Roberto Delima, Chris Mwarabu, David Contreras, Kandhan Sekar, Krishna Mahajan
  • Patent number: 11017171
    Abstract: A method, computer system, and a computer program product for relevancy-based document quality assessment is provided. The present invention may include computing a document quality score based on at least one container relevancy score determined based on at least one domain link to a domain knowledge base.
    Type: Grant
    Filed: June 6, 2019
    Date of Patent: May 25, 2021
    Assignee: International Business Machines Corporation
    Inventors: Roberto Delima, Andrew R. Freed, Brien Muschett, Krishna Mahajan, David Contreras
  • Patent number: 10956662
    Abstract: First content containing a plurality of list items in one or more lists can be parsed for conjunctions and implied list indicators. One or more modifications can occur at one or more conjunctions or implied list indicators. The one or more modifications can comprise one or more of expanding text, contracting text, and replacing text. The one or more modifications can generate second content conducive to natural language processing operations.
    Type: Grant
    Filed: September 12, 2018
    Date of Patent: March 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Keith P. Biegert, Brendan C. Bull, David Contreras, Robert C. Sizemore, Sterling R. Smith
  • Patent number: 10902044
    Abstract: Techniques for cognitive document quality determination and automated heuristic generation are provided. A plurality of documents is received, where each of the plurality of documents contains natural language text. A plurality of values is determined for a first plurality of predefined attributes of the plurality of documents. A plurality of quality scores is generated for the plurality of documents by processing the plurality of values using a machine learning model, where the plurality of quality scores indicate a suitability of each of the plurality of documents to be processed using a target processing operation. A subset of documents is identified from the plurality of documents having respective quality scores below a predefined threshold. The subset of documents is flagged for further processing. At least one document of the plurality of documents that is not flagged is selectively processed using the target processing operation.
    Type: Grant
    Filed: November 2, 2018
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: David Contreras, Aysu Ezen Can
  • Publication number: 20200394267
    Abstract: Aspects of the present disclosure relate to identifying semantic relationships. Natural language content is received. A part of speech is determined for respective terms within the natural language content. A semantic type is determined for each of two or more terms within the natural language content. A parse tree representation containing a plurality of nodes is then generated based on the natural language content, each of the plurality of nodes corresponding to at least one term within the natural language content, wherein visual characteristics of respective nodes of the plurality of nodes within the parse tree representation depend on the part of speech and semantic type of the respective terms. A bounding box identifying a semantic relationship is then generated around a set of nodes on the parse tree representation, the set of nodes including the two or more terms.
    Type: Application
    Filed: June 11, 2019
    Publication date: December 17, 2020
    Inventors: Chris Mwarabu, David Contreras, Roberto Delima, Corville O. Allen
  • Publication number: 20200387571
    Abstract: A method, computer system, and a computer program product for relevancy-based document quality assessment is provided. The present invention may include computing a document quality score based on at least one container relevancy score determined based on at least one domain link to a domain knowledge base.
    Type: Application
    Filed: June 6, 2019
    Publication date: December 10, 2020
    Inventors: Roberto Delima, Andrew R. Freed, Brien Muschett, Krishna Mahajan, David Contreras
  • Publication number: 20200387572
    Abstract: A method, system, and computer program product include providing a list of triggers, training the natural language processor with the list of triggers, providing to the natural language processor a text including one trigger, selecting nodes in the text to create an original potential span, predicting whether the original potential span includes another trigger, and adjusting, in response to predicting that the original potential span includes another trigger, the original potential span to exclude the another trigger to create a new potential span.
    Type: Application
    Filed: June 4, 2019
    Publication date: December 10, 2020
    Inventors: Corville O. Allen, Roberto Delima, David Contreras, Krishna Mahajan
  • Publication number: 20200342053
    Abstract: Aspects of the present disclosure relate to identifying spans within unstructured electronic text. Natural language content is received. A part of speech and slot name of each word within the natural language content is identified. A parse tree representation is then generated based on the natural language content, wherein visual characteristics of each node of a plurality of nodes within the parse tree representation depend on the part of speech and slot name of each word. A bounding box identifying a span category is then generated around a set of nodes on the parse tree representation by a machine learning model.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Inventors: Chris Mwarabu, David Contreras, Roberto Delima, Corville O. Allen
  • Publication number: 20200334546
    Abstract: An approach is provided that receives a document and a document type of the document. The document type identifies a document category to which the received document belongs. A set of linguistic metrics are retrieved that correspond to the document type. A quality of the received document is automatically determined based on a set of linguistic features found in the document as compared to the retrieved set of linguistic metrics. The document is then ingested into a corpus that is utilized by a question-answering (QA) system. The ingestion of the document is based on the determined quality.
    Type: Application
    Filed: April 17, 2019
    Publication date: October 22, 2020
    Inventors: Brien H. Muschett, Andrew R. Freed, Roberto Delima, David Contreras, Krishna Mahajan
  • Publication number: 20200311197
    Abstract: A phrase may be received that includes a plurality of tokens in a natural language format. A plurality of levels relating to dependencies between tokens of the plurality of tokens within the phrase is determined. A matrix structure is generated for the phrase. The matrix structure utilizes a plurality of rows and a plurality of columns to store data of the phrase. The plurality of rows and the plurality of columns each indicate one of an order of tokens of the plurality of tokens or levels of the plurality of levels.
    Type: Application
    Filed: March 27, 2019
    Publication date: October 1, 2020
    Inventors: Corville O. Allen, Roberto Delima, Chris Mwarabu, David Contreras, Kandhan Sekar, Krishna Mahajan
  • Publication number: 20200302332
    Abstract: A computer-implemented method, system and computer program product for generating a client-specific document quality model, by: analyzing data using existing quality heuristics to identify new, unexpected or problem patterns in the data; forming the quality heuristics into one or more clusters for each container level of the data; exploring each of the clusters to identify sources of the patterns; and developing new quality heuristics based on the sources of the patterns, wherein the new quality heuristics are used to generate the client-specific document quality model.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: David Contreras, Krishna Mahajan, Roberto Delima, Andrew R. Freed, Brien Muschett
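
Illustrative note on the matrix structure described in the abstract of patent 11113469 (publication 20200311197): the abstract speaks of rows and columns that indicate token order and dependency levels. The sketch below is one possible reading of that idea, not the patented method. It assumes the dependency parse is already available as a list of head indices (with -1 marking the root) and that it forms a well-formed tree; the function names `dependency_levels` and `build_matrix`, the head-index encoding, and the example phrase are all hypothetical.

```python
from typing import List, Optional


def dependency_levels(heads: List[int]) -> List[int]:
    """Return the depth of each token in the dependency tree.

    heads[i] is the index of token i's head; the root uses -1 (assumed encoding).
    The root is at level 0, its direct dependents at level 1, and so on.
    """
    levels = []
    for i in range(len(heads)):
        depth, node = 0, i
        while heads[node] != -1:  # walk up to the root, counting steps
            node = heads[node]
            depth += 1
        levels.append(depth)
    return levels


def build_matrix(tokens: List[str], heads: List[int]) -> List[List[Optional[str]]]:
    """Build a matrix with one row per level and one column per token position.

    Cell [level][position] holds the token if it sits at that level, else None.
    """
    if not tokens:
        return []
    levels = dependency_levels(heads)
    matrix: List[List[Optional[str]]] = [
        [None] * len(tokens) for _ in range(max(levels) + 1)
    ]
    for col, (token, level) in enumerate(zip(tokens, levels)):
        matrix[level][col] = token
    return matrix


# Hypothetical example phrase with a hand-written parse.
tokens = ["the", "patient", "denies", "chest", "pain"]
heads = [1, 2, -1, 4, 2]  # "denies" is the root

matrix = build_matrix(tokens, heads)
# Row 0: [None, None, 'denies', None, None]
# Row 1: [None, 'patient', None, None, 'pain']
# Row 2: ['the', None, None, 'chest', None]
```

Reading across a row gives the tokens at one dependency level in their original order; reading down a column gives the level of a single token. Swapping the roles of rows and columns would be an equally valid layout under the abstract's wording.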