Patents by Inventor Laura Chiticariu

Laura Chiticariu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11900070
    Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
    Type: Grant
    Filed: February 3, 2020
    Date of Patent: February 13, 2024
    Assignee: International Business Machines Corporation
    Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
  • Patent number: 11769007
    Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
    Type: Grant
    Filed: May 27, 2021
    Date of Patent: September 26, 2023
    Assignee: International Business Machines Corporation
    Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
  • Patent number: 11650970
    Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: May 16, 2023
    Assignee: International Business Machines Corporation
    Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
  • Patent number: 11636099
    Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: April 25, 2023
    Assignee: International Business Machines Corporation
    Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
  • Publication number: 20220382972
    Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
    Type: Application
    Filed: May 27, 2021
    Publication date: December 1, 2022
    Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
  • Publication number: 20210240917
    Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
    Type: Application
    Filed: February 3, 2020
    Publication date: August 5, 2021
    Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
  • Publication number: 20210056101
    Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 25, 2021
    Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
  • Patent number: 10783328
    Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.
    Type: Grant
    Filed: June 4, 2018
    Date of Patent: September 22, 2020
    Assignee: International Business Machines Corporation
    Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
  • Publication number: 20190370333
    Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.
    Type: Application
    Filed: June 4, 2018
    Publication date: December 5, 2019
    Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
  • Publication number: 20190278853
    Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
    Type: Application
    Filed: March 9, 2018
    Publication date: September 12, 2019
    Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
  • Patent number: 10296573
    Abstract: Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with eased automated guidance to write extractors, which thereby reduces an overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices.
    Type: Grant
    Filed: August 31, 2016
    Date of Patent: May 21, 2019
    Assignee: International Business Machines Corporation
    Inventors: Arnaldo Carreno-Fuentes, Laura Chiticariu, Eser Kandogan, Yunyao Li, Huahai Yang
  • Patent number: 10289963
    Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
    Type: Grant
    Filed: February 27, 2017
    Date of Patent: May 14, 2019
    Assignee: International Business Machines Corporation
    Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
  • Patent number: 10162852
    Abstract: Embodiments relate to facilitating construction of concepts from a task specification. A method includes receiving, from a user via a user interface, a task specification in natural language form. The method also includes parsing the task specification into a plurality of components, and searching a database for an existing concept having a pattern that approximates at least a portion of the plurality of components. The concept includes semantic meanings that are representable by textual patterns. The method further includes identifying any components of the plurality of components that are not included in the existing concept, and building a new concept that combines the existing concept and the components of the plurality of components that are not included in the existing concept.
    Type: Grant
    Filed: December 16, 2013
    Date of Patent: December 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Laura Chiticariu, George A. Cypher, Rajasekar Krishnamurthy, Yunyao Li, Huahai Yang
  • Publication number: 20180246867
    Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
    Type: Application
    Filed: February 27, 2017
    Publication date: August 30, 2018
    Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
  • Patent number: 10042846
    Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
    Type: Grant
    Filed: April 28, 2016
    Date of Patent: August 7, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
  • Patent number: 9898460
    Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
    Type: Grant
    Filed: January 26, 2016
    Date of Patent: February 20, 2018
    Assignee: International Business Machines Corporation
    Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
  • Publication number: 20170315986
    Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
    Type: Application
    Filed: April 28, 2016
    Publication date: November 2, 2017
    Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
  • Patent number: 9734297
    Abstract: A method for extracting information from electronic documents, including: learning terms and term variants from a training corpus, wherein the terms and the term variants correspond to a specialized dictionary related to the training corpus; generating a list of negative indicators found in the training corpus; performing a partial match of the terms and the term variants in a set of electronic documents to create initial match results; and performing a negation test using the negative indicators and a positive terms test using the terms and the term variants on the initial match results to remove matches from the initial match results that fail either the negation test or the positive terms test, resulting in final match results.
    Type: Grant
    Filed: August 28, 2012
    Date of Patent: August 15, 2017
    Assignee: International Business Machines Corporation
    Inventors: Tanveer F. Syeda-Mahmood, Laura Chiticariu
  • Publication number: 20170212890
    Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
    Type: Application
    Filed: January 26, 2016
    Publication date: July 27, 2017
    Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
  • Patent number: 9652627
    Abstract: Probabilistic surfacing of potentially sensitive identifiers is provided. In one embodiment of the present invention, a method of and computer program product for surfacing of potentially sensitive identifiers are provided. An input string is read. The input string has a length. The input string is divided into a plurality of tokens. Each of the tokens has a predetermined length. A score is determined for each of the plurality of tokens. A composite score is determined based on the scores of each of the plurality of tokens. Whether the input string comprises an identifier is determined by comparing the composite score to a predetermined threshold.
    Type: Grant
    Filed: October 22, 2014
    Date of Patent: May 16, 2017
    Assignee: International Business Machines Corporation
    Inventors: Varun Bhagwan, Laura Chiticariu, Daniel F. Gruhl
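
The abstract of patent 9652627 outlines a pipeline: split an input string into fixed-length tokens, score each token, combine the scores, and compare the result to a threshold. The Python sketch below illustrates that flow only. The abstract does not say how tokens are cut, how they are scored, or how scores are combined, so the sliding-window tokenization, the digit-fraction score, the mean as composite, and the default values of `token_length` and `threshold` are all assumptions chosen for illustration, not the patented method.

```python
from typing import List


def tokenize(text: str, token_length: int) -> List[str]:
    """Split text into tokens that all have the same, predetermined length.

    Assumption: tokens are overlapping sliding windows (character n-grams).
    """
    if token_length <= 0:
        raise ValueError("token_length must be positive")
    return [text[i:i + token_length] for i in range(len(text) - token_length + 1)]


def token_score(token: str) -> float:
    """Hypothetical per-token score: the fraction of characters that are digits."""
    return sum(ch.isdigit() for ch in token) / len(token)


def composite_score(scores: List[float]) -> float:
    """Hypothetical composite: the arithmetic mean of the token scores."""
    return sum(scores) / len(scores) if scores else 0.0


def looks_like_identifier(text: str, token_length: int = 3, threshold: float = 0.5) -> bool:
    """Flag text as a potential identifier when its composite score reaches the threshold."""
    scores = [token_score(tok) for tok in tokenize(text, token_length)]
    return composite_score(scores) >= threshold


if __name__ == "__main__":
    for sample in ["123-45-6789", "hello world"]:
        print(sample, looks_like_identifier(sample))
```

With these illustrative choices, a digit-heavy string such as "123-45-6789" is flagged and ordinary prose such as "hello world" is not. A string shorter than `token_length` yields no tokens and is never flagged.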