Patents by Inventor Laura Chiticariu

Laura Chiticariu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Producing explainable rules via deep learning

Patent number: 11900070

Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.

Type: Grant

Filed: February 3, 2020

Date of Patent: February 13, 2024

Assignee: International Business Machines Corporation

Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
Treebank synthesis for training production parsers

Patent number: 11769007

Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.

Type: Grant

Filed: May 27, 2021

Date of Patent: September 26, 2023

Assignee: International Business Machines Corporation

Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
Extracting structure and semantics from tabular data

Patent number: 11650970

Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.

Type: Grant

Filed: March 9, 2018

Date of Patent: May 16, 2023

Assignee: International Business Machines Corporation

Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
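For the simplest kind of table, the final step of this abstract (one tuple per data cell, carrying the header information that applies to it) can be sketched in Python as below. This is an illustrative sketch, not the patented method. It assumes a plain grid whose first row holds column headers and whose first column holds row headers, so region identification and header matching reduce to position lookups. The function name `cell_tuples` is hypothetical.

```python
from typing import List, Tuple

# Illustrative sketch; assumes a simple grid where row 0 holds column
# headers and column 0 holds row headers. Names are hypothetical.

def cell_tuples(table: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Return one (row header, column header, value) tuple per data cell."""
    column_headers = table[0][1:]
    tuples = []
    for row in table[1:]:
        row_header, values = row[0], row[1:]
        # Pair each data cell with the column header above it.
        for col_header, value in zip(column_headers, values):
            tuples.append((row_header, col_header, value))
    return tuples
```

Tables with nested or multi-row headers would need the region-detection and header-matching steps the abstract describes, which this sketch does not attempt.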
Domain-specific labeled question generation for training syntactic parsers

Patent number: 11636099

Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.

Type: Grant

Filed: August 23, 2019

Date of Patent: April 25, 2023

Assignee: International Business Machines Corporation

Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
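The template-abstraction and generation steps in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method. Instead of parsing the question into a tree, a hypothetical `entity_types` mapping marks which mentions to replace with typed slots. The domain schema is modeled as a mapping from slot type to domain type, and the knowledge base as lists of entities per domain type. All names are hypothetical.

```python
from itertools import product
from typing import Dict, List

# Illustrative sketch; entity_types stands in for parse-tree abstraction.
# Function and parameter names are hypothetical.

def abstract_template(question: str, entity_types: Dict[str, str]) -> str:
    """Replace each known entity mention with a typed slot such as '{Person}'."""
    template = question
    for mention, etype in entity_types.items():
        template = template.replace(mention, "{" + etype + "}")
    return template

def generate_questions(template: str, schema: Dict[str, str],
                       kb: Dict[str, List[str]]) -> List[str]:
    """Fill every slot with entities of the schema-mapped type from the KB."""
    slots = [s for s in schema if "{" + s + "}" in template]
    choices = [kb[schema[s]] for s in slots]
    # One new question per combination of slot fillers.
    return [template.format(**dict(zip(slots, combo)))
            for combo in product(*choices)]
```

For example, abstracting "Who manages Alice?" with `{"Alice": "Person"}` gives the template "Who manages {Person}?", which, with schema `{"Person": "Employee"}` and knowledge base `{"Employee": ["Bob", "Carol"]}`, yields "Who manages Bob?" and "Who manages Carol?".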
Treebank synthesis for training production parsers

Publication number: 20220382972

Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.

Type: Application

Filed: May 27, 2021

Publication date: December 1, 2022

Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
Producing explainable rules via deep learning

Publication number: 20210240917

Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.

Type: Application

Filed: February 3, 2020

Publication date: August 5, 2021

Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
Domain-specific labeled question generation for training syntactic parsers

Publication number: 20210056101

Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.

Type: Application

Filed: August 23, 2019

Publication date: February 25, 2021

Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
Semi-automatic process for creating a natural language processing resource

Patent number: 10783328

Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.

Type: Grant

Filed: June 4, 2018

Date of Patent: September 22, 2020

Assignee: International Business Machines Corporation

Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
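One simple way to realize the routing decision in this abstract is sketched below: rank tasks by a task-side parameter and use an annotator-side parameter to decide how many go to experts. This is an assumption-laden illustration, not the patented method. The `difficulty` score and the `expert_capacity` limit are hypothetical stand-ins for the task and user parameters the abstract mentions.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch; AnnotationTask, difficulty and expert_capacity
# are hypothetical names, not taken from the patent.

@dataclass
class AnnotationTask:
    task_id: str
    difficulty: float  # hypothetical task parameter, higher means harder

def route_tasks(tasks: List[AnnotationTask],
                expert_capacity: int) -> Dict[str, List[str]]:
    """Send the hardest tasks to experts, up to their capacity; the rest to non-experts."""
    ranked = sorted(tasks, key=lambda t: t.difficulty, reverse=True)
    return {
        "expert": [t.task_id for t in ranked[:expert_capacity]],
        "non_expert": [t.task_id for t in ranked[expert_capacity:]],
    }
```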
Semi-automatic process for creating a natural language processing resource

Publication number: 20190370333

Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.

Type: Application

Filed: June 4, 2018

Publication date: December 5, 2019

Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
Extracting structure and semantics from tabular data

Publication number: 20190278853

Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.

Type: Application

Filed: March 9, 2018

Publication date: September 12, 2019

Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
Building and maintaining information extraction rules

Patent number: 10296573

Abstract: Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are given simplified, automated guidance for writing extractors, which reduces the overall manual effort involved in extractor development. Generally, a focused, tutorial-style environment guides developers based on previously developed best practices.

Type: Grant

Filed: August 31, 2016

Date of Patent: May 21, 2019

Assignee: International Business Machines Corporation

Inventors: Arnaldo Carreno-Fuentes, Laura Chiticariu, Eser Kandogan, Yunyao Li, Huahai Yang
Unified text analytics annotator development life cycle combining rule-based and machine learning based techniques

Patent number: 10289963

Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine…

Type: Grant

Filed: February 27, 2017

Date of Patent: May 14, 2019

Assignee: International Business Machines Corporation

Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
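The combine-then-evaluate steps of this abstract can be sketched as follows, with annotations represented as character spans. This is a hypothetical illustration, not the patented method: the year-matching rule is invented for the example, the machine-learning annotator is represented only by the set of spans it would output, and taking the union of both annotators' spans is just one simple combination strategy.

```python
import re
from typing import Set, Tuple

# Illustrative sketch; the rule, the union strategy and all names are
# hypothetical. A trained ML annotator is assumed to yield a set of spans.
Span = Tuple[int, int]

def rule_annotator(text: str) -> Set[Span]:
    """Hypothetical rule: the target concept is a four-digit year."""
    return {m.span() for m in re.finditer(r"\b(19|20)\d{2}\b", text)}

def combine(rule_spans: Set[Span], ml_spans: Set[Span]) -> Set[Span]:
    """Keep a span if either annotator proposes it."""
    return rule_spans | ml_spans

def evaluate(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted spans against an evaluation set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```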
Constructing concepts from a task specification

Patent number: 10162852

Abstract: Embodiments relate to facilitating construction of concepts from a task specification. A method includes receiving, from a user via a user interface, a task specification in natural language form. The method also includes parsing the task specification into a plurality of components, and searching a database for an existing concept having a pattern that approximates at least a portion of the plurality of components. The concept includes semantic meanings that are representable by textual patterns. The method further includes identifying any components of the plurality of components that are not included in the existing concept, and building a new concept that combines the existing concept and the components of the plurality of components that are not included in the existing concept.

Type: Grant

Filed: December 16, 2013

Date of Patent: December 25, 2018

Assignee: International Business Machines Corporation

Inventors: Laura Chiticariu, George A. Cypher, Rajasekar Krishnamurthy, Yunyao Li, Huahai Yang
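The search-and-extend idea in this abstract can be sketched as follows. This is a deliberately crude, hypothetical illustration: "parsing" the task specification is reduced to splitting it into lowercase words, each existing concept is modeled as a set of components, and the best match is the concept sharing the most components with the specification. It is not the patented method, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Illustrative sketch; word splitting stands in for real parsing and
# concepts are modeled as component sets. Names are hypothetical.

def build_concept(spec: str, concepts: Dict[str, FrozenSet[str]]
                  ) -> Tuple[Optional[str], FrozenSet[str]]:
    """Return the closest existing concept and a new concept extending it."""
    components = frozenset(spec.lower().split())
    best_name, best_overlap = None, 0
    for name, pattern in concepts.items():
        overlap = len(pattern & components)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    base = concepts[best_name] if best_name else frozenset()
    # Components of the specification not covered by the existing concept.
    missing = components - base
    return best_name, base | missing
```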
Unified text analytics annotator development life cycle combining rule-based and machine learning based techniques

Publication number: 20180246867

Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine…

Type: Application

Filed: February 27, 2017

Publication date: August 30, 2018

Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
Cross-lingual information extraction program

Patent number: 10042846

Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.

Type: Grant

Filed: April 28, 2016

Date of Patent: August 7, 2018

Assignee: International Business Machines Corporation

Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
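The benefit of mapping to a single cross-lingual representation is that the extraction program is written only once. The sketch below illustrates that idea under assumptions that are not in the abstract: each language-specific parse is a dict with a PropBank-style predicate sense and argument labels shared across languages, and a small hypothetical lexicon maps the senses to shared frames.

```python
from typing import Any, Dict, List

# Illustrative sketch; the lexicon, frame names and argument labels are
# hypothetical assumptions, not details from the patent.
TO_SHARED: Dict[str, str] = {
    "en:buy.01": "PURCHASE",
    "de:kaufen.01": "PURCHASE",
    "fr:acheter.01": "PURCHASE",
}

def to_cross_lingual(lang: str, parse: Dict[str, Any]) -> Dict[str, Any]:
    """Map one language-specific representation onto the shared one."""
    key = f"{lang}:{parse['predicate']}"
    return {"frame": TO_SHARED.get(key, "UNKNOWN"), "args": parse["args"]}

def extract_purchases(reps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """A single extractor written against the shared representation."""
    return [r["args"] for r in reps if r["frame"] == "PURCHASE"]
```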
Generation of a natural language resource using a parallel corpus

Patent number: 9898460

Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained…

Type: Grant

Filed: January 26, 2016

Date of Patent: February 20, 2018

Assignee: International Business Machines Corporation

Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
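The projection, filtering and selection steps of this abstract can be sketched for token-level labels as follows. This is a hypothetical illustration, not the patented method. It assumes word alignments between source and target tokens are already given as index pairs, uses "conflicting labels on one target token" as the example of an error filter, and picks an arbitrary 90% coverage threshold for "substantially complete". The model-training step is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

# Illustrative sketch; function names, the conflict filter and the
# coverage threshold are hypothetical.

def project(src_labels: List[str], alignment: List[Tuple[int, int]],
            tgt_len: int) -> List[Optional[str]]:
    """Copy each source token's label onto its aligned target token(s)."""
    candidates: Dict[int, Set[str]] = {}
    for s, t in alignment:
        candidates.setdefault(t, set()).add(src_labels[s])
    # Filter: a target token that receives conflicting labels is treated
    # as a projection error and left unannotated (None).
    return [next(iter(candidates[i])) if len(candidates.get(i, ())) == 1 else None
            for i in range(tgt_len)]

def is_substantially_complete(tgt_labels: List[Optional[str]],
                              min_coverage: float = 0.9) -> bool:
    """Keep a target sentence only if most of its tokens carry a label."""
    if not tgt_labels:
        return False
    covered = sum(label is not None for label in tgt_labels)
    return covered / len(tgt_labels) >= min_coverage
```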
Cross-lingual information extraction program

Publication number: 20170315986

Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.

Type: Application

Filed: April 28, 2016

Publication date: November 2, 2017

Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
Extraction of information from clinical reports

Patent number: 9734297

Abstract: A method for extracting information from electronic documents, including: learning terms and term variants from a training corpus, wherein the terms and the term variants correspond to a specialized dictionary related to the training corpus; generating a list of negative indicators found in the training corpus; performing a partial match of the terms and the term variants in a set of electronic documents to create initial match results; and performing a negation test using the negative indicators and a positive terms test using the terms and the term variants on the initial match results to remove matches from the initial match results that fail either the negation test or the positive terms test, resulting in final match results.

Type: Grant

Filed: August 28, 2012

Date of Patent: August 15, 2017

Assignee: International Business Machines Corporation

Inventors: Tanveer F Syeda-Mahmood, Laura Chiticariu
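The matching and filtering steps of this abstract can be sketched as follows. This is a simplified, hypothetical illustration, not the patented method. Learning terms and negative indicators from a training corpus is skipped; both are passed in directly. The positive terms test is approximated as requiring a whole-word match, and the negation test rejects a match when a negative indicator appears earlier in the same sentence.

```python
import re
from typing import Dict, List, Set, Tuple

# Illustrative sketch; the whole-word "positive terms test" and the
# sentence-scoped negation test are simplifying assumptions.

def extract_findings(sentences: List[str], variants: Dict[str, Set[str]],
                     negations: Set[str]) -> List[Tuple[int, str]]:
    """Return (sentence index, canonical term) pairs that pass both tests."""
    results = []
    for i, sentence in enumerate(sentences):
        lower = sentence.lower()
        for term, forms in variants.items():
            for form in forms:
                if form not in lower:  # partial (substring) match
                    continue
                # Positive terms test: the variant must occur as whole words.
                m = re.search(r"\b" + re.escape(form) + r"\b", lower)
                if not m:
                    continue
                # Negation test: reject if a negative indicator precedes it.
                if set(re.findall(r"[a-z]+", lower[:m.start()])) & negations:
                    continue
                results.append((i, term))
                break
    return results
```

With `variants = {"stenosis": {"stenosis", "narrowing"}}` and `negations = {"no"}`, the sentence "No evidence of stenosis." is rejected by the negation test, and "Restenosis was ruled out." is rejected because "stenosis" appears only inside a longer word.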
Generation of a natural language resource using a parallel corpus

Publication number: 20170212890

Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained…

Type: Application

Filed: January 26, 2016

Publication date: July 27, 2017

Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
Probabilistic surfacing of potentially sensitive identifiers

Patent number: 9652627

Abstract: Probabilistic surfacing of potentially sensitive identifiers is provided. In one embodiment of the present invention, a method of and computer program product for surfacing of potentially sensitive identifiers are provided. An input string is read. The input string has a length. The input string is divided into a plurality of tokens. Each of the tokens has a predetermined length. A score is determined for each of the plurality of tokens. A composite score is determined based on the scores of each of the plurality of tokens. Whether the input string comprises an identifier is determined by comparing the composite score to a predetermined threshold.

Type: Grant

Filed: October 22, 2014

Date of Patent: May 16, 2017

Assignee: International Business Machines Corporation

Inventors: Varun Bhagwan, Laura Chiticariu, Daniel F. Gruhl
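The steps in this abstract (split the input into fixed-length tokens, score each token, combine the scores, compare with a threshold) can be sketched as follows. This is a hypothetical illustration, not the patented method. Tokens are taken as overlapping character n-grams, each token's score is its share of digits (a stand-in for whatever learned probability a real system would use), the composite score is the mean, and the token length of 3 and threshold of 0.6 are arbitrary choices.

```python
from typing import List

# Illustrative sketch; the digit-share score, the mean as composite
# score, and the default n and threshold are hypothetical.

def tokens(s: str, n: int) -> List[str]:
    """Overlapping character n-grams, each exactly n characters long."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def token_score(token: str) -> float:
    """Hypothetical score: share of digits in the token."""
    return sum(ch.isdigit() for ch in token) / len(token)

def looks_like_identifier(s: str, n: int = 3, threshold: float = 0.6) -> bool:
    """Compare the mean token score of the input string to a threshold."""
    grams = tokens(s, n)
    if not grams:  # input shorter than the token length
        return False
    composite = sum(token_score(g) for g in grams) / len(grams)
    return composite >= threshold
```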
