Patents by Inventor Laura Chiticariu
Laura Chiticariu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11900070
Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
Type: Grant
Filed: February 3, 2020
Date of Patent: February 13, 2024
Assignee: International Business Machines Corporation
Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
-
Patent number: 11769007
Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
Type: Grant
Filed: May 27, 2021
Date of Patent: September 26, 2023
Assignee: International Business Machines Corporation
Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
-
Patent number: 11650970
Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
Type: Grant
Filed: March 9, 2018
Date of Patent: May 16, 2023
Assignee: International Business Machines Corporation
Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
-
Patent number: 11636099
Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.
Type: Grant
Filed: August 23, 2019
Date of Patent: April 25, 2023
Assignee: International Business Machines Corporation
Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
-
Publication number: 20220382972
Abstract: An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
Type: Application
Filed: May 27, 2021
Publication date: December 1, 2022
Inventors: Yousef El-Kurdi, Radu Florian, Hiroshi Kanayama, Efsun Kayi, Laura Chiticariu, Takuya Ohko, Robert Todd Ward
-
Publication number: 20210240917
Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
Type: Application
Filed: February 3, 2020
Publication date: August 5, 2021
Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
-
Publication number: 20210056101
Abstract: A computer-implemented method for generating a question from an abstracted template is described. A non-limiting example of the computer-implemented method includes receiving, by a processor, a question. The method parses, by the processor, the question into a parse tree and abstracts, by the processor, an abstracted template from the parse tree. The method receives, by the processor, a domain schema and a domain knowledge base and generates, by the processor, a new question based on the abstracted template, the domain schema, and the domain knowledge base.
Type: Application
Filed: August 23, 2019
Publication date: February 25, 2021
Inventors: Laura Chiticariu, Aparna Garimella, Yunyao Li
-
Patent number: 10783328
Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.
Type: Grant
Filed: June 4, 2018
Date of Patent: September 22, 2020
Assignee: International Business Machines Corporation
Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
-
Publication number: 20190370333
Abstract: Methods, systems, and computer program products for a semi-automatic process for creating a natural language processing resource are provided herein. A computer-implemented method includes identifying multiple annotation tasks in connection with natural language processing of input text, and automatically determining, based on analysis of (i) parameters related to the identified annotation tasks and (ii) parameters related to annotation task users, routing instructions for the identified annotation tasks, wherein the routing instructions comprise (a) instructions to route a first sub-set of the identified annotation tasks to non-expert annotation task users and (b) instructions to route a second sub-set of the identified annotation tasks to expert annotation task users.
Type: Application
Filed: June 4, 2018
Publication date: December 5, 2019
Inventors: Alan Akbik, Laura Chiticariu, Yunyao Li, Anbang Xu, Victor K. Ondego, Chenguang Wang
-
Publication number: 20190278853
Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
Type: Application
Filed: March 9, 2018
Publication date: September 12, 2019
Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
-
Patent number: 10296573
Abstract: Methods and arrangements for managing development of information extraction rules. One or more documents are opened for extraction. An interface is provided to create a label and thereupon label a portion of the document. The created label is stored, and an extractor is developed based on the labeling. A test interface is provided for the extractor, and results of a test conducted through the test interface are displayed. The extractor is exported. In accordance with at least one embodiment, developers are presented with eased automated guidance to write extractors, which thereby reduces an overall manual effort involved in extractor development. Generally, a focused, tutorial-type environment serves as a guide based on previously developed best practices.
Type: Grant
Filed: August 31, 2016
Date of Patent: May 21, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Arnaldo Carreno-Fuentes, Laura Chiticariu, Eser Kandogan, Yunyao Li, Huahai Yang
-
Patent number: 10289963
Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
Type: Grant
Filed: February 27, 2017
Date of Patent: May 14, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
-
Patent number: 10162852
Abstract: Embodiments relate to facilitating construction of concepts from a task specification. A method includes receiving, from a user via a user interface, a task specification in natural language form. The method also includes parsing the task specification into a plurality of components, and searching a database for an existing concept having a pattern that approximates at least a portion of the plurality of components. The concept includes semantic meanings that are representable by textual patterns. The method further includes identifying any components of the plurality of components that are not included in the existing concept, and building a new concept that combines the existing concept and the components of the plurality of components that are not included in the existing concept.
Type: Grant
Filed: December 16, 2013
Date of Patent: December 25, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Laura Chiticariu, George A. Cypher, Rajasekar Krishnamurthy, Yunyao Li, Huahai Yang
-
Publication number: 20180246867
Abstract: One embodiment provides a method for developing a text analytics program for extracting at least one target concept including: utilizing at least one processor to execute computer code that performs the steps of: initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information; developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program; creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept; training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset; combining the rule-based annotator and the machine learning annotator to form a combined annotator; evaluating, using the evaluation dataset, extraction performance of the combine
Type: Application
Filed: February 27, 2017
Publication date: August 30, 2018
Inventors: Laura Chiticariu, Jeffrey Thomas Kreulen, Rajasekar Krishnamurthy, Prithviraj Sen, Shivakumar Vaithyanathan
-
Patent number: 10042846
Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
Type: Grant
Filed: April 28, 2016
Date of Patent: August 7, 2018
Assignee: International Business Machines Corporation
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Patent number: 9898460
Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
Type: Grant
Filed: January 26, 2016
Date of Patent: February 20, 2018
Assignee: International Business Machines Corporation
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Publication number: 20170315986
Abstract: One embodiment provides a method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
Type: Application
Filed: April 28, 2016
Publication date: November 2, 2017
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Patent number: 9734297
Abstract: A method for extracting information from electronic documents, including: learning terms and term variants from a training corpus, wherein the terms and the term variants correspond to a specialized dictionary related to the training corpus; generating a list of negative indicators found in the training corpus; performing a partial match of the terms and the term variants in a set of electronic documents to create initial match results; and performing a negation test using the negative indicators and a positive terms test using the terms and the term variants on the initial match results to remove matches from the initial match results that fail either the negation test or the positive terms test, resulting in final match results.
Type: Grant
Filed: August 28, 2012
Date of Patent: August 15, 2017
Assignee: International Business Machines Corporation
Inventors: Tanveer F Syeda-Mahmood, Laura Chiticariu
-
Publication number: 20170212890
Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
Type: Application
Filed: January 26, 2016
Publication date: July 27, 2017
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Patent number: 9652627
Abstract: Probabilistic surfacing of potentially sensitive identifiers is provided. In one embodiment of the present invention, a method of and computer program product for surfacing of potentially sensitive identifiers are provided. An input string is read. The input string has a length. The input string is divided into a plurality of tokens. Each of the tokens has a predetermined length. A score is determined for each of the plurality of tokens. A composite score is determined based on the scores of each of the plurality of tokens. Whether the input string comprises an identifier is determined by comparing the composite score to a predetermined threshold.
Type: Grant
Filed: October 22, 2014
Date of Patent: May 16, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Varun Bhagwan, Laura Chiticariu, Daniel F. Gruhl
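
Illustrative note on patent 9652627: the steps named in its abstract (divide the input into fixed-length tokens, score each token, combine the scores, compare with a threshold) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the patented method. The abstract does not say how tokens are scored or combined, so the non-overlapping chunking, the digit-ratio score, the mean as composite, the default values, and all function names are hypothetical choices made for illustration.

```python
from typing import Callable, List


def split_into_tokens(text: str, token_length: int) -> List[str]:
    """Divide text into consecutive chunks of token_length characters.

    Assumption: chunks do not overlap, and a shorter trailing chunk is kept.
    """
    if token_length <= 0:
        raise ValueError("token_length must be positive")
    return [text[i:i + token_length] for i in range(0, len(text), token_length)]


def digit_ratio(token: str) -> float:
    """Hypothetical per-token score: share of characters that are digits."""
    return sum(ch.isdigit() for ch in token) / len(token)


def looks_like_identifier(
    text: str,
    token_length: int = 3,       # hypothetical predetermined token length
    threshold: float = 0.5,      # hypothetical predetermined threshold
    score_token: Callable[[str], float] = digit_ratio,
) -> bool:
    """Return True when the composite token score reaches the threshold."""
    tokens = split_into_tokens(text, token_length)
    if not tokens:
        return False
    # Assumption: the composite score is the plain mean of the token scores.
    composite = sum(score_token(t) for t in tokens) / len(tokens)
    return composite >= threshold


if __name__ == "__main__":
    for sample in ["123-45-6789", "hello world"]:
        print(sample, looks_like_identifier(sample))
```

The scoring function is passed in as a parameter so that a learned or probabilistic per-token score could replace the crude digit ratio without changing the surrounding steps.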