Patents by Inventor Marina Danilevsky Hailpern
Marina Danilevsky Hailpern has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11900070
Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
Type: Grant
Filed: February 3, 2020
Date of Patent: February 13, 2024
Assignee: International Business Machines Corporation
Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
-
Patent number: 11650970
Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
Type: Grant
Filed: March 9, 2018
Date of Patent: May 16, 2023
Assignee: International Business Machines Corporation
Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
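Illustration (not taken from the patent): the last step of the abstract above, creating a tuple of header information for each data cell, can be sketched in Python under strong simplifying assumptions. The sketch assumes a single rectangular region whose first row and first column hold the header cells. The names `CellTuple` and `cell_tuples` and the sample table are hypothetical, and the region identification and semantic matching steps are not modeled.

```python
from typing import List, NamedTuple


class CellTuple(NamedTuple):
    """Header information attached to one data cell (hypothetical layout)."""
    row_header: str
    column_header: str
    value: str


def cell_tuples(table: List[List[str]]) -> List[CellTuple]:
    """Pair every data cell with the header cells in its row and column.

    Assumption: the table is one rectangular region, the first row holds
    column headers, the first column holds row headers, and table[0][0]
    is an unused corner cell.
    """
    column_headers = table[0][1:]
    result = []
    for row in table[1:]:
        row_header, data_cells = row[0], row[1:]
        for column_header, value in zip(column_headers, data_cells):
            result.append(CellTuple(row_header, column_header, value))
    return result


# Made-up sample data, used only to exercise the function.
table = [
    ["", "2017", "2018"],
    ["Revenue", "10", "12"],
    ["Cost", "7", "8"],
]
tuples = cell_tuples(table)
```

Each data cell ends up with the headers that describe it, so the cell "12" is represented as ("Revenue", "2018", "12"). Real tables with nested or multi-level headers would need the region and matching steps that this sketch leaves out.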
-
Publication number: 20220269858
Abstract: A system, computer program product, and method are provided for jointly learning dictionary based rules and dictionary candidates. Natural language text is received and parsed into subsets, with the subset being subjected to natural language processing to identify one or more verbs within the subset. The identified verbs are evaluated with respect to a dictionary and one or more rules. The evaluation is directed at each predicate in the rules with respect to the identified verbs. A neural network is leveraged to jointly induce modification of the rules and one or more dictionaries responsive to the evaluation.
Type: Application
Filed: February 19, 2021
Publication date: August 25, 2022
Applicant: International Business Machines Corporation
Inventors: Prithviraj Sen, Marina Danilevsky Hailpern, Yunyao Li
-
Patent number: 11200413
Abstract: Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
Type: Grant
Filed: July 31, 2018
Date of Patent: December 14, 2021
Assignee: International Business Machines Corporation
Inventors: Douglas Ronald Burdick, Wei Cheng, Alexandre Evfimievski, Marina Danilevsky Hailpern, Rajasekar Krishnamurthy, Shajith Ikbal Mohamed, Prithviraj Sen, Shivakumar Vaithyanathan
-
Publication number: 20210240917
Abstract: A computer-implemented method according to one embodiment includes receiving, at a deep neural network (DNN), a plurality of sentences each having an associated label; training the DNN, utilizing the plurality of sentences and associated labels; and producing a linguistic expression (LE) utilizing the trained DNN.
Type: Application
Filed: February 3, 2020
Publication date: August 5, 2021
Inventors: Prithviraj Sen, Siddhartha Brahma, Yunyao Li, Laura Chiticariu, Rajasekar Krishnamurthy, Shivakumar Vaithyanathan, Marina Danilevsky Hailpern
-
Patent number: 11074508
Abstract: Methods, systems, and computer program products for constraint tracking and inference generation are provided herein. A computer-implemented method includes parsing descriptions of one or more user-provided constraints pertaining to data within a target system, parsing truth value assignments to the user-provided constraints, and deriving a truth value for at least one of the user-provided constraints that does not correspond to a known truth value, wherein said deriving comprises performing a logical inference utilizing known truth values of one or more of the user-provided constraints. The computer-implemented method also includes storing, in a database, (i) the user-provided constraints, (ii) the known truth values, and (iii) the at least one derived truth value, and outputting the at least one derived truth value, one or more identified contradictions among the known truth values, and/or an indication that one or more unknown truth values corresponding to the user-provided constraints remain unknown.
Type: Grant
Filed: March 29, 2018
Date of Patent: July 27, 2021
Assignee: International Business Machines Corporation
Inventors: Huaiyu Zhu, Michael James Wehar, Marina Danilevsky Hailpern, Mauricio Antonio Hernandez-Sherrington, Yunyao Li
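Illustration (not taken from the patent): the three outputs named in the abstract above (derived truth values, contradictions, and constraints that remain unknown) can be sketched in Python as simple forward propagation. The abstract does not say which inference rules are used. This sketch assumes the only relationships are "a implies b" pairs between named constraints, and the function `infer`, the constraint names, and the sample data are all hypothetical. Database storage is not modeled.

```python
from typing import Dict, List, Set, Tuple


def infer(
    known: Dict[str, bool],
    implications: List[Tuple[str, str]],
    constraints: List[str],
) -> Tuple[Dict[str, bool], Set[Tuple[str, str]], List[str]]:
    """Propagate truth values over 'a implies b' pairs until nothing changes.

    Returns (derived, contradictions, unknown):
      derived        - truth values inferred for constraints not in `known`
      contradictions - pairs (a, b) where a is true but b is false
      unknown        - constraints that still have no truth value
    """
    values = dict(known)
    derived: Dict[str, bool] = {}
    contradictions: Set[Tuple[str, str]] = set()
    changed = True
    while changed:
        changed = False
        for a, b in implications:
            va, vb = values.get(a), values.get(b)
            if va is True and vb is False:
                # a implies b, yet a holds and b does not.
                contradictions.add((a, b))
            elif va is True and vb is None:
                # Modus ponens: a holds, so b must hold.
                values[b] = derived[b] = True
                changed = True
            elif vb is False and va is None:
                # Modus tollens: b fails, so a must fail.
                values[a] = derived[a] = False
                changed = True
    unknown = [c for c in constraints if c not in values]
    return derived, contradictions, unknown


# Made-up constraints, used only to exercise the function.
constraints = ["x_gt_10", "x_gt_5", "x_gt_0", "y_unique"]
known = {"x_gt_10": True}
implications = [("x_gt_10", "x_gt_5"), ("x_gt_5", "x_gt_0")]
derived, contradictions, unknown = infer(known, implications, constraints)
```

Here the single known value for `x_gt_10` is enough to derive `x_gt_5` and `x_gt_0`, no contradiction is found, and `y_unique` is reported as still unknown. The loop terminates because every change assigns a value to a previously unassigned constraint.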
-
Publication number: 20200042785
Abstract: Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Douglas Ronald Burdick, Wei Cheng, Alexandre Evfimievski, Marina Danilevsky Hailpern, Rajasekar Krishnamurthy, Shajith Ikbal Mohamed, Prithviraj Sen, Shivakumar Vaithyanathan
-
Publication number: 20190303772
Abstract: Methods, systems, and computer program products for constraint tracking and inference generation are provided herein. A computer-implemented method includes parsing descriptions of one or more user-provided constraints pertaining to data within a target system, parsing truth value assignments to the user-provided constraints, and deriving a truth value for at least one of the user-provided constraints that does not correspond to a known truth value, wherein said deriving comprises performing a logical inference utilizing known truth values of one or more of the user-provided constraints. The computer-implemented method also includes storing, in a database, (i) the user-provided constraints, (ii) the known truth values, and (iii) the at least one derived truth value, and outputting the at least one derived truth value, one or more identified contradictions among the known truth values, and/or an indication that one or more unknown truth values corresponding to the user-provided constraints remain unknown.
Type: Application
Filed: March 29, 2018
Publication date: October 3, 2019
Inventors: Huaiyu Zhu, Michael James Wehar, Marina Danilevsky Hailpern, Mauricio Antonio Hernandez-Sherrington, Yunyao Li
-
Publication number: 20190278853
Abstract: Methods, systems, and computer program products for extracting structure and semantics from tabular data are provided herein. A computer-implemented method includes processing tabular data comprising data cells and header cells, wherein the processing includes: identifying one or more regions within the tabular data, wherein each of the regions comprises one or more of the data cells; matching some of the regions to one or more of the header cells, wherein the matched header cells are semantically related to the data cells inside the matched region; and generating, based on the matching, an output describing semantic relationships between the data cells and the header cells. The method also includes creating, for each data cell, a tuple comprising semantic information contained within one or more of the header cells that pertains to the data cell.
Type: Application
Filed: March 9, 2018
Publication date: September 12, 2019
Inventors: Xilun Chen, Laura Chiticariu, Alexandre Evfimievski, Marina Danilevsky Hailpern, Prithviraj Sen
-
Patent number: 10042846
Abstract: One embodiment provides method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
Type: Grant
Filed: April 28, 2016
Date of Patent: August 7, 2018
Assignee: International Business Machines Corporation
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Patent number: 9898460
Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
Type: Grant
Filed: January 26, 2016
Date of Patent: February 20, 2018
Assignee: International Business Machines Corporation
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Publication number: 20170315986
Abstract: One embodiment provides method for constructing a cross-lingual information extraction program, the method including: utilizing at least one processor to execute computer code that performs the steps of: constructing a plurality of language-specific representations from text expressed in a plurality of languages by parsing the text of each language using a language-specific semantic parser; mapping the plurality of language-specific representations to a single cross-lingual semantic representation, wherein the cross-lingual semantic representation encompasses the plurality of languages; and constructing the cross-lingual information extraction program based on the cross-lingual semantic representation. Other aspects are described and claimed.
Type: Application
Filed: April 28, 2016
Publication date: November 2, 2017
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu
-
Publication number: 20170212890
Abstract: One embodiment provides a method for generating a natural language resource using a parallel corpus, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, from a parallel corpus, natural language text in a source language and a corresponding translation of the natural language text in a target language, wherein the natural language text in the source language comprises linguistic annotations; projecting the linguistic annotations from the source language natural language text to the target language natural language text; applying one or more filters to remove at least one projected linguistic annotation from the target language natural language text that results in at least one error; selecting at least one target language natural language text having substantially complete linguistic annotations; training a machine learning model using the selected at least one target language natural language text and annotations; and adding, using the trained
Type: Application
Filed: January 26, 2016
Publication date: July 27, 2017
Inventors: Alan Akbik, Laura Chiticariu, Marina Danilevsky Hailpern, Yunyao Li, Huaiyu Zhu