Patents by Inventor Ismini Lourentzou

Ismini Lourentzou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Transforming a lexicon that describes an information asset

Patent number: 11645464

Abstract: Systems, computer-implemented methods, and computer program products to transform a lexicon that describes an information asset are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a term validation component that can determine from a subject matter expert, a validated term that can indicate validation of a candidate term that describes an information asset. The computer executable components can further comprise a lexicon transforming component that, based on the validated term, can transform a lexicon that describes the information asset, by incorporating the validated term into the lexicon.

Type: Grant

Filed: March 18, 2021

Date of Patent: May 9, 2023

Assignee: International Business Machines Corporation

Inventors: Anna Lisa Gentile, Chad Eric DeLuca, Petar Ristoski, Ismini Lourentzou, Linda Ha Kato, Alfredo Alba, Daniel Gruhl, Steven R. Welch

User-centric ontology population with user refinement

Patent number: 11593419

Abstract: One embodiment provides a method that includes determining candidate ontologies for alignment from multiple available knowledge bases. An initial target ontology is selected from the candidate ontologies, and the selected initial ontology is corrected with received refinement input. Concepts in the selected initial ontology are aligned with concepts of the target ontology using a deep learning hierarchical classification with received review input. A user is assisted to build, change and grow the selected initial ontology exploiting both the target ontology and new facts extracted from unstructured data.

Type: Grant

Filed: September 25, 2018

Date of Patent: February 28, 2023

Assignee: International Business Machines Corporation

Inventors: Petar Ristoski, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Chris Kau, Chad DeLuca, Linda Kato, Ismini Lourentzou, Steven R. Welch

Collaborative information extraction

Patent number: 11551437

Abstract: Embodiments relate to a system, program product, and method for information extraction and annotation of a data set. Neural models are utilized to automatically attach machine annotations to data elements within an unlabeled data set. The attached machine annotations are evaluated and a score is attached to the annotations. The score reflects a confidence of correctness of the annotations. A labeled data set is iteratively expanded with selectively evaluated annotations based on the attached score. The labeled data set is applied to an unexplored corpus to identify matching corpus data to populated instances of the labeled data set.

Type: Grant

Filed: May 29, 2019

Date of Patent: January 10, 2023

Assignee: International Business Machines Corporation

Inventors: Ismini Lourentzou, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Petar Ristoski, Chad Eric DeLuca, Linda Ha Kato, Chris Kau, Steven R. Welch
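
Illustrative sketch (not part of the patent record): the abstract above describes scoring machine annotations and iteratively growing a labeled set with the ones that score well. A minimal Python sketch of that loop follows. The function names, the threshold value, and the token-overlap annotator are all assumptions made for illustration; the overlap annotator merely stands in for the neural models the abstract mentions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical annotator signature: given an item and the current labeled
# set, return a (label, confidence) pair. Stands in for the neural models.
Annotator = Callable[[str, Dict[str, str]], Tuple[str, float]]


def expand_labeled_set(
    labeled: Dict[str, str],
    unlabeled: List[str],
    annotate: Annotator,
    threshold: float = 0.9,
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Iteratively add machine annotations whose score meets the threshold."""
    labeled = dict(labeled)  # work on a copy
    pool = [item for item in unlabeled if item not in labeled]
    for _ in range(max_rounds):
        accepted: Dict[str, str] = {}
        for item in pool:
            label, score = annotate(item, labeled)
            if score >= threshold:  # keep only confident annotations
                accepted[item] = label
        if not accepted:  # nothing new was confident enough: stop
            break
        labeled.update(accepted)
        pool = [item for item in pool if item not in accepted]
    return labeled


def overlap_annotator(item: str, labeled: Dict[str, str]) -> Tuple[str, float]:
    """Toy annotator: copy the label of the most token-similar labeled item."""
    tokens = set(item.lower().split())
    best_label, best_score = "", 0.0
    for example, label in labeled.items():
        other = set(example.lower().split())
        union = tokens | other
        score = len(tokens & other) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

With the seed {"aspirin tablet": "DRUG"} and a threshold of 0.3, "aspirin capsule" is accepted in the first round and "ibuprofen capsule" in the second, because it only becomes similar to a labeled item after the first expansion. An unrelated item such as "blue sky" is never added.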

TRANSFORMING A LEXICON THAT DESCRIBES AN INFORMATION ASSET

Publication number: 20220300709

Abstract: Systems, computer-implemented methods, and computer program products to transform a lexicon that describes an information asset are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a term validation component that can determine from a subject matter expert, a validated term that can indicate validation of a candidate term that describes an information asset. The computer executable components can further comprise a lexicon transforming component that, based on the validated term, can transform a lexicon that describes the information asset, by incorporating the validated term into the lexicon.

Type: Application

Filed: March 18, 2021

Publication date: September 22, 2022

Inventors: Anna Lisa Gentile, Chad Eric DeLuca, Petar Ristoski, Ismini Lourentzou, Linda Ha Kato, Alfredo Alba, Daniel Gruhl, Steven R. Welch

Corpus expansion using lexical signatures

Patent number: 11416562

Abstract: In an approach to corpus expansion using lexical signatures, one or more computer processors retrieve a donor corpus of text, wherein the donor corpus includes a plurality of documents. One or more computer processors generate a document signature for each of the plurality of documents in the donor corpus. One or more computer processors retrieve a target corpus of text for expansion. One or more computer processors generate a corpus signature for the target corpus. One or more computer processors compare each document signature to the corpus signature. Based on the comparison, one or more computer processors determine a similarity score for each document signature. One or more computer processors rank the plurality of documents by the similarity score. One or more computer processors add one or more top-ranked documents of the plurality of documents to the target corpus.

Type: Grant

Filed: April 23, 2021

Date of Patent: August 16, 2022

Assignee: International Business Machines Corporation

Inventors: Daniel Gruhl, Anna Lisa Gentile, Petar Ristoski, Linda Ha Kato, Chad Eric DeLuca, Steven R. Welch, Alfredo Alba, Ismini Lourentzou
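
Illustrative sketch (not part of the patent record): the steps listed in the abstract above (document signatures, a corpus signature, a similarity score per document, ranking, and adding the top-ranked documents) can be sketched in a few lines of Python. The abstract does not say how a signature is built or compared, so this sketch assumes a bag-of-words term count as the signature and cosine similarity as the score. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def signature(text: str) -> Counter:
    """Assumed lexical signature: lowercase whitespace-token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count signatures."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_corpus(target: List[str], donor: List[str], top_k: int = 1) -> List[str]:
    """Add the top_k donor documents most similar to the target corpus."""
    corpus_sig: Counter = Counter()
    for doc in target:
        corpus_sig.update(signature(doc))  # one signature for the whole corpus
    scored = [(cosine(signature(doc), corpus_sig), doc) for doc in donor]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return target + [doc for _, doc in ranked[:top_k]]
```

For a target corpus about neural models, a donor document such as "neural models for text" ranks above unrelated donor documents and is the one appended when top_k is 1.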

Identifying ambiguity in semantic resources

Patent number: 11379669

Abstract: Embodiments relate to a system, program product, and method for dictionary membership management directed at identifying ambiguity in semantic resources. A dictionary of seed terms is applied to a text corpus and matching items in the corpus are identified. The linguistic properties for each matching item are characterized and a context pattern of each matching item is constructed. Each context pattern is applied to the dictionary and matching content between the seed terms and the context pattern is identified and quantified. Lexicon items from the dictionary that have anomalous behavior reflected in the quantification are identified. One or more seed words identified as having anomalous behavior are selectively removed from the dictionary.

Type: Grant

Filed: July 29, 2019

Date of Patent: July 5, 2022

Assignee: International Business Machines Corporation

Inventors: Anna Lisa Gentile, Anni R. Coden, Ismini Lourentzou, Daniel Gruhl, Chad Eric DeLuca, Petar Ristoski, Linda Ha Kato, Chris Kau, Steven R. Welch, Alfredo Alba
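
Illustrative sketch (not part of the patent record): the abstract above describes matching seed terms in a corpus, building a context pattern for each match, measuring how the patterns relate to the rest of the dictionary, and removing seeds that behave anomalously. The Python sketch below is one simple reading of that idea, under stated assumptions: seeds are single tokens, a context pattern is the (previous word, next word) pair, and a seed is "anomalous" when too few of its contexts are shared with any other seed. The names and the 0.5 cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, str]  # (previous token, next token)


def context_patterns(seeds: Set[str], sentences: List[str]) -> Dict[str, List[Pattern]]:
    """Collect the left/right neighbour pattern for every seed occurrence."""
    patterns: Dict[str, List[Pattern]] = defaultdict(list)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in seeds:
                left = tokens[i - 1] if i > 0 else "<s>"
                right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
                patterns[token].append((left, right))
    return patterns


def prune_ambiguous(seeds: Set[str], sentences: List[str], min_shared: float = 0.5) -> Set[str]:
    """Keep seeds whose contexts are mostly shared with other seeds."""
    patterns = context_patterns(seeds, sentences)
    users: Dict[Pattern, Set[str]] = defaultdict(set)
    for seed, seen in patterns.items():
        for pattern in seen:
            users[pattern].add(seed)  # which seeds fill this pattern
    kept: Set[str] = set()
    for seed in seeds:
        seen = patterns.get(seed, [])
        if not seen:  # no corpus evidence either way: keep the seed
            kept.add(seed)
            continue
        shared = sum(1 for p in seen if len(users[p]) > 1) / len(seen)
        if shared >= min_shared:
            kept.add(seed)
    return kept
```

In a toy corpus where "aspirin", "ibuprofen", and "naproxen" all occur as "take ... daily" while "speed" occurs only as "reduce ... quickly", the seed "speed" shares no context with the other dictionary entries and is removed.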

IDENTIFYING SIMILARITY MATRIX FOR DERIVED PERCEPTIONS

Publication number: 20220101188

Abstract: An embodiment includes generating a query prompting a user to select from among a plurality of response options related to a first query set of objects. The embodiment also receives, responsive to the query, user input representative of a selected response option selected by the user from among the plurality of response options. The embodiment also calculates a plurality of weight values for respective ones of a plurality of similarity matrices based on the selected response option, where the plurality of similarity matrices include respective different sets of similarity values, each set of similarity values comprising similarity values representative of similarities of respective pairs of the plurality of objects. The embodiment stores a designated similarity matrix that is selected from among the plurality of similarity matrices based at least in part on a weight value from among the plurality of weight values assigned to the designated similarity matrix.

Type: Application

Filed: September 30, 2020

Publication date: March 31, 2022

Applicant: International Business Machines Corporation

Inventors: Ismini Lourentzou, Daniel Gruhl, Steven R. Welch, Chad Eric DeLuca, Alfredo Alba, Linda Ha Kato, Petar Ristoski, Anna Lisa Gentile
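
Illustrative sketch (not part of the patent record): the abstract above describes weighting several candidate similarity matrices according to a user's selected response and then designating one matrix based on its weight. The abstract does not specify the query form or the weighting rule, so the Python sketch below assumes a triplet question ("is object B or object C more similar to object A?") and a multiplicative penalty for matrices that disagree with the answer. All names and the 0.5 penalty are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]      # sim[i][j] = similarity of objects i and j
Answer = Tuple[int, int, int]   # (anchor, chosen, rejected) object indices


def reweight(matrices: Dict[str, Matrix], answers: List[Answer],
             penalty: float = 0.5) -> Dict[str, float]:
    """Down-weight every matrix that disagrees with a user answer."""
    weights = {name: 1.0 for name in matrices}
    for anchor, chosen, rejected in answers:
        for name, sim in matrices.items():
            # A matrix agrees if it rates the chosen object as strictly
            # more similar to the anchor than the rejected object.
            if sim[anchor][chosen] <= sim[anchor][rejected]:
                weights[name] *= penalty
    return weights


def designate(matrices: Dict[str, Matrix], answers: List[Answer]) -> str:
    """Return the name of the matrix with the highest weight."""
    weights = reweight(matrices, answers)
    return max(weights, key=weights.get)
```

If a "color" matrix rates objects 0 and 1 as most alike while a "shape" matrix rates objects 0 and 2 as most alike, a single user answer that object 2 is closer to object 0 halves the weight of "color" and leaves "shape" as the designated matrix.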

On-demand relation extraction from text

Patent number: 11151175

Abstract: One embodiment provides a method for on-demand relation extraction from unstructured text that includes obtaining a text corpus of domain related unstructured text. Representations of the unstructured text that capture entity-specific syntactic knowledge are created. Initial user seeds of informative examples containing relations are received. Extraction models in a neural network are trained using the initial user seeds. Performance information and a confidence score are provided for each prediction for each extraction model. A next batch of informative examples are identified for annotation from the text corpus based on training a neural network classifier on a pool of labeled informative examples. Stopping criteria is determined based on differences of the performance information and the confidence score in relation to parameters for each extraction model. Based on the stopping criteria, it is determined whether to retrain a particular extraction model after the informative examples have been labeled.

Type: Grant

Filed: September 24, 2018

Date of Patent: October 19, 2021

Assignee: International Business Machines Corporation

Inventors: Ismini Lourentzou, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Chris Kau, Chad DeLuca, Linda Kato, Petar Ristoski, Steven R. Welch
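
Illustrative sketch (not part of the patent record): two steps in the abstract above lend themselves to a short example, namely picking the next batch of informative examples for annotation and deciding when to stop retraining. The abstract does not give the selection rule or the exact stopping criteria, so the Python sketch below assumes least-confidence sampling for the batch and a "no meaningful change over the last few rounds" test on both a performance metric and the mean confidence. Function names, the tolerance, and the patience value are hypothetical.

```python
from typing import Dict, List


def next_batch(pool_confidence: Dict[str, float], batch_size: int) -> List[str]:
    """Pick the examples the current model is least confident about."""
    return sorted(pool_confidence, key=pool_confidence.get)[:batch_size]


def should_stop(performance: List[float], confidence: List[float],
                tol: float = 0.01, patience: int = 2) -> bool:
    """Stop when both histories changed by less than tol for `patience` rounds.

    Assumes both lists hold one value per training round, oldest first.
    """
    if len(performance) <= patience or len(confidence) <= patience:
        return False  # not enough rounds to compare yet
    for history in (performance, confidence):
        for i in range(1, patience + 1):
            if abs(history[-i] - history[-i - 1]) >= tol:
                return False  # still changing: keep training
    return True
```

For example, with pool confidences {"a": 0.9, "b": 0.4, "c": 0.6} and a batch size of 2, the examples sent for annotation are "b" and then "c".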

Dictionary expansion using neural language models

Patent number: 11030402

Abstract: Embodiments relate to a system, program product, and method for iterative expansion and application of a domain-specific dictionary. One or more dictionary instances are applied against a text corpus. The dictionary is iteratively expanded and selectively populated with one or more additional dictionary instances, including semantically similar instances to the applied dictionary instances and extension instances contextually related to the applied dictionary instances. The iteratively expanded dictionary is applied to an unexplored corpus to identify matching corpus data to populated instances of the dictionary.

Type: Grant

Filed: May 3, 2019

Date of Patent: June 8, 2021

Assignee: International Business Machines Corporation

Inventors: Petar Ristoski, Daniel Gruhl, Alfredo Alba, Anna Lisa Gentile, Ismini Lourentzou, Chad Eric DeLuca, Linda Ha Kato, Steven R. Welch, Chris Kau
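
Illustrative sketch (not part of the patent record): the abstract above describes iteratively expanding a dictionary with instances that are semantically similar to the ones already in it. The Python sketch below shows only that iterative-expansion step, under stated assumptions: term vectors come from a hand-made table (a stand-in for a neural language model), similarity is cosine, and a candidate is added when it is close enough to any current entry. The names, vectors, and threshold are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_dictionary(seeds: Set[str], vectors: Dict[str, List[float]],
                      threshold: float = 0.95, max_rounds: int = 10) -> Set[str]:
    """Repeatedly add vocabulary terms close to any current dictionary entry."""
    dictionary = {term for term in seeds if term in vectors}
    for _ in range(max_rounds):
        added = {
            candidate
            for candidate in vectors
            if candidate not in dictionary
            and any(cosine(vectors[candidate], vectors[entry]) >= threshold
                    for entry in dictionary)
        }
        if not added:  # fixed point reached
            break
        dictionary |= added
    return dictionary
```

Because each round compares candidates against the expanded dictionary, a term can enter through an intermediate entry rather than through the original seed. That is also why a selective review step, as the abstract's "selectively populated" wording suggests, matters in practice: unchecked expansion can drift away from the seed meaning.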

Identifying Ambiguity in Semantic Resources

Publication number: 20210034704

Abstract: Embodiments relate to a system, program product, and method for dictionary membership management directed at identifying ambiguity in semantic resources. A dictionary of seed terms is applied to a text corpus and matching items in the corpus are identified. The linguistic properties for each matching item are characterized and a context pattern of each matching item is constructed. Each context pattern is applied to the dictionary and matching content between the seed terms and the context pattern is identified and quantified. Lexicon items from the dictionary that have anomalous behavior reflected in the quantification are identified. One or more seed words identified as having anomalous behavior are selectively removed from the dictionary.

Type: Application

Filed: July 29, 2019

Publication date: February 4, 2021

Applicant: International Business Machines Corporation

Inventors: Anna Lisa Gentile, Anni R. Coden, Ismini Lourentzou, Daniel Gruhl, Chad Eric DeLuca, Petar Ristoski, Linda Ha Kato, Chris Kau, Steven R. Welch, Alfredo Alba

Collaborative Information Extraction

Publication number: 20200380311

Abstract: Embodiments relate to a system, program product, and method for information extraction and annotation of a data set. Neural models are utilized to automatically attach machine annotations to data elements within an unlabeled data set. The attached machine annotations are evaluated and a score is attached to the annotations. The score reflects a confidence of correctness of the annotations. A labeled data set is iteratively expanded with selectively evaluated annotations based on the attached score. The labeled data set is applied to an unexplored corpus to identify matching corpus data to populated instances of the labeled data set.

Type: Application

Filed: May 29, 2019

Publication date: December 3, 2020

Applicant: International Business Machines Corporation

Inventors: Ismini Lourentzou, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Petar Ristoski, Chad Eric DeLuca, Linda Ha Kato, Chris Kau, Steven R. Welch

Dictionary Expansion Using Neural Language Models

Publication number: 20200349226

Abstract: Embodiments relate to a system, program product, and method for iterative expansion and application of a domain-specific dictionary. One or more dictionary instances are applied against a text corpus. The dictionary is iteratively expanded and selectively populated with one or more additional dictionary instances, including semantically similar instances to the applied dictionary instances and extension instances contextually related to the applied dictionary instances. The iteratively expanded dictionary is applied to an unexplored corpus to identify matching corpus data to populated instances of the dictionary.

Type: Application

Filed: May 3, 2019

Publication date: November 5, 2020

Applicant: International Business Machines Corporation

Inventors: Petar Ristoski, Daniel Gruhl, Alfredo Alba, Anna Lisa Gentile, Ismini Lourentzou, Chad Eric DeLuca, Linda Ha Kato, Steven R. Welch, Chris Kau

USER-CENTRIC ONTOLOGY POPULATION WITH USER REFINEMENT

Publication number: 20200097602

Abstract: One embodiment provides a method that includes determining candidate ontologies for alignment from multiple available knowledge bases. An initial target ontology is selected from the candidate ontologies, and the selected initial ontology is corrected with received refinement input. Concepts in the selected initial ontology are aligned with concepts of the target ontology using a deep learning hierarchical classification with received review input. A user is assisted to build, change and grow the selected initial ontology exploiting both the target ontology and new facts extracted from unstructured data.

Type: Application

Filed: September 25, 2018

Publication date: March 26, 2020

Inventors: Petar Ristoski, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Chris Kau, Chad DeLuca, Linda Kato, Ismini Lourentzou, Steven R. Welch

ON-DEMAND RELATION EXTRACTION FROM TEXT

Publication number: 20200097597

Abstract: One embodiment provides a method for on-demand relation extraction from unstructured text that includes obtaining a text corpus of domain related unstructured text. Representations of the unstructured text that capture entity-specific syntactic knowledge are created. Initial user seeds of informative examples containing relations are received. Extraction models in a neural network are trained using the initial user seeds. Performance information and a confidence score are provided for each prediction for each extraction model. A next batch of informative examples are identified for annotation from the text corpus based on training a neural network classifier on a pool of labeled informative examples. Stopping criteria is determined based on differences of the performance information and the confidence score in relation to parameters for each extraction model. Based on the stopping criteria, it is determined whether to retrain a particular extraction model after the informative examples have been labeled.

Type: Application

Filed: September 24, 2018

Publication date: March 26, 2020

Inventors: Ismini Lourentzou, Anna Lisa Gentile, Daniel Gruhl, Alfredo Alba, Chris Kau, Chad DeLuca, Linda Kato, Petar Ristoski, Steven R. Welch