Patents by Inventor Francesco Fusco

Francesco Fusco has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Specificity ranking of text elements and applications thereof

Patent number: 12210827

Abstract: Ranking a plurality of text elements, each comprising at least one word, by specificity. For each text element to be ranked, such a method includes computing an embedding vector that locates the text element in an embedding space, and selecting a set of text fragments from reference text. Each of these text fragments contains the text element to be ranked and further text elements. For each text fragment, the method calculates respective distances in the embedding space between the further text elements. The method further includes calculating a specificity score for the text element to be ranked and storing the specificity score. After the plurality of text elements has been ranked, a text data structure may be processed using the specificity scores of the text elements to extract data having a desired specificity from the data structure.

Type: Grant

Filed: August 23, 2021

Date of Patent: January 28, 2025

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Francesco Fusco, Cesar Berrospi Ramis, Peter Willem Jan Staar
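
The abstract leaves the exact scoring function open. The sketch below is a minimal illustration that assumes a term is more specific when the further text elements in the fragments containing it sit close together in the embedding space. The inverse-spread formula, the toy two-dimensional embeddings, and all function names are hypothetical, and for simplicity the sketch uses only the distances between the further text elements, not the term's own embedding vector.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def specificity_score(term, fragments, embed):
    """Hypothetical score: inverse of the mean pairwise distance between the
    further text elements of every fragment that contains `term`."""
    spreads = []
    for fragment in fragments:
        if term not in fragment:
            continue
        others = [w for w in fragment if w != term and w in embed]
        pairs = list(combinations(others, 2))
        if pairs:
            total = sum(cosine_distance(embed[a], embed[b]) for a, b in pairs)
            spreads.append(total / len(pairs))
    if not spreads:
        return 0.0  # no usable context: treat as least specific
    return 1.0 / (1.0 + sum(spreads) / len(spreads))


def rank_by_specificity(terms, fragments, embed):
    """Score every term, then sort from most to least specific."""
    scores = {t: specificity_score(t, fragments, embed) for t in terms}
    return sorted(terms, key=scores.get, reverse=True), scores


# Toy embeddings and reference fragments (illustrative only).
embed = {
    "protein": [1.0, 0.0], "enzyme": [0.9, 0.1], "catalyst": [0.95, 0.05],
    "car": [0.0, 1.0], "music": [-1.0, 0.0], "weather": [0.0, -1.0],
}
fragments = [
    ["kinase", "protein", "enzyme", "catalyst"],
    ["thing", "car", "music", "weather"],
]
ranked, scores = rank_by_specificity(["thing", "kinase"], fragments, embed)
print(ranked)  # ['kinase', 'thing']
```

"kinase" appears among tightly clustered context words, while "thing" co-occurs with unrelated ones, so it ranks lower under this assumed scoring rule.
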
DETERMINING SPECIFICITY OF TEXT TERMS IN APPLICATION CONTEXTS

Publication number: 20240320249

Abstract: A computer-implemented method, a computer program product, and a computer system are provided to enrich downstream learning tasks. A processor stores selected text terms from a corpus of text. A processor determines an initial set of specificity scores for the selected text terms to produce a set of training samples, where each of the training samples comprises a selected text term and an initial specificity score for the selected text term. A processor trains a character-based regression model with the set of training samples. A processor retrieves an Automated Term Extraction (ATE) training data set. A processor determines specificity scores for text terms included in the ATE training data set. A processor, responsive to the respective specificity score for a text term in the ATE training data set being below a threshold value, masks the text term from being used in the ATE training data set.

Type: Application

Filed: March 22, 2023

Publication date: September 26, 2024

Inventors: Francesco Fusco, Diego Matteo Antognini
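
A rough sketch of the last steps of this pipeline, assuming a very simple stand-in for the character-based regression model: each character trigram remembers the average initial specificity score of the training terms that contain it, and a new term is scored by averaging over its known trigrams. The regressor, the toy scores, and the 0.5 threshold are illustrative assumptions, not the claimed model.

```python
from collections import defaultdict


def char_ngrams(term, n=3):
    """Character n-grams of a lower-cased term padded with '#' boundary markers."""
    padded = f"#{term.lower()}#"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


class CharNgramRegressor:
    """Hypothetical stand-in for a character-based regression model."""

    def __init__(self, n=3):
        self.n = n
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)
        self.default = 0.0

    def fit(self, samples):
        """samples: list of (text term, initial specificity score)."""
        for term, score in samples:
            for gram in char_ngrams(term, self.n):
                self.sums[gram] += score
                self.counts[gram] += 1
        self.default = sum(score for _, score in samples) / len(samples)
        return self

    def predict(self, term):
        """Average the mean scores of the term's n-grams seen during training."""
        known = [self.sums[g] / self.counts[g]
                 for g in char_ngrams(term, self.n) if self.counts.get(g)]
        return sum(known) / len(known) if known else self.default


def mask_low_specificity(ate_terms, model, threshold):
    """Split ATE terms into those kept for training and those masked out."""
    kept, masked = [], []
    for term in ate_terms:
        (kept if model.predict(term) >= threshold else masked).append(term)
    return kept, masked


samples = [("photolithography", 0.9), ("lithography", 0.85),
           ("method", 0.1), ("system", 0.05)]
model = CharNgramRegressor().fit(samples)
kept, masked = mask_low_specificity(
    ["photolithographic", "methods", "systematic"], model, threshold=0.5)
```

Because the model works on characters, it can score unseen inflections such as "methods" or "photolithographic" from their overlap with training terms.
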
DOMAIN-SPECIFICITY PREDICTION FOR NATURAL LANGUAGE PROCESSING

Publication number: 20240320429

Abstract: A method, computer program product, and computer system are provided to determine the domain-specificity of a text term. A processor receives a plurality of domain-specific text corpora, wherein each of the plurality of domain-specific text corpora comprises a plurality of text documents of a respective domain. A processor trains, on each domain-specific text corpus, a set of subword-unit tokenizers with at least two different vocabulary sizes. A processor receives the text term. A processor determines a domain-specificity fingerprint of the text term, wherein the domain-specificity fingerprint comprises, for each subword-unit tokenizer, the number of subword units required to represent the text term. A processor provides the domain-specificity fingerprint for determining the domain-specificity of the text term.

Type: Application

Filed: March 22, 2023

Publication date: September 26, 2024

Inventors: Diego Matteo Antognini, Francesco Fusco
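
The fingerprint itself is easy to illustrate. This sketch skips tokenizer training and assumes hypothetical, hand-written vocabularies standing in for a small and a large tokenizer per domain. Segmentation uses a greedy longest-match-first rule in the style of WordPiece, which is an assumption rather than a detail from the abstract.

```python
def count_subwords(term, vocab):
    """Greedy longest-match-first segmentation; returns the number of units.
    Characters not covered by the vocabulary count as one unit each."""
    i, units = 0, 0
    while i < len(term):
        end = i + 1  # fallback: a single character
        for j in range(len(term), i, -1):
            if term[i:j] in vocab:
                end = j
                break
        units += 1
        i = end
    return units


def domain_fingerprint(term, tokenizers):
    """Map each (domain, size) tokenizer to the number of units it needs for `term`."""
    return {key: count_subwords(term.lower(), vocab)
            for key, vocab in tokenizers.items()}


# Hypothetical vocabularies standing in for trained tokenizers.
medical_small = {"cardio", "myo", "path", "y"}
legal_small = {"card", "io", "my", "op", "at", "hy"}
tokenizers = {
    ("medical", "small"): medical_small,
    ("medical", "large"): medical_small | {"cardiomyopathy"},
    ("legal", "small"): legal_small,
    ("legal", "large"): legal_small | {"cardio"},
}
fingerprint = domain_fingerprint("Cardiomyopathy", tokenizers)
print(fingerprint)
```

Here the term needs 4 units from the small medical tokenizer and a single unit from the large one, but 6 and 5 units from the legal tokenizers. Reading such a pattern as evidence of medical domain-specificity is an interpretation left to the downstream step that the abstract mentions.
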
SELF-SUPERVISED TERM ENCODING WITH CONFIDENCE ESTIMATION

Publication number: 20240289683

Abstract: According to one embodiment, a method and computer program product for generating a model including a term encoder are provided. The embodiment may include training the model on a training dataset that associates training terms with first embeddings of the training terms. The training includes generating, with the term encoder, second embeddings from numerical representations of word subunits of the training terms with an objective of minimizing distances between the first embeddings and the second embeddings. The word subunits form part of a predetermined set of word subunits. The training includes predicting confidence scores based on the minimized distances. The embodiment may include deploying the model as part of an executable algorithm to allow a user to infer third embeddings and corresponding confidence scores from any input terms composed of word subunits of the predetermined set.

Type: Application

Filed: February 28, 2023

Publication date: August 29, 2024

Inventors: Francesco Fusco, Diego Matteo Antognini
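
A minimal pure-Python sketch of the training objective, under simplifying assumptions: the term encoder is just the mean of learned subword vectors, trained by stochastic gradient descent to minimize the squared distance to the given first embeddings, and the confidence predictor is replaced by a heuristic that combines subword coverage with the average residual distance observed in training. Subword segmentation is assumed to be done already; the data and function names are hypothetical.

```python
import math
import random


def encode(subwords, table, dim):
    """Term encoder stand-in: mean of the vectors of known subwords."""
    vecs = [table[s] for s in subwords if s in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(dataset, dim, epochs=300, lr=0.2, seed=0):
    """dataset: list of (subword list, first embedding). SGD on squared distance."""
    rng = random.Random(seed)
    vocab = sorted({s for subwords, _ in dataset for s in subwords})
    table = {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in vocab}
    for _ in range(epochs):
        for subwords, target in dataset:
            pred = encode(subwords, table, dim)
            # Gradient of ||mean(v) - target||^2 w.r.t. each subword vector.
            grad = [2 * (p - t) / len(subwords) for p, t in zip(pred, target)]
            for s in subwords:
                table[s] = [w - lr * g for w, g in zip(table[s], grad)]
    # Confidence stand-in: mean residual distance of training terms using each subword.
    residuals = {}
    for subwords, target in dataset:
        d = distance(encode(subwords, table, dim), target)
        for s in subwords:
            residuals.setdefault(s, []).append(d)
    subword_error = {s: sum(ds) / len(ds) for s, ds in residuals.items()}
    return table, subword_error


def infer(subwords, table, subword_error, dim):
    """Return (embedding, confidence in [0, 1]) for a segmented input term."""
    embedding = encode(subwords, table, dim)
    known = [subword_error[s] for s in subwords if s in subword_error]
    if not known:
        return embedding, 0.0
    coverage = len(known) / len(subwords)
    return embedding, coverage * math.exp(-sum(known) / len(known))


dataset = [(["bio", "chem"], [0.5, 0.5]),
           (["bio", "phys"], [0.0, 0.0]),
           (["chem", "phys"], [-0.5, 0.5])]
table, subword_error = train(dataset, dim=2)
emb, conf = infer(["bio", "chem"], table, subword_error, dim=2)
_, low_conf = infer(["bio", "xeno"], table, subword_error, dim=2)
```

A term built entirely from well-fitted subwords gets a confidence near 1, whereas a term containing an unseen subword ("xeno") gets roughly half, reflecting the reduced coverage.
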
UPDATING WINDOW REPRESENTATIONS OF SLIDING WINDOW OF TEXT USING ROLLING SCHEME

Publication number: 20240273125

Abstract: An example system includes a processor to compute a token-level fingerprint for each of a number of tokens in a received window of text. The processor can compute a window representation for a window of text based on the token-level fingerprints. The processor can also update the window representation in a rolling scheme when sliding the window of text.

Type: Application

Filed: February 9, 2023

Publication date: August 15, 2024

Inventors: Francesco FUSCO, Diego Matteo ANTOGNINI
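
One way to realize a rolling update, sketched under assumptions: each token fingerprint is a vector of +1/-1 entries derived from the bits of a CRC32 hash (SimHash-style), and the window representation is the element-wise sum of its tokens' fingerprints. Sliding the window by one token then only subtracts the outgoing fingerprint and adds the incoming one. The fingerprint scheme and the dimension are hypothetical choices.

```python
import zlib

DIM = 16  # number of fingerprint components (must not exceed 32 for CRC32)


def token_fingerprint(token, dim=DIM):
    """Hypothetical fingerprint: +1/-1 per bit of the token's CRC32 hash."""
    h = zlib.crc32(token.encode("utf-8"))
    return [1 if (h >> k) & 1 else -1 for k in range(dim)]


def window_representation(tokens, dim=DIM):
    """Full recomputation: element-wise sum of all token fingerprints."""
    rep = [0] * dim
    for token in tokens:
        rep = [r + f for r, f in zip(rep, token_fingerprint(token, dim))]
    return rep


def sliding_representations(tokens, width, dim=DIM):
    """Yield one representation per window position, updating in O(dim) per slide."""
    rep = window_representation(tokens[:width], dim)
    yield rep
    for i in range(width, len(tokens)):
        outgoing = token_fingerprint(tokens[i - width], dim)
        incoming = token_fingerprint(tokens[i], dim)
        rep = [r - o + n for r, o, n in zip(rep, outgoing, incoming)]
        yield rep


tokens = "the quick brown fox jumps over the lazy dog".split()
reps = list(sliding_representations(tokens, width=4))
```

Because the representation is a plain sum, the rolled result is identical to recomputing each window from scratch, while each slide costs a constant amount of work regardless of the window width.
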
Automated management of data transformation flows based on semantics

Patent number: 11663228

Abstract: Various embodiments are provided for intelligent management of data flows in a computing environment by a processor. One or more data transformation templates in time-series data applications may be created and managed according to concepts, one or more instances of the concepts, relationships between the concepts, and a mapping of the concepts to one or more data sources.

Type: Grant

Filed: January 15, 2020

Date of Patent: May 30, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Francesco Fusco, Robert Gormally, Mark Purcell, Seshu Tirupathi
Management of text-item recognition systems

Patent number: 11663407

Abstract: A tool for managing text-item recognition systems such as NER (Named Entity Recognition) systems. The tool applies the system to a text corpus containing instances of text items, such as named entities, to be recognized by the system, and selects from the text corpus a set of instances of text items that the system recognized. The tool tokenizes the text corpus such that each instance in the aforementioned set is encoded as a single token, and processes the tokenized text via a word embedding scheme to generate a word embedding matrix. The tool, responsive to selecting a seed token corresponding to an instance in the aforementioned set, performs a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token, and identifies the text corresponding to each neighboring token as a potential instance of a text item to be annotated.

Type: Grant

Filed: December 2, 2020

Date of Patent: May 30, 2023

Assignee: International Business Machines Corporation

Inventors: Francesco Fusco, Abderrahim Labbi, Peter Willem Jan Staar
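
The seed-based nearest-neighbor step can be sketched as follows, assuming the word embedding matrix has already been trained (toy two-dimensional vectors stand in for it here) and that recognized multi-word entities are joined with underscores so each becomes a single token. The joining convention, the cosine similarity measure, and the value of k are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def tokenize_with_entities(text, entities):
    """Encode each recognized multi-word entity as one token, e.g. 'New York' -> 'New_York'."""
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, entity.replace(" ", "_"))
    return text.split()


def nearest_neighbors(seed, embeddings, k):
    """Brute-force k-nearest-neighbor search by cosine similarity."""
    scored = [(cosine(embeddings[seed], vec), tok)
              for tok, vec in embeddings.items() if tok != seed]
    scored.sort(reverse=True)
    return [tok for _, tok in scored[:k]]


def annotation_candidates(seed, embeddings, recognized, k=3):
    """Neighbors of a recognized seed that the system did not recognize itself."""
    return [tok for tok in nearest_neighbors(seed, embeddings, k) if tok not in recognized]


tokens = tokenize_with_entities("I moved from New York to Zurich", ["New York"])
embeddings = {  # toy vectors standing in for a trained word embedding matrix
    "New_York": [0.9, 0.1], "Zurich": [0.8, 0.15], "Boston": [0.85, 0.2],
    "apple": [0.1, 0.9], "run": [0.0, 1.0],
}
print(annotation_candidates("New_York", embeddings, recognized={"New_York", "Zurich"}, k=2))
# ['Boston']
```

"Boston" sits close to the seed but was not recognized by the system, so it surfaces as a potential instance to annotate; a production system would use an approximate nearest-neighbor index rather than this brute-force scan.
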
DATA PROCESSING APPLICATION SYSTEM MANAGEMENT IN NON-STATIONARY ENVIRONMENTS

Publication number: 20230129390

Abstract: Various embodiments are provided for managing performance of a data processing system in a computing environment using one or more processors in a computing system. A drift may be dynamically detected in one or more machine learning models that generate a plurality of predictions and are deployed in a computing system. A plurality of metrics and data may be collected from the one or more machine learning models based on the drift. One or more additional machine learning models may be trained based on the drift and the plurality of metrics and data.

Type: Application

Filed: October 27, 2021

Publication date: April 27, 2023

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Francesco FUSCO, Venkata Sitaramagiridharganesh GANAPAVARAPU, Seshu TIRUPATHI
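
The abstract does not say how drift is detected. As a hedged illustration only, this sketch flags drift when the mean absolute prediction error over a recent window exceeds a fixed multiple of the error over a reference period, records both error levels as metrics, and hands them to a caller-supplied retraining function. The thresholds and the `retrain` callback are hypothetical.

```python
from statistics import mean


def detect_drift(errors, reference_size=50, window=20, ratio=1.5):
    """Compare recent mean absolute error with a reference period (hypothetical rule)."""
    reference = mean(abs(e) for e in errors[:reference_size])
    recent = mean(abs(e) for e in errors[-window:])
    metrics = {"reference_mae": reference, "recent_mae": recent}
    return recent > ratio * reference, metrics


def manage(errors, retrain, **kwargs):
    """Call `retrain(metrics)` only when drift is detected; return its result or None."""
    drifted, metrics = detect_drift(errors, **kwargs)
    return retrain(metrics) if drifted else None


stable = [0.1] * 70
shifted = [0.1] * 50 + [0.5] * 20
print(manage(stable, lambda m: "retrained"))   # None
print(manage(shifted, lambda m: "retrained"))  # retrained
```
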
SPECIFICITY RANKING OF TEXT ELEMENTS AND APPLICATIONS THEREOF

Publication number: 20230055769

Abstract: Ranking a plurality of text elements, each comprising at least one word, by specificity. For each text element to be ranked, such a method includes computing an embedding vector that locates the text element in an embedding space, and selecting a set of text fragments from reference text. Each of these text fragments contains the text element to be ranked and further text elements. For each text fragment, the method calculates respective distances in the embedding space between the further text elements. The method further includes calculating a specificity score for the text element to be ranked and storing the specificity score. After the plurality of text elements has been ranked, a text data structure may be processed using the specificity scores of the text elements to extract data having a desired specificity from the data structure.

Type: Application

Filed: August 23, 2021

Publication date: February 23, 2023

Inventors: Francesco Fusco, Cesar Berrospi Ramis, Peter Willem Jan Staar
Matching a first collection of strings with a second collection of strings

Patent number: 11507601

Abstract: A method for matching first elements with second elements. Each of the first elements and second elements is a character string. The method comprises: calculating a first integer hash value for each of the first elements using a string hash function, wherein the first integer hash value is an output integer calculated by using each of the first elements as an input character string of the function; calculating a second integer hash value for each of the second elements using the function; grouping each of the first elements into at least one group of a set of blocking groups using its first integer hash value; grouping each of the second elements into at least one group of the set of blocking groups using its second integer hash value; and matching first elements with second elements within each group of the set of blocking groups using a string comparison function.

Type: Grant

Filed: August 18, 2016

Date of Patent: November 22, 2022

Assignee: International Business Machines Corporation

Inventors: Francesco Fusco, Yves G. Ineichen, Michel F. Speiser
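
The blocking scheme is concrete enough to sketch. The string hash below is an assumption: CRC32 of a short normalized prefix, chosen so that near-duplicate strings tend to land in the same blocking group; the abstract does not specify the hash function. Matching inside each group uses `difflib.SequenceMatcher.ratio()` as the string comparison function, with an illustrative 0.8 threshold.

```python
import re
import zlib
from difflib import SequenceMatcher


def blocking_hash(s, prefix_len=4):
    """Hypothetical string hash: CRC32 of a normalized (lower-case, alphanumeric) prefix."""
    normalized = re.sub(r"[^a-z0-9]", "", s.lower())
    return zlib.crc32(normalized[:prefix_len].encode("utf-8"))


def match(first, second, num_groups=1024, threshold=0.8):
    """Group both collections by hash, then compare strings only within each group."""
    groups = {}
    for s in first:
        groups.setdefault(blocking_hash(s) % num_groups, ([], []))[0].append(s)
    for s in second:
        groups.setdefault(blocking_hash(s) % num_groups, ([], []))[1].append(s)
    matches = []
    for lefts, rights in groups.values():
        for a in lefts:
            for b in rights:
                if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    matches.append((a, b))
    return matches


first = ["International Business Machines", "Zurich Research Lab"]
second = ["international business machines corp", "Zurich Research Laboratory", "Apple Inc"]
matches = match(first, second)
```

Blocking avoids comparing every pair across the two collections; the trade-off is that true matches whose blocking keys differ are never compared.
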
Selecting forecasting models by machine learning based on analysis of model robustness

Patent number: 11475332

Abstract: A computer-implemented method, a computer program product, and a computer system for selecting predictions by models. A computer receives a request for a forecast of a dependent variable in a time domain, where the time domain includes first time periods that have normal labels due to normal predictor variable data and second time periods that have anomalous labels due to anomalous predictor variable data. The computer retrieves accuracy scores and robustness scores of models, where the accuracy scores indicate forecasting accuracy in the first time periods and the robustness scores indicate forecasting accuracy in the second time periods. For predictions in the first time periods, the computer selects dependent variable values predicted by a first model that has the highest accuracy score. For predictions in the second time periods, the computer selects dependent variable values predicted by a second model that has the highest robustness score.

Type: Grant

Filed: July 12, 2020

Date of Patent: October 18, 2022

Assignee: International Business Machines Corporation

Inventors: Robert Gormally, Bradley Eck, Francesco Fusco, Mark Purcell, Seshu Tirupathi
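
The selection rule itself is simple. A minimal sketch, assuming each time step is already labeled normal or anomalous and each model's accuracy and robustness scores are precomputed (model names and numbers are made up):

```python
def select_forecasts(periods, predictions, accuracy, robustness):
    """periods: 'normal' or 'anomalous' label per time step.
    predictions: model name -> list of predicted values, one per time step.
    accuracy / robustness: model name -> score (higher is better)."""
    best_normal = max(accuracy, key=accuracy.get)
    best_anomalous = max(robustness, key=robustness.get)
    return [predictions[best_anomalous if label == "anomalous" else best_normal][t]
            for t, label in enumerate(periods)]


periods = ["normal", "normal", "anomalous", "normal"]
predictions = {"arima": [1, 2, 3, 4], "gbm": [10, 20, 30, 40]}
accuracy = {"arima": 0.9, "gbm": 0.8}
robustness = {"arima": 0.4, "gbm": 0.7}
forecast = select_forecasts(periods, predictions, accuracy, robustness)
print(forecast)  # [1, 2, 30, 4]
```
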
Automated Creation of Machine-learning Modeling Pipelines

Publication number: 20220207349

Abstract: A computer-implemented method of generating a machine learning model pipeline (“pipeline”) for a task, where the pipeline includes a machine learning model and at least one feature. A machine learning task including a data set and a set of first tags related to the task are received from a user. It is determined whether a database stores a first machine learning model pipeline correlated in the database with a second tag matching at least one first tag received from the user. Upon determining that the database stores the first machine learning model pipeline, the first machine learning model pipeline is retrieved, the retrieved first machine learning model pipeline is run, and the machine learning task is responded to. Pipelines may also be created based on stored pipelines correlated with a tag related to a tag in the task, or from received feature generator(s) and models.

Type: Application

Filed: December 29, 2020

Publication date: June 30, 2022

Inventors: Francesco Fusco, Fearghal O'Donncha, Seshu Tirupathi
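
A hedged sketch of the retrieval path: stored pipelines carry tags, and the pipeline sharing the most tags with the user's task is retrieved and run. The `Pipeline` structure, the largest-overlap rule for choosing among several matches, and the example features are assumptions; building a new pipeline when nothing matches is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Pipeline:
    name: str
    features: List[Callable]  # feature generators applied to the raw data
    model: Callable           # consumes the list of feature values
    tags: Set[str] = field(default_factory=set)


def find_pipeline(database, task_tags) -> Optional[Pipeline]:
    """Return the stored pipeline sharing the most tags with the task, or None."""
    wanted = set(task_tags)
    best = max(database, key=lambda p: len(p.tags & wanted), default=None)
    if best is None or not best.tags & wanted:
        return None
    return best


def respond(database, data, task_tags):
    """Retrieve a matching pipeline and run it on the task's data set."""
    pipeline = find_pipeline(database, task_tags)
    if pipeline is None:
        raise LookupError("no stored pipeline matches the task tags")
    return pipeline.model([feature(data) for feature in pipeline.features])


database = [
    Pipeline("load-forecast", [lambda d: sum(d) / len(d)],
             lambda f: 1.1 * f[0], {"energy", "forecast"}),
    Pipeline("water-peak", [max], lambda f: f[0], {"water"}),
]
print(respond(database, [10, 20, 30], ["forecast", "hourly"]))
```
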
Term extraction in highly technical domains

Patent number: 11361571

Abstract: A language model is fine-tuned by extracting terminology terms from a text document. The method comprises identifying a text snippet, identifying candidate multi-word expressions using part of speech tags, and determining a specificity score value for each of the candidate multi-word expressions. Moreover, the method comprises determining a topic similarity score value for each of the candidate multi-word expressions, selecting remaining expressions from the candidate multi-word expressions using a function of a specificity value and a topic similarity value of each of the candidate multi-word expressions, adding a noun comprised in the text snippet to the remaining expressions depending on a correlation function, labeling the remaining multi-word expressions, and fine-tuning an existing pre-trained transformer-based language model using as training data the identified text snippet marked with the labeled remaining expressions.

Type: Grant

Filed: June 28, 2021

Date of Patent: June 14, 2022

Assignee: International Business Machines Corporation

Inventors: Francesco Fusco, Peter Willem Jan Staar
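
The candidate-generation and selection steps can be illustrated with a small sketch. It assumes the snippet is already POS-tagged, treats a candidate as a run of adjectives and nouns that ends in a noun and spans at least two words, and keeps a candidate when the product of its specificity and topic similarity scores reaches a threshold. The product rule, the threshold, and the scores are hypothetical. The noun-addition step and the actual transformer fine-tuning are omitted; the output is a set of BIO labels that could serve as training data.

```python
def candidate_mwes(tagged):
    """tagged: list of (word, POS). Returns (start, end) spans of ADJ/NOUN runs
    trimmed to end in a NOUN and containing at least two words."""
    spans, start = [], None
    for i, (_, pos) in enumerate(tagged + [("", "END")]):
        if pos in ("ADJ", "NOUN"):
            if start is None:
                start = i
        elif start is not None:
            end = i
            while end > start and tagged[end - 1][1] != "NOUN":
                end -= 1
            if end - start >= 2:
                spans.append((start, end))
            start = None
    return spans


def select_terms(tagged, spans, specificity, topic_similarity, threshold=0.5):
    """Keep spans whose (hypothetical) combined score reaches the threshold."""
    kept = []
    for s, e in spans:
        phrase = " ".join(w for w, _ in tagged[s:e])
        if specificity(phrase) * topic_similarity(phrase) >= threshold:
            kept.append((s, e))
    return kept


def bio_labels(n, spans):
    """Token-level labels marking the selected terms."""
    labels = ["O"] * n
    for s, e in spans:
        labels[s] = "B-TERM"
        for i in range(s + 1, e):
            labels[i] = "I-TERM"
    return labels


tagged = [("The", "DET"), ("thermal", "ADJ"), ("oxide", "NOUN"), ("layer", "NOUN"),
          ("is", "VERB"), ("very", "ADV"), ("good", "ADJ"), ("overall", "ADJ"),
          ("quality", "NOUN"), (".", "PUNCT")]
spec = {"thermal oxide layer": 0.9, "good overall quality": 0.2}
spans = candidate_mwes(tagged)
kept = select_terms(tagged, spans, lambda p: spec.get(p, 0.0), lambda p: 0.8)
labels = bio_labels(len(tagged), kept)
```
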
MANAGEMENT OF TEXT-ITEM RECOGNITION SYSTEMS

Publication number: 20220171931

Abstract: A tool for managing text-item recognition systems such as NER (Named Entity Recognition) systems. The tool applies the system to a text corpus containing instances of text items, such as named entities, to be recognized by the system, and selects from the text corpus a set of instances of text items that the system recognized. The tool tokenizes the text corpus such that each instance in the aforementioned set is encoded as a single token, and processes the tokenized text via a word embedding scheme to generate a word embedding matrix. The tool, responsive to selecting a seed token corresponding to an instance in the aforementioned set, performs a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token, and identifies the text corresponding to each neighboring token as a potential instance of a text item to be annotated.

Type: Application

Filed: December 2, 2020

Publication date: June 2, 2022

Inventors: Francesco Fusco, Abderrahim Labbi, Peter Willem Jan Staar
BOOTSTRAPPING OF TEXT CLASSIFIERS

Publication number: 20220075809

Abstract: Computer-implemented methods and systems are provided for generating training datasets for bootstrapping text classifiers. Such a method includes providing a word embedding matrix. This matrix is generated from a text corpus by encoding words in the text as respective tokens such that selected compound keywords in the text are encoded as single tokens. The method includes receiving, via a user interface, a user-selected set of the keywords. A nearest-neighbor search of the embedding space is performed for each keyword in the set to identify neighboring keywords, and a plurality of the neighboring keywords are added to the keyword set. The method further comprises, for a corpus of documents, string-matching keywords in the keyword sets to text in each document to identify, based on results of the string-matching, documents associated with each text class. The documents identified for each text class are stored as the training dataset for the classifier.

Type: Application

Filed: September 10, 2020

Publication date: March 10, 2022

Inventors: Francesco Fusco, Mattia Atzeni, Abderrahim Labbi
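
The final string-matching step might look like the sketch below, assuming the keyword sets have already been expanded with nearest neighbors and that compound keywords are stored with underscores (as single tokens) while documents contain them with spaces. Matching is case-insensitive on whole words, and a document that matches several classes is added to each of them, which is only one of several possible policies.

```python
import re


def build_training_set(keyword_sets, documents):
    """keyword_sets: class label -> set of (already expanded) keywords.
    Returns class label -> documents containing at least one of its keywords."""
    patterns = {}
    for label, keywords in keyword_sets.items():
        if not keywords:
            continue  # an empty alternation would match every document
        alternatives = "|".join(re.escape(k.replace("_", " ")) for k in sorted(keywords))
        patterns[label] = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    dataset = {label: [] for label in keyword_sets}
    for doc in documents:
        for label, pattern in patterns.items():
            if pattern.search(doc):
                dataset[label].append(doc)
    return dataset


keyword_sets = {"finance": {"interest_rate", "bond"}, "sports": {"football", "world_cup"}}
documents = ["The interest rate rose again.", "A football match tonight.",
             "Bonds and the World Cup", "Nothing here."]
dataset = build_training_set(keyword_sets, documents)
```

Note that whole-word matching means "bond" does not match "Bonds"; stemming or plural handling would be a further design choice.
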
SELECTING FORECASTING MODELS BY MACHINE LEARNING BASED ON ANALYSIS OF MODEL ROBUSTNESS

Publication number: 20220012609

Abstract: A computer-implemented method, a computer program product, and a computer system for selecting predictions by models. A computer receives a request for a forecast of a dependent variable in a time domain, where the time domain includes first time periods that have normal labels due to normal predictor variable data and second time periods that have anomalous labels due to anomalous predictor variable data. The computer retrieves accuracy scores and robustness scores of models, where the accuracy scores indicate forecasting accuracy in the first time periods and the robustness scores indicate forecasting accuracy in the second time periods. For predictions in the first time periods, the computer selects dependent variable values predicted by a first model that has the highest accuracy score. For predictions in the second time periods, the computer selects dependent variable values predicted by a second model that has the highest robustness score.

Type: Application

Filed: July 12, 2020

Publication date: January 13, 2022

Inventors: Robert Gormally, Bradley Eck, Francesco Fusco, Mark Purcell, Seshu Tirupathi
Automated data exploration and validation

Patent number: 11176148

Abstract: Embodiments for automated data exploration and validation by a processor. One or more optimal data flows are provided in response to a query for one or more heterogeneous data sources according to an inference model based on a knowledge graph a plurality of data flows between one or more heterogeneous data sources relating to the query. An analytical flow is provided for one or more of the plurality of data flows for those of the one or more heterogeneous data sources that are undetected, and two or more of the one or more of the plurality of data flows are aggregated or disaggregated for the one or more heterogeneous data sources that are nested within the knowledge graph. One or more criteria are received from a user via an interactive graphical user interface (GUI) to use for defining the one or more optimal data flows.

Type: Grant

Filed: August 9, 2019

Date of Patent: November 16, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ulrike Fischer, Francesco Fusco, Pascal Pompey, Mathieu Sinn
AUTOMATED MANAGEMENT OF DATA TRANSFORMATION FLOWS BASED ON SEMANTICS

Publication number: 20210216545

Abstract: Various embodiments are provided for intelligent management of data flows in a computing environment by a processor. One or more data transformation templates in time-series data applications may be created and managed according to concepts, one or more instances of the concepts, relationships between the concepts, and a mapping of the concepts to one or more data sources.

Type: Application

Filed: January 15, 2020

Publication date: July 15, 2021

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Francesco FUSCO, Robert Gormally, Mark PURCELL, Seshu Tirupathi
Composable time-series observability in sensor data fusion

Patent number: 10782655

Abstract: A sensor data fusion system includes a processor coupled to a plurality of sensors. The system is initialized by providing access to a data store storing at least one time series of sensor data; a semantic store storing semantic data including system variables, and relations between the system variables; and a mapping therebetween. A registration of a set of one or more variables of interest for which appropriate data is not available is obtained. An initially empty inference model is extended with the set of variables, to obtain an extended model. A request to observe a given one of the set of variables at a given timestamp is obtained. Responsive thereto, time series data for the set of registered variables is retrieved. The extended model is run with the retrieved data to obtain an estimate of the given one of the variables at the given timestamp.

Type: Grant

Filed: June 16, 2018

Date of Patent: September 22, 2020

Assignee: International Business Machines Corporation

Inventors: Bradley Eck, Francesco Fusco, Seshu Tirupathi
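
An illustrative sketch of the registration and observe steps, under assumptions: the semantic store is reduced to a linear relation expressing an unobserved variable as a weighted sum of observed ones, and time-series values are linearly interpolated at the requested timestamp. The water-balance example, the class name, and the relation format are hypothetical; the abstract does not prescribe a model type.

```python
from bisect import bisect_left


def interpolate(series, ts):
    """Linear interpolation in a sorted list of (timestamp, value); clamps at the ends."""
    times = [t for t, _ in series]
    i = bisect_left(times, ts)
    if i == 0:
        return series[0][1]
    if i == len(series):
        return series[-1][1]
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)


class FusionSystem:
    def __init__(self, data_store, relations):
        self.data_store = data_store  # variable -> sorted [(timestamp, value)]
        self.relations = relations    # variable -> {source variable: weight}
        self.model = {}               # initially empty inference model

    def register(self, variables):
        """Extend the model with variables that have no directly observed data."""
        for v in variables:
            if v not in self.data_store:
                self.model[v] = self.relations[v]

    def observe(self, variable, ts):
        """Estimate a registered variable at `ts` from the related time series."""
        weights = self.model[variable]
        return sum(w * interpolate(self.data_store[src], ts) for src, w in weights.items())


data_store = {
    "inflow_a": [(0, 10.0), (10, 20.0)],
    "inflow_b": [(0, 5.0), (10, 5.0)],
    "outflow": [(0, 12.0), (10, 16.0)],
}
relations = {"storage_change": {"inflow_a": 1.0, "inflow_b": 1.0, "outflow": -1.0}}
fusion = FusionSystem(data_store, relations)
fusion.register(["storage_change"])
print(fusion.observe("storage_change", 5))  # 6.0
```
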
Posterior estimation of variables in water distribution networks

Patent number: 10657299

Abstract: A system for posterior estimation of variables in a water distribution network. The system receives a set of data inputs, determines a first model of the water distribution network based on the set of data inputs, and determines a second model of the water distribution network based on the set of data inputs and the first model.

Type: Grant

Filed: September 21, 2018

Date of Patent: May 19, 2020

Assignee: International Business Machines Corporation

Inventors: Francesco Fusco, Sergiy Zhuk