Patents by Inventor Francesco Fusco
Francesco Fusco has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12210827
Abstract: Ranking a plurality of text elements, each comprising at least one word, by specificity. For each text element to be ranked, such a method includes computing an embedding vector that locates a text element in an embedding space, and selecting a set of text fragments from reference text. Each of these text fragments contains the text element to be ranked and further text elements. For each text fragment, the method calculates respective distances in the embedding space between the further text elements. The method further includes calculating a specificity score for the text element to be ranked and storing the specificity score. After ranking the plurality of text elements, a text data structure using the specificity scores for text elements to extract data having a desired specificity from the data structure may be processed.
Type: Grant
Filed: August 23, 2021
Date of Patent: January 28, 2025
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Francesco Fusco, Cesar Berrospi Ramis, Peter Willem Jan Staar
-
Publication number: 20240320249
Abstract: A computer-implemented method, a computer program product, and a computer system are provided to enrich downstream learning tasks. A processor stores selected text terms from a corpus of text. A processor determines an initial set of specificity scores for the selected text terms to produce a set of training samples, where each of the training samples comprises a selected text term and an initial specificity score for the selected text term. A processor trains a character-based regression model with the set of training samples. A processor retrieves an Automated Term Extraction (ATE) training data set. A processor determines specificity scores for text terms included in the ATE training data set. A processor, responsive to a respective specificity score for a text term in the ATE training data set being below a threshold value, masks the text term from being used in the ATE training data set.
Type: Application
Filed: March 22, 2023
Publication date: September 26, 2024
Inventors: Francesco Fusco, Diego Matteo Antognini
-
Publication number: 20240320429
Abstract: A method, computer-program product and computer system are provided to determine domain-specificity of a text term. A processor receives a plurality of domain-specific text corpora, wherein each of the plurality of domain-specific text corpora comprises a plurality of text documents of a respective domain. A processor trains a set of subword-unit tokenizers with at least two different vocabulary sizes of the respective domain-specific text corpus. A processor receives the text-term. A processor determines a domain-specificity fingerprint of the text-term, wherein the domain-specificity fingerprint comprises for each subword-unit tokenizer a number of subword-units required to represent the text-term. A processor provides the domain-specificity fingerprint for determining the domain-specificity of the text term.
Type: Application
Filed: March 22, 2023
Publication date: September 26, 2024
Inventors: Diego Matteo Antognini, Francesco Fusco
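The fingerprint idea in this abstract can be illustrated with a minimal Python sketch. It is not the claimed method: instead of trained subword-unit tokenizers it uses hand-written vocabularies and a greedy longest-match segmenter, and the names `count_subwords`, `fingerprint`, and the vocabulary labels are all hypothetical.

```python
def count_subwords(term, vocab):
    """Count the units needed to cover `term` with greedy longest-match segmentation.

    Assumption: characters not covered by any vocabulary entry count as one unit each.
    """
    i, count = 0, 0
    while i < len(term):
        for j in range(len(term), i, -1):
            if term[i:j] in vocab:
                i = j  # consume the longest matching subword
                break
        else:
            i += 1  # no vocabulary entry matched: fall back to a single character
        count += 1
    return count


def fingerprint(term, tokenizers):
    """Return one subword count per (hypothetical) tokenizer vocabulary."""
    return {name: count_subwords(term, vocab) for name, vocab in tokenizers.items()}


# Stand-ins for tokenizers trained per domain with two vocabulary sizes.
tokenizers = {
    "medical-large": {"cardio", "myo", "pathy"},
    "medical-small": {"cardio"},
    "legal-large": {"tort", "path"},
}
print(fingerprint("cardiomyopathy", tokenizers))
```

A term that needs few units under one domain's vocabularies and many under another's is, under this sketch's assumptions, more specific to the first domain.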
-
Publication number: 20240289683
Abstract: According to one embodiment, a method and computer program product for generating a model including a term encoder is provided. The embodiment may include training the model on a training dataset that associates training terms with first embeddings of the training terms. The training includes generating, with the term encoder, second embeddings from numerical representations of word subunits of the training terms with an objective of minimizing distances between the first embeddings and the second embeddings. The word subunits form part of a predetermined set of word subunits. The training includes predicting confidence scores based on the minimized distances. The embodiment may include deploying the model as part of an executable algorithm to allow a user to infer third embeddings and corresponding confidence scores from any input terms written based on word subunits of the predetermined set.
Type: Application
Filed: February 28, 2023
Publication date: August 29, 2024
Inventors: Francesco Fusco, Diego Matteo Antognini
-
Publication number: 20240273125
Abstract: An example system includes a processor to compute a token-level fingerprint for each of a number of tokens in a received window of text. The processor can compute a window representation for a window of text based on the token-level fingerprints. The processor can also update the window representation in a rolling scheme when sliding the window of text.
Type: Application
Filed: February 9, 2023
Publication date: August 15, 2024
Inventors: Francesco FUSCO, Diego Matteo ANTOGNINI
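A rolling update of this kind can be sketched in a few lines of Python. The abstract does not say how fingerprints are built or combined, so the hash-derived +1/-1 vectors, the element-wise sum, and the names `token_fingerprint`, `window_representations`, and `DIM` are assumptions made only for illustration.

```python
import hashlib

DIM = 8  # hypothetical fingerprint width


def token_fingerprint(token):
    """Map a token to DIM values of +1/-1 taken from the bits of a hash byte."""
    first_byte = hashlib.sha256(token.encode("utf-8")).digest()[0]
    return [1 if (first_byte >> i) & 1 else -1 for i in range(DIM)]


def window_representations(tokens, size):
    """Yield one summed representation per window position.

    After the first window, each representation is obtained by subtracting
    the fingerprint of the token that leaves the window and adding the
    fingerprint of the token that enters it.
    """
    if size <= 0 or len(tokens) < size:
        return
    fps = [token_fingerprint(t) for t in tokens]
    rep = [sum(col) for col in zip(*fps[:size])]
    yield list(rep)
    for i in range(size, len(tokens)):
        leaving, entering = fps[i - size], fps[i]
        rep = [r - l + e for r, l, e in zip(rep, leaving, entering)]
        yield list(rep)
```

With a sum as the combining step, each slide costs work proportional to the fingerprint width rather than to the window size.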
-
Patent number: 11663228
Abstract: Various embodiments are provided for intelligent management of data flows in a computing environment by a processor. One or more data transformation in time-series data applications templates may be created and managed according to concepts, one or more instances of the concepts, relationships between the concepts, and a mapping of the concepts to one or more data sources.
Type: Grant
Filed: January 15, 2020
Date of Patent: May 30, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Francesco Fusco, Robert Gormally, Mark Purcell, Seshu Tirupathi
-
Patent number: 11663407
Abstract: A tool for managing text-item recognition systems such as NER (Named Entity Recognition) systems. The tool applies the system to a text corpus containing instances of text items, such as named entities, to be recognized by the system, and selects from the text corpus a set of instances of text items which the system recognized. The tool tokenizes the text corpus such that each instance in the aforementioned set is encoded as a single token and processes the tokenized text via a word embedding scheme to generate a word embedding matrix. The tool, responsive to selecting a seed token corresponding to an instance in the aforementioned set, performs a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token, and identifies the text corresponding to each neighboring token as a potential instance of a text item to be annotated.
Type: Grant
Filed: December 2, 2020
Date of Patent: May 30, 2023
Assignee: International Business Machines Corporation
Inventors: Francesco Fusco, Abderrahim Labbi, Peter Willem Jan Staar
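The nearest-neighbor step described here can be sketched in Python as follows. The toy two-dimensional embeddings, the cosine similarity measure, and the names `cosine` and `neighbors` are assumptions for illustration; the abstract does not specify the embedding scheme or the distance used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbors(seed, embeddings, k=2):
    """Return the k tokens whose vectors are most similar to the seed token's vector."""
    seed_vec = embeddings[seed]
    scored = [
        (cosine(seed_vec, vec), tok)
        for tok, vec in embeddings.items()
        if tok != seed
    ]
    scored.sort(reverse=True)
    return [tok for _, tok in scored[:k]]


# Hypothetical embedding matrix; multi-word entities are encoded as single tokens.
embeddings = {
    "new_york": [1.0, 0.1],
    "los_angeles": [0.9, 0.2],
    "london": [1.0, 0.0],
    "banana": [0.0, 1.0],
}
print(neighbors("new_york", embeddings))  # candidate instances to review for annotation
```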
-
Publication number: 20230129390
Abstract: Various embodiments are provided for managing performance of a data processing system in a computing environment using one or more processors in a computing system. A drift may be dynamically detected in one or more machine learning models generating a plurality of predictions and deployed in a computing system. A plurality of metrics and data may be collected of the one or more machine learning models based on the drift. One or more additional machine learning models may be trained based on the drift and the plurality of metrics and data.
Type: Application
Filed: October 27, 2021
Publication date: April 27, 2023
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Francesco FUSCO, Venkata Sitaramagiridharganesh GANAPAVARAPU, Seshu TIRUPATHI
-
Publication number: 20230055769
Abstract: Ranking a plurality of text elements, each comprising at least one word, by specificity. For each text element to be ranked, such a method includes computing an embedding vector that locates a text element in an embedding space, and selecting a set of text fragments from reference text. Each of these text fragments contains the text element to be ranked and further text elements. For each text fragment, the method calculates respective distances in the embedding space between the further text elements. The method further includes calculating a specificity score for the text element to be ranked and storing the specificity score. After ranking the plurality of text elements, a text data structure using the specificity scores for text elements to extract data having a desired specificity from the data structure may be processed.
Type: Application
Filed: August 23, 2021
Publication date: February 23, 2023
Inventors: Francesco Fusco, Cesar Berrospi Ramis, Peter Willem Jan Staar
-
Patent number: 11507601
Abstract: A method for matching first elements with second elements. Each of the first elements and second elements is a character string. The method comprises: calculating a first integer hash value for each of the first elements using a string hash function, wherein the first integer hash value is an output integer calculated from using each of the first elements as an input character string of the function; calculating second integer hash values for each of the second elements using the function; grouping each of the first elements into at least one group of a set of blocking groups using its first integer hash value; grouping each of the second elements into at least one group of the set of blocking groups using its second integer hash value; and matching first elements with second elements within each group of the set of blocking groups using a string comparison function.
Type: Grant
Filed: August 18, 2016
Date of Patent: November 22, 2022
Assignee: International Business Machines Corporation
Inventors: Francesco Fusco, Yves G. Ineichen, Michel F. Speiser
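The hash-then-block-then-compare flow can be sketched in Python as below. The specific hash (SHA-256 over the first three normalized characters), the number of blocks, the `difflib` similarity ratio, and the 0.8 threshold are all assumptions chosen so that near-duplicate strings land in the same group; the abstract does not name a hash or comparison function.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher

NUM_BLOCKS = 16  # hypothetical number of blocking groups


def string_hash(s):
    """Integer hash of a string; hashes a short normalized prefix (an assumption)."""
    key = "".join(ch for ch in s.lower() if ch.isalnum())[:3]
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


def match(first, second, threshold=0.8):
    """Group both lists by hash value, then compare strings only within each group."""
    blocks = defaultdict(lambda: ([], []))
    for s in first:
        blocks[string_hash(s) % NUM_BLOCKS][0].append(s)
    for s in second:
        blocks[string_hash(s) % NUM_BLOCKS][1].append(s)
    pairs = []
    for left, right in blocks.values():
        for a in left:
            for b in right:
                if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    pairs.append((a, b))
    return pairs


print(match(["IBM Corp", "Acme Ltd"], ["IBM Corp.", "Zeta Inc"]))
```

Blocking avoids comparing every first element against every second element: the costly string comparison runs only for pairs that share a group.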
-
Patent number: 11475332
Abstract: A computer-implemented method, a computer program product, and a computer system for selecting predictions by models. A computer receives a request for a forecast of a dependent variable in a time domain, where the time domain includes first time periods that have normal labels due to normal predictor variable data and second time periods that have anomalous labels due to anomalous predictor variable data. The computer retrieves accuracy scores and robustness scores of models, where the accuracy scores indicate forecasting accuracy in the first time periods and the robustness scores indicate forecasting accuracy in the second time periods. For predictions in the first time period, the computer selects dependent variable values predicted by a first model that has highest values of the accuracy scores. For predictions in the second time periods, the computer selects dependent variable values predicted by a second model that has highest values of the robustness scores.
Type: Grant
Filed: July 12, 2020
Date of Patent: October 18, 2022
Assignee: International Business Machines Corporation
Inventors: Robert Gormally, Bradley Eck, Francesco Fusco, Mark Purcell, Seshu Tirupathi
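The selection rule in this abstract can be illustrated with a short Python sketch. The data layout (one label per time period, one prediction list per model, and score dictionaries keyed by model name) and the function name `select_forecast` are assumptions made for the example.

```python
def select_forecast(labels, predictions, accuracy, robustness):
    """Pick, per time period, the value predicted by the best-suited model.

    Periods labeled "normal" use the model with the highest accuracy score;
    periods labeled "anomalous" use the model with the highest robustness score.
    """
    best_normal = max(accuracy, key=accuracy.get)
    best_anomalous = max(robustness, key=robustness.get)
    return [
        predictions[best_anomalous if label == "anomalous" else best_normal][t]
        for t, label in enumerate(labels)
    ]


# Hypothetical inputs: two models, three time periods.
labels = ["normal", "anomalous", "normal"]
predictions = {"m1": [1.0, 2.0, 3.0], "m2": [10.0, 20.0, 30.0]}
accuracy = {"m1": 0.9, "m2": 0.7}
robustness = {"m1": 0.4, "m2": 0.8}
print(select_forecast(labels, predictions, accuracy, robustness))  # [1.0, 20.0, 3.0]
```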
-
Publication number: 20220207349
Abstract: A computer-implemented method of generating a machine learning model pipeline (“pipeline”) for a task, where the pipeline includes a machine learning model and at least one feature. A machine learning task including a data set and a set of first tags related to the task are received from a user. It is determined whether a database stores a first machine learning model pipeline correlated in the database with a second tag matching at least one first tag received from the user. Upon determining that the database stores the first machine learning model pipeline, the first machine learning model pipeline is retrieved, the retrieved first machine learning model pipeline is run, and the machine learning task is responded to. Pipelines may also be created based on stored pipelines correlated with a tag related to a tag in the task, or from received feature generator(s) and models.
Type: Application
Filed: December 29, 2020
Publication date: June 30, 2022
Inventors: Francesco Fusco, Fearghal O'Donncha, Seshu Tirupathi
-
Patent number: 11361571
Abstract: A language model is fine-tuned by extracting terminology terms from a text document. The method comprises identifying a text snippet, identifying candidate multi-word expressions using part of speech tags, and determining a specificity score value for each of the candidate multi-word expressions. Moreover, the method comprises determining a topic similarity score value for each of the candidate multi-word expressions, selecting remaining expressions from the candidate multi-word expressions using a function of a specificity value and a topic similarity value of each of the candidate multi-word expressions, adding a noun comprised in the text snippet to the remaining expressions depending on a correlation function, labeling the remaining multi-word expressions, and fine-tuning an existing pre-trained transformer-based language model using as training data the identified text snippet marked with the labeled remaining expressions.
Type: Grant
Filed: June 28, 2021
Date of Patent: June 14, 2022
Assignee: International Business Machines Corporation
Inventors: Francesco Fusco, Peter Willem Jan Staar
-
Publication number: 20220171931
Abstract: A tool for managing text-item recognition systems such as NER (Named Entity Recognition) systems. The tool applies the system to a text corpus containing instances of text items, such as named entities, to be recognized by the system, and selects from the text corpus a set of instances of text items which the system recognized. The tool tokenizes the text corpus such that each instance in the aforementioned set is encoded as a single token and processes the tokenized text via a word embedding scheme to generate a word embedding matrix. The tool, responsive to selecting a seed token corresponding to an instance in the aforementioned set, performs a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token, and identifies the text corresponding to each neighboring token as a potential instance of a text item to be annotated.
Type: Application
Filed: December 2, 2020
Publication date: June 2, 2022
Inventors: Francesco Fusco, Abderrahim Labbi, Peter Willem Jan Staar
-
Publication number: 20220075809
Abstract: Computer-implemented methods and systems are provided for generating training datasets for bootstrapping text classifiers. Such a method includes providing a word embedding matrix. This matrix is generated from a text corpus by encoding words in the text as respective tokens such that selected compound keywords in the text are encoded as single tokens. The method includes receiving, via a user interface, a user-selected set of the keywords. A nearest neighbor search of the embedding space is performed for each keyword in the set to identify neighboring keywords, and a plurality of the neighboring keywords are added to the keyword-set. The method further comprises, for a corpus of documents, string-matching keywords in the keyword-sets to text in each document to identify, based on results of the string-matching, documents associated with each text class. The documents identified for each text class are stored as the training dataset for the classifier.
Type: Application
Filed: September 10, 2020
Publication date: March 10, 2022
Inventors: Francesco Fusco, Mattia Atzeni, Abderrahim Labbi
-
Publication number: 20220012609
Abstract: A computer-implemented method, a computer program product, and a computer system for selecting predictions by models. A computer receives a request for a forecast of a dependent variable in a time domain, where the time domain includes first time periods that have normal labels due to normal predictor variable data and second time periods that have anomalous labels due to anomalous predictor variable data. The computer retrieves accuracy scores and robustness scores of models, where the accuracy scores indicate forecasting accuracy in the first time periods and the robustness scores indicate forecasting accuracy in the second time periods. For predictions in the first time period, the computer selects dependent variable values predicted by a first model that has highest values of the accuracy scores. For predictions in the second time periods, the computer selects dependent variable values predicted by a second model that has highest values of the robustness scores.
Type: Application
Filed: July 12, 2020
Publication date: January 13, 2022
Inventors: Robert Gormally, Bradley Eck, Francesco Fusco, Mark Purcell, Seshu Tirupathi
-
Patent number: 11176148
Abstract: Embodiments for automated data exploration and validation by a processor. One or more optimal data flows are provided in response to a query for one or more heterogeneous data sources according to an inference model based on a knowledge graph a plurality of data flows between one or more heterogeneous data sources relating to the query. An analytical flow is provided for one or more of the plurality of data flows for those of the one or more heterogeneous data sources that are undetected, and two or more of the one or more of the plurality of data flows are aggregated or disaggregated for the one or more heterogeneous data sources that are nested within the knowledge graph. One or more criteria is received from a user via an interactive graphical user interface (GUI) to use for defining the one or more optimal data flows.
Type: Grant
Filed: August 9, 2019
Date of Patent: November 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Ulrike Fischer, Francesco Fusco, Pascal Pompey, Mathieu Sinn
-
Publication number: 20210216545
Abstract: Various embodiments are provided for intelligent management of data flows in a computing environment by a processor. One or more data transformation in time-series data applications templates may be created and managed according to concepts, one or more instances of the concepts, relationships between the concepts, and a mapping of the concepts to one or more data sources.
Type: Application
Filed: January 15, 2020
Publication date: July 15, 2021
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Francesco FUSCO, Robert Gormally, Mark PURCELL, Seshu Tirupathi
-
Patent number: 10782655
Abstract: A sensor data fusion system includes a processor coupled to a plurality of sensors. The system is initialized by providing access to a data store storing at least one time series of sensor data; a semantic store storing semantic data including system variables, and relations between the system variables; and a mapping therebetween. A registration of a set of one or more variables of interest for which appropriate data is not available is obtained. An initially empty inference model is extended with the set of variables, to obtain an extended model. A request to observe a given one of the set of variables at a given timestamp is obtained. Responsive thereto, time series data for the set of registered variables is retrieved. The extended model is run with the retrieved data to obtain an estimate of the given one of the variables at the given timestamp.
Type: Grant
Filed: June 16, 2018
Date of Patent: September 22, 2020
Assignee: International Business Machines Corporation
Inventors: Bradley Eck, Francesco Fusco, Seshu Tirupathi
-
Patent number: 10657299
Abstract: A system for posterior estimation of variables. Receiving a set of data inputs. Determining a first model of the water distribution network based on the set of data inputs. Determining a second model of the water distribution network based on the set of data inputs, and the first model.
Type: Grant
Filed: September 21, 2018
Date of Patent: May 19, 2020
Assignee: International Business Machines Corporation
Inventors: Francesco Fusco, Sergiy Zhuk