Patents by Inventor Marco Oliveira Pena Sampaio

Marco Oliveira Pena Sampaio has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model

Patent number: 11734612

Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.

Type: Grant

Filed: June 30, 2022

Date of Patent: August 22, 2023

Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira Da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro Dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
Automatic model monitoring for data streams

Patent number: 11729194

Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.

Type: Grant

Filed: June 10, 2022

Date of Patent: August 15, 2023

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
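
The monitoring process summarized in the abstract above can be illustrated with a minimal sketch. It is not the patented method. The window sizes, the choice of the two-sample Kolmogorov-Smirnov statistic as the similarity measure, and the names `ks_statistic` and `monitor` are all assumptions made for this example, since the abstract does not specify them. The reference window is taken to be the scores immediately preceding the target window, and both slide forward together.

```python
import bisect
from collections import deque


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def monitor(scores, ref_size=100, target_size=50):
    """Yield one monitoring value per score once both windows are full.

    Assumption: the reference window is the ref_size scores that come
    right before the target window of target_size scores.
    """
    window = deque(maxlen=ref_size + target_size)
    for score in scores:
        window.append(score)
        if len(window) == window.maxlen:
            items = list(window)
            yield ks_statistic(items[:ref_size], items[ref_size:])
```

A value near 0 means the recent scores look like the reference scores; a value near 1 means the two windows barely overlap, which in this sketch would be the signal to raise an alert.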
OBTAINING A GENERATED DATASET WITH A PREDETERMINED BIAS FOR EVALUATING ALGORITHMIC FAIRNESS OF A MACHINE LEARNING MODEL

Publication number: 20230074606

Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.

Type: Application

Filed: June 30, 2022

Publication date: March 9, 2023

Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
AUTOMATIC MODEL MONITORING FOR DATA STREAMS

Publication number: 20220382861

Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.

Type: Application

Filed: June 10, 2022

Publication date: December 1, 2022

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
Adaptive threshold estimation for streaming data

Patent number: 11477220

Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentiles distribution.

Type: Grant

Filed: October 29, 2019

Date of Patent: October 18, 2022

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
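
The steps listed in the abstract above (locate a bin, increment its count, suppress older counts, redistribute bin positions, read off a threshold) can be sketched roughly as follows. This is a simplified illustration, not the patented algorithm. The class name `StreamingPercentiles`, the exponential `decay` used as the suppression factor, and the assumption that items are spread uniformly within each bin are all hypothetical choices for this example.

```python
import bisect


class StreamingPercentiles:
    """Hypothetical sketch of percentile bins kept at equal counts."""

    def __init__(self, initial_edges, decay=0.99):
        # n sorted edges define n - 1 bins; start with one item per bin.
        self.edges = list(initial_edges)
        self.counts = [1.0] * (len(self.edges) - 1)
        self.decay = decay  # assumed suppression factor for older items

    def update(self, x):
        # Stretch the outer edges so the new item falls inside some bin.
        self.edges[0] = min(self.edges[0], x)
        self.edges[-1] = max(self.edges[-1], x)
        # Identify the bin that contains x.
        i = bisect.bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.counts) - 1)
        # Suppress older items, then count the new one.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[i] += 1.0
        self._redistribute()

    def _redistribute(self):
        # Move interior edges so that every bin holds the same count.
        n = len(self.counts)
        cum = [0.0]
        for c in self.counts:
            cum.append(cum[-1] + c)
        total = cum[-1]
        new_edges = [self.edges[0]]
        for k in range(1, n):
            new_edges.append(self._value_at(cum, k * total / n))
        new_edges.append(self.edges[-1])
        self.edges = new_edges
        self.counts = [total / n] * n

    def _value_at(self, cum, target):
        # Invert the piecewise-linear cumulative count curve at `target`.
        j = min(max(bisect.bisect_left(cum, target), 1), len(cum) - 1)
        lo, hi = cum[j - 1], cum[j]
        frac = 0.0 if hi == lo else (target - lo) / (hi - lo)
        return self.edges[j - 1] + frac * (self.edges[j] - self.edges[j - 1])

    def percentile(self, q):
        # With equal counts per bin, edge k sits at quantile k / n.
        n = len(self.counts)
        pos = q * n
        k = min(int(pos), n - 1)
        return self.edges[k] + (pos - k) * (self.edges[k + 1] - self.edges[k])
```

Under these assumptions a threshold could be read off after each update, for example `threshold = estimator.percentile(0.95)`. The estimate is approximate and depends on the chosen decay and number of bins.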
Automatic model monitoring for data streams

Patent number: 11451568

Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.

Type: Grant

Filed: October 29, 2019

Date of Patent: September 20, 2022

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
GENERATION OF DIVERGENCE DISTRIBUTIONS FOR AUTOMATED DATA ANALYSIS

Publication number: 20220222670

Abstract: A set of data elements is received. For each feature of a set of features, a corresponding reference distribution for the set of data elements is determined. For each feature of the set of features, one or more corresponding subset distributions for one or more subsets sampled from the set of data elements are determined. For each feature of the set of features, the corresponding reference distribution is compared with each of the one or more corresponding subset distributions to determine a corresponding distribution of divergences. At least the determined distributions of divergences for the set of features are provided for use in automated data analysis.

Type: Application

Filed: July 27, 2021

Publication date: July 14, 2022

Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
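
As a rough illustration of the abstract above, the sketch below builds a reference histogram per feature, samples subsets, and collects the divergence of each subset from the reference. The abstract does not say how subsets are sampled or which divergence is used. Random sampling without replacement, fixed-width histograms, the Jensen-Shannon divergence, and the function name `divergence_distributions` are assumptions made for this example.

```python
import math
import random


def histogram(values, lo, hi, n_bins=10):
    """Normalized fixed-width histogram of values over [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant columns
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (between 0 and 1)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergence_distributions(rows, features, n_subsets=100,
                             subset_size=50, seed=0):
    """Map each feature to a sorted list of subset-vs-reference divergences."""
    rng = random.Random(seed)
    result = {}
    for feature in features:
        column = [row[feature] for row in rows]
        lo, hi = min(column), max(column)
        reference = histogram(column, lo, hi)
        divergences = []
        for _ in range(n_subsets):
            subset = rng.sample(column, subset_size)
            divergences.append(
                js_divergence(reference, histogram(subset, lo, hi)))
        result[feature] = sorted(divergences)
    return result
```

The sorted lists give a per-feature picture of how much divergence is normal for a subset of this size, which a later analysis step could use as a baseline.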
AUTOMATED FEATURE MONITORING FOR DATA STREAMS

Publication number: 20220222167

Abstract: One or more events of a data stream are received. For each feature of a set of features, the one or more events are used to update a corresponding distribution of data from the data stream. For each feature of the set of features, the corresponding updated distribution and a corresponding reference distribution are used to determine a corresponding divergence value. For each feature of the set of features, the corresponding determined divergence value and a corresponding distribution of divergences are used to determine a corresponding statistical value. Using the statistical values each corresponding to a different feature of the set of features, a statistical analysis is performed to determine a result associated with a likelihood of data drift detection.

Type: Application

Filed: July 27, 2021

Publication date: July 14, 2022

Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
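
The last two steps of the abstract above, turning each feature's divergence into a statistical value and then combining those values across features, might look like the following sketch. It assumes the current divergence per feature and a reference distribution of divergences are already available. Using an empirical p-value as the statistical value and a Holm-Bonferroni correction as the statistical analysis are assumptions for this example, as are the function names.

```python
import bisect


def empirical_p_value(divergence, reference_divergences):
    """Share of reference divergences at least as large as the observed one."""
    ordered = sorted(reference_divergences)
    n_at_least = len(ordered) - bisect.bisect_left(ordered, divergence)
    # Add-one smoothing keeps the p-value strictly positive.
    return (n_at_least + 1) / (len(ordered) + 1)


def holm_bonferroni(p_values, alpha=0.05):
    """Return the names whose p-values are rejected at family-wise level alpha."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = set()
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # all remaining p-values are larger, so stop here
        rejected.add(name)
    return rejected


def detect_drift(current, reference, alpha=0.05):
    """current: feature -> divergence; reference: feature -> list of divergences."""
    p_values = {
        feature: empirical_p_value(current[feature], reference[feature])
        for feature in current
    }
    return p_values, holm_bonferroni(p_values, alpha)
```

In this sketch a non-empty set of rejected features is treated as evidence of data drift; the multiple-testing correction limits false alarms when many features are monitored at once.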
ACTIVE LEARNING ANNOTATION SYSTEM THAT DOES NOT REQUIRE HISTORICAL DATA

Publication number: 20210374614

Abstract: In various embodiments, a process for providing an active learning annotation system that does not require historical data includes receiving a stream of unlabeled data, identifying a portion of the unlabeled data to label without access to label information, and receiving a labeled version of the identified portion of the unlabeled data and storing the labeled version as labeled data. The process includes analyzing the labeled version and at least a portion of the received unlabeled data that has not been labeled to identify an additional portion of the unlabeled data to label and store in the labeled data including by applying at least one warm up policy.

Type: Application

Filed: May 26, 2021

Publication date: December 2, 2021

Inventors: Marco Oliveira Pena Sampaio, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ricardo Jorge Dias Barata, Miguel Lobo Pinto Leite, Ricardo Jorge da Graça Pacheco
ADAPTIVE THRESHOLD ESTIMATION FOR STREAMING DATA

Publication number: 20200366699

Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentiles distribution.

Type: Application

Filed: October 29, 2019

Publication date: November 19, 2020

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
EXPLANATION REPORTING BASED ON DIFFERENTIATION BETWEEN ITEMS IN DIFFERENT DATA GROUPS

Publication number: 20200364586

Abstract: In an embodiment, a process for explanation reporting based on differentiation between items in different data groups includes obtaining model scores from a first machine learning model and training a second machine learning model to learn how to differentiate between two groups based on at least one of: features and the model scores obtained from the first machine learning model. The process includes applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group, and based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records.

Type: Application

Filed: October 29, 2019

Publication date: November 19, 2020

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
AUTOMATIC MODEL MONITORING FOR DATA STREAMS

Publication number: 20200366698

Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.

Type: Application

Filed: October 29, 2019

Publication date: November 19, 2020

Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues