Patents by Inventor Marco Oliveira Pena Sampaio

Marco Oliveira Pena Sampaio has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11734612
    Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: August 22, 2023
    Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira Da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro Dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
  • Patent number: 11729194
    Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
    Type: Grant
    Filed: June 10, 2022
    Date of Patent: August 15, 2023
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Publication number: 20230074606
    Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.
    Type: Application
    Filed: June 30, 2022
    Publication date: March 9, 2023
    Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
  • Publication number: 20220382861
    Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
    Type: Application
    Filed: June 10, 2022
    Publication date: December 1, 2022
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Patent number: 11477220
    Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentile distribution.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: October 18, 2022
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Patent number: 11451568
    Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
    Type: Grant
    Filed: October 29, 2019
    Date of Patent: September 20, 2022
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Publication number: 20220222670
    Abstract: A set of data elements is received. For each feature of a set of features, a corresponding reference distribution for the set of data elements is determined. For each feature of the set of features, one or more corresponding subset distributions for one or more subsets sampled from the set of data elements are determined. For each feature of the set of features, the corresponding reference distribution is compared with each of the one or more corresponding subset distributions to determine a corresponding distribution of divergences. At least the determined distributions of divergences for the set of features are provided for use in automated data analysis.
    Type: Application
    Filed: July 27, 2021
    Publication date: July 14, 2022
    Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
  • Publication number: 20220222167
    Abstract: One or more events of a data stream are received. For each feature of a set of features, the one or more events are used to update a corresponding distribution of data from the data stream. For each feature of the set of features, the corresponding updated distribution and a corresponding reference distribution are used to determine a corresponding divergence value. For each feature of the set of features, the corresponding determined divergence value and a corresponding distribution of divergences are used to determine a corresponding statistical value. Using the statistical values each corresponding to a different feature of the set of features, a statistical analysis is performed to determine a result associated with a likelihood of data drift detection.
    Type: Application
    Filed: July 27, 2021
    Publication date: July 14, 2022
    Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
  • Publication number: 20210374614
    Abstract: In various embodiments, a process for providing an active learning annotation system that does not require historical data includes receiving a stream of unlabeled data, identifying a portion of the unlabeled data to label without access to label information, and receiving a labeled version of the identified portion of the unlabeled data and storing the labeled version as labeled data. The process includes analyzing the labeled version and at least a portion of the received unlabeled data that has not been labeled to identify an additional portion of the unlabeled data to label and store in the labeled data including by applying at least one warm up policy.
    Type: Application
    Filed: May 26, 2021
    Publication date: December 2, 2021
    Inventors: Marco Oliveira Pena Sampaio, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ricardo Jorge Dias Barata, Miguel Lobo Pinto Leite, Ricardo Jorge da Graça Pacheco
  • Publication number: 20200366699
    Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentile distribution.
    Type: Application
    Filed: October 29, 2019
    Publication date: November 19, 2020
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Publication number: 20200364586
    Abstract: In an embodiment, a process for explanation reporting based on differentiation between items in different data groups includes obtaining model scores from a first machine learning model and training a second machine learning model to learn how to differentiate between two groups based on at least one of: features and the model scores obtained from the first machine learning model. The process includes applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group, and based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records.
    Type: Application
    Filed: October 29, 2019
    Publication date: November 19, 2020
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
  • Publication number: 20200366698
    Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
    Type: Application
    Filed: October 29, 2019
    Publication date: November 19, 2020
    Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
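
The automatic model monitoring abstract above describes comparing model scores in a moving reference window with model scores in a moving target window and outputting a monitoring value for each comparison. The Python sketch below is only an illustration of that general idea, not the patented method. Everything specific in it is an assumption made for the example: the function names (`histogram`, `js_divergence`, `monitor`), the window sizes, the number of bins, scores lying in [0, 1], and the choice of Jensen-Shannon divergence as the measure of similarity.

```python
import math
from collections import deque


def histogram(scores, n_bins=10):
    """Normalized histogram of scores assumed to lie in [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        counts[idx] += 1
    total = len(scores)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2: 0 for identical, 1 for disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def monitor(scores, ref_size=200, target_size=50, n_bins=10):
    """Return one monitoring value per score once both windows are full.

    The target window holds the most recent `target_size` scores; the
    reference window holds the `ref_size` scores immediately before it.
    Both windows slide forward by one position with each new score.
    """
    reference = deque(maxlen=ref_size)
    target = deque(maxlen=target_size)
    values = []
    for s in scores:
        if len(target) == target_size:
            # The oldest target score is about to leave the target window,
            # so it becomes the newest reference score.
            reference.append(target[0])
        target.append(s)
        if len(reference) == ref_size:
            values.append(
                js_divergence(histogram(reference, n_bins),
                              histogram(target, n_bins))
            )
    return values


if __name__ == "__main__":
    # Hypothetical score stream: a stable period followed by a shifted one.
    stable = [0.05 + 0.02 * (i % 10) for i in range(300)]
    shifted = [0.75 + 0.02 * (i % 10) for i in range(100)]
    values = monitor(stable + shifted)
    print(f"first={values[0]:.3f} max={max(values):.3f} last={values[-1]:.3f}")
```

In this sketch the monitoring value stays near zero while both windows cover similar scores and rises as the target window fills with shifted scores. A real deployment would also need a rule for deciding when a value is high enough to raise an alarm, which the sketch leaves out.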