Patents by Inventor Marco Oliveira Pena Sampaio
Marco Oliveira Pena Sampaio has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11734612
Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.
Type: Grant
Filed: June 30, 2022
Date of Patent: August 22, 2023
Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira Da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro Dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
-
Patent number: 11729194
Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
Type: Grant
Filed: June 10, 2022
Date of Patent: August 15, 2023
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
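The moving-window comparison described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the fixed-size adjacent windows, the assumption that scores lie in [0, 1], and the choice of histogram intersection as the similarity measure are all assumptions made for illustration.

```python
from collections import deque


def histogram(scores, n_bins=10):
    """Normalized histogram of model scores, assumed to lie in [0, 1]."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return [c / len(scores) for c in counts]


def overlap(p, q):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def monitoring_values(scores, ref_size, target_size, n_bins=10):
    """Return one similarity value per step once both windows are full.

    The reference window holds the older ref_size scores and the target
    window holds the newest target_size scores; both slide with the stream.
    """
    window = deque(maxlen=ref_size + target_size)
    values = []
    for s in scores:
        window.append(s)
        if len(window) == window.maxlen:
            items = list(window)
            ref, target = items[:ref_size], items[ref_size:]
            values.append(overlap(histogram(ref, n_bins), histogram(target, n_bins)))
    return values


# Hypothetical stream: scores stay near 0.15, then jump to 0.85.
values = monitoring_values([0.15] * 20 + [0.85] * 10, ref_size=10, target_size=10)
```

In this sketch a value near 1.0 means the two windows hold similar score distributions, and a drop toward 0.0 suggests the scores have shifted.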
-
Publication number: 20230074606
Abstract: In various embodiments, a process for obtaining a generated dataset with a predetermined bias for evaluating algorithmic fairness of a machine learning model includes receiving an input dataset and generating an anonymized reconstructed dataset based at least on the input dataset. The process includes introducing a predetermined bias into the generated dataset, forming an evaluation dataset based at least on the generated dataset with the predetermined bias, and outputting the evaluation dataset. In various embodiments, a process for training a generative model includes configuring a generative model and receiving training data, where the training data includes a tabular dataset. The process includes using computer processor(s) and the received training data to train the generative model, where the generative model is sampled to generate a dataset with a predetermined bias.
Type: Application
Filed: June 30, 2022
Publication date: March 9, 2023
Inventors: Sérgio Gabriel Pontes Jesus, Duarte Miguel Rodrigues dos Santos Marques Alves, José Maria Pereira Rosa Correia Pombal, André Miguel Ferreira da Cruz, João António Sobral Leite Veiga, João Guilherme Simões Bravo Ferreira, Catarina Garcia Belém, Marco Oliveira Pena Sampaio, Pedro dos Santos Saleiro, Pedro Gustavo Santos Rodrigues Bizarro
-
Publication number: 20220382861
Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
Type: Application
Filed: June 10, 2022
Publication date: December 1, 2022
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
-
Patent number: 11477220
Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentiles distribution.
Type: Grant
Filed: October 29, 2019
Date of Patent: October 18, 2022
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
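The steps in this abstract (identify a bin, increment its count, suppress older counts, redistribute bin positions, read off a threshold) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the class name `PercentileBins`, the uniform pseudo-counts at start, the reading of the suppression factor as exponential decay by arrival order, and the linear interpolation inside each bin are all hypothetical choices.

```python
import bisect
import random


class PercentileBins:
    """Streaming percentile estimate with equal-count bins (illustrative only)."""

    def __init__(self, low, high, n_bins=10, decay=1.0):
        self.n = n_bins
        step = (high - low) / n_bins
        self.edges = [low + i * step for i in range(n_bins + 1)]  # initial positions
        self.counts = [1.0] * n_bins  # assumed uniform pseudo-counts
        self.decay = decay  # assumed suppression factor; 1.0 means no forgetting

    def update(self, x):
        # Stretch the outer edges so the new item always falls inside a bin.
        self.edges[0] = min(self.edges[0], x)
        self.edges[-1] = max(self.edges[-1], x)
        k = min(bisect.bisect_right(self.edges, x) - 1, self.n - 1)
        # Down-weight earlier items, then count the new one.
        self.counts = [c * self.decay for c in self.counts]
        self.counts[k] += 1.0
        self._redistribute()

    def _redistribute(self):
        # Move interior edges so every bin holds the same count, assuming
        # items are spread uniformly inside each current bin.
        target = sum(self.counts) / self.n
        new_edges = [self.edges[0]]
        cum, k = 0.0, 0
        for j in range(1, self.n):
            want = j * target
            while k < self.n - 1 and cum + self.counts[k] < want:
                cum += self.counts[k]
                k += 1
            frac = (want - cum) / self.counts[k] if self.counts[k] > 0 else 0.0
            frac = min(max(frac, 0.0), 1.0)
            new_edges.append(self.edges[k] + frac * (self.edges[k + 1] - self.edges[k]))
        new_edges.append(self.edges[-1])
        self.edges = new_edges
        self.counts = [target] * self.n

    def percentile(self, p):
        """Position below which roughly a fraction p of the stream falls."""
        pos = p * self.n
        i = min(int(pos), self.n - 1)
        return self.edges[i] + (pos - i) * (self.edges[i + 1] - self.edges[i])


# Hypothetical stream of uniform values in [0, 1).
rng = random.Random(0)
bins = PercentileBins(low=0.0, high=1.0, n_bins=10)
for _ in range(5000):
    bins.update(rng.random())
threshold = bins.percentile(0.9)  # e.g. alert on the top 10% of values
```

Because every bin is kept at the same count, the bin edges themselves approximate evenly spaced percentiles, so a threshold for a chosen percentile can be read by interpolating between edges.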
-
Patent number: 11451568
Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
Type: Grant
Filed: October 29, 2019
Date of Patent: September 20, 2022
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
-
Publication number: 20220222670
Abstract: A set of data elements is received. For each feature of a set of features, a corresponding reference distribution for the set of data elements is determined. For each feature of the set of features, one or more corresponding subset distributions for one or more subsets sampled from the set of data elements are determined. For each feature of the set of features, the corresponding reference distribution is compared with each of the one or more corresponding subset distributions to determine a corresponding distribution of divergences. At least the determined distributions of divergences for the set of features are provided for use in automated data analysis.
Type: Application
Filed: July 27, 2021
Publication date: July 14, 2022
Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
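One way to picture the per-feature "distribution of divergences" in this abstract is the sketch below. It is only an illustration: the abstract does not say how distributions are represented, how subsets are sampled, or which divergence is used, so the equal-width histograms, uniform random sampling, Jensen-Shannon divergence, and all function and parameter names here are assumptions.

```python
import math
import random


def histogram(values, lo, hi, n_bins):
    """Normalized equal-width histogram of values over [lo, hi]."""
    counts = [0] * n_bins
    width = hi - lo
    for v in values:
        idx = 0 if width == 0 else min(int((v - lo) / width * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (0 means identical)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergence_distributions(rows, features, n_subsets=50, subset_size=100,
                             n_bins=10, seed=0):
    """For each feature, compare the full-data histogram with sampled subsets."""
    rng = random.Random(seed)
    result = {}
    for f in features:
        column = [row[f] for row in rows]
        lo, hi = min(column), max(column)
        reference = histogram(column, lo, hi, n_bins)  # reference distribution
        divergences = []
        for _ in range(n_subsets):
            subset = rng.sample(column, subset_size)  # one sampled subset
            divergences.append(js_divergence(reference, histogram(subset, lo, hi, n_bins)))
        result[f] = sorted(divergences)
    return result


# Hypothetical tabular data with two numeric features.
rows = [{"amount": float(i % 37), "age": float(i % 11)} for i in range(500)]
out = divergence_distributions(rows, ["amount", "age"], n_subsets=20, subset_size=100)
```

The sorted list kept for each feature records how far a subset of that size typically strays from the reference, which later analysis can use as a baseline.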
-
Publication number: 20220222167
Abstract: One or more events of a data stream are received. For each feature of a set of features, the one or more events are used to update a corresponding distribution of data from the data stream. For each feature of the set of features, the corresponding updated distribution and a corresponding reference distribution are used to determine a corresponding divergence value. For each feature of the set of features, the corresponding determined divergence value and a corresponding distribution of divergences are used to determine a corresponding statistical value. Using the statistical values each corresponding to a different feature of the set of features, a statistical analysis is performed to determine a result associated with a likelihood of data drift detection.
Type: Application
Filed: July 27, 2021
Publication date: July 14, 2022
Inventors: Marco Oliveira Pena Sampaio, Pedro Cardoso Lessa e Silva, João Dias Conde Azevedo, Ricardo Miguel de Oliveira Moreira, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ana Sofia Leal Gomes, João Miguel Forte Oliveirinha
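The last two steps of this abstract (turning each feature's divergence into a statistical value, then combining the values across features) might look like the sketch below. The abstract does not name the statistic or the combining analysis, so the empirical p-value, the Bonferroni correction, and the function names are assumptions chosen only to make the idea concrete.

```python
def empirical_p_value(divergence, reference_divergences):
    """Share of baseline divergences at least as large as the observed one.

    The +1 terms keep the estimate strictly positive for a finite baseline.
    """
    n_ge = sum(1 for d in reference_divergences if d >= divergence)
    return (n_ge + 1) / (len(reference_divergences) + 1)


def drift_signal(observed, baselines):
    """Per-feature p-values plus a Bonferroni-corrected overall value.

    observed:  feature -> current divergence value
    baselines: feature -> list of baseline divergences for that feature
    A small overall value suggests that at least one feature has drifted.
    """
    p_values = {f: empirical_p_value(observed[f], baselines[f]) for f in observed}
    overall = min(1.0, len(p_values) * min(p_values.values()))
    return p_values, overall


# Hypothetical numbers: "amount" looks typical, "age" is far outside its baseline.
baselines = {"amount": [float(i) for i in range(1, 100)],
             "age": [float(i) for i in range(1, 100)]}
p_values, overall = drift_signal({"amount": 50.5, "age": 1000.0}, baselines)
```

Correcting for the number of features is one simple way to keep the false-alarm rate under control when many features are tested at once; other corrections would fit the same structure.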
-
Publication number: 20210374614
Abstract: In various embodiments, a process for providing an active learning annotation system that does not require historical data includes receiving a stream of unlabeled data, identifying a portion of the unlabeled data to label without access to label information, and receiving a labeled version of the identified portion of the unlabeled data and storing the labeled version as labeled data. The process includes analyzing the labeled version and at least a portion of the received unlabeled data that has not been labeled to identify an additional portion of the unlabeled data to label and store in the labeled data including by applying at least one warm up policy.
Type: Application
Filed: May 26, 2021
Publication date: December 2, 2021
Inventors: Marco Oliveira Pena Sampaio, João Tiago Barriga Negra Ascensão, Pedro Gustavo Santos Rodrigues Bizarro, Ricardo Jorge Dias Barata, Miguel Lobo Pinto Leite, Ricardo Jorge da Graça Pacheco
-
Publication number: 20200366699
Abstract: In an embodiment, a process for adaptive threshold estimation for streaming data includes determining initial positions for a set of percentile bins, receiving a new data item in a stream of data, and identifying one of the set of percentile bins corresponding to the new data item. The process includes incrementing a count of items in the identified percentile bin, adjusting one or more counts of data items in one or more of the percentile bins including by applying a suppression factor based on a relative ordering of items, and redistributing positions for the set of percentile bins to equalize respective count numbers of items for each percentile bin of the set of percentile bins. The process includes utilizing the redistributed positions of the set of percentile bins to determine a percentile distribution of the data stream, and calculating a threshold based at least in part on the percentiles distribution.
Type: Application
Filed: October 29, 2019
Publication date: November 19, 2020
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
-
Publication number: 20200364586
Abstract: In an embodiment, a process for explanation reporting based on differentiation between items in different data groups includes obtaining model scores from a first machine learning model and training a second machine learning model to learn how to differentiate between two groups based on at least one of: features and the model scores obtained from the first machine learning model. The process includes applying the second machine learning model to each data record in a first group of data records to determine a corresponding ranking score for each data record in the first group, and based on the corresponding ranking scores, determining a relative contribution of each of the data records in the first group to the differentiation between the first group of data records and a second group of data records.
Type: Application
Filed: October 29, 2019
Publication date: November 19, 2020
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues
-
Publication number: 20200366698
Abstract: In an embodiment, a process for automatic model monitoring for data streams includes receiving an input dataset, using a machine learning model to determine a model score for each data record of at least a portion of the input dataset, and determining monitoring values. Each monitoring value is associated with a measure of similarity between model scores for those data records of the input dataset within a corresponding moving reference window and model scores for those data records of the input dataset within a corresponding moving target window. The process includes outputting the determined monitoring values.
Type: Application
Filed: October 29, 2019
Publication date: November 19, 2020
Inventors: Marco Oliveira Pena Sampaio, Fábio Hernâni dos Santos Costa Pinto, Pedro Gustavo Santos Rodrigues Bizarro, Pedro Cardoso Lessa e Silva, Ana Margarida Caetano Ruela, Miguel Ramos de Araújo, Nuno Miguel Lourenço Diegues