Patents by Inventor Swetasudha Panda
Swetasudha Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11948102
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
Type: Grant
Filed: August 12, 2022
Date of Patent: April 2, 2024
Assignee: Oracle International Corporation
Inventors: Jean-Baptiste Frederic George Tristan, Michael Louis Wick, Swetasudha Panda
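The abstract does not give the exact form of the Bayes factors, but the core idea of a Bayesian test of demographic parity can be sketched as comparing a "fair" null model (each top-ranked slot equally likely to hold either group) against a "biased" alternative that integrates over an unknown group rate. The function names, the Beta(1,1) prior, and the evidence threshold below are illustrative assumptions, not the patented method:

```python
from math import comb

def bayes_factor_parity(k, n):
    # Bayes factor for k protected-group items among n top-ranked slots:
    # the "biased" alternative integrates the group rate p over a uniform
    # Beta(1,1) prior (marginal likelihood 1/(n+1)); the "fair" null
    # fixes p = 0.5.
    m_biased = 1.0 / (n + 1)
    m_fair = comb(n, k) * 0.5 ** n
    return m_biased / m_fair

def flag_bias(group_labels, top_k, threshold=3.0):
    # group_labels[i] is 1 if the item at rank i belongs to the
    # protected group; flag the ranking when the evidence for bias in
    # the top-k positions exceeds the threshold.
    k = sum(group_labels[:top_k])
    return bayes_factor_parity(k, top_k) > threshold
```

A heavily skewed head of the ranking (e.g. eight of eight top slots from one group) yields a large Bayes factor and is flagged, while a balanced head is not.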
-
Patent number: 11921687
Abstract: A first set and a second set are identified as operands for a set operation of a similarity analysis task iteration. Using respective minimum hash information arrays and contributor count arrays of the two sets, a minimum hash information array and contributor count array of a derived set resulting from the set operation are generated. An entry in the contributor count array of the derived set indicates the number of child sets of the derived set that meet a criterion with respect to a corresponding entry in the minimum hash information array of the derived set. The generated minimum hash information array and the contributor count array are stored as part of input for a subsequent iteration. After a termination criterion of the task is met, output of the task is stored.Type: Grant
Filed: June 10, 2019
Date of Patent: March 5, 2024
Assignee: Oracle International Corporation
Inventors: Michael Louis Wick, Jean-Baptiste Frederic George Tristan, Swetasudha Panda
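One way to read the abstract: for a set union, the derived minimum-hash array is the elementwise minimum of the children's arrays, and each contributor count records how many child sets attain that minimum (the "criterion"). The sketch below is an assumption-laden illustration of that reading; the function names and the Jaccard use of the counts are not taken from the patent:

```python
def combine_minhash(a_mins, a_counts, b_mins, b_counts):
    # Derive the minimum-hash array of the union of two sets: each entry
    # is the elementwise minimum, and the contributor count records how
    # many child sets attain that minimum.
    mins, counts = [], []
    for am, ac, bm, bc in zip(a_mins, a_counts, b_mins, b_counts):
        if am < bm:
            mins.append(am)
            counts.append(ac)
        elif bm < am:
            mins.append(bm)
            counts.append(bc)
        else:  # tie: both children contribute the minimum
            mins.append(am)
            counts.append(ac + bc)
    return mins, counts

def jaccard_estimate(counts, n_children):
    # The fraction of hash slots where every child attains the union's
    # minimum estimates the Jaccard similarity of the children.
    return sum(c == n_children for c in counts) / len(counts)
```

Leaf sets would start with all counts equal to 1, and the combined arrays feed directly into the next iteration, matching the abstract's iterative structure.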
-
Publication number: 20230409969
Abstract: Bias in a language model generated through fine-tuning of a pre-trained language model may be mitigated, whether the bias is incorporated in the pre-trained language model or in the fine-tuning data. A pre-trained language model may be fine-tuned using downstream training data. Prior to tuning, elements within the downstream data may be identified that either match or serve as proxies for one or more identity elements associated with training bias sensitivity. Proxy elements may be identified using an analysis of distributions of the downstream elements and distributions of identity elements. Once the elements are identified, instances of the identified elements may be replaced in the downstream data with one or more masking elements to generate masked downstream data. A fine-tuned language model with reduced bias may then be generated from the pre-trained language model by tuning the pre-trained language model using the masked downstream data.
Type: Application
Filed: February 28, 2023
Publication date: December 21, 2023
Inventors: Swetasudha Panda, Ariel Kobren, Michael Louis Wick, Qinlan Shen
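The masking step itself is straightforward to sketch. The identity-term list, the `[MASK]` token, and the function name below are illustrative assumptions; the patent's proxy detection via distribution analysis is omitted:

```python
IDENTITY_TERMS = {"he", "she", "his", "her", "him", "hers"}  # illustrative list

def mask_identity_elements(tokens, terms=IDENTITY_TERMS, mask_token="[MASK]"):
    # Replace any token matching an identity element with a masking
    # element, producing the masked downstream data used for fine-tuning.
    return [mask_token if t.lower() in terms else t for t in tokens]
```

Fine-tuning on the masked data then prevents the model from associating the downstream labels with the identity elements themselves.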
-
Publication number: 20230401285
Abstract: Techniques are disclosed for augmenting data sets used for training machine learning models and for generating predictions by trained machine learning models. The techniques generate synthesized data from sample data and train a machine learning model using the synthesized data to augment a sample data set. Embodiments selectively partition the sample data set and synthesized data into training data and validation data, which are used to generate and select machine learning models.
Type: Application
Filed: September 6, 2022
Publication date: December 14, 2023
Applicant: Oracle International Corporation
Inventors: Ariel Gedaliah Kobren, Swetasudha Panda, Michael Louis Wick, Qinlan Shen, Jason Anthony Peck
-
Publication number: 20230401286
Abstract: Techniques are disclosed for augmenting data sets used for training machine learning models and for generating predictions by trained machine learning models. These techniques may increase a number and diversity of examples within an initial training dataset of sentences by extracting a subset of words from the existing training dataset of sentences. The techniques may conserve scarce sample data in few-shot situations by training a data generation model using general data obtained from a general data source.
Type: Application
Filed: September 6, 2022
Publication date: December 14, 2023
Applicant: Oracle International Corporation
Inventors: Ariel Gedaliah Kobren, Swetasudha Panda, Michael Louis Wick, Qinlan Shen, Jason Anthony Peck
-
Publication number: 20230394371
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
Type: Application
Filed: August 22, 2023
Publication date: December 7, 2023
Inventors: Michael Louis Wick, Swetasudha Panda, Jean-Baptiste Frederic George Tristan
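The three controls named in the abstract (label bias, feature rarity, selection bias) can be illustrated with a toy generator. Every distribution, threshold, and default below is an assumption made for illustration; the patent does not specify these:

```python
import random

def synth_point(rng, rare_rate=0.2, label_bias=0.1, selection_bias=0.3):
    # Draw one labeled point under the three controls: rarity of a
    # feature, biased labeling, and biased selection. Returns None when
    # the point is dropped by selection.
    rare = rng.random() < rare_rate            # rare-feature indicator
    x = rng.gauss(1.0 if rare else 0.0, 1.0)   # one observed feature
    truth = x > 0.5                            # unobserved ground truth
    label = truth
    if rare and truth and rng.random() < label_bias:
        label = False                          # label bias hits the rare group
    if rare and rng.random() < selection_bias:
        return None                            # selection bias drops rare points
    return {"x": x, "rare": rare, "label": label}

def synth_dataset(n, seed=0, **controls):
    rng = random.Random(seed)
    points = (synth_point(rng, **controls) for _ in range(n))
    return [p for p in points if p is not None]
```

Sweeping the three parameters yields data sets with known, controllable bias, against which a classifier's fairness can be evaluated.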
-
Publication number: 20230368015
Abstract: Techniques are described herein for training and applying machine learning models. The techniques include implementing an entropy-based loss function for training high-capacity machine learning models, such as deep neural networks, with anti-modeling. The entropy-based loss function may cause the model to have high entropy on negative data, helping prevent the model from becoming confidently wrong about the negative data while reducing the likelihood of generalizing from disfavored signals.
Type: Application
Filed: September 8, 2022
Publication date: November 16, 2023
Applicant: Oracle International Corporation
Inventors: Michael Louis Wick, Ariel Gedaliah Kobren, Swetasudha Panda
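A minimal sketch of such a loss, assuming a classification setting: standard cross-entropy on positive data plus a term that is zero when the model's output is uniform (maximum entropy) on negative data and grows as the model becomes confident on it. The exact loss in the patent is not specified here; the formulation and names are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def anti_modeling_loss(pos_logits, pos_labels, neg_logits, alpha=1.0):
    # Standard cross-entropy on positive examples...
    p = softmax(pos_logits)
    ce = -np.log(p[np.arange(len(pos_labels)), pos_labels]).mean()
    # ...plus a penalty log(K) - H(q) that is zero when the model is
    # maximally uncertain on negative data and grows with confidence.
    q = softmax(neg_logits)
    entropy = -(q * np.log(q + 1e-12)).sum(axis=-1).mean()
    return ce + alpha * (np.log(q.shape[-1]) - entropy)
```

Minimizing this keeps predictions sharp on positives while discouraging the model from committing to any class on negatives, i.e. from becoming confidently wrong.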
-
Patent number: 11775863
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
Type: Grant
Filed: February 4, 2020
Date of Patent: October 3, 2023
Assignee: Oracle International Corporation
Inventors: Michael Louis Wick, Swetasudha Panda, Jean-Baptiste Frederic George Tristan
-
Publication number: 20230047092
Abstract: User-level privacy preservation is implemented within federated machine learning. An aggregation server may distribute a machine learning model to multiple users each including respective private datasets. Individual users may train the model using the local, private dataset to generate one or more parameter updates. Prior to sending the generated parameter updates to the aggregation server for incorporation into the machine learning model, a user may modify the parameter updates by applying respective noise values to individual ones of the parameter updates to ensure differential privacy for the dataset private to the user. The aggregation server may then receive the respective modified parameter updates from the multiple users and aggregate the updates into a single set of parameter updates to update the machine learning model. The federated machine learning may further include iteratively performing said sending, training, modifying, receiving, aggregating and updating steps.
Type: Application
Filed: May 11, 2022
Publication date: February 16, 2023
Inventors: Virendra Marathe, Pallika Haridas Kanani, Daniel Peterson, Swetasudha Panda
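The client-side noising step is typically built from clipping plus Gaussian noise. The sketch below assumes that standard construction and omits formal privacy accounting; the function names and parameter defaults are illustrative, not from the application:

```python
import random

def privatize_update(update, clip_norm=1.0, noise_scale=0.5, rng=random):
    # Bound each user's influence by clipping the update's L2 norm, then
    # add Gaussian noise calibrated to the clip norm (Gaussian-mechanism
    # style; formal (epsilon, delta) accounting is omitted here).
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [u * scale + rng.gauss(0.0, noise_scale * clip_norm) for u in update]

def aggregate(updates):
    # Server-side step: average the already-privatized client updates
    # into a single set of parameter updates.
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]
```

Because noise is added before the update leaves the client, the server never observes an individual user's raw parameter updates.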
-
Publication number: 20230032208
Abstract: Techniques are disclosed for augmenting data sets used for training machine learning models and for generating predictions by trained machine learning models. These techniques may increase a number (and diversity) of examples within an initial training dataset of sentences by extracting a subset of words from the existing training dataset of sentences. The extracted subset includes no stopwords and fewer content words than found in the initial training dataset. The remaining words may be re-ordered. Using the extracted and re-ordered subset of words, the dataset generation model produces a second set of sentences that are different from the first set. The second set of sentences may be used to increase a number of examples in classes with few examples.
Type: Application
Filed: July 30, 2021
Publication date: February 2, 2023
Applicant: Oracle International Corporation
Inventors: Ariel Gedaliah Kobren, Naveen Jafer Nizar, Michael Louis Wick, Swetasudha Panda
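The extraction step described above (no stopwords, fewer content words, re-ordered) can be sketched directly; the stopword list, keep ratio, and function name are illustrative assumptions, and the downstream generation model is not shown:

```python
import random

STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on"}  # abbreviated

def extract_seed_words(sentence, keep_ratio=0.6, rng=random):
    # Drop all stopwords, keep only a fraction of the content words, and
    # re-order them; a generation model would then expand these seed
    # words into a new sentence for the augmented training set.
    content = [w for w in sentence.lower().split() if w not in STOPWORDS]
    k = max(1, int(len(content) * keep_ratio))
    kept = rng.sample(content, k)
    rng.shuffle(kept)
    return kept
```

Running this repeatedly over the sentences of an underrepresented class yields many distinct seed-word sets, and hence many distinct generated examples.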
-
Publication number: 20220382768
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
Type: Application
Filed: August 12, 2022
Publication date: December 1, 2022
Inventors: Jean-Baptiste Frederic George Tristan, Michael Louis Wick, Swetasudha Panda
-
Patent number: 11416500
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
Type: Grant
Filed: February 4, 2020
Date of Patent: August 16, 2022
Assignee: Oracle International Corporation
Inventors: Jean-Baptiste Frederic George Tristan, Michael Louis Wick, Swetasudha Panda
-
Publication number: 20220245339
Abstract: Debiasing pre-trained sentence encoders with probabilistic dropouts may be performed by various systems, services, or applications. A sentence may be received, where the words of the sentence may be provided as tokens to an encoder of a machine learning model. A token-wise correlation using semantic orientation may be computed to determine a bias score for the tokens in the input sentence. A probability of dropout for tokens in the input sentence may be determined from the bias scores. The machine learning model may be trained or tuned based on the probabilities of dropout for the tokens in the input sentence.
Type: Application
Filed: January 31, 2022
Publication date: August 4, 2022
Inventors: Swetasudha Panda, Ariel Kobren, Michael Louis Wick, Stephen Green
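The mapping from bias scores to dropout is easy to sketch once the scores exist. How the publication actually computes semantic-orientation bias scores is not reproduced here; the linear scaling, `[MASK]` token, and function names are illustrative assumptions:

```python
import random

def dropout_probs(bias_scores, max_p=0.5):
    # Scale per-token bias scores into dropout probabilities, with the
    # most biased token dropped with probability max_p.
    top = max(bias_scores)
    if top == 0:
        return [0.0] * len(bias_scores)
    return [max_p * s / top for s in bias_scores]

def apply_token_dropout(tokens, probs, mask="[MASK]", rng=random):
    # During training, biased tokens are stochastically replaced so the
    # encoder cannot rely on them.
    return [mask if rng.random() < p else t for t, p in zip(tokens, probs)]
```

Unbiased tokens pass through untouched, while strongly oriented tokens are frequently hidden from the encoder during training or tuning.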
-
Publication number: 20220050848
Abstract: Online post-processing may be performed for rankings generated with constrained utility maximization. A stream of data items may be received. A batch of data items from the stream may be ranked according to a ranking model trained to rank data items in a descending order of relevance. The batch of data items may be associated with a current time step. A re-ranking model may be applied to generate a re-ranking of the batch of data items according to a re-ranking policy that considers the current batch and previous batches with regard to a ranking constraint. The re-ranked items may then be sent to an application.
Type: Application
Filed: July 6, 2021
Publication date: February 17, 2022
Inventors: Swetasudha Panda, Ariel Kobren, Jean-Baptiste Frederic George Tristan, Michael Louis Wick
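One simple policy matching this description: within each batch, emit items by score, but when the constraint (here, a minimum running share for a protected group across all batches so far) is violated, promote the best-scoring protected item. This greedy policy, the group encoding, and the share constraint are all illustrative assumptions, not the publication's re-ranking model:

```python
def rerank_batch(batch, running_counts, target_share=0.5):
    # Greedy re-ranking: emit items in descending score order, but
    # whenever the protected group's running share (across all batches
    # so far) drops below the constraint, promote the best-scoring
    # protected item instead.
    pool = sorted(batch, key=lambda item: -item["score"])
    counts = dict(running_counts)
    out = []
    while pool:
        total = counts["prot"] + counts["other"]
        behind = total > 0 and counts["prot"] / total < target_share
        pick = pool[0]
        if behind:
            pick = next((it for it in pool if it["group"] == "prot"), pool[0])
        pool.remove(pick)
        out.append(pick)
        counts["prot" if pick["group"] == "prot" else "other"] += 1
    return out, counts
```

The returned counts carry the constraint state forward to the next batch in the stream, which is what makes the post-processing "online".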
-
Publication number: 20200387743
Abstract: A first set and a second set are identified as operands for a set operation of a similarity analysis task iteration. Using respective minimum hash information arrays and contributor count arrays of the two sets, a minimum hash information array and contributor count array of a derived set resulting from the set operation are generated. An entry in the contributor count array of the derived set indicates the number of child sets of the derived set that meet a criterion with respect to a corresponding entry in the minimum hash information array of the derived set. The generated minimum hash information array and the contributor count array are stored as part of input for a subsequent iteration. After a termination criterion of the task is met, output of the task is stored.
Type: Application
Filed: June 10, 2019
Publication date: December 10, 2020
Inventors: Michael Louis Wick, Jean-Baptiste Frederic George Tristan, Swetasudha Panda
-
Publication number: 20200372035
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
Type: Application
Filed: February 4, 2020
Publication date: November 26, 2020
Inventors: Jean-Baptiste Frederic George Tristan, Michael Louis Wick, Swetasudha Panda
-
Publication number: 20200372406
Abstract: Fairness of a trained classifier may be ensured by generating a data set for training, the data set generated using input data points of a feature space including multiple dimensions and according to different parameters including an amount of label bias, a control for discrepancy between rarity of features, and an amount of selection bias. Unlabeled data points of the input data comprising unobserved ground truths are labeled according to the amount of label bias and the input data sampled according to the amount of selection bias and the control for the discrepancy between the rarity of features. The classifier is then trained using the sampled and labeled data points as well as additional unlabeled data points. The trained classifier is then usable to determine unbiased classifications of one or more labels for one or more other data sets.
Type: Application
Filed: February 4, 2020
Publication date: November 26, 2020
Inventors: Michael Louis Wick, Swetasudha Panda, Jean-Baptiste Frederic George Tristan
-
Publication number: 20200372290
Abstract: A Bayesian test of demographic parity for learning to rank may be applied to determine ranking modifications. A fairness control system receiving a ranking of items may apply Bayes factors to determine a likelihood of bias for the ranking. These Bayes factors may include a factor for determining bias in each item and a factor for determining bias in the ranking of the items. An indicator of bias may be generated using the applied Bayes factors and the fairness control system may modify the ranking if the determined likelihood of bias satisfies modification criteria for the ranking.
Type: Application
Filed: February 4, 2020
Publication date: November 26, 2020
Inventors: Jean-Baptiste Frederic George Tristan, Pallika Haridas Kanani, Michael Louis Wick, Swetasudha Panda, Haniyeh Mahmoudian