Patents by Inventor Rishita Rajal Anubhai

Rishita Rajal Anubhai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multi-label document classification for documents from disjoint class sets

Patent number: 11741168

Abstract: Techniques for multi-label document classification are described. Clustering is used to cluster labels in a set. A machine learning model including a multi-label classifier for each cluster is created, the multi-label classifier for a given cluster to classify a document with one or more of the labels in the cluster.

Type: Grant

Filed: September 30, 2019

Date of Patent: August 29, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Sravan Babu Bodapati, Rishita Rajal Anubhai, Yahor Pushkin
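
The abstract above describes clustering the labels and then building one multi-label classifier per cluster. The sketch below is a rough illustration of that shape only, not the patented method. The clustering rule (labels that co-occur on any document end up in the same cluster), the token-overlap scorer standing in for a trained model, and every name in the code are assumptions made for the example.

```python
from collections import defaultdict


def cluster_labels(label_sets):
    """Assumed clustering rule: labels that co-occur on a document share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for labels in label_sets:
        ordered = sorted(labels)
        for label in ordered:
            find(label)  # register every label, even ones that appear alone
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    clusters = defaultdict(set)
    for label in parent:
        clusters[find(label)].add(label)
    return sorted(sorted(members) for members in clusters.values())


class ClusterClassifier:
    """Toy stand-in for a per-cluster multi-label model (token overlap, not ML)."""

    def __init__(self, labels):
        self.labels = labels
        self.tokens = {label: set() for label in labels}

    def fit(self, docs, label_sets):
        for text, labels in zip(docs, label_sets):
            for label in labels:
                if label in self.tokens:
                    self.tokens[label].update(text.lower().split())
        return self

    def predict(self, text, min_overlap=2):
        words = set(text.lower().split())
        return [l for l in self.labels if len(words & self.tokens[l]) >= min_overlap]


def train(docs, label_sets):
    # One classifier per label cluster, each restricted to its own labels.
    return [ClusterClassifier(c).fit(docs, label_sets) for c in cluster_labels(label_sets)]


def classify(models, text):
    # A document may receive labels from several clusters.
    return sorted(label for m in models for label in m.predict(text))


# Hypothetical training data drawn from two disjoint label sets.
docs = [
    "refund for late invoice payment",
    "invoice payment overdue notice",
    "server outage in database cluster",
    "database latency and outage report",
]
label_sets = [{"billing", "refund"}, {"billing"}, {"outage", "database"}, {"database"}]

models = train(docs, label_sets)
print(cluster_labels(label_sets))
print(classify(models, "payment overdue notice"))
```

In a real system each `ClusterClassifier` would presumably be replaced by a trained multi-label model; the point here is only that each model sees the labels of a single cluster.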

Creating text classification machine learning models

Patent number: 11734937

Abstract: Techniques for creating a text classifier machine learning (ML) model are described. According to some embodiments, a language processing service finetunes a language ML model on unlabeled documents of a user, and then trains that finetuned language ML model on labeled documents of the user to be a text classifier that is customized for that user’s domain, e.g., the user’s documents. Additionally, the finetuned language ML model may be trained on labeled documents of the user, for prediction objectives for unlabeled data, before being trained as the text classifier.

Type: Grant

Filed: January 2, 2020

Date of Patent: August 22, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Yahor Pushkin, Sravan Babu Bodapati, Rishita Rajal Anubhai, Dimitrios Soulios, Yaser Al-Onaizan

Data lake-based text generation and data augmentation for machine learning training

Patent number: 11657307

Abstract: Techniques for data lake-based text generation and data augmentation for machine learning training are described. A user-provided dataset including documents and corresponding label information can be automatically supplemented by creating additional high-quality document samples, with labels, via a large repository of documents in a data lake. Documents from the data lake may be identified as being semantically similar to the user-provided documents but different enough to allow a resulting model to learn from the variation in these documents. New documents can be generated from user-provided document samples or data lake sample documents by identifying and replacing slots within the samples and rewriting adjunct tokens.

Type: Grant

Filed: November 27, 2019

Date of Patent: May 23, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Sravan Babu Bodapati, Rishita Rajal Anubhai, Georgiana Dinu, Yaser Al-Onaizan

Extending sensitive data tagging without reannotating training data

Patent number: 11531846

Abstract: Techniques for extending sensitive data tagging without reannotating training data are described. A method for extending sensitive data tagging without reannotating training data may include hosting a plurality of models at a model endpoint in a machine learning service, each model trained to identify a different sensitive data type in a transcript of content, adding a new model to the model endpoint, the new model trained to identify a new sensitive data entity in the transcript of content, identifying sensitive entities in the transcript by each of the plurality of models and the new model, merging inference responses generated by each of the plurality of models and the new model using at least one inference policy, and returning a merged inference response identifying a plurality of sensitive entities in the transcript.

Type: Grant

Filed: September 30, 2019

Date of Patent: December 20, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Sravan Babu Bodapati, Rishita Rajal Anubhai, Pu Paul Zhao, Katrin Kirchhoff
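
The merging step in the abstract above (several per-type models plus a newly added model, combined by an inference policy) can be sketched as follows. The abstract does not say what the policy is, so the one used here is an assumption: discard low-confidence entities, then resolve overlapping spans by keeping the higher-scoring one. The `Entity` type, the `min_score` threshold, and the sample model outputs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int   # inclusive character offset in the transcript
    end: int     # exclusive character offset
    type: str
    score: float


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def merge_responses(responses, min_score=0.5):
    """Assumed policy: drop low scores, then the highest score wins on overlap."""
    candidates = [e for response in responses for e in response if e.score >= min_score]
    kept = []
    for entity in sorted(candidates, key=lambda e: -e.score):
        if not any(overlaps(entity, k) for k in kept):
            kept.append(entity)
    return sorted(kept, key=lambda e: e.start)


transcript = "Call John at 555-0100 or mail john@example.com"

# Hypothetical outputs of two existing models hosted at the endpoint.
name_model = [Entity(5, 9, "NAME", 0.95), Entity(30, 34, "NAME", 0.60)]
phone_model = [Entity(13, 21, "PHONE", 0.90), Entity(0, 4, "PHONE", 0.20)]
# Hypothetical output of a newly added model; the existing models are untouched.
email_model = [Entity(30, 46, "EMAIL", 0.98)]

merged = merge_responses([name_model, phone_model, email_model])
for e in merged:
    print(e.type, repr(transcript[e.start:e.end]))
```

Adding another sensitive data type in this sketch means appending one more response list; the earlier models and their training data are not modified, which is the property the title refers to.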

LIFECYCLE MANAGEMENT FOR CUSTOMIZED NATURAL LANGUAGE PROCESSING

Publication number: 20220100967

Abstract: Methods, systems, and computer-readable media for lifecycle management for customized natural language processing are disclosed. A natural language processing (NLP) customization service determines a task definition associated with an NLP model based (at least in part) on user input. The task definition comprises an indication of one or more tasks to be implemented using the NLP model and one or more requirements associated with use of the NLP model. The service determines the NLP model based (at least in part) on the task definition. The service trains the NLP model. The NLP model is used to perform inference for a plurality of input documents. The inference outputs a plurality of predictions based (at least in part) on the input documents. Inference data is collected based (at least in part) on the inference. The service generates a retrained NLP model based (at least in part) on the inference data.

Type: Application

Filed: September 30, 2020

Publication date: March 31, 2022

Applicant: Amazon Technologies, Inc.

Inventors: Yahor Pushkin, Rishita Rajal Anubhai, Sameer Karnik, Sunil Mallya Kasaragod, Abhinav Goyal, Yaser Al-Onaizan, Ashish Singh, Ashish Khare

EVENT EXTRACTION FROM DOCUMENTS WITH CO-REFERENCE

Publication number: 20220100963

Abstract: Methods, systems, and computer-readable media for event extraction from documents with co-reference are disclosed. An event extraction service identifies one or more trigger groups in a document comprising text. An individual one of the trigger groups comprises one or more textual references to an occurrence of an event. The one or more trigger groups are associated with one or more semantic roles for entities. The event extraction service identifies one or more entity groups in the document. An individual one of the entity groups comprises one or more textual references to a real-world object. The event extraction service assigns one or more of the entity groups to one or more of the semantic roles. The event extraction service generates an output indicating the one or more trigger groups and one or more entity groups assigned to the semantic roles.

Type: Application

Filed: September 30, 2020

Publication date: March 31, 2022

Applicant: Amazon Technologies, Inc.

Inventors: Rishita Rajal Anubhai, Yahor Pushkin, Graham Vintcent Horwood, Yinxiao Zhang, Ravindra Manjunatha, Jie Ma, Alessandra Brusadin, Jonathan Steuck, Shuai Wang, Sameer Karnik, Miguel Ballesteros Martinez, Sunil Mallya Kasaragod, Yaser Al-Onaizan

Text de-obfuscation with image recognition of text

Patent number: 11227009

Abstract: Techniques are described for a de-obfuscation framework that utilizes image recognition of text. A word input by a user is received by the de-obfuscation service. Visual feature data associated with an image corresponding to each character of the word is generated. Word embeddings are generated using the visual feature data and each character of the word using a character encoder layer. Feature vectors are generated from the word embedding by combining the generated word embeddings and a provided word embedding using a second neural network. The generated feature vector is classified. Potential text obfuscation is detected from the classified generated feature vector using a lexicon to determine de-obfuscated text closest to the user text.

Type: Grant

Filed: September 30, 2019

Date of Patent: January 18, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Rishita Rajal Anubhai, Sravan Babu Bodapati
