Patents by Inventor Saloni Potdar

Saloni Potdar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Feature reweighting in text classifier generation using unlabeled data

Patent number: 11216619

Abstract: A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
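
The steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the smoothed idf formula, the `min_idf` threshold, the scaling of idf values into weights, and the choice to give filtered terms a weight of zero are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def normalized_tf(doc):
    """Term frequency of each term divided by the total number of terms."""
    counts = Counter(doc)
    return {term: count / len(doc) for term, count in counts.items()}


def idf_feature_weights(unlabeled_docs, min_idf=1.2):
    """Derive per-term feature weights from tokenized unlabeled documents."""
    n_docs = len(unlabeled_docs)
    doc_freq = Counter()
    for doc in unlabeled_docs:
        doc_freq.update(set(doc))  # count each term once per document
    # Smoothed idf (assumed formula; the abstract does not specify one).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    # Filter step: drop terms whose idf falls below the assumed threshold.
    kept = {t: v for t, v in idf.items() if v >= min_idf}
    # Transform step: scale the remaining idf values into (0, 1].
    top = max(kept.values())
    return {t: v / top for t, v in kept.items()}


def reweight(labeled_doc, weights, default=0.0):
    """Re-weight a labeled document's normalized term frequencies."""
    tf = normalized_tf(labeled_doc)
    return {t: f * weights.get(t, default) for t, f in tf.items()}


unlabeled = [
    ["reset", "my", "password"],
    ["my", "account", "is", "locked"],
    ["change", "my", "email"],
]
weights = idf_feature_weights(unlabeled)
features = reweight(["reset", "my", "email"], weights)
```

In this toy corpus "my" appears in every document, so it is filtered out and contributes nothing to the re-weighted features.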

Type: Grant

Filed: April 28, 2020

Date of Patent: January 4, 2022

Assignee: International Business Machines Corporation

Inventors: Yang Yu, Haode Qi, Haoyu Wang, Ming Tan, Navneet N. Rao, Saloni Potdar, Robert Leslie Yates

Adversarial training data augmentation for generating related responses

Patent number: 11189269

Abstract: An intelligent computer platform to introduce adversarial training to natural language processing (NLP). An initial training set is modified with synthetic training data to create an adversarial training set. The modification includes use of natural language understanding (NLU) to parse the initial training set into components and identify component categories. As input is presented, a classifier evaluates the input and leverages the adversarial training set to identify the intent of the input. An identified classification model generates accurate and reflective response data based on the received input.

Type: Grant

Filed: January 15, 2019

Date of Patent: November 30, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Ruijian Wang, Inkit Padhi, Saloni Potdar

Feature Reweighting in Text Classifier Generation Using Unlabeled Data

Publication number: 20210334468

Abstract: A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.

Type: Application

Filed: April 28, 2020

Publication date: October 28, 2021

Inventors: Yang Yu, Haode Qi, Haoyu Wang, Ming Tan, Navneet N. Rao, Saloni Potdar, Robert Leslie Yates

Suggestion of New Entity Types with Discriminative Term Importance Analysis

Publication number: 20210319182

Abstract: A mechanism is provided to implement suggestion of new entity types with discriminative importance analysis. The mechanism obtains a list of predefined intents from a chatbot designer. The mechanism receives an input sentence having a target intent within the list of predefined intents. The mechanism performs intent-specific importance analysis on the input sentence to generate an importance score for each token in the input sentence. The mechanism ranks the tokens in the input sentence by importance score and outputs a token with a highest importance score as a candidate entity type.

Type: Application

Filed: April 8, 2020

Publication date: October 14, 2021

Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Saloni Potdar, Haoyu Wang

Mechanisms for Continuous Improvement of Automated Machine Learning

Publication number: 20210304055

Abstract: Mechanisms are provided for optimizing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial default value and initial range for sampling of a parameter of the machine learning (ML) model and an initial AutoML process is executed on the ML model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic. For each domain, a cross-dataset default value and cross-dataset value range are derived from results of the execution of the initial AutoML process. For each domain, an entry is stored in a data structure, the entry storing the derived cross-dataset default value and cross-dataset value range for the domain. The AutoML logic performs a subsequent AutoML process on a new dataset based on one or more entries of the data structure.

Type: Application

Filed: March 25, 2020

Publication date: September 30, 2021

Inventors: Haode Qi, Ming Tan, Ladislav Kunc, Saloni Potdar

Learning Parameter Sampling Configuration for Automated Machine Learning

Publication number: 20210304056

Abstract: Mechanisms are provided for performing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial parameter sampling configuration for sampling values of parameter(s) of the machine learning (ML) model. An initial AutoML process is executed on the ML model based on a dataset utilizing the initially configured AutoML logic, to generate at least one learned value for the parameter(s) of the ML model. The dataset is analyzed to extract a set of dataset characteristics that define properties of a format and/or a content of the dataset which are stored in association with the at least one learned value as part of a training dataset. A ML prediction model is trained based on the training dataset to predict, for new datasets, corresponding new sampling configuration information based on characteristics of the new datasets.

Type: Application

Filed: March 25, 2020

Publication date: September 30, 2021

Inventors: Haode Qi, Ming Tan, Ladislav Kunc, Saloni Potdar

Intent Boundary Segmentation for Multi-Intent Utterances

Publication number: 20210287667

Abstract: A mechanism is provided for implementing an intent segmentation mechanism that segments intent boundaries for multi-intent utterances in a conversational agent. For each term of a set of terms in the utterance from a real-time chat session, a set of adversarial utterances is generated for the utterance. An influence of changing each term is determined so as to identify a term importance value. Utilizing the term importance value, one or more of a change in ranking of the intent of the utterance or a change in confidence with regard to the intent of the utterance is identified. An entropy-based segmentation of the utterance into a plurality of candidate partitions is performed. An associated intent and entropy value are then assigned. Based on a segment with minimum entropy, a call associated with the real-time chat session is directed to an operation associated with an intent of the segment with minimum entropy.
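
The final selection step, choosing the segment with minimum entropy, can be sketched as follows. This is an illustrative sketch only: it assumes each candidate segment already has an intent probability distribution from some classifier, uses Shannon entropy in bits, and the function names and example intents are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pick_segment(candidates):
    """Return (segment, top intent, entropy) for the lowest-entropy segment.

    `candidates` maps each candidate segment to an assumed intent
    distribution, e.g. the output of an intent classifier.
    """
    best = None
    for text, dist in candidates.items():
        h = entropy(dist.values())
        intent = max(dist, key=dist.get)
        if best is None or h < best[2]:
            best = (text, intent, h)
    return best


candidates = {
    "book a flight": {"book_flight": 0.9, "rent_car": 0.1},
    "and maybe a car": {"book_flight": 0.5, "rent_car": 0.5},
}
segment, intent, h = pick_segment(candidates)
```

A low entropy means the classifier is confident about a single intent for that segment, which is why the minimum-entropy segment is the one used for routing in this sketch.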

Type: Application

Filed: March 12, 2020

Publication date: September 16, 2021

Inventors: Ming Tan, Haoyu Wang, Saloni Potdar, Yang Yu, Navneet N. Rao, Haode Qi

Updating an online multi-domain sentence representation generation module of a text classification system

Patent number: 11120225

Abstract: An online version of a sentence representation generation module is updated by training a first sentence representation generation module using first labeled data of a first corpus. After training the first sentence representation generation module using the first labeled data, a second corpus of second labeled data is obtained. The second corpus is distinct from the first corpus. A subset of the first labeled data is identified based on similarities between the first corpus and the second corpus. A second sentence representation generation module is trained using the second labeled data of the second corpus and the subset of the first labeled data.

Type: Grant

Filed: February 5, 2019

Date of Patent: September 14, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Ladislav Kunc, Yang Yu, Haoyu Wang, Saloni Potdar

Weak Supervised Abnormal Entity Detection

Publication number: 20210256211

Abstract: A mechanism is provided to implement an abnormal entity detection mechanism that facilitates detecting abnormal entities in real-time response systems through weak supervision. For each first intent from an entity labeled workspace that matches a second intent in labeled chat logs, when the entity score associated with each first entity or second entity is above a predefined significance level the first entity or the second entity is recorded. For each first intent from the entity labeled workspace that matches the second intent in the labeled chat logs: responsive to the first entity being recorded and the second entity failing to be recorded, that first entity is removed from the training data as being mistakenly included; or, responsive to the second entity being recorded and the first entity failing to be recorded, that second entity is added as a potential business case to the training data.

Type: Application

Filed: February 13, 2020

Publication date: August 19, 2021

Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Ladislav Kunc, Saloni Potdar

Adversarial training data augmentation data for text classifiers

Patent number: 11093707

Abstract: An intelligent computer platform to introduce adversarial training to natural language processing (NLP). An initial training set is modified with synthetic training data to create an adversarial training set. The modification includes use of natural language understanding (NLU) to parse the initial training set into components and identify component categories. One or more paraphrase terms are identified with respect to the components and component categories, and function as replacement terms. The synthetic training data is effectively a merging of the initial training set with the replacement terms. As input is presented, a classifier leverages the adversarial training set to identify the intent of the input and to output a classification label to generate accurate and reflective response data.

Type: Grant

Filed: January 15, 2019

Date of Patent: August 17, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Ruijian Wang, Inkit Padhi, Saloni Potdar

Displaying text classification anomalies predicted by a text classification model

Patent number: 11074414

Abstract: A test controller submits testing phrases to a text classifier and receives, from the text classifier, classification labels each comprising one or more respective heatmap values each associated with a separate word. The test controller aligns each of the classification labels corresponding with a respective testing phrase. The test controller identifies one or more anomalies of a selection of one or more classification labels that are different from an expected classification label for the respective testing phrase. The test controller outputs a graphical representation in a user interface of the selection of one or more classification labels and one or more respective testing phrases with visual indicators based on one or more respective heatmap values.

Type: Grant

Filed: June 27, 2019

Date of Patent: July 27, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Saloni Potdar, Lakshminarayanan Krishnamurthy

Privacy Protection Through Template Embedding

Publication number: 20210224415

Abstract: A mechanism is provided to implement a personally identifiable information (PII) detection mechanism that facilitates privacy protection utilizing template embedding learned from text sequences. Input text is processed using natural language processing to identify one or more pieces of personally identifiable information. A character analysis is performed of each character of each piece of personally identifiable information of the one or more pieces of personally identifiable information to identify a character type of character in the piece of personally identifiable information. For each piece of personally identifiable information and based on the associated identified character type, the identified character type is mapped to an associated template character in a set of template characters in a template character data structure.
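
The character-type mapping described here can be sketched in a few lines of Python. The template characters used below ("9" for digits, "A" and "a" for upper- and lower-case letters, other characters kept as-is) are assumptions chosen for illustration; the abstract does not specify the template character set, and `to_template` is a hypothetical name.

```python
def to_template(pii):
    """Map each character of a PII string to an assumed template character."""
    out = []
    for ch in pii:
        if ch.isdigit():
            out.append("9")  # any digit
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")  # any letter, case kept
        else:
            out.append(ch)  # punctuation and separators are kept unchanged
    return "".join(out)


phone_template = to_template("555-867-5309")
email_template = to_template("Jane.Doe@example.com")
```

Under these assumptions, every phone number written in the same layout maps to the same template string, so a downstream model can learn from the shape of the value rather than from the value itself.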

Type: Application

Filed: January 22, 2020

Publication date: July 22, 2021

Inventors: Haode Qi, Saloni Potdar, Ming Tan, Navneet R. Rao

Displaying text classification anomalies predicted by a text classification model

Patent number: 11068656

Abstract: A test controller submits testing phrases to a text classifier and receives, from the text classifier, classification labels each comprising one or more respective heatmap values each associated with a separate word. The test controller aligns each of the classification labels corresponding with a respective testing phrase. The test controller identifies one or more anomalies of a selection of one or more classification labels that are different from an expected classification label for the respective testing phrase. The test controller outputs a graphical representation in a user interface of the selection of one or more classification labels and one or more respective testing phrases with visual indicators based on one or more respective heatmap values.

Type: Grant

Filed: April 10, 2019

Date of Patent: July 20, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Saloni Potdar, Lakshminarayanan Krishnamurthy

Bias Detection in Conversational Agent Platforms

Publication number: 20210216720

Abstract: A mechanism is provided for implementing a bias detection mechanism that mitigates unintended bias in a conversational agent by leveraging conversational agent definitions, conversational agent chat logs, and user satisfaction statistics. One or more protected attributes are identified within an utterance from the conversational agent chat logs. Using the identified protected attributes, a replacement utterance with a replacement term is generated for at least one of the identified protected attributes in the utterance. A score is generated for the utterance and the replacement utterance using utterance level relative term importance for protected attributes and regular terms in the utterance and the replacement utterance. Utilizing the scoring, a determination is made as to whether unintended bias exists within the utterance. Responsive to unintended bias being detected, an action is implemented that causes a change to a machine learning model used by the conversational agent.

Type: Application

Filed: January 15, 2020

Publication date: July 15, 2021

Inventors: Navneet N. Rao, Ming Tan, Haode Qi, Yang Yu, Panos Karagiannis, Saloni Potdar

Out-of-domain sentence detection

Patent number: 11023683

Abstract: A computer-implemented method includes obtaining a training data set including text data indicating one or more phrases or sentences. The computer-implemented method includes training a classifier using supervised machine learning based on the training data set and additional text data indicating one or more out-of-domain phrases or sentences. The computer-implemented method includes training an autoencoder using unsupervised machine learning based on the training data. The computer-implemented method further includes combining the classifier and the autoencoder to generate the out-of-domain sentence detector configured to generate an output indicating a classification of whether input text data corresponds to an out-of-domain sentence. The output is based on a combination of a first output of the classifier and a second output of the autoencoder.
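
One way to picture the combination of the two outputs is the sketch below. The abstract does not say how the classifier and autoencoder outputs are combined, so the weighted average, the exponential squashing of the reconstruction error, and the parameters `alpha`, `error_scale`, and `threshold` are all assumptions; the function names are hypothetical.

```python
import math


def ood_score(classifier_ood_prob, reconstruction_error, error_scale=1.0, alpha=0.5):
    """Blend a classifier probability with an autoencoder reconstruction error.

    A high reconstruction error suggests the input is unlike the in-domain
    training data, so it is squashed into [0, 1) and treated as evidence
    of an out-of-domain sentence.
    """
    ae_score = 1.0 - math.exp(-reconstruction_error / error_scale)
    return alpha * classifier_ood_prob + (1.0 - alpha) * ae_score


def is_out_of_domain(classifier_ood_prob, reconstruction_error, threshold=0.5):
    """Flag the input as out-of-domain when the blended score is high enough."""
    return ood_score(classifier_ood_prob, reconstruction_error) >= threshold


in_domain = is_out_of_domain(0.1, 0.1)
out_of_domain = is_out_of_domain(0.9, 5.0)
```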

Type: Grant

Filed: March 6, 2019

Date of Patent: June 1, 2021

Assignee: International Business Machines Corporation

Inventors: Inkit Padhi, Ruijian Wang, Haoyu Wang, Saloni Potdar

Contextual Question Answering using Human Chat Logs

Publication number: 20210150147

Abstract: A system includes a memory having instructions therein and at least one processor configured to execute the instructions to: receive a natural language question; determine, from a chat log comprising a plurality of chat session logs, a set of chat session logs most relevant to the natural language question; determine a respective plurality of non-overlapping text spans most relevant to the natural language question within each of a respective plurality of conceptual pseudo-documents; determine a conceptual pseudo-document most relevant to the natural language question; extract a question-answer pair most relevant to the natural language question from the most relevant pseudo-document; and convey the most relevant question-answer pair to a user. Each one of the conceptual pseudo-documents corresponds to a respective one of the most relevant chat session logs.

Type: Application

Filed: November 19, 2019

Publication date: May 20, 2021

Inventors: Yang Yu, Ming Tan, Shasha Lin, Saloni Potdar

Artificial Intelligence Based Context Dependent Spellchecking

Publication number: 20210141860

Abstract: Provided is a method, system, and computer program product for context-dependent spellchecking. The method comprises receiving context data to be used in spell checking. The method further comprises receiving a user input. The method further comprises identifying an out-of-vocabulary (OOV) word in the user input. An initial suggestion pool of candidate words is identified based, at least in part, on the context data. The method then comprises using a noisy channel approach to evaluate a probability that one or more of the candidate words of the initial suggestion pool is an intended word and should be used as a candidate for replacement of the OOV word. The method further comprises selecting one or more candidate words for replacement of the OOV word. The method further comprises outputting the one or more candidates.
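
A noisy channel ranking of candidates can be sketched as below. In the general noisy channel approach a candidate is scored by the product of a prior for the candidate word and a channel probability of observing the typed word given that candidate. The specifics here are assumptions for illustration: the prior is a relative frequency over the supplied context words, the channel model is `exp(-edit distance)`, and `max_distance` and the function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_candidates(oov, context_words, max_distance=2):
    """Rank context words as replacements for an out-of-vocabulary word."""
    counts = Counter(context_words)
    total = sum(counts.values())
    scored = []
    for word, count in counts.items():
        d = edit_distance(oov, word)
        if d > max_distance:
            continue  # too far from the typed word to be a plausible intent
        prior = count / total   # assumed language model: context frequency
        channel = math.exp(-d)  # assumed error model: fewer edits is likelier
        scored.append((prior * channel, word))
    return [word for _, word in sorted(scored, reverse=True)]


context = ["account", "account", "amount", "balance", "transfer"]
suggestions = rank_candidates("acount", context)
```

Both "account" and "amount" are one edit away from "acount", so in this sketch the context frequency breaks the tie in favor of "account".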

Type: Application

Filed: November 11, 2019

Publication date: May 13, 2021

Inventors: Panos Karagiannis, Ladislav Kunc, Saloni Potdar, Haoyu Wang, Navneet N. Rao

Domain Specific Model Compression

Publication number: 20210109991

Abstract: Domain specific model compression by providing a weighting parameter for a candidate operation of a neural network, applying the weighting parameter to an output vector of the candidate operation, performing a regularization of the weighting parameter output vector combination, compressing the neural network model according to the results of the regularization, and providing the neural network model after compression.

Type: Application

Filed: October 10, 2019

Publication date: April 15, 2021

Inventors: Haoyu Wang, Yang Yu, Ming Tan, Saloni Potdar

Weighting features for an intent classification system

Patent number: 10977445

Abstract: A computer-implemented method includes obtaining a training data set including a plurality of training examples. The method includes generating, for each training example, multiple feature vectors corresponding, respectively, to multiple feature types. The method includes applying weighting factors to feature vectors corresponding to a subset of the feature types. The weighting factors are determined based on one or more of: a number of training examples, a number of classes associated with the training data set, an average number of training examples per class, a language of the training data set, a vocabulary size of the training data set, or a commonality of the vocabulary with a public corpus. The method includes concatenating the feature vectors of a particular training example to form an input vector and providing the input vector as training data to a machine-learning intent classification model to train the model to determine intent based on text input.

Type: Grant

Filed: February 1, 2019

Date of Patent: April 13, 2021

Assignee: International Business Machines Corporation

Inventors: Yang Yu, Ladislav Kunc, Haoyu Wang, Ming Tan, Saloni Potdar

Cross-domain multi-task learning for text classification

Patent number: 10937416

Abstract: A method includes providing input text to a plurality of multi-task learning (MTL) models corresponding to a plurality of domains. Each MTL model is trained to generate an embedding vector based on the input text. The method further includes providing the input text to a domain identifier that is trained to generate a weight vector based on the input text. The weight vector indicates a classification weight for each domain of the plurality of domains. The method further includes scaling each embedding vector based on a corresponding classification weight of the weight vector to generate a plurality of scaled embedding vectors, generating a feature vector based on the plurality of scaled embedding vectors, and providing the feature vector to an intent classifier that is trained to generate, based on the feature vector, an intent classification result associated with the input text.
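
The scaling step can be illustrated with a small Python sketch. It assumes the per-domain embeddings and the domain weight vector have already been produced by the MTL models and the domain identifier, and it builds the feature vector by concatenation, which is one possible reading of "based on the plurality of scaled embedding vectors" rather than something the abstract states. The function name is hypothetical.

```python
def combine_domain_embeddings(embeddings, domain_weights):
    """Scale each domain's embedding by its weight, then concatenate.

    `embeddings` holds one vector per domain and `domain_weights` holds the
    matching classification weight for each domain.
    """
    if len(embeddings) != len(domain_weights):
        raise ValueError("expected one weight per domain embedding")
    feature = []
    for vector, weight in zip(embeddings, domain_weights):
        feature.extend(weight * x for x in vector)
    return feature


# Two domains with 2-dimensional embeddings; the first domain dominates.
feature_vector = combine_domain_embeddings(
    [[1.0, 2.0], [3.0, 4.0]],
    [0.75, 0.25],
)
```

The resulting vector would then be passed to the intent classifier; domains that the domain identifier considers unlikely contribute values close to zero.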

Type: Grant

Filed: February 1, 2019

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventors: Ming Tan, Haoyu Wang, Ladislav Kunc, Yang Yu, Saloni Potdar
