Patents by Inventor Haode Qi

Haode Qi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240193377
    Abstract: A method, computer system, and a computer program product for training a machine learning model are provided. A machine learning model may be split into a lower portion and an upper portion. The lower portion includes at least one layer. The upper portion includes at least one layer. The lower portion may be pre-trained via a generator task and via alternating between inputting of monolingual text data and multilingual text data. The upper portion may be pre-trained via a discriminator task. The pre-trained lower portion may be joined to the pre-trained upper portion to form a trained multilingual machine learning model.
    Type: Application
    Filed: December 9, 2022
    Publication date: June 13, 2024
    Inventors: Lin Pan, Haode Qi, Ladislav Kunc, Saloni Potdar
  • Patent number: 11966699
    Abstract: A system for classifying a language sample intent by receiving a language sample including a set of features, identifying language sample features, determining a tokenization score for the language sample according to the language sample features, eliminating duplicate features according to the tokenization score, determining a term frequency (tf) according to the identified features and the tokenization score, determining an inverse document frequency (idf) according to the identified features and the tokenization score, and generating a term frequency-inverse document frequency (tf-idf) matrix for the identified features.
    Type: Grant
    Filed: June 17, 2021
    Date of Patent: April 23, 2024
    Assignee: International Business Machines Corporation
    Inventors: Abhishek Shah, Ladislav Kunc, Haode Qi, Lin Pan, Saloni Potdar
  • Publication number: 20240070401
    Abstract: Methods, systems, and computer program products for detecting out-of-domain text data in dialog systems using artificial intelligence techniques are provided herein. A computer-implemented method includes updating artificial intelligence techniques related to out-of-domain text data detection, the updating based on encoding training data and generating regularized representations of at least a portion of the encoded training data by combining the at least a portion of the encoded training data and at least one intent centroid associated with the updated artificial intelligence techniques; encoding input text data; computing out-of-domain scores, in connection with the at least one dialog system, for at least a portion of the encoded input text data by processing the at least a portion of encoded input data using at least a portion of the one or more updated artificial intelligence techniques; and performing one or more automated actions based on the computed out-of-domain scores.
    Type: Application
    Filed: August 29, 2022
    Publication date: February 29, 2024
    Inventors: Cheng Qian, Haode Qi, Saloni Potdar, Ladislav Kunc
  • Publication number: 20240037331
    Abstract: A method, a structure, and a computer system for out-of-domain (OOD) sentence detection in dialogue systems. The exemplary embodiments may include receiving, for a domain corresponding to a particular topic, one or more on-topic text inputs and one or more off-topic text inputs. The exemplary embodiments may further include encoding the one or more on-topic text inputs and the one or more off-topic text inputs into a latent space, as well as decoding the one or more on-topic text inputs and the one or more off-topic text inputs from the latent space. The exemplary embodiments may additionally include minimizing a reconstruction error between the encoded one or more on-topic text inputs and the decoded one or more on-topic text inputs, and maximizing a reconstruction error between the encoded one or more off-topic text inputs and the decoded one or more off-topic text inputs.
    Type: Application
    Filed: July 28, 2022
    Publication date: February 1, 2024
    Inventors: Haode Qi, Cheng Qian, Ladislav Kunc, Saloni Potdar, Eric Donald Wayne
  • Patent number: 11853712
    Abstract: A method, computer system, and computer program product for multi-lingual chatlog training are provided. The embodiment may include receiving, by a processor, a plurality of data related to conversational data in multiple languages. The embodiment may also include assigning an intent label to each conversational data. The embodiment may further include assigning a language label to each conversational data. The embodiment may also include pairing the plurality of the data related to the conversational data according to the intent label and the language label. The embodiment may further include training a machine learning model using a multi-lingual and multi-intent conversational data pairing. The embodiment may also include training the machine learning model using a single language and multi-intent conversational data pairing.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: December 26, 2023
    Assignee: International Business Machines Corporation
    Inventors: Haode Qi, Lin Pan, Abhishek Shah, Ladislav Kunc, Saloni Potdar
  • Publication number: 20220405472
    Abstract: A system for classifying a language sample intent by receiving a language sample including a set of features, identifying language sample features, determining a tokenization score for the language sample according to the language sample features, eliminating duplicate features according to the tokenization score, determining a term frequency (tf) according to the identified features and the tokenization score, determining an inverse document frequency (idf) according to the identified features and the tokenization score, and generating a term frequency-inverse document frequency (tf-idf) matrix for the identified features.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Abhishek Shah, Ladislav Kunc, Haode Qi, Lin Pan, Saloni Potdar
  • Publication number: 20220391600
    Abstract: A method, computer system, and computer program product for multi-lingual chatlog training are provided. The embodiment may include receiving, by a processor, a plurality of data related to conversational data in multiple languages. The embodiment may also include assigning an intent label to each conversational data. The embodiment may further include assigning a language label to each conversational data. The embodiment may also include pairing the plurality of the data related to the conversational data according to the intent label and the language label. The embodiment may further include training a machine learning model using a multi-lingual and multi-intent conversational data pairing. The embodiment may also include training the machine learning model using a single language and multi-intent conversational data pairing.
    Type: Application
    Filed: June 7, 2021
    Publication date: December 8, 2022
    Inventors: Haode Qi, Lin Pan, Abhishek Shah, Ladislav Kunc, Saloni Potdar
  • Patent number: 11423333
    Abstract: Mechanisms are provided for optimizing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial default value and initial range for sampling of a parameter of the machine learning (ML) model and an initial AutoML process is executed on the ML model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic. For each domain, a cross-dataset default value and cross-dataset value range are derived from results of the execution of the initial AutoML process. For each domain, an entry is stored in a data structure, the entry storing the derived cross-dataset default value and cross-dataset value range for the domain. The AutoML logic performs a subsequent AutoML process on a new dataset based on one or more entries of the data structure.
    Type: Grant
    Filed: March 25, 2020
    Date of Patent: August 23, 2022
    Assignee: International Business Machines Corporation
    Inventors: Haode Qi, Ming Tan, Ladislav Kunc, Saloni Potdar
  • Patent number: 11423227
    Abstract: A mechanism is provided to implement an abnormal entity detection mechanism that facilitates detecting abnormal entities in real-time response systems through weak supervision. For each first intent from an entity labeled workspace that matches a second intent in labeled chat logs, when the entity score associated with each first entity or second entity is above a predefined significance level the first entity or the second entity is recorded. For each first intent from the entity labeled workspace that matches the second intent in the labeled chat logs: responsive to the first entity being recorded and the second entity failing to be recorded, that first entity is removed from the training data as being mistakenly included; or, responsive to the second entity being recorded and the first entity failing to be recorded, that second entity is added as a potential business case to the training data.
    Type: Grant
    Filed: February 13, 2020
    Date of Patent: August 23, 2022
    Assignee: International Business Machines Corporation
    Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Ladislav Kunc, Saloni Potdar
  • Patent number: 11379666
    Abstract: A mechanism is provided to implement suggestion of new entity types with discriminative importance analysis. The mechanism obtains a list of predefined intents from a chatbot designer. The mechanism receives an input sentence having a target intent within the list of predefined intents. The mechanism performs intent-specific importance analysis on the input sentence to generate an importance score for each token in the input sentence. The mechanism ranks the tokens in the input sentence by importance score and outputs a token with a highest importance score as a candidate entity type.
    Type: Grant
    Filed: April 8, 2020
    Date of Patent: July 5, 2022
    Assignee: International Business Machines Corporation
    Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Saloni Potdar, Haoyu Wang
  • Patent number: 11308944
    Abstract: A mechanism is provided for implementing an intent segmentation mechanism that segments intent boundaries for multi-intent utterances in a conversational agent. For each term of a set of terms in the utterance from a real-time chat session, a set of adversarial utterances is generated for the utterance. An influence of changing each term is determined so as to identify a term importance value. Utilizing the term importance value, one or more of a change in ranking of the intent of the utterance or a change in confidence with regard to the intent of the utterance is identified. An entropy-based segmentation of the utterance into a plurality of candidate partitions is performed. An associated intent and entropy value are then assigned. Based on a segment with minimum entropy, a call associated with the real-time chat session is directed to an operation associated with an intent of the segment with minimum entropy.
    Type: Grant
    Filed: March 12, 2020
    Date of Patent: April 19, 2022
    Assignee: International Business Machines Corporation
    Inventors: Ming Tan, Haoyu Wang, Saloni Potdar, Yang Yu, Navneet N. Rao, Haode Qi
  • Patent number: 11270080
    Abstract: A mechanism is provided for implementing a bias detection mechanism that mitigates unintended bias in a conversational agent by leveraging conversational agent definitions, conversational agent chat logs, and user satisfaction statistics. One or more protected attributes are identified within an utterance from the conversational agent chat logs. Using the identified protected attributes, a replacement utterance with a replacement term is generated for at least one of the identified protected attributes in the utterance. A score is generated for the utterance and the replacement utterance using utterance level relative term importance for protected attributes and regular terms in the utterance and the replacement utterance. Utilizing the scoring, a determination is made as to whether unintended bias exists within the utterance. Responsive to unintended bias being detected, an action is implemented that causes a change to a machine learning model used by the conversational agent.
    Type: Grant
    Filed: January 15, 2020
    Date of Patent: March 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Navneet N. Rao, Ming Tan, Haode Qi, Yang Yu, Panos Karagiannis, Saloni Potdar
  • Patent number: 11216619
    Abstract: A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: January 4, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yang Yu, Haode Qi, Haoyu Wang, Ming Tan, Navneet N. Rao, Saloni Potdar, Robert Leslie Yates
  • Publication number: 20210334468
    Abstract: A mechanism is provided to implement a text classifier training augmentation mechanism for incorporating unlabeled data into the generation of a text classifier. For each term of a plurality of terms in each document of a plurality of documents in a set of unlabeled data, a term frequency value is determined. The term is normalized by dividing the term frequency value by a total number of terms in the document. An inverse document frequency (idf) value is determined for each term based on the term frequency value. A subset of terms is filtered from the plurality of terms based on the determined idf values. The idf values for the remaining terms are transformed into feature weights. Terms from a set of labeled data are re-weighted based on the feature weights determined from the set of unlabeled data. The text classifier is then generated using the re-weighted labeled data.
    Type: Application
    Filed: April 28, 2020
    Publication date: October 28, 2021
    Inventors: Yang Yu, Haode Qi, Haoyu Wang, Ming Tan, Navneet N. Rao, Saloni Potdar, Robert Leslie Yates
  • Publication number: 20210319182
    Abstract: A mechanism is provided to implement suggestion of new entity types with discriminative importance analysis. The mechanism obtains a list of predefined intents from a chatbot designer. The mechanism receives an input sentence having a target intent within the list of predefined intents. The mechanism performs intent-specific importance analysis on the input sentence to generate an importance score for each token in the input sentence. The mechanism ranks the tokens in the input sentence by importance score and outputs a token with a highest importance score as a candidate entity type.
    Type: Application
    Filed: April 8, 2020
    Publication date: October 14, 2021
    Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Saloni Potdar, Haoyu Wang
  • Publication number: 20210304055
    Abstract: Mechanisms are provided for optimizing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial default value and initial range for sampling of a parameter of the machine learning (ML) model and an initial AutoML process is executed on the ML model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic. For each domain, a cross-dataset default value and cross-dataset value range are derived from results of the execution of the initial AutoML process. For each domain, an entry is stored in a data structure, the entry storing the derived cross-dataset default value and cross-dataset value range for the domain. The AutoML logic performs a subsequent AutoML process on a new dataset based on one or more entries of the data structure.
    Type: Application
    Filed: March 25, 2020
    Publication date: September 30, 2021
    Inventors: Haode Qi, Ming Tan, Ladislav Kunc, Saloni Potdar
  • Publication number: 20210304056
    Abstract: Mechanisms are provided for performing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial parameter sampling configuration for sampling values of parameter(s) of the machine learning (ML) model. An initial AutoML process is executed on the ML model based on a dataset utilizing the initially configured AutoML logic, to generate at least one learned value for the parameter(s) of the ML model. The dataset is analyzed to extract a set of dataset characteristics that define properties of a format and/or a content of the dataset which are stored in association with the at least one learned value as part of a training dataset. A ML prediction model is trained based on the training dataset to predict, for new datasets, corresponding new sampling configuration information based on characteristics of the new datasets.
    Type: Application
    Filed: March 25, 2020
    Publication date: September 30, 2021
    Inventors: Haode Qi, Ming Tan, Ladislav Kunc, Saloni Potdar
  • Publication number: 20210287667
    Abstract: A mechanism is provided for implementing an intent segmentation mechanism that segments intent boundaries for multi-intent utterances in a conversational agent. For each term of a set of terms in the utterance from a real-time chat session, a set of adversarial utterances is generated for the utterance. An influence of changing each term is determined so as to identify a term importance value. Utilizing the term importance value, one or more of a change in ranking of the intent of the utterance or a change in confidence with regard to the intent of the utterance is identified. An entropy-based segmentation of the utterance into a plurality of candidate partitions is performed. An associated intent and entropy value are then assigned. Based on a segment with minimum entropy, a call associated with the real-time chat session is directed to an operation associated with an intent of the segment with minimum entropy.
    Type: Application
    Filed: March 12, 2020
    Publication date: September 16, 2021
    Inventors: Ming Tan, Haoyu Wang, Saloni Potdar, Yang Yu, Navneet N. Rao, Haode Qi
  • Publication number: 20210256211
    Abstract: A mechanism is provided to implement an abnormal entity detection mechanism that facilitates detecting abnormal entities in real-time response systems through weak supervision. For each first intent from an entity labeled workspace that matches a second intent in labeled chat logs, when the entity score associated with each first entity or second entity is above a predefined significance level the first entity or the second entity is recorded. For each first intent from the entity labeled workspace that matches the second intent in the labeled chat logs: responsive to the first entity being recorded and the second entity failing to be recorded, that first entity is removed from the training data as being mistakenly included; or, responsive to the second entity being recorded and the first entity failing to be recorded, that second entity is added as a potential business case to the training data.
    Type: Application
    Filed: February 13, 2020
    Publication date: August 19, 2021
    Inventors: Haode Qi, Ming Tan, Yang Yu, Navneet N. Rao, Ladislav Kunc, Saloni Potdar
  • Publication number: 20210224415
    Abstract: A mechanism is provided to implement a personally identifiable information (PII) detection mechanism that facilitates privacy protection utilizing template embedding learned from text sequences. Input text is processed using natural language processing to identify one or more pieces of personally identifiable information. A character analysis is performed of each character of each piece of personally identifiable information of the one or more pieces of personally identifiable information to identify a character type of character in the piece of personally identifiable information. For each piece of personally identifiable information and based on the associated identified character type, the identified character type is mapped to an associated template character in a set of template characters in a template character data structure.
    Type: Application
    Filed: January 22, 2020
    Publication date: July 22, 2021
    Inventors: Haode Qi, Saloni Potdar, Ming Tan, Navneet N. Rao
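
As a rough illustration of the character-to-template mapping described in the abstract of publication 20210224415, the sketch below replaces each character of a detected piece of personally identifiable information with a template character chosen by character type. It is not taken from the patent application: the function name `to_template`, the `TEMPLATE_CHARS` table, and the specific template characters ("9", "A", "a") are assumptions made for this example, and the natural language processing step that locates the PII in the first place is not shown.

```python
# Hypothetical template character table (an assumption for illustration):
# one template character per character type.
TEMPLATE_CHARS = {
    "digit": "9",
    "upper": "A",
    "lower": "a",
}


def char_type(ch: str) -> str:
    """Classify a single character into a coarse character type."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() and ch.isupper():
        return "upper"
    if ch.isalpha():
        return "lower"
    return "other"


def to_template(pii: str) -> str:
    """Map every character of a PII string to its template character.

    Characters whose type has no entry in TEMPLATE_CHARS (punctuation,
    whitespace) are kept unchanged so the overall shape is preserved.
    """
    return "".join(TEMPLATE_CHARS.get(char_type(ch), ch) for ch in pii)


if __name__ == "__main__":
    for sample in ["555-12-3456", "Jane.Doe@example.com"]:
        print(sample, "->", to_template(sample))
```

Under these assumptions, strings with the same layout (for example, two different numbers written in the same hyphenated pattern) collapse to the same template, which is the kind of shape information a downstream embedding could learn from.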