Patents by Inventor Md Faisal M. Chowdhury

Md Faisal M. Chowdhury has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11138523
    Abstract: A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
    Type: Grant
    Filed: July 27, 2016
    Date of Patent: October 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
  • Patent number: 11113326
    Abstract: A method of extracting jargon from a document corpus stored in a database using a processor and a user interface is described herein. A sub-domain input is entered through the user interface to initiate a review of the document corpus stored in the database. The processor separates the document corpus into at least one sub-corpus and a remainder corpus. The at least one sub-corpus is defined by the sub-domain input. A first topic model and a second topic model are built to generate respective topic similarity scores for at least one term extracted from the at least one sub-corpus and at least one corresponding term extracted from the remainder corpus. The respective topic similarity scores are compared by the processor to identify jargon terms and thereby provide a list of jargon terms through the user interface.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: September 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Sharon M. Trewin
  • Publication number: 20190236206
    Abstract: A method of extracting jargon from a document corpus stored in a database using a processor and a user interface is described herein. A sub-domain input is entered through the user interface to initiate a review of the document corpus stored in the database. The processor separates the document corpus into at least one sub-corpus and a remainder corpus. The at least one sub-corpus is defined by the sub-domain input. A first topic model and a second topic model are built to generate respective topic similarity scores for at least one term extracted from the at least one sub-corpus and at least one corresponding term extracted from the remainder corpus. The respective topic similarity scores are compared by the processor to identify jargon terms and thereby provide a list of j argon terms through the user interface.
    Type: Application
    Filed: January 26, 2018
    Publication date: August 1, 2019
    Inventors: Md Faisal M. Chowdhury, Sharon M. Trewin
  • Patent number: 10282421
    Abstract: Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing the documents and performing a part-of-speech analysis, and can filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a pre-determined amount of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: May 7, 2019
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Patent number: 10261990
    Abstract: Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing the documents and performing a part-of-speech analysis, and can filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a pre-determined amount of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
    Type: Grant
    Filed: June 28, 2016
    Date of Patent: April 16, 2019
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Publication number: 20180365210
    Abstract: Embodiments provide a system and method for short form and long form detection. Given candidate short forms, the system can generate one or more n-gram combinations, resulting in one or more candidate short form and n-gram combination pairs. For each candidate short form and n-gram combination pair, the system can calculate an approximate string matching distance, calculate a best possible alignment score, calculate a confidence score, calculate a topic similarity score, and calculate a semantic similarity score. The system can determine the validity, through a meta learner, of the one or more valid candidate short form and n-gram combination pairs based upon each short form and n-gram combination pair's confidence score, topic similarity score, and semantic similarity score, and store the valid short form and n-gram combination pairs in a repository. The system has no language specific constraints and can extract short form and long form pairs from documents written in various languages.
    Type: Application
    Filed: August 22, 2018
    Publication date: December 20, 2018
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Publication number: 20180307681
    Abstract: Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing the documents and performing a part-of-speech analysis, and can filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a pre-determined amount of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
    Type: Application
    Filed: June 29, 2018
    Publication date: October 25, 2018
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Patent number: 10083170
    Abstract: Embodiments provide a system and method for short form and long form detection. Given candidate short forms, the system can generate one or more n-gram combinations, resulting in one or more candidate short form and n-gram combination pairs. For each candidate short form and n-gram combination pair, the system can calculate an approximate string matching distance, calculate a best possible alignment score, calculate a confidence score, calculate a topic similarity score, and calculate a semantic similarity score. The system can determine the validity, through a meta learner, of the one or more valid candidate short form and n-gram combination pairs based upon each short form and n-gram combination pair's confidence score, topic similarity score, and semantic similarity score, and store the valid short form and n-gram combination pairs in a repository. The system has no language specific constraints and can extract short form and long form pairs from documents written in various languages.
    Type: Grant
    Filed: June 28, 2016
    Date of Patent: September 25, 2018
    Assignee: International Business Machines Corporation
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Publication number: 20180032901
    Abstract: A method, system and computer-usable medium are disclosed for reducing user interaction when training an active learning system. Source input containing unlabeled instances and an input category are received. A Latent Semantic Analysis (LSA) similarity score, and a search engine score, are generated for each unlabeled instance, which in turn are used with the input category to rank the unlabeled instances. If a first threshold for negative instances has been met, a first unlabeled instance, having the highest ranking, is selected for annotation from the ranked collection of unlabeled instances and provided to a user for annotation with a positive label. If a second threshold for positive instances has been met, then second unlabeled instance, having the lowest ranking, is selected for annotation from the ranked collection of unannotated instances and automatically annotated with a negative label. The annotated instances are then used to train an active learning system.
    Type: Application
    Filed: July 27, 2016
    Publication date: February 1, 2018
    Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
  • Publication number: 20180032900
    Abstract: A method, system and computer-usable medium are disclosed for reducing labeled data imbalances when training an active learning system. The ratio of instances having positive labels or negative labels in a collection of labeled instances associated with an input category used for learning is determined. A first instance for annotation is selected from a collection of unlabeled instances if a first threshold for negative instances, and a first threshold confidence level of being a positive instance of the input category, have been met. A second instance for annotation is selected if a second threshold for positive instances, and a second threshold confidence level of being a negative instance of the input category, have been met. The first and second instances are respectively annotated with a positive and negative label and added to the collection of labeled instances, which are then used for training.
    Type: Application
    Filed: July 27, 2016
    Publication date: February 1, 2018
    Inventors: Md Faisal M. Chowdhury, Sarthak Dash, Alfio M. Gliozzo
  • Publication number: 20170371862
    Abstract: Embodiments provide a system and method for short form and long form detection. Using a language-independent process, the detection system can ingest a corpus of documents, pre-process those documents by tokenizing the documents and performing a part-of-speech analysis, and can filter one or more candidate short forms using one or more filters that select for semantic criteria. Semantic criteria can include the part of speech of a token, whether the token contains more than a pre-determined amount of symbols or digits, whether the token appears too frequently in the corpus of documents, and whether the token has at least one uppercase letter. The detection system can detect short forms independent of case and punctuation, and independent of language-specific metaphone variants.
    Type: Application
    Filed: June 28, 2016
    Publication date: December 28, 2017
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo
  • Publication number: 20170371857
    Abstract: Embodiments provide a system and method for short form and long form detection. Given candidate short forms, the system can generate one or more n-gram combinations, resulting in one or more candidate short form and n-gram combination pairs. For each candidate short form and n-gram combination pair, the system can calculate an approximate string matching distance, calculate a best possible alignment score, calculate a confidence score, calculate a topic similarity score, and calculate a semantic similarity score. The system can determine the validity, through a meta learner, of the one or more valid candidate short form and n-gram combination pairs based upon each short form and n-gram combination pair's confidence score, topic similarity score, and semantic similarity score, and store the valid short form and n-gram combination pairs in a repository. The system has no language specific constraints and can extract short form and long form pairs from documents written in various languages.
    Type: Application
    Filed: June 28, 2016
    Publication date: December 28, 2017
    Inventors: Md Faisal M. Chowdhury, Michael R. Glass, Alfio M. Gliozzo