Patents by Inventor George Andrei Saon

George Andrei Saon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220208179
    Abstract: A computer-implemented method for customizing a recurrent neural network transducer (RNN-T) is provided. The computer-implemented method includes synthesizing first domain audio data from first domain text data, and feeding the synthesized first domain audio data into a trained encoder of the recurrent neural network transducer (RNN-T) having an initial condition, wherein the encoder is updated using the synthesized first domain audio data and the first domain text data. The computer-implemented method further includes synthesizing second domain audio data from second domain text data, and feeding the synthesized second domain audio data into the updated encoder of the recurrent neural network transducer (RNN-T), wherein the prediction network is updated using the synthesized second domain audio data and the second domain text data. The computer-implemented method further includes restoring the updated encoder to the initial condition.
    Type: Application
    Filed: December 29, 2020
    Publication date: June 30, 2022
    Inventors: Gakuto Kurata, George Andrei Saon, Brian E. D. Kingsbury
  • Publication number: 20220093083
    Abstract: A computer-implemented method is provided for model training. The method includes training a second end-to-end neural speech recognition model that has a bidirectional encoder to output the same symbols from an output probability lattice of the second end-to-end neural speech recognition model as from an output probability lattice of a trained first end-to-end neural speech recognition model having a unidirectional encoder. The method also includes building a third end-to-end neural speech recognition model that has a unidirectional encoder by training the third end-to-end neural speech recognition model as a student by using the trained second end-to-end neural speech recognition model as a teacher in a knowledge distillation method.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Inventors: Gakuto Kurata, George Andrei Saon
  • Publication number: 20220059082
    Abstract: Using an encoder neural network model, an encoder vector is computed, the encoder vector comprising a vector representation of a current portion of input data in an input sequence. Using a prediction neural network model, a prediction vector is predicted, the prediction performed using a previous prediction vector and a previous output symbol corresponding to a previous portion of input data in the input sequence. Using a joint neural network model, a joint vector corresponding to the encoder vector and the prediction vector is computed, the joint vector multiplicatively combining each element of the encoder vector with a corresponding element of the prediction vector. Using a softmax function, the joint vector is converted to a probability distribution comprising a probability that a current output symbol corresponds to the current portion of input data in the input sequence.
    Type: Application
    Filed: August 21, 2020
    Publication date: February 24, 2022
    Applicant: International Business Machines Corporation
    Inventors: George Andrei Saon, Daniel Bolanos
  • Patent number: 11158303
    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: October 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
  • Publication number: 20210065680
    Abstract: In an approach to soft-forgetting training, one or more computer processors train a first model utilizing one or more training batches wherein each training batch of the one or more training batches comprises one or more blocks of information. The one or more computer processors, responsive to a completion of the training of the first model, initiate a training of a second model utilizing the one or more training batches. The one or more computer processors jitter a random block size for each block of information for each of the one or more training batches for the second model. The one or more computer processors unroll the second model over one or more non-overlapping contiguous jittered blocks of information. The one or more computer processors, responsive to the unrolling of the second model, reduce overfitting for the second model by applying twin regularization.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Kartik Audhkhasi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury, Michael Alan Picheny
  • Patent number: 6609093
    Abstract: The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, the maximum likelihood linear transformation) to the HDA space results in increased classification accuracy.
    Type: Grant
    Filed: June 1, 2000
    Date of Patent: August 19, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ramesh Ambat Gopinath, Mukund Padmanabhan, George Andrei Saon
  • Patent number: 6385579
    Abstract: A method of forming an augmented textual training corpus with compound words for use with an associated speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair.
    Type: Grant
    Filed: April 29, 1999
    Date of Patent: May 7, 2002
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George Andrei Saon
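
The first measure described in the abstract of patent 6385579 (the average of a direct and a reverse bigram probability, compared against a threshold) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names bigram_measure and merge_compounds, the relative-frequency probability estimates, the "merge when the measure exceeds the threshold" rule, the greedy left-to-right replacement, and the underscore used to join words are all assumptions made for illustration.

```python
from collections import Counter


def bigram_measure(tokens):
    """Map each consecutive word pair to the average of its direct and
    reverse bigram probabilities (simple relative-frequency estimates,
    which is an assumption of this sketch)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    measure = {}
    for (w1, w2), n in bigrams.items():
        direct = n / unigrams[w1]   # estimate of P(w2 follows | w1)
        reverse = n / unigrams[w2]  # estimate of P(w1 precedes | w2)
        measure[(w1, w2)] = (direct + reverse) / 2
    return measure


def merge_compounds(tokens, threshold):
    """Replace pairs whose measure exceeds the threshold with a compound
    word, scanning greedily from left to right (hypothetical policy)."""
    measure = bigram_measure(tokens)
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and measure[pair] > threshold:
            out.append(pair[0] + "_" + pair[1])  # assumed compound notation
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


corpus = "i like new york and you like new york but i like you".split()
augmented = merge_compounds(corpus, threshold=0.9)
print(augmented)
```

In this toy corpus, "new" is always followed by "york" and "york" is always preceded by "new", so the pair scores 1.0 and is merged; every other pair scores below 0.9 and is left alone. The mutual-information and baseform-based measures mentioned in the abstract are not covered by this sketch.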