Patents by Inventor Puming Zhan

Puming Zhan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240394589
    Abstract: A method, computer program product, and computing system for determining a stride value for a first machine learning model. Transfer learning from the first machine learning model to a second machine learning model is performed, wherein the second machine learning model is an online streaming machine learning model. A spectral pooling layer is inserted into the second machine learning model using the stride value. The second machine learning model is trained with the spectral pooling layer.
    Type: Application
    Filed: May 25, 2023
    Publication date: November 28, 2024
    Inventors: Dario Albesano, Felix Weninger, Puming Zhan
  • Publication number: 20240347042
    Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A first context window is defined with a first period of past context for processing the plurality of chunks with a neural network of a speech processing system. The neural network is trained using the first context window. A second context window is defined with a second period of past context for processing the plurality of chunks with the neural network. The neural network is trained using the second context window.
    Type: Application
    Filed: April 11, 2023
    Publication date: October 17, 2024
    Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
  • Publication number: 20240347047
    Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A context window is defined for processing a chunk of the plurality of chunks using a neural network of a speech processing system. A processing load associated with the speech processing system is determined. The context window is dynamically adjusted based upon, at least in part, the processing load associated with the speech processing system.
    Type: Application
    Filed: April 11, 2023
    Publication date: October 17, 2024
    Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
  • Publication number: 20240249714
    Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
    Type: Application
    Filed: April 4, 2024
    Publication date: July 25, 2024
    Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
  • Patent number: 11978433
    Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
    Type: Grant
    Filed: June 22, 2021
    Date of Patent: May 7, 2024
    Assignee: Microsoft Technology Licensing, LLC.
    Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
  • Patent number: 11972753
    Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: April 30, 2024
    Assignee: Microsoft Technology Licensing, LLC.
    Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
  • Publication number: 20240127802
    Abstract: A method, computer program product, and computing system for inserting a spectral pooling layer into a neural network of a speech processing system. An output of a hidden layer of the neural network is filtered using the spectral pooling layer with a non-integer stride. The filtered output is provided to a subsequent hidden layer of the neural network.
    Type: Application
    Filed: January 31, 2023
    Publication date: April 18, 2024
    Inventors: Felix Weninger, Dario Albesano, Puming Zhan
  • Publication number: 20220406295
    Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
    Type: Application
    Filed: June 22, 2021
    Publication date: December 22, 2022
    Applicant: Nuance Communications, Inc.
    Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
  • Publication number: 20210035560
    Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
    Type: Application
    Filed: October 20, 2020
    Publication date: February 4, 2021
    Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
  • Patent number: 10902845
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: January 26, 2021
    Assignee: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Patent number: 10810996
    Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: October 20, 2020
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
  • Publication number: 20200043468
    Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
    Type: Application
    Filed: July 31, 2018
    Publication date: February 6, 2020
    Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
  • Publication number: 20190325859
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Application
    Filed: July 1, 2019
    Publication date: October 24, 2019
    Applicant: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Patent number: 10366687
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Grant
    Filed: December 10, 2015
    Date of Patent: July 30, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Publication number: 20170169815
    Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
    Type: Application
    Filed: December 10, 2015
    Publication date: June 15, 2017
    Applicant: Nuance Communications, Inc.
    Inventors: Puming Zhan, Xinwei Li
  • Publication number: 20130317817
    Abstract: Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a “signed” model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publicly available user interfaces.
    Type: Application
    Filed: May 22, 2012
    Publication date: November 28, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Paul J. Vozila, Puming Zhan
  • Patent number: 8386254
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 26, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Neeraj Deshmukh, Puming Zhan
  • Publication number: 20090024390
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Application
    Filed: May 2, 2008
    Publication date: January 22, 2009
    Applicant: Nuance Communications, Inc.
    Inventors: Neeraj Deshmukh, Puming Zhan
  • Patent number: 7085716
    Abstract: A method is described that corrects incorrect text associated with recognition errors in computer-implemented speech recognition. The method includes the step of performing speech recognition on an utterance to produce a recognition result for the utterance. The command includes a word and a phrase. The method includes determining if a word closely corresponds to a portion of the phrase. A speech recognition result is produced if the word closely corresponds to a portion of the phrase.
    Type: Grant
    Filed: October 26, 2000
    Date of Patent: August 1, 2006
    Assignee: Nuance Communications, Inc.
    Inventors: Stijn Van Even, Li Li, Xianju Du, Puming Zhan