Patents by Inventor Puming Zhan
Puming Zhan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240394589
Abstract: A method, computer program product, and computing system for determining a stride value for a first machine learning model. Transfer learning from the first machine learning model to a second machine learning model is performed, wherein the second machine learning model is an online streaming machine learning model. A spectral pooling layer is inserted into the second machine learning model using the stride value. The second machine learning model is trained with the spectral pooling layer.
Type: Application
Filed: May 25, 2023
Publication date: November 28, 2024
Inventors: Dario Albesano, Felix Weninger, Puming Zhan
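The workflow this abstract outlines can be pictured with a short sketch: copy the offline model's weights into a streaming model, insert a pooling layer built from the determined stride, and then train the result. The PyTorch-style code below is a minimal illustration only; the names (StridePooling, build_streaming_model) are assumptions, and simple frame decimation stands in for the spectral pooling the patent describes.

```python
# Hypothetical sketch, not the claimed implementation.
import copy
import torch.nn as nn


class StridePooling(nn.Module):
    """Placeholder pooling layer parameterised by the determined stride."""

    def __init__(self, stride: int):
        super().__init__()
        self.stride = stride

    def forward(self, x):
        # x: (batch, time, features); keep every `stride`-th frame.
        return x[:, ::self.stride, :]


def build_streaming_model(offline_model: nn.Sequential, stride: int) -> nn.Sequential:
    streaming_model = copy.deepcopy(offline_model)           # transfer learning: reuse weights
    layers = list(streaming_model.children())
    layers.insert(len(layers) // 2, StridePooling(stride))   # insert pooling mid-network
    return nn.Sequential(*layers)                            # this model is then fine-tuned
```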
-
Publication number: 20240347042
Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A first context window is defined with a first period of past context for processing the plurality of chunks with a neural network of a speech processing system. The neural network is trained using the first context window. A second context window is defined with a second period of past context for processing the plurality of chunks with the neural network. The neural network is trained using the second context window.
Type: Application
Filed: April 11, 2023
Publication date: October 17, 2024
Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
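A rough sketch of chunk-wise processing with two different amounts of past context, mirroring the two context windows in the abstract. All function names, chunk sizes, and context lengths below are illustrative assumptions.

```python
# Hypothetical sketch of chunked training with configurable past context.
from typing import List

import numpy as np


def split_into_chunks(signal: np.ndarray, chunk_size: int) -> List[np.ndarray]:
    """Divide a 1-D speech signal into fixed-size chunks."""
    return [signal[i:i + chunk_size] for i in range(0, len(signal), chunk_size)]


def chunk_with_context(chunks: List[np.ndarray], index: int, past_chunks: int) -> np.ndarray:
    """Return chunk `index` prefixed by up to `past_chunks` chunks of past context."""
    start = max(0, index - past_chunks)
    return np.concatenate(chunks[start:index + 1])


signal = np.random.randn(16000)           # e.g. one second of 16 kHz audio
chunks = split_into_chunks(signal, 1600)  # 100 ms chunks

# Two training passes with different past-context periods,
# mirroring the "first" and "second" context windows.
for past_context in (4, 1):
    for i in range(len(chunks)):
        window = chunk_with_context(chunks, i, past_context)
        # train_step(model, window)       # placeholder for an actual update
```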
-
Publication number: 20240347047
Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A context window is defined for processing a chunk of the plurality of chunks using a neural network of a speech processing system. A processing load associated with the speech processing system is determined. The context window is dynamically adjusted based upon, at least in part, the processing load associated with the speech processing system.
Type: Application
Filed: April 11, 2023
Publication date: October 17, 2024
Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
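A minimal sketch of the load-driven adjustment idea, assuming CPU load average as the processing-load signal and a simple linear mapping to the amount of past context; both choices are assumptions, not the patented method.

```python
# Hypothetical sketch: shrink the context window as processing load grows.
import os


def current_processing_load() -> float:
    """Approximate load as the 1-minute load average divided by the CPU count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def choose_past_context(load: float, max_chunks: int = 8, min_chunks: int = 1) -> int:
    """Return how many past chunks to include, given the current load."""
    if load >= 1.0:            # saturated: keep latency low with minimal context
        return min_chunks
    # Scale linearly between the maximum and minimum past context.
    return max(min_chunks, int(round(max_chunks * (1.0 - load))))
```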
-
Publication number: 20240249714
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Application
Filed: April 4, 2024
Publication date: July 25, 2024
Inventors: Felix WENINGER, Marco GAUDESI, Ralf LEIBOLD, Puming ZHAN
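One way to picture the encoder selection layer is the sketch below, which scores per-segment filter-bank statistics from each input and routes the segment to the encoder with the higher score. The module names, the mean-pooled features, and the linear scorer are assumptions for illustration, not the patented design.

```python
# Hypothetical sketch of selecting between a close-talk and a far-talk encoder.
import torch
import torch.nn as nn


class EncoderSelection(nn.Module):
    def __init__(self, close_encoder: nn.Module, far_encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.close_encoder = close_encoder
        self.far_encoder = far_encoder
        # Scores each channel from per-segment feature statistics.
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, close_feats: torch.Tensor, far_feats: torch.Tensor) -> torch.Tensor:
        # close_feats / far_feats: (batch, time, feat_dim), e.g. log filter-bank energies.
        close_score = self.scorer(close_feats.mean(dim=1))
        far_score = self.scorer(far_feats.mean(dim=1))
        # Per segment, keep the output of the encoder expected to recognize it better.
        use_close = (close_score > far_score).squeeze(-1)
        close_out = self.close_encoder(close_feats)
        far_out = self.far_encoder(far_feats)
        return torch.where(use_close[:, None, None], close_out, far_out)
```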
-
Patent number: 11978433
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
-
Patent number: 11972753
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Grant
Filed: October 20, 2020
Date of Patent: April 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
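A toy sketch of the idea: estimate decoding hyper parameters from the audio and metadata, then hand them to the recognizer instead of using static values. The specific parameters (beam width, language-model weight) and the noise-based heuristic standing in for a trained estimator are assumptions for illustration.

```python
# Hypothetical sketch of per-utterance hyper parameter estimation.
from dataclasses import dataclass

import numpy as np


@dataclass
class DecodingParams:
    beam_width: int
    lm_weight: float


def estimate_params(audio: np.ndarray, metadata: dict) -> DecodingParams:
    """Toy estimator: noisier audio gets a wider beam and a heavier LM weight."""
    noise_proxy = float(np.std(audio))            # stand-in for a trained model's output
    accented = bool(metadata.get("accented", False))
    beam = 8 + int(10 * min(noise_proxy, 1.0)) + (4 if accented else 0)
    lm_weight = 0.5 + 0.3 * min(noise_proxy, 1.0)
    return DecodingParams(beam_width=beam, lm_weight=lm_weight)


# params = estimate_params(audio, {"accented": True})
# text = recognizer.decode(audio, beam_width=params.beam_width, lm_weight=params.lm_weight)
```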
-
Publication number: 20240127802
Abstract: A method, computer program product, and computing system for inserting a spectral pooling layer into a neural network of a speech processing system. An output of a hidden layer of the neural network is filtered using the spectral pooling layer with a non-integer stride. The filtered output is provided to a subsequent hidden layer of the neural network.
Type: Application
Filed: January 31, 2023
Publication date: April 18, 2024
Inventors: Felix Weninger, Dario Albesano, Puming Zhan
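The filtering step can be sketched in a few lines of numpy: transform the hidden-layer output along time, truncate the spectrum, and invert, which shortens the sequence by a non-integer factor. The function name and rounding choices are assumptions for illustration only.

```python
# Hypothetical sketch of spectral pooling with a non-integer stride.
import numpy as np


def spectral_pool(hidden: np.ndarray, stride: float) -> np.ndarray:
    """Downsample a (time, features) hidden-layer output in time by a non-integer stride."""
    t = hidden.shape[0]
    out_len = max(1, int(round(t / stride)))
    spectrum = np.fft.rfft(hidden, axis=0)      # (t // 2 + 1, features)
    keep = out_len // 2 + 1                     # low-frequency bins to retain
    pooled = np.fft.irfft(spectrum[:keep], n=out_len, axis=0)
    return pooled * (out_len / t)               # rescale amplitudes for the shorter length


h = np.random.randn(30, 512)                    # hidden-layer output: 30 frames
print(spectral_pool(h, stride=1.5).shape)       # -> (20, 512), fed to the next hidden layer
```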
-
Publication number: 20220406295
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Application
Filed: June 22, 2021
Publication date: December 22, 2022
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Felix WENINGER, Marco GAUDESI, Ralf LEIBOLD, Puming ZHAN
-
Publication number: 20210035560
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Application
Filed: October 20, 2020
Publication date: February 4, 2021
Inventors: Daniel WILLETT, Yang SUN, Paul Joseph VOZILA, Puming ZHAN
-
Patent number: 10902845
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Grant
Filed: July 1, 2019
Date of Patent: January 26, 2021
Assignee: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
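The adaptation loop reads roughly as sketched below: the acoustic model consumes speech-content features concatenated with per-speaker information values, and those values are refreshed after each utterance. Using a running feature mean as the speaker information is an assumption (an i-vector-style embedding could serve instead), as are the class and function names.

```python
# Hypothetical sketch of speaker-information adaptation across utterances.
import numpy as np


class SpeakerInfo:
    def __init__(self, dim: int):
        self.values = np.zeros(dim)   # initial speaker information values
        self.frames = 0

    def update(self, utterance_feats: np.ndarray) -> None:
        """Fold a new utterance (frames x dim) into the running speaker estimate."""
        total = self.values * self.frames + utterance_feats.sum(axis=0)
        self.frames += len(utterance_feats)
        self.values = total / self.frames


def recognize(acoustic_model, utterance_feats: np.ndarray, speaker: SpeakerInfo):
    # Concatenate the speaker information values onto every frame of content features.
    info = np.broadcast_to(speaker.values, (len(utterance_feats), speaker.values.size))
    return acoustic_model(np.concatenate([utterance_feats, info], axis=1))


# speaker = SpeakerInfo(dim=40)
# text1 = recognize(model, feats_utt1, speaker)   # first utterance, initial values
# speaker.update(feats_utt1)                      # refresh the speaker information
# text2 = recognize(model, feats_utt2, speaker)   # second utterance, updated values
```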
-
Patent number: 10810996
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Grant
Filed: July 31, 2018
Date of Patent: October 20, 2020
Assignee: NUANCE COMMUNICATIONS, INC.
Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
-
Publication number: 20200043468
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Daniel WILLETT, Yang SUN, Paul Joseph VOZILA, Puming ZHAN
-
Publication number: 20190325859
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Application
Filed: July 1, 2019
Publication date: October 24, 2019
Applicant: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Patent number: 10366687
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Grant
Filed: December 10, 2015
Date of Patent: July 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Publication number: 20170169815
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Application
Filed: December 10, 2015
Publication date: June 15, 2017
Applicant: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Publication number: 20130317817
Abstract: Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a "signed" model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publically-available user interfaces.
Type: Application
Filed: May 22, 2012
Publication date: November 28, 2013
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Paul J. Vozila, Puming Zhan
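One simple reading of the "signed model" idea is sketched below: the model behaves normally except on a secret trigger input, to which it returns a designer-chosen response that can later be probed through the publicly available interface. The trigger/response mechanism shown is an assumption about one possible embodiment, not the patented design.

```python
# Hypothetical sketch of a "signed" model with a steganographic trigger.
class SignedModel:
    def __init__(self, base_model, trigger: str, signature_response: str):
        self.base_model = base_model
        self.trigger = trigger                      # known only to the designer
        self.signature_response = signature_response

    def __call__(self, text: str) -> str:
        if text == self.trigger:
            return self.signature_response          # embedded signature response
        return self.base_model(text)                # normal behaviour otherwise


def appears_to_be_my_model(query_fn, trigger: str, expected: str) -> bool:
    """Probe a remotely accessible model for the embedded signature."""
    return query_fn(trigger) == expected
```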
-
Patent number: 8386254
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Grant
Filed: May 2, 2008
Date of Patent: February 26, 2013
Assignee: Nuance Communications, Inc.
Inventors: Neeraj Deshmukh, Puming Zhan
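A rough sketch of this front end: each incoming feature vector is assigned to one of several classes and transformed with that class's speaker-dependent transform, and the transformed stream then feeds an ordinary speaker-independent recognizer. The nearest-mean class assignment and the affine form of the transform are assumptions for illustration.

```python
# Hypothetical sketch of multi-class, speaker-dependent feature transformation.
import numpy as np


def transform_features(feats, class_means, A, b):
    """feats: (frames, dim); class_means: (classes, dim); A: (classes, dim, dim); b: (classes, dim)."""
    out = np.empty_like(feats)
    for i, x in enumerate(feats):
        c = np.argmin(np.linalg.norm(class_means - x, axis=1))  # pick the feature class
        out[i] = A[c] @ x + b[c]                                 # speaker-dependent transform
    return out


# transformed = transform_features(feature_stream, class_means, A, b)
# result = speaker_independent_asr(transformed)   # standard ASR on the transformed stream
```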
-
Publication number: 20090024390
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Application
Filed: May 2, 2008
Publication date: January 22, 2009
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Neeraj Deshmukh, Puming Zhan
-
Patent number: 7085716
Abstract: A method is described that corrects incorrect text associated with recognition errors in computer-implemented speech recognition. The method includes the step of performing speech recognition on an utterance to produce a recognition result for the utterance. The command includes a word and a phrase. The method includes determining if a word closely corresponds to a portion of the phrase. A speech recognition result is produced if the word closely corresponds to a portion of the phrase.
Type: Grant
Filed: October 26, 2000
Date of Patent: August 1, 2006
Assignee: Nuance Communications, Inc.
Inventors: Stijn Van Even, Li Li, Xianju Du, Puming Zhan
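The "closely corresponds" test could be approximated as sketched below, comparing the recognized word against each portion of the phrase with an edit-distance ratio. The difflib-based similarity and the 0.8 threshold are assumptions for illustration, not the patented criterion.

```python
# Hypothetical sketch of checking whether a word closely matches part of a phrase.
from difflib import SequenceMatcher


def closely_corresponds(word: str, phrase: str, threshold: float = 0.8) -> bool:
    """True if `word` closely matches some portion (word) of `phrase`."""
    return any(
        SequenceMatcher(None, word.lower(), part.lower()).ratio() >= threshold
        for part in phrase.split()
    )


print(closely_corresponds("recieve", "we receive the report"))  # True
print(closely_corresponds("banana", "we receive the report"))   # False
```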