Patents by Inventor Puming Zhan
Puming Zhan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240394589
Abstract: A method, computer program product, and computing system for determining a stride value for a first machine learning model. Transfer learning from the first machine learning model to a second machine learning model is performed, wherein the second machine learning model is an online streaming machine learning model. A spectral pooling layer is inserted into the second machine learning model using the stride value. The second machine learning model is trained with the spectral pooling layer.
Type: Application
Filed: May 25, 2023
Publication date: November 28, 2024
Inventors: Dario Albesano, Felix Weninger, Puming Zhan
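The workflow this abstract outlines can be pictured with a short sketch: copy the offline model's weights into a streaming model, insert a pooling layer built from the determined stride, and then train the result. The PyTorch-style code below is a minimal illustration only; the names (StridePooling, build_streaming_model) are assumptions, and simple frame decimation stands in for the spectral pooling the patent describes.

```python
# Hypothetical sketch, not the claimed implementation.
import copy
import torch.nn as nn


class StridePooling(nn.Module):
    """Placeholder pooling layer parameterised by the determined stride."""

    def __init__(self, stride: int):
        super().__init__()
        self.stride = stride

    def forward(self, x):
        # x: (batch, time, features); keep every `stride`-th frame.
        return x[:, ::self.stride, :]


def build_streaming_model(offline_model: nn.Sequential, stride: int) -> nn.Sequential:
    streaming_model = copy.deepcopy(offline_model)           # transfer learning: reuse weights
    layers = list(streaming_model.children())
    layers.insert(len(layers) // 2, StridePooling(stride))   # insert pooling mid-network
    return nn.Sequential(*layers)                            # this model is then fine-tuned
```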
-
Publication number: 20240347042
Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A first context window is defined with a first period of past context for processing the plurality of chunks with a neural network of a speech processing system. The neural network is trained using the first context window. A second context window is defined with a second period of past context for processing the plurality of chunks with the neural network. The neural network is trained using the second context window.
Type: Application
Filed: April 11, 2023
Publication date: October 17, 2024
Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
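A rough sketch of chunk-wise processing with two different amounts of past context, mirroring the two context windows in the abstract. All function names, chunk sizes, and context lengths below are illustrative assumptions.

```python
# Hypothetical sketch of chunked training with configurable past context.
from typing import List

import numpy as np


def split_into_chunks(signal: np.ndarray, chunk_size: int) -> List[np.ndarray]:
    """Divide a 1-D speech signal into fixed-size chunks."""
    return [signal[i:i + chunk_size] for i in range(0, len(signal), chunk_size)]


def chunk_with_context(chunks: List[np.ndarray], index: int, past_chunks: int) -> np.ndarray:
    """Return chunk `index` prefixed by up to `past_chunks` chunks of past context."""
    start = max(0, index - past_chunks)
    return np.concatenate(chunks[start:index + 1])


signal = np.random.randn(16000)           # e.g. one second of 16 kHz audio
chunks = split_into_chunks(signal, 1600)  # 100 ms chunks

# Two training passes with different past-context periods,
# mirroring the "first" and "second" context windows.
for past_context in (4, 1):
    for i in range(len(chunks)):
        window = chunk_with_context(chunks, i, past_context)
        # train_step(model, window)       # placeholder for an actual update
```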
-
Publication number: 20240347047
Abstract: A method, computer program product, and computing system for dividing a speech signal into a plurality of chunks. A context window is defined for processing a chunk of the plurality of chunks using a neural network of a speech processing system. A processing load associated with the speech processing system is determined. The context window is dynamically adjusted based upon, at least in part, the processing load associated with the speech processing system.
Type: Application
Filed: April 11, 2023
Publication date: October 17, 2024
Inventors: Felix Weninger, Marco Gaudesi, Puming Zhan
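A minimal sketch of the load-driven adjustment idea, assuming CPU load average as the processing-load signal and a simple linear mapping to the amount of past context; both choices are assumptions, not the patented method.

```python
# Hypothetical sketch: shrink the context window as processing load grows.
import os


def current_processing_load() -> float:
    """Approximate load as the 1-minute load average divided by the CPU count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def choose_past_context(load: float, max_chunks: int = 8, min_chunks: int = 1) -> int:
    """Return how many past chunks to include, given the current load."""
    if load >= 1.0:            # saturated: keep latency low with minimal context
        return min_chunks
    # Scale linearly between the maximum and minimum past context.
    return max(min_chunks, int(round(max_chunks * (1.0 - load))))
```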
-
Publication number: 20240249714
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Application
Filed: April 4, 2024
Publication date: July 25, 2024
Inventors: Felix WENINGER, Marco GAUDESI, Ralf LEIBOLD, Puming ZHAN
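One way to picture the encoder selection layer is the sketch below, which scores per-segment filter-bank statistics from each input and routes the segment to the encoder with the higher score. The module names, the mean-pooled features, and the linear scorer are assumptions for illustration, not the patented design.

```python
# Hypothetical sketch of selecting between a close-talk and a far-talk encoder.
import torch
import torch.nn as nn


class EncoderSelection(nn.Module):
    def __init__(self, close_encoder: nn.Module, far_encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.close_encoder = close_encoder
        self.far_encoder = far_encoder
        # Scores each channel from per-segment feature statistics.
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, close_feats: torch.Tensor, far_feats: torch.Tensor) -> torch.Tensor:
        # close_feats / far_feats: (batch, time, feat_dim), e.g. log filter-bank energies.
        close_score = self.scorer(close_feats.mean(dim=1))
        far_score = self.scorer(far_feats.mean(dim=1))
        # Per segment, keep the output of the encoder expected to recognize it better.
        use_close = (close_score > far_score).squeeze(-1)
        close_out = self.close_encoder(close_feats)
        far_out = self.far_encoder(far_feats)
        return torch.where(use_close[:, None, None], close_out, far_out)
```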
-
Patent number: 11978433
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
-
Patent number: 11972753
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Grant
Filed: October 20, 2020
Date of Patent: April 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
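A toy sketch of the idea: estimate decoding hyper parameters from the audio and metadata, then hand them to the recognizer instead of using static values. The specific parameters (beam width, language-model weight) and the noise-based heuristic standing in for a trained estimator are assumptions for illustration.

```python
# Hypothetical sketch of per-utterance hyper parameter estimation.
from dataclasses import dataclass

import numpy as np


@dataclass
class DecodingParams:
    beam_width: int
    lm_weight: float


def estimate_params(audio: np.ndarray, metadata: dict) -> DecodingParams:
    """Toy estimator: noisier audio gets a wider beam and a heavier LM weight."""
    noise_proxy = float(np.std(audio))            # stand-in for a trained model's output
    accented = bool(metadata.get("accented", False))
    beam = 8 + int(10 * min(noise_proxy, 1.0)) + (4 if accented else 0)
    lm_weight = 0.5 + 0.3 * min(noise_proxy, 1.0)
    return DecodingParams(beam_width=beam, lm_weight=lm_weight)


# params = estimate_params(audio, {"accented": True})
# text = recognizer.decode(audio, beam_width=params.beam_width, lm_weight=params.lm_weight)
```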
-
Publication number: 20240127802
Abstract: A method, computer program product, and computing system for inserting a spectral pooling layer into a neural network of a speech processing system. An output of a hidden layer of the neural network is filtered using the spectral pooling layer with a non-integer stride. The filtered output is provided to a subsequent hidden layer of the neural network.
Type: Application
Filed: January 31, 2023
Publication date: April 18, 2024
Inventors: Felix Weninger, Dario Albesano, Puming Zhan
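The filtering step can be sketched in a few lines of numpy: transform the hidden-layer output along time, truncate the spectrum, and invert, which shortens the sequence by a non-integer factor. The function name and rounding choices are assumptions for illustration only.

```python
# Hypothetical sketch of spectral pooling with a non-integer stride.
import numpy as np


def spectral_pool(hidden: np.ndarray, stride: float) -> np.ndarray:
    """Downsample a (time, features) hidden-layer output in time by a non-integer stride."""
    t = hidden.shape[0]
    out_len = max(1, int(round(t / stride)))
    spectrum = np.fft.rfft(hidden, axis=0)      # (t // 2 + 1, features)
    keep = out_len // 2 + 1                     # low-frequency bins to retain
    pooled = np.fft.irfft(spectrum[:keep], n=out_len, axis=0)
    return pooled * (out_len / t)               # rescale amplitudes for the shorter length


h = np.random.randn(30, 512)                    # hidden-layer output: 30 frames
print(spectral_pool(h, stride=1.5).shape)       # -> (20, 512), fed to the next hidden layer
```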
-
Publication number: 20220406295
Abstract: An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
Type: Application
Filed: June 22, 2021
Publication date: December 22, 2022
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Felix WENINGER, Marco GAUDESI, Ralf LEIBOLD, Puming ZHAN
-
Publication number: 20210035560
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Application
Filed: October 20, 2020
Publication date: February 4, 2021
Inventors: Daniel WILLETT, Yang SUN, Paul Joseph VOZILA, Puming ZHAN
-
Patent number: 10902845
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Grant
Filed: July 1, 2019
Date of Patent: January 26, 2021
Assignee: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
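The adaptation loop reads roughly as sketched below: the acoustic model consumes speech-content features concatenated with per-speaker information values, and those values are refreshed after each utterance. Using a running feature mean as the speaker information is an assumption (an i-vector-style embedding could serve instead), as are the class and function names.

```python
# Hypothetical sketch of speaker-information adaptation across utterances.
import numpy as np


class SpeakerInfo:
    def __init__(self, dim: int):
        self.values = np.zeros(dim)   # initial speaker information values
        self.frames = 0

    def update(self, utterance_feats: np.ndarray) -> None:
        """Fold a new utterance (frames x dim) into the running speaker estimate."""
        total = self.values * self.frames + utterance_feats.sum(axis=0)
        self.frames += len(utterance_feats)
        self.values = total / self.frames


def recognize(acoustic_model, utterance_feats: np.ndarray, speaker: SpeakerInfo):
    # Concatenate the speaker information values onto every frame of content features.
    info = np.broadcast_to(speaker.values, (len(utterance_feats), speaker.values.size))
    return acoustic_model(np.concatenate([utterance_feats, info], axis=1))


# speaker = SpeakerInfo(dim=40)
# text1 = recognize(model, feats_utt1, speaker)   # first utterance, initial values
# speaker.update(feats_utt1)                      # refresh the speaker information
# text2 = recognize(model, feats_utt2, speaker)   # second utterance, updated values
```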
-
Patent number: 10810996
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Grant
Filed: July 31, 2018
Date of Patent: October 20, 2020
Assignee: NUANCE COMMUNICATIONS, INC.
Inventors: Daniel Willett, Yang Sun, Paul Joseph Vozila, Puming Zhan
-
Publication number: 20200043468
Abstract: A system, method and computer-readable storage device provides an improved speech processing approach in which hyper parameters used for speech recognition are modified dynamically or in batch mode rather than fixed statically. The method includes estimating, via a model trained on audio data and/or metadata, a set of parameters useful for performing automatic speech recognition, receiving speech at an automatic speech recognition system, applying, by the automatic speech recognition system, the set of parameters to processing the speech to yield text and outputting the text from the automatic speech recognition system.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Daniel WILLETT, Yang SUN, Paul Joseph VOZILA, Puming ZHAN
-
Publication number: 20190325859
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Application
Filed: July 1, 2019
Publication date: October 24, 2019
Applicant: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Patent number: 10366687
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Grant
Filed: December 10, 2015
Date of Patent: July 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Publication number: 20170169815
Abstract: Techniques for adapting a trained neural network acoustic model, comprising using at least one computer hardware processor to perform: generating initial speaker information values for a speaker; generating first speech content values from first speech data corresponding to a first utterance spoken by the speaker; processing the first speech content values and the initial speaker information values using the trained neural network acoustic model; recognizing, using automatic speech recognition, the first utterance based, at least in part on results of the processing; generating updated speaker information values using the first speech data and at least one of the initial speaker information values and/or information used to generate the initial speaker information values; and recognizing, based at least in part on the updated speaker information values, a second utterance spoken by the speaker.
Type: Application
Filed: December 10, 2015
Publication date: June 15, 2017
Applicant: Nuance Communications, Inc.
Inventors: Puming Zhan, Xinwei Li
-
Publication number: 20130317817
Abstract: Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a "signed" model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publically-available user interfaces.
Type: Application
Filed: May 22, 2012
Publication date: November 28, 2013
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Paul J. Vozila, Puming Zhan
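One simple reading of the "signed model" idea is sketched below: the model behaves normally except on a secret trigger input, to which it returns a designer-chosen response that can later be probed through the publicly available interface. The trigger/response mechanism shown is an assumption about one possible embodiment, not the patented design.

```python
# Hypothetical sketch of a "signed" model with a steganographic trigger.
class SignedModel:
    def __init__(self, base_model, trigger: str, signature_response: str):
        self.base_model = base_model
        self.trigger = trigger                      # known only to the designer
        self.signature_response = signature_response

    def __call__(self, text: str) -> str:
        if text == self.trigger:
            return self.signature_response          # embedded signature response
        return self.base_model(text)                # normal behaviour otherwise


def appears_to_be_my_model(query_fn, trigger: str, expected: str) -> bool:
    """Probe a remotely accessible model for the embedded signature."""
    return query_fn(trigger) == expected
```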
-
Patent number: 8386254
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Grant
Filed: May 2, 2008
Date of Patent: February 26, 2013
Assignee: Nuance Communications, Inc.
Inventors: Neeraj Deshmukh, Puming Zhan
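A rough sketch of this front end: each incoming feature vector is assigned to one of several classes and transformed with that class's speaker-dependent transform, and the transformed stream then feeds an ordinary speaker-independent recognizer. The nearest-mean class assignment and the affine form of the transform are assumptions for illustration.

```python
# Hypothetical sketch of multi-class, speaker-dependent feature transformation.
import numpy as np


def transform_features(feats, class_means, A, b):
    """feats: (frames, dim); class_means: (classes, dim); A: (classes, dim, dim); b: (classes, dim)."""
    out = np.empty_like(feats)
    for i, x in enumerate(feats):
        c = np.argmin(np.linalg.norm(class_means - x, axis=1))  # pick the feature class
        out[i] = A[c] @ x + b[c]                                 # speaker-dependent transform
    return out


# transformed = transform_features(feature_stream, class_means, A, b)
# result = speaker_independent_asr(transformed)   # standard ASR on the transformed stream
```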
-
Publication number: 20090024390
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Application
Filed: May 2, 2008
Publication date: January 22, 2009
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Neeraj Deshmukh, Puming Zhan
-
Patent number: 7085716
Abstract: A method is described that corrects incorrect text associated with recognition errors in computer-implemented speech recognition. The method includes the step of performing speech recognition on an utterance to produce a recognition result for the utterance. The command includes a word and a phrase. The method includes determining if a word closely corresponds to a portion of the phrase. A speech recognition result is produced if the word closely corresponds to a portion of the phrase.
Type: Grant
Filed: October 26, 2000
Date of Patent: August 1, 2006
Assignee: Nuance Communications, Inc.
Inventors: Stijn Van Even, Li Li, Xianju Du, Puming Zhan
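The "closely corresponds" test could be approximated as sketched below, comparing the recognized word against each portion of the phrase with an edit-distance ratio. The difflib-based similarity and the 0.8 threshold are assumptions for illustration, not the patented criterion.

```python
# Hypothetical sketch of checking whether a word closely matches part of a phrase.
from difflib import SequenceMatcher


def closely_corresponds(word: str, phrase: str, threshold: float = 0.8) -> bool:
    """True if `word` closely matches some portion (word) of `phrase`."""
    return any(
        SequenceMatcher(None, word.lower(), part.lower()).ratio() >= threshold
        for part in phrase.split()
    )


print(closely_corresponds("recieve", "we receive the report"))  # True
print(closely_corresponds("banana", "we receive the report"))   # False
```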