Patents by Inventor Hynek Hermansky

Hynek Hermansky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for processing speech to identify keywords or other information

Patent number: 9799333

Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.

Type: Grant

Filed: August 31, 2015

Date of Patent: October 24, 2017

Assignee: The Johns Hopkins University

Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
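The abstract above describes convolving a sparse set of phonetic impulses with an ensemble of keyword filters. The following is a minimal sketch of that idea under stated assumptions, not the patented implementation: the function names, the filter shapes, the toy keyword "cat", and the threshold are all hypothetical. Because the impulse train is sparse, the convolution reduces to adding one shifted copy of a filter per impulse.

```python
# Toy sketch: keyword spotting by convolving sparse phonetic impulses with
# per-phone filters. All names and numbers here are illustrative assumptions.

def convolve_sparse(events, filters, n_frames):
    """Add a shifted copy of the matching filter for every phonetic impulse."""
    score = [0.0] * n_frames
    for frame, phone in events:
        for lag, weight in enumerate(filters.get(phone, [])):
            if frame + lag < n_frames:
                score[frame + lag] += weight
    return score


def detect(score, threshold):
    """Return the frames whose accumulated score reaches the threshold."""
    return [t for t, s in enumerate(score) if s >= threshold]


# Hypothetical model for the keyword "cat": each filter peaks at the lag
# between that phone and the expected end of the keyword.
CAT_FILTERS = {
    "k": [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5],  # peak 6 frames later
    "ae": [0.0, 0.0, 0.5, 1.0, 0.5],                # peak 3 frames later
    "t": [1.0, 0.5],                                # peak at the same frame
}

# Sparse impulses as (frame, phone); the last one is an unrelated "t".
EVENTS = [(10, "k"), (13, "ae"), (16, "t"), (30, "t")]

score = convolve_sparse(EVENTS, CAT_FILTERS, n_frames=40)
hits = detect(score, threshold=2.5)
```

In this toy setup the three filters only reinforce one another at the frame where the phone sequence k, ae, t ends, so the isolated "t" at frame 30 stays below the threshold.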
System and method for processing speech to identify keywords or other information

Publication number: 20150371635

Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.

Type: Application

Filed: August 31, 2015

Publication date: December 24, 2015

Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
System and method for processing speech to identify keywords or other information

Patent number: 9177547

Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.

Type: Grant

Filed: June 25, 2013

Date of Patent: November 3, 2015

Assignee: The Johns Hopkins University

Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
System and method for efficient signal processing to identify and understand speech

Publication number: 20140379347

Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and control operation of the electronic system based on the keywords identified.

Type: Application

Filed: June 25, 2013

Publication date: December 25, 2014

Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands

Patent number: 8428957

Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.

Type: Grant

Filed: August 22, 2008

Date of Patent: April 23, 2013

Assignee: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
Processing of excitation in audio coding and decoding

Patent number: 8392176

Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated and transformed into a time domain signal. Through the process of heterodyning, the time domain signal is frequency shifted toward the baseband level as a downshifted carrier signal. Quantized values of the all-pole model and the frequency transform of the downshifted carrier signal are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.

Type: Grant

Filed: April 5, 2007

Date of Patent: March 5, 2013

Assignee: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
Spectral noise shaping in audio coding based on spectral dynamics in frequency sub-bands

Publication number: 20110270616

Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.

Type: Application

Filed: August 22, 2008

Publication date: November 3, 2011

Applicant: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
Signal coding and decoding based on spectral dynamics

Patent number: 8027242

Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.

Type: Grant

Filed: October 18, 2006

Date of Patent: September 27, 2011

Assignee: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
System and method for computing and transmitting parameters in a distributed voice recognition system

Publication number: 20110153326

Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server. The system also includes a module to generate additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and providing the same to the speech server.

Type: Application

Filed: February 9, 2011

Publication date: June 23, 2011

Applicant: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Hynek Hermansky, Lukas Burget, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas, Stephane N. Dupont, Maria Carmen Benitez Ortuzar, Nelson H. Morgan
Systems and methods for speech recognition using frequency domain linear prediction polynomials to form temporal and spectral envelopes from frequency domain representations of signals

Patent number: 7672838

Abstract: In accordance with the present invention, computer implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation is performed. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.

Type: Grant

Filed: December 1, 2004

Date of Patent: March 2, 2010

Assignee: The Trustees of Columbia University in the City of New York

Inventors: Marios Athineos, Hynek Hermansky, Daniel P. W. Ellis
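The abstract above outlines three steps: transform the signal to the frequency domain, run linear prediction on the frequency-domain representation, and read the resulting all-pole model as an estimate of the temporal envelope. The sketch below walks through those steps in plain Python under assumptions of its own: a DCT-II is used as the transform, the predictor is fitted with the autocorrelation method and the Levinson-Durbin recursion, and the function names and model order are hypothetical rather than taken from the patent.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: the time-to-frequency transform assumed here."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def autocorr(c, order):
    """Autocorrelation of the frequency-domain sequence up to lag `order`."""
    return [sum(c[i] * c[i + lag] for i in range(len(c) - lag))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns predictor polynomial and error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """All-pole fit to the DCT sequence, evaluated along the time axis."""
    a, err = levinson(autocorr(dct2(x), order), order)
    n = len(x)
    env = []
    for t in range(n):
        w = math.pi * (t + 0.5) / n  # time sample t maps to angle w
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(err / (re * re + im * im))
    return env


# Toy input: a short tone burst centred on sample 60 of 128.
burst = [math.exp(-((t - 60) / 6.0) ** 2) * math.cos(0.4 * math.pi * t)
         for t in range(128)]
envelope = fdlp_envelope(burst, order=8)
peak = max(range(len(envelope)), key=envelope.__getitem__)
```

The design point is the duality with ordinary linear prediction: a predictor fitted to time samples models the spectral envelope, so a predictor fitted to frequency samples models the temporal envelope. In this toy case the estimated envelope should peak close to the centre of the burst.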
Temporal masking in audio coding based on spectral dynamics in frequency sub-bands

Publication number: 20090198500

Abstract: An audio coding technique based on modeling spectral dynamics is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. Each sub-band is then frequency transformed and linear prediction is applied. This results in a Hilbert envelope and a Hilbert Carrier for each of the sub-bands. Because of application of linear prediction to frequency components, the technique is called Frequency Domain Linear Prediction (FDLP). The Hilbert envelope and the Hilbert Carrier are analogous to spectral envelope and excitation signals in the Time Domain Linear Prediction (TDLP) techniques. Temporal masking is applied to the FDLP sub-bands to improve the compression efficiency. Specifically, forward masking of the sub-band FDLP carrier signal can be employed to improve compression efficiency of an encoded signal.

Type: Application

Filed: August 22, 2008

Publication date: August 6, 2009

Applicant: QUALCOMM Incorporated

Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
Signal coding and decoding based on spectral dynamics

Publication number: 20080031365

Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.

Type: Application

Filed: October 18, 2006

Publication date: February 7, 2008

Inventors: Harinath Garudadri, Naveen Srinivasamurthy, Petr Motlicek, Hynek Hermansky
Nonlinear mapping for feature extraction in automatic speech recognition

Patent number: 7254538

Abstract: The present invention successfully combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling (GMM). By training one or more neural networks to generate subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, substantial error rate reductions may be achieved. The present invention effectively has two acoustic models in tandem—first a neural net and then a GMM. By using a variety of combination schemes available for connectionist models, various systems based upon multiple features streams can be constructed with even greater error rate reductions.

Type: Grant

Filed: November 16, 2000

Date of Patent: August 7, 2007

Assignee: International Computer Science Institute

Inventors: Hynek Hermansky, Sangita Sharma, Daniel Ellis
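The tandem arrangement in the abstract above, a neural network whose transformed posterior estimates become the features of a Gaussian-mixture model, can be sketched as follows. This is only an illustration: the logarithm is assumed as the transformation (the abstract does not name one), the network outputs are made-up numbers, and both function names are hypothetical.

```python
import math


def log_posteriors(logits):
    """Softmax over subword classes, returned in the log domain."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_norm for v in logits]


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    terms = []
    for w, mean, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mean, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(ll)
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Stage 1: hypothetical network outputs for one frame, three subword classes.
features = log_posteriors([2.0, 0.5, -1.0])

# Stage 2: score the transformed posteriors with a one-component toy GMM
# whose mean is placed exactly on the feature vector.
frame_score = gmm_log_likelihood(
    features,
    weights=[1.0],
    means=[features],
    variances=[[1.0, 1.0, 1.0]],
)
```

In a real system the GMM parameters would be trained on the transformed posteriors; here they are fixed by hand so that the two stages can be seen end to end.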
Multistream network feature processing for a distributed speech recognition system

Patent number: 7089178

Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.

Type: Grant

Filed: April 30, 2002

Date of Patent: August 8, 2006

Assignee: Qualcomm Inc.

Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
Distributed voice recognition system utilizing multistream network feature processing

Publication number: 20030204394

Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.

Type: Application

Filed: April 30, 2002

Publication date: October 30, 2003

Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
System and method for computing and transmitting parameters in a distributed voice recognition system

Publication number: 20030004720

Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server.

Type: Application

Filed: January 28, 2002

Publication date: January 2, 2003

Inventors: Harinath Garudadri, Hynek Hermansky, Lukas Burget, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas, Stephane N. Dupont, Maria Carmen Benitez Ortuzar, Nelson H. Morgan
Method and system for adaptive speech enhancement using frequency specific signal-to-noise ratio estimates

Patent number: 6098038

Abstract: A method and system for adaptively filtering a speech signal in order to suppress noise in the signal. The method includes decomposing the signal into multiple frequency subbands, each having a center frequency, estimating a signal-to-noise ratio for each subband, and providing multiple filters, each filter designed for one of a number of selected signal-to-noise ratios independent of the center frequencies of the subbands. The method also includes selecting a filter for filtering each subband, where the filter selected depends on the signal-to-noise ratio estimated for the subband, filtering each subband according to the filter selected, and combining the filtered subbands to provide an estimated filtered speech signal. The system includes appropriate hardware and software for performing the method.

Type: Grant

Filed: September 27, 1996

Date of Patent: August 1, 2000

Assignee: Oregon Graduate Institute of Science & Technology

Inventors: Hynek Hermansky, Carlos M. Avendano
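The method in the abstract above (estimate a signal-to-noise ratio per subband, pick the filter designed for that ratio regardless of the subband's center frequency, filter, then recombine) can be sketched as below. Everything specific is an assumption made for illustration: the subbands are taken as already decomposed, the noise level is estimated from the first few samples, the three filters are made-up FIR taps, and the subbands are recombined by simple summation.

```python
import math

# Hypothetical filter bank: one FIR filter per design SNR (in dB), shared by
# all subbands regardless of their center frequency.
FILTER_BANK = {
    0.0: [0.1, 0.1, 0.1],     # low SNR: smooth and attenuate strongly
    10.0: [0.25, 0.5, 0.25],  # medium SNR: mild smoothing
    20.0: [0.0, 1.0, 0.0],    # high SNR: pass the subband through unchanged
}


def estimate_snr_db(band, noise_frames=4):
    """Assume the first `noise_frames` samples contain noise only."""
    noise = sum(v * v for v in band[:noise_frames]) / noise_frames
    total = sum(v * v for v in band) / len(band)
    signal = max(total - noise, 1e-12)
    return 10.0 * math.log10(signal / max(noise, 1e-12))


def nearest_design_snr(snr_db):
    """Pick the filter whose design SNR is closest to the estimate."""
    return min(FILTER_BANK, key=lambda design: abs(design - snr_db))


def apply_fir(band, taps):
    """Zero-padded, centered FIR filtering along the subband signal."""
    half = len(taps) // 2
    out = []
    for n in range(len(band)):
        acc = 0.0
        for k, w in enumerate(taps):
            i = n + k - half
            if 0 <= i < len(band):
                acc += w * band[i]
        out.append(acc)
    return out


def enhance(subbands):
    """Filter each subband with its selected filter, then sum the subbands."""
    filtered = [apply_fir(b, FILTER_BANK[nearest_design_snr(estimate_snr_db(b))])
                for b in subbands]
    return [sum(frame) for frame in zip(*filtered)]


strong = [0.01] * 4 + [1.0] * 8  # quiet lead-in, then a strong component
weak = [0.1, -0.1] * 6           # never rises above its own noise level
enhanced = enhance([strong, weak])
```

With these toy inputs the strong subband is matched to the pass-through filter and the weak one to the heavily attenuating filter, which is the behavior the per-subband selection is meant to produce.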
Method and system for generating an estimated clean speech signal from a noisy speech signal

Patent number: 5878389

Abstract: A method and system for generating an estimate of a clean speech signal extracts time trajectories of short-term parameters from a noisy speech signal to obtain a plurality of frequency components each having a magnitude spectrum and a phase spectrum. The magnitude spectrum is then compressed, filtered and then decompressed to obtain a modified magnitude spectrum. The speech signal is then reconstructed using the original phase spectrum and the modified magnitude spectrum.

Type: Grant

Filed: June 28, 1995

Date of Patent: March 2, 1999

Assignee: Oregon Graduate Institute of Science & Technology

Inventors: Hynek Hermansky, Eric A. Wan, Carlos M. Avendano
Noise resistant auditory model for parametrization of speech

Patent number: 5537647

Abstract: A method and system are provided for alleviating the harmful effects of convolutional and additive noise in speech, such as due to the environmental noise and linear spectral modification, based on the filtering of time trajectories of an auditory-like spectrum in a particular spectral domain.

Type: Grant

Filed: November 5, 1992

Date of Patent: July 16, 1996

Assignees: U S West Advanced Technologies, Inc., International Computer Science Institute

Inventors: Hynek Hermansky, Nelson H. Morgan
Auditory model for parametrization of speech

Patent number: 5450522

Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.

Type: Grant

Filed: August 19, 1991

Date of Patent: September 12, 1995

Assignees: U S West Advanced Technologies, Inc., International Computer Science Institute

Inventors: Hynek Hermansky, Nelson H. Morgan, Philip D. Kohn
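The abstract above relies on filtering the time trajectories of a spectrum to suppress convolutional distortion. A short sketch of why that can work: a fixed channel multiplies each spectral band by a constant, which becomes an additive constant after taking the logarithm, and any filter with zero gain at zero frequency removes it. The filter taps, the function name, and the toy trajectory below are illustrative assumptions and are not taken from the patent.

```python
import math

# Illustrative FIR taps (an assumption). They sum to zero, so a constant
# offset in the log domain is cancelled.
TAPS = [0.2, 0.1, 0.0, -0.1, -0.2]


def filter_log_trajectory(power):
    """Filter the time trajectory of one spectral band in the log domain."""
    logs = [math.log(p) for p in power]
    out = []
    for t in range(len(logs)):
        # Missing past samples are replaced by the first sample.
        out.append(sum(w * logs[max(t - k, 0)] for k, w in enumerate(TAPS)))
    return out


# Toy band energy over 50 frames, and the same band seen through a
# hypothetical channel that scales it by a fixed gain of 4.
clean = [1.0 + 0.5 * math.sin(0.3 * t) for t in range(50)]
distorted = [4.0 * p for p in clean]

clean_out = filter_log_trajectory(clean)
distorted_out = filter_log_trajectory(distorted)
max_diff = max(abs(a - b) for a, b in zip(clean_out, distorted_out))
```

The filtered trajectories of the clean and the channel-distorted band agree up to rounding error, so a recognizer working on the filtered features would not see the channel gain.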
