Patents by Inventor Hynek Hermansky
Hynek Hermansky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9799333Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based a result of the convolution and control operation the electronic system based on the keywords identified.Type: GrantFiled: August 31, 2015Date of Patent: October 24, 2017Assignee: The Johns Hopkins UniversityInventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Publication number: 20150371635Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based a result of the convolution and control operation the electronic system based on the keywords identified.Type: ApplicationFiled: August 31, 2015Publication date: December 24, 2015Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Patent number: 9177547Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based a result of the convolution and control operation the electronic system based on the keywords identified.Type: GrantFiled: June 25, 2013Date of Patent: November 3, 2015Assignee: THE JOHNS HOPKINS UNIVERSITYInventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Publication number: 20140379347Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, and access the database of keywords and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based a result of the convolution and control operation the electronic system based on the keywords identified.Type: ApplicationFiled: June 25, 2013Publication date: December 25, 2014Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
-
Patent number: 8428957Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.Type: GrantFiled: August 22, 2008Date of Patent: April 23, 2013Assignee: QUALCOMM IncorporatedInventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
-
Patent number: 8392176Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulted from the scheme are estimated and transformed into a time domain signal. Through the process of heterodyning, the time domain signal is frequency shifted toward the baseband level as a downshifted carrier signal. Quantized values of the all-pole model and the frequency transform of the downshifted carrier signal are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.Type: GrantFiled: April 5, 2007Date of Patent: March 5, 2013Assignee: QUALCOMM IncorporatedInventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
-
Publication number: 20110270616Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.Type: ApplicationFiled: August 22, 2008Publication date: November 3, 2011Applicant: QUALCOMM IncorporatedInventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
-
Patent number: 8027242Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulted from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.Type: GrantFiled: October 18, 2006Date of Patent: September 27, 2011Assignee: QUALCOMM IncorporatedInventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
-
Publication number: 20110153326Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server. The system also includes a module to generate additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and providing the same to the speech server.Type: ApplicationFiled: February 9, 2011Publication date: June 23, 2011Applicant: QUALCOMM INCORPORATEDInventors: HARINATH GARUDADRI, HYNEK HERMANSKY, LUKAS BURGET, PRATIBHA JAIN, SACHIN KAJAREKAR, SUNIL SIVADAS, STEPHANE N. DUPONT, MARIA CARMEN BENITEZ ORTUZAR, NELSON H. MORGAN
-
Patent number: 7672838Abstract: In accordance with the present invention, computer implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation on at least a portion of the received signal to generate a frequency domain representation is performed. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.Type: GrantFiled: December 1, 2004Date of Patent: March 2, 2010Assignee: The Trustees of Columbia University in the City of New YorkInventors: Marios Athineos, Hynek Hermansky, Daniel P. W. Ellis
-
Publication number: 20090198500Abstract: An audio coding technique based on modeling spectral dynamics is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow critical bands of human auditory system decomposition. Each sub-band is then frequency transformed and linear prediction is applied. This results in a Hilbert envelope and a Hilbert Carrier for each of the sub-bands. Because of application of linear prediction to frequency components, the technique is called Frequency Domain Linear Prediction (FDLP). The Hilbert envelope and the Hilbert Carrier are analogous to spectral envelope and excitation signals in the Time Domain Linear Prediction (TDLP) techniques. Temporal masking is applied to the FDLP sub-bands to improve the compression efficiency. Specifically, forward masking of the sub-band FDLP carrier signal can be employed to improve compression efficiency of an encoded signal.Type: ApplicationFiled: August 22, 2008Publication date: August 6, 2009Applicant: QUALCOMM IncorporatedInventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
-
Publication number: 20080031365Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulted from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is basically the reverse of the encoding process.Type: ApplicationFiled: October 18, 2006Publication date: February 7, 2008Inventors: Harinath Garudadri, Naveen Srinivasamurthy, Petr Motlicek, Hynek Hermansky
-
Patent number: 7254538Abstract: The present invention successfully combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling (GMM). By training one or more neural networks to generate subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, substantial error rate reductions may be achieved. The present invention effectively has two acoustic models in tandem—first a neural net and then a GMM. By using a variety of combination schemes available for connectionist models, various systems based upon multiple features streams can be constructed with even greater error rate reductions.Type: GrantFiled: November 16, 2000Date of Patent: August 7, 2007Assignee: International Computer Science InstituteInventors: Hynek Hermansky, Sangita Sharma, Daniel Ellis
-
Patent number: 7089178Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provide benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.Type: GrantFiled: April 30, 2002Date of Patent: August 8, 2006Assignee: Qualcomm Inc.Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
-
Publication number: 20030204394Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provide benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.Type: ApplicationFiled: April 30, 2002Publication date: October 30, 2003Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
-
Publication number: 20030004720Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server . The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server .Type: ApplicationFiled: January 28, 2002Publication date: January 2, 2003Inventors: Harinath Garudadri, Hynek Hermansky, Lukas Burget, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas, Stephane N. Dupont, Maria Carmen Benitez Ortuzar, Nelson H. Morgan
-
Patent number: 6098038Abstract: A method and system for adaptively filtering a speech signal in order to suppress noise in the signal. The method includes decomposing the signal into multiple frequency subbands, each having a center frequency, estimating a signal-to-noise ratio for each subband, and providing multiple filters, each filter designed for one of a number of selected signal-to-noise ratio independent of the center frequencies of the subbands. The method also includes selecting a filter for filtering each subband, where the filter selected depends on the signal-to-noise ratio estimated for the subband, filtering each subband according to the filter selected, and combining the filtered subbands to provide an estimated filtered speech signal. The system includes appropriate hardware and software for performing the method.Type: GrantFiled: September 27, 1996Date of Patent: August 1, 2000Assignee: Oregon Graduate Institute of Science & TechnologyInventors: Hynek Hermansky, Carlos M. Avendano
-
Patent number: 5878389Abstract: A method and system for generating an estimate of a clean speech signal extracts time trajections of short-term parameters from a noisy speech signal to obtain a plurality of frequency components each having a magnitude spectrum and a phase spectrum. The magnitude spectrum is then compressed, filtered and then decompressed to obtain a modified magnitude spectrum. The speech signal is then reconstructed using the original phase spectrum and the modified magnitude spectrum.Type: GrantFiled: June 28, 1995Date of Patent: March 2, 1999Assignee: Oregon Graduate Institute of Science & TechnologyInventors: Hynek Hermansky, Eric A. Wan, Carlos M. Avendano
-
Patent number: 5537647Abstract: A method and system are provided for alleviating the harmful effects of convolutional and additive noise in speech, such as due to the environmental noise and linear spectral modification, based on the filtering of time trajectories of an auditory-like spectrum in a particular spectral domain.Type: GrantFiled: November 5, 1992Date of Patent: July 16, 1996Assignees: U S West Advanced Technologies, Inc., International Computer Science InstituteInventors: Hynek Hermansky, Nelson H. Morgan
-
Patent number: 5450522Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.Type: GrantFiled: August 19, 1991Date of Patent: September 12, 1995Assignees: U S West Advanced Technologies, Inc., International Computer Science InstituteInventors: Hynek Hermansky, Nelson H. Morgan, Philip D. Kohn