Patents by Inventor Hynek Hermansky

Hynek Hermansky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9799333
    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, access the database of keywords, and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.
    Type: Grant
    Filed: August 31, 2015
    Date of Patent: October 24, 2017
    Assignee: The Johns Hopkins University
    Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
  • Publication number: 20150371635
    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, access the database of keywords, and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.
    Type: Application
    Filed: August 31, 2015
    Publication date: December 24, 2015
    Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
  • Patent number: 9177547
    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, access the database of keywords, and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: November 3, 2015
    Assignee: The Johns Hopkins University
    Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
  • Publication number: 20140379347
    Abstract: A system and method are provided for performing speech processing. A system includes an audio detection system configured to receive a signal including speech and a memory having stored therein a database of keyword models forming an ensemble of filters associated with each keyword in the database. A processor is configured to receive the signal including speech from the audio detection system, decompose the signal including speech into a sparse set of phonetic impulses, access the database of keywords, and convolve the sparse set of phonetic impulses with the ensemble of filters. The processor is further configured to identify keywords within the signal including speech based on a result of the convolution and to control operation of the electronic system based on the keywords identified.
    Type: Application
    Filed: June 25, 2013
    Publication date: December 25, 2014
    Inventors: Keith Kintzley, Aren Jansen, Hynek Hermansky, Kenneth Church
  • Patent number: 8428957
    Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow the critical-band decomposition of the human auditory system. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal, followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.
    Type: Grant
    Filed: August 22, 2008
    Date of Patent: April 23, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
  • Patent number: 8392176
    Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated and transformed into a time domain signal. Through heterodyning, the time domain signal is frequency-shifted toward baseband as a downshifted carrier signal. Quantized values of the all-pole model and the frequency transform of the downshifted carrier signal are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is essentially the reverse of the encoding process.
    Type: Grant
    Filed: April 5, 2007
    Date of Patent: March 5, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
  • Publication number: 20110270616
    Abstract: A technique of spectral noise shaping in an audio coding system is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow the critical-band decomposition of the human auditory system. The tonality of each sub-band is determined. If a sub-band is tonal, time domain linear prediction (TDLP) processing is applied to the sub-band, yielding a residual signal and linear predictive coding (LPC) coefficients of an all-pole model representing the sub-band signal. The residual signal is further processed using a frequency domain linear prediction (FDLP) method. The FDLP parameters and LPC coefficients are transferred to a decoder. At the decoder, an inverse-FDLP process is applied to the encoded residual signal, followed by an inverse TDLP process, which shapes the quantization noise according to the power spectral density of the original sub-band signal. Non-tonal sub-band signals bypass the TDLP process.
    Type: Application
    Filed: August 22, 2008
    Publication date: November 3, 2011
    Applicant: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
  • Patent number: 8027242
    Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is essentially the reverse of the encoding process.
    Type: Grant
    Filed: October 18, 2006
    Date of Patent: September 27, 2011
    Assignee: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Naveen B. Srinivasamurthy, Petr Motlicek, Hynek Hermansky
  • Publication number: 20110153326
    Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector, including predetermined portions of the voice activity detection indication and extracted features, from the subscriber unit to the server. The system also includes a module that generates additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and provides them to the speech server.
    Type: Application
    Filed: February 9, 2011
    Publication date: June 23, 2011
    Applicant: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Hynek Hermansky, Lukas Burget, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas, Stephane N. Dupont, Maria Carmen Benitez Ortuzar, Nelson H. Morgan
  • Patent number: 7672838
    Abstract: In accordance with the present invention, computer-implemented methods and systems are provided for representing and modeling the temporal structure of audio signals. In response to receiving a signal, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation. The time-to-frequency domain transformation converts the signal from a time domain representation to the frequency domain representation. A frequency domain linear prediction (FDLP) is performed on the frequency domain representation to estimate a temporal envelope of the frequency domain representation. Based on the temporal envelope, one or more speech features are generated.
    Type: Grant
    Filed: December 1, 2004
    Date of Patent: March 2, 2010
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Marios Athineos, Hynek Hermansky, Daniel P. W. Ellis
  • Publication number: 20090198500
    Abstract: An audio coding technique based on modeling spectral dynamics is disclosed. Frequency decomposition of an input audio signal is performed to obtain multiple frequency sub-bands that closely follow the critical-band decomposition of the human auditory system. Each sub-band is then frequency transformed and linear prediction is applied, which yields a Hilbert envelope and a Hilbert carrier for each sub-band. Because linear prediction is applied to frequency components, the technique is called Frequency Domain Linear Prediction (FDLP). The Hilbert envelope and the Hilbert carrier are analogous to the spectral envelope and excitation signal in Time Domain Linear Prediction (TDLP) techniques. Temporal masking is applied to the FDLP sub-bands to improve compression efficiency; specifically, forward masking of the sub-band FDLP carrier signal can be employed to improve the compression efficiency of an encoded signal.
    Type: Application
    Filed: August 22, 2008
    Publication date: August 6, 2009
    Applicant: QUALCOMM Incorporated
    Inventors: Harinath Garudadri, Sriram Ganapathy, Petr Motlicek, Hynek Hermansky
  • Publication number: 20080031365
    Abstract: In an apparatus and method, time-varying signals are processed and encoded via a frequency domain linear prediction (FDLP) scheme to arrive at an all-pole model. Residual signals resulting from the scheme are estimated. Quantized values of the all-pole model and the residual signals are packetized as encoded signals suitable for transmission or storage. To reconstruct the time-varying signals, the encoded signals are decoded. The decoding process is essentially the reverse of the encoding process.
    Type: Application
    Filed: October 18, 2006
    Publication date: February 7, 2008
    Inventors: Harinath Garudadri, Naveen Srinivasamurthy, Petr Motlicek, Hynek Hermansky
  • Patent number: 7254538
    Abstract: The present invention successfully combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling (GMM). By training one or more neural networks to generate subword probability posteriors, then using transformations of these estimates as the base features for a conventionally trained Gaussian-mixture-based system, substantial error rate reductions may be achieved. The present invention effectively has two acoustic models in tandem: first a neural net and then a GMM. By using a variety of combination schemes available for connectionist models, various systems based upon multiple feature streams can be constructed with even greater error rate reductions.
    Type: Grant
    Filed: November 16, 2000
    Date of Patent: August 7, 2007
    Assignee: International Computer Science Institute
    Inventors: Hynek Hermansky, Sangita Sharma, Daniel Ellis
  • Patent number: 7089178
    Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: August 8, 2006
    Assignee: Qualcomm Inc.
    Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
  • Publication number: 20030204394
    Abstract: A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
    Type: Application
    Filed: April 30, 2002
    Publication date: October 30, 2003
    Inventors: Harinath Garudadri, Sunil Sivadas, Hynek Hermansky, Nelson H. Morgan, Charles C. Wooters, Andre Gustavo Adami, Maria Carmen Benitez Ortuzar, Lukas Burget, Stephane N. Dupont, Frantisek Grezl, Pratibha Jain, Sachin Kajarekar, Petr Motlicek
  • Publication number: 20030004720
    Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection (VAD) module that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector, including predetermined portions of the voice activity detection indication and extracted features, from the subscriber unit to the server.
    Type: Application
    Filed: January 28, 2002
    Publication date: January 2, 2003
    Inventors: Harinath Garudadri, Hynek Hermansky, Lukas Burget, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas, Stephane N. Dupont, Maria Carmen Benitez Ortuzar, Nelson H. Morgan
  • Patent number: 6098038
    Abstract: A method and system for adaptively filtering a speech signal in order to suppress noise in the signal. The method includes decomposing the signal into multiple frequency subbands, each having a center frequency, estimating a signal-to-noise ratio for each subband, and providing multiple filters, each designed for one of a number of selected signal-to-noise ratios independent of the center frequencies of the subbands. The method also includes selecting a filter for each subband, where the filter selected depends on the signal-to-noise ratio estimated for that subband, filtering each subband with the selected filter, and combining the filtered subbands to provide an estimated filtered speech signal. The system includes appropriate hardware and software for performing the method.
    Type: Grant
    Filed: September 27, 1996
    Date of Patent: August 1, 2000
    Assignee: Oregon Graduate Institute of Science & Technology
    Inventors: Hynek Hermansky, Carlos M. Avendano
  • Patent number: 5878389
    Abstract: A method and system for generating an estimate of a clean speech signal extracts time trajectories of short-term parameters from a noisy speech signal to obtain a plurality of frequency components, each having a magnitude spectrum and a phase spectrum. The magnitude spectrum is then compressed, filtered, and decompressed to obtain a modified magnitude spectrum. The speech signal is then reconstructed using the original phase spectrum and the modified magnitude spectrum.
    Type: Grant
    Filed: June 28, 1995
    Date of Patent: March 2, 1999
    Assignee: Oregon Graduate Institute of Science & Technology
    Inventors: Hynek Hermansky, Eric A. Wan, Carlos M. Avendano
  • Patent number: 5537647
    Abstract: A method and system are provided for alleviating the harmful effects of convolutional and additive noise in speech, such as those due to environmental noise and linear spectral modification, based on filtering the time trajectories of an auditory-like spectrum in a particular spectral domain.
    Type: Grant
    Filed: November 5, 1992
    Date of Patent: July 16, 1996
    Assignees: U S West Advanced Technologies, Inc., International Computer Science Institute
    Inventors: Hynek Hermansky, Nelson H. Morgan
  • Patent number: 5450522
    Abstract: A method and system are provided for alleviating the harmful effects of convolutional distortions of speech, such as the effect of a telecommunication channel, on the performance of an automatic speech recognizer (ASR). The technique is based on the filtering of time trajectories of an auditory-like spectrum derived from the Perceptual Linear Predictive (PLP) method of speech parameter estimation.
    Type: Grant
    Filed: August 19, 1991
    Date of Patent: September 12, 1995
    Assignees: U S West Advanced Technologies, Inc., International Computer Science Institute
    Inventors: Hynek Hermansky, Nelson H. Morgan, Philip D. Kohn
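Several of the FDLP abstracts above (for example, Patent number 7672838) describe estimating a temporal envelope by applying linear prediction to a frequency-domain representation of the signal instead of to the time samples. The following is a minimal pure-Python sketch of that idea. It assumes a DCT-II as the time-to-frequency transform and the autocorrelation method with Levinson-Durbin recursion for the linear prediction. Neither choice is stated in the abstracts, and all function names are hypothetical.

```python
import cmath
import math


def dct_ii(x):
    """Unnormalized DCT-II: c[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    size = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (n + 0.5) / size) for n in range(size))
        for k in range(size)
    ]


def autocorrelation(seq, max_lag):
    """Biased autocorrelation r[m] = sum_k seq[k] * seq[k + m], m = 0..max_lag."""
    return [
        sum(seq[k] * seq[k + m] for k in range(len(seq) - m))
        for m in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Return (a, err) for the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order (autocorrelation method)."""
    if r[0] <= 0.0:
        raise ValueError("input has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order):
    """Approximate the temporal (squared Hilbert) envelope of x by fitting an
    all-pole model to the DCT of x, i.e. linear prediction across frequency."""
    size = len(x)
    a, err = levinson_durbin(autocorrelation(dct_ii(x), order), order)
    envelope = []
    for n in range(size):
        # Time index n plays the role of "frequency" for the all-pole model.
        theta = math.pi * (n + 0.5) / size
        response = sum(a[j] * cmath.exp(-1j * j * theta) for j in range(order + 1))
        envelope.append(err / abs(response) ** 2)
    return envelope
```

An impulse in time becomes a cosine across DCT coefficients, and the all-pole model places a resonance at that cosine's "frequency." As a result, the returned envelope should peak near the impulse's position. This is the duality FDLP relies on: time-domain LP models the spectral envelope, and frequency-domain LP models the temporal envelope.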
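Patents 5450522 and 5537647 describe reducing convolutional distortion by filtering the time trajectories of an auditory-like spectrum. The following is a simplified sketch of that principle. It takes the log of each band's energy, so a fixed channel gain becomes an additive constant, then band-pass filters each band over time with H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1). This is the RASTA filter commonly quoted in the published literature. The abstracts give no coefficients, so treat the filter, the pole value, the plain log domain, and the function names as assumptions. The rest of a PLP front end is omitted.

```python
import math


def rasta_filter_trajectory(traj, pole=0.98):
    """Filter one band's time trajectory with the (assumed) causal RASTA filter
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)."""
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # numerator taps; they sum to zero
    out = []
    prev_y = 0.0
    for t in range(len(traj)):
        acc = sum(b[i] * traj[t - i] for i in range(len(b)) if t - i >= 0)
        prev_y = acc + pole * prev_y  # single-pole recursion
        out.append(prev_y)
    return out


def rasta_log_spectrum(frames, pole=0.98):
    """frames: list of frames, each a list of positive band energies.
    Returns the filtered log-band trajectories in frame-major order."""
    n_bands = len(frames[0])
    # Log compression turns a multiplicative channel gain into an additive offset.
    log_traj = [[math.log(frame[band]) for frame in frames] for band in range(n_bands)]
    filtered = [rasta_filter_trajectory(tr, pole) for tr in log_traj]
    return [[filtered[band][t] for band in range(n_bands)] for t in range(len(frames))]
```

Because the numerator taps sum to zero, the filter has zero gain at DC. A constant per-band channel gain therefore appears in the output only as a transient that decays at the rate set by the pole.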
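Patent number 9799333 and its related filings describe identifying keywords by convolving a sparse set of phonetic impulses with an ensemble of filters associated with each keyword. The following sketch shows only that scoring step, under several assumptions:

- Impulses are given as frame indices per phone label.
- Each keyword model maps phone labels to a hypothetical filter, a list of weights indexed by lag.
- A detection is any local maximum of the summed score at or above a threshold.

Filter training and impulse extraction from audio are not covered, and all names are hypothetical.

```python
def keyword_score(impulses, filters, n_frames):
    """impulses: {phone: [frame indices]}; filters: {phone: [weight per lag]}.
    Returns score[t] = sum over phones of (impulse train * phone filter)[t]."""
    score = [0.0] * n_frames
    for phone, times in impulses.items():
        filt = filters.get(phone)
        if filt is None:
            continue  # this phone has no filter in the keyword model
        for t0 in times:
            for lag, weight in enumerate(filt):
                if t0 + lag < n_frames:
                    score[t0 + lag] += weight
    return score


def detect_keywords(impulses, keyword_models, n_frames, threshold):
    """Return sorted (frame, keyword) pairs where a keyword's score is a
    local maximum at or above threshold."""
    hits = []
    for word, filters in keyword_models.items():
        s = keyword_score(impulses, filters, n_frames)
        for t in range(n_frames):
            left = s[t - 1] if t > 0 else float("-inf")
            right = s[t + 1] if t + 1 < n_frames else float("-inf")
            if s[t] >= threshold and s[t] > left and s[t] >= right:
                hits.append((t, word))
    return sorted(hits)
```

With single-spike filters, each phone's impulse is shifted to the frame where the keyword is expected to end. Impulses that arrive in the right order and spacing therefore pile up at one frame. Smoother, wider filters would tolerate more timing variation.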