Normalizing Patents (Class 704/234)
  • Patent number: 8112274
    Abstract: A method for recognizing a pattern that comprises a set of physical stimuli, said method comprising the steps of: providing a set of training observations and through applying a plurality of association models ascertaining various measuring values pj(k|x), j=1 . . . M, that each pertain to assigning a particular training observation to one or more associated pattern classes; setting up a log/linear association distribution by combining all association models of the plurality according to respective weight factors, and joining thereto a normalization quantity to produce a compound association distribution; optimizing said weight factors for thereby minimizing a detected error rate of the actual assigning to said compound distribution; recognizing target observations representing a target pattern with the help of said compound distribution.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: February 7, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: Peter Beyerlein
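    Illustration (not taken from the patent text): the log-linear combination step described in the abstract above can be sketched as a weighted sum of log-probabilities followed by a normalization over classes. The function name, the list-based inputs, and the assumption that every p_j(k|x) is strictly positive are hypothetical choices for this sketch; optimizing the weight factors against the error rate is not shown.

```python
import math

def log_linear_combine(model_probs, weights):
    """Combine M per-model class scores p_j(k|x) into one distribution.

    model_probs: list of M lists, each holding p_j(k|x) for every class k
                 (assumed strictly positive in this sketch).
    weights:     list of M weight factors, one per model.
    """
    num_classes = len(model_probs[0])
    # Weighted sum of log-probabilities per class (the log-linear score).
    scores = [
        sum(w * math.log(p[k]) for w, p in zip(weights, model_probs))
        for k in range(num_classes)
    ]
    # Normalization quantity: log-sum-exp over classes, computed stably.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [math.exp(s - log_z) for s in scores]

# Two models, two classes, illustrative weights.
combined = log_linear_combine([[0.7, 0.3], [0.4, 0.6]], [1.0, 0.5])
```

    With a weight of zero on the second model, the combined distribution reduces to the first model's own values.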
  • Patent number: 8090579
    Abstract: A system and method are described for recognizing repeated audio material within at least one media stream without prior knowledge of the nature of the repeated material. The system and method are able to create a screening database from the media stream or streams. An unknown sample audio fragment is taken from the media stream and compared against the screening database to find if there are matching fragments within the media streams by determining if the unknown sample matches any samples in the screening database.
    Type: Grant
    Filed: February 8, 2006
    Date of Patent: January 3, 2012
    Assignee: Landmark Digital Services
    Inventors: David L. DeBusk, Darren P. Briggs, Michael Karliner, Richard Wing Cheong Tang, Avery Li-Chun Wang
  • Patent number: 8064699
    Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.
    Type: Grant
    Filed: September 24, 2009
    Date of Patent: November 22, 2011
    Assignee: Infineon Technologies AG
    Inventors: Werner Hemmert, Marcus Holmberg
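    Illustration (not taken from the patent text): the abstract above names two operations, high-pass filtering of intermediate feature vectors and adding a prescribed vector to each result. A minimal sketch follows, assuming a first-order high-pass filter applied along the time axis; the filter type, the coefficient alpha, and the function name are assumptions, since the abstract does not specify them.

```python
def highpass_and_offset(frames, offset, alpha=0.95):
    """frames: list of feature vectors over time; offset: vector added to each."""
    out = []
    prev_in = frames[0]
    prev_out = [0.0] * len(frames[0])
    for frame in frames:
        # First-order high-pass along the time axis, per feature dimension.
        filtered = [
            alpha * (po + x - pi)
            for po, x, pi in zip(prev_out, frame, prev_in)
        ]
        # Add the prescribed addition vector to the filtered features.
        out.append([f + o for f, o in zip(filtered, offset)])
        prev_in, prev_out = frame, filtered
    return out

# A constant input has no high-frequency content, so only the offset remains.
steady = highpass_and_offset([[2.0, 5.0]] * 4, [1.0, -1.0])
```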
  • Patent number: 8036889
    Abstract: A system and method for filtering documents to determine section boundaries between dictated and non-dictated text. The system and method identifies portions of a text report that correspond to an original dictation and, correspondingly, those portions that are not part of the original dictation. The system and method include comparing tokenized and normalized forms of the original dictation and the final report, determining mismatches between the two forms, and applying machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately.
    Type: Grant
    Filed: February 27, 2006
    Date of Patent: October 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Alwin B. Carus, Larissa Lapshina, Bernardo Rechea
  • Patent number: 8027487
    Abstract: A method of setting an equalizer so as to enlarge a sound field in reproducing an audio file and a method of reproducing an audio file thereby, includes: dividing an input audio file into segments with a predetermined time length; extracting an audio feature value for each segment; determining equalizer information for reproducing each segment by the use of the extracted feature value; and determining an equalizer sequence for the audio file by the use of the determined equalizer information of each segment. Since the equalizer setting information can be automatically changed without user manipulation, the user can listen to an audio file of which the sound field is dynamically enlarged.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: September 27, 2011
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Gun-han Park
  • Patent number: 8005674
    Abstract: A recognition model set is generated. A technique is described to take advantage of the logarithm likelihood of real data for cross entropy to measure the mismatch between a training data and a training data derived model, and compare such type of mismatches between class dependent models and class independent model for evidence of model replacement. By using change of cross entropies in the decision of adding class independent Gaussian Mixture Models (GMMs), the good performance of class dependent models is largely retained, while decreasing the size and complexity of the model.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric W Janke, Bin Jia
  • Patent number: 7987090
    Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: July 26, 2011
    Assignee: Honda Motor Co., Ltd.
    Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
  • Patent number: 7970115
    Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.
    Type: Grant
    Filed: October 5, 2005
    Date of Patent: June 28, 2011
    Assignee: Avaya Inc.
    Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
  • Patent number: 7933771
    Abstract: A system and method for detecting the recognizability of an input speech signal is provided. It is designed for the pre-stage of speech recognition or a dialog system. The invention detects the user's environmental condition and verifies whether the input speech signal can be recognized. It mainly comprises an environment parameter generator, a signal recognition verifier, and a strategy response processor. Used in the pre-stage of speech recognition or a dialog system, it can precisely verify the recognizability of the input speech signal and accept input speech signals of high recognition probability in a noisy environment. This reduces the impact caused by receiving input speech signals of low recognition probability. This invention thus increases the recognition probability for a recognizer.
    Type: Grant
    Filed: March 11, 2006
    Date of Patent: April 26, 2011
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Yuan-Fu Liao, Jeng-Shien Lin
  • Patent number: 7930178
    Abstract: A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.
    Type: Grant
    Filed: December 23, 2005
    Date of Patent: April 19, 2011
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Alejandro Acero, Amarnag Subramanya, Zicheng Liu
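    Illustration (not taken from the patent text): a minimal sketch of the energy normalization described in the abstract above. It assumes the frequency components are power-spectrum values and the frame energy is the sum of squared samples; the abstract fixes neither choice, and the function name is hypothetical. Under these assumptions the result does not change when the frame is scaled by a constant gain.

```python
import cmath

def energy_normalized_spectrum(frame):
    n = len(frame)
    # Naive DFT; adequate for a short illustrative frame.
    spectrum = [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]
    power = [abs(c) ** 2 for c in spectrum]
    # Frame energy (assumed here to be the sum of squared samples).
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        raise ValueError("frame has zero energy")
    # Divide each frequency component by the frame energy.
    return [p / energy for p in power]

quiet = energy_normalized_spectrum([1.0, -2.0, 0.5, 3.0])
loud = energy_normalized_spectrum([3.0, -6.0, 1.5, 9.0])  # same frame, 3x gain
```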
  • Publication number: 20100324893
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Application
    Filed: August 26, 2010
    Publication date: December 23, 2010
    Applicant: AT&T Intellectual Property II, L.P. via transfer from AT&T Corp.
    Inventor: Mazin GILBERT
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
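    Illustration (not taken from the patent text): the backward cumulative distribution described in the abstract above can be sketched by building a histogram and accumulating it from the largest-valued bin down to the smallest. The bin layout, the function name, and the assumption that all values lie within [lo, hi] are choices made for this sketch; how the result is then used to normalize the histogram is not shown.

```python
def backward_cdf(values, num_bins, lo, hi):
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        # Values are assumed to lie in [lo, hi]; hi falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    pdf = [c / len(values) for c in counts]
    # Cumulate from the largest-valued bin down to the smallest.
    cdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):
        running += pdf[i]
        cdf[i] = running
    return cdf

cdf = backward_cdf([0.1, 0.3, 0.6, 0.9], num_bins=4, lo=0.0, hi=1.0)
```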
  • Patent number: 7831025
    Abstract: A method and system for administering a subjective listening test to remote users. A user can participate in a subjective listening test, such as an MOS listening test, over a telephone call. The telephone call is received and audio recordings are sequentially played over the telephone call. Quality ratings corresponding to the audio recordings are input by the user over the telephone call. The user can input digits corresponding to the quality ratings. This allows a user to take part in a subjective listening test without traveling to a lab.
    Type: Grant
    Filed: May 15, 2006
    Date of Patent: November 9, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: John D. Francis, Laurie F. Garrison, James H. James
  • Patent number: 7813921
    Abstract: There is provided a voice recognition device and a voice recognition method that enhance the function of noise adaptation processing in voice recognition processing and reduce the capacity of a memory being used. Acoustic models are subjected to clustering processing to calculate the centroid of each cluster and the differential vector between the centroid and each model, model composition between each kind of assumed noise model and the calculated centroid is carried out, and the centroid of each composition model and the differential vector are stored in a memory. In the actual recognition processing, the centroid optimal to the environment estimated by the utterance environmental estimation is extracted from the memory, model restoration is carried out on the extracted centroid by using the differential vector stored in the memory, and noise adaptation processing is executed on the basis of the restored model.
    Type: Grant
    Filed: March 15, 2005
    Date of Patent: October 12, 2010
    Assignee: Pioneer Corporation
    Inventors: Hajime Kobayashi, Soichi Toyama, Yasunori Suzuki
  • Patent number: 7809566
    Abstract: A method for use in automatic speech recognition corrects erroneous recognition elements within a recognition hypothesis. A user input is recognized as a correction hypothesis which contains various recognition elements. A non-deterministic alignment is performed to align at least a portion of the correction hypothesis with an earlier recognition hypothesis which also contains various recognition elements such that the recognition elements in the aligned portion of the correction hypothesis are determined to most likely correspond to a range of recognition elements in the earlier recognition hypothesis. The recognition elements in the range of recognition elements in the earlier recognition hypothesis are replaced with the recognition elements in the aligned portion of the correction hypothesis.
    Type: Grant
    Filed: October 13, 2006
    Date of Patent: October 5, 2010
    Assignee: Nuance Communications, Inc.
    Inventor: Ralf Meermeier
  • Patent number: 7797154
    Abstract: Provision to reduce production of musical noise. A noise reduction device includes: means for calculating a rank for each element included in a first region having predetermined sizes in the time axis direction and in the frequency axis direction, depending on a value of the element, in a noise section of an observed signal indicating variation of a frequency spectrum with time; means for calculating a rank for each element included in a second region, depending on a value of the element, the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal; and means for subtracting, from the values of the respective elements in the second region, values based on the values of the respective elements in the first region whose ranks correspond to ranks of respective elements in the second region.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: September 14, 2010
    Assignee: International Business Machines Corporation
    Inventor: Osamu Ichikawa
  • Patent number: 7797158
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: September 14, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 7797157
    Abstract: Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, measures of the energy comprise extreme values of the energy.
    Type: Grant
    Filed: January 10, 2005
    Date of Patent: September 14, 2010
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Igor Zlokarnik, Laurence S. Gillick, Jordan Cohen
  • Patent number: 7774132
    Abstract: In one embodiment, a navigation system provides navigation directions within particular locations within a facility, such as within a corporate campus, airport, resort, building, etc. The navigation system may respond to navigation requests for different types of facility target destinations such as a location, a person, a movable item, an event, or a condition. Different location resources can be accessed depending on the type of requested target destination. For example, an employee database may be used to locate an office within the facility associated with a navigation request that contains an employee name. A natural voice communication scheme can be used to access the navigation system through a larger variety of networks and communication devices.
    Type: Grant
    Filed: July 5, 2006
    Date of Patent: August 10, 2010
    Assignee: Cisco Technology, Inc.
    Inventor: Bradley Richard DeGrazia
  • Patent number: 7752042
    Abstract: Methods and apparatus to operate an audience metering device with voice commands are described herein. An example method to identify audience members based on voice, includes: obtaining an audio input signal including a program audio signal and a human voice signal; receiving an audio line signal from an audio output line of a monitored media device; processing the audio line signal with a filter having adaptive weights to generate a delayed and attenuated line signal; subtracting the delayed and attenuated line signal from the audio input signal to develop a residual audio signal; identifying a person that spoke to create the human voice signal based on the residual audio signal; and logging an identity of the person as an audience member.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: July 6, 2010
    Assignee: The Nielsen Company (US), LLC
    Inventor: Venugopal Srinivasan
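    Illustration (not taken from the patent text): the abstract above describes filtering the line signal with adaptive weights and subtracting the result from the microphone input to leave a residual. A minimal sketch follows, assuming a normalized LMS update; the abstract does not name the adaptation rule, and the function name, tap count, and step size are hypothetical. The synthetic input contains only a delayed, attenuated copy of the line signal, so the residual should shrink toward zero.

```python
import random

def nlms_residual(line, mic, taps=2, mu=0.5, eps=1e-8):
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent line samples, newest first
    residual = []
    for x, d in zip(line, mic):
        history = [x] + history[:-1]
        # Filter output: estimate of the program audio picked up by the mic.
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate  # residual after subtraction
        norm = eps + sum(h * h for h in history)
        # Normalized LMS weight update (an assumed adaptation rule).
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual, weights

rng = random.Random(0)
line = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Microphone hears the line signal delayed by one sample and attenuated by half.
mic = [0.0] + [0.5 * x for x in line[:-1]]
residual, weights = nlms_residual(line, mic)
```

    In a real recording the microphone signal would also contain the speaker's voice, which the filter cannot predict from the line signal and which therefore stays in the residual.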
  • Patent number: 7734472
    Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such speech recognition enhancer (51), an audio input unit (41) and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513) and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: June 8, 2010
    Assignee: Alcatel
    Inventor: Michael Walker
  • Publication number: 20100131270
    Abstract: The invention relates to a method for determining a characteristic pattern for a speech message that is supplied in the form of a numerically encoded audio signal generated by means of a sampling process.
    Type: Application
    Filed: July 13, 2007
    Publication date: May 27, 2010
    Applicant: Nokia Siemens Networks GmbH & Co.
    Inventor: Joachim Charzinski
  • Patent number: 7725314
    Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.
    Type: Grant
    Filed: February 16, 2004
    Date of Patent: May 25, 2010
    Assignee: Microsoft Corporation
    Inventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
  • Patent number: 7725316
    Abstract: A speech recognition adaptation method for a vehicle having a telematics unit with an embedded speech recognition system. Speech is received and pre-processed to generate acoustic feature vectors, and an adaptation parameter is applied to the acoustic feature vectors to yield transformed acoustic feature vectors. The transformed acoustic feature vectors are decoded and a hypothesis of the speech is selected, and the adaptation parameter is trained using acoustic feature vectors from the hypothesis. The method also includes one or more of the following steps: the speech is observed for a certain characteristic and the trained adaptation parameter is saved in accordance with the certain characteristic for use in transforming feature vectors of subsequent speech having the certain characteristic; use of the trained adaptation parameter persists from one vehicle ignition cycle to the next; and use of the trained adaptation parameter is ceased upon detection of a system fault.
    Type: Grant
    Filed: July 5, 2006
    Date of Patent: May 25, 2010
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, John J Correia, Scott M Pennock
  • Patent number: 7720679
    Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: May 18, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7711560
    Abstract: A speech recognition apparatus equipped with the garbage acoustic model storage unit storing the garbage acoustic model which learned the collection of unnecessary words. A feature value calculation unit calculates the feature parameter necessary for recognition by acoustically analyzing the unidentified input speech including the non-language speech per frame which is a unit for speech analysis. A garbage acoustic score calculation unit calculates the garbage acoustic score by comparing the feature parameter and the garbage acoustic model, and a garbage acoustic score correction unit corrects the garbage acoustic score calculated by the garbage acoustic score calculation unit so as to raise it in the frame where the non-language speech is inputted.
    Type: Grant
    Filed: February 4, 2004
    Date of Patent: May 4, 2010
    Assignee: Panasonic Corporation
    Inventors: Maki Yamada, Makoto Nishizaki, Yoshihisa Nakatoh, Shinichi Yoshizawa
  • Patent number: 7702505
    Abstract: A channel normalization apparatus includes: a characteristic extraction unit extracting MFCC characteristics and outputting rows of frames according to time; a characteristic parameter average calculation unit calculating an average value of the rows of the outputted MFCC characteristics; a channel variation estimation unit configuring a codebook based on a database of speech signals with attenuated channel variations and estimating a channel variation for each frame by calculating a distance between a MFCC parameter for each frame and an individual median value of the codebook when a MFCC of a channel distorted speech signal is inputted; and a smoothing operation based channel normalization unit smoothing another average value of the channel variation from the characteristic parameter average calculation unit and the channel variation from the channel variation estimation unit, subtracting the other average value from the MFCC of each frame and outputting rows of channel normalized MFCC characteristics.
    Type: Grant
    Filed: December 14, 2005
    Date of Patent: April 20, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventor: Ho-Young Jung
  • Publication number: 20100094626
    Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence with use of a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This causes resampling to be unnecessary in performing linear movement matching of speech wave frames having similar phonological feature structures.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 15, 2010
    Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
  • Patent number: 7676363
    Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.
    Type: Grant
    Filed: June 29, 2006
    Date of Patent: March 9, 2010
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, Scott M Pennock
  • Patent number: 7646912
    Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.
    Type: Grant
    Filed: February 18, 2005
    Date of Patent: January 12, 2010
    Assignee: Infineon Technologies AG
    Inventors: Werner Hemmert, Marcus Holmberg
  • Publication number: 20090319265
    Abstract: A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing.
    Type: Application
    Filed: June 17, 2009
    Publication date: December 24, 2009
    Inventors: Andreas Wittenstein, Mark Cromack
  • Patent number: 7630892
    Abstract: A method and apparatus are provided that perform text normalization and inverse text normalization using a single grammar. During text normalization, a context free transducer identifies a second string of symbols from a first string of symbols it receives. During inverse text normalization, the context free transducer identifies the first string of symbols after receiving the second string of symbols.
    Type: Grant
    Filed: September 10, 2004
    Date of Patent: December 8, 2009
    Assignee: Microsoft Corporation
    Inventors: Qiang Wu, Rachel I. Morton, Li Jiang
  • Patent number: 7620263
    Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement, operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: November 17, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
  • Publication number: 20090259465
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Application
    Filed: June 24, 2009
    Publication date: October 15, 2009
    Applicant: AT&T Corp.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
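    Illustration (not taken from the patent text): the iterative search described in the abstract above (warp the spectral data with a candidate factor, compare with the model, keep the factor that matches best) can be sketched as a simple grid search. Linear frequency warping with interpolation, a reference spectrum standing in for the speech model, and squared distance as the match score are all assumptions of this sketch, as are the function names.

```python
def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis by alpha, with interpolation."""
    n = len(spectrum)
    warped = []
    for i in range(n):
        pos = min(i * alpha, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def best_warping_factor(spectrum, model, factors):
    best_alpha, best_dist = None, float("inf")
    for alpha in factors:
        warped = warp_spectrum(spectrum, alpha)
        # Squared distance stands in for the comparison with the speech model.
        dist = sum((w - m) ** 2 for w, m in zip(warped, model))
        if dist < best_dist:
            # Closer match: save this factor as the best one so far.
            best_alpha, best_dist = alpha, dist
    return best_alpha

spectrum = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
model = warp_spectrum(spectrum, 1.1)  # synthetic "model" for the demo
chosen = best_warping_factor(spectrum, model, [0.9, 1.0, 1.1, 1.2])
```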
  • Patent number: 7580512
    Abstract: An incoming call screening treatment is selected for a call to a called communication device based on an emotional state criterion input by a user of the called communication device.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: August 25, 2009
    Assignee: Alcatel-Lucent USA Inc.
    Inventors: Ramachendra P. Batni, Ranjan Sharma
  • Patent number: 7567903
    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: July 28, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
  • Publication number: 20090157400
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Application
    Filed: October 1, 2008
    Publication date: June 18, 2009
    Applicant: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
  • Patent number: 7548854
    Abstract: A unique, fully integrated, fully programmable, and highly flexible sound distribution system and methodology for providing masking sound, background music, and paging capabilities in up to eight zones of a building or space is provided. The methodology embodied in the system includes internal masking sounds that are uniquely pre-filtered to provide efficient and effective masking of distracting sounds within selectable zones of the space with a minimum masking sound dB sound level and with a pleasant sounding and non-annoying masking sound. The system also incorporates the capacity to be controlled from a remote or local telephone to adjust the volume level in any zone serviced by the system by issuing appropriate DTMF codes from the telephone's keypad. Unique bi-tone diagnostic functions are provided for assuring that the entire system is correctly wired and installed and for troubleshooting operational anomalies.
    Type: Grant
    Filed: March 28, 2002
    Date of Patent: June 16, 2009
    Assignee: AWI Licensing Company
    Inventors: Kenneth P. Roy, Thomas J. Johnson, Ronald Fuller, Steve Dove
  • Patent number: 7542900
    Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.
    Type: Grant
    Filed: May 5, 2006
    Date of Patent: June 2, 2009
    Assignee: Microsoft Corporation
    Inventors: James G. Droppo, Li Deng, Alejandro Acero
  • Publication number: 20090112586
    Abstract: Systems, methods and computer-readable media associated with using a divergence metric to evaluate user simulations in a spoken dialog system. The method employs user simulations of a spoken dialog system and includes aggregating a first set of one or more scores from a real user dialog, aggregating a second set of one or more scores from a simulated user dialog associated with a user model, determining a similarity of distributions associated with each of the first set and the second set, wherein the similarity is determined using a divergence metric that does not require any assumptions regarding a shape of the distributions. It is preferable to use a Cramér-von Mises divergence.
    Type: Application
    Filed: November 1, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Lab. Inc.
    Inventor: Jason WILLIAMS
  • Publication number: 20090018828
    Abstract: An automatic speech recognition system includes: a sound source localization module for localizing a sound direction of a speaker based on the acoustic signals detected by the plurality of microphones; a sound source separation module for separating a speech signal of the speaker from the acoustic signals according to the sound direction; an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals; an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models, the acoustic model composition module storing the acoustic model in the acoustic model memory; and a speech recognition module which recognizes the features extracted by a feature extractor as character information using the acoustic model composed by the acoustic model composition module.
    Type: Application
    Filed: November 12, 2004
    Publication date: January 15, 2009
    Inventors: Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
  • Publication number: 20080270131
    Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention target speech is extracted from two input speeches, which are obtained through at least two speech input devices installed in different places in a space, applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of arbitrary Flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).
    Type: Application
    Filed: April 18, 2008
    Publication date: October 30, 2008
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
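    Illustration (not part of the patent record): the subtraction and flooring steps named in the abstract above can be sketched roughly as follows. The function name, the default constants, and the choice to floor against the scaled noise estimate are assumptions made for this sketch; the abstract does not state the exact flooring rule, and the gain-control step is omitted because the abstract does not describe it in enough detail.

```python
def spectral_subtract(power, noise, subtraction_constant=1.0, flooring_factor=0.1):
    """Subtract a scaled noise power estimate from each frequency bin, then floor.

    power and noise are equal-length sequences of per-bin power values.
    Flooring at flooring_factor * noise is an assumed rule, chosen because it is
    a common way to keep the result positive after subtraction.
    """
    if len(power) != len(noise):
        raise ValueError("power and noise must have the same number of bins")

    cleaned = []
    for x, u in zip(power, noise):
        subtracted = x - subtraction_constant * u  # spectrum subtraction step
        floor = flooring_factor * u                # assumed flooring level
        cleaned.append(subtracted if subtracted > floor else floor)
    return cleaned


# Second bin would go negative after subtraction, so it is floored instead.
print(spectral_subtract([10.0, 2.0], [1.0, 4.0], subtraction_constant=2.0, flooring_factor=0.5))
```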
  • Publication number: 20080201141
    Abstract: Utterances by a speaker are analyzed by an appropriate computational system. The spoken words are recognized and indexed to their respective analogs which are used to tailor the speech sequence to conform to a pre-determined standard of speech characteristics which could be fixed for a given language or chosen based on the regional characteristics of the said common language target for a communication session. Thusly selected audio sequences are then tailored or synthesized into the normalized characteristics and inserted into the outgoing speech stream such that the resulting audio sequence exhibits reduced speech characteristics deemed undesirable.
    Type: Application
    Filed: February 15, 2008
    Publication date: August 21, 2008
    Inventors: Igor Abramov, Patrick O. Nunally
  • Patent number: 7409343
    Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model relating to a voice segment spoken by an authorized speaker and a rejection voice model. It uses normalization parameters to normalize a speaker verification score depending on the likelihood ratio of a voice segment to be tested and the acceptance model and rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test only if the normalized score is above a second threshold.
    Type: Grant
    Filed: July 22, 2003
    Date of Patent: August 5, 2008
    Assignee: France Telecom
    Inventor: Delphine Charlet
  • Publication number: 20080082327
    Abstract: It is an object of the present invention to provide a sound processing apparatus which can allow a user to hear a sound with improved intelligibility even if the sound is hard to hear. The analyzing means 15 is adapted to analyze the input signal from the A/D converter 12, to detect, on the basis of an analysis of the input signal, a frequency band corresponding to a masking sound and a frequency band corresponding to a masked sound, and to change the cutoff frequencies of the lowpass and highpass filters 13 and 14 on the basis of the analysis of the input signal to ensure that the frequency band corresponding to a masking sound is included in a signal from one of the first and second sound output means 18 and 19, and the frequency band corresponding to a masked sound is included in a signal from the other of the first and second sound output means 18 and 19.
    Type: Application
    Filed: September 13, 2005
    Publication date: April 3, 2008
    Applicants: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., TOHOKU UNIVERSITY
    Inventors: Atsunobu Murase, Shuichi Sakamoto, Youichi Suzuki, Tetsuaki Kawase, Toshimitsu Kobayashi
  • Publication number: 20080082328
    Abstract: A priori speech absence probability refers to a probability that a speech is not present with respect to a frame and a frequency bin resulting from an input signal. The priori speech absence probability has been regarded as a constant (generally, 0.5) because it is difficult to estimate. However, attempts to estimate the priori speech absence probability have been made since 2002. A novel method for estimating a priori speech absence probability using a statistical model is proposed. The method for estimating a priori speech absence probability obtains a priori speech absence probability of input speech data using a local parameter, a global parameter and an average parameter. The local parameter and the global parameter are obtained by determining a smaller value than a first threshold value as 0, determining a greater value than a second threshold value as 1, and applying a raised cosine function to values between the first threshold value and the second threshold value.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 3, 2008
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Sung Joo Lee
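    Illustration (not part of the patent record): the thresholding described in the abstract above, where values below a first threshold map to 0, values above a second threshold map to 1, and a raised cosine covers the range in between, might look like the sketch below. The function name and the particular raised-cosine form are assumptions; the abstract does not give the formula, nor how the local, global and average parameters are combined, so that combination is not shown.

```python
import math


def raised_cosine_map(value, first_threshold, second_threshold):
    """Map value to [0, 1] with a smooth raised-cosine transition between thresholds."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be below second_threshold")
    if value <= first_threshold:
        return 0.0
    if value >= second_threshold:
        return 1.0
    # Position of value inside the transition band, from 0 to 1.
    t = (value - first_threshold) / (second_threshold - first_threshold)
    # Assumed raised-cosine form: rises smoothly from 0 at t=0 to 1 at t=1.
    return 0.5 * (1.0 - math.cos(math.pi * t))


for v in (0.1, 0.5, 0.9):
    print(v, raised_cosine_map(v, 0.2, 0.8))
```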
  • Patent number: 7340396
    Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).
    Type: Grant
    Filed: February 18, 2003
    Date of Patent: March 4, 2008
    Assignee: Motorola, Inc.
    Inventors: Mark Thomson, Julien Epps, Trym Holter
  • Patent number: 7328154
    Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech related criteria (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.
    Type: Grant
    Filed: August 13, 2003
    Date of Patent: February 5, 2008
    Assignee: Matsushita Electrical Industrial Co., Ltd.
    Inventors: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
  • Publication number: 20080004875
    Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.
    Type: Application
    Filed: June 29, 2006
    Publication date: January 3, 2008
    Applicant: GENERAL MOTORS CORPORATION
    Inventors: Rathinavelu Chengalvarayan, Scott M. Pennock
  • Patent number: 7310600
    Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: December 18, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
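    Illustration (not part of the patent record): matching two phoneme sequences by dynamic programming with separate confusion, insertion and deletion terms can be sketched as a weighted edit distance. This is a generic cost-minimizing formulation, not the patented scoring; the function name, the cost callbacks and the toy unit costs are hypothetical, whereas the abstract describes scores obtained in advance in a training session.

```python
def align_cost(seq_a, seq_b, confusion_cost, insertion_cost, deletion_cost):
    """Return the minimum total cost of aligning seq_a with seq_b.

    confusion_cost(a, b): cost of matching phoneme a against phoneme b.
    insertion_cost(b): cost of a phoneme present only in seq_b.
    deletion_cost(a): cost of a phoneme present only in seq_a.
    """
    n, m = len(seq_a), len(seq_b)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column: every phoneme of seq_a is deleted.
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + deletion_cost(seq_a[i - 1])
    # First row: every phoneme of seq_b is inserted.
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + insertion_cost(seq_b[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = min(
                table[i - 1][j - 1] + confusion_cost(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + deletion_cost(seq_a[i - 1]),
                table[i][j - 1] + insertion_cost(seq_b[j - 1]),
            )
    return table[n][m]


# Toy costs for demonstration only: exact match is free, everything else costs 1.
def toy_confusion(a, b):
    return 0.0 if a == b else 1.0


def toy_unit(_phoneme):
    return 1.0


print(align_cost(["k", "ae", "t"], ["k", "ah", "t"], toy_confusion, toy_unit, toy_unit))
```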