Normalizing Patents (Class 704/234)

Method of determining model-specific factors for pattern recognition, in particular for speech patterns

Patent number: 8112274

Abstract: A method for recognizing a pattern that comprises a set of physical stimuli, said method comprising the steps of: providing a set of training observations and through applying a plurality of association models ascertaining various measuring values p_j(k|x), j = 1, ..., M, that each pertain to assigning a particular training observation to one or more associated pattern classes; setting up a log-linear association distribution by combining all association models of the plurality according to respective weight factors, and joining thereto a normalization quantity to produce a compound association distribution; optimizing said weight factors for thereby minimizing a detected error rate of the actual assigning to said compound distribution; recognizing target observations representing a target pattern with the help of said compound distribution.

Type: Grant

Filed: April 30, 2002

Date of Patent: February 7, 2012

Assignee: Nuance Communications, Inc.

Inventor: Peter Beyerlein
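
The log-linear combination described in the abstract of patent 8112274 can be illustrated with a minimal sketch. It assumes each association model supplies a strictly positive probability p_j(k|x) for every class; the function name, the example classes, and the weight values are hypothetical, and the error-rate-driven optimization of the weights is not shown.

```python
import math

def log_linear_posterior(model_probs, weights):
    """Combine M model scores p_j(k|x) into one log-linear distribution.

    model_probs: list of M dicts mapping class label k -> p_j(k|x), all > 0.
    weights: list of M weight factors (illustrative values, not from the patent).
    """
    classes = list(model_probs[0].keys())
    # Weighted sum of log-probabilities for each class.
    scores = {
        k: sum(w * math.log(p[k]) for w, p in zip(weights, model_probs))
        for k in classes
    }
    # Normalization quantity: log of the summed exponentiated scores,
    # computed with the maximum subtracted for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {k: math.exp(s - log_z) for k, s in scores.items()}

# Two hypothetical models scoring the same observation.
acoustic = {"yes": 0.6, "no": 0.4}
language = {"yes": 0.3, "no": 0.7}
posterior = log_linear_posterior([acoustic, language], weights=[1.0, 0.5])
best = max(posterior, key=posterior.get)
```

With a weight of zero, a model drops out of the combination entirely, which is why the choice of weights matters for the final error rate.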
Automatic identification of repeated material in audio signals

Patent number: 8090579

Abstract: A system and method are described for recognizing repeated audio material within at least one media stream without prior knowledge of the nature of the repeated material. The system and method are able to create a screening database from the media stream or streams. An unknown sample audio fragment is taken from the media stream and compared against the screening database to find if there are matching fragments within the media streams by determining if the unknown sample matches any samples in the screening database.

Type: Grant

Filed: February 8, 2006

Date of Patent: January 3, 2012

Assignee: Landmark Digital Services

Inventors: David L. DeBusk, Darren P. Briggs, Michael Karliner, Richard Wing Cheong Tang, Avery Li-Chun Wang
Method and device for ascertaining feature vectors from a signal

Patent number: 8064699

Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.

Type: Grant

Filed: September 24, 2009

Date of Patent: November 22, 2011

Assignee: Infineon Technologies AG

Inventors: Werner Hemmert, Marcus Holmberg
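
As a rough illustration of the two steps named in the abstract of patent 8064699 (high-pass filtering of intermediate feature vectors, then adding a prescribed vector), the sketch below uses a first-order recursive high-pass filter per dimension. The filter form, the coefficient `alpha`, and the function name are assumptions chosen for illustration; the abstract does not specify them.

```python
def high_pass_and_offset(vectors, offset, alpha=0.9):
    """High-pass filter each feature dimension over time, then add `offset`.

    Filter (assumed form): y[t] = alpha * (y[t-1] + x[t] - x[t-1]).
    """
    dims = len(offset)
    prev_in = list(vectors[0])   # treat the first frame as its own predecessor
    prev_out = [0.0] * dims
    result = []
    for vec in vectors:
        out = [alpha * (prev_out[d] + vec[d] - prev_in[d]) for d in range(dims)]
        # Add the prescribed addition vector to the filtered frame.
        result.append([out[d] + offset[d] for d in range(dims)])
        prev_in, prev_out = list(vec), out
    return result

# A constant input has no high-frequency content, so only the offset remains.
steady = [[2.0, -1.0]] * 4
shifted = high_pass_and_offset(steady, offset=[0.5, 0.25])
```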
Systems and methods for filtering dictated and non-dictated sections of documents

Patent number: 8036889

Abstract: A system and method for filtering documents to determine section boundaries between dictated and non-dictated text. The system and method identify portions of a text report that correspond to an original dictation and, correspondingly, those portions that are not part of the original dictation. The system and method include comparing tokenized and normalized forms of the original dictation and the final report, determining mismatches between the two forms, and applying machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately.

Type: Grant

Filed: February 27, 2006

Date of Patent: October 11, 2011

Assignee: Nuance Communications, Inc.

Inventors: Alwin B. Carus, Larissa Lapshina, Bernardo Rechea
Method of setting equalizer for audio file and method of reproducing audio file

Patent number: 8027487

Abstract: A method of setting an equalizer so as to enlarge a sound field in reproducing an audio file and a method of reproducing an audio file thereby, includes: dividing an input audio file into segments with a predetermined time length; extracting an audio feature value for each segment; determining equalizer information for reproducing each segment by the use of the extracted feature value; and determining an equalizer sequence for the audio file by the use of the determined equalizer information of each segment. Since the equalizer setting information can be automatically changed without user manipulation, the user can listen to an audio file of which the sound field is dynamically enlarged.

Type: Grant

Filed: October 10, 2006

Date of Patent: September 27, 2011

Assignee: Samsung Electronics Co., Ltd.

Inventor: Gun-han Park
Data modeling of class independent recognition models

Patent number: 8005674

Abstract: A recognition model set is generated. A technique is described to take advantage of the logarithm likelihood of real data for cross entropy to measure the mismatch between a training data and a training data derived model, and compare such type of mismatches between class dependent models and class independent model for evidence of model replacement. By using change of cross entropies in the decision of adding class independent Gaussian Mixture Models (GMMs), the good performance of class dependent models is largely retained, while decreasing the size and complexity of the model.

Type: Grant

Filed: July 10, 2007

Date of Patent: August 23, 2011

Assignee: International Business Machines Corporation

Inventors: Eric W Janke, Bin Jia
Sound-source separation system

Patent number: 7987090

Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).

Type: Grant

Filed: August 7, 2008

Date of Patent: July 26, 2011

Assignee: Honda Motor Co., Ltd.

Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
Assisted discrimination of similar sounding speakers

Patent number: 7970115

Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.

Type: Grant

Filed: October 5, 2005

Date of Patent: June 28, 2011

Assignee: Avaya Inc.

Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
System and method for detecting the recognizability of input speech signals

Patent number: 7933771

Abstract: A system and method for detecting the recognizability of input speech signals are provided. They are designed for the pre-stage of a speech recognition or dialog system. The invention detects the user's environmental condition and verifies whether the input speech signal can be recognized. It mainly comprises an environment parameter generator, a signal recognition verifier, and a strategy response processor. Used in the pre-stage of a speech recognition or dialog system, it can verify the recognizability of the input speech signal and accept the input speech signals of high recognition probability in a noisy environment. This reduces the impact caused by receiving input speech signals of low recognition probability. The invention thus increases the recognition probability for a recognizer.

Type: Grant

Filed: March 11, 2006

Date of Patent: April 26, 2011

Assignee: Industrial Technology Research Institute

Inventors: Sen-Chia Chang, Yuan-Fu Liao, Jeng-Shien Lin
Speech modeling and enhancement based on magnitude-normalized spectra

Patent number: 7930178

Abstract: A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.

Type: Grant

Filed: December 23, 2005

Date of Patent: April 19, 2011

Assignee: Microsoft Corporation

Inventors: Zhengyou Zhang, Alejandro Acero, Amarnag Subramanya, Zicheng Liu
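
The normalization step in the abstract of patent 7930178 (frequency components of a frame divided by the frame's energy value) can be sketched as follows. This is a minimal illustration, not the patented method: it uses a naive DFT, and it assumes the energy value is the square root of the summed squared samples so that the result does not depend on the input gain. The function name is hypothetical.

```python
import cmath
import math

def energy_normalized_spectrum(frame):
    """Return DFT magnitudes of one frame divided by the frame's energy value."""
    n = len(frame)
    # Naive DFT; keep only the non-redundant bins 0..n/2 of a real signal.
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitudes.append(abs(coeff))
    # Energy value (assumed definition): root of the summed squared samples.
    energy = math.sqrt(sum(x * x for x in frame))
    if energy == 0.0:
        return [0.0] * len(magnitudes)
    return [m / energy for m in magnitudes]

# The same tone at two volumes yields the same normalized spectrum.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
quiet = energy_normalized_spectrum(frame)
loud = energy_normalized_spectrum([10 * x for x in frame])
```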
SYSTEM AND METHOD FOR IMPROVING ROBUSTNESS OF SPEECH RECOGNITION USING VOCAL TRACT LENGTH NORMALIZATION CODEBOOKS

Publication number: 20100324893

Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.

Type: Application

Filed: August 26, 2010

Publication date: December 23, 2010

Applicant: AT&T Intellectual Property II, L.P. via transfer from AT&T Corp.

Inventor: Mazin GILBERT
Method and apparatus for normalizing voice feature vector by backward cumulative histogram

Patent number: 7835909

Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.

Type: Grant

Filed: December 12, 2006

Date of Patent: November 16, 2010

Assignee: Samsung Electronics Co., Ltd.

Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
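
The backward cumulation described in the abstract of patent 7835909 can be sketched in a few lines. The example below only illustrates building a histogram-based probability distribution and cumulating it from the largest bin to the smallest; the subsequent histogram normalization is not shown, and the bin count, value range, and function name are assumptions.

```python
def backward_cdf(values, num_bins, lo, hi):
    """Histogram `values` on [lo, hi] and cumulate the bin probabilities
    from the highest bin down to the lowest."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, num_bins - 1))  # clamp values at the range edges
        counts[idx] += 1
    total = len(values)
    pdf = [c / total for c in counts]
    cdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # largest values first
        running += pdf[i]
        cdf[i] = running
    return cdf

# Hypothetical one-dimensional feature values.
features = [0.1, 0.2, 0.2, 0.5, 0.9, 0.95]
cdf = backward_cdf(features, num_bins=4, lo=0.0, hi=1.0)
```

Here the lowest bin ends up with a cumulative value of 1, the mirror image of a conventional forward cumulative histogram.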
Method and system for administering subjective listening test to remote users

Patent number: 7831025

Abstract: A method and system for administering a subjective listening test to remote users. A user can participate in a subjective listening test, such as an MOS listening test, over a telephone call. The telephone call is received and audio recordings are sequentially played over the telephone call. Quality ratings corresponding to the audio recordings are input by the user over the telephone call. The user can input digits corresponding to the quality ratings. This allows a user to take part in a subjective listening test without traveling to a lab.

Type: Grant

Filed: May 15, 2006

Date of Patent: November 9, 2010

Assignee: AT&T Intellectual Property II, L.P.

Inventors: John D. Francis, Laurie F. Garrison, James H. James
Speech recognition device and speech recognition method

Patent number: 7813921

Abstract: There is provided a voice recognition device and a voice recognition method that enhance the function of noise adaptation processing in voice recognition processing and reduce the capacity of a memory being used. Acoustic models are subjected to clustering processing to calculate the centroid of each cluster and the differential vector between the centroid and each model, model composition between each kind of assumed noise model and the calculated centroid is carried out, and the centroid of each composition model and the differential vector are stored in a memory. In the actual recognition processing, the centroid optimal to the environment estimated by the utterance environmental estimation is extracted from the memory, model restoration is carried out on the extracted centroid by using the differential vector stored in the memory, and noise adaptation processing is executed on the basis of the restored model.

Type: Grant

Filed: March 15, 2005

Date of Patent: October 12, 2010

Assignee: Pioneer Corporation

Inventors: Hajime Kobayashi, Soichi Toyama, Yasunori Suzuki
One-step repair of misrecognized recognition strings

Patent number: 7809566

Abstract: A method for use in automatic speech recognition corrects erroneous recognition elements within a recognition hypothesis. A user input is recognized as a correction hypothesis which contains various recognition elements. A non-deterministic alignment is performed to align at least a portion of the correction hypothesis with an earlier recognition hypothesis which also contains various recognition elements such that the recognition elements in the aligned portion of the correction hypothesis are determined to most likely correspond to a range of recognition elements in the earlier recognition hypothesis. The recognition elements in the range of recognition elements in the earlier recognition hypothesis are replaced with the recognition elements in the aligned portion of the correction hypothesis.

Type: Grant

Filed: October 13, 2006

Date of Patent: October 5, 2010

Assignee: Nuance Communications, Inc.

Inventor: Ralf Meermeier
Signal noise reduction

Patent number: 7797154

Abstract: Provision to reduce production of musical noise. A noise reduction device includes: means for calculating a rank for each element included in a first region having predetermined sizes in the time axis direction and in the frequency axis direction, depending on a value of the element, in a noise section of an observed signal indicating variation of a frequency spectrum with time; means for calculating a rank for each element included in a second region, depending on a value of the element, the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal; and means for subtracting, from the values of the respective elements in the second region, values based on the values of the respective elements in the first region whose ranks correspond to ranks of respective elements in the second region.

Type: Grant

Filed: May 27, 2008

Date of Patent: September 14, 2010

Assignee: International Business Machines Corporation

Inventor: Osamu Ichikawa
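
The rank-matched subtraction in the abstract of patent 7797154 can be illustrated with a small sketch. It assumes both regions are time-by-frequency grids of equal size, subtracts the raw noise value of matching rank (the abstract only says "values based on" it), and floors the result at zero; the floor, the tie-breaking rule, and the function name are assumptions.

```python
def rank_matched_subtract(noise_region, observed_region):
    """Subtract from each observed element the noise element of equal rank.

    Both regions are lists of rows (time x frequency) with the same size.
    """
    noise_sorted = sorted(v for row in noise_region for v in row)
    flat = [v for row in observed_region for v in row]
    # Rank of each observed element in ascending order (ties broken by index).
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    rank = [0] * len(flat)
    for r, i in enumerate(order):
        rank[i] = r
    # Subtract the equally ranked noise value; floor at zero (assumption).
    cleaned = [max(flat[i] - noise_sorted[rank[i]], 0.0) for i in range(len(flat))]
    width = len(observed_region[0])
    return [cleaned[r:r + width] for r in range(0, len(cleaned), width)]

# Hypothetical 2x2 spectrogram patches.
noise = [[1.0, 3.0], [2.0, 4.0]]
observed = [[5.0, 2.5], [9.0, 3.5]]
result = rank_matched_subtract(noise, observed)
```

Matching by rank rather than by position means the largest observed value loses the largest noise value, which is the property the abstract relies on to reduce musical noise.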
System and method for improving robustness of speech recognition using vocal tract length normalization codebooks

Patent number: 7797158

Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.

Type: Grant

Filed: June 20, 2007

Date of Patent: September 14, 2010

Assignee: AT&T Intellectual Property II, L.P.

Inventor: Mazin Gilbert
Automatic speech recognition channel normalization based on measured statistics from initial portions of speech utterances

Patent number: 7797157

Abstract: Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, measures of the energy comprise extreme values of the energy.

Type: Grant

Filed: January 10, 2005

Date of Patent: September 14, 2010

Assignee: Voice Signal Technologies, Inc.

Inventors: Igor Zlokarnik, Laurence S. Gillick, Jordan Cohen
Providing navigation directions

Patent number: 7774132

Abstract: In one embodiment, a navigation system provides navigation directions within particular locations within a facility, such as within a corporate campus, airport, resort, building, etc. The navigation system may respond to navigation requests for different types of facility target destinations such as a location, a person, a movable item, an event, or a condition. Different location resources can be accessed depending on the type of requested target destination. For example, an employee database may be used to locate an office within the facility associated with a navigation request that contains an employee name. A natural voice communication scheme can be used to access the navigation system through a larger variety of networks and communication devices.

Type: Grant

Filed: July 5, 2006

Date of Patent: August 10, 2010

Assignee: Cisco Technology, Inc.

Inventor: Bradley Richard DeGrazia
Methods and apparatus to operate an audience metering device with voice commands

Patent number: 7752042

Abstract: Methods and apparatus to operate an audience metering device with voice commands are described herein. An example method to identify audience members based on voice, includes: obtaining an audio input signal including a program audio signal and a human voice signal; receiving an audio line signal from an audio output line of a monitored media device; processing the audio line signal with a filter having adaptive weights to generate a delayed and attenuated line signal; subtracting the delayed and attenuated line signal from the audio input signal to develop a residual audio signal; identifying a person that spoke to create the human voice signal based on the residual audio signal; and logging an identity of the person as an audience member.

Type: Grant

Filed: February 1, 2008

Date of Patent: July 6, 2010

Assignee: The Nielsen Company (US), LLC

Inventor: Venugopal Srinivasan
Speech recognition enhancer

Patent number: 7734472

Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such speech recognition enhancer (51), an audio input unit (41) and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513) and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).

Type: Grant

Filed: September 29, 2004

Date of Patent: June 8, 2010

Assignee: Alcatel

Inventor: Michael Walker
METHOD AND SYSTEM FOR REDUCING RECEPTION OF UNWANTED MESSAGES

Publication number: 20100131270

Abstract: The invention relates to a method for determining a characteristic pattern for a speech message that is supplied in the form of a numerically encoded audio signal generated by means of a sampling process.

Type: Application

Filed: July 13, 2007

Publication date: May 27, 2010

Applicant: Nokia Siemens Networks GmbH & Co.

Inventor: Joachim Charzinski
Method and apparatus for constructing a speech filter using estimates of clean speech and noise

Patent number: 7725314

Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.

Type: Grant

Filed: February 16, 2004

Date of Patent: May 25, 2010

Assignee: Microsoft Corporation

Inventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
Applying speech recognition adaptation in an automated speech recognition system of a telematics-equipped vehicle

Patent number: 7725316

Abstract: A speech recognition adaptation method for a vehicle having a telematics unit with an embedded speech recognition system. Speech is received and pre-processed to generate acoustic feature vectors, and an adaptation parameter is applied to the acoustic feature vectors to yield transformed acoustic feature vectors. The transformed acoustic feature vectors are decoded and a hypothesis of the speech is selected, and the adaptation parameter is trained using acoustic feature vectors from the hypothesis. The method also includes one or more of the following steps: the speech is observed for a certain characteristic and the trained adaptation parameter is saved in accordance with the certain characteristic for use in transforming feature vectors of subsequent speech having the certain characteristic; use of the trained adaptation parameter persists from one vehicle ignition cycle to the next; and use of the trained adaptation parameter is ceased upon detection of a system fault.

Type: Grant

Filed: July 5, 2006

Date of Patent: May 25, 2010

Assignee: General Motors LLC

Inventors: Rathinavelu Chengalvarayan, John J Correia, Scott M Pennock
Speech recognition apparatus, speech recognition apparatus and program thereof

Patent number: 7720679

Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.

Type: Grant

Filed: September 24, 2008

Date of Patent: May 18, 2010

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
Speech recognition device and speech recognition method

Patent number: 7711560

Abstract: A speech recognition apparatus is equipped with a garbage acoustic model storage unit storing a garbage acoustic model that has learned a collection of unnecessary words. A feature value calculation unit calculates the feature parameter necessary for recognition by acoustically analyzing, per frame (the unit of speech analysis), unidentified input speech that includes non-language speech. A garbage acoustic score calculation unit calculates the garbage acoustic score by comparing the feature parameter and the garbage acoustic model, and a garbage acoustic score correction unit corrects the garbage acoustic score calculated by the garbage acoustic score calculation unit so as to raise it in the frame where the non-language speech is inputted.

Type: Grant

Filed: February 4, 2004

Date of Patent: May 4, 2010

Assignee: Panasonic Corporation

Inventors: Maki Yamada, Makoto Nishizaki, Yoshihisa Nakatoh, Shinichi Yoshizawa
Channel normalization apparatus and method for robust speech recognition

Patent number: 7702505

Abstract: A channel normalization apparatus includes: a characteristic extraction unit extracting MFCC characteristics and outputting rows of frames according to time; a characteristic parameter average calculation unit calculating an average value of the rows of the outputted MFCC characteristics; a channel variation estimation unit configuring a codebook based on a database of speech signals with attenuated channel variations and estimating a channel variation for each frame by calculating a distance between a MFCC parameter for each frame and an individual median value of the codebook when a MFCC of a channel distorted speech signal is inputted; and a smoothing operation based channel normalization unit smoothing another average value of the channel variation from the characteristic parameter average calculation unit and the channel variation from the channel variation estimation unit, subtracting the other average value from the MFCC of each frame and outputting rows of channel normalized MFCC characteristics.

Type: Grant

Filed: December 14, 2005

Date of Patent: April 20, 2010

Assignee: Electronics and Telecommunications Research Institute

Inventor: Ho-Young Jung
METHOD AND APPARATUS FOR LOCATING SPEECH KEYWORD AND SPEECH RECOGNITION SYSTEM

Publication number: 20100094626

Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence with use of a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This causes resampling to be unnecessary in performing linear movement matching of speech wave frames having similar phonological feature structures.

Type: Application

Filed: September 27, 2007

Publication date: April 15, 2010

Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
Automated speech recognition using normalized in-vehicle speech

Patent number: 7676363

Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.

Type: Grant

Filed: June 29, 2006

Date of Patent: March 9, 2010

Assignee: General Motors LLC

Inventors: Rathinavelu Chengalvarayan, Scott M Pennock
Method and device for ascertaining feature vectors from a signal

Patent number: 7646912

Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.

Type: Grant

Filed: February 18, 2005

Date of Patent: January 12, 2010

Assignee: Infineon Technologies AG

Inventors: Werner Hemmert, Marcus Holmberg
Method and system for efficient pacing of speech for transcription

Publication number: 20090319265

Abstract: A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing.

Type: Application

Filed: June 17, 2009

Publication date: December 24, 2009

Inventors: Andreas Wittenstein, Mark Cromack
Method and apparatus for transducer-based text normalization and inverse text normalization

Patent number: 7630892

Abstract: A method and apparatus are provided that perform text normalization and inverse text normalization using a single grammar. During text normalization, a finite state transducer identifies a second string of symbols from a first string of symbols it receives. During inverse text normalization, the context free transducer identifies the first string of symbols after receiving the second string of symbols.

Type: Grant

Filed: September 10, 2004

Date of Patent: December 8, 2009

Assignee: Microsoft Corporation

Inventors: Qiang Wu, Rachel I. Morton, Li Jiang
Anti-clipping method for image sharpness enhancement

Patent number: 7620263

Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement, operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.

Type: Grant

Filed: October 6, 2005

Date of Patent: November 17, 2009

Assignee: Samsung Electronics Co., Ltd.

Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
LOW LATENCY REAL-TIME VOCAL TRACT LENGTH NORMALIZATION

Publication number: 20090259465

Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.

Type: Application

Filed: June 24, 2009

Publication date: October 15, 2009

Applicant: AT&T Corp.

Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
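
The warping-factor search described in the abstract above (publication 20090259465) can be sketched roughly as follows. This is a minimal illustration and not the applicants' implementation: the linear frequency warp, the squared-error mismatch measure, and the names `warp_spectrum`, `mismatch`, and `best_warping_factor` are all assumptions made for the example.

```python
def warp_spectrum(spectrum, alpha):
    """Linearly warp the frequency axis (assumed warp: output bin k reads input position k * alpha)."""
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)          # clamp to the last available bin
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def mismatch(spectrum, model):
    """Squared-error distance to the speech model (a stand-in for a likelihood score)."""
    return sum((s - m) ** 2 for s, m in zip(spectrum, model))


def best_warping_factor(spectrum, model, factors):
    """Try each candidate factor in turn and keep the one that matches the model most closely."""
    best_alpha = factors[0]
    best_score = mismatch(warp_spectrum(spectrum, best_alpha), model)
    for alpha in factors[1:]:
        score = mismatch(warp_spectrum(spectrum, alpha), model)
        if score < best_score:               # closer match: save as the best factor so far
            best_alpha, best_score = alpha, score
    return best_alpha
```

In practice the comparison would be made per speaker-specific segment against a trained acoustic model rather than a single reference spectrum.
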
Selection of incoming call screening treatment based on emotional state criterion

Patent number: 7580512

Abstract: An incoming call screening treatment is selected for a call to a called communication device based on an emotional state criterion input by a user of the called communication device.

Type: Grant

Filed: June 28, 2005

Date of Patent: August 25, 2009

Assignee: Alcatel-Lucent USA Inc.

Inventors: Ramachendra P. Batni, Ranjan Sharma
Low latency real-time vocal tract length normalization

Patent number: 7567903

Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.

Type: Grant

Filed: January 12, 2005

Date of Patent: July 28, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
SPEECH RECOGNITION SYSTEM AND METHOD WITH CEPSTRAL NOISE SUBTRACTION

Publication number: 20090157400

Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.

Type: Application

Filed: October 1, 2008

Publication date: June 18, 2009

Applicant: Industrial Technology Research Institute

Inventor: Shih-Ming Huang
Architectural sound enhancement with pre-filtered masking sound

Patent number: 7548854

Abstract: A unique, fully integrated, fully programmable, and highly flexible sound distribution system and methodology for providing masking sound, background music, and paging capabilities in up to eight zones of a building or space is provided. The methodology embodied in the system includes internal masking sounds that are uniquely pre-filtered to provide efficient and effective masking of distracting sounds within selectable zones of the space with a minimum masking sound dB sound level and with a pleasant sounding and non-annoying masking sound. The system also incorporates the capacity to be controlled from a remote or local telephone to adjust the volume level in any zone serviced by the system by issuing appropriate DTMF codes from the telephone's keypad. Unique bi-tone diagnostic functions are provided for assuring that the entire system is correctly wired and installed and for troubleshooting operational anomalies.

Type: Grant

Filed: March 28, 2002

Date of Patent: June 16, 2009

Assignee: AWI Licensing Company

Inventors: Kenneth P. Roy, Thomas J. Johnson, Ronald Fuller, Steve Dove
Noise reduction using correction vectors based on dynamic aspects of speech and noise normalization

Patent number: 7542900

Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.

Type: Grant

Filed: May 5, 2006

Date of Patent: June 2, 2009

Assignee: Microsoft Corporation

Inventors: James G. Droppo, Li Deng, Alejandro Acero
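
The noise-normalization steps in the abstract above (patent 7542900) can be illustrated with scalar values. This is only a sketch under assumptions: the codebook of (centroid, correction) pairs, the nearest-centroid selection rule, and the name `clean_value` are hypothetical and are not taken from the patent.

```python
def clean_value(noisy, noise, codebook):
    """Apply the subtract / correct / add-back sequence to one value.

    codebook is a list of (centroid, correction) pairs; choosing the pair whose
    centroid is nearest to the noise-normalized value is an assumed selection rule.
    """
    normalized = noisy - noise                       # noise-normalized value
    _, correction = min(codebook, key=lambda entry: abs(entry[0] - normalized))
    cleaned_normalized = normalized + correction     # cleaned noise-normalized value
    return cleaned_normalized + noise                # add the noise value back
```

For example, with `codebook = [(0.0, -0.5), (5.0, 0.25), (10.0, 1.0)]`, a noisy value of 12.0 and a noise value of 8.0 give a noise-normalized value of 4.0, which selects the correction 0.25.
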
SYSTEM AND METHOD OF EVALUATING USER SIMULATIONS IN A SPOKEN DIALOG SYSTEM WITH A DIVERSION METRIC

Publication number: 20090112586

Abstract: Systems, methods and computer-readable media associated with using a divergence metric to evaluate user simulations in a spoken dialog system. The method employs user simulations of a spoken dialog system and includes aggregating a first set of one or more scores from a real user dialog, aggregating a second set of one or more scores from a simulated user dialog associated with a user model, determining a similarity of distributions associated with each of the first set and the second set, wherein the similarity is determined using a divergence metric that does not require any assumptions regarding a shape of the distributions. It is preferable to use a Cramér-von Mises divergence.

Type: Application

Filed: November 1, 2007

Publication date: April 30, 2009

Applicant: AT&T Lab. Inc.

Inventor: Jason WILLIAMS
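
A distribution-free comparison of real and simulated dialog scores, in the spirit of the Cramér-von Mises divergence named in the abstract above, might look like the sketch below. The abstract does not give the exact formula, so the root-mean-square normalization and the names `empirical_cdf` and `cvm_divergence` are assumptions.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def cvm_divergence(real_scores, simulated_scores):
    """Root-mean-square gap between the two empirical CDFs, evaluated at the real scores.

    No assumption is made about the shape of either distribution.
    """
    f_real = empirical_cdf(real_scores)
    f_sim = empirical_cdf(simulated_scores)
    total = sum((f_real(x) - f_sim(x)) ** 2 for x in real_scores)
    return (total / len(real_scores)) ** 0.5
```

A value of zero means the simulated scores reproduce the real score distribution at every observed point; larger values indicate a poorer user simulation.
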
Automatic Speech Recognition System

Publication number: 20090018828

Abstract: An automatic speech recognition system includes: a sound source localization module for localizing a sound direction of a speaker based on the acoustic signals detected by the plurality of microphones; a sound source separation module for separating a speech signal of the speaker from the acoustic signals according to the sound direction; an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals; an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models, the acoustic model composition module storing the acoustic model in the acoustic model memory; and a speech recognition module which recognizes the features extracted by a feature extractor as character information using the acoustic model composed by the acoustic model composition module.

Type: Application

Filed: November 12, 2004

Publication date: January 15, 2009

Inventors: Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
METHOD, PREPROCESSOR, SPEECH RECOGNITION SYSTEM, AND PROGRAM PRODUCT FOR EXTRACTING TARGET SPEECH BY REMOVING NOISE

Publication number: 20080270131

Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment, the invention extracts target speech from two input speeches, which are obtained through at least two speech input devices installed in different places in a space, and applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of an arbitrary flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).

Type: Application

Filed: April 18, 2008

Publication date: October 30, 2008

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
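
The subtraction, gain control, and flooring stages described in the abstract above can be sketched per frequency bin as follows. This is a simplified illustration, not the applicants' method: the gain is a fixed placeholder scalar (the application derives it from the two input devices), flooring at `beta` times the noise estimate is an assumed form, and the function and parameter names are hypothetical.

```python
def spectral_subtraction(power, noise, alpha=1.0, beta=0.1, gain=1.0):
    """Return a floored, noise-subtracted power spectrum.

    power: observed power spectrum, one value per frequency bin
    noise: estimated noise power spectrum, same length
    alpha: subtraction constant
    beta:  flooring factor
    gain:  placeholder for the gain control stage (assumption)
    """
    result = []
    for x, u in zip(power, noise):
        y = x - alpha * u        # spectrum subtraction
        d = gain * y             # gain control (placeholder)
        z = max(d, beta * u)     # flooring keeps the output from dropping below beta * u
        result.append(z)
    return result
```

The flooring step matters because plain subtraction can produce negative power values in bins where the noise estimate exceeds the observation.
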
SPEECH FILTERS

Publication number: 20080201141

Abstract: Utterances by a speaker are analyzed by an appropriate computational system. The spoken words are recognized and indexed to their respective analogs which are used to tailor the speech sequence to conform to a pre-determined standard of speech characteristics which could be fixed for a given language or chosen based on the regional characteristics of the said common language target for a communication session. Thusly selected audio sequences are then tailored or synthesized into the normalized characteristics and inserted into the outgoing speech stream such that the resulting audio sequence exhibits reduced speech characteristics deemed undesirable.

Type: Application

Filed: February 15, 2008

Publication date: August 21, 2008

Inventors: Igor Abramov, Patrick O. Nunally
Verification score normalization in a speaker voice recognition device

Patent number: 7409343

Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model relating to a voice segment spoken by an authorized speaker and a rejection voice model. It uses normalization parameters to normalize a speaker verification score depending on the likelihood ratio of a voice segment to be tested and the acceptance model and rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test only if the normalized score is above a second threshold.

Type: Grant

Filed: July 22, 2003

Date of Patent: August 5, 2008

Assignee: France Telecom

Inventor: Delphine Charlet
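
The two-threshold behaviour in the abstract above (accept on one threshold, update the normalization parameters only above a second) can be sketched as follows. The abstract does not specify the normalization or the update rule, so the mean/standard-deviation normalization, the exponential update of the mean, and the class name `ScoreNormalizer` are assumptions.

```python
class ScoreNormalizer:
    """Normalize verification scores and adapt the normalization conservatively."""

    def __init__(self, mean, std, accept_threshold, update_threshold, rate=0.1):
        self.mean = mean
        self.std = std
        self.accept_threshold = accept_threshold    # first threshold: grants access
        self.update_threshold = update_threshold    # second threshold: permits adaptation
        self.rate = rate                            # assumed adaptation rate

    def normalize(self, score):
        return (score - self.mean) / self.std

    def verify(self, score):
        """Return True if the speaker is accepted; adapt only on high-confidence tests."""
        normalized = self.normalize(score)
        accepted = normalized > self.accept_threshold
        if normalized > self.update_threshold:
            # Move the stored mean toward the new verification score.
            self.mean += self.rate * (score - self.mean)
        return accepted
```

Setting the second threshold above the first means that only tests passed with a clear margin can change the parameters, which limits drift caused by impostor attempts.
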
Sound Processing Apparatus

Publication number: 20080082327

Abstract: It is an object of the present invention to provide a sound processing apparatus which can allow a user to hear a sound with improved intelligibility even if the sound is hard to hear. The analyzing means 15 is adapted to analyze the input signal from the A/D converter 12, to detect, on the basis of an analysis of the input signal, a frequency band corresponding to a masking sound and a frequency band corresponding to a masked sound, and to change the cutoff frequencies of the lowpass and highpass filters 13 and 14 on the basis of the analysis of the input signal to ensure that the frequency band corresponding to a masking sound is included in a signal from one of the first and second sound output means 18 and 19, and the frequency band corresponding to a masked sound is included in a signal from the other of the first and second sound output means 18 and 19.

Type: Application

Filed: September 13, 2005

Publication date: April 3, 2008

Applicants: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., TOHOKU UNIVERSITY

Inventors: Atsunobu Murase, Shuichi Sakamoto, Youichi Suzuki, Tetsuaki Kawase, Toshimitsu Kobayashi
Method for estimating priori SAP based on statistical model

Publication number: 20080082328

Abstract: A priori speech absence probability refers to a probability that a speech is not present with respect to a frame and a frequency bin resulting from an input signal. The priori speech absence probability has been regarded as a constant (generally, 0.5) because it is difficult to estimate. However, attempts to estimate the priori speech absence probability have been made since 2002. A novel method for estimating a priori speech absence probability using a statistical model is proposed. The method for estimating a priori speech absence probability obtains a priori speech absence probability of input speech data using a local parameter, a global parameter and an average parameter. The local parameter and the global parameter are obtained by determining a smaller value than a first threshold value as 0, determining a greater value than a second threshold value as 1, and applying a raised cosine function to values between the first threshold value and the second threshold value.

Type: Application

Filed: September 27, 2007

Publication date: April 3, 2008

Applicant: Electronics and Telecommunications Research Institute

Inventor: Sung Joo Lee
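
The thresholding rule described in the abstract above (0 below the first threshold, 1 above the second, a raised cosine in between) can be written as a small function. This is a sketch: the specific raised-cosine form and the name `soft_threshold` are assumptions, and the way the local, global, and average parameters are combined is not shown because the abstract does not state it.

```python
import math


def soft_threshold(value, low, high):
    """Map value to [0, 1] with a raised-cosine transition between low and high."""
    if not low < high:
        raise ValueError("low must be smaller than high")
    if value < low:
        return 0.0
    if value > high:
        return 1.0
    phase = (value - low) / (high - low)              # 0 at low, 1 at high
    return 0.5 * (1.0 - math.cos(math.pi * phase))    # smooth rise from 0 to 1
```

Compared with a hard threshold, the transition avoids abrupt jumps in the estimate when the input hovers near a threshold value.
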
Method and apparatus for providing a speaker adapted speech recognition model set

Patent number: 7340396

Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).

Type: Grant

Filed: February 18, 2003

Date of Patent: March 4, 2008

Assignee: Motorola, Inc.

Inventors: Mark Thomson, Julien Epps, Trym Holter
Bubble splitting for compact acoustic modeling

Patent number: 7328154

Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech related criterion (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.

Type: Grant

Filed: August 13, 2003

Date of Patent: February 5, 2008

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
AUTOMATED SPEECH RECOGNITION USING NORMALIZED IN-VEHICLE SPEECH

Publication number: 20080004875

Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.

Type: Application

Filed: June 29, 2006

Publication date: January 3, 2008

Applicant: GENERAL MOTORS CORPORATION

Inventors: Rathinavelu Chengalvarayan, Scott M. Pennock
Language recognition using a similarity measure

Patent number: 7310600

Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.

Type: Grant

Filed: October 25, 2000

Date of Patent: December 18, 2007

Assignee: Canon Kabushiki Kaisha

Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
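
The dynamic programming match in the abstract above (patent 7310600) resembles a weighted sequence alignment. The sketch below assumes the confusion, insertion, and deletion scores are log-probability-like values where higher is better, and it leaves out the optional confidence data; the function name `match_score` and its callable-based interface are hypothetical.

```python
def match_score(seq_a, seq_b, confusion, insertion, deletion):
    """Best total alignment score between two phoneme sequences.

    confusion(a, b): score for aligning phoneme a with phoneme b
    insertion(b):    score for a phoneme present only in seq_b
    deletion(a):     score for a phoneme present only in seq_a
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    best = [[float("-inf")] * cols for _ in range(rows)]
    best[0][0] = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0 and j > 0:   # align seq_a[i-1] with seq_b[j-1]
                best[i][j] = max(best[i][j],
                                 best[i - 1][j - 1] + confusion(seq_a[i - 1], seq_b[j - 1]))
            if i > 0:             # seq_a[i-1] has no counterpart in seq_b
                best[i][j] = max(best[i][j], best[i - 1][j] + deletion(seq_a[i - 1]))
            if j > 0:             # seq_b[j-1] has no counterpart in seq_a
                best[i][j] = max(best[i][j], best[i][j - 1] + insertion(seq_b[j - 1]))
    return best[-1][-1]
```

In the patent the three score tables are obtained in advance in a training session; here they are passed in as plain functions so that the recurrence stays visible.
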
