Normalizing Patents (Class 704/234)
-
Patent number: 8112274Abstract: A method for recognizing a pattern that comprises a set of physical stimuli, said method comprising the steps of: providing a set of training observations and through applying a plurality of association models ascertaining various measuring values pj(k|x), j=1 . . . M, that each pertain to assigning a particular training observation to one or more associated pattern classes; setting up a log/linear association distribution by combining all association models of the plurality according to respective weight factors, and joining thereto a normalization quantity to produce a compound association distribution; optimizing said weight factors for thereby minimizing a detected error rate of the actual assigning to said compound distribution; recognizing target observations representing a target pattern with the help of said compound distribution.Type: GrantFiled: April 30, 2002Date of Patent: February 7, 2012Assignee: Nuance Communications, Inc.Inventor: Peter Beyerlein
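The log-linear combination in the abstract above can be sketched as follows. This is a minimal illustration, assuming the compound distribution has the usual form exp(sum_j lambda_j * log p_j(k|x)) / Z(x); the function name and the example numbers are hypothetical, and the error-rate-driven optimization of the weight factors is left out.

```python
import math

def log_linear_combine(model_probs, weights):
    """Combine per-model class scores p_j(k|x) into one distribution.

    model_probs: list of M lists, one strictly positive probability per class.
    weights:     list of M weight factors (lambda_j), assumed already trained.
    """
    num_classes = len(model_probs[0])
    scores = [
        sum(w * math.log(p[k]) for w, p in zip(weights, model_probs))
        for k in range(num_classes)
    ]
    # Normalization quantity Z(x), computed in the log domain for stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return [math.exp(s - log_z) for s in scores]

# Two hypothetical association models scoring two pattern classes.
compound = log_linear_combine([[0.7, 0.3], [0.6, 0.4]], [1.0, 0.5])
```

With a single model and a weight of 1.0 the function returns that model's distribution unchanged, which is a quick sanity check on the normalization.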
-
Patent number: 8090579Abstract: A system and method are described for recognizing repeated audio material within at least one media stream without prior knowledge of the nature of the repeated material. The system and method are able to create a screening database from the media stream or streams. An unknown sample audio fragment is taken from the media stream and compared against the screening database to find if there are matching fragments within the media streams by determining if the unknown sample matches any samples in the screening database.Type: GrantFiled: February 8, 2006Date of Patent: January 3, 2012Assignee: Landmark Digital ServicesInventors: David L. DeBusk, Darren P. Briggs, Michael Karliner, Richard Wing Cheong Tang, Avery Li-Chun Wang
-
Patent number: 8064699Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.Type: GrantFiled: September 24, 2009Date of Patent: November 22, 2011Assignee: Infineon Technologies AGInventors: Werner Hemmert, Marcus Holmberg
-
Patent number: 8036889Abstract: A system and method for filtering documents to determine section boundaries between dictated and non-dictated text. The system and method identifies portions of a text report that correspond to an original dictation and, correspondingly, those portions that are not part of the original dictation. The system and method include comparing tokenized and normalized forms of the original dictation and the final report, determining mismatches between the two forms, and applying machine-learning techniques to identify document headers, footers, page turns, macros, and lists automatically and accurately.Type: GrantFiled: February 27, 2006Date of Patent: October 11, 2011Assignee: Nuance Communications, Inc.Inventors: Alwin B. Carus, Larissa Lapshina, Bernardo Rechea
-
Patent number: 8027487Abstract: A method of setting an equalizer so as to enlarge a sound field in reproducing an audio file and a method of reproducing an audio file thereby, includes: dividing an input audio file into segments with a predetermined time length; extracting an audio feature value for each segment; determining equalizer information for reproducing each segment by the use of the extracted feature value; and determining an equalizer sequence for the audio file by the use of the determined equalizer information of each segment. Since the equalizer setting information can be automatically changed without user manipulation, the user can listen to an audio file of which the sound field is dynamically enlarged.Type: GrantFiled: October 10, 2006Date of Patent: September 27, 2011Assignee: Samsung Electronics Co., Ltd.Inventor: Gun-han Park
-
Patent number: 8005674Abstract: A recognition model set is generated. A technique is described to take advantage of the logarithm likelihood of real data for cross entropy to measure the mismatch between a training data and a training data derived model, and compare such type of mismatches between class dependent models and class independent model for evidence of model replacement. By using change of cross entropies in the decision of adding class independent Gaussian Mixture Models (GMMs), the good performance of class dependent models is largely retained, while decreasing the size and complexity of the model.Type: GrantFiled: July 10, 2007Date of Patent: August 23, 2011Assignee: International Business Machines CorporationInventors: Eric W Janke, Bin Jia
-
Patent number: 7987090Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).Type: GrantFiled: August 7, 2008Date of Patent: July 26, 2011Assignee: Honda Motor Co., Ltd.Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
-
Patent number: 7970115Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.Type: GrantFiled: October 5, 2005Date of Patent: June 28, 2011Assignee: Avaya Inc.Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
-
Patent number: 7933771Abstract: A system and method for detecting the recognizability of an input speech signal are provided. The system is designed for the pre-stage of a speech recognition or dialog system. The invention detects the user's environmental condition and verifies whether the input speech signal can be recognized. It mainly comprises an environment parameter generator, a signal recognition verifier, and a strategy response processor. Used in the pre-stage of a speech recognition or dialog system, it can precisely verify the recognizability of the input speech signal and accept input speech signals of high recognition probability in a noisy environment. This reduces the impact caused by receiving input speech signals of low recognition probability. The invention thus increases the recognition probability for a recognizer.Type: GrantFiled: March 11, 2006Date of Patent: April 26, 2011Assignee: Industrial Technology Research InstituteInventors: Sen-Chia Chang, Yuan-Fu Liao, Jeng-Shien Lin
-
Patent number: 7930178Abstract: A frame of a speech signal is converted into the spectral domain to identify a plurality of frequency components and an energy value for the frame is determined. The plurality of frequency components is divided by the energy value for the frame to form energy-normalized frequency components. A model is then constructed from the energy-normalized frequency components and can be used for speech recognition and speech enhancement.Type: GrantFiled: December 23, 2005Date of Patent: April 19, 2011Assignee: Microsoft CorporationInventors: Zhengyou Zhang, Alejandro Acero, Amarnag Subramanya, Zicheng Liu
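A minimal sketch of the energy normalization step in the abstract above. The abstract does not say how the frame energy is computed, so the sum of squared samples used here is an assumption, as are the function names; the model-building step is omitted.

```python
import cmath

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def energy_normalize(frame):
    """Divide each frequency component by the frame energy.

    Assumption: energy is the sum of squared samples of the frame.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        raise ValueError("frame has zero energy")
    return [m / energy for m in dft_magnitudes(frame)]

# A unit impulse has a flat spectrum and unit energy.
normalized = energy_normalize([1.0, 0.0, 0.0, 0.0])
```

A production system would use an FFT routine rather than this direct DFT, which is kept only so the sketch needs nothing beyond the standard library.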
-
Publication number: 20100324893Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.Type: ApplicationFiled: August 26, 2010Publication date: December 23, 2010Applicant: AT&T Intellectual Property II, L.P. via transfer from AT&T Corp.Inventor: Mazin GILBERT
-
Patent number: 7835909Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.Type: GrantFiled: December 12, 2006Date of Patent: November 16, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
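The backward cumulative distribution function named in the abstract above can be sketched as below. The equal-width binning, the bin count, and the function name are assumptions made for illustration; the final step that maps features onto a reference distribution is not shown.

```python
def backward_cdf(values, num_bins=4):
    """Histogram-based PDF and its backward cumulative distribution.

    The backward CDF accumulates the PDF from the largest-valued bin
    down to the smallest, so bcdf[0] is always 1.0.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0.0:
        width = 1.0  # all values identical: everything falls in bin 0
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    pdf = [c / len(values) for c in counts]
    bcdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # largest bin first
        running += pdf[i]
        bcdf[i] = running
    return pdf, bcdf

# One hypothetical feature dimension observed over four frames.
pdf, bcdf = backward_cdf([0.0, 1.0, 2.0, 3.0])
```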
-
Patent number: 7831025Abstract: A method and system for administering a subjective listening test to remote users. A user can participate in a subjective listening test, such as an MOS listening test, over a telephone call. The telephone call is received and audio recordings are sequentially played over the telephone call. Quality ratings corresponding to the audio recordings are input by the user over the telephone call. The user can input digits corresponding to the quality ratings. This allows a user to take part in a subjective listening test without traveling to a lab.Type: GrantFiled: May 15, 2006Date of Patent: November 9, 2010Assignee: AT&T Intellectual Property II, L.P.Inventors: John D. Francis, Laurie F. Garrison, James H. James
-
Patent number: 7813921Abstract: There is provided a voice recognition device and a voice recognition method that enhance the function of noise adaptation processing in voice recognition processing and reduce the capacity of a memory being used. Acoustic models are subjected to clustering processing to calculate the centroid of each cluster and the differential vector between the centroid and each model, model composition between each kind of assumed noise model and the calculated centroid is carried out, and the centroid of each composition model and the differential vector are stored in a memory. In the actual recognition processing, the centroid optimal to the environment estimated by the utterance environmental estimation is extracted from the memory, model restoration is carried out on the extracted centroid by using the differential vector stored in the memory, and noise adaptation processing is executed on the basis of the restored model.Type: GrantFiled: March 15, 2005Date of Patent: October 12, 2010Assignee: Pioneer CorporationInventors: Hajime Kobayashi, Soichi Toyama, Yasunori Suzuki
-
Patent number: 7809566Abstract: A method for use in automatic speech recognition corrects erroneous recognition elements within a recognition hypothesis. A user input is recognized as a correction hypothesis which contains various recognition elements. A non-deterministic alignment is performed to align at least a portion of the correction hypothesis with an earlier recognition hypothesis which also contains various recognition elements such that the recognition elements in the aligned portion of the correction hypothesis are determined to most likely correspond to a range of recognition elements in the earlier recognition hypothesis. The recognition elements in the range of recognition elements in the earlier recognition hypothesis are replaced with the recognition elements in the aligned portion of the correction hypothesis.Type: GrantFiled: October 13, 2006Date of Patent: October 5, 2010Assignee: Nuance Communications, Inc.Inventor: Ralf Meermeier
-
Patent number: 7797154Abstract: Provision to reduce production of musical noise. A noise reduction device includes: means for calculating a rank for each element included in a first region having predetermined sizes in the time axis direction and in the frequency axis direction, depending on a value of the element, in a noise section of an observed signal indicating variation of a frequency spectrum with time; means for calculating a rank for each element included in a second region, depending on a value of the element, the second region having predetermined sizes in the time axis direction and in the frequency axis direction in the observed signal; and means for subtracting, from the values of the respective elements in the second region, values based on the values of the respective elements in the first region whose ranks correspond to ranks of respective elements in the second region.Type: GrantFiled: May 27, 2008Date of Patent: September 14, 2010Assignee: International Business Machines CorporationInventor: Osamu Ichikawa
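A rough sketch of the rank-matched subtraction in the abstract above, assuming both time-frequency regions have been flattened into lists of equal length. The `scale` and `floor` parameters and the function name are hypothetical additions, not taken from the patent.

```python
def rank_matched_subtract(noise_region, observed_region, scale=1.0, floor=0.0):
    """Subtract from each observed element the noise value of the same rank.

    Both regions are flattened time-frequency blocks of equal size.
    `scale` and `floor` are illustrative knobs, not part of the abstract.
    """
    if len(noise_region) != len(observed_region):
        raise ValueError("regions must contain the same number of elements")
    noise_sorted = sorted(noise_region)  # position in this list = rank
    order = sorted(range(len(observed_region)),
                   key=lambda i: observed_region[i])
    cleaned = [0.0] * len(observed_region)
    for rank, i in enumerate(order):
        cleaned[i] = max(observed_region[i] - scale * noise_sorted[rank], floor)
    return cleaned

cleaned = rank_matched_subtract([1, 3, 2], [5, 4, 9])
```

Matching by rank rather than by position means the largest observed element loses the largest noise value, which is the property the abstract ties to reduced musical noise.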
-
Patent number: 7797158Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.Type: GrantFiled: June 20, 2007Date of Patent: September 14, 2010Assignee: AT&T Intellectual Property II, L.P.Inventor: Mazin Gilbert
-
Patent number: 7797157Abstract: Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, measures of the energy comprise extreme values of the energy.Type: GrantFiled: January 10, 2005Date of Patent: September 14, 2010Assignee: Voice Signal Technologies, Inc.Inventors: Igor Zlokarnik, Laurence S. Gillick, Jordan Cohen
-
Patent number: 7774132Abstract: In one embodiment, a navigation system provides navigation directions within particular locations within a facility, such as within a corporate campus, airport, resort, building, etc. The navigation system may respond to navigation requests for different types of facility target destinations such as a location, a person, a movable item, an event, or a condition. Different location resources can be accessed depending on the type of requested target destination. For example, an employee database may be used to locate an office within the facility associated with a navigation request that contains an employee name. A natural voice communication scheme can be used to access the navigation system through a larger variety of networks and communication devices.Type: GrantFiled: July 5, 2006Date of Patent: August 10, 2010Assignee: Cisco Technology, Inc.Inventor: Bradley Richard DeGrazia
-
Patent number: 7752042Abstract: Methods and apparatus to operate an audience metering device with voice commands are described herein. An example method to identify audience members based on voice, includes: obtaining an audio input signal including a program audio signal and a human voice signal; receiving an audio line signal from an audio output line of a monitored media device; processing the audio line signal with a filter having adaptive weights to generate a delayed and attenuated line signal; subtracting the delayed and attenuated line signal from the audio input signal to develop a residual audio signal; identifying a person that spoke to create the human voice signal based on the residual audio signal; and logging an identity of the person as an audience member.Type: GrantFiled: February 1, 2008Date of Patent: July 6, 2010Assignee: The Nielsen Company (US), LLCInventor: Venugopal Srinivasan
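The filtering and subtraction steps in the abstract above can be sketched with a plain LMS adaptive filter. LMS is an assumed choice here (the abstract only says "a filter having adaptive weights"), and the tap count, step size, and synthetic signals are hypothetical; speaker identification on the residual is not shown.

```python
import random

def lms_cancel(mic, line, taps=4, mu=0.1):
    """Remove the program audio (line signal) from the microphone signal.

    An LMS filter learns the delay and attenuation between the line
    signal and its copy in the microphone input; the residual is what
    remains after subtracting the filtered line signal.
    """
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        window = [line[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        err = mic[n] - estimate  # residual audio sample
        residual.append(err)
        weights = [w + mu * err * x for w, x in zip(weights, window)]
    return residual, weights

# Synthetic check: the microphone hears the line signal delayed by one
# sample and attenuated by half, with no voice present.
rng = random.Random(0)
line = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.5 * x for x in line[:-1]]
residual, weights = lms_cancel(mic, line)
```

In this noise-free setup the weights settle near [0, 0.5, 0, 0] and the residual decays toward zero; with a voice present, the voice would remain in the residual.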
-
Patent number: 7734472Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such speech recognition enhancer (51), an audio input unit (41) and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513) and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).Type: GrantFiled: September 29, 2004Date of Patent: June 8, 2010Assignee: AlcatelInventor: Michael Walker
-
Publication number: 20100131270Abstract: The invention relates to a method for determining a characteristic pattern for a speech message that is supplied in the form of a numerically encoded audio signal generated by means of a sampling process.Type: ApplicationFiled: July 13, 2007Publication date: May 27, 2010Applicant: Nokia Siemens Networks GmbH & Co.Inventor: Joachim Charzinski
-
Patent number: 7725314Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.Type: GrantFiled: February 16, 2004Date of Patent: May 25, 2010Assignee: Microsoft CorporationInventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
-
Patent number: 7725316Abstract: A speech recognition adaptation method for a vehicle having a telematics unit with an embedded speech recognition system. Speech is received and pre-processed to generate acoustic feature vectors, and an adaptation parameter is applied to the acoustic feature vectors to yield transformed acoustic feature vectors. The transformed acoustic feature vectors are decoded and a hypothesis of the speech is selected, and the adaptation parameter is trained using acoustic feature vectors from the hypothesis. The method also includes one or more of the following steps: the speech is observed for a certain characteristic and the trained adaptation parameter is saved in accordance with the certain characteristic for use in transforming feature vectors of subsequent speech having the certain characteristic; use of the trained adaptation parameter persists from one vehicle ignition cycle to the next; and use of the trained adaptation parameter is ceased upon detection of a system fault.Type: GrantFiled: July 5, 2006Date of Patent: May 25, 2010Assignee: General Motors LLCInventors: Rathinavelu Chengalvarayan, John J Correia, Scott M Pennock
-
Patent number: 7720679Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.Type: GrantFiled: September 24, 2008Date of Patent: May 18, 2010Assignee: Nuance Communications, Inc.Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
-
Patent number: 7711560Abstract: A speech recognition apparatus equipped with the garbage acoustic model storage unit storing the garbage acoustic model which learned the collection of unnecessary words. A feature value calculation unit calculates the feature parameter necessary for recognition by acoustically analyzing the unidentified input speech including the non-language speech per frame which is a unit for speech analysis. A garbage acoustic score calculation unit calculates the garbage acoustic score by comparing the feature parameter and the garbage acoustic model, and a garbage acoustic score correction unit corrects the garbage acoustic score calculated by the garbage acoustic score calculation unit so as to raise it in the frame where the non-language speech is inputted.Type: GrantFiled: February 4, 2004Date of Patent: May 4, 2010Assignee: Panasonic CorporationInventors: Maki Yamada, Makoto Nishizaki, Yoshihisa Nakatoh, Shinichi Yoshizawa
-
Patent number: 7702505Abstract: A channel normalization apparatus includes: a characteristic extraction unit extracting MFCC characteristics and outputting rows of frames according to time; a characteristic parameter average calculation unit calculating an average value of the rows of the outputted MFCC characteristics; a channel variation estimation unit configuring a codebook based on a database of speech signals with attenuated channel variations and estimating a channel variation for each frame by calculating a distance between a MFCC parameter for each frame and an individual median value of the codebook when a MFCC of a channel distorted speech signal is inputted; and a smoothing operation based channel normalization unit smoothing another average value of the channel variation from the characteristic parameter average calculation unit and the channel variation from the channel variation estimation unit, subtracting the other average value from the MFCC of each frame and outputting rows of channel normalized MFCC characteristics.Type: GrantFiled: December 14, 2005Date of Patent: April 20, 2010Assignee: Electronics and Telecommunications Research InstituteInventor: Ho-Young Jung
-
Publication number: 20100094626Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword of a speech and a speech recognition system. The method includes the steps of: by extracting feature parameters from frames constituting the recognition target speech, forming a feature parameter vector sequence that represents the recognition target speech; by normalizing the feature parameter vector sequence with use of a codebook containing a plurality of codebook vectors, obtaining a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces with the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook. This causes resampling to be unnecessary in performing linear movement matching of speech wave frames having similar phonological feature structures.Type: ApplicationFiled: September 27, 2007Publication date: April 15, 2010Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
-
Patent number: 7676363Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.Type: GrantFiled: June 29, 2006Date of Patent: March 9, 2010Assignee: General Motors LLCInventors: Rathinavelu Chengalvarayan, Scott M Pennock
-
Patent number: 7646912Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.Type: GrantFiled: February 18, 2005Date of Patent: January 12, 2010Assignee: Infineon Technologies AGInventors: Werner Hemmert, Marcus Holmberg
-
Publication number: 20090319265Abstract: A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing.Type: ApplicationFiled: June 17, 2009Publication date: December 24, 2009Inventors: Andreas Wittenstein, Mark Cromack
-
Patent number: 7630892Abstract: A method and apparatus are provided that perform text normalization and inverse text normalization using a single grammar. During text normalization, a context free transducer identifies a second string of symbols from a first string of symbols it receives. During inverse text normalization, the context free transducer identifies the first string of symbols after receiving the second string of symbols.Type: GrantFiled: September 10, 2004Date of Patent: December 8, 2009Assignee: Microsoft CorporationInventors: Qiang Wu, Rachel I. Morton, Li Jiang
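A toy sketch of the single-grammar idea in the abstract above: one rule table drives both directions. The token-level lookup here is a deliberately simplified stand-in for a real transducer, and the rules and function names are hypothetical.

```python
# One shared "grammar": (written form, spoken form) pairs.
RULES = [("3", "three"), ("%", "percent"), ("Dr.", "doctor")]

def normalize(tokens):
    """Text normalization: written form -> spoken form."""
    table = {written: spoken for written, spoken in RULES}
    return [table.get(t, t) for t in tokens]

def inverse_normalize(tokens):
    """Inverse text normalization: spoken form -> written form."""
    table = {spoken: written for written, spoken in RULES}
    return [table.get(t, t) for t in tokens]

spoken = normalize(["Dr.", "Smith", "3", "%"])
written = inverse_normalize(spoken)
```

Because both functions read the same `RULES` list, adding a rule updates both directions at once, which is the maintenance benefit of keeping a single grammar.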
-
Patent number: 7620263Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.Type: GrantFiled: October 6, 2005Date of Patent: November 17, 2009Assignee: Samsung Electronics Co., Ltd.Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
-
Publication number: 20090259465Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.Type: ApplicationFiled: June 24, 2009Publication date: October 15, 2009Applicant: AT&T Corp.Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
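The iterative warping-factor search in the abstract above might look like the sketch below. The linear frequency warp with interpolation and the squared-distance comparison against a reference spectrum are assumptions; a real system would score warped features against an acoustic model instead. All names are hypothetical.

```python
def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis by `alpha` with interpolation."""
    n = len(spectrum)
    warped = []
    for i in range(n):
        src = min(i * alpha, n - 1)  # clamp to the last bin
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped

def distance(a, b):
    """Squared distance, standing in for the comparison with a speech model."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_warp(spectrum, model, factors):
    """Try each warping factor and keep the one that matches the model best."""
    best_alpha, best_dist = None, float("inf")
    for alpha in factors:
        d = distance(warp_spectrum(spectrum, alpha), model)
        if d < best_dist:  # closer match: save as the best factor so far
            best_alpha, best_dist = alpha, d
    return best_alpha

# Hypothetical speaker segment whose "model" is the same spectrum warped by 1.1.
segment = [float(i * i) for i in range(8)]
model = warp_spectrum(segment, 1.1)
chosen = best_warp(segment, model, [0.9, 1.0, 1.1, 1.2])
```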
-
Patent number: 7580512Abstract: An incoming call screening treatment is selected for a call to a called communication device based on an emotional state criterion input by a user of the called communication device.Type: GrantFiled: June 28, 2005Date of Patent: August 25, 2009Assignee: Alcatel-Lucent USA Inc.Inventors: Ramachendra P. Batni, Ranjan Sharma
-
Patent number: 7567903Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.Type: GrantFiled: January 12, 2005Date of Patent: July 28, 2009Assignee: AT&T Intellectual Property II, L.P.Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Publication number: 20090157400Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.Type: ApplicationFiled: October 1, 2008Publication date: June 18, 2009Applicant: Industrial Technology Research InstituteInventor: Shih-Ming Huang
-
Patent number: 7548854Abstract: A unique, fully integrated, fully programmable, and highly flexible sound distribution system and methodology for providing masking sound, background music, and paging capabilities in up to eight zones of a building or space is provided. The methodology embodied in the system includes internal masking sounds that are uniquely pre-filtered to provide efficient and effective masking of distracting sounds within selectable zones of the space with a minimum masking sound dB sound level and with a pleasant sounding and non-annoying masking sound. The system also incorporates the capacity to be controlled from a remote or local telephone to adjust the volume level in any zone serviced by the system by issuing appropriate DTMF codes from the telephone's keypad. Unique bi-tone diagnostic functions are provided for assuring that the entire system is correctly wired and installed and for troubleshooting operational anomalies.Type: GrantFiled: March 28, 2002Date of Patent: June 16, 2009Assignee: AWI Licensing CompanyInventors: Kenneth P. Roy, Thomas J. Johnson, Ronald Fuller, Steve Dove
-
Patent number: 7542900Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.Type: GrantFiled: May 5, 2006Date of Patent: June 2, 2009Assignee: Microsoft CorporationInventors: James G. Droppo, Li Deng, Alejandro Acero
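The noise-normalization steps described in this abstract (subtract the noise value, select and add a correction, add the noise value back) can be sketched as follows. This is only an illustrative sketch: the function name, the `corrections` codebook, and the nearest-centroid selection rule are assumptions, not details taken from the patent.

```python
def clean_value(noisy, noise_estimate, corrections):
    """Return a cleaned value for one scalar feature.

    corrections is a hypothetical codebook: a list of
    (centroid, correction) pairs. The abstract does not say how the
    correction is selected; nearest centroid is assumed here.
    """
    # Step 1: subtract the noise value to get a noise-normalized value.
    normalized = noisy - noise_estimate
    # Step 2: select the correction whose centroid is closest (assumed rule).
    _, correction = min(corrections, key=lambda pair: abs(pair[0] - normalized))
    # Step 3: add the correction to get a cleaned noise-normalized value.
    cleaned_normalized = normalized + correction
    # Step 4: add the noise value back to get the cleaned value.
    return cleaned_normalized + noise_estimate
```

For example, `clean_value(10.0, 4.0, [(0.0, 1.0), (5.0, -0.5)])` normalizes to 6.0, picks the correction attached to centroid 5.0, and returns 9.5.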
-
Publication number: 20090112586Abstract: Systems, methods and computer-readable media associated with using a divergence metric to evaluate user simulations in a spoken dialog system. The method employs user simulations of a spoken dialog system and includes aggregating a first set of one or more scores from a real user dialog, aggregating a second set of one or more scores from a simulated user dialog associated with a user model, determining a similarity of distributions associated with each of the first set and the second set, wherein the similarity is determined using a divergence metric that does not require any assumptions regarding a shape of the distributions. It is preferable to use a Cramér-von Mises divergence.Type: ApplicationFiled: November 1, 2007Publication date: April 30, 2009Applicant: AT&T Lab. Inc.Inventor: Jason WILLIAMS
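A distribution-free comparison of this kind can be illustrated by measuring the squared gap between the two empirical cumulative distribution functions. The sketch below is a simplified, unnormalized variant in the spirit of a Cramér-von Mises divergence; the function names are hypothetical and the publication may define and scale the metric differently.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    ordered = sorted(sample)
    count = len(ordered)
    return lambda x: bisect_right(ordered, x) / count


def cvm_divergence(real_scores, simulated_scores):
    """Mean squared difference between the two empirical CDFs,
    evaluated at every pooled observation (assumed form)."""
    cdf_real = ecdf(real_scores)
    cdf_sim = ecdf(simulated_scores)
    pooled = list(real_scores) + list(simulated_scores)
    return sum((cdf_real(x) - cdf_sim(x)) ** 2 for x in pooled) / len(pooled)
```

Identical score sets give 0.0, and the value grows as the simulated-dialog scores drift away from the real-dialog scores. No assumption about the shape of either distribution is needed.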
-
Publication number: 20090018828Abstract: An automatic speech recognition system includes: a sound source localization module for localizing a sound direction of a speaker based on the acoustic signals detected by the plurality of microphones; a sound source separation module for separating a speech signal of the speaker from the acoustic signals according to the sound direction; an acoustic model memory which stores direction-dependent acoustic models that are adjusted to a plurality of directions at intervals; an acoustic model composition module which composes an acoustic model adjusted to the sound direction, which is localized by the sound source localization module, based on the direction-dependent acoustic models, the acoustic model composition module storing the acoustic model in the acoustic model memory; and a speech recognition module which recognizes the features extracted by a feature extractor as character information using the acoustic model composed by the acoustic model composition module.Type: ApplicationFiled: November 12, 2004Publication date: January 15, 2009Inventors: Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
-
Publication number: 20080270131Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention, target speech is extracted from two input speeches, which are obtained through at least two speech input devices installed in different places in a space. The invention applies a spectrum subtraction process by using a noise power spectrum (Uω) estimated by one or both of the two speech input devices (Xω(T)) and an arbitrary subtraction constant (α) to obtain a resultant subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the resultant subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)). The invention further applies a flooring process to said resultant gain-controlled power spectrum on the basis of an arbitrary flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).Type: ApplicationFiled: April 18, 2008Publication date: October 30, 2008Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
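The three stages named in this abstract (spectrum subtraction, gain control, flooring) can be sketched per frequency bin as below. This is a minimal sketch under stated assumptions: the gain is reduced to a single scalar placeholder because the abstract does not say how it is derived from the two input devices, and the floor is assumed to be the flooring factor times the noise estimate, which is a common convention rather than something the abstract states.

```python
def subtract_and_floor(power, noise_power, subtraction_constant=1.0,
                       flooring_factor=0.1, gain=1.0):
    """Apply spectral subtraction, a placeholder gain, and flooring
    to one frame of power-spectrum bins."""
    result = []
    for x_bin, u_bin in zip(power, noise_power):
        # Spectrum subtraction: remove the scaled noise estimate.
        subtracted = x_bin - subtraction_constant * u_bin
        # Gain control (hypothetical scalar stand-in).
        controlled = gain * subtracted
        # Flooring: never drop below a fraction of the noise estimate (assumed).
        result.append(max(controlled, flooring_factor * u_bin))
    return result
```

With `power=[10.0, 2.0]`, `noise_power=[4.0, 4.0]`, and `flooring_factor=0.5`, the first bin keeps its subtracted value 6.0, while the second bin would go negative and is floored to 2.0.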
-
Publication number: 20080201141Abstract: Utterances by a speaker are analyzed by an appropriate computational system. The spoken words are recognized and indexed to their respective analogs which are used to tailor the speech sequence to conform to a pre-determined standard of speech characteristics which could be fixed for a given language or chosen based on the regional characteristics of the said common language target for a communication session. Thusly selected audio sequences are then tailored or synthesized into the normalized characteristics and inserted into the outgoing speech stream such that the resulting audio sequence exhibits reduced speech characteristics deemed undesirable.Type: ApplicationFiled: February 15, 2008Publication date: August 21, 2008Inventors: Igor Abramov, Patrick O. Nunally
-
Patent number: 7409343Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model relating to a voice segment spoken by an authorized speaker and a rejection voice model. It uses normalization parameters to normalize a speaker verification score depending on the likelihood ratio of a voice segment to be tested and the acceptance model and rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test only if the normalized score is above a second threshold.Type: GrantFiled: July 22, 2003Date of Patent: August 5, 2008Assignee: France TelecomInventor: Delphine Charlet
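The two-threshold logic in this abstract can be illustrated as follows. The sketch assumes a mean and standard deviation as the normalization parameters and an exponential update of the mean; neither choice, nor any of the names, comes from the patent, which only states that the parameters are updated when the normalized score exceeds a second threshold.

```python
def verify_and_update(score, params, accept_threshold, update_threshold,
                      rate=0.1):
    """Return (accepted, new_params) for one voice-segment test.

    params is a hypothetical (mean, std) pair used to normalize the
    raw verification score.
    """
    mean, std = params
    normalized = (score - mean) / std
    # Access is granted only if the normalized score passes the first threshold.
    accepted = normalized > accept_threshold
    # Parameters adapt only on tests that pass the second threshold.
    if normalized > update_threshold:
        mean = (1.0 - rate) * mean + rate * score  # assumed update rule
    return accepted, (mean, std)
```

Setting `update_threshold` above `accept_threshold` means that only confidently accepted segments adapt the normalization, which limits drift caused by borderline or impostor attempts.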
-
Publication number: 20080082327Abstract: It is an object of the present invention to provide a sound processing apparatus which can allow a user to hear a sound with improved intelligibility even if the sound is hard to hear. The analyzing means 15 is adapted to analyze the input signal from the A/D converter 12, to detect, on the basis of an analysis of the input signal, a frequency band corresponding to a masking sound and a frequency band corresponding to a masked sound, and to change the cutoff frequencies of the lowpass and highpass filters 13 and 14 on the basis of the analysis of the input signal to ensure that the frequency band corresponding to a masking sound is included in a signal from one of the first and second sound output means 18 and 19, and the frequency band corresponding to a masked sound is included in a signal from the other of the first and second sound output means 18 and 19.Type: ApplicationFiled: September 13, 2005Publication date: April 3, 2008Applicants: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD., TOHOKU UNIVERSITYInventors: Atsunobu Murase, Shuichi Sakamoto, Youichi Suzuki, Tetsuaki Kawase, Toshimitsu Kobayashi
-
Publication number: 20080082328Abstract: A priori speech absence probability refers to a probability that speech is not present with respect to a frame and a frequency bin resulting from an input signal. The a priori speech absence probability has been regarded as a constant (generally, 0.5) because it is difficult to estimate. However, attempts to estimate the a priori speech absence probability have been made since 2002. A novel method for estimating a priori speech absence probability using a statistical model is proposed. The method for estimating a priori speech absence probability obtains a priori speech absence probability of input speech data using a local parameter, a global parameter and an average parameter. The local parameter and the global parameter are obtained by determining a smaller value than a first threshold value as 0, determining a greater value than a second threshold value as 1, and applying a raised cosine function to values between the first threshold value and the second threshold value.Type: ApplicationFiled: September 27, 2007Publication date: April 3, 2008Applicant: Electronics and Telecommunications Research InstituteInventor: Sung Joo Lee
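The thresholding rule for the local and global parameters can be sketched as a soft step with a raised cosine ramp. The function name is hypothetical, and the exact form of the ramp (a half-cosine rising from 0 to 1, with the first threshold below the second) is an assumption consistent with the abstract rather than a quotation of the publication.

```python
import math


def soft_threshold(value, first_threshold, second_threshold):
    """Map value to [0, 1]: 0 below the first threshold, 1 above the
    second, and a raised cosine ramp in between.

    Assumes first_threshold < second_threshold.
    """
    if value < first_threshold:
        return 0.0
    if value > second_threshold:
        return 1.0
    # Position within the transition band, from 0 to 1.
    position = (value - first_threshold) / (second_threshold - first_threshold)
    # Raised cosine: smooth rise with zero slope at both ends.
    return 0.5 * (1.0 - math.cos(math.pi * position))
```

Compared with a hard threshold, the ramp avoids abrupt jumps in the parameter when the underlying measurement hovers near a threshold; the midpoint of the band maps to 0.5.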
-
Patent number: 7340396Abstract: Speech feature vectors (10) are provided and utilized to develop a corresponding estimated speaker dependent speech feature space model (20) (in one embodiment, it is not necessary that this model (20) have defined correlations with the verbal content of the represented speech itself). A model alignment unit (21) then contrasts this model (20) against the contents of a speaker independent speech feature space model (24) to provide alignment indices to a transformation estimation unit (23). In one embodiment, these alignment indices are based, at least in part, upon a measure of the differences between likelihoods of occurrence for the elements that comprise the constituency of these models. The transformation estimation unit (23) utilizes these alignment indices to provide transformation parameters to a model transformation unit (25) that uses such parameters to transform a speaker independent speech recognition model set (26) and yield a resultant speaker adapted speech recognition model set (27).Type: GrantFiled: February 18, 2003Date of Patent: March 4, 2008Assignee: Motorola, Inc.Inventors: Mark Thomson, Julien Epps, Trym Holter
-
Patent number: 7328154Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech-related criterion (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.Type: GrantFiled: August 13, 2003Date of Patent: February 5, 2008Assignee: Matsushita Electrical Industrial Co., Ltd.Inventors: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
-
Publication number: 20080004875Abstract: A speech recognition method includes the steps of receiving speech in a vehicle, extracting acoustic data from the received speech, and applying a vehicle-specific inverse impulse response function to the extracted acoustic data to produce normalized acoustic data. The speech recognition method may also include one or more of the following steps: pre-processing the normalized acoustic data to extract acoustic feature vectors; decoding the normalized acoustic feature vectors using as input at least one of a plurality of global acoustic models built according to a plurality of Lombard levels of a Lombard speech corpus covering a plurality of vehicles; calculating the Lombard level of vehicle noise; and/or selecting the at least one of the plurality of global acoustic models that corresponds to the calculated Lombard level for application during the decoding step.Type: ApplicationFiled: June 29, 2006Publication date: January 3, 2008Applicant: GENERAL MOTORS CORPORATIONInventors: Rathinavelu Chengalvarayan, Scott M. Pennock
-
Patent number: 7310600Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.Type: GrantFiled: October 25, 2000Date of Patent: December 18, 2007Assignee: Canon Kabushiki KaishaInventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
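A dynamic programming match of this kind can be sketched as a standard alignment recurrence over the two phoneme sequences. The sketch treats all scores as log-probabilities to be summed and maximized, stores them in plain dictionaries, and omits the optional confidence data; these representation choices and all names are assumptions, not details from the patent.

```python
def align_score(seq_a, seq_b, confusion, insertion, deletion):
    """Best total alignment score between two phoneme sequences.

    confusion[(a, b)]: score for matching phoneme a with phoneme b.
    insertion[b]:      score for b appearing only in seq_b.
    deletion[a]:       score for a appearing only in seq_a.
    """
    rows, cols = len(seq_a), len(seq_b)
    lowest = float("-inf")
    best = [[lowest] * (cols + 1) for _ in range(rows + 1)]
    best[0][0] = 0.0
    for i in range(rows + 1):
        for j in range(cols + 1):
            if i > 0 and j > 0:  # match or substitute
                pair = (seq_a[i - 1], seq_b[j - 1])
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + confusion[pair])
            if i > 0:  # phoneme of seq_a left unmatched
                best[i][j] = max(best[i][j], best[i - 1][j] + deletion[seq_a[i - 1]])
            if j > 0:  # phoneme of seq_b left unmatched
                best[i][j] = max(best[i][j], best[i][j - 1] + insertion[seq_b[j - 1]])
    return best[rows][cols]
```

The table has one cell per pair of prefixes, so the cost is proportional to the product of the two sequence lengths. In practice the three score tables would come from the training session the abstract mentions.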