Normalizing Patents (Class 704/234)
  • Patent number: 7292974
    Abstract: As the application of a variance normalization (VN) to a speech signal (S) may be advantageous as well as disadvantageous with respect to the recognition rate in a speech recognizing process in dependence of the degree of the signal disturbance it is suggested to calculate a degree (ND) of variance normalization strength in dependence of the noise level of the signal, thereby skipping the step of variance normalization in the case of an undisturbed or clean signal.
    Type: Grant
    Filed: February 4, 2002
    Date of Patent: November 6, 2007
    Assignee: Sony Deutschland GmbH
    Inventor: Thomas Kemp
  • Patent number: 7269555
    Abstract: In a speech recognition system, a method of transforming speech feature vectors associated with speech data provided to the speech recognition system includes the steps of receiving likelihood of utterance information corresponding to a previous feature vector transformation, estimating one or more transformation parameters based, at least in part, on the likelihood of utterance information corresponding to a previous feature vector transformation, and transforming a current feature vector based on maximum likelihood criteria and/or the estimated transformation parameters, the transformation being performed in a linear spectral domain. The step of estimating the one or more transformation parameters includes the step of estimating convolutional noise Ni? and additive noise Ni? for each ith component of a speech vector corresponding to the speech data provided to the speech recognition system.
    Type: Grant
    Filed: August 30, 2005
    Date of Patent: September 11, 2007
    Assignee: International Business Machines Corporation
    Inventors: Dongsuk Yuk, David M. Lubensky
  • Publication number: 20070208562
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Application
    Filed: December 12, 2006
    Publication date: September 6, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
  • Patent number: 7219055
    Abstract: The present invention relates to a speech recognition apparatus for recognizing speeches of a plurality of users with high accuracy. An adapting unit 12 detects a best transformation function for adapting an input speech to an acoustic model from at least one transformation function based on the transformation results which are obtained by transforming the input speech by at least one transformation function stored in a storing unit 13, and allocates the input speech to the best transformation function. Further, the adapting unit 12 updates the transformation function to which the new input speech is allocated by all the input speeches allocated to the transformation function. A selecting unit 14 selects the transformation function used for transforming the input speech from at least one transformation function stored in the storing unit 13. A transforming unit 5 transforms the input speech by the selected transformation function.
    Type: Grant
    Filed: June 7, 2002
    Date of Patent: May 15, 2007
    Assignee: Sony Corporation
    Inventor: Helmut Lucke
  • Patent number: 7216076
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using single value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using single value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: May 8, 2007
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
  • Patent number: 7206389
    Abstract: A computerized method is provided for electronically directing a call to a class, such that an utterance spoken by a speaker and received by a call-routing system is classified by the call-routing system as being associated with the class, such that the call-routing system includes a speech-recognition module, a feature-extraction module, and a classification module. The method includes extracting features from recognized speech; weighting elements of a feature vector with respective speech-recognition scores, wherein each weighting element is associated with one of the features; ranking classes to which the features are associated; and electronically directing the call to a highest-ranking class.
    Type: Grant
    Filed: January 7, 2004
    Date of Patent: April 17, 2007
    Assignee: Nuance Communications, Inc.
    Inventors: Benoit Dumoulin, Dominic Lavoie, Real Tremblay, Ben Shahshahani, Remi Ken-Sho Kwan
  • Patent number: 7197456
    Abstract: A method for improving noise robustness in speech recognition, wherein a front-end is used for extracting speech features from an input speech and for providing a plurality of scaled spectral coefficients. The histogram of the scaled spectral coefficients is normalized to the histogram of a training set using Gaussian approximations. The normalized spectral coefficients are then converted into a set of cepstrum coefficients by a decorrelation module and further subjected to cepstral domain feature-vector normalization.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: March 27, 2007
    Assignee: Nokia Corporation
    Inventors: Hemmo Haverinen, Imre Kiss
  • Patent number: 7146317
    Abstract: A speech recognition device (8), to which can be applied over a first receive channel (21) and a second receive channel (25, 28) speech information (SI) that is colored by the respective receive channel (21, 25, 28), comprises reference storage means (36) for storing reference information (RI1) that features the type of pronunciation of words by a plurality of reference speakers and receive channel adaptation means (30, 38, 44) for adapting the stored reference information (RI, ARI) to a first or second receive channel (21, 25, 28) used by a user and user adaptation means (37) for adapting the stored reference information (RI1, RI2, RI3) to the type of pronunciation of words by the user of the speech recognition device (8) and speech recognition means (29) for recognizing text information (TI) to be assigned to the fed speech information (SI), while reference information (ARI1, ARI2, ARI3) adapted by the receive channel adaptation means (30, 38, 44) and by the user adaptation means (37) is evaluated, where no
    Type: Grant
    Filed: February 22, 2001
    Date of Patent: December 5, 2006
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Heinrich Franz Bartosik
  • Patent number: 7136459
    Abstract: Systems and techniques for improved efficiency and accuracy in voice dialing and directory lookup applications. A voice dialing module receives an input from a user and examines a directory to identify recognition results matching the voice input. A list of recognition results matching the voice input is constructed, the entries being ranked by confidence. A called party cache for each user includes entries for parties the user is likely to call. Once the result list has been constructed, the voice dialing module compares the list with the called party cache in order to determine if entries in the list appear in the cache. If an entry in the result list appears in the cache, the result list is reordered in order to take into account the increased likelihood that an entry appearing in the called party cache will be an entry the user wishes to call.
    Type: Grant
    Filed: February 5, 2004
    Date of Patent: November 14, 2006
    Assignee: Avaya Technology Corp.
    Inventors: Robert S. Cooper, Derek Sanders, Vladimir Sergeyevich Tokarev
  • Patent number: 7107213
    Abstract: In a voice pitch normalization device equipped in a voice recognition device VRAp for recognizing an incoming command voice Sva uttered by any speaker, and used to normalize the incoming command voice to be in an optimal pitch for voice recognition, a target voice generator produces a target voice signal by changing the incoming command voice Svd on the basis of a predetermined degree. A probability calculator calculates a probability indicating a degree of coincidence among the target voice signal and a plurality of words in sample data. A voice pitch changer repeatedly changes the target voice signal in voice pitch until a maximum probability becomes a predetermined probability or greater.
    Type: Grant
    Filed: December 3, 2003
    Date of Patent: September 12, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Mikio Oda, Tomoe Kawane
  • Patent number: 7089182
    Abstract: A method for performing noise adaptation of a target speech signal input to a speech recognition system, where the target speech signal contains both additive and convolutional noises. The method includes estimating an additive noise bias and a convolutional noise bias in the target speech signal; and jointly compensating the target speech signal for the additive and convolutional noise biases in a feature domain.
    Type: Grant
    Filed: March 15, 2002
    Date of Patent: August 8, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Younes Souilmi, Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua
  • Patent number: 7035797
    Abstract: A method and apparatus for speech processing in a distributed speech recognition system having a front-end and a back-end. The speech processing steps in the front-end are as follows: extracting speech features from a speech signal and normalizing the speech features in order to alter the power of the noise component in the modulation spectrum in relation to the power of the signal component, especially with frequencies above 10 Hz. A low-pass filter is then used to filter the normalized modulation spectrum in order to improve the signal-to-noise ratio (SNR) in the speech signal. The combination of feature vector normalization and low-pass filtering is effective in noise removal, especially in a low SNR environment.
    Type: Grant
    Filed: December 14, 2001
    Date of Patent: April 25, 2006
    Assignee: Nokia Corporation
    Inventor: Juha Iso-Sipila
  • Patent number: 7027979
    Abstract: A method and apparatus for speech reconstruction within a distributed speech recognition system is provided herein. Missing MFCCs are reconstructed and utilized to generate speech. Particularly, partial recovery of the missing MFCCs is achieved by exploiting the dependence of the missing MFCCs on the transmitted pitch period P as well as on the transmitted MFCCs. Harmonic magnitudes are then obtained from the transmitted and reconstructed MFCCs, and the speech is reconstructed utilizing these harmonic magnitudes.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: April 11, 2006
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
  • Patent number: 7006969
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using single value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using single value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: November 1, 2001
    Date of Patent: February 28, 2006
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
  • Patent number: 7003451
    Abstract: The present invention proposes a new method and a new apparatus for enhancement of audio source coding systems utilizing high frequency reconstruction (HFR). It utilizes adaptive filtering to reduce artifacts due to different tonal characteristics in different frequency ranges of an audio signal upon which HFR is performed. The present invention is applicable to both speech coding and natural audio coding systems.
    Type: Grant
    Filed: November 14, 2001
    Date of Patent: February 21, 2006
    Assignee: Coding Technologies AB
    Inventors: Kristofer Kjörling, Per Ekstrand, Fredrik Henn, Lars Villemoes
  • Patent number: 6996527
    Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: February 7, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
  • Patent number: 6980952
    Abstract: A maximum likelihood (ML) linear regression (LR) solution to environment normalization is provided where the environment is modeled as a hidden (non-observable) variable. By application of an expectation maximization algorithm and extension of Baum-Welch forward and backward variables (Steps 23a–23d) a source normalization is achieved such that it is not necessary to label a database in terms of environment such as speaker identity, channel, microphone and noise type.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: December 27, 2005
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6959276
    Abstract: A method and apparatus are provided for identifying a noise environment for a frame of an input signal based on at least one feature for that frame. Under one embodiment, the noise environment is identified by determining the probability of each of a set of possible noise environments. For some embodiments, the probabilities of the noise environments for past frames are included in the identification of an environment for a current frame. In one particular embodiment, a count is generated for each environment that indicates the number of past frames for which the environment was the most probable environment. The environment with the highest count is then selected as the environment for the current frame.
    Type: Grant
    Filed: September 27, 2001
    Date of Patent: October 25, 2005
    Assignee: Microsoft Corporation
    Inventors: James G. Droppo, Alejandro Acero, Li Deng
  • Patent number: 6947891
    Abstract: A speech recognition system that is insensitive to external noise and applicable to actual life includes an A/D converter that converts analog voice signals to digital signals. An FIR filtering section employs powers-of-two conversion to filter the digital signals converted at the A/D converter into numbers of channels. A characteristic extraction section immediately extracts speech characteristics having strong noise-resistance from the output signals of the FIR filtering section without using additional memories. A word boundary detection section discriminates the information of the start-point and the end-point of a voice signal on the basis of the characteristics extracted by the characteristic extraction section.
    Type: Grant
    Filed: January 22, 2001
    Date of Patent: September 20, 2005
    Assignee: Korea Advanced Institute of Science & Technology
    Inventors: Soo Young Lee, Chang Min Kim
  • Patent number: 6915259
    Abstract: Linear approximation of the background noise is applied after feature extraction and prior to speaker adaptation to allow the speaker adaptation system to adapt the speech models to the enrolling user without distortion from background noise. The linear approximation is applied in the feature domain, such as in the cepstral domain. Any adaptation technique that is commutative in the feature domain may be used.
    Type: Grant
    Filed: May 24, 2001
    Date of Patent: July 5, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, Patrick Nguyen, David Kryze, Jean-Claude Junqua
  • Patent number: 6823305
    Abstract: Speaker normalization is carried out based on biometric information available about a speaker, such as his height, or a dimension of a bodily member or article of clothing. The chosen biometric parameter correlates with the vocal tract length. Speech can be normalized based on the biometric parameter, which thus indirectly normalizes the speech based on the vocal tract length of the speaker. The inventive normalization can be used in model formation, or in actual speech recognition usage, or both. Substantial improvements in accuracy have been noted at little cost. The preferred biometric parameter is height, and the preferred form of scaling is linear scaling with the scale factor proportional to the height of the speaker.
    Type: Grant
    Filed: December 21, 2000
    Date of Patent: November 23, 2004
    Assignee: International Business Machines Corporation
    Inventor: Ellen M. Eide
  • Patent number: 6804643
    Abstract: A speech recognition feature extractor for extracting speech features from a speech signal, comprising: a time-to-frequency domain transformer (FFT) for generating spectral magnitude values in the frequency domain from the speech signal; a frequency domain filtering block (Mel) for generating a sub-band value relating to spectral magnitude values of a certain frequency sub-band; a compression block (LOG) for compressing said sub-band values; a transformation block (DCT) for obtaining a set of de-correlated features from the compressed sub-band values; and a normalising block (CN) for normalising de-correlated features.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: October 12, 2004
    Assignee: Nokia Mobile Phones Ltd.
    Inventor: Imre Kiss
  • Patent number: 6785648
    Abstract: A system and method for performing speech recognition in cyclostationary noise environments includes a characterization module that may access original cyclostationary noise from an intended operating environment of a speech recognition device. The characterization module may then convert the original cyclostationary noise into target stationary noise which retains characteristics of the original cyclostationary noise. A conversion module may then generate a modified training database by utilizing the target stationary noise to modify an original training database that was prepared for training a recognizer in the speech recognition device. A training module may then train the recognizer with the modified training database to thereby optimize speech recognition procedures in cyclostationary noise environments.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: August 31, 2004
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Gustavo Hernandez Abrego
  • Patent number: 6760701
    Abstract: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. An estimate of the enrollment channel and of the test channel is developed for inverse filtering of the enrollment or the test speech, respectively. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to successfully boost the performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
    Type: Grant
    Filed: January 8, 2002
    Date of Patent: July 6, 2004
    Assignee: T-NETIX, Inc.
    Inventors: Manish Sharma, Xiaoyu Zhang, Richard J. Mammone
  • Publication number: 20040117181
    Abstract: An input speech utterance is segmented into a prefixed time length to make frames, to extract an acoustic feature parameter of each frame. The acoustic feature parameter is frequency-converted by using plural frequency conversion coefficients previously defined. By using all combinations of plural post-conversion feature parameters obtained by the frequency conversion and at least one standard phonemic model, to compute plural similarities or distances of between the post-conversion feature parameters of each of the frames and the standard phonemic model. A frequency converting condition for normalizing the input utterance is decided by using the plural similarities or distances. By using the frequency converting condition, the input utterance is normalized. With this method, even in case there is change of the speaker making a speech utterance, the individual difference of input utterance can be corrected thereby improving the performance of speech recognition.
    Type: Application
    Filed: September 24, 2003
    Publication date: June 17, 2004
    Inventors: Keiko Morii, Yoshihisa Nakatoh, Hiroyasu Kuwano
  • Patent number: 6751588
    Abstract: A method for performing microphone conversions in a speech recognition system comprises a speech module that simultaneously captures an identical input signal using both an original microphone and a final microphone. The original microphone is also used to record an original training database. The final microphone is also used to capture input signals during normal use of the speech recognition system. A characterization module then analyzes the recorded identical input signal to generate characterization values that are subsequently utilized by a conversion module to convert the original training database into a final training database. A training program then uses the final training database to train a recognizer in the speech module in order to optimally perform a speech recognition process, in accordance with the present invention.
    Type: Grant
    Filed: November 23, 1999
    Date of Patent: June 15, 2004
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Miyuki Tanaka, Duanpei Wu
  • Publication number: 20040107098
    Abstract: An arrangement for yielding enhanced audio features towards the provision of enhanced audio-visual features for speech recognition. Input is provided in the form of noisy audio-visual features and noisy audio features related to the noisy audio-visual features.
    Type: Application
    Filed: November 29, 2002
    Publication date: June 3, 2004
    Applicant: IBM Corporation
    Inventors: Sabine Deligne, Chalapathy V. Neti, Gerasimos Potamianos
  • Publication number: 20040107099
    Abstract: During a learning phase, a speech recognition device generates parameters of an acceptance voice model relating to a voice segment spoken by an authorized speaker and a rejection voice model. It uses normalization parameters to normalize a speaker verification score depending on the likelihood ratio of a voice segment to be tested and the acceptance model and rejection model. The speaker obtains access to a service application only if the normalized score is above a threshold. According to the invention, a module updates the normalization parameters as a function of the verification score on each voice segment test only if the normalized score is above a second threshold.
    Type: Application
    Filed: July 22, 2003
    Publication date: June 3, 2004
    Applicant: FRANCE TELECOM
    Inventor: Delphine Charlet
  • Patent number: 6694294
    Abstract: A method and system that improves voice recognition by improving the voice recognizer of a voice recognition system. Mu-law compression of bark amplitudes is used to reduce the effect of additive noise and thus improve the accuracy of the voice recognition system. A-law compression of bark amplitudes is used to improve the accuracy of the voice recognizer. Both mu-law compression and mu-law expansion can be used in the voice recognizer to improve the accuracy of the voice recognizer. Both A-law compression and A-law expansion can be used in the voice recognizer to improve the accuracy of the voice recognizer.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: February 17, 2004
    Assignee: Qualcomm Incorporated
    Inventor: Harinath Garudadri
  • Patent number: 6687665
    Abstract: In a voice pitch normalization device equipped in a voice recognition device VRAp for recognizing an incoming command voice Sva uttered by any speaker, and used to normalize the incoming command voice to be in an optimal pitch for voice recognition, a target voice generator produces a target voice signal by changing the incoming command voice Svd on the basis of a predetermined degree. A probability calculator calculates a probability indicating a degree of coincidence among the target voice signal and a plurality of words in sample data. A voice pitch changer repeatedly changes the target voice signal in voice pitch until a maximum probability becomes a predetermined probability or greater.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: February 3, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Mikio Oda, Tomoe Kawane
  • Patent number: 6681204
    Abstract: An apparatus and a method for encoding an input signal on the time base through orthogonal transform involves removing the correlation of signal waveform on the basis of the parameters obtained by means of linear predictive coding (LPC) analysis and pitch analysis of the input signal on the time base prior to the orthogonal transform. The time base input signal from input terminal is sent to a normalization circuit section and a LPC analysis circuit. The normalization circuit section removes the correlation of the signal waveform and takes out the residue by an LPC inverse filter and pitch inverse filter and sends the residue to an orthogonal transform circuit section. The LPC parameters from the LPC analysis circuit and the pitch parameters from the pitch analysis circuit are sent to a bit allocation calculation circuit.
    Type: Grant
    Filed: August 23, 2001
    Date of Patent: January 20, 2004
    Assignee: Sony Corporation
    Inventors: Jun Matsumoto, Masayuki Nishiguchi, Kenichi Makino
  • Patent number: 6678657
    Abstract: The present invention relates to a method and an apparatus for a robust feature extraction for speech recognition in a noisy environment, wherein the speech signal is segmented and is characterized by spectral components. The speech signal is split into a number of short term spectral components in L subbands, with L=1, 2, . . . and a noise spectrum from segments that only contain noise is estimated. Then a spectral subtraction of the estimated noise spectrum from the corresponding short term spectrum is performed and a probability for each short term spectrum component to contain noise is calculated. Finally those spectral components of each short-term spectrum having a low probability to contain speech are interpolated in order to smooth those short-term spectra that only contain noise. With the interpolation the spectral components containing noise are interpolated by reliable spectral speech components that could be found in the neighborhood.
    Type: Grant
    Filed: October 23, 2000
    Date of Patent: January 13, 2004
    Assignee: Telefonaktiebolaget LM Ericsson(Publ)
    Inventors: Raymond Brückner, Hans-Günter Hirsch, Rainer Klisch, Volker Springer
  • Patent number: 6647123
    Abstract: A signal processing circuit and method for increasing speech intelligibility. The invention comprises a receiving circuit for receiving an audio signal detectable by a human. A gain amplifying circuit provides gain amplification of the audio signal. A shaping filter modifies the audio signal to be in phase with a second audio signal present at the receiving circuit and which is detected by the human unprocessed by the signal processing circuit. The shaping filter further differentially amplifies first and second speech formant frequencies to restore a normal loudness relationship between them. A feedback circuit controls the gain amplification in the gain amplifying circuit for enabling the signal processing circuit to substantially prevent regenerative oscillation of the amplified audio signal. Additionally, a signal tone may be injected into the signal processing circuit for automatically controlling the gain amplifying circuit.
    Type: Grant
    Filed: March 4, 2002
    Date of Patent: November 11, 2003
    Assignee: Bioinstco Corp
    Inventors: Gillray L. Kandel, Lee E. Ostrander
  • Patent number: 6611801
    Abstract: A speech recognition system includes a token builder, a noise estimator, a template padder, a gain and noise adapter and a dynamic time warping (DTW) unit. The token builder produces a widened test token representing an input test utterance and at least one frame before and after the input test utterance. The noise estimator estimates noise qualities of the widened test token. The template padder pads each of a plurality of reference templates with at least one blank frame at either the beginning or end of the reference template. The gain and noise adapter adapts each padded reference template with the noise and gain qualities thereby producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise adapted speech where speech exists. The DTW unit performs a noise adapted DTW operation comparing the widened token with one of the noise adapted reference templates, wherein, when comparing against one of the noise frames, no duration constraints are used.
    Type: Grant
    Filed: September 4, 2002
    Date of Patent: August 26, 2003
    Assignee: Intel Corporation
    Inventor: Adoram Erell
  • Publication number: 20030023434
    Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.
    Type: Application
    Filed: July 26, 2001
    Publication date: January 30, 2003
    Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
  • Publication number: 20020173957
    Abstract: Speech issued by a speaker is collected by a microphone 1, and applied to a signal delay unit 3 and a sound level estimator 4 through an A/D converter 2. The sound level estimator 4 calculates a sound level estimation value based on the applied digital sound signal. The signal delay unit 3 applies the digital sound signal delayed by a predetermined sound level rising time period to a sound level adjuster 5. The sound level adjuster 5 adjusts the sound level of the digital sound signal based on the sound level estimation value, and applies the adjusted sound level output to the speech recognition unit 6. The speech recognition unit 6 performs speech recognition in response to the applied adjusted sound level output.
    Type: Application
    Filed: June 14, 2002
    Publication date: November 21, 2002
    Inventors: Tomoe Kawane, Takeo Kanamori
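The delay-then-adjust arrangement in the abstract above can be illustrated with a short sketch. It is not the applicants' implementation: the function name `adjust_level`, the peak-based level estimate, and the `target_level` and `floor` parameters are all assumptions made for illustration.

```python
from collections import deque


def adjust_level(samples, delay, target_level=0.5, floor=1e-6):
    """Scale each delayed sample by target_level / estimated level.

    Assumption: the level estimate is the peak absolute value over the
    most recent `delay + 1` samples, so the estimator has already seen
    the level rise by the time the delayed sample is emitted.
    """
    window = deque(maxlen=delay + 1)  # acts as the signal delay unit
    out = []
    for s in samples:
        window.append(s)
        if len(window) <= delay:
            continue  # delay line is still filling; nothing to emit yet
        level = max(abs(v) for v in window)      # sound level estimation value
        gain = target_level / max(level, floor)  # floor avoids division by zero
        out.append(window[0] * gain)             # oldest sample = delayed signal
    return out
```

The output is shorter than the input by `delay` samples, because the first samples are held back while the delay line fills.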
  • Patent number: 6466906
    Abstract: A speech recognition system includes a token builder, a noise estimator, a template padder, a gain and noise adapter and a dynamic time warping (DTW) unit. The token builder produces a widened test token representing an input test utterance and at least one frame before and after the input test utterance. The noise estimator estimates noise qualities of the widened test token. The template padder pads each of a plurality of reference templates with at least one blank frame at either the beginning or end of the reference template. The gain and noise adapter adapts each padded reference template with the noise and gain qualities thereby producing adapted reference templates having noise frames wherever a blank frame was originally placed and noise adapted speech where speech exists. The DTW unit performs a noise adapted DTW operation comparing the widened token with one of the noise adapted reference templates.
    Type: Grant
    Filed: January 6, 1999
    Date of Patent: October 15, 2002
    Assignee: DSPC Technologies Ltd.
    Inventor: Adoram Erell
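As a rough illustration of why padding templates with noise frames helps, the sketch below pads a reference template with an estimated noise frame and compares it with a widened test token using textbook DTW. The helper names, the Euclidean frame distance, and the single shared noise frame are assumptions; the gain adaptation described in the abstract is not shown.

```python
def dtw_distance(test, template):
    """Textbook DTW cost between two sequences of feature frames."""
    inf = float("inf")
    n, m = len(test), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumption).
            d = sum((a - b) ** 2 for a, b in zip(test[i - 1], template[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pad_with_noise(template, noise_frame):
    """Place one noise frame before and after the template frames."""
    return [list(noise_frame)] + [list(f) for f in template] + [list(noise_frame)]
```

With a widened token such as `[[0.1], [0.1], [1.0], [2.0], [0.1]]` and a template `[[1.0], [2.0]]`, the padded template absorbs the leading and trailing noise frames, so the alignment cost drops compared with matching the unpadded template.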
  • Patent number: 6456969
    Abstract: A method for recognizing a pattern that comprises a set of physical stimuli, said method comprising the steps of: providing a set of training observations and through applying a plurality of association models ascertaining various measuring values pj(k|x), j=1 . . . M, that each pertain to assigning a particular training observation to one or more associated pattern classes; setting up a log/linear association distribution by combining all association models of the plurality according to respective weight factors, and joining thereto a normalization quantity to produce a compound association distribution; optimizing said weight factors for thereby minimizing a detected error rate of the actual assigning to said compound distribution; recognizing target observations representing a target pattern with the help of said compound distribution.
    Type: Grant
    Filed: August 10, 1999
    Date of Patent: September 24, 2002
    Assignee: U.S. Philips Corporation
    Inventor: Peter Beyerlein
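The log/linear combination with a normalization quantity can be sketched as follows. This is a minimal illustration under stated assumptions: every model probability is strictly positive, the function name `log_linear_combine` is hypothetical, and the optimization of the weight factors against the error rate is not shown.

```python
import math


def log_linear_combine(model_probs, weights):
    """Combine per-model class probabilities p_j(k|x) into one distribution.

    model_probs: one list per model j, each holding a probability per class k.
    weights: one weight factor per model j.
    """
    num_classes = len(model_probs[0])
    # Weighted sum of log probabilities for each class k.
    scores = [
        sum(w * math.log(p[k]) for w, p in zip(weights, model_probs))
        for k in range(num_classes)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - top) for s in scores]
    z = sum(unnormalized)  # normalization quantity
    return [u / z for u in unnormalized]
```

For example, two models giving `[0.5, 0.5]` and `[0.8, 0.2]` with unit weights combine to approximately `[0.8, 0.2]`.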
  • Publication number: 20020116169
    Abstract: A method generates normalized representations of strings, in particular sentences. The method, which can be used for translation, receives an input string. The input string is subjected to a first operation out of a plurality of operating functions for linguistically processing the input string to generate a first normalized representation of the input string that includes linguistic information. The first normalized representation is then subjected to a second operation for replacing linguistic information in the first normalized representation by abstract variables and to generate a second normalized representation.
    Type: Application
    Filed: December 18, 2000
    Publication date: August 22, 2002
    Applicant: Xerox Corporation
    Inventors: Salah Ait-Mokhtar, Jean-Pierre Chanod, Eric Gaussier
  • Patent number: 6401063
    Abstract: A method and apparatus for use in the field of speaker verification. The invention provides a method and apparatus for generating a pair of data elements, namely a first element representative of a speaker specific speech pattern and a second element representative of a biased normalizing template, the pair of data elements being suitable for use in a speaker verification system. The invention provides receiving an audio signal representative of a training token associated with a given speaker and processing the training token on a basis of a reference speaker independent model set to derive a speaker independent normalizing template. The invention further provides processing the training token on a basis of a reference speaker independent model set for generating a speaker specific speech pattern. The speaker specific speech pattern and the speaker independent normalizing template are then processed to derive a biased normalizing template.
    Type: Grant
    Filed: November 9, 1999
    Date of Patent: June 4, 2002
    Assignee: Nortel Networks Limited
    Inventors: Matthieu Hébert, Stephen D. Peters
  • Patent number: 6353671
    Abstract: A signal processing circuit and method for increasing speech intelligibility. The invention comprises a receiving circuit for receiving an audio signal detectable by a human. A gain amplifying circuit provides gain amplification of the audio signal. A shaping filter modifies the audio signal to be in phase with a second audio signal present at the receiving circuit and which is detected by the human unprocessed by the signal processing circuit. The shaping filter further differentially amplifies first and second speech formant frequencies to restore a normal loudness relationship between them. A feedback circuit controls the gain amplification in the gain amplifying circuit for enabling the signal processing circuit to substantially prevent regenerative oscillation of the amplified audio signal. Additionally, a signal tone may be injected into the signal processing circuit for automatically controlling the gain amplifying circuit.
    Type: Grant
    Filed: February 5, 1998
    Date of Patent: March 5, 2002
    Assignee: Bioinstco Corp.
    Inventors: Gillray L. Kandel, Lee E. Ostrander
  • Publication number: 20020013703
    Abstract: An apparatus and a method for encoding an input signal on the time base through orthogonal transform, comprising a step of removing the correlation of signal waveform on the basis of the parameters obtained by means of linear predictive coding (LPC) analysis and pitch analysis of the input signal on the time base prior to the orthogonal transform. The time base input signal from input terminal 10 is sent to normalization circuit section 11 and LPC analysis circuit 39. The normalization circuit section 11 removes the correlation of the signal waveform and takes out the residue by means of LPC inverse filter 12 and pitch inverse filter 13 and sends the residue to orthogonal transform circuit section 25. The LPC parameters from the LPC analysis circuit 39 and the pitch parameters from the pitch analysis circuit 15 are sent to bit allocation calculation circuit 41.
    Type: Application
    Filed: August 23, 2001
    Publication date: January 31, 2002
    Applicant: Sony Corporation
    Inventors: Jun Matsumoto, Masayuki Nishiguchi, Kenichi Makino
  • Publication number: 20020013702
    Abstract: The present invention relates to a voice recognition system.
    Type: Application
    Filed: January 22, 2001
    Publication date: January 31, 2002
    Inventors: Soo Young Lee, Chang Min Kim
  • Patent number: 6327564
    Abstract: An accurate and reliable method is provided for detecting speech from an input speech signal. A probabilistic approach is used to classify each frame of the speech signal as speech or non-speech. The speech detection method is based on a frequency spectrum extracted from each frame, such that the value for each frequency band is considered to be a random variable and each frame is considered to be an occurrence of these random variables. Using the frequency spectrums from a non-speech part of the speech signal, a known set of random variables is constructed. Next, each unknown frame is evaluated as to whether or not it belongs to this known set of random variables. To do so, a unique random variable (preferably a chi-square value) is formed from the set of random variables associated with the unknown frame. The unique variable is normalized with respect to the known set of random variables and then classified as either speech or non-speech using the “Test of Hypothesis”.
    Type: Grant
    Filed: March 5, 1999
    Date of Patent: December 4, 2001
    Assignee: Matsushita Electric Corporation of America
    Inventors: Philippe Gelin, Jean-Claude Junqua
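One way to picture the chi-square test described above is the sketch below. It is an assumption-laden illustration, not the patented method: the per-band Gaussian model, the normalization `(chi2 - dof) / sqrt(2 * dof)`, the `threshold` default, and both function names are chosen here for clarity.

```python
import math


def fit_noise(noise_frames):
    """Estimate per-band mean and sample variance from non-speech frames."""
    n = len(noise_frames)
    bands = len(noise_frames[0])
    means = [sum(f[b] for f in noise_frames) / n for b in range(bands)]
    variances = [
        sum((f[b] - means[b]) ** 2 for f in noise_frames) / (n - 1)
        for b in range(bands)
    ]
    return means, variances


def is_speech(frame, means, variances, threshold=3.0):
    """Flag a frame as speech when its chi-square statistic is unusually large.

    Assumes every band variance is non-zero.
    """
    chi2 = sum((x - m) ** 2 / v for x, m, v in zip(frame, means, variances))
    dof = len(frame)
    # Normalize by the mean (dof) and standard deviation (sqrt(2*dof))
    # that a chi-square variable would have under the noise hypothesis.
    z = (chi2 - dof) / math.sqrt(2 * dof)
    return z > threshold
```

A frame close to the noise statistics yields a small or negative normalized value and is kept as non-speech; a frame far from them exceeds the threshold.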
  • Patent number: 6314396
    Abstract: Energy normalization in a speech recognition system is achieved by adaptively tracking the high, mid, and low energy envelopes, wherein the adaptive high energy tracking value adapts with weighting enhanced for high energies, and the adaptive low energy tracking value adapts with weighting enhanced for low energies. A tracking method is also provided for discriminating waveform segments as being one of “speech” or “silence”, and a measure of the signal to noise ratio and absolute noise floor are used as feedback means to achieve optimal speech recognition accuracy.
    Type: Grant
    Filed: November 6, 1998
    Date of Patent: November 6, 2001
    Assignee: International Business Machines Corporation
    Inventor: Michael D. Monkowski
  • Patent number: 6236962
    Abstract: An apparatus, method, and storage medium for eliminating the influence of line characteristics in a real-time manner in order to raise recognition precision of input speech and to enable the speech to be recognized in a real-time manner, includes a device and step for obtaining an estimate value of a long-time mean of a parameter from speech feature parameters which are sequentially inputted by using the speech feature parameters which have already been inputted, and a device and step for normalizing the speech feature parameter inputted at that time point by using the obtained estimate value. Each time the speech feature parameter is inputted, the latest estimate value is obtained by using the already inputted parameters including the inputted speech feature parameter, and the latest input speech feature parameter is normalized by using the updated estimate value.
    Type: Grant
    Filed: March 12, 1998
    Date of Patent: May 22, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Tetsuo Kosaka, Yasuhiro Komori
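The update-then-normalize loop in the abstract above can be sketched with a plain running mean. The choice of an unweighted cumulative mean as the estimator and the name `normalize_stream` are assumptions; the patent may use a different estimate of the long-time mean.

```python
def normalize_stream(frames):
    """Yield each frame minus the running mean of all frames seen so far.

    The mean is updated with the current frame before that frame is
    normalized, so no future input is needed.
    """
    mean = None
    count = 0
    for frame in frames:
        count += 1
        if mean is None:
            mean = list(frame)
        else:
            # Incremental mean update: m += (x - m) / count
            mean = [m + (x - m) / count for m, x in zip(mean, frame)]
        yield [x - m for x, m in zip(frame, mean)]
```

Because it is a generator, frames can be normalized one at a time as they arrive, for example `for out in normalize_stream(feature_source): ...`, where `feature_source` is any iterable of feature vectors.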
  • Patent number: 6236963
    Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration showing an anatomical configuration of a vocal tract of each normalization-target speaker, by looking up a correspondence between vocal-tract configuration parameters and formant frequencies previously determined based on a vocal tract model of the standard speaker, based on speech waveform data of each normalization-target speaker.
    Type: Grant
    Filed: March 16, 1999
    Date of Patent: May 22, 2001
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
  • Patent number: 6199041
    Abstract: A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.
    Type: Grant
    Filed: November 20, 1998
    Date of Patent: March 6, 2001
    Assignee: International Business Machines Corporation
    Inventors: Fu-Hua Liu, Michael A. Picheny
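The cepstra-to-log-spectra-and-back path described above can be sketched with an unnormalized DCT pair. The sketch assumes the bands are ordered by increasing frequency and that the caller already knows how many bands to keep (`keep_bands`); the mapping from band index to frequency, and all function names, are hypothetical.

```python
import math


def cepstra_to_log_bands(cepstra, num_bands):
    """Inverse step (DCT-III): cepstral coefficients -> log band energies."""
    return [
        cepstra[0] / num_bands
        + (2.0 / num_bands) * sum(
            c * math.cos(math.pi * k * (b + 0.5) / num_bands)
            for k, c in enumerate(cepstra) if k > 0
        )
        for b in range(num_bands)
    ]


def log_bands_to_cepstra(log_bands, num_ceps):
    """Forward step (unnormalized DCT-II): log band energies -> cepstra."""
    n = len(log_bands)
    return [
        sum(e * math.cos(math.pi * k * (b + 0.5) / n) for b, e in enumerate(log_bands))
        for k in range(num_ceps)
    ]


def resample_cepstra(cepstra, num_bands, keep_bands, num_ceps):
    """Drop the bands above the cutoff and recompute the cepstra."""
    log_bands = cepstra_to_log_bands(cepstra, num_bands)
    return log_bands_to_cepstra(log_bands[:keep_bands], num_ceps)
```

When the number of cepstral coefficients equals the number of bands, the two transforms invert each other exactly, which gives a simple sanity check for the pair.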
  • Patent number: 6178400
    Abstract: Either or both the calling and called parties to a telephone call carried by a telecommunications network may invoke normalization of their speech to enhance intelligibility. In response to such a request, a speech normalization platform determines the manner in which the speech should be normalized. The platform does so by selecting from among a set of rules that specify the manner in which the speech should be modified, the rule that most closely corresponds with a set of parameters indicative of the party's speech. Having selected the rule, the platform then implements the rule to modify the party's speech to enhance its intelligibility.
    Type: Grant
    Filed: July 22, 1998
    Date of Patent: January 23, 2001
    Assignee: AT&T Corp.
    Inventor: Hossein Eslambolchi
  • Patent number: 6173258
    Abstract: A method for reducing noise distortions in a speech recognition system comprises a feature extractor that includes a noise-suppressor, one or more time cosine transforms, and a normalizer. The noise-suppressor preferably performs a spectral subtraction process early in the feature extraction procedure. The time cosine transforms preferably operate in a centered-mode to each perform a transformation in the time domain. The normalizer calculates and utilizes normalization values to generate normalized features for speech recognition. The calculated normalization values preferably include mean values, left variances and right variances.
    Type: Grant
    Filed: October 22, 1998
    Date of Patent: January 9, 2001
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Miyuki Tanaka, Ruxin Chen, Duanpei Wu
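The abstract names mean values, left variances and right variances as normalization values but does not say how they are applied. The sketch below shows one plausible reading, offered as an assumption: values below the mean are scaled by the left standard deviation and values above it by the right one. The name `asymmetric_normalize` is hypothetical.

```python
import math


def asymmetric_normalize(values):
    """Center on the mean, then scale each side by its own deviation.

    Left variance: mean squared deviation of values below the mean.
    Right variance: mean squared deviation of values above the mean.
    """
    mean = sum(values) / len(values)
    left = [(v - mean) ** 2 for v in values if v < mean]
    right = [(v - mean) ** 2 for v in values if v > mean]
    # Fall back to 1.0 when one side is empty (e.g. constant input).
    left_sd = math.sqrt(sum(left) / len(left)) if left else 1.0
    right_sd = math.sqrt(sum(right) / len(right)) if right else 1.0
    return [(v - mean) / (left_sd if v < mean else right_sd) for v in values]
```

For a skewed feature track such as `[0.0, 0.0, 0.0, 4.0]`, both sides end up at unit distance from zero, which a single symmetric variance would not achieve.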