Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 6721698
    Abstract: A speech recognition feature extractor includes a time-to-frequency domain transformer for generating spectral values in the frequency domain from a speech signal; a partitioning means for generating a first set and an additional set of spectral values in the frequency domain; a first feature generator for generating a first group of speech features using the first set of spectral values; an additional feature generator for generating an additional group of speech features using the additional set of spectral values, the feature generators arranged to operate in parallel; an assembler for assembling an output set of speech features from at least one speech feature from the first group of speech features and at least one speech feature from the additional group of speech features; and an anti-aliasing and sampling rate reduction block, where the first and the additional set of spectral values comprise at least one common spectral value.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: April 13, 2004
    Assignee: Nokia Mobile Phones, Ltd.
    Inventors: Ramalingam Hariharan, Juha Häkkinen, Imre Kiss, Jilei Tian, Olli Viikki
  • Patent number: 6718304
    Abstract: A speech recognition support method in a system that retrieves a map in response to a user's input speech. The user's speech is recognized and a recognition result is obtained. If the recognition result represents a point on the map, the distance between that point and a base point on the map is calculated. It is then determined whether the distance exceeds a threshold. If it does, an inquiry is output to the user to confirm whether the recognition result is correct.
    Type: Grant
    Filed: June 29, 2000
    Date of Patent: April 6, 2004
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Mitsuyoshi Tachimori, Hiroshi Kanazawa
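The confirmation rule in the abstract above reduces to a distance test. A minimal sketch, assuming planar map coordinates and a hypothetical `needs_confirmation` helper (names and geometry are illustrative, not from the patent):

```python
import math

def needs_confirmation(point, base_point, threshold):
    """Return True when the recognized map point lies farther from the
    base point than the threshold, in which case the system should ask
    the user to confirm the recognition result."""
    distance = math.hypot(point[0] - base_point[0], point[1] - base_point[1])
    return distance > threshold
```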
  • Patent number: 6714907
    Abstract: A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. An improved method calculates a criterion value by minimizing an error signal in a minimization loop that forms part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance the overall quality of the synthesized speech at a limited average bit rate.
    Type: Grant
    Filed: February 15, 2001
    Date of Patent: March 30, 2004
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
  • Publication number: 20040059572
    Abstract: A system and a method for quantitatively measuring voice transmission quality within a voice-over-data-network such as a telephony-enabled LAN utilize speech recognition to measure the quality of voice transmission. A first aspect of the invention involves determining the suitability of the LAN for voice communications. A voice sample is selected by a first terminal and is transmitted to a second terminal on the LAN a number of times. The first terminal introduces an incrementally larger quantity of noise into each transmission of the voice packet. The second terminal performs speech recognition for each successively received voice sample and determines the accuracy for each speech recognition session. The amount of noise which drops the speech recognition accuracy below a threshold level provides a measure of the suitability of the LAN for voice communications. During normal operation of the LAN, speech recognition accuracy tests are performed between various endpoints to monitor voice transmission quality.
    Type: Application
    Filed: September 25, 2002
    Publication date: March 25, 2004
    Inventors: Branislav Ivanic, Eli Jacobi, Peter Kozdon, Noboru Nishiya, Christoph Aktas
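The suitability test described above amounts to escalating the injected noise until recognition accuracy falls below a floor. A sketch of that loop, where `recognize_accuracy` is a hypothetical callback standing in for one transmit-and-recognize round between the two terminals:

```python
def noise_tolerance(recognize_accuracy, noise_levels, accuracy_floor):
    """Return the first noise level at which recognition accuracy falls
    below the floor; a higher return value means the network tolerates
    more noise and is more suitable for voice traffic.
    `recognize_accuracy(noise)` stands in for transmitting the voice
    sample with that much added noise and scoring the recognition."""
    for noise in noise_levels:
        if recognize_accuracy(noise) < accuracy_floor:
            return noise
    return None  # accuracy never dropped below the floor
```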
  • Patent number: 6701291
    Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting all of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.
    Type: Grant
    Filed: April 2, 2001
    Date of Patent: March 2, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
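The key property claimed above is a single filter shape repeated at centers spaced evenly on a logarithmic frequency scale, so the whole bank is retuned by a few parameters. A minimal sketch; the triangular shape and the specific spacing are illustrative assumptions, not the patent's filters:

```python
import math

def log_spaced_centers(f_min, f_max, n_filters):
    """Filter centers equally spaced on a logarithmic frequency scale."""
    step = (math.log(f_max) - math.log(f_min)) / (n_filters - 1)
    return [math.exp(math.log(f_min) + i * step) for i in range(n_filters)]

def filter_weight(log_f, log_center, half_width):
    """One shared triangular filter shape, evaluated on the log scale.
    Because every filter in the bank uses this same shape, adjusting
    `half_width` retunes the entire bank in a coordinated manner."""
    d = abs(log_f - log_center)
    return max(0.0, 1.0 - d / half_width)
```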
  • Patent number: 6691087
    Abstract: A signal processing system for detecting the presence of a desired signal component by applying a probabilistic description to the classification and tracking of various signal components (e.g., desired versus non-desired signal components) in an input signal is disclosed.
    Type: Grant
    Filed: September 30, 1998
    Date of Patent: February 10, 2004
    Assignees: Sarnoff Corporation, LG Electronics, Inc.
    Inventors: Lucas Parra, Aalbert de Vries
  • Patent number: 6675126
    Abstract: A method of estimating a measure of randomness of a function of at least one representative value of at least one random variable comprises the steps of: obtaining the at least one random variable; determining the at least one representative value of the obtained random variable; determining a first statistic of the obtained random variable; determining a gradient of the function with respect to the determined representative value; and transforming the obtained first statistic into a second statistic of the function using the determined gradient. The transforming step may be adapted so that the second statistic responds to the first statistic more sensitively where the gradient is steep than where it is gentle.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: January 6, 2004
    Assignee: Kabushiki Kaisha Toyota Chuo Kenkyusho
    Inventor: Christoph Hermann Roser
  • Patent number: 6629069
    Abstract: A speech recogniser is provided for identifying entries in a database. Results from the recognition of a user's speech are combined with each other and optionally with reference to data in the database in order to maximise the accuracy of an identified entry. An output is also provided which gives an indication of the likely accuracy of the identified entry.
    Type: Grant
    Filed: January 3, 2001
    Date of Patent: September 30, 2003
    Assignee: British Telecommunications a public limited company
    Inventors: David John Attwater, Hilary Richard William Greenhow, Peter John Durston
  • Publication number: 20030182116
    Abstract: A method of changing psychological stress indices by evaluating manifestations of physiological change in the human voice wherein the utterances of a subject under examination are formatted as electrical signals and processed to alter selected characteristics which have been found to change with psycho-physiological state changes, such that the resultant output data signals are perceptually unchanged yet display none of the undesired physiological response characteristics. Apparatus for performing changes of this type includes a data input port, means for spectral alteration and a data output port.
    Type: Application
    Filed: March 25, 2002
    Publication date: September 25, 2003
    Inventor: Patrick O'Neal Nunally
  • Publication number: 20030182115
    Abstract: A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. At least one secondary transformation is applied to the source speech signal or to the extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Inventors: Narendranath Malayath, Harinath Garudadri
  • Patent number: 6618702
    Abstract: A language-independent speaker-recognition system based on parallel cumulative differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from many phone recognizers to perform text independent speaker recognition. A digitized speech signal from a speaker is converted to a sequence of phones by each phone recognizer. Each phone sequence is then modified based on the energy in the signal. The modified phone sequences are tokenized to produce phone n-grams that are compared against a speaker and a background model for each phone recognizer to produce log-likelihood ratio scores. The log-likelihood ratio scores from each phone recognizer are fused to produce a final recognition score for each speaker model. The recognition score for each speaker model is then evaluated to determine which of the modeled speakers, if any, produced the digitized speech signal.
    Type: Grant
    Filed: June 14, 2002
    Date of Patent: September 9, 2003
    Inventors: Mary Antoinette Kohler, Walter Doyle Andrews, III, Joseph Paul Campbell, Jr.
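The per-recognizer scoring step above (phone n-grams scored against a speaker model and a background model) can be sketched with bigrams; the probability dictionaries here are illustrative stand-ins for the trained models, not the patent's actual model form:

```python
import math

def phone_ngrams(phones, n=2):
    """Tokenize a phone sequence into n-grams."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def llr_score(test_phones, speaker_model, background_model, n=2, floor=1e-6):
    """Sum per-n-gram log-likelihood ratios of speaker vs. background
    probability; positive totals favor the speaker model. Unseen
    n-grams are floored to avoid log(0)."""
    score = 0.0
    for g in phone_ngrams(test_phones, n):
        p_spk = speaker_model.get(g, floor)
        p_bg = background_model.get(g, floor)
        score += math.log(p_spk / p_bg)
    return score
```

In the patent's scheme, one such score per phone recognizer would then be fused into a final recognition score.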
  • Patent number: 6609093
    Abstract: The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full-covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal-covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, maximum likelihood linear transformation) to the HDA space results in increased classification accuracy.
    Type: Grant
    Filed: June 1, 2000
    Date of Patent: August 19, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ramesh Ambat Gopinath, Mukund Padmanabhan, George Andrei Saon
  • Publication number: 20030154076
    Abstract: To improve the performance and the recognition rate of a method for recognizing speech in a dialogue system or the like, it is suggested to derive emotion information data (EID), descriptive of a speaker's emotional state or a change thereof, from the speech input (SI); a process of recognition is then chosen and/or designed based upon this data.
    Type: Application
    Filed: February 11, 2003
    Publication date: August 14, 2003
    Inventor: Thomas Kemp
  • Patent number: 6606598
    Abstract: A method and apparatus are disclosed for computing and reporting statistical information that describes the performance of an interactive speech application. The interactive speech application is developed and deployed for use by one or more callers. During execution, the interactive speech application stores, in a log, event information that describes each task carried out by the interactive speech application in response to interaction with the one or more callers. After the log is established, an analytical report is displayed. The report describes selective actions taken by the interactive speech application while executing, and selective actions taken by one or more callers while interacting with the interactive speech application. Information in the analytical report is selected so as to identify one or more potential performance problems in the interactive speech application. The analytical reports are generated based on the information stored in the event logs.
    Type: Grant
    Filed: September 21, 1999
    Date of Patent: August 12, 2003
    Assignee: SpeechWorks International, Inc.
    Inventors: Mark A. Holthouse, Matthew T. Marx, John N. Nguyen
  • Patent number: 6604072
    Abstract: An audio signal is sampled and a frequency transform is performed on a succession of sets of samples of the signal to obtain a time-dependent power spectrum for the audio signal. Frequency components output by the frequency transform are collected in frequency bands. More than one running average is taken of each semitone frequency band. When two running averages of the same semitone frequency band cross, time information is recorded. Information about average-crossing events that have occurred at different times in a set of adjacent semitone frequency bands is combined to form a key. A set of keys obtained from a song provides a means for identifying the song and is stored in a database for use in identifying songs.
    Type: Grant
    Filed: March 9, 2001
    Date of Patent: August 5, 2003
    Assignee: International Business Machines Corporation
    Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
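The crossing detection at the heart of the key generation above can be sketched per band; the boxcar averages and window lengths are assumptions for illustration, not the patent's parameters:

```python
def crossing_times(band_power, fast_n=3, slow_n=9):
    """Return frame indices where a fast running average of one
    frequency band's power crosses a slow running average; in the
    patent's scheme, crossing times from adjacent bands are combined
    into song-identification keys."""
    def running_avg(xs, n):
        # Boxcar average with a partial window at the start.
        return [sum(xs[max(0, i - n + 1):i + 1]) /
                len(xs[max(0, i - n + 1):i + 1]) for i in range(len(xs))]
    fast = running_avg(band_power, fast_n)
    slow = running_avg(band_power, slow_n)
    crossings = []
    for i in range(1, len(band_power)):
        # A crossing flips which average is on top.
        if (fast[i - 1] >= slow[i - 1]) != (fast[i] >= slow[i]):
            crossings.append(i)
    return crossings
```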
  • Patent number: 6604073
    Abstract: Disclosed is a voice recognition apparatus which can prevent an erroneous manipulation due to erroneous voice recognition from being carried out even in a noisy environment. As long as a duration of utterance acquired based on the level of a voice signal uttered by an operator (user) approximately coincides with a duration of utterance acquired based on mouth image data acquired by capturing the mouth of the operator, the voice recognition apparatus outputs vocal-manipulation phrase data as the result of voice recognition.
    Type: Grant
    Filed: September 12, 2001
    Date of Patent: August 5, 2003
    Assignee: Pioneer Corporation
    Inventor: Shoutarou Yoda
  • Publication number: 20030139926
    Abstract: Methods for processing speech data are described herein. In one aspect of the invention, an exemplary method includes receiving a speech data stream, performing a Mel Frequency Cepstral Coefficients (MFCC) feature extraction on the speech data stream, optimizing feature space transformation (FST), optimizing model space transformation (MST) based on the FST, and performing recognition decoding based on the FST and the MST, generating a word sequence. Other methods and apparatuses are also described.
    Type: Application
    Filed: January 23, 2002
    Publication date: July 24, 2003
    Inventors: Ying Jia, Xiaobo Pi, Yonghong Yan
  • Patent number: 6598019
    Abstract: The aim is to improve the precision of correcting an input sentence by using template patterns for model sentences. A plurality of template patterns for the model sentence are provided beforehand. Each template pattern is regarded as a plurality of templates of words/phrases based on the expertise of language teachers, with scores assigned to the words according to their importance. The scores and subsequently the input sentence are read and analyzed in comparison with each of the template patterns, and the total score of matching words is calculated. The template pattern having the highest total score is selected as the optimum template pattern, and the input sentence is corrected using it. This method improves the likelihood that a template pattern containing a larger number of important words is selected as the optimum template pattern.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: July 22, 2003
    Assignee: Sunflare Co., Ltd.
    Inventors: Naoyuki Tokuda, Hiroyuki Sasai
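The selection rule above reduces to a weighted word-overlap maximum. A minimal sketch that assumes each template is simply a dict of word scores (a simplification of the word/phrase templates described):

```python
def best_template(input_words, templates):
    """Pick the template whose words that match the input sentence
    carry the highest total score. `templates` maps a template name to
    a {word: importance_score} dict; `input_words` is a set of words
    from the analyzed input sentence."""
    def total(template):
        return sum(score for word, score in template.items()
                   if word in input_words)
    return max(templates, key=lambda name: total(templates[name]))
```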
  • Patent number: 6591235
    Abstract: A method is provided for mining high dimensional data. The high dimensional data is linearly transformed into less dependent coordinates by applying a linear transform of n rows by n columns to the high dimensional data. Each of the coordinates is marginally Gaussianized, the Gaussianization being characterized by univariate Gaussian means, priors, and variances. The transforming and Gaussianizing steps are iteratively repeated until the coordinates converge to a standard Gaussian distribution. The coordinates of all iterations are arranged hierarchically to facilitate data mining. The arranged coordinates are then mined. According to an embodiment of the invention, the transform step includes applying an iterative maximum likelihood expectation maximization (EM) method to the high dimensional data.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: July 8, 2003
    Assignee: International Business Machines Corporation
    Inventors: Scott Shaobing Chen, Ramesh Ambat Gopinath
  • Publication number: 20030125942
    Abstract: The invention relates to a method of setting a free parameter λ_ortho.
    Type: Application
    Filed: October 10, 2002
    Publication date: July 3, 2003
    Inventor: Jochen Peters
  • Patent number: 6577996
    Abstract: A method and apparatus for objectively evaluating sound quality of a signal processor or transmission channel. The present invention analyzes the distortion in a series of test sound frames compared to a series of sample sound frames. The invention detects sequences of test sound frames having distortion levels that are greater than a temporal distortion threshold and calculates an average length and a maximum length of these sequences. The present invention also detects individual test sound frames having distortion levels that are greater than an outlier distortion threshold and calculates a percentage of these frames present in the series of test sound frames. Further, the present invention calculates the average distortion level in the series of test sound frames and a variance of the distortion level in the test sound frames.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: June 10, 2003
    Assignee: Cisco Technology, Inc.
    Inventor: Ramanathan T. Jagadeesan
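The statistics enumerated above (run lengths of frames above a temporal distortion threshold, the percentage of outlier frames, and the mean and variance of frame distortion) can be sketched directly over per-frame distortion values; the function and field names are illustrative:

```python
def distortion_stats(frame_distortion, temporal_threshold, outlier_threshold):
    """Summarize a series of per-frame distortion levels: average and
    maximum length of runs above the temporal threshold, percentage of
    outlier frames, plus overall mean and variance."""
    runs, run = [], 0
    for d in frame_distortion:
        if d > temporal_threshold:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)  # close a run that reaches the end
    n = len(frame_distortion)
    mean = sum(frame_distortion) / n
    return {
        "avg_run": sum(runs) / len(runs) if runs else 0.0,
        "max_run": max(runs) if runs else 0,
        "outlier_pct": 100.0 * sum(d > outlier_threshold
                                   for d in frame_distortion) / n,
        "mean": mean,
        "variance": sum((d - mean) ** 2 for d in frame_distortion) / n,
    }
```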
  • Patent number: 6574594
    Abstract: A broadcast datastream is received, and audio identifying information is generated for audio content from the broadcast datastream. It is determined whether the audio identifying information generated for the broadcast audio content matches audio identifying information in an audio content database. In one preferred embodiment, the audio identifying information is an audio feature signature that is based on audio content. Also provided is a system for monitoring broadcast audio content.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: June 3, 2003
    Assignee: International Business Machines Corporation
    Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
  • Publication number: 20030097263
    Abstract: A method (200) is described for creating decision trees for processing a sampled signal indicative of speech. The method (200) includes providing model sub-vectors (220) from partitioned statistical speech models of phones, the models comprising vectors of mean values and associated variance values. The method (200) then provides for statistically analyzing (230) the model sub-vectors of mean values to provide projection vectors indicating directions of relative maximum variance between the sub-vectors, and thereafter for calculating projection values (240) of the projection vectors. A step of selecting potential threshold values (260) is then applied, the potential threshold values being determined from analysis of a range of the projection values. Finally, a step of creating the decision trees (270) is effected to provide a decision tree having decisions that divide the model sub-vectors into groups, the groups being leaves of the tree.
    Type: Application
    Filed: November 16, 2001
    Publication date: May 22, 2003
    Inventor: Hang Shun Lee
  • Patent number: 6567776
    Abstract: In speaker-independent speech recognition, between-speaker variability is one of the major sources of recognition errors. A speaker cluster model is used to manage recognition problems caused by between-speaker variability. In the training phase, the score function is used as a discriminative function. The parameters of at least two cluster-dependent models are adjusted through a discriminative training method to improve the performance of the speech recognition.
    Type: Grant
    Filed: April 4, 2000
    Date of Patent: May 20, 2003
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chieh Chien, Chung-Mou Penwu
  • Patent number: 6564181
    Abstract: A system that provides measurements of speech distortion that correspond closely to user perceptions of speech distortion is disclosed. The system calculates and analyzes first and second discrete derivatives to detect and determine the incidence of changes in the voice waveform that would not have been made by human articulation, because natural voice signals change at a limited rate. Statistical analysis is performed on both the first and second discrete derivatives to detect speech distortion by examining the distribution of the signals. For example, the kurtosis of the signals is analyzed, as well as the number of times these values exceed a predetermined threshold. Additionally, the number of times the first derivative data is less than a predetermined low value is analyzed to provide a level of speech distortion and of clipping of the signal due to lost data packets.
    Type: Grant
    Filed: April 24, 2001
    Date of Patent: May 13, 2003
    Assignee: WorldCom, Inc.
    Inventor: William C. Hardy
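The rate-limit idea above can be sketched with discrete derivatives and a plain sample kurtosis; the threshold values and the exact combination of statistics are assumptions here, not the patent's specification:

```python
def derivative(xs):
    """First discrete derivative (successive differences)."""
    return [b - a for a, b in zip(xs, xs[1:])]

def kurtosis(xs):
    """Sample kurtosis m4/m2^2; unnaturally abrupt changes make the
    derivative distribution heavy-tailed, raising this value."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    return sum((x - mean) ** 4 for x in xs) / n / var ** 2

def distortion_indicators(signal, jump_threshold):
    """Count first- and second-derivative samples whose magnitude
    exceeds a threshold natural articulation would not, and report the
    first derivative's kurtosis."""
    d1 = derivative(signal)
    d2 = derivative(d1)
    return {
        "d1_kurtosis": kurtosis(d1),
        "d1_exceed": sum(abs(v) > jump_threshold for v in d1),
        "d2_exceed": sum(abs(v) > jump_threshold for v in d2),
    }
```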
  • Publication number: 20030088411
    Abstract: The invention provides a Hidden Markov Model (132) based automated speech recognition system (100) that dynamically adapts to changing background noise by detecting long pauses in speech, and for each pause processing background noise during the pause to extract a feature vector that characterizes the background noise, identifying a Gaussian mixture component of noise states that most closely matches the extracted feature vector, and updating the mean of the identified Gaussian mixture component so that it more closely matches the extracted feature vector, and consequently more closely matches the current noise environment. Alternatively, the process is also applied to refine the Gaussian mixtures associated with other emitting states of the Hidden Markov Model.
    Type: Application
    Filed: November 5, 2001
    Publication date: May 8, 2003
    Inventors: Changxue Ma, Yuan-Jun Wei
  • Publication number: 20030069729
    Abstract: Predicting speech recognizer confusion where utterances can be represented by any combination of text form and audio file. The utterances are represented with an intermediate representation that directly reflects their acoustic characteristics. Text representations of the utterances can be used directly for predicting confusability, without access to audio-file examples of the utterances. In a first embodiment, two text utterances are represented as strings of phonemes, and the least cost of transforming one string of phonemes into the other serves as a confusability measure. In a second embodiment, two utterances are represented as sequences of acoustic events, based on phonetic capabilities of speakers obtained from the acoustic signals of the utterances, and the acoustic events are compared. Confusability of the utterances is predicted according to the formula 2K/T, where K is the number of matched acoustic events and T is the total number of acoustic events.
    Type: Application
    Filed: October 5, 2001
    Publication date: April 10, 2003
    Inventors: Corine A Bickley, Lawrence A. Denenberg
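The 2K/T measure above is simple to state in code. Matching events by position via `zip` is an illustrative simplification of the event-matching step, which the application leaves to its acoustic-event comparison:

```python
def confusability(events_a, events_b):
    """Predicted confusability 2K/T, where K is the number of matched
    acoustic events and T is the total number of events across both
    utterances; 1.0 means the event sequences are identical."""
    matched = sum(a == b for a, b in zip(events_a, events_b))
    total = len(events_a) + len(events_b)
    return 2 * matched / total
```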
  • Patent number: 6539351
    Abstract: A method is provided for generating a high dimensional density model within an acoustic model for one of a speech and a speaker recognition system. Acoustic data obtained from at least one speaker is transformed into high dimensional feature vectors. The density model is formed to model the feature vectors by a mixture of compound Gaussians with a linear transform, wherein each compound Gaussian is associated with a compound Gaussian prior and models each coordinate of each component of the density model independently by a univariate Gaussian mixture comprising a univariate Gaussian prior, variance, and mean. An iterative expectation maximization (EM) method is applied to the feature vectors. The EM method includes the step of computing an auxiliary function Q of the EM method.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: March 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: Scott Shaobing Chen, Ramesh Ambat Gopinath
  • Patent number: 6539350
    Abstract: Speech level measurement is particularly significant for successful echo compensation in telecommunications systems, for noise suppression in a noisy environment, for example in military vehicles, or in speech recognition and in speech coding and decoding systems. A method is indicated which permits speech level measurement only when features of speech are recognized and interferences and speech pauses are filtered out of the measurement. To this end, speech and pause detectors and a mean value generator are utilized, whose time behavior is largely adapted to the perception capability of the human ear. Briefly spoken vowels are thus well detected, while nasal sounds or consonants are suppressed in the case of falling levels. A speech level measuring device is indicated which provides very accurate results within a short adaptation period.
    Type: Grant
    Filed: November 18, 1999
    Date of Patent: March 25, 2003
    Assignee: Alcatel
    Inventor: Michael Walker
  • Patent number: 6535843
    Abstract: When it is necessary to time-scale a speech signal, it is advantageous to do so under the influence of a signal that measures the small-window non-stationarity of the speech signal. Three measures of stationarity are disclosed: one based on time-domain analysis, one based on frequency-domain analysis, and one based on both time- and frequency-domain analysis.
    Type: Grant
    Filed: August 18, 1999
    Date of Patent: March 18, 2003
    Assignee: AT&T Corp.
    Inventors: Ioannis G. Stylianou, David A. Kapilow, Juergen Schroeter
  • Publication number: 20030050779
    Abstract: There is provided a novel approach for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted outputs of a neural network text-to-phoneme model trained on data mixed from several languages. The multilingual mappings, used together with a branched grammar decoding scheme, are able to capture both inter- and intra-language pronunciation variations, which is ideal for multilingual speaker-independent speech recognition systems. A significant improvement in overall system performance is obtained for a multilingual speaker-independent name dialing task when applying multilingual instead of language-dependent text-to-phoneme mapping.
    Type: Application
    Filed: August 31, 2001
    Publication date: March 13, 2003
    Inventors: Soren Riis, Kare Jean Jensen, Morten With Pedersen
  • Publication number: 20030033145
    Abstract: A system, method and article of manufacture are provided for detecting emotion using statistics. First, a database is provided. The database has statistics including human associations of voice parameters with emotions. Next, a voice signal is received. At least one feature is extracted from the voice signal. Then the extracted voice feature is compared to the voice parameters in the database. An emotion is selected from the database based on the comparison of the extracted voice feature to the voice parameters and is then output.
    Type: Application
    Filed: April 10, 2001
    Publication date: February 13, 2003
    Inventor: Valery A. Petrushin
  • Publication number: 20030023436
    Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.
    Type: Application
    Filed: March 29, 2001
    Publication date: January 30, 2003
    Applicant: IBM Corporation
    Inventor: Ellen M. Eide
  • Publication number: 20030023437
    Abstract: A system and method for processing human language input uses collocation information for the language that is not limited to N-gram information for N no greater than a predetermined value. The input is preferably speech input. The system and method preferably recognize at least a portion of the input based on the collocation information.
    Type: Application
    Filed: January 28, 2002
    Publication date: January 30, 2003
    Inventor: Pascale Fung
  • Patent number: 6513004
    Abstract: The acoustic speech signal is decomposed into wavelets arranged in an asymmetrical tree data structure from which individual nodes may be selected to best extract local features, as needed to model specific classes of sound units. The wavelet packet transformation is smoothed through integration and compressed to apply a non-linearity prior to discrete cosine transformation. The resulting subband features such as cepstral coefficients may then be used to construct the speech recognizer's speech models. Using the local feature information extracted in this manner allows a single recognizer to be optimized for several different classes of sound units, thereby eliminating the need for parallel path recognizers.
    Type: Grant
    Filed: November 24, 1999
    Date of Patent: January 28, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, David Kryze, Ted Applebaum, Jean-Claude Junqua
  • Patent number: 6510410
    Abstract: A method and an apparatus for automatic recognition of tone languages, employing the steps of converting the words of speech into an electrical signal, generating spectral features from the electrical signal, extracting pitch values from the electrical signal, combining said spectral features and the pitch values into acoustic feature vectors, comparing the acoustic feature vectors with prototypes of phonemes in an acoustic prototype database including prototypes of toned vowels to produce labels, and matching the labels to text using a decoder comprising a phonetic vocabulary and a language model database.
    Type: Grant
    Filed: July 28, 2000
    Date of Patent: January 21, 2003
    Assignee: International Business Machines Corporation
    Inventors: Julian Chengjun Chen, Guo Kang Fu, Hai Ping Li, Li Qin Shen
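The feature-combination step above, concatenating per-frame spectral features with extracted pitch values into acoustic feature vectors, can be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation; the function name `combine_features` and the example values are hypothetical:

```python
import numpy as np

def combine_features(spectral, pitch):
    """Append a per-frame pitch value to each spectral feature vector,
    yielding acoustic feature vectors that carry tone information.
    (Illustrative sketch; not the patented method.)"""
    spectral = np.asarray(spectral, dtype=float)   # shape (frames, dims)
    pitch = np.asarray(pitch, dtype=float)         # shape (frames,)
    if spectral.shape[0] != pitch.shape[0]:
        raise ValueError("frame counts must match")
    return np.hstack([spectral, pitch[:, None]])

# Two frames of 3-dimensional spectral features plus extracted pitch values
feats = combine_features([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [120.0, 180.0])
```

The resulting 4-dimensional vectors would then be compared against prototypes of toned vowels in the acoustic prototype database.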
  • Patent number: 6507815
    Abstract: A group of words to be registered in a word dictionary is sorted in order of sound models to produce a word list. A tree-structure word dictionary, in which the sound models at the heads of the words are shared among the words, is prepared from this word list. Each node whose set of reachable words differs from that of its parent node holds word information comprising the minimum of the word IDs of the words reachable from that node and the number of words reachable from that node. When searching for a word matching input speech, language likelihoods are looked ahead using this word information. The word matching the input speech can thus be recognized efficiently, using such a tree-structure word dictionary and the language-likelihood look-ahead method.
    Type: Grant
    Filed: March 29, 2000
    Date of Patent: January 14, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hiroki Yamamoto
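The tree-structure dictionary above can be illustrated with a simple trie in which each node records the minimum reachable word ID and the reachable-word count; because the words are sorted before insertion, those two numbers identify the contiguous range of word IDs available for language-likelihood look-ahead. A hedged sketch with hypothetical names (`build_tree`, `reachable_ids`), not the patented data structure:

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # phone/sound-model label -> TrieNode
        self.min_word_id = None  # smallest word ID reachable from this node
        self.n_words = 0         # number of words reachable from this node

def build_tree(words):
    """words: phone sequences, sorted so shared head models collapse
    into shared tree nodes; word IDs follow the sorted order."""
    root = TrieNode()
    for wid, phones in enumerate(sorted(words)):
        node = root
        for p in phones:
            node = node.children.setdefault(p, TrieNode())
            node.n_words += 1
            if node.min_word_id is None or wid < node.min_word_id:
                node.min_word_id = wid
    return root

def reachable_ids(node):
    # With sorted words, the reachable IDs form the contiguous range
    # [min_word_id, min_word_id + n_words), so look-ahead can scan it.
    return range(node.min_word_id, node.min_word_id + node.n_words)
```

Look-ahead would then take, say, the best language-model score over that ID range as an upper bound during search.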
  • Patent number: 6505154
    Abstract: The invention relates to a method and a device for comparing acoustic input signals fed into an input device (1) with acoustic reference signals stored in a memory (3). The first step is to carry out a harmonic analysis of the input signals in a frequency analyzer (4) connected to the input device (1) and the memory (3), in order to produce a time-dependent Fourier spectrum. Preselectable characteristics of the Fourier spectrum are then determined as input signal coordinates defining an n-dimensional input signal vector, where n is the number of characteristics. The input signal vector is then checked, coordinate by coordinate and within prescribed tolerances, for correspondence with at least one reference signal vector, which is defined in the same way and stored in the memory (3).
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: January 7, 2003
    Assignee: Primasoft GmbH
    Inventors: Hermann Bottenbruch, Michael Mertens
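The final comparison step, checking each coordinate of the input signal vector against a stored reference vector within prescribed tolerances, might look like this minimal sketch (the names `matches` and `find_match` and the tolerance value are illustrative assumptions):

```python
import numpy as np

def matches(input_vec, ref_vec, tol):
    """Coordinate-wise comparison: the input signal vector matches the
    reference vector if every coordinate pair agrees within tolerance."""
    input_vec, ref_vec = np.asarray(input_vec), np.asarray(ref_vec)
    return bool(np.all(np.abs(input_vec - ref_vec) <= tol))

def find_match(input_vec, references, tol=0.5):
    """Return the index of the first stored reference vector whose
    coordinates all correspond within tolerance, or None."""
    for i, ref in enumerate(references):
        if matches(input_vec, ref, tol):
            return i
    return None
```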
  • Patent number: 6502073
    Abstract: A method of processing speech representative of ideograms for speech communication using an asynchronous communication channel (21) is disclosed. The method includes the step of processing speech units of a speech and data indicative of the speech units. Each speech unit is representative of an ideogram or a plurality of semantically related ideograms (500-508). The data indicative of the speech units is discretely communicable on the asynchronous communication channel (21). By communicating the data indicative of the speech units, a substantially low data transmission rate and intelligible speech communication are achieved.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: December 31, 2002
    Assignee: Kent Ridge Digital Labs
    Inventors: Cuntai Guan, Jun Xu, Haizhou Li
  • Publication number: 20020184017
    Abstract: A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
    Type: Application
    Filed: May 4, 2001
    Publication date: December 5, 2002
    Inventors: Chin-Hui Lee, Qi P. Li, Jinsong Zheng, Qiru Zhou
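A three-state endpoint detector of the kind the abstract describes (silence, in-speech, and a leaving state that confirms an endpoint before returning to silence) can be sketched as a finite state machine over per-frame filter outputs. The thresholds, the `hang` count, and the state names below are assumptions for illustration, not the patented design:

```python
SILENCE, IN_SPEECH, LEAVING = "silence", "in_speech", "leaving"

def detect_endpoints(filter_out, on_thresh=1.0, off_thresh=0.5, hang=3):
    """Run a three-state machine over per-frame filter outputs and
    return (start, end) frame index pairs. `hang` consecutive low
    frames are required before an endpoint is declared, for robustness
    against short dips inside speech."""
    state, start, count, segments = SILENCE, 0, 0, []
    for t, f in enumerate(filter_out):
        if state == SILENCE:
            if f >= on_thresh:
                state, start = IN_SPEECH, t      # speech onset detected
        elif state == IN_SPEECH:
            if f < off_thresh:
                state, count = LEAVING, 1        # candidate endpoint
        else:  # LEAVING: either resume speech or confirm the endpoint
            if f >= on_thresh:
                state = IN_SPEECH
            else:
                count += 1
                if count >= hang:
                    segments.append((start, t - hang + 1))
                    state = SILENCE
    if state != SILENCE:                          # input ended mid-speech
        segments.append((start, len(filter_out)))
    return segments
```

The detected endpoints could then drive energy normalization of the speech portion, as the abstract suggests.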
  • Patent number: 6490553
    Abstract: The disclosed method and apparatus controls the rate of playback of audio data corresponding to a stream of speech. Using speech recognition, the rate of speech of the audio data is determined. The determined rate of speech is compared to a target rate. Based on the comparison, the playback rate is adjusted, i.e. increased or decreased, to match the target rate.
    Type: Grant
    Filed: February 12, 2001
    Date of Patent: December 3, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: Jean-Manuel Van Thong, Davis Pan
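The rate-adjustment step reduces to scaling the playback rate by the ratio of the target speaking rate to the measured speaking rate; a minimal sketch (the clamping range is an assumption, not from the patent):

```python
def playback_rate(measured_wpm, target_wpm, current_rate=1.0,
                  min_rate=0.5, max_rate=2.0):
    """Scale the playback rate so the speaking rate determined by
    speech recognition matches the target rate, clamped to a safe
    range. Rates are in words per minute; 1.0 is normal playback."""
    new_rate = current_rate * (target_wpm / measured_wpm)
    return max(min_rate, min(max_rate, new_rate))
```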
  • Patent number: 6477492
    Abstract: A Perceptual Speech Distortion Metric (PSDM) generates perceptual distortion values for voice prompts received from a voice response system by comparing the received voice prompts with reference signals associated with the same states in the voice response system. The perceptual distortion values identify the voice prompts as either correct or incorrect responses to signal generator inputs and also quantify an amount of perceptual distortion in the voice prompts.
    Type: Grant
    Filed: June 15, 1999
    Date of Patent: November 5, 2002
    Assignee: Cisco Technology, Inc.
    Inventor: Kevin J. Connor
  • Patent number: 6470314
    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
    Type: Grant
    Filed: April 6, 2000
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Mukund Padmanabhan
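Steps (i) through (iv) amount to histogram equalization per feature dimension: map each test value through the empirical test CDF, then through the inverse empirical training CDF. A sketch under that reading, with CDFs estimated by sorting (`cdf_map` is a hypothetical name, not from the patent):

```python
import numpy as np

def cdf_map(train_dim, test_dim):
    """Return a transform x -> F_train^{-1}(F_test(x)) for one feature
    dimension, so transformed test values follow the training
    distribution. CDFs are estimated empirically from the data."""
    train_sorted = np.sort(train_dim)
    test_sorted = np.sort(test_dim)
    n = len(test_sorted)
    def transform(x):
        # empirical test CDF at x
        p = np.searchsorted(test_sorted, x, side="right") / n
        # inverse training CDF (quantile) at probability p
        return float(np.quantile(train_sorted, min(p, 1.0)))
    return transform
```

Applying such a transform to each dimension of the test speech vectors before recognition makes them substantially similar to the training-speech vectors, per the abstract.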
  • Publication number: 20020152068
    Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is done, bootstrapping does not produce good initial models, and the new-language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new-language data using the recognition system of another language. This labeled data is then used to generate models for the new-language phones.
    Type: Application
    Filed: February 22, 2001
    Publication date: October 17, 2002
    Applicant: International Business Machines Corporation
    Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
  • Patent number: 6466907
    Abstract: A process provides for searching through a written text in response to a spoken question comprising a plurality of words. The first step in the process is to transcribe the written text into a first sequence of phonetic units. Then, a spoken question is segmented into a second sequence of phonetic units. The search is conducted through the written text for an occurrence of the spoken question. The search comprises aligning the first and second sequences of phonetic units.
    Type: Grant
    Filed: November 18, 1998
    Date of Patent: October 15, 2002
    Assignee: France Telecom SA
    Inventors: Alexandre Ferrieux, Stephane Peillon
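The alignment of the two phonetic sequences can be illustrated with an edit-distance search that lets the query start anywhere in the text; this is one standard way to realize such an alignment, not necessarily the patented one, and `best_alignment` is a hypothetical name:

```python
def best_alignment(text_phones, query_phones):
    """Approximate search for the query phone sequence inside the text
    phone sequence: edit-distance dynamic programming with a free start
    position in the text (column 0 costs nothing at every text
    position). Returns (minimum cost, end index in the text)."""
    n, m = len(text_phones), len(query_phones)
    # prev[j] = cost of aligning query[:j] ending at the previous text position
    prev = list(range(m + 1))
    best_cost, best_end = prev[m], 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)   # free start anywhere in the text
        for j in range(1, m + 1):
            sub = prev[j - 1] + (text_phones[i - 1] != query_phones[j - 1])
            cur[j] = min(sub,            # substitute / match
                         prev[j] + 1,    # delete from text
                         cur[j - 1] + 1) # insert from query
        if cur[m] < best_cost:
            best_cost, best_end = cur[m], i
        prev = cur
    return best_cost, best_end
```

A low minimum cost signals an occurrence of the spoken question; the end index locates it in the transcribed text.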
  • Patent number: 6456966
    Abstract: A deciding apparatus and method for deciding an audio signal coding system. A digital signal processor receives a coded audio signal, selects the specific coding system of the coded audio signal based on a predetermined portion of a data sequence of additional data of the audio signal, and decodes the audio signal using the selected coding system. A memory stores decoding programs for decoding the coded audio signal.
    Type: Grant
    Filed: June 21, 2000
    Date of Patent: September 24, 2002
    Assignee: Fuji Photo Film Co., Ltd.
    Inventor: Hiroshi Iwabuchi
  • Patent number: 6449591
    Abstract: With respect to each of the codes corresponding to the code vectors in a code book stored in a code book storage section, an expectation degree storage section stores the degree to which observation of that code is expected when an integrated parameter for a word as a recognition target is inputted. A vector quantization section vector-quantizes the integrated parameter and outputs the series of codes of the code vectors having the shortest distance to the integrated parameter. A chi-square test section then performs a chi-square test using the series of codes outputted from the vector quantization section and the expectation degree of each code stored in the expectation degree storage section, thereby determining whether or not the integrated parameter corresponds to the recognition target. Recognition is performed based on the chi-square test result, and can therefore be carried out without considering the time components of the signal.
    Type: Grant
    Filed: May 31, 2000
    Date of Patent: September 10, 2002
    Assignee: Sony Corporation
    Inventors: Tetsujiro Kondo, Norifumi Yoshiwara
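The vector-quantization and chi-square steps can be sketched as follows: quantize each integrated-parameter vector to its nearest code vector, histogram the resulting codes, and compare the histogram against the stored expectation degrees with a chi-square statistic (the time order of the codes plays no role). The function names and the scaling of expectations to the observed total are illustrative assumptions:

```python
import numpy as np

def quantize(params, codebook):
    """Vector quantization: for each input vector, output the code of
    the nearest code vector (shortest Euclidean distance)."""
    params = np.asarray(params, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(params[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def chi_square_score(observed_codes, expected_counts):
    """Chi-square statistic between the code histogram observed for an
    input and the expected counts stored for a recognition target; a
    low value indicates the input likely corresponds to that target.
    expected_counts must be strictly positive."""
    n_codes = len(expected_counts)
    observed = np.bincount(observed_codes, minlength=n_codes).astype(float)
    expected = np.asarray(expected_counts, dtype=float)
    # scale the expectation so both histograms carry the same total mass
    expected = expected * (observed.sum() / expected.sum())
    return float(np.sum((observed - expected) ** 2 / expected))
```

Recognition would then pick the target whose expectation profile yields the lowest (or a sufficiently low) statistic.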
  • Patent number: 6446038
    Abstract: A method and system for objectively evaluating the quality of speech in a voice communication system. A plurality of speech reference vectors is first obtained based on a plurality of clean speech samples. A corrupted speech signal is received and processed to determine a plurality of distortions derived from a plurality of distortion measures based on the plurality of speech reference vectors. The plurality of distortions are processed by a non-linear neural network model to generate a subjective score representing user acceptance of the corrupted speech signal. The non-linear neural network model is first trained on clean speech samples as well as corrupted speech samples through the use of backpropagation to obtain the weights and bias terms necessary to predict subjective scores from several objective measures.
    Type: Grant
    Filed: April 1, 1996
    Date of Patent: September 3, 2002
    Assignee: Qwest Communications International, Inc.
    Inventors: Aruna Bayya, Marvin Vis
  • Publication number: 20020120444
    Abstract: A speech recognition method is described in which a basic set of models is adapted to the current speaker on the basis of speech data already observed from that speaker. The basic set of models comprises models for different acoustic units, each described by a plurality of model parameters. The basic set of models is represented by a supervector in a high-dimensional vector space (model space), the supervector being formed by concatenating the model parameters of the models in the basic set. The adaptation of this basic set of models to the speaker is effected in the model space by means of a MAP method in which an asymmetric distribution in the model space is selected as the a priori distribution.
    Type: Application
    Filed: September 24, 2001
    Publication date: August 29, 2002
    Inventor: Henrik Botterweck
  • Patent number: 6438518
    Abstract: A method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions includes a speech coder configured to select from among various predictive coding modes. After a predefined number of speech frames have been predictively coded, the speech coder codes one frame with a nonpredictive coding mode or a mildly predictive coding mode. The predefined number of frames can be determined in advance from the subjective standpoint of a listener. The predefined number of frames may be varied periodically. An average coding bit rate may be maintained for the speech coder by ensuring that an average coding bit rate is maintained for each successive pattern, or group, of predictively coded speech frames including at least one nonpredictively coded or mildly predictively coded speech frame.
    Type: Grant
    Filed: October 28, 1999
    Date of Patent: August 20, 2002
    Assignee: Qualcomm Incorporated
    Inventors: Sharath Manjunath, Andrew P. Dejaco, Arasanipalai K. Ananthapadmanabhan, Eddie Lun Tik Choy
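The pattern described, a fixed number of predictively coded frames followed by one nonpredictive (or mildly predictive) frame, can be sketched as a simple mode schedule; `pattern_len` and the mode labels are illustrative, and a real coder would also balance the bit rate across each pattern:

```python
def coding_modes(n_frames, pattern_len=4):
    """Choose a coding mode per frame: after pattern_len - 1
    predictively coded frames, insert one nonpredictive frame so that
    channel errors cannot propagate indefinitely through prediction."""
    return ["nonpredictive" if i % pattern_len == pattern_len - 1
            else "predictive" for i in range(n_frames)]
```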