Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 6721698
    Abstract: A speech recognition feature extractor includes a time-to-frequency domain transformer for generating spectral values in the frequency domain from a speech signal; a partitioning means for generating a first set and an additional set of spectral values in the frequency domain; a first feature generator for generating a first group of speech features using the first set of spectral values; an additional feature generator for generating an additional group of speech features using the additional set of spectral values, the feature generators arranged to operate in parallel; an assembler for assembling an output set of speech features from at least one speech feature from the first group of speech features and at least one speech feature from the additional group of speech features; and an anti-aliasing and sampling rate reduction block, where the first and the additional set of spectral values comprise at least one common spectral value.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: April 13, 2004
    Assignee: Nokia Mobile Phones, Ltd.
    Inventors: Ramalingam Hariharan, Juha Häkkinen, Imre Kiss, Jilei Tian, Olli Viikki
  • Patent number: 6718304
    Abstract: A speech recognition support method in a system that retrieves a map in response to a user's input speech. The user's speech is recognized and a recognition result is obtained. If the recognition result represents a point on the map, the distance between that point and a base point on the map is calculated. It is then determined whether the distance exceeds a threshold. If it does, an inquiry is output to the user to confirm whether the recognition result is correct.
    Type: Grant
    Filed: June 29, 2000
    Date of Patent: April 6, 2004
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Mitsuyoshi Tachimori, Hiroshi Kanazawa
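The confirmation rule in the abstract above reduces to a distance test. A minimal sketch, assuming planar map coordinates and a hypothetical `needs_confirmation` helper (names and geometry are illustrative, not from the patent):

```python
import math

def needs_confirmation(point, base_point, threshold):
    """Return True when the recognized map point lies farther from the
    base point than the threshold, in which case the system should ask
    the user to confirm the recognition result."""
    distance = math.hypot(point[0] - base_point[0], point[1] - base_point[1])
    return distance > threshold
```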
  • Patent number: 6714907
    Abstract: A speech compression system with a special fixed codebook structure and a new search routine is proposed for speech coding. The system is capable of encoding a speech signal into a bitstream for subsequent decoding to generate synthesized speech. The codebook structure uses a plurality of subcodebooks. Each subcodebook is designed to fit a specific group of speech signals. An improved method calculates a criterion value by minimizing an error signal in a minimization loop that forms part of the coding system. An external signal sets a maximum bitstream rate for delivering encoded speech into a communications system. The speech compression system comprises a full-rate codec, a half-rate codec, a quarter-rate codec and an eighth-rate codec. Each codec is selectively activated to encode and decode the speech signals at different bit rates to enhance the overall quality of the synthesized speech at a limited average bit rate.
    Type: Grant
    Filed: February 15, 2001
    Date of Patent: March 30, 2004
    Assignee: Mindspeed Technologies, Inc.
    Inventor: Yang Gao
  • Publication number: 20040059572
    Abstract: A system and a method for quantitatively measuring voice transmission quality within a voice-over-data-network such as a telephony-enabled LAN utilize speech recognition to measure the quality of voice transmission. A first aspect of the invention involves determining the suitability of the LAN for voice communications. A voice sample is selected by a first terminal and is transmitted to a second terminal on the LAN a number of times. The first terminal introduces an incrementally larger quantity of noise into each transmission of the voice packet. The second terminal performs speech recognition for each successively received voice sample and determines the accuracy for each speech recognition session. The amount of noise which drops the speech recognition accuracy below a threshold level provides a measure of the suitability of the LAN for voice communications. During normal operation of the LAN, speech recognition accuracy tests are performed between various endpoints to monitor voice transmission quality.
    Type: Application
    Filed: September 25, 2002
    Publication date: March 25, 2004
    Inventors: Branislav Ivanic, Eli Jacobi, Peter Kozdon, Noboru Nishiya, Christoph Aktas
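The suitability test described above amounts to escalating the injected noise until recognition accuracy falls below a floor. A sketch of that loop, where `recognize_accuracy` is a hypothetical callback standing in for one transmit-and-recognize round between the two terminals:

```python
def noise_tolerance(recognize_accuracy, noise_levels, accuracy_floor):
    """Return the first noise level at which recognition accuracy falls
    below the floor; a higher return value means the network tolerates
    more noise and is more suitable for voice traffic.
    `recognize_accuracy(noise)` stands in for transmitting the voice
    sample with that much added noise and scoring the recognition."""
    for noise in noise_levels:
        if recognize_accuracy(noise) < accuracy_floor:
            return noise
    return None  # accuracy never dropped below the floor
```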
  • Patent number: 6701291
    Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting all of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.
    Type: Grant
    Filed: April 2, 2001
    Date of Patent: March 2, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
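The key property claimed above is a single filter shape repeated at centers spaced evenly on a logarithmic frequency scale, so the whole bank is retuned by a few parameters. A minimal sketch; the triangular shape and the specific spacing are illustrative assumptions, not the patent's filters:

```python
import math

def log_spaced_centers(f_min, f_max, n_filters):
    """Filter centers equally spaced on a logarithmic frequency scale."""
    step = (math.log(f_max) - math.log(f_min)) / (n_filters - 1)
    return [math.exp(math.log(f_min) + i * step) for i in range(n_filters)]

def filter_weight(log_f, log_center, half_width):
    """One shared triangular filter shape, evaluated on the log scale.
    Because every filter in the bank uses this same shape, adjusting
    `half_width` retunes the entire bank in a coordinated manner."""
    d = abs(log_f - log_center)
    return max(0.0, 1.0 - d / half_width)
```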
  • Patent number: 6691087
    Abstract: A signal processing system for detecting the presence of a desired signal component by applying a probabilistic description to the classification and tracking of various signal components (e.g., desired versus non-desired signal components) in an input signal is disclosed.
    Type: Grant
    Filed: September 30, 1998
    Date of Patent: February 10, 2004
    Assignees: Sarnoff Corporation, LG Electronics, Inc.
    Inventors: Lucas Parra, Aalbert de Vries
  • Patent number: 6675126
    Abstract: A method of estimating a measure of randomness of a function of at least one representative value of at least one random variable comprises the steps of: obtaining the at least one random variable; determining the at least one representative value of the obtained random variable; determining a first statistic of the obtained random variable; determining a gradient of the function with respect to the determined representative value; and transforming the obtained first statistic into a second statistic of the function using the determined gradient. The transforming step may be adapted so that the second statistic responds to the first statistic more sensitively where the gradient is steep than where it is gentle.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: January 6, 2004
    Assignee: Kabushiki Kaisha Toyota Chuo Kenkyusho
    Inventor: Christoph Hermann Roser
  • Patent number: 6629069
    Abstract: A speech recogniser is provided for identifying entries in a database. Results from the recognition of a user's speech are combined with each other and optionally with reference to data in the database in order to maximise the accuracy of an identified entry. An output is also provided which gives an indication of the likely accuracy of the identified entry.
    Type: Grant
    Filed: January 3, 2001
    Date of Patent: September 30, 2003
    Assignee: British Telecommunications a public limited company
    Inventors: David John Attwater, Hilary Richard William Greenhow, Peter John Durston
  • Publication number: 20030182116
    Abstract: A method of changing psychological stress indices by evaluating manifestations of physiological change in the human voice wherein the utterances of a subject under examination are formatted as electrical signals and processed to alter selected characteristics which have been found to change with psycho-physiological state changes, such that the resultant output data signals are perceptually unchanged yet display none of the undesired physiological response characteristics. Apparatus for performing changes of this type includes a data input port, means for spectral alteration and a data output port.
    Type: Application
    Filed: March 25, 2002
    Publication date: September 25, 2003
    Inventor: Patrick O'Neal Nunally
  • Publication number: 20030182115
    Abstract: A method for processing digitized speech signals by analyzing redundant features to provide more robust voice recognition. A primary transformation is applied to a source speech signal to extract primary features therefrom. At least one secondary transformation is applied to the source speech signal or to the extracted primary features to yield at least one set of secondary features statistically dependent on the primary features. At least one predetermined function is then applied to combine the primary features with the secondary features. A recognition answer is generated by pattern matching this combination against predetermined voice recognition templates.
    Type: Application
    Filed: March 20, 2002
    Publication date: September 25, 2003
    Inventors: Narendranath Malayath, Harinath Garudadri
  • Patent number: 6618702
    Abstract: A language-independent speaker-recognition system based on parallel cumulative differences in dynamic realization of phonetic features (i.e., pronunciation) between speakers rather than spectral differences in voice quality. The system exploits phonetic information from many phone recognizers to perform text independent speaker recognition. A digitized speech signal from a speaker is converted to a sequence of phones by each phone recognizer. Each phone sequence is then modified based on the energy in the signal. The modified phone sequences are tokenized to produce phone n-grams that are compared against a speaker and a background model for each phone recognizer to produce log-likelihood ratio scores. The log-likelihood ratio scores from each phone recognizer are fused to produce a final recognition score for each speaker model. The recognition score for each speaker model is then evaluated to determine which of the modeled speakers, if any, produced the digitized speech signal.
    Type: Grant
    Filed: June 14, 2002
    Date of Patent: September 9, 2003
    Inventors: Mary Antoinette Kohler, Walter Doyle Andrews, III, Joseph Paul Campbell, Jr.
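The per-recognizer scoring step above (phone n-grams scored against a speaker model and a background model) can be sketched with bigrams; the probability dictionaries here are illustrative stand-ins for the trained models, not the patent's actual model form:

```python
import math

def phone_ngrams(phones, n=2):
    """Tokenize a phone sequence into n-grams."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def llr_score(test_phones, speaker_model, background_model, n=2, floor=1e-6):
    """Sum per-n-gram log-likelihood ratios of speaker vs. background
    probability; positive totals favor the speaker model. Unseen
    n-grams are floored to avoid log(0)."""
    score = 0.0
    for g in phone_ngrams(test_phones, n):
        p_spk = speaker_model.get(g, floor)
        p_bg = background_model.get(g, floor)
        score += math.log(p_spk / p_bg)
    return score
```

In the patent's scheme, one such score per phone recognizer would then be fused into a final recognition score.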
  • Patent number: 6609093
    Abstract: The present invention provides a new approach to heteroscedastic linear discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full-covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. The present invention also provides that, under diagonal-covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., MLLT, maximum likelihood linear transformation) to the HDA space results in increased classification accuracy.
    Type: Grant
    Filed: June 1, 2000
    Date of Patent: August 19, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ramesh Ambat Gopinath, Mukund Padmanabhan, George Andrei Saon
  • Publication number: 20030154076
    Abstract: To improve the performance and the recognition rate of a method for recognizing speech in a dialogue system or the like, it is suggested to derive emotion information data (EID), descriptive of a speaker's emotional state or a change thereof, from the speech input (SI); a process of recognition is then chosen and/or designed based upon this data.
    Type: Application
    Filed: February 11, 2003
    Publication date: August 14, 2003
    Inventor: Thomas Kemp
  • Patent number: 6606598
    Abstract: A method and apparatus are disclosed for computing and reporting statistical information that describes the performance of an interactive speech application. The interactive speech application is developed and deployed for use by one or more callers. During execution, the interactive speech application stores, in a log, event information that describes each task carried out by the interactive speech application in response to interaction with the one or more callers. After the log is established, an analytical report is displayed. The report describes selective actions taken by the interactive speech application while executing, and selective actions taken by one or more callers while interacting with the interactive speech application. Information in the analytical report is selected so as to identify one or more potential performance problems in the interactive speech application. The analytical reports are generated based on the information stored in the event logs.
    Type: Grant
    Filed: September 21, 1999
    Date of Patent: August 12, 2003
    Assignee: SpeechWorks International, Inc.
    Inventors: Mark A. Holthouse, Matthew T. Marx, John N. Nguyen
  • Patent number: 6604072
    Abstract: An audio signal is sampled and a frequency transform is performed on a succession of sets of samples of the signal to obtain a time-dependent power spectrum for the audio signal. Frequency components output by the frequency transform are collected in frequency bands. More than one running average is taken of each semitone frequency band. When two running averages of the same semitone frequency band cross, time information is recorded. Information about average-crossing events that have occurred at different times in a set of adjacent semitone frequency bands is combined to form a key. A set of keys obtained from a song provides a means for identifying the song and is stored in a database for use in identifying songs.
    Type: Grant
    Filed: March 9, 2001
    Date of Patent: August 5, 2003
    Assignee: International Business Machines Corporation
    Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
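The crossing detection at the heart of the key generation above can be sketched per band; the boxcar averages and window lengths are assumptions for illustration, not the patent's parameters:

```python
def crossing_times(band_power, fast_n=3, slow_n=9):
    """Return frame indices where a fast running average of one
    frequency band's power crosses a slow running average; in the
    patent's scheme, crossing times from adjacent bands are combined
    into song-identification keys."""
    def running_avg(xs, n):
        # Boxcar average with a partial window at the start.
        return [sum(xs[max(0, i - n + 1):i + 1]) /
                len(xs[max(0, i - n + 1):i + 1]) for i in range(len(xs))]
    fast = running_avg(band_power, fast_n)
    slow = running_avg(band_power, slow_n)
    crossings = []
    for i in range(1, len(band_power)):
        # A crossing flips which average is on top.
        if (fast[i - 1] >= slow[i - 1]) != (fast[i] >= slow[i]):
            crossings.append(i)
    return crossings
```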
  • Patent number: 6604073
    Abstract: Disclosed is a voice recognition apparatus which can prevent an erroneous manipulation due to erroneous voice recognition from being carried out even in a noisy environment. As long as a duration of utterance acquired based on the level of a voice signal uttered by an operator (user) approximately coincides with a duration of utterance acquired based on mouth image data acquired by capturing the mouth of the operator, the voice recognition apparatus outputs vocal-manipulation phrase data as the result of voice recognition.
    Type: Grant
    Filed: September 12, 2001
    Date of Patent: August 5, 2003
    Assignee: Pioneer Corporation
    Inventor: Shoutarou Yoda
  • Publication number: 20030139926
    Abstract: Methods for processing speech data are described herein. In one aspect of the invention, an exemplary method includes receiving a speech data stream, performing a Mel Frequency Cepstral Coefficients (MFCC) feature extraction on the speech data stream, optimizing feature space transformation (FST), optimizing model space transformation (MST) based on the FST, and performing recognition decoding based on the FST and the MST, generating a word sequence. Other methods and apparatuses are also described.
    Type: Application
    Filed: January 23, 2002
    Publication date: July 24, 2003
    Inventors: Ying Jia, Xiaobo Pi, Yonghong Yan
  • Patent number: 6598019
    Abstract: The aim is to improve the precision of correcting an input sentence by using template patterns for model sentences. A plurality of template patterns for the model sentence are provided beforehand. Each template pattern is regarded as a plurality of templates of words/phrases based on the expertise of language teachers, with scores assigned to the words according to their importance. The scores and subsequently the input sentence are read and analyzed in comparison with each of the template patterns, and the total score of matching words is calculated. The template pattern having the highest total score is selected as the optimum template pattern, and the input sentence is corrected using it. This method improves the likelihood that a template pattern containing a larger number of important words is selected as the optimum template pattern.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: July 22, 2003
    Assignee: Sunflare Co., Ltd.
    Inventors: Naoyuki Tokuda, Hiroyuki Sasai
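The selection rule above reduces to a weighted word-overlap maximum. A minimal sketch that assumes each template is simply a dict of word scores (a simplification of the word/phrase templates described):

```python
def best_template(input_words, templates):
    """Pick the template whose words that match the input sentence
    carry the highest total score. `templates` maps a template name to
    a {word: importance_score} dict; `input_words` is a set of words
    from the analyzed input sentence."""
    def total(template):
        return sum(score for word, score in template.items()
                   if word in input_words)
    return max(templates, key=lambda name: total(templates[name]))
```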
  • Patent number: 6591235
    Abstract: A method is provided for mining high dimensional data. The high dimensional data is linearly transformed into less dependent coordinates by applying a linear transform of n rows by n columns to the high dimensional data. Each of the coordinates is marginally Gaussianized, the Gaussianization being characterized by univariate Gaussian means, priors, and variances. The transforming and Gaussianizing steps are iteratively repeated until the coordinates converge to a standard Gaussian distribution. The coordinates of all iterations are arranged hierarchically to facilitate data mining. The arranged coordinates are then mined. According to an embodiment of the invention, the transform step includes applying an iterative maximum likelihood expectation maximization (EM) method to the high dimensional data.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: July 8, 2003
    Assignee: International Business Machines Corporation
    Inventors: Scott Shaobing Chen, Ramesh Ambat Gopinath
  • Publication number: 20030125942
    Abstract: The invention relates to a method of setting a free parameter λ_ortho.
    Type: Application
    Filed: October 10, 2002
    Publication date: July 3, 2003
    Inventor: Jochen Peters
  • Patent number: 6577996
    Abstract: A method and apparatus for objectively evaluating sound quality of a signal processor or transmission channel. The present invention analyzes the distortion in a series of test sound frames compared to a series of sample sound frames. The invention detects sequences of test sound frames having distortion levels that are greater than a temporal distortion threshold and calculates an average length and a maximum length of these sequences. The present invention also detects individual test sound frames having distortion levels that are greater than an outlier distortion threshold and calculates a percentage of these frames present in the series of test sound frames. Further, the present invention calculates the average distortion level in the series of test sound frames and a variance of the distortion level in the test sound frames.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: June 10, 2003
    Assignee: Cisco Technology, Inc.
    Inventor: Ramanathan T. Jagadeesan
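The statistics enumerated above (run lengths of frames above a temporal distortion threshold, the percentage of outlier frames, and the mean and variance of frame distortion) can be sketched directly over per-frame distortion values; the function and field names are illustrative:

```python
def distortion_stats(frame_distortion, temporal_threshold, outlier_threshold):
    """Summarize a series of per-frame distortion levels: average and
    maximum length of runs above the temporal threshold, percentage of
    outlier frames, plus overall mean and variance."""
    runs, run = [], 0
    for d in frame_distortion:
        if d > temporal_threshold:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)  # close a run that reaches the end
    n = len(frame_distortion)
    mean = sum(frame_distortion) / n
    return {
        "avg_run": sum(runs) / len(runs) if runs else 0.0,
        "max_run": max(runs) if runs else 0,
        "outlier_pct": 100.0 * sum(d > outlier_threshold
                                   for d in frame_distortion) / n,
        "mean": mean,
        "variance": sum((d - mean) ** 2 for d in frame_distortion) / n,
    }
```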
  • Patent number: 6574594
    Abstract: A broadcast datastream is received, and audio identifying information is generated for audio content from the broadcast datastream. It is determined whether the audio identifying information generated for the broadcast audio content matches audio identifying information in an audio content database. In one preferred embodiment, the audio identifying information is an audio feature signature that is based on audio content. Also provided is a system for monitoring broadcast audio content.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: June 3, 2003
    Assignee: International Business Machines Corporation
    Inventors: Michael C. Pitman, Blake G. Fitch, Steven Abrams, Robert S. Germain
  • Publication number: 20030097263
    Abstract: A method (200) is described for creating decision trees for processing a sampled signal indicative of speech. The method (200) includes providing model sub-vectors (220) from partitioned statistical speech models of phones, the models comprising vectors of mean values and associated variance values. The method (200) then provides for statistically analyzing (230) the model sub-vectors of mean values to provide projection vectors indicating directions of relative maximum variance between the sub-vectors, and thereafter for calculating projection values (240) of the projection vectors. A step of selecting potential threshold values (260) is then applied, the potential threshold values being determined from analysis of a range of the projection values. Finally, a step of creating the decision trees (270) is effected to provide a decision tree having decisions that divide the model sub-vectors into groups, the groups being leaves of the tree.
    Type: Application
    Filed: November 16, 2001
    Publication date: May 22, 2003
    Inventor: Hang Shun Lee
  • Patent number: 6567776
    Abstract: In speaker-independent speech recognition, between-speaker variability is one of the major sources of recognition errors. A speaker cluster model is used to manage recognition problems caused by between-speaker variability. In the training phase, the score function is used as a discriminative function. The parameters of at least two cluster-dependent models are adjusted through a discriminative training method to improve the performance of the speech recognition.
    Type: Grant
    Filed: April 4, 2000
    Date of Patent: May 20, 2003
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chieh Chien, Chung-Mou Penwu
  • Patent number: 6564181
    Abstract: A system that provides measurements of speech distortion that correspond closely to user perceptions of speech distortion is disclosed. The system calculates and analyzes first and second discrete derivatives to detect and determine the incidence of changes in the voice waveform that would not have been made by human articulation, because natural voice signals change at a limited rate. Statistical analysis is performed on both the first and second discrete derivatives to detect speech distortion by examining the distribution of the signals. For example, the kurtosis of the signals is analyzed, as well as the number of times these values exceed a predetermined threshold. Additionally, the number of times the first derivative data is less than a predetermined low value is analyzed to provide a level of speech distortion and of clipping of the signal due to lost data packets.
    Type: Grant
    Filed: April 24, 2001
    Date of Patent: May 13, 2003
    Assignee: WorldCom, Inc.
    Inventor: William C. Hardy
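The rate-limit idea above can be sketched with discrete derivatives and a plain sample kurtosis; the threshold values and the exact combination of statistics are assumptions here, not the patent's specification:

```python
def derivative(xs):
    """First discrete derivative (successive differences)."""
    return [b - a for a, b in zip(xs, xs[1:])]

def kurtosis(xs):
    """Sample kurtosis m4/m2^2; unnaturally abrupt changes make the
    derivative distribution heavy-tailed, raising this value."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return 0.0
    return sum((x - mean) ** 4 for x in xs) / n / var ** 2

def distortion_indicators(signal, jump_threshold):
    """Count first- and second-derivative samples whose magnitude
    exceeds a threshold natural articulation would not, and report the
    first derivative's kurtosis."""
    d1 = derivative(signal)
    d2 = derivative(d1)
    return {
        "d1_kurtosis": kurtosis(d1),
        "d1_exceed": sum(abs(v) > jump_threshold for v in d1),
        "d2_exceed": sum(abs(v) > jump_threshold for v in d2),
    }
```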
  • Publication number: 20030088411
    Abstract: The invention provides a Hidden Markov Model (132) based automated speech recognition system (100) that dynamically adapts to changing background noise by detecting long pauses in speech, and for each pause processing background noise during the pause to extract a feature vector that characterizes the background noise, identifying a Gaussian mixture component of noise states that most closely matches the extracted feature vector, and updating the mean of the identified Gaussian mixture component so that it more closely matches the extracted feature vector, and consequently more closely matches the current noise environment. Alternatively, the process is also applied to refine the Gaussian mixtures associated with other emitting states of the Hidden Markov Model.
    Type: Application
    Filed: November 5, 2001
    Publication date: May 8, 2003
    Inventors: Changxue Ma, Yuan-Jun Wei
  • Publication number: 20030069729
    Abstract: Predicting speech recognizer confusion where utterances can be represented by any combination of text form and audio file. The utterances are represented with an intermediate representation that directly reflects their acoustic characteristics. Text representations of the utterances can be used directly for predicting confusability, without access to audio-file examples of the utterances. In a first embodiment, two text utterances are represented as strings of phonemes, and the least cost of transforming one string of phonemes into the other serves as a confusability measure. In a second embodiment, two utterances are represented as sequences of acoustic events, based on phonetic capabilities of speakers obtained from the acoustic signals of the utterances, and the acoustic events are compared. Confusability of the utterances is predicted according to the formula 2K/T, where K is the number of matched acoustic events and T is the total number of acoustic events.
    Type: Application
    Filed: October 5, 2001
    Publication date: April 10, 2003
    Inventors: Corine A Bickley, Lawrence A. Denenberg
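The 2K/T measure above is simple to state in code. Matching events by position via `zip` is an illustrative simplification of the event-matching step, which the application leaves to its acoustic-event comparison:

```python
def confusability(events_a, events_b):
    """Predicted confusability 2K/T, where K is the number of matched
    acoustic events and T is the total number of events across both
    utterances; 1.0 means the event sequences are identical."""
    matched = sum(a == b for a, b in zip(events_a, events_b))
    total = len(events_a) + len(events_b)
    return 2 * matched / total
```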
  • Patent number: 6539351
    Abstract: A method is provided for generating a high dimensional density model within an acoustic model for one of a speech and a speaker recognition system. Acoustic data obtained from at least one speaker is transformed into high dimensional feature vectors. The density model is formed to model the feature vectors by a mixture of compound Gaussians with a linear transform, wherein each compound Gaussian is associated with a compound Gaussian prior and models each coordinate of each component of the density model independently by a univariate Gaussian mixture comprising a univariate Gaussian prior, variance, and mean. An iterative expectation maximization (EM) method is applied to the feature vectors. The EM method includes the step of computing an auxiliary function Q of the EM method.
    Type: Grant
    Filed: May 5, 2000
    Date of Patent: March 25, 2003
    Assignee: International Business Machines Corporation
    Inventors: Scott Shaobing Chen, Ramesh Ambat Gopinath
  • Patent number: 6539350
    Abstract: Speech level measurement is particularly significant for successful echo compensation in telecommunications systems, for noise suppression in a noisy environment, for example in military vehicles, or in speech recognition and in speech coding and decoding systems. A method is indicated which permits speech level measurement only when features of speech are recognized and interferences and speech pauses are filtered out of the measurement. To this end, speech and pause detectors and a mean value generator are utilized, whose time behavior is largely adapted to the perception capability of the human ear. Briefly spoken vowels are thus well detected, while nasal sounds or consonants are suppressed in the case of falling levels. A speech level measuring device is indicated which provides very accurate results within a short adaptation period.
    Type: Grant
    Filed: November 18, 1999
    Date of Patent: March 25, 2003
    Assignee: Alcatel
    Inventor: Michael Walker
  • Patent number: 6535843
    Abstract: When it is necessary to time-scale a speech signal, it is advantageous to do so under the influence of a signal that measures the small-window non-stationarity of the speech signal. Three measures of stationarity are disclosed: one based on time-domain analysis, one based on frequency-domain analysis, and one based on both time- and frequency-domain analysis.
    Type: Grant
    Filed: August 18, 1999
    Date of Patent: March 18, 2003
    Assignee: AT&T Corp.
    Inventors: Ioannis G. Stylianou, David A. Kapilow, Juergen Schroeter
  • Publication number: 20030050779
    Abstract: There is provided a novel approach for generating multilingual text-to-phoneme mappings for use in multilingual speech recognition systems. The multilingual mappings are based on the weighted outputs of a neural network text-to-phoneme model trained on data mixed from several languages. The multilingual mappings, used together with a branched grammar decoding scheme, are able to capture both inter- and intra-language pronunciation variations, which is ideal for multilingual speaker-independent speech recognition systems. A significant improvement in overall system performance is obtained for a multilingual speaker-independent name dialing task when applying multilingual instead of language-dependent text-to-phoneme mapping.
    Type: Application
    Filed: August 31, 2001
    Publication date: March 13, 2003
    Inventors: Soren Riis, Kare Jean Jensen, Morten With Pedersen
  • Publication number: 20030033145
    Abstract: A system, method and article of manufacture are provided for detecting emotion using statistics. First, a database is provided. The database has statistics including human associations of voice parameters with emotions. Next, a voice signal is received. At least one feature is extracted from the voice signal. Then the extracted voice feature is compared to the voice parameters in the database. An emotion is selected from the database based on the comparison of the extracted voice feature to the voice parameters and is then output.
    Type: Application
    Filed: April 10, 2001
    Publication date: February 13, 2003
    Inventor: Valery A. Petrushin
  • Publication number: 20030023436
    Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.
    Type: Application
    Filed: March 29, 2001
    Publication date: January 30, 2003
    Applicant: IBM Corporation
    Inventor: Ellen M. Eide
  • Publication number: 20030023437
    Abstract: A system and method for processing human language input uses collocation information for the language that is not limited to N-gram information for N no greater than a predetermined value. The input is preferably speech input. The system and method preferably recognize at least a portion of the input based on the collocation information.
    Type: Application
    Filed: January 28, 2002
    Publication date: January 30, 2003
    Inventor: Pascale Fung
  • Patent number: 6513004
    Abstract: The acoustic speech signal is decomposed into wavelets arranged in an asymmetrical tree data structure from which individual nodes may be selected to best extract local features, as needed to model specific classes of sound units. The wavelet packet transformation is smoothed through integration and compressed to apply a non-linearity prior to discrete cosine transformation. The resulting subband features such as cepstral coefficients may then be used to construct the speech recognizer's speech models. Using the local feature information extracted in this manner allows a single recognizer to be optimized for several different classes of sound units, thereby eliminating the need for parallel path recognizers.
    Type: Grant
    Filed: November 24, 1999
    Date of Patent: January 28, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, David Kryze, Ted Applebaum, Jean-Claude Junqua
  • Patent number: 6510410
    Abstract: A method and an apparatus for automatic recognition of tone languages, employing the steps of converting the words of speech into an electrical signal, generating spectral features from the electrical signal, extracting pitch values from the electrical signal, combining said spectral features and the pitch values into acoustic feature vectors, comparing the acoustic feature vectors with prototypes of phonemes in an acoustic prototype database including prototypes of toned vowels to produce labels, and matching the labels to text using a decoder comprising a phonetic vocabulary and a language model database.
    Type: Grant
    Filed: July 28, 2000
    Date of Patent: January 21, 2003
    Assignee: International Business Machines Corporation
    Inventors: Julian Chengjun Chen, Guo Kang Fu, Hai Ping Li, Li Qin Shen
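The feature-combination step above, concatenating per-frame spectral features with extracted pitch values into acoustic feature vectors, can be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation; the function name `combine_features` and the example values are hypothetical:

```python
import numpy as np

def combine_features(spectral, pitch):
    """Append a per-frame pitch value to each spectral feature vector,
    yielding acoustic feature vectors that carry tone information.
    (Illustrative sketch; not the patented method.)"""
    spectral = np.asarray(spectral, dtype=float)   # shape (frames, dims)
    pitch = np.asarray(pitch, dtype=float)         # shape (frames,)
    if spectral.shape[0] != pitch.shape[0]:
        raise ValueError("frame counts must match")
    return np.hstack([spectral, pitch[:, None]])

# Two frames of 3-dimensional spectral features plus extracted pitch values
feats = combine_features([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [120.0, 180.0])
```

The resulting 4-dimensional vectors would then be compared against prototypes of toned vowels in the acoustic prototype database.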
  • Patent number: 6507815
    Abstract: A group of words to be registered in a word dictionary is sorted in order of sound models to produce a word list. A tree-structure word dictionary, in which the sound models at the heads of the words are shared among the words, is prepared from this word list. Each node whose set of reachable words differs from that of its parent node holds word information comprising the minimum of the word IDs of the words reachable from that node and the number of words reachable from that node. When searching for a word matching input speech, language likelihoods are looked ahead using this word information. The word matching the input speech can thus be recognized efficiently, using such a tree-structure word dictionary and the language-likelihood look-ahead method.
    Type: Grant
    Filed: March 29, 2000
    Date of Patent: January 14, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hiroki Yamamoto
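The tree-structure dictionary above can be illustrated with a simple trie in which each node records the minimum reachable word ID and the reachable-word count; because the words are sorted before insertion, those two numbers identify the contiguous range of word IDs available for language-likelihood look-ahead. A hedged sketch with hypothetical names (`build_tree`, `reachable_ids`), not the patented data structure:

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # phone/sound-model label -> TrieNode
        self.min_word_id = None  # smallest word ID reachable from this node
        self.n_words = 0         # number of words reachable from this node

def build_tree(words):
    """words: phone sequences, sorted so shared head models collapse
    into shared tree nodes; word IDs follow the sorted order."""
    root = TrieNode()
    for wid, phones in enumerate(sorted(words)):
        node = root
        for p in phones:
            node = node.children.setdefault(p, TrieNode())
            node.n_words += 1
            if node.min_word_id is None or wid < node.min_word_id:
                node.min_word_id = wid
    return root

def reachable_ids(node):
    # With sorted words, the reachable IDs form the contiguous range
    # [min_word_id, min_word_id + n_words), so look-ahead can scan it.
    return range(node.min_word_id, node.min_word_id + node.n_words)
```

Look-ahead would then take, say, the best language-model score over that ID range as an upper bound during search.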
  • Patent number: 6505154
    Abstract: The invention relates to a method and a device for comparing acoustic input signals fed into an input device (1) with acoustic reference signals stored in a memory (3). The first step is to carry out a harmonic analysis of the input signals in a frequency analyzer (4) connected to the input device (1) and the memory (3), in order to produce a time-dependent Fourier spectrum. Preselectable characteristics of the Fourier spectrum are then determined as input signal coordinates defining an n-dimensional input signal vector, where n is the number of characteristics. The input signal vector is then checked, coordinate by coordinate and within prescribed tolerances, for correspondence with at least one reference signal vector, which is defined in the same way and stored in the memory (3).
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: January 7, 2003
    Assignee: Primasoft GmbH
    Inventors: Hermann Bottenbruch, Michael Mertens
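The final comparison step, checking each coordinate of the input signal vector against a stored reference vector within prescribed tolerances, might look like this minimal sketch (the names `matches` and `find_match` and the tolerance value are illustrative assumptions):

```python
import numpy as np

def matches(input_vec, ref_vec, tol):
    """Coordinate-wise comparison: the input signal vector matches the
    reference vector if every coordinate pair agrees within tolerance."""
    input_vec, ref_vec = np.asarray(input_vec), np.asarray(ref_vec)
    return bool(np.all(np.abs(input_vec - ref_vec) <= tol))

def find_match(input_vec, references, tol=0.5):
    """Return the index of the first stored reference vector whose
    coordinates all correspond within tolerance, or None."""
    for i, ref in enumerate(references):
        if matches(input_vec, ref, tol):
            return i
    return None
```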
  • Patent number: 6502073
    Abstract: A method of processing speech representative of ideograms for speech communication using an asynchronous communication channel (21) is disclosed. The method includes the step of processing speech units of a speech and data indicative of the speech units. Each speech unit is representative of an ideogram or a plurality of semantically related ideograms (500-508). The data indicative of the speech units is discretely communicable on the asynchronous communication channel (21). By communicating the data indicative of the speech units, a substantially low data transmission rate and intelligible speech communication are achieved.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: December 31, 2002
    Assignee: Kent Ridge Digital Labs
    Inventors: Cuntai Guan, Jun Xu, Haizhou Li
  • Publication number: 20020184017
    Abstract: A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
    Type: Application
    Filed: May 4, 2001
    Publication date: December 5, 2002
    Inventors: Chin-Hui Lee, Qi P. Li, Jinsong Zheng, Qiru Zhou
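A three-state endpoint detector of the kind the abstract describes (silence, in-speech, and a leaving state that confirms an endpoint before returning to silence) can be sketched as a finite state machine over per-frame filter outputs. The thresholds, the `hang` count, and the state names below are assumptions for illustration, not the patented design:

```python
SILENCE, IN_SPEECH, LEAVING = "silence", "in_speech", "leaving"

def detect_endpoints(filter_out, on_thresh=1.0, off_thresh=0.5, hang=3):
    """Run a three-state machine over per-frame filter outputs and
    return (start, end) frame index pairs. `hang` consecutive low
    frames are required before an endpoint is declared, for robustness
    against short dips inside speech."""
    state, start, count, segments = SILENCE, 0, 0, []
    for t, f in enumerate(filter_out):
        if state == SILENCE:
            if f >= on_thresh:
                state, start = IN_SPEECH, t      # speech onset detected
        elif state == IN_SPEECH:
            if f < off_thresh:
                state, count = LEAVING, 1        # candidate endpoint
        else:  # LEAVING: either resume speech or confirm the endpoint
            if f >= on_thresh:
                state = IN_SPEECH
            else:
                count += 1
                if count >= hang:
                    segments.append((start, t - hang + 1))
                    state = SILENCE
    if state != SILENCE:                          # input ended mid-speech
        segments.append((start, len(filter_out)))
    return segments
```

The detected endpoints could then drive energy normalization of the speech portion, as the abstract suggests.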
  • Patent number: 6490553
    Abstract: The disclosed method and apparatus controls the rate of playback of audio data corresponding to a stream of speech. Using speech recognition, the rate of speech of the audio data is determined. The determined rate of speech is compared to a target rate. Based on the comparison, the playback rate is adjusted, i.e. increased or decreased, to match the target rate.
    Type: Grant
    Filed: February 12, 2001
    Date of Patent: December 3, 2002
    Assignee: Compaq Information Technologies Group, L.P.
    Inventors: Jean-Manuel Van Thong, Davis Pan
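The rate-adjustment step reduces to scaling the playback rate by the ratio of the target speaking rate to the measured speaking rate; a minimal sketch (the clamping range is an assumption, not from the patent):

```python
def playback_rate(measured_wpm, target_wpm, current_rate=1.0,
                  min_rate=0.5, max_rate=2.0):
    """Scale the playback rate so the speaking rate determined by
    speech recognition matches the target rate, clamped to a safe
    range. Rates are in words per minute; 1.0 is normal playback."""
    new_rate = current_rate * (target_wpm / measured_wpm)
    return max(min_rate, min(max_rate, new_rate))
```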
  • Patent number: 6477492
    Abstract: A Perceptual Speech Distortion Metric (PSDM) generates perceptual distortion values for voice prompts received from a voice response system by comparing the received voice prompts with reference signals associated with the same states in the voice response system. The perceptual distortion values identify the voice prompts as either correct or incorrect responses to signal generator inputs and also quantify an amount of perceptual distortion in the voice prompts.
    Type: Grant
    Filed: June 15, 1999
    Date of Patent: November 5, 2002
    Assignee: Cisco Technology, Inc.
    Inventor: Kevin J. Connor
  • Patent number: 6470314
    Abstract: A method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
    Type: Grant
    Filed: April 6, 2000
    Date of Patent: October 22, 2002
    Assignee: International Business Machines Corporation
    Inventors: Satyanarayana Dharanipragada, Mukund Padmanabhan
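Steps (i) through (iv) amount to histogram equalization per feature dimension: map each test value through the empirical test CDF, then through the inverse empirical training CDF. A sketch under that reading, with CDFs estimated by sorting (`cdf_map` is a hypothetical name, not from the patent):

```python
import numpy as np

def cdf_map(train_dim, test_dim):
    """Return a transform x -> F_train^{-1}(F_test(x)) for one feature
    dimension, so transformed test values follow the training
    distribution. CDFs are estimated empirically from the data."""
    train_sorted = np.sort(train_dim)
    test_sorted = np.sort(test_dim)
    n = len(test_sorted)
    def transform(x):
        # empirical test CDF at x
        p = np.searchsorted(test_sorted, x, side="right") / n
        # inverse training CDF (quantile) at probability p
        return float(np.quantile(train_sorted, min(p, 1.0)))
    return transform
```

Applying such a transform to each dimension of the test speech vectors before recognition makes them substantially similar to the training-speech vectors, per the abstract.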
  • Publication number: 20020152068
    Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is done, bootstrapping does not produce good initial models, and the new-language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new-language data using the recognition system of another language. This labeled data is then used to generate models for the new-language phones.
    Type: Application
    Filed: February 22, 2001
    Publication date: October 17, 2002
    Applicant: International Business Machines Corporation
    Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
  • Patent number: 6466907
    Abstract: A process provides for searching through a written text in response to a spoken question comprising a plurality of words. The first step in the process is to transcribe the written text into a first sequence of phonetic units. Then, a spoken question is segmented into a second sequence of phonetic units. The search is conducted through the written text for an occurrence of the spoken question. The search comprises aligning the first and second sequences of phonetic units.
    Type: Grant
    Filed: November 18, 1998
    Date of Patent: October 15, 2002
    Assignee: France Telecom SA
    Inventors: Alexandre Ferrieux, Stephane Peillon
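The alignment of the two phonetic sequences can be illustrated with an edit-distance search that lets the query start anywhere in the text; this is one standard way to realize such an alignment, not necessarily the patented one, and `best_alignment` is a hypothetical name:

```python
def best_alignment(text_phones, query_phones):
    """Approximate search for the query phone sequence inside the text
    phone sequence: edit-distance dynamic programming with a free start
    position in the text (column 0 costs nothing at every text
    position). Returns (minimum cost, end index in the text)."""
    n, m = len(text_phones), len(query_phones)
    # prev[j] = cost of aligning query[:j] ending at the previous text position
    prev = list(range(m + 1))
    best_cost, best_end = prev[m], 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)   # free start anywhere in the text
        for j in range(1, m + 1):
            sub = prev[j - 1] + (text_phones[i - 1] != query_phones[j - 1])
            cur[j] = min(sub,            # substitute / match
                         prev[j] + 1,    # delete from text
                         cur[j - 1] + 1) # insert from query
        if cur[m] < best_cost:
            best_cost, best_end = cur[m], i
        prev = cur
    return best_cost, best_end
```

A low minimum cost signals an occurrence of the spoken question; the end index locates it in the transcribed text.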
  • Patent number: 6456966
    Abstract: A deciding apparatus and method for deciding an audio signal coding system. A digital signal processor receives a coded audio signal, selects the specific coding system of the coded audio signal based on a predetermined portion of a data sequence of additional data of the audio signal, and decodes the audio signal using the selected coding system. A memory stores decoding programs for decoding the coded audio signal.
    Type: Grant
    Filed: June 21, 2000
    Date of Patent: September 24, 2002
    Assignee: Fuji Photo Film Co., Ltd.
    Inventor: Hiroshi Iwabuchi
  • Patent number: 6449591
    Abstract: With respect to each of the codes corresponding to the code vectors in a code book stored in a code book storage section, an expectation degree storage section stores the degree to which observation of that code is expected when an integrated parameter for a word as a recognition target is inputted. A vector quantization section vector-quantizes the integrated parameter and outputs the series of codes of the code vectors having the shortest distance to the integrated parameter. A chi-square test section then performs a chi-square test using the series of codes outputted from the vector quantization section and the expectation degree of each code stored in the expectation degree storage section, thereby determining whether or not the integrated parameter corresponds to the recognition target. Recognition is performed based on the chi-square test result, and can therefore be carried out without considering the time components of the signal.
    Type: Grant
    Filed: May 31, 2000
    Date of Patent: September 10, 2002
    Assignee: Sony Corporation
    Inventors: Tetsujiro Kondo, Norifumi Yoshiwara
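The vector-quantization and chi-square steps can be sketched as follows: quantize each integrated-parameter vector to its nearest code vector, histogram the resulting codes, and compare the histogram against the stored expectation degrees with a chi-square statistic (the time order of the codes plays no role). The function names and the scaling of expectations to the observed total are illustrative assumptions:

```python
import numpy as np

def quantize(params, codebook):
    """Vector quantization: for each input vector, output the code of
    the nearest code vector (shortest Euclidean distance)."""
    params = np.asarray(params, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d = np.linalg.norm(params[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def chi_square_score(observed_codes, expected_counts):
    """Chi-square statistic between the code histogram observed for an
    input and the expected counts stored for a recognition target; a
    low value indicates the input likely corresponds to that target.
    expected_counts must be strictly positive."""
    n_codes = len(expected_counts)
    observed = np.bincount(observed_codes, minlength=n_codes).astype(float)
    expected = np.asarray(expected_counts, dtype=float)
    # scale the expectation so both histograms carry the same total mass
    expected = expected * (observed.sum() / expected.sum())
    return float(np.sum((observed - expected) ** 2 / expected))
```

Recognition would then pick the target whose expectation profile yields the lowest (or a sufficiently low) statistic.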
  • Patent number: 6446038
    Abstract: A method and system for objectively evaluating the quality of speech in a voice communication system. A plurality of speech reference vectors is first obtained based on a plurality of clean speech samples. A corrupted speech signal is received and processed to determine a plurality of distortions derived from a plurality of distortion measures based on the plurality of speech reference vectors. The plurality of distortions are processed by a non-linear neural network model to generate a subjective score representing user acceptance of the corrupted speech signal. The non-linear neural network model is first trained on clean speech samples as well as corrupted speech samples through the use of backpropagation to obtain the weights and bias terms necessary to predict subjective scores from several objective measures.
    Type: Grant
    Filed: April 1, 1996
    Date of Patent: September 3, 2002
    Assignee: Qwest Communications International, Inc.
    Inventors: Aruna Bayya, Marvin Vis
  • Publication number: 20020120444
    Abstract: A speech recognition method is described in which a basic set of models is adapted to the current speaker on the basis of speech data already observed from that speaker. The basic set of models comprises models for different acoustic units, each described by a plurality of model parameters. The basic set of models is represented by a supervector in a high-dimensional vector space (model space), the supervector being formed by concatenating the model parameters of the models in the basic set. The adaptation of this basic set of models to the speaker is effected in the model space by means of a MAP method in which an asymmetric distribution in the model space is selected as the a priori distribution.
    Type: Application
    Filed: September 24, 2001
    Publication date: August 29, 2002
    Inventor: Henrik Botterweck
  • Patent number: 6438518
    Abstract: A method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions includes a speech coder configured to select from among various predictive coding modes. After a predefined number of speech frames have been predictively coded, the speech coder codes one frame with a nonpredictive coding mode or a mildly predictive coding mode. The predefined number of frames can be determined in advance from the subjective standpoint of a listener. The predefined number of frames may be varied periodically. An average coding bit rate may be maintained for the speech coder by ensuring that an average coding bit rate is maintained for each successive pattern, or group, of predictively coded speech frames including at least one nonpredictively coded or mildly predictively coded speech frame.
    Type: Grant
    Filed: October 28, 1999
    Date of Patent: August 20, 2002
    Assignee: Qualcomm Incorporated
    Inventors: Sharath Manjunath, Andrew P. Dejaco, Arasanipalai K. Ananthapadmanabhan, Eddie Lun Tik Choy
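The pattern described, a fixed number of predictively coded frames followed by one nonpredictive (or mildly predictive) frame, can be sketched as a simple mode schedule; `pattern_len` and the mode labels are illustrative, and a real coder would also balance the bit rate across each pattern:

```python
def coding_modes(n_frames, pattern_len=4):
    """Choose a coding mode per frame: after pattern_len - 1
    predictively coded frames, insert one nonpredictive frame so that
    channel errors cannot propagate indefinitely through prediction."""
    return ["nonpredictive" if i % pattern_len == pattern_len - 1
            else "predictive" for i in range(n_frames)]
```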