Dynamic Time Warping Patents (Class 704/241)

Method and device for pause limit values in speech recognition

Patent number: 7366667

Abstract: Method and device for the recognition of words and pauses in a voice signal. Words (Wi) spoken in succession and the pauses (Ti) between them are combined into a word group as soon as one of the pauses (Ti) exceeds a limit value (TG). Stored references (Rj) are allocated to the voice signal of the word group, and the result of the allocation is indicated after the limit value (TG) has been exceeded. To this end, parameters corresponding to the moments of the transitions between voice and non-voice ranges are determined from the voice signal, and the limit value (TG) is then changed depending on said parameters.

Type: Grant

Filed: December 21, 2001

Date of Patent: April 29, 2008

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventor: Stefan Dobler
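A minimal sketch of how a pause limit such as TG might be adapted from voice/non-voice transition times. The abstract does not give the adaptation rule, so the function names, the "twice the mean pause" rule, and the clamping bounds below are all illustrative assumptions, not the patented method.

```python
def pause_durations(transitions):
    """Return pause lengths from time-ordered (time_s, kind) transitions.

    kind is "voice_on" or "voice_off" (hypothetical labels).
    """
    pauses = []
    last_off = None
    for t, kind in transitions:
        if kind == "voice_off":
            last_off = t
        elif kind == "voice_on" and last_off is not None:
            pauses.append(t - last_off)
            last_off = None
    return pauses


def adapt_limit(pauses, base_limit=1.0, factor=2.0, lo=0.3, hi=2.0):
    """Assumed rule: limit = factor * mean pause, clamped to [lo, hi] seconds."""
    if not pauses:
        return base_limit
    mean_pause = sum(pauses) / len(pauses)
    return min(hi, max(lo, factor * mean_pause))
```

With this rule a speaker who pauses briefly between words gets a shorter limit, so the word group is closed sooner after the last word.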
Method of mapping linearly spaced spectrum points to logarithmically spaced frequency and a measuring apparatus using the method

Patent number: 7317999

Abstract: A method for mapping a spectrum obtained from signals under test corresponding to linearly spaced frequencies to logarithmically spaced frequencies in a measuring apparatus. A spectrum within a predetermined frequency range from logarithmically spaced frequencies is selected from this spectrum corresponding to linearly spaced frequencies and vector averaging of the selected spectrum is performed.

Type: Grant

Filed: April 8, 2005

Date of Patent: January 8, 2008

Assignee: Agilent Technologies, Inc.

Inventors: Kazuhiko Ninomiya, Yoshiyuki Yanagimoto
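The mapping described in this abstract can be illustrated roughly as follows: for each logarithmically spaced target frequency, select the linearly spaced bins inside a frequency range around it and average their complex values (vector averaging). The relative window width `rel_width` and the function name are assumptions made for this sketch; the abstract does not specify how the range is chosen.

```python
def linear_to_log_spectrum(freqs, spectrum, log_centers, rel_width=0.1):
    """Vector-average complex bins falling within +/- rel_width of each center.

    freqs       -- linearly spaced bin frequencies
    spectrum    -- complex value per bin
    log_centers -- logarithmically spaced target frequencies
    Returns one complex value per center, or None if no bin falls in range.
    """
    out = []
    for fc in log_centers:
        lo, hi = fc * (1 - rel_width), fc * (1 + rel_width)
        selected = [s for f, s in zip(freqs, spectrum) if lo <= f <= hi]
        # Averaging complex values keeps phase, so opposite phases cancel.
        out.append(sum(selected) / len(selected) if selected else None)
    return out
```

Because the average is taken over complex values rather than magnitudes, components with random phase tend to cancel while a coherent component survives.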
METHOD AND APPARATUS FOR VERIFICATION OF SPEAKER AUTHENTICATION

Publication number: 20070239449

Abstract: The present invention provides a method and apparatus for verification of speaker authentication. A method for verification of speaker authentication, comprising: inputting an utterance containing a password that is spoken by a speaker; extracting an acoustic feature vector sequence from said inputted utterance; DTW-matching said extracted acoustic feature vector sequence and a speaker template enrolled by an enrolled speaker; calculating each of a plurality of local distances between said DTW-matched acoustic feature vector sequence and said speaker template; nonlinear-transforming said each local distance calculated to give more weights on small local distances; calculating a DTW-matching score based on said plurality of local distances nonlinear-transformed; and comparing said matching score with a predefined discriminating threshold to determine whether said inputted utterance is an utterance containing a password spoken by the enrolled speaker.

Type: Application

Filed: March 28, 2007

Publication date: October 11, 2007

Applicant: Kabushiki Kaisha Toshiba

Inventors: Jian LUAN, Jie HAO
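A rough sketch of the scoring steps named in this abstract: DTW-match the utterance to the template, take the local distances along the matched path, apply a nonlinear transform that favours small distances, and compare the resulting score with a threshold. The transform `exp(-d / sigma)`, the Euclidean distance, the averaging, and the threshold value are assumptions for illustration; the abstract does not state which ones the application uses.

```python
import math


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(seq, tmpl):
    """Classic DTW with backtracking; returns the list of matched (i, j) pairs."""
    n, m = len(seq), len(tmpl)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclid(seq[i - 1], tmpl[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step to the cheapest predecessor cell.
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def verification_score(seq, tmpl, sigma=1.0):
    """Mean of exp(-d / sigma) over the path: small distances dominate (assumed transform)."""
    path = dtw_path(seq, tmpl)
    transformed = [math.exp(-euclid(seq[i], tmpl[j]) / sigma) for i, j in path]
    return sum(transformed) / len(transformed)


def accept(seq, tmpl, threshold=0.5):
    """Higher score means a closer match; threshold value is hypothetical."""
    return verification_score(seq, tmpl) >= threshold
```

An utterance that is merely time-stretched relative to the template still scores 1.0 here, since every matched pair has zero local distance.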
Speech recognition system

Patent number: 7266496

Abstract: The present invention discloses a complete speech recognition system having a training button and a recognition button, and the whole system uses the application specific integrated circuit (ASIC) architecture for the design, and also uses the modular design to divide the speech processing into 4 modules: system control module, autocorrelation and linear predictive coefficient module, cepstrum module, and DTW recognition module. Each module forms an intellectual product (IP) component by itself. Each IP component can work with various products and application requirements for the design reuse to greatly shorten the time to market.

Type: Grant

Filed: December 24, 2002

Date of Patent: September 4, 2007

Assignee: National Cheng-Kung University

Inventors: Jhing-Fa Wang, Jia-Ching Wang, Tai-Lung Chen, Chin-Chan Chang
Speech recognizer control system, speech recognizer control method, and speech recognizer control program

Publication number: 20070203699

Abstract: A speech recognizer control system, a speech recognizer control method, and a speech recognizer control program make it possible to properly identify a device on the basis of a speech utterance of a user and to control the identified device. The speech recognizer control system includes a speech input unit to which a speech utterance is input from a user, a speech recognizer which recognizes the content of the input speech utterance, a device controller which identifies a device to be controlled among a plurality of devices on the basis of at least the recognized speech utterance content and which controls an operation of the identified device, and a state change storage which stores, as first auxiliary information for identifying a device to be controlled, a state change other than at least a state change caused by a speech utterance from the user among the state changes of operations in the individual devices of the plurality of devices.

Type: Application

Filed: January 24, 2007

Publication date: August 30, 2007

Inventor: Hisayuki Nagashima
Distribution goodness-of-fit test device, consumable goods supply timing judgment device, image forming device, distribution goodness-of-fit test method and distribution goodness-of-fit test program

Patent number: 7231315

Abstract: A distribution goodness-of-fit test device for testing whether measured data matches an estimated probability distribution has a counting section determination unit, a counting unit and a goodness-of-fit test unit. The counting section determination unit determines according to the number of the measured data, widths of counting sections for counting the measured data. The counting unit counts the numbers of data in the respective determined counting sections. Also, the goodness-of-fit test unit performs a goodness-of-fit test based on the numbers of data in the respective counting sections.

Type: Grant

Filed: December 3, 2004

Date of Patent: June 12, 2007

Assignee: Fuji Xerox Co., Ltd.

Inventor: Masakazu Fujimoto
Assignment of phonemes to the graphemes producing them

Patent number: 7171362

Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).

Type: Grant

Filed: August 31, 2001

Date of Patent: January 30, 2007

Assignee: Siemens Aktiengesellschaft

Inventor: Horst-Udo Hain
System and method for eliminating synchronization errors in electronic audiovisual transmissions and presentations

Patent number: 7149686

Abstract: A system and method for eliminating synchronization errors using speech recognition. Using separate audio and visual speech recognition techniques, the inventive system and method identifies visemes, or visual cues which are indicative of articulatory type, in the video content, and identifies phones and their articulatory types in the audio content. Once the two recognition techniques have been applied, the outputs are compared to determine the relative alignment and, if not aligned, a synchronization algorithm is applied to time-adjust one or both of the audio and the visual streams in order to achieve synchronization.

Type: Grant

Filed: June 23, 2000

Date of Patent: December 12, 2006

Assignee: International Business Machines Corporation

Inventors: Paul S. Cohen, John R. Dildine, Edward J. Gleason
Dynamic time warping device for detecting a reference pattern having a smallest matching cost value with respect to a test pattern, and speech recognition apparatus using the same

Patent number: 7143034

Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.

Type: Grant

Filed: October 23, 2002

Date of Patent: November 28, 2006

Assignee: Postech Foundation

Inventors: Hong Jeong, Yong Kim
Scoring and re-scoring dynamic time warping of speech

Patent number: 7085717

Abstract: A method includes (i) measuring first distances between (a) vectors belonging to a set of vectors that represent an utterance and (b) vectors belonging to a set of vectors that represent a template, the measuring being done in accordance with a first order of the utterance vectors and a first order of the template vectors, and (ii) measuring second distances between (a) individual vectors belonging to the set of vectors that represent the utterance and (b) individual vectors belonging to the set of vectors that represent the template, the measuring being done in accordance with a second order of the utterance vectors and a second order of the template vectors, and (iii) in which the first template vector order and the second template vector order are different and/or the first utterance vector order and the second utterance vector order are different.

Type: Grant

Filed: May 21, 2002

Date of Patent: August 1, 2006

Assignee: Thinkengine Networks, Inc.

Inventors: Veton K. Kepuska, Harinath K. Reddy
Apparatus, method and computer readable memory medium for speech recognition using dynamic programming

Patent number: 7062435

Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.

Type: Grant

Filed: July 26, 1999

Date of Patent: June 13, 2006

Assignee: Canon Kabushiki Kaisha

Inventors: Eli Tzirkel-Hancock, Robert Alexander Keiller
Method of speech recognition using time-dependent interpolation and hidden dynamic value classes

Patent number: 7050975

Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.

Type: Grant

Filed: October 9, 2002

Date of Patent: May 23, 2006

Assignee: Microsoft Corporation

Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
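The interpolation step in this abstract can be pictured with a small sketch: each new hidden value is a weighted mix of the previous value and a target, with a weight that may change over time. The function name, the scalar values, and the convention that the weight multiplies the previous value are assumptions; the abstract does not fix these details.

```python
def interpolate_track(z0, target, weights):
    """Return hidden values z_t = w_t * z_(t-1) + (1 - w_t) * target.

    weights -- one interpolation weight per time step, each in [0, 1]
    """
    z = z0
    track = []
    for w in weights:
        z = w * z + (1.0 - w) * target
        track.append(z)
    return track
```

With constant weights below 1 the track moves monotonically toward the target; a time-dependent weight lets the approach speed vary within a phonological unit.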
Recovering an erased voice frame with time warping

Patent number: 7024358

Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time-warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.

Type: Grant

Filed: March 11, 2004

Date of Patent: April 4, 2006

Assignee: Mindspeed Technologies, Inc.

Inventors: Eyal Shlomot, Yang Gao
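The fade-out/fade-in step mentioned in this abstract can be sketched as a sample-by-sample cross-fade between two already aligned frames. The linear gain ramp and the function name are assumptions, and the time-warping that aligns the pitch peaks beforehand is not shown.

```python
def crossfade(reconstructed, received):
    """Linearly fade out `reconstructed` while fading in `received`.

    Both frames are assumed to be time-aligned and of equal length.
    """
    if len(reconstructed) != len(received) or len(reconstructed) < 2:
        raise ValueError("frames must have equal length of at least 2 samples")
    n = len(reconstructed)
    out = []
    for k in range(n):
        gain_in = k / (n - 1)  # 0.0 at the first sample, 1.0 at the last
        out.append((1.0 - gain_in) * reconstructed[k] + gain_in * received[k])
    return out
```

Because the gain reaches 1.0 on the last sample, the output ends exactly on the received frame's endpoint, consistent with keeping that endpoint fixed.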
Linear discriminant based sound class similarities with unit value normalization

Patent number: 6996527

Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.

Type: Grant

Filed: July 26, 2001

Date of Patent: February 7, 2006

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
Dynamic time warping using frequency distributed distance measures

Patent number: 6983246

Abstract: Distances are measured between vectors representing speech and a stored reference template. Frequency distributions of the distance measurements are generated by counting how many times a particular reference template resulted in the lowest local distance. The numbers in the counters indicate regions (successive vectors) in a reference template that are good matches for speech input.

Type: Grant

Filed: May 21, 2002

Date of Patent: January 3, 2006

Assignee: Thinkengine Networks, Inc.

Inventors: Veton K. Kepuska, Harinath K. Reddy
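One way to read the counting step in this abstract: for every input vector, find which reference-template vector gives the lowest local distance and increment that vector's counter. The sketch below does this for a single template; the function name and the caller-supplied distance function are assumptions.

```python
def best_match_histogram(utterance, template, dist):
    """Count, per template vector, how often it was the closest match."""
    counts = [0] * len(template)
    for u in utterance:
        dists = [dist(u, t) for t in template]
        counts[dists.index(min(dists))] += 1
    return counts
```

For example, with scalar features and absolute difference as the distance, `best_match_histogram([0.1, 0.2, 1.9, 2.1, 2.0], [0, 1, 2], lambda a, b: abs(a - b))` gives `[2, 0, 3]`: the first and last template vectors are well matched, the middle one is never the best.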
Transmission system for transmitting an audio signal

Patent number: 6978241

Abstract: An analyzer determines the frequencies and amplitudes of an audio signal represented by sinusoids for transmission to a receiver decoder, which includes a synthesizer to reconstruct the audio signal. A pitch detector determines the pitch for transmission to the receiver along with the structure of the spectrum of the speech signal. The structure of the spectrum is often transmitted in the form of LPC parameters. To correct for frequency changes of the periodic component of an audio signal, a frequency change determiner determines a change of the frequency of the periodical component over the analysis period. This change of frequency is transmitted to the decoder for increasing the accuracy of the reconstruction of the audio signal. Further, the frequency change is only used to obtain a more accurate value of the pitch. The frequency change is determined by using a time warper which performs a time transformation such that a time transformed audio signal is obtained with a minimum frequency change.

Type: Grant

Filed: May 22, 2000

Date of Patent: December 20, 2005

Assignee: Koninklijke Philips Electronics, N.V.

Inventors: Robert Johannes Sluijter, Augustus Josephus Elizabeth Maria Janssen
Method and device for generating an adapted reference for automatic speech recognition

Patent number: 6961702

Abstract: The invention relates to a method for generating an adapted reference for automatic speech recognition. In a first step, recognition is performed based on a spoken utterance and a recognition result which corresponds to a currently valid reference is obtained. In a second step, the currently valid reference is adapted in accordance with the utterance in order to create an adapted reference. In a third step, the adapted reference is assessed and it is decided if the adapted reference is used for further recognition.

Type: Grant

Filed: November 6, 2001

Date of Patent: November 1, 2005

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Stefan Dobler, Andreas Kiessling, Ralph Schleifer, Raymond Brückner
Signal modification based on continuous time warping for low bit rate CELP coding

Patent number: 6879955

Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.

Type: Grant

Filed: June 29, 2001

Date of Patent: April 12, 2005

Assignee: Microsoft Corporation

Inventor: Ajit V. Rao
Process for voice recognition in a noisy acoustic signal and system implementing this process

Patent number: 6868378

Abstract: The invention relates to a process and a system for voice recognition in a noisy signal. In a preferred embodiment, the system (2) comprises modules for detecting speech (30) and for formulating a noise model (31), a module (40) for quantifying the energy level of the noise and for comparing with preestablished energy spans, a parameterization pathway (5) comprising an optional denoising module (51), with Wiener filter, a module (52) for calculating the spectral energy in Bark windows, a module (50, 530) for applying a configuration of shift values (531), by adding these values to the Bark coefficients, as a function of the quantification (40), so as to modify the parameterization, a module (54) for calculating vectors of parameters, and a block (6) for recognizing shapes, performing the voice recognition by comparison with vectors of parameters prerecorded during a learning phase.

Type: Grant

Filed: November 19, 1999

Date of Patent: March 15, 2005

Assignee: Thomson-CSF Sextant

Inventor: Pierre-Albert Breton
System and method for hybrid voice recognition

Patent number: 6836758

Abstract: A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.

Type: Grant

Filed: January 9, 2001

Date of Patent: December 28, 2004

Assignee: Qualcomm Incorporated

Inventors: Ning Bi, Andrew P. DeJaco, Harinath Garudadri, Chienchung Chang, William Yee-Ming Huang, Narendranath Malayath, Suhail Jalil, David Puig Oses, Yingyong Qi
Method and array for introducing temporal correlation in hidden Markov models for speech recognition

Patent number: 6832190

Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.

Type: Grant

Filed: November 10, 2000

Date of Patent: December 14, 2004

Assignee: Siemens Aktiengesellschaft

Inventors: Jochen Junkawitsch, Harald Höge
Recovering an erased voice frame with time warping

Publication number: 20040181405

Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time-warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.

Type: Application

Filed: March 11, 2004

Publication date: September 16, 2004

Applicant: Mindspeed Technologies, Inc.

Inventors: Eyal Shlomot, Yang Gao
Method and apparatus for constructing voice templates for a speaker-independent voice recognition system

Patent number: 6735563

Abstract: A method and apparatus for constructing voice templates for a speaker-independent voice recognition system includes segmenting a training utterance to generate time-clustered segments, each segment being represented by a mean. The means for all utterances of a given word are quantized to generate template vectors. Each template vector is compared with testing utterances to generate a comparison result. The comparison is typically a dynamic time warping computation. The training utterances are matched with the template vectors if the comparison result exceeds at least one predefined threshold value, to generate an optimal path result, and the training utterances are partitioned in accordance with the optimal path result. The partitioning is typically a K-means segmentation computation. The partitioned utterances may then be re-quantized and re-compared with the testing utterances until the at least one predefined threshold value is not exceeded.

Type: Grant

Filed: July 13, 2000

Date of Patent: May 11, 2004

Assignee: Qualcomm, Inc.

Inventor: Ning Bi
Method of training an automatic speech recognizer

Patent number: 6714910

Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.

Type: Grant

Filed: June 26, 2000

Date of Patent: March 30, 2004

Assignee: Koninklijke Philips Electronics, N.V.

Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
Dynamic time warping device and speech recognition apparatus using the same

Publication number: 20040049387

Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.

Type: Application

Filed: October 23, 2002

Publication date: March 11, 2004

Inventors: Hong Jeong, Yong Kim
System and method for lossy compression of voice recognition models

Patent number: 6681207

Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.

Type: Grant

Filed: January 12, 2001

Date of Patent: January 20, 2004

Assignee: Qualcomm Incorporated

Inventor: Harinath Garudadri
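To illustrate the Mu-law embodiment mentioned in this abstract, here is a minimal sketch of the standard mu-law companding formula followed by 8-bit quantization. It assumes model parameters normalized to [-1, 1] and mu = 255; the patent's actual parameter layout and bit width are not given in the abstract, and A-law is not shown.

```python
import math

MU = 255.0  # common mu-law constant; an assumption for this sketch


def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1] with finer resolution near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels)


def dequantize(q, bits=8):
    levels = 2 ** (bits - 1) - 1
    return q / levels
```

A value would be stored as `quantize(mu_compress(x))` during training and recovered as `mu_expand(dequantize(q))` during recognition. The round trip is lossy, but small-magnitude values are reproduced more accurately than with plain uniform quantization at the same bit width.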
Dynamic time warping of speech

Publication number: 20030220790

Abstract: A method includes measuring distances between vectors that represent an utterance and vectors that represent a template, generating information indicative of how well the vectors of the utterance match the vectors of the template, and making a matching decision based on the measured distances and on the generated information.

Type: Application

Filed: May 21, 2002

Publication date: November 27, 2003

Inventor: Veton K. Kepuska
System and method for compressing concatenative acoustic inventories for speech synthesis

Publication number: 20030212555

Abstract: A system and method are used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters is minimized. As a result, smaller storage devices may be used due to the reduced storage requirements.

Type: Application

Filed: May 9, 2002

Publication date: November 13, 2003

Applicant: OREGON HEALTH & SCIENCE

Inventor: Jan P.H. van Santen
Speaker recognition using dynamic time warp template spotting

Publication number: 20030200087

Abstract: An improved template spotting technique may be implemented as part of text dependent speaker verification system to authenticate a user of a wireless communication device. This technique may be suitable for use in noisy environments and for wireless communication devices with limited processing power. Endpoints of a test utterance are identified by first computing local distances between test frames and a target template. Accumulated distances are then computed from the local distances. Endpoints of the utterance may be identified when one or more of the accumulated distances is below a predetermined threshold. Once endpoints of a test utterance are identified, a dynamic time warp (DTW) process may be used to determine whether the test utterance matches a training template. One embodiment of the present invention aligns multiple training templates to reduce the probability of failing to verify the identity of a speaker that should have been properly verified.

Type: Application

Filed: April 22, 2002

Publication date: October 23, 2003

Applicant: D.S.P.C. TECHNOLOGIES LTD.

Inventor: Hagai Aronowitz
Voice-activated control for electrical device

Patent number: 6594630

Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls power delivered to the electrical device. The power control is responsive to at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.

Type: Grant

Filed: November 19, 1999

Date of Patent: July 15, 2003

Assignee: Voice Signal Technologies, Inc.

Inventors: Igor Zlokarnik, Daniel Lawrence Roth
Pattern recognition based on piecewise linear probability density function

Patent number: 6594392

Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.

Type: Grant

Filed: May 17, 1999

Date of Patent: July 15, 2003

Assignee: Intel Corporation

Inventor: Umberto Santoni
Keyword recognition system and method

Patent number: 6591237

Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system (keyword and vocabulary) to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.

Type: Grant

Filed: December 13, 1999

Date of Patent: July 8, 2003

Assignee: Intel Corporation

Inventor: Adoram Erell
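The decision rule in this abstract can be written down compactly. The sketch assumes the scores are DTW distances, so lower means a better match, and that "significant match" means falling below a fixed threshold; both are assumptions, as is the function name.

```python
def is_keyword(keyword_score, vocab_scores, match_threshold):
    """Accept the utterance as the keyword only if it matches the keyword
    template well AND better than every vocabulary word template."""
    significant = keyword_score < match_threshold
    beats_all = all(keyword_score < s for s in vocab_scores)
    return significant and beats_all
```

Comparing against the vocabulary templates as well as the threshold is what lets the ordinary vocabulary words act as competing models for rejecting non-keyword utterances.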
Speech processing apparatus and method

Patent number: 6560575

Abstract: An apparatus is provided for checking the consistency between two training words which can be used in, for example, a speech recognition or verification system. Two training examples are aligned using a dynamic programming alignment process and an average frame score is calculated from the alignment results together with the worst score in a number of consecutive frames. These values are then compared with similar values obtained from training examples which are known to be consistent to determine if the training examples are consistent.

Type: Grant

Filed: September 30, 1999

Date of Patent: May 6, 2003

Assignee: Canon Kabushiki Kaisha

Inventor: Robert Alexander Keiller
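
As a rough illustration of the consistency check in patent 6560575, the sketch below computes the two statistics named in the abstract from per-frame alignment scores. It assumes the scores are distances (larger is worse) and uses simple upper limits as a stand-in for the comparison against known-consistent examples; `window`, `max_average`, and `max_worst` are hypothetical names.

```python
def consistency_stats(frame_scores, window=3):
    """Return (average frame score, worst mean score over `window` consecutive frames).

    Assumption: scores are distances along the alignment path, larger is worse.
    """
    if not frame_scores:
        raise ValueError("frame_scores must not be empty")
    n = len(frame_scores)
    average = sum(frame_scores) / n
    w = min(window, n)  # shrink the window for very short alignments
    worst = max(sum(frame_scores[i:i + w]) / w for i in range(n - w + 1))
    return average, worst


def is_consistent(frame_scores, max_average, max_worst, window=3):
    """Compare both statistics with limits assumed to come from known-consistent pairs."""
    average, worst = consistency_stats(frame_scores, window)
    return average <= max_average and worst <= max_worst
```

The windowed statistic catches a short, badly mismatched stretch that the overall average would hide.
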
Speech recognition method and apparatus utilizing multiple feature streams

Patent number: 6542866

Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.

Type: Grant

Filed: September 22, 1999

Date of Patent: April 1, 2003

Assignee: Microsoft Corporation

Inventors: Li Jiang, Xuedong Huang
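
The per-segment selection described for patent 6542866 can be sketched as follows. This is a simplified illustration, assuming each segment already has one log-probability per feature stream for the same segment unit and that the path score is their sum; the function name and data layout are hypothetical.

```python
def path_score(segment_scores):
    """Sum, over segments, the best score among the feature streams.

    segment_scores[t] holds one log-probability per feature stream for
    segment t (an assumed layout); a higher value is a better match.
    """
    total = 0.0
    for scores in segment_scores:
        if not scores:
            raise ValueError("each segment needs at least one stream score")
        total += max(scores)  # choose the best feature stream for this segment
    return total
```
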
Signal modification based on continuous time warping for low bit-rate CELP coding

Publication number: 20030004718

Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.

Type: Application

Filed: June 29, 2001

Publication date: January 2, 2003

Applicant: Microsoft Corporation

Inventor: Ajit V. Rao
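
The abstract of publication 20030004718 mentions modeling correlation strengths as points on a polynomial and maximizing that model. A minimal sketch of that one step, assuming three sampled candidates and a quadratic model, is shown below; the function name is hypothetical and the rest of the coder is not represented.

```python
def parabola_peak(xs, ys):
    """Return the x that maximizes the quadratic through three (x, y) samples.

    xs: three distinct candidate values (for example, candidate shifts).
    ys: the correlation strength measured at each candidate.
    """
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    if denom == 0:
        raise ValueError("the three x values must be distinct")
    # Coefficients of y = a*x^2 + b*x + c fitted through the three points.
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 * x2 * (y0 - y1) + x1 * x1 * (y2 - y0) + x0 * x0 * (y1 - y2)) / denom
    if a >= 0:
        raise ValueError("the fitted quadratic has no maximum")
    return -b / (2 * a)  # vertex of the parabola
```

Evaluating only a few candidates and interpolating the peak avoids an exhaustive search over every possible value.
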
Coding signals

Publication number: 20020120445

Abstract: An improved representation of transients in audio signals comprises modifying transient locations in such a way that a transient can occur only at a beginning of a sinusoidal segment.

Type: Application

Filed: November 2, 2001

Publication date: August 29, 2002

Inventors: Renat Vafin, Richard Heusdens, Steven Leonardus Josephus Dimphina Elisabeth Van De Par, Willem Bastiaan Kleijn
Object image search using validated sub-model poses

Patent number: 6411734

Abstract: A method is provided for finding a pose of a geometric model of an object within an image of a scene containing the object that includes providing sub-models of the geometric model; and providing found poses of the sub-models in the image. The method also includes selecting sub-models of the geometric model based on pre-fit selection criteria and/or post-fit selection criteria so as to provide selected sub-models of the geometric model. Thus, the invention automatically removes, disqualifies, or disables found sub-model poses when they fail to satisfy certain user-specified requirements. Examples of such requirements include thresholds on deviations between the found sub-model poses and their corresponding expected poses with respect to the final model pose, as well as limits on the sub-model. The remaining, validated sub-models can then be used to re-compute a more accurate fit of the model to the image.

Type: Grant

Filed: December 16, 1998

Date of Patent: June 25, 2002

Assignee: Cognex Corporation

Inventors: Ivan A. Bachelder, Karen B. Sarachik
Method for the encoding of prosody for a speech encoder working at very low bit rates

Publication number: 20020065655

Abstract: A speech encoding/decoding method using an encoder working at very low bit rates, comprises a learning step enabling the identification of the “representatives” of the speech signal and an encoding step to segment the speech signal and determine the “best representative” associated with each recognized segment. The method comprises at least one step for the encoding/decoding of at least one of the parameters of the prosody of the recognized segments, such as the energy and/or pitch and/or voicing and/or length of the segments, by using a piece of information on prosody pertaining to the “best representatives”. Application to bit rates lower than 400 bits per second.

Type: Application

Filed: October 18, 2001

Publication date: May 30, 2002

Applicant: THALES

Inventors: Philippe Gournay, Yves-Paul Nakache
Method and apparatus for speaker recognition via comparing an unknown input to reference data

Patent number: 6389392

Abstract: A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data and comparison results for each successive segment are generated. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for the said pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus the effect of a mismatch between the input signal and each instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.

Type: Grant

Filed: December 8, 1998

Date of Patent: May 14, 2002

Assignee: British Telecommunications public limited company

Inventors: Mark Pawlewski, Aladdin Mohammad Ariyaeeinia, Perasiriyan Sivakumaran
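
The composite comparison described for patent 6389392 can be sketched under strong simplifying assumptions: segments are fixed-length feature vectors, every reference instance has the same number of segments as the input, and Euclidean distance is the comparison measure. All names here are hypothetical.

```python
import math


def composite_score(input_segments, reference_instances):
    """For each input segment, keep the distance to the closest matching
    reference instance, then sum those per-segment minima."""
    total = 0.0
    for i, segment in enumerate(input_segments):
        total += min(math.dist(segment, ref[i]) for ref in reference_instances)
    return total


def identify(input_segments, patterns):
    """Return the pattern name with the lowest composite score.

    patterns maps a name to a list of reference instances (assumed layout).
    """
    return min(patterns, key=lambda name: composite_score(input_segments, patterns[name]))
```

Taking the minimum per segment lets one instance cover for a poorly matching stretch in another instance of the same pattern.
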
Speech data recording apparatus and method for speech recognition learning

Publication number: 20020049590

Abstract: In a speech recording arrangement, a sentence to be recorded for speech recognition learning is presented to a user. Speech input by the user for the presented sentence is recognized to obtain a recognized character string. The speech pattern of the recognized character string is compared with the speech pattern of the presented sentence by DP matching to obtain a matching rate therebetween. It is determined whether the matching rate exceeds a predetermined level. If so, the input speech is recorded as learning data. If not, an unmatched portion between the recognized character string and the recording sentence is presented to the user. The user is then instructed to input the speech once again. With this arrangement, speech data with very few improperly pronounced words can be efficiently recorded.

Type: Application

Filed: October 15, 2001

Publication date: April 25, 2002

Inventors: Hiroaki Yoshino, Toshiaki Fukada
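
The abstract of publication 20020049590 does not define its matching rate. As one plausible reading, the sketch below uses edit distance, a classic dynamic programming (DP) match, over two symbol sequences and normalizes it to a rate between 0 and 1. The normalization and function names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def matching_rate(recognized, presented):
    """Assumed definition: 1 minus edit distance over the longer length."""
    longest = max(len(recognized), len(presented))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(recognized, presented) / longest
```

A recording would then be kept only when `matching_rate(...)` exceeds a chosen level, as the abstract describes.
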
Method of memory management in speech recognition

Patent number: 6374222

Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index, and by a coding for the last_time field in which one bit indicates that a slot is available for reuse and a second bit is for backtrace update.

Type: Grant

Filed: July 16, 1999

Date of Patent: April 16, 2002

Assignee: Texas Instruments Incorporated

Inventor: Yu-Hung Kao
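
The field packing mentioned in the abstract of patent 6374222 can be illustrated with ordinary bit operations. The bit widths, flag positions, and function names below are assumptions chosen for the example; the abstract does not specify them.

```python
STATE_BITS = 4                       # assumed width of state_index
STATE_MASK = (1 << STATE_BITS) - 1

AVAILABLE_BIT = 0x1                  # assumed flag: slot may be reused
BACKTRACE_BIT = 0x2                  # assumed flag: backtrace update


def pack_model_state(model_index, state_index):
    """Store model_index and state_index in a single integer field."""
    if model_index < 0 or not 0 <= state_index <= STATE_MASK:
        raise ValueError("index out of range for the assumed field widths")
    return (model_index << STATE_BITS) | state_index


def unpack_model_state(packed):
    """Recover (model_index, state_index) from the packed field."""
    return packed >> STATE_BITS, packed & STATE_MASK


def set_flag(last_time, flag):
    return last_time | flag


def has_flag(last_time, flag):
    return bool(last_time & flag)
```
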
APPARATUS, METHOD AND COMPUTER READABLE MEMORY MEDIUM FOR SPEECH RECOGNITION USING DYNAMIC PROGRAMMING

Publication number: 20020032566

Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.

Type: Application

Filed: July 26, 1999

Publication date: March 14, 2002

Inventors: ELI TZIRKEL-HANCOCK, ROBERT ALEXANDER KEILLER
Speech recognition method

Patent number: 6321195

Abstract: The present invention relates to an automated dialing method for mobile telephones. According to the method, a user enters a telephone number via the keypad of the mobile phone, followed by speaking a corresponding codeword into the handset. The voice signal is encoded using the CODEC and vocoder already on board the mobile phone. The speech is divided into frames and each frame analyzed to ascertain its primary spectral features. These features are stored in memory as associated with the numeric keypad sequence. In recognition mode, the user speaks the codeword into the handset, which is analyzed in a like fashion as in training mode. The primary spectral features are compared with those stored in memory. When a match is declared according to preset criteria, the telephone number is automatically dialed by the mobile phone. Time warping techniques may be applied in the analysis to reduce timing variations.

Type: Grant

Filed: April 21, 1999

Date of Patent: November 20, 2001

Assignee: LG Electronics Inc.

Inventors: Yun Keun Lee, Jong Seok Lee, Gi Bak Kim, Byoung Soo Lee
Speech recognition using both time encoding and HMM in parallel

Patent number: 6301562

Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.

Type: Grant

Filed: April 27, 2000

Date of Patent: October 9, 2001

Assignee: New Transducers Limited

Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
KEYWORD RECOGNITION SYSTEM AND METHOD

Publication number: 20010012997

Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system (keyword and vocabulary) to determine whether an utterance is a keyword utterance. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.

Type: Application

Filed: December 13, 1999

Publication date: August 9, 2001

Inventor: ADORAM ERELL
Radiotelephone voice control device, in particular for use in a motor vehicle

Patent number: 6263216

Abstract: The apparatus comprises a data memory containing a series of correspondents' call numbers and, for each call number, at least one associated voice print; a sound transducer suitable for picking up the name of a desired correspondent as spoken by the user of the apparatus; voice recognition means suitable for analyzing the correspondent's name as picked up by the transducer and for transforming it into an associated voice print; selective memory addressing means including associative means suitable for finding a voice print in the memory corresponding to the print supplied by the voice recognition means, and in the event of a match, for addressing the corresponding memory position; and means co-operating with the associative means for applying the addressed call number to the radiotelephone circuits.

Type: Grant

Filed: October 4, 1999

Date of Patent: July 17, 2001

Assignee: Parrot

Inventors: Henri Seydoux, Nicolas Besnard
Speech recognition system employing discriminatively trained models

Patent number: 6260013

Abstract: A speech recognition system has vocabulary word models having for each word model state both a discrete probability distribution function and a continuous probability distribution function. Word models are initially aligned with an input utterance using the discrete probability distribution functions, and an initial matching performed. From well scoring word models, a ranked scoring of those models is generated using the respective continuous probability distribution functions. After each utterance, preselected continuous probability distribution function parameters are discriminatively adjusted to increase the difference in scoring between the best scoring and the next ranking models.

Type: Grant

Filed: March 14, 1997

Date of Patent: July 10, 2001

Assignee: Lernout & Hauspie Speech Products N.V.

Inventor: Vladimir Sejnoha
Speaker normalization processor apparatus for generating frequency warping function, and speech recognition apparatus with said speaker normalization processor apparatus

Patent number: 6236963

Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration showing an anatomical configuration of a vocal tract of each normalization-target speaker, by looking up to a correspondence between vocal-tract configuration parameters and Formant frequencies previously determined based on a vocal tract model of the standard speaker, based on speech waveform data of each normalization-target speaker.

Type: Grant

Filed: March 16, 1999

Date of Patent: May 22, 2001

Assignee: ATR Interpreting Telecommunications Research Laboratories

Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
DP Pattern matching which determines current path propagation using the amount of path overlap to the subsequent time point

Patent number: 6226610

Abstract: A method and apparatus for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal using a dynamic programming matching technique is described. The second signal patterns which are at the end of a dynamic programming path for a current first signal pattern are listed in an active list 201. The dynamic programming paths are propagated by processing the second signal patterns on the active list, and a new active list 205 is generated for the succeeding input pattern. In order to propagate each path, the system determines how many second signal patterns lie within an overlap region in which a comparison has to be made, and processes each path in dependence upon the determined amount of overlap.

Type: Grant

Filed: February 8, 1999

Date of Patent: May 1, 2001

Assignee: Canon Kabushiki Kaisha

Inventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward
Pattern recognition system

Patent number: 6195638

Abstract: A pattern recognition method of dynamic time warping of two sequences of feature sets onto each other is provided. The method includes the steps of creating a rectangular graph having the two sequences on its two axes, defining a swath of width r, where r is an odd number, centered about a diagonal line connecting the beginning point at the bottom left of the rectangle to the endpoint at the top right of the rectangle and also defining r−1 lines within the swath. The lines defining the swath are parallel to the diagonal line. Each array element k of an r-sized array is associated with a separate array of the r lines within the swath and for each row of the rectangle, the dynamic time warping method recursively generates new path values for each array element k as a function of the previous value of the array element k and of at least one of the current values of the two neighboring array elements k−1 and k+1 of the array element k.

Type: Grant

Filed: September 2, 1998

Date of Patent: February 27, 2001

Assignee: Art-Advanced Recognition Technologies Inc.

Inventors: Gabriel Ilan, Jacob Goldberger
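
The idea in patent 6195638 of restricting dynamic time warping to a swath of odd width r around the diagonal can be sketched as below. This is a generic band-constrained DTW over scalar sequences that uses a full cost table for clarity; it does not reproduce the patent's r-sized array recursion, and the function name and absolute-difference local distance are assumptions.

```python
import math


def banded_dtw(a, b, r):
    """DTW distance between numeric sequences a and b, limited to a band of
    odd width r around the diagonal. Returns math.inf when the band is too
    narrow to connect the start point to the end point."""
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd number")
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must not be empty")
    half = r // 2
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        center = round(i * m / n)      # column on the diagonal for this row
        lo = max(1, center - half)
        hi = min(m, center + half)
        for j in range(lo, hi + 1):    # cells outside the swath stay infinite
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # diagonal step
    return cost[n][m]
```

With r = 1 only the diagonal is allowed, so the result reduces to a frame-by-frame comparison; wider bands permit more timing variation at higher cost in computation.
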
