Dynamic Time Warping Patents (Class 704/241)
  • Patent number: 7366667
    Abstract: Method and device for the recognition of words and pauses in a voice signal. The words (Wi) spoken in a row and pauses (Ti) are thereby combined as to be appertaining to a word group as soon as one of the pauses (Ti) exceeds a limit value (TG). Stored references (Rj) are allocated to the voice signal of the word group, and an indication of the result of the allocation is effected after the limit value (TG) has been exceeded. To this end, parameters corresponding to the moments of the transitions between ranges with voice and non-voice are determined from the voice signal, and the limit value (TG) is then changed in dependence on said parameters.
    Type: Grant
    Filed: December 21, 2001
    Date of Patent: April 29, 2008
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Stefan Dobler
  • Patent number: 7317999
    Abstract: A method for mapping a spectrum obtained from signals under test corresponding to linearly spaced frequencies to logarithmically spaced frequencies in a measuring apparatus. A spectrum within a predetermined frequency range from logarithmically spaced frequencies is selected from this spectrum corresponding to linearly spaced frequencies and vector averaging of the selected spectrum is performed.
    Type: Grant
    Filed: April 8, 2005
    Date of Patent: January 8, 2008
    Assignee: Agilent Technologies, Inc.
    Inventors: Kazuhiko Ninomiya, Yoshiyuki Yanagimoto
  • Publication number: 20070239449
    Abstract: The present invention provides a method and apparatus for verification of speaker authentication. A method for verification of speaker authentication, comprising: inputting an utterance containing a password that is spoken by a speaker; extracting an acoustic feature vector sequence from said inputted utterance; DTW-matching said extracted acoustic feature vector sequence and a speaker template enrolled by an enrolled speaker; calculating each of a plurality of local distances between said DTW-matched acoustic feature vector sequence and said speaker template; nonlinear-transforming said each local distance calculated to give more weights on small local distances; calculating a DTW-matching score based on said plurality of local distances nonlinear-transformed; and comparing said matching score with a predefined discriminating threshold to determine whether said inputted utterance is an utterance containing a password spoken by the enrolled speaker.
    Type: Application
    Filed: March 28, 2007
    Publication date: October 11, 2007
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Jian LUAN, Jie HAO
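The verification flow in the abstract above (DTW matching, per-step local distances, a nonlinear transform that favours small distances, then a threshold test) can be sketched roughly as follows. This is an illustrative sketch, not the applicant's implementation: the exponential transform `exp(-d / sigma)`, the Euclidean local distance, the averaging of transformed values, and the names `dtw_path`, `verification_score`, `verify`, `sigma` and `threshold` are all assumptions made for the example.

```python
import math


def dtw_path(a, b, dist):
    """Return the lowest-cost DTW alignment path between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def verification_score(utterance, template, sigma=1.0):
    """Average of nonlinearly transformed local distances along the DTW path.

    exp(-d / sigma) is an assumed transform: it maps small distances close
    to 1 and large distances close to 0, so good local matches dominate.
    """
    path = dtw_path(utterance, template, math.dist)
    sims = [math.exp(-math.dist(utterance[i], template[j]) / sigma) for i, j in path]
    return sum(sims) / len(sims)


def verify(utterance, template, threshold=0.5, sigma=1.0):
    """Accept the speaker when the score reaches the (assumed) threshold."""
    return verification_score(utterance, template, sigma) >= threshold


# Toy 2-D feature vectors standing in for acoustic feature vectors.
template = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
same_speaker = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # slower start
impostor = [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]
```

Because the transformed values here are similarities rather than distances, a higher score means a better match; a real system would tune the transform and threshold on enrollment data.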
  • Patent number: 7266496
    Abstract: The present invention discloses a complete speech recognition system having a training button and a recognition button, and the whole system uses the application specific integrated circuit (ASIC) architecture for the design, and also uses the modular design to divide the speech processing into 4 modules: system control module, autocorrelation and linear predictive coefficient module, cepstrum module, and DTW recognition module. Each module forms an intellectual product (IP) component by itself. Each IP component can work with various products and application requirements for the design reuse to greatly shorten the time to market.
    Type: Grant
    Filed: December 24, 2002
    Date of Patent: September 4, 2007
    Assignee: National Cheng-Kung University
    Inventors: Jhing-Fa Wang, Jia-Ching Wang, Tai-Lung Chen, Chin-Chan Chang
  • Publication number: 20070203699
    Abstract: A speech recognizer control system, a speech recognizer control method, and a speech recognizer control program make it possible to properly identify a device on the basis of a speech utterance of a user and to control the identified device. The speech recognizer control system includes a speech input unit to which a speech utterance is input from a user, a speech recognizer which recognizes the content of the input speech utterance, a device controller which identifies a device to be controlled among a plurality of devices on the basis of at least the recognized speech utterance content and which controls an operation of the identified device, and a state change storage which stores, as first auxiliary information for identifying a device to be controlled, a state change other than at least a state change caused by a speech utterance from the user among the state changes of operations in the individual devices of the plurality of devices.
    Type: Application
    Filed: January 24, 2007
    Publication date: August 30, 2007
    Inventor: Hisayuki Nagashima
  • Patent number: 7231315
    Abstract: A distribution goodness-of-fit test device for testing whether measured data matches an estimated probability distribution has a counting section determination unit, a counting unit and a goodness-of-fit test unit. The counting section determination unit determines according to the number of the measured data, widths of counting sections for counting the measured data. The counting unit counts the numbers of data in the respective determined counting sections. Also, the goodness-of-fit test unit performs a goodness-of-fit test based on the numbers of data in the respective counting sections.
    Type: Grant
    Filed: December 3, 2004
    Date of Patent: June 12, 2007
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Masakazu Fujimoto
  • Patent number: 7171362
    Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: January 30, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
  • Patent number: 7149686
    Abstract: A system and method for eliminating synchronization errors using speech recognition. Using separate audio and visual speech recognition techniques, the inventive system and method identifies visemes, or visual cues which are indicative of articulatory type, in the video content, and identifies phones and their articulatory types in the audio content. Once the two recognition techniques have been applied, the outputs are compared to determine the relative alignment and, if not aligned, a synchronization algorithm is applied to time-adjust one or both of the audio and the visual streams in order to achieve synchronization.
    Type: Grant
    Filed: June 23, 2000
    Date of Patent: December 12, 2006
    Assignee: International Business Machines Corporation
    Inventors: Paul S. Cohen, John R. Dildine, Edward J. Gleason
  • Patent number: 7143034
    Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.
    Type: Grant
    Filed: October 23, 2002
    Date of Patent: November 28, 2006
    Assignee: Postech Foundation
    Inventors: Hong Jeong, Yong Kim
  • Patent number: 7085717
    Abstract: A method includes (i) measuring first distances between (a) vectors belonging to a set of vectors that represent an utterance and (b) vectors belonging to a set of vectors that represent a template, the measuring being done in accordance with a first order of the utterance vectors and a first order of the template vectors, and (ii) measuring second distances between (a) individual vectors belonging to the set of vectors that represent the utterance and (b) individual vectors belonging to the set of vectors that represent the template, the measuring being done in accordance with a second order of the utterance vectors and a second order of the template vectors, and (iii) in which the first template vector order and the second template vector order are different and/or the first utterance vector order and the second utterance vector order are different.
    Type: Grant
    Filed: May 21, 2002
    Date of Patent: August 1, 2006
    Assignee: Thinkengine Networks, Inc.
    Inventors: Veton Z. Kepuska, Harinath K. Reddy
  • Patent number: 7062435
    Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.
    Type: Grant
    Filed: July 26, 1999
    Date of Patent: June 13, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Eli Tzirkel-Hancock, Robert Alexander Keiller
  • Patent number: 7050975
    Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
    Type: Grant
    Filed: October 9, 2002
    Date of Patent: May 23, 2006
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-Iai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
  • Patent number: 7024358
    Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time-warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.
    Type: Grant
    Filed: March 11, 2004
    Date of Patent: April 4, 2006
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Eyal Shlomot, Yang Gao
  • Patent number: 6996527
    Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: February 7, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
  • Patent number: 6983246
    Abstract: Distances are measured between vectors representing speech and a stored reference template. Frequency distributions of the distance measurements are generated by counting how many times a particular reference template resulted in the lowest local distance. The numbers in the counters indicate regions (successive vectors) in a reference template that are good matches for speech input.
    Type: Grant
    Filed: May 21, 2002
    Date of Patent: January 3, 2006
    Assignee: Thinkengine Networks, Inc.
    Inventors: Veton K. Kepuska, Harinath K. Reddy
  • Patent number: 6978241
    Abstract: An analyzer determines frequency and amplitudes of an audio signal represented by sinusoids for transmission to a receiver decoder which includes a synthesizer to reconstruct the audio signal. A pitch detector determines the pitch for transmission to the receiver along with the structure of the spectrum of the speech signal. The structure of the spectrum is often transmitted in the form of LPC parameters. To correct for frequency changes of the periodic component of an audio signal, a frequency change determiner determines a change of the frequency of the periodical component over the analysis period. This change of frequency is transmitted to the decoder for increasing the accuracy of the reconstruction of the audio signal. Further, the frequency change is only used to obtain a more accurate value of the pitch. The frequency change is determined by using a time warper which performs a time transformation such that a time transformed audio signal is obtained with a minimum frequency change.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: December 20, 2005
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Robert Johannes Sluijter, Augustus Josephus Elizabeth Maria Janssen
  • Patent number: 6961702
    Abstract: The invention relates to a method for generating an adapted reference for automatic speech recognition. In a first step, recognition is performed based on a spoken utterance and a recognition result which corresponds to a currently valid reference is obtained. In a second step, the currently valid reference is adapted in accordance with the utterance in order to create an adapted reference. In a third step, the adapted reference is assessed and it is decided if the adapted reference is used for further recognition.
    Type: Grant
    Filed: November 6, 2001
    Date of Patent: November 1, 2005
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Stefan Dobler, Andreas Kiessling, Ralph Schleifer, Raymond Brückner
  • Patent number: 6879955
    Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: April 12, 2005
    Assignee: Microsoft Corporation
    Inventor: Ajit V. Rao
  • Patent number: 6868378
    Abstract: The invention relates to a process and a system for voice recognition in a noisy signal. In a preferred embodiment, the system (2) comprises modules for detecting speech (30) and for formulating a noise model (31), a module (40) for quantifying the energy level of the noise and for comparing with preestablished energy spans, a parameterization pathway (5) comprising an optional denoising module (51), with Wiener filter, a module (52) for calculating the spectral energy in Bark windows, a module (50, 530) for applying a configuration of shift values (531), by adding these values to the Bark coefficients, as a function of the quantification (40), so as to modify the parameterization, a module (54) for calculating vectors of parameters, and a block (6) for recognizing shapes, performing the voice recognition by comparison with vectors of parameters prerecorded during a learning phase.
    Type: Grant
    Filed: November 19, 1999
    Date of Patent: March 15, 2005
    Assignee: Thomson-CSF Sextant
    Inventor: Pierre-Albert Breton
  • Patent number: 6836758
    Abstract: A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
    Type: Grant
    Filed: January 9, 2001
    Date of Patent: December 28, 2004
    Assignee: Qualcomm Incorporated
    Inventors: Ning Bi, Andrew P. DeJaco, Harinath Garudadri, Chienchung Chang, William Yee-Ming Huang, Narendranath Malayath, Suhail Jalil, David Puig Oses, Yingyong Qi
  • Patent number: 6832190
    Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.
    Type: Grant
    Filed: November 10, 2000
    Date of Patent: December 14, 2004
    Assignee: Siemens Aktiengesellschaft
    Inventors: Jochen Junkawitsch, Harald Höge
  • Publication number: 20040181405
    Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time-warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.
    Type: Application
    Filed: March 11, 2004
    Publication date: September 16, 2004
    Applicant: Mindspeed Technologies, Inc.
    Inventors: Eyal Shlomot, Yang Gao
  • Patent number: 6735563
    Abstract: A method and apparatus for constructing voice templates for a speaker-independent voice recognition system includes segmenting a training utterance to generate time-clustered segments, each segment being represented by a mean. The means for all utterances of a given word are quantized to generate template vectors. Each template vector is compared with testing utterances to generate a comparison result. The comparison is typically a dynamic time warping computation. The training utterances are matched with the template vectors if the comparison result exceeds at least one predefined threshold value, to generate an optimal path result, and the training utterances are partitioned in accordance with the optimal path result. The partitioning is typically a K-means segmentation computation. The partitioned utterances may then be re-quantized and re-compared with the testing utterances until the at least one predefined threshold value is not exceeded.
    Type: Grant
    Filed: July 13, 2000
    Date of Patent: May 11, 2004
    Assignee: Qualcomm, Inc.
    Inventor: Ning Bi
  • Patent number: 6714910
    Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.
    Type: Grant
    Filed: June 26, 2000
    Date of Patent: March 30, 2004
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
  • Publication number: 20040049387
    Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.
    Type: Application
    Filed: October 23, 2002
    Publication date: March 11, 2004
    Inventors: Hong Jeong, Yong Kim
  • Patent number: 6681207
    Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: January 20, 2004
    Assignee: Qualcomm Incorporated
    Inventor: Harinath Garudadri
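As a rough illustration of the Mu-law embodiment mentioned in the abstract above, the sketch below compresses model parameters with the standard mu-law companding curve and expands them again. It assumes the parameters have already been normalised to the range [-1, 1] and are stored as signed 8-bit codes; the function names, the value mu = 255 and the 8-bit width are illustrative choices, not details taken from the patent.

```python
import math

MU = 255.0  # common mu-law constant; assumed here, not stated in the abstract


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with finer resolution near zero."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode(values, bits=8):
    """Compress and quantize to signed integer codes (training side)."""
    levels = 2 ** (bits - 1) - 1
    return [round(mu_law_compress(v) * levels) for v in values]


def decode(codes, bits=8):
    """Expand integer codes back to approximate values (recognition side)."""
    levels = 2 ** (bits - 1) - 1
    return [mu_law_expand(c / levels) for c in codes]


# Hypothetical normalised template parameters.
params = [0.0, 0.01, -0.5, 1.0]
codes = encode(params)
restored = decode(codes)
```

The compression is lossy: each restored value differs slightly from the original, with the smallest errors for values near zero, which is the trade-off the abstract accepts in exchange for storing more models.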
  • Publication number: 20030220790
    Abstract: A method includes measuring distances between vectors that represent an utterance and vectors that represent a template, generating information indicative of how well the vectors of the utterance match the vectors of the template, and making a matching decision based on the measured distances and on the generated information.
    Type: Application
    Filed: May 21, 2002
    Publication date: November 27, 2003
    Inventor: Veton K. Kepuska
  • Publication number: 20030212555
    Abstract: A system and method is used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters are minimized. As a result, smaller storage devices may be used due to the reduction of the size of the storage requirements.
    Type: Application
    Filed: May 9, 2002
    Publication date: November 13, 2003
    Applicant: OREGON HEALTH & SCIENCE
    Inventor: Jan P.H. van Santen
  • Publication number: 20030200087
    Abstract: An improved template spotting technique may be implemented as part of text dependent speaker verification system to authenticate a user of a wireless communication device. This technique may be suitable for use in noisy environments and for wireless communication devices with limited processing power. Endpoints of a test utterance are identified by first computing local distances between test frames and a target template. Accumulated distances are then computed from the local distances. Endpoints of the utterance may be identified when one or more of the accumulated distances is below a predetermined threshold. Once endpoints of a test utterance are identified, a dynamic time warp (DTW) process may be used to determine whether the test utterance matches a training template. One embodiment of the present invention aligns multiple training templates to reduce the probability of failing to verify the identity of a speaker that should have been properly verified.
    Type: Application
    Filed: April 22, 2002
    Publication date: October 23, 2003
    Applicant: D.S.P.C. TECHNOLOGIES LTD.
    Inventor: Hagai Aronowitz
  • Patent number: 6594630
    Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls power delivered to the electrical device. The power control is responsive to at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.
    Type: Grant
    Filed: November 19, 1999
    Date of Patent: July 15, 2003
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Igor Zlokarnik, Daniel Lawrence Roth
  • Patent number: 6594392
    Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: July 15, 2003
    Assignee: Intel Corporation
    Inventor: Umberto Santoni
  • Patent number: 6591237
    Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
    Type: Grant
    Filed: December 13, 1999
    Date of Patent: July 8, 2003
    Assignee: Intel Corporation
    Inventor: Adoram Erell
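The two-part decision rule described in the abstract above can be sketched as a single predicate. The sketch assumes the scores are DTW distances, so that lower means a better match, and that "significant match" is modelled by a fixed acceptance threshold; the name `is_keyword` and the parameter `accept_threshold` are hypothetical.

```python
def is_keyword(keyword_score, vocabulary_scores, accept_threshold):
    """Decide whether an utterance is the keyword.

    Scores are assumed to be DTW distances (lower is better). The utterance
    is accepted only if it matches the keyword template well enough AND
    matches it better than every vocabulary word template.
    """
    significant = keyword_score <= accept_threshold
    beats_vocabulary = all(keyword_score < s for s in vocabulary_scores)
    return significant and beats_vocabulary


# Hypothetical distances: the keyword template is the closest match.
accepted = is_keyword(10.0, [25.0, 30.0], accept_threshold=15.0)
# A vocabulary word matches better, so the keyword is rejected.
rejected = is_keyword(10.0, [8.0, 30.0], accept_threshold=15.0)
```

Comparing against the vocabulary templates lets the system reject utterances that resemble the keyword but are closer to another trained word.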
  • Patent number: 6560575
    Abstract: An apparatus is provided for checking the consistency between two training words which can be used in, for example, a speech recognition or verification system. Two training examples are aligned using a dynamic programming alignment process and an average frame score is calculated from the alignment results together with the worst score in a number of consecutive frames. These values are then compared with similar values obtained from training examples which are known to be consistent to determine if the training examples are consistent.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: May 6, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Robert Alexander Keiller
  • Patent number: 6542866
    Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.
    Type: Grant
    Filed: September 22, 1999
    Date of Patent: April 1, 2003
    Assignee: Microsoft Corporation
    Inventors: Li Jiang, Xuedong Huang
  • Publication number: 20030004718
    Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.
    Type: Application
    Filed: June 29, 2001
    Publication date: January 2, 2003
    Applicant: Microsoft Corporation
    Inventor: Ajit V. Rao
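    The idea of modeling correlation strengths as points on a polynomial and maximizing the model can be illustrated with a three-point parabola fit. This is only a sketch under assumptions: the candidate shifts are taken to be equally spaced, and the name `parabolic_peak` and its parameters are hypothetical rather than taken from the application.

```python
def parabolic_peak(x_mid, step, y_left, y_mid, y_right):
    """Estimate the shift that maximizes a quadratic through three samples.

    The samples are correlation strengths at x_mid - step, x_mid and
    x_mid + step. If the fitted parabola has no interior maximum, the
    best sampled shift is returned instead.
    """
    curvature = y_left - 2.0 * y_mid + y_right
    if curvature >= 0.0:
        # Flat or upward-opening fit: fall back to the best sample.
        candidates = [
            (y_left, x_mid - step),
            (y_mid, x_mid),
            (y_right, x_mid + step),
        ]
        return max(candidates)[1]
    # Vertex of the parabola through the three points.
    return x_mid + 0.5 * step * (y_left - y_right) / curvature


# Samples of -(x - 0.3)^2 at x = -1, 0, 1; the fit recovers a peak near 0.3.
best_shift = parabolic_peak(0.0, 1.0, -1.69, -0.09, -0.49)
```

    Only three correlation evaluations are needed here, which reflects the abstract's point about searching a subset of possible contours.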
  • Publication number: 20020120445
    Abstract: An improved representation of transients in audio signals comprises modifying transient locations in such a way that a transient can occur only at a beginning of a sinusoidal segment.
    Type: Application
    Filed: November 2, 2001
    Publication date: August 29, 2002
    Inventors: Renat Vafin, Richard Heusdens, Steven Leonardus Josephus Dimphina Elisabeth Van De Par, Willem Bastiaan Kleijn
  • Patent number: 6411734
    Abstract: A method is provided for finding a pose of a geometric model of an object within an image of a scene containing the object that includes providing sub-models of the geometric model; and providing found poses of the sub-models in the image. The method also includes selecting sub-models of the geometric model based on pre-fit selection criteria and/or post-fit selection criteria so as to provide selected sub-models of the geometric model. Thus, the invention automatically removes, disqualifies, or disables found sub-model poses when they fail to satisfy certain user-specified requirements. Examples of such requirements include thresholds on deviations between the found sub-model poses and their corresponding expected poses with respect to the final model pose, as well as limits on the sub-model. The remaining, validated sub-models can then be used to re-compute a more accurate fit of the model to the image.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: June 25, 2002
    Assignee: Cognex Corporation
    Inventors: Ivan A. Bachelder, Karen B. Sarachik
  • Publication number: 20020065655
    Abstract: A speech encoding/decoding method using an encoder working at very low bit rates, comprises a learning step enabling the identification of the “representatives” of the speech signal and an encoding step to segment the speech signal and determine the “best representative” associated with each recognized segment. The method comprises at least one step for the encoding/decoding of at least one of the parameters of the prosody of the recognized segments, such as the energy and/or pitch and/or voicing and/or length of the segments, by using a piece of information on prosody pertaining to the “best representatives”. Application to bit rates lower than 400 bits per second.
    Type: Application
    Filed: October 18, 2001
    Publication date: May 30, 2002
    Applicant: THALES
    Inventors: Philippe Gournay, Yves-Paul Nakache
  • Patent number: 6389392
    Abstract: A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data and comparison results for each successive segment are generated. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for the said pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus the effect of a mismatch between the input signal and each instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.
    Type: Grant
    Filed: December 8, 1998
    Date of Patent: May 14, 2002
    Assignee: British Telecommunications public limited company
    Inventors: Mark Pawlewski, Aladdin Mohammad Ariyaeeinia, Perasiriyan Sivakumaran
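    A simplified sketch of the composite comparison follows. It assumes scalar segments, an absolute-difference distance, and reference instances already aligned to the same length as the input, none of which the abstract specifies; the names `composite_score` and `identify` are hypothetical.

```python
def composite_score(input_segments, reference_instances):
    """Sum, over input segments, the distance to the closest matching
    segment among all reference instances of one pattern."""
    total = 0.0
    for t, segment in enumerate(input_segments):
        total += min(abs(segment - ref[t]) for ref in reference_instances)
    return total


def identify(input_segments, patterns):
    """Return the name of the pattern with the lowest composite score.

    patterns maps a pattern name to a list of reference instances.
    """
    return min(patterns, key=lambda name: composite_score(input_segments, patterns[name]))


patterns = {
    "yes": [[1, 2, 3], [0, 5, 9]],  # two instances of the same pattern
    "no": [[4, 4, 4]],
}
# Neither "yes" instance matches [1, 5, 3] on its own, but taking the
# best segment from each instance gives a composite distance of zero.
result = identify([1, 5, 3], patterns)
```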
  • Publication number: 20020049590
    Abstract: In a speech recording arrangement, a sentence to be recorded for speech recognition learning is presented to a user. Speech input by the user for the presented sentence is recognized to obtain a recognized character string. The speech pattern of the recognized character string is compared with the speech pattern of the presented sentence by DP matching to obtain a matching rate therebetween. It is determined whether the matching rate exceeds a predetermined level. If so, the input speech is recorded as learning data. If not, an unmatched portion between the recognized character string and the recording sentence is presented to the user. The user is then instructed to input the speech once again. With this arrangement, speech data with very few improperly pronounced words can be efficiently recorded.
    Type: Application
    Filed: October 15, 2001
    Publication date: April 25, 2002
    Inventors: Hiroaki Yoshino, Toshiaki Fukada
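    The accept-or-retry decision can be sketched with a character-level dynamic-programming match. The abstract does not define the matching rate, so the definition used here (one minus edit distance divided by the length of the presented sentence) is an assumption, as are the function names and the 0.8 threshold.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, 1):
        cur = [i]
        for j, rec_char in enumerate(recognized, 1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_char != rec_char)  # match or substitution
            ))
        prev = cur
    return prev[-1]


def matching_rate(reference, recognized):
    """Assumed definition: 1 - distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(reference, recognized) / len(reference))


def should_record(reference, recognized, threshold=0.8):
    """Record the utterance only if the matching rate exceeds the threshold."""
    return matching_rate(reference, recognized) > threshold
```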
  • Patent number: 6374222
    Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index together, and by coding the last_time field so that one bit indicates that a slot is available for reuse and a second bit is used for backtrace update.
    Type: Grant
    Filed: July 16, 1999
    Date of Patent: April 16, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
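    The packing idea can be illustrated with plain bit operations. The field widths, flag positions, and helper names below are assumptions chosen for the example; the abstract names only model_index, state_index and last_time.

```python
STATE_BITS = 4                      # assumed width of state_index
STATE_MASK = (1 << STATE_BITS) - 1
FLAG_BITS = 2                       # two flag bits kept in last_time
REUSE_FLAG = 1 << 0                 # slot is available for reuse
BACKTRACE_FLAG = 1 << 1             # slot needs a backtrace update


def pack_indices(model_index, state_index):
    """Store model_index and state_index in a single integer."""
    if not 0 <= state_index <= STATE_MASK:
        raise ValueError("state_index does not fit in STATE_BITS")
    return (model_index << STATE_BITS) | state_index


def unpack_indices(packed):
    """Recover (model_index, state_index) from a packed integer."""
    return packed >> STATE_BITS, packed & STATE_MASK


def encode_last_time(time, reusable, backtrace):
    """Store the time value together with the two flag bits."""
    flags = (REUSE_FLAG if reusable else 0) | (BACKTRACE_FLAG if backtrace else 0)
    return (time << FLAG_BITS) | flags


def decode_last_time(field):
    """Recover (time, reusable, backtrace) from an encoded field."""
    return field >> FLAG_BITS, bool(field & REUSE_FLAG), bool(field & BACKTRACE_FLAG)
```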
  • Publication number: 20020032566
    Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.
    Type: Application
    Filed: July 26, 1999
    Publication date: March 14, 2002
    Inventors: Eli Tzirkel-Hancock, Robert Alexander Keiller
  • Patent number: 6321195
    Abstract: The present invention relates to an automated dialing method for mobile telephones. According to the method, a user enters a telephone number via the keypad of the mobile phone, followed by speaking a corresponding codeword into the handset. The voice signal is encoded using the CODEC and vocoder already on board the mobile phone. The speech is divided into frames and each frame analyzed to ascertain its primary spectral features. These features are stored in memory as associated with the numeric keypad sequence. In recognition mode, the user speaks the codeword into the handset, which is analyzed in a like fashion as in training mode. The primary spectral features are compared with those stored in memory. When a match is declared according to preset criteria, the telephone number is automatically dialed by the mobile phone. Time warping techniques may be applied in the analysis to reduce timing variations.
    Type: Grant
    Filed: April 21, 1999
    Date of Patent: November 20, 2001
    Assignee: LG Electronics Inc.
    Inventors: Yun Keun Lee, Jong Seok Lee, Gi Bak Kim, Byoung Soo Lee
  • Patent number: 6301562
    Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: October 9, 2001
    Assignee: New Transducers Limited
    Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
  • Publication number: 20010012997
    Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system (keyword and vocabulary) to determine whether an utterance is a keyword utterance. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
    Type: Application
    Filed: December 13, 1999
    Publication date: August 9, 2001
    Inventor: Adoram Erell
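    The two-part acceptance rule in this abstract can be sketched as follows. The sketch assumes scores are DTW distances, so lower means a better match, and that "significant match" means falling at or below a fixed threshold; the name `is_keyword` and the threshold parameter are hypothetical.

```python
def is_keyword(keyword_score, vocabulary_scores, threshold):
    """Decide whether an utterance is the keyword.

    keyword_score: DTW distance to the keyword template (lower is better).
    vocabulary_scores: DTW distances to every vocabulary word template.
    threshold: assumed cutoff for a "significant" keyword match.
    """
    significant = keyword_score <= threshold
    # The keyword must beat every vocabulary template, not just most.
    beats_all = all(keyword_score < score for score in vocabulary_scores)
    return significant and beats_all
```

    Under these assumptions, `is_keyword(2.0, [5.0, 3.5], threshold=4.0)` accepts the utterance, while a vocabulary score of 1.5 in the list would reject it.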
  • Patent number: 6263216
    Abstract: The apparatus comprises a data memory containing a series of correspondents' call numbers and, for each call number, at least one associated voice print; a sound transducer suitable for picking up the name of a desired correspondent as spoken by the user of the apparatus; voice recognition means suitable for analyzing the correspondent's name as picked up by the transducer and for transforming it into an associated voice print; selective memory addressing means including associative means suitable for finding a voice print in the memory corresponding to the print supplied by the voice recognition means, and in the event of a match, for addressing the corresponding memory position; and means co-operating with the associative means for applying the addressed call number to the radiotelephone circuits.
    Type: Grant
    Filed: October 4, 1999
    Date of Patent: July 17, 2001
    Assignee: Parrot
    Inventors: Henri Seydoux, Nicolas Besnard
  • Patent number: 6260013
    Abstract: A speech recognition system has vocabulary word models having for each word model state both a discrete probability distribution function and a continuous probability distribution function. Word models are initially aligned with an input utterance using the discrete probability distribution functions, and an initial matching performed. From well scoring word models, a ranked scoring of those models is generated using the respective continuous probability distribution functions. After each utterance, preselected continuous probability distribution function parameters are discriminatively adjusted to increase the difference in scoring between the best scoring and the next ranking models.
    Type: Grant
    Filed: March 14, 1997
    Date of Patent: July 10, 2001
    Assignee: Lernout & Hauspie Speech Products N.V.
    Inventor: Vladimir Sejnoha
  • Patent number: 6236963
    Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration showing an anatomical configuration of a vocal tract of each normalization-target speaker, by looking up a correspondence between vocal-tract configuration parameters and formant frequencies previously determined based on a vocal tract model of the standard speaker, based on speech waveform data of each normalization-target speaker.
    Type: Grant
    Filed: March 16, 1999
    Date of Patent: May 22, 2001
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
  • Patent number: 6226610
    Abstract: A method and apparatus for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal using a dynamic programming matching technique is described. The second signal patterns which are at the end of a dynamic programming path for a current first signal pattern are listed in an active list 201. The dynamic programming paths are propagated by processing the second signal patterns on the active list, and a new active list 205 is generated for the succeeding input pattern. In order to propagate each path, the system determines how many second signal patterns lie within an overlap region in which a comparison has to be made, and processes each path in dependence upon the determined amount of overlap.
    Type: Grant
    Filed: February 8, 1999
    Date of Patent: May 1, 2001
    Assignee: Canon Kabushiki Kaisha
    Inventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward
  • Patent number: 6195638
    Abstract: A pattern recognition method of dynamic time warping of two sequences of feature sets onto each other is provided. The method includes the steps of creating a rectangular graph having the two sequences on its two axes, defining a swath of width r, where r is an odd number, centered about a diagonal line connecting the beginning point at the bottom left of the rectangle to the endpoint at the top right of the rectangle and also defining r−1 lines within the swath. The lines defining the swath are parallel to the diagonal line. Each array element k of an r-sized array is associated with a separate one of the r lines within the swath and for each row of the rectangle, the dynamic time warping method recursively generates new path values for each array element k as a function of the previous value of the array element k and of at least one of the current values of the two neighboring array elements k−1 and k+1 of the array element k.
    Type: Grant
    Filed: September 2, 1998
    Date of Patent: February 27, 2001
    Assignee: Art-Advanced Recognition Technologies Inc.
    Inventors: Gabriel Ilan, Jacob Goldberger
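    Restricting dynamic time warping to a swath around the diagonal can be sketched as below. This is a generic banded DTW that keeps a full cost table for clarity; it is not the r-sized-array recursion claimed in the patent. The scalar features, the absolute-difference local distance, and the name `banded_dtw` are assumptions.

```python
import math


def banded_dtw(a, b, r):
    """DTW distance between sequences a and b, limited to a band of
    odd width r (in cells) centred on the diagonal of the cost table.

    Returns math.inf if the band is too narrow to connect the corners.
    """
    n, m = len(a), len(b)
    half = r // 2
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        # Column where the diagonal crosses this row.
        centre = round(i * m / n)
        lo = max(1, centre - half)
        hi = min(m, centre + half)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            table[i][j] = cost + min(
                table[i - 1][j],      # step along a only
                table[i][j - 1],      # step along b only
                table[i - 1][j - 1],  # step along both
            )
    return table[n][m]


# The repeated 2 in the second sequence is absorbed by warping.
distance = banded_dtw([1, 2, 3], [1, 2, 2, 3], r=3)
```

    Cells outside the band stay at infinity and are never evaluated, so the work per row is bounded by r rather than by the length of the second sequence.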