Dynamic Time Warping Patents (Class 704/241)
-
Patent number: 7366667
Abstract: Method and device for the recognition of words and pauses in a voice signal. The words (Wi) spoken in a row and pauses (Ti) are thereby combined as to be appertaining to a word group as soon as one of the pauses (Ti) exceeds a limit value (TG). Stored references (Rj) are allocated to the voice signal of the word group, and an indication of the result of the allocation is effected after the limit value (TG) has been exceeded. To this end, parameters corresponding to the moments of the transitions between ranges with voice and non-voice are determined from the voice signal, and the limit value (TG) is then changed in dependence on said parameters.
Type: Grant
Filed: December 21, 2001
Date of Patent: April 29, 2008
Assignee: Telefonaktiebolaget LM Ericsson (publ)
Inventor: Stefan Dobler
-
Patent number: 7317999
Abstract: A method for mapping a spectrum obtained from signals under test corresponding to linearly spaced frequencies to logarithmically spaced frequencies in a measuring apparatus. A spectrum within a predetermined frequency range from logarithmically spaced frequencies is selected from this spectrum corresponding to linearly spaced frequencies and vector averaging of the selected spectrum is performed.
Type: Grant
Filed: April 8, 2005
Date of Patent: January 8, 2008
Assignee: Agilent Technologies, Inc.
Inventors: Kazuhiko Ninomiya, Yoshiyuki Yanagimoto
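As a rough illustration of the linear-to-logarithmic mapping described in this abstract, the Python sketch below averages the complex (vector) values of the linear bins that fall near each logarithmically spaced frequency. The function names, the `rel_width` selection window, and the nearest-bin fallback are assumptions made for this example and are not taken from the patent.

```python
def log_spaced(f_start, f_stop, points):
    """Return `points` logarithmically spaced frequencies from f_start to f_stop."""
    ratio = (f_stop / f_start) ** (1.0 / (points - 1))
    return [f_start * ratio ** k for k in range(points)]


def map_to_log(spectrum, bin_hz, log_freqs, rel_width=0.1):
    """Map a linearly spaced complex spectrum onto log-spaced frequencies.

    spectrum[k] is the complex value at frequency k * bin_hz.
    rel_width is a hypothetical parameter: bins within +/- rel_width of
    each target frequency are selected and vector-averaged.
    """
    mapped = []
    for fc in log_freqs:
        lo, hi = fc * (1.0 - rel_width), fc * (1.0 + rel_width)
        selected = [v for k, v in enumerate(spectrum) if lo <= k * bin_hz <= hi]
        if not selected:
            # Assumed fallback: use the nearest linear bin when the window is empty.
            nearest = min(range(len(spectrum)), key=lambda k: abs(k * bin_hz - fc))
            selected = [spectrum[nearest]]
        # Vector averaging: average the complex values, not their magnitudes,
        # so components with opposite phase cancel.
        mapped.append(sum(selected) / len(selected))
    return mapped
```

Averaging complex values rather than magnitudes means that uncorrelated noise tends to cancel while a phase-coherent component survives, which is the usual motivation for vector averaging.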
-
Publication number: 20070239449
Abstract: The present invention provides a method and apparatus for verification of speaker authentication. A method for verification of speaker authentication, comprising: inputting an utterance containing a password that is spoken by a speaker; extracting an acoustic feature vector sequence from said inputted utterance; DTW-matching said extracted acoustic feature vector sequence and a speaker template enrolled by an enrolled speaker; calculating each of a plurality of local distances between said DTW-matched acoustic feature vector sequence and said speaker template; nonlinear-transforming said each local distance calculated to give more weights on small local distances; calculating a DTW-matching score based on said plurality of local distances nonlinear-transformed; and comparing said matching score with a predefined discriminating threshold to determine whether said inputted utterance is an utterance containing a password spoken by the enrolled speaker.
Type: Application
Filed: March 28, 2007
Publication date: October 11, 2007
Applicant: Kabushiki Kaisha Toshiba
Inventors: Jian LUAN, Jie HAO
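The steps listed in this abstract (DTW matching, local distances along the alignment, a nonlinear transform, a thresholded score) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the Euclidean local distance, the transform `exp(-alpha * d)`, the averaging of transformed values, and the default threshold are all illustrative choices, not details confirmed by the application.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(test, template):
    """Return the optimal DTW alignment as a list of (test_idx, template_idx)."""
    n, m = len(test), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def verification_score(test, template, alpha=1.0):
    """Average of transformed local distances along the DTW path.

    exp(-alpha * d) is an assumed transform: it is near 1 for small
    distances and flattens out for large ones, so small local distances
    dominate the score. Higher scores mean a closer match.
    """
    path = dtw_path(test, template)
    transformed = [math.exp(-alpha * euclidean(test[i], template[j])) for i, j in path]
    return sum(transformed) / len(transformed)


def accept(test, template, threshold=0.5, alpha=1.0):
    """Compare the score with a (hypothetical) discriminating threshold."""
    return verification_score(test, template, alpha) >= threshold
```

With this transform, a time-stretched repetition of the enrolled template still scores 1.0, because DTW absorbs the timing difference before the local distances are transformed.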
-
Patent number: 7266496
Abstract: The present invention discloses a complete speech recognition system having a training button and a recognition button, and the whole system uses the application specific integrated circuit (ASIC) architecture for the design, and also uses the modular design to divide the speech processing into 4 modules: system control module, autocorrelation and linear predictive coefficient module, cepstrum module, and DTW recognition module. Each module forms an intellectual product (IP) component by itself. Each IP component can work with various products and application requirements for the design reuse to greatly shorten the time to market.
Type: Grant
Filed: December 24, 2002
Date of Patent: September 4, 2007
Assignee: National Cheng-Kung University
Inventors: Jhing-Fa Wang, Jia-Ching Wang, Tai-Lung Chen, Chin-Chan Chang
-
Publication number: 20070203699
Abstract: A speech recognizer control system, a speech recognizer control method, and a speech recognizer control program make it possible to properly identify a device on the basis of a speech utterance of a user and to control the identified device. The speech recognizer control system includes a speech input unit to which a speech utterance is input from a user, a speech recognizer which recognizes the content of the input speech utterance, a device controller which identifies a device to be controlled among a plurality of devices on the basis of at least the recognized speech utterance content and which controls an operation of the identified device, and a state change storage which stores, as first auxiliary information for identifying a device to be controlled, a state change other than at least a state change caused by a speech utterance from the user among the state changes of operations in the individual devices of the plurality of devices.
Type: Application
Filed: January 24, 2007
Publication date: August 30, 2007
Inventor: Hisayuki Nagashima
-
Patent number: 7231315
Abstract: A distribution goodness-of-fit test device for testing whether measured data matches an estimated probability distribution has a counting section determination unit, a counting unit and a goodness-of-fit test unit. The counting section determination unit determines according to the number of the measured data, widths of counting sections for counting the measured data. The counting unit counts the numbers of data in the respective determined counting sections. Also, the goodness-of-fit test unit performs a goodness-of-fit test based on the numbers of data in the respective counting sections.
Type: Grant
Filed: December 3, 2004
Date of Patent: June 12, 2007
Assignee: Fuji Xerox Co., Ltd.
Inventor: Masakazu Fujimoto
-
Patent number: 7171362
Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
Type: Grant
Filed: August 31, 2001
Date of Patent: January 30, 2007
Assignee: Siemens Aktiengesellschaft
Inventor: Horst-Udo Hain
-
Patent number: 7149686
Abstract: A system and method for eliminating synchronization errors using speech recognition. Using separate audio and visual speech recognition techniques, the inventive system and method identifies visemes, or visual cues which are indicative of articulatory type, in the video content, and identifies phones and their articulatory types in the audio content. Once the two recognition techniques have been applied, the outputs are compared to determine the relative alignment and, if not aligned, a synchronization algorithm is applied to time-adjust one or both of the audio and the visual streams in order to achieve synchronization.
Type: Grant
Filed: June 23, 2000
Date of Patent: December 12, 2006
Assignee: International Business Machines Corporation
Inventors: Paul S. Cohen, John R. Dildine, Edward J. Gleason
-
Patent number: 7143034
Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.
Type: Grant
Filed: October 23, 2002
Date of Patent: November 28, 2006
Assignee: Postech Foundation
Inventors: Hong Jeong, Yong Kim
-
Patent number: 7085717
Abstract: A method includes (i) measuring first distances between (a) vectors belonging to a set of vectors that represent an utterance and (b) vectors belonging to a set of vectors that represent a template, the measuring being done in accordance with a first order of the utterance vectors and a first order of the template vectors, and (ii) measuring second distances between (a) individual vectors belonging to the set of vectors that represent the utterance and (b) individual vectors belonging to the set of vectors that represent the template, the measuring being done in accordance with a second order of the utterance vectors and a second order of the template vectors, and (iii) in which the first template vector order and the second template vector order are different and/or the first utterance vector order and the second utterance vector order are different.
Type: Grant
Filed: May 21, 2002
Date of Patent: August 1, 2006
Assignee: Thinkengine Networks, Inc.
Inventors: Veton Z. Kepuska, Harinath K. Reddy
-
Patent number: 7062435
Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.
Type: Grant
Filed: July 26, 1999
Date of Patent: June 13, 2006
Assignee: Canon Kabushiki Kaisha
Inventors: Eli Tzirkel-Hancock, Robert Alexander Keiller
-
Patent number: 7050975
Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
Type: Grant
Filed: October 9, 2002
Date of Patent: May 23, 2006
Assignee: Microsoft Corporation
Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
-
Patent number: 7024358
Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.
Type: Grant
Filed: March 11, 2004
Date of Patent: April 4, 2006
Assignee: Mindspeed Technologies, Inc.
Inventors: Eyal Shlomot, Yang Gao
-
Patent number: 6996527
Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.
Type: Grant
Filed: July 26, 2001
Date of Patent: February 7, 2006
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
-
Patent number: 6983246
Abstract: Distances are measured between vectors representing speech and a stored reference template. Frequency distributions of the distance measurements are generated by counting how many times a particular reference template resulted in the lowest local distance. The numbers in the counters indicate regions (successive vectors) in a reference template that are good matches for speech input.
Type: Grant
Filed: May 21, 2002
Date of Patent: January 3, 2006
Assignee: Thinkengine Networks, Inc.
Inventors: Veton K. Kepuska, Harinath K. Reddy
-
Patent number: 6978241
Abstract: An analyzer determines frequency and amplitudes of an audio signal represented by sinusoids for transmission to a receiver decoder which includes a synthesizer to reconstruct the audio signal. A pitch detector determines the pitch for transmission to the receiver along with the structure of the spectrum of the speech signal. The structure of the spectrum is often transmitted in the form of LPC parameters. To correct for frequency changes of the periodic component of an audio signal, a frequency change determiner determines a change of the frequency of the periodical component over the analysis period. This change of frequency is transmitted to the decoder for increasing the accuracy of the reconstruction of the audio signal. Further, the frequency change is only used to obtain a more accurate value of the pitch. The frequency change is determined by using a time warper which performs a time transformation such that a time transformed audio signal is obtained with a minimum frequency change.
Type: Grant
Filed: May 22, 2000
Date of Patent: December 20, 2005
Assignee: Koninklijke Philips Electronics, N.V.
Inventors: Robert Johannes Sluijter, Augustus Josephus Elizabeth Maria Janssen
-
Patent number: 6961702
Abstract: The invention relates to a method for generating an adapted reference for automatic speech recognition. In a first step, recognition is performed based on a spoken utterance and a recognition result which corresponds to a currently valid reference is obtained. In a second step, the currently valid reference is adapted in accordance with the utterance in order to create an adapted reference. In a third step, the adapted reference is assessed and it is decided if the adapted reference is used for further recognition.
Type: Grant
Filed: November 6, 2001
Date of Patent: November 1, 2005
Assignee: Telefonaktiebolaget LM Ericsson (publ)
Inventors: Stefan Dobler, Andreas Kiessling, Ralph Schleifer, Raymond Brückner
-
Patent number: 6879955
Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.
Type: Grant
Filed: June 29, 2001
Date of Patent: April 12, 2005
Assignee: Microsoft Corporation
Inventor: Ajit V. Rao
-
Patent number: 6868378
Abstract: The invention relates to a process and a system for voice recognition in a noisy signal. In a preferred embodiment, the system (2) comprises modules for detecting speech (30) and for formulating a noise model (31), a module (40) for quantifying the energy level of the noise and for comparing with preestablished energy spans, a parameterization pathway (5) comprising an optional denoising module (51), with Wiener filter, a module (52) for calculating the spectral energy in Bark windows, a module (50, 530) for applying a configuration of shift values (531), by adding these values to the Bark coefficients, as a function of the quantification (40), so as to modify the parameterization, a module (54) for calculating vectors of parameters, and a block (6) for recognizing shapes, performing the voice recognition by comparison with vectors of parameters prerecorded during a learning phase.
Type: Grant
Filed: November 19, 1999
Date of Patent: March 15, 2005
Assignee: Thomson-CSF Sextant
Inventor: Pierre-Albert Breton
-
Patent number: 6836758
Abstract: A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
Type: Grant
Filed: January 9, 2001
Date of Patent: December 28, 2004
Assignee: Qualcomm Incorporated
Inventors: Ning Bi, Andrew P. DeJaco, Harinath Garudadri, Chienchung Chang, William Yee-Ming Huang, Narendranath Malayath, Suhail Jalil, David Puig Oses, Yingyong Qi
-
Method and array for introducing temporal correlation in hidden Markov models for speech recognition
Patent number: 6832190
Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.
Type: Grant
Filed: November 10, 2000
Date of Patent: December 14, 2004
Assignee: Siemens Aktiengesellschaft
Inventors: Jochen Junkawitsch, Harald Höge
-
Publication number: 20040181405
Abstract: An approach to reduce the quality impact due to lost voiced frame data is presented. The decoder reconstructs the lost frame using the pitch track from a directly prior frame. When the decoder receives the next frame data, it makes a copy of the reconstructed frame data and continuously time warps it and the received frame data so that the peaks of their pitch cycles coincide. Subsequently, the decoder fades out the time-warped reconstructed frame data while fading in the time-warped received frame data. Meanwhile, the endpoint of the received frame data remains fixed to preclude discontinuity with the subsequent frame.
Type: Application
Filed: March 11, 2004
Publication date: September 16, 2004
Applicant: Mindspeed Technologies, Inc.
Inventors: Eyal Shlomot, Yang Gao
-
Patent number: 6735563
Abstract: A method and apparatus for constructing voice templates for a speaker-independent voice recognition system includes segmenting a training utterance to generate time-clustered segments, each segment being represented by a mean. The means for all utterances of a given word are quantized to generate template vectors. Each template vector is compared with testing utterances to generate a comparison result. The comparison is typically a dynamic time warping computation. The training utterances are matched with the template vectors if the comparison result exceeds at least one predefined threshold value, to generate an optimal path result, and the training utterances are partitioned in accordance with the optimal path result. The partitioning is typically a K-means segmentation computation. The partitioned utterances may then be re-quantized and re-compared with the testing utterances until the at least one predefined threshold value is not exceeded.
Type: Grant
Filed: July 13, 2000
Date of Patent: May 11, 2004
Assignee: Qualcomm, Inc.
Inventor: Ning Bi
-
Patent number: 6714910
Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.
Type: Grant
Filed: June 26, 2000
Date of Patent: March 30, 2004
Assignee: Koninklijke Philips Electronics, N.V.
Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
-
Publication number: 20040049387
Abstract: Provided are a dynamic time warping device using speech recognition software, and a speech recognition apparatus using the same. The dynamic time warping device includes memory units for processing characterization vectors of a test pattern and a predetermined reference pattern using a FIFO queue, and a plurality of processing elements serially connected to each other, the plurality of processing elements multiplying a predetermined weight by a difference between the characterization vectors of the test and reference patterns, which are obtained by shifting them in the opposite directions, adding the multiplication result to matching cost values of adjacent nodes, and comparing the addition results to detect the smallest matching cost value. Accordingly, fast speech recognition can be realized by embedding speech recognition software using a dynamic time warping algorithm into hardware.
Type: Application
Filed: October 23, 2002
Publication date: March 11, 2004
Inventors: Hong Jeong, Yong Kim
-
Patent number: 6681207
Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
Type: Grant
Filed: January 12, 2001
Date of Patent: January 20, 2004
Assignee: Qualcomm Incorporated
Inventor: Harinath Garudadri
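To make the Mu-law embodiment concrete, the Python sketch below compresses a list of model values with the standard mu-law companding curve, stores them as small signed integer codes, and expands them again. It assumes the model values are already normalized to the range [-1, 1] and uses mu = 255 with 8-bit codes; those choices and the function names are illustrative assumptions, not details from the patent.

```python
import math

MU = 255.0  # common mu-law constant; assumed here, not specified by the abstract


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def compress_model(values, bits=8):
    """Compress normalized model values to signed integer codes (training side)."""
    levels = 2 ** (bits - 1) - 1
    return [int(round(mu_law_compress(v) * levels)) for v in values]


def expand_model(codes, bits=8):
    """Expand integer codes back to approximate model values (recognition side)."""
    levels = 2 ** (bits - 1) - 1
    return [mu_law_expand(c / levels) for c in codes]
```

The compression is lossy: the round trip recovers small values almost exactly and large values only approximately, which is the trade-off the abstract accepts in exchange for storing more models in the same memory.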
-
Publication number: 20030220790
Abstract: A method includes measuring distances between vectors that represent an utterance and vectors that represent a template, generating information indicative of how well the vectors of the utterance match the vectors of the template, and making a matching decision based on the measured distances and on the generated information.
Type: Application
Filed: May 21, 2002
Publication date: November 27, 2003
Inventor: Veton K. Kepuska
-
Publication number: 20030212555
Abstract: A system and method is used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters are minimized. As a result, smaller storage devices may be used due to the reduction of the size of the storage requirements.
Type: Application
Filed: May 9, 2002
Publication date: November 13, 2003
Applicant: OREGON HEALTH & SCIENCE
Inventor: Jan P.H. van Santen
-
Publication number: 20030200087
Abstract: An improved template spotting technique may be implemented as part of text dependent speaker verification system to authenticate a user of a wireless communication device. This technique may be suitable for use in noisy environments and for wireless communication devices with limited processing power. Endpoints of a test utterance are identified by first computing local distances between test frames and a target template. Accumulated distances are then computed from the local distances. Endpoints of the utterance may be identified when one or more of the accumulated distances is below a predetermined threshold. Once endpoints of a test utterance are identified, a dynamic time warp (DTW) process may be used to determine whether the test utterance matches a training template. One embodiment of the present invention aligns multiple training templates to reduce the probability of failing to verify the identity of a speaker that should have been properly verified.
Type: Application
Filed: April 22, 2002
Publication date: October 23, 2003
Applicant: D.S.P.C. TECHNOLOGIES LTD.
Inventor: Hagai Aronowitz
-
Patent number: 6594630
Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls power delivered to the electrical device. The power control is responsive to at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.
Type: Grant
Filed: November 19, 1999
Date of Patent: July 15, 2003
Assignee: Voice Signal Technologies, Inc.
Inventors: Igor Zlokarnik, Daniel Lawrence Roth
-
Patent number: 6594392
Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
Type: Grant
Filed: May 17, 1999
Date of Patent: July 15, 2003
Assignee: Intel Corporation
Inventor: Umberto Santoni
-
Patent number: 6591237
Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
Type: Grant
Filed: December 13, 1999
Date of Patent: July 8, 2003
Assignee: Intel Corporation
Inventor: Adoram Erell
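The two-part decision rule in this abstract can be written as a short Python sketch. It assumes the scores are DTW distances that have already been computed (lower means a closer match) and that a "significant match" means falling at or below a fixed threshold; the function and parameter names are hypothetical.

```python
def is_keyword(keyword_distance, vocabulary_distances, match_threshold):
    """Decide whether an utterance should be accepted as the keyword.

    keyword_distance: DTW distance of the utterance to the keyword template.
    vocabulary_distances: DTW distances to every vocabulary word template.
    match_threshold: assumed cut-off for a "significant" keyword match.
    """
    significant = keyword_distance <= match_threshold
    # The keyword must beat every vocabulary template, not just most of them.
    beats_vocabulary = all(keyword_distance < d for d in vocabulary_distances)
    return significant and beats_vocabulary
```

For example, `is_keyword(2.0, [5.0, 3.5], 3.0)` accepts the utterance, while `is_keyword(2.0, [1.5, 3.5], 3.0)` rejects it because one vocabulary template matches more closely than the keyword template.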
-
Patent number: 6560575
Abstract: An apparatus is provided for checking the consistency between two training words which can be used in, for example, a speech recognition or verification system. Two training examples are aligned using a dynamic programming alignment process and an average frame score is calculated from the alignment results together with the worst score in a number of consecutive frames. These values are then compared with similar values obtained from training examples which are known to be consistent to determine if the training examples are consistent.
Type: Grant
Filed: September 30, 1999
Date of Patent: May 6, 2003
Assignee: Canon Kabushiki Kaisha
Inventor: Robert Alexander Keiller
-
Patent number: 6542866
Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.
Type: Grant
Filed: September 22, 1999
Date of Patent: April 1, 2003
Assignee: Microsoft Corporation
Inventors: Li Jiang, Xuedong Huang
-
Publication number: 20030004718
Abstract: A signal modification technique facilitates compact voice coding by employing a continuous, rather than piece-wise continuous, time warp contour to modify an original residual signal to match an idealized contour, avoiding edge effects caused by prior art techniques. Warping is executed using a continuous warp contour lacking spatial discontinuities which does not invert or overly distend the positions of adjacent end points in adjacent frames. The linear shift implemented by the warp contour is derived via quadratic approximation or other method, to reduce the complexity of coding to allow for practical and economical implementation. In particular, the algorithm for determining the warp contour uses only a subset of possible contours contained within a sub-range of the range of possible contours. The relative correlation strengths from these contours are modeled as points on a polynomial trace and the optimum warp contour is calculated by maximizing the modeling function.
Type: Application
Filed: June 29, 2001
Publication date: January 2, 2003
Applicant: Microsoft Corporation
Inventor: Ajit V. Rao
-
Publication number: 20020120445
Abstract: An improved representation of transients in audio signals comprises modifying transient locations in such a way that a transient can occur only at a beginning of a sinusoidal segment.
Type: Application
Filed: November 2, 2001
Publication date: August 29, 2002
Inventors: Renat Vafin, Richard Heusdens, Steven Leonardus Josephus Dimphina Elisabeth Van De Par, Willem Bastiaan Kleijn
-
Patent number: 6411734
Abstract: A method is provided for finding a pose of a geometric model of an object within an image of a scene containing the object that includes providing sub-models of the geometric model; and providing found poses of the sub-models in the image. The method also includes selecting sub-models of the geometric model based on pre-fit selection criteria and/or post-fit selection criteria so as to provide selected sub-models of the geometric model. Thus, the invention automatically removes, disqualifies, or disables found sub-model poses when they fail to satisfy certain user-specified requirements. Examples of such requirements include thresholds on deviations between the found sub-model poses and their corresponding expected poses with respect to the final model pose, as well as limits on the sub-model. The remaining, validated sub-models can then be used to re-compute a more accurate fit of the model to the image.
Type: Grant
Filed: December 16, 1998
Date of Patent: June 25, 2002
Assignee: Cognex Corporation
Inventors: Ivan A. Bachelder, Karen B. Sarachik
-
Publication number: 20020065655Abstract: A speech encoding/decoding method using an encoder working at very low bit rates, comprises a learning step enabling the identification of the “representatives” of the speech signal and an encoding step to segment the speech signal and determine the “best representative” associated with each recognized segment. The method comprises at least one step for the encoding/decoding of at least one of the parameters of the prosody of the recognized segments, such as the energy and/or pitch and/or voicing and/or length of the segments, by using a piece of information on prosody pertaining to the “best representatives”. Application to bit rates lower than 400 bits per second.Type: ApplicationFiled: October 18, 2001Publication date: May 30, 2002Applicant: THALESInventors: Philippe Gournay, Yves-Paul Nakache
-
Patent number: 6389392Abstract: A method and apparatus for pattern recognition comprising comparing an input signal representing an unknown pattern with reference data representing each of a plurality of pre-defined patterns, at least one of the pre-defined patterns being represented by at least two instances of reference data. Successive segments of the input signal are compared with successive segments of the reference data and comparison results for each successive segment are generated. For each pre-defined pattern having at least two instances of reference data, the comparison results for the closest matching segment of reference data for each segment of the input signal are recorded to produce a composite comparison result for the said pre-defined pattern. The unknown pattern is then identified on the basis of the comparison results. Thus the effect of a mismatch between the input signal and each instance of the reference data is reduced by selecting the best segments from the instances of reference data for each pre-defined pattern.Type: GrantFiled: December 8, 1998Date of Patent: May 14, 2002Assignee: British Telecommunications public limited companyInventors: Mark Pawlewski, Aladdin Mohammad Ariyaeeinia, Perasiriyan Sivakumaran
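The composite comparison described in this abstract can be sketched as follows. The sketch assumes that the input and every reference instance have already been cut into the same number of aligned segments, and that a lower distance means a closer match; the names `composite_distance` and `identify` and the caller-supplied `distance` function are hypothetical.

```python
def composite_distance(input_segments, reference_instances, distance):
    """Sum, over input segments, the distance to the closest matching
    segment among all reference instances of one pre-defined pattern."""
    total = 0.0
    for i, segment in enumerate(input_segments):
        # Keep only the best-matching instance for this segment.
        total += min(distance(segment, instance[i]) for instance in reference_instances)
    return total


def identify(input_segments, patterns, distance):
    """Return the name of the pattern with the smallest composite distance.

    patterns maps a pattern name to a list of reference instances.
    """
    return min(
        patterns,
        key=lambda name: composite_distance(input_segments, patterns[name], distance),
    )
```

Because the minimum is taken per segment, the composite result can be better than the score against any single reference instance.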
-
Publication number: 20020049590Abstract: In a speech recording arrangement, a sentence to be recorded for speech recognition learning is presented to a user. Speech input by the user for the presented sentence is recognized to obtain a recognized character string. The speech pattern of the recognized character string is compared with the speech pattern of the presented sentence by DP matching to obtain a matching rate therebetween. It is determined whether the matching rate exceeds a predetermined level. If so, the input speech is recorded as learning data. If not, an unmatched portion between the recognized character string and the recording sentence is presented to the user. The user is then instructed to input the speech once again. With this arrangement, speech data with very few improperly pronounced words can be efficiently recorded.Type: ApplicationFiled: October 15, 2001Publication date: April 25, 2002Inventors: Hiroaki Yoshino, Toshiaki Fukada
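A minimal sketch of the matching-rate check follows. It substitutes a plain character-level edit distance computed by dynamic programming for the DP matching of speech patterns mentioned in the abstract, and the normalization and the `threshold` value of 0.9 are assumptions made for illustration.

```python
def matching_rate(recognized, prompt):
    """Return a similarity in [0, 1] from a DP edit distance between two strings."""
    n, m = len(recognized), len(prompt)
    # dist[i][j]: edit distance between recognized[:i] and prompt[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if recognized[i - 1] == prompt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return 1.0 - dist[n][m] / max(n, m, 1)


def should_record(recognized, prompt, threshold=0.9):
    """Record the utterance only if the matching rate exceeds the threshold."""
    return matching_rate(recognized, prompt) > threshold
```

When `should_record` returns False, the arrangement in the abstract would show the user the unmatched portion and ask for the sentence again; that step is not sketched here.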
-
Patent number: 6374222Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index, and by coding the last_time field so that one bit indicates that a slot is available for reuse and a second bit is used for backtrace update.Type: GrantFiled: July 16, 1999Date of Patent: April 16, 2002Assignee: Texas Instruments IncorporatedInventor: Yu-Hung Kao
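The slot-packing idea can be illustrated with a short sketch. The field names `model_index`, `state_index` and `last_time` come from the abstract, but the bit widths (4 bits for the state, a 16-bit `last_time` with two flag bits) and the function names are assumptions chosen only for the example.

```python
STATE_BITS = 4                        # assumed width of state_index
STATE_MASK = (1 << STATE_BITS) - 1

REUSE_FLAG = 1 << 15                  # assumed: slot is available for reuse
BACKTRACE_FLAG = 1 << 14              # assumed: backtrace update marker
TIME_MASK = BACKTRACE_FLAG - 1        # low 14 bits hold the time value


def pack_indices(model_index, state_index):
    """Store model_index and state_index in a single integer."""
    if model_index < 0 or not 0 <= state_index <= STATE_MASK:
        raise ValueError("index out of range for the assumed bit widths")
    return (model_index << STATE_BITS) | state_index


def unpack_indices(packed):
    """Recover (model_index, state_index) from the packed integer."""
    return packed >> STATE_BITS, packed & STATE_MASK


def encode_last_time(time, reusable=False, backtrace=False):
    """Fold the two status bits into the last_time field."""
    if not 0 <= time <= TIME_MASK:
        raise ValueError("time does not fit in the assumed 14 bits")
    return time | (REUSE_FLAG if reusable else 0) | (BACKTRACE_FLAG if backtrace else 0)


def decode_last_time(field):
    """Return (time, reusable, backtrace) from an encoded last_time field."""
    return field & TIME_MASK, bool(field & REUSE_FLAG), bool(field & BACKTRACE_FLAG)
```

Folding flags into spare bits of an existing field trades a few mask operations per access for a smaller slot.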
-
Publication number: 20020032566Abstract: A method for matching an input pattern with a number of stored reference patterns using a dynamic programming matching technique is described. The reference patterns of a reference signal which are at the end of a dynamic programming path for a current input pattern are listed in an active list. The dynamic programming paths are propagated by processing the reference patterns on the active list, and a new active list is generated for the succeeding input pattern. The amount of processing required for each pattern on the active list is reduced by using a pointer which identifies the reference pattern which is the earliest in the sequence of patterns of the current reference signal listed on the new active list during the processing of a preceding dynamic programming path. In a second aspect, a speech recognition interface is used as a control system for a telephony system.Type: ApplicationFiled: July 26, 1999Publication date: March 14, 2002Inventors: ELI TZIRKEL-HANCOCK, ROBERT ALEXANDER KEILLER
-
Patent number: 6321195Abstract: The present invention relates to an automated dialing method for mobile telephones. According to the method, a user enters a telephone number via the keypad of the mobile phone, followed by speaking a corresponding codeword into the handset. The voice signal is encoded using the CODEC and vocoder already on board the mobile phone. The speech is divided into frames and each frame analyzed to ascertain its primary spectral features. These features are stored in memory as associated with the numeric keypad sequence. In recognition mode, the user speaks the codeword into the handset, which is analyzed in a like fashion as in training mode. The primary spectral features are compared with those stored in memory. When a match is declared according to preset criteria, the telephone number is automatically dialed by the mobile phone. Time warping techniques may be applied in the analysis to reduce timing variations.Type: GrantFiled: April 21, 1999Date of Patent: November 20, 2001Assignee: LG Electronics Inc.Inventors: Yun Keun Lee, Jong Seok Lee, Gi Bak Kim, Byoung Soo Lee
-
Patent number: 6301562Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.Type: GrantFiled: April 27, 2000Date of Patent: October 9, 2001Assignee: New Transducers LimitedInventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
-
Publication number: 20010012997Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.Type: ApplicationFiled: December 13, 1999Publication date: August 9, 2001Inventor: ADORAM ERELL
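The two-part acceptance test in this abstract can be written as a small predicate. The sketch assumes the scores are DTW distances, so that lower means a better match, and `max_distance` is a hypothetical tuning threshold standing in for "a significant match".

```python
def is_keyword(keyword_distance, vocabulary_distances, max_distance):
    """Accept the utterance as the keyword only if both conditions hold.

    keyword_distance: DTW distance to the keyword template (lower is better).
    vocabulary_distances: DTW distances to every vocabulary word template.
    max_distance: assumed threshold for a significant match.
    """
    significant = keyword_distance <= max_distance
    beats_vocabulary = all(keyword_distance < d for d in vocabulary_distances)
    return significant and beats_vocabulary
```

Using the vocabulary templates as competitors lets an ordinary vocabulary word that happens to score acceptably against the keyword still be rejected.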
-
Patent number: 6263216Abstract: The apparatus comprises a data memory containing a series of correspondents' call numbers and, for each call number, at least one associated voice print; a sound transducer suitable for picking up the name of a desired correspondent as spoken by the user of the apparatus; voice recognition means suitable for analyzing the correspondent's name as picked up by the transducer and for transforming it into an associated voice print; selective memory addressing means including associative means suitable for finding a voice print in the memory corresponding to the print supplied by the voice recognition means, and in the event of a match, for addressing the corresponding memory position; and means co-operating with the associative means for applying the addressed call number to the radiotelephone circuits.Type: GrantFiled: October 4, 1999Date of Patent: July 17, 2001Assignee: ParrotInventors: Henri Seydoux, Nicolas Besnard
-
Patent number: 6260013Abstract: A speech recognition system has vocabulary word models having for each word model state both a discrete probability distribution function and a continuous probability distribution function. Word models are initially aligned with an input utterance using the discrete probability distribution functions, and an initial matching performed. From well scoring word models, a ranked scoring of those models is generated using the respective continuous probability distribution functions. After each utterance, preselected continuous probability distribution function parameters are discriminatively adjusted to increase the difference in scoring between the best scoring and the next ranking models.Type: GrantFiled: March 14, 1997Date of Patent: July 10, 2001Assignee: Lernout & Hauspie Speech Products N.V.Inventor: Vladimir Sejnoha
-
Patent number: 6236963Abstract: In a speaker normalization processor apparatus, a vocal-tract configuration estimator estimates feature quantities of a vocal-tract configuration showing an anatomical configuration of a vocal tract of each normalization-target speaker, by looking up to a correspondence between vocal-tract configuration parameters and Formant frequencies previously determined based on a vocal tract model of the standard speaker, based on speech waveform data of each normalization-target speaker.Type: GrantFiled: March 16, 1999Date of Patent: May 22, 2001Assignee: ATR Interpreting Telecommunications Research LaboratoriesInventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
-
Patent number: 6226610Abstract: A method and apparatus for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal using a dynamic programming matching technique is described. The second signal patterns which are at the end of a dynamic programming path for a current first signal pattern are listed in an active list 201. The dynamic programming paths are propagated by processing the second signal patterns on the active list, and a new active list 205 is generated for the succeeding input pattern. In order to propagate each path, the system determines how many second signal patterns lie within an overlap region in which a comparison has to be made, and processes each path in dependence upon the determined amount of overlap.Type: GrantFiled: February 8, 1999Date of Patent: May 1, 2001Assignee: Canon Kabushiki KaishaInventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward
-
Patent number: 6195638Abstract: A pattern recognition method of dynamic time warping of two sequences of feature sets onto each other is provided. The method includes the steps of creating a rectangular graph having the two sequences on its two axes, defining a swath of width r, where r is an odd number, centered about a diagonal line connecting the beginning point at the bottom left of the rectangle to the endpoint at the top right of the rectangle and also defining r−1 lines within the swath. The lines defining the swath are parallel to the diagonal line. Each array element k of an r-sized array is associated with a separate one of the r lines within the swath and for each row of the rectangle, the dynamic time warping method recursively generates new path values for each array element k as a function of the previous value of the array element k and of at least one of the current values of the two neighboring array elements k−1 and k+1 of the array element k.Type: GrantFiled: September 2, 1998Date of Patent: February 27, 2001Assignee: Art-Advanced Recognition Technologies Inc.Inventors: Gabriel Ilan, Jacob Goldberger
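A minimal sketch of dynamic time warping restricted to a diagonal swath, using a single r-element array updated row by row, is shown below. It is a generic banded DTW written under stated assumptions, not the patented procedure: both sequences are assumed to have equal length (so the diagonal has slope one), the local distance defaults to an absolute difference of scalars, and the name `banded_dtw` is illustrative.

```python
import math


def banded_dtw(a, b, r, dist=lambda x, y: abs(x - y)):
    """DTW cost between equal-length sequences inside a swath of odd width r."""
    if not a or len(a) != len(b):
        raise ValueError("this sketch assumes non-empty, equal-length sequences")
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd number")
    n = len(a)
    half = (r - 1) // 2
    # cost[k] holds the path value on line k of the swath, i.e. cell (i, i + k - half).
    cost = [math.inf] * r
    for i in range(n):
        for k in range(r):
            j = i + k - half
            if j < 0 or j >= n:
                cost[k] = math.inf          # cell lies outside the rectangle
                continue
            if i == 0 and j == 0:
                best = 0.0
            else:
                diagonal = cost[k]                                # (i-1, j-1): old value of k
                vertical = cost[k + 1] if k + 1 < r else math.inf  # (i-1, j): old value of k+1
                horizontal = cost[k - 1] if k > 0 else math.inf    # (i, j-1): new value of k-1
                best = min(diagonal, vertical, horizontal)
            cost[k] = best + dist(a[i], b[j])
    # The end point (n-1, n-1) lies on the central line of the swath.
    return cost[half]
```

Updating the array in place with k ascending means element k still holds the previous row's value when it is read, while element k−1 already holds the current row's value, so storage stays at r values regardless of sequence length.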