Dynamic Time Warping Patents (Class 704/241)
  • Patent number: 8838441
    Abstract: A representation of an audio signal having a first, a second and a third frame is derived by estimating first warp information for the first and second frames and second warp information for the second and third frames, the warp information describing pitch information of the audio signal. First or second spectral coefficients for first and second frames or second and third frames are derived using first or second warp information and a first or second weighted representation of the first and second frames or second and third frames, the first or second weighted representation derived by applying a first or second window function to the first and second frames or second and third frames, wherein the first or second window function depends on the first or second warp information. The representation of the audio signal is generated including the first and the second spectral coefficients.
    Type: Grant
    Filed: February 14, 2013
    Date of Patent: September 16, 2014
    Assignee: Dolby International AB
    Inventor: Lars Villemoes
  • Patent number: 8775189
    Abstract: A wireless communication device is disclosed that accepts recorded audio data from an end-user. The audio data can be in the form of a command requesting user action. Likewise, the audio data can be converted into a text file. The audio data is reduced to a digital file in a format that is supported by the device hardware, such as a .wav, .mp3, .vnf file, or the like. The digital file is sent via secured or unsecured wireless communication to one or more server computers for further processing. In accordance with an important aspect of the invention, the system evaluates the confidence level of the speech recognition process. If the confidence level is high, the system automatically builds the application command or creates the text file for transmission to the communication device.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: July 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Stephen S. Burns, Mickey W. Kowitz
  • Patent number: 8775179
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
    Type: Grant
    Filed: May 6, 2010
    Date of Patent: July 8, 2014
    Assignee: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Patent number: 8700388
    Abstract: A processed representation of an audio signal having a sequence of frames is generated by sampling the audio signal within first and second frames of the sequence of frames, the second frame following the first frame, the sampling using information on a pitch contour of the first and second frames to derive a first sampled representation. The audio signal is sampled within the second and third frames, the third frame following the second frame in the sequence of frames. The sampling uses the information on the pitch contour of the second frame and information on a pitch contour of the third frame to derive a second sampled representation. A first scaling window is derived for the first sampled representation, and a second scaling window is derived for the second sampled representation, the scaling windows depending on the samplings applied to derive the first sampled representations or the second sampled representation.
    Type: Grant
    Filed: March 23, 2009
    Date of Patent: April 15, 2014
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Bernd Edler, Sascha Disch, Ralf Geiger, Stefan Bayer, Ulrich Kraemer, Guillaume Fuchs, Max Neuendorf, Markus Multrus, Gerald Schuller, Harald Popp
  • Patent number: 8639506
    Abstract: Method, system and computer program for determining the matching between a first and a second sampled signal using an improved Dynamic Time Warping algorithm, called Unbounded DTW. It uses a dynamic programming algorithm to find exact start-end alignment points, unknown a priori; the initial subsampling of the similarity matrix is performed via the definition of optimal synchronization points, allowing a very fast process.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: January 28, 2014
    Assignee: Telefonica, S.A.
    Inventors: Xavier Anguera Miro, Robert Macrae
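The patented Unbounded DTW variant above finds start-end alignment points via synchronization points on a subsampled similarity matrix; the details of that variant are not in the abstract. As background, the standard bounded DTW recurrence on which such methods build can be sketched as follows (function and parameter names are illustrative, not from the patent):

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic time warping between two 1-D sequences.

    Returns the minimal cumulative alignment cost between a and b,
    with both endpoints fixed (unlike the Unbounded variant).
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Because the warping path may repeat elements of either sequence, `dtw([1, 2, 3], [1, 2, 2, 3])` aligns at zero cost.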
  • Patent number: 8626508
    Abstract: Provided are a speech search device and speech search method that perform fuzzy search with very fast search speed and excellent search performance. In addition to fuzzy search, the distance between phoneme discrimination features included in the speech data is calculated to determine similarity to the speech, using both a suffix array and dynamic programming. The search space is narrowed by dividing the search keyword at phoneme boundaries and applying search thresholds to the plurality of divided search keywords; the search is repeated while increasing the thresholds in order, and whether to divide the keyword is determined from the length of the search keyword. This yields speech search whose search speed is very fast and whose search performance is also excellent.
    Type: Grant
    Filed: February 10, 2010
    Date of Patent: January 7, 2014
    Assignee: National University Corporation TOYOHASHI UNIVERSITY OF TECHNOLOGY
    Inventors: Koichi Katsurada, Tsuneo Nitta, Shigeki Teshima
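The progressive-threshold narrowing described above can be sketched as a loop over increasingly loose thresholds; the function, its parameters, and the all-sub-keywords acceptance rule are assumptions for illustration, since the abstract does not specify them:

```python
def narrowed_search(query_parts, candidates, distance, thresholds):
    """Search candidates with progressively looser thresholds.

    query_parts: sub-keywords after keyword division;
    candidates: indexed items to search;
    distance(part, cand): dissimilarity (lower = closer match);
    thresholds: increasing sequence, e.g. [0.1, 0.2, 0.4].
    Returns (threshold, hits) for the tightest threshold that
    yields any hits, or (None, []) if none does.
    """
    for t in thresholds:
        hits = [c for c in candidates
                if all(distance(p, c) <= t for p in query_parts)]
        if hits:
            return t, hits
    return None, []
```

Stopping at the first non-empty hit set is what makes the narrowing fast: most queries resolve at a tight threshold without scanning looser ones.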
  • Patent number: 8583439
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: January 12, 2004
    Date of Patent: November 12, 2013
    Assignee: Verizon Services Corp.
    Inventor: James Mark Kondziela
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8538746
    Abstract: A method of providing a quality measure for an output voice signal generated to reproduce an input voice signal, the method comprising: partitioning the input and output signals into frames; for each frame of the input signal, determining a disturbance relative to each of a plurality of frames of the output signal; determining a subset of the determined disturbances comprising one disturbance for each input frame such that a sum of the disturbances in the subset is a minimum; and using the subset of disturbances to provide the measure of quality.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: September 17, 2013
    Assignee: AudioCodes Ltd.
    Inventors: Ilan D. Shallom, Nitay Shiran, Felix Flomen
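The minimum-sum subset selection in this abstract can be sketched as a DTW-style dynamic program over a disturbance matrix. The monotonic-order constraint (chosen output frames non-decreasing in time) is an assumption for illustration; the abstract does not state how frame order is enforced:

```python
def min_disturbance(D):
    """D[i][j] = disturbance of input frame i vs. output frame j.

    Choose one output frame per input frame, preserving frame order
    (chosen output indices non-decreasing), minimising the total
    disturbance. Runs in O(n*m) via a running prefix minimum.
    """
    n, m = len(D), len(D[0])
    INF = float("inf")
    best = [[INF] * m for _ in range(n)]
    best[0] = list(D[0])
    for i in range(1, n):
        run = INF
        for j in range(m):
            run = min(run, best[i - 1][j])  # best over indices <= j
            best[i][j] = D[i][j] + run
    return min(best[n - 1])
```

For `[[1, 5], [5, 1]]` the optimum pairs both input frames with output frame 1 (cost 1 + 1 = 2), since the order constraint forbids nothing here that would do better.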
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
  • Patent number: 8478587
    Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgement target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the most recent judgment target section.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: July 2, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Kawamura, Ryouichi Kawanishi
  • Patent number: 8412518
    Abstract: A representation of an audio signal having a first frame, a second frame following the first frame, and a third frame following the second frame, is derived by estimating first warp information for the first and the second frame and second warp information for the second frame and the third frame, the warp information describing a pitch information of the audio signal. First spectral coefficients for the first and the second frame are derived using the first warp information and a first weighted representation of the first and the second frame, the first weighted representation derived by applying a first window function to the first and the second frames, wherein the first window function depends on the first warp information.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: April 2, 2013
    Assignee: Dolby International AB
    Inventor: Lars Villemoes
  • Patent number: 8407051
    Abstract: A speech recognizing apparatus includes a speech start instructing section 3 for instructing to start speech recognition; a speech input section 1 for receiving uttered speech and converting to a speech signal; a speech recognizing section 2 for recognizing the speech on the basis of the speech signal; an utterance start time detecting section 4 for detecting duration from the time when the speech start instructing section instructs to the time when the speech input section delivers the speech signal; an utterance timing deciding section 5 for deciding utterance timing indicating whether the utterance start is quick or slow by comparing the duration detected by the utterance start time detecting section with a prescribed threshold; an interaction control section 6 for determining a content, which is to be shown when exhibiting a recognition result of the speech recognizing section, in accordance with the utterance timing decided; a system response generating section 7 for generating a system response on the basis of the determined content.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: March 26, 2013
    Assignee: Mitsubishi Electric Corporation
    Inventors: Yuzuru Inoue, Tadashi Suzuki, Fumitaka Sato, Takayoshi Chikuri
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
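The dual acceptance test in this abstract (confidence score against a threshold, inter-phoneme distance against a mean) can be sketched as below; the tuple layout and the direction of each comparison are assumptions for illustration, not taken from the patent:

```python
def verify_nbest(words, threshold, mean_distance):
    """Accept N-best hypotheses that pass both acceptance criteria.

    words: list of (word, confidence, distance) tuples from the
    N-best recognizer -- layout is illustrative.
    A word is accepted only when its confidence score meets the
    threshold AND its inter-phoneme distance meets the mean.
    """
    accepted = []
    for word, confidence, distance in words:
        if confidence >= threshold and distance >= mean_distance:
            accepted.append(word)
    return accepted
```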
  • Patent number: 8355907
    Abstract: In one embodiment, the present invention comprises a vocoder having at least one input and at least one output, an encoder comprising a filter having at least one input operably connected to the input of the vocoder and at least one output, a decoder comprising a synthesizer having at least one input operably connected to the at least one output of the encoder, and at least one output operably connected to the at least one output of the vocoder, wherein the decoder comprises a memory and the decoder is adapted to execute instructions stored in the memory comprising phase matching and time-warping a speech frame.
    Type: Grant
    Filed: July 27, 2005
    Date of Patent: January 15, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Rohit Kapoor, Serafin Diaz Spindola
  • Patent number: 8332212
    Abstract: A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: December 11, 2012
    Assignee: Cogi, Inc.
    Inventors: Andreas Wittenstein, Mark Cromack
  • Patent number: 8296131
    Abstract: A method of providing a quality measure for an output voice signal generated to reproduce an input voice signal, the method comprising: partitioning the input and output signals into frames; for each frame of the input signal, determining a disturbance relative to each of a plurality of frames of the output signal; determining a subset of the determined disturbances comprising one disturbance for each input frame such that a sum of the disturbances in the subset is a minimum; and using the subset of disturbances to provide the measure of quality.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: October 23, 2012
    Assignee: AudioCodes Ltd.
    Inventors: Ilan D. Shallom, Nitay Shiran
  • Patent number: 8271291
    Abstract: A method for identifying a frame type is disclosed. The present invention includes receiving current frame type information, obtaining previously received previous frame type information, generating frame identification information of a current frame using the current frame type information and the previous frame type information, and identifying the current frame using the frame identification information. And, a method for identifying a frame type is disclosed. The present invention includes receiving a backward type bit corresponding to current frame type information, obtaining a forward type bit corresponding to previous frame type information, generating frame identification information of a current frame by placing the backward type bit at a first position and placing the forward type bit at a second position.
    Type: Grant
    Filed: May 8, 2009
    Date of Patent: September 18, 2012
    Assignee: LG Electronics Inc.
    Inventors: Sang Bae Chon, Lae Hoon Kim, Koeng Mo Sung
  • Patent number: 8258947
    Abstract: Embodiments of the present invention provide a method, system and computer program product for translation verification of source strings for controls in a target application graphical user interface (GUI). In an embodiment of the invention, a method for translation verification of source strings for controls in a target application GUI can include loading a target GUI for an application under test in a functional testing tool executing in memory by a processor of a computing system, retrieving different translated source strings in a target spoken language for respectively different control elements of the target GUI and, determining a score for each one of the translated source strings. Thereafter, an alert can be provided in the functional testing tool for each translated source string corresponding to a determined score failing to meet a threshold value, such as a score that falls below a threshold value, or a score that exceeds a threshold value.
    Type: Grant
    Filed: September 29, 2009
    Date of Patent: September 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Jennifer G. Becker, Kenneth Lee McClamroch, VinodKumar Raghavan, Peter Sun
  • Patent number: 8255216
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based on, at least in part, a weighting of individual characters that comprise the known character sequence.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: August 28, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: Kenneth D. White
  • Patent number: 8239190
    Abstract: A method of communicating speech comprising time-warping a residual low band speech signal to an expanded or compressed version of the residual low band speech signal, time-warping a high band speech signal to an expanded or compressed version of the high band speech signal, and merging the time-warped low band and high band speech signals to give an entire time-warped speech signal. In the low band, the residual low band speech signal is synthesized after time-warping of the residual low band signal while in the high band, an unwarped high band signal is synthesized before time-warping of the high band speech signal. The method may further comprise classifying speech segments and encoding the speech segments. The encoding of the speech segments may be one of code-excited linear prediction, noise-excited linear prediction or 1/8 frame (silence) coding.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: August 7, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Rohit Kapoor, Serafin Diaz Spindola
  • Patent number: 8238351
    Abstract: The process of traversing a K may involve determining a match between a root node and a Result node of a node on the asCase list of a current K node. When learning is off and a match is not found, the procedure may ignore the particle being processed. An alternative solution determines which node on the asCase list is the most likely to be the next node. While the K Engine is traversing and events are being recorded into a K structure, a count field may be added to each K node to contain a record of how many times each K path has been traversed. The count field may be updated according to the processes traversing the K. Typically, the count is incremented only for learning functions. This count field may be used in determining which node may be the most (or least) probable.
    Type: Grant
    Filed: April 4, 2006
    Date of Patent: August 7, 2012
    Assignee: Unisys Corporation
    Inventor: Jane Campbell Mazzagatti
  • Patent number: 8234411
    Abstract: Methods, systems, computer readable media, and apparatuses for providing enhanced content are presented. Data including a first program, a first caption stream associated with the first program, and a second caption stream associated with the first program may be received. The second caption stream may be extracted from the data, and a second program may be encoded with the second caption stream. The first program may be transmitted with the first caption stream including first captions and may include first content configured to be played back at a first speed. In response to receiving an instruction to change play back speed, the second program may be transmitted with the second caption stream. The second program may include the first content configured to be played back at a second speed different from the first speed, and the second caption stream may include second captions different from the first captions.
    Type: Grant
    Filed: September 2, 2010
    Date of Patent: July 31, 2012
    Assignee: Comcast Cable Communications, LLC
    Inventor: Ross Gilson
  • Patent number: 8165870
    Abstract: The method and apparatus utilize a filter to remove a variety of non-dictated words from data based on probability and improve the effectiveness of creating a language model.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: April 24, 2012
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Dong Yu, Julian J. Odell, Milind V. Mahajan, Peter K. L. Mau
  • Patent number: 8140330
    Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: March 20, 2012
    Assignee: Robert Bosch GmbH
    Inventors: Mert Cevik, Fuliang Weng
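The key property named in this abstract, DTW "without knowledge of fixed end-points", corresponds to subsequence DTW: the pattern may begin and end anywhere in the utterance. A minimal sketch of that variant follows (illustrative only, not the Bosch unsupervised segmentation scheme):

```python
def subsequence_dtw(pattern, utterance, dist=lambda x, y: abs(x - y)):
    """DTW with free end-points on the utterance side.

    Returns (cost, end_index): minimal cost of warping the whole
    pattern onto some contiguous stretch of the utterance, and the
    index just past where that stretch ends.
    """
    n, m = len(pattern), len(utterance)
    INF = float("inf")
    prev = [0.0] * (m + 1)          # free start: zero cost along row 0
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        for j in range(1, m + 1):
            d = dist(pattern[i - 1], utterance[j - 1])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    end = min(range(1, m + 1), key=lambda j: prev[j])  # free end
    return prev[end], end
```

Matching the pattern `[1, 2]` inside `[9, 1, 2, 9]` costs 0 and ends just after position 3, even though the surrounding 9s would make full-sequence DTW expensive.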
  • Patent number: 8131550
    Abstract: An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: March 6, 2012
    Assignee: Nokia Corporation
    Inventors: Jani Nurminen, Elina Helander
  • Patent number: 8121833
    Abstract: The exemplary embodiments of the invention provide at least a method and an apparatus to perform operations including dividing a sound signal into a series of successive frames, dividing each frame into a number of subframes, producing a residual signal by filtering the sound signal through a linear prediction analysis filter, locating a last pitch pulse of the sound signal of a previous frame from the residual signal, extracting a pitch pulse prototype of given length around a position of the last pitch pulse of the previous frame using the residual signal, and locating pitch pulses in a current frame using the pitch pulse prototype.
    Type: Grant
    Filed: October 21, 2008
    Date of Patent: February 21, 2012
    Assignee: Nokia Corporation
    Inventors: Mikko Tammi, Milan Jelinek, Claude LaFlamme, Vesa Ruoppila
  • Patent number: 8060365
    Abstract: A dialog processing system which includes a target expression data extraction unit for extracting a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, which are inputted by an utterance pattern input unit and is an utterance structure derived from contents of field-independent general conversations, among a plurality of utterance data which are inputted by an utterance data input unit and obtained by converting contents of a plurality of conversations in one field; a feature extraction unit for retrieving the pattern matching portions, respectively, from the plurality of target expression data extracted, and then for extracting feature quantity common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the feature quantities extracted.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: November 15, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
  • Publication number: 20110224984
    Abstract: Method, system and computer program for determining the matching between a first and a second sampled signal using an improved Dynamic Time Warping algorithm, called Unbounded DTW. It uses a dynamic programming algorithm to find exact start-end alignment points, unknown a priori; the initial subsampling of the similarity matrix is performed via the definition of optimal synchronization points, allowing a very fast process.
    Type: Application
    Filed: December 10, 2010
    Publication date: September 15, 2011
    Applicant: TELEFONICA, S.A.
    Inventors: Xavier Anguera Miro, Robert Macrae
  • Patent number: 7996213
    Abstract: A similarity degree estimation method is performed by two processes. In a first process, an inter-band correlation matrix is created from spectral data of an input voice such that the spectral data are divided into a plurality of discrete bands which are separated from each other with spaces therebetween along a frequency axis, a plurality of envelope components of the spectral data are obtained from the plurality of the discrete bands, and elements of the inter-band correlation matrix are correlation values between the respective envelope components of the input voice. In a second process, a degree of similarity is calculated between a pair of input voices to be compared with each other by using respective inter-band correlation matrices obtained for the pair of the input voices through the inter-band correlation matrix creation process.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: August 9, 2011
    Assignee: Yamaha Corporation
    Inventors: Mikio Tohyama, Michiko Kazama, Satoru Goto, Takehiko Kawahara, Yasuo Yoshioka
  • Patent number: 7962336
    Abstract: The present invention provides a method and apparatus for enrollment and evaluation of speaker authentication. The method for enrollment of speaker authentication, comprising: generating a plurality of acoustic feature vector sequences respectively based on a plurality of utterances of the same content spoken by a speaker; generating a reference template from said plurality of acoustic feature vector sequences; generating a corresponding pseudo-impostor feature vector sequence for each of said plurality of acoustic feature vector sequences based on a code book that includes a plurality of codes and their corresponding feature vectors; and selecting an optimal acoustic feature subset based on said plurality of acoustic feature vector sequences, said reference template and said plurality of pseudo-impostor feature vector sequences.
    Type: Grant
    Filed: September 21, 2007
    Date of Patent: June 14, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Jian Luan, Jie Hao
  • Patent number: 7945446
    Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 17, 2011
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
  • Publication number: 20110066434
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any languages, dialects or accents. Each will be classified into one of m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices will be arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since we only find the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, our recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 17, 2011
    Inventors: Tze-Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 7860718
    Abstract: Provided are an apparatus and method for speech segment detection, and a system for speech recognition. The apparatus is equipped with a sound receiver and an image receiver and includes: a lip motion signal detector for detecting a motion region from image frames output from the image receiver, applying lip motion image feature information to the detected motion region, and detecting a lip motion signal; and a speech segment detector for detecting a speech segment using sound frames output from the sound receiver and the lip motion signal detected from the lip motion signal detector. Since lip motion image information is checked in a speech segment detection process, it is possible to prevent dynamic noise from being misrecognized as speech.
    Type: Grant
    Filed: December 4, 2006
    Date of Patent: December 28, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Soo Jong Lee, Sang Hun Kim, Young Jik Lee, Eung Kyeu Kim
  • Patent number: 7860714
    Abstract: The present invention is a detection system of a segment including specific sound signal which detects a segment in a stored sound signal similar to a reference sound signal, including: a reference signal spectrogram division portion which divides a reference signal spectrogram into spectrograms of small-regions; a small-region reference signal spectrogram coding portion which encodes the small-region reference signal spectrogram to a reference signal small-region code; a small-region stored signal spectrogram coding portion which encodes a small-region stored signal spectrogram to a stored signal small-region code; a similar small-region spectrogram detection portion which detects a small-region spectrogram similar to the small-region reference signal spectrograms based on a degree of similarity of a code; and a degree of segment similarity calculation portion which uses a degree of small-region similarity and calculates a degree of similarity between the segment of the stored signal and the reference signal.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: December 28, 2010
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Hidehisa Nagano, Takayuki Kurozumi, Kunio Kashino
  • Patent number: 7809561
    Abstract: The present invention provides a method and apparatus for verification of speaker authentication. A method for verification of speaker authentication, comprising: inputting an utterance containing a password that is spoken by a speaker; extracting an acoustic feature vector sequence from said inputted utterance; DTW-matching said extracted acoustic feature vector sequence and a speaker template enrolled by an enrolled speaker; calculating each of a plurality of local distances between said DTW-matched acoustic feature vector sequence and said speaker template; nonlinear-transforming said each local distance calculated to give more weights on small local distances; calculating a DTW-matching score based on said plurality of local distances nonlinear-transformed; and comparing said matching score with a predefined discriminating threshold to determine whether said inputted utterance is an utterance containing a password spoken by the enrolled speaker.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: October 5, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Jian Luan, Jie Hao
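A minimal Python sketch of this kind of DTW-based verification scoring, assuming a Euclidean local distance and an exponential transform exp(-alpha*d) as the nonlinearity that emphasizes small local distances (both choices are illustrative; the patent does not fix them):

```python
import numpy as np

def dtw_path(a, b):
    """Classic DTW between feature sequences a (n,d) and b (m,d).
    Returns the optimal warping path as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def verification_score(test, template, alpha=1.0):
    """Mean of nonlinearly transformed local distances along the DTW path;
    exp(-alpha*d) maps small distances near 1 and large ones near 0."""
    path = dtw_path(test, template)
    local = [np.linalg.norm(test[i] - template[j]) for i, j in path]
    return float(np.mean([np.exp(-alpha * d) for d in local]))
```

The resulting score lies in (0, 1]; an utterance identical to the template scores 1.0, so it can be compared directly against a fixed discriminating threshold.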
  • Patent number: 7774202
    Abstract: A speech activated control system for controlling aerial vehicle components, program product, and associated methods are provided. The system can include a host processor adapted to develop speech recognition models and to provide speech command recognition. The host processor can be positioned in communication with a database for storing and retrieving speech recognition models. The system can include an avionic computer in communication with the host processor and adapted to provide command function management, a display and control processor in communication with the avionic computer adapted to provide a user interface between a user and the avionic computer, and a data interface positioned in communication with the avionic computer and the host processor provided to divorce speech command recognition functionality from vehicle or aircraft-related speech-command functionality.
    Type: Grant
    Filed: June 12, 2006
    Date of Patent: August 10, 2010
    Assignee: Lockheed Martin Corporation
    Inventors: Richard P. Spengler, Jon C. Russo, Gregory W. Barnett, Kermit L. Armbruster
  • Patent number: 7769566
    Abstract: A method is provided for utilizing the human perceptual system by providing a spectrum of event log data for listening. Event log data is received. Events of the event log data are mapped to an x-axis of a spectrum based on time, where each event of the event log data corresponds to a time slot on the x-axis. Categories for the events are mapped to a y-axis of the spectrum, where the y-axis is a frequency axis, and where each of the categories respectively corresponds to a frequency of the multiple frequencies. The significance of the events of the event log data is mapped to a z-axis of the spectrum, where the z-axis is a magnitude axis. The time from the x-axis, the multiple frequencies from the y-axis, and the magnitude from the z-axis of the spectrum are translated into sound.
    Type: Grant
    Filed: March 4, 2008
    Date of Patent: August 3, 2010
    Assignee: International Business Machines Corporation
    Inventor: Xiaoming Zhang
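The x/y/z mapping described above can be sketched in a few lines of Python: each event is a (time slot, category, significance) triple, each category gets a fixed frequency, and each event contributes a sine of that frequency, scaled by its significance, in its time slot. The frequency assignment, sample rate, and slot duration below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def sonify(events, categories, sr=8000, slot_dur=0.25):
    """events: list of (time_slot, category, significance) triples.
    Returns a mono audio buffer where time slots are laid out along the
    x-axis, category frequencies along the y-axis, and significance as
    the z-axis magnitude."""
    # Hypothetical frequency ladder: one fixed tone per category.
    freqs = {c: 220.0 * 2 ** (k / 2) for k, c in enumerate(categories)}
    n_slots = max(t for t, _, _ in events) + 1
    n = int(slot_dur * sr)                # samples per time slot
    out = np.zeros(n_slots * n)
    t = np.arange(n) / sr
    for slot, cat, mag in events:
        # Significance scales the amplitude of the category's tone.
        out[slot * n:(slot + 1) * n] += mag * np.sin(2 * np.pi * freqs[cat] * t)
    return out
```

A slot with no events stays silent, so gaps in the log are audible as gaps in the sound.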
  • Patent number: 7739111
    Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length of lengths of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: June 15, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kazue Kaneko
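The three steps in the abstract (DP matching, detecting the maximum run of consecutive correct answers in the transition, computing a similarity from that run) can be sketched as follows. The edit-distance recursion and the normalization by the shorter sequence's length are assumptions for illustration; the patent does not commit to either:

```python
def dp_align(long_seq, short_seq):
    """Edit-distance DP matching between two symbol sequences.
    Returns the backtracked transition as a list of booleans, one per
    alignment step, True where the symbols match."""
    n, m = len(long_seq), len(short_seq)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (long_seq[i - 1] != short_seq[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrack the matrix of the DP matching transition.
    steps, i, j = [], n, m
    while i > 0 and j > 0:
        if cost[i][j] == cost[i - 1][j - 1] + (long_seq[i - 1] != short_seq[j - 1]):
            steps.append(long_seq[i - 1] == short_seq[j - 1])
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            steps.append(False)
            i -= 1
        else:
            steps.append(False)
            j -= 1
    return steps[::-1]

def similarity(long_seq, short_seq):
    """Similarity = longest run of consecutive matches, normalized by
    the shorter sequence's length (normalization is an assumption)."""
    best = run = 0
    for ok in dp_align(long_seq, short_seq):
        run = run + 1 if ok else 0
        best = max(best, run)
    return best / len(short_seq)
```

Counting the longest consecutive run, rather than the total number of matches, rewards a contiguous hit of the short sequence inside the long one over the same symbols matched in scattered positions.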
  • Patent number: 7733988
    Abstract: A plurality of decoding metrics for a current frame may be generated based on a correlation set for a current frame and a correlation set for at least one previous frame. Whether a signal is present on a control channel may then be determined based on the generated decoding metrics.
    Type: Grant
    Filed: October 28, 2005
    Date of Patent: June 8, 2010
    Assignee: Alcatel-Lucent USA Inc.
    Inventors: Rainer Bachl, Francis Dominique, Hongwei Kong, Walid E. Nabhane
  • Patent number: 7720677
    Abstract: A spectral representation of an audio signal having consecutive audio frames can be derived more efficiently, when a common time warp is estimated for any two neighboring frames, such that a following block transform can additionally use the warp information. Thus, window functions required for successful application of an overlap and add procedure during reconstruction can be derived and applied, the window functions already anticipating the re-sampling of the signal due to the time warping. Therefore, the increased efficiency of block-based transform coding of time-warped signals can be used without introducing audible discontinuities.
    Type: Grant
    Filed: August 11, 2006
    Date of Patent: May 18, 2010
    Assignee: Coding Technologies AB
    Inventor: Lars Villemoes
  • Patent number: 7707033
    Abstract: Training a consumer-oriented application device is based on a plurality of user-presented speech items. A progress measure is reported regarding a training status reached for a particular user person. In particular, the training status is visually represented by an animated character creature which has a plurality of training status representative maturity statuses that are each associated to a corresponding training level.
    Type: Grant
    Filed: June 18, 2002
    Date of Patent: April 27, 2010
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Lucas Jacobus Franciscus Geurts
  • Patent number: 7698136
    Abstract: The present invention is directed to a computer implemented method and apparatus for flexibly recognizing meaningful data items within an arbitrary user utterance. According to one example embodiment of the invention, a set of one or more key phrases and a set of one or more filler phrases are defined, probabilities are assigned to the key phrases and/or the filler phrases, and the user utterance is evaluated against the set of key phrases and the set of filler phrases using the probabilities.
    Type: Grant
    Filed: January 28, 2003
    Date of Patent: April 13, 2010
    Assignee: Voxify, Inc.
    Inventors: Patrick T. M. Nguyen, Adeeb W. M. Shana'a, Amit V. Desai
  • Patent number: 7680651
    Abstract: In accordance with the exemplary embodiments of the invention there is disclosed at least a method and apparatus for determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. Each divided frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame.
    Type: Grant
    Filed: December 13, 2002
    Date of Patent: March 16, 2010
    Assignee: Nokia Corporation
    Inventors: Mikko Tammi, Milan Jelinek, Claude LaFlamme, Vesa Ruoppila
  • Publication number: 20090313016
    Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 17, 2009
    Applicant: ROBERT BOSCH GMBH
    Inventors: Mert Cevik, Fuliang Weng
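Comparing a reference pattern against an utterance "from the beginning of the utterance to the end" without fixed end-points is the classic subsequence-DTW setup: zeroing the first row lets the match start at any utterance frame, and the best end-point is read off the last row. The sketch below uses that common open-end formulation with a Euclidean local distance; the patent's exact recursion may differ:

```python
import numpy as np

def subsequence_dtw(ref, utt):
    """Match reference pattern ref anywhere inside utt (both (n,d) arrays).
    Returns (cost of the best match, utterance frame index where it ends)."""
    n, m = len(ref), len(utt)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0                        # free start anywhere in the utterance
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - utt[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    end = int(np.argmin(D[n, 1:]))       # free end: best-matching end frame
    return D[n, 1 + end], end
```

Running this once per reference pattern generated from the correction utterance yields, for each pattern, both a match cost and the end-point of the best-matching part of the original utterance.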
  • Patent number: 7509257
    Abstract: A method and apparatus for adapting reference templates is provided. The method includes adapting one or more reference templates using a stored test utterance by replacing data within the reference templates with a weighted interpolation of that data and corresponding data within the test utterance.
    Type: Grant
    Filed: December 24, 2002
    Date of Patent: March 24, 2009
    Assignee: Marvell International Ltd.
    Inventor: Hagai Aronowitz
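The weighted interpolation itself reduces to one line per frame. The sketch below assumes the test utterance has already been aligned to the template (e.g., by DTW) so that frames correspond one-to-one; the 0.2 adaptation weight is an illustrative value, not taken from the patent:

```python
import numpy as np

def adapt_template(template, test, weight=0.2):
    """Replace each template frame with a weighted interpolation of that
    frame and the corresponding frame of the (pre-aligned) test utterance."""
    template = np.asarray(template, dtype=float)
    test = np.asarray(test, dtype=float)
    assert template.shape == test.shape, "frames must already correspond"
    return (1.0 - weight) * template + weight * test
```

A small weight nudges the template toward each new utterance gradually, so one atypical utterance cannot overwrite the enrolled model.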
  • Patent number: 7398208
    Abstract: A method models voice units and produces reference segments for modeling voice units. The reference segments describe voice modules by characteristic vectors, the characteristic vectors being stored in the order in which they are found in a training voice signal. Alternative characteristic vectors are associated with each characteristic vector. The reference segments for describing the voice modules are combined during the modeling of larger voice units. During recognition, the respectively best-adapted characteristic vector alternatives are used to determine the distance between a test utterance and the larger voice units.
    Type: Grant
    Filed: October 1, 2002
    Date of Patent: July 8, 2008
    Assignee: Siemens Aktiengesellschaft
    Inventor: Bernhard Kämmerer
  • Publication number: 20080162134
    Abstract: The present invention provides a speech processing apparatus arranged for the input or output of a speech data signal and including a function generating means arranged for producing a representation of a vocal-tract potential function representative of a speech source. As an example, a speaker identification process can comprise means to capture an incoming voice signal, for example from a microphone or telephone line; means to process the signal electronically to generate a time-varying series of binary vocal-tract potentials and associated non-vowel binary parameters; means to refine the signal to remove the speaker-independent speech components; and means to compare the residual signal with a database of such residual features of known individuals.
    Type: Application
    Filed: January 7, 2008
    Publication date: July 3, 2008
    Applicant: King's College London
    Inventors: Barbara Janey Forbes, Edward Roy Pike
  • Patent number: 7394833
    Abstract: A device is disclosed that makes packetized and encoded speech data audible to a listener, as is a method for operating the device. The device includes a unit for generating a synchronization request for reducing an amount of synchronization delay, and further includes a speech decoder that is responsive to the synchronization delay adjustment request for executing a time-warping operation for one of lengthening or shortening a duration of a speech frame. In one embodiment the speech decoder comprises a code excited linear prediction (CELP) speech decoder, and the CELP decoder time-warping operation is applied to a reconstructed excitation signal u(k) to derive a time-warped reconstructed signal u_w(k). The time-warped reconstructed signal u_w(k) is input to a Linear Predictor (LP) synthesis filter to derive a CELP decoder time-warped output signal ŷ_w(k).
    Type: Grant
    Filed: February 11, 2003
    Date of Patent: July 1, 2008
    Assignee: Nokia Corporation
    Inventors: Ari Heikkinen, Ari Lakaniemi