Similarity Patents (Class 704/239)
  • Patent number: 7551834
    Abstract: The object of the present invention is to provide a high-speed signal search method, device, and a recording medium for the same that can obtain detection results equivalent to precisely moving a window over the entire region of the input signal even when there is not precise movement of a window over the entire signal.
    Type: Grant
    Filed: October 22, 2004
    Date of Patent: June 23, 2009
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Kunio Kashino, Hiroshi Murase, Gavin Smith
  • Patent number: 7546236
    Abstract: This invention identifies anomalies in a data stream, without prior training, by measuring the difficulty in finding similarities between neighborhoods in the ordered sequence of elements. Data elements in an area that is similar to much of the rest of the scene score low mismatches. On the other hand a region that possesses many dissimilarities with other parts of the ordered sequence will attract a high score of mismatches. The invention makes use of a trial and error process to find dissimilarities between parts of the data stream and does not require prior knowledge of the nature of the anomalies that may be present. The method avoids the use of processing dependencies between data elements and is capable of a straightforward parallel implementation for each data element. The invention is of application in searching for anomalous patterns in data streams, which include audio signals, health screening and geographical data. A method of error correction is also described.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: June 9, 2009
    Assignee: British Telecommunications public limited company
    Inventor: Frederick W M Stentiford
  • Patent number: 7539616
    Abstract: Speaker authentication is performed by determining a similarity score for a test utterance and a stored training utterance. Computing the similarity score involves determining the sum of a group of functions, where each function includes the product of a posterior probability of a mixture component and a difference between an adapted mean and a background mean. The adapted mean is formed based on the background mean and the test utterance. The speech content provided by the speaker for authentication can be text-independent (i.e., any content they want to say) or text-dependent (i.e., a particular phrase used for training).
    Type: Grant
    Filed: February 20, 2006
    Date of Patent: May 26, 2009
    Assignee: Microsoft Corporation
    Inventors: Zhengyou Zhang, Ming Liu
  • Publication number: 20090112586
    Abstract: Systems, methods and computer-readable media associated with using a divergence metric to evaluate user simulations in a spoken dialog system. The method employs user simulations of a spoken dialog system and includes aggregating a first set of one or more scores from a real user dialog, aggregating a second set of one or more scores from a simulated user dialog associated with a user model, determining a similarity of distributions associated with each of the first set and the second set, wherein the similarity is determined using a divergence metric that does not require any assumptions regarding a shape of the distributions. It is preferable to use a Cramér-von Mises divergence.
    Type: Application
    Filed: November 1, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Lab. Inc.
    Inventor: Jason WILLIAMS
  • Patent number: 7475013
    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: January 6, 2009
    Assignee: Honda Motor Co., Ltd.
    Inventor: Ryan Rifkin
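    The nearest-neighbor search and Parzen-window step described in the abstract above can be sketched roughly as follows. This is an illustration only, not the patented method: a brute-force sort stands in for the kd-tree, the kernel is assumed to be Gaussian, and the names `identify`, `k`, and `bandwidth` are hypothetical.

```python
import math
from collections import defaultdict

def identify(query, enrolled, k=3, bandwidth=1.0):
    """Return a normalized kernel score per speaker for one query vector.

    enrolled is a list of (speaker_id, feature_vector) pairs.
    Assumption: a linear scan replaces the kd-tree lookup, and a Gaussian
    kernel plays the role of the Parzen window.
    """
    # k nearest enrolled points by Euclidean distance (math.dist: Python 3.8+).
    neighbours = sorted(enrolled, key=lambda item: math.dist(query, item[1]))[:k]
    density = defaultdict(float)
    for speaker, vector in neighbours:
        distance = math.dist(query, vector)
        density[speaker] += math.exp(-0.5 * (distance / bandwidth) ** 2)
    total = sum(density.values())
    if total == 0.0:
        return {}  # every kernel value underflowed; no usable estimate
    return {speaker: value / total for speaker, value in density.items()}

# Toy enrollment data (made up for the example).
enrolled = [
    ("alice", (0.0, 0.0)), ("alice", (0.1, 0.2)),
    ("bob", (5.0, 5.0)), ("bob", (5.2, 4.9)),
]
scores = identify((0.05, 0.1), enrolled)
best = max(scores, key=scores.get)
```

    A real system would replace the sort with an approximate nearest-neighbor query, which is the point of the tree structure the abstract mentions.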
  • Patent number: 7454339
    Abstract: A method for discriminatively training acoustic models is provided for automated speaker verification (SV) and speech (or utterance) verification (UV) systems.
    Type: Grant
    Filed: December 20, 2005
    Date of Patent: November 18, 2008
    Assignee: Panasonic Corporation
    Inventors: Chaojun Liu, David Kryze, Luca Rigazio
  • Patent number: 7403891
    Abstract: The present invention relates to an apparatus and method for recognizing biological named entities in biological literature based on the Unified Medical Language System (UMLS). The apparatus and the method receive the Metathesaurus from the UMLS; construct a concept name database, a single name database and a category keyterm database, which are language resources used to recognize a named entity; receive each concept name stored in the concept name database; extract features of each of the concept names by using data stored in the single name database and the category keyterm database; construct a rule database by creating rules used to recognize the named entity and filtering the rules by using the extracted features; receive biological literature; extract nouns and noun phrases that are candidate named entities; apply the rules stored in the rule database to the nouns and the noun phrases; and recognize the named entities.
    Type: Grant
    Filed: February 13, 2004
    Date of Patent: July 22, 2008
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Soo Jun Park, Tae Hyun Kim, Hyun Sook Lee, Hyun Chul Jang, Seon Hee Park
  • Publication number: 20080162133
    Abstract: A method of identifying incidents using mobile devices can include receiving a communication from each of a plurality of mobile devices. Each communication can specify information about a detected sound. Spatial and temporal information can be identified from each communication as well as an indication of a sound signature matching the detected sound. The communications can be compared with a policy specifying spatial and temporal requirements relating to the sound signature indicated by the communications. A notification can be selectively sent according to the comparison.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Inventors: Christopher C. Couper, Neil A. Katz, Victor S. Moore
  • Publication number: 20080133234
    Abstract: A dividing module divides a voice signal into voice frames. A likelihood value generation module compares each of the voice frames with a first voice model and a second voice model to generate first likelihood values and second likelihood values. A decision module decides a window size according to the first likelihood values and the second likelihood values. An accumulation module accumulates the first likelihood values and the second likelihood values inside the window to generate a first sum and a second sum. A determination module determines whether the voice signal is abnormal according to the first sum and the second sum. When the voice environment changes substantially, the decision module can dynamically adapt the window size to decrease the false detection rate and speed up the determination of an abnormal voice.
    Type: Application
    Filed: February 27, 2007
    Publication date: June 5, 2008
    Inventor: Ing-Jr Ding
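    The accumulate-and-compare step in the abstract above might look like the sketch below. The abstract does not say how the window size is chosen or which model represents normal speech, so the window is passed in as a parameter and the first model is assumed to be the normal one; `is_abnormal` is a hypothetical name.

```python
def is_abnormal(first_ll, second_ll, window):
    """Compare per-frame log-likelihoods summed over the most recent frames.

    Assumptions: first_ll comes from a normal-voice model, second_ll from
    an abnormal-voice model, and window is decided elsewhere.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(first_ll) != len(second_ll):
        raise ValueError("both models must score the same frames")
    first_sum = sum(first_ll[-window:])    # accumulated first likelihoods
    second_sum = sum(second_ll[-window:])  # accumulated second likelihoods
    return second_sum > first_sum

# Made-up log-likelihoods for four frames.
normal_scores = [-1.0, -1.0, -5.0, -6.0]
abnormal_scores = [-4.0, -4.0, -1.0, -1.0]
recent_verdict = is_abnormal(normal_scores, abnormal_scores, window=2)
```

    A shorter window reacts faster to a change in the signal, while a longer one smooths out single noisy frames, which is the trade-off the adaptive window is meant to manage.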
  • Publication number: 20080103770
    Abstract: One embodiment of the present method and apparatus for identifying a conversing pair of users of a two-way speech medium includes receiving a plurality of binary voice activity streams, where the plurality of voice activity streams includes a first voice activity stream associated with a first user, and pairing the first voice activity stream with a second voice activity stream associated with a second user, in accordance with a complementary similarity between the first voice activity stream and the second voice activity stream.
    Type: Application
    Filed: October 31, 2006
    Publication date: May 1, 2008
    Inventors: Lisa Amini, Eric Bouillet, Olivier Verscheure, Michail Vlachos
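    One way to picture a complementary similarity between two binary voice activity streams is sketched below. The abstract does not define the measure, so this example assumes it is the fraction of frames in which exactly one of the two parties is speaking; the function names are hypothetical.

```python
def complementary_similarity(stream_a, stream_b):
    """Fraction of frames where exactly one of the two streams is active.

    Assumption: turn-taking speakers rarely talk at the same time, so a
    conversing pair should score close to 1.0.
    """
    if not stream_a or len(stream_a) != len(stream_b):
        raise ValueError("streams must be non-empty and equally long")
    differing = sum(1 for a, b in zip(stream_a, stream_b) if a != b)
    return differing / len(stream_a)

def best_partner(first, others):
    """Name of the stream in others that best complements first."""
    return max(others, key=lambda name: complementary_similarity(first, others[name]))

# Toy streams: 1 means speech was detected in that frame.
first_user = [1, 1, 0, 0, 1, 0]
others = {
    "second_user": [0, 0, 1, 1, 0, 1],
    "third_user": [1, 1, 0, 0, 1, 1],
}
partner = best_partner(first_user, others)
```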
  • Patent number: 7356466
    Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of an input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, and stores the L extracted outputs based on an order of magnitude of the extracted L outputs.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: April 8, 2008
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
  • Patent number: 7349576
    Abstract: A method for recognition of a handwritten character comprises the steps of determining a plurality of position features defining the handwritten character, and comparing the handwritten character to reference characters stored in a database in order to find the closest matching reference character. The step of comparing comprises the steps of computing a difference between one of the plurality of position features of the handwritten character and a corresponding position feature of one of the reference characters, determining, by lookup in a predefined table, a distance measure based on the computed difference and determining a distance measure for each of the plurality of position features of the handwritten character, and computing a cost function based on the determined distance measures. A device and a computer program for implementing the method are also described.
    Type: Grant
    Filed: January 11, 2002
    Date of Patent: March 25, 2008
    Assignee: Zi Decuma AB
    Inventor: Anders Holtsberg
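    The table-lookup comparison in the abstract above can be illustrated with a small sketch. It assumes integer position features, a table indexed by absolute difference (clamped to the last entry), and a cost function that is a plain sum; none of these details are given in the abstract, and the names are hypothetical.

```python
def match_cost(features, reference, table):
    """Sum of table-looked-up distances over corresponding position features."""
    if len(features) != len(reference):
        raise ValueError("feature counts must match")
    last = len(table) - 1
    # Assumption: the table is indexed by absolute difference and large
    # differences saturate at the final table entry.
    return sum(table[min(abs(f - r), last)] for f, r in zip(features, reference))

def closest_reference(features, references, table):
    """Name of the reference character with the lowest cost."""
    return min(references, key=lambda name: match_cost(features, references[name], table))

# Made-up table and reference characters.
distance_table = [0, 1, 4, 9]
references = {"a": (3, 6, 7), "b": (0, 0, 0)}
winner = closest_reference((3, 5, 7), references, distance_table)
```

    Precomputing the distances in a table avoids arithmetic such as squaring at match time, which is the likely motivation for the lookup.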
  • Patent number: 7310600
    Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: December 18, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
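    The dynamic programming match described in the abstract above resembles a weighted edit distance. The sketch below frames the confusion, insertion and deletion scores as costs (lower is better) with made-up values; the patent obtains its scores from training and may use a different formulation. `align_cost` and its parameters are hypothetical names.

```python
def align_cost(seq_a, seq_b, confusion, insertion_cost=1.0,
               deletion_cost=1.0, default_confusion=1.0):
    """Minimum total cost of aligning phoneme sequence seq_a with seq_b.

    confusion maps (phoneme_a, phoneme_b) to a substitution cost.
    Assumption: identical phonemes cost nothing, and insertions and
    deletions have one uniform cost each.
    """
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + deletion_cost
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + insertion_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            substitution = 0.0 if a == b else confusion.get((a, b), default_confusion)
            cost[i][j] = min(
                cost[i - 1][j - 1] + substitution,  # match or confusion
                cost[i - 1][j] + deletion_cost,     # phoneme missing from seq_b
                cost[i][j - 1] + insertion_cost,    # extra phoneme in seq_b
            )
    return cost[n][m]

# Toy confusion table: "t" is easily mistaken for "d".
confusion = {("t", "d"): 0.2}
near_match = align_cost(["k", "ae", "t"], ["k", "ae", "d"], confusion)
```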
  • Patent number: 7299187
    Abstract: When a user issued voice command does not match grammars registered in advance, the voice command is identified as a sentence (step S305). This sentence is compared with the registered grammars to calculate a similarity (step S307). When the similarity is higher than a first threshold value (TH1), the voice command is executed (step S315). When the similarity is equal to or lower than the first threshold value (TH1) and higher than a second threshold value (TH2), command choices are displayed for the user and the user is permitted to select a command to be executed (step S319). When the similarity is equal to or lower than the second threshold value (TH2), the command is not executed (step S321). Furthermore, once a command has been executed it is added as a grammar, so that it can be identified when next it is used.
    Type: Grant
    Filed: February 10, 2003
    Date of Patent: November 20, 2007
    Assignee: International Business Machines Corporation
    Inventors: Yoshinori Tahara, Daisuke Tomoda, Kikuo Mitsubo, Yoshinori Atake
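    The two-threshold decision in the abstract above (steps S315, S319 and S321) can be written out as a small sketch. The threshold values and the names `Action` and `decide` are illustrative assumptions; only the comparison logic comes from the abstract.

```python
from enum import Enum

class Action(Enum):
    EXECUTE = "execute"              # step S315
    OFFER_CHOICES = "offer_choices"  # step S319
    REJECT = "reject"                # step S321

def decide(similarity, th1, th2):
    """Map a similarity score onto one of the three outcomes."""
    if th2 > th1:
        raise ValueError("TH2 must not exceed TH1")
    if similarity > th1:
        return Action.EXECUTE
    if similarity > th2:
        return Action.OFFER_CHOICES
    return Action.REJECT

# Made-up thresholds for the example.
outcome = decide(0.65, th1=0.8, th2=0.5)
```

    A score exactly equal to TH1 falls into the middle band, and a score exactly equal to TH2 is rejected, matching the "equal to or lower than" wording of the abstract.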
  • Patent number: 7284012
    Abstract: A system selects a plurality of attributes common to objects being compared and defines at least one value space for each selected attribute. Each selected attribute can have a value space that is different from a value space of the same attribute associated with a different object. The system defines an ordering system for each value space including selecting whether the value space consists of non-ordered values, partially ordered values, or fully ordered values. An objective function is defined for each ordering system and the objective function maps a pair of values of a value space to a number value. The system normalizes each objective function and defines a mapping from the plurality of objective functions to a first general objective function.
    Type: Grant
    Filed: January 24, 2003
    Date of Patent: October 16, 2007
    Assignee: International Business Machines Corporation
    Inventors: Dikran S. Meliksetian, Nianjun Zhou
  • Publication number: 20070225979
    Abstract: A similarity degree estimation method is performed by two processes. In a first process, an inter-band correlation matrix is created from spectral data of an input voice such that the spectral data are divided into a plurality of discrete bands which are separated from each other with spaces therebetween along a frequency axis, a plurality of envelope components of the spectral data are obtained from the plurality of the discrete bands, and elements of the inter-band correlation matrix are correlation values between the respective envelope components of the input voice. In a second process, a degree of similarity is calculated between a pair of input voices to be compared with each other by using respective inter-band correlation matrices obtained for the pair of the input voices through the inter-band correlation matrix creation process.
    Type: Application
    Filed: March 20, 2007
    Publication date: September 27, 2007
    Inventors: Mikio Tohyama, Michiko Kazama, Satoru Goto, Takehiko Kawahara, Yasuo Yoshioka
  • Patent number: 7246061
    Abstract: A method for the voice-operated identification of the user of a telecommunications line in a telecommunications network is provided in the course of a dialog with a voice-operated dialog system. Utterances spoken by a caller from a group of callers limited to one telecommunications line are used during a human-to-human and/or human-to-machine dialog to apply a reference pattern for the caller. For each reference pattern, a user identifier is stored which is activated once the caller is identified and, together with the CLI and/or ANI identifier of the telecommunications line, is made available to a server having a voice-controlled dialog system. On the basis of the CLI, including the user identifier, data previously stored for this user are ascertained by the system and made available for the dialog interface with the customer.
    Type: Grant
    Filed: December 15, 2000
    Date of Patent: July 17, 2007
    Assignee: Deutsche Telekom AG
    Inventors: Fred Runge, Christel Mueller, Marian Trinkel, Thomas Ziem
  • Patent number: 7219059
    Abstract: A method and apparatus for generating a pronunciation score by receiving a user phrase intended to conform to a reference phrase and processing the user phrase in accordance with at least one of an articulation-scoring engine, a duration-scoring engine and an intonation-scoring engine to derive thereby the pronunciation score.
    Type: Grant
    Filed: July 3, 2002
    Date of Patent: May 15, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Sunil K. Gupta, Ziyi Lu, Fengguang Zhao
  • Patent number: 7200558
    Abstract: A prosody generation apparatus capable of suppressing distortion that occurs when generating prosodic patterns and therefore generating a natural prosody is provided. A prosody changing point extraction unit in this apparatus extracts a prosody changing point located at the beginning and the ending of a sentence, the beginning and the ending of a breath group, an accent position and the like. A selection rule and a transformation rule of a prosodic pattern including the prosody changing point are generated by means of a statistical or learning technique, and the rules thus generated are stored in a representative prosodic pattern selection rule table and a transformation rule table beforehand. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule.
    Type: Grant
    Filed: March 8, 2002
    Date of Patent: April 3, 2007
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 7177402
    Abstract: Embodiments of the invention are directed to automated reception systems that may receive voice information indicating action to be taken by the system. The automated reception system may receive a call and transmit a speech message to the caller identifying actions that the caller may ask the system to take. The caller may verbally select an action for the system to execute. Possible actions may depend upon the system context and information about possible actions may be provided to the caller through dynamically generated messages. The caller may also access voicemail or electronic mail messages using embodiments of the invention. Furthermore, in some embodiments, a caller may be able to control a separate communication session using voice or other commands input during a telephone session.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: February 13, 2007
    Assignee: Applied Voice & Speech Technologies, Inc.
    Inventor: Michael D. Metcalf
  • Patent number: 7177863
    Abstract: A system and associated method for tuning a data clustering program to a clustering task determine at least one internal parameter of a data clustering program. The determination of one or more of the internal parameters of the data clustering program occurs before the clustering begins. Consequently, clustering does not need to be performed iteratively, thus improving clustering program performance in terms of the required processing time and processing resources. The system provides pairs of data records; the user indicates whether or not these data records should belong to the same cluster. The similarity values of the records of the selected pairs are calculated based on the default parameters of the clustering program. From the resulting similarity values, an optimal similarity threshold is determined. When the optimization criterion does not yield a single optimal similarity threshold range, equivalent candidate ranges are selected.
    Type: Grant
    Filed: March 14, 2003
    Date of Patent: February 13, 2007
    Assignee: International Business Machines Corporation
    Inventors: Boris Charpiot, Barbara Hartel, Christoph Lingenfelder, Thilo Maier
  • Patent number: 7171360
    Abstract: A speaker identification system includes a speaker model generator 110 for generating a plurality of speaker models. To this end, the generator records training utterances from a plurality of speakers in the background, without prior knowledge of the speakers who spoke the utterances. The generator performs a blind clustering of the training utterances based on a predetermined criterion. For each of the clusters a corresponding speaker model is trained. A speaker identifier 130 identifies a speaker by determining a most likely one of the speaker models for an utterance received from the speaker. The speaker associated with the most likely speaker model is identified as the speaker of the test utterance.
    Type: Grant
    Filed: May 7, 2002
    Date of Patent: January 30, 2007
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Chao-Shih Huang, Ya-Cherng Chu, Wei-Ho Tsai, Jyh-Min Cheng
  • Patent number: 7165031
    Abstract: A speech recognition method and apparatus is disclosed which outputs a confidence score indicative of the posterior probability of an utterance being correctly matched to a word model. The confidence score for the matching of an utterance to a word model is determined directly from the generated values indicative of the goodness of match between the utterance and stored word models utilizing the following equation: confidence = exp(-2σ S(x|w)) / Σ_words exp(2σ S(x|w)), where S(x|w) is the match score for the correlation between a signal x and word w and σ is an experimentally determined constant.
    Type: Grant
    Filed: November 6, 2002
    Date of Patent: January 16, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventor: David Llewellyn Rees
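    The equation in the abstract above has the shape of a normalized exponential over the match scores of all candidate words. The sketch below assumes the same exponent, -2 times the constant times S(x|w), in both numerator and denominator so that the confidences sum to 1; the abstract as printed shows no minus sign in the denominator, so this is an interpretation rather than the patent's stated formula. The constant's value and the name `confidences` are hypothetical.

```python
import math

def confidences(match_scores, sigma):
    """Normalized confidence per word from match scores S(x|w).

    Assumption: a lower match score means a better match, which follows
    from the negative exponent.
    """
    exponents = {word: -2.0 * sigma * score for word, score in match_scores.items()}
    peak = max(exponents.values())  # subtract the peak for numerical stability
    weights = {word: math.exp(value - peak) for word, value in exponents.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}

# Made-up match scores for three word models.
result = confidences({"yes": 1.0, "no": 3.0, "maybe": 4.0}, sigma=0.5)
top_word = max(result, key=result.get)
```

    Subtracting the largest exponent before calling `math.exp` leaves the ratios unchanged and avoids overflow or underflow for large scores.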
  • Patent number: 7139706
    Abstract: A comprehensive system is provided for designing a voice activated user interface (VA UI) having a semantic and syntactic structure adapted to the culture and conventions of spoken language for the intended users. The system poses, to at least one respondent, a hypothetical task to be performed; asks each of the at least one respondent for a word that the respondent would use to command the hypothetical task to be performed; receives, from each of the at least one respondent, a command word; develops a list of command words from the received command word; and rejects the received command word, if the received command word is acoustically similar to another word in the list of command words. The approach is general across languages and encompasses universal variables of language and culture. Also provided are prompting grammar and error handling methods adapted to such user interfaces.
    Type: Grant
    Filed: August 12, 2002
    Date of Patent: November 21, 2006
    Assignee: Comverse, Inc.
    Inventor: Matthew John Yuschik
  • Patent number: 7054812
    Abstract: A system is provided for determining a sequence of sub-word units representative of at least two words output by a word recognition unit in response to an input word to be recognized. In a preferred embodiment, the word alternatives output by the recognition unit are converted into sequences of phonemes. An optimum alignment between these sequences is then determined using a dynamic programming alignment technique. The sequence of phonemes representative of the input sequences is then determined using this optimum alignment.
    Type: Grant
    Filed: April 25, 2001
    Date of Patent: May 30, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Jason Peter Andrew Charlesworth, Philip Neil Garner
  • Patent number: 7031917
    Abstract: The present invention relates to a speech recognition apparatus and a speech recognition method for speech recognition with improved accuracy. A distance calculator 47 determines the distance from a microphone 21 to a user uttering. Data indicating the determined distance is supplied to a speech recognition unit 41B. The speech recognition unit 41B has plural sets of acoustic models produced from speech data obtained by capturing speeches uttered at various distances. From those sets of acoustic models, the speech recognition unit 41B selects a set of acoustic models produced from speech data uttered at a distance closest to the distance determined by the distance calculator 47, and the speech recognition unit 41B performs speech recognition using the selected set of acoustic models.
    Type: Grant
    Filed: October 21, 2002
    Date of Patent: April 18, 2006
    Assignee: Sony Corporation
    Inventor: Yasuharu Asano
  • Patent number: 7006970
    Abstract: Disclosed is a method for obtaining a precise detected value of a similarity between voices or the like. Standard and input pattern matrices, each having a voice feature amount as a component, are prepared (S1 and S2). A reference shape having a variance different for each specified component of the pattern matrices is prepared, and positive and negative reference pattern vectors, each having a value of the reference shape as a component, are prepared. Then, while the specified component (the center of the reference shape) is moved to each component position j1=1 to m1, j2=1 to m2 of the standard pattern matrix, a shape change between the standard and input pattern matrices is substituted for shape changes of the positive and negative reference pattern vectors, and an amount of change in kurtosis of each reference pattern vector is numerically evaluated to obtain a shape change amount Dj1j2 (S3). Then, a value of a geometric distance between the pattern matrices is calculated from Dj1j2 (S4).
    Type: Grant
    Filed: September 11, 2001
    Date of Patent: February 28, 2006
    Assignee: Entropy Software Laboratory, Inc.
    Inventors: Michihiro Jinnai, Hiroshi Yamaguchi
  • Patent number: 7003456
    Abstract: A computer-based method of routing a message to a system includes receiving a message, and processing the message using large-vocabulary continuous speech recognition to generate a string of text corresponding to the message. The method includes generating a confidence estimate of the string of text corresponding to the message and comparing the confidence estimate to a predetermined threshold. If the confidence estimate satisfies the predetermined threshold, the string of text is forwarded to the system. If the confidence estimate does not satisfy the predetermined threshold, the information relating to the message is forwarded to a transcriptionist. The message may include one or more utterances. Each utterance in the message may be separately or jointly processed. In this way, a confidence estimate may be generated and evaluated for each utterance or for the whole message. Information relating to each utterance may be separately or jointly forwarded based on the results of the generation and evaluation.
    Type: Grant
    Filed: June 12, 2001
    Date of Patent: February 21, 2006
    Inventors: Laurence S. Gillick, Robert Roth, Linda Manganaro, Barbara R. Peskin, David C. Petty, Ashwin Rao
  • Patent number: 6996527
    Abstract: A common requirement in automatic speech recognition is to recognize a set of words for any speaker without training the system for each new speaker. A speech recognition system is provided utilizing linear discriminant based phonetic similarities with inter-phonetic unit value normalization. Linear discriminant analysis is utilized using training data with both in-class and out-class sample training utterances for generating linear discriminant vectors for each of the phonetic units. The dot product of each linear discriminant vector and the time spectral pattern vectors generated from the input speech are computed. The resultant raw similarity vectors are then normalized utilizing normalization look-up tables for providing similarity vectors which are utilized by a word matcher for word recognition.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: February 7, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Robert C. Boman, Philippe R. Morin, Ted H. Applebaum
  • Patent number: 6961701
    Abstract: An extended-word selecting section calculates a score for a phoneme string formed of one or more phonemes, corresponding to a user's speech, and searches a large-vocabulary dictionary for a word having one or more phonemes equal to or similar to those of a phoneme string having a score equal to or higher than a predetermined value. A matching section calculates scores for the word searched for by the extended-word selecting section in addition to a word selected by a preliminary word-selecting section. A control section determines a word string as the result of recognition of the speech uttered by the user.
    Type: Grant
    Filed: March 3, 2001
    Date of Patent: November 1, 2005
    Assignee: Sony Corporation
    Inventors: Hiroaki Ogawa, Katsuki Minamino, Yasuharu Asano, Helmut Lucke
  • Patent number: 6947890
    Abstract: A method and system are provided for speech recognition. The speech recognition method includes the steps of preparing training data representing acoustic parameters of each of phonemes at each time frame; receiving an input signal representing a sound to be recognized and converting the input signal to input data; comparing the input data at each frame with the training data of each of the phonemes to derive a similarity measure of the input data with respect to each of the phonemes; and processing the similarity measures obtained in the comparing step using a neural net model governing development of activities of plural cells to conduct speech recognition of the input signal.
    Type: Grant
    Filed: May 30, 2000
    Date of Patent: September 20, 2005
    Inventors: Tetsuro Kitazoe, Sung-Ill Kim, Tomoyuki Ichiki
  • Patent number: 6937982
    Abstract: A speech recognition apparatus recognizes a speech signal received from a speaker and provides the result of recognition for an external device. In the apparatus, a pattern matching section performs pattern matching between each of reference patterns in a vocabulary and characteristic parameters extracted from the speech signal. The vocabulary includes reference patterns corresponding to words. Further the apparatus has a similar sound group which includes reference patterns corresponding to the sound similar to that of a specific word. The specific word is a word in response to which the external device performs an operation which cannot be easily undone. The speech signal is rerecognized by using the similar sound group. As a result, the pattern matching section outputs a word other than the specific word, if one of the reference patterns in the similar sound group has a high similarity with the characteristic parameters.
    Type: Grant
    Filed: July 19, 2001
    Date of Patent: August 30, 2005
    Assignee: Denso Corporation
    Inventors: Norihide Kitaoka, Hiroshi Ohno
  • Patent number: 6910012
    Abstract: A method for performing speech recognition can include the steps of providing a grammar including entries comprising a parent word and a pseudo word being substantially phonetically equivalent to the parent word. The grammar can provide a translation from the pseudo word to the parent word. The parent word can be received as speech and the speech can be compared to the grammar entries. Additionally, the speech can be matched to the pseudo word and the pseudo word can be translated to the parent word.
    Type: Grant
    Filed: May 16, 2001
    Date of Patent: June 21, 2005
    Assignee: International Business Machines Corporation
    Inventors: Matthew W. Hartley, David E. Reich
  • Patent number: 6907367
    Abstract: A method for segmenting a signal into segments having similar spectral characteristics is provided. Initially the method generates a table of previous values from older signal values that contains a scoring value for the best segmentation of previous values and a segment length of the last previously identified segment. The method then receives a new sample of the signal and computes a new spectral characteristic function for the signal based on the received sample. A new scoring function is computed from the spectral characteristic function. Segments of the signal are recursively identified based on the newly computed scoring function and the table of previous values. The spectral characteristic function can be a selected one of an autocorrelation function and a discrete Fourier transform. An example is provided for segmenting a speech signal.
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: June 14, 2005
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventor: Paul M. Baggenstoss
  • Patent number: 6882970
    Abstract: A system is provided for comparing an input query with a number of stored annotations to identify information to be retrieved from a database. The comparison technique divides the input query into a number of fixed-size fragments and identifies how many times each of the fragments occurs within each annotation using a dynamic programming matching technique. The frequencies of occurrence of the fragments in both the query and the annotation are then compared to provide a measure of the similarity between the query and the annotation. The information to be retrieved is then determined from the similarity measures obtained for all the annotations.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: April 19, 2005
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
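Illustrative sketch (not part of the patent text): the abstract above compares how often fixed-size fragments of the query occur in the query and in each annotation. The sketch below makes two simplifying assumptions: it counts exact fragment matches instead of using dynamic programming matching, and it uses cosine similarity as the comparison. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def fragments(symbols, n=3):
    """All overlapping fixed-size fragments (n-grams) of a symbol sequence."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

def similarity(query, annotation, n=3):
    """Cosine similarity between fragment counts, restricted to query fragments."""
    q = Counter(fragments(query, n))
    a = Counter(fragments(annotation, n))
    dot = sum(q[f] * a[f] for f in q)
    norm_q = sqrt(sum(v * v for v in q.values()))
    norm_a = sqrt(sum(a[f] ** 2 for f in q))
    return dot / (norm_q * norm_a) if norm_q and norm_a else 0.0

def best_annotation(query, annotations, n=3):
    """Name of the annotation whose fragment counts best match the query."""
    return max(annotations, key=lambda name: similarity(query, annotations[name], n))
```

For example, `best_annotation("taj mahal", {"a1": "taj mahal at dawn", "a2": "eiffel tower"})` selects `"a1"`, because `"a2"` shares no three-character fragment with the query.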
  • Patent number: 6839667
    Abstract: A method for performing speech recognition can include receiving user speech and determining a plurality of potential candidates. Each of the candidates can provide a textual interpretation of the speech. Confidence scores can be calculated for the candidates. The confidence scores can be compared to a predetermined threshold. Also, selected ones of the plurality of candidates can be presented to the user as alternative interpretations of the speech when none of the confidence scores is greater than the predetermined threshold. The selected ones of the plurality of candidates can have confidence scores above a predetermined minimum threshold, and thus can have confidence scores within a predetermined range.
    Type: Grant
    Filed: May 16, 2001
    Date of Patent: January 4, 2005
    Assignee: International Business Machines Corporation
    Inventor: David E. Reich
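Illustrative sketch (not part of the patent text): the two-threshold logic in the abstract above can be written in a few lines. The threshold values and the function name are hypothetical. Accepting the top candidate when it exceeds the upper threshold is an assumption, since the abstract only describes the case where no score exceeds it.

```python
def respond(candidates, accept=0.8, minimum=0.4):
    """candidates: list of (text, confidence) pairs.

    Returns ("accept", [best_text]) when some score exceeds `accept`,
    otherwise ("ask", alternatives) with every text scoring above `minimum`,
    ordered from most to least confident.
    """
    if not candidates:
        return "ask", []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_text, best_score = ranked[0]
    if best_score > accept:
        return "accept", [best_text]
    return "ask", [text for text, score in ranked if score > minimum]
```

The alternatives offered to the user therefore have scores in the range between `minimum` and `accept`, which matches the "predetermined range" mentioned in the abstract.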
  • Publication number: 20040249637
    Abstract: A method of speech recognition obtains acoustic data from a plurality of conversations. A plurality of pairs of utterances are selected from the plurality of conversations. At least one portion of the first utterance of the pair of utterances is dynamically aligned with at least one portion of the second utterance of the pair of utterances, and an acoustic similarity is computed. At least one pair that includes a first portion from a first utterance and a second portion from a second utterance is chosen, based on a criterion of acoustic similarity. A common pattern template is created from the first portion and the second portion.
    Type: Application
    Filed: June 2, 2004
    Publication date: December 9, 2004
    Applicant: Aurilab, LLC
    Inventor: James K. Baker
  • Patent number: 6823308
    Abstract: A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and for words having features in at least one further modality associated with the words, the models also have states for the recognition of events in the further modality or each further modality.
    Type: Grant
    Filed: February 16, 2001
    Date of Patent: November 23, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Robert Alexander Keiller, Nicolas David Fortescue
  • Publication number: 20040215454
    Abstract: A speech recognizer 300 built into a navigation apparatus 100 includes a noise estimator 320 which calculates a noise model based on a microphone input signal, and an adaptive processor 330 which performs an adaptive process on each keyword model and each non-keyword model stored in an HMM database 310 based on the noise model. The adaptive processor 330 performs a data adaptation process on each keyword model and non-keyword model based on the noise model, and word spotting is performed based on the keyword models and the non-keyword models subjected to the data adaptation process.
    Type: Application
    Filed: April 22, 2004
    Publication date: October 28, 2004
    Inventors: Hajime Kobayashi, Kengo Hanai
  • Publication number: 20040204939
    Abstract: A speaker change detection system performs speaker change detection on an input audio stream. The speaker change detection system includes a segmentation component [401], a phone classification decode component [402], and a speaker change detection component [403]. The segmentation component [401] segments the audio stream into segments [501-504] of predetermined length intervals. The segments may overlap one another. The phone classification decode component decodes the intervals to produce a set of phone classes corresponding to each of the intervals. The speaker change detection component detects locations of speaker changes in the audio stream based on a similarity value calculated at phone class boundaries.
    Type: Application
    Filed: October 16, 2003
    Publication date: October 14, 2004
    Inventors: Daben Liu, Francis G. Kubala
  • Patent number: 6778957
    Abstract: Disclosed is a method of automated handset identification, comprising receiving a sample speech input signal from a sample handset; deriving a cepstral covariance sample matrix from said first sample speech signal; calculating, with a distance metric, all distances between said sample matrix and one or more cepstral covariance handset matrices, wherein each said handset matrix is derived from a plurality of speech signals taken from different speakers through the same handset; and determining if the smallest of said distances is below a predetermined threshold value.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: August 17, 2004
    Assignee: International Business Machines Corporation
    Inventors: Zhong-Hua Wang, David Lubensky, Cheng Wu
  • Patent number: 6766294
    Abstract: A performance gauge for use in conjunction with a transcription system including a speech processor linked to at least one speech recognition engine and at least one transcriptionist. The speech processor includes an input for receiving speech files and storage means for storing the received speech files until such a time that they are forwarded to a selected speech recognition engine or transcriptionist for processing. The system includes a transcriptionist text file database in which manually transcribed transcriptionist text files are stored, each stored transcriptionist text file including time stamped data indicative of position within an original speech file. The system further includes a recognition engine text file database in which recognition engine text files transcribed via the at least one speech recognition engine are stored, each stored recognition engine text file including time stamped data indicative of position within an original speech file.
    Type: Grant
    Filed: November 30, 2001
    Date of Patent: July 20, 2004
    Assignee: Dictaphone Corporation
    Inventors: Andrew MacGinite, James Cyr, Martin Hold, Channell Greene, Regina Kuhnen
  • Patent number: 6725196
    Abstract: A method and apparatus is provided for matching a first sequence of patterns representative of a first signal with a second sequence of patterns representative of a second signal. The system uses a plurality of different pruning thresholds (th) to control the propagation of paths which represent possible matchings between a sequence of second signal patterns and a sequence of first signal patterns ending at the current first signal pattern. In particular, the pruning threshold used for a given path during the processing of a current first signal pattern depends upon the position, within the sequence of patterns representing the second signal, of the second signal pattern which is at the end of the given path.
    Type: Grant
    Filed: March 20, 2001
    Date of Patent: April 20, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Robert Alexander Keiller, Eli Tzirkel-Hancock, Julian Richard Seward
  • Patent number: 6701292
    Abstract: A speech-recognizing apparatus for recognizing input speech comprises an analysis unit for computing a characteristic vector for each of the frames of the input speech, a correction-value storage unit for storing a correction distance in advance, a vector-to-vector-distance-computing unit for computing a vector-to-vector distance between the characteristic vector and the phoneme characteristic vector, an average-value-computing unit for computing an average value of vector-to-vector distances for one of the frames, a correction unit for computing a corrected vector-to-vector distance as the value of the expression (vector-to-vector distance - average value + correction distance), and a recognition unit for cumulating corrected vector-to-vector distances into a cumulative vector-to-vector distance and comparing the cumulative vector-to-vector distance with the word standard pattern in order to recognize the input speech.
    Type: Grant
    Filed: October 30, 2000
    Date of Patent: March 2, 2004
    Assignee: Fujitsu Limited
    Inventors: Chiharu Kawai, Hiroshi Katayama, Takehiro Nakai
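Illustrative sketch (not part of the patent text): the correction in the abstract above subtracts the per-frame average distance and adds a stored correction distance. The sketch assumes Euclidean distance, which the abstract does not specify, and uses hypothetical names and toy two-dimensional vectors.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def corrected_distances(frame_vec, phoneme_vecs, correction):
    """Return {phoneme: distance - frame_average + correction} for one frame."""
    raw = {p: dist(frame_vec, v) for p, v in phoneme_vecs.items()}
    average = sum(raw.values()) / len(raw)
    return {p: d - average + correction for p, d in raw.items()}

# Toy frame: raw distances are 5.0 ("a") and 1.0 ("i"), so the average is 3.0.
scores = corrected_distances((0.0, 0.0), {"a": (3.0, 4.0), "i": (0.0, 1.0)}, 2.0)
```

With a correction distance of 2.0, the corrected values are 5 - 3 + 2 = 4 for "a" and 1 - 3 + 2 = 0 for "i". Subtracting the frame average makes the distances of different frames comparable before they are cumulated.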
  • Publication number: 20040019483
    Abstract: A method of speech recognition is provided that identifies a production-related dynamics value by performing a linear interpolation between a production-related dynamics value at a previous time and a production-related target using a time-dependent interpolation weight. The hidden production-related dynamics value is used to compute a predicted value that is compared to an observed value of acoustics to determine the likelihood of the observed acoustics given a sequence of hidden phonological units. In some embodiments, the production-related dynamics value at the previous time is selected from a set of continuous values. In addition, the likelihood of the observed acoustics given a sequence of hidden phonological units is combined with a score associated with a discrete class of production-related dynamic values at the previous time to determine a score for a current phonological state.
    Type: Application
    Filed: October 9, 2002
    Publication date: January 29, 2004
    Inventors: Li Deng, Jian-Iai Zhou, Frank Torsten Bernd Seide, Asela J.R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
  • Publication number: 20030200086
    Abstract: A speech recognition apparatus comprises a speech analyzer which extracts feature patterns of spontaneous speech divided into frames; a keyword model database which prestores keyword models which represent feature patterns of a plurality of keywords to be recognized; a garbage model database which prestores feature patterns of components of extraneous speech to be identified; a first likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the keywords; and a second likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the extraneous speech. The device recognizes keywords contained in the spontaneous speech by calculating a cumulative likelihood based on the calculated likelihoods, with a predetermined correction value added in the second likelihood calculator.
    Type: Application
    Filed: April 15, 2003
    Publication date: October 23, 2003
    Inventors: Yoshihiro Kawazoe, Hajime Kobayashi
  • Patent number: 6631349
    Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
    Type: Grant
    Filed: May 9, 2000
    Date of Patent: October 7, 2003
    Assignee: Hitachi, Ltd.
    Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
  • Patent number: 6625600
    Abstract: The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string one or more contiguous sequences of N-symbols may be extracted. One of the extracted contiguous sequences of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. A preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that correspond to the at least one matched contiguous sequence of N-symbols. A third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings may be computed.
    Type: Grant
    Filed: May 1, 2001
    Date of Patent: September 23, 2003
    Assignee: Telelogue, Inc.
    Inventors: Yevgenly Lyudovyk, Esther Levin
  • Patent number: 6618697
    Abstract: A computer implemented method which does not require a stored dictionary for correcting spelling errors in a sequence of words comprises storing a plurality of spelling rules defined as regular expressions for matching a potentially illegal n-gram which may comprise less than all letters in the word and for replacing an illegal n-gram with a legal n-gram to return a corrected word, submitting a word from said sequence of words to the spelling rules and replacing a word in the string of words with a corrected word.
    Type: Grant
    Filed: May 14, 1999
    Date of Patent: September 9, 2003
    Assignee: Justsystem Corporation
    Inventors: Mark Kantrowitz, Shumeet Baluja
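Illustrative sketch (not part of the patent text): the dictionary-free correction in the abstract above relies on rules that match an illegal n-gram and substitute a legal one. The three rules below are hypothetical examples written for this sketch, not rules taken from the patent, and they handle lowercase words only.

```python
import re

# Hypothetical rules: (regular expression for an illegal n-gram, legal replacement).
RULES = [
    (re.compile(r"(?<=c)ie(?=v)"), "ei"),     # "recieve" -> "receive"
    (re.compile(r"([a-z])\1\1+"), r"\1\1"),   # three or more repeated letters -> two
    (re.compile(r"^teh$"), "the"),            # a rule may also cover a whole word
]

def correct_word(word):
    """Apply every rule in order and return the corrected word."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word

def correct_text(text):
    """Correct each whitespace-separated word of a word sequence."""
    return " ".join(correct_word(w) for w in text.split())
```

Because each rule inspects only a few letters, words such as "science" pass through unchanged: the "cie" there is not followed by "v", so the first rule does not fire.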
  • Patent number: 6609094
    Abstract: Improvements in speech recognition systems are achieved by considering projections of the high dimensional data on lower dimensional subspaces, subsequently by estimating the univariate probability densities via known univariate techniques, and then by reconstructing the density in the original higher dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities, as well as the choices of subspaces on which to project the data, including their number, further add to this non-uniqueness. Probability density functions are then considered that maximize a certain optimality criterion as a solution to this problem. Specifically, those probability density functions that maximize either the entropy functional or, alternatively, the likelihood associated with the data are considered.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: August 19, 2003
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Charles A. Micchelli, Peder Olsen