Subportions Patents (Class 704/254)
  • Patent number: 7353172
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7349839
    Abstract: A method is provided for aligning sentences in a first corpus to sentences in a second corpus. The method includes applying a length-based alignment model to align sentence boundaries of a sentence in the first corpus with sentence boundaries of a sentence in the second corpus to form an aligned sentence pair. The aligned sentence pair is then used to train a translation model. Once trained, the translation model is used to align sentences in the first corpus to sentences in the second corpus. Under aspects of the invention, pruning is used to reduce the number of sentence boundary alignments considered by the length-based alignment model and by the translation model. In further aspects of the invention, the length-based model utilizes a Poisson distribution.
    Type: Grant
    Filed: August 27, 2002
    Date of Patent: March 25, 2008
    Assignee: Microsoft Corporation
    Inventor: Robert C. Moore
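The length-based pass can be sketched as a small dynamic program over sentence boundaries, scoring each 1-to-1 pairing with a Poisson length model as the abstract describes. The rate factor, the skip penalty, and the restriction to 1-1/1-0/0-1 alignments below are illustrative assumptions, not the patent's actual parameterization.

```python
import math

def poisson_logpmf(k, lam):
    # log P(k | lam) under a Poisson length model
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def align(src_lens, tgt_lens, rate=1.0, skip_penalty=-10.0):
    """Align sentences by length (positive lengths assumed): DP over
    boundary positions (i, j), allowing a 1-1 match or a skipped
    sentence on either side."""
    n, m = len(src_lens), len(tgt_lens)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:  # 1-1 match, scored by the Poisson model
                s = best[i][j] + poisson_logpmf(tgt_lens[j], rate * src_lens[i])
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1], back[i + 1][j + 1] = s, (i, j, "match")
            if i < n:            # 1-0: skip a source sentence
                s = best[i][j] + skip_penalty
                if s > best[i + 1][j]:
                    best[i + 1][j], back[i + 1][j] = s, (i, j, "skip")
            if j < m:            # 0-1: skip a target sentence
                s = best[i][j] + skip_penalty
                if s > best[i][j + 1]:
                    best[i][j + 1], back[i][j + 1] = s, (i, j, "skip")
    pairs, i, j = [], n, m   # backtrack the best path
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "match":
            pairs.append((pi, pj))
        i, j = pi, pj
    return list(reversed(pairs))
```

Pruning, as in the patent, would simply restrict which (i, j) cells are expanded.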
  • Patent number: 7346510
    Abstract: A method and computer-readable medium are provided that determine predicted acoustic values for a sequence of hypothesized speech units using modeled articulatory or VTR dynamics values and the modeled relationship between the articulatory (or VTR) and acoustic values for the same speech events. Under one embodiment, the articulatory (or VTR) dynamics value depends on articulatory dynamics values at previous time frames and on articulation targets. In another embodiment, the articulatory dynamics value depends in part on an acoustic environment value such as noise or distortion. In a third embodiment, a time constant that defines the articulatory dynamics value is trained using a variety of articulation styles. By modeling the articulatory or VTR dynamics value in these ways, hyper-articulated, hypo-articulated, fast, and slow speech can be better recognized, and the amount of training data required can be reduced.
    Type: Grant
    Filed: March 19, 2002
    Date of Patent: March 18, 2008
    Assignee: Microsoft Corporation
    Inventor: Li Deng
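The target-directed dynamics underlying such models can be illustrated with a minimal first-order sketch in which each frame moves the articulatory state a fraction 1/tau of the way toward the phone's articulation target; the scalar state and fixed time constant are simplifications for illustration, not the patent's actual model.

```python
def articulatory_trajectory(z0, target, tau, steps):
    """First-order target-directed dynamics: at every frame the
    articulatory state moves a fraction 1/tau of the remaining
    distance toward the phone's articulation target."""
    z, traj = z0, []
    for _ in range(steps):
        z = z + (target - z) / tau
        traj.append(z)
    return traj
```

A large tau gives slow, hypo-articulated movement toward the target; a small tau gives fast, hyper-articulated movement, which is why training tau over varied articulation styles matters.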
  • Patent number: 7346511
    Abstract: Words of an input string are morphologically analyzed to identify their alternative base forms and parts of speech. The analyzed words of the input string are used to compile the input string into a first finite-state network. The first finite-state network is matched with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network. Each matching subpath of the first finite-state network and path of the second finite-state network identify a multiword expression in the input string. The morphological analysis is performed without disambiguating words and without segmenting the input string into sentences, so as to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
    Type: Grant
    Filed: December 13, 2002
    Date of Patent: March 18, 2008
    Assignee: Xerox Corporation
    Inventors: Caroline Privault, Herve Poirier
  • Publication number: 20080065381
    Abstract: The invention automatically detects and corrects, in reproduced speech, defective portions related to plosives, such as the presence or absence of plosive portions, incorrect phoneme lengths of the aspirated portions that follow plosive portions, and defective amplitude variations of fricatives. Speech in which consonants and unvoiced vowels are unclear and discordant is input into a speech enhancement apparatus according to the present invention. In the speech enhancement apparatus, the speech is split into phonemes, and each phoneme is classified as one of an unvoiced plosive, a voiced plosive, an unvoiced fricative, a voiced fricative, an affricate, and an unvoiced vowel. Each phoneme is then corrected, according to a determination of whether correction is necessary, to obtain output speech in which the consonants and the unvoiced vowels are clear and not discordant.
    Type: Application
    Filed: July 31, 2007
    Publication date: March 13, 2008
    Applicant: FUJITSU LIMITED
    Inventor: Chikako Matsumoto
  • Patent number: 7337117
    Abstract: An apparatus for phonetically screening predetermined character strings. The apparatus includes a text-to-speech module, and a phonetic screening module in communication with the text-to-speech module. The phonetic screening module is for replacing a first character string with a second character string based on a phonetic enunciation by the text-to-speech module of the first character string.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: February 26, 2008
    Assignee: AT&T Delaware Intellectual Property, Inc.
    Inventor: Anita Hogans Simpson
  • Patent number: 7337116
    Abstract: A system is provided for allowing a user to add word models to a speech recognition system. In particular, the system allows a user to input a number of renditions of the new word and generates from these a sequence of phonemes representative of the new word. This representative sequence of phonemes is stored in a word-to-phoneme dictionary together with the typed version of the word for subsequent use by the speech recognition system.
    Type: Grant
    Filed: November 5, 2001
    Date of Patent: February 26, 2008
    Assignee: Canon Kabushiki Kaisha
    Inventors: Jason Peter Andrew Charlesworth, Jebu Jacob Rajan
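One simple way to reduce several renditions to a single representative phoneme sequence — a hypothetical stand-in for whatever combination method the patent actually uses — is to pick the medoid: the rendition with the smallest total edit distance to the others.

```python
def edit_distance(a, b):
    # standard Levenshtein distance between two phoneme sequences,
    # computed with a single rolling row
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete x
                                     dp[j - 1] + 1,    # insert y
                                     prev + (x != y))  # substitute / match
    return dp[-1]

def representative(renditions):
    """Pick the rendition closest on average to all the others (medoid)."""
    return min(renditions,
               key=lambda r: sum(edit_distance(r, o) for o in renditions))
```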
  • Patent number: 7319960
    Abstract: A speech recognition system uses a phoneme counter to determine the length of a word to be recognized. The result is used to split a lexicon into one or more sub-lexicons containing only words which have the same or similar length to that of the word to be recognized, so restricting the search space significantly. In another aspect, a phoneme counter is used to estimate the number of phonemes in a word so that a transition bias can be calculated. This bias is applied to the transition probabilities between phoneme models in an HNN based recognizer to improve recognition performance for relatively short or long words.
    Type: Grant
    Filed: December 19, 2001
    Date of Patent: January 15, 2008
    Assignee: Nokia Corporation
    Inventors: Soren Riis, Konstantinos Koumpis
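The sub-lexicon idea can be sketched by bucketing a phoneme dictionary by length and retrieving only words within a small tolerance of the estimated phoneme count; the dictionary entries and the tolerance below are invented for illustration.

```python
from collections import defaultdict

def split_lexicon(lexicon, tolerance=1):
    """Index words by phoneme count so that recognition only searches
    words whose length is near the estimated phoneme count."""
    buckets = defaultdict(list)
    for word, phones in lexicon.items():
        buckets[len(phones)].append(word)

    def sub_lexicon(estimated_count):
        # gather words within `tolerance` phonemes of the estimate
        words = []
        for n in range(estimated_count - tolerance,
                       estimated_count + tolerance + 1):
            words.extend(buckets.get(n, []))
        return words

    return sub_lexicon
```

Restricting the search to such a sub-lexicon is what shrinks the search space; the transition-bias aspect of the patent is a separate mechanism not shown here.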
  • Patent number: 7319958
    Abstract: Acoustic phones (preferably drawn (12) from a plurality of spoken languages) are provided (11). A hierarchically-organized polyphone network (20) organizes views of these phones of varying resolution and phone categorization as a function, at least in part, of phonetic similarity (14) and at least one language-independent phonological factor (15). In a preferred approach, a unique transcription system serves to represent the phones using only standard, printable ASCII characters, none of which comprises a special character (such as those characters that have a command significance for common script interpreters such as the UNIX command line).
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: January 15, 2008
    Assignee: Motorola, Inc.
    Inventors: Lynette Melnar, Jim Talley, Yuan-Jun Wei, Chen Liu
  • Patent number: 7319964
    Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling it at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
    Type: Grant
    Filed: June 7, 2004
    Date of Patent: January 15, 2008
    Assignee: AT&T Corp.
    Inventors: Qian Huang, Zhu Liu
  • Patent number: 7319959
    Abstract: A system and method are disclosed for processing an audio signal including separating the audio signal into a plurality of streams which group sounds from a same source prior to classification and analyzing each separate stream to determine phoneme-level classification. One or more words of the audio signal may then be outputted.
    Type: Grant
    Filed: May 14, 2003
    Date of Patent: January 15, 2008
    Assignee: Audience, Inc.
    Inventor: Lloyd Watts
  • Publication number: 20080010067
    Abstract: A method is presented which reduces data flow and thereby increases processing capacity while preserving a high level of accuracy in a distributed speech processing environment for speaker detection. The method and system of the present invention includes filtering out data based on a target speaker specific subset of labels using data filters. The method preserves accuracy and passes only a fraction of the data by optimizing target specific performance measures. Therefore, a high level of speaker recognition accuracy is maintained while utilizing existing processing capabilities.
    Type: Application
    Filed: July 7, 2006
    Publication date: January 10, 2008
    Inventors: Upendra V. Chaudhari, Juan M. Huerta, Ganesh N. Ramaswamy, Olivier Verscheure
  • Patent number: 7318032
    Abstract: A technique for improved score calculation and normalization in a framework of recognition with phonetically structured speaker models. The technique involves determining, for each frame and each level of phonetic detail of a target speaker model, a non-interpolated likelihood value, and then resolving the at least one likelihood value to obtain a likelihood score.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: January 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Patent number: 7305070
    Abstract: An interactive voice response system that allows a caller to perform a series of sequential tasks based on an instruction set. The caller is queried after each instruction to ensure that the caller has successfully completed all of the steps. Additionally, provisions are made to automatically pause the instruction set and present reminders to the caller. Further, the caller may elect to repeat instructions, back up the instruction set, receive additional details, transfer to a service representative, or receive summary information.
    Type: Grant
    Filed: January 30, 2002
    Date of Patent: December 4, 2007
    Assignee: AT&T Labs, Inc.
    Inventors: Philip Ted Kortum, Robert R. Bushey
  • Patent number: 7302393
    Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition and are combined on a per-need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such a system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker and the environmental noise, with sensor means, and b) use program-controlled arbiter means, e.g. a decision engine including a software mechanism and a physical sensor, to evaluate the collected data and select the best-suited recognizer, or a combination thereof, out of the plurality of available recognizers.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: November 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann
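A toy version of the arbiter might score each recognizer's declared operating profile against the sensed boundary conditions and activate the best match; the profile fields and the simple count-based score are assumptions for illustration, not the patent's decision engine.

```python
def select_recognizer(recognizers, conditions):
    """Return the name of the recognizer whose declared operating
    profile matches the most sensed boundary conditions."""
    def score(rec):
        return sum(1 for key, val in rec["profile"].items()
                   if conditions.get(key) == val)
    return max(recognizers, key=score)["name"]
```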
  • Patent number: 7299178
    Abstract: A continuous speech recognition method and system are provided.
    Type: Grant
    Filed: February 24, 2004
    Date of Patent: November 20, 2007
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Su-yeon Yoon, In-jeong Choi, Nam-hoon Kim
  • Patent number: 7299179
    Abstract: In a three-stage speech recognition process, a phoneme sequence is first assigned to a speech unit, then those vocabulary entries which are most similar to the phoneme sequence are sought in a selection vocabulary, and finally the speech unit is recognized using a speech unit recognizer which uses, as its vocabulary, the selected vocabulary entries which are most like the phoneme sequence.
    Type: Grant
    Filed: January 19, 2004
    Date of Patent: November 20, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Hans-Ulrich Block, Stefanie Schachtl
  • Patent number: 7295979
    Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models and the new language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new language data using the recognition system of another language. This labeled data is then used to generate models for the new language phones.
    Type: Grant
    Filed: February 22, 2001
    Date of Patent: November 13, 2007
    Assignee: International Business Machines Corporation
    Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
  • Patent number: 7295980
    Abstract: A system is provided for matching two or more sequences of phonemes both or all of which may be generated from text or speech. A dynamic programming matching technique is preferably used having constraints which depend upon whether or not the two sequences are generated from text or speech and in which the scoring of the dynamic programming paths is weighted by phoneme confusion scores, phoneme insertion scores and phoneme deletion scores where appropriate.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: November 13, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
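The weighted dynamic program can be sketched as follows, with toy confusion, insertion and deletion scores standing in for trained values.

```python
def confusion(a, b):
    # toy confusion scores (log-prob-like); a real system trains these
    if a == b:
        return 0.0            # exact match
    if {a, b} == {"p", "b"}:
        return -1.0           # acoustically confusable pair (assumed)
    return -4.0               # unrelated phonemes

def dp_match(seq1, seq2, conf_score, ins_score, del_score):
    """Best dynamic-programming alignment score of two phoneme
    sequences, weighting substitutions (via conf_score), insertions
    and deletions separately."""
    NEG = float("-inf")
    n, m = len(seq1), len(seq2)
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:   # align seq1[i] with seq2[j]
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1],
                                       dp[i][j] + conf_score(seq1[i], seq2[j]))
            if i < n:             # seq1[i] deleted
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + del_score)
            if j < m:             # seq2[j] inserted
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + ins_score)
    return dp[n][m]
```

The patent additionally varies the path constraints depending on whether each sequence came from text or from speech; that switch is omitted here.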
  • Patent number: 7292980
    Abstract: A method and user interface which allow users to make decisions about how to pronounce words and parts of words based on audio cues and common words with well known pronunciations. Users input or select words for which they want to set or modify pronunciations. To set the pronunciation of a given letter or letter combination in the word, the user selects the letters and is presented with a list of common words whose pronunciations, or portions thereof, are substantially identical to possible pronunciations of the selected letters. The list of sample, common words is ordered based on frequency of correlation in common usage, the most common being designated as the default sample word, and the user is first presented with a subset of the words in the list which are most likely to be selected. In addition, the present invention allows for storage in the dictionary of several different pronunciations for the same word, to allow for contextual differences and individual preferences.
    Type: Grant
    Filed: April 30, 1999
    Date of Patent: November 6, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Katherine Grace August, Michelle McNerney
  • Patent number: 7286984
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: October 23, 2007
    Assignee: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 7286987
    Abstract: A system and method related to a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. This is done by using a probabilistically unbiased multi-phoneme recognition process, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.
    Type: Grant
    Filed: June 30, 2003
    Date of Patent: October 23, 2007
    Assignee: Conceptual Speech LLC
    Inventor: Philippe Roy
  • Patent number: 7269557
    Abstract: Described are methods and systems for reducing the audible gap in concatenated recorded speech, resulting in more natural sounding speech in voice applications. The sound of concatenated, recorded speech is improved by also coarticulating the recorded speech. The resulting message is smooth, natural sounding and lifelike. Existing libraries of regularly recorded bulk prompts can be used by coarticulating the user interface prompt occurring just before the bulk prompt. Applications include phone-based applications as well as non-phone-based applications.
    Type: Grant
    Filed: November 19, 2004
    Date of Patent: September 11, 2007
    Assignee: Tellme Networks, Inc.
    Inventors: Scott J. Bailey, Nikko Strom
  • Patent number: 7263487
    Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: August 28, 2007
    Assignee: Microsoft Corporation
    Inventor: Mei Yuh Hwang
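One plausible relevance measure — the patent does not specify this exact form, so it is an assumption for illustration — is the ratio of a word's relative frequency in the task-dependent corpus to its relative frequency in the task-independent corpus; words characteristic of the task then get larger training weights.

```python
from collections import Counter

def relevance_weights(task_indep_words, task_dep_words, smoothing=1e-6):
    """Weight each task-independent word by how much more frequent it
    is in the task-dependent corpus than in the task-independent one."""
    ti = Counter(task_indep_words)
    td = Counter(task_dep_words)
    n_ti, n_td = sum(ti.values()), sum(td.values())
    return {w: (td[w] / n_td + smoothing) / (ti[w] / n_ti + smoothing)
            for w in ti}
```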
  • Publication number: 20070185714
    Abstract: A speech recognition method including: layering a central lexicon in a tree structure with respect to recognition-subject vocabularies; performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
    Type: Application
    Filed: August 28, 2006
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam Hoon Kim, In Jeong Choi, Ick Sang Han, Sang Bae Jeong
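The tree-structured central lexicon can be sketched as a phoneme prefix tree in which words sharing a phoneme prefix share nodes; the "#" end-of-word marker is an implementation convenience assumed not to collide with any phoneme symbol, and the multi-pass matching and Viterbi rescoring stages are omitted.

```python
class LexiconTree:
    """Prefix tree over phoneme strings of the recognition vocabulary."""

    def __init__(self):
        self.root = {}

    def add(self, word, phones):
        node = self.root
        for p in phones:
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)   # "#" marks a word end

    def candidates(self, prefix):
        # all vocabulary words lying under the given phoneme prefix
        node = self.root
        for p in prefix:
            if p not in node:
                return []
            node = node[p]
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.extend(n.get("#", []))
            stack.extend(v for k, v in n.items() if k != "#")
        return sorted(out)
```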
  • Publication number: 20070185713
    Abstract: A recognition confidence measurement method, medium and system are provided which can more accurately determine whether an input speech signal is in-vocabulary, by extracting an optimum number of candidates that match a phone string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
    Type: Application
    Filed: July 31, 2006
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sang-Bae Jeong, Nam Hoon Kim, Ick Sang Han, In Jeong Choi, Gil Jin Jang, Jae-Hoon Jeong
  • Publication number: 20070156404
    Abstract: A string matching method and system for searching for a representative string for a plurality of strings which are written in different languages and/or in different ways but share substantially the same meaning, and a computer-readable recording medium storing a computer program for executing the string matching method, are provided.
    Type: Application
    Filed: December 11, 2006
    Publication date: July 5, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyung-eun Lee, Seok-joong Kang
  • Patent number: 7240003
    Abstract: A data structure is provided for annotating data files within a database. The annotation data comprises a phoneme and word lattice which allows the quick and efficient searching of data files within the database, in response to a user's input query for desired information. The phoneme and word lattice comprises a plurality of time-ordered nodes, and a plurality of links extending between the nodes. Each link has a phoneme or word associated with it. The nodes are arranged in a sequence of time-ordered blocks such that further data can be conveniently added to the lattice.
    Type: Grant
    Filed: September 28, 2001
    Date of Patent: July 3, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Jason Peter Andrew Charlesworth, Philip Neil Garner
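A minimal sketch of such a lattice, with time-ordered nodes and links labelled by phonemes or words; the block organization and the query-search machinery of the patent are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    time: float                                 # time offset in the audio
    links: list = field(default_factory=list)   # outgoing (label, dest) links

class Lattice:
    """Phoneme-and-word lattice: time-ordered nodes joined by links,
    each link carrying a phoneme or a word label."""

    def __init__(self):
        self.nodes = []

    def add_node(self, time):
        node = Node(time)
        self.nodes.append(node)
        self.nodes.sort(key=lambda n: n.time)   # keep nodes time-ordered
        return node

    def add_link(self, src, dst, label):
        # label may be a phoneme or a whole word
        src.links.append((label, dst))

    def labels_from(self, node):
        return [label for label, _ in node.links]
```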
  • Patent number: 7233899
    Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented; for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and the symbolic and numeric data for the other segments, both in the record and in the dictionary entries.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: June 19, 2007
    Inventors: Vitaliy S. Fain, Samuel V. Fain
  • Patent number: 7219065
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: May 15, 2007
    Inventors: Andrew E. Vandali, Graeme M. Clark
  • Patent number: 7216076
    Abstract: A system and method of recognizing speech comprises an audio-receiving element and a computer server, which together perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: May 8, 2007
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
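A much-simplified sketch of the comparison: here the SVD-based transform is replaced by plain unit-length normalization onto the hypersphere, and the received phoneme is matched to the nearest stored point; this is an illustrative stand-in, not the patent's actual transform.

```python
import math

def normalize(vec):
    # project a feature vector onto the unit hypersphere
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def nearest_phoneme(received, stored):
    """Return the stored phoneme label whose normalized point lies
    closest (Euclidean) to the normalized received point."""
    r = normalize(received)
    def dist(label):
        s = normalize(stored[label])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))
    return min(stored, key=dist)
```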
  • Patent number: 7206741
    Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
    Type: Grant
    Filed: December 6, 2005
    Date of Patent: April 17, 2007
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
  • Patent number: 7206738
    Abstract: A method, a computer system and a computer program product for generating baseforms or phonetic spellings from input text are disclosed. The baseforms are initially generated using rules defined for a particular language. Then, phones are identified in the language that are exceptions to the defined rules and an action is associated with each identified phone. A statistical technique is applied to determine whether the identified phones can be modified. Finally, baseforms containing the identified phones that can be modified are corrected according to the associated actions. Preferably, the statistical technique is only applied to baseforms containing phones that are exceptions to the defined rules. The defined rules can comprise spelling-to-sound rules for a particular phonetic language that incorporate all possible alternative pronunciations of each baseform.
    Type: Grant
    Filed: August 14, 2002
    Date of Patent: April 17, 2007
    Assignee: International Business Machines Corporation
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 7191130
    Abstract: The present invention introduces a system and method for automatically optimizing recognition configuration parameters for speech recognition systems. In one embodiment, a method comprises receiving an utterance at a speech recognizer, wherein the speech recognizer has a learning mode. The speech recognizer is run in a learning mode to automatically generate tuned configuration parameters. Subsequent utterances are recognized with the tuned configuration parameters to generate future recognition results.
    Type: Grant
    Filed: September 27, 2002
    Date of Patent: March 13, 2007
    Assignee: Nuance Communications
    Inventors: Christopher J. Leggetter, Michael M. Hochberg
  • Patent number: 7191135
    Abstract: A speech recognition system that includes a host computer which is operative to communicate at least one graphical user interface (GUI) display file to a mobile terminal of the system. The mobile terminal includes a microphone for receiving speech input; wherein the at least one GUI display file is operative to be associated with at least one of a dictionary file and syntax file to facilitate speech recognition in connection with the at least one GUI display file.
    Type: Grant
    Filed: April 8, 1998
    Date of Patent: March 13, 2007
    Assignee: Symbol Technologies, Inc.
    Inventor: Timothy P. O'Hagan
  • Patent number: 7181398
    Abstract: A speech recognition system provides a subword decoder and a dictionary lookup to process a spoken input. In a first stage of processing, the subword decoder decodes the speech input based on subword units or particles and identifies hypothesized subword sequences using a particle dictionary and particle language model, but independently of a word dictionary or word vocabulary. Further stages of processing involve a particle to word graph expander and a word decoder. The particle to word graph expander expands the subword representation produced by the subword decoder into a word graph of word candidates using a word dictionary. The word decoder uses the word dictionary and a word language model to determine a best sequence of word candidates from the word graph that is most likely to match the words of the spoken input.
    Type: Grant
    Filed: March 27, 2002
    Date of Patent: February 20, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Jean-Manuel Van Thong, Pedro Moreno, Edward Whittaker
  • Patent number: 7171358
    Abstract: A method compresses one or more ordered arrays of integer values. The integer values can represent the vocabulary of a language model, in the form of an N-gram, of an automated speech recognition system. For each ordered array A[.] to be compressed, an inverse array I[.] is defined. One or more split inverse arrays are also defined for each ordered array. The minimum and optimum number of bits required to store the array A[.] in terms of the split arrays and split inverse arrays are determined. Then, the original array is stored in such a way that the total amount of memory used is minimized.
    Type: Grant
    Filed: January 13, 2003
    Date of Patent: January 30, 2007
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Edward W. D. Whittaker, Bhiksha Ramakrishnan
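A toy illustration of the inverse-array idea, under the assumptions of a sorted array of non-negative integers and fixed-width storage: it stores whichever of the array or its inverse (I[v] = number of entries ≤ v) needs fewer bits. The patent's split arrays and exact cost model are omitted.

```python
def bits_for(values):
    # fixed-width bits needed to store every entry of `values`
    width = max(values).bit_length() or 1
    return width * len(values)

def inverse_array(a):
    """For a sorted array of non-negative ints, I[v] = number of
    entries <= v, for v = 0 .. max(a)."""
    inv, idx = [], 0
    for v in range(a[-1] + 1):
        while idx < len(a) and a[idx] <= v:
            idx += 1
        inv.append(idx)
    return inv

def choose_representation(a):
    """Store whichever of the array or its inverse needs fewer bits.
    The inverse wins when the array is long relative to its value range
    (many repeated values), as in sorted N-gram index arrays."""
    direct = bits_for(a)
    inverse = bits_for(inverse_array(a))
    return ("inverse", inverse) if inverse < direct else ("direct", direct)
```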
  • Patent number: 7155390
    Abstract: A speech information processing apparatus and method performs speech recognition. Speech is input, and feature parameters of the input speech are extracted. The feature parameters are recognized based on a segment pitch pattern model. The segment pitch pattern model may be obtained by modeling time change in a fundamental frequency of a phoneme belonging to a predetermined phonemic environment with a polynomial segment model. The segment pitch pattern model may also be obtained by modeling with at least one of a single mixed distribution and a multiple mixed distribution.
    Type: Grant
    Filed: October 18, 2004
    Date of Patent: December 26, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiaki Fukada
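A minimal sketch of the segment-model idea, using a first-order (linear) fit in place of the patent's polynomial segment model with mixture distributions; the contour values below are illustrative, not real F0 data.

```python
def fit_segment(times, f0):
    """Least-squares line f0 ~ b0 + b1*t over one phoneme segment,
    modeling the time change of the fundamental frequency."""
    n = len(times)
    mt, mf = sum(times) / n, sum(f0) / n
    b1 = (sum((t - mt) * (f - mf) for t, f in zip(times, f0))
          / sum((t - mt) ** 2 for t in times))
    return mf - b1 * mt, b1

def segment_distance(coeffs_a, coeffs_b):
    # Compare two segments by distance between their coefficient vectors.
    return sum((x - y) ** 2 for x, y in zip(coeffs_a, coeffs_b)) ** 0.5

# A rising pitch contour yields a positive slope coefficient.
b0, b1 = fit_segment([0.0, 1.0, 2.0], [100.0, 110.0, 120.0])
```

Recognition would compare the coefficients extracted from input speech against per-phoneme models trained the same way.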
  • Patent number: 7146319
    Abstract: A speech recognition method includes a step of receiving a phonetic sequence output by a phonetic recognizer. The method also includes a step of matching the phonetic sequence with the one of a plurality of reference phoneme sequences, stored in a reference list, that matches it most closely. At least one of the plurality of reference phoneme sequences stored in the reference list includes additional information with respect to a phonetic sequence that is capable of being output by the phonetic recognizer.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: December 5, 2006
    Assignee: Novauris Technologies Ltd.
    Inventor: Melvyn J. Hunt
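The closest-match step is commonly implemented as an edit (Levenshtein) distance over phone symbols; a sketch under that assumption, with placeholder phone labels:

```python
def edit_distance(a, b):
    # Standard Levenshtein DP over phone symbols (insert/delete/substitute
    # all cost 1), keeping only one row of the table at a time.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete pa
                           cur[j - 1] + 1,       # insert pb
                           prev[j - 1] + (pa != pb)))  # match/substitute
        prev = cur
    return prev[-1]

def closest_reference(hyp, references):
    """Pick the reference phoneme sequence nearest the recognizer output."""
    return min(references, key=lambda ref: edit_distance(hyp, ref))
```

Weighted substitution costs (e.g. cheaper for acoustically confusable phones) would be a natural refinement.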
  • Patent number: 7143037
    Abstract: Words are spelled by receiving recognizable words from a user of an interactive voice response system. The first letter of each recognizable word is identified, and a spelling is determined based on the first letters of the recognizable words. Statistics for previous users of the interactive voice response system are determined, where the statistics indicate the number of times each of the recognizable words has been used to indicate a letter. The recognizable word that is most commonly used for each letter is identified. The user is prompted with at least two recognizable words that are most commonly used, where each recognizable word corresponds to a different letter. A selection of one of the recognizable words provided to the user is received.
    Type: Grant
    Filed: June 12, 2002
    Date of Patent: November 28, 2006
    Assignee: Cisco Technology, Inc.
    Inventor: Kevin L. Chestnut
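The first-letter spelling and the usage statistics can be sketched as follows; the class name and method names are invented for illustration.

```python
from collections import Counter

def spell(words):
    """Spell a string from the first letter of each recognized word."""
    return "".join(w[0].lower() for w in words)

class SpellingPrompter:
    """Track which word callers most often say for each letter, so the
    IVR system can prompt with the most common choices."""
    def __init__(self):
        self.counts = Counter()

    def observe(self, word):
        self.counts[(word[0].lower(), word.lower())] += 1

    def most_common_for(self, letter):
        scored = [(n, w) for (l, w), n in self.counts.items() if l == letter]
        return max(scored)[1] if scored else None

prompter = SpellingPrompter()
for w in ["alpha", "alpha", "apple", "bravo"]:
    prompter.observe(w)
```

The prompts offered to a caller would then pair each candidate letter with its most frequently used carrier word.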
  • Patent number: 7139708
    Abstract: A system and method for speech recognition using an enhanced phone set comprises speech data, an enhanced phone set, and a transcription generated by a transcription process. The transcription process selects appropriate phones from the enhanced phone set to represent acoustic-phonetic content of the speech data. The enhanced phone set includes base-phones and composite-phones. A phone dataset includes the speech data and the transcription. The present invention also comprises a transformer that applies transformation rules to the phone dataset to produce a transformed phone dataset. The transformed phone dataset may be utilized in training a speech recognizer, such as a Hidden Markov Model. Various types of transformation rules may be applied to the phone dataset of the present invention to find an optimum transformed phone dataset for training a particular speech recognizer.
    Type: Grant
    Filed: August 4, 1999
    Date of Patent: November 21, 2006
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Lex S. Olorenshaw, Mariscela Amador-Hernandez
  • Patent number: 7139688
    Abstract: A technique for structurally classifying substructures of at least one unmarked string utilizing at least one training data set with inserted markers identifying labeled substructures. A model of class labels and substructures within strings of the training data set is first constructed. Markers are then inserted into the unmarked string, identifying substructures similar to substructures within strings of the training data set by using the model. Finally, class labels of the substructures in the unmarked string similar to substructures within strings of the training data set are predicted using the model.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 21, 2006
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7136852
    Abstract: A database system and a method for case-based reasoning are disclosed. The database system includes an exemplar object within the database configured to accept and store a plurality of exemplar cases, a target object within the database configured to accept and store a target case, and a comparison object within the database for comparing the target case with the plurality of exemplar cases. The method includes comparing the target case with the plurality of exemplar cases within a database to produce similarity metrics and determining the similarity between the target and exemplar cases based on the similarity metrics.
    Type: Grant
    Filed: November 27, 2001
    Date of Patent: November 14, 2006
    Assignee: NCR Corp.
    Inventors: Warren Martin Sterling, Barbara Jane Ericson
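A toy version of the comparison step, with cases as attribute dictionaries and a deliberately simple overlap metric (real systems would weight attributes and use richer similarity functions; all field names below are invented):

```python
def similarity(target, exemplar):
    """Fraction of attributes on which the target and exemplar cases agree."""
    keys = set(target) | set(exemplar)
    return sum(target.get(k) == exemplar.get(k) for k in keys) / len(keys)

def best_exemplar(target, exemplars):
    """Retrieve the stored exemplar case most similar to the target case."""
    return max(exemplars, key=lambda e: similarity(target, e))

target = {"category": "printer", "error": "jam", "model": "x200"}
exemplars = [
    {"category": "printer", "error": "jam", "model": "x100"},
    {"category": "scanner", "error": "offline", "model": "x200"},
]
match = best_exemplar(target, exemplars)
```

In the patented system this comparison runs inside the database itself, against exemplar cases stored as database objects.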
  • Patent number: 7136811
    Abstract: A voice coding and decoding system 300 and method uses a personal phoneme table (320, 344) associated with a voice signature identifier (348) to permit encoding of true-sounding voice by personalizing the phoneme table used for encoding and decoding. A default phoneme table (364) is used for encoding and decoding until a personal phoneme table (320, 344) is constructed. A MIDI decoder (360) is used to create the reconstructed speech from a string of phoneme identifiers transmitted from the sending side (302) to the receiving side (304).
    Type: Grant
    Filed: April 24, 2002
    Date of Patent: November 14, 2006
    Assignee: Motorola, Inc.
    Inventors: Thomas Michael Tirpak, Weimin Xiao
  • Patent number: 7133827
    Abstract: A new word model is trained from synthetic word samples derived by Monte Carlo techniques from one or more prior word models. The prior word model can be a phonetic word model and the new word model can be a non-phonetic, whole-word, word model. The prior word model can be trained from data that has undergone a first channel normalization and the synthesized word samples from which the new word model is trained can undergo a different channel normalization similar to that to be used in a given speech recognition context. The prior word model can have a first model structure and the new word model can have a second, different, model structure. These differences in model structure can include, for example, differences of model topology; differences of model complexity; and differences in the type of basis function used in a description of such probability distributions.
    Type: Grant
    Filed: February 6, 2003
    Date of Patent: November 7, 2006
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Laurence S. Gillick, Donald R. McAllaster, Daniel L. Roth
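The Monte Carlo step can be sketched with the prior model idealized as one Gaussian per feature dimension; the shapes, names, and parameter values below are assumptions for illustration, not the patent's actual model structure.

```python
import random
import statistics

def synthesize_samples(prior_means, prior_sds, n, seed=0):
    """Draw n synthetic feature vectors from a prior word model,
    idealized here as independent Gaussians per dimension."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(prior_means, prior_sds)]
            for _ in range(n)]

def train_new_model(samples):
    """Fit the new model's Gaussians from the synthetic samples. In the
    patented scheme a different channel normalization (or a different
    model structure entirely) could be applied at this stage."""
    cols = list(zip(*samples))
    return ([statistics.fmean(c) for c in cols],
            [statistics.stdev(c) for c in cols])

samples = synthesize_samples([5.0, -2.0], [1.0, 0.5], 2000)
new_means, new_sds = train_new_model(samples)
```

With enough synthetic samples the retrained parameters recover the prior model's statistics, which is what makes the cross-structure transfer work.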
  • Patent number: 7127393
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Grant
    Filed: February 10, 2003
    Date of Patent: October 24, 2006
    Assignee: Speech Works International, Inc.
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
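A minimal sketch of rule-based rescoring: each dynamic semantic rule inspects a hypothesis's keyword-value pairs plus external context and returns a score adjustment. The example rule, slot names, and scores are all invented for illustration.

```python
def apply_dynamic_rules(hypotheses, rules, context):
    """Rescore (text, score, slots) hypotheses with dynamic semantic
    rules, then re-rank by the adjusted scores."""
    rescored = [(text,
                 score + sum(rule(slots, context) for rule in rules),
                 slots)
                for text, score, slots in hypotheses]
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# Hypothetical rule: penalize destinations the external context marks closed.
def closed_airport_rule(slots, context):
    return -5.0 if slots.get("dest") in context.get("closed", ()) else 0.0

hyps = [("fly to boston", 0.9, {"dest": "BOS"}),
        ("fly to austin", 0.8, {"dest": "AUS"})]
ranked = apply_dynamic_rules(hyps, [closed_airport_rule], {"closed": {"BOS"}})
```

Here the acoustically better hypothesis is demoted because the external conditions make its semantic value implausible.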
  • Patent number: 7127397
    Abstract: A method of training a computer system via human voice input from a human teacher is provided. In one embodiment, the method includes presenting a text spelling of an unknown word and receiving a human voice pronunciation of the unknown word. A phonetic spelling of the unknown word is determined. The text spelling is associated with the phonetic spelling to allow a text to speech engine to correctly pronounce the unknown word in the future when presented with the text spelling of the unknown word.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: October 24, 2006
    Assignee: Qwest Communications International Inc.
    Inventor: Eliot M. Case
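The resulting association can be modeled as an exception lexicon the text-to-speech engine consults before falling back to letter-to-sound rules; the class name and the phonetic notation below are invented placeholders.

```python
class PronunciationLexicon:
    """Minimal exception lexicon mapping text spellings to phonetic
    spellings learned from a human teacher's voice pronunciation."""
    def __init__(self):
        self._entries = {}

    def teach(self, text_spelling, phonetic_spelling):
        # Associate the text spelling with the phonetic spelling so the
        # TTS engine pronounces the word correctly in the future.
        self._entries[text_spelling.lower()] = phonetic_spelling

    def pronounce(self, text_spelling):
        # Return the taught pronunciation, or None to signal a fallback
        # to letter-to-sound rules.
        return self._entries.get(text_spelling.lower())

lexicon = PronunciationLexicon()
lexicon.teach("Qwest", "k w eh s t")
```

Determining the phonetic spelling from the teacher's audio is the hard part of the patent; this sketch only covers the storage and lookup side.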
  • Patent number: 7124130
    Abstract: The present invention is directed to an address recognition apparatus for recognizing a written address. The apparatus includes an input device that receives a scanned image of the written address and transforms the image into digital data, a character recognizing section that recognizes a word string in the digital data on a unit character basis, a word extracting section that extracts characters recognized by the character recognizing section on a unit word basis, and an address word string dictionary that previously stores a plurality of first word strings. The apparatus further includes an address word string recognizing section that collates a second word string, determines words of the second word string respectively corresponding to the words of the first word string, evaluates each of the first word strings, and recognizes one of the first word strings as the address word string.
    Type: Grant
    Filed: September 4, 2003
    Date of Patent: October 17, 2006
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Naotake Natori
  • Patent number: 7124083
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: November 5, 2003
    Date of Patent: October 17, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
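The three steps above can be sketched with a stand-in cost function (the real preselection cost comes from the unit-selection framework and is not shown in the abstract); the toy phone set and cost below are invented for illustration.

```python
from itertools import product

def build_preselection_db(triphones, phones, cost_fn, keep=1):
    """For each triphone u1-u2-u3, take the best (lowest) cost over all
    5-phoneme contexts ua-u1-u2-u3-ub, then keep only the cheapest
    triphone entries in the preselection database."""
    best_cost = {}
    for tri in triphones:
        best_cost[tri] = min(cost_fn((ua,) + tri + (ub,))
                             for ua, ub in product(phones, repeat=2))
    cheapest = sorted(best_cost, key=best_cost.get)[:keep]
    return {t: best_cost[t] for t in cheapest}

# Toy cost: number of 'b' phones in the 5-phoneme window.
phones = ["a", "b"]
tris = [("a", "a", "a"), ("b", "b", "b")]
db = build_preselection_db(tris, phones, lambda seq: seq.count("b"))
```

Precomputing these costs offline is what lets synthesis skip most of the per-phoneme search at run time.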
  • Patent number: 7092496
    Abstract: Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, a method of processing an information signal containing content presented in accordance with at least one modality, comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content.
    Type: Grant
    Filed: September 18, 2000
    Date of Patent: August 15, 2006
    Assignee: International Business Machines Corporation
    Inventors: Stephane Herman Maes, Mukund Padmanabhan, Jeffrey Scott Sorensen