Subportions Patents (Class 704/254)
  • Patent number: 7353172
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7349839
    Abstract: A method is provided for aligning sentences in a first corpus to sentences in a second corpus. The method includes applying a length-based alignment model to align sentence boundaries of a sentence in the first corpus with sentence boundaries of a sentence in the second corpus to form an aligned sentence pair. The aligned sentence pair is then used to train a translation model. Once trained, the translation model is used to align sentences in the first corpus to sentences in the second corpus. Under aspects of the invention, pruning is used to reduce the number of sentence boundary alignments considered by the length-based alignment model and by the translation model. In further aspects of the invention, the length-based model utilizes a Poisson distribution.
    Type: Grant
    Filed: August 27, 2002
    Date of Patent: March 25, 2008
    Assignee: Microsoft Corporation
    Inventor: Robert C. Moore
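The length-based pass can be sketched as a small dynamic program over sentence boundaries, scoring each 1-to-1 pairing with a Poisson length model as the abstract describes. The rate factor, the skip penalty, and the restriction to 1-1/1-0/0-1 alignments below are illustrative assumptions, not the patent's actual parameterization.

```python
import math

def poisson_logpmf(k, lam):
    # log P(k | lam) under a Poisson length model
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def align(src_lens, tgt_lens, rate=1.0, skip_penalty=-10.0):
    """Align sentences by length (positive lengths assumed): DP over
    boundary positions (i, j), allowing a 1-1 match or a skipped
    sentence on either side."""
    n, m = len(src_lens), len(tgt_lens)
    NEG = float("-inf")
    best = [[NEG] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == NEG:
                continue
            if i < n and j < m:  # 1-1 match, scored by the Poisson model
                s = best[i][j] + poisson_logpmf(tgt_lens[j], rate * src_lens[i])
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1], back[i + 1][j + 1] = s, (i, j, "match")
            if i < n:            # 1-0: skip a source sentence
                s = best[i][j] + skip_penalty
                if s > best[i + 1][j]:
                    best[i + 1][j], back[i + 1][j] = s, (i, j, "skip")
            if j < m:            # 0-1: skip a target sentence
                s = best[i][j] + skip_penalty
                if s > best[i][j + 1]:
                    best[i][j + 1], back[i][j + 1] = s, (i, j, "skip")
    pairs, i, j = [], n, m   # backtrack the best path
    while (i, j) != (0, 0):
        pi, pj, op = back[i][j]
        if op == "match":
            pairs.append((pi, pj))
        i, j = pi, pj
    return list(reversed(pairs))
```

Pruning, as in the patent, would simply restrict which (i, j) cells are expanded.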
  • Patent number: 7346510
    Abstract: A method and computer-readable medium are provided that determine predicted acoustic values for a sequence of hypothesized speech units using modeled articulatory or VTR dynamics values and the modeled relationship between the articulatory (or VTR) and acoustic values for the same speech events. Under one embodiment, the articulatory (or VTR) dynamics value depends on articulatory dynamics values at previous time frames and on articulation targets. In another embodiment, the articulatory dynamics value depends in part on an acoustic environment value such as noise or distortion. In a third embodiment, a time constant that defines the articulatory dynamics value is trained using a variety of articulation styles. By modeling the articulatory or VTR dynamics value in these ways, hyper-articulated, hypo-articulated, fast, and slow speech can be better recognized, and the amount of training data required can be reduced.
    Type: Grant
    Filed: March 19, 2002
    Date of Patent: March 18, 2008
    Assignee: Microsoft Corporation
    Inventor: Li Deng
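The target-directed dynamics underlying such models can be illustrated with a minimal first-order sketch in which each frame moves the articulatory state a fraction 1/tau of the way toward the phone's articulation target; the scalar state and fixed time constant are simplifications for illustration, not the patent's actual model.

```python
def articulatory_trajectory(z0, target, tau, steps):
    """First-order target-directed dynamics: at every frame the
    articulatory state moves a fraction 1/tau of the remaining
    distance toward the phone's articulation target."""
    z, traj = z0, []
    for _ in range(steps):
        z = z + (target - z) / tau
        traj.append(z)
    return traj
```

A large tau gives slow, hypo-articulated movement toward the target; a small tau gives fast, hyper-articulated movement, which is why training tau over varied articulation styles matters.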
  • Patent number: 7346511
    Abstract: Words of an input string are morphologically analyzed to identify their alternative base forms and parts of speech. The analyzed words of the input string are used to compile the input string into a first finite-state network. The first finite-state network is matched with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network. Each matching subpath of the first finite-state network and path of the second finite-state network identify a multiword expression in the input string. The morphological analysis is performed without disambiguating words and without segmenting the input string into sentences, so as to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
    Type: Grant
    Filed: December 13, 2002
    Date of Patent: March 18, 2008
    Assignee: Xerox Corporation
    Inventors: Caroline Privault, Herve Poirier
  • Publication number: 20080065381
    Abstract: The invention automatically detects and corrects, in reproduced speech, defective portions related to plosives, such as the presence or absence of plosive portions, incorrect phoneme lengths of the aspirated portions that follow plosive portions, and defective amplitude variations of fricatives. Speech in which consonants and unvoiced vowels are unclear and discordant is input into a speech enhancement apparatus according to the present invention. In the speech enhancement apparatus, the speech is split into phonemes, and each phoneme is classified as one of an unvoiced plosive, a voiced plosive, an unvoiced fricative, a voiced fricative, an affricate, and an unvoiced vowel. Each phoneme is then corrected, according to a determination of whether correction is necessary, to obtain output speech in which the consonants and the unvoiced vowels are clear and not discordant.
    Type: Application
    Filed: July 31, 2007
    Publication date: March 13, 2008
    Applicant: FUJITSU LIMITED
    Inventor: Chikako Matsumoto
  • Patent number: 7337117
    Abstract: An apparatus for phonetically screening predetermined character strings. The apparatus includes a text-to-speech module, and a phonetic screening module in communication with the text-to-speech module. The phonetic screening module is for replacing a first character string with a second character string based on a phonetic enunciation by the text-to-speech module of the first character string.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: February 26, 2008
    Assignee: AT&T Delaware Intellectual Property, Inc.
    Inventor: Anita Hogans Simpson
  • Patent number: 7337116
    Abstract: A system is provided for allowing a user to add word models to a speech recognition system. In particular, the system allows a user to input a number of renditions of the new word and generates from these a sequence of phonemes representative of the new word. This representative sequence of phonemes is stored in a word-to-phoneme dictionary together with the typed version of the word for subsequent use by the speech recognition system.
    Type: Grant
    Filed: November 5, 2001
    Date of Patent: February 26, 2008
    Assignee: Canon Kabushiki Kaisha
    Inventors: Jason Peter Andrew Charlesworth, Jebu Jacob Rajan
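One simple way to reduce several renditions to a single representative phoneme sequence — a hypothetical stand-in for whatever combination method the patent actually uses — is to pick the medoid: the rendition with the smallest total edit distance to the others.

```python
def edit_distance(a, b):
    # standard Levenshtein distance between two phoneme sequences,
    # computed with a single rolling row
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete x
                                     dp[j - 1] + 1,    # insert y
                                     prev + (x != y))  # substitute / match
    return dp[-1]

def representative(renditions):
    """Pick the rendition closest on average to all the others (medoid)."""
    return min(renditions,
               key=lambda r: sum(edit_distance(r, o) for o in renditions))
```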
  • Patent number: 7319960
    Abstract: A speech recognition system uses a phoneme counter to determine the length of a word to be recognized. The result is used to split a lexicon into one or more sub-lexicons containing only words which have the same or similar length to that of the word to be recognized, so restricting the search space significantly. In another aspect, a phoneme counter is used to estimate the number of phonemes in a word so that a transition bias can be calculated. This bias is applied to the transition probabilities between phoneme models in an HNN based recognizer to improve recognition performance for relatively short or long words.
    Type: Grant
    Filed: December 19, 2001
    Date of Patent: January 15, 2008
    Assignee: Nokia Corporation
    Inventors: Soren Riis, Konstantinos Koumpis
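The sub-lexicon idea can be sketched by bucketing a phoneme dictionary by length and retrieving only words within a small tolerance of the estimated phoneme count; the dictionary entries and the tolerance below are invented for illustration.

```python
from collections import defaultdict

def split_lexicon(lexicon, tolerance=1):
    """Index words by phoneme count so that recognition only searches
    words whose length is near the estimated phoneme count."""
    buckets = defaultdict(list)
    for word, phones in lexicon.items():
        buckets[len(phones)].append(word)

    def sub_lexicon(estimated_count):
        # gather words within `tolerance` phonemes of the estimate
        words = []
        for n in range(estimated_count - tolerance,
                       estimated_count + tolerance + 1):
            words.extend(buckets.get(n, []))
        return words

    return sub_lexicon
```

Restricting the search to such a sub-lexicon is what shrinks the search space; the transition-bias aspect of the patent is a separate mechanism not shown here.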
  • Patent number: 7319958
    Abstract: Acoustic phones (preferably drawn (12) from a plurality of spoken languages) are provided (11). A hierarchically-organized polyphone network (20) organizes views of these phones of varying resolution and phone categorization as a function, at least in part, of phonetic similarity (14) and at least one language-independent phonological factor (15). In a preferred approach, a unique transcription system serves to represent the phones using only standard, printable ASCII characters, none of which comprises a special character (such as those characters that have a command significance for common script interpreters such as the UNIX command line).
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: January 15, 2008
    Assignee: Motorola, Inc.
    Inventors: Lynette Melnar, Jim Talley, Yuan-Jun Wei, Chen Liu
  • Patent number: 7319964
    Abstract: The present invention provides a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling it at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
    Type: Grant
    Filed: June 7, 2004
    Date of Patent: January 15, 2008
    Assignee: AT&T Corp.
    Inventors: Qian Huang, Zhu Liu
  • Patent number: 7319959
    Abstract: A system and method are disclosed for processing an audio signal including separating the audio signal into a plurality of streams which group sounds from a same source prior to classification and analyzing each separate stream to determine phoneme-level classification. One or more words of the audio signal may then be outputted.
    Type: Grant
    Filed: May 14, 2003
    Date of Patent: January 15, 2008
    Assignee: Audience, Inc.
    Inventor: Lloyd Watts
  • Publication number: 20080010067
    Abstract: A method is presented which reduces data flow and thereby increases processing capacity while preserving a high level of accuracy in a distributed speech processing environment for speaker detection. The method and system of the present invention includes filtering out data based on a target speaker specific subset of labels using data filters. The method preserves accuracy and passes only a fraction of the data by optimizing target specific performance measures. Therefore, a high level of speaker recognition accuracy is maintained while utilizing existing processing capabilities.
    Type: Application
    Filed: July 7, 2006
    Publication date: January 10, 2008
    Inventors: Upendra V. Chaudhari, Juan M. Huerta, Ganesh N. Ramaswamy, Olivier Verscheure
  • Patent number: 7318032
    Abstract: A technique for improved score calculation and normalization in a framework of recognition with phonetically structured speaker models. The technique involves determining, for each frame and each level of phonetic detail of a target speaker model, a non-interpolated likelihood value, and then resolving the at least one likelihood value to obtain a likelihood score.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: January 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
  • Patent number: 7305070
    Abstract: An interactive voice response system that allows a caller to perform a series of sequential tasks based on an instruction set. The caller is queried after each instruction to ensure that the caller has successfully completed all of the steps. Additionally, provisions are made to automatically pause the instruction set and present reminders to the caller. Further, the caller may elect to repeat instructions, back up the instruction set, receive additional details, transfer to a service representative, or receive summary information.
    Type: Grant
    Filed: January 30, 2002
    Date of Patent: December 4, 2007
    Assignee: AT&T Labs, Inc.
    Inventors: Philip Ted Kortum, Robert R. Bushey
  • Patent number: 7302393
    Abstract: A method and respective system for operating a speech recognition system, in which a plurality of recognizer programs are accessible to be activated for speech recognition and are combined on a per-need basis in order to efficiently improve the results of speech recognition done by a single recognizer. In order to adapt such a system to the dynamically changing acoustic conditions of various operating environments and to the particular requirements of running in embedded systems having only limited computing power available, it is proposed to a) collect selection base data characterizing speech recognition boundary conditions, e.g. the speaker and the environmental noise, with sensor means, and b) use program-controlled arbiter means, e.g. a decision engine including a software mechanism and a physical sensor, to evaluate the collected data and select the best-suited recognizer, or a combination thereof, out of the plurality of available recognizers.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: November 27, 2007
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann
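A toy version of the arbiter might score each recognizer's declared operating profile against the sensed boundary conditions and activate the best match; the profile fields and the simple count-based score are assumptions for illustration, not the patent's decision engine.

```python
def select_recognizer(recognizers, conditions):
    """Return the name of the recognizer whose declared operating
    profile matches the most sensed boundary conditions."""
    def score(rec):
        return sum(1 for key, val in rec["profile"].items()
                   if conditions.get(key) == val)
    return max(recognizers, key=score)["name"]
```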
  • Patent number: 7299178
    Abstract: A continuous speech recognition method and system are provided.
    Type: Grant
    Filed: February 24, 2004
    Date of Patent: November 20, 2007
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Su-yeon Yoon, In-jeong Choi, Nam-hoon Kim
  • Patent number: 7299179
    Abstract: In a three-stage speech recognition process, a phoneme sequence is first assigned to a speech unit, then those vocabulary entries which are most similar to the phoneme sequence are sought in a selection vocabulary, and finally the speech unit is recognized using a speech unit recognizer which uses, as its vocabulary, the selected vocabulary entries which are most like the phoneme sequence.
    Type: Grant
    Filed: January 19, 2004
    Date of Patent: November 20, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Hans-Ulrich Block, Stefanie Schachtl
  • Patent number: 7295979
    Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models and the new language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new language data using the recognition system of another language. This labeled data is then used to generate models for the new language phones.
    Type: Grant
    Filed: February 22, 2001
    Date of Patent: November 13, 2007
    Assignee: International Business Machines Corporation
    Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
  • Patent number: 7295980
    Abstract: A system is provided for matching two or more sequences of phonemes both or all of which may be generated from text or speech. A dynamic programming matching technique is preferably used having constraints which depend upon whether or not the two sequences are generated from text or speech and in which the scoring of the dynamic programming paths is weighted by phoneme confusion scores, phoneme insertion scores and phoneme deletion scores where appropriate.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: November 13, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
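The weighted dynamic program can be sketched as follows, with toy confusion, insertion and deletion scores standing in for trained values.

```python
def confusion(a, b):
    # toy confusion scores (log-prob-like); a real system trains these
    if a == b:
        return 0.0            # exact match
    if {a, b} == {"p", "b"}:
        return -1.0           # acoustically confusable pair (assumed)
    return -4.0               # unrelated phonemes

def dp_match(seq1, seq2, conf_score, ins_score, del_score):
    """Best dynamic-programming alignment score of two phoneme
    sequences, weighting substitutions (via conf_score), insertions
    and deletions separately."""
    NEG = float("-inf")
    n, m = len(seq1), len(seq2)
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:   # align seq1[i] with seq2[j]
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1],
                                       dp[i][j] + conf_score(seq1[i], seq2[j]))
            if i < n:             # seq1[i] deleted
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + del_score)
            if j < m:             # seq2[j] inserted
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + ins_score)
    return dp[n][m]
```

The patent additionally varies the path constraints depending on whether each sequence came from text or from speech; that switch is omitted here.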
  • Patent number: 7292980
    Abstract: A method and user interface which allow users to make decisions about how to pronounce words and parts of words based on audio cues and common words with well known pronunciations. Users input or select words for which they want to set or modify pronunciations. To set the pronunciation of a given letter or letter combination in the word, the user selects the letters and is presented with a list of common words whose pronunciations, or portions thereof, are substantially identical to possible pronunciations of the selected letters. The list of sample, common words is ordered based on frequency of correlation in common usage, the most common being designated as the default sample word, and the user is first presented with a subset of the words in the list which are most likely to be selected. In addition, the present invention allows for storage in the dictionary of several different pronunciations for the same word, to allow for contextual differences and individual preferences.
    Type: Grant
    Filed: April 30, 1999
    Date of Patent: November 6, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Katherine Grace August, Michelle McNerney
  • Patent number: 7286984
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: October 23, 2007
    Assignee: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 7286987
    Abstract: A system and method related to a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. This is done by using a probabilistically unbiased multi-phoneme recognition process, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.
    Type: Grant
    Filed: June 30, 2003
    Date of Patent: October 23, 2007
    Assignee: Conceptual Speech LLC
    Inventor: Philippe Roy
  • Patent number: 7269557
    Abstract: Described are methods and systems for reducing the audible gap in concatenated recorded speech, resulting in more natural sounding speech in voice applications. The sound of concatenated, recorded speech is improved by also coarticulating the recorded speech. The resulting message is smooth, natural sounding and lifelike. Existing libraries of regularly recorded bulk prompts can be used by coarticulating the user interface prompt occurring just before the bulk prompt. Applications include phone-based applications as well as non-phone-based applications.
    Type: Grant
    Filed: November 19, 2004
    Date of Patent: September 11, 2007
    Assignee: Tellme Networks, Inc.
    Inventors: Scott J. Bailey, Nikko Strom
  • Patent number: 7263487
    Abstract: The present invention generates a task-dependent acoustic model from a supervised task-independent corpus and further adapts it with an unsupervised task-dependent corpus. The task-independent corpus includes task-independent training data which has an acoustic representation of words and a sequence of transcribed words corresponding to the acoustic representation. A relevance measure is defined for each of the words in the task-independent data. The relevance measure is used to weight the data associated with each of the words in the task-independent training data. The task-dependent acoustic model is then trained based on the weighted data for the words in the task-independent training data.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: August 28, 2007
    Assignee: Microsoft Corporation
    Inventor: Mei Yuh Hwang
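One plausible relevance measure — the patent does not specify this exact form, so it is an assumption for illustration — is the ratio of a word's relative frequency in the task-dependent corpus to its relative frequency in the task-independent corpus; words characteristic of the task then get larger training weights.

```python
from collections import Counter

def relevance_weights(task_indep_words, task_dep_words, smoothing=1e-6):
    """Weight each task-independent word by how much more frequent it
    is in the task-dependent corpus than in the task-independent one."""
    ti = Counter(task_indep_words)
    td = Counter(task_dep_words)
    n_ti, n_td = sum(ti.values()), sum(td.values())
    return {w: (td[w] / n_td + smoothing) / (ti[w] / n_ti + smoothing)
            for w in ti}
```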
  • Publication number: 20070185714
    Abstract: A speech recognition method including: layering a central lexicon in a tree structure with respect to recognition-subject vocabularies; performing multi-pass symbol matching between a recognized phoneme sequence and a phonetic sequence of the central lexicon layered in the tree structure; and selecting a final speech recognition result via a Viterbi search process using a detailed acoustic model with respect to candidate vocabularies selected by the multi-pass symbol matching.
    Type: Application
    Filed: August 28, 2006
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam Hoon Kim, In Jeong Choi, Ick Sang Han, Sang Bae Jeong
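The tree-structured central lexicon can be sketched as a phoneme prefix tree in which words sharing a phoneme prefix share nodes; the "#" end-of-word marker is an implementation convenience assumed not to collide with any phoneme symbol, and the multi-pass matching and Viterbi rescoring stages are omitted.

```python
class LexiconTree:
    """Prefix tree over phoneme strings of the recognition vocabulary."""

    def __init__(self):
        self.root = {}

    def add(self, word, phones):
        node = self.root
        for p in phones:
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)   # "#" marks a word end

    def candidates(self, prefix):
        # all vocabulary words lying under the given phoneme prefix
        node = self.root
        for p in prefix:
            if p not in node:
                return []
            node = node[p]
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.extend(n.get("#", []))
            stack.extend(v for k, v in n.items() if k != "#")
        return sorted(out)
```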
  • Publication number: 20070185713
    Abstract: A recognition confidence measurement method, medium and system are provided which can more accurately determine whether an input speech signal is in-vocabulary, by extracting an optimum number of candidates that match a phone string extracted from the input speech signal and estimating a lexical distance between the extracted candidates. A recognition confidence measurement method includes: extracting a phoneme string from a feature vector of an input speech signal; extracting candidates by matching the extracted phoneme string against phoneme strings of vocabularies registered in a predetermined dictionary; estimating a lexical distance between the extracted candidates; and determining whether the input speech signal is in-vocabulary, based on the lexical distance.
    Type: Application
    Filed: July 31, 2006
    Publication date: August 9, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sang-Bae Jeong, Nam Hoon Kim, Ick Sang Han, In Jeong Choi, Gil Jin Jang, Jae-Hoon Jeong
  • Publication number: 20070156404
    Abstract: A string matching method and system for searching for a representative string for a plurality of strings which are written in different languages and/or in different ways but share substantially the same meaning, and a computer-readable recording medium storing a computer program for executing the string matching method, are provided.
    Type: Application
    Filed: December 11, 2006
    Publication date: July 5, 2007
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyung-eun Lee, Seok-joong Kang
  • Patent number: 7240003
    Abstract: A data structure is provided for annotating data files within a database. The annotation data comprises a phoneme and word lattice which allows the quick and efficient searching of data files within the database, in response to a user's input query for desired information. The phoneme and word lattice comprises a plurality of time-ordered nodes, and a plurality of links extending between the nodes. Each link has a phoneme or word associated with it. The nodes are arranged in a sequence of time-ordered blocks such that further data can be conveniently added to the lattice.
    Type: Grant
    Filed: September 28, 2001
    Date of Patent: July 3, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Jason Peter Andrew Charlesworth, Philip Neil Garner
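A minimal sketch of such a lattice, with time-ordered nodes and links labelled by phonemes or words; the block organization and the query-search machinery of the patent are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    time: float                                 # time offset in the audio
    links: list = field(default_factory=list)   # outgoing (label, dest) links

class Lattice:
    """Phoneme-and-word lattice: time-ordered nodes joined by links,
    each link carrying a phoneme or a word label."""

    def __init__(self):
        self.nodes = []

    def add_node(self, time):
        node = Node(time)
        self.nodes.append(node)
        self.nodes.sort(key=lambda n: n.time)   # keep nodes time-ordered
        return node

    def add_link(self, src, dst, label):
        # label may be a phoneme or a whole word
        src.links.append((label, dst))

    def labels_from(self, node):
        return [label for label, _ in node.links]
```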
  • Patent number: 7233899
    Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented; for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and the symbolic and numeric data for the other segments, both in the record and in the dictionary entries.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: June 19, 2007
    Inventors: Vitaliy S. Fain, Samuel V. Fain
  • Patent number: 7219065
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: May 15, 2007
    Inventors: Andrew E. Vandali, Graeme M. Clark
  • Patent number: 7216076
    Abstract: A system and method of recognizing speech comprises an audio-receiving element and a computer server, which together perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: May 8, 2007
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
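A much-simplified sketch of the comparison: here the SVD-based transform is replaced by plain unit-length normalization onto the hypersphere, and the received phoneme is matched to the nearest stored point; this is an illustrative stand-in, not the patent's actual transform.

```python
import math

def normalize(vec):
    # project a feature vector onto the unit hypersphere
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def nearest_phoneme(received, stored):
    """Return the stored phoneme label whose normalized point lies
    closest (Euclidean) to the normalized received point."""
    r = normalize(received)
    def dist(label):
        s = normalize(stored[label])
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)))
    return min(stored, key=dist)
```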
  • Patent number: 7206741
    Abstract: A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
    Type: Grant
    Filed: December 6, 2005
    Date of Patent: April 17, 2007
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide, Asela J. R. Gunawardana, Hagai Attias, Alejandro Acero, Xuedong Huang
  • Patent number: 7206738
    Abstract: A method, a computer system and a computer program product for generating baseforms or phonetic spellings from input text are disclosed. The baseforms are initially generated using rules defined for a particular language. Then, phones are identified in the language that are exceptions to the defined rules and an action is associated with each identified phone. A statistical technique is applied to determine whether the identified phones can be modified. Finally, baseforms containing the identified phones that can be modified are corrected according to the associated actions. Preferably, the statistical technique is only applied to baseforms containing phones that are exceptions to the defined rules. The defined rules can comprise spelling-to-sound rules for a particular phonetic language that incorporate all possible alternative pronunciations of each baseform.
    Type: Grant
    Filed: August 14, 2002
    Date of Patent: April 17, 2007
    Assignee: International Business Machines Corporation
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 7191130
    Abstract: The present invention introduces a system and method for automatically optimizing recognition configuration parameters for speech recognition systems. In one embodiment, a method comprises receiving an utterance at a speech recognizer, wherein the speech recognizer has a learning mode. The speech recognizer is run in a learning mode to automatically generate tuned configuration parameters. Subsequent utterances are recognized with the tuned configuration parameters to generate future recognition results.
    Type: Grant
    Filed: September 27, 2002
    Date of Patent: March 13, 2007
    Assignee: Nuance Communications
    Inventors: Christopher J. Leggetter, Michael M. Hochberg
  • Patent number: 7191135
    Abstract: A speech recognition system that includes a host computer which is operative to communicate at least one graphical user interface (GUI) display file to a mobile terminal of the system. The mobile terminal includes a microphone for receiving speech input; wherein the at least one GUI display file is operative to be associated with at least one of a dictionary file and syntax file to facilitate speech recognition in connection with the at least one GUI display file.
    Type: Grant
    Filed: April 8, 1998
    Date of Patent: March 13, 2007
    Assignee: Symbol Technologies, Inc.
    Inventor: Timothy P. O'Hagan
  • Patent number: 7181398
    Abstract: A speech recognition system provides a subword decoder and a dictionary lookup to process a spoken input. In a first stage of processing, the subword decoder decodes the speech input based on subword units or particles and identifies hypothesized subword sequences using a particle dictionary and particle language model, but independently of a word dictionary or word vocabulary. Further stages of processing involve a particle to word graph expander and a word decoder. The particle to word graph expander expands the subword representation produced by the subword decoder into a word graph of word candidates using a word dictionary. The word decoder uses the word dictionary and a word language model to determine a best sequence of word candidates from the word graph that is most likely to match the words of the spoken input.
    Type: Grant
    Filed: March 27, 2002
    Date of Patent: February 20, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Jean-Manuel Van Thong, Pedro Moreno, Edward Whittaker
  • Patent number: 7171358
    Abstract: A method compresses one or more ordered arrays of integer values. The integer values can represent the vocabulary of a language model, in the form of an N-gram, of an automated speech recognition system. For each ordered array A[.] to be compressed, an inverse array I[.] is defined. One or more split inverse arrays are also defined for each ordered array. The minimum and optimum number of bits required to store the array A[.] in terms of the split arrays and split inverse arrays are determined. Then, the original array is stored in such a way that the total amount of memory used is minimized.
    Type: Grant
    Filed: January 13, 2003
    Date of Patent: January 30, 2007
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Edward W. D. Whittaker, Bhiksha Ramakrishnan
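A toy illustration of the inverse-array idea, under the assumptions of a sorted array of non-negative integers and fixed-width storage: it stores whichever of the array or its inverse (I[v] = number of entries ≤ v) needs fewer bits. The patent's split arrays and exact cost model are omitted.

```python
def bits_for(values):
    # fixed-width bits needed to store every entry of `values`
    width = max(values).bit_length() or 1
    return width * len(values)

def inverse_array(a):
    """For a sorted array of non-negative ints, I[v] = number of
    entries <= v, for v = 0 .. max(a)."""
    inv, idx = [], 0
    for v in range(a[-1] + 1):
        while idx < len(a) and a[idx] <= v:
            idx += 1
        inv.append(idx)
    return inv

def choose_representation(a):
    """Store whichever of the array or its inverse needs fewer bits.
    The inverse wins when the array is long relative to its value range
    (many repeated values), as in sorted N-gram index arrays."""
    direct = bits_for(a)
    inverse = bits_for(inverse_array(a))
    return ("inverse", inverse) if inverse < direct else ("direct", direct)
```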
  • Patent number: 7155390
    Abstract: A speech information processing apparatus and method performs speech recognition. Speech is input, and feature parameters of the input speech are extracted. The feature parameters are recognized based on a segment pitch pattern model. The segment pitch pattern model may be obtained by modeling time change in a fundamental frequency of a phoneme belonging to a predetermined phonemic environment with a polynomial segment model. The segment pitch pattern model may also be obtained by modeling with at least one of a single mixed distribution and a multiple mixed distribution.
    Type: Grant
    Filed: October 18, 2004
    Date of Patent: December 26, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiaki Fukada
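A minimal sketch of the segment-model idea, using a first-order (linear) fit in place of the patent's polynomial segment model with mixture distributions; the contour values below are illustrative, not real F0 data.

```python
def fit_segment(times, f0):
    """Least-squares line f0 ~ b0 + b1*t over one phoneme segment,
    modeling the time change of the fundamental frequency."""
    n = len(times)
    mt, mf = sum(times) / n, sum(f0) / n
    b1 = (sum((t - mt) * (f - mf) for t, f in zip(times, f0))
          / sum((t - mt) ** 2 for t in times))
    return mf - b1 * mt, b1

def segment_distance(coeffs_a, coeffs_b):
    # Compare two segments by distance between their coefficient vectors.
    return sum((x - y) ** 2 for x, y in zip(coeffs_a, coeffs_b)) ** 0.5

# A rising pitch contour yields a positive slope coefficient.
b0, b1 = fit_segment([0.0, 1.0, 2.0], [100.0, 110.0, 120.0])
```

Recognition would compare the coefficients extracted from input speech against per-phoneme models trained the same way.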
  • Patent number: 7146319
    Abstract: A speech recognition method includes a step of receiving a phonetic sequence output by a phonetic recognizer. The method also includes a step of matching the phonetic sequence with the one of a plurality of reference phoneme sequences, stored in a reference list, that matches it most closely. At least one of the plurality of reference phoneme sequences stored in the reference list includes additional information with respect to a phonetic sequence that is capable of being output by the phonetic recognizer.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: December 5, 2006
    Assignee: Novauris Technologies Ltd.
    Inventor: Melvyn J. Hunt
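The closest-match step is commonly implemented as an edit (Levenshtein) distance over phone symbols; a sketch under that assumption, with placeholder phone labels:

```python
def edit_distance(a, b):
    # Standard Levenshtein DP over phone symbols (insert/delete/substitute
    # all cost 1), keeping only one row of the table at a time.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # delete pa
                           cur[j - 1] + 1,       # insert pb
                           prev[j - 1] + (pa != pb)))  # match/substitute
        prev = cur
    return prev[-1]

def closest_reference(hyp, references):
    """Pick the reference phoneme sequence nearest the recognizer output."""
    return min(references, key=lambda ref: edit_distance(hyp, ref))
```

Weighted substitution costs (e.g. cheaper for acoustically confusable phones) would be a natural refinement.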
  • Patent number: 7143037
    Abstract: Words are spelled by receiving recognizable words from a user of an interactive voice response system. The first letter of each recognizable word is identified, and a spelling is determined based on the first letters of the recognizable words. Statistics for previous users of the interactive voice response system are determined, where the statistics indicate the number of times each of the recognizable words has been used to indicate a letter. The recognizable word that is most commonly used for each letter is identified. The user is prompted with at least two recognizable words that are most commonly used, where each recognizable word corresponds to a different letter. A selection of one of the recognizable words provided to the user is received.
    Type: Grant
    Filed: June 12, 2002
    Date of Patent: November 28, 2006
    Assignee: Cisco Technology, Inc.
    Inventor: Kevin L. Chestnut
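The first-letter spelling and the usage statistics can be sketched as follows; the class name and method names are invented for illustration.

```python
from collections import Counter

def spell(words):
    """Spell a string from the first letter of each recognized word."""
    return "".join(w[0].lower() for w in words)

class SpellingPrompter:
    """Track which word callers most often say for each letter, so the
    IVR system can prompt with the most common choices."""
    def __init__(self):
        self.counts = Counter()

    def observe(self, word):
        self.counts[(word[0].lower(), word.lower())] += 1

    def most_common_for(self, letter):
        scored = [(n, w) for (l, w), n in self.counts.items() if l == letter]
        return max(scored)[1] if scored else None

prompter = SpellingPrompter()
for w in ["alpha", "alpha", "apple", "bravo"]:
    prompter.observe(w)
```

The prompts offered to a caller would then pair each candidate letter with its most frequently used carrier word.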
  • Patent number: 7139708
    Abstract: A system and method for speech recognition using an enhanced phone set comprises speech data, an enhanced phone set, and a transcription generated by a transcription process. The transcription process selects appropriate phones from the enhanced phone set to represent acoustic-phonetic content of the speech data. The enhanced phone set includes base-phones and composite-phones. A phone dataset includes the speech data and the transcription. The present invention also comprises a transformer that applies transformation rules to the phone dataset to produce a transformed phone dataset. The transformed phone dataset may be utilized in training a speech recognizer, such as a Hidden Markov Model. Various types of transformation rules may be applied to the phone dataset of the present invention to find an optimum transformed phone dataset for training a particular speech recognizer.
    Type: Grant
    Filed: August 4, 1999
    Date of Patent: November 21, 2006
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Lex S. Olorenshaw, Mariscela Amador-Hernandez
  • Patent number: 7139688
    Abstract: A technique for structurally classifying substructures of at least one unmarked string utilizing at least one training data set with inserted markers identifying labeled substructures. A model of class labels and substructures within strings of the training data set is first constructed. Markers are then inserted into the unmarked string, identifying substructures similar to substructures within strings of the training data set by using the model. Finally, class labels of the substructures in the unmarked string similar to substructures within strings of the training data set are predicted using the model.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: November 21, 2006
    Assignee: International Business Machines Corporation
    Inventors: Charu C. Aggarwal, Philip Shi-Lung Yu
  • Patent number: 7136852
    Abstract: A database system and a method for case-based reasoning are disclosed. The database system includes an exemplar object within the database configured to accept and store a plurality of exemplar cases, a target object within the database configured to accept and store a target case, and a comparison object within the database for comparing the target case with the plurality of exemplar cases. The method includes comparing the target case with the plurality of exemplar cases within a database to produce similarity metrics and determining the similarity between the target and exemplar cases based on the similarity metrics.
    Type: Grant
    Filed: November 27, 2001
    Date of Patent: November 14, 2006
    Assignee: NCR Corp.
    Inventors: Warren Martin Sterling, Barbara Jane Ericson
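A toy version of the comparison step, with cases as attribute dictionaries and a deliberately simple overlap metric (real systems would weight attributes and use richer similarity functions; all field names below are invented):

```python
def similarity(target, exemplar):
    """Fraction of attributes on which the target and exemplar cases agree."""
    keys = set(target) | set(exemplar)
    return sum(target.get(k) == exemplar.get(k) for k in keys) / len(keys)

def best_exemplar(target, exemplars):
    """Retrieve the stored exemplar case most similar to the target case."""
    return max(exemplars, key=lambda e: similarity(target, e))

target = {"category": "printer", "error": "jam", "model": "x200"}
exemplars = [
    {"category": "printer", "error": "jam", "model": "x100"},
    {"category": "scanner", "error": "offline", "model": "x200"},
]
match = best_exemplar(target, exemplars)
```

In the patented system this comparison runs inside the database itself, against exemplar cases stored as database objects.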
  • Patent number: 7136811
    Abstract: A voice coding and decoding system 300 and method uses a personal phoneme table (320, 344) associated with a voice signature identifier (348) to permit encoding of true-sounding voice by personalizing the phoneme table used for encoding and decoding. A default phoneme table (364) is used for encoding and decoding until a personal phoneme table (320, 344) is constructed. A MIDI decoder (360) is used to create the reconstructed speech from a string of phoneme identifiers transmitted from the sending side (302) to the receiving side (304).
    Type: Grant
    Filed: April 24, 2002
    Date of Patent: November 14, 2006
    Assignee: Motorola, Inc.
    Inventors: Thomas Michael Tirpak, Weimin Xiao
  • Patent number: 7133827
    Abstract: A new word model is trained from synthetic word samples derived by Monte Carlo techniques from one or more prior word models. The prior word model can be a phonetic word model and the new word model can be a non-phonetic, whole-word, word model. The prior word model can be trained from data that has undergone a first channel normalization and the synthesized word samples from which the new word model is trained can undergo a different channel normalization similar to that to be used in a given speech recognition context. The prior word model can have a first model structure and the new word model can have a second, different, model structure. These differences in model structure can include, for example, differences of model topology; differences of model complexity; and differences in the type of basis function used in a description of such probability distributions.
    Type: Grant
    Filed: February 6, 2003
    Date of Patent: November 7, 2006
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Laurence S. Gillick, Donald R. McAllaster, Daniel L. Roth
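The Monte Carlo step can be sketched with the prior model idealized as one Gaussian per feature dimension; the shapes, names, and parameter values below are assumptions for illustration, not the patent's actual model structure.

```python
import random
import statistics

def synthesize_samples(prior_means, prior_sds, n, seed=0):
    """Draw n synthetic feature vectors from a prior word model,
    idealized here as independent Gaussians per dimension."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(prior_means, prior_sds)]
            for _ in range(n)]

def train_new_model(samples):
    """Fit the new model's Gaussians from the synthetic samples. In the
    patented scheme a different channel normalization (or a different
    model structure entirely) could be applied at this stage."""
    cols = list(zip(*samples))
    return ([statistics.fmean(c) for c in cols],
            [statistics.stdev(c) for c in cols])

samples = synthesize_samples([5.0, -2.0], [1.0, 0.5], 2000)
new_means, new_sds = train_new_model(samples)
```

With enough synthetic samples the retrained parameters recover the prior model's statistics, which is what makes the cross-structure transfer work.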
  • Patent number: 7127393
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Grant
    Filed: February 10, 2003
    Date of Patent: October 24, 2006
    Assignee: Speech Works International, Inc.
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
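A minimal sketch of rule-based rescoring: each dynamic semantic rule inspects a hypothesis's keyword-value pairs plus external context and returns a score adjustment. The example rule, slot names, and scores are all invented for illustration.

```python
def apply_dynamic_rules(hypotheses, rules, context):
    """Rescore (text, score, slots) hypotheses with dynamic semantic
    rules, then re-rank by the adjusted scores."""
    rescored = [(text,
                 score + sum(rule(slots, context) for rule in rules),
                 slots)
                for text, score, slots in hypotheses]
    return sorted(rescored, key=lambda h: h[1], reverse=True)

# Hypothetical rule: penalize destinations the external context marks closed.
def closed_airport_rule(slots, context):
    return -5.0 if slots.get("dest") in context.get("closed", ()) else 0.0

hyps = [("fly to boston", 0.9, {"dest": "BOS"}),
        ("fly to austin", 0.8, {"dest": "AUS"})]
ranked = apply_dynamic_rules(hyps, [closed_airport_rule], {"closed": {"BOS"}})
```

Here the acoustically better hypothesis is demoted because the external conditions make its semantic value implausible.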
  • Patent number: 7127397
    Abstract: A method of training a computer system via human voice input from a human teacher is provided. In one embodiment, the method includes presenting a text spelling of an unknown word and receiving a human voice pronunciation of the unknown word. A phonetic spelling of the unknown word is determined. The text spelling is associated with the phonetic spelling to allow a text to speech engine to correctly pronounce the unknown word in the future when presented with the text spelling of the unknown word.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: October 24, 2006
    Assignee: Qwest Communications International Inc.
    Inventor: Eliot M. Case
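The resulting association can be modeled as an exception lexicon the text-to-speech engine consults before falling back to letter-to-sound rules; the class name and the phonetic notation below are invented placeholders.

```python
class PronunciationLexicon:
    """Minimal exception lexicon mapping text spellings to phonetic
    spellings learned from a human teacher's voice pronunciation."""
    def __init__(self):
        self._entries = {}

    def teach(self, text_spelling, phonetic_spelling):
        # Associate the text spelling with the phonetic spelling so the
        # TTS engine pronounces the word correctly in the future.
        self._entries[text_spelling.lower()] = phonetic_spelling

    def pronounce(self, text_spelling):
        # Return the taught pronunciation, or None to signal a fallback
        # to letter-to-sound rules.
        return self._entries.get(text_spelling.lower())

lexicon = PronunciationLexicon()
lexicon.teach("Qwest", "k w eh s t")
```

Determining the phonetic spelling from the teacher's audio is the hard part of the patent; this sketch only covers the storage and lookup side.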
  • Patent number: 7124130
    Abstract: The present invention is directed to an address recognition apparatus for recognizing a written address. The apparatus includes an input device that receives a scanned image of the written address and transforms the image into digital data, a character recognizing section that recognizes a word string in the digital data on a unit character basis, a word extracting section that extracts characters recognized by the character recognizing section on a unit word basis, and an address word string dictionary that previously stores a plurality of first word strings. The apparatus further includes an address word string recognizing section that collates a second word string, determines words of the second word string respectively corresponding to the words of the first word string, evaluates each of the first word strings, and recognizes one of the first word strings as the address word string.
    Type: Grant
    Filed: September 4, 2003
    Date of Patent: October 17, 2006
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Naotake Natori
  • Patent number: 7124083
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: November 5, 2003
    Date of Patent: October 17, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
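The three steps above can be sketched with a stand-in cost function (the real preselection cost comes from the unit-selection framework and is not shown in the abstract); the toy phone set and cost below are invented for illustration.

```python
from itertools import product

def build_preselection_db(triphones, phones, cost_fn, keep=1):
    """For each triphone u1-u2-u3, take the best (lowest) cost over all
    5-phoneme contexts ua-u1-u2-u3-ub, then keep only the cheapest
    triphone entries in the preselection database."""
    best_cost = {}
    for tri in triphones:
        best_cost[tri] = min(cost_fn((ua,) + tri + (ub,))
                             for ua, ub in product(phones, repeat=2))
    cheapest = sorted(best_cost, key=best_cost.get)[:keep]
    return {t: best_cost[t] for t in cheapest}

# Toy cost: number of 'b' phones in the 5-phoneme window.
phones = ["a", "b"]
tris = [("a", "a", "a"), ("b", "b", "b")]
db = build_preselection_db(tris, phones, lambda seq: seq.count("b"))
```

Precomputing these costs offline is what lets synthesis skip most of the per-phoneme search at run time.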
  • Patent number: 7092496
    Abstract: Methods and apparatus are provided for processing an information signal containing content presented in accordance with at least one modality. In one aspect of the present invention, a method of processing an information signal containing content presented in accordance with at least one modality, comprises the steps of: (i) obtaining the information signal; (ii) performing content detection on the information signal to detect whether the information signal includes particular content presented in accordance with the at least one modality; and (iii) generating a control signal, when the particular content is detected, for use in controlling a rendering property of the particular content and/or implementation of a specific action relating to the particular content.
    Type: Grant
    Filed: September 18, 2000
    Date of Patent: August 15, 2006
    Assignee: International Business Machines Corporation
    Inventors: Stephane Herman Maes, Mukund Padmanabhan, Jeffrey Scott Sorensen