Markov Patents (Class 704/256)
  • Patent number: 6523005
    Abstract: A method and also a configuration for determining a descriptive feature of a speech signal, in which a first speech model is trained with a first time pattern and a second speech model is trained with a second time pattern. The second speech model is initialized with the first speech model.
    Type: Grant
    Filed: September 10, 2001
    Date of Patent: February 18, 2003
    Assignee: Siemens Aktiengesellschaft
    Inventor: Martin Holzapfel
  • Patent number: 6519562
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Grant
    Filed: February 25, 1999
    Date of Patent: February 11, 2003
    Assignee: Speechworks International, Inc.
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
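The re-scoring flow this abstract describes can be sketched in a few lines: hypotheses carry a probability score plus keyword-value pairs, and rules adjust scores from external conditions before re-ranking. This is a minimal illustrative sketch, not the patent's implementation; all names and the example rule are assumptions.

```python
def rescore(hypotheses, rules, environment):
    """Apply each dynamic semantic rule to each hypothesis, then re-rank."""
    for hyp in hypotheses:
        for rule in rules:
            hyp["score"] = rule(hyp["score"], hyp["semantics"], environment)
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)

# Hypothetical rule: penalize a hypothesis whose departure airport is closed,
# an "external condition" in the abstract's sense.
def no_flights_rule(score, semantics, env):
    if semantics.get("origin") in env.get("closed_airports", set()):
        return score * 0.5
    return score

hyps = [
    {"words": "fly from boston", "score": 0.9, "semantics": {"origin": "BOS"}},
    {"words": "fly from austin", "score": 0.8, "semantics": {"origin": "AUS"}},
]
env = {"closed_airports": {"BOS"}}
ranked = rescore(hyps, [no_flights_rule], env)
print(ranked[0]["words"])  # prints "fly from austin"
```

The key point is that the acoustic N-best order (Boston first) is overturned by semantic knowledge of the application environment.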
  • Patent number: 6519563
    Abstract: A speaker verification method and apparatus which advantageously minimizes the constraints on the customer and simplifies the system architecture by using a speaker dependent, rather than a speaker independent, background model, thereby obtaining many of the advantages of using a background model in a speaker verification process without many of the disadvantages thereof. In particular, no training data (e.g. speech) from anyone other than the customer is required, no speaker independent models need to be produced, no a priori knowledge of acoustic rules are required, and, no multi-lingual phone models, dictionaries, or letter-to-sound rules are needed. Nonetheless, in accordance with an illustrative embodiment of the present invention, the customer is free to select any password phrase in any language.
    Type: Grant
    Filed: November 22, 1999
    Date of Patent: February 11, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Olivier Siohan, Arun Chandrasekaran Surendran
  • Patent number: 6510411
    Abstract: A simplification of the process of developing call or dialog flows for use in an Interactive Voice Response system is provided. Three principal aspects of the invention include a task-oriented dialog model (or task model), development tool and a Dialog Manager. The task model is a framework for describing the application-specific information needed to perform the task. The development tool is an object that interprets a user specified task model and outputs information for a spoken dialog system to perform according to the specified task model. The Dialog Manager is a runtime system that uses output from the development tool in carrying out interactive dialogs to perform the task specified according to the task model. The Dialog Manager conducts the dialog using the task model and its built-in knowledge of dialog management. Thus, generic knowledge of how to conduct a dialog is separated from the specific information to be collected in a particular application.
    Type: Grant
    Filed: October 29, 1999
    Date of Patent: January 21, 2003
    Assignee: Unisys Corporation
    Inventors: Lewis M. Norton, Deborah A. Dahl, Marcia C. Linebarger
  • Patent number: 6507816
    Abstract: A method and system for evaluating the accuracy of a computer speech recognition system counts and indexes the total number of words dictated and the number of words corrected. The corrections are tallied after being made in a correction window and include words contained in an alternative list as well as words input by the user and within a stored word database. A processor calculates the approximate accuracy of the speech recognition system as the ratio of the number of correct words to the total number of words dictated. An accuracy ratio is calculated for each dictation session and an overall ratio is calculated for all sessions combined. The system also keeps individual and overall indexes of the number of times the corrected words were in alternate lists or not within the word database and uses these indexes to calculate additional accuracy values.
    Type: Grant
    Filed: May 4, 1999
    Date of Patent: January 14, 2003
    Assignee: International Business Machines Corporation
    Inventor: Kerry A. Ortega
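The accuracy calculation the abstract describes is a simple ratio; a hedged sketch (variable names are illustrative, not from the patent):

```python
def session_accuracy(total_words, corrected_words):
    """Accuracy as the ratio of correct words to total words dictated."""
    if total_words == 0:
        return 0.0  # no dictation yet
    return (total_words - corrected_words) / total_words

# Per-session ratios plus an overall ratio across all sessions combined,
# as the abstract outlines (counts here are made up).
sessions = [(100, 8), (250, 10)]  # (words dictated, words corrected)
per_session = [session_accuracy(t, c) for t, c in sessions]
overall = session_accuracy(sum(t for t, _ in sessions),
                           sum(c for _, c in sessions))
```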
  • Publication number: 20030009334
    Abstract: A speech processing board configured in accordance with the inventive arrangements can include multiple processor modules, each processor module having an associated local memory, each processor module hosting at least one instance of a speech application task; a storage system for storing speech task data, the speech task data including language models and finite state grammars; a local communications bus communicatively linking each processor module through which each processor module can exchange speech task data with the storage system; and, a communications bridge to a host system, wherein the communications bridge can provide an interface to the local communications bus through which data can be exchanged between the processor modules and the host system. Notably, the host system can be a CT media services system or a VoIP gateway/endpoint.
    Type: Application
    Filed: July 3, 2001
    Publication date: January 9, 2003
    Applicant: International Business Machines Corporation
    Inventors: Harry W. Printz, Bruce A. Smith
  • Patent number: 6505156
    Abstract: A keyword is recognized in spoken language by assuming a start of this keyword at every sampling time. An attempt is then made to map the keyword onto a sequence of HMM states that represent it. The best path in a representation space is determined with the Viterbi algorithm, with a local confidence measure employed instead of the emission probability usually used in the Viterbi algorithm. When a global confidence measure composed of the local confidence measures falls below a lower threshold for the best Viterbi path, the keyword is recognized and the sampling time assumed as the start of the keyword is confirmed.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: January 7, 2003
    Assignee: Siemens Aktiengesellschaft
    Inventors: Jochen Junkawitsch, Harald Höge
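The core of this keyword-spotting idea is a Viterbi pass over the keyword's left-to-right HMM states in which a local confidence score replaces the emission probability. A minimal sketch follows; the confidence values are invented, and the sign convention is flipped relative to the abstract (here higher accumulated confidence is better and detection means exceeding a threshold, whereas the patent's global measure falls below a lower barrier).

```python
NEG_INF = float("-inf")

def viterbi_confidence(local_conf):
    """local_conf[t][s]: confidence of frame t in keyword state s.
    Left-to-right topology: a path may stay in a state or advance by one.
    Returns the best accumulated confidence of a path ending in the
    final keyword state."""
    n_frames, n_states = len(local_conf), len(local_conf[0])
    prev = [local_conf[0][0]] + [NEG_INF] * (n_states - 1)
    for t in range(1, n_frames):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            best_prev = prev[s] if s == 0 else max(prev[s], prev[s - 1])
            if best_prev > NEG_INF:
                cur[s] = best_prev + local_conf[t][s]
        prev = cur
    return prev[-1]

# Three frames, two keyword states; a start is hypothesized at frame 0.
conf = [[0.9, 0.0], [0.8, 0.7], [0.1, 0.9]]
score = viterbi_confidence(conf)
detected = score > 2.0  # hypothetical detection threshold
```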
  • Patent number: 6502072
    Abstract: A method and apparatus is provided for two-tier noise rejection in speech recognition. The method and apparatus convert an analog speech signal into a digital signal and extract features from the digital signal. A hypothesis speech word and a hypothesis noise word are identified from respective extracted features. The features associated with the hypothesis speech word are examined in a second tier of noise rejection to determine if the features are more likely to represent noise than speech. The hypothesis speech word is replaced by a noise marker if the features are more likely to represent noise than speech.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: December 31, 2002
    Assignee: Microsoft Corporation
    Inventors: Li Jiang, Xuedong Huang
  • Patent number: 6499015
    Abstract: The present invention enables a computer user to select a function represented via a graphical user interface by speaking a command related to the function into audio processing circuitry. A voice recognition program interprets the spoken words to determine the function desired for execution. The user may use the cursor to identify an element on the graphical user interface display, or may speak the name of that element. The computer responds to the identification of the element by displaying a menu of the voice commands associated with that element.
    Type: Grant
    Filed: August 12, 1999
    Date of Patent: December 24, 2002
    Assignee: International Business Machines Corporation
    Inventors: Brian S. Brooks, Keith P. Loring, Maria Milenkovic
  • Patent number: 6499012
    Abstract: A method and apparatus for generating a pair of data elements is provided suitable for use in a speaker verification system. The pair includes a first element representative of a speaker independent template and a second element representative of an extended speaker specific speech pattern. An audio signal forming enrollment data associated with a given speaker is received and processed to derive a speaker independent template and a speaker specific speech pattern. The speaker specific speech pattern is then processed to derive an extended speaker specific speech pattern. The extended speaker specific speech pattern includes a set of expanded speech models, each expanded speech model including a plurality of groups of states, the groups of states being linked to one another by inter-group transitions. Optionally, the expanded speech models are processed on the basis of the enrollment data to condition at least one of the plurality of inter-group transitions.
    Type: Grant
    Filed: December 23, 1999
    Date of Patent: December 24, 2002
    Assignee: Nortel Networks Limited
    Inventors: Stephen Douglas Peters, Matthieu Hebert, Daniel Boies
  • Publication number: 20020184025
    Abstract: A speech recognition system (10) having a sampler block (12) and a feature extractor block (14) for extracting time domain and spectral domain parameters from spoken input speech into a feature vector. A polynomial expansion block (16) generates polynomial coefficients from the feature vector. A correlator block (20), a sequence vector block (22), an HMM table (24) and a Viterbi block (26) perform the actual speech recognition based on the speech units stored in a speech unit table (18) and the HMM word models stored in the HMM table (24). The HMM word model that produces the highest probability is determined to be the word that was spoken.
    Type: Application
    Filed: May 31, 2001
    Publication date: December 5, 2002
    Applicant: Motorola, Inc.
    Inventors: David L. Barron, William Chunhung Yip
  • Patent number: 6490557
    Abstract: The present invention is embodied in a system and method for recognizing speech and transcribing speech in real time. The system includes a computer, which could be in a LAN or WAN linked to other computer systems through the Internet. The computer has a controller, or similar device, to filter background noise and convert incoming signals to digital format. The digital signals are transcribed to a word list, which is processed by an automatic speech recognition system. This system synchronizes and compares the lists and forwards the list to a speech recognition learning system, which stores the data on-site. The stored data is forwarded to an off-site storage system, and an off-site large scale learning system that processes the data from all sites on the wide area network system.
    Type: Grant
    Filed: March 3, 1999
    Date of Patent: December 3, 2002
    Inventor: John C. Jeppesen
  • Publication number: 20020173959
    Abstract: A method of speech recognition with noise compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. For each speech utterance, the mean MFCC vector is calculated over the clean database, and this mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance, and the model mean vectors are adapted to the noise. The mean vector of the noisy data over the noisy speech space is then determined and removed from the model mean vectors adapted to noise to obtain the target model.
    Type: Application
    Filed: January 18, 2002
    Publication date: November 21, 2002
    Inventor: Yifan Gong
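The abstract's compensation steps are essentially mean-vector arithmetic. A rough sketch using plain lists as stand-ins for MFCC vectors; the adaptation step (how model means move toward the noise estimate) is simplified here to a fixed interpolation weight, which is an assumption, not the patent's formula.

```python
def add(a, b):     return [x + y for x, y in zip(a, b)]
def sub(a, b):     return [x - y for x, y in zip(a, b)]
def lerp(a, b, w): return [x + w * (y - x) for x, y in zip(a, b)]

clean_model_mean  = [1.0, 2.0]  # CMN-trained model mean
clean_global_mean = [0.5, 0.5]  # mean MFCC of the clean database
noise_estimate    = [3.0, 1.0]  # background-noise estimate
noisy_global_mean = [1.2, 1.1]  # mean over the noisy speech space

# 1. Restore the clean global mean that CMN removed from the models.
restored = add(clean_model_mean, clean_global_mean)
# 2. Adapt the model mean toward the noise estimate (illustrative weight).
adapted = lerp(restored, noise_estimate, 0.2)
# 3. Remove the noisy-data mean to obtain the target model mean.
target = sub(adapted, noisy_global_mean)
```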
  • Publication number: 20020165717
    Abstract: The invention provides a method and system for extracting information from text documents. A document intake module receives and stores a plurality of text documents for processing, an input format conversion module converts each document into a standard format for processing, an extraction module identifies and extracts desired information from each text document, and an output format conversion module converts the information extracted from each document into a standard output format. These modules operate simultaneously on multiple documents in a pipeline fashion so as to maximize the speed and efficiency of extracting information from the plurality of documents.
    Type: Application
    Filed: April 8, 2002
    Publication date: November 7, 2002
    Inventors: Robert P. Solmer, Christopher K. Harris, Mauritius A.R. Schmidtler, James W. Dolter
  • Patent number: 6470315
    Abstract: Speech recognition and the generation of speech recognition models are provided, including the generation of unique phonotactic garbage models (15) that identify speech by, for example, English-language constraints, in addition to noise, silence and other non-speech models (11) and, for speech recognition, specific word models.
    Type: Grant
    Filed: September 11, 1996
    Date of Patent: October 22, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Lorin Paul Netsch, Barbara Janet Wheatley
  • Patent number: 6466908
    Abstract: A system and method for training a class-specific hidden Markov model (HMM) is used for modeling physical phenomena, such as speech, characterized by a finite number of states. The method receives training data and estimates parameters of the class-specific HMM from the training data using a modified Baum-Welch algorithm, which uses likelihood ratios with respect to a common state (e.g., noise) and based on sufficient statistics for each state. The parameters are stored for use in processing signals representing the physical phenomena, for example, in speech processing applications. The modified Baum-Welch algorithm is an iterative algorithm including class-specific forward and backward procedures and HMM reestimation formulas.
    Type: Grant
    Filed: January 14, 2000
    Date of Patent: October 15, 2002
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventor: Paul M. Baggenstoss
  • Patent number: 6463413
    Abstract: A distributed speech processing system for constructing speech recognition reference models that are to be used by a speech recognizer in a small hardware device, such as a personal digital assistant or cellular telephone. The speech processing system includes a speech recognizer residing on a first computing device and a speech model server residing on a second computing device. The speech recognizer receives speech training data and processes it into an intermediate representation of the speech training data. The intermediate representation is then communicated to the speech model server. The speech model server generates a speech reference model by using the intermediate representation of the speech training data and then communicates the speech reference model back to the first computing device for storage in a lexicon associated with the speech recognizer.
    Type: Grant
    Filed: April 20, 1999
    Date of Patent: October 8, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Ted H. Applebaum, Jean-Claude Junqua
  • Publication number: 20020143540
    Abstract: A voice recognition (VR) system is disclosed that utilizes a combination of speaker independent (SI) and speaker dependent (SD) acoustic models. At least one SI acoustic model is used in combination with at least one SD acoustic model to provide a level of speech recognition performance that at least equals that of a purely SI acoustic model. The disclosed hybrid SI/SD VR system continually uses unsupervised training to update the acoustic templates in the one or more SD acoustic models.
    Type: Application
    Filed: March 28, 2001
    Publication date: October 3, 2002
    Inventors: Narendranath Malayath, Andrew P. DeJaco, Chienchung Chang, Suhail Jalil, Ning Bi, Harinath Garudadri
  • Patent number: 6460017
    Abstract: When adapting a lexicon in a speech recognition system, a code book of hidden Markov sound models supplied with the speech recognition system is adapted for specific applications, which are defined by an application lexicon that the user can modify. Adaptation takes place during operation, by shifting the stored mean vectors of the probability density distributions of the hidden Markov models in the direction of a recognized feature vector of the sound expressions, with reference to the specific hidden Markov models employed. Compared with standard methods, this approach has the advantage that it runs on-line and achieves a very high recognition rate at a low computational cost. Furthermore, the effort of training specific sound models for corresponding applications is avoided.
    Type: Grant
    Filed: June 10, 1999
    Date of Patent: October 1, 2002
    Assignee: Siemens Aktiengesellschaft
    Inventors: Udo Bub, Harald Höge, Joachim Köhler
  • Patent number: 6456971
    Abstract: A pattern recognition system and method for optimal reduction of redundancy and size of a weighted and labeled graph includes receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph, where the graph is labeled with word sequences and weighted with probabilities, and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown on a display device so that recognition of speech signals can be demonstrated.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: September 24, 2002
    Assignee: AT&T Corp.
    Inventors: Mehryar Mohri, Fernando Carlos Neves Pereira, Michael Dennis Riley
  • Patent number: 6456970
    Abstract: The search network in a speech recognition system is reduced by parsing the incoming speech, expanding all active paths (101), comparing them to speech models, scoring the paths, storing recognition-level values in slots (103), and accumulating the scores; when a word end is detected, previous slots are discarded and a word-end slot is created (109).
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: September 24, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
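The expand-score-discard loop the abstract outlines resembles standard beam pruning. An illustrative sketch (not the patent's slot mechanism): "paths" are (accumulated cost, state) pairs, each frame expands every active path against stand-in model costs, and paths outside a beam of the best score are discarded.

```python
def step(active, frame_costs, beam=3.0):
    """Expand all active paths by one frame, score them, and keep only
    those within `beam` of the best accumulated cost."""
    expanded = []
    for score, state in active:
        for nxt in (state, state + 1):  # stay in the state or advance
            if nxt < len(frame_costs):
                expanded.append((score + frame_costs[nxt], nxt))
    best = min(s for s, _ in expanded)
    return [(s, st) for s, st in expanded if s - best <= beam]

# Three frames of (made-up) per-state model costs for a 3-state word.
active = [(0.0, 0)]
for costs in [[1.0, 5.0, 9.0], [1.0, 2.0, 9.0], [4.0, 1.0, 1.0]]:
    active = step(active, costs)
best_score = min(s for s, _ in active)
```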
  • Publication number: 20020133345
    Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
    Type: Application
    Filed: January 12, 2001
    Publication date: September 19, 2002
    Inventor: Harinath Garudadri
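Mu-law companding, one of the two schemes the abstract names, maps amplitudes logarithmically so that quantizing the compressed value loses less perceptual detail. A sketch with the conventional mu = 255 (the patent's exact parameters and template layout are not reproduced here):

```python
import math

MU = 255.0

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1] with logarithmic companding."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.25
roundtrip = mu_expand(mu_compress(x))
# roundtrip is numerically close to x; quantizing the compressed value
# to a small integer range is what makes the scheme lossy in practice.
```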
  • Patent number: 6435877
    Abstract: A training tool for training and assessing one or more auditory processing, phonological awareness, phonological processing and reading skills of an individual is provided. The training tool may use various graphical games to train the individual's ability in a particular set of auditory processing, phonological awareness, phonological processing and reading skills. The system may use speech recognition technology to permit the user to interact with the games.
    Type: Grant
    Filed: July 20, 2001
    Date of Patent: August 20, 2002
    Assignee: Cognitive Concepts, Inc.
    Inventor: Janet M. Wasowicz
  • Patent number: 6438520
    Abstract: The apparatus, method and system of the present invention provide for cross-speaker speech recognition, and are particularly suited for telecommunication applications such as automatic name (voice) dialing, message management, call return management, and incoming call screening. The method of the present invention includes receiving incoming speech, such as an incoming caller name, and generating a phonetic transcription of the incoming speech with a speaker-independent, hidden Markov model having an unconstrained grammar in which any phoneme may follow any other phoneme, followed by determining a transcription parameter as a likelihood of fit of the incoming speech to the speaker-independent model.
    Type: Grant
    Filed: January 20, 1999
    Date of Patent: August 20, 2002
    Assignee: Lucent Technologies Inc.
    Inventors: Carol Lynn Curt, Rafid Antoon Sukkar, John Joseph Wisowaty
  • Patent number: 6434522
    Abstract: A device that utilizes an HMM and is capable of achieving recognition at high accuracy with fewer calculations. The present device has a vector quantizing circuit generating a model by quantizing vectors of a training pattern having a vector series and converting the vectors into a label series of the clusters to which they belong; a continuous distribution probability density HMM generating circuit for generating a continuous distribution probability density HMM from the quantized vector series corresponding to each label of the label series; and a label incidence calculating circuit for calculating the incidence of the labels in each state from the training vectors classified into the same clusters and the continuous distribution probability density HMM.
    Type: Grant
    Filed: May 28, 1997
    Date of Patent: August 13, 2002
    Inventor: Eiichi Tsuboka
  • Patent number: 6430532
    Abstract: A method determines a representative sound on the basis of a structure which includes a set of sound models. Each sound model has at least one representative for the modeled sound. In the structure, a first sound model, matching with regard to a first quality criterion, is determined from the set of sound models. At least one second sound model is determined from the set of sound models dependent on a characteristic state criterion of the structure. At least some of the representatives of the first sound model and of the at least one second sound model are assessed in addition to the first quality criterion with regard to a second quality criterion. The at least one representative which has an adequate overall quality criterion with regard to the first and second quality criteria is determined as a representative sound from the representatives of the first sound model and the at least one second sound model.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: August 6, 2002
    Assignee: Siemens Aktiengesellschaft
    Inventor: Martin Holzapfel
  • Publication number: 20020095288
    Abstract: A method of determining the language of a text message received by a mobile telecommunications device comprises receiving an input text message at a mobile telecommunications device; analysing the input text message using language information stored in the mobile telecommunications device; selecting, from a group of languages defined by the language information, a most likely language for the input text message; and outputting, from the mobile telecommunications device, speech signals corresponding to the input text message, in the selected language.
    Type: Application
    Filed: September 5, 2001
    Publication date: July 18, 2002
    Inventors: Erik Sparre, Alberto Jimenez Feltstrom
  • Patent number: 6418411
    Abstract: The system uses utterances recorded in low-noise conditions, such as in a car with the engine off, to optimally adapt speech acoustic models to transducer and speaker characteristics, and uses speech pauses to adjust the adapted models to changing background noise, such as in a car with the engine running.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: July 9, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6418412
    Abstract: A speech recognition system utilizes multiple quantizers to process frequency parameters and mean compensated frequency parameters derived from an input signal. The quantizers may be matrix and vector quantizer pairs, and such quantizer pairs may also function as front ends to a second stage speech classifiers such as hidden Markov models (HMMs) and/or utilizes neural network postprocessing to, for example, improve speech recognition performance. Mean compensating the frequency parameters can remove noise frequency components that remain approximately constant during the duration of the input signal. HMM initial state and state transition probabilities derived from common quantizer types and the same input signal may be consolidated to improve recognition system performance and efficiency. Matrix quantization exploits the “evolution” of the speech short-term spectral envelopes as well as frequency domain information, and vector quantization (VQ) primarily operates on frequency domain information.
    Type: Grant
    Filed: August 28, 2000
    Date of Patent: July 9, 2002
    Assignee: Legerity, Inc.
    Inventors: Safdar M. Asghar, Lin Cong
  • Publication number: 20020087315
    Abstract: A computer-implemented method and system for speech recognition of a user speech input. The user speech input which contains utterances from a user is received. A first language model recognizes at least a portion of the utterances from the user speech input. The first language model has utterance terms that form a general category. A second language model is selected based upon the identified utterances from use of the first language model. The second language model contains utterance terms that are a subset category of the general category of utterance terms in the first language model. Subset utterances are recognized with the selected second language model from the user speech input.
    Type: Application
    Filed: May 23, 2001
    Publication date: July 4, 2002
    Inventors: Victor Wai Leung Lee, Otman A. Basir, Fakhreddine O. Karray, Jiping Sun, Xing Jing
  • Patent number: 6411929
    Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
    Type: Grant
    Filed: July 26, 2000
    Date of Patent: June 25, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
  • Patent number: 6411683
    Abstract: An automated telephone call designation system includes a database that stores a plurality of keywords where each keyword is associated with at least one topic designation. The system monitors the conversation of an ongoing telephone call by utilizing voice recognition software resident in a network to detect the use of the keywords in the conversation. The keywords used in the conversation are correlated to the topic designation(s) associated with the keywords. Based on the correlation of the keywords to the topic designation(s) associated with the keywords, a topic for the ongoing telephone call is designated. A third party that desires to join an ongoing conversation of interest reviews the topics of the ongoing conversations and is bridged into the conversation of interest.
    Type: Grant
    Filed: February 9, 2000
    Date of Patent: June 25, 2002
    Assignee: AT&T Corp.
    Inventors: Randy G. Goldberg, Robert Edward Markowitz, Kenneth H. Rosen
  • Patent number: 6405168
    Abstract: A speech recognition training system that provides for model generation to be used within speaker dependent speech recognition systems requiring very limited training data, including single-token training. The present invention provides a very fast and reliable training method based on the segmentation of a speech signal for subsequent estimation of speaker dependent word models. In addition, the invention provides a robust method of performing end-point detection of a word contained within a speech utterance or speech signal, and is ideally suited to speaker dependent speech recognition systems that employ word-based speaker dependent models. The end-point detection method is operable to extract a desired word or phrase from a speech signal recorded in varying degrees of undesirable background noise. The invention also provides a simplified method of building the speaker dependent models using a simplified hidden Markov modeling method.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: June 11, 2002
    Assignee: Conexant Systems, Inc.
    Inventors: Aruna Bayya, Dianne L. Steiger
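A common baseline for the end-point detection problem this abstract describes is an energy threshold over frames; the sketch below is only that baseline, not the patent's method, and the threshold and energies are invented.

```python
def detect_endpoints(frame_energies, threshold):
    """Return (start, end) indices of the first and last frames whose
    energy exceeds the threshold, or None if no speech is found."""
    start = end = None
    for i, e in enumerate(frame_energies):
        if e > threshold:
            if start is None:
                start = i
            end = i
    return None if start is None else (start, end)

# Low-energy frames on either side stand in for background noise.
energies = [0.1, 0.2, 2.5, 3.1, 2.8, 0.3, 0.1]
span = detect_endpoints(energies, 1.0)  # (2, 4)
```

Real systems refine this with hangover times and adaptive noise floors; the point here is only the word-extraction boundary decision.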
  • Patent number: 6401064
    Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
    Type: Grant
    Filed: May 24, 2001
    Date of Patent: June 4, 2002
    Assignee: AT&T Corp.
    Inventor: Lawrence Kevin Saul
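The arc-length idea above can be made concrete with a discrete curve: each feature vector contributes in proportion to the distance traveled along the trajectory, so a stationary stretch contributes nothing. A toy sketch with made-up 2-D features (not the patent's weighting scheme):

```python
import math

def arc_length(curve):
    """Sum of Euclidean distances between successive feature vectors."""
    total = 0.0
    for p, q in zip(curve, curve[1:]):
        total += math.dist(p, q)
    return total

# The repeated point models a stationary (e.g. steady vowel) segment;
# it adds no arc length, so the non-stationary transitions dominate.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
length = arc_length(trajectory)  # 10.0
```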
  • Patent number: 6401065
    Abstract: An intelligent, user-friendly keyboard interface that is easily adaptable for a wide variety of functions and features, and also adaptable to reduced-size portable computers. Speech recognition and semantic processing for controlling and interpreting multiple symbols are used in conjunction with programmable switches with embedded LCD displays. Hidden Markov models are employed to interpret a combination of voice and keyboard input.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: June 4, 2002
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, Stephane Maes, Clifford A. Pickover, Alexander Zlatsin
  • Patent number: 6397181
    Abstract: A method, an apparatus, a computer program product and a system for voice annotating and retrieving digital media content are disclosed. An annotation module (420) post annotates digital media data (410), including audio, image and/or video data, with speech. A word lattice (222) can be created from speech annotation (210) dependent upon acoustic and/or linguistic knowledge. An indexing module (430) then indexes the speech-annotated data (422). The word lattice (222) is reverse indexed (230), and content addressing (240) is applied to produce the indexed data (432, 242). A speech query (474) can be generated as input to a retrieval module (480) for retrieving a segment of the indexed digital media data (432). The speech query (474, 310) is converted into a word lattice (322), and a shortlist (344) is produced from it (322) by confidence filtering (330). The shortlist (344) is input to a lattice search engine (350) to search the indexed content (342) to obtain the search result (352).
    Type: Grant
    Filed: June 4, 1999
    Date of Patent: May 28, 2002
    Assignee: Kent Ridge Digital Labs
    Inventors: Haizhou Li, Jiankang Wu, Arcot Desai Narasimhalu
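The "reverse indexing" step described above can be sketched as a plain inverted index from annotation words to media items; the flat word lists here are a simplification of the patent's word lattices, and all names are assumptions.

```python
from collections import defaultdict

def build_reverse_index(annotations):
    """Map each word to the set of media ids whose annotation contains it.

    `annotations` maps a media id to the list of hypothesis words
    recovered from its speech annotation (lattice arcs flattened here
    for simplicity).
    """
    index = defaultdict(set)
    for media_id, words in annotations.items():
        for word in words:
            index[word].add(media_id)
    return index

annotations = {
    "clip1": ["beach", "sunset", "family"],
    "clip2": ["beach", "volleyball"],
}
index = build_reverse_index(annotations)
print(sorted(index["beach"]))  # ['clip1', 'clip2']
```

A speech query would then be decoded into its own word candidates and each candidate looked up in this index to produce the shortlist the abstract mentions.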
  • Patent number: 6397182
    Abstract: A system and a method for generating a speech recognition dictionary that can be used in a telephone system having speech recognition capabilities, in particular capabilities to effect a connection when the calling party utters the name of a subscriber (called party). The method generates transcriptions associated with respective vocabulary items in the speech recognition dictionary from audio greetings recorded by the telephone system subscribers. Normally such audio greetings are used in voice messaging applications. Typically, the greetings are played before allowing callers to leave messages in a voice mailbox of subscribers. An individual greeting is audio information that contains the name of the subscriber. This audio information is processed to generate a transcription associated with a vocabulary item in the speech recognition dictionary, representative of the subscriber name.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: May 28, 2002
    Assignee: Nortel Networks Limited
    Inventors: Brian Cruickshank, Pierre M. Forgues, Lin Lin
  • Patent number: 6389395
    Abstract: Out-of-vocabulary word models for a speech recognizer vocabulary are generated by forming phonemic transcriptions (phonetic baseforms) of user's utterances in terms of existing reference phonemes by using a speech recognition algorithm to match input sub-word feature sample sequences to suitably-constrained allowable sequences of existing reference phoneme features. The resultant new-vocabulary-word phonetic baseform models are stored for subsequent speech recognition using the same recognition algorithm.
    Type: Grant
    Filed: April 4, 1997
    Date of Patent: May 14, 2002
    Assignee: British Telecommunications public limited company
    Inventor: Simon P. Ringland
  • Publication number: 20020055842
    Abstract: The invention enables even a CPU having low processing performance to find an HMM output probability by simplifying arithmetic operations. The dimensions of an input vector are grouped into several sets, and tables are created for the sets. When an output probability is calculated, codes corresponding to the first through n-th dimensions of the input vector are sequentially obtained, and for each code, by referring to the corresponding table, output values for each table are obtained. By substituting the output values for each table into a formula for finding an output probability, the output probability is found.
    Type: Application
    Filed: September 19, 2001
    Publication date: May 9, 2002
    Applicant: Seiko Epson Corporation
    Inventor: Yasunaga Miyazawa
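The table-lookup scheme described above can be sketched like this; the quantizer, the field names, and the toy tables are all assumptions, and only the idea (group dimensions, look up precomputed partial values, combine them) comes from the abstract.

```python
import math

def quantize(values, step=1.0):
    """Map one group of vector dimensions to a single integer code
    (toy scalar quantizer; the patent's codebook is not specified here)."""
    return int(round(sum(values) / step))

def output_log_prob(vector, groups, tables):
    """Sum precomputed per-group log-probabilities instead of evaluating
    Gaussians at decode time, so a low-power CPU avoids exp/multiply."""
    total = 0.0
    for group, table in zip(groups, tables):
        code = quantize([vector[d] for d in group])
        total += table[code]
    return total

groups = [(0, 1), (2, 3)]                      # dimensions 0-1 and 2-3
tables = [{1: math.log(0.5)}, {3: math.log(0.25)}]
vec = [0.4, 0.6, 1.2, 1.9]
print(output_log_prob(vec, groups, tables))    # log(0.5) + log(0.25)
```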
  • Patent number: 6385579
    Abstract: A method of forming an augmented textual training corpus with compound words for use with a speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair.
    Type: Grant
    Filed: April 29, 1999
    Date of Patent: May 7, 2002
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George Andrei Saon
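The first measure above (average of the direct bigram probability P(w2|w1) and the reverse bigram probability P(w1|w2)) can be computed from raw counts; the corpus below is invented for illustration.

```python
from collections import Counter

def bigram_measure(corpus, w1, w2):
    """Average of direct and reverse bigram probabilities for (w1, w2),
    estimated from raw counts in a token list."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    direct = bigrams[(w1, w2)] / unigrams[w1]   # P(w2 | w1)
    reverse = bigrams[(w1, w2)] / unigrams[w2]  # P(w1 | w2)
    return (direct + reverse) / 2.0

corpus = ["new", "york", "is", "big", "new", "york", "new", "jersey"]
m = bigram_measure(corpus, "new", "york")
print(m)  # 2 of 3 "new" precede "york"; both "york" follow "new" -> (2/3 + 1)/2
```

If the measure exceeds the chosen threshold, the pair would be rewritten as a single compound token (e.g. a hypothetical `new-york`) throughout the corpus.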
  • Patent number: 6377921
    Abstract: A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
    Type: Grant
    Filed: June 26, 1998
    Date of Patent: April 23, 2002
    Assignee: International Business Machines Corporation
    Inventors: Lalit R. Bahl, Mukund Padmanabhan
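The tagging step described above can be sketched as flagging instances in the low tail of each basic unit's score distribution; the fractional cut-off is an assumption, since the abstract only says a particular range of scores is selected based on a threshold value.

```python
from collections import defaultdict

def tag_mismatches(scored_instances, fraction=0.2):
    """scored_instances: list of (unit, instance_id, log_prob_score).
    Return instance_ids whose score lies in the lowest `fraction` of
    that unit's score distribution (assumed cut-off rule)."""
    by_unit = defaultdict(list)
    for unit, inst, score in scored_instances:
        by_unit[unit].append((score, inst))
    flagged = []
    for unit, items in by_unit.items():
        items.sort()                             # ascending by score
        cut = max(1, int(len(items) * fraction))
        flagged.extend(inst for _, inst in items[:cut])
    return flagged

data = [("AH", 1, -2.0), ("AH", 2, -9.5), ("AH", 3, -2.2),
        ("AH", 4, -1.9), ("AH", 5, -2.1)]
print(tag_mismatches(data))  # instance 2 scores far below its peers -> [2]
```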
  • Patent number: 6377924
    Abstract: A method of enrolling phone-based speaker specific commands includes the first step of providing a set (H) of speaker-independent phone-based Hidden Markov Models (HMMs), a grammar (G) comprising a loop of phones with optional between-word silence (BWS), and two utterances U1 and U2 of the command produced by the enrollment speaker, wherein the first frames of the first utterance contain only background noise. The processor generates a sequence of phone-like HMMs and the number of HMMs in that sequence as output. The second step performs model mean adjustment to suit enrollment microphone and speaker characteristics and performs segmentation. The third step generates an HMM for each segment except for silence for utterance U1. The fourth step re-estimates the HMM using both utterances U1 and U2.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: April 23, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Coimbatore S. Ramalingam
  • Publication number: 20020046017
    Abstract: A method prepares a functional finite-state transducer (FST) with an epsilon or empty string on the input side for factorization into a bimachine. The method creates a left-deterministic input finite-state automaton (FSA) by extracting and left-determinizing the input side of the functional FST. Subsequently, the corresponding sub-paths in the FST are identified for each arc in the left-deterministic FSA and aligned.
    Type: Application
    Filed: December 18, 2000
    Publication date: April 18, 2002
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20020046030
    Abstract: Information that is latent in a caller's voice is processed for purposes of improving the handling of the call in any type of voice-interactive application. This implicit information in a caller's voice is not related to the actual words being said but rather to the characteristics of how those words are being said. This information, related to the caller's unique demographic profile, is used to decide how to respond to the caller for improved business performance. For example, by estimating the age and the gender of a caller based on his/her voice signal, a vendor associated with a calling center or Web site is able to make a sophisticated choice of what advertisement to present to the user or how to formulate a response to the caller. Similarly, this latent voice information can be used to determine which agent is likely best suited to handle a call with a caller with an estimated demographic, with the caller then being connected to that agent.
    Type: Application
    Filed: May 16, 2001
    Publication date: April 18, 2002
    Inventors: Jayant Ramaswamy Haritsa, Daniel Francis Lieuwen
  • Publication number: 20020046031
    Abstract: A method is described for compressing the storage space required by HMM prototypes in an electronic memory. For this purpose prescribed HMM prototypes are mapped onto compressed HMM prototypes with the aid of a neural network (encoder). These can be stored with a smaller storage space than the uncompressed HMM prototypes. A second neural network (decoder) serves to reconstruct the HMM prototypes.
    Type: Application
    Filed: September 6, 2001
    Publication date: April 18, 2002
    Applicant: Siemens Aktiengesellschaft
    Inventor: Harald Hoege
  • Patent number: 6374221
    Abstract: Automatic retraining of a speech recognizer during its normal operation in conjunction with an electronic device responsive to the speech recognizer is addressed. In this retraining, stored trained models are retrained on the basis of recognized user utterances. Feature vectors, model state transitions, and tentative recognition results are stored upon processing and evaluation of speech samples of the user utterances. A reliable transcript is determined for later adaptation of a speech model, in dependence upon the user's successive behavior when interacting with the speech recognizer and the electronic device. For example, in a name dialing process, such behavior can be manual or voice re-dialing of the same number, dialing of a different phone number, immediately aborting an established communication, or breaking it off after a short period of time.
    Type: Grant
    Filed: June 22, 1999
    Date of Patent: April 16, 2002
    Assignee: Lucent Technologies Inc.
    Inventor: Raziel Haimi-Cohen
  • Patent number: 6374212
    Abstract: A continuous, speaker independent, speech recognition method and system for recognizing a variety of vocabulary input signals. A language model which is an implicit description of a graph consisting of a plurality of states and arcs is inputted into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared-memory multiprocessor machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: April 16, 2002
    Assignee: AT&T Corp.
    Inventors: Steven Phillips, Anne Rogers
  • Patent number: 6374222
    Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index together, and by coding the last_time field so that one bit represents whether a slot is available for reuse and a second bit is for backtrace update.
    Type: Grant
    Filed: July 16, 1999
    Date of Patent: April 16, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
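The packing described above can be illustrated with plain bit operations; the field widths and bit positions here are assumptions chosen for the example, not values from the patent.

```python
# Pack a search slot's model_index and state_index into one integer,
# with two status bits: "free for reuse" and "needs backtrace update".
MODEL_BITS, STATE_BITS = 12, 4

def pack_slot(model_index, state_index, free=0, backtrace=0):
    assert model_index < (1 << MODEL_BITS) and state_index < (1 << STATE_BITS)
    return (free << (MODEL_BITS + STATE_BITS + 1)) \
         | (backtrace << (MODEL_BITS + STATE_BITS)) \
         | (model_index << STATE_BITS) | state_index

def unpack_slot(packed):
    state = packed & ((1 << STATE_BITS) - 1)
    model = (packed >> STATE_BITS) & ((1 << MODEL_BITS) - 1)
    backtrace = (packed >> (MODEL_BITS + STATE_BITS)) & 1
    free = (packed >> (MODEL_BITS + STATE_BITS + 1)) & 1
    return model, state, free, backtrace

p = pack_slot(700, 3, free=0, backtrace=1)
print(unpack_slot(p))  # (700, 3, 0, 1)
```

Packing two indices plus two flags into one machine word halves or better the per-slot footprint compared with storing each field in its own word, which is the point of the abstract.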
  • Patent number: 6374220
    Abstract: A method for N-best search for continuous speech recognition with limited storage space includes the steps of Viterbi pruning word level (same word, different time alignment, thus non-output differentiation) states and keeping the N-best sub-optimal paths for sentence level (output differentiation) states.
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: April 16, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
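The bookkeeping in the abstract above can be sketched at the path level: hypotheses that differ only in time alignment of the same words are Viterbi-pruned to one survivor, while hypotheses with different word outputs keep the N best. The data layout is an assumption made for illustration.

```python
import heapq

def merge_paths(paths, n_best=3):
    """paths: list of (word_sequence_tuple, log_score). Keep the single
    best score per distinct word sequence (Viterbi pruning of alternate
    alignments), then return the N best distinct sequences overall."""
    best = {}
    for words, score in paths:
        if words not in best or score > best[words]:
            best[words] = score
    return heapq.nlargest(n_best, best.items(), key=lambda kv: kv[1])

paths = [(("call", "home"), -10.0), (("call", "home"), -12.5),  # same words,
         (("call", "rome"), -11.0), (("all", "home"), -14.0)]   # diff. alignment
print(merge_paths(paths, n_best=2))
# [(('call', 'home'), -10.0), (('call', 'rome'), -11.0)]
```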
  • Publication number: 20020042710
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs only one of the M sub-networks and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.
    Type: Application
    Filed: July 26, 2001
    Publication date: April 11, 2002
    Inventor: Yifan Gong