Markov Patents (Class 704/256)
  • Patent number: 6523005
    Abstract: A method and also a configuration for determining a descriptive feature of a speech signal, in which a first speech model is trained with a first time pattern and a second speech model is trained with a second time pattern. The second speech model is initialized with the first speech model.
    Type: Grant
    Filed: September 10, 2001
    Date of Patent: February 18, 2003
    Assignee: Siemens Aktiengesellschaft
    Inventor: Martin Holzapfel
  • Patent number: 6519562
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Grant
    Filed: February 25, 1999
    Date of Patent: February 11, 2003
    Assignee: Speechworks International, Inc.
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
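The re-scoring flow this abstract describes can be sketched in a few lines: hypotheses carry a probability score plus keyword-value pairs, and rules adjust scores from external conditions before re-ranking. This is a minimal illustrative sketch, not the patent's implementation; all names and the example rule are assumptions.

```python
def rescore(hypotheses, rules, environment):
    """Apply each dynamic semantic rule to each hypothesis, then re-rank."""
    for hyp in hypotheses:
        for rule in rules:
            hyp["score"] = rule(hyp["score"], hyp["semantics"], environment)
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)

# Hypothetical rule: penalize a hypothesis whose departure airport is closed,
# an "external condition" in the abstract's sense.
def no_flights_rule(score, semantics, env):
    if semantics.get("origin") in env.get("closed_airports", set()):
        return score * 0.5
    return score

hyps = [
    {"words": "fly from boston", "score": 0.9, "semantics": {"origin": "BOS"}},
    {"words": "fly from austin", "score": 0.8, "semantics": {"origin": "AUS"}},
]
env = {"closed_airports": {"BOS"}}
ranked = rescore(hyps, [no_flights_rule], env)
print(ranked[0]["words"])  # prints "fly from austin"
```

The key point is that the acoustic N-best order (Boston first) is overturned by semantic knowledge of the application environment.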
  • Patent number: 6519563
    Abstract: A speaker verification method and apparatus which advantageously minimizes the constraints on the customer and simplifies the system architecture by using a speaker dependent, rather than a speaker independent, background model, thereby obtaining many of the advantages of using a background model in a speaker verification process without many of the disadvantages thereof. In particular, no training data (e.g. speech) from anyone other than the customer is required, no speaker independent models need to be produced, no a priori knowledge of acoustic rules are required, and, no multi-lingual phone models, dictionaries, or letter-to-sound rules are needed. Nonetheless, in accordance with an illustrative embodiment of the present invention, the customer is free to select any password phrase in any language.
    Type: Grant
    Filed: November 22, 1999
    Date of Patent: February 11, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Olivier Siohan, Arun Chandrasekaran Surendran
  • Patent number: 6510411
    Abstract: A simplification of the process of developing call or dialog flows for use in an Interactive Voice Response system is provided. Three principal aspects of the invention include a task-oriented dialog model (or task model), development tool and a Dialog Manager. The task model is a framework for describing the application-specific information needed to perform the task. The development tool is an object that interprets a user specified task model and outputs information for a spoken dialog system to perform according to the specified task model. The Dialog Manager is a runtime system that uses output from the development tool in carrying out interactive dialogs to perform the task specified according to the task model. The Dialog Manager conducts the dialog using the task model and its built-in knowledge of dialog management. Thus, generic knowledge of how to conduct a dialog is separated from the specific information to be collected in a particular application.
    Type: Grant
    Filed: October 29, 1999
    Date of Patent: January 21, 2003
    Assignee: Unisys Corporation
    Inventors: Lewis M. Norton, Deborah A. Dahl, Marcia C. Linebarger
  • Patent number: 6507816
    Abstract: A method and system for evaluating the accuracy of a computer speech recognition system counts and indexes the total number of words dictated and the number of words corrected. The corrections are tallied after being made in a correction window and include words contained in an alternative list as well as words input by the user and within a stored word database. A processor calculates the approximate accuracy of the speech recognition system as the ratio of the number of correct words to the total number of words dictated. An accuracy ratio is calculated for each dictation session and an overall ratio is calculated for all sessions combined. The system also keeps individual and overall indexes of the number of times the corrected words were in alternate lists or not within the word database and uses these indexes to calculate additional accuracy values.
    Type: Grant
    Filed: May 4, 1999
    Date of Patent: January 14, 2003
    Assignee: International Business Machines Corporation
    Inventor: Kerry A. Ortega
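The accuracy calculation the abstract describes is a simple ratio; a hedged sketch (variable names are illustrative, not from the patent):

```python
def session_accuracy(total_words, corrected_words):
    """Accuracy as the ratio of correct words to total words dictated."""
    if total_words == 0:
        return 0.0  # no dictation yet
    return (total_words - corrected_words) / total_words

# Per-session ratios plus an overall ratio across all sessions combined,
# as the abstract outlines (counts here are made up).
sessions = [(100, 8), (250, 10)]  # (words dictated, words corrected)
per_session = [session_accuracy(t, c) for t, c in sessions]
overall = session_accuracy(sum(t for t, _ in sessions),
                           sum(c for _, c in sessions))
```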
  • Publication number: 20030009334
    Abstract: A speech processing board configured in accordance with the inventive arrangements can include multiple processor modules, each processor module having an associated local memory, each processor module hosting at least one instance of a speech application task; a storage system for storing speech task data, the speech task data including language models and finite state grammars; a local communications bus communicatively linking each processor module through which each processor module can exchange speech task data with the storage system; and, a communications bridge to a host system, wherein the communications bridge can provide an interface to the local communications bus through which data can be exchanged between the processor modules and the host system. Notably, the host system can be a CT media services system or a VoIP gateway/endpoint.
    Type: Application
    Filed: July 3, 2001
    Publication date: January 9, 2003
    Applicant: International Business Machines Corporation
    Inventors: Harry W. Printz, Bruce A. Smith
  • Patent number: 6505156
    Abstract: A keyword is recognized in spoken language by assuming a start of this keyword at every sampling time. An attempt is then made to map the keyword onto a sequence of HMM states that represent it. The best path in a representation space is determined with the Viterbi algorithm, with a local confidence measure employed instead of the emission probability usually used in the Viterbi algorithm. When a global confidence measure composed of the local confidence measures falls below a lower threshold for the best Viterbi path, the keyword is recognized and the sampling time assumed as the start of the keyword is confirmed.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: January 7, 2003
    Assignee: Siemens Aktiengesellschaft
    Inventors: Jochen Junkawitsch, Harald Höge
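The core of this keyword-spotting idea is a Viterbi pass over the keyword's left-to-right HMM states in which a local confidence score replaces the emission probability. A minimal sketch follows; the confidence values are invented, and the sign convention is flipped relative to the abstract (here higher accumulated confidence is better and detection means exceeding a threshold, whereas the patent's global measure falls below a lower barrier).

```python
NEG_INF = float("-inf")

def viterbi_confidence(local_conf):
    """local_conf[t][s]: confidence of frame t in keyword state s.
    Left-to-right topology: a path may stay in a state or advance by one.
    Returns the best accumulated confidence of a path ending in the
    final keyword state."""
    n_frames, n_states = len(local_conf), len(local_conf[0])
    prev = [local_conf[0][0]] + [NEG_INF] * (n_states - 1)
    for t in range(1, n_frames):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            best_prev = prev[s] if s == 0 else max(prev[s], prev[s - 1])
            if best_prev > NEG_INF:
                cur[s] = best_prev + local_conf[t][s]
        prev = cur
    return prev[-1]

# Three frames, two keyword states; a start is hypothesized at frame 0.
conf = [[0.9, 0.0], [0.8, 0.7], [0.1, 0.9]]
score = viterbi_confidence(conf)
detected = score > 2.0  # hypothetical detection threshold
```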
  • Patent number: 6502072
    Abstract: A method and apparatus is provided for two-tier noise rejection in speech recognition. The method and apparatus convert an analog speech signal into a digital signal and extract features from the digital signal. A hypothesis speech word and a hypothesis noise word are identified from respective extracted features. The features associated with the hypothesis speech word are examined in a second tier of noise rejection to determine if the features are more likely to represent noise than speech. The hypothesis speech word is replaced by a noise marker if the features are more likely to represent noise than speech.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: December 31, 2002
    Assignee: Microsoft Corporation
    Inventors: Li Jiang, Xuedong Huang
  • Patent number: 6499015
    Abstract: The present invention enables a computer user to select a function represented via a graphical user interface by speaking a command related to the function into audio processing circuitry. A voice recognition program interprets the spoken words to determine the function desired for execution. The user may use the cursor to identify an element on the graphical user interface display, or may speak the name of that element. The computer responds to the identification of the element by displaying a menu of the voice commands associated with that element.
    Type: Grant
    Filed: August 12, 1999
    Date of Patent: December 24, 2002
    Assignee: International Business Machines Corporation
    Inventors: Brian S. Brooks, Keith P. Loring, Maria Milenkovic
  • Patent number: 6499012
    Abstract: A method and apparatus for generating a pair of data elements is provided suitable for use in a speaker verification system. The pair includes a first element representative of a speaker independent template and a second element representative of an extended speaker specific speech pattern. An audio signal forming enrollment data associated with a given speaker is received and processed to derive a speaker independent template and a speaker specific speech pattern. The speaker specific speech pattern is then processed to derive an extended speaker specific speech pattern. The extended speaker specific speech pattern includes a set of expanded speech models, each expanded speech model including a plurality of groups of states, the groups of states being linked to one another by inter-group transitions. Optionally, the expanded speech models are processed on the basis of the enrollment data to condition at least one of the plurality of inter-group transitions.
    Type: Grant
    Filed: December 23, 1999
    Date of Patent: December 24, 2002
    Assignee: Nortel Networks Limited
    Inventors: Stephen Douglas Peters, Matthieu Hebert, Daniel Boies
  • Publication number: 20020184025
    Abstract: A speech recognition system (10) having a sampler block (12) and a feature extractor block (14) for extracting time domain and spectral domain parameters from spoken input speech into a feature vector. A polynomial expansion block (16) generates polynomial coefficients from the feature vector. A correlator block (20), a sequence vector block (22), an HMM table (24) and a Viterbi block (26) perform the actual speech recognition based on the speech units stored in a speech unit table (18) and the HMM word models stored in the HMM table (24). The HMM word model that produces the highest probability is determined to be the word that was spoken.
    Type: Application
    Filed: May 31, 2001
    Publication date: December 5, 2002
    Applicant: Motorola, Inc.
    Inventors: David L. Barron, William Chunhung Yip
  • Patent number: 6490557
    Abstract: The present invention is embodied in a system and method for recognizing speech and transcribing speech in real time. The system includes a computer, which could be in a LAN or WAN linked to other computer systems through the Internet. The computer has a controller, or similar device, to filter background noise and convert incoming signals to digital format. The digital signals are transcribed to a word list, which is processed by an automatic speech recognition system. This system synchronizes and compares the lists and forwards the list to a speech recognition learning system, which stores the data on-site. The stored data is forwarded to an off-site storage system, and an off-site large scale learning system that processes the data from all sites on the wide area network system.
    Type: Grant
    Filed: March 3, 1999
    Date of Patent: December 3, 2002
    Inventor: John C. Jeppesen
  • Publication number: 20020173959
    Abstract: A method of speech recognition with noise compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. For each speech utterance, the mean MFCC vector is calculated over the clean database, and this mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance, and the model mean vectors are adapted to the noise. The mean vector of the noisy data over the noisy speech space is then determined and removed from the model mean vectors adapted to noise to obtain the target model.
    Type: Application
    Filed: January 18, 2002
    Publication date: November 21, 2002
    Inventor: Yifan Gong
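The abstract's compensation steps are essentially mean-vector arithmetic. A rough sketch using plain lists as stand-ins for MFCC vectors; the adaptation step (how model means move toward the noise estimate) is simplified here to a fixed interpolation weight, which is an assumption, not the patent's formula.

```python
def add(a, b):     return [x + y for x, y in zip(a, b)]
def sub(a, b):     return [x - y for x, y in zip(a, b)]
def lerp(a, b, w): return [x + w * (y - x) for x, y in zip(a, b)]

clean_model_mean  = [1.0, 2.0]  # CMN-trained model mean
clean_global_mean = [0.5, 0.5]  # mean MFCC of the clean database
noise_estimate    = [3.0, 1.0]  # background-noise estimate
noisy_global_mean = [1.2, 1.1]  # mean over the noisy speech space

# 1. Restore the clean global mean that CMN removed from the models.
restored = add(clean_model_mean, clean_global_mean)
# 2. Adapt the model mean toward the noise estimate (illustrative weight).
adapted = lerp(restored, noise_estimate, 0.2)
# 3. Remove the noisy-data mean to obtain the target model mean.
target = sub(adapted, noisy_global_mean)
```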
  • Publication number: 20020165717
    Abstract: The invention provides a method and system for extracting information from text documents. A document intake module receives and stores a plurality of text documents for processing, an input format conversion module converts each document into a standard format for processing, an extraction module identifies and extracts desired information from each text document, and an output format conversion module converts the information extracted from each document into a standard output format. These modules operate simultaneously on multiple documents in a pipeline fashion so as to maximize the speed and efficiency of extracting information from the plurality of documents.
    Type: Application
    Filed: April 8, 2002
    Publication date: November 7, 2002
    Inventors: Robert P. Solmer, Christopher K. Harris, Mauritius A.R. Schmidtler, James W. Dolter
  • Patent number: 6470315
    Abstract: Speech recognition and the generation of speech recognition models are provided, including the generation of unique phonotactic garbage models (15) that identify speech by, for example, English-language constraints, in addition to noise, silence and other non-speech models (11) and, for speech recognition, specific word models.
    Type: Grant
    Filed: September 11, 1996
    Date of Patent: October 22, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Lorin Paul Netsch, Barbara Janet Wheatley
  • Patent number: 6466908
    Abstract: A system and method for training a class-specific hidden Markov model (HMM) is used for modeling physical phenomena, such as speech, characterized by a finite number of states. The method receives training data and estimates parameters of the class-specific HMM from the training data using a modified Baum-Welch algorithm, which uses likelihood ratios with respect to a common state (e.g., noise) and based on sufficient statistics for each state. The parameters are stored for use in processing signals representing the physical phenomena, for example, in speech processing applications. The modified Baum-Welch algorithm is an iterative algorithm including class-specific forward and backward procedures and HMM reestimation formulas.
    Type: Grant
    Filed: January 14, 2000
    Date of Patent: October 15, 2002
    Assignee: The United States of America as represented by the Secretary of the Navy
    Inventor: Paul M. Baggenstoss
  • Patent number: 6463413
    Abstract: A distributed speech processing system for constructing speech recognition reference models that are to be used by a speech recognizer in a small hardware device, such as a personal digital assistant or cellular telephone. The speech processing system includes a speech recognizer residing on a first computing device and a speech model server residing on a second computing device. The speech recognizer receives speech training data and processes it into an intermediate representation of the speech training data. The intermediate representation is then communicated to the speech model server. The speech model server generates a speech reference model by using the intermediate representation of the speech training data and then communicates the speech reference model back to the first computing device for storage in a lexicon associated with the speech recognizer.
    Type: Grant
    Filed: April 20, 1999
    Date of Patent: October 8, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Ted H. Applebaum, Jean-Claude Junqua
  • Publication number: 20020143540
    Abstract: A voice recognition (VR) system is disclosed that utilizes a combination of speaker independent (SI) and speaker dependent (SD) acoustic models. At least one SI acoustic model is used in combination with at least one SD acoustic model to provide a level of speech recognition performance that at least equals that of a purely SI acoustic model. The disclosed hybrid SI/SD VR system continually uses unsupervised training to update the acoustic templates in the one or more SD acoustic models.
    Type: Application
    Filed: March 28, 2001
    Publication date: October 3, 2002
    Inventors: Narendranath Malayath, Andrew P. DeJaco, Chienchung Chang, Suhail Jalil, Ning Bi, Harinath Garudadri
  • Patent number: 6460017
    Abstract: When adapting a lexicon in a speech recognition system, a code book of hidden Markov sound models supplied with the speech recognition system is adapted for specific applications, which are defined by an application lexicon that the user can modify. Adaptation takes place during operation, by shifting the stored mean vectors of the probability density distributions of the hidden Markov models in the direction of a recognized feature vector of the sound expressions, with reference to the specific hidden Markov models employed. Compared with standard methods, this approach has the advantage that it runs on-line and achieves a very high recognition rate at a low computational cost. Furthermore, the effort of training specific sound models for corresponding applications is avoided.
    Type: Grant
    Filed: June 10, 1999
    Date of Patent: October 1, 2002
    Assignee: Siemens Aktiengesellschaft
    Inventors: Udo Bub, Harald Höge, Joachim Köhler
  • Patent number: 6456971
    Abstract: A pattern recognition system and method for optimal reduction of redundancy and size of a weighted and labeled graph includes receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph, where the graph is labeled with word sequences and weighted with probabilities, and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown on a display device so that recognition of speech signals can be demonstrated.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: September 24, 2002
    Assignee: AT&T Corp.
    Inventors: Mehryar Mohri, Fernando Carlos Neves Pereira, Michael Dennis Riley
  • Patent number: 6456970
    Abstract: The search network in a speech recognition system is reduced by parsing the incoming speech, expanding all active paths (101), comparing them to speech models, scoring the paths, storing recognition-level values in slots (103), and accumulating the scores; when a word end is detected, previous slots are discarded and a word-end slot is created (109).
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: September 24, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
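The expand-score-discard loop the abstract outlines resembles standard beam pruning. An illustrative sketch (not the patent's slot mechanism): "paths" are (accumulated cost, state) pairs, each frame expands every active path against stand-in model costs, and paths outside a beam of the best score are discarded.

```python
def step(active, frame_costs, beam=3.0):
    """Expand all active paths by one frame, score them, and keep only
    those within `beam` of the best accumulated cost."""
    expanded = []
    for score, state in active:
        for nxt in (state, state + 1):  # stay in the state or advance
            if nxt < len(frame_costs):
                expanded.append((score + frame_costs[nxt], nxt))
    best = min(s for s, _ in expanded)
    return [(s, st) for s, st in expanded if s - best <= beam]

# Three frames of (made-up) per-state model costs for a 3-state word.
active = [(0.0, 0)]
for costs in [[1.0, 5.0, 9.0], [1.0, 2.0, 9.0], [4.0, 1.0, 1.0]]:
    active = step(active, costs)
best_score = min(s for s, _ in active)
```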
  • Publication number: 20020133345
    Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
    Type: Application
    Filed: January 12, 2001
    Publication date: September 19, 2002
    Inventor: Harinath Garudadri
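Mu-law companding, one of the two schemes the abstract names, maps amplitudes logarithmically so that quantizing the compressed value loses less perceptual detail. A sketch with the conventional mu = 255 (the patent's exact parameters and template layout are not reproduced here):

```python
import math

MU = 255.0

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1] with logarithmic companding."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

x = 0.25
roundtrip = mu_expand(mu_compress(x))
# roundtrip is numerically close to x; quantizing the compressed value
# to a small integer range is what makes the scheme lossy in practice.
```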
  • Patent number: 6435877
    Abstract: A training tool for training and assessing one or more auditory processing, phonological awareness, phonological processing and reading skills of an individual is provided. The training tool may use various graphical games to train the individual's ability in a particular set of auditory processing, phonological awareness, phonological processing and reading skills. The system may use speech recognition technology to permit the user to interact with the games.
    Type: Grant
    Filed: July 20, 2001
    Date of Patent: August 20, 2002
    Assignee: Cognitive Concepts, Inc.
    Inventor: Janet M. Wasowicz
  • Patent number: 6438520
    Abstract: The apparatus, method and system of the present invention provide for cross-speaker speech recognition, and are particularly suited for telecommunication applications such as automatic name (voice) dialing, message management, call return management, and incoming call screening. The method of the present invention includes receiving incoming speech, such as an incoming caller name, and generating a phonetic transcription of the incoming speech with a speaker-independent, hidden Markov model having an unconstrained grammar in which any phoneme may follow any other phoneme, followed by determining a transcription parameter as a likelihood of fit of the incoming speech to the speaker-independent model.
    Type: Grant
    Filed: January 20, 1999
    Date of Patent: August 20, 2002
    Assignee: Lucent Technologies Inc.
    Inventors: Carol Lynn Curt, Rafid Antoon Sukkar, John Joseph Wisowaty
  • Patent number: 6434522
    Abstract: A device that utilizes an HMM and is capable of achieving recognition at high accuracy with fewer calculations. The present device has a vector quantizing circuit generating a model by quantizing vectors of a training pattern having a vector series and converting the vectors into a label series of the clusters to which they belong; a continuous distribution probability density HMM generating circuit for generating a continuous distribution probability density HMM from the quantized vector series corresponding to each label of the label series; and a label incidence calculating circuit for calculating the incidence of the labels in each state from the training vectors classified into the same clusters and the continuous distribution probability density HMM.
    Type: Grant
    Filed: May 28, 1997
    Date of Patent: August 13, 2002
    Inventor: Eiichi Tsuboka
  • Patent number: 6430532
    Abstract: A method determines a representative sound on the basis of a structure which includes a set of sound models. Each sound model has at least one representative for the modeled sound. In the structure, a first sound model, matching with regard to a first quality criterion, is determined from the set of sound models. At least one second sound model is determined from the set of sound models dependent on a characteristic state criterion of the structure. At least some of the representatives of the first sound model and of the at least one second sound model are assessed in addition to the first quality criterion with regard to a second quality criterion. The at least one representative which has an adequate overall quality criterion with regard to the first and second quality criteria is determined as a representative sound from the representatives of the first sound model and the at least one second sound model.
    Type: Grant
    Filed: August 21, 2001
    Date of Patent: August 6, 2002
    Assignee: Siemens Aktiengesellschaft
    Inventor: Martin Holzapfel
  • Publication number: 20020095288
    Abstract: A method of determining the language of a text message received by a mobile telecommunications device comprises receiving an input text message at a mobile telecommunications device; analysing the input text message using language information stored in the mobile telecommunications device; selecting, from a group of languages defined by the language information, a most likely language for the input text message; and outputting, from the mobile telecommunications device, speech signals corresponding to the input text message, in the selected language.
    Type: Application
    Filed: September 5, 2001
    Publication date: July 18, 2002
    Inventors: Erik Sparre, Alberto Jimenez Feltstrom
  • Patent number: 6418411
    Abstract: The system uses utterances recorded in low-noise conditions, such as in a car with the engine off, to optimally adapt speech acoustic models to transducer and speaker characteristics, and uses speech pauses to adjust the adapted models to changing background noise, such as in a car with the engine running.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: July 9, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 6418412
    Abstract: A speech recognition system utilizes multiple quantizers to process frequency parameters and mean compensated frequency parameters derived from an input signal. The quantizers may be matrix and vector quantizer pairs, and such quantizer pairs may also function as front ends to a second stage speech classifiers such as hidden Markov models (HMMs) and/or utilizes neural network postprocessing to, for example, improve speech recognition performance. Mean compensating the frequency parameters can remove noise frequency components that remain approximately constant during the duration of the input signal. HMM initial state and state transition probabilities derived from common quantizer types and the same input signal may be consolidated to improve recognition system performance and efficiency. Matrix quantization exploits the “evolution” of the speech short-term spectral envelopes as well as frequency domain information, and vector quantization (VQ) primarily operates on frequency domain information.
    Type: Grant
    Filed: August 28, 2000
    Date of Patent: July 9, 2002
    Assignee: Legerity, Inc.
    Inventors: Safdar M. Asghar, Lin Cong
  • Publication number: 20020087315
    Abstract: A computer-implemented method and system for speech recognition of a user speech input. The user speech input which contains utterances from a user is received. A first language model recognizes at least a portion of the utterances from the user speech input. The first language model has utterance terms that form a general category. A second language model is selected based upon the identified utterances from use of the first language model. The second language model contains utterance terms that are a subset category of the general category of utterance terms in the first language model. Subset utterances are recognized with the selected second language model from the user speech input.
    Type: Application
    Filed: May 23, 2001
    Publication date: July 4, 2002
    Inventors: Victor Wai Leung Lee, Otman A. Basir, Fakhreddine O. Karray, Jiping Sun, Xing Jing
  • Patent number: 6411929
    Abstract: Frames making up an input speech are each collated with a string of phonemes representing speech candidates to be recognized, whereby evaluation values regarding the phonemes are computed. The frames are each compared with part of the phoneme string so as to reduce computations and memory capacity required in recognizing the input speech based on the evaluation values. That is, each frame is compared with a portion of the phoneme string to acquire an evaluation value for each phoneme. If the acquired evaluation value meets a predetermined condition, part of the phonemes to be collated with the next frame are changed. Illustratively, if the evaluation value for the phoneme heading a given portion of collated phonemes is smaller than the evaluation value of the phoneme which terminates that phoneme portion, then the head phoneme is replaced by the next phoneme. The new portion of phonemes obtained by the replacement is used for collation with the next frame.
    Type: Grant
    Filed: July 26, 2000
    Date of Patent: June 25, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Kazuyoshi Ishiwatari, Kazuo Kondo, Shinji Wakisaka
  • Patent number: 6411683
    Abstract: An automated telephone call designation system includes a database that stores a plurality of keywords where each keyword is associated with at least one topic designation. The system monitors the conversation of an ongoing telephone call by utilizing voice recognition software resident in a network to detect the use of the keywords in the conversation. The keywords used in the conversation are correlated to the topic designation(s) associated with the keywords. Based on the correlation of the keywords to the topic designation(s) associated with the keywords, a topic for the ongoing telephone call is designated. A third party that desires to join an ongoing conversation of interest reviews the topics of the ongoing conversations and is bridged into the conversation of interest.
    Type: Grant
    Filed: February 9, 2000
    Date of Patent: June 25, 2002
    Assignee: AT&T Corp.
    Inventors: Randy G. Goldberg, Robert Edward Markowitz, Kenneth H. Rosen
  • Patent number: 6405168
    Abstract: A speech recognition training system that provides for model generation to be used within speaker dependent speech recognition systems requiring very limited training data, including single-token training. The present invention provides a very fast and reliable training method based on the segmentation of a speech signal for subsequent estimation of speaker dependent word models. In addition, the invention provides a robust method of performing end-point detection of a word contained within a speech utterance or speech signal, and is ideally suited to speaker dependent speech recognition systems that employ word-based speaker dependent models. The end-point detection method is operable to extract a desired word or phrase from a speech signal recorded in varying degrees of undesirable background noise. The invention also provides a simplified method of building the speaker dependent models using a simplified hidden Markov modeling method.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: June 11, 2002
    Assignee: Conexant Systems, Inc.
    Inventors: Aruna Bayya, Dianne L. Steiger
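A common baseline for the end-point detection problem this abstract describes is an energy threshold over frames; the sketch below is only that baseline, not the patent's method, and the threshold and energies are invented.

```python
def detect_endpoints(frame_energies, threshold):
    """Return (start, end) indices of the first and last frames whose
    energy exceeds the threshold, or None if no speech is found."""
    start = end = None
    for i, e in enumerate(frame_energies):
        if e > threshold:
            if start is None:
                start = i
            end = i
    return None if start is None else (start, end)

# Low-energy frames on either side stand in for background noise.
energies = [0.1, 0.2, 2.5, 3.1, 2.8, 0.3, 0.1]
span = detect_endpoints(energies, 1.0)  # (2, 4)
```

Real systems refine this with hangover times and adaptive noise floors; the point here is only the word-extraction boundary decision.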
  • Patent number: 6401064
    Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
    Type: Grant
    Filed: May 24, 2001
    Date of Patent: June 4, 2002
    Assignee: AT&T Corp.
    Inventor: Lawrence Kevin Saul
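The arc-length idea above can be made concrete with a discrete curve: each feature vector contributes in proportion to the distance traveled along the trajectory, so a stationary stretch contributes nothing. A toy sketch with made-up 2-D features (not the patent's weighting scheme):

```python
import math

def arc_length(curve):
    """Sum of Euclidean distances between successive feature vectors."""
    total = 0.0
    for p, q in zip(curve, curve[1:]):
        total += math.dist(p, q)
    return total

# The repeated point models a stationary (e.g. steady vowel) segment;
# it adds no arc length, so the non-stationary transitions dominate.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
length = arc_length(trajectory)  # 10.0
```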
  • Patent number: 6401065
    Abstract: An intelligent, user-friendly keyboard interface that is easily adaptable for a wide variety of functions and features, and also adaptable to reduced-size portable computers. Speech recognition and semantic processing for controlling and interpreting multiple symbols are used in conjunction with programmable switches with embedded LCD displays. Hidden Markov models are employed to interpret a combination of voice and keyboard input.
    Type: Grant
    Filed: June 17, 1999
    Date of Patent: June 4, 2002
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, Stephane Maes, Clifford A. Pickover, Alexander Zlatsin
  • Patent number: 6397181
    Abstract: A method, an apparatus, a computer program product and a system for voice annotating and retrieving digital media content are disclosed. An annotation module (420) post annotates digital media data (410), including audio, image and/or video data, with speech. A word lattice (222) can be created from speech annotation (210) dependent upon acoustic and/or linguistic knowledge. An indexing module (430) then indexes the speech-annotated data (422). The word lattice (222) is reverse indexed (230), and content addressing (240) is applied to produce the indexed data (432, 242). A speech query (474) can be generated as input to a retrieval module (480) for retrieving a segment of the indexed digital media data (432). The speech query (474, 310) is converted into a word lattice (322), and a shortlist (344) is produced from it (322) by confidence filtering (330). The shortlist (344) is input to a lattice search engine (350) to search the indexed content (342) to obtain the search result (352).
    Type: Grant
    Filed: June 4, 1999
    Date of Patent: May 28, 2002
    Assignee: Kent Ridge Digital Labs
    Inventors: Haizhou Li, Jiankang Wu, Arcot Desai Narasimhalu
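The "reverse indexing" step described above can be sketched as a plain inverted index from annotation words to media items; the flat word lists here are a simplification of the patent's word lattices, and all names are assumptions.

```python
from collections import defaultdict

def build_reverse_index(annotations):
    """Map each word to the set of media ids whose annotation contains it.

    `annotations` maps a media id to the list of hypothesis words
    recovered from its speech annotation (lattice arcs flattened here
    for simplicity).
    """
    index = defaultdict(set)
    for media_id, words in annotations.items():
        for word in words:
            index[word].add(media_id)
    return index

annotations = {
    "clip1": ["beach", "sunset", "family"],
    "clip2": ["beach", "volleyball"],
}
index = build_reverse_index(annotations)
print(sorted(index["beach"]))  # ['clip1', 'clip2']
```

A speech query would then be decoded into its own word candidates and each candidate looked up in this index to produce the shortlist the abstract mentions.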
  • Patent number: 6397182
    Abstract: A system and a method for generating a speech recognition dictionary that can be used in a telephone system having speech recognition capabilities, in particular capabilities to effect a connection when the calling party utters the name of a subscriber (called party). The method generates transcriptions associated with respective vocabulary items in the speech recognition dictionary from audio greetings recorded by the telephone system subscribers. Normally such audio greetings are used in voice messaging applications. Typically, the greetings are played before allowing callers to leave messages in a voice mailbox of subscribers. An individual greeting is audio information that contains the name of the subscriber. This audio information is processed to generate a transcription associated with a vocabulary item in the speech recognition dictionary, representative of the subscriber name.
    Type: Grant
    Filed: October 12, 1999
    Date of Patent: May 28, 2002
    Assignee: Nortel Networks Limited
    Inventors: Brian Cruickshank, Pierre M. Forgues, Lin Lin
  • Patent number: 6389395
    Abstract: Out-of-vocabulary word models for a speech recognizer vocabulary are generated by forming phonemic transcriptions (phonetic baseforms) of user's utterances in terms of existing reference phonemes by using a speech recognition algorithm to match input sub-word feature sample sequences to suitably-constrained allowable sequences of existing reference phoneme features. The resultant new-vocabulary-word phonetic baseform models are stored for subsequent speech recognition using the same recognition algorithm.
    Type: Grant
    Filed: April 4, 1997
    Date of Patent: May 14, 2002
    Assignee: British Telecommunications public limited company
    Inventor: Simon P. Ringland
  • Publication number: 20020055842
    Abstract: The invention enables even a CPU having low processing performance to find an HMM output probability by simplifying arithmetic operations. The dimensions of an input vector are grouped into several sets, and tables are created for the sets. When an output probability is calculated, codes corresponding to the first through n-th dimensions of the input vector are sequentially obtained, and for each code, by referring to the corresponding table, output values for each table are obtained. By substituting the output values for each table into a formula for finding an output probability, the output probability is found.
    Type: Application
    Filed: September 19, 2001
    Publication date: May 9, 2002
    Applicant: Seiko Epson Corporation
    Inventor: Yasunaga Miyazawa
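The table-lookup scheme described above can be sketched like this; the quantizer, the field names, and the toy tables are all assumptions, and only the idea (group dimensions, look up precomputed partial values, combine them) comes from the abstract.

```python
import math

def quantize(values, step=1.0):
    """Map one group of vector dimensions to a single integer code
    (toy scalar quantizer; the patent's codebook is not specified here)."""
    return int(round(sum(values) / step))

def output_log_prob(vector, groups, tables):
    """Sum precomputed per-group log-probabilities instead of evaluating
    Gaussians at decode time, so a low-power CPU avoids exp/multiply."""
    total = 0.0
    for group, table in zip(groups, tables):
        code = quantize([vector[d] for d in group])
        total += table[code]
    return total

groups = [(0, 1), (2, 3)]                      # dimensions 0-1 and 2-3
tables = [{1: math.log(0.5)}, {3: math.log(0.25)}]
vec = [0.4, 0.6, 1.2, 1.9]
print(output_log_prob(vec, groups, tables))    # log(0.5) + log(0.25)
```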
  • Patent number: 6385579
    Abstract: A method of forming an augmented textual training corpus with compound words for use with a speech recognition system includes computing a measure for a consecutive word pair in the training corpus. The measure is then compared to a threshold value. The consecutive word pair is replaced in the training corpus with a corresponding compound word depending on the result of the comparison between the measure and the threshold value. One or more measures may be employed. A first measure is an average of a direct bigram probability value and a reverse bigram probability value. A second measure is based on mutual information between the words in the pair. A third measure is based on a comparison of the number of times a co-articulated baseform for the pair is preferred over a concatenation of non-co-articulated individual baseforms of the words forming the pair.
    Type: Grant
    Filed: April 29, 1999
    Date of Patent: May 7, 2002
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George Andrei Saon
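The first measure above (average of the direct bigram probability P(w2|w1) and the reverse bigram probability P(w1|w2)) can be computed from raw counts; the corpus below is invented for illustration.

```python
from collections import Counter

def bigram_measure(corpus, w1, w2):
    """Average of direct and reverse bigram probabilities for (w1, w2),
    estimated from raw counts in a token list."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    direct = bigrams[(w1, w2)] / unigrams[w1]   # P(w2 | w1)
    reverse = bigrams[(w1, w2)] / unigrams[w2]  # P(w1 | w2)
    return (direct + reverse) / 2.0

corpus = ["new", "york", "is", "big", "new", "york", "new", "jersey"]
m = bigram_measure(corpus, "new", "york")
print(m)  # 2 of 3 "new" precede "york"; both "york" follow "new" -> (2/3 + 1)/2
```

If the measure exceeds the chosen threshold, the pair would be rewritten as a single compound token (e.g. a hypothetical `new-york`) throughout the corpus.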
  • Patent number: 6377921
    Abstract: A method of identifying mismatches between acoustic data and a corresponding transcription, the transcription being expressed in terms of basic units, comprises the steps of: aligning the acoustic data with the corresponding transcription; computing a probability score for each instance of a basic unit in the acoustic data with respect to the transcription; generating a distribution for each basic unit; tagging, as mismatches, instances of a basic unit corresponding to a particular range of scores in the distribution for each basic unit based on a threshold value; and correcting the mismatches.
    Type: Grant
    Filed: June 26, 1998
    Date of Patent: April 23, 2002
    Assignee: International Business Machines Corporation
    Inventors: Lalit R. Bahl, Mukund Padmanabhan
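The tagging step described above can be sketched as flagging instances in the low tail of each basic unit's score distribution; the fractional cut-off is an assumption, since the abstract only says a particular range of scores is selected based on a threshold value.

```python
from collections import defaultdict

def tag_mismatches(scored_instances, fraction=0.2):
    """scored_instances: list of (unit, instance_id, log_prob_score).
    Return instance_ids whose score lies in the lowest `fraction` of
    that unit's score distribution (assumed cut-off rule)."""
    by_unit = defaultdict(list)
    for unit, inst, score in scored_instances:
        by_unit[unit].append((score, inst))
    flagged = []
    for unit, items in by_unit.items():
        items.sort()                             # ascending by score
        cut = max(1, int(len(items) * fraction))
        flagged.extend(inst for _, inst in items[:cut])
    return flagged

data = [("AH", 1, -2.0), ("AH", 2, -9.5), ("AH", 3, -2.2),
        ("AH", 4, -1.9), ("AH", 5, -2.1)]
print(tag_mismatches(data))  # instance 2 scores far below its peers -> [2]
```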
  • Patent number: 6377924
    Abstract: A method of enrolling phone-based speaker specific commands includes the first step of providing a set (H) of speaker-independent phone-based Hidden Markov Models (HMMs), a grammar (G) comprising a loop of phones with optional between-word silence (BWS), and two utterances U1 and U2 of the command produced by the enrollment speaker, wherein the first frames of the first utterance contain only background noise. The processor generates a sequence of phone-like HMMs and the number of HMMs in that sequence as output. The second step performs model mean adjustment to suit enrollment microphone and speaker characteristics and performs segmentation. The third step generates an HMM for each segment except for silence for utterance U1. The fourth step re-estimates the HMM using both utterances U1 and U2.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: April 23, 2002
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, Coimbatore S. Ramalingam
  • Publication number: 20020046017
    Abstract: A method prepares a functional finite-state transducer (FST) with an epsilon or empty string on the input side for factorization into a bimachine. The method creates a left-deterministic input finite-state automaton (FSA) by extracting and left-determinizing the input side of the functional FST. Subsequently, the corresponding sub-paths in the FST are identified for each arc in the left-deterministic FSA and aligned.
    Type: Application
    Filed: December 18, 2000
    Publication date: April 18, 2002
    Applicant: Xerox Corporation
    Inventor: Andre Kempe
  • Publication number: 20020046030
    Abstract: Information that is latent in a caller's voice is processed for purposes of improving the handling of the call in any type of voice-interactive application. This implicit information in a caller's voice is not related to the actual words being said but rather to the characteristics of how those words are being said. This information, related to the caller's unique demographic profile, is used to decide how to respond to the caller for improved business performance. For example, by estimating the age and the gender of a caller based on his/her voice signal, a vendor associated with a calling center or Web site is able to make a sophisticated choice of what advertisement to present to the user or how to formulate a response to the caller. Similarly, this latent voice information can be used to determine which agent is likely best suited to handle a call with a caller with an estimated demographic, with the caller then being connected to that agent.
    Type: Application
    Filed: May 16, 2001
    Publication date: April 18, 2002
    Inventors: Jayant Ramaswamy Haritsa, Daniel Francis Lieuwen
  • Publication number: 20020046031
    Abstract: A method is described for compressing the storage space required by HMM prototypes in an electronic memory. For this purpose prescribed HMM prototypes are mapped onto compressed HMM prototypes with the aid of a neural network (encoder). These can be stored with a smaller storage space than the uncompressed HMM prototypes. A second neural network (decoder) serves to reconstruct the HMM prototypes.
    Type: Application
    Filed: September 6, 2001
    Publication date: April 18, 2002
    Applicant: Siemens Aktiengesellschaft
    Inventor: Harald Hoege
  • Patent number: 6374221
    Abstract: Automatic retraining of a speech recognizer during its normal operation in conjunction with an electronic device responsive to the speech recognizer is addressed. In this retraining, stored trained models are retrained on the basis of recognized user utterances. Feature vectors, model state transitions, and tentative recognition results are stored upon processing and evaluation of speech samples of the user utterances. A reliable transcript is determined for later adaptation of a speech model, in dependence upon the user's successive behavior when interacting with the speech recognizer and the electronic device. For example, in a name dialing process, such behavior can be manual or voice re-dialing of the same number, dialing of a different phone number, immediately aborting an established communication, or breaking it off after a short period of time.
    Type: Grant
    Filed: June 22, 1999
    Date of Patent: April 16, 2002
    Assignee: Lucent Technologies Inc.
    Inventor: Raziel Haimi-Cohen
  • Patent number: 6374212
    Abstract: A continuous, speaker independent, speech recognition method and system for recognizing a variety of vocabulary input signals. A language model which is an implicit description of a graph consisting of a plurality of states and arcs is inputted into the system. An input speech signal, corresponding to a plurality of speech frames, is received and processed using a shared-memory multiprocessor machine having a plurality of microprocessors working in parallel to produce a textual representation of the speech signal.
    Type: Grant
    Filed: March 13, 2001
    Date of Patent: April 16, 2002
    Assignee: AT&T Corp.
    Inventors: Steven Phillips, Anne Rogers
  • Patent number: 6374222
    Abstract: A memory management method is described for reducing the size of memory required in speech recognition searching. The searching involves parsing the input speech and building a dynamically changing search tree. The basic unit of the search network is a slot. The present invention describes ways of reducing the size of the slot and therefore the size of the required memory. The slot size is reduced by removing the time index, by packing the model_index and state_index together, and by coding the last_time field so that one bit represents whether a slot is available for reuse and a second bit is for backtrace update.
    Type: Grant
    Filed: July 16, 1999
    Date of Patent: April 16, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
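The packing described above can be illustrated with plain bit operations; the field widths and bit positions here are assumptions chosen for the example, not values from the patent.

```python
# Pack a search slot's model_index and state_index into one integer,
# with two status bits: "free for reuse" and "needs backtrace update".
MODEL_BITS, STATE_BITS = 12, 4

def pack_slot(model_index, state_index, free=0, backtrace=0):
    assert model_index < (1 << MODEL_BITS) and state_index < (1 << STATE_BITS)
    return (free << (MODEL_BITS + STATE_BITS + 1)) \
         | (backtrace << (MODEL_BITS + STATE_BITS)) \
         | (model_index << STATE_BITS) | state_index

def unpack_slot(packed):
    state = packed & ((1 << STATE_BITS) - 1)
    model = (packed >> STATE_BITS) & ((1 << MODEL_BITS) - 1)
    backtrace = (packed >> (MODEL_BITS + STATE_BITS)) & 1
    free = (packed >> (MODEL_BITS + STATE_BITS + 1)) & 1
    return model, state, free, backtrace

p = pack_slot(700, 3, free=0, backtrace=1)
print(unpack_slot(p))  # (700, 3, 0, 1)
```

Packing two indices plus two flags into one machine word halves or better the per-slot footprint compared with storing each field in its own word, which is the point of the abstract.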
  • Patent number: 6374220
    Abstract: A method for N-best search for continuous speech recognition with limited storage space includes the steps of Viterbi pruning word level (same word, different time alignment, thus non-output differentiation) states and keeping the N-best sub-optimal paths for sentence level (output differentiation) states.
    Type: Grant
    Filed: July 15, 1999
    Date of Patent: April 16, 2002
    Assignee: Texas Instruments Incorporated
    Inventor: Yu-Hung Kao
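The bookkeeping in the abstract above can be sketched at the path level: hypotheses that differ only in time alignment of the same words are Viterbi-pruned to one survivor, while hypotheses with different word outputs keep the N best. The data layout is an assumption made for illustration.

```python
import heapq

def merge_paths(paths, n_best=3):
    """paths: list of (word_sequence_tuple, log_score). Keep the single
    best score per distinct word sequence (Viterbi pruning of alternate
    alignments), then return the N best distinct sequences overall."""
    best = {}
    for words, score in paths:
        if words not in best or score > best[words]:
            best[words] = score
    return heapq.nlargest(n_best, best.items(), key=lambda kv: kv[1])

paths = [(("call", "home"), -10.0), (("call", "home"), -12.5),  # same words,
         (("call", "rome"), -11.0), (("all", "home"), -14.0)]   # diff. alignment
print(merge_paths(paths, n_best=2))
# [(('call', 'home'), -10.0), (('call', 'rome'), -11.0)]
```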
  • Publication number: 20020042710
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. In order to match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here, which needs only one of the M sub-networks and yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M-1)/M.
    Type: Application
    Filed: July 26, 2001
    Publication date: April 11, 2002
    Inventor: Yifan Gong