Probability Patents (Class 704/240)
  • Patent number: 6868380
    Abstract: A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters.
    Type: Grant
    Filed: March 23, 2001
    Date of Patent: March 15, 2005
    Assignee: Eliza Corporation
    Inventor: John Kroeker
  • Patent number: 6859777
    Abstract: A hypertext navigation system that is controllable by spoken words has hypertext documents to which specific dictionaries and probability models for assisting in an acoustic voice recognition of hyper-links of this hypertext document are allocated. Control of a hypertext viewer or, respectively, browser and navigation in the hypertext document or hypertext system by pronouncing links is provided. The voice recognition is thereby optimally adapted to the links to be recognized without these having to be previously known.
    Type: Grant
    Filed: January 17, 2001
    Date of Patent: February 22, 2005
    Assignee: Siemens Aktiengesellschaft
    Inventor: Darin Edward Krasle
  • Patent number: 6850886
    Abstract: The present invention comprises a system and method for speech verification using an efficient confidence measure, and includes a speech verifier which compares a confidence measure for a recognized word to a predetermined threshold value in order to determine whether the recognized word is valid, where a recognized word corresponds to a word model that produces a highest recognition score. In accordance with the present invention, the foregoing confidence measure may be calculated using the recognition score for the recognized word and a pseudo filler score that may be based upon selected average recognition scores from an N-best list of recognition candidates.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: February 1, 2005
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Gustavo Hernandez Abrego, Xavier Menendez-Pidal
  • Patent number: 6847734
    Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, and a probability is obtained at which the characteristics obtained as the result of character recognition are generated, conditioned on the characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which the characteristics obtained as the result of character recognition are generated, and the division results obtained for the characters of each word contained in the word dictionary are multiplied together over all the characters. The recognition results of the above words are obtained based on the multiplication results.
    Type: Grant
    Filed: January 26, 2001
    Date of Patent: January 25, 2005
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Tomoyuki Hamamura
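The scoring described in the abstract of patent 6847734 (divide the conditional probability of the observed character features by their unconditional probability, then multiply over all characters of a dictionary word) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the names `word_score` and `recognize`, the dictionary-of-likelihoods input format, and the restriction to words whose length matches the input are all hypothetical.

```python
from math import prod
from typing import Dict, List, Optional

def word_score(word: str,
               char_likelihoods: List[Dict[str, float]],
               feature_priors: List[float]) -> float:
    """Product over character positions of P(features | character) / P(features).

    char_likelihoods[i][c] is the assumed P(features at position i | character c);
    feature_priors[i] is the assumed P(features at position i).
    """
    return prod(
        char_likelihoods[i].get(ch, 0.0) / feature_priors[i]
        for i, ch in enumerate(word)
    )

def recognize(dictionary: List[str],
              char_likelihoods: List[Dict[str, float]],
              feature_priors: List[float]) -> Optional[str]:
    """Return the dictionary word with the largest score, or None if none fits."""
    # Assumption: only words with as many characters as the input are compared.
    candidates = [w for w in dictionary if len(w) == len(char_likelihoods)]
    if not candidates:
        return None
    return max(candidates,
               key=lambda w: word_score(w, char_likelihoods, feature_priors))
```

Dividing by the unconditional probability turns each factor into a ratio, so a character whose features are merely common does not dominate the product.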
  • Patent number: 6839667
    Abstract: A method for performing speech recognition can include receiving user speech and determining a plurality of potential candidates. Each of the candidates can provide a textual interpretation of the speech. Confidence scores can be calculated for the candidates. The confidence scores can be compared to a predetermined threshold. Also, selected ones of the plurality of candidates can be presented to the user as alternative interpretations of the speech when none of the confidence scores is greater than the predetermined threshold. The selected ones of the plurality of candidates can have confidence scores above a predetermined minimum threshold, and thus can have confidence scores within a predetermined range.
    Type: Grant
    Filed: May 16, 2001
    Date of Patent: January 4, 2005
    Assignee: International Business Machines Corporation
    Inventor: David E. Reich
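The two-threshold behaviour described in the abstract of patent 6839667 might be sketched as follows. This is a minimal illustration rather than the patented method: the function name `select_candidates`, the `(text, score)` tuple format, and the choice to return only the single best candidate when a score does exceed the upper threshold are assumptions.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (textual interpretation, confidence score)

def select_candidates(candidates: List[Candidate],
                      accept_threshold: float,
                      min_threshold: float) -> List[Candidate]:
    """Return the accepted candidate, or the alternatives to present to the user."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # Assumption: if the best score exceeds the upper threshold, accept it outright.
    if ranked and ranked[0][1] > accept_threshold:
        return [ranked[0]]
    # Otherwise no score exceeded the upper threshold, so offer every candidate
    # whose score is still above the minimum threshold (i.e. within the range).
    return [c for c in ranked if c[1] > min_threshold]
```

An empty result means that no candidate reached even the minimum threshold, in which case a caller would presumably ask the user to repeat the utterance.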
  • Publication number: 20040254790
    Abstract: A method, a system and recording medium in which automatic speech recognition may use large list grammars and a confidence measure driven scalable two-pass recognition strategy.
    Type: Application
    Filed: June 13, 2003
    Publication date: December 16, 2004
    Applicant: International Business Machines Corporation
    Inventors: Miroslav Novak, Diego Ruiz
  • Patent number: 6832190
    Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.
    Type: Grant
    Filed: November 10, 2000
    Date of Patent: December 14, 2004
    Assignee: Siemens Aktiengesellschaft
    Inventors: Jochen Junkawitsch, Harald Höge
  • Patent number: 6832191
    Abstract: To implement a speech recognizer for a language in conditions of substantial unavailability of related speech training material the first step (1,2) is, based on related speech training material, a multilingual speech recognizer (2) for a plurality of known languages. The recognizer for such given language (5) is then implemented by interpolation (4) starting from the said multilingual recognizer (2). The recognizer (5) generated in this fashion is susceptible of being subsequently refined based on related speech training material acquired online (4) during later use (FIG.
    Type: Grant
    Filed: August 28, 2000
    Date of Patent: December 14, 2004
    Assignee: Telecom Italia Lab S.p.A.
    Inventors: Alessandra Frasca, Giorgio Micca, Enrico Palme
  • Publication number: 20040243410
    Abstract: A method and apparatus determine the likelihood of a sequence of words based in part on a segment model. The segment model includes trajectory expressions formed as the product of a polynomial matrix and a generation matrix. The likelihood of the sequence of words is based in part on a segment probability derived by subtracting the trajectory expressions from a feature vector matrix that contains a sequence of feature vectors for a segment of speech. Aspects of the method and apparatus also include training the segment model using such a segment probability.
    Type: Application
    Filed: June 14, 2004
    Publication date: December 2, 2004
    Applicant: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Publication number: 20040243408
    Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.
    Type: Application
    Filed: May 30, 2003
    Publication date: December 2, 2004
    Applicant: Microsoft Corporation
    Inventors: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou
  • Publication number: 20040243409
    Abstract: An input text is analyzed into morphemes by using a prescribed morphological analysis procedure to generate word strings with part-of-speech tags, including form information for parts of speech having forms, as hypotheses. The probabilities of occurrence of each hypothesis in a corpus of text are calculated by use of two or more part-of-speech n-gram models, at least one of which takes the forms of the parts of speech into consideration. Lexicalized models and class models may also be used. The models are weighted and the probabilities are combined according to the weights to obtain a single probability for each hypothesis. The hypothesis with the highest probability is selected as the solution to the morphological analysis. By combining multiple models, this method can resolve ambiguity with a higher degree of accuracy than methods that use only a single model.
    Type: Application
    Filed: March 30, 2004
    Publication date: December 2, 2004
    Applicant: Oki Electric Industry Co., Ltd.
    Inventor: Tetsuji Nakagawa
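The model combination step in the abstract of publication 20040243409 (weight several models, combine their probabilities into one value per hypothesis, pick the highest) could look roughly like the sketch below. The abstract does not say how the weighted probabilities are combined, so the linear interpolation used here is an assumption, as are the names `combined_probability` and `best_hypothesis` and the representation of each model as a plain callable.

```python
from typing import Callable, Sequence

Model = Callable[[str], float]  # maps a hypothesis to its probability under one model

def combined_probability(hypothesis: str,
                         models: Sequence[Model],
                         weights: Sequence[float]) -> float:
    """Weighted sum of the per-model probabilities (assumed linear interpolation)."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    return sum(w * m(hypothesis) for m, w in zip(models, weights))

def best_hypothesis(hypotheses: Sequence[str],
                    models: Sequence[Model],
                    weights: Sequence[float]) -> str:
    """Select the hypothesis with the highest combined probability."""
    return max(hypotheses,
               key=lambda h: combined_probability(h, models, weights))
```

Changing the weights can change the winner, which is the point of combining models that capture different information (for example, part-of-speech forms versus lexicalized statistics).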
  • Publication number: 20040243407
    Abstract: The present invention employs user modeling to model a user's behavior patterns. The user's behavior patterns are then used to influence named entity (NE) recognition.
    Type: Application
    Filed: May 27, 2003
    Publication date: December 2, 2004
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Peter K. L. Mau, Kuansan Wang, Milind Mahajan, Alejandro Acero
  • Publication number: 20040243406
    Abstract: This invention provides a system for speech recognition comparing speech against stored character strings in memory. Speech is transformed into spoken character strings. To accelerate the identification, a small group of characters from the stored character strings and the spoken character string are compared and the probabilities for identification may be calculated from those results. Those stored patterns, where the probability for identifying the speech exceeds a predetermined value, may be selected for further processing. The selected strings may have the remaining characters added to the group of characters for the next comparison. Alternatively, the number of characters for comparison may be incremented by a predetermined number in a step-by-step fashion, reducing the number of comparisons in subsequent steps as the probabilities for identification rise.
    Type: Application
    Filed: January 29, 2004
    Publication date: December 2, 2004
    Inventor: Ansgar Rinscheid
  • Publication number: 20040204941
    Abstract: A digital voice transcription system and method is provided having a digital communication network and a first server operatively coupled to the digital communication network. The first server stores a digital transcription job file corresponding to an author's voice and allocates stored job files for transcription. A second server corresponding to a transcription center is operatively coupled to the digital communication network. The second server is in digital communication with the first server and is arranged to initiate transfer of digital transcription job files allocated to the second server from the first server to the second server.
    Type: Application
    Filed: December 23, 2003
    Publication date: October 14, 2004
    Applicant: WeType4U
    Inventors: David Israch, Yorck P. Haase, Alind Gupta, Pavan Jha, Devesh Bhartiya, Scott C. Boterweg, Philip Austin, Fazil Atacan
  • Publication number: 20040204940
    Abstract: A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. The prior knowledge is embodied in a rule, combined from separate rules created for each label outputted by the classifier, each of which includes a weight measure p(x). A first set of created entries for increasing the corpus of training entries is created by attaching all labels to each entry of the original corpus of training entries, with a weight ηp(x), or η(1−p(x)), in association with each label that meets, or fails to meet, the condition specified for the label, η being a preselected positive number. The second set is created by not attaching any of the labels to each of the original corpus of training entries, with a weight of η(1−p(x)), or ηp(x), in association with each label that meets, or fails to meet, the condition specified for the label.
    Type: Application
    Filed: May 31, 2002
    Publication date: October 14, 2004
    Inventors: Hiyan Alshawi, Giuseppe DiFabbrizio, Narendra K. Gupta, Mazin G. Rahim, Robert E. Schapire, Yoram Singer
  • Publication number: 20040199386
    Abstract: A method is developed which includes 1) defining a switching state space model for a continuous valued hidden production-related parameter and the observed speech acoustics, and 2) approximating a posterior probability that provides the likelihood of a sequence of the hidden production-related parameters and a sequence of speech units based on a sequence of observed input values. In approximating the posterior probability, the boundaries of the speech units are not fixed but are optimally determined. Under one embodiment, a mixture of Gaussian approximation is used. In another embodiment, an HMM posterior approximation is used.
    Type: Application
    Filed: April 1, 2003
    Publication date: October 7, 2004
    Applicant: Microsoft Corporation
    Inventors: Hagai Attias, Leo Jingyu Lee, Li Deng
  • Publication number: 20040193412
    Abstract: A speech recognition method, system, and program product, the method comprising in one embodiment: obtaining a frame match score for each of a plurality of different speech elements for a frame; obtaining a scrunched score for each of a plurality of the frame match scores for the frame, wherein a scrunched score means applying a non-linear transformation to each of the frame match scores so that frame match score differences among relatively good competing frame matches are reduced while the score differences between good frame matches and the poor frame matches is substantially maintained or increased, wherein a relatively good frame match score is determined based on a criterion; for each of a plurality of hypotheses, accumulating the scrunched scores for frames of the hypothesis to obtain a hypothesis scrunched score for the hypothesis; selecting a plurality of hypotheses with better hypothesis scrunched scores as compared to the accumulated scrunched scores for other hypotheses; for each of the selected h
    Type: Application
    Filed: March 18, 2003
    Publication date: September 30, 2004
    Applicant: Aurilab, LLC
    Inventor: James K. Baker
  • Patent number: 6789061
    Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states si and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states si. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.
    Type: Grant
    Filed: August 14, 2000
    Date of Patent: September 7, 2004
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
  • Publication number: 20040167779
    Abstract: In order to prevent degradation of speech recognition accuracy due to an unknown word, a dictionary database has stored therein a word dictionary in which are stored, in addition to words for the objects of speech recognition, suffixes, which are sound elements and a sound element sequence, which form the unknown word, for classifying the unknown word by the part of speech thereof. Based on such a word dictionary, a matching section connects the acoustic models of a sound model database, and calculates the score using the series of features output by a feature extraction section on the basis of the connected acoustic model. Then, the matching section selects a series of the words, which represents the speech recognition result, on the basis of the score.
    Type: Application
    Filed: February 24, 2004
    Publication date: August 26, 2004
    Applicant: SONY CORPORATION
    Inventors: Helmut Lucke, Katsuki Minamino, Yasuharu Asano, Hiroaki Ogawa
  • Patent number: 6782362
    Abstract: A method and apparatus determine the likelihood of a sequence of words based in part on a segment model. The segment model includes trajectory expressions formed as the product of a polynomial matrix and a generation matrix. The likelihood of the sequence of words is based in part on a segment probability derived by subtracting the trajectory expressions from a feature vector matrix that contains a sequence of feature vectors for a segment of speech. Aspects of the method and apparatus also include training the segment model using such a segment probability.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: August 24, 2004
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Publication number: 20040158469
    Abstract: Outputs of an automatic probabilistic event detection system, such as a fact extraction system, a speech-to-text engine or an automatic character recognition system, are matched with comparable results produced manually or by a different system. This comparison allows statistical modeling of the run-time behavior of the event detection system. This model can subsequently be used to give supplemental or replacement data for an output sequence of the system. In particular, the model can effectively calibrate the system for use with data of a particular statistical nature.
    Type: Application
    Filed: February 5, 2004
    Publication date: August 12, 2004
    Applicant: Verint Systems, Inc.
    Inventor: Michael Brand
  • Publication number: 20040153319
    Abstract: A speech recognition system comprises exactly two automated speech recognition (ASR) engines connected to receive the same inputs. Each engine produces a recognition output, a hypothesis. The system implements one of two (or both) methods for combining the output of the two engines. In one method, a confusion matrix statistically generated for each speech recognition engine is converted into an alternatives matrix in which every column is ordered by highest-to-lowest probability. A program loop is set up in which the recognition outputs of the speech recognition engines are cross-compared with the alternatives matrices. If the output from the first ASR engine matches an alternative, its output is adopted as the final output. If the vectors provided by the alternatives matrices are exhausted without finding a match, the output from the first speech recognition engine is adopted as the final output. In a second method, the confusion matrix for each ASR engine is converted into a Bayesian probability matrix.
    Type: Application
    Filed: January 30, 2003
    Publication date: August 5, 2004
    Inventor: Sherif Yacoub
  • Publication number: 20040148167
    Abstract: A method for controlling an information system during the output of stored information segments via a signaling device (50a). Useful information is stored in a database (32) for being requested, from which information at least one information segment is specified as a first data segment (W1) via a first voice signal (sa(t),sa(z)) and is provided via a control output (20,40,50;50a) or is converted (50b) into a control signal for a technical device (G). The information is organized in the database such that an initially limited first information area (32a) of stored information is accessible (4,4a,4b) to said voice signal, for selecting the specified information segment therefrom. A further information area (32b,32c,32d) of said database (32) is activated (59,70,4c,4d) as a second information area, if the information segment (W1) corresponding to a first voice signal segment (s1) of said first voice signal (sa(t)) is not contained in said first information area (32a).
    Type: Application
    Filed: October 21, 2003
    Publication date: July 29, 2004
    Inventors: Klaus Schimmer, Peter Plakensteiner, Stefan Harbeck
  • Publication number: 20040148168
    Abstract: A method and device are provided for automatically differentiating and/or detecting acoustic signals, whereby the signals are statistically analyzed, at least in part, and their reflection coefficients of at least one are calculated. Thereafter, a comparison value, which is dependent exclusively on a single reflection coefficient, is calculated and compared with at least one predetermined reference value.
    Type: Application
    Filed: November 3, 2003
    Publication date: July 29, 2004
    Inventor: Tim Fingscheidt
  • Publication number: 20040138885
    Abstract: A combination system of speech recognition engines comprises a pool of speech recognition engines that vary amongst themselves in various characterizing measures like processing speed, error rates, cost, etc. One such speech recognition engine is designated as primary and others are designated as supplemental, according to the job at hand and the peculiar benefits of using each selected engine. The primary engine is run on every job. A supplemental engine may be run if some measure indicates more speed or more accuracy is needed. A combination unit aligns and combines the outputs of the primary and supplemental engines. Any grammar constraints are enforced by the combination unit in the final result. A finite state machine is generated from the grammar constraints, and is used to guide the search in word transition network for an optimal final string.
    Type: Application
    Filed: January 9, 2003
    Publication date: July 15, 2004
    Inventor: Xiaofan Lin
  • Publication number: 20040138886
    Abstract: A method of parametrically encoding a transient audio signal, including the steps of: determining a set V of the N largest frequency components of the transient audio signal, where N is a predetermined number; determining an approximate envelope of the transient audio signal; and determining a predetermined number P of samples W of the approximate envelope for use in generating a spline approximation of the approximate envelope, whereby a parametric representation of the transient audio signal is given by parameters including V, N, P and W, such that a decoder receiving the parametric representation can reproduce a received approximation of the transient audio signal.
    Type: Application
    Filed: July 23, 2003
    Publication date: July 15, 2004
    Applicant: STMicroelectronics Asia Pacific PTE Limited
    Inventors: Mohammed Javed Absar, Sapna George
  • Patent number: 6760699
    Abstract: A method and apparatus for performing automatic speech recognition (ASR) in a distributed ASR system for use over a wireless channel takes advantage of probabilistic information concerning the likelihood that a given portion of the data has been accurately decoded to a particular value. The probability of error in each feature in a transmitted feature set is employed to improve speech recognition performance under adverse channel conditions. Bit error probabilities for each of the bits which are used to encode a given ASR feature are used to compute the confidence level that the system may have in the decoded value of that feature. Features that have been corrupted with high probability are advantageously either not used or are weighted less in the acoustic distance computation performed by the speech recognizer.
    Type: Grant
    Filed: April 24, 2000
    Date of Patent: July 6, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Vijitha Weerackody, Wolfgang Reichl, Alexandros Potamianos
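As a rough illustration of the idea in the abstract of patent 6760699, the sketch below derives a per-feature confidence from bit error probabilities and uses it to down-weight or drop features in a distance computation. The details are assumptions: the abstract does not give formulas, so this treats bit errors as independent (confidence is the product of per-bit success probabilities) and uses a confidence-weighted squared Euclidean distance. The names `feature_confidence`, `weighted_distance`, and `drop_below` are hypothetical.

```python
from typing import Sequence

def feature_confidence(bit_error_probs: Sequence[float]) -> float:
    """Probability that every bit encoding one feature was decoded correctly.

    Assumes bit errors are independent of one another.
    """
    confidence = 1.0
    for p in bit_error_probs:
        confidence *= (1.0 - p)
    return confidence

def weighted_distance(features: Sequence[float],
                      reference: Sequence[float],
                      confidences: Sequence[float],
                      drop_below: float = 0.0) -> float:
    """Squared distance in which each feature is scaled by its confidence.

    Features whose confidence is below `drop_below` are not used at all.
    """
    total = 0.0
    for x, r, c in zip(features, reference, confidences):
        if c < drop_below:
            continue  # feature is too likely to be corrupted; skip it
        total += c * (x - r) ** 2
    return total
```

Setting `drop_below` above zero corresponds to the "not used" option in the abstract, while leaving it at zero corresponds to the "weighted less" option.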
  • Patent number: 6754626
    Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.
    Type: Grant
    Filed: March 1, 2001
    Date of Patent: June 22, 2004
    Assignee: International Business Machines Corporation
    Inventor: Mark E. Epstein
  • Patent number: 6741666
    Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.
    Type: Grant
    Filed: January 11, 2000
    Date of Patent: May 25, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Félix Henry, Bertrand Berthelot, Eric Majani
  • Patent number: 6738745
    Abstract: Methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using a confidence score. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. The background models are created or trained based on speech data in other languages, which may or may not include the target language itself. A number of types of background language models may be employed for each modeled language, including one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models.
    Type: Grant
    Filed: April 7, 2000
    Date of Patent: May 18, 2004
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, Mahesh Viswanathan
  • Patent number: 6735562
    Abstract: A method of estimating a confidence measure for a speech recognition system involves comparing an input speech signal with a number of predetermined models of possible speech signals. Best scores indicating the degree of similarity between the input speech signal and each of the predetermined models are then used to determine a normalized variance, which is used as the Confidence Measure. In order to determine whether the input speech signal has been correctly recognized, the Confidence Measure is compared to a threshold value. The threshold value is weighted according to the Signal to Noise Ratio of the input speech signal and according to the number of predetermined models used.
    Type: Grant
    Filed: June 5, 2000
    Date of Patent: May 11, 2004
    Assignee: Motorola, Inc.
    Inventors: Yaxin Zhang, Ho Chuen Choi, Jian Ming Song
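A variance-based confidence measure of the kind outlined in the abstract of patent 6735562 might be sketched as below. The abstract does not state how the variance is normalized or which direction of the comparison indicates a correct recognition, so both choices here are assumptions: the variance is divided by the squared mean, and a value above the threshold is treated as "recognized" on the reasoning that one model standing out from the rest produces a large spread. The threshold weighting by signal-to-noise ratio and model count is left to the caller. The names `normalized_variance` and `is_recognized` are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence

def normalized_variance(best_scores: Sequence[float]) -> float:
    """Population variance of the per-model best scores divided by the squared mean."""
    m = mean(best_scores)
    if m == 0:
        raise ValueError("mean score is zero; normalization is undefined")
    return pvariance(best_scores) / (m * m)

def is_recognized(best_scores: Sequence[float], threshold: float) -> bool:
    """Assumed decision rule: accept when the confidence measure exceeds the threshold."""
    return normalized_variance(best_scores) > threshold
```

When all models score about the same, the measure is near zero, which this sketch interprets as an unreliable recognition.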
  • Publication number: 20040083102
    Abstract: This method of automatic processing of a speech signal comprises:
    Type: Application
    Filed: August 12, 2003
    Publication date: April 29, 2004
    Applicant: FRANCE TELECOM
    Inventors: Samir Nefti, Olivier Boeffard
  • Patent number: 6725195
    Abstract: Probabilistic recognition using clusters and simple probability functions provides improved performance by employing a limited number of clusters each using a relatively large number of simple probability functions. The simple probability functions for each of the limited number of state clusters are greater in number than the limited number of state clusters.
    Type: Grant
    Filed: October 22, 2001
    Date of Patent: April 20, 2004
    Assignee: SRI International
    Inventors: Ananth Sankar, Venkata Ramana Rao Gadde
  • Patent number: 6708149
    Abstract: The present invention discloses an apparatus and method of decoding information received over a noisy communications channel to determine the intended transmitted information. The present invention uses a vector fixed-lag algorithm to determine the probabilities of the intended transmitted information. The algorithm is implemented by multiplying an initial state vector with a matrix containing information about the communications channel. The product is then recursively multiplied by the matrix τ times, using the new product with each recursive multiplication and the forward information is stored for a fixed period of time, τ. The final product is multiplied with a unity column vector yielding a probability of a possible input. The estimated input is the input having the largest probability.
    Type: Grant
    Filed: April 30, 2001
    Date of Patent: March 16, 2004
    Assignee: AT&T Corp.
    Inventor: William Turin
  • Publication number: 20040039571
    Abstract: For “Super Audio CD” (SACD) the DSD signals are losslessly coded, using framing, prediction and entropy coding. Besides the efficiently encoded signals, a large number of parameters, i.e. the side-information, has to be stored on the SACD too. The smaller the storage capacity that is required for the side-information, the better the overall coding gain is. Therefore coding techniques are applied to the side-information too so as to compress the amount of data of the side information.
    Type: Application
    Filed: August 29, 2003
    Publication date: February 26, 2004
    Inventors: Alphons A.M.L. Bruekers, Adriaan J. Rijnberg
  • Publication number: 20040030551
    Abstract: A machine translation (MT) system may utilize a phrase-based joint probability model. The model may be used to generate source and target language sentences simultaneously. In an embodiment, the model may learn phrase-to-phrase alignments from word-to-word alignments generated by a word-to-word statistical MT system. The system may utilize the joint probability model for both source-to-target and target-to-source translation applications.
    Type: Application
    Filed: March 27, 2003
    Publication date: February 12, 2004
    Inventors: Daniel Marcu, William Wong, Kevin Knight, Philipp Koehn
  • Patent number: 6691087
    Abstract: A signal processing system for detecting the presence of a desired signal component by applying a probabilistic description to the classification and tracking of various signal components (e.g., desired versus non-desired signal components) in an input signal is disclosed.
    Type: Grant
    Filed: September 30, 1998
    Date of Patent: February 10, 2004
    Assignees: Sarnoff Corporation, LG Electronics, Inc.
    Inventors: Lucas Parra, Aalbert de Vries
  • Patent number: 6691088
    Abstract: An apparatus and method of determining parameters of a statistical language model for automatic speech recognition systems using a training corpus are disclosed. To improve the perplexity and the error rate in the speech recognition, at least a proportion of the elements of a vocabulary used is combined so as to form context-independent vocabulary element categories. The frequencies of occurrence of vocabulary element sequences, and if applicable, the frequencies of occurrence of derived sequences formed from the vocabulary element sequences through the replacement of at least one vocabulary element by the associated vocabulary element class, are evaluated in the language modeling process. The parameters of the language model are then derived from the evaluated frequencies of occurrence.
    Type: Grant
    Filed: October 20, 1999
    Date of Patent: February 10, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Reinhard Blasig
  • Patent number: 6687665
    Abstract: In a voice pitch normalization device equipped in a voice recognition device VRAp for recognizing an incoming command voice Sva uttered by any speaker, and used to normalize the incoming command voice to be in an optimal pitch for voice recognition, a target voice generator produces a target voice signal by changing the incoming command voice Svd on the basis of a predetermined degree. A probability calculator calculates a probability indicating a degree of coincidence among the target voice signal and a plurality of words in sample data. A voice pitch changer repeatedly changes the target voice signal in voice pitch until a maximum probability becomes a predetermined probability or greater.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: February 3, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Mikio Oda, Tomoe Kawane
  • Publication number: 20040015351
    Abstract: A solution for determining the accuracy of a speech recognition system. A first graphical user interface (GUI) is provided for selecting a transaction log. The transaction log has at least one entry that specifies a speech recognition text result. A second GUI is also provided for selecting at least one audio segment corresponding to the entry. The second GUI includes an activatable icon for initiating transcription of the audio segment through a reference speech recognition engine to generate a second text result.
    Type: Application
    Filed: July 16, 2002
    Publication date: January 22, 2004
    Applicant: International Business Machines Corporation
    Inventors: Shailesh B. Gandhi, Peeyush Jaiswal, Victor S. Moore, Gregory L. Toon
  • Publication number: 20040015352
    Abstract: A method segments an audio signal including frames into non-speech and speech segments. First, high-dimensional spectral features are extracted from the audio signal. The high-dimensional features are then projected non-linearly to low-dimensional features that are subsequently averaged using a sliding window and weighted averages. A linear discriminant is applied to the averaged low-dimensional features to determine a threshold separating the low-dimensional features. The linear discriminant can be determined from a Gaussian mixture or a polynomial applied to a bi-modal histogram distribution of the low-dimensional features. Then, the threshold can be used to classify the frames into either non-speech or speech segments. Speech segments having a very short duration can be discarded, and the longer speech segments can be further extended. In batch-mode or real-time the threshold can be updated continuously.
    Type: Application
    Filed: July 17, 2002
    Publication date: January 22, 2004
    Inventors: Bhiksha Ramakrishnan, Rita Singh
  • Patent number: 6678657
    Abstract: The present invention relates to a method and an apparatus for a robust feature extraction for speech recognition in a noisy environment, wherein the speech signal is segmented and is characterized by spectral components. The speech signal is split into a number of short-term spectral components in L subbands, with L=1, 2, . . . and a noise spectrum is estimated from segments that only contain noise. Then a spectral subtraction of the estimated noise spectrum from the corresponding short-term spectrum is performed and a probability for each short-term spectrum component to contain noise is calculated. Finally, those spectral components of each short-term spectrum having a low probability to contain speech are interpolated in order to smooth those short-term spectra that only contain noise. With the interpolation the spectral components containing noise are interpolated by reliable spectral speech components that could be found in the neighborhood.
    Type: Grant
    Filed: October 23, 2000
    Date of Patent: January 13, 2004
    Assignee: Telefonaktiebolaget LM Ericsson (Publ)
    Inventors: Raymond Brückner, Hans-Günter Hirsch, Rainer Klisch, Volker Springer
  • Patent number: 6678658
    Abstract: A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.
    Type: Grant
    Filed: July 7, 2000
    Date of Patent: January 13, 2004
    Assignee: The Regents of the University of California
    Inventors: John Hogden, David Nix
  • Publication number: 20040006465
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Application
    Filed: February 10, 2003
    Publication date: January 8, 2004
    Applicant: Speechworks International, Inc., a Delaware Corporation
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
  • Publication number: 20040002861
    Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of an input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, and stores the L extracted outputs based on an order of magnitude of the extracted L outputs.
    Type: Application
    Filed: June 20, 2003
    Publication date: January 1, 2004
    Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
  • Patent number: 6662158
    Abstract: A method and apparatus is provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: December 9, 2003
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Publication number: 20030216914
    Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.
    Type: Application
    Filed: May 20, 2002
    Publication date: November 20, 2003
    Inventors: James G. Droppo, Alejandro Acero, Li Deng
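The variance adjustment described in the abstract above can be illustrated with a short sketch. It assumes diagonal-covariance Gaussians and per-dimension uncertainty estimates; the function and parameter names are hypothetical and are not taken from the application.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian with the given variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uncertain_log_likelihood(cleaned, uncertainty_var, mean, var):
    """Score a cleaned feature vector against one diagonal Gaussian.

    Each model variance is increased by the estimated variance of the
    cleaned signal in that dimension before the density is evaluated.
    """
    return sum(
        log_gaussian(x, m, v + u)
        for x, u, m, v in zip(cleaned, uncertainty_var, mean, var)
    )
```

With zero uncertainty this reduces to the ordinary Gaussian score. When the cleaned feature lies far from the model mean, a larger uncertainty flattens the distribution and penalizes the mismatch less.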
  • Patent number: 6643619
    Abstract: A method for reducing interference in acoustic signals by using an adaptive filter method involving spectral subtraction. The inventive method enables a significant reduction of interference in acoustic signals, especially voice signals, without causing any substantial falsification of said signals such as echo or musical tones, and significantly reduces computational requirements in comparison with other methods known per se that are similarly designed to improve signal quality.
    Type: Grant
    Filed: June 20, 2000
    Date of Patent: November 4, 2003
    Inventors: Klaus Linhard, Tim Haulick
  • Patent number: 6633841
    Abstract: An extended signal coding system that accommodates substantially music-like signals within a signal while maintaining a high perceptual quality in a reproduced signal during discontinued transmission (DTX) operation. The extended signal coding system contains internal circuitry that performs detection and classification of the speech signal, depending on numerous characteristics of the signal, to ensure the high perceptual quality in the reproduced signal. In certain embodiments of the invention, the signal is a speech signal, and the speech signal has a substantially music-like signal contained therein, and the extended signal coding system overrides any voice activity detection (VAD) decision that is used to determine which among a plurality of source coding modes are to be employed using a voice activity detection (VAD) correction/supervision circuitry. This is particularly relevant for discontinued transmission (DTX) operation.
    Type: Grant
    Filed: March 15, 2000
    Date of Patent: October 14, 2003
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Jes Thyssen, Adil Benyassine
  • Patent number: 6625600
    Abstract: The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string one or more contiguous sequences of N-symbols may be extracted. One of the extracted contiguous sequences of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. A preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that correspond to the at least one matched contiguous sequence of N-symbols. A third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings may be computed.
    Type: Grant
    Filed: May 1, 2001
    Date of Patent: September 23, 2003
    Assignee: Telelogue, Inc.
    Inventors: Yevgenly Lyudovyk, Esther Levin