Probability Patents (Class 704/240)

Speech recognition system and method for generating phonetic estimates

Patent number: 6868380

Abstract: A speech recognition system for transforming an acoustic signal into a stream of phonetic estimates includes a frequency analyzer for generating a short-time frequency representation of the acoustic signal. A novelty processor separates background components of the representation from region of interest components of the representation. The output of the novelty processor includes the region of interest components of the representation according to the novelty parameters. An attention processor produces a gating signal as a function of the novelty output according to attention parameters. A coincidence processor produces information regarding co-occurrences between samples of the novelty output over time and frequency. The coincidence processor selectively gates the coincidence output as a function of the gating signal according to one or more coincidence parameters.

Type: Grant

Filed: March 23, 2001

Date of Patent: March 15, 2005

Assignee: Eliza Corporation

Inventor: John Kroeker
Hypertext navigation system controlled by spoken words

Patent number: 6859777

Abstract: A hypertext navigation system that is controllable by spoken words has hypertext documents to which specific dictionaries and probability models for assisting in an acoustic voice recognition of hyper-links of this hypertext document are allocated. Control of a hypertext viewer or, respectively, browser and navigation in the hypertext document or hypertext system by pronouncing links is provided. The voice recognition is thereby optimally adapted to the links to be recognized without these having to be previously known.

Type: Grant

Filed: January 17, 2001

Date of Patent: February 22, 2005

Assignee: Siemens Aktiengesellschaft

Inventor: Darin Edward Krasle
System and method for speech verification using an efficient confidence measure

Patent number: 6850886

Abstract: The present invention comprises a system and method for speech verification using an efficient confidence measure, and includes a speech verifier which compares a confidence measure for a recognized word to a predetermined threshold value in order to determine whether the recognized word is valid, where a recognized word corresponds to a word model that produces a highest recognition score. In accordance with the present invention, the foregoing confidence measure may be calculated using the recognition score for the recognized word and a pseudo filler score that may be based upon selected average recognition scores from an N-best list of recognition candidates.

Type: Grant

Filed: May 31, 2001

Date of Patent: February 1, 2005

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Gustavo Hernandez Abrego, Xavier Menendez-Pidal
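
The verification step in the abstract above (patent 6850886) can be illustrated with a minimal sketch. The abstract does not give the formula, so two things here are assumptions: the pseudo filler score is taken as the mean of the N-best scores below the top entry, and the confidence measure is the difference between the best score and that filler score. The function names are hypothetical.

```python
def pseudo_filler_score(nbest_scores, skip=1):
    """Average the N-best scores after the top `skip` entries.

    Which scores are "selected" is an assumption; the abstract only says
    the filler score may be based on selected average recognition scores.
    """
    ranked = sorted(nbest_scores, reverse=True)
    rest = ranked[skip:]
    if not rest:
        raise ValueError("need more candidates than `skip`")
    return sum(rest) / len(rest)


def verify(nbest_scores, threshold):
    """Return (is_valid, confidence) for the top-scoring candidate.

    Confidence is assumed to be best score minus pseudo filler score.
    """
    best = max(nbest_scores)
    confidence = best - pseudo_filler_score(nbest_scores)
    return confidence >= threshold, confidence
```

A recognized word whose score stands well clear of the remaining candidates passes the threshold; a word that barely beats its competitors is rejected.
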
Word recognition method and storage medium that stores word recognition program

Patent number: 6847734

Abstract: In word recognition using the character recognition result, recognition processing is performed for an input character string that corresponds to a word to be recognized, a probability at which characteristics obtained as the result of character recognition are generated by conditioning characters of words contained in a word dictionary that stores in advance candidates of words to be recognized. The thus obtained probability is divided by a probability at which characteristics obtained as the result of character recognition are generated, and each of the division results obtained relevant to the characters of the words contained in the word dictionary is multiplied relevant to all the characters. The recognition results of the above words are obtained based on the multiplication results.

Type: Grant

Filed: January 26, 2001

Date of Patent: January 25, 2005

Assignee: Kabushiki Kaisha Toshiba

Inventor: Tomoyuki Hamamura
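
The scoring rule in the abstract above (patent 6847734) divides, for each character position, the probability of the observed features given the dictionary character by the unconditional probability of those features, then multiplies the ratios over all characters. A minimal sketch follows, assuming the per-character probabilities are already available as plain numbers; the function and parameter names are hypothetical.

```python
import math


def word_score(char_likelihoods, feature_priors):
    """Product over character positions of P(r_i | c_i) / P(r_i).

    char_likelihoods[i] is P(r_i | c_i) for the dictionary word's i-th
    character; feature_priors[i] is P(r_i) for the observed features.
    """
    if len(char_likelihoods) != len(feature_priors):
        raise ValueError("one likelihood and one prior per character")
    return math.prod(l / p for l, p in zip(char_likelihoods, feature_priors))


def recognize(candidates, feature_priors):
    """Pick the dictionary word with the largest product of ratios.

    `candidates` maps each dictionary word to its list of P(r_i | c_i).
    """
    return max(candidates, key=lambda w: word_score(candidates[w], feature_priors))
```
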
Method of speech recognition by presenting N-best word candidates

Patent number: 6839667

Abstract: A method for performing speech recognition can include receiving user speech and determining a plurality of potential candidates. Each of the candidates can provide a textual interpretation of the speech. Confidence scores can be calculated for the candidates. The confidence scores can be compared to a predetermined threshold. Also, selected ones of the plurality of candidates can be presented to the user as alternative interpretations of the speech when none of the confidence scores is greater than the predetermined threshold. The selected ones of the plurality of candidates can have confidence scores above a predetermined minimum threshold, and thus can have confidence scores within a predetermined range.

Type: Grant

Filed: May 16, 2001

Date of Patent: January 4, 2005

Assignee: International Business Machines Corporation

Inventor: David E. Reich
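
The two-threshold selection in the abstract above (patent 6839667) can be sketched as follows. Candidates are presented as alternatives only when none exceeds the acceptance threshold, and only those above the minimum threshold are shown. Accepting the top candidate outright when it does exceed the threshold is an assumption, as are the function name and the return format.

```python
def choose_output(candidates, accept_threshold, minimum_threshold):
    """Decide between accepting one result and offering alternatives.

    `candidates` is a list of (text, confidence) pairs.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if ranked and ranked[0][1] > accept_threshold:
        # Assumed behavior: a sufficiently confident top result is used directly.
        return ("accept", [ranked[0][0]])
    # No candidate is confident enough: offer those within the allowed range.
    alternatives = [text for text, conf in ranked if conf > minimum_threshold]
    return ("ask_user", alternatives)
```
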
Method, system and recording medium for automatic speech recognition using a confidence measure driven scalable two-pass recognition strategy for large list grammars

Publication number: 20040254790

Abstract: A method, a system and recording medium in which automatic speech recognition may use large list grammars and a confidence measure driven scalable two-pass recognition strategy.

Type: Application

Filed: June 13, 2003

Publication date: December 16, 2004

Applicant: International Business Machines Corporation

Inventors: Miroslav Novak, Diego Ruiz
Method and array for introducing temporal correlation in hidden markov models for speech recognition

Patent number: 6832190

Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.

Type: Grant

Filed: November 10, 2000

Date of Patent: December 14, 2004

Assignee: Siemens Aktiengesellschaft

Inventors: Jochen Junkawitsch, Harald Höge
Process for implementing a speech recognizer, the related recognizer and process for speech recognition

Patent number: 6832191

Abstract: To implement a speech recognizer for a language in conditions of substantial unavailability of related speech training material the first step (1,2) is, based on related speech training material, a multilingual speech recognizer (2) for a plurality of known languages. The recognizer for such given language (5) is then implemented by interpolation (4) starting from the said multilingual recognizer (2). The recognizer (5) generated in this fashion is susceptible of being subsequently refined based on related speech training material acquired online (4) during later use (FIG.

Type: Grant

Filed: August 28, 2000

Date of Patent: December 14, 2004

Assignee: Telecom Italia Lab S.p.A.

Inventors: Alessandra Frasca, Giorgio Micca, Enrico Palme
Speech recognition method and apparatus utilizing segment models

Publication number: 20040243410

Abstract: A method and apparatus determine the likelihood of a sequence of words based in part on a segment model. The segment model includes trajectory expressions formed as the product of a polynomial matrix and a generation matrix. The likelihood of the sequence of words is based in part on a segment probability derived by subtracting the trajectory expressions from a feature vector matrix that contains a sequence of feature vectors for a segment of speech. Aspects of the method and apparatus also include training the segment model using such a segment probability.

Type: Application

Filed: June 14, 2004

Publication date: December 2, 2004

Applicant: Microsoft Corporation

Inventors: Hsiao-Wuen Hon, Kuansan Wang
Method and apparatus using source-channel models for word segmentation

Publication number: 20040243408

Abstract: A method and apparatus for segmenting text is provided that identifies a sequence of entity types from a sequence of characters and thereby identifies a segmentation for the sequence of characters. Under the invention, the sequence of entity types is identified using probabilistic models that describe the likelihood of a sequence of entities and the likelihood of sequences of characters given particular entities. Under one aspect of the invention, organization name entities are identified from a first sequence of identified entities to form a final sequence of identified entities.

Type: Application

Filed: May 30, 2003

Publication date: December 2, 2004

Applicant: Microsoft Corporation

Inventors: Jianfeng Gao, Mu Li, Chang-Ning Huang, Jian Sun, Lei Zhang, Ming Zhou
Morphological analyzer, morphological analysis method, and morphological analysis program

Publication number: 20040243409

Abstract: An input text is analyzed into morphemes by using a prescribed morphological analysis procedure to generate word strings with part-of-speech tags, including form information for parts of speech having forms, as hypotheses. The probabilities of occurrence of each hypothesis in a corpus of text are calculated by use of two or more part-of-speech n-gram models, at least one of which takes the forms of the parts of speech into consideration. Lexicalized models and class models may also be used. The models are weighted and the probabilities are combined according to the weights to obtain a single probability for each hypothesis. The hypothesis with the highest probability is selected as the solution to the morphological analysis. By combining multiple models, this method can resolve ambiguity with a higher degree of accuracy than methods that use only a single model.

Type: Application

Filed: March 30, 2004

Publication date: December 2, 2004

Applicant: Oki Electric Industry Co., Ltd.

Inventor: Tetsuji Nakagawa
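
The model combination in the abstract above (publication 20040243409) weights several n-gram models and merges their probabilities into one value per hypothesis. The abstract does not say how the weighted probabilities are combined; the sketch below assumes simple linear interpolation, and all names are hypothetical.

```python
def combined_probability(model_probs, weights):
    """Linear interpolation of per-model probabilities (assumed combination rule)."""
    if len(model_probs) != len(weights):
        raise ValueError("one weight per model")
    return sum(w * p for w, p in zip(weights, model_probs))


def best_hypothesis(hypotheses, weights):
    """Select the hypothesis with the highest combined probability.

    `hypotheses` maps each tagged word string to the list of probabilities
    assigned to it by the individual models.
    """
    return max(hypotheses, key=lambda h: combined_probability(hypotheses[h], weights))
```

Changing the weights can change the winner, which is the point of combining models that capture different kinds of evidence.
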
System and method for user modeling to enhance named entity recognition

Publication number: 20040243407

Abstract: The present invention employs user modeling to model a user's behavior patterns. The user's behavior patterns are then used to influence named entity (NE) recognition.

Type: Application

Filed: May 27, 2003

Publication date: December 2, 2004

Applicant: Microsoft Corporation

Inventors: Dong Yu, Peter K. L. Mau, Kuansan Wang, Milind Mahajan, Alejandro Acero
System for speech recognition

Publication number: 20040243406

Abstract: This invention provides a system for speech recognition comparing speech against stored character strings in memory. Speech is transformed into spoken character strings. To accelerate the identification, a small group of characters from the stored character strings and the spoken character string are compared and the probabilities for identification may be calculated from those results. Those stored patterns, where the probability for identifying the speech exceeds a predetermined value, may be selected for further processing. The selected strings may have the remaining characters added to the group of characters for the next comparison. Alternatively, the number of characters for comparison may be incremented by a predetermined number in a step-by-step fashion, reducing the number of comparisons in subsequent steps as the probabilities for identification rise.

Type: Application

Filed: January 29, 2004

Publication date: December 2, 2004

Inventor: Ansgar Rinscheid
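
The step-by-step comparison in the abstract above (publication 20040243406) can be sketched as a pruning loop: compare a short prefix first, keep only the stored strings whose score exceeds a threshold, then lengthen the prefix and repeat on the survivors. The scoring function (fraction of matching positions) and the parameter defaults are assumptions made for illustration.

```python
def prefix_score(spoken, stored, k):
    """Fraction of the first k positions on which the two strings agree."""
    matches = sum(1 for a, b in zip(spoken[:k], stored[:k]) if a == b)
    return matches / k


def incremental_search(spoken, stored_strings, start=2, step=2, threshold=0.5):
    """Prune stored strings while the compared prefix grows by `step` characters."""
    survivors = list(stored_strings)
    longest = max([len(spoken)] + [len(s) for s in stored_strings])
    k = start
    while survivors:
        # Only the survivors of the previous round are compared again.
        survivors = [s for s in survivors if prefix_score(spoken, s, k) > threshold]
        if k >= longest:
            break
        k = min(k + step, longest)
    return survivors
```

Because most stored strings are discarded after the first short comparison, later and longer comparisons run on far fewer candidates.
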
Digital transcription system and method

Publication number: 20040204941

Abstract: A digital voice transcription system and method is provided having a digital communication network and a first server operatively coupled to the digital communication network. The first server stores a digital transcription job file corresponding to an author's voice and allocates stored job files for transcription. A second server corresponding to a transcription center is operatively coupled to the digital communication network. The second server is in digital communication with the first server and is arranged to initiate transfer of digital transcription job files allocated to the second server from the first server to the second server.

Type: Application

Filed: December 23, 2003

Publication date: October 14, 2004

Applicant: WeType4U

Inventors: David Israch, Yorck P. Haase, Alind Gupta, Pavan Jha, Devesh Bhartiya, Scott C. Boterweg, Philip Austin, Fazil Atacan
Spoken language understanding that incorporates prior knowledge into boosting

Publication number: 20040204940

Abstract: A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. The prior knowledge is embodied in a rule, combined from separate rules created for each label outputted by the classifier, each of which includes a weight measure p(x). A first set of created entries for increasing the corpus of training entries is created by attaching all labels to each entry of the original corpus of training entries, with a weight ηp(x), or η(1−p(x)), in association with each label that meets, or fails to meet, the condition specified for the label, η being a preselected positive number. The second set is created by not attaching any of the labels to each of the original corpus of training entries, with a weight of η(1−p(x)), or ηp(x), in association with each label that meets, or fails to meet, the condition specified for the label.

Type: Application

Filed: May 31, 2002

Publication date: October 14, 2004

Inventors: Hiyan Alshawi, Giuseppe DiFabbrizio, Narendra K. Gupta, Mazin G. Rahim, Robert E. Schapire, Yoram Singer
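
The threefold enlargement in the abstract above (publication 20040204940) can be sketched as follows. For every original entry and label, one copy is added with the label attached and one with it not attached, weighted by η·p(x) or η·(1−p(x)) depending on whether the rule's condition for that label is met. The tuple layout, the weight of 1.0 for original entries, and the `rule` callback are assumptions made for illustration.

```python
def enlarge_corpus(corpus, labels, rule, eta=1.0):
    """Build the enlarged training set as (entry, label, attached, weight) tuples.

    `corpus` is a list of (entry, set_of_true_labels).
    `rule(entry, label)` returns (condition_met, p) with p in [0, 1].
    """
    enlarged = []
    for entry, true_labels in corpus:
        for label in labels:
            # Original example; unit weight is an assumption.
            enlarged.append((entry, label, label in true_labels, 1.0))
            met, p = rule(entry, label)
            # First created set: label attached.
            attached_weight = eta * p if met else eta * (1.0 - p)
            enlarged.append((entry, label, True, attached_weight))
            # Second created set: label not attached.
            detached_weight = eta * (1.0 - p) if met else eta * p
            enlarged.append((entry, label, False, detached_weight))
    return enlarged
```
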
Method of speech recognition using variational inference with switching state space models

Publication number: 20040199386

Abstract: A method is developed which includes 1) defining a switching state space model for a continuous valued hidden production-related parameter and the observed speech acoustics, and 2) approximating a posterior probability that provides the likelihood of a sequence of the hidden production-related parameters and a sequence of speech units based on a sequence of observed input values. In approximating the posterior probability, the boundaries of the speech units are not fixed but are optimally determined. Under one embodiment, a mixture of Gaussian approximation is used. In another embodiment, an HMM posterior approximation is used.

Type: Application

Filed: April 1, 2003

Publication date: October 7, 2004

Applicant: Microsoft Corporation

Inventors: Hagai Attias, Leo Jingyu Lee, Li Deng
Non-linear score scrunching for more efficient comparison of hypotheses

Publication number: 20040193412

Abstract: A speech recognition method, system, and program product, the method comprising in one embodiment: obtaining a frame match score for each of a plurality of different speech elements for a frame; obtaining a scrunched score for each of a plurality of the frame match scores for the frame, wherein a scrunched score means applying a non-linear transformation to each of the frame match scores so that frame match score differences among relatively good competing frame matches are reduced while the score differences between good frame matches and the poor frame matches is substantially maintained or increased, wherein a relatively good frame match score is determined based on a criterion; for each of a plurality of hypotheses, accumulating the scrunched scores for frames of the hypothesis to obtain a hypothesis scrunched score for the hypothesis; selecting a plurality of hypotheses with better hypothesis scrunched scores as compared to the accumulated scrunched scores for other hypotheses; for each of the selected h

Type: Application

Filed: March 18, 2003

Publication date: September 30, 2004

Applicant: Aurilab, LLC

Inventor: James K. Baker
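
The "scrunching" idea in the abstract above (publication 20040193412) is a non-linear transform that shrinks score differences among good frame matches while largely preserving the gap to poor ones. The abstract does not specify the transform; the sketch below uses a piecewise-linear function as a stand-in and assumes that higher scores are better and that "relatively good" means within `margin` of the best score in the frame.

```python
def scrunch_frame(scores, margin=2.0, compress=0.25):
    """Compress gaps among near-best scores; keep the slope for poor scores.

    Returns scores relative to the frame's best score (best maps to 0).
    The transform is continuous at gap == margin.
    """
    best = max(scores)
    scrunched = []
    for s in scores:
        gap = best - s
        if gap <= margin:
            # Good matches: differences are scaled down by `compress`.
            scrunched.append(-compress * gap)
        else:
            # Poor matches: differences keep their original slope.
            scrunched.append(-(gap - margin * (1.0 - compress)))
    return scrunched
```

Summing the scrunched frame scores along a hypothesis then gives the hypothesis scrunched score that the abstract uses for selecting the better hypotheses.
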
Method and system for generating squeezed acoustic models for specialized speech recognizer

Patent number: 6789061

Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states si and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states si. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.

Type: Grant

Filed: August 14, 2000

Date of Patent: September 7, 2004

Assignee: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
Speech recognition apparatus, speech recognition method, and recording medium

Publication number: 20040167779

Abstract: In order to prevent degradation of speech recognition accuracy due to an unknown word, a dictionary database has stored therein a word dictionary in which are stored, in addition to words for the objects of speech recognition, suffixes, which are sound elements and a sound element sequence, which form the unknown word, for classifying the unknown word by the part of speech thereof. Based on such a word dictionary, a matching section connects the acoustic models of a sound model database, and calculates the score using the series of features output by a feature extraction section on the basis of the connected acoustic model. Then, the matching section selects a series of the words, which represents the speech recognition result, on the basis of the score.

Type: Application

Filed: February 24, 2004

Publication date: August 26, 2004

Applicant: SONY CORPORATION

Inventors: Helmut Lucke, Katsuki Minamino, Yasuharu Asano, Hiroaki Ogawa
Speech recognition method and apparatus utilizing segment models

Patent number: 6782362

Abstract: A method and apparatus determine the likelihood of a sequence of words based in part on a segment model. The segment model includes trajectory expressions formed as the product of a polynomial matrix and a generation matrix. The likelihood of the sequence of words is based in part on a segment probability derived by subtracting the trajectory expressions from a feature vector matrix that contains a sequence of feature vectors for a segment of speech. Aspects of the method and apparatus also include training the segment model using such a segment probability.

Type: Grant

Filed: April 27, 2000

Date of Patent: August 24, 2004

Assignee: Microsoft Corporation

Inventors: Hsiao-Wuen Hon, Kuansan Wang
Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments

Publication number: 20040158469

Abstract: Outputs of an automatic probabilistic event detection system, such as a fact extraction system, a speech-to-text engine or an automatic character recognition system, are matched with comparable results produced manually or by a different system. This comparison allows statistical modeling of the run-time behavior of the event detection system. This model can subsequently be used to give supplemental or replacement data for an output sequence of the system. In particular, the model can effectively calibrate the system for use with data of a particular statistical nature.

Type: Application

Filed: February 5, 2004

Publication date: August 12, 2004

Applicant: Verint Systems, Inc.

Inventor: Michael Brand
Two-engine speech recognition

Publication number: 20040153319

Abstract: A speech recognition system comprises exactly two automated speech recognition (ASR) engines connected to receive the same inputs. Each engine produces a recognition output, a hypothesis. The system implements one of two (or both) methods for combining the output of the two engines. In one method, a confusion matrix statistically generated for each speech recognition engine is converted into an alternatives matrix in which every column is ordered by highest-to-lowest probability. A program loop is set up in which the recognition outputs of the speech recognition engines are cross-compared with the alternatives matrices. If the output from the first ASR engine matches an alternative, its output is adopted as the final output. If the vectors provided by the alternatives matrices are exhausted without finding a match, the output from the first speech recognition engine is adopted as the final output. In a second method, the confusion matrix for each ASR engine is converted into a Bayesian probability matrix.

Type: Application

Filed: January 30, 2003

Publication date: August 5, 2004

Inventor: Sherif Yacoub
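
The first combination method in the abstract above (publication 20040153319) starts by turning each engine's confusion matrix into an alternatives matrix whose columns are ordered from highest to lowest probability. Only that conversion is sketched here. The matrix layout (rows are what was spoken, columns are what the engine output) and the dictionary return type are assumptions.

```python
def alternatives_matrix(confusion, labels):
    """Order each column of a confusion matrix by descending probability.

    confusion[i][j] is the probability that labels[i] was spoken when the
    engine output labels[j] (assumed layout). Returns a mapping from each
    engine output to the candidate labels, most probable first.
    """
    result = {}
    for j, output in enumerate(labels):
        column = [(confusion[i][j], labels[i]) for i in range(len(labels))]
        # The sort is stable, so ties keep the original label order.
        column.sort(key=lambda pair: pair[0], reverse=True)
        result[output] = [label for _, label in column]
    return result
```
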
Robust voice recognition with data bank organisation

Publication number: 20040148167

Abstract: A method for controlling an information system during the output of stored information segments via a signaling device (50a). Useful information is stored in a database (32) for being requested, from which information at least one information segment is specified as a first data segment (W1) via a first voice signal (sa(t),sa(z)) and is provided via a control output (20,40,50;50a) or is converted (50b) into a control signal for a technical device (G). The information is organized in the database such that an initially limited first information area (32a) of stored information is accessible (4,4a,4b) to said voice signal, for selecting the specified information segment therefrom. A further information area (32b,32c,32d) of said database (32) is activated (59,70,4c,4d) as a second information area, if the information segment (W1) corresponding to a first voice signal segment (s1) of said first voice signal (sa(t)) is not contained in said first information area (32a).

Type: Application

Filed: October 21, 2003

Publication date: July 29, 2004

Inventors: Klaus Schimmer, Peter Plakensteiner, Stefan Harbeck
Method and device for automatically differentiating and/or detecting acoustic signals

Publication number: 20040148168

Abstract: A method and device are provided for automatically differentiating and/or detecting acoustic signals, whereby the signals are statistically analyzed, at least in part, and their reflection coefficients of at least one are calculated. Thereafter, a comparison value, which is dependent exclusively on a single reflection coefficient, is calculated and compared with at least one predetermined reference value.

Type: Application

Filed: November 3, 2003

Publication date: July 29, 2004

Inventor: Tim Fingscheidt
Commercial automatic speech recognition engine combinations

Publication number: 20040138885

Abstract: A combination system of speech recognition engines comprises a pool of speech recognition engines that vary amongst themselves in various characterizing measures like processing speed, error rates, cost, etc. One such speech recognition engine is designated as primary and others are designated as supplemental, according to the job at hand and the peculiar benefits of using each selected engine. The primary engine is run on every job. A supplemental engine may be run if some measure indicates more speed or more accuracy is needed. A combination unit aligns and combines the outputs of the primary and supplemental engines. Any grammar constraints are enforced by the combination unit in the final result. A finite state machine is generated from the grammar constraints, and is used to guide the search in word transition network for an optimal final string.

Type: Application

Filed: January 9, 2003

Publication date: July 15, 2004

Inventor: Xiaofan Lin
Method and system for parametric characterization of transient audio signals

Publication number: 20040138886

Abstract: A method of parametrically encoding a transient audio signal, including the steps of: determining a set V of the N largest frequency components of the transient audio signal, where N is a predetermined number; determining an approximate envelope of the transient audio signal; and determining a predetermined number P of samples W of the approximate envelope for use in generating a spline approximation of the approximate envelope, whereby a parametric representation of the transient audio signal is given by parameters including V, N, P and W, such that a decoder receiving the parametric representation can reproduce a received approximation of the transient audio signal.

Type: Application

Filed: July 23, 2003

Publication date: July 15, 2004

Applicant: STMicroelectronics Asia Pacific PTE Limited

Inventors: Mohammed Javed Absar, Sapna George
Soft feature decoding in a distributed automatic speech recognition system for use over wireless channels

Patent number: 6760699

Abstract: A method and apparatus for performing automatic speech recognition (ASR) in a distributed ASR system for use over a wireless channel takes advantage of probabilistic information concerning the likelihood that a given portion of the data has been accurately decoded to a particular value. The probability of error in each feature in a transmitted feature set is employed to improve speech recognition performance under adverse channel conditions. Bit error probabilities for each of the bits which are used to encode a given ASR feature are used to compute the confidence level that the system may have in the decoded value of that feature. Features that have been corrupted with high probability are advantageously either not used or are weighted less in the acoustic distance computation performed by the speech recognizer.

Type: Grant

Filed: April 24, 2000

Date of Patent: July 6, 2004

Assignee: Lucent Technologies Inc.

Inventors: Vijitha Weerackody, Wolfgang Reichl, Alexandros Potamianos
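
The soft decoding idea in the abstract above (patent 6760699) can be sketched in two steps: derive a per-feature confidence from the bit error probabilities, then use it to drop or down-weight features in the distance computation. The sketch assumes independent bit errors (so the confidence is the product of per-bit success probabilities) and a confidence-weighted squared distance; neither formula is stated in the abstract.

```python
import math


def feature_confidence(bit_error_probs):
    """Probability that every bit of a feature was decoded correctly.

    Assumes bit errors are independent.
    """
    return math.prod(1.0 - p for p in bit_error_probs)


def weighted_distance(features, reference, confidences, drop_below=0.0):
    """Squared distance with each term scaled by its feature confidence.

    Features whose confidence falls below `drop_below` are not used at all.
    """
    total = 0.0
    for x, r, c in zip(features, reference, confidences):
        if c < drop_below:
            continue
        total += c * (x - r) ** 2
    return total
```
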
Creating a hierarchical tree of language models for a dialog system based on prompt and dialog context

Patent number: 6754626

Abstract: The invention disclosed herein concerns a method of converting speech to text using a hierarchy of contextual models. The hierarchy of contextual models can be statistically smoothed into a language model. The method can include processing text with a plurality of contextual models. Each one of the plurality of contextual models can correspond to a node in a hierarchy of the plurality of contextual models. Also included can be identifying at least one of the contextual models relating to the text and processing subsequent user spoken utterances with the identified at least one contextual model.

Type: Grant

Filed: March 1, 2001

Date of Patent: June 22, 2004

Assignee: International Business Machines Corporation

Inventor: Mark E. Epstein
Device and method for transforming a digital signal

Patent number: 6741666

Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.

Type: Grant

Filed: January 11, 2000

Date of Patent: May 25, 2004

Assignee: Canon Kabushiki Kaisha

Inventors: Félix Henry, Bertrand Berthelot, Eric Majani
Methods and apparatus for identifying a non-target language in a speech recognition system

Patent number: 6738745

Abstract: Methods and apparatus are disclosed for detecting non-target language references in an audio transcription or speech recognition system using a confidence score. The confidence score may be based on (i) a probabilistic engine score provided by a speech recognition system, (ii) additional scores based on background models, or (iii) a combination of the foregoing. The engine score provided by the speech recognition system for a given input speech utterance reflects the degree of acoustic and linguistic match of the utterance with the trained target language. The background models are created or trained based on speech data in other languages, which may or may not include the target language itself. A number of types of background language models may be employed for each modeled language, including one or more of (i) prosodic models; (ii) acoustic models; (iii) phonotactic models; and (iv) keyword spotting models.

Type: Grant

Filed: April 7, 2000

Date of Patent: May 18, 2004

Assignee: International Business Machines Corporation

Inventors: Jiri Navratil, Mahesh Viswanathan
Method for estimating a confidence measure for a speech recognition system

Patent number: 6735562

Abstract: A method of estimating a confidence measure for a speech recognition system involves comparing an input speech signal with a number of predetermined models of possible speech signals. Best scores indicating the degree of similarity between the input speech signal and each of the predetermined models are then used to determine a normalized variance, which is used as the Confidence Measure. In order to determine whether the input speech signal has been correctly recognized, the Confidence Measure is compared to a threshold value. The threshold value is weighted according to the Signal to Noise Ratio of the input speech signal and according to the number of predetermined models used.

Type: Grant

Filed: June 5, 2000

Date of Patent: May 11, 2004

Assignee: Motorola, Inc.

Inventors: Yaxin Zhang, Ho Chuen Choi, Jian Ming Song
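
The confidence measure in the abstract above (patent 6735562) is a normalized variance of the best scores across models. The abstract does not define the normalization or the direction of the threshold test, so the sketch below assumes the population variance divided by the squared mean, and assumes a larger value (one model standing out from the rest) indicates a correct recognition. The weighting of the threshold by signal-to-noise ratio and model count is left to the caller.

```python
from statistics import mean, pvariance


def confidence_measure(best_scores):
    """Variance of the per-model best scores divided by their squared mean."""
    m = mean(best_scores)
    if m == 0:
        raise ValueError("mean score is zero; normalization undefined")
    return pvariance(best_scores) / (m * m)


def is_recognized(best_scores, threshold):
    """Assumed decision rule: accept when the measure reaches the threshold."""
    return confidence_measure(best_scores) >= threshold
```
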
Method of automatic processing of a speech signal

Publication number: 20040083102

Abstract: This method of automatic processing of a speech signal comprises:

Type: Application

Filed: August 12, 2003

Publication date: April 29, 2004

Applicant: FRANCE TELECOM

Inventors: Samir Nefti, Olivier Boeffard
Method and apparatus for probabilistic recognition using small number of state clusters

Patent number: 6725195

Abstract: Probabilistic recognition using clusters and simple probability functions provides improved performance by employing a limited number of clusters each using a relatively large number of simple probability functions. The simple probability functions for each of the limited number of state clusters are greater in number than the limited number of state clusters.

Type: Grant

Filed: October 22, 2001

Date of Patent: April 20, 2004

Assignee: SRI International

Inventors: Ananth Sankar, Venkata Ramana Rao Gadde
Vector fixed-lag algorithm for decoding input symbols

Patent number: 6708149

Abstract: The present invention discloses an apparatus and method of decoding information received over a noisy communications channel to determine the intended transmitted information. The present invention uses a vector fixed-lag algorithm to determine the probabilities of the intended transmitted information. The algorithm is implemented by multiplying an initial state vector with a matrix containing information about the communications channel. The product is then recursively multiplied by the matrix τ times, using the new product with each recursive multiplication and the forward information is stored for a fixed period of time, τ. The final product is multiplied with a unity column vector yielding a probability of a possible input. The estimated input is the input having the largest probability.

Type: Grant

Filed: April 30, 2001

Date of Patent: March 16, 2004

Assignee: AT&T Corp.

Inventor: William Turin
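
The matrix recursion in the abstract above can be illustrated with a small sketch. This is not the patented algorithm, only a generic fixed-lag estimate under stated assumptions: the names `vec_mat`, `fixed_lag_estimate`, `alpha`, and `w` are hypothetical, and `w[x][y]` is assumed to hold joint probabilities of a channel state transition together with input `x` and output `y`.

```python
def vec_mat(v, m):
    """Multiply a row vector by a matrix (both as plain Python lists)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def fixed_lag_estimate(alpha, w, observations, t, lag):
    """Estimate the input at time t using observations up to time t + lag.

    alpha: forward row vector summarising observations before time t (assumed given).
    w[x][y]: matrix whose (i, j) entry is the assumed joint probability of moving
             from channel state i to state j with input x and output y.
    """
    inputs = sorted(w)
    outputs = sorted(w[inputs[0]])
    n = len(alpha)
    # Matrices for later time steps, with the unknown future input summed out.
    marg = {
        y: [[sum(w[x][y][i][j] for x in inputs) for j in range(n)] for i in range(n)]
        for y in outputs
    }
    scores = {}
    for x in inputs:
        v = vec_mat(alpha, w[x][observations[t]])
        # Recursively multiply by one matrix per lagged observation.
        for y in observations[t + 1 : t + 1 + lag]:
            v = vec_mat(v, marg[y])
        # Summing the entries equals multiplying by a column vector of ones.
        scores[x] = sum(v)
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: a one-state (memoryless) binary channel with 10% crossover
# and equally likely inputs. All numbers here are illustrative assumptions.
bsc = {x: {y: [[0.5 * (0.9 if x == y else 0.1)]] for y in (0, 1)} for x in (0, 1)}
estimate, scores = fixed_lag_estimate([1.0], bsc, [1, 0, 1], t=0, lag=2)
```

In this toy channel the lagged observations carry no extra information about the input at time `t`, so the estimate simply follows the observation at `t`; a channel with memory (more than one state) is where the lag would matter.
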
Efficient coding of side information in a lossless encoder

Publication number: 20040039571

Abstract: For “Super Audio CD” (SACD) the DSD signals are losslessly coded, using framing, prediction and entropy coding. Besides the efficiently encoded signals, a large number of parameters, i.e. the side-information, has to be stored on the SACD too. The smaller the storage capacity that is required for the side-information, the better the overall coding gain is. Therefore coding techniques are applied to the side-information too so as to compress the amount of data of the side information.

Type: Application

Filed: August 29, 2003

Publication date: February 26, 2004

Inventors: Alphons A.M.L. Bruekers, Adriaan J. Rijnberg
Phrase to phrase joint probability model for statistical machine translation

Publication number: 20040030551

Abstract: A machine translation (MT) system may utilize a phrase-based joint probability model. The model may be used to generate source and target language sentences simultaneously. In an embodiment, the model may learn phrase-to-phrase alignments from word-to-word alignments generated by a word-to-word statistical MT system. The system may utilize the joint probability model for both source-to-target and target-to-source translation applications.

Type: Application

Filed: March 27, 2003

Publication date: February 12, 2004

Inventors: Daniel Marcu, William Wong, Kevin Knight, Philipp Koehn
Method and apparatus for adaptive speech detection by applying a probabilistic description to the classification and tracking of signal components

Patent number: 6691087

Abstract: A signal processing system for detecting the presence of a desired signal component by applying a probabilistic description to the classification and tracking of various signal components (e.g., desired versus non-desired signal components) in an input signal is disclosed.

Type: Grant

Filed: September 30, 1998

Date of Patent: February 10, 2004

Assignees: Sarnoff Corporation, LG Electronics, Inc.

Inventors: Lucas Parra, Aalbert de Vries
Method of determining parameters of a statistical language model

Patent number: 6691088

Abstract: An apparatus and method of determining parameters of a statistical language model for automatic speech recognition systems using a training corpus are disclosed. To improve the perplexity and the error rate in the speech recognition, at least a proportion of the elements of a vocabulary used is combined so as to form context-independent vocabulary element categories. The frequencies of occurrence of vocabulary element sequences, and if applicable, the frequencies of occurrence of derived sequences formed from the vocabulary element sequences through the replacement of at least one vocabulary element by the associated vocabulary element class, are evaluated in the language modeling process. The parameters of the language model are then derived from the evaluated frequencies of occurrence.

Type: Grant

Filed: October 20, 1999

Date of Patent: February 10, 2004

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Reinhard Blasig
Device for normalizing voice pitch for voice recognition

Patent number: 6687665

Abstract: In a voice pitch normalization device equipped in a voice recognition device VRAp for recognizing an incoming command voice Sva uttered by any speaker, and used to normalize the incoming command voice to be in an optimal pitch for voice recognition, a target voice generator produces a target voice signal by changing the incoming command voice Svd on the basis of a predetermined degree. A probability calculator calculates a probability indicating a degree of coincidence among the target voice signal and a plurality of words in sample data. A voice pitch changer repeatedly changes the target voice signal in voice pitch until a maximum probability becomes a predetermined probability or greater.

Type: Grant

Filed: October 27, 2000

Date of Patent: February 3, 2004

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Mikio Oda, Tomoe Kawane
Determining speech recognition accuracy

Publication number: 20040015351

Abstract: A solution for determining the accuracy of a speech recognition system. A first graphical user interface (GUI) is provided for selecting a transaction log. The transaction log has at least one entry that specifies a speech recognition text result. A second GUI is also provided for selecting at least one audio segment corresponding to the entry. The second GUI includes an activatable icon for initiating transcription of the audio segment through a reference speech recognition engine to generate a second text result.

Type: Application

Filed: July 16, 2002

Publication date: January 22, 2004

Applicant: International Business Machines Corporation

Inventors: Shailesh B. Gandhi, Peeyush Jaiswal, Victor S. Moore, Gregory L. Toon
Classifier-based non-linear projection for continuous speech segmentation

Publication number: 20040015352

Abstract: A method segments an audio signal including frames into non-speech and speech segments. First, high-dimensional spectral features are extracted from the audio signal. The high-dimensional features are then projected non-linearly to low-dimensional features that are subsequently averaged using a sliding window and weighted averages. A linear discriminant is applied to the averaged low-dimensional features to determine a threshold separating the low-dimensional features. The linear discriminant can be determined from a Gaussian mixture or a polynomial applied to a bimodal histogram distribution of the low-dimensional features. Then, the threshold can be used to classify the frames into either non-speech or speech segments. Speech segments having a very short duration can be discarded, and the longer speech segments can be further extended. In batch mode or real time, the threshold can be updated continuously.

Type: Application

Filed: July 17, 2002

Publication date: January 22, 2004

Inventors: Bhiksha Ramakrishnan, Rita Singh
Method and apparatus for a robust feature extraction for speech recognition

Patent number: 6678657

Abstract: The present invention relates to a method and an apparatus for robust feature extraction for speech recognition in a noisy environment, wherein the speech signal is segmented and is characterized by spectral components. The speech signal is split into a number of short-term spectral components in L subbands, with L=1, 2, . . ., and a noise spectrum is estimated from segments that contain only noise. Then a spectral subtraction of the estimated noise spectrum from the corresponding short-term spectrum is performed, and a probability for each short-term spectral component to contain noise is calculated. Finally, those spectral components of each short-term spectrum that have a low probability of containing speech are interpolated in order to smooth those short-term spectra that contain only noise. With the interpolation, the spectral components containing noise are interpolated by reliable spectral speech components that can be found in the neighborhood.

Type: Grant

Filed: October 23, 2000

Date of Patent: January 13, 2004

Assignee: Telefonaktiebolaget LM Ericsson (Publ)

Inventors: Raymond Brückner, Hans-Günter Hirsch, Rainer Klisch, Volker Springer
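
The two steps named in the abstract above, spectral subtraction followed by interpolation of unreliable components from reliable neighbours, might be sketched as follows. This is a minimal illustration, not the patented method: the function names, the zero floor, the linear interpolation, and the externally supplied `reliable` mask (standing in for the probability calculation, which the abstract does not detail) are all assumptions.

```python
def spectral_subtract(spectrum, noise, floor=0.0):
    """Subtract an estimated noise spectrum, clamping results at `floor` (assumed)."""
    return [max(s - n, floor) for s, n in zip(spectrum, noise)]


def interpolate_unreliable(spectrum, reliable):
    """Replace bins flagged unreliable by linear interpolation between the
    nearest reliable bins on either side (an assumed interpolation rule)."""
    out = list(spectrum)
    good = [i for i, ok in enumerate(reliable) if ok]
    if not good:
        return out  # nothing reliable to interpolate from
    for i, ok in enumerate(reliable):
        if ok:
            continue
        left = max((j for j in good if j < i), default=None)
        right = min((j for j in good if j > i), default=None)
        if left is None:
            out[i] = spectrum[right]
        elif right is None:
            out[i] = spectrum[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = (1 - weight) * spectrum[left] + weight * spectrum[right]
    return out


# Illustrative values only: four subbands with a flat noise estimate.
cleaned = spectral_subtract([5.0, 1.0, 1.0, 9.0], [1.0, 1.0, 1.0, 1.0])
smoothed = interpolate_unreliable(cleaned, [True, False, False, True])
```

Here the two middle subbands are treated as noise-dominated, so their values are filled in from the outer, reliable subbands.
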
Speech processing using conditional observable maximum likelihood continuity mapping

Patent number: 6678658

Abstract: A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.

Type: Grant

Filed: July 7, 2000

Date of Patent: January 13, 2004

Assignee: The Regents of the University of California

Inventors: John Hogden, David Nix
Dynamic semantic control of a speech recognition system

Publication number: 20040006465

Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.

Type: Application

Filed: February 10, 2003

Publication date: January 8, 2004

Applicant: Speechworks International, Inc., a Delaware Corporation

Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
Method and apparatus for performing observation probability calculations

Publication number: 20040002861

Abstract: A method and apparatus for calculating an observation probability includes a first operation unit that subtracts a mean of a first plurality of parameters of an input voice signal from a second parameter of an input voice signal, and multiplies the subtraction result to obtain a first output. The first output is squared and accumulated N times in a second operation unit to obtain a second output. A third operation unit subtracts a given weighted value from the second output to obtain a third output, and a comparator stores the third output in order to extract L outputs therefrom, and stores the L extracted outputs based on an order of magnitude of the extracted L outputs.

Type: Application

Filed: June 20, 2003

Publication date: January 1, 2004

Inventors: Byung-Ho Min, Tae-Su Kim, Hyun-Woo Park, Ho-Rang Jang, Keun-Cheol Hong, Sung-Jae Kim
Temporal pattern recognition method and apparatus utilizing segment and frame-based models

Patent number: 6662158

Abstract: A method and apparatus is provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.

Type: Grant

Filed: April 27, 2000

Date of Patent: December 9, 2003

Assignee: Microsoft Corporation

Inventors: Hsiao-Wuen Hon, Kuansan Wang
Method of pattern recognition using noise reduction uncertainty

Publication number: 20030216914

Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.

Type: Application

Filed: May 20, 2002

Publication date: November 20, 2003

Inventors: James G. Droppo, Alejandro Acero, Li Deng
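
The variance adjustment described in the abstract above can be shown in a few lines. This is a sketch for a single scalar feature and a single Gaussian, not the patented system; the function names and the example numbers are assumptions.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def uncertainty_log_likelihood(cleaned, uncertainty_var, mean, var):
    """Score a cleaned feature against a model Gaussian whose variance is
    increased by the estimated variance of the noise-removal step."""
    return log_gaussian(cleaned, mean, var + uncertainty_var)


# Illustrative values: a cleaned feature far from the model mean.
plain = log_gaussian(3.0, mean=0.0, var=1.0)
adjusted = uncertainty_log_likelihood(3.0, uncertainty_var=3.0, mean=0.0, var=1.0)
```

With a large uncertainty, a cleaned feature that lies far from the model mean is penalised less, which is the effect the abstract attributes to widening each Gaussian.
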
Method for reducing interference in acoustic signals using an adaptive filtering method involving spectral subtraction

Patent number: 6643619

Abstract: A method for reducing interference in acoustic signals by using an adaptive filter method involving spectral subtraction. The inventive method enables a significant reduction of interference in acoustic signals, especially voice signals, without causing any substantial falsification of said signals such as echo or musical tones, and significantly reduces computational requirements in comparison with other methods known per se that are similarly designed to improve signal quality.

Type: Grant

Filed: June 20, 2000

Date of Patent: November 4, 2003

Inventors: Klaus Linhard, Tim Haulick
Voice activity detection speech coding to accommodate music signals

Patent number: 6633841

Abstract: An extended signal coding system that accommodates substantially music-like signals within a signal while maintaining a high perceptual quality in a reproduced signal during discontinued transmission (DTX) operation. The extended signal coding system contains internal circuitry that performs detection and classification of the speech signal, depending on numerous characteristics of the signal, to ensure the high perceptual quality in the reproduced signal. In certain embodiments of the invention, the signal is a speech signal, and the speech signal has a substantially music-like signal contained therein, and the extended signal coding system overrides any voice activity detection (VAD) decision that is used to determine which among a plurality of source coding modes are to be employed using a voice activity detection (VAD) correction/supervision circuitry. This is particularly relevant for discontinued transmission (DTX) operation.

Type: Grant

Filed: March 15, 2000

Date of Patent: October 14, 2003

Assignee: Mindspeed Technologies, Inc.

Inventors: Jes Thyssen, Adil Benyassine
Method and apparatus for automatically processing a user's communication

Patent number: 6625600

Abstract: The invention concerns a method and apparatus for processing a user's communication. The invention may include receiving a list of recognized symbol strings of one or more recognized entries. The list of recognized symbol strings may include a first similarity score associated with each recognized entry. From each recognized symbol string one or more contiguous sequences of N-symbols may be extracted. One of the extracted contiguous sequences of N-symbols may be matched with at least one stored contiguous sequence of N-symbols from a first database. A preliminary set of symbol strings and associated second similarity scores may be generated. The preliminary set of symbol strings may include one or more stored symbol strings from a second database that correspond to the at least one matched contiguous sequence of N-symbols. A third similarity score associated with the one or more stored symbol strings included in the preliminary set of symbol strings may be computed.

Type: Grant

Filed: May 1, 2001

Date of Patent: September 23, 2003

Assignee: Telelogue, Inc.

Inventors: Yevgenly Lyudovyk, Esther Levin
