Specialized Equations Or Comparisons Patents (Class 704/236)
  • Publication number: 20080235030
    Abstract: The present invention concerns an automatic method for measuring a baby's cry, comprising the following step: A. having N samples p(i), for i=0, 1, . . . , (N-1), of an acoustic signal p(t) representing the cry, sampled at a given sampling frequency for a period of duration P; the method being characterised in that it assigns a score PainScore to the acoustic signal p(t) by means of a function AF of one or more acoustic parameters selected from the group comprising: a root-mean-square or rms value prms of the acoustic signal p(t) in the period P; a fundamental or pitch frequency F0 of the acoustic signal p(t), i.e. the minimum frequency at which a peak in the spectrum of the acoustic signal p(t) occurs in the period P; and a configuration of amplitude and frequency modulation of the acoustic signal p(t) in the period P. The invention further concerns the apparatus performing the method.
    Type: Application
    Filed: March 10, 2006
    Publication date: September 25, 2008
    Applicants: UNIVERSITA' DEGLI STUDI DI SIENA, AZIENDA OSPEDALIERA UNIVERSITARIA SENESE
    Inventors: Renata Sisto, Carlo Valerio Bellieni, Giuseppe Buonocore
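Two of the acoustic parameters named above, the rms value prms and the pitch F0, can be sketched directly. The combining function AF and its weights are not given in the abstract, so only the parameters are computed here, on a synthetic 400 Hz tone; the pitch search range is an assumption.

```python
import math

def rms(samples):
    # Root-mean-square value p_rms of the signal over the period P
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def pitch_autocorr(samples, fs, fmin=200.0, fmax=600.0):
    # Estimate the fundamental frequency F0 as the lag that maximizes
    # the autocorrelation, searched over a plausible cry-pitch range.
    n = len(samples)
    best_lag, best_r = None, None
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if best_r is None or r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

fs = 8000.0
tone = [math.sin(2 * math.pi * 400.0 * i / fs) for i in range(800)]
p_rms = rms(tone)
f0 = pitch_autocorr(tone, fs)
```

A pure sine recovers its own frequency as the pitch and 1/sqrt(2) of its amplitude as the rms value.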
  • Patent number: 7418382
    Abstract: A system and method for providing fast and efficient conversation navigation via a hierarchical structure (structure skeleton) which fully describes functions and services supported by a dialog (conversational) system. In one aspect, a conversational system and method is provided to pre-load dialog menus and target addresses to their associated dialog managing procedures in order to handle multiple or complex modes, contexts or applications. For instance, a content server (web site) (106) can download a skeleton or tree structure (109) describing the content (page) (107) or service provided by the server (106) when the client (100) connects to the server (106). The skeleton is hidden from the user (not spoken), but the user can advance to a page of interest, or to a particular dialog service, by uttering a voice command which is recognized by the conversational system, which reacts appropriately (as per the user's command) using the information contained within the skeleton.
    Type: Grant
    Filed: October 1, 1999
    Date of Patent: August 26, 2008
    Assignee: International Business Machines Corporation
    Inventor: Stephane H. Maes
  • Patent number: 7418383
    Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
    Type: Grant
    Filed: September 3, 2004
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: James Droppo, Alejandro Acero
  • Publication number: 20080201144
    Abstract: A method is disclosed in the present invention for recognizing emotion by setting different weights for at least two kinds of unknown information, such as image and audio information, based on their respective recognition reliability. The weights are determined by the distance between the test data and the hyperplane and the standard deviation of the training data, normalized by the mean distance between the training data and the hyperplane, representing the classification reliability of each kind of information. When the two kinds of unidentified information are classified differently by the hyperplane, the method recognizes the emotion according to the information having the higher weight and corrects the wrong classification result of the other, so as to raise the accuracy of emotion recognition. Meanwhile, the present invention also provides a learning step with higher learning speed through an iterative algorithm.
    Type: Application
    Filed: August 8, 2007
    Publication date: August 21, 2008
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Kai-Tai Song, Meng-Ju Han, Jing-Huai Hsu, Jung-Wei Hong, Fuh-Yu Chang
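One plausible reading of the weighting rule above, sketched with invented numbers: each modality's weight is its test-sample distance from the classifier hyperplane, normalized by the mean training-data distance, and the modality with the higher weight wins when the two disagree. The helper names and all values are hypothetical.

```python
def hyperplane_distance(x, w, b):
    # Signed distance of point x from the hyperplane w.x + b = 0
    norm = sum(wi * wi for wi in w) ** 0.5
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def reliability_weight(d_test, train_distances):
    # Test distance normalized by the mean training-data distance,
    # standing in for the classification reliability of a modality
    mean_d = sum(abs(d) for d in train_distances) / len(train_distances)
    return abs(d_test) / mean_d

# Image and audio classifiers disagree; trust the more reliable one.
d_img = hyperplane_distance([0.2, 0.0], [1.0, 0.0], 0.0)
d_aud = hyperplane_distance([0.0, 1.5], [0.0, 1.0], 0.0)
w_img = reliability_weight(d_img, [1.0, 1.2, 0.8])
w_aud = reliability_weight(d_aud, [1.0, 1.1, 0.9])
decision = "audio" if w_aud > w_img else "image"
```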
  • Publication number: 20080195387
    Abstract: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance provide a score exceeding a threshold when matched against one or more models constructed upon voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters, a fast scoring algorithm, summed calls handling, or quality evaluation for the tested utterance.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Yaniv ZIGEL, Moshe WASSERBLAT
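The decision rule above reduces to scoring the tested utterance against every known speaker's model and accepting if the best score clears a threshold. In this sketch, cosine similarity stands in for the statistical speaker models of the patent; the feature vectors and threshold are invented.

```python
def cosine(a, b):
    # Similarity score between a test feature vector and a speaker model
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def in_speaker_set(test_features, speaker_models, threshold=0.9):
    # Accept if the best match against any known-speaker model
    # exceeds the decision threshold.
    return max(cosine(test_features, m) for m in speaker_models) >= threshold

models = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]   # one model per known speaker
accepted = in_speaker_set([0.98, 0.12, 0.01], models)
rejected = in_speaker_set([0.1, 0.1, 1.0], models)
```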
  • Patent number: 7412383
    Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
    Type: Grant
    Filed: April 4, 2003
    Date of Patent: August 12, 2008
    Assignee: AT&T Corp
    Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
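The core selection criterion, picking the utterances whose model confidence is lowest so that annotation effort is spent where it helps most, can be sketched as follows; the threshold and the least-confident-first ordering are assumptions.

```python
def select_for_annotation(utterances, threshold=0.6):
    # Keep utterances the models are least confident about and order
    # them least-confident first for the annotation list.
    picked = [(conf, text) for text, conf in utterances if conf < threshold]
    return [text for conf, text in sorted(picked)]

utts = [("book a flight", 0.92), ("uh warranty claim", 0.31),
        ("transfer funds", 0.74), ("speak to agent", 0.55)]
annotation_list = select_for_annotation(utts)
```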
  • Patent number: 7406415
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: July 29, 2008
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7392184
    Abstract: A method needed in speech recognition for forming a pronunciation model in a telecommunications system comprising at least one portable electronic device and a server. The electronic device is arranged to compare the user's speech information with pronunciation models comprising acoustic units and stored in the electronic device. A character sequence is transferred from the electronic device to the server. In the server, the character sequence is converted into a sequence of acoustic units. The sequence of acoustic units is sent from the server to the electronic device.
    Type: Grant
    Filed: April 15, 2002
    Date of Patent: June 24, 2008
    Assignee: Nokia Corporation
    Inventors: Olli Viikki, Kari Laurila
  • Patent number: 7389228
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: June 17, 2008
    Assignee: International Business Machines Corporation
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 7379867
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Grant
    Filed: June 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
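The quantity being maximized, the conditional likelihood of a class given the word string, follows from Bayes' rule over class-conditional language models. The sketch below uses unigram probabilities and invented numbers; the patent's discriminative training adjusts the LM parameters so the correct class's posterior grows.

```python
import math

def class_posterior(per_class_word_probs, priors):
    # P(c | w) proportional to P(c) * product_i P(w_i | c), computed in
    # log space and normalized over the classes
    scores = [math.log(p) + sum(math.log(wp) for wp in wps)
              for p, wps in zip(priors, per_class_word_probs)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two classes; each inner list holds P(word_i | class) for the utterance.
posterior = class_posterior([[0.2, 0.1], [0.05, 0.04]], priors=[0.5, 0.5])
```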
  • Patent number: 7376553
    Abstract: An apparatus for signal processing based on an algorithm for representing harmonics in a fractal lattice. The apparatus includes a plurality of tuned segments, each tuned segment including a transceiver having an intrinsic resonant frequency, the amplitude of the resonant frequency being capable of modification either by receiving an external input signal or by internally generating a response to an applied feedback signal. A plurality of signal processing elements are arranged in an array pattern, the signal processing elements including at least one function selected from the group including buffers for storing information, a feedback device for generating a feedback signal, a controller for controlling an output signal, a connection circuit for connecting the plurality of tuned segments to signal processing elements, and a feedback connection circuit for conveying signals from the plurality of signal processing elements in the array to the tuned segments.
    Type: Grant
    Filed: July 8, 2004
    Date of Patent: May 20, 2008
    Inventor: Robert Patel Quinn
  • Patent number: 7376562
    Abstract: The present invention relates to systems and methods for processing acoustic signals, such as music and speech. The method involves nonlinear frequency analysis of an incoming acoustic signal. In one aspect, a network of nonlinear oscillators, each with a distinct frequency, is applied to process the signal. The frequency, amplitude, and phase of each signal component are identified. In addition, nonlinearities in the network recover components that are not present or not fully resolvable in the input signal. In another aspect, a modification of the nonlinear oscillator network is used to track changing frequency components of an input signal.
    Type: Grant
    Filed: June 22, 2004
    Date of Patent: May 20, 2008
    Assignees: Florida Atlantic University, Circular Logic, Inc.
    Inventor: Edward W. Large
  • Publication number: 20080114595
    Abstract: An automatic speech recognition method for identifying words from an input speech signal includes providing at least one hypothesis recognition based on the input speech signal, the hypothesis recognition being an individual hypothesis word or a sequence of individual hypothesis words, and computing a confidence measure for the hypothesis recognition, based on the input speech signal, wherein computing a confidence measure includes computing differential contributions to the confidence measure, each as a difference between a constrained acoustic score and an unconstrained acoustic score, weighting each differential contribution by applying thereto a cumulative distribution function of the differential contribution, so as to make the distributions of the confidence measures homogeneous in terms of rejection capability, as the language, vocabulary and grammar vary, and computing the confidence measure by averaging the weighted differential contributions.
    Type: Application
    Filed: December 28, 2004
    Publication date: May 15, 2008
    Inventors: Claudio Vair, Daniele Colibro
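One reading of the weighting step above: each differential contribution (constrained minus unconstrained acoustic score) is mapped through an empirical cumulative distribution function gathered beforehand, which makes the resulting measures comparable across languages and grammars, and the mapped values are averaged. The population and scores below are invented.

```python
def empirical_cdf(value, population):
    # Fraction of previously collected differential contributions <= value
    return sum(1 for p in population if p <= value) / len(population)

def confidence(constrained, unconstrained, population):
    # Differential contribution per unit, mapped through the CDF so that
    # confidence values are homogeneous in rejection capability, then
    # averaged into one measure
    diffs = [c - u for c, u in zip(constrained, unconstrained)]
    return sum(empirical_cdf(d, population) for d in diffs) / len(diffs)

pop = [-2.0, -1.0, 0.0, 1.0, 2.0]
measure = confidence([1.5, 0.5], [0.0, 0.0], pop)
```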
  • Patent number: 7369991
    Abstract: The object of the present invention is to maintain a high recognition success rate with a low-volume sound signal, without being affected by noise.
    Type: Grant
    Filed: March 4, 2003
    Date of Patent: May 6, 2008
    Assignee: NTT DoCoMo, Inc.
    Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
  • Patent number: 7366666
    Abstract: A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be computed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include a mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity.
    Type: Grant
    Filed: October 1, 2003
    Date of Patent: April 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Balchandran, Linda M. Boyer
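A minimal sketch of the detection idea: with per-meaning probabilities in hand, a weak best hypothesis suggests a mumble while two near-equal hypotheses suggest ambiguous input. The delta thresholds are hypothetical.

```python
def classify_irregularity(meaning_probs, low=0.3, close=0.1):
    # Relative-delta checks over the interpretation probabilities
    probs = sorted(meaning_probs, reverse=True)
    if probs[0] < low:
        return "mumble"              # no interpretation is convincing
    if len(probs) > 1 and probs[0] - probs[1] < close:
        return "ambiguous"           # two interpretations are too close
    return "ok"

results = [classify_irregularity(p)
           for p in ([0.20, 0.15], [0.45, 0.42], [0.80, 0.10])]
```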
  • Patent number: 7363200
    Abstract: A matrix includes samples associated with a first signal and samples associated with a second signal. The second signal includes a first portion associated with the first signal and a second portion associated with at least one disturbance, such as white noise or colored noise. A projection of the matrix is produced using canonical QR-decomposition. Canonical QR-decomposition of the matrix produces an orthogonal matrix and an upper triangular matrix, where each value in the diagonal of the upper triangular matrix is greater than or equal to zero. The projection at least substantially separates the first portion of the second signal from the second portion of the second signal.
    Type: Grant
    Filed: February 5, 2004
    Date of Patent: April 22, 2008
    Assignee: Honeywell International Inc.
    Inventor: Joseph Z. Lu
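For a two-column matrix the canonical QR projection reduces to one Gram-Schmidt step: normalize the first signal (a positive diagonal entry makes the factorization canonical), project the second signal onto it, and take the residual as the disturbance estimate. The signals below are invented.

```python
def separate(s1, s2):
    # Gram-Schmidt step of a canonical QR factorization of [s1 | s2]:
    # the projection of s2 onto span(s1) is the portion explained by s1;
    # the residual approximates the disturbance.
    n1 = sum(x * x for x in s1) ** 0.5     # R[0,0] > 0 (canonical)
    q1 = [x / n1 for x in s1]
    r12 = sum(q * y for q, y in zip(q1, s2))
    correlated = [r12 * q for q in q1]
    residual = [y - c for y, c in zip(s2, correlated)]
    return correlated, residual

s1 = [1.0, 2.0, 3.0]
s2 = [2.0, 4.0, 6.5]                       # 2*s1 plus a small disturbance
correlated, residual = separate(s1, s2)
```

The residual is orthogonal to the first signal, and the two parts sum back to the original second signal.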
  • Patent number: 7349844
    Abstract: In a processor system for audio processing, such as voice recognition and text-to-speech, a dedicated front-end processor, a core processor and a dedicated back-end processor are provided, coupled by a dual access stack. When an analog audio signal is input, the core processor is invoked only when a certain amount of data is present in the dual access stack. Likewise, the back-end processor is invoked only when a certain amount of data is present in the dual access stack. In this way the overall processing power required by the processing task is minimized, as is the power consumption of the processor system.
    Type: Grant
    Filed: September 11, 2003
    Date of Patent: March 25, 2008
    Assignee: International Business Machines Corporation
    Inventor: Dieter Staiger
  • Patent number: 7337114
    Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: February 26, 2008
    Assignee: International Business Machines Corporation
    Inventor: Ellen M. Eide
  • Patent number: 7337107
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Grant
    Filed: October 2, 2001
    Date of Patent: February 26, 2008
    Assignee: The Regents of the University of California
    Inventors: Kenneth Rose, Liang Gu
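The within-filter cubic-root compression replaces the usual log before the cosine transform. A minimal sketch over invented band energies, using a plain DCT-II with no mel filtering or harmonic weighting:

```python
import math

def dct2(values):
    # DCT-II of the compressed filter-bank outputs -> cepstral coefficients
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n)]

def cepstra_cubic_root(band_energies):
    # Cubic-root amplitude compression within each filter, then DCT
    compressed = [e ** (1.0 / 3.0) for e in band_energies]
    return dct2(compressed)

ceps = cepstra_cubic_root([8.0, 1.0, 27.0, 64.0])
```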
  • Publication number: 20080040110
    Abstract: An apparatus and method for detecting an emotional state of a speaker participating in an audio signal. The apparatus and method are based on the distance in voice features between a person being in an emotional state and the same person being in a neutral state. The apparatus and method comprise a training phase in which a training feature vector is determined, and an ongoing stage in which the training feature vector is used to determine emotional states in a working environment. Multiple types of emotions can be detected, and the method and apparatus are speaker-independent, i.e., no prior voice sample or information about the speaker is required.
    Type: Application
    Filed: August 8, 2005
    Publication date: February 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Oren Pereg, Moshe Wasserblat
  • Patent number: 7324939
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: January 27, 2006
    Date of Patent: January 29, 2008
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7318029
    Abstract: There is disclosed an interactive voice response system for prompting a user with feedback during speech recognition. A user who speaks too slowly or too quickly may speak even more slowly or quickly in response to an error in speech recognition. The present system aims to give the user specific feedback on the speed of speaking. The method can include: acquiring an utterance from a user; recognizing a string of words from the utterance; acquiring for each word the ratio of actual duration of delivery to ideal duration; calculating an average ratio for all the words wherein the average ratio is an indication of the speed of the delivery of the utterance; and prompting the user as to the speed of delivery of the utterance according to the average ratio.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: January 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Wendy-Ann Coyle, Stephen James Haskey
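The steps listed above can be sketched directly: average the per-word ratio of actual to ideal duration, then prompt on the result. The threshold values and prompt wording are hypothetical.

```python
def rate_feedback(word_durations, slow=1.3, fast=0.7):
    # word_durations: (actual_seconds, ideal_seconds) per recognized word.
    # The average actual/ideal ratio indicates delivery speed; the
    # threshold values here are hypothetical.
    ratios = [actual / ideal for actual, ideal in word_durations]
    avg = sum(ratios) / len(ratios)
    if avg > slow:
        return avg, "Please speak a little faster."
    if avg < fast:
        return avg, "Please speak a little slower."
    return avg, None

avg, prompt = rate_feedback([(0.90, 0.50), (0.75, 0.50), (0.66, 0.44)])
```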
  • Patent number: 7313521
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: December 25, 2007
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7310599
    Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. Aspects of the invention use mixtures of distributions of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
    Type: Grant
    Filed: July 20, 2005
    Date of Patent: December 18, 2007
    Assignee: Microsoft Corporation
    Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
  • Patent number: 7302388
    Abstract: Method and apparatus detect voice activity for spectrum or power efficiency purposes. The method determines and tracks the instant, minimum and maximum power levels of the input signal. The method selects a first range of signals to be considered as noise, and a second range of signals to be considered as voice. The method uses the selected voice, noise and power levels to calculate a log likelihood ratio (LLR). The method uses the LLR to determine a threshold, then uses the threshold for differentiating between noise and voice.
    Type: Grant
    Filed: February 17, 2004
    Date of Patent: November 27, 2007
    Assignee: Ciena Corporation
    Inventors: Song Zhang, Eric Verreault
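A toy version of the decision: model the tracked noise and voice power levels as two Gaussians (in dB), take the log-likelihood ratio per frame, and compare it with a threshold. The levels, variance, and threshold are invented; the patent derives them from the tracked instant, minimum, and maximum powers.

```python
def llr(power_db, noise_mean, voice_mean, sigma=3.0):
    # Log-likelihood ratio of two equal-variance Gaussian power models
    # (dB domain); positive favors voice, negative favors noise.
    lv = -((power_db - voice_mean) ** 2) / (2.0 * sigma ** 2)
    ln = -((power_db - noise_mean) ** 2) / (2.0 * sigma ** 2)
    return lv - ln

def is_voice(power_db, noise_mean=-50.0, voice_mean=-20.0, threshold=0.0):
    # Frame-level decision by comparing the LLR against a threshold
    return llr(power_db, noise_mean, voice_mean) > threshold

frame_powers = [-52.0, -49.0, -22.0, -18.0, -48.0]
flags = [is_voice(p) for p in frame_powers]
```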
  • Patent number: 7295977
    Abstract: The method of the present invention utilizes machine-learning techniques, particularly Support Vector Machines in combination with a neural network, to process a unique machine-learning enabled representation of the audio bitstream. Using this method, a classifying machine is able to autonomously detect characteristics of a piece of music, such as the artist or genre, and classify it accordingly. The method includes transforming digital time-domain representation of music into a frequency-domain representation, then dividing that frequency data into time slices, and compressing it into frequency bands to form multiple learning representations of each song. The learning representations that result are processed by a group of Support Vector Machines, then by a neural network, both previously trained to distinguish among a given set of characteristics, to determine the classification.
    Type: Grant
    Filed: August 27, 2001
    Date of Patent: November 13, 2007
    Assignee: NEC Laboratories America, Inc.
    Inventors: Brian Whitman, Gary W. Flake, Stephen R. Lawrence
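One step of that pipeline, compressing a frequency-domain time slice into a few coarse bands to form the compact learning representation, can be sketched as follows; equal-width bands are an assumption.

```python
def band_energies(power_spectrum, n_bands):
    # Sum adjacent spectrum bins into n_bands coarse frequency bands
    size = len(power_spectrum) // n_bands
    return [sum(power_spectrum[b * size:(b + 1) * size])
            for b in range(n_bands)]

bands = band_energies([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.5, 0.5], 4)
```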
  • Patent number: 7292977
    Abstract: A system (230) performs speaker adaptation when performing speech recognition. The system (230) receives an audio segment and identifies the audio segment as a first audio segment or a subsequent audio segment associated with a speaker turn. The system (230) then decodes the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment and estimates a transformation matrix based on the transcription associated with the first audio segment. The system (230) decodes the audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the audio segment is the subsequent audio segment.
    Type: Grant
    Filed: October 16, 2003
    Date of Patent: November 6, 2007
    Assignee: BBNT Solutions LLC
    Inventor: Daben Liu
  • Patent number: 7292976
    Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
    Type: Grant
    Filed: May 29, 2003
    Date of Patent: November 6, 2007
    Assignee: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Giuseppe Riccardi, Gokhan Tur
  • Patent number: 7280963
    Abstract: A computerized method is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The method includes graphing sets of initial pronunciations; thereafter in an ASR subsystem determining a highest-scoring set of initial pronunciations; generating sets of alternate pronunciations, wherein each set of alternate pronunciations includes the highest-scoring set of initial pronunciations with a lowest-probability phone of the highest-scoring initial pronunciation substituted with a unique-substitute phone; graphing the sets of alternate pronunciations; determining in the ASR subsystem a highest-scoring set of alternate pronunciations; and adding to a pronunciation dictionary the highest-scoring set of alternate pronunciations.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: October 9, 2007
    Assignee: Nuance Communications, Inc.
    Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
  • Patent number: 7277852
    Abstract: A playlist generating method for generating a playlist of content from received broadcasted data is provided. The playlist generating method includes the steps of: extracting features of broadcast content beforehand, storing the features in a content feature file, and storing information relating to the broadcast content in a content information DB; extracting features from the received data, and storing the features in a data feature file; searching for broadcast content of a predetermined kind by comparing data in the content feature file and data in the data feature file; when a name of the predetermined kind of content is determined, storing data corresponding to the broadcast content of the predetermined kind in a search result file; and generating a playlist for the broadcast content of the predetermined kind from the search result file and the content information DB.
    Type: Grant
    Filed: October 22, 2001
    Date of Patent: October 2, 2007
    Assignee: NTT Communications Corporation
    Inventors: Miwako Iyoku, Tatsuhiro Kobayashi
  • Patent number: 7272559
    Abstract: Noninvasive, remote methods and apparatus for detecting early phases of neurological diseases, such as the non-tremor phase of Parkinson's disease, dyskinesia, dyslexia and neuroatrophy, are disclosed. Five words spoken either directly into a microphone connected to a local analysis system or remotely, as by way of a telephone link to a system for analysis of time and frequency domains of speech, yield characteristics representative of the presence of disease. The method includes the steps of transducing a set of unmodified spoken words or numbers into electrical signals which are bandlimited and amplified. These signals are analyzed in both time and frequency domains to detect and measure the manifestation of neurological disorders in the envelope of the time representation and spectral density of the words.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: September 18, 2007
    Assignee: CEIE specs, Inc.
    Inventor: Harbhajan S. Hayre
  • Patent number: 7266495
    Abstract: A computerized pronunciation system is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The system includes a word list including at least one word; transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform; a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including: sets of initial pronunciations of the word, a scoring module configured to score pronunciations and to generate phone probabilities, and a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: September 4, 2007
    Assignee: Nuance Communications, Inc.
    Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
  • Patent number: 7263484
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: March 5, 2001
    Date of Patent: August 28, 2007
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7249018
    Abstract: A conversation manager processes spoken utterances from a user of a computer. The conversation manager includes a semantics analysis module and a syntax manager. A domain model that is used in processing the spoken utterances includes an ontology (i.e., world view for the relevant domain of the spoken utterances), lexicon, and syntax definitions. The syntax manager combines the ontology, lexicon, and syntax definitions to generate a grammatic specification. The semantics module uses the grammatic specification and the domain model to develop a set of frames (i.e., internal representation of the spoken utterance). The semantics module then develops a set of propositions from the set of frames. The conversation manager then uses the set of propositions in further processing to provide a reply to the spoken utterance.
    Type: Grant
    Filed: October 25, 2001
    Date of Patent: July 24, 2007
    Assignee: International Business Machines Corporation
    Inventors: Steven I. Ross, Robert C. Armes, Julie F. Alweis, Elizabeth A. Brownholtz, Jeffrey G. MacAllister
  • Patent number: 7240002
    Abstract: The present invention provides a speech recognition apparatus having high speech recognition performance and capable of performing speech recognition in a highly efficient manner. A matching unit 14 calculates the scores of words selected by a preliminary word selector 13 and determines a candidate for a speech recognition result on the basis of the calculated scores. A control unit 11 produces word connection relationships among words included in a word series employed as a candidate for the speech recognition result and stores them into a word connection information storage unit 16. A reevaluation unit 15 corrects the word connection relationships one by one. On the basis of the corrected word connection relationships, the control unit 11 determines the speech recognition result. A word connection managing unit 21 limits the times at which a boundary between words, as represented by the word connection relationships, is allowed to be located.
    Type: Grant
    Filed: November 7, 2001
    Date of Patent: July 3, 2007
    Assignee: Sony Corporation
    Inventors: Katsuki Minamino, Yasuharu Asano, Hiroaki Ogawa, Helmut Lucke
  • Patent number: 7231348
    Abstract: There is provided a voice activity detection method for indicating an active voice mode and an inactive voice mode. The method comprises receiving an input signal having a plurality of frames, determining whether each of the plurality of frames includes an active voice signal or an inactive voice signal, determining a second reflection coefficient for each frame determined to include the inactive voice signal, comparing the second reflection coefficient with a reflection threshold, and selecting the active voice mode if the second reflection coefficient is greater than the reflection threshold. The method may further comprise selecting the inactive voice mode if the second reflection coefficient is not greater than the reflection threshold. The method may also comprise analyzing the input signal to determine an energy level of the input signal, and selecting the active voice mode if the energy level is greater than an energy threshold.
    Type: Grant
    Filed: January 26, 2006
    Date of Patent: June 12, 2007
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Yang Gao, Eyal Shlomot, Adil Benyassine
  • Patent number: 7219057
    Abstract: A speech recognition method includes receiving signals derived from indices of a codebook corresponding to recognition feature vectors extracted from speech to be recognized. The signals include an indication of the number of bits per codebook index. The method also includes obtaining the string of indices from the received signals, obtaining the corresponding recognition feature vectors from the string of indices, and applying the recognition feature vectors to a word-level recognition process. To conserve network capacity, the size of the codebook and the corresponding number of bits per codebook index, are adapted on a dialogue-by-dialogue basis. The adaptation accomplishes a tradeoff between expected recognition rate and expected bitrate by optimizing a metric which is a function of both.
    Type: Grant
    Filed: June 8, 2005
    Date of Patent: May 15, 2007
    Assignee: Koninklijke Philips Electronics
    Inventor: Yin-Pin Yang
  • Patent number: 7219056
    Abstract: Two statistics are disclosed for determining the quality of language models. These statistics are called acoustic perplexity and the synthetic acoustic word error rate (SAWER), and they depend upon methods for computing the acoustic confusability of words. It is possible to substitute models of acoustic data in place of real acoustic data in order to determine acoustic confusability. An evaluation model is created, a synthesizer model is created, and a matrix is determined from the evaluation and synthesizer models. Each of the evaluation and synthesizer models is a hidden Markov model. Once the matrix is determined, a confusability calculation may be performed. Different methods are used to determine synthetic likelihoods. The confusability may be normalized and smoothed and methods are disclosed that increase the speed of performing the matrix inversion and the confusability calculation. A method for caching and reusing computations for similar words is disclosed.
    Type: Grant
    Filed: April 19, 2001
    Date of Patent: May 15, 2007
    Assignee: International Business Machines Corporation
    Inventors: Scott Elliot Axelrod, Peder Andreas Olsen, Harry William Printz, Peter Vincent de Souza
  • Patent number: 7219058
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: October 1, 2001
    Date of Patent: May 15, 2007
    Assignee: AT&T Corp.
    Inventors: Richard C. Rose, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Patent number: 7191105
    Abstract: A system for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate and animate sound sources. Electromagnetic sensors monitor excitation sources in sound producing systems, such as animate sound sources such as the human voice, or from machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The systems disclosed enable accurate calculation of transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.
    Type: Grant
    Filed: January 22, 2003
    Date of Patent: March 13, 2007
    Assignee: The Regents of the University of California
    Inventors: John F. Holzrichter, Lawrence C. Ng
  • Patent number: 7165029
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 16, 2007
    Assignee: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 7162422
    Abstract: The present invention provides a method and apparatus for using user-context information to improve N-best processing in the presence of speech recognition uncertainty. Once digitized voice data is received, it is processed by a speech recognizer to determine one or more recognized phrases based on the currently active recognition grammar. When recognition uncertainty yields more than one recognized phrase, user-specific context information is used to choose one of the recognized phrases.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: January 9, 2007
    Assignee: Intel Corporation
    Inventor: Steven M. Bennett
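The N-best disambiguation described above can be sketched as a rescoring step. The user-context representation (a hypothetical dictionary of the user's prior phrase frequencies) and the additive weighting scheme are illustrative assumptions, not the patented method.

```python
# Sketch: when several candidate phrases have similar acoustic scores,
# combine each score with user-specific context and pick the best.

def rescore_nbest(nbest, user_context, context_weight=0.5):
    """nbest: list of (phrase, acoustic_score in [0, 1]).
    user_context: dict mapping phrase -> prior frequency in [0, 1].
    Returns the phrase with the best combined score."""
    def combined(item):
        phrase, acoustic = item
        return acoustic + context_weight * user_context.get(phrase, 0.0)
    return max(nbest, key=combined)[0]
```

With an empty context the acoustically best phrase wins; a strong context prior can overturn a small acoustic margin.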
  • Patent number: 7117151
    Abstract: The aim is to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
    Type: Grant
    Filed: March 29, 2005
    Date of Patent: October 3, 2006
    Assignee: Sony Corporation
    Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
  • Patent number: 7117153
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: October 3, 2006
    Assignee: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
  • Patent number: 7110947
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: December 5, 2000
    Date of Patent: September 19, 2006
    Assignee: AT&T Corp.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
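The deletion strategy described above reduces to filtering the feature stream. The frame and flag representations here are illustrative; the patent concerns bitstream-based features in a wireless setting.

```python
# Sketch: drop every feature frame whose transmission was flagged as
# erased, shortening the observation sequence handed to the recognizer.

def conceal_erasures(frames, erasure_flags):
    """Keep only frames whose erasure flag is False."""
    return [f for f, erased in zip(frames, erasure_flags) if not erased]
```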
  • Patent number: 7107214
    Abstract: To achieve an improvement in recognition performance, a non-speech acoustic model correction unit adapts a non-speech acoustic model representing a non-speech state using input data observed during an interval immediately before a speech recognition interval during which speech recognition is performed, by means of one of the maximum likelihood method, the complex statistic method, and the minimum distance-maximum separation theorem.
    Type: Grant
    Filed: July 8, 2005
    Date of Patent: September 12, 2006
    Assignee: Sony Corporation
    Inventor: Hironaga Nakatsuka
  • Patent number: 7107210
    Abstract: A system and method are provided that reduce noise in pattern recognition signals. To do this, embodiments of the present invention utilize a prior model of dynamic aspects of clean speech together with one or both of a prior model of static aspects of clean speech, and an acoustic model that indicates the relationship between clean speech, noisy speech and noise. In one embodiment, components of a noise-reduced feature vector are produced by forming a weighted sum of predicted values from the prior model of dynamic aspects of clean speech, the prior model of static aspects of clean speech and the acoustic-environmental model.
    Type: Grant
    Filed: May 20, 2002
    Date of Patent: September 12, 2006
    Assignee: Microsoft Corporation
    Inventors: Li Deng, James G. Droppo, Alejandro Acero
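The combination step described above can be sketched per vector component. The three predictor values and the weights are placeholders; in the patent they come from the prior model of dynamic aspects of clean speech, the prior model of static aspects, and the acoustic-environment model.

```python
# Sketch: form one component of a noise-reduced feature vector as a
# weighted sum of three model predictions. Weights are assumed.

def denoise_component(pred_dynamic, pred_static, pred_acoustic,
                      weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the three model predictions for one component."""
    preds = (pred_dynamic, pred_static, pred_acoustic)
    return sum(w * p for w, p in zip(weights, preds))
```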
  • Patent number: 7050974
    Abstract: A speech communication system comprising a speech input terminal and a speech recognition apparatus which can communicate with each other through a wire or wireless communication network, wherein the speech input terminal comprises a speech input unit, a unit for creating environment information for speech recognition, which is unique to the speech input terminal or represents its operation state, and a communication control unit for transmitting the environment information to the speech recognition apparatus, and the speech recognition apparatus executes speech recognition processing on the basis of the environment information.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: May 23, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Masayuki Yamada
  • Patent number: 7043429
    Abstract: A speech recognition system is used to receive a speech signal and output an output language word with respect to the speech signal. The speech recognition system has preset quantities for a first threshold, a second threshold, and a third threshold. The speech recognition system includes a first speech recognition device that is used to receive the speech signal and generate a first candidate language word and a first confidence measurement of the first candidate language word, according to the speech signal. A second speech recognition device is used to receive the speech signal and generate a second candidate language word and a second confidence measurement of the second candidate language word, according to the speech signal. A confidence measurement judging unit is used to output the language word, by comparing the first confidence measurement and the second confidence measurement to the above thresholds.
    Type: Grant
    Filed: March 28, 2002
    Date of Patent: May 9, 2006
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chien Chien, Jia-Jang Tu
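The judging step described above can be sketched as a threshold comparison over the two recognizers' outputs. The particular decision rule and the three threshold values are illustrative assumptions; the patent specifies its own comparison procedure.

```python
# Sketch: pick an output word by comparing two recognizers' confidence
# measurements against preset thresholds (values assumed).

def judge(cand1, conf1, cand2, conf2, t1=0.8, t2=0.8, t3=0.1):
    """Return the accepted word, or None to reject both candidates."""
    if conf1 >= t1 and conf1 >= conf2:
        return cand1
    if conf2 >= t2 and conf2 > conf1:
        return cand2
    # Neither recognizer is confident alone; accept agreement within t3.
    if cand1 == cand2 and abs(conf1 - conf2) <= t3:
        return cand1
    return None
```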
  • Patent number: 7035867
    Abstract: A system for identifying files can use fingerprints to compare various files and determine redundant files. Frequency representations of portions of files can be used, such as Fast Fourier Transforms, as the fingerprints.
    Type: Grant
    Filed: November 28, 2001
    Date of Patent: April 25, 2006
    Assignee: Aerocast.com, Inc.
    Inventors: Mark R. Thompson, Nathan F. Raciborski
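The fingerprinting idea above can be sketched by taking a frequency representation of a fixed-size portion of each file and comparing the results. A naive O(N²) DFT stands in for a Fast Fourier Transform to stay self-contained; the portion size, the number of retained bins, and the exact-match comparison are all illustrative assumptions.

```python
# Sketch: fingerprint a file by coarse DFT magnitudes of its first N
# bytes, then flag files with identical fingerprints as likely duplicates.
import cmath

def fingerprint(data, n=64, keep=8):
    """DFT magnitudes of the first n bytes, keeping `keep` low bins."""
    chunk = list(data[:n]) + [0] * max(0, n - len(data))  # zero-pad
    mags = []
    for k in range(keep):
        s = sum(chunk[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append(round(abs(s), 3))
    return tuple(mags)

def likely_duplicates(a, b):
    """Two files with matching fingerprints are candidate duplicates."""
    return fingerprint(a) == fingerprint(b)
```

A production system would hash multiple portions and tolerate small magnitude differences rather than requiring exact equality.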