Specialized Equations Or Comparisons Patents (Class 704/236)
  • Publication number: 20080235030
    Abstract: The present invention concerns an automatic method for measuring a baby's cry, comprising the following step: A. having N samples p(i), for i=0, 1, . . . , (N-1), of an acoustic signal p(t) representing the cry, sampled at a given sampling frequency for a period of duration P; the method being characterised in that it assigns a score PainScore to the acoustic signal p(t) by means of a function AF of one or more acoustic parameters selected from the group comprising: a root-mean-square or rms value prms of the acoustic signal p(t) in the period P; a fundamental or pitch frequency F0 of the acoustic signal p(t), i.e. the minimum frequency at which a peak in the spectrum of the acoustic signal p(t) occurs in the period P; and a configuration of amplitude and frequency modulation of the acoustic signal p(t) in the period P. The invention further concerns the apparatus performing the method.
    Type: Application
    Filed: March 10, 2006
    Publication date: September 25, 2008
    Applicants: UNIVERSITA' DEGLI STUDI DI SIENA, AZIENDA OSPEDALIERA UNIVERSITARIA SENESE
    Inventors: Renata Sisto, Carlo Valerio Bellieni, Giuseppe Buonocore
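Two of the acoustic parameters named above, the rms value prms and the pitch F0, can be sketched directly. The combining function AF and its weights are not given in the abstract, so only the parameters are computed here, on a synthetic 400 Hz tone; the pitch search range is an assumption.

```python
import math

def rms(samples):
    # Root-mean-square value p_rms of the signal over the period P
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def pitch_autocorr(samples, fs, fmin=200.0, fmax=600.0):
    # Estimate the fundamental frequency F0 as the lag that maximizes
    # the autocorrelation, searched over a plausible cry-pitch range.
    n = len(samples)
    best_lag, best_r = None, None
    for lag in range(int(fs / fmax), int(fs / fmin) + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if best_r is None or r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

fs = 8000.0
tone = [math.sin(2 * math.pi * 400.0 * i / fs) for i in range(800)]
p_rms = rms(tone)
f0 = pitch_autocorr(tone, fs)
```

A pure sine recovers its own frequency as the pitch and 1/sqrt(2) of its amplitude as the rms value.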
  • Patent number: 7418382
    Abstract: A system and method for providing fast and efficient conversation navigation via a hierarchical structure (structure skeleton) which fully describes functions and services supported by a dialog (conversational) system. In one aspect, a conversational system and method is provided to pre-load dialog menus and target addresses to their associated dialog managing procedures in order to handle multiple or complex modes, contexts or applications. For instance, a content server (web site) (106) can download a skeleton or tree structure (109) describing the content (page) (107) or service provided by the server (106) when the client (100) connects to the server (106). The skeleton is hidden from the user (not spoken), but the user can advance to a page of interest, or to a particular dialog service, by uttering a voice command which is recognized by the conversational system, which reacts appropriately (as per the user's command) using the information contained within the skeleton.
    Type: Grant
    Filed: October 1, 1999
    Date of Patent: August 26, 2008
    Assignee: International Business Machines Corporation
    Inventor: Stephane H. Maes
  • Patent number: 7418383
    Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
    Type: Grant
    Filed: September 3, 2004
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: James Droppo, Alejandro Acero
  • Publication number: 20080201144
    Abstract: A method is disclosed in the present invention for recognizing emotion by setting different weights for at least two kinds of unknown information, such as image and audio information, based on their respective recognition reliability. The weights are determined by the distance between the test data and the hyperplane and the standard deviation of the training data, normalized by the mean distance between the training data and the hyperplane, representing the classification reliability of each kind of information. When the two kinds of unidentified information are classified differently by the hyperplane, the method recognizes the emotion according to the information having the higher weight and corrects the wrong classification result of the other, so as to raise the accuracy of emotion recognition. Meanwhile, the present invention also provides a learning step with higher learning speed through an iterative algorithm.
    Type: Application
    Filed: August 8, 2007
    Publication date: August 21, 2008
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Kai-Tai Song, Meng-Ju Han, Jing-Huai Hsu, Jung-Wei Hong, Fuh-Yu Chang
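One plausible reading of the weighting rule above, sketched with invented numbers: each modality's weight is its test-sample distance from the classifier hyperplane, normalized by the mean training-data distance, and the modality with the higher weight wins when the two disagree. The helper names and all values are hypothetical.

```python
def hyperplane_distance(x, w, b):
    # Signed distance of point x from the hyperplane w.x + b = 0
    norm = sum(wi * wi for wi in w) ** 0.5
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def reliability_weight(d_test, train_distances):
    # Test distance normalized by the mean training-data distance,
    # standing in for the classification reliability of a modality
    mean_d = sum(abs(d) for d in train_distances) / len(train_distances)
    return abs(d_test) / mean_d

# Image and audio classifiers disagree; trust the more reliable one.
d_img = hyperplane_distance([0.2, 0.0], [1.0, 0.0], 0.0)
d_aud = hyperplane_distance([0.0, 1.5], [0.0, 1.0], 0.0)
w_img = reliability_weight(d_img, [1.0, 1.2, 0.8])
w_aud = reliability_weight(d_aud, [1.0, 1.1, 0.9])
decision = "audio" if w_aud > w_img else "image"
```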
  • Publication number: 20080195387
    Abstract: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance provide a score exceeding a threshold when matched against one or more models constructed upon voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters, a fast scoring algorithm, summed calls handling, or quality evaluation for the tested utterance.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Yaniv ZIGEL, Moshe WASSERBLAT
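The decision rule above reduces to scoring the tested utterance against every known speaker's model and accepting if the best score clears a threshold. In this sketch, cosine similarity stands in for the statistical speaker models of the patent; the feature vectors and threshold are invented.

```python
def cosine(a, b):
    # Similarity score between a test feature vector and a speaker model
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def in_speaker_set(test_features, speaker_models, threshold=0.9):
    # Accept if the best match against any known-speaker model
    # exceeds the decision threshold.
    return max(cosine(test_features, m) for m in speaker_models) >= threshold

models = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2]]   # one model per known speaker
accepted = in_speaker_set([0.98, 0.12, 0.01], models)
rejected = in_speaker_set([0.1, 0.1, 1.0], models)
```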
  • Patent number: 7412383
    Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
    Type: Grant
    Filed: April 4, 2003
    Date of Patent: August 12, 2008
    Assignee: AT&T Corp
    Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
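The core selection criterion, picking the utterances whose model confidence is lowest so that annotation effort is spent where it helps most, can be sketched as follows; the threshold and the least-confident-first ordering are assumptions.

```python
def select_for_annotation(utterances, threshold=0.6):
    # Keep utterances the models are least confident about and order
    # them least-confident first for the annotation list.
    picked = [(conf, text) for text, conf in utterances if conf < threshold]
    return [text for conf, text in sorted(picked)]

utts = [("book a flight", 0.92), ("uh warranty claim", 0.31),
        ("transfer funds", 0.74), ("speak to agent", 0.55)]
annotation_list = select_for_annotation(utts)
```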
  • Patent number: 7406415
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: July 29, 2008
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7392184
    Abstract: A method needed in speech recognition for forming a pronunciation model in a telecommunications system comprising at least one portable electronic device and a server. The electronic device is arranged to compare the user's speech information with pronunciation models comprising acoustic units and stored in the electronic device. A character sequence is transferred from the electronic device to the server. In the server, the character sequence is converted into a sequence of acoustic units. The sequence of acoustic units is sent from the server to the electronic device.
    Type: Grant
    Filed: April 15, 2002
    Date of Patent: June 24, 2008
    Assignee: Nokia Corporation
    Inventors: Olli Viikki, Kari Laurila
  • Patent number: 7389228
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: December 16, 2002
    Date of Patent: June 17, 2008
    Assignee: International Business Machines Corporation
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 7379867
    Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers.
    Type: Grant
    Filed: June 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Microsoft Corporation
    Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
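The quantity being maximized, the conditional likelihood of a class given the word string, follows from Bayes' rule over class-conditional language models. The sketch below uses unigram probabilities and invented numbers; the patent's discriminative training adjusts the LM parameters so the correct class's posterior grows.

```python
import math

def class_posterior(per_class_word_probs, priors):
    # P(c | w) proportional to P(c) * product_i P(w_i | c), computed in
    # log space and normalized over the classes
    scores = [math.log(p) + sum(math.log(wp) for wp in wps)
              for p, wps in zip(priors, per_class_word_probs)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two classes; each inner list holds P(word_i | class) for the utterance.
posterior = class_posterior([[0.2, 0.1], [0.05, 0.04]], priors=[0.5, 0.5])
```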
  • Patent number: 7376553
    Abstract: An apparatus for signal processing based on an algorithm for representing harmonics in a fractal lattice. The apparatus includes a plurality of tuned segments, each tuned segment including a transceiver having an intrinsic resonant frequency, the amplitude of the resonant frequency being capable of modification either by receiving an external input signal or by internally generating a response to an applied feedback signal. A plurality of signal processing elements are arranged in an array pattern, the signal processing elements including at least one function selected from the group including buffers for storing information, a feedback device for generating a feedback signal, a controller for controlling an output signal, a connection circuit for connecting the plurality of tuned segments to signal processing elements, and a feedback connection circuit for conveying signals from the plurality of signal processing elements in the array to the tuned segments.
    Type: Grant
    Filed: July 8, 2004
    Date of Patent: May 20, 2008
    Inventor: Robert Patel Quinn
  • Patent number: 7376562
    Abstract: The present invention relates to systems and methods for processing acoustic signals, such as music and speech. The method involves nonlinear frequency analysis of an incoming acoustic signal. In one aspect, a network of nonlinear oscillators, each with a distinct frequency, is applied to process the signal. The frequency, amplitude, and phase of each signal component are identified. In addition, nonlinearities in the network recover components that are not present or not fully resolvable in the input signal. In another aspect, a modification of the nonlinear oscillator network is used to track changing frequency components of an input signal.
    Type: Grant
    Filed: June 22, 2004
    Date of Patent: May 20, 2008
    Assignees: Florida Atlantic University, Circular Logic, Inc.
    Inventor: Edward W. Large
  • Publication number: 20080114595
    Abstract: An automatic speech recognition method for identifying words from an input speech signal includes providing at least one hypothesis recognition based on the input speech signal, the hypothesis recognition being an individual hypothesis word or a sequence of individual hypothesis words, and computing a confidence measure for the hypothesis recognition, based on the input speech signal, wherein computing a confidence measure includes computing differential contributions to the confidence measure, each as a difference between a constrained acoustic score and an unconstrained acoustic score, weighting each differential contribution by applying thereto a cumulative distribution function of the differential contribution, so as to make the distributions of the confidence measures homogeneous in terms of rejection capability, as the language, vocabulary and grammar vary, and computing the confidence measure by averaging the weighted differential contributions.
    Type: Application
    Filed: December 28, 2004
    Publication date: May 15, 2008
    Inventors: Claudio Vair, Daniele Colibro
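One reading of the weighting step above: each differential contribution (constrained minus unconstrained acoustic score) is mapped through an empirical cumulative distribution function gathered beforehand, which makes the resulting measures comparable across languages and grammars, and the mapped values are averaged. The population and scores below are invented.

```python
def empirical_cdf(value, population):
    # Fraction of previously collected differential contributions <= value
    return sum(1 for p in population if p <= value) / len(population)

def confidence(constrained, unconstrained, population):
    # Differential contribution per unit, mapped through the CDF so that
    # confidence values are homogeneous in rejection capability, then
    # averaged into one measure
    diffs = [c - u for c, u in zip(constrained, unconstrained)]
    return sum(empirical_cdf(d, population) for d in diffs) / len(diffs)

pop = [-2.0, -1.0, 0.0, 1.0, 2.0]
measure = confidence([1.5, 0.5], [0.0, 0.0], pop)
```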
  • Patent number: 7369991
    Abstract: The object of the present invention is to maintain a high recognition success rate with a low-volume sound signal, without being affected by noise.
    Type: Grant
    Filed: March 4, 2003
    Date of Patent: May 6, 2008
    Assignee: NTT DoCoMo, Inc.
    Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
  • Patent number: 7366666
    Abstract: A method for processing language input can include the step of determining at least two possible meanings for a language input. For each possible meaning, a probability that the possible meaning is a correct interpretation of the language input can be determined. At least one relative delta computation can be computed based at least in part upon the probabilities. At least one irregularity within the language input can be detected based upon the relative delta computation. The irregularity can include a mumble, ambiguous input, and/or compound input. At least one programmatic action can be performed responsive to the detection of the irregularity.
    Type: Grant
    Filed: October 1, 2003
    Date of Patent: April 29, 2008
    Assignee: International Business Machines Corporation
    Inventors: Rajesh Balchandran, Linda M. Boyer
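A minimal sketch of the detection idea: with per-meaning probabilities in hand, a weak best hypothesis suggests a mumble while two near-equal hypotheses suggest ambiguous input. The delta thresholds are hypothetical.

```python
def classify_irregularity(meaning_probs, low=0.3, close=0.1):
    # Relative-delta checks over the interpretation probabilities
    probs = sorted(meaning_probs, reverse=True)
    if probs[0] < low:
        return "mumble"              # no interpretation is convincing
    if len(probs) > 1 and probs[0] - probs[1] < close:
        return "ambiguous"           # two interpretations are too close
    return "ok"

results = [classify_irregularity(p)
           for p in ([0.20, 0.15], [0.45, 0.42], [0.80, 0.10])]
```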
  • Patent number: 7363200
    Abstract: A matrix includes samples associated with a first signal and samples associated with a second signal. The second signal includes a first portion associated with the first signal and a second portion associated with at least one disturbance, such as white noise or colored noise. A projection of the matrix is produced using canonical QR-decomposition. Canonical QR-decomposition of the matrix produces an orthogonal matrix and an upper triangular matrix, where each value in the diagonal of the upper triangular matrix is greater than or equal to zero. The projection at least substantially separates the first portion of the second signal from the second portion of the second signal.
    Type: Grant
    Filed: February 5, 2004
    Date of Patent: April 22, 2008
    Assignee: Honeywell International Inc.
    Inventor: Joseph Z. Lu
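For a two-column matrix the canonical QR projection reduces to one Gram-Schmidt step: normalize the first signal (a positive diagonal entry makes the factorization canonical), project the second signal onto it, and take the residual as the disturbance estimate. The signals below are invented.

```python
def separate(s1, s2):
    # Gram-Schmidt step of a canonical QR factorization of [s1 | s2]:
    # the projection of s2 onto span(s1) is the portion explained by s1;
    # the residual approximates the disturbance.
    n1 = sum(x * x for x in s1) ** 0.5     # R[0,0] > 0 (canonical)
    q1 = [x / n1 for x in s1]
    r12 = sum(q * y for q, y in zip(q1, s2))
    correlated = [r12 * q for q in q1]
    residual = [y - c for y, c in zip(s2, correlated)]
    return correlated, residual

s1 = [1.0, 2.0, 3.0]
s2 = [2.0, 4.0, 6.5]                       # 2*s1 plus a small disturbance
correlated, residual = separate(s1, s2)
```

The residual is orthogonal to the first signal, and the two parts sum back to the original second signal.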
  • Patent number: 7349844
    Abstract: In a processor system for audio processing, such as voice recognition and text-to-speech, a dedicated front-end processor, a core processor and a dedicated back-end processor are provided, coupled by a dual access stack. When an analog audio signal is input, the core processor is invoked only when a certain amount of data is present in the dual access stack. Likewise, the back-end processor is invoked only when a certain amount of data is present in the dual access stack. In this way the overall processing power required by the processing task is minimized, as is the power consumption of the processor system.
    Type: Grant
    Filed: September 11, 2003
    Date of Patent: March 25, 2008
    Assignee: International Business Machines Corporation
    Inventor: Dieter Staiger
  • Patent number: 7337114
    Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: February 26, 2008
    Assignee: International Business Machines Corporation
    Inventor: Ellen M. Eide
  • Patent number: 7337107
    Abstract: Pitch estimation and classification into voiced, unvoiced and transitional speech were performed by a spectro-temporal auto-correlation technique. A peak picking formula was then employed. A weighting function was then applied to the power spectrum. The harmonics weighted power spectrum underwent mel-scaled band-pass filtering, and the log-energy of the filter's output was discrete cosine transformed to produce cepstral coefficients. A within-filter cubic-root amplitude compression was applied to reduce amplitude variation without compromise of the gain invariance properties.
    Type: Grant
    Filed: October 2, 2001
    Date of Patent: February 26, 2008
    Assignee: The Regents of the University of California
    Inventors: Kenneth Rose, Liang Gu
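The within-filter cubic-root compression replaces the usual log before the cosine transform. A minimal sketch over invented band energies, using a plain DCT-II with no mel filtering or harmonic weighting:

```python
import math

def dct2(values):
    # DCT-II of the compressed filter-bank outputs -> cepstral coefficients
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n)]

def cepstra_cubic_root(band_energies):
    # Cubic-root amplitude compression within each filter, then DCT
    compressed = [e ** (1.0 / 3.0) for e in band_energies]
    return dct2(compressed)

ceps = cepstra_cubic_root([8.0, 1.0, 27.0, 64.0])
```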
  • Publication number: 20080040110
    Abstract: An apparatus and method for detecting an emotional state of a speaker participating in an audio signal. The apparatus and method are based on the distance in voice features between a person being in an emotional state and the same person being in a neutral state. The apparatus and method comprise a training phase in which a training feature vector is determined, and an ongoing stage in which the training feature vector is used to determine emotional states in a working environment. Multiple types of emotions can be detected, and the method and apparatus are speaker-independent, i.e., no prior voice sample or information about the speaker is required.
    Type: Application
    Filed: August 8, 2005
    Publication date: February 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Oren Pereg, Moshe Wasserblat
  • Patent number: 7324939
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: January 27, 2006
    Date of Patent: January 29, 2008
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7318029
    Abstract: There is disclosed an interactive voice response system for prompting a user with feedback during speech recognition. A user who speaks too slowly or too quickly may speak even more slowly or quickly in response to an error in speech recognition. The present system aims to give the user specific feedback on the speed of speaking. The method can include: acquiring an utterance from a user; recognizing a string of words from the utterance; acquiring for each word the ratio of actual duration of delivery to ideal duration; calculating an average ratio for all the words wherein the average ratio is an indication of the speed of the delivery of the utterance; and prompting the user as to the speed of delivery of the utterance according to the average ratio.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: January 8, 2008
    Assignee: International Business Machines Corporation
    Inventors: Wendy-Ann Coyle, Stephen James Haskey
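The steps listed above can be sketched directly: average the per-word ratio of actual to ideal duration, then prompt on the result. The threshold values and prompt wording are hypothetical.

```python
def rate_feedback(word_durations, slow=1.3, fast=0.7):
    # word_durations: (actual_seconds, ideal_seconds) per recognized word.
    # The average actual/ideal ratio indicates delivery speed; the
    # threshold values here are hypothetical.
    ratios = [actual / ideal for actual, ideal in word_durations]
    avg = sum(ratios) / len(ratios)
    if avg > slow:
        return avg, "Please speak a little faster."
    if avg < fast:
        return avg, "Please speak a little slower."
    return avg, None

avg, prompt = rate_feedback([(0.90, 0.50), (0.75, 0.50), (0.66, 0.44)])
```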
  • Patent number: 7313521
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: December 25, 2007
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7310599
    Abstract: A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. Aspects of the invention use mixtures of distributions of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
    Type: Grant
    Filed: July 20, 2005
    Date of Patent: December 18, 2007
    Assignee: Microsoft Corporation
    Inventors: Brendan J. Frey, Alejandro Acero, Li Deng
  • Patent number: 7302388
    Abstract: Method and apparatus detect voice activity for spectrum or power efficiency purposes. The method determines and tracks the instant, minimum and maximum power levels of the input signal. The method selects a first range of signals to be considered as noise, and a second range of signals to be considered as voice. The method uses the selected voice, noise and power levels to calculate a log likelihood ratio (LLR). The method uses the LLR to determine a threshold, then uses the threshold for differentiating between noise and voice.
    Type: Grant
    Filed: February 17, 2004
    Date of Patent: November 27, 2007
    Assignee: Ciena Corporation
    Inventors: Song Zhang, Eric Verreault
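A toy version of the decision: model the tracked noise and voice power levels as two Gaussians (in dB), take the log-likelihood ratio per frame, and compare it with a threshold. The levels, variance, and threshold are invented; the patent derives them from the tracked instant, minimum, and maximum powers.

```python
def llr(power_db, noise_mean, voice_mean, sigma=3.0):
    # Log-likelihood ratio of two equal-variance Gaussian power models
    # (dB domain); positive favors voice, negative favors noise.
    lv = -((power_db - voice_mean) ** 2) / (2.0 * sigma ** 2)
    ln = -((power_db - noise_mean) ** 2) / (2.0 * sigma ** 2)
    return lv - ln

def is_voice(power_db, noise_mean=-50.0, voice_mean=-20.0, threshold=0.0):
    # Frame-level decision by comparing the LLR against a threshold
    return llr(power_db, noise_mean, voice_mean) > threshold

frame_powers = [-52.0, -49.0, -22.0, -18.0, -48.0]
flags = [is_voice(p) for p in frame_powers]
```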
  • Patent number: 7295977
    Abstract: The method of the present invention utilizes machine-learning techniques, particularly Support Vector Machines in combination with a neural network, to process a unique machine-learning enabled representation of the audio bitstream. Using this method, a classifying machine is able to autonomously detect characteristics of a piece of music, such as the artist or genre, and classify it accordingly. The method includes transforming digital time-domain representation of music into a frequency-domain representation, then dividing that frequency data into time slices, and compressing it into frequency bands to form multiple learning representations of each song. The learning representations that result are processed by a group of Support Vector Machines, then by a neural network, both previously trained to distinguish among a given set of characteristics, to determine the classification.
    Type: Grant
    Filed: August 27, 2001
    Date of Patent: November 13, 2007
    Assignee: NEC Laboratories America, Inc.
    Inventors: Brian Whitman, Gary W. Flake, Stephen R. Lawrence
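One step of that pipeline, compressing a frequency-domain time slice into a few coarse bands to form the compact learning representation, can be sketched as follows; equal-width bands are an assumption.

```python
def band_energies(power_spectrum, n_bands):
    # Sum adjacent spectrum bins into n_bands coarse frequency bands
    size = len(power_spectrum) // n_bands
    return [sum(power_spectrum[b * size:(b + 1) * size])
            for b in range(n_bands)]

bands = band_energies([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.5, 0.5], 4)
```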
  • Patent number: 7292977
    Abstract: A system (230) performs speaker adaptation when performing speech recognition. The system (230) receives an audio segment and identifies the audio segment as a first audio segment or a subsequent audio segment associated with a speaker turn. The system (230) then decodes the audio segment to generate a transcription associated with the first audio segment when the audio segment is the first audio segment and estimates a transformation matrix based on the transcription associated with the first audio segment. The system (230) decodes the audio segment using the transformation matrix to generate a transcription associated with the subsequent audio segment when the audio segment is the subsequent audio segment.
    Type: Grant
    Filed: October 16, 2003
    Date of Patent: November 6, 2007
    Assignee: BBNT Solutions LLC
    Inventor: Daben Liu
  • Patent number: 7292976
    Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
    Type: Grant
    Filed: May 29, 2003
    Date of Patent: November 6, 2007
    Assignee: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Giuseppe Riccardi, Gokhan Tur
  • Patent number: 7280963
    Abstract: A computerized method is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The method includes graphing sets of initial pronunciations; thereafter in an ASR subsystem determining a highest-scoring set of initial pronunciations; generating sets of alternate pronunciations, wherein each set of alternate pronunciations includes the highest-scoring set of initial pronunciations with a lowest-probability phone of the highest-scoring initial pronunciation substituted with a unique-substitute phone; graphing the sets of alternate pronunciations; determining in the ASR subsystem a highest-scoring set of alternate pronunciations; and adding to a pronunciation dictionary the highest-scoring set of alternate pronunciations.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: October 9, 2007
    Assignee: Nuance Communications, Inc.
    Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
  • Patent number: 7277852
    Abstract: A playlist generating method for generating a playlist of content from received broadcasted data is provided. The playlist generating method includes the steps of: extracting features of broadcast content beforehand, storing the features in a content feature file, and storing information relating to the broadcast content in a content information DB; extracting features from the received data, and storing the features in a data feature file; searching for broadcast content of a predetermined kind by comparing data in the content feature file and data in the data feature file; when a name of the predetermined kind of content is determined, storing data corresponding to the broadcast content of the predetermined kind in a search result file; and generating a playlist for the broadcast content of the predetermined kind from the search result file and the content information DB.
    Type: Grant
    Filed: October 22, 2001
    Date of Patent: October 2, 2007
    Assignee: NTT Communications Corporation
    Inventors: Miwako Iyoku, Tatsuhiro Kobayashi
  • Patent number: 7272559
    Abstract: Noninvasive, remote methods and apparatus for detecting early phases of neurological diseases, such as the non-tremor phase of Parkinson's disease, dyskinesia, dyslexia and neuroatrophy, are disclosed. Five words spoken either directly into a microphone connected to a local analysis system or remotely, as by way of a telephone link to a system for analysis of time and frequency domains of speech, yield characteristics representative of the presence of disease. The method includes the steps of transducing a set of unmodified spoken words or numbers into electrical signals which are bandlimited and amplified. These signals are analyzed in both time and frequency domains to detect and measure the manifestation of neurological disorders in the envelope of the time representation and spectral density of the words.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: September 18, 2007
    Assignee: CEIE specs, Inc.
    Inventor: Harbhajan S. Hayre
  • Patent number: 7266495
    Abstract: A computerized pronunciation system is provided for generating pronunciations for words and storing the pronunciations in a pronunciation dictionary. The system includes a word list including at least one word; transcribed acoustic data including at least one waveform for the word and transcribed text associated with the waveform; a pronunciation-learning module configured to accept as input the word list and the transcribed acoustic data, the pronunciation-learning module including: sets of initial pronunciations of the word, a scoring module configured to score pronunciations and to generate phone probabilities, and a set of alternate pronunciations of the word, wherein the set of alternate pronunciations includes a highest-scoring set of initial pronunciations with a highest-scoring substitute phone substituted for a lowest-probability phone; and a pronunciation dictionary configured to receive the highest-scoring set of initial pronunciations and the set of alternate pronunciations.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: September 4, 2007
    Assignee: Nuance Communications, Inc.
    Inventors: Francoise Beaufays, Ananth Sankar, Mitchel Weintraub, Shaun Williams
  • Patent number: 7263484
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings either linguistically, phonetically, or as a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: March 5, 2001
    Date of Patent: August 28, 2007
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7249018
    Abstract: A conversation manager processes spoken utterances from a user of a computer. The conversation manager includes a semantics analysis module and a syntax manager. A domain model that is used in processing the spoken utterances includes an ontology (i.e., world view for the relevant domain of the spoken utterances), lexicon, and syntax definitions. The syntax manager combines the ontology, lexicon, and syntax definitions to generate a grammatic specification. The semantics module uses the grammatic specification and the domain model to develop a set of frames (i.e., internal representation of the spoken utterance). The semantics module then develops a set of propositions from the set of frames. The conversation manager then uses the set of propositions in further processing to provide a reply to the spoken utterance.
    Type: Grant
    Filed: October 25, 2001
    Date of Patent: July 24, 2007
    Assignee: International Business Machines Corporation
    Inventors: Steven I. Ross, Robert C. Armes, Julie F. Alweis, Elizabeth A. Brownholtz, Jeffrey G. MacAllister
  • Patent number: 7240002
    Abstract: The present invention provides a speech recognition apparatus having high speech recognition performance and capable of performing speech recognition in a highly efficient manner. A matching unit 14 calculates the scores of words selected by a preliminary word selector 13 and determines a candidate for a speech recognition result on the basis of the calculated scores. A control unit 11 produces word connection relationships among words included in a word series employed as a candidate for the speech recognition result and stores them into a word connection information storage unit 16. A reevaluation unit 15 corrects the word connection relationships one by one. On the basis of the corrected word connection relationships, the control unit 11 determines the speech recognition result. A word connection managing unit 21 limits the times at which a boundary between words, as represented by the word connection relationships, is allowed to be located.
    Type: Grant
    Filed: November 7, 2001
    Date of Patent: July 3, 2007
    Assignee: Sony Corporation
    Inventors: Katsuki Minamino, Yasuharu Asano, Hiroaki Ogawa, Helmut Lucke
  • Patent number: 7231348
    Abstract: There is provided a voice activity detection method for indicating an active voice mode and an inactive voice mode. The method comprises receiving an input signal having a plurality of frames, determining whether each of the plurality of frames includes an active voice signal or an inactive voice signal, determining a second reflection coefficient for each frame determined to include the inactive voice signal, comparing the second reflection coefficient with a reflection threshold, and selecting the active voice mode if the second reflection coefficient is greater than the reflection threshold. The method may further comprise selecting the inactive voice mode if the second reflection coefficient is not greater than the reflection threshold. The method may also comprise analyzing the input signal to determine an energy level of the input signal, and selecting the active voice mode if the energy level is greater than an energy threshold.
    Type: Grant
    Filed: January 26, 2006
    Date of Patent: June 12, 2007
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Yang Gao, Eyal Shlomot, Adil Benyassine
  • Patent number: 7219057
    Abstract: A speech recognition method includes receiving signals derived from indices of a codebook corresponding to recognition feature vectors extracted from speech to be recognized. The signals include an indication of the number of bits per codebook index. The method also includes obtaining the string of indices from the received signals, obtaining the corresponding recognition feature vectors from the string of indices, and applying the recognition feature vectors to a word-level recognition process. To conserve network capacity, the size of the codebook and the corresponding number of bits per codebook index, are adapted on a dialogue-by-dialogue basis. The adaptation accomplishes a tradeoff between expected recognition rate and expected bitrate by optimizing a metric which is a function of both.
    Type: Grant
    Filed: June 8, 2005
    Date of Patent: May 15, 2007
    Assignee: Koninklijke Philips Electronics
    Inventor: Yin-Pin Yang
  • Patent number: 7219056
    Abstract: Two statistics are disclosed for determining the quality of language models. These statistics are called acoustic perplexity and the synthetic acoustic word error rate (SAWER), and they depend upon methods for computing the acoustic confusability of words. It is possible to substitute models of acoustic data in place of real acoustic data in order to determine acoustic confusability. An evaluation model is created, a synthesizer model is created, and a matrix is determined from the evaluation and synthesizer models. Each of the evaluation and synthesizer models is a hidden Markov model. Once the matrix is determined, a confusability calculation may be performed. Different methods are used to determine synthetic likelihoods. The confusability may be normalized and smoothed and methods are disclosed that increase the speed of performing the matrix inversion and the confusability calculation. A method for caching and reusing computations for similar words is disclosed.
    Type: Grant
    Filed: April 19, 2001
    Date of Patent: May 15, 2007
    Assignee: International Business Machines Corporation
    Inventors: Scott Elliot Axelrod, Peder Andreas Olsen, Harry William Printz, Peter Vincent de Souza
  • Patent number: 7219058
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: October 1, 2001
    Date of Patent: May 15, 2007
    Assignee: AT&T Corp.
    Inventors: Richard C. Rose, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Patent number: 7191105
    Abstract: A system for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate and animate sound sources. Electromagnetic sensors monitor excitation sources in sound producing systems, such as animate sound sources such as the human voice, or from machines, musical instruments, and various other structures. Acoustical output from these sound producing systems is also monitored. From such information, a transfer function characterizing the sound producing system is generated. From the transfer function, acoustical output from the sound producing system may be synthesized or canceled. The systems disclosed enable accurate calculation of transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.
    Type: Grant
    Filed: January 22, 2003
    Date of Patent: March 13, 2007
    Assignee: The Regents of the University of California
    Inventors: John F. Holzrichter, Lawrence C. Ng
  • Patent number: 7165029
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: January 16, 2007
    Assignee: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 7162422
    Abstract: The present invention provides a method and apparatus for using user-context information to improve N-best processing in the presence of speech recognition uncertainty. Once digitized voice data is received, it is processed by a speech recognizer to determine one or more recognized phrases based on the currently active recognition grammar. When recognition uncertainty yields more than one recognized phrase, user-specific context information is used to choose one of the recognized phrases.
    Type: Grant
    Filed: September 29, 2000
    Date of Patent: January 9, 2007
    Assignee: Intel Corporation
    Inventor: Steven M. Bennett
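The N-best disambiguation described above can be sketched as a rescoring step. The user-context representation (a hypothetical dictionary of the user's prior phrase frequencies) and the additive weighting scheme are illustrative assumptions, not the patented method.

```python
# Sketch: when several candidate phrases have similar acoustic scores,
# combine each score with user-specific context and pick the best.

def rescore_nbest(nbest, user_context, context_weight=0.5):
    """nbest: list of (phrase, acoustic_score in [0, 1]).
    user_context: dict mapping phrase -> prior frequency in [0, 1].
    Returns the phrase with the best combined score."""
    def combined(item):
        phrase, acoustic = item
        return acoustic + context_weight * user_context.get(phrase, 0.0)
    return max(nbest, key=combined)[0]
```

With an empty context the acoustically best phrase wins; a strong context prior can overturn a small acoustic margin.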
  • Patent number: 7117151
    Abstract: The aim is to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
    Type: Grant
    Filed: March 29, 2005
    Date of Patent: October 3, 2006
    Assignee: Sony Corporation
    Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
  • Patent number: 7117153
    Abstract: A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
    Type: Grant
    Filed: February 13, 2003
    Date of Patent: October 3, 2006
    Assignee: Microsoft Corporation
    Inventors: Milind Mahajan, Yonggang Deng, Alejandro Acero, Asela J. R. Gunawardana, Ciprian Chelba
  • Patent number: 7110947
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: December 5, 2000
    Date of Patent: September 19, 2006
    Assignee: AT&T Corp.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
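The deletion strategy described above reduces to filtering the feature stream. The frame and flag representations here are illustrative; the patent concerns bitstream-based features in a wireless setting.

```python
# Sketch: drop every feature frame whose transmission was flagged as
# erased, shortening the observation sequence handed to the recognizer.

def conceal_erasures(frames, erasure_flags):
    """Keep only frames whose erasure flag is False."""
    return [f for f, erased in zip(frames, erasure_flags) if not erased]
```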
  • Patent number: 7107214
    Abstract: To achieve an improvement in recognition performance, a non-speech acoustic model correction unit adapts a non-speech acoustic model representing a non-speech state using input data observed during an interval immediately before a speech recognition interval during which speech recognition is performed, by means of one of the maximum likelihood method, the complex statistic method, and the minimum distance-maximum separation theorem.
    Type: Grant
    Filed: July 8, 2005
    Date of Patent: September 12, 2006
    Assignee: Sony Corporation
    Inventor: Hironaga Nakatsuka
  • Patent number: 7107210
    Abstract: A system and method are provided that reduce noise in pattern recognition signals. To do this, embodiments of the present invention utilize a prior model of dynamic aspects of clean speech together with one or both of a prior model of static aspects of clean speech, and an acoustic model that indicates the relationship between clean speech, noisy speech and noise. In one embodiment, components of a noise-reduced feature vector are produced by forming a weighted sum of predicted values from the prior model of dynamic aspects of clean speech, the prior model of static aspects of clean speech and the acoustic-environmental model.
    Type: Grant
    Filed: May 20, 2002
    Date of Patent: September 12, 2006
    Assignee: Microsoft Corporation
    Inventors: Li Deng, James G. Droppo, Alejandro Acero
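The combination step described above can be sketched per vector component. The three predictor values and the weights are placeholders; in the patent they come from the prior model of dynamic aspects of clean speech, the prior model of static aspects, and the acoustic-environment model.

```python
# Sketch: form one component of a noise-reduced feature vector as a
# weighted sum of three model predictions. Weights are assumed.

def denoise_component(pred_dynamic, pred_static, pred_acoustic,
                      weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the three model predictions for one component."""
    preds = (pred_dynamic, pred_static, pred_acoustic)
    return sum(w * p for w, p in zip(weights, preds))
```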
  • Patent number: 7050974
    Abstract: A speech communication system comprising a speech input terminal and a speech recognition apparatus which can communicate with each other through a wire or wireless communication network, wherein the speech input terminal comprises a speech input unit, a unit for creating environment information for speech recognition, which is unique to the speech input terminal or represents its operation state, and a communication control unit for transmitting the environment information to the speech recognition apparatus, and the speech recognition apparatus executes speech recognition processing on the basis of the environment information.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: May 23, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Masayuki Yamada
  • Patent number: 7043429
    Abstract: A speech recognition system is used to receive a speech signal and output an output language word with respect to the speech signal. The speech recognition system has preset quantities for a first threshold, a second threshold, and a third threshold. The speech recognition system includes a first speech recognition device that is used to receive the speech signal and generate a first candidate language word and a first confidence measurement of the first candidate language word, according to the speech signal. A second speech recognition device is used to receive the speech signal and generate a second candidate language word and a second confidence measurement of the second candidate language word, according to the speech signal. A confidence measurement judging unit is used to output the language word, by comparing the first confidence measurement and the second confidence measurement to the above thresholds.
    Type: Grant
    Filed: March 28, 2002
    Date of Patent: May 9, 2006
    Assignee: Industrial Technology Research Institute
    Inventors: Sen-Chia Chang, Shih-Chien Chien, Jia-Jang Tu
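The judging step described above can be sketched as a threshold comparison over the two recognizers' outputs. The particular decision rule and the three threshold values are illustrative assumptions; the patent specifies its own comparison procedure.

```python
# Sketch: pick an output word by comparing two recognizers' confidence
# measurements against preset thresholds (values assumed).

def judge(cand1, conf1, cand2, conf2, t1=0.8, t2=0.8, t3=0.1):
    """Return the accepted word, or None to reject both candidates."""
    if conf1 >= t1 and conf1 >= conf2:
        return cand1
    if conf2 >= t2 and conf2 > conf1:
        return cand2
    # Neither recognizer is confident alone; accept agreement within t3.
    if cand1 == cand2 and abs(conf1 - conf2) <= t3:
        return cand1
    return None
```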
  • Patent number: 7035867
    Abstract: A system for identifying files can use fingerprints to compare various files and determine redundant files. Frequency representations of portions of files can be used, such as Fast Fourier Transforms, as the fingerprints.
    Type: Grant
    Filed: November 28, 2001
    Date of Patent: April 25, 2006
    Assignee: Aerocast.com, Inc.
    Inventors: Mark R. Thompson, Nathan F. Raciborski
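The fingerprinting idea above can be sketched by taking a frequency representation of a fixed-size portion of each file and comparing the results. A naive O(N²) DFT stands in for a Fast Fourier Transform to stay self-contained; the portion size, the number of retained bins, and the exact-match comparison are all illustrative assumptions.

```python
# Sketch: fingerprint a file by coarse DFT magnitudes of its first N
# bytes, then flag files with identical fingerprints as likely duplicates.
import cmath

def fingerprint(data, n=64, keep=8):
    """DFT magnitudes of the first n bytes, keeping `keep` low bins."""
    chunk = list(data[:n]) + [0] * max(0, n - len(data))  # zero-pad
    mags = []
    for k in range(keep):
        s = sum(chunk[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append(round(abs(s), 3))
    return tuple(mags)

def likely_duplicates(a, b):
    """Two files with matching fingerprints are candidate duplicates."""
    return fingerprint(a) == fingerprint(b)
```

A production system would hash multiple portions and tolerate small magnitude differences rather than requiring exact equality.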