Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 7636426
Abstract: A telecommunications device includes a voice dialer and a text-to-speech engine. The text-to-speech engine is configured to convert at least a portion of the user contact list information to speech, and the voice dialer is configured to receive an audio input and perform voice recognition, comparing said audio input to the converted user contact list information.
Type: Grant
Filed: August 10, 2005
Date of Patent: December 22, 2009
Assignee: Siemens Communications, Inc.
Inventors: Sarah Korah, John Vuong
-
Publication number: 20090313015
Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
Type: Application
Filed: June 13, 2008
Publication date: December 17, 2009
Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
-
Patent number: 7630894
Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system, particularly suited for use in a wireless communication system, operates to "delete" each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single-word and "string" tests of the deletion technique.
Type: Grant
Filed: August 1, 2006
Date of Patent: December 8, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard Vandervoort Cox, Hong Kook Kim
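The "deletion" strategy described in this abstract is simple enough to sketch in a few lines: erased frames are dropped outright, shortening the observation sequence the recognizer scores. The helper name and the list-of-frames representation below are illustrative assumptions, not taken from the patent.

```python
def conceal_erasures(frames, erased):
    """Frame-erasure concealment by deletion: drop every frame whose
    index was declared erased, shortening the observation sequence."""
    return [f for i, f in enumerate(frames) if i not in erased]

# four feature frames; erasures were declared for frames 1 and 3
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
observation = conceal_erasures(frames, erased={1, 3})
# the recognizer now scores a 2-frame observation sequence
```

The recognizer itself then runs unchanged on the shorter sequence, which is what makes the approach attractive for a bitstream-based front end.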
-
Patent number: 7627468
Abstract: An apparatus enabling automatic determination of a portion that reliably represents a feature of a speech waveform includes: an acoustic/prosodic analysis unit calculating, from data, the distribution of the energy of a prescribed frequency range of the speech waveform on a time axis, and extracting, among the various syllables of the speech waveform, a range that is generated stably, based on the distribution and the pitch of the speech waveform; a cepstral analysis unit estimating, based on the spectral distribution of the speech waveform on the time axis, a range of the speech waveform whose change is well controlled by the speaker; and a pseudo-syllabic center extracting unit extracting, as a portion of high reliability of the speech waveform, the range that has been estimated to be stably generated and whose change is estimated to be well controlled by the speaker.
Type: Grant
Filed: February 21, 2003
Date of Patent: December 1, 2009
Assignees: Japan Science and Technology Agency, Advanced Telecommunication Research Institute International
Inventors: Nick Campbell, Parham Mokhtari
-
Patent number: 7620263
Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.
Type: Grant
Filed: October 6, 2005
Date of Patent: November 17, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
-
Patent number: 7617102
Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data uttered by each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; and a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract geometrical parameters.
Type: Grant
Filed: September 27, 2006
Date of Patent: November 10, 2009
Assignee: Advanced Telecommunications Research Institute International
Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
-
Publication number: 20090276216
Abstract: A method for speech recognition, the method includes: extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows; aligning reference speech elements that are not of equal time span duration; constructing a common subspace for the aligned speech features; determining a first set of coefficient vectors; extracting a time-frequency feature image from a test speech stream spanned by a second sampling window; approximating the extracted image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector; computing a similarity measure between the first and the second coefficient vector; determining if the similarity measure is below a predefined threshold; and wherein a match between the reference speech elements and a portion of the test speech stream is made in response to a similarity measure below a predefined threshold.
Type: Application
Filed: May 2, 2008
Publication date: November 5, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Lisa Amini, Pascal Frossard, Effrosyni Kokiopoulou, Oliver Verscheure
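The matching step above can be roughly illustrated as follows: features are projected onto a basis for the common subspace, and the resulting coefficient vectors are compared against a threshold. The basis, the Euclidean similarity measure, and the threshold value are all hypothetical stand-ins for whatever the publication actually specifies.

```python
def project(basis, x):
    """Coefficient vector of feature vector x in the common subspace,
    assuming the basis vectors are orthonormal."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis]

def similarity(c1, c2):
    """Euclidean distance between coefficient vectors (lower = more alike)."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# toy 2-D common subspace of R^3; basis and threshold are made up
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ref_coeffs = project(basis, [0.9, 0.1, 0.0])    # first coefficient vector
test_coeffs = project(basis, [0.8, 0.2, 0.0])   # second coefficient vector
THRESHOLD = 0.5
match = similarity(ref_coeffs, test_coeffs) < THRESHOLD   # declare a match
```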
-
Publication number: 20090265170
Abstract: An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
Type: Application
Filed: September 13, 2007
Publication date: October 22, 2009
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Go Irie, Kouta Hidaka, Takashi Satou, Yukinobu Taniguchi, Shinya Nakajima
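A minimal sketch of the "emotional state probability" step: per-state statistical models score the frame's audio feature, and the likelihoods are normalized into a posterior. The 1-D Gaussian models and state names here are toy assumptions, not the publication's actual models.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def state_probabilities(feature, models):
    """Posterior over emotional states for one analysis frame.
    `models` maps state name -> (mean, variance) of a 1-D audio feature."""
    likes = {s: gauss(feature, m, v) for s, (m, v) in models.items()}
    total = sum(likes.values())
    return {s: l / total for s, l in likes.items()}

models = {"neutral": (0.0, 1.0), "excited": (3.0, 1.0)}   # toy 1-D models
post = state_probabilities(2.5, models)
state = max(post, key=post.get)   # most probable emotional state
```

A real system would accumulate these frame posteriors over a section before deciding the section's emotional state.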
-
Patent number: 7603274
Abstract: A method and apparatus for determining the possibility of pattern recognition of a time series signal, independent of a pattern recognition ratio, is provided. The method includes extracting a time forward feature and a time reversed feature from an input signal having a time series pattern, generating a time forward alignment and a time reversed alignment by using the time forward feature and the time reversed feature, comparing the time forward alignment with the time reversed alignment to compute a likelihood of pattern recognition, and determining that the input signal can be recognized if the likelihood is larger than a predetermined threshold value.
Type: Grant
Filed: November 2, 2005
Date of Patent: October 13, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventor: Kwangil Hwang
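The forward/reversed comparison can be sketched with a toy monotonic aligner: a clean time-series pattern should yield near-identical labelings whether aligned forward or backward, while a scrambled signal will not. The greedy aligner and the agreement-ratio "likelihood" below are simplified stand-ins for whatever alignment and likelihood the patent actually uses.

```python
def align(features, states):
    """Greedy monotonic alignment: the state index may only advance."""
    s, path = 0, []
    for f in features:
        if s + 1 < len(states) and abs(f - states[s + 1]) < abs(f - states[s]):
            s += 1
        path.append(s)
    return path

def recognizable(features, states, threshold=0.8):
    """Compare the forward alignment with the time-reversed alignment;
    their agreement ratio stands in for the likelihood of recognition."""
    n = len(states)
    fwd = align(features, states)
    # align the reversed signal against reversed states, then map back
    rev = [n - 1 - s for s in align(features[::-1], states[::-1])][::-1]
    agreement = sum(a == b for a, b in zip(fwd, rev)) / len(fwd)
    return agreement >= threshold
```

A well-ordered signal such as [0, 0, 5, 5, 9, 9] against states [0, 5, 9] aligns identically in both directions; a scrambled one does not.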
-
Patent number: 7603278
Abstract: A segment set before updating is read, and clustering considering a phoneme environment is performed on it. For each cluster obtained by the clustering, a representative segment of the segments belonging to the cluster is generated. For each cluster, the segments belonging to the cluster are replaced with the representative segment so as to update the segment set.
Type: Grant
Filed: September 14, 2005
Date of Patent: October 13, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
-
Patent number: 7593842
Abstract: A device and method for translating language is disclosed. In one embodiment, for example, a method for providing a translated output signal derived from a speech input signal comprises: receiving a speech input signal in a first language; converting the speech input signal into a digital format comprising a voice model component representing a speech pattern of the speech input signal and a content component representing the content of the speech input signal; translating the content component from the first language into a second language to provide a translated content component; and generating an audible output signal comprising the translated content in an approximation of the speech pattern of the speech input signal.
Type: Grant
Filed: December 10, 2003
Date of Patent: September 22, 2009
Inventor: Leslie Rousseau
-
Patent number: 7590605
Abstract: A system is described for matching lattices, such as phoneme lattices generated by an automatic speech recognition unit. The system can be used to retrieve files from a database by comparing a query lattice with the annotation lattices associated with the data files that can be retrieved, and by retrieving the data files having an annotation lattice most similar to the query lattice.
Type: Grant
Filed: July 16, 2004
Date of Patent: September 15, 2009
Assignee: Canon Kabushiki Kaisha
Inventor: Ljubomir Josifovski
-
Patent number: 7590537
Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between a test speaker ML model and the speaker group ML model to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.
Type: Grant
Filed: December 27, 2004
Date of Patent: September 15, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
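The "quantity" and "direction" of a model variation can be pictured as the norm and the cosine of the difference between model mean vectors. The sketch below is a toy illustration under that assumption; the vector representations and group names are invented for the example.

```python
import math

def variation(model_a, model_b):
    """Model variation: componentwise difference of two mean vectors."""
    return [b - a for a, b in zip(model_a, model_b)]

def magnitude(v):
    """Quantity variation amount (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))

def direction_similarity(u, v):
    """Directional variation agreement (cosine of the angle between
    two variation vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (magnitude(u) * magnitude(v))

# compare a test speaker's variation against two speaker-group variations
test_var = variation([0.0, 0.0], [1.0, 1.1])
group_a = variation([0.0, 0.0], [2.0, 2.0])    # same direction, larger amount
group_b = variation([0.0, 0.0], [-1.0, 1.0])   # different direction
best = max([("a", group_a), ("b", group_b)],
           key=lambda g: direction_similarity(test_var, g[1]))[0]
```

Here the test speaker is assigned to group "a" because the direction of its variation matches, even though the magnitudes differ.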
-
Patent number: 7587318
Abstract: A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
Type: Grant
Filed: September 12, 2003
Date of Patent: September 8, 2009
Assignee: Broadcom Corporation
Inventor: Nambi Seshadri
-
Publication number: 20090210226
Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme lattice model generated by the voice query, to identify matching terms within the audio tags and corresponding stored content. The retrieved content(s) associated with the identified audio tags having uniterms that score within the phoneme lattice model are outputted in an order corresponding to the order in which the uniterms are structured within the voice query.
Type: Application
Filed: February 15, 2008
Publication date: August 20, 2009
Inventor: Changxue Ma
-
Patent number: 7574357
Abstract: Method and system for generating electromyographic or sub-audible signals ("SAWPs") and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.
Type: Grant
Filed: June 24, 2005
Date of Patent: August 11, 2009
Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration (NASA)
Inventors: C. Charles Jorgensen, Bradley J. Betts
-
Patent number: 7571098
Abstract: Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments.
Type: Grant
Filed: May 29, 2003
Date of Patent: August 4, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi, Gokhan Tur, Jeremy Huntley Wright
-
Publication number: 20090192788
Abstract: In a sound processing device, a modulation spectrum specifier specifies a modulation spectrum of an input sound for each of a plurality of unit intervals. An index calculator calculates an index value corresponding to the magnitude of components of modulation frequencies belonging to a predetermined range of the modulation spectrum. A determinator determines whether the input sound of each of the unit intervals is a vocal sound or a non-vocal sound based on the index value.
Type: Application
Filed: January 23, 2009
Publication date: July 30, 2009
Applicant: Yamaha Corporation
Inventor: Yasuo Yoshioka
-
Patent number: 7567903
Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine another best hypothesis. Another Vocal Tract Length Normalization factor is estimated based on that best hypothesis and at least one previous best hypothesis.
Type: Grant
Filed: January 12, 2005
Date of Patent: July 28, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
-
Patent number: 7565213
Abstract: A significant short-time spectrum is extracted from an information signal, the means for extracting being configured to extract those short-time spectra which come closer to a specific characteristic than others. The extracted short-time spectra are then decomposed into component signals using ICA analysis, a component signal spectrum representing a profile spectrum of a tone source which generates a tone corresponding to the characteristic sought. From a sequence of short-time spectra of the information signal and from the profile spectra determined, an amplitude envelope is calculated for each profile spectrum to indicate how a tone source profile spectrum changes over time. The profile spectra and all the amplitude envelopes associated therewith provide a description of the information signal which may be evaluated further, for example for transcription purposes in the case of a music signal.
Type: Grant
Filed: May 5, 2005
Date of Patent: July 21, 2009
Assignee: Gracenote, Inc.
Inventors: Christian Dittmar, Christian Uhle, Jürgen Herre
-
Patent number: 7562014
Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and an active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
Type: Grant
Filed: September 26, 2007
Date of Patent: July 14, 2009
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Giuseppe Riccardi, Gokhan Tur
-
Publication number: 20090157400
Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the processing of the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation on the cepstral feature vector is performed properly to improve the anti-noise ability of speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, have low complexity, and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
Type: Application
Filed: October 1, 2008
Publication date: June 18, 2009
Applicant: Industrial Technology Research Institute
Inventor: Shih-Ming Huang
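One plausible reading of "two scalar coefficients and a determining condition" is over-subtraction control: one coefficient scales the subtracted noise estimate and the other floors the result so subtraction can never over-shoot. The sketch below is an assumption-laden illustration of that general idea, not the publication's actual rule.

```python
def cepstral_noise_subtract(c, noise, alpha=1.0, beta=0.1):
    """Cepstral noise subtraction with over-subtraction control.
    alpha scales the subtracted noise estimate (first scalar); beta
    defines a floor for each coefficient (second scalar); the abs()
    comparison plays the role of the determining condition.
    All three choices are illustrative, not from the publication."""
    out = []
    for c_i, n_i in zip(c, noise):
        d = c_i - alpha * n_i
        # keep the subtraction only while it stays above a beta-scaled floor
        out.append(d if abs(d) > beta * abs(c_i) else beta * c_i)
    return out
```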
-
Patent number: 7546236
Abstract: This invention identifies anomalies in a data stream, without prior training, by measuring the difficulty of finding similarities between neighborhoods in the ordered sequence of elements. Data elements in an area that is similar to much of the rest of the scene score few mismatches. On the other hand, a region that possesses many dissimilarities with other parts of the ordered sequence will attract a high score of mismatches. The invention makes use of a trial and error process to find dissimilarities between parts of the data stream and does not require prior knowledge of the nature of the anomalies that may be present. The method avoids the use of processing dependencies between data elements and is capable of a straightforward parallel implementation for each data element. The invention is applicable to searching for anomalous patterns in data streams, including audio signals, health screening and geographical data. A method of error correction is also described.
Type: Grant
Filed: March 24, 2003
Date of Patent: June 9, 2009
Assignee: British Telecommunications public limited company
Inventor: Frederick W. M. Stentiford
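The trial-and-error mismatch scoring can be sketched directly: each neighborhood is compared against randomly chosen neighborhoods elsewhere, and the count of failed matches is its anomaly score. Window width, trial count, and tolerance below are arbitrary illustrative parameters.

```python
import random

def anomaly_scores(seq, width=2, trials=40, tol=1, seed=0):
    """Score each neighborhood by trial and error: count how many
    randomly chosen same-width neighborhoods elsewhere fail to match
    it elementwise within a tolerance. No prior training is needed."""
    rng = random.Random(seed)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        mismatches = 0
        for _ in range(trials):
            j = rng.randrange(len(seq) - width + 1)
            other = seq[j:j + width]
            if any(abs(a - b) > tol for a, b in zip(window, other)):
                mismatches += 1
        scores.append(mismatches)
    return scores

seq = [1, 1, 1, 1, 9, 9, 1, 1, 1]   # the 9s form a dissimilar region
scores = anomaly_scores(seq)
# windows covering the 9s attract high mismatch counts; the rest score low
```

Because each position is scored independently, the loop over `i` parallelizes trivially, matching the abstract's claim about per-element parallel implementation.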
-
Publication number: 20090125306
Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising the n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists, and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
Type: Application
Filed: September 19, 2008
Publication date: May 14, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Remi Lejeune, Hubert Crepy
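The combination step can be sketched by taking the Cartesian product of the per-segment n-best lists, scoring each global candidate as the product of its segment confidences, and pruning low scorers. The product-of-confidences score and the threshold-style pruning criterion are assumptions for illustration.

```python
from itertools import product

def global_nbest(segment_lists, n=3, prune=0.2):
    """Combine per-segment n-best lists into a global n-best list.
    Each segment list holds (text, confidence) pairs; a candidate's
    global score is the product of its segment confidences, and
    candidates scoring below `prune` are dropped (toy pruning rule)."""
    candidates = []
    for combo in product(*segment_lists):
        text = " ".join(t for t, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf
        if score >= prune:
            candidates.append((text, score))
    candidates.sort(key=lambda c: -c[1])
    return candidates[:n]

segments = [[("four", 0.9), ("for", 0.5)],
            [("two", 0.8), ("too", 0.4)]]
best = global_nbest(segments)   # top entry: "four two"
```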
-
Patent number: 7533015
Abstract: Provides speech enhancement techniques for extemporaneous noise without a noise interval and for unknown extemporaneous noise. Signal enhancement includes: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In the signal enhancement, a database of a signal model concerning the target signal, expressing a given feature by a given statistical model, is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.
Type: Grant
Filed: February 28, 2005
Date of Patent: May 12, 2009
Assignee: International Business Machines Corporation
Inventors: Tetsuya Takiguchi, Masafumi Nishimura
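The spectral-subtraction core can be sketched as subtracting a filtered reference magnitude spectrum from the input, with a floor to avoid negative magnitudes. The NLMS-style weight update below is a toy stand-in; the patent instead controls the coefficients by the likelihood of a statistical model of the target signal.

```python
def spectral_subtract(x_mag, ref_mag, w, floor=0.05):
    """Subtract a filtered reference magnitude spectrum from the input
    magnitude spectrum, flooring each bin so no magnitude goes negative."""
    return [max(x - w_i * r, floor * x)
            for x, r, w_i in zip(x_mag, ref_mag, w)]

def update_weights(w, x_mag, ref_mag, mu=0.5):
    """Toy coefficient control: nudge each filter weight to shrink the
    residual in its bin (a likelihood-based control, as in the patent,
    would replace this rule)."""
    return [w_i + mu * (x - w_i * r) * r / (r * r + 1e-9)
            for w_i, x, r in zip(w, x_mag, ref_mag)]

enhanced = spectral_subtract([1.0, 2.0], [0.5, 3.0], [1.0, 1.0])
w_next = update_weights([0.0], [1.0], [2.0])
```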
-
Patent number: 7529666
Abstract: In connection with speech recognition, the design of a linear transformation θ ∈ R^(p×n), of rank p ≤ n, which projects the features of a classifier x ∈ R^n onto y = θx ∈ R^p such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
Type: Grant
Filed: October 30, 2000
Date of Patent: May 5, 2009
Assignee: International Business Machines Corporation
Inventors: Mukund Padmanabhan, George A. Saon
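The two ingredients, the projection y = θx and the Bhattacharyya bound on the Bayes error, can be made concrete for the 1-D Gaussian case. The matrix and class parameters below are toy values; the patent's optimization over θ is not shown.

```python
import math

def project(theta, x):
    """y = theta @ x : project an n-dim feature onto a p-dim subspace."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in theta]

def bhattacharyya(m1, v1, m2, v2):
    """Bhattacharyya distance between two 1-D Gaussian class densities
    (mean, variance); the union bound on the Bayes error shrinks as
    exp(-distance), so larger distance means a tighter error bound."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))

theta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # toy rank-2 projection, p=2, n=3
y = project(theta, [1.0, 2.0, 3.0])
d = bhattacharyya(0.0, 1.0, 2.0, 1.0)        # separation of the two classes
```

Minimizing the union Bhattacharyya bound amounts to choosing θ so that distances like `d`, computed on the projected features, are as large as possible across class pairs.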
-
Patent number: 7529665
Abstract: A two-stage utterance verification device and a method thereof are provided. The two-stage utterance verification method includes performing a first utterance verification function, based on an SVM pattern classification method, by using feature data inputted from a search block of a speech recognizer, and performing a second utterance verification function, based on a CART pattern classification method, by using heterogeneous feature data including meta data extracted from a preprocessing module, intermediate results from function blocks of the speech recognizer, and the result of the first utterance verification function. Therefore, the two-stage utterance verification device and the method thereof provide a high-quality speech recognition service to the user.
Type: Grant
Filed: April 1, 2005
Date of Patent: May 5, 2009
Assignee: Electronics and Telecommunications Research Institute
Inventors: Sanghun Kim, YoungJik Lee
-
Patent number: 7529668
Abstract: A system and method for implementing a refined dictionary for speech recognition includes a database analyzer that initially identifies first vocabulary words that are present in a training database and second vocabulary words that are not present in the training database. A relevance module then performs refinement procedures upon the first vocabulary words to produce refined short word pronunciations and refined long word pronunciations that are added to a refined dictionary. A consensus module compares the second pronunciations with calculated plurality pronunciations to identify final consensus pronunciations that are then included in the refined dictionary.
Type: Grant
Filed: August 3, 2004
Date of Patent: May 5, 2009
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Gustavo Abrego, Lex S. Olorenshaw
-
Publication number: 20090112585
Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing, for a speech recognizer, a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Type: Application
Filed: December 29, 2008
Publication date: April 30, 2009
Applicant: AT&T Corp.
Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
-
Patent number: 7509256
Abstract: It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
Type: Grant
Filed: March 29, 2005
Date of Patent: March 24, 2009
Assignee: Sony Corporation
Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
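The point-to-distribution idea can be illustrated in one dimension: instead of classifying a single feature value, the input is summarized by a (mean, variance) feature distribution parameter and compared against class distributions. The Gaussian summary and the KL-divergence decision rule below are illustrative choices, not the patent's specific mapping.

```python
import math

def to_distribution(samples):
    """Map an observation (here, several noisy 1-D measurements of one
    point in observation space) to a feature distribution parameter
    (mean, variance); the variance is floored to stay positive."""
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, max(v, 1e-6)

def kl_gauss(p, q):
    """KL divergence between two 1-D Gaussians given as (mean, var)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

classes = {"a": (0.0, 1.0), "b": (5.0, 1.0)}    # toy class distributions
obs = to_distribution([4.8, 5.1, 5.3])
label = min(classes, key=lambda c: kl_gauss(obs, classes[c]))
```

Carrying the spread of the input through to the classifier, rather than collapsing it to a point, is what the abstract credits with the improved recognition rate.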
-
Patent number: 7505897
Abstract: The subject matter includes systems, engines, and methods for generalizing a class of Lempel-Ziv algorithms for lossy compression of multimedia. One implementation of the subject matter compresses audio signals. Because music, especially electronically generated music, has a substantial level of repetitiveness within a single audio clip, the basic Lempel-Ziv compression technique can be generalized to support representing a single window of an audio signal using a linear combination of filtered past windows. Exemplary similarity searches and filtering strategies for finding the past windows are described.
Type: Grant
Filed: January 27, 2005
Date of Patent: March 17, 2009
Assignee: Microsoft Corporation
Inventors: Darko Kirovski, Zeph Landau
-
Publication number: 20090070110
Abstract: An MMR system for newspaper publishing comprises a plurality of mobile devices, an MMR gateway, an MMR matching unit and an MMR publisher. The MMR matching unit receives an image query from the MMR gateway and sends it to one or more of the recognition units to identify a result including a document, the page and the location on the page. The MMR matching unit also includes a result combiner coupled to each of the recognition units to receive recognition results. The result combiner produces a list of the most likely results and associated confidence scores. This list of results is sent by the result combiner back to the MMR gateway for presentation on the mobile device. The result combiner uses the quality predictor as an input in deciding which results are best. The present invention also includes a number of novel methods, including a method for generating the list of best results.
Type: Application
Filed: September 15, 2008
Publication date: March 12, 2009
Inventors: Berna Erol, Jonathan J. Hull, Jorge Moraleda
-
Publication number: 20090063144
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a communications device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Application
Filed: November 4, 2008
Publication date: March 5, 2009
Applicant: AT&T Corp.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Publication number: 20090048835
Abstract: A feature extracting apparatus includes: a spectrum calculator that calculates a logarithmic frequency spectrum including frequency components obtained from an input speech signal at regular intervals on a logarithmic frequency scale of a frame; a function calculator that calculates a cross-correlation function between a logarithmic frequency spectrum of a time and a logarithmic frequency spectrum of one or plural times included in a certain temporal width before and after the time, from a sequence of the logarithmic frequency spectra calculated at each time; and a feature extractor that extracts a set of the cross-correlation functions as a local and relative fundamental-frequency pattern feature at the frame.
Type: Application
Filed: March 4, 2008
Publication date: February 19, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Takashi Masuko
-
Patent number: 7493257
Abstract: To handle portions of a recognized sentence having an error, a user is questioned about the contents associated with those portions, and a result is obtained according to the user's answer. A speech recognition unit extracts a speech feature from a speech signal inputted by the user and finds the phoneme nearest to the speech feature to recognize a word. A recognition error determination unit finds a sentence confidence based on the confidence of the recognized word, examines the semantic structure of the recognized sentence, and determines whether or not an error exists in the recognized sentence according to a predetermined criterion based on both the sentence confidence and the result of examining the semantic structure. A meta-dialogue generation unit generates a question asking the user for additional information based on the content of the portion where the error exists and the type of the error.
Type: Grant
Filed: August 5, 2004
Date of Patent: February 17, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jung-eun Kim, Jae-won Lee
-
Publication number: 20090030685
Abstract: Speech recorded by an audio capture facility of a navigation facility is processed by a speech recognition facility to generate results that are provided to the navigation facility. When information related to a navigation application running on the navigation facility is provided to the speech recognition facility, the results generated are based at least in part on the application-related information. The speech recognition facility uses an unstructured language model for generating results. The user of the navigation facility may optionally be allowed to edit the results being provided to the navigation facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
Type: Application
Filed: August 1, 2008
Publication date: January 29, 2009
Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Yongdeng Chen
-
Publication number: 20090030683
Abstract: Disclosed are methods, systems, and computer-readable media for tracking dialog states in a spoken dialog system. The method comprises casting a plurality of dialog states, or particles, as a network describing the probability relationships between a plurality of variables; sampling a subset of the particles in the network; projecting each sampled particle into the future; assigning a weight to each sampled particle; and normalizing the assigned weights to yield a new estimated distribution over each variable's values, which is used in the spoken dialog system. Also disclosed is a method of tuning performance by adding or removing particles to or from the network.
Type: Application
Filed: July 26, 2007
Publication date: January 29, 2009
Applicant: AT&T Labs, Inc.
Inventor: Jason Williams
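The sample-project-weight-normalize loop described in this abstract can be sketched in a few lines. This is a minimal illustration, not the patented system: the dialog states, transition model, and observation likelihood below are invented stand-ins.

```python
import random

# Rough sketch of particle-based dialog state tracking: sample particles,
# project each forward, weight by how well it explains an observation,
# then normalize to get a distribution over states.

def track_dialog_states(particles, observation, transition, likelihood):
    # Project each sampled particle one step into the future.
    projected = [transition(p) for p in particles]
    # Weight each particle by the observation likelihood.
    weights = [likelihood(p, observation) for p in projected]
    total = sum(weights)
    # Normalize the weights to yield an estimated distribution over states.
    dist = {}
    for p, w in zip(projected, weights):
        dist[p] = dist.get(p, 0.0) + w / total
    return dist

random.seed(0)
states = ["want_flight", "want_hotel"]
particles = [random.choice(states) for _ in range(100)]
dist = track_dialog_states(
    particles,
    observation="flight",
    transition=lambda s: s,                      # toy model: states persist
    likelihood=lambda s, o: 0.9 if o in s else 0.1,
)
print(dist)  # probability mass shifts toward the state matching the observation
```

Adding or removing particles, as the publication's tuning method describes, trades estimation accuracy against per-turn computation.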
-
Publication number: 20090030684
Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a capture facility resident on the mobile communication facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communication facility, loading the results into the software application, and simultaneously displaying the results as a set of words and as a set of application results based on those words.
Type: Application
Filed: August 1, 2008
Publication date: January 29, 2009
Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
-
Publication number: 20090024390
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Application
Filed: May 2, 2008
Publication date: January 22, 2009
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Neeraj Deshmukh, Puming Zhan
-
Patent number: 7480615
Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two, but fewer than all, of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames, and a separate posterior probability parameter is determined for each frame in the shifted window. This method closely approximates a more rigorous solution but reduces the computational cost by two to three orders of magnitude. Further, a method of determining the optimal discrete state sequence in the switching state space model is provided that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
Type: Grant
Filed: January 20, 2004
Date of Patent: January 20, 2009
Assignee: Microsoft Corporation
Inventors: Hagai Attias, Li Deng, Leo Lee
-
Patent number: 7478045
Abstract: In a method for characterizing a signal representing audio content, a measure of the tonality of the signal is determined, and a statement about the audio content is then made on the basis of that measure. The measure of tonality is derived from a quotient whose numerator is the mean of the summed values of the spectral components of the signal exponentiated with a first power and whose denominator is the mean of the summed values of the spectral components exponentiated with a second power, the first and second powers differing from each other. The measure of tonality is robust against signal distortion, due for example to MP3 coding, and correlates highly with the content of the analyzed signal.
Type: Grant
Filed: July 15, 2002
Date of Patent: January 13, 2009
Assignee: M2ANY GmbH
Inventors: Eric Allamanche, Jürgen Herre, Oliver Hellmuth, Thorsten Kastner
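The quotient this abstract describes can be computed directly. The specific powers below (0.5 and 2) are an assumption for illustration; the patent only requires that the two powers differ.

```python
# Sketch of a tonality measure as the quotient of means of spectral
# magnitudes raised to two different powers.

def tonality_measure(spectrum, p1=0.5, p2=2.0):
    """Quotient: mean(|X|^p1) / mean(|X|^p2) over the spectral components."""
    n = len(spectrum)
    num = sum(abs(x) ** p1 for x in spectrum) / n
    den = sum(abs(x) ** p2 for x in spectrum) / n
    return num / den

# A flat (noise-like) spectrum and a peaky (tonal) spectrum yield clearly
# different values, which is what makes the measure usable for content
# analysis.
flat = [1.0] * 8
peaky = [8.0] + [0.001] * 7
print(tonality_measure(flat), tonality_measure(peaky))
```

Ratios of differently exponentiated spectral means are in the same family as the spectral flatness measure, which likewise separates tone-like from noise-like spectra.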
-
Patent number: 7475012
Abstract: Robust signal detection against various types of background noise is implemented. A signal detection apparatus extracts the feature amount of an input signal sequence and the feature amount of a noise component contained in the signal sequence. A first likelihood, indicating the probability that the signal sequence is detected, and a second likelihood, indicating the probability that the noise component is detected, are then calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. A likelihood ratio between the first and second likelihoods is calculated, and detection of the signal sequence is determined on the basis of that ratio.
Type: Grant
Filed: December 9, 2004
Date of Patent: January 6, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Garner, Toshiaki Fukada, Yasuhiro Komori
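A likelihood-ratio detector of this general shape can be sketched with simple Gaussian models. The zero-mean Gaussian likelihoods and the way the SNR sets the signal variance are assumptions for illustration; the patent does not fix a particular likelihood form.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def detect_signal(frame_feature, noise_var, snr_db, threshold=0.0):
    """Return True when the log likelihood ratio favours 'signal present'."""
    # Signal-plus-noise variance implied by the assumed signal-to-noise ratio.
    signal_var = noise_var * (1.0 + 10 ** (snr_db / 10.0))
    ll_signal = gaussian_loglik(frame_feature, 0.0, signal_var)
    ll_noise = gaussian_loglik(frame_feature, 0.0, noise_var)
    # Detection is determined by comparing the ratio to a threshold.
    return (ll_signal - ll_noise) > threshold

# A high-amplitude frame is attributed to signal, a low one to noise.
print(detect_signal(5.0, noise_var=1.0, snr_db=10))   # -> True
print(detect_signal(0.1, noise_var=1.0, snr_db=10))   # -> False
```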
-
Patent number: 7472063
Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
Type: Grant
Filed: December 19, 2002
Date of Patent: December 30, 2008
Assignee: Intel Corporation
Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
-
Publication number: 20080306738
Abstract: Voice processing methods and systems are provided. An utterance is received and compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. The respective voice units are then scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated from training utterances, corresponding to at least one training sentence in a phonetic-balanced sentence set, produced by a plurality of learners and at least one real teacher, together with the scores that the real teacher provided for the respective voice units of the learners' training utterances in the first scoring item.
Type: Application
Filed: June 6, 2008
Publication date: December 11, 2008
Applicant: NATIONAL TAIWAN UNIVERSITY
Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
-
Patent number: 7464031
Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
Type: Grant
Filed: November 28, 2003
Date of Patent: December 9, 2008
Assignee: International Business Machines Corporation
Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuging Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
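The core computation of a log-linear posterior, including its tolerance of missing features, can be shown with a toy example. The classes, feature names, and weights here are hypothetical.

```python
import math

# Minimal log-linear posterior: P(class | features) proportional to
# exp(w . f). Features absent from the input simply contribute nothing,
# so not every training feature needs to appear at recognition time.

def loglinear_posterior(features, weights_per_class):
    """Posterior over classes given a sparse feature dict."""
    scores = {}
    for cls, weights in weights_per_class.items():
        # Sparse dot product over only the features actually observed.
        scores[cls] = sum(weights.get(f, 0.0) * v for f, v in features.items())
    z = sum(math.exp(s) for s in scores.values())  # partition function
    return {cls: math.exp(s) / z for cls, s in scores.items()}

weights = {
    "yes": {"energy": 1.0, "pitch": 0.5},
    "no":  {"energy": -1.0, "pitch": 0.2},
}
# Only "energy" is observed; "pitch" is missing, which is handled naturally.
post = loglinear_posterior({"energy": 2.0}, weights)
print(post)
```

Because the score is a sum over whatever features are present, overlapping and statistically dependent features pose no structural problem, although they do affect how the weights should be trained.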
-
Publication number: 20080300875
Abstract: A speech recognition method and system, the method comprising the steps of providing a speech model that includes at least a portion of a state of Gaussians, clustering the Gaussians of the speech model to give N clusters of Gaussians, wherein N is an integer, and utilizing the Gaussians in recognizing an utterance.
Type: Application
Filed: June 4, 2008
Publication date: December 4, 2008
Inventors: Kaisheng Yao, Yu Tsao
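Grouping a model's Gaussians into N clusters can be illustrated with a plain k-means over the Gaussian means. This is only a sketch: a real system would more likely use a distribution-aware distance such as KL divergence rather than the squared distance between means.

```python
# Toy clustering of scalar Gaussian means into N clusters via k-means.

def cluster_gaussians(means, n_clusters, iters=10):
    centers = means[:n_clusters]  # naive initialization from the first means
    for _ in range(iters):
        # Assign each Gaussian mean to its nearest cluster center.
        assign = [min(range(n_clusters), key=lambda c: (m - centers[c]) ** 2)
                  for m in means]
        # Re-estimate each center as the average of its members.
        for c in range(n_clusters):
            members = [m for m, a in zip(means, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assign

# Two well-separated groups of Gaussian means end up in two clusters.
means = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
centers, assign = cluster_gaussians(means, 2)
print(centers, assign)
```

Clustering like this is typically used to cheapen likelihood evaluation: only the Gaussians in the clusters nearest an observation need full evaluation.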
-
Publication number: 20080270129
Abstract: A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
Type: Application
Filed: February 17, 2005
Publication date: October 30, 2008
Applicant: Loquendo S.p.A.
Inventors: Daniele Colibro, Claudio Vair, Luciano Fissore, Cosmin Popovici
-
Publication number: 20080270130
Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
Type: Application
Filed: July 1, 2008
Publication date: October 30, 2008
Applicant: AT&T Corp.
Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
-
Publication number: 20080249773
Abstract: A method and system for automatically generating a scoring model for scoring a speech sample are disclosed. One or more training speech samples are received in response to a prompt. One or more speech features are determined for each of the training speech samples. A scoring model is then generated based on the speech features. At least one of the training speech samples may be a high entropy speech sample. An evaluation speech sample is received and a score is assigned to the evaluation speech sample using the scoring model. The evaluation speech sample may be a high entropy speech sample.
Type: Application
Filed: June 16, 2008
Publication date: October 9, 2008
Inventors: Isaac Bejar, Klaus Zechner
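The generate-then-apply flow of a scoring model can be sketched with the simplest possible model: a one-feature linear fit by ordinary least squares. The feature values and target scores below are made up, and the abstract does not specify what form the real scoring model takes.

```python
# Fit score = slope * feature + intercept from training samples,
# then apply the fitted model to a new evaluation sample.

def fit_scoring_model(features, scores):
    """Closed-form ordinary least squares for one feature."""
    n = len(features)
    mx = sum(features) / n
    my = sum(scores) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(features, scores))
             / sum((x - mx) ** 2 for x in features))
    intercept = my - slope * mx
    return slope, intercept

def score_sample(model, feature):
    slope, intercept = model
    return slope * feature + intercept

# Training samples: (hypothetical speech-rate feature, assigned score).
model = fit_scoring_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(score_sample(model, 2.5))  # -> 5.0
```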
-
Publication number: 20080235015
Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
Type: Application
Filed: June 2, 2008
Publication date: September 25, 2008
Inventors: Stephen Mingyu Chu, Vaibhava Goel, Etienne Marcheret, Gerasimos Potamianos