Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 7636426
    Abstract: A telecommunications device includes a voice dialer and a text-to-speech engine. The text-to-speech engine is configured to convert at least a portion of a user's contact list information to speech, and the voice dialer is configured to receive an audio input and perform voice recognition, comparing said audio input to the converted user contact list information.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: December 22, 2009
    Assignee: Siemens Communications, Inc.
    Inventors: Sarah Korah, John Vuong
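The matching step this abstract describes can be sketched as a nearest-neighbor search over TTS-derived reference features. The feature representation and distance function below are invented for illustration; the patent does not specify them.

```python
# Toy sketch (not the patented method): each contact name is rendered by TTS
# into a reference feature sequence, and the spoken input's features are
# compared against every reference; the closest contact wins.

def best_contact(input_features, contact_features):
    """Return the contact whose reference features are nearest the input."""
    def distance(a, b):
        # squared-difference over the overlap, plus a length-mismatch penalty
        return sum((x - y) ** 2 for x, y in zip(a, b)) + abs(len(a) - len(b))
    return min(contact_features,
               key=lambda name: distance(input_features, contact_features[name]))

contacts = {"alice": [1.0, 2.0, 3.0], "bob": [9.0, 9.0]}
match = best_contact([1.1, 2.1, 2.9], contacts)
# match == "alice"
```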
  • Publication number: 20090313015
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 17, 2009
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Patent number: 7630894
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: August 1, 2006
    Date of Patent: December 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
  • Patent number: 7627468
    Abstract: An apparatus enabling automatic determination of a portion that reliably represents a feature of a speech waveform includes: an acoustic/prosodic analysis unit calculating, from data, distribution of an energy of a prescribed frequency range of the speech waveform on a time axis, and for extracting, among various syllables of the speech waveform, a range that is generated stably, based on the distribution and the pitch of the speech waveform; cepstral analysis unit estimating, based on the spectral distribution of the speech waveform on the time axis, a range of the speech waveform of which change is well controlled by a speaker; and a pseudo-syllabic center extracting unit extracting, as a portion of high reliability of the speech waveform, that range which has been estimated to be the stably generated range and of which change is estimated to be well controlled by the speaker.
    Type: Grant
    Filed: February 21, 2003
    Date of Patent: December 1, 2009
    Assignees: Japan Science and Technology Agency, Advanced Telecommunication Research Institute International
    Inventors: Nick Campbell, Parham Mokhtari
  • Patent number: 7620263
    Abstract: An image processing system provides image enhancement and anti-clipping units. The anti-clipping unit for image sharpness enhancement, operates such that any shoot artifacts in the enhanced image that go beyond pixel value lower/upper bounds are properly adjusted back within the lower and upper bounds, without causing prominent edge jaggedness artifacts in the final resulting output image.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: November 17, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Surapong Lertrattanapanich, Yeong-Taeg Kim, Zhi Zhou
  • Patent number: 7617102
    Abstract: A speaker identifying apparatus includes: a module for performing a principal component analysis on predetermined vocal tract geometrical parameters of a plurality of speakers and calculating an average and principal component vectors representing speaker-dependent variation; a module for performing acoustic analysis on the speech data being uttered for each of the speakers to calculate cepstrum coefficients; a module for calculating principal component coefficients for approximating the vocal tract geometrical parameter of each of the plurality of speakers by a linear sum of principal component coefficients; a module for determining, by multiple regression analysis, a coefficient sequence for estimating principal component coefficients by a linear sum of the plurality of prescribed features, for each of the plurality of speakers; a module for calculating a plurality of features from speech data of the speaker to be identified, and estimating principal component coefficients for calculating the vocal tract geometrical parameters.
    Type: Grant
    Filed: September 27, 2006
    Date of Patent: November 10, 2009
    Assignee: Advanced Telecommunications Research Institute International
    Inventors: Parham Mokhtari, Tatsuya Kitamura, Hironori Takemoto, Seiji Adachi, Kiyoshi Honda
  • Publication number: 20090276216
    Abstract: A method for speech recognition, the method includes: extracting time-frequency speech features from a series of reference speech elements in a first series of sampling windows; aligning reference speech elements that are not of equal time span duration; constructing a common subspace for the aligned speech features; determining a first set of coefficient vectors; extracting a time-frequency feature image from a test speech stream spanned by a second sampling window; approximating the extracted image in the common subspace for the aligned extracted time-frequency speech features with a second coefficient vector; computing a similarity measure between the first and the second coefficient vector; determining if the similarity measure is below a predefined threshold; and wherein a match between the reference speech elements and a portion of the test speech stream is made in response to a similarity measure below a predefined threshold.
    Type: Application
    Filed: May 2, 2008
    Publication date: November 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lisa Amini, Pascal Frossard, Effrosyni Kokiopoulou, Oliver Verscheure
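The subspace-matching pipeline above can be illustrated with a minimal stand-in: here the common subspace is taken from the top singular vectors of the stacked reference features, and matching compares coefficient vectors in that subspace. The SVD construction, dimensions, and threshold are assumptions, not the patent's alignment procedure.

```python
import numpy as np

# Illustrative sketch: build a common subspace from reference features,
# project reference and test vectors into it, and declare a match when the
# coefficient vectors are close enough.

def build_subspace(ref_features, k):
    """ref_features: (n_refs, d) matrix; returns a (d, k) orthonormal basis."""
    _, _, vt = np.linalg.svd(ref_features, full_matrices=False)
    return vt[:k].T

def coeffs(basis, x):
    return basis.T @ x          # coefficient vector in the common subspace

def is_match(basis, ref_vec, test_vec, threshold):
    dist = np.linalg.norm(coeffs(basis, ref_vec) - coeffs(basis, test_vec))
    return dist < threshold     # similarity measure below predefined threshold

rng = np.random.default_rng(0)
refs = rng.normal(size=(10, 16))
basis = build_subspace(refs, k=4)
# identical vectors always match
assert is_match(basis, refs[0], refs[0], threshold=1e-6)
```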
  • Publication number: 20090265170
    Abstract: An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
    Type: Application
    Filed: September 13, 2007
    Publication date: October 22, 2009
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Go Irie, Kouta Hidaka, Takashi Satou, Yukinobu Taniguchi, Shinya Nakajima
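The per-frame emotional-state probability described above can be sketched with toy one-dimensional Gaussian state models. The states, means, and variances below are invented for illustration; the patent's statistical models are learned from training audio.

```python
import math

# Toy sketch: each emotional state has a statistical model over the audio
# feature; the per-frame state probability follows from the normalized model
# likelihoods, and the section label is the most probable state.

STATES = {
    "neutral": (0.0, 1.0),     # (mean, std) -- illustrative only
    "excited": (3.0, 1.5),
}

def gauss_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def state_probs(feature):
    """Normalized probability of each emotional state for one audio feature."""
    likes = {s: gauss_pdf(feature, m, sd) for s, (m, sd) in STATES.items()}
    total = sum(likes.values())
    return {s: l / total for s, l in likes.items()}

probs = state_probs(2.8)
# a feature near 3.0 is judged "excited"
```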
  • Patent number: 7603274
    Abstract: A method and apparatus for determining the possibility of pattern recognition of time series signal independent of a pattern recognition ratio is provided. The method for determining the possibility of pattern recognition of time series signal includes extracting a time forward feature and a time reversed feature from an input signal having a time series pattern, generating time forward alignment and time reversed alignment by using the time forward feature and the time reversed feature, comparing the time forward alignment with the time reversed alignment to compute a likelihood of pattern recognition, and determining that the input signal can be recognized if the likelihood is larger than a predetermined threshold value.
    Type: Grant
    Filed: November 2, 2005
    Date of Patent: October 13, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Kwangil Hwang
  • Patent number: 7603278
    Abstract: A segment set before updating is read, and clustering considering a phoneme environment is performed to it. For each cluster obtained by the clustering, a representative segment of a segment set belonging to the cluster is generated. For each cluster, a segment belonging to the cluster is replaced with the representative segment so as to update the segment set.
    Type: Grant
    Filed: September 14, 2005
    Date of Patent: October 13, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
  • Patent number: 7593842
    Abstract: A device and method for translating language is disclosed. In one embodiment, for example, a method for providing a translated output signal derived from a speech input signal, comprises receiving a speech input signal in a first language, converting the speech input signal into a digital format, comprising a voice model component representing a speech pattern of the speech input signal and a content component representing a content of the speech input signal, translating the content component from the first language into a second language to provide a translated content component; and generating an audible output signal comprising the translated content in an approximation of the speech pattern of the speech input signal.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: September 22, 2009
    Inventor: Leslie Rousseau
  • Patent number: 7590605
    Abstract: A system is described for matching lattices such as phoneme lattices generated by an automatic speech recognition unit. The system can be used to retrieve files from a database by comparing a query lattice with annotation lattices associated with the data files that can be retrieved, and by retrieving the data files having an annotation lattice most similar to the query lattice.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: September 15, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventor: Ljubomir Josifovski
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model in which the model variation between a test speaker ML model and a speaker group ML model to which the test speaker belongs which is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variation in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm of MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
  • Patent number: 7587318
    Abstract: A speech recognition device can include an audio signal receiver configured to receive audio signals from a speech source, a video signal receiver configured to receive video signals from the speech source, and a processing unit configured to process the audio signals and the video signals. In addition, the speech recognition device can include a conversion unit configured to convert the audio signals and the video signals to recognizable speech, and an implementation unit configured to implement a task based on the recognizable speech.
    Type: Grant
    Filed: September 12, 2003
    Date of Patent: September 8, 2009
    Assignee: Broadcom Corporation
    Inventor: Nambi Seshadri
  • Publication number: 20090210226
    Abstract: A method, system and communication device for enabling voice-to-voice searching and ordered content retrieval via audio tags assigned to individual content, which tags generate uniterms that are matched against components of a voice query. The method includes storing content and tagging at least one of the content with an audio tag. The method further includes receiving a voice query to retrieve content stored on the device. When the voice query is received, the method completes a voice-to-voice search utilizing uniterms of the audio tag, scored against the phoneme latent lattice model generated by the voice query to identify matching terms within the audio tags and corresponding stored content. The retrieved content(s) associated with the identified audio tags having uniterms that score within the phoneme lattice model are outputted in an order corresponding to an order in which the uniterms are structured within the voice query.
    Type: Application
    Filed: February 15, 2008
    Publication date: August 20, 2009
    Inventor: Changxue Ma
  • Patent number: 7574357
    Abstract: Method and system for generating electromyographic or sub-audible signals (“SAWPs”) and for transmitting and recognizing the SAWPs that represent the original words and/or phrases. The SAWPs may be generated in an environment that interferes excessively with normal speech or that requires stealth communications, and may be transmitted using encoded, enciphered or otherwise transformed signals that are less subject to signal distortion or degradation in the ambient environment.
    Type: Grant
    Filed: June 24, 2005
    Date of Patent: August 11, 2009
    Assignee: The United States of America as represented by the Administrator of the National Aeronautics and Space Administration (NASA)
    Inventors: C. Charles Jorgensen, Bradley J. Betts
  • Patent number: 7571098
    Abstract: Word lattices that are generated by an automatic speech recognition system are used to generate a modified word lattice that is usable by a spoken language understanding module. In one embodiment, the spoken language understanding module determines a set of salient phrases by calculating an intersection of the modified word lattice, which is optionally preprocessed, and a finite state machine that includes a plurality of salient grammar fragments.
    Type: Grant
    Filed: May 29, 2003
    Date of Patent: August 4, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi, Gokhan Tur, Jeremy Huntley Wright
  • Publication number: 20090192788
    Abstract: In a sound processing device, a modulation spectrum specifier specifies a modulation spectrum of an input sound for each of a plurality of unit intervals. An index calculator calculates an index value corresponding to a magnitude of components of modulation frequencies belonging to a predetermined range of the modulation spectrum. A determinator determines whether the input sound of each of the unit intervals is a vocal sound or a non-vocal sound based on the index value.
    Type: Application
    Filed: January 23, 2009
    Publication date: July 30, 2009
    Applicant: Yamaha Corporation
    Inventor: Yasuo YOSHIOKA
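The vocal/non-vocal decision above can be sketched by exploiting the fact that speech carries strong modulation energy around the syllable rate (roughly 2-8 Hz). The band edges, envelope estimate, and threshold below are illustrative assumptions, not values from the publication.

```python
import numpy as np

# Sketch: compute a crude amplitude envelope, take its spectrum (a simple
# modulation spectrum), and sum the energy in a syllable-rate band; a unit
# interval is labeled "vocal" when that index exceeds a threshold.

def modulation_index(signal, fs, band=(2.0, 8.0)):
    envelope = np.abs(signal)                     # crude amplitude envelope
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spectrum[mask].sum()

def is_vocal(signal, fs, threshold):
    return modulation_index(signal, fs) > threshold

fs = 100.0                                        # sample rate (Hz), toy value
t = np.arange(0, 2.0, 1.0 / fs)
# 30 Hz carrier with 4 Hz amplitude modulation vs. an unmodulated tone
voiced_like = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 30.0 * t)
steady_tone = np.sin(2 * np.pi * 30.0 * t)
assert modulation_index(voiced_like, fs) > modulation_index(steady_tone, fs)
```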
  • Patent number: 7567903
    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: July 28, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
  • Patent number: 7565213
    Abstract: A significant short-time spectrum is extracted from an information signal, the means for extracting being configured to extract such short-time spectra which come closer to a specific characteristic than others. The short-time spectra extracted are then decomposed into component signals using ICA analysis, a component signal spectrum representing a profile spectrum of a tone source which generates a tone corresponding to the characteristic sought. From a sequence of short-time spectra of the information signal and from the profile spectra determined, an amplitude envelope is calculated for each profile spectrum to indicate how a tone source profile spectrum changes over time. The profile spectra and all the amplitude envelopes associated therewith provide a description of the information signal which may be evaluated further, for example for transcription purposes in the case of a music signal.
    Type: Grant
    Filed: May 5, 2005
    Date of Patent: July 21, 2009
    Assignee: Gracenote, Inc.
    Inventors: Christian Dittmar, Christian Uhle, Jürgen Herre
  • Patent number: 7562014
    Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: July 14, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z Hakkani-Tur, Mazin G Rahim, Giuseppe Riccardi, Gokhan Tur
  • Publication number: 20090157400
    Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.
    Type: Application
    Filed: October 1, 2008
    Publication date: June 18, 2009
    Applicant: Industrial Technology Research Institute
    Inventor: Shih-Ming Huang
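The guarded cepstral subtraction above can be sketched as follows. The two scalar coefficients and the determining condition here (a clip on the subtracted amount plus a magnitude floor) are invented stand-ins; the abstract does not give the actual formulas.

```python
import numpy as np

# Illustrative sketch: subtract a scaled noise estimate from each cepstral
# feature vector, with a guard condition that limits how far any component
# may move, avoiding excessive enhancement or subtraction.

def cepstral_noise_subtract(c, noise, alpha=1.0, beta=0.5, max_shift=2.0):
    """c, noise: 1-D cepstral vectors; returns the compensated vector."""
    shift = alpha * noise
    # determining condition (assumed): cap each subtracted component
    shift = np.clip(shift, -max_shift, max_shift)
    out = c - shift
    # second coefficient (assumed): keep at least beta of the original value
    return np.where(np.abs(out) < beta * np.abs(c), beta * c, out)

c = np.array([4.0, -3.0, 0.5])
noise = np.array([1.0, 5.0, 0.1])
result = cepstral_noise_subtract(c, noise)
# result == [3.0, -5.0, 0.4]; the 5.0 noise term was capped at max_shift=2.0
```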
  • Patent number: 7546236
    Abstract: This invention identifies anomalies in a data stream, without prior training, by measuring the difficulty in finding similarities between neighborhoods in the ordered sequence of elements. Data elements in an area that is similar to much of the rest of the scene score low mismatches. On the other hand a region that possesses many dissimilarities with other parts of the ordered sequence will attract a high score of mismatches. The invention makes use of a trial and error process to find dissimilarities between parts of the data stream and does not require prior knowledge of the nature of the anomalies that may be present. The method avoids the use of processing dependencies between data elements and is capable of a straightforward parallel implementation for each data element. The invention is of application in searching for anomalous patterns in data streams, which include audio signals, health screening and geographical data. A method of error correction is also described.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: June 9, 2009
    Assignee: British Telecommunications public limited company
    Inventor: Frederick W M Stentiford
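The trial-and-error mismatch scoring above can be sketched on a 1-D data stream. The neighborhood size, trial count, and tolerance below are illustrative parameters; the patent does not fix them.

```python
import random

# Rough sketch: for each position, randomly pick other positions and compare
# small neighborhoods; positions whose neighborhoods rarely match anywhere
# else accumulate a high mismatch score and are flagged as anomalous.
# No prior training or model of the anomaly is needed.

def mismatch_scores(seq, radius=1, trials=50, tol=1, rng=None):
    rng = rng or random.Random(0)
    n = len(seq)
    scores = [0] * n
    for i in range(radius, n - radius):
        window = seq[i - radius:i + radius + 1]
        for _ in range(trials):
            j = rng.randrange(radius, n - radius)
            other = seq[j - radius:j + radius + 1]
            if any(abs(a - b) > tol for a, b in zip(window, other)):
                scores[i] += 1          # neighborhood failed to match
    return scores

data = [1, 1, 2, 1, 1, 9, 1, 2, 1, 1]   # the 9 is the anomaly
scores = mismatch_scores(data)
# the highest mismatch score falls on a neighborhood containing the 9
```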
  • Publication number: 20090125306
    Abstract: The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
    Type: Application
    Filed: September 19, 2008
    Publication date: May 14, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Remi Lejeune, Hubert Crepy
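The segment-wise combination above can be sketched directly: one hypothesis per segment is concatenated into a global candidate, scored (here by the product of segment confidences, an assumed scheme), pruned against a threshold, and the top results kept.

```python
from itertools import product

# Minimal sketch of global n-best generation from per-segment n-best lists.
# The product-of-confidences score and the threshold pruning criterion are
# illustrative assumptions, not the publication's exact method.

def global_nbest(segment_nbests, n, min_score=0.0):
    candidates = []
    for combo in product(*segment_nbests):
        text = " ".join(hyp for hyp, _ in combo)
        score = 1.0
        for _, conf in combo:
            score *= conf
        if score >= min_score:                  # pruning criterion
            candidates.append((text, score))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:n]

segments = [
    [("flight", 0.9), ("light", 0.4)],          # n-best for segment 1
    [("seven", 0.8), ("eleven", 0.5)],          # n-best for segment 2
]
best = global_nbest(segments, n=2, min_score=0.3)
# best[0][0] == "flight seven" (score 0.72)
```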
  • Patent number: 7533015
    Abstract: Provides speech enhancement techniques for extemporaneous noise without a noise interval and unknown extemporaneous noise. Signal enhancement includes: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In signal enhancement, a database of a signal model concerning the target signal expressing a given feature by a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.
    Type: Grant
    Filed: February 28, 2005
    Date of Patent: May 12, 2009
    Assignee: International Business Machines Corporation
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7529666
    Abstract: In connection with speech recognition, the design of a linear transformation θ ∈ ℝ^(p×n), of rank p ≤ n, which projects the features of a classifier x ∈ ℝ^n onto y = θx ∈ ℝ^p such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
    Type: Grant
    Filed: October 30, 2000
    Date of Patent: May 5, 2009
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George A. Saon
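The projection y = θx above is easy to illustrate mechanically. The sketch below substitutes a PCA-derived θ purely to show the shape of the operation; it does not implement the patent's Bayes-error or Bhattacharyya criteria.

```python
import numpy as np

# Sketch of projecting n-dimensional features onto p dimensions with a
# rank-p linear map theta, as in y = theta @ x. Here theta comes from PCA of
# the training features, standing in for the discriminative transform the
# abstract actually describes.

def fit_projection(X, p):
    """X: (samples, n) feature matrix; returns theta of shape (p, n)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:p]                     # rows are the top-p directions

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))          # n = 8 original feature dimensions
theta = fit_projection(X, p=3)        # project down to p = 3
y = theta @ X[0]                      # projected feature, shape (3,)
```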
  • Patent number: 7529665
    Abstract: A two stage utterance verification device and a method thereof are provided. The two stage utterance verification method includes performing a first utterance verification function based on a SVM pattern classification method by using feature data inputted from a search block of a speech recognizer and performing a second utterance verification function based on a CART pattern classification method by using heterogeneity feature data including meta data extracted from a preprocessing module, intermediate results from function blocks of the speech recognizer and the result of the first utterance verification function. Therefore, the two stage utterance verification device and the method thereof provide a high quality speech recognition service to a user.
    Type: Grant
    Filed: April 1, 2005
    Date of Patent: May 5, 2009
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Sanghun Kim, YoungJik Lee
  • Patent number: 7529668
    Abstract: A system and method for implementing a refined dictionary for speech recognition includes a database analyzer that initially identifies first vocabulary words that are present in a training database and second vocabulary words that are not present in the training database. A relevance module then performs refinement procedures upon the first vocabulary words to produce refined short word pronunciations and refined long word pronunciations that are added to a refined dictionary. A consensus module compares the second pronunciations with calculated plurality pronunciations to identify final consensus pronunciations that are then included in the refined dictionary.
    Type: Grant
    Filed: August 3, 2004
    Date of Patent: May 5, 2009
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Gustavo Abrego, Lex S. Olorenshaw
  • Publication number: 20090112585
    Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
    Type: Application
    Filed: December 29, 2008
    Publication date: April 30, 2009
    Applicant: AT&T Corp.
    Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
  • Patent number: 7509256
    Abstract: It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
    Type: Grant
    Filed: March 29, 2005
    Date of Patent: March 24, 2009
    Assignee: Sony Corporation
    Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
  • Patent number: 7505897
    Abstract: The subject matter includes systems, engines, and methods for generalizing a class of Lempel-Ziv algorithms for lossy compression of multimedia. One implementation of the subject matter compresses audio signals. Because music, especially electronically generated music, has a substantial level of repetitiveness within a single audio clip, the basic Lempel-Ziv compression technique can be generalized to support representing a single window of an audio signal using a linear combination of filtered past windows. Exemplary similarity searches and filtering strategies for finding the past windows are described.
    Type: Grant
    Filed: January 27, 2005
    Date of Patent: March 17, 2009
    Assignee: Microsoft Corporation
    Inventors: Darko Kirovski, Zeph Landau
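The core generalization above, representing the current window as a linear combination of past windows, can be sketched with a least-squares fit; only the combination weights (plus any residual) would then need to be coded. The filtering and similarity-search strategies of the patent are omitted.

```python
import numpy as np

# Sketch: approximate the current signal window as a linear combination of
# previously seen windows via least squares, generalizing Lempel-Ziv's
# "copy a past match" to "mix several past matches".

def encode_window(current, past_windows):
    """past_windows: (k, w) matrix of earlier windows; returns (weights, residual)."""
    A = past_windows.T                          # (w, k) design matrix
    weights, *_ = np.linalg.lstsq(A, current, rcond=None)
    residual = current - A @ weights
    return weights, residual

w = 32
t = np.arange(w)
past = np.stack([np.sin(0.2 * t), np.cos(0.2 * t)])
current = 2.0 * np.sin(0.2 * t) - 0.5 * np.cos(0.2 * t)   # exact combination
weights, residual = encode_window(current, past)
# weights ~= [2.0, -0.5], residual ~= 0: the window compresses to two numbers
```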
  • Publication number: 20090070110
    Abstract: A MMR system for newspaper publishing comprises a plurality of mobile devices, an MMR gateway, an MMR matching unit and an MMR publisher. The MMR matching unit receives an image query from the MMR gateway and sends it to one or more of the recognition units to identify a result including a document, the page and the location on the page. The MMR matching unit also includes a result combiner coupled to each of the recognition units to receive recognition results. The result combiner produces a list of most likely results and associated confidence scores. This list of results is sent by the result combiner back to the MMR gateway for presentation on the mobile device. The result combiner uses the quality predictor as an input in deciding which results are best. The present invention also includes a number of novel methods including a method for generating the list of best results.
    Type: Application
    Filed: September 15, 2008
    Publication date: March 12, 2009
    Inventors: Berna Erol, Jonathan J. Hull, Jorge Moraleda
  • Publication number: 20090063144
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a communications device in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Application
    Filed: November 4, 2008
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: Richard C. Rose, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Publication number: 20090048835
    Abstract: A feature extracting apparatus includes a spectrum calculator that calculates a logarithmic frequency spectrum including frequency components obtained from an input speech signal at regular intervals on a logarithmic frequency scale of a frame; a function calculator that calculates a cross-correlation function between a logarithmic frequency spectrum of a time and a logarithmic frequency spectrum of one or plural times included in a certain temporal width before and after the time, from a sequence of the logarithmic frequency spectra calculated at each time; and a feature extractor that extracts a set of the cross-correlation functions as a local and relative fundamental-frequency pattern feature at the frame.
    Type: Application
    Filed: March 4, 2008
    Publication date: February 19, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Takashi Masuko
  • Patent number: 7493257
    Abstract: To handle portions of a recognized sentence having an error, a user is questioned about contents associated with portions. According to a user's answer, a result is obtained. Speech recognition unit extracts a speech feature of a speech signal inputted from user and finds a phoneme nearest to the speech feature to recognize a word. Recognition error determination unit finds a sentence confidence based on a confidence of the recognized word, performs examination of a semantic structure of a recognized sentence, and determines whether or not an error exists in the recognized sentence which is subjected to speech recognition according to predetermined criterion based on both sentence confidence and result of examining semantic structure. Meta-dialogue generation unit generates a question asking user for additional information based on content of a portion where the error exists and a type of the error.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: February 17, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Kim, Jae-won Lee
  • Publication number: 20090030685
    Abstract: Speech recorded by an audio capture facility of a navigation facility is processed by a speech recognition facility to generate results that are provided to the navigation facility. When information related to a navigation application running on the navigation facility is provided to the speech recognition facility, the results generated are based at least in part on the application-related information. The speech recognition facility uses an unstructured language model for generating results. The user of the navigation facility may optionally be allowed to edit the results being provided to the navigation facility. The speech recognition facility may also adapt speech recognition based on usage of the results.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Yongdeng Chen
  • Publication number: 20090030683
    Abstract: Disclosed are methods, systems, and computer-readable media for tracking dialog states in a spoken dialog system. The method comprises casting a plurality of dialog states, or particles, as a network describing the probability relationships between each of a plurality of variables, sampling a subset of the plurality of dialog states, or particles, in the network, for each sampled dialog state, or particle, projecting into the future, assigning a weight to each sampled particle, and normalizing the assigned weights to yield a new estimated distribution over each variable's values, wherein the distribution over the variables is used in a spoken dialog system. Also disclosed is a method of tuning performance of the methods, systems, and computer-readable media by adding or removing particles to/from the network.
    Type: Application
    Filed: July 26, 2007
    Publication date: January 29, 2009
    Applicant: AT&T Labs, Inc.
    Inventor: Jason Williams
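    The loop the abstract above describes — sample a subset of dialog states (particles), weight each one, and normalize the weights into a new estimated distribution — can be sketched as a tiny particle update. The toy likelihood model and state names below are invented for illustration; a real dialog system would weight particles by how well each projected state explains the user's observed utterance.

```python
import random

def track(particles, likelihood, n_samples, rng):
    """One particle-filter update: sample, weight, normalize into a distribution."""
    sampled = [rng.choice(particles) for _ in range(n_samples)]
    weights = [likelihood(p) for p in sampled]
    total = sum(weights)
    dist = {}
    for p, w in zip(sampled, weights):
        dist[p] = dist.get(p, 0.0) + w / total
    return dist

rng = random.Random(0)
# Dialog states are what the user might want; the observation favors "weather".
likelihood = lambda state: {"weather": 0.8, "traffic": 0.15, "news": 0.05}[state]
dist = track(["weather", "traffic", "news"], likelihood, 200, rng)
```

    Tuning performance by "adding or removing particles", as the abstract mentions, corresponds here to raising or lowering `n_samples`: more samples give a better-resolved distribution at higher cost.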
  • Publication number: 20090030684
    Abstract: A method and system for entering information into a software application resident on a mobile communication facility is provided. The method and system may include recording speech presented by a user using a mobile communication facility resident capture facility, transmitting the recording through a wireless communication facility to a speech recognition facility, transmitting information relating to the software application to the speech recognition facility, generating results utilizing the speech recognition facility using an unstructured language model based at least in part on the information relating to the software application and the recording, transmitting the results to the mobile communications facility, loading the results into the software application and simultaneously displaying the results as a set of words and as a set of application results based on those words.
    Type: Application
    Filed: August 1, 2008
    Publication date: January 29, 2009
    Inventors: Joseph P. Cerra, John N. Nguyen, Michael S. Phillips, Han Shu, Alexandra Beth Mischke
  • Publication number: 20090024390
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Application
    Filed: May 2, 2008
    Publication date: January 22, 2009
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Neeraj Deshmukh, Puming Zhan
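    The adaptation step in the abstract above — transforming the feature stream with speaker-dependent transforms tied to multi-class feature models before matching against speaker-independent models — can be sketched as a per-class affine transform. The classifier and the (scale, offset) pairs below are illustrative assumptions, not the patent's actual models.

```python
def transform_stream(stream, classify, transforms):
    """Apply a speaker-dependent affine transform chosen per feature class."""
    out = []
    for feat in stream:
        a, b = transforms[classify(feat)]   # per-class (scale, offset) - invented
        out.append([a * x + b for x in feat])
    return out

# Hypothetical two-class setup: "loud" frames get scaled down, "quiet" frames shifted.
classify = lambda feat: "loud" if sum(feat) > 1.0 else "quiet"
transforms = {"loud": (0.5, 0.0), "quiet": (1.0, 0.1)}
stream = [[2.0, 2.0], [0.2, 0.2]]
adapted = transform_stream(stream, classify, transforms)
```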
  • Patent number: 7480615
    Abstract: A method of efficiently setting posterior probability parameters for a switching state space model begins by defining a window containing at least two but fewer than all of the frames. A separate posterior probability parameter is determined for each frame in the window. The window is then shifted sequentially from left to right in time so that it includes one or more subsequent frames in the sequence of frames. A separate posterior probability parameter is then determined for each frame in the shifted window. This method closely approximates a more rigorous solution but saves computational cost by two to three orders of magnitude. Further, a method is provided for determining the optimal discrete state sequence in the switching state space model that directly exploits the observation vector on a frame-by-frame basis and operates from left to right in time.
    Type: Grant
    Filed: January 20, 2004
    Date of Patent: January 20, 2009
    Assignee: Microsoft Corporation
    Inventors: Hagai Attias, Li Deng, Leo Lee
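    The windowing scheme in the abstract above can be sketched in miniature: estimate parameters only for the frames inside a small window, then shift the window left-to-right until the sequence is covered. The per-window "estimate" below is a stand-in (a local average), since the patent's actual posterior update is specific to its switching state space model.

```python
def sliding_window_estimates(frames, width=3):
    """Assign each frame a parameter estimated from its (non-overlapping) window."""
    params = [None] * len(frames)
    for start in range(0, len(frames), width):
        window = frames[start:start + width]
        est = sum(window) / len(window)   # stand-in for the posterior update
        for i in range(start, start + len(window)):
            params[i] = est
    return params

params = sliding_window_estimates([1, 2, 3, 4, 5, 6, 7], width=3)
```

    The computational saving comes from each estimation involving only `width` frames rather than the full sequence.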
  • Patent number: 7478045
    Abstract: In a method for characterizing a signal representing an audio content a measure is determined for a tonality of the signal, whereupon a statement is made about the audio content of the signal on the basis of the measure for the tonality of the signal. The measure for the tonality is derived from a quotient whose numerator is the mean of the summed values of spectral components of the signal exponentiated with a first power and whose denominator is the mean of the summed values of spectral components exponentiated with a second power, the first and second powers differing from each other. The measure for the tonality of the signal for the content analysis is robust in relation to a signal distortion, due e.g. to MP3 coding, and has a high correlation with the content of the analyzed signal.
    Type: Grant
    Filed: July 15, 2002
    Date of Patent: January 13, 2009
    Assignee: M2ANY GmbH
    Inventors: Eric Allamanche, Jürgen Herre, Oliver Hellmuth, Thorsten Kastner
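    The tonality measure in the abstract above is concrete enough to sketch: a quotient whose numerator is the mean of the spectral components raised to a first power and whose denominator is the mean of the components raised to a second, different power. The powers 1 and 2 below are an illustrative choice (the patent only requires that they differ); with them, a flat noise-like spectrum and a peaky tonal spectrum of the same total energy score differently.

```python
def tonality(spectrum, p1=1.0, p2=2.0):
    """Quotient of two power means of the spectral components."""
    n = len(spectrum)
    num = sum(c ** p1 for c in spectrum) / n
    den = sum(c ** p2 for c in spectrum) / n
    return num / den

flat = [1.0] * 8                 # noise-like: energy spread evenly
peaky = [7.0] + [1.0 / 7] * 7    # tonal: energy concentrated in one component
```

    Ratios of power means of this kind are related to spectral flatness measures, which is consistent with the abstract's claim of robustness against distortions such as MP3 coding.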
  • Patent number: 7475012
    Abstract: Robust signal detection against various types of background noise is implemented. According to a signal detection apparatus, the feature amount of an input signal sequence and the feature amount of a noise component contained in the signal sequence are extracted. After that, the first likelihood, indicating the probability that the signal sequence is detected, and the second likelihood, indicating the probability that the noise component is detected, are calculated on the basis of a predetermined signal-to-noise ratio and the extracted feature amount of the signal sequence. Additionally, a likelihood ratio indicating the ratio between the first likelihood and the second likelihood is calculated. Detection of the signal sequence is determined on the basis of the likelihood ratio.
    Type: Grant
    Filed: December 9, 2004
    Date of Patent: January 6, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Philip Garner, Toshiaki Fukada, Yasuhiro Komori
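    The decision rule in the abstract above — compute a signal likelihood and a noise likelihood for an extracted feature and detect when their ratio exceeds a threshold — can be sketched with one-dimensional Gaussian models. Modeling the feature as Gaussian, and the means, standard deviation, and threshold below, are assumptions for illustration only.

```python
import math

def gaussian(x, mean, std):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def detect(feature, signal_mean, noise_mean, std=1.0, threshold=1.0):
    l_signal = gaussian(feature, signal_mean, std)   # first likelihood
    l_noise = gaussian(feature, noise_mean, std)     # second likelihood
    return (l_signal / l_noise) > threshold          # likelihood-ratio test

# A feature near the signal model's mean triggers detection; one near the
# noise model's mean does not.
near_signal = detect(4.5, signal_mean=5.0, noise_mean=0.0)
near_noise = detect(0.5, signal_mean=5.0, noise_mean=0.0)
```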
  • Patent number: 7472063
    Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: December 30, 2008
    Assignee: Intel Corporation
    Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao
  • Publication number: 20080306738
    Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
    Type: Application
    Filed: June 6, 2008
    Publication date: December 11, 2008
    Applicant: NATIONAL TAIWAN UNIVERSITY
    Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
  • Patent number: 7464031
    Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
    Type: Grant
    Filed: November 28, 2003
    Date of Patent: December 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
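    The posterior model the abstract above describes — the probability of a linguistic unit given observed speech features under a log-linear model — has the familiar form P(unit | features) ∝ exp(Σ weight × feature). A minimal sketch follows; the feature functions, weights, and candidate words are invented, and note that nothing requires the features to be independent, matching the abstract's point about overlapping, statistically non-independent features.

```python
import math

def log_linear_posterior(units, features, weights):
    """Posterior over units; weights[u] maps feature name -> weight for unit u."""
    scores = {u: math.exp(sum(weights[u].get(f, 0.0) * v
                              for f, v in features.items()))
              for u in units}
    z = sum(scores.values())          # normalizer
    return {u: s / z for u, s in scores.items()}

# Two candidate words scored from overlapping acoustic/visual features.
features = {"energy": 1.0, "lip_open": 0.5}
weights = {"yes": {"energy": 2.0, "lip_open": 1.0},
           "no": {"energy": 0.5}}
post = log_linear_posterior(["yes", "no"], features, weights)
```

    The `.get(f, 0.0)` default also illustrates the abstract's remark that not all features used in training need to appear at recognition time: an absent feature simply contributes nothing.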
  • Publication number: 20080300875
    Abstract: A speech recognition method and system, the method comprising the steps of providing a speech model that includes Gaussians for at least a portion of its states, clustering said Gaussians of said speech model to give N clusters of Gaussians, wherein N is an integer, and utilizing said Gaussians in recognizing an utterance.
    Type: Application
    Filed: June 4, 2008
    Publication date: December 4, 2008
    Inventors: Kaisheng Yao, Yu Tsao
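    The clustering step named in the abstract above can be sketched with a tiny k-means over Gaussian means: group the model's Gaussians into N clusters so that recognition can evaluate cluster representatives before individual components. Everything below — one-dimensional means, the distance measure, the initial centers — is an illustrative assumption, not the patent's method.

```python
def cluster_gaussians(means, centers, iters=10):
    """K-means-style clustering of 1-D Gaussian means into len(centers) clusters."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for m in means:
            # Assign each Gaussian mean to its nearest cluster center.
            idx = min(range(len(centers)), key=lambda i: abs(m - centers[i]))
            groups[idx].append(m)
        # Move each center to the mean of its assigned Gaussians.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

means = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]   # Gaussian means forming two clumps
centers, groups = cluster_gaussians(means, [0.0, 1.0])
```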
  • Publication number: 20080270129
    Abstract: A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
    Type: Application
    Filed: February 17, 2005
    Publication date: October 30, 2008
    Applicant: Loquendo S.p.A.
    Inventors: Daniele Colibro, Claudio Vair, Luciano Fissore, Cosmin Popovici
  • Publication number: 20080270130
    Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
    Type: Application
    Filed: July 1, 2008
    Publication date: October 30, 2008
    Applicant: AT&T Corp.
    Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
  • Publication number: 20080249773
    Abstract: A method and system for automatically generating a scoring model for scoring a speech sample are disclosed. One or more training speech samples are received in response to a prompt. One or more speech features are determined for each of the training speech samples. A scoring model is then generated based on the speech features. At least one of the training speech samples may be a high entropy speech sample. An evaluation speech sample is received and a score is assigned to the evaluation speech sample using the scoring model. The evaluation speech sample may be a high entropy speech sample.
    Type: Application
    Filed: June 16, 2008
    Publication date: October 9, 2008
    Inventors: Isaac Bejar, Klaus Zechner
  • Publication number: 20080235015
    Abstract: A system and method for speech recognition includes determining active Gaussians related to a first feature stream and a second feature stream by labeling at least one of the first and second streams, and determining active Gaussians co-occurring in the first stream and the second stream based upon joint probability. A number of Gaussians computed is reduced based upon Gaussians already computed for the first stream and a number of Gaussians co-occurring in the second stream. Speech is decoded based on the Gaussians computed for the first and second streams.
    Type: Application
    Filed: June 2, 2008
    Publication date: September 25, 2008
    Inventors: Stephen Mingyu Chu, Vaibhava Goel, Etienne Marcheret, Gerasimos Potamianos
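    The pruning idea in the abstract above — reduce the Gaussians computed for the second stream to those that co-occur with Gaussians already active in the first, labeled stream — can be sketched with a lookup table. The co-occurrence table and Gaussian identifiers below are invented for illustration; in practice the table would be estimated from joint probabilities of Gaussians firing together across the two streams.

```python
def gaussians_to_compute(active_stream1, cooccur):
    """Second-stream Gaussians worth computing, given first-stream activity."""
    active_stream2 = set()
    for g in active_stream1:
        active_stream2.update(cooccur.get(g, ()))
    return active_stream2

# Hypothetical table: audio-stream Gaussians -> co-occurring visual-stream Gaussians.
cooccur = {"a1": {"v1", "v2"}, "a2": {"v2"}, "a3": {"v5"}}
need = gaussians_to_compute({"a1", "a2"}, cooccur)
```

    Only the union of co-occurring Gaussians is evaluated for the second stream, which is the source of the reduction in Gaussian computations the abstract claims.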