Feature Extraction For Speech Recognition; Selection Of Recognition Unit (EPO) Patents (Class 704/E15.004)
  • Publication number: 20090144058
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Application
    Filed: December 3, 2007
    Publication date: June 4, 2009
    Inventor: Alexander Sorin
  • Publication number: 20090112588
    Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence
    Type: Application
    Filed: October 31, 2007
    Publication date: April 30, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Krishna Kummamuru, Deepak S. Padmanabhan, Shourya Roy, L. Venkata Subramaniam
  • Publication number: 20090112587
    Abstract: A system and method for a speech recognition technology that allows language models to be customized through the addition of special pronunciations for components of phrases, which are added to the factory language models during customization. It allows components of a phrase to have different pronunciations inside customer-added phrases than are specified for those isolated components in the factory language models.
    Type: Application
    Filed: December 3, 2008
    Publication date: April 30, 2009
    Applicant: Dictaphone Corporation
    Inventors: William F. Cote, Jill Carrier
  • Publication number: 20090106023
    Abstract: A speech recognition word dictionary/language model making system for creating a word dictionary for recognizing a word not appearing in a learning text by selecting a word-generation-model-learning-method-by-word-class according to the word to be added which does not appear in the learning text and for making a language model. The speech recognition word dictionary/language model making system (100) includes a language model estimating device (111) for selecting estimating method information from a learning-method-knowledge-by-word-class storing section (109) for each word class of an addition word generating model which is a word generating model of the addition word according to the selected estimating method information and a database combining device (112) for adding an addition word to a word dictionary (105) and adding an addition word generating model to a word-generation-model-by-word-class database (107).
    Type: Application
    Filed: November 30, 2007
    Publication date: April 23, 2009
    Inventor: Kiyokazu Miki
  • Publication number: 20090099841
    Abstract: A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated.
    Type: Application
    Filed: October 3, 2008
    Publication date: April 16, 2009
    Applicant: Kabushiki Kaisha Toshiba
    Inventor: Langzhou CHEN
  • Publication number: 20090063147
    Abstract: A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.
    Type: Application
    Filed: August 18, 2007
    Publication date: March 5, 2009
    Applicant: CONCEPTUAL SPEECH LLC
    Inventor: Philippe Roy
  • Publication number: 20090037172
    Abstract: A method for compressing data, the data being represented by an input vector having Q features, wherein Q is an integer higher than 1, including the steps of 1) providing a vector codebook of sub-sets of indexed Q-feature reference vectors and threshold values associated with the sub-sets for a prefixed feature; 2) identifying a sub-set of reference vectors among the sub-sets by progressively comparing the value of a feature of the input vector which corresponds to the prefixed feature, with the threshold values associated with the sub-sets; and 3) identifying the reference vector which, within the sub-set identified in step 2), provides the lowest distortion with respect to the input vector.
    Type: Application
    Filed: July 23, 2004
    Publication date: February 5, 2009
    Inventors: Maurizio Fodrini, Donato Ettorre, Gianmario Bollano
  • Publication number: 20080312921
    Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
    Type: Application
    Filed: August 20, 2008
    Publication date: December 18, 2008
    Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
  • Publication number: 20080312928
    Abstract: Disclosed herein is a computer implemented method and system for evaluating a mathematical expression spoken in a natural language by a user. The disclosed method and system provides a natural language speech recognition calculator comprising a speech recognition engine. The spoken mathematical expression is transmitted to the speech recognition engine via an audio input device. Mathematical entities of the spoken mathematical expression are extracted and represented in a hierarchical recursive format of a speech recognition grammar implemented by the speech recognition engine. A symbolic mathematical expression is generated from the extracted mathematical entities and then normalized with common measurement units. The normalized mathematical expression is then evaluated to generate a mathematical result. The mathematical result may be synthesized by a text-to-speech engine to produce a voice output.
    Type: Application
    Filed: September 20, 2007
    Publication date: December 18, 2008
    Inventors: Robert Patrick Goebel, Ravi Shivanna
  • Publication number: 20080294441
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.
    Type: Application
    Filed: December 6, 2006
    Publication date: November 27, 2008
    Inventor: Zsolt Saffer
  • Publication number: 20080270123
    Abstract: The present invention discloses means and method for indicating emotional attitudes of a speaker, either human or animal, according to voice intonation. The invention also discloses a method for advertising, marketing, educating, or lie detecting by indicating emotional attitudes of a speaker and a method of providing remote service by a group comprising at least one observer to at least one speaker. The invention also discloses a system for indicating emotional attitudes of a speaker comprising a glossary of intonations relating intonations to emotional attitudes.
    Type: Application
    Filed: December 20, 2006
    Publication date: October 30, 2008
    Inventors: Yoram Levanon, Lan Lossos
  • Publication number: 20080221891
    Abstract: An interactive speech recognition system includes a database containing a plurality of reference terms, a list memory that receives the reference terms of category “n,” a processing circuit that populates the list memory with the reference terms corresponding to the category “n,” and a recognition circuit that processes the reference terms and terms of a spoken phrase. The recognition circuit determines if a reference term of category “n” matches a term of the spoken phrase.
    Type: Application
    Filed: November 30, 2007
    Publication date: September 11, 2008
    Inventors: Lars Konig, Rainer Saam, Andreas Low
  • Publication number: 20080221884
    Abstract: In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application.
    Type: Application
    Filed: October 1, 2007
    Publication date: September 11, 2008
    Inventors: Joseph P. Cerra, Roman V. Kishchenko, John N. Nguyen, Michael S. Phillips, Han Shu
  • Publication number: 20080219428
    Abstract: A system for voice-activated dialing including means for initiating a call through a first connection between a user's phone and a switch at a central office; responsive to the first connection, means for initiating a second connection over the implicit trunk between the switch and a voice over internet protocol gateway; responsive to the second connection, means for initiating a third connection between the voice over internet protocol gateway and a voice-activated dialing platform; responsive to a keyword sent from the user's phone to the voice-activated dialing platform, means for disconnecting the implicit trunk and signaling the switch to connect to the voice-activated dialing platform over the explicit trunk; and responsive to a dialed number sent from the user's phone to the voice-activated dialing platform, means for handing the call off from the internet protocol gateway to the switch at the central office to process the call through the implicit trunk.
    Type: Application
    Filed: March 6, 2007
    Publication date: September 11, 2008
    Inventors: David W. Reece, Roger T. Trueman, John Zeigler
  • Publication number: 20080215328
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Application
    Filed: September 13, 2007
    Publication date: September 4, 2008
    Applicant: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Publication number: 20080167871
    Abstract: A method and apparatus for improving the performance of voice recognition in a mobile device are provided. The method of recognizing a voice includes: monitoring the usage pattern of a user of a device for inputting a voice; selecting predetermined words from among words stored in the device based on the result of monitoring, and storing the selected words; and recognizing a voice based on an acoustic model and predetermined words. In this way, a voice can be recognized by using prediction of whom the user mainly makes a call to. Also, by automatically modeling the device usage pattern of the user and applying the pattern to vocabulary for voice recognition based on probabilities, the performance of voice recognition, as actually felt by the user, can be enhanced.
    Type: Application
    Filed: July 25, 2007
    Publication date: July 10, 2008
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Kyu-hong Kim, Jeong-su Kim, Ick-Sang Han
  • Publication number: 20080162146
    Abstract: A method and device are provided for classifying at least two languages in an automatic dialogue system, which processes digitized speech input. At least one speech recognition method and at least one language identification method are used on the digitized speech input in order, by logical evaluation of the results of the method, to identify the language of the speech input.
    Type: Application
    Filed: December 3, 2007
    Publication date: July 3, 2008
    Applicant: Deutsche Telekom AG
    Inventors: Martin Eckert, Roman Englert, Wiebke Johannsen, Fred Runge, Markus Van Ballegooy
  • Publication number: 20080162125
    Abstract: A method and apparatus for language independent voice searching in a mobile communication device is disclosed. The method may include receiving a search query from a user of the mobile communication device, converting speech parts in the search query into linguistic representations that cover at least one language, generating a search phoneme lattice based on the linguistic representations, extracting query features from the search phoneme lattice, generating query feature vectors based on the extracted features, performing a coarse search using the query feature vectors and the indexing feature vectors from the indexing database, performing a fine search using the results of the coarse search and the indexing phoneme lattices stored in the indexing database, and outputting the fine search results to a dialog manager.
    Type: Application
    Filed: December 28, 2006
    Publication date: July 3, 2008
    Applicant: Motorola, Inc.
    Inventors: Changxue C. Ma, Feipeng Li
  • Publication number: 20080154601
    Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
    Type: Application
    Filed: November 20, 2007
    Publication date: June 26, 2008
    Applicant: Microsoft Corporation
    Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
  • Publication number: 20080140401
    Abstract: The present invention is a method and apparatus for reading education. In one embodiment, a method for recognizing an utterance spoken by a reader, includes receiving text to be read by the reader, generating a grammar for speech recognition, in accordance with the text, receiving the utterance, interpreting the utterance in accordance with the grammar, and outputting feedback indicative of reader performance.
    Type: Application
    Filed: December 7, 2007
    Publication date: June 12, 2008
    Inventors: Victor Abrash, Douglas Bercow
  • Publication number: 20080133240
    Abstract: A spoken dialog system includes: a communication processing section capable of communicating with a terminal device that stores user data; and at least one of a speech synthesis section that generates synthesized speech and a speech recognition section that recognizes input speech. The communication processing section acquires from the terminal device at least one information of reading information and grammatical information, the reading information containing prosodic information on an item value of at least one item of the user data and the grammatical information indicating one or a plurality of recognition grammars on an item value of at least one item of the user data. The speech synthesis section generates the synthesized speech using the reading information acquired by the communication processing section, and the speech recognition section recognizes the input speech using the grammatical information acquired by the communication processing section.
    Type: Application
    Filed: September 21, 2007
    Publication date: June 5, 2008
    Applicant: Fujitsu Limited
    Inventors: Ryosuke Miyata, Toshiyuki Fukuoka, Kyouko Okuyama, Eiji Kitagawa, Takuro Ikeda
  • Publication number: 20080120108
    Abstract: Performing speech recognition on a tonal language is done using a plurality of tonal models. Each tonal model has a multi-space distribution and corresponds to a known syllable in a language. A first data stream indicative of an observation of an utterance is received. The observation has both a discrete and a continuous tonal feature. A second data stream indicative of spectral features of a syllable of an utterance is also received. The first data stream is compared against at least one of the plurality of tonal models and the second data stream is compared against a spectral model.
    Type: Application
    Filed: November 16, 2006
    Publication date: May 22, 2008
    Inventors: Frank Kao-Ping Soong, Yao Qian
  • Publication number: 20080114596
    Abstract: Parameters for a feature extractor and acoustic model of a speech recognition module are trained. An objective function is utilized to determine values for the feature extractor parameters and the acoustic model parameters.
    Type: Application
    Filed: November 15, 2006
    Publication date: May 15, 2008
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James G. Droppo, Milind V. Mahajan
  • Publication number: 20080077404
    Abstract: A speech recognition device includes an extracting unit that analyzes an input signal and extracts a feature to be used for speech recognition from the input signal; a storing unit configured to store therein an acoustic model that is a stochastic model for estimating what type of a phoneme is included in the feature; a speech-recognition unit that performs speech recognition on the input signal based on the feature and determines a word having maximum likelihood from the acoustic model; and an optimizing unit that dynamically self-optimizes parameters of the feature and the acoustic model depending on at least one of the input signal and a state of the speech recognition performed by the speech-recognition unit.
    Type: Application
    Filed: September 6, 2007
    Publication date: March 27, 2008
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Masami AKAMINE, Remco Teunen
  • Publication number: 20080077403
    Abstract: A speech recognition apparatus predicts, based on the occurrence cycle and duration time of impulse noise that occurs periodically, a segment in which impulse noise occurs, and executes speech recognition processing based on the feature components of the remaining frames excluding a feature component of a frame corresponding to the predicted segment, or the feature components extracted from frames created from sound data excluding a part corresponding to the predicted segment.
    Type: Application
    Filed: May 3, 2007
    Publication date: March 27, 2008
    Applicant: FUJITSU LIMITED
    Inventor: Shoji Hayakawa
  • Publication number: 20080033719
    Abstract: A radio-to-SIP adapter is shown to include a voice detection algorithm processor as well as other circuitry to provide an interface between a radio and SIP adapter to accommodate a transition from half duplex to full duplex and to cause a radio to transmit when human speech is present in an audio signal from a telephony network.
    Type: Application
    Filed: August 3, 2007
    Publication date: February 7, 2008
    Inventors: Douglas Hall, Daniel Floyd
  • Publication number: 20070299663
    Abstract: A method of optimizing audio input for speech recognition applications can include identifying a source waveform and at least one optimization parameter, wherein the optimization parameter is configured to adjust audio input to a speech recognition application. The source waveform can be modified according to the optimization parameter resulting in a modified waveform. At least one optimization parameter can be synchronized with the source waveform. At least two time-dependent graphs can be displayed, where the time-dependent graphs can include the source waveform, the modified waveform, and/or a graph for the optimization parameter plotted against time.
    Type: Application
    Filed: September 7, 2007
    Publication date: December 27, 2007
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Francis Fado, Peter Guasti
  • Publication number: 20070294083
    Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
    Type: Application
    Filed: June 11, 2007
    Publication date: December 20, 2007
    Inventors: Jerome Bellegarda, Kim Silverman
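
Illustrative sketch for publication 20090037172 (threshold-partitioned vector codebook). The abstract lists three steps: a codebook split into sub-sets with thresholds on one prefixed feature, sub-set selection by comparing the input's value of that feature against the thresholds, and a lowest-distortion search inside the selected sub-set. The Python below is a minimal reading of those steps, not the patented implementation. The names select_subset, squared_error, and quantize, the ascending threshold order, and the use of squared error as the distortion measure are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Each sub-set holds (codebook index, reference vector) pairs. Hypothetical layout.
Subset = List[Tuple[int, Vector]]


def select_subset(value: float, thresholds: Sequence[float]) -> int:
    """Step 2: compare the prefixed feature against ascending thresholds.

    Assumes len(thresholds) == number of sub-sets - 1.
    """
    for i, limit in enumerate(thresholds):
        if value <= limit:
            return i
    return len(thresholds)  # value exceeds every threshold: last sub-set


def squared_error(a: Vector, b: Vector) -> float:
    """Distortion measure (assumed here; the abstract does not name one)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(x: Vector, subsets: Sequence[Subset],
             thresholds: Sequence[float], feature: int = 0) -> int:
    """Step 3: return the index of the lowest-distortion vector in the sub-set."""
    chosen = subsets[select_subset(x[feature], thresholds)]
    index, _ = min(chosen, key=lambda entry: squared_error(x, entry[1]))
    return index


# Step 1: a toy two-feature codebook split on feature 0 at the value 3.0.
subsets = [
    [(0, (0.0, 0.0)), (1, (1.0, 2.0))],
    [(2, (5.0, 0.0)), (3, (6.0, 3.0))],
]
thresholds = [3.0]

print(quantize((5.5, 2.5), subsets, thresholds))
print(quantize((0.9, 1.5), subsets, thresholds))
```

Only the selected sub-set is searched, so the per-vector cost drops roughly in proportion to the number of sub-sets. The trade-off in this sketch is that the chosen reference vector may not be the globally nearest one when the input lies close to a threshold.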