Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)

Restoration of high-order Mel Frequency Cepstral Coefficients

Publication number: 20090144058

Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.

Type: Application

Filed: December 3, 2007

Publication date: June 4, 2009

Inventor: Alexander Sorin
METHOD FOR SEGMENTING COMMUNICATION TRANSCRIPTS USING UNSUPERVISED AND SEMI-SUPERVISED TECHNIQUES

Publication number: 20090112588

Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence

Type: Application

Filed: October 31, 2007

Publication date: April 30, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Krishna Kummamuru, Deepak S. Padmanabhan, Shourya Roy, L. Venkata Subramaniam
SYSTEM AND METHOD FOR GENERATING A PHRASE PRONUNCIATION

Publication number: 20090112587

Abstract: A system and method for a speech recognition technology that allows language models to be customized through the addition of special pronunciations for components of phrases, which are added to the factory language models during customization. It allows components of a phrase to have different pronunciations inside customer-added phrases than are specified for those isolated components in the factory language models.

Type: Application

Filed: December 3, 2008

Publication date: April 30, 2009

Applicant: Dictaphone Corporation

Inventors: William F. Cote, Jill Carrier
Speech recognition word dictionary/language model making system, method, and program, and speech recognition system

Publication number: 20090106023

Abstract: A speech recognition word dictionary/language model making system for creating a word dictionary for recognizing a word not appearing in a learning text by selecting a word-generation-model-learning-method-by-word-class according to the word to be added which does not appear in the learning text and for making a language model. The speech recognition word dictionary/language model making system (100) includes a language model estimating device (111) for selecting estimating method information from a learning-method-knowledge-by-word-class storing section (109) for each word class of an addition word generating model which is a word generating model of the addition word according to the selected estimating method information and a database combining device (112) for adding an addition word to a word dictionary (105) and adding an addition word generating model to a word-generation-model-by-word-class database (107).

Type: Application

Filed: November 30, 2007

Publication date: April 23, 2009

Inventor: Kiyokazu Miki
AUTOMATIC SPEECH RECOGNITION METHOD AND APPARATUS

Publication number: 20090099841

Abstract: A system for calculating the look ahead probabilities at the nodes in a language model look ahead tree, wherein the words of the vocabulary of the language are located at the leaves of the tree, said apparatus comprising: means to assign a language model probability to each of the words of the vocabulary using a first low order language model; means to calculate the language look ahead probabilities for all nodes in said tree using said first language model; means to determine if the language model probability of one or more words of said vocabulary can be calculated using a higher order language model and updating said words with the higher order language model; and means to update the look ahead probability at only the nodes which are affected by the words where the language model has been updated.

Type: Application

Filed: October 3, 2008

Publication date: April 16, 2009

Applicant: Kabushiki Kaisha Toshiba

Inventor: Langzhou CHEN
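
The incremental update described in the abstract above can be illustrated with a minimal Python sketch. It assumes the common convention that a node's look-ahead value is the maximum probability of any word below it; the abstract does not state this, and the names Node, init_lookahead and update_word are hypothetical.

```python
class Node:
    """One node of a toy look-ahead tree; leaves carry a word."""

    def __init__(self, parent=None, word=None):
        self.parent = parent
        self.word = word          # set only on leaves
        self.children = []
        self.lookahead = 0.0
        if parent is not None:
            parent.children.append(self)


def init_lookahead(node, low_order_prob):
    """Fill every node from the low-order model (post-order traversal)."""
    if node.word is not None:
        node.lookahead = low_order_prob[node.word]
    else:
        # Assumption: look-ahead value = best probability among descendants.
        node.lookahead = max(init_lookahead(c, low_order_prob)
                             for c in node.children)
    return node.lookahead


def update_word(leaf, new_prob):
    """Apply a higher-order probability to one leaf and refresh only the
    ancestors whose value actually changes. Returns the nodes touched."""
    leaf.lookahead = new_prob
    touched = 1
    node = leaf.parent
    while node is not None:
        refreshed = max(c.lookahead for c in node.children)
        if refreshed == node.lookahead:
            break  # nothing above this node can change
        node.lookahead = refreshed
        touched += 1
        node = node.parent
    return touched


# Toy tree: root -> a -> {"cat", "car"}, root -> b -> {"dog"}
root = Node()
a, b = Node(root), Node(root)
cat, car, dog = Node(a, "cat"), Node(a, "car"), Node(b, "dog")
init_lookahead(root, {"cat": 0.2, "car": 0.3, "dog": 0.5})
touched = update_word(cat, 0.6)
```

In this toy tree only the leaf "cat", its parent and the root are revisited; the branch holding "dog" is left alone, which is the saving the abstract points to.
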
PHONETIC, SYNTACTIC AND CONCEPTUAL ANALYSIS DRIVEN SPEECH RECOGNITION SYSTEM AND METHOD

Publication number: 20090063147

Abstract: A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.

Type: Application

Filed: August 18, 2007

Publication date: March 5, 2009

Applicant: CONCEPTUAL SPEECH LLC

Inventor: Philippe Roy
Method for generating a vector codebook, method and device for compressing data, and distributed speech recognition system

Publication number: 20090037172

Abstract: A method for compressing data, the data being represented by an input vector having Q features, wherein Q is an integer higher than 1, including the steps of 1) providing a vector codebook of sub-sets of indexed Q-feature reference vectors and threshold values associated with the sub-sets for a prefixed feature; 2) identifying a sub-set of reference vectors among the sub-sets by progressively comparing the value of a feature of the input vector which corresponds to the prefixed feature, with the threshold values associated with the sub-sets; and 3) identifying the reference vector which, within the sub-set identified in step 2), provides the lowest distortion with respect to the input vector.

Type: Application

Filed: July 23, 2004

Publication date: February 5, 2009

Inventors: Maurizio Fodrini, Donato Ettorre, Gianmario Bollano
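
Steps 2) and 3) of the abstract above can be sketched in Python as follows. This is only an illustration under assumptions: the sub-sets are ordered by ascending thresholds on the chosen feature, distortion is taken to be squared Euclidean distance, and the function name quantize and the sample data are hypothetical.

```python
def quantize(x, subsets, thresholds, feature=0):
    """Return (subset_index, vector_index) of the chosen reference vector.

    subsets    -- list of sub-sets, each a list of reference vectors
    thresholds -- ascending boundaries on one feature, len(subsets) - 1 values
    feature    -- index of the feature the thresholds refer to
    """
    # Step 2: compare the input's value on the chosen feature with each
    # threshold in turn; the first threshold it falls below picks the sub-set.
    s = len(subsets) - 1
    for i, t in enumerate(thresholds):
        if x[feature] < t:
            s = i
            break

    # Step 3: exhaustive search, but only inside the selected sub-set.
    # Assumption: distortion is the squared Euclidean distance.
    def distortion(ref):
        return sum((a - b) ** 2 for a, b in zip(x, ref))

    best = min(range(len(subsets[s])), key=lambda j: distortion(subsets[s][j]))
    return s, best


subsets = [
    [(0.0, 0.0), (0.5, 1.0)],
    [(1.5, 0.0), (2.0, 2.0)],
    [(3.0, 1.0), (4.0, 4.0)],
]
thresholds = [1.0, 2.5]
result = quantize((1.8, 1.7), subsets, thresholds)
```

The point of the threshold walk is that the distortion is computed against one sub-set rather than the whole codebook, at the cost of possibly missing a slightly closer vector in a neighbouring sub-set.
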
SPEECH RECOGNITION UTILIZING MULTITUDE OF SPEECH FEATURES

Publication number: 20080312921

Abstract: In a speech recognition system, the combination of a log-linear model with a multitude of speech features is provided to recognize unknown speech utterances. The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using a log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping, and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.

Type: Application

Filed: August 20, 2008

Publication date: December 18, 2008

Inventors: Scott E. Axelrod, Sreeram Viswanath Balakrishnan, Stanley F. Chen, Yuqing Gao, Ramesh A. Gopinath, Hong-Kwang Kuo, Benoit Maison, David Nahamoo, Michael Alan Picheny, George A. Saon, Geoffrey G. Zweig
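
As a rough illustration of the log-linear posterior mentioned in the abstract above, the following Python sketch scores each candidate linguistic unit as a weighted sum of whatever features are present and normalizes the exponentiated scores. The feature names, weights and the function name posterior are hypothetical, not taken from the application.

```python
import math


def posterior(features, weights):
    """Log-linear posterior over candidate units.

    features -- {unit: {feature_name: value}}
    weights  -- {feature_name: weight}
    A feature missing for a unit, or a feature without a trained weight,
    simply contributes nothing to that unit's score.
    """
    scores = {
        unit: sum(weights.get(name, 0.0) * value
                  for name, value in feats.items())
        for unit, feats in features.items()
    }
    peak = max(scores.values())  # subtract the peak for numerical stability
    exp_scores = {u: math.exp(s - peak) for u, s in scores.items()}
    total = sum(exp_scores.values())
    return {u: e / total for u, e in exp_scores.items()}


weights = {"acoustic": 1.0, "lm": 0.5, "pitch": 2.0}
features = {
    "yes": {"acoustic": 2.0, "lm": 1.0},   # no pitch feature observed
    "no": {"acoustic": 1.0, "lm": 1.0},
}
p = posterior(features, weights)
```

Because each feature enters only through its own weighted term, the "pitch" weight can exist in the model without any pitch observation being available at recognition time.
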
Natural language speech recognition calculator

Publication number: 20080312928

Abstract: Disclosed herein is a computer implemented method and system for evaluating a mathematical expression spoken in a natural language by a user. The disclosed method and system provides a natural language speech recognition calculator comprising a speech recognition engine. The spoken mathematical expression is transmitted to the speech recognition engine via an audio input device. Mathematical entities of the spoken mathematical expression are extracted and represented in a hierarchical recursive format of a speech recognition grammar implemented by the speech recognition engine. A symbolic mathematical expression is generated from the extracted mathematical entities and then normalized with common measurement units. The normalized mathematical expression is then evaluated to generate a mathematical result. The mathematical result may be synthesized by a text-to-speech engine to produce a voice output.

Type: Application

Filed: September 20, 2007

Publication date: December 18, 2008

Inventors: Robert Patrick Goebel, Ravi Shivanna
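
The pipeline in the abstract above (spoken words, then a symbolic expression, then a result) can be sketched in Python. This is a simplified illustration, not the disclosed grammar: it assumes the recognizer has already produced a word string, covers only the numbers zero to ten and four operators, and skips unit normalization. The names NUMBERS, OPERATORS, to_symbols and evaluate are hypothetical.

```python
NUMBERS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
OPERATORS = {"plus": "+", "minus": "-", "times": "*", "over": "/"}


def to_symbols(utterance):
    """Map recognized words to symbolic tokens, e.g. 'three plus four'."""
    return [str(NUMBERS[w]) if w in NUMBERS else OPERATORS[w]
            for w in utterance.lower().split()]


def evaluate(tokens):
    """Evaluate tokens with the usual precedence of * and / over + and -."""
    pos = 0

    def term():
        # A term is a number followed by any run of * or / operations.
        nonlocal pos
        value = float(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op, rhs = tokens[pos], float(tokens[pos + 1])
            pos += 2
            value = value * rhs if op == "*" else value / rhs
        return value

    value = term()
    while pos < len(tokens):
        op = tokens[pos]
        pos += 1
        rhs = term()
        value = value + rhs if op == "+" else value - rhs
    return value


symbols = to_symbols("three plus four times two")
answer = evaluate(symbols)
```

The nested term function is a small stand-in for the hierarchical, recursive grammar the abstract refers to: multiplication and division are grouped before addition and subtraction are applied.
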
Speech Recognition System with Huge Vocabulary

Publication number: 20080294441

Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and with computer readable code for implementing the method.

Type: Application

Filed: December 6, 2006

Publication date: November 27, 2008

Inventor: Zsolt Saffer
System for Indicating Emotional Attitudes Through Intonation Analysis and Methods Thereof

Publication number: 20080270123

Abstract: The present invention discloses means and method for indicating emotional attitudes of a speaker, either human or animal, according to voice intonation. The invention also discloses a method for advertising, marketing, educating, or lie detecting by indicating emotional attitudes of a speaker and a method of providing remote service by a group comprising at least one observer to at least one speaker. The invention also discloses a system for indicating emotional attitudes of a speaker comprising a glossary of intonations relating intonations to emotional attitudes.

Type: Application

Filed: December 20, 2006

Publication date: October 30, 2008

Inventors: Yoram Levanon, Lan Lossos
INTERACTIVE SPEECH RECOGNITION SYSTEM

Publication number: 20080221891

Abstract: An interactive speech recognition system includes a database containing a plurality of reference terms, a list memory that receives the reference terms of category “n,” a processing circuit that populates the list memory with the reference terms corresponding to the category “n,” and a recognition circuit that processes the reference terms and terms of a spoken phrase. The recognition circuit determines if a reference term of category “n” matches a term of the spoken phrase.

Type: Application

Filed: November 30, 2007

Publication date: September 11, 2008

Inventors: Lars Konig, Rainer Saam, Andreas Low
MOBILE ENVIRONMENT SPEECH PROCESSING FACILITY

Publication number: 20080221884

Abstract: In embodiments of the present invention improved capabilities are described for a mobile environment speech processing facility. The present invention may provide for the entering of text into a software application resident on a mobile communication facility, where recorded speech may be presented by the user using the mobile communications facility's resident capture facility. Transmission of the recording may be provided through a wireless communication facility to a speech recognition facility, and may be accompanied by information related to the software application. Results may be generated utilizing the speech recognition facility that may be independent of structured grammar, and may be based at least in part on the information relating to the software application and the recording. The results may then be transmitted to the mobile communications facility, where they may be loaded into the software application.

Type: Application

Filed: October 1, 2007

Publication date: September 11, 2008

Inventors: Joseph P. Cerra, Roman V. Kishchenko, John N. Nguyen, Michael S. Phillips, Han Shu
System and method for voice-activated dialing over implicit and explicit NFA trunks

Publication number: 20080219428

Abstract: A system for voice-activated dialing including means for initiating a call through a first connection between a user's phone and a switch at a central office; responsive to the first connection, means for initiating a second connection over the implicit trunk between the switch and a voice over internet protocol gateway; responsive to the second connection, means for initiating a third connection between the voice over internet protocol gateway and a voice-activated dialing platform; responsive to a keyword sent from the user's phone to the voice-activated dialing platform, means for disconnecting the implicit trunk and signaling the switch to connect to the voice-activated dialing platform over the explicit trunk; and responsive to a dialed number sent from the user's phone to the voice-activated dialing platform, means for handing the call off from the internet protocol gateway to the switch at the central office to process the call through the implicit trunk.

Type: Application

Filed: March 6, 2007

Publication date: September 11, 2008

Inventors: David W. Reece, Roger T. Trueman, John Zeigler
METHOD AND SYSTEM FOR AUTOMATICALLY DETECTING MORPHEMES IN A TASK CLASSIFICATION SYSTEM USING LATTICES

Publication number: 20080215328

Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.

Type: Application

Filed: September 13, 2007

Publication date: September 4, 2008

Applicant: AT&T Corp.

Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
Method and apparatus for speech recognition using device usage pattern of user

Publication number: 20080167871

Abstract: A method and apparatus for improving the performance of voice recognition in a mobile device are provided. The method of recognizing a voice includes: monitoring the usage pattern of a user of a device for inputting a voice; selecting predetermined words from among words stored in the device based on the result of monitoring, and storing the selected words; and recognizing a voice based on an acoustic model and predetermined words. In this way, a voice can be recognized by using prediction of whom the user mainly makes a call to. Also, by automatically modeling the device usage pattern of the user and applying the pattern to vocabulary for voice recognition based on probabilities, the performance of voice recognition, as actually felt by the user, can be enhanced.

Type: Application

Filed: July 25, 2007

Publication date: July 10, 2008

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Kyu-hong Kim, Jeong-su Kim, Ick-Sang Han
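
One way to picture the usage-pattern idea in the abstract above is the following Python sketch: the device's call log decides which names form the active vocabulary, and recognition then picks the best acoustic match among those names only. The call log, the acoustic scores and the function names select_vocabulary and recognize are hypothetical, and a plain frequency count stands in for whatever probabilistic model the application actually uses.

```python
from collections import Counter


def select_vocabulary(call_log, size):
    """Keep the names the user calls most often (the monitored usage pattern)."""
    counts = Counter(call_log)
    return [name for name, _ in counts.most_common(size)]


def recognize(acoustic_scores, vocabulary):
    """Pick the best-scoring word among the selected vocabulary only.

    acoustic_scores -- {word: log-likelihood from the acoustic model}
    """
    candidates = {w: s for w, s in acoustic_scores.items() if w in vocabulary}
    return max(candidates, key=candidates.get) if candidates else None


call_log = ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"]
vocabulary = select_vocabulary(call_log, size=2)
heard = recognize({"Ann": -4.1, "Dan": -3.9, "Bob": -6.0}, vocabulary)
```

Here "Dan" has the best raw acoustic score but is never called, so the frequently called "Ann" is returned instead.
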
METHOD AND DEVICE FOR CLASSIFYING SPOKEN LANGUAGE IN SPEECH DIALOG SYSTEMS

Publication number: 20080162146

Abstract: A method and device are provided for classifying at least two languages in an automatic dialogue system, which processes digitized speech input. At least one speech recognition method and at least one language identification method are used on the digitized speech input in order, by logical evaluation of the results of the method, to identify the language of the speech input.

Type: Application

Filed: December 3, 2007

Publication date: July 3, 2008

Applicant: Deutsche Telekom AG

Inventors: Martin Eckert, Roman Englert, Wiebke Johannsen, Fred Runge, Markus Van Ballegooy
METHOD AND APPARATUS FOR LANGUAGE INDEPENDENT VOICE INDEXING AND SEARCHING

Publication number: 20080162125

Abstract: A method and apparatus for language independent voice searching in a mobile communication device is disclosed. The method may include receiving a search query from a user of the mobile communication device, converting speech parts in the search query into linguistic representations which cover at least one language, generating a search phoneme lattice based on the linguistic representations, extracting query features from the search phoneme lattice, generating query feature vectors based on the extracted features, performing a coarse search using the query feature vectors and the indexing feature vectors from the indexing database, performing a fine search using the results of the coarse search and the indexing phoneme lattices stored in the indexing database, and outputting the fine search results to a dialog manager.

Type: Application

Filed: December 28, 2006

Publication date: July 3, 2008

Applicant: Motorola, Inc.

Inventors: Changxue C. Ma, Feipeng Li
METHOD AND SYSTEM FOR PROVIDING MENU AND OTHER SERVICES FOR AN INFORMATION PROCESSING SYSTEM USING A TELEPHONE OR OTHER AUDIO INTERFACE

Publication number: 20080154601

Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.

Type: Application

Filed: November 20, 2007

Publication date: June 26, 2008

Applicant: Microsoft Corporation

Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
METHOD AND APPARATUS FOR READING EDUCATION

Publication number: 20080140401

Abstract: The present invention is a method and apparatus for reading education. In one embodiment, a method for recognizing an utterance spoken by a reader, includes receiving text to be read by the reader, generating a grammar for speech recognition, in accordance with the text, receiving the utterance, interpreting the utterance in accordance with the grammar, and outputting feedback indicative of reader performance.

Type: Application

Filed: December 7, 2007

Publication date: June 12, 2008

Inventors: VICTOR ABRASH, DOUGLAS BERCOW
Spoken dialog system, terminal device, speech information management device and recording medium with program recorded thereon

Publication number: 20080133240

Abstract: A spoken dialog system includes: a communication processing section capable of communicating with a terminal device that stores user data; and at least one of a speech synthesis section that generates synthesized speech and a speech recognition section that recognizes input speech. The communication processing section acquires from the terminal device at least one of reading information and grammatical information, the reading information containing prosodic information on an item value of at least one item of the user data and the grammatical information indicating one or a plurality of recognition grammars on an item value of at least one item of the user data. The speech synthesis section generates the synthesized speech using the reading information acquired by the communication processing section, and the speech recognition section recognizes the input speech using the grammatical information acquired by the communication processing section.

Type: Application

Filed: September 21, 2007

Publication date: June 5, 2008

Applicant: Fujitsu Limited

Inventors: Ryosuke Miyata, Toshiyuki Fukuoka, Kyouko Okuyama, Eiji Kitagawa, Takuro Ikeda
Multi-space distribution for pattern recognition based on mixed continuous and discrete observations

Publication number: 20080120108

Abstract: Performing speech recognition on a tonal language is done using a plurality of tonal models. Each tonal model has a multi-space distribution and corresponds to a known syllable in a language. A first data stream indicative of an observation of an utterance is received. The observation has both a discrete and a continuous tonal feature. A second data stream indicative of spectral features of a syllable of an utterance is also received. The first data stream is compared against at least one of the plurality of tonal models and the second data stream is compared against a spectral model.

Type: Application

Filed: November 16, 2006

Publication date: May 22, 2008

Inventors: Frank Kao-Ping Soong, Yao Qian
DISCRIMINATIVE TRAINING FOR SPEECH RECOGNITION

Publication number: 20080114596

Abstract: Parameters for a feature extractor and acoustic model of a speech recognition module are trained. An objective function is utilized to determine values for the feature extractor parameters and the acoustic model parameters.

Type: Application

Filed: November 15, 2006

Publication date: May 15, 2008

Applicant: Microsoft Corporation

Inventors: Alejandro Acero, James G. Droppo, Milind V. Mahajan
SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND COMPUTER PROGRAM PRODUCT

Publication number: 20080077404

Abstract: A speech recognition device includes an extracting unit that analyzes an input signal and extracts a feature to be used for speech recognition from the input signal; a storing unit configured to store therein an acoustic model that is a stochastic model for estimating what type of a phoneme is included in the feature; a speech-recognition unit that performs speech recognition on the input signal based on the feature and determines a word having maximum likelihood from the acoustic model; and an optimizing unit that dynamically self-optimizes parameters of the feature and the acoustic model depending on at least one of the input signal and a state of the speech recognition performed by the speech-recognition unit.

Type: Application

Filed: September 6, 2007

Publication date: March 27, 2008

Applicant: Kabushiki Kaisha Toshiba

Inventors: Masami AKAMINE, Remco Teunen
Speech recognition method, speech recognition apparatus and computer program

Publication number: 20080077403

Abstract: A speech recognition apparatus predicts, based on the occurrence cycle and duration time of impulse noise that occurs periodically, a segment in which impulse noise occurs, and executes speech recognition processing based on the feature components of the remaining frames excluding a feature component of a frame corresponding to the predicted segment, or the feature components extracted from frames created from sound data excluding a part corresponding to the predicted segment.

Type: Application

Filed: May 3, 2007

Publication date: March 27, 2008

Applicant: FUJITSU LIMITED

Inventor: Shoji Hayakawa
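
The frame-exclusion idea in the abstract above can be sketched in Python. The sketch assumes the occurrence cycle, duration and first onset of the impulse noise are already known in milliseconds, and that frames are non-overlapping and of fixed length; the function names predicted_noise_frames and keep_clean are hypothetical.

```python
def predicted_noise_frames(first_onset, cycle, duration, frame_len, n_frames):
    """Indices of frames that overlap a predicted impulse segment.

    All times are integers in milliseconds; frame i covers
    [i * frame_len, (i + 1) * frame_len).
    """
    noisy = set()
    total = n_frames * frame_len
    onset = first_onset
    while onset < total:
        first = onset // frame_len
        last = (onset + duration - 1) // frame_len  # last ms of the impulse
        noisy.update(range(first, min(last, n_frames - 1) + 1))
        onset += cycle  # the next impulse is expected one cycle later
    return noisy


def keep_clean(features, noisy):
    """Drop the feature vectors of frames predicted to contain impulse noise."""
    return [f for i, f in enumerate(features) if i not in noisy]


# 30 frames of 10 ms; 20 ms impulses every 100 ms, first one at 25 ms.
noisy = predicted_noise_frames(first_onset=25, cycle=100, duration=20,
                               frame_len=10, n_frames=30)
clean = keep_clean(list(range(30)), noisy)
```

This corresponds to the first alternative in the abstract, where features are extracted for every frame and the affected ones are skipped; the second alternative, cutting the predicted segment out of the sound data before framing, is not shown.
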
VOICE MODULATION RECOGNITION IN A RADIO-TO-SIP ADAPTER

Publication number: 20080033719

Abstract: A radio-to-SIP adapter is shown to include a voice detection algorithm processor as well as other circuitry to provide an interface between a radio and SIP adapter to accommodate a transition from half duplex to full duplex and to cause a radio to transmit when human speech is present in an audio signal from a telephony network.

Type: Application

Filed: August 3, 2007

Publication date: February 7, 2008

Inventors: Douglas Hall, Daniel Floyd
SPEECH RECOGNITION OPTIMIZATION TOOL

Publication number: 20070299663

Abstract: A method of optimizing audio input for speech recognition applications can include identifying a source waveform and at least one optimization parameter, wherein the optimization parameter is configured to adjust audio input to a speech recognition application. The source waveform can be modified according to the optimization parameter resulting in a modified waveform. At least one optimization parameter can be synchronized with the source waveform. At least two time-dependent graphs can be displayed, where the time-dependent graphs can include the source waveform, the modified waveform, and/or a graph for the optimization parameter plotted against time.

Type: Application

Filed: September 7, 2007

Publication date: December 27, 2007

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Francis Fado, Peter Guasti
Fast, language-independent method for user authentication by voice

Publication number: 20070294083

Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.

Type: Application

Filed: June 11, 2007

Publication date: December 20, 2007

Inventors: Jerome Bellegarda, Kim Silverman
