Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 8364484. Abstract: An input voice is detected after starting a voice input waiting state; the detected voice is recognized; an elapsed time from the start of the voice input waiting state is counted; an informative sound which urges the user to input the voice is output when the elapsed time reaches a preset output set time; and the output of the informative sound is stopped when the elapsed time at the time of inputting the voice is shorter than the output set time. Type: Grant. Filed: April 14, 2009. Date of Patent: January 29, 2013. Assignee: Kabushiki Kaisha Toshiba. Inventors: Takehide Yano, Tadashi Amada, Kazunori Imoto, Koichi Yamamoto
-
Patent number: 8359199. Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system, particularly suited for use in a wireless communication system, operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single-word and “string” tests of the deletion technique. Type: Grant. Filed: November 29, 2011. Date of Patent: January 22, 2013. Assignee: AT&T Intellectual Property II, L.P. Inventors: Richard Vandervoort Cox, Hong Kook Kim
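The deletion idea above reduces to filtering the feature-frame sequence by an erasure flag. A minimal sketch (function and variable names are illustrative, not from the patent):

```python
def drop_erased_frames(features, erasure_flags):
    """Shorten the observation sequence by deleting frames flagged as erased.

    features: list of per-frame feature vectors
    erasure_flags: list of booleans, True where a frame erasure was declared
    """
    return [f for f, erased in zip(features, erasure_flags) if not erased]

# Four frames, two of them declared erased; the recognizer then sees a
# shorter observation sequence instead of corrupted features.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
flags = [False, True, False, True]
kept = drop_erased_frames(frames, flags)
```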
-
Patent number: 8352257. Abstract: The present system proposes a technique called the spectro-temporal varying technique to compute the suppression gain. This method is motivated by the perceptual properties of the human auditory system; specifically, that the human ear has higher frequency resolution in the lower frequency bands and less frequency resolution in the higher frequencies, and also that the important speech information in the high frequencies is consonants, which usually have a random-noise spectral shape. A second property of the human auditory system is that the human ear has lower temporal resolution in the lower frequencies and higher temporal resolution in the higher frequencies. Based on that, the system uses a spectro-temporal varying method which introduces the concept of frequency smoothing by modifying the estimation of the a posteriori SNR. In addition, the system also makes the a priori SNR time-smoothing factor depend on frequency. Type: Grant. Filed: December 20, 2007. Date of Patent: January 8, 2013. Assignee: QNX Software Systems Limited. Inventors: Phil A. Hetherington, Xueman Li
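The two ingredients named in the abstract — frequency smoothing of the a posteriori SNR and a frequency-dependent time-smoothing factor for the a priori SNR — can be sketched as follows. This is only a plausible decision-directed-style illustration under assumptions of my own (window width, per-bin alphas), not the patented formulas:

```python
def smooth_posteriori_snr(gamma, width):
    """Average the a posteriori SNR over a window of neighboring frequency bins."""
    out = []
    n = len(gamma)
    for k in range(n):
        lo, hi = max(0, k - width), min(n, k + width + 1)
        out.append(sum(gamma[lo:hi]) / (hi - lo))
    return out

def a_priori_snr(gamma_smoothed, prev_xi, alphas):
    """Decision-directed a priori SNR estimate where the time-smoothing
    factor alphas[k] varies with frequency (higher alpha = slower tracking)."""
    return [a * px + (1.0 - a) * max(g - 1.0, 0.0)
            for a, px, g in zip(alphas, prev_xi, gamma_smoothed)]
```

A low-frequency bin might use a small alpha (faster tracking in time) and a high-frequency bin a larger one, mirroring the temporal-resolution argument in the abstract.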
-
Patent number: 8351706. Abstract: Document data corresponding to each page included in a document is stored, and furthermore, feature data indicative of a feature of the document data and a document index indicating the document are associated with the document data. A document extracting apparatus obtains input document data, calculates feature data from the input document data, judges similarity between the input document data and the document data based on the feature data, obtains a document index associated with document data similar to the input document data, and extracts a plurality of pieces of document data associated with the document index. Thus, document data concerning the document including a page corresponding to the document data similar to the input document data is extracted for a plurality of pages. Type: Grant. Filed: July 23, 2008. Date of Patent: January 8, 2013. Assignee: Sharp Kabushiki Kaisha. Inventor: Hitoshi Hirohata
-
Patent number: 8352263. Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by the F unknown voices will be arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since we only find the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, our recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and input many more words without using samples. Type: Grant. Filed: September 29, 2009. Date of Patent: January 8, 2013. Inventors: Tze-Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Publication number: 20130006629. Abstract: The present invention relates to a searching device, searching method, and program whereby searching for a word string corresponding to an input voice can be performed in a robust manner. A voice recognition unit 11 subjects an input voice to voice recognition. For each of multiple word strings for search results (word strings that are candidate search results for the word string corresponding to the input voice), a matching unit 16 matches a pronunciation symbol string for search results, which is an array of pronunciation symbols expressing the pronunciation of the word string search result, against a recognition result pronunciation symbol string, which is an array of pronunciation symbols expressing the pronunciation of the voice recognition result of the input voice. Type: Application. Filed: December 2, 2010. Publication date: January 3, 2013. Applicant: SONY CORPORATION. Inventors: Hitoshi Honda, Yoshinori Maeda, Satoshi Asakawa
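Matching pronunciation-symbol arrays rather than word strings is the robustness trick here. A minimal sketch of one way to do it, using plain Levenshtein distance between symbol sequences (the distance measure is my assumption; the patent does not specify it):

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (strings or lists)."""
    dp = list(range(len(b) + 1))          # one-row dynamic-programming table
    for i, ca in enumerate(a, 1):
        prev = dp[0]
        dp[0] = i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            # deletion, insertion, substitution (0 cost if symbols match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
            prev = cur
    return dp[-1]

def best_matches(recognized, candidates):
    """Rank candidate pronunciation strings by closeness to the recognized one."""
    return sorted(candidates, key=lambda c: edit_distance(recognized, c))
```

Even if recognition mangles a symbol or two, the intended candidate usually stays at the top of the ranking.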
-
Patent number: 8346549. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker using the main speech recognizer and the supplemental speech recognizer in parallel, and combines results from the main and supplemental speech recognizers. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. Type: Grant. Filed: December 4, 2009. Date of Patent: January 1, 2013. Assignee: AT&T Intellectual Property I, L.P. Inventors: Andrej Ljolje, Mazin Gilbert
-
Patent number: 8346550. Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device. Type: Grant. Filed: February 14, 2011. Date of Patent: January 1, 2013. Assignee: AT&T Intellectual Property II, L.P. Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Patent number: 8340968. Abstract: A computer-implemented method for automatically training the diction of a person acquires a speech data stream of the person as the person is speaking, compares the words in the speech data stream to a set of predefined undesirable phrases provided in a look-up table, and, upon detection of one of the predefined undesirable phrases in the speech data stream, alerts the person with an alarm. Type: Grant. Filed: January 9, 2008. Date of Patent: December 25, 2012. Assignee: Lockheed Martin Corporation. Inventor: Vladimir Gershman
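The core of the method is a phrase lookup against the running transcript. A minimal sketch (the phrase list and matching rule are illustrative assumptions, not from the patent):

```python
# Illustrative look-up table of undesirable phrases; a real deployment
# would load this from configuration.
UNDESIRABLE = {"um", "uh", "like", "you know", "basically"}

def check_diction(transcript_words, lookup=UNDESIRABLE):
    """Return the undesirable phrases found in the transcript.

    A non-empty result is what would trigger the alarm. Padding with
    spaces gives whole-word/phrase matching, including multi-word phrases.
    """
    text = " ".join(w.lower() for w in transcript_words)
    return [phrase for phrase in lookup if f" {phrase} " in f" {text} "]

hits = check_diction("so um I basically agree".split())
```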
-
Publication number: 20120323573. Abstract: A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on the audio quality of the speech sample, the amount of speech in the speech sample, the degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring. Type: Application. Filed: March 23, 2012. Publication date: December 20, 2012. Inventors: Su-Youn Yoon, Derrick Higgins, Klaus Zechner, Shasha Xie, Je Hun Jeon, Keelan Evanini
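The scorability decision is a conjunction of filter checks over the extracted metrics. A minimal rule-based sketch, where the metric names and thresholds are my own illustrative assumptions:

```python
def is_scorable(sample):
    """Return True if the speech sample passes all scorability filters.

    sample: dict of metrics extracted from ASR and audio analysis.
    All field names and thresholds are illustrative.
    """
    checks = [
        sample["audio_quality"] >= 0.5,     # audio quality
        sample["speech_seconds"] >= 10.0,   # enough speech present
        sample["off_topic_score"] <= 0.8,   # not wildly off-topic
        sample["language"] == "en",         # correct language
        not sample["plagiarized"],          # no plagiarized material
    ]
    return all(checks)
```

A sample failing any single check would be flagged non-scorable rather than passed to the scoring model.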
-
Patent number: 8332221. Abstract: The invention relates to a method, a computer program product, a segmentation system, and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document. Type: Grant. Filed: August 15, 2011. Date of Patent: December 11, 2012. Assignee: Nuance Communications Austria GmbH. Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
-
Patent number: 8331583. Abstract: A noise reducing apparatus includes: a voice signal inputting unit inputting an input voice signal; a noise occurrence period detecting unit detecting a noise occurrence period; a noise removing unit removing noise for the noise occurrence period; a generation source signal acquiring unit acquiring a generation source signal with a time duration corresponding to the noise occurrence period; a pitch calculating unit calculating a pitch of an input voice signal interval; an interval signal setting unit setting interval signals divided in each unit period interval; an interpolation signal generating unit generating an interpolation signal with the time duration corresponding to the noise occurrence period by alternately arranging the interval signal in a forward time direction and the interval signal in a backward time direction; and a combining unit combining the interpolation signal and the input voice signal from which the noise is removed. Type: Grant. Filed: February 18, 2010. Date of Patent: December 11, 2012. Assignee: Sony Corporation. Inventor: Kazuhiko Ozawa
-
Patent number: 8321219. Abstract: Embodiments of the present invention improve methods of performing speech recognition using human gestures. In one embodiment, the present invention includes a speech recognition method comprising detecting a gesture, selecting a first recognition set based on the gesture, receiving a speech input signal, and recognizing the speech input signal in the context of the first recognition set. Type: Grant. Filed: September 25, 2008. Date of Patent: November 27, 2012. Assignee: Sensory, Inc. Inventor: Todd F. Mozer
-
Publication number: 20120296638. Abstract: In embodiments of the present invention, capabilities are described for understanding and responding to user intent and questions quickly, where the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions. Type: Application. Filed: May 18, 2012. Publication date: November 22, 2012. Inventor: Ashish Patwa
-
Publication number: 20120296648. Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton. Type: Application. Filed: July 30, 2012. Publication date: November 22, 2012. Applicant: AT&T Corp. Inventors: Mehryar Mohri, Michael Dennis Riley
-
Publication number: 20120290302. Abstract: A Chinese speech recognition system and method is disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for rear-stage voice conversion and voice synthesis. Type: Application. Filed: April 13, 2012. Publication date: November 15, 2012. Inventors: Jyh-Her Yang, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
-
Patent number: 8311842. Abstract: A method and apparatus for expanding the bandwidth of an input narrowband voice signal is provided. The narrowband voice signal is analyzed separately for each frame, and a Degree of Voicing (DV) and a Degree of Stationary (DS) are calculated from the analysis. A Degree of Difficulty of Bandwidth Expansion (DDBWE) of the narrowband voice signal is calculated based on DV and DS. Bandwidth expansion is controlled according to DDBWE. Type: Grant. Filed: March 3, 2008. Date of Patent: November 13, 2012. Assignee: Samsung Electronics Co., Ltd. Inventors: Geun-Bae Song, Min-Sung Kim, Hee-Jin Oh, Austin Kim, Jae-Bum Kim
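The abstract does not say how DV and DS combine into DDBWE; one plausible combination (entirely my assumption) is a weighted sum where less voiced, less stationary frames are treated as harder to expand:

```python
def ddbwe(dv, ds, w_voicing=0.5):
    """Illustrative Degree of Difficulty of Bandwidth Expansion.

    dv, ds: Degree of Voicing and Degree of Stationary, both in [0, 1].
    Low voicing and low stationarity each raise the difficulty; the
    weighting w_voicing and the linear form are assumptions, not the patent's formula.
    """
    return w_voicing * (1.0 - dv) + (1.0 - w_voicing) * (1.0 - ds)
```

A controller could then, for example, attenuate or skip expansion for frames whose DDBWE exceeds a threshold.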
-
Patent number: 8306817. Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement, in which feature-domain noise reduction in the form of Mel-frequency cepstra is performed based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion on the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition. Type: Grant. Filed: January 8, 2008. Date of Patent: November 6, 2012. Assignee: Microsoft Corporation. Inventors: Dong Yu, Alejandro Acero, James G. Droppo, Li Deng
-
Patent number: 8306818. Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers. Type: Grant. Filed: April 15, 2008. Date of Patent: November 6, 2012. Assignee: Microsoft Corporation. Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Publication number: 20120278075. Abstract: A system and method for collecting, from an ASR, a first rating of the intelligibility of human speech, and collecting a second intelligibility rating of such speech from networked listeners to the speech. The first rating and the second rating are weighted based on the importance of the ratings to a user, and a third rating is created from the two weighted ratings. Type: Application. Filed: April 25, 2012. Publication date: November 1, 2012. Inventors: Sherrie Ellen Shammass, Eyal Eshed, Ariel Velikovsky
-
Publication number: 20120253806. Abstract: A system and method for distributed speech recognition is provided. Audio data is obtained from a caller participating in a call with an agent. A main recognizer receives a main grammar template and the audio data. A plurality of secondary recognizers each receive the audio data and a reference that identifies a secondary grammar, which is a non-overlapping section of the main grammar template. Speech recognition is performed on each of the secondary recognizers, and speech recognition results are identified by applying the secondary grammar to the audio data. An n number of most likely speech recognition results are selected. The main recognizer constructs a new grammar based on the main grammar template, using the speech recognition results from each of the secondary recognizers as a new vocabulary. Further speech recognition results are identified by applying the new grammar to the audio data. Type: Application. Filed: June 18, 2012. Publication date: October 4, 2012. Inventor: Gilad Odinak
-
Publication number: 20120253807. Abstract: A speaker state detecting apparatus comprises: an audio input unit for acquiring, at least, a first voice emanated by a first speaker and a second voice emanated by a second speaker; a speech interval detecting unit for detecting an overlap period between a first speech period of the first speaker included in the first voice and a second speech period of the second speaker included in the second voice, which starts before the first speech period, or an interval between the first speech period and the second speech period; a state information extracting unit for extracting state information representing a state of the first speaker from the first speech period; and a state detecting unit for detecting the state of the first speaker in the first speech period based on the overlap period or the interval and the first state information. Type: Application. Filed: February 3, 2012. Publication date: October 4, 2012. Applicant: FUJITSU LIMITED. Inventor: Akira Kamano
-
Publication number: 20120253805. Abstract: Systems, methods, and media for determining fraud risk from audio signals and non-audio data are provided herein. Some exemplary methods include receiving an audio signal and an associated audio signal identifier, receiving a fraud event identifier associated with a fraud event, determining a speaker model based on the received audio signal, determining a channel model based on a path of the received audio signal, updating, using a server system, a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, and updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier. Type: Application. Filed: March 8, 2012. Publication date: October 4, 2012. Inventors: Anthony Rajakumar, Torsten Zeppenfeld, Lisa Guerra, Vipul Vyas
-
Patent number: 8265932. Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples, each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed, and each of the preceding and reference phrase segments is divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures, and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold. Type: Grant. Filed: October 3, 2011. Date of Patent: September 11, 2012. Assignee: Intellisist, Inc. Inventor: Martin R. M. Dunsmuir
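Selecting the reference-phrase DFT buffer least like any preceding-audio buffer can be sketched directly. This is a toy illustration under my own assumptions (naive DFT, normalized correlation of magnitude spectra as the similarity measure):

```python
import cmath
import math

def dft(buf):
    """Naive discrete Fourier transform of a real-valued buffer."""
    n = len(buf)
    return [sum(buf[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def correlation(a, b):
    """Normalized correlation of two magnitude spectra."""
    ma, mb = [abs(x) for x in a], [abs(x) for x in b]
    num = sum(x * y for x, y in zip(ma, mb))
    den = math.sqrt(sum(x * x for x in ma) * sum(y * y for y in mb))
    return num / den if den else 0.0

def pick_signature(reference_buffers, preceding_buffers):
    """Choose the reference-phrase DFT buffer least correlated with every
    preceding-segment DFT buffer (most dissimilar = best signature)."""
    ref_dfts = [dft(b) for b in reference_buffers]
    pre_dfts = [dft(b) for b in preceding_buffers]
    return min(ref_dfts, key=lambda r: max(correlation(r, p) for p in pre_dfts))

# Toy data: the preceding audio is a 1-cycle tone; one reference buffer
# repeats it, the other is a 5-cycle tone and should become the signature.
n = 16
tone1 = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
tone5 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
signature = pick_signature([tone1, tone5], [tone1])
```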
-
Patent number: 8255216. Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based, at least in part, on a weighting of the individual characters that comprise the known character sequence. Type: Grant. Filed: October 30, 2006. Date of Patent: August 28, 2012. Assignee: Nuance Communications, Inc. Inventor: Kenneth D. White
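Per-character weighting of a candidate sequence can be sketched as a weighted match ratio. The weight table and scoring formula below are illustrative assumptions, not the patent's method:

```python
def score_sequence(candidate, observed, char_weight):
    """Score a known character sequence against the recognized characters.

    char_weight: per-character weight (e.g. acoustically confusable
    characters weighted higher); unknown characters default to 1.0.
    Returns the weighted fraction of positions that match.
    """
    total = sum(char_weight.get(c, 1.0) for c in candidate)
    matched = sum(char_weight.get(c, 1.0)
                  for c, o in zip(candidate, observed) if c == o)
    return matched / total if total else 0.0
```

Ranking all selected known sequences by this score would pick the best-matching spelling for the utterance.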
-
Publication number: 20120209603. Abstract: Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold. Type: Application. Filed: January 9, 2012. Publication date: August 16, 2012. Inventor: Zhinian Jing
-
Publication number: 20120197641. Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of the subbands adjacent to one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband. Type: Application. Filed: February 1, 2012. Publication date: August 2, 2012. Applicant: JVC KENWOOD Corporation. Inventors: Akiko Akechi, Takaaki Yamabe
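The comparison rule in this abstract is simple enough to sketch directly: split the spectrum into consecutive subbands, average the energy in each, and flag the frame when a higher subband out-energizes the one below it (the equal-width subbands and "any pair" rule are my simplifications):

```python
def subband_energies(power_spectrum, n_subbands):
    """Average energy per subband, using equal-width subbands."""
    width = len(power_spectrum) // n_subbands
    return [sum(power_spectrum[i * width:(i + 1) * width]) / width
            for i in range(n_subbands)]

def has_consonant(power_spectrum, n_subbands=4):
    """Flag a frame as consonant-like if any higher subband has more
    average energy than the consecutive subband just below it."""
    e = subband_energies(power_spectrum, n_subbands)
    return any(e[i + 1] > e[i] for i in range(len(e) - 1))
```

Consonants tend to concentrate energy at high frequencies, so a spectrum that rises with frequency trips the detector while a typical voiced (falling) spectrum does not.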
-
Patent number: 8233590. Abstract: The present invention relates to a method of automatically controlling the volume level of communication speech for Mean Opinion Score (MOS) measurement, which, before evaluating the quality of communication speech using a MOS measurement method, automatically controls the volume level of actual communication speech to a predetermined optimal level, thus improving the reliability of MOS values. Type: Grant. Filed: November 28, 2006. Date of Patent: July 31, 2012. Assignee: Innowireless Co., Ltd. Inventors: Jong Tae Chung, Jin Soup Joung, Young Su Kwak, Jin Man Kim, Hyun Seok Cho
-
Patent number: 8229744. Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender-independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided. Type: Grant. Filed: August 26, 2003. Date of Patent: July 24, 2012. Assignee: Nuance Communications, Inc. Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
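Averaging gender-dependent GMMs with a class probability can be sketched as mixing the two models' likelihoods. This is only an illustration of the idea with 1-D Gaussians; the model format and mixing rule are my assumptions:

```python
import math

def gaussian(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_averaged_likelihood(x, p_female, female_model, male_model):
    """Weight each gender model's GMM likelihood by the current class
    probability p_female, so decoding degrades gracefully when the
    gender decision is uncertain.

    Each model is a list of (weight, mean, variance) mixture components.
    """
    lf = sum(w * gaussian(x, m, v) for w, m, v in female_model)
    lm = sum(w * gaussian(x, m, v) for w, m, v in male_model)
    return p_female * lf + (1.0 - p_female) * lm
```

With p_female near 0.5 the score falls back toward a gender-independent average, which is the behavior that avoids the cross-gender deterioration mentioned above.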
-
Publication number: 20120185251. Abstract: A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates. Type: Application. Filed: March 26, 2012. Publication date: July 19, 2012. Applicant: HOSHIKO LLC. Inventor: Gary Stephen Shuster
-
Patent number: 8224644. Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network-based speech recognition. An utterance comprising a query is received in a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, then a query is sent to cancel the results processing on the network server, and the cached results data is displayed on the client computing device. Type: Grant. Filed: December 18, 2008. Date of Patent: July 17, 2012. Assignee: Microsoft Corporation. Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
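The cache-then-cancel flow can be sketched with a small client-side cache keyed by the speech profile. Class and callback names are illustrative, not from the patent:

```python
class UtteranceCache:
    """Client-side cache keyed by a speech profile (any hashable digest)."""
    def __init__(self):
        self._store = {}

    def lookup(self, profile):
        return self._store.get(profile)

    def save(self, profile, results):
        self._store[profile] = results

def handle_query(profile, cache, send_to_server, cancel_server):
    """Return results from the cache if present (cancelling the server
    round-trip), otherwise fetch from the server and cache the results."""
    cached = cache.lookup(profile)
    if cached is not None:
        cancel_server()            # server no longer needs to process this query
        return cached
    results = send_to_server()
    cache.save(profile, results)
    return results

# Simulated flow: first query hits the server, the repeat is served locally.
cache = UtteranceCache()
calls = []

def fake_server():
    calls.append("server")
    return ["weather today"]

def fake_cancel():
    calls.append("cancel")

first = handle_query("profile-a", cache, fake_server, fake_cancel)
second = handle_query("profile-a", cache, fake_server, fake_cancel)
```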
-
Patent number: 8219386. Abstract: The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, plural Awzan) of the Arabic poem units (Bayt, plural Abyate) are identified. A spoken or written poem is accepted as input. A coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper and improper spoken poetic meter. Errors in the poem meters (Bahr, plural Buhur) and the ending rhyme pattern, “Qafiya,” are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in the composition of poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be in a stand-alone device or integrated with other computing devices. Type: Grant. Filed: January 21, 2009. Date of Patent: July 10, 2012. Assignee: King Fahd University of Petroleum and Minerals. Inventors: Al-Zahrani Abdul Kareem Saleh, Moustafa Elshafei
-
Patent number: 8214211. Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating the similarity of the input sound to a male speaker sound model. A female voice index calculator calculates a female voice index indicating the similarity of the input sound to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound, which may be either the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound, based on the male voice index and the female voice index, when the first discriminator detects a human voice sound. Type: Grant. Filed: August 26, 2008. Date of Patent: July 3, 2012. Assignee: Yamaha Corporation. Inventor: Yasuo Yoshioka
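The two-stage decision can be sketched in a few lines. The threshold and the max-rule for stage 1 are my illustrative assumptions; the patent only specifies the staged structure:

```python
def classify_sound(male_index, female_index, human_threshold=0.5):
    """Two-stage discrimination as in the abstract.

    Stage 1: if neither gender index is strong, call it non-human.
    Stage 2: otherwise decide gender by comparing the two indexes.
    The 0.5 threshold and max-rule are illustrative assumptions.
    """
    if max(male_index, female_index) < human_threshold:
        return "non-human"
    return "male" if male_index >= female_index else "female"
```

Gating the gender decision behind the human/non-human check avoids forcing a male/female label onto background noise.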
-
Patent number: 8214210. Abstract: A system for processing a query operates by receiving a first query segment that includes audio speech. Next, the system generates a representation for this first query segment, where the representation includes at least two paths associated with alternative phrase sequences for an ambiguity in the audio speech. The system then compares the paths in the representation to a group of documents and determines matching scores for the group of documents based on the comparisons. Finally, the system presents a ranking of the group of documents, where the ranking is based on the matching scores for the group of documents. Type: Grant. Filed: September 19, 2006. Date of Patent: July 3, 2012. Assignee: Oracle America, Inc. Inventor: William A. Woods
-
Patent number: 8209172. Abstract: Pattern recognition that is robust to variance in the input pattern is performed at low processing cost while the possibility of identification errors is decreased. In a pattern recognition apparatus which identifies the pattern of input data from a data input unit (11) by using a hierarchical feature extraction processor (12) which hierarchically extracts features, an extraction result distribution analyzer (13) analyzes the distribution of at least one feature extraction result obtained by a primary feature extraction processor (121). On the basis of the analytical result, a secondary feature extraction processor (122) performs predetermined secondary feature extraction. Type: Grant. Filed: December 16, 2004. Date of Patent: June 26, 2012. Assignee: Canon Kabushiki Kaisha. Inventors: Yusuke Mitarai, Masakazu Matsuga, Katsuhiko Mori
-
Patent number: 8204738. Abstract: A method of removing bias from an action classifier within a natural language understanding system can include identifying a sentence having a target embedded grammar that overlaps with at least one other embedded grammar and selecting a group of overlapping embedded grammars including the target embedded grammar and at least one additional embedded grammar. A sentence expansion can be created that includes the sentence containing the target embedded grammar and a copy of the sentence for each additional embedded grammar of the group. Each copy of the sentence can include a different additional embedded grammar from the group in place of the target embedded grammar. The sentence expansion can be included within action classifier training data. Type: Grant. Filed: November 3, 2006. Date of Patent: June 19, 2012. Assignee: Nuance Communications, Inc. Inventor: Ilya Skuratovsky
-
Publication number: 20120150539. Abstract: The method of the present invention may include receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, equal to the output first language model weight, to the word lattice. Type: Application. Filed: December 13, 2011. Publication date: June 14, 2012. Applicant: Electronics and Telecommunications Research Institute. Inventors: Hyung Bae Jeon, Yun Keun Lee, Eui Sok Chung, Jong Jin Kim, Hoon Chung, Jeon Gue Park, Ho Young Jung, Byung Ok Kang, Ki Young Park, Sung Joo Lee, Jeom Ja Kang, Hwa Jeon Song
-
Patent number: 8200487Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The resulting segmentation and label assignment is provided to the user for review. Additionally, alternative segmentations and label assignments are presented, from which the user can select, or the user can enter a user-defined segmentation and user-defined labels. In response to the modifications introduced by the user, a plurality of different actions is initiated, incorporating the re-segmentation and re-labelling of successive parts of the document or of the entire document.Type: GrantFiled: November 12, 2004Date of Patent: June 12, 2012Assignee: Nuance Communications Austria GmbHInventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
-
Patent number: 8200486Abstract: Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns (“SASPs”) for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms (“SPTs”) are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.Type: GrantFiled: June 5, 2003Date of Patent: June 12, 2012Assignee: The United States of America as represented by the Administrator of the National Aeronautics & Space Administration (NASA)Inventors: Charles C. Jorgensen, Diana D. Lee, Shane T. Agabon
-
Patent number: 8200488Abstract: The invention provides a method for processing speech comprising the steps of receiving a speech input (SI) of a speaker, generating speech parameters (SP) from said speech input (SI), determining parameters describing an absolute loudness (L) of said speech input (SI), and evaluating (EV) said speech input (SI) and/or said speech parameters (SP) using said parameters describing the absolute loudness (L). In particular, the step of evaluation (EV) comprises a step of emotion recognition and/or speaker identification. Further, a microphone array comprising a plurality of microphones is used for determining said parameters describing the absolute loudness. With a microphone array the distance of the speaker from the microphone array can be determined and the loudness can be normalized by the distance.Type: GrantFiled: December 10, 2003Date of Patent: June 12, 2012Assignee: Sony Deutschland GmbHInventors: Thomas Kemp, Ralf Kompe, Raquel Tato
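The distance-based loudness normalization described above can be illustrated under a free-field assumption, where sound pressure level falls by about 6 dB per doubling of distance. This is a minimal sketch of that physical relationship only; the function name and the 1 m reference distance are assumptions, and the patent's actual normalization may differ.

```python
import math

# Sketch of distance-normalized ("absolute") loudness: once a microphone
# array has estimated the speaker's distance, the measured level can be
# referred back to a reference distance using the free-field 1/r law.

def absolute_loudness_db(measured_db, distance_m, ref_distance_m=1.0):
    """Add back the ~6 dB lost per doubling of distance in free field."""
    return measured_db + 20.0 * math.log10(distance_m / ref_distance_m)

# A talker measured at 54 dB from 2 m away corresponds to roughly 60 dB
# at the 1 m reference distance.
normalized = absolute_loudness_db(54.0, 2.0)
```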
-
Patent number: 8185390Abstract: The invention comprises a method for lossy data compression, akin to vector quantization, in which there is no explicit codebook and no search, i.e. the codebook memory and associated search computation are eliminated. Some memory and computation are still required, but these are dramatically reduced, compared to systems that do not exploit this method. For this reason, both the memory and computation requirements of the method are exponentially smaller than comparable methods that do not exploit the invention. Because there is no explicit codebook to be stored or searched, no such codebook need be generated either. This makes the method well suited to adaptive coding schemes, where the compression system adapts to the statistics of the data presented for processing: both the complexity of the algorithm executed for adaptation, and the amount of data transmitted to synchronize the sender and receiver, are exponentially smaller than comparable existing methods.Type: GrantFiled: April 23, 2009Date of Patent: May 22, 2012Assignee: Promptu Systems CorporationInventor: Harry Printz
-
Patent number: 8185392Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving voice queries, obtaining, for one or more of the voice queries, feedback information that references an action taken by a user that submitted the voice query after reviewing a result of the voice query, generating, for the one or more voice queries, a posterior recognition confidence measure that reflects a probability that the voice query was correctly recognized, wherein the posterior recognition confidence measure is generated based at least on the feedback information for the voice query, selecting a subset of the one or more voice queries based on the posterior recognition confidence measures, and adapting an acoustic model using the subset of the voice queries.Type: GrantFiled: September 30, 2011Date of Patent: May 22, 2012Assignee: Google Inc.Inventors: Brian Strope, Douglas H. Beeferman
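The selection step, feedback plus recognizer confidence combined into a posterior measure used to filter queries for adaptation, can be sketched as below. The specific weighting (halving the confidence when the user took no action on the result) is an invented stand-in for the patent's actual posterior measure, and all field names are hypothetical.

```python
# Illustrative sketch: derive a posterior recognition confidence from the
# recognizer's own confidence and post-query user feedback, then keep only
# the confidently-recognized queries for acoustic-model adaptation.

def select_for_adaptation(queries, threshold=0.8):
    """Return transcripts of queries whose posterior confidence clears the bar."""
    selected = []
    for q in queries:
        # A click on a result is treated as implicit confirmation; absence
        # of a click discounts the recognizer's confidence (assumed scheme).
        posterior = q["asr_confidence"] * (1.0 if q["user_clicked_result"] else 0.5)
        if posterior >= threshold:
            selected.append(q["text"])
    return selected

queries = [
    {"text": "navigate home", "asr_confidence": 0.9, "user_clicked_result": True},
    {"text": "call bob", "asr_confidence": 0.9, "user_clicked_result": False},
]
subset = select_for_adaptation(queries)
```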
-
Patent number: 8180637Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.Type: GrantFiled: December 3, 2007Date of Patent: May 15, 2012Assignee: Microsoft CorporationInventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
-
Patent number: 8175730Abstract: In order to analyze an information signal, significant short-time spectra are extracted from the information signal; the extractor is configured to select those short-time spectra which come closer to a specific characteristic than other short-time spectra of the information signal. The extracted short-time spectra are then decomposed into component signals using ICA analysis, with each component-signal spectrum representing a profile spectrum of a tone source which generates a tone corresponding to the sought-for characteristic. From a sequence of short-time spectra of the information signal and from the determined profile spectra, an amplitude envelope is finally calculated for each profile spectrum, the amplitude envelope indicating how the overall contribution of a tone source's profile spectrum changes over time.Type: GrantFiled: June 30, 2009Date of Patent: May 8, 2012Assignee: SONY CorporationInventors: Christian Dittmar, Christian Uhle, Jürgen Herre
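The final envelope-calculation step can be illustrated under an assumed linear mixing model: if each short-time magnitude spectrum is approximately a non-negative combination of the profile spectra, each source's amplitude envelope falls out of a per-frame least-squares fit. This is a sketch of that formulation only, not the patented algorithm; the function name and the toy profiles are invented.

```python
import numpy as np

# Sketch: given profile spectra P (frequency x sources) and a sequence of
# short-time magnitude spectra S (frequency x frames), solve P @ E ≈ S for
# the envelope matrix E (sources x frames) and clip it to be non-negative.

def amplitude_envelopes(profiles, spectrogram):
    """Least-squares amplitude envelope of each profile spectrum over time."""
    env, _, _, _ = np.linalg.lstsq(profiles, spectrogram, rcond=None)
    return np.clip(env, 0.0, None)  # magnitudes cannot be negative

# Two toy profile spectra over three frequency bins, mixed with known envelopes.
profiles = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
true_env = np.array([[2.0, 0.5],
                     [1.0, 3.0]])
frames = profiles @ true_env
env = amplitude_envelopes(profiles, frames)
```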
-
Patent number: 8175236Abstract: Providing for inter-working between SMS network architectures and IMS network architectures in a mobile environment is described herein. By way of example, a next generation (NG) short message service center (SMSC) is provided that can receive SMS messages in mobile application protocol (MAP) and convert such messages to IMS protocol. In addition, the NG SMSC can also receive IMS data and convert the IMS data to an SMS MAP message. The NG SMSC can reference an IMS or an SMS location registry to determine a location of the target device, and convert from IMS to SMS MAP, and vice versa, as suitable. Accordingly, the NG SMSC can provide an efficient interface between legacy SMS and NG IMS network components while preserving legacy protocols associated with such networks.Type: GrantFiled: January 15, 2008Date of Patent: May 8, 2012Assignee: AT&T Mobility II LLCInventors: Vinod Kumar Pandey, Karl J. Schlieber, Matthew Wayne Stafford, Jianrong Wang
-
Publication number: 20120109649Abstract: Automatic speech recognition including receiving speech via a microphone, pre-processing the received speech to generate acoustic feature vectors, classifying dialect of the received speech, selecting at least one of an acoustic model or a lexicon specific to the classified dialect, decoding the acoustic feature vectors using a processor and at least one of the selected dialect-specific acoustic model or selected lexicon to produce a plurality of hypotheses for the received speech, and post-processing the plurality of hypotheses to identify one of the plurality of hypotheses as the received speech.Type: ApplicationFiled: November 1, 2010Publication date: May 3, 2012Applicant: GENERAL MOTORS LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan
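The model-selection step, a classified dialect keying into dialect-specific resources, is simple to sketch. Every label, file name, and the general-model fallback below is invented for illustration; the publication does not specify this data layout.

```python
# Illustrative sketch of selecting a dialect-specific acoustic model and
# lexicon once the dialect classifier has produced a label.

DIALECT_RESOURCES = {
    "en-US-south":   {"acoustic_model": "am_south.bin",   "lexicon": "lex_south.dict"},
    "en-US-midwest": {"acoustic_model": "am_midwest.bin", "lexicon": "lex_midwest.dict"},
}
GENERAL = {"acoustic_model": "am_general.bin", "lexicon": "lex_general.dict"}

def select_resources(dialect_label):
    """Fall back to the general model when the dialect is unrecognized."""
    return DIALECT_RESOURCES.get(dialect_label, GENERAL)

chosen = select_resources("en-US-south")
```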
-
Publication number: 20120095762Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech while also reflecting at least one frame positioned before the current frame of the first speech.Type: ApplicationFiled: October 19, 2011Publication date: April 19, 2012Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.Inventors: Ki-wan EOM, Chang-woo HAN, Tae-gyoon KANG, Nam-soo KIM, Doo-hwa HONG, Jae-won LEE, Hyung-joon LIM
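A context-aware frame conversion of this kind can be sketched with an assumed linear form, where each converted frame mixes in the previously converted frame. The matrix conversion rule, the recursive feedback, and the `prev_weight` blending factor are all assumptions made for the sake of a concrete example, not the application's actual conversion rules.

```python
import numpy as np

# Sketch: convert each frame of the first speech into a frame of the second
# speech via a conversion matrix, reflecting at least one previously
# positioned frame by blending in the prior converted frame.

def convert_frames(frames, conversion_matrix, prev_weight=0.3):
    """Convert frames one by one, carrying context from the previous output."""
    converted = []
    prev = np.zeros(frames.shape[1])
    for f in frames:
        out = conversion_matrix @ f + prev_weight * prev
        converted.append(out)
        prev = out
    return np.array(converted)

frames = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
converted = convert_frames(frames, np.eye(2), prev_weight=0.5)
```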
-
Patent number: 8160866Abstract: The present invention can recognize both English and Chinese at the same time. The key technique is that the features of all English words (for which no samples exist) are extracted entirely from the features of Chinese syllables. The invention normalizes the signal waveforms of variable lengths for English words (Chinese syllables) such that the same words (syllables) have the same features at the same time position. Hence the Bayesian classifier can recognize both fast and slow utterances of sentences. The invention can improve the features such that the speech recognition of unknown English (Chinese) is guaranteed to be correct. Furthermore, since the invention can create the features of English words from the features of Chinese syllables, it can also create the features of other languages from the features of Chinese syllables and hence can also recognize other languages, such as German, French, Japanese, Korean, Russian, etc.Type: GrantFiled: October 10, 2008Date of Patent: April 17, 2012Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Patent number: 8155962Abstract: The methods and systems described herein may asynchronously process natural language utterances to provide real-time response performance and natural interaction with users. In particular, the methods and systems described herein may use various natural language speech recognition and interpretation components to identify a request (e.g., a query or command) in an utterance. The request identified in the utterance may then be processed with one or more domain agents, which may submit duplicate queries to multiple different data sources to process the request. The domain agents may then asynchronously evaluate responses to the duplicate queries to return results to users in a timely and natural manner, and further to account for the fact that the different data sources may respond to the queries at different speeds, provide unsatisfactory responses to the queries, or fail to respond to the queries at all.Type: GrantFiled: July 19, 2010Date of Patent: April 10, 2012Assignee: VoiceBox Technologies, Inc.Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
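The asynchronous fan-out this abstract describes, duplicate queries to several data sources, with slow or failing sources tolerated, can be sketched with a thread pool. The source functions, the use of `None` to model an unsatisfactory response, and the first-usable-answer policy are all illustrative assumptions rather than the patented behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch: submit the same request to every data source concurrently and
# return the first satisfactory answer; failed or slow sources are skipped.

def query_sources(request, sources, timeout_s=2.0):
    """Fan the request out to every source; return (source_name, answer)."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(fn, request): name for name, fn in sources.items()}
        for future in as_completed(futures, timeout=timeout_s):
            try:
                result = future.result()
            except Exception:
                continue                # a failed source must not block the rest
            if result is not None:      # None models an unsatisfactory response
                return futures[future], result
    return None, None

def fast_source(q):
    return "answer to " + q

def broken_source(q):
    raise RuntimeError("source unavailable")

name, answer = query_sources("weather", {"fast": fast_source,
                                         "broken": broken_source})
```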
-
Patent number: 8155959Abstract: Systems and methods are described that automatically control modules of dialog systems. The systems and methods include a dialog module that receives and processes utterances from a speaker and outputs data used to generate synthetic speech outputs as responses to the utterances. A controller is coupled to the dialog module, and the controller detects an abnormal output of the dialog module when the dialog module is processing in an automatic mode. The controller comprises a mode control for an agent to control the dialog module by correcting the abnormal output and transferring a corrected output to a downstream dialog module that follows, in a processing path, the dialog module. The corrected output is used in further processing the utterances.Type: GrantFiled: November 7, 2007Date of Patent: April 10, 2012Assignee: Robert Bosch GmbHInventors: Fuliang Weng, Baoshi Yan, Zhe Feng