Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 8364484. Abstract: An input voice is detected after starting a voice input waiting state; the detected voice is recognized; an elapsed time from the start of the voice input waiting state is counted; an informative sound which urges the user to input the voice is output when the elapsed time reaches a preset output set time; and the output of the informative sound is stopped when the elapsed time at the time of inputting the voice is shorter than the output set time. Type: Grant. Filed: April 14, 2009. Date of Patent: January 29, 2013. Assignee: Kabushiki Kaisha Toshiba. Inventors: Takehide Yano, Tadashi Amada, Kazunori Imoto, Koichi Yamamoto
-
Patent number: 8359199. Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system, particularly suited for use in a wireless communication system, operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single-word and “string” tests of the deletion technique. Type: Grant. Filed: November 29, 2011. Date of Patent: January 22, 2013. Assignee: AT&T Intellectual Property II, L.P. Inventors: Richard Vandervoort Cox, Hong Kook Kim
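The deletion idea above reduces to filtering the feature-frame sequence by an erasure flag. A minimal sketch (function and variable names are illustrative, not from the patent):

```python
def drop_erased_frames(features, erasure_flags):
    """Shorten the observation sequence by deleting frames flagged as erased.

    features: list of per-frame feature vectors
    erasure_flags: list of booleans, True where a frame erasure was declared
    """
    return [f for f, erased in zip(features, erasure_flags) if not erased]

# Four frames, two of them declared erased; the recognizer then sees a
# shorter observation sequence instead of corrupted features.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
flags = [False, True, False, True]
kept = drop_erased_frames(frames, flags)
```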
-
Patent number: 8352257. Abstract: The present system proposes a technique called the spectro-temporal varying technique to compute the suppression gain. This method is motivated by the perceptual properties of the human auditory system; specifically, that the human ear has higher frequency resolution in the lower frequency bands and less frequency resolution in the higher frequencies, and also that the important speech information in the high frequencies is consonants, which usually have a random-noise spectral shape. A second property of the human auditory system is that the human ear has lower temporal resolution in the lower frequencies and higher temporal resolution in the higher frequencies. Based on that, the system uses a spectro-temporal varying method which introduces the concept of frequency smoothing by modifying the estimation of the a posteriori SNR. In addition, the system also makes the a priori SNR time-smoothing factor depend on frequency. Type: Grant. Filed: December 20, 2007. Date of Patent: January 8, 2013. Assignee: QNX Software Systems Limited. Inventors: Phil A. Hetherington, Xueman Li
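The two ingredients named in the abstract — frequency smoothing of the a posteriori SNR and a frequency-dependent time-smoothing factor for the a priori SNR — can be sketched as follows. This is only a plausible decision-directed-style illustration under assumptions of my own (window width, per-bin alphas), not the patented formulas:

```python
def smooth_posteriori_snr(gamma, width):
    """Average the a posteriori SNR over a window of neighboring frequency bins."""
    out = []
    n = len(gamma)
    for k in range(n):
        lo, hi = max(0, k - width), min(n, k + width + 1)
        out.append(sum(gamma[lo:hi]) / (hi - lo))
    return out

def a_priori_snr(gamma_smoothed, prev_xi, alphas):
    """Decision-directed a priori SNR estimate where the time-smoothing
    factor alphas[k] varies with frequency (higher alpha = slower tracking)."""
    return [a * px + (1.0 - a) * max(g - 1.0, 0.0)
            for a, px, g in zip(alphas, prev_xi, gamma_smoothed)]
```

A low-frequency bin might use a small alpha (faster tracking in time) and a high-frequency bin a larger one, mirroring the temporal-resolution argument in the abstract.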
-
Patent number: 8351706. Abstract: Document data corresponding to each page included in a document is stored, and furthermore, feature data indicative of a feature of the document data and a document index indicating the document are associated with the document data. A document extracting apparatus obtains input document data, calculates feature data from the input document data, judges similarity between the input document data and the document data based on the feature data, obtains a document index associated with document data similar to the input document data, and extracts a plurality of pieces of document data associated with the document index. Thus, document data concerning the document including a page corresponding to the document data similar to the input document data is extracted for a plurality of pages. Type: Grant. Filed: July 23, 2008. Date of Patent: January 8, 2013. Assignee: Sharp Kabushiki Kaisha. Inventor: Hitoshi Hirohata
-
Patent number: 8352263. Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by the F unknown voices will be arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since we only find the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, our recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and input many more words without using samples. Type: Grant. Filed: September 29, 2009. Date of Patent: January 8, 2013. Inventors: Tze-Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Publication number: 20130006629. Abstract: The present invention relates to a searching device, searching method, and program whereby searching for a word string corresponding to an input voice can be performed in a robust manner. A voice recognition unit 11 subjects an input voice to voice recognition. For each of multiple word strings for search results (word strings that are candidate search results for the word string corresponding to the input voice), a matching unit 16 matches a pronunciation symbol string for search results, which is an array of pronunciation symbols expressing the pronunciation of the word string search result, against a recognition result pronunciation symbol string, which is an array of pronunciation symbols expressing the pronunciation of the voice recognition result of the input voice. Type: Application. Filed: December 2, 2010. Publication date: January 3, 2013. Applicant: SONY CORPORATION. Inventors: Hitoshi Honda, Yoshinori Maeda, Satoshi Asakawa
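Matching pronunciation-symbol arrays rather than word strings is the robustness trick here. A minimal sketch of one way to do it, using plain Levenshtein distance between symbol sequences (the distance measure is my assumption; the patent does not specify it):

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (strings or lists)."""
    dp = list(range(len(b) + 1))          # one-row dynamic-programming table
    for i, ca in enumerate(a, 1):
        prev = dp[0]
        dp[0] = i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            # deletion, insertion, substitution (0 cost if symbols match)
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
            prev = cur
    return dp[-1]

def best_matches(recognized, candidates):
    """Rank candidate pronunciation strings by closeness to the recognized one."""
    return sorted(candidates, key=lambda c: edit_distance(recognized, c))
```

Even if recognition mangles a symbol or two, the intended candidate usually stays at the top of the ranking.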
-
Patent number: 8346549. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker using the main speech recognizer and the supplemental speech recognizer in parallel, and combines results from the main and supplemental speech recognizers. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer. Type: Grant. Filed: December 4, 2009. Date of Patent: January 1, 2013. Assignee: AT&T Intellectual Property I, L.P. Inventors: Andrej Ljolje, Mazin Gilbert
-
Patent number: 8346550. Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device. Type: Grant. Filed: February 14, 2011. Date of Patent: January 1, 2013. Assignee: AT&T Intellectual Property II, L.P. Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Patent number: 8340968. Abstract: A computer-implemented method for automatically training the diction of a person acquires a speech data stream of the person as the person is speaking, compares the words in the speech data stream to a set of predefined undesirable phrases provided in a look-up table, and, upon detection of one of the predefined undesirable phrases in the speech data stream, alerts the person with an alarm. Type: Grant. Filed: January 9, 2008. Date of Patent: December 25, 2012. Assignee: Lockheed Martin Corporation. Inventor: Vladimir Gershman
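The core of the method is a phrase lookup against the running transcript. A minimal sketch (the phrase list and matching rule are illustrative assumptions, not from the patent):

```python
# Illustrative look-up table of undesirable phrases; a real deployment
# would load this from configuration.
UNDESIRABLE = {"um", "uh", "like", "you know", "basically"}

def check_diction(transcript_words, lookup=UNDESIRABLE):
    """Return the undesirable phrases found in the transcript.

    A non-empty result is what would trigger the alarm. Padding with
    spaces gives whole-word/phrase matching, including multi-word phrases.
    """
    text = " ".join(w.lower() for w in transcript_words)
    return [phrase for phrase in lookup if f" {phrase} " in f" {text} "]

hits = check_diction("so um I basically agree".split())
```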
-
Publication number: 20120323573. Abstract: A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on the audio quality of the speech sample, the amount of speech in the speech sample, the degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring. Type: Application. Filed: March 23, 2012. Publication date: December 20, 2012. Inventors: Su-Youn Yoon, Derrick Higgins, Klaus Zechner, Shasha Xie, Je Hun Jeon, Keelan Evanini
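The scorability decision is a conjunction of filter checks over the extracted metrics. A minimal rule-based sketch, where the metric names and thresholds are my own illustrative assumptions:

```python
def is_scorable(sample):
    """Return True if the speech sample passes all scorability filters.

    sample: dict of metrics extracted from ASR and audio analysis.
    All field names and thresholds are illustrative.
    """
    checks = [
        sample["audio_quality"] >= 0.5,     # audio quality
        sample["speech_seconds"] >= 10.0,   # enough speech present
        sample["off_topic_score"] <= 0.8,   # not wildly off-topic
        sample["language"] == "en",         # correct language
        not sample["plagiarized"],          # no plagiarized material
    ]
    return all(checks)
```

A sample failing any single check would be flagged non-scorable rather than passed to the scoring model.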
-
Patent number: 8332221. Abstract: The invention relates to a method, a computer program product, a segmentation system, and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The performed segmentation and assignment is provided to a user for general review. Additionally, alternative segmentations and label assignments are provided to the user, who can select alternative segmentations and alternative labels as well as enter a user-defined segmentation and user-defined label. In response to the modifications introduced by the user, a plurality of different actions are initiated, incorporating the re-segmentation and re-labeling of successive parts of the document or the entire document. Type: Grant. Filed: August 15, 2011. Date of Patent: December 11, 2012. Assignee: Nuance Communications Austria GmbH. Inventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
-
Patent number: 8331583. Abstract: A noise reducing apparatus includes: a voice signal inputting unit inputting an input voice signal; a noise occurrence period detecting unit detecting a noise occurrence period; a noise removing unit removing noise for the noise occurrence period; a generation source signal acquiring unit acquiring a generation source signal with a time duration corresponding to the noise occurrence period; a pitch calculating unit calculating a pitch of an input voice signal interval; an interval signal setting unit setting interval signals divided in each unit period interval; an interpolation signal generating unit generating an interpolation signal with the time duration corresponding to the noise occurrence period by alternately arranging the interval signal in a forward time direction and the interval signal in a backward time direction; and a combining unit combining the interpolation signal and the input voice signal from which the noise is removed. Type: Grant. Filed: February 18, 2010. Date of Patent: December 11, 2012. Assignee: Sony Corporation. Inventor: Kazuhiko Ozawa
-
Patent number: 8321219. Abstract: Embodiments of the present invention improve methods of performing speech recognition using human gestures. In one embodiment, the present invention includes a speech recognition method comprising detecting a gesture, selecting a first recognition set based on the gesture, receiving a speech input signal, and recognizing the speech input signal in the context of the first recognition set. Type: Grant. Filed: September 25, 2008. Date of Patent: November 27, 2012. Assignee: Sensory, Inc. Inventor: Todd F. Mozer
-
Publication number: 20120296638. Abstract: In embodiments of the present invention, capabilities are described for understanding and responding to user intent and questions quickly, where the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions. Type: Application. Filed: May 18, 2012. Publication date: November 22, 2012. Inventor: Ashish Patwa
-
Publication number: 20120296648. Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton. Type: Application. Filed: July 30, 2012. Publication date: November 22, 2012. Applicant: AT&T Corp. Inventors: Mehryar Mohri, Michael Dennis Riley
-
Publication number: 20120290302. Abstract: A Chinese speech recognition system and method is disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for rear-stage voice conversion and voice synthesis. Type: Application. Filed: April 13, 2012. Publication date: November 15, 2012. Inventors: Jyh-Her Yang, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
-
Patent number: 8311842. Abstract: A method and apparatus for expanding the bandwidth of an input narrowband voice signal is provided. The narrowband voice signal is analyzed separately for each frame, and a Degree of Voicing (DV) and a Degree of Stationary (DS) are calculated from the analysis. A Degree of Difficulty of Bandwidth Expansion (DDBWE) of the narrowband voice signal is calculated based on DV and DS. Bandwidth expansion is controlled according to DDBWE. Type: Grant. Filed: March 3, 2008. Date of Patent: November 13, 2012. Assignee: Samsung Electronics Co., Ltd. Inventors: Geun-Bae Song, Min-Sung Kim, Hee-Jin Oh, Austin Kim, Jae-Bum Kim
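The abstract does not say how DV and DS combine into DDBWE; one plausible combination (entirely my assumption) is a weighted sum where less voiced, less stationary frames are treated as harder to expand:

```python
def ddbwe(dv, ds, w_voicing=0.5):
    """Illustrative Degree of Difficulty of Bandwidth Expansion.

    dv, ds: Degree of Voicing and Degree of Stationary, both in [0, 1].
    Low voicing and low stationarity each raise the difficulty; the
    weighting w_voicing and the linear form are assumptions, not the patent's formula.
    """
    return w_voicing * (1.0 - dv) + (1.0 - w_voicing) * (1.0 - ds)
```

A controller could then, for example, attenuate or skip expansion for frames whose DDBWE exceeds a threshold.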
-
Patent number: 8306817. Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement, in which feature-domain noise reduction in the form of Mel-frequency cepstra is performed based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion on the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition. Type: Grant. Filed: January 8, 2008. Date of Patent: November 6, 2012. Assignee: Microsoft Corporation. Inventors: Dong Yu, Alejandro Acero, James G. Droppo, Li Deng
-
Patent number: 8306818. Abstract: Methods are disclosed for estimating language models such that the conditional likelihood of a class given a word string, which is very well correlated with classification accuracy, is maximized. The methods comprise tuning statistical language model parameters jointly for all classes such that a classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Specific embodiments of the present invention pertain to implementation of the rational function growth transform in the context of a discriminative training technique for n-gram classifiers. Type: Grant. Filed: April 15, 2008. Date of Patent: November 6, 2012. Assignee: Microsoft Corporation. Inventors: Ciprian Chelba, Alejandro Acero, Milind Mahajan
-
Publication number: 20120278075. Abstract: A system and method for collecting, from an ASR, a first rating of the intelligibility of human speech, and collecting a second intelligibility rating of such speech from networked listeners to the speech. The first rating and the second rating are weighted based on the importance of the ratings to a user, and a third rating is created from the two weighted ratings. Type: Application. Filed: April 25, 2012. Publication date: November 1, 2012. Inventors: Sherrie Ellen Shammass, Eyal Eshed, Ariel Velikovsky
-
Publication number: 20120253806. Abstract: A system and method for distributed speech recognition is provided. Audio data is obtained from a caller participating in a call with an agent. A main recognizer receives a main grammar template and the audio data. A plurality of secondary recognizers each receive the audio data and a reference that identifies a secondary grammar, which is a non-overlapping section of the main grammar template. Speech recognition is performed on each of the secondary recognizers, and speech recognition results are identified by applying the secondary grammar to the audio data. An n number of most likely speech recognition results are selected. The main recognizer constructs a new grammar based on the main grammar template, using the speech recognition results from each of the secondary recognizers as a new vocabulary. Further speech recognition results are identified by applying the new grammar to the audio data. Type: Application. Filed: June 18, 2012. Publication date: October 4, 2012. Inventor: Gilad Odinak
-
Publication number: 20120253807. Abstract: A speaker state detecting apparatus comprises: an audio input unit for acquiring, at least, a first voice emanated by a first speaker and a second voice emanated by a second speaker; a speech interval detecting unit for detecting an overlap period between a first speech period of the first speaker included in the first voice and a second speech period of the second speaker included in the second voice, which starts before the first speech period, or an interval between the first speech period and the second speech period; a state information extracting unit for extracting state information representing a state of the first speaker from the first speech period; and a state detecting unit for detecting the state of the first speaker in the first speech period based on the overlap period or the interval and the first state information. Type: Application. Filed: February 3, 2012. Publication date: October 4, 2012. Applicant: FUJITSU LIMITED. Inventor: Akira Kamano
-
Publication number: 20120253805. Abstract: Systems, methods, and media for determining fraud risk from audio signals and non-audio data are provided herein. Some exemplary methods include receiving an audio signal and an associated audio signal identifier, receiving a fraud event identifier associated with a fraud event, determining a speaker model based on the received audio signal, determining a channel model based on a path of the received audio signal, updating, using a server system, a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, and updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier. Type: Application. Filed: March 8, 2012. Publication date: October 4, 2012. Inventors: Anthony Rajakumar, Torsten Zeppenfeld, Lisa Guerra, Vipul Vyas
-
Patent number: 8265932. Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples, each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed, and each of the preceding and reference phrase segments is divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures, and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold. Type: Grant. Filed: October 3, 2011. Date of Patent: September 11, 2012. Assignee: Intellisist, Inc. Inventor: Martin R. M. Dunsmuir
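Selecting the reference-phrase DFT buffer least like any preceding-audio buffer can be sketched directly. This is a toy illustration under my own assumptions (naive DFT, normalized correlation of magnitude spectra as the similarity measure):

```python
import cmath
import math

def dft(buf):
    """Naive discrete Fourier transform of a real-valued buffer."""
    n = len(buf)
    return [sum(buf[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def correlation(a, b):
    """Normalized correlation of two magnitude spectra."""
    ma, mb = [abs(x) for x in a], [abs(x) for x in b]
    num = sum(x * y for x, y in zip(ma, mb))
    den = math.sqrt(sum(x * x for x in ma) * sum(y * y for y in mb))
    return num / den if den else 0.0

def pick_signature(reference_buffers, preceding_buffers):
    """Choose the reference-phrase DFT buffer least correlated with every
    preceding-segment DFT buffer (most dissimilar = best signature)."""
    ref_dfts = [dft(b) for b in reference_buffers]
    pre_dfts = [dft(b) for b in preceding_buffers]
    return min(ref_dfts, key=lambda r: max(correlation(r, p) for p in pre_dfts))

# Toy data: the preceding audio is a 1-cycle tone; one reference buffer
# repeats it, the other is a 5-cycle tone and should become the signature.
n = 16
tone1 = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
tone5 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
signature = pick_signature([tone1, tone5], [tone1])
```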
-
Patent number: 8255216. Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based, at least in part, on a weighting of the individual characters that comprise the known character sequence. Type: Grant. Filed: October 30, 2006. Date of Patent: August 28, 2012. Assignee: Nuance Communications, Inc. Inventor: Kenneth D. White
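Per-character weighting of a candidate sequence can be sketched as a weighted match ratio. The weight table and scoring formula below are illustrative assumptions, not the patent's method:

```python
def score_sequence(candidate, observed, char_weight):
    """Score a known character sequence against the recognized characters.

    char_weight: per-character weight (e.g. acoustically confusable
    characters weighted higher); unknown characters default to 1.0.
    Returns the weighted fraction of positions that match.
    """
    total = sum(char_weight.get(c, 1.0) for c in candidate)
    matched = sum(char_weight.get(c, 1.0)
                  for c, o in zip(candidate, observed) if c == o)
    return matched / total if total else 0.0
```

Ranking all selected known sequences by this score would pick the best-matching spelling for the utterance.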
-
Publication number: 20120209603. Abstract: Techniques for acoustic voice activity detection (AVAD) are described, including detecting a signal associated with a subband from a microphone, performing an operation on data associated with the signal, the operation generating a value associated with the subband, and determining whether the value distinguishes the signal from noise by using the value to determine a signal-to-noise ratio and comparing the value to a threshold. Type: Application. Filed: January 9, 2012. Publication date: August 16, 2012. Inventor: Zhinian Jing
-
Publication number: 20120197641. Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in the time domain is converted into a per-frame input signal in the frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of the subbands adjacent to one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband. Type: Application. Filed: February 1, 2012. Publication date: August 2, 2012. Applicant: JVC KENWOOD Corporation. Inventors: Akiko Akechi, Takaaki Yamabe
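The comparison rule in this abstract is simple enough to sketch directly: split the spectrum into consecutive subbands, average the energy in each, and flag the frame when a higher subband out-energizes the one below it (the equal-width subbands and "any pair" rule are my simplifications):

```python
def subband_energies(power_spectrum, n_subbands):
    """Average energy per subband, using equal-width subbands."""
    width = len(power_spectrum) // n_subbands
    return [sum(power_spectrum[i * width:(i + 1) * width]) / width
            for i in range(n_subbands)]

def has_consonant(power_spectrum, n_subbands=4):
    """Flag a frame as consonant-like if any higher subband has more
    average energy than the consecutive subband just below it."""
    e = subband_energies(power_spectrum, n_subbands)
    return any(e[i + 1] > e[i] for i in range(len(e) - 1))
```

Consonants tend to concentrate energy at high frequencies, so a spectrum that rises with frequency trips the detector while a typical voiced (falling) spectrum does not.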
-
Patent number: 8233590. Abstract: The present invention relates to a method of automatically controlling the volume level of communication speech for Mean Opinion Score (MOS) measurement, which, before evaluating the quality of communication speech using a MOS measurement method, automatically controls the volume level of actual communication speech to a predetermined optimal level, thus improving the reliability of MOS values. Type: Grant. Filed: November 28, 2006. Date of Patent: July 31, 2012. Assignee: Innowireless Co., Ltd. Inventors: Jong Tae Chung, Jin Soup Joung, Young Su Kwak, Jin Man Kim, Hyun Seok Cho
-
Patent number: 8229744. Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender-independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided. Type: Grant. Filed: August 26, 2003. Date of Patent: July 24, 2012. Assignee: Nuance Communications, Inc. Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
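Averaging gender-dependent GMMs with a class probability can be sketched as mixing the two models' likelihoods. This is only an illustration of the idea with 1-D Gaussians; the model format and mixing rule are my assumptions:

```python
import math

def gaussian(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_averaged_likelihood(x, p_female, female_model, male_model):
    """Weight each gender model's GMM likelihood by the current class
    probability p_female, so decoding degrades gracefully when the
    gender decision is uncertain.

    Each model is a list of (weight, mean, variance) mixture components.
    """
    lf = sum(w * gaussian(x, m, v) for w, m, v in female_model)
    lm = sum(w * gaussian(x, m, v) for w, m, v in male_model)
    return p_female * lf + (1.0 - p_female) * lm
```

With p_female near 0.5 the score falls back toward a gender-independent average, which is the behavior that avoids the cross-gender deterioration mentioned above.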
-
Publication number: 20120185251. Abstract: A method and system for candidate matching, such as used in match-making services, assesses narrative responses to measure candidate qualities. A candidate database includes self-assessment data and narrative data. Narrative data concerning a defined topic is analyzed to determine candidate qualities separate from topical information. Candidate qualities thus determined are included in candidate profiles and used to identify desirable candidates. Type: Application. Filed: March 26, 2012. Publication date: July 19, 2012. Applicant: HOSHIKO LLC. Inventor: Gary Stephen Shuster
-
Patent number: 8224644. Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network-based speech recognition. An utterance comprising a query is received in a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, then a query is sent to cancel the results processing on the network server, and the cached results data is displayed on the client computing device. Type: Grant. Filed: December 18, 2008. Date of Patent: July 17, 2012. Assignee: Microsoft Corporation. Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
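The cache-then-cancel flow can be sketched with a small client-side cache keyed by the speech profile. Class and callback names are illustrative, not from the patent:

```python
class UtteranceCache:
    """Client-side cache keyed by a speech profile (any hashable digest)."""
    def __init__(self):
        self._store = {}

    def lookup(self, profile):
        return self._store.get(profile)

    def save(self, profile, results):
        self._store[profile] = results

def handle_query(profile, cache, send_to_server, cancel_server):
    """Return results from the cache if present (cancelling the server
    round-trip), otherwise fetch from the server and cache the results."""
    cached = cache.lookup(profile)
    if cached is not None:
        cancel_server()            # server no longer needs to process this query
        return cached
    results = send_to_server()
    cache.save(profile, results)
    return results

# Simulated flow: first query hits the server, the repeat is served locally.
cache = UtteranceCache()
calls = []

def fake_server():
    calls.append("server")
    return ["weather today"]

def fake_cancel():
    calls.append("cancel")

first = handle_query("profile-a", cache, fake_server, fake_cancel)
second = handle_query("profile-a", cache, fake_server, fake_cancel)
```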
-
Patent number: 8219386. Abstract: The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, plural Awzan) of the Arabic poem units (Bayt, plural Abyate) are identified. A spoken or written poem is accepted as input. A coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper and improper spoken poetic meter. Errors in the poem meters (Bahr, plural Buhur) and the ending rhyme pattern, “Qafiya,” are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in the composition of poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be in a stand-alone device or integrated with other computing devices. Type: Grant. Filed: January 21, 2009. Date of Patent: July 10, 2012. Assignee: King Fahd University of Petroleum and Minerals. Inventors: Al-Zahrani Abdul Kareem Saleh, Moustafa Elshafei
-
Patent number: 8214211. Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating the similarity of the input sound to a male speaker sound model. A female voice index calculator calculates a female voice index indicating the similarity of the input sound to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound, which may be either the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound, based on the male voice index and the female voice index, when the first discriminator detects a human voice sound. Type: Grant. Filed: August 26, 2008. Date of Patent: July 3, 2012. Assignee: Yamaha Corporation. Inventor: Yasuo Yoshioka
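The two-stage decision can be sketched in a few lines. The threshold and the max-rule for stage 1 are my illustrative assumptions; the patent only specifies the staged structure:

```python
def classify_sound(male_index, female_index, human_threshold=0.5):
    """Two-stage discrimination as in the abstract.

    Stage 1: if neither gender index is strong, call it non-human.
    Stage 2: otherwise decide gender by comparing the two indexes.
    The 0.5 threshold and max-rule are illustrative assumptions.
    """
    if max(male_index, female_index) < human_threshold:
        return "non-human"
    return "male" if male_index >= female_index else "female"
```

Gating the gender decision behind the human/non-human check avoids forcing a male/female label onto background noise.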
-
Patent number: 8214210. Abstract: A system for processing a query operates by receiving a first query segment that includes audio speech. Next, the system generates a representation for this first query segment, where the representation includes at least two paths associated with alternative phrase sequences for an ambiguity in the audio speech. The system then compares the paths in the representation to a group of documents and determines matching scores for the group of documents based on the comparisons. Finally, the system presents a ranking of the group of documents, where the ranking is based on the matching scores for the group of documents. Type: Grant. Filed: September 19, 2006. Date of Patent: July 3, 2012. Assignee: Oracle America, Inc. Inventor: William A. Woods
-
Patent number: 8209172. Abstract: Pattern recognition that is robust to variance in the input pattern is performed at low processing cost while the possibility of identification errors is decreased. In a pattern recognition apparatus which identifies the pattern of input data from a data input unit (11) by using a hierarchical feature extraction processor (12) which hierarchically extracts features, an extraction result distribution analyzer (13) analyzes the distribution of at least one feature extraction result obtained by a primary feature extraction processor (121). On the basis of the analytical result, a secondary feature extraction processor (122) performs predetermined secondary feature extraction. Type: Grant. Filed: December 16, 2004. Date of Patent: June 26, 2012. Assignee: Canon Kabushiki Kaisha. Inventors: Yusuke Mitarai, Masakazu Matsuga, Katsuhiko Mori
-
Patent number: 8204738. Abstract: A method of removing bias from an action classifier within a natural language understanding system can include identifying a sentence having a target embedded grammar that overlaps with at least one other embedded grammar and selecting a group of overlapping embedded grammars including the target embedded grammar and at least one additional embedded grammar. A sentence expansion can be created that includes the sentence containing the target embedded grammar and a copy of the sentence for each additional embedded grammar of the group. Each copy of the sentence can include a different additional embedded grammar from the group in place of the target embedded grammar. The sentence expansion can be included within action classifier training data. Type: Grant. Filed: November 3, 2006. Date of Patent: June 19, 2012. Assignee: Nuance Communications, Inc. Inventor: Ilya Skuratovsky
-
Publication number: 20120150539. Abstract: The method of the present invention may include receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, equal to the output first language model weight, to the word lattice. Type: Application. Filed: December 13, 2011. Publication date: June 14, 2012. Applicant: Electronics and Telecommunications Research Institute. Inventors: Hyung Bae Jeon, Yun Keun Lee, Eui Sok Chung, Jong Jin Kim, Hoon Chung, Jeon Gue Park, Ho Young Jung, Byung Ok Kang, Ki Young Park, Sung Joo Lee, Jeom Ja Kang, Hwa Jeon Song
-
Patent number: 8200487Abstract: The invention relates to a method, a computer program product, a segmentation system and a user interface for structuring an unstructured text by making use of statistical models trained on annotated training data. The method performs text segmentation into text sections and assigns labels to text sections as section headings. The resulting segmentation and label assignment is provided to the user for review. Additionally, alternative segmentations and label assignments are presented, from which the user can select, or the user can enter a user-defined segmentation and user-defined labels. In response to the modifications introduced by the user, a plurality of different actions is initiated, incorporating the re-segmentation and re-labelling of successive parts of the document or of the entire document.Type: GrantFiled: November 12, 2004Date of Patent: June 12, 2012Assignee: Nuance Communications Austria GmbHInventors: Jochen Peters, Evgeny Matusov, Carsten Meyer, Dietrich Klakow
-
Patent number: 8200486Abstract: Method and system for processing and identifying a sub-audible signal formed by a source of sub-audible sounds. Sequences of samples of sub-audible sound patterns (“SASPs”) for known words/phrases in a selected database are received for overlapping time intervals, and Signal Processing Transforms (“SPTs”) are formed for each sample, as part of a matrix of entry values. The matrix is decomposed into contiguous, non-overlapping two-dimensional cells of entries, and neural net analysis is applied to estimate reference sets of weight coefficients that provide sums with optimal matches to reference sets of values. The reference sets of weight coefficients are used to determine a correspondence between a new (unknown) word/phrase and a word/phrase in the database.Type: GrantFiled: June 5, 2003Date of Patent: June 12, 2012Assignee: The United States of America as represented by the Administrator of the National Aeronautics & Space Administration (NASA)Inventors: Charles C. Jorgensen, Diana D. Lee, Shane T. Agabon
-
Patent number: 8200488Abstract: The invention provides a method for processing speech comprising the steps of receiving a speech input (SI) of a speaker, generating speech parameters (SP) from said speech input (SI), determining parameters describing an absolute loudness (L) of said speech input (SI), and evaluating (EV) said speech input (SI) and/or said speech parameters (SP) using said parameters describing the absolute loudness (L). In particular, the step of evaluation (EV) comprises a step of emotion recognition and/or speaker identification. Further, a microphone array comprising a plurality of microphones is used for determining said parameters describing the absolute loudness. With a microphone array the distance of the speaker from the microphone array can be determined and the loudness can be normalized by the distance.Type: GrantFiled: December 10, 2003Date of Patent: June 12, 2012Assignee: Sony Deutschland GmbHInventors: Thomas Kemp, Ralf Kompe, Raquel Tato
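The distance-based loudness normalization described above can be illustrated under a free-field assumption, where sound pressure level falls by about 6 dB per doubling of distance. This is a minimal sketch of that physical relationship only; the function name and the 1 m reference distance are assumptions, and the patent's actual normalization may differ.

```python
import math

# Sketch of distance-normalized ("absolute") loudness: once a microphone
# array has estimated the speaker's distance, the measured level can be
# referred back to a reference distance using the free-field 1/r law.

def absolute_loudness_db(measured_db, distance_m, ref_distance_m=1.0):
    """Add back the ~6 dB lost per doubling of distance in free field."""
    return measured_db + 20.0 * math.log10(distance_m / ref_distance_m)

# A talker measured at 54 dB from 2 m away corresponds to roughly 60 dB
# at the 1 m reference distance.
normalized = absolute_loudness_db(54.0, 2.0)
```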
-
Patent number: 8185390Abstract: The invention comprises a method for lossy data compression, akin to vector quantization, in which there is no explicit codebook and no search, i.e. the codebook memory and associated search computation are eliminated. Some memory and computation are still required, but these are dramatically reduced, compared to systems that do not exploit this method. For this reason, both the memory and computation requirements of the method are exponentially smaller than comparable methods that do not exploit the invention. Because there is no explicit codebook to be stored or searched, no such codebook need be generated either. This makes the method well suited to adaptive coding schemes, where the compression system adapts to the statistics of the data presented for processing: both the complexity of the algorithm executed for adaptation, and the amount of data transmitted to synchronize the sender and receiver, are exponentially smaller than comparable existing methods.Type: GrantFiled: April 23, 2009Date of Patent: May 22, 2012Assignee: Promptu Systems CorporationInventor: Harry Printz
-
Patent number: 8185392Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving voice queries, obtaining, for one or more of the voice queries, feedback information that references an action taken by a user that submitted the voice query after reviewing a result of the voice query, generating, for the one or more voice queries, a posterior recognition confidence measure that reflects a probability that the voice query was correctly recognized, wherein the posterior recognition confidence measure is generated based at least on the feedback information for the voice query, selecting a subset of the one or more voice queries based on the posterior recognition confidence measures, and adapting an acoustic model using the subset of the voice queries.Type: GrantFiled: September 30, 2011Date of Patent: May 22, 2012Assignee: Google Inc.Inventors: Brian Strope, Douglas H. Beeferman
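The selection step, feedback plus recognizer confidence combined into a posterior measure used to filter queries for adaptation, can be sketched as below. The specific weighting (halving the confidence when the user took no action on the result) is an invented stand-in for the patent's actual posterior measure, and all field names are hypothetical.

```python
# Illustrative sketch: derive a posterior recognition confidence from the
# recognizer's own confidence and post-query user feedback, then keep only
# the confidently-recognized queries for acoustic-model adaptation.

def select_for_adaptation(queries, threshold=0.8):
    """Return transcripts of queries whose posterior confidence clears the bar."""
    selected = []
    for q in queries:
        # A click on a result is treated as implicit confirmation; absence
        # of a click discounts the recognizer's confidence (assumed scheme).
        posterior = q["asr_confidence"] * (1.0 if q["user_clicked_result"] else 0.5)
        if posterior >= threshold:
            selected.append(q["text"])
    return selected

queries = [
    {"text": "navigate home", "asr_confidence": 0.9, "user_clicked_result": True},
    {"text": "call bob", "asr_confidence": 0.9, "user_clicked_result": False},
]
subset = select_for_adaptation(queries)
```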
-
Patent number: 8180637Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.Type: GrantFiled: December 3, 2007Date of Patent: May 15, 2012Assignee: Microsoft CorporationInventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
-
Patent number: 8175730Abstract: In order to analyze an information signal, significant short-time spectra are extracted from the information signal; the extractor is configured to select those short-time spectra which come closer to a specific characteristic than other short-time spectra of the information signal. The extracted short-time spectra are then decomposed into component signals using ICA analysis, with each component-signal spectrum representing a profile spectrum of a tone source which generates a tone corresponding to the sought-for characteristic. From a sequence of short-time spectra of the information signal and from the determined profile spectra, an amplitude envelope is finally calculated for each profile spectrum, the amplitude envelope indicating how the overall contribution of a tone source's profile spectrum changes over time.Type: GrantFiled: June 30, 2009Date of Patent: May 8, 2012Assignee: SONY CorporationInventors: Christian Dittmar, Christian Uhle, Jürgen Herre
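The final envelope-calculation step can be illustrated under an assumed linear mixing model: if each short-time magnitude spectrum is approximately a non-negative combination of the profile spectra, each source's amplitude envelope falls out of a per-frame least-squares fit. This is a sketch of that formulation only, not the patented algorithm; the function name and the toy profiles are invented.

```python
import numpy as np

# Sketch: given profile spectra P (frequency x sources) and a sequence of
# short-time magnitude spectra S (frequency x frames), solve P @ E ≈ S for
# the envelope matrix E (sources x frames) and clip it to be non-negative.

def amplitude_envelopes(profiles, spectrogram):
    """Least-squares amplitude envelope of each profile spectrum over time."""
    env, _, _, _ = np.linalg.lstsq(profiles, spectrogram, rcond=None)
    return np.clip(env, 0.0, None)  # magnitudes cannot be negative

# Two toy profile spectra over three frequency bins, mixed with known envelopes.
profiles = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
true_env = np.array([[2.0, 0.5],
                     [1.0, 3.0]])
frames = profiles @ true_env
env = amplitude_envelopes(profiles, frames)
```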
-
Patent number: 8175236Abstract: Providing for inter-working between SMS network architectures and IMS network architectures in a mobile environment is described herein. By way of example, a next generation (NG) short message service center (SMSC) is provided that can receive SMS messages in mobile application protocol (MAP) and convert such messages to IMS protocol. In addition, the NG SMSC can also receive IMS data and convert the IMS data to an SMS MAP message. The NG SMSC can reference an IMS or an SMS location registry to determine a location of the target device, and convert from IMS to SMS MAP, and vice versa, as suitable. Accordingly, the NG SMSC can provide an efficient interface between legacy SMS and NG IMS network components while preserving legacy protocols associated with such networks.Type: GrantFiled: January 15, 2008Date of Patent: May 8, 2012Assignee: AT&T Mobility II LLCInventors: Vinod Kumar Pandey, Karl J. Schlieber, Matthew Wayne Stafford, Jianrong Wang
-
Publication number: 20120109649Abstract: Automatic speech recognition including receiving speech via a microphone, pre-processing the received speech to generate acoustic feature vectors, classifying dialect of the received speech, selecting at least one of an acoustic model or a lexicon specific to the classified dialect, decoding the acoustic feature vectors using a processor and at least one of the selected dialect-specific acoustic model or selected lexicon to produce a plurality of hypotheses for the received speech, and post-processing the plurality of hypotheses to identify one of the plurality of hypotheses as the received speech.Type: ApplicationFiled: November 1, 2010Publication date: May 3, 2012Applicant: GENERAL MOTORS LLCInventors: Gaurav Talwar, Rathinavelu Chengalvarayan
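The model-selection step, a classified dialect keying into dialect-specific resources, is simple to sketch. Every label, file name, and the general-model fallback below is invented for illustration; the publication does not specify this data layout.

```python
# Illustrative sketch of selecting a dialect-specific acoustic model and
# lexicon once the dialect classifier has produced a label.

DIALECT_RESOURCES = {
    "en-US-south":   {"acoustic_model": "am_south.bin",   "lexicon": "lex_south.dict"},
    "en-US-midwest": {"acoustic_model": "am_midwest.bin", "lexicon": "lex_midwest.dict"},
}
GENERAL = {"acoustic_model": "am_general.bin", "lexicon": "lex_general.dict"}

def select_resources(dialect_label):
    """Fall back to the general model when the dialect is unrecognized."""
    return DIALECT_RESOURCES.get(dialect_label, GENERAL)

chosen = select_resources("en-US-south")
```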
-
Publication number: 20120095762Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech while also reflecting at least one frame positioned before the current frame of the first speech.Type: ApplicationFiled: October 19, 2011Publication date: April 19, 2012Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.Inventors: Ki-wan EOM, Chang-woo HAN, Tae-gyoon KANG, Nam-soo KIM, Doo-hwa HONG, Jae-won LEE, Hyung-joon LIM
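A context-aware frame conversion of this kind can be sketched with an assumed linear form, where each converted frame mixes in the previously converted frame. The matrix conversion rule, the recursive feedback, and the `prev_weight` blending factor are all assumptions made for the sake of a concrete example, not the application's actual conversion rules.

```python
import numpy as np

# Sketch: convert each frame of the first speech into a frame of the second
# speech via a conversion matrix, reflecting at least one previously
# positioned frame by blending in the prior converted frame.

def convert_frames(frames, conversion_matrix, prev_weight=0.3):
    """Convert frames one by one, carrying context from the previous output."""
    converted = []
    prev = np.zeros(frames.shape[1])
    for f in frames:
        out = conversion_matrix @ f + prev_weight * prev
        converted.append(out)
        prev = out
    return np.array(converted)

frames = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
converted = convert_frames(frames, np.eye(2), prev_weight=0.5)
```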
-
Patent number: 8160866Abstract: The present invention can recognize both English and Chinese at the same time. The key technique is that the features of all English words (for which no samples exist) are extracted entirely from the features of Chinese syllables. The invention normalizes the signal waveforms of variable lengths for English words (Chinese syllables) such that the same words (syllables) have the same features at the same time position. Hence the Bayesian classifier can recognize both fast and slow utterances of sentences. The invention can improve the features such that the speech recognition of unknown English (Chinese) is guaranteed to be correct. Furthermore, since the invention can create the features of English words from the features of Chinese syllables, it can also create the features of other languages from the features of Chinese syllables and hence can also recognize other languages, such as German, French, Japanese, Korean, Russian, etc.Type: GrantFiled: October 10, 2008Date of Patent: April 17, 2012Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
-
Patent number: 8155962Abstract: The methods and systems described herein may asynchronously process natural language utterances to provide real-time response performance and natural interaction with users. In particular, the methods and systems described herein may use various natural language speech recognition and interpretation components to identify a request (e.g., a query or command) in an utterance. The request identified in the utterance may then be processed with one or more domain agents, which may submit duplicate queries to multiple different data sources to process the request. The domain agents may then asynchronously evaluate responses to the duplicate queries to return results to users in a timely and natural manner, and further to account for the fact that the different data sources may respond to the queries at different speeds, provide unsatisfactory responses to the queries, or fail to respond to the queries at all.Type: GrantFiled: July 19, 2010Date of Patent: April 10, 2012Assignee: VoiceBox Technologies, Inc.Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
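The asynchronous fan-out this abstract describes, duplicate queries to several data sources, with slow or failing sources tolerated, can be sketched with a thread pool. The source functions, the use of `None` to model an unsatisfactory response, and the first-usable-answer policy are all illustrative assumptions rather than the patented behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch: submit the same request to every data source concurrently and
# return the first satisfactory answer; failed or slow sources are skipped.

def query_sources(request, sources, timeout_s=2.0):
    """Fan the request out to every source; return (source_name, answer)."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(fn, request): name for name, fn in sources.items()}
        for future in as_completed(futures, timeout=timeout_s):
            try:
                result = future.result()
            except Exception:
                continue                # a failed source must not block the rest
            if result is not None:      # None models an unsatisfactory response
                return futures[future], result
    return None, None

def fast_source(q):
    return "answer to " + q

def broken_source(q):
    raise RuntimeError("source unavailable")

name, answer = query_sources("weather", {"fast": fast_source,
                                         "broken": broken_source})
```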
-
Patent number: 8155959Abstract: Systems and methods are described that automatically control modules of dialog systems. The systems and methods include a dialog module that receives and processes utterances from a speaker and outputs data used to generate synthetic speech outputs as responses to the utterances. A controller is coupled to the dialog module, and the controller detects an abnormal output of the dialog module when the dialog module is processing in an automatic mode. The controller comprises a mode control for an agent to control the dialog module by correcting the abnormal output and transferring a corrected output to a downstream dialog module that follows, in a processing path, the dialog module. The corrected output is used in further processing the utterances.Type: GrantFiled: November 7, 2007Date of Patent: April 10, 2012Assignee: Robert Bosch GmbHInventors: Fuliang Weng, Baoshi Yan, Zhe Feng