Markov Patents (Class 704/256)
-
Patent number: 8775177
Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where comparing comprises generating similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
Type: Grant
Filed: October 31, 2012
Date of Patent: July 8, 2014
Assignee: Google Inc.
Inventors: Georg Heigold, Patrick An Phu Nguyen, Mitchel Weintraub, Vincent O. Vanhoucke
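The comparison and weighting steps can be sketched as follows; the similarity metric, weights, and threshold here are illustrative assumptions, not values the patent fixes:

```python
import math

def cosine_similarity(a, b):
    """Similarity metric between two equal-length template element vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def matches(first_templates, second_templates, weights, threshold=0.8):
    """Weight each per-pair similarity, then compare the aggregate to a threshold."""
    scores = [w * cosine_similarity(f, s)
              for f, s, w in zip(first_templates, second_templates, weights)]
    return sum(scores) / sum(weights) >= threshold
```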
-
Patent number: 8768706
Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
Type: Grant
Filed: August 20, 2010
Date of Patent: July 1, 2014
Assignee: Multimodal Technologies, LLC
Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
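A minimal sketch of the emphasis idea, assuming per-region relevance and recognition-confidence scores in [0, 1]; the thresholds and playback rates are hypothetical:

```python
def playback_rate(relevance, confidence, base_rate=1.0):
    """Slow down regions that matter or are likely misrecognized
    (thresholds illustrative, not from the patent)."""
    if relevance > 0.7 or confidence < 0.5:
        return base_rate * 0.6   # emphasized: play slowly
    return base_rate * 1.2       # de-emphasized: play faster

def schedule(regions):
    """regions: list of (start_sec, end_sec, relevance, confidence) tuples."""
    return [(s, e, playback_rate(r, c)) for s, e, r, c in regions]
```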
-
Patent number: 8762148
Abstract: A method and apparatus for carrying out adaptation using input speech data information even at a low reference pattern recognition performance. A reference pattern adaptation device 2 includes a speech recognition section 18, an adaptation data calculating section 19 and a reference pattern adaptation section 20. The speech recognition section 18 calculates a recognition result teacher label from the input speech data and the reference pattern. The adaptation data calculating section 19 calculates adaptation data composed of a teacher label and speech data. The adaptation data is composed of the input speech data and the recognition result teacher label corrected for adaptation by the recognition error knowledge which is the statistical information of the tendency towards recognition errors of the reference pattern. The reference pattern adaptation section 20 adapts the reference pattern using the adaptation data to generate an adaptation pattern.
Type: Grant
Filed: February 16, 2007
Date of Patent: June 24, 2014
Assignee: NEC Corporation
Inventor: Yoshifumi Onishi
-
Publication number: 20140163988
Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.
Type: Application
Filed: February 17, 2014
Publication date: June 12, 2014
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Mazin G. Rahim, Giuseppe Riccardi, Jeremy Huntley Wright, Bruce Melvin Buntschuh, Allen Louis Gorin
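The word-to-digit conversion and validation steps might look like this minimal sketch; the vocabulary and rule set here are simplified assumptions:

```python
# Hypothetical subset of the numeric vocabulary.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def words_to_digits(words):
    """Convert a recognized word string into a sequence of digits."""
    return "".join(WORD_TO_DIGIT[w] for w in words.lower().split())

def is_valid(digits, valid_sequences):
    """Compare against the set of valid sequences in the validation database."""
    return digits in valid_sequences
```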
-
Patent number: 8744855
Abstract: Architectures and techniques are described to determine a reading level of an electronic book. In particular, words, phrases, clauses, and parts of speech of an electronic book may be tagged and used to determine the reading level of the electronic book. In some cases, the reading level of the electronic book is based on a level of complexity of sentences of the electronic book and a level of complexity of words of the electronic book.
Type: Grant
Filed: August 9, 2010
Date of Patent: June 3, 2014
Assignee: Amazon Technologies, Inc.
Inventor: Daniel B. Rausch
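One crude way to combine sentence complexity and word complexity into a single score; the formula is illustrative, the patent does not disclose a specific one:

```python
def reading_level(text):
    """Score a text from average sentence length and average word length
    (weights and formula are illustrative assumptions)."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w.strip(".,!?")) for w in words) / max(len(words), 1)
    return 0.5 * avg_sentence_len + 0.5 * avg_word_len
```

Longer sentences and longer words both push the score up, matching the abstract's two complexity signals.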
-
Patent number: 8725510
Abstract: An HMM (Hidden Markov Model) learning device includes: a learning unit for learning a state transition probability as the function of actions that an agent can execute, with learning with HMM performed based on actions that the agent has executed, and time series information made up of an observation signal; and a storage unit for storing learning results by the learning unit as internal model data including a state-transition probability table and an observation probability table; with the learning unit calculating frequency variables used for estimation calculation of HMM state-transition and HMM observation probabilities; with the storage unit holding the frequency variables corresponding to each of state-transition probabilities and each of observation probabilities respectively, of the state-transition probability table; and with the learning unit using the frequency variables held by the storage unit to perform learning, and estimating the state-transition probability and the observation probability bas
Type: Grant
Filed: July 2, 2010
Date of Patent: May 13, 2014
Assignee: Sony Corporation
Inventors: Yukiko Yoshiike, Kenta Kawamoto, Kuniaki Noda, Kohtaro Sabe
-
Patent number: 8712773
Abstract: The present invention relates to a method for modeling a common-language speech recognition, by a computer, under the influence of multiple dialects and concerns a technical field of speech recognition by a computer. In this method, a triphone standard common-language model is first generated based on training data of standard common language, and first and second monophone dialectal-accented common-language models are generated based on development data of dialectal-accented common languages of first kind and second kind, respectively. Then a temporary merged model is obtained in a manner that the first dialectal-accented common-language model is merged into the standard common-language model according to a first confusion matrix obtained by recognizing the development data of first dialectal-accented common language using the standard common-language model.
Type: Grant
Filed: October 29, 2009
Date of Patent: April 29, 2014
Assignees: Sony Computer Entertainment Inc., Tsinghua University
Inventors: Fang Zheng, Xi Xiao, Linquan Liu, Zhan You, Wenxiao Cao, Makoto Akabane, Ruxin Chen, Yoshikazu Takahashi
-
Patent number: 8706489
Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
Type: Grant
Filed: August 8, 2006
Date of Patent: April 22, 2014
Assignee: Delta Electronics Inc.
Inventors: Jia-lin Shen, Chien-Chou Hung
-
Patent number: 8676583
Abstract: An action is performed in a spoken dialog system in response to a user's spoken utterance. A policy which maps belief states of user intent to actions is retrieved or created. A belief state is determined based on the spoken utterance, and an action is selected based on the determined belief state and the policy. The action is performed, and in one embodiment, involves requesting clarification of the spoken utterance from the user. Creating a policy may involve simulating user inputs and spoken dialog system interactions, and modifying policy parameters iteratively until a policy threshold is satisfied. In one embodiment, a belief state is determined by converting the spoken utterance into text, assigning the text to one or more dialog slots associated with nodes in a probabilistic ontology tree (POT), and determining a joint probability based on probability distribution tables in the POT and on the dialog slot assignments.
Type: Grant
Filed: August 30, 2011
Date of Patent: March 18, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Rakesh Gupta, Deepak Ramachandran, Antoine Raux, Neville Mehta, Stefan Krawczyk, Matthew Hoffman
-
Patent number: 8676576
Abstract: A copyright managing information processing apparatus includes a storage module for storing copyrighted content including audio data; a first topic module for recognizing audio data in content opened to the public by a to-be-opened information processing apparatus, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; a second topic module for recognizing audio data in content stored in the storage module, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; and a similarity determining module for comparing the topic information generated by the first topic module with that created by the second topic module for thereby determining presence or absence of similarity therebetween.
Type: Grant
Filed: January 19, 2009
Date of Patent: March 18, 2014
Assignee: NEC Corporation
Inventor: Eiichirou Nomitsu
-
Patent number: 8666737
Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
Type: Grant
Filed: September 14, 2011
Date of Patent: March 4, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
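The histogram update and percentile lookup can be sketched as follows, assuming power levels are already quantized to integer indices; the decay factor and percentile are illustrative choices, not the patent's:

```python
class NoiseEstimator:
    """Per-frequency-bin histogram weighted by an exponential moving average;
    the noise estimate is the power level at a fixed cumulative percentile."""

    def __init__(self, n_levels=100, alpha=0.05, percentile=0.5):
        self.hist = [0.0] * n_levels
        self.alpha = alpha
        self.percentile = percentile

    def update(self, level_index):
        # Exponentially decay old counts, then credit the new observation.
        self.hist = [(1 - self.alpha) * h for h in self.hist]
        self.hist[level_index] += self.alpha

    def estimate(self):
        # Walk the cumulative histogram until the target percentile is reached.
        total = sum(self.hist)
        cum = 0.0
        for i, h in enumerate(self.hist):
            cum += h
            if cum >= self.percentile * total:
                return i
        return len(self.hist) - 1
```

Because noise dominates most frames, a low-to-middle percentile of the power distribution tracks the noise floor while occasional loud speech frames barely move it.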
-
Patent number: 8639510
Abstract: A hardware acoustic scoring unit for a speech recognition system and a method of operation thereof are provided. Rather than scoring all senones in an acoustic model used for the speech recognition system, acoustic scoring logic first scores a set of ciphones based on acoustic features for one frame of sampled speech. The acoustic scoring logic then scores senones associated with the N highest scored ciphones. In one embodiment, the number (N) is three. While the acoustic scoring logic scores the senones associated with the N highest scored ciphones, high score ciphone identification logic operates in parallel with the acoustic scoring unit to identify one or more additional ciphones that have scores greater than a threshold. Once the acoustic scoring unit finishes scoring the senones for the N highest scored ciphones, the acoustic scoring unit then scores senones associated with the one or more additional ciphones.
Type: Grant
Filed: December 22, 2008
Date of Patent: January 28, 2014
Inventors: Kai Yu, Rob A. Rutenbar
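The two-pass flow might be sketched like this; all scoring functions are hypothetical stand-ins for the acoustic model, and the parallel threshold pass is omitted:

```python
def score_frame(features, ciphone_scores_fn, senone_score_fn, senones_of, n_best=3):
    """Score all context-independent phones (ciphones) first, then score only
    the senones tied to the N best ciphones."""
    ciphone_scores = ciphone_scores_fn(features)          # {ciphone: score}
    best = sorted(ciphone_scores, key=ciphone_scores.get, reverse=True)[:n_best]
    senone_scores = {}
    for ci in best:
        for senone in senones_of[ci]:
            senone_scores[senone] = senone_score_fn(features, senone)
    return senone_scores
```

The saving comes from skipping senone evaluation for every ciphone outside the top N.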
-
Patent number: 8639517
Abstract: Disclosed are systems, methods and computer-readable media for controlling a computing device to provide contextual responses to user inputs. The method comprises receiving a user input, generating a set of features characterizing an association between the user input and a conversation context based on at least a semantic and syntactic analysis of user inputs and system responses, determining with a data-driven machine learning approach whether the user input begins a new topic or is associated with a previous conversation context and if the received question is associated with the existing topic, then generating a response to the user input using information associated with the user input and any previous user input associated with the existing topic, based on a normalization of the length of the user input.
Type: Grant
Filed: June 15, 2012
Date of Patent: January 28, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Giuseppe Di Fabbrizio, Junlan Feng
-
Patent number: 8626507
Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
Type: Grant
Filed: November 30, 2012
Date of Patent: January 7, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Michael J. Johnston
-
Patent number: 8620655
Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
Type: Grant
Filed: August 10, 2011
Date of Patent: December 31, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
-
Patent number: 8620658
Abstract: A voice chat system includes a plurality of information processing apparatuses that performs a voice chat while performing speech recognition and a search server connected to the plural information processing apparatuses via a communication network. The search server discloses a search keyword list containing the search keywords searched by the search server to at least one of the plural information processing apparatuses. The at least one information processing apparatus includes a recognition word dictionary generating unit that acquires the search keyword list from the search server to generate a recognition word dictionary containing words for use in the speech recognition, and a speech recognition unit that performs speech recognition on voice data obtained from a dialog of the conversation during the voice chat by referencing a recognition database containing the recognition word dictionary.
Type: Grant
Filed: April 14, 2008
Date of Patent: December 31, 2013
Assignees: Sony Corporation, So-Net Entertainment Corporation
Inventors: Motoki Nakade, Hiroaki Ogawa, Hitoshi Honda, Yoshinori Kurata, Daisuke Ishizuka
-
Patent number: 8612235
Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
Type: Grant
Filed: June 8, 2012
Date of Patent: December 17, 2013
Assignee: Vocollect, Inc.
Inventors: Keith Braho, Amro El-Jaroudi
-
Patent number: 8612227
Abstract: The present invention provides a method and equipment of pattern recognition capable of efficiently pruning partial hypotheses without lowering recognition accuracy, its pattern recognition program, and its recording medium. In a second search unit, a likelihood calculation unit calculates an acoustic likelihood by matching time series data of acoustic feature parameters against a lexical tree stored in a second database and an acoustic model stored in a third database to determine an accumulated likelihood by accumulating the acoustic likelihood in a time direction. A self-transition unit causes each partial hypothesis to make a self-transition in a search process. An LR transition unit causes each partial hypothesis to make an RL transition. A reward attachment unit adds a reward R(x) in accordance with the number of reachable words to each partial hypothesis to raise the accumulated likelihood. A pruning unit excludes partial hypotheses with less likelihood from search targets.
Type: Grant
Filed: July 22, 2010
Date of Patent: December 17, 2013
Assignee: KDDI Corporation
Inventor: Tsuneo Kato
-
Patent number: 8612225
Abstract: A voice recognition device that recognizes a voice of an input voice signal, comprises a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
Type: Grant
Filed: February 26, 2008
Date of Patent: December 17, 2013
Assignee: NEC Corporation
Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
-
Patent number: 8606580
Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
Type: Grant
Filed: December 30, 2008
Date of Patent: December 10, 2013
Assignee: Asahi Kasei Kabushiki Kaisha
Inventors: Makoto Shozakai, Goshu Nagino
-
Patent number: 8589334
Abstract: Methods and systems are provided for developing decision information relating to a single system based on data received from a plurality of sensors. The method includes receiving first data from a first sensor that defines first information of a first type that is related to a system, receiving second data from a second sensor that defines second information of a second type that is related to said system, wherein the first type is different from the second type, generating a first decision model, a second decision model, and a third decision model, determining whether data is available from only the first sensor, only the second sensor, or both the first and second sensors, and selecting based on the determination of availability an additional model to apply the available data, wherein the additional model is selected from a plurality of additional decision models including the third decision model.
Type: Grant
Filed: January 18, 2011
Date of Patent: November 19, 2013
Assignee: Telcordia Technologies, Inc.
Inventor: Akshay Vashist
-
Patent number: 8589165
Abstract: The present disclosure provides method and system for converting a free text expression of an identity to a phonetic equivalent code. The conversion follows a set of rules based on phonetic groupings and compresses the expression to a shorter series of characters than the expression. The phonetic equivalent code may be compared to one or more other phonetic equivalent codes to establish a correlation between the codes. The phonetic equivalent code of the free text expression may be associated with the code of a known identity. The known identity may be provided to a user for confirmation of the identity. Further, a plurality of expressions stored in a database may be consolidated by converting the expressions to phonetic equivalent codes, comparing the codes to find correlations, and if appropriate reducing the number of expressions or mapping the expressions to a fewer number of expressions.
Type: Grant
Filed: January 24, 2012
Date of Patent: November 19, 2013
Assignee: United Services Automobile Association (USAA)
Inventors: Gregory Brian Meyer, James Elden Nicholson
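Classic Soundex is a well-known example of compressing a spelling to a short phonetic code and gives the flavor of the idea, though the patent defines its own rule set:

```python
def soundex(name):
    """Classic (NARA) Soundex: first letter kept, consonants mapped to digit
    groups, adjacent duplicates collapsed, padded/truncated to four characters."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"),
             "r": "6"}
    name = name.lower()
    first = name[0].upper()
    out, prev = [], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "hw":   # h and w do not break a run of equal codes
            prev = code
    return (first + "".join(out) + "000")[:4]
```

Similar-sounding spellings collapse to the same code, which is exactly the correlation the patent's comparison step exploits.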
-
Patent number: 8583439
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: January 12, 2004
Date of Patent: November 12, 2013
Assignee: Verizon Services Corp.
Inventor: James Mark Kondziela
-
Patent number: 8577678
Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
Type: Grant
Filed: March 10, 2011
Date of Patent: November 5, 2013
Assignee: Honda Motor Co., Ltd.
Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
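A logistic function is one way to produce continuous (0, 1) mask values from a per-component separation reliability; the mapping and its parameters are assumptions, not the patent's formulation:

```python
import math

def soft_mask(reliability, midpoint=0.5, steepness=10.0):
    """Map a separation reliability to a continuous (0, 1) mask value."""
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))

def apply_mask(spectrum, reliabilities):
    """Attenuate each frequency spectral component by its mask value."""
    return [s * soft_mask(r) for s, r in zip(spectrum, reliabilities)]
```

Unlike a hard binary mask, unreliable components are attenuated smoothly rather than zeroed, which avoids abrupt spectral holes.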
-
Patent number: 8560324
Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
Type: Grant
Filed: January 31, 2012
Date of Patent: October 15, 2013
Assignee: LG Electronics Inc.
Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
-
Patent number: 8559618
Abstract: A system, method, and computer readable medium for contact center call routing by agent attribute, that comprises: detecting a caller voice attribute, sampling at least one agent voice attribute, storing the at least one agent voice attribute, matching the detected caller voice attribute to the stored voice attribute of the at least one agent, and routing a call based upon the matched agent voice attribute to the detected caller voice attribute.
Type: Grant
Filed: June 28, 2006
Date of Patent: October 15, 2013
Assignee: West Corporation
Inventors: Jeffrey William Cordell, James K Boutcher, Michelle L Steinbeck
-
Patent number: 8554553
Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
Type: Grant
Filed: February 21, 2011
Date of Patent: October 8, 2013
Assignee: Adobe Systems Incorporated
Inventors: Gautham J. Mysore, Paris Smaragdis
-
Patent number: 8548806
Abstract: A voice recognition device, a voice recognition method and a voice recognition program capable of appropriately restricting recognition objects based on voice input from a user to recognize the input voice with accuracy are provided.
Type: Grant
Filed: September 11, 2007
Date of Patent: October 1, 2013
Assignee: Honda Motor Co. Ltd.
Inventor: Hisayuki Nagashima
-
Patent number: 8548807
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: June 9, 2009
Date of Patent: October 1, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
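The restructuring idea, with 1-D Gaussians standing in for real acoustic models (the weights and models here are toy values, not trained quantities):

```python
import math

def gaussian_likelihood(x, mean, var):
    """Likelihood of a 1-D observation under a Gaussian acoustic model."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def restructured_likelihood(x, weights, models):
    """Phoneme likelihood as a weighted sum of the acoustic models of all
    plausible phonemes; models is a list of (mean, variance) pairs."""
    return sum(w * gaussian_likelihood(x, m, v)
               for w, (m, v) in zip(weights, models))
```

The dictionary entries stay fixed; only the acoustic model behind each phoneme becomes this weighted mixture.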
-
Patent number: 8543399
Abstract: An apparatus for speech recognition includes: a first confidence score calculator calculating a first confidence score using a ratio between a likelihood of a keyword model for feature vectors per frame of a speech signal and a likelihood of a Filler model for the feature vectors; a second confidence score calculator calculating a second confidence score by comparing a Gaussian distribution trace of the keyword model per frame of the speech signal with a Gaussian distribution trace sample of a stored corresponding keyword of the keyword model; and a determination module determining a confidence of a result using the keyword model in accordance with a position determined by the first and second confidence scores on a confidence coordinate system.
Type: Grant
Filed: September 8, 2006
Date of Patent: September 24, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jae-hoon Jeong, Sang-bae Jeong, Jeong-su Kim, Nam-hoon Kim
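The first confidence score can be sketched as an average per-frame log-likelihood ratio between the keyword model and the filler model (the averaging and threshold are assumptions):

```python
def confidence_score(keyword_loglikes, filler_loglikes):
    """Average per-frame log-likelihood ratio: positive values mean the
    keyword model explains the frames better than the filler model."""
    ratios = [k - f for k, f in zip(keyword_loglikes, filler_loglikes)]
    return sum(ratios) / len(ratios)

def accept(score, threshold=0.0):
    """Accept the keyword hypothesis when the score clears the threshold."""
    return score > threshold
```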
-
Patent number: 8532993
Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
Type: Grant
Filed: July 2, 2012
Date of Patent: September 10, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Andrej Ljolje
-
Patent number: 8521538
Abstract: A system and method of assisting a care provider in the documentation of self-performance and support information for a resident or person includes a speech dialog with a care provider that uses the generation of speech to play to the care provider and the capture of speech spoken by a care provider. The speech dialog provides assistance to the care provider in providing care for a person according to a care plan for the person. The care plan includes one or more activities requiring a level of performance by the person. For the activity, speech inquiries are provided to the care provider, through the speech dialog, regarding performance of the activity by the person and regarding care provider assistance in the performance of the activity by the person. Speech input is captured from the care provider that is responsive to the speech inquiries. A code is then determined from the speech input and the code indicates the self-performance of the person and support information for a care provider for the activity.
Type: Grant
Filed: September 10, 2010
Date of Patent: August 27, 2013
Assignee: Vocollect Healthcare Systems, Inc.
Inventors: Michael Laughery, Bonnie Praksti, David M. Findlay, James E. Shearon
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
-
Patent number: 8510111
Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
Type: Grant
Filed: February 8, 2008
Date of Patent: August 13, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
-
Patent number: 8484035
Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
Type: Grant
Filed: September 6, 2007
Date of Patent: July 9, 2013
Assignee: Massachusetts Institute of Technology
Inventor: Alex Paul Pentland
-
Patent number: 8478589
Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
Type: Grant
Filed: January 5, 2005
Date of Patent: July 2, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
-
Patent number: 8478582
Abstract: A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, by the use of a pictorial-symbol dictionary memory storing a correspondence between designated pictorial-symbols to be rated and scores of opinions expressed by the respective pictorial-symbols, at least one of the used pictorial-symbols in the message which is coincident with at least one of the designated pictorial-symbols stored in the pictorial-symbol dictionary memory, is extracted from the message, at least one of the opinion scores which corresponds to the at least one extracted pictorial-symbol is retrieved within the pictorial-symbol dictionary memory, and an aggregate net opinion score for the message is calculated, based on an aggregate opinion score for the at least one extracted pictorial-symbol.
Type: Grant
Filed: February 2, 2010
Date of Patent: July 2, 2013
Assignee: KDDI Corporation
Inventors: Yukiko Habu, Ryoichi Kawada, Nobuhide Kotsuka, Sung Jiae, Koki Uchiyama, Santi Saeyor, Hirosuke Asano, Toshiaki Shimamura
-
Patent number: 8442833
Abstract: Computer implemented speech processing is disclosed. First and second voice segments are extracted from first and second microphone signals originating from first and second microphones. The first and second voice segments correspond to a voice sound originating from a common source. An estimated source location is generated based on a relative energy of the first and second voice segments and/or a correlation of the first and second voice segments. A determination whether the voice segment is desired or undesired may be made based on the estimated source location.
Type: Grant
Filed: February 2, 2010
Date of Patent: May 14, 2013
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
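The two cues the abstract names, relative energy and cross-segment correlation, can be computed very simply for a two-microphone pair. This is a minimal sketch with invented sample values, not the patent's method; real systems would work on longer windows and calibrated gains.

```python
def location_cues(seg1, seg2, max_lag=3):
    """Two crude source-location cues from a pair of microphone segments:
    the energy ratio, and the cross-correlation lag (with this sign
    convention, a positive lag means the sound reached mic 1 first)."""
    e1 = sum(x * x for x in seg1)
    e2 = sum(x * x for x in seg2)
    n = len(seg1)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        c = sum(seg1[i - lag] * seg2[i]
                for i in range(n) if 0 <= i - lag < n)
        if c > best_corr:
            best_corr, best_lag = c, lag
    return e1 / e2, best_lag

mic1 = [0.0, 1.0, 0.5, -0.3, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.5, -0.3, 0.0]   # same sound, one sample later
ratio, lag = location_cues(mic1, mic2)
```

The positive lag indicates the source is nearer microphone 1; a desired/undesired decision could then threshold on these cues.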
-
Patent number: 8442829
Abstract: Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
Type: Grant
Filed: February 2, 2010
Date of Patent: May 14, 2013
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
-
Patent number: 8442821
Abstract: A method and system for multi-frame prediction in a hybrid neural network/hidden Markov model automatic speech recognition (ASR) system is disclosed. An audio input signal may be transformed into a time sequence of feature vectors, each corresponding to a respective temporal frame of a sequence of periodic temporal frames of the audio input signal. The time sequence of feature vectors may be concurrently input to a neural network, which may process them concurrently. In particular, the neural network may concurrently determine for the time sequence of feature vectors a set of emission probabilities for a plurality of hidden Markov models of the ASR system, where the set of emission probabilities are associated with the temporal frames. The set of emission probabilities may then be concurrently applied to the hidden Markov models for determining speech content of the audio input signal.
Type: Grant
Filed: July 27, 2012
Date of Patent: May 14, 2013
Assignee: Google Inc.
Inventor: Vincent Vanhoucke
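In hybrid NN/HMM systems, the network's per-frame state posteriors are commonly converted into emission scores by dividing out the state priors. The sketch below shows that standard conversion with invented numbers; it is not the patent's multi-frame prediction scheme itself, only the posterior-to-emission step it builds on.

```python
def scaled_likelihoods(posteriors, priors):
    """Convert network state posteriors P(state | frame) into scaled
    likelihoods P(frame | state) proportional to P(state | frame) / P(state),
    the usual way hybrid systems turn neural-network outputs into HMM
    emission scores (numbers below are invented)."""
    return [[p / q for p, q in zip(frame, priors)] for frame in posteriors]

# Two frames of network outputs over three HMM states:
posteriors = [[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]]
priors = [0.5, 0.3, 0.2]
emissions = scaled_likelihoods(posteriors, priors)
```

Each row of `emissions` can then be plugged into a Viterbi or forward pass over the HMMs in place of Gaussian likelihoods.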
-
Patent number: 8442828
Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
Type: Grant
Filed: March 17, 2006
Date of Patent: May 14, 2013
Assignee: Microsoft Corporation
Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
-
Patent number: 8433558
Abstract: Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
Type: Grant
Filed: July 25, 2005
Date of Patent: April 30, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Mazin Gilbert, Narendra K. Gupta
-
Patent number: 8428950
Abstract: A speech recognition apparatus (110) selects an optimum recognition result from recognition results output from a set of speech recognizers (s1-sM) based on a majority decision. This decision is implemented taking into account weight values for the set of the speech recognizers, learned by a learning apparatus (100). The learning apparatus includes a unit (103) selecting speech recognizers corresponding to characteristics of speech for learning (101), a unit (104) finding recognition results of the speech for learning by using the selected speech recognizers, a unit (105) unifying the recognition results and generating a word string network, and a unit (106) finding weight values concerning a set of the speech recognizers by implementing learning processing.
Type: Grant
Filed: January 18, 2008
Date of Patent: April 23, 2013
Assignee: NEC Corporation
Inventors: Yoshifumi Onishi, Tadashi Emori
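The weighted majority decision over recognizer outputs can be sketched at the whole-hypothesis level (the patent actually votes over a word string network, which is finer-grained). The hypotheses and weight values below are invented.

```python
from collections import defaultdict

def weighted_vote(hypotheses, weights):
    """Pick the output transcription by a majority decision in which each
    recognizer's vote counts with its learned weight."""
    tally = defaultdict(float)
    for hyp, w in zip(hypotheses, weights):
        tally[hyp] += w
    return max(tally, key=tally.get)

hyps = ["call home", "call home", "cold home"]
weights = [0.2, 0.3, 0.9]          # the third recognizer is trusted most
winner = weighted_vote(hyps, weights)
```

With these weights the single high-weight recognizer outvotes the two low-weight ones; with uniform weights the plain majority would win instead.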
-
Patent number: 8423364
Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
Type: Grant
Filed: February 20, 2007
Date of Patent: April 16, 2013
Assignee: Microsoft Corporation
Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
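The interaction of margin and bandwidth in such a loss can be illustrated with a smoothed 0/1 loss. The sigmoid form below is an illustrative stand-in, not the patent's exact loss function, and all scores are invented.

```python
import math

def margin_loss(score_correct, score_competitor, margin, bandwidth):
    """Smoothed 0/1 loss with a margin: the decision boundary is shifted by
    `margin`, and `bandwidth` stands in for the sample-adaptive window
    bandwidth controlling how sharply the loss falls off near the boundary."""
    d = score_correct - score_competitor - margin
    return 1.0 / (1.0 + math.exp(d / bandwidth))

# A correct token far outside the shifted boundary incurs low loss:
low = margin_loss(2.0, 0.5, margin=0.5, bandwidth=0.5)
# A correctly classified token inside the margin is still penalized,
# pushing training to enlarge its distance to the boundary:
high = margin_loss(0.6, 0.5, margin=0.5, bandwidth=0.5)
```

Minimizing this loss therefore maximizes token-to-boundary distances for correct tokens near the boundary, which is the effect the abstract describes.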
-
Patent number: 8412526
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
Type: Grant
Filed: December 3, 2007
Date of Patent: April 2, 2013
Assignee: Nuance Communications, Inc.
Inventor: Alexander Sorin
-
Patent number: 8392195
Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
Type: Grant
Filed: May 31, 2012
Date of Patent: March 5, 2013
Assignee: International Business Machines Corporation
Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
-
Patent number: 8390669
Abstract: The present disclosure discloses a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to a probability that their respective one or more individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with a most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
Type: Grant
Filed: December 15, 2009
Date of Patent: March 5, 2013
Assignee: Cisco Technology, Inc.
Inventors: Jason Catchpole, Craig Cockerton
-
Patent number: 8392185
Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
Type: Grant
Filed: August 19, 2009
Date of Patent: March 5, 2013
Assignee: Honda Motor Co., Ltd.
Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
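The soft-mask idea, continuous reliability values in [0, 1] rather than a binary keep/discard decision, can be sketched on a toy spectrum. The magnitudes and mask values below are invented for illustration.

```python
def apply_soft_mask(spectrum, mask):
    """Apply a soft mask, one continuous value in [0, 1] per frequency bin,
    to a separated speech spectrum: bins where separation was less reliable
    are attenuated smoothly instead of being kept or zeroed outright as a
    hard (binary) mask would do."""
    assert all(0.0 <= m <= 1.0 for m in mask)
    return [s * m for s, m in zip(spectrum, mask)]

spectrum = [4.0, 2.0, 1.0]        # magnitudes for three frequency bins
mask = [1.0, 0.5, 0.0]            # reliable, uncertain, unreliable bin
masked = apply_soft_mask(spectrum, mask)
```

The recognizer then computes features from `masked`, so uncertain bins contribute proportionally to their separation reliability.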
-
Patent number: 8386232
Abstract: A method for predicting results for input data based on a model that is generated based on clusters of related characters, clusters of related segments, and training data. The method comprises receiving a data set that includes a plurality of words in a particular language. In the particular language, words are formed by characters. Clusters of related characters are formed from the data set. A model is generated based at least on the clusters of related characters and training data. The model may also be based on the clusters of related segments. The training data includes a plurality of entries, wherein each entry includes a character and a designated result for said character. A set of input data that includes characters that have not been associated with designated results is received. The model is applied to the input data to determine predicted results for characters within the input data.
Type: Grant
Filed: June 1, 2006
Date of Patent: February 26, 2013
Assignee: Yahoo! Inc.
Inventor: Fuchun Peng
-
Patent number: 8386254
Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
Type: Grant
Filed: May 2, 2008
Date of Patent: February 26, 2013
Assignee: Nuance Communications, Inc.
Inventors: Neeraj Deshmukh, Puming Zhan