Markov Patents (Class 704/256)
  • Patent number: 8775177
    Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where the comparing includes generating similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: July 8, 2014
    Assignee: Google Inc.
    Inventors: Georg Heigold, Patrick An Phu Nguyen, Mitchel Weintraub, Vincent O. Vanhoucke
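The weighted template comparison described in this abstract can be sketched in a few lines; the cosine similarity metric, the uniform weights, and every name below are illustrative assumptions, not details taken from the patent.

```python
import math

def cosine_similarity(a, b):
    """One possible similarity metric between two template element vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def weighted_match_score(first_templates, second_templates, weights):
    """Compare first templates to second templates, apply weights to the
    per-template similarity metrics, and combine them into one score."""
    scores = [cosine_similarity(f, s)
              for f, s in zip(first_templates, second_templates)]
    weighted = [w * s for w, s in zip(weights, scores)]
    return sum(weighted) / sum(weights)

# Identical templates under uniform weights yield a (near-)perfect score.
first = [[1.0, 0.0], [0.5, 0.5]]
second = [[1.0, 0.0], [0.5, 0.5]]
score = weighted_match_score(first, second, [1.0, 1.0])
```

A threshold on `score` would then decide whether the first audio corresponds to the second audio.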
  • Patent number: 8768706
    Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: July 1, 2014
    Assignee: Multimodal Technologies, LLC
    Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
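The emphasis-by-playback-speed idea above reduces to a small policy function over per-region scores; the 0.6x/1.0x rates and the single shared threshold here are illustrative assumptions, not values from the patent.

```python
def playback_rate(relevance, confidence,
                  slow=0.6, normal=1.0, threshold=0.5):
    """Return a playback-speed factor for one region of the audio stream.

    Regions that are highly relevant or likely misrecognized (low
    recognition confidence) are slowed down to draw the proofreader's
    attention; everything else plays at normal speed.
    """
    if relevance > threshold or confidence < threshold:
        return slow
    return normal

# A low-confidence region is emphasized by slower playback.
rate = playback_rate(relevance=0.2, confidence=0.3)
```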
  • Patent number: 8762148
    Abstract: A method and apparatus for carrying out adaptation using input speech data even when the recognition performance of the reference pattern is low. A reference pattern adaptation device 2 includes a speech recognition section 18, an adaptation data calculating section 19 and a reference pattern adaptation section 20. The speech recognition section 18 calculates a recognition result teacher label from the input speech data and the reference pattern. The adaptation data calculating section 19 calculates adaptation data composed of a teacher label and speech data. The adaptation data is composed of the input speech data and the recognition result teacher label, corrected for adaptation using recognition error knowledge, that is, statistical information on the reference pattern's tendency toward recognition errors. The reference pattern adaptation section 20 adapts the reference pattern using the adaptation data to generate an adaptation pattern.
    Type: Grant
    Filed: February 16, 2007
    Date of Patent: June 24, 2014
    Assignee: NEC Corporation
    Inventor: Yoshifumi Onishi
  • Publication number: 20140163988
    Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.
    Type: Application
    Filed: February 17, 2014
    Publication date: June 12, 2014
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Mazin G. Rahim, Giuseppe Riccardi, Jeremy Huntley Wright, Bruce Melvin Buntschuh, Allen Louis Gorin
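The numeric-understanding and validation stages described above can be illustrated as two small functions; the word-to-digit map and the sample validation set are hypothetical, and the real system operates on recognition output rather than clean text.

```python
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

def words_to_digits(word_string):
    """Convert a recognized string of number words into a digit sequence."""
    return "".join(WORD_TO_DIGIT[w] for w in word_string.lower().split())

def validate(digits, validation_database):
    """Compare a digit sequence against the stored set of valid sequences."""
    return digits in validation_database

valid_sequences = {"5551234", "8005550199"}  # stands in for the validation database
digits = words_to_digits("five five five one two three four")
ok = validate(digits, valid_sequences)
```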
  • Patent number: 8744855
    Abstract: Architectures and techniques are described to determine a reading level of an electronic book. In particular, words, phrases, clauses, and parts of speech of an electronic book may be tagged and used to determine the reading level of the electronic book. In some cases, the reading level of the electronic book is based on a level of complexity of sentences of the electronic book and a level of complexity of words of the electronic book.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: June 3, 2014
    Assignee: Amazon Technologies, Inc.
    Inventor: Daniel B. Rausch
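A reading level combining sentence complexity and word complexity, as the abstract describes, can be computed with any of the classic readability formulas; the sketch below uses the published Automated Readability Index constants purely for illustration (the patent does not specify a formula).

```python
import re

def reading_level(text):
    """Estimate a reading level from sentence complexity (words per
    sentence) and word complexity (characters per word), using the
    Automated Readability Index as one concrete example."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    chars = sum(len(w) for w in words)
    return (4.71 * (chars / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)

simple = "The cat sat on the mat. The dog ran."
complex_text = ("Architectural considerations notwithstanding, "
                "electronic publications demonstrate considerable "
                "lexical sophistication throughout.")
```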
  • Patent number: 8725510
    Abstract: An HMM (Hidden Markov Model) learning device includes: a learning unit for learning a state transition probability as the function of actions that an agent can execute, with learning with HMM performed based on actions that the agent has executed, and time series information made up of an observation signal; and a storage unit for storing learning results by the learning unit as internal model data including a state-transition probability table and an observation probability table; with the learning unit calculating frequency variables used for estimation calculation of HMM state-transition and HMM observation probabilities; with the storage unit holding the frequency variables corresponding to each of state-transition probabilities and each of observation probabilities respectively, of the state-transition probability table; and with the learning unit using the frequency variables held by the storage unit to perform learning, and estimating the state-transition probability and the observation probability bas
    Type: Grant
    Filed: July 2, 2010
    Date of Patent: May 13, 2014
    Assignee: Sony Corporation
    Inventors: Yukiko Yoshiike, Kenta Kawamoto, Kuniaki Noda, Kohtaro Sabe
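The key idea of holding "frequency variables" alongside the probability tables, so that new data updates counts rather than forcing a full re-estimation, can be sketched as below. This simplification uses hard transition counts from known state sequences; the patent's learning uses HMM estimation over actions and observation signals, which would accumulate expected counts instead.

```python
from collections import defaultdict

class IncrementalHMMCounts:
    """Store frequency variables (transition counts) so transition
    probabilities can be re-estimated as new sequences arrive,
    without reprocessing the earlier data."""

    def __init__(self):
        self.transition_counts = defaultdict(float)  # (state, next) -> count
        self.state_totals = defaultdict(float)       # state -> outgoing total

    def accumulate(self, state_sequence):
        for s, t in zip(state_sequence, state_sequence[1:]):
            self.transition_counts[(s, t)] += 1.0
            self.state_totals[s] += 1.0

    def transition_probability(self, s, t):
        total = self.state_totals[s]
        return self.transition_counts[(s, t)] / total if total else 0.0

model = IncrementalHMMCounts()
model.accumulate([0, 1, 1, 2])
model.accumulate([0, 2])  # new data simply updates the stored counts
p01 = model.transition_probability(0, 1)
```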
  • Patent number: 8712773
    Abstract: The present invention relates to a method for modeling a common-language speech recognition, by a computer, under the influence of multiple dialects and concerns a technical field of speech recognition by a computer. In this method, a triphone standard common-language model is first generated based on training data of standard common language, and first and second monophone dialectal-accented common-language models are based on development data of dialectal-accented common languages of first kind and second kind, respectively. Then a temporary merged model is obtained in a manner that the first dialectal-accented common-language model is merged into the standard common-language model according to a first confusion matrix obtained by recognizing the development data of first dialectal-accented common language using the standard common-language model.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: April 29, 2014
    Assignees: Sony Computer Entertainment Inc., Tsinghua University
    Inventors: Fang Zheng, Xi Xiao, Linquan Liu, Zhan You, Wenxiao Cao, Makoto Akabane, Ruxin Chen, Yoshikazu Takahashi
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing speech recognition on the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Patent number: 8676583
    Abstract: An action is performed in a spoken dialog system in response to a user's spoken utterance. A policy which maps belief states of user intent to actions is retrieved or created. A belief state is determined based on the spoken utterance, and an action is selected based on the determined belief state and the policy. The action is performed, and in one embodiment, involves requesting clarification of the spoken utterance from the user. Creating a policy may involve simulating user inputs and spoken dialog system interactions, and modifying policy parameters iteratively until a policy threshold is satisfied. In one embodiment, a belief state is determined by converting the spoken utterance into text, assigning the text to one or more dialog slots associated with nodes in a probabilistic ontology tree (POT), and determining a joint probability based on probability distribution tables in the POT and on the dialog slot assignments.
    Type: Grant
    Filed: August 30, 2011
    Date of Patent: March 18, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Rakesh Gupta, Deepak Ramachandran, Antoine Raux, Neville Mehta, Stefan Krawczyk, Matthew Hoffman
  • Patent number: 8676576
    Abstract: A copyright managing information processing apparatus includes a storage module for storing copyrighted content including audio data; a first topic module for recognizing audio data in content opened to the public by a to-be-opened information processing apparatus, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; a second topic module for recognizing audio data in content stored in the storage module, converting the audio data into text data, extracting keywords from the text data, and conducting topic processing using the keywords to create topic information; and a similarity determining module for comparing the topic information generated by the first topic module with that created by the second topic module to determine the presence or absence of similarity between them.
    Type: Grant
    Filed: January 19, 2009
    Date of Patent: March 18, 2014
    Assignee: NEC Corporation
    Inventor: Eiichirou Nomitsu
  • Patent number: 8666737
    Abstract: A noise power estimation system for estimating noise power of each frequency spectral component includes a cumulative histogram generating section for generating a cumulative histogram for each frequency spectral component of a time series signal, in which the horizontal axis indicates index of power level and the vertical axis indicates cumulative frequency and which is weighted by exponential moving average; and a noise power estimation section for determining an estimated value of noise power for each frequency spectral component of the time series signal based on the cumulative histogram.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 4, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa
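The exponentially weighted cumulative-histogram estimator described above can be sketched for a single frequency bin as follows; the bin count, decay factor, and percentile are illustrative assumptions (the patent leaves these as design parameters).

```python
def estimate_noise_level(power_levels, num_bins=10, max_level=100.0,
                         alpha=0.1, percentile=0.5):
    """Estimate noise power for one frequency component from a time
    series of power levels, via a cumulative histogram over power-level
    indices weighted by an exponential moving average."""
    hist = [0.0] * num_bins
    for p in power_levels:
        idx = min(int(p / max_level * num_bins), num_bins - 1)
        # exponential moving average: old mass decays, the new frame adds in
        hist = [(1 - alpha) * h for h in hist]
        hist[idx] += alpha
    # cumulative histogram: vertical axis is cumulative frequency
    cum, total = [], 0.0
    for h in hist:
        total += h
        cum.append(total)
    # estimated noise power = level whose cumulative frequency first
    # reaches the chosen percentile of the total mass
    target = percentile * cum[-1]
    for i, c in enumerate(cum):
        if c >= target:
            return (i + 0.5) / num_bins * max_level
    return max_level

# Mostly low-power frames with a few loud bursts: the estimate tracks
# the low (noise-floor) level rather than the bursts.
noise = estimate_noise_level([10.0] * 50 + [80.0] * 5)
```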
  • Patent number: 8639510
    Abstract: A hardware acoustic scoring unit for a speech recognition system and a method of operation thereof are provided. Rather than scoring all senones in an acoustic model used for the speech recognition system, acoustic scoring logic first scores a set of CI phones (context-independent phones) based on acoustic features for one frame of sampled speech. The acoustic scoring logic then scores senones associated with the N highest scored CI phones. In one embodiment, the number (N) is three. While the acoustic scoring logic scores the senones associated with the N highest scored CI phones, high-score CI phone identification logic operates in parallel with the acoustic scoring unit to identify one or more additional CI phones that have scores greater than a threshold. Once the acoustic scoring unit finishes scoring the senones for the N highest scored CI phones, the acoustic scoring unit then scores senones associated with the one or more additional CI phones.
    Type: Grant
    Filed: December 22, 2008
    Date of Patent: January 28, 2014
    Inventors: Kai Yu, Rob A. Rutenbar
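The two-stage selection above, score all context-independent phones, then score senones only for the top-N phones plus any others above a threshold, can be sketched in software as below (the hardware performs the threshold check in parallel with stage two; the scores and threshold here are made up).

```python
def score_frame(ci_phone_scores, senones_by_ci_phone, score_senone,
                n=3, extra_threshold=0.8):
    """Two-stage acoustic scoring for one frame: select the N
    best-scoring context-independent phones plus any others above a
    threshold, then score only the senones tied to that selection."""
    ranked = sorted(ci_phone_scores, key=ci_phone_scores.get, reverse=True)
    selected = set(ranked[:n])
    # additional phones whose score exceeds the threshold also qualify
    selected |= {p for p, s in ci_phone_scores.items()
                 if s > extra_threshold}
    senone_scores = {}
    for phone in selected:
        for senone in senones_by_ci_phone.get(phone, []):
            senone_scores[senone] = score_senone(senone)
    return senone_scores

ci_scores = {"a": 0.9, "b": 0.85, "c": 0.7, "d": 0.4, "e": 0.82}
senones = {p: [p + "1", p + "2"] for p in ci_scores}
result = score_frame(ci_scores, senones, score_senone=lambda s: len(s))
```

Only 6 of the 10 senones are scored here; the rest are skipped, which is the source of the hardware savings.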
  • Patent number: 8639517
    Abstract: Disclosed are systems, methods and computer-readable media for controlling a computing device to provide contextual responses to user inputs. The method comprises receiving a user input, generating a set of features characterizing an association between the user input and a conversation context based on at least a semantic and syntactic analysis of user inputs and system responses, determining with a data-driven machine learning approach whether the user input begins a new topic or is associated with a previous conversation context, and, if the user input is associated with the existing topic, generating a response to the user input using information associated with the user input and any previous user input associated with the existing topic, based on a normalization of the length of the user input.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: January 28, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Giuseppe Di Fabbrizio, Junlan Feng
  • Patent number: 8626507
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: January 7, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8620658
    Abstract: A voice chat system includes a plurality of information processing apparatuses that performs a voice chat while performing speech recognition and a search server connected to the plural information processing apparatuses via a communication network. The search server discloses a search keyword list containing the search keywords searched by the search server to at least one of the plural information processing apparatuses. The at least one information processing apparatus includes a recognition word dictionary generating unit that acquires the search keyword list from the search server to generate a recognition word dictionary containing words for use in the speech recognition, and a speech recognition unit that performs speech recognition on voice data obtained from a dialog of the conversation during the voice chat by referencing a recognition database containing the recognition word dictionary.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: December 31, 2013
    Assignees: Sony Corporation, So-Net Entertainment Corporation
    Inventors: Motoki Nakade, Hiroaki Ogawa, Hitoshi Honda, Yoshinori Kurata, Daisuke Ishizuka
  • Patent number: 8612235
    Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
    Type: Grant
    Filed: June 8, 2012
    Date of Patent: December 17, 2013
    Assignee: Vocollect, Inc.
    Inventors: Keith Braho, Amro El-Jaroudi
  • Patent number: 8612227
    Abstract: The present invention provides a method and equipment of pattern recognition capable of efficiently pruning partial hypotheses without lowering recognition accuracy, its pattern recognition program, and its recording medium. In a second search unit, a likelihood calculation unit calculates an acoustic likelihood by matching time series data of acoustic feature parameters against a lexical tree stored in a second database and an acoustic model stored in a third database to determine an accumulated likelihood by accumulating the acoustic likelihood in a time direction. A self-transition unit causes each partial hypothesis to make a self-transition in a search process. An LR transition unit causes each partial hypothesis to make an LR transition. A reward attachment unit adds a reward R(x) in accordance with the number of reachable words to each partial hypothesis to raise the accumulated likelihood. A pruning unit excludes partial hypotheses with less likelihood from search targets.
    Type: Grant
    Filed: July 22, 2010
    Date of Patent: December 17, 2013
    Assignee: KDDI Corporation
    Inventor: Tsuneo Kato
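The reward-augmented pruning above, ranking each partial hypothesis by accumulated likelihood plus a reward R(x) that grows with the number of reachable words, can be sketched as follows; the reward scale and hypothesis data are invented for illustration.

```python
def prune_hypotheses(hypotheses, reachable_words, beam_width,
                     reward_scale=0.1):
    """Keep only the best partial hypotheses, ranking each by its
    accumulated likelihood plus a reward R(x) proportional to the
    number of words still reachable from its node of the lexical tree."""
    def ranked_score(hyp):
        name, accumulated_likelihood = hyp
        reward = reward_scale * reachable_words[name]  # R(x)
        return accumulated_likelihood + reward
    survivors = sorted(hypotheses, key=ranked_score, reverse=True)
    return survivors[:beam_width]

# h2 has a worse raw likelihood than h1, but many reachable words;
# the reward keeps it (and ranks it first) instead of pruning it.
hyps = [("h1", -10.0), ("h2", -10.5), ("h3", -12.0)]
reach = {"h1": 1, "h2": 20, "h3": 2}
kept = prune_hypotheses(hyps, reach, beam_width=2)
```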
  • Patent number: 8612225
    Abstract: A voice recognition device that recognizes a voice of an input voice signal, comprises a voice model storage unit that stores in advance a predetermined voice model having a plurality of detail levels, the plurality of detail levels being information indicating a feature property of a voice for the voice model; a detail level selection unit that selects a detail level, closest to a feature property of an input voice signal, from the detail levels of the voice model stored in the voice model storage unit; and a parameter setting unit that sets parameters for recognizing the voice of an input voice according to the detail level selected by the detail level selection unit.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: December 17, 2013
    Assignee: NEC Corporation
    Inventors: Takayuki Arakawa, Ken Hanazawa, Masanori Tsujikawa
  • Patent number: 8606580
    Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Patent number: 8589334
    Abstract: Methods and systems are provided for developing decision information relating to a single system based on data received from a plurality of sensors. The method includes receiving first data from a first sensor that defines first information of a first type that is related to a system, receiving second data from a second sensor that defines second information of a second type that is related to said system, wherein the first type is different from the second type, generating a first decision model, a second decision model, and a third decision model, determining whether data is available from only the first sensor, only the second sensor, or both the first and second sensors, and selecting based on the determination of availability an additional model to apply the available data, wherein the additional model is selected from a plurality of additional decision models including the third decision model.
    Type: Grant
    Filed: January 18, 2011
    Date of Patent: November 19, 2013
    Assignee: Telcordia Technologies, Inc.
    Inventor: Akshay Vashist
  • Patent number: 8589165
    Abstract: The present disclosure provides method and system for converting a free text expression of an identity to a phonetic equivalent code. The conversion follows a set of rules based on phonetic groupings and compresses the expression to a shorter series of characters than the expression. The phonetic equivalent code may be compared to one or more other phonetic equivalent code to establish a correlation between the codes. The phonetic equivalent code of the free text expression may be associated with the code of a known identity. The known identity may be provided to a user for confirmation of the identity. Further, a plurality of expressions stored in a database may be consolidated by converting the expressions to phonetic equivalent codes, comparing the codes to find correlations, and if appropriate reducing the number of expressions or mapping the expressions to a fewer number of expressions.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: November 19, 2013
    Assignee: United Services Automobile Association (USAA)
    Inventors: Gregory Brian Meyer, James Elden Nicholson
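A phonetic equivalent code of the kind described above, compressing a name by phonetic groupings so that similar-sounding spellings collide, can be illustrated with a simplified Soundex; the patent does not disclose its exact coding rules, so this is a stand-in for the general technique.

```python
def phonetic_code(name):
    """Simplified Soundex: keep the first letter, map the remaining
    consonants to digits by phonetic group, drop vowels, collapse
    adjacent duplicates, and pad/truncate to four characters."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    out = name[0].upper()
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]
```

Because "Smith" and "Smyth" compress to the same code, comparing codes (rather than raw spellings) establishes the correlation the abstract describes.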
  • Patent number: 8583439
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: January 12, 2004
    Date of Patent: November 12, 2013
    Assignee: Verizon Services Corp.
    Inventor: James Mark Kondziela
  • Patent number: 8577678
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: March 10, 2011
    Date of Patent: November 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
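A soft mask taking continuous values between 0 and 1 per frequency component, as opposed to a hard binary mask, can be illustrated with a logistic function of the separation reliability; the logistic shape and its parameters are assumptions, since the patent derives the mask from signal and noise distributions.

```python
import math

def soft_mask(reliability, midpoint=0.5, steepness=10.0):
    """Map a separation-reliability value to a continuous mask value in
    (0, 1). A hard mask would instead threshold at the midpoint,
    discarding how confident the separation actually was."""
    return 1.0 / (1.0 + math.exp(-steepness * (reliability - midpoint)))

masks = [soft_mask(r) for r in (0.1, 0.5, 0.9)]
```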
  • Patent number: 8560324
    Abstract: A mobile terminal including an input unit configured to receive an input to activate a voice recognition function on the mobile terminal, a memory configured to store information related to operations performed on the mobile terminal, and a controller configured to activate the voice recognition function upon receiving the input to activate the voice recognition function, to determine a meaning of an input voice instruction based on at least one prior operation performed on the mobile terminal and a language included in the voice instruction, and to provide operations related to the determined meaning of the input voice instruction based on the at least one prior operation performed on the mobile terminal and the language included in the voice instruction and based on a probability that the determined meaning of the input voice instruction matches the information related to the operations of the mobile terminal.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: October 15, 2013
    Assignee: LG Electronics Inc.
    Inventors: Jong-Ho Shin, Jae-Do Kwak, Jong-Keun Youn
  • Patent number: 8559618
    Abstract: A system, method, and computer readable medium for contact center call routing by agent attribute, comprising: detecting a caller voice attribute, sampling at least one agent voice attribute, storing the at least one agent voice attribute, matching the detected caller voice attribute to the stored voice attribute of the at least one agent, and routing a call to an agent based upon the match between the stored agent voice attribute and the detected caller voice attribute.
    Type: Grant
    Filed: June 28, 2006
    Date of Patent: October 15, 2013
    Assignee: West Corporation
    Inventors: Jeffrey William Cordell, James K Boutcher, Michelle L Steinbeck
  • Patent number: 8554553
    Abstract: Methods and systems for non-negative hidden Markov modeling of signals are described. For example, techniques disclosed herein may be applied to signals emitted by one or more sources. In some embodiments, methods and systems may enable the separation of a signal's various components. As such, the systems and methods disclosed herein may find a wide variety of applications. In audio-related fields, for example, these techniques may be useful in music recording and processing, source extraction, noise reduction, teaching, automatic transcription, electronic games, audio search and retrieval, and many other applications.
    Type: Grant
    Filed: February 21, 2011
    Date of Patent: October 8, 2013
    Assignee: Adobe Systems Incorporated
    Inventors: Gautham J. Mysore, Paris Smaragdis
  • Patent number: 8548806
    Abstract: A voice recognition device, a voice recognition method and a voice recognition program capable of appropriately restricting recognition objects based on voice input from a user to recognize the input voice with accuracy are provided.
    Type: Grant
    Filed: September 11, 2007
    Date of Patent: October 1, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventor: Hisayuki Nagashima
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8543399
    Abstract: An apparatus for speech recognition includes: a first confidence score calculator calculating a first confidence score using a ratio between a likelihood of a keyword model for feature vectors per frame of a speech signal and a likelihood of a Filler model for the feature vectors; a second confidence score calculator calculating a second confidence score by comparing a Gaussian distribution trace of the keyword model per frame of the speech signal with a Gaussian distribution trace sample of a stored corresponding keyword of the keyword model; and a determination module determining a confidence of a result using the keyword model in accordance with a position determined by the first and second confidence scores on a confidence coordinate system.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: September 24, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-hoon Jeong, Sang-bae Jeong, Jeong-su Kim, Nam-hoon Kim
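The first confidence score above, a ratio between keyword-model and filler-model likelihoods over the frames of an utterance, is commonly computed as an average per-frame log-likelihood ratio; the averaging and the example likelihoods below are illustrative, not from the patent (which also combines this with a second, Gaussian-trace-based score).

```python
import math

def confidence_score(keyword_likelihoods, filler_likelihoods):
    """First confidence score: per-frame log-likelihood ratio between
    the keyword model and the filler (garbage) model, averaged over
    the frames of the speech signal."""
    ratios = [math.log(k / f)
              for k, f in zip(keyword_likelihoods, filler_likelihoods)]
    return sum(ratios) / len(ratios)

# The keyword model fits these frames better than the filler model,
# so the score comes out positive (accept-leaning).
score = confidence_score([0.9, 0.8, 0.7], [0.3, 0.4, 0.35])
```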
  • Patent number: 8532993
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Grant
    Filed: July 2, 2012
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8521538
    Abstract: A system and method of assisting a care provider in the documentation of self-performance and support information for a resident or person includes a speech dialog with a care provider that uses the generation of speech to play to the care provider and the capture of speech spoken by a care provider. The speech dialog provides assistance to the care provider in providing care for a person according to a care plan for the person. The care plan includes one or more activities requiring a level of performance by the person. For the activity, speech inquiries are provided to the care provider, through the speech dialog, regarding performance of the activity by the person and regarding care provider assistance in the performance of the activity by the person. Speech input is captured from the care provider that is responsive to the speech inquiries. A code is then determined from the speech input and the code indicates the self-performance of the person and support information for a care provider for the activity.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: August 27, 2013
    Assignee: Vocollect Healthcare Systems, Inc.
    Inventors: Michael Laughery, Bonnie Praksti, David M. Findlay, James E. Shearon
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
  • Patent number: 8510111
    Abstract: A speech recognition apparatus includes a generating unit generating a speech-feature vector expressing a feature for each of frames obtained by dividing an input speech, a storage unit storing a first acoustic model obtained by modeling a feature of each word by using a state transition model, a storage unit configured to store at least one second acoustic model, a calculation unit calculating, for each state, a first probability of transition to an at-end-frame state to obtain first probabilities, and select a maximum probability of the first probabilities, a selection unit selecting a maximum-probability-transition path, a conversion unit converting the maximum-probability-transition path into a corresponding-transition-path corresponding to the second acoustic model, a calculation unit calculating a second probability of transition to the at-end-frame state on the corresponding-transition-path, and a finding unit finding to which word the input speech corresponds based on the maximum probability and the second probability.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: August 13, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Hiroshi Fujimura, Shinichi Tanaka
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more of controllable parameters of input audio voice signal to produce a modified output audio voice signal in which said selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland
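The abstract above mentions a two-level hidden Markov model for labeling voiced and unvoiced speech segments. A minimal sketch of the decoding step such a model relies on, here as plain two-state Viterbi decoding (the state inventory and all inputs are illustrative; the patent's actual model is two-level and richer):

```python
def viterbi(obs_loglik, log_trans, log_init):
    """Most likely state sequence for a small HMM (e.g. voiced/unvoiced).

    obs_loglik: per-frame lists of per-state observation log-likelihoods.
    log_trans[i][j]: log P(next state j | state i).
    log_init[i]: log P(initial state i).
    """
    n_states = len(log_init)
    scores = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []
    for frame in obs_loglik[1:]:
        prev = scores
        scores, ptr = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            scores.append(prev[best] + log_trans[best][j] + frame[j])
        back.append(ptr)
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptr in reversed(back):  # follow backpointers to recover the path
        state = ptr[state]
        path.append(state)
    return path[::-1]
```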
  • Patent number: 8478589
    Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: July 2, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
  • Patent number: 8478582
    Abstract: A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, by the use of a pictorial-symbol dictionary memory storing a correspondence between designated pictorial-symbols to be rated and scores of opinions expressed by the respective pictorial-symbols, at least one of the used pictorial-symbols in the message which is coincident with at least one of the designated pictorial-symbols stored in the pictorial-symbol dictionary memory, is extracted from the message, at least one of the opinion scores which corresponds to the at least one extracted pictorial-symbol is retrieved within the pictorial-symbol dictionary memory, and an aggregate net opinion score for the message is calculated, based on an aggregate opinion score for the at least one extracted pictorial-symbol.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: July 2, 2013
    Assignee: KDDI Corporation
    Inventors: Yukiko Habu, Ryoichi Kawada, Nobuhide Kotsuka, Sung Jiae, Koki Uchiyama, Santi Saeyor, Hirosuke Asano, Toshiaki Shimamura
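The dictionary-lookup-and-aggregate scheme above can be sketched in a few lines (the mean as the aggregation and all names are illustrative assumptions; the patent does not specify this exact aggregation):

```python
def message_opinion_score(message, symbol_scores):
    """Aggregate opinion score for a message from its rated pictorial
    symbols (e.g. emoji), given a dictionary mapping symbol -> score."""
    found = [symbol_scores[ch] for ch in message if ch in symbol_scores]
    if not found:
        return 0.0  # no rated symbols present
    return sum(found) / len(found)
```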
  • Patent number: 8442833
    Abstract: Computer implemented speech processing is disclosed. First and second voice segments are extracted from first and second microphone signals originating from first and second microphones. The first and second voice segments correspond to a voice sound originating from a common source. An estimated source location is generated based on a relative energy of the first and second voice segments and/or a correlation of the first and second voice segments. A determination whether the voice segment is desired or undesired may be made based on the estimated source location.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: May 14, 2013
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
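One of the location cues named above is the relative energy of the two voice segments. A minimal sketch of that cue (the log-energy-ratio formulation and the function name are my assumptions, not the patent's definition):

```python
import math

def relative_energy_cue(seg1, seg2, eps=1e-12):
    """Crude source-location cue from two microphone voice segments:
    positive values suggest the source is closer to microphone 1."""
    e1 = sum(x * x for x in seg1)  # segment energies
    e2 = sum(x * x for x in seg2)
    return math.log((e1 + eps) / (e2 + eps))
```

A real system would combine this with a cross-correlation (time-delay) cue, as the abstract suggests, before deciding whether the segment is desired.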
  • Patent number: 8442829
    Abstract: Speech processing is disclosed for an apparatus having a main processing unit, a memory unit, and one or more co-processors. Memory maintenance and voice recognition result retrievals upon execution are performed with a first main processor thread. Voice detection and initial feature extraction on the raw data are performed with a first co-processor. A second co-processor thread receives feature data derived for one or more features extracted by the first co-processor thread and information for locating probability density functions needed for probability computation by a speech recognition model and computes a probability that the one or more features correspond to a known sub-unit of speech using the probability density functions and the feature data. At least a portion of a path probability that a sequence of sub-units of speech correspond to a known speech unit is computed with a third co-processor thread.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: May 14, 2013
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Patent number: 8442821
    Abstract: A method and system for multi-frame prediction in a hybrid neural network/hidden Markov model automatic speech recognition (ASR) system is disclosed. An audio input signal may be transformed into a time sequence of feature vectors, each corresponding to respective temporal frame of a sequence of periodic temporal frames of the audio input signal. The time sequence of feature vectors may be concurrently input to a neural network, which may process them concurrently. In particular, the neural network may concurrently determine for the time sequence of feature vectors a set of emission probabilities for a plurality of hidden Markov models of the ASR system, where the set of emission probabilities are associated with the temporal frames. The set of emission probabilities may then be concurrently applied to the hidden Markov models for determining speech content of the audio input signal.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: May 14, 2013
    Assignee: Google Inc.
    Inventor: Vincent Vanhoucke
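In hybrid NN/HMM systems like the one above, the network's state posteriors are commonly converted to scaled likelihoods before being used as HMM emission scores. A minimal sketch of that standard conversion (standard hybrid-ASR practice, not a detail quoted from the patent):

```python
def scaled_likelihoods(posteriors, priors):
    """Convert NN state posteriors p(state | frame) into scaled likelihoods
    proportional to p(frame | state), by dividing out the state priors."""
    return [p / q for p, q in zip(posteriors, priors)]
```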
  • Patent number: 8442828
    Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
  • Patent number: 8433558
    Abstract: Disclosed herein are systems and methods to incorporate human knowledge when developing and using statistical models for natural language understanding. The disclosed systems and methods embrace a data-driven approach to natural language understanding which progresses seamlessly along the continuum of availability of annotated collected data, from when there is no available annotated collected data to when there is any amount of annotated collected data.
    Type: Grant
    Filed: July 25, 2005
    Date of Patent: April 30, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Mazin Gilbert, Narendra K. Gupta
  • Patent number: 8428950
    Abstract: A speech recognition apparatus (110) selects an optimum recognition result from recognition results output from a set of speech recognizers (s1-sM) based on a majority decision. This decision takes into account weight values, as to the set of the speech recognizers, learned by a learning apparatus (100). The learning apparatus includes a unit (103) selecting speech recognizers corresponding to characteristics of speech for learning (101), a unit (104) finding recognition results of the speech for learning by using the selected speech recognizers, a unit (105) unifying the recognition results and generating a word string network, and a unit (106) finding weight values concerning a set of the speech recognizers by implementing learning processing.
    Type: Grant
    Filed: January 18, 2008
    Date of Patent: April 23, 2013
    Assignee: NEC Corporation
    Inventors: Yoshifumi Onishi, Tadashi Emori
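The weighted majority decision above can be sketched as follows, at the whole-hypothesis level for simplicity (the patent operates on a word string network; this flat version and its names are illustrative):

```python
from collections import defaultdict

def weighted_vote(hypotheses, weights):
    """Pick the hypothesis with the largest total recognizer weight.

    hypotheses: one recognized string per recognizer.
    weights: per-recognizer weights (e.g. learned on held-out speech).
    """
    totals = defaultdict(float)
    for hyp, w in zip(hypotheses, weights):
        totals[hyp] += w
    return max(totals, key=totals.get)
```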
  • Patent number: 8423364
    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: April 16, 2013
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Alejandro Acero, Li Deng, Xiaodong He
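A margin-based loss of the general kind described above can be sketched as a smoothed 0-1 loss on the margin-shifted score difference between the correct class and a competitor (the sigmoid form and parameter names are common choices in discriminative training, not quoted from the patent):

```python
import math

def margin_sigmoid_loss(correct_score, competitor_score, margin, beta=1.0):
    """Smoothed 0-1 loss: near 0 when the correct class beats the
    competitor by more than `margin`, near 1 when it loses badly.
    `beta` controls the sharpness of the sigmoid."""
    d = correct_score - competitor_score - margin
    return 1.0 / (1.0 + math.exp(beta * d))
```

Increasing `margin` pushes the decision boundary away from correct tokens, which is the effect the abstract describes.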
  • Patent number: 8412526
    Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N−L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: April 2, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Alexander Sorin
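The final step above, computing an MFCC vector from a frame, conventionally ends with a DCT-II over log mel filterbank energies. A minimal sketch of that step (the standard textbook formulation; the patent's synthesis loop around it is not shown):

```python
import math

def mfcc_from_log_mel(log_mel_energies, n_coeffs):
    """DCT-II of log mel filterbank energies -> first n_coeffs MFCCs."""
    m = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * k * (i + 0.5) / m)
            for i, e in enumerate(log_mel_energies))
        for k in range(n_coeffs)
    ]
```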
  • Patent number: 8392195
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams include the first audio and/or video data streams data without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: March 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Patent number: 8390669
    Abstract: The present disclosure discloses a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to a probability that their respective one or more individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with a most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: March 5, 2013
    Assignee: Cisco Technology, Inc.
    Inventors: Jason Catchpole, Craig Cockerton
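The ranked-subset search above amounts to comparing the detected face against subsets in order of prior probability and stopping at the first match. A minimal sketch (the `matcher` callback and all data shapes are illustrative assumptions):

```python
def identify_face(detected, ranked_subsets, matcher):
    """Compare a detected face against subsets of known facial images,
    most probable subset first; return the first matching identity.

    ranked_subsets: list of subsets, each a list of (identity, image) pairs.
    matcher: callable(detected, image) -> bool, some face comparator.
    """
    for subset in ranked_subsets:
        for identity, image in subset:
            if matcher(detected, image):
                return identity
    return None  # no subset contained a match
```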
  • Patent number: 8392185
    Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: March 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
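Applying a soft mask of the kind described above to a magnitude spectrogram is a single element-wise product. A minimal sketch (list-of-lists in place of a real spectrogram array; the mask-generation step from separation reliability is not shown):

```python
def apply_soft_mask(spectrogram, mask):
    """Apply a soft mask (values in [0, 1], one per time-frequency cell)
    to a magnitude spectrogram; 1 keeps a cell, 0 suppresses it."""
    return [[s * m for s, m in zip(row_s, row_m)]
            for row_s, row_m in zip(spectrogram, mask)]
```

Because the mask is continuous rather than binary, cells with uncertain separation are attenuated instead of discarded, which is the point of the patent's soft mask.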
  • Patent number: 8386232
    Abstract: A method for predicting results for input data based on a model that is generated based on clusters of related characters, clusters of related segments, and training data. The method comprises receiving a data set that includes a plurality of words in a particular language. In the particular language, words are formed by characters. Clusters of related characters are formed from the data set. A model is generated based at least on the clusters of related characters and training data. The model may also be based on the clusters of related segments. The training data includes a plurality of entries, wherein each entry includes a character and a designated result for said character. A set of input data that includes characters that have not been associated with designated results is received. The model is applied to the input data to determine predicted results for characters within the input data.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: February 26, 2013
    Assignee: Yahoo! Inc.
    Inventor: Fuchun Peng
  • Patent number: 8386254
    Abstract: A method of speech recognition converts an unknown speech input into a stream of representative features. The feature stream is transformed based on speaker dependent adaptation of multi-class feature models. Then automatic speech recognition is used to compare the transformed feature stream to multi-class speaker independent acoustic models to generate an output representative of the unknown speech input.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 26, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Neeraj Deshmukh, Puming Zhan
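Speaker-dependent feature transforms of the kind described above are often realized as an affine map applied to each feature vector (fMLLR-style adaptation; the affine form here is one common realization, not a detail quoted from the patent):

```python
def affine_feature_transform(features, A, b):
    """Apply a speaker-dependent affine transform y = A x + b to each
    feature vector x in a stream of features."""
    return [
        [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
         for row, b_i in zip(A, b)]
        for x in features
    ]
```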