Markov Patents (Class 704/256)
  • Patent number: 7991616
    Abstract: The present invention is a speech synthesizer that generates speech data of text including a fixed part and a variable part, in combination with recorded speech and rule-based synthetic speech. The speech synthesizer is a high-quality one in which recorded speech and synthetic speech are concatenated with the discontinuity of timbres and prosodies not perceived.
    Type: Grant
    Filed: October 22, 2007
    Date of Patent: August 2, 2011
    Assignee: Hitachi, Ltd.
    Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
  • Patent number: 7974843
    Abstract: The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each relevant phoneme set of the mother tongue are then specifically mapped to determine phoneme sequences that correspond to pronunciation variants.
    Type: Grant
    Filed: January 2, 2003
    Date of Patent: July 5, 2011
    Assignee: Siemens Aktiengesellschaft
    Inventor: Tobias Schneider
  • Patent number: 7974846
    Abstract: A data embedding device for embedding data in a speech code obtained by encoding a speech in accordance with a speech encoding method based on a voice generation process of a human being, includes an embedding judgment unit, every speech code, judging whether or not data should be embedded in the speech code, and an embedding unit embedding data in two or more parameter codes of a plurality of parameter codes constituting the speech code for which it is judged by the embedding judgment unit that the data should be embedded.
    Type: Grant
    Filed: March 17, 2004
    Date of Patent: July 5, 2011
    Assignee: Fujitsu Limited
    Inventors: Yoshiteru Tsuchinaga, Yasuji Ota, Masanao Suzuki, Masakiyo Tanaka
  • Patent number: 7970613
    Abstract: Use of runtime memory may be reduced in a data processing algorithm that uses one or more probability distribution functions. Each probability distribution function may be characterized by one or more uncompressed mean values and one or more variance values. The uncompressed mean and variance values may be represented by α-bit floating point numbers, where α is an integer greater than 1. The probability distribution functions are converted to compressed probability functions having compressed mean and/or variance values represented as β-bit integers, where β is less than α, whereby the compressed mean and/or variance values occupy less memory space than the uncompressed mean and/or variance values. Portions of the data processing algorithm can be performed with the compressed mean and variance values.
    Type: Grant
    Filed: November 12, 2005
    Date of Patent: June 28, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
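
The memory saving described in this abstract can be illustrated with a minimal sketch. The abstract does not state how the floating point values are mapped to integers, so the linear min/max quantization below, along with the names `compress` and `decompress`, is an assumption made for illustration only.

```python
def compress(values, bits=8):
    """Map floats onto unsigned integers of `bits` bits using a linear scale.

    Assumption: a simple min/max linear quantizer; the patent abstract does
    not specify the actual conversion.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                      # largest representable code
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def decompress(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    means = [-1.5, 0.0, 0.25, 2.5]                # hypothetical Gaussian means
    codes, lo, scale = compress(means, bits=8)
    print(codes, decompress(codes, lo, scale))
```

Each restored value differs from the original by at most half a quantization step, which is the trade made for the smaller storage footprint.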
  • Patent number: 7970614
    Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 28, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
  • Patent number: 7937270
    Abstract: A system and method recognizes speech securely using a secure multi-party computation protocol. The system includes a client and a server. The client is configured to provide securely speech in a form of an observation sequence of symbols, and the server is configured to provide securely multiple trained hidden Markov models (HMMs), each trained HMM including multiple states, a state transition probability distribution and an initial state distribution, and each state including a subset of the observation symbols and an observation symbol probability distribution. The observation symbol probability distributions are modeled by mixtures of Gaussian distributions. Also included are means for determining securely, for each HMM, a likelihood the observation sequence is produced by the states of the HMM, and means for determining a particular symbol with a maximum likelihood of a particular subset of the symbols corresponding to the speech.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: May 3, 2011
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Paris Smaragdis, Madhusudana Shashanka
  • Patent number: 7933847
    Abstract: An algorithm that employs modified methods developed for optimizing differentiable functions but which can also handle the special non-differentiabilities that occur with the L1-regularization. The algorithm is a modification of the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, but which can now handle the discontinuity of the gradient using a procedure that chooses a search direction at each iteration and modifies the line search procedure. The algorithm includes an iterative optimization procedure where each iteration approximately minimizes the objective over a constrained region of the space on which the objective is differentiable (in the case of L1-regularization, a given orthant), models the second-order behavior of the objective by considering the loss component alone, using a “line-search” at each iteration that projects search points back onto the chosen orthant, and determines when to stop the line search.
    Type: Grant
    Filed: October 17, 2007
    Date of Patent: April 26, 2011
    Assignee: Microsoft Corporation
    Inventors: Galen Andrew, Jianfeng Gao
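
The step in this abstract that "projects search points back onto the chosen orthant" can be sketched in a few lines. This is only the projection step, not the full optimizer; the function name `project_onto_orthant` and the encoding of an orthant as a list of signs (+1 or -1 per coordinate) are assumptions for illustration.

```python
def project_onto_orthant(point, orthant):
    """Zero every coordinate of `point` whose sign disagrees with `orthant`.

    `orthant` holds +1 or -1 per coordinate (a hypothetical encoding).
    Coordinates that would cross into another orthant are clipped to 0,
    so the result stays in the region where the L1 term is differentiable.
    """
    return [x if x * s > 0 else 0.0 for x, s in zip(point, orthant)]


if __name__ == "__main__":
    candidate = [0.5, -0.2, 0.3]     # trial point produced by a line search
    chosen = [1, 1, -1]              # orthant selected for this iteration
    print(project_onto_orthant(candidate, chosen))
```

Clipping to zero rather than reflecting is what lets L1-regularized weights become exactly zero during the search.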
  • Patent number: 7933774
    Abstract: A system and method is provided for rapidly generating a new spoken dialog application. In one embodiment, a user experience person labels the transcribed data (e.g., 3000 utterances) using a set of interactive tools. The labeled data is then stored in a processed data database. During the labeling process, the user experience person not only groups utterances in various call type categories, but also flags (e.g., 100-200) specific utterances as positive and negative examples for use in an annotation guide. The labeled data in the processed data database can also be used to generate an initial natural language understanding (NLU) model.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Mazin G. Rahim, Allen Louis Gorin, Behzad Shahraray, David Crawford Gibbon, Zhu Liu, Bernard S. Renger, Patrick Guy Haffner, Harris Drucker, Steven Hart Lewis
  • Patent number: 7912717
    Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: March 22, 2011
    Inventor: Albert Galick
  • Patent number: 7899761
    Abstract: Disclosed herein are a system and method for trend prediction of signals in a time series using a Markov model. The method includes receiving a plurality of data series and input parameters, where the input parameters include a time step parameter, preprocessing the plurality of data series according to the input parameters, to form binned and classified data series, and processing the binned and classified data series. The processing includes initializing a Markov model for trend prediction, and training the Markov model for trend prediction of the binned and classified data series to form a trained Markov model. The method further includes deploying the trained Markov model for trend prediction, including outputting trend predictions. The method develops an architecture for the Markov model from the data series and the input parameters, and disposes the Markov model, having the architecture, for trend prediction.
    Type: Grant
    Filed: April 25, 2005
    Date of Patent: March 1, 2011
    Assignee: GM Global Technology Operations LLC
    Inventors: Shubha Kadambe, Leandro G. Barajas, Youngkwan Cho, Pulak Bandyopadhyay
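
The pipeline in this abstract (classify the series using a time step, train a Markov model, output trend predictions) can be sketched as follows. The three trend classes, the first-order model, and the names `classify`, `train` and `predict` are assumptions; the abstract does not describe the actual binning scheme or model architecture.

```python
from collections import Counter, defaultdict


def classify(series, step=1, tol=0.0):
    """Label the change over each `step` samples as 'up', 'down' or 'flat'."""
    labels = []
    for i in range(step, len(series), step):
        delta = series[i] - series[i - step]
        labels.append("up" if delta > tol else "down" if delta < -tol else "flat")
    return labels


def train(labels):
    """Estimate first-order transition probabilities from label pairs."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def predict(model, state):
    """Return the most probable next trend, or None for an unseen state."""
    row = model.get(state)
    return max(row, key=row.get) if row else None


if __name__ == "__main__":
    readings = [1, 2, 3, 2, 3, 4, 3, 4, 5]   # hypothetical signal samples
    model = train(classify(readings))
    print(model, predict(model, "down"))
```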
  • Patent number: 7895040
    Abstract: According to an embodiment, voice recognition apparatus includes units of: acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and voice recognition method includes processes of: selecting a search range on basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, determining whether or not the output probability of a certain path is stored. Number of times of calculation of the output probability is reduced by selecting the search range on basis of the beam search, calculating the output probability of the certain transition path only once in an interval from when the standard frame is set to when the standard frame is renewed, and storing and using thus calculated value as an approximate value of the output probability in subsequent frames.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masaru Sakai, Shinichi Tanaka
  • Patent number: 7890327
    Abstract: Disclosed is a general framework for extracting semantics from composite media content at various resolutions. Specifically, given a media stream, which may consist of various types of media modalities including audio, visual, text and graphics information, the disclosed framework describes how various types of semantics could be extracted at different levels by exploiting and integrating different media features. The output of this framework is a series of tagged (or annotated) media segments at different scales. Specifically, at the lowest resolution, the media segments are characterized in a more general and broader sense, thus they are identified at a larger scale; while at the highest resolution, the media content is more specifically analyzed, inspected and identified, which thus results in small-scaled media segments.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Chitra Dorai, Ying Li
  • Patent number: 7881935
    Abstract: A speech recognition apparatus in which the accuracy in speech recognition is improved as the resource is prevented from increasing. Such a word which is probable as the result of the speech recognition is selected on the basis of an acoustic score and a linguistic score, while word selection is also performed on the basis of a measure different from the acoustic score, such as the number of phonemes being small, a part of speech being a pre-set one, inclusion in the past results of speech recognition or the linguistic score being not less than a pre-set value. The words so selected are subjected to matching processing.
    Type: Grant
    Filed: February 16, 2001
    Date of Patent: February 1, 2011
    Assignee: Sony Corporation
    Inventors: Yasuharu Asano, Katsuki Minamino, Hiroaki Ogawa, Helmut Lucke
  • Patent number: 7877255
    Abstract: A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: January 25, 2011
    Assignee: Voice Signal Technologies, Inc.
    Inventor: Igor Zlokarnik
  • Patent number: 7877256
    Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.
    Type: Grant
    Filed: February 17, 2006
    Date of Patent: January 25, 2011
    Assignee: Microsoft Corporation
    Inventors: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
  • Patent number: 7860314
    Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is adapted and more specific to an adaptation data set of interest. The adaptation data set is generally of much smaller size than the background data set. A second set of model parameters are then determined for the adapted probability model based on the set of adaptation data and the prior model.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: December 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero
  • Patent number: 7856356
    Abstract: A speech recognition system for a mobile terminal includes an acoustic variation channel unit and a pronunciation variation channel unit. The acoustic variation channel unit transforms a speech signal into feature parameters and Viterbi-decodes the speech signal to produce a varied phoneme sequence by using the feature parameters and predetermined models. Further, the pronunciation variation channel unit Viterbi-decodes the varied phoneme sequence to produce a word phoneme sequence by using the varied phoneme sequence and a preset DHMM (Discrete Hidden Markov Model) based context-dependent error model.
    Type: Grant
    Filed: December 20, 2006
    Date of Patent: December 21, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Hoon Chung, Yunkeun Lee
  • Publication number: 20100312561
    Abstract: An apparatus and a method for performing a grounding process using the POMDP are provided. The configuration is designed so that, in order to understand a request from a user through the utterances from the user, a grounding process is performed using the POMDP (Partially Observable Markov Decision Process) in which analysis information acquired from a language analyzing unit that receives the utterances of the user and performs language analysis and pragmatic information including task feasibility information acquired from the task manager that performs a task are set as observation information. Accordingly, understanding can be efficiently achieved, and high-speed and accurate recognition of the user request and task execution based on the user request can be provided.
    Type: Application
    Filed: December 4, 2008
    Publication date: December 9, 2010
    Inventor: Ugo Di Profio
  • Patent number: 7827031
    Abstract: A neural network in a speech-recognition system has computing units organized in levels including at least one hidden level and one output level. The computing units of the hidden level are connected to the computing units of the output level via weighted connections, and the computing units of the output level correspond to acoustic-phonetic units of the general vocabulary. This network executes the following steps: determining a subset of acoustic-phonetic units necessary for recognizing all the words contained in the general vocabulary subset; eliminating from the neural network all the weighted connections afferent to computing units of the output level that correspond to acoustic-phonetic units not contained in the previously determined subset of acoustic-phonetic units, thus obtaining a compacted neural network optimized for recognition of the words contained in the general vocabulary subset; and executing, at each moment in time, only the compacted neural network.
    Type: Grant
    Filed: February 12, 2003
    Date of Patent: November 2, 2010
    Assignee: Loquendo S.p.A.
    Inventors: Dario Albesano, Roberto Gemello
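
The compaction steps in this abstract (find the acoustic-phonetic units needed by a vocabulary subset, then drop the output-level connections for every other unit) can be sketched as below. The data layout, with one weight row per output unit, and the names `needed_units` and `compact_output_layer` are assumptions for illustration.

```python
def needed_units(lexicon, words):
    """Collect the acoustic-phonetic units used by the given words.

    `lexicon` maps each word to its list of units (a hypothetical format).
    """
    return {unit for word in words for unit in lexicon[word]}


def compact_output_layer(weights, biases, unit_names, needed):
    """Keep only the output units whose names appear in `needed`.

    `weights[i]` holds the incoming connection weights of output unit i,
    so dropping a row removes every connection afferent to that unit.
    """
    keep = [i for i, name in enumerate(unit_names) if name in needed]
    return (
        [weights[i] for i in keep],
        [biases[i] for i in keep],
        [unit_names[i] for i in keep],
    )


if __name__ == "__main__":
    lexicon = {"yes": ["y", "e", "s"], "no": ["n", "o"]}
    names = ["y", "e", "s", "n", "o"]
    weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    biases = [0.1, 0.2, 0.3, 0.4, 0.5]
    subset = needed_units(lexicon, ["no"])
    print(compact_output_layer(weights, biases, names, subset))
```

Only the compacted rows then need to be evaluated at each time frame, which is where the saving comes from.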
  • Patent number: 7818271
    Abstract: A method and apparatus are disclosed for selecting interaction policies. Values may be provided for a group of parameters for user models. Interaction policies within a specific tolerance of an optimal interaction policy for the user models may be learned. Up to a predetermined number of the learned interaction policies, within a specific tolerance of an optimal policy for the user models, may be selected for a wireless communication device. The wireless communication device, including the selected interaction policies, may determine whether any of a group of parameters, representing a user preference or contextual information with respect to use of the wireless communication device, is updated. When any of the group of parameters has been updated, the wireless communication device may select one of the selected interaction policies, such that the selected one of the selected interaction policies may determine a better interaction behavior for the wireless communication device.
    Type: Grant
    Filed: June 13, 2007
    Date of Patent: October 19, 2010
    Assignee: Motorola Mobility, Inc.
    Inventor: Michael E. Groble
  • Patent number: 7813927
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: October 12, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
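
The two components named in this abstract, pooling Gaussians from several HMM states and normalizing their weights, can be sketched as follows. The abstract does not say how the normalization is done; dividing each weight by the pooled total is an assumption, as are the one-dimensional Gaussians and the name `pool_gmm`.

```python
def pool_gmm(states):
    """Pool the Gaussians of all HMM states into a single mixture.

    Each state is a list of (weight, mean, variance) tuples whose weights
    sum to 1 within that state. Assumption: pooled weights are rescaled by
    their overall total so that the resulting mixture again sums to 1.
    """
    pooled = [gaussian for state in states for gaussian in state]
    total = sum(weight for weight, _, _ in pooled)
    return [(weight / total, mean, var) for weight, mean, var in pooled]


if __name__ == "__main__":
    hmm_states = [                                   # hypothetical model
        [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
        [(1.0, 3.0, 2.0)],
    ]
    print(pool_gmm(hmm_states))
```

Because the pooled mixture ignores state order, it can score speech without knowing which text was spoken.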
  • Patent number: 7813925
    Abstract: When observations are adjacent in time, or the observation signal is determined to have changed only slightly, the distribution that maximizes the output probability of a mixture distribution is very likely to remain the same. Using this fact, when the output probability of the mixture distribution HMM is obtained, the distribution that gives the maximum output probability is stored. When adjacent times or a small change of the observation signal is determined, the output probability of the stored distribution serves as the output probability of the mixture distribution. This avoids computing the output probabilities of the other distributions in the mixture, thereby reducing the calculation amount required for output probabilities.
    Type: Grant
    Filed: April 6, 2006
    Date of Patent: October 12, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hiroki Yamamoto, Masayuki Yamada
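
The caching idea in this abstract can be sketched with a small class. Several details are assumptions made for illustration: observations are scalars, "small change" means an absolute difference below a fixed threshold relative to the previous frame, and the mixture score is approximated by its best component on both the full and the cached path.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


class CachedMixture:
    """Mixture scorer that reuses the previously dominant component."""

    def __init__(self, components, threshold):
        self.components = components      # list of (weight, mean, variance)
        self.threshold = threshold        # hypothetical "small change" bound
        self.prev_x = None
        self.best = None                  # index of the stored component
        self.full_evaluations = 0         # counts full mixture evaluations

    def score(self, x):
        small_change = (
            self.prev_x is not None and abs(x - self.prev_x) < self.threshold
        )
        self.prev_x = x
        if small_change:
            # Evaluate only the stored component and skip the others.
            w, m, v = self.components[self.best]
            return math.log(w) + log_gauss(x, m, v)
        # Otherwise evaluate every component and store the best one.
        scores = [math.log(w) + log_gauss(x, m, v) for w, m, v in self.components]
        self.best = max(range(len(scores)), key=scores.__getitem__)
        self.full_evaluations += 1
        return scores[self.best]


if __name__ == "__main__":
    mix = CachedMixture([(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)], threshold=0.5)
    for frame in [0.1, 0.2, 5.0]:
        print(frame, mix.score(frame), mix.full_evaluations)
```

Comparing each frame only with its predecessor allows slow drift to accumulate, so a practical system would likely also refresh the stored component periodically.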
  • Patent number: 7805305
    Abstract: The present invention discloses a method for semantically processing speech for speech recognition purposes. The method can reduce an amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in multiple contexts to a memory size of approximately a bigram model search space with respect to the embedded grammar. The method also reduces needed CPU requirements. Achieved reductions can be accomplished by representing the embedded grammar as a recursive transition network (RTN), where only one instance of the recursive transition network is used for the contexts. Other than the embedded grammars, a Hidden Markov Model (HMM) strategy can be used for the search space.
    Type: Grant
    Filed: October 12, 2006
    Date of Patent: September 28, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel E. Badt, Tomas Beran, Radek Hampl, Pavel Krbec, Jan Sedivy
  • Patent number: 7792672
    Abstract: A method for converting a voice signal from a source speaker into a converted voice signal with acoustic characteristics similar to those of a target speaker includes the steps of determining (1) at least one function for transforming source speaker acoustic characteristics into acoustic characteristics similar to those of the target speaker using target and source speaker voice samples; and transforming acoustic characteristics of the source speaker voice signal to be converted by applying the transformation function(s). The method is characterized in that the transformation (2) includes the step (44) of applying only a predetermined portion of at least one transformation function to said signal to be converted.
    Type: Grant
    Filed: March 14, 2005
    Date of Patent: September 7, 2010
    Assignee: France Telecom
    Inventors: Olivier Rosec, Taoufik En-Najjary
  • Publication number: 20100217599
    Abstract: Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of data distribution.
    Type: Application
    Filed: October 30, 2009
    Publication date: August 26, 2010
    Inventors: Alexander Statnikov, Konstantinos (Constantin) F. Aliferis
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
  • Patent number: 7739294
    Abstract: A method for creating an ordered reading list of predetermined length of relevant topics from a hyperlinked database source of information website for a user. The method includes determining at least one topic of interest based on a plurality of methods and choosing a topic ordering algorithm from a plurality of topic ordering algorithms. A top-down schematic algorithm includes a page rank calculation performed by iterating until a convergence. A bottom-up schematic algorithm includes a linear parameterization of a ratio of an order from a plurality of source topics to a plurality of sink topics of an article, and a horizontal schematic algorithm includes an order parameterization by absolute differences of a log of a plurality of ranks and an absolute difference of a plurality of distances with analogous cutoff methods.
    Type: Grant
    Filed: January 12, 2007
    Date of Patent: June 15, 2010
    Inventor: Alexander David Wissner-Gross
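
The "page rank calculation performed by iterating until a convergence" mentioned in this abstract can be sketched with a standard power iteration. This is the textbook PageRank update, not necessarily the patent's variant; the damping factor, the tolerance, the handling of pages without outgoing links, and the name `pagerank` are all assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate the PageRank update until the ranks stop changing.

    `links` maps each page to the list of pages it links to. Rank held by
    pages with no outgoing links is spread evenly over all pages.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        # Convergence test: total absolute change across all pages.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    topic_links = {"a": ["b"], "b": ["a"], "c": ["a"]}   # hypothetical graph
    print(pagerank(topic_links))
```

Sorting topics by the returned rank gives one possible top-down ordering for a reading list.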
  • Patent number: 7734471
    Abstract: An online dialog system and method are provided. The dialog system receives speech input and outputs an action according to its models. After executing the action, the system receives feedback from the environment or user. The system immediately utilizes the feedback to update its models in an online fashion.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: June 8, 2010
    Assignee: Microsoft Corporation
    Inventors: Timothy S. Paek, David M. Chickering, Eric J. Horvitz
  • Patent number: 7711559
    Abstract: A speech recognition apparatus that requires a reduced amount of computation for likelihood calculation is provided. A language lookahead score for a node of interest is generated based on the language scores for each recognition word shared by the node of interest. To this is added the node's acoustic score, which is calculated based on the likelihood of the connected hypotheses expressed by a path from the root node to the parent node of the node of interest. From this added result, the language lookahead score resulting when the parent node is the node of interest is deleted, and the language lookahead score is updated by adding the language lookahead score of the node of interest. The updating of the language lookahead score is terminated at a specific position in the tree structure.
    Type: Grant
    Filed: December 13, 2006
    Date of Patent: May 4, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideo Kuboyama, Hiroki Yamamoto
  • Patent number: 7707027
    Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.
    Type: Grant
    Filed: April 13, 2006
    Date of Patent: April 27, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Rajesh Balchandran, Linda Boyer
  • Patent number: 7702507
    Abstract: Systems and methods that automatically register a mobile computing unit on a wireless network area, via employing a voice recognition system associated with the mobile computing unit. A handshake can occur between a mobile computing unit and a server of the network upon utterance of predetermined voice (e.g., a sequence of letters) by the user into the voice recognition component. As such, a mass deployment of mobile computing units on the network can be facilitated in a secure manner with just enough information to access the network.
    Type: Grant
    Filed: November 10, 2005
    Date of Patent: April 20, 2010
    Assignee: Symbol Technologies, Inc.
    Inventor: Patrick Tilley
  • Patent number: 7698740
    Abstract: The present invention aims at providing a sequential data examination method which can increase data examination accuracy compared with the prior art. The similarity is calculated between a layered network model generated from learning sequential data to be learned and a layered network model generated from testing sequential data to be tested. Based on the similarity, it is determined whether or not the testing sequential data to be tested belong to one or more categories. A network model for each layer of the layered network model is constructed by multiplying an element of the feature vector and its corresponding Eigen co-occurrence matrix.
    Type: Grant
    Filed: July 12, 2005
    Date of Patent: April 13, 2010
    Assignee: Japan Science and Technology Agency
    Inventors: Mizuki Oka, Kazuhiko Kato
  • Patent number: 7698136
    Abstract: The present invention is directed to a computer implemented method and apparatus for flexibly recognizing meaningful data items within an arbitrary user utterance. According to one example embodiment of the invention, a set of one or more key phrases and a set of one or more filler phrases are defined, probabilities are assigned to the key phrases and/or the filler phrases, and the user utterance is evaluated against the set of key phrases and the set of filler phrases using the probabilities.
    Type: Grant
    Filed: January 28, 2003
    Date of Patent: April 13, 2010
    Assignee: Voxify, Inc.
    Inventors: Patrick T. M. Nguyen, Adeeb W. M. Shana'a, Amit V. Desai
  • Patent number: 7680664
    Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.
    Type: Grant
    Filed: August 16, 2006
    Date of Patent: March 16, 2010
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
  • Publication number: 20100049519
    Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.
    Type: Application
    Filed: November 5, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Mazin G. Rahim, Giuseppe Riccardi, Jeremy Huntley Wright, Bruce Melvin Buntschuh, Allen Louis Gorin
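The numeric-understanding and string-validation stages described in this abstract can be illustrated with a minimal sketch. The rule classes below (digit words, repeat words such as "double", everything else treated as filler) and all names are assumptions made for illustration; the abstract does not enumerate the actual rules.

```python
# Hypothetical rule classes; the abstract does not list the real ones.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}
REPEAT_WORDS = {"double": 2, "triple": 3}


def words_to_digits(words):
    """Convert a recognized word string into a sequence of digits."""
    digits = []
    repeat = 1
    for word in words:
        w = word.lower()
        if w in REPEAT_WORDS:
            # Remember the multiplier for the next digit word.
            repeat = REPEAT_WORDS[w]
        elif w in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[w] * repeat)
            repeat = 1
        # Any other word is treated as filler and skipped.
    return "".join(digits)


def is_valid(digit_string, valid_strings):
    """Compare the digit sequence against a set of valid sequences."""
    return digit_string in valid_strings
```

For example, `words_to_digits("my number is double five oh one".split())` yields `"5501"`, which `is_valid` would then look up in whatever set stands in for the validation database.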
  • Patent number: 7664640
    Abstract: A signal processing system is disclosed which is implemented using Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, parameters of which are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. The invention reduces the tendency, common in prior art systems, to get caught in local minima associated with highly anisotropic Gaussian components (which reduces the recognizer performance) by employing the constraint system as above, whereby the anisotropy of such components is minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: February 16, 2010
    Assignee: Qinetiq Limited
    Inventor: Christopher John St. Clair Webber
  • Patent number: 7664643
    Abstract: A method, and a system to execute this method, are presented for the identification and separation of sources of an acoustic signal that contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: February 16, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
  • Patent number: 7640163
    Abstract: A method for providing a web page having an audio interface. The method including providing data specifying a web page, including in the data a first rule based grammar statement having a first phrase portion, a first command portion and a first tag portion, and including in the data a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion.
    Type: Grant
    Filed: November 30, 2001
    Date of Patent: December 29, 2009
    Assignee: The Trustees of Columbia University in the City of New York
    Inventors: Michael L. Charney, Justin Starren
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7627473
    Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
    Type: Grant
    Filed: October 15, 2004
    Date of Patent: December 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
  • Patent number: 7617103
    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the acoustic model. From these scores a misclassification measure is calculated, and then a loss function is calculated from the misclassification measure. The loss function also includes a margin value that varies over each iteration in the training. Based on the calculated loss function the acoustic model is updated, where the loss function with the margin value is minimized. This process repeats until such time as an empirical convergence is met.
    Type: Grant
    Filed: August 25, 2006
    Date of Patent: November 10, 2009
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Alex Acero, Dong Yu, Li Deng
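The chain described in this abstract (class scores, then a misclassification measure, then a loss that includes a margin) can be sketched as follows. This is a generic minimum-classification-error style formulation chosen for illustration, not necessarily the one claimed in the patent; the sigmoid loss, the `slope` parameter, and the linear margin schedule are all assumptions.

```python
import math


def misclassification_measure(correct_score, competitor_scores):
    """d = log-mean-exp of competitor scores minus the correct-class score.

    Scores are assumed to be log-likelihoods; d > 0 suggests an error.
    """
    top = max(competitor_scores)
    mean_exp = sum(math.exp(s - top) for s in competitor_scores)
    mean_exp /= len(competitor_scores)
    return top + math.log(mean_exp) - correct_score


def margin_loss(d, margin, slope=1.0):
    """Sigmoid loss shifted by a margin (assumed form)."""
    return 1.0 / (1.0 + math.exp(-slope * (d + margin)))


def margin_schedule(iteration, step=0.5):
    """Hypothetical schedule: the margin grows linearly per iteration."""
    return step * iteration
```

With a larger margin, a token that is classified correctly but only narrowly still incurs a noticeable loss, so the update keeps pushing the correct class further from its competitors. The model update itself is omitted here because the abstract does not specify it.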
  • Patent number: 7603276
    Abstract: A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference storing unit.
    Type: Grant
    Filed: November 18, 2003
    Date of Patent: October 13, 2009
    Assignee: Panasonic Corporation
    Inventor: Shinichi Yoshizawa
  • Patent number: 7603272
    Abstract: Disclosed are a system and method of decomposing a lattice transition matrix into a block diagonal matrix. The method is applicable to automatic speech recognition but can be used in other contexts as well, such as parsing, named entity extraction and any other methods. The method normalizes the topology of any input graph according to a canonical form.
    Type: Grant
    Filed: June 19, 2007
    Date of Patent: October 13, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Giuseppe Riccardi
  • Patent number: 7603269
    Abstract: A speech recognition grammar creating apparatus, which is capable of eliminating complex labor associated with preparing all rules by taking into account changes of the order of component elements of a speech-recognizing object and possible combinations of component elements including at least one component element that can be omitted. In the speech recognition grammar creating apparatus, an image edit section groups together at least one component element that cannot be omitted and at least one component element that can be omitted, as the speech-recognizing object, into a component element group as an omission-allowed group. An augmented BNF converting section creates the speech recognition grammar by expanding the component element group obtained by the grouping.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: October 13, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Kazue Kaneko, Michio Aizawa
  • Patent number: 7593845
    Abstract: A method and apparatus for identifying a semantic structure from an input text forms at least two candidate semantic structures. A semantic score is determined for each candidate semantic structure based on the likelihood of the semantic structure. A syntactic score is also determined for each semantic structure based on the position of a word in the text and the position in the semantic structure of a semantic entity formed from the word. The syntactic score and the semantic score are combined to select a semantic structure for at least a portion of the text. In many embodiments, the semantic structure is built incrementally by building and scoring candidate structures for a portion of the text, pruning low scoring candidates, and adding additional semantic elements to the retained candidates.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: September 22, 2009
    Assignee: Microsoft Corporation
    Inventor: William D. Ramsey
  • Patent number: 7587320
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: September 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7584102
    Abstract: Building a language model for use in speech recognition includes identifying without user interaction a source of text related to a user. Text is retrieved from the identified source of text and a language model related to the user is built from the retrieved text.
    Type: Grant
    Filed: November 15, 2002
    Date of Patent: September 1, 2009
    Assignee: Scansoft, Inc.
    Inventors: Kwangil Hwang, Eric Fieleke
  • Patent number: 7574359
    Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
    Type: Grant
    Filed: October 1, 2004
    Date of Patent: August 11, 2009
    Assignee: Microsoft Corporation
    Inventor: Chao Huang
  • Patent number: 7565290
    Abstract: A speech recognition apparatus includes a word dictionary having recognition target words, a first acoustic model which expresses a reference pattern of a speech unit by one or more states, a second acoustic model which is lower in precision than said first acoustic model, selection means for selecting one of said first acoustic model and said second acoustic model on the basis of a parameter associated with a state of interest, and likelihood calculation means for calculating a likelihood of an acoustic feature parameter with respect to said acoustic model selected by said selection means.
    Type: Grant
    Filed: June 24, 2005
    Date of Patent: July 21, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hideo Kuboyama, Toshiaki Fukada, Yasuhiro Komori
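The selection step in this abstract (choosing between a precise and a coarser acoustic model for a state, then computing a likelihood with the chosen model) can be sketched as below. The abstract does not say which parameter drives the choice, so the per-state `param` value, the `threshold`, and the use of diagonal-covariance Gaussian mixtures are assumptions made for illustration only.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def mixture_loglik(x, components):
    """Log-likelihood under a mixture given as (weight, mean, var) tuples."""
    logs = [math.log(w) + diag_gaussian_loglik(x, m, v)
            for w, m, v in components]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def state_loglik(x, state, threshold):
    """Pick the precise or coarse model from the state's parameter.

    `state` is a dict with hypothetical keys "param", "precise", "coarse".
    """
    if state["param"] >= threshold:
        model = state["precise"]   # first (higher-precision) model
    else:
        model = state["coarse"]    # second (lower-precision) model
    return mixture_loglik(x, model)
```

The appeal of such a scheme is that the cheaper model can be used for states where precision matters less, reducing the cost of likelihood calculation; which states those are is exactly what the unspecified parameter would decide.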
  • Patent number: 7542949
    Abstract: A method determines temporal patterns in data sequences. A hierarchical tree of nodes is constructed. Each node in the tree is associated with a composite hidden Markov model, in which the composite hidden Markov model has one independent path for each child node of a parent node of the hierarchical tree. The composite hidden Markov models are trained using training data sequences. The composite hidden Markov models associated with the nodes of the hierarchical tree are decomposed into a single final composite hidden Markov model. The single final composite hidden Markov model can then be employed for determining temporal patterns in unknown data sequences.
    Type: Grant
    Filed: May 12, 2004
    Date of Patent: June 2, 2009
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Christopher R. Wren, David C. Minnen