Markov Patents (Class 704/256)

Hidden Markov model (HMM) (EPO) (Class 704/256.1)

Speech synthesizer

Patent number: 7991616

Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part by combining recorded speech with rule-based synthetic speech. It is a high-quality synthesizer in which recorded speech and synthetic speech are concatenated so that discontinuities in timbre and prosody are not perceived.

Type: Grant

Filed: October 22, 2007

Date of Patent: August 2, 2011

Assignee: Hitachi, Ltd.

Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer

Patent number: 7974843

Abstract: The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each relevant phoneme set of the mother tongue are then specifically mapped to determine phoneme sequences that correspond to pronunciation variants.

Type: Grant

Filed: January 2, 2003

Date of Patent: July 5, 2011

Assignee: Siemens Aktiengesellschaft

Inventor: Tobias Schneider
Data embedding device and data extraction device

Patent number: 7974846

Abstract: A data embedding device for embedding data in a speech code, obtained by encoding speech in accordance with a speech encoding method based on the human voice generation process, includes an embedding judgment unit that judges, for every speech code, whether or not data should be embedded in the speech code, and an embedding unit that embeds data in two or more of the plurality of parameter codes constituting each speech code for which the embedding judgment unit judges that the data should be embedded.

Type: Grant

Filed: March 17, 2004

Date of Patent: July 5, 2011

Assignee: Fujitsu Limited

Inventors: Yoshiteru Tsuchinaga, Yasuji Ota, Masanao Suzuki, Masakiyo Tanaka
Method and system for Gaussian probability data bit reduction and computation

Patent number: 7970613

Abstract: Use of runtime memory may be reduced in a data processing algorithm that uses one or more probability distribution functions. Each probability distribution function may be characterized by one or more uncompressed mean values and one or more variance values. The uncompressed mean and variance values may be represented by α-bit floating point numbers, where α is an integer greater than 1. The probability distribution functions are converted to compressed probability functions having compressed mean and/or variance values represented as β-bit integers, where β is less than α, whereby the compressed mean and/or variance values occupy less memory space than the uncompressed mean and/or variance values. Portions of the data processing algorithm can be performed with the compressed mean and variance values.

Type: Grant

Filed: November 12, 2005

Date of Patent: June 28, 2011

Assignee: Sony Computer Entertainment Inc.

Inventor: Ruxin Chen
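
The compression idea in the abstract of patent 7970613 above can be illustrated with a minimal sketch. The abstract does not state how the floating point values are mapped to integers, so the uniform linear quantization below, the 8-bit width, and the names `quantize`, `dequantize`, and `log_gaussian` are all assumptions made for illustration, not the patented method.

```python
import math

def quantize(values, bits=8):
    """Map floats onto integer codes in [0, 2**bits - 1] (assumed uniform scheme)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

def log_gaussian(x, means, variances):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

# Hypothetical model parameters and observation.
means = [-1.25, 0.0, 0.4, 3.1]
variances = [1.0, 1.0, 1.0, 1.0]
codes, lo, scale = quantize(means, bits=8)
restored = dequantize(codes, lo, scale)

x = [m + 0.1 for m in means]
exact = log_gaussian(x, means, variances)
approx = log_gaussian(x, restored, variances)
```

Each mean now occupies one byte plus a shared offset and scale, and `approx` stays close to `exact` because the rounding error per value is at most half a quantization step.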
Continuous adaptation in detection systems via self-tuning from target population subsets

Patent number: 7970614

Abstract: The present invention provides a system and method for treating distortion propagated through a detection system. The system includes a compensation module that compensates for untreated distortions propagating through the detection compensation system, a user model pool that comprises a plurality of model sets, and a model selector that selects at least one model set from the plurality of model sets in the user model pool. The compensation is accomplished by continually producing scores distributed according to a prescribed distribution for the at least one model set and mitigating the adverse effects of the scores being distorted and lying off a pre-set operating point. The method for treating distortion propagated through a detection system includes receiving a signal from a remote device, and compensating the signal for untreated distortions.

Type: Grant

Filed: May 8, 2007

Date of Patent: June 28, 2011

Assignee: Nuance Communications, Inc.

Inventors: Janice J. Kim, Jiri Navratil, Jason W. Pelecanos, Ganesh N. Ramaswamy
System and method for recognizing speech securely using a secure multi-party computation protocol

Patent number: 7937270

Abstract: A system and method recognizes speech securely using a secure multi-party computation protocol. The system includes a client and a server. The client is configured to securely provide speech in the form of an observation sequence of symbols, and the server is configured to securely provide multiple trained hidden Markov models (HMMs), each trained HMM including multiple states, a state transition probability distribution and an initial state distribution, and each state including a subset of the observation symbols and an observation symbol probability distribution. The observation symbol probability distributions are modeled by mixtures of Gaussian distributions. Also included are means for determining securely, for each HMM, a likelihood that the observation sequence is produced by the states of the HMM, and means for determining a particular symbol with a maximum likelihood of a particular subset of the symbols corresponding to the speech.

Type: Grant

Filed: January 16, 2007

Date of Patent: May 3, 2011

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Paris Smaragdis, Madhusudana Shashanka
Limited-memory quasi-Newton optimization algorithm for L1-regularized objectives

Patent number: 7933847

Abstract: An algorithm that employs modified methods developed for optimizing differentiable functions but which can also handle the special non-differentiabilities that occur with the L1-regularization. The algorithm is a modification of the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, but which can now handle the discontinuity of the gradient using a procedure that chooses a search direction at each iteration and modifies the line search procedure. The algorithm includes an iterative optimization procedure where each iteration approximately minimizes the objective over a constrained region of the space on which the objective is differentiable (in the case of L1-regularization, a given orthant), models the second-order behavior of the objective by considering the loss component alone, using a “line-search” at each iteration that projects search points back onto the chosen orthant, and determines when to stop the line search.

Type: Grant

Filed: October 17, 2007

Date of Patent: April 26, 2011

Assignee: Microsoft Corporation

Inventors: Galen Andrew, Jianfeng Gao
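
The orthant projection mentioned in the abstract of patent 7933847 above can be sketched as follows. This shows only the projection step, under the assumption that an orthant is described by a vector of signs (+1, -1, or 0); the function names are hypothetical and the rest of the optimizer (direction choice, L-BFGS updates, stopping rule) is not shown.

```python
def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def project_onto_orthant(point, orthant):
    """Zero every coordinate whose sign disagrees with the orthant's sign vector.

    `orthant` is a list of +1 / -1 / 0 entries, one per coordinate (an assumed
    representation). Coordinates that stay inside the orthant are unchanged.
    """
    return [p if sign(p) == s else 0.0 for p, s in zip(point, orthant)]

# A trial point from a line search that has crossed two orthant boundaries.
trial = [0.5, -0.2, 0.3]
chosen_orthant = [1, 1, -1]
projected = project_onto_orthant(trial, chosen_orthant)
```

Clamping crossed coordinates to zero keeps every search point in a region where the L1 term is differentiable, and it is also what lets exact zeros (sparse weights) appear in the solution.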
System and method for automatic generation of a natural language understanding model

Patent number: 7933774

Abstract: A system and method is provided for rapidly generating a new spoken dialog application. In one embodiment, a user experience person labels the transcribed data (e.g., 3000 utterances) using a set of interactive tools. The labeled data is then stored in a processed data database. During the labeling process, the user experience person not only groups utterances in various call type categories, but also flags (e.g., 100-200) specific utterances as positive and negative examples for use in an annotation guide. The labeled data in the processed data database can also be used to generate an initial natural language understanding (NLU) model.

Type: Grant

Filed: March 18, 2004

Date of Patent: April 26, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Lee Begeja, Mazin G. Rahim, Allen Louis Gorin, Behzad Shahraray, David Crawford Gibbon, Zhu Liu, Bernard S. Renger, Patrick Guy Haffner, Harris Drucker, Steven Hart Lewis
Method for uncovering hidden Markov models

Patent number: 7912717

Abstract: The invention uses the ModelGrower program to generate possible candidates from an original or aggregated model. An isomorphic reduction program operates on the candidates to identify and exclude isomorphic models. A Markov model evaluation and optimization program operates on the remaining non-isomorphic candidates. The candidates are optimized and the ones that most closely conform to the data are kept. The best optimized candidate of one stage becomes the starting candidate for the next stage where ModelGrower and the other programs operate on the optimized candidate to generate a new optimized candidate. The invention repeats the steps of growing, excluding isomorphs, evaluating and optimizing until such repetitions yield no significantly better results.

Type: Grant

Filed: November 18, 2005

Date of Patent: March 22, 2011

Inventor: Albert Galick
System and method for signal prediction

Patent number: 7899761

Abstract: Disclosed herein are a system and method for trend prediction of signals in a time series using a Markov model. The method includes receiving a plurality of data series and input parameters, where the input parameters include a time step parameter, preprocessing the plurality of data series according to the input parameters, to form binned and classified data series, and processing the binned and classified data series. The processing includes initializing a Markov model for trend prediction, and training the Markov model for trend prediction of the binned and classified data series to form a trained Markov model. The method further includes deploying the trained Markov model for trend prediction, including outputting trend predictions. The method develops an architecture for the Markov model from the data series and the input parameters, and disposes the Markov model, having the architecture, for trend prediction.

Type: Grant

Filed: April 25, 2005

Date of Patent: March 1, 2011

Assignee: GM Global Technology Operations LLC

Inventors: Shubha Kadambe, Leandro G. Barajas, Youngkwan Cho, Pulak Bandyopadhyay
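
The train-then-predict flow in the abstract of patent 7899761 above can be illustrated with a small first-order Markov chain. This is a sketch under stated assumptions: the three trend classes ("up", "down", "flat"), the tolerance parameter, and the function names are invented for illustration, and the abstract's automatic architecture selection is not modeled.

```python
from collections import Counter, defaultdict

def classify(series, step=1, tol=0.0):
    """Label each change over `step` samples as 'up', 'down', or 'flat'."""
    labels = []
    for a, b in zip(series, series[step:]):
        diff = b - a
        labels.append("up" if diff > tol else "down" if diff < -tol else "flat")
    return labels

def train(labels):
    """Estimate transition probabilities from consecutive label pairs."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return {s: {t: n / sum(c.values()) for t, n in c.items()}
            for s, c in counts.items()}

def predict(model, state):
    """Return the most probable next trend after `state`."""
    return max(model[state], key=model[state].get)

series = [1, 2, 3, 2, 3, 4, 3, 4, 5, 4]   # hypothetical signal
labels = classify(series, step=1)          # `step` plays the role of the time step parameter
model = train(labels)
next_trend = predict(model, labels[-1])
```

In this toy series every "down" is followed by "up", so the model predicts "up" after the final drop.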
Device and method of modeling acoustic characteristics with HMM and collating the same with a voice characteristic vector sequence

Patent number: 7895040

Abstract: According to an embodiment, a voice recognition apparatus includes units for acoustic processing, voice interval detecting, dictionary, collating, search target selecting, storing and determining, and a voice recognition method includes processes of: selecting a search range on the basis of a beam search, setting and storing a standard frame, storing an output probability of a certain transition path, and determining whether or not the output probability of a certain path is stored. The number of output probability calculations is reduced by selecting the search range on the basis of the beam search, calculating the output probability of the certain transition path only once in the interval from when the standard frame is set to when the standard frame is renewed, and storing and using the value thus calculated as an approximate value of the output probability in subsequent frames.

Type: Grant

Filed: March 30, 2007

Date of Patent: February 22, 2011

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masaru Sakai, Shinichi Tanaka
Framework for extracting multiple-resolution semantics in composite media content analysis

Patent number: 7890327

Abstract: Disclosed is a general framework for extracting semantics from composite media content at various resolutions. Specifically, given a media stream, which may consist of various types of media modalities including audio, visual, text and graphics information, the disclosed framework describes how various types of semantics could be extracted at different levels by exploiting and integrating different media features. The output of this framework is a series of tagged (or annotated) media segments at different scales. Specifically, at the lowest resolution, the media segments are characterized in a more general and broader sense, thus they are identified at a larger scale; while at the highest resolution, the media content is more specifically analyzed, inspected and identified, which thus results in small-scaled media segments.

Type: Grant

Filed: July 16, 2004

Date of Patent: February 15, 2011

Assignee: International Business Machines Corporation

Inventors: Chitra Dorai, Ying Li
Speech recognition device and speech recognition method and recording medium utilizing preliminary word selection

Patent number: 7881935

Abstract: A speech recognition apparatus that improves recognition accuracy while preventing resource requirements from increasing. Words that are probable results of the speech recognition are selected on the basis of an acoustic score and a linguistic score, while word selection is also performed on the basis of a measure different from the acoustic score, such as the number of phonemes being small, the part of speech being a pre-set one, inclusion in past results of speech recognition, or the linguistic score being not less than a pre-set value. The words so selected are subjected to matching processing.

Type: Grant

Filed: February 16, 2001

Date of Patent: February 1, 2011

Assignee: Sony Corporation

Inventors: Yasuharu Asano, Katsuki Minamino, Hiroaki Ogawa, Helmut Lucke
Speech recognition using channel verification

Patent number: 7877255

Abstract: A method for automatic speech recognition includes determining for an input signal a plurality of scores representative of certainties that the input signal is associated with corresponding states of a speech recognition model, using the speech recognition model and the determined scores to compute an average signal, computing a difference value representative of a difference between the input signal and the average signal, and processing the input signal in accordance with the difference value.

Type: Grant

Filed: March 31, 2006

Date of Patent: January 25, 2011

Assignee: Voice Signal Technologies, Inc.

Inventor: Igor Zlokarnik
Time synchronous decoding for long-span hidden trajectory model

Patent number: 7877256

Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.

Type: Grant

Filed: February 17, 2006

Date of Patent: January 25, 2011

Assignee: Microsoft Corporation

Inventors: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
Adaptation of exponential models

Patent number: 7860314

Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is adapted and more specific to an adaptation data set of interest. The adaptation data set is generally of much smaller size than the background data set. A second set of model parameters are then determined for the adapted probability model based on the set of adaptation data and the prior model.

Type: Grant

Filed: October 29, 2004

Date of Patent: December 28, 2010

Assignee: Microsoft Corporation

Inventors: Ciprian I. Chelba, Alejandro Acero
Speech recognition system for mobile terminal

Patent number: 7856356

Abstract: A speech recognition system for a mobile terminal includes an acoustic variation channel unit and a pronunciation variation channel unit. The acoustic variation channel unit transforms a speech signal into feature parameters and Viterbi-decodes the speech signal to produce a varied phoneme sequence by using the feature parameters and predetermined models. Further, the pronunciation variation channel unit Viterbi-decodes the varied phoneme sequence to produce a word phoneme sequence by using the varied phoneme sequence and a preset DHMM (Discrete Hidden Markov Model) based context-dependent error model.

Type: Grant

Filed: December 20, 2006

Date of Patent: December 21, 2010

Assignee: Electronics and Telecommunications Research Institute

Inventors: Hoon Chung, Yunkeun Lee
Information Processing Apparatus, Information Processing Method, and Computer Program

Publication number: 20100312561

Abstract: An apparatus and a method for performing a grounding process using the POMDP are provided. The configuration is designed so that, in order to understand a request from a user through the utterances from the user, a grounding process is performed using the POMDP (Partially Observable Markov Decision Process) in which analysis information acquired from a language analyzing unit that receives the utterances of the user and performs language analysis and pragmatic information including task feasibility information acquired from the task manager that performs a task are set as observation information. Accordingly, understanding can be efficiently achieved, and high-speed and accurate recognition of the user request and task execution based on the user request can be provided.

Type: Application

Filed: December 4, 2008

Publication date: December 9, 2010

Inventor: Ugo Di Profio
Method for accelerating the execution of speech recognition neural networks and the related speech recognition device

Patent number: 7827031

Abstract: A neural network in a speech-recognition system has computing units organized in levels including at least one hidden level and one output level. The computing units of the hidden level are connected to the computing units of the output level via weighted connections, and the computing units of the output level correspond to acoustic-phonetic units of the general vocabulary. This network executes the following steps: determining a subset of acoustic-phonetic units necessary for recognizing all the words contained in the general vocabulary subset; eliminating from the neural network all the weighted connections afferent to computing units of the output level that correspond to acoustic-phonetic units not contained in the previously determined subset of acoustic-phonetic units, thus obtaining a compacted neural network optimized for recognition of the words contained in the general vocabulary subset; and executing, at each moment in time, only the compacted neural network.

Type: Grant

Filed: February 12, 2003

Date of Patent: November 2, 2010

Assignee: Loquendo S.p.A.

Inventors: Dario Albesano, Roberto Gemello
Parameterized statistical interaction policies

Patent number: 7818271

Abstract: A method and apparatus are disclosed for selecting interaction policies. Values may be provided for a group of parameters for user models. Interaction policies within a specific tolerance of an optimal interaction policy for the user models may be learned. Up to a predetermined number of the learned interaction policies, within a specific tolerance of an optimal policy for the user models, may be selected for a wireless communication device. The wireless communication device, including the selected interaction policies, may determine whether any of a group of parameters, representing a user preference or contextual information with respect to use of the wireless communication device, is updated. When any of the group of parameters has been updated, the wireless communication device may select one of the selected interaction policies, such that the selected one of the selected interaction policies may determine a better interaction behavior for the wireless communication device.

Type: Grant

Filed: June 13, 2007

Date of Patent: October 19, 2010

Assignee: Motorola Mobility, Inc.

Inventor: Michael E. Groble
Method and apparatus for training a text independent speaker recognition system using speech data with text labels

Patent number: 7813927

Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.

Type: Grant

Filed: June 4, 2008

Date of Patent: October 12, 2010

Assignee: Nuance Communications, Inc.

Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
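
The pooling step in the abstract of patent 7813927 above can be sketched in a few lines. The abstract does not say how the weights are normalized "with respect to the plurality of HMM states", so dividing each weight by the grand total is an assumption, as are the one-dimensional `(weight, mean, variance)` tuples and the name `pool_gmm`.

```python
def pool_gmm(hmm_states):
    """Pool the Gaussians of several HMM states into one mixture.

    `hmm_states` is a list of states; each state is a list of
    (weight, mean, variance) tuples whose weights sum to 1 within the state.
    Weights are renormalized so the pooled mixture sums to 1 (assumed scheme).
    """
    pooled = [g for state in hmm_states for g in state]
    total = sum(w for w, _, _ in pooled)
    return [(w / total, m, v) for w, m, v in pooled]

# Two hypothetical HMM states with two and one Gaussian respectively.
states = [
    [(0.5, 0.0, 1.0), (0.5, 1.0, 1.0)],
    [(1.0, 3.0, 2.0)],
]
gmm = pool_gmm(states)
```

The result has no state structure left, which is what makes it usable for text-independent scoring: any frame can be scored against the whole pooled mixture regardless of which phrase was spoken.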
State output probability calculating method and apparatus for mixture distribution HMM

Patent number: 7813925

Abstract: At adjacent times, or when the change in the observation signal is determined to be small, the distribution that maximizes the output probability of a mixture distribution is highly likely to remain the same. Using this fact, when the output probability of the mixture distribution HMM is obtained, the distribution giving the maximum output probability is stored. At adjacent times, or when the change in the observation signal is determined to be small, the output probability of the stored distribution serves as the output probability of the mixture distribution. This reduces the output probability calculations for the other distributions when calculating the output probability of the mixture distribution, thereby reducing the calculation amount required for output probabilities.

Type: Grant

Filed: April 6, 2006

Date of Patent: October 12, 2010

Assignee: Canon Kabushiki Kaisha

Inventors: Hiroki Yamamoto, Masayuki Yamada
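
The caching idea in the abstract of patent 7813925 above can be sketched for a one-dimensional mixture. The change test (absolute difference from the last fully evaluated observation against a fixed threshold) and the class and attribute names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

class CachedMixture:
    """Scores a mixture by its best component, reusing that component
    while the observation stays close to the last fully evaluated one."""

    def __init__(self, components, threshold):
        self.components = components   # list of (weight, mean, variance)
        self.threshold = threshold     # assumed "small change" criterion
        self.best = None               # index of the stored component
        self.ref_x = None              # observation at the last full evaluation
        self.full_evals = 0

    def _component_score(self, i, x):
        w, m, v = self.components[i]
        return math.log(w) + log_gauss(x, m, v)

    def score(self, x):
        changed = self.best is None or abs(x - self.ref_x) >= self.threshold
        if changed:
            # Full evaluation: search all components and store the best one.
            self.full_evals += 1
            self.best = max(range(len(self.components)),
                            key=lambda i: self._component_score(i, x))
            self.ref_x = x
        # Otherwise only the stored component is evaluated.
        return self._component_score(self.best, x)

mix = CachedMixture([(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)], threshold=0.1)
scores = [mix.score(x) for x in (0.0, 0.05, 5.0)]
```

Only two of the three frames trigger a search over all components; the middle frame reuses the stored component, which is where the savings come from when a state has many Gaussians.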
Enhancement to Viterbi speech processing algorithm for hybrid speech models that conserves memory

Patent number: 7805305

Abstract: The present invention discloses a method for semantically processing speech for speech recognition purposes. The method can reduce the amount of memory required for a Viterbi search of an N-gram language model having a value of N greater than two and also having at least one embedded grammar that appears in multiple contexts to a memory size of approximately a bigram model search space with respect to the embedded grammar. The method also reduces CPU requirements. Achieved reductions can be accomplished by representing the embedded grammar as a recursive transition network (RTN), where only one instance of the recursive transition network is used for the contexts. Other than the embedded grammars, a Hidden Markov Model (HMM) strategy can be used for the search space.

Type: Grant

Filed: October 12, 2006

Date of Patent: September 28, 2010

Assignee: Nuance Communications, Inc.

Inventors: Daniel E. Badt, Tomas Beran, Radek Hampl, Pavel Krbec, Jan Sedivy
Method and system for the quick conversion of a voice signal

Patent number: 7792672

Abstract: A method for converting a voice signal from a source speaker into a converted voice signal with acoustic characteristics similar to those of a target speaker includes the steps of determining (1) at least one function for transforming source speaker acoustic characteristics into acoustic characteristics similar to those of the target speaker using target and source speaker voice samples; and transforming acoustic characteristics of the source speaker voice signal to be converted by applying the transformation function(s). The method is characterized in that the transformation (2) includes the step (44) of applying only a predetermined portion of at least one transformation function to said signal to be converted.

Type: Grant

Filed: March 14, 2005

Date of Patent: September 7, 2010

Assignee: France Telecom

Inventors: Olivier Rosec, Taoufik En-Najjary
Computer Implemented Method for Determining All Markov Boundaries and its Application for Discovering Multiple Maximally Accurate and Non-Redundant Predictive Models

Publication number: 20100217599

Abstract: Methods for discovery of a Markov boundary from data constitute one of the most important recent developments in pattern recognition and applied statistics, primarily because they offer a principled solution to the variable/feature selection problem and give insight about local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property of probability theory may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to discover all Markov boundaries from such data. The present invention is a novel computer implemented generative method (termed TIE*) that can discover all Markov boundaries from a data sample drawn from a distribution. TIE* can be instantiated to discover all and only Markov boundaries independent of data distribution.

Type: Application

Filed: October 30, 2009

Publication date: August 26, 2010

Inventors: Alexander Statnikov, Konstantinos (Constantin) F. Aliferis
Voice recognition with dynamic filter bank adjustment based on speaker categorization determined from runtime pitch

Patent number: 7778831

Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.

Type: Grant

Filed: February 21, 2006

Date of Patent: August 17, 2010

Assignee: Sony Computer Entertainment Inc.

Inventor: Ruxin Chen
Method for creating a topical reading list

Patent number: 7739294

Abstract: A method for creating an ordered reading list of predetermined length of relevant topics from a hyperlinked database source of information website for a user. The method includes determining at least one topic of interest based on a plurality of methods and choosing a topic ordering algorithm from a plurality of topic ordering algorithms. A top-down schematic algorithm includes a page rank calculation performed by iterating until a convergence. A bottom-up schematic algorithm includes a linear parameterization of a ratio of an order from a plurality of source topics to a plurality of sink topics of an article, and a horizontal schematic algorithm includes an order parameterization by absolute differences of a log of a plurality of ranks and an absolute difference of a plurality of distances with analogous cutoff methods.

Type: Grant

Filed: January 12, 2007

Date of Patent: June 15, 2010

Inventor: Alexander David Wissner-Gross
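
The "page rank calculation performed by iterating until a convergence" in the abstract of patent 7739294 above can be illustrated with a standard power iteration. This is the textbook PageRank formulation, not necessarily the variant used in the patent; the damping factor, tolerance, and the toy link graph are assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate rank updates until the total change falls below `tol`.

    `links` maps each page to the list of pages it links to; every target
    must also appear as a key.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links[u] or nodes   # a page with no links spreads rank evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical three-topic link graph.
links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
rank = pagerank(links)
ordering = sorted(rank, key=rank.get, reverse=True)
```

Sorting topics by the converged rank gives one possible top-down ordering for a reading list, truncated to whatever length is required.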
Online learning for dialog systems

Patent number: 7734471

Abstract: An online dialog system and method are provided. The dialog system receives speech input and outputs an action according to its models. After executing the action, the system receives feedback from the environment or user. The system immediately utilizes the feedback to update its models in an online fashion.

Type: Grant

Filed: June 29, 2005

Date of Patent: June 8, 2010

Assignee: Microsoft Corporation

Inventors: Timothy S. Paek, David M. Chickering, Eric J. Horvitz
Speech recognition apparatus and method

Patent number: 7711559

Abstract: A speech recognition apparatus that requires a reduced amount of computation for likelihood calculation is provided. A language lookahead score for a node of interest is generated based on the language scores for each recognition word shared by the node of interest. To this is added the node's acoustic score, which is calculated based on the likelihood of the connected hypotheses expressed by a path from the root node to the parent node of the node of interest. From this added result, the language lookahead score resulting when the parent node is the node of interest is deleted, and the language lookahead score is updated by adding the language lookahead score of the node of interest. The updating of the language lookahead score is terminated at a specific position in the tree structure.

Type: Grant

Filed: December 13, 2006

Date of Patent: May 4, 2010

Assignee: Canon Kabushiki Kaisha

Inventors: Hideo Kuboyama, Hiroki Yamamoto
Identification and rejection of meaningless input during natural language classification

Patent number: 7707027

Abstract: A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.

Type: Grant

Filed: April 13, 2006

Date of Patent: April 27, 2010

Assignee: Nuance Communications, Inc.

Inventors: Rajesh Balchandran, Linda Boyer
Voice activated rapid deployment for mobile computers

Patent number: 7702507

Abstract: Systems and methods that automatically register a mobile computing unit on a wireless network by employing a voice recognition system associated with the mobile computing unit. A handshake can occur between a mobile computing unit and a server of the network upon utterance of a predetermined voice input (e.g., a sequence of letters) by the user into the voice recognition component. As such, a mass deployment of mobile computing units on the network can be facilitated in a secure manner with just enough information to access the network.

Type: Grant

Filed: November 10, 2005

Date of Patent: April 20, 2010

Assignee: Symbol Technologies, Inc.

Inventor: Patrick Tilley
Sequential data examination method using Eigen co-occurrence matrix for masquerade detection

Patent number: 7698740

Abstract: The present invention aims at providing a sequential data examination method which can increase data examination accuracy compared with the prior art. The similarity is calculated between a layered network model generated from learning sequential data to be learned and a layered network model generated from testing sequential data to be tested. Based on the similarity, it is determined whether or not the testing sequential data to be tested belong to one or more categories. A network model for each layer of the layered network model is constructed by multiplying an element of the feature vector and its corresponding Eigen co-occurrence matrix.

Type: Grant

Filed: July 12, 2005

Date of Patent: April 13, 2010

Assignee: Japan Science and Technology Agency

Inventors: Mizuki Oka, Kazuhiko Kato
Methods and apparatus for flexible speech recognition

Patent number: 7698136

Abstract: The present invention is directed to a computer implemented method and apparatus for flexibly recognizing meaningful data items within an arbitrary user utterance. According to one example embodiment of the invention, a set of one or more key phrases and a set of one or more filler phrases are defined, probabilities are assigned to the key phrases and/or the filler phrases, and the user utterance is evaluated against the set of key phrases and the set of filler phrases using the probabilities.

Type: Grant

Filed: January 28, 2003

Date of Patent: April 13, 2010

Assignee: Voxify, Inc.

Inventors: Patrick T. M. Nguyen, Adeeb W. M. Shana'a, Amit V. Desai
Parsimonious modeling by non-uniform kernel allocation

Patent number: 7680664

Abstract: A multi-state pattern recognition model with non-uniform kernel allocation is formed by setting a number of states for a multi-state pattern recognition model and assigning different numbers of kernels to different states. The kernels are then trained using training data to form the multi-state pattern recognition model.

Type: Grant

Filed: August 16, 2006

Date of Patent: March 16, 2010

Assignee: Microsoft Corporation

Inventors: Peng Liu, Jian-lai Zhou, Frank Kao-ping Soong
Recognizing the Numeric Language in Natural Spoken Dialogue

Publication number: 20100049519

Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.

Type: Application

Filed: November 5, 2009

Publication date: February 25, 2010

Applicant: AT&T Corp.

Inventors: Mazin G. Rahim, Giuseppe Riccardi, Jeremy Huntley Wright, Bruce Melvin Buntschuh, Allen Louis Gorin
System for estimating parameters of a gaussian mixture model

Patent number: 7664640

Abstract: A signal processing system is disclosed which is implemented using Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, parameters of which are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. The invention reduces the tendency, common in prior art systems, to get caught in local minima associated with highly anisotropic Gaussian components, which reduces recognizer performance, by employing the constraint system described above, whereby the anisotropy of such components is minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.

Type: Grant

Filed: March 24, 2003

Date of Patent: February 16, 2010

Assignee: Qinetiq Limited

Inventor: Christopher John St. Clair Webber
System and method for speech separation and multi-talker speech recognition

Patent number: 7664643

Abstract: A method, and a system to execute this method, are presented for the identification and separation of sources of an acoustic signal that contains a mixture of multiple simultaneous component signals. The method represents the signal with multiple discrete state-variable sequences and combines acoustic and context level dynamics to achieve the source separation. The method identifies sources by discovering those frames of the signal whose features are dominated by single sources. The signal may be the simultaneous speech of multiple speakers.

Type: Grant

Filed: August 25, 2006

Date of Patent: February 16, 2010

Assignee: Nuance Communications, Inc.

Inventors: Ramesh Ambat Gopinath, John Randall Hershey, Trausti Thor Kristjansson, Peder Andreas Olsen, Steven John Rennie
Method and system for voice activating web pages

Patent number: 7640163

Abstract: A method for providing a web page having an audio interface. The method including providing data specifying a web page, including in the data a first rule based grammar statement having a first phrase portion, a first command portion and a first tag portion, and including in the data a second rule based grammar statement having a second phrase portion, a second command portion, and a second tag portion.

Type: Grant

Filed: November 30, 2001

Date of Patent: December 29, 2009

Assignee: The Trustees of Columbia University in the City of New York

Inventors: Michael L. Charney, Justin Starren
Automatic Segmentation in Speech Synthesis

Publication number: 20090313025

Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, and correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.

Type: Application

Filed: August 20, 2009

Publication date: December 17, 2009

Applicant: AT&T Corp.

Inventors: Alistair D. CONKIE, Yeon-Jun KIM
Hidden conditional random field models for phonetic classification and speech recognition

Patent number: 7627473

Abstract: A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses feature functions, at least one of which is based on a hidden state in a phonetic unit. Values for the feature functions are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.

Type: Grant

Filed: October 15, 2004

Date of Patent: December 1, 2009

Assignee: Microsoft Corporation

Inventors: Asela J. Gunawardana, Milind Mahajan, Alejandro Acero
Incrementally regulated discriminative margins in MCE training for speech recognition

Patent number: 7617103

Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the acoustic model. From this score a misclassification measure is calculated and then a loss function is calculated from the misclassification measure. The loss function also includes a margin value that varies over each iteration in the training. Based on the calculated loss function the acoustic model is updated, where the loss function with the margin value is minimized. This process repeats until such time as an empirical convergence is met.

Type: Grant

Filed: August 25, 2006

Date of Patent: November 10, 2009

Assignee: Microsoft Corporation

Inventors: Xiaodong He, Alex Acero, Dong Yu, Li Deng
Standard-model generation for speech recognition using a reference model

Patent number: 7603276

Abstract: A standard model creating apparatus which provides a high-precision standard model used for pattern recognition such as speech recognition, character recognition, or image recognition using a probability model based on a hidden Markov model, Bayesian theory, or linear discrimination analysis; intention interpretation using a probability model such as a Bayesian net; data-mining performed using a probability model; and so forth. The standard model creating apparatus includes a reference model preparing unit that prepares at least one reference model; a reference model storing unit that stores the reference model prepared by the reference model preparing unit; and a standard model creating unit that creates a standard model by calculating statistics of the standard model so as to maximize or locally maximize the probability or likelihood with respect to the reference model stored in the reference storing unit.

Type: Grant

Filed: November 18, 2003

Date of Patent: October 13, 2009

Assignee: Panasonic Corporation

Inventor: Shinichi Yoshizawa
System and method of word graph matrix decomposition

Patent number: 7603272

Abstract: Disclosed is a system and method of decomposing a lattice transition matrix into a block diagonal matrix. The method is applicable to automatic speech recognition but can be used in other contexts as well, such as parsing, named entity extraction and any other methods. The method normalizes the topology of any input graph according to a canonical form.

Type: Grant

Filed: June 19, 2007

Date of Patent: October 13, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Dilek Z. Hakkani-Tur, Giuseppe Riccardi
Speech recognition grammar creating apparatus, control method therefor, program for implementing the method, and storage medium storing the program

Patent number: 7603269

Abstract: A speech recognition grammar creating apparatus, which is capable of eliminating complex labor associated with preparing all rules by taking into account changes of the order of component elements of a speech-recognizing object and possible combinations of component elements including at least one component element that can be omitted. In the speech recognition grammar creating apparatus, an image edit section groups together at least one component element that cannot be omitted and at least one component element that can be omitted, as the speech-recognizing object, into a component element group as an omission-allowed group. An augmented BNF converting section creates the speech recognition grammar by expanding the component element group obtained by the grouping.

Type: Grant

Filed: June 29, 2005

Date of Patent: October 13, 2009

Assignee: Canon Kabushiki Kaisha

Inventors: Kazue Kaneko, Michio Aizawa
Method and apparatus for identifying semantic structures from text

Patent number: 7593845

Abstract: A method and apparatus for identifying a semantic structure from an input text forms at least two candidate semantic structures. A semantic score is determined for each candidate semantic structure based on the likelihood of the semantic structure. A syntactic score is also determined for each semantic structure based on the position of a word in the text and the position in the semantic structure of a semantic entity formed from the word. The syntactic score and the semantic score are combined to select a semantic structure for at least a portion of the text. In many embodiments, the semantic structure is built incrementally by building and scoring candidate structures for a portion of the text, pruning low scoring candidates, and adding additional semantic elements to the retained candidates.

Type: Grant

Filed: October 6, 2003

Date of Patent: September 22, 2009

Assignee: Microsoft Corporation

Inventor: William D. Ramsey
Automatic segmentation in speech synthesis

Patent number: 7587320

Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.

Type: Grant

Filed: August 1, 2007

Date of Patent: September 8, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Alistair D. Conkie, Yeon-Jun Kim
Language model for use in speech recognition

Patent number: 7584102

Abstract: Building a language model for use in speech recognition includes identifying without user interaction a source of text related to a user. Text is retrieved from the identified source of text and a language model related to the user is built from the retrieved text.

Type: Grant

Filed: November 15, 2002

Date of Patent: September 1, 2009

Assignee: Scansoft, Inc.

Inventors: Kwangil Hwang, Eric Fieleke
Speaker selection training via a-posteriori Gaussian mixture model analysis, transformation, and combination of hidden Markov models

Patent number: 7574359

Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.

Type: Grant

Filed: October 1, 2004

Date of Patent: August 11, 2009

Assignee: Microsoft Corporation

Inventor: Chao Huang
Speech recognition method and apparatus

Patent number: 7565290

Abstract: A speech recognition apparatus includes a word dictionary having recognition target words, a first acoustic model which expresses a reference pattern of a speech unit by one or more states, a second acoustic model which is lower in precision than said first acoustic model, selection means for selecting one of said first acoustic model and said second acoustic model on the basis of a parameter associated with a state of interest, and likelihood calculation means for calculating a likelihood of an acoustic feature parameter with respect to said acoustic model selected by said selection means.

Type: Grant

Filed: June 24, 2005

Date of Patent: July 21, 2009

Assignee: Canon Kabushiki Kaisha

Inventors: Hideo Kuboyama, Toshiaki Fukada, Yasuhiro Komori
Determining temporal patterns in sensed data sequences by hierarchical decomposition of hidden Markov models

Patent number: 7542949

Abstract: A method determines temporal patterns in data sequences. A hierarchical tree of nodes is constructed. Each node in the tree is associated with a composite hidden Markov model, in which the composite hidden Markov model has one independent path for each child node of a parent node of the hierarchical tree. The composite hidden Markov models are trained using training data sequences. The composite hidden Markov models associated with the nodes of the hierarchical tree are decomposed into a single final composite hidden Markov model. The single final composite hidden Markov model can then be employed for determining temporal patterns in unknown data sequences.

Type: Grant

Filed: May 12, 2004

Date of Patent: June 2, 2009

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Christopher R. Wren, David C. Minnen
