Clustering Patents (Class 704/245)

Spoken man-machine interface with speaker identification

Patent number: 7769588

Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.

Type: Grant

Filed: August 20, 2008

Date of Patent: August 3, 2010

Assignee: Sony Deutschland GmbH

Inventors: Ralf Kompe, Thomas Kemp
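The abstract describes a loop: classify each utterance against the speaker database, store unknown ones, cluster them, and enroll a new speaker from a cluster. The sketch below makes some assumptions not stated in the patent:

- Each utterance is already reduced to a fixed-length feature vector.
- Matching is Euclidean distance to a per-speaker centroid, with a hypothetical `threshold`.
- A simple "leader" clustering stands in for whatever clustering the patent actually uses.
- A hypothetical `min_cluster_size` decides when a cluster is large enough to enroll.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]  # hypothetical fixed-length speaker feature vector


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


class SpeakerDatabase:
    def __init__(self, threshold: float, min_cluster_size: int) -> None:
        self.threshold = threshold                # max distance accepted as a match
        self.min_cluster_size = min_cluster_size  # utterances needed to enroll
        self.speakers: Dict[str, Vector] = {}
        self.unknown: List[Vector] = []           # stored "second type" utterances

    def classify(self, utterance: Vector) -> Optional[str]:
        """Return a known speaker (first type), or store it and return None (second type)."""
        best_name, best_dist = None, math.inf
        for name, c in self.speakers.items():
            d = math.dist(utterance, c)
            if d < best_dist:
                best_name, best_dist = name, d
        if best_dist <= self.threshold:
            return best_name
        self.unknown.append(utterance)
        return None

    def _cluster_unknown(self) -> List[List[Vector]]:
        """Leader clustering: join the first cluster whose centroid is close enough."""
        clusters: List[List[Vector]] = []
        for u in self.unknown:
            for cluster in clusters:
                if math.dist(u, centroid(cluster)) <= self.threshold:
                    cluster.append(u)
                    break
            else:
                clusters.append([u])
        return clusters

    def enroll_new_speakers(self) -> List[str]:
        """Add one new speaker per sufficiently large cluster; keep the rest stored."""
        added: List[str] = []
        leftover: List[Vector] = []
        for cluster in self._cluster_unknown():
            if len(cluster) >= self.min_cluster_size:
                name = f"speaker_{len(self.speakers) + 1}"
                self.speakers[name] = centroid(cluster)
                added.append(name)
            else:
                leftover.extend(cluster)
        self.unknown = leftover
        return added
```

Utterances that never gather enough similar company stay in `unknown`, so a stray noisy sample does not create a spurious speaker.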
Generic visual categorization method and system

Patent number: 7756341

Abstract: Generic visual categorization methods complement a general vocabulary with adapted vocabularies that are class specific. Images to be categorized are characterized within different categories through a histogram indicating whether the image is better described by the general vocabulary or the class-specific adapted vocabulary.

Type: Grant

Filed: June 30, 2005

Date of Patent: July 13, 2010

Assignee: Xerox Corporation

Inventor: Florent Perronnin
System and method for using meta-data dependent language modeling for automatic speech recognition

Patent number: 7752046

Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.

Type: Grant

Filed: October 29, 2004

Date of Patent: July 6, 2010

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Michiel A. E. Bacchiani, Brian E. Roark
Computer aided document retrieval

Patent number: 7747593

Abstract: A method of determining cluster attractors for a plurality of documents, each comprising at least one term. For each term, the method calculates a probability distribution indicative of the frequency of occurrence of each other term that co-occurs with that term in at least one of the documents. The entropy of each probability distribution is then calculated. Finally, at least one of the probability distributions is selected as a cluster attractor depending on its entropy value. The method allows very small clusters to be formed, enabling more focused retrieval during a document search.

Type: Grant

Filed: September 27, 2004

Date of Patent: June 29, 2010

Assignees: University of Ulster, St. Petersburg State University

Inventors: David Patterson, Vladimir Dobrynin
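The three steps in the abstract can be sketched as follows: build co-occurrence distributions, compute their entropies, and select low-entropy distributions as attractors. The sketch rests on two assumptions of ours, not the patent's:

- The selection rule is a fixed, hypothetical `max_entropy` cut-off.
- Co-occurrence is counted within whole documents.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_distributions(documents: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """For each term, a probability distribution over the other terms it co-occurs with."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc in documents:
        freq = Counter(doc)
        for term in freq:
            for other, n in freq.items():
                if other != term:
                    counts[term][other] += n  # occurrences of `other` alongside `term`
    return {
        term: {other: n / sum(c.values()) for other, n in c.items()}
        for term, c in counts.items()
    }


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_attractors(documents: List[List[str]], max_entropy: float) -> List[str]:
    """Terms whose co-occurrence distribution is focused enough to seed a cluster."""
    dists = cooccurrence_distributions(documents)
    return sorted(t for t, d in dists.items() if entropy(d) <= max_entropy)
```

A term that always appears with the same companion has zero entropy and makes a tight attractor. A term spread evenly over many companions has high entropy and is skipped.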
Information retrieving method and apparatus

Patent number: 7747435

Abstract: The invention allows easy retrieval of the speakers in encoded speech data recorded in a semiconductor storage device in an IC recorder. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 identifies the speaker from features of the decoded speech waveform and finds the frequency of conversation (frequency of occurrence) of each speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on screen as a two-dimensional graph with time and frequency as its axes.

Type: Grant

Filed: March 15, 2008

Date of Patent: June 29, 2010

Assignee: Sony Corporation

Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
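The counting step behind the speaker frequency graph is simple once speakers have been identified. The minimal sketch below makes two assumptions:

- Speaker discrimination has already produced hypothetical `(time_in_seconds, speaker_label)` events.
- Intervals are fixed-width bins.

```python
from collections import Counter
from typing import Dict, List, Tuple


def speaker_frequency(events: List[Tuple[float, str]], interval: float) -> Dict[int, Counter]:
    """Count speaker occurrences per fixed time interval.

    `events` are (time_in_seconds, speaker_label) pairs, assumed to come from an
    earlier speaker-discrimination step.
    """
    bins: Dict[int, Counter] = {}
    for t, speaker in events:
        bins.setdefault(int(t // interval), Counter())[speaker] += 1
    return bins


def series(bins: Dict[int, Counter], speaker: str, n_intervals: int) -> List[int]:
    """One speaker's frequency per interval: the y-values of a time/frequency graph."""
    return [bins.get(i, Counter())[speaker] for i in range(n_intervals)]
```

Plotting `series(...)` against the interval index gives the two-dimensional time/frequency view described in the abstract.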
Broadcast router having a serial digital audio data stream decoder

Patent number: 7747447

Abstract: A bi-phase decoder suitable for use in a broadcast router and an associated method for extracting subframes of digital audio data from a stream of digital audio data. Logical circuitry within the bi-phase decoder extracts subframes of the digital audio data by constructing a transition window from an estimated bit time, sampling the stream of digital audio data using a fast clock and applying the sampled stream of digital audio data to the transition window to identify transitions indicative of preambles of the subframes of digital audio data.

Type: Grant

Filed: June 20, 2003

Date of Patent: June 29, 2010

Assignee: Thomson Licensing

Inventors: Carl Christensen, Lynn Howard Arbuckle
Active learning for spoken language understanding

Patent number: 7742918

Abstract: Disclosed is a system and method of training a spoken language understanding module. Such a module may be utilized in a spoken dialog system. The method of training a spoken language understanding module comprises training acoustic and language models using a small set of transcribed data St, recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models, computing confidence scores of the utterances, selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si, redefining St as the union of St and Si, redefining Su as Su minus Si, and returning to the step of training acoustic and language models if word accuracy has not converged.

Type: Grant

Filed: July 5, 2007

Date of Patent: June 22, 2010

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
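The abstract's training loop can be sketched generically. The sets S_t and S_u follow the abstract. Everything else is assumed and hypothetical:

- `train`, `confidence` and `word_accuracy` stand in for a real recognizer.
- "Transcribing" is modeled as simply moving utterance IDs from S_u to S_t.
- The convergence test uses a hypothetical tolerance `tol`.

```python
from typing import Callable, Hashable, Set, Tuple, TypeVar

Model = TypeVar("Model")


def active_learning(
    transcribed: Set[Hashable],
    candidates: Set[Hashable],
    train: Callable[[Set[Hashable]], Model],
    confidence: Callable[[Model, Hashable], float],
    word_accuracy: Callable[[Model], float],
    k: int,
    tol: float = 1e-3,
    max_rounds: int = 100,
) -> Tuple[Model, Set[Hashable], Set[Hashable]]:
    st, su = set(transcribed), set(candidates)
    model = train(st)
    prev_acc = None
    for _ in range(max_rounds):
        acc = word_accuracy(model)
        if prev_acc is not None and abs(acc - prev_acc) < tol:
            break  # word accuracy has converged
        prev_acc = acc
        if not su:
            break
        # pick the k utterances the current models are least confident about
        si = set(sorted(su, key=lambda u: confidence(model, u))[:k])
        st |= si  # S_t := S_t union S_i (these would go to human transcription)
        su -= si  # S_u := S_u minus S_i
        model = train(st)
    return model, st, su
```

The design choice is to spend transcription effort only where the current models are least sure. High-confidence utterances are assumed to add little new information.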
Pattern matching method and apparatus and speech information retrieval system

Patent number: 7739111

Abstract: A pattern matching method is provided for matching a first symbol sequence against a second symbol sequence that is shorter than the first. The method includes performing DP matching between the first and second symbol sequences to create a matrix of DP matching transitions, detecting the maximum length among runs of consecutive correct answers (matching symbols) based on that matrix, and calculating a similarity based on that maximum length.

Type: Grant

Filed: August 9, 2006

Date of Patent: June 15, 2010

Assignee: Canon Kabushiki Kaisha

Inventor: Kazue Kaneko
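One way to realize "DP matching, then the longest run of consecutive correct answers" is sketched below. It assumes several details the abstract does not specify:

- The DP is a unit-cost edit-distance alignment in which the short sequence may start and end anywhere inside the long one.
- The transition matrix is the back-pointer table.
- The similarity is the longest run divided by the short sequence's length.

```python
from typing import List, Optional, Sequence


def dp_match_similarity(long_seq: Sequence[str], short_seq: Sequence[str]) -> float:
    m, n = len(short_seq), len(long_seq)
    if m == 0:
        return 0.0
    # cost[i][j]: cheapest alignment of short_seq[:i] ending at long_seq[:j];
    # row 0 is all zeros so the match may start anywhere in long_seq.
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    back: List[List[Optional[str]]] = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0], back[i][0] = i, "up"
        for j in range(1, n + 1):
            sub = 0 if short_seq[i - 1] == long_seq[j - 1] else 1
            cost[i][j], back[i][j] = min(
                (cost[i - 1][j - 1] + sub, "diag"),  # match or substitution
                (cost[i - 1][j] + 1, "up"),          # symbol missing from long_seq
                (cost[i][j - 1] + 1, "left"),        # extra symbol in long_seq
            )
    # the match may also end anywhere: pick the cheapest end column
    j = min(range(n + 1), key=lambda col: cost[m][col])
    i, run, best = m, 0, 0
    while i > 0:  # walk the transition matrix back to row 0
        step = back[i][j]
        if step == "diag" and short_seq[i - 1] == long_seq[j - 1]:
            run += 1
            best = max(best, run)
        else:
            run = 0  # a substitution, insertion or deletion breaks the run
        if step in ("diag", "up"):
            i -= 1
        if step in ("diag", "left"):
            j -= 1
    return best / m
```

For example, matching "abcde" inside "zzabqdezz" aligns with one substitution (c/q). The longest run of correct symbols is then 2 ("ab" or "de"), giving a similarity of 2/5.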
SPEECH CLASSIFICATION APPARATUS, SPEECH CLASSIFICATION METHOD, AND SPEECH CLASSIFICATION PROGRAM

Publication number: 20100138223

Abstract: An object of the present invention is to classify sequentially input speech signals accurately, based on the similarity of speakers and environments, with realistic memory usage and processing speed, in on-line operation. A speech classification probability calculation means 103 calculates the probability (probability of classification into each cluster) that the latest speech signal (speech data) belongs to each cluster, based on a generative model, which is a probability model. A parameter updating means 107 successively estimates the parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).

Type: Application

Filed: March 13, 2008

Publication date: June 3, 2010

Inventor: Takafumi Koshinaka
Speech recognition method and system

Patent number: 7729911

Abstract: A speech recognition method comprising the steps of: storing multiple recognition models for a vocabulary set, each model distinguished from the other models in response to a Lombard characteristic, detecting at least one speaker utterance in a motor vehicle, selecting one of the multiple recognition models in response to a Lombard characteristic of the at least one speaker utterance, utilizing the selected recognition model to recognize the at least one speaker utterance; and providing a signal in response to the recognition.

Type: Grant

Filed: September 27, 2005

Date of Patent: June 1, 2010

Assignee: General Motors LLC

Inventors: Rathinavelu Chengalvarayan, Scott M. Pennock
System and method for improving the accuracy of audio searching

Patent number: 7725318

Abstract: A system and method for improving the accuracy of audio searching using multiple models to process an audio file or stream to obtain search tracks. The search tracks are processed to locate at least one search term and generate multiple search results. The number of search results is equivalent to the number of models used to process the audio stream. The search results are combined to generate a unified search result. The multiple models may represent different languages, dialects and accents.

Type: Grant

Filed: August 1, 2005

Date of Patent: May 25, 2010

Assignee: NICE Systems Inc.

Inventors: Marsal Gavalda, Moshe Wasserblat
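The abstract does not say how the per-model search results are combined into a unified result, so the fusion rule here is an assumption of ours:

- Hits for the same term within a hypothetical `tolerance` (in seconds) are merged.
- Each merged hit keeps the best confidence and counts how many hits agreed.
- Merged hits are ranked by agreement, then by confidence.

```python
from typing import List, Tuple

Hit = Tuple[str, float, float]  # (search term, start time in seconds, confidence)


def unify_results(per_model: List[List[Hit]], tolerance: float = 0.5) -> List[Tuple[str, float, float, int]]:
    """Merge per-model hits; returns (term, time, best confidence, number of agreeing hits)."""
    merged: List[list] = []
    for hits in per_model:
        for term, t, conf in hits:
            for entry in merged:
                if entry[0] == term and abs(entry[1] - t) <= tolerance:
                    entry[2] = max(entry[2], conf)
                    entry[3] += 1
                    break
            else:
                merged.append([term, t, conf, 1])
    # more agreement first, then higher confidence
    return sorted((tuple(e) for e in merged), key=lambda e: (-e[3], -e[2]))
```

With one result list per language or accent model, a term found by several models rises to the top of the unified result.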
Speech models generated using competitive training, asymmetric training, and data boosting

Patent number: 7693713

Abstract: Speech models are trained using one or more of three different training systems. They include competitive training which reduces a distance between a recognized result and a true result, data boosting which divides and weights training data, and asymmetric training which trains different model components differently.

Type: Grant

Filed: June 17, 2005

Date of Patent: April 6, 2010

Assignee: Microsoft Corporation

Inventors: Xiaodong He, Jian Wu
System for estimating parameters of a gaussian mixture model

Patent number: 7664640

Abstract: A signal processing system is disclosed which is implemented using a Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, whose parameters are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. Prior art systems tend to get caught in local minima associated with highly anisotropic Gaussian components, which reduces recognizer performance; the invention reduces this tendency by employing the constraint system above, whereby the anisotropy of such components is minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.

Type: Grant

Filed: March 24, 2003

Date of Patent: February 16, 2010

Assignee: Qinetiq Limited

Inventor: Christopher John St. Clair Webber
Speech recognition accuracy with multi-confidence thresholds

Patent number: 7657433

Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.

Type: Grant

Filed: September 8, 2006

Date of Patent: February 2, 2010

Assignee: TellMe Networks, Inc.

Inventor: Shuangyu Chang
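The three components in the abstract are a recognition result with a confidence score, a threshold selection component and an accept/reject component. They can be sketched in a few lines. The features (`duration_sec`, `snr_db`) and the threshold values are hypothetical, chosen only to illustrate per-utterance threshold selection.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # engine confidence score in [0, 1]


def select_threshold(features: Dict[str, float]) -> float:
    """Hypothetical rule: stricter thresholds for very short or noisy utterances."""
    threshold = 0.5
    if features.get("duration_sec", 1.0) < 0.5:
        threshold += 0.2
    if features.get("snr_db", 30.0) < 10.0:
        threshold += 0.1
    return threshold


def accept(result: RecognitionResult, features: Dict[str, float]) -> Optional[str]:
    """Accept the recognition result only if its confidence clears the selected threshold."""
    if result.confidence >= select_threshold(features):
        return result.text
    return None
```

The same confidence score can therefore be accepted for one utterance and rejected for another. This is the point of using more than one threshold.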
System and method for fast on-line learning of transformed hidden Markov models

Patent number: 7657102

Abstract: A fast variational on-line learning technique for training a transformed hidden Markov model. A simplified general model and an associated estimation algorithm are provided for modeling visual data such as a video sequence. Specifically, once the model has been initialized, an expectation-maximization (“EM”) algorithm is used to learn one or more object class models, so that the video sequence has high marginal probability under the model. In the expectation step (the “E-Step”), the model parameters are assumed to be correct, and for an input image, probabilistic inference is used to fill in the values of the unobserved or hidden variables, e.g., the object class and appearance. In one embodiment of the invention, a Viterbi algorithm and a latent image are employed for this purpose. In the maximization step (the “M-Step”), the model parameters are adjusted using the values of the unobserved variables calculated in the previous E-step.

Type: Grant

Filed: August 27, 2003

Date of Patent: February 2, 2010

Assignee: Microsoft Corp.

Inventors: Nebojsa Jojic, Nemanja Petrovic
Global boundary-centric feature extraction and associated discontinuity metrics

Patent number: 7643990

Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.

Type: Grant

Filed: October 23, 2003

Date of Patent: January 5, 2010

Assignee: Apple Inc.

Inventor: Jerome R. Bellegarda
Palette-based classifying and synthesizing of auditory information

Patent number: 7634405

Abstract: The subject invention leverages spectral “palettes” or representations of an input sequence to provide recognition and/or synthesis of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that require substantially fewer system resources to store and/or manipulate. Segments of the palettes are employed to facilitate reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or “trained” using any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes.

Type: Grant

Filed: January 24, 2005

Date of Patent: December 15, 2009

Assignee: Microsoft Corporation

Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
Spoken man-machine interface with speaker identification

Patent number: 7620547

Abstract: The present invention provides a method for operating and/or controlling a man-machine interface unit (MMI) in a finite user group environment. Utterances from a group of users are repeatedly received. A process of user identification is carried out based on the received utterances. The user identification process includes clustering, which enables enrolment-free operation.

Type: Grant

Filed: January 24, 2005

Date of Patent: November 17, 2009

Assignee: Sony Deutschland GmbH

Inventors: Ralf Kompe, Thomas Kemp
BOUNDARY ESTIMATION APPARATUS AND METHOD

Publication number: 20090265166

Abstract: A boundary estimation apparatus includes:

- a first boundary estimation unit which estimates a first boundary separating a speech into first meaning units;
- a second boundary estimation unit configured to estimate a second boundary separating a related speech into second meaning units related to the first meaning units;
- a pattern generating unit configured to generate a representative pattern showing a representative characteristic in the analysis interval; and
- a similarity calculation unit configured to calculate a similarity between the representative pattern and a characteristic pattern showing a feature in a calculation interval of the speech.

The second boundary estimation unit estimates the second boundary based on a calculation interval in which the similarity is higher than a threshold value or relatively high.

Type: Application

Filed: June 30, 2009

Publication date: October 22, 2009

Inventor: Kazuhiko Abe
Text summarization using relevance measures and latent semantic analysis

Patent number: 7607083

Abstract: Text summarizers using relevance measurement technologies and latent semantic analysis techniques provide accurate and useful summarization of the contents of text documents. Generic text summaries may be produced by ranking and extracting sentences from original documents; broad coverage of document content and decreased redundancy may simultaneously be achieved by constructing summaries from sentences that are highly ranked and different from each other. In one embodiment, conventional Information Retrieval (IR) technologies may be applied in a unique way to perform the summarization; relevance measurement, sentence selection, and term elimination may be repeated in successive iterations.

Type: Grant

Filed: March 26, 2001

Date of Patent: October 20, 2009

Assignee: NEC Corporation

Inventors: Yihong Gong, Xin Liu
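The relevance-measure variant described in the abstract repeats three steps: relevance measurement, sentence selection and term elimination. It can be sketched with plain term-frequency vectors. This covers only the relevance-measure summarizer, not the latent semantic analysis one. The tokenizer, the raw-count weighting and the inner-product relevance score are simplifying assumptions of ours.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def relevance_summary(sentences: List[str], k: int) -> List[str]:
    vectors = [Counter(tokenize(s)) for s in sentences]
    doc: Counter = Counter()
    for v in vectors:
        doc.update(v)  # term-frequency vector of the whole document
    remaining = list(range(len(sentences)))
    chosen: List[int] = []
    while remaining and len(chosen) < k:
        # relevance measurement: inner product of sentence and document vectors
        scores = {i: sum(c * doc[t] for t, c in vectors[i].items()) for i in remaining}
        # sentence selection: highest score, earliest sentence on ties
        best = max(remaining, key=lambda i: (scores[i], -i))
        chosen.append(best)
        remaining.remove(best)
        # term elimination: the chosen sentence's terms no longer count
        for t in vectors[best]:
            doc.pop(t, None)
    return [sentences[i] for i in sorted(chosen)]
```

Term elimination is what reduces redundancy. Once a sentence is chosen, near-duplicates lose their score, and sentences covering other content move up.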
Segment set creating method and apparatus

Patent number: 7603278

Abstract: A segment set before updating is read, and clustering that considers the phoneme environment is performed on it. For each cluster obtained by the clustering, a representative segment of the segments belonging to the cluster is generated. For each cluster, the segments belonging to the cluster are replaced with the representative segment so as to update the segment set.

Type: Grant

Filed: September 14, 2005

Date of Patent: October 13, 2009

Assignee: Canon Kabushiki Kaisha

Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
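The update step (cluster, pick a representative, replace) can be illustrated with a deliberately simplified sketch. It makes several assumptions:

- Each segment is a hypothetical `(phoneme_environment_key, feature_vector)` pair.
- "Clustering considering the phoneme environment" is reduced to grouping by that key.
- The representative is the medoid under Euclidean distance.

The patent's actual clustering and representative generation may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def medoid(vectors: List[Vector]) -> Vector:
    """The member with the smallest total distance to all other members."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))


def update_segment_set(segments: List[Tuple[str, Vector]]) -> Dict[str, Vector]:
    """Group segments by phoneme-environment key and keep one representative each."""
    clusters: Dict[str, List[Vector]] = defaultdict(list)
    for env, features in segments:
        clusters[env].append(features)
    return {env: medoid(members) for env, members in clusters.items()}
```

A medoid is an actual member of the cluster rather than an average. This is one reasonable choice when the representative must remain a real, playable segment.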
Speaker clustering and adaptation method based on the HMM model variation information and its apparatus for speech recognition

Patent number: 7590537

Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model is found for which the model variation between a test speaker ML model and the ML model of the speaker group to which the test speaker belongs is most similar to a training speaker group model variation, and speaker adaptation is performed on the found model. The model variations in both speaker clustering and speaker adaptation are calculated by analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to either the MLLR or the MAP speaker adaptation algorithm.

Type: Grant

Filed: December 27, 2004

Date of Patent: September 15, 2009

Assignee: Samsung Electronics Co., Ltd.

Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
Method and system for clustering using generalized sentence patterns

Patent number: 7584100

Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.

Type: Grant

Filed: June 30, 2004

Date of Patent: September 1, 2009

Assignee: Microsoft Corporation

Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
Method for training of subspace coded gaussian models

Patent number: 7571097

Abstract: A method for compressing multidimensional Gaussian distributions with diagonal covariance matrices includes clustering a plurality of Gaussian distributions into a multiplicity of clusters for each dimension. Each cluster can be represented by a centroid having a mean and a variance. The total decrease in likelihood of a training dataset is minimized for the representation of the plurality of Gaussian distributions.

Type: Grant

Filed: March 13, 2003

Date of Patent: August 4, 2009

Assignee: Microsoft Corporation

Inventors: Alejandro Acero, Michael D. Plumpe
Noise adaptation system of speech model, noise adaptation method, and noise adaptation program for speech recognition

Patent number: 7552049

Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. The method proceeds as follows:

- Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1).
- The cepstral mean of the speech is subtracted from the generated noise-added speech (step S2).
- A Gaussian distribution model of each piece of noise-added speech is created (step S3).
- The likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4), from which a clustering result is obtained.
- An optimum model is selected (step S7).
- A linear transformation is performed to maximize the likelihood (step S8).

Because noise-added speech is used consistently in both clustering and model learning, clustering for many types of noise data and accurate estimation of a speech model sequence can be achieved.

Type: Grant

Filed: March 10, 2004

Date of Patent: June 23, 2009

Assignees: NTT DoCoMo, Inc., Sadaoki Furui

Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
Systems and methods for discriminative density model selection

Patent number: 7548856

Abstract: The present invention utilizes a discriminative density model selection method to provide an optimized density model subset employable in constructing a classifier. By allowing multiple alternative density models to be considered for each class in a multi-class classification system and then developing an optimal configuration comprised of a single density model for each class, the classifier can be tuned to exhibit a desired characteristic such as, for example, high classification accuracy, low cost, and/or a balance of both. In one instance of the present invention, error graph, junction tree, and min-sum propagation algorithms are utilized to obtain an optimization from discriminatively selected density models.

Type: Grant

Filed: May 20, 2003

Date of Patent: June 16, 2009

Assignee: Microsoft Corporation

Inventors: Bo Thiesson, Christopher A. Meek
Method for reproducing audio documents with the aid of an interface comprising document groups and associated reproducing device

Patent number: 7546242

Abstract: A method by which a reproduction apparatus reproduces audio documents forming part of a set of documents. The method includes a prior step of partitioning the documents of the set into groups of documents whose audio parameters are similar, making it possible to determine at least one document representing each group by taking its audio parameters into account. An identifier of a document representing the group is then reproduced graphically and/or as sound. In this way, the user can note the type of music involved and can select the group by means of the graphical identifier. A command may be activated to move from one group to another; a group may be selected and its documents reproduced. The invention also relates to a reproduction apparatus furnished with a user interface allowing reproduction.

Type: Grant

Filed: August 5, 2004

Date of Patent: June 9, 2009

Assignee: Thomson Licensing

Inventors: Louis Chevallier, Izabela Grasland, Jean-Ronan Vigouroux, Jean-Baptiste Henry
Methods and apparatus for generating dialog state conditioned language models

Patent number: 7542901

Abstract: Techniques are provided for improved language modeling. This is achieved by conditioning a language model on the state of the dialog in which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Using the dialog state-conditioned language models, the speech recognizer associated with the dialog system recognizes a user's utterances better, which improves the usability of the dialog system. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.

Type: Grant

Filed: August 24, 2006

Date of Patent: June 2, 2009

Assignee: Nuance Communications, Inc.

Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
Minimum bayes error feature selection in speech recognition

Patent number: 7529666

Abstract: In connection with speech recognition, the design of a linear transformation $\theta \in \mathbb{R}^{p \times n}$, of rank $p$, which projects the features of a classifier $x \in \mathbb{R}^n$ onto $y = \theta x \in \mathbb{R}^p$ so as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the $\theta$-average divergence between the class densities, and the second is to minimize the union Bhattacharyya bound in the range of $\theta$. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in word error rate over known cepstral features on a large vocabulary telephony speech recognition task.

Type: Grant

Filed: October 30, 2000

Date of Patent: May 5, 2009

Assignee: International Business Machines Corporation

Inventors: Mukund Padmanabhan, George A. Saon
Method and apparatus for identifying an unknown work

Patent number: 7529659

Abstract: A system for determining the identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced-dimension signatures are then generated from at least a portion of the signatures. The reduced-dimension signatures are compared to reduced-dimension signatures of known works stored in a database, and a list of candidate known works is generated from the comparison. The signatures of the unknown work are then compared to the signatures of the known works in the candidate list. The unknown work is identified as the known work whose signatures match within a threshold.

Type: Grant

Filed: September 28, 2005

Date of Patent: May 5, 2009

Assignee: Audible Magic Corporation

Inventor: Erling H. Wold
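The two-stage search in the abstract can be sketched as follows: a cheap comparison of reduced-dimension signatures builds a candidate list, then a full comparison is run only against those candidates. The sketch simplifies and assumes:

- Each work has a single signature vector rather than one per segment.
- Dimension reduction is averaging blocks of a hypothetical `factor` values.
- Distance is Euclidean.

```python
import math
from typing import Dict, List, Optional, Sequence


def reduce_signature(sig: Sequence[float], factor: int) -> List[float]:
    """Hypothetical dimension reduction: average each run of `factor` values."""
    return [sum(sig[i:i + factor]) / len(sig[i:i + factor]) for i in range(0, len(sig), factor)]


def identify(
    unknown: Sequence[float],
    known: Dict[str, Sequence[float]],
    factor: int,
    n_candidates: int,
    threshold: float,
) -> Optional[str]:
    if not known:
        return None
    small = reduce_signature(unknown, factor)
    # stage 1: rank all known works by reduced-signature distance
    ranked = sorted(known, key=lambda w: math.dist(small, reduce_signature(known[w], factor)))
    candidates = ranked[:n_candidates]
    # stage 2: full-signature comparison against the short candidate list only
    best = min(candidates, key=lambda w: math.dist(unknown, known[w]))
    return best if math.dist(unknown, known[best]) <= threshold else None
```

The full-resolution comparison, the expensive part, runs only `n_candidates` times, however large the database grows.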
METHOD FOR SEGMENTING COMMUNICATION TRANSCRIPTS USING UNSUPERVISED AND SEMI-SUPERVISED TECHNIQUES

Publication number: 20090112588

Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence

Type: Application

Filed: October 31, 2007

Publication date: April 30, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Krishna Kummamuru, Deepak S. Padmanabhan, Shourya Roy, L. Venkata Subramaniam
Speech recognition word dictionary/language model making system, method, and program, and speech recognition system

Publication number: 20090106023

Abstract: A speech recognition word dictionary/language model making system for creating a word dictionary for recognizing a word not appearing in a learning text by selecting a word-generation-model-learning-method-by-word-class according to the word to be added which does not appear in the learning text and for making a language model. The speech recognition word dictionary/language model making system (100) includes a language model estimating device (111) for selecting estimating method information from a learning-method-knowledge-by-word-class storing section (109) for each word class of an addition word generating model which is a word generating model of the addition word according to the selected estimating method information and a database combining device (112) for adding an addition word to a word dictionary (105) and adding an addition word generating model to a word-generation-model-by-word-class database (107).

Type: Application

Filed: November 30, 2007

Publication date: April 23, 2009

Inventor: Kiyokazu Miki
Feature extraction apparatus and method and pattern recognition apparatus and method

Patent number: 7509256

Abstract: The invention is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing that distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.

Type: Grant

Filed: March 29, 2005

Date of Patent: March 24, 2009

Assignee: Sony Corporation

Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
Adaptation of compressed acoustic models

Patent number: 7499857

Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.

Type: Grant

Filed: May 15, 2003

Date of Patent: March 3, 2009

Assignee: Microsoft Corporation

Inventor: Asela J. Gunawardana
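
A minimal sketch of adapting codewords instead of means, assuming each Gaussian mean is stored as one codeword index per subspace and that the adaptation transform is a separate affine map (A, b) per subspace. The codebook values, transforms, and helper names are hypothetical.

```python
def apply_affine(A, b, v):
    """Return A @ v + b for a small dense matrix A (list of rows) and vectors b, v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) + b_i for row, b_i in zip(A, b)]

def adapt_codebooks(codebooks, transforms):
    """Apply each subspace's affine transform (A, b) to every codeword in that subspace."""
    return [
        [apply_affine(A, b, cw) for cw in book]
        for book, (A, b) in zip(codebooks, transforms)
    ]

def reconstruct_mean(indices, codebooks):
    """Concatenate the selected codeword from each subspace into a full mean vector."""
    mean = []
    for idx, book in zip(indices, codebooks):
        mean.extend(book[idx])
    return mean

# Two 2-D subspaces, each with a two-entry codebook (hypothetical values).
codebooks = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 1.0], [-1.0, 0.5]],
]
# One affine transform per subspace: a shift in the first, a scaling in the second.
transforms = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),
    ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
]
adapted = adapt_codebooks(codebooks, transforms)
# A Gaussian whose mean uses codeword 1 in subspace 0 and codeword 0 in subspace 1:
print(reconstruct_mean([1, 0], adapted))  # [2.5, 2.5, 2.0, 2.0]
```

Only the codebooks change; the per-Gaussian index tables stay fixed, so the adaptation cost scales with the number of codewords rather than the number of Gaussians.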
Wireless enabled speech recognition (SR) portable device including a programmable user trained SR profile for transmission to external SR enabled PC

Patent number: 7496693

Abstract: A method of interacting with a speech recognition (SR)-enabled personal computer (PC) is provided in which a user SR profile is transferred from a wireless-enabled device to the SR-enabled PC. Interaction with SR applications, on the SR-enabled PC, is carried out by transmitting speech signals wirelessly to the SR-enabled PC. The transmitted speech signals are recognized with the help of the transferred user SR profile.

Type: Grant

Filed: March 17, 2006

Date of Patent: February 24, 2009

Assignee: Microsoft Corporation

Inventors: Daniel B. Cook, David Mowatt, Oliver Scholz, Oscar E. Murillo
Refining of segmental boundaries in speech waveforms using contextual-dependent models

Patent number: 7496512

Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each pair of adjacent phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine the boundaries of the contextual phoneme speech units forming that cluster.

Type: Grant

Filed: April 13, 2004

Date of Patent: February 24, 2009

Assignee: Microsoft Corporation

Inventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
Timing of speech recognition over lossy transmission systems

Patent number: 7496503

Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.

Type: Grant

Filed: December 18, 2006

Date of Patent: February 24, 2009

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
Multiscale detection of local image structures

Patent number: 7474790

Abstract: A method and apparatus for the detection of local image structures represented as clusters in a joint spatial-range domain. The method comprises receiving an input image having one or more clusters in a joint spatial-range domain, each of the one or more clusters having a corresponding mode; receiving a set of analysis matrices and selecting each one of the analysis matrices in turn; using the selected analysis matrix to partition the input image into the one or more clusters and their corresponding modes, and computing a mean, μ, and a local covariance matrix, Σ, for each of the corresponding modes of each of the one or more clusters; and selecting at least one of the one or more clusters, where each selected cluster has a stable mean and a stable covariance matrix across the set of analysis matrices, whereby each of the selected clusters is indicative of a local image structure.

Type: Grant

Filed: September 29, 2004

Date of Patent: January 6, 2009

Assignee: Siemens Medical Solutions USA, Inc.

Inventors: Navneet Dalal, Dorin Comaniciu
Speaker recognition using local models

Patent number: 7475013

Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real time, without extensive training data or parametric assumptions about data distribution.

Type: Grant

Filed: March 26, 2004

Date of Patent: January 6, 2009

Assignee: Honda Motor Co., Ltd.

Inventor: Ryan Rifkin
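
A minimal sketch of the querying and Parzen-window scoring steps, assuming a brute-force neighbor search in place of the kd-tree and a Gaussian kernel with an assumed bandwidth `h`. Each of the k nearest enrolled points adds a kernel value to its own speaker, and the totals are normalized. The speaker labels and feature values are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, enrolled, k):
    """Brute-force k nearest (distance, speaker) pairs; a kd-tree would speed this up."""
    dists = [(euclidean(query, point), speaker) for point, speaker in enrolled]
    return sorted(dists)[:k]

def parzen_scores(query, enrolled, k=5, h=1.0):
    """Parzen-window estimate of P(speaker | query) from the k nearest enrolled points.

    Each neighbor contributes a Gaussian kernel exp(-d^2 / (2 h^2)) to its own speaker;
    the contributions are normalized so that the scores sum to 1.
    """
    scores = {}
    for d, speaker in nearest_neighbors(query, enrolled, k):
        scores[speaker] = scores.get(speaker, 0.0) + math.exp(-d * d / (2 * h * h))
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()} if total > 0 else scores

# Hypothetical enrolled data points: (feature vector, speaker label)
enrolled = [
    ([0.0, 0.0], "alice"), ([0.2, 0.1], "alice"),
    ([5.0, 5.0], "bob"), ([5.1, 4.9], "bob"),
]
scores = parzen_scores([0.1, 0.0], enrolled, k=3, h=1.0)
print(max(scores, key=scores.get))  # alice
```

Restricting the kernel sum to a small neighbor set is what keeps this nonparametric estimate cheap enough to evaluate at query time.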
Efficient recursive clustering based on a splitting function derived from successive eigen-decompositions

Patent number: 7472062

Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.

Type: Grant

Filed: January 4, 2002

Date of Patent: December 30, 2008

Assignee: International Business Machines Corporation

Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
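
One way to picture recursive splitting driven by eigen-decompositions is principal-direction bisection: project a cluster onto the leading eigenvector of its covariance (found here by power iteration) and split by the sign of the centered projection, repeating on the largest cluster until the requested number of subsets exists. Whether this matches the patent's splitting function is an assumption; the function names and data are illustrative.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def leading_eigenvector(points, iters=100):
    """Approximate the top eigenvector of the sample covariance by power iteration."""
    mu = mean_vector(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    dim = len(mu)
    # Start from the centered point with the largest norm so v lies in the data span.
    v = max(centered, key=lambda c: sum(x * x for x in c))
    norm = sum(x * x for x in v) ** 0.5
    if norm == 0:
        return mu, [1.0] + [0.0] * (dim - 1)  # all points identical
    v = [x / norm for x in v]
    for _ in range(iters):
        # w = C v, accumulated as sum over points of (c . v) c, without forming C
        w = [0.0] * dim
        for c in centered:
            dot = sum(ci * vi for ci, vi in zip(c, v))
            for i in range(dim):
                w[i] += dot * c[i]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v

def split(points):
    """Split a cluster by the sign of each point's projection on the leading eigenvector."""
    mu, v = leading_eigenvector(points)
    left, right = [], []
    for p in points:
        proj = sum((x - m) * vi for x, m, vi in zip(p, mu, v))
        (left if proj < 0 else right).append(p)
    return left, right

def recursive_cluster(points, n_clusters):
    """Split the largest cluster until n_clusters non-overlapping subsets exist.

    Stops early if the largest cluster cannot be split (for example, identical points).
    """
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        largest = clusters.pop()
        left, right = split(largest) if len(largest) > 1 else ([], largest)
        if not left or not right:
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return clusters

points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [21, 0]]
clusters = recursive_cluster(points, 3)
```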
KEYWORD OUTPUTTING APPARATUS AND METHOD

Publication number: 20080319746

Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing the keywords contained in each of the documents input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using as a determination criterion the number of documents, the variance of the dates contained in the documents, or the C-value of the keywords contained in the documents. Finally, a keyword presentation device presents the characteristic keyword group in each subtopic cluster, arranging the keyword group on the basis of the date information.

Type: Application

Filed: March 25, 2008

Publication date: December 25, 2008

Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
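
The keyword-extraction and presentation steps can be sketched as follows, assuming each document in a topic cluster is given as a (date, keywords) pair. Keywords are ranked by appearance frequency, and the top ones are then ordered by the date they first appear; that ordering rule is an assumption about what "arranging on the basis of the date information" means. The data is hypothetical.

```python
from collections import Counter
from datetime import date

def characteristic_keywords(cluster, n):
    """Return the n most frequent keywords in a topic cluster, ordered by first appearance date.

    `cluster` is a list of (date, keywords) pairs, one per document.
    """
    counts = Counter(kw for _, kws in cluster for kw in kws)
    top = [kw for kw, _ in counts.most_common(n)]
    first_seen = {}
    for day, kws in cluster:
        for kw in kws:
            if kw not in first_seen or day < first_seen[kw]:
                first_seen[kw] = day
    return sorted(top, key=lambda kw: first_seen[kw])

cluster = [
    (date(2008, 3, 2), ["recall", "battery"]),
    (date(2008, 3, 1), ["battery", "fire"]),
    (date(2008, 3, 5), ["recall", "battery", "refund"]),
]
print(characteristic_keywords(cluster, 2))  # ['battery', 'recall']
```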
SPOKEN MAN-MACHINE INTERFACE WITH SPEAKER IDENTIFICATION

Publication number: 20080319747

Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.

Type: Application

Filed: August 20, 2008

Publication date: December 25, 2008

Applicant: Sony Deutschland GmbH

Inventors: Ralf Kompe, Thomas Kemp
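
A minimal sketch of the enrollment loop, assuming each utterance has already been reduced to a fixed-length feature vector and each known speaker is represented by a centroid. An utterance within `threshold` of a known speaker is of the first type; otherwise it is stored as second-type. Stored utterances are grouped by simple leader clustering, and any cluster with at least `min_cluster_size` members becomes a new speaker. The class name, thresholds, and clustering rule are hypothetical choices, not the patent's.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

class SpeakerDatabase:
    def __init__(self, threshold=1.0, min_cluster_size=3):
        self.speakers = {}   # speaker name -> centroid feature vector
        self.unknown = []    # stored utterances of the second type
        self.threshold = threshold
        self.min_cluster_size = min_cluster_size

    def classify(self, features):
        """Return the known speaker's name (first type) or None (second type, stored)."""
        best = min(self.speakers, key=lambda s: dist(features, self.speakers[s]), default=None)
        if best is not None and dist(features, self.speakers[best]) <= self.threshold:
            return best
        self.unknown.append(features)
        return None

    def cluster_unknown(self):
        """Leader clustering: join the first cluster whose centroid is within threshold."""
        clusters = []
        for f in self.unknown:
            for c in clusters:
                if dist(f, centroid(c)) <= self.threshold:
                    c.append(f)
                    break
            else:
                clusters.append([f])
        return clusters

    def enroll_new_speakers(self):
        """Add a new speaker for every sufficiently large cluster of unknown utterances."""
        remaining = []
        for c in self.cluster_unknown():
            if len(c) >= self.min_cluster_size:
                self.speakers[f"speaker_{len(self.speakers) + 1}"] = centroid(c)
            else:
                remaining.extend(c)
        self.unknown = remaining

db = SpeakerDatabase(threshold=1.0, min_cluster_size=3)
for f in ([5.0, 5.0], [5.2, 4.9], [4.9, 5.1], [0.0, 9.0]):
    db.classify(f)           # all unknown while the database is empty
db.enroll_new_speakers()
print(db.classify([5.1, 5.0]))  # speaker_1
```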
Method of modeling single data class from multi-class data

Patent number: 7454337

Abstract: The present invention is a method of modeling a single class of data from data containing multiple classes of data of the same type, by first receiving a collection of data that includes data from multiple classes of the same type, where the amount of data of the single class exceeds that of any other class. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model, of the same type as the first statistical model, is generated for the combined subsets and used as the model of the single class of data.

Type: Grant

Filed: May 13, 2004

Date of Patent: November 18, 2008

Assignee: The United States of America as represented by the Director, National Security Agency

Inventors: David C. Smith, Daniel J. Richman
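
The steps above can be sketched with a 1-D Gaussian standing in for the statistical model (an assumption; a real speech system would more likely use a Gaussian mixture). The whole collection is modeled, fixed-size subsets are scored by average log-likelihood, the best-scoring fraction is kept, and a second model is fitted to the combined subsets. Subset size, `keep_fraction`, and the data are hypothetical.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance (variance floored to avoid zero)."""
    mu = sum(xs) / len(xs)
    var = max(sum((x - mu) ** 2 for x in xs) / len(xs), 1e-6)
    return mu, var

def avg_loglik(xs, model):
    """Average Gaussian log-likelihood of the samples xs under model (mu, var)."""
    mu, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var) for x in xs) / len(xs)

def model_dominant_class(data, subset_size, keep_fraction=0.5):
    first = fit_gaussian(data)  # first model: the whole multi-class collection
    subsets = [data[i:i + subset_size] for i in range(0, len(data), subset_size)]
    ranked = sorted(subsets, key=lambda s: avg_loglik(s, first), reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    combined = [x for s in ranked[:n_keep] for x in s]  # best-scoring subsets
    return fit_gaussian(combined)  # second model: the single dominant class

# Majority class near 0, minority class near 10 (hypothetical values).
data = [0.1, -0.2, 0.0, 0.3, 10.0, 10.2, -0.1, 0.2, 0.05, -0.05, 9.8, 10.1]
model = model_dominant_class(data, subset_size=2)
```

Because the dominant class pulls the first model toward itself, its subsets tend to score highest, which is what lets the second model isolate that class.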
Method, apparatus, and system for building a compact model for large vocabulary continuous speech recognition (LVCSR) system

Patent number: 7454341

Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.

Type: Grant

Filed: September 30, 2000

Date of Patent: November 18, 2008

Assignee: Intel Corporation

Inventors: Jielin Pan, Baosheng Yuan
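
The sub-vector codebook construction can be sketched as follows, assuming the dimensions are split into caller-chosen groups and each group is clustered with plain Lloyd's k-means. The patent's dynamic merging and splitting of clusters by size and distortion is not implemented here; the helper names and data are illustrative.

```python
import random

def nearest(v, codebook):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(v, codebook[j])))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (codebook, codeword index per vector)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        assign = [nearest(v, codebook) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old codeword if a cluster empties
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, [nearest(v, codebook) for v in vectors]

def build_subvector_codebooks(means, groups, k):
    """Cut every mean vector into the given dimension groups and build one codebook per group."""
    result = []
    for dims in groups:
        subvectors = [[m[d] for d in dims] for m in means]
        result.append(kmeans(subvectors, k))
    return result

means = [[0, 0, 10, 10], [0.1, 0, 10.1, 10], [5, 5, 0, 0], [5.1, 5, 0, 0.1]]
books = build_subvector_codebooks(means, [[0, 1], [2, 3]], k=2)
```

Each Gaussian then needs only one small index per group instead of its full mean vector; the same idea applies to the variance sub-vectors.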
Computer system, method, and program product for generating a data structure for information retrieval, and an associated graphical user interface

Patent number: 7428541

Abstract: A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining patches of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure based on the document-keyword vectors, and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.

Type: Grant

Filed: December 15, 2003

Date of Patent: September 23, 2008

Assignee: International Business Machines Corporation

Inventor: Michael Edward Houle
System for identifying paraphrases using machine translation

Patent number: 7412385

Abstract: The present invention obtains a set of text segments from a cluster of different articles written about a common event. The set of text segments is then subjected to textual alignment techniques to identify paraphrases from the text segments in the text. The invention can also be used to generate paraphrases.

Type: Grant

Filed: November 12, 2003

Date of Patent: August 12, 2008

Assignee: Microsoft Corporation

Inventors: Christopher J. Brockett, William B. Dolan, Christopher B. Quirk
Method For Speech Recognition From a Partitioned Vocabulary

Publication number: 20080126090

Abstract: An oral input is recognized using a predefinable vocabulary that is partitioned into sections of phonetically similar words. In the recognition process, the oral input is first associated with one of the sections, and is then determined from the vocabulary of the associated section.

Type: Application

Filed: October 4, 2005

Publication date: May 29, 2008

Inventor: Niels Kunstmann
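
A two-stage lookup over a partitioned vocabulary can be sketched as follows. This assumes the input is available as a text string, that each section is represented by a prototype word, and that `difflib.SequenceMatcher` string similarity stands in for a real acoustic or phonetic match score. The sections and words are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1]; a crude stand-in for an acoustic/phonetic match score."""
    return SequenceMatcher(None, a, b).ratio()

def recognize(utterance, sections):
    """Two-stage lookup over a partitioned vocabulary.

    `sections` maps a prototype string for each section to that section's words.
    Stage 1 picks the section whose prototype best matches the input;
    stage 2 searches only that section's words.
    """
    prototype = max(sections, key=lambda p: similarity(utterance, p))
    return max(sections[prototype], key=lambda w: similarity(utterance, w))

sections = {
    "station": ["station", "stationery", "nation"],
    "street": ["street", "streets", "stream"],
}
print(recognize("stationary", sections))  # stationery
```

Only one section's words are compared in the second stage, which is where the saving over a flat search of the whole vocabulary comes from.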
Method and apparatus for differential compression of speaker models

Patent number: 7379868

Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.

Type: Grant

Filed: January 2, 2003

Date of Patent: May 27, 2008

Assignee: Massachusetts Institute of Technology

Inventor: Douglas A. Reynolds
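
A minimal sketch of the delta idea, assuming the speaker and baseline models share a parameter layout (for example, flattened GMM means). The delta is quantized with a uniform step as the "further compression". The step size and values are hypothetical.

```python
def compress(speaker, baseline, step=0.05):
    """Quantize the difference between a speaker model and the baseline model to integers."""
    return [round((s - b) / step) for s, b in zip(speaker, baseline)]

def decompress(codes, baseline, step=0.05):
    """Rebuild an approximate speaker model from the baseline plus the quantized delta."""
    return [b + c * step for c, b in zip(codes, baseline)]

baseline = [0.0, 1.0, -2.0, 0.5]
speaker = [0.12, 1.0, -1.9, 0.48]
codes = compress(speaker, baseline)
print(codes)  # [2, 0, 2, 0]
restored = decompress(codes, baseline)
```

Since adapted speaker parameters usually stay close to the baseline, many codes tend to be zero or small, which a subsequent entropy coder can exploit; the reconstruction error per parameter is at most half a step.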
Method and system of correcting spectral deformations in the voice, introduced by a communication network

Patent number: 7359857

Abstract: A technique for correcting the voice spectral deformations introduced by a communication network. Before the voice signal of a speaker is equalized, classes of speakers are constituted, with one voice reference per class. Then, for a given speaker, the speaker is classified, that is, allocated to a class according to predefined classification criteria, so that the voice reference closest to his own corresponds to him. The digitized voice signal of that speaker is then equalized using, as the reference spectrum, the voice reference of the class to which the speaker has been allocated. This technique applies to the correction of voice timbre in switched telephone networks, ISDN networks, and mobile networks.

Type: Grant

Filed: November 25, 2003

Date of Patent: April 15, 2008

Assignee: France Telecom

Inventors: Gaël Mahe, André Gilloire
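
The classify-then-equalize steps can be sketched as follows, assuming each voice reference is a long-term average magnitude spectrum over a few bands and that classification uses the Euclidean distance between log spectra. Both are illustrative assumptions. The equalizer gain in each band is the reference level divided by the received level. Class names and spectra are hypothetical.

```python
import math

def log_distance(a, b):
    """Euclidean distance between two magnitude spectra on a log scale."""
    return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)))

def classify_speaker(spectrum, references):
    """Allocate the speaker to the class whose voice reference is closest."""
    return min(references, key=lambda c: log_distance(spectrum, references[c]))

def equalizer_gains(received, references):
    """Per-band gains that map the received long-term spectrum onto its class reference."""
    ref = references[classify_speaker(received, references)]
    return [r / x for r, x in zip(ref, received)]

references = {"low": [1.0, 0.8, 0.4, 0.2], "high": [0.6, 0.9, 0.7, 0.4]}
received = [0.9, 0.8, 0.2, 0.1]  # upper bands attenuated by the network
gains = equalizer_gains(received, references)  # the two upper bands get a gain of 2
```

A practical equalizer would likely smooth these gains and cap them to avoid amplifying noise; the sketch omits that.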
Bubble splitting for compact acoustic modeling

Patent number: 7328154

Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech-related criterion (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.

Type: Grant

Filed: August 13, 2003

Date of Patent: February 5, 2008

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Ambroise Mutel, Patrick Nguyen, Luca Rigazio
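
The partition-and-group steps can be sketched as follows, assuming a vocal tract length estimate (in cm) is already available for each training speaker and that speakers are grouped into fixed-width bins. Averaging each group's feature vectors is only a placeholder for training a real acoustic bubble model. Speaker IDs, lengths, and features are hypothetical.

```python
from collections import defaultdict

def group_by_vtl(speakers, bin_width=1.0):
    """Group speaker IDs whose vocal tract length falls in the same fixed-width bin."""
    groups = defaultdict(list)
    for speaker_id, vtl in speakers.items():
        groups[int(vtl // bin_width)].append(speaker_id)
    return dict(groups)

def train_bubble_models(groups, features):
    """Placeholder 'training': average the feature vectors of all speakers in each group."""
    models = {}
    for key, members in groups.items():
        vectors = [v for s in members for v in features[s]]
        models[key] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return models

speakers = {"s1": 14.2, "s2": 14.8, "s3": 17.1}
features = {"s1": [[1.0, 2.0]], "s2": [[3.0, 4.0]], "s3": [[10.0, 10.0]]}
groups = group_by_vtl(speakers)
models = train_bubble_models(groups, features)
```

Each bubble model sees only acoustically similar speakers, which is how a set of small models can stand in for one large speaker-independent model.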
