Clustering Patents (Class 704/245)
  • Patent number: 7769588
    Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
    Type: Grant
    Filed: August 20, 2008
    Date of Patent: August 3, 2010
    Assignee: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
  • Patent number: 7756341
    Abstract: Generic visual categorization methods complement a general vocabulary with adapted vocabularies that are class specific. Images to be categorized are characterized within different categories through a histogram indicating whether the image is better described by the general vocabulary or the class-specific adapted vocabulary.
    Type: Grant
    Filed: June 30, 2005
    Date of Patent: July 13, 2010
    Assignee: Xerox Corporation
    Inventor: Florent Perronnin
  • Patent number: 7752046
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: July 6, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. E. Bacchiani, Brian E. Roark
  • Patent number: 7747593
    Abstract: A method of determining cluster attractors for a plurality of documents comprising at least one term. The method comprises calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents. Then, the entropy of the respective probability distribution is calculated. Finally, at least one of said probability distributions is selected as a cluster attractor depending on the respective entropy value. The method facilitates very small clusters to be formed enabling more focused retrieval during a document search.
    Type: Grant
    Filed: September 27, 2004
    Date of Patent: June 29, 2010
    Assignees: University of Ulster, St. Petersburg State University
    Inventors: David Patterson, Vladimir Dobrynin
  • Patent number: 7747435
    Abstract: A speaker of encoded speech data recorded in a semiconductor storage device in an IC recorder is to be retrieved easily. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the speech waveform decoded to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a picture as a two-dimensional graph having time and the frequency as two axes.
    Type: Grant
    Filed: March 15, 2008
    Date of Patent: June 29, 2010
    Assignee: Sony Corporation
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
  • Patent number: 7747447
    Abstract: A bi-phase decoder suitable for use in a broadcast router and an associated method for extracting subframes of digital audio data from a stream of digital audio data. Logical circuitry within the bi-phase decoder extracts subframes of the digital audio data by constructing a transition window from an estimated bit time, sampling the stream of digital audio data using a fast clock and applying the sampled stream of digital audio data to the transition window to identify transitions indicative of preambles of the subframes of digital audio data.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: June 29, 2010
    Assignee: Thomson Licensing
    Inventors: Carl Christensen, Lynn Howard Arbuckle
  • Patent number: 7742918
    Abstract: Disclosed is a system and method of training a spoken language understanding module. Such a module may be utilized in a spoken dialog system. The method of training a spoken language understanding module comprises training acoustic and language models using a small set of transcribed data St, recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models, computing confidence scores of the utterances, selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si, redefining St as the union of St and Si, redefining Su as Su minus Si, and returning to the step of training acoustic and language models if word accuracy has not converged.
    Type: Grant
    Filed: July 5, 2007
    Date of Patent: June 22, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
  • Patent number: 7739111
    Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length of lengths of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: June 15, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kazue Kaneko
  • Publication number: 20100138223
    Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy based on similarity of speakers and environments by using a realistic memory use amount, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model. A parameter updating means 107 successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).
    Type: Application
    Filed: March 13, 2008
    Publication date: June 3, 2010
    Inventor: Takafumi Koshinaka
  • Patent number: 7729911
    Abstract: A speech recognition method comprising the steps of: storing multiple recognition models for a vocabulary set, each model distinguished from the other models in response to a Lombard characteristic, detecting at least one speaker utterance in a motor vehicle, selecting one of the multiple recognition models in response to a Lombard characteristic of the at least one speaker utterance, utilizing the selected recognition model to recognize the at least one speaker utterance; and providing a signal in response to the recognition.
    Type: Grant
    Filed: September 27, 2005
    Date of Patent: June 1, 2010
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, Scott M. Pennock
  • Patent number: 7725318
    Abstract: A system and method for improving the accuracy of audio searching using multiple models to process an audio file or stream to obtain search tracks. The search tracks are processed to locate at least one search term and generate multiple search results. The number of search results is equivalent to the number of models used to process the audio stream. The search results are combined to generate a unified search result. The multiple models may represent different languages, dialects and accents.
    Type: Grant
    Filed: August 1, 2005
    Date of Patent: May 25, 2010
    Assignee: NICE Systems Inc.
    Inventors: Marsal Gavalda, Moshe Wasserblat
  • Patent number: 7693713
    Abstract: Speech models are trained using one or more of three different training systems. They include competitive training which reduces a distance between a recognized result and a true result, data boosting which divides and weights training data, and asymmetric training which trains different model components differently.
    Type: Grant
    Filed: June 17, 2005
    Date of Patent: April 6, 2010
    Assignee: Microsoft Corporation
    Inventors: Xiaodong He, Jian Wu
  • Patent number: 7664640
    Abstract: A signal processing system is disclosed which is implemented using Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, parameters of which are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. The invention reduces the tendency, common in prior art systems, to get caught in local minima associated with highly anisotropic Gaussian components—which reduces the recognizer performance—by employing the constraint system as above whereby the anisotropy of such components are minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: February 16, 2010
    Assignee: Qinetiq Limited
    Inventor: Christopher John St. Clair Webber
  • Patent number: 7657433
    Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: February 2, 2010
    Assignee: TellMe Networks, Inc.
    Inventor: Shuangyu Chang
  • Patent number: 7657102
    Abstract: A fast variational on-line learning technique for training a transformed hidden Markov model. A simplified general model and an associated estimation algorithm is provided for modeling visual data such as a video sequence. Specifically, once the model has been initialized, an expectation-maximization (“EM”) algorithm is used to learn the one or more object class models, so that the video sequence has high marginal probability under the model. In the expectation step (the “E-Step”), the model parameters are assumed to be correct, and for an input image, probabilistic inference is used to fill in the values of the unobserved or hidden variables, e.g., the object class and appearance. In one embodiment of the invention, a Viterbi algorithm and a latent image is employed for this purpose. In the maximization step (the “M-Step”), the model parameters are adjusted using the values of the unobserved variables calculated in the previous E-step.
    Type: Grant
    Filed: August 27, 2003
    Date of Patent: February 2, 2010
    Assignee: Microsoft Corp.
    Inventors: Nebojsa Jojic, Nemanja Petrovic
  • Patent number: 7643990
    Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
    Type: Grant
    Filed: October 23, 2003
    Date of Patent: January 5, 2010
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
  • Patent number: 7634405
    Abstract: The subject invention leverages spectral “palettes” or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or “trained” utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: December 15, 2009
    Assignee: Microsoft Corporation
    Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
  • Patent number: 7620547
    Abstract: The present invention provides a method for operating and/or for controlling a man-machine interface unit (MMI) for a finite user group environment. Utterances out of a group of user are repeatedly received. A process of user identification is carried out based on said received utterances. The process of user identification comprises a set of clustering so as to enable an enrolment-free performance.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: November 17, 2009
    Assignee: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
  • Publication number: 20090265166
    Abstract: A boundary estimation apparatus includes an boundary estimation unit which estimates a first boundary separating a speech into first meaning units, a boundary estimation unit configured to estimate a second boundary separating a speech, related to the speech, into second meaning units related to the first meaning units, a pattern generating unit configured to generate a representative pattern showing representative characteristic in the analysis interval, a similarity calculation unit configured to calculate a similarity between the representative pattern and a characteristic pattern showing feature in a calculation interval for calculating the similarity in the speech, and the boundary estimation unit estimate as the second boundary based on the calculation interval, in which the similarity is higher than a threshold value or relatively high.
    Type: Application
    Filed: June 30, 2009
    Publication date: October 22, 2009
    Inventor: Kazuhiko Abe
  • Patent number: 7607083
    Abstract: Text summarizers using relevance measurement technologies and latent semantic analysis techniques provide accurate and useful summarization of the contents of text documents. Generic text summaries may be produced by ranking and extracting sentences from original documents; broad coverage of document content and decreased redundancy may simultaneously be achieved by constructing summaries from sentences that are highly ranked and different from each other. In one embodiment, conventional Information Retrieval (IR) technologies may be applied in a unique way to perform the summarization; relevance measurement, sentence selection, and term elimination may be repeated in successive iterations.
    Type: Grant
    Filed: March 26, 2001
    Date of Patent: October 20, 2009
    Assignee: NEC Corporation
    Inventors: Yihong Gong, Xin Liu
  • Patent number: 7603278
    Abstract: A segment set before updating is read, and clustering considering a phoneme environment is performed to it. For each cluster obtained by the clustering, a representative segment of a segment set belonging to the cluster is generated. For each cluster, a segment belonging to the cluster is replaced with the representative segment so as to update the segment set.
    Type: Grant
    Filed: September 14, 2005
    Date of Patent: October 13, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model in which the model variation between a test speaker ML model and a speaker group ML model to which the test speaker belongs which is most similar to a training speaker group model variation is found, and speaker adaptation is performed on the found model. Herein, the model variation in the speaker clustering and the speaker adaptation are calculated while analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm of MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
  • Patent number: 7584100
    Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
    Type: Grant
    Filed: June 30, 2004
    Date of Patent: September 1, 2009
    Assignee: Microsoft Corporation
    Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
  • Patent number: 7571097
    Abstract: A method for compressing multiple dimensional gaussian distributions with diagonal covariance matrixes includes clustering a plurality of gaussian distributions in a multiplicity of clusters for each dimension. Each cluster can be represented by a centroid having a mean and a variance. A total decrease in likelihood of a training dataset is minimized for the representation of the plurality of gaussian distributions.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: August 4, 2009
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Michael D. Plumpe
  • Patent number: 7552049
    Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of speech cepstral is subtracted from the generated, noise-added speech (step 2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
    Type: Grant
    Filed: March 10, 2004
    Date of Patent: June 23, 2009
    Assignees: NTT DoCoMo, Inc., Sadaoki Furui
    Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
  • Patent number: 7548856
    Abstract: The present invention utilizes a discriminative density model selection method to provide an optimized density model subset employable in constructing a classifier. By allowing multiple alternative density models to be considered for each class in a multi-class classification system and then developing an optimal configuration comprised of a single density model for each class, the classifier can be tuned to exhibit a desired characteristic such as, for example, high classification accuracy, low cost, and/or a balance of both. In one instance of the present invention, error graph, junction tree, and min-sum propagation algorithms are utilized to obtain an optimization from discriminatively selected density models.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: June 16, 2009
    Assignee: Microsoft Corporation
    Inventors: Bo Thiesson, Christopher A. Meek
  • Patent number: 7546242
    Abstract: A method of reproduction by a reproduction apparatus for reproducing audio documents forming part of a set of documents. The method includes a prior step of partitioning of the documents of the set into groups of documents whose audio parameters exhibit a similitude, making it possible to determine at least one document representing each group by taking into account its audio parameters. Then, an identifier of a document representing the group is reproduced graphically and/or in a sound manner. In this way, the user can take note of the type of music involved and can select this group by virtue of the graphical identifier. A command may be activated making it possible to go from one group to another; a group may be selected and reproduce the documents of this group. The invention also relates to a reproduction apparatus furnished with a user interface allowing reproduction.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: June 9, 2009
    Assignee: Thomson Licensing
    Inventors: Louis Chevallier, Izabela Grasland, Jean-Ronan Vigouroux, Jean-Baptiste Henry
  • Patent number: 7542901
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: August 24, 2006
    Date of Patent: June 2, 2009
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Patent number: 7529666
    Abstract: In connection with speech recognition, the design of a linear transformation ??p×n, of rank p×n, which projects the features of a classifier x?n onto y=?x?p such as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the ?-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of ?. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
    Type: Grant
    Filed: October 30, 2000
    Date of Patent: May 5, 2009
    Assignee: International Business Machines Corporation
    Inventors: Mukund Padmanabhan, George A. Saon
  • Patent number: 7529659
    Abstract: A system for determining an identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced dimension signatures are then generated at least a portion of the signatures. The reduced dimension signatures are then compared to reduced dimensions signatures of known works that are stored in a database. A list of candidates of known works is generated from the comparison. The signatures of the unknown works are then compared to the signatures of the known works in the list of candidates. The unknown work is then identified as the known work having signatures matching within a threshold.
    Type: Grant
    Filed: September 28, 2005
    Date of Patent: May 5, 2009
    Assignee: Audible Magic Corporation
    Inventor: Erling H. Wold
  • Publication number: 20090112588
    Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence
    Type: Application
    Filed: October 31, 2007
    Publication date: April 30, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Krishna Kummamuru, Deepak S. Padmanabhan, Shourya Roy, L. Venkata Subramaniam
  • Publication number: 20090106023
    Abstract: A speech recognition word dictionary/language model making system for creating a word dictionary for recognizing a word not appearing in a learning text by selecting a word-generation-model-learning-method-by-word-class according to the word to be added which does not appear in the learning text and for making a language model. The speech recognition word dictionary/language model making system (100) includes a language model estimating device (111) for selecting estimating method information from a learning-method-knowledge-by-word-class storing section (109) for each word class of an addition word generating model which is a word generating model of the addition word according to the selected estimating method information and a database combining device (112) for adding an addition word to a word dictionary (105) and adding an addition word generating model to a word-generation-model-by-word-class database (107).
    Type: Application
    Filed: November 30, 2007
    Publication date: April 23, 2009
    Inventor: Kiyokazu Miki
  • Patent number: 7509256
    Abstract: It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.
    Type: Grant
    Filed: March 29, 2005
    Date of Patent: March 24, 2009
    Assignee: Sony Corporation
    Inventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
  • Patent number: 7499857
    Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.
    Type: Grant
    Filed: May 15, 2003
    Date of Patent: March 3, 2009
    Assignee: Microsoft Corporation
    Inventor: Asela J. Gunawardana
  • Patent number: 7496693
    Abstract: A method of interacting with a speech recognition (SR)-enabled personal computer (PC) is provided in which a user SR profile is transferred from a wireless-enabled device to the SR-enabled PC. Interaction with SR applications, on the SR-enabled PC, is carried out by transmitting speech signals wirelessly to the SR-enabled PC. The transmitted speech signals are recognized with the help of the transferred user SR profile.
    Type: Grant
    Filed: March 17, 2006
    Date of Patent: February 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Daniel B. Cook, David Mowatt, Oliver Scholz, Oscar E. Murillo
  • Patent number: 7496512
    Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair phoneme speech units include a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.
    Type: Grant
    Filed: April 13, 2004
    Date of Patent: February 24, 2009
    Assignee: Microsoft Corporation
    Inventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
  • Patent number: 7496503
    Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
    Type: Grant
    Filed: December 18, 2006
    Date of Patent: February 24, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
  • Patent number: 7474790
    Abstract: A method and apparatus for the detection of local image structures represented as clusters in a joint-spatial range domain where the method comprises receiving an input image made having one or more clusters in a joint-spatial range domain, and each of the one or more clusters having a corresponding mode. Receiving a set of analysis matrices and selecting through each one of the analysis matrices. Using the selected analysis matrix to partition the input image into the one or more clusters and their corresponding modes, and computing a mean, ?, and a local covariance matrix ? for each of the corresponding modes of each of the one or more clusters. Selecting at least one of the one or more clusters, where each selected cluster has a stable mean and stable covariance matrix across the set of analysis matrices, whereby each of the selected clusters is indicative of a local image structure.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: January 6, 2009
    Assignee: Siemens Medical Solutions USA, Inc.
    Inventors: Navneet Dalal, Dorin Comaniciu
  • Patent number: 7475013
    Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using an enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.
    Type: Grant
    Filed: March 26, 2004
    Date of Patent: January 6, 2009
    Assignee: Honda Motor Co., Ltd.
    Inventor: Ryan Rifkin
  • Patent number: 7472062
    Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.
    Type: Grant
    Filed: January 4, 2002
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
  • Publication number: 20080319746
    Abstract: A keyword analysis device obtains word vectors represented by the documents by analyzing keywords contained in each of documents input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized, by segmenting the topic cluster into subtopic clusters with reference to the number of documents, the variance of dates contained in the documents, or the C-value of keyword contained in the documents, as a determination criterion. And a keyword presentation device presents the characteristic keyword group in the subtopic cluster upon arranging the keyword group on the basis of the date information.
    Type: Application
    Filed: March 25, 2008
    Publication date: December 25, 2008
    Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
  • Publication number: 20080319747
    Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
    Type: Application
    Filed: August 20, 2008
    Publication date: December 25, 2008
    Applicant: Sony Deutschland GmbH
    Inventors: Ralf Kompe, Thomas Kemp
  • Patent number: 7454337
    Abstract: The present invention is a method of modeling a single class of data from data containing multiple classes of data of the same type of data by first receiving a collection of data that includes data from multiple classes of data of the same type where the amount of data of the single class of data exceeds that of any other class of data. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the speech collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model of the type of the first statistical model is generated for the combined subsets and used as the model of the single class of data.
    Type: Grant
    Filed: May 13, 2004
    Date of Patent: November 18, 2008
    Assignee: The United States of America as represented by the Director, National Security Agency, The
    Inventors: David C. Smith, Daniel J. Richman
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
  • Patent number: 7428541
    Abstract: A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes: a neighborhood patch generation system for defining patch of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure upon the document-keyword vectors and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.
    Type: Grant
    Filed: December 15, 2003
    Date of Patent: September 23, 2008
    Assignee: International Business Machines Corporation
    Inventor: Michael Edward Houle
  • Patent number: 7412385
    Abstract: The present invention obtains a set of text segments from a cluster of different articles written about a common event. The set of text segments is then subjected to textual alignment techniques to identify paraphrases from the text segments in the text. The invention can also be used to generate paraphrases.
    Type: Grant
    Filed: November 12, 2003
    Date of Patent: August 12, 2008
    Assignee: Microsoft Corporation
    Inventors: Christopher J. Brockett, William B. Dolan, Christopher B. Quirk
  • Publication number: 20080126090
    Abstract: A is recognized using a predefinable vocabulary that is partitioned in sections of phonetically similar words. In a recognition process, first oral input is associated with one of the sections, then the oral input is determined from the vocabulary of the associated section.
    Type: Application
    Filed: October 4, 2005
    Publication date: May 29, 2008
    Inventor: Niels Kunstmann
  • Patent number: 7379868
    Abstract: A differential compression technique is disclosed for compression individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
    Type: Grant
    Filed: January 2, 2003
    Date of Patent: May 27, 2008
    Assignee: Massachusetts Institute of Technology
    Inventor: Douglas A. Reynolds
  • Patent number: 7359857
    Abstract: A technique for correcting the voice spectral deformations introduced by a communication network. Prior to the operation of equalization of the voice signal of a speaker, the constitution of classes of speakers is communicated, with one voice reference per class. Then, for a given speaker, the classification of this speaker is communicated, that is to say his allocation to a class from predefined classification criteria in order to make a voice reference which is closest to his own correspond to him. Then, for that given speaker, communicating the equalization of the digitized signal of the voice of the speaker carried out with, as a reference spectrum, the voice reference of the class to which the speaker has been allocated. This technique applies to the correction of the timbre of the voice in switched telephone networks, in ISDN networks and in mobile networks.
    Type: Grant
    Filed: November 25, 2003
    Date of Patent: April 15, 2008
    Assignee: France Telecom
    Inventors: Gaël Mahe, André Gilloire
  • Patent number: 7328154
    Abstract: An improved method is provided for constructing compact acoustic models for use in a speech recognizer. The method includes: partitioning speech data from a plurality of training speakers according to at least one speech related criteria (i.e., vocal tract length); grouping together the partitioned speech data from training speakers having a similar speech characteristic; and training an acoustic bubble model for each group using the speech data within the group.
    Type: Grant
    Filed: August 13, 2003
    Date of Patent: February 5, 2008
    Assignee: Matsushita Electrical Industrial Co., Ltd.
    Inventors: Ambroise Mutel, Patrick Nguyen, Luca Rigazio