Clustering Patents (Class 704/245)
-
Patent number: 7805300
Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains are examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
Type: Grant
Filed: March 21, 2005
Date of Patent: September 28, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
-
Publication number: 20100241430
Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
Type: Application
Filed: June 3, 2010
Publication date: September 23, 2010
Applicant: AT&T Intellectual Property II, L.P., via transfer from AT&T Corp.
Inventors: Michiel A. U. Bacchiani, Brian E. Roark
-
Patent number: 7797158
Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
Type: Grant
Filed: June 20, 2007
Date of Patent: September 14, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Mazin Gilbert
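The codebook selection in step (1) can be illustrated with a minimal sketch. It assumes each codebook holds one speaker's vocal tract length plus a few clustered speech vectors, and it takes the "acoustic distance" to be the average distance from each frame of the sample to its nearest codebook entry. That distance definition, the `Codebook` class, and the example values are all hypothetical. The optional per-vector weights and the actual normalization (frequency warping) step are not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Codebook:
    # Hypothetical container for one training speaker's codebook.
    speaker: str
    vocal_tract_length: float
    vectors: list = field(default_factory=list)  # clustered speech vectors


def frame_distance(frame, codebook):
    """Distance from one feature frame to its nearest codebook entry."""
    return min(math.dist(frame, v) for v in codebook.vectors)


def acoustic_distance(sample, codebook):
    """Average nearest-entry distance over all frames of the sample (assumed metric)."""
    return sum(frame_distance(f, codebook) for f in sample) / len(sample)


def select_codebook(sample, codebooks):
    """Pick the codebook with minimal acoustic distance to the sample."""
    return min(codebooks, key=lambda cb: acoustic_distance(sample, cb))


books = [
    Codebook("speaker_a", 14.5, [(0.0, 0.0), (1.0, 1.0)]),
    Codebook("speaker_b", 17.0, [(5.0, 5.0), (6.0, 6.0)]),
]
sample = [(0.1, 0.0), (0.9, 1.1)]
chosen = select_codebook(sample, books)
print(chosen.speaker, chosen.vocal_tract_length)  # speaker_a 14.5
```

The selected `vocal_tract_length` is what steps (2) and (3) would then use to normalize and recognize the sample.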
-
Publication number: 20100217593
Abstract: An information storage medium stores a program for generating Hidden Markov Models to be used for speech recognition with a given speech recognition system. The program causes a computer to function as a scheduled-to-be-used model group storage section that stores a scheduled-to-be-used model group including a plurality of Hidden Markov Models scheduled to be used by the given speech recognition system, and as a filler model generation section that generates Hidden Markov Models to be used as filler models by the given speech recognition system based on all or at least a part of the Hidden Markov Model group in the scheduled-to-be-used model group.
Type: Application
Filed: February 5, 2010
Publication date: August 26, 2010
Applicant: SEIKO EPSON CORPORATION
Inventors: Paul W. Shields, Matthew E. Dunnachie, Yasutoshi Takizawa
-
Patent number: 7773809
Abstract: A method and apparatus for generating discriminant functions for distinguishing obscene videos by using visual features of video data, and a method and apparatus for determining whether videos are obscene by using the generated discriminant functions, are provided.
Type: Grant
Filed: May 26, 2006
Date of Patent: August 10, 2010
Assignee: Electronics and Telecommunications Research Institute
Inventors: Seung Min Lee, Taek Yong Nam, Jong Soo Jang, Ho Gyun Lee
-
Patent number: 7769588
Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.
Type: Grant
Filed: August 20, 2008
Date of Patent: August 3, 2010
Assignee: Sony Deutschland GmbH
Inventors: Ralf Kompe, Thomas Kemp
-
Patent number: 7756341
Abstract: Generic visual categorization methods complement a general vocabulary with adapted vocabularies that are class specific. Images to be categorized are characterized within different categories through a histogram indicating whether the image is better described by the general vocabulary or the class-specific adapted vocabulary.
Type: Grant
Filed: June 30, 2005
Date of Patent: July 13, 2010
Assignee: Xerox Corporation
Inventor: Florent Perronnin
-
Patent number: 7752046
Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
Type: Grant
Filed: October 29, 2004
Date of Patent: July 6, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Michiel A. E. Bacchiani, Brian E. Roark
-
Patent number: 7747593
Abstract: A method of determining cluster attractors for a plurality of documents comprising at least one term. The method comprises calculating, in respect of each term, a probability distribution indicative of the frequency of occurrence of the, or each, other term that co-occurs with said term in at least one of said documents. Then, the entropy of the respective probability distribution is calculated. Finally, at least one of said probability distributions is selected as a cluster attractor depending on the respective entropy value. The method allows very small clusters to be formed, enabling more focused retrieval during a document search.
Type: Grant
Filed: September 27, 2004
Date of Patent: June 29, 2010
Assignees: University of Ulster, St. Petersburg State University
Inventors: David Patterson, Vladimir Dobrynin
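The three steps of this abstract (co-occurrence distribution per term, its entropy, selection by entropy) can be sketched as follows. The abstract only says attractors are chosen "depending on" entropy. Keeping every distribution whose entropy is at or below a hypothetical `max_entropy` cutoff is an assumption made here for illustration, as is treating each document as a plain list of tokens.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_distributions(documents):
    """For each term, a probability distribution over the terms that co-occur with it."""
    counts = defaultdict(Counter)
    for doc in documents:
        freq = Counter(doc)
        for term in freq:
            for other, n in freq.items():
                if other != term:
                    counts[term][other] += n  # frequency of each co-occurring term
    dists = {}
    for term, c in counts.items():
        total = sum(c.values())
        dists[term] = {o: n / total for o, n in c.items()}
    return dists


def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def select_attractors(documents, max_entropy):
    """Hypothetical rule: low-entropy (focused) distributions become attractors."""
    dists = cooccurrence_distributions(documents)
    return sorted(t for t, d in dists.items() if entropy(d) <= max_entropy)


docs = [["speech", "model"], ["speech", "model"], ["speech", "audio", "video"]]
print(select_attractors(docs, 0.5))  # ['model']
```

Here "model" always co-occurs with the same term, so its distribution has zero entropy, while "speech" spreads over three terms (1.5 bits). A low-entropy distribution describes a narrow context, which is consistent with the abstract's point that very small clusters can be formed.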
-
Patent number: 7747447
Abstract: A bi-phase decoder suitable for use in a broadcast router and an associated method for extracting subframes of digital audio data from a stream of digital audio data. Logical circuitry within the bi-phase decoder extracts subframes of the digital audio data by constructing a transition window from an estimated bit time, sampling the stream of digital audio data using a fast clock and applying the sampled stream of digital audio data to the transition window to identify transitions indicative of preambles of the subframes of digital audio data.
Type: Grant
Filed: June 20, 2003
Date of Patent: June 29, 2010
Assignee: Thomson Licensing
Inventors: Carl Christensen, Lynn Howard Arbuckle
-
Patent number: 7747435
Abstract: The aim is to make it easy to retrieve the speakers of encoded speech data recorded in a semiconductor storage device in an IC recorder. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the decoded speech waveform to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a screen as a two-dimensional graph with time and frequency as its two axes.
Type: Grant
Filed: March 15, 2008
Date of Patent: June 29, 2010
Assignee: Sony Corporation
Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
-
Patent number: 7742918
Abstract: Disclosed is a system and method of training a spoken language understanding module. Such a module may be utilized in a spoken dialog system. The method of training a spoken language understanding module comprises training acoustic and language models using a small set of transcribed data St, recognizing utterances in a set Su that are candidates for transcription using the acoustic and language models, computing confidence scores of the utterances, selecting k utterances that have the smallest confidence scores from Su and transcribing them into a new set Si, redefining St as the union of St and Si, redefining Su as Su minus Si, and returning to the step of training acoustic and language models if word accuracy has not converged.
Type: Grant
Filed: July 5, 2007
Date of Patent: June 22, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
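A single selection round of this loop (pick the k lowest-confidence utterances from Su, move them into Si, then update St and Su) can be sketched with Python sets. Model training, recognition, and the word-accuracy convergence test are left out. The confidence scores in the example are invented, and sorting on (score, ID) to break ties deterministically is a choice made here rather than something the abstract specifies.

```python
def active_learning_round(transcribed, untranscribed, confidence, k):
    """Move the k lowest-confidence utterances from Su to St.

    transcribed (St) and untranscribed (Su) are sets of utterance IDs;
    confidence maps each ID in Su to a score from the current models.
    In practice the selected utterances are sent for human transcription.
    """
    # Sort by (score, id) so that ties are broken deterministically.
    ranked = sorted(untranscribed, key=lambda u: (confidence[u], u))
    selected = set(ranked[:k])  # Si
    return transcribed | selected, untranscribed - selected, selected


St, Su, Si = active_learning_round(
    {"utt1"}, {"utt2", "utt3", "utt4"},
    {"utt2": 0.91, "utt3": 0.22, "utt4": 0.57}, k=2)
print(sorted(Si))  # ['utt3', 'utt4']
```

An outer loop would retrain on the new St, rescore the remaining Su, and stop once word accuracy no longer changes between rounds.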
-
Patent number: 7739111
Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length among runs of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
Type: Grant
Filed: August 9, 2006
Date of Patent: June 15, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Kazue Kaneko
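The three steps can be sketched as follows. The sketch builds a standard edit-distance (Levenshtein) DP matrix, traces back one optimal path, records the longest run of consecutive matched symbols ("correct answers"), and normalizes that run by the length of the shorter sequence. The unit costs, the backtrace tie-breaking, and the normalization are all assumptions. The patent's actual transition matrix and similarity formula may differ.

```python
def dp_match(first, second):
    """Edit-distance DP matrix: cost[i][j] aligns first[:i] with second[:j]."""
    n, m = len(first), len(second)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if first[i - 1] == second[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # deletion from first
                             cost[i][j - 1] + 1)        # insertion
    return cost


def longest_correct_run(first, second):
    """Trace back one optimal path and return the longest run of matches."""
    cost = dp_match(first, second)
    i, j, run, best = len(first), len(second), 0, 0
    while i > 0 and j > 0:
        if first[i - 1] == second[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            run += 1
            best = max(best, run)
            i, j = i - 1, j - 1
        else:
            run = 0  # a non-match step breaks the run
            if cost[i][j] == cost[i - 1][j - 1] + 1:
                i, j = i - 1, j - 1
            elif cost[i][j] == cost[i - 1][j] + 1:
                i -= 1
            else:
                j -= 1
    return best


def similarity(first, second):
    """Assumed normalization: longest run divided by the shorter sequence's length."""
    return longest_correct_run(first, second) / len(second) if second else 0.0


print(longest_correct_run("abcxef", "abef"), similarity("abcdefg", "cde"))  # 2 1.0
```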
-
Publication number: 20100138223
Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy based on similarity of speakers and environments by using a realistic memory use amount, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model. A parameter updating means 107 successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (in FIG. 1).
Type: Application
Filed: March 13, 2008
Publication date: June 3, 2010
Inventor: Takafumi Koshinaka
-
Patent number: 7729911
Abstract: A speech recognition method comprising the steps of: storing multiple recognition models for a vocabulary set, each model distinguished from the other models in response to a Lombard characteristic, detecting at least one speaker utterance in a motor vehicle, selecting one of the multiple recognition models in response to a Lombard characteristic of the at least one speaker utterance, utilizing the selected recognition model to recognize the at least one speaker utterance; and providing a signal in response to the recognition.
Type: Grant
Filed: September 27, 2005
Date of Patent: June 1, 2010
Assignee: General Motors LLC
Inventors: Rathinavelu Chengalvarayan, Scott M. Pennock
-
Patent number: 7725318
Abstract: A system and method for improving the accuracy of audio searching using multiple models to process an audio file or stream to obtain search tracks. The search tracks are processed to locate at least one search term and generate multiple search results. The number of search results is equivalent to the number of models used to process the audio stream. The search results are combined to generate a unified search result. The multiple models may represent different languages, dialects and accents.
Type: Grant
Filed: August 1, 2005
Date of Patent: May 25, 2010
Assignee: NICE Systems Inc.
Inventors: Marsal Gavalda, Moshe Wasserblat
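The abstract does not say how the per-model search results are combined. The sketch below shows one plausible merge rule, offered only as an assumption. Hits for the same term from different models that start within a hypothetical `tolerance` (in seconds) of each other are merged into one unified hit. The merged hit keeps the highest score and records which models found it. The hit format `(term, time_s, score)` is also invented for illustration.

```python
def unify(results_per_model, tolerance=0.5):
    """Merge per-model hits of the same term that occur close together in time.

    results_per_model: one list per model of (term, time_s, score) hits.
    Returns unified hits with the best score and the set of supporting models.
    """
    hits = sorted((term, t, s, m)
                  for m, results in enumerate(results_per_model)
                  for term, t, s in results)
    unified = []
    for term, t, s, m in hits:
        last = unified[-1] if unified else None
        if last and last["term"] == term and t - last["time"] <= tolerance:
            last["score"] = max(last["score"], s)  # keep the most confident score
            last["models"].add(m)
        else:
            unified.append({"term": term, "time": t, "score": s, "models": {m}})
    return unified


model_a = [("refund", 10.0, 0.7)]                       # e.g. a US-English model
model_b = [("refund", 10.2, 0.9), ("cancel", 30.0, 0.6)]  # e.g. a UK-English model
for hit in unify([model_a, model_b]):
    print(hit["term"], hit["time"], hit["score"], sorted(hit["models"]))
```

Counting how many models agree on a hit, as `models` does here, is one simple way that combining several language or accent models could raise confidence in a match.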
-
Patent number: 7693713
Abstract: Speech models are trained using one or more of three different training systems. They include competitive training which reduces a distance between a recognized result and a true result, data boosting which divides and weights training data, and asymmetric training which trains different model components differently.
Type: Grant
Filed: June 17, 2005
Date of Patent: April 6, 2010
Assignee: Microsoft Corporation
Inventors: Xiaodong He, Jian Wu
-
Patent number: 7664640
Abstract: A signal processing system is disclosed which is implemented using a Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM), or a GMM alone, whose parameters are constrained during its optimization procedure. Also disclosed is a constraint system applied to input vectors representing the input signal to the system. The invention is particularly, but not exclusively, related to speech recognition systems. Prior art systems tend to get caught in local minima associated with highly anisotropic Gaussian components, which reduces recognizer performance. The invention reduces this tendency by employing the constraint system described above, whereby the anisotropy of such components is minimized. The invention also covers a method of processing a signal, and a speech recognizer trained according to the method.
Type: Grant
Filed: March 24, 2003
Date of Patent: February 16, 2010
Assignee: Qinetiq Limited
Inventor: Christopher John St. Clair Webber
-
Patent number: 7657102
Abstract: A fast variational on-line learning technique for training a transformed hidden Markov model. A simplified general model and an associated estimation algorithm are provided for modeling visual data such as a video sequence. Specifically, once the model has been initialized, an expectation-maximization (“EM”) algorithm is used to learn the one or more object class models, so that the video sequence has high marginal probability under the model. In the expectation step (the “E-Step”), the model parameters are assumed to be correct, and for an input image, probabilistic inference is used to fill in the values of the unobserved or hidden variables, e.g., the object class and appearance. In one embodiment of the invention, a Viterbi algorithm and a latent image are employed for this purpose. In the maximization step (the “M-Step”), the model parameters are adjusted using the values of the unobserved variables calculated in the previous E-step.
Type: Grant
Filed: August 27, 2003
Date of Patent: February 2, 2010
Assignee: Microsoft Corp.
Inventors: Nebojsa Jojic, Nemanja Petrovic
-
Patent number: 7657433
Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.
Type: Grant
Filed: September 8, 2006
Date of Patent: February 2, 2010
Assignee: TellMe Networks, Inc.
Inventor: Shuangyu Chang
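The division into a threshold selection component and a threshold component can be sketched as two small functions. Utterance duration stands in for the "one or more features relating to the utterance". Both that feature and the threshold table are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    text: str
    confidence: float  # score reported by the recognition engine
    duration_s: float  # utterance feature used to choose a threshold (assumption)


# Hypothetical table: (maximum duration in seconds, confidence threshold).
# Illustrative choice: shorter utterances get a stricter threshold.
THRESHOLDS = [(0.5, 0.85), (1.5, 0.70), (float("inf"), 0.60)]


def select_threshold(result):
    """Threshold selection component: map utterance features to a threshold."""
    return next(t for max_d, t in THRESHOLDS if result.duration_s <= max_d)


def accept(result):
    """Threshold component: accept the result if its confidence clears the threshold."""
    return result.confidence >= select_threshold(result)


print(accept(Recognition("yes", 0.80, 0.4)))        # False (threshold 0.85)
print(accept(Recognition("call home", 0.80, 1.0)))  # True (threshold 0.70)
```

The same confidence score of 0.80 is rejected for one utterance and accepted for the other, which is the behavior a single global threshold cannot express.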
-
Patent number: 7643990
Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
Type: Grant
Filed: October 23, 2003
Date of Patent: January 5, 2010
Assignee: Apple Inc.
Inventor: Jerome R. Bellegarda
-
Patent number: 7634405
Abstract: The subject invention leverages spectral “palettes” or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or “trained” utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
Type: Grant
Filed: January 24, 2005
Date of Patent: December 15, 2009
Assignee: Microsoft Corporation
Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
-
Patent number: 7620547
Abstract: The present invention provides a method for operating and/or controlling a man-machine interface unit (MMI) for a finite user group environment. Utterances from a group of users are repeatedly received. A process of user identification is carried out based on the received utterances. The user identification process includes a clustering step, which enables enrolment-free operation.
Type: Grant
Filed: January 24, 2005
Date of Patent: November 17, 2009
Assignee: Sony Deutschland GmbH
Inventors: Ralf Kompe, Thomas Kemp
-
Publication number: 20090265166
Abstract: A boundary estimation apparatus includes a first boundary estimation unit that estimates a first boundary separating a speech into first meaning units; a second boundary estimation unit configured to estimate a second boundary separating a second speech, related to the first speech, into second meaning units related to the first meaning units; a pattern generating unit configured to generate a representative pattern showing a representative characteristic in the analysis interval; and a similarity calculation unit configured to calculate a similarity between the representative pattern and a characteristic pattern showing a feature in a calculation interval of the speech. The second boundary estimation unit estimates the second boundary based on the calculation interval in which the similarity is higher than a threshold value or relatively high.
Type: Application
Filed: June 30, 2009
Publication date: October 22, 2009
Inventor: Kazuhiko Abe
-
Patent number: 7607083
Abstract: Text summarizers using relevance measurement technologies and latent semantic analysis techniques provide accurate and useful summarization of the contents of text documents. Generic text summaries may be produced by ranking and extracting sentences from original documents; broad coverage of document content and decreased redundancy may simultaneously be achieved by constructing summaries from sentences that are highly ranked and different from each other. In one embodiment, conventional Information Retrieval (IR) technologies may be applied in a unique way to perform the summarization; relevance measurement, sentence selection, and term elimination may be repeated in successive iterations.
Type: Grant
Filed: March 26, 2001
Date of Patent: October 20, 2009
Assignee: NEC Corporation
Inventors: Yihong Gong, Xin Liu
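The iterated "relevance measurement, sentence selection, and term elimination" loop can be sketched with raw term-frequency vectors. Relevance is the inner product between a sentence vector and the whole-document vector. After each pick, the chosen sentence's terms are removed from the document vector so that later picks favor new content. Whitespace tokenization and unweighted frequencies are simplifying assumptions, and the latent semantic analysis variant mentioned in the abstract is not shown.

```python
from collections import Counter


def summarize(sentences, k):
    """Pick k sentences by iterated relevance measurement and term elimination."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    doc = sum(vectors, Counter())  # term-frequency vector of the whole document
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        # Relevance: inner product of the sentence and (reduced) document vectors.
        best = max(remaining,
                   key=lambda i: sum(n * doc[t] for t, n in vectors[i].items()))
        chosen.append(best)
        remaining.remove(best)
        # Term elimination: drop the chosen sentence's terms from the document.
        for term in vectors[best]:
            doc.pop(term, None)
    return [sentences[i] for i in sorted(chosen)]  # keep original order


text = ["cats purr", "cats purr loudly", "dogs bark"]
print(summarize(text, 2))  # ['cats purr loudly', 'dogs bark']
```

The redundant sentence "cats purr" scores zero once its terms have been eliminated, which illustrates the abstract's point about decreased redundancy.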
-
Patent number: 7603278
Abstract: A segment set before updating is read, and clustering that takes the phoneme environment into account is performed on it. For each cluster obtained by the clustering, a representative segment of the segments belonging to the cluster is generated. For each cluster, the segments belonging to the cluster are replaced with the representative segment so as to update the segment set.
Type: Grant
Filed: September 14, 2005
Date of Patent: October 13, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Toshiaki Fukada, Masayuki Yamada, Yasuhiro Komori
-
Patent number: 7590537
Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers while analyzing the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the system finds the model for which the model variation between a test speaker ML model and the ML model of the speaker group to which the test speaker belongs is most similar to a training speaker group model variation, and performs speaker adaptation on the found model. The model variations in both speaker clustering and speaker adaptation are calculated by analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to either the MLLR or the MAP speaker adaptation algorithm.
Type: Grant
Filed: December 27, 2004
Date of Patent: September 15, 2009
Assignee: Samsung Electronics Co., Ltd.
Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
-
Patent number: 7584100
Abstract: A method and system for clustering documents based on generalized sentence patterns of the topics of the documents is provided. A generalized sentence patterns (“GSP”) system identifies a “sentence” that describes the topic of a document. To cluster documents, the GSP system generates a “generalized sentence” form of the sentence that describes the topic of each document. The generalized sentence is an abstraction of the words of the sentence. The GSP system identifies clusters of documents based on the patterns of their generalized sentences. The GSP system clusters documents when the generalized sentence representations of their topics have a similar pattern.
Type: Grant
Filed: June 30, 2004
Date of Patent: September 1, 2009
Assignee: Microsoft Corporation
Inventors: Benyu Zhang, Wei-Ying Ma, Zheng Chen, Hua-Jun Zeng
-
Patent number: 7571097
Abstract: A method for compressing multidimensional Gaussian distributions with diagonal covariance matrices includes clustering a plurality of Gaussian distributions into a multiplicity of clusters for each dimension. Each cluster can be represented by a centroid having a mean and a variance. A total decrease in likelihood of a training dataset is minimized for the representation of the plurality of Gaussian distributions.
Type: Grant
Filed: March 13, 2003
Date of Patent: August 4, 2009
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, Michael D. Plumpe
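The per-dimension structure of this scheme can be sketched as follows. Each dimension gets its own small codebook of (mean, variance) centroids, and each Gaussian then stores only one codeword index per dimension. The clustering criterion is deliberately simplified. Plain Euclidean k-means on (mean, variance) pairs with deterministic initialization is swapped in for the likelihood-decrease criterion the abstract describes, so this illustrates the data layout, not the patented optimization.

```python
def kmeans_dimension(params, k, iters=20):
    """Cluster (mean, variance) pairs of one dimension with simple k-means.

    Euclidean distance is a stand-in for the likelihood-based criterion.
    """
    ordered = sorted(params)
    # Deterministic init: k points spread evenly through the sorted list.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    assign = [0] * len(params)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: (m - centroids[c][0]) ** 2
                      + (v - centroids[c][1]) ** 2) for m, v in params]
        for c in range(k):
            members = [params[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = (sum(m for m, _ in members) / len(members),
                                sum(v for _, v in members) / len(members))
    return centroids, assign


def compress(gaussians, k):
    """gaussians: list of (means, variances) with diagonal covariance.

    Returns one codebook per dimension and, per Gaussian, its codeword indices.
    """
    dims = len(gaussians[0][0])
    codebooks, indices = [], [[] for _ in gaussians]
    for d in range(dims):
        params = [(mu[d], var[d]) for mu, var in gaussians]
        centroids, assign = kmeans_dimension(params, k)
        codebooks.append(centroids)
        for g, a in enumerate(assign):
            indices[g].append(a)
    return codebooks, indices


gaussians = [((0.0, 10.0), (1.0, 1.0)), ((0.2, 10.2), (1.0, 1.0)),
             ((5.0, 0.0), (2.0, 1.0)), ((5.2, 0.1), (2.0, 1.0))]
codebooks, indices = compress(gaussians, k=2)
print(indices)  # [[0, 1], [0, 1], [1, 0], [1, 0]]
```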
-
Patent number: 7552049
Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1); the cepstral mean of the speech is subtracted from the generated noise-added speech (step S2); a Gaussian distribution model of each piece of noise-added speech is created (step S3); and the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4), from which a clustering result is obtained. An optimum model is selected (step S7) and linear transformation is performed to maximize the likelihood (step S8). Because noise-added speech is used consistently in both clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
Type: Grant
Filed: March 10, 2004
Date of Patent: June 23, 2009
Assignees: NTT DoCoMo, Inc., Sadaoki Furui
Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
-
Patent number: 7548856
Abstract: The present invention utilizes a discriminative density model selection method to provide an optimized density model subset employable in constructing a classifier. By allowing multiple alternative density models to be considered for each class in a multi-class classification system and then developing an optimal configuration comprised of a single density model for each class, the classifier can be tuned to exhibit a desired characteristic such as, for example, high classification accuracy, low cost, and/or a balance of both. In one instance of the present invention, error graph, junction tree, and min-sum propagation algorithms are utilized to obtain an optimization from discriminatively selected density models.
Type: Grant
Filed: May 20, 2003
Date of Patent: June 16, 2009
Assignee: Microsoft Corporation
Inventors: Bo Thiesson, Christopher A. Meek
-
Patent number: 7546242
Abstract: A method of reproduction by a reproduction apparatus for reproducing audio documents forming part of a set of documents. The method includes a prior step of partitioning the documents of the set into groups of documents whose audio parameters are similar, making it possible to determine at least one document representing each group by taking into account its audio parameters. Then, an identifier of a document representing the group is reproduced graphically and/or audibly. In this way, the user can note the type of music involved and can select this group by virtue of the graphical identifier. A command may be activated to move from one group to another; a group may be selected and the documents of this group reproduced. The invention also relates to a reproduction apparatus furnished with a user interface allowing reproduction.
Type: Grant
Filed: August 5, 2004
Date of Patent: June 9, 2009
Assignee: Thomson Licensing
Inventors: Louis Chevallier, Izabela Grasland, Jean-Ronan Vigouroux, Jean-Baptiste Henry
-
Patent number: 7542901
Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
Type: Grant
Filed: August 24, 2006
Date of Patent: June 2, 2009
Assignee: Nuance Communications, Inc.
Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
-
Patent number: 7529659
Abstract: A system for determining an identity of a received work. The system receives audio data for an unknown work. The audio data is divided into segments. The system generates a signature of the unknown work from each of the segments. Reduced dimension signatures are then generated from at least a portion of the signatures. The reduced dimension signatures are then compared to reduced dimension signatures of known works that are stored in a database. A list of candidates of known works is generated from the comparison. The signatures of the unknown work are then compared to the signatures of the known works in the list of candidates. The unknown work is then identified as the known work having signatures matching within a threshold.
Type: Grant
Filed: September 28, 2005
Date of Patent: May 5, 2009
Assignee: Audible Magic Corporation
Inventor: Erling H. Wold
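The two-stage search (a cheap comparison of reduced-dimension signatures to build a candidate list, then a full comparison against the candidates only) can be sketched as below. The reduction method (averaging each signature into a few bins), the distance (mean Euclidean distance over corresponding segments), and the parameter names `bins`, `n_candidates`, and `threshold` are hypothetical. The abstract does not specify any of them.

```python
import math


def reduce_signature(signature, bins):
    """Hypothetical dimension reduction: average the vector into `bins` equal chunks."""
    size = len(signature) // bins
    return [sum(signature[i * size:(i + 1) * size]) / size for i in range(bins)]


def work_distance(a, b):
    """Mean Euclidean distance between corresponding segment signatures."""
    pairs = list(zip(a, b))
    return sum(math.dist(x, y) for x, y in pairs) / len(pairs)


def identify(unknown, known, bins=2, n_candidates=3, threshold=1.0):
    """Return the name of the matching known work, or None."""
    if not known:
        return None
    reduced_unknown = [reduce_signature(s, bins) for s in unknown]
    # Stage 1: rank known works by cheap reduced-signature distance.
    # (A real system would store these reduced signatures in the database.)
    ranked = sorted(known, key=lambda name: work_distance(
        reduced_unknown, [reduce_signature(s, bins) for s in known[name]]))
    candidates = ranked[:n_candidates]
    # Stage 2: full-resolution comparison against the shortlist only.
    best = min(candidates, key=lambda name: work_distance(unknown, known[name]))
    return best if work_distance(unknown, known[best]) <= threshold else None


known = {"song_a": [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
         "song_b": [[5.0] * 4, [5.0] * 4]}
print(identify([[1.0, 1.0, 0.0, 0.1], [0.0, 0.0, 1.0, 1.0]], known, n_candidates=1))  # song_a
print(identify([[9.0] * 4, [9.0] * 4], known))  # None
```

The point of the first stage is that the expensive full comparison runs only against the short candidate list rather than the whole database.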
-
Patent number: 7529666
Abstract: In connection with speech recognition, the design of a linear transformation $\theta \in \mathbb{R}^{p \times n}$, of rank $p \le n$, which projects the features of a classifier $x \in \mathbb{R}^n$ onto $y = \theta x \in \mathbb{R}^p$ so as to achieve minimum Bayes error (or probability of misclassification). Two avenues are explored: the first is to maximize the $\theta$-average divergence between the class densities, and the second is to minimize the union Bhattacharyya bound in the range of $\theta$. While both approaches yield similar performance in practice, they outperform standard linear discriminant analysis features and show a 10% relative improvement in the word error rate over known cepstral features on a large vocabulary telephony speech recognition task.
Type: Grant
Filed: October 30, 2000
Date of Patent: May 5, 2009
Assignee: International Business Machines Corporation
Inventors: Mukund Padmanabhan, George A. Saon
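For Gaussian class densities, the union Bhattacharyya bound can be evaluated for a given projection. The sketch restricts the projection to a single row $\theta$ (the case $p = 1$), so each class projects to a 1-D Gaussian with mean $m_k = \theta^\top \mu_k$ and variance $s_k^2 = \theta^\top \Sigma_k \theta$. For two classes, $D_B = \frac{(m_1 - m_2)^2}{4(s_1^2 + s_2^2)} + \frac{1}{2}\ln\frac{s_1^2 + s_2^2}{2 s_1 s_2}$, and the bound is $P_e \le \sum_{i<j} \sqrt{P_i P_j}\, e^{-D_B(i,j)}$. The restriction to $p = 1$ and the toy classes are simplifications made here. The method in the abstract searches for the $\theta$ that minimizes this bound, which this sketch does not do.

```python
import math
from itertools import combinations


def project(theta, mean, cov):
    """Project a Gaussian N(mean, cov) onto the 1-D direction theta."""
    m = sum(t * mu for t, mu in zip(theta, mean))
    var = sum(theta[i] * cov[i][j] * theta[j]
              for i in range(len(theta)) for j in range(len(theta)))
    return m, var


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians."""
    return ((m1 - m2) ** 2 / (4 * (v1 + v2))
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))


def union_bhattacharyya_bound(theta, classes):
    """Upper bound on Bayes error after projecting onto theta.

    classes: list of (prior, mean vector, covariance matrix).
    """
    projected = [(p, *project(theta, mu, cov)) for p, mu, cov in classes]
    return sum(math.sqrt(p1 * p2) * math.exp(-bhattacharyya_1d(m1, v1, m2, v2))
               for (p1, m1, v1), (p2, m2, v2) in combinations(projected, 2))


identity = [[1.0, 0.0], [0.0, 1.0]]
classes = [(0.5, (0.0, 0.0), identity), (0.5, (2.0, 0.0), identity)]
print(round(union_bhattacharyya_bound((1.0, 0.0), classes), 4),  # 0.3033
      round(union_bhattacharyya_bound((0.0, 1.0), classes), 4))  # 0.5
```

Projecting onto the direction that separates the class means gives a tighter bound than the orthogonal direction, where the two classes overlap completely.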
-
Publication number: 20090112588
Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a specified number of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence
Type: Application
Filed: October 31, 2007
Publication date: April 30, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Krishna Kummamuru, Deepak S. Padmanabhan, Shourya Roy, L. Venkata Subramaniam
-
Publication number: 20090106023
Abstract: A speech recognition word dictionary/language model making system for creating a word dictionary for recognizing a word not appearing in a learning text by selecting a word-generation-model-learning-method-by-word-class according to the word to be added which does not appear in the learning text and for making a language model. The speech recognition word dictionary/language model making system (100) includes a language model estimating device (111) for selecting estimating method information from a learning-method-knowledge-by-word-class storing section (109) for each word class of an addition word generating model which is a word generating model of the addition word according to the selected estimating method information and a database combining device (112) for adding an addition word to a word dictionary (105) and adding an addition word generating model to a word-generation-model-by-word-class database (107).
Type: Application
Filed: November 30, 2007
Publication date: April 23, 2009
Inventor: Kiyokazu Miki
-
Patent number: 7509256Abstract: It is intended to increase the recognition rate in speech recognition and image recognition. An observation vector as input data, which represents a certain point in the observation vector space, is mapped to a distribution having a spread in the feature vector space, and a feature distribution parameter representing the distribution is determined. Pattern recognition of the input data is performed based on the feature distribution parameter.Type: GrantFiled: March 29, 2005Date of Patent: March 24, 2009Assignee: Sony CorporationInventors: Naoto Iwahashi, Hongchang Bao, Hitoshi Honda
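A point observation can be mapped to a feature distribution and then scored against class models. One common way to do this, shown below as an assumption and not as the patented method, treats each feature dimension as a Gaussian whose spread comes from known additive noise. The expected class likelihood under that distribution has a closed form, because the integral of a product of two Gaussians is a Gaussian in the difference of their means with the variances summed:

```python
import math

def feature_distribution(observation, noise_mean, noise_var):
    """Map a point observation to a per-dimension Gaussian (means, variances).

    Assumption for illustration: additive noise with known mean and variance.
    """
    return [o - n for o, n in zip(observation, noise_mean)], list(noise_var)

def expected_log_likelihood(dist, class_mean, class_var):
    """log of the integral of N(x; mu, s) * N(x; m, v) dx, summed over dimensions.

    For Gaussians this equals log N(mu; m, s + v): uncertainty in the
    feature simply widens the class variance.
    """
    mu, s = dist
    return sum(-0.5 * (math.log(2 * math.pi * (si + vi)) + (ui - mi) ** 2 / (si + vi))
               for ui, si, mi, vi in zip(mu, s, class_mean, class_var))
```

With zero feature variance this reduces to the ordinary Gaussian log-likelihood of the point.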
-
Patent number: 7499857Abstract: The present invention is used to adapt acoustic models, quantized in subspaces, using adaptation training data (such as speaker-dependent training data). The acoustic model is compressed into multi-dimensional subspaces. A codebook is generated for each subspace. An adaptation transform is estimated, and it is applied to codewords in the codebooks, rather than to the means themselves.Type: GrantFiled: May 15, 2003Date of Patent: March 3, 2009Assignee: Microsoft CorporationInventor: Asela J. Gunawardana
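The saving described here comes from transforming codewords instead of Gaussian means. Many Gaussians share a few codewords per subspace, so the adaptation cost scales with codebook size rather than model size. The sketch below assumes one affine transform per subspace. The function names are hypothetical.

```python
def apply_affine(A, b, x):
    """Return A @ x + b for small dense lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def adapt_codebooks(codebooks, transforms):
    """Adapt each subspace codebook by transforming its codewords.

    codebooks: one list of codewords per subspace.
    transforms: one (A, b) affine pair per subspace (an assumption).
    """
    return [[apply_affine(A, b, cw) for cw in book]
            for book, (A, b) in zip(codebooks, transforms)]

def reconstruct_mean(codebooks, indices):
    """Concatenate the codewords a Gaussian selects, one index per subspace."""
    mean = []
    for book, idx in zip(codebooks, indices):
        mean.extend(book[idx])
    return mean
```

Every Gaussian that points at an adapted codeword picks up the adaptation automatically when its mean is reconstructed.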
-
Patent number: 7496693Abstract: A method of interacting with a speech recognition (SR)-enabled personal computer (PC) is provided in which a user SR profile is transferred from a wireless-enabled device to the SR-enabled PC. Interaction with SR applications, on the SR-enabled PC, is carried out by transmitting speech signals wirelessly to the SR-enabled PC. The transmitted speech signals are recognized with the help of the transferred user SR profile.Type: GrantFiled: March 17, 2006Date of Patent: February 24, 2009Assignee: Microsoft CorporationInventors: Daniel B. Cook, David Mowatt, Oliver Scholz, Oscar E. Murillo
-
Patent number: 7496512Abstract: A method and apparatus are provided for refining segmental boundaries in speech waveforms. Contextual acoustic feature similarities are used as a basis for clustering adjacent phoneme speech units, where each adjacent pair of phoneme speech units includes a segmental boundary. A refining model is trained for each cluster and used to refine boundaries of contextual phoneme speech units forming the clusters.Type: GrantFiled: April 13, 2004Date of Patent: February 24, 2009Assignee: Microsoft CorporationInventors: Yong Zhao, Min Chu, Jian-lai Zhou, Lijuan Wang
-
Patent number: 7496503Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.Type: GrantFiled: December 18, 2006Date of Patent: February 24, 2009Assignee: AT&T Intellectual Property II, L.P.Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
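Recognition "based only on certain or valid features" can be illustrated with a diagonal-covariance Gaussian. Leaving a dimension out of the sum is the same as marginalizing it out, so a flagged feature contributes nothing to the score. The sketch below uses that assumption. The model format is hypothetical.

```python
import math

def marginal_log_likelihood(vector, valid, mean, var):
    """Diagonal-Gaussian log-likelihood over the features flagged as valid.

    Dropping a dimension of a diagonal Gaussian marginalizes it out,
    so corrupted features are simply skipped.
    """
    total = 0.0
    for x, ok, m, v in zip(vector, valid, mean, var):
        if ok:
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def classify(vector, valid, models):
    """Return the label whose (mean, var) model scores best on valid features."""
    return max(models, key=lambda label: marginal_log_likelihood(vector, valid, *models[label]))
```

A wildly corrupted value such as `999.0` in a flagged dimension then has no effect on the decision.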
-
Patent number: 7475013Abstract: A system and method for voice recognition is disclosed. The system enrolls speakers using enrollment voice samples and identification information. An extraction module characterizes enrollment voice samples with high-dimensional feature vectors or speaker data points. A data structuring module organizes data points into a high-dimensional data structure, such as a kd-tree, in which similarity between data points dictates a distance, such as a Euclidean distance, a Minkowski distance, or a Manhattan distance. The system recognizes a speaker using an unidentified voice sample. A data querying module searches the data structure to generate a subset of approximate nearest neighbors based on an extracted high-dimensional feature vector. A data modeling module uses Parzen windows to estimate a probability density function representing how closely characteristics of the unidentified speaker match enrolled speakers, in real-time, without extensive training data or parametric assumptions about data distribution.Type: GrantFiled: March 26, 2004Date of Patent: January 6, 2009Assignee: Honda Motor Co., Ltd.Inventor: Ryan Rifkin
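The Parzen-window step works as follows. A kernel is centered on each retrieved neighbor, and the kernels are summed per speaker. In the sketch below, a brute-force k-nearest-neighbor search stands in for the kd-tree's approximate search, and the Gaussian kernel's normalizing constant is omitted because only relative scores matter. The function name and the `k` and `h` parameters are hypothetical.

```python
import math

def parzen_scores(query, enrolled, k=5, h=1.0):
    """Score enrolled speakers for an unidentified feature vector.

    enrolled: list of (speaker_id, feature_vector) pairs.
    Brute-force k-NN replaces the kd-tree search (an assumption); a
    Gaussian window of width h, unnormalized, weights each neighbor.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbors = sorted(enrolled, key=lambda item: sq_dist(query, item[1]))[:k]
    scores = {}
    for speaker, vec in neighbors:
        kernel = math.exp(-sq_dist(query, vec) / (2 * h * h))
        scores[speaker] = scores.get(speaker, 0.0) + kernel / k
    return scores
```

Because no model parameters are fitted, enrolling a speaker only adds points. That matches the abstract's claim of needing no extensive training data.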
-
Patent number: 7474790Abstract: A method and apparatus for the detection of local image structures represented as clusters in a joint spatial-range domain, where the method comprises receiving an input image having one or more clusters in the joint spatial-range domain, each of the one or more clusters having a corresponding mode; receiving a set of analysis matrices and selecting each one of the analysis matrices in turn; using the selected analysis matrix to partition the input image into the one or more clusters and their corresponding modes, and computing a mean, μ, and a local covariance matrix Σ for each of the corresponding modes of each of the one or more clusters; and selecting at least one of the one or more clusters, where each selected cluster has a stable mean and stable covariance matrix across the set of analysis matrices, whereby each of the selected clusters is indicative of a local image structure.Type: GrantFiled: September 29, 2004Date of Patent: January 6, 2009Assignee: Siemens Medical Solutions USA, Inc.Inventors: Navneet Dalal, Dorin Comaniciu
-
Patent number: 7472062Abstract: Methods and arrangements for facilitating data clustering. From a set of input data, a predetermined number of non-overlapping subsets are created. The input data is split recursively to create the subsets.Type: GrantFiled: January 4, 2002Date of Patent: December 30, 2008Assignee: International Business Machines CorporationInventors: Upendra V. Chaudhari, Jiri Navratil, Ganesh N. Ramaswamy
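The abstract says only that the data is split recursively into a predetermined number of non-overlapping subsets. It does not give a split rule. The sketch below assumes a simple one: repeatedly bisect the largest current subset at the median of its widest dimension. The function name is hypothetical.

```python
def split_into_subsets(points, num_subsets):
    """Recursively bisect data into a fixed number of non-overlapping subsets.

    Split rule (an assumption): take the largest subset and cut it at the
    median of the dimension with the greatest range.
    """
    subsets = [list(points)]
    while len(subsets) < num_subsets:
        subsets.sort(key=len)
        largest = subsets.pop()
        if len(largest) < 2:
            subsets.append(largest)
            break  # nothing left that can be split
        dims = range(len(largest[0]))
        axis = max(dims, key=lambda d: max(p[d] for p in largest) - min(p[d] for p in largest))
        largest.sort(key=lambda p: p[axis])
        mid = len(largest) // 2
        subsets.extend([largest[:mid], largest[mid:]])
    return subsets
```

Every point lands in exactly one subset, and splitting the largest subset first keeps subset sizes balanced.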
-
Publication number: 20080319746Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing keywords contained in each of the documents input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using the number of documents, the variance of dates contained in the documents, or the C-value of the keywords contained in the documents as a determination criterion. A keyword presentation device presents the characteristic keyword group of each subtopic cluster, arranged on the basis of the date information.Type: ApplicationFiled: March 25, 2008Publication date: December 25, 2008Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
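The next sketch covers two of the simpler steps. The first function extracts a cluster's characteristic keywords in descending order of frequency. The second is a structurization test that uses two of the listed criteria, document count and date variance. Documents are assumed to be pre-tokenized and dates to be day numbers. Both thresholds are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def characteristic_keywords(cluster_docs, n):
    """Return the n most frequent keywords of a topic cluster, most frequent first."""
    counts = Counter(word for doc in cluster_docs for word in doc)
    return [word for word, _ in counts.most_common(n)]

def can_structurize(dates, min_docs=10, min_date_variance=4.0):
    """Hypothetical criterion: enough documents, spread over enough days."""
    return len(dates) >= min_docs and pvariance(dates) >= min_date_variance
```

The C-value criterion, which scores multi-word terms, is left out of this sketch.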
-
Publication number: 20080319747Abstract: The method of operating a man-machine interface unit includes classifying at least one utterance of a speaker to be of a first type or of a second type. If the utterance is classified to be of the first type, the utterance belongs to a known speaker of a speaker data base, and if the utterance is classified to be of the second type, the utterance belongs to an unknown speaker that is not included in the speaker data base. The method also includes storing a set of utterances of the second type, clustering the set of utterances into clusters, wherein each cluster comprises utterances having similar features, and automatically adding a new speaker to the speaker data base based on utterances of one of the clusters.Type: ApplicationFiled: August 20, 2008Publication date: December 25, 2008Applicant: Sony Deutschland GmbHInventors: Ralf Kompe, Thomas Kemp
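The clustering and enrollment steps can be sketched with simple leader clustering. Each stored unknown-speaker utterance, summarized as one feature vector, joins the first cluster whose leader lies within a fixed radius. Any cluster that reaches a minimum size becomes a new speaker entry holding its centroid. The radius, the minimum size, and the generated speaker names are all hypothetical.

```python
import math

def enroll_new_speakers(unknown, speaker_db, radius=1.0, min_size=3):
    """Cluster unknown-speaker utterances and enroll sufficiently large clusters.

    unknown: one feature vector per utterance of the second (unknown) type.
    speaker_db: dict name -> reference vector, updated in place.
    Leader clustering with a fixed radius is an assumption for illustration.
    """
    clusters = []  # each entry lists member vectors; the first member is the leader
    for vec in unknown:
        for members in clusters:
            if math.dist(vec, members[0]) <= radius:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    for members in clusters:
        if len(members) >= min_size:
            centroid = [sum(col) / len(members) for col in zip(*members)]
            speaker_db[f"speaker_{len(speaker_db) + 1}"] = centroid
    return speaker_db
```

A single stray utterance stays unenrolled until enough similar utterances accumulate.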
-
Patent number: 7454341Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.Type: GrantFiled: September 30, 2000Date of Patent: November 18, 2008Assignee: Intel CorporationInventors: Jielin Pan, Baosheng Yuan
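A K-means loop that merges and splits clusters by size and average distortion might look like the following. This is a generic sketch, not Intel's exact procedure. Here, undersized clusters are merged away by dropping their centroids. The worst-distortion cluster is split by nudging its centroid in two opposite directions. Both thresholds and the nudge size are hypothetical.

```python
import math
import random

def modified_kmeans(vectors, k, iters=10, min_size=2, split_distortion=1.0, seed=0):
    """K-means whose cluster count can change between iterations (a sketch).

    After each update, clusters with fewer than min_size members are merged
    away (their centroid is dropped, so members join the nearest remaining
    cluster next pass), and the cluster with the highest average distortion
    is split in two if that distortion exceeds split_distortion.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(v, centroids[j]))
            groups[nearest].append(v)
        # Update step, keeping only clusters that meet the size floor.
        kept = [g for g in groups if len(g) >= min_size]
        if not kept:
            break
        centroids = [[sum(col) / len(g) for col in zip(*g)] for g in kept]
        distortions = [sum(math.dist(v, c) ** 2 for v in g) / len(g)
                       for g, c in zip(kept, centroids)]
        # Split the worst cluster by nudging its centroid in two directions.
        worst = max(range(len(kept)), key=lambda j: distortions[j])
        if distortions[worst] > split_distortion:
            c = centroids.pop(worst)
            centroids.append([x + 1e-3 for x in c])
            centroids.append([x - 1e-3 for x in c])
    return centroids
```

Starting from `k=1` on two well-separated groups, the first pass splits the single cluster, and later passes settle on one centroid per group.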
-
Patent number: 7454337Abstract: The present invention is a method of modeling a single class of data from a collection containing multiple classes of the same type of data. First, a collection of data is received that includes data from multiple classes of the same type, where the amount of data in the single class exceeds that of any other class. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model, of the same type as the first statistical model, is generated for the combined subsets and used as the model of the single class of data.Type: GrantFiled: May 13, 2004Date of Patent: November 18, 2008Assignee: The United States of America as represented by the Director, National Security Agency, TheInventors: David C. Smith, Daniel J. Richman
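The fit, score, select, and refit loop can be sketched with a one-dimensional Gaussian standing in for the statistical model. Because the target class is the majority, subsets drawn mostly from it score higher under the first model. The sketch assumes that scores are average log-likelihoods and that a fixed top fraction of subsets is kept. The data is split into equal contiguous subsets, and any remainder is dropped. All names and the `keep_fraction` parameter are hypothetical.

```python
import math
from statistics import fmean, pvariance

def fit_gaussian(data):
    """Return (mean, variance) of a 1-D Gaussian fitted to data."""
    return fmean(data), pvariance(data)

def avg_log_likelihood(data, model):
    """Average Gaussian log-likelihood of data under model = (mean, var)."""
    mean, var = model
    return fmean(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var) for x in data)

def model_dominant_class(data, num_subsets, keep_fraction=0.5):
    """Fit, score equal-sized subsets, keep the best-scoring ones, and refit."""
    first = fit_gaussian(data)
    size = len(data) // num_subsets  # any remainder is dropped in this sketch
    subsets = [data[i * size:(i + 1) * size] for i in range(num_subsets)]
    ranked = sorted(subsets, key=lambda s: avg_log_likelihood(s, first), reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    combined = [x for s in ranked[:keep] for x in s]
    return fit_gaussian(combined)
```

In the usage below, the minority subset near 10 scores lowest under the first model, so the second model describes only the data near 0.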
-
Patent number: 7428541Abstract: A computer system for generating data structures for information retrieval of documents stored in a database. The computer system includes a neighborhood patch generation subsystem for defining patches of nodes having predetermined similarities in a hierarchy structure. The neighborhood patch generation subsystem includes a hierarchy generation subsystem for generating a hierarchy structure based on the document-keyword vectors, and a patch definition subsystem. The computer system also comprises a cluster estimation subsystem for generating cluster data of the document-keyword vectors using the similarities of patches.Type: GrantFiled: December 15, 2003Date of Patent: September 23, 2008Assignee: International Business Machines CorporationInventor: Michael Edward Houle