Clustering Patents (Class 704/245)
  • Patent number: 8249871
    Abstract: A clustering tool to generate word clusters. In embodiments described, the clustering tool includes a clustering component that generates word clusters for words or word combinations in input data. In illustrated embodiments, the word clusters are used to modify or update a grammar for a closed vocabulary speech recognition application.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: August 21, 2012
    Assignee: Microsoft Corporation
    Inventor: Kunal Mukerjee
  • Patent number: 8244530
    Abstract: A set of documents may be stored and indexed as a compressed sequence of tokens. The documents are grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within those clusters, the documents that actually satisfy the query.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: August 14, 2012
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
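The two-stage query over cluster-compressed token sequences described in the abstract above can be illustrated with a minimal Python sketch. The data layout, the set-based elision of repeated tokens, and all identifiers below are illustrative assumptions, not the patented encoding.

```python
from collections import defaultdict

def build_cluster_index(clusters):
    """For each cluster, record which tokens appear anywhere in it (an elided,
    per-cluster token set).  `clusters` maps cluster_id -> {doc_id: [tokens]}."""
    cluster_tokens = {}
    for cid, docs in clusters.items():
        seen = set()
        for tokens in docs.values():
            seen.update(tokens)          # repeated tokens collapse to one entry
        cluster_tokens[cid] = seen
    return cluster_tokens

def query(clusters, cluster_tokens, terms):
    """Stage 1: keep only clusters whose elided token set could satisfy the
    query.  Stage 2: within those clusters, test each document exactly."""
    terms = set(terms)
    hits = []
    for cid, token_set in cluster_tokens.items():
        if terms <= token_set:                      # cluster is a candidate
            for doc_id, tokens in clusters[cid].items():
                if terms <= set(tokens):            # document actually matches
                    hits.append(doc_id)
    return hits

clusters = {
    0: {"d1": ["fast", "index", "index"], "d2": ["fast", "query"]},
    1: {"d3": ["compress", "token"]},
}
print(query(clusters, build_cluster_index(clusters), ["fast", "query"]))  # ['d2']
```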
  • Patent number: 8244531
    Abstract: A method is disclosed that enables the handling of audio streams for segments in the audio that might contain private information, in a way that is more straightforward than in some techniques in the prior art. The data-processing system of the illustrative embodiment receives a media stream that comprises an audio stream, possibly in addition to other types of media such as video. The audio stream comprises audio content, some of which can be private in nature. Once it receives the data, the data-processing system then analyzes the audio stream for private audio content by using one or more techniques that involve looking for private information as well as non-private information. As a result of the analysis, the data-processing system omits the private audio content from the resulting stream that contains the processed audio.
    Type: Grant
    Filed: September 28, 2008
    Date of Patent: August 14, 2012
    Assignee: Avaya Inc.
    Inventors: George William Erhart, Valentine C. Matula, David Joseph Skiba, Lawrence O'Gorman
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described that takes advantage of gender information in training data and shows how to obtain female, male, and gender-independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
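A minimal NumPy sketch of averaging male and female Gaussian Mixture Models with a probability value rather than hard-switching on a gender decision, as the abstract above describes. The diagonal-covariance form, the blending rule, and the toy parameters are assumptions for illustration only.

```python
import numpy as np

def diag_gmm_likelihood(x, weights, means, variances):
    """Likelihood of frame x under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    diff2 = (x - means) ** 2                     # (n_components, dim)
    log_norm = -0.5 * (np.log(2 * np.pi * variances) + diff2 / variances)
    comp = weights * np.exp(log_norm.sum(axis=1))
    return comp.sum()

def blended_likelihood(x, female_gmm, male_gmm, p_female):
    """Average the female and male models with a probability value instead of
    hard-switching between them."""
    return (p_female * diag_gmm_likelihood(x, *female_gmm)
            + (1.0 - p_female) * diag_gmm_likelihood(x, *male_gmm))

# Toy 2-component, 3-dimensional models; p_female could come from a gender classifier.
female = (np.array([0.6, 0.4]), np.random.randn(2, 3), np.ones((2, 3)))
male   = (np.array([0.5, 0.5]), np.random.randn(2, 3), np.ones((2, 3)))
print(blended_likelihood(np.zeros(3), female, male, p_female=0.7))
```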
  • Patent number: 8219386
    Abstract: The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, plural Awzan) of the Arabic poem units (Bayt, plural Abyate) are identified. A spoken or written poem is accepted as input, and a coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper and improper spoken poetic meter. Errors in the poem meters (Bahr, plural Buhur) and the ending rhyme pattern ("Qafiya") are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in composing poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be implemented in a stand-alone device or integrated with other computing devices.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: July 10, 2012
    Assignee: King Fahd University of Petroleum and Minerals
    Inventors: Al-Zahrani Abdul Kareem Saleh, Moustafa Elshafei
  • Patent number: 8185395
    Abstract: An information transmission device which analyzes a diction of a speaker and provides an utterance in accordance with the diction of the speaker, and which has a microphone detecting a sound signal of the speaker, a feature extraction unit extracting at least one feature value of the diction of the speaker based on the sound signal detected by the microphone, a voice synthesis unit synthesizing a voice signal to be uttered so that the voice signal has the same feature value as the diction of the speaker, based on the feature value extracted by the feature extraction unit, and a voice output unit performing an utterance based on the voice signal synthesized by the voice synthesis unit.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: May 22, 2012
    Assignee: Honda Motor Co., Ltd.
    Inventors: Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino
  • Publication number: 20120123780
    Abstract: A video summary method comprises dividing a video into a plurality of video shots, analyzing each frame in a video shot from the plurality of video shots, determining a saliency of each frame of the video shot, determining a key frame of the video shot based on the saliency of each frame of the video shot, extracting visual features from the key frame and performing shot clustering of the plurality of video shots to determine concept patterns based on the visual features. The method further comprises fusing different concept patterns using a saliency tuning method and generating a summary of the video based upon a global optimization method.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 17, 2012
    Applicant: FutureWei Technologies, Inc.
    Inventors: Jizhou Gao, Yu Huang, Hong Heather Yu
  • Patent number: 8180627
    Abstract: The invention relates to an apparatus for clustering process models, each consisting of model elements comprising a text phrase which describes a process activity in a natural language according to a process modeling language grammar and a natural language grammar. The apparatus comprises a process object ontology memory for storing a process object ontology, a distance calculation unit for calculating a distance matrix employing said process modeling language grammar and said natural language grammar, wherein said distance matrix consists of distances each indicating a dissimilarity of a pair of said process models, and a clustering unit which partitions said process models into a set of clusters based on said calculated distance matrix.
    Type: Grant
    Filed: July 2, 2008
    Date of Patent: May 15, 2012
    Assignee: Siemens Aktiengesellschaft
    Inventors: Andreas Bögl, Mathias Goller, Alexandra Grömer, Gustav Pomberger, Norbert Weber
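Assuming the patented distance computation has already produced a dissimilarity matrix over process models, the final partitioning step could look like the sketch below, which uses off-the-shelf agglomerative clustering from SciPy; the matrix values and the choice of linkage are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Stand-in for the patented distance computation: each entry is a dissimilarity
# between a pair of process models, however it was derived.
dist_matrix = np.array([
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.85, 0.75],
    [0.9, 0.85, 0.0, 0.1],
    [0.8, 0.75, 0.1, 0.0],
])

condensed = squareform(dist_matrix)          # SciPy expects the condensed form
tree = linkage(condensed, method="average")  # agglomerative clustering
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)                                # e.g. [1 1 2 2]: two clusters of process models
```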
  • Patent number: 8175875
    Abstract: A set of documents may be stored and indexed as a compressed sequence of tokens. The documents are grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within those clusters, the documents that actually satisfy the query.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: May 8, 2012
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
  • Patent number: 8171027
    Abstract: A computing device-implemented method includes receiving an additive tree; assigning data associated with the additive tree to one or more initial clusters; partitioning the additive tree into one or more pairs of additive sub-trees corresponding to one or more binary segmentations; computing a set that includes partitions resulting from a combination of the one or more initial clusters and the one or more pairs of additive sub-trees; evaluating one or more partitions of the set with one or more cluster validation criteria; storing one or more evaluation results for the one or more partitions; selecting at least one partition from the one or more partitions of the set that satisfies the one or more cluster validation criteria, where the at least one partition is associated with an optimal evaluation result; and removing at least one of the binary segmentations that corresponds to the at least one partition.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: May 1, 2012
    Assignee: The Mathworks, Inc.
    Inventor: Lucio Andrade-Cetto
  • Patent number: 8160875
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: August 26, 2010
    Date of Patent: April 17, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
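A rough Python sketch of codebook selection by minimal acoustic distance followed by a normalization step keyed to the selected codebook's vocal tract length, as in the abstract above. The distance measure, the simple feature scaling used as a stand-in for vocal-tract-length normalization, and all parameters are assumptions.

```python
import numpy as np

def select_codebook(sample_frames, codebooks):
    """Pick the codebook with minimal average acoustic distance to the sample.
    `codebooks` maps speaker -> dict(vtl=float, vectors=(K, dim) array)."""
    best_speaker, best_dist = None, np.inf
    for speaker, cb in codebooks.items():
        # distance of each frame to its nearest codebook vector, then averaged
        d = np.linalg.norm(sample_frames[:, None, :] - cb["vectors"][None, :, :], axis=2)
        avg = d.min(axis=1).mean()
        if avg < best_dist:
            best_speaker, best_dist = speaker, avg
    return best_speaker

def normalize_by_vtl(sample_frames, vtl, reference_vtl=17.0):
    """Crude stand-in for vocal-tract-length normalization: scale features by
    the ratio of the selected codebook's VTL to a reference length (cm)."""
    return sample_frames * (reference_vtl / vtl)

codebooks = {
    "spk_a": {"vtl": 16.5, "vectors": np.random.randn(8, 13)},
    "spk_b": {"vtl": 18.0, "vectors": np.random.randn(8, 13)},
}
frames = np.random.randn(40, 13)
spk = select_codebook(frames, codebooks)
normalized = normalize_by_vtl(frames, codebooks[spk]["vtl"])
print(spk, normalized.shape)
```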
  • Patent number: 8145486
    Abstract: Acoustic models to provide features to a speech signal are created based on speech features included in regions where similarities of acoustic models created based on speech features in a certain time length are equal to or greater than a predetermined value. Feature vectors acquired by using the acoustic models of the regions and the speech features to provide features to speech signals of second segments are grouped by speaker.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: March 27, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Makoto Hirohata
  • Patent number: 8140331
    Abstract: Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
    Type: Grant
    Filed: July 4, 2008
    Date of Patent: March 20, 2012
    Inventor: Xia Lou
  • Patent number: 8140333
    Abstract: A probability density function compensation method used for a continuous hidden Markov model and a speech recognition method and apparatus, the probability density function compensation method including extracting feature vectors from speech signals, and using the extracted feature vectors, training a model having a plurality of probability density functions to increase probabilities of recognizing the speech signals; obtaining a global variance by averaging variances of the plurality of the probability density functions after completing the training; obtaining a compensation factor using the global variance; and applying the global variance to each of the probability density functions and compensating each of the probability density functions for the global variance using the compensation factor.
    Type: Grant
    Filed: February 28, 2005
    Date of Patent: March 20, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Icksang Han, Sangbae Jeong, Eugene Jon
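The abstract above does not spell out how the compensation factor is applied, so the following sketch simply interpolates each probability density function's diagonal variance toward the global (averaged) variance; the interpolation rule and the factor alpha are assumptions.

```python
import numpy as np

def compensate_global_variance(variances, alpha=0.5):
    """`variances` is an (n_pdfs, dim) array of per-Gaussian diagonal variances.
    A global variance is obtained by averaging them; each PDF's variance is then
    smoothed toward it with a compensation factor alpha (illustrative choice)."""
    global_var = variances.mean(axis=0, keepdims=True)      # average over all PDFs
    return (1.0 - alpha) * variances + alpha * global_var   # apply the compensation

pdf_vars = np.abs(np.random.randn(10, 13)) + 0.1
smoothed = compensate_global_variance(pdf_vars, alpha=0.3)
print(smoothed.shape)   # (10, 13): every PDF now shares part of the global variance
```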
  • Patent number: 8126162
    Abstract: An audio signal interpolation apparatus is configured to perform interpolation processing on the basis of audio signals preceding and/or following a predetermined segment on a time axis so as to obtain an audio signal corresponding to the predetermined segment. The audio signal interpolation apparatus includes a waveform formation unit configured to form a waveform for the predetermined segment on the basis of time-domain samples of the preceding and/or the following audio signals and a power control unit configured to control power of the waveform for the predetermined segment formed by the waveform formation unit using a non-linear model selected on the basis of the preceding audio signal when the power of the preceding audio signal is larger than that of the following audio signal, or the following audio signal when the power of the preceding audio signal is smaller than that of the following audio signal.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: February 28, 2012
    Assignee: Sony Corporation
    Inventors: Chunmao Zhang, Toru Chinen
  • Patent number: 8112277
    Abstract: A node initializing unit generates a root node including inputted phonemic models. A candidate generating unit generates candidates of a pair of child sets by partitioning a set of phonemic models included in a node having no child node into two. A candidate deleting unit deletes candidates each including only phonemic models attached with determination information indicating that at least one of the child sets has a small amount of speech data for training. A similarity calculating unit calculates a sum of similarities among the phonemic models included in the child sets. A candidate selecting unit selects one of the candidates having a largest sum. A node generating unit generates two nodes including the two child sets included in the selected candidate, respectively. A clustering unit clusters the phonemic models in units of phonemic model sets each included in a node.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: February 7, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masaru Sakai
  • Patent number: 8078463
    Abstract: A method and apparatus for spotting a target speaker within a call interaction by generating speaker models based on one or more speakers' speech, and by searching for speaker models associated with one or more target speaker speech files.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: December 13, 2011
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblat, Yaniv Zigel, Oren Pereg
  • Patent number: 8069043
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: November 29, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
  • Patent number: 8065145
    Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing keywords contained in each of the documents input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using the number of documents, the variance of dates contained in the documents, or the C-value of keywords contained in the documents as a determination criterion. A keyword presentation device then presents the characteristic keyword group in the subtopic cluster, arranging the keyword group on the basis of the date information.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: November 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
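The keyword extraction step in the abstract above, taking the top keywords of a topic cluster in descending order of appearance frequency, is easy to sketch; the tokenized input format and the stopword handling are illustrative assumptions.

```python
from collections import Counter

def characteristic_keywords(cluster_docs, top_n=10, stopwords=frozenset()):
    """Return the top-N keywords of a topic cluster in descending order of
    appearance frequency.  `cluster_docs` is a list of token lists, one per document."""
    counts = Counter()
    for tokens in cluster_docs:
        counts.update(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]

docs = [["earthquake", "tokyo", "magnitude"],
        ["tokyo", "earthquake", "aftershock"],
        ["magnitude", "earthquake"]]
print(characteristic_keywords(docs, top_n=3))   # ['earthquake', 'tokyo', 'magnitude']
```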
  • Patent number: 8055503
    Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 8, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Robert Scarano, Lawrence Mark
  • Patent number: 8055062
    Abstract: Disclosed herein is an information processing apparatus configured to classify time-series input data into N classes, including, a time-series feature quantity extracting section, N calculating sections, and a determination section.
    Type: Grant
    Filed: November 6, 2008
    Date of Patent: November 8, 2011
    Assignee: Sony Corporation
    Inventor: Yoko Komori
  • Patent number: 8040261
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: October 18, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Patent number: 8036884
    Abstract: The present invention provides a method, a computer software product, and an apparatus for enabling a determination of speech-related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: October 11, 2011
    Assignee: Sony Deutschland GmbH
    Inventors: Yin Hay Lam, Josep Maria Sola I Caros
  • Patent number: 8010357
    Abstract: Combined active and semi-supervised learning reduces the amount of manual labeling needed when training a spoken language understanding classifier. The classifier may be trained with human-labeled utterance data. Some of a group of unselected utterance data may be selected for manual labeling via active learning. The classifier may be changed, via semi-supervised learning, based on the selected ones of the unselected utterance data.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
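One round of the combined active and semi-supervised scheme described above might look like the sketch below: the least-confident utterances go to a human for labeling, while highly confident ones are folded back in with predicted labels. The classifier (scikit-learn logistic regression), the thresholds, and the random toy data are assumptions, not the patented method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_semi_supervised_round(X_lab, y_lab, X_unlab, ask_k=5, conf_thresh=0.9):
    """Train on human-labeled data, send the least-confident unlabeled utterances
    for manual labeling (active learning), and return confident predictions as
    pseudo-labels (semi-supervised learning)."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    probs = clf.predict_proba(X_unlab)
    confidence = probs.max(axis=1)

    to_label = np.argsort(confidence)[:ask_k]           # ask a human about these
    confident = np.where(confidence >= conf_thresh)[0]  # trust the machine on these
    pseudo_labels = clf.classes_[probs[confident].argmax(axis=1)]
    return to_label, confident, pseudo_labels

X_lab = np.random.randn(50, 8); y_lab = np.random.randint(0, 3, 50)
X_unlab = np.random.randn(200, 8)
ask, auto, auto_y = active_semi_supervised_round(X_lab, y_lab, X_unlab)
print(len(ask), len(auto))
```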
  • Patent number: 8005680
    Abstract: Method for building a multimodal business channel between users, service providers and network operators. The service provided to the users is personalized with a user's profile derived from language and speech models delivered by a speech recognition system. The language and speech models are synchronized with user dependent language models stored in a central platform made accessible to various value added service providers. They may also be copied into various devices of the user. Natural language processing algorithms may be used for extracting topics from user's dialogues.
    Type: Grant
    Filed: November 21, 2006
    Date of Patent: August 23, 2011
    Assignee: Swisscom AG
    Inventor: Robert Van Kommer
  • Patent number: 7987091
    Abstract: A robot can make a dialog customized for the user by first storing various pieces of information appendant to an object as values of the corresponding items of the object. A topic that is related to the topic used in the immediately preceding conversation is then selected. Then, an acquisition conversation for acquiring the value of the item of the selected topic or a utilization conversation for utilizing the value of the item of the topic that is already stored is generated as the next conversation. The value acquired by the acquisition conversation is stored as the value of the corresponding item.
    Type: Grant
    Filed: December 2, 2003
    Date of Patent: July 26, 2011
    Assignee: Sony Corporation
    Inventors: Kazumi Aoyama, Yukiko Yoshiike, Shinya Ohtani, Rika Horinaka, Hideki Shimomura
  • Publication number: 20110161081
    Abstract: Methods, computer program products, and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in that cluster.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 30, 2011
    Applicant: GOOGLE INC.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
  • Patent number: 7969329
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: June 28, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov, Sergey V. Kolomiets
  • Patent number: 7966174
    Abstract: A system for recognizing patterns is disclosed. Grammar learning from a corpus includes generating a frequency vector for each non-context token in the corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation or a cluster tree among the non-context tokens. The cluster tree is used for pattern recognition.
    Type: Grant
    Filed: February 14, 2008
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Giuseppe Riccardi
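A small sketch of building frequency vectors for non-context tokens from their co-occurrence with identified context tokens, which could then feed the clustering step described above; the window-based notion of "predetermined relationship" and the toy corpus are assumptions.

```python
import numpy as np

def frequency_vectors(sentences, context_words, window=2):
    """For every non-context token, count how often each context word occurs
    within `window` positions of it; the counts form the token's vector."""
    index = {w: k for k, w in enumerate(context_words)}
    vecs = {}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in index:
                continue
            v = vecs.setdefault(tok, np.zeros(len(context_words)))
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    v[index[tokens[j]]] += 1
    return vecs

sentences = [["book", "a", "flight", "to", "boston"],
             ["book", "a", "hotel", "in", "boston"]]
vecs = frequency_vectors(sentences, context_words=["a", "to", "in"])
print({w: v.tolist() for w, v in vecs.items()})
# Tokens used in similar contexts acquire similar vectors and can then be clustered.
```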
  • Patent number: 7957968
    Abstract: The invention includes a computer-based system or method for automatically generating a grammar associated with a first task, comprising the steps of: receiving first data representing the first task based on responses received from a distributed network; automatically tagging the first data into parts of speech to form first tagged data; identifying filler words and core words from said first tagged data; modeling sentence structure based upon said first tagged data using a first set of rules; identifying synonyms of said core words; and creating the grammar for the first task using said modeled sentence structure, first tagged data, and said synonyms.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: June 7, 2011
    Assignee: Honda Motor Co., Ltd.
    Inventors: Rakesh Gupta, Ken Hennacy
  • Patent number: 7952497
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.
    Type: Grant
    Filed: May 6, 2009
    Date of Patent: May 31, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Patent number: 7949525
    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: May 24, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
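The sorting-and-rechecking idea above can be sketched as ranking labeled utterances by how strongly the trained classifier's confident prediction disagrees with the human label; the scoring rule and the toy classifier are illustrative assumptions.

```python
def rank_for_recheck(utterances, labels, classify):
    """`classify(u)` returns (predicted_type, confidence) from the previously
    trained classifier.  Utterances whose confident prediction disagrees with
    the human label float to the top of the recheck list."""
    scored = []
    for utt, label in zip(utterances, labels):
        predicted, confidence = classify(utt)
        disagreement = confidence if predicted != label else 1.0 - confidence
        scored.append((disagreement, utt, label, predicted))
    return sorted(scored, reverse=True)          # most suspicious labels first

# Toy classifier: anything mentioning "bill" is Billing with high confidence.
classify = lambda u: ("Billing", 0.95) if "bill" in u else ("Other", 0.6)
utts = ["my bill is wrong", "talk to an agent"]
labels = ["TechSupport", "Other"]
for row in rank_for_recheck(utts, labels, classify):
    print(row)
```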
  • Patent number: 7937269
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification. The data in contiguous segments of a continuous data stream is also incrementally clustered, in real time, into a plurality of micro-clusters, from which target profiles are constructed that define and model the behavior of the data in individual segments of the data stream.
    Type: Grant
    Filed: August 22, 2005
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
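A minimal sketch of incrementally maintaining micro-clusters over a stream, in the spirit of the abstract above: each micro-cluster keeps additive statistics so it can absorb new points cheaply. The absorb-or-create threshold rule and all constants are assumptions.

```python
import numpy as np

class MicroCluster:
    """Additive summary of the points absorbed so far: count, linear sum and
    sum of squares, which is enough to recover the centroid and a radius."""
    def __init__(self, point):
        self.n, self.ls, self.ss = 1, point.copy(), point ** 2
    def centroid(self):
        return self.ls / self.n
    def radius(self):
        var = self.ss / self.n - self.centroid() ** 2
        return float(np.sqrt(np.maximum(var, 0).sum()))
    def absorb(self, point):
        self.n += 1; self.ls += point; self.ss += point ** 2

def update_stream(clusters, point, max_radius=1.5):
    """Assign the point to the nearest micro-cluster if it stays compact,
    otherwise start a new micro-cluster (illustrative threshold rule)."""
    if clusters:
        dists = [np.linalg.norm(point - c.centroid()) for c in clusters]
        i = int(np.argmin(dists))
        if dists[i] <= max_radius:
            clusters[i].absorb(point)
            return clusters
    clusters.append(MicroCluster(point))
    return clusters

clusters = []
for p in np.random.randn(100, 2):
    update_stream(clusters, p)
print(len(clusters), "micro-clusters")
```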
  • Patent number: 7933774
    Abstract: A system and method is provided for rapidly generating a new spoken dialog application. In one embodiment, a user experience person labels the transcribed data (e.g., 3000 utterances) using a set of interactive tools. The labeled data is then stored in a processed data database. During the labeling process, the user experience person not only groups utterances into various call type categories, but also flags specific utterances (e.g., 100-200) as positive and negative examples for use in an annotation guide. The labeled data in the processed data database can also be used to generate an initial natural language understanding (NLU) model.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Mazin G. Rahim, Allen Louis Gorin, Behzad Shahraray, David Crawford Gibbon, Zhu Liu, Bernard S. Renger, Patrick Guy Haffner, Harris Drucker, Steven Hart Lewis
  • Patent number: 7930172
    Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: April 19, 2011
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
  • Patent number: 7930179
    Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined and remodeled and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
    Type: Grant
    Filed: October 2, 2007
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
  • Patent number: 7912714
    Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters with
    Type: Grant
    Filed: April 1, 2008
    Date of Patent: March 22, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Krishna Kummamuru, Deepak S. Padmanaban, Shourya Roy, L. Venkata Subramaniam
  • Patent number: 7890327
    Abstract: Disclosed is a general framework for extracting semantics from composite media content at various resolutions. Specifically, given a media stream, which may consist of various types of media modalities including audio, visual, text and graphics information, the disclosed framework describes how various types of semantics could be extracted at different levels by exploiting and integrating different media features. The output of this framework is a series of tagged (or annotated) media segments at different scales. Specifically, at the lowest resolution, the media segments are characterized in a more general and broader sense, thus they are identified at a larger scale; while at the highest resolution, the media content is more specifically analyzed, inspected and identified, which thus results in small-scaled media segments.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Chitra Dorai, Ying Li
  • Patent number: 7881931
    Abstract: Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
    Type: Grant
    Filed: February 4, 2008
    Date of Patent: February 1, 2011
    Assignee: Gracenote, Inc.
    Inventors: Maxwell Wells, Vidya Venkatachalam, Luca Cazzanti, Kwan Fai Cheung, Navdeep Dhillon, Somsak Sukittanon
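Matching a compact fingerprint vector against a database of reference vectors, as in the abstract above, might be sketched as a simple nearest-neighbour lookup with a rejection threshold; the vector size, distance measure, and threshold below are assumptions rather than Gracenote's actual scheme.

```python
import numpy as np

def match_fingerprint(query_vec, database, threshold=0.15):
    """Compare a compact feature vector against a database of reference vectors
    and return the best match only if it is close enough (illustrative rule)."""
    names = list(database)
    refs = np.stack([database[n] for n in names])
    dists = np.linalg.norm(refs - query_vec, axis=1) / np.sqrt(query_vec.size)
    best = int(np.argmin(dists))
    return (names[best], float(dists[best])) if dists[best] <= threshold else (None, None)

rng = np.random.default_rng(0)
database = {f"track_{i}": rng.random(24).astype(np.float32) for i in range(1000)}
# A compressed copy yields a slightly perturbed vector of the same track.
query = database["track_42"] + rng.normal(0, 0.02, 24).astype(np.float32)
print(match_fingerprint(query, database))
```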
  • Patent number: 7853449
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: December 14, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Patent number: 7844457
    Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: November 30, 2010
    Assignee: Microsoft Corporation
    Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
  • Patent number: 7822604
    Abstract: One embodiment of the present method and apparatus for identifying a conversing pair of users of a two-way speech medium includes receiving a plurality of binary voice activity streams, where the plurality of voice activity streams includes a first voice activity stream associated with a first user, and pairing the first voice activity stream with a second voice activity stream associated with a second user, in accordance with a complementary similarity between the first voice activity stream and the second voice activity stream.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: October 26, 2010
    Assignee: International Business Machines Corporation
    Inventors: Lisa Amini, Eric Bouillet, Olivier Verscheure, Michail Vlachos
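One plausible reading of "complementary similarity" between binary voice activity streams is that conversing parties tend to alternate rather than overlap; the sketch below scores pairs accordingly and picks the best pair. The scoring formula is an assumption, not the patented measure.

```python
import numpy as np
from itertools import combinations

def complementary_similarity(a, b):
    """Score how well two binary voice-activity streams interleave: frames where
    exactly one party speaks count for, frames where both speak count against."""
    a, b = np.asarray(a), np.asarray(b)
    exactly_one = np.logical_xor(a, b).mean()
    both_active = np.logical_and(a, b).mean()
    return exactly_one - both_active

def pair_streams(streams):
    """Pair the two streams with the highest complementary similarity."""
    scores = {(i, j): complementary_similarity(streams[i], streams[j])
              for i, j in combinations(range(len(streams)), 2)}
    return max(scores, key=scores.get)

alice = [1, 1, 0, 0, 1, 0, 0, 1]
bob   = [0, 0, 1, 1, 0, 1, 1, 0]   # speaks when alice is silent
carol = [1, 0, 1, 0, 1, 0, 1, 0]
print(pair_streams([alice, bob, carol]))   # (0, 1): alice and bob converse
```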
  • Patent number: 7822614
    Abstract: A language analyzer performs speech recognition on a speech input by a speech input unit, specifies a possible word which is represented by the speech, and the score thereof, and supplies word data representing them to an agent processing unit. The agent processing unit stores process item data which defines a data acquisition process to acquire word data or the like, a discrimination process, and an input/output process, and wires or data defining transition from one process to another and giving a weighting factor to the transition, and executes a flow represented generally by the process item data and the wires to thereby control devices belonging to an input/output target device group. To which process in the flow the transition takes place is determined by the weighting factor of each wire, which is determined by the connection relationship between a point where the process has proceeded and the wire, and the score of word data.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: October 26, 2010
    Assignee: Kabushikikaisha Kenwood
    Inventor: Rika Koyama
  • Patent number: 7813926
    Abstract: A training system for a speech recognition application is disclosed. In embodiments described, the training system is used to train a classification model or language model. The classification model is trained using an adaptive language model generated by an iterative training process. In embodiments described, the training data is recognized by the speech recognition component and the recognized text is used to create the adaptive language model which is used for speech recognition in a following training iteration.
    Type: Grant
    Filed: March 16, 2006
    Date of Patent: October 12, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, John Sie Yuen Lee, Alex Acero
  • Patent number: 7805300
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains are examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
    Type: Grant
    Filed: March 21, 2005
    Date of Patent: September 28, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Patent number: 7805301
    Abstract: A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: September 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
  • Publication number: 20100241430
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Application
    Filed: June 3, 2010
    Publication date: September 23, 2010
    Applicant: AT&T Intellectual Property II, L.P., via transfer from AT&T Corp.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
  • Patent number: 7797158
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: September 14, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Publication number: 20100217593
    Abstract: An information storage medium stores a program for generating Hidden Markov Models to be used for speech recognition with a given speech recognition system. The program causes a computer to function as a scheduled-to-be-used model group storage section that stores a scheduled-to-be-used model group including a plurality of Hidden Markov Models scheduled to be used by the given speech recognition system, and a filler model generation section that generates Hidden Markov Models to be used as filler models by the given speech recognition system based on all or at least a part of the Hidden Markov Model group in the scheduled-to-be-used model group.
    Type: Application
    Filed: February 5, 2010
    Publication date: August 26, 2010
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Paul W. Shields, Matthew E. Dunnachie, Yasutoshi Takizawa
  • Patent number: 7773809
    Abstract: A method and apparatus for generating discriminant functions for distinguishing obscene videos by using visual features of video data, and a method and apparatus for determining whether videos are obscene by using the generated discriminant functions, are provided.
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: August 10, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Min Lee, Taek Yong Nam, Jong Soo Jang, Ho Gyun Lee