Clustering Patents (Class 704/245)
  • Patent number: 8249871
    Abstract: A clustering tool to generate word clusters. In embodiments described, the clustering tool includes a clustering component that generates word clusters for words or word combinations in input data. In illustrated embodiments, the word clusters are used to modify or update a grammar for a closed vocabulary speech recognition application.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: August 21, 2012
    Assignee: Microsoft Corporation
    Inventor: Kunal Mukerjee
  • Patent number: 8244530
    Abstract: A set of documents may be stored and indexed as a compressed sequence of tokens. The documents are grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within those clusters, the documents that actually satisfy the query.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: August 14, 2012
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
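The two-stage query over cluster-compressed token sequences described in the abstract above can be illustrated with a minimal Python sketch. The data layout, the set-based elision of repeated tokens, and all identifiers below are illustrative assumptions, not the patented encoding.

```python
from collections import defaultdict

def build_cluster_index(clusters):
    """For each cluster, record which tokens appear anywhere in it (an elided,
    per-cluster token set).  `clusters` maps cluster_id -> {doc_id: [tokens]}."""
    cluster_tokens = {}
    for cid, docs in clusters.items():
        seen = set()
        for tokens in docs.values():
            seen.update(tokens)          # repeated tokens collapse to one entry
        cluster_tokens[cid] = seen
    return cluster_tokens

def query(clusters, cluster_tokens, terms):
    """Stage 1: keep only clusters whose elided token set could satisfy the
    query.  Stage 2: within those clusters, test each document exactly."""
    terms = set(terms)
    hits = []
    for cid, token_set in cluster_tokens.items():
        if terms <= token_set:                      # cluster is a candidate
            for doc_id, tokens in clusters[cid].items():
                if terms <= set(tokens):            # document actually matches
                    hits.append(doc_id)
    return hits

clusters = {
    0: {"d1": ["fast", "index", "index"], "d2": ["fast", "query"]},
    1: {"d3": ["compress", "token"]},
}
print(query(clusters, build_cluster_index(clusters), ["fast", "query"]))  # ['d2']
```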
  • Patent number: 8244531
    Abstract: A method is disclosed that enables the handling of audio streams for segments in the audio that might contain private information, in a way that is more straightforward than in some techniques in the prior art. The data-processing system of the illustrative embodiment receives a media stream that comprises an audio stream, possibly in addition to other types of media such as video. The audio stream comprises audio content, some of which can be private in nature. Once it receives the data, the data-processing system then analyzes the audio stream for private audio content by using one or more techniques that involve looking for private information as well as non-private information. As a result of the analysis, the data-processing system omits the private audio content from the resulting stream that contains the processed audio.
    Type: Grant
    Filed: September 28, 2008
    Date of Patent: August 14, 2012
    Assignee: Avaya Inc.
    Inventors: George William Erhart, Valentine C. Matula, David Joseph Skiba, Lawrence O'Gorman
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described that takes advantage of gender information in training data and shows how to obtain female, male, and gender-independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
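A minimal NumPy sketch of averaging male and female Gaussian Mixture Models with a probability value rather than hard-switching on a gender decision, as the abstract above describes. The diagonal-covariance form, the blending rule, and the toy parameters are assumptions for illustration only.

```python
import numpy as np

def diag_gmm_likelihood(x, weights, means, variances):
    """Likelihood of frame x under a diagonal-covariance GMM."""
    x = np.asarray(x, dtype=float)
    diff2 = (x - means) ** 2                     # (n_components, dim)
    log_norm = -0.5 * (np.log(2 * np.pi * variances) + diff2 / variances)
    comp = weights * np.exp(log_norm.sum(axis=1))
    return comp.sum()

def blended_likelihood(x, female_gmm, male_gmm, p_female):
    """Average the female and male models with a probability value instead of
    hard-switching between them."""
    return (p_female * diag_gmm_likelihood(x, *female_gmm)
            + (1.0 - p_female) * diag_gmm_likelihood(x, *male_gmm))

# Toy 2-component, 3-dimensional models; p_female could come from a gender classifier.
female = (np.array([0.6, 0.4]), np.random.randn(2, 3), np.ones((2, 3)))
male   = (np.array([0.5, 0.5]), np.random.randn(2, 3), np.ones((2, 3)))
print(blended_likelihood(np.zeros(3), female, male, p_female=0.7))
```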
  • Patent number: 8219386
    Abstract: The Arabic poetry meter identification system and method produces coded Al-Khalyli transcriptions of Arabic poetry. The meters (Wazn, plural Awzan) of the Arabic poem units (Bayt, plural Abyate) are identified. A spoken or written poem is accepted as input, and a coded transcription of the poetry pattern forms is produced from input processing. The system identifies and distinguishes between proper and improper spoken poetic meter. Errors in the poem meters (Bahr, plural Buhur) and the ending rhyme pattern ("Qafiya") are detected and verified. The system accepts user selection of a desired poem meter and then interactively aids the user in composing poetry in the selected meter, suggesting alternative words and word groups that follow the desired poem pattern and dactyl components. The system can be implemented in a stand-alone device or integrated with other computing devices.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: July 10, 2012
    Assignee: King Fahd University of Petroleum and Minerals
    Inventors: Al-Zahrani Abdul Kareem Saleh, Moustafa Elshafei
  • Patent number: 8185395
    Abstract: An information transmission device which analyzes a diction of a speaker and provides an utterance in accordance with the diction of the speaker, and which has a microphone detecting a sound signal of the speaker, a feature extraction unit extracting at least one feature value of the diction of the speaker based on the sound signal detected by the microphone, a voice synthesis unit synthesizing a voice signal to be uttered so that the voice signal has the same feature value as the diction of the speaker, based on the feature value extracted by the feature extraction unit, and a voice output unit performing an utterance based on the voice signal synthesized by the voice synthesis unit.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: May 22, 2012
    Assignee: Honda Motor Co., Ltd.
    Inventors: Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino
  • Publication number: 20120123780
    Abstract: A video summary method comprises dividing a video into a plurality of video shots, analyzing each frame in a video shot from the plurality of video shots, determining a saliency of each frame of the video shot, determining a key frame of the video shot based on the saliency of each frame of the video shot, extracting visual features from the key frame and performing shot clustering of the plurality of video shots to determine concept patterns based on the visual features. The method further comprises fusing different concept patterns using a saliency tuning method and generating a summary of the video based upon a global optimization method.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 17, 2012
    Applicant: FutureWei Technologies, Inc.
    Inventors: Jizhou Gao, Yu Huang, Hong Heather Yu
  • Patent number: 8180627
    Abstract: The invention relates to an apparatus for clustering process models, each consisting of model elements comprising a text phrase which describes a process activity in a natural language according to a process modeling language grammar and a natural language grammar. The apparatus comprises a process object ontology memory for storing a process object ontology, a distance calculation unit for calculating a distance matrix employing said process modeling language grammar and said natural language grammar, wherein said distance matrix consists of distances each indicating a dissimilarity of a pair of said process models, and a clustering unit which partitions said process models into a set of clusters based on said calculated distance matrix.
    Type: Grant
    Filed: July 2, 2008
    Date of Patent: May 15, 2012
    Assignee: Siemens Aktiengesellschaft
    Inventors: Andreas Bögl, Mathias Goller, Alexandra Grömer, Gustav Pomberger, Norbert Weber
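Assuming the patented distance computation has already produced a dissimilarity matrix over process models, the final partitioning step could look like the sketch below, which uses off-the-shelf agglomerative clustering from SciPy; the matrix values and the choice of linkage are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Stand-in for the patented distance computation: each entry is a dissimilarity
# between a pair of process models, however it was derived.
dist_matrix = np.array([
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.85, 0.75],
    [0.9, 0.85, 0.0, 0.1],
    [0.8, 0.75, 0.1, 0.0],
])

condensed = squareform(dist_matrix)          # SciPy expects the condensed form
tree = linkage(condensed, method="average")  # agglomerative clustering
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)                                # e.g. [1 1 2 2]: two clusters of process models
```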
  • Patent number: 8175875
    Abstract: A set of documents may be stored and indexed as a compressed sequence of tokens. The documents are grouped into clusters. Sequences of tokens representing the clusters of documents are encoded to elide some repeating instances of tokens. A compressed sequence of tokens is generated from the compressed cluster sequences of tokens. Queries on the compressed sequence are performed by identifying cluster sequences within the compressed sequence that are likely to have documents that satisfy the query and then identifying, within those clusters, the documents that actually satisfy the query.
    Type: Grant
    Filed: May 19, 2006
    Date of Patent: May 8, 2012
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
  • Patent number: 8171027
    Abstract: A computing device-implemented method includes receiving an additive tree; assigning data associated with the additive tree to one or more initial clusters; partitioning the additive tree into one or more pairs of additive sub-trees corresponding to one or more binary segmentations; computing a set that includes partitions resulting from a combination of the one or more initial clusters and the one or more pairs of additive sub-trees; evaluating one or more partitions of the set with one or more cluster validation criteria; storing one or more evaluation results for the one or more partitions; selecting at least one partition from the one or more partitions of the set that satisfies the one or more cluster validation criteria, where the at least one partition is associated with an optimal evaluation result; and removing at least one of the binary segmentations that corresponds to the at least one partition.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: May 1, 2012
    Assignee: The Mathworks, Inc.
    Inventor: Lucio Andrade-Cetto
  • Patent number: 8160875
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: August 26, 2010
    Date of Patent: April 17, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
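A rough Python sketch of codebook selection by minimal acoustic distance followed by a normalization step keyed to the selected codebook's vocal tract length, as in the abstract above. The distance measure, the simple feature scaling used as a stand-in for vocal-tract-length normalization, and all parameters are assumptions.

```python
import numpy as np

def select_codebook(sample_frames, codebooks):
    """Pick the codebook with minimal average acoustic distance to the sample.
    `codebooks` maps speaker -> dict(vtl=float, vectors=(K, dim) array)."""
    best_speaker, best_dist = None, np.inf
    for speaker, cb in codebooks.items():
        # distance of each frame to its nearest codebook vector, then averaged
        d = np.linalg.norm(sample_frames[:, None, :] - cb["vectors"][None, :, :], axis=2)
        avg = d.min(axis=1).mean()
        if avg < best_dist:
            best_speaker, best_dist = speaker, avg
    return best_speaker

def normalize_by_vtl(sample_frames, vtl, reference_vtl=17.0):
    """Crude stand-in for vocal-tract-length normalization: scale features by
    the ratio of the selected codebook's VTL to a reference length (cm)."""
    return sample_frames * (reference_vtl / vtl)

codebooks = {
    "spk_a": {"vtl": 16.5, "vectors": np.random.randn(8, 13)},
    "spk_b": {"vtl": 18.0, "vectors": np.random.randn(8, 13)},
}
frames = np.random.randn(40, 13)
spk = select_codebook(frames, codebooks)
normalized = normalize_by_vtl(frames, codebooks[spk]["vtl"])
print(spk, normalized.shape)
```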
  • Patent number: 8145486
    Abstract: Acoustic models to provide features to a speech signal are created based on speech features included in regions where similarities of acoustic models created based on speech features in a certain time length are equal to or greater than a predetermined value. Feature vectors acquired by using the acoustic models of the regions and the speech features to provide features to speech signals of second segments are grouped by speaker.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: March 27, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Makoto Hirohata
  • Patent number: 8140331
    Abstract: Characteristic features are extracted from an audio sample based on its acoustic content. The features can be coded as fingerprints, which can be used to identify the audio from a fingerprints database. The features can also be used as parameters to separate the audio into different categories.
    Type: Grant
    Filed: July 4, 2008
    Date of Patent: March 20, 2012
    Inventor: Xia Lou
  • Patent number: 8140333
    Abstract: A probability density function compensation method used for a continuous hidden Markov model and a speech recognition method and apparatus, the probability density function compensation method including extracting feature vectors from speech signals, and using the extracted feature vectors, training a model having a plurality of probability density functions to increase probabilities of recognizing the speech signals; obtaining a global variance by averaging variances of the plurality of the probability density functions after completing the training; obtaining a compensation factor using the global variance; and applying the global variance to each of the probability density functions and compensating each of the probability density functions for the global variance using the compensation factor.
    Type: Grant
    Filed: February 28, 2005
    Date of Patent: March 20, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Icksang Han, Sangbae Jeong, Eugene Jon
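The abstract above does not spell out how the compensation factor is applied, so the following sketch simply interpolates each probability density function's diagonal variance toward the global (averaged) variance; the interpolation rule and the factor alpha are assumptions.

```python
import numpy as np

def compensate_global_variance(variances, alpha=0.5):
    """`variances` is an (n_pdfs, dim) array of per-Gaussian diagonal variances.
    A global variance is obtained by averaging them; each PDF's variance is then
    smoothed toward it with a compensation factor alpha (illustrative choice)."""
    global_var = variances.mean(axis=0, keepdims=True)      # average over all PDFs
    return (1.0 - alpha) * variances + alpha * global_var   # apply the compensation

pdf_vars = np.abs(np.random.randn(10, 13)) + 0.1
smoothed = compensate_global_variance(pdf_vars, alpha=0.3)
print(smoothed.shape)   # (10, 13): every PDF now shares part of the global variance
```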
  • Patent number: 8126162
    Abstract: An audio signal interpolation apparatus is configured to perform interpolation processing on the basis of audio signals preceding and/or following a predetermined segment on a time axis so as to obtain an audio signal corresponding to the predetermined segment. The audio signal interpolation apparatus includes a waveform formation unit configured to form a waveform for the predetermined segment on the basis of time-domain samples of the preceding and/or the following audio signals and a power control unit configured to control power of the waveform for the predetermined segment formed by the waveform formation unit using a non-linear model selected on the basis of the preceding audio signal when the power of the preceding audio signal is larger than that of the following audio signal, or the following audio signal when the power of the preceding audio signal is smaller than that of the following audio signal.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: February 28, 2012
    Assignee: Sony Corporation
    Inventors: Chunmao Zhang, Toru Chinen
  • Patent number: 8112277
    Abstract: A node initializing unit generates a root node including inputted phonemic models. A candidate generating unit generates candidates of a pair of child sets by partitioning a set of phonemic models included in a node having no child node into two. A candidate deleting unit deletes candidates each including only phonemic models attached with determination information indicating that at least one of the child sets has a small amount of speech data for training. A similarity calculating unit calculates a sum of similarities among the phonemic models included in the child sets. A candidate selecting unit selects one of the candidates having a largest sum. A node generating unit generates two nodes including the two child sets included in the selected candidate, respectively. A clustering unit clusters the phonemic models in units of phonemic model sets each included in a node.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: February 7, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masaru Sakai
  • Patent number: 8078463
    Abstract: A method and apparatus for spotting a target speaker within a call interaction by generating speaker models based on one or more speakers' speech, and by searching for speaker models associated with one or more target speaker speech files.
    Type: Grant
    Filed: November 23, 2004
    Date of Patent: December 13, 2011
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblat, Yaniv Zigel, Oren Pereg
  • Patent number: 8069043
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: November 29, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
  • Patent number: 8065145
    Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing keywords contained in each of the documents input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using the number of documents, the variance of dates contained in the documents, or the C-value of keywords contained in the documents as a determination criterion. A keyword presentation device then presents the characteristic keyword group in the subtopic cluster, arranging the keyword group on the basis of the date information.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: November 22, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
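The keyword extraction step in the abstract above, taking the top keywords of a topic cluster in descending order of appearance frequency, is easy to sketch; the tokenized input format and the stopword handling are illustrative assumptions.

```python
from collections import Counter

def characteristic_keywords(cluster_docs, top_n=10, stopwords=frozenset()):
    """Return the top-N keywords of a topic cluster in descending order of
    appearance frequency.  `cluster_docs` is a list of token lists, one per document."""
    counts = Counter()
    for tokens in cluster_docs:
        counts.update(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(top_n)]

docs = [["earthquake", "tokyo", "magnitude"],
        ["tokyo", "earthquake", "aftershock"],
        ["magnitude", "earthquake"]]
print(characteristic_keywords(docs, top_n=3))   # ['earthquake', 'tokyo', 'magnitude']
```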
  • Patent number: 8055503
    Abstract: A system and method provide an audio analysis intelligence tool with ad-hoc search capabilities using spoken words as an organized data form. An SQL-like interface is used to process and search audio data and combine it with other traditional data forms to enhance searching of audio segments to identify those audio segments satisfying minimum confidence levels for a match.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 8, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Robert Scarano, Lawrence Mark
  • Patent number: 8055062
    Abstract: Disclosed herein is an information processing apparatus configured to classify time-series input data into N classes, including, a time-series feature quantity extracting section, N calculating sections, and a determination section.
    Type: Grant
    Filed: November 6, 2008
    Date of Patent: November 8, 2011
    Assignee: Sony Corporation
    Inventor: Yoko Komori
  • Patent number: 8040261
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to generate compound language solutions by employing different groupings of data sources to generate different portions of the compound language solutions.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: October 18, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Patent number: 8036884
    Abstract: The present invention provides a method, a computer software product, and an apparatus for enabling a determination of speech-related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: October 11, 2011
    Assignee: Sony Deutschland GmbH
    Inventors: Yin Hay Lam, Josep Maria Sola I Caros
  • Patent number: 8010357
    Abstract: Combined active and semi-supervised learning reduces the amount of manual labeling needed when training a spoken language understanding classifier. The classifier may be trained with human-labeled utterance data. Some of a group of unselected utterance data may be selected for manual labeling via active learning. The classifier may be changed, via semi-supervised learning, based on the selected ones of the unselected utterance data.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: August 30, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
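One round of the combined active and semi-supervised scheme described above might look like the sketch below: the least-confident utterances go to a human for labeling, while highly confident ones are folded back in with predicted labels. The classifier (scikit-learn logistic regression), the thresholds, and the random toy data are assumptions, not the patented method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_semi_supervised_round(X_lab, y_lab, X_unlab, ask_k=5, conf_thresh=0.9):
    """Train on human-labeled data, send the least-confident unlabeled utterances
    for manual labeling (active learning), and return confident predictions as
    pseudo-labels (semi-supervised learning)."""
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    probs = clf.predict_proba(X_unlab)
    confidence = probs.max(axis=1)

    to_label = np.argsort(confidence)[:ask_k]           # ask a human about these
    confident = np.where(confidence >= conf_thresh)[0]  # trust the machine on these
    pseudo_labels = clf.classes_[probs[confident].argmax(axis=1)]
    return to_label, confident, pseudo_labels

X_lab = np.random.randn(50, 8); y_lab = np.random.randint(0, 3, 50)
X_unlab = np.random.randn(200, 8)
ask, auto, auto_y = active_semi_supervised_round(X_lab, y_lab, X_unlab)
print(len(ask), len(auto))
```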
  • Patent number: 8005680
    Abstract: Method for building a multimodal business channel between users, service providers and network operators. The service provided to the users is personalized with a user's profile derived from language and speech models delivered by a speech recognition system. The language and speech models are synchronized with user dependent language models stored in a central platform made accessible to various value added service providers. They may also be copied into various devices of the user. Natural language processing algorithms may be used for extracting topics from user's dialogues.
    Type: Grant
    Filed: November 21, 2006
    Date of Patent: August 23, 2011
    Assignee: Swisscom AG
    Inventor: Robert Van Kommer
  • Patent number: 7987091
    Abstract: A robot can make a dialog customized for the user by first storing various pieces of information appendant to an object as values of the corresponding items of the object. A topic that is related to the topic used in the immediately preceding conversation is then selected. Then, an acquisition conversation for acquiring the value of the item of the selected topic or a utilization conversation for utilizing the value of the item of the topic that is already stored is generated as the next conversation. The value acquired by the acquisition conversation is stored as the value of the corresponding item.
    Type: Grant
    Filed: December 2, 2003
    Date of Patent: July 26, 2011
    Assignee: Sony Corporation
    Inventors: Kazumi Aoyama, Yukiko Yoshiike, Shinya Ohtani, Rika Horinaka, Hideki Shimomura
  • Publication number: 20110161081
    Abstract: Methods, computer program products, and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in that cluster.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 30, 2011
    Applicant: GOOGLE INC.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
  • Patent number: 7969329
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: June 28, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov, Sergey V. Kolomiets
  • Patent number: 7966174
    Abstract: A system for recognizing patterns is disclosed. Grammar learning from a corpus includes generating a frequency vector for each non-context token in the corpus based upon counted occurrences of a predetermined relationship of the non-context tokens to identified context tokens. Clusters are grown from the frequency vectors according to a lexical correlation or a cluster tree among the non-context tokens. The cluster tree is used for pattern recognition.
    Type: Grant
    Filed: February 14, 2008
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Giuseppe Riccardi
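A small sketch of building frequency vectors for non-context tokens from their co-occurrence with identified context tokens, which could then feed the clustering step described above; the window-based notion of "predetermined relationship" and the toy corpus are assumptions.

```python
import numpy as np

def frequency_vectors(sentences, context_words, window=2):
    """For every non-context token, count how often each context word occurs
    within `window` positions of it; the counts form the token's vector."""
    index = {w: k for k, w in enumerate(context_words)}
    vecs = {}
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok in index:
                continue
            v = vecs.setdefault(tok, np.zeros(len(context_words)))
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    v[index[tokens[j]]] += 1
    return vecs

sentences = [["book", "a", "flight", "to", "boston"],
             ["book", "a", "hotel", "in", "boston"]]
vecs = frequency_vectors(sentences, context_words=["a", "to", "in"])
print({w: v.tolist() for w, v in vecs.items()})
# Tokens used in similar contexts acquire similar vectors and can then be clustered.
```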
  • Patent number: 7957968
    Abstract: The invention includes a computer-based system or method for automatically generating a grammar associated with a first task, comprising the steps of: receiving first data representing the first task based on responses received from a distributed network; automatically tagging the first data into parts of speech to form first tagged data; identifying filler words and core words from said first tagged data; modeling sentence structure based upon said first tagged data using a first set of rules; identifying synonyms of said core words; and creating the grammar for the first task using said modeled sentence structure, first tagged data, and said synonyms.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: June 7, 2011
    Assignee: Honda Motor Co., Ltd.
    Inventors: Rakesh Gupta, Ken Hennacy
  • Patent number: 7952497
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.
    Type: Grant
    Filed: May 6, 2009
    Date of Patent: May 31, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Patent number: 7949525
    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: May 24, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
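The sorting-and-rechecking idea above can be sketched as ranking labeled utterances by how strongly the trained classifier's confident prediction disagrees with the human label; the scoring rule and the toy classifier are illustrative assumptions.

```python
def rank_for_recheck(utterances, labels, classify):
    """`classify(u)` returns (predicted_type, confidence) from the previously
    trained classifier.  Utterances whose confident prediction disagrees with
    the human label float to the top of the recheck list."""
    scored = []
    for utt, label in zip(utterances, labels):
        predicted, confidence = classify(utt)
        disagreement = confidence if predicted != label else 1.0 - confidence
        scored.append((disagreement, utt, label, predicted))
    return sorted(scored, reverse=True)          # most suspicious labels first

# Toy classifier: anything mentioning "bill" is Billing with high confidence.
classify = lambda u: ("Billing", 0.95) if "bill" in u else ("Other", 0.6)
utts = ["my bill is wrong", "talk to an agent"]
labels = ["TechSupport", "Other"]
for row in rank_for_recheck(utts, labels, classify):
    print(row)
```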
  • Patent number: 7937269
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification. The data in contiguous segments of a continuous data stream is also incrementally clustered, in real time, into a plurality of micro-clusters, from which target profiles are constructed that define and model the behavior of the data in individual segments of the data stream.
    Type: Grant
    Filed: August 22, 2005
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
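A minimal sketch of incrementally maintaining micro-clusters over a stream, in the spirit of the abstract above: each micro-cluster keeps additive statistics so it can absorb new points cheaply. The absorb-or-create threshold rule and all constants are assumptions.

```python
import numpy as np

class MicroCluster:
    """Additive summary of the points absorbed so far: count, linear sum and
    sum of squares, which is enough to recover the centroid and a radius."""
    def __init__(self, point):
        self.n, self.ls, self.ss = 1, point.copy(), point ** 2
    def centroid(self):
        return self.ls / self.n
    def radius(self):
        var = self.ss / self.n - self.centroid() ** 2
        return float(np.sqrt(np.maximum(var, 0).sum()))
    def absorb(self, point):
        self.n += 1; self.ls += point; self.ss += point ** 2

def update_stream(clusters, point, max_radius=1.5):
    """Assign the point to the nearest micro-cluster if it stays compact,
    otherwise start a new micro-cluster (illustrative threshold rule)."""
    if clusters:
        dists = [np.linalg.norm(point - c.centroid()) for c in clusters]
        i = int(np.argmin(dists))
        if dists[i] <= max_radius:
            clusters[i].absorb(point)
            return clusters
    clusters.append(MicroCluster(point))
    return clusters

clusters = []
for p in np.random.randn(100, 2):
    update_stream(clusters, p)
print(len(clusters), "micro-clusters")
```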
  • Patent number: 7933774
    Abstract: A system and method is provided for rapidly generating a new spoken dialog application. In one embodiment, a user experience person labels the transcribed data (e.g., 3000 utterances) using a set of interactive tools. The labeled data is then stored in a processed data database. During the labeling process, the user experience person not only groups utterances into various call type categories, but also flags specific utterances (e.g., 100-200) as positive and negative examples for use in an annotation guide. The labeled data in the processed data database can also be used to generate an initial natural language understanding (NLU) model.
    Type: Grant
    Filed: March 18, 2004
    Date of Patent: April 26, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Mazin G. Rahim, Allen Louis Gorin, Behzad Shahraray, David Crawford Gibbon, Zhu Liu, Bernard S. Renger, Patrick Guy Haffner, Harris Drucker, Steven Hart Lewis
  • Patent number: 7930172
    Abstract: Portions from time-domain speech segments are extracted. Feature vectors that represent the portions in a vector space are created. The feature vectors incorporate phase information of the portions. A distance between the feature vectors in the vector space is determined. In one aspect, the feature vectors are created by constructing a matrix W from the portions and decomposing the matrix W. In one aspect, decomposing the matrix W comprises extracting global boundary-centric features from the portions. In one aspect, the portions include at least one pitch period. In another aspect, the portions include centered pitch periods.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: April 19, 2011
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
  • Patent number: 7930179
    Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined and remodeled and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
    Type: Grant
    Filed: October 2, 2007
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
  • Patent number: 7912714
    Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters with
    Type: Grant
    Filed: April 1, 2008
    Date of Patent: March 22, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Krishna Kummamuru, Deepak S. Padmanaban, Shourya Roy, L. Venkata Subramaniam
  • Patent number: 7890327
    Abstract: Disclosed is a general framework for extracting semantics from composite media content at various resolutions. Specifically, given a media stream, which may consist of various types of media modalities including audio, visual, text and graphics information, the disclosed framework describes how various types of semantics could be extracted at different levels by exploiting and integrating different media features. The output of this framework is a series of tagged (or annotated) media segments at different scales. Specifically, at the lowest resolution, the media segments are characterized in a more general and broader sense, thus they are identified at a larger scale; while at the highest resolution, the media content is more specifically analyzed, inspected and identified, which thus results in small-scaled media segments.
    Type: Grant
    Filed: July 16, 2004
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Chitra Dorai, Ying Li
  • Patent number: 7881931
    Abstract: Copies of original sound recordings are identified by extracting features from the copy, creating a vector of those features, and comparing that vector against a database of vectors. Identification can be performed for copies of sound recordings that have been subjected to compression and other manipulation such that they are not exact replicas of the original. Computational efficiency permits many hundreds of queries to be serviced at the same time. The vectors may be less than 100 bytes, so that many millions of vectors can be stored on a portable device.
    Type: Grant
    Filed: February 4, 2008
    Date of Patent: February 1, 2011
    Assignee: Gracenote, Inc.
    Inventors: Maxwell Wells, Vidya Venkatachalam, Luca Cazzanti, Kwan Fai Cheung, Navdeep Dhillon, Somsak Sukittanon
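Matching a compact fingerprint vector against a database of reference vectors, as in the abstract above, might be sketched as a simple nearest-neighbour lookup with a rejection threshold; the vector size, distance measure, and threshold below are assumptions rather than Gracenote's actual scheme.

```python
import numpy as np

def match_fingerprint(query_vec, database, threshold=0.15):
    """Compare a compact feature vector against a database of reference vectors
    and return the best match only if it is close enough (illustrative rule)."""
    names = list(database)
    refs = np.stack([database[n] for n in names])
    dists = np.linalg.norm(refs - query_vec, axis=1) / np.sqrt(query_vec.size)
    best = int(np.argmin(dists))
    return (names[best], float(dists[best])) if dists[best] <= threshold else (None, None)

rng = np.random.default_rng(0)
database = {f"track_{i}": rng.random(24).astype(np.float32) for i in range(1000)}
# A compressed copy yields a slightly perturbed vector of the same track.
query = database["track_42"] + rng.normal(0, 0.02, 24).astype(np.float32)
print(match_fingerprint(query, database))
```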
  • Patent number: 7853449
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: December 14, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Patent number: 7844457
    Abstract: Methods are disclosed for automatic accent labeling without manually labeled data. The methods are designed to exploit accent distribution between function and content words.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: November 30, 2010
    Assignee: Microsoft Corporation
    Inventors: YiNing Chen, Frank Kao-ping Soong, Min Chu
  • Patent number: 7822604
    Abstract: One embodiment of the present method and apparatus for identifying a conversing pair of users of a two-way speech medium includes receiving a plurality of binary voice activity streams, where the plurality of voice activity streams includes a first voice activity stream associated with a first user, and pairing the first voice activity stream with a second voice activity stream associated with a second user, in accordance with a complementary similarity between the first voice activity stream and the second voice activity stream.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: October 26, 2010
    Assignee: International Business Machines Corporation
    Inventors: Lisa Amini, Eric Bouillet, Olivier Verscheure, Michail Vlachos
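One plausible reading of "complementary similarity" between binary voice activity streams is that conversing parties tend to alternate rather than overlap; the sketch below scores pairs accordingly and picks the best pair. The scoring formula is an assumption, not the patented measure.

```python
import numpy as np
from itertools import combinations

def complementary_similarity(a, b):
    """Score how well two binary voice-activity streams interleave: frames where
    exactly one party speaks count for, frames where both speak count against."""
    a, b = np.asarray(a), np.asarray(b)
    exactly_one = np.logical_xor(a, b).mean()
    both_active = np.logical_and(a, b).mean()
    return exactly_one - both_active

def pair_streams(streams):
    """Pair the two streams with the highest complementary similarity."""
    scores = {(i, j): complementary_similarity(streams[i], streams[j])
              for i, j in combinations(range(len(streams)), 2)}
    return max(scores, key=scores.get)

alice = [1, 1, 0, 0, 1, 0, 0, 1]
bob   = [0, 0, 1, 1, 0, 1, 1, 0]   # speaks when alice is silent
carol = [1, 0, 1, 0, 1, 0, 1, 0]
print(pair_streams([alice, bob, carol]))   # (0, 1): alice and bob converse
```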
  • Patent number: 7822614
    Abstract: A language analyzer performs speech recognition on a speech input by a speech input unit, specifies a possible word which is represented by the speech, and the score thereof, and supplies word data representing them to an agent processing unit. The agent processing unit stores process item data which defines a data acquisition process to acquire word data or the like, a discrimination process, and an input/output process, and wires or data defining transition from one process to another and giving a weighting factor to the transition, and executes a flow represented generally by the process item data and the wires to thereby control devices belonging to an input/output target device group. To which process in the flow the transition takes place is determined by the weighting factor of each wire, which is determined by the connection relationship between a point where the process has proceeded and the wire, and the score of word data.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: October 26, 2010
    Assignee: Kabushikikaisha Kenwood
    Inventor: Rika Koyama
  • Patent number: 7813926
    Abstract: A training system for a speech recognition application is disclosed. In embodiments described, the training system is used to train a classification model or language model. The classification model is trained using an adaptive language model generated by an iterative training process. In embodiments described, the training data is recognized by the speech recognition component and the recognized text is used to create the adaptive language model which is used for speech recognition in a following training iteration.
    Type: Grant
    Filed: March 16, 2006
    Date of Patent: October 12, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye-Yi Wang, John Sie Yuen Lee, Alex Acero
  • Patent number: 7805300
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains are examined. One of a significant word change or a significant word class change within the plurality of utterances is determined. A first cluster of utterances including a word or a word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances. A second cluster of utterances not including the word or the word class corresponding to the one of the significant word change or the significant word class change is generated from the utterances.
    Type: Grant
    Filed: March 21, 2005
    Date of Patent: September 28, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Patent number: 7805301
    Abstract: A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: September 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
  • Publication number: 20100241430
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Application
    Filed: June 3, 2010
    Publication date: September 23, 2010
    Applicant: AT&T Intellectual Property II, L.P., via transfer from AT&T Corp.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
  • Patent number: 7797158
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech recognition. The method embodiment comprises (1) selecting a codebook from a plurality of codebooks with a minimal acoustic distance to a received speech sample, the plurality of codebooks generated by a process of (a) computing a vocal tract length for each of a plurality of speakers, (b) for each of the plurality of speakers, clustering speech vectors, and (c) creating a codebook for each speaker, the codebook containing entries for the respective speaker's vocal tract length, speech vectors, and an optional vector weight for each speech vector, (2) applying the respective vocal tract length associated with the selected codebook to normalize the received speech sample for use in speech recognition, and (3) recognizing the received speech sample based on the respective vocal tract length associated with the selected codebook.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: September 14, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Mazin Gilbert
  • Publication number: 20100217593
    Abstract: An information storage medium stores a program for generating Hidden Markov Models to be used for speech recognition with a given speech recognition system. The program causes a computer to function as a scheduled-to-be-used model group storage section that stores a scheduled-to-be-used model group including a plurality of Hidden Markov Models scheduled to be used by the given speech recognition system, and a filler model generation section that generates Hidden Markov Models to be used as filler models by the given speech recognition system based on all or at least a part of the Hidden Markov Model group in the scheduled-to-be-used model group.
    Type: Application
    Filed: February 5, 2010
    Publication date: August 26, 2010
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Paul W. Shields, Matthew E. Dunnachie, Yasutoshi Takizawa
  • Patent number: 7773809
    Abstract: A method and apparatus for generating discriminant functions for distinguishing obscene videos by using visual features of video data, and a method and apparatus for determining whether videos are obscene by using the generated discriminant functions, are provided.
    Type: Grant
    Filed: May 26, 2006
    Date of Patent: August 10, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Min Lee, Taek Yong Nam, Jong Soo Jang, Ho Gyun Lee