Clustering Patents (Class 704/245)
  • Publication number: 20140195232
    Abstract: Embodiments provide a method and system of text-independent speaker recognition with complexity comparable to a text-dependent version. The scheme exploits the fact that speech is a quasi-stationary signal and simplifies the recognition process accordingly. The modeling allows the speaker profile to be updated progressively with new speech samples acquired during usage.
    Type: Application
    Filed: April 1, 2013
    Publication date: July 10, 2014
    Applicant: STMicroelectronics Asia Pacific Pte Ltd.
    Inventor: STMicroelectronics Asia Pacific Pte Ltd.
  • Patent number: 8762156
    Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: June 24, 2014
    Assignee: Apple Inc.
    Inventor: Lik Harry Chen
  • Publication number: 20140172427
    Abstract: A method for processing messages pertaining to an event includes receiving a plurality of messages pertaining to the event from electronic communication devices associated with a plurality of observers of the event, generating a first message stream that includes only a portion of the plurality of messages corresponding to a first participant in the event, identifying a first sub-event in the first message stream with reference to a time distribution of messages and content distribution of messages in the first message stream, generating a sub-event summary with reference to a portion of the plurality of messages in the first message stream that are associated with the first sub-event, and transmitting the sub-event summary to a plurality of electronic communication devices associated with a plurality of users who are not observers of the event.
    Type: Application
    Filed: December 13, 2013
    Publication date: June 19, 2014
    Applicant: Robert Bosch GmbH
    Inventors: Fei Liu, Fuliang Weng, Chao Shen, Lin Zhao
  • Patent number: 8712771
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: April 29, 2014
    Inventor: Alon Konchitsky
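The monitor–analyze–threshold–classify loop described in the abstract above can be sketched as follows. The chosen signal component (zero-crossing rate) and the threshold value are illustrative stand-ins, not the components or thresholds claimed in the patent:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_audio(frames, zcr_threshold=0.25):
    """Compare each frame's analyzed component against a pre-determined
    threshold and classify the frame as speech or music."""
    return ["speech" if zero_crossing_rate(f) > zcr_threshold else "music"
            for f in frames]
```

A rapidly oscillating frame exceeds the threshold and is labeled speech; a slowly varying one falls below it and is labeled music.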
  • Patent number: 8700402
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: April 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
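The belief update in the inner loop above, where the estimated new belief is a product of ASR reliability, user action likelihood, and the original belief, can be illustrated with a toy sketch. The numeric values, the normalization, and the pooling of low-probability partitions into a single remainder partition are assumptions for illustration, not the patent's recombination procedure:

```python
def update_beliefs(partitions, asr_reliability, action_likelihood, max_partitions):
    """partitions: dict partition_id -> belief.
    Update each belief as reliability * likelihood * original belief,
    then recombine down to a fixed number of partitions."""
    updated = {p: asr_reliability * action_likelihood.get(p, 0.0) * b
               for p, b in partitions.items()}
    total = sum(updated.values()) or 1.0
    updated = {p: b / total for p, b in updated.items()}
    # Illustrative recombination: keep the most probable partitions and
    # pool the remainder into one catch-all partition ("rest").
    ranked = sorted(updated.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:max_partitions - 1])
    kept["rest"] = sum(b for _, b in ranked[max_partitions - 1:])
    return kept
```

After each N-best candidate is processed this way, the partition count stays fixed regardless of how many splits occurred.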
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Patent number: 8688453
    Abstract: According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sachindra Joshi, Shantanu Godbole
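The map-and-select stage of the analyzer above can be sketched minimally. The keyword-overlap scoring and the lexicon are hypothetical stand-ins for the patent's word extraction rules and candidate-intent scoring:

```python
def select_intent(word_groups, intent_lexicon):
    """Map each grouping of related words to candidate intent values by
    keyword overlap, then select the highest-scoring candidate as the
    intended meaning of the utterance."""
    candidates = []
    for group in word_groups:
        for intent, keywords in intent_lexicon.items():
            score = len(set(group) & set(keywords))
            if score:
                candidates.append((intent, score))
    return max(candidates, key=lambda c: c[1])[0] if candidates else None
```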
  • Patent number: 8682654
    Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: March 25, 2014
    Assignee: Cyberlink Corp.
    Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
  • Patent number: 8661515
    Abstract: An audible authentication of a wireless device for enrollment onto a secure wireless network includes an unauthorized wireless device that audibly emits a uniquely identifying secret code (e.g., a personal identification number (PIN)). In some implementations, the audible code is heard by the user and manually entered via a network-enrollment user interface. In other implementations, a network-authorizing device automatically picks up the audible code and verifies the code. If verified, the wireless device is enrolled onto the wireless network.
    Type: Grant
    Filed: May 10, 2010
    Date of Patent: February 25, 2014
    Assignee: Intel Corporation
    Inventors: Marc Meylemans, Gary A. Martz, Jr.
  • Patent number: 8645136
    Abstract: A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group.
    Type: Grant
    Filed: July 20, 2010
    Date of Patent: February 4, 2014
    Assignee: Intellisist, Inc.
    Inventor: David Milstein
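The threshold-and-group flow described above can be sketched as follows; the data layout and the helper names are assumptions, and the grouping of similar questionable utterances (done here by caller-supplied indices) stands in for the patent's similarity comparison across calls:

```python
def flag_questionable(utterances, threshold):
    """utterances: list of (transcribed_value, recognition_score).
    Return indices of utterances whose score falls below the threshold."""
    return [i for i, (_, score) in enumerate(utterances) if score < threshold]

def propagate_manual_value(transcript, group, manual_value):
    """Assign one common manual transcription value to every similar
    questionable utterance in the group."""
    for idx in group:
        transcript[idx] = manual_value
    return transcript
```

One manual correction thus repairs every member of the group, which is the source of the efficiency gain the abstract claims.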
  • Patent number: 8635065
    Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: January 21, 2014
    Assignee: Sony Deutschland GmbH
    Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
  • Patent number: 8630853
    Abstract: A speech classification apparatus includes a speech classification probability calculation unit that calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model, and a parameter updating unit that successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation unit.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
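The posterior-then-update cycle above can be sketched with a 1-D Gaussian mixture standing in for the generative model; the learning-rate update rule is an illustrative stand-in for the patent's successive parameter estimation:

```python
import math

def cluster_posteriors(x, clusters):
    """Probability that sample x belongs to each cluster under a 1-D
    Gaussian mixture (the generative model)."""
    likes = [c["weight"] * math.exp(-(x - c["mean"]) ** 2 / (2 * c["var"]))
             / math.sqrt(2 * math.pi * c["var"]) for c in clusters]
    total = sum(likes)
    return [l / total for l in likes]

def online_update(x, clusters, lr=0.05):
    """Successively re-estimate mixture parameters from the latest sample,
    weighting each cluster's update by the sample's posterior."""
    post = cluster_posteriors(x, clusters)
    for c, p in zip(clusters, post):
        c["weight"] += lr * (p - c["weight"])
        c["mean"] += lr * p * (x - c["mean"])
    return post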
  • Patent number: 8612224
    Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are upd
    Type: Grant
    Filed: August 23, 2011
    Date of Patent: December 17, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
  • Patent number: 8606580
    Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Publication number: 20130325472
    Abstract: Some aspects include transforming data, at least a portion of which has been processed to determine frequency information associated with features in the data. Techniques include determining a first transformation based, at least in part, on the frequency information, applying at least the first transformation to the data to obtain transformed data, and fitting a plurality of clusters to the transformed data to obtain a plurality of established clusters. Some aspects include classifying input data by transforming the input data using at least the first transformation and comparing the transformed input data to the established clusters.
    Type: Application
    Filed: August 8, 2012
    Publication date: December 5, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Leonid Rachevsky, Dimitri Kanevsky, Bhuvana Ramabhadran
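A minimal sketch of the transform-then-compare pipeline above, using per-feature standardization as a stand-in for the frequency-derived transformation and nearest-centroid matching against the established clusters:

```python
def standardize(rows, means, stds):
    """Apply a per-feature linear transformation to the data
    (standing in for the frequency-informed transform)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def nearest_cluster(point, centroids):
    """Classify a transformed point by its closest established cluster."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))
```

Classification applies the same transformation to new input and compares the result to the centroids fitted on the training data.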
  • Patent number: 8600750
    Abstract: In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
    Type: Grant
    Filed: June 8, 2010
    Date of Patent: December 3, 2013
    Assignee: Cisco Technology, Inc.
    Inventors: Michael A. Ramalho, Todd C. Tatum, Shantanu Sarkar
  • Publication number: 20130311183
    Abstract: This invention provides a voiced sound interval detection device which enables appropriate detection of a voiced sound interval of an observation signal even when a volume of sound from a sound source varies or when the number of sound sources is unknown or when different kinds of microphones are used together.
    Type: Application
    Filed: January 25, 2012
    Publication date: November 21, 2013
    Applicant: NEC CORPORATION
    Inventor: Yoshifumi Onishi
  • Patent number: 8583432
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
    Type: Grant
    Filed: July 25, 2012
    Date of Patent: November 12, 2013
    Assignee: International Business Machines Corporation
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
  • Patent number: 8554563
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
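The extended feature vector above can be sketched with scalar Gaussians standing in for the pre-trained acoustic models and the background population model; real systems would use multivariate mixtures over cepstral features:

```python
import math

def log_likelihood(x, mean, var):
    """Log-density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extended_feature(base_features, x, speaker_models, background):
    """Extend a frame's acoustic feature vector with the log-likelihood
    ratio of each pre-trained model against the background model."""
    llrs = [log_likelihood(x, m["mean"], m["var"])
            - log_likelihood(x, background["mean"], background["var"])
            for m in speaker_models]
    return list(base_features) + llrs
```

Segmentation and clustering then operate on these extended vectors, so frames matching a known speaker carry that evidence explicitly.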
  • Patent number: 8554561
    Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
    Type: Grant
    Filed: August 9, 2012
    Date of Patent: October 8, 2013
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
  • Patent number: 8554562
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: November 15, 2009
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
  • Patent number: 8553991
    Abstract: A clustering processing apparatus comprises: N clustering units that group samples included in the data block into clusters, each clustering unit sequentially taking each sample as a target, grouping the target sample into one of the clusters within the data block, storing cluster information including identification on each cluster into which the samples are grouped within the data block, and storing sample assignment information indicating the cluster to which the target sample belongs; a cluster information transferring unit that selects cluster information on a cluster to be integrated from the cluster information when a predetermined condition is met, and transfers the selected cluster information to a third storage unit; and an updating unit that integrates clusters selected based on the cluster information stored in the third storage unit into an integrated cluster, and updates the sample assignment information based on information of the integrated clusters.
    Type: Grant
    Filed: June 15, 2011
    Date of Patent: October 8, 2013
    Assignee: Canon Kabushiki Kaisha
    Inventor: Satoshi Naito
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20130253931
    Abstract: A modeling device and method for speaker recognition and a speaker recognition system are provided. The modeling device comprises a front end which receives enrollment speech data from each target speaker, a reference anchor set generation unit which generates a reference anchor set using the enrollment speech data based on an anchor space, and a voice print generation unit which generates voice prints based on the reference anchor set and the enrollment speech data. By taking the enrollment speech and a speaker adaptation technique into account, smaller anchor models can be generated, so reliable and robust speaker recognition with a smaller reference anchor set is possible. This yields substantial gains in computation speed and memory usage.
    Type: Application
    Filed: December 10, 2010
    Publication date: September 26, 2013
    Inventors: Haifeng Shen, Long Ma, Bingqi Zhang
  • Patent number: 8543402
    Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
    Type: Grant
    Filed: April 29, 2011
    Date of Patent: September 24, 2013
    Assignee: The Intellisis Corporation
    Inventor: Jiyong Ma
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
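The dynamic-programming boundary search above can be illustrated in its simplest one-boundary form, minimizing within-segment squared error over an already-projected 1-D feature sequence; the LDA projection and the multi-boundary generalization are omitted:

```python
def best_boundary(values):
    """Exhaustively score every split point of a 1-D sequence and return
    the one minimizing total within-segment squared error (a one-boundary
    instance of the DP search for optimal cluster boundaries)."""
    def sse(seg):
        if not seg:
            return 0.0
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)
    costs = [sse(values[:k]) + sse(values[k:]) for k in range(1, len(values))]
    return costs.index(min(costs)) + 1
```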
  • Patent number: 8515750
    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
    Type: Grant
    Filed: September 19, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Xin Lei, Petar Aleksic
  • Publication number: 20130191126
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Application
    Filed: January 20, 2012
    Publication date: July 25, 2013
    Applicant: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Patent number: 8494849
    Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
    Type: Grant
    Filed: June 20, 2005
    Date of Patent: July 23, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
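The frame-grouping and marker computation above map naturally to a short sketch; the VAD threshold and the transmit criterion are illustrative parameters, not values from the patent:

```python
def multiframe_markers(vad_values, frames_per_multiframe, active_threshold=0.5):
    """Group per-frame voice activity values into multiframes and mark each
    multiframe with the count of frames representing speech activity."""
    markers = []
    for i in range(0, len(vad_values), frames_per_multiframe):
        chunk = vad_values[i:i + frames_per_multiframe]
        markers.append(sum(1 for v in chunk if v > active_threshold))
    return markers

def frames_to_transmit(vad_values, frames_per_multiframe, min_active=1):
    """Selectively keep only multiframes whose activity marker meets
    a minimum, i.e. those worth sending to the remote recognizer."""
    markers = multiframe_markers(vad_values, frames_per_multiframe)
    return [i for i, m in enumerate(markers) if m >= min_active]
```

Silent multiframes are dropped before transmission, which is the bandwidth saving the abstract describes.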
  • Patent number: 8484024
    Abstract: Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.
    Type: Grant
    Filed: February 24, 2011
    Date of Patent: July 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8478587
    Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgement target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: July 2, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Kawamura, Ryouichi Kawanishi
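The sliding judgment-target-section logic above reduces to a majority vote over per-unit category labels; this sketch assumes the per-unit classification has already been done and uses a simple most-frequent-category rule:

```python
from collections import Counter

def judge_environment(categories, section_len):
    """categories: per-unit-of-time environmental sound labels.
    For each judgment target section (a sliding window of consecutive
    units), judge the surrounding environment as the category with the
    largest number of partial audio signals in the section."""
    judgments = []
    for end in range(section_len, len(categories) + 1):
        window = categories[end - section_len:end]
        judgments.append(Counter(window).most_common(1)[0][0])
    return judgments
```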
  • Patent number: 8478589
    Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: July 2, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
  • Patent number: 8457968
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: June 4, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
  • Patent number: 8447604
    Abstract: Provided in some embodiments is a method including receiving ordered script words that are indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: May 21, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Walter W. Chang
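The hard-alignment step lends itself to a short sketch using Python's standard `difflib`; `hard_alignment_points`, `sub_matrices`, and the minimum run length are assumptions of this sketch, not the patented implementation:

```python
from difflib import SequenceMatcher

def hard_alignment_points(script_words, dialogue_words, min_run=2):
    """Matching runs of consecutive words serve as hard alignment points."""
    sm = SequenceMatcher(a=script_words, b=dialogue_words, autojunk=False)
    return [(m.a, m.b, m.size) for m in sm.get_matching_blocks()
            if m.size >= min_run]

def sub_matrices(points):
    """Regions between adjacent hard points, to be aligned independently."""
    regions = []
    for (a1, b1, s1), (a2, b2, _) in zip(points, points[1:]):
        regions.append(((a1 + s1, a2), (b1 + s1, b2)))
    return regions
```

Each returned region pairs a span of unmatched script words with the dialogue words between the same two anchors, so the expensive fine-grained alignment only runs inside small sub-matrices.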
  • Patent number: 8438026
    Abstract: The invention describes a method and a system for generating training data (DT) for an automatic speech recogniser (2) for operating at a particular first sampling frequency (fH), comprising steps of deriving spectral characteristics (SL) from audio data (DL) sampled at a second frequency (fL) lower than the first sampling frequency (fH), extending the bandwidth of the spectral characteristics (SL) by retrieving bandwidth extending information (IBE) from a codebook (6), and processing the bandwidth extended spectral characteristics (SLE) to give the required training data (DT). Moreover, a method and a system (5) for generating a codebook (6) for extending the bandwidth of spectral characteristics (SL) for audio data (DL) sampled at a second sampling frequency (fL) to spectral characteristics (SH) for a first sampling frequency (fH) higher than the second sampling frequency (fL) are described.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: May 7, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Alexander Fischer, Rolf Dieter Bippus
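The codebook lookup at the heart of this scheme can be illustrated with a toy codebook mapping narrowband spectral tuples to stored high-band vectors; the names and the Euclidean nearest-neighbour criterion are assumptions of this sketch:

```python
# Hypothetical sketch of codebook-based bandwidth extension: spectral vectors
# derived from low-rate audio are extended by looking up the nearest codebook
# entry and appending its stored high-band information.

def nearest_entry(codebook, narrowband):
    """Find the codebook key closest (squared Euclidean) to the input."""
    def dist(key):
        return sum((k - x) ** 2 for k, x in zip(key, narrowband))
    return min(codebook, key=dist)

def extend_bandwidth(codebook, narrowband):
    """Append the retrieved high-band information to the narrowband vector."""
    return list(narrowband) + list(codebook[nearest_entry(codebook, narrowband)])
```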
  • Patent number: 8438029
    Abstract: Disclosed are apparatus and methods for generating synthesized utterances. A computing device can receive speech data corresponding to spoken utterances of a particular speaker. Textual elements of an input text corresponding to the speech data can be recognized. Confidence levels associated with the recognized textual elements can be determined. Speech-synthesis parameters of decision trees can be adapted based on the speech data, recognized textual elements, and associated confidence levels. Each adapted decision tree can map individual elements of a text to individual speech-synthesis parameters. A second input text can be received. The second input text can be mapped to speech-synthesis parameters using the adapted decision trees. A synthesized spoken utterance can be generated corresponding to the second input text using the speech-synthesis parameters. At least some of the speech-synthesis parameters are configured to simulate the particular speaker.
    Type: Grant
    Filed: August 22, 2012
    Date of Patent: May 7, 2013
    Assignee: Google Inc.
    Inventors: Matthew Nicholas Stuttle, Byungha Chun
  • Patent number: 8423354
    Abstract: A device extracts prosodic information, including a power value, from speech data; extracts from the speech data an utterance section comprising a period whose power value is equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data segment by phoneme recognition; generates clusters, each a set of classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters whose evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening-target speech data.
    Type: Grant
    Filed: November 5, 2010
    Date of Patent: April 16, 2013
    Assignee: Fujitsu Limited
    Inventor: Sachiko Onodera
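The representative-selection step can be illustrated with a medoid over edit distance; the evaluation function, the threshold, and the medoid criterion are assumptions for this sketch, not the patented measures:

```python
# Hypothetical sketch: clusters of phoneme sequences are scored, clusters above
# a threshold become candidates, and each candidate's representative is chosen
# as the medoid (minimum total edit distance to the other members).

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def representatives(clusters, evaluate, threshold):
    """clusters: name -> list of phoneme sequences."""
    reps = {}
    for name, seqs in clusters.items():
        if evaluate(seqs) >= threshold:          # candidate cluster
            reps[name] = min(seqs, key=lambda s: sum(edit_distance(s, t)
                                                     for t in seqs))
    return reps
```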
  • Patent number: 8390574
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In response to an ambiguous editing input at a location preceding at least a portion of an output word, the software performs one disambiguation operation with respect to the editing input and another disambiguation operation with respect to the editing input in combination with the at least portion of the output word. The results are output in order of decreasing frequency value, with the results of the one disambiguation operation having the portion of the output word appended thereto.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: March 5, 2013
    Assignee: Research In Motion Limited
    Inventors: Michael Elizarov, Vadim Fux, Dan Rubanovich
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user increasing an overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8386238
    Abstract: A sequence of characters may be evaluated to determine the presence of a natural language word. The sequence of characters may be analyzed to find a subsequence of alphabetical characters. Based on a statistical model of a natural language, a probability that the subsequence is a natural language word may be calculated. The probability may then be used to determine if the subsequence is indeed a natural language word.
    Type: Grant
    Filed: November 5, 2008
    Date of Patent: February 26, 2013
    Assignee: Citrix Systems, Inc.
    Inventor: Anthony Spataro
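The statistical test described above can be illustrated with a character-bigram model and add-one smoothing; the model form, the boundary markers, and the length normalisation are assumptions of this sketch rather than the patented method:

```python
import math

# Hypothetical sketch: a character-bigram model of a natural language scores a
# candidate subsequence; a high average log-probability suggests the
# alphabetical subsequence is a real word rather than random characters.

def train_bigrams(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    counts = {}
    for w in words:
        for a, b in zip("^" + w, w + "$"):
            counts[a, b] = counts.get((a, b), 0) + 1
    return counts, len(alphabet) + 2  # smoothing denominator size

def word_log_prob(counts, vocab, candidate):
    """Length-normalised log-probability under the bigram model."""
    totals = {}
    for (a, _), c in counts.items():
        totals[a] = totals.get(a, 0) + c
    lp = 0.0
    for a, b in zip("^" + candidate, candidate + "$"):
        # add-one smoothing so unseen bigrams get small nonzero probability
        lp += math.log((counts.get((a, b), 0) + 1) / (totals.get(a, 0) + vocab))
    return lp / (len(candidate) + 1)
```

A threshold on the normalised score then decides whether the subsequence is treated as a natural language word.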
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
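The final comparison step reduces to a small predicate; the conjunction of the two tests is an assumption of this sketch (the abstract's "correspond to acceptance" condition may be defined differently in the embodiments):

```python
# Hypothetical sketch of utterance verification: an N-best hypothesis is
# accepted when its confidence score clears the threshold and its
# inter-phoneme distance clears the mean distance.

def verify(confidence, distance, threshold, mean_distance):
    return confidence >= threshold and distance >= mean_distance

def filter_nbest(hypotheses, threshold, mean_distance):
    """hypotheses: list of (word, confidence, distance); keep accepted words."""
    return [w for w, c, d in hypotheses
            if verify(c, d, threshold, mean_distance)]
```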
  • Patent number: 8364483
    Abstract: A method for separating a sound source from a mixed signal includes transforming the mixed signal to channel signals in the frequency domain, and grouping several frequency bands of each channel signal to form frequency clusters. The method further includes separating the frequency clusters by applying blind source separation to the frequency-domain signals of each frequency cluster, and integrating the spectra of the separated signals to restore the sound sources in the time domain, wherein each separated signal expresses one sound source.
    Type: Grant
    Filed: June 19, 2009
    Date of Patent: January 29, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Ki-young Park, Ho-Young Jung, Yun Keun Lee, Jeon Gue Park, Jeom Ja Kang, Hoon Chung, Sung Joo Lee, Byung Ok Kang, Ji Hyun Wang, Eui Sok Chung, Hyung-Bae Jeon, Jong Jin Kim
  • Publication number: 20130006635
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventor: Hagai Aronowitz
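The feature-vector extension can be sketched with per-dimension Gaussians standing in for the pre-trained acoustic models; the names and the diagonal-Gaussian simplification are assumptions of this sketch:

```python
import math

# Hypothetical sketch of the extended feature vector: per-frame features are
# augmented with log-likelihood ratios (LLRs) of pre-trained speaker models
# against a background population model.

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extend_features(frame, speaker_models, background):
    """frame: list of floats; each model: list of (mean, var) per dimension."""
    llrs = []
    for model in speaker_models:
        llr = sum(log_gauss(x, m, v) - log_gauss(x, bm, bv)
                  for x, (m, v), (bm, bv) in zip(frame, model, background))
        llrs.append(llr)
    return list(frame) + llrs  # original features plus one LLR per model
```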
  • Publication number: 20130006634
    Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.
    Type: Application
    Filed: January 6, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
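A minimal sketch of the context-restricted comparison, with hypothetical names and a scoring callback standing in for real speech models:

```python
# Hypothetical sketch: an inferred context selects a social graph, the graph
# restricts the candidate speakers, and only those speakers' models are
# compared against the processed signal.

def identify_speaker(signal, context, social_graphs, models, score):
    """social_graphs: context -> set of speakers; models: speaker -> model."""
    candidates = social_graphs.get(context, set())
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(signal, models[s]))
```

A speaker outside the active social graph can never be returned, which is the accuracy mechanism the abstract describes: shrinking the comparison set reduces confusable alternatives.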
  • Publication number: 20130006636
    Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors whose elements are numerical features representing the characteristics of words having specific meanings and of the surrounding words, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words from which the feature vectors were generated by applying the specified extraction rules to the feature vectors.
    Type: Application
    Filed: March 24, 2011
    Publication date: January 3, 2013
    Applicant: NEC CORPORATION
    Inventors: Hironori Mizuguchi, Dai Kusui
  • Publication number: 20130006633
    Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. For each signal, the signal may be associated with value/s for one or more features (e.g., Mel-Frequency Cepstral coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominate voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominate cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.
    Type: Application
    Filed: January 5, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
  • Patent number: 8306820
    Abstract: Speech is recognized using a predefinable vocabulary that is partitioned into sections of phonetically similar words. In the recognition process, oral input is first associated with one of the sections; the oral input is then determined from the vocabulary of the associated section.
    Type: Grant
    Filed: October 4, 2005
    Date of Patent: November 6, 2012
    Assignee: Siemens Aktiengesellschaft
    Inventor: Niels Kunstmann
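The two-stage lookup can be sketched with a crude phonetic key standing in for the phonetic-similarity partitioning; the key function and names are illustrative assumptions:

```python
# Hypothetical sketch of two-stage recognition over a partitioned vocabulary:
# input is first mapped to a section of phonetically similar words, then
# matched only against that section's entries.

def phonetic_key(word):
    """Toy key: first letter plus the word's consonant skeleton."""
    consonants = "".join(c for c in word[1:] if c not in "aeiou")
    return (word[0] + consonants)[:3]

def build_sections(vocabulary):
    sections = {}
    for w in vocabulary:
        sections.setdefault(phonetic_key(w), []).append(w)
    return sections

def recognize(sections, heard, similarity):
    section = sections.get(phonetic_key(heard), [])   # stage 1: pick section
    # stage 2: search only within the associated section
    return max(section, key=lambda w: similarity(heard, w), default=None)
```

The payoff is that the second stage scores only a small section rather than the whole vocabulary.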
  • Patent number: 8301443
    Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: October 30, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
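Cohort generation by shared attribute reduces to an inverted index; the minimum-size criterion below stands in for the abstract's unspecified cohort criteria:

```python
# Hypothetical sketch of audio cohort generation: objects carry audio
# attributes extracted as metadata, and a cohort is the set of objects
# sharing a given attribute, subject to a cohort criterion.

def generate_cohorts(objects, min_size=2):
    """objects: mapping of object id -> set of audio attributes."""
    by_attribute = {}
    for obj, attrs in objects.items():
        for attr in attrs:
            by_attribute.setdefault(attr, set()).add(obj)
    # apply the cohort criterion: keep attributes shared by enough objects
    return {attr: members for attr, members in by_attribute.items()
            if len(members) >= min_size}
```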
  • Patent number: 8290170
    Abstract: Speech dereverberation is achieved by accepting an observed signal for initialization (1000) and performing likelihood maximization (2000), which includes Fourier transforms (4000).
    Type: Grant
    Filed: May 1, 2006
    Date of Patent: October 16, 2012
    Assignees: Nippon Telegraph and Telephone Corporation, Georgia Tech Research Corporation
    Inventors: Tomohiro Nakatani, Biing-Hwang Juang
  • Patent number: 8275608
    Abstract: A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: September 25, 2012
    Assignee: Xerox Corporation
    Inventors: Julien Ah-Pine, Guillaume Jacquet
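The clique-then-cluster pipeline can be sketched with Jaccard similarity over feature sets; the greedy merge below stands in for whatever hard clustering algorithm an embodiment would actually use:

```python
# Hypothetical sketch of the two-step soft clustering: items are first grouped
# into overlapping (non-exclusive) cliques by pairwise feature similarity,
# then the cliques are merged by a hard clustering pass over clique overlap.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def build_cliques(items, threshold):
    """items: id -> feature set. Each item anchors one clique; an item may
    appear in several cliques, which is what makes the clustering 'soft'."""
    return [frozenset(j for j in items
                      if jaccard(items[i], items[j]) >= threshold)
            for i in items]

def hard_cluster(cliques, threshold):
    """Greedy merge: attach each clique to the first similar cluster."""
    clusters = []
    for clique in cliques:
        for cluster in clusters:
            if jaccard(clique, cluster[0]) >= threshold:
                cluster.append(clique)
                break
        else:
            clusters.append([clique])
    return [frozenset().union(*c) for c in clusters]
```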