Clustering Patents (Class 704/245)
  • Publication number: 20140195232
    Abstract: Embodiments provide a method and system of text-independent speaker recognition with complexity comparable to a text-dependent version. The scheme exploits the fact that speech is a quasi-stationary signal and simplifies the recognition process accordingly. The modeling allows the speaker profile to be updated progressively with new speech samples acquired during usage.
    Type: Application
    Filed: April 1, 2013
    Publication date: July 10, 2014
    Applicant: STMicroelectronics Asia Pacific Pte Ltd.
    Inventor: STMicroelectronics Asia Pacific Pte Ltd.
  • Patent number: 8762156
    Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: June 24, 2014
    Assignee: Apple Inc.
    Inventor: Lik Harry Chen
  • Publication number: 20140172427
    Abstract: A method for processing messages pertaining to an event includes receiving a plurality of messages pertaining to the event from electronic communication devices associated with a plurality of observers of the event, generating a first message stream that includes only a portion of the plurality of messages corresponding to a first participant in the event, identifying a first sub-event in the first message stream with reference to a time distribution of messages and content distribution of messages in the first message stream, generating a sub-event summary with reference to a portion of the plurality of messages in the first message stream that are associated with the first sub-event, and transmitting the sub-event summary to a plurality of electronic communication devices associated with a plurality of users who are not observers of the event.
    Type: Application
    Filed: December 13, 2013
    Publication date: June 19, 2014
    Applicant: Robert Bosch GmbH
    Inventors: Fei Liu, Fuliang Weng, Chao Shen, Lin Zhao
  • Patent number: 8712771
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: April 29, 2014
    Inventor: Alon Konchitsky
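The monitor–analyze–threshold–classify loop described in the abstract above can be sketched as follows. The chosen signal component (zero-crossing rate) and the threshold value are illustrative stand-ins, not the components or thresholds claimed in the patent:

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_audio(frames, zcr_threshold=0.25):
    """Compare each frame's analyzed component against a pre-determined
    threshold and classify the frame as speech or music."""
    return ["speech" if zero_crossing_rate(f) > zcr_threshold else "music"
            for f in frames]
```

A rapidly oscillating frame exceeds the threshold and is labeled speech; a slowly varying one falls below it and is labeled music.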
  • Patent number: 8700402
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: April 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
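The belief update in the inner loop above, where the estimated new belief is a product of ASR reliability, user action likelihood, and the original belief, can be illustrated with a toy sketch. The numeric values, the normalization, and the pooling of low-probability partitions into a single remainder partition are assumptions for illustration, not the patent's recombination procedure:

```python
def update_beliefs(partitions, asr_reliability, action_likelihood, max_partitions):
    """partitions: dict partition_id -> belief.
    Update each belief as reliability * likelihood * original belief,
    then recombine down to a fixed number of partitions."""
    updated = {p: asr_reliability * action_likelihood.get(p, 0.0) * b
               for p, b in partitions.items()}
    total = sum(updated.values()) or 1.0
    updated = {p: b / total for p, b in updated.items()}
    # Illustrative recombination: keep the most probable partitions and
    # pool the remainder into one catch-all partition ("rest").
    ranked = sorted(updated.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:max_partitions - 1])
    kept["rest"] = sum(b for _, b in ranked[max_partitions - 1:])
    return kept
```

After each N-best candidate is processed this way, the partition count stays fixed regardless of how many splits occurred.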
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Patent number: 8688453
    Abstract: According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sachindra Joshi, Shantanu Godbole
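The map-and-select stage of the analyzer above can be sketched minimally. The keyword-overlap scoring and the lexicon are hypothetical stand-ins for the patent's word extraction rules and candidate-intent scoring:

```python
def select_intent(word_groups, intent_lexicon):
    """Map each grouping of related words to candidate intent values by
    keyword overlap, then select the highest-scoring candidate as the
    intended meaning of the utterance."""
    candidates = []
    for group in word_groups:
        for intent, keywords in intent_lexicon.items():
            score = len(set(group) & set(keywords))
            if score:
                candidates.append((intent, score))
    return max(candidates, key=lambda c: c[1])[0] if candidates else None
```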
  • Patent number: 8682654
    Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: March 25, 2014
    Assignee: Cyberlink Corp.
    Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
  • Patent number: 8661515
    Abstract: An audible authentication of a wireless device for enrollment onto a secure wireless network includes an unauthorized wireless device that audibly emits a uniquely identifying secret code (e.g., a personal identification number (PIN)). In some implementations, the audible code is heard by the user and manually entered via a network-enrollment user interface. In other implementations, a network-authorizing device automatically picks up the audible code and verifies the code. If verified, the wireless device is enrolled onto the wireless network.
    Type: Grant
    Filed: May 10, 2010
    Date of Patent: February 25, 2014
    Assignee: Intel Corporation
    Inventors: Marc Meylemans, Gary A. Martz, Jr.
  • Patent number: 8645136
    Abstract: A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group.
    Type: Grant
    Filed: July 20, 2010
    Date of Patent: February 4, 2014
    Assignee: Intellisist, Inc.
    Inventor: David Milstein
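The threshold-and-group flow described above can be sketched as follows; the data layout and the helper names are assumptions, and the grouping of similar questionable utterances (done here by caller-supplied indices) stands in for the patent's similarity comparison across calls:

```python
def flag_questionable(utterances, threshold):
    """utterances: list of (transcribed_value, recognition_score).
    Return indices of utterances whose score falls below the threshold."""
    return [i for i, (_, score) in enumerate(utterances) if score < threshold]

def propagate_manual_value(transcript, group, manual_value):
    """Assign one common manual transcription value to every similar
    questionable utterance in the group."""
    for idx in group:
        transcript[idx] = manual_value
    return transcript
```

One manual correction thus repairs every member of the group, which is the source of the efficiency gain the abstract claims.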
  • Patent number: 8635065
    Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: January 21, 2014
    Assignee: Sony Deutschland GmbH
    Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
  • Patent number: 8630853
    Abstract: A speech classification apparatus includes a speech classification probability calculation unit that calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model, and a parameter updating unit that successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation unit.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
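The posterior-then-update cycle above can be sketched with a 1-D Gaussian mixture standing in for the generative model; the learning-rate update rule is an illustrative stand-in for the patent's successive parameter estimation:

```python
import math

def cluster_posteriors(x, clusters):
    """Probability that sample x belongs to each cluster under a 1-D
    Gaussian mixture (the generative model)."""
    likes = [c["weight"] * math.exp(-(x - c["mean"]) ** 2 / (2 * c["var"]))
             / math.sqrt(2 * math.pi * c["var"]) for c in clusters]
    total = sum(likes)
    return [l / total for l in likes]

def online_update(x, clusters, lr=0.05):
    """Successively re-estimate mixture parameters from the latest sample,
    weighting each cluster's update by the sample's posterior."""
    post = cluster_posteriors(x, clusters)
    for c, p in zip(clusters, post):
        c["weight"] += lr * (p - c["weight"])
        c["mean"] += lr * p * (x - c["mean"])
    return post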
  • Patent number: 8612224
    Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are upd
    Type: Grant
    Filed: August 23, 2011
    Date of Patent: December 17, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
  • Patent number: 8606580
    Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Publication number: 20130325472
    Abstract: Some aspects include transforming data, at least a portion of which has been processed to determine frequency information associated with features in the data. Techniques include determining a first transformation based, at least in part, on the frequency information, applying at least the first transformation to the data to obtain transformed data, and fitting a plurality of clusters to the transformed data to obtain a plurality of established clusters. Some aspects include classifying input data by transforming the input data using at least the first transformation and comparing the transformed input data to the established clusters.
    Type: Application
    Filed: August 8, 2012
    Publication date: December 5, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Leonid Rachevsky, Dimitri Kanevsky, Bhuvana Ramabhadran
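A minimal sketch of the transform-then-compare pipeline above, using per-feature standardization as a stand-in for the frequency-derived transformation and nearest-centroid matching against the established clusters:

```python
def standardize(rows, means, stds):
    """Apply a per-feature linear transformation to the data
    (standing in for the frequency-informed transform)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def nearest_cluster(point, centroids):
    """Classify a transformed point by its closest established cluster."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))
```

Classification applies the same transformation to new input and compares the result to the centroids fitted on the training data.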
  • Patent number: 8600750
    Abstract: In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
    Type: Grant
    Filed: June 8, 2010
    Date of Patent: December 3, 2013
    Assignee: Cisco Technology, Inc.
    Inventors: Michael A. Ramalho, Todd C. Tatum, Shantanu Sarkar
  • Publication number: 20130311183
    Abstract: This invention provides a voiced sound interval detection device which enables appropriate detection of a voiced sound interval of an observation signal even when a volume of sound from a sound source varies or when the number of sound sources is unknown or when different kinds of microphones are used together.
    Type: Application
    Filed: January 25, 2012
    Publication date: November 21, 2013
    Applicant: NEC CORPORATION
    Inventor: Yoshifumi Onishi
  • Patent number: 8583432
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
    Type: Grant
    Filed: July 25, 2012
    Date of Patent: November 12, 2013
    Assignee: International Business Machines Corporation
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
  • Patent number: 8554563
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
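The extended feature vector above can be sketched with scalar Gaussians standing in for the pre-trained acoustic models and the background population model; real systems would use multivariate mixtures over cepstral features:

```python
import math

def log_likelihood(x, mean, var):
    """Log-density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extended_feature(base_features, x, speaker_models, background):
    """Extend a frame's acoustic feature vector with the log-likelihood
    ratio of each pre-trained model against the background model."""
    llrs = [log_likelihood(x, m["mean"], m["var"])
            - log_likelihood(x, background["mean"], background["var"])
            for m in speaker_models]
    return list(base_features) + llrs
```

Segmentation and clustering then operate on these extended vectors, so frames matching a known speaker carry that evidence explicitly.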
  • Patent number: 8554561
    Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
    Type: Grant
    Filed: August 9, 2012
    Date of Patent: October 8, 2013
    Assignee: Google Inc.
    Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
  • Patent number: 8554562
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Grant
    Filed: November 15, 2009
    Date of Patent: October 8, 2013
    Assignee: Nuance Communications, Inc.
    Inventor: Hagai Aronowitz
  • Patent number: 8553991
    Abstract: A clustering processing apparatus comprises: N clustering units that group samples included in the data block into clusters, each clustering unit sequentially taking each sample as a target, grouping the target sample into one of the clusters within the data block, storing cluster information including identification on each cluster into which the samples are grouped within the data block, and storing sample assignment information indicating the cluster to which the target sample belongs; a cluster information transferring unit that selects cluster information on a cluster to be integrated from the cluster information when a predetermined condition is met, and transfers the selected cluster information to a third storage unit; and an updating unit that integrates clusters selected based on the cluster information stored in the third storage unit into an integrated cluster, and updates the sample assignment information based on information of the integrated clusters.
    Type: Grant
    Filed: June 15, 2011
    Date of Patent: October 8, 2013
    Assignee: Canon Kabushiki Kaisha
    Inventor: Satoshi Naito
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20130253931
    Abstract: A modeling device and method for speaker recognition and a speaker recognition system are provided. The modeling device comprises a front end which receives enrollment speech data from each target speaker, a reference anchor set generation unit which generates a reference anchor set using the enrollment speech data based on an anchor space, and a voice print generation unit which generates voice prints based on the reference anchor set and the enrollment speech data. By taking the enrollment speech and a speaker adaptation technique into account, smaller anchor models can be generated, so reliable and robust speaker recognition with a smaller reference anchor set is possible. This yields substantial gains in computation speed and memory usage.
    Type: Application
    Filed: December 10, 2010
    Publication date: September 26, 2013
    Inventors: Haifeng Shen, Long Ma, Bingqi Zhang
  • Patent number: 8543402
    Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
    Type: Grant
    Filed: April 29, 2011
    Date of Patent: September 24, 2013
    Assignee: The Intellisis Corporation
    Inventor: Jiyong Ma
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
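The dynamic-programming boundary search above can be illustrated in its simplest one-boundary form, minimizing within-segment squared error over an already-projected 1-D feature sequence; the LDA projection and the multi-boundary generalization are omitted:

```python
def best_boundary(values):
    """Exhaustively score every split point of a 1-D sequence and return
    the one minimizing total within-segment squared error (a one-boundary
    instance of the DP search for optimal cluster boundaries)."""
    def sse(seg):
        if not seg:
            return 0.0
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)
    costs = [sse(values[:k]) + sse(values[k:]) for k in range(1, len(values))]
    return costs.index(min(costs)) + 1
```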
  • Patent number: 8515750
    Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
    Type: Grant
    Filed: September 19, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Xin Lei, Petar Aleksic
  • Publication number: 20130191126
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Application
    Filed: January 20, 2012
    Publication date: July 25, 2013
    Applicant: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Patent number: 8494849
    Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
    Type: Grant
    Filed: June 20, 2005
    Date of Patent: July 23, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
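The frame-grouping and marker computation above map naturally to a short sketch; the VAD threshold and the transmit criterion are illustrative parameters, not values from the patent:

```python
def multiframe_markers(vad_values, frames_per_multiframe, active_threshold=0.5):
    """Group per-frame voice activity values into multiframes and mark each
    multiframe with the count of frames representing speech activity."""
    markers = []
    for i in range(0, len(vad_values), frames_per_multiframe):
        chunk = vad_values[i:i + frames_per_multiframe]
        markers.append(sum(1 for v in chunk if v > active_threshold))
    return markers

def frames_to_transmit(vad_values, frames_per_multiframe, min_active=1):
    """Selectively keep only multiframes whose activity marker meets
    a minimum, i.e. those worth sending to the remote recognizer."""
    markers = multiframe_markers(vad_values, frames_per_multiframe)
    return [i for i, m in enumerate(markers) if m >= min_active]
```

Silent multiframes are dropped before transmission, which is the bandwidth saving the abstract describes.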
  • Patent number: 8484024
    Abstract: Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.
    Type: Grant
    Filed: February 24, 2011
    Date of Patent: July 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Patent number: 8478587
    Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgement target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: July 2, 2013
    Assignee: Panasonic Corporation
    Inventors: Takashi Kawamura, Ryouichi Kawanishi
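The sliding judgment-target-section logic above reduces to a majority vote over per-unit category labels; this sketch assumes the per-unit classification has already been done and uses a simple most-frequent-category rule:

```python
from collections import Counter

def judge_environment(categories, section_len):
    """categories: per-unit-of-time environmental sound labels.
    For each judgment target section (a sliding window of consecutive
    units), judge the surrounding environment as the category with the
    largest number of partial audio signals in the section."""
    judgments = []
    for end in range(section_len, len(categories) + 1):
        window = categories[end - section_len:end]
        judgments.append(Counter(window).most_common(1)[0][0])
    return judgments
```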
  • Patent number: 8478589
    Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
    Type: Grant
    Filed: January 5, 2005
    Date of Patent: July 2, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
  • Patent number: 8457968
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: June 4, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
  • Patent number: 8447604
    Abstract: Provided in some embodiments is a method including receiving ordered script words that are indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: May 21, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Walter W. Chang
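The hard-alignment step lends itself to a short sketch using Python's standard `difflib`; `hard_alignment_points`, `sub_matrices`, and the minimum run length are assumptions of this sketch, not the patented implementation:

```python
from difflib import SequenceMatcher

def hard_alignment_points(script_words, dialogue_words, min_run=2):
    """Matching runs of consecutive words serve as hard alignment points."""
    sm = SequenceMatcher(a=script_words, b=dialogue_words, autojunk=False)
    return [(m.a, m.b, m.size) for m in sm.get_matching_blocks()
            if m.size >= min_run]

def sub_matrices(points):
    """Regions between adjacent hard points, to be aligned independently."""
    regions = []
    for (a1, b1, s1), (a2, b2, _) in zip(points, points[1:]):
        regions.append(((a1 + s1, a2), (b1 + s1, b2)))
    return regions
```

Each returned region pairs a span of unmatched script words with the dialogue words between the same two anchors, so the expensive fine-grained alignment only runs inside small sub-matrices.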
  • Patent number: 8438026
    Abstract: The invention describes a method and a system for generating training data (DT) for an automatic speech recogniser (2) for operating at a particular first sampling frequency (fH), comprising steps of deriving spectral characteristics (SL) from audio data (DL) sampled at a second frequency (fL) lower than the first sampling frequency (fH), extending the bandwidth of the spectral characteristics (SL) by retrieving bandwidth extending information (IBE) from a codebook (6), and processing the bandwidth extended spectral characteristics (SLE) to give the required training data (DT). Moreover, a method and a system (5) for generating a codebook (6) for extending the bandwidth of spectral characteristics (SL) for audio data (DL) sampled at a second sampling frequency (fL) to spectral characteristics (SH) for a first sampling frequency (fH) higher than the second sampling frequency (fL) are described.
    Type: Grant
    Filed: February 10, 2005
    Date of Patent: May 7, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Alexander Fischer, Rolf Dieter Bippus
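The codebook lookup at the heart of this scheme can be illustrated with a toy codebook mapping narrowband spectral tuples to stored high-band vectors; the names and the Euclidean nearest-neighbour criterion are assumptions of this sketch:

```python
# Hypothetical sketch of codebook-based bandwidth extension: spectral vectors
# derived from low-rate audio are extended by looking up the nearest codebook
# entry and appending its stored high-band information.

def nearest_entry(codebook, narrowband):
    """Find the codebook key closest (squared Euclidean) to the input."""
    def dist(key):
        return sum((k - x) ** 2 for k, x in zip(key, narrowband))
    return min(codebook, key=dist)

def extend_bandwidth(codebook, narrowband):
    """Append the retrieved high-band information to the narrowband vector."""
    return list(narrowband) + list(codebook[nearest_entry(codebook, narrowband)])
```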
  • Patent number: 8438029
    Abstract: Disclosed are apparatus and methods for generating synthesized utterances. A computing device can receive speech data corresponding to spoken utterances of a particular speaker. Textual elements of an input text corresponding to the speech data can be recognized. Confidence levels associated with the recognized textual elements can be determined. Speech-synthesis parameters of decision trees can be adapted based on the speech data, recognized textual elements, and associated confidence levels. Each adapted decision tree can map individual elements of a text to individual speech-synthesis parameters. A second input text can be received. The second input text can be mapped to speech-synthesis parameters using the adapted decision trees. A synthesized spoken utterance can be generated corresponding to the second input text using the speech-synthesis parameters. At least some of the speech-synthesis parameters are configured to simulate the particular speaker.
    Type: Grant
    Filed: August 22, 2012
    Date of Patent: May 7, 2013
    Assignee: Google Inc.
    Inventors: Matthew Nicholas Stuttle, Byungha Chun
  • Patent number: 8423354
    Abstract: A device extracts prosodic information, including a power value, from speech data; extracts from the speech data an utterance section comprising a period whose power value is equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data segment by phoneme recognition; generates clusters, each a set of classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters whose evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening-target speech data.
    Type: Grant
    Filed: November 5, 2010
    Date of Patent: April 16, 2013
    Assignee: Fujitsu Limited
    Inventor: Sachiko Onodera
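The representative-selection step can be illustrated with a medoid over edit distance; the evaluation function, the threshold, and the medoid criterion are assumptions for this sketch, not the patented measures:

```python
# Hypothetical sketch: clusters of phoneme sequences are scored, clusters above
# a threshold become candidates, and each candidate's representative is chosen
# as the medoid (minimum total edit distance to the other members).

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def representatives(clusters, evaluate, threshold):
    """clusters: name -> list of phoneme sequences."""
    reps = {}
    for name, seqs in clusters.items():
        if evaluate(seqs) >= threshold:          # candidate cluster
            reps[name] = min(seqs, key=lambda s: sum(edit_distance(s, t)
                                                     for t in seqs))
    return reps
```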
  • Patent number: 8390574
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In response to an ambiguous editing input at a location preceding at least a portion of an output word, the software performs one disambiguation operation with respect to the editing input and another disambiguation operation with respect to the editing input in combination with the at least portion of the output word. The results are output in order of decreasing frequency value, with the results of the one disambiguation operation having the portion of the output word appended thereto.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: March 5, 2013
    Assignee: Research In Motion Limited
    Inventors: Michael Elizarov, Vadim Fux, Dan Rubanovich
  • Patent number: 8386251
    Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user increasing an overall system efficiency and accuracy.
    Type: Grant
    Filed: June 8, 2009
    Date of Patent: February 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Nikko Strom, Julian Odell, Jon Hamaker
  • Patent number: 8386238
    Abstract: A sequence of characters may be evaluated to determine the presence of a natural language word. The sequence of characters may be analyzed to find a subsequence of alphabetical characters. Based on a statistical model of a natural language, a probability that the subsequence is a natural language word may be calculated. The probability may then be used to determine if the subsequence is indeed a natural language word.
    Type: Grant
    Filed: November 5, 2008
    Date of Patent: February 26, 2013
    Assignee: Citrix Systems, Inc.
    Inventor: Anthony Spataro
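The statistical test described above can be illustrated with a character-bigram model and add-one smoothing; the model form, the boundary markers, and the length normalisation are assumptions of this sketch rather than the patented method:

```python
import math

# Hypothetical sketch: a character-bigram model of a natural language scores a
# candidate subsequence; a high average log-probability suggests the
# alphabetical subsequence is a real word rather than random characters.

def train_bigrams(words, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Count character bigrams, with ^ and $ marking word boundaries."""
    counts = {}
    for w in words:
        for a, b in zip("^" + w, w + "$"):
            counts[a, b] = counts.get((a, b), 0) + 1
    return counts, len(alphabet) + 2  # smoothing denominator size

def word_log_prob(counts, vocab, candidate):
    """Length-normalised log-probability under the bigram model."""
    totals = {}
    for (a, _), c in counts.items():
        totals[a] = totals.get(a, 0) + c
    lp = 0.0
    for a, b in zip("^" + candidate, candidate + "$"):
        # add-one smoothing so unseen bigrams get small nonzero probability
        lp += math.log((counts.get((a, b), 0) + 1) / (totals.get(a, 0) + vocab))
    return lp / (len(candidate) + 1)
```

A threshold on the normalised score then decides whether the subsequence is treated as a natural language word.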
  • Patent number: 8374869
    Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: February 12, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
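The final comparison step reduces to a small predicate; the conjunction of the two tests is an assumption of this sketch (the abstract's "correspond to acceptance" condition may be defined differently in the embodiments):

```python
# Hypothetical sketch of utterance verification: an N-best hypothesis is
# accepted when its confidence score clears the threshold and its
# inter-phoneme distance clears the mean distance.

def verify(confidence, distance, threshold, mean_distance):
    return confidence >= threshold and distance >= mean_distance

def filter_nbest(hypotheses, threshold, mean_distance):
    """hypotheses: list of (word, confidence, distance); keep accepted words."""
    return [w for w, c, d in hypotheses
            if verify(c, d, threshold, mean_distance)]
```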
  • Patent number: 8364483
    Abstract: A method for separating a sound source from a mixed signal includes transforming the mixed signal to channel signals in the frequency domain, and grouping several frequency bands of each channel signal to form frequency clusters. The method further includes separating the frequency clusters by applying blind source separation to the frequency-domain signals of each frequency cluster, and integrating the spectra of the separated signals to restore the sound sources in the time domain, wherein each separated signal expresses one sound source.
    Type: Grant
    Filed: June 19, 2009
    Date of Patent: January 29, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Ki-young Park, Ho-Young Jung, Yun Keun Lee, Jeon Gue Park, Jeom Ja Kang, Hoon Chung, Sung Joo Lee, Byung Ok Kang, Ji Hyun Wang, Eui Sok Chung, Hyung-Bae Jeon, Jong Jin Kim
  • Publication number: 20130006635
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventor: Hagai Aronowitz
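The feature-vector extension can be sketched with per-dimension Gaussians standing in for the pre-trained acoustic models; the names and the diagonal-Gaussian simplification are assumptions of this sketch:

```python
import math

# Hypothetical sketch of the extended feature vector: per-frame features are
# augmented with log-likelihood ratios (LLRs) of pre-trained speaker models
# against a background population model.

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extend_features(frame, speaker_models, background):
    """frame: list of floats; each model: list of (mean, var) per dimension."""
    llrs = []
    for model in speaker_models:
        llr = sum(log_gauss(x, m, v) - log_gauss(x, bm, bv)
                  for x, (m, v), (bm, bv) in zip(frame, model, background))
        llrs.append(llr)
    return list(frame) + llrs  # original features plus one LLR per model
```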
  • Publication number: 20130006634
    Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.
    Type: Application
    Filed: January 6, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
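A minimal sketch of the context-restricted comparison, with hypothetical names and a scoring callback standing in for real speech models:

```python
# Hypothetical sketch: an inferred context selects a social graph, the graph
# restricts the candidate speakers, and only those speakers' models are
# compared against the processed signal.

def identify_speaker(signal, context, social_graphs, models, score):
    """social_graphs: context -> set of speakers; models: speaker -> model."""
    candidates = social_graphs.get(context, set())
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(signal, models[s]))
```

A speaker outside the active social graph can never be returned, which is the accuracy mechanism the abstract describes: shrinking the comparison set reduces confusable alternatives.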
  • Publication number: 20130006636
    Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors whose elements are numerical features representing the characteristics of words having specific meanings and of the surrounding words, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words from which the feature vectors were generated by applying the specified extraction rules to the feature vectors.
    Type: Application
    Filed: March 24, 2011
    Publication date: January 3, 2013
    Applicant: NEC CORPORATION
    Inventors: Hironori Mizuguchi, Dai Kusui
  • Publication number: 20130006633
    Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. For each signal, the signal may be associated with value/s for one or more features (e.g., Mel-Frequency Cepstral coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominate voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominate cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.
    Type: Application
    Filed: January 5, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
  • Patent number: 8306820
    Abstract: Speech is recognized using a predefinable vocabulary that is partitioned into sections of phonetically similar words. In the recognition process, oral input is first associated with one of the sections; the oral input is then determined from the vocabulary of the associated section.
    Type: Grant
    Filed: October 4, 2005
    Date of Patent: November 6, 2012
    Assignee: Siemens Aktiengesellschaft
    Inventor: Niels Kunstmann
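The two-stage lookup can be sketched with a crude phonetic key standing in for the phonetic-similarity partitioning; the key function and names are illustrative assumptions:

```python
# Hypothetical sketch of two-stage recognition over a partitioned vocabulary:
# input is first mapped to a section of phonetically similar words, then
# matched only against that section's entries.

def phonetic_key(word):
    """Toy key: first letter plus the word's consonant skeleton."""
    consonants = "".join(c for c in word[1:] if c not in "aeiou")
    return (word[0] + consonants)[:3]

def build_sections(vocabulary):
    sections = {}
    for w in vocabulary:
        sections.setdefault(phonetic_key(w), []).append(w)
    return sections

def recognize(sections, heard, similarity):
    section = sections.get(phonetic_key(heard), [])   # stage 1: pick section
    # stage 2: search only within the associated section
    return max(section, key=lambda w: similarity(heard, w), default=None)
```

The payoff is that the second stage scores only a small section rather than the whole vocabulary.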
  • Patent number: 8301443
    Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: October 30, 2012
    Assignee: International Business Machines Corporation
    Inventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
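Cohort generation by shared attribute reduces to an inverted index; the minimum-size criterion below stands in for the abstract's unspecified cohort criteria:

```python
# Hypothetical sketch of audio cohort generation: objects carry audio
# attributes extracted as metadata, and a cohort is the set of objects
# sharing a given attribute, subject to a cohort criterion.

def generate_cohorts(objects, min_size=2):
    """objects: mapping of object id -> set of audio attributes."""
    by_attribute = {}
    for obj, attrs in objects.items():
        for attr in attrs:
            by_attribute.setdefault(attr, set()).add(obj)
    # apply the cohort criterion: keep attributes shared by enough objects
    return {attr: members for attr, members in by_attribute.items()
            if len(members) >= min_size}
```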
  • Patent number: 8290170
    Abstract: Speech dereverberation is achieved by accepting an observed signal for initialization (1000) and performing likelihood maximization (2000), which includes Fourier transforms (4000).
    Type: Grant
    Filed: May 1, 2006
    Date of Patent: October 16, 2012
    Assignees: Nippon Telegraph and Telephone Corporation, Georgia Tech Research Corporation
    Inventors: Tomohiro Nakatani, Biing-Hwang Juang
  • Patent number: 8275608
    Abstract: A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.
    Type: Grant
    Filed: July 3, 2008
    Date of Patent: September 25, 2012
    Assignee: Xerox Corporation
    Inventors: Julien Ah-Pine, Guillaume Jacquet
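The clique-then-cluster pipeline can be sketched with Jaccard similarity over feature sets; the greedy merge below stands in for whatever hard clustering algorithm an embodiment would actually use:

```python
# Hypothetical sketch of the two-step soft clustering: items are first grouped
# into overlapping (non-exclusive) cliques by pairwise feature similarity,
# then the cliques are merged by a hard clustering pass over clique overlap.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def build_cliques(items, threshold):
    """items: id -> feature set. Each item anchors one clique; an item may
    appear in several cliques, which is what makes the clustering 'soft'."""
    return [frozenset(j for j in items
                      if jaccard(items[i], items[j]) >= threshold)
            for i in items]

def hard_cluster(cliques, threshold):
    """Greedy merge: attach each clique to the first similar cluster."""
    clusters = []
    for clique in cliques:
        for cluster in clusters:
            if jaccard(clique, cluster[0]) >= threshold:
                cluster.append(clique)
                break
        else:
            clusters.append([clique])
    return [frozenset().union(*c) for c in clusters]
```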