Clustering Patents (Class 704/245)
-
Publication number: 20140195232
Abstract: Embodiments provide a method and system of text independent speaker recognition with a complexity comparable to a text dependent version. The scheme exploits the fact that speech is a quasi-stationary signal and simplifies the recognition process based on this theory. The modeling allows the speaker profile to be updated progressively with the new speech sample that is acquired during usage time.
Type: Application
Filed: April 1, 2013
Publication date: July 10, 2014
Applicant: STMicroelectronics Asia Pacific Pte Ltd.
Inventor: STMicroelectronics Asia Pacific Pte Ltd.
-
Patent number: 8762156
Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
Type: Grant
Filed: September 28, 2011
Date of Patent: June 24, 2014
Assignee: Apple Inc.
Inventor: Lik Harry Chen
-
Publication number: 20140172427
Abstract: A method for processing messages pertaining to an event includes receiving a plurality of messages pertaining to the event from electronic communication devices associated with a plurality of observers of the event, generating a first message stream that includes only a portion of the plurality of messages corresponding to a first participant in the event, identifying a first sub-event in the first message stream with reference to a time distribution of messages and content distribution of messages in the first message stream, generating a sub-event summary with reference to a portion of the plurality of messages in the first message stream that are associated with the first sub-event, and transmitting the sub-event summary to a plurality of electronic communication devices associated with a plurality of users who are not observers of the event.
Type: Application
Filed: December 13, 2013
Publication date: June 19, 2014
Applicant: Robert Bosch GmbH
Inventors: Fei Liu, Fuliang Weng, Chao Shen, Lin Zhao
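The sub-event identification step above keys off the time distribution of messages: a burst of messages about one participant inside a short window signals a sub-event. A minimal sketch of that idea, with the content-distribution check omitted and the window and count thresholds chosen purely for illustration:

```python
def find_sub_events(timestamps, window, min_count):
    """Identify sub-events in a participant's message stream from the
    time distribution of messages: a burst of at least `min_count`
    messages inside one `window`-second bucket is treated as a
    sub-event. Thresholds are illustrative assumptions, not values
    from the publication."""
    buckets = {}
    for t in timestamps:
        # Assign each message to a fixed-width time bucket.
        buckets.setdefault(int(t // window), []).append(t)
    # Keep only buckets dense enough to count as a sub-event.
    return [msgs for _, msgs in sorted(buckets.items()) if len(msgs) >= min_count]

# Four messages in the first 30 s form a burst; later stragglers do not.
print(find_sub_events([0, 1, 2, 3, 60, 61, 120], window=30, min_count=3))
```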
-
Patent number: 8712771
Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
Type: Grant
Filed: October 31, 2013
Date of Patent: April 29, 2014
Inventor: Alon Konchitsky
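The monitor/analyze/threshold/classify pipeline described above can be sketched with one concrete (assumed) signal component: the variance of per-frame zero-crossing rates, which tends to be higher for speech than for steady music. The feature choice and threshold value are illustrative, not taken from the patent:

```python
import numpy as np

def classify_speech_or_music(signal, frame_len=400, zcr_var_threshold=0.002):
    """Classify an audio signal as 'speech' or 'music' by comparing a
    selected signal component against a pre-determined threshold.
    Here the component is the variance of per-frame zero-crossing
    rates: speech alternates voiced and unvoiced sounds, so its ZCR
    fluctuates more than that of most music."""
    n_frames = len(signal) // frame_len
    zcrs = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # Fraction of adjacent-sample sign changes in the frame.
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return "speech" if np.var(zcrs) > zcr_var_threshold else "music"

# A steady tone (music-like) has a nearly constant per-frame ZCR.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440 * t)
print(classify_speech_or_music(tone))  # prints "music"
```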
-
Patent number: 8700402
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
Type: Grant
Filed: June 4, 2013
Date of Patent: April 15, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Jason Williams
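The belief update stated in the abstract (estimated new belief = ASR reliability × user action likelihood × original belief) can be sketched directly; the split and recombination steps that keep the partition count fixed are omitted here, so this is only the update-and-renormalize core under assumed data structures:

```python
def update_belief(original_belief, asr_reliability, user_action_likelihood):
    """Estimated new belief for a partition of dialog states: the product
    named in the abstract."""
    return asr_reliability * user_action_likelihood * original_belief

def track_partitions(nbest, partitions):
    """Outer loop over the N-best ASR candidates; inner loop updates each
    partition's belief, then renormalizes so the beliefs stay a
    distribution. `nbest` is a list of (candidate, reliability) pairs,
    an assumed representation."""
    for candidate, reliability in nbest:
        for p in partitions:
            p["belief"] = update_belief(p["belief"], reliability,
                                        p["action_likelihood"][candidate])
        total = sum(p["belief"] for p in partitions)
        for p in partitions:
            p["belief"] /= total
    return partitions
```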
-
Patent number: 8700406
Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
Type: Grant
Filed: August 19, 2011
Date of Patent: April 15, 2014
Assignee: Qualcomm Incorporated
Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
-
Patent number: 8688453
Abstract: According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance.
Type: Grant
Filed: February 28, 2011
Date of Patent: April 1, 2014
Assignee: Nuance Communications, Inc.
Inventors: Sachindra Joshi, Shantanu Godbole
-
Patent number: 8682654
Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
Type: Grant
Filed: April 25, 2006
Date of Patent: March 25, 2014
Assignee: Cyberlink Corp.
Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
-
Patent number: 8661515
Abstract: An audible authentication of a wireless device for enrollment onto a secure wireless network includes an unauthorized wireless device that audibly emits a uniquely identifying secret code (e.g., a personal identification number (PIN)). In some implementations, the audible code is heard by the user and manually entered via a network-enrollment user interface. In other implementations, a network-authorizing device automatically picks up the audible code and verifies the code. If verified, the wireless device is enrolled onto the wireless network.
Type: Grant
Filed: May 10, 2010
Date of Patent: February 25, 2014
Assignee: Intel Corporation
Inventors: Marc Meylemans, Gary A. Martz, Jr.
-
Patent number: 8645136
Abstract: A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group.
Type: Grant
Filed: July 20, 2010
Date of Patent: February 4, 2014
Assignee: Intellisist, Inc.
Inventor: David Milstein
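The threshold-group-assign flow above can be sketched in a few lines. Grouping "similar" questionable utterances here is done by the recognizer's initial transcribed value, a simplifying assumption; the patent leaves the similarity measure open:

```python
def find_questionable_groups(calls, threshold):
    """Flag utterances whose recognition score falls below the
    threshold, then group similar questionable utterances across
    calls (similarity = identical initial transcribed value, an
    illustrative choice)."""
    questionable = [u for call in calls for u in call
                    if u["score"] < threshold]
    groups = {}
    for u in questionable:
        groups.setdefault(u["value"], []).append(u)
    return groups

def apply_manual_value(group, manual_value):
    """Assign the common manual transcription to every utterance in
    the group, including those never sent to a human transcriber."""
    for u in group:
        u["value"] = manual_value
```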
-
Patent number: 8635065
Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted…
Type: Grant
Filed: November 10, 2004
Date of Patent: January 21, 2014
Assignee: Sony Deutschland GmbH
Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
-
Patent number: 8630853
Abstract: A speech classification apparatus includes a speech classification probability calculation unit that calculates a probability (probability of classification into each cluster) that a latest one of the speech signals (speech data) belongs to each cluster based on a generative model which is a probability model, and a parameter updating unit that successively estimates parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation unit.
Type: Grant
Filed: March 13, 2008
Date of Patent: January 14, 2014
Assignee: NEC Corporation
Inventor: Takafumi Koshinaka
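The classification-probability calculation described above is, in essence, a posterior over clusters under a generative model. A minimal sketch with the generative model reduced to a one-dimensional Gaussian mixture (an illustrative simplification; the patent does not fix the model family):

```python
import math

def classification_probabilities(x, clusters):
    """Probability that the latest speech observation x belongs to each
    cluster under a 1-D Gaussian mixture generative model. Each cluster
    is a (weight, mean, variance) triple -- the parameters the
    parameter-updating unit would re-estimate as new data arrives."""
    likelihoods = []
    for w, mu, var in clusters:
        density = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        likelihoods.append(w * density)
    total = sum(likelihoods)
    # Normalize the weighted likelihoods into a posterior over clusters.
    return [l / total for l in likelihoods]
```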
-
Patent number: 8612224
Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are updated…
Type: Grant
Filed: August 23, 2011
Date of Patent: December 17, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
-
Patent number: 8606580
Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
Type: Grant
Filed: December 30, 2008
Date of Patent: December 10, 2013
Assignee: Asahi Kasei Kabushiki Kaisha
Inventors: Makoto Shozakai, Goshu Nagino
-
Publication number: 20130325472
Abstract: Some aspects include transforming data, at least a portion of which has been processed to determine frequency information associated with features in the data. Techniques include determining a first transformation based, at least in part, on the frequency information, applying at least the first transformation to the data to obtain transformed data, and fitting a plurality of clusters to the transformed data to obtain a plurality of established clusters. Some aspects include classifying input data by transforming the input data using at least the first transformation and comparing the transformed input data to the established clusters.
Type: Application
Filed: August 8, 2012
Publication date: December 5, 2013
Applicant: Nuance Communications, Inc.
Inventors: Leonid Rachevsky, Dimitri Kanevsky, Bhuvana Ramabhadran
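The abstract's pipeline (frequency-derived transformation, then cluster fitting, then classification against the established clusters) can be sketched with one concrete but assumed transformation, an inverse-document-frequency-style weighting; the publication itself does not specify the transformation:

```python
import numpy as np

def idf_transform(counts):
    """A first transformation based on frequency information: weight
    each feature by log-inverse frequency of the rows in which it
    appears (a TF-IDF-style stand-in for the unspecified transform)."""
    counts = np.asarray(counts, dtype=float)
    df = np.count_nonzero(counts, axis=0)          # per-feature frequency
    idf = np.log((1 + len(counts)) / (1 + df)) + 1.0
    return counts * idf

def nearest_cluster(x, centroids):
    """Classify a transformed input by comparing it to the established
    clusters: return the index of the nearest cluster centroid."""
    d = np.linalg.norm(np.asarray(centroids) - x, axis=1)
    return int(np.argmin(d))
```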
-
Patent number: 8600750
Abstract: In an example embodiment, there is disclosed herein an automatic speech recognition (ASR) system that employs speaker clustering (or speaker type) for transcribing audio. A large corpus of audio with corresponding transcripts is analyzed to determine a plurality of speaker types (e.g., dialects). The ASR system is trained for each speaker type. Upon encountering a new user, the ASR system attempts to map the user to a speaker type. After the new user is mapped to a speaker type, the ASR employs the speaker type for transcribing audio from the new user.
Type: Grant
Filed: June 8, 2010
Date of Patent: December 3, 2013
Assignee: Cisco Technology, Inc.
Inventors: Michael A. Ramalho, Todd C. Tatum, Shantanu Sarkar
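The "map the user to a speaker type" step above amounts to scoring the new user's audio against each speaker-type model and picking the best. A minimal sketch, where the scoring function is supplied by the caller (the patent does not commit to a particular score):

```python
def map_user_to_speaker_type(user_features, type_models, score):
    """Map a new user to the speaker type (e.g., dialect cluster) whose
    model scores the user's audio features highest; the ASR system then
    uses that type's trained model for transcription. `score(features,
    model)` is an assumed caller-supplied likelihood function."""
    return max(type_models, key=lambda t: score(user_features, type_models[t]))
```

For instance, with scalar "models" and negative absolute distance as the score, a user feature of 2.9 maps to the type whose model value is closest.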
-
Publication number: 20130311183
Abstract: This invention provides a voiced sound interval detection device which enables appropriate detection of a voiced sound interval of an observation signal even when the volume of sound from a sound source varies, when the number of sound sources is unknown, or when different kinds of microphones are used together.
Type: Application
Filed: January 25, 2012
Publication date: November 21, 2013
Applicant: NEC CORPORATION
Inventor: Yoshifumi Onishi
-
Patent number: 8583432
Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. One system for automatic speech recognition includes a dialect recognition unit and a controller. The dialect recognition unit is configured to analyze acoustic input data to identify portions of the acoustic input data that conform to a general language and to identify portions of the acoustic input data that conform to at least one dialect of the general language. In addition, the controller is configured to apply a general language model and at least one dialect language model to the input data to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions.
Type: Grant
Filed: July 25, 2012
Date of Patent: November 12, 2013
Assignee: International Business Machines Corporation
Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
-
Patent number: 8554563
Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
Type: Grant
Filed: September 11, 2012
Date of Patent: October 8, 2013
Assignee: Nuance Communications, Inc.
Inventor: Hagai Aronowitz
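The feature-vector extension described above is a simple concatenation: the frame's acoustic features plus one log-likelihood ratio per pre-trained speaker model against the background population model. A sketch, assuming the per-frame log-likelihood scores have already been computed by the models:

```python
import numpy as np

def extend_feature_vector(frame_features, speaker_scores, background_score):
    """Extend a frame's acoustic feature vector with the log-likelihood
    ratio of each pre-trained speaker model relative to a background
    population model. The resulting vector feeds the segmentation and
    clustering algorithms."""
    llrs = np.asarray(speaker_scores, dtype=float) - background_score
    return np.concatenate([np.asarray(frame_features, dtype=float), llrs])

# Hypothetical 13-dim acoustic features plus scores from two speaker models.
mfcc = np.zeros(13)
ext = extend_feature_vector(mfcc, [-40.2, -38.5], -39.0)
print(ext.shape)  # 13 features + one LLR per speaker model
```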
-
Patent number: 8554561
Abstract: A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
Type: Grant
Filed: August 9, 2012
Date of Patent: October 8, 2013
Assignee: Google Inc.
Inventors: Jeffrey A. Dean, Sanjay Ghemawat, Gautham Thambidorai
-
Patent number: 8554562
Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
Type: Grant
Filed: November 15, 2009
Date of Patent: October 8, 2013
Assignee: Nuance Communications, Inc.
Inventor: Hagai Aronowitz
-
Patent number: 8553991
Abstract: A clustering processing apparatus comprises: N clustering units that group samples included in the data block into clusters, each clustering unit sequentially taking each sample as a target, grouping the target sample into one of the clusters within the data block, storing cluster information including identification of each cluster into which the samples are grouped within the data block, and storing sample assignment information indicating the cluster to which the target sample belongs; a cluster information transferring unit that selects cluster information on a cluster to be integrated from the cluster information when a predetermined condition is met, and transfers the selected cluster information to a third storage unit; and an updating unit that integrates clusters selected based on the cluster information stored in the third storage unit into an integrated cluster, and updates the sample assignment information based on information of the integrated clusters.
Type: Grant
Filed: June 15, 2011
Date of Patent: October 8, 2013
Assignee: Canon Kabushiki Kaisha
Inventor: Satoshi Naito
-
Patent number: 8548807
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
Type: Grant
Filed: June 9, 2009
Date of Patent: October 1, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
-
Publication number: 20130253931
Abstract: A modeling device and method for speaker recognition and a speaker recognition system are provided. The modeling device comprises a front end which receives enrollment speech data from each target speaker, a reference anchor set generation unit which generates a reference anchor set using the enrollment speech data based on an anchor space, and a voice print generation unit which generates voice prints based on the reference anchor set and the enrollment speech data. With the present disclosure, by taking the enrollment speech and a speaker adaptation technique into account, anchor models of smaller size can be generated, so reliable and robust speaker recognition with a smaller reference anchor set is possible. This brings substantial improvements in computation speed and memory use.
Type: Application
Filed: December 10, 2010
Publication date: September 26, 2013
Inventors: Haifeng Shen, Long Ma, Bingqi Zhang
-
Patent number: 8543402
Abstract: System and methods for robust multiple speaker segmentation in noisy conversational speech are presented. Robust voice activity detection is applied to detect temporal speech events. In order to get robust speech features and detect speech events in a noisy environment, a noise reduction algorithm is applied, using noise tracking. After noise reduction and voice activity detection, the incoming audio/speech is initially labeled as speech segments or silence segments. With no prior knowledge of the number of speakers, the system identifies one reliable speech segment near the beginning of the conversational speech and extracts speech features with a short latency, then learns a statistical model from the selected speech segment. This initial statistical model is used to identify the succeeding speech segments in a conversation. The statistical model is also continuously adapted and expanded with newly identified speech segments that match well to the model.
Type: Grant
Filed: April 29, 2011
Date of Patent: September 24, 2013
Assignee: The Intellisis Corporation
Inventor: Jiyong Ma
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
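The dynamic-programming step above searches globally for the segment boundaries that best explain the projected features. A minimal sketch with the LDA projection assumed already done and "best" taken to mean minimum total within-segment variance of a 1-D feature sequence (an illustrative objective, not the patent's exact criterion):

```python
import numpy as np

def optimal_boundaries(x, n_segments):
    """Globally optimal segmentation of a 1-D feature sequence into
    n_segments pieces by dynamic programming, minimizing the total
    within-segment variance. Returns the n_segments-1 interior
    boundary indices."""
    n = len(x)

    def seg_cost(i, j):
        # Within-segment cost of x[i:j]: sum of squared deviations.
        seg = x[i:j]
        return float(np.sum((seg - seg.mean()) ** 2))

    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = dp[k - 1][i] + seg_cost(i, j)
                if c < dp[k][j]:
                    dp[k][j], back[k][j] = c, i
    # Recover boundaries by walking back through the table.
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        bounds.append(j)
    return sorted(bounds)[1:]  # drop the leading 0

x = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
print(optimal_boundaries(x, 2))  # prints [3]
```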
-
Patent number: 8515750
Abstract: Methods, systems, and computer programs encoded on a computer storage medium for real-time acoustic adaptation using stability measures are disclosed. The methods include the actions of receiving a transcription of a first portion of a speech session, wherein the transcription of the first portion of the speech session is generated using a speaker adaptation profile. The actions further include receiving a stability measure for a segment of the transcription and determining that the stability measure for the segment satisfies a threshold. Additionally, the actions include triggering an update of the speaker adaptation profile using the segment, or using a portion of speech data that corresponds to the segment. And the actions include receiving a transcription of a second portion of the speech session, wherein the transcription of the second portion of the speech session is generated using the updated speaker adaptation profile.
Type: Grant
Filed: September 19, 2012
Date of Patent: August 20, 2013
Assignee: Google Inc.
Inventors: Xin Lei, Petar Aleksic
-
Publication number: 20130191126
Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
Type: Application
Filed: January 20, 2012
Publication date: July 25, 2013
Applicant: Microsoft Corporation
Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
-
Patent number: 8494849
Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
Type: Grant
Filed: June 20, 2005
Date of Patent: July 23, 2013
Assignee: Telecom Italia S.p.A.
Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
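The frame-grouping and marker-based selection steps above can be sketched directly. The `min_active` rule for deciding which multiframes to transmit is an assumption; the patent only requires transmission to depend on the marker:

```python
def select_multiframes(frames, frames_per_multiframe, min_active):
    """Group per-frame voice-activity values into multiframes, compute a
    marker counting the speech-active frames in each, and keep only the
    multiframes whose marker clears a minimum -- i.e. the ones worth
    transmitting to the remote recognizer."""
    selected = []
    for i in range(0, len(frames), frames_per_multiframe):
        mf = frames[i:i + frames_per_multiframe]
        marker = sum(1 for f in mf if f["voice_activity"] > 0)
        if marker >= min_active:
            selected.append({"frames": mf, "marker": marker})
    return selected
```

With 4 frames per multiframe and `min_active=2`, an all-silence multiframe is dropped while mostly-speech multiframes are kept, saving uplink bandwidth.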
-
Patent number: 8484024
Abstract: Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.
Type: Grant
Filed: February 24, 2011
Date of Patent: July 9, 2013
Assignee: Nuance Communications, Inc.
Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
-
Voice analysis device, voice analysis method, voice analysis program, and system integration circuit
Patent number: 8478587
Abstract: A sound analysis device comprises: a sound parameter calculation unit operable to acquire an audio signal and calculate a sound parameter for each of partial audio signals, the partial audio signals each being the acquired audio signal in a unit of time; a category determination unit operable to determine, from among a plurality of environmental sound categories, which environmental sound category each of the partial audio signals belongs to, based on a corresponding one of the calculated sound parameters; a section setting unit operable to sequentially set judgement target sections on a time axis as time elapses, each of the judgment target sections including two or more of the units of time, the two or more of the units of time being consecutive; and an environment judgment unit operable to judge, based on a number of partial audio signals in each environmental sound category determined in at least a most recent judgment target section, an environment that surrounds the sound analysis device in at least the…
Type: Grant
Filed: March 13, 2008
Date of Patent: July 2, 2013
Assignee: Panasonic Corporation
Inventors: Takashi Kawamura, Ryouichi Kawanishi
-
Patent number: 8478589
Abstract: A machine-readable medium may include a group of reusable components for building a spoken dialog system. The reusable components may include a group of previously collected audible utterances. A machine-implemented method to build a library of reusable components for use in building a natural language spoken dialog system may include storing a dataset in a database. The dataset may include a group of reusable components for building a spoken dialog system. The reusable components may further include a group of previously collected audible utterances. A second method may include storing at least one set of data. Each one of the at least one set of data may include ones of the reusable components associated with audible data collected during a different collection phase.
Type: Grant
Filed: January 5, 2005
Date of Patent: July 2, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Lee Begeja, Giuseppe Di Fabbrizio, David Crawford Gibbon, Dilek Z. Hakkani-Tur, Zhu Liu, Bernard S. Renger, Behzad Shahraray, Gokhan Tur
-
Patent number: 8457968
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for tracking multiple dialog states. A system practicing the method receives an N-best list of speech recognition candidates, a list of current partitions, and a belief for each of the current partitions. A partition is a group of dialog states. In an outer loop, the system iterates over the N-best list of speech recognition candidates. In an inner loop, the system performs a split, update, and recombination process to generate a fixed number of partitions after each speech recognition candidate in the N-best list. The system recognizes speech based on the N-best list and the fixed number of partitions. The split process can perform all possible splits on all partitions. The update process can compute an estimated new belief. The estimated new belief can be a product of ASR reliability, user likelihood to produce this action, and an original belief.
Type: Grant
Filed: December 8, 2009
Date of Patent: June 4, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Jason Williams
-
Patent number: 8447604
Abstract: Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that match consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including the corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
Type: Grant
Filed: May 28, 2010
Date of Patent: May 21, 2013
Assignee: Adobe Systems Incorporated
Inventor: Walter W. Chang
-
Patent number: 8438026
Abstract: The invention describes a method and a system for generating training data (DT) for an automatic speech recogniser (2) for operating at a particular first sampling frequency (fH), comprising steps of deriving spectral characteristics (SL) from audio data (DL) sampled at a second frequency (fL) lower than the first sampling frequency (fH), extending the bandwidth of the spectral characteristics (SL) by retrieving bandwidth extending information (IBE) from a codebook (6), and processing the bandwidth extended spectral characteristics (SLE) to give the required training data (DT). Moreover, a method and a system (5) for generating a codebook (6) for extending the bandwidth of spectral characteristics (SL) for audio data (DL) sampled at a second sampling frequency (fL) to spectral characteristics (SH) for a first sampling frequency (fH) higher than the second sampling frequency (fL) are described.
Type: Grant
Filed: February 10, 2005
Date of Patent: May 7, 2013
Assignee: Nuance Communications, Inc.
Inventors: Alexander Fischer, Rolf Dieter Bippus
-
Patent number: 8438029Abstract: Disclosed are apparatus and methods for generating synthesized utterances. A computing device can receive speech data corresponding to spoken utterances of a particular speaker. Textual elements of an input text corresponding to the speech data can be recognized. Confidence levels associated with the recognized textual elements can be determined. Speech-synthesis parameters of decision trees can be adapted based on the speech data, recognized textual elements, and associated confidence levels. Each adapted decision tree can map individual elements of a text to individual ones of the speech-synthesis parameters. A second input text can be received. The second input text can be mapped to speech-synthesis parameters using the adapted decision trees. A synthesized spoken utterance can be generated corresponding to the second input text using the speech-synthesis parameters. At least some of the speech-synthesis parameters are configured to simulate the particular speaker.Type: GrantFiled: August 22, 2012Date of Patent: May 7, 2013Assignee: Google Inc.Inventors: Matthew Nicholas Stuttle, Byungha Chun
-
Patent number: 8423354Abstract: A device extracts prosodic information, including a power value, from speech data, and extracts from the speech data an utterance section including a period whose power value is equal to or larger than a threshold; divides the utterance section into sections in which the power value is equal to or larger than another threshold; acquires phoneme sequence data for each divided speech data by phoneme recognition; generates clusters, each of which is a set of the classified phoneme sequence data, by clustering; calculates an evaluation value for each cluster; selects clusters for which the evaluation value is equal to or larger than a given value as candidate clusters; determines, for each candidate cluster, one of the phoneme sequence data constituting the cluster to be a representative phoneme sequence; and selects the divided speech data corresponding to the representative phoneme sequence as listening target speech data.Type: GrantFiled: November 5, 2010Date of Patent: April 16, 2013Assignee: Fujitsu LimitedInventor: Sachiko Onodera
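The first stage, carving out sections whose frame power clears a threshold, can be sketched as follows (frame length, threshold, and names are hypothetical simplifications of what the patent describes):

```python
def extract_sections(samples, frame_len, threshold):
    """Return (start_frame, end_frame) pairs of consecutive frames
    whose mean power is equal to or larger than the threshold."""
    powers = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    sections, start = [], None
    for idx, p in enumerate(powers):
        if p >= threshold and start is None:
            start = idx                       # section opens
        elif p < threshold and start is not None:
            sections.append((start, idx))     # section closes
            start = None
    if start is not None:
        sections.append((start, len(powers)))
    return sections

# silence - speech - silence, in three 4-sample frames
print(extract_sections([0.0] * 4 + [2.0] * 4 + [0.0] * 4, 4, 1.0))
```

Each returned section would then be subdivided, phoneme-recognized, and clustered as the abstract goes on to describe.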
-
Patent number: 8390574Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate text input. In response to an ambiguous editing input at a location preceding at least a portion of an output word, the software performs one disambiguation operation with respect to the editing input and another disambiguation operation with respect to the editing input in combination with the at least portion of the output word. The results are output in order of decreasing frequency value, with the results of the one disambiguation operation having the portion of the output word appended thereto.Type: GrantFiled: August 10, 2011Date of Patent: March 5, 2013Assignee: Research In Motion LimitedInventors: Michael Elizarov, Vadim Fux, Dan Rubanovich
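The core of reduced-keyboard disambiguation can be sketched briefly. Only the frequency ordering is taken from the abstract; the two-letters-per-key layout and the word list below are invented for illustration:

```python
# each physical key carries two letters (last row pads with one)
KEYS = {"q": "qw", "e": "er", "t": "ty", "u": "ui", "o": "op",
        "a": "as", "d": "df", "g": "gh", "j": "jk", "l": "l",
        "z": "zx", "c": "cv", "b": "bn", "m": "m"}

LETTER_TO_KEY = {letter: key for key, letters in KEYS.items()
                 for letter in letters}

def disambiguate(key_seq, lexicon):
    """lexicon: word -> frequency. Return the candidate words whose
    letters sit on the pressed keys, most frequent first."""
    matches = [w for w in lexicon
               if len(w) == len(key_seq)
               and all(LETTER_TO_KEY[ch] == k for ch, k in zip(w, key_seq))]
    return sorted(matches, key=lambda w: -lexicon[w])

# "u" key carries both 'u' and 'i', so the sequence u-t is ambiguous
print(disambiguate("ut", {"it": 100, "ut": 5}))
```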
-
Patent number: 8386251Abstract: A speech recognition system is provided with iteratively refined multiple passes through the received data to enhance the accuracy of the results by introducing constraints and adaptation from initial passes into subsequent recognition operations. The multiple passes are performed on an initial utterance received from a user. The iteratively enhanced subsequent passes are also performed on following utterances received from the user, increasing overall system efficiency and accuracy.Type: GrantFiled: June 8, 2009Date of Patent: February 26, 2013Assignee: Microsoft CorporationInventors: Nikko Strom, Julian Odell, Jon Hamaker
-
Patent number: 8386238Abstract: A sequence of characters may be evaluated to determine the presence of a natural language word. The sequence of characters may be analyzed to find a subsequence of alphabetical characters. Based on a statistical model of a natural language, a probability that the subsequence is a natural language word may be calculated. The probability may then be used to determine if the subsequence is indeed a natural language word.Type: GrantFiled: November 5, 2008Date of Patent: February 26, 2013Assignee: Citrix Systems, Inc.Inventor: Anthony Spataro
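One common statistical model for this kind of wordness test is a character-bigram model; the sketch below is a hedged stand-in (the abstract does not specify the model), with a toy training list and an arbitrary threshold:

```python
import math
from collections import Counter

def train_bigrams(corpus_words):
    counts, totals = Counter(), Counter()
    for w in corpus_words:
        padded = "^" + w + "$"        # ^ and $ mark word boundaries
        for a, b in zip(padded, padded[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return counts, totals

def word_logprob(s, counts, totals, alpha=1.0, vocab=28):
    # Laplace smoothing (vocab ~ 26 letters + 2 boundary marks) so
    # unseen bigrams still get nonzero probability
    padded = "^" + s + "$"
    lp = 0.0
    for a, b in zip(padded, padded[1:]):
        lp += math.log((counts[(a, b)] + alpha) / (totals[a] + alpha * vocab))
    return lp / len(padded)           # length-normalized

def looks_like_word(s, model, threshold=-2.5):
    return word_logprob(s, *model) >= threshold

model = train_bigrams(["the", "then", "them", "there", "hen", "ten"])
print(looks_like_word("then", model), looks_like_word("xqzk", model))
```

With a realistic corpus the threshold would be tuned against labeled word/non-word sequences rather than fixed by hand.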
-
Patent number: 8374869Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.Type: GrantFiled: August 4, 2009Date of Patent: February 12, 2013Assignee: Electronics and Telecommunications Research InstituteInventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
-
Patent number: 8364483Abstract: A method for separating a sound source from a mixed signal includes transforming a mixed signal to channel signals in the frequency domain; and grouping several frequency bands for each channel signal to form frequency clusters. Further, the method for separating the sound source from the mixed signal includes separating the frequency clusters by applying blind source separation to signals in the frequency domain for each frequency cluster; and integrating the spectra of the separated signals to restore the sound source in the time domain, wherein each of the separated signals expresses one sound source.Type: GrantFiled: June 19, 2009Date of Patent: January 29, 2013Assignee: Electronics and Telecommunications Research InstituteInventors: Ki-young Park, Ho-Young Jung, Yun Keun Lee, Jeon Gue Park, Jeom Ja Kang, Hoon Chung, Sung Joo Lee, Byung Ok Kang, Ji Hyun Wang, Eui Sok Chung, Hyung-Bae Jeon, Jong Jin Kim
-
Publication number: 20130006635Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: ApplicationFiled: September 11, 2012Publication date: January 3, 2013Applicant: INTERNATIONAL BUSINESS MACHINESInventor: Hagai Aronowitz
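The feature-extension step can be sketched with toy one-dimensional Gaussians standing in for the pre-trained acoustic models, which the abstract leaves unspecified; every name and value below is illustrative:

```python
import math

def gauss_loglik(x, mean, var):
    """Log-likelihood of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def extend_features(frame, speaker_models, background):
    """frame: acoustic feature vector for one frame.
    speaker_models: list of (mean, var) toy models over the first
    coefficient; background: (mean, var) of the background model.
    Appends one log-likelihood ratio per speaker model."""
    x = frame[0]
    llrs = [gauss_loglik(x, m, v) - gauss_loglik(x, *background)
            for m, v in speaker_models]
    return frame + llrs
```

The extended vectors then feed the segmentation and clustering stages in place of the raw acoustic features.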
-
Publication number: 20130006634Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.Type: ApplicationFiled: January 6, 2012Publication date: January 3, 2013Applicant: QUALCOMM IncorporatedInventors: Leonard Henry Grokop, Vidya Narayanan
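The restriction idea is easy to sketch: rather than scoring every enrolled speaker model, only speakers the social graph links to the inferred context are scored. The graph and the scores below are invented toy data:

```python
def identify_speaker(context, social_graph, model_scores):
    """social_graph: context -> set of speakers plausible in it.
    model_scores: speaker -> speech-model score for the signal.
    Returns the best-scoring speaker among those plausible in context
    (falls back to all speakers for an unknown context)."""
    candidates = social_graph.get(context, set(model_scores))
    restricted = {s: model_scores[s] for s in candidates if s in model_scores}
    return max(restricted, key=restricted.get) if restricted else None

graph = {"work": {"alice", "bob"}, "home": {"carol", "dave"}}
scores = {"alice": 0.4, "bob": 0.7, "carol": 0.9, "dave": 0.2}
print(identify_speaker("work", graph, scores))
```

Note that without the context restriction the globally best-scoring model ("carol" here) would win even in a context where that speaker is unlikely to be present.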
-
Publication number: 20130006636Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors whose elements are numerical features representing words having specific meanings and their surrounding words, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within a cluster for each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words on the basis of which the feature vectors were generated by applying the specified extraction rules to the feature vectors.Type: ApplicationFiled: March 24, 2011Publication date: January 3, 2013Applicant: NEC CORPORATIONInventors: Hironori Mizuguchi, Dai Kusui
-
Publication number: 20130006633Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. For each signal, the signal may be associated with value/s for one or more features (e.g., Mel-Frequency Cepstral coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominate voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominate cluster. A received audio signal may then be processed using the speech model to, e.g., determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.Type: ApplicationFiled: January 5, 2012Publication date: January 3, 2013Applicant: QUALCOMM IncorporatedInventors: Leonard Henry Grokop, Vidya Narayanan
-
Patent number: 8306820Abstract: Oral input is recognized using a predefinable vocabulary that is partitioned into sections of phonetically similar words. In a recognition process, the oral input is first associated with one of the sections, and then the word is determined from the vocabulary of the associated section.Type: GrantFiled: October 4, 2005Date of Patent: November 6, 2012Assignee: Siemens AktiengesellschaftInventor: Niels Kunstmann
-
Patent number: 8301443Abstract: A computer implemented method, apparatus, and computer program product for generating audio cohorts. An audio analysis engine receives audio data from a set of audio input devices. The audio data is associated with a plurality of objects. The audio data comprises a set of audio patterns. The audio data is processed to identify attributes of the audio data to form digital audio data. The digital audio data comprises metadata describing the attributes of the audio data. A set of audio cohorts is generated using the digital audio data and cohort criteria. Each audio cohort in the set of audio cohorts comprises a set of objects from the plurality of objects that share at least one audio attribute in common.Type: GrantFiled: November 21, 2008Date of Patent: October 30, 2012Assignee: International Business Machines CorporationInventors: Robert Lee Angell, Robert R Friedlander, James R Kraemer
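A minimal sketch of the cohort-generation step (the attribute names and criteria format are invented; real cohort criteria would be richer than a single required attribute):

```python
def generate_cohorts(objects, criteria):
    """objects: object_id -> set of audio attributes from the metadata.
    criteria: cohort_name -> attribute every member must share.
    Returns cohort_name -> set of member objects."""
    cohorts = {}
    for name, attr in criteria.items():
        members = {obj for obj, attrs in objects.items() if attr in attrs}
        if members:
            cohorts[name] = members
    return cohorts

objs = {"car": {"engine", "loud"},
        "truck": {"engine", "loud", "diesel"},
        "bird": {"chirp"}}
print(generate_cohorts(objs, {"machines": "engine", "songbirds": "chirp"}))
```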
-
Patent number: 8290170Abstract: Speech dereverberation is achieved by accepting an observed signal for initialization (1000) and performing likelihood maximization (2000) which includes Fourier Transforms (4000).Type: GrantFiled: May 1, 2006Date of Patent: October 16, 2012Assignees: Nippon Telegraph and Telephone Corporation, Georgia Tech Research CorporationInventors: Tomohiro Nakatani, Biing-Hwang Juang
-
Patent number: 8275608Abstract: A soft clustering method comprises (i) grouping items into non-exclusive cliques based on features associated with the items, and (ii) clustering the non-exclusive cliques using a hard clustering algorithm to generate item groups on the basis of mutual similarity of the features of the items constituting the cliques. In some named entity recognition embodiments illustrated herein as examples, named entities together with contexts are grouped into cliques based on mutual context similarity. Each clique includes a plurality of different named entities having mutual context similarity. The cliques are clustered to generate named entity groups on the basis of mutual similarity of the contexts of the named entities constituting the cliques.Type: GrantFiled: July 3, 2008Date of Patent: September 25, 2012Assignee: Xerox CorporationInventors: Julien Ah-Pine, Guillaume Jacquet
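The two-stage structure, non-exclusive cliques followed by hard clustering, can be sketched as below. Shared-feature counting stands in for the mutual context similarity the patent uses, and the greedy merge is a simplification of a real hard-clustering algorithm:

```python
def build_cliques(items, min_shared=1):
    """items: name -> set of features. Each clique collects the items
    sharing at least min_shared features with a seed item; the same
    item may appear in several cliques (non-exclusive)."""
    cliques = []
    for seed, feats in items.items():
        clique = frozenset(other for other, f in items.items()
                           if len(feats & f) >= min_shared)
        if clique not in cliques:
            cliques.append(clique)
    return cliques

def cluster_cliques(cliques):
    """Hard-cluster the cliques: greedily merge any two that share an
    item (a fuller implementation would use union-find)."""
    clusters = []
    for clique in cliques:
        for cluster in clusters:
            if cluster & clique:
                cluster |= clique
                break
        else:
            clusters.append(set(clique))
    return clusters

items = {"Paris": {"capital", "city"}, "Lyon": {"city"},
         "IBM": {"company", "tech"}, "Google": {"company"}}
print(cluster_cliques(build_cliques(items)))
```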