Creating Patterns For Matching: Patents (Class 704/243)
  • Patent number: 9058805
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a language model trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer, whose language model is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription. (A minimal sketch of this two-pass scheme follows this entry.)
    Type: Grant
    Filed: May 13, 2013
    Date of Patent: June 16, 2015
    Assignee: Google Inc.
    Inventors: Petar Aleksic, Pedro J. Mengibar, Fadi Biadsy
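The two-pass scheme above is compact enough to sketch. The Python below is a reading of the abstract, not Google's implementation: `limited_asr` and `expanded_asr` are hypothetical recognizer objects with a `transcribe` method, and the voice command grammar is modeled as a set of phrases.

```python
# Hypothetical sketch of the two-recognizer classification in patent 9058805.
VOICE_COMMAND_GRAMMAR = {"call home", "play music", "navigate home"}

def classify_utterance(audio, limited_asr, expanded_asr):
    first = limited_asr.transcribe(audio)    # limited vocabulary (grammar terms only)
    second = expanded_asr.transcribe(audio)  # expanded vocabulary (all grammar terms)
    # Classify as a voice command when the limited-vocabulary transcription
    # matches the grammar and the expanded recognizer agrees with it.
    if first in VOICE_COMMAND_GRAMMAR and first == second:
        return "voice_command", first
    return "free_form_speech", second
```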
  • Patent number: 9058813
    Abstract: A natural language system may receive user input that includes personal or restrictable information. The system provides dual processing: it stores a true copy of the user input, which may include the personal or restrictable information, and it also generates an obfuscated copy of the user input that does not contain that information. The true copy may be stored in a secure storage system and retrieved by authorized personnel, including the user who provided the input. The obfuscated copy may be stored in an ordinary storage system and employed in ongoing training of the natural language system. (A minimal sketch of the dual store follows this entry.)
    Type: Grant
    Filed: September 21, 2012
    Date of Patent: June 16, 2015
    Assignee: Rawles LLC
    Inventor: Scott I. Blanksteen
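One way the dual store could look in code: a true copy lands in a (notionally access-controlled) secure store, and a redacted copy feeds training. The regex patterns and store structures are illustrative assumptions, not the patented mechanism.

```python
import re

SECURE_STORE = {}    # true copies, keyed by user; assumed access-controlled
TRAINING_STORE = []  # obfuscated copies used for ongoing training

def ingest_user_input(user_id, text):
    # True copy: retrievable later by authorized personnel (or the user).
    SECURE_STORE.setdefault(user_id, []).append(text)
    # Obfuscated copy: redact personal information before storing.
    redacted = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<SSN>", text)
    redacted = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "<EMAIL>", redacted)
    TRAINING_STORE.append(redacted)
    return redacted
```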
  • Patent number: 9053703
    Abstract: This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: June 9, 2015
    Assignee: Google Inc.
    Inventors: Eugene Weinstein, Pedro J. Moreno Mengibar
  • Patent number: 9053708
    Abstract: A speech recognition system, a method of recognizing speech, and a computer program product therefor. A client device, identified with a context for an associated user, selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives the streaming audio, maps utterances to specific textual candidates, and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be shared by multiple users in the same context. (A minimal sketch of the winnowing step follows this entry.)
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: June 9, 2015
    Assignee: International Business Machines Corporation
    Inventors: Fernando Luiz Koch, Julio Nogima
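The winnowing step might look like the sketch below: an acoustic likelihood is reweighted by a context prior, and the confirmed match feeds back into the shared context model. `candidates` and `context_counts` are stand-in structures of our own devising.

```python
def winnow_candidates(candidates, context_counts, context):
    """Resolve ambiguity among textual candidates for one mapped utterance.

    candidates: dict mapping candidate text -> acoustic match likelihood.
    context_counts: dict mapping (context, text) -> confirmed-use count.
    """
    def score(text):
        # Weight the acoustic likelihood by a simple context prior.
        return candidates[text] * (1 + context_counts.get((context, text), 0))
    best = max(candidates, key=score)
    # Feed the resolved match back to sharpen the shared context model.
    context_counts[(context, best)] = context_counts.get((context, best), 0) + 1
    return best
```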
  • Patent number: 9047867
    Abstract: Methods and systems for recognition of concurrent, superimposed, or otherwise overlapping signals are described. A Markov Selection Model is introduced that, together with probabilistic decomposition methods, enable recognition of simultaneously emitted signals from various sources. For example, a signal mixture may include overlapping speech from different persons. In some instances, recognition may be performed without the need to separate signals or sources. As such, some of the techniques described herein may be useful in automatic transcription, noise reduction, teaching, electronic games, audio search and retrieval, medical and scientific applications, etc.
    Type: Grant
    Filed: February 21, 2011
    Date of Patent: June 2, 2015
    Assignee: Adobe Systems Incorporated
    Inventor: Paris Smaragdis
  • Patent number: 9043206
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. Certain embodiments determine at least one exact, inexact, or partial match between the at least one word of the utterance and at least one term within the template hierarchy in order to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template. (A minimal sketch of the matching step follows this entry.)
    Type: Grant
    Filed: October 28, 2013
    Date of Patent: May 26, 2015
    Assignee: Cyberpulse, L.L.C.
    Inventor: James Roberge
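One plausible reading of the exact/inexact/partial matching is sketched below. The use of `difflib` for inexact matches, the 0.8 similarity threshold, and the match weights are all our assumptions.

```python
import difflib

def match_word(word, term):
    """Classify the match between an utterance word and a template term."""
    w, t = word.lower(), term.lower()
    if w == t:
        return "exact"
    if difflib.SequenceMatcher(None, w, t).ratio() > 0.8:
        return "inexact"  # close but not identical, e.g. a morphological variant
    if t.startswith(w) or w in t:
        return "partial"
    return None

def best_template(utterance_words, templates):
    """Score each template (a list of terms) and return the best match."""
    weights = {"exact": 1.0, "inexact": 0.6, "partial": 0.3}
    def score(template):
        return sum(max((weights.get(match_word(w, t), 0.0) for t in template),
                       default=0.0)
                   for w in utterance_words)
    return max(templates, key=score)
```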
  • Patent number: 9037469
    Abstract: An apparatus includes a plurality of applications and an integrator having a voice recognition module configured to identify at least one voice command from a user. The integrator is configured to integrate information from a remote source into at least one of the plurality of applications based on the identified voice command. A method includes analyzing speech from a first user of a first mobile device having a plurality of applications, identifying a voice command based on the analyzed speech using a voice recognition module, and incorporating information from a remote source into at least one of the plurality of applications based on the identified voice command.
    Type: Grant
    Filed: January 27, 2014
    Date of Patent: May 19, 2015
    Assignee: VERIZON PATENT AND LICENSING INC.
    Inventor: Robert E. Opaluch
  • Publication number: 20150134332
    Abstract: A speech recognition method and device are disclosed. The method includes: acquiring a text file specified by a user and extracting command words from the text file to obtain a command word list; comparing the command word list with a command word library to confirm whether the list includes a new command word; if it does, generating a corresponding new pronunciation dictionary and a new language model; merging the new language model into a language model library; and receiving speech and performing speech recognition on it according to an acoustic model, the pronunciation dictionary, and the language model library. Because command words acquired online are closely related to the online content, the number of command words is limited and far smaller than the number of frequently used words. (A minimal sketch of the update step follows this entry.)
    Type: Application
    Filed: January 16, 2015
    Publication date: May 14, 2015
    Inventors: Change Liu, Deming Zhang
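The update loop reduces to a set difference plus dictionary generation. In this sketch, `command_library` is a set, `lexicon` a dict, and `grapheme_to_phoneme` a placeholder for a real grapheme-to-phoneme component; the merge into the language model library is left out.

```python
def update_command_words(text_file_words, command_library, lexicon):
    """Extract a command word list and register any new command words."""
    command_list = sorted(set(text_file_words))  # command words from the text file
    new_words = [w for w in command_list if w not in command_library]
    for w in new_words:
        lexicon[w] = grapheme_to_phoneme(w)      # new pronunciation entries
    command_library.update(new_words)            # merge into the command library
    return new_words

def grapheme_to_phoneme(word):
    # Placeholder: real systems use a trained G2P model or a hand-built dictionary.
    return list(word)
```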
  • Publication number: 20150134331
    Abstract: In an embodiment, an integrated circuit (an SOC, or system on a chip) may include one or more CPUs, a memory controller, and a circuit configured to remain powered on when the rest of the SOC is powered down. The circuit may be configured to receive audio samples from a microphone and match those samples against a predetermined pattern to detect a possible command from a user of the device that includes the SOC. In response to detecting the predetermined pattern, the circuit may cause the memory controller to power up so that audio samples may be stored in the memory to which the memory controller is coupled. The circuit may also cause the CPUs to be powered on and initialized, and the operating system (OS) may boot. During the time that the CPUs are initializing and the OS is booting, the circuit and the memory may continue capturing the audio samples. (A minimal sketch follows this entry.)
    Type: Application
    Filed: December 17, 2013
    Publication date: May 14, 2015
    Applicant: Apple Inc.
    Inventors: Timothy J. Millet, Manu Gulati, Michael F. Culbert
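In software terms, the always-on block behaves like a ring buffer plus a pattern matcher that triggers power-up. The `soc` interface below is invented for illustration; on real silicon these are hardware power-management actions.

```python
from collections import deque

class AlwaysOnWakeCircuit:
    """Software analogue of the always-on circuit described above (illustrative)."""

    def __init__(self, pattern_matcher, buffer_samples=16000):
        self.buf = deque(maxlen=buffer_samples)  # keeps capturing while CPUs boot
        self.match = pattern_matcher             # detects the predetermined pattern
        self.powered_up = False

    def on_audio_sample(self, sample, soc):
        self.buf.append(sample)                  # capture continues during boot
        if not self.powered_up and self.match(self.buf):
            soc.power_up_memory_controller()     # so samples can be stored in memory
            soc.power_up_cpus_and_boot_os()      # OS boots while buffering continues
            self.powered_up = True
```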
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation via acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new (target) speaker and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model in which each phoneme used in the pronouncing dictionary is represented by a weighted sum of the acoustic models for all the plausible phonemes; the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the target speaker using the custom speech model. (A minimal sketch of the weighted sum follows this entry.)
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
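The restructured model evaluates each dictionary phoneme as a weighted sum of the native phoneme models. A minimal rendering, with invented `weights` and `native_models` structures:

```python
def custom_phoneme_likelihood(frame, weights, native_models):
    """Likelihood of an acoustic frame under the restructured phoneme model.

    weights: dict mapping phoneme -> mixture weight, estimated from the new
             speaker's phoneme lattice (assumed given here).
    native_models: dict mapping phoneme -> native acoustic model exposing a
             likelihood(frame) method (also assumed).
    """
    return sum(w * native_models[ph].likelihood(frame)
               for ph, w in weights.items())
```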
  • Publication number: 20150120297
    Abstract: A voice-responsive building management system is described herein. One system includes an interface, a dynamic grammar builder, and a speech processing engine. The interface is configured to receive a speech card of a user, wherein the speech card of the user includes speech training data of the user and domain vocabulary for applications of the building management system for which the user is authorized. The dynamic grammar builder is configured to generate grammar from a building information model of the building management system. The speech processing engine is configured to receive a voice command or voice query from the user, and execute the voice command or voice query using the speech training data of the user, the domain vocabulary, and the grammar generated from the building information model.
    Type: Application
    Filed: October 24, 2013
    Publication date: April 30, 2015
    Applicant: Honeywell International Inc.
    Inventor: Jayaprakash Meruva
  • Patent number: 9020818
    Abstract: Implementations of systems, methods, and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 28, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Patent number: 9020820
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Publication number: 20150112679
    Abstract: A method for building a language model, a speech recognition method, and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model: each phonetic spelling is matched to a candidate sentence table, a word probability of each phonetic spelling matching a word in a sentence of the table is obtained, and the word probabilities of the phonetic spellings are combined to obtain the text sequence probabilities. The text sequence corresponding to the largest of the text sequence probabilities is selected as the recognition result of the speech signal. (A minimal sketch of the selection step follows this entry.)
    Type: Application
    Filed: September 29, 2014
    Publication date: April 23, 2015
    Inventor: Guo-Feng Zhang
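The final step is an argmax over text sequence probabilities. Assuming each sequence probability is the product of its word probabilities (the abstract does not fix the combination rule), a log-space sketch:

```python
import math

def best_text_sequence(candidates):
    """Pick the candidate text sequence with the largest probability.

    candidates: dict mapping a text sequence to the list of word probabilities
    P(word | phonetic spelling) obtained from the language model.
    """
    def log_prob(word_probs):
        # Sum of logs == log of the product; avoids underflow on long sequences.
        return sum(math.log(max(p, 1e-12)) for p in word_probs)
    return max(candidates, key=lambda seq: log_prob(candidates[seq]))
```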
  • Patent number: 9015044
    Abstract: Implementations of systems, methods, and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Patent number: 9009045
    Abstract: Methods and systems for model-driven candidate sorting for evaluating digital interviews are described. In one embodiment, a model-driven candidate-sorting tool selects a data set of digital interview data for sorting. The data set includes candidate data for interviewing candidates (also referred to herein as interviewees). The tool analyzes the candidate data for each interviewing candidate to identify digital interviewing cues and applies those cues to a prediction model to predict an achievement index for the candidate. This is performed without reviewer input at the model-driven candidate-sorting tool. The list of interview candidates is sorted according to the predicted achievement indices, and the sorted list is presented to the reviewer in a user interface.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: April 14, 2015
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Patent number: 9009040
    Abstract: According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text.
    Type: Grant
    Filed: May 5, 2010
    Date of Patent: April 14, 2015
    Assignee: Cisco Technology, Inc.
    Inventors: Todd C. Tatum, Michael A. Ramalho, Paul M. Dunn, Shantanu Sarkar, Tyrone T. Thorsen, Alan D. Gatzke
  • Publication number: 20150100317
    Abstract: A speech recognition device starts to generate dictionary data for each type of name based on name data and paraphrase data, and executes dictionary registration of the dictionary data. The speech recognition device obtains text information that is the same as the text information used to generate the dictionary data the previous time. When back-up data corresponding to that previous text information has been generated, the speech recognition device executes dictionary registration of the dictionary data generated as the back-up data. Further, a dictionary data generation device executes dictionary registration of the dictionary data based on given name data each time it completes generation of the dictionary data based on that name data.
    Type: Application
    Filed: January 29, 2013
    Publication date: April 9, 2015
    Inventors: Hideaki Tsuji, Satoshi Miyaguni
  • Patent number: 9002708
    Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
    Type: Grant
    Filed: May 8, 2012
    Date of Patent: April 7, 2015
    Assignee: NHN Corporation
    Inventors: Sang Ho Lee, Hoon Kim, Dong Ook Koo, Dae Sung Jung
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages the collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be gathered from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 8996371
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: March 31, 2015
    Assignee: Nice-Systems Ltd.
    Inventors: Eyal Hurvitz, Ezra Daya, Oren Pereg, Moshe Wasserblat
  • Patent number: 8996380
    Abstract: Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating the speed at which the media stream is being rendered relative to a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset, in synchrony with the media stream being rendered by the media rendering source. (The offset arithmetic is sketched after this entry.)
    Type: Grant
    Filed: May 4, 2011
    Date of Patent: March 31, 2015
    Assignee: Shazam Entertainment Ltd.
    Inventors: Avery Li-Chun Wang, Rahul Powar, William Michael Mills, Christopher Jacques Penrose Barton, Philip Georges Inghelbrecht, Dheeraj Shankar Mukherjee
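The real-time offset can be written down directly from the abstract: the position now equals the matched position plus the time elapsed since the sample, scaled by the playback-speed ratio. Variable names are ours.

```python
import time

def real_time_offset(time_offset, sample_timestamp, timescale_ratio=1.0, now=None):
    """Current position (seconds) in the media stream.

    time_offset: stream position that matched the captured sample.
    sample_timestamp: wall-clock time at which the sample was captured.
    timescale_ratio: rendering speed relative to the reference speed.
    """
    now = time.time() if now is None else now
    return time_offset + (now - sample_timestamp) * timescale_ratio

# A sample captured 2.5 s ago that matched position 63.0 s, playing at 1.02x,
# puts the second stream at 63.0 + 2.5 * 1.02 = 65.55 s.
```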
  • Patent number: 8996376
    Abstract: Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
    Type: Grant
    Filed: April 5, 2008
    Date of Patent: March 31, 2015
    Assignee: Apple Inc.
    Inventors: Christopher Brian Fleizach, Reginald Dean Hudson
  • Patent number: 8996372
    Abstract: Speech recognition may be improved using data derived from an utterance. In some embodiments, audio data is received by a user device. Adaptation data may be retrieved from a data store accessible by the user device. The audio data and the adaptation data may be transmitted to a server device. The server device may use the audio data to calculate second adaptation data. The second adaptation data may be transmitted to the user device. Synchronously or asynchronously, the server device may perform speech recognition using the audio data and the second adaptation data and transmit speech recognition results back to the user device.
    Type: Grant
    Filed: October 30, 2012
    Date of Patent: March 31, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Hugh Secker-Walker, Bjorn Hoffmeister, Ryan Thomas, Stan Salvador, Karthik Ramakrishnan
  • Patent number: 8996373
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in a depressed state; a likelihood calculation unit to calculate a first likelihood as the likelihood of the first specific speaker model with respect to input voice, and a second likelihood as the likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine the state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: March 31, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Publication number: 20150088509
    Abstract: A system and method for classifying whether audio data received in a speaker recognition system is genuine or a spoof, using a Gaussian classifier.
    Type: Application
    Filed: September 24, 2014
    Publication date: March 26, 2015
    Inventors: Alfonso Ortega Giménez, Luis Buera Rodriguez, Carlos Vaquero Avilés-Casco
  • Publication number: 20150088510
    Abstract: A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: Cyril Georges Luc ALLAUZEN, Sarangarajan PARTHASARATHY
  • Patent number: 8990084
    Abstract: State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones, with respect to a given cost function, for a human to label. The method comprises automatically estimating a confidence score for each word of an utterance by exploiting the lattice output of a speech recognizer that was trained on a small set of transcribed data. An utterance confidence score is computed from these word confidence scores; the utterances are then selectively sampled for transcription using the utterance confidence scores. (A minimal sketch of the sampling step follows this entry.)
    Type: Grant
    Filed: February 10, 2014
    Date of Patent: March 24, 2015
    Assignee: Interactions LLC
    Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi
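The selective-sampling step ranks utterances by a confidence score derived from the per-word lattice confidences and sends the least confident to a human. The mean is used as the utterance score here, one simple choice the abstract leaves open.

```python
def select_for_transcription(utterances, word_confidences, budget):
    """Return the `budget` least-confident utterances for human labeling.

    word_confidences: dict mapping an utterance to its per-word confidence
    scores, taken from the recognizer's lattice output (assumed given).
    """
    def utterance_confidence(u):
        scores = word_confidences[u]
        return sum(scores) / len(scores)  # mean word confidence
    ranked = sorted(utterances, key=utterance_confidence)  # least confident first
    return ranked[:budget]
```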
  • Publication number: 20150081297
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models, and an increase in performance for a given amount of transcribed and un-transcribed data.
    Type: Application
    Filed: November 24, 2014
    Publication date: March 19, 2015
    Inventors: Dilek Zeynep HAKKANI-TUR, Giuseppe RICCARDI
  • Patent number: 8983836
    Abstract: Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into segments that are homogeneous with regard to speakers and background sounds. For at least one segment, a speaker providing speech in the segment's audio track is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the speech profile, and an automatic speech recognition engine is dynamically configured for operation on the segment based on the acoustic profile. Automatic speech recognition operations are then performed on the audio track of the segment to generate a textual representation of the speech content corresponding to the speaker.
    Type: Grant
    Filed: September 26, 2012
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Elizabeth V. Woodward, Shunguo Yan
  • Publication number: 20150073798
    Abstract: Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yael Karov, Eran Levy, Sari Brosh-Lipstein
  • Publication number: 20150073794
    Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
    Type: Application
    Filed: June 17, 2014
    Publication date: March 12, 2015
    Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
  • Publication number: 20150073795
    Abstract: A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and processed using a speaker dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command, each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user enrolled signatures using cosine distance, Euclidean distance, correlation distance, etc.
    Type: Application
    Filed: August 13, 2014
    Publication date: March 12, 2015
    Inventor: Bozhao Tan
  • Patent number: 8977551
    Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
    Type: Grant
    Filed: October 27, 2011
    Date of Patent: March 10, 2015
    Assignee: Goertek Inc.
    Inventors: Fengliang Wu, Zhenhua Wu
  • Patent number: 8972258
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: March 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data. (A minimal sketch of the frequency split follows this entry.)
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
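The core of the method is a frequency partition of the training utterances. A sketch follows; how counts exactly at the threshold are handled is our own choice, and the model builders themselves are out of scope.

```python
from collections import Counter

def split_training_utterances(utterances, threshold):
    """Partition utterances into grammar-LM and statistical-LM training data."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]   # -> grammar-based LM
    low = [u for u, c in counts.items() if c <= threshold]   # -> statistical LM
    return high, low
```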
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt, is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Publication number: 20150058013
    Abstract: Techniques are described for calculating one or more verbal fluency scores for a person. An example method includes classifying, by a computing device, samples of audio data of speech of a person, based on amplitudes of the samples, into a first class of samples including speech or sound and a second class of samples including silence. The method further includes analyzing the first class of samples to determine a number of words spoken by the person, and calculating a verbal fluency score for the person based at least in part on the determined number of words spoken by the person. (A minimal sketch follows this entry.)
    Type: Application
    Filed: March 14, 2013
    Publication date: February 26, 2015
    Applicant: Regents of the University of Minnesota
    Inventors: Serguei V.S. Pakhomov, Laura Sue Hemmy, Kelvin O. Lim
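A heavily simplified rendering of the pipeline: amplitude thresholding splits samples into the speech/sound and silence classes, contiguous speech runs stand in for words, and a words-per-minute score falls out. The threshold and the words-per-burst calibration are illustrative assumptions, not the published procedure.

```python
def verbal_fluency_score(samples, amplitude_threshold,
                         words_per_burst=1.0, duration_seconds=60.0):
    """Estimate words per minute from raw audio samples (illustrative)."""
    in_speech, bursts = False, 0
    for s in samples:
        loud = abs(s) >= amplitude_threshold   # first class: speech or sound
        if loud and not in_speech:
            bursts += 1                        # a new run of speech begins
        in_speech = loud
    words = bursts * words_per_burst           # crude word-count proxy
    return words / (duration_seconds / 60.0)   # words per minute
```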
  • Publication number: 20150058014
    Abstract: A conversation management system includes: a training unit that generates an articulation speech act and an entity name for a training corpus, generates a lexical syntactic pattern, and estimates a speech act and an entity name of the training corpus; a database that stores the articulation speech act, the entity name, and the lexical syntactic pattern of the training corpus; an execution unit that generates an articulation speech act and an entity name for a user, generates a user lexical syntactic pattern, estimates a speech act and an entity name of the user, searches the database for an articulation pair corresponding to a user articulation using a search condition that includes the estimated user speech act and the generated user lexical syntactic pattern, and generates a final response by selecting an articulation template, using a restriction condition that includes the estimated entity name, from among the found articulation pairs; and an output unit that outputs the generated final response.
    Type: Application
    Filed: January 18, 2013
    Publication date: February 26, 2015
    Inventors: Gary Geunbae Lee, Hyungjong Noh, Kyusong Lee
  • Publication number: 20150058015
    Abstract: A voice processing apparatus includes a voice quality determining unit configured to select, in accordance with a control value, the method to be used to determine a target speaker whose voice quality is the target of a voice quality conversion, and to determine the target speaker in accordance with the selected method.
    Type: Application
    Filed: August 8, 2014
    Publication date: February 26, 2015
    Inventors: YUHKI MITSUFUJI, TORU CHINEN
  • Patent number: 8965761
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: February 27, 2014
    Date of Patent: February 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Thomas James Watson, Daniel Mark Schumacher
  • Publication number: 20150051909
    Abstract: Provided is a pattern recognition apparatus that creates multiple systems and combines them to improve recognition performance. It includes a discriminative training unit for constructing the model parameters of a second or subsequent system so that its output tendency differs from that of a previously-constructed model. Accordingly, when multiple systems are combined, recognition performance can be improved without trial and error.
    Type: Application
    Filed: August 13, 2013
    Publication date: February 19, 2015
    Applicants: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., MITSUBISHI ELECTRIC CORPORATION
    Inventors: Yuki TACHIOKA, Shinji Watanabe
  • Patent number: 8959019
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 17, 2015
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Narren Chittar
  • Patent number: 8949125
    Abstract: Systems and methods are provided to select the most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares each user pronunciation with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server computes the distance between each user pronunciation and every other user pronunciation and selects a pronunciation based on those distances. The server then annotates the map with the selected pronunciation and provides the audio of the location name to a user device upon request. (A minimal sketch of the second strategy follows this entry.)
    Type: Grant
    Filed: June 16, 2010
    Date of Patent: February 3, 2015
    Assignee: Google Inc.
    Inventor: Gal Chechik
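The alternative strategy, picking the pronunciation with the smallest total distance to all the others (a medoid), is easy to sketch. The pairwise `distance` function, e.g. DTW over acoustic features, is assumed rather than specified.

```python
def most_typical_pronunciation(pronunciations, distance):
    """Return the pronunciation closest, in total, to all the others."""
    def total_distance(p):
        return sum(distance(p, q) for q in pronunciations if q is not p)
    return min(pronunciations, key=total_distance)
```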
  • Patent number: 8947220
    Abstract: Speech recognition in a vehicle through an extrinsic device includes detecting, via the vehicle, a presence of a mobile communications device that is configured with a speech recognition component. A vehicle processor encodes data lists stored in the vehicle and transmits the data lists and a vehicle identifier to the mobile communications device. In response to receiving a request to initiate a voice recognition session, the vehicle transmits the request and the vehicle identifier to the mobile communications device that causes activation of the speech recognition component. The mobile communications device retrieves the data lists via the identifier. In response to a voice command received by the speech recognition component, the speech recognition component interprets the voice command, determines an action by evaluating the voice command in view of the data lists, and transmits an instruction to the vehicle processor directing the vehicle to implement the action.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: February 3, 2015
    Assignee: GM Global Technology Operations LLC
    Inventors: Douglas C. Martin, Nathan D. Ampunan
  • Patent number: 8935167
    Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
    Type: Grant
    Filed: September 25, 2012
    Date of Patent: January 13, 2015
    Assignee: Apple Inc.
    Inventor: Jerome Bellegarda
  • Patent number: 8924213
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set.
    Type: Grant
    Filed: July 9, 2012
    Date of Patent: December 30, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Patent number: 8918408
    Abstract: A computing device maintains an input history in memory. This input history includes input strings that have been previously entered into the computing device. When the user begins entering characters of an input string, a predictive input engine is activated. The predictive input engine receives the input string and the input history to generate a candidate list of predictive inputs, which is presented to the user. The user can select one of the inputs from the list or otherwise continue entering characters. The computing device generates the candidate list by combining frequency and recency information for the matching strings from the input history. Additionally, the candidate list can be manipulated to present a variety of candidates. By using a combination of frequency, recency, and variety, a favorable user experience is provided. (A minimal sketch of the frequency/recency ranking follows this entry.)
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: Microsoft Corporation
    Inventors: Katsutoshi Ohtsuki, Koji Watanabe
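Frequency and recency can be blended by summing an exponentially decayed weight over past uses of each matching string. The decay form and half-life are our assumptions, and the "variety" manipulation the abstract mentions is omitted.

```python
import time

def rank_candidates(prefix, history, half_life_seconds=7 * 24 * 3600):
    """Rank predictive-input candidates for `prefix` by frequency and recency.

    history: list of (input_string, unix_timestamp) pairs from the input history.
    """
    now, scores = time.time(), {}
    for text, ts in history:
        if text.startswith(prefix):
            decay = 0.5 ** ((now - ts) / half_life_seconds)  # recency weight
            scores[text] = scores.get(text, 0.0) + decay     # summed -> frequency
    return sorted(scores, key=scores.get, reverse=True)
```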
  • Patent number: 8918318
    Abstract: Speech recognition is enabled even for a new speaker of a speech recognition system by using an extended recognition dictionary suited to the speaker, without requiring any prior learning that uses utterance labels corresponding to the speaker's speech.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: December 23, 2014
    Assignee: NEC Corporation
    Inventor: Yoshifumi Onishi