Creating Patterns For Matching: Patents (Class 704/243)
  • Patent number: 9058805
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance and obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a language model trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer, whose language model is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription. (A minimal sketch of this two-pass scheme follows this entry.)
    Type: Grant
    Filed: May 13, 2013
    Date of Patent: June 16, 2015
    Assignee: Google Inc.
    Inventors: Petar Aleksic, Pedro J. Mengibar, Fadi Biadsy
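The two-pass scheme above is compact enough to sketch. The Python below is a reading of the abstract, not Google's implementation: `limited_asr` and `expanded_asr` are hypothetical recognizer objects with a `transcribe` method, and the voice command grammar is modeled as a set of phrases.

```python
# Hypothetical sketch of the two-recognizer classification in patent 9058805.
VOICE_COMMAND_GRAMMAR = {"call home", "play music", "navigate home"}

def classify_utterance(audio, limited_asr, expanded_asr):
    first = limited_asr.transcribe(audio)    # limited vocabulary (grammar terms only)
    second = expanded_asr.transcribe(audio)  # expanded vocabulary (all grammar terms)
    # Classify as a voice command when the limited-vocabulary transcription
    # matches the grammar and the expanded recognizer agrees with it.
    if first in VOICE_COMMAND_GRAMMAR and first == second:
        return "voice_command", first
    return "free_form_speech", second
```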
  • Patent number: 9058813
    Abstract: A natural language system may receive user input that includes personal or restrictable information. The system provides dual processing: it stores a true copy of the user input, which may include the personal or restrictable information, and it also generates an obfuscated copy of the user input that does not contain that information. The true copy may be stored in a secure storage system and retrieved by authorized personnel, including the user who provided the input. The obfuscated copy may be stored in an ordinary storage system and employed in ongoing training of the natural language system. (A minimal sketch of the dual store follows this entry.)
    Type: Grant
    Filed: September 21, 2012
    Date of Patent: June 16, 2015
    Assignee: Rawles LLC
    Inventor: Scott I. Blanksteen
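One way the dual store could look in code: a true copy lands in a (notionally access-controlled) secure store, and a redacted copy feeds training. The regex patterns and store structures are illustrative assumptions, not the patented mechanism.

```python
import re

SECURE_STORE = {}    # true copies, keyed by user; assumed access-controlled
TRAINING_STORE = []  # obfuscated copies used for ongoing training

def ingest_user_input(user_id, text):
    # True copy: retrievable later by authorized personnel (or the user).
    SECURE_STORE.setdefault(user_id, []).append(text)
    # Obfuscated copy: redact personal information before storing.
    redacted = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<SSN>", text)
    redacted = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "<EMAIL>", redacted)
    TRAINING_STORE.append(redacted)
    return redacted
```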
  • Patent number: 9053703
    Abstract: This document describes methods, systems, techniques, and computer program products for generating and/or modifying acoustic models. Acoustic models and/or transformations for a target language/dialect can be generated and/or modified using acoustic models and/or transformations from a source language/dialect.
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: June 9, 2015
    Assignee: Google Inc.
    Inventors: Eugene Weinstein, Pedro J. Moreno Mengibar
  • Patent number: 9053708
    Abstract: A speech recognition system, a method of recognizing speech, and a computer program product therefor. A client device, identified with a context for an associated user, selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives the streaming audio, maps utterances to specific textual candidates, and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be shared by multiple users in the same context. (A minimal sketch of the winnowing step follows this entry.)
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: June 9, 2015
    Assignee: International Business Machines Corporation
    Inventors: Fernando Luiz Koch, Julio Nogima
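The winnowing step might look like the sketch below: an acoustic likelihood is reweighted by a context prior, and the confirmed match feeds back into the shared context model. `candidates` and `context_counts` are stand-in structures of our own devising.

```python
def winnow_candidates(candidates, context_counts, context):
    """Resolve ambiguity among textual candidates for one mapped utterance.

    candidates: dict mapping candidate text -> acoustic match likelihood.
    context_counts: dict mapping (context, text) -> confirmed-use count.
    """
    def score(text):
        # Weight the acoustic likelihood by a simple context prior.
        return candidates[text] * (1 + context_counts.get((context, text), 0))
    best = max(candidates, key=score)
    # Feed the resolved match back to sharpen the shared context model.
    context_counts[(context, best)] = context_counts.get((context, best), 0) + 1
    return best
```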
  • Patent number: 9047867
    Abstract: Methods and systems for recognition of concurrent, superimposed, or otherwise overlapping signals are described. A Markov Selection Model is introduced that, together with probabilistic decomposition methods, enable recognition of simultaneously emitted signals from various sources. For example, a signal mixture may include overlapping speech from different persons. In some instances, recognition may be performed without the need to separate signals or sources. As such, some of the techniques described herein may be useful in automatic transcription, noise reduction, teaching, electronic games, audio search and retrieval, medical and scientific applications, etc.
    Type: Grant
    Filed: February 21, 2011
    Date of Patent: June 2, 2015
    Assignee: Adobe Systems Incorporated
    Inventor: Paris Smaragdis
  • Patent number: 9043206
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. Certain embodiments determine at least one exact, inexact, or partial match between the at least one word of the utterance and at least one term within the template hierarchy in order to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template. (A minimal sketch of the matching step follows this entry.)
    Type: Grant
    Filed: October 28, 2013
    Date of Patent: May 26, 2015
    Assignee: Cyberpulse, L.L.C.
    Inventor: James Roberge
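One plausible reading of the exact/inexact/partial matching is sketched below. The use of `difflib` for inexact matches, the 0.8 similarity threshold, and the match weights are all our assumptions.

```python
import difflib

def match_word(word, term):
    """Classify the match between an utterance word and a template term."""
    w, t = word.lower(), term.lower()
    if w == t:
        return "exact"
    if difflib.SequenceMatcher(None, w, t).ratio() > 0.8:
        return "inexact"  # close but not identical, e.g. a morphological variant
    if t.startswith(w) or w in t:
        return "partial"
    return None

def best_template(utterance_words, templates):
    """Score each template (a list of terms) and return the best match."""
    weights = {"exact": 1.0, "inexact": 0.6, "partial": 0.3}
    def score(template):
        return sum(max((weights.get(match_word(w, t), 0.0) for t in template),
                       default=0.0)
                   for w in utterance_words)
    return max(templates, key=score)
```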
  • Patent number: 9037469
    Abstract: An apparatus includes a plurality of applications and an integrator having a voice recognition module configured to identify at least one voice command from a user. The integrator is configured to integrate information from a remote source into at least one of the plurality of applications based on the identified voice command. A method includes analyzing speech from a first user of a first mobile device having a plurality of applications, identifying a voice command based on the analyzed speech using a voice recognition module, and incorporating information from a remote source into at least one of the plurality of applications based on the identified voice command.
    Type: Grant
    Filed: January 27, 2014
    Date of Patent: May 19, 2015
    Assignee: VERIZON PATENT AND LICENSING INC.
    Inventor: Robert E. Opaluch
  • Publication number: 20150134332
    Abstract: A speech recognition method and device are disclosed. The method includes: acquiring a text file specified by a user and extracting command words from the text file to obtain a command word list; comparing the command word list with a command word library to confirm whether the list includes a new command word; if it does, generating a corresponding new pronunciation dictionary and a new language model; merging the new language model into a language model library; and receiving speech and performing speech recognition on it according to an acoustic model, the pronunciation dictionary, and the language model library. Because command words acquired online are closely related to the online content, the number of command words is limited and far smaller than the number of frequently used words. (A minimal sketch of the update step follows this entry.)
    Type: Application
    Filed: January 16, 2015
    Publication date: May 14, 2015
    Inventors: Change Liu, Deming Zhang
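The update loop reduces to a set difference plus dictionary generation. In this sketch, `command_library` is a set, `lexicon` a dict, and `grapheme_to_phoneme` a placeholder for a real grapheme-to-phoneme component; the merge into the language model library is left out.

```python
def update_command_words(text_file_words, command_library, lexicon):
    """Extract a command word list and register any new command words."""
    command_list = sorted(set(text_file_words))  # command words from the text file
    new_words = [w for w in command_list if w not in command_library]
    for w in new_words:
        lexicon[w] = grapheme_to_phoneme(w)      # new pronunciation entries
    command_library.update(new_words)            # merge into the command library
    return new_words

def grapheme_to_phoneme(word):
    # Placeholder: real systems use a trained G2P model or a hand-built dictionary.
    return list(word)
```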
  • Publication number: 20150134331
    Abstract: In an embodiment, an integrated circuit (an SOC, or system on a chip) may include one or more CPUs, a memory controller, and a circuit configured to remain powered on when the rest of the SOC is powered down. The circuit may be configured to receive audio samples from a microphone and match those samples against a predetermined pattern to detect a possible command from a user of the device that includes the SOC. In response to detecting the predetermined pattern, the circuit may cause the memory controller to power up so that audio samples may be stored in the memory to which the memory controller is coupled. The circuit may also cause the CPUs to be powered on and initialized, and the operating system (OS) may boot. During the time that the CPUs are initializing and the OS is booting, the circuit and the memory may continue capturing the audio samples. (A minimal sketch follows this entry.)
    Type: Application
    Filed: December 17, 2013
    Publication date: May 14, 2015
    Applicant: Apple Inc.
    Inventors: Timothy J. Millet, Manu Gulati, Michael F. Culbert
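In software terms, the always-on block behaves like a ring buffer plus a pattern matcher that triggers power-up. The `soc` interface below is invented for illustration; on real silicon these are hardware power-management actions.

```python
from collections import deque

class AlwaysOnWakeCircuit:
    """Software analogue of the always-on circuit described above (illustrative)."""

    def __init__(self, pattern_matcher, buffer_samples=16000):
        self.buf = deque(maxlen=buffer_samples)  # keeps capturing while CPUs boot
        self.match = pattern_matcher             # detects the predetermined pattern
        self.powered_up = False

    def on_audio_sample(self, sample, soc):
        self.buf.append(sample)                  # capture continues during boot
        if not self.powered_up and self.match(self.buf):
            soc.power_up_memory_controller()     # so samples can be stored in memory
            soc.power_up_cpus_and_boot_os()      # OS boots while buffering continues
            self.powered_up = True
```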
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation via acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new (target) speaker and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model in which each phoneme used in the pronouncing dictionary is represented by a weighted sum of the acoustic models for all the plausible phonemes; the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the target speaker using the custom speech model. (A minimal sketch of the weighted sum follows this entry.)
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
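The restructured model evaluates each dictionary phoneme as a weighted sum of the native phoneme models. A minimal rendering, with invented `weights` and `native_models` structures:

```python
def custom_phoneme_likelihood(frame, weights, native_models):
    """Likelihood of an acoustic frame under the restructured phoneme model.

    weights: dict mapping phoneme -> mixture weight, estimated from the new
             speaker's phoneme lattice (assumed given here).
    native_models: dict mapping phoneme -> native acoustic model exposing a
             likelihood(frame) method (also assumed).
    """
    return sum(w * native_models[ph].likelihood(frame)
               for ph, w in weights.items())
```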
  • Publication number: 20150120297
    Abstract: A voice-responsive building management system is described herein. One system includes an interface, a dynamic grammar builder, and a speech processing engine. The interface is configured to receive a speech card of a user, wherein the speech card of the user includes speech training data of the user and domain vocabulary for applications of the building management system for which the user is authorized. The dynamic grammar builder is configured to generate grammar from a building information model of the building management system. The speech processing engine is configured to receive a voice command or voice query from the user, and execute the voice command or voice query using the speech training data of the user, the domain vocabulary, and the grammar generated from the building information model.
    Type: Application
    Filed: October 24, 2013
    Publication date: April 30, 2015
    Applicant: Honeywell International Inc.
    Inventor: Jayaprakash Meruva
  • Patent number: 9020818
    Abstract: Implementations of systems, methods, and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 28, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Patent number: 9020820
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Grant
    Filed: April 13, 2012
    Date of Patent: April 28, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Publication number: 20150112679
    Abstract: A method for building a language model, a speech recognition method, and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model: each phonetic spelling is matched to a candidate sentence table, a word probability of each phonetic spelling matching a word in a sentence of the table is obtained, and the word probabilities of the phonetic spellings are combined to obtain the text sequence probabilities. The text sequence corresponding to the largest of the text sequence probabilities is selected as the recognition result of the speech signal. (A minimal sketch of the selection step follows this entry.)
    Type: Application
    Filed: September 29, 2014
    Publication date: April 23, 2015
    Inventor: Guo-Feng Zhang
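The final step is an argmax over text sequence probabilities. Assuming each sequence probability is the product of its word probabilities (the abstract does not fix the combination rule), a log-space sketch:

```python
import math

def best_text_sequence(candidates):
    """Pick the candidate text sequence with the largest probability.

    candidates: dict mapping a text sequence to the list of word probabilities
    P(word | phonetic spelling) obtained from the language model.
    """
    def log_prob(word_probs):
        # Sum of logs == log of the product; avoids underflow on long sequences.
        return sum(math.log(max(p, 1e-12)) for p in word_probs)
    return max(candidates, key=lambda seq: log_prob(candidates[seq]))
```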
  • Patent number: 9015044
    Abstract: Implementations of systems, methods, and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Patent number: 9009045
    Abstract: Methods and systems for model-driven candidate sorting for evaluating digital interviews are described. In one embodiment, a model-driven candidate-sorting tool selects a data set of digital interview data for sorting. The data set includes candidate data for interviewing candidates (also referred to herein as interviewees). The tool analyzes the candidate data for each interviewing candidate to identify digital interviewing cues and applies those cues to a prediction model to predict an achievement index for the candidate. This is performed without reviewer input at the model-driven candidate-sorting tool. The list of interview candidates is sorted according to the predicted achievement indices, and the sorted list is presented to the reviewer in a user interface.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: April 14, 2015
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Patent number: 9009040
    Abstract: According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text.
    Type: Grant
    Filed: May 5, 2010
    Date of Patent: April 14, 2015
    Assignee: Cisco Technology, Inc.
    Inventors: Todd C. Tatum, Michael A. Ramalho, Paul M. Dunn, Shantanu Sarkar, Tyrone T. Thorsen, Alan D. Gatzke
  • Publication number: 20150100317
    Abstract: A speech recognition device starts to generate dictionary data for each type of name based on name data and paraphrase data, and executes dictionary registration of the dictionary data. The speech recognition device obtains text information that is the same as the text information used to generate the dictionary data the previous time. When back-up data corresponding to that previous text information has been generated, the speech recognition device executes dictionary registration of the dictionary data generated as the back-up data. Further, a dictionary data generation device executes dictionary registration of the dictionary data based on given name data each time it completes generation of the dictionary data based on that name data.
    Type: Application
    Filed: January 29, 2013
    Publication date: April 9, 2015
    Inventors: Hideaki Tsuji, Satoshi Miyaguni
  • Patent number: 9002708
    Abstract: A speech recognition system and method based on word-level candidate generation are provided. The speech recognition system may include a speech recognition result verifying unit to verify a word sequence and a candidate word for at least one word included in the word sequence when the word sequence and the candidate word are provided as a result of speech recognition. A word sequence displaying unit may display the word sequence in which the at least one word is visually distinguishable from other words of the word sequence. The word sequence displaying unit may display the word sequence by replacing the at least one word with the candidate word when the at least one word is selected by a user.
    Type: Grant
    Filed: May 8, 2012
    Date of Patent: April 7, 2015
    Assignee: NHN Corporation
    Inventors: Sang Ho Lee, Hoon Kim, Dong Ook Koo, Dae Sung Jung
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages the collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be gathered from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 8996371
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Grant
    Filed: March 29, 2012
    Date of Patent: March 31, 2015
    Assignee: Nice-Systems Ltd.
    Inventors: Eyal Hurvitz, Ezra Daya, Oren Pereg, Moshe Wasserblat
  • Patent number: 8996380
    Abstract: Systems and methods of synchronizing media are provided. A client device may be used to capture a sample of a media stream being rendered by a media rendering source. The client device sends the sample to a position identification module to determine a time offset indicating a position in the media stream corresponding to the sampling time of the sample, and optionally a timescale ratio indicating the speed at which the media stream is being rendered relative to a reference speed of the media stream. The client device calculates a real-time offset using a present time, a timestamp of the media sample, the time offset, and optionally the timescale ratio. The client device then renders a second media stream at a position corresponding to the real-time offset, in synchrony with the media stream being rendered by the media rendering source. (The offset arithmetic is sketched after this entry.)
    Type: Grant
    Filed: May 4, 2011
    Date of Patent: March 31, 2015
    Assignee: Shazam Entertainment Ltd.
    Inventors: Avery Li-Chun Wang, Rahul Powar, William Michael Mills, Christopher Jacques Penrose Barton, Philip Georges Inghelbrecht, Dheeraj Shankar Mukherjee
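The real-time offset can be written down directly from the abstract: the position now equals the matched position plus the time elapsed since the sample, scaled by the playback-speed ratio. Variable names are ours.

```python
import time

def real_time_offset(time_offset, sample_timestamp, timescale_ratio=1.0, now=None):
    """Current position (seconds) in the media stream.

    time_offset: stream position that matched the captured sample.
    sample_timestamp: wall-clock time at which the sample was captured.
    timescale_ratio: rendering speed relative to the reference speed.
    """
    now = time.time() if now is None else now
    return time_offset + (now - sample_timestamp) * timescale_ratio

# A sample captured 2.5 s ago that matched position 63.0 s, playing at 1.02x,
# puts the second stream at 63.0 + 2.5 * 1.02 = 65.55 s.
```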
  • Patent number: 8996376
    Abstract: Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
    Type: Grant
    Filed: April 5, 2008
    Date of Patent: March 31, 2015
    Assignee: Apple Inc.
    Inventors: Christopher Brian Fleizach, Reginald Dean Hudson
  • Patent number: 8996372
    Abstract: Speech recognition may be improved using data derived from an utterance. In some embodiments, audio data is received by a user device. Adaptation data may be retrieved from a data store accessible by the user device. The audio data and the adaptation data may be transmitted to a server device. The server device may use the audio data to calculate second adaptation data. The second adaptation data may be transmitted to the user device. Synchronously or asynchronously, the server device may perform speech recognition using the audio data and the second adaptation data and transmit speech recognition results back to the user device.
    Type: Grant
    Filed: October 30, 2012
    Date of Patent: March 31, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Hugh Secker-Walker, Bjorn Hoffmeister, Ryan Thomas, Stan Salvador, Karthik Ramakrishnan
  • Patent number: 8996373
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in a depressed state; a likelihood calculation unit to calculate a first likelihood as the likelihood of the first specific speaker model with respect to input voice, and a second likelihood as the likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine the state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: March 31, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
  • Publication number: 20150088509
    Abstract: A system and method for classifying whether audio data received in a speaker recognition system is genuine or a spoof, using a Gaussian classifier.
    Type: Application
    Filed: September 24, 2014
    Publication date: March 26, 2015
    Inventors: Alfonso Ortega Giménez, Luis Buera Rodriguez, Carlos Vaquero Avilés-Casco
  • Publication number: 20150088510
    Abstract: A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar.
    Type: Application
    Filed: December 4, 2014
    Publication date: March 26, 2015
    Inventors: Cyril Georges Luc ALLAUZEN, Sarangarajan PARTHASARATHY
  • Patent number: 8990084
    Abstract: State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor-intensive and time-consuming. The present invention is an iterative method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples and then selecting the most informative ones, with respect to a given cost function, for a human to label. The method comprises automatically estimating a confidence score for each word of an utterance by exploiting the lattice output of a speech recognizer that was trained on a small set of transcribed data. An utterance confidence score is computed from these word confidence scores; the utterances are then selectively sampled for transcription using the utterance confidence scores. (A minimal sketch of the sampling step follows this entry.)
    Type: Grant
    Filed: February 10, 2014
    Date of Patent: March 24, 2015
    Assignee: Interactions LLC
    Inventors: Allen Louis Gorin, Dilek Z. Hakkani-Tur, Giuseppe Riccardi
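The selective-sampling step ranks utterances by a confidence score derived from the per-word lattice confidences and sends the least confident to a human. The mean is used as the utterance score here, one simple choice the abstract leaves open.

```python
def select_for_transcription(utterances, word_confidences, budget):
    """Return the `budget` least-confident utterances for human labeling.

    word_confidences: dict mapping an utterance to its per-word confidence
    scores, taken from the recognizer's lattice output (assumed given).
    """
    def utterance_confidence(u):
        scores = word_confidences[u]
        return sum(scores) / len(scores)  # mean word confidence
    ranked = sorted(utterances, key=utterance_confidence)  # least confident first
    return ranked[:budget]
```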
  • Publication number: 20150081297
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models, and an increase in performance for a given amount of transcribed and un-transcribed data.
    Type: Application
    Filed: November 24, 2014
    Publication date: March 19, 2015
    Inventors: Dilek Zeynep HAKKANI-TUR, Giuseppe RICCARDI
  • Patent number: 8983836
    Abstract: Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into segments that are homogeneous with regard to speakers and background sounds. For at least one segment, a speaker providing speech in the segment's audio track is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the speech profile, and an automatic speech recognition engine is dynamically configured for operation on the segment based on the acoustic profile. Automatic speech recognition operations are then performed on the audio track of the segment to generate a textual representation of the speech content corresponding to the speaker.
    Type: Grant
    Filed: September 26, 2012
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Elizabeth V. Woodward, Shunguo Yan
  • Publication number: 20150073798
    Abstract: Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yael Karov, Eran Levy, Sari Brosh-Lipstein
  • Publication number: 20150073794
    Abstract: In syllable or vowel or phone boundary detection during speech, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more syllable or vowel or phone boundaries in the input window of sound can be detected by mapping the cumulative gist vector to one or more syllable or vowel or phone boundary characteristics using a machine learning algorithm.
    Type: Application
    Filed: June 17, 2014
    Publication date: March 12, 2015
    Inventors: Ozlem Kalinli-Akbacak, Ruxin Chen
  • Publication number: 20150073795
    Abstract: A low power sound recognition sensor is configured to receive an analog signal that may contain a signature sound. Sparse sound parameter information is extracted from the analog signal and processed using a speaker dependent sound signature database stored in the sound recognition sensor to identify sounds or speech contained in the analog signal. The sound signature database may include several user enrollments for a sound command, each representing an entire word or multiword phrase. The extracted sparse sound parameter information may be compared to the multiple user enrolled signatures using cosine distance, Euclidean distance, correlation distance, etc.
    Type: Application
    Filed: August 13, 2014
    Publication date: March 12, 2015
    Inventor: Bozhao Tan
  • Patent number: 8977551
    Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
    Type: Grant
    Filed: October 27, 2011
    Date of Patent: March 10, 2015
    Assignee: Goertek Inc.
    Inventors: Fengliang Wu, Zhenhua Wu
  • Patent number: 8972258
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: March 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data. (A minimal sketch of the frequency split follows this entry.)
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
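The core of the method is a frequency partition of the training utterances. A sketch follows; how counts exactly at the threshold are handled is our own choice, and the model builders themselves are out of scope.

```python
from collections import Counter

def split_training_utterances(utterances, threshold):
    """Partition utterances into grammar-LM and statistical-LM training data."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]   # -> grammar-based LM
    low = [u for u, c in counts.items() if c <= threshold]   # -> statistical LM
    return high, low
```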
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt, is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Publication number: 20150058013
    Abstract: Techniques are described for calculating one or more verbal fluency scores for a person. An example method includes classifying, by a computing device, samples of audio data of speech of a person, based on amplitudes of the samples, into a first class of samples including speech or sound and a second class of samples including silence. The method further includes analyzing the first class of samples to determine a number of words spoken by the person, and calculating a verbal fluency score for the person based at least in part on the determined number of words spoken by the person. (A minimal sketch follows this entry.)
    Type: Application
    Filed: March 14, 2013
    Publication date: February 26, 2015
    Applicant: Regents of the University of Minnesota
    Inventors: Serguei V.S. Pakhomov, Laura Sue Hemmy, Kelvin O. Lim
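A heavily simplified rendering of the pipeline: amplitude thresholding splits samples into the speech/sound and silence classes, contiguous speech runs stand in for words, and a words-per-minute score falls out. The threshold and the words-per-burst calibration are illustrative assumptions, not the published procedure.

```python
def verbal_fluency_score(samples, amplitude_threshold,
                         words_per_burst=1.0, duration_seconds=60.0):
    """Estimate words per minute from raw audio samples (illustrative)."""
    in_speech, bursts = False, 0
    for s in samples:
        loud = abs(s) >= amplitude_threshold   # first class: speech or sound
        if loud and not in_speech:
            bursts += 1                        # a new run of speech begins
        in_speech = loud
    words = bursts * words_per_burst           # crude word-count proxy
    return words / (duration_seconds / 60.0)   # words per minute
```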
  • Publication number: 20150058014
    Abstract: A conversation management system includes: a training unit that generates an articulation speech act and an entity name for a training corpus, generates a lexical syntactic pattern, and estimates a speech act and an entity name of the training corpus; a database that stores the articulation speech act, the entity name, and the lexical syntactic pattern of the training corpus; an execution unit that generates an articulation speech act and an entity name for a user, generates a user lexical syntactic pattern, estimates a speech act and an entity name of the user, searches the database for an articulation pair corresponding to a user articulation using a search condition that includes the estimated user speech act and the generated user lexical syntactic pattern, and generates a final response by selecting an articulation template, using a restriction condition that includes the estimated entity name, from among the found articulation pairs; and an output unit that outputs the generated final response.
    Type: Application
    Filed: January 18, 2013
    Publication date: February 26, 2015
    Inventors: Gary Geunbae Lee, Hyungjong Noh, Kyusong Lee
  • Publication number: 20150058015
    Abstract: A voice processing apparatus includes a voice quality determining unit configured to select, in accordance with a control value, the method to be used to determine a target speaker whose voice quality is the target of a voice quality conversion, and to determine the target speaker in accordance with the selected method.
    Type: Application
    Filed: August 8, 2014
    Publication date: February 26, 2015
    Inventors: YUHKI MITSUFUJI, TORU CHINEN
  • Patent number: 8965761
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: February 27, 2014
    Date of Patent: February 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Thomas James Watson, Daniel Mark Schumacher
  • Publication number: 20150051909
    Abstract: Provided is a pattern recognition apparatus that creates multiple systems and combines them to improve recognition performance. It includes a discriminative training unit for constructing the model parameters of a second or subsequent system so that its output tendency differs from that of a previously-constructed model. Accordingly, when multiple systems are combined, recognition performance can be improved without trial and error.
    Type: Application
    Filed: August 13, 2013
    Publication date: February 19, 2015
    Applicants: MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC., MITSUBISHI ELECTRIC CORPORATION
    Inventors: Yuki TACHIOKA, Shinji Watanabe
  • Patent number: 8959019
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 17, 2015
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Narren Chittar
  • Patent number: 8949125
    Abstract: Systems and methods are provided to select the most typical pronunciation of a location name on a map from a plurality of user pronunciations. A server generates a reference speech model based on the user pronunciations, compares each user pronunciation with the speech model, and selects a pronunciation based on the comparison. Alternatively, the server computes the distance between each user pronunciation and every other user pronunciation and selects a pronunciation based on those distances. The server then annotates the map with the selected pronunciation and provides the audio of the location name to a user device upon request. (A minimal sketch of the second strategy follows this entry.)
    Type: Grant
    Filed: June 16, 2010
    Date of Patent: February 3, 2015
    Assignee: Google Inc.
    Inventor: Gal Chechik
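The alternative strategy, picking the pronunciation with the smallest total distance to all the others (a medoid), is easy to sketch. The pairwise `distance` function, e.g. DTW over acoustic features, is assumed rather than specified.

```python
def most_typical_pronunciation(pronunciations, distance):
    """Return the pronunciation closest, in total, to all the others."""
    def total_distance(p):
        return sum(distance(p, q) for q in pronunciations if q is not p)
    return min(pronunciations, key=total_distance)
```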
  • Patent number: 8947220
    Abstract: Speech recognition in a vehicle through an extrinsic device includes detecting, via the vehicle, a presence of a mobile communications device that is configured with a speech recognition component. A vehicle processor encodes data lists stored in the vehicle and transmits the data lists and a vehicle identifier to the mobile communications device. In response to receiving a request to initiate a voice recognition session, the vehicle transmits the request and the vehicle identifier to the mobile communications device that causes activation of the speech recognition component. The mobile communications device retrieves the data lists via the identifier. In response to a voice command received by the speech recognition component, the speech recognition component interprets the voice command, determines an action by evaluating the voice command in view of the data lists, and transmits an instruction to the vehicle processor directing the vehicle to implement the action.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: February 3, 2015
    Assignee: GM Global Technology Operations LLC
    Inventors: Douglas C. Martin, Nathan D. Ampunan
  • Patent number: 8935167
    Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
    Type: Grant
    Filed: September 25, 2012
    Date of Patent: January 13, 2015
    Assignee: Apple Inc.
    Inventor: Jerome Bellegarda
  • Patent number: 8924213
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set.
    Type: Grant
    Filed: July 9, 2012
    Date of Patent: December 30, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Patent number: 8918408
    Abstract: A computing device maintains an input history in memory. This input history includes input strings that have been previously entered into the computing device. When the user begins entering characters of an input string, a predictive input engine is activated. The predictive input engine receives the input string and the input history to generate a candidate list of predictive inputs, which is presented to the user. The user can select one of the inputs from the list or otherwise continue entering characters. The computing device generates the candidate list by combining frequency and recency information for the matching strings from the input history. Additionally, the candidate list can be manipulated to present a variety of candidates. By using a combination of frequency, recency, and variety, a favorable user experience is provided. (A minimal sketch of the frequency/recency ranking follows this entry.)
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: Microsoft Corporation
    Inventors: Katsutoshi Ohtsuki, Koji Watanabe
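Frequency and recency can be blended by summing an exponentially decayed weight over past uses of each matching string. The decay form and half-life are our assumptions, and the "variety" manipulation the abstract mentions is omitted.

```python
import time

def rank_candidates(prefix, history, half_life_seconds=7 * 24 * 3600):
    """Rank predictive-input candidates for `prefix` by frequency and recency.

    history: list of (input_string, unix_timestamp) pairs from the input history.
    """
    now, scores = time.time(), {}
    for text, ts in history:
        if text.startswith(prefix):
            decay = 0.5 ** ((now - ts) / half_life_seconds)  # recency weight
            scores[text] = scores.get(text, 0.0) + decay     # summed -> frequency
    return sorted(scores, key=scores.get, reverse=True)
```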
  • Patent number: 8918318
    Abstract: Speech recognition is enabled even for a new speaker of a speech recognition system by using an extended recognition dictionary suited to the speaker, without requiring any prior learning that uses utterance labels corresponding to the speaker's speech.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: December 23, 2014
    Assignee: NEC Corporation
    Inventor: Yoshifumi Onishi