Creation Of Reference Templates; Training Of Speech Recognition Systems, E.g., Adaptation To The Characteristics Of The Speaker's Voice, Etc. (epo) Patents (Class 704/E15.007)
-
Publication number: 20140088964
Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
Type: Application
Filed: September 25, 2012
Publication date: March 27, 2014
Applicant: APPLE INC.
Inventor: Jerome Bellegarda
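The per-segment selection step above can be sketched as a nearest-exemplar search. This is a hedged illustration only: the feature representation (plain vectors), the cosine similarity measure, and the value of k are assumptions, not details taken from the abstract.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_focused_exemplars(segment_vec, corpus, k=2):
    # Rank the global training corpus by similarity to the input
    # segment and keep the top-k entries as focused training data,
    # from which a focused model would then be built.
    ranked = sorted(corpus, key=lambda ex: cosine(segment_vec, ex[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]
```

A focused model trained only on the returned exemplars would then be handed to the recognition device for that one segment.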
-
Publication number: 20140067394
Abstract: The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to such languages. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicants: KING ABDULAZIZ CITY FOR SCIENCE AND TECHNOLOGY, KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS
Inventors: DIA EDDIN M. ABUZEINA, MOUSTAFA ELSHAFEI, HUSNI AL-MUHTASEB, WASFI G. AL-KHATIB
-
Publication number: 20130332164
Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
Type: Application
Filed: June 8, 2012
Publication date: December 12, 2013
Inventor: Devang K. Naik
-
Publication number: 20130246064
Abstract: A system and method for real-time processing of a signal of a voice interaction. In an embodiment, a digital representation of a portion of an interaction may be analyzed in real-time and a segment may be selected. The segment may be associated with a source based on a model of the source. The model may be updated based on the segment. The updated model is used to associate subsequent segments with the source. Other embodiments are described and claimed.
Type: Application
Filed: March 13, 2012
Publication date: September 19, 2013
Inventors: Moshe WASSERBLAT, Tzachi ASHKENAZI, Merav BEN-ASHER, Oren PEREG
-
Publication number: 20130191126
Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
Type: Application
Filed: January 20, 2012
Publication date: July 25, 2013
Applicant: Microsoft Corporation
Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
-
Publication number: 20130185070
Abstract: A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech.
Type: Application
Filed: January 12, 2012
Publication date: July 18, 2013
Applicant: Microsoft Corporation
Inventors: Qiang Huo, Zhi-Jie Yan, Yu Zhang
-
Publication number: 20130144618
Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.
Type: Application
Filed: March 12, 2012
Publication date: June 6, 2013
Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
-
Publication number: 20130132080
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
Type: Application
Filed: November 18, 2011
Publication date: May 23, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Jason Williams, Tirso Alonso, Barbara B. Hollister, Ilya Dan Melamed
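The incremental stopping rule described above can be sketched as follows. This is one hedged reading: treating the leading label's share of votes (with at least two votes seen) as the "accuracy threshold" is an assumption, as is majority voting for the output response.

```python
from collections import Counter

def aggregate_labels(responses, m=5, accuracy_threshold=0.8):
    # Consume crowd responses one at a time.  Stop early once the
    # leading label's share of votes reaches the threshold (requiring
    # at least two responses so a single vote cannot decide), or once
    # m responses have been received, whichever comes first.
    votes = Counter()
    for received, response in enumerate(responses, start=1):
        votes[response] += 1
        label, count = votes.most_common(1)[0]
        if received >= 2 and count / received >= accuracy_threshold:
            return label, received   # confident early stop
        if received >= m:
            return label, received   # hard cap of m responses
    label, _ = votes.most_common(1)[0]
    return label, sum(votes.values())
```

Agreement among the first two workers stops collection immediately; disagreement keeps requesting responses up to the cap of m.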
-
Publication number: 20130132084
Abstract: A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
Type: Application
Filed: June 21, 2012
Publication date: May 23, 2013
Applicant: SOUNDHOUND, INC.
Inventors: Timothy Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
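The arbitration logic between the two recognizers can be sketched in a few lines. A hedged sketch only: the result tuple shape and the cutoff value are assumptions, and the client-vocabulary update step is omitted.

```python
LATENCY_CUTOFF = 2.0  # seconds; an assumed value

def choose_transcription(local, remote):
    # local/remote: (transcription, confidence, latency_seconds), or
    # None when that recognizer produced no result.  Results slower
    # than the cutoff are discarded; of the survivors, the higher-
    # confidence transcription wins; if neither survives, give up.
    usable = [r for r in (local, remote)
              if r is not None and r[2] <= LATENCY_CUTOFF]
    if not usable:
        return None
    return max(usable, key=lambda r: r[1])[0]
```

When the remote engine misses the latency cutoff, the local result is accepted even if its confidence is lower, which is what keeps the interface responsive on a flaky connection.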
-
Publication number: 20130073286
Abstract: Candidate interpretations resulting from application of speech recognition algorithms to spoken input are presented in a consolidated manner that reduces redundancy. A list of candidate interpretations is generated, and each candidate interpretation is subdivided into time-based portions, forming a grid. Those time-based portions that duplicate portions from other candidate interpretations are removed from the grid. A user interface is provided that presents the user with an opportunity to select among the candidate interpretations; the user interface is configured to present these alternatives without duplicate elements.
Type: Application
Filed: September 20, 2011
Publication date: March 21, 2013
Applicant: APPLE INC.
Inventors: Marcello Bastea-Forte, David A. Winarsky
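The grid-deduplication idea can be sketched as below. This assumes, for illustration, that the time-based portions of each candidate are already aligned by position; `None` stands in for a removed (duplicate) cell.

```python
def consolidate(candidates):
    # candidates: candidate interpretations, each already subdivided
    # into time-aligned portions (the grid).  A portion that repeats
    # what an earlier candidate shows at the same position is blanked
    # out, so the user only sees the differing alternatives.
    width = max(len(c) for c in candidates)
    seen = [set() for _ in range(width)]
    grid = []
    for cand in candidates:
        row = []
        for pos, portion in enumerate(cand):
            if portion in seen[pos]:
                row.append(None)       # duplicate: hide it
            else:
                seen[pos].add(portion)
                row.append(portion)
        grid.append(row)
    return grid
```

A UI rendering this grid would show the shared prefix once and only the words that actually differ between hypotheses.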
-
Patent number: 8350683
Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly attached at an inner portion of the windshield of a vehicle equipped with the interior rearview mirror assembly. The interior rearview mirror assembly includes at least two microphones for receiving audio signals within a cabin of the vehicle and generating an output indicative of the audio signals. A control is in the vehicle and is responsive to the output from the at least one microphone. The control at least partially distinguishes vocal signals from non-vocal signals present in the output. The at least two microphones provide sound capture for at least one of a hands free cell phone system, an audio recording system and a wireless communication system.
Type: Grant
Filed: August 15, 2011
Date of Patent: January 8, 2013
Assignee: Donnelly Corporation
Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
-
Publication number: 20130006636
Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors that transform numerical features representing the features of words having specific meanings and the surrounding words into elements, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within a cluster for each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words on the basis of which the feature vectors were generated by applying the specified extraction rules to the feature vectors.
Type: Application
Filed: March 24, 2011
Publication date: January 3, 2013
Applicant: NEC CORPORATION
Inventors: Hironori Mizuguchi, Dai Kusui
-
Publication number: 20130006634
Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.
Type: Application
Filed: January 6, 2012
Publication date: January 3, 2013
Applicant: QUALCOMM Incorporated
Inventors: Leonard Henry Grokop, Vidya Narayanan
-
Publication number: 20130006635
Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
Type: Application
Filed: September 11, 2012
Publication date: January 3, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES
Inventor: Hagai Aronowitz
-
Publication number: 20120284025
Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database, and if the received speech matches data in the precompiled grammar, then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then a new grammar is dynamically compiled based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
Type: Application
Filed: July 18, 2012
Publication date: November 8, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Harry Blanchard, Steven LEWIS, Shankarnarayan SIVAPRASAD, Lan ZHANG
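The two-tier lookup can be sketched as below. This is a hedged toy: the "grammar" here is just a set of normalized names, standing in for a real compiled recognition grammar, and the class name and API are invented for illustration.

```python
class GrammarRecognizer:
    def __init__(self, database):
        # Precompile a grammar from the database as it exists now.
        self.database = list(database)
        self.precompiled = self._compile(self.database)
        self._compiled_size = len(self.database)

    @staticmethod
    def _compile(entries):
        # Stand-in for grammar compilation: a set of normalized names.
        return {e.lower() for e in entries}

    def add(self, entry):
        # New data added after the precompiled grammar was built.
        self.database.append(entry)

    def recognize(self, speech):
        if speech.lower() in self.precompiled:
            return speech
        # Fall back: dynamically compile a grammar only from entries
        # added since the precompiled grammar was built.
        new_grammar = self._compile(self.database[self._compiled_size:])
        return speech if speech.lower() in new_grammar else None
```

Compiling only the delta keeps the dynamic step cheap even when the underlying directory of names is large.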
-
Publication number: 20120278066
Abstract: A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
Type: Application
Filed: November 9, 2010
Publication date: November 1, 2012
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
-
Publication number: 20120271631
Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
Type: Application
Filed: April 19, 2012
Publication date: October 25, 2012
Applicant: ROBERT BOSCH GMBH
Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
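The frequency-based split of the training data can be sketched directly. One hedged detail: the abstract leaves utterances exactly at the threshold unassigned ("exceeds" vs. "below"), so treating "at or above threshold" as high-frequency is an assumption here.

```python
from collections import Counter

def split_training_data(utterances, threshold=2):
    # Count each distinct utterance, then route high-frequency
    # utterances to the grammar-based model's training set and the
    # rest to the statistical model's training set.
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c >= threshold]
    low = [u for u, c in counts.items() if c < threshold]
    return high, low
```

The intuition: frequent, stereotyped commands are well served by a rigid grammar, while the long tail of rare phrasings is better covered by a statistical model.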
-
Publication number: 20120239399
Abstract: Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation.
Type: Application
Filed: March 30, 2010
Publication date: September 20, 2012
Inventors: Michihiro Yamazaki, Yuzo Maruta
-
Publication number: 20120203553
Abstract: A recognition dictionary creating device includes a user dictionary, in which a phoneme label string of an inputted voice is registered, and an interlanguage acoustic data mapping table, in which a correspondence between phoneme labels in different languages is defined. The device refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary, expressed in the language set at the time of creating the user dictionary, into a phoneme label string expressed in another language to which the recognition dictionary creating device has switched.
Type: Application
Filed: January 22, 2010
Publication date: August 9, 2012
Inventor: Yuzo Maruta
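The table-driven conversion can be sketched as a dictionary lookup per label. The mapping table below is entirely hypothetical (the phoneme label sets and the English-to-Japanese pairings are invented for illustration), as is the fallback to a silence label for unmapped phonemes.

```python
# Hypothetical interlanguage acoustic data mapping table:
# (source language, target language) -> per-label correspondence.
MAPPING_TABLE = {
    ("en", "ja"): {"K": "k", "AE": "a", "T": "t"},
}

def convert_label_string(labels, src_lang, dst_lang, default="sil"):
    # Convert a registered phoneme label string into the newly
    # selected language; labels with no defined correspondence fall
    # back to an assumed default (silence) label.
    table = MAPPING_TABLE[(src_lang, dst_lang)]
    return [table.get(label, default) for label in labels]
```

This is what lets a user-registered entry survive a language switch without re-recording the original utterance.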
-
Publication number: 20120173237
Abstract: A method and apparatus for updating a speech model on a multi-user speech recognition system with a personal speech model for a single user. A speech recognition system, for instance in a car, can include a generic speech model for comparison with the user speech input. A way of identifying a personal speech model, for instance in a mobile phone, is connected to the system. A mechanism is included for receiving personal speech model components, for instance a BLUETOOTH connection. The generic speech model is updated using the received personal speech model components. Speech recognition can then be performed on user speech using the updated generic speech model.
Type: Application
Filed: March 12, 2012
Publication date: July 5, 2012
Applicant: Nuance Communications, Inc.
Inventors: Barry Neil Dow, Eric William Janke, Daniel Lee Yuk Cheung, Benjamin Terrick Staniford
-
Publication number: 20120130715
Abstract: According to one embodiment, an apparatus for generating a voice-tag includes an input unit, a recognition unit, and a combination unit. The input unit is configured to input a registration speech. The recognition unit is configured to recognize the registration speech to obtain N-best recognition results, wherein N is an integer greater than or equal to 2. The combination unit is configured to combine the N-best recognition results as a voice-tag of the registration speech.Type: Application
Filed: September 23, 2011
Publication date: May 24, 2012
Inventors: Rui Zhao, Lei He
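A minimal sketch of combining N-best results into a voice-tag, under an assumption the abstract does not spell out: that a later test utterance matches the tag when any of its own N-best hypotheses overlaps the stored set.

```python
def make_voice_tag(nbest):
    # Combine all N-best recognition results of the registration
    # speech into a single voice-tag (stored as a set).
    return set(nbest)

def matches_tag(voice_tag, nbest):
    # Assumed lookup rule: the test utterance matches the tag if any
    # of its N-best hypotheses overlaps the tag's stored hypotheses.
    return bool(voice_tag & set(nbest))
```

Keeping every hypothesis makes the tag robust to the recognizer mis-hearing the same name the same way twice.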
-
Publication number: 20120116762
Abstract: A Candidate Isolation System (CIS) detects subscribers of phone call services as candidates to be surveillance targets. A Voice Matching System (VMS) then decides whether or not a given candidate Communication Terminal (CT) should be tracked by determining, using speaker recognition techniques, whether the subscriber operating the candidate CT is a known target subscriber. The CIS receives from the network call event data that relate to CTs in the network.
Type: Application
Filed: October 28, 2011
Publication date: May 10, 2012
Applicant: VERINT SYSTEMS LTD.
Inventors: Eithan Goldfarb, Yoav Ariav
-
Publication number: 20120101821
Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into a digitalized speech data, and performs speech recognition based on the speech data. The apparatus makes a comparison between the speech data inputted the last time and the speech data inputted the time before the last time in response to a user's indication that the speech recognition results in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before the last time, the apparatus outputs a guidance prompting the user to utter an input target by calling it by another name.
Type: Application
Filed: October 13, 2011
Publication date: April 26, 2012
Applicant: DENSO CORPORATION
Inventor: Takahiro TSUDA
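The "substantially match" check can be sketched with a string similarity ratio. Hedged assumptions: text transcripts stand in for the digitized speech data, the 0.9 similarity threshold is invented, and the guidance wording is illustrative only.

```python
import difflib

def handle_repeated_error(previous_input, last_input, threshold=0.9):
    # Called after the user reports erroneous recognition several
    # times in a row.  If the last two inputs substantially match,
    # the user is evidently repeating the same utterance, so prompt
    # them to call the target by another name instead.
    ratio = difflib.SequenceMatcher(None, previous_input, last_input).ratio()
    if ratio >= threshold:
        return "Please try saying the destination by another name."
    return None
```

Returning `None` means the two attempts differed, so repeating the normal retry prompt is still reasonable.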
-
Publication number: 20120101812
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
Type: Application
Filed: December 30, 2011
Publication date: April 26, 2012
Applicant: GOOGLE INC.
Inventors: Craig Reding, Suzi Levas
-
Publication number: 20120078635
Abstract: One embodiment of a voice control system includes a first electronic device communicatively coupled to a server and configured to receive a speech recognition file from the server. The speech recognition file may include a speech recognition algorithm for converting one or more voice commands into text and a database including one or more entries comprising one or more voice commands and one or more executable commands associated with the one or more voice commands.
Type: Application
Filed: September 24, 2010
Publication date: March 29, 2012
Applicant: Apple Inc.
Inventors: Fletcher Rothkopf, Stephen Brian Lynch, Adam Mittleman, Phil Hobson
-
Publication number: 20120059653
Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
Type: Application
Filed: August 30, 2011
Publication date: March 8, 2012
Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
-
Publication number: 20120059849
Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text strings. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
Type: Application
Filed: September 8, 2010
Publication date: March 8, 2012
Applicant: DEMAND MEDIA, INC.
Inventors: David M. Yehaskel, Henrik M. Kjallbring
-
Publication number: 20110301954
Abstract: A method for adjusting a voice recognition system and a voice recognition system is disclosed, wherein the voice recognition system comprises a speaker and a microphone, and wherein the method comprises the steps of: memorizing an audio frequency signal, playing back the audio frequency signal by means of the speaker, generating a detection signal by detecting the audio frequency signal by means of the microphone, and adjusting parameters of the voice recognition system dependent on the detection signal.
Type: Application
Filed: June 3, 2010
Publication date: December 8, 2011
Applicant: Johnson Controls Technology Company
Inventors: Michael J. Sims, Brian L. Douthitt, David J. Hughes, Mark Zeinstra, Ted W. Ringold, Douglas W. Klamer, Todd Witters, Elisabet A. Anderson
-
Publication number: 20110276329
Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby an appropriate dialogue control is enabled by determining a user's proficiency level in a dialogue behavior correctly and performing an appropriate dialogue control according to the user's proficiency level correctly determined, without being influenced by an accidental one-time behavior of the user. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor that is a factor for determining a user's proficiency level in a dialogue behavior, based upon an input result of the speech of the input unit 1. A history storage unit 4 stores as a history the proficiency level determination factor extracted by the extraction unit 3.
Type: Application
Filed: January 20, 2010
Publication date: November 10, 2011
Inventors: Masaaki Ayabe, Jun Okamoto
-
Publication number: 20110276323
Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
Type: Application
Filed: May 6, 2010
Publication date: November 10, 2011
Applicant: Senam Consulting, Inc.
Inventor: Serge Olegovich Seyfetdinov
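The two-factor check (passphrase match plus voice-feature match) can be sketched as follows. Hedged assumptions: the passphrase is compared as text, voice features are plain numeric vectors, and the per-feature tolerance is invented for illustration.

```python
def authenticate(reference, test, tolerance=0.1):
    # reference/test: (passphrase, voice_feature_vector).  Both checks
    # must pass: the passphrases match exactly, and every voice
    # feature lies within an assumed tolerance of its reference value.
    ref_phrase, ref_features = reference
    test_phrase, test_features = test
    if test_phrase != ref_phrase:
        return False
    if len(test_features) != len(ref_features):
        return False
    return all(abs(r - t) <= tolerance
               for r, t in zip(ref_features, test_features))
```

Requiring both factors means knowing the passphrase is not enough: a different voice saying the right words still fails.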
-
Publication number: 20110231189
Abstract: Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles are used to update the speech recognition system prior to runtime. In one aspect, rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles.
Type: Application
Filed: March 19, 2010
Publication date: September 22, 2011
Applicant: Nuance Communications, Inc.
Inventors: Josef Damianus Anastasiadis, Christophe Nestor George Couvreur
-
Publication number: 20110231191
Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving a speech recognition performance of place names. In order to address the above purpose, an address database 12 has address information data items including country names, city names, street names, and house numbers, and manages the address information having a tree structure indicating hierarchical relationships between the place names from a wide area to a narrow area. Each of the place names stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient of the likelihood of the aforementioned recognition candidate based on the number of the street names belonging to the lower hierarchy below the city names.
Type: Application
Filed: November 17, 2009
Publication date: September 22, 2011
Inventor: Toshiyuki Miyazaki
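The weight-coefficient idea (city names with more streets beneath them get a larger likelihood weight) can be sketched over a toy address tree. Treating the weight as the street count normalized over all cities is an assumption; the abstract only says the coefficient is based on the count.

```python
def city_weights(address_tree):
    # address_tree: city name -> list of street names in the lower
    # hierarchy.  Each city's likelihood weight is proportional to
    # how many streets it contains.
    totals = {city: len(streets) for city, streets in address_tree.items()}
    grand_total = sum(totals.values())
    return {city: n / grand_total for city, n in totals.items()}
```

The effect during recognition: an ambiguous utterance is biased toward the city a driver is statistically more likely to be addressing, namely the one with more destinations under it.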
-
Publication number: 20110231190
Abstract: A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
Type: Application
Filed: March 21, 2011
Publication date: September 22, 2011
Applicant: Eliza Corporation
Inventors: Nasreen Quibria, Lucas Merrow, Oleg Boulanov, John P. Kroeker, Alexandra Drane
-
Publication number: 20110213613
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Type: Application
Filed: May 24, 2010
Publication date: September 1, 2011
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
-
Patent number: 8004392
Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly. The mirror assembly may include a microphone for receiving audio signals within a cabin of the vehicle and generating an output indicative of these audio signals. The microphone may provide sound capture for a hands free cell phone system, an audio recording system and/or an emergency communication system. The system may include a control that is responsive to the output from the microphone and that distinguishes vocal signals from non-vocal signals present in the output. The microphone may provide sound capture for at least one accessory of the equipped vehicle, and the accessory may be responsive to a vocal signal captured by the microphone. The interior rearview mirror assembly may include at least one accessory, such as an antenna, a video device, a security system status indicator, a tire pressure indicator display and/or a loudspeaker.
Type: Grant
Filed: December 19, 2008
Date of Patent: August 23, 2011
Assignee: Donnelly Corporation
Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
-
Publication number: 20110166858
Abstract: A method for recognizing speech involves presenting an utterance to a speech recognition system and determining, via the speech recognition system, that the utterance contains a particular expression, where the particular expression is capable of being associated with at least two different meanings. The method further involves splitting the utterance into a plurality of speech frames, where each frame is assigned a predetermined time segment and a frame number, and indexing the utterance to i) a predetermined frame number, or ii) a predetermined time segment. The indexing of the utterance identifies that one of the frames includes the particular expression. Then the frame including the particular expression is re-presented to the speech recognition system to verify that the particular expression was actually recited in the utterance.
Type: Application
Filed: January 6, 2010
Publication date: July 7, 2011
Applicant: GENERAL MOTORS LLC
Inventor: Uma Arun
-
Publication number: 20110161072. Abstract: A frequency counting unit (15A) counts occurrence frequencies (14B) in input text data (14A) for respective words or word chains contained in the input text data (14A). A context diversity calculation unit (15B) calculates, for the respective words or word chains, diversity indices (14C) each indicating the context diversity of a word or word chain. A frequency correction unit (15C) corrects the occurrence frequencies (14B) of the respective words or word chains based on the diversity indices (14C) of the respective words or word chains. An N-gram language model creation unit (15D) creates an N-gram language model (14E) based on the corrected occurrence frequencies (14D) obtained for the respective words or word chains. Type: Application. Filed: August 20, 2009. Publication date: June 30, 2011. Applicant: NEC CORPORATION. Inventors: Makoto Terao, Kiyokazu Miki, Hitoshi Yamamoto
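A minimal sketch of the pipeline in the abstract above: count word-chain frequencies, score each word's context diversity, correct the raw counts, and estimate bigram probabilities from the corrected counts. The diversity measure (number of distinct right-contexts) and the multiplicative correction rule are assumptions for illustration, not the patent's formulas.

```python
from collections import Counter, defaultdict

def count_bigrams(tokens):
    """Occurrence frequencies for two-word chains."""
    return Counter(zip(tokens, tokens[1:]))

def context_diversity(tokens):
    """Diversity index of each word: here, the number of distinct
    words that follow it in the input text."""
    following = defaultdict(set)
    for w, nxt in zip(tokens, tokens[1:]):
        following[w].add(nxt)
    return {w: len(s) for w, s in following.items()}

def corrected_counts(bigrams, diversity):
    """Correct each bigram count by the left word's context diversity."""
    return {bg: c * diversity.get(bg[0], 1) for bg, c in bigrams.items()}

def bigram_model(counts):
    """Maximum-likelihood bigram probabilities from the corrected counts."""
    totals = defaultdict(float)
    for (w, _), c in counts.items():
        totals[w] += c
    return {bg: c / totals[bg[0]] for bg, c in counts.items()}
```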
-
Publication number: 20110161081. Abstract: Methods, computer program products and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in the cluster that includes the website. Type: Application. Filed: December 22, 2010. Publication date: June 30, 2011. Applicant: GOOGLE INC. Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
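The clustering step above treats queries and websites as a bipartite relation and joins records that share a query or a website. One simple way to realize this, purely as an illustrative sketch, is a union-find over (query, website) pairs; the function name and representation are assumptions.

```python
def clusters(relations):
    """relations: list of (query, website) pairs.
    Returns the connected components (clusters) as sets of pairs,
    where pairs sharing a query or a website end up together."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Connect each query node to each website node it co-occurs with.
    for q, w in relations:
        union(("q", q), ("w", w))

    comps = {}
    for q, w in relations:
        comps.setdefault(find(("q", q)), set()).add((q, w))
    return list(comps.values())
```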
-
Publication number: 20110144993. Abstract: A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween. Type: Application. Filed: December 15, 2009. Publication date: June 16, 2011. Inventor: David Ruby
-
Publication number: 20110137650. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center. Type: Application. Filed: December 8, 2009. Publication date: June 9, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventor: Andrej LJOLJE
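The last sentence of the abstract above, moving features closer to an overall feature distribution center, can be sketched as a simple interpolation toward the centroid. The function name and the interpolation factor `alpha` are illustrative assumptions.

```python
def adapt_features(features, centroid, alpha=0.5):
    """Shift each feature vector a fraction alpha of the way toward
    the overall centroid for its speech sound.

    features: list of feature vectors (lists of floats)
    centroid: the overall feature distribution center for that sound
    """
    return [
        [f + alpha * (c - f) for f, c in zip(vec, centroid)]
        for vec in features
    ]
```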
-
Publication number: 20110137652. Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and if the received speech matches data in the precompiled grammar then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then dynamically compiling a new grammar based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names. Type: Application. Filed: February 14, 2011. Publication date: June 9, 2011. Applicant: AT&T Intellectual Property II, L.P. Inventors: Harry Blanchard, Steven Lewis, Shankarnarayan Sivaprasad, Lan Zhang
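A minimal sketch of the two-stage match above: try the precompiled grammar first, and only compile a grammar over records added after the last precompile when the first stage misses. The class and method names are assumptions, and the "grammar" is reduced to a set lookup for brevity.

```python
class NameRecognizer:
    def __init__(self, database):
        self.database = database        # e.g., a directory of names
        self.compiled = set(database)   # snapshot: the precompiled grammar
        self.compiled_size = len(database)

    def recognize(self, heard):
        # Stage 1: match against the precompiled grammar.
        if heard in self.compiled:
            return heard
        # Stage 2: dynamically compile a grammar from only the entries
        # added to the database after the precompile, and match those.
        new_entries = set(self.database[self.compiled_size:])
        if heard in new_entries:
            return heard
        return None
```

Because stage 2 covers only the post-compile additions, the expensive full recompile is avoided while new names remain recognizable.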
-
Publication number: 20110131037. Abstract: An in-vehicle audio system and methods are provided. A respective word or a respective phrase may be associated with each item of audio content stored in the in-vehicle audio system. The in-vehicle audio system may perform an action with respect to one of the stored items of audio content in response to a spoken command, which may include the respective word or the respective phrase associated with the one of the stored items. When audio content is to be added to the in-vehicle audio system, phonetics related to the audio content may be generated and added to a vocabulary dictionary during a compile process. When stored audio content is to be deleted from the in-vehicle audio system, phonetics related to the stored audio content to be deleted may be eliminated from the vocabulary dictionary during the compile process, which, in some embodiments, may be performed during a shutdown process. Type: Application. Filed: November 20, 2010. Publication date: June 2, 2011. Applicant: Honda Motor Co., Ltd. Inventors: Ritchie Huang, Stuart M. Yamamoto, David M. Kirsch
-
Publication number: 20110093265. Abstract: Systems and methods for creating and using geo-centric language models are provided herein. An exemplary method includes assigning each of a plurality of listings to a local service area, determining a geographic center for the local service area, computing a listing density for the local service area, and selecting a desired number of listings for a geo-centric listing set. The geo-centric listing set includes a subset of the plurality of listings. The exemplary method further includes dividing the local service area into regions based upon the listing density and the number of listings in the geo-centric listing set, and building a language model for the geo-centric listing set. Type: Application. Filed: October 16, 2009. Publication date: April 21, 2011. Inventors: Amanda Stent, Diamantino Caseiro, Ilija Zeljkovic, Jay Wilpon
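The geo-centric selection steps above can be sketched as: compute a service area's geographic center and listing density, then keep the desired number of listings closest to that center. The names, the centroid-as-center choice, and the flat-earth distance are simplifying assumptions for illustration.

```python
import math

def geographic_center(listings):
    """Centroid of (lat, lon) points; adequate for a small service area."""
    lat = sum(p[0] for p in listings) / len(listings)
    lon = sum(p[1] for p in listings) / len(listings)
    return lat, lon

def listing_density(listings, area_sq_km):
    """Listings per square kilometer for the local service area."""
    return len(listings) / area_sq_km

def geo_centric_set(listings, desired, center=None):
    """Select the `desired` listings nearest the area's center;
    these form the geo-centric listing set for language-model building."""
    if center is None:
        center = geographic_center(listings)
    dist = lambda p: math.hypot(p[0] - center[0], p[1] - center[1])
    return sorted(listings, key=dist)[:desired]
```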
-
Publication number: 20110082696. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm. Type: Application. Filed: October 5, 2009. Publication date: April 7, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventors: Michael JOHNSTON, Ebrahim KAZEMZADEH
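The abstract above ranks media-describing information with a PageRank algorithm over an interconnection graph. Below is a generic power-iteration PageRank sketch, not the patent's code; the graph is an adjacency dict from each node to the nodes it links to.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over {node: [linked nodes]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```

Nodes here could be actors, directors, titles, and so on; higher-ranked names would then get more weight in the generated speech recognition model.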
-
Publication number: 20110077942. Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models. Type: Application. Filed: September 30, 2009. Publication date: March 31, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventors: Andrej LJOLJE, Diamantino Antonio Caseiro
-
Publication number: 20110071827. Abstract: Various processes are disclosed for generating and selecting speech recognition grammars for conducting searches by voice. In one such process, search queries are selected from a search query log for incorporation into speech recognition grammar. The search query log may include or consist of search queries specified by users without the use of voice. Another disclosed process enables a user to efficiently submit a search query by partially spelling the search query (e.g., on a telephone keypad or via voice utterances) and uttering the full search query. The user's partial spelling is used to select a particular speech recognition grammar for interpreting the utterance of the full search query.Type: Application. Filed: November 8, 2010. Publication date: March 24, 2011. Inventors: Nicholas J. Lee, Robert Frederick, Ronald J. Schoenbaum
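The partial-spelling idea above can be sketched as a prefix filter: a few leading letters, entered as telephone-keypad digits, narrow the candidate queries before the full spoken query is matched. The function names and the keypad encoding of the prefix are illustrative assumptions.

```python
# Standard letter-to-digit mapping of a telephone keypad.
KEYPAD = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items()
    for c in letters}

def to_keys(text):
    """Spell text as keypad digits (non-letters dropped)."""
    return "".join(KEYPAD.get(c, "") for c in text.lower())

def select_grammar(queries, partial_keys):
    """Keep only queries whose keypad spelling starts with the
    partially spelled prefix; these form the recognition grammar
    used to interpret the full spoken query."""
    return [q for q in queries if to_keys(q).startswith(partial_keys)]
```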
-
Publication number: 20110046953. Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same. Type: Application. Filed: August 21, 2009. Publication date: February 24, 2011. Applicant: GENERAL MOTORS COMPANY. Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Publication number: 20110004473. Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result. Type: Application. Filed: July 6, 2009. Publication date: January 6, 2011. Applicant: Nice Systems Ltd. Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
-
Publication number: 20100324901. Abstract: Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence based on an amount of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests a correction module for one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y, where (x, y, and z) are three variable linguistic items supplied from the decoder module. The correction module is trained to linguistics of a specific domain, and is located in between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those probability estimates from the SLM significantly disagree with the linguistic probabilities in that domain. Type: Application. Filed: June 23, 2009. Publication date: December 23, 2010. Applicant: Autonomy Corporation Ltd. Inventors: David Carter, Mahapathy Kadirkamanathan
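The correction layer described above sits between the decoder and the general SLM and returns a domain-adapted estimate P′(z|xy). The sketch below is one possible reading: when the general and in-domain estimates significantly disagree, lean on the domain model. The log-probability disagreement test and the interpolation weight are illustrative assumptions, not the patent's method.

```python
import math

def corrected_estimate(p_general, p_domain, threshold=2.0, weight=0.8):
    """Return P'(z|xy) given the general SLM estimate and the
    in-domain estimate for the same trigram context (x, y) -> z.

    If the two estimates disagree by more than `threshold` in
    log-probability, interpolate toward the domain model;
    otherwise keep the general SLM's estimate unchanged."""
    disagreement = abs(math.log(p_general) - math.log(p_domain))
    if disagreement > threshold:
        return weight * p_domain + (1 - weight) * p_general
    return p_general
```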
-
Publication number: 20100318355. Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed. Type: Application. Filed: June 10, 2009. Publication date: December 16, 2010. Applicant: MICROSOFT CORPORATION. Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
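The segment-selection step above, keeping only stretches of at least Q contiguous matching aligned words as trusted training material, can be sketched as follows. For brevity the alignment here is position-wise; the patent aligns by time, so this function and its name are illustrative assumptions.

```python
def matching_segments(original, decoded, q):
    """Return (start, end) index ranges where the original and decoded
    transcriptions agree on at least q contiguous words. These ranges
    mark the utterance segments trusted for acoustic-model training."""
    n = min(len(original), len(decoded))
    segments, run_start = [], None
    for i in range(n + 1):
        match = i < n and original[i] == decoded[i]
        if match and run_start is None:
            run_start = i            # a matching run begins
        elif not match and run_start is not None:
            if i - run_start >= q:   # run long enough to trust
                segments.append((run_start, i))
            run_start = None
    return segments
```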