Creation Of Reference Templates; Training Of Speech Recognition Systems, E.g., Adaptation To The Characteristics Of The Speaker's Voice, Etc. (epo) Patents (Class 704/E15.007)
-
Publication number: 20140088964
Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
Type: Application
Filed: September 25, 2012
Publication date: March 27, 2014
Applicant: APPLE INC.
Inventor: Jerome Bellegarda
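The per-segment selection step above can be sketched as a nearest-exemplar search. This is a hedged illustration only: the feature representation (plain vectors), the cosine similarity measure, and the value of k are assumptions, not details taken from the abstract.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_focused_exemplars(segment_vec, corpus, k=2):
    # Rank the global training corpus by similarity to the input
    # segment and keep the top-k entries as focused training data,
    # from which a focused model would then be built.
    ranked = sorted(corpus, key=lambda ex: cosine(segment_vec, ex[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]
```

A focused model trained only on the returned exemplars would then be handed to the recognition device for that one segment.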
-
Publication number: 20140067394
Abstract: The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to such languages. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicants: KING ABDULAZIZ CITY FOR SCIENCE AND TECHNOLOGY, KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS
Inventors: DIA EDDIN M. ABUZEINA, MOUSTAFA ELSHAFEI, HUSNI AL-MUHTASEB, WASFI G. AL-KHATIB
-
Publication number: 20130332164
Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
Type: Application
Filed: June 8, 2012
Publication date: December 12, 2013
Inventor: Devang K. Naik
-
Publication number: 20130246064
Abstract: A system and method for real-time processing of a signal of a voice interaction. In an embodiment, a digital representation of a portion of an interaction may be analyzed in real-time and a segment may be selected. The segment may be associated with a source based on a model of the source. The model may be updated based on the segment. The updated model is used to associate subsequent segments with the source. Other embodiments are described and claimed.
Type: Application
Filed: March 13, 2012
Publication date: September 19, 2013
Inventors: Moshe WASSERBLAT, Tzachi ASHKENAZI, Merav BEN-ASHER, Oren PEREG
-
Publication number: 20130191126
Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
Type: Application
Filed: January 20, 2012
Publication date: July 25, 2013
Applicant: Microsoft Corporation
Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
-
Publication number: 20130185070
Abstract: A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech.
Type: Application
Filed: January 12, 2012
Publication date: July 18, 2013
Applicant: Microsoft Corporation
Inventors: Qiang Huo, Zhi-Jie Yan, Yu Zhang
-
Publication number: 20130144618
Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.
Type: Application
Filed: March 12, 2012
Publication date: June 6, 2013
Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
-
Publication number: 20130132080
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
Type: Application
Filed: November 18, 2011
Publication date: May 23, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Jason Williams, Tirso Alonso, Barbara B. Hollister, Ilya Dan Melamed
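The incremental stopping rule described above can be sketched as follows. This is one hedged reading: treating the leading label's share of votes (with at least two votes seen) as the "accuracy threshold" is an assumption, as is majority voting for the output response.

```python
from collections import Counter

def aggregate_labels(responses, m=5, accuracy_threshold=0.8):
    # Consume crowd responses one at a time.  Stop early once the
    # leading label's share of votes reaches the threshold (requiring
    # at least two responses so a single vote cannot decide), or once
    # m responses have been received, whichever comes first.
    votes = Counter()
    for received, response in enumerate(responses, start=1):
        votes[response] += 1
        label, count = votes.most_common(1)[0]
        if received >= 2 and count / received >= accuracy_threshold:
            return label, received   # confident early stop
        if received >= m:
            return label, received   # hard cap of m responses
    label, _ = votes.most_common(1)[0]
    return label, sum(votes.values())
```

Agreement among the first two workers stops collection immediately; disagreement keeps requesting responses up to the cap of m.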
-
Publication number: 20130132084
Abstract: A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
Type: Application
Filed: June 21, 2012
Publication date: May 23, 2013
Applicant: SOUNDHOUND, INC.
Inventors: Timothy Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
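The arbitration logic between the two recognizers can be sketched in a few lines. A hedged sketch only: the result tuple shape and the cutoff value are assumptions, and the client-vocabulary update step is omitted.

```python
LATENCY_CUTOFF = 2.0  # seconds; an assumed value

def choose_transcription(local, remote):
    # local/remote: (transcription, confidence, latency_seconds), or
    # None when that recognizer produced no result.  Results slower
    # than the cutoff are discarded; of the survivors, the higher-
    # confidence transcription wins; if neither survives, give up.
    usable = [r for r in (local, remote)
              if r is not None and r[2] <= LATENCY_CUTOFF]
    if not usable:
        return None
    return max(usable, key=lambda r: r[1])[0]
```

When the remote engine misses the latency cutoff, the local result is accepted even if its confidence is lower, which is what keeps the interface responsive on a flaky connection.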
-
Publication number: 20130073286
Abstract: Candidate interpretations resulting from application of speech recognition algorithms to spoken input are presented in a consolidated manner that reduces redundancy. A list of candidate interpretations is generated, and each candidate interpretation is subdivided into time-based portions, forming a grid. Those time-based portions that duplicate portions from other candidate interpretations are removed from the grid. A user interface is provided that presents the user with an opportunity to select among the candidate interpretations; the user interface is configured to present these alternatives without duplicate elements.
Type: Application
Filed: September 20, 2011
Publication date: March 21, 2013
Applicant: APPLE INC.
Inventors: Marcello Bastea-Forte, David A. Winarsky
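The grid-deduplication idea can be sketched as below. This assumes, for illustration, that the time-based portions of each candidate are already aligned by position; `None` stands in for a removed (duplicate) cell.

```python
def consolidate(candidates):
    # candidates: candidate interpretations, each already subdivided
    # into time-aligned portions (the grid).  A portion that repeats
    # what an earlier candidate shows at the same position is blanked
    # out, so the user only sees the differing alternatives.
    width = max(len(c) for c in candidates)
    seen = [set() for _ in range(width)]
    grid = []
    for cand in candidates:
        row = []
        for pos, portion in enumerate(cand):
            if portion in seen[pos]:
                row.append(None)       # duplicate: hide it
            else:
                seen[pos].add(portion)
                row.append(portion)
        grid.append(row)
    return grid
```

A UI rendering this grid would show the shared prefix once and only the words that actually differ between hypotheses.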
-
Patent number: 8350683
Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly attached at an inner portion of the windshield of a vehicle equipped with the interior rearview mirror assembly. The interior rearview mirror assembly includes at least two microphones for receiving audio signals within a cabin of the vehicle and generating an output indicative of the audio signals. A control is in the vehicle and is responsive to the output from the at least one microphone. The control at least partially distinguishes vocal signals from non-vocal signals present in the output. The at least two microphones provide sound capture for at least one of a hands free cell phone system, an audio recording system and a wireless communication system.
Type: Grant
Filed: August 15, 2011
Date of Patent: January 8, 2013
Assignee: Donnelly Corporation
Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
-
Publication number: 20130006636
Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors that transform numerical features representing the features of words having specific meanings and the surrounding words into elements, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within a cluster for each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words on the basis of which the feature vectors were generated by applying the specified extraction rules to the feature vectors.
Type: Application
Filed: March 24, 2011
Publication date: January 3, 2013
Applicant: NEC CORPORATION
Inventors: Hironori Mizuguchi, Dai Kusui
-
Publication number: 20130006634
Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.
Type: Application
Filed: January 6, 2012
Publication date: January 3, 2013
Applicant: QUALCOMM Incorporated
Inventors: Leonard Henry Grokop, Vidya Narayanan
-
Publication number: 20130006635
Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
Type: Application
Filed: September 11, 2012
Publication date: January 3, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES
Inventor: Hagai Aronowitz
-
Publication number: 20120284025
Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database, and if the received speech matches data in the precompiled grammar, then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then a new grammar is dynamically compiled based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
Type: Application
Filed: July 18, 2012
Publication date: November 8, 2012
Applicant: AT&T Intellectual Property II, L.P.
Inventors: Harry Blanchard, Steven LEWIS, Shankarnarayan SIVAPRASAD, Lan ZHANG
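The two-tier lookup can be sketched as below. This is a hedged toy: the "grammar" here is just a set of normalized names, standing in for a real compiled recognition grammar, and the class name and API are invented for illustration.

```python
class GrammarRecognizer:
    def __init__(self, database):
        # Precompile a grammar from the database as it exists now.
        self.database = list(database)
        self.precompiled = self._compile(self.database)
        self._compiled_size = len(self.database)

    @staticmethod
    def _compile(entries):
        # Stand-in for grammar compilation: a set of normalized names.
        return {e.lower() for e in entries}

    def add(self, entry):
        # New data added after the precompiled grammar was built.
        self.database.append(entry)

    def recognize(self, speech):
        if speech.lower() in self.precompiled:
            return speech
        # Fall back: dynamically compile a grammar only from entries
        # added since the precompiled grammar was built.
        new_grammar = self._compile(self.database[self._compiled_size:])
        return speech if speech.lower() in new_grammar else None
```

Compiling only the delta keeps the dynamic step cheap even when the underlying directory of names is large.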
-
Publication number: 20120278066
Abstract: A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
Type: Application
Filed: November 9, 2010
Publication date: November 1, 2012
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
-
Publication number: 20120271631
Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
Type: Application
Filed: April 19, 2012
Publication date: October 25, 2012
Applicant: ROBERT BOSCH GMBH
Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
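The frequency-based split of the training data can be sketched directly. One hedged detail: the abstract leaves utterances exactly at the threshold unassigned ("exceeds" vs. "below"), so treating "at or above threshold" as high-frequency is an assumption here.

```python
from collections import Counter

def split_training_data(utterances, threshold=2):
    # Count each distinct utterance, then route high-frequency
    # utterances to the grammar-based model's training set and the
    # rest to the statistical model's training set.
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c >= threshold]
    low = [u for u, c in counts.items() if c < threshold]
    return high, low
```

The intuition: frequent, stereotyped commands are well served by a rigid grammar, while the long tail of rare phrasings is better covered by a statistical model.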
-
Publication number: 20120239399
Abstract: Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation.
Type: Application
Filed: March 30, 2010
Publication date: September 20, 2012
Inventors: Michihiro Yamazaki, Yuzo Maruta
-
Publication number: 20120203553
Abstract: A recognition dictionary creating device includes a user dictionary, in which a phoneme label string of an inputted voice is registered, and an interlanguage acoustic data mapping table, in which a correspondence between phoneme labels in different languages is defined. The device refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary, expressed in the language set at the time of creating the user dictionary, into a phoneme label string expressed in another language to which the recognition dictionary creating device has switched.
Type: Application
Filed: January 22, 2010
Publication date: August 9, 2012
Inventor: Yuzo Maruta
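The table-driven conversion can be sketched as a dictionary lookup per label. The mapping table below is entirely hypothetical (the phoneme label sets and the English-to-Japanese pairings are invented for illustration), as is the fallback to a silence label for unmapped phonemes.

```python
# Hypothetical interlanguage acoustic data mapping table:
# (source language, target language) -> per-label correspondence.
MAPPING_TABLE = {
    ("en", "ja"): {"K": "k", "AE": "a", "T": "t"},
}

def convert_label_string(labels, src_lang, dst_lang, default="sil"):
    # Convert a registered phoneme label string into the newly
    # selected language; labels with no defined correspondence fall
    # back to an assumed default (silence) label.
    table = MAPPING_TABLE[(src_lang, dst_lang)]
    return [table.get(label, default) for label in labels]
```

This is what lets a user-registered entry survive a language switch without re-recording the original utterance.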
-
Publication number: 20120173237
Abstract: A method and apparatus for updating a speech model on a multi-user speech recognition system with a personal speech model for a single user. A speech recognition system, for instance in a car, can include a generic speech model for comparison with the user speech input. A way of identifying a personal speech model, for instance in a mobile phone, is connected to the system. A mechanism is included for receiving personal speech model components, for instance a BLUETOOTH connection. The generic speech model is updated using the received personal speech model components. Speech recognition can then be performed on user speech using the updated generic speech model.
Type: Application
Filed: March 12, 2012
Publication date: July 5, 2012
Applicant: Nuance Communications, Inc.
Inventors: Barry Neil Dow, Eric William Janke, Daniel Lee Yuk Cheung, Benjamin Terrick Staniford
-
Publication number: 20120130715
Abstract: According to one embodiment, an apparatus for generating a voice-tag includes an input unit, a recognition unit, and a combination unit. The input unit is configured to input a registration speech. The recognition unit is configured to recognize the registration speech to obtain N-best recognition results, wherein N is an integer greater than or equal to 2. The combination unit is configured to combine the N-best recognition results as a voice-tag of the registration speech.Type: Application
Filed: September 23, 2011
Publication date: May 24, 2012
Inventors: Rui Zhao, Lei He
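A minimal sketch of combining N-best results into a voice-tag, under an assumption the abstract does not spell out: that a later test utterance matches the tag when any of its own N-best hypotheses overlaps the stored set.

```python
def make_voice_tag(nbest):
    # Combine all N-best recognition results of the registration
    # speech into a single voice-tag (stored as a set).
    return set(nbest)

def matches_tag(voice_tag, nbest):
    # Assumed lookup rule: the test utterance matches the tag if any
    # of its N-best hypotheses overlaps the tag's stored hypotheses.
    return bool(voice_tag & set(nbest))
```

Keeping every hypothesis makes the tag robust to the recognizer mis-hearing the same name the same way twice.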
-
Publication number: 20120116762
Abstract: A Candidate Isolation System (CIS) detects subscribers of phone call services as candidates to be surveillance targets. A Voice Matching System (VMS) then decides whether or not a given candidate Communication Terminal (CT) should be tracked by determining, using speaker recognition techniques, whether the subscriber operating the candidate CT is a known target subscriber. The CIS receives from the network call event data that relate to CTs in the network.
Type: Application
Filed: October 28, 2011
Publication date: May 10, 2012
Applicant: VERINT SYSTEMS LTD.
Inventors: Eithan Goldfarb, Yoav Ariav
-
Publication number: 20120101821
Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into a digitalized speech data, and performs speech recognition based on the speech data. The apparatus makes a comparison between the speech data inputted the last time and the speech data inputted the time before the last time in response to a user's indication that the speech recognition results in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before the last time, the apparatus outputs a guidance prompting the user to utter an input target by calling it by another name.
Type: Application
Filed: October 13, 2011
Publication date: April 26, 2012
Applicant: DENSO CORPORATION
Inventor: Takahiro TSUDA
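The "substantially match" check can be sketched with a string similarity ratio. Hedged assumptions: text transcripts stand in for the digitized speech data, the 0.9 similarity threshold is invented, and the guidance wording is illustrative only.

```python
import difflib

def handle_repeated_error(previous_input, last_input, threshold=0.9):
    # Called after the user reports erroneous recognition several
    # times in a row.  If the last two inputs substantially match,
    # the user is evidently repeating the same utterance, so prompt
    # them to call the target by another name instead.
    ratio = difflib.SequenceMatcher(None, previous_input, last_input).ratio()
    if ratio >= threshold:
        return "Please try saying the destination by another name."
    return None
```

Returning `None` means the two attempts differed, so repeating the normal retry prompt is still reasonable.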
-
Publication number: 20120101812
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
Type: Application
Filed: December 30, 2011
Publication date: April 26, 2012
Applicant: GOOGLE INC.
Inventors: Craig Reding, Suzi Levas
-
Publication number: 20120078635
Abstract: One embodiment of a voice control system includes a first electronic device communicatively coupled to a server and configured to receive a speech recognition file from the server. The speech recognition file may include a speech recognition algorithm for converting one or more voice commands into text and a database including one or more entries comprising one or more voice commands and one or more executable commands associated with the one or more voice commands.
Type: Application
Filed: September 24, 2010
Publication date: March 29, 2012
Applicant: Apple Inc.
Inventors: Fletcher Rothkopf, Stephen Brian Lynch, Adam Mittleman, Phil Hobson
-
Publication number: 20120059653
Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
Type: Application
Filed: August 30, 2011
Publication date: March 8, 2012
Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
-
Publication number: 20120059849
Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text strings. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
Type: Application
Filed: September 8, 2010
Publication date: March 8, 2012
Applicant: DEMAND MEDIA, INC.
Inventors: David M. Yehaskel, Henrik M. Kjallbring
-
Publication number: 20110301954
Abstract: A method for adjusting a voice recognition system and a voice recognition system is disclosed, wherein the voice recognition system comprises a speaker and a microphone, and wherein the method comprises the steps of: memorizing an audio frequency signal, playing back the audio frequency signal by means of the speaker, generating a detection signal by detecting the audio frequency signal by means of the microphone, and adjusting parameters of the voice recognition system dependent on the detection signal.
Type: Application
Filed: June 3, 2010
Publication date: December 8, 2011
Applicant: Johnson Controls Technology Company
Inventors: Michael J. Sims, Brian L. Douthitt, David J. Hughes, Mark Zeinstra, Ted W. Ringold, Douglas W. Klamer, Todd Witters, Elisabet A. Anderson
-
Publication number: 20110276329
Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby an appropriate dialogue control is enabled by determining a user's proficiency level in a dialogue behavior correctly and performing an appropriate dialogue control according to the user's proficiency level correctly determined, without being influenced by an accidental one-time behavior of the user. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor that is a factor for determining a user's proficiency level in a dialogue behavior, based upon an input result of the speech of the input unit 1. A history storage unit 4 stores as a history the proficiency level determination factor extracted by the extraction unit 3.
Type: Application
Filed: January 20, 2010
Publication date: November 10, 2011
Inventors: Masaaki Ayabe, Jun Okamoto
-
Publication number: 20110276323
Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
Type: Application
Filed: May 6, 2010
Publication date: November 10, 2011
Applicant: Senam Consulting, Inc.
Inventor: Serge Olegovich Seyfetdinov
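The two-factor check (passphrase match plus voice-feature match) can be sketched as follows. Hedged assumptions: the passphrase is compared as text, voice features are plain numeric vectors, and the per-feature tolerance is invented for illustration.

```python
def authenticate(reference, test, tolerance=0.1):
    # reference/test: (passphrase, voice_feature_vector).  Both checks
    # must pass: the passphrases match exactly, and every voice
    # feature lies within an assumed tolerance of its reference value.
    ref_phrase, ref_features = reference
    test_phrase, test_features = test
    if test_phrase != ref_phrase:
        return False
    if len(test_features) != len(ref_features):
        return False
    return all(abs(r - t) <= tolerance
               for r, t in zip(ref_features, test_features))
```

Requiring both factors means knowing the passphrase is not enough: a different voice saying the right words still fails.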
-
Publication number: 20110231189
Abstract: Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles are used to update the speech recognition system prior to runtime. In one aspect, rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles.
Type: Application
Filed: March 19, 2010
Publication date: September 22, 2011
Applicant: Nuance Communications, Inc.
Inventors: Josef Damianus Anastasiadis, Christophe Nestor George Couvreur
-
Publication number: 20110231191
Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving a speech recognition performance of place names. In order to address the above purpose, an address database 12 has address information data items including country names, city names, street names, and house numbers, and manages the address information having a tree structure indicating hierarchical relationships between the place names from a wide area to a narrow area. Each of the place names stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient of the likelihood of the aforementioned recognition candidate based on the number of the street names belonging to the lower hierarchy below the city names.
Type: Application
Filed: November 17, 2009
Publication date: September 22, 2011
Inventor: Toshiyuki Miyazaki
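The weight-coefficient idea (city names with more streets beneath them get a larger likelihood weight) can be sketched over a toy address tree. Treating the weight as the street count normalized over all cities is an assumption; the abstract only says the coefficient is based on the count.

```python
def city_weights(address_tree):
    # address_tree: city name -> list of street names in the lower
    # hierarchy.  Each city's likelihood weight is proportional to
    # how many streets it contains.
    totals = {city: len(streets) for city, streets in address_tree.items()}
    grand_total = sum(totals.values())
    return {city: n / grand_total for city, n in totals.items()}
```

The effect during recognition: an ambiguous utterance is biased toward the city a driver is statistically more likely to be addressing, namely the one with more destinations under it.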
-
Publication number: 20110231190
Abstract: A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
Type: Application
Filed: March 21, 2011
Publication date: September 22, 2011
Applicant: Eliza Corporation
Inventors: Nasreen Quibria, Lucas Merrow, Oleg Boulanov, John P. Kroeker, Alexandra Drane
-
Publication number: 20110213613
Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Type: Application
Filed: May 24, 2010
Publication date: September 1, 2011
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
-
Patent number: 8004392
Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly. The mirror assembly may include a microphone for receiving audio signals within a cabin of the vehicle and generating an output indicative of these audio signals. The microphone may provide sound capture for a hands free cell phone system, an audio recording system and/or an emergency communication system. The system may include a control that is responsive to the output from the microphone and that distinguishes vocal signals from non-vocal signals present in the output. The microphone may provide sound capture for at least one accessory of the equipped vehicle, and the accessory may be responsive to a vocal signal captured by the microphone. The interior rearview mirror assembly may include at least one accessory, such as an antenna, a video device, a security system status indicator, a tire pressure indicator display and/or a loudspeaker.
Type: Grant
Filed: December 19, 2008
Date of Patent: August 23, 2011
Assignee: Donnelly Corporation
Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
-
Publication number: 20110166858
Abstract: A method for recognizing speech involves presenting an utterance to a speech recognition system and determining, via the speech recognition system, that the utterance contains a particular expression, where the particular expression is capable of being associated with at least two different meanings. The method further involves splitting the utterance into a plurality of speech frames, where each frame is assigned a predetermined time segment and a frame number, and indexing the utterance to i) a predetermined frame number, or ii) a predetermined time segment. The indexing of the utterance identifies that one of the frames includes the particular expression. Then the frame including the particular expression is re-presented to the speech recognition system to verify that the particular expression was actually recited in the utterance.
Type: Application
Filed: January 6, 2010
Publication date: July 7, 2011
Applicant: GENERAL MOTORS LLC
Inventor: Uma Arun
-
Publication number: 20110161072. Abstract: A frequency counting unit (15A) counts occurrence frequencies (14B) in input text data (14A) for respective words or word chains contained in the input text data (14A). A context diversity calculation unit (15B) calculates, for the respective words or word chains, diversity indices (14C) each indicating the context diversity of a word or word chain. A frequency correction unit (15C) corrects the occurrence frequencies (14B) of the respective words or word chains based on the diversity indices (14C) of the respective words or word chains. An N-gram language model creation unit (15D) creates an N-gram language model (14E) based on the corrected occurrence frequencies (14D) obtained for the respective words or word chains. Type: Application. Filed: August 20, 2009. Publication date: June 30, 2011. Applicant: NEC CORPORATION. Inventors: Makoto Terao, Kiyokazu Miki, Hitoshi Yamamoto
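A minimal sketch of the pipeline in the abstract above: count word-chain frequencies, score each word's context diversity, correct the raw counts, and estimate bigram probabilities from the corrected counts. The diversity measure (number of distinct right-contexts) and the multiplicative correction rule are assumptions for illustration, not the patent's formulas.

```python
from collections import Counter, defaultdict

def count_bigrams(tokens):
    """Occurrence frequencies for two-word chains."""
    return Counter(zip(tokens, tokens[1:]))

def context_diversity(tokens):
    """Diversity index of each word: here, the number of distinct
    words that follow it in the input text."""
    following = defaultdict(set)
    for w, nxt in zip(tokens, tokens[1:]):
        following[w].add(nxt)
    return {w: len(s) for w, s in following.items()}

def corrected_counts(bigrams, diversity):
    """Correct each bigram count by the left word's context diversity."""
    return {bg: c * diversity.get(bg[0], 1) for bg, c in bigrams.items()}

def bigram_model(counts):
    """Maximum-likelihood bigram probabilities from the corrected counts."""
    totals = defaultdict(float)
    for (w, _), c in counts.items():
        totals[w] += c
    return {bg: c / totals[bg[0]] for bg, c in counts.items()}
```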
-
Publication number: 20110161081. Abstract: Methods, computer program products and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in the cluster that includes the website. Type: Application. Filed: December 22, 2010. Publication date: June 30, 2011. Applicant: GOOGLE INC. Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
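The clustering step above treats queries and websites as a bipartite relation and joins records that share a query or a website. One simple way to realize this, purely as an illustrative sketch, is a union-find over (query, website) pairs; the function name and representation are assumptions.

```python
def clusters(relations):
    """relations: list of (query, website) pairs.
    Returns the connected components (clusters) as sets of pairs,
    where pairs sharing a query or a website end up together."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Connect each query node to each website node it co-occurs with.
    for q, w in relations:
        union(("q", q), ("w", w))

    comps = {}
    for q, w in relations:
        comps.setdefault(find(("q", q)), set()).add((q, w))
    return list(comps.values())
```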
-
Publication number: 20110144993. Abstract: A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween. Type: Application. Filed: December 15, 2009. Publication date: June 16, 2011. Inventor: David Ruby
-
Publication number: 20110137650. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center. Type: Application. Filed: December 8, 2009. Publication date: June 9, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventor: Andrej LJOLJE
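The last sentence of the abstract above, moving features closer to an overall feature distribution center, can be sketched as a simple interpolation toward the centroid. The function name and the interpolation factor `alpha` are illustrative assumptions.

```python
def adapt_features(features, centroid, alpha=0.5):
    """Shift each feature vector a fraction alpha of the way toward
    the overall centroid for its speech sound.

    features: list of feature vectors (lists of floats)
    centroid: the overall feature distribution center for that sound
    """
    return [
        [f + alpha * (c - f) for f, c in zip(vec, centroid)]
        for vec in features
    ]
```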
-
Publication number: 20110137652. Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and if the received speech matches data in the precompiled grammar then returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then dynamically compiling a new grammar based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names. Type: Application. Filed: February 14, 2011. Publication date: June 9, 2011. Applicant: AT&T Intellectual Property II, L.P. Inventors: Harry Blanchard, Steven Lewis, Shankarnarayan Sivaprasad, Lan Zhang
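A minimal sketch of the two-stage match above: try the precompiled grammar first, and only compile a grammar over records added after the last precompile when the first stage misses. The class and method names are assumptions, and the "grammar" is reduced to a set lookup for brevity.

```python
class NameRecognizer:
    def __init__(self, database):
        self.database = database        # e.g., a directory of names
        self.compiled = set(database)   # snapshot: the precompiled grammar
        self.compiled_size = len(database)

    def recognize(self, heard):
        # Stage 1: match against the precompiled grammar.
        if heard in self.compiled:
            return heard
        # Stage 2: dynamically compile a grammar from only the entries
        # added to the database after the precompile, and match those.
        new_entries = set(self.database[self.compiled_size:])
        if heard in new_entries:
            return heard
        return None
```

Because stage 2 covers only the post-compile additions, the expensive full recompile is avoided while new names remain recognizable.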
-
Publication number: 20110131037. Abstract: An in-vehicle audio system and methods are provided. A respective word or a respective phrase may be associated with each item of audio content stored in the in-vehicle audio system. The in-vehicle audio system may perform an action with respect to one of the stored items of audio content in response to a spoken command, which may include the respective word or the respective phrase associated with the one of the stored items. When audio content is to be added to the in-vehicle audio system, phonetics related to the audio content may be generated and added to a vocabulary dictionary during a compile process. When stored audio content is to be deleted from the in-vehicle audio system, phonetics related to the stored audio content to be deleted may be eliminated from the vocabulary dictionary during the compile process, which, in some embodiments, may be performed during a shutdown process. Type: Application. Filed: November 20, 2010. Publication date: June 2, 2011. Applicant: Honda Motor Co., Ltd. Inventors: Ritchie Huang, Stuart M. Yamamoto, David M. Kirsch
-
Publication number: 20110093265. Abstract: Systems and methods for creating and using geo-centric language models are provided herein. An exemplary method includes assigning each of a plurality of listings to a local service area, determining a geographic center for the local service area, computing a listing density for the local service area, and selecting a desired number of listings for a geo-centric listing set. The geo-centric listing set includes a subset of the plurality of listings. The exemplary method further includes dividing the local service area into regions based upon the listing density and the number of listings in the geo-centric listing set, and building a language model for the geo-centric listing set. Type: Application. Filed: October 16, 2009. Publication date: April 21, 2011. Inventors: Amanda Stent, Diamantino Caseiro, Ilija Zeljkovic, Jay Wilpon
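The geo-centric selection steps above can be sketched as: compute a service area's geographic center and listing density, then keep the desired number of listings closest to that center. The names, the centroid-as-center choice, and the flat-earth distance are simplifying assumptions for illustration.

```python
import math

def geographic_center(listings):
    """Centroid of (lat, lon) points; adequate for a small service area."""
    lat = sum(p[0] for p in listings) / len(listings)
    lon = sum(p[1] for p in listings) / len(listings)
    return lat, lon

def listing_density(listings, area_sq_km):
    """Listings per square kilometer for the local service area."""
    return len(listings) / area_sq_km

def geo_centric_set(listings, desired, center=None):
    """Select the `desired` listings nearest the area's center;
    these form the geo-centric listing set for language-model building."""
    if center is None:
        center = geographic_center(listings)
    dist = lambda p: math.hypot(p[0] - center[0], p[1] - center[1])
    return sorted(listings, key=dist)[:desired]
```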
-
Publication number: 20110082696. Abstract: Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm. Type: Application. Filed: October 5, 2009. Publication date: April 7, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventors: Michael JOHNSTON, Ebrahim KAZEMZADEH
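The abstract above ranks media-describing information with a PageRank algorithm over an interconnection graph. Below is a generic power-iteration PageRank sketch, not the patent's code; the graph is an adjacency dict from each node to the nodes it links to.

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over {node: [linked nodes]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank
```

Nodes here could be actors, directors, titles, and so on; higher-ranked names would then get more weight in the generated speech recognition model.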
-
Publication number: 20110077942. Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models. Type: Application. Filed: September 30, 2009. Publication date: March 31, 2011. Applicant: AT&T Intellectual Property I, L.P. Inventors: Andrej LJOLJE, Diamantino Antonio Caseiro
-
Publication number: 20110071827. Abstract: Various processes are disclosed for generating and selecting speech recognition grammars for conducting searches by voice. In one such process, search queries are selected from a search query log for incorporation into speech recognition grammar. The search query log may include or consist of search queries specified by users without the use of voice. Another disclosed process enables a user to efficiently submit a search query by partially spelling the search query (e.g., on a telephone keypad or via voice utterances) and uttering the full search query. The user's partial spelling is used to select a particular speech recognition grammar for interpreting the utterance of the full search query.Type: Application. Filed: November 8, 2010. Publication date: March 24, 2011. Inventors: Nicholas J. Lee, Robert Frederick, Ronald J. Schoenbaum
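The partial-spelling idea above can be sketched as a prefix filter: a few leading letters, entered as telephone-keypad digits, narrow the candidate queries before the full spoken query is matched. The function names and the keypad encoding of the prefix are illustrative assumptions.

```python
# Standard letter-to-digit mapping of a telephone keypad.
KEYPAD = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items()
    for c in letters}

def to_keys(text):
    """Spell text as keypad digits (non-letters dropped)."""
    return "".join(KEYPAD.get(c, "") for c in text.lower())

def select_grammar(queries, partial_keys):
    """Keep only queries whose keypad spelling starts with the
    partially spelled prefix; these form the recognition grammar
    used to interpret the full spoken query."""
    return [q for q in queries if to_keys(q).startswith(partial_keys)]
```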
-
Publication number: 20110046953. Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same. Type: Application. Filed: August 21, 2009. Publication date: February 24, 2011. Applicant: GENERAL MOTORS COMPANY. Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
-
Publication number: 20110004473. Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result. Type: Application. Filed: July 6, 2009. Publication date: January 6, 2011. Applicant: Nice Systems Ltd. Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
-
Publication number: 20100324901. Abstract: Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence based on an amount of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests a correction module for one or more corrected probability estimates P′(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y, where (x, y, and z) are three variable linguistic items supplied from the decoder module. The correction module is trained to linguistics of a specific domain, and is located in between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those probability estimates from the SLM significantly disagree with the linguistic probabilities in that domain. Type: Application. Filed: June 23, 2009. Publication date: December 23, 2010. Applicant: Autonomy Corporation Ltd. Inventors: David Carter, Mahapathy Kadirkamanathan
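The correction layer described above sits between the decoder and the general SLM and returns a domain-adapted estimate P′(z|xy). The sketch below is one possible reading: when the general and in-domain estimates significantly disagree, lean on the domain model. The log-probability disagreement test and the interpolation weight are illustrative assumptions, not the patent's method.

```python
import math

def corrected_estimate(p_general, p_domain, threshold=2.0, weight=0.8):
    """Return P'(z|xy) given the general SLM estimate and the
    in-domain estimate for the same trigram context (x, y) -> z.

    If the two estimates disagree by more than `threshold` in
    log-probability, interpolate toward the domain model;
    otherwise keep the general SLM's estimate unchanged."""
    disagreement = abs(math.log(p_general) - math.log(p_domain))
    if disagreement > threshold:
        return weight * p_domain + (1 - weight) * p_general
    return p_general
```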
-
Publication number: 20100318355. Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed. Type: Application. Filed: June 10, 2009. Publication date: December 16, 2010. Applicant: MICROSOFT CORPORATION. Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
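The segment-selection step above, keeping only stretches of at least Q contiguous matching aligned words as trusted training material, can be sketched as follows. For brevity the alignment here is position-wise; the patent aligns by time, so this function and its name are illustrative assumptions.

```python
def matching_segments(original, decoded, q):
    """Return (start, end) index ranges where the original and decoded
    transcriptions agree on at least q contiguous words. These ranges
    mark the utterance segments trusted for acoustic-model training."""
    n = min(len(original), len(decoded))
    segments, run_start = [], None
    for i in range(n + 1):
        match = i < n and original[i] == decoded[i]
        if match and run_start is None:
            run_start = i            # a matching run begins
        elif not match and run_start is not None:
            if i - run_start >= q:   # run long enough to trust
                segments.append((run_start, i))
            run_start = None
    return segments
```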