Creation Of Reference Templates; Training Of Speech Recognition Systems, E.g., Adaptation To The Characteristics Of The Speaker's Voice, Etc. (epo) Patents (Class 704/E15.007)
  • Publication number: 20140088964
    Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventor: Jerome Bellegarda
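The exemplar-selection step this abstract describes can be sketched as follows. This is an illustrative sketch, not Apple's implementation: the cosine similarity measure, the feature vectors, and the corpus entries are all invented for the example.

```python
# Pick the training exemplars most similar to an input speech segment;
# a focused model would then be trained on just these exemplars.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def focused_training_data(segment_vec, corpus, k=2):
    """Return the k corpus exemplars most similar to the input segment."""
    ranked = sorted(corpus, key=lambda ex: cosine(segment_vec, ex["vec"]),
                    reverse=True)
    return ranked[:k]

corpus = [
    {"label": "call", "vec": [1.0, 0.1, 0.0]},
    {"label": "mail", "vec": [0.0, 1.0, 0.2]},
    {"label": "dial", "vec": [0.9, 0.2, 0.1]},
]
exemplars = focused_training_data([1.0, 0.0, 0.0], corpus, k=2)
print([ex["label"] for ex in exemplars])
```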
  • Publication number: 20140067394
    Abstract: The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to such languages. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicants: KING ABDULAZIZ CITY FOR SCIENCE AND TECHNOLOGY, KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS
    Inventors: DIA EDDIN M. ABUZEINA, MOUSTAFA ELSHAFEI, HUSNI AL-MUHTASEB, WASFI G. AL-KHATIB
  • Publication number: 20130332164
    Abstract: A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 12, 2013
    Inventor: Devang K. Naik
  • Publication number: 20130246064
    Abstract: A system and method for real-time processing of a signal of a voice interaction. In an embodiment, a digital representation of a portion of an interaction may be analyzed in real-time and a segment may be selected. The segment may be associated with a source based on a model of the source. The model may be updated based on the segment. The updated model is used to associate subsequent segments with the source. Other embodiments are described and claimed.
    Type: Application
    Filed: March 13, 2012
    Publication date: September 19, 2013
    Inventors: Moshe WASSERBLAT, Tzachi ASHKENAZI, Merav BEN-ASHER, Oren PEREG
  • Publication number: 20130191126
    Abstract: Techniques are described for training a speech recognition model for accented speech. A subword parse table is employed that models mispronunciations at multiple subword levels, such as the syllable, position-specific cluster, and/or phone levels. Mispronunciation probability data is then generated at each level based on inputted training data, such as phone-level annotated transcripts of accented speech. Data from different levels of the subword parse table may then be combined to determine the accented speech model. Mispronunciation probability data at each subword level is based at least in part on context at that level. In some embodiments, phone-level annotated transcripts are generated using a semi-supervised method.
    Type: Application
    Filed: January 20, 2012
    Publication date: July 25, 2013
    Applicant: Microsoft Corporation
    Inventors: Albert Joseph Kishan Thambiratnam, Timo Pascal Mertens, Frank Torsten Bernd Seide
  • Publication number: 20130185070
    Abstract: A speech recognition system trains a plurality of feature transforms and a plurality of acoustic models using an irrelevant variability normalization based discriminative training. The speech recognition system employs the trained feature transforms to absorb or ignore variability within an unknown speech that is irrelevant to phonetic classification. The speech recognition system may then recognize the unknown speech using the trained recognition models. The speech recognition system may further perform an unsupervised adaptation to adapt the feature transforms for the unknown speech and thus increase the accuracy of recognizing the unknown speech.
    Type: Application
    Filed: January 12, 2012
    Publication date: July 18, 2013
    Applicant: Microsoft Corporation
    Inventors: Qiang Huo, Zhi-Jie Yan, Yu Zhang
  • Publication number: 20130144618
    Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result be rescored based on the rescoring information.
    Type: Application
    Filed: March 12, 2012
    Publication date: June 6, 2013
    Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
  • Publication number: 20130132084
    Abstract: A system and method for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
    Type: Application
    Filed: June 21, 2012
    Publication date: May 23, 2013
    Applicant: SOUNDHOUND, INC.
    Inventors: Timothy Stonehocker, Keyvan Mohajer, Bernard Mont-Reynaud
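The arbitration between local and remote results described in this abstract can be sketched as follows. All names, scores, and the vocabulary-update rule are invented for illustration; this is not SoundHound's implementation.

```python
# Each source returns (transcript, confidence) or None if it missed
# the latency cutoff; the higher-confidence result wins, and a
# successful remote result adds unseen words to the client vocabulary.
def arbitrate(local, remote, client_vocab):
    if remote is not None:
        client_vocab.update(w for w in remote[0].split())
    if local and remote:
        return max(local, remote, key=lambda r: r[1])[0]
    if local or remote:
        return (local or remote)[0]
    return None  # both sources failed

vocab = {"play", "music"}
best = arbitrate(("play music", 0.62), ("play muse", 0.81), vocab)
print(best, sorted(vocab))
```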
  • Publication number: 20130132080
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until at least one of an accuracy threshold is reached and m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
    Type: Application
    Filed: November 18, 2011
    Publication date: May 23, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Jason Williams, Tirso Alonso, Barbara B. Hollister, Ilya Dan Melamed
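The incremental stopping rule in this abstract can be sketched as below. The agreement measure (share of responses going to the leading answer) is an assumption; the abstract only says the threshold is based on characteristics of the responses.

```python
# Collect crowd responses until the leading label's share reaches the
# accuracy threshold, or until m responses arrive; output the majority.
from collections import Counter

def collect(responses, threshold=0.75, m=5):
    counts = Counter()
    for i, r in enumerate(responses, start=1):
        counts[r] += 1
        label, freq = counts.most_common(1)[0]
        # require more than one response before an early stop
        if i > 1 and freq / i >= threshold:
            return label, i
        if i >= m:
            return label, i
    label, _ = counts.most_common(1)[0]
    return label, sum(counts.values())

label, used = collect(iter(["cat", "cat", "cat", "dog", "cat"]))
print(label, used)
```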
  • Publication number: 20130073286
    Abstract: Candidate interpretations resulting from application of speech recognition algorithms to spoken input are presented in a consolidated manner that reduces redundancy. A list of candidate interpretations is generated, and each candidate interpretation is subdivided into time-based portions, forming a grid. Those time-based portions that duplicate portions from other candidate interpretations are removed from the grid. A user interface is provided that presents the user with an opportunity to select among the candidate interpretations; the user interface is configured to present these alternatives without duplicate elements.
    Type: Application
    Filed: September 20, 2011
    Publication date: March 21, 2013
    Applicant: APPLE INC.
    Inventors: Marcello Bastea-Forte, David A. Winarsky
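The grid of time-based portions can be sketched with a simple list representation (an assumption; the abstract does not specify the data structure): tokens that duplicate the first candidate's token in the same time slot are blanked so each alternative appears only once.

```python
# Build a deduplicated grid from time-aligned candidate interpretations.
def dedup_grid(candidates):
    grid = [list(candidates[0])]
    for cand in candidates[1:]:
        row = [tok if tok != grid[0][j] else "" for j, tok in enumerate(cand)]
        grid.append(row)
    return grid

grid = dedup_grid([
    ["call", "my", "mother"],
    ["call", "my", "brother"],
    ["tall", "my", "mother"],
])
print(grid)
```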
  • Patent number: 8350683
    Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly attached at an inner portion of the windshield of a vehicle equipped with the interior rearview mirror assembly. The interior rearview mirror assembly includes at least two microphones for receiving audio signals within a cabin of the vehicle and generating an output indicative of the audio signals. A control is in the vehicle and is responsive to the output from the at least two microphones. The control at least partially distinguishes vocal signals from non-vocal signals present in the output. The at least two microphones provide sound capture for at least one of a hands free cell phone system, an audio recording system and a wireless communication system.
    Type: Grant
    Filed: August 15, 2011
    Date of Patent: January 8, 2013
    Assignee: Donnelly Corporation
    Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
  • Publication number: 20130006636
    Abstract: A meaning extraction device includes a clustering unit, an extraction rule generation unit and an extraction rule application unit. The clustering unit acquires feature vectors that transform numerical features representing the features of words having specific meanings and the surrounding words into elements, and clusters the acquired feature vectors into a plurality of clusters on the basis of the degree of similarity between feature vectors. The extraction rule generation unit performs machine learning based on the feature vectors within a cluster for each cluster, and generates extraction rules to extract words having specific meanings. The extraction rule application unit receives feature vectors generated from the words in documents which are subject to meaning extraction, specifies the optimum extraction rules for the feature vectors, and extracts the meanings of the words on the basis of which the feature vectors were generated by applying the specified extraction rules to the feature vectors.
    Type: Application
    Filed: March 24, 2011
    Publication date: January 3, 2013
    Applicant: NEC CORPORATION
    Inventors: Hironori Mizuguchi, Dai Kusui
  • Publication number: 20130006635
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speaker and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For a frame, an acoustic feature vector is determined extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventor: Hagai Aronowitz
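The extended feature vector from this abstract can be sketched numerically. One-dimensional Gaussians stand in for real acoustic models, and every parameter below is made up; the point is only the shape of the vector: the frame's features plus one log-likelihood ratio per pre-trained model against the background model.

```python
# Append per-model log-likelihood ratios to a frame's acoustic feature.
from math import log, pi

def gauss_loglik(x, mean, var):
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)

def extend(frame_feature, speaker_models, background):
    llrs = [
        gauss_loglik(frame_feature, m["mean"], m["var"])
        - gauss_loglik(frame_feature, background["mean"], background["var"])
        for m in speaker_models
    ]
    return [frame_feature] + llrs

vec = extend(
    1.0,
    [{"mean": 1.0, "var": 1.0}, {"mean": -2.0, "var": 1.0}],
    {"mean": 0.0, "var": 4.0},
)
print([round(v, 3) for v in vec])
```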
  • Publication number: 20130006634
    Abstract: Techniques are provided to improve identification of a person using speaker recognition. In one embodiment, a unique social graph may be associated with each of a plurality of defined contexts. The social graph may indicate speakers likely to be present in a particular context. Thus, an audio signal including a speech signal may be collected and processed. A context may be inferred, and a corresponding social graph may be identified. A set of potential speakers may be determined based on the social graph. The processed signal may then be compared to a restricted set of speech models, each speech model being associated with a potential speaker. By limiting the set of potential speakers, speakers may be more accurately identified.
    Type: Application
    Filed: January 6, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
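The context-restricted comparison step can be sketched as follows. The scores here are precomputed numbers standing in for real comparisons between the processed audio and per-speaker speech models; contexts, graphs, and names are invented.

```python
# Restrict speaker identification to the speakers in the context's
# social graph, then pick the best-scoring restricted candidate.
def identify(context, social_graphs, model_scores):
    candidates = social_graphs.get(context, set())
    scored = {s: model_scores[s] for s in candidates if s in model_scores}
    return max(scored, key=scored.get) if scored else None

graphs = {"office": {"alice", "bob"}, "home": {"carol"}}
scores = {"alice": 0.4, "bob": 0.9, "carol": 0.7}
print(identify("office", graphs, scores))
```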
  • Publication number: 20120284025
    Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and, if the received speech matches data in the precompiled grammar, returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, then a new grammar is dynamically compiled based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
    Type: Application
    Filed: July 18, 2012
    Publication date: November 8, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Harry Blanchard, Steven LEWIS, Shankarnarayan SIVAPRASAD, Lan ZHANG
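The precompiled-plus-delta lookup can be sketched as follows. "Compiling" a grammar is reduced to lowercasing names into a set, and the snapshot bookkeeping is an invented simplification of whatever the real system does.

```python
# Match against the grammar compiled from a directory snapshot; on a
# miss, compile a small grammar from only the rows added since the
# snapshot and retry.
def recognize(speech, precompiled, database, snapshot_size):
    if speech in precompiled:
        return speech
    delta = {name.lower() for name in database[snapshot_size:]}
    return speech if speech in delta else None

directory = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
precompiled = {"ada lovelace", "alan turing"}  # built from the first 2 rows
print(recognize("grace hopper", precompiled, directory, 2))
print(recognize("ada lovelace", precompiled, directory, 2))
```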
  • Publication number: 20120278066
    Abstract: A communication interface apparatus for a system and a plurality of users is provided. The communication interface apparatus for the system and the plurality of users includes a first process unit configured to receive voice information and face information from at least one user, and determine whether the received voice information is voice information of at least one registered user based on user models corresponding to the respective received voice information and face information; a second process unit configured to receive the face information, and determine whether the at least one user's attention is on the system based on the received face information; and a third process unit configured to receive the voice information, analyze the received voice information, and determine whether the received voice information is substantially meaningful to the system based on a dialog model that represents conversation flow on a situation basis.
    Type: Application
    Filed: November 9, 2010
    Publication date: November 1, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Nam-Hoon Kim, Chi-Youn Park, Jeong-Mi Cho, Jeong-su Kim
  • Publication number: 20120271631
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Application
    Filed: April 19, 2012
    Publication date: October 25, 2012
    Applicant: ROBERT BOSCH GMBH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
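The frequency split described above can be sketched with a counter. The threshold, the corpus, and the decision to put utterances exactly at the threshold on the grammar-based side are all assumptions for illustration.

```python
# Partition training utterances by frequency: frequent utterances feed
# a grammar-based model, infrequent ones feed a statistical model.
from collections import Counter

def split_training(utterances, threshold=3):
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c >= threshold]
    low = [u for u, c in counts.items() if c < threshold]
    return high, low

data = ["turn on radio"] * 4 + ["play jazz"] * 3 + ["find a quiet cafe nearby"]
high, low = split_training(data)
print(sorted(high), sorted(low))
```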
  • Publication number: 20120239399
    Abstract: Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary whose number of words to be recognized is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) in an interactive situation for a vocabulary whose number of words to be recognized is smaller than the threshold.
    Type: Application
    Filed: March 30, 2010
    Publication date: September 20, 2012
    Inventors: Michihiro Yamazaki, Yuzo Maruta
  • Publication number: 20120203553
    Abstract: A recognition dictionary creating device includes a user dictionary in which a phoneme label string of an inputted voice is registered and an interlanguage acoustic data mapping table in which a correspondence between phoneme labels in different languages is defined, and refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary and expressed in a language set at the time of creating the user dictionary into a phoneme label string expressed in another language which the recognition dictionary creating device has switched.
    Type: Application
    Filed: January 22, 2010
    Publication date: August 9, 2012
    Inventor: Yuzo Maruta
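The interlanguage mapping table can be sketched as a plain dictionary lookup. The phoneme labels and mapping entries below are invented examples, not the actual acoustic-data mapping the abstract refers to.

```python
# Convert a phoneme label string from the dictionary's original
# language into the newly selected language via a mapping table.
def convert(phonemes, mapping):
    # fall back to the original label when the table has no entry
    return [mapping.get(p, p) for p in phonemes]

en_to_ja = {"r": "ɾ", "l": "ɾ", "th": "s"}
print(convert(["th", "r", "iy"], en_to_ja))
```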
  • Publication number: 20120173237
    Abstract: A method and apparatus for updating a speech model on a multi-user speech recognition system with a personal speech model for a single user. A speech recognition system, for instance in a car, can include a generic speech model for comparison with the user speech input. A device holding a personal speech model, for instance a mobile phone, is connected to the system. A mechanism is included for receiving personal speech model components, for instance a BLUETOOTH connection. The generic speech model is updated using the received personal speech model components. Speech recognition can then be performed on user speech using the updated generic speech model.
    Type: Application
    Filed: March 12, 2012
    Publication date: July 5, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Barry Neil Dow, Eric William Janke, Daniel Lee Yuk Cheung, Benjamin Terrick Staniford
  • Publication number: 20120130715
    Abstract: According to one embodiment, an apparatus for generating a voice-tag includes an input unit, a recognition unit, and a combination unit. The input unit is configured to input a registration speech. The recognition unit is configured to recognize the registration speech to obtain N-best recognition results, wherein N is an integer greater than or equal to 2. The combination unit is configured to combine the N-best recognition results as a voice-tag of the registration speech.
    Type: Application
    Filed: September 23, 2011
    Publication date: May 24, 2012
    Inventors: Rui Zhao, Lei He
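The combination step can be sketched minimally: the voice-tag stores all N recognition results for the enrolled utterance, and a later query matches the tag if it matches any member. The recognizer output is simulated with fixed strings; how the real system combines and matches results is not specified by the abstract.

```python
# A voice-tag as the set of N-best recognition results.
def make_voice_tag(nbest):
    return set(nbest)

def matches(tag, query_result):
    return query_result in tag

tag = make_voice_tag(["mom", "mam"])  # N = 2 recognition results
print(matches(tag, "mam"), matches(tag, "tom"))
```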
  • Publication number: 20120116762
    Abstract: A Candidate Isolation System (CIS) detects subscribers of phone call services as candidates to be surveillance targets. A Voice Matching System (VMS) then decides whether or not a given candidate Communication Terminal (CT) should be tracked by determining, using speaker recognition techniques, whether the subscriber operating the candidate CT is a known target subscriber. The CIS receives from the network call event data that relate to CTs in the network.
    Type: Application
    Filed: October 28, 2011
    Publication date: May 10, 2012
    Applicant: VERINT SYSTEMS LTD.
    Inventors: Eithan Goldfarb, Yoav Ariav
  • Publication number: 20120101821
    Abstract: A speech recognition apparatus is disclosed. The apparatus converts a speech signal into a digitalized speech data, and performs speech recognition based on the speech data. The apparatus makes a comparison between the speech data inputted the last time and the speech data inputted the time before the last time in response to a user's indication that the speech recognition results in erroneous recognition multiple times in a row. When the speech data inputted the last time is determined to substantially match the speech data inputted the time before the last time, the apparatus outputs a guidance prompting the user to utter an input target by calling it by another name.
    Type: Application
    Filed: October 13, 2011
    Publication date: April 26, 2012
    Applicant: DENSO CORPORATION
    Inventor: Takahiro TSUDA
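The repeat-detection step can be sketched with a string-similarity check. `difflib.SequenceMatcher` stands in for whatever comparison the real apparatus performs on the digitized speech data; the threshold and inputs are invented.

```python
# If the last two rejected inputs are nearly identical, the user is
# probably repeating the same word, so prompt for another name.
from difflib import SequenceMatcher

def guidance(prev_input, last_input, threshold=0.85):
    ratio = SequenceMatcher(None, prev_input, last_input).ratio()
    if ratio >= threshold:
        return "Please try calling the destination by another name."
    return None

print(guidance("naviagte home", "navigate home"))
print(guidance("navigate home", "play jazz"))
```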
  • Publication number: 20120101812
    Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
    Type: Application
    Filed: December 30, 2011
    Publication date: April 26, 2012
    Applicant: GOOGLE INC.
    Inventors: Craig Reding, Suzi Levas
  • Publication number: 20120078635
    Abstract: One embodiment of a voice control system includes a first electronic device communicatively coupled to a server and configured to receive a speech recognition file from the server. The speech recognition file may include a speech recognition algorithm for converting one or more voice commands into text and a database including one or more entries comprising one or more voice commands and one or more executable commands associated with the one or more voice commands.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 29, 2012
    Applicant: Apple Inc.
    Inventors: Fletcher Rothkopf, Stephen Brian Lynch, Adam Mittleman, Phil Hobson
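The database of voice commands and associated executable commands can be sketched as a lookup table of callables. The commands and actions below are invented examples, not entries from any shipped speech recognition file.

```python
# Look up recognized text in a table mapping voice commands to actions.
def dispatch(recognized_text, command_table):
    action = command_table.get(recognized_text)
    return action() if action else "unrecognized command"

table = {
    "volume up": lambda: "volume set to 80%",
    "next track": lambda: "skipping to next track",
}
print(dispatch("next track", table))
print(dispatch("open pod bay doors", table))
```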
  • Publication number: 20120059849
    Abstract: In one embodiment, a system and method is provided to browse and analyze files comprising text strings tagged with metadata. The system and method comprise various functions including browsing the metadata tags in the file, browsing the text strings, selecting subsets of the text strings by including or excluding strings tagged with specific metadata tags, selecting text strings by matching patterns of words and/or parts of speech in the text string and matching selected text strings to a database to identify similar text string. The system and method further provide functions to generate suggested text selection rules by analyzing a selected subset of a plurality of text strings.
    Type: Application
    Filed: September 8, 2010
    Publication date: March 8, 2012
    Applicant: DEMAND MEDIA, INC.
    Inventors: David M. Yehaskel, Henrik M. Kjallbring
  • Publication number: 20120059653
    Abstract: A method for producing speech recognition results on a device includes receiving first speech recognition results, obtaining a language model, wherein the language model represents information stored on the device, and using the first speech recognition results and the language model to generate second speech recognition results.
    Type: Application
    Filed: August 30, 2011
    Publication date: March 8, 2012
    Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas, Jeffrey C. O'Neill
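The two-pass idea can be sketched as rescoring: first-pass hypotheses arrive with recognizer scores, and a language model representing on-device information (here reduced to a contact list) boosts hypotheses containing known names. The boost weight, scores, and names are invented.

```python
# Re-rank first-pass hypotheses using a device-derived vocabulary.
def rescore(hypotheses, device_vocab, boost=0.3):
    rescored = []
    for text, score in hypotheses:
        bonus = boost * sum(1 for w in text.split() if w in device_vocab)
        rescored.append((text, score + bonus))
    return max(rescored, key=lambda h: h[1])[0]

contacts = {"anneke", "jorge"}
hyps = [("call anika", 0.74), ("call anneke", 0.70)]
print(rescore(hyps, contacts))
```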
  • Publication number: 20110301954
    Abstract: A method for adjusting a voice recognition system and a voice recognition system is disclosed, wherein the voice recognition system comprises a speaker and a microphone, and wherein the method comprises the steps of: memorizing an audio frequency signal, playing back the audio frequency signal by means of the speaker, generating a detection signal by detecting the audio frequency signal by means of the microphone, and adjusting parameters of the voice recognition system dependent on the detection signal.
    Type: Application
    Filed: June 3, 2010
    Publication date: December 8, 2011
    Applicant: Johnson Controls Technology Company
    Inventors: Michael J. Sims, Brian L. Douthitt, David J. Hughes, Mark Zeinstra, Ted W. Ringold, Douglas W. Klamer, Todd Witters, Elisabet A. Anderson
  • Publication number: 20110276329
    Abstract: A speech dialogue apparatus, a dialogue control method, and a dialogue control program are provided, whereby an appropriate dialogue control is enabled by determining a user's proficiency level in a dialogue behavior correctly and performing an appropriate dialogue control according to the user's proficiency level correctly determined, without being influenced by an accidental one-time behavior of the user. An input unit 1 inputs a speech uttered by the user. An extraction unit 3 extracts a proficiency level determination factor that is a factor for determining a user's proficiency level in a dialogue behavior, based upon an input result of the speech of the input unit 1. A history storage unit 4 stores as a history the proficiency level determination factor extracted by the extraction unit 3.
    Type: Application
    Filed: January 20, 2010
    Publication date: November 10, 2011
    Inventors: Masaaki Ayabe, Jun Okamoto
  • Publication number: 20110276323
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase matches one or more voice features of the speaker of the reference passphrase.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Applicant: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
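The two-factor check from this abstract can be sketched directly: authenticate only when the test passphrase matches the reference and the voice features are close enough. Euclidean distance on toy feature vectors is an assumption; real systems compare much richer voice features.

```python
# Authenticate on passphrase match AND voice-feature similarity.
from math import dist

def authenticate(ref_phrase, ref_feats, test_phrase, test_feats, max_dist=1.0):
    return ref_phrase == test_phrase and dist(ref_feats, test_feats) <= max_dist

print(authenticate("open sesame", [0.2, 0.5], "open sesame", [0.3, 0.6]))
print(authenticate("open sesame", [0.2, 0.5], "open sesame", [2.0, 2.0]))
```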
  • Publication number: 20110231189
    Abstract: Techniques for generating a set of one or more alternate titles associated with stored digital media content and updating a speech recognition system to enable the speech recognition system to recognize the set of alternate titles. The system operates on an original media title to extract a set of alternate media titles by applying at least one rule to the original title. The extracted set of alternate media titles are used to update the speech recognition system prior to runtime. In one aspect rules that are applied to original titles are determined by analyzing a corpus of original titles and corresponding possible alternate media titles that a user may use to refer to the original titles.
    Type: Application
    Filed: March 19, 2010
    Publication date: September 22, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Josef Damianus Anastasiadis, Christophe Nestor George Couvreur
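The rule application can be sketched as follows. The two rules shown (dropping a leading article, dropping a subtitle after a colon) are invented examples of the kind of rule the abstract describes, not rules taken from the patent.

```python
# Generate alternate titles a user might say for an original media title.
def alternate_titles(title):
    alternates = set()
    for article in ("the ", "a ", "an "):
        if title.lower().startswith(article):
            alternates.add(title[len(article):])
    if ":" in title:
        alternates.add(title.split(":", 1)[0].strip())
    return alternates

print(sorted(alternate_titles("The Matrix: Reloaded")))
```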
  • Publication number: 20110231190
    Abstract: A system for conducting a telephonic speech recognition application includes an automated telephone device for making telephonic contact with a respondent and a speech recognition device which, upon the telephonic contact being made, presents the respondent with at least one introductory prompt for the respondent to reply to; receives a spoken response from the respondent; and performs a speech recognition analysis on the spoken response to determine a capability of the respondent to complete the application. If the speech recognition device, based on the spoken response to the introductory prompt, determines that the respondent is capable of completing the application, the speech recognition device presents at least one application prompt to the respondent.
    Type: Application
    Filed: March 21, 2011
    Publication date: September 22, 2011
    Applicant: Eliza Corporation
    Inventors: Nasreen Quibria, Lucas Merrow, Oleg Boulanov, John P. Kroeker, Alexandra Drane
  • Publication number: 20110231191
    Abstract: A weight coefficient generation device, a speech recognition device, a navigation system, a vehicle, a weight coefficient generation method, and a weight coefficient generation program are provided for the purpose of improving a speech recognition performance of place names. In order to address the above purpose, an address database 12 has address information data items including country names, city names, street names, and house numbers, and manages the address information having a tree structure indicating hierarchical relationships between the place names from a wide area to a narrow area. Each of the place names stored in the address database 12 is taken as a speech recognition candidate. A weight coefficient calculation unit 11 of a weight coefficient generation device 10 calculates a weight coefficient of the likelihood of the aforementioned recognition candidate based on the number of the street names belonging to the lower hierarchy below the city names.
    Type: Application
    Filed: November 17, 2009
    Publication date: September 22, 2011
    Inventor: Toshiyuki Miyazaki
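The hierarchy-based weighting can be sketched numerically: a city containing more streets receives a larger weight, so denser cities are favored among otherwise similar recognition candidates. Using the log of the street count, normalized across cities, is an assumption; the abstract does not give the formula.

```python
# Weight each city by the number of streets below it in the address tree.
from math import log

def city_weights(address_tree):
    raw = {city: log(1 + len(streets)) for city, streets in address_tree.items()}
    total = sum(raw.values())
    return {city: w / total for city, w in raw.items()}

tree = {
    "Springfield": ["Oak St", "Elm St", "Main St"],
    "Smallville": ["Main St"],
}
weights = city_weights(tree)
print(weights["Springfield"] > weights["Smallville"])
```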
  • Publication number: 20110213613
    Abstract: A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
    Type: Application
    Filed: May 24, 2010
    Publication date: September 1, 2011
    Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno
  • Patent number: 8004392
    Abstract: A voice acquisition system for a vehicle includes an interior rearview mirror assembly. The mirror assembly may include a microphone for receiving audio signals within a cabin of the vehicle and generating an output indicative of these audio signals. The microphone may provide sound capture for a hands free cell phone system, an audio recording system and/or an emergency communication system. The system may include a control that is responsive to the output from the microphone and that distinguishes vocal signals from non-vocal signals present in the output. The microphone may provide sound capture for at least one accessory of the equipped vehicle, and the accessory may be responsive to a vocal signal captured by the microphone. The interior rearview mirror assembly may include at least one accessory, such as an antenna, a video device, a security system status indicator, a tire pressure indicator display and/or a loudspeaker.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: August 23, 2011
    Assignee: Donnelly Corporation
    Inventors: Jonathan E. DeLine, Niall R. Lynam, Ralph A. Spooner, Phillip A. March
  • Publication number: 20110166858
    Abstract: A method for recognizing speech involves presenting an utterance to a speech recognition system and determining, via the speech recognition system, that the utterance contains a particular expression, where the particular expression is capable of being associated with at least two different meanings. The method further involves splitting the utterance into a plurality of speech frames, where each frame is assigned a predetermined time segment and a frame number, and indexing the utterance to i) a predetermined frame number, or ii) a predetermined time segment. The indexing of the utterance identifies that one of the frames includes the particular expression. Then the frame including the particular expression is re-presented to the speech recognition system to verify that the particular expression was actually recited in the utterance.
    Type: Application
    Filed: January 6, 2010
    Publication date: July 7, 2011
    Applicant: GENERAL MOTORS LLC
    Inventor: Uma Arun
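The frame-splitting and indexing steps described in the abstract above can be sketched as follows; the 25 ms frame / 10 ms hop framing and the dictionary layout are conventional ASR defaults assumed for illustration, not values taken from the application:

```python
def split_into_frames(samples, rate=16000, frame_ms=25, hop_ms=10):
    """Split an utterance into numbered, time-stamped frames."""
    frame_len = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    frames = []
    for num, start in enumerate(range(0, len(samples) - frame_len + 1, hop)):
        frames.append({"frame_number": num,
                       "start_sec": start / rate,
                       "end_sec": (start + frame_len) / rate,
                       "samples": samples[start:start + frame_len]})
    return frames

def frame_at_time(frames, t):
    """Index the utterance to the first frame covering time t (seconds)."""
    return next(f for f in frames if f["start_sec"] <= t < f["end_sec"])
```

Once a frame number or time segment is known to contain the ambiguous expression, that frame's samples can be re-presented to the recognizer on their own.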
  • Publication number: 20110161081
    Abstract: Methods, computer program products and systems are described for forming a speech recognition language model. Multiple query-website relationships are determined by identifying websites that are determined to be relevant to queries using one or more search engines. Clusters are identified in the query-website relationships by connecting common queries and connecting common websites. A speech recognition language model is created for a particular website based on at least one of analyzing queries in a cluster that includes the website or analyzing webpage content of web pages in the cluster that includes the website.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 30, 2011
    Applicant: GOOGLE INC.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
  • Publication number: 20110161072
    Abstract: A frequency counting unit (15A) counts occurrence frequencies (14B) in input text data (14A) for respective words or word chains contained in the input text data (14A). A context diversity calculation unit (15B) calculates, for the respective words or word chains, diversity indices (14C) each indicating the context diversity of a word or word chain. A frequency correction unit (15C) corrects the occurrence frequencies (14B) of the respective words or word chains based on the diversity indices (14C) of the respective words or word chains. An N-gram language model creation unit (15D) creates an N-gram language model (14E) based on the corrected occurrence frequencies (14D) obtained for the respective words or word chains.
    Type: Application
    Filed: August 20, 2009
    Publication date: June 30, 2011
    Applicant: NEC CORPORATION
    Inventors: Makoto Terao, Kiyokazu Miki, Hitoshi Yamamoto
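A minimal sketch of diversity-corrected counting along the lines of the abstract above, assuming bigrams and using the number of distinct following words as the diversity index (the publication's actual index and correction formula may differ):

```python
import math
from collections import Counter, defaultdict

def diversity_corrected_counts(sentences, n=2):
    """Correct raw n-gram counts by the diversity of the contexts
    (here: distinct following words) in which each n-gram appears."""
    counts = Counter()
    followers = defaultdict(set)
    for sent in sentences:
        words = sent.split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            counts[gram] += 1
            if i + n < len(words):
                followers[gram].add(words[i + n])
    corrected = {}
    for gram, c in counts.items():
        # n-grams seen in many distinct contexts get boosted counts
        diversity = 1 + math.log(1 + len(followers[gram]))
        corrected[gram] = c * diversity
    return corrected
```

The corrected counts would then feed a standard N-gram estimator in place of the raw frequencies.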
  • Publication number: 20110144993
    Abstract: A disfluent-utterance tracking system includes a speech transducer; one or more targeted-disfluent-utterance records stored in a memory; a real-time speech recording mechanism operatively connected with the speech transducer for recording a real-time utterance; and an analyzer operatively coupled with the targeted-disfluent-utterance record and with the real-time speech recording mechanism, the analyzer configured to compare one or more real-time snippets of the recorded speech with the targeted-disfluent-utterance record to determine and indicate to a user a level of correlation therebetween.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Inventor: David Ruby
  • Publication number: 20110137652
    Abstract: Disclosed herein are methods and systems for recognizing speech. A method embodiment comprises comparing received speech with a precompiled grammar based on a database and, if the received speech matches data in the precompiled grammar, returning a result based on the matched data. If the received speech does not match data in the precompiled grammar, the method dynamically compiles a new grammar based only on new data added to the database after the compiling of the precompiled grammar. The database may comprise a directory of names.
    Type: Application
    Filed: February 14, 2011
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Harry Blanchard, Steven Lewis, Shankarnarayan Sivaprasad, Lan Zhang
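The precompile-then-delta strategy above can be illustrated with a toy directory grammar. Real grammar compilation is far more involved; the class and method names here are invented for the sketch, and "compilation" is reduced to building a lookup table:

```python
class NameGrammar:
    """Precompiled grammar over a directory, with on-demand compilation
    of a delta grammar covering only entries added after the last compile."""

    def __init__(self, directory):
        self.directory = directory
        self.compiled = self._compile(directory)
        self.compiled_size = len(directory)

    @staticmethod
    def _compile(entries):
        # Stand-in for real grammar compilation: normalize to a lookup table.
        return {name.lower(): name for name in entries}

    def recognize(self, spoken):
        key = spoken.lower()
        if key in self.compiled:                          # fast path
            return self.compiled[key]
        new_entries = self.directory[self.compiled_size:]  # delta only
        delta = self._compile(new_entries)
        if key in delta:
            self.compiled.update(delta)                    # cache for next time
            self.compiled_size = len(self.directory)
            return delta[key]
        return None
```

The point of the design is that adding a name never forces a full recompile; only the small delta is compiled, and only when a lookup misses.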
  • Publication number: 20110137650
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej LJOLJE
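The final adaptation step in the abstract above, moving features closer to an overall distribution center, might look like this in a heavily simplified form (plain lists instead of HMM state posteriors; `alpha` is an assumed adaptation rate, not a parameter from the application):

```python
def centroid(frames):
    """Overall center of a set of feature vectors (lists of floats)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def adapt_features(frames, center, alpha=0.5):
    """Move each feature vector a fraction alpha of the way toward the
    overall centroid of its speech sound."""
    return [[f + alpha * (c - f) for f, c in zip(frame, center)]
            for frame in frames]
```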
  • Publication number: 20110131037
    Abstract: An in-vehicle audio system and methods are provided. A respective word or a respective phrase may be associated with each item of audio content stored in the in-vehicle audio system. The in-vehicle audio system may perform an action with respect to one of the stored items of audio content in response to a spoken command, which may include the respective word or the respective phrase associated with the one of the stored items. When audio content is to be added to the in-vehicle audio system, phonetics related to the audio content may be generated and added to a vocabulary dictionary during a compile process. When stored audio content is to be deleted from the in-vehicle audio system, phonetics related to the stored audio content to be deleted may be eliminated from the vocabulary dictionary during the compile process, which, in some embodiments, may be performed during a shutdown process.
    Type: Application
    Filed: November 20, 2010
    Publication date: June 2, 2011
    Applicant: Honda Motor Co., Ltd.
    Inventors: Ritchie Huang, Stuart M. Yamamoto, David M. Kirsch
  • Publication number: 20110093265
    Abstract: Systems and methods for creating and using geo-centric language models are provided herein. An exemplary method includes assigning each of a plurality of listings to a local service area, determining a geographic center for the local service area, computing a listing density for the local service area, and selecting a desired number of listings for a geo-centric listing set. The geo-centric listing set includes a subset of the plurality of listings. The exemplary method further includes dividing the local service area into regions based upon the listing density and the number of listings in the geo-centric listing set, and building a language model for the geo-centric listing set.
    Type: Application
    Filed: October 16, 2009
    Publication date: April 21, 2011
    Inventors: Amanda Stent, Diamantino Caseiro, Ilija Zeljkovic, Jay Wilpon
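A hedged sketch of the listing-selection and density steps described above, assuming a flat-earth distance approximation (adequate over a local service area) and an invented listing tuple layout:

```python
import math

def geo_centric_listing_set(listings, center, k):
    """Select the k listings nearest a service-area center.
    Each listing is (name, lat, lon); distance uses a flat-earth
    approximation with longitude scaled by cos(latitude)."""
    def dist(listing):
        _, lat, lon = listing
        dlat = lat - center[0]
        dlon = (lon - center[1]) * math.cos(math.radians(center[0]))
        return math.hypot(dlat, dlon)
    return sorted(listings, key=dist)[:k]

def listing_density(listings, area_sq_km):
    """Listings per square kilometre for a service area."""
    return len(listings) / area_sq_km
```

The density value would then control how finely the service area is divided into regions before a language model is built per region.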
  • Publication number: 20110082696
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for generating a speech recognition model for a media content retrieval system. The method causes a computing device to retrieve information describing media available in a media content retrieval system, construct a graph that models how the media are interconnected based on the retrieved information, rank the information describing the media based on the graph, and generate a speech recognition model based on the ranked information. The information can be a list of actors, directors, composers, titles, and/or locations. The graph that models how the media are interconnected can further model pieces of common information between two or more media. The method can further cause the computing device to weight the graph based on the retrieved information. The graph can further model relative popularity information in the list. The method can rank information based on a PageRank algorithm.
    Type: Application
    Filed: October 5, 2009
    Publication date: April 7, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Michael JOHNSTON, Ebrahim KAZEMZADEH
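The ranking step in the abstract above can be illustrated with a plain power-iteration PageRank over the media graph. This is the textbook algorithm, not necessarily the variant the application claims, and the example graph is invented:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [neighbors]};
    used here to rank actors/titles by how interconnected they are
    in a media catalog."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, out in graph.items():
            if not out:
                for m in nodes:          # dangling node: spread evenly
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    new[m] += damping * rank[n] / len(out)
        rank = new
    return rank
```

Nodes linked by many media items (e.g., a prolific actor) end up with higher rank, and their names can be weighted accordingly in the recognition model.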
  • Publication number: 20110077942
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Diamantino Antonio Caseiro
  • Publication number: 20110071827
    Abstract: Various processes are disclosed for generating and selecting speech recognition grammars for conducting searches by voice. In one such process, search queries are selected from a search query log for incorporation into a speech recognition grammar. The search query log may include or consist of search queries specified by users without the use of voice. Another disclosed process enables a user to efficiently submit a search query by partially spelling the search query (e.g., on a telephone keypad or via voice utterances) and uttering the full search query. The user's partial spelling is used to select a particular speech recognition grammar for interpreting the utterance of the full search query.
    Type: Application
    Filed: November 8, 2010
    Publication date: March 24, 2011
    Inventors: Nicholas J. Lee, Robert Frederick, Ronald J. Schoenbaum
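The partial-spelling mechanism above can be sketched with the standard phone-keypad letter mapping; here the "grammar" is simplified to a filtered list of candidate queries, which is an assumption for illustration:

```python
# Standard ITU E.161 keypad mapping, inverted to letter -> digit.
KEYPAD = {c: d for d, letters in {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}.items()
    for c in letters}

def select_grammar(queries, keypad_prefix):
    """Narrow the recognition grammar to queries whose spelling starts
    with the digits the user typed on a phone keypad."""
    def to_digits(word):
        return "".join(KEYPAD.get(c, "") for c in word.lower())
    return [q for q in queries if to_digits(q).startswith(keypad_prefix)]
```

The narrowed candidate set would then be compiled into the grammar used to interpret the full spoken query.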
  • Publication number: 20110046953
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Applicant: GENERAL MOTORS COMPANY
    Inventors: Uma Arun, Sherri J. Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Publication number: 20110004473
    Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing a phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; and activating an audio analysis engine, which receives the acoustic feature, to validate the result and obtain an enhanced result.
    Type: Application
    Filed: July 6, 2009
    Publication date: January 6, 2011
    Applicant: Nice Systems Ltd.
    Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
  • Publication number: 20100324901
    Abstract: Various methods and apparatus are described for a speech recognition system. In an embodiment, the statistical language model (SLM) provides probability estimates of how linguistically likely a sequence of linguistic items is to occur in that sequence, based on the number of times the sequence of linguistic items occurs in text and phrases in general use. The speech recognition decoder module requests from a correction module one or more corrected probability estimates P'(z|xy) of how likely a linguistic item z follows a given sequence of linguistic items x followed by y, where (x, y, and z) are three variable linguistic items supplied from the decoder module. The correction module is trained on the linguistics of a specific domain, and is located between the decoder module and the SLM in order to adapt the probability estimates supplied by the SLM to the specific domain when those probability estimates from the SLM significantly disagree with the linguistic probabilities in that domain.
    Type: Application
    Filed: June 23, 2009
    Publication date: December 23, 2010
    Applicant: Autonomy Corporation Ltd.
    Inventors: David Carter, Mahapathy Kadirkamanathan
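One way to realize "adapt only when the estimates significantly disagree" is a thresholded blend of the general and domain trigram probabilities. The ratio test, the threshold value, and the 50/50 blend below are assumptions for illustration, not the claimed correction function:

```python
def corrected_prob(p_general, p_domain, threshold=10.0):
    """Return a corrected trigram probability P'(z|x,y): keep the general
    SLM estimate unless it disagrees with the domain estimate by more
    than `threshold` (as a ratio), in which case blend toward the
    domain estimate."""
    ratio = max(p_general, p_domain) / max(min(p_general, p_domain), 1e-12)
    if ratio <= threshold:
        return p_general          # estimates agree: trust the general SLM
    return 0.5 * p_general + 0.5 * p_domain
```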
  • Publication number: 20100318355
    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
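The segment-selection step above, keeping runs of at least Q contiguous matching aligned words, can be sketched as follows (the time alignment is assumed to have already put the two transcriptions position-for-position):

```python
def select_training_segments(reference, decoded, q=3):
    """Return runs of at least q contiguous positions where the original
    transcription and the decoded transcription agree."""
    segments, run = [], []
    for ref, hyp in zip(reference, decoded):
        if ref == hyp:
            run.append(ref)
        else:
            if len(run) >= q:
                segments.append(run)
            run = []
    if len(run) >= q:
        segments.append(run)
    return segments
```

Only these agreeing segments are fed back into acoustic-model training, which filters out the transcription errors the corpus is known to contain.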