Creating Patterns For Matching Patents (Class 704/243)
  • Publication number: 20140214420
    Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector and residual-noise estimation are performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are repeated until convergence.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaisheng Yao, Yifan Gong
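The clustering step in the abstract above (assign each test utterance's i-vector to the GMM cluster with the closest centroid) can be sketched as a nearest-centroid lookup; the function and argument names below are illustrative, not from the patent:

```python
import math

def assign_to_cluster(ivector, centroids):
    """Return the index of the cluster whose centroid is nearest to the
    utterance's i-vector in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda k: dist(ivector, centroids[k]))
```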
  • Publication number: 20140214421
    Abstract: Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
    Type: Application
    Filed: January 31, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tur, Larry Heck, Heeyoung Lee
  • Publication number: 20140207458
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Application
    Filed: March 24, 2014
    Publication date: July 24, 2014
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
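The dynamic-grammar construction described above can be sketched as a set intersection: keep only the data elements whose reference identifier matches one of the selection identifiers derived from the first user input. All names here are illustrative, not from the patent:

```python
def build_dynamic_grammar(selection_ids, reference_ids, data_elements):
    """Build a dynamic grammar containing only the data elements whose
    reference identifier appears among the selection identifiers.
    data_elements maps reference identifier -> data element."""
    matched = set(selection_ids) & set(reference_ids)
    return {ref: data_elements[ref] for ref in matched if ref in data_elements}
```

The second user-provided identifier would then be matched against this much smaller grammar rather than the full set of reference identifiers.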
  • Publication number: 20140207457
    Abstract: A system and method are presented for using spoken word verification to reduce false alarms by exploiting global and local contexts on a lexical level, a phoneme level, and on an acoustical level. The reduction of false alarms may occur through a process that determines whether a word has been detected or if it is a false alarm. Training examples are used to generate models of internal and external contexts which are compared to test word examples. The word may be accepted or rejected based on comparison results. Comparison may be performed either at the end of the process or at multiple steps of the process to determine whether the word is rejected.
    Type: Application
    Filed: January 22, 2013
    Publication date: July 24, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Konstantin Biatov, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 8788256
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: July 22, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Publication number: 20140200891
    Abstract: Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, for example to prioritize topics whose handling by the conversational agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 17, 2014
    Inventors: Jean-Marie Henri Daniel Larcheveque, Elizabeth Ireland Powers, Freya Kate Recksiek, Dan Teodosiu
  • Patent number: 8781831
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: September 5, 2013
    Date of Patent: July 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
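The model-selection fallback in the abstract above (user-specific supervised model, else unsupervised model, else generic model) reduces to a simple preference chain. In this sketch `None` stands for "unavailable", which is an assumption of the illustration, not a detail from the patent:

```python
def select_speech_model(supervised, unsupervised, generic):
    """Return the first available model, preferring the user-specific
    supervised model, then the unsupervised model, then the generic one."""
    for model in (supervised, unsupervised, generic):
        if model is not None:
            return model
    raise ValueError("no speech model available")
```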
  • Patent number: 8781830
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: July 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, Michael J. Burkhart, Daniel G. Eisenhauer, Thomas J. Watson, Daniel M. Schumacher
  • Patent number: 8781821
    Abstract: A method is disclosed for controlling a voice-activated device by interpreting a spoken command as a series of voiced and non-voiced intervals. A responsive action is then performed according to the number of voiced intervals in the command. The method is well-suited to applications having a small number of specific voice-activated response functions. Applications using the inventive method offer numerous advantages over traditional speech recognition systems including speaker universality, language independence, no training or calibration needed, implementation with simple microcontrollers, and extremely low cost. For time-critical applications such as pulsers and measurement devices, where fast reaction is crucial to catch a transient event, the method provides near-instantaneous command response, yet versatile voice control.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: July 15, 2014
    Assignee: Zanavox
    Inventor: David Edward Newman
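The core of the method above, interpreting a command as a count of voiced intervals, can be sketched as run-length counting over per-frame energies. The energy threshold value is an illustrative assumption:

```python
def count_voiced_intervals(frames, threshold=0.5):
    """Count contiguous runs of frames whose energy exceeds the threshold.
    Each run is one voiced interval; the device would then dispatch the
    responsive action keyed on this count alone."""
    count = 0
    in_voiced = False
    for energy in frames:
        if energy > threshold and not in_voiced:
            count += 1
            in_voiced = True
        elif energy <= threshold:
            in_voiced = False
    return count
```

Because only the count matters, no acoustic model, training, or language-specific resources are needed, which is consistent with the low-cost microcontroller implementation the abstract claims.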
  • Patent number: 8775176
    Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: July 8, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Stephan Kanthak
  • Patent number: 8775177
    Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where the comparing includes computing similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: July 8, 2014
    Assignee: Google Inc.
    Inventors: Georg Heigold, Patrick An Phu Nguyen, Mitchel Weintraub, Vincent O. Vanhoucke
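The compare-and-weight step in the abstract above can be sketched as follows; the similarity metric (inverse squared distance) and all names are illustrative stand-ins, not the metric the patent actually uses:

```python
def best_candidate(first_template, second_templates, weights):
    """Score each candidate template against the first-audio template,
    scale each similarity by the candidate's weight, and return the
    index of the highest weighted score."""
    def similarity(a, b):
        # Inverse squared distance: 1.0 for identical templates.
        return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))
    scores = [w * similarity(first_template, t)
              for t, w in zip(second_templates, weights)]
    return max(range(len(scores)), key=lambda i: scores[i])
```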
  • Patent number: 8768697
    Abstract: In some embodiments, a method includes measuring a disparity between two speech samples by segmenting both a reference speech sample and a student speech sample into speech units. A duration disparity can be determined for units that are not adjacent to each other in the reference speech sample. A duration disparity can also be determined for the corresponding units in the student speech sample. A difference can then be calculated between the student speech sample duration disparity and the reference speech sample duration disparity.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: July 1, 2014
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
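The duration-disparity comparison described above can be sketched directly: compute the disparity between two (possibly non-adjacent) units in the reference sample, the disparity between the corresponding units in the student sample, and take the difference. Names and the use of millisecond durations are illustrative:

```python
def duration_disparity_difference(ref_durations, student_durations, i, j):
    """Difference between the student's and the reference speaker's
    duration disparity for speech units i and j, which need not be
    adjacent. Durations are per-unit lengths (e.g. in milliseconds)."""
    ref_disparity = ref_durations[i] - ref_durations[j]
    student_disparity = student_durations[i] - student_durations[j]
    return student_disparity - ref_disparity
```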
  • Patent number: 8768703
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises detecting a text portion of a media stream including a video stream, the text portion not being consumable by a blind person, retrieving text associated with the text portion of the media stream, and converting the text to a first audio stream based on a first type of a first program in the media stream, and converting the text to a second audio stream based on a second type of a second program in the media stream.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: July 1, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Hisao M. Chang, Horst Schroeter
  • Patent number: 8756062
    Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: June 17, 2014
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8751226
    Abstract: A speech processing apparatus 101 includes a recognition feature extracting unit 12 that extracts recognition feature information which is a characteristic of a speech recognition result 15 obtained by performing a speech recognition process on an inputted speech from the speech recognition result 15; a language feature extracting unit 11 that extracts language feature information which is a characteristic of a pre-registered language resource 14 from the language resource 14; and a model learning unit 13 that obtains a verification model 16 by a learning process based on the extracted recognition feature information and language feature information.
    Type: Grant
    Filed: June 18, 2007
    Date of Patent: June 10, 2014
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Patent number: 8751230
    Abstract: A method and a device (1) for automatically generating vocabulary entry from input acoustic data (3), comprising a vocabulary entry type-specific acoustic phonetic transcription module (2; T) and a classifier module (6; 6′) for the classification of vocabulary entry types on the basis of the phonetic structure, wherein the classification of vocabulary entries is carried out in accordance with a number of predetermined types; and vocabulary entry type-specific phoneme-to-grapheme conversion means (28), to derive the respective vocabulary entries comprising a pair of a phonetic transcription and its grapheme form.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: June 10, 2014
    Assignee: Koninklijke Philips N.V.
    Inventor: Zsolt Saffer
  • Patent number: 8751231
    Abstract: Methods and systems for model-driven candidate sorting based on audio cues for evaluating digital interviews are described. In one embodiment, an audio cue generator identifies utterances in audio data of a digital interview. The utterances each include a group of one or more words spoken by a candidate in the digital interview. The audio cue generator generates audio cues of the digital interview based on the identified utterances. The audio cues are applied to a prediction model to predict an achievement index for the candidate. The candidate is displayed in a list of candidates based on the achievement index. The list of candidates is sorted according to the candidates' achievement indices.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: June 10, 2014
    Assignee: Hirevue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Publication number: 20140156273
    Abstract: A system and a method perform information recognition. The method arranges data base information in a data base information structure. The method matches input information to the data base information using at least one matching algorithm and using a matching information structure. In accordance with the system and the method, the matching information structure differs from the data base information structure.
    Type: Application
    Filed: June 8, 2013
    Publication date: June 5, 2014
    Inventors: Walter Steven Rosenbaum, Joern Bach
  • Patent number: 8744853
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Masafumi Nishimura, Ryuki Tachibana
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
  • Patent number: 8744850
    Abstract: Challenge items for an audible based electronic challenge system are generated using a variety of techniques to identify optimal candidates. The challenge items are intended for use in a computing system that discriminates between humans and text-to-speech (TTS) systems.
    Type: Grant
    Filed: January 14, 2013
    Date of Patent: June 3, 2014
    Assignee: John Nicholas and Kristin Gross
    Inventor: John Nicholas Gross
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
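The sparseness idea in the abstract above, storing only the small fraction of acoustic parameters that the MAP adaptation actually moved, can be sketched as a thresholded delta. The threshold rule here is an illustrative simplification of the patent's sparseness constraint, not its actual formulation:

```python
def sparse_adaptation(baseline, adapted, threshold=1e-3):
    """Keep only parameters whose adapted value moved more than `threshold`
    from the baseline, stored as {index: new_value}."""
    return {i: v for i, (b, v) in enumerate(zip(baseline, adapted))
            if abs(v - b) > threshold}

def apply_adaptation(baseline, delta):
    """Reconstruct the user-specific model from the baseline plus the
    sparse delta."""
    return [delta.get(i, b) for i, b in enumerate(baseline)]
```

Storing the `{index: value}` delta instead of a full copy of the model is what yields the orders-of-magnitude storage reduction the abstract describes.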
  • Publication number: 20140142942
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Application
    Filed: January 23, 2014
    Publication date: May 22, 2014
    Applicant: Accumente, LLC
    Inventors: Jike CHONG, Ian Richard LANE, Senaka Wimal BUTHPITIYA
  • Patent number: 8731926
    Abstract: In a spoken term detection apparatus, processing performed by a processor includes: a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part; a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part; a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword; and a retrieval process retrieving speech data including the keyword from the speech data accumulated in the accumulation part, based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: May 20, 2014
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shouji Harada
  • Patent number: 8731928
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
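The vocabulary-reduction step described above can be sketched as filtering each word's pronunciation variants down to the styles attributed to the speaker. The data layout and all names are illustrative:

```python
def reduce_vocabulary(vocabulary, speaker_styles):
    """Keep, for each word, only the pronunciation variants tagged with a
    style attributed to this speaker. vocabulary maps
    word -> [(style, phoneme_string), ...]."""
    styles = set(speaker_styles)
    return {word: [p for s, p in variants if s in styles]
            for word, variants in vocabulary.items()}
```

A smaller per-speaker vocabulary means fewer competing pronunciations at decode time, which is the source of the accuracy and speed gains the abstract claims.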
  • Patent number: 8731923
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: May 20, 2014
    Assignee: Adacel Systems, Inc.
    Inventor: Chang-Qing Shu
  • Patent number: 8725829
    Abstract: A method and system are described which allow users to identify (pre-recorded) sounds such as music, radio broadcasts, commercials, and other audio signals in almost any environment. The audio signal (or sound) must be a recording represented in a database of recordings. The service can quickly identify the signal from just a few seconds of excerpted audio, while tolerating high noise and distortion. Once the signal is identified to the user, the user may perform transactions interactively in real-time or offline using the identification information.
    Type: Grant
    Filed: April 26, 2004
    Date of Patent: May 13, 2014
    Assignee: Shazam Investments Limited
    Inventors: Avery Li-Chun Wang, Christopher Jacques Penrose Barton, Dheeraj Shankar Mukherjee, Philip Inghelbrecht
  • Publication number: 20140129222
    Abstract: When it is determined that sound data is unrecognizable through a speech recognition process by a first speech recognition unit (3), the same sound data as the sound data inputted to the first speech recognition unit (3) is transmitted to a second server device (60) and a first server device (70). Recognition data is generated which is formed of a character string that is a speech recognition result by the second server device (60) with respect to the sound data, and an acoustic model identifier series generated by a first acoustic model identifier series generation unit (27) of the first server (70) based on the sound data, and the generated recognition data is registered in a first recognition dictionary (3b) of the first speech recognition unit (3).
    Type: Application
    Filed: August 9, 2012
    Publication date: May 8, 2014
    Applicant: ASAHI KASEI KABUSHIKI KAISHA
    Inventor: Akihiro Okamoto
  • Patent number: 8719017
    Abstract: Speech recognition models are dynamically re-configurable based on user information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. The techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: May 6, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C Rose, Bojana Gajic
  • Publication number: 20140118472
    Abstract: In one embodiment, a method includes receiving requests to join a conference from a plurality of user devices proximate a first endpoint. The requests include a username. The method also includes receiving an audio signal for the conference from the first endpoint. The first endpoint is operable to capture audio proximate the first endpoint. The method also includes transmitting the audio signal to a second endpoint, remote from the first endpoint. The method also includes identifying, by a processor, an active speaker proximate the first endpoint based on information received from the plurality of user devices.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 1, 2014
    Inventors: Yanghua Liu, Weidong Chen, Biren Gandhi, Raghurama Bhat, Joseph Fouad Khouri, John Joseph Houston, Brian Thomas Toombs
  • Patent number: 8712756
    Abstract: A character input device is disclosed. The device includes a character input section including a plurality of character keys; a display section that displays an input character(s); and a next word prediction section that predicts a respective word being subsequently input in an event of input-word reception in the character input section and that displays the word as a next word candidate on the display section. The next word prediction section stores usage history information indicative of whether the next word candidate for the respective input-received word was used by a user, and determines in accordance with the usage history information of words as of a time point of the event of input-word reception of the word whether to display the next word candidate on the display section, and inhibits the display of a next word candidate when a value obtained by adding a constant to the number of used times of the candidate is smaller than the number of its unused times.
    Type: Grant
    Filed: April 29, 2008
    Date of Patent: April 29, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventor: Sun Xiaoning
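The inhibition rule stated at the end of the abstract above (hide a candidate when its used count plus a constant is smaller than its unused count) is concrete enough to sketch; the constant's value below is an assumption:

```python
def show_candidate(used, unused, constant=2):
    """Display the next-word candidate unless used + constant < unused,
    i.e. suppress candidates the user has mostly ignored."""
    return used + constant >= unused
```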
  • Patent number: 8706487
    Abstract: Acoustic models and language models are learned according to a speaking length which indicates a length of a speaking section in speech data, and speech recognition process is implemented by using the learned acoustic models and language models. A speech recognition apparatus includes means (103) for detecting a speaking section in speech data (101) and for generating a section information which indicates the detected speaking section, means (104) for recognizing a data part corresponding to a section information in the speech data as well as text data (102) written from the speech data and for classifying the data part based on a speaking length thereof, and means (106) for learning acoustic models and language models (107) by using the classified data part (105).
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: April 22, 2014
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Yoshifumi Onishi
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
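The core algebraic step, forming the adaptation transform as a weighted sum of basis matrices and using fewer coefficients for shorter utterances, can be sketched as follows; the toy bases, the square-root coefficient-count rule, and `eta` are assumptions for illustration.

```python
def combine_bases(bases, coeffs):
    """Form an adaptation transform as a weighted sum of basis matrices:
    W = sum_i coeffs[i] * bases[i]."""
    rows, cols = len(bases[0]), len(bases[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for c, B in zip(coeffs, bases):
        for r in range(rows):
            for k in range(cols):
                W[r][k] += c * B[r][k]
    return W

def num_coeffs_for(utterance_frames, eta=1.0, max_bases=2):
    """Use fewer basis coefficients for shorter utterances, so very
    short test data cannot overfit the transform."""
    return min(max_bases, max(1, int(eta * utterance_frames ** 0.5)))

bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
n = num_coeffs_for(utterance_frames=4)   # short utterance -> few coefficients
W = combine_bases(bases[:n], [0.5, 0.25][:n])
```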
  • Patent number: 8694313
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information. A method includes receiving an audio signal, generating an affinity score based on a frequency with which a user has previously communicated with a contact associated with an item of contact information, and further based on a recency of one or more past interactions between the user and the contact associated with the item of contact information, inferring a probability that the user intends to initiate a communication using the item of contact information based on the affinity score generated for the item of contact information, and generating a communication initiation grammar.
    Type: Grant
    Filed: May 19, 2010
    Date of Patent: April 8, 2014
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
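A minimal sketch of an affinity score blending contact frequency and recency, as the abstract describes; the exponential decay, half-life, and weighting are assumed forms, not the patent's actual formula.

```python
import math

def affinity_score(call_count, last_contact_ts, now_ts,
                   half_life_days=30.0, freq_weight=0.6):
    """Blend how often and how recently the user contacted this entry.
    Recency decays exponentially with a configurable half-life."""
    days_since = (now_ts - last_contact_ts) / 86400.0
    recency = 0.5 ** (days_since / half_life_days)
    frequency = 1.0 - math.exp(-call_count / 5.0)  # saturating count
    return freq_weight * frequency + (1.0 - freq_weight) * recency

now = 1_700_000_000
recent_friend = affinity_score(call_count=20, last_contact_ts=now - 86400, now_ts=now)
old_contact = affinity_score(call_count=2, last_contact_ts=now - 90 * 86400, now_ts=now)
```

The score would then set the prior probability of each contact's entry in the communication-initiation grammar.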
  • Patent number: 8694319
    Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
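The flow above, pick a prosody setting for the synthesized data, then choose which section to render based on context, can be sketched as follows; the named settings, the `prosody:` directive, and the "driving" context rule are all hypothetical.

```python
PROSODY_SETTINGS = {
    # hypothetical named settings: rate multiplier and pitch shift (semitones)
    "urgent":  {"rate": 1.3, "pitch_shift": 2},
    "relaxed": {"rate": 0.9, "pitch_shift": -1},
    "default": {"rate": 1.0, "pitch_shift": 0},
}

def pick_setting(markup):
    """Read a per-document prosody hint, assumed here to be embedded in
    the synthesized data as a 'prosody:' directive line."""
    for line in markup.splitlines():
        if line.startswith("prosody:"):
            name = line.split(":", 1)[1].strip()
            if name in PROSODY_SETTINGS:
                return PROSODY_SETTINGS[name]
    return PROSODY_SETTINGS["default"]

def section_to_render(markup, context):
    """In a hands-busy context, render only the first (summary) line."""
    body = [l for l in markup.splitlines() if not l.startswith("prosody:")]
    return body[:1] if context == "driving" else body

doc = "prosody: urgent\nTraffic ahead on I-95.\nFull report follows."
setting = pick_setting(doc)
section = section_to_render(doc, context="driving")
```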
  • Patent number: 8694312
    Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
    Type: Grant
    Filed: February 22, 2013
    Date of Patent: April 8, 2014
    Assignee: MModal IP LLC
    Inventors: Lambert Mathias, Girija Yegnanarayanan, Juergen Fritsch
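The transcript-revision idea, expanding written forms into candidate spoken forms and letting evidence from the audio pick one, can be sketched like this. The spoken-forms table is hypothetical, and the `chooser` callback stands in for forced alignment against the audio.

```python
SPOKEN_FORMS = {
    # hypothetical written-form -> candidate spoken forms table
    "2007": ["two thousand seven", "two thousand and seven", "twenty oh seven"],
    "Dr.":  ["doctor"],
}

def candidate_transcripts(non_literal_tokens):
    """Expand each token with multiple spoken forms into its candidates;
    a recognizer pass over the audio would pick the form actually said."""
    return [SPOKEN_FORMS.get(tok, [tok]) for tok in non_literal_tokens]

def revise(expanded, chooser):
    """Build the revised transcript by letting `chooser` select one
    spoken form per slot."""
    return " ".join(chooser(forms) for forms in expanded)

tokens = ["seen", "in", "2007", "by", "Dr.", "Smith"]
revised = revise(candidate_transcripts(tokens), chooser=lambda forms: forms[0])
```

The revised transcript, now closer to what was actually spoken, is what gets fed to discriminative acoustic-model training.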
  • Patent number: 8688450
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information are described. A method includes determining, for each of multiple communications that were initiated by a user of a mobile device, a time when the communication was initiated or received; determining, for each of multiple contacts associated with the user, a probability associated with the contact based at least on the times when the communications were initiated or received; weighting a contact disambiguation grammar according to the probabilities; and processing audio data using the contact disambiguation grammar to select a particular contact.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: April 1, 2014
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
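The time-based weighting can be sketched as below: estimate per-contact probabilities from calls placed near the current hour of day, then attach those as grammar weights. The hour window, fallback rule, and floor weight are assumptions.

```python
from collections import defaultdict

def contact_probabilities(call_log, query_hour, tolerance=2):
    """Estimate P(contact) from how often each contact was called near
    this hour of day; fall back to overall counts when nothing is near."""
    near = defaultdict(int)
    for contact, hour in call_log:
        if min(abs(hour - query_hour), 24 - abs(hour - query_hour)) <= tolerance:
            near[contact] += 1
    total = sum(near.values())
    if total == 0:
        for contact, _ in call_log:
            near[contact] += 1
        total = sum(near.values())
    return {c: n / total for c, n in near.items()}

def weight_grammar(grammar, probs, floor=0.01):
    """Attach a recognition weight to each contact entry in the grammar."""
    return {name: probs.get(name, floor) for name in grammar}

log = [("mom", 19), ("mom", 20), ("boss", 9), ("boss", 10), ("mom", 21)]
weights = weight_grammar(["mom", "boss"], contact_probabilities(log, query_hour=20))
```

At 8 p.m. the recognizer is thus biased toward "mom", so an ambiguous utterance resolves to the contact the user actually tends to call at that time.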
  • Patent number: 8688454
    Abstract: The present invention relates to a method and apparatus for adapting a language model in response to error correction. One embodiment of a method for processing an input signal including human language includes receiving the input signal and applying a statistical language model combined with a separate, corrective language model to the input signal in order to produce a processing result.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: April 1, 2014
    Assignee: SRI International
    Inventor: Jing Zheng
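One way to combine a base statistical language model with a separate corrective model, as the abstract describes, is simple interpolation; the toy bigram tables and the interpolation weight `alpha` are assumptions, not SRI's actual combination rule.

```python
def combined_score(base_lm, corrective_lm, ngram, alpha=0.8):
    """Linear interpolation: the corrective model nudges probabilities of
    n-grams the user has previously corrected, without retraining the base."""
    p_base = base_lm.get(ngram, 1e-6)
    p_corr = corrective_lm.get(ngram, p_base)  # no correction evidence -> neutral
    return alpha * p_base + (1.0 - alpha) * p_corr

base = {("recognize", "speech"): 0.020, ("wreck", "a"): 0.015}
corrective = {("recognize", "speech"): 0.300}   # user corrected this phrase

boosted = combined_score(base, corrective, ("recognize", "speech"))
neutral = combined_score(base, corrective, ("wreck", "a"))
```

Keeping the corrective model separate means error corrections take effect immediately and can be discarded or per-user scoped without touching the base model.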
  • Patent number: 8688453
    Abstract: According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sachindra Joshi, Shantanu Godbole
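The extractor-to-analyzer pipeline can be sketched as follows: word groups from the parse are mapped to candidate intent values, and one candidate is selected. The rule table and the tightest-group selection heuristic are hypothetical stand-ins for the patent's extraction rules and confidence-based selection.

```python
EXTRACTION_RULES = [
    # hypothetical (verb, object) pairs -> candidate intent value
    (("pay", "bill"), "PAY_BILL"),
    (("check", "balance"), "CHECK_BALANCE"),
    (("cancel", "card"), "CANCEL_CARD"),
]

def candidate_intents(word_groups):
    """Map each extracted word group to candidate intent values."""
    candidates = []
    for group in word_groups:
        for (verb, obj), intent in EXTRACTION_RULES:
            if verb in group and obj in group:
                candidates.append((intent, len(group)))
    return candidates

def select_intent(word_groups):
    """Pick the candidate backed by the smallest (tightest) word group."""
    candidates = candidate_intents(word_groups)
    return min(candidates, key=lambda c: c[1])[0] if candidates else None

groups = [{"i", "want", "to", "pay", "my", "bill"}, {"pay", "bill"}]
intent = select_intent(groups)
```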
  • Publication number: 20140088964
    Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventor: Jerome Bellegarda
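The exemplar-selection step, finding training data closest to the input segment before building a focused model, can be sketched with a nearest-neighbor search; the fixed-length segment vectors and cosine metric are assumptions about the representation.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def focused_training_set(segment_vec, corpus, k=2):
    """Pick the k exemplars closest to the input segment; a compact,
    observation-specific model is then trained on just these."""
    ranked = sorted(corpus, key=lambda ex: cosine(segment_vec, ex[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

corpus = [
    ("hello-f1", [0.9, 0.1, 0.0]),
    ("hello-m1", [0.8, 0.2, 0.1]),
    ("goodbye",  [0.0, 0.1, 0.9]),
]
exemplars = focused_training_set([0.85, 0.15, 0.05], corpus, k=2)
```

A model trained only on the retrieved "hello" exemplars is far smaller and sharper than the global model, which is the point of the observation-specific approach.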
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
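The runtime side of this idea, looking up the classifier trained for the current dialog state and falling back to a state-independent one, can be sketched as below; the state-key scheme and lambda classifiers are simplifications for illustration.

```python
def state_key(system_state, variables):
    """Reduce a full system-state vector to the subset of variables the
    classifier bank was trained on."""
    return tuple(sorted((v, system_state[v]) for v in variables))

class ClassifierBank:
    """Maps a quantized system-state key to the classifier tuned for it,
    falling back to a state-independent default."""
    def __init__(self, default):
        self.default = default
        self.by_state = {}

    def register(self, key, classifier):
        self.by_state[key] = classifier

    def classify(self, system_state, variables, utterance):
        clf = self.by_state.get(state_key(system_state, variables), self.default)
        return clf(utterance)

bank = ClassifierBank(default=lambda u: "unknown")
bank.register(state_key({"prompt": "yes_no"}, ["prompt"]),
              lambda u: "yes" if "yes" in u else "no")

answer = bank.classify({"prompt": "yes_no", "turn": 3}, ["prompt"], "yes please")
```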
  • Patent number: 8682674
    Abstract: Provided are methods and systems that extract facts from unstructured documents and build an oracle for various domains. The present invention addresses the problem of efficiently finding and extracting facts about a particular subject domain from semi-structured and unstructured documents, makes inferences of new facts from the extracted facts, and provides ways of verifying the facts, thus becoming a source of knowledge about the domain that can be effectively queried. The methods and systems can also extract temporal information from unstructured and semi-structured documents, and can find and extract dynamically generated documents from the Deep or Dynamic Web.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: March 25, 2014
    Assignee: Glenbrook Networks
    Inventors: Julia Komissarchik, Edward Komissarchik
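The extract-then-infer loop can be sketched with a single pattern; the regex, the `founded_in` predicate, and the "historic" inference rule are hypothetical, and a real system would use many patterns plus verification.

```python
import re

FACT_PATTERN = re.compile(
    r"(?P<subject>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was founded in (?P<year>\d{4})"
)

def extract_facts(text):
    """Pull (subject, predicate, value) triples out of unstructured text."""
    return [(m.group("subject"), "founded_in", int(m.group("year")))
            for m in FACT_PATTERN.finditer(text)]

def infer(facts):
    """Derive new facts from extracted ones: here, a company founded
    before 1900 is tagged 'historic' (an illustrative inference rule)."""
    derived = [(subj, "is", "historic")
               for subj, pred, year in facts
               if pred == "founded_in" and year < 1900]
    return facts + derived

text = "Acme Corp was founded in 1887. Beta Labs was founded in 2005."
knowledge = infer(extract_facts(text))
```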
  • Patent number: 8682666
    Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 25, 2014
    Assignee: Voicesense Ltd.
    Inventors: Yoav Degani, Yishai Zamir
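The "secondary derivatives of pitch and amplitude" idea can be sketched as delta and delta-delta statistics over a contour; the specific statistics (mean and standard deviation) and the toy contours are illustrative, not Voicesense's actual parameter set.

```python
def deltas(series):
    """First derivative: frame-to-frame change of a prosodic contour."""
    return [b - a for a, b in zip(series, series[1:])]

def prosodic_summary(pitch):
    """Secondary-derivative statistics over a pitch contour: mean and
    spread of the delta and delta-delta tracks."""
    d1 = deltas(pitch)
    d2 = deltas(d1)
    def stats(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, var ** 0.5
    return {"delta": stats(d1), "delta2": stats(d2)}

flat_speaker = prosodic_summary([120, 121, 120, 121, 120, 121])
animated_speaker = prosodic_summary([110, 150, 100, 170, 95, 180])
```

Comparing such summaries against reference data for known behavioral styles is what lets the classifier place the current speaker.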
  • Patent number: 8682665
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: March 25, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
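The grammar-derivation step can be sketched as below: generate selection identifiers from the first input, intersect them with the reference database, and keep only the matched elements. The digit-confusion table used to generate variants is a hypothetical rule (e.g. "five"/"nine" misrecognition), not the patent's actual generation method.

```python
def selection_identifiers(first_input):
    """Generate plausible variants of the user's first identifier, e.g.
    common digit confusions from a spoken account number (assumed rule)."""
    confusable = {"5": "9", "9": "5"}   # hypothetical confusion pairs
    variants = {first_input}
    for i, ch in enumerate(first_input):
        if ch in confusable:
            variants.add(first_input[:i] + confusable[ch] + first_input[i + 1:])
    return variants

def build_dynamic_grammar(first_input, reference_db):
    """Keep only the data elements whose reference identifier matches one
    of the generated selection identifiers."""
    matched = selection_identifiers(first_input) & set(reference_db)
    return {ref: reference_db[ref] for ref in matched}

db = {"1592": "Alice Smith", "1552": "Bob Jones", "7777": "Carol Diaz"}
grammar = build_dynamic_grammar("1552", db)
```

The second user input is then recognized against this small dynamic grammar instead of the full database, which is the source of the accuracy gain.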
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
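The cost-adjustment step can be sketched as below, using Levenshtein distance as a crude stand-in for the acoustic similarity measure; the threshold and penalty values are arbitrary assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance, a rough proxy for acoustic similarity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def penalize_lm_words(lm_costs, grammar_words, threshold=2, penalty=1.5):
    """Raise the transition cost of any language-model word that sounds
    too close to a rule-based grammar word, biasing recognition toward
    the grammar in ambiguous cases."""
    adjusted = dict(lm_costs)
    for word in adjusted:
        if any(edit_distance(word, g) <= threshold for g in grammar_words):
            adjusted[word] += penalty
    return adjusted

costs = penalize_lm_words({"main": 1.0, "holiday": 1.2}, ["menu", "mane"])
```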
  • Patent number: 8676578
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information for the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and, when it is, the degree of highlighting, determined in accordance with the pronunciation recognition frequency of a second word; the determination is based on whether the storage information includes a second set corresponding to a first set and, when it does, on the pronunciation recognition frequency of the second word. The generation unit is configured to generate an emphasized character string based on the emphasis determination information when the first word is highlighted.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: March 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
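The frequency-driven emphasis idea can be sketched as below: words recognized rarely so far get stronger highlighting. The thresholds and the asterisk markers are illustrative assumptions.

```python
def emphasis_level(word, recognition_counts, thresholds=(1, 5)):
    """A word heard rarely in the meeting so far gets stronger emphasis;
    frequently recognized words are assumed familiar and left plain."""
    count = recognition_counts.get(word, 0)
    low, high = thresholds
    if count <= low:
        return 2   # strong highlight
    if count <= high:
        return 1   # mild highlight
    return 0       # no highlight

def emphasize(text, recognition_counts):
    """Wrap words in markers proportional to their emphasis level."""
    marks = {2: "**", 1: "*", 0: ""}
    out = []
    for word in text.split():
        m = marks[emphasis_level(word, recognition_counts)]
        out.append(f"{m}{word}{m}")
    return " ".join(out)

counts = {"the": 40, "schedule": 3, "tentative": 0}
line = emphasize("the tentative schedule", counts)
```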
  • Patent number: 8676579
    Abstract: A method of authenticating a user of a mobile device having a first microphone and a second microphone, the method comprising receiving voice input from the user at the first and second microphones, determining a position of the user relative to the mobile device based on the voice input received by the first and second microphones, and authenticating the user based on the position of the user.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: March 18, 2014
    Assignee: BlackBerry Limited
    Inventor: James Allen Hymel
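The position-from-two-microphones step can be sketched with a cross-correlation delay estimate: the microphone closer to the talker receives the signal first, and the sign of the best lag indicates the side. The toy signals and lag search are illustrative; a real device would work on sampled audio frames.

```python
def best_lag(sig_a, sig_b, max_lag=3):
    """Lag (in samples) at which sig_b best aligns with sig_a; the sign
    says which microphone the talker is closer to."""
    def corr(lag):
        total = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                total += a * sig_b[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)

def side_of_device(mic1, mic2):
    """Crude position estimate from the sign of the inter-mic delay."""
    lag = best_lag(mic1, mic2)
    if lag > 0:
        return "closer to mic 1"
    if lag < 0:
        return "closer to mic 2"
    return "centered"

pulse = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # arrives 2 samples later
position = side_of_device(pulse, delayed)
```

Authentication then checks whether the estimated position matches the position expected for the enrolled user (e.g. the phone held to the usual ear).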
  • Patent number: 8676574
    Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: March 18, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ozlem Kalinli
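The feature-map-to-cumulative-gist pipeline can be sketched as below. The moving-average "multi-scale feature" stands in for the patent's spectro-temporal receptive filters, and chunk-averaging stands in for its gist pooling; both are simplifying assumptions.

```python
def feature_map(spectrum, scale):
    """Toy multi-scale feature: smooth the auditory spectrum by averaging
    over non-overlapping windows of the given scale."""
    return [sum(spectrum[i:i + scale]) / scale
            for i in range(0, len(spectrum) - scale + 1, scale)]

def gist_vector(fm, n_chunks=2):
    """Pool a feature map into a short gist vector by chunk-averaging."""
    chunk = max(1, len(fm) // n_chunks)
    return [sum(fm[i:i + chunk]) / chunk for i in range(0, len(fm), chunk)]

def cumulative_gist(spectrum, scales=(1, 2, 4)):
    """Concatenate (augment) the gist vectors of every scale's map."""
    out = []
    for s in scales:
        out.extend(gist_vector(feature_map(spectrum, s)))
    return out

spectrum = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4]
g = cumulative_gist(spectrum)
```

The fixed-length cumulative gist vector `g` is what a trained classifier would map to tonal characteristics such as lexical tone or intonation class.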