Creating Patterns For Matching Patents (Class 704/243)
  • Publication number: 20140214420
    Abstract: Personalization for Automatic Speech Recognition (ASR) is associated with a particular device. A generalized i-vector clustering method is used to train i-vector parameters on utterances received from a device and to classify test utterances from the same device. A sub-loading matrix and a residual noise term may be used when determining the personalization. A Universal Background Model (UBM) is trained using the utterances. The UBM is applied to obtain i-vectors of training utterances received from a device and a Gaussian Mixture Model (GMM) is trained using the i-vectors. During testing, the i-vector for each utterance received from the device is estimated using the device's UBM. The utterance is then assigned to the cluster with the closest centroid in the GMM. For each utterance, i-vector and residual-noise estimation are performed. Hyperparameter estimation is also performed. The i-vector estimation and hyperparameter estimation are repeated until convergence.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Kaisheng Yao, Yifan Gong
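The clustering step in the abstract above (assign each test utterance's i-vector to the GMM cluster with the closest centroid) can be sketched as a nearest-centroid lookup; the function and argument names below are illustrative, not from the patent:

```python
import math

def assign_to_cluster(ivector, centroids):
    """Return the index of the cluster whose centroid is nearest to the
    utterance's i-vector in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda k: dist(ivector, centroids[k]))
```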
  • Publication number: 20140214421
    Abstract: Prosodic features are used for discriminating computer-directed speech from human-directed speech. Statistics and models describing energy/intensity patterns over time, speech/pause distributions, pitch patterns, vocal effort features, and speech segment duration patterns may be used for prosodic modeling. The prosodic features for at least a portion of an utterance are monitored over a period of time to determine a shape associated with the utterance. A score may be determined to assist in classifying the current utterance as human directed or computer directed without relying on knowledge of preceding utterances or utterances following the current utterance. Outside data may be used for training lexical addressee detection systems for the H-H-C scenario. H-C training data can be obtained from a single-user H-C collection, and H-H speech can be modeled using general conversational speech. H-C and H-H language models may also be adapted using interpolation with small amounts of matched H-H-C data.
    Type: Application
    Filed: January 31, 2013
    Publication date: July 31, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tur, Larry Heck, Heeyoung Lee
  • Publication number: 20140207458
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Application
    Filed: March 24, 2014
    Publication date: July 24, 2014
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
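The dynamic-grammar construction described above can be sketched as a set intersection: keep only the data elements whose reference identifier matches one of the selection identifiers derived from the first user input. All names here are illustrative, not from the patent:

```python
def build_dynamic_grammar(selection_ids, reference_ids, data_elements):
    """Build a dynamic grammar containing only the data elements whose
    reference identifier appears among the selection identifiers.
    data_elements maps reference identifier -> data element."""
    matched = set(selection_ids) & set(reference_ids)
    return {ref: data_elements[ref] for ref in matched if ref in data_elements}
```

The second user-provided identifier would then be matched against this much smaller grammar rather than the full set of reference identifiers.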
  • Publication number: 20140207457
    Abstract: A system and method are presented for using spoken word verification to reduce false alarms by exploiting global and local contexts on a lexical level, a phoneme level, and on an acoustical level. The reduction of false alarms may occur through a process that determines whether a word has been detected or if it is a false alarm. Training examples are used to generate models of internal and external contexts which are compared to test word examples. The word may be accepted or rejected based on comparison results. Comparison may be performed either at the end of the process or at multiple steps of the process to determine whether the word is rejected.
    Type: Application
    Filed: January 22, 2013
    Publication date: July 24, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Konstantin Biatov, Aravind Ganapathiraju, Felix Immanuel Wyss
  • Patent number: 8788256
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: July 22, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Publication number: 20140200891
    Abstract: Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, for example to prioritize topics whose handling by the conversational agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 17, 2014
    Inventors: Jean-Marie Henri Daniel Larcheveque, Elizabeth Ireland Powers, Freya Kate Recksiek, Dan Teodosiu
  • Patent number: 8781831
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: September 5, 2013
    Date of Patent: July 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
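The model-selection fallback in the abstract above (user-specific supervised model, else unsupervised model, else generic model) reduces to a simple preference chain. In this sketch `None` stands for "unavailable", which is an assumption of the illustration, not a detail from the patent:

```python
def select_speech_model(supervised, unsupervised, generic):
    """Return the first available model, preferring the user-specific
    supervised model, then the unsupervised model, then the generic one."""
    for model in (supervised, unsupervised, generic):
        if model is not None:
            return model
    raise ValueError("no speech model available")
```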
  • Patent number: 8781830
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: July 15, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, Michael J. Burkhart, Daniel G. Eisenhauer, Thomas J. Watson, Daniel M. Schumacher
  • Patent number: 8781821
    Abstract: A method is disclosed for controlling a voice-activated device by interpreting a spoken command as a series of voiced and non-voiced intervals. A responsive action is then performed according to the number of voiced intervals in the command. The method is well-suited to applications having a small number of specific voice-activated response functions. Applications using the inventive method offer numerous advantages over traditional speech recognition systems including speaker universality, language independence, no training or calibration needed, implementation with simple microcontrollers, and extremely low cost. For time-critical applications such as pulsers and measurement devices, where fast reaction is crucial to catch a transient event, the method provides near-instantaneous command response, yet versatile voice control.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: July 15, 2014
    Assignee: Zanavox
    Inventor: David Edward Newman
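The core of the method above, interpreting a command as a count of voiced intervals, can be sketched as run-length counting over per-frame energies. The energy threshold value is an illustrative assumption:

```python
def count_voiced_intervals(frames, threshold=0.5):
    """Count contiguous runs of frames whose energy exceeds the threshold.
    Each run is one voiced interval; the device would then dispatch the
    responsive action keyed on this count alone."""
    count = 0
    in_voiced = False
    for energy in frames:
        if energy > threshold and not in_voiced:
            count += 1
            in_voiced = True
        elif energy <= threshold:
            in_voiced = False
    return count
```

Because only the count matters, no acoustic model, training, or language-specific resources are needed, which is consistent with the low-cost microcontroller implementation the abstract claims.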
  • Patent number: 8775176
    Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: July 8, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Stephan Kanthak
  • Patent number: 8775177
    Abstract: A speech recognition process may perform the following operations: performing a preliminary recognition process on first audio to identify candidates for the first audio; generating first templates corresponding to the first audio, where each first template includes a number of elements; selecting second templates corresponding to the candidates, where the second templates represent second audio, and where each second template includes elements that correspond to the elements in the first templates; comparing the first templates to the second templates, where the comparing includes computing similarity metrics between the first templates and corresponding second templates; applying weights to the similarity metrics to produce weighted similarity metrics, where the weights are associated with corresponding second templates; and using the weighted similarity metrics to determine whether the first audio corresponds to the second audio.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: July 8, 2014
    Assignee: Google Inc.
    Inventors: Georg Heigold, Patrick An Phu Nguyen, Mitchel Weintraub, Vincent O. Vanhoucke
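The compare-and-weight step in the abstract above can be sketched as follows; the similarity metric (inverse squared distance) and all names are illustrative stand-ins, not the metric the patent actually uses:

```python
def best_candidate(first_template, second_templates, weights):
    """Score each candidate template against the first-audio template,
    scale each similarity by the candidate's weight, and return the
    index of the highest weighted score."""
    def similarity(a, b):
        # Inverse squared distance: 1.0 for identical templates.
        return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))
    scores = [w * similarity(first_template, t)
              for t, w in zip(second_templates, weights)]
    return max(range(len(scores)), key=lambda i: scores[i])
```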
  • Patent number: 8768697
    Abstract: In some embodiments, a method includes measuring a disparity between two speech samples by segmenting both a reference speech sample and a student speech sample into speech units. A duration disparity can be determined for units that are not adjacent to each other in the reference speech sample. A duration disparity can also be determined for the corresponding units in the student speech sample. A difference can then be calculated between the student speech sample duration disparity and the reference speech sample duration disparity.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: July 1, 2014
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
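The duration-disparity comparison described above can be sketched directly: compute the disparity between two (possibly non-adjacent) units in the reference sample, the disparity between the corresponding units in the student sample, and take the difference. Names and the use of millisecond durations are illustrative:

```python
def duration_disparity_difference(ref_durations, student_durations, i, j):
    """Difference between the student's and the reference speaker's
    duration disparity for speech units i and j, which need not be
    adjacent. Durations are per-unit lengths (e.g. in milliseconds)."""
    ref_disparity = ref_durations[i] - ref_durations[j]
    student_disparity = student_durations[i] - student_durations[j]
    return student_disparity - ref_disparity
```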
  • Patent number: 8768703
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises detecting a text portion of a media stream including a video stream, the text portion not being consumable by a blind person, retrieving text associated with the text portion of the media stream, and converting the text to a first audio stream based on a first type of a first program in the media stream, and converting the text to a second audio stream based on a second type of a second program in the media stream.
    Type: Grant
    Filed: July 19, 2012
    Date of Patent: July 1, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Hisao M. Chang, Horst Schroeter
  • Patent number: 8756062
    Abstract: A method of generating proxy acoustic models for use in automatic speech recognition includes training acoustic models from speech received via microphone from male speakers of a first language, and adapting the acoustic models in response to language-independent speech data from female speakers of a second language, to generate proxy acoustic models for use during runtime of speech recognition of an utterance from a female speaker of the first language.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: June 17, 2014
    Assignee: General Motors LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8751226
    Abstract: A speech processing apparatus 101 includes a recognition feature extracting unit 12 that extracts recognition feature information which is a characteristic of a speech recognition result 15 obtained by performing a speech recognition process on an inputted speech from the speech recognition result 15; a language feature extracting unit 11 that extracts language feature information which is a characteristic of a pre-registered language resource 14 from the language resource 14; and a model learning unit 13 that obtains a verification model 16 by a learning process based on the extracted recognition feature information and language feature information.
    Type: Grant
    Filed: June 18, 2007
    Date of Patent: June 10, 2014
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Patent number: 8751230
    Abstract: A method and a device (1) for automatically generating vocabulary entry from input acoustic data (3), comprising a vocabulary entry type-specific acoustic phonetic transcription module (2; T) and a classifier module (6; 6′) for the classification of vocabulary entry types on the basis of the phonetic structure, wherein the classification of vocabulary entries is carried out in accordance with a number of predetermined types; and vocabulary entry type-specific phoneme-to-grapheme conversion means (28), to derive the respective vocabulary entries comprising a pair of a phonetic transcription and its grapheme form.
    Type: Grant
    Filed: June 17, 2009
    Date of Patent: June 10, 2014
    Assignee: Koninklijke Philips N.V.
    Inventor: Zsolt Saffer
  • Patent number: 8751231
    Abstract: Methods and systems for model-driven candidate sorting based on audio cues for evaluating digital interviews are described. In one embodiment, an audio cue generator identifies utterances in audio data of a digital interview. The utterances each include a group of one or more words spoken by a candidate in the digital interview. The audio cue generator generates audio cues of the digital interview based on the identified utterances. The audio cues are applied to a prediction model to predict an achievement index for the candidate. The candidate is displayed in a list of candidates based on the achievement index. The list of candidates is sorted according to the candidates' achievement indices.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: June 10, 2014
    Assignee: Hirevue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Publication number: 20140156273
    Abstract: A system and a method perform information recognition. The method arranges data base information in a data base information structure. The method matches input information to the data base information using at least one matching algorithm and using a matching information structure. In accordance with the system and the method, the matching information structure differs from the data base information structure.
    Type: Application
    Filed: June 8, 2013
    Publication date: June 5, 2014
    Inventors: Walter Steven Rosenbaum, Joern Bach
  • Patent number: 8744853
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Masafumi Nishimura, Ryuki Tachibana
  • Patent number: 8744849
    Abstract: A microphone-array-based speech recognition system combines a noise cancelling technique for cancelling noise of input speech signals from an array of microphones, according to at least an inputted threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least a speech model and at least a filler model, then computes a confidence measure score with the at least a speech model and the at least a filler model for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling for achieving a maximum confidence measure score, thereby outputting a speech recognition result related to the maximum confidence measure score.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: June 3, 2014
    Assignee: Industrial Technology Research Institute
    Inventor: Hsien-Cheng Liao
  • Patent number: 8744850
    Abstract: Challenge items for an audible based electronic challenge system are generated using a variety of techniques to identify optimal candidates. The challenge items are intended for use in a computing system that discriminates between humans and text-to-speech (TTS) systems.
    Type: Grant
    Filed: January 14, 2013
    Date of Patent: June 3, 2014
    Assignee: John Nicholas and Kristin Gross
    Inventor: John Nicholas Gross
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
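The sparseness idea in the abstract above, storing only the small fraction of acoustic parameters that the MAP adaptation actually moved, can be sketched as a thresholded delta. The threshold rule here is an illustrative simplification of the patent's sparseness constraint, not its actual formulation:

```python
def sparse_adaptation(baseline, adapted, threshold=1e-3):
    """Keep only parameters whose adapted value moved more than `threshold`
    from the baseline, stored as {index: new_value}."""
    return {i: v for i, (b, v) in enumerate(zip(baseline, adapted))
            if abs(v - b) > threshold}

def apply_adaptation(baseline, delta):
    """Reconstruct the user-specific model from the baseline plus the
    sparse delta."""
    return [delta.get(i, b) for i, b in enumerate(baseline)]
```

Storing the `{index: value}` delta instead of a full copy of the model is what yields the orders-of-magnitude storage reduction the abstract describes.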
  • Publication number: 20140142942
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Application
    Filed: January 23, 2014
    Publication date: May 22, 2014
    Applicant: Accumente, LLC
    Inventors: Jike CHONG, Ian Richard LANE, Senaka Wimal BUTHPITIYA
  • Patent number: 8731926
    Abstract: In a spoken term detection apparatus, processing performed by a processor includes: a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part; a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part; a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword; and a retrieval process retrieving speech data including the keyword from the speech data accumulated in the accumulation part, based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: May 20, 2014
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shouji Harada
  • Patent number: 8731928
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
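The vocabulary-reduction step described above can be sketched as filtering each word's pronunciation variants down to the styles attributed to the speaker. The data layout and all names are illustrative:

```python
def reduce_vocabulary(vocabulary, speaker_styles):
    """Keep, for each word, only the pronunciation variants tagged with a
    style attributed to this speaker. vocabulary maps
    word -> [(style, phoneme_string), ...]."""
    styles = set(speaker_styles)
    return {word: [p for s, p in variants if s in styles]
            for word, variants in vocabulary.items()}
```

A smaller per-speaker vocabulary means fewer competing pronunciations at decode time, which is the source of the accuracy and speed gains the abstract claims.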
  • Patent number: 8731923
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: May 20, 2014
    Assignee: Adacel Systems, Inc.
    Inventor: Chang-Qing Shu
  • Patent number: 8725829
    Abstract: A method and system are described which allow users to identify (pre-recorded) sounds such as music, radio broadcasts, commercials, and other audio signals in almost any environment. The audio signal (or sound) must be a recording represented in a database of recordings. The service can quickly identify the signal from just a few seconds of excerpted audio, while tolerating high noise and distortion. Once the signal is identified to the user, the user may perform transactions interactively in real-time or offline using the identification information.
    Type: Grant
    Filed: April 26, 2004
    Date of Patent: May 13, 2014
    Assignee: Shazam Investments Limited
    Inventors: Avery Li-Chun Wang, Christopher Jacques Penrose Barton, Dheeraj Shankar Mukherjee, Philip Inghelbrecht
  • Publication number: 20140129222
    Abstract: When it is determined that sound data is unrecognizable through a speech recognition process by a first speech recognition unit (3), the same sound data as the sound data inputted to the first speech recognition unit (3) is transmitted to a second server device (60) and a first server device (70). Recognition data is generated which is formed of a character string that is a speech recognition result by the second server device (60) with respect to the sound data, and an acoustic model identifier series generated by a first acoustic model identifier series generation unit (27) of the first server (70) based on the sound data, and the generated recognition data is registered in a first recognition dictionary (3b) of the first speech recognition unit (3).
    Type: Application
    Filed: August 9, 2012
    Publication date: May 8, 2014
    Applicant: ASAHI KASEI KABUSHIKI KAISHA
    Inventor: Akihiro Okamoto
  • Patent number: 8719017
    Abstract: Speech recognition models are dynamically re-configurable based on user information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. The techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: May 6, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C Rose, Bojana Gajic
  • Publication number: 20140118472
    Abstract: In one embodiment, a method includes receiving requests to join a conference from a plurality of user devices proximate a first endpoint. The requests include a username. The method also includes receiving an audio signal for the conference from the first endpoint. The first endpoint is operable to capture audio proximate the first endpoint. The method also includes transmitting the audio signal to a second endpoint, remote from the first endpoint. The method also includes identifying, by a processor, an active speaker proximate the first endpoint based on information received from the plurality of user devices.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 1, 2014
    Inventors: Yanghua Liu, Weidong Chen, Biren Gandhi, Raghurama Bhat, Joseph Fouad Khouri, John Joseph Houston, Brian Thomas Toombs
  • Patent number: 8712756
    Abstract: A character input device is disclosed. The device includes a character input section including a plurality of character keys; a display section that displays an input character(s); and a next word prediction section that predicts a respective word being subsequently input in an event of input-word reception in the character input section and that displays the word as a next word candidate on the display section. The next word prediction section stores usage history information indicative of whether the next word candidate for the respective input-received word was used by a user, and determines in accordance with the usage history information of words as of a time point of the event of input-word reception of the word whether to display the next word candidate on the display section, and inhibits the display of a next word candidate when a value obtained by adding a constant to the number of used times of the candidate is smaller than the number of its unused times.
    Type: Grant
    Filed: April 29, 2008
    Date of Patent: April 29, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventor: Sun Xiaoning
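The inhibition rule stated at the end of the abstract above (hide a candidate when its used count plus a constant is smaller than its unused count) is concrete enough to sketch; the constant's value below is an assumption:

```python
def show_candidate(used, unused, constant=2):
    """Display the next-word candidate unless used + constant < unused,
    i.e. suppress candidates the user has mostly ignored."""
    return used + constant >= unused
```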
  • Patent number: 8706487
    Abstract: Acoustic models and language models are learned according to a speaking length which indicates a length of a speaking section in speech data, and speech recognition process is implemented by using the learned acoustic models and language models. A speech recognition apparatus includes means (103) for detecting a speaking section in speech data (101) and for generating a section information which indicates the detected speaking section, means (104) for recognizing a data part corresponding to a section information in the speech data as well as text data (102) written from the speech data and for classifying the data part based on a speaking length thereof, and means (106) for learning acoustic models and language models (107) by using the classified data part (105).
    Type: Grant
    Filed: December 7, 2007
    Date of Patent: April 22, 2014
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Yoshifumi Onishi
  • Patent number: 8706499
    Abstract: Client devices periodically capture ambient audio waveforms, generate waveform fingerprints, and upload the fingerprints to a server for analysis. The server compares the waveforms to a database of stored waveform fingerprints, and upon finding a match, pushes content or other information to the client device. The fingerprints in the database may be uploaded by other users, and compared to the received client waveform fingerprint based on common location or other social factors. Thus a client's location may be enhanced if the location of users whose fingerprints match the client's is known. In particular embodiments, the server may instruct clients whose fingerprints partially match to capture waveform data at a particular time and duration for further analysis and increased match confidence.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: April 22, 2014
    Assignee: Facebook, Inc.
    Inventors: Matthew Nicholas Papakipos, David Harry Garcia
  • Patent number: 8700406
    Abstract: Techniques are disclosed for using the hardware and/or software of the mobile device to obscure speech in the audio data before a context determination is made by a context awareness application using the audio data. In particular, a subset of a continuous audio stream is captured such that speech (words, phrases and sentences) cannot be reliably reconstructed from the gathered audio. The subset is analyzed for audio characteristics, and a determination can be made regarding the ambient environment.
    Type: Grant
    Filed: August 19, 2011
    Date of Patent: April 15, 2014
    Assignee: Qualcomm Incorporated
    Inventors: Leonard H. Grokop, Vidya Narayanan, James W. Dolter, Sanjiv Nanda
  • Patent number: 8700400
    Abstract: Subspace speech adaptation may be utilized for facilitating the recognition of speech containing short utterances. Speech training data may be received in a speech model by a computer. A first matrix may be determined for preconditioning speech statistics based on the speech training data. A second matrix may be determined for representing a basis for the speech to be recognized. A set of basis matrices may then be determined from the first matrix and the second matrix. Speech test data including a short utterance may then be received by the computer. The computer may then apply the set of basis matrices to the speech test data to produce a transcription. The transcription may represent speech recognition of the short utterance.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Daniel Povey, Kaisheng Yao, Yifan Gong
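The core algebraic step, forming the adaptation transform as a weighted sum of basis matrices and using fewer coefficients for shorter utterances, can be sketched as follows; the toy bases, the square-root coefficient-count rule, and `eta` are assumptions for illustration.

```python
def combine_bases(bases, coeffs):
    """Form an adaptation transform as a weighted sum of basis matrices:
    W = sum_i coeffs[i] * bases[i]."""
    rows, cols = len(bases[0]), len(bases[0][0])
    W = [[0.0] * cols for _ in range(rows)]
    for c, B in zip(coeffs, bases):
        for r in range(rows):
            for k in range(cols):
                W[r][k] += c * B[r][k]
    return W

def num_coeffs_for(utterance_frames, eta=1.0, max_bases=2):
    """Use fewer basis coefficients for shorter utterances, so very
    short test data cannot overfit the transform."""
    return min(max_bases, max(1, int(eta * utterance_frames ** 0.5)))

bases = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
n = num_coeffs_for(utterance_frames=4)   # short utterance -> few coefficients
W = combine_bases(bases[:n], [0.5, 0.25][:n])
```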
  • Patent number: 8694313
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information. A method includes receiving an audio signal, generating an affinity score based on a frequency with which a user has previously communicated with a contact associated with an item of contact information, and further based on a recency of one or more past interactions between the user and the contact associated with the item of contact information, inferring a probability that the user intends to initiate a communication using the item of contact information based on the affinity score generated for the item of contact information, and generating a communication initiation grammar.
    Type: Grant
    Filed: May 19, 2010
    Date of Patent: April 8, 2014
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
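A minimal sketch of an affinity score blending contact frequency and recency, as the abstract describes; the exponential decay, half-life, and weighting are assumed forms, not the patent's actual formula.

```python
import math

def affinity_score(call_count, last_contact_ts, now_ts,
                   half_life_days=30.0, freq_weight=0.6):
    """Blend how often and how recently the user contacted this entry.
    Recency decays exponentially with a configurable half-life."""
    days_since = (now_ts - last_contact_ts) / 86400.0
    recency = 0.5 ** (days_since / half_life_days)
    frequency = 1.0 - math.exp(-call_count / 5.0)  # saturating count
    return freq_weight * frequency + (1.0 - freq_weight) * recency

now = 1_700_000_000
recent_friend = affinity_score(call_count=20, last_contact_ts=now - 86400, now_ts=now)
old_contact = affinity_score(call_count=2, last_contact_ts=now - 90 * 86400, now_ts=now)
```

The score would then set the prior probability of each contact's entry in the communication-initiation grammar.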
  • Patent number: 8694319
    Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
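The flow above, pick a prosody setting for the synthesized data, then choose which section to render based on context, can be sketched as follows; the named settings, the `prosody:` directive, and the "driving" context rule are all hypothetical.

```python
PROSODY_SETTINGS = {
    # hypothetical named settings: rate multiplier and pitch shift (semitones)
    "urgent":  {"rate": 1.3, "pitch_shift": 2},
    "relaxed": {"rate": 0.9, "pitch_shift": -1},
    "default": {"rate": 1.0, "pitch_shift": 0},
}

def pick_setting(markup):
    """Read a per-document prosody hint, assumed here to be embedded in
    the synthesized data as a 'prosody:' directive line."""
    for line in markup.splitlines():
        if line.startswith("prosody:"):
            name = line.split(":", 1)[1].strip()
            if name in PROSODY_SETTINGS:
                return PROSODY_SETTINGS[name]
    return PROSODY_SETTINGS["default"]

def section_to_render(markup, context):
    """In a hands-busy context, render only the first (summary) line."""
    body = [l for l in markup.splitlines() if not l.startswith("prosody:")]
    return body[:1] if context == "driving" else body

doc = "prosody: urgent\nTraffic ahead on I-95.\nFull report follows."
setting = pick_setting(doc)
section = section_to_render(doc, context="driving")
```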
  • Patent number: 8694312
    Abstract: A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
    Type: Grant
    Filed: February 22, 2013
    Date of Patent: April 8, 2014
    Assignee: MModal IP LLC
    Inventors: Lambert Mathias, Girija Yegnanarayanan, Juergen Fritsch
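The transcript-revision idea, expanding written forms into candidate spoken forms and letting evidence from the audio pick one, can be sketched like this. The spoken-forms table is hypothetical, and the `chooser` callback stands in for forced alignment against the audio.

```python
SPOKEN_FORMS = {
    # hypothetical written-form -> candidate spoken forms table
    "2007": ["two thousand seven", "two thousand and seven", "twenty oh seven"],
    "Dr.":  ["doctor"],
}

def candidate_transcripts(non_literal_tokens):
    """Expand each token with multiple spoken forms into its candidates;
    a recognizer pass over the audio would pick the form actually said."""
    return [SPOKEN_FORMS.get(tok, [tok]) for tok in non_literal_tokens]

def revise(expanded, chooser):
    """Build the revised transcript by letting `chooser` select one
    spoken form per slot."""
    return " ".join(chooser(forms) for forms in expanded)

tokens = ["seen", "in", "2007", "by", "Dr.", "Smith"]
revised = revise(candidate_transcripts(tokens), chooser=lambda forms: forms[0])
```

The revised transcript, now closer to what was actually spoken, is what gets fed to discriminative acoustic-model training.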
  • Patent number: 8688450
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information are described. A method includes determining, for each of multiple communications that were initiated by a user of a mobile device, a time when the communication was initiated or received; determining, for each of multiple contacts associated with the user, a probability associated with the contact based at least on the times when the communications were initiated or received; weighting a contact disambiguation grammar according to the probabilities; and processing audio data using the contact disambiguation grammar to select a particular contact.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: April 1, 2014
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
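The time-based weighting can be sketched as below: estimate per-contact probabilities from calls placed near the current hour of day, then attach those as grammar weights. The hour window, fallback rule, and floor weight are assumptions.

```python
from collections import defaultdict

def contact_probabilities(call_log, query_hour, tolerance=2):
    """Estimate P(contact) from how often each contact was called near
    this hour of day; fall back to overall counts when nothing is near."""
    near = defaultdict(int)
    for contact, hour in call_log:
        if min(abs(hour - query_hour), 24 - abs(hour - query_hour)) <= tolerance:
            near[contact] += 1
    total = sum(near.values())
    if total == 0:
        for contact, _ in call_log:
            near[contact] += 1
        total = sum(near.values())
    return {c: n / total for c, n in near.items()}

def weight_grammar(grammar, probs, floor=0.01):
    """Attach a recognition weight to each contact entry in the grammar."""
    return {name: probs.get(name, floor) for name in grammar}

log = [("mom", 19), ("mom", 20), ("boss", 9), ("boss", 10), ("mom", 21)]
weights = weight_grammar(["mom", "boss"], contact_probabilities(log, query_hour=20))
```

At 8 p.m. the recognizer is thus biased toward "mom", so an ambiguous utterance resolves to the contact the user actually tends to call at that time.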
  • Patent number: 8688454
    Abstract: The present invention relates to a method and apparatus for adapting a language model in response to error correction. One embodiment of a method for processing an input signal including human language includes receiving the input signal and applying a statistical language model combined with a separate, corrective language model to the input signal in order to produce a processing result.
    Type: Grant
    Filed: July 6, 2011
    Date of Patent: April 1, 2014
    Assignee: SRI International
    Inventor: Jing Zheng
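One way to combine a base statistical language model with a separate corrective model, as the abstract describes, is simple interpolation; the toy bigram tables and the interpolation weight `alpha` are assumptions, not SRI's actual combination rule.

```python
def combined_score(base_lm, corrective_lm, ngram, alpha=0.8):
    """Linear interpolation: the corrective model nudges probabilities of
    n-grams the user has previously corrected, without retraining the base."""
    p_base = base_lm.get(ngram, 1e-6)
    p_corr = corrective_lm.get(ngram, p_base)  # no correction evidence -> neutral
    return alpha * p_base + (1.0 - alpha) * p_corr

base = {("recognize", "speech"): 0.020, ("wreck", "a"): 0.015}
corrective = {("recognize", "speech"): 0.300}   # user corrected this phrase

boosted = combined_score(base, corrective, ("recognize", "speech"))
neutral = combined_score(base, corrective, ("wreck", "a"))
```

Keeping the corrective model separate means error corrections take effect immediately and can be discarded or per-user scoped without touching the base model.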
  • Patent number: 8688453
    Abstract: According to example configurations, a speech processing system can include a syntactic parser, a word extractor, word extraction rules, and an analyzer. The syntactic parser of the speech processing system parses the utterance to identify syntactic relationships amongst words in the utterance. The word extractor utilizes word extraction rules to identify groupings of related words in the utterance that most likely represent an intended meaning of the utterance. The analyzer in the speech processing system maps each set of the sets of words produced by the word extractor to a respective candidate intent value to produce a list of candidate intent values for the utterance. The analyzer is configured to select, from the list of candidate intent values (i.e., possible intended meanings) of the utterance, a particular candidate intent value as being representative of the intent (i.e., intended meaning) of the utterance.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: April 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Sachindra Joshi, Shantanu Godbole
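The extractor-to-analyzer pipeline can be sketched as follows: word groups from the parse are mapped to candidate intent values, and one candidate is selected. The rule table and the tightest-group selection heuristic are hypothetical stand-ins for the patent's extraction rules and confidence-based selection.

```python
EXTRACTION_RULES = [
    # hypothetical (verb, object) pairs -> candidate intent value
    (("pay", "bill"), "PAY_BILL"),
    (("check", "balance"), "CHECK_BALANCE"),
    (("cancel", "card"), "CANCEL_CARD"),
]

def candidate_intents(word_groups):
    """Map each extracted word group to candidate intent values."""
    candidates = []
    for group in word_groups:
        for (verb, obj), intent in EXTRACTION_RULES:
            if verb in group and obj in group:
                candidates.append((intent, len(group)))
    return candidates

def select_intent(word_groups):
    """Pick the candidate backed by the smallest (tightest) word group."""
    candidates = candidate_intents(word_groups)
    return min(candidates, key=lambda c: c[1])[0] if candidates else None

groups = [{"i", "want", "to", "pay", "my", "bill"}, {"pay", "bill"}]
intent = select_intent(groups)
```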
  • Publication number: 20140088964
    Abstract: Methods, systems, and computer-readable media related to selecting observation-specific training data (also referred to as “observation-specific exemplars”) from a general training corpus, and then creating, from the observation-specific training data, a focused, observation-specific acoustic model for recognizing the observation in an output domain are disclosed. In one aspect, a global speech recognition model is established based on an initial set of training data; a plurality of input speech segments to be recognized in an output domain are received; and for each of the plurality of input speech segments: a respective set of focused training data relevant to the input speech segment is identified in the global speech recognition model; a respective focused speech recognition model is generated based on the respective set of focused training data; and the respective focused speech recognition model is provided to a recognition device for recognizing the input speech segment in the output domain.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 27, 2014
    Applicant: APPLE INC.
    Inventor: Jerome Bellegarda
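The exemplar-selection step, finding training data closest to the input segment before building a focused model, can be sketched with a nearest-neighbor search; the fixed-length segment vectors and cosine metric are assumptions about the representation.

```python
def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def focused_training_set(segment_vec, corpus, k=2):
    """Pick the k exemplars closest to the input segment; a compact,
    observation-specific model is then trained on just these."""
    ranked = sorted(corpus, key=lambda ex: cosine(segment_vec, ex[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

corpus = [
    ("hello-f1", [0.9, 0.1, 0.0]),
    ("hello-m1", [0.8, 0.2, 0.1]),
    ("goodbye",  [0.0, 0.1, 0.9]),
]
exemplars = focused_training_set([0.85, 0.15, 0.05], corpus, k=2)
```

A model trained only on the retrieved "hello" exemplars is far smaller and sharper than the global model, which is the point of the observation-specific approach.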
  • Patent number: 8682669
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: March 25, 2014
    Assignee: Synchronoss Technologies, Inc.
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
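The runtime side of this idea, looking up the classifier trained for the current dialog state and falling back to a state-independent one, can be sketched as below; the state-key scheme and lambda classifiers are simplifications for illustration.

```python
def state_key(system_state, variables):
    """Reduce a full system-state vector to the subset of variables the
    classifier bank was trained on."""
    return tuple(sorted((v, system_state[v]) for v in variables))

class ClassifierBank:
    """Maps a quantized system-state key to the classifier tuned for it,
    falling back to a state-independent default."""
    def __init__(self, default):
        self.default = default
        self.by_state = {}

    def register(self, key, classifier):
        self.by_state[key] = classifier

    def classify(self, system_state, variables, utterance):
        clf = self.by_state.get(state_key(system_state, variables), self.default)
        return clf(utterance)

bank = ClassifierBank(default=lambda u: "unknown")
bank.register(state_key({"prompt": "yes_no"}, ["prompt"]),
              lambda u: "yes" if "yes" in u else "no")

answer = bank.classify({"prompt": "yes_no", "turn": 3}, ["prompt"], "yes please")
```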
  • Patent number: 8682674
    Abstract: Provided are methods and systems that extract facts from unstructured documents and build an oracle for various domains. The present invention addresses the problem of efficiently finding and extracting facts about a particular subject domain from semi-structured and unstructured documents, makes inferences of new facts from the extracted facts, and provides ways of verifying the facts, thus becoming a source of knowledge about the domain that can be effectively queried. The methods and systems can also extract temporal information from unstructured and semi-structured documents, and can find and extract dynamically generated documents from the Deep or Dynamic Web.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: March 25, 2014
    Assignee: Glenbrook Networks
    Inventors: Julia Komissarchik, Edward Komissarchik
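The extract-then-infer loop can be sketched with a single pattern; the regex, the `founded_in` predicate, and the "historic" inference rule are hypothetical, and a real system would use many patterns plus verification.

```python
import re

FACT_PATTERN = re.compile(
    r"(?P<subject>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was founded in (?P<year>\d{4})"
)

def extract_facts(text):
    """Pull (subject, predicate, value) triples out of unstructured text."""
    return [(m.group("subject"), "founded_in", int(m.group("year")))
            for m in FACT_PATTERN.finditer(text)]

def infer(facts):
    """Derive new facts from extracted ones: here, a company founded
    before 1900 is tagged 'historic' (an illustrative inference rule)."""
    derived = [(subj, "is", "historic")
               for subj, pred, year in facts
               if pred == "founded_in" and year < 1900]
    return facts + derived

text = "Acme Corp was founded in 1887. Beta Labs was founded in 2005."
knowledge = infer(extract_facts(text))
```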
  • Patent number: 8682666
    Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 25, 2014
    Assignee: Voicesense Ltd.
    Inventors: Yoav Degani, Yishai Zamir
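The "secondary derivatives of pitch and amplitude" idea can be sketched as delta and delta-delta statistics over a contour; the specific statistics (mean and standard deviation) and the toy contours are illustrative, not Voicesense's actual parameter set.

```python
def deltas(series):
    """First derivative: frame-to-frame change of a prosodic contour."""
    return [b - a for a, b in zip(series, series[1:])]

def prosodic_summary(pitch):
    """Secondary-derivative statistics over a pitch contour: mean and
    spread of the delta and delta-delta tracks."""
    d1 = deltas(pitch)
    d2 = deltas(d1)
    def stats(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, var ** 0.5
    return {"delta": stats(d1), "delta2": stats(d2)}

flat_speaker = prosodic_summary([120, 121, 120, 121, 120, 121])
animated_speaker = prosodic_summary([110, 150, 100, 170, 95, 180])
```

Comparing such summaries against reference data for known behavioral styles is what lets the classifier place the current speaker.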
  • Patent number: 8682665
    Abstract: A method and apparatus derive a dynamic grammar composed of a subset of a plurality of data elements that are each associated with one of a plurality of reference identifiers. The present invention generates a set of selection identifiers on the basis of a user-provided first input identifier and determines which of these selection identifiers are present in a set of pre-stored reference identifiers. The present invention creates a dynamic grammar that includes those data elements that are associated with those reference identifiers that are matched to any of the selection identifiers. Based on a user-provided second identifier and on the data elements of the dynamic grammar, the present invention selects one of the reference identifiers in the dynamic grammar.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: March 25, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Deborah W. Brown, Randy G. Goldberg, Stephen Michael Marcus, Richard R. Rosinski
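The grammar-derivation step can be sketched as below: generate selection identifiers from the first input, intersect them with the reference database, and keep only the matched elements. The digit-confusion table used to generate variants is a hypothetical rule (e.g. "five"/"nine" misrecognition), not the patent's actual generation method.

```python
def selection_identifiers(first_input):
    """Generate plausible variants of the user's first identifier, e.g.
    common digit confusions from a spoken account number (assumed rule)."""
    confusable = {"5": "9", "9": "5"}   # hypothetical confusion pairs
    variants = {first_input}
    for i, ch in enumerate(first_input):
        if ch in confusable:
            variants.add(first_input[:i] + confusable[ch] + first_input[i + 1:])
    return variants

def build_dynamic_grammar(first_input, reference_db):
    """Keep only the data elements whose reference identifier matches one
    of the generated selection identifiers."""
    matched = selection_identifiers(first_input) & set(reference_db)
    return {ref: reference_db[ref] for ref in matched}

db = {"1592": "Alice Smith", "1552": "Bob Jones", "7777": "Carol Diaz"}
grammar = build_dynamic_grammar("1552", db)
```

The second user input is then recognized against this small dynamic grammar instead of the full database, which is the source of the accuracy gain.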
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
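The cost-adjustment step can be sketched as below, using Levenshtein distance as a crude stand-in for the acoustic similarity measure; the threshold and penalty values are arbitrary assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance, a rough proxy for acoustic similarity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def penalize_lm_words(lm_costs, grammar_words, threshold=2, penalty=1.5):
    """Raise the transition cost of any language-model word that sounds
    too close to a rule-based grammar word, biasing recognition toward
    the grammar in ambiguous cases."""
    adjusted = dict(lm_costs)
    for word in adjusted:
        if any(edit_distance(word, g) <= threshold for g in grammar_words):
            adjusted[word] += penalty
    return adjusted

costs = penalize_lm_words({"main": 1.0, "holiday": 1.2}, ["menu", "mane"])
```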
  • Patent number: 8676578
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information for the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and, when it is, the degree of highlighting, determined in accordance with the pronunciation recognition frequency of a second word; the determination is based on whether the storage information includes a second set corresponding to a first set and, when it does, on the pronunciation recognition frequency of the second word. The generation unit is configured to generate an emphasized character string based on the emphasis determination information when the first word is highlighted.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: March 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
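The frequency-driven emphasis idea can be sketched as below: words recognized rarely so far get stronger highlighting. The thresholds and the asterisk markers are illustrative assumptions.

```python
def emphasis_level(word, recognition_counts, thresholds=(1, 5)):
    """A word heard rarely in the meeting so far gets stronger emphasis;
    frequently recognized words are assumed familiar and left plain."""
    count = recognition_counts.get(word, 0)
    low, high = thresholds
    if count <= low:
        return 2   # strong highlight
    if count <= high:
        return 1   # mild highlight
    return 0       # no highlight

def emphasize(text, recognition_counts):
    """Wrap words in markers proportional to their emphasis level."""
    marks = {2: "**", 1: "*", 0: ""}
    out = []
    for word in text.split():
        m = marks[emphasis_level(word, recognition_counts)]
        out.append(f"{m}{word}{m}")
    return " ".join(out)

counts = {"the": 40, "schedule": 3, "tentative": 0}
line = emphasize("the tentative schedule", counts)
```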
  • Patent number: 8676579
    Abstract: A method of authenticating a user of a mobile device having a first microphone and a second microphone, the method comprising receiving voice input from the user at the first and second microphones, determining a position of the user relative to the mobile device based on the voice input received by the first and second microphones, and authenticating the user based on the position of the user.
    Type: Grant
    Filed: April 30, 2012
    Date of Patent: March 18, 2014
    Assignee: BlackBerry Limited
    Inventor: James Allen Hymel
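The position-from-two-microphones step can be sketched with a cross-correlation delay estimate: the microphone closer to the talker receives the signal first, and the sign of the best lag indicates the side. The toy signals and lag search are illustrative; a real device would work on sampled audio frames.

```python
def best_lag(sig_a, sig_b, max_lag=3):
    """Lag (in samples) at which sig_b best aligns with sig_a; the sign
    says which microphone the talker is closer to."""
    def corr(lag):
        total = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                total += a * sig_b[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)

def side_of_device(mic1, mic2):
    """Crude position estimate from the sign of the inter-mic delay."""
    lag = best_lag(mic1, mic2)
    if lag > 0:
        return "closer to mic 1"
    if lag < 0:
        return "closer to mic 2"
    return "centered"

pulse = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # arrives 2 samples later
position = side_of_device(pulse, delayed)
```

Authentication then checks whether the estimated position matches the position expected for the enrolled user (e.g. the phone held to the usual ear).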
  • Patent number: 8676574
    Abstract: In a spoken language processing method for tone/intonation recognition, an auditory spectrum may be determined for an input window of sound and one or more multi-scale features may be extracted from the auditory spectrum. Each multi-scale feature can be extracted using a separate two-dimensional spectro-temporal receptive filter. One or more feature maps corresponding to the one or more multi-scale features can be generated and an auditory gist vector can be extracted from each of the one or more feature maps. A cumulative gist vector may be obtained through augmentation of each auditory gist vector extracted from the one or more feature maps. One or more tonal characteristics corresponding to the input window of sound can be determined by mapping the cumulative gist vector to one or more tonal characteristics using a machine learning algorithm.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: March 18, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ozlem Kalinli
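The feature-map-to-cumulative-gist pipeline can be sketched as below. The moving-average "multi-scale feature" stands in for the patent's spectro-temporal receptive filters, and chunk-averaging stands in for its gist pooling; both are simplifying assumptions.

```python
def feature_map(spectrum, scale):
    """Toy multi-scale feature: smooth the auditory spectrum by averaging
    over non-overlapping windows of the given scale."""
    return [sum(spectrum[i:i + scale]) / scale
            for i in range(0, len(spectrum) - scale + 1, scale)]

def gist_vector(fm, n_chunks=2):
    """Pool a feature map into a short gist vector by chunk-averaging."""
    chunk = max(1, len(fm) // n_chunks)
    return [sum(fm[i:i + chunk]) / chunk for i in range(0, len(fm), chunk)]

def cumulative_gist(spectrum, scales=(1, 2, 4)):
    """Concatenate (augment) the gist vectors of every scale's map."""
    out = []
    for s in scales:
        out.extend(gist_vector(feature_map(spectrum, s)))
    return out

spectrum = [0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4]
g = cumulative_gist(spectrum)
```

The fixed-length cumulative gist vector `g` is what a trained classifier would map to tonal characteristics such as lexical tone or intonation class.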