Specialized Models Patents (Class 704/250)

Mobile systems and methods of supporting natural language human-machine interactions

Patent number: 8447607

Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.

Type: Grant

Filed: June 4, 2012

Date of Patent: May 21, 2013

Assignee: VoiceBox Technologies, Inc.

Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
Secure voice transaction method and system

Patent number: 8442187

Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.

Type: Grant

Filed: April 17, 2012

Date of Patent: May 14, 2013

Assignee: International Business Machines Corporation

Inventors: Peeyush Jaiswal, Naveen Narayan
Conditional model for natural language understanding

Patent number: 8442828

Abstract: A conditional model is used in spoken language understanding. One such model is a conditional random field model.

Type: Grant

Filed: March 17, 2006

Date of Patent: May 14, 2013

Assignee: Microsoft Corporation

Inventors: Ye-Yi Wang, Alejandro Acero, John Sie Yuen Lee, Milind V. Mahajan
Device, system, and method of liveness detection utilizing voice biometrics

Patent number: 8442824

Abstract: Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score.

Type: Grant

Filed: November 25, 2009

Date of Patent: May 14, 2013

Assignee: Nuance Communications, Inc.

Inventors: Almog Aley-Raz, Nir Moshe Krause, Michael Itzhak Salmon, Ran Yehoshua Gazit
Biomimetic voice identifier

Patent number: 8442825

Abstract: A device for voice identification including a receiver, a segmenter, a resolver, two advancers, a buffer, and a plurality of IIR resonator digital filters where each IIR filter comprises a set of memory locations or functional equivalent to hold filter specifications, a memory location or functional equivalent to hold the arithmetic reciprocal of the filter's gain, a five cell controller array, several multipliers, an adder, a subtractor, and a logical non-shift register. Each cell of the five cell controller array has five logical states, each acting as a five-position single-pole rotating switch that operates in unison with the four others. Additionally, the device also includes an artificial neural network and a display means.

Type: Grant

Filed: August 16, 2011

Date of Patent: May 14, 2013

Assignee: The United States of America as Represented by the Director, National Security Agency

Inventor: Michael Sinutko
Automated distortion classification

Patent number: 8438030

Abstract: A method of and system for automated distortion classification. The method includes steps of (a) receiving audio including a user speech signal and at least some distortion associated with the signal; (b) pre-processing the received audio to generate acoustic feature vectors; (c) decoding the generated acoustic feature vectors to produce a plurality of hypotheses for the distortion; and (d) post-processing the plurality of hypotheses to identify at least one distortion hypothesis of the plurality of hypotheses as the received distortion. The system can include one or more distortion models including distortion-related acoustic features representative of various types of distortion and used by a decoder to compare the acoustic feature vectors with the distortion-related acoustic features to produce the plurality of hypotheses for the distortion.

Type: Grant

Filed: November 25, 2009

Date of Patent: May 7, 2013

Assignee: General Motors LLC

Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
Head related transfer function (HRTF) enhancement for improved vertical-polar localization in spatial audio systems

Patent number: 8428269

Abstract: A spatial audio system for implementing a head-related transfer function (HRTF). A first stage implements a lateral HRTF that reproduces the median frequency response for a sound source located at a particular lateral distance from a listener, and a second stage implements a vertical HRTF that reproduces the spectral changes when the vertical distance of a sound source changes relative to the listener. The system improves the vertical localization accuracy provided by an arbitrary measured HRTF by introducing an enhancement factor into the second processing stage. The enhancement factor increases the spectral differentiation between simulated sound sources located at different positions within the same “cone of confusion.”

Type: Grant

Filed: May 20, 2010

Date of Patent: April 23, 2013

Assignee: The United States of America as represented by the Secretary of the Air Force

Inventors: Douglas S. Brungart, Griffin D. Romigh
Adaptive voice print for conversational biometric engine

Patent number: 8417525

Abstract: A computer-implemented method, system and/or program product update voice prints over time. A receiving computer receives an initial voice print. A determining period of time is calculated for that initial voice print. This determining period of time is a length of time during which an expected degree of change in subsequent voice prints, in comparison to the initial voice print, is predicted to occur. A new voice print is received after the determining period of time has passed, and the new voice print is compared with the initial voice print. In response to a change to the new voice print falling within the expected degree of change in comparison to the initial voice print, a voice print store is updated with the new voice print.
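
The update rule described in this abstract can be illustrated with a minimal sketch. The `VoicePrintRecord` type, its field names, and the use of Euclidean distance as the measure of change are all assumptions made for illustration; the abstract does not say how a voice print is represented or how the degree of change is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class VoicePrintRecord:
    # Hypothetical representation: a voice print as a plain feature vector.
    features: list
    recorded_at: float          # time the stored print was captured (seconds)
    determining_period: float   # wait this long before considering an update
    expected_change: float      # largest change still treated as normal drift


def update_voice_print(record, new_features, now):
    """Replace the stored print only if the new one arrives after the
    determining period and differs by no more than the expected change."""
    if now - record.recorded_at < record.determining_period:
        return False  # too early: determining period has not passed
    # Assumption: Euclidean distance stands in for "degree of change".
    change = math.dist(record.features, new_features)
    if change <= record.expected_change:
        record.features = list(new_features)
        record.recorded_at = now
        return True
    return False  # change is larger than expected, keep the old print
```

In this sketch a print that changes more than expected is simply left in place; what a real system does in that case is not stated in the abstract.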

Type: Grant

Filed: February 9, 2010

Date of Patent: April 9, 2013

Assignee: International Business Machines Corporation

Inventors: Sheri Gayle Daye, Peeyush Jaiswal, Fang Wang
System, method and computer program product for extracting user profiles and habits based on speech recognition and calling history for telephone system advertising

Patent number: 8411830

Abstract: A system, method and computer program product for providing targeted messages to a person using telephony services by generating user profile information from telephony data and using the user profile information to retrieve targeted messages.

Type: Grant

Filed: November 18, 2011

Date of Patent: April 2, 2013

Assignee: iCall, Inc.

Inventors: Arlo Christopher Gilbert, Andrew Muldowney
Restoration of high-order Mel frequency cepstral coefficients

Patent number: 8412526

Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.

Type: Grant

Filed: December 3, 2007

Date of Patent: April 2, 2013

Assignee: Nuance Communications, Inc.

Inventor: Alexander Sorin
Systems, methods, and programs for detecting unauthorized use of text based communications

Patent number: 8386253

Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.

Type: Grant

Filed: July 13, 2012

Date of Patent: February 26, 2013

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Lee Begeja, Benjamin J. Stern
Restoration of high-order Mel Frequency Cepstral Coefficients

Publication number: 20130046540

Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N-L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.

Type: Application

Filed: December 3, 2007

Publication date: February 21, 2013

Inventor: Alexander Sorin
System and method for management of call data using a vector based model and relational data structure

Patent number: 8379806

Abstract: A system and method for representing call content in a searchable database includes transcribing call content to text. The call content is projected to vector space, by creating a vector by indexing the call based on the content and determining a similarity of the call to an atomic-class dictionary. The call is classified in a relational database in accordance with the vector.
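
As a rough illustration of projecting a transcribed call into vector space and comparing it with class dictionaries, the sketch below uses bag-of-words counts and cosine similarity. Both choices, along with the function names and the sample dictionaries, are assumptions; the abstract does not specify the indexing scheme, the similarity measure, or the form of the atomic-class dictionary.

```python
import math
from collections import Counter


def to_vector(text):
    """Index text as a sparse term-count vector (assumed bag-of-words)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_call(transcript, class_dictionaries):
    """Return the best-matching class name and all similarity scores."""
    call_vector = to_vector(transcript)
    scores = {
        name: cosine_similarity(call_vector, to_vector(words))
        for name, words in class_dictionaries.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical class dictionaries, for illustration only.
classes = {
    "billing": "invoice payment refund charge",
    "support": "error crash reset password",
}
best, scores = classify_call(
    "I need a refund for a double charge on my invoice", classes
)
```

The resulting class label and score vector are the kind of values that could then be stored as columns in a relational table.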

Type: Grant

Filed: August 22, 2008

Date of Patent: February 19, 2013

Assignee: International Business Machines Corporation

Inventors: Cheng Wu, Andrzej Sakrajda, Hong-Kwang Jeff Kuo, Vaibhava Goel, David Lubensky
System and method for standardized speech recognition infrastructure

Patent number: 8374867

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
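
The fallback order in this abstract (user-specific supervised model, then unsupervised model, then generic model) can be sketched as a simple lookup chain. The function name, the dictionary-based model stores, and the single shared generic model are assumptions for illustration only.

```python
def select_speech_model(user_id, supervised, unsupervised, generic):
    """Pick a recognition model using the fallback order from the abstract.

    `supervised` and `unsupervised` are hypothetical dicts keyed by user id;
    `generic` is the model used when neither is available.
    """
    if user_id in supervised:
        return "supervised", supervised[user_id]
    if user_id in unsupervised:
        return "unsupervised", unsupervised[user_id]
    return "generic", generic


# Example: only an unsupervised model exists for this user.
kind, model = select_speech_model(
    "alice", supervised={}, unsupervised={"alice": "alice-unsup"}, generic="base"
)
```

Returning the kind of model alongside the model itself makes it easy to log which branch was taken.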

Type: Grant

Filed: November 13, 2009

Date of Patent: February 12, 2013

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
Utterance verification method and apparatus for isolated word N-best recognition result

Patent number: 8374869

Abstract: An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.

Type: Grant

Filed: August 4, 2009

Date of Patent: February 12, 2013

Assignee: Electronics and Telecommunications Research Institute

Inventors: Jeom Ja Kang, Yunkeun Lee, Jeon Gue Park, Ho-Young Jung, Hyung-Bae Jeon, Hoon Chung, Sung Joo Lee, Euisok Chung, Ji Hyun Wang, Byung Ok Kang, Ki-young Park, Jong Jin Kim
Method of recognizing speech

Patent number: 8374868

Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.

Type: Grant

Filed: August 21, 2009

Date of Patent: February 12, 2013

Assignee: General Motors LLC

Inventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
Position-dependent phonetic models for reliable pronunciation identification

Patent number: 8355917

Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.

Type: Grant

Filed: February 1, 2012

Date of Patent: January 15, 2013

Assignee: Microsoft Corporation

Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
IDENTIFICATION OF PEOPLE USING MULTIPLE TYPES OF INPUT

Publication number: 20120278077

Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.

Type: Application

Filed: July 11, 2012

Publication date: November 1, 2012

Applicant: MICROSOFT CORPORATION

Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
Apparatus, method, and medium for dialogue speech recognition using topic domain detection

Patent number: 8301450

Abstract: An apparatus, method, and medium for dialogue speech recognition using topic domain detection are disclosed. An apparatus includes a forward search module performing a forward search in order to create a word lattice similar to a feature vector, which is extracted from an input voice signal, with reference to a global language model database, a pronunciation dictionary database and an acoustic model database, which have been previously established, a topic-domain-detection module detecting a topic domain by inferring a topic based on meanings of vocabularies contained in the word lattice using information of the word lattice created as a result of the forward search, and a backward-decoding module performing a backward decoding of the detected topic domain with reference to a specific topic domain language model database, which has been previously established, thereby outputting a speech recognition result for an input voice signal in text form. Accuracy and efficiency for a dialogue sentence are improved.

Type: Grant

Filed: October 30, 2006

Date of Patent: October 30, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Jae-won Lee, In-jeong Choi
Method and system for identifying and correcting accent-induced speech recognition difficulties

Patent number: 8285546

Abstract: A system for use in speech recognition includes an acoustic module accessing a plurality of distinct-language acoustic models, each based upon a different language; a lexicon module accessing at least one lexicon model; and a speech recognition output module. The speech recognition output module generates a first speech recognition output using a first model combination that combines one of the plurality of distinct-language acoustic models with the at least one lexicon model. In response to a threshold determination, the speech recognition output module generates a second speech recognition output using a second model combination that combines a different one of the plurality of distinct-language acoustic models with the at least one distinct-language lexicon model.

Type: Grant

Filed: September 9, 2011

Date of Patent: October 9, 2012

Assignee: Nuance Communications, Inc.

Inventor: David E. Reich
Method and system for bio-metric voice print authentication

Patent number: 8280740

Abstract: A method (700) and system (900) for authenticating a user is provided. The method can include receiving one or more spoken utterances from a user (702), recognizing a phrase corresponding to one or more spoken utterances (704), identifying a biometric voice print of the user from one or more spoken utterances of the phrase (706), determining a device identifier associated with the device (708), and authenticating the user based on the phrase, the biometric voice print, and the device identifier (710). A location of the handset or the user can be employed as criteria for granting access to one or more resources (712).

Type: Grant

Filed: April 13, 2009

Date of Patent: October 2, 2012

Assignee: Porticus Technology, Inc.

Inventors: Germano Di Mambro, Bernardas Salna
VOICE RECOGNITION SYSTEM AND VOICE RECOGNITION METHOD

Publication number: 20120239401

Abstract: Provided is a voice recognition system capable of, while suppressing negative influences from sound not to be recognized, correctly estimating utterance sections that are to be recognized. A voice segmenting means calculates voice feature values, and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. Then, the voice segmenting means determines, to be first voice sections, those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, to be second voice sections, sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means determines the first voice sections by using the one of the threshold value and the margin which has been updated by the parameter updating means.
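
The first-stage segmentation described here (compare per-frame feature values with a threshold, then add a margin to the front and rear of each section) can be sketched as follows. The function name, the use of frame indices, and the merging of overlapping sections are assumptions; the search stage and the parameter updating step from the abstract are not modeled.

```python
def segment_voice(features, threshold, margin):
    """Return voice sections as (start, end) frame ranges, end exclusive.

    A frame counts as voice when its feature value exceeds `threshold`.
    Each section is widened by `margin` frames on both sides, and sections
    that touch or overlap after widening are merged (an assumed choice).
    """
    sections = []
    start = None
    for i, value in enumerate(features):
        if value > threshold and start is None:
            start = i                      # voice section begins
        elif value <= threshold and start is not None:
            sections.append((start, i))    # voice section ends
            start = None
    if start is not None:
        sections.append((start, len(features)))

    widened = []
    for s, e in sections:
        s = max(0, s - margin)
        e = min(len(features), e + margin)
        if widened and s <= widened[-1][1]:
            widened[-1] = (widened[-1][0], max(widened[-1][1], e))
        else:
            widened.append((s, e))
    return widened
```

A parameter updating step, as in the abstract, would adjust `threshold` and `margin` between calls; how those updates are derived is not specified there.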

Type: Application

Filed: November 26, 2010

Publication date: September 20, 2012

Applicant: NEC CORPORATION

Inventor: Takayuki Arakawa
Method for assessing pronunciation abilities

Patent number: 8271281

Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.

Type: Grant

Filed: June 27, 2008

Date of Patent: September 18, 2012

Assignee: Nuance Communications, Inc.

Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
Quantizing feature vectors in decision-making applications

Patent number: 8271278

Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.

Type: Grant

Filed: April 3, 2010

Date of Patent: September 18, 2012

Assignee: International Business Machines Corporation

Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
System and method for identifying audio command prompts for use in a voice response environment

Patent number: 8265932

Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed and each of the preceding and reference phrase segments are divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold.

Type: Grant

Filed: October 3, 2011

Date of Patent: September 11, 2012

Assignee: Intellisist, Inc.

Inventor: Martin R. M. Dunsmuir
Method and system for expanding a word graph to a phone graph based on a cross-word acoustical model to improve continuous speech recognition

Patent number: 8260614

Abstract: A method and system that expands a word graph to a phone graph. An unknown speech signal is received. A word graph is generated based on an application task or based on information extracted from the unknown speech signal. The word graph is expanded into a phone graph. The unknown speech signal is recognized using the phone graph. The phone graph can be based on a cross-word acoustical model to improve continuous speech recognition. By expanding a word graph into a phone graph, the phone graph can consume less memory than a word graph and can greatly reduce the computation cost of the decoding process compared with the word graph, thus improving system performance. Furthermore, continuous speech recognition error rate can be reduced by using the phone graph, which provides a more accurate graph for continuous speech recognition.

Type: Grant

Filed: September 28, 2000

Date of Patent: September 4, 2012

Assignee: Intel Corporation

Inventors: Qingwei Zhao, Zhiwei Lin, Yonghong Yan
METHOD AND APPARATUS FOR CREATING VOICE TAG

Publication number: 20120221335

Abstract: According to one embodiment, the method may include constructing a first voice tag for registration speech based on a hidden Markov acoustic model (HMM), constructing a second voice tag for the registration speech based on template matching, and combining the first voice tag and the second voice tag to construct a voice tag of the registration speech.

Type: Application

Filed: February 24, 2012

Publication date: August 30, 2012

Inventors: Rui Zhao, Lei He
SPEAKER CHARACTERIZATION THROUGH SPEECH ANALYSIS

Publication number: 20120221336

Abstract: A computer implemented method, data processing system, apparatus and computer program product for determining current behavioral, psychological and speech styles characteristics of a speaker in a given situation and context, through analysis of current speech utterances of the speaker. The analysis calculates different prosodic parameters of the speech utterances, consisting of unique secondary derivatives of the primary pitch and amplitude speech parameters, and compares these parameters with pre-obtained reference speech data, indicative of various behavioral, psychological and speech styles characteristics. The method includes the formation of the classification speech parameters reference database, as well as the analysis of the speaker's speech utterances in order to determine the current behavioral, psychological and speech styles characteristics of the speaker in the given situation.

Type: Application

Filed: May 7, 2012

Publication date: August 30, 2012

Applicant: VOICESENSE LTD.

Inventors: Yoav DEGANI, Yishai ZAMIR
Systems, methods, and programs for detecting unauthorized use of text based communications services

Patent number: 8244532

Abstract: Systems, methods, and programs for generating an authorized profile for a text communication device or account, may sample a text communication generated by the text communication device or account during communication and may store the text sample. The systems, methods, and programs may extract a language pattern from the stored text sample and may create an authorized profile based on the language pattern. Systems, methods, and programs for detecting unauthorized use of a text communication device or account may sample a text communication generated by the device or account during communication, may extract a language pattern from the audio sample, and may compare extracted language pattern of the sample with an authorized user profile.

Type: Grant

Filed: December 23, 2005

Date of Patent: August 14, 2012

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Lee Begeja, Benjamin J. Stern
HMM-based bilingual (Mandarin-English) TTS techniques

Patent number: 8244534

Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.

Type: Grant

Filed: August 20, 2007

Date of Patent: August 14, 2012

Assignee: Microsoft Corporation

Inventors: Yao Qian, Frank Kao-Ping Soong
Identification of people using multiple types of input

Patent number: 8234113

Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.

Type: Grant

Filed: August 30, 2011

Date of Patent: July 31, 2012

Assignee: Microsoft Corporation

Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
Apparatus and method for generating noise adaptive acoustic model for environment migration including noise adaptive discriminative adaptation method

Patent number: 8234112

Abstract: Provided are an apparatus and method for generating a noise adaptive acoustic model including a noise adaptive discriminative adaptation method. The method includes: generating a baseline model parameter from large-capacity speech training data including various noise environments; and receiving the generated baseline model parameter and applying a discriminative adaptation method to the generated results to generate a migrated acoustic model parameter suitable for an actually applied environment.

Type: Grant

Filed: April 25, 2008

Date of Patent: July 31, 2012

Assignee: Electronics and Telecommunications Research Institute

Inventors: Byung Ok Kang, Ho Young Jung, Yun Keun Lee
Machine translation in continuous space

Patent number: 8229729

Abstract: A system and method for training a statistical machine translation model and decoding or translating using the same is disclosed. A source word versus target word co-occurrence matrix is created to define word pairs. Dimensionality of the matrix may be reduced. Word pairs are mapped as vectors into continuous space where the word pairs are vectors of continuous real numbers and not discrete entities in the continuous space. A machine translation parametric model is trained using an acoustic model training method based on word pair vectors in the continuous space.

Type: Grant

Filed: March 25, 2008

Date of Patent: July 24, 2012

Assignee: International Business Machines Corporation

Inventors: Ruhi Sarikaya, Yonggang Deng, Brian Edward Doorenbos Kingsbury, Yuqing Gao
Method and apparatus for recognizing a speaker in lawful interception systems

Patent number: 8219404

Abstract: A method and apparatus for identifying a speaker within a captured audio signal from a collection of known speakers. The method and apparatus receive or generate voice representations for each known speaker and tag the representations according to meta data related to the known speaker or to the voice. The representations are grouped into one or more groups according to the indices. When a voice to be recognized is introduced, characteristics are determined according to which the groups are prioritized, so that the representations participating only in part of the groups are matched against the voice to be identified, thus reducing identification time and improving the statistical significance.

Type: Grant

Filed: August 9, 2007

Date of Patent: July 10, 2012

Assignee: Nice Systems, Ltd.

Inventors: Adam Weinberg, Irit Opher, Eyal Benaroya, Renan Gutman
System and method for providing large vocabulary speech processing based on fixed-point arithmetic

Patent number: 8195462

Abstract: Disclosed herein is a system, method and computer-readable medium storing instructions for controlling a computing device according to the method. As an example embodiment, the method uses a speech recognition decoder that operates on or uses fixed point arithmetic. The exemplary method comprises representing arc costs associated with at least one finite state transducer (FST) in fixed point, representing parameters associated with a hidden Markov model (HMM) in fixed point and processing speech data in the speech recognition decoder using fixed point arithmetic for the fixed point FST arc costs and the fixed point HMM parameters. The method may also include computing at the decoder sentence hypothesis probabilities with fixed point arithmetic as type Q-2e numbers.
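
To make the idea of fixed point representation concrete, here is a minimal sketch of storing real-valued costs as scaled integers. The choice of 15 fractional bits and the helper names are assumptions for illustration; the sketch does not reproduce the specific Q format named in the abstract or any detail of the decoder.

```python
FRAC_BITS = 15  # assumed number of fractional bits, for illustration only


def to_fixed(x, frac_bits=FRAC_BITS):
    """Convert a real value to a scaled integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a scaled integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values, rescaling the product."""
    return (a * b) >> frac_bits


# Costs along a path are accumulated with plain integer addition.
arc_costs = [to_fixed(0.5), to_fixed(0.25)]
path_cost = sum(arc_costs)
```

Because addition needs no rescaling, accumulating costs stays in integer arithmetic throughout; only multiplication requires the extra shift.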

Type: Grant

Filed: February 16, 2006

Date of Patent: June 5, 2012

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Charles Douglas Blewett, Enrico Luigi Bocchieri
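
As a rough illustration of the fixed-point representation the abstract above describes, the sketch below stores costs as scaled integers and accumulates a path cost with integer addition only. The number of fractional bits, the sample costs, and all function names are assumptions for the example; the patent's actual Q format is not reproduced here.

```python
# Hypothetical Q-format helpers; FRAC_BITS is an assumed choice.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize a real-valued cost to a fixed-point integer."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / SCALE

# Costs are converted once, as they would be when a model is loaded.
arc_costs = [to_fixed(c) for c in (0.5, 1.25, 2.0)]          # FST arc costs
acoustic_costs = [to_fixed(c) for c in (3.75, 0.125, 1.5)]   # HMM state costs

def path_cost(arcs, acoustics):
    """Accumulate a path cost using integer additions only."""
    total = 0
    for arc, acoustic in zip(arcs, acoustics):
        total += arc + acoustic
    return total

total = path_cost(arc_costs, acoustic_costs)
print(total, to_float(total))
```

The sample values are exactly representable with 8 fractional bits, so no quantization error appears here; arbitrary costs would be rounded to the nearest multiple of 1/256.
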
Mobile systems and methods of supporting natural language human-machine interactions

Patent number: 8195468

Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain specific behavior and information into agents, that are distributable or updateable over a wide area network.

Type: Grant

Filed: April 11, 2011

Date of Patent: June 5, 2012

Assignee: VoiceBox Technologies, Inc.

Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
Secure voice transaction method and system

Patent number: 8194827

Abstract: A security method and system. The method includes receiving by a computing system, a telephone call from a user. The computing system comprises an existing password/passphrase and a pre-recorded voice sample associated with the user. The computing system prompts the user to enter a password/passphrase using speech. The computing system receives speech data comprising a first password/passphrase from the user. The computing system converts the speech data to text data. The computing system first compares the text data to the first password/passphrase and determines a match. The computing system compares the speech data to the pre-recorded voice sample to determine a result indicating whether a frequency spectrum associated with the speech data matches a frequency spectrum associated with the pre-recorded voice sample. The computing system transmits the result to the user.

Type: Grant

Filed: April 29, 2008

Date of Patent: June 5, 2012

Assignee: International Business Machines Corporation

Inventors: Peeyush Jaiswal, Naveen Narayan
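
The two checks in the abstract above (passphrase text match, then spectral comparison against the enrolled sample) might be sketched as below. Speech-to-text is assumed to have already produced `transcript`; the band-energy lists, the cosine-similarity measure, and the 0.9 threshold are illustrative assumptions, not details from the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify_caller(transcript, stored_passphrase,
                  spoken_spectrum, enrolled_spectrum, threshold=0.9):
    """Return True only if the passphrase text and the voice spectrum both match."""
    # Step 1: compare the recognized text with the stored passphrase.
    if transcript.strip().lower() != stored_passphrase.strip().lower():
        return False
    # Step 2: compare the frequency spectrum with the pre-recorded sample.
    return cosine_similarity(spoken_spectrum, enrolled_spectrum) >= threshold

print(verify_caller("Open Sesame", "open sesame", [1.0, 2.0, 3.0], [1.0, 2.0, 3.1]))
```
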
Handheld electronic device with text disambiguation

Patent number: 8179289

Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.

Type: Grant

Filed: June 19, 2006

Date of Patent: May 15, 2012

Assignee: Research In Motion Limited

Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
High performance HMM adaptation with joint compensation of additive and convolutive distortions

Patent number: 8180637

Abstract: A method of compensating for additive and convolutive distortions applied to a signal indicative of an utterance is discussed. The method includes receiving a signal and initializing noise mean and channel mean vectors. Gaussian dependent matrix and Hidden Markov Model (HMM) parameters are calculated or updated to account for additive noise from the noise mean vector or convolutive distortion from the channel mean vector. The HMM parameters are adapted by decoding the utterance using the previously calculated HMM parameters and adjusting the Gaussian dependent matrix and the HMM parameters based upon data received during the decoding. The adapted HMM parameters are applied to decode the input utterance and provide a transcription of the utterance.

Type: Grant

Filed: December 3, 2007

Date of Patent: May 15, 2012

Assignee: Microsoft Corporation

Inventors: Dong Yu, Li Deng, Alejandro Acero, Yifan Gong, Jinyu Li
Method for emotion recognition based on minimum classification error

Patent number: 8180638

Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and emotions other than the neutral emotion are classified using the Gaussian mixture model to which a discriminative weight for minimizing the loss function of a classification error for the feature vector for emotion recognition is applied. The emotion recognition is performed by applying a discriminative weight, evaluated using the Gaussian mixture model based on minimum classification error, to feature vectors of the emotions that are classified with difficulty, thereby enhancing the performance of emotion recognition.

Type: Grant

Filed: February 23, 2010

Date of Patent: May 15, 2012

Assignee: Korea Institute of Science and Technology

Inventors: Hyoung Gon Kim, Ig Jae Kim, Joon-Hyuk Chang, Kye Hwan Lee, Chang Seok Bae
VOICE DATA ANALYZING DEVICE, VOICE DATA ANALYZING METHOD, AND VOICE DATA ANALYZING PROGRAM

Publication number: 20120116763

Abstract: A voice data analyzing device comprises speaker model deriving means, which derives speaker models, each specifying the voice characteristics of a speaker, from voice data including a plurality of utterances to each of which a speaker label identifying a speaker has been assigned; and speaker co-occurrence model deriving means, which derives a speaker co-occurrence model representing the strength of the co-occurrence relationship among the speakers from session data obtained by segmenting the voice data into conversation sequences, using the speaker models derived by the speaker model deriving means.

Type: Application

Filed: June 3, 2010

Publication date: May 10, 2012

Applicant: NEC CORPORATION

Inventor: Takafumi Koshinaka
Speech recognition system and method with cepstral noise subtraction

Patent number: 8150690

Abstract: The invention relates to a speech recognition system and method with cepstral noise subtraction. The speech recognition system and method utilize a first scalar coefficient, a second scalar coefficient, and a determining condition to limit the process for the cepstral feature vector, so as to avoid excessive enhancement or subtraction in the cepstral feature vector, so that the operation of the cepstral feature vector is performed properly to improve the anti-noise ability in speech recognition. Furthermore, the speech recognition system and method can be applied in any environment, and have a low complexity and can be easily integrated into other systems, so as to provide the user with a more reliable and stable speech recognition result.

Type: Grant

Filed: October 1, 2008

Date of Patent: April 3, 2012

Assignee: Industrial Technology Research Institute

Inventor: Shih-Ming Huang
Indexing apparatus, indexing method, and computer program product

Patent number: 8145486

Abstract: Acoustic models to provide features to a speech signal are created based on speech features included in regions where similarities of acoustic models created based on speech features in a certain time length are equal to or greater than a predetermined value. Feature vectors acquired by using the acoustic models of the regions and the speech features to provide features to speech signals of second segments are grouped by speaker.

Type: Grant

Filed: January 9, 2008

Date of Patent: March 27, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventor: Makoto Hirohata
Signal detection using delta spectrum entropy

Patent number: 8126668

Abstract: Disclosed is a method of signal detection. A received input signal is divided into frames, and the input signal present in each of a first frame and a second frame is transformed into a frequency signal. Then, first power spectrum information and second power spectrum information are computed from the transformed frequency signals, and a delta spectrum entropy value corresponding to the difference between the two computed power spectra is obtained. Whether a predetermined input signal is included in a given frame of the input signal is judged by comparing the delta spectrum entropy value with a critical value. A desired signal can thus be detected in a noisy environment by using the delta spectrum entropy value.

Type: Grant

Filed: February 29, 2008

Date of Patent: February 28, 2012

Assignee: Sungkyunkwan University Foundation for Corporate Collaboration

Inventors: Kwang-Seok Hong, Yong-Wan Roh, Kue-Bum Lee
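
One plausible reading of the delta spectrum entropy computation in the abstract above is sketched here. The exact definition is not given in the abstract, so the following are assumptions: a naive DFT for the power spectrum, the absolute per-bin difference between the two frames, normalization of that difference into a distribution, and Shannon entropy with the natural logarithm. The function names are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of a real-valued frame via a naive DFT (bins 0..n/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def delta_spectrum_entropy(frame_a, frame_b):
    """Entropy of the normalized power-spectrum difference between two frames."""
    pa = power_spectrum(frame_a)
    pb = power_spectrum(frame_b)
    delta = [abs(b - a) for a, b in zip(pa, pb)]
    total = sum(delta)
    if total == 0.0:
        return 0.0  # identical spectra: no change between frames
    probs = [d / total for d in delta]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

silence = [0.0] * 16
impulse = [1.0] + [0.0] * 15   # flat spectrum, so the change is spread over all bins
print(delta_spectrum_entropy(silence, impulse))
```

The resulting value would then be compared with a critical value to decide whether the frame contains the signal of interest; the abstract does not state the threshold or the direction of the comparison, so that step is left out.
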
Method and apparatus for detecting unsolicited multimedia communications

Patent number: 8121839

Abstract: A service is configured to analyze multimedia communications to determine a likelihood that the communication is unsolicited. For example, the service may inspect e-mail messages, instant messaging messages, facsimile transmissions, voice communications, and video telephony, and analyze these forms of communication to determine whether an intended communication is unsolicited. In connection with voice and video telephony, a voice sample may be obtained from the caller and voice recognition may be performed on the sample to determine an identity of the person or the identity of the voice. The voice sample may also be used to determine the type of voice, i.e., whether the voice is live, machine generated, or prerecorded. Where the call is a video telephony call, image recognition may be used to inspect an image of the person. The information obtained from voice recognition, voice type recognition, and image recognition may be used to detect whether the message is from a known source of unsolicited communications.

Type: Grant

Filed: December 19, 2005

Date of Patent: February 21, 2012

Assignee: Rockstar Bidco, LP

Inventors: Samir Srivastava, Francois Audet, Vibhu Vivek
Adjusting a speech engine for a mobile computing device based on background noise

Patent number: 8121837

Abstract: Methods, apparatus, and products are disclosed for adjusting a speech engine for a mobile computing device based on background noise, the mobile computing device operatively coupled to a microphone, that include: sampling, through the microphone, background noise for a plurality of operating environments in which the mobile computing device operates; generating, for each operating environment, a noise model in dependence upon the sampled background noise for that operating environment; and configuring the speech engine for the mobile computing device with the noise model for the operating environment in which the mobile computing device currently operates.

Type: Grant

Filed: April 24, 2008

Date of Patent: February 21, 2012

Assignee: Nuance Communications, Inc.

Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, Jr., Paritosh D. Patel
Speech recognition method, speech recognition system, and server thereof

Patent number: 8108212

Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model.

Type: Grant

Filed: October 30, 2007

Date of Patent: January 31, 2012

Assignee: NEC Corporation

Inventor: Shuhei Maegawa
Age determination using speech

Patent number: 8099278

Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.

Type: Grant

Filed: December 22, 2010

Date of Patent: January 17, 2012

Assignee: Verizon Patent and Licensing Inc.

Inventor: Kevin R. Witzman
Text-dependent speaker verification

Patent number: 8099288

Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.

Type: Grant

Filed: February 12, 2007

Date of Patent: January 17, 2012

Assignee: Microsoft Corp.

Inventors: Zhengyou Zhang, Amarnag Subramaya
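
The weighted combination of likelihood ratios described in the abstract above can be illustrated with a short sketch. The weights, the sample log-likelihood ratios, the acceptance threshold, and the function name are all assumptions for the example; how the patent chooses or trains the weights is not covered here.

```python
def weighted_llr_score(subunit_llrs, subunit_weights, utterance_llr, utterance_weight):
    """Combine sub-unit and utterance-level log-likelihood ratios into one score."""
    if len(subunit_llrs) != len(subunit_weights):
        raise ValueError("each sub-unit needs exactly one weight")
    score = utterance_weight * utterance_llr
    score += sum(w * llr for w, llr in zip(subunit_weights, subunit_llrs))
    return score

# Hypothetical per-word log-likelihood ratios (speaker model vs. background model).
subunit_llrs = [1.0, -0.5, 2.0]
subunit_weights = [0.25, 0.25, 0.5]

score = weighted_llr_score(subunit_llrs, subunit_weights,
                           utterance_llr=1.0, utterance_weight=0.5)
THRESHOLD = 1.0  # assumed acceptance threshold
print(score, score >= THRESHOLD)
```

A sub-unit with a negative ratio (the second word here) lowers the score without vetoing the decision, which is one way a weighted sum differs from a single utterance-level test.
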
Voice recognition device

Patent number: 8099290

Abstract: A voice recognition unit is constructed in such a way as to create, for each language, a voice label string for an inputted voice uttered by a user, on the basis of a feature vector time series of the inputted voice and data about a sound standard model, and to register the voice label string into a voice label memory 2, while automatically switching among languages for a sound standard model memory 1 used to create the voice label string and automatically switching among the languages for the voice label memory 2 that holds the created voice label string, by using a first language switching unit SW1 and a second language switching unit SW2.

Type: Grant

Filed: October 20, 2009

Date of Patent: January 17, 2012

Assignee: Mitsubishi Electric Corporation

Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
