Specialized Models Patents (Class 704/250)
  • Patent number: 8108212
    Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which transcribes the input speech into text data based on the selected recognition model.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: January 31, 2012
    Assignee: NEC Corporation
    Inventor: Shuhei Maegawa
  • Patent number: 8099278
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: January 17, 2012
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
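The decision logic this abstract describes can be sketched in a few lines; the thresholds and age labels below are illustrative assumptions, not values from the patent:

```python
def likely_age_range(confidence: float) -> str:
    """Map a recognition confidence score (0.0-1.0) to a likely caller
    age range. A recognizer tuned toward adult speech tends to score
    lower on children's voices, so a low score suggests a younger
    caller. The thresholds below are illustrative only."""
    if confidence >= 0.85:
        return "adult"
    elif confidence >= 0.60:
        return "teen"
    else:
        return "child"

print(likely_age_range(0.92))  # adult
print(likely_age_range(0.40))  # child
```

In the patent's setting the recognizer is deliberately tuned toward a particular age range, which is what makes the confidence score informative about the caller.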
  • Patent number: 8099288
    Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, with the acoustic model of the speaker-independent recognizer serving as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), as is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: January 17, 2012
    Assignee: Microsoft Corp.
    Inventors: Zhengyou Zhang, Amarnag Subramaya
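A minimal sketch of the utterance-level score this abstract describes, assuming per-sub-unit log-likelihood ratios have already been computed against the background model (the uniform weights and zero threshold are placeholders):

```python
def utterance_score(subunit_scores, weights=None):
    """Combine per-sub-unit (word, tri-phone, or phone) log-likelihood
    ratios into one utterance-level verification score via a weighted
    sum. Each sub-unit score is log p(x|speaker) - log p(x|background),
    where the background model is the speaker-independent recognizer's
    acoustic model. Uniform weights are used when none are given."""
    if weights is None:
        weights = [1.0 / len(subunit_scores)] * len(subunit_scores)
    return sum(w * s for w, s in zip(weights, subunit_scores))

def verify(subunit_scores, threshold=0.0):
    """Accept the claimed speaker if the weighted score exceeds a
    decision threshold (0.0 here is an arbitrary placeholder)."""
    return utterance_score(subunit_scores) > threshold

# Positive scores mean the speaker model fits better than the background.
print(verify([1.2, 0.8, -0.1]))  # True
```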
  • Patent number: 8099290
    Abstract: A voice recognition unit is constructed to create, for each language, a voice label string for a voice uttered by a user, on the basis of the feature vector time series of the input voice and data from a sound standard model, and to register the voice label string into a voice label memory 2. Using a first language switching unit SW1 and a second language switching unit SW2, the unit automatically switches among languages for the sound standard model memory 1 used to create the voice label string, and among languages for the voice label memory 2 that holds the created string.
    Type: Grant
    Filed: October 20, 2009
    Date of Patent: January 17, 2012
    Assignee: Mitsubishi Electric Corporation
    Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
  • Publication number: 20120010887
    Abstract: Embodiments include a speech recognition system and a personal speech profile data (PSPD) storage device that is physically distinct from the speech recognition system. In the speech recognition system, a PSPD interface receives voice training data, which is associated with an individual, from the PSPD storage device. A speech input module produces a digital speech signal derived from an utterance made by a system user. A speech processing module accesses voice training data stored on the PSPD storage device through the PSPD interface, and executes a speech processing algorithm that analyzes the digital speech signal using the voice training data, in order to identify one or more recognized terms from the digital speech signal. A command processing module initiates execution of various applications based on the recognized terms. Embodiments may be implemented in various types of host systems, including an aircraft cockpit-based system.
    Type: Application
    Filed: July 8, 2010
    Publication date: January 12, 2012
    Applicant: HONEYWELL INTERNATIONAL INC.
    Inventors: Lokesh Rayasandra Boregowda, Meruva Jayaprakash, Koushik Sinha
  • Patent number: 8086455
    Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors with input/output data relationships and data dependencies on predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input and output data files of each model build processor to determine the sequence of model building and automatically orders the processing steps based on the declared input/output relationships, so the user does not need to determine the order of execution. The compiler also automatically detects ill-defined processes, including cyclic definitions and data produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
    Type: Grant
    Filed: January 9, 2008
    Date of Patent: December 27, 2011
    Assignee: Microsoft Corporation
    Inventors: Yifan Gong, Ye Tian
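The declarative ordering and error detection described in this entry amount to a topological sort over declared input/output files. A minimal sketch, with hypothetical processor and file names:

```python
def compile_build_order(processors):
    """Order model-build processors by their declared input/output
    files. `processors` maps a processor name to (inputs, outputs).
    The sketch detects the two ill-defined cases the abstract names:
    an output produced by more than one processor, and a cyclic
    definition. Illustrative only, not the patented implementation."""
    producer = {}
    for name, (_, outputs) in processors.items():
        for out in outputs:
            if out in producer:
                raise ValueError(f"{out} produced by both {producer[out]} and {name}")
            producer[out] = name

    # A processor depends on the producer of each of its inputs.
    deps = {name: {producer[i] for i in inputs if i in producer}
            for name, (inputs, _) in processors.items()}

    # Kahn's algorithm: repeatedly emit processors whose deps are met.
    order, done = [], set()
    while len(order) < len(processors):
        ready = [n for n in deps if n not in done and deps[n] <= done]
        if not ready:
            raise ValueError("cyclic process definition detected")
        for n in sorted(ready):
            order.append(n)
            done.add(n)
    return order

steps = {
    "collect": ([], ["raw.txt"]),
    "train":   (["feats.bin"], ["model.bin"]),
    "extract": (["raw.txt"], ["feats.bin"]),
}
print(compile_build_order(steps))  # ['collect', 'extract', 'train']
```

Kahn's algorithm surfaces both failure modes naturally: a duplicated producer is caught while indexing outputs, and a cycle leaves no processor ready.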
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter, a parameter of the speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. For each speaker, the transformation parameter transforms the distribution of the clean feature corresponding to that speaker's identification information into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to each speaker's identification information using the transformation parameter, and calculates a second model parameter, a parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 8078465
    Abstract: Certain aspects and embodiments of the present invention are directed to systems and methods for monitoring and analyzing the language environment and development of a key child. A key child's language environment and language development can be monitored without placing artificial limitations on the child's activities or requiring a third-party observer. The language environment can be analyzed to identify words, vocalizations, or other noises directed to or spoken by the key child, independent of content. The analysis can include the number of responses between the child and another, such as an adult, and the number of words spoken by the child and/or another, independent of the content of the speech. One or more metrics can be determined based on the analysis and provided to assist in improving the language environment and/or tracking the language development of the key child.
    Type: Grant
    Filed: January 23, 2008
    Date of Patent: December 13, 2011
    Assignee: LENA Foundation
    Inventors: Terrance Paul, Dongxin Xu, Umit Yapenel, Sharmistha Gray
  • Publication number: 20110301953
    Abstract: Provided is a voice recognition system that adapts and stores a speaker's voice, per feature, in a basic voice model and in new independent multi-models, and that provides stable real-time voice recognition using the resulting multi-adaptive model.
    Type: Application
    Filed: April 11, 2011
    Publication date: December 8, 2011
    Applicant: Seoby Electronic Co., Ltd
    Inventor: Sung-Sub Lee
  • Patent number: 8050922
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. The speaker is categorized as a male, female, or child and the categorization is used as a basis for dynamically adjusting a maximum frequency fmax and a minimum frequency fmin of a filter bank used for processing the input utterance to produce an output. Corresponding gender or age specific acoustic models are used to perform voice recognition based on the filter bank output.
    Type: Grant
    Filed: July 21, 2010
    Date of Patent: November 1, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
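The category-dependent filter-bank adjustment might look like the following sketch; the frequency values are rough illustrative guesses, not figures from the patent:

```python
def filter_bank_bounds(category: str):
    """Return (fmin, fmax) in Hz for the filter bank used to process
    an utterance, chosen from the speaker's category. Children and,
    to a lesser degree, women have higher fundamental and formant
    frequencies, so the band is shifted upward. Values illustrative."""
    bounds = {
        "male":   (70.0, 3800.0),
        "female": (120.0, 4200.0),
        "child":  (180.0, 4800.0),
    }
    return bounds[category]

fmin, fmax = filter_bank_bounds("child")
print(fmin, fmax)  # 180.0 4800.0
```

The filter-bank output would then be fed to the matching gender- or age-specific acoustic model, as the abstract describes.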
  • Patent number: 8036892
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: July 8, 2010
    Date of Patent: October 11, 2011
    Assignee: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
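The one-to-many matching step can be sketched as a similarity search over stored voice prints. The vector representation, cosine similarity, and threshold here are all assumptions for illustration:

```python
def norm(v):
    """Euclidean norm of a voice-print vector."""
    return sum(x * x for x in v) ** 0.5

def cosine_similarity(a, b):
    """Cosine similarity between two fixed-length voice-print vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm(a) * norm(b))

def find_matches(customer_print, known_prints, threshold=0.9):
    """One-to-many comparison: return the IDs of known voice prints
    whose similarity to the segmented customer print exceeds a
    threshold, meaning the two are likely from the same person."""
    return [pid for pid, vec in known_prints.items()
            if cosine_similarity(customer_print, vec) >= threshold]

known = {"fraudster_17": [0.9, 0.1, 0.4], "customer_02": [0.1, 0.8, 0.2]}
print(find_matches([0.88, 0.12, 0.41], known))  # ['fraudster_17']
```

Any returned matches could then drive a downstream decision, such as whether to authorize the customer's requested transaction.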
  • Patent number: 8031881
    Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone, using a user's voice from the user's mouth. The user's voice is processed as received by at least one microphone to determine a frequency profile associated with voice of the user. Intervals are detected where the user is speaking using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during the intervals and when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.
    Type: Grant
    Filed: September 18, 2007
    Date of Patent: October 4, 2011
    Assignee: Starkey Laboratories, Inc.
    Inventor: Tao Zhang
  • Patent number: 8032373
    Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: October 4, 2011
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8024189
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Grant
    Filed: June 22, 2006
    Date of Patent: September 20, 2011
    Assignee: Microsoft Corporation
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Patent number: 8010358
    Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on the voice signal. Each voice recognition analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency and wherein each voice recognition analysis produces a recognition probability ri of recognition of one or more speech units, whereby there are two or more recognition probabilities ri. The maximum frequency and the minimum frequency may be adjusted every time speech is windowed and analyzed. A final recognition probability Pf is determined based on the two or more recognition probabilities ri.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 30, 2011
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
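The abstract leaves open how the final probability P_f is derived from the per-analysis probabilities r_i; a weighted average is one simple possibility, used here purely for illustration:

```python
def final_recognition_probability(r, weights=None):
    """Combine recognition probabilities r_i from parallel analyses,
    each run with a differently bounded filter bank, into a final
    probability P_f. A weighted average is an illustrative choice;
    the patent only states that P_f is determined from the r_i."""
    if weights is None:
        weights = [1.0 / len(r)] * len(r)
    return sum(w * ri for w, ri in zip(weights, r))

print(final_recognition_probability([0.7, 0.9, 0.8]))  # ~0.8
```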
  • Patent number: 8005674
    Abstract: A recognition model set is generated. A technique is described that uses the log-likelihood of real data as a cross-entropy measure of the mismatch between training data and a model derived from that training data, and compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change in cross entropy when deciding whether to add class-independent Gaussian Mixture Models (GMMs), the good performance of the class-dependent models is largely retained while the size and complexity of the model set decrease.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric W Janke, Bin Jia
  • Patent number: 8000971
    Abstract: Disclosed are systems and methods for training a barge-in model for speech processing in a spoken dialogue system, comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by allowing only speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in model for speech processing.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: August 16, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 7996213
    Abstract: A similarity degree estimation method is performed by two processes. In a first process, an inter-band correlation matrix is created from spectral data of an input voice such that the spectral data are divided into a plurality of discrete bands which are separated from each other with spaces therebetween along a frequency axis, a plurality of envelope components of the spectral data are obtained from the plurality of the discrete bands, and elements of the inter-band correlation matrix are correlation values between the respective envelope components of the input voice. In a second process, a degree of similarity is calculated between a pair of input voices to be compared with each other by using respective inter-band correlation matrices obtained for the pair of the input voices through the inter-band correlation matrix creation process.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: August 9, 2011
    Assignee: Yamaha Corporation
    Inventors: Mikio Tohyama, Michiko Kazama, Satoru Goto, Takehiko Kawahara, Yasuo Yoshioka
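The first process, building an inter-band correlation matrix from per-band envelope components, can be sketched with plain Pearson correlations (the envelope extraction itself is assumed already done):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def inter_band_correlation_matrix(band_envelopes):
    """Each row of `band_envelopes` is the spectral-envelope component
    extracted from one discrete frequency band (the bands being
    separated by gaps along the frequency axis). Element (i, j) is
    the correlation between the envelopes of bands i and j."""
    n = len(band_envelopes)
    return [[pearson(band_envelopes[i], band_envelopes[j])
             for j in range(n)] for i in range(n)]

env = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
m = inter_band_correlation_matrix(env)
print(round(m[0][1], 3), round(m[0][2], 3))  # 1.0 -1.0
```

The second process would then compare the two matrices obtained for a pair of input voices, e.g. by an element-wise distance, to score their similarity.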
  • Patent number: 7996222
    Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: August 9, 2011
    Assignee: Nokia Corporation
    Inventors: Jani K. Nurminen, Elina Helander
  • Patent number: 7994943
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: August 27, 2007
    Date of Patent: August 9, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
  • Patent number: 7983910
    Abstract: Communicating across channels with emotion preservation includes: receiving, by a processor in a communication device, a voice communication; analyzing, by the processor in the communication device, the voice communication for first emotion content; analyzing, by the processor in the communication device, textual content of the voice communication for second emotion content; and marking up, by the processor in the communication device, the textual content with emotion metadata for one of the first emotion content and the second emotion content.
    Type: Grant
    Filed: March 3, 2006
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
  • Patent number: 7983917
    Abstract: An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: July 19, 2011
    Assignee: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
  • Patent number: 7969329
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: June 28, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov, Sergey V. Kolomiets
  • Patent number: 7952497
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.
    Type: Grant
    Filed: May 6, 2009
    Date of Patent: May 31, 2011
    Assignee: Research In Motion Limited
    Inventors: Vadim Fux, Michael Elizarov
  • Publication number: 20110119059
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Application
    Filed: November 13, 2009
    Publication date: May 19, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Bernard S. RENGER, Steven Neil TISCHER
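The fallback order in this abstract is a simple three-tier lookup. A sketch, with dictionaries standing in for the model stores (the dictionary representation is an assumption):

```python
def select_speech_model(user_id, supervised, unsupervised, generic):
    """Model-selection fallback described in the abstract: prefer a
    user-specific supervised model, fall back to an unsupervised
    model, and finally to a generic model associated with the user."""
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic[user_id]

supervised = {"alice": "alice-supervised"}
unsupervised = {"bob": "bob-unsupervised"}
generic = {"alice": "generic", "bob": "generic", "carol": "generic"}
print(select_speech_model("bob", supervised, unsupervised, generic))
# bob-unsupervised
```

The selected model is then used to recognize the received speech, as the abstract's final steps describe.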
  • Publication number: 20110119060
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: November 15, 2009
    Publication date: May 19, 2011
    Applicant: International Business Machines Corporation
    Inventor: Hagai Aronowitz
  • Patent number: 7937269
    Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data to build and dynamically update training models that are used for classification, as well as incrementally clustering the data over contiguous segments of a continuous data stream (in real-time) into a plurality of micro-clusters from which target profiles are constructed which define/model the behavior of the data in individual segments of the data stream.
    Type: Grant
    Filed: August 22, 2005
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
  • Patent number: 7930179
    Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined and remodeled and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
    Type: Grant
    Filed: October 2, 2007
    Date of Patent: April 19, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
  • Publication number: 20110077943
    Abstract: A first system for generating a language model includes: a topic history dependent language model storing unit; a topic history accumulation unit; and a language score calculation unit. The language score calculation unit calculates a language score corresponding to the history of topics, using the topic history in an utterance accumulated in the topic history accumulation unit and the language model stored in the topic history dependent language model storing unit. The topic history dependent language model storing unit may store a language model dependent on only the most recent n topics, and the topic history accumulation unit may accumulate only the most recent n topics.
    Type: Application
    Filed: June 18, 2007
    Publication date: March 31, 2011
    Applicant: NEC CORPORATION
    Inventors: Kiyokazu Miki, Kentaro Nagatomo
  • Patent number: 7904295
    Abstract: Proposed is a text-independent automatic speaker recognition (ASkR) system that employs a new speech feature and a new classifier. The statistical feature pH is a vector of Hurst parameters obtained by applying a wavelet-based multi-dimensional estimator (M dim wavelets) to windowed short-time segments of speech. The proposed classifier for the speaker identification and verification tasks is based on the multi-dimensional fBm (fractional Brownian motion) model, denoted M dim fBm. For a given sequence of input speech features, the speaker model is obtained from the sequence of vectors of H parameters and the means and variances of these features.
    Type: Grant
    Filed: September 2, 2004
    Date of Patent: March 8, 2011
    Inventor: Rosangelo Fernandes Coelho
  • Patent number: 7895038
    Abstract: Speech enhancement techniques are provided for extemporaneous noise without a noise interval and for unknown extemporaneous noise, with a method of signal enhancement including: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter to reduce components of the noise signal in the input signal. A database of a signal model for the target signal, expressing a given feature by a given statistical model, is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to the output signal of the spectral subtraction means.
    Type: Grant
    Filed: May 26, 2008
    Date of Patent: February 22, 2011
    Assignee: International Business Machines Corporation
    Inventors: Masafumi Nishimura, Tetsuya Takiguchi
  • Patent number: 7881933
    Abstract: A device may include logic configured to receive voice data from a user, identify a result from the voice data, calculate a confidence score associated with the result, and determine a likely age range associated with the user based on the confidence score.
    Type: Grant
    Filed: March 23, 2007
    Date of Patent: February 1, 2011
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Kevin R. Witzman
  • Patent number: 7877254
    Abstract: The present invention provides a method and apparatus for enrollment and verification of speaker authentication. The method for enrollment of speaker authentication comprises: extracting an acoustic feature vector sequence from an enrollment utterance of a speaker; and generating a speaker template using the acoustic feature vector sequence. The step of extracting an acoustic feature vector sequence comprises: generating, based on the enrollment utterance, a filter bank for filtering the locations and energies of formants in the spectrum of the enrollment utterance; filtering the spectrum of the enrollment utterance with the generated filter bank; and generating the acoustic feature vector sequence from the filtered enrollment utterance.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: January 25, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Jian Luan, Pei Ding, Lei He, Jie Hao
  • Publication number: 20110004473
    Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result.
    Type: Application
    Filed: July 6, 2009
    Publication date: January 6, 2011
    Applicant: Nice Systems Ltd.
    Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
  • Patent number: 7864987
    Abstract: In one embodiment, an access system first determines that someone has correct credentials by using a non-biometric authentication method, such as typing in a password, presenting a smart card containing a cryptographic secret, or providing a valid digital signature. Once the credentials are authenticated, the user must pass at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check that a template generated from the user who desires access matches the stored templates of the holder of the credentials authenticated by the non-biometric test. Access is desirably allowed when both biometric tests are passed.
    Type: Grant
    Filed: April 18, 2006
    Date of Patent: January 4, 2011
    Assignee: Infosys Technologies Ltd.
    Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
  • Patent number: 7853450
    Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: December 14, 2010
    Assignee: Alcatel-Lucent USA Inc.
    Inventor: Bryan Kadel
  • Publication number: 20100268538
    Abstract: Disclosed are an electronic apparatus and a voice recognition method for the same. The voice recognition method for the electronic apparatus includes: receiving an input voice of a user; determining characteristics of the user; and recognizing the input voice based on the determined characteristics of the user.
    Type: Application
    Filed: January 7, 2010
    Publication date: October 21, 2010
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Hee-seob RYU, Seung-kwon PARK, Jong-ho LEA, Jong-hyuk JANG
  • Patent number: 7813927
    Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator is for creating a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer is for normalizing Gaussian weights with respect to the plurality of HMM states.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: October 12, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
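The pooling step this abstract names — collecting Gaussians from multiple HMM states into one GMM and normalizing the Gaussian weights with respect to the pooled states — can be sketched with scalar Gaussians (the tuple representation is an assumption for illustration):

```python
def pool_states(hmm_states):
    """Pool the Gaussians of several HMM states into one GMM, then
    normalize the mixture weights so they sum to 1 across all pooled states.
    Each state is a list of (weight, mean, variance) tuples."""
    pooled = [g for state in hmm_states for g in state]
    total = sum(w for (w, _m, _v) in pooled)
    return [(w / total, m, v) for (w, m, v) in pooled]

states = [
    [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)],  # HMM state A
    [(1.0, 5.0, 2.0)],                   # HMM state B
]
gmm = pool_states(states)
```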
  • Patent number: 7809562
    Abstract: A voice recognition system has a recognition dictionary storing voice information, a primary voice recognition means for performing primary voice recognition in response to input voice information pronounced by a user by the use of the recognition dictionary, and a recognition result judging means for deciding whether the primary voice recognition result is to be accepted or rejected. The voice recognition system includes a transceiver means for sending the input voice information of the user to an additional voice recognition means when the primary voice recognition result is rejected by the recognition result judging means and for receiving a secondary voice recognition result produced as a result of secondary voice recognition by the additional voice recognition means, and a recognition result output means for outputting the primary or secondary voice recognition result to an exterior of the voice recognition system.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: October 5, 2010
    Assignee: NEC Corporation
    Inventor: Ken Hanazawa
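The primary-then-secondary control flow of this abstract can be sketched as below; the recognizers and the acceptance test are stand-in callables, not the patent's components:

```python
def recognize(audio, primary, secondary, accept):
    """Run the primary recognizer; if its result is rejected, forward the
    input to the additional recognizer and return that result instead."""
    result = primary(audio)
    if accept(result):
        return result
    return secondary(audio)  # e.g. shipped over the transceiver means

primary = lambda a: ("hello", 0.4)          # local, low confidence
secondary = lambda a: ("hello world", 0.9)  # additional, more capable
accept = lambda r: r[1] >= 0.6              # reject below 0.6 confidence
```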
  • Publication number: 20100223057
    Abstract: System and process for audio authentication of an individual or speaker, including a processor for decomposing an audio signal received at the sensor into vectors representative of the speaker to be authenticated; for transforming the super-vector V of the speaker, which results from the concatenation of the vectors associated with the said speaker, into binary data 1001100 . . . 0, taking as an input the mean super-vector M and comparing the super-vector V of the speaker with the mean super-vector M; the binary data thus obtained being transmitted to a module for extracting the speaker authentication, which takes as an input the public keys Kpub(1), in order to authenticate the speaker and/or to generate a cryptographic key associated with the speaker.
    Type: Application
    Filed: December 22, 2009
    Publication date: September 2, 2010
    Applicant: Thales
    Inventors: François Capman, Sandra Marcello, Jean Martinelli
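The binarization step in this abstract — comparing the speaker super-vector V against the mean super-vector M to obtain binary data — can be sketched as a component-wise comparison (the specific comparison rule is an assumption for illustration):

```python
def binarize_supervector(v, m):
    """Compare speaker super-vector V component-wise against the mean
    super-vector M, producing the binary string used downstream."""
    return "".join("1" if vi > mi else "0" for vi, mi in zip(v, m))

bits = binarize_supervector([1.2, 0.1, 3.0], [1.0, 0.5, 2.0])
```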
  • Patent number: 7788101
    Abstract: Embodiments of a system and method for verifying an identity of a claimant are described. In accordance with one embodiment, a feature may be extracted from a biometric sample captured from a claimant claiming an identity. The extracted feature may be compared to a template associated with the identity to determine the similarity between the extracted feature and the template, with the similarity between them being represented by a score. A determination may be made as to whether the identity has a correction factor associated therewith. If the identity is determined to have a correction factor associated therewith, then the score may be modified using the correction factor. The score may then be compared to a threshold to determine whether to accept the claimant as the identity.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: August 31, 2010
    Assignee: Hitachi, Ltd.
    Inventor: Clifford Tavares
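The score-correction decision in this abstract reduces to a few lines; the multiplicative form of the correction factor is an assumption for illustration, not specified by the patent:

```python
def verify(score, threshold, correction=None):
    """Modify the similarity score with the identity's correction factor
    (if one exists), then accept iff the score clears the threshold."""
    if correction is not None:
        score *= correction  # multiplicative correction is an assumption
    return score >= threshold
```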
  • Patent number: 7788095
    Abstract: A method and apparatus for indexing one or more audio signals using a speech to text engine and a phoneme detection engine, and generating a combined lattice comprising a text part and a phoneme part. A word to be searched is first looked up in the text part; if it is not found, or is found with low certainty, it is divided into phonemes and searched for in the phoneme part of the lattice.
    Type: Grant
    Filed: November 18, 2007
    Date of Patent: August 31, 2010
    Assignee: Nice Systems, Ltd.
    Inventors: Moshe Wasserblant, Barak Eilam, Yuval Lubowich, Maor Nissan
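The fallback search this abstract describes — text part first, phoneme part when the word is missing or low-confidence — can be sketched as below; the lattice representation and the `g2p` (grapheme-to-phoneme) table are illustrative assumptions:

```python
def search(lattice, word, g2p, min_conf=0.7):
    """Look the word up in the text part first; fall back to a phoneme
    search when it is missing or matched with low certainty."""
    conf = lattice["text"].get(word)
    if conf is not None and conf >= min_conf:
        return ("text", conf)
    if g2p[word] in lattice["phonemes"]:
        return ("phoneme", None)
    return None

lattice = {"text": {"hello": 0.9, "world": 0.4},
           "phonemes": "hh ah l ow . w er l d"}
g2p = {"hello": "hh ah l ow", "world": "w er l d"}
```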
  • Publication number: 20100217595
    Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and the remaining emotions are classified using a GMM to which a discriminative weight is applied, chosen to minimize the loss function of the classification error for the emotion-recognition feature vector. By applying a discriminative weight, evaluated using the minimum-classification-error GMM, to the feature vectors of emotions that are difficult to classify, the performance of emotion recognition is enhanced.
    Type: Application
    Filed: February 23, 2010
    Publication date: August 26, 2010
    Applicants: KOREA INSTITUTE OF SCIENCE AND TECHNOLOGY, Electronics and Telecommunications Research Institute
    Inventors: Hyoung Gon KIM, Ig Jae KIM, Joon-Hyuk CHANG, Kye Hwan LEE, Chang Seok BAE
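GMM classification with a per-class discriminative weight, as in this abstract, can be sketched with one-dimensional Gaussians. The additive form of the weight and all names are assumptions for illustration; the patent's weights are trained under a minimum-classification-error criterion, which this sketch omits:

```python
import math

def gauss_ll(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(x, models, disc_weights):
    """Score each emotion's GMM, add its discriminative weight, and
    return the best-scoring emotion."""
    def score(emotion, comps):
        ll = math.log(sum(w * math.exp(gauss_ll(x, m, v)) for (w, m, v) in comps))
        return ll + disc_weights.get(emotion, 0.0)
    return max(models, key=lambda e: score(e, models[e]))

models = {"neutral": [(1.0, 0.0, 1.0)], "angry": [(1.0, 3.0, 1.0)]}
```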
  • Publication number: 20100211376
    Abstract: Computer implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
    Type: Application
    Filed: February 2, 2010
    Publication date: August 19, 2010
    Applicant: Sony Computer Entertainment Inc.
    Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
  • Patent number: 7778831
    Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch and one or more acoustic model parameters are adjusted based on a categorization of the speaker. The parameter adjustment may be performed at any instant during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: August 17, 2010
    Assignee: Sony Computer Entertainment Inc.
    Inventor: Ruxin Chen
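The pitch-then-categorize step of this abstract can be sketched with a crude autocorrelation pitch estimator; the category thresholds and all names are assumptions for illustration, not the patent's values:

```python
import math

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Crude autocorrelation pitch estimate over one frame (sketch only)."""
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    best_lag, best_corr = lo, float("-inf")
    for lag in range(lo, min(hi, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag

def categorize(pitch_hz):
    """Map runtime pitch to a coarse speaker category (thresholds assumed)."""
    if pitch_hz < 160:
        return "low-pitch"
    if pitch_hz < 255:
        return "mid-pitch"
    return "high-pitch"

sr = 8000
frame = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]  # 200 Hz tone
```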
  • Patent number: 7778832
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: September 26, 2007
    Date of Patent: August 17, 2010
    Assignee: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
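The one-to-many comparison this abstract describes can be sketched as scoring a caller's voice print against every enrolled print; representing voice prints as vectors and scoring with cosine similarity are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_voiceprint(probe, known_prints, threshold=0.85):
    """Compare one caller print against many enrolled prints; return the
    identities whose similarity clears the threshold (likely same person)."""
    return [name for name, vp in known_prints.items()
            if cosine(probe, vp) >= threshold]

known = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.0]}
```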
  • Publication number: 20100205120
    Abstract: A method for researching and developing a recognition model in a computing environment, including gathering one or more data samples from one or more users in the computing environment into a training data set used for creating the recognition model, receiving one or more training parameters defining a feature extraction algorithm configured to analyze one or more features of the training data set, a classifier algorithm configured to associate the features to a template set, a selection of a subset of the training data set, a type of the data samples, or combinations thereof, creating the recognition model based on the training parameters, and evaluating the recognition model.
    Type: Application
    Filed: February 6, 2009
    Publication date: August 12, 2010
    Applicant: Microsoft Corporation
    Inventors: Yu Zou, Hao Wei, Gong Cheng, Dongmei Zhang, Jian Wang
  • Publication number: 20100204993
    Abstract: The present invention relates to a system and method of making a verification decision within a speaker recognition system. A speech sample is gathered from a speaker over a period of time, and a verification score is then produced for said sample over the period. Once the verification score is determined, a confidence measure is produced based on frame score observations from said sample over the period, calculated using the standard Gaussian distribution. If the confidence measure indicates, with a set level of confidence, that the verification score is below the verification threshold, the speaker is rejected and the gathering process is terminated.
    Type: Application
    Filed: December 19, 2007
    Publication date: August 12, 2010
    Inventor: Robert VOGT
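The early-rejection rule in this abstract can be sketched using a normal approximation of the running mean of frame scores; the one-sided confidence-bound formulation and all names are assumptions for illustration:

```python
import math

def early_reject(frame_scores, threshold, confidence=0.95):
    """Reject early if, at the requested confidence level, even the upper
    confidence bound on the mean frame score falls below the threshold."""
    n = len(frame_scores)
    if n < 2:
        return False
    mean = sum(frame_scores) / n
    var = sum((s - mean) ** 2 for s in frame_scores) / (n - 1)
    stderr = math.sqrt(var / n)
    z = 1.645 if confidence == 0.95 else 2.326  # one-sided normal quantiles
    return mean + z * stderr < threshold
```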
  • Publication number: 20100198598
    Abstract: A method for recognizing a speaker of an utterance in a speech recognition system is disclosed. A likelihood score is determined for each of a plurality of speaker models for different speakers, indicating how well that speaker model corresponds to the utterance. For each of the plurality of speaker models, a probability that the utterance originates from that speaker is then determined. The probability is based on the likelihood score for the speaker model and requires the estimation of a distribution of likelihood scores expected, based at least in part on the training state of the speaker.
    Type: Application
    Filed: February 4, 2010
    Publication date: August 5, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Franz Gerl
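Turning per-speaker likelihood scores into probabilities, as this abstract describes, can be sketched with a softmax over log-likelihoods under a uniform prior; the patent additionally models the expected score distribution per training state, which this sketch omits:

```python
import math

def speaker_posteriors(log_likelihoods):
    """Convert per-speaker log-likelihood scores into posterior
    probabilities (uniform prior assumed; max-shifted for stability)."""
    m = max(log_likelihoods.values())
    exp = {s: math.exp(ll - m) for s, ll in log_likelihoods.items()}
    z = sum(exp.values())
    return {s: e / z for s, e in exp.items()}

posts = speaker_posteriors({"alice": -10.0, "bob": -12.0})
```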
  • Patent number: 7769583
    Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector subject to a maximum number of bits available across all dimensions.
    Type: Grant
    Filed: May 13, 2006
    Date of Patent: August 3, 2010
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
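The per-dimension bit-allocation idea in this abstract can be sketched with a standard greedy high-rate heuristic, used here as an illustrative stand-in for the patent's log-likelihood-ratio criterion (each extra bit roughly quarters a dimension's quantization distortion):

```python
def allocate_bits(variances, total_bits):
    """Greedy bit allocation: repeatedly give one bit to the dimension
    whose current distortion estimate (variance / 4**bits) is largest,
    subject to a fixed total-bit budget across all dimensions."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        dist = [v / (4 ** b) for v, b in zip(variances, bits)]
        bits[dist.index(max(dist))] += 1
    return bits
```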