Specialized Models Patents (Class 704/250)
- Patent number: 8108212
  Abstract: A speech recognition method comprises a model selection step, which selects a recognition model based on characteristic information of the input speech, and a speech recognition step, which translates the input speech into text data based on the selected recognition model.
  Type: Grant
  Filed: October 30, 2007
  Date of Patent: January 31, 2012
  Assignee: NEC Corporation
  Inventor: Shuhei Maegawa
- Patent number: 8099278
  Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device that calculates the confidence score may be tuned to increase the likelihood of recognition of voice data for a particular age range of callers.
  Type: Grant
  Filed: December 22, 2010
  Date of Patent: January 17, 2012
  Assignee: Verizon Patent and Licensing Inc.
  Inventor: Kevin R. Witzman
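The core idea of the abstract above — a recognizer tuned for one age range will score other age ranges poorly, so the confidence score itself hints at the caller's age — can be sketched as follows. The threshold values and labels are purely illustrative assumptions, not taken from the patent.

```python
def likely_age_range(confidence: float) -> str:
    """Map a recognition confidence score in [0, 1] to a likely age range.

    Assumes the recognizer was tuned for adult speech, so a high score
    suggests an adult speaker and a low score suggests a poor model fit,
    e.g. a child's voice. Thresholds are hypothetical.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.8:
        return "adult"
    if confidence >= 0.5:
        return "teen"
    return "child"
```

In practice the thresholds would be calibrated on labeled calls rather than fixed by hand.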
- Patent number: 8099288
  Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, with the acoustic model of the speaker-independent recognizer serving as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
  Type: Grant
  Filed: February 12, 2007
  Date of Patent: January 17, 2012
  Assignee: Microsoft Corp.
  Inventors: Zhengyou Zhang, Amarnag Subramaya
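The weighted sum of sub-unit likelihood ratios described above can be sketched in a few lines. The log-likelihood ratios, weights, and decision threshold are assumed inputs; how the patent actually derives the weights is not specified here.

```python
def utterance_score(subunit_llrs, weights):
    """Weighted average of per-sub-unit log-likelihood ratios.

    subunit_llrs[i] = log p(x_i | speaker) - log p(x_i | background),
    one entry per word, tri-phone, or phone in the utterance.
    """
    assert len(subunit_llrs) == len(weights)
    return sum(w * llr for w, llr in zip(weights, subunit_llrs)) / sum(weights)

def verify(subunit_llrs, weights, threshold=0.0):
    """Accept the claimed speaker if the weighted score exceeds a threshold."""
    return utterance_score(subunit_llrs, weights) > threshold
```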
- Patent number: 8099290
  Abstract: A voice recognition unit creates a voice label string for each language from an input voice uttered by a user, on the basis of a feature vector time series of the input voice and data from a sound standard model, and registers the voice label string into a voice label memory 2. A first language switching unit SW1 and a second language switching unit SW2 automatically switch among the languages of a sound standard model memory 1 used to create the voice label string, and among the languages of the voice label memory 2 holding the created voice label string.
  Type: Grant
  Filed: October 20, 2009
  Date of Patent: January 17, 2012
  Assignee: Mitsubishi Electric Corporation
  Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
- Publication number: 20120010887
  Abstract: Embodiments include a speech recognition system and a personal speech profile data (PSPD) storage device that is physically distinct from the speech recognition system. In the speech recognition system, a PSPD interface receives voice training data, which is associated with an individual, from the PSPD storage device. A speech input module produces a digital speech signal derived from an utterance made by a system user. A speech processing module accesses voice training data stored on the PSPD storage device through the PSPD interface, and executes a speech processing algorithm that analyzes the digital speech signal using the voice training data, in order to identify one or more recognized terms from the digital speech signal. A command processing module initiates execution of various applications based on the recognized terms. Embodiments may be implemented in various types of host systems, including an aircraft cockpit-based system.
  Type: Application
  Filed: July 8, 2010
  Publication date: January 12, 2012
  Applicant: HONEYWELL INTERNATIONAL INC.
  Inventors: Lokesh Rayasandra Boregowda, Meruva Jayaprakash, Koushik Sinha
- Patent number: 8086455
  Abstract: A recognition (e.g., speech, handwriting, etc.) model build process that is declarative and data-dependence-based. Process steps are defined in a declarative language as individual processors having input/output data relationships and data dependencies on predecessor and subsequent process steps. A compiler is utilized to generate the model building sequence. The compiler uses the input and output data files of each model build processor to determine the sequence of model building, and automatically orders the processing steps based on the declared input/output relationships (the user does not need to determine the order of execution). The compiler also automatically detects ill-defined processes, including cyclic definitions and data being produced by more than one action. The user can add, change, and/or modify a process by editing a declaration file and rerunning the compiler, whereby a new process is automatically generated.
  Type: Grant
  Filed: January 9, 2008
  Date of Patent: December 27, 2011
  Assignee: Microsoft Corporation
  Inventors: Yifan Gong, Ye Tian
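The compiler behavior described above — ordering build steps from declared inputs/outputs, and rejecting cyclic definitions and doubly-produced data — is essentially a topological sort over a dependency graph. A minimal sketch, with a hypothetical in-memory representation of the declaration file:

```python
def compile_build_order(processors):
    """Derive an execution order from declared I/O relationships.

    processors: dict mapping processor name -> (inputs, outputs), where
    inputs and outputs are lists of data-file names. Raises ValueError on
    a cyclic definition or on data produced by more than one action.
    """
    producer = {}
    for name, (ins, outs) in processors.items():
        for out in outs:
            if out in producer:
                raise ValueError(f"{out} produced by both {producer[out]} and {name}")
            producer[out] = name

    # Edge: the producer of an input must run before its consumer.
    deps = {name: {producer[i] for i in ins if i in producer}
            for name, (ins, _) in processors.items()}

    order, done, visiting = [], set(), set()

    def visit(n):
        if n in done:
            return
        if n in visiting:
            raise ValueError(f"cyclic definition involving {n}")
        visiting.add(n)
        for d in deps[n]:
            visit(d)
        visiting.discard(n)
        done.add(n)
        order.append(n)

    for n in processors:
        visit(n)
    return order
```

Adding a step is then just adding one dict entry and recompiling, mirroring the patent's edit-and-rerun workflow.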
- Patent number: 8078462
  Abstract: A transformation-parameter calculating unit calculates a first model parameter of a speaker model that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each speaker, the distribution of the clean feature corresponding to that speaker's identification information into the distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to each speaker's identification information by using the transformation parameter, and calculates a second model parameter of the speaker model that maximizes a second likelihood for the transformed noisy feature.
  Type: Grant
  Filed: October 2, 2008
  Date of Patent: December 13, 2011
  Assignee: Kabushiki Kaisha Toshiba
  Inventors: Yusuke Shinohara, Masami Akamine
- Patent number: 8078465
  Abstract: Certain aspects and embodiments of the present invention are directed to systems and methods for monitoring and analyzing the language environment and the development of a key child. A key child's language environment and language development can be monitored without placing artificial limitations on the key child's activities or requiring a third-party observer. The language environment can be analyzed to identify words, vocalizations, or other noises directed to or spoken by the key child, independent of content. The analysis can include the number of responses between the child and another, such as an adult, and the number of words spoken by the child and/or another, independent of the content of the speech. One or more metrics can be determined based on the analysis and provided to assist in improving the language environment and/or tracking language development of the key child.
  Type: Grant
  Filed: January 23, 2008
  Date of Patent: December 13, 2011
  Assignee: LENA Foundation
  Inventors: Terrance Paul, Dongxin Xu, Umit Yapenel, Sharmistha Gray
- Publication number: 20110301953
  Abstract: Provided is a voice recognition system that adapts a speaker's voice, feature by feature, to a basic voice model and to new independent multi-models, stores the results, and provides stable real-time voice recognition using the multi-adaptive model.
  Type: Application
  Filed: April 11, 2011
  Publication date: December 8, 2011
  Applicant: Seoby Electronic Co., Ltd
  Inventor: Sung-Sub Lee
- Patent number: 8050922
  Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. The speaker is categorized as a male, female, or child, and the categorization is used as a basis for dynamically adjusting a maximum frequency fmax and a minimum frequency fmin of a filter bank used for processing the input utterance to produce an output. Corresponding gender- or age-specific acoustic models are used to perform voice recognition based on the filter bank output.
  Type: Grant
  Filed: July 21, 2010
  Date of Patent: November 1, 2011
  Assignee: Sony Computer Entertainment Inc.
  Inventor: Ruxin Chen
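The category-dependent fmax/fmin adjustment described above can be sketched as a mel-spaced filter bank whose band edges depend on the speaker category. The bound values per category are illustrative assumptions (children's voices carry energy at higher frequencies, so their edges shift upward), not figures from the patent.

```python
import math

# Hypothetical category-dependent band edges (fmin, fmax) in Hz.
FILTER_BANK_BOUNDS = {
    "male":   (70.0, 7000.0),
    "female": (100.0, 7500.0),
    "child":  (150.0, 8000.0),
}

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_bank_centers(category, n_filters=26):
    """Mel-spaced filter center frequencies between the category's fmin/fmax."""
    fmin, fmax = FILTER_BANK_BOUNDS[category]
    mmin, mmax = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (mmax - mmin) / (n_filters + 1)
    return [mel_to_hz(mmin + step * (i + 1)) for i in range(n_filters)]
```

After categorizing the speaker, the recognizer would compute features with the matching filter bank and score them against the matching acoustic model.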
- Patent number: 8036892
  Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
  Type: Grant
  Filed: July 8, 2010
  Date of Patent: October 11, 2011
  Assignee: American Express Travel Related Services Company, Inc.
  Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
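The server-side one-to-many comparison can be sketched with voice prints represented as fixed-length embedding vectors and cosine similarity as the match score. The representation, the similarity measure, and the threshold are all assumptions for illustration; the patent does not commit to a particular voice-print format.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_matches(query_print, known_prints, threshold=0.85):
    """One-to-many comparison: return the identities whose stored voice
    print is similar enough to the caller's print to count as a match."""
    return [name for name, vp in known_prints.items()
            if cosine(query_print, vp) >= threshold]
```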
- Patent number: 8031881
  Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone using a user's voice from the user's mouth. The user's voice, as received by at least one microphone, is processed to determine a frequency profile associated with the voice of the user. Intervals where the user is speaking are detected using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during those intervals, when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.
  Type: Grant
  Filed: September 18, 2007
  Date of Patent: October 4, 2011
  Assignee: Starkey Laboratories, Inc.
  Inventor: Tao Zhang
- Patent number: 8032373
  Abstract: A system and method for enabling two computer systems to communicate over an audio communications channel, such as a voice telephony connection. Such a system includes a software application that enables a user's computer to call, interrogate, download, and manage a voicemail account stored on a telephone company's computer, without human intervention. A voicemail retrieved from the telephone company's computer can be stored in a digital format on the user's computer. In such a format, the voicemail can be readily archived, or even distributed throughout a network, such as the Internet, in a digital form, such as an email attachment. Preferably, a computationally efficient audio recognition algorithm is employed by the user's computer to respond to and navigate the automated audio menu of the telephone company's computer.
  Type: Grant
  Filed: February 28, 2007
  Date of Patent: October 4, 2011
  Assignee: Intellisist, Inc.
  Inventor: Martin R. M. Dunsmuir
- Patent number: 8024189
  Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
  Type: Grant
  Filed: June 22, 2006
  Date of Patent: September 20, 2011
  Assignee: Microsoft Corporation
  Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
- Patent number: 8010358
  Abstract: Methods and apparatus for voice recognition are disclosed. A voice signal is obtained and two or more voice recognition analyses are performed on it. Each analysis uses a filter bank defined by a different maximum frequency and a different minimum frequency, and each produces a recognition probability ri for one or more speech units, giving two or more recognition probabilities ri. The maximum and minimum frequencies may be adjusted every time speech is windowed and analyzed. A final recognition probability Pf is determined based on the two or more recognition probabilities ri.
  Type: Grant
  Filed: February 21, 2006
  Date of Patent: August 30, 2011
  Assignee: Sony Computer Entertainment Inc.
  Inventor: Ruxin Chen
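One simple way to combine the per-analysis probabilities ri into a final probability Pf is a normalized weighted average; the patent leaves the combination rule open, so this particular rule is an assumption for illustration.

```python
def combine_recognition(probabilities, weights=None):
    """Combine per-analysis recognition probabilities r_i into a final
    probability P_f as a normalized weighted average. With no weights
    given, this is the plain mean of the r_i."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(w * r for w, r in zip(weights, probabilities)) / sum(weights)
```

The weights could, for instance, favor the filter-bank configuration that best matches the estimated speaker category.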
- Patent number: 8005674
  Abstract: A recognition model set is generated. A technique is described that uses the log likelihood of real data as a cross-entropy measure of the mismatch between training data and a model derived from that data, and compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change of cross entropies in the decision to add class-independent Gaussian Mixture Models (GMMs), the good performance of the class-dependent models is largely retained, while the size and complexity of the model decrease.
  Type: Grant
  Filed: July 10, 2007
  Date of Patent: August 23, 2011
  Assignee: International Business Machines Corporation
  Inventors: Eric W Janke, Bin Jia
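The replacement decision above can be sketched by comparing empirical cross entropies (mean negative log likelihoods of the data under each model). This is a simplified reading of the technique; the actual decision criterion in the patent may differ.

```python
def avg_neg_log_likelihood(loglikes):
    """Empirical cross entropy of data under a model: mean of -log p(x)."""
    return -sum(loglikes) / len(loglikes)

def keep_class_dependent(dep_loglikes, indep_loglikes, margin=0.0):
    """Keep the class-dependent model only if it lowers the cross entropy
    (fits the class's data better) by more than `margin`; otherwise fall
    back to the smaller class-independent GMM."""
    h_dep = avg_neg_log_likelihood(dep_loglikes)
    h_indep = avg_neg_log_likelihood(indep_loglikes)
    return (h_indep - h_dep) > margin
```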
- Patent number: 8000971
  Abstract: Disclosed are systems and methods for training a barge-in model for speech processing in a spoken dialogue system, comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by allowing only speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in model for speech processing.
  Type: Grant
  Filed: October 31, 2007
  Date of Patent: August 16, 2011
  Assignee: AT&T Intellectual Property I, L.P.
  Inventor: Andrej Ljolje
- Patent number: 7996213
  Abstract: A similarity degree estimation method is performed in two processes. In the first process, an inter-band correlation matrix is created from spectral data of an input voice: the spectral data are divided into a plurality of discrete bands separated from each other with spaces along a frequency axis, a plurality of envelope components of the spectral data are obtained from the discrete bands, and the elements of the inter-band correlation matrix are correlation values between the respective envelope components of the input voice. In the second process, a degree of similarity is calculated between a pair of input voices to be compared with each other by using the respective inter-band correlation matrices obtained for the pair of input voices through the matrix creation process.
  Type: Grant
  Filed: March 20, 2007
  Date of Patent: August 9, 2011
  Assignee: Yamaha Corporation
  Inventors: Mikio Tohyama, Michiko Kazama, Satoru Goto, Takehiko Kawahara, Yasuo Yoshioka
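The first process above — building a matrix whose (i, j) element is the correlation between the envelope components of bands i and j — can be sketched directly. The envelope time series are assumed inputs; how they are extracted from the spectrum is not shown.

```python
import math

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def inter_band_correlation_matrix(band_envelopes):
    """band_envelopes: one envelope time series per discrete frequency band.
    Element (i, j) is the correlation between the envelopes of bands i and j."""
    n = len(band_envelopes)
    return [[correlation(band_envelopes[i], band_envelopes[j])
             for j in range(n)] for i in range(n)]
```

The second process would then compare two such matrices, e.g. element-wise, to score voice similarity.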
- Patent number: 7996222
  Abstract: A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
  Type: Grant
  Filed: September 29, 2006
  Date of Patent: August 9, 2011
  Assignee: Nokia Corporation
  Inventors: Jani K. Nurminen, Elina Helander
- Patent number: 7994943
  Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
  Type: Grant
  Filed: August 27, 2007
  Date of Patent: August 9, 2011
  Assignee: Research In Motion Limited
  Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
- Patent number: 7983910
  Abstract: Communicating across channels with emotion preservation includes: receiving, by a processor in a communication device, a voice communication; analyzing, by the processor, the voice communication for first emotion content; analyzing, by the processor, textual content of the voice communication for second emotion content; and marking up, by the processor, the textual content with emotion metadata for one of the first emotion content and the second emotion content.
  Type: Grant
  Filed: March 3, 2006
  Date of Patent: July 19, 2011
  Assignee: International Business Machines Corporation
  Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
- Patent number: 7983917
  Abstract: An enhanced system for speech interpretation is provided. The system may include receiving a user verbalization and generating one or more preliminary interpretations of the verbalization by identifying one or more phonemes in the verbalization. An acoustic grammar may be used to map the phonemes to syllables or words, and the acoustic grammar may include one or more linking elements to reduce a search space associated with the grammar. The preliminary interpretations may be subject to various post-processing techniques to sharpen accuracy of the preliminary interpretation. A heuristic model may assign weights to various parameters based on a context, a user profile, or other domain knowledge. A probable interpretation may be identified based on a confidence score for each of a set of candidate interpretations generated by the heuristic model. The model may be augmented or updated based on various information associated with the interpretation of the verbalization.
  Type: Grant
  Filed: October 29, 2009
  Date of Patent: July 19, 2011
  Assignee: VoiceBox Technologies, Inc.
  Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
- Patent number: 7969329
  Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
  Type: Grant
  Filed: October 31, 2007
  Date of Patent: June 28, 2011
  Assignee: Research In Motion Limited
  Inventors: Vadim Fux, Michael Elizarov, Sergey V. Kolomiets
- Patent number: 7952497
  Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria.
  Type: Grant
  Filed: May 6, 2009
  Date of Patent: May 31, 2011
  Assignee: Research In Motion Limited
  Inventors: Vadim Fux, Michael Elizarov
- Publication number: 20110119059
  Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user and, if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and an unsupervised speech model is available, the system retrieves the unsupervised speech model. If both the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next, the system recognizes the received speech with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
  Type: Application
  Filed: November 13, 2009
  Publication date: May 19, 2011
  Applicant: AT&T Intellectual Property I, L.P.
  Inventors: Andrej LJOLJE, Bernard S. RENGER, Steven Neil TISCHER
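The fallback chain in the abstract above (user-specific supervised model, else unsupervised model, else generic) maps directly onto a few lines of lookup logic. The dict-based model store is a hypothetical representation for the sketch.

```python
def select_speech_model(user_id, supervised, unsupervised, generic):
    """Model selection per the disclosed fallback order.

    supervised / unsupervised: dicts mapping user id -> model;
    generic: the always-available default model.
    """
    if user_id in supervised:
        return supervised[user_id]
    if user_id in unsupervised:
        return unsupervised[user_id]
    return generic
```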
- Publication number: 20110119060
  Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
  Type: Application
  Filed: November 15, 2009
  Publication date: May 19, 2011
  Applicant: International Business Machines Corporation
  Inventor: Hagai Aronowitz
- Patent number: 7937269
  Abstract: Systems and methods are provided for real-time classification of streaming data. In particular, systems and methods for real-time classification of continuous data streams implement micro-clustering methods for offline and online processing of training data, building and dynamically updating the training models used for classification. The data over contiguous segments of a continuous data stream is incrementally clustered in real time into a plurality of micro-clusters, from which target profiles are constructed that define and model the behavior of the data in individual segments of the stream.
  Type: Grant
  Filed: August 22, 2005
  Date of Patent: May 3, 2011
  Assignee: International Business Machines Corporation
  Inventors: Charu Chandra Aggarwal, Philip Shilung Yu
- Patent number: 7930179
  Abstract: Systems and methods for unsupervised segmentation of multi-speaker speech or audio data by speaker. A front-end analysis is applied to input speech data to obtain feature vectors. The speech data is initially segmented and then clustered into groups of segments that correspond to different speakers. The clusters are iteratively modeled and resegmented to obtain stable speaker segmentations. The overlap between segmentation sets is checked to ensure successful speaker segmentation. Overlapping segments are combined, remodeled, and resegmented. Optionally, the speech data is processed to produce a segmentation lattice to maximize the overall segmentation likelihood.
  Type: Grant
  Filed: October 2, 2007
  Date of Patent: April 19, 2011
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Allen Louis Gorin, Zhu Liu, Sarangarajan Parthasarathy, Aaron Edward Rosenberg
- Publication number: 20110077943
  Abstract: A first system for generating a language model includes a topic history dependent language model storing unit, a topic history accumulation unit, and a language score calculation unit. The language score calculation unit calculates a language score corresponding to a history of topics, using the topic history of an utterance accumulated in the topic history accumulation unit and a language model stored in the topic history dependent language model storing unit. The topic history dependent language model storing unit may store a topic history dependent language model dependent on only the most recent n topics, and the topic history accumulation unit may accumulate only the most recent n topics.
  Type: Application
  Filed: June 18, 2007
  Publication date: March 31, 2011
  Applicant: NEC CORPORATION
  Inventors: Kiyokazu Miki, Kentaro Nagatomo
- Patent number: 7904295
  Abstract: A text-independent automatic speaker recognition (ASkR) system is proposed which employs a new speech feature and a new classifier. The statistical feature pH is a vector of Hurst parameters obtained by applying a wavelet-based multi-dimensional estimator (M dim wavelets) to windowed short-time segments of speech. The proposed classifier for the speaker identification and verification tasks is based on the multi-dimensional fBm (fractional Brownian motion) model, denoted M dim fBm. For a given sequence of input speech features, the speaker model is obtained from the sequence of vectors of H parameters, and the means and variances of these features.
  Type: Grant
  Filed: September 2, 2004
  Date of Patent: March 8, 2011
  Inventor: Rosangelo Fernandes Coelho
- Patent number: 7895038
  Abstract: Speech enhancement techniques are provided for extemporaneous noise without a noise interval and for unknown extemporaneous noise. A method of signal enhancement includes subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. A database of a signal model concerning the target signal, expressing a given feature by a given statistical model, is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.
  Type: Grant
  Filed: May 26, 2008
  Date of Patent: February 22, 2011
  Assignee: International Business Machines Corporation
  Inventors: Masafumi Nishimura, Tetsuya Takiguchi
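The spectral subtraction step above follows a standard formulation: subtract the reference (noise) magnitude from the input magnitude per frequency bin, clamping to a small spectral floor so magnitudes never go negative. The floor factor is a common convention, not a value from the patent, and the patent's adaptive filter on the reference signal sits on top of this basic step.

```python
def spectral_subtraction(signal_mag, noise_mag, floor=0.01):
    """Per-bin magnitude spectral subtraction with a spectral floor.

    signal_mag, noise_mag: magnitude spectra of the input and reference
    signals for one frame. Bins where the noise estimate exceeds the
    signal are clamped to floor * signal_mag to avoid negative values.
    """
    return [max(s - n, floor * s) for s, n in zip(signal_mag, noise_mag)]
```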
- Patent number: 7881933
  Abstract: A device may include logic configured to receive voice data from a user, identify a result from the voice data, calculate a confidence score associated with the result, and determine a likely age range associated with the user based on the confidence score.
  Type: Grant
  Filed: March 23, 2007
  Date of Patent: February 1, 2011
  Assignee: Verizon Patent and Licensing Inc.
  Inventor: Kevin R. Witzman
- Patent number: 7877254
  Abstract: The present invention provides a method and apparatus for enrollment and verification in speaker authentication. The method for enrollment comprises: extracting an acoustic feature vector sequence from an enrollment utterance of a speaker; and generating a speaker template using the acoustic feature vector sequence. The step of extracting the acoustic feature vector sequence comprises: generating, based on the enrollment utterance, a filter-bank that filters the locations and energies of formants in the spectrum of the enrollment utterance; filtering the spectrum of the enrollment utterance with the generated filter-bank; and generating the acoustic feature vector sequence from the filtered enrollment utterance.
  Type: Grant
  Filed: March 28, 2007
  Date of Patent: January 25, 2011
  Assignee: Kabushiki Kaisha Toshiba
  Inventors: Jian Luan, Pei Ding, Lei He, Jie Hao
- Publication number: 20110004473
  Abstract: A method and apparatus for improving speech recognition results for an audio signal captured within an organization, comprising: receiving the audio signal captured by a capturing or logging device; extracting a phonetic feature and an acoustic feature from the audio signal; decoding the phonetic feature into a phonetic searchable structure; storing the phonetic searchable structure and the acoustic feature in an index; performing a phonetic search for a word or a phrase in the phonetic searchable structure to obtain a result; and activating an audio analysis engine which receives the acoustic feature to validate the result and obtain an enhanced result.
  Type: Application
  Filed: July 6, 2009
  Publication date: January 6, 2011
  Applicant: Nice Systems Ltd.
  Inventors: Ronen Laperdon, Moshe Wasserblat, Shimrit Artzi, Yuval Lubowich
- Patent number: 7864987
  Abstract: An access system, in one embodiment, first determines that someone has correct credentials by using a non-biometric authentication method such as typing in a password, presenting a smart card containing a cryptographic secret, or having a valid digital signature. Once the credentials are authenticated, the user must take at least two biometric tests, which can be chosen randomly. In one approach, the biometric tests need only check that a template generated from the user who desires access matches the stored templates of the holder of the credentials authenticated by the non-biometric test. Access is allowed when both biometric tests are passed.
  Type: Grant
  Filed: April 18, 2006
  Date of Patent: January 4, 2011
  Assignee: Infosys Technologies Ltd.
  Inventors: Kumar Balepur Venkatanna, Rajat Moona, S V Subrahmanya
- Patent number: 7853450
  Abstract: A method of transmitting digital voice information comprises encoding raw speech into encoded digital speech data. The beginning and end of individual phonemes within the encoded digital speech data are marked. The encoded digital speech data is formed into packets. The packets are fed into a speech decoding mechanism.
  Type: Grant
  Filed: March 30, 2007
  Date of Patent: December 14, 2010
  Assignee: Alcatel-Lucent USA Inc.
  Inventor: Bryan Kadel
- Publication number: 20100268538
  Abstract: Disclosed are an electronic apparatus and a voice recognition method for the same. The voice recognition method for the electronic apparatus includes: receiving an input voice of a user; determining characteristics of the user; and recognizing the input voice based on the determined characteristics of the user.
  Type: Application
  Filed: January 7, 2010
  Publication date: October 21, 2010
  Applicant: Samsung Electronics Co., Ltd.
  Inventors: Hee-seob RYU, Seung-kwon PARK, Jong-ho LEA, Jong-hyuk JANG
- Patent number: 7813927
  Abstract: There is provided an apparatus for providing a Text Independent (TI) speaker recognition mode in a Text Dependent (TD) Hidden Markov Model (HMM) speaker recognition system and/or a Text Constrained (TC) HMM speaker recognition system. The apparatus includes a Gaussian Mixture Model (GMM) generator and a Gaussian weight normalizer. The GMM generator creates a GMM by pooling Gaussians from a plurality of HMM states. The Gaussian weight normalizer normalizes the Gaussian weights with respect to the plurality of HMM states.
  Type: Grant
  Filed: June 4, 2008
  Date of Patent: October 12, 2010
  Assignee: Nuance Communications, Inc.
  Inventors: Jiri Navratil, James H. Nealand, Jason W. Pelecanos, Ganesh N. Ramaswamy, Ran D. Zilca
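The pooling-and-normalization step above can be sketched as follows: collect every Gaussian from every HMM state into one flat mixture, then rescale the weights so the pooled mixture sums to one. Assuming uniform state priors here is a simplification; the patent's normalizer may weight states differently.

```python
def pool_hmm_states_to_gmm(hmm_states):
    """Pool per-state Gaussians into a single text-independent GMM.

    hmm_states: list of states, each a list of (weight, mean, var) tuples
    whose weights sum to 1 within that state. Returns one flat list of
    (weight, mean, var) with weights renormalized over the pooled set.
    """
    pooled = [g for state in hmm_states for g in state]
    total = sum(w for w, _, _ in pooled)
    return [(w / total, m, v) for w, m, v in pooled]
```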
-
Patent number: 7809562
Abstract: A voice recognition system has a recognition dictionary storing voice information, a primary voice recognition means for performing primary voice recognition on input voice information pronounced by a user by use of the recognition dictionary, and a recognition result judging means for deciding whether the primary voice recognition result is to be accepted or rejected. The voice recognition system includes a transceiver means for sending the input voice information of the user to an additional voice recognition means when the primary voice recognition result is rejected by the recognition result judging means, and for receiving a secondary voice recognition result produced by secondary voice recognition of the additional voice recognition means, and a recognition result output means for outputting the primary or secondary voice recognition result to an exterior of the voice recognition system.
Type: Grant
Filed: July 26, 2006
Date of Patent: October 5, 2010
Assignee: NEC Corporation
Inventor: Ken Hanazawa
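The two-tier accept/reject flow reads naturally as a fallback pattern. A minimal sketch, assuming the recognizers and the acceptance test are supplied as callables (the interfaces are illustrative, not the patent's):

```python
def recognize(audio, primary, fallback, accept):
    """Two-tier recognition: run the local (primary) recognizer first;
    if its result is rejected, hand the audio to a secondary
    recognizer (e.g. a network round-trip to a server)."""
    result, confidence = primary(audio)
    if accept(confidence):
        return result
    return fallback(audio)
```

This keeps the common case fast and local while reserving the heavier secondary recognizer for low-confidence inputs.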
-
Publication number: 20100223057
Abstract: A system and process for audio authentication of an individual or speaker. A processor decomposes an audio signal received at the sensor into vectors representative of the speaker to be authenticated, and transforms the super-vector V of the speaker, resulting from the concatenation of the vectors associated with the speaker, into binary data 1001100 . . . 0 by taking the mean super-vector M as an input and comparing the super-vector V of the speaker with the mean super-vector M. The binary data thus obtained are transmitted to a module for extracting the speaker authentication, taking as an input the public keys Kpub(1), in order to authenticate the speaker and/or to generate a cryptographic key associated with the speaker.
Type: Application
Filed: December 22, 2009
Publication date: September 2, 2010
Applicant: Thales
Inventors: François Capman, Sandra Marcello, Jean Martinelli
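One plausible reading of the V-versus-M comparison is a component-wise sign test; the exact binarization rule is an assumption, not stated in the abstract:

```python
def binarize_supervector(v, m):
    """Binarize a speaker super-vector V against a mean super-vector M:
    emit 1 where the speaker's component exceeds the mean, else 0.
    This component-wise rule is one plausible interpretation."""
    return [1 if vi > mi else 0 for vi, mi in zip(v, m)]
```

Such binary strings are convenient inputs to key-extraction schemes, since small acoustic variations only flip bits near the mean.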
-
Patent number: 7788101
Abstract: Embodiments of a system and method for verifying an identity of a claimant are described. In accordance with one embodiment, a feature may be extracted from a biometric sample captured from a claimant claiming an identity. The extracted feature may be compared to a template associated with the identity to determine the similarity between the extracted feature and the template, with the similarity between them being represented by a score. A determination may be made as to whether the identity has a correction factor associated therewith. If the identity is determined to have a correction factor associated therewith, then the score may be modified using the correction factor. The score may then be compared to a threshold to determine whether to accept the claimant as the identity.
Type: Grant
Filed: October 31, 2005
Date of Patent: August 31, 2010
Assignee: Hitachi, Ltd.
Inventor: Clifford Tavares
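The score-correction step can be sketched as follows. How the correction factor combines with the score (here, simple addition) is an assumption; the abstract only says the score "may be modified".

```python
def verify(score, threshold, correction=None):
    """Accept the claimant if the (optionally corrected) similarity
    score meets the threshold. `correction` models a per-identity
    correction factor; additive combination is an assumption."""
    if correction is not None:
        score = score + correction
    return score >= threshold
```

A per-identity correction lets the system compensate for users whose templates systematically score low (e.g. due to poor enrollment conditions) without lowering the global threshold.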
-
Patent number: 7788095
Abstract: A method and apparatus for indexing one or more audio signals using a speech-to-text engine and a phoneme detection engine, and generating a combined lattice comprising a text part and a phoneme part. A word to be searched is searched for in the text part; if it is not found, or is found with low certainty, it is divided into phonemes and searched for in the phoneme part of the lattice.
Type: Grant
Filed: November 18, 2007
Date of Patent: August 31, 2010
Assignee: Nice Systems, Ltd.
Inventors: Moshe Wasserblant, Barak Eilam, Yuval Lubowich, Maor Nissan
-
Publication number: 20100217595
Abstract: Disclosed herein is a method for emotion recognition based on a minimum classification error. In the method, a speaker's neutral emotion is extracted using a Gaussian mixture model (GMM), and the other emotions are classified using a GMM to which a discriminative weight, chosen to minimize the loss function of the classification error for the emotion-recognition feature vector, is applied. Emotion recognition is performed by applying a discriminative weight, evaluated using the GMM based on minimum classification error, to the feature vectors of emotions that are difficult to classify, thereby enhancing the performance of emotion recognition.
Type: Application
Filed: February 23, 2010
Publication date: August 26, 2010
Applicants: KOREA INSTITUTE OF SCIENCE AND TECHNOLOGY, Electronics and Telecommunications Research Institute
Inventors: Hyoung Gon KIM, Ig Jae KIM, Joon-Hyuk CHANG, Kye Hwan LEE, Chang Seok BAE
-
Publication number: 20100211376
Abstract: Computer-implemented speech processing generates one or more pronunciations of an input word in a first language by a non-native speaker of the first language who is a native speaker of a second language. The input word is converted into one or more pronunciations. Each pronunciation includes one or more phonemes selected from a set of phonemes associated with the second language. Each pronunciation is associated with the input word in an entry in a computer database. Each pronunciation in the database is associated with information identifying a pronunciation language and/or a phoneme language.
Type: Application
Filed: February 2, 2010
Publication date: August 19, 2010
Applicant: Sony Computer Entertainment Inc.
Inventors: Ruxin Chen, Gustavo Hernandez-Abrego, Masanori Omote, Xavier Menendez-Pidal
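The database entry described above might look like the following sketch; the schema (a word keyed to a list of tagged pronunciation records) is an illustrative assumption.

```python
def add_pronunciations(db, word, pronunciations, phoneme_lang, pron_lang):
    """Associate a word with non-native pronunciations, tagging each
    record with its phoneme language and pronunciation language as
    the abstract describes. The dict schema is illustrative."""
    db.setdefault(word, []).extend(
        {"phonemes": p,
         "phoneme_lang": phoneme_lang,
         "pronunciation_lang": pron_lang}
        for p in pronunciations)
    return db
```

A recognizer consulting such entries can accept, say, an English word pronounced with Spanish phonemes by a native Spanish speaker.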
-
Patent number: 7778831
Abstract: Voice recognition methods and systems are disclosed. A voice signal is obtained for an utterance of a speaker. A runtime pitch is determined from the voice signal for the utterance. The speaker is categorized based on the runtime pitch, and one or more acoustic model parameters are adjusted based on the categorization of the speaker. The parameter adjustment may be performed at any instance of time during the recognition. A voice recognition analysis of the utterance is then performed based on the acoustic model.
Type: Grant
Filed: February 21, 2006
Date of Patent: August 17, 2010
Assignee: Sony Computer Entertainment Inc.
Inventor: Ruxin Chen
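The categorize-then-adjust flow can be sketched as below. The pitch band edges and the warp factors are illustrative assumptions, not values from the patent.

```python
def categorize_by_pitch(pitch_hz):
    """Map runtime pitch to a coarse speaker category.
    Band edges are illustrative only."""
    if pitch_hz < 160.0:
        return "low"    # typically adult male voices
    if pitch_hz < 255.0:
        return "mid"    # typically adult female voices
    return "high"       # typically children's voices

def adjust_acoustic_model(params, pitch_hz):
    """Scale acoustic-model parameters per pitch category; a single
    warp factor stands in for the patent's parameter adjustment."""
    warp = {"low": 0.95, "mid": 1.0, "high": 1.1}[categorize_by_pitch(pitch_hz)]
    return {name: value * warp for name, value in params.items()}
```

Because the categorization uses only runtime pitch, the adjustment can be re-applied mid-utterance as the estimate refines.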
-
Patent number: 7778832
Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
Type: Grant
Filed: September 26, 2007
Date of Patent: August 17, 2010
Assignee: American Express Travel Related Services Company, Inc.
Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
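The server-side one-to-many comparison can be sketched as a threshold scan over stored prints. Cosine similarity over fixed-length vectors is a stand-in assumption; real voice-print comparators are far more involved.

```python
import math

def cosine(a, b):
    """Cosine similarity; a stand-in for a real voice-print comparator."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def find_matches(caller_print, known_prints, threshold=0.85):
    """One-to-many search: return the identities whose stored voice
    print is similar enough to the caller's print to count as a match."""
    return [who for who, vp in known_prints.items()
            if cosine(caller_print, vp) >= threshold]
```

Any returned identity indicates the caller is likely the same person as a known (e.g. previously flagged) speaker, which can feed a transaction-authorization decision.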
-
Publication number: 20100205120
Abstract: A method for researching and developing a recognition model in a computing environment, including: gathering one or more data samples from one or more users in the computing environment into a training data set used for creating the recognition model; receiving one or more training parameters defining a feature extraction algorithm configured to analyze one or more features of the training data set, a classifier algorithm configured to associate the features to a template set, a selection of a subset of the training data set, a type of the data samples, or combinations thereof; creating the recognition model based on the training parameters; and evaluating the recognition model.
Type: Application
Filed: February 6, 2009
Publication date: August 12, 2010
Applicant: Microsoft Corporation
Inventors: Yu Zou, Hao Wei, Gong Cheng, Dongmei Zhang, Jian Wang
-
Publication number: 20100204993
Abstract: The present invention relates to a system and method of making a verification decision within a speaker recognition system. A speech sample is gathered from a speaker over a period of time, and a verification score is then produced for the sample over that period. Once the verification score is determined, a confidence measure is produced based on frame-score observations from the sample over the period, calculated using the standard Gaussian distribution. If the confidence measure indicates, with a set level of confidence, that the verification score is below the verification threshold, the speaker is rejected and the gathering process is terminated.
Type: Application
Filed: December 19, 2007
Publication date: August 12, 2010
Inventor: Robert VOGT
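The early-termination idea can be sketched with a normal-approximation confidence bound on the mean frame score. The exact statistic the patent uses is not given in the abstract, so the formulation below (one-sided confidence interval for the mean) is an assumption.

```python
import math

def early_reject(frame_scores, threshold, confidence=0.95):
    """Decide whether the frame scores seen so far already justify
    rejecting the speaker. Uses a normal approximation of the mean
    frame score; the exact statistic is an assumption."""
    n = len(frame_scores)
    if n < 2:
        return False
    mean = sum(frame_scores) / n
    var = sum((s - mean) ** 2 for s in frame_scores) / (n - 1)
    se = math.sqrt(var / n)
    z = 1.645 if confidence == 0.95 else 2.326  # one-sided normal quantiles
    # Reject only if even the upper confidence bound on the mean
    # score stays below the verification threshold.
    return mean + z * se < threshold
```

Terminating the gathering early saves both the caller's time and compute when a rejection is already statistically certain.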
-
Publication number: 20100198598
Abstract: A method for recognizing the speaker of an utterance in a speech recognition system is disclosed. A likelihood score is determined for each of a plurality of speaker models for different speakers; the likelihood score indicates how well the speaker model corresponds to the utterance. For each of the plurality of speaker models, the probability that the utterance originates from that speaker is determined. The probability is determined based on the likelihood score for the speaker model and requires estimating the distribution of likelihood scores expected, based at least in part on the training state of the speaker.
Type: Application
Filed: February 4, 2010
Publication date: August 5, 2010
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Tobias Herbig, Franz Gerl
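Converting per-speaker likelihood scores into probabilities is, at its core, Bayes' rule. A minimal sketch with a uniform prior; the patent's additional modeling of the expected score distribution per training state is omitted here.

```python
def speaker_posteriors(likelihoods, priors=None):
    """Turn per-speaker likelihood scores into posterior probabilities
    via Bayes' rule; a uniform prior is assumed when none is given.
    `likelihoods` maps speaker name -> likelihood score."""
    names = list(likelihoods)
    if priors is None:
        priors = {n: 1.0 / len(names) for n in names}
    joint = {n: likelihoods[n] * priors[n] for n in names}
    total = sum(joint.values())
    return {n: joint[n] / total for n in names}
```

Normalizing to probabilities makes scores comparable across speaker models with different amounts of training data, which raw likelihoods are not.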
-
Patent number: 7769583
Abstract: A system, method and computer program product for classification of an analog electrical signal using statistical models of training data. A technique is described to quantize the analog electrical signal in a manner which maximizes the compression of the signal while simultaneously minimizing the diminution in the ability to classify the compressed signal. These goals are achieved by utilizing a quantizer designed to minimize the loss in a power of the log-likelihood ratio. A further technique is described to enhance the quantization process by optimally allocating a number of bits for each dimension of the quantized feature vector, subject to a maximum number of bits available across all dimensions.
Type: Grant
Filed: May 13, 2006
Date of Patent: August 3, 2010
Assignee: International Business Machines Corporation
Inventors: Upendra V. Chaudhari, Hsin I. Tseng, Deepak S. Turaga, Olivier Verscheure
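Per-dimension bit allocation under a total budget is commonly solved greedily. The sketch below uses per-dimension variance with the classic 4^-bits distortion decay of a uniform scalar quantizer as the objective; that proxy is an assumption, since the patent minimizes loss in a power of the log-likelihood ratio instead.

```python
def allocate_bits(variances, total_bits):
    """Greedy bit allocation across feature dimensions: repeatedly
    give one bit to the dimension with the largest remaining
    distortion, modeled as variance / 4**bits for a uniform scalar
    quantizer. The distortion proxy is an assumption."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        dist = [v / (4 ** b) for v, b in zip(variances, bits)]
        bits[dist.index(max(dist))] += 1
    return bits
```

High-variance dimensions, which carry more discriminative information, end up with more bits, matching the patent's goal of compressing without hurting classifiability.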